Data Visualization with Pandas and Matplotlib¶
Author: Pete Bryan (@MSSPete)
Project: Infosec Jupyter Book
Public Organization: Microsoft
License: Creative Commons Attribution-ShareAlike 4.0 International
Reference: https://pandas.pydata.org/pandas-docs/stable/getting_started/intro_tutorials/04_plotting.html
Description¶
When working with data, visualizations can be a powerful tool for understanding what you have. They are often much more effective than raw data analysis alone at helping you identify patterns, trends, and anomalies.
Python has a wide range of excellent, flexible, and powerful data visualization libraries. However, when working with data in Pandas, the built-in integration between Pandas and Matplotlib provides the fastest and easiest way to plot your data. If you are planning to create a complex, interactive visualization, you are better served by something like Bokeh, but while learning or experimenting, the features provided by Pandas and Matplotlib are likely to be enough.
Pre-requisites - Reading¶
Line Plot¶
The simplest plot is a line chart. You can plot a line chart of all numerical columns in your Pandas DataFrame with df.plot(). Depending on your dataset, this may not produce a very interesting or usable visualization; often (as with this dataset) a line chart is not the best choice.
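As a minimal sketch of the call described above, the example below builds a small DataFrame and plots it. The data and the column names (`logons`, `failures`, `host`) are hypothetical and are not the dataset used in this notebook.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
df = pd.DataFrame(
    {
        "logons": [12, 18, 9, 24, 15],
        "failures": [1, 4, 0, 7, 2],
        "host": ["a", "b", "c", "d", "e"],  # non-numeric, so not plotted
    }
)

# df.plot() draws one line per numeric column against the index
# and returns the Matplotlib Axes it drew on.
ax = df.plot()
ax.set_xlabel("row index")
```

Because `df.plot()` returns a Matplotlib Axes object, you can keep customizing the chart (labels, titles, limits) with the usual Matplotlib methods.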
Bar Plots¶
You can plot data in vertical or horizontal bars. You can also plot summarized data, such as the output of value_counts(), which we are using here.
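The sketch below shows one way this might look, assuming a hypothetical `account` column in an invented event log. Explicit axes are created with `plt.subplots` so that the two bar charts are drawn side by side rather than on top of each other.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical event log, invented for illustration only.
events = pd.DataFrame(
    {"account": ["alice", "bob", "alice", "carol", "alice", "bob"]}
)

# value_counts() summarizes how often each account appears.
counts = events["account"].value_counts()

# One figure with two axes: vertical bars on the left, horizontal on the right.
fig, (ax_v, ax_h) = plt.subplots(1, 2, figsize=(10, 4))
counts.plot.bar(ax=ax_v, title="Vertical")
counts.plot.barh(ax=ax_h, title="Horizontal")
```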
Pie Charts¶
There are a number of customizations for each chart, such as showing a legend, which is useful for a pie chart.
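For example, a pie chart with a legend could be sketched as follows. The event categories and counts are hypothetical, and `autopct` is an optional extra that is passed through to Matplotlib to print percentages on the wedges.

```python
import pandas as pd

# Hypothetical event counts, invented for illustration only.
df = pd.DataFrame(
    {"events": [50, 30, 20]},
    index=["logon", "logoff", "failed logon"],
)

# y selects the column to chart; legend=True adds a legend keyed by the index.
ax = df.plot.pie(y="events", legend=True, autopct="%1.0f%%")
ax.set_ylabel("")  # drop the default axis label, which adds little to a pie chart
```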
Other charts¶
There are a number of other chart types available, some of which are shown below. You can select the columns to plot either by subsetting the DataFrame before calling the plot function or by passing them as the x and y parameters.
Available charts:
Area
Bar
Horizontal Bar
Box
Density
Hexbin
Histogram
Line
Pie
Scatter
Box Plot
Kernel Density Estimate
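As a rough sketch of the two ways of choosing what to plot, plus one of the chart types listed above, the example below uses an invented DataFrame. The column names (`bytes_sent`, `bytes_received`, `duration`) are hypothetical.

```python
import pandas as pd

# Hypothetical network data, invented for illustration only.
df = pd.DataFrame(
    {
        "bytes_sent": [120, 340, 560, 200, 800, 150],
        "bytes_received": [100, 300, 610, 180, 760, 900],
        "duration": [1.2, 2.5, 3.1, 1.8, 4.0, 9.5],
    }
)

# Option 1: select the columns first, then plot them.
ax_area = df[["bytes_sent", "bytes_received"]].plot.area()

# Option 2: name the columns through the x and y parameters.
ax_scatter = df.plot.scatter(x="bytes_sent", y="bytes_received")

# A box plot summarizes the distribution of each numeric column.
ax_box = df.plot.box()
```

Each DataFrame plotting call here creates its own figure, so the three charts are drawn separately.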
Some plots can also be done on a Pandas Series:
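For instance, a histogram of a single column might be sketched like this. The `durations` values are hypothetical. An explicit axes object is passed in because a Series plot will otherwise draw on the current axes if a figure is already open.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical session durations in seconds, invented for illustration only.
durations = pd.Series([1.2, 2.5, 3.1, 1.8, 4.0, 9.5, 2.2, 3.3], name="duration")

fig, ax = plt.subplots()
# The same .plot accessor is available on a Series as on a DataFrame.
durations.plot.hist(bins=5, ax=ax, title="Session duration")
```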
You can also apply Pandas conditional statements prior to plotting data:
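A minimal sketch of filtering before plotting, assuming hypothetical `host` and `failures` columns and an arbitrary threshold of 2:

```python
import pandas as pd

# Hypothetical per-host failure counts, invented for illustration only.
df = pd.DataFrame(
    {
        "host": ["a", "b", "c", "d", "e"],
        "failures": [1, 4, 0, 7, 2],
    }
)

# Keep only the rows that meet the condition, then plot what is left.
noisy = df[df["failures"] > 2]
ax = noisy.plot.bar(x="host", y="failures")
```

Only hosts `b` and `d` pass the filter here, so the chart contains two bars.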
Subplots¶
By default, plots include all columns in the DataFrame provided, overlaid in one chart. You can set the subplots=True parameter to split each column into a separate plot. You can also resize your plots with the figsize parameter.
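The two parameters can be combined, as in the sketch below. The data and column names are hypothetical, and the figure size of 8 by 6 inches is an arbitrary choice.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
df = pd.DataFrame(
    {
        "logons": [12, 18, 9, 24, 15],
        "failures": [1, 4, 0, 7, 2],
    }
)

# subplots=True gives each column its own axes; figsize is (width, height) in inches.
axes = df.plot(subplots=True, figsize=(8, 6))
```

With `subplots=True` the call returns an array of Axes, one per column, rather than a single Axes object.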
Summary¶
As you can see, Pandas and Matplotlib give you a very easy way of creating a number of visualizations. As some of these examples also make clear, certain visualizations fit certain data types better than others. When creating visualizations, it's important to consider what data you have and what analysis you are looking to get from the visualization.
For more details on plotting data with Pandas, refer to the Pandas documentation listed under Reference at the top of this page.