Visualize Amounts
To get started with visualization, we’ll look at one of the simplest ideas: single quantities. Let’s grab some data too!
import pandas as pa
df = pa.read_csv('https://raw.githubusercontent.com/nurfnick/Data_Viz/main/Data_Sets/iris.csv')
Bar Charts
I’ll make a small bar chart of the mean sepal length for each Class of flower. First, here are the means themselves:
df.groupby('Class').SepalLength.agg('mean')
Class
Iris-setosa 5.006
Iris-versicolor 5.936
Iris-virginica 6.588
Name: SepalLength, dtype: float64
df.groupby('Class').SepalLength.agg('mean').plot(kind = 'bar')
<AxesSubplot:xlabel='Class'>
data:image/s3,"s3://crabby-images/89090/8909063f8b5ad61f297c5dec90c6394c087f9453" alt="../../_images/d00142ad0d3cf81eb6b96d48221175076d17ac9380f87e7e36700463a696929f.png"
Equivalently, you can call the plot.bar() accessor, which draws the same chart:
df.groupby('Class').SepalLength.agg('mean').plot.bar()
<AxesSubplot:xlabel='Class'>
data:image/s3,"s3://crabby-images/89090/8909063f8b5ad61f297c5dec90c6394c087f9453" alt="../../_images/d00142ad0d3cf81eb6b96d48221175076d17ac9380f87e7e36700463a696929f.png"
There are lots of options, some of which we should be using regularly. A title is always nice:
df.groupby('Class').SepalLength.agg('mean').plot(kind = 'bar', title = 'Mean by Class')
<AxesSubplot:title={'center':'Mean by Class'}, xlabel='Class'>
data:image/s3,"s3://crabby-images/826be/826be7797d532cd04aa2b4fefd6ec965b3e7f7c5" alt="../../_images/d57b8d748e8503b1b2b0446d7b39eaa311c40e7603797727931ad5909823081f.png"
Don’t forget a label describing what the \(y\) axis represents!
df.groupby('Class').SepalLength.agg('mean').plot(kind = 'bar',
title = 'Mean by Class',
ylabel= 'Mean of Sepal Length')
<AxesSubplot:title={'center':'Mean by Class'}, xlabel='Class', ylabel='Mean of Sepal Length'>
data:image/s3,"s3://crabby-images/b99bf/b99bf187cc160632860c4f595ca7374f9dc4fd14" alt="../../_images/2f7f4739f3e6ab1b0672ff9a1a765d02160d3128391b64fb00b51f30b59fe2fc.png"
One complaint about a graphic like this is the length of the class names: printed vertically under the bars, they take up a lot of vertical space. With kind = 'barh' you can turn the bars horizontal so the names read left to right.
df.groupby('Class').SepalLength.agg('mean').plot(kind = 'barh',
title = 'Mean by Class',
ylabel= 'Mean of Sepal Length')
<AxesSubplot:title={'center':'Mean by Class'}, ylabel='Class'>
data:image/s3,"s3://crabby-images/a351f/a351f84a0421092e2ce5cd436f4c38b169920bf5" alt="../../_images/5914c5f3f11acba2370044aa73a9c01dfe1e77ad559ba7d0de4d55c9f3c7ba13.png"
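Notice that the Mean of Sepal Length label did not appear: the output above shows the \(y\) axis labeled Class instead, and with horizontal bars the values now run along the \(x\) axis. I couldn’t get the label of the values to appear, maybe you can?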
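One way to get the value label onto a horizontal bar chart is to set it on the matplotlib Axes that plot returns, after plotting. In a barh chart the means run along the \(x\) axis, so the label goes through set_xlabel. This is a sketch rather than the only fix. To avoid downloading the data, it rebuilds the three class means printed earlier by hand.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pa

# Means of SepalLength by Class, copied from the groupby output earlier in this page
means = pa.Series(
    [5.006, 5.936, 6.588],
    index=pa.Index(["Iris-setosa", "Iris-versicolor", "Iris-virginica"], name="Class"),
    name="SepalLength",
)

ax = means.plot(kind="barh", title="Mean by Class")
# The bars are horizontal, so the values are measured along the x axis
ax.set_xlabel("Mean of Sepal Length")
```

Setting the label after the plot call means nothing inside pandas gets a chance to overwrite it.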
If there are lots of values, don’t use bars! Let’s see this with a different dataset.
df2 = pa.read_csv('https://raw.githubusercontent.com/nurfnick/Data_Viz/main/Activity_Dataset_V1.csv')
In the following graph it is very difficult to follow each bar across to its label.
df2.groupby('workout_type').calories.agg('mean').sort_values(ascending = True).plot(kind = 'barh')
<AxesSubplot:ylabel='workout_type'>
data:image/s3,"s3://crabby-images/51b33/51b33720c0e6b02deb772216f5bf8614fbd318aa" alt="../../_images/f7ededf3868374c482ee78ef6c6d428629ba4bf738fdfdb1e37ab89b85492ac9.png"
To clear this up you could use a point instead of a bar!
Dot Plots Work Well Too
df2.groupby('workout_type').calories.agg(['mean']).sort_values(by = 'mean',ascending = True).reset_index().plot.scatter(x = 'mean', y = 'workout_type')
<AxesSubplot:xlabel='mean', ylabel='workout_type'>
data:image/s3,"s3://crabby-images/473ea/473ea14adeeb542993258e353ea5531791278ec9" alt="../../_images/b1750746a00b860fa53b00f7fa571b7ad41b6e43e768b3625b439492c47f66b6.png"
This creates another issue: the \(x\) axis no longer starts at zero, which makes small differences look large. To fix that, we require the \(x\) limits to run from 0 to 310.
df2.groupby('workout_type').calories.agg(['mean']).sort_values(by = 'mean',ascending = True).reset_index().plot.scatter(x = 'mean', y = 'workout_type', xlim = [0,310])
<AxesSubplot:xlabel='mean', ylabel='workout_type'>
data:image/s3,"s3://crabby-images/31e6e/31e6e125f460e3dda4394493d4a3294c0ac6cad0" alt="../../_images/ccb5f5df884995e793809ef495dc0b40c06c389729b6577dc86bd6333c385804.png"
All the workout types are giving us about the same bang for our buck, at least in mean calories.
Adding Labels
It might also be nice to see the numbers presented with the data. This is especially nice for a small number of quantities.
means = df2.groupby('workout_type').calories.agg(['mean']).sort_values(by = 'mean', ascending = True).reset_index()
ax = means.plot.scatter(x = 'mean', y = 'workout_type')  # the same graph as above
for i, k in enumerate(means['mean']):  # loop through the values, k, and their indices, i
    ax.annotate(str(int(k)), [k + .2, i + .2])  # write the value (truncated to an integer) just up and right of its dot
data:image/s3,"s3://crabby-images/a7263/a7263c404060c7bb826d405481c7e270af73b09b" alt="../../_images/4b06703c53723b2db79d26e1d0282667fbfb6b4de6eba913c1a6d6664caf5143.png"
Bar Charts with Multiple Data
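Grouping without picking a column averages every remaining column, and the plot draws one bar per column within each class:
df.groupby('Class').agg('mean').plot(kind = 'bar')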
<AxesSubplot:xlabel='Class'>
data:image/s3,"s3://crabby-images/cfa95/cfa9574e8c7d8b6f6d2e80ee6f8fe946265b2cb4" alt="../../_images/b17bd40bc1cbbb5aaaa3370c34a43c7346a5241ab21dd64f97198441d17c560f.png"
Setting stacked = True piles each class’s bars on top of one another:
df.groupby('Class').agg('mean').plot(kind = 'bar', stacked = True)
<AxesSubplot:xlabel='Class'>
data:image/s3,"s3://crabby-images/8857a/8857ac34bea56582fa947ccb3211878608c22079" alt="../../_images/99083e86535bd4a4c52ce1728ce5db98a625926439ed12947aeeffe38e369a13.png"
Adding value labels to the bars should be simple with bar_label (added in matplotlib 3.4), but the version on Colab was out of date when I wrote this, so I upgrade it here.
!pip install --upgrade matplotlib
import matplotlib
matplotlib.__version__
'3.5.1'
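Because Axes.bar_label first appeared in matplotlib 3.4, a quick version check before using it can save some confusion. The helper name has_bar_label is hypothetical (my own). The parsing is a simple sketch that only compares the major and minor numbers.

```python
import matplotlib

def has_bar_label(version=matplotlib.__version__):
    """Return True if a matplotlib version string is 3.4 or newer, the release that added Axes.bar_label."""
    # Compare only the major and minor numbers, e.g. "3.5.1" -> (3, 5)
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (3, 4)

print(matplotlib.__version__, has_bar_label())
```

Comparing tuples of integers rather than raw strings keeps "3.10" from sorting before "3.4".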
Now, with the correct version, it is actually really easy.
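ax = df.groupby('Class').agg('mean').plot(kind = 'bar', ylim = [0, 8])  # ylim leaves room for labels above the tallest bars
for container in ax.containers:  # one container of bars per column
    ax.bar_label(container)  # label each bar with its height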
data:image/s3,"s3://crabby-images/88bf5/88bf550c477f39ba2cbb5099f201b83491f5e684" alt="../../_images/5b012ff20f0161a07f7c91bc4e32d85a9f5f2c8fad1ff134d306406058ce737f.png"
ax = df.groupby('Class').agg('mean').plot(kind = 'bar', stacked = True)
for container in ax.containers:  # one container of segments per column
    ax.bar_label(container)  # label the top edge of each segment
data:image/s3,"s3://crabby-images/326a6/326a67f27e36acea53593bac39344a9ffb115381" alt="../../_images/089d1959989450da6347e688a935788870f4ff09af01f2655d0af356b19d8e92.png"
Be careful with stacked bars: each label marks the top of its segment, so the labels give a cumulative total. Adding the measurements together doesn’t really make any sense here…
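By default bar_label uses label_type='edge', which places each label at the end of its segment and shows that end position. On a stacked chart the labels are therefore running totals. Passing label_type='center' instead puts each label in the middle of its segment and shows that segment's own length. The numbers below are made up to keep the sketch self-contained.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pa

# Hypothetical values for two groups and two parts (illustrative numbers only)
parts = pa.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]}, index=["X", "Y"])

ax = parts.plot(kind="bar", stacked=True)
labels = []
for container in ax.containers:  # one container of segments per column
    # 'center' shows each segment's own height instead of its cumulative top
    labels += ax.bar_label(container, label_type="center")
```

With the default label_type='edge', the same loop would print 1, 2, 4 and 6, which are the stacked tops.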
Your Turn
Using the Airbnb NYC data, complete the following tasks.
Create a bar graph of the maximum ‘price’ by ‘neighbourhood_group’. Include the ‘price’ values as labels in your graph.
Create a multiple bar graph of the average ‘price’ by ‘neighbourhood_group’ and ‘room_type’.