Proportions
There are many ways to visualize proportions. We have already seen bar graphs, but we’ll include them here along with stacked bars and pie charts, using matplotlib and seaborn.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the activity dataset directly from GitHub and preview the first rows
df = pd.read_csv('https://raw.githubusercontent.com/nurfnick/Data_Viz/main/Data_Sets/Activity_Dataset_V1.csv')
df.head()
| | Unnamed: 0 | activity_day | workout_type | distance | time | calories | total_steps | avg_speed | avg_cadence | max_cadence | ... | max_pace | min_pace | avg_heart_rate | max_heart_rate | min_heart_rate | vo2_max(%) | aerobic(%) | anaerobic(%) | intensive(%) | light(%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 2022-01-01 | Freestyle | 9.30 | 77 | 123 | NaN | 18.88 | 168.54 | 138.30 | ... | NaN | NaN | 112.5 | 122.0 | 103 | 19 | 28 | 2 | 7 | 50 |
| 1 | 1 | 2022-01-01 | Freestyle | 3.44 | 96 | 55 | NaN | 29.65 | 125.92 | 292.81 | ... | NaN | NaN | 111.0 | 122.0 | 100 | 42 | 28 | 2 | 29 | 88 |
| 2 | 2 | 2022-01-01 | Indoor Cycling | 6.34 | 85 | 33 | NaN | 17.85 | 81.93 | 323.69 | ... | NaN | NaN | 95.0 | 90.0 | 100 | 1 | 32 | 0 | 22 | 43 |
| 3 | 3 | 2022-01-01 | Walking | 7.91 | 42 | 82 | 1571.0 | 22.10 | 29.63 | 180.16 | ... | 28:58 | 07:58 | 83.0 | 85.0 | 81 | 3 | 22 | 0 | 24 | 65 |
| 4 | 4 | 2022-01-01 | Open Water | 8.99 | 36 | 131 | NaN | 25.83 | 64.55 | 342.89 | ... | NaN | NaN | 138.0 | 166.0 | 110 | 7 | 0 | 5 | 21 | 88 |
5 rows × 21 columns
Pies and Bars
df.groupby('workout_type').workout_type.agg('count')
workout_type
Cricket 93
Freestyle 96
Indoor Cycling 80
Open Water 91
Outdoor Cycling 85
Outdoor Running 81
Pool Swimming 94
Trail Run 90
Treadmill 98
Trekking 94
Walking 98
Name: workout_type, dtype: int64
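The same counts can be produced a couple of other ways in pandas. This is a minimal sketch on a small made-up DataFrame (the name toy and its values are hypothetical, not taken from the activity dataset) showing that groupby(...).agg('count'), groupby(...).size() and value_counts() agree. One difference worth knowing: value_counts sorts by frequency, so sort_index() is needed to match the alphabetical order that groupby produces.

```python
import pandas as pd

# Hypothetical toy data standing in for the activity dataset
toy = pd.DataFrame(
    {"workout_type": ["Walking", "Cricket", "Walking", "Treadmill", "Cricket", "Walking"]}
)

# The approach used above: group by the column and count its non-missing entries
counts_agg = toy.groupby("workout_type").workout_type.agg("count")

# Number of rows in each group
counts_size = toy.groupby("workout_type").size()

# Frequency table sorted by count; sort_index restores alphabetical order
counts_vc = toy["workout_type"].value_counts().sort_index()

print(counts_agg.tolist())  # [2, 1, 3]
```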
# Count workouts of each type once, then reuse the result for values and labels
counts = df.groupby('workout_type').workout_type.agg('count')
plt.pie(x=counts, labels=counts.index, autopct='%.0f%%')
plt.show()
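The autopct argument is a printf-style format string that matplotlib applies to each wedge's percentage of the total: %.0f rounds to a whole number and %% prints a literal percent sign. The sketch below mirrors those labels by hand using the counts printed above, which sum to 1000, so each percentage is simply the count divided by 10.

```python
# Counts copied from the groupby output above
counts = {
    "Cricket": 93, "Freestyle": 96, "Indoor Cycling": 80, "Open Water": 91,
    "Outdoor Cycling": 85, "Outdoor Running": 81, "Pool Swimming": 94,
    "Trail Run": 90, "Treadmill": 98, "Trekking": 94, "Walking": 98,
}

total = sum(counts.values())

# Apply the same format string used for autopct to each wedge's percentage
labels = {name: "%.0f%%" % (100 * n / total) for name, n in counts.items()}

print(total)              # 1000
print(labels["Cricket"])  # 9%
print(labels["Walking"])  # 10%
```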
A couple of things about this graphic:
It took a fair amount of work with groupby and index to create.
It reports percentages rather than the raw counts (we could fix that!).
Cricket and Walking sit right next to each other and get the same color. Matplotlib's default color cycle has only 10 colors, so the 11th wedge (Walking) reuses the first color (Cricket's).
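The color collision can be checked without drawing anything. This sketch assumes matplotlib's default property cycle has not been changed (for example by sns.set_palette); it lists the colors that would be handed out to 11 wedges in order and reports neighboring wedges, including the last and the first, that share a color. The helper names wedge_colors and adjacent_duplicates are made up for this example.

```python
import matplotlib as mpl


def wedge_colors(n_wedges):
    """Colors the current property cycle would assign to n wedges, in order."""
    cycle = mpl.rcParams["axes.prop_cycle"].by_key()["color"]
    return [cycle[i % len(cycle)] for i in range(n_wedges)]


def adjacent_duplicates(colors):
    """Index pairs of neighboring wedges (wrapping last -> first) with equal colors."""
    n = len(colors)
    if n < 2:
        return []
    return [(i, (i + 1) % n) for i in range(n) if colors[i] == colors[(i + 1) % n]]


# 11 workout types in alphabetical order: index 0 is Cricket, index 10 is Walking
colors = wedge_colors(11)
print(adjacent_duplicates(colors))  # [(10, 0)] with the default 10-color cycle
```

A list of colors with no adjacent repeats can be passed to plt.pie through its colors argument.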
Let’s put that same data into a bar chart.
ax = sns.barplot( x= df.groupby('workout_type').workout_type.agg('count').index,
y = df.groupby('workout_type').workout_type.agg('count'))
ax.set_xticklabels(ax.get_xticklabels(),rotation = 30)
ax.set_title('Colorful Bar Chart')
plt.show()
df1 = df.groupby('workout_type').workout_type.agg(count = 'count')
df1 = df1.reset_index()
df1
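Two things happen in the cell above. Passing count='count' to agg uses pandas' named aggregation, where the keyword becomes the name of the output column. reset_index then moves workout_type out of the index and into an ordinary column, which is what lets sns.barplot refer to both columns by name through data=. A minimal sketch on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy data standing in for the activity dataset
toy = pd.DataFrame({"workout_type": ["Walking", "Cricket", "Walking", "Treadmill"]})

# Named aggregation: the keyword ("count") becomes the output column name
grouped = toy.groupby("workout_type").workout_type.agg(count="count")
print(grouped.index.name, list(grouped.columns))  # workout_type ['count']

# reset_index turns the group labels into a regular column with a 0..n-1 index
tidy = grouped.reset_index()
print(list(tidy.columns))      # ['workout_type', 'count']
print(tidy["count"].tolist())  # [1, 1, 2]
```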
| | workout_type | count |
|---|---|---|
| 0 | Cricket | 93 |
| 1 | Freestyle | 96 |
| 2 | Indoor Cycling | 80 |
| 3 | Open Water | 91 |
| 4 | Outdoor Cycling | 85 |
| 5 | Outdoor Running | 81 |
| 6 | Pool Swimming | 94 |
| 7 | Trail Run | 90 |
| 8 | Treadmill | 98 |
| 9 | Trekking | 94 |
| 10 | Walking | 98 |
ax = sns.barplot(data = df1,x = 'workout_type', y = 'count')
ax.set_xticklabels(ax.get_xticklabels(),rotation = 30)
ax.set_title('Coolest Graph')
plt.show()
Mosaic Plots
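from statsmodels.graphics.mosaicplot import mosaic

# Keep only the first three days so the plot stays readable
days = ['2022-01-01', '2022-01-02', '2022-01-03']
df2 = df[df.activity_day.isin(days)]

# mosaic returns the figure and a dict of tile rectangles, not an Axes
fig, rects = mosaic(df2, ['workout_type', 'activity_day'])
plt.show()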
df2.groupby(['workout_type','activity_day']).workout_type.agg('count')
workout_type activity_day
Cricket 2022-01-01 4
2022-01-02 3
Freestyle 2022-01-01 3
2022-01-02 4
2022-01-03 3
Indoor Cycling 2022-01-01 4
2022-01-03 2
Open Water 2022-01-01 4
2022-01-02 1
2022-01-03 2
Outdoor Cycling 2022-01-01 2
2022-01-03 2
Outdoor Running 2022-01-01 3
2022-01-02 3
2022-01-03 1
Pool Swimming 2022-01-01 1
2022-01-02 4
2022-01-03 1
Trail Run 2022-01-01 2
2022-01-02 3
2022-01-03 2
Treadmill 2022-01-01 2
2022-01-02 2
2022-01-03 3
Trekking 2022-01-01 2
2022-01-02 2
2022-01-03 5
Walking 2022-01-01 1
2022-01-02 1
2022-01-03 2
Name: workout_type, dtype: int64
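Roughly, a mosaic plot gives each workout type a column whose width is proportional to how common that type is, then splits each column by day in proportion to that type's workouts on each day. Ignoring the small gaps statsmodels leaves between tiles, each tile's area is therefore the joint proportion of that (type, day) pair. The sketch below checks that identity with pd.crosstab on a small hypothetical DataFrame; it illustrates the geometry and is not how statsmodels computes its layout internally.

```python
import pandas as pd

# Hypothetical toy data: workout type and day for six workouts
toy = pd.DataFrame({
    "workout_type": ["Cricket", "Cricket", "Walking", "Walking", "Walking", "Treadmill"],
    "activity_day": ["2022-01-01", "2022-01-02", "2022-01-01",
                     "2022-01-01", "2022-01-03", "2022-01-02"],
})

# Column widths: share of all workouts belonging to each type
widths = toy["workout_type"].value_counts(normalize=True).sort_index()

# Tile heights: within each type, share of that type's workouts on each day
heights = pd.crosstab(toy["workout_type"], toy["activity_day"], normalize="index")

# Tile area = width * height, aligned on workout_type
areas = heights.mul(widths, axis=0)

# Joint proportion of each (type, day) pair, for comparison
joint = pd.crosstab(toy["workout_type"], toy["activity_day"], normalize="all")

print(areas.round(3))
```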
Colors
You can access the seaborn colors with the following code.
sns.color_palette('bright')
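There are six variations of seaborn's default palette: deep, muted, pastel, bright, dark, and colorblind. You can also pass the name of a matplotlib colormap, such as coolwarm.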
sns.color_palette('deep')
sns.color_palette('colorblind')
sns.color_palette('coolwarm')
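In a notebook, sns.color_palette displays as a row of swatches, but underneath it is a list-like object of RGB tuples, so it can be measured, sliced, or converted to hex codes. The sketch assumes a recent seaborn in which each named variant such as bright has 10 colors; for a colormap like coolwarm, n_colors sets how many colors are sampled from it.

```python
import seaborn as sns

bright = sns.color_palette("bright")              # a seaborn named palette
warm = sns.color_palette("coolwarm", n_colors=6)  # sampled from a matplotlib colormap

print(len(bright), len(warm))  # 10 6
print(warm[0])                 # an (r, g, b) tuple of floats between 0 and 1
print(bright.as_hex()[:3])     # first three colors as hex strings
```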
ax = sns.barplot(data = df1,x = 'workout_type', y = 'count', color = 'blue')
ax.set_xticklabels(ax.get_xticklabels(),rotation = 30)
ax.set_title('One Color Graph')
plt.show()
I was not able to get the following code to work in a Jupyter notebook setting!
sns.set_palette('bright')
Instead, I passed the palette directly to the plotting command.
ax = sns.barplot(data = df1,
x = 'workout_type',
y = 'count',
palette=sns.color_palette('coolwarm', n_colors= 6))
ax.set_xticklabels(ax.get_xticklabels(),rotation = 30)
ax.set_title('Warm Colored Graph')
plt.show()
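sns.set_palette works by replacing matplotlib's default color cycle (rcParams['axes.prop_cycle']), so it affects every plot drawn afterwards, whereas palette= affects only a single call. One way to check whether set_palette took effect in a session is to compare that cycle with the palette, as in this sketch. The check does not explain why it seemed not to work in the notebook above, and the way barplot picks colors when hue is not set has changed across seaborn releases.

```python
import matplotlib as mpl
from matplotlib.colors import to_hex
import seaborn as sns

original = mpl.rcParams["axes.prop_cycle"]  # remember the current cycle

sns.set_palette("bright")  # replaces matplotlib's default color cycle

# Colors now in the cycle, converted to hex for easy comparison
cycle = [to_hex(c) for c in mpl.rcParams["axes.prop_cycle"].by_key()["color"]]
print(cycle == list(sns.color_palette("bright").as_hex()))  # True if it took effect

mpl.rcParams["axes.prop_cycle"] = original  # undo the change
```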
Annotate
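You can add context directly to your graphic with the annotate method of the axes. Its xy argument is the point the arrow touches and xytext is where the text is placed; by default both are in data coordinates.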
ax = sns.barplot( x= df.groupby('workout_type').workout_type.agg('count').index, y = df.groupby('workout_type').workout_type.agg('count'))
ax.set_xticklabels(ax.get_xticklabels(),rotation = 30)
ax.annotate("Crazy Runners",
xy = [8,100],
xytext = [9,110],
arrowprops = dict(facecolor = 'red', width = 3, headwidth = 12, headlength = 6))
ax.set_title('Annotated Graph')
ax.set_ylabel('Frequency')
plt.show()
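The arrow position [8,100] above is hard-coded: seaborn draws categorical bars at x = 0, 1, 2, and so on, so 8 is the ninth category in alphabetical order (Treadmill). This sketch looks the position up from the label instead. It uses plain matplotlib bars at the same integer positions and a subset of the counts above so that it runs on its own; the variable names are illustrative.

```python
import matplotlib.pyplot as plt
import pandas as pd

# A subset of the workout counts from the table above
counts = pd.Series({"Cricket": 93, "Treadmill": 98, "Walking": 98})

fig, ax = plt.subplots()
positions = range(len(counts))  # categorical bars sit at x = 0, 1, 2, ...
ax.bar(positions, counts.values)
ax.set_xticks(positions)
ax.set_xticklabels(counts.index)

# Find where the "Treadmill" bar is instead of hard-coding its position
target = "Treadmill"
x = counts.index.get_loc(target)
y = counts[target]

note = ax.annotate(
    "Crazy Runners",
    xy=(x, y),                 # point the arrow touches, in data coordinates
    xytext=(x + 0.5, y + 10),  # where the text goes, also in data coordinates
    arrowprops=dict(facecolor="red", width=3, headwidth=12, headlength=6),
)
ax.set_ylim(0, y + 20)         # leave room above the bars for the text
ax.set_ylabel("Frequency")
plt.show()
```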
Your Turn
Recreate the first pie chart so that it shows the raw counts instead of percentages, and use a color scheme in which no two adjacent wedges share a color. Annotate the graph in some way.
Explain what a Mosaic Plot might be able to show you that a pie and bar chart cannot.