
Proportions

There are many ways to visualize proportions. We have already seen bar graphs, but we’ll include them here alongside stacked bars and pie charts, using matplotlib and seaborn.

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv('https://raw.githubusercontent.com/nurfnick/Data_Viz/main/Data_Sets/Activity_Dataset_V1.csv')

df.head()
Unnamed: 0 activity_day workout_type distance time calories total_steps avg_speed avg_cadence max_cadence ... max_pace min_pace avg_heart_rate max_heart_rate min_heart_rate vo2_max(%) aerobic(%) anaerobic(%) intensive(%) light(%)
0 0 2022-01-01 Freestyle 9.30 77 123 NaN 18.88 168.54 138.30 ... NaN NaN 112.5 122.0 103 19 28 2 7 50
1 1 2022-01-01 Freestyle 3.44 96 55 NaN 29.65 125.92 292.81 ... NaN NaN 111.0 122.0 100 42 28 2 29 88
2 2 2022-01-01 Indoor Cycling 6.34 85 33 NaN 17.85 81.93 323.69 ... NaN NaN 95.0 90.0 100 1 32 0 22 43
3 3 2022-01-01 Walking 7.91 42 82 1571.0 22.10 29.63 180.16 ... 28:58 07:58 83.0 85.0 81 3 22 0 24 65
4 4 2022-01-01 Open Water 8.99 36 131 NaN 25.83 64.55 342.89 ... NaN NaN 138.0 166.0 110 7 0 5 21 88

5 rows × 21 columns

Pies and Bars

df.groupby('workout_type').workout_type.agg('count')
workout_type
Cricket            93
Freestyle          96
Indoor Cycling     80
Open Water         91
Outdoor Cycling    85
Outdoor Running    81
Pool Swimming      94
Trail Run          90
Treadmill          98
Trekking           94
Walking            98
Name: workout_type, dtype: int64
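
For a single column, `value_counts` gives the same tally with a little less typing. The sketch below uses a tiny made-up list of workouts (the `toy` frame is hypothetical, not the real dataset) to show that the two approaches agree once the result is sorted by label.

```python
import pandas as pd

# Hypothetical mini dataset, only for illustration
toy = pd.DataFrame({'workout_type': ['Walking', 'Cricket', 'Walking',
                                     'Freestyle', 'Cricket', 'Walking']})

# The groupby approach used above
by_group = toy.groupby('workout_type').workout_type.agg('count')

# value_counts orders by frequency, so sort by label to match groupby
by_value_counts = toy['workout_type'].value_counts().sort_index()

print(by_group.to_dict())
print(by_value_counts.to_dict())
```
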
counts = df.groupby('workout_type').workout_type.agg('count')
plt.pie(x = counts, labels = counts.index, autopct = '%.0f%%')
plt.show()
[Figure: pie chart of workout counts by type, labeled with percentages]

A couple of things about this graphic:

  1. It took a fair amount of work with the groupby and index to create.

  2. It is reporting the percentages rather than the raw numbers (we could fix that!)

  3. Cricket and Walking are right next to each other and get the same color. Matplotlib’s default color cycle has only ten colors, so the eleventh slice reuses the first.
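
The repeated color can be checked directly. This is a minimal sketch, assuming matplotlib's default style is active; the counts are copied from the output above, and names such as `default_colors` and `same_color` are just illustrative.

```python
import matplotlib.pyplot as plt

# Counts copied from the groupby output above (Cricket first, Walking last)
counts = [93, 96, 80, 91, 85, 81, 94, 90, 98, 94, 98]

# Colors that pie() cycles through when no colors are given
default_colors = plt.rcParams['axes.prop_cycle'].by_key()['color']
print(len(default_colors), 'colors available for', len(counts), 'slices')

fig, ax = plt.subplots()
wedges, texts = ax.pie(counts)

# Compare the fill color of the first and last wedge
same_color = wedges[0].get_facecolor() == wedges[-1].get_facecolor()
print(same_color)
```

Passing an explicit `colors=` list with at least eleven entries to `pie` is one way to avoid the wrap-around.
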

Let’s put that same data into a bar chart.

ax = sns.barplot( x= df.groupby('workout_type').workout_type.agg('count').index,
                 y = df.groupby('workout_type').workout_type.agg('count'))
ax.set_xticklabels(ax.get_xticklabels(),rotation = 30)
ax.set_title('Colorful Bar Chart')

plt.show()
[Figure: bar chart titled 'Colorful Bar Chart', one bar per workout type]
df1 = df.groupby('workout_type').workout_type.agg(count = 'count')
df1 = df1.reset_index()

df1
workout_type count
0 Cricket 93
1 Freestyle 96
2 Indoor Cycling 80
3 Open Water 91
4 Outdoor Cycling 85
5 Outdoor Running 81
6 Pool Swimming 94
7 Trail Run 90
8 Treadmill 98
9 Trekking 94
10 Walking 98

ax = sns.barplot(data = df1,x = 'workout_type', y = 'count')
ax.set_xticklabels(ax.get_xticklabels(),rotation = 30)
ax.set_title('Coolest Graph')

plt.show()
[Figure: bar chart titled 'Coolest Graph', one bar per workout type]
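
The bars above show raw counts. To plot proportions instead, each count can be divided by the total first. The sketch below rebuilds the small table by hand from the output above so that it runs on its own; `counts_df` and the `proportion` column are names chosen here for illustration.

```python
import pandas as pd

# Counts copied from the df1 table above
counts_df = pd.DataFrame({
    'workout_type': ['Cricket', 'Freestyle', 'Indoor Cycling', 'Open Water',
                     'Outdoor Cycling', 'Outdoor Running', 'Pool Swimming',
                     'Trail Run', 'Treadmill', 'Trekking', 'Walking'],
    'count': [93, 96, 80, 91, 85, 81, 94, 90, 98, 94, 98],
})

# Share of all workouts that falls in each category
total = counts_df['count'].sum()
counts_df['proportion'] = counts_df['count'] / total

print(total)
print(counts_df[['workout_type', 'proportion']])
```

Passing `y = 'proportion'` in place of `y = 'count'` to the same `sns.barplot` call should then draw shares rather than counts.
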

Mosaic Plots

from statsmodels.graphics.mosaicplot import mosaic
# Keep only three days of activity
df2 = df[df.activity_day.isin(['2022-01-01', '2022-01-02', '2022-01-03'])]
# mosaic returns the figure and a dict of tile rectangles, not an Axes
fig, rects = mosaic(df2, ['workout_type', 'activity_day'])

plt.show()
[Figure: mosaic plot of workout_type by activity_day]
df2.groupby(['workout_type','activity_day']).workout_type.agg('count')
workout_type     activity_day
Cricket          2022-01-01      4
                 2022-01-02      3
Freestyle        2022-01-01      3
                 2022-01-02      4
                 2022-01-03      3
Indoor Cycling   2022-01-01      4
                 2022-01-03      2
Open Water       2022-01-01      4
                 2022-01-02      1
                 2022-01-03      2
Outdoor Cycling  2022-01-01      2
                 2022-01-03      2
Outdoor Running  2022-01-01      3
                 2022-01-02      3
                 2022-01-03      1
Pool Swimming    2022-01-01      1
                 2022-01-02      4
                 2022-01-03      1
Trail Run        2022-01-01      2
                 2022-01-02      3
                 2022-01-03      2
Treadmill        2022-01-01      2
                 2022-01-02      2
                 2022-01-03      3
Trekking         2022-01-01      2
                 2022-01-02      2
                 2022-01-03      5
Walking          2022-01-01      1
                 2022-01-02      1
                 2022-01-03      2
Name: workout_type, dtype: int64
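
Each tile in the mosaic plot has an area proportional to the count of one workout_type and activity_day combination, so the grouped counts above are the numbers behind the picture. A contingency table can be easier to read than the long grouped listing. The sketch below is only an illustration: it rebuilds three of the workout types from the counts printed above (the `observed` dictionary and `subset` frame are constructed by hand) and tabulates them with `pd.crosstab`.

```python
import pandas as pd

# Counts copied from the groupby output above for three workout types.
# Cricket has no row for 2022-01-03 and Indoor Cycling none for 2022-01-02.
observed = {
    ('Cricket', '2022-01-01'): 4, ('Cricket', '2022-01-02'): 3,
    ('Freestyle', '2022-01-01'): 3, ('Freestyle', '2022-01-02'): 4,
    ('Freestyle', '2022-01-03'): 3,
    ('Indoor Cycling', '2022-01-01'): 4, ('Indoor Cycling', '2022-01-03'): 2,
}

# Rebuild one row per workout so the frame looks like a slice of df2
rows = [{'workout_type': w, 'activity_day': d}
        for (w, d), n in observed.items() for _ in range(n)]
subset = pd.DataFrame(rows)

# Contingency table: combinations that never occur show up as explicit zeros
table = pd.crosstab(subset['workout_type'], subset['activity_day'])
print(table)

# Share of each day within a workout type (each row sums to 1)
row_share = pd.crosstab(subset['workout_type'], subset['activity_day'],
                        normalize='index')
print(row_share.round(2))
```
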

Colors

You can access the seaborn colors with the following code.

sns.color_palette('bright')

There are many options: deep, muted, pastel, bright, dark, and colorblind. Names of matplotlib colormaps, such as coolwarm, are accepted as well.

sns.color_palette('deep')
sns.color_palette('colorblind')
sns.color_palette('coolwarm')
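
Outside a notebook the palettes do not render as color swatches, but they can still be inspected as plain lists of RGB tuples. The following sketch illustrates this under the assumption of a reasonably recent seaborn version; the variable names are arbitrary. It also shows what happens when eleven colors are requested, one per workout type.

```python
import seaborn as sns

bright = sns.color_palette('bright')
print(len(bright))        # number of colors in the named palette
print(bright.as_hex())    # the same colors as hex strings

# Asking a named palette for more colors than it holds cycles back to the start
bright_11 = sns.color_palette('bright', n_colors=11)
print(bright_11[0] == bright_11[10])

# A colormap name is sampled instead, so all eleven colors differ
coolwarm_11 = sns.color_palette('coolwarm', n_colors=11)
print(len(set(coolwarm_11)))
```
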

ax = sns.barplot(data = df1,x = 'workout_type', y = 'count', color = 'blue')
ax.set_xticklabels(ax.get_xticklabels(),rotation = 30)
ax.set_title('One Color Graph')

plt.show()
[Figure: bar chart titled 'One Color Graph', all bars blue]

I was not able to get the following code to work in a Jupyter notebook setting!

sns.set_palette('bright')

Instead I placed the color directly in the graphics command.

ax = sns.barplot(data = df1,
                 x = 'workout_type', 
                 y = 'count',
                 palette=sns.color_palette('coolwarm', n_colors= 6))
ax.set_xticklabels(ax.get_xticklabels(),rotation = 30)
ax.set_title('Warm Colored Graph')

plt.show()
[Figure: bar chart titled 'Warm Colored Graph', bars colored with the coolwarm palette]

Annotate

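You can add context directly to your graphic by using the annotate method. Here xy is the point the arrow points to and xytext is where the label is placed. On a categorical axis the bars sit at x = 0, 1, 2, and so on, so x = 8 is the ninth bar (Treadmill).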

ax = sns.barplot( x= df.groupby('workout_type').workout_type.agg('count').index, y = df.groupby('workout_type').workout_type.agg('count'))
ax.set_xticklabels(ax.get_xticklabels(),rotation = 30)
ax.annotate("Crazy Runners",
            xy = [8,100],
            xytext = [9,110],
            arrowprops = dict(facecolor = 'red', width = 3, headwidth = 12, headlength = 6))
ax.set_title('Annotated Graph')
ax.set_ylabel('Frequency')

plt.show()
[Figure: bar chart titled 'Annotated Graph' with a red arrow labeled 'Crazy Runners']
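
A stripped-down version with plain matplotlib bars makes the coordinates easier to inspect. This is a sketch with the counts copied from earlier output; the label text, the text offset, and the y-axis limit are arbitrary choices made here for illustration.

```python
import matplotlib.pyplot as plt

labels = ['Cricket', 'Freestyle', 'Indoor Cycling', 'Open Water',
          'Outdoor Cycling', 'Outdoor Running', 'Pool Swimming',
          'Trail Run', 'Treadmill', 'Trekking', 'Walking']
counts = [93, 96, 80, 91, 85, 81, 94, 90, 98, 94, 98]

fig, ax = plt.subplots()
bars = ax.bar(labels, counts)

# String labels are placed at x = 0, 1, 2, ...; look up the Treadmill position
target = labels.index('Treadmill')
note = ax.annotate('Tallest bar (tied with Walking)',
                   xy=(target, counts[target]),               # arrow tip, data coordinates
                   xytext=(target - 4, counts[target] + 12),  # where the text starts
                   arrowprops=dict(facecolor='red', width=3,
                                   headwidth=12, headlength=6))

ax.set_ylim(0, 120)                         # leave room for the label
ax.tick_params(axis='x', labelrotation=30)  # rotate the category labels
```

In a notebook the figure appears at the end of the cell; in a script, a call to `plt.show()` displays it.
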

Your Turn

  1. Recreate the first pie chart so that it shows the counts instead of the percentages, and use a color scheme in which no two adjacent slices share a color. Annotate the graph in some way.

  2. Explain what a Mosaic Plot might be able to show you that a pie and bar chart cannot.