Visualizing Association


import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns


df = pd.read_csv('https://raw.githubusercontent.com/nurfnick/Data_Viz/main/Data_Sets/iris.csv')

Scatter Plots

The scatter plot is the most useful starting point for visualizing association. It shows how two variables move together, and with color, marker, and size it can carry a few more.

ax = sns.scatterplot(data = df, x = 'SepalLength', y = 'SepalWidth')
ax.set(title = "Length vs Width",
       xticks = [x for x in range(4,9,1)])
plt.show()

The nice part about seaborn is that I can add other variables quickly. Here the hue argument colors each point by its Class.

sns.scatterplot(data = df, x = 'SepalLength', y = 'SepalWidth', hue = "Class")

I can pick the colors I want too! Here I build a dictionary that maps each class name to a color and pass it as the palette.

colors = ['blue', 'green','orange']
colordict = {}
for i,name in enumerate(df.Class.unique()):
  colordict[name] = colors[i]
sns.scatterplot(data = df, 
                x = 'SepalLength', 
                y = 'SepalWidth', 
                hue = "Class", 
                palette = colordict )
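
The loop that fills the color dictionary can also be written in one line. This is a minimal sketch on a made-up frame (the name toy and its class labels are placeholders, not the iris data); it relies on unique() returning the classes in order of first appearance.

```python
import pandas as pd

# Made-up stand-in for the iris frame; only the Class column matters here.
toy = pd.DataFrame({"Class": ["a", "b", "a", "c", "b"]})

colors = ["blue", "green", "orange"]

# unique() keeps order of first appearance, so zip pairs the first
# class with the first color, the second with the second, and so on.
colordict = dict(zip(toy["Class"].unique(), colors))
print(colordict)
```

The resulting dictionary can be passed as palette exactly like the one built with the loop.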

If you prefer, you can change the marker for each class with the style argument.

sns.scatterplot(data = df, 
                x = 'SepalLength', 
                y = 'SepalWidth',
                hue = 'Class',
                style= 'Class' )

We can vary the size of each point too, which puts a fourth variable on the plot.

ax = sns.scatterplot(data = df, 
                x = 'SepalLength', 
                y = 'SepalWidth',
                hue = 'Class',
                size = 'PedalWidth')

sns.move_legend(ax, "upper right", bbox_to_anchor=(-.2, 1))
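
If the default marker sizes are hard to tell apart, scatterplot also takes a sizes argument giving the smallest and largest marker area. Below is a small sketch on random made-up data; the column names x, y, and w are placeholders of my own.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up data: three numeric columns with placeholder names.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "x": rng.normal(size=30),
    "y": rng.normal(size=30),
    "w": rng.uniform(0.1, 2.5, size=30),
})

# sizes=(smallest, largest) sets the range of marker areas
# that the values of the size variable are mapped onto.
ax = sns.scatterplot(data=toy, x="x", y="y", size="w", sizes=(20, 200))
```

A wider range makes differences in the size variable easier to see, at the cost of more overlap between points.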

Adding the line of best fit (the regression line) is easy with regplot.

sns.regplot(data = df, 
            x = 'SepalLength', 
            y = 'SepalWidth',
            ci = None, #ci = None removes the confidence interval
            order = 1) #order = 1 fits a straight line

To fit a separate line for each class, use lmplot with hue.

sns.lmplot(data = df, 
                x = 'SepalLength', 
                y = 'SepalWidth',
                hue = 'Class',
                ci = None )
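
To get the numbers behind a straight line of best fit, a least-squares fit can be computed directly with NumPy. This is only a sketch on five made-up points, and it is not necessarily the exact routine seaborn runs internally.

```python
import numpy as np

# Made-up points that lie close to the line y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.0])

# A degree-1 polyfit returns the least-squares slope and intercept.
slope, intercept = np.polyfit(x, y, deg=1)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```

On the iris data the same call would take df['SepalLength'] and df['SepalWidth'] as x and y.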

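Often it is nice to look at all of the pairwise associations in your data at once. A PairGrid draws a scatter plot for every pair of numeric columns and, here, a histogram of each column on the diagonal.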

g = sns.PairGrid(df, hue="Class")
g.map_diag(sns.histplot)
g.map_offdiag(sns.scatterplot)
g.add_legend()
plt.show()

Heat Map

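A heat map shows the correlation between every pair of variables at a glance. heatmap does not compute correlations itself, so you need to pass it the correlation matrix. The Class column is text, so recent versions of pandas need numeric_only = True to leave it out.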

sns.heatmap(df.corr(numeric_only = True), annot=True, linewidths=0.5, vmin = -1)
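
It helps to look at the correlation matrix itself before coloring it. Here is a tiny made-up frame (all names and values are placeholders) where the answer is known in advance; the numeric_only argument assumes pandas 1.5 or later.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],   # rises exactly with a
    "c": [4.0, 3.0, 2.0, 1.0],   # falls exactly as a rises
    "Class": ["u", "u", "v", "v"],
})

# numeric_only=True skips the text column instead of raising an error.
corr = toy.corr(numeric_only=True)
print(corr)
```

The result is a square, symmetric table with ones on the diagonal, which is what heatmap colors cell by cell.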

Your Turn

Using the workout dataset, create a scatter plot that shows as many variables as possible. Can you get five or six variables represented in one graphic?