{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "Untitled88.ipynb",
"provenance": [],
"authorship_tag": "ABX9TyP7bmNqzjl3GY+efVjtB+Rd",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"source": [
"# Missing and Incomplete"
],
"metadata": {
"id": "EwT-jhvBfAD2"
}
},
{
"cell_type": "markdown",
"source": [
"Datasets are often missing entries. There are many approaches we can take to dealing with these errors and omissions. I will examine a dataset on the characters from The Lord of the Rings."
],
"metadata": {
"id": "m9Cg2HjCD09o"
}
},
{
"cell_type": "markdown",
"source": [
"## Finding NaNs"
],
"metadata": {
"id": "NiryQXsjFV_c"
}
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 206
},
"id": "f3tqDlgJe_cI",
"outputId": "21ebb4f6-69f4-4906-fc35-308166c7444b"
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" birth death gender hair height \\\n",
"0 NaN NaN Female NaN NaN \n",
"1 TA 2978 February 26 ,3019 Male Dark (book) Light brown (movie) NaN \n",
"2 NaN March ,3019 Male NaN NaN \n",
"3 TA 280 TA 515 Male NaN NaN \n",
"4 NaN NaN Male NaN NaN \n",
"\n",
" name race realm spouse \n",
"0 Adanel Men NaN Belemir \n",
"1 Boromir Men NaN NaN \n",
"2 Lagduf Orcs NaN NaN \n",
"3 Tarcil Men Arnor Unnamed wife \n",
"4 Fire-drake of Gondolin Dragon NaN NaN "
]
},
"metadata": {},
"execution_count": 1
}
],
"source": [
"import pandas as pa\n",
"\n",
"df = pa.read_csv('https://raw.githubusercontent.com/nurfnick/Data_Viz/main/Data_Sets/lotr_characters.csv')\n",
"\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"source": [
"We see right away that there are lots of `NaN`s. Each `NaN` marks an empty field in our dataset. Some characters are mentioned but never given much more background than a name."
],
"metadata": {
"id": "wZ8SwMIcGMmP"
}
},
{
"cell_type": "code",
"source": [
"df.isnull().sum(axis = 0)"
],
"metadata": {
"id": "2aPFYQBtHjXr",
"outputId": "d4642faf-29b9-4d79-9308-7e5a456d1c3f",
"colab": {
"base_uri": "https://localhost:8080/"
}
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"birth 207\n",
"death 315\n",
"gender 143\n",
"hair 734\n",
"height 813\n",
"name 0\n",
"race 140\n",
"realm 714\n",
"spouse 403\n",
"dtype: int64"
]
},
"metadata": {},
"execution_count": 2
}
]
},
{
"cell_type": "markdown",
"source": [
"There are null values in every column except *name*."
],
"metadata": {
"id": "NE2EnNXSHnQ6"
}
},
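{
"cell_type": "markdown",
"source": [
"Raw counts are easier to compare as proportions. The following is a minimal sketch on a small made-up frame (the `toy` data is an assumption for illustration, not the LOTR file): taking the mean of the boolean mask gives the fraction of missing entries in each column.\n",
"\n",
"```\n",
"import numpy as np\n",
"import pandas as pa\n",
"\n",
"# hypothetical toy data, not the LOTR dataset\n",
"toy = pa.DataFrame({'name': ['A', 'B', 'C', 'D'],\n",
"                    'race': ['Men', np.nan, 'Elves', np.nan],\n",
"                    'height': [np.nan, np.nan, 'Tall', np.nan]})\n",
"\n",
"# True counts as 1, so the column mean is the share of missing values\n",
"toy.isnull().mean().sort_values(ascending = False)\n",
"```\n",
"\n",
"Run on `df`, the same expression should rank *height*, *hair* and *realm* as the emptiest columns, in line with the counts above."
],
"metadata": {}
},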
{
"cell_type": "code",
"source": [
"df.isnull().sum(axis = 1).value_counts().sort_index()"
],
"metadata": {
"id": "mE0t0jowFZAf",
"outputId": "fdd7f69b-768f-4488-81a1-61fabe1dec15",
"colab": {
"base_uri": "https://localhost:8080/"
}
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"0 15\n",
"1 59\n",
"2 185\n",
"3 236\n",
"4 178\n",
"5 81\n",
"6 20\n",
"7 1\n",
"8 136\n",
"dtype: int64"
]
},
"metadata": {},
"execution_count": 3
}
]
},
{
"cell_type": "markdown",
"source": [
"Here we see that only 15 entries have every field filled in, while 136 have eight nulls, meaning nothing but a name (since name was never blank!). Let's look at just the 15 complete characters."
],
"metadata": {
"id": "PashKJCsLd1S"
}
},
{
"cell_type": "code",
"source": [
"df[~df.isnull().any(axis = 1)]"
],
"metadata": {
"id": "Jb3H2Dj1LU-n",
"outputId": "74fb7437-fc1e-41ec-ffca-131373930316",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 520
}
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" birth \\\n",
"125 SA 3209 \n",
"134 YT, and perhaps firstborn \n",
"166 YT \n",
"186 TA 2990 \n",
"194 FA 532 \n",
"204 SA 3119 \n",
"530 YT \n",
"551 Possibly pre First Age \n",
"579 3019 \n",
"620 TA 2925 \n",
"686 YT 1362 \n",
"692 YT 1169 \n",
"795 First Age \n",
"802 YT 1050 \n",
"873 March 1 ,2931 \n",
"\n",
" death gender \\\n",
"125 TA 2 Male \n",
"134 Still Alive Male \n",
"166 FA 400 Male \n",
"186 FO 63 Male \n",
"194 Still alive; departed to ,Aman, on ,September ... Male \n",
"204 SA 3441 Male \n",
"530 Still alive, departed over the sea in the earl... Male \n",
"551 Unknown; possibly still alive Most likely male \n",
"579 February 293019 Male \n",
"620 TA 3007 Male \n",
"686 Still alive: Departed over the sea on ,Septemb... Female \n",
"692 YT 1497 Male \n",
"795 Presumably departed to ,Aman Male \n",
"802 FA 502 Male \n",
"873 FO 120 Male \n",
"\n",
" hair height name \\\n",
"125 Black Very tall almost 7'1 Isildur \n",
"134 Probably Golden Tall Ingwë \n",
"166 Dark Tall Eöl \n",
"186 Dirty blond Tall-6'6 omer \n",
"194 Dark Tall Elrond \n",
"204 Brown 7' 10\" Elendil \n",
"530 Silver Tall Celeborn \n",
"551 None Huge Watcher in the Water \n",
"579 Dark (movie) 6' 6\" (movie) Uglúk \n",
"620 Brown (film) 1.76m / 5'9\" (film) Bain \n",
"686 Golden Tall Galadriel \n",
"692 Raven Tall Fëanor \n",
"795 Golden Tall Thranduil \n",
"802 Silver Tallest of the Elven-folk, 8'2\" Thingol \n",
"873 Dark 198cm (6'6\") Aragorn II Elessar \n",
"\n",
" race realm \\\n",
"125 Men Arnor,Gondor \n",
"134 Elves Valinor,Taniquetil \n",
"166 Elves Nan Elmoth \n",
"186 Men Rohan \n",
"194 Half-elven Rivendell \n",
"204 Men Arnor,Gondor \n",
"530 Elves Eregion,Lothlórien,Caras Galadhon \n",
"551 Urulóki Doors of Durin \n",
"579 Uruk-hai Isengard \n",
"620 Men Dale \n",
"686 Elves Eregion,Lothlórien,Caras Galadhon \n",
"692 Elves Tirion,Formenos \n",
"795 Elves Woodland Realm,Mirkwood \n",
"802 Elves Doriath \n",
"873 Men Reunited Kingdom,Arnor,Gondor \n",
"\n",
" spouse \n",
"125 Unnamed wife \n",
"134 Unnamed wife \n",
"166 Aredhel \n",
"186 Lothíriel after the War of the Ring \n",
"194 Celebrían \n",
"204 Unnamed wife \n",
"530 Galadriel \n",
"551 Most likely none \n",
"579 None \n",
"620 Unnamed wife \n",
"686 Celeborn \n",
"692 Nerdanel \n",
"795 Unnamed wife \n",
"802 Melian \n",
"873 Arwen "
]
},
"metadata": {},
"execution_count": 4
}
]
},
{
"cell_type": "markdown",
"source": [
"Of course, we could instead ask for just the characters with 8 null values."
],
"metadata": {
"id": "qjvFyJ_37eHd"
}
},
{
"cell_type": "code",
"source": [
"df[df.isnull().sum(axis = 1) == 8].name"
],
"metadata": {
"id": "6NKK_H-XL-ax",
"outputId": "8f15c14d-82cc-40f4-9dce-676dcfd5c193",
"colab": {
"base_uri": "https://localhost:8080/"
}
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"8 Angrim\n",
"14 Angelimar\n",
"17 Linda (Baggins) Proudfoot\n",
"18 Bodo Proudfoot\n",
"40 Tanta (Hornblower) Baggins\n",
" ... \n",
"886 Andvír\n",
"891 Amlach\n",
"904 Aghan\n",
"905 Agathor\n",
"907 Aerandir\n",
"Name: name, Length: 136, dtype: object"
]
},
"metadata": {},
"execution_count": 5
}
]
},
{
"cell_type": "markdown",
"source": [
"I only included the names since every other field is null for these characters!"
],
"metadata": {
"id": "oY9AMkQZ7-lL"
}
},
{
"cell_type": "markdown",
"source": [
"We can use the same method to keep only the entries that have 4 or fewer null values."
],
"metadata": {
"id": "2Af68l5o8Tks"
}
},
{
"cell_type": "code",
"source": [
"df[df.isnull().sum(axis = 1) <= 4]"
],
"metadata": {
"id": "ylz1qtyz74K-",
"outputId": "465b1809-5141-4665-c30c-729536e85b30",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 424
}
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" birth death gender \\\n",
"1 TA 2978 February 26 ,3019 Male \n",
"3 TA 280 TA 515 Male \n",
"5 SA 2709 SA 2962 Male \n",
"7 YT FA 455 Male \n",
"9 SA 3219 SA 3440 Male \n",
".. ... ... ... \n",
"903 TA 2827 TA 2932 Male \n",
"906 Mid ,First Age FA 495 Female \n",
"908 YT during the ,Noontide of Valinor FA 455 Male \n",
"909 TA 2917 TA 3010 Male \n",
"910 Before ,TA 1944 Late ,Third Age Male \n",
"\n",
" hair height name race realm \\\n",
"1 Dark (book) Light brown (movie) NaN Boromir Men NaN \n",
"3 NaN NaN Tarcil Men Arnor \n",
"5 NaN NaN Ar-Adûnakhôr Men Númenor \n",
"7 Golden NaN Angrod Elves NaN \n",
"9 NaN NaN Anárion Men Gondor \n",
".. ... ... ... ... ... \n",
"903 NaN NaN Aglahad Men NaN \n",
"906 NaN NaN Aerin Men NaN \n",
"908 Golden NaN Aegnor Elves NaN \n",
"909 NaN NaN Adrahil II Men NaN \n",
"910 NaN NaN Adrahil I Men NaN \n",
"\n",
" spouse \n",
"1 NaN \n",
"3 Unnamed wife \n",
"5 Unnamed wife \n",
"7 Eldalótë \n",
"9 Unnamed wife \n",
".. ... \n",
"903 Unnamed wife \n",
"906 Brodda \n",
"908 Loved ,Andreth but remained unmarried \n",
"909 Unnamed wife \n",
"910 NaN \n",
"\n",
"[673 rows x 9 columns]"
]
},
"metadata": {},
"execution_count": 6
}
]
},
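{
"cell_type": "markdown",
"source": [
"pandas also has a dedicated method for this kind of filtering. As a sketch on made-up data (the `toy` frame is hypothetical), `dropna()` keeps only complete rows, like the `any(axis = 1)` mask used earlier, and its `thresh` argument sets the minimum number of non-null values a row needs in order to be kept.\n",
"\n",
"```\n",
"import numpy as np\n",
"import pandas as pa\n",
"\n",
"# hypothetical toy data with 3 columns\n",
"toy = pa.DataFrame({'name': ['A', 'B', 'C'],\n",
"                    'race': ['Men', np.nan, np.nan],\n",
"                    'height': ['Tall', 'Short', np.nan]})\n",
"\n",
"complete = toy.dropna()          # rows with no missing values at all\n",
"mostly = toy.dropna(thresh = 2)  # rows with at least 2 non-null values\n",
"\n",
"# the same selections written as boolean masks\n",
"assert complete.equals(toy[~toy.isnull().any(axis = 1)])\n",
"assert mostly.equals(toy[toy.isnull().sum(axis = 1) <= 1])\n",
"```\n",
"\n",
"Since `df` has nine columns, `df.dropna(thresh = 5)` should select the same rows as the `<= 4` mask above."
],
"metadata": {}
},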
{
"cell_type": "markdown",
"source": [
"Maybe we only want the characters whose *realm* has been included. We'll negate the `isnull()` mask with `~`."
],
"metadata": {
"id": "xBaiiZaE8mdd"
}
},
{
"cell_type": "code",
"source": [
"df[~df.realm.isnull()]"
],
"metadata": {
"id": "W1YPuhDw8geJ",
"outputId": "fa5099ef-2251-42e3-c4af-694367184908",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 424
}
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" birth death gender \\\n",
"3 TA 280 TA 515 Male \n",
"5 SA 2709 SA 2962 Male \n",
"9 SA 3219 SA 3440 Male \n",
"10 SA 3118 Still alive Male \n",
"11 SA 2876 SA 3102 Male \n",
".. ... ... ... \n",
"890 TA 726 TA 946 Male \n",
"892 Sometime during ,Years of the Trees, or the ,F... SA 3434 Male \n",
"898 NaN NaN Female \n",
"900 TA 2544 TA 2645 Male \n",
"901 TA 1330 TA 1540 Male \n",
"\n",
" hair height name race realm spouse \n",
"3 NaN NaN Tarcil Men Arnor Unnamed wife \n",
"5 NaN NaN Ar-Adûnakhôr Men Númenor Unnamed wife \n",
"9 NaN NaN Anárion Men Gondor Unnamed wife \n",
"10 NaN Tall Ar-Pharazôn Men Númenor Tar-Míriel \n",
"11 NaN NaN Ar-Sakalthôr Men Númenor Unnamed wife \n",
".. ... ... ... ... ... ... \n",
"890 NaN NaN Amlaith Men Arthedain Unnamed wife \n",
"892 NaN NaN Amdír Elves Lórien Unnamed wife \n",
"898 NaN NaN Almarian Men Númenor Tar-Meneldur \n",
"900 NaN NaN Aldor Men Rohan Unnamed wife \n",
"901 NaN NaN Aldamir Men Gondor Unnamed wife \n",
"\n",
"[197 rows x 9 columns]"
]
},
"metadata": {},
"execution_count": 7
}
]
},
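{
"cell_type": "markdown",
"source": [
"Two other spellings of the same filter may read more naturally. This is a sketch on hypothetical data (the `toy` frame is made up): `notnull()` avoids the negation, and `dropna` with `subset` only looks at the listed columns.\n",
"\n",
"```\n",
"import numpy as np\n",
"import pandas as pa\n",
"\n",
"# hypothetical toy data, not the LOTR dataset\n",
"toy = pa.DataFrame({'name': ['A', 'B', 'C'],\n",
"                    'realm': ['Gondor', np.nan, 'Rohan'],\n",
"                    'hair': [np.nan, np.nan, 'Golden']})\n",
"\n",
"by_mask = toy[toy.realm.notnull()]\n",
"by_drop = toy.dropna(subset = ['realm'])\n",
"\n",
"# both keep rows A and C; the missing hair in row A is ignored\n",
"assert by_mask.equals(by_drop)\n",
"```"
],
"metadata": {}
},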
{
"cell_type": "markdown",
"source": [
"## Imputing"
],
"metadata": {
"id": "M_iH2nsiFTFf"
}
},
{
"cell_type": "markdown",
"source": [
"The simplest method for filling in `NaN`s is to just place a value there."
],
"metadata": {
"id": "WLCSQdziFagQ"
}
},
{
"cell_type": "code",
"source": [
"df.fillna(value = 0)"
],
"metadata": {
"id": "7DEhGXNAFaJo",
"outputId": "e254be9a-e1a1-44f7-ec69-30392e302b57",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 424
}
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" birth death gender \\\n",
"0 0 0 Female \n",
"1 TA 2978 February 26 ,3019 Male \n",
"2 0 March ,3019 Male \n",
"3 TA 280 TA 515 Male \n",
"4 0 0 Male \n",
".. ... ... ... \n",
"906 Mid ,First Age FA 495 Female \n",
"907 0 0 0 \n",
"908 YT during the ,Noontide of Valinor FA 455 Male \n",
"909 TA 2917 TA 3010 Male \n",
"910 Before ,TA 1944 Late ,Third Age Male \n",
"\n",
" hair height name race \\\n",
"0 0 0 Adanel Men \n",
"1 Dark (book) Light brown (movie) 0 Boromir Men \n",
"2 0 0 Lagduf Orcs \n",
"3 0 0 Tarcil Men \n",
"4 0 0 Fire-drake of Gondolin Dragon \n",
".. ... ... ... ... \n",
"906 0 0 Aerin Men \n",
"907 0 0 Aerandir 0 \n",
"908 Golden 0 Aegnor Elves \n",
"909 0 0 Adrahil II Men \n",
"910 0 0 Adrahil I Men \n",
"\n",
" realm spouse \n",
"0 0 Belemir \n",
"1 0 0 \n",
"2 0 0 \n",
"3 Arnor Unnamed wife \n",
"4 0 0 \n",
".. ... ... \n",
"906 0 Brodda \n",
"907 0 0 \n",
"908 0 Loved ,Andreth but remained unmarried \n",
"909 0 Unnamed wife \n",
"910 0 0 \n",
"\n",
"[911 rows x 9 columns]"
]
},
"metadata": {},
"execution_count": 8
}
]
},
{
"cell_type": "markdown",
"source": [
"You should note right away that some of these zeros make no sense: a zero is not a meaningful gender, realm or spouse. You might be more careful and fill only a single column."
],
"metadata": {
"id": "gf_4T7cPFxzp"
}
},
{
"cell_type": "code",
"source": [
"df.height.fillna(value = 0)"
],
"metadata": {
"id": "sRcAE_om8xIN",
"outputId": "aed039ef-e4c3-4a68-bc96-989a202f104c",
"colab": {
"base_uri": "https://localhost:8080/"
}
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"0 0\n",
"1 0\n",
"2 0\n",
"3 0\n",
"4 0\n",
" ..\n",
"906 0\n",
"907 0\n",
"908 0\n",
"909 0\n",
"910 0\n",
"Name: height, Length: 911, dtype: object"
]
},
"metadata": {},
"execution_count": 9
}
]
},
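{
"cell_type": "markdown",
"source": [
"`fillna` also accepts a dictionary mapping column names to fill values, so each column can get a placeholder that suits it. A minimal sketch, assuming made-up data and placeholder strings chosen only for illustration:\n",
"\n",
"```\n",
"import numpy as np\n",
"import pandas as pa\n",
"\n",
"# hypothetical toy data, not the LOTR dataset\n",
"toy = pa.DataFrame({'name': ['A', 'B'],\n",
"                    'spouse': [np.nan, 'C'],\n",
"                    'height': ['Tall', np.nan]})\n",
"\n",
"# columns not listed in the dictionary are left untouched\n",
"filled = toy.fillna(value = {'spouse': 'None recorded', 'height': 'Unknown'})\n",
"filled\n",
"```"
],
"metadata": {}
},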
{
"cell_type": "markdown",
"source": [
"Or you might not want to skew the average so much. You could assign the mean if the remaining values were numerical. Unfortunately, these are mostly strings with little hope of converting to a numerical value."
],
"metadata": {
"id": "b_uJNOzYGFu3"
}
},
{
"cell_type": "code",
"source": [
"df.height[~df.height.isnull()]"
],
"metadata": {
"id": "dTqTaV18GBQv",
"outputId": "73636ed3-fba5-4cc7-933e-dd96bf693967",
"colab": {
"base_uri": "https://localhost:8080/"
}
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"10 Tall\n",
"19 Tall\n",
"20 Tallest of the Elves of Gondolin\n",
"41 Tall\n",
"74 Large and immense\n",
" ... \n",
"831 8'5\n",
"850 Tall\n",
"853 Tall\n",
"873 198cm (6'6\")\n",
"881 As tall as a mountain\n",
"Name: height, Length: 98, dtype: object"
]
},
"metadata": {},
"execution_count": 10
}
]
},
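{
"cell_type": "markdown",
"source": [
"For comparison, here is what mean imputation looks like when a column really is numerical. The `heights_cm` values are invented for illustration and do not come from the dataset.\n",
"\n",
"```\n",
"import numpy as np\n",
"import pandas as pa\n",
"\n",
"# hypothetical numerical heights with two gaps\n",
"heights_cm = pa.Series([180.0, np.nan, 170.0, np.nan])\n",
"\n",
"# mean() skips NaN by default, so the gaps are filled with (180 + 170) / 2\n",
"heights_cm.fillna(heights_cm.mean())\n",
"```"
],
"metadata": {}
},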
{
"cell_type": "markdown",
"source": [
"We can also fill the empty entries by borrowing the values next to them."
],
"metadata": {
"id": "GjxffSLIG_k_"
}
},
{
"cell_type": "code",
"source": [
"# forward fill; equivalent to fillna(method = 'pad'), which recent pandas versions deprecate\n",
"df.height.ffill()"
],
"metadata": {
"id": "xTD8Po7FGPox",
"outputId": "078b58e1-5820-4007-c7b9-14b0f3dfb2a7",
"colab": {
"base_uri": "https://localhost:8080/"
}
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"0 NaN\n",
"1 NaN\n",
"2 NaN\n",
"3 NaN\n",
"4 NaN\n",
" ... \n",
"906 As tall as a mountain\n",
"907 As tall as a mountain\n",
"908 As tall as a mountain\n",
"909 As tall as a mountain\n",
"910 As tall as a mountain\n",
"Name: height, Length: 911, dtype: object"
]
},
"metadata": {},
"execution_count": 11
}
]
},
{
"cell_type": "markdown",
"source": [
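"`pad` (also called forward fill, or `ffill`) took the last value seen and carried it forward. The first rows stay `NaN` because nothing comes before them. We can also go the other way with `bfill`."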
],
"metadata": {
"id": "bF16QtkSHXp_"
}
},
{
"cell_type": "code",
"source": [
"# backward fill; equivalent to fillna(method = 'bfill'), which recent pandas versions deprecate\n",
"df.height.bfill()"
],
"metadata": {
"id": "j6-5ipcsHREF",
"outputId": "6357b1d2-aa5d-4a9d-8e81-ba13c088ad92",
"colab": {
"base_uri": "https://localhost:8080/"
}
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"0 Tall\n",
"1 Tall\n",
"2 Tall\n",
"3 Tall\n",
"4 Tall\n",
" ... \n",
"906 NaN\n",
"907 NaN\n",
"908 NaN\n",
"909 NaN\n",
"910 NaN\n",
"Name: height, Length: 911, dtype: object"
]
},
"metadata": {},
"execution_count": 12
}
]
},
{
"cell_type": "markdown",
"source": [
"Filling in by mode is a little tricky, as `mode()` returns a Series rather than a single value (several values can tie for most frequent). The code below fills every missing *height* with the first mode."
],
"metadata": {
"id": "kDz4EMqkXmYB"
}
},
{
"cell_type": "code",
"source": [
"# mode() returns a Series (there may be ties), so take its first entry\n",
"df.height.fillna(value = df.height.mode()[0])"
],
"metadata": {
"id": "3ZKzof3p8pVz",
"outputId": "bd04ca1f-51db-42b4-ceba-0128d279ff8a",
"colab": {
"base_uri": "https://localhost:8080/"
}
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"0 Tall\n",
"1 Tall\n",
"2 Tall\n",
"3 Tall\n",
"4 Tall\n",
" ... \n",
"906 Tall\n",
"907 Tall\n",
"908 Tall\n",
"909 Tall\n",
"910 Tall\n",
"Name: height, Length: 911, dtype: object"
]
},
"metadata": {},
"execution_count": 13
}
]
},
{
"cell_type": "markdown",
"source": [
"## Imputing by Category"
],
"metadata": {
"id": "8XTgp5G-OiGt"
}
},
{
"cell_type": "markdown",
"source": [
"There is no quantitative data here, so I actually have to work a little harder than I'd like. If height were just a number, you'd run some code like\n",
"\n",
"```\n",
"df.height.fillna(df.groupby('realm').height.transform('mean'))\n",
"```\n",
"\n",
"to fill the NaNs with the mean of their group. To deal with categories, I first need the most frequent value within each group."
],
"metadata": {
"id": "5hm1LJhdOmoe"
}
},
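{
"cell_type": "markdown",
"source": [
"To make the numerical case concrete, here is the same group-mean pattern on a small invented frame. The `toy` values are assumptions for illustration; only the `groupby(...).transform('mean')` pattern carries over.\n",
"\n",
"```\n",
"import numpy as np\n",
"import pandas as pa\n",
"\n",
"# hypothetical numerical heights, two realms with one gap each\n",
"toy = pa.DataFrame({'realm': ['Gondor', 'Gondor', 'Rohan', 'Rohan'],\n",
"                    'height': [180.0, np.nan, 170.0, np.nan]})\n",
"\n",
"# transform returns one value per row: the mean of that row's realm\n",
"group_mean = toy.groupby('realm').height.transform('mean')\n",
"\n",
"# each gap is filled from its own group rather than from the overall mean\n",
"toy.height.fillna(group_mean)\n",
"```\n",
"\n",
"Because `transform` keeps the original index, the group means line up row by row with the column being filled."
],
"metadata": {}
},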
{
"cell_type": "code",
"source": [
"df.groupby(['race']).height.agg(pa.Series.mode)"
],
"metadata": {
"id": "ZSl8a5zrQ8v2",
"outputId": "47957b08-d248-4298-8c8d-fa0b87d352a8",
"colab": {
"base_uri": "https://localhost:8080/"
}
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"race\n",
"Ainur Varies\n",
"Ainur,Maiar []\n",
"Balrog []\n",
"Black Uruk 7'1\n",
"Dragon []\n",
"Dragons [As tall as a mountain, Gigantic]\n",
"Drúedain Short\n",
"Dwarf []\n",
"Dwarven []\n",
"Dwarves [4'5 - 5' (Estimate) , 4'5\" (film)]\n",
"Eagle []\n",
"Eagles []\n",
"Elf []\n",
"Elves Tall\n",
"Elves,Maiar []\n",
"Elves,Noldor []\n",
"Ents Very tall\n",
"Ents,Onodrim 15'4\n",
"Goblin,Orc 8,4 Body weight = 190kg\n",
"God Varies\n",
"Great Eagles 30\n",
"Great Spiders [Enormous, Large and immense]\n",
"Half-elven Tall\n",
"Half-elven,Men []\n",
"Hobbit [1.06m (3'6\"), 1.17m (3'10\"), 1.2m (3'11\"), 1....\n",
"Hobbits 1.22m (4'0\")\n",
"Horse []\n",
"Maiar Various until \n",
"Maiar,Balrog Slightly larger and taller than a Man (book), ...\n",
"Maiar,Balrogs []\n",
"Men Tall\n",
"Men,Rohirrim []\n",
"Men,Skin-changer Tall (in Man-form)\n",
"Men,Undead Tall\n",
"Men,Wraith 7' 1\" (2.13 metres)\n",
"Orc 5'9\" - 6'4\" (film)\n",
"Orc,Goblin []\n",
"Orcs [8'5, About nine feet (film)]\n",
"Raven []\n",
"Skin-changer Tall\n",
"Stone-trolls About 13'\n",
"Uruk-hai [6' 6\" (movie), 6'1 (film)]\n",
"Uruk-hai,Orc medium\n",
"Urulóki Huge\n",
"Vampire []\n",
"Werewolves Gigantic\n",
"Wolfhound Horse-sized\n",
"Name: height, dtype: object"
]
},
"metadata": {},
"execution_count": 14
}
]
},
{
"cell_type": "markdown",
"source": [
"This shows me that for many races the most common height is empty (`[]`), because no character of that race has a recorded height. Dropping the rows that are missing race or height tidies up the table before we impute."
],
"metadata": {
"id": "vV66YbiaQq4k"
}
},
{
"cell_type": "code",
"source": [
"dfrh = df[(~df.race.isna())&(~df.height.isna())]\n",
"\n",
"dfrh.groupby(['race']).height.agg(pa.Series.mode)"
],
"metadata": {
"id": "7-LcET92Q2Od",
"outputId": "54f3fe22-01c0-435f-e4ca-890f57b308ce",
"colab": {
"base_uri": "https://localhost:8080/"
}
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"race\n",
"Ainur Varies\n",
"Black Uruk 7'1\n",
"Dragons [As tall as a mountain, Gigantic]\n",
"Drúedain Short\n",
"Dwarves [4'5 - 5' (Estimate) , 4'5\" (film)]\n",
"Elves Tall\n",
"Ents Very tall\n",
"Ents,Onodrim 15'4\n",
"Goblin,Orc 8,4 Body weight = 190kg\n",
"God Varies\n",
"Great Eagles 30\n",
"Great Spiders [Enormous, Large and immense]\n",
"Half-elven Tall\n",
"Hobbit [1.06m (3'6\"), 1.17m (3'10\"), 1.2m (3'11\"), 1....\n",
"Hobbits 1.22m (4'0\")\n",
"Maiar Various until \n",
"Maiar,Balrog Slightly larger and taller than a Man (book), ...\n",
"Men Tall\n",
"Men,Skin-changer Tall (in Man-form)\n",
"Men,Undead Tall\n",
"Men,Wraith 7' 1\" (2.13 metres)\n",
"Orc 5'9\" - 6'4\" (film)\n",
"Orcs [8'5, About nine feet (film)]\n",
"Skin-changer Tall\n",
"Stone-trolls About 13'\n",
"Uruk-hai [6' 6\" (movie), 6'1 (film)]\n",
"Uruk-hai,Orc medium\n",
"Urulóki Huge\n",
"Werewolves Gigantic\n",
"Wolfhound Horse-sized\n",
"Name: height, dtype: object"
]
},
"metadata": {},
"execution_count": 15
}
]
},
{
"cell_type": "markdown",
"source": [
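"The next line of code does not work as intended: it only changes the first entry of each category to the mode. The likely culprit is that `s.mode()` returns a Series with its own index rather than a single value."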
],
"metadata": {
"id": "H0gQo-lpX7Ex"
}
},
{
"cell_type": "code",
"source": [
"df.height.fillna(df.groupby('race').height.transform(lambda s: s.mode()))"
],
"metadata": {
"id": "K6iFGPYhHkbg",
"outputId": "4d8af44d-4c63-4846-fb22-7b52118faa24",
"colab": {
"base_uri": "https://localhost:8080/"
}
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"0 Tall\n",
"1 NaN\n",
"2 NaN\n",
"3 NaN\n",
"4 NaN\n",
" ... \n",
"906 NaN\n",
"907 NaN\n",
"908 NaN\n",
"909 NaN\n",
"910 NaN\n",
"Name: height, Length: 911, dtype: object"
]
},
"metadata": {},
"execution_count": 16
}
]
},
{
"cell_type": "markdown",
"source": [
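"I believe this line of code makes the same mistake (`fillna` matches the Series returned by `x.mode()` on index labels, so only matching labels are filled), but I leave it as another way to do the transformation that might be useful at some point. Note also that the result has only 771 rows, since `groupby` drops the 140 characters whose race is missing."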
],
"metadata": {
"id": "6YmCYojzYJgs"
}
},
{
"cell_type": "code",
"source": [
"df.groupby('race', sort=False).height.apply(lambda x: x.fillna(value = x.mode()))"
],
"metadata": {
"id": "iXVIVW-RRiKU",
"outputId": "141d5df7-5a0e-4e6e-81d1-512b0d0407b1",
"colab": {
"base_uri": "https://localhost:8080/"
}
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"0 Tall\n",
"1 NaN\n",
"2 NaN\n",
"3 NaN\n",
"4 NaN\n",
" ... \n",
"903 NaN\n",
"906 NaN\n",
"908 NaN\n",
"909 NaN\n",
"910 NaN\n",
"Name: height, Length: 771, dtype: object"
]
},
"metadata": {},
"execution_count": 17
}
]
},
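{
"cell_type": "markdown",
"source": [
"The partial filling seen in these attempts can be reproduced in isolation. In this sketch (toy values, chosen for illustration) a Series passed to `fillna` is matched on index labels, while a single value is applied to every missing entry.\n",
"\n",
"```\n",
"import numpy as np\n",
"import pandas as pa\n",
"\n",
"# hypothetical column with gaps at labels 0 and 2\n",
"s = pa.Series([np.nan, 'Tall', np.nan, 'Tall'])\n",
"\n",
"# s.mode() is a one-element Series whose only index label is 0\n",
"by_series = s.fillna(s.mode())     # only label 0 lines up, so only it is filled\n",
"by_scalar = s.fillna(s.mode()[0])  # a single value fills every missing entry\n",
"\n",
"assert by_series.isnull().sum() == 1\n",
"assert by_scalar.isnull().sum() == 0\n",
"```\n",
"\n",
"This suggests reducing the mode to a single value before handing it to `fillna` or `transform`."
],
"metadata": {}
},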
{
"cell_type": "markdown",
"source": [
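"Below I am finally able to do the conversion. The key is `next(iter(x.mode()), np.nan)`: it pulls out the first mode of each race as a single value (or `NaN` when no height is recorded for that race), and `transform` then repeats that single value for every row in the group, which the attempts with the entire mode Series could not do."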
],
"metadata": {
"id": "uezP70HN7dUq"
}
},
{
"cell_type": "code",
"source": [
"import numpy as np\n",
"\n",
"df.height = df.height.fillna(df.groupby('race').height.transform(lambda x: next(iter(x.mode()), np.nan)))\n",
"\n",
"df"
],
"metadata": {
"id": "aCQPY7r5Izer",
"outputId": "4d8d576b-7d97-4876-97cf-160762e93e84",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 424
}
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" birth death gender \\\n",
"0 NaN NaN Female \n",
"1 TA 2978 February 26 ,3019 Male \n",
"2 NaN March ,3019 Male \n",
"3 TA 280 TA 515 Male \n",
"4 NaN NaN Male \n",
".. ... ... ... \n",
"906 Mid ,First Age FA 495 Female \n",
"907 NaN NaN NaN \n",
"908 YT during the ,Noontide of Valinor FA 455 Male \n",
"909 TA 2917 TA 3010 Male \n",
"910 Before ,TA 1944 Late ,Third Age Male \n",
"\n",
" hair height name race \\\n",
"0 NaN Tall Adanel Men \n",
"1 Dark (book) Light brown (movie) Tall Boromir Men \n",
"2 NaN 8'5 Lagduf Orcs \n",
"3 NaN Tall Tarcil Men \n",
"4 NaN NaN Fire-drake of Gondolin Dragon \n",
".. ... ... ... ... \n",
"906 NaN Tall Aerin Men \n",
"907 NaN NaN Aerandir NaN \n",
"908 Golden Tall Aegnor Elves \n",
"909 NaN Tall Adrahil II Men \n",
"910 NaN Tall Adrahil I Men \n",
"\n",
" realm spouse \n",
"0 NaN Belemir \n",
"1 NaN NaN \n",
"2 NaN NaN \n",
"3 Arnor Unnamed wife \n",
"4 NaN NaN \n",
".. ... ... \n",
"906 NaN Brodda \n",
"907 NaN NaN \n",
"908 NaN Loved ,Andreth but remained unmarried \n",
"909 NaN Unnamed wife \n",
"910 NaN NaN \n",
"\n",
"[911 rows x 9 columns]"
]
},
"metadata": {},
"execution_count": 18
}
]
},
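{
"cell_type": "markdown",
"source": [
"To see why taking the first mode matters, here is a minimal sketch on a made-up six-row table. The `toy` frame and its values are assumptions for illustration only and are not part of the Lord of the Rings data. It compares filling with the whole `mode()` Series, which pandas aligns by index, against filling with a single value per group. In this toy frame the aligned version fills only row 0, while the per-group version fills every row whose race has at least one recorded height."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy data, not the Lord of the Rings file\n",
"toy = pd.DataFrame({\n",
"    'race': ['Men', 'Men', 'Men', 'Orcs', 'Orcs', 'Dragon'],\n",
"    'height': [np.nan, 'Tall', 'Tall', np.nan, 'Short', np.nan],\n",
"})\n",
"\n",
"# Series.mode() returns a Series labelled 0, 1, ... rather than a single value\n",
"mode_series = toy.height.mode()\n",
"\n",
"# fillna with a Series aligns on index labels, so only rows whose label\n",
"# also appears in mode_series can be filled\n",
"aligned_fill = toy.height.fillna(mode_series)\n",
"\n",
"# Taking the first mode gives one scalar per group (NaN when the group has\n",
"# no recorded values), which transform repeats on every row of the group\n",
"group_mode = toy.groupby('race').height.transform(\n",
"    lambda x: next(iter(x.mode()), np.nan)\n",
")\n",
"scalar_fill = toy.height.fillna(group_mode)\n",
"\n",
"print(aligned_fill.tolist())\n",
"print(scalar_fill.tolist())"
],
"metadata": {},
"execution_count": null,
"outputs": []
},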
{
"cell_type": "markdown",
"source": [
"Lastly, I'll fill every column of the dataset with the mode of its race group."
],
"metadata": {
"id": "kxZRXbLfYwsM"
}
},
{
"cell_type": "code",
"source": [
"df.fillna(df.groupby('race').transform(lambda x: next(iter(x.mode()), np.nan)))"
],
"metadata": {
"id": "ZwaF64XiWAt3",
"outputId": "2548e27f-c47a-4646-8740-da961214775a",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 424
}
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" birth death gender \\\n",
"0 Late ,Third Age FA 473 Female \n",
"1 TA 2978 February 26 ,3019 Male \n",
"2 NaN March ,3019 Male \n",
"3 TA 280 TA 515 Male \n",
"4 NaN NaN Male \n",
".. ... ... ... \n",
"906 Mid ,First Age FA 495 Female \n",
"907 NaN NaN NaN \n",
"908 YT during the ,Noontide of Valinor FA 455 Male \n",
"909 TA 2917 TA 3010 Male \n",
"910 Before ,TA 1944 Late ,Third Age Male \n",
"\n",
" hair height name race \\\n",
"0 Dark Tall Adanel Men \n",
"1 Dark (book) Light brown (movie) Tall Boromir Men \n",
"2 Grey/white strands of hair (film) 8'5 Lagduf Orcs \n",
"3 Dark Tall Tarcil Men \n",
"4 NaN NaN Fire-drake of Gondolin Dragon \n",
".. ... ... ... ... \n",
"906 Dark Tall Aerin Men \n",
"907 NaN NaN Aerandir NaN \n",
"908 Golden Tall Aegnor Elves \n",
"909 Dark Tall Adrahil II Men \n",
"910 Dark Tall Adrahil I Men \n",
"\n",
" realm spouse \n",
"0 Gondor Belemir \n",
"1 Gondor Unnamed wife \n",
"2 Moria,Mount Gundabad NaN \n",
"3 Arnor Unnamed wife \n",
"4 NaN NaN \n",
".. ... ... \n",
"906 Gondor Brodda \n",
"907 NaN NaN \n",
"908 Doriath Loved ,Andreth but remained unmarried \n",
"909 Gondor Unnamed wife \n",
"910 Gondor Unnamed wife \n",
"\n",
"[911 rows x 9 columns]"
]
},
"metadata": {},
"execution_count": 19
}
]
},
{
"cell_type": "markdown",
"source": [
"This is a bit silly: after the fill, the first person dies (FA 473) before they are born (Late, Third Age)!"
],
"metadata": {
"id": "CybzqSeWZCuR"
}
},
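{
"cell_type": "markdown",
"source": [
"One way to avoid results like this is to impute only the columns where a group mode is a plausible guess. The sketch below is an assumption-laden illustration, not a recommendation for this dataset: the `toy` frame, the `first_mode` helper, and the choice of `hair` as the only column to fill are all hypothetical. Date-like columns such as `birth` are left untouched."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy data standing in for the character table\n",
"toy = pd.DataFrame({\n",
"    'race': ['Men', 'Men', 'Men', 'Elves', 'Elves'],\n",
"    'hair': ['Dark', np.nan, 'Dark', 'Golden', np.nan],\n",
"    'birth': ['TA 2978', np.nan, 'TA 280', np.nan, 'FA 1'],\n",
"})\n",
"\n",
"def first_mode(x):\n",
"    # First modal value of the group, or NaN when the group is all missing\n",
"    return next(iter(x.mode()), np.nan)\n",
"\n",
"# Assumption: only trait-like columns are worth filling with a group mode\n",
"cols_to_fill = ['hair']\n",
"group_modes = toy.groupby('race')[cols_to_fill].transform(first_mode)\n",
"\n",
"# Fill the chosen columns on a copy; every other column keeps its NaN values\n",
"filled = toy.copy()\n",
"filled[cols_to_fill] = toy[cols_to_fill].fillna(group_modes)\n",
"\n",
"filled"
],
"metadata": {},
"execution_count": null,
"outputs": []
},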
{
"cell_type": "markdown",
"source": [
"## Your Turn"
],
"metadata": {
"id": "nvuoaFyZZKVo"
}
},
{
"cell_type": "markdown",
"source": [
"Check out the Airbnb dataset, https://raw.githubusercontent.com/nurfnick/Data_Viz/main/Data_Sets/AB_NYC_2019.csv. Examine how many entries are null. Impute at least two of the columns that contain nulls in an appropriate fashion."
],
"metadata": {
"id": "yAMkUW6Z9LcB"
}
},
{
"cell_type": "code",
"source": [
"df = pa.read_csv('https://raw.githubusercontent.com/nurfnick/Data_Viz/main/Data_Sets/AB_NYC_2019.csv')\n",
"\n",
"df.head()"
],
"metadata": {
"id": "2dwwGp8nWtPl",
"outputId": "dd29e2b0-49e5-4d80-a3e2-f4e2b0e6be3c",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 427
}
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" id name host_id \\\n",
"0 2539 Clean & quiet apt home by the park 2787 \n",
"1 2595 Skylit Midtown Castle 2845 \n",
"2 3647 THE VILLAGE OF HARLEM....NEW YORK ! 4632 \n",
"3 3831 Cozy Entire Floor of Brownstone 4869 \n",
"4 5022 Entire Apt: Spacious Studio/Loft by central park 7192 \n",
"\n",
" host_name neighbourhood_group neighbourhood latitude longitude \\\n",
"0 John Brooklyn Kensington 40.64749 -73.97237 \n",
"1 Jennifer Manhattan Midtown 40.75362 -73.98377 \n",
"2 Elisabeth Manhattan Harlem 40.80902 -73.94190 \n",
"3 LisaRoxanne Brooklyn Clinton Hill 40.68514 -73.95976 \n",
"4 Laura Manhattan East Harlem 40.79851 -73.94399 \n",
"\n",
" room_type price minimum_nights number_of_reviews last_review \\\n",
"0 Private room 149 1 9 2018-10-19 \n",
"1 Entire home/apt 225 1 45 2019-05-21 \n",
"2 Private room 150 3 0 NaN \n",
"3 Entire home/apt 89 1 270 2019-07-05 \n",
"4 Entire home/apt 80 10 9 2018-11-19 \n",
"\n",
" reviews_per_month calculated_host_listings_count availability_365 \n",
"0 0.21 6 365 \n",
"1 0.38 2 355 \n",
"2 NaN 1 365 \n",
"3 4.64 1 194 \n",
"4 0.10 1 0 "
]
},
"metadata": {},
"execution_count": 21
}
]
},
{
"cell_type": "code",
"source": [
""
],
"metadata": {
"id": "2uykfatk9TRp"
},
"execution_count": null,
"outputs": []
}
]
}