{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "colab": { "name": "Untitled88.ipynb", "provenance": [], "authorship_tag": "ABX9TyP7bmNqzjl3GY+efVjtB+Rd", "include_colab_link": true }, "kernelspec": { "name": "python3", "display_name": "Python 3" }, "language_info": { "name": "python" } }, "cells": [ { "cell_type": "markdown", "metadata": { "id": "view-in-github", "colab_type": "text" }, "source": [ "\"Open" ] }, { "cell_type": "markdown", "source": [ "# Missing and Incomplete" ], "metadata": { "id": "EwT-jhvBfAD2" } }, { "cell_type": "markdown", "source": [ "Datasets are often missing entries. There are many approaches we can take to dealing with these errors and omissions. I will examine a dataset on the characters from *The Lord of the Rings*." ], "metadata": { "id": "m9Cg2HjCD09o" } }, { "cell_type": "markdown", "source": [ "## Finding NaNs" ], "metadata": { "id": "NiryQXsjFV_c" } }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 206 }, "id": "f3tqDlgJe_cI", "outputId": "21ebb4f6-69f4-4906-fc35-308166c7444b" }, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "\n", "
\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
birthdeathgenderhairheightnameracerealmspouse
0NaNNaNFemaleNaNNaNAdanelMenNaNBelemir
1TA 2978February 26 ,3019MaleDark (book) Light brown (movie)NaNBoromirMenNaNNaN
2NaNMarch ,3019MaleNaNNaNLagdufOrcsNaNNaN
3TA 280TA 515MaleNaNNaNTarcilMenArnorUnnamed wife
4NaNNaNMaleNaNNaNFire-drake of GondolinDragonNaNNaN
\n", "
\n", " \n", " \n", " \n", "\n", " \n", "
\n", "
\n", " " ], "text/plain": [ " birth death gender hair height \\\n", "0 NaN NaN Female NaN NaN \n", "1 TA 2978 February 26 ,3019 Male Dark (book) Light brown (movie) NaN \n", "2 NaN March ,3019 Male NaN NaN \n", "3 TA 280 TA 515 Male NaN NaN \n", "4 NaN NaN Male NaN NaN \n", "\n", " name race realm spouse \n", "0 Adanel Men NaN Belemir \n", "1 Boromir Men NaN NaN \n", "2 Lagduf Orcs NaN NaN \n", "3 Tarcil Men Arnor Unnamed wife \n", "4 Fire-drake of Gondolin Dragon NaN NaN " ] }, "metadata": {}, "execution_count": 1 } ], "source": [ "import pandas as pa\n", "\n", "df = pa.read_csv('https://raw.githubusercontent.com/nurfnick/Data_Viz/main/Data_Sets/lotr_characters.csv')\n", "\n", "df.head()" ] }, { "cell_type": "markdown", "source": [ "We see right away that there are lots of `NaN`'s. This is an empty field in our dataset. Some characters are mentioned but never given much more background than a name." ], "metadata": { "id": "wZ8SwMIcGMmP" } }, { "cell_type": "code", "source": [ "df.isnull().sum(axis = 0)" ], "metadata": { "id": "2aPFYQBtHjXr", "outputId": "d4642faf-29b9-4d79-9308-7e5a456d1c3f", "colab": { "base_uri": "https://localhost:8080/" } }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "birth 207\n", "death 315\n", "gender 143\n", "hair 734\n", "height 813\n", "name 0\n", "race 140\n", "realm 714\n", "spouse 403\n", "dtype: int64" ] }, "metadata": {}, "execution_count": 2 } ] }, { "cell_type": "markdown", "source": [ "There are null values in every column except name." 
], "metadata": { "id": "NE2EnNXSHnQ6" } }, { "cell_type": "code", "source": [ "df.isnull().sum(axis = 1).value_counts().sort_index()" ], "metadata": { "id": "mE0t0jowFZAf", "outputId": "fdd7f69b-768f-4488-81a1-61fabe1dec15", "colab": { "base_uri": "https://localhost:8080/" } }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "0 15\n", "1 59\n", "2 185\n", "3 236\n", "4 178\n", "5 81\n", "6 20\n", "7 1\n", "8 136\n", "dtype: int64" ] }, "metadata": {}, "execution_count": 3 } ] }, { "cell_type": "markdown", "source": [ "Here we see that only 15 entries have every field filled in, while 136 have nothing but a name (name is never blank, so 8 nulls means every other field is missing). Let's look first at the 15 complete characters." ], "metadata": { "id": "PashKJCsLd1S" } }, { "cell_type": "code", "source": [ "df[~df.isnull().any(axis = 1)]" ], "metadata": { "id": "Jb3H2Dj1LU-n", "outputId": "74fb7437-fc1e-41ec-ffca-131373930316", "colab": { "base_uri": "https://localhost:8080/", "height": 520 } }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "\n", "
\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
birthdeathgenderhairheightnameracerealmspouse
125SA 3209TA 2MaleBlackVery tall almost 7'1IsildurMenArnor,GondorUnnamed wife
134YT, and perhaps firstbornStill AliveMaleProbably GoldenTallIngwëElvesValinor,TaniquetilUnnamed wife
166YTFA 400MaleDarkTallEölElvesNan ElmothAredhel
186TA 2990FO 63MaleDirty blondTall-6'6omerMenRohanLothíriel after the War of the Ring
194FA 532Still alive; departed to ,Aman, on ,September ...MaleDarkTallElrondHalf-elvenRivendellCelebrían
204SA 3119SA 3441MaleBrown7' 10\"ElendilMenArnor,GondorUnnamed wife
530YTStill alive, departed over the sea in the earl...MaleSilverTallCelebornElvesEregion,Lothlórien,Caras GaladhonGaladriel
551Possibly pre First AgeUnknown; possibly still aliveMost likely maleNoneHugeWatcher in the WaterUrulókiDoors of DurinMost likely none
5793019February 293019MaleDark (movie)6' 6\" (movie)UglúkUruk-haiIsengardNone
620TA 2925TA 3007MaleBrown (film)1.76m / 5'9\" (film)BainMenDaleUnnamed wife
686YT 1362Still alive: Departed over the sea on ,Septemb...FemaleGoldenTallGaladrielElvesEregion,Lothlórien,Caras GaladhonCeleborn
692YT 1169YT 1497MaleRavenTallFëanorElvesTirion,FormenosNerdanel
795First AgePresumably departed to ,AmanMaleGoldenTallThranduilElvesWoodland Realm,MirkwoodUnnamed wife
802YT 1050FA 502MaleSilverTallest of the Elven-folk, 8'2\"ThingolElvesDoriathMelian
873March 1 ,2931FO 120MaleDark198cm (6'6\")Aragorn II ElessarMenReunited Kingdom,Arnor,GondorArwen
\n", "
\n", " \n", " \n", " \n", "\n", " \n", "
\n", "
\n", " " ], "text/plain": [ " birth \\\n", "125 SA 3209 \n", "134 YT, and perhaps firstborn \n", "166 YT \n", "186 TA 2990 \n", "194 FA 532 \n", "204 SA 3119 \n", "530 YT \n", "551 Possibly pre First Age \n", "579 3019 \n", "620 TA 2925 \n", "686 YT 1362 \n", "692 YT 1169 \n", "795 First Age \n", "802 YT 1050 \n", "873 March 1 ,2931 \n", "\n", " death gender \\\n", "125 TA 2 Male \n", "134 Still Alive Male \n", "166 FA 400 Male \n", "186 FO 63 Male \n", "194 Still alive; departed to ,Aman, on ,September ... Male \n", "204 SA 3441 Male \n", "530 Still alive, departed over the sea in the earl... Male \n", "551 Unknown; possibly still alive Most likely male \n", "579 February 293019 Male \n", "620 TA 3007 Male \n", "686 Still alive: Departed over the sea on ,Septemb... Female \n", "692 YT 1497 Male \n", "795 Presumably departed to ,Aman Male \n", "802 FA 502 Male \n", "873 FO 120 Male \n", "\n", " hair height name \\\n", "125 Black Very tall almost 7'1 Isildur \n", "134 Probably Golden Tall Ingwë \n", "166 Dark Tall Eöl \n", "186 Dirty blond Tall-6'6 omer \n", "194 Dark Tall Elrond \n", "204 Brown 7' 10\" Elendil \n", "530 Silver Tall Celeborn \n", "551 None Huge Watcher in the Water \n", "579 Dark (movie) 6' 6\" (movie) Uglúk \n", "620 Brown (film) 1.76m / 5'9\" (film) Bain \n", "686 Golden Tall Galadriel \n", "692 Raven Tall Fëanor \n", "795 Golden Tall Thranduil \n", "802 Silver Tallest of the Elven-folk, 8'2\" Thingol \n", "873 Dark 198cm (6'6\") Aragorn II Elessar \n", "\n", " race realm \\\n", "125 Men Arnor,Gondor \n", "134 Elves Valinor,Taniquetil \n", "166 Elves Nan Elmoth \n", "186 Men Rohan \n", "194 Half-elven Rivendell \n", "204 Men Arnor,Gondor \n", "530 Elves Eregion,Lothlórien,Caras Galadhon \n", "551 Urulóki Doors of Durin \n", "579 Uruk-hai Isengard \n", "620 Men Dale \n", "686 Elves Eregion,Lothlórien,Caras Galadhon \n", "692 Elves Tirion,Formenos \n", "795 Elves Woodland Realm,Mirkwood \n", "802 Elves Doriath \n", "873 Men Reunited 
Kingdom,Arnor,Gondor \n", "\n", " spouse \n", "125 Unnamed wife \n", "134 Unnamed wife \n", "166 Aredhel \n", "186 Lothíriel after the War of the Ring \n", "194 Celebrían \n", "204 Unnamed wife \n", "530 Galadriel \n", "551 Most likely none \n", "579 None \n", "620 Unnamed wife \n", "686 Celeborn \n", "692 Nerdanel \n", "795 Unnamed wife \n", "802 Melian \n", "873 Arwen " ] }, "metadata": {}, "execution_count": 4 } ] }, { "cell_type": "markdown", "source": [ "Of course we could ask for just the ones with 8 null values." ], "metadata": { "id": "qjvFyJ_37eHd" } }, { "cell_type": "code", "source": [ "df[df.isnull().sum(axis = 1) == 8].name" ], "metadata": { "id": "6NKK_H-XL-ax", "outputId": "8f15c14d-82cc-40f4-9dce-676dcfd5c193", "colab": { "base_uri": "https://localhost:8080/" } }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "8 Angrim\n", "14 Angelimar\n", "17 Linda (Baggins) Proudfoot\n", "18 Bodo Proudfoot\n", "40 Tanta (Hornblower) Baggins\n", " ... \n", "886 Andvír\n", "891 Amlach\n", "904 Aghan\n", "905 Agathor\n", "907 Aerandir\n", "Name: name, Length: 136, dtype: object" ] }, "metadata": {}, "execution_count": 5 } ] }, { "cell_type": "markdown", "source": [ "I only included the names since every other field is null for these rows!" ], "metadata": { "id": "oY9AMkQZ7-lL" } }, { "cell_type": "markdown", "source": [ "We can use the same method to keep only the entries that have 4 or fewer null values." ], "metadata": { "id": "2Af68l5o8Tks" } }, { "cell_type": "code", "source": [ "df[df.isnull().sum(axis = 1) <= 4]" ], "metadata": { "id": "ylz1qtyz74K-", "outputId": "465b1809-5141-4665-c30c-729536e85b30", "colab": { "base_uri": "https://localhost:8080/", "height": 424 } }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "\n", "
\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
birthdeathgenderhairheightnameracerealmspouse
1TA 2978February 26 ,3019MaleDark (book) Light brown (movie)NaNBoromirMenNaNNaN
3TA 280TA 515MaleNaNNaNTarcilMenArnorUnnamed wife
5SA 2709SA 2962MaleNaNNaNAr-AdûnakhôrMenNúmenorUnnamed wife
7YTFA 455MaleGoldenNaNAngrodElvesNaNEldalótë
9SA 3219SA 3440MaleNaNNaNAnárionMenGondorUnnamed wife
..............................
903TA 2827TA 2932MaleNaNNaNAglahadMenNaNUnnamed wife
906Mid ,First AgeFA 495FemaleNaNNaNAerinMenNaNBrodda
908YT during the ,Noontide of ValinorFA 455MaleGoldenNaNAegnorElvesNaNLoved ,Andreth but remained unmarried
909TA 2917TA 3010MaleNaNNaNAdrahil IIMenNaNUnnamed wife
910Before ,TA 1944Late ,Third AgeMaleNaNNaNAdrahil IMenNaNNaN
\n", "

673 rows × 9 columns

\n", "
\n", " \n", " \n", " \n", "\n", " \n", "
\n", "
\n", " " ], "text/plain": [ " birth death gender \\\n", "1 TA 2978 February 26 ,3019 Male \n", "3 TA 280 TA 515 Male \n", "5 SA 2709 SA 2962 Male \n", "7 YT FA 455 Male \n", "9 SA 3219 SA 3440 Male \n", ".. ... ... ... \n", "903 TA 2827 TA 2932 Male \n", "906 Mid ,First Age FA 495 Female \n", "908 YT during the ,Noontide of Valinor FA 455 Male \n", "909 TA 2917 TA 3010 Male \n", "910 Before ,TA 1944 Late ,Third Age Male \n", "\n", " hair height name race realm \\\n", "1 Dark (book) Light brown (movie) NaN Boromir Men NaN \n", "3 NaN NaN Tarcil Men Arnor \n", "5 NaN NaN Ar-Adûnakhôr Men Númenor \n", "7 Golden NaN Angrod Elves NaN \n", "9 NaN NaN Anárion Men Gondor \n", ".. ... ... ... ... ... \n", "903 NaN NaN Aglahad Men NaN \n", "906 NaN NaN Aerin Men NaN \n", "908 Golden NaN Aegnor Elves NaN \n", "909 NaN NaN Adrahil II Men NaN \n", "910 NaN NaN Adrahil I Men NaN \n", "\n", " spouse \n", "1 NaN \n", "3 Unnamed wife \n", "5 Unnamed wife \n", "7 Eldalótë \n", "9 Unnamed wife \n", ".. ... \n", "903 Unnamed wife \n", "906 Brodda \n", "908 Loved ,Andreth but remained unmarried \n", "909 Unnamed wife \n", "910 NaN \n", "\n", "[673 rows x 9 columns]" ] }, "metadata": {}, "execution_count": 6 } ] }, { "cell_type": "markdown", "source": [ "Maybe we only want the characters whose *realm* has been included. We'll negate the `isnull()` command." ], "metadata": { "id": "xBaiiZaE8mdd" } }, { "cell_type": "code", "source": [ "df[~df.realm.isnull()]" ], "metadata": { "id": "W1YPuhDw8geJ", "outputId": "fa5099ef-2251-42e3-c4af-694367184908", "colab": { "base_uri": "https://localhost:8080/", "height": 424 } }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "\n", "
\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
birthdeathgenderhairheightnameracerealmspouse
3TA 280TA 515MaleNaNNaNTarcilMenArnorUnnamed wife
5SA 2709SA 2962MaleNaNNaNAr-AdûnakhôrMenNúmenorUnnamed wife
9SA 3219SA 3440MaleNaNNaNAnárionMenGondorUnnamed wife
10SA 3118Still aliveMaleNaNTallAr-PharazônMenNúmenorTar-Míriel
11SA 2876SA 3102MaleNaNNaNAr-SakalthôrMenNúmenorUnnamed wife
..............................
890TA 726TA 946MaleNaNNaNAmlaithMenArthedainUnnamed wife
892Sometime during ,Years of the Trees, or the ,F...SA 3434MaleNaNNaNAmdírElvesLórienUnnamed wife
898NaNNaNFemaleNaNNaNAlmarianMenNúmenorTar-Meneldur
900TA 2544TA 2645MaleNaNNaNAldorMenRohanUnnamed wife
901TA 1330TA 1540MaleNaNNaNAldamirMenGondorUnnamed wife
\n", "

197 rows × 9 columns

\n", "
\n", " \n", " \n", " \n", "\n", " \n", "
\n", "
\n", " " ], "text/plain": [ " birth death gender \\\n", "3 TA 280 TA 515 Male \n", "5 SA 2709 SA 2962 Male \n", "9 SA 3219 SA 3440 Male \n", "10 SA 3118 Still alive Male \n", "11 SA 2876 SA 3102 Male \n", ".. ... ... ... \n", "890 TA 726 TA 946 Male \n", "892 Sometime during ,Years of the Trees, or the ,F... SA 3434 Male \n", "898 NaN NaN Female \n", "900 TA 2544 TA 2645 Male \n", "901 TA 1330 TA 1540 Male \n", "\n", " hair height name race realm spouse \n", "3 NaN NaN Tarcil Men Arnor Unnamed wife \n", "5 NaN NaN Ar-Adûnakhôr Men Númenor Unnamed wife \n", "9 NaN NaN Anárion Men Gondor Unnamed wife \n", "10 NaN Tall Ar-Pharazôn Men Númenor Tar-Míriel \n", "11 NaN NaN Ar-Sakalthôr Men Númenor Unnamed wife \n", ".. ... ... ... ... ... ... \n", "890 NaN NaN Amlaith Men Arthedain Unnamed wife \n", "892 NaN NaN Amdír Elves Lórien Unnamed wife \n", "898 NaN NaN Almarian Men Númenor Tar-Meneldur \n", "900 NaN NaN Aldor Men Rohan Unnamed wife \n", "901 NaN NaN Aldamir Men Gondor Unnamed wife \n", "\n", "[197 rows x 9 columns]" ] }, "metadata": {}, "execution_count": 7 } ] }, { "cell_type": "markdown", "source": [ "## Imputing" ], "metadata": { "id": "M_iH2nsiFTFf" } }, { "cell_type": "markdown", "source": [ "The simplest method for filling in `NaN`s is to just place a value there." ], "metadata": { "id": "WLCSQdziFagQ" } }, { "cell_type": "code", "source": [ "df.fillna(value = 0)" ], "metadata": { "id": "7DEhGXNAFaJo", "outputId": "e254be9a-e1a1-44f7-ec69-30392e302b57", "colab": { "base_uri": "https://localhost:8080/", "height": 424 } }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "\n", "
\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
birthdeathgenderhairheightnameracerealmspouse
000Female00AdanelMen0Belemir
1TA 2978February 26 ,3019MaleDark (book) Light brown (movie)0BoromirMen00
20March ,3019Male00LagdufOrcs00
3TA 280TA 515Male00TarcilMenArnorUnnamed wife
400Male00Fire-drake of GondolinDragon00
..............................
906Mid ,First AgeFA 495Female00AerinMen0Brodda
90700000Aerandir000
908YT during the ,Noontide of ValinorFA 455MaleGolden0AegnorElves0Loved ,Andreth but remained unmarried
909TA 2917TA 3010Male00Adrahil IIMen0Unnamed wife
910Before ,TA 1944Late ,Third AgeMale00Adrahil IMen00
\n", "

911 rows × 9 columns

\n", "
\n", " \n", " \n", " \n", "\n", " \n", "
\n", "
\n", " " ], "text/plain": [ " birth death gender \\\n", "0 0 0 Female \n", "1 TA 2978 February 26 ,3019 Male \n", "2 0 March ,3019 Male \n", "3 TA 280 TA 515 Male \n", "4 0 0 Male \n", ".. ... ... ... \n", "906 Mid ,First Age FA 495 Female \n", "907 0 0 0 \n", "908 YT during the ,Noontide of Valinor FA 455 Male \n", "909 TA 2917 TA 3010 Male \n", "910 Before ,TA 1944 Late ,Third Age Male \n", "\n", " hair height name race \\\n", "0 0 0 Adanel Men \n", "1 Dark (book) Light brown (movie) 0 Boromir Men \n", "2 0 0 Lagduf Orcs \n", "3 0 0 Tarcil Men \n", "4 0 0 Fire-drake of Gondolin Dragon \n", ".. ... ... ... ... \n", "906 0 0 Aerin Men \n", "907 0 0 Aerandir 0 \n", "908 Golden 0 Aegnor Elves \n", "909 0 0 Adrahil II Men \n", "910 0 0 Adrahil I Men \n", "\n", " realm spouse \n", "0 0 Belemir \n", "1 0 0 \n", "2 0 0 \n", "3 Arnor Unnamed wife \n", "4 0 0 \n", ".. ... ... \n", "906 0 Brodda \n", "907 0 0 \n", "908 0 Loved ,Andreth but remained unmarried \n", "909 0 Unnamed wife \n", "910 0 0 \n", "\n", "[911 rows x 9 columns]" ] }, "metadata": {}, "execution_count": 8 } ] }, { "cell_type": "markdown", "source": [ "You should note right away that some of these zeros make no sense. You might be more careful with your zeros." ], "metadata": { "id": "gf_4T7cPFxzp" } }, { "cell_type": "code", "source": [ "df.height.fillna(value = 0)" ], "metadata": { "id": "sRcAE_om8xIN", "outputId": "aed039ef-e4c3-4a68-bc96-989a202f104c", "colab": { "base_uri": "https://localhost:8080/" } }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "0 0\n", "1 0\n", "2 0\n", "3 0\n", "4 0\n", " ..\n", "906 0\n", "907 0\n", "908 0\n", "909 0\n", "910 0\n", "Name: height, Length: 911, dtype: object" ] }, "metadata": {}, "execution_count": 9 } ] }, { "cell_type": "markdown", "source": [ "Or you might not want to skew the average so much. You could assign the mean if the remaining values were numerical. 
Unfortunately these are mostly strings with little hope of converting to a numerical value." ], "metadata": { "id": "b_uJNOzYGFu3" } }, { "cell_type": "code", "source": [ "df.height[~df.height.isnull()]" ], "metadata": { "id": "dTqTaV18GBQv", "outputId": "73636ed3-fba5-4cc7-933e-dd96bf693967", "colab": { "base_uri": "https://localhost:8080/" } }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "10 Tall\n", "19 Tall\n", "20 Tallest of the Elves of Gondolin\n", "41 Tall\n", "74 Large and immense\n", " ... \n", "831 8'5\n", "850 Tall\n", "853 Tall\n", "873 198cm (6'6\")\n", "881 As tall as a mountain\n", "Name: height, Length: 98, dtype: object" ] }, "metadata": {}, "execution_count": 10 } ] }, { "cell_type": "markdown", "source": [ "We can also fill the empty entries by borrowing the values next to the missing ones." ], "metadata": { "id": "GjxffSLIG_k_" } }, { "cell_type": "code", "source": [ "df.height.fillna(method= 'pad')" ], "metadata": { "id": "xTD8Po7FGPox", "outputId": "078b58e1-5820-4007-c7b9-14b0f3dfb2a7", "colab": { "base_uri": "https://localhost:8080/" } }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "0 NaN\n", "1 NaN\n", "2 NaN\n", "3 NaN\n", "4 NaN\n", " ... \n", "906 As tall as a mountain\n", "907 As tall as a mountain\n", "908 As tall as a mountain\n", "909 As tall as a mountain\n", "910 As tall as a mountain\n", "Name: height, Length: 911, dtype: object" ] }, "metadata": {}, "execution_count": 11 } ] }, { "cell_type": "markdown", "source": [ "`pad` took the last value and filled it forward. 
We can also go the other way with `bfill`." ], "metadata": { "id": "bF16QtkSHXp_" } }, { "cell_type": "code", "source": [ "df.height.fillna(method= 'bfill')" ], "metadata": { "id": "j6-5ipcsHREF", "outputId": "6357b1d2-aa5d-4a9d-8e81-ba13c088ad92", "colab": { "base_uri": "https://localhost:8080/" } }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "0 Tall\n", "1 Tall\n", "2 Tall\n", "3 Tall\n", "4 Tall\n", " ... \n", "906 NaN\n", "907 NaN\n", "908 NaN\n", "909 NaN\n", "910 NaN\n", "Name: height, Length: 911, dtype: object" ] }, "metadata": {}, "execution_count": 12 } ] }, { "cell_type": "markdown", "source": [ "Filling in by mode is a little tricky, as the mode returns an array rather than a single value. The code below changes all missing *height* values to the mode." ], "metadata": { "id": "kDz4EMqkXmYB" } }, { "cell_type": "code", "source": [ "\n", "df.height.transform(lambda x: x.fillna(value = x.mode()[0]))" ], "metadata": { "id": "3ZKzof3p8pVz", "outputId": "bd04ca1f-51db-42b4-ceba-0128d279ff8a", "colab": { "base_uri": "https://localhost:8080/" } }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "0 Tall\n", "1 Tall\n", "2 Tall\n", "3 Tall\n", "4 Tall\n", " ... \n", "906 Tall\n", "907 Tall\n", "908 Tall\n", "909 Tall\n", "910 Tall\n", "Name: height, Length: 911, dtype: object" ] }, "metadata": {}, "execution_count": 13 } ] }, { "cell_type": "markdown", "source": [ "## Imputing by Category" ], "metadata": { "id": "8XTgp5G-OiGt" } }, { "cell_type": "markdown", "source": [ "There is no quantitative data here, so I actually have to work a little harder than I'd like. If height were just a number, you'd run some code like \n", "\n", "```\n", "df.height.fillna(df.groupby('realm').height.transform('mean'))\n", "```\n", "\n", "to fill the NaNs with the mean from their group. To deal with the categories, I'll need to get the most frequent value from each category first." 
], "metadata": { "id": "5hm1LJhdOmoe" } }, { "cell_type": "code", "source": [ "df.groupby(['race']).height.agg(pa.Series.mode)" ], "metadata": { "id": "ZSl8a5zrQ8v2", "outputId": "47957b08-d248-4298-8c8d-fa0b87d352a8", "colab": { "base_uri": "https://localhost:8080/" } }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "race\n", "Ainur Varies\n", "Ainur,Maiar []\n", "Balrog []\n", "Black Uruk 7'1\n", "Dragon []\n", "Dragons [As tall as a mountain, Gigantic]\n", "Drúedain Short\n", "Dwarf []\n", "Dwarven []\n", "Dwarves [4'5 - 5' (Estimate) , 4'5\" (film)]\n", "Eagle []\n", "Eagles []\n", "Elf []\n", "Elves Tall\n", "Elves,Maiar []\n", "Elves,Noldor []\n", "Ents Very tall\n", "Ents,Onodrim 15'4\n", "Goblin,Orc 8,4 Body weight = 190kg\n", "God Varies\n", "Great Eagles 30\n", "Great Spiders [Enormous, Large and immense]\n", "Half-elven Tall\n", "Half-elven,Men []\n", "Hobbit [1.06m (3'6\"), 1.17m (3'10\"), 1.2m (3'11\"), 1....\n", "Hobbits 1.22m (4'0\")\n", "Horse []\n", "Maiar Various until \n", "Maiar,Balrog Slightly larger and taller than a Man (book), ...\n", "Maiar,Balrogs []\n", "Men Tall\n", "Men,Rohirrim []\n", "Men,Skin-changer Tall (in Man-form)\n", "Men,Undead Tall\n", "Men,Wraith 7' 1\" (2.13 metres)\n", "Orc 5'9\" - 6'4\" (film)\n", "Orc,Goblin []\n", "Orcs [8'5, About nine feet (film)]\n", "Raven []\n", "Skin-changer Tall\n", "Stone-trolls About 13'\n", "Uruk-hai [6' 6\" (movie), 6'1 (film)]\n", "Uruk-hai,Orc medium\n", "Urulóki Huge\n", "Vampire []\n", "Werewolves Gigantic\n", "Wolfhound Horse-sized\n", "Name: height, dtype: object" ] }, "metadata": {}, "execution_count": 14 } ] }, { "cell_type": "markdown", "source": [ "This shows me that for many races the most common height is empty (`[]`), because no height was recorded for any member of that race. We can drop the rows with a missing race or height to tidy up this imputation." 
], "metadata": { "id": "vV66YbiaQq4k" } }, { "cell_type": "code", "source": [ "dfrh = df[(~df.race.isna())&(~df.height.isna())]\n", "\n", "dfrh.groupby(['race']).height.agg(pa.Series.mode)" ], "metadata": { "id": "7-LcET92Q2Od", "outputId": "54f3fe22-01c0-435f-e4ca-890f57b308ce", "colab": { "base_uri": "https://localhost:8080/" } }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "race\n", "Ainur Varies\n", "Black Uruk 7'1\n", "Dragons [As tall as a mountain, Gigantic]\n", "Drúedain Short\n", "Dwarves [4'5 - 5' (Estimate) , 4'5\" (film)]\n", "Elves Tall\n", "Ents Very tall\n", "Ents,Onodrim 15'4\n", "Goblin,Orc 8,4 Body weight = 190kg\n", "God Varies\n", "Great Eagles 30\n", "Great Spiders [Enormous, Large and immense]\n", "Half-elven Tall\n", "Hobbit [1.06m (3'6\"), 1.17m (3'10\"), 1.2m (3'11\"), 1....\n", "Hobbits 1.22m (4'0\")\n", "Maiar Various until \n", "Maiar,Balrog Slightly larger and taller than a Man (book), ...\n", "Men Tall\n", "Men,Skin-changer Tall (in Man-form)\n", "Men,Undead Tall\n", "Men,Wraith 7' 1\" (2.13 metres)\n", "Orc 5'9\" - 6'4\" (film)\n", "Orcs [8'5, About nine feet (film)]\n", "Skin-changer Tall\n", "Stone-trolls About 13'\n", "Uruk-hai [6' 6\" (movie), 6'1 (film)]\n", "Uruk-hai,Orc medium\n", "Urulóki Huge\n", "Werewolves Gigantic\n", "Wolfhound Horse-sized\n", "Name: height, dtype: object" ] }, "metadata": {}, "execution_count": 15 } ] }, { "cell_type": "markdown", "source": [ "The next line of code is not working as intended, only changing the first of each category to the mode. 
" ], "metadata": { "id": "H0gQo-lpX7Ex" } }, { "cell_type": "code", "source": [ "df.height.fillna(df.groupby('race').height.transform(lambda s: s.mode()))" ], "metadata": { "id": "K6iFGPYhHkbg", "outputId": "4d8af44d-4c63-4846-fb22-7b52118faa24", "colab": { "base_uri": "https://localhost:8080/" } }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "0 Tall\n", "1 NaN\n", "2 NaN\n", "3 NaN\n", "4 NaN\n", " ... \n", "906 NaN\n", "907 NaN\n", "908 NaN\n", "909 NaN\n", "910 NaN\n", "Name: height, Length: 911, dtype: object" ] }, "metadata": {}, "execution_count": 16 } ] }, { "cell_type": "markdown", "source": [ "I believe this line of code makes the same mistake, but I leave it in as another way to do the transformation that might be useful at some point." ], "metadata": { "id": "6YmCYojzYJgs" } }, { "cell_type": "code", "source": [ "df.groupby('race', sort=False).height.apply(lambda x: x.fillna(value = x.mode()))\n", "\n" ], "metadata": { "id": "iXVIVW-RRiKU", "outputId": "141d5df7-5a0e-4e6e-81d1-512b0d0407b1", "colab": { "base_uri": "https://localhost:8080/" } }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "0 Tall\n", "1 NaN\n", "2 NaN\n", "3 NaN\n", "4 NaN\n", " ... \n", "903 NaN\n", "906 NaN\n", "908 NaN\n", "909 NaN\n", "910 NaN\n", "Name: height, Length: 771, dtype: object" ] }, "metadata": {}, "execution_count": 17 } ] }, { "cell_type": "markdown", "source": [ "Below I am finally able to do the conversion. As far as I can tell, this works because `next(iter(x.mode()), np.nan)` reduces the mode to a single value (or `NaN` when a group has no recorded heights), which `transform` can then broadcast to every row of the group; the pandas methods would not allow the transformation on the entire mode." 
], "metadata": { "id": "uezP70HN7dUq" } }, { "cell_type": "code", "source": [ "import numpy as np\n", "\n", "df.height = df.height.fillna(df.groupby('race').height.transform(lambda x: next(iter(x.mode()), np.nan)))\n", "\n", "df" ], "metadata": { "id": "aCQPY7r5Izer", "outputId": "4d8d576b-7d97-4876-97cf-160762e93e84", "colab": { "base_uri": "https://localhost:8080/", "height": 424 } }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "\n", "
\n", " " ], "text/plain": [ " birth death gender \\\n", "0 NaN NaN Female \n", "1 TA 2978 February 26 ,3019 Male \n", "2 NaN March ,3019 Male \n", "3 TA 280 TA 515 Male \n", "4 NaN NaN Male \n", ".. ... ... ... \n", "906 Mid ,First Age FA 495 Female \n", "907 NaN NaN NaN \n", "908 YT during the ,Noontide of Valinor FA 455 Male \n", "909 TA 2917 TA 3010 Male \n", "910 Before ,TA 1944 Late ,Third Age Male \n", "\n", " hair height name race \\\n", "0 NaN Tall Adanel Men \n", "1 Dark (book) Light brown (movie) Tall Boromir Men \n", "2 NaN 8'5 Lagduf Orcs \n", "3 NaN Tall Tarcil Men \n", "4 NaN NaN Fire-drake of Gondolin Dragon \n", ".. ... ... ... ... \n", "906 NaN Tall Aerin Men \n", "907 NaN NaN Aerandir NaN \n", "908 Golden Tall Aegnor Elves \n", "909 NaN Tall Adrahil II Men \n", "910 NaN Tall Adrahil I Men \n", "\n", " realm spouse \n", "0 NaN Belemir \n", "1 NaN NaN \n", "2 NaN NaN \n", "3 Arnor Unnamed wife \n", "4 NaN NaN \n", ".. ... ... \n", "906 NaN Brodda \n", "907 NaN NaN \n", "908 NaN Loved ,Andreth but remained unmarried \n", "909 NaN Unnamed wife \n", "910 NaN NaN \n", "\n", "[911 rows x 9 columns]" ] }, "metadata": {}, "execution_count": 18 } ] }, { "cell_type": "markdown", "source": [ "Lastly, I'll demonstrate filling every column of the dataset with its most common value (the mode) within each race." ], "metadata": { "id": "kxZRXbLfYwsM" } }, { "cell_type": "code", "source": [ "df.fillna(df.groupby('race').transform(lambda x: next(iter(x.mode()), np.nan)))" ], "metadata": { "id": "ZwaF64XiWAt3", "outputId": "2548e27f-c47a-4646-8740-da961214775a", "colab": { "base_uri": "https://localhost:8080/", "height": 424 } }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "\n", "
\n", " " ], "text/plain": [ " birth death gender \\\n", "0 Late ,Third Age FA 473 Female \n", "1 TA 2978 February 26 ,3019 Male \n", "2 NaN March ,3019 Male \n", "3 TA 280 TA 515 Male \n", "4 NaN NaN Male \n", ".. ... ... ... \n", "906 Mid ,First Age FA 495 Female \n", "907 NaN NaN NaN \n", "908 YT during the ,Noontide of Valinor FA 455 Male \n", "909 TA 2917 TA 3010 Male \n", "910 Before ,TA 1944 Late ,Third Age Male \n", "\n", " hair height name race \\\n", "0 Dark Tall Adanel Men \n", "1 Dark (book) Light brown (movie) Tall Boromir Men \n", "2 Grey/white strands of hair (film) 8'5 Lagduf Orcs \n", "3 Dark Tall Tarcil Men \n", "4 NaN NaN Fire-drake of Gondolin Dragon \n", ".. ... ... ... ... \n", "906 Dark Tall Aerin Men \n", "907 NaN NaN Aerandir NaN \n", "908 Golden Tall Aegnor Elves \n", "909 Dark Tall Adrahil II Men \n", "910 Dark Tall Adrahil I Men \n", "\n", " realm spouse \n", "0 Gondor Belemir \n", "1 Gondor Unnamed wife \n", "2 Moria,Mount Gundabad NaN \n", "3 Arnor Unnamed wife \n", "4 NaN NaN \n", ".. ... ... \n", "906 Gondor Brodda \n", "907 NaN NaN \n", "908 Doriath Loved ,Andreth but remained unmarried \n", "909 Gondor Unnamed wife \n", "910 Gondor Unnamed wife \n", "\n", "[911 rows x 9 columns]" ] }, "metadata": {}, "execution_count": 19 } ] }, { "cell_type": "markdown", "source": [ "This is a bit silly, as the first person now dies (FA 473, in the First Age) before she is born (late in the Third Age)!" ], "metadata": { "id": "CybzqSeWZCuR" } }, { "cell_type": "markdown", "source": [ "## Your Turn" ], "metadata": { "id": "nvuoaFyZZKVo" } }, { "cell_type": "markdown", "source": [ "Check out the Airbnb dataset, https://raw.githubusercontent.com/nurfnick/Data_Viz/main/Data_Sets/AB_NYC_2019.csv. Examine how many entries are null in each column. Impute at least two of the columns that contain nulls in an appropriate fashion." 
], "metadata": { "id": "yAMkUW6Z9LcB" } }, { "cell_type": "code", "source": [ "df = pa.read_csv('https://raw.githubusercontent.com/nurfnick/Data_Viz/main/Data_Sets/AB_NYC_2019.csv')\n", "\n", "df.head()" ], "metadata": { "id": "2dwwGp8nWtPl", "outputId": "dd29e2b0-49e5-4d80-a3e2-f4e2b0e6be3c", "colab": { "base_uri": "https://localhost:8080/", "height": 427 } }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "\n", "
\n", " " ], "text/plain": [ " id name host_id \\\n", "0 2539 Clean & quiet apt home by the park 2787 \n", "1 2595 Skylit Midtown Castle 2845 \n", "2 3647 THE VILLAGE OF HARLEM....NEW YORK ! 4632 \n", "3 3831 Cozy Entire Floor of Brownstone 4869 \n", "4 5022 Entire Apt: Spacious Studio/Loft by central park 7192 \n", "\n", " host_name neighbourhood_group neighbourhood latitude longitude \\\n", "0 John Brooklyn Kensington 40.64749 -73.97237 \n", "1 Jennifer Manhattan Midtown 40.75362 -73.98377 \n", "2 Elisabeth Manhattan Harlem 40.80902 -73.94190 \n", "3 LisaRoxanne Brooklyn Clinton Hill 40.68514 -73.95976 \n", "4 Laura Manhattan East Harlem 40.79851 -73.94399 \n", "\n", " room_type price minimum_nights number_of_reviews last_review \\\n", "0 Private room 149 1 9 2018-10-19 \n", "1 Entire home/apt 225 1 45 2019-05-21 \n", "2 Private room 150 3 0 NaN \n", "3 Entire home/apt 89 1 270 2019-07-05 \n", "4 Entire home/apt 80 10 9 2018-11-19 \n", "\n", " reviews_per_month calculated_host_listings_count availability_365 \n", "0 0.21 6 365 \n", "1 0.38 2 355 \n", "2 NaN 1 365 \n", "3 4.64 1 194 \n", "4 0.10 1 0 " ] }, "metadata": {}, "execution_count": 21 } ] }, { "cell_type": "code", "source": [ "" ], "metadata": { "id": "2uykfatk9TRp" }, "execution_count": null, "outputs": [] } ] }