{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "colab": { "name": "Untitled91.ipynb", "provenance": [], "authorship_tag": "ABX9TyMtZvtRWoKAOqz3q5mD4b+X", "include_colab_link": true }, "kernelspec": { "name": "python3", "display_name": "Python 3" }, "language_info": { "name": "python" } }, "cells": [ { "cell_type": "markdown", "metadata": { "id": "view-in-github", "colab_type": "text" }, "source": [ "\"Open" ] }, { "cell_type": "markdown", "source": [ "# Exam 2 Review" ], "metadata": { "id": "t7eEcsbzqkLD" } }, { "cell_type": "markdown", "source": [ "Utilizing the bank dataset https://github.com/nurfnick/Data_Viz/blob/main/Data_Sets/bank.csv answer the following questions.\n", "\n", "1. What are the datatypes for each column. Comment on the appropriateness and change any easy/obvious ones.\n", "2. Using regular expresssions, remove the hyphen \"-\" from any job and replace it with a space.\n", "3. Convert any place where pdays is -1 to NaN.\n", "4. Fill the NaN in pdays with a zero.\n", "5. Compute the mean and median of balance when grouped by job.\n", "6. Add a column that is a datetime. Use year of 2020.\n", "7. Which day of the week (Monday, Tuesday, ect.) had the most approved loans, y = yes.\n", "8. Create an indicator column that shows if the person has more than \\$500 in their acocunt.\n", "9. Are divorced, admin people given loans at a higher rate?\n", "10. Which job is most likely to have a default on their record?\n", "11. Do highly educated people have more or less cellular phones than on average?\n", "12. What is the maritial status of the person with the maximum balance?\n", "12. Write another question based on your knowledge of the dataset.\n", "\n" ], "metadata": { "id": "imK8HrMtqn1A" } }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "yrE48wBEqjgD", "outputId": "25b2f248-94e7-48a0-f0b7-371c02f45314", "colab": { "base_uri": "https://localhost:8080/", "height": 677 } }, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " age job marital education default balance housing loan \\\n", "0 30 unemployed married primary no 1787 no no \n", "1 33 services married secondary no 4789 yes yes \n", "2 35 management single tertiary no 1350 yes no \n", "3 30 management married tertiary no 1476 yes yes \n", "4 59 blue-collar married secondary no 0 yes no \n", "5 35 management single tertiary no 747 no no \n", "6 36 self-employed married tertiary no 307 yes no \n", "7 39 technician married secondary no 147 yes no \n", "8 41 entrepreneur married tertiary no 221 yes no \n", "9 43 services married primary no -88 yes yes \n", "10 39 services married secondary no 9374 yes no \n", "11 43 admin. married secondary no 264 yes no \n", "12 36 technician married tertiary no 1109 no no \n", "13 20 student single secondary no 502 no no \n", "14 31 blue-collar married secondary no 360 yes yes \n", "15 40 management married tertiary no 194 no yes \n", "16 56 technician married secondary no 4073 no no \n", "17 37 admin. single tertiary no 2317 yes no \n", "18 25 blue-collar single primary no -221 yes no \n", "19 31 services married secondary no 132 no no \n", "\n", " contact day month duration campaign pdays previous poutcome y \n", "0 cellular 19 oct 79 1 -1 0 unknown no \n", "1 cellular 11 may 220 1 339 4 failure no \n", "2 cellular 16 apr 185 1 330 1 failure no \n", "3 unknown 3 jun 199 4 -1 0 unknown no \n", "4 unknown 5 may 226 1 -1 0 unknown no \n", "5 cellular 23 feb 141 2 176 3 failure no \n", "6 cellular 14 may 341 1 330 2 other no \n", "7 cellular 6 may 151 2 -1 0 unknown no \n", "8 unknown 14 may 57 2 -1 0 unknown no \n", "9 cellular 17 apr 313 1 147 2 failure no \n", "10 unknown 20 may 273 1 -1 0 unknown no \n", "11 cellular 17 apr 113 2 -1 0 unknown no \n", "12 cellular 13 aug 328 2 -1 0 unknown no \n", "13 cellular 30 apr 261 1 -1 0 unknown yes \n", "14 cellular 29 jan 89 1 241 1 failure no \n", "15 cellular 29 aug 189 2 -1 0 unknown no \n", "16 cellular 27 aug 239 5 -1 0 unknown no \n", "17 cellular 20 apr 114 1 152 2 failure no \n", "18 unknown 23 may 250 1 -1 0 unknown no \n", "19 cellular 7 jul 148 1 152 1 other no " ], "text/html": [ "\n", "
\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
agejobmaritaleducationdefaultbalancehousingloancontactdaymonthdurationcampaignpdayspreviouspoutcomey
030unemployedmarriedprimaryno1787nonocellular19oct791-10unknownno
133servicesmarriedsecondaryno4789yesyescellular11may22013394failureno
235managementsingletertiaryno1350yesnocellular16apr18513301failureno
330managementmarriedtertiaryno1476yesyesunknown3jun1994-10unknownno
459blue-collarmarriedsecondaryno0yesnounknown5may2261-10unknownno
535managementsingletertiaryno747nonocellular23feb14121763failureno
636self-employedmarriedtertiaryno307yesnocellular14may34113302otherno
739technicianmarriedsecondaryno147yesnocellular6may1512-10unknownno
841entrepreneurmarriedtertiaryno221yesnounknown14may572-10unknownno
943servicesmarriedprimaryno-88yesyescellular17apr31311472failureno
1039servicesmarriedsecondaryno9374yesnounknown20may2731-10unknownno
1143admin.marriedsecondaryno264yesnocellular17apr1132-10unknownno
1236technicianmarriedtertiaryno1109nonocellular13aug3282-10unknownno
1320studentsinglesecondaryno502nonocellular30apr2611-10unknownyes
1431blue-collarmarriedsecondaryno360yesyescellular29jan8912411failureno
1540managementmarriedtertiaryno194noyescellular29aug1892-10unknownno
1656technicianmarriedsecondaryno4073nonocellular27aug2395-10unknownno
1737admin.singletertiaryno2317yesnocellular20apr11411522failureno
1825blue-collarsingleprimaryno-221yesnounknown23may2501-10unknownno
1931servicesmarriedsecondaryno132nonocellular7jul14811521otherno
\n", "
\n", " \n", " \n", " \n", "\n", " \n", "
\n", "
\n", " " ] }, "metadata": {}, "execution_count": 2 } ], "source": [ "import pandas as pa\n", "\n", "df = pa.read_csv('https://raw.githubusercontent.com/nurfnick/Data_Viz/main/Data_Sets/bank.csv')\n", "\n", "df.head(20)" ] }, { "cell_type": "markdown", "source": [ "## Data Types" ], "metadata": { "id": "VW2njdqvIzjO" } }, { "cell_type": "code", "source": [ "df.dtypes" ], "metadata": { "id": "i52faT5kyFFA", "outputId": "0319e909-e5f7-4c28-f45c-88a344bc1ec0", "colab": { "base_uri": "https://localhost:8080/" } }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "age int64\n", "job object\n", "marital object\n", "education object\n", "default object\n", "balance int64\n", "housing object\n", "loan object\n", "contact object\n", "day int64\n", "month object\n", "duration int64\n", "campaign int64\n", "pdays int64\n", "previous int64\n", "poutcome object\n", "y object\n", "dtype: object" ] }, "metadata": {}, "execution_count": 3 } ] }, { "cell_type": "markdown", "source": [ "I'll change the `y` column to booleen. " ], "metadata": { "id": "Mn5NK1PrI2jt" } }, { "cell_type": "code", "source": [ "df.y = df.y=='yes'" ], "metadata": { "id": "wlUp6R3tyS1g" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "df.dtypes" ], "metadata": { "id": "ImP9ZpO7yv1T", "outputId": "b777418c-0c41-44a1-f2eb-c04d798ea5f7", "colab": { "base_uri": "https://localhost:8080/" } }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "age int64\n", "job object\n", "marital object\n", "education object\n", "default object\n", "balance int64\n", "housing object\n", "loan object\n", "contact object\n", "day int64\n", "month object\n", "duration int64\n", "campaign int64\n", "pdays int64\n", "previous int64\n", "poutcome object\n", "y bool\n", "dtype: object" ] }, "metadata": {}, "execution_count": 5 } ] }, { "cell_type": "markdown", "source": [ "## Regular Epressions" ], "metadata": { "id": "JuZtwLhdI8hd" } }, { "cell_type": "markdown", "source": [ "" ], "metadata": { "id": "j4-T_jT0JBvE" } }, { "cell_type": "code", "source": [ "import re\n", "\n", "re.sub(r\"-\",\" \",df.job[6])" ], "metadata": { "id": "pRJt39Gzy-xV", "outputId": "52dc3010-a664-44af-e1cc-842c52e552c9", "colab": { "base_uri": "https://localhost:8080/", "height": 36 } }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "'self employed'" ], "application/vnd.google.colaboratory.intrinsic+json": { "type": "string" } }, "metadata": {}, "execution_count": 6 } ] }, { "cell_type": "markdown", "source": [ "The above code successfully removes the '-' so I just need to apply it to the entire column and not forget to replace the current column." ], "metadata": { "id": "ZJ2uo9o5JCzN" } }, { "cell_type": "code", "source": [ "df.job.apply(lambda x: re.sub(r\"-\",\" \",x))" ], "metadata": { "id": "C8LVhE7RzanH", "outputId": "4a8992de-cf80-4780-c088-0e8eeacd0618", "colab": { "base_uri": "https://localhost:8080/" } }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "0 unemployed\n", "1 services\n", "2 management\n", "3 management\n", "4 blue collar\n", " ... \n", "4516 services\n", "4517 self employed\n", "4518 technician\n", "4519 blue collar\n", "4520 entrepreneur\n", "Name: job, Length: 4521, dtype: object" ] }, "metadata": {}, "execution_count": 7 } ] }, { "cell_type": "code", "source": [ "df.job = df.job.apply(lambda x: re.sub(r\"-\",\" \",x))" ], "metadata": { "id": "f4sxV85z0aA9" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "## Replace With NaN" ], "metadata": { "id": "QtGaxFUeJOw-" } }, { "cell_type": "markdown", "source": [ "I needed `numpy` to do this. This was a silly exercise but still a nice thing to think about." ], "metadata": { "id": "O0MQMBVMJSEE" } }, { "cell_type": "code", "source": [ "import numpy as np\n", "\n", "df.pdays.replace(-1,np.nan)" ], "metadata": { "id": "qFgsR5ks0s9R", "outputId": "287e0a14-af9b-443a-dd4e-7b177db9576d", "colab": { "base_uri": "https://localhost:8080/" } }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "0 NaN\n", "1 339.0\n", "2 330.0\n", "3 NaN\n", "4 NaN\n", " ... \n", "4516 NaN\n", "4517 NaN\n", "4518 NaN\n", "4519 211.0\n", "4520 249.0\n", "Name: pdays, Length: 4521, dtype: float64" ] }, "metadata": {}, "execution_count": 9 } ] }, { "cell_type": "code", "source": [ "df.pdays = df.pdays.replace(-1,np.nan)" ], "metadata": { "id": "N8WLmYDd0x1f" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "## Fill NaN" ], "metadata": { "id": "TKQ61L_yJcLF" } }, { "cell_type": "markdown", "source": [ "Now I immediately undo what I just did." ], "metadata": { "id": "GsuPpG6GJfDT" } }, { "cell_type": "code", "source": [ "df.pdays = df.pdays.fillna(0)" ], "metadata": { "id": "zUXjXGwf103o" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "## Group By" ], "metadata": { "id": "g-7OHLeiJjBb" } }, { "cell_type": "code", "source": [ "df.groupby('job').balance.agg(['mean','median','count','std'])" ], "metadata": { "id": "wOvnu3vB2Omo", "outputId": "e1a98074-82f8-4a96-aaeb-6a09eaa8551f", "colab": { "base_uri": "https://localhost:8080/", "height": 457 } }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " mean median count std\n", "job \n", "admin. 1226.736402 430.0 478 2370.119128\n", "blue collar 1085.161734 408.5 946 2040.218220\n", "entrepreneur 1645.125000 365.5 168 4441.303620\n", "housemaid 2083.803571 296.5 112 4603.836647\n", "management 1766.928793 577.0 969 3267.733077\n", "retired 2319.191304 672.5 230 5846.379889\n", "self employed 1392.409836 483.0 183 2479.640999\n", "services 1103.956835 288.0 417 2445.239976\n", "student 1543.821429 422.5 84 2579.886671\n", "technician 1330.996094 434.5 768 2630.253390\n", "unemployed 1089.421875 473.5 128 1692.267628\n", "unknown 1501.710526 655.5 38 1957.258258" ], "text/html": [ "\n", "
\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
meanmediancountstd
job
admin.1226.736402430.04782370.119128
blue collar1085.161734408.59462040.218220
entrepreneur1645.125000365.51684441.303620
housemaid2083.803571296.51124603.836647
management1766.928793577.09693267.733077
retired2319.191304672.52305846.379889
self employed1392.409836483.01832479.640999
services1103.956835288.04172445.239976
student1543.821429422.5842579.886671
technician1330.996094434.57682630.253390
unemployed1089.421875473.51281692.267628
unknown1501.710526655.5381957.258258
\n", "
\n", " \n", " \n", " \n", "\n", " \n", "
\n", "
\n", " " ] }, "metadata": {}, "execution_count": 12 } ] }, { "cell_type": "markdown", "source": [ "## Dates" ], "metadata": { "id": "Eu_Oieu5JnML" } }, { "cell_type": "markdown", "source": [ "This requires some conversion of the months to numerical." ], "metadata": { "id": "UM_o-sEuJqGP" } }, { "cell_type": "code", "source": [ "monthConvert = {\n", " 'jan':1,\n", " 'feb':2,\n", " 'mar':3,\n", " 'apr':4,\n", " 'may':5,\n", " 'jun':6,\n", " 'jul':7,\n", " 'aug':8,\n", " 'sep':9,\n", " 'oct':10,\n", " 'nov':11,\n", " 'dec':12\n", "}" ], "metadata": { "id": "773I6vJo35fr" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "This next line actually creates the datetime column." ], "metadata": { "id": "_kEBJb9RJwkj" } }, { "cell_type": "code", "source": [ "date = pa.to_datetime(dict(year = 2020, day = df.day, month = df.month.map(monthConvert)))" ], "metadata": { "id": "tNy1t49Z2kk6" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "I'll add it to my dataframe. Adding a new column requires `concat` and creating a `DataFrame` from the series. \n", "\n", "I struggled in class to do this because I needed a dictinary with the coulmn name..." ], "metadata": { "id": "R7ujPmpSKGZQ" } }, { "cell_type": "code", "source": [ "df = pa.concat([df,pa.DataFrame( {'date':date})],axis = 1)\n", "\n", "df.head()" ], "metadata": { "id": "2ptVqF2L3uVI", "outputId": "c10c0508-6577-42e4-fcef-a08562149882", "colab": { "base_uri": "https://localhost:8080/", "height": 206 } }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " age job marital education default balance housing loan \\\n", "0 30 unemployed married primary no 1787 no no \n", "1 33 services married secondary no 4789 yes yes \n", "2 35 management single tertiary no 1350 yes no \n", "3 30 management married tertiary no 1476 yes yes \n", "4 59 blue collar married secondary no 0 yes no \n", "\n", " contact day month duration campaign pdays previous poutcome y \\\n", "0 cellular 19 oct 79 1 0.0 0 unknown False \n", "1 cellular 11 may 220 1 339.0 4 failure False \n", "2 cellular 16 apr 185 1 330.0 1 failure False \n", "3 unknown 3 jun 199 4 0.0 0 unknown False \n", "4 unknown 5 may 226 1 0.0 0 unknown False \n", "\n", " date \n", "0 2020-10-19 \n", "1 2020-05-11 \n", "2 2020-04-16 \n", "3 2020-06-03 \n", "4 2020-05-05 " ], "text/html": [ "\n", "
\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
agejobmaritaleducationdefaultbalancehousingloancontactdaymonthdurationcampaignpdayspreviouspoutcomeydate
030unemployedmarriedprimaryno1787nonocellular19oct7910.00unknownFalse2020-10-19
133servicesmarriedsecondaryno4789yesyescellular11may2201339.04failureFalse2020-05-11
235managementsingletertiaryno1350yesnocellular16apr1851330.01failureFalse2020-04-16
330managementmarriedtertiaryno1476yesyesunknown3jun19940.00unknownFalse2020-06-03
459blue collarmarriedsecondaryno0yesnounknown5may22610.00unknownFalse2020-05-05
\n", "
\n", " \n", " \n", " \n", "\n", " \n", "
\n", "
\n", " " ] }, "metadata": {}, "execution_count": 15 } ] }, { "cell_type": "markdown", "source": [ "## Day of the Week" ], "metadata": { "id": "WpuDmX58LIgp" } }, { "cell_type": "markdown", "source": [ "Here is one way to do it." ], "metadata": { "id": "0iZAhPpyMBVZ" } }, { "cell_type": "code", "source": [ "pa.crosstab(df.date.dt.day_name(),df.y)" ], "metadata": { "id": "XSEiP4XFLLLe", "outputId": "4269acf7-a238-4cbd-f8b7-56d007be5318", "colab": { "base_uri": "https://localhost:8080/", "height": 300 } }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "y False True\n", "date \n", "Friday 705 81\n", "Monday 295 70\n", "Saturday 491 36\n", "Sunday 77 29\n", "Thursday 846 105\n", "Tuesday 753 107\n", "Wednesday 833 93" ], "text/html": [ "\n", "
\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
yFalseTrue
date
Friday70581
Monday29570
Saturday49136
Sunday7729
Thursday846105
Tuesday753107
Wednesday83393
\n", "
\n", " \n", " \n", " \n", "\n", " \n", "
\n", "
\n", " " ] }, "metadata": {}, "execution_count": 16 } ] }, { "cell_type": "markdown", "source": [ "Another way might be with `groupby`." ], "metadata": { "id": "hvMgZ52CMD5V" } }, { "cell_type": "code", "source": [ "df.groupby([date.dt.day_name(),df.y]).age.agg('count')" ], "metadata": { "id": "d-HlspRbMKc7", "outputId": "587e33ce-79ee-450e-9f4c-fc7430595c65", "colab": { "base_uri": "https://localhost:8080/" } }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " y \n", "Friday False 705\n", " True 81\n", "Monday False 295\n", " True 70\n", "Saturday False 491\n", " True 36\n", "Sunday False 77\n", " True 29\n", "Thursday False 846\n", " True 105\n", "Tuesday False 753\n", " True 107\n", "Wednesday False 833\n", " True 93\n", "Name: age, dtype: int64" ] }, "metadata": {}, "execution_count": 17 } ] }, { "cell_type": "markdown", "source": [ "I think you could look at this by doing some subsetting of the data too." ], "metadata": { "id": "kYl2k68hMnE8" } }, { "cell_type": "code", "source": [ "df[df.y == True].groupby(df.date.dt.day_name()).y.agg('count')" ], "metadata": { "id": "scRcs_8_Msl7", "outputId": "c984affb-a184-4ad1-f0d9-8bff23ab44ca", "colab": { "base_uri": "https://localhost:8080/" } }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "date\n", "Friday 81\n", "Monday 70\n", "Saturday 36\n", "Sunday 29\n", "Thursday 105\n", "Tuesday 107\n", "Wednesday 93\n", "Name: y, dtype: int64" ] }, "metadata": {}, "execution_count": 18 } ] }, { "cell_type": "markdown", "source": [ "This one was rather difficult. Finding the day names was what I was after!" ], "metadata": { "id": "Bah__FShNAbU" } }, { "cell_type": "markdown", "source": [ "## Indicator" ], "metadata": { "id": "eiFqGX3GNKu7" } }, { "cell_type": "code", "source": [ "bal = df.balance > 500\n", "pa.concat([df,pa.DataFrame({'FiveHundo':bal})], axis = 1)" ], "metadata": { "id": "WpoEr_BXNNnU", "outputId": "500d7c8a-99cb-43c4-b536-cba4c925a5cd", "colab": { "base_uri": "https://localhost:8080/", "height": 424 } }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " age job marital education default balance housing loan \\\n", "0 30 unemployed married primary no 1787 no no \n", "1 33 services married secondary no 4789 yes yes \n", "2 35 management single tertiary no 1350 yes no \n", "3 30 management married tertiary no 1476 yes yes \n", "4 59 blue collar married secondary no 0 yes no \n", "... ... ... ... ... ... ... ... ... \n", "4516 33 services married secondary no -333 yes no \n", "4517 57 self employed married tertiary yes -3313 yes yes \n", "4518 57 technician married secondary no 295 no no \n", "4519 28 blue collar married secondary no 1137 no no \n", "4520 44 entrepreneur single tertiary no 1136 yes yes \n", "\n", " contact day month duration campaign pdays previous poutcome \\\n", "0 cellular 19 oct 79 1 0.0 0 unknown \n", "1 cellular 11 may 220 1 339.0 4 failure \n", "2 cellular 16 apr 185 1 330.0 1 failure \n", "3 unknown 3 jun 199 4 0.0 0 unknown \n", "4 unknown 5 may 226 1 0.0 0 unknown \n", "... ... ... ... ... ... ... ... ... \n", "4516 cellular 30 jul 329 5 0.0 0 unknown \n", "4517 unknown 9 may 153 1 0.0 0 unknown \n", "4518 cellular 19 aug 151 11 0.0 0 unknown \n", "4519 cellular 6 feb 129 4 211.0 3 other \n", "4520 cellular 3 apr 345 2 249.0 7 other \n", "\n", " y date FiveHundo \n", "0 False 2020-10-19 True \n", "1 False 2020-05-11 True \n", "2 False 2020-04-16 True \n", "3 False 2020-06-03 True \n", "4 False 2020-05-05 False \n", "... ... ... ... \n", "4516 False 2020-07-30 False \n", "4517 False 2020-05-09 False \n", "4518 False 2020-08-19 False \n", "4519 False 2020-02-06 True \n", "4520 False 2020-04-03 True \n", "\n", "[4521 rows x 19 columns]" ], "text/html": [ "\n", "
\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
agejobmaritaleducationdefaultbalancehousingloancontactdaymonthdurationcampaignpdayspreviouspoutcomeydateFiveHundo
030unemployedmarriedprimaryno1787nonocellular19oct7910.00unknownFalse2020-10-19True
133servicesmarriedsecondaryno4789yesyescellular11may2201339.04failureFalse2020-05-11True
235managementsingletertiaryno1350yesnocellular16apr1851330.01failureFalse2020-04-16True
330managementmarriedtertiaryno1476yesyesunknown3jun19940.00unknownFalse2020-06-03True
459blue collarmarriedsecondaryno0yesnounknown5may22610.00unknownFalse2020-05-05False
............................................................
451633servicesmarriedsecondaryno-333yesnocellular30jul32950.00unknownFalse2020-07-30False
451757self employedmarriedtertiaryyes-3313yesyesunknown9may15310.00unknownFalse2020-05-09False
451857technicianmarriedsecondaryno295nonocellular19aug151110.00unknownFalse2020-08-19False
451928blue collarmarriedsecondaryno1137nonocellular6feb1294211.03otherFalse2020-02-06True
452044entrepreneursingletertiaryno1136yesyescellular3apr3452249.07otherFalse2020-04-03True
\n", "

4521 rows × 19 columns

\n", "
\n", " \n", " \n", " \n", "\n", " \n", "
\n", "
\n", " " ] }, "metadata": {}, "execution_count": 19 } ] }, { "cell_type": "markdown", "source": [ "## Subsetting Data" ], "metadata": { "id": "udVPISbwNyz3" } }, { "cell_type": "code", "source": [ "df[(df.job == 'admin.')&(df.marital == 'divorced')]" ], "metadata": { "id": "DZOtydbTN1F_", "outputId": "a441dfeb-6dc2-45ee-b098-dda2d14326bf", "colab": { "base_uri": "https://localhost:8080/", "height": 424 } }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " age job marital education default balance housing loan \\\n", "35 42 admin. divorced secondary no 1811 yes no \n", "80 27 admin. divorced secondary no 451 yes no \n", "120 31 admin. divorced secondary no 1890 yes no \n", "183 40 admin. divorced secondary no 6 no no \n", "283 42 admin. divorced secondary no 63 no no \n", "... ... ... ... ... ... ... ... ... \n", "4128 34 admin. divorced secondary no 1268 yes no \n", "4244 39 admin. divorced secondary no 83 yes no \n", "4282 34 admin. divorced secondary no -251 no no \n", "4407 45 admin. divorced secondary no 59 yes no \n", "4438 52 admin. divorced secondary no 98 no yes \n", "\n", " contact day month duration campaign pdays previous poutcome \\\n", "35 unknown 14 may 150 1 0.0 0 unknown \n", "80 cellular 16 jul 652 1 0.0 0 unknown \n", "120 cellular 21 jul 588 1 0.0 0 unknown \n", "183 unknown 11 jun 140 1 0.0 0 unknown \n", "283 unknown 16 may 88 6 0.0 0 unknown \n", "... ... ... ... ... ... ... ... ... \n", "4128 cellular 15 may 259 3 0.0 0 unknown \n", "4244 cellular 30 jul 69 6 0.0 0 unknown \n", "4282 cellular 18 jul 641 1 0.0 0 unknown \n", "4407 cellular 24 jul 873 8 0.0 0 unknown \n", "4438 unknown 19 jun 148 1 0.0 0 unknown \n", "\n", " y date \n", "35 False 2020-05-14 \n", "80 True 2020-07-16 \n", "120 False 2020-07-21 \n", "183 False 2020-06-11 \n", "283 False 2020-05-16 \n", "... ... ... \n", "4128 False 2020-05-15 \n", "4244 False 2020-07-30 \n", "4282 True 2020-07-18 \n", "4407 True 2020-07-24 \n", "4438 False 2020-06-19 \n", "\n", "[69 rows x 18 columns]" ], "text/html": [ "\n", "
\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
agejobmaritaleducationdefaultbalancehousingloancontactdaymonthdurationcampaignpdayspreviouspoutcomeydate
3542admin.divorcedsecondaryno1811yesnounknown14may15010.00unknownFalse2020-05-14
8027admin.divorcedsecondaryno451yesnocellular16jul65210.00unknownTrue2020-07-16
12031admin.divorcedsecondaryno1890yesnocellular21jul58810.00unknownFalse2020-07-21
18340admin.divorcedsecondaryno6nonounknown11jun14010.00unknownFalse2020-06-11
28342admin.divorcedsecondaryno63nonounknown16may8860.00unknownFalse2020-05-16
.........................................................
412834admin.divorcedsecondaryno1268yesnocellular15may25930.00unknownFalse2020-05-15
424439admin.divorcedsecondaryno83yesnocellular30jul6960.00unknownFalse2020-07-30
428234admin.divorcedsecondaryno-251nonocellular18jul64110.00unknownTrue2020-07-18
440745admin.divorcedsecondaryno59yesnocellular24jul87380.00unknownTrue2020-07-24
443852admin.divorcedsecondaryno98noyesunknown19jun14810.00unknownFalse2020-06-19
\n", "

69 rows × 18 columns

\n", "
\n", " \n", " \n", " \n", "\n", " \n", "
\n", "
\n", " " ] }, "metadata": {}, "execution_count": 20 } ] }, { "cell_type": "markdown", "source": [ "Here are those folks! Let's see what their stats are for getting approved." ], "metadata": { "id": "aekEV7TDOG-f" } }, { "cell_type": "code", "source": [ "df[(df.job == 'admin.')&(df.marital == 'divorced')].groupby('y').age.agg('count')" ], "metadata": { "id": "l-3hGgSNOla_", "outputId": "fcc0f6f8-1c20-4d0e-8652-0feff2054b9e", "colab": { "base_uri": "https://localhost:8080/" } }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "y\n", "False 58\n", "True 11\n", "Name: age, dtype: int64" ] }, "metadata": {}, "execution_count": 21 } ] }, { "cell_type": "markdown", "source": [ "## Another Group By?" ], "metadata": { "id": "RYveWK1yOvA9" } }, { "cell_type": "code", "source": [ "df.groupby(['job','default']).age.agg('count')" ], "metadata": { "id": "WMLFrD3oO2IE", "outputId": "fc81f3d1-18ce-4ec7-a660-ab78e1f5059f", "colab": { "base_uri": "https://localhost:8080/" } }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "job default\n", "admin. no 472\n", " yes 6\n", "blue collar no 932\n", " yes 14\n", "entrepreneur no 161\n", " yes 7\n", "housemaid no 110\n", " yes 2\n", "management no 955\n", " yes 14\n", "retired no 227\n", " yes 3\n", "self employed no 179\n", " yes 4\n", "services no 410\n", " yes 7\n", "student no 83\n", " yes 1\n", "technician no 753\n", " yes 15\n", "unemployed no 125\n", " yes 3\n", "unknown no 38\n", "Name: age, dtype: int64" ] }, "metadata": {}, "execution_count": 22 } ] }, { "cell_type": "markdown", "source": [ "## Two Way Tables" ], "metadata": { "id": "GJdSOASXKz5w" } }, { "cell_type": "markdown", "source": [ "Below is how I looked at number 11 in class." ], "metadata": { "id": "m70dYyzkK2Df" } }, { "cell_type": "code", "source": [ "df.groupby(['education','contact']).age.agg('count')" ], "metadata": { "id": "XsY7lm104trK", "outputId": "39853125-a4dd-4f27-e955-8e756d0547c8", "colab": { "base_uri": "https://localhost:8080/" } }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "education contact \n", "primary cellular 343\n", " telephone 75\n", " unknown 260\n", "secondary cellular 1446\n", " telephone 141\n", " unknown 719\n", "tertiary cellular 1012\n", " telephone 66\n", " unknown 272\n", "unknown cellular 95\n", " telephone 19\n", " unknown 73\n", "Name: age, dtype: int64" ] }, "metadata": {}, "execution_count": 23 } ] }, { "cell_type": "markdown", "source": [ "Below is a nicer way to create a two-way table with `pandas.crosstab`" ], "metadata": { "id": "YCaQ04U4K7fE" } }, { "cell_type": "code", "source": [ "pa.crosstab(df.education,df.contact)" ], "metadata": { "id": "W-XqwY1q7R3T", "outputId": "53b2d61a-ffb2-4171-af51-24a3a5cd5c0b", "colab": { "base_uri": "https://localhost:8080/", "height": 206 } }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "contact cellular telephone unknown\n", "education \n", "primary 343 75 260\n", "secondary 1446 141 719\n", "tertiary 1012 66 272\n", "unknown 95 19 73" ], "text/html": [ "\n", "
\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
contactcellulartelephoneunknown
education
primary34375260
secondary1446141719
tertiary101266272
unknown951973
\n", "
\n", " \n", " \n", " \n", "\n", " \n", "
\n", "
\n", " " ] }, "metadata": {}, "execution_count": 24 } ] }, { "cell_type": "markdown", "source": [ "## Maximum" ], "metadata": { "id": "0WAPUSe3ABXV" } }, { "cell_type": "code", "source": [ "df[df.balance == max(df.balance)].marital" ], "metadata": { "id": "H59_o5zMKu3o", "outputId": "f32080c7-96ce-41df-f19c-420a5f4d9c2e", "colab": { "base_uri": "https://localhost:8080/" } }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "3700 married\n", "Name: marital, dtype: object" ] }, "metadata": {}, "execution_count": 29 } ] }, { "cell_type": "code", "source": [ "" ], "metadata": { "id": "pGHzhi3BAGyf" }, "execution_count": null, "outputs": [] } ] }