{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "Untitled91.ipynb",
"provenance": [],
"authorship_tag": "ABX9TyMtZvtRWoKAOqz3q5mD4b+X",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
" "
]
},
{
"cell_type": "markdown",
"source": [
"# Exam 2 Review"
],
"metadata": {
"id": "t7eEcsbzqkLD"
}
},
{
"cell_type": "markdown",
"source": [
"Utilizing the bank dataset https://github.com/nurfnick/Data_Viz/blob/main/Data_Sets/bank.csv answer the following questions.\n",
"\n",
"1. What are the datatypes for each column. Comment on the appropriateness and change any easy/obvious ones.\n",
"2. Using regular expresssions, remove the hyphen \"-\" from any job and replace it with a space.\n",
"3. Convert any place where pdays is -1 to NaN.\n",
"4. Fill the NaN in pdays with a zero.\n",
"5. Compute the mean and median of balance when grouped by job.\n",
"6. Add a column that is a datetime. Use year of 2020.\n",
"7. Which day of the week (Monday, Tuesday, ect.) had the most approved loans, y = yes.\n",
"8. Create an indicator column that shows if the person has more than \\$500 in their acocunt.\n",
"9. Are divorced, admin people given loans at a higher rate?\n",
"10. Which job is most likely to have a default on their record?\n",
"11. Do highly educated people have more or less cellular phones than on average?\n",
"12. What is the maritial status of the person with the maximum balance?\n",
"12. Write another question based on your knowledge of the dataset.\n",
"\n"
],
"metadata": {
"id": "imK8HrMtqn1A"
}
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "yrE48wBEqjgD",
"outputId": "25b2f248-94e7-48a0-f0b7-371c02f45314",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 677
}
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" age job marital education default balance housing loan \\\n",
"0 30 unemployed married primary no 1787 no no \n",
"1 33 services married secondary no 4789 yes yes \n",
"2 35 management single tertiary no 1350 yes no \n",
"3 30 management married tertiary no 1476 yes yes \n",
"4 59 blue-collar married secondary no 0 yes no \n",
"5 35 management single tertiary no 747 no no \n",
"6 36 self-employed married tertiary no 307 yes no \n",
"7 39 technician married secondary no 147 yes no \n",
"8 41 entrepreneur married tertiary no 221 yes no \n",
"9 43 services married primary no -88 yes yes \n",
"10 39 services married secondary no 9374 yes no \n",
"11 43 admin. married secondary no 264 yes no \n",
"12 36 technician married tertiary no 1109 no no \n",
"13 20 student single secondary no 502 no no \n",
"14 31 blue-collar married secondary no 360 yes yes \n",
"15 40 management married tertiary no 194 no yes \n",
"16 56 technician married secondary no 4073 no no \n",
"17 37 admin. single tertiary no 2317 yes no \n",
"18 25 blue-collar single primary no -221 yes no \n",
"19 31 services married secondary no 132 no no \n",
"\n",
" contact day month duration campaign pdays previous poutcome y \n",
"0 cellular 19 oct 79 1 -1 0 unknown no \n",
"1 cellular 11 may 220 1 339 4 failure no \n",
"2 cellular 16 apr 185 1 330 1 failure no \n",
"3 unknown 3 jun 199 4 -1 0 unknown no \n",
"4 unknown 5 may 226 1 -1 0 unknown no \n",
"5 cellular 23 feb 141 2 176 3 failure no \n",
"6 cellular 14 may 341 1 330 2 other no \n",
"7 cellular 6 may 151 2 -1 0 unknown no \n",
"8 unknown 14 may 57 2 -1 0 unknown no \n",
"9 cellular 17 apr 313 1 147 2 failure no \n",
"10 unknown 20 may 273 1 -1 0 unknown no \n",
"11 cellular 17 apr 113 2 -1 0 unknown no \n",
"12 cellular 13 aug 328 2 -1 0 unknown no \n",
"13 cellular 30 apr 261 1 -1 0 unknown yes \n",
"14 cellular 29 jan 89 1 241 1 failure no \n",
"15 cellular 29 aug 189 2 -1 0 unknown no \n",
"16 cellular 27 aug 239 5 -1 0 unknown no \n",
"17 cellular 20 apr 114 1 152 2 failure no \n",
"18 unknown 23 may 250 1 -1 0 unknown no \n",
"19 cellular 7 jul 148 1 152 1 other no "
],
"text/html": [
"\n",
"
\n",
"
\n",
"
\n",
"\n",
"
\n",
" \n",
" \n",
" \n",
" age \n",
" job \n",
" marital \n",
" education \n",
" default \n",
" balance \n",
" housing \n",
" loan \n",
" contact \n",
" day \n",
" month \n",
" duration \n",
" campaign \n",
" pdays \n",
" previous \n",
" poutcome \n",
" y \n",
" \n",
" \n",
" \n",
" \n",
" 0 \n",
" 30 \n",
" unemployed \n",
" married \n",
" primary \n",
" no \n",
" 1787 \n",
" no \n",
" no \n",
" cellular \n",
" 19 \n",
" oct \n",
" 79 \n",
" 1 \n",
" -1 \n",
" 0 \n",
" unknown \n",
" no \n",
" \n",
" \n",
" 1 \n",
" 33 \n",
" services \n",
" married \n",
" secondary \n",
" no \n",
" 4789 \n",
" yes \n",
" yes \n",
" cellular \n",
" 11 \n",
" may \n",
" 220 \n",
" 1 \n",
" 339 \n",
" 4 \n",
" failure \n",
" no \n",
" \n",
" \n",
" 2 \n",
" 35 \n",
" management \n",
" single \n",
" tertiary \n",
" no \n",
" 1350 \n",
" yes \n",
" no \n",
" cellular \n",
" 16 \n",
" apr \n",
" 185 \n",
" 1 \n",
" 330 \n",
" 1 \n",
" failure \n",
" no \n",
" \n",
" \n",
" 3 \n",
" 30 \n",
" management \n",
" married \n",
" tertiary \n",
" no \n",
" 1476 \n",
" yes \n",
" yes \n",
" unknown \n",
" 3 \n",
" jun \n",
" 199 \n",
" 4 \n",
" -1 \n",
" 0 \n",
" unknown \n",
" no \n",
" \n",
" \n",
" 4 \n",
" 59 \n",
" blue-collar \n",
" married \n",
" secondary \n",
" no \n",
" 0 \n",
" yes \n",
" no \n",
" unknown \n",
" 5 \n",
" may \n",
" 226 \n",
" 1 \n",
" -1 \n",
" 0 \n",
" unknown \n",
" no \n",
" \n",
" \n",
" 5 \n",
" 35 \n",
" management \n",
" single \n",
" tertiary \n",
" no \n",
" 747 \n",
" no \n",
" no \n",
" cellular \n",
" 23 \n",
" feb \n",
" 141 \n",
" 2 \n",
" 176 \n",
" 3 \n",
" failure \n",
" no \n",
" \n",
" \n",
" 6 \n",
" 36 \n",
" self-employed \n",
" married \n",
" tertiary \n",
" no \n",
" 307 \n",
" yes \n",
" no \n",
" cellular \n",
" 14 \n",
" may \n",
" 341 \n",
" 1 \n",
" 330 \n",
" 2 \n",
" other \n",
" no \n",
" \n",
" \n",
" 7 \n",
" 39 \n",
" technician \n",
" married \n",
" secondary \n",
" no \n",
" 147 \n",
" yes \n",
" no \n",
" cellular \n",
" 6 \n",
" may \n",
" 151 \n",
" 2 \n",
" -1 \n",
" 0 \n",
" unknown \n",
" no \n",
" \n",
" \n",
" 8 \n",
" 41 \n",
" entrepreneur \n",
" married \n",
" tertiary \n",
" no \n",
" 221 \n",
" yes \n",
" no \n",
" unknown \n",
" 14 \n",
" may \n",
" 57 \n",
" 2 \n",
" -1 \n",
" 0 \n",
" unknown \n",
" no \n",
" \n",
" \n",
" 9 \n",
" 43 \n",
" services \n",
" married \n",
" primary \n",
" no \n",
" -88 \n",
" yes \n",
" yes \n",
" cellular \n",
" 17 \n",
" apr \n",
" 313 \n",
" 1 \n",
" 147 \n",
" 2 \n",
" failure \n",
" no \n",
" \n",
" \n",
" 10 \n",
" 39 \n",
" services \n",
" married \n",
" secondary \n",
" no \n",
" 9374 \n",
" yes \n",
" no \n",
" unknown \n",
" 20 \n",
" may \n",
" 273 \n",
" 1 \n",
" -1 \n",
" 0 \n",
" unknown \n",
" no \n",
" \n",
" \n",
" 11 \n",
" 43 \n",
" admin. \n",
" married \n",
" secondary \n",
" no \n",
" 264 \n",
" yes \n",
" no \n",
" cellular \n",
" 17 \n",
" apr \n",
" 113 \n",
" 2 \n",
" -1 \n",
" 0 \n",
" unknown \n",
" no \n",
" \n",
" \n",
" 12 \n",
" 36 \n",
" technician \n",
" married \n",
" tertiary \n",
" no \n",
" 1109 \n",
" no \n",
" no \n",
" cellular \n",
" 13 \n",
" aug \n",
" 328 \n",
" 2 \n",
" -1 \n",
" 0 \n",
" unknown \n",
" no \n",
" \n",
" \n",
" 13 \n",
" 20 \n",
" student \n",
" single \n",
" secondary \n",
" no \n",
" 502 \n",
" no \n",
" no \n",
" cellular \n",
" 30 \n",
" apr \n",
" 261 \n",
" 1 \n",
" -1 \n",
" 0 \n",
" unknown \n",
" yes \n",
" \n",
" \n",
" 14 \n",
" 31 \n",
" blue-collar \n",
" married \n",
" secondary \n",
" no \n",
" 360 \n",
" yes \n",
" yes \n",
" cellular \n",
" 29 \n",
" jan \n",
" 89 \n",
" 1 \n",
" 241 \n",
" 1 \n",
" failure \n",
" no \n",
" \n",
" \n",
" 15 \n",
" 40 \n",
" management \n",
" married \n",
" tertiary \n",
" no \n",
" 194 \n",
" no \n",
" yes \n",
" cellular \n",
" 29 \n",
" aug \n",
" 189 \n",
" 2 \n",
" -1 \n",
" 0 \n",
" unknown \n",
" no \n",
" \n",
" \n",
" 16 \n",
" 56 \n",
" technician \n",
" married \n",
" secondary \n",
" no \n",
" 4073 \n",
" no \n",
" no \n",
" cellular \n",
" 27 \n",
" aug \n",
" 239 \n",
" 5 \n",
" -1 \n",
" 0 \n",
" unknown \n",
" no \n",
" \n",
" \n",
" 17 \n",
" 37 \n",
" admin. \n",
" single \n",
" tertiary \n",
" no \n",
" 2317 \n",
" yes \n",
" no \n",
" cellular \n",
" 20 \n",
" apr \n",
" 114 \n",
" 1 \n",
" 152 \n",
" 2 \n",
" failure \n",
" no \n",
" \n",
" \n",
" 18 \n",
" 25 \n",
" blue-collar \n",
" single \n",
" primary \n",
" no \n",
" -221 \n",
" yes \n",
" no \n",
" unknown \n",
" 23 \n",
" may \n",
" 250 \n",
" 1 \n",
" -1 \n",
" 0 \n",
" unknown \n",
" no \n",
" \n",
" \n",
" 19 \n",
" 31 \n",
" services \n",
" married \n",
" secondary \n",
" no \n",
" 132 \n",
" no \n",
" no \n",
" cellular \n",
" 7 \n",
" jul \n",
" 148 \n",
" 1 \n",
" 152 \n",
" 1 \n",
" other \n",
" no \n",
" \n",
" \n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
"\n",
" \n",
"
\n",
"
\n",
" "
]
},
"metadata": {},
"execution_count": 2
}
],
"source": [
"import pandas as pa\n",
"\n",
"df = pa.read_csv('https://raw.githubusercontent.com/nurfnick/Data_Viz/main/Data_Sets/bank.csv')\n",
"\n",
"df.head(20)"
]
},
{
"cell_type": "markdown",
"source": [
"## Data Types"
],
"metadata": {
"id": "VW2njdqvIzjO"
}
},
{
"cell_type": "code",
"source": [
"df.dtypes"
],
"metadata": {
"id": "i52faT5kyFFA",
"outputId": "0319e909-e5f7-4c28-f45c-88a344bc1ec0",
"colab": {
"base_uri": "https://localhost:8080/"
}
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"age int64\n",
"job object\n",
"marital object\n",
"education object\n",
"default object\n",
"balance int64\n",
"housing object\n",
"loan object\n",
"contact object\n",
"day int64\n",
"month object\n",
"duration int64\n",
"campaign int64\n",
"pdays int64\n",
"previous int64\n",
"poutcome object\n",
"y object\n",
"dtype: object"
]
},
"metadata": {},
"execution_count": 3
}
]
},
{
"cell_type": "markdown",
"source": [
"I'll change the `y` column to booleen. "
],
"metadata": {
"id": "Mn5NK1PrI2jt"
}
},
{
"cell_type": "code",
"source": [
"df.y = df.y=='yes'"
],
"metadata": {
"id": "wlUp6R3tyS1g"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"df.dtypes"
],
"metadata": {
"id": "ImP9ZpO7yv1T",
"outputId": "b777418c-0c41-44a1-f2eb-c04d798ea5f7",
"colab": {
"base_uri": "https://localhost:8080/"
}
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"age int64\n",
"job object\n",
"marital object\n",
"education object\n",
"default object\n",
"balance int64\n",
"housing object\n",
"loan object\n",
"contact object\n",
"day int64\n",
"month object\n",
"duration int64\n",
"campaign int64\n",
"pdays int64\n",
"previous int64\n",
"poutcome object\n",
"y bool\n",
"dtype: object"
]
},
"metadata": {},
"execution_count": 5
}
]
},
{
"cell_type": "markdown",
"source": [
"## Regular Epressions"
],
"metadata": {
"id": "JuZtwLhdI8hd"
}
},
{
"cell_type": "markdown",
"source": [
""
],
"metadata": {
"id": "j4-T_jT0JBvE"
}
},
{
"cell_type": "code",
"source": [
"import re\n",
"\n",
"re.sub(r\"-\",\" \",df.job[6])"
],
"metadata": {
"id": "pRJt39Gzy-xV",
"outputId": "52dc3010-a664-44af-e1cc-842c52e552c9",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 36
}
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"'self employed'"
],
"application/vnd.google.colaboratory.intrinsic+json": {
"type": "string"
}
},
"metadata": {},
"execution_count": 6
}
]
},
{
"cell_type": "markdown",
"source": [
"The above code successfully removes the '-' so I just need to apply it to the entire column and not forget to replace the current column."
],
"metadata": {
"id": "ZJ2uo9o5JCzN"
}
},
{
"cell_type": "code",
"source": [
"df.job.apply(lambda x: re.sub(r\"-\",\" \",x))"
],
"metadata": {
"id": "C8LVhE7RzanH",
"outputId": "4a8992de-cf80-4780-c088-0e8eeacd0618",
"colab": {
"base_uri": "https://localhost:8080/"
}
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"0 unemployed\n",
"1 services\n",
"2 management\n",
"3 management\n",
"4 blue collar\n",
" ... \n",
"4516 services\n",
"4517 self employed\n",
"4518 technician\n",
"4519 blue collar\n",
"4520 entrepreneur\n",
"Name: job, Length: 4521, dtype: object"
]
},
"metadata": {},
"execution_count": 7
}
]
},
{
"cell_type": "code",
"source": [
"df.job = df.job.apply(lambda x: re.sub(r\"-\",\" \",x))"
],
"metadata": {
"id": "f4sxV85z0aA9"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## Replace With NaN"
],
"metadata": {
"id": "QtGaxFUeJOw-"
}
},
{
"cell_type": "markdown",
"source": [
"I needed `numpy` to do this. This was a silly exercise but still a nice thing to think about."
],
"metadata": {
"id": "O0MQMBVMJSEE"
}
},
{
"cell_type": "code",
"source": [
"import numpy as np\n",
"\n",
"df.pdays.replace(-1,np.nan)"
],
"metadata": {
"id": "qFgsR5ks0s9R",
"outputId": "287e0a14-af9b-443a-dd4e-7b177db9576d",
"colab": {
"base_uri": "https://localhost:8080/"
}
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"0 NaN\n",
"1 339.0\n",
"2 330.0\n",
"3 NaN\n",
"4 NaN\n",
" ... \n",
"4516 NaN\n",
"4517 NaN\n",
"4518 NaN\n",
"4519 211.0\n",
"4520 249.0\n",
"Name: pdays, Length: 4521, dtype: float64"
]
},
"metadata": {},
"execution_count": 9
}
]
},
{
"cell_type": "code",
"source": [
"df.pdays = df.pdays.replace(-1,np.nan)"
],
"metadata": {
"id": "N8WLmYDd0x1f"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## Fill NaN"
],
"metadata": {
"id": "TKQ61L_yJcLF"
}
},
{
"cell_type": "markdown",
"source": [
"Now I immediately undo what I just did."
],
"metadata": {
"id": "GsuPpG6GJfDT"
}
},
{
"cell_type": "code",
"source": [
"df.pdays = df.pdays.fillna(0)"
],
"metadata": {
"id": "zUXjXGwf103o"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## Group By"
],
"metadata": {
"id": "g-7OHLeiJjBb"
}
},
{
"cell_type": "code",
"source": [
"df.groupby('job').balance.agg(['mean','median','count','std'])"
],
"metadata": {
"id": "wOvnu3vB2Omo",
"outputId": "e1a98074-82f8-4a96-aaeb-6a09eaa8551f",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 457
}
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" mean median count std\n",
"job \n",
"admin. 1226.736402 430.0 478 2370.119128\n",
"blue collar 1085.161734 408.5 946 2040.218220\n",
"entrepreneur 1645.125000 365.5 168 4441.303620\n",
"housemaid 2083.803571 296.5 112 4603.836647\n",
"management 1766.928793 577.0 969 3267.733077\n",
"retired 2319.191304 672.5 230 5846.379889\n",
"self employed 1392.409836 483.0 183 2479.640999\n",
"services 1103.956835 288.0 417 2445.239976\n",
"student 1543.821429 422.5 84 2579.886671\n",
"technician 1330.996094 434.5 768 2630.253390\n",
"unemployed 1089.421875 473.5 128 1692.267628\n",
"unknown 1501.710526 655.5 38 1957.258258"
],
"text/html": [
"\n",
" \n",
"
\n",
"
\n",
"\n",
"
\n",
" \n",
" \n",
" \n",
" mean \n",
" median \n",
" count \n",
" std \n",
" \n",
" \n",
" job \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" admin. \n",
" 1226.736402 \n",
" 430.0 \n",
" 478 \n",
" 2370.119128 \n",
" \n",
" \n",
" blue collar \n",
" 1085.161734 \n",
" 408.5 \n",
" 946 \n",
" 2040.218220 \n",
" \n",
" \n",
" entrepreneur \n",
" 1645.125000 \n",
" 365.5 \n",
" 168 \n",
" 4441.303620 \n",
" \n",
" \n",
" housemaid \n",
" 2083.803571 \n",
" 296.5 \n",
" 112 \n",
" 4603.836647 \n",
" \n",
" \n",
" management \n",
" 1766.928793 \n",
" 577.0 \n",
" 969 \n",
" 3267.733077 \n",
" \n",
" \n",
" retired \n",
" 2319.191304 \n",
" 672.5 \n",
" 230 \n",
" 5846.379889 \n",
" \n",
" \n",
" self employed \n",
" 1392.409836 \n",
" 483.0 \n",
" 183 \n",
" 2479.640999 \n",
" \n",
" \n",
" services \n",
" 1103.956835 \n",
" 288.0 \n",
" 417 \n",
" 2445.239976 \n",
" \n",
" \n",
" student \n",
" 1543.821429 \n",
" 422.5 \n",
" 84 \n",
" 2579.886671 \n",
" \n",
" \n",
" technician \n",
" 1330.996094 \n",
" 434.5 \n",
" 768 \n",
" 2630.253390 \n",
" \n",
" \n",
" unemployed \n",
" 1089.421875 \n",
" 473.5 \n",
" 128 \n",
" 1692.267628 \n",
" \n",
" \n",
" unknown \n",
" 1501.710526 \n",
" 655.5 \n",
" 38 \n",
" 1957.258258 \n",
" \n",
" \n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
"\n",
" \n",
"
\n",
"
\n",
" "
]
},
"metadata": {},
"execution_count": 12
}
]
},
{
"cell_type": "markdown",
"source": [
"## Dates"
],
"metadata": {
"id": "Eu_Oieu5JnML"
}
},
{
"cell_type": "markdown",
"source": [
"This requires some conversion of the months to numerical."
],
"metadata": {
"id": "UM_o-sEuJqGP"
}
},
{
"cell_type": "code",
"source": [
"monthConvert = {\n",
" 'jan':1,\n",
" 'feb':2,\n",
" 'mar':3,\n",
" 'apr':4,\n",
" 'may':5,\n",
" 'jun':6,\n",
" 'jul':7,\n",
" 'aug':8,\n",
" 'sep':9,\n",
" 'oct':10,\n",
" 'nov':11,\n",
" 'dec':12\n",
"}"
],
"metadata": {
"id": "773I6vJo35fr"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"This next line actually creates the datetime column."
],
"metadata": {
"id": "_kEBJb9RJwkj"
}
},
{
"cell_type": "code",
"source": [
"date = pa.to_datetime(dict(year = 2020, day = df.day, month = df.month.map(monthConvert)))"
],
"metadata": {
"id": "tNy1t49Z2kk6"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"I'll add it to my dataframe. Adding a new column requires `concat` and creating a `DataFrame` from the series. \n",
"\n",
"I struggled in class to do this because I needed a dictinary with the coulmn name..."
],
"metadata": {
"id": "R7ujPmpSKGZQ"
}
},
{
"cell_type": "code",
"source": [
"df = pa.concat([df,pa.DataFrame( {'date':date})],axis = 1)\n",
"\n",
"df.head()"
],
"metadata": {
"id": "2ptVqF2L3uVI",
"outputId": "c10c0508-6577-42e4-fcef-a08562149882",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 206
}
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" age job marital education default balance housing loan \\\n",
"0 30 unemployed married primary no 1787 no no \n",
"1 33 services married secondary no 4789 yes yes \n",
"2 35 management single tertiary no 1350 yes no \n",
"3 30 management married tertiary no 1476 yes yes \n",
"4 59 blue collar married secondary no 0 yes no \n",
"\n",
" contact day month duration campaign pdays previous poutcome y \\\n",
"0 cellular 19 oct 79 1 0.0 0 unknown False \n",
"1 cellular 11 may 220 1 339.0 4 failure False \n",
"2 cellular 16 apr 185 1 330.0 1 failure False \n",
"3 unknown 3 jun 199 4 0.0 0 unknown False \n",
"4 unknown 5 may 226 1 0.0 0 unknown False \n",
"\n",
" date \n",
"0 2020-10-19 \n",
"1 2020-05-11 \n",
"2 2020-04-16 \n",
"3 2020-06-03 \n",
"4 2020-05-05 "
],
"text/html": [
"\n",
" \n",
"
\n",
"
\n",
"\n",
"
\n",
" \n",
" \n",
" \n",
" age \n",
" job \n",
" marital \n",
" education \n",
" default \n",
" balance \n",
" housing \n",
" loan \n",
" contact \n",
" day \n",
" month \n",
" duration \n",
" campaign \n",
" pdays \n",
" previous \n",
" poutcome \n",
" y \n",
" date \n",
" \n",
" \n",
" \n",
" \n",
" 0 \n",
" 30 \n",
" unemployed \n",
" married \n",
" primary \n",
" no \n",
" 1787 \n",
" no \n",
" no \n",
" cellular \n",
" 19 \n",
" oct \n",
" 79 \n",
" 1 \n",
" 0.0 \n",
" 0 \n",
" unknown \n",
" False \n",
" 2020-10-19 \n",
" \n",
" \n",
" 1 \n",
" 33 \n",
" services \n",
" married \n",
" secondary \n",
" no \n",
" 4789 \n",
" yes \n",
" yes \n",
" cellular \n",
" 11 \n",
" may \n",
" 220 \n",
" 1 \n",
" 339.0 \n",
" 4 \n",
" failure \n",
" False \n",
" 2020-05-11 \n",
" \n",
" \n",
" 2 \n",
" 35 \n",
" management \n",
" single \n",
" tertiary \n",
" no \n",
" 1350 \n",
" yes \n",
" no \n",
" cellular \n",
" 16 \n",
" apr \n",
" 185 \n",
" 1 \n",
" 330.0 \n",
" 1 \n",
" failure \n",
" False \n",
" 2020-04-16 \n",
" \n",
" \n",
" 3 \n",
" 30 \n",
" management \n",
" married \n",
" tertiary \n",
" no \n",
" 1476 \n",
" yes \n",
" yes \n",
" unknown \n",
" 3 \n",
" jun \n",
" 199 \n",
" 4 \n",
" 0.0 \n",
" 0 \n",
" unknown \n",
" False \n",
" 2020-06-03 \n",
" \n",
" \n",
" 4 \n",
" 59 \n",
" blue collar \n",
" married \n",
" secondary \n",
" no \n",
" 0 \n",
" yes \n",
" no \n",
" unknown \n",
" 5 \n",
" may \n",
" 226 \n",
" 1 \n",
" 0.0 \n",
" 0 \n",
" unknown \n",
" False \n",
" 2020-05-05 \n",
" \n",
" \n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
"\n",
" \n",
"
\n",
"
\n",
" "
]
},
"metadata": {},
"execution_count": 15
}
]
},
{
"cell_type": "markdown",
"source": [
"## Day of the Week"
],
"metadata": {
"id": "WpuDmX58LIgp"
}
},
{
"cell_type": "markdown",
"source": [
"Here is one way to do it."
],
"metadata": {
"id": "0iZAhPpyMBVZ"
}
},
{
"cell_type": "code",
"source": [
"pa.crosstab(df.date.dt.day_name(),df.y)"
],
"metadata": {
"id": "XSEiP4XFLLLe",
"outputId": "4269acf7-a238-4cbd-f8b7-56d007be5318",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 300
}
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"y False True\n",
"date \n",
"Friday 705 81\n",
"Monday 295 70\n",
"Saturday 491 36\n",
"Sunday 77 29\n",
"Thursday 846 105\n",
"Tuesday 753 107\n",
"Wednesday 833 93"
],
"text/html": [
"\n",
" \n",
"
\n",
"
\n",
"\n",
"
\n",
" \n",
" \n",
" y \n",
" False \n",
" True \n",
" \n",
" \n",
" date \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" Friday \n",
" 705 \n",
" 81 \n",
" \n",
" \n",
" Monday \n",
" 295 \n",
" 70 \n",
" \n",
" \n",
" Saturday \n",
" 491 \n",
" 36 \n",
" \n",
" \n",
" Sunday \n",
" 77 \n",
" 29 \n",
" \n",
" \n",
" Thursday \n",
" 846 \n",
" 105 \n",
" \n",
" \n",
" Tuesday \n",
" 753 \n",
" 107 \n",
" \n",
" \n",
" Wednesday \n",
" 833 \n",
" 93 \n",
" \n",
" \n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
"\n",
" \n",
"
\n",
"
\n",
" "
]
},
"metadata": {},
"execution_count": 16
}
]
},
{
"cell_type": "markdown",
"source": [
"Another way might be with `groupby`."
],
"metadata": {
"id": "hvMgZ52CMD5V"
}
},
{
"cell_type": "code",
"source": [
"df.groupby([date.dt.day_name(),df.y]).age.agg('count')"
],
"metadata": {
"id": "d-HlspRbMKc7",
"outputId": "587e33ce-79ee-450e-9f4c-fc7430595c65",
"colab": {
"base_uri": "https://localhost:8080/"
}
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" y \n",
"Friday False 705\n",
" True 81\n",
"Monday False 295\n",
" True 70\n",
"Saturday False 491\n",
" True 36\n",
"Sunday False 77\n",
" True 29\n",
"Thursday False 846\n",
" True 105\n",
"Tuesday False 753\n",
" True 107\n",
"Wednesday False 833\n",
" True 93\n",
"Name: age, dtype: int64"
]
},
"metadata": {},
"execution_count": 17
}
]
},
{
"cell_type": "markdown",
"source": [
"I think you could look at this by doing some subsetting of the data too."
],
"metadata": {
"id": "kYl2k68hMnE8"
}
},
{
"cell_type": "code",
"source": [
"df[df.y == True].groupby(df.date.dt.day_name()).y.agg('count')"
],
"metadata": {
"id": "scRcs_8_Msl7",
"outputId": "c984affb-a184-4ad1-f0d9-8bff23ab44ca",
"colab": {
"base_uri": "https://localhost:8080/"
}
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"date\n",
"Friday 81\n",
"Monday 70\n",
"Saturday 36\n",
"Sunday 29\n",
"Thursday 105\n",
"Tuesday 107\n",
"Wednesday 93\n",
"Name: y, dtype: int64"
]
},
"metadata": {},
"execution_count": 18
}
]
},
{
"cell_type": "markdown",
"source": [
"This one was rather difficult. Finding the day names was what I was after!"
],
"metadata": {
"id": "Bah__FShNAbU"
}
},
{
"cell_type": "markdown",
"source": [
"## Indicator"
],
"metadata": {
"id": "eiFqGX3GNKu7"
}
},
{
"cell_type": "code",
"source": [
"bal = df.balance > 500\n",
"pa.concat([df,pa.DataFrame({'FiveHundo':bal})], axis = 1)"
],
"metadata": {
"id": "WpoEr_BXNNnU",
"outputId": "500d7c8a-99cb-43c4-b536-cba4c925a5cd",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 424
}
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" age job marital education default balance housing loan \\\n",
"0 30 unemployed married primary no 1787 no no \n",
"1 33 services married secondary no 4789 yes yes \n",
"2 35 management single tertiary no 1350 yes no \n",
"3 30 management married tertiary no 1476 yes yes \n",
"4 59 blue collar married secondary no 0 yes no \n",
"... ... ... ... ... ... ... ... ... \n",
"4516 33 services married secondary no -333 yes no \n",
"4517 57 self employed married tertiary yes -3313 yes yes \n",
"4518 57 technician married secondary no 295 no no \n",
"4519 28 blue collar married secondary no 1137 no no \n",
"4520 44 entrepreneur single tertiary no 1136 yes yes \n",
"\n",
" contact day month duration campaign pdays previous poutcome \\\n",
"0 cellular 19 oct 79 1 0.0 0 unknown \n",
"1 cellular 11 may 220 1 339.0 4 failure \n",
"2 cellular 16 apr 185 1 330.0 1 failure \n",
"3 unknown 3 jun 199 4 0.0 0 unknown \n",
"4 unknown 5 may 226 1 0.0 0 unknown \n",
"... ... ... ... ... ... ... ... ... \n",
"4516 cellular 30 jul 329 5 0.0 0 unknown \n",
"4517 unknown 9 may 153 1 0.0 0 unknown \n",
"4518 cellular 19 aug 151 11 0.0 0 unknown \n",
"4519 cellular 6 feb 129 4 211.0 3 other \n",
"4520 cellular 3 apr 345 2 249.0 7 other \n",
"\n",
" y date FiveHundo \n",
"0 False 2020-10-19 True \n",
"1 False 2020-05-11 True \n",
"2 False 2020-04-16 True \n",
"3 False 2020-06-03 True \n",
"4 False 2020-05-05 False \n",
"... ... ... ... \n",
"4516 False 2020-07-30 False \n",
"4517 False 2020-05-09 False \n",
"4518 False 2020-08-19 False \n",
"4519 False 2020-02-06 True \n",
"4520 False 2020-04-03 True \n",
"\n",
"[4521 rows x 19 columns]"
],
"text/html": [
"\n",
" \n",
"
\n",
"
\n",
"\n",
"
\n",
" \n",
" \n",
" \n",
" age \n",
" job \n",
" marital \n",
" education \n",
" default \n",
" balance \n",
" housing \n",
" loan \n",
" contact \n",
" day \n",
" month \n",
" duration \n",
" campaign \n",
" pdays \n",
" previous \n",
" poutcome \n",
" y \n",
" date \n",
" FiveHundo \n",
" \n",
" \n",
" \n",
" \n",
" 0 \n",
" 30 \n",
" unemployed \n",
" married \n",
" primary \n",
" no \n",
" 1787 \n",
" no \n",
" no \n",
" cellular \n",
" 19 \n",
" oct \n",
" 79 \n",
" 1 \n",
" 0.0 \n",
" 0 \n",
" unknown \n",
" False \n",
" 2020-10-19 \n",
" True \n",
" \n",
" \n",
" 1 \n",
" 33 \n",
" services \n",
" married \n",
" secondary \n",
" no \n",
" 4789 \n",
" yes \n",
" yes \n",
" cellular \n",
" 11 \n",
" may \n",
" 220 \n",
" 1 \n",
" 339.0 \n",
" 4 \n",
" failure \n",
" False \n",
" 2020-05-11 \n",
" True \n",
" \n",
" \n",
" 2 \n",
" 35 \n",
" management \n",
" single \n",
" tertiary \n",
" no \n",
" 1350 \n",
" yes \n",
" no \n",
" cellular \n",
" 16 \n",
" apr \n",
" 185 \n",
" 1 \n",
" 330.0 \n",
" 1 \n",
" failure \n",
" False \n",
" 2020-04-16 \n",
" True \n",
" \n",
" \n",
" 3 \n",
" 30 \n",
" management \n",
" married \n",
" tertiary \n",
" no \n",
" 1476 \n",
" yes \n",
" yes \n",
" unknown \n",
" 3 \n",
" jun \n",
" 199 \n",
" 4 \n",
" 0.0 \n",
" 0 \n",
" unknown \n",
" False \n",
" 2020-06-03 \n",
" True \n",
" \n",
" \n",
" 4 \n",
" 59 \n",
" blue collar \n",
" married \n",
" secondary \n",
" no \n",
" 0 \n",
" yes \n",
" no \n",
" unknown \n",
" 5 \n",
" may \n",
" 226 \n",
" 1 \n",
" 0.0 \n",
" 0 \n",
" unknown \n",
" False \n",
" 2020-05-05 \n",
" False \n",
" \n",
" \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" \n",
" \n",
" 4516 \n",
" 33 \n",
" services \n",
" married \n",
" secondary \n",
" no \n",
" -333 \n",
" yes \n",
" no \n",
" cellular \n",
" 30 \n",
" jul \n",
" 329 \n",
" 5 \n",
" 0.0 \n",
" 0 \n",
" unknown \n",
" False \n",
" 2020-07-30 \n",
" False \n",
" \n",
" \n",
" 4517 \n",
" 57 \n",
" self employed \n",
" married \n",
" tertiary \n",
" yes \n",
" -3313 \n",
" yes \n",
" yes \n",
" unknown \n",
" 9 \n",
" may \n",
" 153 \n",
" 1 \n",
" 0.0 \n",
" 0 \n",
" unknown \n",
" False \n",
" 2020-05-09 \n",
" False \n",
" \n",
" \n",
" 4518 \n",
" 57 \n",
" technician \n",
" married \n",
" secondary \n",
" no \n",
" 295 \n",
" no \n",
" no \n",
" cellular \n",
" 19 \n",
" aug \n",
" 151 \n",
" 11 \n",
" 0.0 \n",
" 0 \n",
" unknown \n",
" False \n",
" 2020-08-19 \n",
" False \n",
" \n",
" \n",
" 4519 \n",
" 28 \n",
" blue collar \n",
" married \n",
" secondary \n",
" no \n",
" 1137 \n",
" no \n",
" no \n",
" cellular \n",
" 6 \n",
" feb \n",
" 129 \n",
" 4 \n",
" 211.0 \n",
" 3 \n",
" other \n",
" False \n",
" 2020-02-06 \n",
" True \n",
" \n",
" \n",
" 4520 \n",
" 44 \n",
" entrepreneur \n",
" single \n",
" tertiary \n",
" no \n",
" 1136 \n",
" yes \n",
" yes \n",
" cellular \n",
" 3 \n",
" apr \n",
" 345 \n",
" 2 \n",
" 249.0 \n",
" 7 \n",
" other \n",
" False \n",
" 2020-04-03 \n",
" True \n",
" \n",
" \n",
"
\n",
"
4521 rows × 19 columns
\n",
"
\n",
"
\n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
"\n",
" \n",
"
\n",
"
\n",
" "
]
},
"metadata": {},
"execution_count": 19
}
]
},
{
"cell_type": "markdown",
"source": [
"## Subsetting Data"
],
"metadata": {
"id": "udVPISbwNyz3"
}
},
{
"cell_type": "code",
"source": [
"df[(df.job == 'admin.')&(df.marital == 'divorced')]"
],
"metadata": {
"id": "DZOtydbTN1F_",
"outputId": "a441dfeb-6dc2-45ee-b098-dda2d14326bf",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 424
}
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" age job marital education default balance housing loan \\\n",
"35 42 admin. divorced secondary no 1811 yes no \n",
"80 27 admin. divorced secondary no 451 yes no \n",
"120 31 admin. divorced secondary no 1890 yes no \n",
"183 40 admin. divorced secondary no 6 no no \n",
"283 42 admin. divorced secondary no 63 no no \n",
"... ... ... ... ... ... ... ... ... \n",
"4128 34 admin. divorced secondary no 1268 yes no \n",
"4244 39 admin. divorced secondary no 83 yes no \n",
"4282 34 admin. divorced secondary no -251 no no \n",
"4407 45 admin. divorced secondary no 59 yes no \n",
"4438 52 admin. divorced secondary no 98 no yes \n",
"\n",
" contact day month duration campaign pdays previous poutcome \\\n",
"35 unknown 14 may 150 1 0.0 0 unknown \n",
"80 cellular 16 jul 652 1 0.0 0 unknown \n",
"120 cellular 21 jul 588 1 0.0 0 unknown \n",
"183 unknown 11 jun 140 1 0.0 0 unknown \n",
"283 unknown 16 may 88 6 0.0 0 unknown \n",
"... ... ... ... ... ... ... ... ... \n",
"4128 cellular 15 may 259 3 0.0 0 unknown \n",
"4244 cellular 30 jul 69 6 0.0 0 unknown \n",
"4282 cellular 18 jul 641 1 0.0 0 unknown \n",
"4407 cellular 24 jul 873 8 0.0 0 unknown \n",
"4438 unknown 19 jun 148 1 0.0 0 unknown \n",
"\n",
" y date \n",
"35 False 2020-05-14 \n",
"80 True 2020-07-16 \n",
"120 False 2020-07-21 \n",
"183 False 2020-06-11 \n",
"283 False 2020-05-16 \n",
"... ... ... \n",
"4128 False 2020-05-15 \n",
"4244 False 2020-07-30 \n",
"4282 True 2020-07-18 \n",
"4407 True 2020-07-24 \n",
"4438 False 2020-06-19 \n",
"\n",
"[69 rows x 18 columns]"
],
"text/html": [
"\n",
" \n",
"
\n",
"
\n",
"\n",
"
\n",
" \n",
" \n",
" \n",
" age \n",
" job \n",
" marital \n",
" education \n",
" default \n",
" balance \n",
" housing \n",
" loan \n",
" contact \n",
" day \n",
" month \n",
" duration \n",
" campaign \n",
" pdays \n",
" previous \n",
" poutcome \n",
" y \n",
" date \n",
" \n",
" \n",
" \n",
" \n",
" 35 \n",
" 42 \n",
" admin. \n",
" divorced \n",
" secondary \n",
" no \n",
" 1811 \n",
" yes \n",
" no \n",
" unknown \n",
" 14 \n",
" may \n",
" 150 \n",
" 1 \n",
" 0.0 \n",
" 0 \n",
" unknown \n",
" False \n",
" 2020-05-14 \n",
" \n",
" \n",
" 80 \n",
" 27 \n",
" admin. \n",
" divorced \n",
" secondary \n",
" no \n",
" 451 \n",
" yes \n",
" no \n",
" cellular \n",
" 16 \n",
" jul \n",
" 652 \n",
" 1 \n",
" 0.0 \n",
" 0 \n",
" unknown \n",
" True \n",
" 2020-07-16 \n",
" \n",
" \n",
" 120 \n",
" 31 \n",
" admin. \n",
" divorced \n",
" secondary \n",
" no \n",
" 1890 \n",
" yes \n",
" no \n",
" cellular \n",
" 21 \n",
" jul \n",
" 588 \n",
" 1 \n",
" 0.0 \n",
" 0 \n",
" unknown \n",
" False \n",
" 2020-07-21 \n",
" \n",
" \n",
" 183 \n",
" 40 \n",
" admin. \n",
" divorced \n",
" secondary \n",
" no \n",
" 6 \n",
" no \n",
" no \n",
" unknown \n",
" 11 \n",
" jun \n",
" 140 \n",
" 1 \n",
" 0.0 \n",
" 0 \n",
" unknown \n",
" False \n",
" 2020-06-11 \n",
" \n",
" \n",
" 283 \n",
" 42 \n",
" admin. \n",
" divorced \n",
" secondary \n",
" no \n",
" 63 \n",
" no \n",
" no \n",
" unknown \n",
" 16 \n",
" may \n",
" 88 \n",
" 6 \n",
" 0.0 \n",
" 0 \n",
" unknown \n",
" False \n",
" 2020-05-16 \n",
" \n",
" \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" \n",
" \n",
" 4128 \n",
" 34 \n",
" admin. \n",
" divorced \n",
" secondary \n",
" no \n",
" 1268 \n",
" yes \n",
" no \n",
" cellular \n",
" 15 \n",
" may \n",
" 259 \n",
" 3 \n",
" 0.0 \n",
" 0 \n",
" unknown \n",
" False \n",
" 2020-05-15 \n",
" \n",
" \n",
" 4244 \n",
" 39 \n",
" admin. \n",
" divorced \n",
" secondary \n",
" no \n",
" 83 \n",
" yes \n",
" no \n",
" cellular \n",
" 30 \n",
" jul \n",
" 69 \n",
" 6 \n",
" 0.0 \n",
" 0 \n",
" unknown \n",
" False \n",
" 2020-07-30 \n",
" \n",
" \n",
" 4282 \n",
" 34 \n",
" admin. \n",
" divorced \n",
" secondary \n",
" no \n",
" -251 \n",
" no \n",
" no \n",
" cellular \n",
" 18 \n",
" jul \n",
" 641 \n",
" 1 \n",
" 0.0 \n",
" 0 \n",
" unknown \n",
" True \n",
" 2020-07-18 \n",
" \n",
" \n",
" 4407 \n",
" 45 \n",
" admin. \n",
" divorced \n",
" secondary \n",
" no \n",
" 59 \n",
" yes \n",
" no \n",
" cellular \n",
" 24 \n",
" jul \n",
" 873 \n",
" 8 \n",
" 0.0 \n",
" 0 \n",
" unknown \n",
" True \n",
" 2020-07-24 \n",
" \n",
" \n",
" 4438 \n",
" 52 \n",
" admin. \n",
" divorced \n",
" secondary \n",
" no \n",
" 98 \n",
" no \n",
" yes \n",
" unknown \n",
" 19 \n",
" jun \n",
" 148 \n",
" 1 \n",
" 0.0 \n",
" 0 \n",
" unknown \n",
" False \n",
" 2020-06-19 \n",
" \n",
" \n",
"
\n",
"
69 rows × 18 columns
\n",
"
\n",
"
\n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
"\n",
" \n",
"
\n",
"
\n",
" "
]
},
"metadata": {},
"execution_count": 20
}
]
},
{
"cell_type": "markdown",
"source": [
"Here are those folks! Let's see what their stats are for getting approved."
],
"metadata": {
"id": "aekEV7TDOG-f"
}
},
{
"cell_type": "code",
"source": [
"df[(df.job == 'admin.')&(df.marital == 'divorced')].groupby('y').age.agg('count')"
],
"metadata": {
"id": "l-3hGgSNOla_",
"outputId": "fcc0f6f8-1c20-4d0e-8652-0feff2054b9e",
"colab": {
"base_uri": "https://localhost:8080/"
}
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"y\n",
"False 58\n",
"True 11\n",
"Name: age, dtype: int64"
]
},
"metadata": {},
"execution_count": 21
}
]
},
{
"cell_type": "markdown",
"source": [
"## Another Group By?"
],
"metadata": {
"id": "RYveWK1yOvA9"
}
},
{
"cell_type": "code",
"source": [
"df.groupby(['job','default']).age.agg('count')"
],
"metadata": {
"id": "WMLFrD3oO2IE",
"outputId": "fc81f3d1-18ce-4ec7-a660-ab78e1f5059f",
"colab": {
"base_uri": "https://localhost:8080/"
}
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"job default\n",
"admin. no 472\n",
" yes 6\n",
"blue collar no 932\n",
" yes 14\n",
"entrepreneur no 161\n",
" yes 7\n",
"housemaid no 110\n",
" yes 2\n",
"management no 955\n",
" yes 14\n",
"retired no 227\n",
" yes 3\n",
"self employed no 179\n",
" yes 4\n",
"services no 410\n",
" yes 7\n",
"student no 83\n",
" yes 1\n",
"technician no 753\n",
" yes 15\n",
"unemployed no 125\n",
" yes 3\n",
"unknown no 38\n",
"Name: age, dtype: int64"
]
},
"metadata": {},
"execution_count": 22
}
]
},
{
"cell_type": "markdown",
"source": [
"## Two Way Tables"
],
"metadata": {
"id": "GJdSOASXKz5w"
}
},
{
"cell_type": "markdown",
"source": [
"Below is how I looked at number 11 in class."
],
"metadata": {
"id": "m70dYyzkK2Df"
}
},
{
"cell_type": "code",
"source": [
"df.groupby(['education','contact']).age.agg('count')"
],
"metadata": {
"id": "XsY7lm104trK",
"outputId": "39853125-a4dd-4f27-e955-8e756d0547c8",
"colab": {
"base_uri": "https://localhost:8080/"
}
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"education contact \n",
"primary cellular 343\n",
" telephone 75\n",
" unknown 260\n",
"secondary cellular 1446\n",
" telephone 141\n",
" unknown 719\n",
"tertiary cellular 1012\n",
" telephone 66\n",
" unknown 272\n",
"unknown cellular 95\n",
" telephone 19\n",
" unknown 73\n",
"Name: age, dtype: int64"
]
},
"metadata": {},
"execution_count": 23
}
]
},
{
"cell_type": "markdown",
"source": [
"Below is a nicer way to create a two-way table with `pandas.crosstab`"
],
"metadata": {
"id": "YCaQ04U4K7fE"
}
},
{
"cell_type": "code",
"source": [
"pa.crosstab(df.education,df.contact)"
],
"metadata": {
"id": "W-XqwY1q7R3T",
"outputId": "53b2d61a-ffb2-4171-af51-24a3a5cd5c0b",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 206
}
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"contact cellular telephone unknown\n",
"education \n",
"primary 343 75 260\n",
"secondary 1446 141 719\n",
"tertiary 1012 66 272\n",
"unknown 95 19 73"
],
"text/html": [
"\n",
" \n",
"
\n",
"
\n",
"\n",
"
\n",
" \n",
" \n",
" contact \n",
" cellular \n",
" telephone \n",
" unknown \n",
" \n",
" \n",
" education \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" primary \n",
" 343 \n",
" 75 \n",
" 260 \n",
" \n",
" \n",
" secondary \n",
" 1446 \n",
" 141 \n",
" 719 \n",
" \n",
" \n",
" tertiary \n",
" 1012 \n",
" 66 \n",
" 272 \n",
" \n",
" \n",
" unknown \n",
" 95 \n",
" 19 \n",
" 73 \n",
" \n",
" \n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
"\n",
" \n",
"
\n",
"
\n",
" "
]
},
"metadata": {},
"execution_count": 24
}
]
},
{
"cell_type": "markdown",
"source": [
"## Maximum"
],
"metadata": {
"id": "0WAPUSe3ABXV"
}
},
{
"cell_type": "code",
"source": [
"df[df.balance == max(df.balance)].marital"
],
"metadata": {
"id": "H59_o5zMKu3o",
"outputId": "f32080c7-96ce-41df-f19c-420a5f4d9c2e",
"colab": {
"base_uri": "https://localhost:8080/"
}
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"3700 married\n",
"Name: marital, dtype: object"
]
},
"metadata": {},
"execution_count": 29
}
]
},
{
"cell_type": "code",
"source": [
""
],
"metadata": {
"id": "pGHzhi3BAGyf"
},
"execution_count": null,
"outputs": []
}
]
}