{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "name": "Untitled78.ipynb",
      "provenance": [],
      "include_colab_link": true
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-github",
        "colab_type": "text"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/github/nurfnick/Data_Viz/blob/main/Content/Data_Cleaning/13_Data_Types.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "# Data Types"
      ],
      "metadata": {
        "id": "GR4Jyk8u1cZk"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Lots of Data"
      ],
      "metadata": {
        "id": "QHF7m2Ef7Nea"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "To get onto the task of cleaning our data, it is first important to know what type of data we have and the best tools for using it!\n",
        "\n",
        "Let's first look at how we bring our data into the python environment.  We have worked mostly with `pandas` and will continue to do so!  Pandas is actually built on top of another environment, `numpy`.  At some point in our work we will need both!"
      ],
      "metadata": {
        "id": "33R-OSZj1f_q"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "### Pandas"
      ],
      "metadata": {
        "id": "pCoJ0ero2Q0f"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "Pandas is great for loading data.  We have seen it handle csv, html and data from an sql call.  We can also load JSON and excel files.\n",
        "\n",
        "`DataFrame` is the table environment we've used before and `series` is similar to a column.\n",
        "\n",
        "You should use a pandas dataframe when your data contains categorical data.\n",
        "\n",
        "Pandas is best when dealing with large datasets."
      ],
      "metadata": {
        "id": "NB0yJEUk2TML"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "import pandas as pa\n",
        "\n",
        "df = pa.read_csv('https://raw.githubusercontent.com/nurfnick/Data_Viz/main/Data_Sets/iris.csv')\n",
        "\n",
        "df.head()"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 206
        },
        "id": "ZnmcDmsw8NZ5",
        "outputId": "5b9cf3e1-9d27-4b6b-98cc-3751ad1081ab"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "\n",
              "  <div id=\"df-8660fb60-527a-41c2-aadd-fef91e6e1d31\">\n",
              "    <div class=\"colab-df-container\">\n",
              "      <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>SepalLength</th>\n",
              "      <th>SepalWidth</th>\n",
              "      <th>PedalLength</th>\n",
              "      <th>PedalWidth</th>\n",
              "      <th>Class</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>5.1</td>\n",
              "      <td>3.5</td>\n",
              "      <td>1.4</td>\n",
              "      <td>0.2</td>\n",
              "      <td>Iris-setosa</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>4.9</td>\n",
              "      <td>3.0</td>\n",
              "      <td>1.4</td>\n",
              "      <td>0.2</td>\n",
              "      <td>Iris-setosa</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>4.7</td>\n",
              "      <td>3.2</td>\n",
              "      <td>1.3</td>\n",
              "      <td>0.2</td>\n",
              "      <td>Iris-setosa</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>4.6</td>\n",
              "      <td>3.1</td>\n",
              "      <td>1.5</td>\n",
              "      <td>0.2</td>\n",
              "      <td>Iris-setosa</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>5.0</td>\n",
              "      <td>3.6</td>\n",
              "      <td>1.4</td>\n",
              "      <td>0.2</td>\n",
              "      <td>Iris-setosa</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "      <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-8660fb60-527a-41c2-aadd-fef91e6e1d31')\"\n",
              "              title=\"Convert this dataframe to an interactive table.\"\n",
              "              style=\"display:none;\">\n",
              "        \n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "       width=\"24px\">\n",
              "    <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
              "    <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
              "  </svg>\n",
              "      </button>\n",
              "      \n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      flex-wrap:wrap;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "      <script>\n",
              "        const buttonEl =\n",
              "          document.querySelector('#df-8660fb60-527a-41c2-aadd-fef91e6e1d31 button.colab-df-convert');\n",
              "        buttonEl.style.display =\n",
              "          google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "        async function convertToInteractive(key) {\n",
              "          const element = document.querySelector('#df-8660fb60-527a-41c2-aadd-fef91e6e1d31');\n",
              "          const dataTable =\n",
              "            await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                     [key], {});\n",
              "          if (!dataTable) return;\n",
              "\n",
              "          const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "            '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "            + ' to learn more about interactive tables.';\n",
              "          element.innerHTML = '';\n",
              "          dataTable['output_type'] = 'display_data';\n",
              "          await google.colab.output.renderOutput(dataTable, element);\n",
              "          const docLink = document.createElement('div');\n",
              "          docLink.innerHTML = docLinkHtml;\n",
              "          element.appendChild(docLink);\n",
              "        }\n",
              "      </script>\n",
              "    </div>\n",
              "  </div>\n",
              "  "
            ],
            "text/plain": [
              "   SepalLength  SepalWidth  PedalLength  PedalWidth        Class\n",
              "0          5.1         3.5          1.4         0.2  Iris-setosa\n",
              "1          4.9         3.0          1.4         0.2  Iris-setosa\n",
              "2          4.7         3.2          1.3         0.2  Iris-setosa\n",
              "3          4.6         3.1          1.5         0.2  Iris-setosa\n",
              "4          5.0         3.6          1.4         0.2  Iris-setosa"
            ]
          },
          "metadata": {},
          "execution_count": 1
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "### Numpy"
      ],
      "metadata": {
        "id": "j7csniAy4DeT"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "The other important tool in python is `numpy`.  It is the foundation of the pandas module but it has some limitations.  \n",
        "\n",
        "Numpy is excellent for higher dimensional data, stored as an `array`.  Think multiple sheets in a excel spreadsheet, data that will not simply fit in a 2 dimensional array.\n",
        "\n",
        "Numpy arrays can be accessed easily by there indicies, this is very ineffiecient in pandas.\n",
        "\n",
        "Numpy data should be just numbers!  Categorical data should be converted first before utilizing a numpy array. Numpy is effiecient and fast but works best on smaller datasets.\n"
      ],
      "metadata": {
        "id": "5yVYYKMI4FuL"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "import numpy as np\n",
        "\n",
        "df1 = pa.get_dummies(data = df)\n",
        "\n",
        "X = np.array(df1)\n",
        "\n",
        "X"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "qw2-bFP98keb",
        "outputId": "b52daa4c-e0ac-4066-a59b-03cb5dcb7660"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "array([[5.1, 3.5, 1.4, ..., 1. , 0. , 0. ],\n",
              "       [4.9, 3. , 1.4, ..., 1. , 0. , 0. ],\n",
              "       [4.7, 3.2, 1.3, ..., 1. , 0. , 0. ],\n",
              "       ...,\n",
              "       [6.5, 3. , 5.2, ..., 0. , 0. , 1. ],\n",
              "       [6.2, 3.4, 5.4, ..., 0. , 0. , 1. ],\n",
              "       [5.9, 3. , 5.1, ..., 0. , 0. , 1. ]])"
            ]
          },
          "metadata": {},
          "execution_count": 2
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "We will discuss what the `get_dummies` does in due time.  For now just know that it converted the class into numbers for use in numpy array."
      ],
      "metadata": {
        "id": "QTO8Lilb-KeW"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "df1.dtypes"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "voqkO1G3ylv6",
        "outputId": "139f9db6-351d-4846-b12e-a6f45cfa98e7"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "SepalLength              float64\n",
              "SepalWidth               float64\n",
              "PedalLength              float64\n",
              "PedalWidth               float64\n",
              "Class_Iris-setosa          uint8\n",
              "Class_Iris-versicolor      uint8\n",
              "Class_Iris-virginica       uint8\n",
              "dtype: object"
            ]
          },
          "metadata": {},
          "execution_count": 23
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "### Which is Best?"
      ],
      "metadata": {
        "id": "Wd0zlK7T6vzC"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "Very often I will use both in a project.  I'll start with pandas for loading, cleaning and basic analysis.  Then I will convert the data to an numpy array and create models for predicitons."
      ],
      "metadata": {
        "id": "ICqp5xjU61OD"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Less Data"
      ],
      "metadata": {
        "id": "ryPaQGPR7K8E"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "Now that we have lots of data we'll have to start examining each piece!  I am following [this](https://pbpython.com/pandas_dtypes.html) page of the different types."
      ],
      "metadata": {
        "id": "nfgrj3I67zCN"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "### Strings"
      ],
      "metadata": {
        "id": "y61PnqpzG5Sr"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "The most common type of data we examine is a string.  We will spend a lot of time dealing with strings.  Often data in another format is actually given as a string so we'll have our work cut out for use manipulating strings.  \n",
        "\n",
        "In the `iris` dataset, the class was given as a string."
      ],
      "metadata": {
        "id": "-n7hvkbMFpme"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "df.Class.iloc[0]"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 35
        },
        "id": "-6tRnxCBGINt",
        "outputId": "f8a1f6cf-907d-4da3-f7b9-5cb04e22405f"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "application/vnd.google.colaboratory.intrinsic+json": {
              "type": "string"
            },
            "text/plain": [
              "'Iris-setosa'"
            ]
          },
          "metadata": {},
          "execution_count": 39
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "The tell-tale sign of a string is the quotes.  Of course we can save a string too."
      ],
      "metadata": {
        "id": "gFOs_z7YGgJ6"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "a_string = 'My really cool string'\n",
        "\n",
        "print(a_string)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "AFu3cEzAGwHx",
        "outputId": "e110a63e-4fdb-4467-ad79-2331da2d8f6e"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "My really cool string\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "Pandas calls the datatype of object ('O') for any string it is passed."
      ],
      "metadata": {
        "id": "aBokifMD0FhE"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "df.Class.dtype"
      ],
      "metadata": {
        "id": "gu58lnGD0M2L",
        "outputId": "669f7229-1396-48cf-9326-67f14a6a92d0",
        "colab": {
          "base_uri": "https://localhost:8080/"
        }
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "dtype('O')"
            ]
          },
          "metadata": {},
          "execution_count": 30
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "### Boolean"
      ],
      "metadata": {
        "id": "L8nX1RwBGvVM"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "Boolean is the logical operator, taking only two values, `True` or `False`.  We can combine them using the normal logical connections.  We can also get a boolean by doing comparisons."
      ],
      "metadata": {
        "id": "6hXPO8SYG_Sb"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "a = True\n",
        "b = False\n",
        "\n",
        "print(a and b) #or a & b\n",
        "\n",
        "print(a or b) # or a | b\n",
        "\n",
        "print( not a ) # or ~a"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "ukAhD0KOIRaU",
        "outputId": "4aabf856-cf8e-4c93-802f-acd628510523"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "False\n",
            "True\n",
            "False\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "print(3 == 4)\n",
        "\n",
        "print(5>-2)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "v_VSmnttInMF",
        "outputId": "e57e6d43-72a2-417b-ecc2-30ade51b38f3"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "False\n",
            "True\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "bool(0)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "bVi-awoDNc2z",
        "outputId": "1c54eb19-4c38-4033-8e47-1d61e2b36a81"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "False"
            ]
          },
          "metadata": {},
          "execution_count": 14
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "This may show up in manipulating data!  You can ask for only the classes that are *Iris-setosa* in your dataset"
      ],
      "metadata": {
        "id": "DXWyq5W_LXaY"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "df.Class == 'Iris-setosa'"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "oGTG6Lq_Ll98",
        "outputId": "e06efaa9-092a-402a-973b-721ee576e278"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "0       True\n",
              "1       True\n",
              "2       True\n",
              "3       True\n",
              "4       True\n",
              "       ...  \n",
              "145    False\n",
              "146    False\n",
              "147    False\n",
              "148    False\n",
              "149    False\n",
              "Name: Class, Length: 150, dtype: bool"
            ]
          },
          "metadata": {},
          "execution_count": 44
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "(df.Class == 'Iris-setosa').dtype"
      ],
      "metadata": {
        "id": "vHQiojaA0Vam",
        "outputId": "f760aee7-0871-4ae2-9c41-163ce434922d",
        "colab": {
          "base_uri": "https://localhost:8080/"
        }
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "dtype('bool')"
            ]
          },
          "metadata": {},
          "execution_count": 31
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "Then you can pass that back into the dataframe and it will only give you the entries that were true."
      ],
      "metadata": {
        "id": "CxxBSPL8Lr8E"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "df[df.Class == 'Iris-setosa'].head(10) #I've added head to limit the output to 10 entries"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 363
        },
        "id": "z9KY0TT_Lwu8",
        "outputId": "dd63afec-7563-40e9-ea7b-5c844df0ed9e"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "\n",
              "  <div id=\"df-dc5c0939-0592-48ce-ae8d-95ad2a49c1a5\">\n",
              "    <div class=\"colab-df-container\">\n",
              "      <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>SepalLength</th>\n",
              "      <th>SepalWidth</th>\n",
              "      <th>PedalLength</th>\n",
              "      <th>PedalWidth</th>\n",
              "      <th>Class</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>5.1</td>\n",
              "      <td>3.5</td>\n",
              "      <td>1.4</td>\n",
              "      <td>0.2</td>\n",
              "      <td>Iris-setosa</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>4.9</td>\n",
              "      <td>3.0</td>\n",
              "      <td>1.4</td>\n",
              "      <td>0.2</td>\n",
              "      <td>Iris-setosa</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>4.7</td>\n",
              "      <td>3.2</td>\n",
              "      <td>1.3</td>\n",
              "      <td>0.2</td>\n",
              "      <td>Iris-setosa</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>4.6</td>\n",
              "      <td>3.1</td>\n",
              "      <td>1.5</td>\n",
              "      <td>0.2</td>\n",
              "      <td>Iris-setosa</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>5.0</td>\n",
              "      <td>3.6</td>\n",
              "      <td>1.4</td>\n",
              "      <td>0.2</td>\n",
              "      <td>Iris-setosa</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>5</th>\n",
              "      <td>5.4</td>\n",
              "      <td>3.9</td>\n",
              "      <td>1.7</td>\n",
              "      <td>0.4</td>\n",
              "      <td>Iris-setosa</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>6</th>\n",
              "      <td>4.6</td>\n",
              "      <td>3.4</td>\n",
              "      <td>1.4</td>\n",
              "      <td>0.3</td>\n",
              "      <td>Iris-setosa</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>7</th>\n",
              "      <td>5.0</td>\n",
              "      <td>3.4</td>\n",
              "      <td>1.5</td>\n",
              "      <td>0.2</td>\n",
              "      <td>Iris-setosa</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>8</th>\n",
              "      <td>4.4</td>\n",
              "      <td>2.9</td>\n",
              "      <td>1.4</td>\n",
              "      <td>0.2</td>\n",
              "      <td>Iris-setosa</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>9</th>\n",
              "      <td>4.9</td>\n",
              "      <td>3.1</td>\n",
              "      <td>1.5</td>\n",
              "      <td>0.1</td>\n",
              "      <td>Iris-setosa</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "      <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-dc5c0939-0592-48ce-ae8d-95ad2a49c1a5')\"\n",
              "              title=\"Convert this dataframe to an interactive table.\"\n",
              "              style=\"display:none;\">\n",
              "        \n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "       width=\"24px\">\n",
              "    <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
              "    <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
              "  </svg>\n",
              "      </button>\n",
              "      \n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      flex-wrap:wrap;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "      <script>\n",
              "        const buttonEl =\n",
              "          document.querySelector('#df-dc5c0939-0592-48ce-ae8d-95ad2a49c1a5 button.colab-df-convert');\n",
              "        buttonEl.style.display =\n",
              "          google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "        async function convertToInteractive(key) {\n",
              "          const element = document.querySelector('#df-dc5c0939-0592-48ce-ae8d-95ad2a49c1a5');\n",
              "          const dataTable =\n",
              "            await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                     [key], {});\n",
              "          if (!dataTable) return;\n",
              "\n",
              "          const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "            '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "            + ' to learn more about interactive tables.';\n",
              "          element.innerHTML = '';\n",
              "          dataTable['output_type'] = 'display_data';\n",
              "          await google.colab.output.renderOutput(dataTable, element);\n",
              "          const docLink = document.createElement('div');\n",
              "          docLink.innerHTML = docLinkHtml;\n",
              "          element.appendChild(docLink);\n",
              "        }\n",
              "      </script>\n",
              "    </div>\n",
              "  </div>\n",
              "  "
            ],
            "text/plain": [
              "   SepalLength  SepalWidth  PedalLength  PedalWidth        Class\n",
              "0          5.1         3.5          1.4         0.2  Iris-setosa\n",
              "1          4.9         3.0          1.4         0.2  Iris-setosa\n",
              "2          4.7         3.2          1.3         0.2  Iris-setosa\n",
              "3          4.6         3.1          1.5         0.2  Iris-setosa\n",
              "4          5.0         3.6          1.4         0.2  Iris-setosa\n",
              "5          5.4         3.9          1.7         0.4  Iris-setosa\n",
              "6          4.6         3.4          1.4         0.3  Iris-setosa\n",
              "7          5.0         3.4          1.5         0.2  Iris-setosa\n",
              "8          4.4         2.9          1.4         0.2  Iris-setosa\n",
              "9          4.9         3.1          1.5         0.1  Iris-setosa"
            ]
          },
          "metadata": {},
          "execution_count": 20
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "If we want to combine several boolean DataSeries, use the `&` for and and `|` for or.\n",
        "\n",
        "\n"
      ],
      "metadata": {
        "id": "hSXI0RyMwWQ5"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "df[(df.Class == 'Iris-setosa') & (df.SepalLength > 5.2)]"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 394
        },
        "id": "MgIbhKDpwgjY",
        "outputId": "4ec1eecf-39c3-4aa5-ba25-49d60a7b878c"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "\n",
              "  <div id=\"df-74a60b9b-22e2-4606-be00-0bfb53b23802\">\n",
              "    <div class=\"colab-df-container\">\n",
              "      <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>SepalLength</th>\n",
              "      <th>SepalWidth</th>\n",
              "      <th>PedalLength</th>\n",
              "      <th>PedalWidth</th>\n",
              "      <th>Class</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>5</th>\n",
              "      <td>5.4</td>\n",
              "      <td>3.9</td>\n",
              "      <td>1.7</td>\n",
              "      <td>0.4</td>\n",
              "      <td>Iris-setosa</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>10</th>\n",
              "      <td>5.4</td>\n",
              "      <td>3.7</td>\n",
              "      <td>1.5</td>\n",
              "      <td>0.2</td>\n",
              "      <td>Iris-setosa</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>14</th>\n",
              "      <td>5.8</td>\n",
              "      <td>4.0</td>\n",
              "      <td>1.2</td>\n",
              "      <td>0.2</td>\n",
              "      <td>Iris-setosa</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>15</th>\n",
              "      <td>5.7</td>\n",
              "      <td>4.4</td>\n",
              "      <td>1.5</td>\n",
              "      <td>0.4</td>\n",
              "      <td>Iris-setosa</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>16</th>\n",
              "      <td>5.4</td>\n",
              "      <td>3.9</td>\n",
              "      <td>1.3</td>\n",
              "      <td>0.4</td>\n",
              "      <td>Iris-setosa</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>18</th>\n",
              "      <td>5.7</td>\n",
              "      <td>3.8</td>\n",
              "      <td>1.7</td>\n",
              "      <td>0.3</td>\n",
              "      <td>Iris-setosa</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>20</th>\n",
              "      <td>5.4</td>\n",
              "      <td>3.4</td>\n",
              "      <td>1.7</td>\n",
              "      <td>0.2</td>\n",
              "      <td>Iris-setosa</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>31</th>\n",
              "      <td>5.4</td>\n",
              "      <td>3.4</td>\n",
              "      <td>1.5</td>\n",
              "      <td>0.4</td>\n",
              "      <td>Iris-setosa</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>33</th>\n",
              "      <td>5.5</td>\n",
              "      <td>4.2</td>\n",
              "      <td>1.4</td>\n",
              "      <td>0.2</td>\n",
              "      <td>Iris-setosa</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>36</th>\n",
              "      <td>5.5</td>\n",
              "      <td>3.5</td>\n",
              "      <td>1.3</td>\n",
              "      <td>0.2</td>\n",
              "      <td>Iris-setosa</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>48</th>\n",
              "      <td>5.3</td>\n",
              "      <td>3.7</td>\n",
              "      <td>1.5</td>\n",
              "      <td>0.2</td>\n",
              "      <td>Iris-setosa</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "      <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-74a60b9b-22e2-4606-be00-0bfb53b23802')\"\n",
              "              title=\"Convert this dataframe to an interactive table.\"\n",
              "              style=\"display:none;\">\n",
              "        \n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "       width=\"24px\">\n",
              "    <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
              "    <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
              "  </svg>\n",
              "      </button>\n",
              "      \n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      flex-wrap:wrap;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "      <script>\n",
              "        const buttonEl =\n",
              "          document.querySelector('#df-74a60b9b-22e2-4606-be00-0bfb53b23802 button.colab-df-convert');\n",
              "        buttonEl.style.display =\n",
              "          google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "        async function convertToInteractive(key) {\n",
              "          const element = document.querySelector('#df-74a60b9b-22e2-4606-be00-0bfb53b23802');\n",
              "          const dataTable =\n",
              "            await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                     [key], {});\n",
              "          if (!dataTable) return;\n",
              "\n",
              "          const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "            '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "            + ' to learn more about interactive tables.';\n",
              "          element.innerHTML = '';\n",
              "          dataTable['output_type'] = 'display_data';\n",
              "          await google.colab.output.renderOutput(dataTable, element);\n",
              "          const docLink = document.createElement('div');\n",
              "          docLink.innerHTML = docLinkHtml;\n",
              "          element.appendChild(docLink);\n",
              "        }\n",
              "      </script>\n",
              "    </div>\n",
              "  </div>\n",
              "  "
            ],
            "text/plain": [
              "    SepalLength  SepalWidth  PedalLength  PedalWidth        Class\n",
              "5           5.4         3.9          1.7         0.4  Iris-setosa\n",
              "10          5.4         3.7          1.5         0.2  Iris-setosa\n",
              "14          5.8         4.0          1.2         0.2  Iris-setosa\n",
              "15          5.7         4.4          1.5         0.4  Iris-setosa\n",
              "16          5.4         3.9          1.3         0.4  Iris-setosa\n",
              "18          5.7         3.8          1.7         0.3  Iris-setosa\n",
              "20          5.4         3.4          1.7         0.2  Iris-setosa\n",
              "31          5.4         3.4          1.5         0.4  Iris-setosa\n",
              "33          5.5         4.2          1.4         0.2  Iris-setosa\n",
              "36          5.5         3.5          1.3         0.2  Iris-setosa\n",
              "48          5.3         3.7          1.5         0.2  Iris-setosa"
            ]
          },
          "metadata": {},
          "execution_count": 21
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "### Integers"
      ],
      "metadata": {
        "id": "XhmvO1dhMSnY"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "Integers are whole numbers that can be positive or negative.  Integers are closed under addition, subtration and multiplication (Not division).  Using integers saves some memory so if your entry is an integer you should use it that way.\n",
        "\n",
        "Some examples of integers are customer numbers and counts of objects.  The code to convert to an integer is `int`. "
      ],
      "metadata": {
        "id": "su1hq2d2MVNg"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "int(-3.0000)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "lmxUCvHKNSc4",
        "outputId": "bf533867-1503-4b0c-bacc-1be006d22b86"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "-3"
            ]
          },
          "metadata": {},
          "execution_count": 46
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "### Floats"
      ],
      "metadata": {
        "id": "xCATTI0NOHVS"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "A float is a generic number stored up to a certain number of decimals (64 bits in pandas).  Be wary of the last few decimals, more if you have done lots of computations.\n",
        "\n",
        "In the `iris` dataset most columns are floats."
      ],
      "metadata": {
        "id": "jqaiIV4QOK5T"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "df.dtypes"
      ],
      "metadata": {
        "id": "_qPrgL2EUHQ7",
        "outputId": "7e6116b0-aeb8-4503-a1f8-2caa437b9b33",
        "colab": {
          "base_uri": "https://localhost:8080/"
        }
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "SepalLength    float64\n",
              "SepalWidth     float64\n",
              "PedalLength    float64\n",
              "PedalWidth     float64\n",
              "Class           object\n",
              "dtype: object"
            ]
          },
          "metadata": {},
          "execution_count": 47
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "### Daytime"
      ],
      "metadata": {
        "id": "TLlYE_zcPSg2"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "These are dates and times and allow you to manipulate differences easily!  I'll grab a dataset with some dates in it.  Pandas does not recognize the datetime automatically so I had to convert."
      ],
      "metadata": {
        "id": "mxgXThX_PVYS"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "df2 = pa.read_csv('https://raw.githubusercontent.com/nurfnick/Data_Sets_For_Stats/master/CuratedDataSets/Landslides_From_NASA.csv')\n",
        "ds = df2.event_date.astype('datetime64')\n",
        "\n",
        "ds"
      ],
      "metadata": {
        "id": "X2zRNVSrLmBm",
        "outputId": "0f2f9098-fa66-4a2a-e24d-9018505f8eba",
        "colab": {
          "base_uri": "https://localhost:8080/"
        }
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "0       2008-08-01 00:00:00\n",
              "1       2009-01-02 02:00:00\n",
              "2       2007-01-19 00:00:00\n",
              "3       2009-07-31 00:00:00\n",
              "4       2010-10-16 12:00:00\n",
              "                ...        \n",
              "11028   2017-04-01 13:34:00\n",
              "11029   2017-03-25 17:32:00\n",
              "11030   2016-12-15 05:00:00\n",
              "11031   2017-04-29 19:03:00\n",
              "11032   2017-03-13 14:32:00\n",
              "Name: event_date, Length: 11033, dtype: datetime64[ns]"
            ]
          },
          "metadata": {},
          "execution_count": 10
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "ds[1]-ds[0]"
      ],
      "metadata": {
        "id": "0gn37nUbMfFr",
        "outputId": "787731e5-4874-4e36-cadd-8fb36f6f7cd3",
        "colab": {
          "base_uri": "https://localhost:8080/"
        }
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "Timedelta('154 days 02:00:00')"
            ]
          },
          "metadata": {},
          "execution_count": 11
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "The `Timedelta` is itself another data structure!  "
      ],
      "metadata": {
        "id": "t5YERsL6MpYh"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "(ds[1]-ds[0]).total_seconds()"
      ],
      "metadata": {
        "id": "FqFtI5osM2qn",
        "outputId": "28b0afb8-d2a7-4c3a-80a6-248d0e724ba7",
        "colab": {
          "base_uri": "https://localhost:8080/"
        }
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "13312800.0"
            ]
          },
          "metadata": {},
          "execution_count": 13
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "This gives us the total seconds in the elapsed time.  I'm certain there are other things you could do here.  When you need them, you'll have to explore!"
      ],
      "metadata": {
        "id": "0jwx-HKuN7ij"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "### Category"
      ],
      "metadata": {
        "id": "jJAVX4chPz7j"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "I'll convert the Class into a category inside Pandas by converting the DataSereis of Class into a category and passing it back to the dataframe."
      ],
      "metadata": {
        "id": "BsMEdTRX0fMl"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "df.Class = df.Class.astype('category')\n",
        "\n",
        "df.dtypes"
      ],
      "metadata": {
        "id": "4-C-716NUVp_",
        "outputId": "eab8c49d-fe94-4c60-b7f9-7813e0b0591e",
        "colab": {
          "base_uri": "https://localhost:8080/"
        }
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "SepalLength     float64\n",
              "SepalWidth      float64\n",
              "PedalLength     float64\n",
              "PedalWidth      float64\n",
              "Class          category\n",
              "dtype: object"
            ]
          },
          "metadata": {},
          "execution_count": 48
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "You can get at each category through the `unique` command. "
      ],
      "metadata": {
        "id": "gDBZaIt_00K1"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "df.Class.unique()"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "OpuHwR54yWD8",
        "outputId": "19e72bcd-19b1-4ed1-fd7c-2a313f3a4179"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "array(['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'], dtype=object)"
            ]
          },
          "metadata": {},
          "execution_count": 22
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "Category is a structure only in pandas.  It will allow you to put an order to categorical ordinal variables. "
      ],
      "metadata": {
        "id": "qhTI2dMbP8k1"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "3YzlpyBF1bxb",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "de660b33-e131-4734-cafb-27d9442e6260"
      },
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "CategoricalDtype(categories=['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'], ordered=True)"
            ]
          },
          "metadata": {},
          "execution_count": 25
        }
      ],
      "source": [
        "from pandas.api.types import CategoricalDtype\n",
        "\n",
        "\n",
        "cat_type = CategoricalDtype(categories=['Iris-setosa', 'Iris-versicolor', \"Iris-virginica\"], ordered=True)\n",
        "\n",
        "dfClass = df.Class.astype(cat_type)\n",
        "\n",
        "dfClass.dtype"
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "This may be nice for many categorical and ordinal variables!  Taken from [here](https://pandas.pydata.org/pandas-docs/stable/user_guide/categorical.html#categoricaldtype)"
      ],
      "metadata": {
        "id": "-uevwluhRjTd"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "You might be asking why I would order a categorical variable (especcially since the above example has no defined order) \n",
        "\n",
        "Some things do have an order!  Monday, Tuesday, Wednesday, etc.  GED, High school Grad, Associates, Bachelors, Masters, etc.\n",
        "\n",
        "Some things have an order but not always well defined:  Single, Married, Divorced, Widowed?\n",
        "\n",
        "Somethings have a circular order Winter, Spring, Summer, Fall, Winter, etc.\n",
        "\n",
        "Be careful and use order correctly!"
      ],
      "metadata": {
        "id": "SynrFbgivglH"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Your Turn"
      ],
      "metadata": {
        "id": "njEPBHzKT6bH"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "Using the `banks` dataset retrieved from [UCI Machine Learning Repo](https://archive.ics.uci.edu/ml/datasets/Bank+Marketing)  but also accessible [here](https://raw.githubusercontent.com/nurfnick/Data_Viz/main/bank.csv).  Explore the following questions:\n",
        "\n",
        "1. What is the data type of the first column? Should you change it or is that the best data type.\n",
        "2. The second column is *job* could you assign an order to this string?  Could you assign an order the the fourth column *education*.  If yes, what order would you assign?  Do so.\n",
        "3. Could any of the columns have been a boolean?  Find one and create a new column in the dataframe that is just boolean and has a descriptive name.\n",
        "4. Name one more interesting fact about this dataset and the datatypes."
      ],
      "metadata": {
        "id": "m8lLbPwPT809"
      }
    },
    {
      "cell_type": "code",
      "source": [
        ""
      ],
      "metadata": {
        "id": "y3J890JCRdII"
      },
      "execution_count": null,
      "outputs": []
    }
  ]
}