{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "colab": { "name": "Untitled74.ipynb", "provenance": [], "authorship_tag": "ABX9TyP4m3VKofQbnFs7oZSSvePv", "include_colab_link": true }, "kernelspec": { "name": "python3", "display_name": "Python 3" }, "language_info": { "name": "python" } }, "cells": [ { "cell_type": "markdown", "source": [ "# HTML Tables" ], "metadata": { "id": "rePz6LKGjCfI" } }, { "cell_type": "markdown", "source": [ "Let's load the same Wikipedia page about The Simpsons that we were working with before." ], "metadata": { "id": "kjQv8xfEItn7" } }, { "cell_type": "code", "source": [ "import requests\n", "import pandas as pa\n", "from bs4 import BeautifulSoup\n", "\n", "\n", "r = requests.get('https://en.wikipedia.org/wiki/The_Simpsons')\n", "html_contents = r.text\n", "html_soup = BeautifulSoup(html_contents, \"lxml\")" ], "metadata": { "id": "0ToRgB5NI0By" }, "execution_count": 52, "outputs": [] }, { "cell_type": "markdown", "source": [ "## Tables Mean Data for Processing and Visualization!" ], "metadata": { "id": "0mrpNgyzL07P" } }, { "cell_type": "code", "source": [ "len(html_soup.find_all('table'))" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "LdnmsynUt9FK", "outputId": "f1e3c2ed-7d36-46f3-e3a8-6abae8568983" }, "execution_count": 53, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "41" ] }, "metadata": {}, "execution_count": 53 } ] }, { "cell_type": "markdown", "source": [ "We see here that there are 41 tables stored in a list! Let's get one of them by class. Sometimes there are also **ids** and a grab bag of other ways to select different parts of the HTML. Use your developer tools to examine your particular website!"
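, "\n", "\n", "Before picking a selector, it can help to list which `id` and `class` values the tables on a page actually carry. The block below is a minimal sketch of that idea, not part of the original notebook: it uses only the standard library's `html.parser` rather than BeautifulSoup, `TableInventory` is a hypothetical helper name, and `sample_html` is made-up markup standing in for a real page.

```python
from html.parser import HTMLParser


class TableInventory(HTMLParser):
    # Hypothetical helper: records the id and class list of every <table> tag.

    def __init__(self):
        super().__init__()
        self.tables = []

    def handle_starttag(self, tag, attrs):
        if tag == 'table':
            attr_map = dict(attrs)
            self.tables.append({
                'id': attr_map.get('id'),
                # A missing class attribute becomes an empty list.
                'classes': (attr_map.get('class') or '').split(),
            })


# Made-up sample markup (an assumption, not the real Wikipedia page).
sample_html = '''
<table class='infobox vevent'><tr><td>Genre</td></tr></table>
<table class='wikitable plainrowheaders' id='ratings'><tr><td>1</td></tr></table>
<table><tr><td>no attributes</td></tr></table>
'''

inventory = TableInventory()
inventory.feed(sample_html)

# Show the position, id and classes of each table that was found.
for position, info in enumerate(inventory.tables):
    print(position, info['id'], info['classes'])

# Positions of the tables that carry the wikitable class.
wikitable_positions = [
    position for position, info in enumerate(inventory.tables)
    if 'wikitable' in info['classes']
]
```

On a real page you could feed it `r.text` instead of `sample_html`, then pass a class or id that shows up to `find_all` or `find`."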
], "metadata": { "id": "WRVvs4LeuO9e" } }, { "cell_type": "code", "source": [ "tables = html_soup.find_all('table',class_=\"wikitable\")\n", "tables[0].find_all('a')" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "qoxXZNa_y_gp", "outputId": "c2ba6644-f136-4e48-87b8-9f8079a4c363" }, "execution_count": 54, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "[ET,\n", " 1,\n", " 1989–90,\n", " Life on the Fast Lane,\n", " 2,\n", " 1990–91,\n", " Bart Gets an 'F',\n", " 3,\n", " 1991–92,\n", " Colonel Homer,\n", " 4,\n", " 1992–93,\n", " Lisa's First Word,\n", " 5,\n", " 1993–94,\n", " Treehouse of Horror IV,\n", " 6,\n", " 1994–95,\n", " Treehouse of Horror V,\n", " 7,\n", " 1995–96,\n", " Treehouse of Horror VI,\n", " 8,\n", " 1996–97,\n", " [146],\n", " The Springfield Files,\n", " 9,\n", " 1997–98,\n", " The Two Mrs. Nahasapeemapetilons,\n", " 10,\n", " 1998–99,\n", " Maximum Homerdrive,\n", " 11,\n", " 1999–2000,\n", " The Mansion Family,\n", " 12,\n", " 2000–01,\n", " Worst Episode Ever,\n", " 13,\n", " 2001–02,\n", " The Parent Rap,\n", " 14,\n", " 2002–03,\n", " I'm Spelling as Fast as I Can,\n", " 15,\n", " 2003–04,\n", " I, (Annoyed Grunt)-Bot,\n", " 16,\n", " 2004–05,\n", " Homer and Ned's Hail Mary Pass,\n", " 17,\n", " 2005–06,\n", " Treehouse of Horror XVI,\n", " 18,\n", " 2006–07,\n", " The Wife Aquatic,\n", " 19,\n", " 2007–08,\n", " Treehouse of Horror XVIII,\n", " 20,\n", " 2008–09,\n", " Treehouse of Horror XIX,\n", " 21,\n", " 2009–10,\n", " Once Upon a Time in Springfield,\n", " 22,\n", " 2010–11,\n", " Moms I'd Like to Forget,\n", " 23,\n", " 2011–12,\n", " [147],\n", " The D'oh-cial Network,\n", " 24,\n", " 2012–13,\n", " [148],\n", " Homer Goes to Prep School,\n", " 25,\n", " 2013–14,\n", " [149],\n", " Steal This Episode,\n", " 26,\n", " 2014–15,\n", " [150],\n", " The Man Who Came to Be Dinner,\n", " 27,\n", " 2015–16,\n", " [151],\n", " Teenage Mutant Milk-Caused Hurdles,\n", " 28,\n", " 
2016–17,\n", " [152],\n", " Pork and Burns,\n", " 29,\n", " 2017–18,\n", " [153],\n", " Frink Gets Testy,\n", " 30,\n", " 2018–19,\n", " [154],\n", " The Girl on the Bus,\n", " 31,\n", " 2019–20,\n", " [155],\n", " Go Big or Go Homer,\n", " 32,\n", " 2020–21,\n", " [156],\n", " Treehouse of Horror XXXI,\n", " 33,\n", " 2021–22,\n", " [157]]" ] }, "metadata": {}, "execution_count": 54 } ] }, { "cell_type": "markdown", "source": [ "Here I grabbed the first of the 3 tables that had the *class* \"wikitable\", and then I grabbed all of its links. The table was **HUGE** when I printed everything. To get the table in a nice form, I'll simply pass it to pandas using the `read_html` command. I did need to convert the soup back into a string, and then I selected only the first table to call `df`." ], "metadata": { "id": "1KH3B5rr1A4x" } }, { "cell_type": "code", "source": [ "\n", "df = pa.read_html(str(tables))[0]\n", "df" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 1000 }, "id": "grXFyQGJG73s", "outputId": "5fb01186-57ab-4946-c9ad-a5131ed564bf" }, "execution_count": 55, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "\n", "
\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 | Season | No. ofepisodes | Originally aired | Viewership
 | Season | No. ofepisodes | Season premiere | Season finale | Time slot (ET) | Avg. viewers(in millions) | Most watched episode
 | Season | Season.1 | No. ofepisodes | Season premiere | Season finale | Time slot (ET) | Avg. viewers(in millions) | Viewers(millions) | Episode title
0 | 1 | 1989–90 | 13 | December 17, 1989 | May 13, 1990 | Sunday 8:30 pm | 27.8 | 33.5 | \"Life on the Fast Lane\"
1 | 2 | 1990–91 | 22 | October 11, 1990 | July 11, 1991 | Thursday 8:00 pm | 24.4 | 33.6 | \"Bart Gets an 'F'\"
2 | 3 | 1991–92 | 24 | September 19, 1991 | August 27, 1992 | Thursday 8:00 pm | 21.8 | 25.5 | \"Colonel Homer\"
3 | 4 | 1992–93 | 22 | September 24, 1992 | May 13, 1993 | Thursday 8:00 pm | 22.4 | 28.6 | \"Lisa's First Word\"
4 | 5 | 1993–94 | 22 | September 30, 1993 | May 19, 1994 | Thursday 8:00 pm | 18.9 | 24.0 | \"Treehouse of Horror IV\"
5 | 6 | 1994–95 | 25 | September 4, 1994 | May 21, 1995 | Sunday 8:00 pm | 15.6 | 22.2 | \"Treehouse of Horror V\"
6 | 7 | 1995–96 | 25 | September 17, 1995 | May 19, 1996 | Sunday 8:00 pm (Episodes 1–24)Sunday 8:30 pm (... | 15.1 | 19.7 | \"Treehouse of Horror VI\"
7 | 8 | 1996–97 | 25 | October 27, 1996 | May 18, 1997 | Sunday 8:30 pm (Episodes 1–3)Sunday 8:00 pm (E... | 14.5 | 20.9 | \"The Springfield Files\"
8 | 9 | 1997–98 | 25 | September 21, 1997 | May 17, 1998 | Sunday 8:00 pm | 15.3 | 19.8 | \"The Two Mrs. Nahasapeemapetilons\"
9 | 10 | 1998–99 | 23 | August 23, 1998 | May 16, 1999 | Sunday 8:00 pm | 13.5 | 15.5 | \"Maximum Homerdrive\"
10 | 11 | 1999–2000 | 22 | September 26, 1999 | May 21, 2000 | Sunday 8:00 pm | 8.8 | 18.4 | \"The Mansion Family\"
11 | 12 | 2000–01 | 21 | November 1, 2000 | May 20, 2001 | Sunday 8:00 pm | 15.5 | 18.6 | \"Worst Episode Ever\"
12 | 13 | 2001–02 | 22 | November 6, 2001 | May 22, 2002 | Tuesday 8:30 pm (Episode 1)Sunday 8:00 pm (Epi... | 12.5 | 14.9 | \"The Parent Rap\"
13 | 14 | 2002–03 | 22 | November 3, 2002 | May 18, 2003 | Sunday 8:00 pm (Episodes 1–11, 13–21)Sunday 8:... | 14.4 | 22.1 | \"I'm Spelling as Fast as I Can\"
14 | 15 | 2003–04 | 22 | November 2, 2003 | May 23, 2004 | Sunday 8:00 pm | 11.0 | 16.3 | \"I, (Annoyed Grunt)-Bot\"
15 | 16 | 2004–05 | 21 | November 7, 2004 | May 15, 2005 | Sunday 8:00 pm (Episodes 1–7, 9–16, 18, 20)Sun... | 10.2 | 23.07 | \"Homer and Ned's Hail Mary Pass\"
16 | 17 | 2005–06 | 22 | September 11, 2005 | May 21, 2006 | Sunday 8:00 pm | 9.55 | 11.63 | \"Treehouse of Horror XVI\"
17 | 18 | 2006–07 | 22 | September 10, 2006 | May 20, 2007 | Sunday 8:00 pm | 9.15 | 13.90 | \"The Wife Aquatic\"
18 | 19 | 2007–08 | 20 | September 23, 2007 | May 18, 2008 | Sunday 8:00 pm | 8.37 | 11.7 | \"Treehouse of Horror XVIII\"
19 | 20 | 2008–09 | 21 | September 28, 2008 | May 17, 2009 | Sunday 8:00 pm | 7.1 | 12.4 | \"Treehouse of Horror XIX\"
20 | 21 | 2009–10 | 23 | September 27, 2009 | May 23, 2010 | Sunday 8:00 pm | 7.1 | 14.62 | \"Once Upon a Time in Springfield\"
21 | 22 | 2010–11 | 22 | September 26, 2010 | May 22, 2011 | Sunday 8:00 pm | 7.09 | 12.6 | \"Moms I'd Like to Forget\"
22 | 23 | 2011–12 | 22 | September 25, 2011 | May 20, 2012 | Sunday 8:00 pm | 6.15[147] | 11.48 | \"The D'oh-cial Network\"
23 | 24 | 2012–13 | 22 | September 30, 2012 | May 19, 2013 | Sunday 8:00 pm (Episodes 1-21)Sunday 8:30 pm (... | 5.41[148] | 8.97 | \"Homer Goes to Prep School\"
24 | 25 | 2013–14 | 22 | September 29, 2013 | May 18, 2014 | Sunday 8:00 pm (Episodes 1–11 & 13-22)Sunday 7... | 5.02[149] | 12.04 | \"Steal This Episode\"
25 | 26 | 2014–15 | 22 | September 28, 2014 | May 17, 2015 | Sunday 8:00 pm | 5.61[150] | 10.62 | \"The Man Who Came to Be Dinner\"
26 | 27 | 2015–16 | 22 | September 27, 2015 | May 22, 2016 | Sunday 8:00 pm | 4.0[151] | 8.33 | \"Teenage Mutant Milk-Caused Hurdles\"
27 | 28 | 2016–17 | 22 | September 25, 2016 | May 21, 2017 | Sunday 8:00 pm | 4.80[152] | 8.19 | \"Pork and Burns\"
28 | 29 | 2017–18 | 21 | October 1, 2017 | May 20, 2018 | Sunday 8:00 pm | 4.07[153] | 8.04 | \"Frink Gets Testy\"
29 | 30 | 2018–19 | 23 | September 30, 2018 | May 12, 2019 | Sunday 8:00 pm | 3.10[154] | 8.20 | \"The Girl on the Bus\"
30 | 31 | 2019–20 | 22 | September 29, 2019 | May 17, 2020 | Sunday 8:00 pm | 2.58[155] | 5.63 | \"Go Big or Go Homer\"
31 | 32 | 2020–21 | 22 | September 27, 2020 | May 23, 2021 | Sunday 8:00 pm (Episodes 1–10 & 12-22)Sunday 9... | 2.32[156] | 4.93 | \"Treehouse of Horror XXXI\"
32 | 33 | 2021–22 | 22 | September 26, 2021 | May 15, 2022[157] | Sunday 8:00 pm (Episodes 1–10 & 12-22)Sunday 9... | TBA | TBA | TBA
\n", "
\n", " \n", " \n", " \n", "\n", " \n", "
\n", "
\n", " " ], "text/plain": [ " Season ... Viewership \n", " Season ... Most watched episode \n", " Season Season.1 ... Viewers(millions) Episode title\n", "0 1 1989–90 ... 33.5 \"Life on the Fast Lane\"\n", "1 2 1990–91 ... 33.6 \"Bart Gets an 'F'\"\n", "2 3 1991–92 ... 25.5 \"Colonel Homer\"\n", "3 4 1992–93 ... 28.6 \"Lisa's First Word\"\n", "4 5 1993–94 ... 24.0 \"Treehouse of Horror IV\"\n", "5 6 1994–95 ... 22.2 \"Treehouse of Horror V\"\n", "6 7 1995–96 ... 19.7 \"Treehouse of Horror VI\"\n", "7 8 1996–97 ... 20.9 \"The Springfield Files\"\n", "8 9 1997–98 ... 19.8 \"The Two Mrs. Nahasapeemapetilons\"\n", "9 10 1998–99 ... 15.5 \"Maximum Homerdrive\"\n", "10 11 1999–2000 ... 18.4 \"The Mansion Family\"\n", "11 12 2000–01 ... 18.6 \"Worst Episode Ever\"\n", "12 13 2001–02 ... 14.9 \"The Parent Rap\"\n", "13 14 2002–03 ... 22.1 \"I'm Spelling as Fast as I Can\"\n", "14 15 2003–04 ... 16.3 \"I, (Annoyed Grunt)-Bot\"\n", "15 16 2004–05 ... 23.07 \"Homer and Ned's Hail Mary Pass\"\n", "16 17 2005–06 ... 11.63 \"Treehouse of Horror XVI\"\n", "17 18 2006–07 ... 13.90 \"The Wife Aquatic\"\n", "18 19 2007–08 ... 11.7 \"Treehouse of Horror XVIII\"\n", "19 20 2008–09 ... 12.4 \"Treehouse of Horror XIX\"\n", "20 21 2009–10 ... 14.62 \"Once Upon a Time in Springfield\"\n", "21 22 2010–11 ... 12.6 \"Moms I'd Like to Forget\"\n", "22 23 2011–12 ... 11.48 \"The D'oh-cial Network\"\n", "23 24 2012–13 ... 8.97 \"Homer Goes to Prep School\"\n", "24 25 2013–14 ... 12.04 \"Steal This Episode\"\n", "25 26 2014–15 ... 10.62 \"The Man Who Came to Be Dinner\"\n", "26 27 2015–16 ... 8.33 \"Teenage Mutant Milk-Caused Hurdles\"\n", "27 28 2016–17 ... 8.19 \"Pork and Burns\"\n", "28 29 2017–18 ... 8.04 \"Frink Gets Testy\"\n", "29 30 2018–19 ... 8.20 \"The Girl on the Bus\"\n", "30 31 2019–20 ... 5.63 \"Go Big or Go Homer\"\n", "31 32 2020–21 ... 4.93 \"Treehouse of Horror XXXI\"\n", "32 33 2021–22 ... 
TBA TBA\n", "\n", "[33 rows x 9 columns]" ] }, "metadata": {}, "execution_count": 55 } ] }, { "cell_type": "markdown", "source": [ "The column names are not quite right, but that is not a terrible fix. I actually found this approach later; below you can see the list-based way in which I first built the tables. The list-based way is more flexible, but the simplicity of the pandas way cannot be beat!" ], "metadata": { "id": "pd_0fC5nICDQ" } }, { "cell_type": "markdown", "source": [ "The column names here are a problem because there are three levels of them! I will simply use the `droplevel` command twice to remove the top two levels of the multi-index in the column titles." ], "metadata": { "id": "PeGxFMHkODZQ" } }, { "cell_type": "code", "source": [ "df.columns = df.columns.droplevel(0).droplevel(0)\n", "df" ], "metadata": { "id": "p8O7usLDOpF3", "outputId": "2d3ced87-670d-4083-becf-073491d7ff75", "colab": { "base_uri": "https://localhost:8080/", "height": 1000 } }, "execution_count": 56, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "\n", "
\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 | Season | Season.1 | No. ofepisodes | Season premiere | Season finale | Time slot (ET) | Avg. viewers(in millions) | Viewers(millions) | Episode title
0 | 1 | 1989–90 | 13 | December 17, 1989 | May 13, 1990 | Sunday 8:30 pm | 27.8 | 33.5 | \"Life on the Fast Lane\"
1 | 2 | 1990–91 | 22 | October 11, 1990 | July 11, 1991 | Thursday 8:00 pm | 24.4 | 33.6 | \"Bart Gets an 'F'\"
2 | 3 | 1991–92 | 24 | September 19, 1991 | August 27, 1992 | Thursday 8:00 pm | 21.8 | 25.5 | \"Colonel Homer\"
3 | 4 | 1992–93 | 22 | September 24, 1992 | May 13, 1993 | Thursday 8:00 pm | 22.4 | 28.6 | \"Lisa's First Word\"
4 | 5 | 1993–94 | 22 | September 30, 1993 | May 19, 1994 | Thursday 8:00 pm | 18.9 | 24.0 | \"Treehouse of Horror IV\"
5 | 6 | 1994–95 | 25 | September 4, 1994 | May 21, 1995 | Sunday 8:00 pm | 15.6 | 22.2 | \"Treehouse of Horror V\"
6 | 7 | 1995–96 | 25 | September 17, 1995 | May 19, 1996 | Sunday 8:00 pm (Episodes 1–24)Sunday 8:30 pm (... | 15.1 | 19.7 | \"Treehouse of Horror VI\"
7 | 8 | 1996–97 | 25 | October 27, 1996 | May 18, 1997 | Sunday 8:30 pm (Episodes 1–3)Sunday 8:00 pm (E... | 14.5 | 20.9 | \"The Springfield Files\"
8 | 9 | 1997–98 | 25 | September 21, 1997 | May 17, 1998 | Sunday 8:00 pm | 15.3 | 19.8 | \"The Two Mrs. Nahasapeemapetilons\"
9 | 10 | 1998–99 | 23 | August 23, 1998 | May 16, 1999 | Sunday 8:00 pm | 13.5 | 15.5 | \"Maximum Homerdrive\"
10 | 11 | 1999–2000 | 22 | September 26, 1999 | May 21, 2000 | Sunday 8:00 pm | 8.8 | 18.4 | \"The Mansion Family\"
11 | 12 | 2000–01 | 21 | November 1, 2000 | May 20, 2001 | Sunday 8:00 pm | 15.5 | 18.6 | \"Worst Episode Ever\"
12 | 13 | 2001–02 | 22 | November 6, 2001 | May 22, 2002 | Tuesday 8:30 pm (Episode 1)Sunday 8:00 pm (Epi... | 12.5 | 14.9 | \"The Parent Rap\"
13 | 14 | 2002–03 | 22 | November 3, 2002 | May 18, 2003 | Sunday 8:00 pm (Episodes 1–11, 13–21)Sunday 8:... | 14.4 | 22.1 | \"I'm Spelling as Fast as I Can\"
14 | 15 | 2003–04 | 22 | November 2, 2003 | May 23, 2004 | Sunday 8:00 pm | 11.0 | 16.3 | \"I, (Annoyed Grunt)-Bot\"
15 | 16 | 2004–05 | 21 | November 7, 2004 | May 15, 2005 | Sunday 8:00 pm (Episodes 1–7, 9–16, 18, 20)Sun... | 10.2 | 23.07 | \"Homer and Ned's Hail Mary Pass\"
16 | 17 | 2005–06 | 22 | September 11, 2005 | May 21, 2006 | Sunday 8:00 pm | 9.55 | 11.63 | \"Treehouse of Horror XVI\"
17 | 18 | 2006–07 | 22 | September 10, 2006 | May 20, 2007 | Sunday 8:00 pm | 9.15 | 13.90 | \"The Wife Aquatic\"
18 | 19 | 2007–08 | 20 | September 23, 2007 | May 18, 2008 | Sunday 8:00 pm | 8.37 | 11.7 | \"Treehouse of Horror XVIII\"
19 | 20 | 2008–09 | 21 | September 28, 2008 | May 17, 2009 | Sunday 8:00 pm | 7.1 | 12.4 | \"Treehouse of Horror XIX\"
20 | 21 | 2009–10 | 23 | September 27, 2009 | May 23, 2010 | Sunday 8:00 pm | 7.1 | 14.62 | \"Once Upon a Time in Springfield\"
21 | 22 | 2010–11 | 22 | September 26, 2010 | May 22, 2011 | Sunday 8:00 pm | 7.09 | 12.6 | \"Moms I'd Like to Forget\"
22 | 23 | 2011–12 | 22 | September 25, 2011 | May 20, 2012 | Sunday 8:00 pm | 6.15[147] | 11.48 | \"The D'oh-cial Network\"
23 | 24 | 2012–13 | 22 | September 30, 2012 | May 19, 2013 | Sunday 8:00 pm (Episodes 1-21)Sunday 8:30 pm (... | 5.41[148] | 8.97 | \"Homer Goes to Prep School\"
24 | 25 | 2013–14 | 22 | September 29, 2013 | May 18, 2014 | Sunday 8:00 pm (Episodes 1–11 & 13-22)Sunday 7... | 5.02[149] | 12.04 | \"Steal This Episode\"
25 | 26 | 2014–15 | 22 | September 28, 2014 | May 17, 2015 | Sunday 8:00 pm | 5.61[150] | 10.62 | \"The Man Who Came to Be Dinner\"
26 | 27 | 2015–16 | 22 | September 27, 2015 | May 22, 2016 | Sunday 8:00 pm | 4.0[151] | 8.33 | \"Teenage Mutant Milk-Caused Hurdles\"
27 | 28 | 2016–17 | 22 | September 25, 2016 | May 21, 2017 | Sunday 8:00 pm | 4.80[152] | 8.19 | \"Pork and Burns\"
28 | 29 | 2017–18 | 21 | October 1, 2017 | May 20, 2018 | Sunday 8:00 pm | 4.07[153] | 8.04 | \"Frink Gets Testy\"
29 | 30 | 2018–19 | 23 | September 30, 2018 | May 12, 2019 | Sunday 8:00 pm | 3.10[154] | 8.20 | \"The Girl on the Bus\"
30 | 31 | 2019–20 | 22 | September 29, 2019 | May 17, 2020 | Sunday 8:00 pm | 2.58[155] | 5.63 | \"Go Big or Go Homer\"
31 | 32 | 2020–21 | 22 | September 27, 2020 | May 23, 2021 | Sunday 8:00 pm (Episodes 1–10 & 12-22)Sunday 9... | 2.32[156] | 4.93 | \"Treehouse of Horror XXXI\"
32 | 33 | 2021–22 | 22 | September 26, 2021 | May 15, 2022[157] | Sunday 8:00 pm (Episodes 1–10 & 12-22)Sunday 9... | TBA | TBA | TBA
\n", "
\n", " \n", " \n", " \n", "\n", " \n", "
\n", "
\n", " " ], "text/plain": [ " Season Season.1 ... Viewers(millions) Episode title\n", "0 1 1989–90 ... 33.5 \"Life on the Fast Lane\"\n", "1 2 1990–91 ... 33.6 \"Bart Gets an 'F'\"\n", "2 3 1991–92 ... 25.5 \"Colonel Homer\"\n", "3 4 1992–93 ... 28.6 \"Lisa's First Word\"\n", "4 5 1993–94 ... 24.0 \"Treehouse of Horror IV\"\n", "5 6 1994–95 ... 22.2 \"Treehouse of Horror V\"\n", "6 7 1995–96 ... 19.7 \"Treehouse of Horror VI\"\n", "7 8 1996–97 ... 20.9 \"The Springfield Files\"\n", "8 9 1997–98 ... 19.8 \"The Two Mrs. Nahasapeemapetilons\"\n", "9 10 1998–99 ... 15.5 \"Maximum Homerdrive\"\n", "10 11 1999–2000 ... 18.4 \"The Mansion Family\"\n", "11 12 2000–01 ... 18.6 \"Worst Episode Ever\"\n", "12 13 2001–02 ... 14.9 \"The Parent Rap\"\n", "13 14 2002–03 ... 22.1 \"I'm Spelling as Fast as I Can\"\n", "14 15 2003–04 ... 16.3 \"I, (Annoyed Grunt)-Bot\"\n", "15 16 2004–05 ... 23.07 \"Homer and Ned's Hail Mary Pass\"\n", "16 17 2005–06 ... 11.63 \"Treehouse of Horror XVI\"\n", "17 18 2006–07 ... 13.90 \"The Wife Aquatic\"\n", "18 19 2007–08 ... 11.7 \"Treehouse of Horror XVIII\"\n", "19 20 2008–09 ... 12.4 \"Treehouse of Horror XIX\"\n", "20 21 2009–10 ... 14.62 \"Once Upon a Time in Springfield\"\n", "21 22 2010–11 ... 12.6 \"Moms I'd Like to Forget\"\n", "22 23 2011–12 ... 11.48 \"The D'oh-cial Network\"\n", "23 24 2012–13 ... 8.97 \"Homer Goes to Prep School\"\n", "24 25 2013–14 ... 12.04 \"Steal This Episode\"\n", "25 26 2014–15 ... 10.62 \"The Man Who Came to Be Dinner\"\n", "26 27 2015–16 ... 8.33 \"Teenage Mutant Milk-Caused Hurdles\"\n", "27 28 2016–17 ... 8.19 \"Pork and Burns\"\n", "28 29 2017–18 ... 8.04 \"Frink Gets Testy\"\n", "29 30 2018–19 ... 8.20 \"The Girl on the Bus\"\n", "30 31 2019–20 ... 5.63 \"Go Big or Go Homer\"\n", "31 32 2020–21 ... 4.93 \"Treehouse of Horror XXXI\"\n", "32 33 2021–22 ... 
TBA TBA\n", "\n", "[33 rows x 9 columns]" ] }, "metadata": {}, "execution_count": 56 } ] }, { "cell_type": "markdown", "source": [ "That is much better, although we lose a little bit of information about the last two columns: the dropped levels said that both describe the most watched episode." ], "metadata": { "id": "oPLxQ3HtSyZF" } }, { "cell_type": "markdown", "source": [ "Let's grab one more table to see whether this still works as well. I just grabbed the very first table on the page." ], "metadata": { "id": "Lx_4XbV4AAn5" } }, { "cell_type": "code", "source": [ "df2 = pa.read_html(str(html_soup.find('table')))[0]\n", "\n", "df2" ], "metadata": { "id": "NbCRj0NTAGH2", "outputId": "d23ac8e9-52a4-4334-f3f2-a3d98646c004", "colab": { "base_uri": "https://localhost:8080/", "height": 896 } }, "execution_count": 57, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "\n", "
\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 | The Simpsons | The Simpsons.1
0 | NaN | NaN
1 | Genre | Animated sitcom Satire
2 | Created by | Matt Groening
3 | Based on | The Simpsons shortsby Matt Groening
4 | Developed by | James L. Brooks Matt Groening Sam Simon
5 | Voices of | Dan Castellaneta Julie Kavner Nancy Cartwright...
6 | Theme music composer | Danny Elfman
7 | Opening theme | \"The Simpsons Theme\"
8 | Composers | Richard Gibbs (1989–1990)Alf Clausen (1990–201...
9 | Country of origin | United States
10 | Original language | English
11 | No. of seasons | 33
12 | No. of episodes | 717 (list of episodes)
13 | Production | Production
14 | Executive producers | List James L. Brooks (entire run) Matt Groen...
15 | Running time | 21–24 minutes
16 | Production companies | Gracie Films 20th Television[a] (seasons 1–32)...
17 | Distributor | 20th Television
18 | Release | Release
19 | Original network | Fox
20 | Picture format | NTSC (1989–2009)HDTV 720p (2009–present)
21 | Audio format | Stereo (1989–1991)Dolby Surround (1991–2009)Do...
22 | Original release | December 17, 1989present
23 | Chronology | Chronology
24 | Preceded by | The Simpsons shorts from The Tracey Ullman Show
25 | External links | External links
26 | Official website | Official website
\n", "
\n", " \n", " \n", " \n", "\n", " \n", "
\n", "
\n", " " ], "text/plain": [ " The Simpsons The Simpsons.1\n", "0 NaN NaN\n", "1 Genre Animated sitcom Satire\n", "2 Created by Matt Groening\n", "3 Based on The Simpsons shortsby Matt Groening\n", "4 Developed by James L. Brooks Matt Groening Sam Simon\n", "5 Voices of Dan Castellaneta Julie Kavner Nancy Cartwright...\n", "6 Theme music composer Danny Elfman\n", "7 Opening theme \"The Simpsons Theme\"\n", "8 Composers Richard Gibbs (1989–1990)Alf Clausen (1990–201...\n", "9 Country of origin United States\n", "10 Original language English\n", "11 No. of seasons 33\n", "12 No. of episodes 717 (list of episodes)\n", "13 Production Production\n", "14 Executive producers List James L. Brooks (entire run) Matt Groen...\n", "15 Running time 21–24 minutes\n", "16 Production companies Gracie Films 20th Television[a] (seasons 1–32)...\n", "17 Distributor 20th Television\n", "18 Release Release\n", "19 Original network Fox\n", "20 Picture format NTSC (1989–2009)HDTV 720p (2009–present)\n", "21 Audio format Stereo (1989–1991)Dolby Surround (1991–2009)Do...\n", "22 Original release December 17, 1989present\n", "23 Chronology Chronology\n", "24 Preceded by The Simpsons shorts from The Tracey Ullman Show\n", "25 External links External links\n", "26 Official website Official website" ] }, "metadata": {}, "execution_count": 57 } ] }, { "cell_type": "markdown", "source": [ "## Table the Hard Way" ], "metadata": { "id": "1dc4SXrGIVx4" } }, { "cell_type": "code", "source": [ "data = []\n", "for table in tables:\n", "    rows = table.find_all('tr')\n", "    # Header cells of the first row (collected for reference, not used below)\n", "    headers = []\n", "    for header in rows[0].find_all('th'):\n", "        headers.append(header.text.replace('\\n', ''))\n", "    # Every later row, including the extra header rows, becomes a list of cell texts\n", "    for row in rows[1:]:\n", "        values = []\n", "        for col in row.find_all(['th', 'td']):\n", "            values.append(col.text.replace('\\n', ''))\n", "        data.append(values)\n", "data[:4]\n", "\n", "#pa.DataFrame(data[1:], columns = data[0])" ], "metadata": { "id": "8V-njfrh3utZ", "colab": { "base_uri":
"https://localhost:8080/" }, "outputId": "5d605175-acdd-4871-c37e-81c2477a007c" }, "execution_count": 66, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "[['Season premiere',\n", " 'Season finale',\n", " 'Time slot (ET)',\n", " 'Avg. viewers(in millions)',\n", " 'Most watched episode'],\n", " ['Viewers(millions)', 'Episode title'],\n", " ['1',\n", " '1989–90',\n", " '13',\n", " 'December 17, 1989',\n", " 'May 13, 1990',\n", " 'Sunday 8:30\\xa0pm',\n", " '27.8',\n", " '33.5',\n", " '\"Life on the Fast Lane\"'],\n", " ['2',\n", " '1990–91',\n", " '22',\n", " 'October 11, 1990',\n", " 'July 11, 1991',\n", " 'Thursday 8:00\\xa0pm',\n", " '24.4',\n", " '33.6',\n", " '\"Bart Gets an \\'F\\'\"']]" ] }, "metadata": {}, "execution_count": 66 } ] }, { "cell_type": "markdown", "source": [ "I have to do some work here to get this into a dataframe, mostly just getting the column names correct. Several columns were not named, and some names ended up in their own row. This is why it is important to look at your outputs! Note also that this loop does not repeat cells that span several rows, so rows that sit under such a cell come out one value short and pandas pads them with `None`." ], "metadata": { "id": "Fk2b_cl_86eR" } }, { "cell_type": "code", "source": [ "titles = []\n", "titles.append('Season')\n", "titles.append('Years')\n", "titles.append('Episodes')\n", "for name in data[0]:\n", "    titles.append(name)\n", "titles.append('Most watched episode title')\n", "\n", "df = pa.DataFrame(data[2:], columns = titles)\n", "df" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 1000 }, "id": "-bMOmoAG9V9Q", "outputId": "3ac08fe5-c05f-40c7-86e2-1ca5808b63f5" }, "execution_count": 59, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "\n", "
\n", "
\n", "
\n",
"<table>\n",
"<thead>\n",
"<tr><th></th><th>Season</th><th>Years</th><th>Episodes</th><th>Season premiere</th><th>Season finale</th><th>Time slot (ET)</th><th>Avg. viewers(in millions)</th><th>Most watched episode</th><th>Most watched episode title</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>1</td><td>1989–90</td><td>13</td><td>December 17, 1989</td><td>May 13, 1990</td><td>Sunday 8:30 pm</td><td>27.8</td><td>33.5</td><td>\"Life on the Fast Lane\"</td></tr>\n",
"<tr><th>1</th><td>2</td><td>1990–91</td><td>22</td><td>October 11, 1990</td><td>July 11, 1991</td><td>Thursday 8:00 pm</td><td>24.4</td><td>33.6</td><td>\"Bart Gets an 'F'\"</td></tr>\n",
"<tr><th>2</th><td>3</td><td>1991–92</td><td>24</td><td>September 19, 1991</td><td>August 27, 1992</td><td>21.8</td><td>25.5</td><td>\"Colonel Homer\"</td><td>None</td></tr>\n",
"<tr><th>3</th><td>4</td><td>1992–93</td><td>22</td><td>September 24, 1992</td><td>May 13, 1993</td><td>22.4</td><td>28.6</td><td>\"Lisa's First Word\"</td><td>None</td></tr>\n",
"<tr><th>4</th><td>5</td><td>1993–94</td><td>22</td><td>September 30, 1993</td><td>May 19, 1994</td><td>18.9</td><td>24.0</td><td>\"Treehouse of Horror IV\"</td><td>None</td></tr>\n",
"<tr><th>5</th><td>6</td><td>1994–95</td><td>25</td><td>September 4, 1994</td><td>May 21, 1995</td><td>Sunday 8:00 pm</td><td>15.6</td><td>22.2</td><td>\"Treehouse of Horror V\"</td></tr>\n",
"<tr><th>6</th><td>7</td><td>1995–96</td><td>25</td><td>September 17, 1995</td><td>May 19, 1996</td><td>Sunday 8:00 pm (Episodes 1–24)Sunday 8:30 pm (...</td><td>15.1</td><td>19.7</td><td>\"Treehouse of Horror VI\"</td></tr>\n",
"<tr><th>7</th><td>8</td><td>1996–97</td><td>25</td><td>October 27, 1996</td><td>May 18, 1997</td><td>Sunday 8:30 pm (Episodes 1–3)Sunday 8:00 pm (E...</td><td>14.5</td><td>20.9</td><td>\"The Springfield Files\"</td></tr>\n",
"<tr><th>8</th><td>9</td><td>1997–98</td><td>25</td><td>September 21, 1997</td><td>May 17, 1998</td><td>Sunday 8:00 pm</td><td>15.3</td><td>19.8</td><td>\"The Two Mrs. Nahasapeemapetilons\"</td></tr>\n",
"<tr><th>9</th><td>10</td><td>1998–99</td><td>23</td><td>August 23, 1998</td><td>May 16, 1999</td><td>13.5</td><td>15.5</td><td>\"Maximum Homerdrive\"</td><td>None</td></tr>\n",
"<tr><th>10</th><td>11</td><td>1999–2000</td><td>22</td><td>September 26, 1999</td><td>May 21, 2000</td><td>8.8</td><td>18.4</td><td>\"The Mansion Family\"</td><td>None</td></tr>\n",
"<tr><th>11</th><td>12</td><td>2000–01</td><td>21</td><td>November 1, 2000</td><td>May 20, 2001</td><td>15.5</td><td>18.6</td><td>\"Worst Episode Ever\"</td><td>None</td></tr>\n",
"<tr><th>12</th><td>13</td><td>2001–02</td><td>22</td><td>November 6, 2001</td><td>May 22, 2002</td><td>Tuesday 8:30 pm (Episode 1)Sunday 8:00 pm (Epi...</td><td>12.5</td><td>14.9</td><td>\"The Parent Rap\"</td></tr>\n",
"<tr><th>13</th><td>14</td><td>2002–03</td><td>22</td><td>November 3, 2002</td><td>May 18, 2003</td><td>Sunday 8:00 pm (Episodes 1–11, 13–21)Sunday 8:...</td><td>14.4</td><td>22.1</td><td>\"I'm Spelling as Fast as I Can\"</td></tr>\n",
"<tr><th>14</th><td>15</td><td>2003–04</td><td>22</td><td>November 2, 2003</td><td>May 23, 2004</td><td>Sunday 8:00 pm</td><td>11.0</td><td>16.3</td><td>\"I, (Annoyed Grunt)-Bot\"</td></tr>\n",
"<tr><th>15</th><td>16</td><td>2004–05</td><td>21</td><td>November 7, 2004</td><td>May 15, 2005</td><td>Sunday 8:00 pm (Episodes 1–7, 9–16, 18, 20)Sun...</td><td>10.2</td><td>23.07</td><td>\"Homer and Ned's Hail Mary Pass\"</td></tr>\n",
"<tr><th>16</th><td>17</td><td>2005–06</td><td>22</td><td>September 11, 2005</td><td>May 21, 2006</td><td>Sunday 8:00 pm</td><td>9.55</td><td>11.63</td><td>\"Treehouse of Horror XVI\"</td></tr>\n",
"<tr><th>17</th><td>18</td><td>2006–07</td><td>22</td><td>September 10, 2006</td><td>May 20, 2007</td><td>9.15</td><td>13.90</td><td>\"The Wife Aquatic\"</td><td>None</td></tr>\n",
"<tr><th>18</th><td>19</td><td>2007–08</td><td>20</td><td>September 23, 2007</td><td>May 18, 2008</td><td>8.37</td><td>11.7</td><td>\"Treehouse of Horror XVIII\"</td><td>None</td></tr>\n",
"<tr><th>19</th><td>20</td><td>2008–09</td><td>21</td><td>September 28, 2008</td><td>May 17, 2009</td><td>7.1</td><td>12.4</td><td>\"Treehouse of Horror XIX\"</td><td>None</td></tr>\n",
"<tr><th>20</th><td>21</td><td>2009–10</td><td>23</td><td>September 27, 2009</td><td>May 23, 2010</td><td>7.1</td><td>14.62</td><td>\"Once Upon a Time in Springfield\"</td><td>None</td></tr>\n",
"<tr><th>21</th><td>22</td><td>2010–11</td><td>22</td><td>September 26, 2010</td><td>May 22, 2011</td><td>7.09</td><td>12.6</td><td>\"Moms I'd Like to Forget\"</td><td>None</td></tr>\n",
"<tr><th>22</th><td>23</td><td>2011–12</td><td>22</td><td>September 25, 2011</td><td>May 20, 2012</td><td>6.15[147]</td><td>11.48</td><td>\"The D'oh-cial Network\"</td><td>None</td></tr>\n",
"<tr><th>23</th><td>24</td><td>2012–13</td><td>22</td><td>September 30, 2012</td><td>May 19, 2013</td><td>Sunday 8:00 pm (Episodes 1-21)Sunday 8:30 pm (...</td><td>5.41[148]</td><td>8.97</td><td>\"Homer Goes to Prep School\"</td></tr>\n",
"<tr><th>24</th><td>25</td><td>2013–14</td><td>22</td><td>September 29, 2013</td><td>May 18, 2014</td><td>Sunday 8:00 pm (Episodes 1–11 &amp; 13-22)Sunday 7...</td><td>5.02[149]</td><td>12.04</td><td>\"Steal This Episode\"</td></tr>\n",
"<tr><th>25</th><td>26</td><td>2014–15</td><td>22</td><td>September 28, 2014</td><td>May 17, 2015</td><td>Sunday 8:00 pm</td><td>5.61[150]</td><td>10.62</td><td>\"The Man Who Came to Be Dinner\"</td></tr>\n",
"<tr><th>26</th><td>27</td><td>2015–16</td><td>22</td><td>September 27, 2015</td><td>May 22, 2016</td><td>4.0[151]</td><td>8.33</td><td>\"Teenage Mutant Milk-Caused Hurdles\"</td><td>None</td></tr>\n",
"<tr><th>27</th><td>28</td><td>2016–17</td><td>22</td><td>September 25, 2016</td><td>May 21, 2017 (2017-05-21)</td><td>4.80[152]</td><td>8.19</td><td>\"Pork and Burns\"</td><td>None</td></tr>\n",
"<tr><th>28</th><td>29</td><td>2017–18</td><td>21</td><td>October 1, 2017</td><td>May 20, 2018</td><td>4.07[153]</td><td>8.04</td><td>\"Frink Gets Testy\"</td><td>None</td></tr>\n",
"<tr><th>29</th><td>30</td><td>2018–19</td><td>23</td><td>September 30, 2018</td><td>May 12, 2019</td><td>3.10[154]</td><td>8.20</td><td>\"The Girl on the Bus\"</td><td>None</td></tr>\n",
"<tr><th>30</th><td>31</td><td>2019–20</td><td>22</td><td>September 29, 2019</td><td>May 17, 2020</td><td>2.58[155]</td><td>5.63</td><td>\"Go Big or Go Homer\"</td><td>None</td></tr>\n",
"<tr><th>31</th><td>32</td><td>2020–21</td><td>22</td><td>September 27, 2020</td><td>May 23, 2021</td><td>Sunday 8:00 pm (Episodes 1–10 &amp; 12-22)Sunday 9...</td><td>2.32[156]</td><td>4.93</td><td>\"Treehouse of Horror XXXI\"</td></tr>\n",
"<tr><th>32</th><td>33</td><td>2021–22</td><td>22</td><td>September 26, 2021</td><td>May 15, 2022[157]</td><td>TBA</td><td>TBA</td><td>TBA</td><td>None</td></tr>\n",
"</tbody>\n",
"</table>\n",
"
\n", " " ], "text/plain": [ " Season ... Most watched episode title\n", "0 1 ... \"Life on the Fast Lane\"\n", "1 2 ... \"Bart Gets an 'F'\"\n", "2 3 ... None\n", "3 4 ... None\n", "4 5 ... None\n", "5 6 ... \"Treehouse of Horror V\"\n", "6 7 ... \"Treehouse of Horror VI\"\n", "7 8 ... \"The Springfield Files\"\n", "8 9 ... \"The Two Mrs. Nahasapeemapetilons\"\n", "9 10 ... None\n", "10 11 ... None\n", "11 12 ... None\n", "12 13 ... \"The Parent Rap\"\n", "13 14 ... \"I'm Spelling as Fast as I Can\"\n", "14 15 ... \"I, (Annoyed Grunt)-Bot\"\n", "15 16 ... \"Homer and Ned's Hail Mary Pass\"\n", "16 17 ... \"Treehouse of Horror XVI\"\n", "17 18 ... None\n", "18 19 ... None\n", "19 20 ... None\n", "20 21 ... None\n", "21 22 ... None\n", "22 23 ... None\n", "23 24 ... \"Homer Goes to Prep School\"\n", "24 25 ... \"Steal This Episode\"\n", "25 26 ... \"The Man Who Came to Be Dinner\"\n", "26 27 ... None\n", "27 28 ... None\n", "28 29 ... None\n", "29 30 ... None\n", "30 31 ... None\n", "31 32 ... \"Treehouse of Horror XXXI\"\n", "32 33 ... None\n", "\n", "[33 rows x 9 columns]" ] }, "metadata": {}, "execution_count": 59 } ] }, { "cell_type": "markdown", "source": [ "Actually, I still have a problem with my data. Many rows reuse the time slot from the row above, so they came through one cell short: their later values are shifted one column to the left and the last column is None. 
Let's see if we can fix that." ], "metadata": { "id": "CZQhJBQlD_h2" } }, { "cell_type": "code", "source": [
"# Short rows have no time slot cell; copy it from the row above.\n",
"newdata = []\n",
"for i in range(2, 35):  # the 33 season rows are data[2] through data[34]\n",
"    row = []\n",
"    if len(data[i]) != 9:\n",
"        # First five cells: season, years, episodes, premiere, finale.\n",
"        for j in range(5):\n",
"            row.append(data[i][j])\n",
"        # Time slot of the previous, already repaired, row.\n",
"        row.append(newdata[-1][5])\n",
"        # Remaining three cells of the short row.\n",
"        for j in range(5, 8):\n",
"            row.append(data[i][j])\n",
"    else:\n",
"        row = data[i]\n",
"    newdata.append(row)\n"
], "metadata": { "id": "6wkKJFFvC3hs" }, "execution_count": 60, "outputs": [] }, { "cell_type": "code", "source": [ "df = pa.DataFrame(newdata, columns=titles)\n", "\n", "df" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 1000 }, "id": "jwOlni6KFoaq", "outputId": "6012bd48-4256-4402-c6a7-44e60477a25e" }, "execution_count": 61, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "\n", "
\n",
"<table>\n",
"<thead>\n",
"<tr><th></th><th>Season</th><th>Years</th><th>Episodes</th><th>Season premiere</th><th>Season finale</th><th>Time slot (ET)</th><th>Avg. viewers(in millions)</th><th>Most watched episode</th><th>Most watched episode title</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>1</td><td>1989–90</td><td>13</td><td>December 17, 1989</td><td>May 13, 1990</td><td>Sunday 8:30 pm</td><td>27.8</td><td>33.5</td><td>\"Life on the Fast Lane\"</td></tr>\n",
"<tr><th>1</th><td>2</td><td>1990–91</td><td>22</td><td>October 11, 1990</td><td>July 11, 1991</td><td>Thursday 8:00 pm</td><td>24.4</td><td>33.6</td><td>\"Bart Gets an 'F'\"</td></tr>\n",
"<tr><th>2</th><td>3</td><td>1991–92</td><td>24</td><td>September 19, 1991</td><td>August 27, 1992</td><td>Thursday 8:00 pm</td><td>21.8</td><td>25.5</td><td>\"Colonel Homer\"</td></tr>\n",
"<tr><th>3</th><td>4</td><td>1992–93</td><td>22</td><td>September 24, 1992</td><td>May 13, 1993</td><td>Thursday 8:00 pm</td><td>22.4</td><td>28.6</td><td>\"Lisa's First Word\"</td></tr>\n",
"<tr><th>4</th><td>5</td><td>1993–94</td><td>22</td><td>September 30, 1993</td><td>May 19, 1994</td><td>Thursday 8:00 pm</td><td>18.9</td><td>24.0</td><td>\"Treehouse of Horror IV\"</td></tr>\n",
"<tr><th>5</th><td>6</td><td>1994–95</td><td>25</td><td>September 4, 1994</td><td>May 21, 1995</td><td>Sunday 8:00 pm</td><td>15.6</td><td>22.2</td><td>\"Treehouse of Horror V\"</td></tr>\n",
"<tr><th>6</th><td>7</td><td>1995–96</td><td>25</td><td>September 17, 1995</td><td>May 19, 1996</td><td>Sunday 8:00 pm (Episodes 1–24)Sunday 8:30 pm (...</td><td>15.1</td><td>19.7</td><td>\"Treehouse of Horror VI\"</td></tr>\n",
"<tr><th>7</th><td>8</td><td>1996–97</td><td>25</td><td>October 27, 1996</td><td>May 18, 1997</td><td>Sunday 8:30 pm (Episodes 1–3)Sunday 8:00 pm (E...</td><td>14.5</td><td>20.9</td><td>\"The Springfield Files\"</td></tr>\n",
"<tr><th>8</th><td>9</td><td>1997–98</td><td>25</td><td>September 21, 1997</td><td>May 17, 1998</td><td>Sunday 8:00 pm</td><td>15.3</td><td>19.8</td><td>\"The Two Mrs. Nahasapeemapetilons\"</td></tr>\n",
"<tr><th>9</th><td>10</td><td>1998–99</td><td>23</td><td>August 23, 1998</td><td>May 16, 1999</td><td>Sunday 8:00 pm</td><td>13.5</td><td>15.5</td><td>\"Maximum Homerdrive\"</td></tr>\n",
"<tr><th>10</th><td>11</td><td>1999–2000</td><td>22</td><td>September 26, 1999</td><td>May 21, 2000</td><td>Sunday 8:00 pm</td><td>8.8</td><td>18.4</td><td>\"The Mansion Family\"</td></tr>\n",
"<tr><th>11</th><td>12</td><td>2000–01</td><td>21</td><td>November 1, 2000</td><td>May 20, 2001</td><td>Sunday 8:00 pm</td><td>15.5</td><td>18.6</td><td>\"Worst Episode Ever\"</td></tr>\n",
"<tr><th>12</th><td>13</td><td>2001–02</td><td>22</td><td>November 6, 2001</td><td>May 22, 2002</td><td>Tuesday 8:30 pm (Episode 1)Sunday 8:00 pm (Epi...</td><td>12.5</td><td>14.9</td><td>\"The Parent Rap\"</td></tr>\n",
"<tr><th>13</th><td>14</td><td>2002–03</td><td>22</td><td>November 3, 2002</td><td>May 18, 2003</td><td>Sunday 8:00 pm (Episodes 1–11, 13–21)Sunday 8:...</td><td>14.4</td><td>22.1</td><td>\"I'm Spelling as Fast as I Can\"</td></tr>\n",
"<tr><th>14</th><td>15</td><td>2003–04</td><td>22</td><td>November 2, 2003</td><td>May 23, 2004</td><td>Sunday 8:00 pm</td><td>11.0</td><td>16.3</td><td>\"I, (Annoyed Grunt)-Bot\"</td></tr>\n",
"<tr><th>15</th><td>16</td><td>2004–05</td><td>21</td><td>November 7, 2004</td><td>May 15, 2005</td><td>Sunday 8:00 pm (Episodes 1–7, 9–16, 18, 20)Sun...</td><td>10.2</td><td>23.07</td><td>\"Homer and Ned's Hail Mary Pass\"</td></tr>\n",
"<tr><th>16</th><td>17</td><td>2005–06</td><td>22</td><td>September 11, 2005</td><td>May 21, 2006</td><td>Sunday 8:00 pm</td><td>9.55</td><td>11.63</td><td>\"Treehouse of Horror XVI\"</td></tr>\n",
"<tr><th>17</th><td>18</td><td>2006–07</td><td>22</td><td>September 10, 2006</td><td>May 20, 2007</td><td>Sunday 8:00 pm</td><td>9.15</td><td>13.90</td><td>\"The Wife Aquatic\"</td></tr>\n",
"<tr><th>18</th><td>19</td><td>2007–08</td><td>20</td><td>September 23, 2007</td><td>May 18, 2008</td><td>Sunday 8:00 pm</td><td>8.37</td><td>11.7</td><td>\"Treehouse of Horror XVIII\"</td></tr>\n",
"<tr><th>19</th><td>20</td><td>2008–09</td><td>21</td><td>September 28, 2008</td><td>May 17, 2009</td><td>Sunday 8:00 pm</td><td>7.1</td><td>12.4</td><td>\"Treehouse of Horror XIX\"</td></tr>\n",
"<tr><th>20</th><td>21</td><td>2009–10</td><td>23</td><td>September 27, 2009</td><td>May 23, 2010</td><td>Sunday 8:00 pm</td><td>7.1</td><td>14.62</td><td>\"Once Upon a Time in Springfield\"</td></tr>\n",
"<tr><th>21</th><td>22</td><td>2010–11</td><td>22</td><td>September 26, 2010</td><td>May 22, 2011</td><td>Sunday 8:00 pm</td><td>7.09</td><td>12.6</td><td>\"Moms I'd Like to Forget\"</td></tr>\n",
"<tr><th>22</th><td>23</td><td>2011–12</td><td>22</td><td>September 25, 2011</td><td>May 20, 2012</td><td>Sunday 8:00 pm</td><td>6.15[147]</td><td>11.48</td><td>\"The D'oh-cial Network\"</td></tr>\n",
"<tr><th>23</th><td>24</td><td>2012–13</td><td>22</td><td>September 30, 2012</td><td>May 19, 2013</td><td>Sunday 8:00 pm (Episodes 1-21)Sunday 8:30 pm (...</td><td>5.41[148]</td><td>8.97</td><td>\"Homer Goes to Prep School\"</td></tr>\n",
"<tr><th>24</th><td>25</td><td>2013–14</td><td>22</td><td>September 29, 2013</td><td>May 18, 2014</td><td>Sunday 8:00 pm (Episodes 1–11 &amp; 13-22)Sunday 7...</td><td>5.02[149]</td><td>12.04</td><td>\"Steal This Episode\"</td></tr>\n",
"<tr><th>25</th><td>26</td><td>2014–15</td><td>22</td><td>September 28, 2014</td><td>May 17, 2015</td><td>Sunday 8:00 pm</td><td>5.61[150]</td><td>10.62</td><td>\"The Man Who Came to Be Dinner\"</td></tr>\n",
"<tr><th>26</th><td>27</td><td>2015–16</td><td>22</td><td>September 27, 2015</td><td>May 22, 2016</td><td>Sunday 8:00 pm</td><td>4.0[151]</td><td>8.33</td><td>\"Teenage Mutant Milk-Caused Hurdles\"</td></tr>\n",
"<tr><th>27</th><td>28</td><td>2016–17</td><td>22</td><td>September 25, 2016</td><td>May 21, 2017 (2017-05-21)</td><td>Sunday 8:00 pm</td><td>4.80[152]</td><td>8.19</td><td>\"Pork and Burns\"</td></tr>\n",
"<tr><th>28</th><td>29</td><td>2017–18</td><td>21</td><td>October 1, 2017</td><td>May 20, 2018</td><td>Sunday 8:00 pm</td><td>4.07[153]</td><td>8.04</td><td>\"Frink Gets Testy\"</td></tr>\n",
"<tr><th>29</th><td>30</td><td>2018–19</td><td>23</td><td>September 30, 2018</td><td>May 12, 2019</td><td>Sunday 8:00 pm</td><td>3.10[154]</td><td>8.20</td><td>\"The Girl on the Bus\"</td></tr>\n",
"<tr><th>30</th><td>31</td><td>2019–20</td><td>22</td><td>September 29, 2019</td><td>May 17, 2020</td><td>Sunday 8:00 pm</td><td>2.58[155]</td><td>5.63</td><td>\"Go Big or Go Homer\"</td></tr>\n",
"<tr><th>31</th><td>32</td><td>2020–21</td><td>22</td><td>September 27, 2020</td><td>May 23, 2021</td><td>Sunday 8:00 pm (Episodes 1–10 &amp; 12-22)Sunday 9...</td><td>2.32[156]</td><td>4.93</td><td>\"Treehouse of Horror XXXI\"</td></tr>\n",
"<tr><th>32</th><td>33</td><td>2021–22</td><td>22</td><td>September 26, 2021</td><td>May 15, 2022[157]</td><td>Sunday 8:00 pm (Episodes 1–10 &amp; 12-22)Sunday 9...</td><td>TBA</td><td>TBA</td><td>TBA</td></tr>\n",
"</tbody>\n",
"</table>\n",
"
\n", " " ], "text/plain": [ " Season Years ... Most watched episode Most watched episode title\n", "0 1 1989–90 ... 33.5 \"Life on the Fast Lane\"\n", "1 2 1990–91 ... 33.6 \"Bart Gets an 'F'\"\n", "2 3 1991–92 ... 25.5 \"Colonel Homer\"\n", "3 4 1992–93 ... 28.6 \"Lisa's First Word\"\n", "4 5 1993–94 ... 24.0 \"Treehouse of Horror IV\"\n", "5 6 1994–95 ... 22.2 \"Treehouse of Horror V\"\n", "6 7 1995–96 ... 19.7 \"Treehouse of Horror VI\"\n", "7 8 1996–97 ... 20.9 \"The Springfield Files\"\n", "8 9 1997–98 ... 19.8 \"The Two Mrs. Nahasapeemapetilons\"\n", "9 10 1998–99 ... 15.5 \"Maximum Homerdrive\"\n", "10 11 1999–2000 ... 18.4 \"The Mansion Family\"\n", "11 12 2000–01 ... 18.6 \"Worst Episode Ever\"\n", "12 13 2001–02 ... 14.9 \"The Parent Rap\"\n", "13 14 2002–03 ... 22.1 \"I'm Spelling as Fast as I Can\"\n", "14 15 2003–04 ... 16.3 \"I, (Annoyed Grunt)-Bot\"\n", "15 16 2004–05 ... 23.07 \"Homer and Ned's Hail Mary Pass\"\n", "16 17 2005–06 ... 11.63 \"Treehouse of Horror XVI\"\n", "17 18 2006–07 ... 13.90 \"The Wife Aquatic\"\n", "18 19 2007–08 ... 11.7 \"Treehouse of Horror XVIII\"\n", "19 20 2008–09 ... 12.4 \"Treehouse of Horror XIX\"\n", "20 21 2009–10 ... 14.62 \"Once Upon a Time in Springfield\"\n", "21 22 2010–11 ... 12.6 \"Moms I'd Like to Forget\"\n", "22 23 2011–12 ... 11.48 \"The D'oh-cial Network\"\n", "23 24 2012–13 ... 8.97 \"Homer Goes to Prep School\"\n", "24 25 2013–14 ... 12.04 \"Steal This Episode\"\n", "25 26 2014–15 ... 10.62 \"The Man Who Came to Be Dinner\"\n", "26 27 2015–16 ... 8.33 \"Teenage Mutant Milk-Caused Hurdles\"\n", "27 28 2016–17 ... 8.19 \"Pork and Burns\"\n", "28 29 2017–18 ... 8.04 \"Frink Gets Testy\"\n", "29 30 2018–19 ... 8.20 \"The Girl on the Bus\"\n", "30 31 2019–20 ... 5.63 \"Go Big or Go Homer\"\n", "31 32 2020–21 ... 4.93 \"Treehouse of Horror XXXI\"\n", "32 33 2021–22 ... 
TBA TBA\n", "\n", "[33 rows x 9 columns]" ] }, "metadata": {}, "execution_count": 61 } ] }, { "cell_type": "markdown", "source": [ "Do you ever get to the finish line and think to yourself, \"Man, there must be an easier way to do that\"? Oh, there totally was..." ], "metadata": { "id": "OJuF0D4sG0dY" } }, { "cell_type": "markdown", "source": [ "## Your Turn" ], "metadata": { "id": "0_WwEIzHBtsN" } }, { "cell_type": "markdown", "source": [ "Navigate to [the Wikipedia page on Marvel Cinematic Universe films](https://en.wikipedia.org/wiki/List_of_Marvel_Cinematic_Universe_films). Gather the table of films in the Infinity Saga (hint: its *class* is 'wikitable plainrowheaders'). Fix any issues with the column names, and remove any rows that are not movies.\n", "\n" ], "metadata": { "id": "yFEC02AHBwVl" } } ] }