HTML Tables#

Let’s load the same wiki page about The Simpsons that we were working with before.

import requests
import pandas as pa
from bs4 import BeautifulSoup


r = requests.get('https://en.wikipedia.org/wiki/The_Simpsons')
html_contents = r.text
html_soup = BeautifulSoup(html_contents,"lxml")

Tables Mean Data for Processing and Visualization!#

len(html_soup.find_all('table'))
41

We see here that there are 41 tables stored in a list! Let’s get some of them by class. Sometimes there are also ids, along with a grab bag of other ways to select different parts of the HTML. Use your developer tools to examine your particular website!
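To make the "class, id, and other ways" idea concrete, here is a minimal sketch on a tiny made-up HTML fragment (the fragment, the `ratings` id, and the variable names are all assumptions for illustration, not part of the Simpsons page). It uses the built-in `html.parser` so it runs without lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment, used only for illustration
html = """
<div id="content">
  <table class="wikitable sortable" id="ratings"><tr><td>1</td></tr></table>
  <table class="infobox"><tr><td>2</td></tr></table>
  <table class="wikitable"><tr><td>3</td></tr></table>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# By class: matches any table whose class list contains "wikitable"
by_class = soup.find_all("table", class_="wikitable")

# By id: find returns the first (normally the only) match
by_id = soup.find(id="ratings")

# By CSS selector: tables with class "sortable" inside the div with id "content"
by_css = soup.select("div#content table.sortable")

print(len(by_class), by_id["class"], len(by_css))
```

Which of these works best depends on the page, so it is worth checking the element in your developer tools before picking one.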

tables = html_soup.find_all('table',class_="wikitable")
tables[0].find_all('a')
[<a href="/wiki/Eastern_Time_Zone" title="Eastern Time Zone">ET</a>,
 <a href="/wiki/The_Simpsons_(season_1)" title="The Simpsons (season 1)">1</a>,
 <a href="/wiki/1989%E2%80%9390_United_States_network_television_schedule" title="1989–90 United States network television schedule">1989–90</a>,
 <a href="/wiki/Life_on_the_Fast_Lane" title="Life on the Fast Lane">Life on the Fast Lane</a>,
 <a href="/wiki/The_Simpsons_(season_2)" title="The Simpsons (season 2)">2</a>,
 <a href="/wiki/1990%E2%80%9391_United_States_network_television_schedule" title="1990–91 United States network television schedule">1990–91</a>,
 <a class="mw-redirect" href="/wiki/Bart_Gets_an_%27F%27" title="Bart Gets an 'F'">Bart Gets an 'F'</a>,
 <a href="/wiki/The_Simpsons_(season_3)" title="The Simpsons (season 3)">3</a>,
 <a href="/wiki/1991%E2%80%9392_United_States_network_television_schedule" title="1991–92 United States network television schedule">1991–92</a>,
 <a href="/wiki/Colonel_Homer" title="Colonel Homer">Colonel Homer</a>,
 <a href="/wiki/The_Simpsons_(season_4)" title="The Simpsons (season 4)">4</a>,
 <a href="/wiki/1992%E2%80%9393_United_States_network_television_schedule" title="1992–93 United States network television schedule">1992–93</a>,
 <a href="/wiki/Lisa%27s_First_Word" title="Lisa's First Word">Lisa's First Word</a>,
 <a href="/wiki/The_Simpsons_(season_5)" title="The Simpsons (season 5)">5</a>,
 <a href="/wiki/1993%E2%80%9394_United_States_network_television_schedule" title="1993–94 United States network television schedule">1993–94</a>,
 <a href="/wiki/Treehouse_of_Horror_IV" title="Treehouse of Horror IV">Treehouse of Horror IV</a>,
 <a href="/wiki/The_Simpsons_(season_6)" title="The Simpsons (season 6)">6</a>,
 <a href="/wiki/1994%E2%80%9395_United_States_network_television_schedule" title="1994–95 United States network television schedule">1994–95</a>,
 <a href="/wiki/Treehouse_of_Horror_V" title="Treehouse of Horror V">Treehouse of Horror V</a>,
 <a href="/wiki/The_Simpsons_(season_7)" title="The Simpsons (season 7)">7</a>,
 <a href="/wiki/1995%E2%80%9396_United_States_network_television_schedule" title="1995–96 United States network television schedule">1995–96</a>,
 <a href="/wiki/Treehouse_of_Horror_VI" title="Treehouse of Horror VI">Treehouse of Horror VI</a>,
 <a href="/wiki/The_Simpsons_(season_8)" title="The Simpsons (season 8)">8</a>,
 <a href="/wiki/1996%E2%80%9397_United_States_network_television_schedule" title="1996–97 United States network television schedule">1996–97</a>,
 <a href="#cite_note-147">[146]</a>,
 <a href="/wiki/The_Springfield_Files" title="The Springfield Files">The Springfield Files</a>,
 <a href="/wiki/The_Simpsons_(season_9)" title="The Simpsons (season 9)">9</a>,
 <a href="/wiki/1997%E2%80%9398_United_States_network_television_schedule" title="1997–98 United States network television schedule">1997–98</a>,
 <a href="/wiki/The_Two_Mrs._Nahasapeemapetilons" title="The Two Mrs. Nahasapeemapetilons">The Two Mrs. Nahasapeemapetilons</a>,
 <a href="/wiki/The_Simpsons_(season_10)" title="The Simpsons (season 10)">10</a>,
 <a href="/wiki/1998%E2%80%9399_United_States_network_television_schedule" title="1998–99 United States network television schedule">1998–99</a>,
 <a href="/wiki/Maximum_Homerdrive" title="Maximum Homerdrive">Maximum Homerdrive</a>,
 <a href="/wiki/The_Simpsons_(season_11)" title="The Simpsons (season 11)">11</a>,
 <a href="/wiki/1999%E2%80%932000_United_States_network_television_schedule" title="1999–2000 United States network television schedule">1999–2000</a>,
 <a href="/wiki/The_Mansion_Family" title="The Mansion Family">The Mansion Family</a>,
 <a href="/wiki/The_Simpsons_(season_12)" title="The Simpsons (season 12)">12</a>,
 <a href="/wiki/2000%E2%80%9301_United_States_network_television_schedule" title="2000–01 United States network television schedule">2000–01</a>,
 <a href="/wiki/Worst_Episode_Ever" title="Worst Episode Ever">Worst Episode Ever</a>,
 <a href="/wiki/The_Simpsons_(season_13)" title="The Simpsons (season 13)">13</a>,
 <a href="/wiki/2001%E2%80%9302_United_States_network_television_schedule" title="2001–02 United States network television schedule">2001–02</a>,
 <a href="/wiki/The_Parent_Rap" title="The Parent Rap">The Parent Rap</a>,
 <a href="/wiki/The_Simpsons_(season_14)" title="The Simpsons (season 14)">14</a>,
 <a href="/wiki/2002%E2%80%9303_United_States_network_television_schedule" title="2002–03 United States network television schedule">2002–03</a>,
 <a href="/wiki/I%27m_Spelling_as_Fast_as_I_Can" title="I'm Spelling as Fast as I Can">I'm Spelling as Fast as I Can</a>,
 <a href="/wiki/The_Simpsons_(season_15)" title="The Simpsons (season 15)">15</a>,
 <a href="/wiki/2003%E2%80%9304_United_States_network_television_schedule" title="2003–04 United States network television schedule">2003–04</a>,
 <a href="/wiki/I,_(Annoyed_Grunt)-Bot" title="I, (Annoyed Grunt)-Bot">I, (Annoyed Grunt)-Bot</a>,
 <a href="/wiki/The_Simpsons_(season_16)" title="The Simpsons (season 16)">16</a>,
 <a href="/wiki/2004%E2%80%9305_United_States_network_television_schedule" title="2004–05 United States network television schedule">2004–05</a>,
 <a href="/wiki/Homer_and_Ned%27s_Hail_Mary_Pass" title="Homer and Ned's Hail Mary Pass">Homer and Ned's Hail Mary Pass</a>,
 <a href="/wiki/The_Simpsons_(season_17)" title="The Simpsons (season 17)">17</a>,
 <a href="/wiki/2005%E2%80%9306_United_States_network_television_schedule" title="2005–06 United States network television schedule">2005–06</a>,
 <a href="/wiki/Treehouse_of_Horror_XVI" title="Treehouse of Horror XVI">Treehouse of Horror XVI</a>,
 <a href="/wiki/The_Simpsons_(season_18)" title="The Simpsons (season 18)">18</a>,
 <a href="/wiki/2006%E2%80%9307_United_States_network_television_schedule" title="2006–07 United States network television schedule">2006–07</a>,
 <a href="/wiki/The_Wife_Aquatic" title="The Wife Aquatic">The Wife Aquatic</a>,
 <a href="/wiki/The_Simpsons_(season_19)" title="The Simpsons (season 19)">19</a>,
 <a href="/wiki/2007%E2%80%9308_United_States_network_television_schedule" title="2007–08 United States network television schedule">2007–08</a>,
 <a href="/wiki/Treehouse_of_Horror_XVIII" title="Treehouse of Horror XVIII">Treehouse of Horror XVIII</a>,
 <a href="/wiki/The_Simpsons_(season_20)" title="The Simpsons (season 20)">20</a>,
 <a href="/wiki/2008%E2%80%9309_United_States_network_television_schedule" title="2008–09 United States network television schedule">2008–09</a>,
 <a href="/wiki/Treehouse_of_Horror_XIX" title="Treehouse of Horror XIX">Treehouse of Horror XIX</a>,
 <a href="/wiki/The_Simpsons_(season_21)" title="The Simpsons (season 21)">21</a>,
 <a href="/wiki/2009%E2%80%9310_United_States_network_television_schedule" title="2009–10 United States network television schedule">2009–10</a>,
 <a href="/wiki/Once_Upon_a_Time_in_Springfield" title="Once Upon a Time in Springfield">Once Upon a Time in Springfield</a>,
 <a href="/wiki/The_Simpsons_(season_22)" title="The Simpsons (season 22)">22</a>,
 <a href="/wiki/2010%E2%80%9311_United_States_network_television_schedule" title="2010–11 United States network television schedule">2010–11</a>,
 <a href="/wiki/Moms_I%27d_Like_to_Forget" title="Moms I'd Like to Forget">Moms I'd Like to Forget</a>,
 <a href="/wiki/The_Simpsons_(season_23)" title="The Simpsons (season 23)">23</a>,
 <a href="/wiki/2011%E2%80%9312_United_States_network_television_schedule" title="2011–12 United States network television schedule">2011–12</a>,
 <a href="#cite_note-Season23viewership-148">[147]</a>,
 <a href="/wiki/The_D%27oh-cial_Network" title="The D'oh-cial Network">The D'oh-cial Network</a>,
 <a href="/wiki/The_Simpsons_(season_24)" title="The Simpsons (season 24)">24</a>,
 <a href="/wiki/2012%E2%80%9313_United_States_network_television_schedule" title="2012–13 United States network television schedule">2012–13</a>,
 <a href="#cite_note-Season24viewership-149">[148]</a>,
 <a href="/wiki/Homer_Goes_to_Prep_School" title="Homer Goes to Prep School">Homer Goes to Prep School</a>,
 <a href="/wiki/The_Simpsons_(season_25)" title="The Simpsons (season 25)">25</a>,
 <a href="/wiki/2013%E2%80%9314_United_States_network_television_schedule" title="2013–14 United States network television schedule">2013–14</a>,
 <a href="#cite_note-Season25viewership-150">[149]</a>,
 <a href="/wiki/Steal_This_Episode" title="Steal This Episode">Steal This Episode</a>,
 <a href="/wiki/The_Simpsons_(season_26)" title="The Simpsons (season 26)">26</a>,
 <a href="/wiki/2014%E2%80%9315_United_States_network_television_schedule" title="2014–15 United States network television schedule">2014–15</a>,
 <a href="#cite_note-entertainment2015-151">[150]</a>,
 <a href="/wiki/The_Man_Who_Came_to_Be_Dinner" title="The Man Who Came to Be Dinner">The Man Who Came to Be Dinner</a>,
 <a href="/wiki/The_Simpsons_(season_27)" title="The Simpsons (season 27)">27</a>,
 <a href="/wiki/2015%E2%80%9316_United_States_network_television_schedule" title="2015–16 United States network television schedule">2015–16</a>,
 <a href="#cite_note-entertainment2016-152">[151]</a>,
 <a href="/wiki/Teenage_Mutant_Milk-Caused_Hurdles" title="Teenage Mutant Milk-Caused Hurdles">Teenage Mutant Milk-Caused Hurdles</a>,
 <a href="/wiki/The_Simpsons_(season_28)" title="The Simpsons (season 28)">28</a>,
 <a href="/wiki/2016%E2%80%9317_United_States_network_television_schedule" title="2016–17 United States network television schedule">2016–17</a>,
 <a href="#cite_note-entertainment2017-153">[152]</a>,
 <a href="/wiki/Pork_and_Burns" title="Pork and Burns">Pork and Burns</a>,
 <a href="/wiki/The_Simpsons_(season_29)" title="The Simpsons (season 29)">29</a>,
 <a href="/wiki/2017%E2%80%9318_United_States_network_television_schedule" title="2017–18 United States network television schedule">2017–18</a>,
 <a href="#cite_note-entertainment2018-154">[153]</a>,
 <a href="/wiki/Frink_Gets_Testy" title="Frink Gets Testy">Frink Gets Testy</a>,
 <a href="/wiki/The_Simpsons_(season_30)" title="The Simpsons (season 30)">30</a>,
 <a href="/wiki/2018%E2%80%9319_United_States_network_television_schedule" title="2018–19 United States network television schedule">2018–19</a>,
 <a href="#cite_note-155">[154]</a>,
 <a href="/wiki/The_Girl_on_the_Bus" title="The Girl on the Bus">The Girl on the Bus</a>,
 <a href="/wiki/The_Simpsons_(season_31)" title="The Simpsons (season 31)">31</a>,
 <a href="/wiki/2019%E2%80%9320_United_States_network_television_schedule" title="2019–20 United States network television schedule">2019–20</a>,
 <a href="#cite_note-156">[155]</a>,
 <a href="/wiki/Go_Big_or_Go_Homer" title="Go Big or Go Homer">Go Big or Go Homer</a>,
 <a href="/wiki/The_Simpsons_(season_32)" title="The Simpsons (season 32)">32</a>,
 <a href="/wiki/2020%E2%80%9321_United_States_network_television_schedule" title="2020–21 United States network television schedule">2020–21</a>,
 <a href="#cite_note-157">[156]</a>,
 <a href="/wiki/Treehouse_of_Horror_XXXI" title="Treehouse of Horror XXXI">Treehouse of Horror XXXI</a>,
 <a href="/wiki/The_Simpsons_(season_33)" title="The Simpsons (season 33)">33</a>,
 <a href="/wiki/2021%E2%80%9322_United_States_network_television_schedule" title="2021–22 United States network television schedule">2021–22</a>,
 <a href="#cite_note-158">[157]</a>]

Here I grabbed the first table; there were 3 that had the class “wikitable”. Next I grabbed all of its links. The table was HUGE when I printed everything, so to get it in a nice form I’ll simply pass it to pandas using the read_html command. I did need to convert the soup back into a string first, and then I selected only the first table and called it df.
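Before running this on the real page, the soup-to-string-to-pandas step can be tried on a small table. This is only a sketch: the miniature table is made up, and the `StringIO` wrapper is an addition of mine, since recent pandas releases warn when a literal HTML string is passed straight to `read_html`. It keeps the notebook's `pa` alias and assumes a parser that pandas can use (such as lxml) is installed.

```python
from io import StringIO

import pandas as pa
from bs4 import BeautifulSoup

# Hypothetical miniature table standing in for one of the wikitables
html = """<table class="wikitable">
<tr><th>Season</th><th>Episodes</th></tr>
<tr><td>1</td><td>13</td></tr>
<tr><td>2</td><td>22</td></tr>
</table>"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", class_="wikitable")

# read_html returns a list of DataFrames, one per table found in the input
frames = pa.read_html(StringIO(str(table)))
small_df = frames[0]
print(small_df)
```

Because `read_html` always returns a list, the `[0]` at the end is what picks out a single DataFrame.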


df = pa.read_html(str(tables))[0]
df
Season No. ofepisodes Originally aired Viewership
Season No. ofepisodes Season premiere Season finale Time slot (ET) Avg. viewers(in millions) Most watched episode
Season Season.1 No. ofepisodes Season premiere Season finale Time slot (ET) Avg. viewers(in millions) Viewers(millions) Episode title
0 1 1989–90 13 December 17, 1989 May 13, 1990 Sunday 8:30 pm 27.8 33.5 "Life on the Fast Lane"
1 2 1990–91 22 October 11, 1990 July 11, 1991 Thursday 8:00 pm 24.4 33.6 "Bart Gets an 'F'"
2 3 1991–92 24 September 19, 1991 August 27, 1992 Thursday 8:00 pm 21.8 25.5 "Colonel Homer"
3 4 1992–93 22 September 24, 1992 May 13, 1993 Thursday 8:00 pm 22.4 28.6 "Lisa's First Word"
4 5 1993–94 22 September 30, 1993 May 19, 1994 Thursday 8:00 pm 18.9 24.0 "Treehouse of Horror IV"
5 6 1994–95 25 September 4, 1994 May 21, 1995 Sunday 8:00 pm 15.6 22.2 "Treehouse of Horror V"
6 7 1995–96 25 September 17, 1995 May 19, 1996 Sunday 8:00 pm (Episodes 1–24)Sunday 8:30 pm (... 15.1 19.7 "Treehouse of Horror VI"
7 8 1996–97 25 October 27, 1996 May 18, 1997 Sunday 8:30 pm (Episodes 1–3)Sunday 8:00 pm (E... 14.5 20.9 "The Springfield Files"
8 9 1997–98 25 September 21, 1997 May 17, 1998 Sunday 8:00 pm 15.3 19.8 "The Two Mrs. Nahasapeemapetilons"
9 10 1998–99 23 August 23, 1998 May 16, 1999 Sunday 8:00 pm 13.5 15.5 "Maximum Homerdrive"
10 11 1999–2000 22 September 26, 1999 May 21, 2000 Sunday 8:00 pm 8.8 18.4 "The Mansion Family"
11 12 2000–01 21 November 1, 2000 May 20, 2001 Sunday 8:00 pm 15.5 18.6 "Worst Episode Ever"
12 13 2001–02 22 November 6, 2001 May 22, 2002 Tuesday 8:30 pm (Episode 1)Sunday 8:00 pm (Epi... 12.5 14.9 "The Parent Rap"
13 14 2002–03 22 November 3, 2002 May 18, 2003 Sunday 8:00 pm (Episodes 1–11, 13–21)Sunday 8:... 14.4 22.1 "I'm Spelling as Fast as I Can"
14 15 2003–04 22 November 2, 2003 May 23, 2004 Sunday 8:00 pm 11.0 16.3 "I, (Annoyed Grunt)-Bot"
15 16 2004–05 21 November 7, 2004 May 15, 2005 Sunday 8:00 pm (Episodes 1–7, 9–16, 18, 20)Sun... 10.2 23.07 "Homer and Ned's Hail Mary Pass"
16 17 2005–06 22 September 11, 2005 May 21, 2006 Sunday 8:00 pm 9.55 11.63 "Treehouse of Horror XVI"
17 18 2006–07 22 September 10, 2006 May 20, 2007 Sunday 8:00 pm 9.15 13.90 "The Wife Aquatic"
18 19 2007–08 20 September 23, 2007 May 18, 2008 Sunday 8:00 pm 8.37 11.7 "Treehouse of Horror XVIII"
19 20 2008–09 21 September 28, 2008 May 17, 2009 Sunday 8:00 pm 7.1 12.4 "Treehouse of Horror XIX"
20 21 2009–10 23 September 27, 2009 May 23, 2010 Sunday 8:00 pm 7.1 14.62 "Once Upon a Time in Springfield"
21 22 2010–11 22 September 26, 2010 May 22, 2011 Sunday 8:00 pm 7.09 12.6 "Moms I'd Like to Forget"
22 23 2011–12 22 September 25, 2011 May 20, 2012 Sunday 8:00 pm 6.15[147] 11.48 "The D'oh-cial Network"
23 24 2012–13 22 September 30, 2012 May 19, 2013 Sunday 8:00 pm (Episodes 1-21)Sunday 8:30 pm (... 5.41[148] 8.97 "Homer Goes to Prep School"
24 25 2013–14 22 September 29, 2013 May 18, 2014 Sunday 8:00 pm (Episodes 1–11 & 13-22)Sunday 7... 5.02[149] 12.04 "Steal This Episode"
25 26 2014–15 22 September 28, 2014 May 17, 2015 Sunday 8:00 pm 5.61[150] 10.62 "The Man Who Came to Be Dinner"
26 27 2015–16 22 September 27, 2015 May 22, 2016 Sunday 8:00 pm 4.0[151] 8.33 "Teenage Mutant Milk-Caused Hurdles"
27 28 2016–17 22 September 25, 2016 May 21, 2017 Sunday 8:00 pm 4.80[152] 8.19 "Pork and Burns"
28 29 2017–18 21 October 1, 2017 May 20, 2018 Sunday 8:00 pm 4.07[153] 8.04 "Frink Gets Testy"
29 30 2018–19 23 September 30, 2018 May 12, 2019 Sunday 8:00 pm 3.10[154] 8.20 "The Girl on the Bus"
30 31 2019–20 22 September 29, 2019 May 17, 2020 Sunday 8:00 pm 2.58[155] 5.63 "Go Big or Go Homer"
31 32 2020–21 22 September 27, 2020 May 23, 2021 Sunday 8:00 pm (Episodes 1–10 & 12-22)Sunday 9... 2.32[156] 4.93 "Treehouse of Horror XXXI"
32 33 2021–22 22 September 26, 2021 May 15, 2022[157] Sunday 8:00 pm (Episodes 1–10 & 12-22)Sunday 9... TBA TBA TBA

The column names are not quite right, but that is not a terrible fix. I actually found this approach later; further below you can see the list-based way in which I first built the tables. The list approach is more flexible, but the simplicity of the pandas way cannot be beat!

So the column names here are a problem, as there are three levels of them! I will simply use the droplevel command twice to remove the top two levels of the MultiIndex in the column headers.

df.columns = df.columns.droplevel(0).droplevel(0)
df
Season Season.1 No. ofepisodes Season premiere Season finale Time slot (ET) Avg. viewers(in millions) Viewers(millions) Episode title
0 1 1989–90 13 December 17, 1989 May 13, 1990 Sunday 8:30 pm 27.8 33.5 "Life on the Fast Lane"
1 2 1990–91 22 October 11, 1990 July 11, 1991 Thursday 8:00 pm 24.4 33.6 "Bart Gets an 'F'"
2 3 1991–92 24 September 19, 1991 August 27, 1992 Thursday 8:00 pm 21.8 25.5 "Colonel Homer"
3 4 1992–93 22 September 24, 1992 May 13, 1993 Thursday 8:00 pm 22.4 28.6 "Lisa's First Word"
4 5 1993–94 22 September 30, 1993 May 19, 1994 Thursday 8:00 pm 18.9 24.0 "Treehouse of Horror IV"
5 6 1994–95 25 September 4, 1994 May 21, 1995 Sunday 8:00 pm 15.6 22.2 "Treehouse of Horror V"
6 7 1995–96 25 September 17, 1995 May 19, 1996 Sunday 8:00 pm (Episodes 1–24)Sunday 8:30 pm (... 15.1 19.7 "Treehouse of Horror VI"
7 8 1996–97 25 October 27, 1996 May 18, 1997 Sunday 8:30 pm (Episodes 1–3)Sunday 8:00 pm (E... 14.5 20.9 "The Springfield Files"
8 9 1997–98 25 September 21, 1997 May 17, 1998 Sunday 8:00 pm 15.3 19.8 "The Two Mrs. Nahasapeemapetilons"
9 10 1998–99 23 August 23, 1998 May 16, 1999 Sunday 8:00 pm 13.5 15.5 "Maximum Homerdrive"
10 11 1999–2000 22 September 26, 1999 May 21, 2000 Sunday 8:00 pm 8.8 18.4 "The Mansion Family"
11 12 2000–01 21 November 1, 2000 May 20, 2001 Sunday 8:00 pm 15.5 18.6 "Worst Episode Ever"
12 13 2001–02 22 November 6, 2001 May 22, 2002 Tuesday 8:30 pm (Episode 1)Sunday 8:00 pm (Epi... 12.5 14.9 "The Parent Rap"
13 14 2002–03 22 November 3, 2002 May 18, 2003 Sunday 8:00 pm (Episodes 1–11, 13–21)Sunday 8:... 14.4 22.1 "I'm Spelling as Fast as I Can"
14 15 2003–04 22 November 2, 2003 May 23, 2004 Sunday 8:00 pm 11.0 16.3 "I, (Annoyed Grunt)-Bot"
15 16 2004–05 21 November 7, 2004 May 15, 2005 Sunday 8:00 pm (Episodes 1–7, 9–16, 18, 20)Sun... 10.2 23.07 "Homer and Ned's Hail Mary Pass"
16 17 2005–06 22 September 11, 2005 May 21, 2006 Sunday 8:00 pm 9.55 11.63 "Treehouse of Horror XVI"
17 18 2006–07 22 September 10, 2006 May 20, 2007 Sunday 8:00 pm 9.15 13.90 "The Wife Aquatic"
18 19 2007–08 20 September 23, 2007 May 18, 2008 Sunday 8:00 pm 8.37 11.7 "Treehouse of Horror XVIII"
19 20 2008–09 21 September 28, 2008 May 17, 2009 Sunday 8:00 pm 7.1 12.4 "Treehouse of Horror XIX"
20 21 2009–10 23 September 27, 2009 May 23, 2010 Sunday 8:00 pm 7.1 14.62 "Once Upon a Time in Springfield"
21 22 2010–11 22 September 26, 2010 May 22, 2011 Sunday 8:00 pm 7.09 12.6 "Moms I'd Like to Forget"
22 23 2011–12 22 September 25, 2011 May 20, 2012 Sunday 8:00 pm 6.15[147] 11.48 "The D'oh-cial Network"
23 24 2012–13 22 September 30, 2012 May 19, 2013 Sunday 8:00 pm (Episodes 1-21)Sunday 8:30 pm (... 5.41[148] 8.97 "Homer Goes to Prep School"
24 25 2013–14 22 September 29, 2013 May 18, 2014 Sunday 8:00 pm (Episodes 1–11 & 13-22)Sunday 7... 5.02[149] 12.04 "Steal This Episode"
25 26 2014–15 22 September 28, 2014 May 17, 2015 Sunday 8:00 pm 5.61[150] 10.62 "The Man Who Came to Be Dinner"
26 27 2015–16 22 September 27, 2015 May 22, 2016 Sunday 8:00 pm 4.0[151] 8.33 "Teenage Mutant Milk-Caused Hurdles"
27 28 2016–17 22 September 25, 2016 May 21, 2017 Sunday 8:00 pm 4.80[152] 8.19 "Pork and Burns"
28 29 2017–18 21 October 1, 2017 May 20, 2018 Sunday 8:00 pm 4.07[153] 8.04 "Frink Gets Testy"
29 30 2018–19 23 September 30, 2018 May 12, 2019 Sunday 8:00 pm 3.10[154] 8.20 "The Girl on the Bus"
30 31 2019–20 22 September 29, 2019 May 17, 2020 Sunday 8:00 pm 2.58[155] 5.63 "Go Big or Go Homer"
31 32 2020–21 22 September 27, 2020 May 23, 2021 Sunday 8:00 pm (Episodes 1–10 & 12-22)Sunday 9... 2.32[156] 4.93 "Treehouse of Horror XXXI"
32 33 2021–22 22 September 26, 2021 May 15, 2022[157] Sunday 8:00 pm (Episodes 1–10 & 12-22)Sunday 9... TBA TBA TBA

That is much better, although we lose a little bit of information about the last two columns.
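If that lost information matters, one alternative to dropping levels is to join the distinct names in each column's header into a single string. The sketch below uses a small made-up three-level header that only imitates the shape of the real one; the `flatten` helper and the `demo` frame are hypothetical names.

```python
import pandas as pa

# Hypothetical three-level header imitating the structure of the table above
columns = pa.MultiIndex.from_tuples([
    ("Season", "Season", "Season"),
    ("Viewership", "Avg. viewers", "Avg. viewers"),
    ("Viewership", "Most watched episode", "Viewers"),
    ("Viewership", "Most watched episode", "Episode title"),
])
demo = pa.DataFrame([[1, 27.8, 33.5, "Life on the Fast Lane"]], columns=columns)


def flatten(levels):
    """Join the distinct names in one column's header, keeping their order."""
    distinct = list(dict.fromkeys(levels))
    return " / ".join(distinct)


# Iterating over a MultiIndex yields one tuple of level names per column
demo.columns = [flatten(col) for col in demo.columns]
print(list(demo.columns))
```

The names get longer, but the last two columns keep the "Most watched episode" part that droplevel throws away.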

Let’s grab one more table to see if this still works as well. I just grabbed the very first table on the page.

df2 = pa.read_html(str(html_soup.find('table')))[0]

df2
The Simpsons The Simpsons.1
0 NaN NaN
1 Genre Animated sitcom Satire
2 Created by Matt Groening
3 Based on The Simpsons shortsby Matt Groening
4 Developed by James L. Brooks Matt Groening Sam Simon
5 Voices of Dan Castellaneta Julie Kavner Nancy Cartwright...
6 Theme music composer Danny Elfman
7 Opening theme "The Simpsons Theme"
8 Composers Richard Gibbs (1989–1990)Alf Clausen (1990–201...
9 Country of origin United States
10 Original language English
11 No. of seasons 33
12 No. of episodes 717 (list of episodes)
13 Production Production
14 Executive producers List James L. Brooks (entire run) Matt Groen...
15 Running time 21–24 minutes
16 Production companies Gracie Films 20th Television[a] (seasons 1–32)...
17 Distributor 20th Television
18 Release Release
19 Original network Fox
20 Picture format NTSC (1989–2009)HDTV 720p (2009–present)
21 Audio format Stereo (1989–1991)Dolby Surround (1991–2009)Do...
22 Original release December 17, 1989present
23 Chronology Chronology
24 Preceded by The Simpsons shorts from The Tracey Ullman Show
25 External links External links
26 Official website Official website

Table the Hard Way#

data = []
for table in tables:
    rows = table.find_all('tr')
    # Header cells from the first row (collected here, but not used below)
    headers = []
    for header in rows[0].find_all('th'):
        headers.append(header.text.replace('\n', ''))
    # Every remaining row, including the lower header rows, goes into data
    for row in rows[1:]:
        values = []
        for col in row.find_all(['th', 'td']):
            values.append(col.text.replace('\n', ''))
        data.append(values)
data[:4]

#pa.DataFrame(data[1:], columns  = data[0])
[['Season premiere',
  'Season finale',
  'Time slot (ET)',
  'Avg. viewers(in millions)',
  'Most watched episode'],
 ['Viewers(millions)', 'Episode title'],
 ['1',
  '1989–90',
  '13',
  'December 17, 1989',
  'May 13, 1990',
  'Sunday 8:30\xa0pm',
  '27.8',
  '33.5',
  '"Life on the Fast Lane"'],
 ['2',
  '1990–91',
  '22',
  'October 11, 1990',
  'July 11, 1991',
  'Thursday 8:00\xa0pm',
  '24.4',
  '33.6',
  '"Bart Gets an \'F\'"']]

I have to do some work here to get this into a dataframe, mostly just getting the column names correct. Several columns were not named, and some names ended up in their own row. This is why it is important to look at your outputs!

titles = []
titles.append('Season')
titles.append('Years')
titles.append('Episodes')
for name in data[0]:
  titles.append(name)
titles.append('Most watched episode title')

df = pa.DataFrame(data[2:], columns = titles)
df
Season Years Episodes Season premiere Season finale Time slot (ET) Avg. viewers(in millions) Most watched episode Most watched episode title
0 1 1989–90 13 December 17, 1989 May 13, 1990 Sunday 8:30 pm 27.8 33.5 "Life on the Fast Lane"
1 2 1990–91 22 October 11, 1990 July 11, 1991 Thursday 8:00 pm 24.4 33.6 "Bart Gets an 'F'"
2 3 1991–92 24 September 19, 1991 August 27, 1992 21.8 25.5 "Colonel Homer" None
3 4 1992–93 22 September 24, 1992 May 13, 1993 22.4 28.6 "Lisa's First Word" None
4 5 1993–94 22 September 30, 1993 May 19, 1994 18.9 24.0 "Treehouse of Horror IV" None
5 6 1994–95 25 September 4, 1994 May 21, 1995 Sunday 8:00 pm 15.6 22.2 "Treehouse of Horror V"
6 7 1995–96 25 September 17, 1995 May 19, 1996 Sunday 8:00 pm (Episodes 1–24)Sunday 8:30 pm (... 15.1 19.7 "Treehouse of Horror VI"
7 8 1996–97 25 October 27, 1996 May 18, 1997 Sunday 8:30 pm (Episodes 1–3)Sunday 8:00 pm (E... 14.5 20.9 "The Springfield Files"
8 9 1997–98 25 September 21, 1997 May 17, 1998 Sunday 8:00 pm 15.3 19.8 "The Two Mrs. Nahasapeemapetilons"
9 10 1998–99 23 August 23, 1998 May 16, 1999 13.5 15.5 "Maximum Homerdrive" None
10 11 1999–2000 22 September 26, 1999 May 21, 2000 8.8 18.4 "The Mansion Family" None
11 12 2000–01 21 November 1, 2000 May 20, 2001 15.5 18.6 "Worst Episode Ever" None
12 13 2001–02 22 November 6, 2001 May 22, 2002 Tuesday 8:30 pm (Episode 1)Sunday 8:00 pm (Epi... 12.5 14.9 "The Parent Rap"
13 14 2002–03 22 November 3, 2002 May 18, 2003 Sunday 8:00 pm (Episodes 1–11, 13–21)Sunday 8:... 14.4 22.1 "I'm Spelling as Fast as I Can"
14 15 2003–04 22 November 2, 2003 May 23, 2004 Sunday 8:00 pm 11.0 16.3 "I, (Annoyed Grunt)-Bot"
15 16 2004–05 21 November 7, 2004 May 15, 2005 Sunday 8:00 pm (Episodes 1–7, 9–16, 18, 20)Sun... 10.2 23.07 "Homer and Ned's Hail Mary Pass"
16 17 2005–06 22 September 11, 2005 May 21, 2006 Sunday 8:00 pm 9.55 11.63 "Treehouse of Horror XVI"
17 18 2006–07 22 September 10, 2006 May 20, 2007 9.15 13.90 "The Wife Aquatic" None
18 19 2007–08 20 September 23, 2007 May 18, 2008 8.37 11.7 "Treehouse of Horror XVIII" None
19 20 2008–09 21 September 28, 2008 May 17, 2009 7.1 12.4 "Treehouse of Horror XIX" None
20 21 2009–10 23 September 27, 2009 May 23, 2010 7.1 14.62 "Once Upon a Time in Springfield" None
21 22 2010–11 22 September 26, 2010 May 22, 2011 7.09 12.6 "Moms I'd Like to Forget" None
22 23 2011–12 22 September 25, 2011 May 20, 2012 6.15[147] 11.48 "The D'oh-cial Network" None
23 24 2012–13 22 September 30, 2012 May 19, 2013 Sunday 8:00 pm (Episodes 1-21)Sunday 8:30 pm (... 5.41[148] 8.97 "Homer Goes to Prep School"
24 25 2013–14 22 September 29, 2013 May 18, 2014 Sunday 8:00 pm (Episodes 1–11 & 13-22)Sunday 7... 5.02[149] 12.04 "Steal This Episode"
25 26 2014–15 22 September 28, 2014 May 17, 2015 Sunday 8:00 pm 5.61[150] 10.62 "The Man Who Came to Be Dinner"
26 27 2015–16 22 September 27, 2015 May 22, 2016 4.0[151] 8.33 "Teenage Mutant Milk-Caused Hurdles" None
27 28 2016–17 22 September 25, 2016 May 21, 2017 (2017-05-21) 4.80[152] 8.19 "Pork and Burns" None
28 29 2017–18 21 October 1, 2017 May 20, 2018 4.07[153] 8.04 "Frink Gets Testy" None
29 30 2018–19 23 September 30, 2018 May 12, 2019 3.10[154] 8.20 "The Girl on the Bus" None
30 31 2019–20 22 September 29, 2019 May 17, 2020 2.58[155] 5.63 "Go Big or Go Homer" None
31 32 2020–21 22 September 27, 2020 May 23, 2021 Sunday 8:00 pm (Episodes 1–10 & 12-22)Sunday 9... 2.32[156] 4.93 "Treehouse of Horror XXXI"
32 33 2021–22 22 September 26, 2021 May 15, 2022[157] TBA TBA TBA None

Actually, I still have a problem with my data. In many rows the time slot is missing because it is repeated from the row above, so the remaining values are shifted one column to the left. Let’s see if we can fix that.

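newdata = []
for i in range(2, 35):  # data[0] and data[1] are header rows; the 33 season rows follow
  row = []
  if len(data[i]) != 9:
    # Short row: the time slot is missing, so copy the first five values,
    # reuse the time slot from the previous fixed row, then add the last three
    for j in range(5):
      row.append(data[i][j])
    row.append(newdata[i-3][5])
    for j in range(5, 8):
      row.append(data[i][j])
  else:
    row = data[i]
  newdata.append(row)
df = pa.DataFrame(newdata, columns = titles)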

df
Season Years Episodes Season premiere Season finale Time slot (ET) Avg. viewers(in millions) Most watched episode Most watched episode title
0 1 1989–90 13 December 17, 1989 May 13, 1990 Sunday 8:30 pm 27.8 33.5 "Life on the Fast Lane"
1 2 1990–91 22 October 11, 1990 July 11, 1991 Thursday 8:00 pm 24.4 33.6 "Bart Gets an 'F'"
2 3 1991–92 24 September 19, 1991 August 27, 1992 Thursday 8:00 pm 21.8 25.5 "Colonel Homer"
3 4 1992–93 22 September 24, 1992 May 13, 1993 Thursday 8:00 pm 22.4 28.6 "Lisa's First Word"
4 5 1993–94 22 September 30, 1993 May 19, 1994 Thursday 8:00 pm 18.9 24.0 "Treehouse of Horror IV"
5 6 1994–95 25 September 4, 1994 May 21, 1995 Sunday 8:00 pm 15.6 22.2 "Treehouse of Horror V"
6 7 1995–96 25 September 17, 1995 May 19, 1996 Sunday 8:00 pm (Episodes 1–24)Sunday 8:30 pm (... 15.1 19.7 "Treehouse of Horror VI"
7 8 1996–97 25 October 27, 1996 May 18, 1997 Sunday 8:30 pm (Episodes 1–3)Sunday 8:00 pm (E... 14.5 20.9 "The Springfield Files"
8 9 1997–98 25 September 21, 1997 May 17, 1998 Sunday 8:00 pm 15.3 19.8 "The Two Mrs. Nahasapeemapetilons"
9 10 1998–99 23 August 23, 1998 May 16, 1999 Sunday 8:00 pm 13.5 15.5 "Maximum Homerdrive"
10 11 1999–2000 22 September 26, 1999 May 21, 2000 Sunday 8:00 pm 8.8 18.4 "The Mansion Family"
11 12 2000–01 21 November 1, 2000 May 20, 2001 Sunday 8:00 pm 15.5 18.6 "Worst Episode Ever"
12 13 2001–02 22 November 6, 2001 May 22, 2002 Tuesday 8:30 pm (Episode 1)Sunday 8:00 pm (Epi... 12.5 14.9 "The Parent Rap"
13 14 2002–03 22 November 3, 2002 May 18, 2003 Sunday 8:00 pm (Episodes 1–11, 13–21)Sunday 8:... 14.4 22.1 "I'm Spelling as Fast as I Can"
14 15 2003–04 22 November 2, 2003 May 23, 2004 Sunday 8:00 pm 11.0 16.3 "I, (Annoyed Grunt)-Bot"
15 16 2004–05 21 November 7, 2004 May 15, 2005 Sunday 8:00 pm (Episodes 1–7, 9–16, 18, 20)Sun... 10.2 23.07 "Homer and Ned's Hail Mary Pass"
16 17 2005–06 22 September 11, 2005 May 21, 2006 Sunday 8:00 pm 9.55 11.63 "Treehouse of Horror XVI"
17 18 2006–07 22 September 10, 2006 May 20, 2007 Sunday 8:00 pm 9.15 13.90 "The Wife Aquatic"
18 19 2007–08 20 September 23, 2007 May 18, 2008 Sunday 8:00 pm 8.37 11.7 "Treehouse of Horror XVIII"
19 20 2008–09 21 September 28, 2008 May 17, 2009 Sunday 8:00 pm 7.1 12.4 "Treehouse of Horror XIX"
20 21 2009–10 23 September 27, 2009 May 23, 2010 Sunday 8:00 pm 7.1 14.62 "Once Upon a Time in Springfield"
21 22 2010–11 22 September 26, 2010 May 22, 2011 Sunday 8:00 pm 7.09 12.6 "Moms I'd Like to Forget"
22 23 2011–12 22 September 25, 2011 May 20, 2012 Sunday 8:00 pm 6.15[147] 11.48 "The D'oh-cial Network"
23 24 2012–13 22 September 30, 2012 May 19, 2013 Sunday 8:00 pm (Episodes 1-21)Sunday 8:30 pm (... 5.41[148] 8.97 "Homer Goes to Prep School"
24 25 2013–14 22 September 29, 2013 May 18, 2014 Sunday 8:00 pm (Episodes 1–11 & 13-22)Sunday 7... 5.02[149] 12.04 "Steal This Episode"
25 26 2014–15 22 September 28, 2014 May 17, 2015 Sunday 8:00 pm 5.61[150] 10.62 "The Man Who Came to Be Dinner"
26 27 2015–16 22 September 27, 2015 May 22, 2016 Sunday 8:00 pm 4.0[151] 8.33 "Teenage Mutant Milk-Caused Hurdles"
27 28 2016–17 22 September 25, 2016 May 21, 2017 (2017-05-21) Sunday 8:00 pm 4.80[152] 8.19 "Pork and Burns"
28 29 2017–18 21 October 1, 2017 May 20, 2018 Sunday 8:00 pm 4.07[153] 8.04 "Frink Gets Testy"
29 30 2018–19 23 September 30, 2018 May 12, 2019 Sunday 8:00 pm 3.10[154] 8.20 "The Girl on the Bus"
30 31 2019–20 22 September 29, 2019 May 17, 2020 Sunday 8:00 pm 2.58[155] 5.63 "Go Big or Go Homer"
31 32 2020–21 22 September 27, 2020 May 23, 2021 Sunday 8:00 pm (Episodes 1–10 & 12-22)Sunday 9... 2.32[156] 4.93 "Treehouse of Horror XXXI"
32 33 2021–22 22 September 26, 2021 May 15, 2022[157] Sunday 8:00 pm (Episodes 1–10 & 12-22)Sunday 9... TBA TBA TBA
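The fix above depends on knowing that every full row has 9 values and that the time slot sits in position 5. A more general sketch, assuming the missing values come from cells that use the HTML `rowspan` attribute, is to carry such a cell down into the rows it covers while reading the table. The miniature table and the `table_to_rows` helper are made up for illustration, and `colspan` is ignored.

```python
from bs4 import BeautifulSoup

# Hypothetical miniature table: the time slot cell spans two rows
html = """
<table>
  <tr><th>Season</th><th>Time slot</th><th>Viewers</th></tr>
  <tr><td>3</td><td rowspan="2">Thursday 8:00 pm</td><td>21.8</td></tr>
  <tr><td>4</td><td>22.4</td></tr>
  <tr><td>5</td><td>Sunday 8:00 pm</td><td>18.9</td></tr>
</table>
"""


def table_to_rows(table):
    """Return the table as a list of rows, repeating cells that use rowspan."""
    rows = []
    pending = {}  # column index -> (text, number of rows still to fill)
    for tr in table.find_all("tr"):
        cells = iter(tr.find_all(["th", "td"]))
        row = []
        col = 0
        while True:
            if col in pending:
                # This column is covered by a cell from an earlier row
                text, left = pending[col]
                row.append(text)
                if left > 1:
                    pending[col] = (text, left - 1)
                else:
                    del pending[col]
                col += 1
                continue
            cell = next(cells, None)
            if cell is None:
                break
            text = cell.get_text(strip=True)
            span = int(cell.get("rowspan", 1))
            if span > 1:
                pending[col] = (text, span - 1)
            row.append(text)
            col += 1
        rows.append(row)
    return rows


soup = BeautifulSoup(html, "html.parser")
rows = table_to_rows(soup.find("table"))
for row in rows:
    print(row)
```

Every row comes out with the same number of values, so the list can be passed to a DataFrame without patching individual rows afterwards.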

Do you ever get to the finish line and think to yourself, man there must be an easier way to do that… Oh there totally was…

Your Turn#

Navigate to the Wikipedia page on Marvel Cinematic Universe Films. Gather the table on the films in the Infinity series (Hint: the class is ‘wikitable plainrowheaders’). Fix any issues with the column names. Remove rows that are not movies.