HTML Tables
Let’s load the same Wikipedia page about The Simpsons that we were working with before.
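import requests
import pandas as pa  # most pandas code uses the alias pd; this page uses pa throughout
from bs4 import BeautifulSoup
r = requests.get('')  # the page's URL was left blank here; paste in the Wikipedia address
html_contents = r.text  # the raw HTML of the page as one string
html_soup = BeautifulSoup(html_contents, "lxml")  # parse it with the lxml parser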
Tables Mean Data for Processing and Visualization!
There are 41 tables on this page, stored in a list! Let’s narrow them down by class. Elements can sometimes be selected by id as well, and there is a grab bag of other ways to target different parts of the HTML. Use your browser’s developer tools to examine your particular website!
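As a minimal sketch of those options, here is a small made-up HTML snippet (not the Simpsons page) searched three ways: by class with find_all, by id with find, and with a CSS selector via select. Note that class_="wikitable" also matches tables that carry extra classes such as plainrowheaders, because Beautiful Soup treats class as a multi-valued attribute. The snippet uses Python's built-in "html.parser" so it runs without lxml; the table contents and the id "infobox-demo" are invented for illustration.

```python
from bs4 import BeautifulSoup

# A tiny hand-written page standing in for a real Wikipedia article
html = """
<table id="infobox-demo" class="infobox"><tr><th>Genre</th><td>Sitcom</td></tr></table>
<table class="wikitable"><tr><th>Season</th></tr><tr><td>1</td></tr></table>
<table class="wikitable plainrowheaders"><tr><th>Film</th></tr><tr><td>Example film</td></tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

# 1. By class: matches every table whose class list contains "wikitable"
by_class = soup.find_all('table', class_="wikitable")

# 2. By id: ids are meant to be unique, so find() returns a single tag
by_id = soup.find('table', id="infobox-demo")

# 3. By CSS selector: the table must have both classes
by_css = soup.select('table.wikitable.plainrowheaders')

print(len(by_class))      # 2
print(by_id.th.text)      # Genre
print(by_css[0].td.text)  # Example film
```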
tables = html_soup.find_all('table', class_="wikitable")  # every table whose class includes "wikitable"
[<a href="/wiki/Eastern_Time_Zone" title="Eastern Time Zone">ET</a>,
<a href="/wiki/The_Simpsons_(season_1)" title="The Simpsons (season 1)">1</a>,
<a href="/wiki/1989%E2%80%9390_United_States_network_television_schedule" title="1989–90 United States network television schedule">1989–90</a>,
<a href="/wiki/Life_on_the_Fast_Lane" title="Life on the Fast Lane">Life on the Fast Lane</a>,
<a href="/wiki/The_Simpsons_(season_2)" title="The Simpsons (season 2)">2</a>,
<a href="/wiki/1990%E2%80%9391_United_States_network_television_schedule" title="1990–91 United States network television schedule">1990–91</a>,
<a class="mw-redirect" href="/wiki/Bart_Gets_an_%27F%27" title="Bart Gets an 'F'">Bart Gets an 'F'</a>,
<a href="/wiki/The_Simpsons_(season_3)" title="The Simpsons (season 3)">3</a>,
<a href="/wiki/1991%E2%80%9392_United_States_network_television_schedule" title="1991–92 United States network television schedule">1991–92</a>,
<a href="/wiki/Colonel_Homer" title="Colonel Homer">Colonel Homer</a>,
<a href="/wiki/The_Simpsons_(season_4)" title="The Simpsons (season 4)">4</a>,
<a href="/wiki/1992%E2%80%9393_United_States_network_television_schedule" title="1992–93 United States network television schedule">1992–93</a>,
<a href="/wiki/Lisa%27s_First_Word" title="Lisa's First Word">Lisa's First Word</a>,
<a href="/wiki/The_Simpsons_(season_5)" title="The Simpsons (season 5)">5</a>,
<a href="/wiki/1993%E2%80%9394_United_States_network_television_schedule" title="1993–94 United States network television schedule">1993–94</a>,
<a href="/wiki/Treehouse_of_Horror_IV" title="Treehouse of Horror IV">Treehouse of Horror IV</a>,
<a href="/wiki/The_Simpsons_(season_6)" title="The Simpsons (season 6)">6</a>,
<a href="/wiki/1994%E2%80%9395_United_States_network_television_schedule" title="1994–95 United States network television schedule">1994–95</a>,
<a href="/wiki/Treehouse_of_Horror_V" title="Treehouse of Horror V">Treehouse of Horror V</a>,
<a href="/wiki/The_Simpsons_(season_7)" title="The Simpsons (season 7)">7</a>,
<a href="/wiki/1995%E2%80%9396_United_States_network_television_schedule" title="1995–96 United States network television schedule">1995–96</a>,
<a href="/wiki/Treehouse_of_Horror_VI" title="Treehouse of Horror VI">Treehouse of Horror VI</a>,
<a href="/wiki/The_Simpsons_(season_8)" title="The Simpsons (season 8)">8</a>,
<a href="/wiki/1996%E2%80%9397_United_States_network_television_schedule" title="1996–97 United States network television schedule">1996–97</a>,
<a href="#cite_note-147">[146]</a>,
<a href="/wiki/The_Springfield_Files" title="The Springfield Files">The Springfield Files</a>,
<a href="/wiki/The_Simpsons_(season_9)" title="The Simpsons (season 9)">9</a>,
<a href="/wiki/1997%E2%80%9398_United_States_network_television_schedule" title="1997–98 United States network television schedule">1997–98</a>,
<a href="/wiki/The_Two_Mrs._Nahasapeemapetilons" title="The Two Mrs. Nahasapeemapetilons">The Two Mrs. Nahasapeemapetilons</a>,
<a href="/wiki/The_Simpsons_(season_10)" title="The Simpsons (season 10)">10</a>,
<a href="/wiki/1998%E2%80%9399_United_States_network_television_schedule" title="1998–99 United States network television schedule">1998–99</a>,
<a href="/wiki/Maximum_Homerdrive" title="Maximum Homerdrive">Maximum Homerdrive</a>,
<a href="/wiki/The_Simpsons_(season_11)" title="The Simpsons (season 11)">11</a>,
<a href="/wiki/1999%E2%80%932000_United_States_network_television_schedule" title="1999–2000 United States network television schedule">1999–2000</a>,
<a href="/wiki/The_Mansion_Family" title="The Mansion Family">The Mansion Family</a>,
<a href="/wiki/The_Simpsons_(season_12)" title="The Simpsons (season 12)">12</a>,
<a href="/wiki/2000%E2%80%9301_United_States_network_television_schedule" title="2000–01 United States network television schedule">2000–01</a>,
<a href="/wiki/Worst_Episode_Ever" title="Worst Episode Ever">Worst Episode Ever</a>,
<a href="/wiki/The_Simpsons_(season_13)" title="The Simpsons (season 13)">13</a>,
<a href="/wiki/2001%E2%80%9302_United_States_network_television_schedule" title="2001–02 United States network television schedule">2001–02</a>,
<a href="/wiki/The_Parent_Rap" title="The Parent Rap">The Parent Rap</a>,
<a href="/wiki/The_Simpsons_(season_14)" title="The Simpsons (season 14)">14</a>,
<a href="/wiki/2002%E2%80%9303_United_States_network_television_schedule" title="2002–03 United States network television schedule">2002–03</a>,
<a href="/wiki/I%27m_Spelling_as_Fast_as_I_Can" title="I'm Spelling as Fast as I Can">I'm Spelling as Fast as I Can</a>,
<a href="/wiki/The_Simpsons_(season_15)" title="The Simpsons (season 15)">15</a>,
<a href="/wiki/2003%E2%80%9304_United_States_network_television_schedule" title="2003–04 United States network television schedule">2003–04</a>,
<a href="/wiki/I,_(Annoyed_Grunt)-Bot" title="I, (Annoyed Grunt)-Bot">I, (Annoyed Grunt)-Bot</a>,
<a href="/wiki/The_Simpsons_(season_16)" title="The Simpsons (season 16)">16</a>,
<a href="/wiki/2004%E2%80%9305_United_States_network_television_schedule" title="2004–05 United States network television schedule">2004–05</a>,
<a href="/wiki/Homer_and_Ned%27s_Hail_Mary_Pass" title="Homer and Ned's Hail Mary Pass">Homer and Ned's Hail Mary Pass</a>,
<a href="/wiki/The_Simpsons_(season_17)" title="The Simpsons (season 17)">17</a>,
<a href="/wiki/2005%E2%80%9306_United_States_network_television_schedule" title="2005–06 United States network television schedule">2005–06</a>,
<a href="/wiki/Treehouse_of_Horror_XVI" title="Treehouse of Horror XVI">Treehouse of Horror XVI</a>,
<a href="/wiki/The_Simpsons_(season_18)" title="The Simpsons (season 18)">18</a>,
<a href="/wiki/2006%E2%80%9307_United_States_network_television_schedule" title="2006–07 United States network television schedule">2006–07</a>,
<a href="/wiki/The_Wife_Aquatic" title="The Wife Aquatic">The Wife Aquatic</a>,
<a href="/wiki/The_Simpsons_(season_19)" title="The Simpsons (season 19)">19</a>,
<a href="/wiki/2007%E2%80%9308_United_States_network_television_schedule" title="2007–08 United States network television schedule">2007–08</a>,
<a href="/wiki/Treehouse_of_Horror_XVIII" title="Treehouse of Horror XVIII">Treehouse of Horror XVIII</a>,
<a href="/wiki/The_Simpsons_(season_20)" title="The Simpsons (season 20)">20</a>,
<a href="/wiki/2008%E2%80%9309_United_States_network_television_schedule" title="2008–09 United States network television schedule">2008–09</a>,
<a href="/wiki/Treehouse_of_Horror_XIX" title="Treehouse of Horror XIX">Treehouse of Horror XIX</a>,
<a href="/wiki/The_Simpsons_(season_21)" title="The Simpsons (season 21)">21</a>,
<a href="/wiki/2009%E2%80%9310_United_States_network_television_schedule" title="2009–10 United States network television schedule">2009–10</a>,
<a href="/wiki/Once_Upon_a_Time_in_Springfield" title="Once Upon a Time in Springfield">Once Upon a Time in Springfield</a>,
<a href="/wiki/The_Simpsons_(season_22)" title="The Simpsons (season 22)">22</a>,
<a href="/wiki/2010%E2%80%9311_United_States_network_television_schedule" title="2010–11 United States network television schedule">2010–11</a>,
<a href="/wiki/Moms_I%27d_Like_to_Forget" title="Moms I'd Like to Forget">Moms I'd Like to Forget</a>,
<a href="/wiki/The_Simpsons_(season_23)" title="The Simpsons (season 23)">23</a>,
<a href="/wiki/2011%E2%80%9312_United_States_network_television_schedule" title="2011–12 United States network television schedule">2011–12</a>,
<a href="#cite_note-Season23viewership-148">[147]</a>,
<a href="/wiki/The_D%27oh-cial_Network" title="The D'oh-cial Network">The D'oh-cial Network</a>,
<a href="/wiki/The_Simpsons_(season_24)" title="The Simpsons (season 24)">24</a>,
<a href="/wiki/2012%E2%80%9313_United_States_network_television_schedule" title="2012–13 United States network television schedule">2012–13</a>,
<a href="#cite_note-Season24viewership-149">[148]</a>,
<a href="/wiki/Homer_Goes_to_Prep_School" title="Homer Goes to Prep School">Homer Goes to Prep School</a>,
<a href="/wiki/The_Simpsons_(season_25)" title="The Simpsons (season 25)">25</a>,
<a href="/wiki/2013%E2%80%9314_United_States_network_television_schedule" title="2013–14 United States network television schedule">2013–14</a>,
<a href="#cite_note-Season25viewership-150">[149]</a>,
<a href="/wiki/Steal_This_Episode" title="Steal This Episode">Steal This Episode</a>,
<a href="/wiki/The_Simpsons_(season_26)" title="The Simpsons (season 26)">26</a>,
<a href="/wiki/2014%E2%80%9315_United_States_network_television_schedule" title="2014–15 United States network television schedule">2014–15</a>,
<a href="#cite_note-entertainment2015-151">[150]</a>,
<a href="/wiki/The_Man_Who_Came_to_Be_Dinner" title="The Man Who Came to Be Dinner">The Man Who Came to Be Dinner</a>,
<a href="/wiki/The_Simpsons_(season_27)" title="The Simpsons (season 27)">27</a>,
<a href="/wiki/2015%E2%80%9316_United_States_network_television_schedule" title="2015–16 United States network television schedule">2015–16</a>,
<a href="#cite_note-entertainment2016-152">[151]</a>,
<a href="/wiki/Teenage_Mutant_Milk-Caused_Hurdles" title="Teenage Mutant Milk-Caused Hurdles">Teenage Mutant Milk-Caused Hurdles</a>,
<a href="/wiki/The_Simpsons_(season_28)" title="The Simpsons (season 28)">28</a>,
<a href="/wiki/2016%E2%80%9317_United_States_network_television_schedule" title="2016–17 United States network television schedule">2016–17</a>,
<a href="#cite_note-entertainment2017-153">[152]</a>,
<a href="/wiki/Pork_and_Burns" title="Pork and Burns">Pork and Burns</a>,
<a href="/wiki/The_Simpsons_(season_29)" title="The Simpsons (season 29)">29</a>,
<a href="/wiki/2017%E2%80%9318_United_States_network_television_schedule" title="2017–18 United States network television schedule">2017–18</a>,
<a href="#cite_note-entertainment2018-154">[153]</a>,
<a href="/wiki/Frink_Gets_Testy" title="Frink Gets Testy">Frink Gets Testy</a>,
<a href="/wiki/The_Simpsons_(season_30)" title="The Simpsons (season 30)">30</a>,
<a href="/wiki/2018%E2%80%9319_United_States_network_television_schedule" title="2018–19 United States network television schedule">2018–19</a>,
<a href="#cite_note-155">[154]</a>,
<a href="/wiki/The_Girl_on_the_Bus" title="The Girl on the Bus">The Girl on the Bus</a>,
<a href="/wiki/The_Simpsons_(season_31)" title="The Simpsons (season 31)">31</a>,
<a href="/wiki/2019%E2%80%9320_United_States_network_television_schedule" title="2019–20 United States network television schedule">2019–20</a>,
<a href="#cite_note-156">[155]</a>,
<a href="/wiki/Go_Big_or_Go_Homer" title="Go Big or Go Homer">Go Big or Go Homer</a>,
<a href="/wiki/The_Simpsons_(season_32)" title="The Simpsons (season 32)">32</a>,
<a href="/wiki/2020%E2%80%9321_United_States_network_television_schedule" title="2020–21 United States network television schedule">2020–21</a>,
<a href="#cite_note-157">[156]</a>,
<a href="/wiki/Treehouse_of_Horror_XXXI" title="Treehouse of Horror XXXI">Treehouse of Horror XXXI</a>,
<a href="/wiki/The_Simpsons_(season_33)" title="The Simpsons (season 33)">33</a>,
<a href="/wiki/2021%E2%80%9322_United_States_network_television_schedule" title="2021–22 United States network television schedule">2021–22</a>,
<a href="#cite_note-158">[157]</a>]
Here I grabbed the first table; three tables on the page have the class “wikitable”. Next I grabbed all of its links, shown above. The table was HUGE when I printed everything, so to get it into a nice form I’ll simply pass it to pandas using the read_html function. I did need to convert the soup back into a string first, since read_html parses HTML text rather than Beautiful Soup objects. It returns a list of DataFrames, one per table, and I selected the first one and called it df.
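To see what read_html hands back, here is a minimal sketch on a made-up two-table snippet. It returns a list of DataFrames, one per table element, which is why the code indexes [0]. Wrapping the string in io.StringIO is what pandas 2.1 and later ask for; passing literal HTML directly still works there but raises a FutureWarning. This assumes lxml is installed, as in the rest of this page.

```python
from io import StringIO
import pandas as pd

# Two tiny tables standing in for the ones on the real page
html = """
<table class="wikitable">
  <tr><th>Season</th><th>Episodes</th></tr>
  <tr><td>1</td><td>13</td></tr>
  <tr><td>2</td><td>22</td></tr>
</table>
<table class="infobox">
  <tr><th>Genre</th><td>Animated sitcom</td></tr>
</table>
"""

# read_html parses every <table> it finds and returns a list of DataFrames
frames = pd.read_html(StringIO(html))
print(len(frames))                   # 2

seasons = frames[0]                  # the first table, like df in the walk-through
print(seasons.columns.tolist())      # ['Season', 'Episodes']
print(seasons['Episodes'].tolist())  # [13, 22]
```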
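from io import StringIO  # newer pandas versions expect literal HTML wrapped in a file-like object
df = pa.read_html(StringIO(str(tables)))[0]  # read_html returns a list of DataFrames; keep the first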
Season | No. ofepisodes | Originally aired | Viewership | ||||||
Season | No. ofepisodes | Season premiere | Season finale | Time slot (ET) | Avg. viewers(in millions) | Most watched episode | |||
Season | Season.1 | No. ofepisodes | Season premiere | Season finale | Time slot (ET) | Avg. viewers(in millions) | Viewers(millions) | Episode title | |
0 | 1 | 1989–90 | 13 | December 17, 1989 | May 13, 1990 | Sunday 8:30 pm | 27.8 | 33.5 | "Life on the Fast Lane" |
1 | 2 | 1990–91 | 22 | October 11, 1990 | July 11, 1991 | Thursday 8:00 pm | 24.4 | 33.6 | "Bart Gets an 'F'" |
2 | 3 | 1991–92 | 24 | September 19, 1991 | August 27, 1992 | Thursday 8:00 pm | 21.8 | 25.5 | "Colonel Homer" |
3 | 4 | 1992–93 | 22 | September 24, 1992 | May 13, 1993 | Thursday 8:00 pm | 22.4 | 28.6 | "Lisa's First Word" |
4 | 5 | 1993–94 | 22 | September 30, 1993 | May 19, 1994 | Thursday 8:00 pm | 18.9 | 24.0 | "Treehouse of Horror IV" |
5 | 6 | 1994–95 | 25 | September 4, 1994 | May 21, 1995 | Sunday 8:00 pm | 15.6 | 22.2 | "Treehouse of Horror V" |
6 | 7 | 1995–96 | 25 | September 17, 1995 | May 19, 1996 | Sunday 8:00 pm (Episodes 1–24)Sunday 8:30 pm (... | 15.1 | 19.7 | "Treehouse of Horror VI" |
7 | 8 | 1996–97 | 25 | October 27, 1996 | May 18, 1997 | Sunday 8:30 pm (Episodes 1–3)Sunday 8:00 pm (E... | 14.5 | 20.9 | "The Springfield Files" |
8 | 9 | 1997–98 | 25 | September 21, 1997 | May 17, 1998 | Sunday 8:00 pm | 15.3 | 19.8 | "The Two Mrs. Nahasapeemapetilons" |
9 | 10 | 1998–99 | 23 | August 23, 1998 | May 16, 1999 | Sunday 8:00 pm | 13.5 | 15.5 | "Maximum Homerdrive" |
10 | 11 | 1999–2000 | 22 | September 26, 1999 | May 21, 2000 | Sunday 8:00 pm | 8.8 | 18.4 | "The Mansion Family" |
11 | 12 | 2000–01 | 21 | November 1, 2000 | May 20, 2001 | Sunday 8:00 pm | 15.5 | 18.6 | "Worst Episode Ever" |
12 | 13 | 2001–02 | 22 | November 6, 2001 | May 22, 2002 | Tuesday 8:30 pm (Episode 1)Sunday 8:00 pm (Epi... | 12.5 | 14.9 | "The Parent Rap" |
13 | 14 | 2002–03 | 22 | November 3, 2002 | May 18, 2003 | Sunday 8:00 pm (Episodes 1–11, 13–21)Sunday 8:... | 14.4 | 22.1 | "I'm Spelling as Fast as I Can" |
14 | 15 | 2003–04 | 22 | November 2, 2003 | May 23, 2004 | Sunday 8:00 pm | 11.0 | 16.3 | "I, (Annoyed Grunt)-Bot" |
15 | 16 | 2004–05 | 21 | November 7, 2004 | May 15, 2005 | Sunday 8:00 pm (Episodes 1–7, 9–16, 18, 20)Sun... | 10.2 | 23.07 | "Homer and Ned's Hail Mary Pass" |
16 | 17 | 2005–06 | 22 | September 11, 2005 | May 21, 2006 | Sunday 8:00 pm | 9.55 | 11.63 | "Treehouse of Horror XVI" |
17 | 18 | 2006–07 | 22 | September 10, 2006 | May 20, 2007 | Sunday 8:00 pm | 9.15 | 13.90 | "The Wife Aquatic" |
18 | 19 | 2007–08 | 20 | September 23, 2007 | May 18, 2008 | Sunday 8:00 pm | 8.37 | 11.7 | "Treehouse of Horror XVIII" |
19 | 20 | 2008–09 | 21 | September 28, 2008 | May 17, 2009 | Sunday 8:00 pm | 7.1 | 12.4 | "Treehouse of Horror XIX" |
20 | 21 | 2009–10 | 23 | September 27, 2009 | May 23, 2010 | Sunday 8:00 pm | 7.1 | 14.62 | "Once Upon a Time in Springfield" |
21 | 22 | 2010–11 | 22 | September 26, 2010 | May 22, 2011 | Sunday 8:00 pm | 7.09 | 12.6 | "Moms I'd Like to Forget" |
22 | 23 | 2011–12 | 22 | September 25, 2011 | May 20, 2012 | Sunday 8:00 pm | 6.15[147] | 11.48 | "The D'oh-cial Network" |
23 | 24 | 2012–13 | 22 | September 30, 2012 | May 19, 2013 | Sunday 8:00 pm (Episodes 1-21)Sunday 8:30 pm (... | 5.41[148] | 8.97 | "Homer Goes to Prep School" |
24 | 25 | 2013–14 | 22 | September 29, 2013 | May 18, 2014 | Sunday 8:00 pm (Episodes 1–11 & 13-22)Sunday 7... | 5.02[149] | 12.04 | "Steal This Episode" |
25 | 26 | 2014–15 | 22 | September 28, 2014 | May 17, 2015 | Sunday 8:00 pm | 5.61[150] | 10.62 | "The Man Who Came to Be Dinner" |
26 | 27 | 2015–16 | 22 | September 27, 2015 | May 22, 2016 | Sunday 8:00 pm | 4.0[151] | 8.33 | "Teenage Mutant Milk-Caused Hurdles" |
27 | 28 | 2016–17 | 22 | September 25, 2016 | May 21, 2017 | Sunday 8:00 pm | 4.80[152] | 8.19 | "Pork and Burns" |
28 | 29 | 2017–18 | 21 | October 1, 2017 | May 20, 2018 | Sunday 8:00 pm | 4.07[153] | 8.04 | "Frink Gets Testy" |
29 | 30 | 2018–19 | 23 | September 30, 2018 | May 12, 2019 | Sunday 8:00 pm | 3.10[154] | 8.20 | "The Girl on the Bus" |
30 | 31 | 2019–20 | 22 | September 29, 2019 | May 17, 2020 | Sunday 8:00 pm | 2.58[155] | 5.63 | "Go Big or Go Homer" |
31 | 32 | 2020–21 | 22 | September 27, 2020 | May 23, 2021 | Sunday 8:00 pm (Episodes 1–10 & 12-22)Sunday 9... | 2.32[156] | 4.93 | "Treehouse of Horror XXXI" |
32 | 33 | 2021–22 | 22 | September 26, 2021 | May 15, 2022[157] | Sunday 8:00 pm (Episodes 1–10 & 12-22)Sunday 9... | TBA | TBA | TBA |
The column names are not quite right, but that is not a hard fix. I actually found this pandas approach later; the list-based way in which I first built the table is shown further below. The list-based way offers more flexibility, but the simplicity of the pandas way can’t be beat!
The column names are a problem here because there are several levels of them: read_html turned the three stacked header rows into a MultiIndex. I will simply use the droplevel method twice to remove the top two levels and keep only the bottom row of titles.
df.columns = df.columns.droplevel(0).droplevel(0)  # drop the top header level, then the next one
| Season | Season.1 | No. ofepisodes | Season premiere | Season finale | Time slot (ET) | Avg. viewers(in millions) | Viewers(millions) | Episode title |
0 | 1 | 1989–90 | 13 | December 17, 1989 | May 13, 1990 | Sunday 8:30 pm | 27.8 | 33.5 | "Life on the Fast Lane" |
1 | 2 | 1990–91 | 22 | October 11, 1990 | July 11, 1991 | Thursday 8:00 pm | 24.4 | 33.6 | "Bart Gets an 'F'" |
2 | 3 | 1991–92 | 24 | September 19, 1991 | August 27, 1992 | Thursday 8:00 pm | 21.8 | 25.5 | "Colonel Homer" |
3 | 4 | 1992–93 | 22 | September 24, 1992 | May 13, 1993 | Thursday 8:00 pm | 22.4 | 28.6 | "Lisa's First Word" |
4 | 5 | 1993–94 | 22 | September 30, 1993 | May 19, 1994 | Thursday 8:00 pm | 18.9 | 24.0 | "Treehouse of Horror IV" |
5 | 6 | 1994–95 | 25 | September 4, 1994 | May 21, 1995 | Sunday 8:00 pm | 15.6 | 22.2 | "Treehouse of Horror V" |
6 | 7 | 1995–96 | 25 | September 17, 1995 | May 19, 1996 | Sunday 8:00 pm (Episodes 1–24)Sunday 8:30 pm (... | 15.1 | 19.7 | "Treehouse of Horror VI" |
7 | 8 | 1996–97 | 25 | October 27, 1996 | May 18, 1997 | Sunday 8:30 pm (Episodes 1–3)Sunday 8:00 pm (E... | 14.5 | 20.9 | "The Springfield Files" |
8 | 9 | 1997–98 | 25 | September 21, 1997 | May 17, 1998 | Sunday 8:00 pm | 15.3 | 19.8 | "The Two Mrs. Nahasapeemapetilons" |
9 | 10 | 1998–99 | 23 | August 23, 1998 | May 16, 1999 | Sunday 8:00 pm | 13.5 | 15.5 | "Maximum Homerdrive" |
10 | 11 | 1999–2000 | 22 | September 26, 1999 | May 21, 2000 | Sunday 8:00 pm | 8.8 | 18.4 | "The Mansion Family" |
11 | 12 | 2000–01 | 21 | November 1, 2000 | May 20, 2001 | Sunday 8:00 pm | 15.5 | 18.6 | "Worst Episode Ever" |
12 | 13 | 2001–02 | 22 | November 6, 2001 | May 22, 2002 | Tuesday 8:30 pm (Episode 1)Sunday 8:00 pm (Epi... | 12.5 | 14.9 | "The Parent Rap" |
13 | 14 | 2002–03 | 22 | November 3, 2002 | May 18, 2003 | Sunday 8:00 pm (Episodes 1–11, 13–21)Sunday 8:... | 14.4 | 22.1 | "I'm Spelling as Fast as I Can" |
14 | 15 | 2003–04 | 22 | November 2, 2003 | May 23, 2004 | Sunday 8:00 pm | 11.0 | 16.3 | "I, (Annoyed Grunt)-Bot" |
15 | 16 | 2004–05 | 21 | November 7, 2004 | May 15, 2005 | Sunday 8:00 pm (Episodes 1–7, 9–16, 18, 20)Sun... | 10.2 | 23.07 | "Homer and Ned's Hail Mary Pass" |
16 | 17 | 2005–06 | 22 | September 11, 2005 | May 21, 2006 | Sunday 8:00 pm | 9.55 | 11.63 | "Treehouse of Horror XVI" |
17 | 18 | 2006–07 | 22 | September 10, 2006 | May 20, 2007 | Sunday 8:00 pm | 9.15 | 13.90 | "The Wife Aquatic" |
18 | 19 | 2007–08 | 20 | September 23, 2007 | May 18, 2008 | Sunday 8:00 pm | 8.37 | 11.7 | "Treehouse of Horror XVIII" |
19 | 20 | 2008–09 | 21 | September 28, 2008 | May 17, 2009 | Sunday 8:00 pm | 7.1 | 12.4 | "Treehouse of Horror XIX" |
20 | 21 | 2009–10 | 23 | September 27, 2009 | May 23, 2010 | Sunday 8:00 pm | 7.1 | 14.62 | "Once Upon a Time in Springfield" |
21 | 22 | 2010–11 | 22 | September 26, 2010 | May 22, 2011 | Sunday 8:00 pm | 7.09 | 12.6 | "Moms I'd Like to Forget" |
22 | 23 | 2011–12 | 22 | September 25, 2011 | May 20, 2012 | Sunday 8:00 pm | 6.15[147] | 11.48 | "The D'oh-cial Network" |
23 | 24 | 2012–13 | 22 | September 30, 2012 | May 19, 2013 | Sunday 8:00 pm (Episodes 1-21)Sunday 8:30 pm (... | 5.41[148] | 8.97 | "Homer Goes to Prep School" |
24 | 25 | 2013–14 | 22 | September 29, 2013 | May 18, 2014 | Sunday 8:00 pm (Episodes 1–11 & 13-22)Sunday 7... | 5.02[149] | 12.04 | "Steal This Episode" |
25 | 26 | 2014–15 | 22 | September 28, 2014 | May 17, 2015 | Sunday 8:00 pm | 5.61[150] | 10.62 | "The Man Who Came to Be Dinner" |
26 | 27 | 2015–16 | 22 | September 27, 2015 | May 22, 2016 | Sunday 8:00 pm | 4.0[151] | 8.33 | "Teenage Mutant Milk-Caused Hurdles" |
27 | 28 | 2016–17 | 22 | September 25, 2016 | May 21, 2017 | Sunday 8:00 pm | 4.80[152] | 8.19 | "Pork and Burns" |
28 | 29 | 2017–18 | 21 | October 1, 2017 | May 20, 2018 | Sunday 8:00 pm | 4.07[153] | 8.04 | "Frink Gets Testy" |
29 | 30 | 2018–19 | 23 | September 30, 2018 | May 12, 2019 | Sunday 8:00 pm | 3.10[154] | 8.20 | "The Girl on the Bus" |
30 | 31 | 2019–20 | 22 | September 29, 2019 | May 17, 2020 | Sunday 8:00 pm | 2.58[155] | 5.63 | "Go Big or Go Homer" |
31 | 32 | 2020–21 | 22 | September 27, 2020 | May 23, 2021 | Sunday 8:00 pm (Episodes 1–10 & 12-22)Sunday 9... | 2.32[156] | 4.93 | "Treehouse of Horror XXXI" |
32 | 33 | 2021–22 | 22 | September 26, 2021 | May 15, 2022[157] | Sunday 8:00 pm (Episodes 1–10 & 12-22)Sunday 9... | TBA | TBA | TBA |
That is much better, although we lose a little information about the last two columns: they no longer show that they belong under the “Most watched episode” heading.
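Here is a minimal sketch of two ways to flatten a header like this, on a made-up three-level MultiIndex shaped like the one read_html built above. Option 1 is droplevel as used here (it also accepts a list of levels). Option 2, my own suggestion rather than anything read_html does for you, joins the distinct labels of the lower two levels so the “Most watched episode” context survives.

```python
import pandas as pd

# Three header levels, similar to what read_html built from the stacked <th> rows
columns = pd.MultiIndex.from_tuples([
    ('Season', 'Season', 'Season'),
    ('Viewership', 'Avg. viewers(in millions)', 'Avg. viewers(in millions)'),
    ('Viewership', 'Most watched episode', 'Viewers(millions)'),
    ('Viewership', 'Most watched episode', 'Episode title'),
])
df = pd.DataFrame([[1, 27.8, 33.5, '"Life on the Fast Lane"']], columns=columns)

# Option 1: keep only the bottom level
dropped = df.columns.droplevel([0, 1])
print(dropped.tolist())
# ['Season', 'Avg. viewers(in millions)', 'Viewers(millions)', 'Episode title']

# Option 2: join the distinct labels of the lower two levels;
# dict.fromkeys removes repeated labels while keeping their order
joined = [' '.join(dict.fromkeys(col[1:])) for col in df.columns]
print(joined)
# ['Season', 'Avg. viewers(in millions)',
#  'Most watched episode Viewers(millions)', 'Most watched episode Episode title']
```

The joined names are longer, but nothing is thrown away; which trade-off is better depends on how you plan to refer to the columns later.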
Let’s grab one more table to see whether this still works as well. This time I grabbed the very first table on the page, the infobox.
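from io import StringIO  # newer pandas versions expect literal HTML wrapped in a file-like object
df2 = pa.read_html(StringIO(str(html_soup.find('table'))))[0]  # find() returns the first <table> on the page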
| The Simpsons | The Simpsons.1 |
0 | NaN | NaN |
1 | Genre | Animated sitcom Satire |
2 | Created by | Matt Groening |
3 | Based on | The Simpsons shortsby Matt Groening |
4 | Developed by | James L. Brooks Matt Groening Sam Simon |
5 | Voices of | Dan Castellaneta Julie Kavner Nancy Cartwright... |
6 | Theme music composer | Danny Elfman |
7 | Opening theme | "The Simpsons Theme" |
8 | Composers | Richard Gibbs (1989–1990)Alf Clausen (1990–201... |
9 | Country of origin | United States |
10 | Original language | English |
11 | No. of seasons | 33 |
12 | No. of episodes | 717 (list of episodes) |
13 | Production | Production |
14 | Executive producers | List James L. Brooks (entire run) Matt Groen... |
15 | Running time | 21–24 minutes |
16 | Production companies | Gracie Films 20th Television[a] (seasons 1–32)... |
17 | Distributor | 20th Television |
18 | Release | Release |
19 | Original network | Fox |
20 | Picture format | NTSC (1989–2009)HDTV 720p (2009–present) |
21 | Audio format | Stereo (1989–1991)Dolby Surround (1991–2009)Do... |
22 | Original release | December 17, 1989present |
23 | Chronology | Chronology |
24 | Preceded by | The Simpsons shorts from The Tracey Ullman Show |
25 | External links | External links |
26 | Official website | Official website |
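The infobox comes through as label/value pairs, plus a NaN row and section-heading rows (such as “Production | Production”) where both cells repeat the same text. A minimal sketch of turning such a frame into a plain dictionary, using a few rows copied from the output above in place of the real df2:

```python
import pandas as pd

# A few rows shaped like df2 above (first column = label, second = value)
info = pd.DataFrame({
    'The Simpsons': [None, 'Genre', 'Created by', 'Production', 'Running time'],
    'The Simpsons.1': [None, 'Animated sitcom Satire', 'Matt Groening',
                       'Production', '21–24 minutes'],
})
label, value = info.columns

# Keep rows where both cells are present and the two cells differ
# (heading rows such as "Production | Production" repeat the same text)
keep = info.notna().all(axis=1) & (info[label] != info[value])
facts = dict(zip(info.loc[keep, label], info.loc[keep, value]))
print(facts['Created by'])  # Matt Groening
print(len(facts))           # 3
```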
Table the Hard Way
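data = []
# Only the first "wikitable" (the season overview) is processed here, so that
# data ends up holding that table's rows and nothing else
for table in tables[:1]:
    rows = table.find_all('tr')
    # The top-level column headings sit in the first <tr>; kept for reference
    headers = []
    for header in rows[0].find_all('th'):
        headers.append(header.text.replace('\n', ''))
    # Every other <tr> (the lower header rows and each season) becomes a list of cell texts
    for row in rows[1:]:
        values = []
        for col in row.find_all(['th', 'td']):
            values.append(col.text.replace('\n', ''))
        data.append(values)
#pa.DataFrame(data[1:], columns = data[0])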
[['Season premiere',
  'Season finale',
  'Time slot (ET)',
  'Avg. viewers(in millions)',
  'Most watched episode'],
 ['Viewers(millions)', 'Episode title'],
 [...,
  'December 17, 1989',
  'May 13, 1990',
  'Sunday 8:30\xa0pm',
  ...,
  '"Life on the Fast Lane"'],
 [...,
  'October 11, 1990',
  'July 11, 1991',
  'Thursday 8:00\xa0pm',
  ...,
  '"Bart Gets an \'F\'"'],
 ...]
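Notice the '\xa0' in the time slot strings above: it is a non-breaking space from the page's HTML, and .text keeps it. It does not compare equal to an ordinary space, so it can be worth normalizing. A minimal sketch of two options: a plain str.replace, or unicodedata.normalize with 'NFKC', which maps the non-breaking space to a regular one (but also rewrites other compatibility characters, so check your data first).

```python
import unicodedata

cell = 'Sunday 8:30\xa0pm'  # a value as stored in data above

# Option 1: replace just the non-breaking space
print(cell.replace('\xa0', ' ') == 'Sunday 8:30 pm')            # True

# Option 2: NFKC normalization maps U+00A0 to an ordinary space
print(unicodedata.normalize('NFKC', cell) == 'Sunday 8:30 pm')  # True

# In the hard-way loop, option 1 could be applied to every cell, e.g.
# values.append(col.text.replace('\n', '').replace('\xa0', ' '))
```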
I have to do some work here to get this into a DataFrame, mostly just getting the column names correct. Several columns had no name, and some names ended up in their own row. This is why it is important to look at your outputs!
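titles = ['Season', 'Years', 'Episodes']  # the first three columns are named by hand
for name in data[0]:  # data[0] holds the second header row (premiere, finale, ...)
    titles.append(name)
titles.append('Most watched episode title')  # this heading ended up in its own row
df = pa.DataFrame(data[2:], columns = titles)  # skip the two header rows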
| Season | Years | Episodes | Season premiere | Season finale | Time slot (ET) | Avg. viewers(in millions) | Most watched episode | Most watched episode title |
0 | 1 | 1989–90 | 13 | December 17, 1989 | May 13, 1990 | Sunday 8:30 pm | 27.8 | 33.5 | "Life on the Fast Lane" |
1 | 2 | 1990–91 | 22 | October 11, 1990 | July 11, 1991 | Thursday 8:00 pm | 24.4 | 33.6 | "Bart Gets an 'F'" |
2 | 3 | 1991–92 | 24 | September 19, 1991 | August 27, 1992 | 21.8 | 25.5 | "Colonel Homer" | None |
3 | 4 | 1992–93 | 22 | September 24, 1992 | May 13, 1993 | 22.4 | 28.6 | "Lisa's First Word" | None |
4 | 5 | 1993–94 | 22 | September 30, 1993 | May 19, 1994 | 18.9 | 24.0 | "Treehouse of Horror IV" | None |
5 | 6 | 1994–95 | 25 | September 4, 1994 | May 21, 1995 | Sunday 8:00 pm | 15.6 | 22.2 | "Treehouse of Horror V" |
6 | 7 | 1995–96 | 25 | September 17, 1995 | May 19, 1996 | Sunday 8:00 pm (Episodes 1–24)Sunday 8:30 pm (... | 15.1 | 19.7 | "Treehouse of Horror VI" |
7 | 8 | 1996–97 | 25 | October 27, 1996 | May 18, 1997 | Sunday 8:30 pm (Episodes 1–3)Sunday 8:00 pm (E... | 14.5 | 20.9 | "The Springfield Files" |
8 | 9 | 1997–98 | 25 | September 21, 1997 | May 17, 1998 | Sunday 8:00 pm | 15.3 | 19.8 | "The Two Mrs. Nahasapeemapetilons" |
9 | 10 | 1998–99 | 23 | August 23, 1998 | May 16, 1999 | 13.5 | 15.5 | "Maximum Homerdrive" | None |
10 | 11 | 1999–2000 | 22 | September 26, 1999 | May 21, 2000 | 8.8 | 18.4 | "The Mansion Family" | None |
11 | 12 | 2000–01 | 21 | November 1, 2000 | May 20, 2001 | 15.5 | 18.6 | "Worst Episode Ever" | None |
12 | 13 | 2001–02 | 22 | November 6, 2001 | May 22, 2002 | Tuesday 8:30 pm (Episode 1)Sunday 8:00 pm (Epi... | 12.5 | 14.9 | "The Parent Rap" |
13 | 14 | 2002–03 | 22 | November 3, 2002 | May 18, 2003 | Sunday 8:00 pm (Episodes 1–11, 13–21)Sunday 8:... | 14.4 | 22.1 | "I'm Spelling as Fast as I Can" |
14 | 15 | 2003–04 | 22 | November 2, 2003 | May 23, 2004 | Sunday 8:00 pm | 11.0 | 16.3 | "I, (Annoyed Grunt)-Bot" |
15 | 16 | 2004–05 | 21 | November 7, 2004 | May 15, 2005 | Sunday 8:00 pm (Episodes 1–7, 9–16, 18, 20)Sun... | 10.2 | 23.07 | "Homer and Ned's Hail Mary Pass" |
16 | 17 | 2005–06 | 22 | September 11, 2005 | May 21, 2006 | Sunday 8:00 pm | 9.55 | 11.63 | "Treehouse of Horror XVI" |
17 | 18 | 2006–07 | 22 | September 10, 2006 | May 20, 2007 | 9.15 | 13.90 | "The Wife Aquatic" | None |
18 | 19 | 2007–08 | 20 | September 23, 2007 | May 18, 2008 | 8.37 | 11.7 | "Treehouse of Horror XVIII" | None |
19 | 20 | 2008–09 | 21 | September 28, 2008 | May 17, 2009 | 7.1 | 12.4 | "Treehouse of Horror XIX" | None |
20 | 21 | 2009–10 | 23 | September 27, 2009 | May 23, 2010 | 7.1 | 14.62 | "Once Upon a Time in Springfield" | None |
21 | 22 | 2010–11 | 22 | September 26, 2010 | May 22, 2011 | 7.09 | 12.6 | "Moms I'd Like to Forget" | None |
22 | 23 | 2011–12 | 22 | September 25, 2011 | May 20, 2012 | 6.15[147] | 11.48 | "The D'oh-cial Network" | None |
23 | 24 | 2012–13 | 22 | September 30, 2012 | May 19, 2013 | Sunday 8:00 pm (Episodes 1-21)Sunday 8:30 pm (... | 5.41[148] | 8.97 | "Homer Goes to Prep School" |
24 | 25 | 2013–14 | 22 | September 29, 2013 | May 18, 2014 | Sunday 8:00 pm (Episodes 1–11 & 13-22)Sunday 7... | 5.02[149] | 12.04 | "Steal This Episode" |
25 | 26 | 2014–15 | 22 | September 28, 2014 | May 17, 2015 | Sunday 8:00 pm | 5.61[150] | 10.62 | "The Man Who Came to Be Dinner" |
26 | 27 | 2015–16 | 22 | September 27, 2015 | May 22, 2016 | 4.0[151] | 8.33 | "Teenage Mutant Milk-Caused Hurdles" | None |
27 | 28 | 2016–17 | 22 | September 25, 2016 | May 21, 2017 (2017-05-21) | 4.80[152] | 8.19 | "Pork and Burns" | None |
28 | 29 | 2017–18 | 21 | October 1, 2017 | May 20, 2018 | 4.07[153] | 8.04 | "Frink Gets Testy" | None |
29 | 30 | 2018–19 | 23 | September 30, 2018 | May 12, 2019 | 3.10[154] | 8.20 | "The Girl on the Bus" | None |
30 | 31 | 2019–20 | 22 | September 29, 2019 | May 17, 2020 | 2.58[155] | 5.63 | "Go Big or Go Homer" | None |
31 | 32 | 2020–21 | 22 | September 27, 2020 | May 23, 2021 | Sunday 8:00 pm (Episodes 1–10 & 12-22)Sunday 9... | 2.32[156] | 4.93 | "Treehouse of Horror XXXI" |
32 | 33 | 2021–22 | 22 | September 26, 2021 | May 15, 2022[157] | TBA | TBA | TBA | None |
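The None values above come from how pandas.DataFrame treats ragged lists: a row that is one value short is padded at the end, so every value after the gap lands one column to the left. A minimal sketch with made-up rows modeled on seasons 2 and 3:

```python
import pandas as pd

rows = [
    ['2', 'Thursday 8:00 pm', '24.4'],  # full row
    ['3', '21.8'],                      # time slot missing (it was shared with the row above)
]
df = pd.DataFrame(rows, columns=['Season', 'Time slot', 'Avg. viewers'])
print(df)
# The viewership figure '21.8' ends up under 'Time slot',
# and the last column of the short row is filled with a missing value.
```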
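Actually, I still have a problem with my data. In many rows the time slot is the same as in the row above; on the page that cell spans several rows (rowspan), so those rows have only eight cells and everything after the season finale shifts one column to the left. Let’s see if we can fix that by copying the time slot down from the previous row.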
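newdata = []
for i in range(2, 35):  # data[2] to data[34] are the 33 season rows
    row = []
    if len(data[i]) != 9:
        # Short row: the time slot cell was spanned from above, so rebuild it as
        # the first five cells + the previous row's time slot + the last three cells
        for j in range(5):
            row.append(data[i][j])
        row.append(newdata[-1][5])
        for j in range(5, 8):
            row.append(data[i][j])
    else:
        row = data[i]
    newdata.append(row)
df = pa.DataFrame(newdata, columns = titles)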
| Season | Years | Episodes | Season premiere | Season finale | Time slot (ET) | Avg. viewers(in millions) | Most watched episode | Most watched episode title |
0 | 1 | 1989–90 | 13 | December 17, 1989 | May 13, 1990 | Sunday 8:30 pm | 27.8 | 33.5 | "Life on the Fast Lane" |
1 | 2 | 1990–91 | 22 | October 11, 1990 | July 11, 1991 | Thursday 8:00 pm | 24.4 | 33.6 | "Bart Gets an 'F'" |
2 | 3 | 1991–92 | 24 | September 19, 1991 | August 27, 1992 | Thursday 8:00 pm | 21.8 | 25.5 | "Colonel Homer" |
3 | 4 | 1992–93 | 22 | September 24, 1992 | May 13, 1993 | Thursday 8:00 pm | 22.4 | 28.6 | "Lisa's First Word" |
4 | 5 | 1993–94 | 22 | September 30, 1993 | May 19, 1994 | Thursday 8:00 pm | 18.9 | 24.0 | "Treehouse of Horror IV" |
5 | 6 | 1994–95 | 25 | September 4, 1994 | May 21, 1995 | Sunday 8:00 pm | 15.6 | 22.2 | "Treehouse of Horror V" |
6 | 7 | 1995–96 | 25 | September 17, 1995 | May 19, 1996 | Sunday 8:00 pm (Episodes 1–24)Sunday 8:30 pm (... | 15.1 | 19.7 | "Treehouse of Horror VI" |
7 | 8 | 1996–97 | 25 | October 27, 1996 | May 18, 1997 | Sunday 8:30 pm (Episodes 1–3)Sunday 8:00 pm (E... | 14.5 | 20.9 | "The Springfield Files" |
8 | 9 | 1997–98 | 25 | September 21, 1997 | May 17, 1998 | Sunday 8:00 pm | 15.3 | 19.8 | "The Two Mrs. Nahasapeemapetilons" |
9 | 10 | 1998–99 | 23 | August 23, 1998 | May 16, 1999 | Sunday 8:00 pm | 13.5 | 15.5 | "Maximum Homerdrive" |
10 | 11 | 1999–2000 | 22 | September 26, 1999 | May 21, 2000 | Sunday 8:00 pm | 8.8 | 18.4 | "The Mansion Family" |
11 | 12 | 2000–01 | 21 | November 1, 2000 | May 20, 2001 | Sunday 8:00 pm | 15.5 | 18.6 | "Worst Episode Ever" |
12 | 13 | 2001–02 | 22 | November 6, 2001 | May 22, 2002 | Tuesday 8:30 pm (Episode 1)Sunday 8:00 pm (Epi... | 12.5 | 14.9 | "The Parent Rap" |
13 | 14 | 2002–03 | 22 | November 3, 2002 | May 18, 2003 | Sunday 8:00 pm (Episodes 1–11, 13–21)Sunday 8:... | 14.4 | 22.1 | "I'm Spelling as Fast as I Can" |
14 | 15 | 2003–04 | 22 | November 2, 2003 | May 23, 2004 | Sunday 8:00 pm | 11.0 | 16.3 | "I, (Annoyed Grunt)-Bot" |
15 | 16 | 2004–05 | 21 | November 7, 2004 | May 15, 2005 | Sunday 8:00 pm (Episodes 1–7, 9–16, 18, 20)Sun... | 10.2 | 23.07 | "Homer and Ned's Hail Mary Pass" |
16 | 17 | 2005–06 | 22 | September 11, 2005 | May 21, 2006 | Sunday 8:00 pm | 9.55 | 11.63 | "Treehouse of Horror XVI" |
17 | 18 | 2006–07 | 22 | September 10, 2006 | May 20, 2007 | Sunday 8:00 pm | 9.15 | 13.90 | "The Wife Aquatic" |
18 | 19 | 2007–08 | 20 | September 23, 2007 | May 18, 2008 | Sunday 8:00 pm | 8.37 | 11.7 | "Treehouse of Horror XVIII" |
19 | 20 | 2008–09 | 21 | September 28, 2008 | May 17, 2009 | Sunday 8:00 pm | 7.1 | 12.4 | "Treehouse of Horror XIX" |
20 | 21 | 2009–10 | 23 | September 27, 2009 | May 23, 2010 | Sunday 8:00 pm | 7.1 | 14.62 | "Once Upon a Time in Springfield" |
21 | 22 | 2010–11 | 22 | September 26, 2010 | May 22, 2011 | Sunday 8:00 pm | 7.09 | 12.6 | "Moms I'd Like to Forget" |
22 | 23 | 2011–12 | 22 | September 25, 2011 | May 20, 2012 | Sunday 8:00 pm | 6.15[147] | 11.48 | "The D'oh-cial Network" |
23 | 24 | 2012–13 | 22 | September 30, 2012 | May 19, 2013 | Sunday 8:00 pm (Episodes 1-21)Sunday 8:30 pm (... | 5.41[148] | 8.97 | "Homer Goes to Prep School" |
24 | 25 | 2013–14 | 22 | September 29, 2013 | May 18, 2014 | Sunday 8:00 pm (Episodes 1–11 & 13-22)Sunday 7... | 5.02[149] | 12.04 | "Steal This Episode" |
25 | 26 | 2014–15 | 22 | September 28, 2014 | May 17, 2015 | Sunday 8:00 pm | 5.61[150] | 10.62 | "The Man Who Came to Be Dinner" |
26 | 27 | 2015–16 | 22 | September 27, 2015 | May 22, 2016 | Sunday 8:00 pm | 4.0[151] | 8.33 | "Teenage Mutant Milk-Caused Hurdles" |
27 | 28 | 2016–17 | 22 | September 25, 2016 | May 21, 2017 (2017-05-21) | Sunday 8:00 pm | 4.80[152] | 8.19 | "Pork and Burns" |
28 | 29 | 2017–18 | 21 | October 1, 2017 | May 20, 2018 | Sunday 8:00 pm | 4.07[153] | 8.04 | "Frink Gets Testy" |
29 | 30 | 2018–19 | 23 | September 30, 2018 | May 12, 2019 | Sunday 8:00 pm | 3.10[154] | 8.20 | "The Girl on the Bus" |
30 | 31 | 2019–20 | 22 | September 29, 2019 | May 17, 2020 | Sunday 8:00 pm | 2.58[155] | 5.63 | "Go Big or Go Homer" |
31 | 32 | 2020–21 | 22 | September 27, 2020 | May 23, 2021 | Sunday 8:00 pm (Episodes 1–10 & 12-22)Sunday 9... | 2.32[156] | 4.93 | "Treehouse of Horror XXXI" |
32 | 33 | 2021–22 | 22 | September 26, 2021 | May 15, 2022[157] | Sunday 8:00 pm (Episodes 1–10 & 12-22)Sunday 9... | TBA | TBA | TBA |
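The fix above patches rows after the fact by borrowing the time slot from the previous row. A more general approach is to carry spanned cells forward while reading the rows. Below is a minimal sketch that assumes only rowspan (not colspan) needs handling; rows_with_rowspan is a hypothetical helper of my own, a simplified version of the bookkeeping read_html already does internally. The small table reuses a few values from the output above.

```python
from bs4 import BeautifulSoup

def rows_with_rowspan(table):
    """Return rows as lists of cell text, copying each rowspan cell into
    every row it covers. Hypothetical helper; colspan is ignored."""
    grid = []
    pending = {}  # column index -> (rows still to fill, cell text)
    for tr in table.find_all('tr'):
        cells = tr.find_all(['th', 'td'])
        row, col, i = [], 0, 0
        # Continue while real cells remain or a spanned cell still has to be
        # placed at or to the right of the current column
        while i < len(cells) or any(c >= col for c in pending):
            if col in pending:
                left, text = pending[col]
                row.append(text)
                if left == 1:
                    del pending[col]
                else:
                    pending[col] = (left - 1, text)
            elif i < len(cells):
                cell = cells[i]
                i += 1
                text = cell.get_text(strip=True)
                span = int(cell.get('rowspan', 1))
                if span > 1:
                    pending[col] = (span - 1, text)
                row.append(text)
            else:
                row.append('')  # gap in a malformed table
            col += 1
        grid.append(row)
    return grid

html = """
<table>
  <tr><th>Season</th><th>Time slot (ET)</th><th>Avg. viewers</th></tr>
  <tr><td>2</td><td rowspan="4">Thursday 8:00 pm</td><td>24.4</td></tr>
  <tr><td>3</td><td>21.8</td></tr>
  <tr><td>4</td><td>22.4</td></tr>
  <tr><td>5</td><td>18.9</td></tr>
  <tr><td>6</td><td>Sunday 8:00 pm</td><td>15.6</td></tr>
</table>
"""
table = BeautifulSoup(html, "html.parser").find('table')
for row in rows_with_rowspan(table):
    print(row)
```

Every printed row has three cells, with "Thursday 8:00 pm" repeated for seasons 3 to 5.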
Do you ever get to the finish line and think to yourself, “Man, there must be an easier way to do that…”? Oh, there totally was…
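Whichever route you take, some numeric columns still carry footnote markers such as 6.15[147], and the newest season has TBA. A minimal sketch of stripping the bracketed markers with a regular expression and converting to numbers, where errors='coerce' turns anything unparseable (like TBA) into NaN. The sample values are copied from the tables above.

```python
import pandas as pd

viewers = pd.Series(['27.8', '6.15[147]', '2.32[156]', 'TBA'])

# Remove footnote markers like "[147]", then convert to floats
cleaned = viewers.str.replace(r'\[\d+\]', '', regex=True)
numbers = pd.to_numeric(cleaned, errors='coerce')
print(numbers.tolist())  # [27.8, 6.15, 2.32, nan]

# On the real DataFrame, the same two steps would be applied to each numeric
# column, e.g. the 'Avg. viewers(in millions)' column.
```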
Your Turn
Navigate to the Wikipedia page on Marvel Cinematic Universe films. Gather the table of the films in the Infinity series (hint: its class is ‘wikitable plainrowheaders’). Fix any issues with the column names, and remove rows that are not movies.