SQL Joins#
from google.colab import auth
auth.authenticate_user()
print('Authenticated')
Authenticated
Next we will try to tackle JOINS
. There are many ways to join tables in SQL.
Inner Joins#
The easiest way is a cross join. Essentially this is a cartesian product of the two datasets. It will create a very large dataset as it is multiplicative in the number of rows. Here is a small table to start (10 rows)
%%bigquery --project pic-math
SELECT *
FROM `bigquery-public-data.iowa_liquor_sales.sales`
WHERE category_name = 'Imported Whiskies'
invoice_and_item_number | date | store_number | store_name | address | city | zip_code | store_location | county_number | county | category | category_name | vendor_number | vendor_name | item_number | item_description | pack | bottle_volume_ml | state_bottle_cost | state_bottle_retail | bottles_sold | sale_dollars | volume_sold_liters | volume_sold_gallons | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | INV-34280100007 | 2021-02-15 | 2190 | Central City Liquor, Inc. | 1460 2ND AVE | Des Moines | 50314.0 | POINT (-93.619787 41.60566) | 77 | POLK | 1012000.0 | Imported Whiskies | 391.0 | HOTALING & CO | 987952 | Kavalan Concertmaster | 6 | 750 | 51.00 | 76.50 | 6 | 459.00 | 4.5 | 1.18 |
1 | INV-34357200008 | 2021-02-16 | 6035 | Bootlegging Barzinis | 412 First Ave | Coralville | 52241.0 | POINT (-91.565517 41.672672) | 52 | JOHNSON | 1012000.0 | Imported Whiskies | 391.0 | HOTALING & CO | 987955 | Kavalan King Car Conductor | 6 | 750 | 61.34 | 92.01 | 6 | 552.06 | 4.5 | 1.18 |
2 | INV-34280100006 | 2021-02-15 | 2190 | Central City Liquor, Inc. | 1460 2ND AVE | Des Moines | 50314.0 | POINT (-93.619787 41.60566) | 77 | POLK | 1012000.0 | Imported Whiskies | 391.0 | HOTALING & CO | 987951 | Kavalan Classic Single Malt Whiskey | 6 | 750 | 46.00 | 69.00 | 6 | 414.00 | 4.5 | 1.18 |
3 | INV-41510000010 | 2021-11-01 | 2190 | Central City Liquor, Inc. | 1460 2ND AVE | Des Moines | 50314.0 | POINT (-93.619787 41.60566) | 77 | POLK | 1012000.0 | Imported Whiskies | 391.0 | HOTALING & CO | 987952 | Kavalan Concertmaster | 6 | 750 | 51.00 | 76.50 | 6 | 459.00 | 4.5 | 1.18 |
4 | INV-34280100008 | 2021-02-15 | 2190 | Central City Liquor, Inc. | 1460 2ND AVE | Des Moines | 50314.0 | POINT (-93.619787 41.60566) | 77 | POLK | 1012000.0 | Imported Whiskies | 391.0 | HOTALING & CO | 987955 | Kavalan King Car Conductor | 6 | 750 | 61.34 | 92.01 | 6 | 552.06 | 4.5 | 1.18 |
5 | INV-37854400006 | 2021-06-28 | 2190 | Central City Liquor, Inc. | 1460 2ND AVE | Des Moines | 50314.0 | POINT (-93.619787 41.60566) | 77 | POLK | 1012000.0 | Imported Whiskies | 391 | HOTALING & CO | 987952 | Kavalan Concertmaster | 6 | 750 | 51.00 | 76.50 | 6 | 459.00 | 4.5 | 1.18 |
6 | INV-34357200006 | 2021-02-16 | 6035 | Bootlegging Barzinis | 412 First Ave | Coralville | 52241.0 | POINT (-91.565517 41.672672) | 52 | JOHNSON | 1012000.0 | Imported Whiskies | 391.0 | HOTALING & CO | 987951 | Kavalan Classic Single Malt Whiskey | 6 | 750 | 46.00 | 69.00 | 6 | 414.00 | 4.5 | 1.18 |
7 | INV-40733900005 | 2021-10-06 | 3773 | Benz Distributing | 501 7th Ave SE | Cedar Rapids | 52401.0 | POINT (-91.659875 41.97574) | 57 | LINN | 1012000.0 | Imported Whiskies | 391.0 | HOTALING & CO | 987953 | Kavalan Ex-Bourbon | 6 | 750 | 81.84 | 122.76 | 6 | 736.56 | 4.5 | 1.18 |
8 | INV-34357200007 | 2021-02-16 | 6035 | Bootlegging Barzinis | 412 First Ave | Coralville | 52241.0 | POINT (-91.565517 41.672672) | 52 | JOHNSON | 1012000.0 | Imported Whiskies | 391.0 | HOTALING & CO | 987953 | Kavalan Ex-Bourbon | 6 | 750 | 81.84 | 122.76 | 6 | 736.56 | 4.5 | 1.18 |
9 | INV-40505700006 | 2021-09-29 | 6035 | Bootlegging Barzinis | 412 First Ave | Coralville | 52241.0 | POINT (-91.565517 41.672672) | 52 | JOHNSON | 1012000.0 | Imported Whiskies | 391 | HOTALING & CO | 987953 | Kavalan Ex-Bourbon | 6 | 750 | 81.84 | 122.76 | 6 | 736.56 | 4.5 | 1.18 |
Now we’ll use that table and add to it every column from the same table with the JOIN
. Note that the ON
clause is over the category_name and they all have the same category name!
%%bigquery --project pic-math
WITH t as(
SELECT *
FROM `bigquery-public-data.iowa_liquor_sales.sales`
WHERE category_name = 'Imported Whiskies')
SELECT *
FROM t as table1 JOIN t as table2 ON table1.category_name = table2.category_name
invoice_and_item_number | date | store_number | store_name | address | city | zip_code | store_location | county_number | county | category | category_name | vendor_number | vendor_name | item_number | item_description | pack | bottle_volume_ml | state_bottle_cost | state_bottle_retail | bottles_sold | sale_dollars | volume_sold_liters | volume_sold_gallons | invoice_and_item_number_1 | date_1 | store_number_1 | store_name_1 | address_1 | city_1 | zip_code_1 | store_location_1 | county_number_1 | county_1 | category_1 | category_name_1 | vendor_number_1 | vendor_name_1 | item_number_1 | item_description_1 | pack_1 | bottle_volume_ml_1 | state_bottle_cost_1 | state_bottle_retail_1 | bottles_sold_1 | sale_dollars_1 | volume_sold_liters_1 | volume_sold_gallons_1 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | INV-34280100007 | 2021-02-15 | 2190 | Central City Liquor, Inc. | 1460 2ND AVE | Des Moines | 50314.0 | POINT (-93.619787 41.60566) | 77 | POLK | 1012000.0 | Imported Whiskies | 391.0 | HOTALING & CO | 987952 | Kavalan Concertmaster | 6 | 750 | 51.00 | 76.50 | 6 | 459.00 | 4.5 | 1.18 | INV-34280100007 | 2021-02-15 | 2190 | Central City Liquor, Inc. | 1460 2ND AVE | Des Moines | 50314.0 | POINT (-93.619787 41.60566) | 77 | POLK | 1012000.0 | Imported Whiskies | 391.0 | HOTALING & CO | 987952 | Kavalan Concertmaster | 6 | 750 | 51.00 | 76.50 | 6 | 459.00 | 4.5 | 1.18 |
1 | INV-34357200008 | 2021-02-16 | 6035 | Bootlegging Barzinis | 412 First Ave | Coralville | 52241.0 | POINT (-91.565517 41.672672) | 52 | JOHNSON | 1012000.0 | Imported Whiskies | 391.0 | HOTALING & CO | 987955 | Kavalan King Car Conductor | 6 | 750 | 61.34 | 92.01 | 6 | 552.06 | 4.5 | 1.18 | INV-34280100007 | 2021-02-15 | 2190 | Central City Liquor, Inc. | 1460 2ND AVE | Des Moines | 50314.0 | POINT (-93.619787 41.60566) | 77 | POLK | 1012000.0 | Imported Whiskies | 391.0 | HOTALING & CO | 987952 | Kavalan Concertmaster | 6 | 750 | 51.00 | 76.50 | 6 | 459.00 | 4.5 | 1.18 |
2 | INV-34280100006 | 2021-02-15 | 2190 | Central City Liquor, Inc. | 1460 2ND AVE | Des Moines | 50314.0 | POINT (-93.619787 41.60566) | 77 | POLK | 1012000.0 | Imported Whiskies | 391.0 | HOTALING & CO | 987951 | Kavalan Classic Single Malt Whiskey | 6 | 750 | 46.00 | 69.00 | 6 | 414.00 | 4.5 | 1.18 | INV-34280100007 | 2021-02-15 | 2190 | Central City Liquor, Inc. | 1460 2ND AVE | Des Moines | 50314.0 | POINT (-93.619787 41.60566) | 77 | POLK | 1012000.0 | Imported Whiskies | 391.0 | HOTALING & CO | 987952 | Kavalan Concertmaster | 6 | 750 | 51.00 | 76.50 | 6 | 459.00 | 4.5 | 1.18 |
3 | INV-41510000010 | 2021-11-01 | 2190 | Central City Liquor, Inc. | 1460 2ND AVE | Des Moines | 50314.0 | POINT (-93.619787 41.60566) | 77 | POLK | 1012000.0 | Imported Whiskies | 391.0 | HOTALING & CO | 987952 | Kavalan Concertmaster | 6 | 750 | 51.00 | 76.50 | 6 | 459.00 | 4.5 | 1.18 | INV-34280100007 | 2021-02-15 | 2190 | Central City Liquor, Inc. | 1460 2ND AVE | Des Moines | 50314.0 | POINT (-93.619787 41.60566) | 77 | POLK | 1012000.0 | Imported Whiskies | 391.0 | HOTALING & CO | 987952 | Kavalan Concertmaster | 6 | 750 | 51.00 | 76.50 | 6 | 459.00 | 4.5 | 1.18 |
4 | INV-34280100008 | 2021-02-15 | 2190 | Central City Liquor, Inc. | 1460 2ND AVE | Des Moines | 50314.0 | POINT (-93.619787 41.60566) | 77 | POLK | 1012000.0 | Imported Whiskies | 391.0 | HOTALING & CO | 987955 | Kavalan King Car Conductor | 6 | 750 | 61.34 | 92.01 | 6 | 552.06 | 4.5 | 1.18 | INV-34280100007 | 2021-02-15 | 2190 | Central City Liquor, Inc. | 1460 2ND AVE | Des Moines | 50314.0 | POINT (-93.619787 41.60566) | 77 | POLK | 1012000.0 | Imported Whiskies | 391.0 | HOTALING & CO | 987952 | Kavalan Concertmaster | 6 | 750 | 51.00 | 76.50 | 6 | 459.00 | 4.5 | 1.18 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
95 | INV-40505700006 | 2021-09-29 | 6035 | Bootlegging Barzinis | 412 First Ave | Coralville | 52241.0 | POINT (-91.565517 41.672672) | 52 | JOHNSON | 1012000.0 | Imported Whiskies | 391 | HOTALING & CO | 987953 | Kavalan Ex-Bourbon | 6 | 750 | 81.84 | 122.76 | 6 | 736.56 | 4.5 | 1.18 | INV-34357200007 | 2021-02-16 | 6035 | Bootlegging Barzinis | 412 First Ave | Coralville | 52241.0 | POINT (-91.565517 41.672672) | 52 | JOHNSON | 1012000.0 | Imported Whiskies | 391.0 | HOTALING & CO | 987953 | Kavalan Ex-Bourbon | 6 | 750 | 81.84 | 122.76 | 6 | 736.56 | 4.5 | 1.18 |
96 | INV-40733900005 | 2021-10-06 | 3773 | Benz Distributing | 501 7th Ave SE | Cedar Rapids | 52401.0 | POINT (-91.659875 41.97574) | 57 | LINN | 1012000.0 | Imported Whiskies | 391.0 | HOTALING & CO | 987953 | Kavalan Ex-Bourbon | 6 | 750 | 81.84 | 122.76 | 6 | 736.56 | 4.5 | 1.18 | INV-34357200007 | 2021-02-16 | 6035 | Bootlegging Barzinis | 412 First Ave | Coralville | 52241.0 | POINT (-91.565517 41.672672) | 52 | JOHNSON | 1012000.0 | Imported Whiskies | 391.0 | HOTALING & CO | 987953 | Kavalan Ex-Bourbon | 6 | 750 | 81.84 | 122.76 | 6 | 736.56 | 4.5 | 1.18 |
97 | INV-34357200007 | 2021-02-16 | 6035 | Bootlegging Barzinis | 412 First Ave | Coralville | 52241.0 | POINT (-91.565517 41.672672) | 52 | JOHNSON | 1012000.0 | Imported Whiskies | 391.0 | HOTALING & CO | 987953 | Kavalan Ex-Bourbon | 6 | 750 | 81.84 | 122.76 | 6 | 736.56 | 4.5 | 1.18 | INV-34357200007 | 2021-02-16 | 6035 | Bootlegging Barzinis | 412 First Ave | Coralville | 52241.0 | POINT (-91.565517 41.672672) | 52 | JOHNSON | 1012000.0 | Imported Whiskies | 391.0 | HOTALING & CO | 987953 | Kavalan Ex-Bourbon | 6 | 750 | 81.84 | 122.76 | 6 | 736.56 | 4.5 | 1.18 |
98 | INV-37854400006 | 2021-06-28 | 2190 | Central City Liquor, Inc. | 1460 2ND AVE | Des Moines | 50314.0 | POINT (-93.619787 41.60566) | 77 | POLK | 1012000.0 | Imported Whiskies | 391 | HOTALING & CO | 987952 | Kavalan Concertmaster | 6 | 750 | 51.00 | 76.50 | 6 | 459.00 | 4.5 | 1.18 | INV-34357200007 | 2021-02-16 | 6035 | Bootlegging Barzinis | 412 First Ave | Coralville | 52241.0 | POINT (-91.565517 41.672672) | 52 | JOHNSON | 1012000.0 | Imported Whiskies | 391.0 | HOTALING & CO | 987953 | Kavalan Ex-Bourbon | 6 | 750 | 81.84 | 122.76 | 6 | 736.56 | 4.5 | 1.18 |
99 | INV-34357200006 | 2021-02-16 | 6035 | Bootlegging Barzinis | 412 First Ave | Coralville | 52241.0 | POINT (-91.565517 41.672672) | 52 | JOHNSON | 1012000.0 | Imported Whiskies | 391.0 | HOTALING & CO | 987951 | Kavalan Classic Single Malt Whiskey | 6 | 750 | 46.00 | 69.00 | 6 | 414.00 | 4.5 | 1.18 | INV-34357200007 | 2021-02-16 | 6035 | Bootlegging Barzinis | 412 First Ave | Coralville | 52241.0 | POINT (-91.565517 41.672672) | 52 | JOHNSON | 1012000.0 | Imported Whiskies | 391.0 | HOTALING & CO | 987953 | Kavalan Ex-Bourbon | 6 | 750 | 81.84 | 122.76 | 6 | 736.56 | 4.5 | 1.18 |
100 rows × 48 columns
So why would we want to do this? Well now we could compare each bottle to every other bottle. We could ask how many bottles cost more than this one.
%%bigquery --project pic-math
WITH t as(
SELECT *
FROM `bigquery-public-data.iowa_liquor_sales.sales`
WHERE category_name = 'Imported Whiskies')
SELECT table1.item_description, SUM(CASE WHEN table1.state_bottle_retail < table2.state_bottle_retail THEN 1 ELSE 0 END) as number_more_expensive
FROM t as table1 JOIN t as table2 ON table1.category_name = table2.category_name
GROUP BY table1.item_description
item_description | number_more_expensive | |
---|---|---|
0 | Kavalan Concertmaster | 15 |
1 | Kavalan Classic Single Malt Whiskey | 16 |
2 | Kavalan King Car Conductor | 6 |
3 | Kavalan Ex-Bourbon | 0 |
Wait, I thought I only had 10 bottles? What we see here is that I had really only 4 bottles. Let’s fix this! In the first table, I am including much more than I need and I should just group them by item_description there too
%%bigquery --project pic-math
SELECT item_description, MAX(state_bottle_retail) as state_bottle_retail, category_name
FROM `bigquery-public-data.iowa_liquor_sales.sales`
WHERE category_name = 'Imported Whiskies'
GROUP BY item_description, category_name
item_description | state_bottle_retail | category_name | |
---|---|---|---|
0 | Kavalan Concertmaster | 76.50 | Imported Whiskies |
1 | Kavalan King Car Conductor | 92.01 | Imported Whiskies |
2 | Kavalan Classic Single Malt Whiskey | 69.00 | Imported Whiskies |
3 | Kavalan Ex-Bourbon | 122.76 | Imported Whiskies |
Now if I get that back into my join, I should get the desired outcome. You should also ask yourself why category name was kept. It was needed for the cross join!
%%bigquery --project pic-math
WITH t as(
SELECT item_description, MAX(state_bottle_retail) as state_bottle_retail, category_name
FROM `bigquery-public-data.iowa_liquor_sales.sales`
WHERE category_name = 'Imported Whiskies'
GROUP BY item_description, category_name)
SELECT table1.item_description, SUM(CASE WHEN table1.state_bottle_retail < table2.state_bottle_retail THEN 1 ELSE 0 END) as number_more_expensive, table1.state_bottle_retail as price
FROM t as table1 JOIN t as table2 ON table1.category_name = table2.category_name
GROUP BY table1.item_description, table1.state_bottle_retail
item_description | number_more_expensive | price | |
---|---|---|---|
0 | Kavalan Concertmaster | 2 | 76.50 |
1 | Kavalan King Car Conductor | 1 | 92.01 |
2 | Kavalan Classic Single Malt Whiskey | 3 | 69.00 |
3 | Kavalan Ex-Bourbon | 0 | 122.76 |
A word of note here! The inner join preformed here will only join two datasets if the ON
matches. When it does not, the row will not be included in the join. Let’s see that in action!
%%bigquery --project pic-math
WITH iw as(
SELECT item_description, MAX(state_bottle_retail) as state_bottle_retail, category_name
FROM `bigquery-public-data.iowa_liquor_sales.sales`
WHERE category_name = 'Imported Whiskies'
GROUP BY item_description, category_name)
SELECT *
FROM iw JOIN
(SELECT item_description, MAX(state_bottle_retail) as state_bottle_retail, category_name
FROM `bigquery-public-data.iowa_liquor_sales.sales`
WHERE category_name = 'Coffee Liqueurs'
GROUP BY item_description, category_name) as cl
ON iw.category_name = cl.category_name
item_description | state_bottle_retail | category_name | item_description_1 | state_bottle_retail_1 | category_name_1 |
---|
Nothing was printed because nothing will match! All the data was lost! Now this example seems a little far fetched but often your tables may have missing or omitted data. It doesn’t mean that row should be lost because the column you were working with didn’t have a match! We examine this in the next section.
Outer Joins#
Let’s take the last example and do a LEFT OUTER JOIN
this will force the left table (Imported Whiskies) to be included in the table.
%%bigquery --project pic-math
WITH iw as(
SELECT item_description, MAX(state_bottle_retail) as state_bottle_retail, category_name
FROM `bigquery-public-data.iowa_liquor_sales.sales`
WHERE category_name = 'Imported Whiskies'
GROUP BY item_description, category_name)
SELECT *
FROM iw LEFT OUTER JOIN
(SELECT item_description, MAX(state_bottle_retail) as state_bottle_retail, category_name
FROM `bigquery-public-data.iowa_liquor_sales.sales`
WHERE category_name = 'Coffee Liqueurs'
GROUP BY item_description, category_name) as cl
ON iw.category_name = cl.category_name
item_description | state_bottle_retail | category_name | item_description_1 | state_bottle_retail_1 | category_name_1 | |
---|---|---|---|---|---|---|
0 | Kavalan Concertmaster | 76.50 | Imported Whiskies | None | None | None |
1 | Kavalan Classic Single Malt Whiskey | 69.00 | Imported Whiskies | None | None | None |
2 | Kavalan King Car Conductor | 92.01 | Imported Whiskies | None | None | None |
3 | Kavalan Ex-Bourbon | 122.76 | Imported Whiskies | None | None | None |
The Whiskies appear even though they never matched the coffee Liqueurs!
Can you guess what RIGHT OUTER JOIN
will do?
%%bigquery --project pic-math
WITH iw as(
SELECT item_description, MAX(state_bottle_retail) as state_bottle_retail, category_name
FROM `bigquery-public-data.iowa_liquor_sales.sales`
WHERE category_name = 'Imported Whiskies'
GROUP BY item_description, category_name)
SELECT *
FROM iw RIGHT OUTER JOIN
(SELECT item_description, MAX(state_bottle_retail) as state_bottle_retail, category_name
FROM `bigquery-public-data.iowa_liquor_sales.sales`
WHERE category_name = 'Coffee Liqueurs'
GROUP BY item_description, category_name) as cl
ON iw.category_name = cl.category_name
item_description | state_bottle_retail | category_name | item_description_1 | state_bottle_retail_1 | category_name_1 | |
---|---|---|---|---|---|---|
0 | None | None | None | Kahlua Coffee | 39.72 | Coffee Liqueurs |
1 | None | None | None | Kahlua Coffee Mini | 9.90 | Coffee Liqueurs |
2 | None | None | None | Kamora Coffee Liqueur | 18.09 | Coffee Liqueurs |
3 | None | None | None | Kahlua Coffee Liqueur | 37.49 | Coffee Liqueurs |
4 | None | None | None | Kahlua Salted Caramel | 19.49 | Coffee Liqueurs |
5 | None | None | None | Kapali Coffee Liqueur | 8.58 | Coffee Liqueurs |
6 | None | None | None | CCD Coffee Liqueur | 24.00 | Coffee Liqueurs |
7 | None | None | None | Iowa Coffee Company Liqueur | 13.50 | Coffee Liqueurs |
8 | None | None | None | Patron Xo Cafe | 21.75 | Coffee Liqueurs |
9 | None | None | None | Kahlua Vanilla | 19.49 | Coffee Liqueurs |
10 | None | None | None | Kahlua Coffee Liqueur Mini | 9.90 | Coffee Liqueurs |
11 | None | None | None | Caffe Lolita Coffee Liqueur | 7.50 | Coffee Liqueurs |
12 | None | None | None | Kahlua French Vanilla Liqueur DISCO | 18.74 | Coffee Liqueurs |
13 | None | None | None | Chila Coffee Liqueur | 7.46 | Coffee Liqueurs |
14 | None | None | None | Copa De Oro Mexican Coffee | 8.81 | Coffee Liqueurs |
15 | None | None | None | Kahlua Especial | 17.99 | Coffee Liqueurs |
16 | None | None | None | Patron XO Cafe | 21.00 | Coffee Liqueurs |
17 | None | None | None | Tia Maria Coffee Liqueur | 23.00 | Coffee Liqueurs |
18 | None | None | None | Carolans Cold Brew | 14.25 | Coffee Liqueurs |
19 | None | None | None | J. Rieger & Co. Caffe Amaro | 23.25 | Coffee Liqueurs |
20 | None | None | None | Mozart Chocolate Coffee Cream Liqueur | 22.31 | Coffee Liqueurs |
21 | None | None | None | Grind Espresso Shot | 13.50 | Coffee Liqueurs |
22 | None | None | None | Original Secret Family Recipe - A Coffee Liqueur | 27.00 | Coffee Liqueurs |
23 | None | None | None | Kahlua French Vanilla Liqueur | 17.99 | Coffee Liqueurs |
24 | None | None | None | Dr McGillicuddys Coffee | 12.99 | Coffee Liqueurs |
25 | None | None | None | Kahlua Peppermint Mocha | 17.99 | Coffee Liqueurs |
26 | None | None | None | Kahlua Chili Chocolate | 17.99 | Coffee Liqueurs |
27 | None | None | None | Conciere Coffee Liqueur | 4.49 | Coffee Liqueurs |
We guarantee all the right table’s entries are included!
Joining Other Tables#
Up to now, I have just been joining the same table together. What is really great about a relational database is the relations! Let’s try to get information from another table into ours! I am wondering if some counties drink more liquor than others. Well of course more populous counties will so let’s see if we can get the population added by zipcode. I’ll pull data from the 2010 census.
%%bigquery --project pic-math
SELECT *
FROM `bigquery-public-data.census_bureau_usa.population_by_zip_2010`
LIMIT 10
geo_id | zipcode | population | minimum_age | maximum_age | gender | |
---|---|---|---|---|---|---|
0 | 8600000US00601 | 601 | 18570 | None | None | None |
1 | 8600000US00602 | 602 | 41520 | None | None | None |
2 | 8600000US00603 | 603 | 54689 | None | None | None |
3 | 8600000US00606 | 606 | 6615 | None | None | None |
4 | 8600000US00610 | 610 | 29016 | None | None | None |
5 | 8600000US00612 | 612 | 67010 | None | None | None |
6 | 8600000US00616 | 616 | 11017 | None | None | None |
7 | 8600000US00617 | 617 | 24597 | None | None | None |
8 | 8600000US00622 | 622 | 7853 | None | None | None |
9 | 8600000US00623 | 623 | 43061 | None | None | None |
So I want to add population to a row but keep any sales even if the population does not work. You should note that the census breaks down population by age and gender. I just want the total and that happens when those three categories are NULL
so I ask that.
%%bigquery --project pic-math
SELECT liquor.city,liquor.zip_code,liquor.category_name,liquor.bottles_sold,zip.population
FROM `bigquery-public-data.iowa_liquor_sales.sales` as liquor LEFT OUTER JOIN `bigquery-public-data.census_bureau_usa.population_by_zip_2010` as zip
ON liquor.zip_code = zip.zipcode
WHERE zip.gender IS NULL AND zip.minimum_age IS NULL AND zip.maximum_age IS NULL
LIMIT 10
city | zip_code | category_name | bottles_sold | population | |
---|---|---|---|---|---|
0 | Cedar Rapids | 52401.0 | Imported Distilled Spirit Specialty | 72 | NaN |
1 | LEON | 50144 | WATERMELON SCHNAPPS | 4 | 2901.0 |
2 | CEDAR RAPIDS | 52401 | DISTILLED SPIRITS SPECIALTY | 48 | 2017.0 |
3 | KEOKUK | 52632 | ROOT BEER SCHNAPPS | 48 | 13086.0 |
4 | COUNCIL BLUFFS | 51503 | PEACH SCHNAPPS | 7 | 36376.0 |
5 | Des Moines | 50314.0 | Triple Sec | 180 | NaN |
6 | Johnston | 50131.0 | Neutral Grain Spirits Flavored | 16 | NaN |
7 | CEDAR RAPIDS | 52402 | TRIPLE SEC | 240 | 40149.0 |
8 | CEDAR RAPIDS | 52402 | IMPORTED DRY GINS | 72 | 40149.0 |
9 | DAVENPORT | 52802 | 100 PROOF VODKA | 60 | 10868.0 |
There are still some issues here! Newton has a zip code but it is being interpreted as a float then converted to a string. This would require some data munging that we have yet to discuss! In any case this gives us a nice table where we could ask more complicated questions!
Your Turn#
Examine the average trip on Austin bike share grouping by property_type column from the bikeshare_stations table. Is average time on trips starting from ‘parkland’ longer than from ‘sidewalks’? To answer this question you must join the bikeshare_trips and bikeshare_stations on the start_station_name and the name from their respective tables.
The below table is from bikeshare_stations
%%bigquery --project pic-math
SELECT name, property_type
FROM `bigquery-public-data.austin_bikeshare.bikeshare_stations`
LIMIT 50
name | property_type | |
---|---|---|
0 | Rainey @ River St | None |
1 | Toomey Rd @ South Lamar | None |
2 | East 7th & Pleasant Valley | None |
3 | Pease Park | None |
4 | OFFICE/Main/Shop/Repair | None |
5 | 6th & Navasota St. | None |
6 | 8th & Guadalupe | None |
7 | Red River & LBJ Library | None |
8 | State Parking Garage @ Brazos & 18th | None |
9 | Zilker Park West | None |
10 | Lavaca & 6th | None |
11 | Nueces @ 3rd | None |
12 | Republic Square | None |
13 | Waller & 6th St. | None |
14 | Bullock Museum @ Congress & MLK | None |
15 | State Capitol @ 14th & Colorado | None |
16 | ACC - Rio Grande & 12th | None |
17 | ACC - West & 12th Street | None |
18 | Republic Square @ Guadalupe & 4th St. | None |
19 | 5th & San Marcos | None |
20 | Barton Springs & Riverside | |
21 | Boardwalk West | parkland |
22 | Congress & Cesar Chavez | parkland |
23 | Barton Springs Pool | parkland |
24 | Riverside @ S. Lamar | parkland |
25 | Zilker Park | parkland |
26 | Rainey St @ Cummings | parkland |
27 | 13th & Trinity | sidewalk |
28 | 22nd 1/2 & Rio Grande | sidewalk |
29 | East 5th/Shady @ Eastside Bus Plaza | sidewalk |
30 | South Congress & Academy | sidewalk |
31 | 8th & Congress | sidewalk |
32 | Capitol Station / Congress & 11th | sidewalk |
33 | Lavaca & 6th | sidewalk |
34 | 8th & Lavaca | sidewalk |
35 | South Congress & James | sidewalk |
36 | Barton Springs @ Kinney Ave | sidewalk |
37 | Trinity & 6th Street | sidewalk |
38 | 21st & Speedway @PCL | sidewalk |
39 | 23rd & San Jacinto @ DKR Stadium | sidewalk |
40 | 6th & Congress | sidewalk |
41 | 11th & San Jacinto | sidewalk |
42 | Convention Center / 3rd & Trinity | sidewalk |
43 | Long Center @ South 1st & Riverside | sidewalk |
44 | State Capitol Visitors Garage @ San Jacinto & ... | sidewalk |
45 | Lake Austin Blvd @ Deep Eddy | sidewalk |
46 | Plaza Saltillo | sidewalk |
47 | Red River & 8th Street | sidewalk |
48 | Republic Square @ 5th & Guadalupe | sidewalk |
49 | South Congress & Barton Springs at the Austin ... | sidewalk |