SQL Essentials

Open In Colab

SQL Essentials#

from google.colab import auth
auth.authenticate_user()
print('Authenticated')
Authenticated

Let’s start with the basics. I’ll continue to work with the liquor store data.

%%bigquery --project pic-math
SELECT *
FROM `bigquery-public-data.iowa_liquor_sales.sales`
LIMIT 5
invoice_and_item_number date store_number store_name address city zip_code store_location county_number county ... item_number item_description pack bottle_volume_ml state_bottle_cost state_bottle_retail bottles_sold sale_dollars volume_sold_liters volume_sold_gallons
0 INV-23344200024 2019-11-19 4452 SELECT MART GORDON DR 2825 GORDON DR SIOUX CITY 51105 POINT(-96.372014 42.489632) 97 WOODBURY ... 100510 RUMCHATA MINICHATAS HOLIDAY MUG 12 25 5.00 7.50 1 7.50 0.02 0.00
1 INV-22726700022 2019-10-22 5504 NEIGHBORHOOD MART 2102 LAFAYETTE ST WATERLOO 50703 POINT(-92.313923 42.489893) 7 BLACK HAWK ... 100794 RUMCHATA MINICHATAS SUMMER TUMBLER VAP 12 25 5.00 7.50 1 7.50 0.02 0.00
2 INV-40707200098 2021-10-05 5423 STAMMER LIQUOR CORP 615 2ND AVE SHELDON 51201.0 None 71 O'BRIEN ... 65259 JAGERMEISTER LIQUEUR MINI MEISTERS 12 20 4.93 7.40 1 7.40 0.02 0.00
3 INV-61319400030 2023-08-16 5822 CASEY'S GENERAL STORE #2836 / MONROE 509 S MONROE ST MONROE 50170.0 POINT(-93.10249604 41.51755801) None JASPER ... 65259 JAGERMEISTER LIQUEUR MINI MEISTERS 12 20 5.63 8.45 1 8.45 0.02 0.00
4 INV-33181800087 2021-01-04 2614 HY-VEE #3 FOOD & DRUGSTORE / DAVENPORT 1823 E KIMBERLY RD DAVENPORT 52807.0 POINT(-90.548919 41.556781) 82 SCOTT ... 100413 FIREBALL CINNAMON WHISKEY PARTY BUCKET 1 50 51.60 77.40 1 77.40 0.05 0.01

5 rows × 24 columns

Why did I run that command? Well it gives me an idea of what is in the table to reference and think about what questions I might ask! Let’s see how many gallons of liqour have been sold.

%%bigquery --project pic-math
SELECT SUM(volume_sold_gallons) as Total_Gallons_of_Liquor
FROM `bigquery-public-data.iowa_liquor_sales.sales`
Total_Gallons_of_Liquor
0 68071897.63

Let’s do a small conversion just to see what that means. An olympic pool holds 660 000 gallons so

(5.560763*10**7)/660000
84.25398484848485

About 84 swimming pools of liquor in Iowa! Fun times…

Let’s make it more complicated. Let’s see what the dollars per gallon is on the full dataset.

%%bigquery --project pic-math
SELECT SUM(sale_dollars)/SUM(volume_sold_gallons) as Total_Dollars_Per_Gallon
FROM `bigquery-public-data.iowa_liquor_sales.sales`
Total_Dollars_Per_Gallon
0 59.720321

Okay not terribly interesting. Let’s take that quesiton and add too it. Let’s create a column that is dollars per gallon of liquor.

%%bigquery --project pic-math
SELECT sale_dollars/volume_sold_gallons as Dollars_Per_Gallon
FROM `bigquery-public-data.iowa_liquor_sales.sales`
WHERE volume_sold_gallons != 0
LIMIT 5
Dollars_Per_Gallon
0 7740.0
1 7740.0
2 7740.0
3 7740.0
4 7740.0

I have added a few new things here. The WHERE clause allows me to restrict what I consider. You can combine several of the statements logically

%%bigquery --project pic-math
SELECT sale_dollars/volume_sold_gallons as Dollars_Per_Gallon, item_description
FROM `bigquery-public-data.iowa_liquor_sales.sales`
WHERE volume_sold_gallons != 0 and category_name = 'COFFEE LIQUEURS'
Dollars_Per_Gallon item_description
0 792.000000 KAHLUA COFFEE MINI
1 792.000000 KAHLUA COFFEE MINI
2 792.000000 KAHLUA COFFEE MINI
3 112.333333 KAHLUA COFFEE
4 108.222222 KAHLUA COFFEE
... ... ...
177721 86.500000 KAHLUA COFFEE LIQUEUR
177722 86.500000 KAHLUA COFFEE LIQUEUR
177723 86.500000 KAHLUA COFFEE LIQUEUR
177724 86.500000 KAHLUA COFFEE LIQUEUR
177725 86.500000 KAHLUA COFFEE LIQUEUR

177726 rows × 2 columns

I added the item_description so that I could see which ones were different. It is not utilized the the analysis yet. Let’s include it by getting the average price of coffee liqueurs based on the description. To do this I’ll add the GROUP BY command

%%bigquery --project pic-math
SELECT AVG(sale_dollars/volume_sold_gallons) as Average_Dollars_Per_Gallon, item_description
FROM `bigquery-public-data.iowa_liquor_sales.sales`
WHERE volume_sold_gallons != 0 and category_name = 'COFFEE LIQUEURS'
GROUP BY item_description
Average_Dollars_Per_Gallon item_description
0 98.042729 KAHLUA COFFEE
1 79.648073 KAHLUA COFFEE LIQUEUR MINI
2 93.760237 KAHLUA ESPECIAL
3 89.804451 KAHLUA COFFEE LIQUEUR
4 38.196790 CHILA COFFEE LIQUEUR
5 63.064841 KARIMBA COFFEE LIQUEUR
6 16.996845 CONCIERE COFFEE LIQUEUR
7 99.494591 KAHLUA VANILLA
8 70.697000 TEQUILA ROSE W/2 SHOT GLASSES HA
9 48.306810 KAMORA COFFEE LIQUEUR
10 110.811697 IRISHMAN IRISH COFFEE
11 109.122690 PATRON XO CAFE
12 154.526005 IOWA COFFEE COMPANY LIQUEUR
13 35.378913 SABROSO
14 88.089693 KAHLUA MOCHA LIQUEUR
15 99.575605 KAHLUA MIDNIGHT
16 134.012833 MR. BLACK COLD BREW COFFEE LIQUEUR
17 126.801440 LUXARDO ESPRESSO ITALIAN COFFEE LIQUEUR
18 68.273560 GRIND ESPRESSO SHOT
19 90.549048 KAHLUA FRENCH VANILLA LIQUEUR DISCO
20 114.019258 CANTERA NEGRA CAFE
21 96.567398 KAHLUA SALTED CARAMEL
22 42.167376 COPA DE ORO MEXICAN COFFEE
23 114.796784 MOZART CHOCOLATE COFFEE CREAM LIQUEUR
24 72.934872 CAROLANS COLD BREW
25 90.036168 AVION ESPRESSO
26 138.051271 ORIGINAL SECRET FAMILY RECIPE - A COFFEE LIQUEUR
27 247.964566 CCD COFFEE LIQUEUR
28 1136.210526 KAHLUA CHILI CHOCOLATE
29 830.290931 KAHLUA COFFEE MINI
30 103.666408 J. RIEGER & CO. CAFFE AMARO
31 98.117663 TIA MARIA COFFEE LIQUEUR
32 109.663866 PATRON XO CAFE DARK COCOA
33 181.670886 JUMPING GOAT COLD BREWED COFFEE LIQUEUR WITH N...
34 974.160000 KAMORA COFFEE LIQUEUR MINI
35 38.087056 CAFFE LOLITA COFFEE LIQUEUR
36 88.947355 KAHLUA FRENCH VANILLA LIQUEUR
37 112.367730 CAZADORES CAFE
38 41.799355 KAPALI COFFEE LIQUEUR
39 163.065954 PATRON XO CAFE MINI
40 90.327941 KAHLUA PEPPERMINT MOCHA
41 102.377486 XO CAFE DARK COCOA
42 65.656795 DR MCGILLICUDDYS COFFEE
43 36.908517 HIRAM WALKER COFFEE BRANDY
44 181.670886 JUMPING GOAT COLD BREWED COFFEE LIQUEUR BLACK ...

Okay but we pointed out earlier that there is no order… Let’s force an order on it

%%bigquery --project pic-math
SELECT AVG(sale_dollars/volume_sold_gallons) as Average_Dollars_Per_Gallon, item_description
FROM `bigquery-public-data.iowa_liquor_sales.sales`
WHERE volume_sold_gallons != 0 and category_name = 'COFFEE LIQUEURS'
GROUP BY item_description
ORDER BY Average_Dollars_Per_Gallon DESC
Average_Dollars_Per_Gallon item_description
0 1136.210526 KAHLUA CHILI CHOCOLATE
1 974.160000 KAMORA COFFEE LIQUEUR MINI
2 830.290931 KAHLUA COFFEE MINI
3 247.964566 CCD COFFEE LIQUEUR
4 181.670886 JUMPING GOAT COLD BREWED COFFEE LIQUEUR WITH N...
5 181.670886 JUMPING GOAT COLD BREWED COFFEE LIQUEUR BLACK ...
6 163.065954 PATRON XO CAFE MINI
7 154.526005 IOWA COFFEE COMPANY LIQUEUR
8 138.051271 ORIGINAL SECRET FAMILY RECIPE - A COFFEE LIQUEUR
9 134.012833 MR. BLACK COLD BREW COFFEE LIQUEUR
10 126.801440 LUXARDO ESPRESSO ITALIAN COFFEE LIQUEUR
11 114.796784 MOZART CHOCOLATE COFFEE CREAM LIQUEUR
12 114.019258 CANTERA NEGRA CAFE
13 112.367730 CAZADORES CAFE
14 110.811697 IRISHMAN IRISH COFFEE
15 109.663866 PATRON XO CAFE DARK COCOA
16 109.122690 PATRON XO CAFE
17 103.666408 J. RIEGER & CO. CAFFE AMARO
18 102.377486 XO CAFE DARK COCOA
19 99.575605 KAHLUA MIDNIGHT
20 99.494591 KAHLUA VANILLA
21 98.117663 TIA MARIA COFFEE LIQUEUR
22 98.042729 KAHLUA COFFEE
23 96.567398 KAHLUA SALTED CARAMEL
24 93.760237 KAHLUA ESPECIAL
25 90.549048 KAHLUA FRENCH VANILLA LIQUEUR DISCO
26 90.327941 KAHLUA PEPPERMINT MOCHA
27 90.036168 AVION ESPRESSO
28 89.804451 KAHLUA COFFEE LIQUEUR
29 88.947355 KAHLUA FRENCH VANILLA LIQUEUR
30 88.089693 KAHLUA MOCHA LIQUEUR
31 79.648073 KAHLUA COFFEE LIQUEUR MINI
32 72.934872 CAROLANS COLD BREW
33 70.697000 TEQUILA ROSE W/2 SHOT GLASSES HA
34 68.273560 GRIND ESPRESSO SHOT
35 65.656795 DR MCGILLICUDDYS COFFEE
36 63.064841 KARIMBA COFFEE LIQUEUR
37 48.306810 KAMORA COFFEE LIQUEUR
38 42.167376 COPA DE ORO MEXICAN COFFEE
39 41.799355 KAPALI COFFEE LIQUEUR
40 38.196790 CHILA COFFEE LIQUEUR
41 38.087056 CAFFE LOLITA COFFEE LIQUEUR
42 36.908517 HIRAM WALKER COFFEE BRANDY
43 35.378913 SABROSO
44 16.996845 CONCIERE COFFEE LIQUEUR

Let’s keep going down the rabbit hole here. What if we want to rank them? There are lots of ways ROW_NUMBER, RANK and DENSE_RANK. I find them difficult to use because they require lots of other inputs.

The general call is something like

ROW_NUMBER() OVER(PARTITION BY ________ ORDER BY _________)

Partition is like grouping. I’ll add another liquor to use it

%%bigquery --project pic-math
SELECT
  AVG(sale_dollars/volume_sold_gallons) as Average_Dollars_Per_Gallon,
  item_description,
  ROW_NUMBER() OVER(
    PARTITION BY category_name
    ORDER BY item_description) row_num
FROM `bigquery-public-data.iowa_liquor_sales.sales`
WHERE volume_sold_gallons != 0 and (category_name = 'COFFEE LIQUEURS')
GROUP BY item_description, category_name
ORDER BY Average_Dollars_Per_Gallon DESC
Average_Dollars_Per_Gallon item_description row_num
0 1136.210526 KAHLUA CHILI CHOCOLATE 18
1 974.160000 KAMORA COFFEE LIQUEUR MINI 32
2 830.290931 KAHLUA COFFEE MINI 22
3 247.964566 CCD COFFEE LIQUEUR 6
4 181.670886 JUMPING GOAT COLD BREWED COFFEE LIQUEUR BLACK ... 16
5 181.670886 JUMPING GOAT COLD BREWED COFFEE LIQUEUR WITH N... 17
6 163.065954 PATRON XO CAFE MINI 41
7 154.526005 IOWA COFFEE COMPANY LIQUEUR 13
8 138.051271 ORIGINAL SECRET FAMILY RECIPE - A COFFEE LIQUEUR 38
9 134.012833 MR. BLACK COLD BREW COFFEE LIQUEUR 37
10 126.801440 LUXARDO ESPRESSO ITALIAN COFFEE LIQUEUR 35
11 114.796784 MOZART CHOCOLATE COFFEE CREAM LIQUEUR 36
12 114.019258 CANTERA NEGRA CAFE 3
13 112.367730 CAZADORES CAFE 5
14 110.811697 IRISHMAN IRISH COFFEE 14
15 109.663866 PATRON XO CAFE DARK COCOA 40
16 109.122690 PATRON XO CAFE 39
17 103.666408 J. RIEGER & CO. CAFFE AMARO 15
18 102.377486 XO CAFE DARK COCOA 45
19 99.575605 KAHLUA MIDNIGHT 26
20 99.494591 KAHLUA VANILLA 30
21 98.117663 TIA MARIA COFFEE LIQUEUR 44
22 98.042729 KAHLUA COFFEE 19
23 96.567398 KAHLUA SALTED CARAMEL 29
24 93.760237 KAHLUA ESPECIAL 23
25 90.549048 KAHLUA FRENCH VANILLA LIQUEUR DISCO 25
26 90.327941 KAHLUA PEPPERMINT MOCHA 28
27 90.036168 AVION ESPRESSO 1
28 89.804451 KAHLUA COFFEE LIQUEUR 20
29 88.947355 KAHLUA FRENCH VANILLA LIQUEUR 24
30 88.089693 KAHLUA MOCHA LIQUEUR 27
31 79.648073 KAHLUA COFFEE LIQUEUR MINI 21
32 72.934872 CAROLANS COLD BREW 4
33 70.697000 TEQUILA ROSE W/2 SHOT GLASSES HA 43
34 68.273560 GRIND ESPRESSO SHOT 11
35 65.656795 DR MCGILLICUDDYS COFFEE 10
36 63.064841 KARIMBA COFFEE LIQUEUR 34
37 48.306810 KAMORA COFFEE LIQUEUR 31
38 42.167376 COPA DE ORO MEXICAN COFFEE 9
39 41.799355 KAPALI COFFEE LIQUEUR 33
40 38.196790 CHILA COFFEE LIQUEUR 7
41 38.087056 CAFFE LOLITA COFFEE LIQUEUR 2
42 36.908517 HIRAM WALKER COFFEE BRANDY 12
43 35.378913 SABROSO 42
44 16.996845 CONCIERE COFFEE LIQUEUR 8

You should notice that the ROW_NUMBER didn’t do what we needed. You will not be able to do the row nor rank on the column we created because it is not yet available to the SQL call. This leads to sub-processees. Let’s show one today and come back to it next class.

%%bigquery --project pic-math
WITH t as(
SELECT
  AVG(sale_dollars/volume_sold_gallons) as Average_Dollars_Per_Gallon,
  item_description, category_name
FROM `bigquery-public-data.iowa_liquor_sales.sales`
WHERE volume_sold_gallons != 0 and (category_name = 'COFFEE LIQUEURS' or category_name = 'IMPORTED VODKAS')
GROUP BY item_description, category_name
ORDER BY Average_Dollars_Per_Gallon DESC
)

SELECT *,
    RANK() OVER(
    PARTITION BY category_name
    ORDER BY Average_Dollars_Per_Gallon) rk_num
FROM t
ORDER BY rk_num
Average_Dollars_Per_Gallon item_description category_name rk_num
0 16.996845 CONCIERE COFFEE LIQUEUR COFFEE LIQUEURS 1
1 16.573795 PURUS ORGANIC VODKA MINI IMPORTED VODKAS 1
2 35.378913 SABROSO COFFEE LIQUEURS 2
3 23.316243 SOOH RELSKA 80 PROOF VODKA IMPORTED VODKAS 2
4 36.908517 HIRAM WALKER COFFEE BRANDY COFFEE LIQUEURS 3
... ... ... ... ...
346 2939.445000 GREY GOOSE ESSENCES STRAWBERRY & LEMONGRASS MINI IMPORTED VODKAS 302
347 2945.650394 GREY GOOSE ESSENCES WATERMELON & BASIL MINI IMPORTED VODKAS 303
348 9000.000000 OUTERSPACE VODKA MINI IMPORTED VODKAS 304
349 16996.723367 E.T.51 PREMIUM VODKA MINI IMPORTED VODKAS 305
350 17136.000000 E.T. 51 PREMIUM VODKA MINI IMPORTED VODKAS 306

351 rows × 4 columns

We will come back to this but a nice taste of some of the really powerful aspects of SQL!

Dealing With Strings#

I wanted to find why ‘vodka’ wasn’t in the table. Well it was but with other qualifiers. I needed to use a string command LIKE I also looked at CONTAINS but didn’t get it to work.

%%bigquery --project pic-math
SELECT category_name
FROM `bigquery-public-data.iowa_liquor_sales.sales`
WHERE category_name LIKE '%VODKA%'
GROUP BY category_name
category_name
0 VODKA FLAVORED
1 IMPORTED VODKAS
2 IMPORTED FLAVORED VODKA
3 VODKA 80 PROOF
4 100 PROOF VODKA
5 IMPORTED VODKA - MISC
6 OTHER PROOF VODKA
7 AMERICAN FLAVORED VODKA
8 AMERICAN VODKAS
9 IMPORTED VODKA - CHERRY
10 LOW PROOF VODKA

Your Turn#

Assignement for today

  1. Start a notebook getting BigQuery to work. Feel free to use the authentication atop.

  2. Navigate to the dataset ‘austin_bikeshare.bikeshare_trips’

  3. Compute the average time for a trip based on starting point

  4. Compute how many trips start at each starting point.