- Explore pandemic data with SQL
- Getting started with the data
- Database programming
- Find peak cases
In the last section, we populated our us_counties
table and used it for some queries. Now it’s time to create some views of the data. With views we’ll be able to do more interesting things without retyping huge queries. We can use views as a foundation for more analysis.
The first view we need is the one to give us back the 7-day moving averages along with the raw data. We’ll call it us_counties_avg7
. It looks like this.
CREATE OR REPLACE VIEW us_counties_avg7 AS SELECT date, county, state, fips, cases, AVG(cases) OVER ( PARTITION BY state, county ORDER BY date ROWS BETWEEN 6 PRECEDING AND 0 FOLLOWING) AS cases7, deaths, AVG(deaths) OVER ( PARTITION BY state, county ORDER BY date ROWS BETWEEN 6 PRECEDING AND 0 FOLLOWING) AS deaths7 FROM us_counties WHERE county <> 'Unknown' AND cases >= 100 AND deaths >= 10;
We exclude cases assigned to an Unknown county; those cases eventually are reassigned to an actual county by each state’s medical statisticians. We also exclude entries with small numbers of cases or deaths; this helps speed up the queries.
Now doing casual queries is easier. We can, for example, use the new view to get the 7-day moving average data for Los Angeles:
SELECT * FROM us_counties_avg7 WHERE date >= '2020-11-01' AND county = 'Los Angeles' AND state = 'California' ORDER BY date;
Even better, we can create another view using the view we just created to give us the daily new case loads from the 7-day averages. Let’s call the new view us_counties_new
.
CREATE OR REPLACE VIEW us_counties_new AS SELECT today.date, today.state, today.county, today.fips, today.cases, today.cases - yesterday.cases newcases, today.cases7 - yesterday.cases7 newcases7, today.deaths, today.deaths - yesterday.deaths newdeaths, today.deaths7 - yesterday.deaths7 newdeaths7 FROM us_counties_avg7 today JOIN us_counties_avg7 yesterday ON today.date = yesterday.date + INTERVAL 1 DAY AND today.state = yesterday.state AND today.county = yesterday.county
That makes it easy to get the daily caseloads like this. Again, they’re horrendous.
SELECT date, DAYNAME(date) weekday, state, county, fips, newcases7, newdeaths7 FROM us_counties_new WHERE date >= '2020-11-01' AND county = 'Los Angeles' AND state = 'California' ORDER BY date;