- Explore pandemic data with SQL
- Getting started with the data
- Database programming
- Find peak cases
In the last section, we populated our us_counties
table and used it for some queries. Now it’s time to create some views of the data. With views we’ll be able to do more interesting things without retyping huge queries. We can use views as a foundation for more analysis.
The first view we need is the one to give us back the 7-day moving averages along with the raw data. We’ll call it us_counties_avg7
. It looks like this.
CREATE OR REPLACE VIEW us_counties_avg7 AS
SELECT date, county, state, fips,
cases,
AVG(cases) OVER (
PARTITION BY state, county
ORDER BY date
ROWS BETWEEN 6 PRECEDING
AND 0 FOLLOWING) AS cases7,
deaths,
AVG(deaths) OVER (
PARTITION BY state, county
ORDER BY date
ROWS BETWEEN 6 PRECEDING
AND 0 FOLLOWING) AS deaths7
FROM us_counties
WHERE county <> 'Unknown'
AND cases >= 100
AND deaths >= 10;
We exclude cases assigned to an Unknown county; those cases eventually are reassigned to an actual county by each state’s medical statisticians. We also exclude entries with small numbers of cases or deaths; this helps speed up the queries.
Now doing casual queries is easier. We can, for example, use the new view to get the 7-day moving average data for Los Angeles:
SELECT *
FROM us_counties_avg7
WHERE date >= '2020-11-01'
AND county = 'Los Angeles'
AND state = 'California'
ORDER BY date;
Even better, we can create another view using the view we just created to give us the daily new case loads from the 7-day averages. Let’s call the new view us_counties_new
.
CREATE OR REPLACE VIEW us_counties_new AS
SELECT today.date, today.state, today.county, today.fips,
today.cases,
today.cases - yesterday.cases newcases,
today.cases7 - yesterday.cases7 newcases7,
today.deaths,
today.deaths - yesterday.deaths newdeaths,
today.deaths7 - yesterday.deaths7 newdeaths7
FROM us_counties_avg7 today
JOIN us_counties_avg7 yesterday
ON today.date = yesterday.date + INTERVAL 1 DAY
AND today.state = yesterday.state
AND today.county = yesterday.county
That makes it easy to get the daily caseloads like this. Again, they’re horrendous.
SELECT date, DAYNAME(date) weekday,
state, county, fips,
newcases7, newdeaths7
FROM us_counties_new
WHERE date >= '2020-11-01'
AND county = 'Los Angeles'
AND state = 'California'
ORDER BY date;