Database programming

This entry is part 3 of 4 in the series Exploring data with SQL

In the last section, we populated our us_counties table and used it for some queries. Now it’s time to create some views of the data. With views we’ll be able to do more interesting things without retyping huge queries. We can use views as a foundation for more analysis.

The first view we need is the one to give us back the 7-day moving averages along with the raw data. We’ll call it us_counties_avg7. It looks like this.

CREATE OR REPLACE VIEW us_counties_avg7 AS
SELECT date, county, state, fips,
       cases, 
       AVG(cases) OVER (
          PARTITION BY state, county 
          ORDER BY date
          ROWS BETWEEN 6 PRECEDING
                   AND 0 FOLLOWING) AS cases7,
       deaths,
       AVG(deaths) OVER (
          PARTITION BY state, county 
          ORDER BY date
          ROWS BETWEEN 6 PRECEDING
                   AND 0 FOLLOWING) AS deaths7
  FROM us_counties
 WHERE county <> 'Unknown'
   AND cases >= 100
   AND deaths >= 10;

We exclude cases assigned to an Unknown county; those cases eventually are reassigned to an actual county by each state’s medical statisticians. We also exclude entries with small numbers of cases or deaths; this helps speed up the queries.

Now doing casual queries is easier. We can, for example, use the new view to get the 7-day moving average data for Los Angeles:

SELECT * 
  FROM us_counties_avg7 
 WHERE date >= '2020-11-01'
   AND county = 'Los Angeles'
   AND state = 'California'
 ORDER BY date;

Even better, we can create another view using the view we just created to give us the daily new case loads from the 7-day averages. Let’s call the new view us_counties_new.

CREATE OR REPLACE VIEW us_counties_new AS
SELECT today.date, today.state, today.county, today.fips, 
       today.cases,
       today.cases - yesterday.cases newcases,
       today.cases7 - yesterday.cases7 newcases7,
       today.deaths,
       today.deaths - yesterday.deaths newdeaths,
       today.deaths7 - yesterday.deaths7 newdeaths7
  FROM us_counties_avg7 today
  JOIN us_counties_avg7 yesterday
         ON today.date = yesterday.date + INTERVAL 1 DAY
        AND today.state = yesterday.state
        AND today.county = yesterday.county 

That makes it easy to get the daily caseloads like this. Again, they’re horrendous.

SELECT date, DAYNAME(date) weekday, 
       state, county, fips,
       newcases7, newdeaths7 
  FROM us_counties_new 
 WHERE date >= '2020-11-01'
   AND county = 'Los Angeles'
   AND state = 'California'
 ORDER BY date;
Series Navigation<< Getting started with the dataFind peak cases >>

Leave a Comment