Find peak cases

This entry is part 4 of 4 in the series Exploring data with SQL

Back in the Database Programming section of this series, we created the view named us_counties_new, showing the daily new case load. Recall that we first computed the seven-day running average of the total case loads in the us_counties_avg7 view, then used those numbers for the daily case loads. Having the daily case loads available lets us do some interesting analysis.

We can use those numbers to discover the peak case load in each county. We do that with a simple GROUP BY query, like this one.

SELECT MAX(newcases7) peakcases, 
       MAX(newdeaths7) peakdeaths,
       state, county, fips 
  FROM us_counties_new
 GROUP BY state, county, fips;

Once we have the peak case load for each county, we can use it to retrieve the date of the peak with a JOIN query like the one in this view.

CREATE OR REPLACE VIEW us_counties_peakcases AS
SELECT DATEDIFF(CURDATE(), date) daysago,
       detail.date, 
       detail.state, detail.county, detail.fips,
       detail.newcases7, detail.newdeaths7
  FROM us_counties_new detail
  JOIN (
        SELECT MAX(newcases7) peakcases7,
               MAX(newdeaths7) peakdeaths7, 
               state, county, fips 
          FROM us_counties_new
         GROUP BY state, county, fips
        ) peak
            ON detail.state = peak.state
           AND detail.county = peak.county
           AND detail.newcases7 = peak.peakcases7;

This query handles a lot of number crunching and can be slow. So, let’s put it into a view so we can easily add WHERE filtering. For example, we can find places where the peak case load occurred within the past three days like this:

SELECT * 
  FROM us_counties_peakcases
 WHERE daysago <= 10
 ORDER BY daysago
Series Navigation<< Database programming

Leave a Comment