A version of this article specific to the Oracle DBMS is here.
It’s often helpful to use SQL to group information by periods of time. For example, we might like to examine sales data. For example, we might have a table of individual sales transactions like so.
Sales:
sales_id int
sales_time datetime
net decimal(7,2)
tax decimal(7,2)
gross decimal(7,2)
Each time we make a sale, a new row is inserted into this table. This kind of information is called a time series: each row data has its own DATETIME time stamp. Notice that we could, in MySQL, give each row a TIMESTAMP value instead. TIMESTAMPs are limited in range, however. They don’t work for dates before January 1, 1970. So, for many applications DATETIME values are preferable.
What can we do to report on this table? Here are several choices.
Generate a detail report
A detail report typically shows one row for each item in the raw data table. Here’s a query to do that.
SELECT sales_time, gross FROM sales ORDER BY sales_time
This is trivial, but it illustrates an important concept: it’s possible to order by DATETIME values. You may notice that every query in this article has an ORDER BY clause; that’s because SQL doesn’t have to present its results in any given order unless it has an ORDER BY clause.
We can also generate a detail report showing all yesterday’s sales, using this query. The trick here is a WHERE clause restricting the range of sales_time values.
SELECT sales_time, gross FROM sales WHERE sales_time >= CURRENT_DATE() - INTERVAL 1 DAY AND sales_time < CURRENT_DATE() ORDER BY sales_time
This query illustrates something important about using DATETIME ranges in WHERE clauses. When selecting a whole day’s worth of information, we want to include everything starting from midnight on the day in question (that is, “CURRENT_DATE() – INTERVAL 1 DAY” meaning yesterday). We want our range of dates to run up to but not including midnight on the day after the day in question. A common mistake is to use a WHERE / BETWEEN clause like this one.
WHERE sales_time BETWEEN CURRENT_DATE() - INTERVAL 1 DAY AND CURRENT_DATE() /*wrong!*/
This clause includes sales_time values equal to midnight on the current day, which is not desired.
Another common mistake is to write the WHERE clause using a DATE function like this.
WHERE DATE(sales_time) = CURRENT_DATE() - INTERVAL 1 DAY /* inefficient!*/
This yields correct results, but is inefficient.
Summarize by day
Now that we understand our basic detail data, we can generate all sorts of useful summary (or aggregate) reports. First, we can report gross daily sales. To do this, we need to be able to summarize our sales table by date. Here’s the SQL query to do that:
SELECT DATE(sales_time) AS sales_date, SUM(gross) AS total, COUNT(*) AS transactions FROM sales GROUP BY DATE(sales_time) ORDER BY DATE(sales_time)
This uses the little formula DATE(sales_time) to convert a DATETIME timestamp into a pure date. It then groups by those pure dates sums up the gross sales and counts the transactions. The result set will contain a row for every date on which there were any sales.
Summarize by month
Second, we can report gross monthly sales with a similar query. To do this we need a formula to convert any DATETIME expression to the date of the first day of the month. This formula is a little trickier than DATE(sales_time), but ultimately it works the same way. It converts the DATETIME expression to a string that shows “01” for the day of the month, then converts that back to a date. This is it:
DATE(DATE_FORMAT(sales_time, '%Y-%m-01'))
We can apply this to our query to get a summary of sales by month.
SELECT DATE(DATE_FORMAT(sales_time, '%Y-%m-01')) AS month_beginning, SUM(gross) AS total, COUNT(*) AS transactions FROM sales GROUP BY DATE(DATE_FORMAT(sales_time, '%Y-%m-01')) ORDER BY DATE(DATE_FORMAT(sales_time, '%Y-%m-01'))
Summarize by calendar quarter
Now let’s report quarterly sales. How do we convert a DATETIME value to a calendar quarter value? Fortunately, MySQL offers functions to extract the year and quarter from DATETIME expressions. Let’s blend those functions together to do this
DATE(CONCAT(YEAR(sales_time),'-', 1 + 3*(QUARTER(sales_time)-1),'-01'))
Applying this formula to our report, here’s what we get:
SELECT DATE(CONCAT(YEAR(sales_time),'-', 1 + 3*(QUARTER(sales_time)-1),'-01')) AS quarter_beginning, SUM(gross) AS total, COUNT(*) AS transactions FROM sales GROUP BY DATE(CONCAT(YEAR(sales_time),'-', 1 + 3*(QUARTER(sales_time)-1),'-01')) ORDER BY DATE(CONCAT(YEAR(sales_time),'-', 1 + 3*(QUARTER(sales_time)-1),'-01'))
Using valid DATE expressions
Notice the discipline of always using valid dates for these reports. When we have valid dates, we gain all kinds of advantages in MySQL. For example, if we know the first day of a quarter, we can find out the last day with this expression.
quarter_beginning + INTERVAL 1 QUARTER - INTERVAL 1 DAY
For the quarterly reports we could have chosen something else. For example, instead of saying the quarter beginning on 2013-10-01, the first of October 2013, we could have generated the value CY2013Q4, meaning the fourth quarter of calendar year 2013. This expression would do that.
CONCAT('CY',YEAR(sales_time),'Q',QUARTER(sales_time))
But, expressions like “CY2013Q3” and “CY2013Q4” aren’t valid dates. Instead, they are text strings, so attempts to use them for date arithmetic will yield NULL values.
Summarize by calendar week
Summarizing time-series data by calendar week is useful, and it works very much like the other interval reports. To make this work, we need a formula that can yield the first day of the calendar week given any DATETIME value. If Sunday is the first day of the week, this is the formula to use.
FROM_DAYS(TO_DAYS(sales_time) -MOD(TO_DAYS(sales_time) -1, 7))
It retrieves the date of the preceding Sunday from any DATETIME value. This calendar-week expression works properly for all weeks, including the special case of weeks that span the end of calendar years. For example, for both Tuesday, December 31, 2013, and Thursday, January 2, 2014, this expression returns Sunday, December 29, 2013.
In some jurisdictions, Monday is considered the first day of the week. This expression computes the preceding Monday’s date.
FROM_DAYS(TO_DAYS(sales_time) -MOD(TO_DAYS(sales_time) -2, 7))
You need to choose the appropriate expression for your country and your application’s customs about which day is the start of each week.
Here is the query to summarize sales by week (given a Sunday week start).
SELECT FROM_DAYS(TO_DAYS(sales_time) -MOD(TO_DAYS(sales_time) -1, 7)) AS week_beginning, SUM(gross) AS total, COUNT(*) AS transactions FROM sales GROUP BY FROM_DAYS(TO_DAYS(sales_time) -MOD(TO_DAYS(sales_time) -1, 7)) ORDER BY FROM_DAYS(TO_DAYS(sales_time) -MOD(TO_DAYS(sales_time) -1, 7))
Generate a summary of last week’s sales, day by day
Now that we know how to handle days and weeks, we can generate a day-by-day summary of last week’s sales. Here’s how we do that.
First, we need an expression for the first day of last week. Here it is:
FROM_DAYS(TO_DAYS(CURRENT_DATE() - INTERVAL 7 DAY) - MOD(TO_DAYS(CURRENT_DATE() - INTERVAL 7 DAY) -1, 7))
Next, we need an expression for the first day of the present week. Here it is:
FROM_DAYS(TO_DAYS(CURRENT_DATE()) - MOD(TO_DAYS(CURRENT_DATE()) -1, 7))
These two expressions need to be built into a WHERE clause, as follows.
WHERE sales_time >= FROM_DAYS(TO_DAYS(CURRENT_DATE() - INTERVAL 7 DAY) - MOD(TO_DAYS(CURRENT_DATE() - INTERVAL 7 DAY) -1, 7)) AND sales_time < FROM_DAYS(TO_DAYS(CURRENT_DATE()) - MOD(TO_DAYS(CURRENT_DATE()) -1, 7))
Finally, this WHERE clause needs to be applied to our day-by-day summary of sales, and our report of the most recently completed week is done
SELECT DATE(sales_time) AS sales_date, SUM(gross) AS total, COUNT(*) AS transactions FROM sales WHERE sales_time >= FROM_DAYS(TO_DAYS(CURRENT_DATE() - INTERVAL 7 DAY) - MOD(TO_DAYS(CURRENT_DATE() - INTERVAL 7 DAY) -1, 7)) AND sales_time < FROM_DAYS(TO_DAYS(CURRENT_DATE()) - MOD(TO_DAYS(CURRENT_DATE()) -1, 7)) GROUP BY DATE(sales_time) ORDER BY DATE(sales_time)
Using Stored Functions for Date Computations
You’ve probably noticed that this SQL code is quite repetitive. This repetition can be a little difficult to read. MySQL offers stored functions, and some (but not all) hosting providers allow their users to define their own stored functions. Let’s create stored functions for these formulas that take arbitrary DATETIME values and derive the beginnings of various time periods from them. This is a form of mathematical truncation. Just as TRUNC(3.14) yields 3, TRUNC_SUNDAY(‘2014-01-02’) yields ‘2013-12-19’ and TRUNC_MONTH(‘2014-01-02’) yields ‘2014-01-01’. We can define these functions:
- TRUNC_HOUR(time) retrieves the most recent date and hour for any DATETIME expression. “2013-12-31 10:15” becomes “2013-12-31 10:00”
- ROUND_HOUR(time) retrieves the nearest hour for any DATETIME expression. This is useful when you have timestamps for physical measurements. For example, “2013-12-31 23:45” becomes “2014-01-01 00:00”. “2013-12-15 10:29” and “2013-12-15 9:45” both become “2013-12-15 10:00”.
- TRUNC_N_MINUTES(time,n) retrieves the most recent n-minute boundary. For example, if n is 15, then “2013-12-31 23:59:50” becomes “2013-12-31 23:45:00”. Similarly, “2013-12-31 10:01” becomes “2013-12-31 10:00”. To group your times by tenths of hours, use 6 for your value of n: 6 minutes are a tenth of an hour.
- TRUNC_N_HOURS(time,n) retrieves the most recent n-hour boundary. For example, if n is 6, then “2013-12-31 23:59:50” becomes “2013-12-31 18:00:00”. Similarly, “2013-12-31 10:01” becomes “2013-12-31 06:00”. To group your times into four groups per day, use 6 for your value of n: 6 hours are a quarter of a (24-hour) day.
- TRUNC_DAY(time) retrieves the date (at midnight). “2013-12-31 10:15” becomes “2013-12-31”
- TRUNC_SUNDAY(time) retrieves the preceding Sunday. “2013-12-31 10:15” becomes “2013-12-29”
- TRUNC_MONDAY(time) retrieves the preceding Monday. “2013-12-31 10:15” becomes “2013-12-30”
- TRUNC_MONTH(time) retrieves the first day of the month. “2013-12-31 10:15” becomes “2013-12-01”
- TRUNC_QUARTER(time) retrieves the first day of the quarter.”2013-12-31 10:15″ becomes “2013-10-01”
- TRUNC_YEAR(time) retrieves the first day of the year.”2013-12-31 10:15″ becomes “2013-01-01”
With these functions defined, our date-summary queries get much easier to read. For example, the daily summary of sales for last week looks like this:
SELECT TRUNC_DAY(sales_time) AS sales_date, SUM(gross) AS total, COUNT(*) AS transactions FROM sales WHERE sales_time >= TRUNC_SUNDAY(CURRENT_DATE() - INTERVAL 7 DAY) AND sales_time < TRUNC_SUNDAY(CURRENT_DATE()) GROUP BY TRUNC_DAY(sales_time) ORDER BY TRUNC_DAY(sales_time)
In similar fashion, a monthly summary of last year’s sales would look like this;
SELECT TRUNC_MONTH(sales_time) AS month_beginning, SUM(gross) AS total, COUNT(*) AS transactions FROM sales WHERE sales_time >= TRUNC_YEAR(CURRENT_DATE() - INTERVAL 1 YEAR) AND sales_time < TRUNC_YEAR(CURRENT_DATE()) GROUP BY TRUNC_MONTH(sales_time) ORDER BY TRUNC_MONTH(sales_time)
Indexes and efficiency
When time series data tables are very large, indexes on DATETIME columns can be very helpful to speed up queries. In our example sales table, this is the index to use:
ALTER TABLE sales ADD INDEX sales_time (sales_time) USING BTREE;
When a BTREE index like this is present, a WHERE clause like the one in the previous section uses it to do a very efficient range scan. Notice that it’s possible to write WHERE clauses in ways that make indexes useless, however. For example:
WHERE TRUNC_YEAR(sales_time) = /* inefficient! */ TRUNC_YEAR(CURRENT_DATE() - INTERVAL 1 YEAR)
will correctly retrieve all sales from last year. However, because it applies a function to the value of the column it’s using to search, it cannot use the index. So it will go through the whole table row by row, compute TRUNC_YEAR(sales_time), and compare it.
Stored Functions Definitions
These are the MySQL definitions of the stored TRUNC functions.
You are free to use these for any purpose you wish.
Notice that you could create MSSQL, Oracle, or PostGreSQL implementations of these functions. If you did that you’d make your time-interval reporting SQL more portable between makes and models of RDBMS.
Is there a database-agnostic way to do this with basic SQL syntax that isn’t specific to MySQL? I would like to accomplish a similar effect in SAS (which is terrible for this kind of thing but it’s all my employer has granted to me).
If you use SAS SQL in pass-through mode you need to use the datestamp-truncation functions of the underlying database server. If you it native, in PROC SQL mode, you need documentation about how it handles datestamps internally. These things vary from one dialect of SQL to another. I’ve never used SAS.
There seems to be a lot of sites with bad/mediocre advice on mysql. This site is NOT one of them. Well done Ollie on your clear and concise (not to mention useful) post. Mark, New Zealand.
Thanks for the kind words, Mark. By the way, I’m looking for somebody in the Southern Hemisphere to test the location-finder code on this site, and make sure it works correctly there. If you get an opportunity … Thanks!
Ollie,
What is the difference between sales_date and sales_time? You seem to use them interchangeably in the beginning of the post.
Thanks,
Doug
Doug, you’re right. I’ve corrected the problems.
Thanks, Ollie. I greatly appreciate your efforts to provide clear descriptions and excellent examples. I needed to summarize a time series by week. and your concise explanation was all I needed.
Dave, you’re welcome. Thanks for the kind words!