Computing the mode with SQL

The mode of a bunch of numbers is the most frequent number in the bunch. A bunch of numbers can have more than one most-frequent number.

Given a table tbl with columns category and value, this SQL returns the mode for each category.

 

WITH freq AS (
    SELECT COUNT(*) freq,
           value,
           category
      FROM tbl
     GROUP BY value, category
),
most AS (
     SELECT MAX(freq) freq,
           category
      FROM freq
     GROUP BY category
),
mode AS (
      SELECT GROUP_CONCAT(value ORDER BY value DESC) modes,
             freq.category
        FROM freq
        JOIN most ON freq.freq = most.freq
       GROUP BY freq.category
)
SELECT * FROM mode

 

This works first by finding the frequencies of each value in each category, using the freq CTE. It then extracts the largest frequency value for each category, using the most CTE. FInally, it joins the freq and most virtual tables together to extract the most frequent values from freq.

It uses GROUP_CONCAT() because it’s possible for there to be more than one mode  — more than one most-frequent value — in a set of data.

Leave a Reply

Your email address will not be published. Required fields are marked *