Back to articles list Articles Cookbook
Updated: 14th Nov 2024 8 minutes read

5 Examples of GROUP BY in SQL

Data grouping—or data aggregation—is an important concept in the world of databases. In this article, we’ll demonstrate how you can use the GROUP BY clause in practice. We’ve gathered five GROUP BY examples, from easier to more complex ones so you can see data grouping in a real-life scenario. As a bonus, you’ll also learn a bit about aggregate functions and the HAVING clause.

SQL is a universal language to talk to databases that has been around for almost 50 years. If you’re a complete beginner, consider taking our SQL Basics course before reading this article.

One of the core concepts behind SQL is data grouping, or data aggregation. If you’re reading this article, you’ve probably already heard about the GROUP BY clause. To help you understand it better, we’ve presented five business problems and showed how they can be solved in our GROUP BY examples.

If you need a quick introduction to GROUP BY, watch our five-minute YouTube video.

Input Data

In this article, we’re going to help an imaginary museum analyze their guests. We’ll use the history of visits to the museum to derive meaningful insights using the GROUP BY clause. Let’s get down to work, then!

We’re going to work with a single table named visit. Each row represents a single visit to the museum. Below, you can see a few sample rows from this table:

visit

datepriceduration
2020-05-01215
2020-05-01437
2020-05-06724
...

As you can see, the table is not very complicated. It only contains three columns:

  • date—The date of the visit to the museum.
  • price—The price paid for the ticket in dollars (you will see a variety of prices because the museum uses different pricing options on various days along with many types of reduced tickets).
  • duration—The duration of the visit to the museum in minutes.

Why Do We Group Rows?

We know that we can aggregate (group) rows in SQL, but why do we do that? The GROUP BY clause is typically used alongside aggregate functions, which compute various statistics about the groups of rows. The five most basic aggregate functions in SQL are:

  • COUNT()—Used to count the number of rows.
  • AVG()—Used to find the average value.
  • MIN() and MAX()—Used to find the minimum and maximum value, respectively.
  • SUM()—Used to find the sum of all values.

In short, we group rows to compute various statistics.

GROUP BY Examples

Good. Now that we know a bit about aggregate functions, let’s take a look at five GROUP BY examples.

Example 1: GROUP BY With One Column

We’ll start with a simple example. We want to find out how many people visited the museum on each day. In other words, for each date, we’ll show the number of visits to the museum. The query we need will look like this:

SELECT date, COUNT(*)
FROM visit
GROUP BY date;

We only have two columns: date and count. COUNT(*) means “count everything.” Because we also use the date column in the GROUP BY clause, we’ll see a separate count for each date. When we run the query in our database, we should see something like this:

datecount
2020-06-297
2020-05-236
2020-06-235
...

Excellent. We now know how many people visited the museum on each day.

Instead of COUNT(*), which means “count every row,” we could also use a column name inside, e.g., COUNT(duration). The difference is that COUNT(*) counts all rows for a given group even if some rows contain NULL (unknown) values. COUNT(duration), on the other hand, only counts those rows that have a non-NULL value in the duration column. In our table, however, there are no NULL values, so the result would be the same. You can read more about the difference between different COUNT version in What is COUNT(*), COUNT(1), COUNT(column), and COUNT(DISTINCT) in SQL?.

Example 2: GROUP BY With Two Columns

We now want to know the average price paid for a ticket in a given month. For this, we’ll need a more complicated query. Take a look:

SELECT 
  EXTRACT(YEAR FROM date) AS year, 
  EXTRACT(MONTH FROM date) AS month, 
  ROUND(AVG(price), 2) AS avg_price
FROM visit
GROUP BY
  EXTRACT(YEAR FROM date),
  EXTRACT(MONTH FROM date);

We used the EXTRACT(YEAR FROM date) function to get the year from each date, and we used EXTRACT(MONTH FROM date) to get the month (as a numerical value, where “1” means “January,” “2” means “February,” etc.). Note that they are treated as separate columns, so you’ll see the year in one column and the month in another one.

Because we want to see the average for each month of each year separately, we also need to group by these two columns. We have to repeat the same functions in the GROUP BY clause.

The third column is a combination of two functions. Inside, we’ve got AVG(price), which will calculate the average price in each group. We also have ROUND(AVG(price), 2)) to round the average value to two decimal places.

When you use a GROUP BY clause, try to remember the golden rule: All column names from the SELECT clause should either appear in the GROUP BY clause or be used in the aggregate functions. In this case, both EXTRACT(YEAR FROM date) and EXTRACT(MONTH FROM date) should appear in the GROUP BY clause. If you forget about one of them, you’ll probably see an error. The third column uses an aggregate function, AVG(price), so it’s not mentioned in the GROUP BY clause.
There are some exceptions to this rule, and they may lead to unexpected behavior.

When we run the query, we’ll see something like this:

yearmonthavg_price
202057.52
202066.70

As you can see, the average ticket price decreased in June, compared to May. This could translate into lower income for the museum.

Example 3: GROUP BY and ORDER BY

This time, we want to find the average visit duration value for each month. We also want to make sure that the rows are sorted chronologically. The query we’ll need will be similar to the previous example:

SELECT 
  EXTRACT(YEAR FROM date) AS year, 
  EXTRACT(MONTH FROM date) AS month, 
  ROUND(AVG(duration), 2)
FROM visit
GROUP BY 
  EXTRACT(YEAR FROM date), 
  EXTRACT(MONTH FROM date)
ORDER BY 
  EXTRACT(YEAR FROM date), 
  EXTRACT(MONTH FROM date);

The new piece here is the ORDER BY clause. An ORDER BY clause lets us specify the order in which we should see the rows. In this case, we want to see all rows sorted first by the year, and then by the month.

Again, we need to repeat the same functions from the SELECT clause for the ORDER BY clause to work. By default, ORDER BY sorts the rows in the ascending order.

If you want to see rows sorted in descending order, you need to add the DESC keyword after the column name. For example, write ORDER BY EXTRACT(YEAR FROM date) DESC. You can read more about the difference between GROUP BY and ORDER BY in this article.

When we run the query, we’ll see something like this:

yearmonthavg_duration
2020547.61
2020651.33

On average, a guest spent more time in the museum in June than in May. That’s good news!

Example 4: GROUP BY and HAVING

Now, we have the following problem: we want to see the average ticket price for each day. However, there’s an extra condition: we don’t want to show days with 3 or fewer visits. This condition translates into a new piece in our SQL query. Take a look:

SELECT 
  date, 
  ROUND(AVG(price), 2) AS avg_price
FROM visit
GROUP BY date
HAVING COUNT(*) > 3
ORDER BY date;

The new part here is HAVING COUNT(*) > 3. HAVING is a clause we can use to filter on the grouped rows. In this case, we group rows by the date (GROUP BY date). When we do that, we want to make sure that a given group has more than three rows (HAVING COUNT(*) > 3). If a group (in this case, visits on a given day) doesn’t fulfill this condition, we don’t show it at all.

When we run the query, we’ll see something like this:

dateavg_price
2020-05-015.80
2020-05-157.00
2020-05-236.67
...

Example 5: GROUP BY, HAVING, and WHERE

Finally, we have the following problem to solve: we want to show the average visit duration for each day. Again, we only want to show days with more than three visits. However, we also want to make sure that visits which are five minutes long or shorter are not included in the calculations. These are most probably tests performed by the museum employees, so we want to ignore them. This is the query we’ll need:

SELECT 
  date, 
  ROUND(AVG(duration), 2) AS avg_duration
FROM visit
WHERE duration > 5
GROUP BY date
HAVING COUNT(*) > 3
ORDER BY date;

The new part here is the WHERE clause. It is used to only include visits that lasted more than five minutes.

The WHERE and HAVING clauses may look similar, but there is a difference between them: WHERE is used to filter single rows before they are grouped (i.e., single visits) while HAVING is used to filter groups of rows (i.e., visits on a given day). Read more in this article.

When we run the query, we’ll see something like this:

dateavg_duration
2020-05-0129.80
2020-05-1555.75
2020-05-2332.17
2020-05-2969.50
2020-06-0239.83
2020-06-0448.67
2020-06-0948.50
2020-06-2351.60
2020-06-2957.86

Note how the average visit duration increases with almost every day in June. It looks like we added aninteresting exhibition and our guests spread the word about it.

Where to Learn More

With these five examples, we went from easy to more complex GROUP BY cases. The versatility of SQL allowed us to analyze the museum visits and answer several questions about them. This shows how effectively GROUP BY can solve real business problems.

If you feel like learning more SQL, LearnSQL.com team teaches SQL from scratch in a completely interactive manner. For beginners, we have our best-selling SQL Basics course. We guarantee a smooth entry into the world of coding for people with no prior experience in IT. You won’t need to worry about the technical setup—you will study straight from your web browser. We take care of the database while you focus on the key concepts of SQL.