21st Jul 2020 6 minutes read

Using GROUP BY in SQL

group by

Table of Contents

GROUP BY Syntax
GROUP BY a Single Column
Grouping with Aggregate Functions
MAX(), MIN(), AVG(), SUM()
GROUP BY Multiple Columns with COUNT()
Mastering SQL GROUP BY

Now that you know SQL’s core commands, power up your queries with the GROUP BY clause and aggregate functions.

GROUP BY is a SQL clause that arranges data into groups based on a common value (or values). It is widely used to obtain summary information for large datasets as well as for computing basic aggregate statistics.

Once you have grasped SQL’s basic commands like SELECT, WHERE, and JOIN, mastering the GROUP BY clause can help you implement your SQL knowledge in real-world business intelligence scenarios.

SQL GROUP BY is most often used in combination with aggregate functions like COUNT(), MAX(), MIN(), SUM(), and AVG().It groups the result set by one or more columns, essentially creating summary rows for each group. These groups are usually based on categorical rather than continuous values.

For example, if you are using a dataset of global customers for your business, you can use the GROUP BY clause to summarize information such as the “total number of customers by each country” or the “total sales per customer”.

GROUP BY Syntax

Now that we know what the GROUP BY clause is, let's look at the syntax of a basic GROUP BY query.

SELECT column_name(s)
FROM table_name
WHERE [condition]
GROUP BY column_name1, column_name2
HAVING [condition]
ORDER BY column_name

Here:

SELECT is the standard SQL SELECT query.
GROUP BY column_name1 performs the grouping based on values in column_name1.
column_name2 is used when grouping is done on more than one column; you can group by any number of columns. This is optional.
HAVING [condition] and WHERE [condition] are also optional; both are used to restrict the rows affected by the GROUP BY clause. The key difference is that WHERE filters the rows before grouping and HAVING filters the groups themselves once they are formed. Important: WHERE must precede the GROUP BY clause and HAVING must follow it.
ORDER BY [condition] is used after GROUP BY to sort or order the data by a given column_name.

At first, GROUP BY and ORDER BY appear to do the same thing – sort the data. But that is where their similarities end. As the name suggests, ORDER BY simply orders the complete dataset in a defined order, while GROUP BY groups the data as an aggregate output. Our earlier article, The Difference Between GROUP BY and ORDER BY in Simple Words, explains this in more detail; check it out if you need a refresher.

Now, let’s see how the GROUP BY clause works with real queries.

GROUP BY a Single Column

To help understand the effect of the GROUP BY clause, let's execute a simple query on the Orders table, which has recorded 196 individual orders. Below, you can see the first five entries in the table:

OrderID	CustomerID	EmployeeID	OrderDate	ShipperID
10248	90	5	1996-07-04	3
10249	81	6	1996-07-05	1
10250	34	4	1996-07-08	2
10251	84	3	1996-07-08	1
10252	76	4	1996-07-09	2

Using the Orders table, let’s find out how many unique customers have placed an order. Here’s the query:

SELECT CustomerID 
FROM Orders
GROUP BY CustomerID

RESULT

Number of Records: 74

CustomerID
2
3
4
5
7

This is an example of a simple GROUP BY clause on a single column (CustomerID); the output places all the rows with the same value for the CustomerID column in the same group. It also returns the number of unique customers who have placed an order (74). Essentially, a GROUP BY clause without an aggregate function behaves similarly to a DISTINCT clause: it finds the unique values in a column.

Grouping can also be done by multiple columns. This is most useful when used with aggregate function(s).

Grouping with Aggregate Functions

The practical application of SQL’s GROUP BY command is summarizing statistical parameters for groups of data; this is usually achieved with aggregate functions. We’ll explore some examples now, using the Orders tables and the OrderDetails table, a sample of which is shown below:

OrderDetailID	OrderID	ProductID	Quantity
1	10248	11	12
2	10248	42	10
3	10248	72	5
4	10249	14	9
5	10249	51	40

MAX(), MIN(), AVG(), SUM()

GROUP BY is perfect for summarizing the statistical parameters of individual categories in your dataset. For example, let’s try to find the maximum, minimum, and average quantity of each ProductID per order and the total number of each product ordered. We’ll use the OrderDetails table and put the results in descending order by the sum quantities ordered for each product. This will shed light on the most popular ProductIDs. Here’s the query:

SELECT ProductID, MAX(Quantity), Min(Quantity), AVG(Quantity), SUM (Quantity)
FROM OrderDetails
GROUP BY ProductID
Order By SUM(Quantity) DESC

RESULT

Number of Records: 77

ProductID	MAX(Quantity)	Min(Quantity)	AVG(Quantity)	SUM (Quantity)
31	70	4	32.714285714285715	458
60	80	6	35.833333333333336	430
35	100	4	41	369
59	70	6	24.714285714285715	346
2	60	7	31	341

The table below breaks down how some of these aggregate values are calculated for the ProductID values 2 and 31. The GROUP BY clause first groups the rows together for ProductID 2 and then calculates the AVG() and MAX() for the whole group. It does the same for each ProductID for each of the aggregate functions in the query.

ProductID	Quantity	Average (?Quantity/# of ProductID)	Max (Quantity)
2	60	31	60
2	50
2	45
2	40
2	35
……	……
31	70	32.714	70
31	60
31	56
31	42
31	40
……	……

As you see, the GROUP BY command is an extremely useful tool for finding statistical summaries for any class of variables.

GROUP BY Multiple Columns with COUNT()

As mentioned earlier, you can use multiple columns in the GROUP BY. In other words, you’re placing all rows with the same values in both column_name1 and column_name2 in one group.

Let’s consider an example where we want to count the number of times each employee has used each shipping service. Once again, we’ll use the Orders table.

SELECT EmployeeID, ShipperID, COUNT(*)
FROM Orders
Group BY EmployeeID, ShipperID
Order BY ShipperID, Count(*) DESC

RESULT

Number of Records: 26

EmployeeID	ShipperID	COUNT(*)
4	1	12
1	1	8
2	1	7
3	1	7
6	1	7
…….	…….	…….

This query could be used, for example, to get feedback on shippers’ customer service from the employees who have used that service. As we can see above, there are 26 unique interaction pairs between employees and shippers!

How does grouping by two columns work behind the scenes? Rows with the same EmployeeID and ShipperID are paired into one group. The size of those groups (i.e. the number of records in the group) is calculated, as you can see below:

EmployeeID	ShipperID	Count
1	1
1	1
1	1	12
1	1
1	1
……	…….
4	1
4	1
4	1	8
4	1
4	1
……	…….

The GROUP BY command is also a powerful tool that can be combined with the SQL HAVING and WHERE clauses to define additional conditions for the returned results.

Keep in mind that there are certain subtleties when GROUP BY is used with columns that have NULL values.

Mastering SQL GROUP BY

Needless to say, only practice will make you master this command! Practice is best done on real-world datasets, where you can visualize and relate to the business needs. The SQL Basics course includes a comprehensive set of interactive exercises. You’ll learn how to group rows and compute statistics with aggregate functions on practical datasets. Try it out, and good luck!

Tags:

group by

GROUP BY Syntax

GROUP BY a Single Column

Grouping with Aggregate Functions

MAX(), MIN(), AVG(), SUM()

GROUP BY Multiple Columns with COUNT()

Mastering SQL GROUP BY

You may also like