23rd Dec 2021 Updated: 29th Oct 2024 9 minutes read

How to Use the SQL PARTITION BY With OVER

Table of Contents

What Is the PARTITION BY Clause in SQL?
Comparing PARTITION BY and GROUP BY
Examples of Using PARTITION BY Clause
Where to Learn More About Window Functions

At the heart of every window function call is an OVER clause that defines how the window of rows for each record is built. Within the OVER clause, there may be an optional PARTITION BY subclause that defines the criteria for identifying which records to include in each window. Read on and take an important step in growing your SQL skills!

What Is the PARTITION BY Clause in SQL?

The SQL PARTITION BY expression is a subclause of the OVER clause, which is required in every window function call, whether the function is AVG(), MAX(), or RANK(). As many readers probably know, window functions operate on window frames, which are sets of rows that can be different for each record in the query result. This is where the SQL PARTITION BY subclause comes in: it defines which records belong to the window frame associated with each record of the result.

The best way to learn window functions is our interactive Window Functions course. There are 218 exercises that will teach you how window functions work, what functions there are, and how to apply them to real-world problems. You only need a web browser and some basic SQL knowledge.

This article explains the SQL PARTITION BY clause and its uses with examples. Since it is closely tied to window functions, you may first want to read some articles on window functions, like “SQL Window Function Example With Explanations”, where you will find a lot of examples. If you want to learn more about window functions, there is also an interesting article with many pointers to other window functions articles.

The first thing to focus on is the syntax. Here’s how to use the SQL PARTITION BY clause:

SELECT
    <column>,
    <window function> OVER(PARTITION BY <column> [ORDER BY <column>])
FROM <table>;

Let’s look at an example that uses a PARTITION BY clause. We will use the following table called car_list_prices:

car_make	car_model	car_type	car_price
Ford	Mondeo	premium	18200
Renault	Fuego	sport	16500
Citroen	Cactus	premium	19000
Ford	Falcon	low cost	8990
Ford	Galaxy	standard	12400
Renault	Megane	standard	14300
Citroen	Picasso	premium	23400

For each car, we want to obtain the make, the model, the price, the average price across all cars, and the average price over the same type of car (to get a better idea of how the price of a given car compares to other cars). Here’s the query:

SELECT
    car_make,
    car_model,
    car_price,
    AVG(car_price) OVER() AS "overall average price",
    AVG(car_price) OVER (PARTITION BY car_type) AS "car type average price"
FROM car_list_prices;

The result of the query is the following:

car_make	car_model	car_price	overall average price	car type average price
Ford	Mondeo	18200	16112.85	20200.00
Renault	Fuego	16500	16112.85	16500.00
Citroen	Cactus	19000	16112.85	20200.00
Ford	Falcon	8990	16112.85	8990.00
Ford	Galaxy	12400	16112.85	13350.00
Renault	Megane	14300	16112.85	13350.00
Citroen	Picasso	23400	16112.85	20200.00

The above query uses two window functions. The first is used to calculate the average price across all cars in the price list. It uses the window function AVG() with an empty OVER clause as we see in the following expression:

AVG(car_price) OVER() AS "overall average price"

The second window function is used to calculate the average price of a specific car_type like standard, premium, sport, etc. This is where we use an OVER clause with a PARTITION BY subclause as we see in this expression:

AVG(car_price) OVER (PARTITION BY car_type) AS "car type average price"
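If you’d like to try this query yourself, here is a minimal sketch that loads the car_list_prices rows shown above into an in-memory SQLite database from Python and runs the same two window functions. The choice of SQLite and Python's sqlite3 module is our assumption for illustration (the article does not depend on a specific database), and it requires SQLite 3.25 or newer, the first version with window function support.

```python
import sqlite3

# In-memory SQLite database; an assumption made only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE car_list_prices ("
    "car_make TEXT, car_model TEXT, car_type TEXT, car_price REAL)"
)
# The seven rows of the car_list_prices table shown in the article.
conn.executemany(
    "INSERT INTO car_list_prices VALUES (?, ?, ?, ?)",
    [
        ("Ford", "Mondeo", "premium", 18200),
        ("Renault", "Fuego", "sport", 16500),
        ("Citroen", "Cactus", "premium", 19000),
        ("Ford", "Falcon", "low cost", 8990),
        ("Ford", "Galaxy", "standard", 12400),
        ("Renault", "Megane", "standard", 14300),
        ("Citroen", "Picasso", "premium", 23400),
    ],
)

rows = conn.execute(
    """
    SELECT
        car_make,
        car_model,
        car_price,
        AVG(car_price) OVER () AS "overall average price",
        AVG(car_price) OVER (PARTITION BY car_type) AS "car type average price"
    FROM car_list_prices
    ORDER BY car_model
    """
).fetchall()

# Every input row is kept; the averages are repeated on each row of a window.
for row in rows:
    print(row)
```

Note that the output still has seven rows: the empty OVER() treats the whole table as a single window, while PARTITION BY car_type builds one window per car type.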

Window functions are quite powerful, right? If you’d like to learn more by doing well-prepared exercises, I suggest the course "Window Functions", where you can learn about and become comfortable with using window functions in SQL databases.

Comparing PARTITION BY and GROUP BY

The GROUP BY clause groups a set of records based on criteria. This allows us to apply a function (for example, AVG() or MAX()) to groups of records to yield one result per group.

As an example, say we want to obtain the average price and the top price for each make. Use the following query:

SELECT 
  car_make,
  AVG(car_price) AS average_price,
  MAX(car_price) AS top_price
FROM car_list_prices
GROUP BY car_make;

Here is the result of this query:

car_make	average_price	top_price
Ford	13196	18200
Renault	15400	16500
Citroen	21200	23400

Compared to window functions, GROUP BY collapses the individual records into one row per group. As a consequence, you cannot refer to the fields of an individual record: outside of aggregate functions, only the columns listed in the GROUP BY clause can be referenced.

For example, say you want to create a report with the model, the price, and the average price of the make. You cannot do this by using GROUP BY, because the individual records of each model are collapsed due to the clause GROUP BY car_make. For something like this, you need to use window functions, as we see in the following example:

SELECT 
  car_make,
  car_model,
  car_price,
  AVG(car_price) OVER (PARTITION BY car_make) AS average_make
FROM car_list_prices;

The result of this query is the following:

car_make	car_model	car_price	average_make
Citroen	Picasso	23400	21200
Citroen	Cactus	19000	21200
Ford	Galaxy	12400	13196
Ford	Falcon	8990	13196
Ford	Mondeo	18200	13196
Renault	Megane	14300	15400
Renault	Fuego	16500	15400
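The difference between the two clauses is easy to check by counting rows. The sketch below runs both queries against the same car_list_prices data; as an assumption for illustration, it uses Python's sqlite3 module with an in-memory SQLite database (version 3.25 or newer).

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite is an assumption of this sketch
conn.execute(
    "CREATE TABLE car_list_prices ("
    "car_make TEXT, car_model TEXT, car_type TEXT, car_price REAL)"
)
conn.executemany(
    "INSERT INTO car_list_prices VALUES (?, ?, ?, ?)",
    [
        ("Ford", "Mondeo", "premium", 18200),
        ("Renault", "Fuego", "sport", 16500),
        ("Citroen", "Cactus", "premium", 19000),
        ("Ford", "Falcon", "low cost", 8990),
        ("Ford", "Galaxy", "standard", 12400),
        ("Renault", "Megane", "standard", 14300),
        ("Citroen", "Picasso", "premium", 23400),
    ],
)

# GROUP BY: one output row per make, individual models are no longer visible.
grouped = conn.execute(
    """
    SELECT car_make, AVG(car_price) AS average_price
    FROM car_list_prices
    GROUP BY car_make
    ORDER BY car_make
    """
).fetchall()

# PARTITION BY: one output row per car, each carrying its make's average.
windowed = conn.execute(
    """
    SELECT car_make, car_model, car_price,
           AVG(car_price) OVER (PARTITION BY car_make) AS average_make
    FROM car_list_prices
    ORDER BY car_make, car_model
    """
).fetchall()

print(len(grouped), "rows with GROUP BY")
print(len(windowed), "rows with PARTITION BY")
```

Both queries compute the same average for each make; only the shape of the result differs.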

For those who want to go deeper, I suggest the article “What Is the Difference Between a GROUP BY and a PARTITION BY?”, with plenty of examples using aggregate and window functions.

There is a detailed article called “SQL Window Functions Cheat Sheet” where you can find a lot of syntax details and examples about the different bounds of the window frame.

Examples of Using PARTITION BY Clause

In this section, we show some examples of the SQL PARTITION BY clause. All are based on the table paris_london_flights, used by an airline to analyze the business results of this route for the years 2018 and 2019. Here’s a subset of the data:

make	model	flight_number	scheduled_departure	real_departure	scheduled_arrival	passengers	revenue
Boeing	757 300	FLP003	2019-01-30 15:00:00	2019-01-30 15:00:00	2019-01-30 15:00:00	260	82630.10
Boeing	737 200	FLP003	2019-02-01 15:00:00	2019-02-01 15:10:00	2019-02-01 15:55:00	195	58459.34
Airbus	A500	FLP003	2019-02-01 15:00:00	2019-02-01 15:03:00	2019-02-01 15:03:55	312	91570.87
Airbus	A500	FLP001	2019-10-28 05:00:00	2019-10-28 05:04:00	2019-10-28 05:55:00	298	87943.00
Boeing	737 200	FLP002	2019-10-28 09:00:00	2019-10-28 09:00:00	2019-10-28 09:55:00	178	56342.45

Example 1: Total passengers and revenue

The first query generates a report that includes the flight_number, the model, the number of passengers transported, and the total revenue. The query is below:

SELECT DISTINCT
  flight_number,
  model,
  SUM(passengers) 
    OVER (PARTITION BY flight_number, model)
    AS total_passengers,
  SUM(revenue) 
    OVER (PARTITION BY flight_number, model)
    AS revenue
FROM paris_london_flights
ORDER BY flight_number, model;

Since the total passengers transported and the total revenue are generated for each possible combination of flight_number and model, we use the following PARTITION BY clause to generate a set of records with the same flight number and aircraft model:

OVER (PARTITION BY flight_number, model)

Then, for each set of records, we apply window functions SUM(passengers) and SUM(revenue) to obtain the metrics total_passengers and revenue shown in the next result set.

flight_number	model	total_passengers	revenue
FLP001	737 200	20481	6016060.82
FLP001	757 300	18389	5361126.23
FLP001	Airbus A500	53872	15892165.58
FLP002	737 200	21660	6297197.71
FLP002	757 300	16869	4951475.86
FLP002	Airbus A500	54627	16004812.16
FLP003	737 200	20098	5874892.44
FLP003	757 300	15708	4573379.28
FLP003	Airbus A500	57533	16712475.04
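Because this report keeps only the partitioning columns and the totals, SELECT DISTINCT over the window functions returns the same rows as a plain GROUP BY. The sketch below checks this on a tiny data set: the five sample rows shown earlier (reduced to the four columns the query needs) plus one extra row that is purely hypothetical, added so that one partition contains two flights. SQLite via Python's sqlite3 module is our assumption for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite 3.25+ assumed for window functions
conn.execute(
    "CREATE TABLE paris_london_flights ("
    "model TEXT, flight_number TEXT, passengers INTEGER, revenue REAL)"
)
conn.executemany(
    "INSERT INTO paris_london_flights VALUES (?, ?, ?, ?)",
    [
        # Sample rows from the article (only the columns used by the query).
        ("757 300", "FLP003", 260, 82630.10),
        ("737 200", "FLP003", 195, 58459.34),
        ("A500", "FLP003", 312, 91570.87),
        ("A500", "FLP001", 298, 87943.00),
        ("737 200", "FLP002", 178, 56342.45),
        # Hypothetical row, NOT from the article: a second FLP003 / 737 200 flight.
        ("737 200", "FLP003", 180, 50000.00),
    ],
)

windowed = conn.execute(
    """
    SELECT DISTINCT
      flight_number,
      model,
      SUM(passengers) OVER (PARTITION BY flight_number, model) AS total_passengers,
      SUM(revenue) OVER (PARTITION BY flight_number, model) AS revenue
    FROM paris_london_flights
    ORDER BY flight_number, model
    """
).fetchall()

grouped = conn.execute(
    """
    SELECT flight_number, model, SUM(passengers), SUM(revenue)
    FROM paris_london_flights
    GROUP BY flight_number, model
    ORDER BY flight_number, model
    """
).fetchall()

for row in windowed:
    print(row)
```

Without DISTINCT, the window version would repeat the totals once per flight, which is why the query in this example needs it.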

Example 2: Passengers month to month

In the next query, we show how the business evolves by comparing metrics from one month with those from the previous month. We create a report using window functions to show the monthly variation in passengers and revenue.

WITH year_month_data AS (
  SELECT DISTINCT
    EXTRACT(YEAR FROM scheduled_departure) AS year,
    EXTRACT(MONTH FROM scheduled_departure) AS month,
    SUM(passengers)
      OVER (PARTITION BY
        EXTRACT(YEAR FROM scheduled_departure),
        EXTRACT(MONTH FROM scheduled_departure)
      ) AS passengers
  FROM paris_london_flights
)
SELECT
  year,
  month,
  passengers,
  LAG(passengers) OVER (ORDER BY year, month) AS passengers_previous_month,
  passengers - LAG(passengers) OVER (ORDER BY year, month) AS passengers_delta
FROM year_month_data
ORDER BY year, month;

In the query above, we use a WITH clause to define a CTE (a common table expression, which is a named subquery that works like a virtual table in the rest of the query). The CTE is called year_month_data and has 3 columns: year, month, and passengers, which holds the total passengers transported in the month.

Then, the main query (which takes the CTE year_month_data as its input) generates the final result. The column passengers contains the total passengers transported in the month of the current record. With the LAG(passengers) window function, we obtain the value of the column passengers from the record preceding the current one, with the records ordered by year and month:

LAG(passengers) OVER (ORDER BY year, month)
passengers_previous_month

It obtains the number of passengers from the previous record, corresponding to the previous month. Then, we have the number of passengers for the current and the previous months. Finally, in the last column, we calculate the difference between both values to obtain the monthly variation of passengers.

year	month	passengers	passengers_previous_month	passengers_delta
2018	12	11469	null	null
2019	1	24723	11469	13254
2019	2	22536	24723	-2187
2019	3	24994	22536	2458
2019	4	24408	24994	-586
2019	5	23998	24408	-410
2019	6	23793	23998	-205
2019	7	24816	23793	1023
2019	8	24334	24816	-482
2019	9	23719	24334	-615
2019	10	24989	23719	1270
2019	11	24371	24989	-618
2019	12	1087	24371	-23284
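To see LAG() in isolation, the sketch below skips the aggregation step and starts from the first four monthly totals of the result table above, stored in a small table named year_month_data after the CTE. Using SQLite through Python's sqlite3 module is our assumption for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite 3.25+ assumed for window functions
conn.execute(
    "CREATE TABLE year_month_data (year INTEGER, month INTEGER, passengers INTEGER)"
)
# First four monthly totals copied from the result table in the article.
conn.executemany(
    "INSERT INTO year_month_data VALUES (?, ?, ?)",
    [
        (2018, 12, 11469),
        (2019, 1, 24723),
        (2019, 2, 22536),
        (2019, 3, 24994),
    ],
)

rows = conn.execute(
    """
    SELECT
      year,
      month,
      passengers,
      LAG(passengers) OVER (ORDER BY year, month) AS passengers_previous_month,
      passengers - LAG(passengers) OVER (ORDER BY year, month) AS passengers_delta
    FROM year_month_data
    ORDER BY year, month
    """
).fetchall()

# The first month has no previous record, so LAG() and the delta are NULL (None).
for row in rows:
    print(row)
```

There is no PARTITION BY here, so all months form a single window and LAG() can look back across the year boundary from January 2019 to December 2018.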

Example 3: Flight delays

For our last example, let’s look at flight delays. We want to obtain different delay averages to explain the reasons behind the delays.

We use a CTE to calculate a column called month_delay with the average delay for each month and obtain the aircraft model. Then in the main query, we obtain the different averages as we see below:

WITH paris_london_delays AS (
  SELECT DISTINCT
    model,
    EXTRACT(YEAR FROM scheduled_departure) AS year,
    EXTRACT(MONTH FROM scheduled_departure) AS month,
    AVG(real_departure - scheduled_departure) AS month_delay
  FROM paris_london_flights
  GROUP BY 1, 2, 3
)
SELECT DISTINCT
  model,
  year,
  month,
  month_delay AS monthly_avg_delay,
  AVG(month_delay) OVER (PARTITION BY model, year) AS year_avg_delay,
  AVG(month_delay) OVER (PARTITION BY year) AS year_avg_delay_all_models,
  AVG(month_delay) OVER (PARTITION BY model, year 
                         ORDER BY month
                         ROWS BETWEEN 3 PRECEDING AND CURRENT ROW
                        ) AS rolling_average_4_months
FROM paris_london_delays
ORDER BY 1,2,3;

This query calculates several averages. The first is the average per aircraft model and year. The second is the average per year across all aircraft models; note that we only use the column year in the PARTITION BY clause. The third and last is the rolling average, where we use the previous 3 months and the current month (i.e., row) to calculate the average with the following expression:

AVG(month_delay) OVER (PARTITION BY model, year
                       ORDER BY month
                       ROWS BETWEEN 3 PRECEDING AND CURRENT ROW
                      ) AS rolling_average_4_months

The clause ROWS BETWEEN 3 PRECEDING AND CURRENT ROW in the OVER clause restricts the rows (i.e., months) included in the average to the previous 3 months and the current month within the same partition. You can see a partial result of this query below:

model	year	month	monthly_avg_delay	year_avg_delay	year_avg_delay_all_models	rolling_average_4_months
737 200	2018	12	00:02:13.84	00:02:13.84	00:03:13.70	00:02:13.84
737 200	2019	1	00:02:16.80	00:02:36.59	00:02:34.12	00:02:16.80
737 200	2019	2	00:02:35.00	00:02:36.59	00:02:34.12	00:02:25.90
737 200	2019	3	00:01:38.40	00:02:36.59	00:02:34.12	00:02:10.06
737 200	2019	4	00:04:00.00	00:02:36.59	00:02:34.12	00:02:37.55
737 200	2019	5	00:03:12.72	00:02:36.59	00:02:34.12	00:02:51.53
737 200	2019	6	00:02:21.42	00:02:36.59	00:02:34.12	00:02:48.13
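The rolling average can be reproduced from the monthly delays listed above. In the sketch below, the delays of the 737 200 are stored as seconds in a plain table named paris_london_delays after the CTE. Both the conversion to seconds and the use of SQLite through Python's sqlite3 module are our assumptions for illustration, since SQLite has no interval type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite 3.25+ assumed for window functions
conn.execute(
    "CREATE TABLE paris_london_delays ("
    "model TEXT, year INTEGER, month INTEGER, month_delay REAL)"
)
# Monthly average delays from the partial result above, converted to seconds
# (for example, 00:02:13.84 becomes 133.84).
conn.executemany(
    "INSERT INTO paris_london_delays VALUES (?, ?, ?, ?)",
    [
        ("737 200", 2018, 12, 133.84),
        ("737 200", 2019, 1, 136.80),
        ("737 200", 2019, 2, 155.00),
        ("737 200", 2019, 3, 98.40),
        ("737 200", 2019, 4, 240.00),
        ("737 200", 2019, 5, 192.72),
        ("737 200", 2019, 6, 141.42),
    ],
)

rows = conn.execute(
    """
    SELECT
      year,
      month,
      month_delay,
      AVG(month_delay) OVER (PARTITION BY model, year
                             ORDER BY month
                             ROWS BETWEEN 3 PRECEDING AND CURRENT ROW
                            ) AS rolling_average_4_months
    FROM paris_london_delays
    ORDER BY year, month
    """
).fetchall()

for year, month, delay, rolling in rows:
    print(year, month, delay, round(rolling, 2))
```

Because the partition includes the year, the frame never reaches back into the previous year: the rolling average for January 2019 is just that month's own delay, and only from April 2019 onward does the frame hold a full four months.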

The article “The RANGE Clause in SQL Window Functions: 5 Practical Examples” explains how to define a subset of rows in the window frame using RANGE instead of ROWS, with several examples. Another interesting article is “Common SQL Window Functions: Using Partitions With Ranking Functions” in which the PARTITION BY clause is covered in detail.

Where to Learn More About Window Functions

Window functions are a very powerful resource of the SQL language, and the SQL PARTITION BY clause plays a central role in their use. In this article, we have covered how this clause works and shown several examples using different syntaxes.

If you want to learn more about window functions, try out our interactive Window Functions course with over 200 hands-on practical exercises. If you know window functions and are looking for SQL window functions practice, take a look at our Window Functions Practice Set course with 100 real-world exercises.
