Back to articles list Articles Cookbook
Updated: 29th Oct 2024 9 minutes read

How to Use the SQL PARTITION BY With OVER

At the heart of every window function call is an OVER clause that defines how the windows of the records are built. Within the OVER clause, there may be an optional PARTITION BY subclause that defines the criteria for identifying which records to include in each window. Read on and take an important step in growing your SQL skills!

What Is the PARTITION BY Clause in SQL?

The SQL PARTITION BY expression is a subclause of the OVER clause, which is used in almost all invocations of window functions like AVG(), MAX(), and RANK(). As many readers probably know, window functions operate on window frames which are sets of rows that can be different for each record in the query result. This is where the SQL PARTITION BY subclause comes in: it is used to define which records to make part of the window frame associated with each record of the result.

The best way to learn window functions is our interactive Window Functions course. There are 218 exercises that will teach you how window functions work, what functions there are, and how to apply them to real-world problems. You only need a web browser and some basic SQL knowledge.

This article explains the SQL PARTITION BY and its uses with examples. Since it is deeply related to window functions, you may first want to read some articles on window functions, like “SQL Window Function Example With Explanations” where you find a lot of examples. If you want to learn more about window functions, there is also an interesting article with many pointers to other window functions articles.

The first thing to focus on is the syntax. Here’s how to use the SQL PARTITION BY clause:

SELECT
    <column>,
    <window function> OVER(PARTITION BY <column> [ORDER BY <column>])
FROM table;

Let’s look at an example that uses a PARTITION BY clause. We will use the following table called car_list_prices:

car_makecar_modelcar_typecar_price
FordMondeopremium18200
RenaultFuegosport16500
CitroenCactuspremium19000
FordFalconlow cost8990
FordGalaxystandard12400
RenaultMeganestandard14300
CitroenPicassopremium23400

For each car, we want to obtain the make, the model, the price, the average price across all cars, and the average price over the same type of car (to get a better idea of how the price of a given car compared to other cars). Here’s the query:

SELECT
    car_make,
    car_model,
    car_price,
    AVG(car_price) OVER() AS "overall average price",
    AVG(car_price) OVER (PARTITION BY car_type) AS "car type average price"
FROM car_list_prices

The result of the query is the following:

car_makecar_modelcar_priceoverall average pricecar type average price
FordMondeo1820016112.8520200.00
RenaultFuego1650016112.8516500.00
CitroenCactus1900016112.8520200.00
FordFalcon899016112.858990.00
FordGalaxy1240016112.8513350.00
RenaultMegane1430016112.8513350.00
CitroenPicasso2340016112.8520200.00

The above query uses two window functions. The first is used to calculate the average price across all cars in the price list. It uses the window function AVG() with an empty OVER clause as we see in the following expression:

AVG(car_price) OVER() AS "overall average price"

The second window function is used to calculate the average price of a specific car_type like standard, premium, sport, etc. This is where we use an OVER clause with a PARTITION BY subclause as we see in this expression:

AVG(car_price) OVER (PARTITION BY car_type) AS "car type average price"

The window functions are quite powerful, right? If you’d like to learn more by doing well-prepared exercises, I suggest the course "Window Functions", where you can learn about and become comfortable with using window functions in SQL databases.

Comparing PARTITION BY and GROUP BY

The GROUP BY clause groups a set of records based on criteria. This allows us to apply a function (for example, AVG() or MAX()) to groups of records to yield one result per group.

As an example, say we want to obtain the average price and the top price for each make. Use the following query:

SELECT 
  car_make,
  AVG(car_price) AS average_price,
  MAX(car_price) AS top_price
FROM car_list_prices
GROUP BY car_make;

Here is the result of this query:

car_makeaverage_pricetop_price
Ford1319618200
Renault1540016500
Citroen2120023400

Compared to window functions, GROUP BY collapses individual records into a group. As a consequence, you cannot refer to any individual record field; that is, only the columns in the GROUP BY clause can be referenced.

For example, say you want to create a report with the model, the price, and the average price of the make. You cannot do this by using GROUP BY, because the individual records of each model are collapsed due to the clause GROUP BY car_make. For something like this, you need to use window functions, as we see in the following example:

SELECT 
  car_make,
  car_model,
  car_price,
  AVG(car_price) OVER (PARTITION BY car_make) AS average_make
FROM car_list_prices;

The result of this query is the following:

car_makecar_modelcar_priceaverage_make
CitroenPicasso2340021200
CitroenCactus1900021200
FordGalaxy1240013196
FordFalcon899013196
FordMondeo1820013196
RenaultMegane1430015400
RenaultFuego1650015400

For those who want to go deeper, I suggest the article ““What Is the Difference Between a GROUP BY and a PARTITION BY?” with plenty of examples using aggregate and window functions.

There is a detailed article called “SQL Window Functions Cheat Sheet” where you can find a lot of syntax details and examples about the different bounds of the window frame.

Examples of Using PARTITION BY Clause

In this section, we show some examples of the SQL PARTITION BY clause. All are based on the table paris_london_flights, used by an airline to analyze the business results of this route for the years 2018 and 2019. Here’s a subset of the data:

makemodelflight_numberscheduled_departurereal_departurescheduled_arrivalpassengersrevenue
Boeing757 300FLP0032019-01-30 15:00:002019-01-30 15:00:002019-01-30 15:00:0026082630.10
Boeing737 200FLP0032019-02-01 15:00:002019-02-01 15:10:002019-02-01 15:55:0019558459.34
AirbusA500FLP0032019-02-01 15:00:002019-02-01 15:03:002019-02-01 15:03:5531291570.87
AirbusA500FLP0012019-10-28 05:00:002019-10-28 05:04:002019-10-28 05:55:0029887943.00
Boeing737 200FLP0022019-10-28 09:00:002019-10-28 09:00:002019-10-28 09:55:0017856342.45

Example 1: Total passengers and revenue

The first query generates a report including the flight_number, model with the quantity of passenger transported, and the total revenue. The query is below:

SELECT DISTINCT
  flight_number,
  model,
  SUM(passengers) 
    OVER (PARTITION BY flight_number, model)
    AS total_passengers,
  SUM(revenue) 
    OVER (PARTITION BY flight_number, model)
   AS revenue
FROM paris_london_flights
ORDER BY flight_number, model;

Since the total passengers transported and the total revenue are generated for each possible combination of flight_number and model, we use the following PARTITION BY clause to generate a set of records with the same flight number and aircraft model:

OVER (PARTITION BY flight_number, model)

Then, for each set of records, we apply window functions SUM(passengers) and SUM(revenue) to obtain the metrics total_passengers and revenue shown in the next result set.

flight_numbermodeltotal_passengersrevenue
FLP001737 200204816016060.82
FLP001757 300183895361126.23
FLP001Airbus A5005387215892165.58
FLP002737 200216606297197.71
FLP002757 300168694951475.86
FLP002Airbus A5005462716004812.16
FLP003737 200200985874892.44
FLP003757 300157084573379.28
FLP003Airbus A5005753316712475.04

Example 2: Passengers month to month

In the next query, we show how the business evolves by comparing metrics from one month with those from the previous month. We create a report using window functions to show the monthly variation in passengers and revenue.

WITH year_month_data AS (
  SELECT DISTINCT
    EXTRACT(YEAR FROM scheduled_departure) AS year,
    EXTRACT(MONTH FROM scheduled_departure) AS month,
    SUM(number_of_passengers)
      OVER (PARTITION BY 
        EXTRACT(YEAR FROM scheduled_departure),
        EXTRACT(MONTH FROM scheduled_departure)
    ) AS passengers
   FROM paris_london_flights
  ORDER BY 1, 2
)
SELECT  
  year,
  month,
  passengers,
  LAG(passengers) OVER (ORDER BY year, month) passengers_previous_month,
  passengers - LAG(passengers) OVER (ORDER BY year, month) AS passengers_delta
FROM year_month_data;

In the query above, we use a WITH clause to generate a CTE (CTE stands for common table expressions and is a type of query to generate a virtual table that can be used in the rest of the query). We populate data into a virtual table called year_month_data, which has 3 columns: year, month, and passengers with the total transported passengers in the month.

Then, the second query (which takes the CTE year_month_data as an input) generates the result of the query. The column passengers contains the total passengers transported associated with the current record. With the LAG(passenger) window function, we obtain the value of the column passengers of the previous record to the current record. We ORDER BY year and month:

LAG(passengers) OVER (ORDER BY year, month)
passengers_previous_month

It obtains the number of passengers from the previous record, corresponding to the previous month. Then, we have the number of passengers for the current and the previous months. Finally, in the last column, we calculate the difference between both values to obtain the monthly variation of passengers.

yearmonthpassengerspassengers_previous_monthpassengers_delta
20181211469nullnull
20191247231146913254
201922253624723-2187
2019324994225362458
201942440824994-586
201952399824408-410
201962379323998-205
2019724816237931023
201982433424816-482
201992371924334-615
20191024989237191270
2019112437124989-618
201912108724371-23284

Example 3: Flight delays

For our last example, let’s look at flight delays. We want to obtain different delay averages to explain the reasons behind the delays.

We use a CTE to calculate a column called month_delay with the average delay for each month and obtain the aircraft model. Then in the main query, we obtain the different averages as we see below:

WITH paris_london_delays AS (
  SELECT DISTINCT
    model,
    EXTRACT(YEAR FROM scheduled_departure) AS year,
    EXTRACT(MONTH FROM scheduled_departure) AS month,
    AVG(real_departure - scheduled_departure) AS month_delay
  FROM  paris_london_flights
  GROUP BY 1, 2, 3
)
SELECT DISTINCT
  model,
  year,
  month,
  month_delay AS monthly_avg_delay,
  AVG(month_delay) OVER (PARTITION BY model, year) AS year_avg_delay,
  AVG(month_delay) OVER (PARTITION BY year) AS year_avg_delay_all_models,
  AVG(month_delay) OVER (PARTITION BY model, year 
                         ORDER BY month
                         ROWS BETWEEN 3 PRECEDING AND CURRENT ROW
                        ) AS rolling_average_4_months
FROM paris_london_delays
ORDER BY 1,2,3;

This query calculates several averages. The first is the average per aircraft model and year, which is very clear. The second is the average per year across all aircraft models. Note we only use the column year in the PARTITION BY clause. The third and last average is the rolling average, where we use the most recent 3 months and the current month (i.e., row) to calculate the average with the following expression:

AVG(month_delay) OVER (PARTITION BY model, year
                       ORDER BY month
                       ROWS BETWEEN 3 PRECEDING AND CURRENT ROW
                      ) AS rolling_average_4_months

The clause ROWS BETWEEN 3 PRECEDING AND CURRENT ROW in the PARTITION BY restricts the number of rows (i.e., months) to be included in the average: the previous 3 months and the current month. You can see a partial result of this query below:

modelyearmonthmonth_delayyear_avg_delayyear_avg_delay_all_modelsrolling_average_4_months
737 20020181200:02:13.8400:02:13.8400:03:13.7000:02:13.84
737 2002019100:02:16.8000:02:36.5900:02:34.1200:02:16.80
737 2002019200:02:35.0000:02:36.5900:02:34.1200:02:25.90
737 2002019300:01:38.4000:02:36.5900:02:34.1200:02:10.06
737 2002019400:04:00.0000:02:36.5900:02:34.1200:02:37.55
737 2002019500:03:12.7200:02:36.5900:02:34.1200:02:51.53
737 2002019600:02:21.4200:02:36.5900:02:34.1200:02:48.13

The article “The RANGE Clause in SQL Window Functions: 5 Practical Examples” explains how to define a subset of rows in the window frame using RANGE instead of ROWS, with several examples. Another interesting article is “Common SQL Window Functions: Using Partitions With Ranking Functions” in which the PARTITION BY clause is covered in detail.

Where to Learn More About Window Functions

Window functions are a very powerful resource of the SQL language, and the SQL PARTITION BY clause plays a central role in their use. In this article, we have covered how this clause works and showed several examples using different syntaxes.

If you want to learn more about window functions, try out our interactive Window Functions course with over 200 hands-on practical exercises. If you know window functions and are looking for SQL window functions practice, take a look at our Window Functions Practice Set course with 100 real-world exercises.