23rd Dec 2021 Updated: 29th Oct 2024 9 minutes read How to Use the SQL PARTITION BY With OVER Ignacio L. Bisso window functions Table of Contents What Is the PARTITION BY Clause in SQL? Comparing PARTITION BY and GROUP BY Examples of Using PARTITION BY Clause Example 1: Total passengers and revenue Example 2: Passengers month to month Example 3: Flight delays Where to Learn More About Window Functions At the heart of every window function call is an OVER clause that defines how the windows of the records are built. Within the OVER clause, there may be an optional PARTITION BY subclause that defines the criteria for identifying which records to include in each window. Read on and take an important step in growing your SQL skills! What Is the PARTITION BY Clause in SQL? The SQL PARTITION BY expression is a subclause of the OVER clause, which is used in almost all invocations of window functions like AVG(), MAX(), and RANK(). As many readers probably know, window functions operate on window frames which are sets of rows that can be different for each record in the query result. This is where the SQL PARTITION BY subclause comes in: it is used to define which records to make part of the window frame associated with each record of the result. The best way to learn window functions is our interactive Window Functions course. There are 218 exercises that will teach you how window functions work, what functions there are, and how to apply them to real-world problems. You only need a web browser and some basic SQL knowledge. This article explains the SQL PARTITION BY and its uses with examples. Since it is deeply related to window functions, you may first want to read some articles on window functions, like “SQL Window Function Example With Explanations” where you find a lot of examples. If you want to learn more about window functions, there is also an interesting article with many pointers to other window functions articles. The first thing to focus on is the syntax. Here’s how to use the SQL PARTITION BY clause: SELECT <column>, <window function> OVER(PARTITION BY <column> [ORDER BY <column>]) FROM table; Let’s look at an example that uses a PARTITION BY clause. We will use the following table called car_list_prices: car_makecar_modelcar_typecar_price FordMondeopremium18200 RenaultFuegosport16500 CitroenCactuspremium19000 FordFalconlow cost8990 FordGalaxystandard12400 RenaultMeganestandard14300 CitroenPicassopremium23400 For each car, we want to obtain the make, the model, the price, the average price across all cars, and the average price over the same type of car (to get a better idea of how the price of a given car compared to other cars). Here’s the query: SELECT car_make, car_model, car_price, AVG(car_price) OVER() AS "overall average price", AVG(car_price) OVER (PARTITION BY car_type) AS "car type average price" FROM car_list_prices The result of the query is the following: car_makecar_modelcar_priceoverall average pricecar type average price FordMondeo1820016112.8520200.00 RenaultFuego1650016112.8516500.00 CitroenCactus1900016112.8520200.00 FordFalcon899016112.858990.00 FordGalaxy1240016112.8513350.00 RenaultMegane1430016112.8513350.00 CitroenPicasso2340016112.8520200.00 The above query uses two window functions. The first is used to calculate the average price across all cars in the price list. It uses the window function AVG() with an empty OVER clause as we see in the following expression: AVG(car_price) OVER() AS "overall average price" The second window function is used to calculate the average price of a specific car_type like standard, premium, sport, etc. This is where we use an OVER clause with a PARTITION BY subclause as we see in this expression: AVG(car_price) OVER (PARTITION BY car_type) AS "car type average price" The window functions are quite powerful, right? If you’d like to learn more by doing well-prepared exercises, I suggest the course "Window Functions", where you can learn about and become comfortable with using window functions in SQL databases. Comparing PARTITION BY and GROUP BY The GROUP BY clause groups a set of records based on criteria. This allows us to apply a function (for example, AVG() or MAX()) to groups of records to yield one result per group. As an example, say we want to obtain the average price and the top price for each make. Use the following query: SELECT car_make, AVG(car_price) AS average_price, MAX(car_price) AS top_price FROM car_list_prices GROUP BY car_make; Here is the result of this query: car_makeaverage_pricetop_price Ford1319618200 Renault1540016500 Citroen2120023400 Compared to window functions, GROUP BY collapses individual records into a group. As a consequence, you cannot refer to any individual record field; that is, only the columns in the GROUP BY clause can be referenced. For example, say you want to create a report with the model, the price, and the average price of the make. You cannot do this by using GROUP BY, because the individual records of each model are collapsed due to the clause GROUP BY car_make. For something like this, you need to use window functions, as we see in the following example: SELECT car_make, car_model, car_price, AVG(car_price) OVER (PARTITION BY car_make) AS average_make FROM car_list_prices; The result of this query is the following: car_makecar_modelcar_priceaverage_make CitroenPicasso2340021200 CitroenCactus1900021200 FordGalaxy1240013196 FordFalcon899013196 FordMondeo1820013196 RenaultMegane1430015400 RenaultFuego1650015400 For those who want to go deeper, I suggest the article ““What Is the Difference Between a GROUP BY and a PARTITION BY?” with plenty of examples using aggregate and window functions. There is a detailed article called “SQL Window Functions Cheat Sheet” where you can find a lot of syntax details and examples about the different bounds of the window frame. Examples of Using PARTITION BY Clause In this section, we show some examples of the SQL PARTITION BY clause. All are based on the table paris_london_flights, used by an airline to analyze the business results of this route for the years 2018 and 2019. Here’s a subset of the data: makemodelflight_numberscheduled_departurereal_departurescheduled_arrivalpassengersrevenue Boeing757 300FLP0032019-01-30 15:00:002019-01-30 15:00:002019-01-30 15:00:0026082630.10 Boeing737 200FLP0032019-02-01 15:00:002019-02-01 15:10:002019-02-01 15:55:0019558459.34 AirbusA500FLP0032019-02-01 15:00:002019-02-01 15:03:002019-02-01 15:03:5531291570.87 AirbusA500FLP0012019-10-28 05:00:002019-10-28 05:04:002019-10-28 05:55:0029887943.00 Boeing737 200FLP0022019-10-28 09:00:002019-10-28 09:00:002019-10-28 09:55:0017856342.45 Example 1: Total passengers and revenue The first query generates a report including the flight_number, model with the quantity of passenger transported, and the total revenue. The query is below: SELECT DISTINCT flight_number, model, SUM(passengers) OVER (PARTITION BY flight_number, model) AS total_passengers, SUM(revenue) OVER (PARTITION BY flight_number, model) AS revenue FROM paris_london_flights ORDER BY flight_number, model; Since the total passengers transported and the total revenue are generated for each possible combination of flight_number and model, we use the following PARTITION BY clause to generate a set of records with the same flight number and aircraft model: OVER (PARTITION BY flight_number, model) Then, for each set of records, we apply window functions SUM(passengers) and SUM(revenue) to obtain the metrics total_passengers and revenue shown in the next result set. flight_numbermodeltotal_passengersrevenue FLP001737 200204816016060.82 FLP001757 300183895361126.23 FLP001Airbus A5005387215892165.58 FLP002737 200216606297197.71 FLP002757 300168694951475.86 FLP002Airbus A5005462716004812.16 FLP003737 200200985874892.44 FLP003757 300157084573379.28 FLP003Airbus A5005753316712475.04 Example 2: Passengers month to month In the next query, we show how the business evolves by comparing metrics from one month with those from the previous month. We create a report using window functions to show the monthly variation in passengers and revenue. WITH year_month_data AS ( SELECT DISTINCT EXTRACT(YEAR FROM scheduled_departure) AS year, EXTRACT(MONTH FROM scheduled_departure) AS month, SUM(number_of_passengers) OVER (PARTITION BY EXTRACT(YEAR FROM scheduled_departure), EXTRACT(MONTH FROM scheduled_departure) ) AS passengers FROM paris_london_flights ORDER BY 1, 2 ) SELECT year, month, passengers, LAG(passengers) OVER (ORDER BY year, month) passengers_previous_month, passengers - LAG(passengers) OVER (ORDER BY year, month) AS passengers_delta FROM year_month_data; In the query above, we use a WITH clause to generate a CTE (CTE stands for common table expressions and is a type of query to generate a virtual table that can be used in the rest of the query). We populate data into a virtual table called year_month_data, which has 3 columns: year, month, and passengers with the total transported passengers in the month. Then, the second query (which takes the CTE year_month_data as an input) generates the result of the query. The column passengers contains the total passengers transported associated with the current record. With the LAG(passenger) window function, we obtain the value of the column passengers of the previous record to the current record. We ORDER BY year and month: LAG(passengers) OVER (ORDER BY year, month) passengers_previous_month It obtains the number of passengers from the previous record, corresponding to the previous month. Then, we have the number of passengers for the current and the previous months. Finally, in the last column, we calculate the difference between both values to obtain the monthly variation of passengers. yearmonthpassengerspassengers_previous_monthpassengers_delta 20181211469nullnull 20191247231146913254 201922253624723-2187 2019324994225362458 201942440824994-586 201952399824408-410 201962379323998-205 2019724816237931023 201982433424816-482 201992371924334-615 20191024989237191270 2019112437124989-618 201912108724371-23284 Example 3: Flight delays For our last example, let’s look at flight delays. We want to obtain different delay averages to explain the reasons behind the delays. We use a CTE to calculate a column called month_delay with the average delay for each month and obtain the aircraft model. Then in the main query, we obtain the different averages as we see below: WITH paris_london_delays AS ( SELECT DISTINCT model, EXTRACT(YEAR FROM scheduled_departure) AS year, EXTRACT(MONTH FROM scheduled_departure) AS month, AVG(real_departure - scheduled_departure) AS month_delay FROM paris_london_flights GROUP BY 1, 2, 3 ) SELECT DISTINCT model, year, month, month_delay AS monthly_avg_delay, AVG(month_delay) OVER (PARTITION BY model, year) AS year_avg_delay, AVG(month_delay) OVER (PARTITION BY year) AS year_avg_delay_all_models, AVG(month_delay) OVER (PARTITION BY model, year ORDER BY month ROWS BETWEEN 3 PRECEDING AND CURRENT ROW ) AS rolling_average_4_months FROM paris_london_delays ORDER BY 1,2,3; This query calculates several averages. The first is the average per aircraft model and year, which is very clear. The second is the average per year across all aircraft models. Note we only use the column year in the PARTITION BY clause. The third and last average is the rolling average, where we use the most recent 3 months and the current month (i.e., row) to calculate the average with the following expression: AVG(month_delay) OVER (PARTITION BY model, year ORDER BY month ROWS BETWEEN 3 PRECEDING AND CURRENT ROW ) AS rolling_average_4_months The clause ROWS BETWEEN 3 PRECEDING AND CURRENT ROW in the PARTITION BY restricts the number of rows (i.e., months) to be included in the average: the previous 3 months and the current month. You can see a partial result of this query below: modelyearmonthmonth_delayyear_avg_delayyear_avg_delay_all_modelsrolling_average_4_months 737 20020181200:02:13.8400:02:13.8400:03:13.7000:02:13.84 737 2002019100:02:16.8000:02:36.5900:02:34.1200:02:16.80 737 2002019200:02:35.0000:02:36.5900:02:34.1200:02:25.90 737 2002019300:01:38.4000:02:36.5900:02:34.1200:02:10.06 737 2002019400:04:00.0000:02:36.5900:02:34.1200:02:37.55 737 2002019500:03:12.7200:02:36.5900:02:34.1200:02:51.53 737 2002019600:02:21.4200:02:36.5900:02:34.1200:02:48.13 The article “The RANGE Clause in SQL Window Functions: 5 Practical Examples” explains how to define a subset of rows in the window frame using RANGE instead of ROWS, with several examples. Another interesting article is “Common SQL Window Functions: Using Partitions With Ranking Functions” in which the PARTITION BY clause is covered in detail. Where to Learn More About Window Functions Window functions are a very powerful resource of the SQL language, and the SQL PARTITION BY clause plays a central role in their use. In this article, we have covered how this clause works and showed several examples using different syntaxes. If you want to learn more about window functions, try out our interactive Window Functions course with over 200 hands-on practical exercises. If you know window functions and are looking for SQL window functions practice, take a look at our Window Functions Practice Set course with 100 real-world exercises. Tags: window functions