30th Jun 2022 12 minutes read Learn the OVER() Clause in SQL with 7 Examples Ignacio L. Bisso window functions Table of Contents The OVER() Clause in SQL Introducing the Data Example Query #1: A Simple Use of OVER() Example Query #2: Using the PARTITION BY Subclause Example Query #3: Using the ORDER BY Subclause Example Query #4: Computing Table Free Time with OVER() and LAG() Example Query #5: Computing Rankings with the OVER() Clause Example Query #6: Computing Differences from a Previous Period Example Query #7: Moving Average Ready to Practice the SQL OVER() Clause? Learn how to use the OVER() clause in your SQL queries and you will take a step forward in your SQL knowledge. Most data analysts and SQL developers have a good handle on the SELECT, FROM, WHERE, GROUP BY, and ORDER BY clauses. However, these clauses are only a limited subset of the SQL language. There are other clauses, such as OVER(), which allow us to create flexible SQL reports and add an enormous expressive power to the SQL language. In this article, we are going to explain the OVER() clause by demonstrating several examples of its usage. Let’s get started! The OVER() Clause in SQL SQL has a fairly easy learning curve; anyone can learn how to create simple queries in SQL. In fact, the average IT pro or SQL developer handles the basic SELECT, FROM, WHERE, GROUP BY and ORDER BY clauses with ease. These clauses were available in SQL-92, which is 30 years old! However, the SQL language does not end there; many new elements have been added since 1992. One of them is the OVER() clause, which allows us to use window functions in SQL queries. In SQL, window functions are similar to GROUP BY in that they work on a group of rows. However, window functions are based on a data window, or a set of rows related to the current row. Unlike GROUP BY, window functions do not collapse rows; they keep the details of individual rows intact. For those people who want to go deeper into OVER() and window functions in SQL, I suggest our Window Functions online course. It contains plenty of examples using different window functions. Ok, now let’s shift back to the OVER() function. First, let’s look at our data. Introducing the Data Our example queries will be based on the following data. Suppose we have a seafood restaurant on the Mediterranean coast. We also have a database table where we store the history of the restaurant’s activity. Our database has a table called restaurant_activity with these columns: table_number waiter_name start_date start_time served_time end_time total_diners amount_payment total_tips. Below is an example of the data: Restaurant_activity table_numberwaiter_namestart_datestart_ timeserved_timeend_timetotal_dinersamount_paymenttotal_tips 1John5/5/202211:0311:1711:453350.0037 2Peter5/5/202211:1011:3213:104430.5050 3Mary5/5/202211:3012:0512:402260.3520 1John5/5/202212:0012:3813:104670.1230 3Mary5/5/202212:4313:1213:503320.5020 2Peter6/5/202211:1011:2111:405560.7560 3Mary6/5/202211:4011:5312:403240.1025 1John6/5/202211:3011:5312:301150.0010 3Mary6/5/202214:1014:2014:401240.1025 1Mary6/5/202214:3014:3514:502150.0030 This should all be self-explanatory, but let’s quickly run through a few columns. The start_date and start_time is the date and time when a meal was ordered by the customers at table_number; served_time is when the meal was served, and end_time is when the customers asked for their bill. The amount_payment is the cost of the meal, not including the tip (which is stored as total_tips). Example Query #1: A Simple Use of OVER() Let’s start by using the OVER() clause to calculate each order’s percentage of the total daily revenue on 5/5/2022. We can calculate the total revenue of the restaurant on this day (excluding tips) using OVER() combined with the SUM() window function. The OVER() clause is always associated with a window function; window functions calculate a value based on a set of records defined by the OVER() clause. The query is: SELECT start_date AS date, SUM(amount_payment) OVER () AS daily_revenue, amount_payment AS total_order, (amount_payment / SUM(amount_payment) OVER ())*100 AS order_percentage FROM restaurant_activity WHERE start_date = ’5/5/2022’ The previous query calculates the daily_revenue for 5/5/2022 by adding up each payment amount for this day. The OVER() clause is empty, meaning that the window of records used to calculate the SUM() function is the complete set of records returned by the query. In other words, the window is composed of all records for this date. In the calculation of column order_percentage, we divide the individual order amount by the total daily revenue to obtain the percentage; this is a central point in the query, because we are combining row-level columns with window function results in one expression. Understanding what records are part of the window is central to understanding how window functions work; we’ll return to this point later in the article. For now, let’s look at the results: datedaily_revenuetotal_orderorder_percentage 2022-05-052031.47350.0017.23 2022-05-052031.47430.5021.19 2022-05-052031.47260.3512.82 2022-05-052031.47670.1232.99 2022-05-052031.47320.5015.78 Before going into more complex OVER() examples, I’d like to suggest our Window Functions Practice Set. If you want to really learn how to use OVER(), these 100 interactive exercises will do the trick. Example Query #2: Using the PARTITION BY Subclause In this example, we will use the PARTITION BY subclause; it works with OVER() to define data windows. Let’s suppose we want a report similar to the previous one, but expanded to any day of May 2022. For each day, we want to see the total daily revenue, the individual order amount for all orders, and what percentage of the daily revenue is attributed to each order. Here is the query: SELECT start_date AS date, SUM(amount_payment) OVER (PARTITION BY start_date) AS daily_revenue, amount_payment AS total_order, (amount_payment / SUM(amount_payment) OVER (PARTITION BY start_date)) * 100 AS order_percentage FROM restaurant_activity WHERE start_date BETWEEN ’5/1/2022’ AND ’5/31/2022’ We mentioned that window functions work based on a window of records (or a set of records) related to the current record. The previous query uses the subclause PARTITION BY start_date to define which records belong in each window. In our example, all records with the same start_date value as the current row will be part of the window. As we have only two unique values for start_date, we have only two different windows. They are shown in green and red in the result below: datedaily_revenuetotal_orderorder_percentage 5/5/20222031.47350.0017.23 5/5/20222031.47430.5021.19 5/5/20222031.47260.3512.82 5/5/20222031.47670.1232.99 5/5/20222031.47320.5015.78 5/6/20221340.95560.7541.82 5/6/20221340.95240.1017.91 5/6/20221340.95150.0011.19 5/6/20221340.95240.1017.91 5/6/20221340.95150.0011.19 Example Query #3: Using the ORDER BY Subclause Now let’s introduce the ORDER BY subclause, which allows you to put rows in a specific order within the window. Let’s suppose we want a simple report with the five most expensive orders along with the waiter, date, and table number. The query is: WITH ranking AS ( SELECT table_number, amount_payment, waiter_name, start_date, RANK() OVER (ORDER BY amount_payment DESC) AS position FROM restaurant_activity ) SELECT amount_payment, waiter_name, start_date, table_number, position FROM ranking WHERE position <= 5 ORDER BY position Here, we used a CTE (or Common Table Expression) to build the ranking. In the CTE, we calculated the position of each order by using the window function RANK() combined with the following OVER() clause: RANK() OVER (ORDER BY amount_payment DESC) The above clause defines a window of records which is formed by all records in the table, so all orders of the restaurant are included. This window is sorted by amount_payment in descending order: the largest amount is the first record in the window, and so on. The RANK() function returns the position of the current row in the ordered window, storing this value in the position column of the CTE. After we build the ranking CTE, the rest of the query uses ranking like any other table. We filter records to obtain only the first 5 ranks, then we order the results by position. This is what we get: amount_paymentwaiter_namestart_datetable_numberposition 670.12John2022-05-0511 560.75Peter2022-05-0622 430.50Peter2022-05-0523 350.00John2022-05-0514 320.50Mary2022-05-0535 At this point, I would like to suggest the article SQL Window Functions vs. GROUP BY: What's the Difference?. It explains the differences between the GROUP BY clause and window functions through several SQL query examples. Another interesting article with additional details is What Is the OVER() Clause in SQL?. Example Query #4: Computing Table Free Time with OVER() and LAG() Window functions include many ways to make complex calculations easier. One of them is the LAG() function, which returns a value from any column in the previous row related to the current row of the window. The owner of the restaurant wants to know how much time the tables are free, e.g. the time that elapses between customers. We can do this by creating a report with the columns table_number, date, free_start, free_end, and free_time_duration. To calculate the free time, we need to access two rows. We need the end_time of the previous occupation and the start_time of the next occupation; then we can calculate the elapsed time between the two. Here is where the LAG() window function comes in, because LAG() allows access to any column in the previous record. Here’s the query we’d use: SELECT start_date AS date, table_number, -- ending time of the previous occupation COALESCE(LAG(end_time) OVER (PARTITION BY start_date, table_number ORDER BY start_time),'11:00') AS start_free_time, -- starting time of current occupation start_time AS end_free_time, -- calculating the free time when the table was unoccupied start_time - COALESCE(LAG(end_time) OVER (PARTITION BY start_date, table_number ORDER BY start_time),'11:00') AS free_time_duration FROM restaurant_activity The key in the previous query is the LAG() window function. We used it to obtain the end_time of the previous occupation. The clause … OVER (PARTITION BY start_date, table_number ORDER BY start_time) … defines one window (or set of rows) for each distinct pair of <start_date, table_number>, and each of these windows are ordered by start_time. Then LAG(end_time) returns the ending time of the previous occupation of the table. You’ve probably noticed we used LAG() two times. The first one is used to obtain the starting time of the free period, and the second is to calculate the duration of the free time using the following expression: start_time - coalesce(LAG(end_time) OVER (PARTITION BY start_date, table_number ORDER BY start_time),'11:00') The results of the query are: datetable_numberstart_free_timeend_free_timefree_time_duration 5/5/2022111:00:0011:03:0000:03:00 5/5/2022111:45:0012:00:0000:15:00 5/5/2022211:00:0011:10:0000:10:00 5/5/2022311:00:0011:30:0000:30:00 5/5/2022312:40:0012:43:0000:03:00 6/5/2022111:00:0011:30:0000:30:00 6/5/2022112:30:0014:30:0002:00:00 6/5/2022211:00:0011:10:0000:10:00 6/5/2022311:00:0011:40:0000:40:00 6/5/2022312:40:0014:10:0001:30:00 Before going to the next section, I suggest the following articles for more details: SQL Window Function Example With Explanations How to Rank Rows in SQL: A Complete Guide The LAG Function and the LEAD Function in SQL How to Calculate the Length of a Series with SQL Example Query #5: Computing Rankings with the OVER() Clause In this section, we will cover a business situation where the SQL OVER() clause can be applied to create a ranking like the top 10 salespeople or 5 best-selling products. You can use the OVER() clause combined with the RANK() window function to obtain this kind of report. Let’s see an example query that will return the two largest tips of the day and the waiter receiving them: SELECT * FROM ( SELECT waiter_name, start_date AS date, total_tips AS tip_amount, RANK() OVER(PARTITION BY start_date ORDER BY total_tips DESC) AS ranking FROM restaurant_activity ) AS ranking_table WHERE ranking <= 2; We used a subquery in the FROM clause to create a temporary “table” called ranking_table. It uses the column ranking to store the position of the tip in the daily tip ranking. The position in the ranking is calculated with the RANK() window function. Other columns in the temporary table are waiter_name, date, and tip_amount. In the outer query, we filter only the tips ranked 1 and 2. The result of the query is shown below: waiter_namedatetip_amountranking John5/5/2022501 John5/5/2022372 Peter6/5/2022601 Mary6/5/2022302 Example Query #6: Computing Differences from a Previous Period Another interesting possibility is showing the difference in value from the previous time period and the current one. The OVER() clause combined with window functions like LEAD() and LAG() are used to create this kind of report. For additional background, see How to Calculate the Difference Between Two Rows in SQL. Let’s suppose we want to see each waiter’s daily revenue alongside the difference from the previous day. In the same row, we also want to see the difference expressed as a percentage. Here’s the query: SELECT waiter_name, date, today_revenue, -- revenue variation ---------------------------------------------- LAG(today_revenue) OVER ( PARTITION BY waiter_name ORDER BY date) yesterday_revenue, today_revenue - LAG(today_revenue) OVER ( PARTITION BY waiter_name ORDER BY date) AS revenue_variation, -- ----------------------------------------------------------------- -- revenue variation percentage ------------------------------------------------ round((today_revenue - LAG(today_revenue) OVER (PARTITION BY waiter_name ORDER BY date)) / LAG(today_revenue) OVER (PARTITION BY waiter_name ORDER BY date),2) * 100 AS revenue_variation_percentage -- ----------------------------------------------------------------------------- FROM ( SELECT DISTINCT start_date::date AS "date", waiter_name, SUM(total_tips) OVER ( PARTITION BY waiter_name, start_date::date ORDER BY start_date) AS today_revenue FROM restaurant_activity ) AS daily_revenue_per_waiter; We’ve created a subquery in the FROM clause called daily_revenue_per_waiter that has the waiter_name, date, and the total revenue made by this waiter on this day. In the outer query, we use the LAG() window function to obtain the revenue in the previous day, then we obtain the variation of the revenue between yesterday and today and the percentage variation. The result is: waiter_namedatetoday_revenueyesterday_revenuerevenue_variationrevenue_variation_percentage John2022-05-0567nullnullnull John2022-05-061067-57-85.00 Mary2022-05-0540nullnullnull Mary2022-05-06804040100.00 Peter2022-05-0550nullnullnull Peter2022-05-0660501020.00 Example Query #7: Moving Average Moving averages are a metric that’s frequently used to smooth out short-term fluctuations. You can learn more in the article What a Moving Average Is and How to Compute it in SQL; let’s see how we can build one using the OVER() clause. As an example, let’s suppose the restaurant owner wants to know the average amount paid per person on the last table served at any given time. He uses this average as a metric to know how much money the diners are paying and to activate some promotions or discounts. We can easily calculate this average with the expression amount_payment/total_diners; however, the owner has realized that this metric has major fluctuations, so he decided to use the average amount paid per person on the last 3 and 6 tables served. The query to calculate this report is: SELECT start_date AS "date", start_time AS "time", table_number, amount_payment AS total_amount, total_diners, ROUND(amount_payment/total_diners,2) AS diner_avg, ROUND(AVG(amount_payment/total_diners) OVER (ORDER BY start_date, start_time ROWS BETWEEN 2 PRECEDING AND CURRENT ROW),2) AS diner_moving_avg_last_3_tables_served, ROUND(AVG(amount_payment/total_diners) OVER (ORDER BY start_date, start_time ROWS BETWEEN 5 PRECEDING AND CURRENT ROW),2) AS diner_moving_avg_last_6_tables_served FROM restaurant_activity The previous query calculates 3 different averages. The first one is a simple average based on the expression: ROUND(amount_payment/total_diners,2) The second average is the moving average for the last 3 tables served; the third average is just the same, but for the last 6 tables served: ROUND(AVG(amount_payment/total_diners) OVER (ORDER BY start_date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW),2) … OVER (ORDER BY start_date, start_time ROWS BETWEEN 5 PRECEDING AND CURRENT ROW),2) The term “ROWS BETWEEN 2 PRECEDING AND CURRENT ROW” specifies that the average is calculated based on 3 rows: the current row and the 2 rows immediately preceding it. The window is ordered by table starting time. Here’s the result: datetimetable_numbertotal_amountdinersdiner_avgdiner_moving_avg_last_3_tables_serveddiner_moving_avg_last_6_tables_served 2022-05-0511:031350.003116.67116.67116.67 2022-05-0511:102430.504107.63112.15112.15 2022-05-0511:303260.352130.18118.16118.16 2022-05-0512:001670.124167.53135.11130.50 2022-05-0512:433320.503106.83134.85125.77 2022-05-0611:102560.755112.15128.84123.50 2022-05-0611:403240.10380.0399.67117.39 2022-05-0611:301150.001150.00114.06124.45 2022-05-0614:103240.101240.10156.71142.77 2022-05-0614:301150.00275.00155.03127.35 Other business uses of the OVER() clause include computing running totals (useful in all kinds of financial analysis scenarios) and calculating the length of a data series. Ready to Practice the SQL OVER() Clause? We’ve demonstrated several queries featuring the SQL OVER() clause. As OVER() must be used combined with a window function, we covered a few of those, too: SUM(), AVG(), LAG() and RANK(). If you want to put your new knowledge of the SQL OVER() clause into practice, I recommend our interactive Window Functions course followed by our Window Functions Practice Set. You can read about the course in the article SQL Course of the Month – Window Functions. You can also get a copy of our free SQL Window Functions Cheat Sheet to help you on your learning path. Develop your skills and increase your assets! Tags: window functions