21st May 2020 8 minutes read How to Number Rows in an SQL Result Set Ignacio L. Bisso window functions Table of Contents A Frequent Request: Could You Number the Records? Going Deeper: The Partition By and Order By Clauses Using ROW_NUMBER to Remove Duplicates Create a Ranking Report with ROW_NUMBER Have you ever needed to add a sequential number to the records in the result of an SQL query? It’s not as simple as it looks! Find out how to do it correctly in this article. To number rows in a result set, you have to use an SQL window function called ROW_NUMBER(). This function assigns a sequential integer number to each result row. However, it can also be used to number records in different ways, such as by subsets. You can even use it to number records for other interesting purposes, as we will see. To learn how window functions work, what functions there are, and how to apply them to real-world problems, it’s best to take the Window Functions course. You can find all about it here. It’s interactive, there are 218 exercises, and you only need a web browser and some basic SQL knowledge. A Frequent Request: Could You Number the Records? Suppose you work for a car sales company and you want to produce the following report. Note that the first column (row_num), is not a table column; we generate it using ROW_NUMBER() in the query. row_numArticle_codeArticle_nameBranchUnits_sold 1101Katan 2.3 LuxNew York23 2102Katan 1.8 StdNew York17 3102Katan 1.8 StdSan Francisco18 4101Katan 2.3 LuxSan Francisco15 5103Katan GoldNew York3 Result table The query to obtain the report is: SELECT ROW_NUMBER() OVER () AS row_num, article_code, article_name, branch, units_sold FROM Sales WHERE article_code IN ( 101, 102, 103 ) In the above query, the syntax of the ROW_NUMBER() function is very simple: we use an empty OVER clause. This means that we want to number all the records in the result set using just one sequence of numbers, assigning numbers to records without any order. This is the simplest way to use the ROW_NUMBER() function: ROW_NUMBER() OVER () AS row_num However, there are other ways to use ROW_NUMBER(). We can add a PARTITION BY and/or an ORDER BY clause to the OVER, as we’ll see in the next section. The PARTITION BY clause allows us to number several groups of records independently, while the ORDER BY clause allows us to number the records in specific order. In the next section, we will see some examples. Before continuing with ROW_NUMBER(), we should say a few words about SQL window functions. As we mentioned previously, ROW_NUMBER() is a window function. There are many other window functions that you can use in your queries, like AVG(), MAX(), LEAD(), LAG() and FIRST_VALUE(). If you want to go into details, I suggest LearnSQL’s Window Functions course. It’s a step-by-step tutorial that takes you through SQL window functions using examples and exercises. Going Deeper: The Partition By and Order By Clauses In the previous section, we covered the simplest way to use the ROW_NUMBER() window function, i.e. just numbering all records in the result set in no particular order. In the next paragraphs, we will see three examples with some additional clauses, like PARTITION BY and ORDER BY. In our first example, we will number the records using a different sequence for each company branch, which will be ordered by the units sold at that branch. In the next query, the PARTITION BY branch clause groups the records that have the same value in branch, assigning a different ROW_NUMBER sequence to each group/branch. (Each group has a different color in the image below.) The ORDER BY units_sold clause defines the order in which we process the rows inside the partition. In this case, the rows belonging to each partition will be ordered by unit_sold in descending order. SELECT ROW_NUMBER() OVER (PARTITION BY branch ORDER BY units_sold DESC) AS row_num, article_code, article_name, branch, units_sold FROM Sales WHERE article_code IN ( 101, 102, 103 ) Note that in this query, sequences are assigned per branch – in the image below, each group of records has a different color – and ordered by units_sold. The clause we used is: ROW_NUMBER() OVER (PARTITION BY branch ORDER BY units_sold DESC) This clause allows us to number groups of records that have the same value in the branch column. In our example, there are two groups of records: New York (red) and San Francisco (blue). Now each group of records will be numbered (the ORDER BY clause) based on the units_sold column. The values are shown in descending order. row_numArticle_codeArticle_nameBranchUnits_sold 1101Katan 2.3 LuxNew York23 2102Katan 1.8 StdNew York17 3103Katan GoldNew York3 1102Katan 1.8 StdSan Francisco18 2101Katan 2.3 LuxSan Francisco15 Result table The OVER, PARTITION BY, and ORDER BY clauses are very common in window functions; if you want to go into details, I suggest you read the article Window Function Examples, where you will find several window functions examples explained in detail. Using ROW_NUMBER to Remove Duplicates Another interesting use case for the ROW_NUMBER() function is when we have full duplicated records in a table. Full duplicated records happen when the table has more than one record with the same values in all their columns (usually due to a previous fail). We will show some SQL code to fix this situation; moreover, this code can be adapted to any case of full duplicated records. First of all, let’s insert some full duplicated records into the Sales table. Suppose we don’t have a primary key in the Sales table and an SQL developer mistakenly executes the following INSERT statement: INSERT INTO sales SELECT * FROM sales WHERE branch = 'San Francisco'; After the INSERT execution, the table Sales looks like this. The last two rows are full duplicates: Article_codeArticle_nameBranchUnits_soldPeriod 101Katan 2.3 LuxNew York23Q1-2020 102Katan 1.8 StdNew York17Q1-2020 102Katan 1.8 StdSan Francisco18Q1-2020 101Katan 2.3 LuxSan Francisco15Q1-2020 103Katan GoldNew York3Q1-2020 102Katan 1.8 StdSan Francisco18Q1-2020 101Katan 2.3 LuxSan Francisco15Q1-2020 Table: Sales To remove the duplicate records, we will add a new column called row_num and we’ll populate it with the following INSERT that uses the ROW_NUMBER() function. Note that we PARTITION BY all columns in the table. Here is the SQL code: ALTER TABLE sales ADD COLUMN row_num INTEGER; INSERT INTO sales SELECT article_code, article_name, branch, units_sold, period, ROW_NUMBER() OVER (PARTITION BY article_code,article_name,branch, units_sold,period) FROM sales ; Then, after adding the new column and populating it with ROW_NUMBER(), our table looks like this: Article_codeArticle_nameBranchUnits_soldPeriodrow_num 101Katan 2.3 LuxNew York23Q1-2020NULL 102Katan 1.8 StdNew York17Q1-2020NULL 102Katan 1.8 StdSan Francisco18Q1-2020NULL 101Katan 2.3 LuxSan Francisco15Q1-2020NULL 103Katan GoldNew York3Q1-2020NULL 102Katan 1.8 StdSan Francisco18Q1-2020NULL 101Katan 2.3 LuxSan Francisco15Q1-2020NULL 101Katan 2.3 LuxNew York23Q1-20201 102Katan 1.8 StdNew York17Q1-20201 102Katan 1.8 StdSan Francisco18Q1-20201 101Katan 1.8 LuxSan Francisco15Q1-20201 103Katan GoldNew York3Q1-20201 102Katan 1.8 StdSan Francisco18Q1-20202 101Katan 2.3 LuxSan Francisco15Q1-20202 Table: Sales It’s easy to see we need to remove all records with a NULL or a 2 in the column row_num. Let’s do it with the DELETE command. After that, we need to remove the column row_num. Here’s the code: DELETE FROM sales WHERE row_rank IS NULL OR row_rank = 2; ALTER TABLE sales DROP COLUMN row_rank; After running the DELETE and ALTER statements, the table Sales is fixed, with no duplicate records. There is a disadvantage with this approach that we must clarify. At some point in the process, the number of records in the table Sales will be doubled. That can make this method inefficient, especially with large tables. Thus, we recommend using it only on small and medium tables. Create a Ranking Report with ROW_NUMBER In this section, we will use the ROW_NUMBER() function to create a ranking. We will see that there are better functions for ranking, like RANK and DENSE_RANK; however, we can build a pretty good ranking report by using ROW_NUMBER(). Let’s suppose that once a year our car sales company gives three bonuses to their salespeople: one bonus is for the person who sold the most units, another bonus is for the person who made the most revenue, and the third bonus is for the person who made the most profit. If any bonus category is won by two sales reps, then both reps receive 50% of the bonus. We will use the table Sellers_2019 to obtain the rankings and define the winner of each bonus. Seller_nameUnits_soldRevenueProfit John Doyle123834.00038% Mary Smith121914.00039% Susan Graue123874.00039% Simon Doe117824.00042% Henry Savosky120813.00035% Table: Sellers_2019 The following query will return the rankings we need to define the bonus winners. These rankings will be on the columns units_ranking, revenue_ranking, and profit_ranking. SELECT seller_name, ROW_NUMBER() OVER (ORDER BY units_sold desc) units_ranking, ROW_NUMBER() OVER (ORDER BY revenue desc) revenue_ranking, ROW_NUMBER() OVER (ORDER BY profit desc) profit_ranking FROM sellers; The next image shows the results of the previous query with the ROW_NUMBER() function. Initially, the ranking values seem to be correct. But if we review them in detail, we can find a mistake with Susan Graue in Units_ranking. Susan (and John Doyle) won the units sold bonus with 123 units. However, the ROW_NUMBER function assigns position 1 to John and position 2 to Susan, which is not correct. Seller_nameUnits_rankingRevenue_rankingProfit_ranking John Doyle134 Mary Smith312 Susan Graue223 Simon Doe541 Henry Savosky455 Result table So, in this case the ROW_NUMBER() function is not the best choice for ranking calculations. Fortunately, SQL provides two functions specifically for ranking purposes: RANK() and DENSE_RANK(). The next query uses the RANK() function instead of the ROW_NUMBER(): SELECT seller_name, RANK() OVER (ORDER BY units_sold desc) units_ranking, RANK() OVER (ORDER BY revenue desc) revenue_ranking, RANK() OVER (ORDER BY profit desc) profit_ranking FROM sellers; In the next image, we can see the results of the RANK() query. We can verify that the issue with Susan’s rank is solved. Seller_nameUnits_rankingRevenue_rankingProfit_ranking John Doyle134 Mary Smith312 Susan Graue122 Simon Doe541 Henry Savosky455 Result table If you are interested in the RANK() and DENSE_RANK() window functions, I suggest you read the article How to use RANK functions. It will give you several examples and queries. Tags: window functions