11th Aug 2021 9 minutes read

How to Find the Nth-Highest Salary by Department with SQL

Window Functions

Table of Contents

The Data I’ll Be Using
What Is the Task Here?
Using NTH_VALUE()
Using ROW_NUMBER()
Using RANK()
Using DENSE_RANK()
Overview of the Concepts and Their Differences
Different Ways to Get the SQL Nth Value

Learn how to find the nth-highest salary in SQL, and you’ll learn how to get the nth value in any data.

In this article, I’ll focus on one problem and give you several solutions to that problem. The problem is stated in the title already: find the nth-highest salary by department using SQL.

This may sound too specific. But by learning how to solve this, you’ll be able to find the nth value in any data, not just salaries. You’ll get an idea of how to solve other similar problems you come across.

As I said, we will look at just one problem but explore several solutions for it. In fact, I’ll show you four solutions using these window functions:

NTH_VALUE()
ROW_NUMBER()
RANK()
DENSE_RANK()

To learn about SQL window functions, I recommend our interactive Window Functions course. It contains over 200 hands-on exercises to teach you all there is to know about window functions. I you already know window functions, consult our SQL Window Functions Cheat Sheet.

The Data I’ll Be Using

We will use data stored in the following two tables:

employee
department

Here’s what the data in the table employee looks like:

id	first_name	last_name	salary	department_id
1	Goraud	Tomankiewicz	7,231.06	1
2	Harwell	Winny	8,139.51	1
3	Bastien	Goosnell	4,574.20	1
4	Zachariah	Rapi	6,657.11	1
5	Giustino	Cruikshank	5,555.63	1
6	Abra	Clemon	5,564.25	3
7	Hurlee	Drance	9,790.16	2
8	Rozanna	McIlvoray	3,201.18	2
9	Ive	Strathdee	9,300.25	2
10	Lois	Skain	5,371.02	2
11	Debor	Holby	2,804.29	3
10	Hadrian	Robatham	2,615.78	3
13	Dix	Sowter	6,378.12	3
14	Leslie	Sandle	8,805.70	3
15	Dagny	Rosier	2,041.26	3

The table department has the following data:

id	department_name
1	Research and Development
2	Accounting
3	Human Resources

What Is the Task Here?

Let’s find the third-highest salary by department. This means finding the third-highest value, not overall but within each subset, where a subset has the salaries for a given department. The most helpful tool for doing this is the window functions.

So, here’s the first solution using a window function.

Using NTH_VALUE()

The purpose of the NTH_VALUE() function is to get the value of the nth row in the dataset. Here’s how we can use it to get the third-highest salary by department:

SELECT	e.first_name,
		e.last_name,
		d.department_name,
		salary,
		NTH_VALUE (salary, 3) OVER (
PARTITION BY department_name ORDER BY salary DESC
			RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS third_highest_salary
FROM department d JOIN employee e ON d.id = e.department_id;

This code first selects some columns from the tables employee and department. To use NTH_VALUE(), we have to specify the column and the value of N. Since we want to get the third-highest salary, the column is salary, and N = 3; hence, we have NTH_VALUE(salary, 3). This will get us the third-highest salary.

For a window function to work, we need to use an OVER() clause. Remember, we are looking for the third-highest salary in each department. To do this, we use PARTITION BY to divide the data into subsets by the column department_name from the table department. Since we’re looking for the third-highest salary, we use ORDER BY to make the window functions perform the calculations from the highest salary to the lowest. This is the reason for ORDER BY salary DESC in the code. Also, we need the RANGE clause, in this case, RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING. This means the window function will look between the first row and the last row of the partition.

Of course, to get all this, we need to join the two tables.

Here’s the result:

first_name	last_name	department_name	salary	third_highest_salary
Hurlee	Drance	Accounting	9,790.16	5,371.02
Ive	Strathdee	Accounting	9,300.25	5,371.02
Lois	Skain	Accounting	5,371.02	5,371.02
Rozanna	McIlvoray	Accounting	3,201.18	5,371.02
Leslie	Sandle	Human Resources	8,805.70	5,564.25
Dix	Sowter	Human Resources	6,378.12	5,564.25
Abra	Clemon	Human Resources	5,564.25	5,564.25
Debor	Holby	Human Resources	2,804.29	5,564.25
Hadrian	Robatham	Human Resources	2,615.78	5,564.25
Dagny	Rosier	Human Resources	2,041.26	5,564.25
Harwell	Winny	Research and Development	8,139.51	6,657.11
Goraud	Tomankiewicz	Research and Development	7,231.06	6,657.11
Zachariah	Rapi	Research and Development	6,657.11	6,657.11
Giustino	Cruikshank	Research and Development	5,555.63	6,657.11
Bastien	Goosnell	Research and Development	4,574.20	6,657.11

The result tells us the third-highest salary in Accounting is 5,371.02, which is Lois Skain’s salary. Abra Clemon’s salary (5,564.25) is the third-highest salary in Human Resources. In Research and Development, Zachariah Rapi’s salary is the third highest (6,657.11).

Using ROW_NUMBER()

The second option for getting the third-highest salary by department is to use ROW_NUMBER(). This window function returns the sequence numbers of the rows in a dataset. If we order the salaries within each department, it'll be easy to pick the third highest. Here’s what the solution looks like:

SELECT	e.first_name,
		e.last_name,
		d.department_name,
		salary,
		ROW_NUMBER() OVER (
PARTITION BY d.id ORDER BY salary DESC
) AS salary_rank
FROM department d JOIN employee e ON d.id = e.department_id
ORDER BY department_name;

This solution selects all the same columns as in the previous solution. The difference is we use ROW_NUMBER() here instead of NTH_VALUE(). All criteria in the OVER() clause are the same as in the previous solution.

The result looks like this:

first_name	last_name	department_name	salary	salary_rank
Hurlee	Drance	Accounting	9,790.16	1
Ive	Strathdee	Accounting	9,300.25	2
Lois	Skain	Accounting	5,371.02	3
Rozanna	McIlvoray	Accounting	3,201.18	4
Leslie	Sandle	Human Resources	8,805.70	1
Dix	Sowter	Human Resources	6,378.12	2
Abra	Clemon	Human Resources	5,564.25	3
Debor	Holby	Human Resources	2,804.29	4
Hadrian	Robatham	Human Resources	2,615.78	5
Dagny	Rosier	Human Resources	2,041.26	6
Harwell	Winny	Research and Development	8,139.51	1
Goraud	Tomankiewicz	Research and Development	7,231.06	2
Zachariah	Rapi	Research and Development	6,657.11	3
Giustino	Cruikshank	Research and Development	5,555.63	4
Bastien	Goosnell	Research and Development	4,574.20	5

It looks a little different from the result of the previous solution. The last column does not contain the value of the third-highest salary. Instead, it contains a ranking, and we can easily see the effect is the same. The values of the third-highest salary by department are 5,371.02 (Accounting), 5,564.25 (Human Resources), and 6,657.11 (Research and Development), the same as what we got in the first solution.

We can also use ROW_NUMBER() in a CTE. If you’re not familiar with CTEs, this article explains what it is. For example, we can write the code below:

WITH salaries_ranks AS (
SELECT e.first_name,
 e.last_name,
 d.department_name,
 salary,
 ROW_NUMBER() OVER (PARTITION BY d.id ORDER BY salary DESC
 ) AS salary_rank
FROM department d JOIN employee e ON d.id = e.department_id
)

SELECT *
FROM salaries_ranks
WHERE salary_rank = 3;

We start our CTE using a WITH, followed by the CTE name, salaries_ranks. Then, we write a SELECT statement, which is exactly the same as in the previous code.

Now that we’ve written a CTE, we can use it like any other table. We do that in a separate SELECT statement which selects all the columns from the CTE salaries_ranks, showing only the rows where salary_rank = 3.

The result from this code is:

first_name	last_name	department_name	salary	salary_rank
Zachariah	Rapi	Research and Development	6,657.11	3
Lois	Skain	Accounting	5,371.02	3
Abra	Clemon	Human Resources	5,564.25	3

This is quite an elegant solution. We get only the necessary data – just three rows showing the third-highest salary for each department.

Using RANK()

The third option is to use the RANK() function. It is similar to ROW_NUMBER() in that it also ranks the rows within a partition. Similar, but not the same. I’ll talk about the differences later on, or you can read about them here.

Let’s write the code using RANK():

SELECT	e.first_name,
		e.last_name,
		d.department_name,
		salary,
		RANK() OVER (
PARTITION BY d.department_name ORDER BY salary DESC
) AS salary_rank
FROM department d JOIN employee e ON d.id = e.department_id;

Again, this code is not much different from the previous ones except for the specific window function used. Here, it’s RANK(), with all the same criteria in the OVER() clause. The result will be exactly the same as when we used ROW_NUMBER(), so I don’t think there’s any need to show the same result table again.

Using DENSE_RANK()

The last solution I will show you here is the DENSE_RANK() window function. Just like ROW_NUMBER() and RANK(), it ranks the values within a dataset.

The code is not really different, either. It’s the same as before, just using a different window function:

SELECT	e.first_name,
		e.last_name,
		d.department_name,
		salary,
		DENSE_RANK() OVER (
PARTITION BY d.department_name ORDER BY salary DESC 
		) AS salary_rank
FROM department d JOIN employee e ON d.id = e.department_id;

When we run this code, we get exactly the same result as we get with the previous two solutions.

Overview of the Concepts and Their Differences

As you have seen, you can use any of these four window functions to get the third-highest salary by department.

The NTH_VALUE() function explicitly shows you the value of the third-highest salary by department.

The ROW_NUMBER(), RANK(), and DENSE_RANK() functions rank the salaries within each department. Then, you can simply find the salary value associated with rank number 3. These three functions are similar but not the same. They seem identical in this example, but other data may expose their differences.

Here’s an explanation of how they are different to save you some unpleasant surprises.

The differences surface when you have ties (multiple rows with the same value) in your data. I knew I didn’t have any ties in my data, so I knew all three functions would give me the same solution.

But what if you do have ties? In that case, ROW_NUMBER() would sequentially allocate ranks; that is, the ties would have different rank numbers that are arbitrarily assigned. In contrast, RANK() allocates the same rank number to the same values and skips a rank number to compensate for it when it comes to a row with a different value. Finally, DENSE_RANK() allocates the same rank number to ties, without skipping a rank number at the next value. Here’s an example to illustrate:

employee	salary	row_number()	rank()	dense_rank()
1	2,000	1	1	1
2	4,000	3	3	3
3	3,000	2	2	2
4	8,000	5	5	4
5	4,000	4	3	3

If you’re looking for the value of the third-highest salary in this data, ROW_NUMBER() will give you the correct solution; the salary of 4,000 is the third-highest. However, it would give you an incorrect solution if you’re looking for all the employees with the third-highest salary. It would show only employee 2 when employee 5 should also be shown. In this case, RANK() or DENSE_RANK() would be a better choice.

What if you’re looking for the value of the fourth-highest salary? The ROW_NUMBER() function would give you a totally incorrect solution, since the third- and the fourth-highest values are the same. Using RANK() returns no result at all, because it skips the rank number four – as explained earlier, it allocates the same rank number to ties and skips the next rank number to compensate for it. Only DENSE_RANK() gives you a correct solution in this case.

Before deciding which function to use, be sure to understand the data and what you are trying to get as the solution. The safest way is to use NTH_VALUE(). If you want to use a ranking function, it’s generally best to use DENSE_RANK() when you don’t know if there are any ties in the data. If you use ROW_NUMBER() or RANK() instead, make sure you know what the result would look like. Maybe one or both these two functions give you what you need. Choose the function according to your needs.

You could also use these four functions when you want to find the maximum or the minimum salary by department. Or, for example, the most or the least expensive product by product category. Or any product with the nth-highest price. You could look for the nth-highest or the nth-lowest sales, revenue, hours worked, costs, number of likes, logins, engagements, streams, comments, etc.; you name it.

If you use MySQL, this comprehensive introduction will show you these four and all other window functions in MySQL.

Different Ways to Get the SQL Nth Value

The NTH_VALUE() function is perfect for finding the nth-highest salary or the nth value of any other column. After all, it was designed exactly for that purpose. I’ve shown you ways to get the same effect with three additional window functions: ROW_NUMBER(), RANK(), and DENSE_RANK(). Use whichever best suits your needs and data.

But these are just four window functions. It is just the tip of the iceberg, and there is so much more! These and other window functions are explained in this course on window functions.

Tags:

Window Functions