19th Oct 2021 9 minutes read

What Is a Recursive CTE in SQL?

Table of Contents

What are CTEs?
- Non-Recursive CTE Syntax
- Recursive CTE Syntax
Example 1 – Finding Bosses and Hierarchical Level for All Employees
Example 2 – Finding the Investment Amount by Investor
Example 3 – Finding Routes Between Cities
Continue Practicing Recursive CTEs

The article that’ll show you practical examples of using recursive CTEs in SQL.

If you’ve heard about SQL’s recursive CTEs but never used them, this article is for you. It’s also for you if you never get tired of recursive CTE examples.

Before we dig into recursion, I’ll remind you what CTEs are and what their syntax is. Then I’ll do the same for recursive CTEs. After that, I’ll show you how recursive CTEs work in three examples.

What are CTEs?

The CTE (common table expression), also known as the WITH clause, is an SQL feature that returns a temporary data set that can be used by another query. As it’s a temporary result, it’s not stored anywhere, but it still can be referenced like you would reference any other table.

There are two types of CTEs, non-recursive and recursive.

Here’s a nice article that will show you what CTEs are and how they work.

Non-Recursive CTE Syntax

The general syntax of a non-recursive CTE looks like this:

WITH cte_name AS (cte_query_definition)

SELECT *
FROM   cte_name;

The first part of the syntax is the CTE. It begins with the keyword WITH. Then you give your CTE a name. After you follow that by the AS keyword, you can define CTE in the parentheses.

The second part of the syntax is a simple SELECT statement. It is written immediately after the recursive CTE, without any commas, semicolons, or similar marks. Like I said earlier, the CTE is used in another query just like any other table. This is exactly what the SELECT statement does.

Here’s the article that can additionally help you with the CTE syntax and its rules. And if you need some more CTE examples, this article is for you.

Recursive CTE Syntax

A recursive CTE references itself. It returns the result subset, then it repeatedly (recursively) references itself, and stops when it returns all the results.

The syntax for a recursive CTE is not too different from that of a non-recursive CTE:

WITH RECURSIVE cte_name AS (
    cte_query_definition (the anchor member)

    UNION ALL

    cte_query_definition (the recursive member)
)


SELECT *
FROM   cte_name;

Again, at the beginning of your CTE is the WITH clause. However, if you want your CTE to be recursive, then after WITH you write the RECURSIVE keyword. Then it’s business as usual: AS is followed by the parentheses with the CTE query definition. This first query definition is called the anchor member.

To connect the anchor member with the recursive member, you need to use the UNION or UNION ALL command. The recursive member is, obviously, the recursive part of CTE that will reference the CTE itself. You’ll see how it works in an example very soon.

Recursive CTEs are used primarily when you want to query hierarchical data or graphs. This could be a company’s organizational structure, a family tree, a restaurant menu, or various routes between cities. See these articles to understand how CTEs work with hierarchical structures and how to query graph data.

Now that we understand how recursive CTEs work, let’s look at some examples.

Example 1 – Finding Bosses and Hierarchical Level for All Employees

For this problem, I’ll use data from the table employees, which has the following columns:

id: The employee’s ID.
first_name: The employee’s first name.
last_name: The employee’s last name.
boss_id: The employee boss’s ID.

Here’s what the data looks like:

id	first_name	last_name	boss_id
1	Domenic	Leaver	5
2	Cleveland	Hewins	1
3	Kakalina	Atherton	8
4	Roxanna	Fairlie	NULL
5	Hermie	Comsty	4
6	Pooh	Goss	8
7	Faulkner	Challiss	5
8	Bobbe	Blakeway	4
9	Laurene	Burchill	1
10	Augusta	Gosdin	8

It’s not too complicated. For example, Domenic Leaver’s boss is the employee with the ID of 5; that’s Hermie Comsty. The same principle works for all other employees except Roxanna Fairlie. She has no boss; there is a NULL value in the column boss_id. We can conclude that Roxanna is the president or owner of the company.

Let’s now write the recursive CTE to list all employees and their direct bosses.

WITH RECURSIVE company_hierarchy AS (
  SELECT	id,
    		first_name,
    		last_name,
    		boss_id,
		0 AS hierarchy_level
  FROM employees
  WHERE boss_id IS NULL

  UNION ALL 
  
  SELECT	e.id,
    		e.first_name,
    		e.last_name,
    		e.boss_id, 
		hierarchy_level + 1
  FROM employees e, company_hierarchy ch
  WHERE e.boss_id = ch.id
)

SELECT   ch.first_name AS employee_first_name,
	   ch.last_name AS employee_last_name,
	   e.first_name AS boss_first_name,
	   e.last_name AS boss_last_name,
	   hierarchy_level
FROM company_hierarchy ch
LEFT JOIN employees e
ON ch.boss_id = e.id
ORDER BY ch.hierarchy_level, ch.boss_id;

What does this query do? It’s a recursive query, so it starts with WITH RECURSIVE. The name of the CTE is company_hierarchy. After AS, the CTE definition is in the parentheses.

The first SELECT statement selects all the employee table columns where the column boss_id is NULL. In short, it will select Roxanna Fairlie, because only she has a NULL value in that column. Even shorter: I’m starting the recursion from the top of the organizational structure. There’s also a column hierarchy_level with the value of 0. That means the owner/president’s level is 0 – they're on top of the hierarchy.

I’ve used the UNION ALL to connect this SELECT statement with the second one, i.e. with the recursive member. In the recursive member, I’m selecting all the columns from the table employees and the CTE company_hierarchy where the column boss_id is equal to the column id. Notice the part hierarchy_level + 1. This means that with every recursion, the CTE will add 1 to the previous hierarchy level, and it will do that until it reaches the end of the hierarchy. Also note that I’m treating this CTE as any other table. To finish defining the CTE, simply close the parentheses.

Finally, there’s a third SELECT statement, outside of the CTE. It selects the columns that will show employees, their bosses’ names, and the hierarchy level. Data is taken from the CTE and the table employees. I’ve joined those two with a LEFT JOIN, since I want all the data from the CTE – including Roxanna Fairlie, who has the NULL value in the column boss_id. The result will be shown in ascending order: first by the hierarchy level, then by the boss’s ID. Here’s how it looks:

employee_first_name	employee_last_name	boss_first_name	boss_last_name	hierarchy_level
Roxanna	Fairlie	NULL	NULL	0
Hermie	Comsty	Roxanna	Fairlie	1
Bobbe	Blakeway	Roxanna	Fairlie	1
Domenic	Leaver	Hermie	Comsty	2
Faulkner	Challiss	Hermie	Comsty	2
Augusta	Gosdin	Bobbe	Blakeway	2
Pooh	Goss	Bobbe	Blakeway	2
Kakalina	Atherton	Bobbe	Blakeway	2
Laurene	Burchill	Domenic	Leaver	3
Cleveland	Hewins	Domenic	Leaver	3

Roxanna Fairlie is the ultimate boss; you already knew that. There are two employees at level 1. This means Bobbe Blakeway and Hermie Comsty are the direct subordinates of Roxanna Fairlie. At level 2, there are employees whose direct bosses are Bobbe Blakeway and Hermie Comsty. There’s also a third level in the hierarchy. Those are employees whose immediate boss is Domenic Leaver.

Example 2 – Finding the Investment Amount by Investor

In this example, I’ll use the table investment:

id: The investment’s ID.
investment_amount: The investment’s amount.

The data in the table looks like this:

id	investment_amount
1	9,705,321.00
2	5,612,948.60
3	5,322,146.00

These are the amounts of the three possible investment options. They will be considered by the three investors, who will divide the total investment amount in equal shares. Your task is to calculate the amount per investor depending on their number, i.e. if one, two, three, or no investors invest in each investment.

The query that solves this problem is:

WITH RECURSIVE per_investor_amount AS (
	SELECT	0 AS investors_number,
			0.00 AS investment_amount,
			0.00 AS individual_amount
	UNION 

	SELECT	investors_number + 1,
			i.investment_amount,
			i.investment_amount / (investors_number + 1)
	FROM investment i, per_investor_amount pia
	WHERE investors_number << 3
)

SELECT *
FROM per_investor_amount
ORDER BY  investment_amount, investors_number;

Once again, the CTE starts with WITH RECURSIVE, followed by its name and the query definition. This time, I’ll use the anchor member of the recursive query to create some data. The columns are investors_number, investment_amount, and individual_amount. This is the point I want the recursion to start from (the same way it was in the previous example, with hierarchy_level = 0).

Then comes the UNION and the recursive member. This part of the query will increase the investors_number column by one with every recursion. It will do that for every investment_amount. The third column will calculate the amount of that investment per investor, depending on the number of investors participating. The recursion will be performed for up to three investors (i.e. until it reaches the condition WHERE investors_number < 3).

After that comes the simple SELECT statement that will return all the columns from the CTE. And here’s the result:

investors_number	investment_amount	individual_amount
0	0.00	0.00
1	5,322,146.00	5,322,146.00
2	5,322,146.00	2,661,073.00
3	5,322,146.00	1,774,048.67
1	5,612,948.60	5,612,948.60
2	5,612,948.60	2,806,474.30
3	5,612,948.60	1,870,982.87
1	9,705,321.00	9,705,321.00
2	9,705,321.00	4,852,660.50
3	9,705,321.00	3,235,107.00

It’s not difficult to analyze. If there are no investors, the investment amount is zero, and so is the individual amount. If the investment is 5,322,146.00 and there’s only one investor, then the amount per investor will be 5,322,146.00. If there are two investors in the same amount, then each of them will have to pay 2,661,073.00. If all three investors decide to invest, each will pay 1,774,048.67. The other two investment amounts follow the same pattern, as you can see in the table.

Example 3 – Finding Routes Between Cities

In the third example, I’ll be using the table cities_route, which contains data about Dutch cities:

city_from: The departure city.
city_to: The destination city.
distance: The distance between two cities, in kilometers.

city_from	city_to	distance
Groningen	Heerenveen	61.4
Groningen	Harlingen	91.6
Harlingen	Wieringerwerf	52.3
Wieringerwerf	Hoorn	26.5
Hoorn	Amsterdam	46.1
Amsterdam	Haarlem	30
Heerenveen	Lelystad	74
Lelystad	Amsterdam	57.2

Use this table to find all the possible routes from Groningen to Haarlem, showing the cities on the route and the total distance.

Here’s the query to solve this problem:

WITH RECURSIVE possible_route AS (
	SELECT	cr.city_to,
       		cr.city_from || '->' ||cr.city_to AS route,
       		cr.distance
      FROM cities_route cr
      WHERE cr.city_from = 'Groningen'

UNION ALL

SELECT 	cr.city_to,
       		pr.route || '->' || cr.city_to AS route,
        		CAST((pr.distance + cr.distance) AS DECIMAL(10, 2))
      FROM possible_route pr
INNER JOIN cities_route cr
      		ON cr.city_from = pr.city_to
)

SELECT 	pr.route,
		pr.distance
FROM possible_route pr
WHERE pr.city_to = 'Haarlem'
ORDER BY pr.distance;

Let’s see what this query does. The first SELECT statement in the CTE definition will select the columns from the table cities_route where the departure city is Groningen. Notice there’s also a new column called route, which I’ll use to concatenate the cities on the route.

The UNION ALL connects this with the recursive member. This SELECT statement will select the arrival city, concatenate the cities on the route, and finally add the distances between these cities to the total of the route between Groningen and Haarlem. To achieve all that, I’ve joined the CTE with the table cities_route.

Then comes the SELECT statement that pulls data from the CTE. It will select the route and the distance where the arrival city is Haarlem, with the data being ordered by distance in ascending order.

The query result looks like this:

route	distance
Groningen->Heerenveen->Lelystad->Amsterdam->Haarlem	222.6
Groningen->Harlingen->Wieringerwerf->Hoorn->Amsterdam->Haarlem	246.5

It’s not difficult to understand this table. There are two routes from Groningen to Haarlem. They include different cities in between and they are 222.6 km and 246.5 km long, respectively.

If you want to keep learning, check out how you can use a recursive CTE instead of a long SQL query. And after tackling that subject,have a little fun by drawing something using a recursive CTE.

Continue Practicing Recursive CTEs

These three examples have demonstrated the possibilities of recursive CTEs in SQL. Now it’s time to build on what you learned here.

Probably the best option is having a go at our Recursive Queries course. It offers you plenty of examples, explanations, and practice opportunities. The course is part of the Advanced SQL course track, where you can learn about other advanced SQL topics like window functions, GROUP BY extensions, and recursive queries. Our platform offers many ways to practice these topics.

Have fun!

Tags: