21st Feb 2023 18 minutes read

6 Useful Examples of CTEs in SQL Server

Table of Contents

CTE Syntax in SQL Server
6 Examples of CTEs in SQL Server
When Should You Use CTEs?
CTEs Are the Door to Advanced SQL Usage!

How can you use CTEs in SQL Server in your everyday professional life as a data pro? We’ll answer this question by giving you six examples.

CTE is short for Common Table Expression. This is a relatively new feature in SQL Server that was made available with SQL Server 2005.

A CTE is a temporary named result. This result is available only for the query that runs it. It isn’t stored, so it doesn't take up disk space. A CTE is somewhat similar to a temporary table and can be used like any other table. CTEs are most often used with a SELECT statement, but they can be used with INSERT, UPDATE, and DELETE as well.

CTEs are one of the most challenging concepts in SQL Server. To reap their benefits, your approach to learning them should be carefully structured and not rushed. Our Recursive Queries in MS SQL Server course will show you how to write a simple CTE in SQL Server for a start. Then you’ll learn to write multiple CTEs, nest them, and use them within SELECT, INSERT, UPDATE, and DELETE statements. Finally, you’ll learn about hierarchical and graph data structure and how to use recursive CTEs in SQL Server to query such data. To ensure you get enough practice, there are 112 interactive exercises to complete in the course.

There are, of course, some other ways to learn CTEs that you can also check out.

CTE Syntax in SQL Server

Generally, CTE syntax in SQL Server is like the following example:

WITH cte AS (  
  SELECT
    ...
)

SELECT
  ...
FROM cte;

CTEs must always start with the keyword WITH. Then comes the CTE’s name, the AS keyword, and the parentheses. You define the CTE in those parentheses. Defining it, as you’ll see in our examples, means writing the SELECT statement. You can find more details in this article explaining what a CTE is.

6 Examples of CTEs in SQL Server

1: Find the Average Highest and Lowest Numbers of Daily Streams

In the first five examples, we’ll be using the same dataset. It shows some made-up data from an imaginary music streaming platform; let’s call it Terpsichore.

The dataset consists of three tables. The first is artist, and here’s the create table query. This table contains the following columns:

id – The artist’s ID and the table’s primary key.
artist_name – The artist’s name.

id	artist_name
1	Prince
2	Jimi Hendrix
3	Santana

This table shows three artists.

The next table is albums. Here’s the query for creating it. And here are the columns it contains:

id – The album’s ID and the table’s primary key.
artist_id – The artist (and the table’s foreign key).
album_title – The album’s title.
year_released – The year of the album’s release.

id	artist_id	album_title	year_released
1	2	Are You Experienced	1967
2	2	Axis: Bold as Love	1967
3	1	Dirty Mind	1980
4	2	Electric Ladyland	1968
5	3	Abraxas	1970
6	1	1999	1982
7	3	Santana III	1971
8	3	Santana	1969
9	1	Prince	1979
10	1	Controversy	1981

There are ten albums in the table.

The last table is streams. It shows streaming data for the individual songs. You can create the table using this query. And the columns:

id – The stream ID and the table’s primary key.
artist_id – The artist’s ID and a foreign key.
album_id – The album’s ID and a foreign key.
song_title – The name of the song.
date – The date of the stream.
number_of_streams – The number of times the song was played on a particular date.
pay_per_stream – Value (in dollars) Terpsichore pays to the artists for each stream.

id	artist_id	album_id	song_title	date	number_of_streams	pay_per_stream
1	1	9	I Wanna Be Your Lover	2023-01-01	597	0.013
2	1	6	Little Red Corvette	2023-01-01	497	0.013
3	1	6	D.M.S.R.	2023-01-01	217	0.013
4	1	3	Uptown	2023-01-01	9748	0.013
5	1	3	Do It All Night	2023-01-01	208	0.013

There are 45 rows in this table. We’ll show you only the first five, so you get the feeling of the table’s logic.

Now, the example! Let’s start with writing only one CTE in SQL Server. We will do that to calculate the average highest and lowest number of daily streams.

Here’s the code; we’ll explain it below:

WITH daily_streaming AS (
  SELECT date,
	   MIN(number_of_streams) AS minimum_streaming,
	   MAX(number_of_streams) AS maximum_streaming
  FROM streams
  GROUP BY date
)

SELECT AVG(minimum_streaming) AS average_minimum_daily_streaming,
	 AVG(maximum_streaming) AS average_maximum__daily_streaming
FROM daily_streaming;

A CTE is brilliant for breaking down the logic of any calculation in SQL Server.

Remember, there are multiple songs streamed in one day. Our CTE’s purpose is to get the lowest and highest number of streams each day.

As mentioned previously, a CTE in SQL Server always starts with the keyword WITH, which is followed by the CTE name. Our CTE is named daily_streaming.

After the keyword AS comes the parentheses with a SELECT statement – i.e. the CTE definition. We use it, along with the MIN() and MAX() aggregate functions, to calculate the highest and lowest number of streams by date.

The next SELECT statement uses data from the CTE, referencing it in the FROM clause. As we said, a CTE can be used as any other table. In this SELECT, we use the AVG() aggregate function to get the average of the daily stream peaks and lows.

The output shows the average lowest point is 90 streams. The average of the top daily streams is 8,367.

average_minimum_daily_streaming	average_maximum__daily_streaming
90	8,367

2: Calculate the Average Total Fee Paid per Song

Let’s now practice what we learned. We’ll again write a single CTE.

The problem that needs solving is finding the average total fee Terpsichore paid for each song.

Here’s the solution:

WITH paid_per_song AS (
  SELECT song_title,
	   SUM(number_of_streams * pay_per_stream) AS total_pay
  FROM streams
  GROUP BY id, song_title, pay_per_stream
)

SELECT song_title,
       AVG(total_pay) AS average_total_pay
FROM paid_per_song
GROUP BY song_title
ORDER BY average_total_pay DESC;

We use SQL Server CTE to calculate the total fee paid per song by multiplying the number of streams with the pay per stream, and then summing it using the SUM() aggregate function.

There are no changes regarding the CTE syntax: first comes WITH, then the CTE name, and AS comes after that.

We then use a SELECT that invokes the CTE to calculate the average pay per song. It’s simple: use AVG(), reference the CTE in FROM, and group by the song title.

The query returns the following result:

song_title	average_total_pay
Uptown	47.4803330
I Wanna Be Your Lover	36.8203330
Little Red Corvette	33.8693330
The Wind Cries Mary	23.6138660
Do It All Night	12.4063330
If 6 Was 9	7.7824000
Samba Pa Ti	7.5735000
All Along the Watchtower	5.2032000
Bold as Love	4.7424000
Burning of the Midnight Lamp	3.7333330
D.M.S.R.	3.1633330
Taboo	2.4871000
Jingo	2.1604000
Everything's Coming Our Way	1.5466000
Incident at Neshabur	0.9207000

We see that the song ‘Uptown’ earned $47.4803330 in total. The second and third songs by earnings are ‘I Wanna Be Your Lover’ and ‘Little Red Corvette’. If you’re a fan of this artist, then you don’t need SQL to find who wrote all three of these songs.

3: Find Each Artist’s Most Streamed Album

In this exercise, you have to find the most streamed album by each artist. Output the artist name, the album title, and the number of streams by album.

This will, again, be a query with only one CTE. However, it’s a little more complex than the previous two – there are some JOINs and a window function.

WITH album_streaming AS (
  SELECT artist_id,
	   album_id,
	   SUM(number_of_streams) AS streams_by_album,
	   RANK() OVER (PARTITION BY artist_id ORDER BY SUM(number_of_streams) DESC) AS streaming_rank
  FROM streams
  GROUP BY artist_id, album_id
)

SELECT artist_name,
       album_title,
	 streams_by_album
FROM album_streaming alst
JOIN albums al
ON alst.album_id = al.id
JOIN artist ar 
ON al.artist_id = ar.id
WHERE streaming_rank = 1;

Again, the CTE syntax in SQL Server is familiar. So let’s focus on what this CTE does. We use it to rank the albums by artist. First, we select the artist and album IDs. Then we use SUM() to calculate the number of streams by album.

Now comes the crucial part – ranking the output using the RANK() window function. After the function is invoked, we see the OVER() clause – a mandatory clause for SQL window functions. We partition the dataset by the artist ID and order data within each partition by the number of streams, in descending order.

What does that mean in practice? It means the window function will rank the albums for one artist, then the rank is restarted once the function reaches the next artist, and so on. The artist’s album with the most streams will be ranked first in their partition.

If you run only this SELECT statement within the CTE, you’ll get this output:

artist_id	album_id	streams_by_album	streaming_rank
1	3	13,820	1
1	6	8,546	2
1	9	8,497	3
2	5	7,722	1
2	7	3,667	2
2	8	1,964	3
3	1	11,069	1
3	2	5,871	2
3	4	4,189	3

As you can see, the first artist’s albums are ranked from the first to the third, according to the number of streams. When we reach the second artist, the ranking restarts. The same is with the third artist.

Now, let’s see what the second SELECT statement does. Actually, it’s nothing complicated. It returns the artist and album name and the number of streams. What complicates this query is that we have to join three tables.

The first join is the CTE album_streaming. Then we join it with albums and then with the artist table. In the end, we filter data using the WHERE clause because we’re only interested in the most streamed album.

You’ll get this result:

artist_name	album_title	streams_by_album
Prince	Dirty Mind	13,820
Jimi Hendrix	Are You Experienced	11,069
Santana	Abraxas	7,722

It shows Prince’s most streamed album is ‘Dirty Mind’ with 13,820 streams. For Jimi Hendrix, the most streamed album is ‘Are You Experienced’, and for Santana, it’s ‘Abraxas’.

This solution uses the window functions, so here’s a reminder of how they work when ranking data.

4: Calculate the Average Streams per Song and Compare It With Average Streams per Date

Things are now getting more complicated. But not too much, don’t worry. We’re building on what we have learned so far about CTEs in SQL Server.

Here, we need to find the average number of streams per song. Then we need to calculate the average number of streams per date.

The output should show both metrics. Also, it should show the difference between the average stream per song and the daily average (as a percent difference), the song title, and the dates.

So far, we’ve been writing queries with one CTE. This time, the solution consists of two CTEs. Let’s see how this works:

WITH streams_per_song AS (
  SELECT song_title,
	   AVG(number_of_streams) AS average_streams_per_song
  FROM streams
  GROUP BY song_title
),

streams_per_date AS (
  SELECT date,
	   AVG(number_of_streams) AS average_streams_per_date
  FROM streams
  GROUP BY date
)

SELECT song_title,
	 average_streams_per_song,
	 date,
	 average_streams_per_date,
	 (average_streams_per_song - average_streams_per_date)/CAST(average_streams_per_date AS DECIMAL(10,2))*100 AS diff_from_daily_average
FROM streams_per_song, streams_per_date;

The first CTE is written as usual. We use it to calculate the average number of streams per song with AVG().

After closing the parentheses, the first CTE must be separated from the second CTE with a comma.

Then we write the second CTE. Behold! There’s no WITH! That’s right. When writing multiple CTEs in a query in SQL Server, you write WITH only in front of the first CTE. The second (and any subsequent CTE) starts with the CTE's name; everything else is the same.

This second query is for calculating the average number of streams per date. Again, we use the AVG() function.

The third SELECT uses data from both CTEs. It returns all the required columns. The last column is diff_from_daily_average. We calculate it by subtracting the average streams by date from the average streams by song. The difference is divided by the average streams by date and multiplied by 100 to get the percentage. Also, we converted the result into a decimal data type using the CAST() function.

Due to the output size, we’ll only show the first few rows:

song_title	average_streams_per_song	date	average_streams_per_date	diff_from_daily_average
All Along the Watchtower	813	2023-01-01	1,031	-21.14
Bold as Love	741	2023-01-01	1,031	-28.13
Burning of the Midnight Lamp	583	2023-01-01	1,031	-43.45
D.M.S.R.	243	2023-01-01	1,031	-76.43
Do It All Night	954	2023-01-01	1,031	-7.47

The data shows the average daily stream for 1 January 2023 is 1,031. ‘All Along the Watchtower’ is 21.14% below that average. The next two songs are 28.13% and 43.45% below the daily average, and so on.

5: Calculate the Highest and Lowest Average Album Pay by Artist

Let’s explain what we mean by that. We want first to find the average pay by album and date. Then we need to find the lowest and highest pay value by album. After that, we want to aggregate data by artist. Along with its name, we need to show the value of the lowest pay the artist got for an album. We need to do the same with the highest pay for an album.

The solution in SQL Server contains two CTEs. However, this time it’s a nested CTE. That’s when the second CTE references the first CTE. Let’s see how this works:

WITH pay_per_album AS (
  SELECT album_id,
	   date,
	   AVG(number_of_streams * pay_per_stream) AS average_pay_per_album
  FROM streams
  GROUP BY album_id, date
),
	
min_max_average_pay AS (
  SELECT album_id,
	   MIN(average_pay_per_album) AS lowest_average_pay_by_album,
	   MAX(average_pay_per_album) AS highest_average_pay_by_album
  FROM pay_per_album
  GROUP BY album_id
)

SELECT artist_name,
	 MIN(lowest_average_pay_by_album) AS lowest_album_pay_by_artist,
	 MAX(highest_average_pay_by_album) AS highest_album_pay_by_artist
FROM min_max_average_pay mmap
JOIN albums al
ON mmap.album_id = al.id
JOIN artist ar
ON al.artist_id = ar.id
GROUP BY artist_name;

Be careful when you read the code explanation! It’s easy to get lost in all these aggregations. To make things easier to follow, I’ve copied each part of the query and followed it with an explanation. We’ll start with the first CTE:

WITH pay_per_album AS (
  SELECT album_id,
	   date,
	   AVG(number_of_streams * pay_per_stream) AS average_pay_per_album
  FROM streams
  GROUP BY album_id, date
),

The first CTE calculates the average pay per album and the date. This is done by multiplying the number of streams by the pay per stream and using AVG().

min_max_average_pay AS (
  SELECT album_id,
	   MIN(average_pay_per_album) AS lowest_average_pay_by_album,
	   MAX(average_pay_per_album) AS highest_average_pay_by_album
  FROM pay_per_album
  GROUP BY album_id
)

When writing the second CTE in SQL Server, the syntax is the same as in the previous example – no additional WITH, start with the CTE name, and separate the CTEs with a comma. The only difference is that this time the second CTE references the first CTE, not the original dataset.

This nested CTE uses the MIN() and MAX() functions to find each album’s lowest and highest average pay for all dates. The first CTE is referenced in the FROM.

SELECT artist_name,
	 MIN(lowest_average_pay_by_album) AS lowest_album_pay_by_artist,
	 MAX(highest_average_pay_by_album) AS highest_album_pay_by_artist
FROM min_max_average_pay mmap
JOIN albums al
ON mmap.album_id = al.id
JOIN artist ar
ON al.artist_id = ar.id
GROUP BY artist_name;

Finally, there’s SELECT that joins the second CTE with the albums and artist tables. We again apply the MIN() and MAX() functions on the second CTEs result. This is to return only the values of the lowest and highest paid of all albums by each artist.

Here’s what we get after running the query:

artist_name	lowest_album_pay_by_artist	highest_album_pay_by_artist
Jimi Hendrix	1.59	63.19
Prince	4.64	97.31
Santana	0.91	10.22

6: Find the Longest Path Between Rotterdam and Amsterdam

This is a variation of the shortest path problem in graph theory, except we’ll look for the longest path.

A graph is a type of data structure that consists of nodes or points which are connected with edges. Since they are connected, finding a way from one node to another is possible even though they are not directly connected.

Think of it as a map of roads. This is exactly the example we’ll use here. Below is the table cities_distance, showing the cities and the distance between them. Use this query to create the table. It contains these columns:

city_from – The origin city.
city_to – The arrival city.
distance – The distance between the two cities, in kilometers.

Here’s the data:

city_from	city_to	distance
Rotterdam	Amsterdam	78.20
Rotterdam	Gouda	24.10
Amsterdam	Gouda	72.50
Gouda	Leiden	34.10
Amsterdam	Leiden	50.00
Rotterdam	Leiden	35.40
Gouda	Utrecht	44.00
Utrecht	Amsterdam	52.40
Leiden	Gouda	34.10

We need to find the longest path from Rotterdam to Amsterdam. The path should include the name of all cities along the way, separated by ‘/’. Also, we should show the length of the longest path.

When we say the longest path, we want to exclude circular paths (where you can endlessly make rounds and build up the distance). We want this longest path to go through any particular city only once.

To solve this problem, we will use a recursive CTE. It’s a query that references itself until it reaches the end of the data. This characteristic is ideal for querying graph data, where multiple paths can lead to the same goal.

Let’s see how this recursive CTE works:

WITH longest_path AS (
  SELECT cd.city_to,
         CAST((cd.city_from + '/' + cd.city_to) AS VARCHAR(100)) AS path,
        cd.distance AS distance
   FROM cities_distance cd
   WHERE cd.city_from = 'Rotterdam'

UNION ALL

  SELECT cd.city_to,
         CAST((lp.path + '/' + cd.city_to) AS VARCHAR(100)) AS path,
         CAST((lp.distance + cd.distance) AS DECIMAL(10,2))
  FROM longest_path lp 
  INNER JOIN cities_distance cd
  ON cd.city_from = lp.city_to
  WHERE lp.city_to <> 'Amsterdam'
  AND lp.path NOT LIKE '%/' + cd.city_to + '/%'
)

SELECT TOP 1 lp.path,
	 lp.distance
FROM longest_path lp
WHERE lp.city_to = 'Amsterdam'
ORDER BY lp.distance DESC;

The syntax looks the same as before – the recursive query also starts with WITH in SQL Server.

As usual, there’s a SELECT statement in the parentheses. There are two, to be more precise. Let’s see what the first one does.

WITH longest_path AS (
  SELECT cd.city_to,
         CAST((cd.city_from + '/' + cd.city_to) AS VARCHAR(100)) AS path,
         cd.distance AS distance
  FROM cities_distance cd
  WHERE cd.city_from = 'Rotterdam'

The first SELECT in recursion is called the anchor member. It’s used to select the starting point of the recursion. The starting point will be Rotterdam, which we get by filtering this city in WHERE. The column city_to is used to show all the final destinations that can be reached directly from Rotterdam. The column path will list all the origin and destination cities. The length of that route is shown in the column distance.

Then comes UNION ALL, which will connect the results of the anchor and the recursive member, i.e., the second SELECT. Unionizing these two queries is necessary for the recursion to work.

Note: In some other SQL dialects, it’s also possible to use UNION. However, SQL Server allows only UNION ALL.

Now we come to the recursive member. It references the CTE itself in FROM and joins it with the table cities_distance. For the queries to be unionized, they both have to have the same number of columns of the same data type. The first two columns are the same as in the anchor member. The column longest_path sums all the distances all the way to reach all the cities from Rotterdam.

SELECT cd.city_to,
       CAST((lp.path + '/' + cd.city_to) AS VARCHAR(100)) AS path,
       CAST((lp.distance + cd.distance) AS DECIMAL(10,2))
FROM longest_path lp 
INNER JOIN cities_distance cd
ON cd.city_from = lp.city_to
WHERE lp.city_to <> 'Amsterdam'
AND lp.path NOT LIKE '%/' + cd.city_to + '/%'
)

We also added two conditions in WHERE. The first excludes all the relations where Amsterdam is the final destination; we’re looking for the longest, not the shortest, way to Amsterdam. The second condition ensures that any new city added to the path is not already included in the path. Otherwise, the query will go into endless recursion. This addresses what we stated above: the longest path should not visit the same city more than once.

To better understand what we’re talking about, here’s the output of the recursive CTE:

city_to	path	distance
Amsterdam	Rotterdam/Amsterdam	78.20
Gouda	Rotterdam/Gouda	24.10
Leiden	Rotterdam/Leiden	35.40
Gouda	Rotterdam/Leiden/Gouda	69.50
Utrecht	Rotterdam/Leiden/Gouda/Utrecht	113.50
Amsterdam	Rotterdam/Leiden/Gouda/Utrecht/Amsterdam	165.90
Leiden	Rotterdam/Gouda/Leiden	58.2
Utrecht	Rotterdam/Gouda/Utrecht	68.1
Amsterdam	Rotterdam/Gouda/Utrecht/Amsterdam	120.5

You can see that there are three ways from Rotterdam to Amsterdam. One is direct, with the distance being 78.20 km. The second and third go through other cities and take 165.90 and 120.50 km, respectively.

Mind you, this is not the final result! Our solution also has a SELECT statement that references the CTE:

SELECT TOP 1 lp.path,
	 lp.distance
FROM longest_path lp
WHERE lp.city_to = 'Amsterdam'
ORDER BY lp.distance DESC;

This SELECT returns the path and the distance. We use the TOP 1 command combined with ORDER BY to return the longest way from Rotterdam to Amsterdam. We sorted data from the highest to the shortest distance, so the first row will also be the longest path.

Here’s the final result:

path	distance
Rotterdam/Leiden/Gouda/Utrecht/Amsterdam	165.90

This output shows that the longest path from Rotterdam to Amsterdam is 165.9 km and goes through Leiden, Gouda, and Utrecht.

If you can’t get enough of this topic, here are some more CTE examples.

When Should You Use CTEs?

As you saw in our examples, a CTE has many purposes in SQL Server.

One is that it generally improves code readability. All the above solutions (except the recursion) could have been written with subqueries. But that would make the code much longer and less clear.

When you think of it, the subqueries are usually written contrary to the logic of the problem you’re trying to solve. You first have the main query, which uses the subquery's output. So the subquery is usually the first step of the calculation, even though it’s not positioned that way in the code.

CTEs, on the other hand, can be written to follow the problem’s logic. You can write several separate CTEs and merge their outputs into the SELECT statements. You can also reference one CTE’s output with the second query (or third, fourth…), with the final SELECT statement being another calculation level.

One of the examples also showed you could rank data in SQL Server using a window function and a CTE.

And if you want to write recursive queries in SQL Server, you can’t do that without CTEs. A CTE can be non-recursive, but no recursive queries exist without CTE. Along with graphs, recursion is extremely helpful in querying hierarchical structures, such as data organization and family trees.

You can find more about when to use a CTE here.

CTEs Are the Door to Advanced SQL Usage!

CTEs are one of the more advanced SQL Server concepts. If you want to open the door to advanced-level SQL Server skills, CTEs are a must.

As your queries in SQL Server get more complex, you’ll soon realize that learning CTEs was one of the best decisions you ever made. They are also a stepping stone to recursive queries, allowing you to query unusual types of data structures in SQL Server, such as hierarchies and graphs.

This article is only a preview of the knowledge you can find in the Recursive Queries in MS SQL Server course. So don’t stop here. There’s plenty more to learn!

Tags: