21st Feb 2023 18 minutes read 6 Useful Examples of CTEs in SQL Server Tihomir Babic Common Table Expressions MS SQL Server Table of Contents CTE Syntax in SQL Server 6 Examples of CTEs in SQL Server 1: Find the Average Highest and Lowest Numbers of Daily Streams 2: Calculate the Average Total Fee Paid per Song 3: Find Each Artist’s Most Streamed Album 4: Calculate the Average Streams per Song and Compare It With Average Streams per Date 5: Calculate the Highest and Lowest Average Album Pay by Artist 6: Find the Longest Path Between Rotterdam and Amsterdam When Should You Use CTEs? CTEs Are the Door to Advanced SQL Usage! How can you use CTEs in SQL Server in your everyday professional life as a data pro? We’ll answer this question by giving you six examples. CTE is short for Common Table Expression. This is a relatively new feature in SQL Server that was made available with SQL Server 2005. A CTE is a temporary named result. This result is available only for the query that runs it. It isn’t stored, so it doesn't take up disk space. A CTE is somewhat similar to a temporary table and can be used like any other table. CTEs are most often used with a SELECT statement, but they can be used with INSERT, UPDATE, and DELETE as well. CTEs are one of the most challenging concepts in SQL Server. To reap their benefits, your approach to learning them should be carefully structured and not rushed. Our Recursive Queries in MS SQL Server course will show you how to write a simple CTE in SQL Server for a start. Then you’ll learn to write multiple CTEs, nest them, and use them within SELECT, INSERT, UPDATE, and DELETE statements. Finally, you’ll learn about hierarchical and graph data structure and how to use recursive CTEs in SQL Server to query such data. To ensure you get enough practice, there are 112 interactive exercises to complete in the course. There are, of course, some other ways to learn CTEs that you can also check out. CTE Syntax in SQL Server Generally, CTE syntax in SQL Server is like the following example: WITH cte AS ( SELECT ... ) SELECT ... FROM cte; CTEs must always start with the keyword WITH. Then comes the CTE’s name, the AS keyword, and the parentheses. You define the CTE in those parentheses. Defining it, as you’ll see in our examples, means writing the SELECT statement. You can find more details in this article explaining what a CTE is. 6 Examples of CTEs in SQL Server 1: Find the Average Highest and Lowest Numbers of Daily Streams In the first five examples, we’ll be using the same dataset. It shows some made-up data from an imaginary music streaming platform; let’s call it Terpsichore. The dataset consists of three tables. The first is artist, and here’s the create table query. This table contains the following columns: id – The artist’s ID and the table’s primary key. artist_name – The artist’s name. idartist_name 1Prince 2Jimi Hendrix 3Santana This table shows three artists. The next table is albums. Here’s the query for creating it. And here are the columns it contains: id – The album’s ID and the table’s primary key. artist_id – The artist (and the table’s foreign key). album_title – The album’s title. year_released – The year of the album’s release. idartist_idalbum_titleyear_released 12Are You Experienced1967 22Axis: Bold as Love1967 31Dirty Mind1980 42Electric Ladyland1968 53Abraxas1970 6119991982 73Santana III1971 83Santana1969 91Prince1979 101Controversy1981 There are ten albums in the table. The last table is streams. It shows streaming data for the individual songs. You can create the table using this query. And the columns: id – The stream ID and the table’s primary key. artist_id – The artist’s ID and a foreign key. album_id – The album’s ID and a foreign key. song_title – The name of the song. date – The date of the stream. number_of_streams – The number of times the song was played on a particular date. pay_per_stream – Value (in dollars) Terpsichore pays to the artists for each stream. idartist_idalbum_idsong_titledatenumber_of_streamspay_per_stream 119I Wanna Be Your Lover2023-01-015970.013 216Little Red Corvette2023-01-014970.013 316D.M.S.R.2023-01-012170.013 413Uptown2023-01-0197480.013 513Do It All Night2023-01-012080.013 There are 45 rows in this table. We’ll show you only the first five, so you get the feeling of the table’s logic. Now, the example! Let’s start with writing only one CTE in SQL Server. We will do that to calculate the average highest and lowest number of daily streams. Here’s the code; we’ll explain it below: WITH daily_streaming AS ( SELECT date, MIN(number_of_streams) AS minimum_streaming, MAX(number_of_streams) AS maximum_streaming FROM streams GROUP BY date ) SELECT AVG(minimum_streaming) AS average_minimum_daily_streaming, AVG(maximum_streaming) AS average_maximum__daily_streaming FROM daily_streaming; A CTE is brilliant for breaking down the logic of any calculation in SQL Server. Remember, there are multiple songs streamed in one day. Our CTE’s purpose is to get the lowest and highest number of streams each day. As mentioned previously, a CTE in SQL Server always starts with the keyword WITH, which is followed by the CTE name. Our CTE is named daily_streaming. After the keyword AS comes the parentheses with a SELECT statement – i.e. the CTE definition. We use it, along with the MIN() and MAX() aggregate functions, to calculate the highest and lowest number of streams by date. The next SELECT statement uses data from the CTE, referencing it in the FROM clause. As we said, a CTE can be used as any other table. In this SELECT, we use the AVG() aggregate function to get the average of the daily stream peaks and lows. The output shows the average lowest point is 90 streams. The average of the top daily streams is 8,367. average_minimum_daily_streamingaverage_maximum__daily_streaming 908,367 2: Calculate the Average Total Fee Paid per Song Let’s now practice what we learned. We’ll again write a single CTE. The problem that needs solving is finding the average total fee Terpsichore paid for each song. Here’s the solution: WITH paid_per_song AS ( SELECT song_title, SUM(number_of_streams * pay_per_stream) AS total_pay FROM streams GROUP BY id, song_title, pay_per_stream ) SELECT song_title, AVG(total_pay) AS average_total_pay FROM paid_per_song GROUP BY song_title ORDER BY average_total_pay DESC; We use SQL Server CTE to calculate the total fee paid per song by multiplying the number of streams with the pay per stream, and then summing it using the SUM() aggregate function. There are no changes regarding the CTE syntax: first comes WITH, then the CTE name, and AS comes after that. We then use a SELECT that invokes the CTE to calculate the average pay per song. It’s simple: use AVG(), reference the CTE in FROM, and group by the song title. The query returns the following result: song_titleaverage_total_pay Uptown47.4803330 I Wanna Be Your Lover36.8203330 Little Red Corvette33.8693330 The Wind Cries Mary23.6138660 Do It All Night12.4063330 If 6 Was 97.7824000 Samba Pa Ti7.5735000 All Along the Watchtower5.2032000 Bold as Love4.7424000 Burning of the Midnight Lamp3.7333330 D.M.S.R.3.1633330 Taboo2.4871000 Jingo2.1604000 Everything's Coming Our Way1.5466000 Incident at Neshabur0.9207000 We see that the song ‘Uptown’ earned $47.4803330 in total. The second and third songs by earnings are ‘I Wanna Be Your Lover’ and ‘Little Red Corvette’. If you’re a fan of this artist, then you don’t need SQL to find who wrote all three of these songs. 3: Find Each Artist’s Most Streamed Album In this exercise, you have to find the most streamed album by each artist. Output the artist name, the album title, and the number of streams by album. This will, again, be a query with only one CTE. However, it’s a little more complex than the previous two – there are some JOINs and a window function. WITH album_streaming AS ( SELECT artist_id, album_id, SUM(number_of_streams) AS streams_by_album, RANK() OVER (PARTITION BY artist_id ORDER BY SUM(number_of_streams) DESC) AS streaming_rank FROM streams GROUP BY artist_id, album_id ) SELECT artist_name, album_title, streams_by_album FROM album_streaming alst JOIN albums al ON alst.album_id = al.id JOIN artist ar ON al.artist_id = ar.id WHERE streaming_rank = 1; Again, the CTE syntax in SQL Server is familiar. So let’s focus on what this CTE does. We use it to rank the albums by artist. First, we select the artist and album IDs. Then we use SUM() to calculate the number of streams by album. Now comes the crucial part – ranking the output using the RANK() window function. After the function is invoked, we see the OVER() clause – a mandatory clause for SQL window functions. We partition the dataset by the artist ID and order data within each partition by the number of streams, in descending order. What does that mean in practice? It means the window function will rank the albums for one artist, then the rank is restarted once the function reaches the next artist, and so on. The artist’s album with the most streams will be ranked first in their partition. If you run only this SELECT statement within the CTE, you’ll get this output: artist_idalbum_idstreams_by_albumstreaming_rank 1313,8201 168,5462 198,4973 257,7221 273,6672 281,9643 3111,0691 325,8712 344,1893 As you can see, the first artist’s albums are ranked from the first to the third, according to the number of streams. When we reach the second artist, the ranking restarts. The same is with the third artist. Now, let’s see what the second SELECT statement does. Actually, it’s nothing complicated. It returns the artist and album name and the number of streams. What complicates this query is that we have to join three tables. The first join is the CTE album_streaming. Then we join it with albums and then with the artist table. In the end, we filter data using the WHERE clause because we’re only interested in the most streamed album. You’ll get this result: artist_namealbum_titlestreams_by_album PrinceDirty Mind13,820 Jimi HendrixAre You Experienced11,069 SantanaAbraxas7,722 It shows Prince’s most streamed album is ‘Dirty Mind’ with 13,820 streams. For Jimi Hendrix, the most streamed album is ‘Are You Experienced’, and for Santana, it’s ‘Abraxas’. This solution uses the window functions, so here’s a reminder of how they work when ranking data. 4: Calculate the Average Streams per Song and Compare It With Average Streams per Date Things are now getting more complicated. But not too much, don’t worry. We’re building on what we have learned so far about CTEs in SQL Server. Here, we need to find the average number of streams per song. Then we need to calculate the average number of streams per date. The output should show both metrics. Also, it should show the difference between the average stream per song and the daily average (as a percent difference), the song title, and the dates. So far, we’ve been writing queries with one CTE. This time, the solution consists of two CTEs. Let’s see how this works: WITH streams_per_song AS ( SELECT song_title, AVG(number_of_streams) AS average_streams_per_song FROM streams GROUP BY song_title ), streams_per_date AS ( SELECT date, AVG(number_of_streams) AS average_streams_per_date FROM streams GROUP BY date ) SELECT song_title, average_streams_per_song, date, average_streams_per_date, (average_streams_per_song - average_streams_per_date)/CAST(average_streams_per_date AS DECIMAL(10,2))*100 AS diff_from_daily_average FROM streams_per_song, streams_per_date; The first CTE is written as usual. We use it to calculate the average number of streams per song with AVG(). After closing the parentheses, the first CTE must be separated from the second CTE with a comma. Then we write the second CTE. Behold! There’s no WITH! That’s right. When writing multiple CTEs in a query in SQL Server, you write WITH only in front of the first CTE. The second (and any subsequent CTE) starts with the CTE's name; everything else is the same. This second query is for calculating the average number of streams per date. Again, we use the AVG() function. The third SELECT uses data from both CTEs. It returns all the required columns. The last column is diff_from_daily_average. We calculate it by subtracting the average streams by date from the average streams by song. The difference is divided by the average streams by date and multiplied by 100 to get the percentage. Also, we converted the result into a decimal data type using the CAST() function. Due to the output size, we’ll only show the first few rows: song_titleaverage_streams_per_songdateaverage_streams_per_datediff_from_daily_average All Along the Watchtower8132023-01-011,031-21.14 Bold as Love7412023-01-011,031-28.13 Burning of the Midnight Lamp5832023-01-011,031-43.45 D.M.S.R.2432023-01-011,031-76.43 Do It All Night9542023-01-011,031-7.47 The data shows the average daily stream for 1 January 2023 is 1,031. ‘All Along the Watchtower’ is 21.14% below that average. The next two songs are 28.13% and 43.45% below the daily average, and so on. 5: Calculate the Highest and Lowest Average Album Pay by Artist Let’s explain what we mean by that. We want first to find the average pay by album and date. Then we need to find the lowest and highest pay value by album. After that, we want to aggregate data by artist. Along with its name, we need to show the value of the lowest pay the artist got for an album. We need to do the same with the highest pay for an album. The solution in SQL Server contains two CTEs. However, this time it’s a nested CTE. That’s when the second CTE references the first CTE. Let’s see how this works: WITH pay_per_album AS ( SELECT album_id, date, AVG(number_of_streams * pay_per_stream) AS average_pay_per_album FROM streams GROUP BY album_id, date ), min_max_average_pay AS ( SELECT album_id, MIN(average_pay_per_album) AS lowest_average_pay_by_album, MAX(average_pay_per_album) AS highest_average_pay_by_album FROM pay_per_album GROUP BY album_id ) SELECT artist_name, MIN(lowest_average_pay_by_album) AS lowest_album_pay_by_artist, MAX(highest_average_pay_by_album) AS highest_album_pay_by_artist FROM min_max_average_pay mmap JOIN albums al ON mmap.album_id = al.id JOIN artist ar ON al.artist_id = ar.id GROUP BY artist_name; Be careful when you read the code explanation! It’s easy to get lost in all these aggregations. To make things easier to follow, I’ve copied each part of the query and followed it with an explanation. We’ll start with the first CTE: WITH pay_per_album AS ( SELECT album_id, date, AVG(number_of_streams * pay_per_stream) AS average_pay_per_album FROM streams GROUP BY album_id, date ), The first CTE calculates the average pay per album and the date. This is done by multiplying the number of streams by the pay per stream and using AVG(). min_max_average_pay AS ( SELECT album_id, MIN(average_pay_per_album) AS lowest_average_pay_by_album, MAX(average_pay_per_album) AS highest_average_pay_by_album FROM pay_per_album GROUP BY album_id ) When writing the second CTE in SQL Server, the syntax is the same as in the previous example – no additional WITH, start with the CTE name, and separate the CTEs with a comma. The only difference is that this time the second CTE references the first CTE, not the original dataset. This nested CTE uses the MIN() and MAX() functions to find each album’s lowest and highest average pay for all dates. The first CTE is referenced in the FROM. SELECT artist_name, MIN(lowest_average_pay_by_album) AS lowest_album_pay_by_artist, MAX(highest_average_pay_by_album) AS highest_album_pay_by_artist FROM min_max_average_pay mmap JOIN albums al ON mmap.album_id = al.id JOIN artist ar ON al.artist_id = ar.id GROUP BY artist_name; Finally, there’s SELECT that joins the second CTE with the albums and artist tables. We again apply the MIN() and MAX() functions on the second CTEs result. This is to return only the values of the lowest and highest paid of all albums by each artist. Here’s what we get after running the query: artist_namelowest_album_pay_by_artisthighest_album_pay_by_artist Jimi Hendrix1.5963.19 Prince4.6497.31 Santana0.9110.22 6: Find the Longest Path Between Rotterdam and Amsterdam This is a variation of the shortest path problem in graph theory, except we’ll look for the longest path. A graph is a type of data structure that consists of nodes or points which are connected with edges. Since they are connected, finding a way from one node to another is possible even though they are not directly connected. Think of it as a map of roads. This is exactly the example we’ll use here. Below is the table cities_distance, showing the cities and the distance between them. Use this query to create the table. It contains these columns: city_from – The origin city. city_to – The arrival city. distance – The distance between the two cities, in kilometers. Here’s the data: city_fromcity_todistance RotterdamAmsterdam78.20 RotterdamGouda24.10 AmsterdamGouda72.50 GoudaLeiden34.10 AmsterdamLeiden50.00 RotterdamLeiden35.40 GoudaUtrecht44.00 UtrechtAmsterdam52.40 LeidenGouda34.10 We need to find the longest path from Rotterdam to Amsterdam. The path should include the name of all cities along the way, separated by ‘/’. Also, we should show the length of the longest path. When we say the longest path, we want to exclude circular paths (where you can endlessly make rounds and build up the distance). We want this longest path to go through any particular city only once. To solve this problem, we will use a recursive CTE. It’s a query that references itself until it reaches the end of the data. This characteristic is ideal for querying graph data, where multiple paths can lead to the same goal. Let’s see how this recursive CTE works: WITH longest_path AS ( SELECT cd.city_to, CAST((cd.city_from + '/' + cd.city_to) AS VARCHAR(100)) AS path, cd.distance AS distance FROM cities_distance cd WHERE cd.city_from = 'Rotterdam' UNION ALL SELECT cd.city_to, CAST((lp.path + '/' + cd.city_to) AS VARCHAR(100)) AS path, CAST((lp.distance + cd.distance) AS DECIMAL(10,2)) FROM longest_path lp INNER JOIN cities_distance cd ON cd.city_from = lp.city_to WHERE lp.city_to <> 'Amsterdam' AND lp.path NOT LIKE '%/' + cd.city_to + '/%' ) SELECT TOP 1 lp.path, lp.distance FROM longest_path lp WHERE lp.city_to = 'Amsterdam' ORDER BY lp.distance DESC; The syntax looks the same as before – the recursive query also starts with WITH in SQL Server. As usual, there’s a SELECT statement in the parentheses. There are two, to be more precise. Let’s see what the first one does. WITH longest_path AS ( SELECT cd.city_to, CAST((cd.city_from + '/' + cd.city_to) AS VARCHAR(100)) AS path, cd.distance AS distance FROM cities_distance cd WHERE cd.city_from = 'Rotterdam' The first SELECT in recursion is called the anchor member. It’s used to select the starting point of the recursion. The starting point will be Rotterdam, which we get by filtering this city in WHERE. The column city_to is used to show all the final destinations that can be reached directly from Rotterdam. The column path will list all the origin and destination cities. The length of that route is shown in the column distance. Then comes UNION ALL, which will connect the results of the anchor and the recursive member, i.e., the second SELECT. Unionizing these two queries is necessary for the recursion to work. Note: In some other SQL dialects, it’s also possible to use UNION. However, SQL Server allows only UNION ALL. Now we come to the recursive member. It references the CTE itself in FROM and joins it with the table cities_distance. For the queries to be unionized, they both have to have the same number of columns of the same data type. The first two columns are the same as in the anchor member. The column longest_path sums all the distances all the way to reach all the cities from Rotterdam. SELECT cd.city_to, CAST((lp.path + '/' + cd.city_to) AS VARCHAR(100)) AS path, CAST((lp.distance + cd.distance) AS DECIMAL(10,2)) FROM longest_path lp INNER JOIN cities_distance cd ON cd.city_from = lp.city_to WHERE lp.city_to <> 'Amsterdam' AND lp.path NOT LIKE '%/' + cd.city_to + '/%' ) We also added two conditions in WHERE. The first excludes all the relations where Amsterdam is the final destination; we’re looking for the longest, not the shortest, way to Amsterdam. The second condition ensures that any new city added to the path is not already included in the path. Otherwise, the query will go into endless recursion. This addresses what we stated above: the longest path should not visit the same city more than once. To better understand what we’re talking about, here’s the output of the recursive CTE: city_topathdistance AmsterdamRotterdam/Amsterdam78.20 GoudaRotterdam/Gouda24.10 LeidenRotterdam/Leiden35.40 GoudaRotterdam/Leiden/Gouda69.50 UtrechtRotterdam/Leiden/Gouda/Utrecht113.50 AmsterdamRotterdam/Leiden/Gouda/Utrecht/Amsterdam165.90 LeidenRotterdam/Gouda/Leiden58.2 UtrechtRotterdam/Gouda/Utrecht68.1 AmsterdamRotterdam/Gouda/Utrecht/Amsterdam120.5 You can see that there are three ways from Rotterdam to Amsterdam. One is direct, with the distance being 78.20 km. The second and third go through other cities and take 165.90 and 120.50 km, respectively. Mind you, this is not the final result! Our solution also has a SELECT statement that references the CTE: SELECT TOP 1 lp.path, lp.distance FROM longest_path lp WHERE lp.city_to = 'Amsterdam' ORDER BY lp.distance DESC; This SELECT returns the path and the distance. We use the TOP 1 command combined with ORDER BY to return the longest way from Rotterdam to Amsterdam. We sorted data from the highest to the shortest distance, so the first row will also be the longest path. Here’s the final result: pathdistance Rotterdam/Leiden/Gouda/Utrecht/Amsterdam165.90 This output shows that the longest path from Rotterdam to Amsterdam is 165.9 km and goes through Leiden, Gouda, and Utrecht. If you can’t get enough of this topic, here are some more CTE examples. When Should You Use CTEs? As you saw in our examples, a CTE has many purposes in SQL Server. One is that it generally improves code readability. All the above solutions (except the recursion) could have been written with subqueries. But that would make the code much longer and less clear. When you think of it, the subqueries are usually written contrary to the logic of the problem you’re trying to solve. You first have the main query, which uses the subquery's output. So the subquery is usually the first step of the calculation, even though it’s not positioned that way in the code. CTEs, on the other hand, can be written to follow the problem’s logic. You can write several separate CTEs and merge their outputs into the SELECT statements. You can also reference one CTE’s output with the second query (or third, fourth…), with the final SELECT statement being another calculation level. One of the examples also showed you could rank data in SQL Server using a window function and a CTE. And if you want to write recursive queries in SQL Server, you can’t do that without CTEs. A CTE can be non-recursive, but no recursive queries exist without CTE. Along with graphs, recursion is extremely helpful in querying hierarchical structures, such as data organization and family trees. You can find more about when to use a CTE here. CTEs Are the Door to Advanced SQL Usage! CTEs are one of the more advanced SQL Server concepts. If you want to open the door to advanced-level SQL Server skills, CTEs are a must. As your queries in SQL Server get more complex, you’ll soon realize that learning CTEs was one of the best decisions you ever made. They are also a stepping stone to recursive queries, allowing you to query unusual types of data structures in SQL Server, such as hierarchies and graphs. This article is only a preview of the knowledge you can find in the Recursive Queries in MS SQL Server course. So don’t stop here. There’s plenty more to learn! Tags: Common Table Expressions MS SQL Server