12th Dec 2023 21 minutes read SQL Subquery Practice: 15 Exercises with Solutions Martyna Sławińska subqueries sql practice Table of Contents SQL Subquery Essentials SQL Subquery Exercises Dataset: Orchestras Exercise 1: Select Orchestras with a City of Origin Where a Concert Was Held in 2013 Exercise 2: Select Members that Belong to High-Rated Orchestras Exercise 3: Select Members Who Earn More Than Violinists Exercise 4: Select High-Rated Orchestras Newer Than the Chamber Orchestra Exercise 5: Select Players in Large Orchestras Dataset: University Exercise 6: Select Spring Term Courses Exercise 7: Select All Students Who Passed At Least One Course Exercise 8: Select the Lecturer(s) Teaching the Fewest Courses Exercise 9: Select Students Enrolled in the Most Courses Dataset: Store Exercise 10: Calculate the Percentage Spent by the Customer on Each Purchase Exercise 11: Find the Number of Expensive Products in Each Category Exercise 12: Display Purchased Products with Their Maximum Quantity Bought Exercise 13: List Discontinued, Continued, and Total Products in Each Category Exercise 14: Count Purchases Handled by Each Employee in Houston Exercise 15: Find the Largest Number of Product Categories in One Purchase More SQL Subquery Practice Subqueries are often challenging for beginners. Practice makes perfect, so join us as we work through these 15 SQL subquery practice exercises! In SQL, a subquery is a query nested within another query. It simplifies building intricate queries to retrieve data that meets specific conditions from various tables. In this article, we present various ways of employing subqueries to create complex queries. We start by introducing SQL subqueries along with common use cases. Then we’ll walk you through 15 SQL subquery practice exercises – complete with solutions and explanations. If you’ve never heard of SQL subqueries, check out our free Beginner’s Guide to the SQL Subquery. You can also get hands-on experience building subqueries in our SQL Subqueries course. But if you know the basics and are ready to improve your knowledge, let’s get started practicing SQL subqueries. SQL Subquery Essentials Just to recap, a SQL subquery is a SELECT statement embedded in another SELECT statement. You can think of subqueries as building blocks that make up complex queries: they let us break down complicated tasks into smaller parts and make the code easier to read. Imagine putting one question inside another – that's how subqueries work. With the help of subqueries, you can get specific information from different tables filtered by different conditions all in one go. Here are some common use cases for SQL subqueries: Filtering data: Use subqueries in the WHERE clause to filter data based on specific conditions, making your queries more dynamic. Covered in subquery practice exercises 1, 2, 3, 4, 8, and 9. Nested aggregations: Employ subqueries to perform aggregations within aggregations, allowing for more complex calculations. Covered in subquery practice exercises 5, 6, and 7. Checking existence: Determine whether a specific value exists in another table using subqueries with the EXISTS or IN Covered in subquery practice exercises 1, 2, and 14. Correlated subqueries: Create subqueries that reference columns from the outer query, enabling context-aware filtering. Covered in subquery practice exercises 10, 11, 12, and 13. Subquery in SELECT clause: Include a subquery in the SELECT clause to retrieve a single value or set of values that can be used in the main query. Covered in subquery practice exercises 10 and 13. Subquery in FROM clause: Use a subquery in the FROM clause to create a temporary table, allowing for more complex joins. Covered in subquery practice exercises 14 and 15. SQL Subquery Exercises Dataset: Orchestras The following exercises use the orchestras dataset that contains three tables. The orchestras table stores all orchestras. The columns are id, name, rating, city_origin, country_origin, and year in which the orchestra was founded. The concerts table contains all concerts played by the orchestras. The columns are id, city, country, year, rating, and orchestra_id (references the orchestras table). The members table stores the members of (i.e. musicians playing in) each orchestra. The columns are id, name, position (i.e. the instrument played), wage, experience, and orchestra_id (references the orchestras table). Now that we are familiar with the dataset, let’s proceed to the SQL practice exercises. The following exercises come from the SQL Subqueries course. Exercise 1: Select Orchestras with a City of Origin Where a Concert Was Held in 2013 Exercise: Select the names of all orchestras that have the same city of origin as any city in which any orchestra performed in 2013. Solution: SELECT name FROM orchestras WHERE city_origin IN (SELECT city FROM concerts WHERE year = 2013); Solution explanation: We aim to select names of orchestras that fulfill a certain condition, so we start with SELECT name FROM orchestras. Then, the condition is going to be imposed on the city_origin column, as mentioned in the instructions. We want to select only the orchestras whose city of origin belongs to the group of cities where concerts were played in the year 2013. To create this condition in the WHERE clause, we use SQL subquery. Let’s create a (sub)query that selects all cities where concerts were played in 2013: SELECT city FROM concerts WHERE year = 2013. It returns a column containing city names. To ensure that the city of origin belongs to the cities returned by the subquery, we use the IN operator. Exercise 2: Select Members that Belong to High-Rated Orchestras Exercise: Select the names and positions (i.e. instrument played) of all orchestra members that have above 10 years of experience and do not belong to orchestras with a rating below 8.0. Solution: SELECT name, position FROM members WHERE experience > 10 AND orchestra_id NOT IN (SELECT id FROM orchestras WHERE rating < 8.0); Solution explanation: We want to select names and positions of orchestra members that fulfill the conditions indicated in the instructions, so we start with SELECT name, position FROM members. Then, we impose filtering conditions on members’ years of experience and orchestras where they belong. We want to select members whose years of experience exceed 10. Thus, we add the first WHERE clause condition: experience > 10. We don’t want to select members who belong to orchestras with ratings below 8.0. Let’s create a (sub)query that selects all orchestras with a rating below 8.0: SELECT id FROM orchestras WHERE rating < 8.0. To ensure that the orchestras do not belong to the orchestras listed by this subquery, we use the NOT IN operator. Exercise 3: Select Members Who Earn More Than Violinists Exercise: Show the name and position of orchestra members who earn more than the average wage of all violinists. Solution: SELECT name, position FROM members WHERE wage > (SELECT AVG(wage) FROM members WHERE position = 'violin'); Solution explanation: We select the name and position columns from the members table. We use a subquery to find out the average wage of all violinists: SELECT AVG(wage) FROM members WHERE position = 'violin'. To ensure that we select orchestra members whose salary is greater than the average salary of all violinists, we impose a condition on the wage column to be greater than the average value returned by the subquery. Exercise 4: Select High-Rated Orchestras Newer Than the Chamber Orchestra Exercise: Show the names of orchestras that were created after the 'Chamber Orchestra' and have a rating greater than 7.5. Solution: SELECT name FROM orchestras WHERE year > (SELECT year FROM orchestras WHERE name = 'Chamber Orchestra') AND rating > 7.5; Solution explanation: First, we select names from the orchestras table. Then, we create a subquery that returns the year when the Chamber Orchestra was created. As we want to list orchestras created after the Chamber Orchestra, we impose a condition on the year column to be greater than whatever is returned by this subquery. Finally, we define the condition on the rating column to be greater than 7.5. Exercise 5: Select Players in Large Orchestras Exercise: Show the name and number of members for each orchestra that has more members than the average membership of all orchestras in the table. Solution: SELECT o.name, COUNT(m.id) FROM orchestras o JOIN members m ON o.id = m.orchestra_id GROUP BY o.name HAVING COUNT(m.id) > (SELECT AVG(d.count) FROM (SELECT orchestra_id, COUNT(id) FROM members GROUP BY orchestra_id) AS d); Solution explanation: To show the orchestra’s name, we simply select the name column from the orchestras table. And to show the number of members in each orchestra, we must join the orchestras table with the members table on their common column (the orchestra ID). Then, we use the COUNT() function to count all members (COUNT(m.id)), grouping by the name column from the orchestras table (GROUP BY o.name). We want to choose only orchestras that have more than the average number of members. Therefore, we must impose a condition on COUNT(m.id) to be greater than the average number of members. To impose a condition on an aggregate function, we must use the HAVING clause that follows the GROUP BY clause. We can find the average number of orchestra members using subqueries. To be exact, we use a nested subquery – a subquery within a subquery. The inner subquery finds the number of orchestra members for each orchestra using the COUNT() aggregate function: SELECT orchestra_id, COUNT(id) FROM members GROUP BY orchestra_id The outer subquery calculates the average of all COUNT(id) values returned by the inner subquery using the AVG() aggregate function: SELECT AVG(d.count) FROM (<inner subquery>) AS d Finally, the total subquery is: (SELECT AVG(d.count) FROM (SELECT orchestra_id, COUNT(id) FROM members GROUP BY orchestra_id) AS d) And it returns the average number of members per orchestra. Now that we have the average number of orchestra members, we can impose a condition on COUNT(m.id) to ensure it is greater than the average number of orchestra members: HAVING COUNT(m.id) > (SELECT AVG(d.count) FROM (SELECT orchestra_id, COUNT(id) FROM members GROUP BY orchestra_id) AS d) Dataset: University The following exercises use the university dataset, which contains six tables. The course table stores information about courses. The columns are id, title, learning_path, short_description, lecture_hours, tutorial_hours, ects_points, has_exam, and has_project. The lecturer table stores information about lecturers. The columns are id, first_name, last_name, degree, and email. The student table contains information about students. The columns are id, first_name, last_name, email, birth_date, and start_date. The academic_semester table contains information about each study semester. The columns are id, calendar_year, term, start_date, and end_date. The course_edition table contains information on which lecturers teach each course in each semester. The columns are id, course_id (references the course table), academic_semester_id (references the academic_semester table), and lecturer_id (references the lecturer table). The course_enrollment table contains information about students enrolled in each course. The columns are course_edition_id (references the course_edition table), student_id (references the student table), midterm_grade, final_grade, course_letter_grade, and passed. Now that we are familiar with the dataset, let’s proceed to the SQL practice exercises. The following exercises come from the Basic SQL Practice University course. Exercise 6: Select Spring Term Courses Exercise: Display the IDs and titles of all courses that took place during any spring term. Solution: SELECT id, title FROM course WHERE id = ANY (SELECT course_id FROM course_edition ce JOIN academic_semester asem ON ce.academic_semester_id = asem.id WHERE asem.term = 'spring'); Solution explanation: We start by selecting IDs and titles from the course table. In the WHERE clause, we must impose a condition on the id column of the course table, utilizing subqueries and the ANY operator. We want to select courses that took place during spring term at least once, so let’s start by creating a subquery that selects such course IDs. Note that we need to join the course_edition table with the academic_semester table on their common column (academic_semester_id and id, respectively) to be able to filter spring term courses. The ANY operator returns true if at least one value returned by the subquery fulfills the condition. Let’s illustrate it: The line in green returns true because 9 is equal to one of the numbers returned by the subquery. The line in red returns false because 3 is not equal to any of the numbers returned by the subquery. Exercise 7: Select All Students Who Passed At Least One Course Exercise: Select the IDs and names of students who passed at least one course. Solution: SELECT id, first_name, last_name FROM student WHERE id = ANY (SELECT student_id FROM course_enrollment WHERE passed = 't'); Solution explanation: We start by selecting IDs and names from the student table. In the WHERE clause, we must impose a condition on the id column of the student table using subqueries and the ANY operator. We want to select students who passed at least one course, so let’s start by creating a subquery that selects the IDs of all students who passed one or more courses: SELECT student_id FROM course_enrollment WHERE passed = 't' The ANY operator returns true if at least one value returned by the subquery fulfills the condition. Let’s illustrate it: The lines in green return true because both 5 and 8 belong to the IDs returned by the subquery. The line in red returns false because 3 does not belong to the IDs returned by the subquery. Exercise 8: Select the Lecturer(s) Teaching the Fewest Courses Exercise: Find the lecturer(s) with the least number of courses taught. Display the lecturer’s first and last name and the number of courses they teach (as no_of_courses). Solution: SELECT l.first_name, l.last_name, COUNT(ce.id) AS no_of_courses FROM lecturer l JOIN course_edition ce ON l.id = ce.lecturer_id GROUP BY l.first_name, l.last_name HAVING COUNT(ce.id) <= ALL (SELECT COUNT(id) FROM course_edition GROUP BY lecturer_id); Solution explanation: We select the names from the lecturer table, joining it with the course_edition table on lecturer ID. We count the rows in the course_edition table for each lecturer: COUNT(ce.id) AS no_of_courses. Thus, we group by lecturer names. To ensure that we select only the lecturer(s) with the fewest courses taught, we must impose a condition on COUNT(ce.id) to be less than or equal to the number of taught courses for each lecturer. Let’s create a subquery that selects the number of taught courses for each lecturer: SELECT COUNT(id) FROM course_edition GROUP BY lecturer_id. The ALL operator returns true if the condition is fulfilled by all rows returned by the subquery. Here we want to ensure that we select only the lecturer(s) whose no_of_courses is less than all other lecturers (and only equal to the smallest number of taught courses, hence, the smaller/equal sign). Let’s illustrate it: The line in green returns true because 4 is less than each number returned by the subquery and equal to only the smallest number returned by the subquery. The line in red returns false because 8 is not smaller than all numbers returned by the subquery (i.e. 8 > 4, 8 > 5, 8 > 6). Exercise 9: Select Students Enrolled in the Most Courses Exercise: Find the student(s) enrolled in the greatest number of course editions. Display the student’s ID, first and last names, and the number of course editions they’ve been enrolled in (as no_of_course_ed). Solution: SELECT student_id, first_name, last_name, COUNT(course_edition_id) AS no_of_course_ed FROM course_enrollment JOIN student ON course_enrollment.student_id = student.id GROUP BY student_id, first_name, last_name HAVING COUNT(course_edition_id) >= ALL (SELECT COUNT(course_edition_id) FROM course_enrollment GROUP BY student_id); Solution explanation: We select IDs and names from the student table and join student with the course_edition table on their common column (student ID). We count the rows in the course_edition table for each student (COUNT(course_edition_id) AS no_of_course_ed). Thus, we group by student IDs and names. To ensure that we select only students with the greatest number of enrolled courses, we must impose a condition on COUNT(course_edition_id) to be greater than or equal to the number of enrolled courses for each student. Let’s create a subquery that selects the number of enrolled courses for each student: SELECT COUNT(course_edition_id) FROM course_enrollment GROUP BY student_id The ALL operator returns true if the condition is fulfilled by all rows returned by the subquery. Here we want to ensure that we select only students whose no_of_course_ed is greater than for any other student (and only equal to the greatest number of enrolled courses – hence, the greater/equal sign). Let’s illustrate it: The line in green returns true because 9 is greater than each number returned by the subquery and equal to only the greatest number returned by the subquery. The line in red returns false because 6 is not greater than all numbers returned by the subquery (i.e. 6 < 8 and 6 < 9). Dataset: Store The following exercises use the store database that contains six tables: The customer table contains information about The columns are customer_id, contact_name, company_name, contact_email, address, city, and country. The product table stores information about products. The columns are product_id, product_name, category_id (references the category table), quantity_per_unit, unit_price, units_in_stock, and discontinued. The category table stores information about product categories. The columns are category_id, name, description, and parent_category_id (references itself). The purchase table stores information about purchases made by customers. The columns are purchase_id, customer_id (references the customer table), employee_id (references the employee table), total_price, purchase_date, shipped_date, ship_address, ship_city, and ship_country. The purchase_item table associates all purchases with products. The columns are purchase_id (references the purchase table), product_id (references the product table), unit_price, and quantity. The employee table stores information about employees. The columns are employee_id, last_name, first_name, birth_date, address, city, country, and reports_to. Now that we are familiar with the dataset, let’s proceed to the SQL practice exercises. The following exercises come from the Basic SQL Practice Store course. Exercise 10: Calculate the Percentage Spent by the Customer on Each Purchase Exercise: For each customer who made at least one purchase, display the ID of each purchase made by this customer and the percentage of the money spent on that purchase relative to all money spent by that customer. Round the percentages to integers. Show three columns: contact_name, purchase_id, and percentage. Solution: SELECT contact_name, purchase_id, ROUND(total_price * 100.0 / (SELECT SUM(total_price) FROM purchase WHERE customer_id = p.customer_id)) AS percentage FROM purchase p JOIN customer c ON p.customer_id = c.customer_id; Solution explanation: To identify customers who made at least one purchase, we must join the purchase table with the customer table on their common column (customer ID). To calculate the percentage of money spent on a purchase relative to all money spent by that customer, we need subqueries. The subquery calculates how much money a customer spent on all the purchases: SELECT SUM(total_price) FROM purchase WHERE customer_id = p.customer_id Note that the subquery references the outer query’s purchase table (aliased as p) to get the correct customer ID. This is called a correlated subquery. Finally, we calculate the percentage value by dividing the total_price by the value returned by the subquery. Additionally, we must multiply this value by 100 to get the percentage and ROUND() it to an integer. If you want to learn more about correlated subqueries, read Learn to Write a Correlated Subquery in 5 Minutes. Exercise 11: Find the Number of Expensive Products in Each Category Exercise: Show the names of categories and the number of products from this category that have a unit price greater than the average price of a product in this category. Show only the categories that have such product(s). Display two columns: name (the name of the category), and expensive_products (the number of products that cost more than the average product in this category). Solution: SELECT c.name, COUNT(*) AS expensive_products FROM category AS c JOIN product AS p ON c.category_id = p.category_id WHERE p.unit_price > (SELECT AVG(unit_price) FROM product JOIN category ON product.category_id = category.category_id WHERE category.category_id = c.category_id) GROUP BY c.name; Solution explanation: We want to display the category’s names and product counts; therefore, we need to join the category table with the product table on their common column (category ID). To show the number of products per category, we use the COUNT() function. As we select the category name (c.name) and the count of products per category name (COUNT(*)), we must group by the category name column (GROUP BY c.name). In the COUNT() function, we want to include only products with a unit price greater than the average price of a product in this category. To do that, we use a correlated subquery. In the subquery, we join the product and category tables again and select the average value of unit prices. To ensure that we take the average of values from the specific category ID, we impose a condition in the WHERE clause stating that the subquery’s category_id must be equal to the outer query’s category_id. The WHERE clause condition of the main query says that the unit_price should be greater than the average unit_price for this category, as returned by the subquery. Exercise 12: Display Purchased Products with Their Maximum Quantity Bought Exercise: For each purchased product, display its name, the largest quantity in which it has been purchased, and the number of max-quantity purchases for this product. Display three columns: product_name, quantity, and purchases_number. Solution: SELECT product_name, quantity, COUNT(purchase_id) AS purchases_number FROM purchase_item pi JOIN product p ON pi.product_id = p.product_id WHERE quantity = (SELECT MAX(quantity) FROM purchase_item WHERE product_id = pi.product_id) GROUP BY pi.product_id, product_name, quantity; Solution explanation: To get information about products and the quantities in which they were purchased, we must join the purchase_item table with the product table on their common column (product ID). We use the COUNT() aggregate function to get the number of purchases (COUNT(purchase_id)). To ensure we select only the largest quantity in which a particular product was purchased, we need to build a subquery. The subquery refers to the product_id value from the outer query to ensure we choose the maximum quantity for the correct product – therefore, it is a correlated subquery. In the WHERE clause of the main query, we impose a condition that the quantity value must be equal to the value returned by the subquery. Exercise 13: List Discontinued, Continued, and Total Products in Each Category Exercise: For each category, display: Its name. The number of discontinued (i.e. no longer available) products in this category (name this column discontinued_products). The number of continued (i.e. currently available) products in this category (name this column continued_products). The number of all products in this category (name this column all_products). Solution: SELECT c.name, (SELECT COUNT(*) FROM product WHERE category_id = c.category_id AND discontinued IS TRUE) AS discontinued_products, (SELECT COUNT(*) FROM product WHERE category_id = c.category_id AND discontinued IS FALSE) AS continued_products, (SELECT COUNT(*) FROM product WHERE category_id = c.category_id) AS all_products FROM category c; Solution explanation: In this exercise, we use not one or two but three correlated subqueries. We select the category name from the category table. The first correlated subquery counts all products that have been discontinued. This subquery refers to the category_id value from the outer query to ensure the discontinued products are counted per category. The second correlated subquery counts all products that haven’t been discontinued. This subquery refers to the category_id value from the outer query to ensure the continued products are counted per category. The third correlated subquery counts all products per category. This subquery refers to the category_id value from the outer query to ensure all products are counted per category. Exercise 14: Count Purchases Handled by Each Employee in Houston Exercise: Display the employee ID and the total number of purchases this employee handled. Use a subquery to fetch information about the number of orders each employee handled per customer and make the main query select FROM this subquery. Consider only employees who live in Houston. Solution: SELECT employee_per_customer.employee_id, SUM(employee_per_customer.no_of_purchases) AS total_no_of_purchases FROM (SELECT e.employee_id, p.customer_id, COUNT(p.purchase_id) AS no_of_purchases FROM employee e JOIN purchase p ON e.employee_id = p.employee_id WHERE EXISTS (SELECT * FROM employee WHERE employee.employee_id = e.employee_id AND city = 'Houston') GROUP BY e.employee_id, p.customer_id ) AS employee_per_customer GROUP BY employee_per_customer.employee_id; Solution explanation: Let’s start by ensuring we consider only employees who live in Houston. To do that, we use the EXISTS keyword. It returns true if the subquery returns at least one row. Note that the subquery passed to the EXISTS keyword is a correlated subquery, as it refers to the employee_id value of its outer query (which is a subquery for the main query). Let’s analyze the subquery that is passed to the main query in the FROM clause. It selects employee and customer IDs and counts how many purchases were made per employee and per customer (hence, the grouping by employee and customer ID values). SELECT e.employee_id, p.customer_id, COUNT(p.purchase_id) AS no_of_purchases FROM employee e JOIN purchase p ON e.employee_id = p.employee_id WHERE EXISTS (SELECT * FROM employee WHERE employee.employee_id = e.employee_id AND city = 'Houston') GROUP BY e.employee_id, p.customer_id The part in red ensures that we consider only employees who live in Houston. This subquery is aliased AS employee_per_customer and the main query selects from it. The main query selects the following: Employee IDs from the subquery (from employee_per_customer), The total number of purchases made by each employee . This requires grouping by employee ID (GROUP BY employee_per_customer.employee_id). Note that the correlated subquery uses COUNT() to count the purchases (or rows) per employee and per customer. But the main query uses the SUM() function to add up all the values returned by COUNT() in the subquery. You can learn more about aggregate functions here. And check out this article on how to use SUM() with OVER(PARTITION BY) to find out more details about window functions. This exercise presented the idea of using subqueries as building blocks – here, we used three building blocks to fetch the desired data. Exercise 15: Find the Largest Number of Product Categories in One Purchase Exercise: Use a subquery to select the purchase ID and the number of distinct categories contained in this purchase. In the main query, select the maximum number of categories from this subquery. Solution: SELECT MAX(categories_per_purchase.category_count) AS max_categories_per_purchase FROM (SELECT purchase_id, COUNT(DISTINCT category_id) AS category_count FROM purchase_item pi JOIN product p ON pi.product_id = p.product_id GROUP BY purchase_id) AS categories_per_purchase; Solution explanation: Let’s start from the subquery that is passed in the FROM clause of the main query. We join the purchase_item table with the product table on their common column (product ID). We select the purchase ID and COUNT DISTINCT category IDs per purchase. Hence, we group by the purchase_id column The main query uses the MAX() function to select (from the subquery aliased AS categories_per_purchase) the maximal number of product categories contained in one purchase. More SQL Subquery Practice This article presented various use cases of subqueries, including advanced filtering of data or nesting queries one within another. A basic idea of how to benefit from using subqueries is to break down a question into smaller (sub)questions – each small (sub)question should be answerable using a subquery. Go ahead and practice on your own – it is the best way to learn more techniques of employing subqueries to simplify complex data extraction tasks. Check out more subquery exercises here and here. We encourage you to dive in and practice subqueries with our interactive SQL Subqueries course. Sign up now and get started for free. Good luck! Tags: subqueries sql practice