15th Oct 2021 Updated: 7th Nov 2024 7 minutes read

How to Join Only the First Row in SQL

Table of Contents

Why Join Just the First Row?
4 Methods to Join the Top Row in SQL
Let’s Practice SQL JOINs!

In your projects, you may need to join each record in one table to only the first matching row in another table, for example, each customer to their most recent order. In this article, I’ll go through several ways to do this in SQL.

The best way to practice basic and advanced SQL is our interactive SQL Practice track. It contains 10 SQL practice courses with over 1000 hands-on exercises to help you refresh your SQL skills.

Why Join Just the First Row?

There are many different scenarios where you need to join only the first match from one table to the corresponding record in another. For example, you may be looking for:

The most expensive item in each order.
The most recently observed temperature for each location.
The most experienced employee in each department.
The most recent order for each customer.

In all these cases, you can sort the table that holds the many matching records (e.g., by item price or observation date) and thereby turn the problem into selecting the first, or top, row.

To demonstrate several possible solutions to this problem, we use two tables that list customers and their orders. The customers table has the columns id, first_name, last_name, phone, and email. The orders table has the columns id, order_date, customer_id, shipped_date, and order_status.

For each customer, we want to know the date and the status of their most recent order. Here’s the output we are looking for:

id	first_name	last_name	order_date	order_status
11	Kate	White	2021-10-05	Awaiting shipment
12	Rose	Parker	2021-10-05	Awaiting payment
13	William	Spencer	2021-10-04	Completed
14	John	Smith	2021-10-04	Completed
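If you want to experiment without a database server, the following minimal sketch builds the two tables in an in-memory SQLite database using Python’s built-in sqlite3 module. The phone numbers, e-mail addresses, order IDs, and older orders are invented for illustration; only the column names and the four most recent orders come from the output above. It then runs a plain JOIN to show the problem we are trying to solve.

```python
import sqlite3

# Hypothetical sample data: phone numbers, e-mail addresses, order IDs, and
# the older orders are invented. Only the column names and the four
# "most recent order" rows are taken from the article.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (
  id INTEGER PRIMARY KEY,
  first_name TEXT,
  last_name TEXT,
  phone TEXT,
  email TEXT
);
CREATE TABLE orders (
  id INTEGER PRIMARY KEY,
  order_date TEXT,
  customer_id INTEGER REFERENCES customers (id),
  shipped_date TEXT,
  order_status TEXT
);
INSERT INTO customers VALUES
  (11, 'Kate', 'White', '555-0101', 'kate@example.com'),
  (12, 'Rose', 'Parker', '555-0102', 'rose@example.com'),
  (13, 'William', 'Spencer', '555-0103', 'william@example.com'),
  (14, 'John', 'Smith', '555-0104', 'john@example.com');
INSERT INTO orders VALUES
  (101, '2021-09-20', 11, '2021-09-22', 'Completed'),
  (102, '2021-09-28', 12, '2021-09-30', 'Completed'),
  (103, '2021-10-04', 13, '2021-10-05', 'Completed'),
  (104, '2021-10-04', 14, '2021-10-05', 'Completed'),
  (105, '2021-10-05', 11, NULL, 'Awaiting shipment'),
  (106, '2021-10-05', 12, NULL, 'Awaiting payment');
""")

# A plain JOIN returns one row per order, not one row per customer.
rows = conn.execute("""
SELECT
  customers.id,
  customers.first_name,
  customers.last_name,
  orders.order_date,
  orders.order_status
FROM customers
JOIN orders
ON customers.id = orders.customer_id
ORDER BY customers.id, orders.order_date
""").fetchall()

for row in rows:
    print(row)
```

With this data, Kate and Rose each appear twice in the result, once per order. The solutions in this article reduce that to a single row per customer.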

Now let’s go through several possible ways to get this output from our initial tables.

4 Methods to Join the Top Row in SQL

I’ll present four possible solutions to joining only the first row in SQL. Some of these solutions can be used with any database, while others work only with specific databases (e.g., PostgreSQL or MS SQL Server).

Solution 1: Find Top Row with MAX

If we know that the orders in our table are numbered sequentially, with a greater value of ID indicating a more recent order, we can use this column to define the latest record for each customer. Our step-by-step solution is the following:

Define the greatest order ID for each customer.
Assuming these IDs correspond to the most recent order for each customer, create a table that lists only the most recent orders.
Join the customers table with this table of the most recent orders.

This solution can be implemented using common table expressions (CTEs).

WITH last_orders AS (
  SELECT *
  FROM orders
  WHERE id IN (
    SELECT MAX(id)
    FROM orders
    GROUP BY customer_id
  )
)
SELECT 
  customers.id, 
  customers.first_name, 
  customers.last_name,
  last_orders.order_date, 
  last_orders.order_status
FROM customers
JOIN last_orders
ON customers.id = last_orders.customer_id
ORDER BY customer_id;

In the query above, the innermost subquery finds the order IDs that correspond to the most recent order for each customer. The last_orders CTE lists these orders, and the main query joins last_orders with the customers table.

I prefer to use CTEs in cases like these because, in my opinion, they have better structure and readability. If you want to learn more about CTEs or WITH clauses, check out this introductory article and this interactive Recursive Queries course that covers all kinds of CTEs.

This solution gets us the output we need, but it relies on order IDs being assigned sequentially in the order in which the orders were placed. This may not always be the case. So, let’s move to the next solution that gives us more control over the output.
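Before moving on, here is a small sketch of how the sequential-ID assumption can fail. It uses Python’s built-in sqlite3 module and invented data in which one order was entered late, so it has the highest ID but an older date. The orders table is trimmed to the columns the example needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (
  id INTEGER PRIMARY KEY,
  order_date TEXT,
  customer_id INTEGER,
  order_status TEXT
);
-- Hypothetical data: order 104 was entered late, so it has the highest ID
-- but an older date than order 101.
INSERT INTO orders VALUES
  (101, '2021-10-05', 11, 'Awaiting shipment'),
  (104, '2021-09-20', 11, 'Completed');
""")

# The filter from Solution 1: keep the row with the greatest ID per customer.
by_max_id = conn.execute("""
SELECT customer_id, order_date, order_status
FROM orders
WHERE id IN (
  SELECT MAX(id)
  FROM orders
  GROUP BY customer_id
)
""").fetchall()

# The date that is actually the most recent for each customer.
latest_date = conn.execute("""
SELECT customer_id, MAX(order_date)
FROM orders
GROUP BY customer_id
""").fetchall()

print(by_max_id)
print(latest_date)
```

Here the MAX(id) filter returns the order from 2021-09-20, even though the customer has a newer order from 2021-10-05. The approach is only safe when IDs are known to follow the order dates.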

Solution 2: Find Top Row with ROW_NUMBER

If we cannot rely on the order ID to define the most recent order, we can add a column that does the job. Specifically, we can use a window function to number the rows of our orders table based on the order date, separately for each customer.

Our strategy in this solution is the following:

Number the rows in the orders table so that the most recent order for each customer gets number 1.
Select only the most recent order for each customer by filtering the records with row numbers equal to 1.
Join the customers table with the table containing only the most recent orders.

Again, we can implement the above strategy using CTEs:

WITH numbered_orders AS (
  SELECT
    *,
    ROW_NUMBER() OVER (
      PARTITION BY customer_id
      ORDER BY order_date DESC
    ) AS row_num  -- ROW_NUMBER is a reserved word in MySQL 8, so avoid it as an alias
  FROM orders
),
last_orders AS (
  SELECT *
  FROM numbered_orders
  WHERE numbered_orders.row_num = 1
)
SELECT 
  customers.id, 
  customers.first_name, 
  customers.last_name,
  last_orders.order_date, 
  last_orders.order_status
FROM customers
JOIN last_orders
ON customers.id = last_orders.customer_id
ORDER BY customer_id;

In the above SQL query:

We use the ROW_NUMBER() function to number the rows in the orders table. Note that before numbering the rows, we group them by customer ID with PARTITION BY and sort them by date in descending order, so that the most recent order for each customer gets row number 1. We save the output of this subquery as numbered_orders.
Next, we select the orders with the row number equal to 1 and save the result of this subquery as last_orders.
Finally, we join the customers table with last_orders to get the required output.
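One detail worth checking in your own data is what happens when a customer has two orders with the same date. In that case, ORDER BY order_date DESC alone does not determine which of them gets row number 1. The sketch below (Python’s sqlite3 module, invented data, trimmed tables) adds the order ID as a tie-breaker; using id for this purpose is an assumption for illustration, and any column that makes the ordering unique would do. Window functions require SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (
  id INTEGER PRIMARY KEY,
  first_name TEXT,
  last_name TEXT
);
CREATE TABLE orders (
  id INTEGER PRIMARY KEY,
  order_date TEXT,
  customer_id INTEGER,
  order_status TEXT
);
-- Hypothetical data: customer 12 has two orders on the same date.
INSERT INTO customers VALUES
  (11, 'Kate', 'White'),
  (12, 'Rose', 'Parker');
INSERT INTO orders VALUES
  (101, '2021-09-20', 11, 'Completed'),
  (104, '2021-10-05', 11, 'Awaiting shipment'),
  (105, '2021-10-05', 12, 'Awaiting payment'),
  (106, '2021-10-05', 12, 'Awaiting shipment');
""")

query = """
WITH numbered_orders AS (
  SELECT
    *,
    ROW_NUMBER() OVER (
      PARTITION BY customer_id
      ORDER BY order_date DESC, id DESC  -- id breaks ties on order_date
    ) AS rn
  FROM orders
)
SELECT
  customers.id,
  customers.first_name,
  numbered_orders.id,
  numbered_orders.order_status
FROM customers
JOIN numbered_orders
ON customers.id = numbered_orders.customer_id
WHERE numbered_orders.rn = 1
ORDER BY customers.id
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With the tie-breaker in place, the query consistently picks order 106 for Rose rather than leaving the choice between 105 and 106 to the database.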

If you are new to window functions, learn more in this beginner-friendly guide and consider taking this interactive Window Functions course. For an overview of the syntax, check out the SQL Window Functions Cheat Sheet.

Both solutions so far use standard SQL, so they work in most modern relational databases that support CTEs and window functions. Now, let’s move on to database-specific solutions.

Solution 3: DISTINCT ON in PostgreSQL

PostgreSQL offers the DISTINCT ON clause, which is very useful when we need to join only the first match in SQL:

WITH last_orders AS (
  SELECT DISTINCT ON (customer_id)
    *
  FROM orders
  ORDER BY customer_id, order_date DESC
)
SELECT 
  customers.id, 
  customers.first_name, 
  customers.last_name,
  last_orders.order_date, 
  last_orders.order_status
FROM customers
JOIN last_orders
ON customers.id = last_orders.customer_id
ORDER BY customer_id;

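Instead of a separate subquery to number the rows or define the most recent order using order ID, we use DISTINCT ON (customer_id) to get only the first row corresponding to each customer. Also, in our CTE, we sort the rows by order date in descending order to ensure that the first row for each customer corresponds to the most recent order of this customer. Note that PostgreSQL requires the DISTINCT ON expressions to match the leftmost ORDER BY expressions, which is why customer_id comes first in the ORDER BY clause.

The DISTINCT ON () clause is very convenient for cases like this, but unfortunately, it is a PostgreSQL extension rather than standard SQL, so it is not available in databases such as MySQL or MS SQL Server.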

Solution 4: Use LIMIT or TOP N query

We can also use the clauses that SQL provides for limiting the number of rows a query returns. Most SQL dialects have such an option, but the syntax differs.

Several SQL dialects (e.g., SQLite, MySQL, and PostgreSQL) use the LIMIT clause to specify the number of rows to be returned. You can use it in a correlated subquery to select only the most recent order for each customer: sort that customer’s orders by date in descending order, then limit the output to only one row:

SELECT 
  customers.id, 
  customers.first_name, 
  customers.last_name,
  orders.order_date, 
  orders.order_status
FROM customers
JOIN orders
ON orders.id = (
  SELECT id
  FROM orders
  WHERE customer_id = customers.id
  ORDER BY order_date DESC
  LIMIT 1
)
ORDER BY customer_id;

MS SQL Server doesn’t support the LIMIT clause, but it has another solution to join only the top row in SQL. You can use the TOP 1 clause in MS SQL Server to get only the most recent orders joined with the customers table:

SELECT 
  customers.id, 
  customers.first_name, 
  customers.last_name,
  orders.order_date, 
  orders.order_status
FROM customers
JOIN orders
ON orders.id = (
  SELECT TOP 1 id
  FROM orders
  WHERE customer_id = customers.id
  ORDER BY order_date DESC
)
ORDER BY customer_id;

As in the LIMIT version, we order the rows by date in descending order to make sure that the TOP 1 clause selects the most recent order for each customer.
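Both queries in this solution use an inner JOIN, so a customer without any orders does not appear in the result. If you need to keep such customers, one option is to switch to a LEFT JOIN. Here is a minimal sketch of the LIMIT variant, run in SQLite through Python’s sqlite3 module. The customer with no orders (ID 15) and the trimmed table definitions are invented for illustration, and the alias o is added only to make the subquery easier to read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (
  id INTEGER PRIMARY KEY,
  first_name TEXT,
  last_name TEXT
);
CREATE TABLE orders (
  id INTEGER PRIMARY KEY,
  order_date TEXT,
  customer_id INTEGER,
  order_status TEXT
);
-- Hypothetical data: customer 15 has not placed any orders.
INSERT INTO customers VALUES
  (11, 'Kate', 'White'),
  (15, 'Ann', 'Lee');
INSERT INTO orders VALUES
  (101, '2021-09-20', 11, 'Completed'),
  (104, '2021-10-05', 11, 'Awaiting shipment');
""")

rows = conn.execute("""
SELECT
  customers.id,
  customers.first_name,
  orders.order_date,
  orders.order_status
FROM customers
LEFT JOIN orders
ON orders.id = (
  SELECT o.id
  FROM orders AS o
  WHERE o.customer_id = customers.id
  ORDER BY o.order_date DESC
  LIMIT 1
)
ORDER BY customers.id
""").fetchall()

for row in rows:
    print(row)
```

For the customer without orders, the subquery returns no row, the join condition is never satisfied, and the LEFT JOIN fills the order columns with NULL (None in Python).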

Let’s Practice SQL JOINs!

I hope that these solutions have shown you how powerful and flexible SQL can be with various tasks. You can see how SQL JOINs can be used to join only the first row when there is a one-to-many relationship between two tables. There are many more use cases where SQL JOINs can help address non-trivial problems.

To review and deepen your knowledge of SQL JOINs, I recommend the interactive SQL JOINs course, which includes 93 hands-on exercises. It covers INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL JOIN; filtering data with different kinds of JOINs; joining a table with itself; joining tables on non-key columns; and more.

Thanks for reading, and happy learning!
