3rd Mar 2021 9 minutes read

How to JOIN Tables in SQL

JOIN

Table of Contents

Step by Step Through SQL JOIN
Wrapping Up SQL JOINs!

Do you need to display data stored in different tables? Then it’s time to use SQL JOINs! This can be a challenging topic for beginners, so I’ve prepared this comprehensive step-by-step guide on joining tables in SQL.

Often, the information that you want to display is stored in several tables. In such cases, you’ll need to join these tables by specifying which rows should be combined with other rows. That’s exactly what JOIN does. The JOIN clause in SQL is used to combine rows from one or more tables based on a common column value.

For example, let’s say we research modern architecture and have the tables buildings, architects, and building_restorations:

buildings
id	name	location	year	architect_id
124	Fallingwater House	Mill Run, PA, USA	1935	11
231	Barcelona Pavilion	Barcelona, Spain	1929	15
764	Villa Dirickz	Brussels, Belgium	1933	18
850	Guggenheim Museum	New York, NY, USA	1959	11
894	Neue Nationalgalerie	Berlin, Germany	1968	15

architects
id	name	year_of_birth	year_of_death
11	Frank Lloyd Wright	1867	1959
15	Ludwig Mies Van der Rohe	1886	1969
18	Marcel Leborgne	1898	1978

building_restorations
id	building_id	arhitect_id	restoration_year
1	124	11	2001
2	231	15	1986
3	850	11	2008
4	894	15	2021

Now, suppose we want to list the buildings along with the respective architects. Note that building names and architect names are stored in different tables. However, we have the architect_id column in the buildings table and the id column in the architects table. We can join these two tables based on those related columns. Here’s the query:

SELECT buildings.name, architects.name
FROM buildings
JOIN architects
ON buildings.architect_id = architects.id;

We first specify the columns we want to display in the SELECT clause. Then, we join the tables by specifying one table in the FROM clause, another table in the JOIN clause, and the related columns in the ON clause. Here’s the output:

name	name
Guggenheim Museum	Frank Lloyd Wright
Fallingwater House	Frank Lloyd Wright
Neue Nationalgalerie	Ludwig Mies Van der Rohe
Barcelona Pavilion	Ludwig Mies Van der Rohe
Villa Dirickz	Marcel Leborgne

Don’t worry if joining tables in SQL seems confusing to you. We’ll discuss the syntax of SQL JOIN below. You can also get more practice with SQL JOINs in this interactive course.

Step by Step Through SQL JOIN

Relational databases include multiple tables storing different pieces of information. Since data analysis implies getting insights from data by analyzing it from different viewpoints, joining tables is very common in SQL. However, many beginners find this topic challenging.

If you are new to SQL JOINs, you may find it useful to go through this illustrated guide to the SQL INNER JOIN, which is the basic JOIN type.

But now, let’s take a guided tour of the JOIN syntax.

Step 1. Identify Tables to JOIN

We always start by identifying the tables that contain the information to be displayed. In the above example, we wanted to display the building names along with their architect’s names. Since the building names are stored in the buildings table and the architect names are stored in the architects table, we have joined these two tables to display the necessary information.

Another example could be to list the restorations – specifically, the building name and the restoration year. Since the building name is stored only in the buildings table and the year of restoration is available only in the building_restorations table, we need to join these two tables to display the information we want.

In the example query, the tables are specified as follows:

FROM buildings
JOIN building_restorations

For simple JOINs, the order of tables does not matter.

Step 2. Identify the JOIN Condition

The next step is to tell the database which rows should be joined with which other rows. That’s called the JOIN condition. Usually, the JOIN condition is equality, which means you specify the columns that the tables have in common and put an equal sign between these columns. This condition is put after the ON keyword:

ON buildings.architect_id = architects.id

Note that most databases have a consistent naming convention; you need to know the conventions in your database to work with it effectively. Usually, the convention is to call the first column id or similar and use it as a unique identifier for each record. This unique identifier is also referred to as a primary key. All the tables in our example (i.e. buildings, architects, and building_restorations) have the id column as their primary key.

Then, it’s usual that one table has a column that references another table, like the architect_id column in the buildings table or the building_id and architect_id columns in the building_restorations table. These columns are referred to as foreign keys.

It is very common to join tables using one table’s primary key – which is the other table’s foreign key. To list buildings together with their architects, we have joined the respective tables using the primary key of the architects table (i.e. id) and the foreign key of the buildings table (i.e., architect_id, which points to the architect table). If we wanted to list buildings together with the restoration year, we would join the buildings and the building_restorations tables using the primary key of the buildings table (i.e. id) and the relevant foreign key of the building_restorations table (i.e. building_id).

In some cases, a primary key might be multi-column. If the referencing table also includes a two-column foreign key, you can join the tables using a JOIN condition with two equalities. For example, instead of one id column in the architects table, we could have a composite primary key consisting of two columns: (1) style_id, which references the architectural style (e.g. modern, contemporary, etc.) and (2) architect_id, which references the ID of a particular architect working in this style. Assuming that the buildings table now also includes the style_id and the architect_id columns, we can join the buildings and architects tables using the following JOIN condition:

ON buildings.style_id = architects.style_id AND 
   buildings.architect_id = architects.architect_id

Also, note that while the equality condition is the most common for joining tables in SQL, you can use other conditions to join tables. This is called a non-equi join; you can find examples of non-equi joins in this practical guide.

Step 3. Refer to the Columns Properly

When you’re displaying data from multiple tables, you need to be very careful when referring to the columns. We may have columns with the same name in different tables. (All three of our tables have a column called id.) Thus, when referring to a particular column, we first specify the table name, then use a dot, and finally include the column name (e.g. buildings.architect_id). The same rules apply to naming columns in the ON condition, the SELECT clause, or any other clauses in the query.

However, if the column name is unambiguous (i.e. used only in one table), we may specify only the column name and omit the table name. For example, if we want to display the year of each building together with its name and architect, we may use the following query:

SELECT buildings.name, architects.name, year
FROM buildings
JOIN architects
ON buildings.architect_id = architects.id;

Since the column year is present in the buildings table only, the database knows which column we want to display and doesn’t require us to specify the table.

Step 4. Use Table Aliases (Optional)

Even though some of the column names are unique and can be referenced without a table name, you’ll still need to specify the name of the table before each column name in most cases. If the table name is long, it’s not convenient or readable to type it over and over again. In such cases, you may use a table alias, i.e. an alternative short name for this table in this particular query.

You declare the aliases in the FROM and JOIN clauses by putting the alias after the full table name, with a space in between. You then use the alias everywhere instead of the full table name:

SELECT b.name, a.name
FROM buildings b
JOIN architects a
ON b.architect_id = a.id;

To keep queries readable and understandable to others, pick meaningful names for aliases. Usually, the first letter of a table name is used as an alias (e.g. b for buildings). If a table name consists of several words, you may use the first letters of each word (e.g. br for building_restorations).

Step 5. Use Column Aliases (Optional)

When the result of a query is displayed, the original columns are the column names used in the output table. In some cases, you may want to rename these columns – or, in other words, replace the original column names with column aliases.

This option is most often used when there are two columns with the same name coming from two different tables and you want to see clearly what kind of information is displayed in each column of the output.

For example, you may have noticed that in the output table of our very first query, both columns are called name; we need to investigate the table content to understand which column lists buildings and which column lists architects. Let’s replace the original names with meaningful aliases. We do this by declaring the column aliases in the SELECT statement, after the AS keyword:

SELECT b.name AS building, a.name AS architect
FROM buildings b
JOIN architects a
ON b.architect_id = a.id;

Now we have the following output:

building	architect
Guggenheim Museum	Frank Lloyd Wright
Fallingwater House	Frank Lloyd Wright
Neue Nationalgalerie	Ludwig Mies Van der Rohe
Barcelona Pavilion	Ludwig Mies Van der Rohe
Villa Dirickz	Marcel Leborgne

Much better, isn’t it?

Note that you can rename any column you want, not necessarily just those with the same names. Also, the column alias only influences how the result is displayed; it cannot be used as a column name in other parts of the query.

Wrapping Up SQL JOINs!

In real-world scenarios, you often need to combine and analyze data from two or more tables. That’s when SQL JOINs come into play! To join two tables in SQL, you need to write a query with the following steps:

Identify the tables to JOIN.
Identify the JOIN condition.
Refer to the columns properly.
(Optional) Use table aliases to make the query readable.
(Optional) Use column aliases to make the result readable.

After joining the necessary tables, you can use WHERE, GROUP BY, HAVING, ORDER BY, and other SQL clauses in the same way you do for a single table query.

In this article, we went through the basics of joining tables in SQL. However, there is much more to learn. Here, we were using examples of inner JOINs only, but there are several other JOIN types in SQL. Check out our SQL JOINs course – it has 93 interactive exercises that cover all common types of JOINs, including self-joins, non-equi JOINs, and more. Learn more about the course in this overview article.

If you want to become really confident with SQL JOINs, consider trying our SQL Practice track. It has courses that go beyond SQL JOINs; in total, there are 341 coding challenges for you to solve. That’s a LOT of practice!

Thanks for reading, and happy learning!

Tags:

JOIN