13th Oct 2020 11 minutes read What Is a Self Join in SQL? Explaned With 7 Examples Martyna Sławińska JOIN Table of Contents What Is a Self Join in SQL? Table Aliases in Self Join Examples Scenario 1: Processing a Hierarchy in SQL Scenario 2: Listing Pairs Within a Table Scenario 3: Self Join in Combination With Another Table Self Join: A Special Case of the Join The self join is a popular special case of the SQL JOIN. While most JOINs link two or more tables with each other to present their data together, a self join links a table to itself. This is usually done by joining a table to itself just once within a SQL query, but it is possible to do so multiple times within the same query. Typically, each table in a database stores a specific type of information. So, often there are hundreds of tables related to each other in a database. This implies the need for joins. You can join different tables by their common columns using the JOIN keyword. It is also possible to join a table to itself, which is known as a self join. In this article, we will discuss what a self join is, how it works, and when you need it in your SQL queries. To practice SQL JOIN, including self joins, I recommend our interactive SQL JOINs course. It contains over 90 exercises on differnt types of JOIN, including sections dedicated just to self joins. What Is a Self Join in SQL? The self join, as its name implies, joins a table to itself. To use a self join, the table must contain a column (call it X) that acts as the primary key and a different column (call it Y) that stores values that can be matched up with the values in Column X. The values of Columns X and Y do not have to be the same for any given row, and the value in Column Y may even be null. Let’s take a look at an example. Consider the table Employees: IdFullNameSalaryManagerId 1John Smith100003 2Jane Anderson120003 3Tom Lanon150004 4Anne Connor20000 5Jeremy York90001 Each employee has his/her own Id, which is our “Column X.” For a given employee (i.e., row), the column ManagerId contains the Id of his or her manager; this is our “Column Y.” If we trace the employee-manager pairs in this table using these columns: The manager of the employee John Smith is the employee with Id 3, i.e., Tom Lanon. The manager of the employee Jane Anderson is the employee with Id 3, i.e., Tom Lanon. The manager of the employee Tom Lanon is the employee with Id 4, i.e., Anne Connor. The employee Anne Connor does not have a manager; her ManagerId is null. The manager of the employee Jeremy York is the employee with Id 1, i.e., John Smith. This type of table structure is very common in hierarchies. Now, to show the name of the manager for each employee in the same row, we can run the following query: SELECT employee.Id, employee.FullName, employee.ManagerId, manager.FullName as ManagerName FROM Employees employee JOIN Employees manager ON employee.ManagerId = manager.Id which returns the following result: IdFullNameManagerIdManagerName 1John Smith3Tom Lanon 2Jane Anderson3Tom Lanon 3Tom Lanon4Anne Connor 5Jeremy York1John Smith The query selects the columns Id, FullName, and ManagerId from the table aliased employee. It also selects the FullName column of the table aliased manager and designates this column as ManagerName. As a result, every employee who has a manager is output along with his/her manager’s ID and name. In this query, the Employees table is joined with itself and has two different roles: Role 1: It stores the employee data (alias employee). Role 2: It stores the manager data (alias manager). By doing so, we are essentially considering the two copies of the Employees table as if they are two distinct tables, one for the employees and another for the managers. You can find more about the concept of the self join in our article An Illustrated Guide to the SQL Self Join. Table Aliases in Self Join When referring to the same table more than once in an SQL query, we need a way to distinguish each reference from the others. For this reason, it is important to use aliases to uniquely identify each reference of the same table in an SQL query. As a good practice, the aliases should indicate the role of the table for each specific reference in a query. The aliases are in red in the following query. You can see their declaration in the FROM and JOIN clauses. SELECT employee.Id, employee.FullName, employee.ManagerId, manager.FullName as ManagerName FROM Employees employee JOIN Employees manager ON employee.ManagerId = manager.Id The JOIN keyword connects two tables and is usually followed by an ON or USING clause that specifies the common columns used for linking the two tables. Here, we see that the two references to the Employees table are linked by conditioning on the employee’s ManagerId to match the employee ID of the manager. Examples Let’s go through some common scenarios that use the self join. Scenario 1: Processing a Hierarchy in SQL The self join is commonly used in processing a hierarchy. As we saw earlier, a hierarchy assigns a row in a table to another row within the same table. You might think of it as having parent and child rows. Let’s go back to the example with the employees and their managers. Here’s the Employees table again: IdFullNameSalaryManagerId 1John Smith100003 2Jane Anderson120003 3Tom Lanon150004 4Anne Connor20000 5Jeremy York90001 And the code to list every employee that has a manager with the name of his or her manager: SELECT employee.Id, employee.FullName, employee.ManagerId, manager.FullName as ManagerName FROM Employees employee JOIN Employees manager ON employee.ManagerId = manager.Id Here’s the result when you run the code: IdFullNameManagerIdManagerName 1John Smith3Tom Lanon 2Jane Anderson3Tom Lanon 3Tom Lanon4Anne Connor 5Jeremy York1John Smith This query uses the standard join, also known as the INNER JOIN. To read more about the INNER JOIN, please see our article An Illustrated Guide to the SQL INNER JOIN. If we want to list all the employees whether or not they have managers, we can use a LEFT OUTER JOIN instead. The query below does this: SELECT employee.Id, employee.FullName, employee.ManagerId, manager.FullName as ManagerName FROM Employees employee LEFT OUTER JOIN Employees manager ON employee.ManagerId = manager.Id When you run this query, you get the following result: IdFullNameManagerIdManagerName 1John Smith3Tom Lanon 2Jane Anderson3Tom Lanon 3Tom Lanon4Anne Connor 4Anne Connor 5Jeremy York1John Smith The difference between JOIN and LEFT OUTER JOIN becomes clear when we compare this with the result of the previous SQL query. In the result of the (inner) JOIN query, only the employees with managers are included. In contrast, the LEFT OUTER JOIN query returns all employees, with or without managers. To read more about the OUTER JOIN, please read our article An Illustrated Guide to the SQL OUTER JOIN. Another example of a hierarchy is the relationship between parents and their children. Consider the Human table shown here: IdNameAgeParentId 1Jonathan53 2Alexandra73 3Barbara30 In the query below, the children are assigned to their respective parents by joining the Human table to itself: SELECT child.Id as ChildId, child.FirstName as ChildFirstName, child.Age as ChildAge, child.ParentId, parent.FirstName as ParentFirstName, parent.age as ParentAge FROM Human child INNER JOIN Human parent ON child.ParentId = parent.Id Here’s the result of this query: ChildIdChildFirstNameChildAgeParentIdParentFirstNameParentAge 1Jonathan53Barbara30 2Alexandra73Barbara30 The result of the query includes only the children who have parents. As was the case in the example of the employee-manager hierarchy, we could use a LEFT OUTER JOIN to include all rows from the table aliased child. Here is yet another example of a hierarchy. Consider the table Category, shown below: IdQuantityCategoryParentCategoryId 160Food 250Fruit1 340Apple2 420Granny Smith3 5100Milk1 660Soy Milk5 740Cow Milk5 830Whole Milk7 910Fat-Free Milk7 Let’s assign a parent category to each category wherever it is possible. Here’s a query to do that: SELECT category.Id, category.Quantity, category.Category, category.ParentCategoryId, parentcategory.Category as ParentCategory FROM Category category JOIN Category parentcategory ON category.ParentCategoryId = parentcategory.Id And here is the result: IdQuantityCategoryParentCategoryIdParentCategory 250Fruit1Food 340Apple2Fruit 420Granny Smith3Apple 5100Milk1Food 660Soy Milk5Milk 740Cow Milk5Milk 830Whole Milk7Cow Milk 910Fat-Free Milk7Cow Milk The first four columns of the result above come from the reference to the table aliased category. The last column comes from the table aliased parentcategory and contains the parent category name matched by the respective Id. The Category table has two different roles as denoted by the two separate references. The column ParentCategoryId from the table aliased category is matched with Id from the table aliased parentcategory. The ON clause specifies that ParentCategoryId from category must equal Id from parentcategory to connect the corresponding rows. Scenario 2: Listing Pairs Within a Table You can use a self join to generate pairs of rows based on the condition in the ON clause. Let’s start with a simple example that generates all possible pairs among the colleagues. Consider the following table, Colleagues: IdFullNameAge 1Bart Thompson43 2Catherine Anderson44 3John Burkin35 4Nicole McGregor29 Suppose we need to generate all possible pairs among the colleagues so that everyone has a chance to talk with everyone else at the company introductory evening. Here’s the SQL code: SELECT teammate1.FullName as Teammate1FullName, teammate1.Age as Teammate1Age, teammate2.FullName as Teammate2FullName, teammate2.Age as Teammate2Age FROM Colleagues teammate1 CROSS JOIN Colleagues teammate2 ON teammate1.FullName <> teammate2.FullName And here’s the result: Teammate1FullNameTeammate1AgeTeammate2FullNameTeammate2Age Catherine Anderson44Bart Thompson43 John Burkin35Bart Thompson43 Nicole McGregor29Bart Thompson43 Bart Thompson43Catherine Anderson44 John Burkin35Catherine Anderson44 Nicole McGregor29Catherine Anderson44 Bart Thompson43John Burkin35 Catherine Anderson44John Burkin35 Nicole McGregor29John Burkin35 Bart Thompson43Nicole McGregor29 Catherine Anderson44Nicole McGregor29 John Burkin35Nicole McGregor29 The result matches every person with every single person in the table. Since we don’t want anyone to be paired with himself or herself, we have the ON clause condition teammate1.FullName <> teammate2.FullName. This means that each person will be paired with three other colleagues, because there are four colleagues at this event. Now, let’s look at a slightly more complicated example. Consider the Human table shown below. We want to match all the ancestors to each person wherever the data allows, where a person is an ancestor if he or she has a higher Id value. Below is the Human table used in this example. IdFirstNameAgeParentId 1Jonathan53 2Alexandra73 3Barbara304 4Tom506 5George556 6Amy807 7Josephine9935 Let’s find all the descendant-ancestor pairs in the above table. Here’s the SQL code: SELECT descendant.Id, descendant.FirstName, descendant.Age, descendant.ParentId, ancestor.Id as AncestorId, ancestor.FirstName as AncestorFirstName, ancestor.Age as AncestorAge FROM Human descendant LEFT JOIN Human ancestor ON descendant.ParentId <= ancestor.Id And the result: IdFirstNameAgeParentIdAncestorIdAncestorFirstNameAncestorAge 1Jonathan533Barbara30 1Jonathan534Tom50 1Jonathan535George55 1Jonathan536Amy80 1Jonathan537Josephine99 2Alexandra733Barbara30 2Alexandra734Tom50 2Alexandra735George55 2Alexandra736Amy80 2Alexandra737Josephine99 3Barbara3044Tom50 3Barbara3045George55 3Barbara3046Amy80 3Barbara3047Josephine99 4Tom5066Amy80 4Tom5067Josephine99 5George5566Amy80 5George5567Josephine99 6Amy8077Josephine99 7Josephine99 By specifying the ON clause condition descendant.ParentId <= ancestor.Id, we find all the ancestors of every person in the table where they exist; otherwise, the query returns null for the ancestor information. The first four columns are taken from the table aliased descendant which contains information of the person for whom the ancestors are searched. The last three columns are taken from the table aliased ancestor and contain details about each ancestor. Scenario 3: Self Join in Combination With Another Table In SQL, it is possible to have a self join in combination with one or more different tables. While not a clean self join, this is very common in practice. A real-life example of this is the flight information in airports, with an enormous amount of data each hour. Suppose we want to search for a flight identification number along with the details about their departure and destination airports. Consider the following tables: Table Airport: AirportIdCountryCity 1USANew York 2CanadaToronto 3GermanyFrankfurt 4FranceParis 5ItalyRome Table Flight: FlightIdAirplaneIdStartTimestampEndTimestampStartAirportIdEndAirportId 25558772020-01-14 13:00:002020-01-14 15:00:0034 32225362020-02-04 01:00:002020-02-04 16:00:0015 41117452020-02-15 09:00:002020-02-15 12:00:0054 57775242020-02-24 03:00:002020-02-24 19:00:0042 68885212020-03-25 10:00:002020-03-25 12:00:0021 74449372020-04-01 00:00:002020-04-01 17:00:0031 2431116542020-01-01 02:00:002020-01-01 04:00:0012 Here, note that the column AirportId of the Airport table is the foreign key to the columns StartAirportId and EndAirportId of the Flight table. We will join the Airport table to the Flight table two separate times as follows: In the first JOIN, Airport takes the role of the table with the starting airports. In the second JOIN, Airport takes the role of the table with the destination airports. The query looks like this: SELECT flight.FlightId, flight.AirplaneId, flight.StartAirportId, startairport.Country as StartAirportCountry, startairport.City as StartAirportCity, flight.EndAirportId, endairport.Country as EndAirportCountry, endairport.City as EndAirportCity FROM Flight flight JOIN Airport startairport ON flight.StartAirportId = startairport.AirportId JOIN Airport endairport ON flight.EndAirportId = endairport.AirportId And the result of the query looks like this: FlightIdAirplaneIdStartAirportIdStartAirportCountryStartAirportCityEndAirportIdEndAirportCountryEndAirportCity 11116541USANew York2CanadaToronto 25558773GermanyFrankfurt4FranceParis 32225361USANew York5ItalyRome 41117455ItalyRome4FranceParis 57775244FranceParis2CanadaToronto 68885212CanadaToronto1USANew York 74449373GermanyFrankfurt1USANew York Let’s analyze the result. The first three columns come from a straightforward SELECT of the Flight table. The next two columns come from Airport in the role of the starting airport table; the rows are matched based on AirportId and StartAirportId from the Airport and Flight tables, respectively. This is followed by a column from the Flight table. The last two columns come from Airport in the role of the destination airport table this time; the rows are matched based on AirportId and EndAirportId from the Airport and Flight tables, respectively. Still a bit confused about all the JOINs? There are many more articles for you to browse through for help. I especially recommend the article on How to Learn SQL JOINs. And if you need to start practicing, please visit our article on How to Practice SQL JOINs. Self Join: A Special Case of the Join As we have seen, the self join is an important special case of the join. We have seen examples of various applications of the self join, including processing a hierarchy in a table and pairing the rows within a table. We can join the same table multiple times, but it is important to give each reference an alias that indicates its role. These table aliases are used to fetch columns from this single table based on the role for which it is referenced. Joins are a vital part of SQL and a very helpful, frequently used feature for combining different tables. They are everywhere – be sure to check out our course on SQL JOINs to master this powerful tool! Tags: JOIN