13th Jun 2017 Updated: 9th Aug 2018 10 minutes read Learning JOINs With Real World SQL Examples Francisco Claria JOIN Table of Contents What is the SQL JOIN Clause? Getting to Know the JOIN Types JOINs in SQL Real Time Examples The Grandfather, Father, and Son Relationship The Many-to-Many Relationship Try It Yourself The JOIN statement lets you work with data stored in multiple tables. In this article, I’ll walk you through the topic of JOIN clauses using real world SQL examples. Imagine if you could only work with one database table at a time. Fortunately, this isn’t anything we have to worry about. Once you learn the JOIN statement, you can start linking data together. In this article, I’ll use real world SQL examples that illustrate how we use JOINs, how each type of JOIN works, and when to use each type. Plus, I’ll share some tips that will help you avoid making common mistakes when using JOINs. What is the SQL JOIN Clause? The JOIN clause allows us to combine the columns from two or more tables based on shared column values. I’ll explain it using SQL real time examples. Here’s the first one: say we have a students table that holds student names, their respective usernames, and an ID number. We also have a “comments” table that stores any comments students have posted in a forum. Here are the two tables. Let’s add some test data: INSERT INTO `students` VALUES (1,'Jhon','jj2005'),(2,'Albert','salbert'),(3,'Mathew','powermath'), (4,'Lisa','lisabbc'),(5,'Sandy','imsandyw'),(6,'Tamara','tamy21'); id name forum_username 1 Jhon jj2005 2 Albert salbert 3 Mathew powermath 4 Lisa lisabbc 5 Sandy imsandyw 6 Tamara tamy21 INSERT INTO 'comments' VALUES (1,'jj2005','Awesome!'),(2,'jj2005','This is great :)'), (3,'powermath','Hmmm...'),(4,'imsandyw','Let\'s wait a moment'), (5,'lisabbc','Sure thing'),(6,'lisabbc','wow!'), (7,'lisabbc','lol :) :) :)'); id forum_username comment 1 jj2005 Awesome! 2 jj2005 This is great :) 3 powermath Hmmm… 4 imsandyw Let’s wait a moment 5 lisabbc Sure thing 6 lisabbc wow! 7 lisabbc lol :) :) :) As you can see, both tables have the forum_username column in common. Therefore, this column can be used to in a JOIN statement to relate the two tables together. For example, if we wanted to know the actual student name for each of the comments in the forum, we would write this JOIN query: SELECT students.name, comments.forum_username, comments.comment FROM students INNER JOIN comments ON students.forum_username = comments.forum_username ORDER BY students.name ASC; The results would look like this: name forum_username comment Jhon jj2005 Awesome! Jhon jj2005 This is great :) Lisa lisabbc lol :) :) :) Lisa lisabbc wow! Lisa lisabbc Sure thing Mathew powermath Hmmm… Sandy imsandyw Let’s wait a moment Note: I have purposely used forum_username to illustrate the JOIN concept, but in practice you would use the primary key (in this case, the id column from the students table) to relate tables. Getting to Know the JOIN Types There are several types of JOINs. Let’s review them quickly: INNER JOIN: This JOIN returns records that have a match in both tables based on the join predicate (which comes after the ON keyword). This is the same JOIN as we used in the preceding example. The INNER keyword is optional. LEFT [OUTER] JOIN: This will return all records from the left table (i.e. the table you list first in the JOIN) and only matching records from the right (i.e. second) table. The OUTER keyword is optional. Getting back to our real world SQL examples, in the student forum case, this would be: SELECT students.name, comments.forum_username, comments.comment FROM students LEFT JOIN comments ON students.forum_username = comments.forum_username ORDER BY students.name ASC; This LEFT JOIN would return a record for all students – including Tamara and Albert, who don’t have any comments and wouldn’t be listed in the results of an INNER JOIN. Notice too that in this LEFT JOIN example, Tamara’s and Albert’s comment column results will be NULL. RIGHT [OUTER] JOIN: This is the inverse of the LEFT JOIN; it returns all records from the right (second) table and only those that have a match from the left (first) table. A query similar to the previous example would look like this: SELECT students.name, comments.forum_username, comments.comment FROM students RIGHT JOIN comments ON students.forum_username = comments.forum_username ORDER BY students.name ASC; FULL [OUTER] JOIN: This is essentially the combination of a LEFT JOIN and a RIGHT JOIN. The result set will include all rows from both tables, populating the columns with table values when possible or with NULLs when there is no match in the counterpart table. This is not a JOIN you’ll use very often in real life. Note: MySQL lacks this statement, but a similar result can be achieved using the UNION of LEFT JOIN and RIGHT JOIN. In the case of our real world SQL examples, it’s worth noticing that in the following FULL JOIN clause we are supplying the ON keyword as we do in LEFT or RIGHT JOINs: SELECT * FROM students FULL OUTER JOIN comments ON students.forum_username = comments.forum_username ORDER BY students.name ASC; CROSS JOIN: This is another join type that you won’t use all that often – in this case, because it retrieves the Cartesian product of both tables. Basically, this gives you the combination of all records from both tables. CROSS JOIN does not apply a predicate (there’s no ON keyword), but it’s still possible to filter rows using WHERE. Doing this could make the result set equivalent to an INNER JOIN. In MySQL, JOIN, CROSS JOIN, and INNER JOIN are syntactic equivalents, i.e. they can replace each other. Below, there is no ON clause to filter the results. All possible combinations from both tables will be shown in the result set: SELECT * FROM students CROSS JOIN comments ORDER BY students.name ASC; You can learn more about the types of SQL JOINs on our YouTube channel - We Learn SQL. Remember to click subscribe. JOINs in SQL Real Time Examples The number of scenarios that require a JOIN is endless, but some scenarios do appear more often. Instead of going through the typical table1/table2 example, I’d rather present you with real world SQL examples. We can use these to get some practical tips. The Grandfather, Father, and Son Relationship A common real time scenario deals with data that follows this kind of relationship. For instance, a user is located in a city that belongs to a state. The tables (with more or fewer columns) look something like this: Let’s add some data: INSERT INTO `states` VALUES (3,'California'),(2,'Florida'),(1,'Nevada'); INSERT INTO `cities` VALUES (1,1,'Miami'),(2,1,'Orlando'), (3,2,'Las Vegas'),(4,2,'Coyote Springs'); INSERT INTO `users` VALUES (1,1,'Jhon','Doe'),(2,1,'Albert','Thomson'), (3,2,'Robert','Ford'),(4,3,'Samantha','Simpson'); To obtain the full list of users in a given city and state we will need to join the son table (User) with its father (City) and grandfather (State). SELECT User.first_name, user.last_name, City.cityname, State.statename FROM users User INNER JOIN cities City ON User.city_id = City.id INNER JOIN states State ON City.state_id = State.id; Already we have a couple of useful tips: The columns used to link tables should be indexed for greater performance. When the columns that link the tables (like the former example) are pointing to the primary key of the related table, then we are talking about foreign keys. In this case, it is better to include this relationship as part of your table definition; it will increase performance. In MySQL, you can create a users/city foreign key like this: ALTER TABLE `users` ADD INDEX `fk_city_idx` (`city_id` ASC); ALTER TABLE `users` ADD CONSTRAINT `fk_city` FOREIGN KEY (`city_id`) REFERENCES `cities` (`id`) ON DELETE CASCADE ON UPDATE CASCADE; This gives you the additional benefit of the integrity check that will be performed by the engine when data in these tables is updated or deleted. Suppose we want to find all the users in one state. You add a filtering condition to the query, as shown below: SELECT User.first_name, user.last_name, City.cityname, State.statename FROM users User INNER JOIN cities City ON User.city_id = City.id INNER JOIN states State ON City.state_id = State.id AND State.statename = 'Nevada'; Or you could even use an implicit join (shown in bold text), like this: SELECT User.first_name, user.last_name, City.cityname, State.statename FROM users User, cities City, states State WHERE User.city_id = City.id AND City.state_id = State.id AND State.statename = 'Nevada'; But I’d suggest you explicitly write the JOIN and keep the joining criteria and the filtering conditions separate: SELECT User.first_name, user.last_name, City.cityname, State.statename FROM users User INNER JOIN cities City ON User.city_id = City.id INNER JOIN states State ON City.state_id = State.id WHERE State.statename = 'Nevada'; A couple of important concepts: We can see how the son (user) table is filtered based on the conditions used on the grandfather (state) table. In other words, the users were based on a given state. Similarly, we could have filtered results based on the father (city) table and gotten a list of users based on a given city. A general rule of thumb is that the JOIN predicates (the conditions after the ON keyword) should be used for the joining relationship only. Leave the rest of the filtering conditions inside the WHERE section. This will simplify query readability and future code maintenance. Conversely, we could return states based on certain criteria. For instance, we could retrieve states that have at least one user belonging to that state: SELECT DISTINCT State.statename FROM states State INNER JOIN cities City ON City.state_id = State.id INNER JOIN users User ON User.city_id = City.id What do we learn here? The INNER JOIN removes any “unused” or unmatched records (records without a match on both sides of the JOIN). DISTINCT filters out duplicate records. Since there could be several users for a state, if we didn’t use DISTINCT we would get as many repeated states as users belonging to it. (The same filtering effect could also be achieved using GROUP BY.) Further filtering could be accomplished by adding WHERE conditions. The Many-to-Many Relationship Want more SQL real time examples? Another typical scenario for JOINs is when records relate to each other in a “many-to-many” or N-to-N manner. Say you have a system where you create badges that are awarded to users. In this case, a user has badges and at the same time a badge has users. These relationships will need a third table that will connect the primary keys from users and badges. It would look something like this: Let’s add some sample data: INSERT INTO `badges` VALUES (1,'gold heart',100),(2,'silver heart',50),(3,'metal heart',10),(4,'star',5),(5,'wood',2),(6,'dust',1); INSERT INTO `users` VALUES (1,'Robert','Williams'),(2,'Anthony','McPeters'),(3,'Tania','Krugger'),(4,'JJ','Richards'),(5,'Katy','Thomas'); INSERT INTO `badges_users` VALUES (1,1,1),(2,4,1),(3,4,2),(4,4,2),(5,4,2),(6,3,3),(7,3,3),(8,2,4); How can a JOIN fetch us all the users with their respective badges? SELECT User.first_name, User.last_name, BU.user_id, BU.badge_id, Badge.badge_name, Badge.badge_points FROM users User LEFT JOIN badges_users B ON User.id = BU.user_id LEFT JOIN badges Badge ON BU.badge_id = Badge.id ORDER BY Badge.badge_points DESC ember about this type of Here are a few things to remember about this type of query: We’ve used LEFT JOIN here on purpose because it will show users that have no badge at all. If we had used an INNER JOIN or an implicit inner join (by setting the ID’s equalities in a WHERE), then users who have no badges would not be included in the results. If you want to exclude these users, you should use an INNER JOIN. Also, using a LEFT JOIN means that unused badges will not be listed; we are focusing on the users and not on the badges. Finally, remember to properly index the intermediate table (badges_users). All its foreign keys should be defined. Now let’s grab all the badges that have at least one user. Expressed in another way, these are badges that have been used at least once. The query would be: SELECT DISTINCT Badge.badge_name FROM badges Badge LEFT JOIN badges_users BadgeUser ON Badge.id = BadgeUser.badge_id And if we wanted to get all unused badges, the query becomes: SELECT Badge.badge_name, Badge.badge_points FROM badges Badge LEFT JOIN badges_users BadgeUser ON Badge.id = BadgeUser.badge_id WHERE BadgeUser.badge_id IS NULL Notice that: We don’t always need to join all three tables. By only joining with the intermediate table (badges_users), which holds the references to the user and badge tables, we can still successfully run our queries. The intermediate table could also be used to save additional information. For instance, it could store the timestamp value when a user was given a certain badge. Try It Yourself You will likely daily face these types of situations when dealing with tables that hold related data. I strongly recommend you examine the above real world SQL examples and test them with the actual data. This will give you a clearer understanding of the concepts involved. If you have any comments on our real world SQL examples or your own ideas of other real life JOIN clauses, please feel free to share it so we can all keep learning! Tags: JOIN