DEV Community

Cover image for SQL 101 | Chapter 7: Using Joins to Combine and Unify Data
Yujin
Yujin

Posted on

SQL 101 | Chapter 7: Using Joins to Combine and Unify Data

Joins allow you to combine data from multiple tables. Learn how these can be the foundation of recommendations and collaborative filtering as these enable us to find patterns, associations, and similarities within datasets.

Improve your readability in:

SQL 101 | Chapter 7: Using Joins to Combine and Unify Data

Joins allow you to combine data from multiple tables. Learn how these can be the foundation of recommendations and collaborative filtering as these enable us to find patterns, associations, and similarities within datasets.

favicon blog.ardenov.com

In SQL, joins allow you to combine data from multiple tables. This chapter we’ll learn advanced join techniques. To illustrate these concepts, we’ll use three tables: books, authors, and sales (from Chapter 3). Joins are one of the most essential features of SQL, allowing you to combine rows from multiple tables based on a related column. While basic joins like INNER JOIN are typically the starting point, advanced SQL workflows often require other types of joins such as LEFT JOIN, RIGHT JOIN, FULL JOIN and CROSS JOIN. Each join type serves a specific purpose and determines how data from two tables is combined. INNER JOIN (or JOIN) An INNER JOIN (commonly written simply as JOIN) returns only the rows where there is a match between the columns of both tables. If no match is found, the rows are excluded from the result. INNER JOIN diagram INNER JOIN diagram For example, to retrieve books along with their authors: SELECT b.book_title , a.author_name FROM books b INNER JOIN authors a ON b.author_id = a.author_id; This query matches the author_id in the books table with the author_id in the authors table. Only books with an associated author are included in the result. 💡 INNER JOIN can also be written as JOIN SELECT b.book_title , a.author_name FROM books b JOIN authors a ON b.author_id = a.author_id; List of books along with their authors List of books along with their authors If there are books in the books table without a corresponding author_id in the authors table, those books will not appear in the result. LEFT JOIN A LEFT JOIN retrieves all rows from the left table and matches rows from the right table. If there is no match, rows from the right table are filled with NULL. LEFT JOIN diagram LEFT JOIN diagram This is particularly useful when you want to see all entries from one table regardless of whether they have matching data in the other table. To retrieve all books, including those without an assigned author: SELECT b.book_title , a.author_name FROM books b LEFT JOIN authors a ON b.author_id = a.author_id; List of books along with their authors using LEFT JOIN List of books along with their authors using LEFT JOIN In this query, all rows from the books table are returned. If a book_id doesn’t have a matching author_id in the authors table, the author_name will be NULL. RIGHT JOIN A RIGHT JOIN is the opposite of LEFT JOIN. It retrieves all rows from the right table and matches rows from the left table. Unmatched rows from the left table are filled with NULL. RIGHT JOIN diagram RIGHT JOIN diagram Let’s try to retrieve all authors and the books they’ve written, even if some authors haven’t written any books. SELECT a.author_name , b.book_title FROM authors a RIGHT JOIN books b ON a.author_id = b.author_id; List of books along with their authors using RIGHT JOIN List of books along with their authors using RIGHT JOIN Here, all rows from the books table are included in the result. If a book doesn’t have a corresponding author, the author_name will display NULL. FULL JOIN (or FULL OUTER JOIN) A FULL JOIN combines the results of both a LEFT JOIN and a RIGHT JOIN. It returns all rows from both tables, matching rows where possible. If no match is found, the result includes NULL for the rows of the non-matching table. FULL JOIN diagram FULL JOIN diagram This is helpful for reconciling mismatched datasets or ensuring no data is omitted from either table. To get a complete list of books and authors, whether or not they have a match, do: SELECT b.book_title , a.author_name FROM books b FULL JOIN authors a ON b.author_id = a.author_id; 💡 FULL JOIN can also be written as FULL OUTER JOIN. It is the more explicit version that emphasizes the “outer” nature of the join. SELECT b.book_title , a.author_name FROM books b FULL OUTER JOIN authors a ON b.author_id = a.author_id; List of books and authors, whether or not they have a match List of books and authors, whether or not they have a match A FULL JOIN combines rows from both tables and includes all unmatched rows, assigning NULL values to the rows where no match exists. However, you can add additional filters to control how the NULL values behave using the WHERE clause, by filtering the join key (the column you are joining on). This approach allows you to extract subsets of the FULL JOIN results, giving you more flexibility when analyzing your data. Inversed diagram of INNER JOIN by filtering the FULL JOIN Inversed diagram of INNER JOIN by filtering the FULL JOIN The query below specifically filters for rows where the author_id in the books table has no match in the authors table, or vice versa: SELECT b.book_title , a.author_name FROM books b FULL JOIN authors a ON b.author_id = a.author_id WHERE b.author_id IS NULL OR a.author_id IS NULL; Result of filtering the FULL JOIN Result of filtering the FULL JOIN In this query, the FULL JOIN brings in all rows from both tables, matching them on the author_id column. The WHERE clause ensures only unmatched rows are included in the result, either rows from the books table with no corresponding author_id in the authors table (b.author_id IS NULL) or rows from the authors table with no match in the books table (a.author_id IS NULL). Such gaps are useful for data integrity checks or reconciling data between related tables. CROSS JOIN A CROSS JOIN produces the Cartesian product of two tables, meaning every row in the first table is combined with every row in the second table. Diagram of CROSS JOIN Diagram of CROSS JOIN While CROSS JOIN are not used as often as other types of joins (like INNER JOIN and LEFT JOIN), they are incredibly useful in certain scenarios, especially for generating all possible combinations, comparisons, or for analytical purposes. Let’s take an example using the books and authors tables. Imagine we want to see all possible combinations of books and authors. A CROSS JOIN will pair each book with every author: SELECT b.book_title , a.author_name FROM books b CROSS JOIN authors a; 💡 CROSS JOIN can also be written as ,. SELECT b.book_title , a.author_name FROM books b , authors a; Always prefer CROSS JOIN: It adheres to modern SQL standards, is explicit, is optimized, and makes your queries more readable and maintainable. Avoid using comma-separated joins (FROM table_1, table_2), as they are considered outdated and can lead to misunderstandings in complex queries. Combinations of books and authors Combinations of books and authors CROSS JOIN can lead to extremely large result sets, which may not always be useful. However, they are valuable in specific scenarios such as generating combinations, exploring relationship, and dimensional analysis. Self Join A Self Join is a join where a table is joined with itself. This might seem unusual at first, but self joins are incredibly useful when comparing rows within the same table or establishing relationships between rows in a single dataset. In a self join, the table is aliased to differentiate between its two instances in the query. A Self Join is foundational of recommendations and collaborative filtering because it enables us to find patterns, associations, and similarities within datasets. Collaborative filtering works on the principle of leveraging user preferences or behaviors to recommend items. It analyzes the relationships between users and items to suggest what a user might like based on the behavior of similar users or the attributes of items they interacted with. There are two types of collaborative filtering: User-Based Collaborative Filtering. It recommends items based on the preferences of users who are similar to the target user. Item-Based Collaborative Filtering. It recommends items that are similar to the items the target user has interacted with. Let’s use the books table to help identify relationships based on metadata, such as genre: SELECT b1.book_title AS book_1_title , b2.book_title AS book_2_title , b1.book_genre FROM books b1 JOIN books b2 ON b1.book_genre = b2.book_genre AND b1.book_id <> b2.book_id; List of pairs of books that share the same genre List of pairs of books that share the same genre This query generates pairs of books that share the same genre but are not the same book. This is useful for recommending books within a specific genre. To improve the quality of recommendations, you can use the sales table tracks purchases of books. A Self Join on this table can also pair books that are purchased together. By using techniques like INNER JOIN (or JOIN), LEFT JOIN, RIGHT JOIN, FULL JOIN (or FULL OUTER JOIN), CROSS JOIN (or ,) and self join in various situations, you can handle complex relational data with ease. Take time to practice these queries with the provided tables. As you become more familiar with these techniques, you’ll be better equipped to tackle more advanced SQL problems. References https://en.wikipedia.org/wiki/Join_(SQL) https://dataschool.com/how-to-teach-people-sql/sql-join-types-explained-visually/


Originally published at https://blog.ardenov.com.

Top comments (0)