Kitco Codes

Posted on Feb 23

[SQL] - Inner Join, Left Join, Non-Equi Join, Self Join

#webdev #programming #beginners #sql

[Day 26] Reviewing Incorrect SQL problems.

Before diving in, I’ll start by explaining the purpose of using JOINs.

Why Use the JOIN in SQL

The JOIN combines information from different tables into a single dataset to retrieve the desired data.

For example, imagine one table contains "customer information" (e.g., name, customer ID), and another table holds "order information" (e.g., order ID, customer ID, order date). If I want to know "which customers placed orders and when," how would I approach this?

I can connect the customer table and the order table using the common key, customer_id, through a JOIN, allowing me to view the customer’s name and order date together in one result. If the customer table is called the customer and the order table is order, use customer.customer_id = order.customer_id retrieves only the rows where customer_id matches in both tables. This effectively returns the intersection of the two tables—only the overlapping data.

Problem 01

You need to generate a report of each employee’s salary grade, which is stored in the Job_Grades table. The report should include the employee’s job title, department name, hire date, salary, and salary grade.

The employees table does not contain job_title.
- employees e JOIN jobs j ON e.job_id = j.job_id: I use the employee’s job_id to fetch the job_title from the jobs table.
The employees table does not include department_name.

- `employees e JOIN departments d ON e.department_id = d.department_id`: I use the employee’s `department_id` to retrieve the `department_name` from the `departments` table.

I needed to consider how to use the lowest_sal and highest_sal fields in the job_grades table to assign grades based on an employee’s salary. I couldn’t rely on the typical equi join (joining with =), which I’ve been accustomed to using.

- I realized that I need to fetch only the rows where the `salary` falls between `lowest_sal` and `highest_sal`. If I want to include employees whose salaries fall outside any defined range, I should use a `LEFT JOIN`. Using a plain `JOIN` (which defaults to `INNER JOIN`) would exclude employees whose salaries don’t fit any range, as those rows with NULL values would be omitted from the output.

```sql
SELECT e.last_name, j.job_title, d.department_name, e.hire_date, e.salary, jg.grade_level
FROM employees e
JOIN departments d ON e.department_id = d.department_id
JOIN jobs j ON e.job_id = j.job_id
LEFT JOIN job_grades jg ON e.salary BETWEEN jg.lowest_sal AND jg.highest_sal;

```

This problem initially confused me because I had primarily used equi joins to solve problems. It turned out to be a valuable opportunity to learn more about non-equi joins.

Problem 02

Some employees report to other employees as their managers (supervisors). Query the employees’ manager IDs, manager names, employee names, and employee IDs. If an employee does not report to a manager, you must still include that information in the output, and the manager’s name should be displayed in uppercase.

SELECT
    e.employee_id AS employee_id,
    e.first_name AS employee_name,
    e.manager_id AS manager_id,
    UPPER(m.first_name) AS manager_name
FROM
    employees e
LEFT JOIN
    employees m ON e.manager_id = m.employee_id;

Since the problem specifies, “If there are employees without a reporting manager, include that information as well,” I need to include employees with manager_id as NULL in the results. Using a plain JOIN (which defaults to INNER JOIN) would exclude employees without managers, as it only returns matching rows. Therefore, I must use LEFT JOIN to ensure all employees are included.

Here’s another valid version:

SELECT e1.employee_id AS employee_id,
       e1.last_name AS employee_name,
       e2.employee_id AS manager_id,
       e2.last_name AS manager_name
FROM employees e1
LEFT JOIN employees e2 ON e1.manager_id = e2.employee_id;

Problem 02

Output the department name and employee count for departments with five or more employees, sorted by employee count in descending order.

SELECT d.department_name, COUNT(e.employee_id)
FROM departments d, employees e
WHERE e.department_id = d.department_id
GROUP BY d.department_name
HAVING COUNT(e.employee_id) >= 5;

- ANSI Standard Version:

SELECT d.department_name, COUNT(e.employee_id)
FROM employees e
JOIN departments d ON e.department_id = d.department_id
GROUP BY d.department_name
HAVING COUNT(e.employee_id) >= 5;

I initially got this wrong because I performed the grouping before joining the tables. I need to join the tables first to establish the connection, then group and filter the results.

Problem 03

Query the total number of employees working as managers.

A Quick review of primary key and foreign key

Databases store data across multiple tables. To connect data between tables, I use a common value called a “key,” which can be thought of as a column in a table. A primary key (Primary Key) uniquely identifies each row (record) in a table. A foreign key (Foreign Key) is a column in one table that references the primary key in another table (or the same table). “Referencing a key” means that a foreign key in one table points to or connects to a primary key in another (or the same) table.

[EER Diagram of the database]

I can use MySQL Workbench’s Reverse Engineer feature to create an EER diagram and examine the relationships between tables. By observing the four lines connected to the employees table in the diagram, I can identify which columns in the employees table are linked to columns in other tables.

Here, the manager_id I need isn’t connected to any other table’s key. Instead, manager_id references the employee_id within the employees table itself. Therefore, I need to use a self-join.

Every employee has an employee_id, and managers do too. To query employees working as managers, I can write:

SELECT COUNT(DISTINCT e1.employee_id) AS manager_count
FROM employees e1
JOIN employees e2 ON e1.employee_id = e2.manager_id;

e1 is the alias for the manager table, and e2 is the alias for the employee table. When joining the manager table (e1) with the employee table (e2), e1.employee_id = e2.manager_id means it retrieves only the rows where the manager table’s employee_id matches the employee table’s manager_id.

Here’s another query that produces the same result:

SELECT COUNT(DISTINCT e1.employee_id) AS manager_count
FROM employees e2
JOIN employees e1 ON e1.employee_id = e2.manager_id;

The difference lies in the FROM clause and the JOIN employees e1 part. The logical relationship remains the same, so the results are identical. However, conventionally, placing the “subject (manager)” in the FROM clause and the “dependent (employee)” in the JOIN clause is more readable and aligns with standard practices.

Conversely, if the logical relationship changes, the query results will differ. For example:

SELECT COUNT(DISTINCT e1.employee_id) AS num_of_manager
FROM employees e2
JOIN employees e1 ON e2.employee_id = e1.employee_id;

This query implies that e2 (employee) references e1 (manager). “Reference” means a column in one table points to or connects to a column in another (or the same) table. Here, manager_id (foreign key) references employee_id (primary key) in the employees table, forming a “self-referencing relationship.”

A primary key uniquely identifies each row in a table and does not reference data in other tables—it is only referenced by foreign keys. A primary key does not reference a foreign key; rather, a foreign key references a primary key. In the EER diagram, the self-referencing relationship between employee_id and manager_id confirms this connection.

I used to interpret the JOIN as simply “fetching rows where they match,” but this review has helped me gain a clearer understanding of 'referencing' in joining tables.

DEV Community

[SQL] - Inner Join, Left Join, Non-Equi Join, Self Join

Why Use the JOIN in SQL

Problem 01

Problem 02

Problem 02

Problem 03

A Quick review of primary key and foreign key

Top comments (0)

Read next

Simple in-memory database in JS

Best Practices for Returning API Responses in Express.js Controllers

Building a Blog Application with React and Tailwind CSS

What do you guys think is the easiest coding language to learn