[Day 26] Reviewing Incorrect SQL problems.
Before diving in, I’ll start by explaining the purpose of using JOINs.
Why Use the JOIN in SQL
The JOIN combines information from different tables into a single dataset to retrieve the desired data.
For example, imagine one table contains "customer information" (e.g., name, customer ID), and another table holds "order information" (e.g., order ID, customer ID, order date). If I want to know "which customers placed orders and when," how would I approach this?
I can connect the customer table and the order table using the common key, customer_id
, through a JOIN, allowing me to view the customer’s name and order date together in one result. If the customer table is called the customer
and the order table is order
, use customer.customer_id = order.customer_id
retrieves only the rows where customer_id
matches in both tables. This effectively returns the intersection of the two tables—only the overlapping data.
Problem 01
- You need to generate a report of each employee’s salary grade, which is stored in the
Job_Grades
table. The report should include the employee’s job title, department name, hire date, salary, and salary grade.
-
The
employees
table does not containjob_title
.-
employees e JOIN jobs j ON e.job_id = j.job_id
: I use the employee’sjob_id
to fetch thejob_title
from thejobs
table.
-
The
employees
table does not includedepartment_name
.
- `employees e JOIN departments d ON e.department_id = d.department_id`: I use the employee’s `department_id` to retrieve the `department_name` from the `departments` table.
- I needed to consider how to use the
lowest_sal
andhighest_sal
fields in thejob_grades
table to assign grades based on an employee’s salary. I couldn’t rely on the typical equi join (joining with=
), which I’ve been accustomed to using.
- I realized that I need to fetch only the rows where the `salary` falls between `lowest_sal` and `highest_sal`. If I want to include employees whose salaries fall outside any defined range, I should use a `LEFT JOIN`. Using a plain `JOIN` (which defaults to `INNER JOIN`) would exclude employees whose salaries don’t fit any range, as those rows with NULL values would be omitted from the output.
```sql
SELECT e.last_name, j.job_title, d.department_name, e.hire_date, e.salary, jg.grade_level
FROM employees e
JOIN departments d ON e.department_id = d.department_id
JOIN jobs j ON e.job_id = j.job_id
LEFT JOIN job_grades jg ON e.salary BETWEEN jg.lowest_sal AND jg.highest_sal;
```
This problem initially confused me because I had primarily used equi joins to solve problems. It turned out to be a valuable opportunity to learn more about non-equi joins.
Problem 02
- Some employees report to other employees as their managers (supervisors). Query the employees’ manager IDs, manager names, employee names, and employee IDs. If an employee does not report to a manager, you must still include that information in the output, and the manager’s name should be displayed in uppercase.
SELECT
e.employee_id AS employee_id,
e.first_name AS employee_name,
e.manager_id AS manager_id,
UPPER(m.first_name) AS manager_name
FROM
employees e
LEFT JOIN
employees m ON e.manager_id = m.employee_id;
Since the problem specifies, “If there are employees without a reporting manager, include that information as well,” I need to include employees with manager_id
as NULL in the results. Using a plain JOIN
(which defaults to INNER JOIN
) would exclude employees without managers, as it only returns matching rows. Therefore, I must use LEFT JOIN
to ensure all employees are included.
Here’s another valid version:
SELECT e1.employee_id AS employee_id,
e1.last_name AS employee_name,
e2.employee_id AS manager_id,
e2.last_name AS manager_name
FROM employees e1
LEFT JOIN employees e2 ON e1.manager_id = e2.employee_id;
Problem 02
- Output the department name and employee count for departments with five or more employees, sorted by employee count in descending order.
SELECT d.department_name, COUNT(e.employee_id)
FROM departments d, employees e
WHERE e.department_id = d.department_id
GROUP BY d.department_name
HAVING COUNT(e.employee_id) >= 5;
- - ANSI Standard Version:
SELECT d.department_name, COUNT(e.employee_id)
FROM employees e
JOIN departments d ON e.department_id = d.department_id
GROUP BY d.department_name
HAVING COUNT(e.employee_id) >= 5;
I initially got this wrong because I performed the grouping before joining the tables. I need to join the tables first to establish the connection, then group and filter the results.
Problem 03
- Query the total number of employees working as managers.
A Quick review of primary key and foreign key
Databases store data across multiple tables. To connect data between tables, I use a common value called a “key,” which can be thought of as a column in a table. A primary key (Primary Key) uniquely identifies each row (record) in a table. A foreign key (Foreign Key) is a column in one table that references the primary key in another table (or the same table). “Referencing a key” means that a foreign key in one table points to or connects to a primary key in another (or the same) table.
[EER Diagram of the database]
I can use MySQL Workbench’s Reverse Engineer feature to create an EER diagram and examine the relationships between tables. By observing the four lines connected to the employees
table in the diagram, I can identify which columns in the employees
table are linked to columns in other tables.
Here, the manager_id
I need isn’t connected to any other table’s key. Instead, manager_id
references the employee_id
within the employees
table itself. Therefore, I need to use a self-join.
Every employee has an employee_id
, and managers do too. To query employees working as managers, I can write:
SELECT COUNT(DISTINCT e1.employee_id) AS manager_count
FROM employees e1
JOIN employees e2 ON e1.employee_id = e2.manager_id;
-
e1
is the alias for the manager table, ande2
is the alias for the employee table. When joining the manager table (e1
) with the employee table (e2
),e1.employee_id = e2.manager_id
means it retrieves only the rows where the manager table’semployee_id
matches the employee table’smanager_id
.
Here’s another query that produces the same result:
SELECT COUNT(DISTINCT e1.employee_id) AS manager_count
FROM employees e2
JOIN employees e1 ON e1.employee_id = e2.manager_id;
The difference lies in the FROM
clause and the JOIN employees e1
part. The logical relationship remains the same, so the results are identical. However, conventionally, placing the “subject (manager)” in the FROM
clause and the “dependent (employee)” in the JOIN
clause is more readable and aligns with standard practices.
Conversely, if the logical relationship changes, the query results will differ. For example:
SELECT COUNT(DISTINCT e1.employee_id) AS num_of_manager
FROM employees e2
JOIN employees e1 ON e2.employee_id = e1.employee_id;
This query implies that e2
(employee) references e1
(manager). “Reference” means a column in one table points to or connects to a column in another (or the same) table. Here, manager_id
(foreign key) references employee_id
(primary key) in the employees
table, forming a “self-referencing relationship.”
A primary key uniquely identifies each row in a table and does not reference data in other tables—it is only referenced by foreign keys. A primary key does not reference a foreign key; rather, a foreign key references a primary key. In the EER diagram, the self-referencing relationship between employee_id
and manager_id
confirms this connection.
I used to interpret the JOIN as simply “fetching rows where they match,” but this review has helped me gain a clearer understanding of 'referencing' in joining tables.
Top comments (0)