DEV Community

Cover image for Duplicate Emails | LeetCode | MSSQL
Retiago Drago
Retiago Drago

Posted on • Edited on

Duplicate Emails | LeetCode | MSSQL

The Problem

The task is to identify all duplicate emails from the Person table where each row contains an email that has no uppercase letters. The Person table has the following structure:

Table: Person

Column Name Type
id int
email varchar

Here, id is the primary key, and it's guaranteed that the email field is not NULL.

Explanation

Consider the following sample input for the Person table:

id email
1 a@b.com
2 c@d.com
3 a@b.com

In this case, the output will be:

Email
a@b.com

Here, a@b.com is a duplicate email as it occurs twice.

The Solution

We present two SQL solutions with varying strategies and trade-offs.

Source Code 1

The first solution creates a Common Table Expression (CTE) called duplicate, which assigns a row number to each email, partitioning by the email itself. This effectively counts the instances of each email. The outer query then selects distinct emails where the row number (i.e., the count) is greater than one.

WITH duplicate AS (
  SELECT
    email AS Email,
    ROW_NUMBER() OVER (PARTITION BY email ORDER BY email) AS rn
  FROM Person
)
SELECT DISTINCT Email
FROM duplicate
WHERE rn > 1
Enter fullscreen mode Exit fullscreen mode

This solution is relatively complex due to the use of window function and may be less performant, with a runtime of 790ms, which ranks in the 40.90% percentile.

Performance1

Source Code 2

The second solution uses GROUP BY to group all emails and HAVING to filter those with a count greater than one. It's simpler and more performant than the first, with a runtime of 641ms, which ranks in the 78.49% percentile.

SELECT email AS Email
FROM Person
GROUP BY email
HAVING COUNT(*) > 1
Enter fullscreen mode Exit fullscreen mode

Performance2

Conclusion

Both solutions successfully identify duplicate emails. The first solution is more complex and potentially less performant but demonstrates the use of window functions. The second solution is simpler, more readable, and more performant, showing the power of grouping and aggregation in SQL. When writing SQL queries, consider trade-offs between complexity, performance, and readability.

You can find the original problem at LeetCode.

For more insightful solutions and tech-related content, feel free to connect with me on my Beacons page.

ranggakd - Link in Bio & Creator Tools | Beacons

@ranggakd | center details summary summary Oh hello there I m a an Programmer AI Tech Writer Data Practitioner Statistics Math Addict Open Source Contributor Quantum Computing Enthusiast details center.

favicon beacons.ai

Top comments (0)