DEV Community

Cover image for SQL Normalization: Cleaning Up Your Data Like a Pro!
Athreya aka Maneshwar
Athreya aka Maneshwar

Posted on

SQL Normalization: Cleaning Up Your Data Like a Pro!

Ever felt like your database is a messy, overstuffed closet? Too many redundancies, anomalies, and weird dependencies?

Well, SQL normalization is here to tidy things up!

Normalization is the process of organizing a relational database to reduce redundancy and improve data integrity.

It involves breaking down tables into smaller, structured tables based on rules called normal forms (NFs).

Image description

1NF: First Normal Form (Atomicity is the Key)

What’s the issue?

Unnormalized tables often contain repeating groups and non-atomic (multi-valued) data, making data retrieval and updates inefficient.

How does 1NF fix it?

1NF ensures that:

  • Each column contains atomic values (indivisible values).
  • No repeating groups or arrays exist in any row.

Example:

Unnormalized Table (Before 1NF)

StudentID Name Courses
101 Alice Math, Science
102 Bob English
103 Charlie Math, History

Here, the Courses column contains multiple values, violating 1NF.

1NF Table (After Fix)

StudentID Name Course
101 Alice Math
101 Alice Science
102 Bob English
103 Charlie Math
103 Charlie History

Now, each column contains atomic values, and there are no repeating groups.


2NF: Second Normal Form (Eliminating Partial Dependencies)

What’s the issue?

Even after 1NF, we might still have partial dependencies, where a non-key attribute depends on only part of a composite primary key.

How does 2NF fix it?

  • Remove partial dependencies by creating separate tables.

Example:

In our 1NF table, StudentID and Course together form the primary key.

If we add StudentAge, it only depends on StudentID, not the whole key.

1NF Table (Before 2NF)

StudentID Name Age Course
101 Alice 20 Math
101 Alice 20 Science
102 Bob 22 English

Here, Age depends only on StudentID, not on Course.

That's a partial dependency.

2NF Tables (After Fix)

Students Table

StudentID Name Age
101 Alice 20
102 Bob 22

Enrollments Table

StudentID Course
101 Math
101 Science
102 English

Now, no column depends on only part of the primary key.


3NF: Third Normal Form (Eliminating Transitive Dependencies)

What’s the issue?

Even after 2NF, we might still have transitive dependencies, where a non-key column depends on another non-key column instead of the primary key.

How does 3NF fix it?

  • Ensure that every non-key column depends ONLY on the primary key and not on another non-key attribute.

Example:

Let’s say we store the Department along with the Instructor.

2NF Table (Before 3NF)

CourseID Course Instructor Department
201 Math Shiv Science
202 English Jhon Arts

Here, Department depends on Instructor, not directly on CourseID.

That’s a transitive dependency.

3NF Tables (After Fix)

Courses Table

CourseID Course Instructor
201 Math Shiv
202 English Jhon

Instructors Table

Instructor Department
Shiv Science
Jhon Arts

Now, every column depends ONLY on the primary key, eliminating transitive dependencies.


BCNF: Boyce-Codd Normal Form (Stricter Than 3NF)

What’s the issue?

How does BCNF fix it?

  • Ensures every determinant (an attribute that functionally determines another) is a candidate key.

Example:

Let’s say we have a table storing course assignments with instructors.

3NF Table (Before BCNF)

StudentID Course Instructor
101 Math Shiv
101 Science Mike

Here, Instructor determines Course, but Instructor is not a candidate key.

This violates BCNF.

BCNF Tables (After Fix)

StudentCourses Table

StudentID Course
101 Math
101 Science

CourseInstructors Table

Course Instructor
Math Shiv
Science Mike

Understanding BCNF

Boyce-Codd Normal Form (BCNF) is an advanced level of normalization that builds on 3NF by addressing cases where 3NF still allows certain functional dependencies to exist improperly.

BCNF ensures that every functional dependency (FD) has a superkey as its determinant.

This means that if a column (or set of columns) determines another column, it must be a candidate key—a minimal set of attributes that uniquely identify a row.

A simple way to understand BCNF is:

  • If X → Y, then X must be a candidate key.

Why is BCNF necessary?

3NF still allows anomalies when a non-trivial functional dependency exists where a non-superkey determines another column.

BCNF eliminates these issues by enforcing stricter rules.

How do we achieve BCNF?

If a table violates BCNF, we can:

  1. Decompose the table into smaller tables to eliminate the dependency.
  2. Ensure that all determinants are candidate keys so that each functional dependency is properly structured.

By following these steps, we ensure data integrity and eliminate redundancy, making our database more efficient.

4NF, 5NF

Image description


I’ve been working on a super-convenient tool called LiveAPI.

LiveAPI helps you get all your backend APIs documented in a few minutes

With LiveAPI, you can quickly generate interactive API documentation that allows users to execute APIs directly from the browser.

image

If you’re tired of manually creating docs for your APIs, this tool might just make your life easier.

Top comments (1)

Collapse
 
shesidhar_sharmapendyala profile image
Shesidhar Sharma Pendyala

That us a very good explanation of the concept