Isaac Tonyloi - SWE

Posted on Dec 10, 2024

What is Database Normalization- With an Example

#database #normalization

Database normalization is a design process aimed at organizing data within a database to eliminate redundancy, improve integrity, and enhance scalability. By systematically applying normalization principles, you create a database structure that is more efficient, reduces anomalies, and ensures consistency in your data.

This article provides a detailed breakdown of normalization, taking you through unnormalized data to the third normal form (3NF) with illustrative examples and tables.

Why Normalize a Database?

Normalization is vital for designing efficient databases. Here are some key objectives:

Avoid Redundancy: Prevent duplicate data storage to save space.
Eliminate Update Anomalies: Ensure changes in data occur consistently across the database.
Improve Query Performance: Facilitate easier data retrieval and manipulation.
Ensure Data Integrity: Prevent inconsistencies caused by improper data dependencies.
Enhance Scalability: Create flexible structures that can handle growing datasets.

Normalization consists of several forms (or stages), with each step building on the previous one to further refine the structure.

Step 1: Unnormalized Form (UNF)

Characteristics:

Contains redundant data.
May include multi-valued attributes or repeating groups.
Offers no guarantees of consistency or integrity.

Example of an Unnormalized Table:

Doctor ID	Doctor Name	Patient ID	Patient Name	Region	Council	Postcode	Surgery Number	Slot ID	Total Cost
D1	Dr. Smith	P1	John	East	Council1	12345	101	S1	200
D1	Dr. Smith	P2	Alice	East	Council1	12345	101	S2	300
D2	Dr. Doe	P3	Bob	West	Council2	67890	102	S3	150

This table is unnormalized because:

It contains repeating groups like multiple Slot IDs.
Non-atomic values could exist in certain fields.

Issues with UNF:

Redundant information (e.g., Doctor Name is repeated for D1).
Difficult to update or query due to mixed information.

Step 2: First Normal Form (1NF)

Characteristics of 1NF:

Data is organized into rows and columns.
Each column contains atomic values (no arrays or sets).
Each record is uniquely identified using a primary key.

Transformation to 1NF:
To achieve 1NF:

Remove repeating groups by creating separate tables.
Ensure that all attributes have atomic values.
Introduce a primary key for each table.

Resulting Tables:

Patients Table (1NF):

Patient ID	Patient Name	Slot ID	Total Cost
P1	John	S1	200
P2	Alice	S2	300
P3	Bob	S3	150

Doctors Table (1NF):

Doctor ID	Doctor Name
D1	Dr. Smith
D2	Dr. Doe

Surgery Table (1NF):

Doctor ID	Region	Council	Postcode	Surgery Number
D1	East	Council1	12345	101
D2	West	Council2	67890	102

Step 3: Second Normal Form (2NF)

Characteristics of 2NF:

Satisfies all requirements of 1NF.
Removes partial dependencies, i.e., non-key attributes must depend on the entire primary key.

Key Observations:
In the Surgery table from 1NF, the Region depends only on the Council, not the entire primary key. This violates 2NF.

Transformation to 2NF:
To remove partial dependencies:

Create a new table for Council and Region.
Ensure all attributes depend solely on the primary key.

Resulting Tables:

Appointments Table (2NF):

Appointment ID	Slot ID	Total Cost
A1	S1	200
A2	S2	300
A3	S3	150

Council Table (2NF):

Council	Region
Council1	East
Council2	West

Step 4: Third Normal Form (3NF)

Characteristics of 3NF:

Satisfies all requirements of 2NF.
Removes transitive dependencies, i.e., non-key attributes must not depend on other non-key attributes.

Key Observations:
In the Surgery table from 2NF, Postcode and Surgery Number have a transitive dependency, where Postcode indirectly depends on Surgery Number.

Transformation to 3NF:
To eliminate transitive dependencies:

Create a new table for Postcode and Surgery Number.
Ensure all attributes depend directly on the primary key.

Resulting Tables:

Surgery Table (3NF):

Postcode	Surgery Number
12345	101
67890	102

Final Database Structure

After normalization, the database consists of the following well-structured tables:

Patients Table:
Stores patient information with atomic fields.
- Patient ID, Patient Name
Appointments Table:
Links Slot ID with Total Cost.
- Appointment ID, Slot ID, Total Cost
Doctors Table:
Contains details about doctors.
- Doctor ID, Doctor Name
Council Table:
Maps Council to Region.
- Council, Region
Surgery Table:
Connects Postcode to Surgery Number.
- Postcode, Surgery Number

Why Stop at 3NF?

While 3NF is sufficient for most practical purposes, further normalization forms exist:

Boyce-Codd Normal Form (BCNF): Ensures stricter elimination of redundancy.
Fourth Normal Form (4NF): Addresses multi-valued dependencies.
Fifth Normal Form (5NF): Eliminates redundancy arising from join dependencies.

For most real-world applications, 3NF strikes the right balance between performance and data integrity.

Conclusion

Normalization is a critical process for designing databases that are efficient, scalable, and easy to maintain. By following the steps outlined above, you can create a well-structured database that minimizes redundancy, prevents anomalies, and simplifies data operations.

Forem

What is Database Normalization- With an Example

Why Normalize a Database?

Step 1: Unnormalized Form (UNF)

Characteristics:

Issues with UNF:

Step 2: First Normal Form (1NF)

Characteristics of 1NF:

Patients Table (1NF):

Doctors Table (1NF):

Surgery Table (1NF):

Step 3: Second Normal Form (2NF)

Characteristics of 2NF:

Appointments Table (2NF):

Council Table (2NF):

Step 4: Third Normal Form (3NF)

Characteristics of 3NF:

Surgery Table (3NF):

Final Database Structure

Why Stop at 3NF?

Conclusion

Top comments (0)

Read next

ACID Properties in Databases - Part 2: Your Shield Against Transactional Chaos

🔧 Embed SQL Editor ✍️ in Your Internal Web Portal 💻

Hard Delete Vs Soft Delete in DBMS

2025 Guide to Architecting an Iceberg Lakehouse