DEV Community

Isaac Tonyloi - SWE
Isaac Tonyloi - SWE

Posted on

What is Database Normalization- With an Example

Database normalization is a design process aimed at organizing data within a database to eliminate redundancy, improve integrity, and enhance scalability. By systematically applying normalization principles, you create a database structure that is more efficient, reduces anomalies, and ensures consistency in your data.

This article provides a detailed breakdown of normalization, taking you through unnormalized data to the third normal form (3NF) with illustrative examples and tables.

Why Normalize a Database?

Normalization is vital for designing efficient databases. Here are some key objectives:

  • Avoid Redundancy: Prevent duplicate data storage to save space.
  • Eliminate Update Anomalies: Ensure changes in data occur consistently across the database.
  • Improve Query Performance: Facilitate easier data retrieval and manipulation.
  • Ensure Data Integrity: Prevent inconsistencies caused by improper data dependencies.
  • Enhance Scalability: Create flexible structures that can handle growing datasets.

Normalization consists of several forms (or stages), with each step building on the previous one to further refine the structure.

Step 1: Unnormalized Form (UNF)

Characteristics:

  • Contains redundant data.
  • May include multi-valued attributes or repeating groups.
  • Offers no guarantees of consistency or integrity.

Example of an Unnormalized Table:

Doctor ID Doctor Name Patient ID Patient Name Region Council Postcode Surgery Number Slot ID Total Cost
D1 Dr. Smith P1 John East Council1 12345 101 S1 200
D1 Dr. Smith P2 Alice East Council1 12345 101 S2 300
D2 Dr. Doe P3 Bob West Council2 67890 102 S3 150

This table is unnormalized because:

  1. It contains repeating groups like multiple Slot IDs.
  2. Non-atomic values could exist in certain fields.

Issues with UNF:

  • Redundant information (e.g., Doctor Name is repeated for D1).
  • Difficult to update or query due to mixed information.

Step 2: First Normal Form (1NF)

Characteristics of 1NF:

  • Data is organized into rows and columns.
  • Each column contains atomic values (no arrays or sets).
  • Each record is uniquely identified using a primary key.

Transformation to 1NF:
To achieve 1NF:

  1. Remove repeating groups by creating separate tables.
  2. Ensure that all attributes have atomic values.
  3. Introduce a primary key for each table.

Resulting Tables:

Patients Table (1NF):

Patient ID Patient Name Slot ID Total Cost
P1 John S1 200
P2 Alice S2 300
P3 Bob S3 150

Doctors Table (1NF):

Doctor ID Doctor Name
D1 Dr. Smith
D2 Dr. Doe

Surgery Table (1NF):

Doctor ID Region Council Postcode Surgery Number
D1 East Council1 12345 101
D2 West Council2 67890 102

Step 3: Second Normal Form (2NF)

Characteristics of 2NF:

  • Satisfies all requirements of 1NF.
  • Removes partial dependencies, i.e., non-key attributes must depend on the entire primary key.

Key Observations:
In the Surgery table from 1NF, the Region depends only on the Council, not the entire primary key. This violates 2NF.

Transformation to 2NF:
To remove partial dependencies:

  1. Create a new table for Council and Region.
  2. Ensure all attributes depend solely on the primary key.

Resulting Tables:

Appointments Table (2NF):

Appointment ID Slot ID Total Cost
A1 S1 200
A2 S2 300
A3 S3 150

Council Table (2NF):

Council Region
Council1 East
Council2 West

Step 4: Third Normal Form (3NF)

Characteristics of 3NF:

  • Satisfies all requirements of 2NF.
  • Removes transitive dependencies, i.e., non-key attributes must not depend on other non-key attributes.

Key Observations:
In the Surgery table from 2NF, Postcode and Surgery Number have a transitive dependency, where Postcode indirectly depends on Surgery Number.

Transformation to 3NF:
To eliminate transitive dependencies:

  1. Create a new table for Postcode and Surgery Number.
  2. Ensure all attributes depend directly on the primary key.

Resulting Tables:

Surgery Table (3NF):

Postcode Surgery Number
12345 101
67890 102

Final Database Structure

After normalization, the database consists of the following well-structured tables:

  1. Patients Table:
    Stores patient information with atomic fields.

    • Patient ID, Patient Name
  2. Appointments Table:
    Links Slot ID with Total Cost.

    • Appointment ID, Slot ID, Total Cost
  3. Doctors Table:
    Contains details about doctors.

    • Doctor ID, Doctor Name
  4. Council Table:
    Maps Council to Region.

    • Council, Region
  5. Surgery Table:
    Connects Postcode to Surgery Number.

    • Postcode, Surgery Number

Why Stop at 3NF?

While 3NF is sufficient for most practical purposes, further normalization forms exist:

  • Boyce-Codd Normal Form (BCNF): Ensures stricter elimination of redundancy.
  • Fourth Normal Form (4NF): Addresses multi-valued dependencies.
  • Fifth Normal Form (5NF): Eliminates redundancy arising from join dependencies.

For most real-world applications, 3NF strikes the right balance between performance and data integrity.

Conclusion

Normalization is a critical process for designing databases that are efficient, scalable, and easy to maintain. By following the steps outlined above, you can create a well-structured database that minimizes redundancy, prevents anomalies, and simplifies data operations.

Top comments (0)