Database normalization organizes data in a structured and consistent way, eliminating redundancy and ensuring data integrity within relational databases. It makes databases easier to manage, query, and maintain, reducing errors.
By following normalization rules, you can design databases that are easier to maintain and less prone to errors.
Before dive into the normalization we wanted to know the basics of the relational database.
- A relational database is a type of database that stores data in tables, which are structured into rows and columns. Table - Collections of related data entries Rows - Records, each row represents a single data Columns - Fields, each column represents a specific attribute of the data.
Primary Key - A unique identifier for each row in a table. Ensures each record can be uniquely identified.
Foreign Key - A field in one table that links to the primary key of another table, creating a relationship between the two tables
We have mentioned that normalization is for eliminating redundancy and inconsistent dependency. What’s that?
Redundant data means storing the same data in multiple places, which wastes disk space and creates maintenance problems. If any changes occur in the data, we have to update it in all places.
Inconsistent dependency happens when data in a database relies on other data in a way that is unpredictable or incorrect. This can cause errors and make the database unreliable.
- Normal form - Each rule in database normalization. If a database follows the first rule of normalization, it’s in “First Normal Form” (1NF). If it follows the first three rules, it’s in “Third Normal Form” (3NF).
Think of normal forms as checkpoints/rule in a process, similar to how a driving test has stages like learners, parking, and road test.
First Normal Form (1NF)
- Eliminate repeating groups in individual tables.
- Create a separate table for each set of related data.
To normalize this data, we create separate tables for students and their courses:
- Identify each set of related data with a primary key.
Second Normal Form (2NF)
- Remove partial dependency — Every non-key column must depend on the whole primary key, not just part of it.
The 1NF only eliminates repeating groups, not redundancy. That’s why there is 2NF.
A table is said to be in 2NF if it meets the following criteria:
it’s already in 1NF
No partial dependency. That is, all non-key attributes are fully dependent on a primary key.
Third Normal Form (3NF)
Removes transitive dependencies by ensuring that non-key attributes depend only on the primary key.
Now we need to understand transitive dependency, which is key to understanding 3NF.
Transitive dependency happens when an attribute(column) depends on another non-key attribute rather than directly depending on the primary key.
3NF requires
- The table is already is 2 NF
- There are no transitive dependencies. This means that non-key attributes should not depend on other non-key attributes.
This is the final structure in 3NF:
No partial dependency (because all attributes are fully dependent on the whole primary key in their respective tables).
No transitive dependency (because no non-key attributes depend on other non-key attributes).
Elementary key Normal Form (EKNF)
- EKNF is a stricter form of 3NF that ensures all functional dependencies are elementary, meaning they are straightforward and clear, without any complex dependencies.
3NF: Ensures no transitive dependencies.
EKNF: Further refines 3NF by ensuring all functional dependencies are elementary, addressing more complex dependencies that 3NF might not handle.
Boyce–Codd Normal Form(BCNF)
-
A table is in BCNF if:
- It is in 3NF.
- For every functional dependency ( A —> B ),A must be a candidate key (superkey).
Superkey: A set of one or more columns that can uniquely identify a row in a table.
Functional Dependency: A relationship where one attribute uniquely determines another attribute.
Fourth Normal Form (4NF)
Next level of normal form after BCNF.
Other Normal forms are concerned with functional dependencies, 4NF is concerned with a more general type of dependency known as a multivalued dependency.
Multivalued dependency - Occurs when one attribute determines multiple values of another attribute independently of other attributes.
Fifth Normal Form (5NF)
It is also known as Project-Join Normal Form (PJNF), is the highest level of normalization. A table is in 5NF if it is already in 4NF and cannot be decomposed into smaller tables without losing data integrity. This means that every non-trivial join dependency in the table is implied by the candidate keys.
If we decompose this table into smaller tables to eliminate redundancy, we must ensure that we can still reconstruct the original table without losing any information. In 5NF, this decomposition is done in such a way that no information is lost, and all join dependencies are preserved. explain wit examples
Denormalization
Tables may be denormalized to improve performance.
Normalization increases the number of tables & relationships.
Accessing multiple tables across relationships requires more processing than accessing a single table.
Top comments (0)