Normalization


Normalization is a systematic approach used in database design to organize data in a way that reduces redundancy and dependency by dividing large tables into smaller, more manageable ones. It enhances the integrity and availability of the database by arranging data into logical groupings, such that each group describes a small part of the whole data.

The goal is to structure the data in such a way that:

  • Data is stored logically across tables.
  • Redundancy is minimized (avoiding duplicate data).
  • Anomalies during insert, update, or delete operations are prevented.
  • Data consistency is maintained.

Why is Normalization Used?


Normalization is used to:

  • Minimize Redundancy: Avoids duplicate storage of data.
  • Avoid Anomalies: Prevents anomalies that occur during insertions, deletions, or updates.
  • Enhance Data Integrity: Ensures data accuracy and consistency through established relationships and rules.
  • Improve Query Performance: Optimizes queries by reducing table sizes and complexity which can speed up search and retrieval operations.

Forms of Normalization


Normalization is typically done in several steps, each corresponding to a specific normal form. Each form represents a set of criteria that a database must meet to ensure it is free of unwanted characteristics like redundancies, insertion/deletion anomalies, and update anomalies.

First Normal Form (1NF) – Eliminate Repeating Groups


A table is in 1NF if it meets the following criteria:

  • All columns contain only atomic values (no multiple values or lists).
  • All entries in a column are of the same data type.
  • Each column contains unique data for each row within that table.

Example -

Before 1NF

Order_IDCustomer_NameLaptop, Mouse
1John DoeLaptop, Mouse
2Jane SmithKeyboard, Monitor

Problem: The "Product_Names" column stores multiple values (like Laptop and Mouse in the same cell). This makes it hard to query specific products.

After 1NF

Order_IDCustomer_NameProduct_Name
1John DoeLaptop
1John DoeMouse
2Jane SmithKeyboard
2Jane SmithMonitor

Now, each column contains only one value per row, and we've split the products into separate rows.

Second Normal Form (2NF) – Eliminate Partial Dependency


A table is in 2NF if:

  • It is in 1NF.
  • Every non-key column is fully dependent on the entire primary key, not just part of it. (This applies only to tables with composite keys.)

Example -

Before 2NF

Order_IDProduct_NameProduct_PriceCustomer_Name
1Laptop50000John Doe
1Mouse1000John Doe
2Keyboard2000Jane Smith
2Monitor10000Jane Smith

Problem: Product_Price depends only on Product_Name, not on the full Order_ID. If the product price changes, we need to update multiple rows (risking inconsistency).

After 2NF

Solution: Break the table into two:

Orders Table -

Order_IDProduct_NameProduct_Price
1Laptop50000
1Mouse1000
2Keyboard2000
2Monitor10000

Products Table -

Product_NameProduct_Price
Laptop50000
Mouse1000
Keyboard2000
Monitor10000

Now, Product_Price is stored in a separate Products Table and referenced by Product_Name. This eliminates partial dependency.

Third Normal Form (3NF) – Eliminate Transitive Dependency


A table is in 3NF if:

  • It is already in 2NF.
  • Non-key columns do not depend on other non-key columns (no transitive dependencies).

Example -

Before 3NF

Customer_IDCustomer_NameCityZip_Code
1John DoeNew York10001
2Jane SmithSan Diego92101

Problem: City depends on Zip_Code, not directly on Customer_ID. This is a transitive dependency, meaning City is indirectly linked to Customer_ID through Zip_Code.

After 3NF

Solution: Break the table into two:

Customers Table -

Customer_IDCustomer_NameZip_Code
1John Doe10001
2Jane Smith92101

Zip Codes Table -

Zip_CodeCity
10001New York
92101San Diego

Now, the City is stored in a separate Zip Codes Table, ensuring that non-key columns do not depend on each other.

Boyce-Codd Normal Form (BCNF) – Strengthened 3NF


A table is in BCNF if:

  • It is already in 3NF.
  • Every determinant (a column that determines another column) is a candidate key.

Example -

Before BCNF

Course_IDInstructorClassroom
101Dr. SmithRoom A
102Dr. SmithRoom B

Problem: The instructor (Dr. Smith) determines the classroom, but Instructor is not a candidate key.

After 3NF

Solution: Break the table into two:

Courses Table -

Course_IDInstructor
101Dr. Smith
102Dr. Smith

Instructors Table -

InstructorClassroom
Dr. SmithRoom A

Now, Instructor is treated as a candidate key, ensuring that the table follows BCNF.

Summary of Normal Forms


Normal FormWhat It EnsuresIssue it Solves
1NFNo repeating groups or multi-valued columnsPrevents data duplication in columns
2NFNo partial dependenciesEnsures all data depends on the full primary key
3NFNo transitive dependenciesPrevents non-key columns from depending on other non-key columns
BCNFStrengthened 3NFFixes anomalies not addressed by 3NF