Day 28: Normalization and Database design

Introduction to normalization:

Normalization is the process of organizing data in a database. The goal of normalization is to eliminate redundancy and inconsistency, and ensure data is stored in a way that supports efficient querying and updating.

There are several different normal forms, with each normal form building on the previous one. The most commonly used normal forms are first normal form (1NF), second normal form (2NF), and third normal form (3NF).

The normalization process and steps:

The normalization process involves several steps, each building on the previous one. The general steps for normalization are:

Identify the entities and attributes in the database.
Identify the primary keys for each table.
Eliminate repeating groups and separate the data into separate tables.
Ensure each table has a primary key.
Eliminate any partial dependencies by creating separate tables.
Ensure there are no transitive dependencies by creating separate tables.

Applying normalization to a sample database design:

Let’s consider an example of a database design that is not normalized, and how we can apply normalization to improve it.

Suppose we have a database that stores customer orders for a clothing store. The current database design looks like this:

Customer_Order (order_number, customer_name, customer_address, item_description, item_price, item_quantity)

In this design, there are several repeating groups: customer_name, customer_address, item_description, item_price, and item_quantity. Additionally, the primary key is the order number.

To normalize this database, we can follow the steps outlined above:

Identify the entities and attributes in the database. In this case, we have two entities: customers and orders. The attributes for the customers entity are customer_name and customer_address. The attributes for the orders entity are order_number, item_description, item_price, and item_quantity.
Identify the primary keys for each table. For the customers table, the primary key is the customer_name. For the orders table, the primary key is the order_number.
Eliminate repeating groups and separate the data into separate tables. We can create a separate table for the items, which will include the item_description, item_price, and item_quantity attributes. This will eliminate the repeating groups in the Customer_Order table.
Ensure each table has a primary key. We have already identified the primary keys for the customers and orders tables. For the items table, we can create a new primary key called item_id.
Eliminate any partial dependencies by creating separate tables. There are no partial dependencies in this design.
Ensure there are no transitive dependencies by creating separate tables. There are no transitive dependencies in this design.

After normalizing the database, the design will look like this:

Customers (customer_name, customer_address)

Orders (order_number, customer_name, order_date)

Items (item_id, item_description, item_price, item_quantity, order_number)

By normalizing the database, we have eliminated redundancy and improved the efficiency of querying and updating data.