Normalization

Normalization is a process of representing a database in terms of relations in standard normal forms where first normal form is a minimal requirement. In short:

  1. It can be defined as the trade off between data redundancy and performance.
  2. In this process relations are examined and anomalies are being detected and removed.
  3. While normalizing a relation data redundancy is being reduced but need for joins gets introduced.

Anomalies
When an attempt is made to modify(update, insert into, or delete) a table, undesired side-effects may follow. These undesired effects present in our database which we have to detect and then remove using normalization, are known as anomalies.

Procedure for normalization:

  1. Specify the key to the relation.
  2. Specify the functional dependency of the relation.
  3. Apply the definition of the each normal form(like start with the 1NF).
  4. If condition not satisfied then divide the relation(table) into two or more relations as required.
  5. Re-test

Now what are the keys?
key – One or more attributes(columns) that uniquely identify a tuple(row) in a relation.
Primary key – It is a unique key whose value can not be a null value and in which no duplicates can be tolerated.
Candidate key – Basically a candidate key is a primary key only, which uniquely describes the whole database record without referring to the other keys. As the name specifies, these keys are the candidates for primary keys. One of the candidate keys later on become primary key.
Super key – It can be defined as a non-minimal candidate key.
non-key attributes – Key attributes which are not the candidate keys are called non-key attributes.

Functional Dependency

  1. It describes a relation between attributes within a single relation.
  2. An attribute is called functionally dependent on another if we can use the value of another attribute to determine the value of the other.
  3. A->B in a relation R(A, B, C, D) implies A determines B or B is being determined by A.

Normal form – A class of relations free from a certain kind of anomalies and defaults.

    1. First Normal Form(1NF): A relation can be in a first normal form if each attribute is having a single value only. i.e no multiple values and each attribute must be unique.
    2. Second Normal Form(2NF): According to it there should not be any partial dependencies. Means any part of a candidate key should not determine a non-key attribute.
    3. Third Normal Forms(3NF): There should not be any non-key to non-key determination in a functional dependency of the relation. i.e Transitive dependency should not be there.
    4. BCNF: If  x->y and x is a super key.

Now lets take an example to implement all the above concepts.
What normal form is the following relation in (key has been emphasized):
STORE_ITEM (SKU, PromotionID, Vendor, Style, Price)
FD1: SKU, PromotionID → Vendor, Style, Price
FD2: SKU → Vendor, Style

Now as SKU and PromotionID are the keys. And in the second functional dependency a part of the key has been determining a non-key attribute which is called partial functional dependency, Which violates the rule for a relation to be in 2NF.
Hence the given relation is in 1NF.

Leave a comment