How-to: Database Design: 1st, 2nd and 3rd normal form

In a relational database you can design separate tables and columns to hold sets of related data.

This is vaguely similar to a spreadsheet with multiple tabs holding different data but with three key extra benefits:

  1. You can strictly define the type of data to be stored in each place (Numbers or dates or text etc)
  2. Dates and numbers can be restricted to sensible values - Date_of_Birth must be in the past, Zip_code must be 20 characters or less etc.
  3. You can define exactly how the tables are linked (related) to one another.

It has been shown; in a series of systematic field audits and lab experiments; that around 88% of all spreadsheets contain errors (Raymond R. Panko, University of Hawai’i, 1998, Revised 2008)
Moving data into a structured relational database can prevent many of these errors.

First Normal Form

The  First Normal Form states that all column values must be atomic (singular)
So the value for one column (in a record) must contain a single value, not a list of items or a coded value that has more than one meaning

For example an order_type of 'A' meaning Overseas+ Airmail, then order_type of 'B' meaning Overseas+ Ship by boat would fail to meet First Normal Form. Changing to separate columns will fix this and allow for the case of Airmail within the same country.

First Normal Form also prohibits the presence of repeating groups even if they are stored in composite columns.
This means that a table for sales Orders containing the columns:
Order_No, budget_holder_name, Item1_description, Item2Description, Item3Description
would fail to meet First Normal Form, even though the descriptions are in separate columns.

Changing the design so that every item gets it's own row in the table means that we can have any number of items from 1 upwards, the table columns then become:
Order_No, Item_no, budget_holder_name, Description
with a primary key of Order_No + Item_no (concatenated) this is a valid First Normal Form table.

Second Normal Form

The  Second Normal Form implies that a tableis in First Normal Form and every non-key column is fully dependent on the entire primary key. In plain language the table should contain data relating to one 'thing' and the primary key will uniquely identify each 'thing'.

In the example above the budget_holder_name field is not fully dependent on the entire primary key, in the vast majority of cases the same member of staff would be the budget holder for all the items in each order. If a different budget holder was somehow entered for different items in the same order then it would be unclear which budget holder should actually approve the order.

Getting to second normal form (2NF) typically means that some data items need to be split into a separate table, in this case an Order_items table which would link back to the main Orders table. The order would then have a single budget_holder_name.

Orders table:
Order_No, budget_holder_name

Order_Items table:
Order_No, Item_no, Quantity, Description

Third Normal Form

To meet the requirements of Third Normal Form a table must meet the requirements of Second normal form plus all non-key columns are mutually independent.

In the example above, description is dependent on the Order item, so we split items into a separate table:

Orders table:
Order_No, budget_holder_name

Order_Items table:
Order_No, Item_no, Quantity

Items table:
Item_no, Description, Manufacturer

An excuse that is often used to avoid normalising databases is "By just duplicating the data in multiple tables we will get better performance". The Query Optimizer built into every major database is more efficient than any SQL or cache optimisation you can think of, joining tables in memory is always going to be orders of magnitude faster than reading fresh data from disk.
Just store the data in the most compact normalised form you can and let the Query Optimizer do it's job.

“My favorite things in life don't cost any money. It's really clear that the most precious resource we all have is time” ~ Steve Jobs


Naming conventions - The standard 'Reddick VBA Naming convention' for Access.
Number Data Types - Access data Types.

Copyright © 1999-2024
Some rights reserved