Now that we covered the very basics of database concepts and the terminology (if you’ve read part 1 of this series), we need to speak about getting your data in order…

I’ve talked previously about data anomalies, which can occur while inserting, updating, or deleting data in tables. To prevent this, relations that could introduce anomalies need to be split into smaller, well-structured relations.

This process is called normalisation: If a relation is normalised, rows can be inserted, deleted, or modified without creating anomalies.

Normalisation principles

In general, a well-formed relation will not contain more than one theme (e.g. Employee does not contain information about Department).

  • To be a well-formed relation, every determinant must be a candidate key.
  • Any not well-formed relation should be broken into two or more well-formed one.

Normalisation steps

There are several so called Normal Forms (a measurement of how well-formed relations are) defined but 3NF is sufficient for the majority of business cases:

Table with multivalued attributes
  ↓ Remove multivalued attributes
First Normal Form (1NF)
  ↓ Remove partial dependencies
Second Normal Form (2NF)
  ↓ Remove transitive dependencies
Third Normal Form (3NF)

First Normal Form (1NF)

A table does not contain multivalued attributes (more than one value in one cell, e.g. 101; 207). Each row of a given column holds only a single value. This is called an atomic value since it can’t be split (from the Greek word atomos for uncuttable).

When we look at this example for non-1NF table…

EmployeeName Department
Eric, Olivia Hospitation

…we see that Eric, Olivia is a multivalued attribute which needs to be split into separate rows in order to be 1NF-conform:

EmployeeName Department
Eric Hospitation
Olivia Hospitation

Second Normal Form (2NF)

  • 1NF, and
  • every non-key attribute is fully functionally dependent (see Part 1 for a refresher) on the entire primary key.

Let’s look at our Employee example again:

EmployeeId Name City PostalCode Department DepartmentHead
101 Eric Cologne 50672 Hospitation Olivia

This table is representing information about two different themes: the employee and the department. It must therefore be normalized into 2NF by splitting Department and DepartmentHead into a second table that holds only information about the Department, leaving its primary key DepartmentId as foreign key in the Employee table, as we did in the previous post:

Table Employee:

EmployeeId Name City PostalCode DepartmentId
101 Eric Cologne 50672 1
102 Daniela Berlin 10178 2
103 Thomas Cologne 50667 2
104 Olivia Munich 80638 1

Table Department:

DepartmentId DepartmentName DepartmentHead
1 Hospitation Olivia
2 Engineering Thomas

Third Normal Form (3NF)

  • 1NF, and
  • 2NF, and
  • no transitive dependencies (i.e. no functional dependencies on non-primary key attributes).

When a none-prime attribute depends on other non-prime attributes instead of depending on the prime attributes / primary key, a _transitive dependency- exists. Or in simpler words: an attribute within an entity depends on a non-prime attribute, which in turn depends on the primary key.

In our Employee example, City is transitive dependent on PostalCode, which in turn depends on the primary key EmployeeId:

EmployeeId Name City PostalCode DepartmentId
101 Eric Cologne 50672 1

Let’s move City and its PostalCode to a new table, where the latter becomes the primary key. PostalCode remains in the Employee table as foreign key:

Table Employee:

EmployeeId Name PostalCode DepartmentId
101 Eric 50672 1
102 Daniela 10178 2
103 Thomas 50667 2
104 Olivia 80638 1

Table Department:

DepartmentId DepartmentName DepartmentHead
1 Hospitation Olivia
2 Engineering Thomas

Table City:

PostalCode City
10178 Berlin
50672 Cologne
80638 Munich

General strategy

By breaking partial and transitive dependencies up into smaller relations, we can move from a lower Normal Form to a higher one.

But sometimes you might want or even need to do the opposite:

Denormalisation

Normalising relations (the process of breaking them apart into many component relations) may significantly increase the complexity of your data structure and denormalisation can be a good idea if you’re working with huge sets of data.

The benefits of denormalisation are:

  • Simplicity
  • Improved performance when querying the database

Of course, there are also costs of denormalisation:

  • Risk of modification anomalies
  • Potentially redundant data
  • More storage space is required

In general though, normalising your data is an essential tool to avoid anomalies, and the need to denormalise might never arise in your project.

Grab a pen and some paper…

The process of data normalisation was already a hard topic to grasp. So for the next part, let’s do some drawing in:

Up next: Database Basics, Part 3: Data Models and Database Design

Previous topic:

Database Basics, Part 1: What are Relational Databases?


General credits and further readings:

Adrienne Watt’s “Database Design - 2nd Edition” book

Dr. Daniel Soper’s “Database Lessons” videos