Database Basics, Part 2: Data Normalisation | BetterDoc Product Development Blog

Now that we covered the very basics of database concepts and the terminology (if you’ve read part 1 of this series), we need to speak about getting your data in order…

I’ve talked previously about data anomalies, which can occur while inserting, updating, or deleting data in tables. To prevent this, relations that could introduce anomalies need to be split into smaller, well-structured relations.

This process is called normalisation: If a relation is normalised, rows can be inserted, deleted, or modified without creating anomalies.

Normalisation principles

In general, a well-formed relation will not contain more than one theme (e.g. Employee does not contain information about Department).

To be a well-formed relation, every determinant must be a candidate key.
Any not well-formed relation should be broken into two or more well-formed one.

Normalisation steps

There are several so called Normal Forms (a measurement of how well-formed relations are) defined but 3NF is sufficient for the majority of business cases:

Table with multivalued attributes
  ↓ Remove multivalued attributes
First Normal Form (1NF)
  ↓ Remove partial dependencies
Second Normal Form (2NF)
  ↓ Remove transitive dependencies
Third Normal Form (3NF)

First Normal Form (1NF)

A table does not contain multivalued attributes (more than one value in one cell, e.g. 101; 207). Each row of a given column holds only a single value. This is called an atomic value since it can’t be split (from the Greek word atomos for uncuttable).

When we look at this example for non-1NF table…

EmployeeName	Department
Eric, Olivia	Hospitation

…we see that Eric, Olivia is a multivalued attribute which needs to be split into separate rows in order to be 1NF-conform:

EmployeeName	Department
Eric	Hospitation
Olivia	Hospitation

Second Normal Form (2NF)

1NF, and
every non-key attribute is fully functionally dependent (see Part 1 for a refresher) on the entire primary key.

Let’s look at our Employee example again:

EmployeeId	Name	City	PostalCode	Department	DepartmentHead
101	Eric	Cologne	50672	Hospitation	Olivia

This table is representing information about two different themes: the employee and the department. It must therefore be normalized into 2NF by splitting Department and DepartmentHead into a second table that holds only information about the Department, leaving its primary key DepartmentId as foreign key in the Employee table, as we did in the previous post:

Table Employee:

EmployeeId	Name	City	PostalCode	DepartmentId
101	Eric	Cologne	50672	1
102	Daniela	Berlin	10178	2
103	Thomas	Cologne	50667	2
104	Olivia	Munich	80638	1

Table Department:

DepartmentId	DepartmentName	DepartmentHead
1	Hospitation	Olivia
2	Engineering	Thomas

Third Normal Form (3NF)

1NF, and
2NF, and
no transitive dependencies (i.e. no functional dependencies on non-primary key attributes).

When a none-prime attribute depends on other non-prime attributes instead of depending on the prime attributes / primary key, a _transitive dependency- exists. Or in simpler words: an attribute within an entity depends on a non-prime attribute, which in turn depends on the primary key.

In our Employee example, City is transitive dependent on PostalCode, which in turn depends on the primary key EmployeeId:

EmployeeId	Name	City	PostalCode	DepartmentId
101	Eric	Cologne	50672	1

Let’s move City and its PostalCode to a new table, where the latter becomes the primary key. PostalCode remains in the Employee table as foreign key:

Table Employee:

EmployeeId	Name	PostalCode	DepartmentId
101	Eric	50672	1
102	Daniela	10178	2
103	Thomas	50667	2
104	Olivia	80638	1

Table Department:

DepartmentId	DepartmentName	DepartmentHead
1	Hospitation	Olivia
2	Engineering	Thomas

Table City:

PostalCode	City
10178	Berlin
50672	Cologne
80638	Munich

General strategy

By breaking partial and transitive dependencies up into smaller relations, we can move from a lower Normal Form to a higher one.

But sometimes you might want or even need to do the opposite:

Denormalisation

Normalising relations (the process of breaking them apart into many component relations) may significantly increase the complexity of your data structure and denormalisation can be a good idea if you’re working with huge sets of data.

The benefits of denormalisation are:

Simplicity
Improved performance when querying the database

Of course, there are also costs of denormalisation:

Risk of modification anomalies
Potentially redundant data
More storage space is required

In general though, normalising your data is an essential tool to avoid anomalies, and the need to denormalise might never arise in your project.

Grab a pen and some paper…

The process of data normalisation was already a hard topic to grasp. So for the next part, let’s do some drawing in:

Up next: Database Basics, Part 3: Data Models and Database Design

Previous topic:

Database Basics, Part 1: What are Relational Databases?

General credits and further readings:

Adrienne Watt’s “Database Design - 2nd Edition” book

Dr. Daniel Soper’s “Database Lessons” videos