Now that we covered the very basics of database concepts and the terminology (if you’ve read part 1 of this series), we need to speak about getting your data in order…
I’ve talked previously about data anomalies, which can occur while inserting, updating, or deleting data in tables. To prevent this, relations that could introduce anomalies need to be split into smaller, well-structured relations.
This process is called normalisation: If a relation is normalised, rows can be inserted, deleted, or modified without creating anomalies.
In general, a well-formed relation will not contain more than one theme (e.g.
Employee does not contain information about
- To be a well-formed relation, every determinant must be a candidate key.
- Any not well-formed relation should be broken into two or more well-formed one.
There are several so called Normal Forms (a measurement of how well-formed relations are) defined but 3NF is sufficient for the majority of business cases:
Table with multivalued attributes ↓ Remove multivalued attributes First Normal Form (1NF) ↓ Remove partial dependencies Second Normal Form (2NF) ↓ Remove transitive dependencies Third Normal Form (3NF)
First Normal Form (1NF)
A table does not contain multivalued attributes (more than one value in one cell, e.g.
101; 207). Each row of a given column holds only a single value. This is called an atomic value since it can’t be split (from the Greek word atomos for uncuttable).
When we look at this example for non-1NF table…
…we see that
Eric, Olivia is a multivalued attribute which needs to be split into separate rows in order to be 1NF-conform:
Second Normal Form (2NF)
- 1NF, and
- every non-key attribute is fully functionally dependent (see Part 1 for a refresher) on the entire primary key.
Let’s look at our Employee example again:
This table is representing information about two different themes: the employee and the department. It must therefore be normalized into 2NF by splitting
DepartmentHead into a second table that holds only information about the Department, leaving its primary key
DepartmentId as foreign key in the
Employee table, as we did in the previous post:
Third Normal Form (3NF)
- 1NF, and
- 2NF, and
- no transitive dependencies (i.e. no functional dependencies on non-primary key attributes).
When a none-prime attribute depends on other non-prime attributes instead of depending on the prime attributes / primary key, a _transitive dependency- exists. Or in simpler words: an attribute within an entity depends on a non-prime attribute, which in turn depends on the primary key.
City is transitive dependent on
PostalCode, which in turn depends on the primary key
City and its
PostalCode to a new table, where the latter becomes the primary key.
PostalCode remains in the
Employee table as foreign key:
By breaking partial and transitive dependencies up into smaller relations, we can move from a lower Normal Form to a higher one.
But sometimes you might want or even need to do the opposite:
Normalising relations (the process of breaking them apart into many component relations) may significantly increase the complexity of your data structure and denormalisation can be a good idea if you’re working with huge sets of data.
The benefits of denormalisation are:
- Improved performance when querying the database
Of course, there are also costs of denormalisation:
- Risk of modification anomalies
- Potentially redundant data
- More storage space is required
In general though, normalising your data is an essential tool to avoid anomalies, and the need to denormalise might never arise in your project.
Grab a pen and some paper…
The process of data normalisation was already a hard topic to grasp. So for the next part, let’s do some drawing in:
General credits and further readings: