In this final part of the series I’m going to tell you about two database administration concepts that will help you in your understanding of how databases work, and one bonus that I consider “good to have heard about.”
Transactions
A transaction can be thought of as a group of operations that together form a single logical unit of work.
For example, if you want to update your postal address in an online shop where you’re registered, the transaction might consist of these operations:
Find user with name "Jenny Muller" in table "Customers"
Set the address to the new address
Save the updated address to the database
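The steps above can be sketched with Python’s built-in sqlite3 module. The table and column names follow the example; the addresses are made up for illustration. All three operations run inside one transaction that commit() makes permanent:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (name TEXT, address TEXT)")
conn.execute("INSERT INTO Customers VALUES ('Jenny Muller', 'Old Street 1')")
conn.commit()

# the UPDATE opens an implicit transaction; commit() saves it as one unit
conn.execute(
    "UPDATE Customers SET address = ? WHERE name = ?",
    ("New Street 2", "Jenny Muller"),
)
conn.commit()

address = conn.execute(
    "SELECT address FROM Customers WHERE name = 'Jenny Muller'"
).fetchone()[0]
print(address)  # → New Street 2
```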
As you can see, a transaction consists of several operations that belong together. In order to ensure data integrity in the database, a database transaction must maintain Atomicity, Consistency, Isolation, and Durability. This is commonly known as the ACID properties:
Atomicity
The Greek word “atomos” means “uncuttable”. And just like the atom (Greek philosopher Leucippus’s idea of the atom, at least), a database transaction needs to be atomic, “unsplittable”:
- All operations of an atomic transaction (reading, updating, inserting, deleting) must be successfully completed before it is committed.
- If at least one operation is unsuccessful, then the whole transaction is rolled back and no changes are saved to the database.
Imagine you want to transfer 50 € from your bank account to another: the transaction consists of reading the amount of money in your bank account, deducting the amount you want to transfer, and adding it to the target bank account.
Reading shows your account holds 100 €. Your account gets updated to 50 €. And for whatever reason, the target bank account can’t be updated (there’s an unchecked typo in the account number, the account has been closed, etc.). If this transaction were committed, those 50 € would simply vanish.
What must happen instead, since the last operation of the transaction is unsuccessful and transactions must not be split: the whole transaction gets rolled back (reversed) and no changes are saved to the bank’s database. Your account still holds 100 €.
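This scenario can be sketched with Python’s sqlite3 module. The account names and the deliberate typo ("tagret") are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("yours", 100), ("target", 0)])
conn.commit()

try:
    conn.execute(
        "UPDATE accounts SET balance = balance - 50 WHERE id = 'yours'")
    # the target account number contains a typo, so no row matches:
    cur = conn.execute(
        "UPDATE accounts SET balance = balance + 50 WHERE id = 'tagret'")
    if cur.rowcount == 0:
        raise ValueError("unknown target account")
    conn.commit()
except ValueError:
    conn.rollback()  # undo the deduction as well

balance = conn.execute(
    "SELECT balance FROM accounts WHERE id = 'yours'").fetchone()[0]
print(balance)  # → 100: the partial change was reversed
```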
Consistency
A consistent transaction may change affected data only in allowed ways. Each attribute has certain defined rules (e.g. a constraint like “this field may only hold a string value of at most 128 characters”) which need to be observed, so the saved data does not violate the integrity of the database.
This also means that interrupted modifications (e.g. when the connection between the user and the database server drops) are rolled back, so that the database returns to the state it was in before the partial change took place.
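A minimal sketch with sqlite3, assuming a CHECK constraint that enforces the 128-character rule mentioned above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# the CHECK constraint encodes one "allowed way" the data may look
conn.execute(
    "CREATE TABLE Customers (name TEXT CHECK (length(name) <= 128))")

rejected = False
try:
    conn.execute("INSERT INTO Customers VALUES (?)", ("x" * 200,))
except sqlite3.IntegrityError:
    rejected = True  # the inconsistent row was never saved

print(rejected)  # → True
```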
Isolation
When multiple users (which can be people as well as applications) have access to a database, different transactions may operate on the same data at the same time.
Isolated transactions ensure that only one transaction operates on a given set of data at a time by sequencing them: one transaction performs all of its operations, and only then is the next transaction started.
The following concurrency issues might arise from multiple users accessing the same data at the same time:
- Dirty reads: A transaction reads a modified record which has not yet been committed.
- Inconsistent reads: A transaction re-reads a data set and finds the data has changed since its last read.
- Phantom reads: A transaction re-reads a data set and finds that a new record has been added.
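An inconsistent read can be sketched with sqlite3 and two connections to the same database file. With isolation_level=None the reader is in autocommit mode, so each SELECT sees the latest committed state, roughly like the “read committed” level:

```python
import os
import sqlite3
import tempfile

# two connections to the same database file simulate two users
path = os.path.join(tempfile.mkdtemp(), "shop.db")
writer = sqlite3.connect(path)
writer.execute("CREATE TABLE stock (item TEXT, qty INTEGER)")
writer.execute("INSERT INTO stock VALUES ('widget', 10)")
writer.commit()

# autocommit mode: every SELECT sees the latest committed state
reader = sqlite3.connect(path, isolation_level=None)
first = reader.execute("SELECT qty FROM stock").fetchall()[0][0]

# another transaction changes the data between the two reads
writer.execute("UPDATE stock SET qty = 3 WHERE item = 'widget'")
writer.commit()

second = reader.execute("SELECT qty FROM stock").fetchall()[0][0]
print(first, second)  # → 10 3: an inconsistent (non-repeatable) read
```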
To balance correctness against concurrency, the ANSI SQL Standard¹ defines four isolation levels that specify which of these concurrency control problems are allowed to occur:
| Isolation level | Accessibility | Restrictiveness |
|---|---|---|
| 1. Read uncommitted | Highest | Least |
| 2. Read committed | ↓ | ↓ |
| 3. Repeatable read | ↓ | ↓ |
| 4. Serializable | Lowest | Most |
Durability
In a durable transaction, all committed changes are permanent. This means that once the transaction is complete, the data is not lost due to a system failure, crash, or restart.
Resource locking
To avoid concurrency issues, resource locking is used to prevent transactions from reading and/or writing a data set that is currently in use.
- Implicit locks are issued automatically by the database management system on activity.
- Explicit locks are issued by users requesting exclusive rights to specific data. These can include:
- table locks
- column locks
- row locks
- cell locks
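As a sketch of explicit locking with sqlite3: BEGIN EXCLUSIVE requests exclusive rights, and a second connection that refuses to wait (timeout=0) is turned away. Note that SQLite only locks at whole-database granularity, so the finer table, column, row, and cell locks are not available there:

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "shop.db")
conn = sqlite3.connect(path, isolation_level=None)
conn.execute("CREATE TABLE Customers (name TEXT)")

# explicit lock: take a write lock on the whole database
conn.execute("BEGIN EXCLUSIVE")
conn.execute("INSERT INTO Customers VALUES ('Jenny Muller')")

blocked = False
other = sqlite3.connect(path, timeout=0, isolation_level=None)
try:
    other.execute("BEGIN EXCLUSIVE")  # fails instead of waiting
except sqlite3.OperationalError:
    blocked = True  # "database is locked"

conn.execute("COMMIT")  # release the lock
print(blocked)  # → True
```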
There are two different strategies to handle manual locking:
| Optimistic locking | Pessimistic locking |
|---|---|
| 1. Read data and note a version number | 1. Lock required resources |
| 2. Process transaction | 2. Read data |
| 3. Issue update | 3. Process transaction |
| 4. Check for a conflict (based on the version number) | 4. Commit changes |
| 5. If no conflict occurred: commit the transaction; else: roll back and repeat | 5. Release locks |
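The optimistic strategy can be sketched with sqlite3; the account table and the withdraw helper are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id TEXT, balance INTEGER, version INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('a1', 100, 1)")
conn.commit()

def withdraw(conn, account_id, amount):
    """One optimistic attempt; returns True if no conflict occurred."""
    balance, version = conn.execute(
        "SELECT balance, version FROM accounts WHERE id = ?",
        (account_id,)).fetchone()
    # the UPDATE only matches if nobody bumped the version meanwhile
    cur = conn.execute(
        "UPDATE accounts SET balance = ?, version = ? "
        "WHERE id = ? AND version = ?",
        (balance - amount, version + 1, account_id, version))
    if cur.rowcount == 1:
        conn.commit()    # no conflict: commit the transaction
        return True
    conn.rollback()      # conflict: roll back (caller may retry)
    return False

ok = withdraw(conn, "a1", 30)
print(ok)  # → True
```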
Optimistic locking is therefore useful in database environments where the majority of transactions are reads: many selects, and few inserts, updates, or deletes.
Pessimistic locking is useful where the majority of transactions are writes: many inserts, updates, or deletes.
Deadlock
As a transaction begins to lock resources, it may have to wait for a particular resource to be released (unlocked) by another transaction. It might happen that two transactions need to wait indefinitely on each other to release resources.
This is known as a deadlock: neither transaction can proceed, and therefore neither can release its locked resources.² Database management systems have processes that scan for such situations and abort one of the transactions, which is then rolled back to allow the other one to proceed.
Database indexes
As the last topic on my list, I’d like to tell you about indexes, which you might not use any time soon but which are good to at least know about in broad strokes.
Indexing is an important tool for achieving high performance in big relational databases. An index is a data structure that contains a copy of parts of the data from one or more existing database tables.
- Just like the index at the back of a book, a database index provides the DBMS with a framework to quickly locate needed information.
- This data can be kept sorted, which allows the use of quick binary search strategies.³
Without an index, the DBMS has to scan a given table one row at a time in order to locate the desired row(s).
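A sketch with sqlite3: EXPLAIN QUERY PLAN shows how adding an index (the table and index names here are illustrative) turns a full table scan into an index search. The exact plan wording varies between SQLite versions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (name TEXT, city TEXT)")
conn.executemany("INSERT INTO Customers VALUES (?, ?)",
                 [(f"user{i}", "Berlin") for i in range(1000)])

query = "SELECT * FROM Customers WHERE name = 'user42'"

# without an index the planner reports a full table scan
plan_before = conn.execute("EXPLAIN QUERY PLAN " + query).fetchone()[-1]
print(plan_before)  # e.g. "SCAN Customers"

conn.execute("CREATE INDEX idx_name ON Customers (name)")
plan_after = conn.execute("EXPLAIN QUERY PLAN " + query).fetchone()[-1]
print(plan_after)   # e.g. "SEARCH Customers USING INDEX idx_name (name=?)"
```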
There are drawbacks for using indexes though:
- Increased storage space since the index contains a copy of some of the data in a table.
- An index needs to be rebuilt on every insert, update, and delete operation on the indexed table. Indexes should therefore be used sparingly on tables that are frequently updated.
- Index performance is low when an indexed column contains a large amount of duplicate values (e.g. product ratings).
Conclusion
Congratulations! You’ve read about the basic concepts of relational databases, data normalisation, what entity relationships are and how to represent them, and the principles of how to maintain database integrity during transactions.
Thanks for coming with me on this voyage, and I hope this not-so-brief introduction has helped you make your first steps into the world of relational databases.
Previous topics:
Database Basics, Part 1: What are Relational Databases?
Database Basics, Part 2: Data Normalisation
Database Basics, Part 3: Data Models and Database Design
General credits and further readings:
Adrienne Watt’s “Database Design - 2nd Edition” book
Dr. Daniel Soper’s “Database Lessons” videos
1. For the technical isolation definitions, see page 84 of ANSI/ISO/IEC International Standard (IS) Database Language SQL — Part 2: Foundation (SQL/Foundation)
2. For a good illustration of a deadlock, see Vlad Mihalcea’s A beginner’s guide to database deadlock
3. For an easy explanation of binary search, see TutorialsPoint’s Data Structure and Algorithms Binary Search