Database Design

TODO When the PCDE course gets here, start looking through the notes in the course repo already written and put them here. Then start to go through the course as intended and refine the notes and add to them.

Relational Database Design

Pros vs Cons

Pros

Cons

Process of Creating a Database Model

  1. Understand what you are modeling
  2. Identify the separate entities that need to be captured
  3. Identify the properties of said entities
  4. Identify what unique identifiers exist for those entities
  5. Identify relationships between entities and properties.

Understand the Entity or Domain

Get info about the domain, it takes time & effort.

Identify Entities & Properties

Identify Primary Keys, or What Makes an Entity Unique

Relationships Between Tables

Design Summary & Naming Practices

Sample Book Database

Let's walk through creating an actual database design with a real world domain. Let's create a book database. How do we come up with the entities, properties and the relationships. There's a number of ?s that can be posed to design this data model:

These are the sorts of questions that can arise when considering the nature of data. This is a good starting point when it comes to design, the questions you might want answered from the data.

Hard Coded Policies

As well as asking those questions, you might want to ask yourself: What fixed rules are you capturing within your design?

Ie can an author have more than one book? Perhaps not initially, but eventually many authors in the database will write their second book.

Ie Can a book have more than one author? Yes, many books are co-authored by many writers.

Ie Can a sales order be for more than one book? Normally in a shop you might want to buy many books.

Ie Can an editor edit multiple books? Yes, very likely, this is a book business, they usually edit many books.

Ie Can a book have more than one editor? Again, yes, this does happen.

So track these kinds of questions because they become clues in how the data model aught to be designed through these hardcoded rules.

Entities & Relationships in the Data Model

So, let's think here about the entities and relations we want to capture. And the data model we want to create.

Once again, these are our:

  1. Tables
  2. Columns
  3. Relationships between tables, through columns

Active Learning

Identify the entities and the properties for the book business data model

There are many possibile solutions. Let's start with a subset. You can think of entities like:

...as being core parts of the design.

Authors

Properties

Books

Properties

Editors

Properties

Publishers

Properties

Naming Conventions

Cardinality for Sample Database

Cardinality of the Book Database

The Book Database so Far

Here are the four tables so far and the properties to consider. They all have primary keys (PK). What of their foreign keys (FK)?

Implementation of 1-to-N Relationships

How would you enter multiple books for the same publisher? To answer that let's write out some initial data.

BookID Name Price Date
1 Eduardo $13 02/20/2018
2 Victoria $86 11/22/2016
3 Sapiens $20.99 02/15/2015
PublisherID Name Address BookID
1 Random House NY, NY 1
1 Random House NY, NY 2
1 Random House NY, NY 3

Note how because of the 1-to-N relationship between books and publishers, while there's three unique book rows in the Books table, There's three repeating rows in the Publishers table with the only difference being the BookID. This is how this 1-to-N relations play out, the many table that's related to the 1 in this relationship must have a lot of repeated information to express that many books are related to one publisher.

This violates some of the key rules of database design. Namely that you can't have repeat PKs and you do not want to put multiple values in one cell. Instead we would rather have multiple FKs per one PK in one table cell.

Better Approach to Implementing 1-to-N Relationships

Instead of putting the BookID in the Publisher table like above, let's sketch out the implications of only putting the FK for PublisherID into the Books table.

PublisherID Name Address
1 Random House NY, NY
BookID Name Price Date PublisherID
1 Eduardo $13 02/20/2018 1
2 Victoria $9.86 11/22/2016 1
3 Sapiens $20.99 02/15/2015 1

Note how because the one PublisherID is tied to many BookIDs, placing the FK PublisherID in the Books table deduplicates the data. And all without destroying any information. It will also be more performant because the amount of repition is much less.

Cardinality for Sample Database: N-to-N

How would we represent a many to many relationship? For this example in the Book database system, we'll look at the Authors & Books tables' relationship as (m:m).

For Many-to-Many relationships, there's no good way to represent the relationship w/o creating an extra table. This is known as an intermediate table between Books & Authors, known as BookAuthors that will represent the relationship. It will hold for each row a BookID & AuthorID to represent related Books & Authors.

Notice that the relationship from Authors to BookAuthors is (1:m). Also, that Books to BookAuthors is (1:m). So that we use that intermediate table to map the two relationships.

Some Observations about Intermediate Tables

Cardinality for Sample Database: 1-to-1

More Info on Cardinality

To learn more about the nuances of cardinality, consider reading this article

Normalization

Many of the points made about the design of data models are captured with noramlization rules.

graph TD;
A[Reduction of Redundancy within Tables]
-->B[Simpler Data by Avoiding Duplication]
B --> C[Non-Loss Decomposition]

We do not want redundant data within the tables, we want to support and improve integrity by avoiding duplicates. The other thing is to split up concerns without losing data by using non-loss decomposition.

First Normal Form

Each Row and Column Cell has only One Value

How might you handle one order with multiple books? We have a Books table & Orders table. How would you handle that relationship without breaking this normalization rule? A naive reaction might be to simply add more Book columns in the Order table. You could also add multiple Books in one cell. You can tell this isn't a good solution because it immediately breaks this normal form.

Second Normal Form

Every non-key must be dependent on the primary key

That is that the key truly is a primary key, even when it is composed of two columns. Irrelevant when the primary key is the only column of a table. Ie, every royalty in BookAuthors depends on Books & Authors

Table BookAuthors Example

The royalty column is the only non-key of the table. It depends entirely on the primary key BookID in this table. It makes sense due to the relation of Book & Author. Only one royalty amount would be related to one association of Book & Author.

Third Normal Form

Every non-key must not depend on another non-key column

That is, there aren't any connections between dependent data columns. Only independent data may connect others. Think of each non-key column as a fact about the primary key. In the Orders table, Books can be multiple than an incomplete fact.

Mini Lesson 4.1: Normalization

The [mini lesson 4.1][pcde-mod4-content-zk] handles the nuances of normalization forms.

References

Web Links

Note Links

[pcde-mod4-content-zk]: ./pcde-module4-content.md#Mini Lesson 4.1: Normalization "PCDE Module 4: Mini Lesson 4.1 on Normalization"