The Power of the Dependencies

We cannot discuss data modeling without talking about normalization and functional dependencies. We already illustrated the mechanics of normalization in chapter 2. The objectives of eliminating malformed relational designs are (roughly) these:
  • Reduce redundancy
  • Avoid update anomalies
  • Avoid information loss
Working around these problems naturally leads to designing keys, which may be primary or foreign. But do we really need normalization and functional dependencies?
To get a feel for the approach that one could call “Relational Data Modeling Classic,” let us quickly review orthodox normalization by way of a famous example. This example appeared as a complimentary poster in Database Programming & Design magazine, published by Miller Freeman back in 1989. The intellectual rights belong to Marc Rettig, now of Fit Associates, and it is included here with his kind permission. Take the time to look it over; it is worth the effort.
The starting point of his example is a list of non-normalized data items for puppies:
  • Puppy Number
    ⁃ Puppy Name
    ⁃ Kennel Code
    ⁃ Kennel Name
    ⁃ Kennel Location
    ⁃ Trick ID 1..n
    ⁃ Trick Name 1..n
    ⁃ Trick Where Learned 1..n
    ⁃ Skill Level 1..n
There are 5 steps in the proposed procedure:
  1. Eliminate repeating groups
  2. Eliminate redundant data
  3. Eliminate columns not dependent on the key
  4. Isolate independent multiple relationships
  5. Isolate semantically related multiple relationships
Each of these steps corresponds to one of the 5 "normal forms", which together make up the normalization method. At no point in the process is any visualization used other than tables with columns. The fourth normal form comprises 5 tables, and the example used for the fifth normal form adds another 5 tables.
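The first three steps can be sketched mechanically. Here is a minimal Python sketch under assumed field names and invented sample values (the poster itself works purely with tables, not code):

```python
# Non-normalized record: a repeating trick group nested inside one record.
# Field names follow the puppy example; the sample values are invented.
raw = {
    "puppy_number": 52, "puppy_name": "Fido",
    "kennel_code": "K1", "kennel_name": "Daisy Hill", "kennel_location": "Anytown",
    "tricks": [  # Trick ID / Name / Where Learned / Skill Level 1..n
        {"trick_id": 27, "trick_name": "roll over", "where_learned": "home", "skill": 9},
        {"trick_id": 16, "trick_name": "play dead", "where_learned": "school", "skill": 5},
    ],
}

# Step 1 (1NF): eliminate repeating groups -> one row per puppy/trick pair.
puppy_trick = [{"puppy_number": raw["puppy_number"], **t} for t in raw["tricks"]]

# Step 2 (2NF): eliminate redundant data -> trick_name depends on trick_id
# alone, so it moves to its own table.
trick = {t["trick_id"]: t["trick_name"] for t in raw["tricks"]}
puppy_trick = [{k: v for k, v in r.items() if k != "trick_name"} for r in puppy_trick]

# Step 3 (3NF): eliminate columns not dependent on the key -> kennel
# attributes depend on kennel_code, not on puppy_number.
kennel = {raw["kennel_code"]: (raw["kennel_name"], raw["kennel_location"])}
puppy = {raw["puppy_number"]: (raw["puppy_name"], raw["kennel_code"])}
```

Each step removes one class of dependency problem; steps 4 and 5 split multivalued facts apart in the same spirit.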
Looking at the 5 rules, it is evident that the process is driven by the dependencies (between two concepts, really). It is also evident that there are two classes of dependencies / relationships:
  • Intra-table dependencies, which are the classic functional dependencies of properties that "hang off" the key of some object (type) identified by that key (the color of a product, for example), and
  • Inter-table dependencies, where some object (type) is "pointing" to the key of another object (type), such as an employee working in a department.
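Both classes can be stated as checkable predicates. As a sketch, the following Python function (the function name, table, and column names are all invented for illustration) tests whether an intra-table functional dependency holds in a set of rows:

```python
def holds(rows, determinant, dependent):
    """Check whether the functional dependency determinant -> dependent
    holds in rows (a list of dicts): equal determinant values must never
    map to different dependent values."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in determinant)
        val = tuple(row[c] for c in dependent)
        if seen.setdefault(key, val) != val:
            return False
    return True

products = [
    {"product_id": 1, "color": "red",  "warehouse": "A"},
    {"product_id": 2, "color": "blue", "warehouse": "A"},
    {"product_id": 1, "color": "red",  "warehouse": "B"},
]

holds(products, ["product_id"], ["color"])  # True: color "hangs off" the key
holds(products, ["warehouse"], ["color"])   # False: one warehouse, two colors
```

An inter-table dependency is, correspondingly, a referential check: every "pointing" value (the employee's department, say) must match an existing key in the other table.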
The power of the dependencies is that they drive the structure of the area being looked at.

Normalization by way of Visualization

The normalization approach is an example of what happens when one particular scenario is generalized: the assumption is that table design starts from a long list of fields in no particular order. If, on the other hand, the table is based on a structural analysis (such as an entity-relationship analysis), there should not be any need to normalize. Instead, the structures should be visualized as part of the analysis. You should then have no problem deconstructing a graph-based data model (which is what entity-relationship models are all about) into a table-based data model.

Let us turn to the benefits of working with dependencies, staying in the puppy context.

Things would definitely be easier if the starting point was not a list of fields, but instead a concept map:
[Figure: concept map of the puppy data]
The way we arrive at a concept map like the one above is based on the semantics (meaning) of the linking phrases. Some linking phrases determine a simple dependency (like “has”) on the unique identification of the object it is a property of. Other linking phrases contain action verbs, meaning that they signify inter-object relationships (like puppy can do puppy trick).
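A concept map of this kind is easy to hold as plain data. The sketch below (the edge list and linking phrases are assumed, loosely following the puppy example) separates property dependencies from object-to-object relationships by the semantics of the linking phrase:

```python
# Edges are (source, linking_phrase, target) triples. "has" marks a
# property dependency; action-verb phrases mark inter-object relationships.
edges = [
    ("Puppy",  "has",      "Puppy Name"),
    ("Kennel", "has",      "Kennel Name"),
    ("Kennel", "has",      "Kennel Location"),
    ("Puppy",  "can do",   "Puppy Trick"),
    ("Puppy",  "lives in", "Kennel"),
]

# Properties end up as columns of their object's table; relationships
# end up as keys pointing from one table to another.
properties = [(s, t) for s, p, t in edges if p == "has"]
relationships = [(s, p, t) for s, p, t in edges if p != "has"]
```

With this split in hand, the tables fall out of the graph: one table per object, one column per property, one key per relationship.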

So let’s work through the process described earlier, from the poster:

Eliminate repeating groups?
Not relevant. We already identified them by way of the arrowheads on some of the relationships.

Eliminate redundant data?
Not relevant, since a property belongs to one object only, and is identified by its identity.
We will group the properties into tables, if we want to use tables at all.

Composition from a well-structured starting point is a whole lot easier than decomposition of a semantically-blurred bunch of data.

Eliminate columns not dependent on the key?
Not relevant, for the same reason as above.

Isolate independent multiple relationships?
Easy. We would add a new object, Puppy Costume, with a property named Costume. We would point at it from the Puppy Trick, just as in Marc Rettig’s example. Here it is:
[Figure: concept map with Puppy Costume added (fourth normal form)]
Isolate semantically related multiple relationships?
Again, easy. Follow basically the same logic as above. Just as for entity-relationship modeling, as Marc Rettig comments, our concept map or our logical model will naturally show the relationships. Here it is in fifth normal form (5NF):
[Figure: concept map in fifth normal form]
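Steps 4 and 5 both come down to not mixing independent multivalued facts in one table. A small sketch of the fourth-normal-form split (sample values invented):

```python
# Puppy -> trick and puppy -> costume are independent multivalued facts,
# so they belong in two separate tables keyed by the puppy.
puppy_trick = [(52, "roll over"), (52, "play dead")]
puppy_costume = [(52, "cape")]

# The un-isolated alternative pairs every trick with every costume,
# cross-multiplying facts that have nothing to do with each other.
mixed = [(p, t, c) for (p, t) in puppy_trick for (q, c) in puppy_costume if p == q]
```

Adding a second costume would double `mixed` without adding any information, while the two separate tables grow by one row each. The fifth-normal-form step isolates semantically related relationships by the same logic.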
It is obvious that depicting the dependencies as a directed graph gives a deep understanding of the functional dependency structure.

Given that graph visualization is the way forward (as argued so far in this book), is there anything we can carry forward from normalization?

The answer is simple: identity and uniqueness. These two things contribute hugely to setting the context in precise ways.

Read the Graph Data Modeling book for complete and comprehensive guidelines.