Visualizing the Structure by Visualizing the Dependencies

Structure is the most important ingredient in any data model. One of the major contributions of Dr. Ted Codd’s relational model is its emphasis on functional dependencies. In fact, normalization is driven by the modeler’s desire for relations in which strict functional dependency holds across all the binary relations between the designated primary key and the fields that depend on it.

Functional dependencies are therefore central to the structural description of the information being modeled.

Inter-table dependencies are modeled as relationships; depending on the declarative language (e.g. SQL), they may or may not be named (with a “constraint name”).
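As a minimal sketch of a named inter-table dependency, the following uses Python’s built-in `sqlite3` module with an in-memory database. The table names (`supplier`, `shipment`) and the constraint name `shipment_supplied_by_supplier` are hypothetical. Note that in SQLite the name is kept only in the stored DDL text: `PRAGMA foreign_key_list` reports the columns involved, but not the name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.execute("""
    CREATE TABLE supplier (
        sno  TEXT PRIMARY KEY NOT NULL,
        city TEXT
    )
""")

# Hypothetical named inter-table dependency: the constraint name documents it.
conn.execute("""
    CREATE TABLE shipment (
        sno TEXT NOT NULL,
        pno TEXT NOT NULL,
        qty INTEGER,
        PRIMARY KEY (sno, pno),
        CONSTRAINT shipment_supplied_by_supplier
            FOREIGN KEY (sno) REFERENCES supplier (sno)
    )
""")

# The constraint name survives only in the stored DDL text.
ddl = conn.execute(
    "SELECT sql FROM sqlite_master WHERE name = 'shipment'"
).fetchone()[0]
print("shipment_supplied_by_supplier" in ddl)

# The catalog lists the FK columns (table, from, to), but not the name.
fks = conn.execute("PRAGMA foreign_key_list(shipment)").fetchall()
print(fks)

# The dependency is enforced: a shipment for an unknown supplier is rejected.
try:
    conn.execute("INSERT INTO shipment VALUES ('S9', 'P1', 100)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```

The intra-table dependency `sno → city` in `supplier` has no counterpart in the catalog at all, which is the point of the next paragraph.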

Intra-table dependencies, on the other hand, are not explicitly represented in the relational model. The only indication of their presence is the logical co-location of the fields in the table, which the data modeler guarantees to be functionally dependent on the primary key, and on that alone. No arrows, no names, no nothing.

Since dependencies between data are key to creating a correct data model, we need to bring them out into the bright sun through visualization. They must be made clearly explicit, like the pedagogical functional dependency graph in the following diagram.

Unfortunately, few people work with functional dependencies at the visual level the way Bob Dewhurst (of Charles Darwin University) does in this diagram:
[Diagram: functional dependency graph by Bob Dewhurst, Charles Darwin University]
Beyond functional dependencies, the other benefits of normalization include avoiding redundancy and update anomalies. Date’s famous supplier-parts example illustrates this:
[Diagram: Date’s supplier-parts example relation]
This "relation" is in "second normal form", which means that it may still contain redundancy (of city, for example). It also suffers from an update anomaly: it is impossible to register a city before you have a supplier in that city. City and Supplier are more independent than dependent; Supplier No (SNO) is not an identifying key for cities. There is, of course, a relationship between Supplier and City, but keeping them in the same relation is not a good idea, because it hides important structural information (namely, that very relationship).
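The redundancy and the anomaly can be reproduced in a few lines. This is a sketch only: it assumes a table named `second` with columns `sno`, `status` and `city` in the spirit of Date’s example, and the sample values are made up for illustration. The explicit `NOT NULL` is needed because SQLite otherwise tolerates NULLs in a non-integer primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical relation: sno is the key, city depends on sno,
# and status depends on city rather than directly on sno.
conn.execute("""
    CREATE TABLE second (
        sno    TEXT PRIMARY KEY NOT NULL,
        status INTEGER,
        city   TEXT
    )
""")
conn.executemany(
    "INSERT INTO second VALUES (?, ?, ?)",
    [("S1", 20, "London"), ("S2", 10, "Paris"),
     ("S3", 10, "Paris"), ("S4", 20, "London")],
)

# Redundancy: the fact "London has status 20" is stored once per London supplier.
london_rows = conn.execute(
    "SELECT COUNT(*) FROM second WHERE city = 'London'"
).fetchone()[0]
print(london_rows)  # 2

# Insertion anomaly: a city (with its status) cannot be registered
# without a supplier, because the key column may not be NULL.
try:
    conn.execute("INSERT INTO second VALUES (NULL, 30, 'Athens')")
    anomaly = False
except sqlite3.IntegrityError:
    anomaly = True
print(anomaly)  # True
```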

This relation is ill-defined because not all of its fields are functionally dependent on the same key. What the cookbook recipe of normalization does, then, is identify which relations (tables) are not well-formed, according to the mathematics behind the paradigm Dr. Ted Codd used in his relational model.
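One way to spot fields that do not depend on the key alone is to test candidate functional dependencies against sample rows. The sketch below is hypothetical (`fd_holds` and the data are invented for illustration), and it carries an important caveat: sample data can refute a functional dependency but never prove one, because dependencies are rules about the business, not about a particular set of rows.

```python
from typing import Dict, List, Sequence

Row = Dict[str, object]


def fd_holds(rows: List[Row], lhs: Sequence[str], rhs: str) -> bool:
    """Return False if two rows agree on lhs but differ on rhs (FD refuted)."""
    seen: Dict[tuple, object] = {}
    for row in rows:
        determinant = tuple(row[a] for a in lhs)
        if determinant in seen and seen[determinant] != row[rhs]:
            return False
        seen[determinant] = row[rhs]
    return True


# Hypothetical sample rows in the style of Date's relation; sno is the key.
second: List[Row] = [
    {"sno": "S1", "status": 20, "city": "London"},
    {"sno": "S2", "status": 10, "city": "Paris"},
    {"sno": "S3", "status": 10, "city": "Paris"},
    {"sno": "S4", "status": 20, "city": "London"},
    {"sno": "S5", "status": 20, "city": "Rome"},
]
non_key = ["status", "city"]

# A dependency between two non-key fields means some field does not
# depend on the key alone, i.e. the relation is not well-formed.
suspects = [
    (a, b)
    for a in non_key
    for b in non_key
    if a != b and fd_holds(second, [a], b)
]
print(suspects)  # [('city', 'status')]
```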

Since the co-location exists only at the visual level (in the data model), and since the physical model may be more granular than the tabular model (e.g. in a graph database), normalization is losing importance as a vehicle for eliminating physical redundancies and anomalies. However, normalization was also designed to help identify the inherent structures, and that remains a key obligation of the data modeler.

Detecting dependencies is still very important. One advantage of graph databases is that they make dependencies very visible. Here is a simple directed graph representation of the ill-formed relation above:
[Diagram: directed graph of the ill-formed supplier-parts relation]
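The directed graph can also be handled programmatically. The sketch below assumes the dependencies commonly stated for Date’s relation (SNO → CITY, SNO → STATUS, CITY → STATUS), which may not match the diagram above exactly. It computes the attribute closure of the key (the standard textbook algorithm: keep following edges until nothing new is reached) and flags attributes that hang off a non-key attribute rather than directly off the key.

```python
from typing import FrozenSet, List, Set, Tuple

# One edge of the dependency graph: determinant attribute set -> dependent attribute.
FD = Tuple[FrozenSet[str], str]


def closure(attrs: Set[str], fds: List[FD]) -> Set[str]:
    """Attribute closure: every attribute reachable from attrs along FD edges."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and rhs not in result:
                result.add(rhs)
                changed = True
    return result


def transitive_dependents(key: Set[str], fds: List[FD]) -> Set[str]:
    """Non-key attributes reached via another non-key attribute (key -> X -> A)."""
    non_key = closure(key, fds) - key
    found: Set[str] = set()
    for lhs, rhs in fds:
        if (
            rhs in non_key
            and lhs <= non_key                      # determinant has no key attribute
            and rhs not in lhs
            and not key <= closure(set(lhs), fds)   # determinant is not itself a key
        ):
            found.add(rhs)
    return found


# Assumed dependencies for Date's relation (the edges of the directed graph).
fds: List[FD] = [
    (frozenset({"sno"}), "city"),
    (frozenset({"sno"}), "status"),
    (frozenset({"city"}), "status"),
]

print(sorted(closure({"sno"}, fds)))        # ['city', 'sno', 'status']
print(transitive_dependents({"sno"}, fds))  # {'status'}
```

STATUS is reachable from the key, but only by way of CITY; that detour is exactly the hidden structure the graph makes visible.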
Loosely speaking, the term "relation" implies that the relation is a set of binary relations (each pairing the key with one column), and if all of those binary relations are functional dependencies on the same key, the relation as a whole is good to go. Such a relation will be stamped as properly normalized. Many people think “relationship” when they hear the term “relation.” In terms of everyday semantics, that is perfectly right.

For data modelers, though, structure rules the world. And structure is not handled well, at least not visually, in the classic relational model.

Relationships are second-class citizens in the relational model. They are normally not named at all in visualizations, and functional dependencies (i.e. the sources of meaning / semantics) are neither visible nor named. This applies to both inter-table and intra-table dependencies.
[Diagram: concept map example]
Concept maps, such as the example above, communicate information structure extremely efficiently, and they’re easy to produce.

Notice the named relationships, which include the functional dependencies (for example, between employee and salary), and notice how you can read the whole thing as short sentences. Imagine the same thing expressed as text: the concept map communicates far more quickly. But there is a drawback. Although concept maps are more succinct than text-based user stories, they can still quickly become daunting in size. Can we find a more compact representation?
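Because every edge in a concept map is named and directed, it reads as a sentence. A minimal sketch, assuming a concept map stored as (source, relationship, target) triples; the `Link` type and the example concepts are hypothetical.

```python
from typing import List, NamedTuple


class Link(NamedTuple):
    source: str
    relationship: str  # the named relationship, read from source to target
    target: str


# Hypothetical fragment of a concept map.
concept_map: List[Link] = [
    Link("Employee", "earns", "Salary"),
    Link("Employee", "works in", "Department"),
    Link("Department", "is managed by", "Manager"),
]


def as_sentences(links: List[Link]) -> List[str]:
    """Read each named, directed edge as a short sentence."""
    return [f"{link.source} {link.relationship} {link.target}." for link in links]


for sentence in as_sentences(concept_map):
    print(sentence)
```

The same triples also show the drawback: every additional concept adds edges, and the list grows quickly.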

Given that structure is important, why, then, do the classic data modeling practices all deliver diagrams full of content, much like the model below, at the “logical” level?

The diagram below is filled with detail:
  • Tables and their names
  • All of the fields and their data types
  • Primary keys and foreign keys (named by their constraint names)
  • Relationships as “crow’s feet” with either dashed lines or full lines, signaling “mandatory” or “optional”.

The needed detail is present, but most of the space is used for listing table content (i.e. field lists):
[Diagram: logical-level data model with tables, fields, keys and crow’s-foot relationships]
Property graphs are directed graphs, just like concept maps, and they offer an elegant visualization of structure, as in this example:
[Diagram: property graph example]
Not all of the details about the data model are found in the diagram above, but we really do not need to have all details on all diagrams, always.

The properties of a business object type are, by definition, all at the same “location”. It follows that the business object types are the “landmarks” of the data model. One size does not fit all, but property graphs come close to being the perfect candidate for a database-agnostic representation of data models.
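To make the “landmark” idea concrete, here is a small, hypothetical in-memory representation of a property graph data model: node types carry their properties, and relationships are named and directed. The class names and the Cypher-like text notation used for the overview are illustrative assumptions, not a real database API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class NodeType:
    name: str
    properties: Dict[str, str] = field(default_factory=dict)  # property -> data type


@dataclass
class RelType:
    name: str  # relationships are named...
    source: str  # ...and directed
    target: str
    properties: Dict[str, str] = field(default_factory=dict)


@dataclass
class GraphModel:
    nodes: Dict[str, NodeType] = field(default_factory=dict)
    rels: List[RelType] = field(default_factory=list)

    def add_node(self, type_name: str, **properties: str) -> None:
        self.nodes[type_name] = NodeType(type_name, dict(properties))

    def relate(self, source: str, rel_type: str, target: str, **properties: str) -> None:
        # Relationships may only connect node types that already exist.
        for node in (source, target):
            if node not in self.nodes:
                raise KeyError(f"unknown node type: {node}")
        self.rels.append(RelType(rel_type, source, target, dict(properties)))

    def overview(self) -> List[str]:
        """Landmark view: node types and named relationships, no property lists."""
        return [f"({r.source})-[:{r.name}]->({r.target})" for r in self.rels]

    def detail(self, node: str) -> str:
        """Property-level view of a single node type."""
        props = ", ".join(f"{p}: {t}" for p, t in self.nodes[node].properties.items())
        return f"{node} {{{props}}}"


# Hypothetical model inspired by the supplier-parts example.
model = GraphModel()
model.add_node("Supplier", sno="string", name="string")
model.add_node("City", name="string", status="integer")
model.add_node("Part", pno="string", color="string")
model.relate("Supplier", "LOCATED_IN", "City")
model.relate("Supplier", "SUPPLIES", "Part", qty="integer")

for line in model.overview():
    print(line)
print(model.detail("City"))
```

The overview shows only the landmarks and their named relationships; the property lists stay available on demand. Modeling City as its own node type, rather than as a column next to Supplier, makes the supplier-city relationship from Date’s example visible and named.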

Property graphs are similar to concept maps in that there is no normative style (unlike UML, which has one). So feel free to find your own style. If you communicate well with your readers, you have accomplished what is necessary.

Here is a more elaborate description of property graphs from the perspective of data modeling.
You may follow the sequence or explore the site as you wish:

You could also take a look at the book about Graph Data Modeling:
[Image: Graph Data Modeling book]