Transform, Optimize and Deploy
1) business requirements (user stories) and
2) competency in physical modeling (for the target of your choice).
At this point in the process, you’ll likely already be quite familiar with the business requirements at hand; they’ve been relevant in every step. We cannot help you with the second requirement. There exist plenty of other good sources, closer to the vendors, which can help you with that.
Here we will examine how visual representations of graph solution data models can help us determine, which transformations we will need.
We’ll also get useful mapping information on the business level, if we visualize the selected transformations.
A good example of the synergy between the three modeling levels is the matter of hierarchies. These exist on the business level, and can be seen in concept maps and in solution data models. However, they tend to disappear (because of denormalization) at the physical level.
The type of physical model you end up using in a given solution can have some effects on requirements for detailed preservation of history (e.g. Type 2 level of versioning as it appears in data warehouses and other places). If you have those kinds of requirements, please make sure your choice of physical data store is compatible with them.
Also remember that most of these products do not “join” in the SQL sense.
Document and graph databases do offer good possibilities for relating things; otherwise, you will have to rely on applications to perform joins. This will influence your design; you should try to cater to frequent query patterns in the design of the physical data model. Identifying these is what the user stories are for. Go back to your concept maps and consult the business level relationships/dependencies before you get too deep into the physical design.
Hierachical Aggregation Schemes
Column-wise, they include either just one value column (which may become very large) or multiple value columns.
These are called many names: “column family databases,” “big tables,” “wide tables,” or “long row designs”. Essentially, the column-wise physical models are applications of the repeating group paradigm, such that every row contains column(s) for each occurrence of the repeating data. Consider, for example, the multiple phone number example above.
Denormalization will have to cater to that. For example, sketching an aggregate structure (in the domain-driven design paradigm) could look like this:
Aggregates essentially deliver hierarchies for each relevant sub-tree of the solution data model. Note that aggregation might imply denormalization as well; the result will always have redundant data. Consider this when designing your model. If the redundant data is processed at different stages of a lifecycle, it might work well.
Many document stores offer flexibility of choice between embedding or referencing. Referencing allows you to avoid redundancy, but at the cost of having to retrieve many documents. In fact, some products go to great lengths to deliver graph-style capabilities, both in the semantic “RDF triplets” style and in the
general “directed graph” style.
In this manner, what your physical design will likely end up with is a set of related collections that reference each other; these could be arranged in many-to-many relationships, if need be.
Property Graph Databases
Look at the examples in section 4.1, and later in chapter 5. If you want more “graph-ish” modeling tips, you can look into this (free) book from Neo4j: http://bit.ly/29VFvPZ.
Property graphs are quite rich in their expressiveness, and allow for example properties on relationships, which should be used for describing some important feature(s) of the relationship like strength, weight, proportion, or quality.
Relationships are essentially joins between nodes, with one big difference. A relationship called “produces” may well point to different types of business objects, like “part” and “waste.” This is both a strength and a weakness. It’s positive because it reflects the way people think about reality. It’s negative because people might get confused, if the semantics are unclear. These considerations are very good reasons to define a solution data model, even if your physical data model is going to be based on a “flexible” or even non-existing schema.
Much of the flexibility of the graph model can be used to handle specialization on the fly. That is fine in most cases, but specializations hang off more precisely defined structures. And the precisely defined structures is what you have in your solution data model.
In graph databases you can use ordered, linked lists.
Here is an example of establishing purchasing history for customers by way of using a “next order” relationship:
In this manner time-series data can be handled elegantly in graphical representations using next / prior linking.
If you find yourself working extensively with these applications that work well with graph databases, you can bite the bullet and go “all graphs.” Alternatively, you can architect a solution that is part-graph and part-something else (SQL, for example).
Differences between Relational and Graph databases like Neo4j
- What goes into a Node, a Relationship and a Property? Sometimes a property, for example, is better modeled as a node and vice versa
- An upfront schema not absolutely necessary, but knowing what you work with, is …
- Incremental extensions of the model
- Early testing on realistic data (but possibly not too many…)
- Frequently designed for specific queries / analytics (user stories)
- Labels are fast, properties can be indexed, but…
- Named relationships!
- Multiple labels, but not too many
- Join tables in relational become relationships (even M:M, but …)
- Lists (like a repeating group in relational)
SQL as Target Model
In the little employee-department data model from the seventies (Peter Chen), the transformations look like this. First, the solution data model: