Legacy Data Models can be Recycled into Graph Data Models

Why waste good legacy data models just because you move to NoSQL? Legions of data modelers have spent countless hours producing (mostly) rather good representations of business contexts, also known as data models.

I tried to get an overview of the data modeling tools market (see the addendum below) and found a list of 77 ERD-supporting tools. On top of that, Wikipedia has a list of 49 UML tools, many of which are also used for data modeling. The history of these tools goes back to the mid-1980s and the CASE tools (see also below) that ran on the emerging IBM PC/AT computers.

There must be hundreds of thousands of good, reusable data models! Why waste such a large resource of business metadata?

Much as in data science, we need a "Metadata Science" that lets us read, transform, scope, reduce, enhance and adapt existing data models to modern database technologies. Not least graph databases, if you ask me.

This book contains all you need to do this - in 5 different contexts:
[Image: the book]
Recycle, Reuse and Reduce also apply to data models!

Why waste time remodeling the same data just because you change platform? Explore how to auto-generate graph data models (for Neo4j) from legacy data models in UML, XML, ERD, concept maps and other formats. The book also includes a design for a metadata repository that gives you full-scale control.

The graph below is a graph data model transformed from the old StarUML XPD representation:
[Figure: graph data model transformed from the old StarUML XPD representation]

Contents of the Book

I started looking at things this way because of a recent client situation. The challenge was building a graph database based on data, which:
  • Resided in Oracle®
  • Were modeled as UML® class diagrams, and
  • Were also available in JSON format.
The target was a graph database, and I started looking at metadata transformation using Neo4j®. One of the nice things about graph technology is that you can evolve the data model as you go. Furthermore, you have a very powerful, declarative language with many useful procedures and functions. It is called Cypher®, and it is becoming the SQL of graph databases.

To make a long story short: we built (or rather, mostly generated) a graph model by extracting metadata, mostly from an XMI® representation of the UML® model, but also from some JSON meta files. We used only Cypher® scripts, and it was a whole lot easier than we thought it would be. This saved us a lot of time and gave us room to think carefully about the scope of the forthcoming graph data model.
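
To give a feel for the kind of metadata extraction described above, here is a minimal sketch in Python. It is not taken from the book, which uses Cypher scripts only. The XML fragment is a hypothetical, heavily simplified stand-in for an XMI export (real XMI uses namespaces and tool-specific structure), and the labels Class and Property, the relationship types HAS_PROPERTY and RELATES_TO, and the function name xmi_to_cypher are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for an XMI export of a UML class model.
SAMPLE_XMI = """<model>
  <class id="c1" name="Customer">
    <attribute name="customerId" type="Integer"/>
    <attribute name="name" type="String"/>
  </class>
  <class id="c2" name="Order">
    <attribute name="orderNo" type="Integer"/>
  </class>
  <association name="PLACES" source="c1" target="c2"/>
</model>"""


def xmi_to_cypher(xml_text):
    """Return a list of Cypher statements describing the model as a graph."""
    root = ET.fromstring(xml_text)
    class_names = {}  # model id -> class name, used to resolve association ends
    statements = []

    # One node per class, plus one property node per attribute.
    for cls in root.findall("class"):
        cname = cls.get("name")
        class_names[cls.get("id")] = cname
        statements.append(f"MERGE (:Class {{name: '{cname}'}})")
        for attr in cls.findall("attribute"):
            aname, atype = attr.get("name"), attr.get("type")
            statements.append(
                f"MATCH (c:Class {{name: '{cname}'}}) "
                f"MERGE (c)-[:HAS_PROPERTY]->"
                f"(:Property {{name: '{aname}', datatype: '{atype}'}})"
            )

    # One relationship per association between two classes.
    for assoc in root.findall("association"):
        src = class_names[assoc.get("source")]
        tgt = class_names[assoc.get("target")]
        statements.append(
            f"MATCH (a:Class {{name: '{src}'}}), (b:Class {{name: '{tgt}'}}) "
            f"MERGE (a)-[:RELATES_TO {{name: '{assoc.get('name')}'}}]->(b)"
        )
    return statements


if __name__ == "__main__":
    for statement in xmi_to_cypher(SAMPLE_XMI):
        print(statement)
```

The generated statements could then be run against Neo4j. The sketch builds them by string interpolation only to keep it short; with real model data, query parameters would be the safer choice.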

This book explains how to do that. Cypher® scripts are included for Concept Maps (CmapTools®), CSDL, XML Schemas, StarUML v1 and UML® via XMI®. More to follow.

It also explains how to build a simple graph-based metadata repository for:
  • Business level concept models
  • Solution level logical data models, and
  • Physical models.
The repository has full lineage back to the source data model, and it supports identities, uniqueness, mandatory fields and basic datatypes.
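
As a rough illustration of what such repository metadata might look like, the Python sketch below records datatype, mandatory and uniqueness flags and a lineage note for each property, and derives Neo4j constraint statements from them. The record layout, the field names and the function constraint_statements are hypothetical and are not the book's repository design. The statements follow the CREATE CONSTRAINT ... FOR ... REQUIRE form used by recent Neo4j versions; as far as I know, property existence (IS NOT NULL) constraints require the Enterprise Edition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PropertyMeta:
    """Hypothetical repository record for one property of a node label."""
    label: str               # node label in the physical graph model
    name: str                # property name
    datatype: str            # basic datatype, e.g. "String" or "Integer"
    mandatory: bool = False  # must every node carry this property?
    unique: bool = False     # must values be unique per label?
    source: str = ""         # lineage: origin in the source data model


def constraint_statements(props):
    """Derive Cypher constraint statements from the repository records."""
    stmts = []
    for p in props:
        if p.unique:
            stmts.append(
                f"CREATE CONSTRAINT IF NOT EXISTS FOR (n:{p.label}) "
                f"REQUIRE n.{p.name} IS UNIQUE"
            )
        if p.mandatory:
            stmts.append(
                f"CREATE CONSTRAINT IF NOT EXISTS FOR (n:{p.label}) "
                f"REQUIRE n.{p.name} IS NOT NULL"
            )
    return stmts


# Illustrative records only; the names are made up for this example.
EXAMPLE_PROPS = [
    PropertyMeta("Customer", "customerId", "Integer", mandatory=True,
                 unique=True, source="UML class Customer, attribute customerId"),
    PropertyMeta("Customer", "name", "String",
                 source="UML class Customer, attribute name"),
]

if __name__ == "__main__":
    for statement in constraint_statements(EXAMPLE_PROPS):
        print(statement)
```

Keeping the lineage note on every record is what lets a generated constraint be traced back to the element of the source model it came from.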

Cypher scripts for repository handling are also part of the book (under an MIT license).

We suggest a choice of two approaches:
  • Fast Track Data Models (agile)
  • Super Data Models (crafted, using the repository)
See the table of contents at Leanpub.

Short recap of the evolution of data modeling tools:

CASE tools on Wikipedia: https://en.wikipedia.org/wiki/Computer-aided_software_engineering. With the advent of the IBM PC/AT, a first wave of tools came about: Excelerator from Index Technology, Knowledgeware, Texas Instruments' CA Gen and Andersen Consulting's FOUNDATION toolset (DESIGN/1, INSTALL/1, FCP). Many of the leaders of the CASE market of the early 1990s ended up being purchased by Computer Associates, including IEW, IEF, ADW, Cayenne, and Learmonth & Burchett Management Systems (LBMS).

Database Star (https://www.databasestar.com/data-modeling-tools/) lists 77 data modeling tools supporting ER diagrams! Some of them also support UML. The first pure data modeling tool was PowerDesigner (1989) from Sybase (according to Wikipedia).

What about UML tools, then? They are also used for data modeling. Wikipedia lists (https://en.wikipedia.org/wiki/List_of_Unified_Modeling_Language_tools) 49 UML tools. According to that list, the first one was, once again, PowerDesigner (1989) from Sybase.

Actually, PowerDesigner started out in 1989 as AMC*Designor from a French company called SDP (cf. https://www.powerdesigner.biz/EN/powerdesigner/powerdesigner-history.html). The following year the English version, S-Designor, came out, and in 1995 SDP was bought by PowerSoft and the product name changed to PowerDesigner. Two years later Sybase bought the company.

Want to know more about graph data modeling in general? Read the book:
[Image: the book]