Evolution of Databases

Data modeling and databases evolved together, and their history dates back to the 1960’s.

The database evolution happened in four “waves”:
• The first wave consisted of network, hierarchical, inverted list, and (in the 1990’s) object-oriented DBMSs; it took place from roughly 1960 to 1999.
• The relational wave introduced all of the SQL products (and a few non-SQL) around 1990 and began to lose users around 2008.
• The decision support wave introduced Online Analytical Processing (OLAP) and specialized DBMSs around 1990, and is still in full force today.
• The NoSQL wave includes big data, graphs, and much more; it began in 2008.

Evolution of Data Modeling

If we summarize the progress of data modeling in terms of the related Turing Awards, we get this picture:
  • 1973: Charles Bachman with “The Programmer as Navigator”
  • 1981: E. F. (Ted) Codd with “Relational Database: A Practical Foundation for Productivity”
  • 2001: Ole-Johan Dahl and Kristen Nygaard for ideas fundamental to the emergence of object-oriented programming
  • 2014: Michael Stonebraker with “The Land Sharks are on the Squawk Box”
In hindsight, data modeling was invented “on the fly.” The results were diverse and sometimes worked at cross purposes. Now, however, pragmatism has come to the rescue; there is a great opportunity to consolidate the best methods of modeling data.

Let us look back at the history of databases in a slightly different manner.
There was a lot of ground to cover for the pioneers of Database Management Systems, and they did a fine job. The first twenty to twenty-five years introduced and fine-tuned important technological fundamentals.
The relational proponents were struggling in the 1980’s, with two major issues:
  • Complexity of data modeling (“normalization …”) and of SQL
  • Performance.
Even in its heyday, there were quite a few analysts who were not 100% sold on the omnipotence of the relational model. Yes, there were some good intentions and some good ideas. However, even to this day some serious criticisms of the relational model persist:
  • Recently, a mathematical incompleteness claim has come from a company called Algebraix Data. They claim that the relational model as defined by Dr. Codd is not really a consistent model, since it cannot support sets of sets (a small sketch after this list illustrates the point).
  • Other criticisms accused SQL of not being a “well-formed” and complete (in the mathematical sense) computer language.
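To make the “sets of sets” point concrete, here is a minimal sketch in plain Python (the data and table shape are invented for illustration; no particular DBMS is implied): a nested set is an ordinary value in a program, whereas a First Normal Form relation holds only atomic values, so the nesting must be flattened into extra rows and reassembled at query time.

```python
from collections import defaultdict

# A "set of sets" is an ordinary value in a general-purpose language:
course_groups = {
    frozenset({"Alice", "Bob"}),   # study group 1
    frozenset({"Carol"}),          # study group 2
}

# A First Normal Form relation holds only atomic values, so the same
# information must be flattened into rows keyed by a surrogate group id
# (hypothetical table shape, not tied to any particular product):
group_membership = [
    (1, "Alice"),
    (1, "Bob"),
    (2, "Carol"),
]

# Reconstructing the nested value requires grouping at query time:
rebuilt = defaultdict(set)
for group_id, member in group_membership:
    rebuilt[group_id].add(member)

assert {frozenset(members) for members in rebuilt.values()} == course_groups
```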
What really turned relational DBMSs into reliable, performant, scalable production tools was the advent of robust query optimizers. In 1979, Patricia Selinger of the IBM research center in San Jose described the optimizer of the IBM System R (a relational prototype system). Optimizer technologies matured during the 1980’s and established the “relational empire” around 1990.
1990 marks the start of the “relational empire” because by then the cost-based query optimizers had reached sufficient sophistication to allow the RDBMS products to take over most of the database processing across most industries.
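As a rough sketch of what a cost-based optimizer does, in the spirit of Selinger-style access-path selection (the cost formulas and statistics below are invented for the illustration and are not the actual System R model): the optimizer estimates the cost of each candidate access path and picks the cheapest one.

```python
# Minimal sketch of cost-based access-path selection. All numbers and
# formulas here are illustrative assumptions, not a real optimizer's model.

def full_scan_cost(table_pages: int) -> float:
    """Read every page of the table."""
    return float(table_pages)

def index_scan_cost(table_rows: int, index_pages: int, selectivity: float) -> float:
    """Descend part of the index, then one random page fetch per matching row."""
    return index_pages * selectivity + table_rows * selectivity

def choose_access_path(table_pages: int, table_rows: int,
                       index_pages: int, selectivity: float) -> str:
    """Estimate the cost of each candidate plan and return the cheapest."""
    candidates = {
        "full table scan": full_scan_cost(table_pages),
        "index scan": index_scan_cost(table_rows, index_pages, selectivity),
    }
    return min(candidates, key=candidates.get)

# A selective predicate favours the index; a non-selective one favours the scan.
print(choose_access_path(table_pages=10_000, table_rows=1_000_000,
                         index_pages=50, selectivity=0.001))  # index scan
print(choose_access_path(table_pages=10_000, table_rows=1_000_000,
                         index_pages=50, selectivity=0.9))    # full table scan
```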
Not much new relational technology was published through the 1990’s and early 2000’s. In fact, entrepreneurs (in California, mostly) were busy developing alternatives to the relational approach. Quite a few of the new companies and products were focused on specialized niches such as documents, graphs, semantics, and high-volume applications.
Today, vendors unite under the NoSQL / Big Data brand. In one white paper, a non-relational vendor (MarkLogic) very succinctly complained about relational databases: “Relational database vendors are still offering users a 1990’s-era product using code written in the 1980’s, designed to solve the data problems of the 1970’s, with an idea that came around in the 1960’s.”
Around 2008, triggered by Facebook’s open-source releases of Hive and Cassandra, the NoSQL counter-revolution started. This space gets all of the attention today.

2008 was indeed a turning point. This can also be seen in the reports from the very influential summits of database researchers, held in 1989, 1990, 1995, 1996, 1998, 2003, 2008, and 2013. In 2008, big data was the number one factor behind a “sense of change” (The Claremont Report on Database Research, downloaded from http://bit.ly/2abBidh on 2016-02-27).

So, where do we go now? How do we balance the “what” and the “how” in light of NoSQL and all of the new technologies?
Well, modern development platforms use schema-free or semi-structured approaches (also under the umbrella of NoSQL). “Model as you go” is a common theme, while data modelers and data governors are seen as relics of the past. Surveys (e.g. Insights into Modeling NoSQL, A Dataversity Report 2015, by Dr. Vladimir Bacvanski and Charles Roe) confirm this: modeling for NoSQL is very often performed by the developer on the fly.
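As a small illustration of the “model as you go” style, here is a hypothetical sketch using plain Python dictionaries to stand in for documents in a schema-free collection (no particular NoSQL product is implied): records of different shapes are accepted as they arrive, and the de facto schema only emerges when someone inspects the data afterwards.

```python
# "Model as you go": no schema is declared up front; each document simply
# carries whatever fields the developer needed that day.
customers = []  # stands in for a schema-free collection

customers.append({"id": 1, "name": "Ada", "email": "ada@example.com"})
customers.append({"id": 2, "name": "Bo", "phones": ["555-0101", "555-0102"]})
customers.append({"id": 3, "fullName": "Cy Moore", "segment": "enterprise"})

# The schema is only discovered after the fact, by inspecting the data;
# this is exactly the data-governance concern raised above.
field_usage = {}
for doc in customers:
    for field in doc:
        field_usage[field] = field_usage.get(field, 0) + 1

print(field_usage)
# {'id': 3, 'name': 2, 'email': 1, 'phones': 1, 'fullName': 1, 'segment': 1}
```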

Database veteran David McGoveran notes this important distinction: the IT programmer’s perspective is that of rapid solution delivery, whereas the data modeler’s perspective is that of working with a long-term asset (http://bit.ly/29PpWIJ).

How then can we choose what to carry over from the data modeling legacy, if anything at all? The answer is found in the Graph Data Modeling Book.
See the complete story in this book.