What Goes Around Comes Around

Reference: Stonebraker, Michael, and Joey Hellerstein. "What goes around comes around." Readings in Database Systems 4 (2005).

This is one of the papers on the suggested reading list for the Background chapter of Readings in Database Systems, 5th Edition.

0. Abstract

This paper provides a summary of 35 years of data model proposals, grouped into 9 different eras. We discuss the proposals of each era, and show that there are only a few basic data modeling ideas, and most have been around a long time. Later proposals inevitably bear a strong resemblance to certain earlier proposals. Hence, it is a worthwhile exercise to study previous proposals.

In addition, we present the lessons learned from the exploration of the proposals in each era. Most current researchers were not around for many of the previous eras, and have limited (if any) understanding of what was previously learned. There is an old adage that he who does not understand history is condemned to repeat it. By presenting “ancient history”, we hope to allow future researchers to avoid replaying history.

Unfortunately, the main proposal in the current XML era bears a striking resemblance to the CODASYL proposal from the early 1970’s, which failed because of its complexity. Hence, the current era is replaying history, and “what goes around comes around”. Hopefully the next era will be smarter.

1. Introduction

  • The purpose of this paper is to summarize 35 years worth of “progress” and point out what should be learned from this lengthy exercise.

2. IMS Era

  • Hierarchical
  • Two common undesirable properties: (1) information is repeated; (2) existence depends on parents.
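
The two properties can be illustrated with a minimal Python sketch. It assumes a supplier/part dataset in which Supplier is the root record type and Part is its child. The records, field names, and helper functions are hypothetical and chosen only for illustration; this is not IMS syntax.

```python
# Hypothetical hierarchy: each supplier (root) owns its part records (children).
suppliers = [
    {"sno": 16, "sname": "General Supply", "parts": [
        {"pno": 27, "pname": "Power saw", "price": 100},
    ]},
    {"sno": 24, "sname": "Special Supply", "parts": [
        {"pno": 27, "pname": "Power saw", "price": 120},
        {"pno": 42, "pname": "Bolts", "price": 5},
    ]},
]


def copies_of_part(tree, pno):
    """Count how many times the description of one part is stored."""
    return sum(1 for s in tree for p in s["parts"] if p["pno"] == pno)


def delete_supplier(tree, sno):
    """Deleting a parent removes every child stored under it."""
    return [s for s in tree if s["sno"] != sno]


def known_parts(tree):
    """Part numbers that still exist somewhere in the hierarchy."""
    return {p["pno"] for s in tree for p in s["parts"]}


# (1) Repetition: part 27 is stored once per supplier that supplies it.
print(copies_of_part(suppliers, 27))

# (2) Existence dependency: part 42 disappears along with its only supplier.
print(known_parts(delete_supplier(suppliers, 24)))
```

In this sketch the name of part 27 has to be kept consistent in two places, and there is no way to record a part that currently has no supplier.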

Lessons Learned

  • Physical and logical data independence are highly desirable
  • Tree structured data models are very restrictive
  • It is a challenge to provide sophisticated logical reorganizations of tree structured data
  • A record-at-a-time user interface forces the programmer to do manual query optimization, and this is often hard

3. CODASYL Era

  • Directed graph

Lessons Learned

  • Directed graphs are more flexible than hierarchies but more complex
  • Loading and recovering directed graphs is more complex than hierarchies
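
As a rough illustration of the first lesson, the sketch below models the same kind of supplier/part data as a directed graph in Python. A Supply record is linked to two owners, so a part is stored once and can exist without a supplier, but every query becomes pointer navigation written by the programmer. The class names, field names, and the `connect` helper are hypothetical; this is not CODASYL DDL or DML.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Supplier:
    sno: int
    sname: str
    supplies: list = field(default_factory=list)  # Supply records owned by this supplier


@dataclass(eq=False)
class Part:
    pno: int
    pname: str
    supplied_by: list = field(default_factory=list)  # Supply records owned by this part


@dataclass(eq=False)
class Supply:
    supplier: Supplier
    part: Part
    price: int


def connect(supplier, part, price):
    """Create a Supply record and link it to both of its owners."""
    supply = Supply(supplier, part, price)
    supplier.supplies.append(supply)
    part.supplied_by.append(supply)
    return supply


def parts_of(supplier):
    """Navigate supplier -> supply -> part."""
    return [s.part.pname for s in supplier.supplies]


def suppliers_of(part):
    """Navigate part -> supply -> supplier."""
    return [s.supplier.sname for s in part.supplied_by]


general = Supplier(16, "General Supply")
special = Supplier(24, "Special Supply")
saw = Part(27, "Power saw")
bolts = Part(42, "Bolts")
washers = Part(43, "Washers")  # a part with no supplier can now exist

connect(general, saw, 100)
connect(special, saw, 120)
connect(special, bolts, 5)

print(suppliers_of(saw))
print(parts_of(special))
```

The extra bookkeeping in `connect` hints at the second lesson: every link has to be created and maintained in both directions, which makes loading and recovery harder than with a simple tree.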

4. Relational Era

  • Proposer: Ted Codd

Lessons Learned

  • Set-at-a-time languages are good, regardless of the data model, since they offer much improved physical data independence.
  • Logical data independence is easier with a simple data model than with a complex one.
  • Technical debates are usually settled by the elephants of the marketplace, and often for reasons that have little to do with the technology.
  • Query optimizers can beat all but the best record-at-a-time DBMS application programmers.
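
The contrast between record-at-a-time and set-at-a-time access can be sketched with Python's built-in `sqlite3` module. SQLite is used here only as a convenient stand-in for a relational engine; the tables, data, and function names are hypothetical. Both functions answer the same question: which suppliers supply a red part?

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE supplier (sno INTEGER PRIMARY KEY, sname TEXT);
    CREATE TABLE part (pno INTEGER PRIMARY KEY, pname TEXT, color TEXT);
    CREATE TABLE supply (sno INTEGER, pno INTEGER, price INTEGER);
    INSERT INTO supplier VALUES (16, 'General Supply'), (24, 'Special Supply');
    INSERT INTO part VALUES (27, 'Power saw', 'Black'), (42, 'Bolts', 'Red');
    INSERT INTO supply VALUES (16, 27, 100), (24, 27, 120), (24, 42, 5);
""")


def red_suppliers_record_at_a_time(conn):
    """The programmer fixes the access path: supplier -> supply -> part."""
    names = set()
    suppliers = conn.execute("SELECT sno, sname FROM supplier").fetchall()
    for sno, sname in suppliers:
        supplies = conn.execute(
            "SELECT pno FROM supply WHERE sno = ?", (sno,)
        ).fetchall()
        for (pno,) in supplies:
            color = conn.execute(
                "SELECT color FROM part WHERE pno = ?", (pno,)
            ).fetchone()[0]
            if color == "Red":
                names.add(sname)
    return names


def red_suppliers_set_at_a_time(conn):
    """The query states what is wanted; the engine picks the access path."""
    rows = conn.execute("""
        SELECT DISTINCT s.sname
        FROM supplier s
        JOIN supply sp ON sp.sno = s.sno
        JOIN part p ON p.pno = sp.pno
        WHERE p.color = 'Red'
    """)
    return {name for (name,) in rows}


print(red_suppliers_record_at_a_time(conn))
print(red_suppliers_set_at_a_time(conn))
```

In the first function, starting from suppliers rather than from red parts is a decision baked into the program, and it may become a poor one if the data or the indexes change. In the second, that decision is left to the query optimizer.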

5. The Entity-Relationship Era

  • Proposer: Peter Chen
  • The E-R model was never implemented by a DBMS as the underlying data model, but it became popular as a data base design tool.

Lessons Learned

  • Functional dependencies are too difficult for mere mortals to understand. Another reason for KISS (Keep It Simple Stupid).

6. R++ Era

  • Extended the relational model by adding new features
  • Minor impact

Lessons Learned

  • Unless there is a big performance or functionality advantage, new constructs will go nowhere.

7. The Semantic Data Model Era

  • Little long term influence

8. OO Era

  • This community pointed to an “impedance mismatch” between relational data bases and languages like C++.
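
A small sketch of what the mismatch looks like in practice, using Python and its built-in `sqlite3` module instead of C++ (an assumption made to keep the example short). The `Employee` class, the table, and the `save`/`load_all` helpers are hypothetical. The point is the hand-written translation code between program objects and rows.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Employee:
    name: str
    salary: int


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, salary INTEGER)")


def save(conn, emp):
    """Flatten an object into a row."""
    conn.execute("INSERT INTO employee VALUES (?, ?)", (emp.name, emp.salary))


def load_all(conn):
    """Rebuild objects from rows."""
    rows = conn.execute("SELECT name, salary FROM employee ORDER BY name")
    return [Employee(name, salary) for name, salary in rows]


save(conn, Employee("Ada", 120))
save(conn, Employee("Bob", 100))
staff = load_all(conn)
print(staff)
```

Every class that needs to be stored requires a pair of helpers like these, which is the boilerplate that persistent programming languages aimed to remove.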

Lessons Learned

  • Packages will not sell to users unless they are in “major pain”
  • Persistent languages will go nowhere without the support of the programming language community

9. The Object-Relational Era

  • The OR proposal added user-defined data types, operators, functions, and access methods to a SQL engine.
  • The major OR research prototype was Postgres.
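
The idea of extending a SQL engine with user-defined functions can be sketched with Python's built-in `sqlite3` module, whose `Connection.create_function` registers a Python function for use inside SQL. SQLite stands in for an OR engine such as Postgres here, which is an assumption made for brevity; the `distance` function, the `store` table, and the data are hypothetical.

```python
import sqlite3


def distance(x1, y1, x2, y2):
    """Euclidean distance between two points."""
    return ((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5


conn = sqlite3.connect(":memory:")

# Register the Python function so SQL statements can call it by name.
conn.create_function("distance", 4, distance)

conn.execute("CREATE TABLE store (name TEXT, x REAL, y REAL)")
conn.executemany(
    "INSERT INTO store VALUES (?, ?, ?)",
    [("A", 0.0, 0.0), ("B", 3.0, 4.0), ("C", 10.0, 10.0)],
)

# The user-defined function runs inside the query, next to the data.
nearby = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM store WHERE distance(x, y, 0, 0) <= 5 ORDER BY name"
    )
]
print(nearby)
```

This covers only user-defined functions. User-defined types, operators, and access methods, which the OR proposal also includes, are beyond what this sketch shows.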

Lessons Learned

  • The major benefits of OR are two-fold: putting code in the data base (and thereby blurring the distinction between code and data) and a general purpose extension mechanism that allows OR DBMSs to quickly respond to market requirements.
  • Widespread adoption of new technology requires standards and/or an elephant pushing hard.

10. Semi Structured Data

  • Schema later
  • Complex graph-oriented data model
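
A minimal sketch of "schema later", assuming self-describing records stored as JSON text (the paper's focus is XML; JSON is used here only because Python parses it with the standard library). The documents and the `inferred_schema` helper are hypothetical. No schema is declared up front: each record carries its own field names, and a schema can only be inferred afterwards from the data.

```python
import json

# Two self-describing records with different, undeclared fields.
documents = [
    '{"name": "Ada", "wages": 14.75, "employer": "Acme"}',
    '{"name": "Bob", "address": "12 Main St", "salary": 80000}',
]

records = [json.loads(d) for d in documents]


def inferred_schema(records):
    """Collect every field seen so far and the value types it has held."""
    schema = {}
    for record in records:
        for key, value in record.items():
            schema.setdefault(key, set()).add(type(value).__name__)
    return schema


schema = inferred_schema(records)
for key in sorted(schema):
    print(key, sorted(schema[key]))
```

Note that "wages" and "salary" probably mean the same thing, but nothing in the data says so. That is the semantic heterogeneity problem, and a self-describing format does not make it go away.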

Lessons Learned

  • Schema-later is probably a niche market
  • XQuery is pretty much OR SQL with a different syntax
  • XML will not solve the semantic heterogeneity problem either inside or outside the enterprise

11. Full Circle

  • If we don’t start learning something from history, we will be condemned to repeat it yet again
Written on December 13, 2017