Intelligent Enterprise

Better Insight for Business Decisions

June 1, 1999, Volume 2 - Number 8

Thirty Years of Relational: Relational Forever!

The relational model will stand the test of time

By C. J. Date

I've now devoted several articles to a historical review and analysis of Codd's original relational papers (or at least the most important of those papers). To be specific, I've examined the following papers in some detail:

  • "Derivability, Redundancy, and Consistency of Relations Stored in Large Data Banks"
  • "A Relational Model of Data for Large Shared Data Banks"
  • "Relational Completeness of Data Base Sublanguages"
  • "A Data Base Sublanguage Founded on the Relational Calculus"
  • "Further Normalization of the Data Base Relational Model"
  • "Interactive Support for Nonprogrammers: The Relational and Network Approaches"
  • "Extending the Relational Database Model to Capture More Meaning."

I've also briefly touched on a few other papers from time to time.

The time has come to bring the series to a close. In this final installment, I'd like to set out some specific objectives for the relational model and consider how well it meets them (or doesn't meet them). I'd also like to take a brief look at exactly what the model is and where it might be headed. All in all, I'd like this whole series to be seen as a tribute to Codd's tremendous achievement in founding, more or less single-handed, pretty much the entire field of modern database management -- the field in which we all toil and from which we all obtain our livelihood. Thank you, Ted!

Relational Objectives

Let's begin by taking a look at what Codd thought he was trying to achieve with his relational research. It turns out, unsurprisingly, that several of his papers do address this issue. For example, in the RM/T paper [1], he writes: "The relational model... was conceived... primarily as a tool to free users from the frustrations of having to deal with the clutter of storage representation details." More specifically, in the Alpha paper [2], he identifies the following as "principal motivations of the relational model":

1. Data independence
2. The simplest possible [data] structure consistent with semantic considerations
3. Provision of a unifying principle that would simplify the language needed for interaction and statement analysis needed for authorization of access and optimization of search
4. Relatively easy analysis for [data] consistency.

Later, in his "Great Debate" paper [3], he writes: "The relational approach was developed as a response to the following requirements, which were considered to be relatively novel in 1968":

1. Data independence
2. Integration of files into databases
3. Multiple user types
4. Many online users at terminals
5. Increased dynamic sharing of data
6. Networks of mutually remote databases.

In an invited paper to the 1974 IFIP Congress [4] (the same year as the Great Debate), he lists the following as "the objectives of [the relational approach]":

1. To provide a high degree of data independence
2. To provide a community view of the data of spartan simplicity, so that a wide variety of users in an enterprise (ranging from the most computer-naive to the most computer-sophisticated) can interact with a common model (while not prohibiting superimposed user views for specialized purposes)
3. To simplify the potentially formidable job of the database administrator
4. To introduce a theoretical foundation (albeit modest) into database management (a field sadly lacking in solid principles and guidelines)
5. To merge the fact retrieval and file management fields in preparation for the addition at a later time of inferential services in the commercial world
6. To lift data-based application programming to a new level -- a level in which sets (and more specifically relations) are treated as operands instead of being processed element by element.

And he adds: "In connection with the second [of these objectives], it is important to remember that data bases are being established for the benefits of end users, and not for the application programmers who act as middle-men [sic] for today's data processing needs."

Codd went on to repeat these same objectives in the Great Debate paper, at which time he added that the relational approach had "four main components":

1. Simplify to the greatest practical extent the types of data structure employed in the principal schema (or community view)
2. Introduce powerful operators to enable both programmers and nonprogrammers to store and retrieve target data without having to "navigate" to the target [emphasis in the original]
3. Introduce natural language (for example, English) with dialog support to permit effective interaction by casual (and possibly computer-naive) users
4. Express authorization and integrity constraints separately from the data structure (because they are liable to change).

"Discussions on the relational approach often become riveted on the first [of these components] to the neglect of the other three... To do justice to this approach, all four components must be considered as a package." [3]

Finally, in the paper he presented on the occasion of his receiving the 1981 ACM Turing Award (richly deserved!) for his work on the relational model [5], Codd claims that truly relational systems can:

1. Put many database applications within the nonprogrammer's reach, where programmers were previously a necessity
2. Increase the productivity of programmers on many (though not all) database applications (slightly paraphrased).

It seems to me that these various lists of objectives and related matters together constitute a striking testimonial to Codd's huge achievement. I don't think anyone could seriously claim either (a) that any of the objectives is undesirable in itself or (b) that -- with one possible exception -- the relational model has failed in meeting them. The possible exception is the one having to do with "the addition at a later time of inferential services in the commercial world." Providing such services is an issue that (so far as I know) has yet to be seriously addressed in the DBMS marketplace. Nevertheless, there's every reason to believe that the relational model does indeed provide the right foundation for such services, owing in large part to its close relationship (noted in the previous installment) to predicate logic.

So What Is The Relational Model?

I pointed out in the previous installment that, strangely enough, Codd apparently didn't even define the term "relational model" until 1979 [1]. Perhaps even more strangely, he didn't define the more general term "data model" until 1981! In a paper entitled "Data Models in Database Management" [6], he defines a data model to consist of a combination of three components:

1. A collection of data object types, which form the basic building blocks for any database that conforms to the model
2. A collection of general integrity rules, which constrain the set of occurrences of those object types that can legally appear in any such database
3. A collection of operators, which can be applied to such object occurrences for retrieval and other purposes (somewhat paraphrased once again).

By the way, note that object here is definitely not meant in its modern, rather restricted sense!
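
As a rough illustration only (this is not Codd's own formulation), the three components might be sketched in Python along the following lines. The names `Relation`, `satisfies_key`, `restrict`, and `project` are hypothetical, chosen for this sketch, and the key-uniqueness rule is just one example of a general integrity rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """Component 1, an object type: a heading plus a set of tuples."""
    heading: tuple    # attribute names, e.g. ("sno", "city")
    body: frozenset   # each tuple is a frozenset of (attribute, value) pairs

    @staticmethod
    def of(heading, rows):
        # Building the body as a set means duplicate rows collapse automatically.
        return Relation(tuple(heading), frozenset(frozenset(r.items()) for r in rows))

    def rows(self):
        return [dict(t) for t in self.body]


def satisfies_key(rel, key):
    """Component 2, an integrity rule: no two tuples agree on all key attributes."""
    seen = set()
    for row in rel.rows():
        value = tuple(row[a] for a in key)
        if value in seen:
            return False
        seen.add(value)
    return True


def restrict(rel, predicate):
    """Component 3, an operator: keep the tuples for which predicate(row) is true."""
    return Relation.of(rel.heading, [r for r in rel.rows() if predicate(r)])


def project(rel, attrs):
    """Component 3, an operator: keep only the named attributes."""
    return Relation.of(attrs, [{a: r[a] for a in attrs} for r in rel.rows()])


# Hypothetical sample data for the sketch.
suppliers = Relation.of(
    ("sno", "city"),
    [{"sno": "S1", "city": "London"},
     {"sno": "S2", "city": "Paris"},
     {"sno": "S3", "city": "Paris"}],
)
```

Note that the operators take relations and return relations, so their results can be fed to further operators; projecting `suppliers` on `city` yields a relation in which the duplicate Paris row has disappeared, because the body is a set.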

The paper goes on to discuss what purpose data models in general, and the relational model in particular, are intended to serve, and offers evidence in support of the claim that -- contrary to popular belief -- the relational model was actually the first abstract data model to be defined. (As we saw in the previous installment, the so-called hierarchic and network "models" were defined after the fact by a process of abstraction from already existing implementations. Though it's interesting to note in the light of this fact that Codd himself referred to "the hierarchic and network models" in his very first two papers, dated 1969 and 1970 respectively!)

Be that as it may, the question arises: What then exactly is the relational model? If you've been following this series carefully, you'll have noticed that Codd's own definitions evolved somewhat throughout the 1970s and early 1980s. (Indeed, they've continued to change since that time, too.) One consequence is that critics have been able to accuse Codd in particular, and relational advocates in general, of "moving the goalposts" far too much. For example, Mike Stonebraker has written [7] that "one can think of four different versions" of the model:

  • Version 1: Defined by the 1970 CACM paper [8]
  • Version 2: Defined by the 1981 Turing Award paper [5]
  • Version 3: Defined by Codd's 12 rules and scoring system [9]
  • Version 4: Defined by Codd's book [10].

Perhaps because we're a trifle sensitive to such criticisms, Hugh Darwen and I have tried to provide, in The Third Manifesto [11], our own careful statement of what we believe the relational model is (or ought to be!). Indeed, we'd like the Manifesto to be seen in part as a definitive statement in this regard. I refer you to the document itself for the details; here just let me say that we see our contribution in this area as primarily one of dotting a few i's and crossing a few t's that Codd himself left undotted or uncrossed in his own original work. We most certainly do not want to be thought of as departing in any major respect from Codd's original vision; indeed, the whole of the Manifesto is very much in the spirit of Codd's ideas and continues along the path that he originally laid down.

Whither The Relational Model?

In the first installment in this series, I said I expected database systems still to be based on Codd's relational foundation a hundred years from now. And I hope you can see, from what we've covered over the past few months, why I believe such a thing. The relational approach really is rock solid, owing (once again) to its basis in mathematics and predicate logic. (Of course, I don't mean to suggest that the model solves all known problems and will never need any extensions; as we saw last month, extensions are certainly possible and sometimes desirable. I just mean, to repeat, that the foundation is solid.)

I'd like to conclude by summarizing an argument that Codd himself presented (under the heading "Whither Database Management?") in the Great Debate paper, having to do with alternative possibilities for the future development of database systems. We start by assuming we're given the simplest possible programmer-oriented interface to the database (a record-at-a-time interface). Then:

1. If we add high-level operators (join and so on), we get automatic navigation -- meaning that even nonprogrammers can get to their targets unaided.
2. Alternatively, if we don't add such operators but do add new data structures, such as links, then manual navigation becomes a necessity -- meaning that programmers become indispensable in the task of helping end users get to their targets.
3. And if we add both operators and structure, then we get needless complexity -- meaning that many more decisions have to be made by both programmers and the database administrator (without, I might add, any good guidelines as to how to make those decisions).

The conclusion is obvious.
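
For readers who like to see such things concretely, the contrast between cases 1 and 2 above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not anything taken from the Great Debate paper: the supplier and shipment data, the `by_sno` dictionary standing in for a stored link, and all function names are hypothetical.

```python
# Hypothetical sample data: suppliers and the shipments they make.
suppliers = [
    {"sno": "S1", "city": "London"},
    {"sno": "S2", "city": "Paris"},
]
shipments = [
    {"sno": "S1", "pno": "P1"},
    {"sno": "S1", "pno": "P2"},
    {"sno": "S2", "pno": "P1"},
]


def cities_supplying_navigational(pno):
    """Case 2 style: the program visits records one at a time and follows links itself."""
    by_sno = {s["sno"]: s for s in suppliers}  # stands in for a stored link or pointer
    cities = set()
    for shipment in shipments:
        if shipment["pno"] == pno:
            owner = by_sno[shipment["sno"]]    # manual navigation step
            cities.add(owner["city"])
    return cities


def join(left, right):
    """Natural join of two lists of rows on whatever attribute names they share."""
    result = []
    for left_row in left:
        for right_row in right:
            common = left_row.keys() & right_row.keys()
            if all(left_row[a] == right_row[a] for a in common):
                result.append({**left_row, **right_row})
    return result


def cities_supplying_relational(pno):
    """Case 1 style: state the result in terms of whole relations and a join."""
    return {row["city"] for row in join(suppliers, shipments) if row["pno"] == pno}
```

Python can only mimic the distinction, of course. In a real system the request in the second style would be a single declarative expression, and the choice of access path (the nested loops inside `join` here) would be the system's business rather than the user's.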

REFERENCES

1. Codd, E. F. "Extending the Relational Database Model to Capture More Meaning." IBM Research Report RJ2599 (August 6th, 1979). Republished in ACM Transactions on Database Systems 4(4), December 1979.

2. Codd, E. F. "A Data Base Sublanguage Founded on the Relational Calculus." IBM Research Report RJ893 (July 26th, 1971). Republished in Proc. 1971 ACM SIGFIDET Workshop on Data Description, Access and Control, San Diego, November 1971.

3. Codd, E. F. and C. J. Date. "Interactive Support for Nonprogrammers: The Relational and Network Approaches." IBM Research Report RJ1400 (June 6th, 1974). Republished in Randall J. Rustin (ed.), Proc. ACM SIGMOD Workshop on Data Description, Access, and Control, Vol. II, Ann Arbor, Michigan, May 1974. Also in C. J. Date, Relational Database: Selected Writings. Reading, Mass.: Addison-Wesley, 1986.

4. Codd, E. F. "Recent Investigations into Relational Data Base Systems." IBM Research Report RJ1385 (April 23rd, 1974). Republished in Proc. 1974 IFIP Congress (Stockholm, 1974). New York, N.Y.: North-Holland, 1974.

5. Codd, E. F. "Relational Database: A Practical Foundation for Productivity." IBM Research Report RJ3339 (December 21st, 1981). Republished in CACM 25(2), February 1982.

6. Codd, E. F. "Data Models in Database Management." Proc. Workshop on Data Abstraction, Databases, and Conceptual Modelling (Michael L. Brodie and Stephen N. Zilles, eds.), Pingree Park, Colo. (June 1980): ACM SIGART Newsletter No. 74 (January 1981); ACM SIGMOD Record 11(2), February 1981; ACM SIGPLAN Notices 16(1), January 1981.

7. Stonebraker, Michael. Introduction to Chapter 1 ("The Roots"), Readings in Database Systems (2nd edition). San Mateo, Calif.: Morgan Kaufmann, 1994.

8. Codd, E. F. "A Relational Model of Data for Large Shared Data Banks." CACM 13(6), June 1970. Republished in Milestones of Research -- Selected Papers 1958-1982 (CACM 25th Anniversary Issue), CACM 26(1), January 1983.

9. Codd, E. F. "Is Your DBMS Really Relational?"; "Does Your DBMS Run By The Rules?" Computerworld (October 14th, 1985; October 21st, 1985).

10. Codd, E. F. The Relational Model For Database Management Version 2. Reading, Mass.: Addison-Wesley, 1990.

11. Date, C. J. and Hugh Darwen. Foundation for Object/Relational Databases: The Third Manifesto. Reading, Mass.: Addison-Wesley, 1998.



C. J. Date is an independent author, lecturer, researcher, and consultant, specializing in relational database systems. His most recent books are Foundation for Object/Relational Databases: The Third Manifesto, coauthored with Hugh Darwen, and Relational Database Writings 1994-1997, both published by Addison-Wesley in 1998. Correspondence may be sent to him in care of Intelligent Enterprise, iemagazine@mfi.com.




