An analysis of Codd's contribution to the Great DebateThirty Years of Relational: Relational Really Is DifferentBy C.J.Date
The Great Debate was a debate between proponents of the relational and network approaches. It was held at the ACM SIGMOD Workshop on Data Description, Access, and Control in 1974; the principal speakers were Edgar F. Codd for the relational approach (surprise!) and Charles W. Bachman for the network, or CODASYL, approach. In preparing for the debate, Codd wrote a paper entitled "Interactive Support for Nonprogrammers: The Relational and Network Approaches"1, and it's that paper that I want to discuss here. Note: The paper is shown in the debate proceedings as being a joint production by Codd and myself; in fact, it was wholly written by Codd. (The companion paper,2 on application programming considerations, which is also attributed to both of us, was written by myself.)
OVERVIEW OF THE PAPER Of course, the battle between relations and networks is ancient history now. (The good guys won.) This fact notwithstanding, Codd's paper -- even though it was written over 25 years ago -- is still worth reading today as a beautiful example of clear thinking. Indeed, it's quite remarkable to see how, on a topic where muddled thinking was the norm at the time, Codd was able to do such a good job of cutting to the chase and focusing on the real underlying issues. Let me elaborate:
Thus, Codd has some claim to being the first person to attempt to give an abstract definition, not just of the relational model (of course), but also of a network model! Certainly none of the CODASYL documents ever attempted any such thing. (In fact, the paper might still be timely after all, even though the original battle is over. Certainly we are -- regrettably -- seeing some of the same tired old issues surfacing again in connection with object DBMSs. As several writers have observed, object DBMSs do tend to look like "CODASYL warmed over" in certain respects. A case of those who don't know history being doomed to repeat it?) Anyway, the most significant contribution in "Interactive Support for Nonprogrammers: The Relational and Network Approaches" is probably his introduction of the concept of essentiality, a concept that's critical to a proper understanding of data models in general and relations vs. networks in particular. It's the concept of essentiality that allows us to pinpoint the crucial difference between relational databases and others, as we'll see in the next section. (Indeed, it's really the main reason I want to talk about the paper in this series at all.) As well as introducing the essentiality notion, Codds paper also raises a number of pertinent questions regarding the suitability of network structures as a component of what it calls "the principal schema" -- questions that, so far as I know, have never been answered in the open literature. To quote: "In the past, many designers of software systems... have confused two quite distinct notions: enrichment of features on the one hand, and generality of application on the other." How true! "A crucial issue in database management systems is that of the richness (that is, variety) of data structures... that should be supported in the principal schema. In the event that enrichment of these data structures ... beyond the minimum is proposed, we ask the following questions..." (I don't want to discuss the questions themselves in detail here, however, since I've already considered them at some length in an earlier paper3). Incidentally, the foregoing quote reminds me of another one, which appeared in Codd's tutorial paper on normalization4: "Several existing systems permit a variety of physical representations for a given logical structure... The complexity of the physical representations which these systems support is, perhaps, understandable, because these representations are selected in order to obtain high performance... What is less understandable is the trend toward more and more complexity in the data structures with which... users directly interact. Surely, in the choice of logical data structures that a system is to support, there is one consideration that is of absolutely paramount importance -- and that is the convenience of the majority of users." Again, how true! Back to "Interactive Support." The paper also includes an appendix comparing CODASYL/COBOL and relational/Alpha versions of a simple machine shop scheduling application. That comparison provides strong evidence in support of relational claims of simplicity (simplicity for the user, that is). For full details of the comparison, including the actual code used in the two solutions, I refer you to the original paper; here I content myself with simply repeating a few remarks from an earlier paper of mine5 that discusses the same example. Here are a few comparative statistics:
The relative simplicity of the relational solution is very striking. Note: In fact, the relational solution could have been reduced to just a single statement, a The example highlights another point, too. To quote Codd: "The reader is cautioned to avoid comparing [different] approaches solely on the basis of differences in [data structure]. An adequate appreciation of the differences ... must entail consideration of the... operators also." And later he adds: "Discussions on the relational approach often become riveted on the [data structure] component to the neglect of [other components]. To do justice to this approach, all... components must be considered as a package."
ESSENTIALITY Although this concept is so important, it's my experience that few database professionals are really familiar with it. This installment therefore has more of a tutorial flavor than others. Note: The material that follows is based in part on another earlier paper of mine.6 First of all, "everyone knows" that the only data structure available in the relational model is the relation itself. To understand the significance of this point, however, it's necessary to know something about at least one other data structure: for example, the link structure found in hierarchic and network systems. So let's look at an example. Fig. 1 shows (a) the relational design for a simple departments-and-employees database, together with (b) a network equivalent of that design. (Actually, the example is so simple that the network design degenerates to a mere hierarchy, but the point's not important for present purposes. Hierarchies and networks are much more like each other than either one is like relations.) ![]() The network design involves a "CODASYL set" (not to be confused with a mathematical set!). Each occurrence of that CODASYL set consists of a Observe now that the Now let's consider a couple of queries against this database. For each query, I'll show a relational (SQL) formulation and a network equivalent, using a hypothetically extended version of SQL that caters for links.
Q1: Get employee numbers and employee names for employees with salary greater than 20K.
For query Q1 the two formulations are obviously identical; for query Q2, however, they're not. The relational formulation for Q2 still has the same basic form as for Q1 ( Q2 thus illustrates the important point that networks fundamentally require certain additional data access operators. Note too that those operators are additional; the relational operators are still needed as well, as query Q1 shows. Note moreover that this point applies not only to all of the data manipulation operators (update operators included), but also to the definitional operators, the security operators, the integrity operators, and so on. The links of the network data structure thus certainly add complexity. However, they don't add any power -- there's nothing that can be represented by a network that can't be represented by relations, and there's no query that can be answered from a network and not from relations. Now, it's sometimes suggested that the complexity can be reduced by reinstating the
Fig. 2: Departments and employees: network design with foreign key
So now (at last!) I can explain the notion of essentiality. A data construct is essential if its loss would cause a loss of information -- by which I mean, very precisely, that some relation would no longer be derivable. For example, in the relational version of departments-and-employees, all data constructs (all rows and all columns) are essential in this sense. Likewise, in the original network version (Figure 1), all data constructs (all rows, all columns, and all links) are again essential. But in the revised network of Figure 2, the rows and columns are essential but the link is inessential. There's no information that can be derived from that network that can't be derived from the rows and columns alone; there's no logical need to use the link at all. Note: Some people might argue that the opposite is the case -- the link is essential and the foreign key is inessential. But that argument misses the point, which is that, since some rows and columns must be essential, and nothing else need be, then why have anything else? Now (at last) I can pin down the crucial difference between a relational database and any other kind, say a network database. In a relational database, the only essential data construct is the relation itself. In other databases, there must be at least one additional essential data construct (such as an essential link). If there isn't, then the database is really a relational database that happens to have certain access paths exposed. (And there's no requirement that the user use those access paths, and the question arises as to why they're exposed anyway when others aren't.) And it's those additional essential data constructs that lead to much (not all) of the complexity of nonrelational databases. Note: In the case of CODASYL specifically, there are at least four additional constructs that can also be used to carry information essentially. To get into details of those other constructs here would take us much too far afield, however.
A NOTE ON ORDERING The concept of essentiality allows me to explain why it's significant that relations have no ordering to their rows. In an ordered file, the ordering itself might be essential in the sense explained above. For example, a file of temperature readings might be kept in the order in which those readings were taken; the ordering itself might thus carry information, which would be lost if the records of the file were rearranged -- just as information can be lost if someone drops a box of cards, if those cards don't include a sequence field. (If you're too young to know what boxes of cards and sequence fields are all about, you can ignore this example!) And essential ordering, like an essential link, requires additional operators to deal with it -- "find the nth record," "insert a record between records n and n+1," and so on. For this reason it's not permitted in the relational model. In contrast to the foregoing, it's sometimes suggested that inessential ordering might be acceptable. A file is inessentially ordered if it's ordered on the basis of the value(s) of some field(s) -- for example, the employee file might be ordered by employee number, but no information would be lost if the records were shuffled around. Some "relational" systems do in fact support ordering in this sense. Note, however, that relations per se are unordered by definition; it would really be better to regard an "ordered relation" as a totally different kind of thing -- perhaps as, precisely, a sequential file. In this regard, the SQL In any case, even inessential data constructs can cause problems, because they do still carry information, even though they're inessential. For example, they might represent a security exposure. Suppose a file of employee records is ordered (inessentially) by increasing salary. Then the fact that your manager's record appears after your own certainly tells you something, even if you're not authorized to see actual salary values.
CONCLUDING REMARKS In conclusion, let me get back to Codd's "great debate" paper. Codd observes, wisely, that it's "not enough to take note of where [the] approaches stand today -- we must also be clear about where they're going." He contends that "casual" users of database systems will become increasingly important (in fact, they'll soon be the largest user constituency of all), and so we must pay attention to what the different approaches are or might be doing to support such users. Some quotes:
These observations need only minor changes in wording for them to be still highly pertinent today. REFERENCES 1. Codd, E. F. Codd and C. J. Date. "Interactive Support for Nonprogrammers: The Relational and Network Approaches." IBM Research Report RJ1400 (June 6th, 1974). Republished in Randall J. Rustin (ed.), Proc. ACM SIGMOD Workshop on Data Description, Access, and Control, Vol. II, Ann Arbor, Michigan (May 1974). Also in C. J. Date, Relational Database: Selected Writings. Reading, Mass.: Addison-Wesley (1986). 2. Date, C. J. and E. F. Codd. "The Relational and Network Approaches: Comparison of the Application Programming Interfaces." IBM Research Report RJ1401 (June 6th, 1974). Republished in Randall J. Rustin (ed.), Proc. ACM SIGMOD Workshop on Data Description, Access, and Control, Vol. II, Ann Arbor, Michigan (May 1974). Also in C. J. Date, Relational Database: Selected Writings. Reading, Mass.: Addison-Wesley (1986). 3. Date, C. J. "Support for the Conceptual Schema: The Relational and Network Approaches." In C. J. Date, Relational Database Writings 1985-1989. Reading, Mass.: Addison-Wesley (1990). 4. Codd, E. F. "Normalized Data Base Structure: A Brief Tutorial." Proc. 1971 ACM SIGFIDET Workshop on Data Description, Access, and Control, San Diego, Calif. (November 11th-12th, 1971). 5. Date, C. J. "Why Relational?" In C. J. Date, Relational Database Writings 1985-1989. Reading, Mass.: Addison-Wesley (1990). 6. Date, C. J. "Essentiality." In C. J. Date, Relational Database Writings 1991-1994. Reading, Mass.: Addison-Wesley (1995). C. J. Date is an independent author, lecturer, researcher, and consultant, specializing in relational database systems. His most recent books are Foundation for Object/Relational Databases: The Third Manifesto, coauthored with Hugh Darwen, and Relational Database Writings 1994-1997, both published by Addison-Wesley in 1998. Correspondence may be sent to him in care of Intelligent Enterprise, 411 Borel Ave. #100, San Mateo, CA 94402. |
Most Popular This Week
IE Weekly Newsletter
Subscribe to the newsletter
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||










