June 1, 1999, Volume 2 - Number 8

Thirty Years of Relational: Extending the Relational Model

When's an extension not an extension?

By C. J. Date

Since its inception, the relational model has been the target of an unusually high degree of criticism. To be more specific, many claims have been made over the years (a) to the effect that the model is seriously deficient in some respect or another, and (b) that it therefore needs to be extended in some way. In this installment, I want to examine this business of "extending the relational model" in some detail; in particular, I want to take a brief look at Codd's own extended version known as RM/T.

Bogus vs. Genuine Extensions

Some claims of deficiency in the model are valid, others aren't. As a consequence, some proposed extensions to the model are genuine, meaning that they do truly serve to add useful functionality; others, however, are bogus, meaning either (a) that they don't add any new functionality or (b) that the functionality they do add isn't useful. Examples of genuine extensions include such things as the EXTEND and SUMMARIZE operators, the relational comparison operators, and view updatability theory. (I'm sure we can agree that these examples all do provide useful new functionality.) Examples of bogus extensions include such things as quota queries, date and time support, and "data of type REF" (REF = reference). Let me elaborate:

  • Quota queries are useful from a pragmatic point of view but are essentially just syntactic shorthand for functionality that already exists. Note carefully, therefore, that when I say an extension is bogus, I don't necessarily mean it's a bad thing -- I just mean it isn't really an extension to the model as such.
  • Date and time support is also useful. Furthermore, it isn't just a syntactic shorthand for something that already exists -- it really does provide new functionality. However, it still isn't an extension to the model, since the question of what data types are supported has nothing to do with the model ("types are orthogonal to tables").
  • By contrast, "data of type REF" is bogus because it isn't useful! Indeed, it essentially drags back into the model all of the pointers and their associated baggage that Codd deliberately threw out, for very good reasons, all those years ago [1].

Bogus extensions -- when they're actually claimed to be extensions as such -- tend to be predicated on a flawed understanding of the true nature of the model. The SQL community is especially at fault here, of course; SQL's many problems are ascribed to "relational model deficiencies," and solutions to those problems are then described as "relational model extensions." (As I've written elsewhere many times, the biggest problem with SQL is precisely that it doesn't support the relational model.) The most recent, and perhaps most egregious, case in point is provided by the so-called object/relational model, which I'll discuss briefly in the next section.

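
To make the first bullet above concrete: a quota query such as "find the three heaviest parts" needs no new operator. The Python sketch below is illustrative only (the `Part` heading, the sample weights, and the helper name `heaviest` are assumptions, not anything from the column). It models a relation as a frozenset of tuples and expresses the quota as an ordinary restriction whose condition counts the strictly heavier parts.

```python
from collections import namedtuple

# Hypothetical heading and sample data for a parts relation;
# a relation is modeled here as a frozenset of tuples.
Part = namedtuple("Part", ["pno", "weight"])
P = frozenset({
    Part("P1", 12.0), Part("P2", 17.0), Part("P3", 17.0),
    Part("P4", 14.0), Part("P5", 12.0), Part("P6", 19.0),
})

def heaviest(parts, n):
    """'The n heaviest parts', written with existing operators only:
    a restriction whose condition counts the strictly heavier parts."""
    return frozenset(
        p for p in parts
        if sum(1 for q in parts if q.weight > p.weight) < n
    )

print(sorted(p.pno for p in heaviest(P, 3)))  # ['P2', 'P3', 'P6']
print(sorted(p.pno for p in heaviest(P, 2)))  # ['P2', 'P3', 'P6'] (P2 and P3 tie)
```

Ties at the boundary can yield more than n parts, as the second call shows; this count-based formulation is one way of defining the quota, assumed here rather than taken from any particular product.
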
The "Object/Relational" Model

Several SQL vendors have attempted (with varying degrees of success, I might add) to extend their SQL products to incorporate some kind of object functionality. They then go on to claim that their extended products thus support an "extended" version of the relational model, which they refer to as the "object/relational model" (O/R model for short).

But this claim is absurd! As Hugh Darwen and I have shown in The Third Manifesto [2], object functionality and the relational model are completely orthogonal to one another. To quote: "The relational model needs no extension, no correction, no subsumption, and above all no perversion, in order [to support object functionality]." All that's needed is to support relational domains properly (which SQL never did), recognizing those domains for what they are, which is basically just abstract data types (ADTs), With All That That Entails. In other words, the so-called O/R model is just the relational model, pure and simple; there aren't any (genuine) "relational model extensions" involved at all.

The RM/T Paper: Basics

Let's turn to some interesting genuine extensions to the model. In 1979, Codd published yet another important paper, this one entitled "Extending the Relational Database Model to Capture More Meaning" [3]. I'll refer to this paper as the RM/T paper, for reasons that will quickly become clear. As the title suggests, the primary purpose of the RM/T paper was to suggest a set of "semantic" extensions to the original model; however, it began by summarizing the basic model (as of 1979), and I'd like to make a few remarks in that regard first before getting into details on the proposed extensions.

First of all, I believe I'm right in saying that the RM/T paper was actually the first of Codd's papers to include an explicit definition of the term relational model! Here is that definition:

The relational model consists of:

1. A collection of time-varying tabular relations (with the properties cited above -- note especially the keys and domains)

2. The insert-update-delete rules (Rules 1 and 2 cited above)

3. The relational algebra described... below.

As an aside, I have a few comments on this definition:

  • It's probably not a big deal, but to me it seems a little odd to say that the relational model includes "a collection of... relations" (a collection of relations is surely just a database?). I would have said rather that the model includes a type generator called RELATION, which allows users to define relation values and variables (relation variables [1], of course, being my preferred term for Codd's "time-varying tabular relations").
  • "Rules 1 and 2 cited above" are the entity integrity and referential integrity rules. These rules were implicit, more or less, in earlier papers but hadn't previously been spelled out (or named, come to that). Note: As a matter of fact, the referential integrity rule was slightly defective as stated, inasmuch as it overlooked the possibility that foreign keys in general were supposed to permit nulls. Of course, the point is unimportant if you believe, as I do, that nulls should never have been introduced in the first place.
  • The relational algebra "described ... below" consists of the usual operators, together with some additional ones for dealing with "null values" [sic]. The paper defines the following such additional operators: MAYBE O-JOIN, MAYBE DIVIDE, OUTER O-JOIN, OUTER NATURAL JOIN, and OUTER UNION ("in a similar manner, we could define OUTER versions of INTERSECTION and DIFFERENCE also"). This list of operators raises many questions -- for example, questions of orthogonality and completeness -- but again I don't think they're very important, since I believe nulls and everything to do with them to be a mistake (in my opinion Codd's one big error of judgment in this whole business).

End of aside.

Second, the RM/T paper was also the first by Codd to make explicit mention of the idea of relational assignment. However, the mention occurs only in connection with the proposed semantic extensions; it isn't part of the "basic relational model" definition given above, though it's certainly part of the model as now commonly understood. Moreover, there's no discussion of the fact that INSERT, UPDATE, and DELETE are basically just shorthand for certain relational assignments.
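
The point that INSERT, UPDATE, and DELETE are shorthand for relational assignment can be sketched in a few lines. The `RelVar` class, its method names, and the shipment heading below are hypothetical; the only operation that actually changes the variable is `assign`, and each familiar update operator is written as a single assignment built from union and difference.

```python
from typing import Callable, FrozenSet, Tuple

Row = Tuple[str, str, int]      # hypothetical shipment heading: (sno, pno, qty)
Relation = FrozenSet[Row]

class RelVar:
    """A relation variable whose only primitive update operator is assignment."""

    def __init__(self, value: Relation = frozenset()) -> None:
        self.value: Relation = frozenset(value)

    def assign(self, new_value) -> None:
        self.value = frozenset(new_value)

    # Each shorthand below is defined purely in terms of assign().
    def insert(self, rows) -> None:
        # INSERT r INTO R  ==  R := R UNION r
        self.assign(self.value | frozenset(rows))

    def delete(self, cond: Callable[[Row], bool]) -> None:
        # DELETE R WHERE c  ==  R := R MINUS (R WHERE c)
        self.assign(self.value - {t for t in self.value if cond(t)})

    def update(self, cond: Callable[[Row], bool],
               change: Callable[[Row], Row]) -> None:
        # UPDATE R WHERE c  ==  R := (R MINUS (R WHERE c)) UNION (changed tuples)
        old = {t for t in self.value if cond(t)}
        self.assign((self.value - old) | {change(t) for t in old})

SP = RelVar()
SP.insert({("S1", "P1", 300), ("S1", "P2", 200)})
SP.update(lambda t: t[1] == "P2", lambda t: (t[0], t[1], t[2] + 100))
SP.delete(lambda t: t[1] == "P1")
print(SP.value)  # frozenset({('S1', 'P2', 300)})
```
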

Third, the paper also has this to say: "Closely associated with the relational model are various [semantic] concepts... Examples are... nonloss (natural) joins and functional dependencies, multivalued dependencies, and normal forms." Here, then, we have a clear statement of Codd's position that these matters are to be seen as separate from the model per se (though I think he might subsequently have changed his mind on this point [4]).

Fourth, the RM/T paper was also the first in which Codd embraced the idea of surrogates -- that is, system-assigned identifiers. (Again, the concept is brought in only in connection with the proposed semantic extensions, but there's no reason why it can't be used with the basic model, and indeed there are often good arguments in favor of doing so.) Unfortunately, the paper states that surrogates must be hidden from users -- a clear violation of the paper's own earlier definition of a relational database, which says, to paraphrase, that all data in the database must be accessible to (authorized) users. In fact, an argument could be made that hiding surrogates constitutes a violation of Codd's own Information Principle, which states that all information in the database must be cast explicitly in terms of values in relations and in no other way.

(Just as an aside, let me remind you that -- as we saw last month -- relations per se are the only essential data construct allowed in a relational database. If I now add that relations are also the only allowable inessential construct, then what we wind up with is effectively a statement of the Information Principle.)

Finally, the RM/T paper devotes a brief (too brief) section to the relationship between the relational model and predicate logic: "A database [is] a set of [propositions] in first-order predicate logic... [We can] factor out the predicate common to a set of simple [propositions] and then treat the [propositions] as an... n-ary relation and the predicate as the name of the relation." Codd goes on to refer to the "propositions" portion of the database as the extension and the "predicates" portion as the intension (extension and intension here being technical terms from logic). "One may ... view the intension as a set of integrity constraints." And he briefly discusses the closed vs. open world interpretations. (Under the closed interpretation, the omission of a given row from a given relation means the corresponding proposition is false; under the open interpretation, it means we don't know whether it's true or false.)
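
A minimal sketch of the two interpretations, assuming a suppliers-and-parts relation whose predicate is "supplier S supplies part P" (the names `PREDICATE`, `SP`, and `truth_value` are illustrative only). Each tuple present instantiates the predicate to a true proposition; what an absent tuple means depends on the interpretation chosen, and here Python's `None` simply stands for "we don't know."

```python
from typing import FrozenSet, Optional, Tuple

Prop = Tuple[str, str]   # hypothetical heading: (sno, pno)

# Intension: the predicate common to every tuple of the relation.
PREDICATE = "Supplier {} supplies part {}"

# Extension: each tuple present stands for a proposition asserted to be true.
SP: FrozenSet[Prop] = frozenset({("S1", "P1"), ("S1", "P2"), ("S2", "P1")})

def truth_value(rel: FrozenSet[Prop], t: Prop,
                closed_world: bool = True) -> Optional[bool]:
    """Truth value of the proposition PREDICATE.format(*t)."""
    if t in rel:
        return True
    # Tuple absent: false under the closed interpretation,
    # unknown (None here) under the open interpretation.
    return False if closed_world else None

print(PREDICATE.format(*("S2", "P2")))          # Supplier S2 supplies part P2
print(truth_value(SP, ("S2", "P2")))            # False
print(truth_value(SP, ("S2", "P2"), False))     # None
```
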

The RM/T Paper: Extensions

As I've already indicated, the bulk of reference 3 is concerned with an extended version of the relational model called RM/T ("T for Tasmania, where these ideas were first presented"). It opens with some nice preliminary remarks on the matter of semantic extensions and "semantic data modeling" in general:

Actually, the task of capturing the meaning of data is a never-ending one. So the label ‘semantic’ must not be interpreted in any absolute sense. Moreover, database models developed earlier (and sometimes attacked as ‘syntactic’) were not devoid of semantic features (take domains, keys, and functional dependence, for example). The goal [of semantic modeling] is nevertheless an extremely important one, because even small successes can bring understanding and order into the field of database design.

(What a pleasing contrast to the exaggerated claims so often encountered in the semantic modeling field!)

Later, Codd makes another good point:

In recent papers on semantic data modeling there is a strong emphasis on structural aspects, sometimes to the detriment of manipulative aspects. Structure without corresponding operators or inferencing techniques is rather like anatomy without physiology.

Nice analogy!

To turn now to RM/T specifically: RM/T generally falls into the same broad category as the rather better known "entity/relationship model" (E/R model for short) [5]. Even if never implemented, therefore (and to my knowledge it never has been), it can still serve -- just as the E/R model can -- as the basis for a systematic database design methodology; in fact, I personally prefer it to the E/R model for this purpose, since I find it to be more precisely specified. Some immediate differences between the two are as follows:

    1. RM/T makes no unnecessary distinctions between entities and relationships -- a relationship is regarded merely as a special kind of entity.

    2. The structural and integrity aspects of RM/T are more extensive, and more precisely defined, than those of the E/R model.

    3. RM/T includes its own special operators, over and above the operators of the basic relational model (though much additional work remains to be done in this last area).

In outline, RM/T works as follows:

    1. Entities (including "relationships") are represented by E-relations and P-relations, both of which are special forms of the general n-ary relation. E-relations are used to record the fact that certain entities exist; P-relations are used to record certain properties of those entities (E-relations are of degree exactly one, P-relations of degree at least two).

    2. A variety of relationships can exist among entities; for example, entity types A and B might be linked together in an association (RM/T's term for a many-to-many relationship), or entity type Y might be a subtype of entity type X. RM/T includes a formal catalog structure by which such relationships can be made known to the system. The system is thus capable of enforcing the various integrity constraints that are implied by the existence of such relationships.

    3. As already mentioned, a number of high-level operators are provided to facilitate the manipulation of the various RM/T objects (E-relations, P-relations, catalog relations, and so forth).
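
To make points 1 and 3 a little more concrete, here is a hypothetical Python sketch of one entity type, parts, held as an E-relation of surrogates plus two P-relations. The surrogate format, the relation names, and the helper functions are invented for illustration and are not RM/T's own notation; `property_integrity` checks one plausible constraint of the kind RM/T expects the system to enforce, namely that every P-relation tuple refers to an entity recorded in the E-relation.

```python
import itertools

_counter = itertools.count(1)

def new_surrogate() -> str:
    """Hypothetical system-assigned surrogate; the '@n' format is illustrative."""
    return f"@{next(_counter)}"

PART_E = set()      # E-relation (degree 1): one 1-tuple per part that exists
PART_PNO = set()    # P-relation (degree 2): (surrogate, part number)
PART_COLOR = set()  # P-relation (degree 2): (surrogate, color)

def create_part(pno, color=None):
    """Record that a new part exists, then record its known properties."""
    s = new_surrogate()
    PART_E.add((s,))
    PART_PNO.add((s, pno))
    if color is not None:   # in this sketch an unknown color is just an absent tuple
        PART_COLOR.add((s, color))
    return s

def property_integrity(e_rel, *p_rels) -> bool:
    """Each surrogate appearing in a P-relation must appear in the E-relation."""
    existing = {t[0] for t in e_rel}
    return all(t[0] in existing for p in p_rels for t in p)

p1 = create_part("P1", "Red")
p2 = create_part("P2")
print(property_integrity(PART_E, PART_PNO, PART_COLOR))   # True
```
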

RM/T also provides an entity classification scheme, which in many respects constitutes the most significant aspect -- or, at least, the most immediately visible aspect -- of the entire model. To be more specific, entities are classified (though only informally, please note) into three categories, called kernels, characteristics, and associations:

  • Kernels: Kernel entities are entities that have independent existence; they are "what the database is really all about." In other words, kernels are entities that are neither characteristic nor associative (see below). Examples might be suppliers and parts (but not shipments) in the usual suppliers-and-parts database.
  • Characteristics: A characteristic entity is an entity whose primary purpose is to describe or "characterize" some other entity. An example might be individual line items on a customer order. Characteristics are existence-dependent on the entity they describe. The entity described can be kernel, characteristic, or associative.
  • Associations: An associative entity is an entity whose function is to represent a many-to-many (or many-to-many-to-many...) relationship among two or more other entities. Shipments in the familiar suppliers-and-parts database provide an example. The entities associated can each be kernel, characteristic, or associative.
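
The existence dependency of characteristics, and the requirement that associated entities exist, can be expressed as simple checks. In this hypothetical sketch, orders have line items as characteristics and shipments associate suppliers with parts; all names and data are invented. The cascade in `delete_order` is an assumed policy: refusing the deletion while line items remain would respect the dependency equally well.

```python
# Hypothetical E-relations, flattened to plain sets of surrogates.
ORDERS = {"o1", "o2"}        # kernels
SUPPLIERS = {"s1", "s2"}     # kernels
PARTS = {"p1", "p2"}         # kernels

# Characteristic: (line-item surrogate, surrogate of the order it describes)
LINE_ITEMS = {("li1", "o1"), ("li2", "o1"), ("li3", "o2")}
# Association: (shipment surrogate, supplier surrogate, part surrogate)
SHIPMENTS = {("sh1", "s1", "p1"), ("sh2", "s2", "p1")}

def characteristic_ok(chars, described) -> bool:
    """Existence dependency: every described entity must exist."""
    return all(owner in described for _, owner in chars)

def association_ok(assocs, *participants) -> bool:
    """Every entity taking part in an association must exist."""
    return all(ref in ents
               for row in assocs
               for ref, ents in zip(row[1:], participants))

def delete_order(orders, line_items, order):
    """Delete a kernel and (assumed cascade policy) its dependent characteristics."""
    return orders - {order}, {li for li in line_items if li[1] != order}

orders2, items2 = delete_order(ORDERS, LINE_ITEMS, "o1")
print(characteristic_ok(items2, orders2))      # True
print(characteristic_ok(LINE_ITEMS, orders2))  # False: li1, li2 would be orphans
```
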

In addition:

  • Entities (regardless of their classification) can also have properties; for example, parts have colors, line items have costs, shipments have quantities.
  • In particular, any entity (again, regardless of its classification) can have a property whose function is to designate some other related entity; for example, orders designate customers. A designation represents a many-to-one relationship between two entities. Note: Actually, the idea of designations was added later [6] -- it wasn't included in the original RM/T paper.
  • Entity supertypes and subtypes are supported. If B is a subtype of A, then B is a kernel, a characteristic, or an association depending on whether A is a kernel, a characteristic, or an association. Note: The RM/T paper had virtually nothing to say on the related (and important!) notion of inheritance, however. Indeed, RM/T's notion of supertypes and subtypes has more to do with the notion, proposed for SQL3, of "supertables and subtables" [7] than it does with true type inheritance as discussed in, for example, The Third Manifesto.
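
Two of these points can be written as constraints in the same style: a subtype's E-relation must be a subset of its supertype's, and a designation is a many-to-one property whose target must exist. The sketch below is hypothetical throughout (names and data are invented). In keeping with the note on inheritance, the subset test says nothing about inheritance, and it does not model the rule that a subtype shares its supertype's classification.

```python
# Hypothetical E-relations as sets of surrogates.
PARTS_E = {"p1", "p2", "p3"}        # supertype A
MACHINED_PARTS_E = {"p1", "p3"}     # subtype B of A
CUSTOMERS_E = {"c1", "c2"}

# Designation held as a property: (order surrogate, designated customer surrogate)
ORDER_CUSTOMER = {("o1", "c1"), ("o2", "c1"), ("o3", "c2")}

def subtype_ok(sub, sup) -> bool:
    """Every instance of subtype B must also be an instance of supertype A."""
    return sub <= sup

def designation_ok(desig, designated) -> bool:
    """Many-to-one: each source designates at most one target,
    and every designated target must exist."""
    sources = [src for src, _ in desig]
    return (len(sources) == len(set(sources))
            and all(tgt in designated for _, tgt in desig))

print(subtype_ok(MACHINED_PARTS_E, PARTS_E))                         # True
print(designation_ok(ORDER_CUSTOMER, CUSTOMERS_E))                   # True
print(designation_ok(ORDER_CUSTOMER | {("o1", "c2")}, CUSTOMERS_E))  # False
```
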

The foregoing concepts can be related (somewhat loosely) to their E/R analogs as follows: A kernel corresponds to an E/R "regular entity"; a characteristic to an E/R "weak entity"; and an association to an E/R "relationship" (many-to-many variety only).

Note: In addition to the aspects discussed briefly above, RM/T also includes support for (a) the time dimension and (b) various kinds of data aggregation. For more detailed discussions, see Codd's original paper [3] or my own tutorial description of RM/T [6].

References

1. Date, C. J. "Don't Mix Pointers and Relations!" and "Don't Mix Pointers and Relations -- Please!". In C. J. Date, Hugh Darwen, and David McGoveran: Relational Database Writings 1994-1997. Reading, Mass.: Addison-Wesley, 1998.

2. Date, C. J. and Hugh Darwen. Foundation for Object/Relational Databases: The Third Manifesto. Reading, Mass.: Addison-Wesley, 1998.

3. Codd, E. F. "Extending the Relational Database Model to Capture More Meaning." IBM Research Report RJ2599 (August 6th, 1979). Republished in ACM Transactions on Database Systems 4(4), December 1979.

4. Codd, E. F. The Relational Model For Database Management Version 2. Reading, Mass.: Addison-Wesley, 1990.

5. Chen, Peter Pin-Shan. "The Entity-Relationship Model -- Toward a Unified View of Data." ACM Transactions on Database Systems 1(1), March 1976. Republished in Michael Stonebraker (ed.): Readings in Database Systems (2nd edition). San Mateo, Calif.: Morgan Kaufmann, 1994.

6. Date, C. J. "The Extended Relational Model RM/T." In C. J. Date, Relational Database Writings 1991-1994. Reading, Mass.: Addison-Wesley, 1995.

7. Date, C. J. and Hugh Darwen. A Guide to the SQL Standard (4th edition). Reading, Mass.: Addison-Wesley, 1997.

C. J. Date is an independent author, lecturer, researcher, and consultant, specializing in relational database systems. His most recent books are Foundation for Object/Relational Databases: The Third Manifesto, coauthored with Hugh Darwen, and Relational Database Writings 1994-1997, both published by Addison-Wesley in 1998. Correspondence may be sent to him in care of Intelligent Enterprise, iemagazine@mfi.com.



 




