Intelligent Enterprise

Better Insight for Business Decisions



The Birth of the Relational Model, Part 2 of 3
March 30, 1999


Data Sublanguage Alpha, Installment 7

Concluding our brief look at Codd's Alpha language

By C. J. Date

In my last installment, we looked at Codd's Data Sublanguage Alpha, reviewing that language’s basic data definition and data manipulation operations. As before, I'll use reference 5 as my primary source (referring to it as "the Alpha paper" or sometimes just "Codd's paper"); I'll mention reference 3 only in connection with material that didn't make it into reference 5 for some reason.

Implicit range variables: Alpha supports the obvious shorthand of letting you use relation names in place of explicit range variable names so long as no ambiguity results. A relation name used in this way denotes an implicit range variable that ranges over the relation in question. Thus, you might have expressed the retrieval example from last month ("get supplier names and cities for suppliers who supply all parts") as follows:

GET W1 (S.SNAME, S.CITY) :
ALL P SOME SP (SP.S# = S.S# AND SP.P# = P.P#)

But it's important to understand that the name "S" here does not represent the suppliers relation S; it represents a range variable called S that ranges over the relation with that same name (and similarly for the names P and SP). Both QUEL and SQL adopted such a shorthand.
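To make the range-variable reading concrete, here is a minimal Python sketch (it is not Alpha and does not come from Codd's paper). Each loop variable plays the role of a range variable over the relation of the same name. The sample tuples and the function name suppliers_of_all_parts are invented for illustration, and S# and P# are spelled sno and pno because "#" is not legal in a Python identifier.

```python
# Hypothetical sample relations, each a list of tuples represented as dicts.
S = [
    {"sno": "S1", "sname": "Smith", "city": "London"},
    {"sno": "S2", "sname": "Jones", "city": "Paris"},
]
P = [{"pno": "P1"}, {"pno": "P2"}]
SP = [
    {"sno": "S1", "pno": "P1"},
    {"sno": "S1", "pno": "P2"},
    {"sno": "S2", "pno": "P1"},
]


def suppliers_of_all_parts(S, P, SP):
    """(S.SNAME, S.CITY) : ALL P SOME SP (SP.S# = S.S# AND SP.P# = P.P#)"""
    # s, p, and sp are range variables: each denotes one tuple at a time,
    # not the relation it ranges over.
    return [
        (s["sname"], s["city"])
        for s in S
        if all(
            any(sp["sno"] == s["sno"] and sp["pno"] == p["pno"] for sp in SP)
            for p in P
        )
    ]


W1 = suppliers_of_all_parts(S, P, SP)
print(W1)  # [('Smith', 'London')]
```

Only Smith supplies both P1 and P2 in this sample data, so only that tuple lands in W1.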

Dual mode: In reference 5, Codd explicitly talks about what we now call the dual-mode principle: Any database operation that you can invoke interactively can also be invoked from within an application program. "[Alpha] is intended to be a sublanguage ... of the languages used by all terminal users," writes Codd. "... It is also intended to be a sublanguage of such host programming languages as PL/I, COBOL, and FORTRAN." Another first, I believe.

Catalog: The Alpha paper shows clearly that Codd was aware of the importance of the catalog. It states that the catalog itself should be structured as relations: "The catalog... can itself be a part of the data base and would then consist of...relations." And later: "All of the information regarding a new relation -- relation name, attribute and domain names, primary key specification, and so on -- must be entered in those relations that catalog the data base relations" (slightly paraphrased). And: "The access authorization constraints must be set up in those relations which describe these constraints ... A storage representation must be selected for the new relation (this may include a decision as to which attributes are to be indexed), and this descriptive information must be stored in appropriate relations."
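The idea of a self-describing catalog can be sketched in a few lines of Python. This is an illustration only: the catalog layout (CATALOG_RELATIONS, CATALOG_ATTRIBUTES) and the helper names are assumptions of mine, not anything specified in the Alpha paper. The point it tries to show is that defining a relation amounts to inserting tuples into other relations, and that one retrieval routine then serves both kinds.

```python
# The whole database, catalog included, maps relation names to lists of tuples.
database = {
    "CATALOG_RELATIONS": [],   # one tuple per relation (hypothetical layout)
    "CATALOG_ATTRIBUTES": [],  # one tuple per attribute (hypothetical layout)
}


def define_relation(db, name, attributes, primary_key):
    """Create an empty relation and describe it in the catalog relations."""
    db[name] = []
    db["CATALOG_RELATIONS"].append({"relname": name, "primary_key": primary_key})
    for attr, domain in attributes.items():
        db["CATALOG_ATTRIBUTES"].append(
            {"relname": name, "attrname": attr, "domain": domain}
        )


def select(db, relname, predicate):
    """One retrieval routine serves user relations and catalog relations alike."""
    return [t for t in db[relname] if predicate(t)]


define_relation(database, "S", {"sno": "S#", "sname": "NAME", "city": "CITY"}, "sno")

# Ask the catalog, using the ordinary retrieval routine, what attributes S has.
s_attrs = [
    t["attrname"]
    for t in select(database, "CATALOG_ATTRIBUTES", lambda t: t["relname"] == "S")
]
print(s_attrs)  # ['sno', 'sname', 'city']
```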

Indirect references: Alpha includes a dereferencing operator called PER, according to which (for example) the operation:

GET W2 PER (W1.X)

retrieves into workspace W2 the relation whose name is given in the X component of workspace W1. QUEL had a somewhat similar feature; SQL didn't until the "dynamic SQL" feature was introduced.
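As a rough analogy in Python (a sketch under my own assumptions, not a rendering of Alpha's actual semantics), indirect reference amounts to treating a relation name as data and looking it up at run time. The database dictionary and the function name get_per are hypothetical.

```python
# Hypothetical database: relation names mapped to relation values.
database = {
    "S": [{"sno": "S1", "city": "London"}],
    "P": [{"pno": "P1", "color": "Red"}],
}

# Workspace W1 holds a tuple whose X component is a relation *name*.
W1 = {"X": "P"}


def get_per(db, relation_name):
    """Fetch a copy of the relation whose name is supplied as data."""
    return [dict(t) for t in db[relation_name]]


# Analogue of: GET W2 PER (W1.X)
W2 = get_per(database, W1["X"])
print(W2)  # [{'pno': 'P1', 'color': 'Red'}]
```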

Domain migration: Reference 3 includes some remarks on what it calls domain migration. (Attribute migration would be a better term.) The basic idea is that some attribute might "migrate" from one base relation to another, loosely speaking, and we would like our queries and applications to continue to function correctly after such a change. In other words, Codd is talking here about one aspect of what we now call logical data independence. The general solution to this problem is to provide views that make the new relations look like the old ones as far as those queries and applications are concerned. And that solution is what Codd proposes (though he doesn't use the term "view" as such). Reference 5 touches on the foregoing ideas only very obliquely, not going into detail.
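The view-based solution can be illustrated with a small Python sketch. Everything here is hypothetical (the relation names S_BASE and S_CITY, the sample tuples, and the functions); it is meant only to show how a view over the new relations can present the old shape, so that a query written against the old design keeps working.

```python
# Old design (hypothetical): S(sno, sname, city).
# New design: city has migrated out of S into a separate base relation.
S_BASE = [{"sno": "S1", "sname": "Smith"}, {"sno": "S2", "sname": "Jones"}]
S_CITY = [{"sno": "S1", "city": "London"}, {"sno": "S2", "city": "Paris"}]


def view_S():
    """View that makes the new relations look like the old S (a join on sno)."""
    return [
        {**s, "city": c["city"]}
        for s in S_BASE
        for c in S_CITY
        if s["sno"] == c["sno"]
    ]


def names_in_city(S, city):
    """An 'old' query, written against the original shape of S."""
    return [s["sname"] for s in S if s["city"] == city]


# The old query runs unchanged against the view.
print(names_in_city(view_S(), "Paris"))  # ['Jones']
```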

Three-valued logic: The Alpha paper -- very unfortunately, in my opinion! -- permits a retrieval operation to include the qualifier MAYBE_TOO, to indicate that tuples should be returned for which the qualification evaluates to unknown (the paper refers to this truth value as maybe), as well as those for which it evaluates to true. In other words, Codd was suggesting that the system should be based on three-valued logic and should support some kind of null (which he refers to as "the absent value"). He didn't elaborate on this idea at all in the paper, except for:

a) Giving a simple example of inserting tuples for which not all components are specified ("the system [inserts] the absent value [for those missing components]"), and b) Remarking that "the ramifications [of such an approach] are substantial."

I fear this latter observation is all too true. Regular readers of this series will know that Codd and I disagree strongly on the merits of nulls and three-valued logic, and I'm sorry to see him even mentioning the possibility as far back as 1971. He didn't actually do anything with the idea until 1979 (reference 7); in other words, the relational model managed perfectly well without nulls for some ten years.
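For readers who want to see what a MAYBE_TOO-style qualifier would mean operationally, here is a minimal Python sketch. Since the Alpha paper doesn't elaborate, the semantics shown are my own assumption: None stands in for "the absent value," a comparison involving it yields None (playing the role of maybe), and the function names eq3 and get are invented.

```python
def eq3(a, b):
    """Three-valued equality: True, False, or None (maybe)."""
    if a is None or b is None:
        return None
    return a == b


def get(relation, qualification, maybe_too=False):
    """Return tuples whose qualification is true, plus 'maybe' ones on request."""
    result = []
    for t in relation:
        truth = qualification(t)
        if truth is True or (maybe_too and truth is None):
            result.append(t)
    return result


S = [
    {"sno": "S1", "city": "London"},
    {"sno": "S2", "city": "Paris"},
    {"sno": "S3", "city": None},  # city was not specified on insert
]


def in_london(t):
    return eq3(t["city"], "London")


true_only = [t["sno"] for t in get(S, in_london)]
with_maybe = [t["sno"] for t in get(S, in_london, maybe_too=True)]
print(true_only)   # ['S1']
print(with_maybe)  # ['S1', 'S3']
```

Note that S3 appears in neither the plain "city is London" result nor the plain "city is not London" result, which hints at the kind of ramification Codd alludes to.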

Language Levels

Before getting into details of Alpha per se, reference 5 discusses the general question of language levels: "[Data] base systems may be classified by the data model with which the user interacts and the [level of] language provided to the user for expressing this interaction." The data model can be trees, nets, or relations; the language level can be low (Codd also calls this level "procedural"), intermediate (algebra based), or high (calculus based). Again, observe that Codd is still treating the model and the operators as two different things! What's more (though the foregoing extract doesn't illustrate the point), he's also using the term "data model" to mean a model of the data in a specific database instead of a model of data in general.

Before going any further, I should perhaps mention that some confusion exists over the term "procedural"; some people use it to mean what would more properly be described as imperative. Although procedural languages will certainly be imperative, an imperative language might not be procedural. For example, we could certainly imagine a language that was based on Codd's relational algebra (and so nonprocedural) and yet was imperative in style.

Be that as it may, Codd goes on to consider the pros and cons of the three language levels and presents arguments to support his position that the calculus level is superior to the algebraic, which is in turn superior to the procedural. He notes, correctly, that these arguments "are particularly relevant to questions of intersystem compatibility and standardization"; he also notes that arguments already presented in reference 4 (regarding advantages of the relational model in general) reinforce those presented in this paper (reference 5) in favor of both the calculus and algebraic levels over the procedural.

All in all, Codd's arguments in this section of the paper display a remarkable degree of foresight. Let me summarize those arguments here, very briefly.

Protecting users from representation clutter: "The provision of a conceptually concise model of the data and a powerful, conceptually concise language for its manipulation is not just an aesthetic concern. When users are forced to make numerous choices and decisions about avoidable representation details, the consequences are manifold and costly... This is not merely an argument for protecting users from... the sordid details of physical representation; it is an equally valid argument against imposing... an over-elaborate, conceptually redundant logical representation" (somewhat paraphrased). These arguments are as forceful, valid, and correct today as they were when they first appeared! How sad that our industry seems to have lost sight of them (I am, of course, thinking here of numerous recent attempts to replace the relational model by some kind of "object model").

Descriptive vs. constructive expression of intent: Here, Codd characterizes the calculus as descriptive and the algebra as constructive (or prescriptive), and argues that the former is preferable to the latter. As I noted in my installment two months back, I don't fully agree with this position, but I do certainly agree that (as Codd says) the calculus and the algebra are both superior to the procedural approach.

Understanding and modifying programs: This point is a corollary of the previous two. "Clarity of intent is important, [especially] when an application program has to be changed [and especially when that change has to be made by] people who did not write the program [in the first place]." In this connection, Codd invites us to compare the effort involved in changing the order of two quantifiers in an Alpha program with the work needed to restructure a corresponding Codasyl program to achieve the same effect. Nice example!
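The quantifier-swapping point can be illustrated in Python, whose all and any built-ins behave like ALL and SOME over finite collections. This is not Codd's example; the data and the helper name supplies are invented. The sketch shows that in a calculus-style formulation, changing the order of two quantifiers is a small local edit, even though it changes the meaning of the request.

```python
# Hypothetical data: two suppliers, two parts, and who supplies what.
S = ["S1", "S2"]
P = ["P1", "P2"]
SP = {("S1", "P1"), ("S2", "P2")}


def supplies(s, p):
    """True if supplier s supplies part p."""
    return (s, p) in SP


# ALL P SOME S: every part is supplied by at least one supplier.
all_some = all(any(supplies(s, p) for s in S) for p in P)

# SOME S ALL P: one single supplier supplies every part.
some_all = any(all(supplies(s, p) for p in P) for s in S)

print(all_some, some_all)  # True False
```

A procedural version of the same change would typically mean re-nesting loops and rewriting the bookkeeping that tracks which condition has been satisfied so far.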

Evolutionary development of search techniques: "Adoption of the calculus approach permits successive improvements in general search algorithms to be incorporated into data base systems without impacting user programs" (I would say the same is true of the algebraic approach, too). In other words, moving performance considerations out of user programs means those programs can automatically take advantage of evolutionary -- even revolutionary -- developments in physical data access technology.
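Here is a small Python sketch of that idea, under assumptions of my own: the Database class, its create_index method, and the sample data are all hypothetical. The user program states only what it wants; the system decides between a full scan and an index lookup, and adding the index later improves the access path without any change to the user program.

```python
class Database:
    """Toy system that chooses its own access path for equality retrievals."""

    def __init__(self, relations):
        self.relations = relations
        self.indexes = {}  # (relname, attr) -> {value: [tuples]}

    def create_index(self, relname, attr):
        index = {}
        for t in self.relations[relname]:
            index.setdefault(t[attr], []).append(t)
        self.indexes[(relname, attr)] = index

    def get(self, relname, attr, value):
        index = self.indexes.get((relname, attr))
        if index is not None:
            return list(index.get(value, []))  # improved technique: index lookup
        return [t for t in self.relations[relname] if t[attr] == value]  # full scan


def user_program(db):
    """States what is wanted; says nothing about how to find it."""
    return sorted(t["sno"] for t in db.get("S", "city", "London"))


db = Database({"S": [
    {"sno": "S1", "city": "London"},
    {"sno": "S2", "city": "Paris"},
    {"sno": "S3", "city": "London"},
]})

before = user_program(db)     # answered by a full scan
db.create_index("S", "city")  # the system gains a better search technique
after = user_program(db)      # same program, now answered via the index
print(before, after)          # ['S1', 'S3'] ['S1', 'S3']
```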

Evolutionary development of data structures: This point is related to, and similar to, the previous one (it means that user programs can automatically take advantage of developments in physical storage technology too). Note: By "data structures" here, Codd really means storage structures.

Support for specialized query-update languages: "Many users need... languages specialized to their applications. The high cost of supporting [such] languages... suggests that [as much common functionality as possible] be identified and programmed once and for all... [Research on natural language query processors] suggests that the calculus-oriented language is an appropriate stepping-stone toward this goal." Again, all very true. Indeed, Codd's own later work on a natural language query system called Rendezvous (reference 6) lent further weight to this particular argument.

 

Concluding Remarks

There are two final points I'd like to make to close out this discussion of Codd's Alpha language.

1. The Alpha paper mentions several planned follow-on papers: "[The present paper] is intended to provide a framework for subsequent papers on authorization principles, search tactics, and data representation techniques" (page 2). "Detailed treatment of [the catalog] will be postponed to another paper" (page 35). "Several additional features are desirable ... [including] interlocking, access authorization, integrity preservation, virtual attributes, literal insertions ... Various types of errors and ... feedback [information] ... have ... been intentionally omitted. These aspects will be discussed in a later paper" (page 41). Sadly, I don't think any of these promised papers ever actually materialized!

2. Together with three colleagues, Codd subsequently worked on the design of a low-level subsystem called Gamma-0, which was intended to serve as a basis for implementing higher-level relational languages like Alpha (reference 2). More precisely, Gamma-0 was intended as a basis for implementing another, slightly higher-level, interface called Gamma-1, and Gamma-1 in turn was intended as a basis for implementing truly high-level languages like Alpha. The principal difference between Gamma-0 and Gamma-1 was that Gamma-0 provided a single-user interface only, while Gamma-1 provided a multi-user one. They were, of course, designed in concert: "Essential aspects of Gamma-1 [were] considered and were influential in certain Gamma-0 design choices" (reference 2).

Gamma-0 and Gamma-1 together exhibit many points of similarity with the storage subsystem of System R (reference 1) known as the RSS ("Relational Storage System"). It's therefore presumably not just coincidence that one of Codd's three coworkers on the Gamma projects, Irv Traiger, was later the manager of the RSS project during much of its life.



C. J. Date is an independent author, lecturer, researcher, and consultant specializing in relational database systems. His most recent books are Foundation for Object/Relational Databases: The Third Manifesto, coauthored with Hugh Darwen, and Relational Database Writings 1994-1997, both published by Addison-Wesley in 1998. You can reach him at iemagazine@mfi.com

 

References

1. M. M. Astrahan et al.: "System R: Relational Approach to Database Management." ACM Transactions on Database Systems 1, No. 2 (June 1976).

2. D. Bjørner, E. F. Codd, K. L. Deckert, and I. L. Traiger: "The GAMMA-0 n-ary Relational Data Base Interface: Specifications of Objects and Operations." IBM Research Report RJ1200 (April 11, 1973).

3. E. F. Codd: "Notes on a Data Sublanguage." IBM internal memo (January 19, 1970).

4. E. F. Codd: "A Relational Model of Data for Large Shared Data Banks." CACM 13, No. 6 (June 1970). Republished in Milestones of Research -- Selected Papers 1958-1982 (CACM 25th Anniversary Issue), CACM 26, No. 1 (January 1983).

5. E. F. Codd: "A Data Base Sublanguage Founded on the Relational Calculus." IBM Research Report RJ893 (July 26, 1971). Republished in Proc. 1971 ACM SIGFIDET Workshop on Data Description, Access and Control, San Diego, Calif. (November 1971).

6. E. F. Codd: "Seven Steps to Rendezvous with the Casual User." IBM Research Report RJ1333 (January 7, 1974). Republished in J. W. Klimbie and K. L. Koffeman (eds.), Data Base Management, Proc. IFIP TC-2 Working Conference on Data Base Management. New York, N.Y.: North-Holland (1974).

7. E. F. Codd: "Extending the Relational Database Model to Capture More Meaning." IBM Research Report RJ2599 (August 6, 1979). Republished in ACM Transactions on Database Systems 4, No. 4 (December 1979).






