A look at Codd's own proposal for a relational languageData Sublanguage AlphaIn last month's installment, I mentioned the fact that Codd himself originally proposed a database language ÷ "Data Sublanguage Alpha" ÷ based on the relational calculus. He described that language (albeit only informally) as early as 1970,1 and again more fully in 1971;2 in fact, Alpha was probably the first relational language to be described by anyone, anywhere. Although Alpha itself was never implemented, it was very influential on the design of subsequent languages that definitely were, including in particular QUEL and÷to a lesser extent÷SQL. Moreover, Alpha also included some useful ideas (such as quota queries) that are still not widely supported to this day. I want to take a look at some of Alpha's salient features. I'll use reference 2 as my primary source (referring to it in what follows as "the Alpha paper" or sometimes just "Codd's paper"); I'll mention reference 1 specifically only in connection with certain interesting ideas that didn't make it in to reference 2. Before I get into Alpha's specifics, let me just mention that the Alpha paper also includes a short list of "principal motivations of the relational model." WorkspacesBecause it's intended as a sublanguage specifically, Alpha requires some mechanism for exchanging data with the host language in which it happens to be embedded. Workspaces serve this purpose. The basic idea is that data retrieved from the database via an Alpha operation Alpha's reliance on workspaces effectively means that it embraces the concept of a two-level store, meaning that the user sees a sharp distinction between data in main memory and data in the database. This distinction ÷ which is, of course, a consequence of the fact that Alpha is a sublanguage specifically ÷ has been perpetuated in many other languages, including SQL. It has also been the subject of some criticism.6 In <ital>The Third Manifesto<unital>,5 Hugh Darwen and I suggest that users should be able to operate in terms of a one-level store; data transfers among memory levels are an implementation detail and should be hidden from the user. An Alpha OverviewCodd states explicitly that he "does not attach any special importance to the syntax of Alpha"; however, he does have to use some syntax in describing the language, and I'll use that same syntax here. I'll show keywords of that syntax in bold in order to set them off from surrounding text. The following feature summary gives some idea of the scope of the Alpha language: - Introducing and removing domain names: DOMAIN, DROP - Declaring and destroying relations: RELATION, DROP - Declaring range ("tuple") variables: RANGE - Retrieval: GET - Insertion: PUT - Modification: HOLD, UPDATE, RELEASE - Deletion: DELETE - "Piped mode": OPEN, CLOSE Let's take a closer look at each of these features. Data Definition OperationsI'll start with domains. The only domains mentioned in the Alpha paper are "simple" ones ÷ that is, ones represented in terms of "simple" data types such as CHAR(5), NUM(4,0). In other words, domains seem to be little more than what SQL3 calls "DISTINCT types"11 instead of being full abstract data types ÷ ADTs ÷ in all their glory. (On the other hand, there's nothing in the paper to suggest they can't be full ADTs!) Here's an example of defining (and subsequently destroying) an Alpha domain: DOMAIN CITY CHAR(15) DROP CITY One good thing about the paper is that ÷ unlike some of those that preceded it ÷ it does make a clear distinction between domains and attributes. In fact, it adopts a convention that attribute names take the specific form [role-name_]domain-name where the optional role name is a qualifier that's needed only if the same domain is used more than once in the same relation. "One important advantage of [this scheme is that it permits] systematic handling of all-occurrence transformations... An example...is a sweeping change of all part numbers wherever they occur·" Here, Codd seems to be at least partly overlooking the role the catalog could play in such a transformation ÷ an odd omission, considering that he does mention the catalog explicitly later in the paper. (Indeed, one of the many important firsts that can be chalked up to the Alpha paper is the suggestion that the catalog itself should be structured in the same way as the rest of the data: namely, as relations.) In other words, I'm not convinced that the attribute naming convention is a good one; indeed,it clearly isn't when we take into account the question of attribute names for derived relations. Turning now to relations÷more precisely, to what Codd calls "time-varying" relations÷note first that the bulk of the paper makes a tacit assumption that all such relations are base relations specifically; such matters as view definition, and especially view updating, receive essentially no attention (though the earlier draft1 does include a brief discussion of a view-related phenomenon it calls "domain migration," which I'll discuss in my next installment). The definition of such a relation specifies the relation name, the attribute names (and the corresponding domains), and the primary key. Here's an example of such a definition (together with a corresponding DROP): RELATION S (S#, SNAME_NAME, STATUS, CITY) KEY S# DROP S Data Manipulation OperationsRetrieval: Here's an Alpha retrieval example ("Get supplier names and cities for suppliers who supply all parts"): RANGE S SX GET W1 (SX.SNAME, SX.CITY) : In this example, W1 is the name of the applicable workspace; you can read the colon as "where" or "such that," and ALL and SOME denote the universal and the existential quantifier, respectively. Actually, reference 2 uses the standard logic symbols for ALL and SOME (also for the connectives AND, OR, and so on); reference 1, in contrast, uses the keywords, which I prefer. Reference 2 also lets you move the quantifiers up into the RANGE declarations, as here: RANGE S SX GET W1 (SX.SNAME, SX.CITY) : SPX.S# = SX.S# AND SPX.P# = PX.P# Note: Reference 2 never mentions the possibility of a range variable ranging over, for example, the union of two base relations, or indeed over anything more complex than a single base relation. (I discussed this possibility in last month's column.) Also, the earlier paper1 uses the keyword DUMMY in place of RANGE on the grounds that (as I also discussed last installment) you can think of a range variable as a kind of "dummy" variable. GET operations also permit: - References to relations in workspaces just as if they were in the database. Such references use the workspace name for qualification purposes as if it were a relation name. - A tuple specification ordering for the result (analogous to SQL's ORDER BY), using the keywords UP and DOWN. - A quota specification to limit the number of tuples retrieved. The paper does not, however, address the question of what should happen if the result relation is not uniquely determined.10 - The use of certain truth-valued functions (TOP and BOTTOM) in the qualification. TOP and BOTTOM are Alpha's analogs of the functions IS_NTH_LARGEST and IS_NTH_SMALLEST described in reference 10. - The use of certain aggregate functions (COUNT, TOTAL, MAX, MIN, AVERAGE) in the target-commalist (the term "aggregate functions" isn't used, however). Note: The original version of the paper overlooked the fact that, at least for TOTAL and AVERAGE, the argument is a bag, not a set (duplicates shouldn't be eliminated before the aggregation is performed). Codd later acknowledged this error.3 As an aside, I remark that Alpha does not directly permit aggregate functions to appear in the qualification; instead, it makes use of a shorthand version called image functions (which can also be used in the target-commalist). I'll skip the details here, since frankly I don't think image functions were one of Alpha's better ideas. Indeed, they might well have been the source of the strange syntax used in both QUEL and SQL for aggregate function invocations, and hence for the complexities caused in both languages by that unorthodox syntax. (The complexities in question are caused by the fact that function arguments are determined, in part, by reference to context, instead of ÷ as would be more orthodox ÷ being fully determined by whatever's specified within the parentheses immediately following the function name.)7,8 Alpha also permits retrieval to be performed a tuple at a time via piped mode (analogous to SQL's FETCH via a cursor). For example (assuming the same RANGE declarations as before): OPEN GET W1 (SX.SNAME, SX.CITY) : ... ... Each execution of GET W1 here retrieves the next tuple from the result of the query in the OPEN. (Note the UP ordering specification in that OPEN.) Of course, piped mode opens up the possibility of subverting the system, because you can use it to perform what is essentially "manual navigation" (instead of the automatic navigation normally preferred in a relational system).4 But piped mode or something like it is essential if the language is meant to be a sublanguage specifically, as Alpha is. Insertion: The PUT operation transfers a set of tuples from a specified workspace to a specified relation. For example: PUT W2 SP ("Insert the tuples in workspace W2 into relation SP"). An ordering can be specified too, to inform the system that the tuples to be inserted are ordered in a certain way in the workspace. Actually, I feel a little uncomfortable with this particular notion; it seems to me to mix logical and physical ideas too much.
"Piped insertion" is also possible: OPEN PUT W2 S ... ... Modification: Here's a data modification example ("Move all Paris suppliers to Rome"): HOLD W3 (SX.S#, SX.CITY) : SX.CITY = 'Paris' Alpha requires tuples that are to be updated to be retrieved first via HOLD. (There's no analog of SQL's out-of-the-blue "searched UPDATE.") The HOLD target-commalist must involve exactly one range variable (SX in the example), meaning the UPDATE will be applied to just one (base) relation. (As Codd puts it, "[the single range variable restriction] avoids unwarranted complexity." Just so!) If the user decides not to perform the UPDATE after all, the operation RELEASE W3 can be executed to release the held tuples. Note: Although reference 2 doesn't say as much, the HOLD target-commalist must include all primary key attributes of the relation you're modifying so that the system can know which tuples you're updating.3 Primary key attributes themselves cannot be updated directly via a HOLD...UPDATE sequence, but you can use deletions and insertions to achieve an analogous effect. Piped mode is also available for HOLD...UPDATE/RELEASE (analogous to SQL's UPDATE CURRENT). Oddly, there's no analog of DELETE CURRENT. The omission is unimportant, however, because ÷ as I've argued elsewhere9 ÷ modifying or deleting (or indeed inserting) a single tuple is a bad idea anyway. Update operations should always be done set-at-a-time. You can probably tell that the modification portions of Alpha aren't exactly its strongpoint; indeed, the foregoing discussions raise many questions the paper simply doesn't address. Deletion: DELETE operations are basically straightforward, apart from the criticisms already mentioned that apply to update operations in general. Here's an example: DELETE SX: SX.S# = 'S1' In my next installment, I'll conclude my discussion about data sublanguage Alpha. C. J. Date is an independent author, lecturer, researcher, and consultant, specializing in relational database systems. His most recent books are Foundation for Object/Relational Databases: The Third Manifesto, cowritten with Hugh Darwen, and Relational Database Writings 1994-1997, both published by Addison-Wesley in 1998. Correspondence may be sent to him in care of Intelligent Enterprise, 411 Borel Ave., Ste. 100, San Mateo, Calif. 94402.
REFERENCES
1. E. F. Codd: "Notes on a Data Sublanguage." IBM internal memo (January 19, 1970). 2. E. F. Codd: "A Data Base Sublanguage Founded on the Relational Calculus." IBM Research Report RJ893 (July 26, 1971). Republished in Proc. 1971 ACM SIGFIDET Workshop on Data Description, Access and Control, San Diego, Calif. (November 1971). 3. E. F. Codd: Private communication (March 14, 1972). 4. E. F. Codd and C. J. Date: "Interactive Support for Nonprogrammers: The Relational and Network Approaches." IBM Research Report RJ1400 (June 6, 1974). Republished in Randall J. Rustin (ed.), Proc. ACM SIGMOD Workshop on Data Description, Access, and Control, Vol. II, Ann Arbor, Michigan (May 1974). Also in C. J. Date, Relational Database: Selected Writings. Reading, Mass.: Addison-Wesley (1986). See also E. F. Codd: "The Significance of the SQL / Data System Announcement." IBM Report (January 30, 1981). 5. Hugh Darwen and C. J. Date: "The Third Manifesto." ACM SIGMOD Record 24, No. 1 (March 1995). Version Two of this document is in preparation at the time of writing. 6. C. J. Date: "An Introduction to the Unified Database Language (UDL)." Proc. 6th International Conference on Very Large Data Bases, Montreal, Canada (September/October 1980). Republished (in considerably revised form) in C. J. Date, Relational Database: Selected Writings, Addison-Wesley (1986). 7. C. J. Date: "A Critique of the SQL Database Language." ACM SIGMOD Record 14, No. 3 (November 1984). Republished in C. J. Date, Relational Database: Selected Writings. Reading, Mass.: Addison-Wesley (1986). 8. C. J. Date: A Guide to INGRES. Reading, Mass.: Addison-Wesley (1987). 9. C. J. Date: "It's All Relations!" Database Programming & Design 8, No. 1 (January 1995). 10. C. J. Date: "Quota Queries." Database Programming & Design 9, Nos. 7-9 (July-September 1996). 11. C. J. Date (with Hugh Darwen): A Guide to the SQL Standard (4th ed). Reading, Mass.: Addison-Wesley (1997). |
Most Popular This Week
IE Weekly Newsletter
Subscribe to the newsletter
|
| |||||||||||||||||||||||||||||||




















