|
|
Type hierarchies are an intuitively convenient way of depicting subtype/supertype relationships in an inheritance model
Type Hierarchies:Beginning An Investigation into Type Inheritance
By C.J. Date
Some readers will be aware that this column has been "on hold" for the past several months. The reason isn't important here, but it has led to a small problem, as follows. At the time the column was put on hold, I'd just begun another miniseries, having to do with (and meant as a gentle introduction to) the type inheritance model Hugh Darwen and I defined in our book The Third Manifesto (Addison-Wesley, 1998). The trouble is, that model has evolved somewhat in the interim! It has changed so much, in fact, that some of the material in the first few installments of the series is now already out of date.
Rather than scrap the first few installments and start again, however, I decided to let the series proceed as originally planned, building on what I've already covered, but warning you when I present material that we've subsequently revised. In fact, I think it can be positively helpful to see where we went down a blind alley, and why what looked like a promising approach at first didn't work out on closer inspection. In this way, the series will not only (eventually) describe our model as it finally wound up, it will also trace the historical developments that brought us to our final position.
THE RUNNING EXAMPLE
I'll begin by introducing a running example that illustrates many of the points I'll make over the next few columns. (See Figure 1.) As that figure suggests, the example involves a set of geometric types -- PLANE_FIGURE, ELLIPSE, CIRCLE, POLYGON, and so on -- each of which is (in accordance with the definition I gave in the previous installment for the term type) a named set of values. For example, there's a set of values named ELLIPSE, and every value in that set is some specific ellipse. The types are arranged into what's called a type hierarchy or (more generally) a type graph.
The general concept of a type hierarchy should be pretty much self-explanatory. The type hierarchy of Figure 1, for example, is meant to show, among other things, that type RECTANGLE is a subtype of type POLYGON, which means that all rectangles are polygons, but the converse isn't true (some polygons aren't rectangles). Further, it also means that all properties that apply to polygons in general apply to -- that is, are inherited by -- rectangles in particular, but the converse isn't true (rectangles have properties of their own that don't apply to polygons in general). Recall from the previous installment that "properties" here essentially means operators and type constraints; thus, operators and type constraints that apply to polygons apply to rectangles a fortiori (because a rectangle is a polygon), but some operators and type constraints that apply to rectangles don't apply to mere polygons.
Here in outline are Tutorial D definitions for some of the types in the type hierarchy of Figure 1. (See Listing 1.) Note the type constraints in particular. (Just as an aside, I should mention that Tutorial D syntax is one of the things we've changed since the previous installment was published.)
Now the system knows, for example, that CIRCLE is a subtype of ELLIPSE, and hence that operators and constraints that apply to ellipses in general apply to circles in particular.
I should elaborate briefly on the type definitions for types ELLIPSE and CIRCLE. Basically, I'm assuming for simplicity that ellipses are always oriented such that their major axis is horizontal and their minor axis vertical; thus, ellipses might possibly be represented by their semiaxes a and b (and their center). By contrast, circles might possibly be represented by their radius r (and their center). I'm also assuming for simplicity that all ellipses are such that their major semiaxis a is never shorter than their minor semiaxis b (in other words, ellipses are always "short and fat," never "tall and thin").
Listing 2 shows in outline definitions for some of the operators associated with some of the types in Figure 1's type hierarchy
All of these operators except THE_R apply to values of type ELLIPSE and hence a fortiori to values of type CIRCLE as well. THE_R, by contrast, applies to values of type CIRCLE only.
POINTS ARISING
Type hierarchies like that of Figure 1 are not really part of our inheritance model as such -- they're merely an intuitively convenient way of depicting subtype/supertype relationships (which are). Indeed, type hierarchies play a role in our inheritance model analogous to that played by tables in the relational model: Tables per se are not really part of the relational model, they're merely an intuitively convenient way of depicting relations (which are).
I should mention too that type hierarchies are referred to in the literature by a variety of different names, including but not limited to the following:
Class hierarchies (on the grounds that types are sometimes called classes, especially in the object world)
Generalization hierarchies (on the grounds that, for example, an ellipse is a generalization of a circle)
Specialization hierarchies (on the grounds that, for example, a circle is a specialization of an ellipse)
Inheritance hierarchies (on the grounds that, for example, circles inherit properties from ellipses)
ISA hierarchies (on the grounds that, for example, every circle "IS A" ellipse).
In this series, I'll stick with the term "type hierarchy" (at least so long as we limit our attention to single inheritance only; for obvious reasons, type hierarchies per se won't be sufficient when we get to multiple inheritance, when we'll have to extend them to become type lattices).
TERMS AND DEFINITIONS
I need to introduce a number of additional definitions and new terms (most of them fairly straightforward, however) before I can go much further. So here goes:
- A supertype of a supertype is itself a supertype; for example,
POLYGON is a supertype of SQUARE.
- Every type is a supertype of itself; for example,
ELLIPSE is a supertype of ELLIPSE. Note: We adopt this perhaps slightly counterintuitive definition for reasons of convenience. It has the effect of simplifying several of the definitions and concepts we'll be dealing with over the next few installments.
- If
A is a supertype of B, and A and B are distinct, then A is a proper supertype of B; for example, POLYGON is a proper supertype of SQUARE (it's also a supertype of POLYGON, of course, but not a proper one).
Analogous definitions apply to subtypes, of course. Thus:
- A subtype of a subtype is itself a subtype; for example,
SQUARE is a subtype of POLYGON.
- Every type is a subtype of itself; for example,
ELLIPSE is a subtype of ELLIPSE.
- If
B is a subtype of A, and B and A are distinct, then B is a proper subtype of A; for example, SQUARE is a proper subtype of POLYGON (it's also a subtype of SQUARE, of course, but not a proper one).
Next:
- If
A is a proper supertype of B, and there is no type C that's both a proper subtype of A and a proper supertype of B, then A is an immediate supertype of B, and B is an immediate subtype of A. For example, RECTANGLE is an immediate supertype of SQUARE, and SQUARE is an immediate subtype of RECTANGLE. (POLYGON is a proper supertype of SQUARE, but not an immediate one; likewise, SQUARE is a proper subtype of POLYGON, but not an immediate one.) Incidentally, note therefore that in a Tutorial D type definition the keyword IS means, quite specifically, "is an immediate subtype of."
- A root type is a type with no proper supertype; for example,
PLANE_FIGURE is a root type.
- A leaf type is a type with no proper subtype; for example,
CIRCLE is a leaf type. Note: We don't insist that every value be a value of some leaf type (for example, some ellipses are not circles). To put it another way, we don't insist that ELLIPSE have another proper subtype called NONCIRCLE and that every value of type ELLIPSE be either of type CIRCLE or of type NONCIRCLE. Of course, we don't prohibit such an arrangement, either, but there are reasons (which I'll discuss later in this series) why such a state of affairs might be undesirable in practice.
- Every proper subtype has exactly one immediate supertype (of course, this particular rule will no longer apply when we get to multiple inheritance).
- So long as (a) there's at least one type and (b) there aren't any cycles -- that is, there isn't any sequence of types
T1, T2, T3, ..., Tn such that T1 is an immediate subtype of T2, T2 is an immediate subtype of T3, ..., and Tn is an immediate subtype of T1 -- then at least one type must be a root type. (In fact, there can't be any cycles. I'll leave it to you to figure out why this must be so!)
Note: We don't assume there's just one root type (equivalently, we don't assume there's just one type hierarchy). If there are two or more, however, we can always invent some kind of "system" type that's an immediate supertype for all of them, thereby tying all of the separate hierarchies together into one. So there's no loss of generality in assuming just one root type. Indeed, some commercial object systems come ready equipped with such a type, often called OBJECT (because "everything's an object").
THE DISJOINTNESS ASSUMPTION
Now I want to introduce another important simplifying assumption. To be specific, I'm going to assume until further notice that if T1 and T2 are distinct root types or distinct immediate subtypes of the same supertype (implying in particular that neither is a subtype of the other), then they're disjoint -- that is, no value is of both type T1 and type T2. For example, no value is both an ellipse and a polygon.
Now, at first sight this assumption might seem both intuitive and reasonable. Unfortunately, however, there are many situations where it isn't reasonable at all, as you'll see later in this series. Nevertheless, I'll stay with the assumption as stated until further notice. Observe the following important logical consequences:
- Distinct type hierarchies are disjoint.
- Distinct leaf types are disjoint.
- Every value has exactly one most specific type. For example, a given value might be "just an ellipse" and not a circle, meaning its most specific type is
ELLIPSE (in the real world, some ellipses aren't circles). In fact, to say that the most specific type of some value v is T is to say, precisely, that the set of types possessed by v is the set of all supertypes of T (a set that includes T itself, of course).
One reason the disjointness assumption is desirable is that it avoids certain operator invocation ambiguities that might otherwise arise. For suppose, contrariwise, that some value v could be of both type T1 and type T2, neither of which is a subtype of the other. Suppose further that an operator named Op has been defined for type T1 and another operator with the same name, Op, has been defined for type T2 (in other words, Op is polymorphic, a concept I'll discuss next installment). Then an invocation of Op with argument v would be ambiguous.
AN IMPORTANT ASIDE
Although I'm primarily concerned with a model of inheritance, not with implementation matters, the sad fact is that you do have to understand certain implementation issues in order to understand the overall concept of inheritance properly -- and now I come to one such issue. The point is this:
- The fact that
B is a subtype of A does not imply that the actual (hidden) representation of B values is the same as that of A values! For example, ellipses might be actually represented by their center and semiaxes, circles by their center and radius.
(As a matter of fact, there's no logical reason why all values of the same type have to have the same actual representation. For example, some points might be represented by Cartesian coordinates and some by polar coordinates; some temperatures might be represented in Celsius and some in Fahrenheit; some integers might be represented in decimal and some in binary; and so on.)
This point will turn out to be important in the next installment, when I'll discuss the question of code reuse.
PUZZLE
It's been a long time since I included a puzzle corner problem in my column, but I have one for you this time. In general, any given type hierarchy includes several subhierarchies that can be regarded as type hierarchies in their own right. For example, the hierarchy obtained from that of Figure 1 by deleting types PLANE_FIGURE, ELLIPSE, and CIRCLE (only) can be regarded as a type hierarchy in its own right, and so can the one obtained by deleting types CIRCLE, SQUARE, and RECTANGLE (only). On the other hand, the hierarchy obtained by deleting ELLIPSE (only) can't be regarded as a type hierarchy in its own right (at least, not one that can be derived from that of Figure 1), because type CIRCLE "loses some of its inheritance," as it were, in that hierarchy.
The question is: How many distinct type hierarchies are there altogether in Figure 1? I'll provide an answer to this question next time.
C. J. Date is an independent author, lecturer, researcher, and consultant, specializing in relational database systems. His most recent books are Foundation for Object/Relational Databases: The Third Manifesto (coauthored with Hugh Darwen), Relational Database Writings 1994-1997, and An Introduction to Database Systems (7th edition), all published by Addison-Wesley. Correspondence may be sent to him in care of Intelligent Enterprise <iemagazine@cmp.com.>
|
Listing 1
|
TYPE PLANE_FIGURE ... ;
TYPE ELLIPSE
IS PLANE_FIGURE
POSSREP { A LENGTH, B LENGTH, CTR POINT
CONSTRAINT A > B } ;
TYPE CIRCLE
IS ELLIPSE
CONSTRAINT THE_A ( ELLIPSE ) = THE_B ( ELLIPSE )
POSSREP { R = THE_A ( ELLIPSE ) ,
CTR = THE_CTR ( ELLIPSE ) } ;
|
|
Listing 2
|
OPERATOR AREA { E ELLIPSE } RETURNS AREA ;
/* "area of" -- note that AREA is the name of both */
/* the operator itself and the type of the result */ ... ;
END OPERATOR ;
OPERATOR THE_A { E ELLIPSE } RETURNS LENGTH ;
/* "the a semiaxis of" */ ... ;
END OPERATOR ;
OPERATOR THE_B { E ELLIPSE } RETURNS LENGTH ;
/* "the b semiaxis of" */ ... ;
END OPERATOR ;
OPERATOR THE_CTR { E ELLIPSE } RETURNS POINT ;
/* "the center of" */ ... ;
END OPERATOR ;
OPERATOR THE_R { C CIRCLE } RETURNS LENGTH ;
/* "the radius of" */ ... ;
END OPERATOR ;
|
FIGURE 1: A Sample type hierarchy.

Copyright © 2004 CMP Media Inc. ALL RIGHTS RESERVED
No Reproduction without permission
|