Guide to the TechWeb Network

Intelligent Enterprise

Better Insight for Business Decisions

Intelligent Enterprise - Better Insight for Business Decisions
search Intelligent Enterprise
Advanced Search
RSS
Webcasts
Whitepapers
Subscribe
Home



Inheritance and Comparison Operations


The effects of polymorphism and substitutability on comparisons

by C.J. Date

I first examined comparison operations -- equality comparisons, to be more specific -- in the previous installment of this column, when I discussed the TREAT UP operator. The general rule for such comparisons follows immediately from The Principle of Value Substitutability. It runs as follows. Consider the comparison:

X = Y

where X and Y are scalar expressions. The declared types DT(X) and DT(Y) must have a common supertype, otherwise the comparison is illegal (this is a compile-time check). If the comparison is legal, its effect is to return true if the most specific type MST(X) is equal to the most specific type MST(Y) and the value v(X) is equal to the value v(Y), and false otherwise. Note in particular that two values can't possibly "compare equal" if their most specific types are different (indeed, if their most specific types are different, then the values must be different, since any given value has exactly one most specific type).

By the way, note that DT(X) and DT(Y) will certainly have a common supertype if either is a supertype of the other. It follows that, for example, a comparison of the form

E = C

(where the declared types of E and C are ELLIPSE and CIRCLE, respectively) is certainly legal.






Relational Algebra Operations

Equality comparisons are involved, implicitly or explicitly, in many of the operators of the relational algebra. And when supertypes and subtypes are involved, it turns out that certain of those operators exhibit behavior that might be thought (at least at first blush) a trifle counterintuitive. Consider the relations RX and RY shown in Fig. 1. Observe that the sole attribute A in RX is of declared type ELLIPSE and its counterpart A in RY is of declared type CIRCLE. (I adopt the convention in the figure that values of the form Ei are ellipses that aren't circles and values of the form Ci are circles. Most specific types are shown in lower case italics.)

Now consider the join of RX and RY, RJ say (see Fig. 2). Clearly, every A value in RJ will necessarily be of type CIRCLE (because any A value in RX whose most specific type is merely ELLIPSE can't possibly "compare equal" to any A value in RY). Thus, it might be thought that the declared type of attribute A in RJ should be CIRCLE, not ELLIPSE. But consider the following:






  • Since RX and RY each have A as their sole attribute, RX JOIN RY reduces to RX INTERSECT RY. In such circumstances, therefore, the rule regarding the declared type of the result attribute for JOIN must obviously reduce to the analogous rule for INTERSECT.
  • RX INTERSECT RY in turn is logically equivalent to RX MINUS (RX MINUS RY). Let the result of the second operand here -- that is, RX MINUS RY -- be RZ. Then it should be clear that:
    • RZ will include some A values of most specific type ELLIPSE, in general, and so the declared type of attribute A in RZ must be ELLIPSE.
    • The original expression thus reduces to RX MINUS RZ, where the declared type of attribute A in both RX and RZ is ELLIPSE, and hence yields a final result in which the declared type of attribute A must obviously be ELLIPSE once again.
  • It follows that the declared type of the result attribute for INTERSECT, and therefore for JOIN as well, must be ELLIPSE, not CIRCLE -- even though (to repeat) every value of that attribute must in fact be of type CIRCLE!

Now let's turn to the relational difference operator, MINUS. First, consider RX MINUS RY. It should be clear that some A values in the result of this operation will be of type ELLIPSE, not CIRCLE, and so the declared type of A in that result must be of type ELLIPSE. But what about RY MINUS RX? Clearly, every A value in the result of this latter operation will be of type CIRCLE, and so again it might be thought that the declared type of A in that result should be CIRCLE, not ELLIPSE. However, observe that RX INTERSECT RY is logically equivalent, not only to RX MINUS (RX MINUS RY) as already discussed, but also to RY MINUS (RY MINUS RX); given this fact, it is easy to see that specifying the declared type of A in the result of RY MINUS RX to be CIRCLE leads to a contradiction. It follows that the declared type of the result attribute for MINUS too must be ELLIPSE, not CIRCLE, even in the case of RY MINUS RX where every value of that attribute must in fact be of type CIRCLE.

Finally, let's consider RX UNION RY. Here it should be obvious that the result will include some A values of most specific type ELLIPSE, in general, and so the declared type of attribute A in that result must necessarily be ELLIPSE. Thus, the declared type of the result attribute for UNION too must be ELLIPSE (but this particular case, unlike the JOIN, INTERSECT, and MINUS cases, can hardly be described as counterintuitive).

Here then is the general rule:

  • Let rx and ry be relations with a common attribute A, and let the declared types of A in rx and ry be DT(Ax) and DT(Ay), respectively. Consider the join of rx and ry (necessarily over A, at least in part). DT(Ax) and DT(Ay) must have a common supertype, otherwise the join is illegal (this is a compile-time check). If the join is legal, then the declared type of A in the result is the most specific common supertype of DT(Ax) and DT(Ay).
  • Analogous remarks apply to union, intersection, and difference: In every case, (a) corresponding attributes of the operands must be such that their declared types have a common supertype, and (b) the declared type of the corresponding attribute in the result is the most specific common supertype of those two declared types.

What About Coercions?

To repeat a point I've discussed previously in this series, when we assign (for example) a value of most specific type CIRCLE to a variable of declared type ELLIPSE, what does not happen is that the circle value is implicitly converted to become "just an ellipse." In the same way, when we compare (for example) a value of most specific type CIRCLE and a value of most specific type ELLIPSE, what does not happen is that the circle value is implicitly converted to become "just an ellipse." In fact, The Third Manifesto expressly suggests that coercions -- that is, such implicit type conversions -- be prohibited.

Now, we adopt this rather conservative position in the Manifesto partly on the basis of claims that coercions undermine the objective of value substitutability. Essentially (and to repeat myself somewhat), the argument goes like this:

  • Type inheritance means among other things that, if we define (say) type CIRCLE to be a subtype of supertype ELLIPSE, then:
    • By definition, every circle is also an ellipse.
    • Therefore, any code that works for ellipses should also work for circles (because circles are ellipses).
    • In other words, it should be possible to invoke that code and pass it a circle instead of an ellipse and have it still work.
    • Thus, wherever an ellipse is expected, it should be possible to substitute a circle -- The Principle of Value Substitutability.
  • Even if a circle is substituted for an ellipse in this way, it's required that it remains a circle and retain its circle-specific properties (for example, the property of having a radius).
  • If passing a circle when an ellipse was expected were to cause that circle to be coerced to type ELLIPSE, then it would cease to be a circle and would lose its circle-specific properties (for example, ellipses in general do not have a radius).
  • Therefore coercions undermine the objective of substitutability.

But observe now the following arguably undesirable consequence of this position. Let (say) 5.0 and 5 be values of most specific types RATIONAL and INTEGER, respectively. Then -- perhaps surprisingly -- the comparison

5.0 = 5

will either (a) be illegal (if INTEGER is not a subtype of RATIONAL) or (b) give false (otherwise). This state of affairs might be seen as an argument in favor of permitting coercions after all. To elaborate:

  • If INTEGER is not a subtype of RATIONAL:
    • The concept of value substitutability doesn't arise (that is, integers can't be substituted for rationals);
    • Therefore the idea of converting integers to rationals seems useful;
    • Therefore it seems reasonable to assume that an operator to perform such conversions will have been defined;
    • Therefore no harm is done (no harm to substitutability, that is) by invoking that conversion operator implicitly in an INTEGER-to-RATIONAL comparison or in an INTEGER-to-RATIONAL assignment. In particular, value substitutability isn't undermined because, as already noted, the concept simply doesn't apply.
  • On the other hand, if INTEGER is a subtype of RATIONAL:
    • Integers can be substituted for rationals (that is, the concept of value substitutability does apply);
    • Therefore there's little point in defining an operator to convert integers to rationals (after all, integers would be rationals already);
    • Therefore it seems reasonable to assume that no such operator will have been defined;
    • Therefore the question of invoking such an operator implicitly doesn't arise, meaning that (again) value substitutability isn't undermined. (The system obviously can't invoke an operator Op implicitly if Op doesn't exist!)

Perhaps I should add that it's my feeling that good type design will tend to reduce the number of type conversions needed anyway, implicit or otherwise. For example, consider temperatures. A good design would involve a single TEMPERATURE type and THE_ operators to expose a Celsius representation, a Fahrenheit representation, and so on. By contrast, a bad design would involve lots of different types -- CELSIUS, FAHRENHEIT, and so on -- and hence lots of conversion operators also.

Of course, we do all know that there are other sound reasons for rejecting the notion of coercions. To be specific, it's well known that coercions can lead to bugs (sometimes ones that are hard to find, too). But my intention here is simply to investigate the question of whether inheritance specifically is an argument against coercions -- and I have to conclude that it's not.

By the way, I'd like to note in passing that if INTEGER is a subtype of RATIONAL, then we have a good example in which subtype and supertype have very different physical actual representations.

Other Comparison Operators

What about comparison operators other than equality? "Not equals" is easy: The expression "x ¹ y" is defined to be equivalent to "NOT (x = y)." As for "<" and ">," well, of course these operators don't apply to all types (for example, "<" clearly makes no sense for type POINT). Types for which the operators do make sense are referred to as ordered types. Even here, the semantics of "<" and ">" are up to the type designer (they aren't prescribed by the model), though it would seem prudent to define "x > y" to be equivalent to "NOT (x £ y)." Note, however, that it might not be necessary for MST(x) and MST(y) to be the same in order for such comparisons to give true. For example, MST(4) and MST(5) might be EVEN_INTEGER and ODD_INTEGER respectively (where EVEN_INTEGER and ODD_INTEGER are both proper subtypes of INTEGER), and yet "4 < 5" might very reasonably give true.



C. J. Date is an independent author, lecturer, researcher, and consultant, specializing in relational database systems (a field he helped pioneer). His most recent books are Foundation for Future Database Systems: The Third Manifesto (2nd edition, co-authored with Hugh Darwen); The Database Relational Model: A Retrospective Review and Analysis; WHAT Not HOW: The Business Rules Approach to Application Development; and An Introduction to Database Systems (7th edition), all of them published by Addison-Wesley in 2000.





IE Weekly Newsletter
Subscribe to the newsletter
    Email Address







InformationWeek Business Technology Network
InformationWeekInformationWeek 500InformationWeek 500 ConferenceInformationWeek AnalyticsInformationWeek CIO
InformationWeek EventsInformationWeek ReportsInformationWeek MagazinebMightyByte and SwitchDark Reading
Digital LibraryIntelligent EnterpriseInternet EvolutionNetwork ComputingNo Jitter
space
Techweb Events Network
InteropVoiceConWeb 2.0 ExpoWeb 2.0 SummitEnterprise 2.0 ConferenceMobile Business ExpoSoftware ConferenceCSI - Computer Security Institute
Black HatGTECEnergy CampMashup CampStartup Camp
space
Light Reading Communications Network
Light ReadingLight Reading EuropeUnstrungLight Reading's Cable Digital NewsConstantinopleInternet Evolution
Heavy ReadingLight Reading Live!Light Reading InsiderEthernet ExpoOptical ExpoTeleco TVTower Technology Summit
space
Financial Technology Network
Advanced TradingBank Systems & TechnologyInsurance & TechnologyWall Street & TechnologyAccelerating Wall StreetBank Systems & Technology Executive SummitBuyside Trading SummitInsurance & Technology Executive Summit
space
Microsoft Technology Network
MSDN MagazineTechNetThe Architecture Journal
space