In the last issue ("There Are No Guarantees," August 1, 2000), I proposed four kinds of useful data warehouse business rules, which I arranged in ascending order of power. These rules included simple data formats, relationships between the keys of connected tables, declarations of entity relationships, and complex business logic. Unfortunately, this ordering is also an ordering by the degree of difficulty involved in applying the business rules directly to implementing data warehouses. When we create tables, the database enforces data formats, the first and simplest kind of business rules. The database also enforces foreign key/primary key relationships (the second kind of business rules) when we create tables and when we declare referential integrity constraints. But our primary database systems do not directly support the third and fourth kinds of business rules.
The third kind of business rule, the declaration of entity relationships, takes many forms, but it is often of the form "Part is Supplied by Supplier" when we declare the link between the Part entity and the Supplier entity in the business model. In an entity-relationship (E/R) diagram, the "supplied" verb is only an annotation on the diagram. There are no semantics in support of this verb. If the part-supplier relationship is many-to-one, meaning that each part is supplied by one and only one supplier, then we explicitly define a hierarchy that key relationships between the tables can enforce. A good CASE tool will transform this diagrammatic notation into the proper key relationships in the DBMS when we create the tables. But if the relationship is many-to-many, meaning that more than one supplier may supply each part, then this business rule creates no enforceable constraint and remains just a notation on the diagram. In my article, I criticized E/R modeling for its inability to add much value to this situation because its vocabulary was limited to one-to-one, many-to-one, and many-to-many relationships.
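The first two kinds of rules, and the many-to-one Part and Supplier case, can be sketched in a few lines. What follows is only an illustration using Python's built-in sqlite3 module; the table and column names are hypothetical, and SQLite stands in for whatever DBMS the warehouse actually uses.

```python
import sqlite3

# In-memory database, used purely for illustration.
conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical tables: NOT NULL stands in for a simple format rule,
# and the REFERENCES clause declares the many-to-one key relationship
# (each part is supplied by exactly one supplier).
conn.executescript("""
    CREATE TABLE supplier (
        supplier_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE part (
        part_id     INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        supplier_id INTEGER NOT NULL REFERENCES supplier(supplier_id)
    );
""")

conn.execute("INSERT INTO supplier VALUES (1, 'Acme')")
conn.execute("INSERT INTO part VALUES (100, 'Widget', 1)")


def try_insert_orphan_part():
    """Try to insert a part whose supplier does not exist."""
    try:
        conn.execute("INSERT INTO part VALUES (101, 'Gadget', 999)")
    except sqlite3.IntegrityError:
        return False  # the database rejected the row
    return True
```

Calling try_insert_orphan_part() should return False, because the database itself refuses a part with no matching supplier. A many-to-many design would instead need a separate link table, and that table would accept any pairing of existing parts and suppliers, which is why the "supplied by" rule adds no further enforceable constraint in that case.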
Is there a more powerful approach that builds on the intuitive appeal of diagram-oriented modeling, but offers the hope of much richer and enforceable semantics?
Also, last time I didn't really deal with the far more open-ended world of complex business logic. My mini-example of a complex piece of business logic was an insurance policy whose administration date is allowed to be NULL because an underwriter has committed but not yet approved it; when the policy has been underwritten, the administration date must be present and must always be later than the agreement date. If you have been thinking about this example, I hope you are bothered by some of the same things I am. These expressions of complex business logic are combinations of data relationships and procedural logic. We can verify some of the logic by examining the static data, and we can enforce some of the logic by observing a specific procedural sequence. A troubling question is: Where in our data processing environment are these relationships observed and enforced? In the production OLTP system? In the extract and transform logic of the data warehouse? Or in some quality assurance pass we make over the data warehouse after it is loaded?
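The static half of the insurance policy rule can be written down as a check over the data. This is a minimal sketch in Python, not a prescription for where the check should live; the Policy class, its field names, and the underwritten flag are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Policy:
    # Hypothetical record layout; field names are assumptions.
    agreement_date: date
    underwritten: bool
    administration_date: Optional[date] = None


def violations(policy: Policy) -> List[str]:
    """Return the rule violations found in one policy record."""
    problems = []
    if policy.underwritten:
        if policy.administration_date is None:
            problems.append("administration date missing on underwritten policy")
        elif policy.administration_date <= policy.agreement_date:
            problems.append("administration date not later than agreement date")
    # A policy that is not yet underwritten may carry a NULL
    # administration date, so nothing is reported for that case.
    return problems
```

For example, violations(Policy(date(2000, 8, 1), underwritten=True)) reports the missing administration date, while the same record with underwritten=False passes. A check like this only inspects the data as it stands; it says nothing about whether the procedural sequence that produced the data was followed, which is exactly the part of the rule that remains open.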
The Promise of UML
There is, of course, no silver bullet that gives a simple answer to all these questions. Complex business logic will always remain a combination of static data relationships and adherence to procedural sequences. We will enforce business logic in all the places I have listed previously. But in most of today's data warehouses we do not systematically record all of our business rules, and we have few if any systems for translating business rule specifications into programs downstream from our OLTP systems, in the data warehouse proper.
We can do better. The practice of data warehousing has developed in parallel with, but distanced from, the practice of object-oriented (OO) software development. OO analysis and design methods have reached a kind of third-generation maturity with powerful techniques, vocabulary, and an extensive literature. But the bulk of the OO techniques developed between 1988 and the present have been applied to documenting and building software systems, not data warehouses. This fact is ironic in a way, because the modeling techniques and languages from the object world are powerful enough to express all of our data warehouse business rules.
For example, in 1999, the object world completed the unification of a number of similar but competing techniques for building OO models. The Object Management Group released the Unified Modeling Language (UML) specification in mid-1999, and to paraphrase the OMG, it was time to stop arguing about diagramming notations and vocabulary and get on with the more substantive issues of building real systems. Because one of the main strategic purposes of UML is to represent business rules, it certainly seems like it is time for the data warehouse community to profit from the work of our OO colleagues and to apply some discipline to our messy and vaguely enforced data warehouse business rules.
What is UML?
This article can only provide a teaspoon sip of an introduction to OO programming and UML. In my opinion, the best introductory guide to UML is UML Distilled, Second Edition: A Brief Guide to the Standard Object Modeling Language, by Martin Fowler (Addison-Wesley, 2000). It is mercifully short and eminently readable. Martin describes a typical UML-driven development process, which he calls the Rational Unified Process, with the steps listed below. His process-intensive approach is typical of the OO practitioners.
Inception and elaboration. These steps correspond to the familiar data warehouse steps of business requirements gathering and the data audit. Inception is a short project-scoping phase, perhaps lasting a few days, and elaboration is a comprehensive assessment of the project's requirements and risks. The elaboration phase may take 20 percent of the project's total time.
Construction. In the construction phase, the system is built in a series of iterations. Martin's goal during this phase is to deliver working code, driven from the UML specifications. Our challenge as data warehouse designers is to identify which code environments are our targets.
Transition. The transition phase occurs when construction is done and testing and debugging must occur. Software developers describe this as the time between beta test and final release.
Having laid out a typical UML-driven development process, Martin then explains a number of specific techniques that fit into the UML framework, including:
Use cases. Use cases are sets of scenarios tied together with a common user goal. Typically, the scenarios try to enumerate all the things that can happen during a specific business process. For instance, when you purchase a product, you can pay in several different ways. After you pay (or attempt to pay), several things can happen. Your credit can be approved or denied. The product can be available or back ordered. Each of these alternatives is part of the overall use case describing this business process.
Class diagrams. A class diagram describes all the objects in the system and the relationships among these objects. A class diagram can be viewed as a powerful generalization of the familiar E/R diagram, embellished with much richer semantics, and supported by CASE tools that then translate the class diagrams into working code.
Interaction diagrams. An interaction diagram describes the sequential behavior in a single use case.
State diagrams. A state diagram describes all the possible states that an object can get into and how each object responds to events.
Activity diagrams. An activity diagram is a variant of a state diagram that describes the sequencing of activities, with specific emphasis on conditional and parallel execution paths.
Physical diagrams. A physical diagram is, as its name implies, a depiction of the hardware and software physical components of the system, with a guide to where the system's objects are stored and manipulated.
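To give a feel for how a state diagram might eventually turn into code, here is a rough sketch in Python of a transition table for the insurance policy example. The state names and event names are invented for illustration and do not come from Fowler's book or from any UML tool.

```python
# Hypothetical states and events for an insurance policy.
# Each key is (current state, event); the value is the resulting state.
TRANSITIONS = {
    ("committed", "underwrite"): "underwritten",
    ("committed", "withdraw"): "withdrawn",
}


def next_state(state: str, event: str) -> str:
    """Return the state that follows `state` when `event` occurs."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        # Any pairing not drawn on the diagram is treated as illegal.
        raise ValueError(f"event {event!r} not allowed in state {state!r}") from None
```

Here next_state("committed", "underwrite") yields "underwritten", while an event such as "underwrite" on an already underwritten policy raises ValueError. The point of the table form is that the procedural sequence becomes data that can be inspected and tested, instead of logic scattered through application code.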
Where Can We Apply UML in a Warehouse Implementation?
As you can see from my overview of Martin Fowler's book, the object and UML worlds have a lot of structure to offer data warehousing, although that abundance may be a little misleading. It is very important not to get too far invested in all the structure and lose sight of the basic data warehouse mission to publish the right data. In my opinion, there are several separate design situations in which the data warehouse community can bring the OO approach and UML-based tools to bear, and at the same time capture the organization's significant business rules. We can use these approaches to design our OLTP systems, our ETL systems, our data quality management systems, and our end-user applications. These four areas certainly span all of the important data warehouse architectural components, but the sheer breadth of these four areas points out the dilemma. Unlike many software environments, data warehouse development is very distributed and will never be confined to a single technology. As warehouse designers, we must segment our data warehouses and apply these structured techniques rather independently.
Although I rarely discuss any specific vendor's product offerings in these columns, I feel a real obligation to complete the story I began in these last two issues. Countless times people have asked me, "How can I apply object methodologies to the design of my data warehouse?" In the next issue, I will complete the survey of business rules, object technologies, and UML, with a specific discussion of which vendors in the four development areas I've listed in this column can actually take a proper UML design and turn it into useful code that implements a data warehouse.
Ralph Kimball co-invented the Star Workstation at Xerox and founded Red Brick Systems. He has three best-selling data warehousing books in print, including the newly released The Data Webhouse Toolkit (Wiley, 2000). Ralph teaches dimensional data warehouse design through Kimball University and critically reviews large data warehouse projects. You can reach Ralph through his Web site at www.ralphkimball.com.