As database designers, we talk frequently about business rules. Business rules, after all, are the heart and soul of our applications. If our systems obey the business rules, then the data will be correct, our applications will function, and our users and management will be happy. But what exactly is a business rule? Where do we declare business rules and where do we enforce them? Lets propose four levels of business rules, starting with the simplest local definitions of data that are clearly enforced at the database level: 1. Single field format definitions, enforced directly by the database: The Payment field is an amount interpreted as dollars and cents. The Surname field is a text field expressed in the UNICODE character set. 2. Multiple field key relationships, enforced by key declarations residing in the database: The Brand Name field in the Brand table has a many-to-one relationship with the Manufacturer Name field in the Manufacturer table. The Product foreign key in the Sales fact table has a many-to-one relationship to the Product primary key in the Product dimension table. 3. Relationships between entities, declared on an entity-relation (E/R) diagram, but not directly enforced by the database because the relationship is many-to-many: Employee is a subtype of Person. Supplier supplies Customer. 4. Complex business logic, relating to business processes, and enforced perhaps only at data entry time, by a complex application: When an insurance policy has been committed but has not yet been approved by the underwriter, the administration date can be NULL, but when the policy has been underwritten, the administration date must be present and must always be more recent than the agreement date. Looking at these four kinds of business rules makes it clear that the core database software manages only the first two levels, single field format definitions and multiple field key relationships. While these first two levels are a kind of bedrock of any proper database environment, we would hope to enforce level 3, relationships between entities, and level 4, complex business logic, because there is much more valuable business content at these levels.
Does E/R Modeling Handle Business Rules?E/R modeling is something of a holy grail among database designers. At first blush it appears to be a comprehensive language for describing relationships between entities. But what does E/R modeling really guarantee? E/R modeling is a diagramming technique for specifying one-to-one, many-to-one, and many-to-many relationships among data elements. While in a pure sense an E/R model is only a logical model, powerful tools such as Computer Associates ERwin convert an E/R diagram into data definition language (DDL) declarations that appropriately define key definitions and join constraints among tables that enforce the various flavors of relationships. This all sounds wonderful. Whats the problem? What is missing from this picture provided by E/R modeling? In my opinion, while E/R modeling is a useful technique for beginning the process of understanding and enforcing business rules, it falls far short of providing any kind of completeness or guarantee. Worse, it has been drastically overblown as a platform for all forms of business rules. Here is my list of practical and theoretical issues with E/R modeling: E/R modeling is incomplete. The entities and relationships on any given diagram represent only what the designer decided to emphasize, or was aware of. There is no test of an E/R model to determine if the designer has specified all possible one-to-one, many-to-one, or many-to-many relationships. E/R modeling is not unique. A given set of data relationships can be represented by many E/R diagrams. Most real data relationships are many-to-many. This is a kind of catch-all declaration that doesnt provide any discipline or value. Enforcing a many-to-many relationship is a kind of oxymoron. There are many flavors of many-to-many relationships involving various conditions and degrees of correlation that would be useful to include as business rules, but E/R modeling provides no extensions to the basic many-to-many declaration. Most large E/R models are ideal, not real. Nearly all the corporate data models I have seen are an exercise in how things ought to be. Up to a point, such a model is a useful exercise in understanding the business, but if that model is not physically populated with real data, I have never found it worthwhile to use the corporate data model as the basis for a pragmatic data warehouse implementation. E/R models are rarely models of real data. A close corollary to the previous point is that we dont have tools for crawling over real data sets and making E/R models. We almost always make the E/R model and then try to shoehorn the data into the model. This fact leads to the odd realization that when dirty data arrives at the data staging area after weve extracted it from a primary production source, we cannot put it into an E/R model as a prelude to cleaning! We can only put it into the E/R model after weve cleaned it. And, given the first two points in this section, even if we eventually place the data into an E/R driven framework, there is no guarantee that the cleaning step is complete, unique, or has captured many of the interesting data relationships. E/R models lead to absurdly complex schemas that defeat the primary objectives of information delivery. Every designer is aware of how complex an enterprise level E/R model can become. The E/R models underlying Oracle Financials can easily require 2,000 tables, and SAPs model can easily require 10,000 of them. These huge schemas thwart the basic data warehouse objectives of understandability and high performance. Only a few very high-end hardware vendors still try to deliver data warehouse services against large E/R-driven schemas, while most of the rest of us have sought simpler design techniques that we can implement far more cost-effectively. At this point, you are probably thinking: Kimball is at it again. People who really understand data modeling use the E/R technique as their primary tool. Well, I have always thought of Chris Date as someone who really understands modeling. In many ways, he invented the important concepts of relational database modeling. The seventh edition of his seminal book, An Introduction to Database Systems (Addison Wesley, 2000) has just been released, and I was paging through my copy, reminiscing about my own first exposure to these ideas through a much earlier version of the book. In Chapter 13, Semantic Modeling, his comments about E/R modeling caught my eye. I quote: It is not even clear that the E/R model is truly a data model at all, at least in the sense we have been using that term in this book (page 435). A charitable reading of [Chens original paper defining E/R modeling] would suggest that the E/R model is indeed a data model, but one that is essentially just a thin layer on top of the basic relational model (page 436, italics from the original). And finally, The E/R model is completely incapable of dealing with integrity constraints or business rules, except for a few special cases (admittedly important ones). Declarative rules are too complex to be captured as part of the business model and must be defined separately by the analyst/developer (page 436, italics from the original). Although Chris and I come to the database world from very different origins, we appear to agree on the relative importance of E/R modeling. E/R modeling is useful for transaction processing because it reduces redundancy in the data, and it is useful for a limited set of data cleaning activities, but it falls far short of being a comprehensive platform for data warehouse business rules, and it is very difficult to use for data delivery.
One of the Origins of Dimensional Modeling
I have always been fascinated by the way Chris introduced the relational model in the original editions of his book with his classic suppliers and parts database. To my surprise, the recent edition of his book uses the same original data model as its foundation. In fact, the data model is the books frontispiece. Study this model, which I reproduce as Figure 1, for a moment.
Doesnt this model look familiar? Its a perfect, beautiful star schema! The SP (supplier-parts) table is a fact table. It has a multipart key consisting of foreign keys linking to the individual supplier table and parts table. The remaining field in the SP table is an additive numeric fact that is clearly a measurement. The S (supplier) table is a perfect dimension table with a single primary key, the S#. The remaining fields are all text, or textlike. The P (parts) table is also a perfect dimension table and is organized in the same way. The primary keys in both dimensions are beautiful surrogate keys, devoid of any content found elsewhere in the dimension tables. A subtle point: We could have normalized the S and P tables further by creating a snowflaked City table. But these tables have been left in the classic denormalized form favored by dimensional modelers. I consider a student to be a graduate when they produce dimensional models as perfect as Chris model. And yet Chris model is the one we all cut our teeth on starting in the 1970s. This fact tells me that the classic star schema (or dimensional model) has a profound symmetry and simplicity that are very close to that expressed in the classic view of relational systems. E/R modeling is effectively an offshoot of this classic view of the relational world that solved some problems and created others.
Back to Business RulesIve tried to open up some very serious and interesting issues about business rules in this article, but I certainly havent solved them. At least we have seen that entity relation modeling fails to address the critical mass of possible business rules. So, where do we record business rules? How do we enforce them? What does this have to do with metadata? And finally, does dimensional modeling do anything for business rules that E/R modeling doesnt? See you in three weeks with some answers to these questions.
Ralph Kimball co-invented the Star Workstation at Xerox and founded Red Brick Systems. He has three best-selling data warehousing books in print, including the newly released The Data Webhouse Toolkit (Wiley, 2000). Ralph teaches dimensional data warehouse design through Kimball University and critically reviews large data warehouse projects. You can reach Ralph through his Web site at www.ralphkimball.com.
| Most Popular This Week
IE Weekly Newsletter
Subscribe to the newsletter
|











