CMP -- United Business Media

Intelligent Enterprise

Better Insight for Business Decisions

October 30, 2002

Divide and Conquer

Build your data warehouse one piece at a time

by Ralph Kimball

Deciding whether to enforce common labels for disparate data sources across the enterprise is the $64,000 question in many data warehouses. You should address this decision at the earliest stages of implementation, preferably before you have exposed any data sources to end users.

You may remember that in my previous column ("Two Powerful Ideas," Sept. 17, 2002) I described two ideas that are the basis for data warehouse design: The first was about separating your systems logically, physically, and administratively into a backroom data staging area and a front-room data presentation area. The second was about building stars and cubes in the presentation area. Taking the engineer's perspective, I described many advantages gained from these ideas. At the end of the column, I left you with the thought that the symmetrical stars and cubes I designed in the presentation area gave us a set of predictable points of commonality for linking together data from across the enterprise.

Common Labels Wanted?

Not every enterprise wants or needs a system of common labels for disparate data sources enforced across the enterprise. An umbrella organization that owns a broad portfolio of manufacturing businesses probably doesn't want a common set of labels across all the different product types. The suppliers are different, the product categories are incompatible, the sales channels are distinct, and the customers are varied. Most important, the senior managers of the umbrella organization don't manage the details of the individual businesses and don't treat the businesses monolithically. Similarly, the separate lines of business in a large financial organization may not have much motivation to adopt a set of common labels if their lines of business are quite different.

But many enterprises have a strong desire to enforce a set of common labels across disparate data sources. If these labels are to drive the design of the data warehouse, the enterprise must meet two conditions:

1) The most senior management of the enterprise must be strongly committed to using the common labels.

2) One of the end-user executives, such as the vice president of marketing, must be available as a forceful sponsor of the effort to define the common labels.

The data warehouse architect, no matter how energetic and persuasive, cannot single-handedly create and enforce a set of common labels for an enterprise. Senior management and the sponsoring end-user executive must periodically cut through the politics of the enterprise to make sure all the parties stick to the task of defining the common labels.

Data Marts Are Not Departmental

Whether or not you put a system of common labels for disparate data sources into place, it's mandatory that you present each single data source to the users with a single set of labels.

Another way to say this is that data marts are defined by data sources, not by departments. If an enterprise has an Orders data source, then there should be exactly one Orders data mart, which serves to present that data source to all the end users. This data mart should have one set of labels that everyone uses. The enterprise must not have a sales data mart for orders, a marketing data mart for orders, and a finance data mart for orders.

Having three different, incompatible views of the same data is poor data warehouse design and a recipe for disaster.

Conformed Dimensions and Facts

Theoretically, the effort of establishing a set of common labels across an enterprise is independent of the data modeling approach you take. But in practice, dimensional modeling forces the issue of establishing common labels, whereas entity/relationship (E/R) modeling provides no support or inducement for this task. A big E/R model of inconsistent data across an organization is just that: a big model of inconsistent data. Oddly, dimensional modeling is sometimes criticized as being difficult because it strongly invites you to define conformed dimensions. Yes, the difficult part is reaching agreement on common labels in your enterprise, but that difficulty has nothing to do with the storage model implied by the choice of modeling technique!

Dimensional modeling divides the world of data into two major types: measurements and descriptions of the context surrounding those measurements. The measurements, which are typically numeric, are stored in fact tables, and the descriptions of the context, which are typically textual, are stored in the dimension tables. We'll dig into the mathematics of fact tables and dimension tables in the next column in this series, but suffice it to say that every dimensional design is boringly similar. For example, if the notion of "product" is found in many places across a large enterprise, the structure of the product dimension in each of these places is likely to be similar, even before you attack the issue of common labels.
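To make the split between measurements and context concrete, here is a minimal sketch in Python using the standard sqlite3 module. The table names (product_dim, orders_fact), the columns, and the sample rows are all hypothetical and chosen only for illustration; they are not taken from any real system described in this column.

```python
import sqlite3

# Hypothetical star: one fact table of numeric measurements and one
# dimension table of textual context. All names and rows are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT,
    category     TEXT
);
CREATE TABLE orders_fact (
    product_key INTEGER REFERENCES product_dim(product_key),
    order_date  TEXT,
    quantity    INTEGER,
    amount      REAL
);
""")

conn.executemany("INSERT INTO product_dim VALUES (?, ?, ?)", [
    (1, "Widget", "Hardware"),
    (2, "Gadget", "Hardware"),
    (3, "Manual", "Books"),
])
conn.executemany("INSERT INTO orders_fact VALUES (?, ?, ?, ?)", [
    (1, "2002-10-01", 10, 100.0),
    (2, "2002-10-01", 5, 75.0),
    (3, "2002-10-02", 2, 30.0),
    (1, "2002-10-02", 4, 40.0),
])


def amount_by_category(connection):
    """Sum the numeric fact, grouped by a textual dimension label."""
    rows = connection.execute("""
        SELECT d.category, SUM(f.amount)
        FROM orders_fact AS f
        JOIN product_dim AS d ON d.product_key = f.product_key
        GROUP BY d.category
        ORDER BY d.category
    """).fetchall()
    return dict(rows)


totals = amount_by_category(conn)
print(totals)
```

The numbers live only in the fact table, and the words users filter and group by live only in the dimension table. Any other fact table that carries a product key would join to the same dimension in the same way, which is why these designs look so much alike.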

If a large enterprise decides it can't or won't create a set of common labels (in all the product dimensions, for instance), then the enterprise will have many separate data warehouses that aren't intended to be linked together.
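The consequence of skipping common labels can be sketched in a few lines of Python. Everything below is an assumption made for illustration: the two result sets, the category labels, the legacy codes "HW" and "BK", and the helper combine_on_label are hypothetical. The sketch simply lines up per-category totals from two data sources by their label.

```python
from collections import defaultdict

# Hypothetical per-category totals from two separate data sources.
orders = [("Hardware", 215.0), ("Books", 30.0)]
shipments = [("Hardware", 180.0), ("Books", 30.0)]
# The same shipments, labeled with a different, incompatible set of codes.
legacy_shipments = [("HW", 180.0), ("BK", 30.0)]


def combine_on_label(left, right):
    """Line up two result sets row by row wherever the labels match exactly."""
    merged = defaultdict(lambda: [None, None])
    for label, value in left:
        merged[label][0] = value
    for label, value in right:
        merged[label][1] = value
    return {label: tuple(values) for label, values in merged.items()}


conformed = combine_on_label(orders, shipments)
unconformed = combine_on_label(orders, legacy_shipments)

print(conformed)
print(unconformed)
```

With shared labels, each category yields one row holding both measures. With incompatible labels, every row has a missing value on one side, so the two sources cannot be compared without someone first agreeing on what the labels mean.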





