CMP -- United Business Media

Intelligent Enterprise

Better Insight for Business Decisions

UBM
Intelligent Enterprise - Better Insight for Business Decisions
Part of the TechWeb Network
Intelligent Enterprise
search Intelligent Enterprise





March 8, 2002

Real-Life Data Mart Processing

Is your mart the symmetric "information diamond" at the end of the data pipeline?

By Gabriel Tanase
Edited by Ralph Kimball

The conventional vision of a data mart is of a cleansed, historical, queryable data source for reports and ad hoc queries. And, of course, these reports are used directly for business decisions. Rarely are there any further feeds to downstream applications. Our vision also assures us that ad hoc analysis is possible by using any combination of available dimensions joined to one or a few fact tables. We often gravitate to a fundamentally symmetric star schema logical design where all the atomic measures can be meaningfully aggregated under any combination of dimensions. Any way you look at this information diamond, it looks clear and shiny. If only real life were as simple and symmetric as a diamond....

THREE REAL-LIFE USAGE STYLES

The first reality we must confront is the range of quite different usage styles among our end-user clients. At one extreme of the usage style continuum there are environments where a data mart is dominated by ad hoc queries, developed by business users, with almost no IT-developed, canned reports. Even when many such queries are used on a periodic basis for a relatively long time, they are permanently evolving in the hands of their users.

At the other extreme, there is almost no ad hoc querying. The data mart need support only IT-developed, standard reports. These reports change much more slowly than ad hoc queries, but occasionally they change in leaps and bounds, by releases separated by months if not years.

Even if you have a great slice-and-dice or standard reporting environment, a third style of usage shows up almost insidiously. Our end user explains to us the need to add just a few more calculations to the results, merge in some data from five other operational reports, apply some tricky business rules only this end user understands, and then submit the results as a complex spreadsheet to the finance department. What's tough from the data mart perspective is that this is a one-of-a-kind construct that probably has to stay in this form. So we not only need to have a good data pipeline into spreadsheets, we need to assist the end user with training and data quality issues as well.

REAL-LIFE DATA MART: END OF THE LINE?

From the perspective of the enterprise information architecture, an important question is whether the ad hoc queries or canned reports produced from the data mart are indeed the end of the IT system's contributions and the start of the human feedback and decision loop. Or, is the information coming out of the data mart actually a significant feed to a downstream application?

In my opinion, the standard report style of usage lends itself much better to using the data mart as a source for downstream applications, compared to the ad hoc style of usage.

Given the nature of IT, most system developments require a stable information source. Also, the fact that the data mart already produces agreed-upon, cleansed and trusted, true history-based summary values of business measures may be of great value for applications dedicated to financial analysis, budgeting, forecasting, and statutory reporting. The designers and developers of such applications downstream from our data mart would, of course, prefer not to reinvent the wheel.

In an environment where ad hoc queries are predominant, it is very likely that the data mart must be regarded as the end of the information pipeline. Managers and business analysts in an ad hoc environment use queries interactively to make business decisions. Also, data extracts from queries may end up in desktop tools such as Excel. However, this is not a permanent, highly robust information flow; it serves mostly changeable, one-off purposes.

In an environment where standard reports are predominant, there is a stronger possibility that the data mart will not be at the end of the information pipeline. In particular, any downstream application which is itself storing and manipulating data over long periods of time — such as a financially oriented application — needs to depend on stable and specific calculations for various atomic and aggregated measures that it sources from the mart.

CARRYING COMPLEX MEASURES AWAY

For senior business managers, some queries and reports are "more equal than others." Known under different names — key performance indicators, dashboard gauges, and so on — some measures are more important for business than all the rest.

As long as these high-level measures can be calculated simply from atomic-level transaction measures according to the "sum and count, then variance and ratio" basic paradigm of the dimensional model, there isn't much to discuss. The atomic-level transactions would happily aggregate by any combination of dimension values, as long as the aggregation is sum and count.

My experience is, however, that many of those all-important reports include nonlinear measure aggregations. Let's define a linear aggregation as an algebraic expression with its terms being sums and counts of measures from the same atomic fact grain over a given set of dimension values. A nonlinear aggregation is any that does not follow this definition.






IE Weekly Newsletter
Subscribe to the newsletter
    Email Address