CMP -- United Business Media

Intelligent Enterprise

Better Insight for Business Decisions

UBM
Intelligent Enterprise - Better Insight for Business Decisions
Part of the TechWeb Network
Intelligent Enterprise
search Intelligent Enterprise





March 28, 2002

Asymmetric Aggregation

What to do if your data mart isn't a shiny dimensional diamond

By Gabriel Tanase
Edited by Ralph Kimball

In last issue's column I tried, without sounding too pessimistic, to describe several real-world threads most data mart designers are likely to encounter that make the data mart more complex than the shiny dimensional "diamond" we all dream about. (See "Real-Life Data Mart Processing.") These threads include: three distinct usage styles (ad hoc querying, standard reporting, and using the data mart as an intermediate step in the delivery of data to a serious downstream application); the existence of complex nonlinear aggregated measures; and the conflict between source-aggregated and data mart-aggregated data that should produce the same results but don't.

Now, as promised, I'll draw these threads together and show you an adaptable architecture you can use to tackle these issues if they arise in your environment. I'll suggest how to create a specific asymmetric aggregate level of the data mart. And I'll recommend building a custom user interface for storing the underlying component factors of your nonlinear aggregations in recognizable data mart dimensions.

I have argued that:

  • In an environment where a data mart is used mostly via standard reports, its aggregated information output is likely to be trusted, required, and used as a source for downstream applications.
  • Some aggregate-level measure calculations are nonlinear and dependent on business rules.
  • Sometimes data sourcing or measure calculations can take place only at a certain aggregate level: Either there are no source transactions available to be aggregated or no allocation and aggregation makes good business sense.

Telltale Requirements

There are a few telltale signs that you'll require an asymmetric aggregate layer in your data mart, which are relatively easy to catch during the requirements-gathering phase.

  • Some of the future users of the data mart periodically reenter values from existing reports or OLTP extracts into spreadsheets or similar tools, obtaining processed figures that they send to other coworkers or directly to recipients outside the company (the last being a dangerous situation). The desktop processing rules are understood by only a handful of people, if at all.
  • Users tell you during interviews that they need reports with a lot of figures that are only slightly aggregated; they say they themselves will take care of the higher-level aggregation and calculation requirements.
  • They tell you that a major requirement for the reporting tool is to enable easy download into Excel.
  • During the reporting tool selection process, key users wonder whether it would be capable of performing "complicated calculations."
  • An existing application takes extracts from the OLTP system for further processing, or an application needing summarized data is scheduled to be developed in the near future.
  • Users describe data items they say they will absolutely need in the data mart, but the description makes it clear that the items are not at a transaction level.
  • Users say that they would need a process of "adjusting" or "filling in" data in the data mart after aggregation takes place, because they "will always have missing, estimated, or unavailable data items." They may also say that they want their reports produced by including or excluding the "adjusted" figures, for comparison and auditing purposes.

Designing for IT

After establishing that business needs and limitations demand a customized aggregate layer, your next area of concern is an appropriate design.

It should be obvious that the variety of possible circumstances precludes anyone from offering simple recipes. Hence, I will try to suggest just some starting points by asking and answering two questions that I trust designers would find asking themselves in such a situation.

The fundamental prerequisite is that the "asymmetric" aggregate layer must consist of real tables, not views or materialized views. You cannot create these aggregate measures at run time. You need to put those special, aggregate-only feeds somewhere.

Question: Should I keep measures aggregated from transactions and externally sourced aggregate-only measures in separate tables?

Answer: It depends on whether the values from the external feed must be added to measures already aggregated from transactions or do not interfere with each other.

If they must be added, a final aggregate measure M must be computed as M = I + E, where I is a value aggregated from transactions and E is an externally sourced value at the same summary-level grain.

If the external values and preexisting aggregates do not interfere with each other, there exists:

  • A first set of measures IM1, IM2, ..., IMp obtained purely by aggregation from transactions
  • A nonoverlapping second set of measures EM1, EM2, ..., EMq obtained from the external feed.

Again, both the IM1 and EMj measures must be defined at the same summary-level grain. Here the two sets of measures could be vertically partitioned in two tables. But, ultimately the two data flows must come together into only one set of measures. So you should start with them in separate tables and combine them in the final step.






IE Weekly Newsletter
Subscribe to the newsletter
    Email Address