Asymmetric AggregationWhat to do if your data mart isn't a shiny dimensional diamond
By Gabriel Tanase In last issue's column I tried, without sounding too pessimistic, to describe several real-world threads most data mart designers are likely to encounter that make the data mart more complex than the shiny dimensional "diamond" we all dream about. (See "Real-Life Data Mart Processing.") These threads include: three distinct usage styles (ad hoc querying, standard reporting, and using the data mart as an intermediate step in the delivery of data to a serious downstream application); the existence of complex nonlinear aggregated measures; and the conflict between source-aggregated and data mart-aggregated data that should produce the same results but don't. Now, as promised, I'll draw these threads together and show you an adaptable architecture you can use to tackle these issues if they arise in your environment. I'll suggest how to create a specific asymmetric aggregate level of the data mart. And I'll recommend building a custom user interface for storing the underlying component factors of your nonlinear aggregations in recognizable data mart dimensions. I have argued that:
Telltale RequirementsThere are a few telltale signs that you'll require an asymmetric aggregate layer in your data mart, which are relatively easy to catch during the requirements-gathering phase.
Designing for ITAfter establishing that business needs and limitations demand a customized aggregate layer, your next area of concern is an appropriate design. It should be obvious that the variety of possible circumstances precludes anyone from offering simple recipes. Hence, I will try to suggest just some starting points by asking and answering two questions that I trust designers would find asking themselves in such a situation. The fundamental prerequisite is that the "asymmetric" aggregate layer must consist of real tables, not views or materialized views. You cannot create these aggregate measures at run time. You need to put those special, aggregate-only feeds somewhere. Question: Should I keep measures aggregated from transactions and externally sourced aggregate-only measures in separate tables? Answer: It depends on whether the values from the external feed must be added to measures already aggregated from transactions or do not interfere with each other. If they must be added, a final aggregate measure M must be computed as M = I + E, where I is a value aggregated from transactions and E is an externally sourced value at the same summary-level grain. If the external values and preexisting aggregates do not interfere with each other, there exists:
Again, both the IM1 and EMj measures must be defined at the same summary-level grain. Here the two sets of measures could be vertically partitioned in two tables. But, ultimately the two data flows must come together into only one set of measures. So you should start with them in separate tables and combine them in the final step.
|
Most Popular This Week
IE Weekly Newsletter
Subscribe to the newsletter
|
|
|











