Lost, Shrunken, And CollapsedWhy and how to create the three types of aggregates in a dimensional data warehouseWe do best to think of aggregates tables as summary indexes, as I explained in my previous guest column ("Aggregate Improvements," Oct. 4, 2001). By doing so, we recognize that pre-aggregation is essential, as are other indexing techniques, to data warehouse performance. Also, aggregates should be silently utilized just as other indexes are, via a form of query optimization known as aggregate navigation. My previous column also discussed how to select the right size and number of these all-important summary indexes. To continue the theme, I discuss this time how individual aggregates are defined and built. Those of you who follow Ralph Kimball's advice will build aggregates from atomic-level dimensional models. Doing so immediately makes life simpler, as you can obviously see that each large fact table is a candidate for one or more aggregates with a predictable structure and a predictable relationship to the base fact table. If you are faced with a normalized entity/relationship warehouse, it's very hard to see where to begin describing the many complex transformations needed to instantiate a useful set of aggregates. There is no standard body of practice to automate building and using aggregates in a normalized environment. It's a good thing there are only three ways to build aggregates from the fact table of a star or snowflake schema. To succinctly describe these different types of dimensional aggregates, I call them lost, shrunken, and collapsed. None of these terms sounds too healthy, but they have the advantage of being memorable. As a starting point to examine each aggregate technique, let's use a fact table with four dimensions that represents daily product sales per store per promotional campaign. LOST DIMENSION AGGREGATESLost dimension aggregates are created by completely excluding one or more dimensions when summarizing a fact table. In SQL, you can create lost dimension aggregates by grouping directly on a subset of the dimensional surrogate keys in the fact table. Figure 1 shows an aggregate with two lost dimensions, possibly representing a daily product sales aggregate of the original fact table. I suppressed both the store and promotion dimensions. This is the simplest type of aggregate to define and build, as it requires no access to the dimensional data. Its equivalent SQL query has no join processing, so this is often the easiest aggregate for modern DBMSs to build and incrementally maintain via their materialized view features. This type of aggregate can be very small and offers significant performance gains over the base fact table if several high-cardinality dimensions are lost and therefore aggregated. However, the reduced dimensionality of such an aggregate significantly affects its applicability for accelerating a broad range of queries. For example, an aggregate that lost the time dimension would be of little use. The DBA needs to study the result of suppressing a dimension to make sure that the resulting shrunken fact table is significantly smaller than the original fact table. Removing a dimension that is not part of the logical key of the fact table will not affect the number of rows in the fact table! SHRUNKEN DIMENSION AGGREGATESShrunken dimension aggregates have one or more dimensions replaced by shrunken or rolled versions of themselves. You can combine this technique with lost dimensions as well. Figure 2 shows an example with one shrunken dimension and one lost dimension; this diagram could represent a monthly-product-sales-by-store summary of the original fact table. In this example, a monthly-grain time dimension replaces the daily-grain time dimension. The aggregate would be significantly smaller than the fact table (though probably not by the full factor of 30 you might expect, because not every product is sold every day) but would still allow dimensional analysis by time at the month, quarter, and year levels. Before you can build a shrunken dimension aggregate, one or more shrunken dimension keys must replace original surrogate keys in the fact table. If the shrunken dimension keys are carried in the original dimensions, the aggregate can be populated by a SQL query that joins the fact table to these dimensions and groups on a combination of shrunken dimension keys and surviving atomic-level surrogate keys.
|
Most Popular This Week
IE Weekly Newsletter
Subscribe to the newsletter
|
|
|











