Realtime PartitionsBuild a special extension of each fact table to complement your static data warehouseIn the past two years, a major new requirement has been added to the data warehouse designer's mental list. The data warehouse now must seamlessly extend its existing historical time series right up to the current instant. If the customer has placed an order in the last hour, you need to see this order in the context of the entire customer relationship. Furthermore, you need to track the hourly status of this most current order as it changes during the day. Even though the gap between the production OLTP systems and the data warehouse has shrunk in most cases to 24 hours, the rapacious needs of our marketing users require the data warehouse to fill this gap with realtime data. Most data warehouse designers are skeptical that the existing extract-transform-load (ETL ) jobs can simply be sped up from a 24-hour cycle time to a 15-minute cycle time. Even if the data cleaning steps are pipelined to occur in parallel with the final data loading, the physical manipulations surrounding the biggest fact and dimension tables simply can't be performed every 15 minutes. Data warehouse designers are responding to this crunch by building a realtime partition in front of the conventional, static data warehouse. REQUIREMENTS FOR THE REALTIME PARTITIONTo achieve realtime reporting, we build a special partition that is physically and administratively separated from the conventional static data warehouse tables. Actually, the name partition is a little misleading. The realtime partition usually should not be a literal table partition, in the database sense. Rather, the realtime partition is a separate table subject to special rules for update and query. The realtime partition ideally should meet the following tough set of requirements. It must:
In the dimensional modeling world there are three main types of fact tables: transaction grain, periodic snapshot grain, and accumulating snapshot grain. (For more information on these fact tables, see my Intelligent Enterprise column "Fundamental Grains" from March 30, 1999.) The realtime partition has a different structure corresponding to each type. TRANSACTION GRAIN REALTIME PARTITIONIf the static data warehouse fact table has a transaction grain, then it contains exactly one record for each individual transaction in the source system from the beginning of "recorded history." If no activity occurs in a time period, there are no transaction records. Conversely, there can be a blizzard of closely related transaction records if the activity level is high. The realtime partition has exactly the same dimensional structure as its underlying static fact table. It contains only the transactions that have occurred since midnight, when you loaded the regular data warehouse tables. The realtime partition may be completely unindexed, both because you need to maintain a continuously open window for loading, but also because there is no time series because you keep only today's data in this table. Finally, you avoid building aggregates on this table because you want a minimalist administrative scenario during the day.
|
Most Popular This Week
IE Weekly Newsletter
Subscribe to the newsletter
|
| |||||||||||||||||||||||||||||||























