Intelligent Enterprise

Better Insight for Business Decisions

Intelligent Enterprise - Better Insight for Business Decisions
search Intelligent Enterprise
Advanced Search
RSS
Webcasts
Digital Library
Subscribe
Home




February 1, 2002

Realtime Partitions

Build a special extension of each fact table to complement your static data warehouse

By Ralph Kimball

In the past two years, a major new requirement has been added to the data warehouse designer's mental list. The data warehouse now must seamlessly extend its existing historical time series right up to the current instant. If the customer has placed an order in the last hour, you need to see this order in the context of the entire customer relationship. Furthermore, you need to track the hourly status of this most current order as it changes during the day.

Even though the gap between the production OLTP systems and the data warehouse has shrunk in most cases to 24 hours, the rapacious needs of our marketing users require the data warehouse to fill this gap with realtime data.

Most data warehouse designers are skeptical that the existing extract-transform-load (ETL ) jobs can simply be sped up from a 24-hour cycle time to a 15-minute cycle time. Even if the data cleaning steps are pipelined to occur in parallel with the final data loading, the physical manipulations surrounding the biggest fact and dimension tables simply can't be performed every 15 minutes.

Data warehouse designers are responding to this crunch by building a realtime partition in front of the conventional, static data warehouse.

REQUIREMENTS FOR THE REALTIME PARTITION

To achieve realtime reporting, we build a special partition that is physically and administratively separated from the conventional static data warehouse tables. Actually, the name partition is a little misleading. The realtime partition usually should not be a literal table partition, in the database sense. Rather, the realtime partition is a separate table subject to special rules for update and query.

The realtime partition ideally should meet the following tough set of requirements. It must:

  • Contain all the activity that has occurred since the last update of the static data warehouse. Let's assume right now that the static tables are updated each night at midnight.
  • Link as seamlessly as possible to the grain and content of the static data warehouse fact tables.
  • Be indexed so lightly that incoming data can continuously be "dribbled in."
  • Support highly responsive queries.

In the dimensional modeling world there are three main types of fact tables: transaction grain, periodic snapshot grain, and accumulating snapshot grain. (For more information on these fact tables, see my Intelligent Enterprise column "Fundamental Grains" from March 30, 1999.) The realtime partition has a different structure corresponding to each type.

TRANSACTION GRAIN REALTIME PARTITION

If the static data warehouse fact table has a transaction grain, then it contains exactly one record for each individual transaction in the source system from the beginning of "recorded history." If no activity occurs in a time period, there are no transaction records. Conversely, there can be a blizzard of closely related transaction records if the activity level is high. The realtime partition has exactly the same dimensional structure as its underlying static fact table. It contains only the transactions that have occurred since midnight, when you loaded the regular data warehouse tables. The realtime partition may be completely unindexed, both because you need to maintain a continuously open window for loading, but also because there is no time series — because you keep only today's data in this table. Finally, you avoid building aggregates on this table because you want a minimalist administrative scenario during the day.






IE Weekly Newsletter
Subscribe to the newsletter
    Email Address







Techweb
Informationweek Business Technology Network
InformationweekInformationweek 500Informationweek 500 ConferenceInformationweek AnalyticsInformationweek Events
Informationweek MagazineGlobal CIOIWK Government ITbMightyByte and SwitchDark Reading
Digital LibraryIntelligent EnterpriseInternet EvolutionNetwork ComputingPlug Into The CloudDr. DobbsContentinople
space
TechWeb Events Network
InteropVoiceConWeb 2.0 ExpoWeb 2.0 SummitEnterprise 2.0Mobile Business ExpoNoJitter
Black HatGTECEnergy CampCloud ConnectGov 2.0 ExpoGov 2.0 Summit
space
Light Reading Communications Network
Light ReadingLight Reading AsiaUnstrungCable Digital NewsInternet EvolutionPyramid Research
Heavy ReadingLight Reading LiveLight Reading InsiderEthrnet ExpoTelco TVTower Technology Summit
space
Financial Technology Network
Advanced TradingBank Systems and TechnologyInsurance and TechnologyWall Street and TechnologyAccelerating WallstreetBST SummitBuyside Trading SummitIT Summit
space
Microsoft Technology Network
MSDNTechNetTotal IT ProTotal Dev ProNET Total Dev Pro CommunitySQL Total Dev Pro Community
space