Guide to the TechWeb Network

Intelligent Enterprise

Better Insight for Business Decisions

Intelligent Enterprise - Better Insight for Business Decisions
search Intelligent Enterprise
Advanced Search
RSS
Webcasts
Whitepapers
Subscribe
Home




April 22, 2003

The Soul of the Data Warehouse, Part 3: Handling Time

The data warehouse takes a pledge to preserve history

by Ralph Kimball

Continued from Page 1

I've written voluminously about these three types of slowly changing dimensions (SCDs), how they're modeled, and how they're administered. Please see any of my books for the details. But the point I wish to make here is that all three choices need at least one embedded time stamp stating when the record was updated as well as a companion field describing that change. For the primary Type 2 SCD, where a new record is created, you need a pair of time stamps as well as a change description field. The pair of time stamps define a span of time from the begin-effective time to the end-effective time when the complete customer description remains valid.

The most sophisticated treatment of a Type 2 SCD record involves five fields:

  • Begin effective date/time stamp (not a surrogate key pointer)
  • End effective date/time stamp
  • Effective date surrogate key (daily grain) connecting to date dimension as a snowflake
  • Change description field (text)
  • Most recent flag.

The first two fields are what conventional BETWEEN constraints use to profile the dimension at specific points in time. They need to be single fields with full date and time stamps in order to make the BETWEEN machinery work. The third field constrains specific records in the dimension that changed on days that can be described only via the organization's calendar date table (such as employee demotions that occurred the day before payday). The fourth field lets you find all changes in the dimension meeting a particular description.The fifth field is a quick way to find all the current records in a dimension without using BETWEEN.

Natural Views

In 30 years of analyzing and modeling data, I've found that fact-table measurements all fall into just three classes. These types correspond to instantaneous events, regular periodic reports, and latest status. In dimensional modeling, these three fact-table types are the transaction grain, the periodic-snapshot grain, and the accumulating-snapshot grain.

The transaction grain represents a point in space and time, and is meant for a measurement event defined at a particular instant. A scanner event at a grocery store is the classic example of a transaction event. In this case, the time stamp in the fact table is very simple. It's either a single daily-time-grain foreign key or a pair consisting of a daily-time-grain foreign key together with a time-of-day foreign key, depending on what the source system provides. The facts in this transaction-grain table must be true to the grain and should describe only what took place in that instant.

The periodic-snapshot grain represents a regular repeating measurement, like a bank account monthly statement. This fact table also has a single time stamp, representing the overall period. Usually the time stamp is the end of the period, and often is expressed at the daily grain, even if it's understood to represent a month or a fiscal period. The facts in this periodic-snapshot grain table must be true to the grain and should describe only measures appropriate to the specific period.

The accumulating-snapshot grain represents the current evolving status of a process that has a finite beginning and end. Usually these processes are of short duration and therefore don't lend themselves to the periodic snapshot. Order processing is the classic example of an accumulating snapshot.

The design and administration of the accumulating snapshot is quite different from the first two fact-table types. All accumulating-snapshot fact tables have a set of as many as four to 12 dates describing the typical scenario of the process being modeled. For instance, an order has a set of characteristic dates: original order date, actual ship date, delivery date, final payment date, and return date. In this example, these five dates appear as five, separate, date-valued foreign (surrogate) keys. When the order record is first created, the first of these dates is well defined, but perhaps none of the others have yet happened. This same fact record is subsequently revisited as the order wends its way through the pipeline. Each time something happens, the accumulating-snapshot fact record is destructively modified. The date foreign keys are overwritten, and various facts are updated. Often the first date remains inviolate because that describes when the record was created, but all the other dates may well be overwritten, sometimes more than once.

Have You Lived Up to Your Pledges?

This column has been a brief overview of the central techniques of handling time in a data warehouse. If you've systematically employed surrogate keys for all your connections to your master time dimensions, faithfully tracked changes in dimension entities with the three types of SCDs, and supported your users' reporting needs with transaction-grain, periodic-snapshot-grain, and accumulating-snapshot-grain fact tables, then you have indeed lived up to your pledge.



Rate This Article

Comments:

Optional e-mail address:


Ralph Kimball (founder of the Ralph Kimball Group) co-invented the Star Workstation at Xerox and founded Red Brick Systems. He has three best-selling data warehousing books in print, including The Data Warehouse Toolkit, Second Edition (Wiley, 2002). He teaches dimensional data warehouse design through Kimball University and critically reviews large data warehouse projects. You can reach him through his Web site, www.ralphkimball.com.


RESOURCES

This is the tenth and final part of the Fundamentals Series. See also parts one and two of "The Soul of the Data Warehouse."








IE Weekly Newsletter
Subscribe to the newsletter
    Email Address







techweb
Online Communities TechWebInformationWeekLight ReadingIntelligent EnterprisebMightyNetwork ComputingDark ReadingDigital LibraryWall Street & Technology
Byte & SwitchNo JitterInternet EvolutionLight Reading's Cable Digital NewsContentinopleUnStrungBank Systems & TechnologyAdvanced TradingInsurance & Technology
Face-to-Face Events
InteropWeb 2.0 ExpoWeb 2.0 SummitVoiceConBlack HatCSISoftwareEntrprise 2.0 ConferenceGTEC
Mobile Business Expo
InformationWeek 500 ConferenceBuy Side Trading XchangeBuy Side Trading SummitBank Executive SummitInsurance Executive SummitTelcoTVEthernet ExpoOptical Expo
Magazines  
InformationWeekWall Street & TechnologyInsurance & TechnologyBank Systems & TechnologyAdvanced TradingMSDNTechNetSmart EnterpriseThe Architecture JournalDatabase Magazine
 
Research & Analyst Services  
Heavy ReadingInformationWeek ReportsInformationWeek Analytics