Tricky Time SpansThe time dimension isn't nearly as simple as it looksContinued from Page 1 For all these moderately hard time span questions, you can simplify the applications enormously by providing twin timestamps on each fact record, indicating the beginning and end of the time span that the transaction implicitly defines. Again, thinking of a transaction as defining an "episode of constant value over a region of time" is helpful here. With the twin-timestamp design, you easily knock off the three moderately hard example queries:
In all these cases, the SQL uses a simple When you use the twin-timestamp approach, you have to be honest about one major drawback. In
almost all situations, you have to visit each fact table record twice: once when you first insert it
(with an open-ended end timestamp), and once more when a superceding transaction occurs that
actually defines the real end timestamp. The open-ended end timestamp probably should be a real
value somewhere out in the future, so that applications don't trip over null values when they try to
execute the Time Spans to the SecondI saved the hardest questions for the end: They're the same basic questions as in the first two sections, but with the time span boundaries defined to the nearest second. For this scenario, you'll put the same two twin timestamps in the fact record, but you have to give up your connection to a robust time dimension. Both the beginning and ending timestamps must be conventional RDBMS date/timestamps. You must do it this way because you usually can't create a single time dimension with all the
minutes or seconds over a significant period of time. There are more than 500,000 minutes and 31
million seconds in a year. Dividing the timestamp into a day component and a seconds-in-a-day
component makes the But if you're really a die hard, you could consider four timestamps on each transaction record if your time spans are accurate to the second. The first two would be RDBMS date/timestamps as described in the preceding paragraph. But the third and fourth would be calendar day (only) foreign keys connecting to a verbose calendar day dimension as in the first two sections of this column. That way, you can have your cake and eat it, too. You can search for ultraprecise time spans, but you can also ask questions such as, "Show me all the power outages that occurred on a holiday." Even with these powerful techniques, I'm sure there are some tricky time span questions I haven't considered. I collect these tricky puzzles, so email new ones to me at Ralph@ralphkimball.com. Ralph Kimball co-invented the Star Workstation at Xerox and founded Red Brick Systems. He has three best-selling data warehousing books in print, including The Data Warehouse Toolkit, Second Edition (Wiley, 2002). He teaches dimensional data warehouse design through Kimball University and critically reviews large data warehouse projects. You can reach him through his Web site, www.ralphkimball.com.
|
Most Popular This Week
IE Weekly Newsletter
Subscribe to the newsletter
|
| ||||||||||||||||||||||||||||||||









