CMP -- United Business Media

Intelligent Enterprise

Better Insight for Business Decisions

UBM
Intelligent Enterprise - Better Insight for Business Decisions
Part of the TechWeb Network
Intelligent Enterprise
search Intelligent Enterprise





June 13, 2002

Tricky Time Spans

The time dimension isn't nearly as simple as it looks

By Ralph Kimball

In some ways the time dimension seems like the simplest dimension. At least, if you're modeling calendar days, the time dimension is small and predictable. You can make up a 10-year time dimension in an afternoon using a spreadsheet; that's only about 7,300 rows.

In the past couple of years, I've seen an increased demand for applications that need to ask questions about time spans. One person captured it nicely when he said, "Each record in my fact table is an episode of constant value over a region of time." Time spans can start and stop at arbitrary points in time. In some cases, time spans link together to form an unbroken chain; in other cases, time spans are isolated; and in the worst cases, time spans overlap arbitrarily. But each time span is represented in the database by a single record.

To make this concept easier to visualize, imagine you have a database filled with atomic transactions, such as deposits and withdrawals from bank accounts. It includes open account and close account transactions. Each transaction implicitly defines an episode of constant value over a region in time. A deposit or withdrawal defines a new value for the account balance that's valid until the next transaction. This time span could be one second or many months. The open account transaction defines the status of the account as continuously active over a time span until a close account transaction appears.

Before proposing a database design, let's remind ourselves of some of the time span questions we want to ask.

Easy, Moderate, and Very Hard

Start off by limiting your questions to a granularity of individual days rather than parts of days, such as minutes and seconds (more on minutes and seconds at the end of this column). The easy queries include:

  • Show all the transactions that occurred within a given time span.
  • Determine if a selected transaction occurred within a given time span.
  • Define time spans using complex calendar navigation capabilities including seasons, fiscal periods, day numbers, week numbers, paydays, and holidays.

For these cases all you need is a single timestamp on the transaction fact table record. The first query picks up all the transactions with a timestamp in an interval specified in the user's query. The second query retrieves the timestamp from a selected transaction and compares it to the interval. The third set of queries replaces the simple timestamp with a calendar date dimension filled with a lot of helpful calendar attributes. This date dimension is connected to the fact table through a standard foreign key - primary key join. This is all-vanilla dimensional design and requires only a single time key in the fact table record to represent the required timestamp. So far, so good.

By the way, when using complex calendar navigation, the queries become much easier if the verbose date dimension includes first day and last day markers for each defined span of time, such as "last day of quarter." This field would have the value N for all the days except the last day of the applicable quarter. The last day would have the value Y in the special field. These markers allow the complex business time spans to be easily specified in the queries. Note that the use of a verbose date dimension means the application isn't navigating a timestamp on the fact table (more about this in the last section).

A second, moderately hard category of time span queries include:

  • Show everyone who was a customer at some point within a time span.
  • Show the last transaction for a given customer within a time span.
  • Show the balance of an account at an arbitrarily selected point in time.

Continue to make the simplifying assumption that all the time spans are described by calendar days, not by minutes and seconds. It's possible to answer all these questions with the single-timestamp design discussed earlier, but this approach requires complex and inefficient queries. For instance, to answer the last query, you would need to search the set of account transactions for the latest transaction at or prior to the desired point in time. In SQL, this approach would take the form of a correlated sub-SELECT embedded in the surrounding query. Not only is this probably slow, but end-user tools can't readily produce the SQL.






IE Weekly Newsletter
Subscribe to the newsletter
    Email Address