CMP -- United Business Media

Intelligent Enterprise

Better Insight for Business Decisions

UBM
Intelligent Enterprise - Better Insight for Business Decisions
Part of the TechWeb Network
Intelligent Enterprise
search Intelligent Enterprise





August 10, 2001



Managing Helper Tables

A careful look at many-to-many relationships between important dimensions

By Ralph Kimball

Multivalued dimensions are normally illegal in a dimensional design. We usually insist that when the grain of a fact table is declared, the only legal dimensions that can be attached to that fact table are those that take on a single value for that grain.

For instance, in the banking world, if the grain of the fact table is Account by Month, then we exclude the Transaction dimension because it takes on many different values throughout the month. If we wanted to see individual transactions, then we would declare a finer grain, such as Account by Transaction by Time of Day.

But for every rule there have to be exceptions. Sometimes even when a dimension takes on multiple values in the presence of the fact table grain, it nevertheless is compelling and natural to attach the multivalued dimension to the fact table without changing the grain. It is very desirable, for example, to attach a Customer dimension to that banking fact table whose grain is Account by Month.

The problem is that the number of customers associated with each account is open ended. I may have a checking account in my own name, but my wife and I have a joint savings account. Perhaps we also have a family trust account with five or six customer names.

The preferred way to handle this multivalued dimension is with a "helper table," as shown in Figure 1, where it is called the Account to Customer Map. I wrote a previous column, "Help for Dimensional Modeling," for DBMS magazine (August 1998) on helper tables in banking, healthcare, and industry. You can find it in my column archive at www.ralphkimball.com or www.intelligententerprise.com/dbmssearch.jhtml. Since the time of that column, I have discussed with designers and students many times the details of constructing and using these helper tables. Let's take a careful look at the helper table in Figure 1, and extend the design of the original DBMS column.

Figure 1.

USING SURROGATE KEYS

The Account to Customer Map is a kind of fact table whose primary key (PK) is composed of multiple foreign keys (FK). The primary key in this example consists of the Account key, the Customer key, and the Begindate key. An individual record in this table shows that a particular customer was part of a specific account during the interval defined by the begin date and the end date. But this definition requires a careful look.

It is very important that the customer and account foreign keys are surrogate keys referring to their respective dimensions, both of which are Type 2 slowly changing dimensions (SCDs). In other words, you carefully track changes in both the customer dimension and the account dimension, and continuously issue new versions of records in those dimensions to reflect changes. In a Type 2 SCD, the natural keys for customer and account remain constant, but the surrogate keys change each time you insert a new record into the dimension.

The helper table needs the surrogate keys so that the record of the customer belonging to the account refers to the correct, contemporary descriptions of the customer and the account during the designated time interval. But this necessary precision comes with a price: Each time either the customer or the account undergoes a Type 2 change, you need to issue a new record in the helper table to reflect the new key combinations. So you can see that the begin time and the end time in the helper record actually refer to the time span when the customer was part of the account and both the customer description and the account description were unchanged. Although this sounds complicated, you will see in the next section that by using twin timestamps, you can perform interesting queries without having to be an expert logician.

USING TWIN TIMESTAMPS

The twin timestamps shown in Figure 1 didn't appear in the previously mentioned DBMS column. In retrospect, I have seen these timestamps prove very useful for querying. A list of the customers in an account named ABC123 at a particular point in time can be expressed with a very simple SQL query:


SELECT customer.name
FROM account, map, customer
WHERE account.accountkey = map.Accountkey
AND customer.customerkey = map.Customerkey
AND account.naturalid = 'ABC123'
AND '7/18/2001' BETWEEN map.begindate AND map.enddate

Of course, I cheated a little here by using a nonstandard interpretation of BETWEEN. SQL specifies BETWEEN syntax as field BETWEEN values. This example uses a reverse relationship, value BETWEEN fields. But most modern relational databases such as Oracle support this syntax.

The disadvantage of using twin timestamps is that it complicates updating the helper table. Any account map record still valid as of today has to have an open-ended ENDDATE, which is ugly. When a new record supercedes this one, the ENDDATE has to be adjusted to the real value. In spite of this administrative complexity, the alternative of storing only the BEGINDATE makes querying far too complex. You would have to change the preceding query to look for the largest begin date less than or equal to the requested date. This embedded select statement is inefficient and may be hard to set up in standard query tools, especially by "normal" end users.

The twin timestamps also make a time span query fairly simple. Suppose you wanted the list of all customers who were part of an account at any time between two dates. You would need to test only (1) whether the begin date falls in the requested span or (2) whether the requested span completely surrounds the begin and end dates. Now the query looks like this:


SELECT DISTINCT customer.name
FROM account, map, customer
WHERE account.accountkey = map.Accountkey
AND customer.customerkey = map.Customerkey
AND account.naturalid = 'ABC123'
AND (map.begindate BETWEEN '7/18/2000' and '7/18/2001'
OR ('7/18/2000' < map.begindate AND '7/18/2001' > map.enddate))





IE Weekly Newsletter
Subscribe to the newsletter
    Email Address