Guide to the TechWeb Network

Intelligent Enterprise

Better Insight for Business Decisions

Intelligent Enterprise - Better Insight for Business Decisions
search Intelligent Enterprise
Advanced Search
RSS
Webcasts
Whitepapers
Subscribe
Home




May 24, 2001



Joint Effort

Administer the distributed data warehouse correctly or lose it!

By Ralph Kimball


To distinguish between minor and major releases of dimensions. A minor release of a dimension occurs when new records have been added to the dimension but no existing records have been altered. In the parlance of slowly changing dimensions, a minor release allows fresh new records as well as Type 2 changes.

A major release of a dimension occurs when existing records have been altered with Type 1 or Type 3 changes. Each new dimension must carry a release version number. A properly constructed drill-across application requires that the version numbers for the particular dimension match as the drill-across operation is carried out. Mismatched version numbers are a warning that the definitions of some of the dimension attributes are not consistent across the DDW and that the results may be incorrect.

Not to modify the published content of a dimension. A member organization can add private attributes to a dimension without invalidating the conformed nature of the dimension and can delete records from the dimension if none of its fact tables use those records, but the values in the attributes the dimension authority supplies must remain unmodified.

A corollary of this rule is that dimensions will usually grow monotonically. A dimension authority can't delete records from a dimension, because the authority can't know if downstream fact tables are still using fact records.

Dimension Authority Duties

With this background, it's easier to understand the administrative steps the dimension authority must take. These steps are:

  1. Add fresh new records to the dimension, generating new surrogate keys.
  2. Add new records for Type 2 changes to existing dimension entries (true physical changes at a point in time), generating new surrogate keys.
  3. Modify records in place for Type 1 changes (overwrites) and Type 3 changes (alternate realities), without changing the surrogate keys. Update the version number of the dimension if any of these Type 1 or Type 3 changes are made.
  4. Replicate the revised dimension simultaneously to all fact table providers (members of the DDW).

Fact Table Provider Duties

The fact table provider has a more complex job. But, in exchange for this complexity, the fact table provider gets a complete suite of verbose, high-quality dimensions that should add considerable value to each fact table.

For instance, with exactly the same fact table, a really elaborate customer dimension offers far more opportunities for querying, reporting, and data mining than a narrow customer dimension does. With any luck, the extract-transform-load programs for processing each dimension will be similar, making the overall backroom software development task more manageable.

The fact table provider's administrative steps include:

  1. Receive or download dimension updates.
  2. Process dimension records marked as new and current, to update current key maps in the surrogate key pipeline.
  3. Process dimension records marked as new but postdated. This action triggers a complex alternative to the normal surrogate key pipeline processing. I described the detailed processing steps for this case in my September 29, 2000 column, "Backward in Time" (available at www.ralphkimball.com or www.intelligententerprise.com/000929/webhouse.jhtml).
  4. Add all new records to fact tables after replacing their natural keys with correct surrogate keys.
  5. Modify records in all fact tables for error correction, accumulating snapshots, and postdated dimension changes. You probably do this on a partition-by-partition basis.
  6. Remove invalidated aggregates. An existing historical aggregate becomes invalid only when a Type 1 or Type 3 change occurs on the attribute that is the target of the aggregation or if historical fact records have been modified in step 5. Changes to other attributes do not invalidate an aggregate. For instance, a change in the Flavor attribute of a product doesn't invalidate aggregates based on the Category attribute.
  7. Recalculate affected aggregates. If the new release of a dimension doesn't change the version number, then you need to extend aggregates only to handle newly loaded fact data. If the version number of the dimension has changed, then you may have to recalculate the entire historical aggregate if it was removed in step 6. OLAP systems may handle these steps automatically.
  8. Assure the quality of all base fact tables and aggregate tables. Be satisfied that the aggregate tables are correctly calculated.
  9. Bring updated fact and dimension tables online. The detailed strategy for taking a fact table (or more likely a partition of a fact table) offline for the briefest possible duration can be found in my book, The Data Webhouse Toolkit (Wiley, 2000), starting on page 645.
  10. Inform end users that the database has been updated. Tell the users if major changes have been made, including dimension version changes, postdated record additions, and changes to historical aggregates.


Rate This Article

Comments:

Optional e-mail address:

This column sketches just the key steps of administering a DDW. Certainly these steps are complex, but without them the DDW will be a failure.

When conformed dimensions and conformed facts are administered correctly and within the DDW framework, the separate organizational members of the DDW are quite free to develop their databases independently. Furthermore, the benefits of the work you put into producing the verbose, high-quality dimensions are multiplied across the whole DDW. In all probability, no single member of the DDW would have produced such good dimensions, but now everyone can drill across with this high-quality detail.

You may have noticed that I didn't talk about administering the role definitions for security across the DDW. Give me a few weeks and we'll tackle that one, too.



Ralph Kimball co-invented the Star Workstation at Xerox and founded Red Brick Systems. He has three best-selling data warehousing books in print, including The Data Webhouse Toolkit (Wiley, 2000). He teaches dimensional data warehouse design through Kimball University and posts data warehouse design tips, which are available through his Web site, www.rkimball.com







IE Weekly Newsletter
Subscribe to the newsletter
    Email Address







techweb
Online Communities TechWebInformationWeekLight ReadingIntelligent EnterprisebMightyNetwork ComputingDark ReadingDigital LibraryWall Street & Technology
Byte & SwitchNo JitterInternet EvolutionLight Reading's Cable Digital NewsContentinopleUnStrungBank Systems & TechnologyAdvanced TradingInsurance & Technology
Face-to-Face Events
InteropWeb 2.0 ExpoWeb 2.0 SummitVoiceConBlack HatCSISoftwareEntrprise 2.0 ConferenceGTEC
Mobile Business Expo
InformationWeek 500 ConferenceBuy Side Trading XchangeBuy Side Trading SummitBank Executive SummitInsurance Executive SummitTelcoTVEthernet ExpoOptical Expo
Magazines  
InformationWeekWall Street & TechnologyInsurance & TechnologyBank Systems & TechnologyAdvanced TradingMSDNTechNetSmart EnterpriseThe Architecture JournalDatabase Magazine
 
Research & Analyst Services  
Heavy ReadingInformationWeek ReportsInformationWeek Analytics