What Not to DoDimensional modeling mistakes to guard againstContinued from Page 1 Mistake 6: Declare that a dimensional model is "based on a specific report." A dimensional model has nothing to do with an intended report! A dimensional model is a model of a measurement process. A numeric measurement is a solid physical reality. Numeric measurements form the basis of fact tables. The dimensions appropriate for a given fact table are the physical context that describe the circumstances of the measurements. A dimensional model is solidly based on the physics of a measurement process and is quite independent from how an end user chooses to define a report. Mistake 5: Mix facts of differing grain in the same fact table. A serious error in a dimensional design is to add "helpful" facts to a fact table, such as records that describe totals for an extended time span or for a large geographic area. Although these extra facts are well known at the time of the individual measurement and would seem to make some applications simpler, they cause havoc because all the automatic summations across dimensions double- and triple-count these higher-level facts, producing incorrect results. Each different measurement grain demands its own fact table. Mistake 4: Leave lowest-level atomic data in E/R format. The lowest-level data is the most dimensional and should be the physical foundation of your dimensional design. Aggregated data has been deprived of some of its dimensions. If you build a data mart from aggregated data and expect your end-user tools to drill down to E/R-formatted atomic data that you have left in your staging area, then you're dreaming. Build all your data marts on top of the most atomic data. Make all atomic data part of the presentation portion of your data warehouse. Then your end-user tools will gracefully resist the "ad hoc attack." Mistake 3: Eschew aggregate fact tables and shrunken dimension tables when faced with query performance concerns; solve performance problems by adding more parallel processing hardware. Aggregates (derived summary tables) are the single most cost-effective way to improve query performance. Most query-tool vendors explicitly support aggregates, and all of these depend on dimensional modeling constructs. The addition of parallel processing hardware, which is expensive, should be done as part of a balanced program that consists also of building aggregates, choosing query-efficient DBMS software, building lots of indexes, increasing real memory size, and increasing CPU speed. Mistake 2: Fail to conform facts across separate fact tables. It would be a shame to get this far and then build stovepipes. This is called snatching defeat from the jaws of victory. If you have a numeric measured fact called, for example, Revenue, in two or more of your data marts sourced from different underlying systems, then you need to take special care to make sure that the technical definitions of these facts match exactly. You want to be able to add and divide these separate revenue facts freely in your applications. This act is called conforming the facts. AND THE BIGGEST MISTAKE ...Mistake 1: Fail to conform dimensions across separate fact tables. The single most important design technique in the dimensional modeling arsenal is conforming your dimensions. If two or more fact tables have the same dimension then you must be a fanatic about making these dimensions identical or carefully chosen subsets of each other. When you conform your dimensions across fact tables, you will be able to drill across separate data sources because the constraints and row headers will mean the same thing and will match at the data level. Conformed dimensions are the secret sauce needed for building distributed data warehouses, adding unexpected new data sources to an existing warehouse, and making multiple incompatible technologies function together harmoniously. Ralph Kimball coinvented the Star Workstation at Xerox and founded Red Brick Systems. He has three best-selling data warehousing books in print, including The Data Webhouse Toolkit (Wiley, 2000). He teaches dimensional data warehouse design through Kimball University and critically reviews large data warehouse projects. You can reach him through his Web site, www.rkimball.com.
|
Most Popular This Week
IE Weekly Newsletter
Subscribe to the newsletter
|
| |||||||||||||||||||||||||||||||





















