Doing the Work at Extract TimeCountering the minimalist approachContinued from Page 1 You should resist the temptation to keep adding more standard currency fields beyond the first to all your designs. Although you may have legitimate needs to roll everything up to third and fourth standard currencies, these applications should be handled with a separate currency exchange rate table that provides symmetrical exchange rates on a daily basis between every reasonable combination of currencies. PRODUCT PIPELINE MEASUREMENTSMost of us think that product pipeline measurements are pretty simple and don't have the complications that financial services, for example, have. Well, spend some time with manufacturing, distribution, and retail people, all talking about the same products in the same pipe. The manufacturing people want to see everything in carload lots or pallets. The distribution people want to see everything in shipment cases. The retail people can see things only in individual "scan units." So what do you put in the various fact tables to keep everyone happy? The wrong answer is to publish each fact in its local unit-of-measure context and leave it to the applications to find the right conversion factors in the product dimension tables! Yes, this is all theoretically possible, but this architecture places an unreasonable burden on the last step of the process, where the end users and the application developers live. Instead, present all the measured facts in a single standard unit of measure and then, in the fact table itself, provide the conversion factors to all the other desirable units of measure. That way, applications querying the pipeline data from any perspective have a consistent way to convert all the numeric values to the end user's specific, idiosyncratic perspective. PHYSICAL COMPLETION OF A PROFIT AND LOSS DESIGNA profit and loss (P&L) fact table is very powerful because you can usually count on it to present all the components of revenue and cost at a low level of granularity. After providing this wonderful level of detail, designers sometimes compromise their design by failing to provide all the intermediate levels of the P&L. For instance, the "bottom-line profit" is calculated by subtracting the costs from the net revenue. This bottom-line profit should be an explicit field in the data, even if it is equal to the algebraic sum of other fields in the same record. It would be a real shame if the user or application developer got confused at the last step and calculated the bottom-line profit incorrectly. HETEROGENEOUS PRODUCTSIn financial services, such as banking and insurance, a characteristic conflict often arises between the need to see all the account types in a single "household" view of the customer and the need to see the detailed attributes and measures of each account type. In a big retail bank, there may be 25 lines of business and more than 200 special measures associated with all the different account types. You simply cannot create a giant fact table and giant account dimension table that can accommodate all the heterogeneous products. The solution is to publish the data twice. First, create a single core fact table with only the four or five measures, such as balance, that are common to all account types. Then publish the data a second time, with the fact table and account dimension table separately extended for each of the 25 lines of business. Although this technique may seem wasteful because the huge fact table is effectively published twice, it makes the separate householding and line-of-business applications simple unto themselves. AGGREGATIONS IN GENERALI have discussed aggregations many times in this column. Suffice it to say, aggregations are like indexes: They are specific data structures meant to improve performance. Aggregations are a significant distraction in the back room. They consume processing resources, add complexity to the extract, transform, and load (ETL) suite of applications, and take up a lot of storage. But aggregations remain the single most potent tool in the arsenal of the data warehouse designer to improve performance cost effectively. DIMENSIONAL MODELING IN GENERALBy now I hope it is obvious that the most widely used backroom trade-off made for the benefit of the end user is the practice of dimensional modeling. A dimensional model is a second normal form version of a third (or higher) normal form model. Collapsing the snowflakes and other complex structures of the higher normal form models into the characteristic flat dimension tables makes the designs simple, symmetrical, and understandable. Furthermore, database vendors have been able to focus their processing algorithms on this well-understood case in order to make dimensional models run really fast. Unlike most of the other techniques discussed in this column, the dimensional model approach is one you can apply across almost all horizontal and vertical application areas. Ralph Kimball coinvented the Star Workstation at Xerox and founded Red Brick Systems. He has three best-selling data warehousing books in print, including The Data Webhouse Toolkit (Wiley, 2000). He teaches dimensional data warehouse design through Kimball University and critically reviews large data warehouse projects. You can reach him through his Web site, www.rkimball.com.
|
Most Popular This Week
IE Weekly Newsletter
Subscribe to the newsletter
|
| ||||||||||||||||||||||||||||||||









