Divide and ConquerBuild your data warehouse one piece at a timeContinued from Page 1 But, more likely, when the enterprise decides to create a set of common labels across all the sources of data, then the separate data mart teams (or, equivalently, a single centralized team) must sit down to create master dimensions that everyone will use for every data source. These master dimensions are called conformed dimensions. A single, centralized dimension authority creates, administers, and periodically releases each conformed dimension. There would normally be separate dimension authorities for the conformed calendar, product, vendor, employee, and customer dimensions, to name some typical examples. When I discuss the mathematics of dimension tables and fact tables in my next column, you will see that the dimension authority is deeply involved in defining the keys of the dimension tables. Furthermore, you'll see that all the client data marts are expected to use these keys, both in the dimension tables they receive from the dimension authorities, as well as in the fact tables they create from their local data sources. Conformed dimensions are agreed to by an interdisciplinary team representing all the interests of the enterprise. This is a hard job. It's expected that the team will get stuck from time to time trying to align the incompatible original vocabularies of different groups. This is why the high-level executive sponsor from an end-user department is necessary. The executive must periodically approve these tough vocabulary compromises or even force them to be made. At the same time the team is defining the conformed dimensions, it's natural to discuss the various kinds of measured facts that exist in the various separate data sources. For instance, if several parts of the enterprise report revenue, then the assembled team is in a good place to discuss the exact definition of these different revenue measurements. If the definitions are highly consistent, then the team can combine the revenues mathematically in sums and ratios. We call these highly consistent facts conformed facts. With facts that can't be conformed, you must label them differently so that they are not used in computations that could be misleading. The Data Warehouse Bus ArchitectureYou can think of the conformed dimensions and facts of an enterprise as a standard set of connection points for applications in other words, as a data warehouse bus architecture. The term "bus" comes from the bus in an electrical power system or the backplane of a computer, both of which define a set of common connection points and a methodology for making the connections. By using the data warehouse bus architecture, a set of very simple query applications can retrieve results from separate data sources, all with the same row labels on the answer sets, and then combine the sets by sort-merging these identical row headers in a process called "drilling across" (also called "multi-pass SQL"). The row labels are guaranteed to be drawn from the same domain because they're drawn from the same dimension table! This seemingly obvious and trivial result is immensely powerful. These consistent row headers, because they allow us to divide and conquer the overall data warehouse problem, make possible:
Only for Highly Distributed Systems?You might ask, is the bus architecture only for highly distributed systems? The bus architecture approach, with its conformed dimensions and facts, has nothing to do with the issue of centralization! The bus architecture is all about using a set of common labels that have been defined across separate data sources. A highly centralized data warehouse with a single design team and a single physical installation faces exactly the same issues of creating and then using a master set of common labels. In fact, even in a highly centralized data warehouse, the separate data sources will appear as separate physical tables. Even in an E/R modeled environment, the tables containing measurements and the tables describing the context of the measurements are pretty distinct. In order to perform queries across these separate tables, all of the issues addressed by the bus architecture must be attacked in a centralized, E/R modeled environment as well. Netting Out the BenefitsUsing conformed dimensions and conformed facts to build a data warehouse bus architecture is a kind of predictable cookbook approach to building a data warehouse in our complex enterprises. It allows us to divide the problem into separate chunks and to arrive at the final solution gradually and incrementally. The true up-front investment required with the bus architecture is defining the conformed dimensions and facts. Once that is done, new data marts can be added as opportunity dictates, as long as they observe the bus architecture rules consistently. The bus architecture also allows an autonomous development style, where loosely coupled data mart teams cooperate to build an overall distributed system. The bus architecture is therefore a blueprint for separate organizations to share data effectively, even if they have different IT departments and different technologies. In fact, OLAP and ROLAP can coexist gracefully in a bus architecture environment because they only need to be compatible along their row headers at query results time. Ralph Kimball co-invented the Star Workstation at Xerox and founded Red Brick Systems. He has three best-selling data warehousing books in print, including The Data Warehouse Toolkit, Second Edition (Wiley, 2002). He teaches dimensional data warehouse design through Kimball University and critically reviews large data warehouse projects. You can reach him through his Web site, www.ralphkimball.com.
|
Most Popular This Week
IE Weekly Newsletter
Subscribe to the newsletter
|
| ||||||||||||||||||||||||||||||||









