Joint EffortAdminister the distributed data warehouse correctly or lose it!By Ralph KimballAs you may have noticed, my column's name has changed to "Data Warehouse Designer." This new name is more descriptive of the broader scope and more practical flavor of the subjects I am now covering here. I will continue to invite a number of experienced data warehouse practitioners to alternate with me in writing these columns. I hope you will enjoy some of their fresh perspectives. While we are on the subject of new things, this column's topic is about the most important new development in data warehousing: the distributed data warehouse (DDW). Inspired by the Web and the ability to share data through XML, a DDW arises whenever several organizations agree to share data. A DDW encourages "drilling across" many separated databases to produce coordinated overall results. The most visible and urgent DDWs are supply chains in which confederations of manufacturers, distributors, and retailers share data in order to implement just-in-time "efficient consumer response." But there are many other examples of DDWs in other industries. The advantages of a distributed architecture are enormous. In situations like supply chains, we simply can't build a centralized data warehouse. The separate organizations have separate technologies, IT staffs, data elements, and administrative rhythms. What ties these diverse entities together is a strong desire to share data, but not share much else! With the benefit of a few years' experience building DDW architectures, we are beginning to understand the implications of administering and supporting these systems. We need to do our homework in understanding DDW administration, because if we do it incorrectly we will lose just about all the advantages we hope for. Before outlining what it takes to administer a DDW, it's important to define carefully what it is. A DDW is a collection of organizations that:
It should be evident that conformed dimensions are copied and distributed widely around the DDW, and that the conformed fact tables exist as single copies within their respective DDW member organizations. Centralized AgreementA DDW doesn't come about because a group of organizations just wants to share data. Significant planning, compromises, and technical development must take place before a DDW can go live. All the organizations participating in the DDW must embrace a centralized agreement. The members of the DDW agree to the following: To define and use conformed dimensions. For instance, all members will use the Item dimension in a supply chain. All must be willing to live with the attribute names and attribute values in this centrally defined dimension. All the members of the DDW agree to rely on a single item czar who will administer and publish the Item dimension. This person takes the title "dimension authority." Each conformed dimension in the overall DDW needs a dimension authority. Although not mandatory, it probably would work best to have all the dimension authorities work together in a single office. To define and use conformed facts. For instance, all members must agree on the definition of the "revenue" measure in a DDW so that separate revenue measurements coming from separate fact tables can be combined mathematically. Unlike with conformed dimensions, the individual member organizations create these facts locally when they publish their own fact tables. To use the surrogate keys contained in the conformed dimensions. The dimension authority creates a well-administered set of simple keys for each record in the dimension. These keys will usually be integers devoid of any structure, meaning, or order. The Calendar Date dimension is the only exception to the lack of ordering, because we usually use the Calendar Date surrogate key as the basis for physically partitioning the downstream fact tables. The surrogate keys are the primary keys of the dimensions, and all fact tables using conformed dimensions must use the surrogate key values as foreign keys. To synchronize the addition of new fact records with the correct current release of a dimension. Each member organization makes sure that new fact records they add to their respective fact tables are processed using the correct surrogate keys defined in the most recent release of each dimension. If the member organization has implemented a surrogate key pipeline with lookup tables for finding the contemporary surrogate keys, these lookup tables must be updated whenever a new release of a dimension is received. See the article "Pipelining Your Surrogates" from the June 1998 issue of DBMS magazine (available at www.ralphkimball.com or www.dbmsmag.com).
|
Most Popular This Week
IE Weekly Newsletter
Subscribe to the newsletter
|
|
|











