To Be or Not To Be Centralized

Contrary to conventional data warehouse wisdom, physical centralization is not the question

by Margy Ross

As the data warehouse market matures, the cause of data warehouse "pain" (otherwise known as vendor growth opportunity) within the IT organization is bound to evolve. Vendors promote centralization as a miracle elixir to treat data warehouse ailments. They claim it spins independent, disparate data marts into gold by reducing administrative costs and improving performance. Physical centralization may deliver some efficiencies; however, you can't afford to bypass the larger, more important issues of integration and consistency. That's the focus of this column, part 6 of the Fundamentals series.

If your data warehouse environment has been developed without an overall architecture or strategy, you're probably dealing with multiple, independent islands of data with the following characteristics:
Some have tried to implicate data marts as the root cause of these problems. That's a generalization that fails to acknowledge the benefits many organizations have realized with properly designed data marts. The problems I listed result from a nonexistent, poorly defined, or inappropriately executed strategy and can crop up with any architectural approach, including the enterprise data warehouse, the hub-and-spoke data warehouse, and distributed or federated data marts.

All That Glitters Is Not Gold

We can all agree that independent, isolated sets of data warrant attention, because they're inefficient and incapable of delivering on the business promise of data warehousing. These stand-alone databases may be easier to implement initially, but without a higher-level enterprise integration strategy, they're dead ends that perpetuate incompatible views of the organization. Merely moving these renegade data islands onto a bigger, better centralized platform to give the appearance of centralization is no silver bullet: Data integration and consistency are the true targets. Any approach that aims elsewhere treats the symptoms rather than the disease. While it may be simpler to just brush integration and consistency under the carpet to avoid the political or organizational challenges they pose, doing so will keep you from realizing the true business benefit of the data warehouse.

I can't stress enough the importance of logical centralization and integration in the data warehouse, regardless of the physical implementation. In the vernacular of dimensional modeling, pursuing this objective means focusing on the enterprise data warehouse bus architecture and conformed dimensions and facts. As Ralph Kimball has described many times in this column, the enterprise data warehouse bus architecture is a tool to establish and enforce the overall data integration strategy for the warehouse. It provides the framework for integrating the analytic information in your organization.
The result is a powerful centralized architecture that you can implement either as a distributed system on multiple hardware platforms and technologies or on a single, physically centralized technology. The enterprise data warehouse bus architecture is nondenominational and technology-independent.

The enterprise's bus architecture is documented and communicated via the data warehouse bus matrix. (See Figure 1 for an example.) The matrix rows represent the core business events or processes of the organization, while the columns reflect the common, conformed dimensions. Conformed dimensions are the means for consistently describing the core characteristics of your business. They're the integration points between the disparate processes of the organization, ensuring semantic consistency.

There may be valid business reasons for not conforming dimensions; for example, your organization may be a diversified conglomerate with subsidiaries that sell unique products to unique customers through unique channels. However, for most organizations, the key to integrating disparate data is organizational commitment to the creation and use of conformed dimensions throughout the warehouse architecture, regardless of whether data is physically centralized or distributed.
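
To make the row-and-column structure of the bus matrix concrete, here is a minimal Python sketch. The business processes and dimension names are hypothetical placeholders chosen for illustration (they are not taken from Figure 1), and the helper functions are assumptions about how one might check a matrix, not a prescribed method. It models rows as business processes, columns as conformed dimensions, and shows that two processes can be integrated on exactly the dimensions they share.

```python
# Hypothetical conformed dimensions: the columns of the bus matrix.
CONFORMED_DIMENSIONS = ["Date", "Product", "Store", "Customer", "Promotion"]

# Hypothetical business processes: the rows of the bus matrix.
# Each process maps to the set of conformed dimensions it uses.
BUS_MATRIX = {
    "Retail Sales": {"Date", "Product", "Store", "Customer", "Promotion"},
    "Inventory": {"Date", "Product", "Store"},
    "Purchase Orders": {"Date", "Product"},
}


def shared_dimensions(matrix, process_a, process_b):
    """Return the dimensions two processes have in common.

    These are the integration points on which results from the two
    processes can be consistently combined.
    """
    return matrix[process_a] & matrix[process_b]


def nonconformed_dimensions(matrix, conformed):
    """Return, per process, any dimension that is not in the conformed list."""
    allowed = set(conformed)
    return {
        process: dims - allowed
        for process, dims in matrix.items()
        if dims - allowed
    }


def render(matrix, conformed):
    """Render the matrix as plain text: one row per process, X where used."""
    width = max(len(process) for process in matrix)
    lines = [" " * width + " | " + " | ".join(conformed)]
    for process, dims in matrix.items():
        cells = [("X" if d in dims else "").center(len(d)) for d in conformed]
        lines.append(process.ljust(width) + " | " + " | ".join(cells))
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(BUS_MATRIX, CONFORMED_DIMENSIONS))
    print(sorted(shared_dimensions(BUS_MATRIX, "Inventory", "Purchase Orders")))
```

Nothing in the sketch says where the data physically lives. The same matrix describes a single centralized platform or several distributed ones, which is the sense in which the architecture is technology-independent.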











