Design Constraints and Unavoidable Realities

No design problem in school was this hard

The basic mission of a data warehouse professional, as I described in my previous column ("An Engineer's View," July 26, 2002), is to publish the right data. Because I am an engineer and try to build practical things that work, I then asked: What does a good engineer do when confronted with the task of presenting an organization's data most effectively? Well, unfortunately, before we can start looking through the engineer's portfolio of tools and techniques, we have to swallow some bitter medicine. We have to face, all at once, the complete list of design constraints and unavoidable realities of designing a modern data warehouse. And it's a daunting list. Perhaps more than any other job in IT, the data warehouse design task combines computer technology, cognitive psychology, business content, and politics.

Whenever I present the following design constraints and unavoidable realities, I worry that I'm encouraging prospective data warehouse professionals to seek another career. But maybe the challenges of the job are what make these techniques so attractive and compelling. We will dig our way out, I promise you, but not until we've faced the full list.

Design Constraints

Design constraints are the requirements that we, as good engineers, seek to place on our design because they're obvious and desirable goals. Unavoidable realities, as the name suggests, are requirements that we wish we could avoid, but dare not if we're being honest. The first two design constraints are, in my opinion, absolutely nonnegotiable requirements of publishing the right data:

Understandability. The final screens presented to the end user must be immediately understandable, simple, recognizable, and intuitive. This is the most challenging constraint in the whole list. As I've often said, we designers are genetically selected because we have an unusual tolerance for complexity.
Almost all of our designs are too intricate and too complicated. We put features in front of users and mistake them for solutions. We give demos when we should be listening. We should provide more blank space on our application screens with fewer choices. Intricate, tiny dashboards filled with widgets appeal to only 10 percent of our end users. Count the mouse clicks as a measure of complexity. Try to make everything work with three clicks or fewer.

Speed. From the end user's perspective, the only acceptable delay in presenting the data is zero. One of the biggest false design rules is, "The users will accept a long delay if the results are complicated (or involve processing a lot of data)." This is a mealymouthed excuse by designers who are unwilling to say, "We know this is slow, and we are working diligently to make it faster."

Implementation cost. A good engineer must work within several kinds of cost. Implementation costs include the labor costs and delays during the design phase before any useful result is delivered to end users. The design effort is probably split 70/30 between the back-room extract, transform, and load (ETL) applications and the front-room end-user queries and reports. Implementation costs swell when each data warehouse design starts from a clean slate as a "custom job" without any reusable designs. Implementation costs can't be controlled when the design approach depends on the complexity of the data. A design with 10 tables is controllable. A design with 100 tables is marginal. A design with 1,000 tables is a disaster that will fail.

Hardware and software technology cost. The hardware and software should be scaled to the requirements at hand and should be easy to extend well beyond the first implementation. Software is the gold coin in the long run for understandable and fast delivery of data to our end users. Hardware is a commodity that should periodically be discarded in favor of more powerful versions.
The most serious mistake with hardware and software is choosing a proprietary and closed hardware solution that emphasizes raw computing power on complex production schemas rather than careful software and data design. While such solutions may reduce isolated back-room costs, they drive up the costs of application development and increase the chances that end users will encounter complexity.

Daily administrative costs. Daily administration includes the routine loading of data into fact tables and dimension tables through standard ETL applications and includes the production of standard reports distributed to end users. In a dimensional data warehouse, the key ETL application is the "surrogate key pipeline."

Cost of surprises. Little surprises include late-arriving facts, late-arriving dimensions, and corrections to existing data. We know they'll happen, but we can't respond until we receive the data. We need standard techniques for handling these little surprises. Big surprises include new dimensions, new dimension attributes, new facts, and a change in the granularity of a data source. All the big surprises make us alter our database schemas while we're in production. We demand that these big surprises be "graceful," so that all existing end-user applications keep on working without requiring any recoding.

Prevention of irrelevant results. One of the biggest causes of data warehouse failures is shooting at the wrong target. The data warehouse must be relevant if it is to be successful. Relevance is not an accident; it comes from extensive and continuous business requirements gathering at the beginning and throughout the life of the warehouse. The best data warehouse engineer lives half in IT and half in the end-user department. Data warehouse professionals should have desks in the end users' department.

Prevention of inappropriate centralization. A centrally planned data warehouse is as likely to be successful as a centrally planned economy.
It sounds great on paper, and it appeals to the controlling instincts of IT, but a centrally planned data warehouse assumes perfect information and perfect control. Eventually, the problems with these assumptions, like those of a centrally planned economy, come home to roost. In the long run, a data warehouse should be a decentralized community of data marts, tied together with an architecture that makes them work together effectively, but where true control is ceded to the individual and autonomous remote departments.

Unavoidable Realities

Unavoidable realities are departures from the ideal model of the business world. Anthropologists who study the business world call the ideal model normative and the realistic model descriptive. The descriptive model throws away the procedures manual and describes the unavoidable realities of the business world, such as: