Intelligent Enterprise

Better Insight for Business Decisions

May 13, 2003

TCO Starts With the End User

The conventional view of data warehouse total cost of ownership is myopic and wrong

by Ralph Kimball

Continued from Page 1

In all of these types of problems, the potential cost is the failure of the data warehouse. This cost dominates and renders traditional cost analyses meaningless. There's no upper bound on the potential cost of not being able to make the right decision. Your goal, then, is to replace this potentially unbounded cost with finite, knowable costs, and at the same time eliminate the risks of losing the data warehouse. Many of these finite costs are in the numeric range of the less-important costs I listed at the start of this column.

A Closer Look

Taking a constructive view, let's look at these sources of cost to the data warehouse, how much they affect the overall organization, and what we can do to reduce them.

Data needed for decisions is unavailable. This is the big one. Unavailable data means the data warehouse failed to inform decisions. We want to replace this unbounded and unknowable cost with the predictable costs of gathering business requirements from the end users, studying what information end users need when making decisions, regularly canvassing end-user decision makers to understand new requirements, and systematically trawling for new sources of data and new metrics that explain or predict events.

Lack of partnership between IT and end users. When IT and the end users don't have a good partnership, the end users will be frustrated because they aren't being served well, and IT will blame the end users for complaining, not being computer literate, and not reading the documentation. The result is a failed or underperforming data warehouse, probably with no clear consensus on how to fix it. Decisions will be missed because the system isn't usable. A good partnership means that IT staff live in the end-user environments, and that there's a flow of personnel across the end-user/IT organization boundary. I've often said that the best application support person is permanently conflicted as to whether the business or the technology is more appealing. These people (of whom I am one) spend their entire careers moving back and forth across the end-user/IT boundary. The cost to address this problem is an explicit program of tours of duty in which IT people spend a year or longer working directly in the end-user department. End-user credibility for IT personnel is the "gold coin."

Lack of explicit end-user-focused cognitive and conceptual models. IT application designers systematically make things too complicated, or assume that end users will be adept at wrangling data from one computer window into another, or assume that end users even want to perform analysis. End users come in many flavors. A good IT applications delivery team will carefully profile the cognitive and computer sophistication of the end users, and at the same time construct conceptual models of how the user performs a task and makes a decision. Then the team can choose or configure the delivery tools to be the best match. The cost of this approach is significant. Multiple tools may be needed. More custom user interfaces and canned reports may have to be built. In my experience, this focus on end users is rare.

Data needed for decisions is delayed. There's been a groundswell of interest recently in providing real-time data warehouse access. The tongue-in-cheek definition of real time is "any data delivery that is too fast for the current extract, transform, load system." The demand for real-time data warehousing appears in many market trading, customer support, credit approval, security authorization, medical diagnosis, and process management situations. The killer, of course, is a data delivery system that's too slow to support the decision that must be made in real time. The costs of real-time data delivery can be significant, and there's no single approach. I described one piece of this puzzle in my column "Realtime Partitions" (Feb. 1, 2002).

Unconformed dimensions. If a customer dimension (for instance) is unconformed, it means that two of your data sources have incompatible customer categorizations and labels. The result is that the two data sources can't be used together. Or, more insidiously, the data sources will look like they can be compared but the logic is wrong. The cost, once again, is a lost opportunity to be well informed about your customers, but there are huge unreported costs when managers waste time resolving the data incompatibilities and vent their anger at not having comprehensible data. The more desirable cost in this case is the cost of resolving the categorization and labeling differences up front when designing conformed dimensions. For example, see my recent Fundamentals series article "Divide and Conquer" (Oct. 30, 2002).
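To make the idea concrete, here is a minimal Python sketch of conforming a customer dimension. Everything in it is an assumption for illustration: the two source systems ("billing" and "support"), their local category codes, the conformed labels, and the numbers are all hypothetical. It shows only the principle that once both sources share the same labels, their results can be combined safely.

```python
# Hypothetical mapping from each source's local customer category
# to a single conformed label agreed on up front.
CONFORMED_SEGMENT = {
    ("billing", "SMB"): "Small Business",
    ("billing", "ENT"): "Enterprise",
    ("support", "small-biz"): "Small Business",
    ("support", "corporate"): "Enterprise",
}


def conform(source, rows):
    """Relabel (local_segment, amount) rows with the conformed segment."""
    return [(CONFORMED_SEGMENT[(source, seg)], amount) for seg, amount in rows]


def total_by_segment(rows):
    """Sum amounts per conformed segment."""
    totals = {}
    for seg, amount in rows:
        totals[seg] = totals.get(seg, 0) + amount
    return totals


# Invented sample data: revenue from billing, ticket counts from support.
billing_rows = [("SMB", 1200), ("ENT", 5000)]
support_rows = [("small-biz", 30), ("corporate", 45)]

revenue = total_by_segment(conform("billing", billing_rows))
tickets = total_by_segment(conform("support", support_rows))

# Because both results use identical labels, they can be combined row by row.
revenue_per_ticket = {seg: revenue[seg] / tickets[seg] for seg in revenue}
```

Without the shared mapping, "SMB" and "small-biz" would never line up, or worse, similar-looking labels would line up when they shouldn't.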

Unconformed facts. Unconformed facts are related to unconformed dimensions. They arise when two numeric measures are similar but cannot be logically combined in a calculation such as a ratio or a difference. For instance, two different revenue numbers may not be put into the same calculation because one is before tax adjustments and one is after. The cost to fix this problem can be combined with the cost of conforming dimensions and can be accomplished by the same people in the same meetings.
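One way to picture a conformed fact is as a measure whose name states exactly what it means, so that incompatible measures can't be combined by accident. The Python sketch below is only an illustration under that assumption; the Fact class, the fact names, and the figures are hypothetical and don't describe any particular tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    name: str     # hypothetical conformed fact name, e.g. "revenue_after_tax"
    value: float


def difference(a, b):
    """Subtract b from a, but only if both carry the same conformed name."""
    if a.name != b.name:
        raise ValueError(f"cannot combine {a.name} with {b.name}")
    return Fact(a.name, a.value - b.value)


# Invented figures: two quarters measured the same way can be compared.
q1 = Fact("revenue_after_tax", 900.0)
q2 = Fact("revenue_after_tax", 1000.0)
growth = difference(q2, q1)
```

Attempting difference(Fact("revenue_before_tax", 1100.0), q1) raises an error rather than quietly producing a misleading number.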

Insufficiently verbose data. Providing verbose dimensional descriptions is a basic responsibility of the data warehouse designer. Each attribute in a product or customer dimension is a separate entry point into the data because attributes are the dominant means of constraining and grouping data. The cost of making data more verbose usually comes from finding, cleaning, and merging auxiliary data sources into the database.

Data in awkward formats. There are several categories of poorly formatted data that defeat the end users, even when the data is present. By far, the worst offender is data presented to the end user in an entity-relation (E/R) format. These complex E/R schemas are impossible for end users to understand, and they require custom schema-dependent programming to deliver queries and reports. A few vendors actually recommend E/R schemas for data warehouse delivery and then sell extremely expensive hardware solutions to IT that are powerful enough to overcome these inefficient schemas. What these vendors systematically avoid is an honest accounting of the application development costs and lost opportunity costs they have transferred to the end users.

Sluggish, unresponsive delivery of data. End users have little tolerance for slow user interfaces. The only truly acceptable response time is instantaneous, and the data warehouse designer must always have this as a goal. End users aren't likely to try ad hoc queries more than once if they take many minutes or hours to return a result. Using a really fast decision-support system is qualitatively different from using a system that has to be run in batch job mode. Users of a fast system try far more alternatives and explore more directions than users of a slow system. Fixing a slow system is a multipronged challenge, but it starts with good database design and good software. Dimensional models are fast and E/R models are slow, given the same hardware capacities. After addressing the database design and choice of software, the next relevant performance knobs are lots of real memory (RAM), proper tuning with aggregations and indexes, and raw CPU speed.

Data locked in a report or dashboard. Data that can't be transferred in tabular format with a single command to a spreadsheet is locked uselessly in an application. Choose applications that allow any visible data to be copied by the end user to another tool, especially a spreadsheet.
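As a rough sketch of what "a single command to a spreadsheet" can mean, the Python below turns whatever rows a report is showing into tab-separated text, which spreadsheets accept on paste or import. The function name to_tsv and the sample rows are hypothetical.

```python
import csv
import io


def to_tsv(headers, rows):
    """Return the visible report data as tab-separated text."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(headers)
    writer.writerows(rows)
    return buf.getvalue()


# Invented report contents.
report_text = to_tsv(["region", "sales"], [["East", 100], ["West", 250]])
```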

Prematurely aggregated data. Data marts that consist of aggregated (not atomic) data are dangerous because they anticipate a set of business questions and prohibit the end user from drilling down to needed detail. This was the fatal mistake of the "executive information system" movement in the late 1980s. The cost to solve this problem, of course, is to base all data marts on the most atomic data. Atomic data is the most naturally dimensional data and is the most able to withstand the ad hoc attack where end users pose unexpected and precise questions.
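The following Python sketch illustrates why atomic data holds up under unexpected questions. The sales rows and the roll_up helper are hypothetical. The point is that any summary can be computed from atomic rows on demand, whereas a mart that stored only the per-store totals could never answer the per-product question.

```python
from collections import defaultdict

# Invented atomic sales rows: one row per individual transaction line.
atomic_sales = [
    {"date": "2003-05-01", "store": "A", "product": "widget", "amount": 10},
    {"date": "2003-05-01", "store": "A", "product": "gadget", "amount": 5},
    {"date": "2003-05-02", "store": "A", "product": "widget", "amount": 7},
    {"date": "2003-05-02", "store": "B", "product": "widget", "amount": 3},
]


def roll_up(rows, *keys):
    """Sum 'amount' grouped by whichever dimension attributes are requested."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[k] for k in keys)] += row["amount"]
    return dict(totals)


# The anticipated question: sales by store.
by_store = roll_up(atomic_sales, "store")

# The unanticipated drill-down: sales by store and product.
by_store_product = roll_up(atomic_sales, "store", "product")
```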

Focus on data warehouse ROI. A recent issue of ComputerWorld had a series of articles on ROI approaches for IT managers. There were separate articles on the main methods of calculating ROI: payback period, net present value, internal rate of return, balanced scorecard, and economic value added. In my opinion, all of these missed the main point of evaluating the costs and eventually the value of a data warehouse. A data warehouse supports decisions. After a decision is made, give the data warehouse a portion of the credit, and then compare that retrospectively to the costs of the warehouse. My rule of thumb is to take 20 percent of the revenue value of a decision made and book that to the benefit of the data warehouse. This really drives home the point that the only meaningful view of the warehouse is its ability to support end-user decisions.
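The 20 percent rule of thumb is easy to work through. In the Python sketch below, only the 20 percent share comes from the text; the decision revenue values and the annual warehouse cost are invented purely to show the arithmetic.

```python
CREDIT_PERCENT = 20  # rule-of-thumb share of a decision's revenue value


def warehouse_benefit(decision_revenues, percent=CREDIT_PERCENT):
    """Book a fixed percentage of each decision's revenue to the warehouse."""
    return sum(revenue * percent / 100 for revenue in decision_revenues)


# Hypothetical revenue values of three decisions the warehouse supported.
decisions = [500_000, 1_250_000, 250_000]

# Hypothetical yearly cost of running the warehouse.
annual_cost = 300_000

benefit = warehouse_benefit(decisions)   # 20% of 2,000,000
net = benefit - annual_cost              # retrospective comparison to cost
```

With these made-up numbers, the warehouse is credited with 400,000 against a cost of 300,000.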

Creation of a corporate data model. In most cases, an a priori effort to model an organization's data is a waste of time. All too often the model is an ideal expression of how data ought to be, and the thousands of entities created are never actually physically populated. It may be fun, and even mildly educational, but the corporate data model is a waste of time that delays the data warehouse. Now, a data model that describes an actual data source, warts and all, is probably a good thing...

Mandate to load all data into the warehouse. Finally, following a mandate to source "all data" in an organization is an excuse to avoid talking to the end users. While it's necessary to have a design perspective that understands the basic dimensionality and content of all your data sources, most IT shops will never be able to address more than a fraction of all their possible data sources. When the preliminary data audit is finished, it's time to go hang out with the end users and understand which of those data sources need to be published in the data warehouse first.

Esoteric Value

I hope that after reading about all these potential and real costs of the data warehouse you've almost forgotten about hardware, software, and services. Years ago, at Xerox PARC (now just PARC), I was shocked when Alan Kay, inventor of the personal computer, said, "Hardware is tissue paper. You use it and throw it away." That seemed disrespectful of all the tangible pieces of gear we had at the research center, but he was right. For the data warehouse, the only thing that matters is to effectively publish (in the most pleasing and fulfilling sense) the right data (that supports our ability to make decisions).

Thanks for reading my 100 columns.




Ralph Kimball (founder of the Ralph Kimball Group) co-invented the Star Workstation at Xerox and founded Red Brick Systems. He has three best-selling data warehousing books in print, including The Data Warehouse Toolkit, Second Edition (Wiley, 2002). He teaches dimensional data warehouse design through Kimball University and critically reviews large data warehouse projects. You can reach him through his Web site, www.ralphkimball.com.


RESOURCES

Related Articles on Application Integration at IntelligentEnterprise.com:

"Realtime Partitions," Feb. 1, 2002

"Divide and Conquer," Oct. 30, 2002







