CMP -- United Business Media

Intelligent Enterprise

Better Insight for Business Decisions

October 8, 2002

A Paragon of Quality

What should you consider for improving information quality in data warehouses?

by Pushpak Sarkar

Information quality (IQ) is becoming increasingly important for data warehouses because of knowledge workers' constantly growing demands for data analysis. This growing importance reflects the rising need to make strategic use of the data integrated from heterogeneous source systems.

Enterprises also need to instill quality and accuracy into their data warehouses to maximize their return on investment. Better quality information lets an enterprise improve its decision-making process by basing it on objective facts collected from the field through operational systems. IQ also helps the enterprise discover cross-marketing opportunities for increasing revenues and reduces the costs of error detection and correction. Ultimately, IQ builds credibility and improves the overall image of the data warehouse with its customers.

With this background, I'll discuss the major IQ issues that information managers need to consider for the data warehouse environment.

IQ in the Data Warehouse

The data integration process in a data warehouse system is analogous to the production process in the manufacturing industry. In the same way that the production process generates material products, the data warehouse produces information that is used by the customer — namely the knowledge workers. The heterogeneous source data corresponds to raw materials provided by suppliers (data sources), and the data integration process in the data warehouse corresponds to the information manufacturing process (see Figure 1).

Executive Summary

Pushpak Sarkar

Any successful data warehouse project needs to generate information that is credible, usable, and has business value to its business users. To accomplish this objective, enterprises using data warehouses need an overall information quality (IQ) strategy. This article discusses how to plan, develop, and implement an IQ strategy. It also describes the types of IQ tools and platforms required.

The similarities between manufacturing and data warehouse processing could make the well-established total quality management (TQM) concepts known in the manufacturing sector relevant to the world of data warehousing and IQ. Indeed, IQ's evolution as a discipline originated from TQM.

The renowned quality expert J.M. Juran defines quality in the following manner: Quality is focused on the "consumer and the product's fitness for use." The expectations of today's information consumers go beyond having accurate data. The information should also be timely, easily accessible, and relevant to the consumer's tasks. Hence, you must treat information not just as a product, but also as a service. When you speak of IQ, you should include aspects of both product and service quality.

IQ Activities in the Enterprise

IQ management in any enterprise typically involves the following major activities:

  • Quality policy: establishing the overall quality-related intentions and goals of an organization
  • Quality planning: setting quality objectives and specifying the processes and related resources necessary to fulfill these objectives
  • Quality control: executing processes to fulfill quality requirements
  • Quality assurance: providing confidence that quality requirements will be fulfilled
  • Quality improvement: increasing the ability to fulfill quality requirements

To improve the IQ in a data warehouse, you have to attempt to measure and improve the various data components that form the foundation of IQ:

  • Data definition quality is the value of the data specification and includes the clear and precise definition of the associated elements and business rules. Completely defining data by assessing the data definition and information architecture helps provide clear criteria to measure the quality of the data generated after the data warehouse has been implemented.
  • Data content quality is the correctness of the data values physically stored in the data warehouse. This data has to conform to the approved business rules and the data specification set out in the data definition process.
  • Data presentation quality is the value of the information product as perceived by the knowledge worker using the information. Typically, the information should be timely, usable, and useful at this level.
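The link between data definition quality and data content quality can be illustrated with a minimal Python sketch. It assumes business rules can be written as simple per-field checks. The field names (`customer_id`, `state`, `order_total`), the approved state domain, and the helper functions are hypothetical and exist only for illustration; they do not come from any particular IQ tool.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class FieldRule:
    """One entry of a (hypothetical) data definition: a field and its business rule."""
    name: str
    description: str
    is_valid: Callable[[Any], bool]


# Assumed data definition: every field name and domain below is illustrative only.
RULES = [
    FieldRule("customer_id", "non-empty identifier",
              lambda v: isinstance(v, str) and v.strip() != ""),
    FieldRule("state", "code from the approved domain",
              lambda v: v in {"NY", "NJ", "CT"}),
    FieldRule("order_total", "non-negative amount",
              lambda v: isinstance(v, (int, float)) and v >= 0),
]


def violations(record, rules=RULES):
    """Return the names of the fields in one record that break their rule."""
    return [rule.name for rule in rules if not rule.is_valid(record.get(rule.name))]


def conformance_rate(records, rules=RULES):
    """Fraction of records that satisfy every rule (1.0 for an empty input)."""
    if not records:
        return 1.0
    clean = sum(1 for record in records if not violations(record, rules))
    return clean / len(records)
```

Writing the rules down before loading data is what makes content quality measurable: `conformance_rate` can only be computed against a definition that is clear enough to be expressed as a check.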

Measurement Process for IQ

You can measure the IQ in your data warehouses through an IQ assessment exercise, which attempts to find out the current state of the IQ processes as objectively as possible and certifies the reliability of the data made available to the knowledge worker.

The IQ assessment tries to determine whether the data conforms to the business rules specified or has valid data values as defined in common domains. Typically, you can conduct this assessment using automated data assessment tools. (See the sidebar "Automated Tools for Improving IQ," and Table 1.) An IQ assessment in a data warehouse will typically comprise the following steps:

Step 1: Identify an application area or information group where poor quality can cause significant negative impact.

Step 2: Establish the purpose for IQ measurement and identify the relevant measures to assess.

Step 3: Determine a detailed list of files or processes that should be assessed as part of the IQ assessment exercise.

Step 4: Identify data sources that will be used to validate the accuracy of data available in the data warehouse.

Step 5: Extract a random sample of data from the data warehouse as well as sources for data validation. The data sampling should be based on appropriate statistical techniques because the entire IQ assessment depends on the validity of this step.

Step 6: Analyze the IQ of the collected sample data against the specified quality criteria.

Step 7: Interpret and report IQ assessment findings. Once the data is analyzed, the IQ results for the assessed data have to be interpreted, and the findings have to be communicated back to the knowledge workers and information producers. This feedback helps establish the credibility of the data warehouse.
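Steps 5 through 7 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a description of any automated assessment tool: warehouse and source data are assumed to be dictionaries keyed by a shared record identifier, the sample is a simple random sample, and the margin of error uses the common normal approximation at roughly 95 percent confidence. The function names and the report layout are hypothetical.

```python
import math
import random


def sample_keys(keys, sample_size, seed=0):
    """Step 5: draw a reproducible simple random sample of record keys."""
    rng = random.Random(seed)
    ordered = sorted(keys)  # fixed order so the same seed gives the same sample
    return rng.sample(ordered, min(sample_size, len(ordered)))


def assess_accuracy(warehouse, source, keys, fields):
    """Steps 6 and 7: compare sampled warehouse rows with the source and summarize."""
    mismatches = []
    for key in keys:
        wh_row = warehouse[key]
        src_row = source.get(key)
        # A row counts as inaccurate if the source lacks it or any field differs.
        if src_row is None or any(wh_row.get(f) != src_row.get(f) for f in fields):
            mismatches.append(key)
    n = len(keys)
    accuracy = (n - len(mismatches)) / n if n else 0.0
    # Assumed approximation: 1.96 * sqrt(p * (1 - p) / n), no finite-population correction.
    margin = 1.96 * math.sqrt(accuracy * (1 - accuracy) / n) if n else 0.0
    return {
        "sample_size": n,
        "accuracy": accuracy,
        "margin_of_error": margin,
        "mismatched_keys": mismatches,
    }
```

A typical call would pass the sampled keys straight into the assessment, for example `assess_accuracy(warehouse, source, sample_keys(warehouse.keys(), 500), ["amount"])`. The returned dictionary is the kind of summary that could be reported back to knowledge workers and information producers.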

Desirable IQ Measures

Measuring the quality of the information that the data warehouse generates as its final output can be difficult because of the multiple processes involved in its multilayered architecture, such as the extract, transform, and load processes. I'll therefore take a different approach by looking at some of the most desirable measures and metrics for information generated by a data warehouse:






