A Paragon of Quality

What should you consider for improving information quality in data warehouses?

by Pushpak Sarkar

Information quality (IQ) is becoming increasingly important for data warehouses because of knowledge workers' constantly growing demands for data analysis. This growing importance reflects the rising need to make strategic use of the data integrated from heterogeneous source systems. Enterprises also need to instill quality and accuracy into their data warehouses to maximize their return on investment.

Better quality information lets an enterprise improve its decision-making process by basing it on objective facts collected from the field using operational systems. IQ also helps the enterprise discover cross-marketing opportunities for increasing revenues, and it reduces the costs of error detection and correction. Ultimately, IQ builds credibility and improves the image of the overall data warehouse in the eyes of the customer. With this background, I'll discuss the major IQ considerations that information managers need to address in the data warehouse environment.

IQ in the Data Warehouse

The data integration process in a data warehouse system is analogous to the production process in the manufacturing industry. In the same way that the production process generates material products, the data warehouse produces information that is used by the customer, namely the knowledge worker. The heterogeneous source data corresponds to raw materials provided by suppliers (data sources), and the data integration process in the data warehouse corresponds to the information manufacturing process (see Figure 1).
The similarities between manufacturing and data warehouse processing could make the well-established total quality management (TQM) concepts known in the manufacturing sector relevant to the world of data warehousing and IQ. Indeed, IQ as a discipline evolved from TQM. The renowned quality expert J.M. Juran defines quality as being focused on the "consumer and the product's fitness for use." The expectations of today's information consumers go beyond having accurate data. The information should also be timely, easily accessible, and relevant to the consumer's tasks. Hence, you must treat information not just as a product, but also as a service. When you speak of IQ, you should include aspects of both product and service quality.

IQ Activities in the Enterprise

IQ management in any enterprise typically involves the following major activities:
To improve the IQ in a data warehouse, you have to attempt to measure and improve the various data components that form the foundation of IQ:
Measurement Process for IQ

You can measure the IQ in your data warehouses through an IQ assessment exercise, which attempts to determine the current state of the IQ processes as objectively as possible and certifies the reliability of the data made available to the knowledge worker. The IQ assessment tries to determine whether the data conforms to the specified business rules and has valid data values as defined in common domains. Typically, you can conduct this assessment using automated data assessment tools. (See the sidebar "Automated Tools for Improving IQ," and Table 1.)

An IQ assessment in a data warehouse will typically comprise the following steps:

Step 1: Identify an application area or information group where poor quality can cause significant negative impact.

Step 2: Establish the purpose for IQ measurement and identify the relevant measures to assess.

Step 3: Determine a detailed list of files or processes that should be assessed as part of the IQ assessment exercise.

Step 4: Identify data sources that will be used to validate the accuracy of data available in the data warehouse.

Step 5: Extract a random sample of data from the data warehouse as well as from the sources for data validation. The data sampling should be based on appropriate statistical techniques because the entire IQ assessment depends on the validity of this step.

Step 6: Analyze the IQ of the collected sample data against the specified quality criteria.

Step 7: Interpret and report IQ assessment findings. Once the data is analyzed, interpret the results and communicate the findings back to the knowledge workers and information producers, both as feedback and to establish the credibility of the data warehouse.
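Steps 4 through 6 above can be illustrated with a small Python sketch. It is only a minimal example under stated assumptions, not the method used by any particular assessment tool: the record layout (plain dictionaries), the `Rule` class, and the function names `draw_sample`, `conformance_rates`, and `source_match_rate` are all hypothetical, and simple random sampling stands in for whatever statistical sampling technique an actual assessment would choose.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class Rule:
    """A hypothetical business rule: a name plus a per-record check."""
    name: str
    check: Callable[[dict], bool]


def draw_sample(records: Sequence[dict], sample_size: int, seed: int = 0) -> List[dict]:
    """Step 5 (assumed form): simple random sample without replacement."""
    rng = random.Random(seed)  # fixed seed so the assessment is repeatable
    k = min(sample_size, len(records))
    return rng.sample(list(records), k)


def conformance_rates(sample: Sequence[dict], rules: Sequence[Rule]) -> Dict[str, float]:
    """Step 6: fraction of sampled records that satisfy each rule."""
    if not sample:
        raise ValueError("sample is empty")
    return {
        rule.name: sum(1 for rec in sample if rule.check(rec)) / len(sample)
        for rule in rules
    }


def source_match_rate(sample: Sequence[dict], source_by_key: Dict[object, dict],
                      key: str, fields: Sequence[str]) -> float:
    """Steps 4 and 6: fraction of sampled warehouse records whose fields
    agree with the corresponding record in the validating source."""
    if not sample:
        raise ValueError("sample is empty")
    matched = 0
    for rec in sample:
        src = source_by_key.get(rec[key])
        if src is not None and all(rec.get(f) == src.get(f) for f in fields):
            matched += 1
    return matched / len(sample)


# Hypothetical rules: a domain check and a simple business rule.
VALID_STATES = {"NY", "CA", "TX"}
RULES = [
    Rule("state_in_domain", lambda r: r.get("state") in VALID_STATES),
    Rule("order_total_non_negative", lambda r: r.get("order_total", -1) >= 0),
]
```

The per-rule rates and the source match rate are the kind of figures that Step 7 would then interpret and report back to knowledge workers and information producers.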
Desirable IQ Measures

Measuring the quality of the information that the data warehouse generates as its final output can be difficult because of the multiple processes, such as extract, transform, and load, involved in its multilayered architecture. I'll therefore take a different approach by looking at some of the most desirable measures and metrics for information generated by a data warehouse:











