Intelligent Enterprise

Better Insight for Business Decisions

October 8, 2002

A Paragon of Quality

What should you consider for improving information quality in data warehouses?

by Pushpak Sarkar

Continued from Page 1

Usability of data. The knowledge worker will only use information if it is usable and easy to work with. If information generated by the data warehouse has to be extensively manipulated by analysts, then the information lacks the necessary quality.

Definition conformance. In a mature IQ environment, the individual data elements should contain values that are consistent with the official data definition. Data values stored in the data warehouse should comply with the data definition specifications.
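
As a rough illustration of definition conformance, the sketch below measures the share of stored values that comply with a data definition. The DataDefinition class, the gender_code element, and its allowed values are hypothetical and stand in for whatever your data dictionary actually specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataDefinition:
    """Hypothetical data definition for one data element."""
    name: str
    allowed_values: frozenset


def conformance_rate(values, definition):
    """Return the fraction of values that comply with the definition."""
    if not values:
        return 1.0  # nothing stored, so nothing violates the definition
    conforming = sum(1 for v in values if v in definition.allowed_values)
    return conforming / len(values)


# Assumed definition: gender_code may only be M, F, or U.
gender = DataDefinition("gender_code", frozenset({"M", "F", "U"}))
print(conformance_rate(["M", "F", "X", "U"], gender))  # 0.75
```

A real definition would usually also cover data type, length, and format, not only a list of allowed values.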

Nonduplication and collaboration of data. The more you distribute or duplicate the data, the lower the chance of data consistency, which leads to a lack of credibility among knowledge workers. Information architecture should promote enterprisewide data collaboration.

Consistent representation. In most large enterprises, duplicated data appears in different applications across the enterprise. The redundant, distributed data with common data elements should be consistent across the various applications.

Derivation integrity. Summarization and derivation of transaction data play a key role in the data warehouse environment. Mature IQ processes ensure that data summarized in the data warehouse from the source data is accurate and in accordance with the specified derivation rules.
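
One way to test derivation integrity is to recompute a summary from the source transactions and compare it with what the warehouse stores. The sketch below assumes the simplest possible derivation rule (a sum of amounts per key); the function name, the record layout, and the tolerance are illustrative assumptions, not part of any particular tool.

```python
from collections import defaultdict


def find_derivation_errors(transactions, summary, tolerance=0.005):
    """Recompute per-key totals from source rows and compare with the summary.

    transactions: iterable of (key, amount) pairs from the source system.
    summary: dict mapping key -> total stored in the data warehouse.
    Returns a dict of key -> (expected, stored) for every mismatch.
    """
    expected = defaultdict(float)
    for key, amount in transactions:
        expected[key] += amount

    errors = {}
    for key in set(expected) | set(summary):
        exp = expected.get(key, 0.0)
        stored = summary.get(key)
        # A missing summary row or a difference beyond the tolerance is an error.
        if stored is None or abs(exp - stored) > tolerance:
            errors[key] = (exp, stored)
    return errors


source_rows = [("east", 100.0), ("east", 50.0), ("west", 75.0)]
warehouse_summary = {"east": 150.0, "west": 80.0}
print(find_derivation_errors(source_rows, warehouse_summary))
# {'west': (75.0, 80.0)}
```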

Fact completeness. Mature IQ processes ensure that users have all the necessary data required to complete their analytic needs. If a knowledge worker requires five elements to complete a calculation, but only four are available in the data warehouse, then the information lacks fact completeness.
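
The five-element example can be expressed as a simple set comparison. The element names below are invented for illustration.

```python
def missing_elements(required, available):
    """Return the required data elements that the warehouse does not provide."""
    return sorted(set(required) - set(available))


# Hypothetical calculation that needs five elements; the warehouse holds four.
required = ["units_sold", "unit_price", "discount", "tax_rate", "shipping_cost"]
available = ["units_sold", "unit_price", "discount", "shipping_cost"]
print(missing_elements(required, available))  # ['tax_rate']
```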

Accessibility. Information has to be easily accessible to the end user on demand. If getting the information involves running multiple steps or applications, the user will view the information as less usable.

Timeliness. A mature IQ environment ensures that information is available to the knowledge workers on time per predefined service level agreements between the data warehouse and the user community. Information loses its value if it doesn't reach the knowledge worker in time.
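
A timeliness check reduces to comparing delivery time against the agreed service level. In this sketch the six-hour window and the timestamps are assumed values; an actual agreement would define its own reference points.

```python
from datetime import datetime, timedelta


def is_timely(data_as_of, delivered_at, sla):
    """Return True if the data reached users within the agreed SLA window."""
    return delivered_at - data_as_of <= sla


# Assumed SLA: data must be available within 6 hours of the business close.
sla = timedelta(hours=6)
business_close = datetime(2002, 10, 7, 23, 59)

print(is_timely(business_close, datetime(2002, 10, 8, 5, 0), sla))  # True
print(is_timely(business_close, datetime(2002, 10, 8, 9, 0), sla))  # False
```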

Appropriate level of data. The information generated by the data warehouse should be at the right level of granularity and precision, based on the user's requirements.

(These IQ measures are largely based on Larry English's IQ methodology; see Improving Data Warehouse and Business Information Quality: Methods for Reducing Costs and Increasing Profits, Wiley, 1999.)

IQ Best Practices

Organizations that place a top priority on IQ use a combination of people-, process-, and technology-related measures to lay the foundation for improving IQ within their enterprises. Understanding what's worked for other enterprises can help you develop a mature IQ process for your own data warehouse environment.

The fundamental lesson in IQ is to view your information as a strategic business resource owned by the enterprise. An effective information policy is required to manage IQ in the enterprise. With that in mind, introducing an information stewardship program to promote cross-system ownership and accountability of the different subject areas (marketing, finance, and so forth) in the enterprise is vital. IQ targets should be set at the outset by management. You should measure these periodically to compare actual results against targets.
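
Comparing actual results against targets can be as simple as the following sketch. The measure names and target values are hypothetical; management would set its own.

```python
def target_shortfalls(targets, actuals):
    """Return measure -> (target, actual) for measures below target or unmeasured."""
    shortfalls = {}
    for measure, target in targets.items():
        actual = actuals.get(measure)
        if actual is None or actual < target:
            shortfalls[measure] = (target, actual)
    return shortfalls


# Hypothetical targets set by management and results from one measurement period.
targets = {"definition_conformance": 0.99, "fact_completeness": 0.95}
actuals = {"definition_conformance": 0.97, "fact_completeness": 0.96}
print(target_shortfalls(targets, actuals))
# {'definition_conformance': (0.99, 0.97)}
```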

The IT stewardship committee should make efforts to raise the quality of operational data that acts as a data source to the data warehouse. Operational systems providing information for the data warehouse should be held accountable for the integrity of the data they generate.




Keep in mind that your quality improvement process should be founded on well-defined standards and benchmarks across the enterprise. Audits of the source system data should include surveys completed by knowledge workers who use the source data indirectly in a downstream DSS/reporting application. You should promote these data standards across the enterprise as they facilitate shared understanding among various user groups and better communication among different departments. Better communication would ultimately result in greater productivity among data warehouse users organizationwide.

An ongoing goal for the enterprise should be to capture data electronically only once, at the business event closest to the data origin. Data entered at the originating system is more likely to have the highest level of data integrity. To ensure data completeness at the point of capture, you should train operational system personnel adequately so that they know the data's potential downstream customers. You should also give information producers incentives to capture data for downstream processes and data warehouses.

These best practices aren't all-inclusive, as they will evolve differently in every organization. Ultimately, every organization will need to define IQ in the context of its unique business environment.


AUTOMATED TOOLS FOR IMPROVING IQ

Enterprises interested in improving IQ within their data warehouse environment can use various automated tools. Broadly speaking, five types of tools are available in the IQ market:

IQ analysis tools. This category of tools extracts data from the data warehouse, measures its quality, and reports the results of the automated assessment.

Business rule discovery tools. These tools help you understand how data is used by concentrating on the business rules that are actually practiced. They analyze data in fields and files to discover useful patterns, relationships, and rules in the underlying data.

IQ defect prevention tools. These tools automate information process quality improvement by minimizing error introduction at the source. They are similar to data reengineering and cleansing tools, but they provide cleansing during the online data creation process, rather than in batch mode. They can provide reasonability tests and assure valid values (valid codes, addresses, and so on), but they cannot necessarily assure that the values are correct.
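
The difference between a valid value and a correct one is easy to see in a small entry-time check. This is a minimal sketch; the field names, the state code list, and the age range are assumptions made for the example.

```python
# Hypothetical reference list of valid codes.
VALID_STATE_CODES = {"NJ", "NY", "PA"}


def validate_entry(record):
    """Return a list of problems found in a record at data entry time."""
    problems = []
    if record.get("state") not in VALID_STATE_CODES:
        problems.append("state: not a valid code")
    age = record.get("age")
    # Reasonability test: age must be a whole number in a plausible range.
    if not isinstance(age, int) or not 0 <= age <= 120:
        problems.append("age: fails reasonability test")
    return problems


print(validate_entry({"state": "NJ", "age": 34}))   # []
print(validate_entry({"state": "ZZ", "age": 250}))
# ['state: not a valid code', 'age: fails reasonability test']
```

The first record passes even if the customer actually lives in New York: "NJ" is a valid code, but the check has no way of knowing whether it is the correct one.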

Data reengineering and cleansing tools. These tools improve the quality of the data itself, typically through automated data correction. They usually cleanse the data and then load it into a target database or file storage system. This tool category includes the ability to do the following:

  • Fill in missing data based on data matching algorithms
  • Calculate derived or summarized data
  • Transform data from one data type to another
  • Match and consolidate duplicate data
  • Match names and addresses (a specialized function in some tools).
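
To give a feel for two of these functions, matching duplicates and filling in missing data, here is a deliberately crude sketch. It assumes records are dictionaries with a name field and matches only on a normalized name; commercial cleansing tools use far more sophisticated matching algorithms.

```python
def normalize(name):
    """Build a crude match key: lowercase, drop punctuation, collapse whitespace."""
    kept = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def consolidate(records):
    """Merge records that share a match key, filling gaps from later duplicates."""
    merged = {}
    for rec in records:
        key = normalize(rec["name"])
        if key not in merged:
            merged[key] = dict(rec)
            continue
        for field, value in rec.items():
            # Only fill fields that are still missing in the surviving record.
            if merged[key].get(field) in (None, ""):
                merged[key][field] = value
    return list(merged.values())


customers = [
    {"name": "Acme Corp.", "phone": None},
    {"name": "ACME  corp", "phone": "555-0100"},
]
print(consolidate(customers))
# [{'name': 'Acme Corp.', 'phone': '555-0100'}]
```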

Metadata management and quality tools. These tools provide automated management and quality control of data definition and information architecture development. They often are used to support data definition assessment, which includes the following major functions:

  • Evaluate data models for normalization
  • Evaluate database design for integrity (such as primary key and foreign key integrity) and performance optimization
  • Validate that data names and abbreviations conform to naming standards.
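
The naming check in the last item might look like the sketch below. The approved abbreviation list and the lowercase-with-underscores convention are assumptions; substitute your own enterprise standard.

```python
import re

# Hypothetical approved abbreviations and naming convention.
APPROVED_ABBREVIATIONS = {"cust", "acct", "amt", "dt", "id", "nbr"}
NAME_PATTERN = re.compile(r"[a-z]+(_[a-z]+)*")


def nonconforming_names(column_names):
    """Return the names that break the convention or use unapproved abbreviations."""
    bad = []
    for name in column_names:
        if not NAME_PATTERN.fullmatch(name):
            bad.append(name)
        elif any(part not in APPROVED_ABBREVIATIONS for part in name.split("_")):
            bad.append(name)
    return bad


columns = ["cust_id", "acct_amt", "CustomerNumber", "cust_no"]
print(nonconforming_names(columns))  # ['CustomerNumber', 'cust_no']
```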


Pushpak Sarkar [pushpak_sarkar@merck.com] is a senior information architect at Merck & Co. He is an information practitioner who's interested in information architecture, information quality, and data warehousing.








