Quality Is Job One
With strategic business applications reaching more users, data quality is finally getting its due.
There's nothing like $92 million to get people's attention. Ascential Software Corp.'s acquisition of Vality Technology Inc. this past spring revived interest in software solutions to data quality problems. At The Data Warehouse Institute (TDWI) conference in San Diego, held during the week of May 13, no topic was more dominant.
Ascential, flush with cash after selling its Informix software products to IBM in 2001, also added data profiling technology acquired from Metagenix to complement the Vality package. By bringing data integration and quality solutions together, the company is making a strong bid to become the dominant one-stop shop for strategic data management infrastructure.
Ascential isn't the only contender. My first vendor meeting at TDWI was with SAS Institute Inc., which was pitching the notion of "ETLQ," that is, to "exponentially enhance the power of extract, transform, and load [ETL] procedures with data quality." Two years ago, SAS acquired data quality, cleansing, and integration specialist DataFlux and made it a wholly owned subsidiary so that DataFlux could play a neutral, "Switzerland" role without the taint of association with a dominant business intelligence (BI) vendor.
Firstlogic Inc., Arkidata Corp., and G4 Analytics were among the other companies I met with at TDWI that were carrying a strong data quality message. All view poor data quality as the biggest "showstopper" for BI, data warehousing, and strategic projects such as CRM. And all are wondering which of them will be the next to be acquired for a vast sum.
What exactly are we talking about here? Consultant Jill Dyche includes the admonition "Beware of 'dirty data'" as a checklist item for data analysis projects in her useful text, The CRM Handbook (Addison-Wesley, 2001). "That means data in its natural state," she writes, "prior to being cleansed and formatted for use by businesspeople." She offers the example of an "archaic company billing system" that produces data business users "won't touch" because of its quality problems.
The quality of data in its "natural state" depends on the practices and procedures used to capture or input it. Afterward, quality becomes more subjective, depending on the requirements of those who will analyze the data. "Cleansing" has long been something of a euphemism for properly preparing data so that it can be interpreted according to the requirements of power users and their BI and reporting tools. Cleansing steps have improved quality but never guaranteed it.
Now, judging from the way the vendors are talking, "cleansing" could come to mean an end-to-end effort: guaranteeing the quality of incoming data as well as preparing it for further, increasingly intelligent transformation to meet the needs of the BI masses. And those masses reach beyond the enterprise: by "users" we must also mean external customers and business partners. More and more, the value of CRM, supplier, and partner relationships rests on data relationships and shared market analysis.
Strategic business applications, which support decision making through "actionable intelligence," are supposed to keep the messy details of where the data came from, and how reliable it is, under the covers, away from the eyes of external users, if not most internal ones. But those details never disappear. As data services grow in value and importance, data quality could become as much a part of a company's brand identity and human resources reputation as the products and services for which it might be better known. Portals and dashboards will be either a beacon of success or a data-filled embarrassment.
The Survey Says
At the conference, TDWI issued a timely report, "Data Quality and the Bottom Line," in a project led by Wayne Eckerson, its director of education and research (see www.dw-institute.com/dqreport). The report claims that data quality problems "cost U.S. businesses more than $600 billion a year." Much of this estimate is based on the cost of postage, printing, and the staff overhead needed to deal with the mess of erroneous communications and marketing. But, as the report describes, the problems go beyond sales and marketing to the waste generated by business processes that run on poor quality data. Humans are often required to sort things out, undermining the savings and efficiency promised by automation.
In a notable show of self-criticism, a large majority of those surveyed said their organizations need more education about the significance of data quality problems and how to address them. Unfortunately, the subject is often avoided out of fear that revelations could deliver a severe public relations blow. Problems are acknowledged only indirectly and absorbed as a cost of doing business. Nonetheless, companies, not to mention government bodies, have suffered embarrassing public disclosures recently. More will come as the public grows more data-savvy; the worst cases will no doubt become targets of class-action suits.
"Data warehousing, e-business, and CRM projects often expose poor quality data because they require companies to extract and integrate data from multiple operational systems," the report says. "Data that is sufficient to run payroll, shipping, or accounts receivable is often peppered with errors, missing values, and integrity problems that don't show up until someone tries to summarize or aggregate the data."
The survey goes on to identify the chief sources of poor data quality. Among them are a lack of validation routines; mismatched syntax, formats, and structures; a spider web of interfaces; a lack of referential integrity checks; and data conversion errors. Business changes such as mergers and acquisitions are often the cause of these problems. The "single version of the truth" that many users so fervently desire, and expect their data warehouse and data mart systems to provide, is confounded by the "fragmentation of organizations into a multitude of departments, divisions, and operating groups, each with its own business processes supported by distinct data management systems."
Data quality was likely a strong motive behind California's huge, infamous, and now-cancelled contract to standardize all state agency databases on Oracle. Political shenanigans aside, the deal probably would have saved the state money by solving a large chunk of its data quality and integration problems. But few organizations can afford such an attempt to clear the decks of dissimilar systems.
Visualizing the Problem
What could do the most to pull organizations out of data quality hell? Visibility. The more users demand strong data support for decision making, the more organizations will find the will to grapple with the root causes of poor data quality and invest in the infrastructure required to fix problems before they infect either internal analyses or the data products shared with external users.
Context, which matters so much in evaluating the quality of any information, will call for knowledge management approaches to data quality problems. It therefore makes sense that software and procedures for data quality, ETL, integration, and profiling are beginning to roll together, possibly into an integrated, more intelligent package whose "sum" is worth more than its parts.
However, the horizontal suite approach won't be a silver bullet. Solving some quality issues may require more process knowledge and subject-matter expertise than a general-purpose suite can supply.
Vendors that have succeeded in vertical or functional niches could expand outward. Firstlogic, for example, is seeking to extend its success in solving customer "householding" problems (recognizing which customer records belong to the same household) into a new notion of corporate householding, which would clearly be valuable for sales and marketing applications but also for critical historical reporting requirements. Arkidata, which has thrived in the pension benefits niche, is pushing out to apply its software to similar vertical solutions. And a relatively new player, G4 Analytics, is targeting retail category management as the first application of its data quality, integration, and analytic software.
Everyone wants quality, yet the concern is only now reaching critical mass in the realm of data management. Any progress will save companies money and open up enormous opportunities.
David Stodder [dstodder@cmp.com] is editorial director of Intelligent Enterprise.