Stakes Increase In Data Quality Game
A data quality solution that aims for data integration
by Ganesh Variar
Data is a vital resource in today's information-centric enterprises, so it is imperative that data quality meet the expectations of the people and applications that use it. Yet The Data Warehousing Institute estimates that poor-quality customer data costs U.S. businesses a staggering $611 billion a year in postage, printing, and staff overhead. The intangible costs, such as alienating loyal customers by incorrectly addressing letters, are much higher.
The key to achieving high data quality is to take a strategic rather than a tactical approach. Commercial tools can play a significant role in streamlining the data quality process for your organization. An emerging player in the data quality market, DataFlux Corp. (now owned by SAS), presents users of DataFlux 5.0 with a plethora of features such as data auditing, parsing, standardization, verification, matching, linking, and householding. It also provides high-end features such as internationalization, data augmentation, real-time cleaning, a high degree of customization, and integration with other applications.
The Players
The data quality market is growing rapidly and includes a surprisingly large number of offerings from vendors such as Ascential Software Corp., DataLever Corp., DataMentors Inc., Evoke Software Corp., FirstLogic Inc., Fuzzy! Informatic AG, Group 1 Software, InfoRoute Inc., Innovative Systems Inc., Netrics Inc., Paladyne Corp., Sagent Technology Inc., and Trillium Software (a division of Harte-Hanks Inc.). DataFlux differentiates itself from the competition by providing a comprehensive set of features that focus on data integration rather than just data correction.
Moreover, there's a trend in the market for extract, transform, load (ETL) tool vendors to integrate data quality solutions with their software. (See "Quality Control," May 9, 2002.) The most notable such merger in recent months was Ascential's $92 million acquisition of Vality Technology Inc. DataFlux fits nicely into this pattern, since SAS acquired it a few years ago. DataFlux provides interfaces to SAS solutions as well as other applications through a set of fully customizable libraries.
The Chess Pieces
The DataFlux software consists of several components. DfPower Studio is a stand-alone, end-user-oriented, Windows-based application that can analyze, match, correct, and enhance the data. It comprises dfPower Base and a set of add-on modules called PowerPacks.
The base module analyzes data at either the column level or the element level (by breaking down the column data into its constituent parts). It can also standardize the data using a built-in or user-defined standardization scheme. The PowerPacks include three data integration modules: dfPower Match, dfPower Customize, and dfPower Verify.
DfPower Match uses fuzzy logic to match records that contain identical or similar data. You can therefore use it to identify and purge duplicate and near-duplicate records, merge data from multiple sources, household disparate records, and virtually link data across the organization.
DfPower Customize lets you tailor existing parse, standardization, and match algorithms, based on current or new business rules.
DfPower Verify lets you validate and correct U.S. and Canadian addresses based on postal service standards. It uses the Coding Accuracy Support System (CASS) to certify U.S. addresses and the Software Evaluation and Recognition Program (SERP) for Canadian addresses. It can also enhance your address data by appending geocode information such as ZIP centroid latitude and longitude coordinates, state and county Federal Information Processing Standard codes, and Census Block Group numbers.
DfPower Studio relies on a repository called the Quality Knowledge Base that stores customizable pattern and word libraries, regular expression libraries, matching routines, and standardization rules. In addition, dfPower Studio can create batch jobs and data quality reports. It can also perform gender analysis and phonetic analysis (to match similar-sounding words).
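To make the standardize-then-match idea more concrete, here is a minimal Python sketch of how near-duplicate records might be flagged. It is not DataFlux's algorithm, which the article does not describe in detail. The token table `STANDARD_TOKENS`, the function names, and the 0.85 threshold are all hypothetical choices made for illustration, and the similarity measure is the generic one from Python's standard `difflib` module.

```python
import re
from difflib import SequenceMatcher

# Hypothetical standardization scheme: maps common variants to one
# canonical token. A real product would keep far larger, curated libraries.
STANDARD_TOKENS = {
    "st": "street",
    "ave": "avenue",
    "rd": "road",
}


def standardize(text):
    """Lowercase, replace punctuation with spaces, and map known variants."""
    tokens = re.sub(r"[^\w\s]", " ", text.lower()).split()
    return " ".join(STANDARD_TOKENS.get(token, token) for token in tokens)


def similarity(a, b):
    """Return a 0..1 similarity score between two standardized records."""
    return SequenceMatcher(None, standardize(a), standardize(b)).ratio()


def find_near_duplicates(records, threshold=0.85):
    """Return index pairs of records whose similarity meets the threshold.

    The threshold is an assumed value; it would need tuning on real data.
    """
    pairs = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if similarity(records[i], records[j]) >= threshold:
                pairs.append((i, j))
    return pairs


records = [
    "John Smith, 12 Main St.",
    "Jon Smith, 12 Main Street",
    "Mary Jones, 400 Oak Ave",
]
duplicate_pairs = find_near_duplicates(records)
print(duplicate_pairs)
```

With these sample records, the first two entries standardize to nearly identical strings and are reported as a candidate pair, while the third is left alone. The pairwise loop compares every record with every other record, so a sketch like this suits only small data sets; it also ignores phonetic matching, which would need a separate encoding step.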











