Stakes Increase In Data Quality Game

A data quality solution that aims for data integration

by Ganesh Variar

Continued from Page 1

Blue Fusion SDK is a comprehensive set of data quality libraries. Its algorithms are available to Windows users as C-callable code or as COM objects. For mainframe or Unix users, the algorithms are available as shared objects. With Blue Fusion SDK you can incorporate DataFlux components into your own server-based applications.

dfIntelliServer is the client/server extension of Blue Fusion SDK. It allows easier integration with a wider variety of software architectures, such as intranet and Web applications, client/server data-entry and reporting systems, ETL systems, and other complex environments. dfIntelliServer is a multiplatform, multiserver, concurrent-user product that you can implement in Java, Perl, COM/ASP, and C.

Because all DataFlux applications are based on the same Quality Knowledge Base, any definition created in dfPower Studio can be saved and reused in dfIntelliServer or Blue Fusion SDK.
The Moves

I took the software for a test drive and was impressed with the results. The installation was easy, and I found the dfPower Studio GUI intuitive. Let's look at a realistic example to understand the flexibility and power of the DataFlux solution.

Suppose you want to standardize the customer data in your CRM database. For example, you might standardize the name "Mister Larry P. Brown" to "Mr. Larry P Brown" by abbreviating the prefix and dropping the period after the middle initial. Although this task looks easy to the human eye, it's quite an undertaking for the computer.

First, the name must be dissected into its individual parts. DataFlux uses a feature called chop tables to specify the delimiter between the elements. In this case, a single space serves as the delimiter between the parts of the name.

Next, the system needs to recognize the order of the elements. Here, the order is "Name Prefix, Given Name, Middle Name, Family Name." These generic parts are termed categories. DataFlux includes a built-in vocabulary that covers the most common names for each category, and the user can extend the vocabulary further. In DataFlux terminology, the defined order of the categories is called the "grammar."

After breaking the name into the relevant categories, you proceed with the standardization. DataFlux uses its standardization scheme to convert "Mister" to "Mr." Last, you can build a rule with the regular expression library that removes the period after the middle initial.

Let's extend this exercise now by trying to match this entry to another entry for the same customer (perhaps in another database), where the customer's name has been entered as "Lawrence Brown." You would create a match code for the two entries with a sensitivity factor of less than 100 percent. DataFlux will recognize both names as the same and create the same match code for both records. You can then use SQL to link the two records by joining on the match codes.

Checkmate!

If you're embarking on a data quality initiative, DataFlux is definitely worth considering.
It works beyond the usual name and address standardizations that many vendors offer. One of the reasons that many organizations prefer in-house quality solutions over off-the-shelf packages is the general inability to customize tools to meet user requirements. Not only can you easily customize DataFlux, but you can also leverage its features to enhance existing applications with minimal overhead, which makes DataFlux a top-quality solution indeed.

Ganesh Variar [ganesh_variar@yahoo.com] is a lead analyst at Regence BlueCross BlueShield of Oregon. He has managed and designed business intelligence solutions for nine years.

RESOURCES

Ascential Software Corp.: www.ascentialsoftware.com
DataLever Corp.: www.datalever.com
DataMentors Inc.: www.datamentors.com
The Data Warehousing Institute: www.dw-institute.com/dqreport
Evoke Software Corp.: www.evokesoft.com
FirstLogic Inc.: www.firstlogic.com
Fuzzy! Informatik AG: www.fuzzy-informatik.com/englisch
Group 1 Software: www.g1.com
InfoRoute Inc.: www.inforouteinc.com
Innovative Systems Inc.: www.innovativesystems.com
Netrics Inc.: www.netrics.com
Paladyne Corp.: www.paladyne.com
Sagent Technology Inc.: www.centrus.com
Trillium Software (a division of Harte-Hanks Inc.): www.trilliumsoft.com
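
ILLUSTRATION

For readers who want to see the steps from "The Moves" in concrete form, here is a minimal Python sketch of the same sequence: chop the name on a delimiter, assign categories according to a grammar, apply a standardization scheme, strip the period with a regular expression, and then build a match code that can be joined in SQL. This is not DataFlux code and does not use any DataFlux API. Every name in it (parse_name, standardize, match_code, link_records, the tiny vocabulary, the nickname table, and the way the sensitivity value is interpreted) is an assumption made up for illustration only.

```python
import re
import sqlite3

# Hypothetical stand-ins for the concepts in the article (not DataFlux APIs).
CHOP_DELIMITER = " "  # "chop table": a single space separates the elements
GRAMMAR = ["Name Prefix", "Given Name", "Middle Name", "Family Name"]
VOCABULARY = {"Name Prefix": {"mister", "mr", "mr.", "mrs", "mrs.", "ms", "ms.", "dr", "dr."}}
STANDARDIZATION_SCHEME = {"mister": "Mr.", "mr": "Mr.", "mr.": "Mr."}
NICKNAMES = {"larry": "lawrence"}  # assumed lookup used only for matching


def parse_name(name):
    """Chop the name and assign each token to a category in GRAMMAR order."""
    tokens = [t for t in name.split(CHOP_DELIMITER) if t]
    parts = dict.fromkeys(GRAMMAR, "")
    if tokens and tokens[0].lower() in VOCABULARY["Name Prefix"]:
        parts["Name Prefix"] = tokens.pop(0)
    if tokens:
        parts["Given Name"] = tokens.pop(0)
    if tokens:
        parts["Family Name"] = tokens.pop()
    parts["Middle Name"] = " ".join(tokens)  # whatever is left in between
    return parts


def standardize(name):
    """Apply the scheme to the prefix and drop the period after an initial."""
    parts = parse_name(name)
    prefix = parts["Name Prefix"]
    parts["Name Prefix"] = STANDARDIZATION_SCHEME.get(prefix.lower(), prefix)
    parts["Middle Name"] = re.sub(r"\b([A-Za-z])\.", r"\1", parts["Middle Name"])
    return " ".join(parts[c] for c in GRAMMAR if parts[c])


def match_code(name, sensitivity=85):
    """Toy match code: below 100, ignore prefix and middle name and
    fold nicknames; at 100, keep the given and middle names as typed."""
    parts = parse_name(name)
    given = parts["Given Name"].lower()
    family = parts["Family Name"].lower()
    if sensitivity >= 100:
        middle = parts["Middle Name"].lower().rstrip(".")
        return f"{family}|{given}|{middle}".upper()
    given = NICKNAMES.get(given, given)
    return f"{family}|{given}".upper()


def link_records(crm_names, other_names, sensitivity=85):
    """Join two name lists on their match codes using plain SQL."""
    con = sqlite3.connect(":memory:")
    for table, names in (("crm", crm_names), ("other", other_names)):
        con.execute(f"CREATE TABLE {table} (name TEXT, match_code TEXT)")
        con.executemany(
            f"INSERT INTO {table} VALUES (?, ?)",
            [(n, match_code(n, sensitivity)) for n in names],
        )
    rows = con.execute(
        "SELECT crm.name, other.name FROM crm "
        "JOIN other ON crm.match_code = other.match_code"
    ).fetchall()
    con.close()
    return rows
```

Calling standardize("Mister Larry P. Brown") walks through the chop, grammar, scheme, and regular-expression steps in that order, and link_records(["Mister Larry P. Brown"], ["Lawrence Brown"]) pairs the two entries because both reduce to the same toy match code at the default sensitivity. A real product would rely on a far larger vocabulary and a more graded notion of sensitivity than this two-level switch.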