Intelligent Enterprise

Better Insight for Business Decisions





March 20, 2003

Stakes Increase In Data Quality Game

A data quality solution that aims for data integration

by Ganesh Variar


Blue Fusion SDK is a comprehensive set of data quality libraries. Its algorithms are available to Windows users as C-callable code or as COM objects, and to mainframe or Unix users as shared objects. With Blue Fusion SDK you can incorporate DataFlux components into your own server-based applications. dfIntelliServer is the client/server extension of Blue Fusion SDK. It allows easier integration with a wider variety of software architectures, such as intranet and Web applications, client/server data-entry and reporting systems, ETL systems, and other complex environments. dfIntelliServer is a multiplatform, multiserver, concurrent-user product that you can call from Java, Perl, COM/ASP, and C. Because all DataFlux applications are based on the same Quality Knowledge Base, any definition created in dfPower Studio can be saved and reused in dfIntelliServer or Blue Fusion SDK.

The Moves

I took the software for a test drive and was impressed with the results. The installation was easy, and I found the dfPower Studio GUI intuitive. Let's look at a realistic example to understand the flexibility and power of the DataFlux solution. Suppose you want to standardize the customer data in your CRM database. For example, you might standardize the name "Mister Larry P. Brown" to "Mr. Larry P Brown."

Although this task looks easy to the human eye, it's quite an undertaking for the computer. First, the name must be dissected into its individual parts. DataFlux uses a feature called chop tables to specify the delimiter between the elements. In this case, a single space serves as the delimiter between the parts of the name.
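The chopping step can be pictured with a small Python sketch. This is not DataFlux code or its chop table format; the function name chop and its delimiters parameter are made up for illustration, and the only assumption carried over from the article is that a single space separates the parts of the name.

```python
def chop(name, delimiters=" "):
    """Split a name into tokens on any of the given delimiter characters.

    Hypothetical stand-in for a chop table: each character in
    `delimiters` ends the current token and is itself discarded.
    """
    tokens = []
    current = []
    for ch in name:
        if ch in delimiters:
            # A delimiter closes the token collected so far, if any.
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


print(chop("Mister Larry P. Brown"))
# ['Mister', 'Larry', 'P.', 'Brown']
```

Passing a longer delimiter string, such as ", ", would let the same sketch handle a name entered as "Brown, Larry".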

Next, the system needs to recognize the order of the elements. Here, the order is "Name Prefix, Given Name, Middle Name, Family Name." These generic parts are termed categories. DataFlux includes a built-in vocabulary that lists the most common names for each category, and the user can extend the vocabulary further. In DataFlux terminology, the defined order of the categories is called the "grammar."
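As a rough illustration of how a vocabulary and a grammar work together, the following Python sketch assigns each token to a category by position and checks it against a tiny word list. The VOCABULARY and GRAMMAR structures, their contents, and the categorize function are all assumptions invented for this example; the real Quality Knowledge Base is far larger and its internal format is not described here.

```python
# Hypothetical vocabulary: known words per category (lowercase, no periods).
VOCABULARY = {
    "NamePrefix": {"mister", "mr", "mrs", "ms", "dr"},
    "GivenName": {"larry", "lawrence", "mary"},
    "FamilyName": {"brown", "smith"},
}

# Hypothetical grammar: the expected order of categories in a name.
GRAMMAR = ["NamePrefix", "GivenName", "MiddleName", "FamilyName"]


def normalize(token):
    """Lowercase a token and drop a trailing period for vocabulary lookup."""
    return token.rstrip(".").lower()


def categorize(tokens):
    """Map tokens to categories by position, validating against the vocabulary."""
    if len(tokens) != len(GRAMMAR):
        raise ValueError("token count does not fit the grammar")
    parsed = {}
    for category, token in zip(GRAMMAR, tokens):
        known = VOCABULARY.get(category)
        # Categories without a word list (MiddleName here) accept any token.
        if known is not None and normalize(token) not in known:
            raise ValueError(f"{token!r} is not a known {category}")
        parsed[category] = token
    return parsed


print(categorize(["Mister", "Larry", "P.", "Brown"]))
# {'NamePrefix': 'Mister', 'GivenName': 'Larry', 'MiddleName': 'P.', 'FamilyName': 'Brown'}
```

Extending the vocabulary, as the article says a user can, would amount to adding entries to the sets above.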

After breaking the name into the relevant categories, you proceed with the standardization. DataFlux uses its standardization scheme to convert "Mister" to "Mr." Last, you can build a rule with the regular expression library that will remove the period after the middle initial.
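The two standardization steps might look like this in Python. The SCHEME lookup table and the regular expression are illustrative assumptions, not DataFlux's own standardization scheme or its regular expression library; the sketch only shows the idea of a table-driven replacement followed by a pattern-based cleanup.

```python
import re

# Hypothetical standardization scheme: variant spellings -> standard form.
SCHEME = {"mister": "Mr.", "mr": "Mr.", "mr.": "Mr."}

# Matches a single letter followed by a period, such as "P."
INITIAL_WITH_PERIOD = re.compile(r"^([A-Za-z])\.$")

CATEGORY_ORDER = ("NamePrefix", "GivenName", "MiddleName", "FamilyName")


def standardize(parsed):
    """Return a copy of the parsed name with prefix and middle initial cleaned up."""
    result = dict(parsed)
    prefix = result.get("NamePrefix", "")
    # Table lookup: unknown prefixes pass through unchanged.
    result["NamePrefix"] = SCHEME.get(prefix.lower(), prefix)
    middle = result.get("MiddleName", "")
    # Pattern rule: drop the period after a one-letter middle initial.
    result["MiddleName"] = INITIAL_WITH_PERIOD.sub(r"\1", middle)
    return result


parsed = {
    "NamePrefix": "Mister",
    "GivenName": "Larry",
    "MiddleName": "P.",
    "FamilyName": "Brown",
}
standardized = standardize(parsed)
print(" ".join(standardized[c] for c in CATEGORY_ORDER))
# Mr. Larry P Brown
```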

Let's extend this exercise now by trying to match this entry to another entry for the same customer (perhaps in another database), where the customer's name has been entered as "Lawrence Brown." You would create a match code for the two entries with a sensitivity factor of less than 100 percent. DataFlux will recognize the two names as the same and create the same match code for both records. You can then use SQL to link the two records by joining on the match codes.
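To make the idea concrete, here is a toy Python sketch of match codes and the SQL join. The match code algorithm below is invented for illustration and is much cruder than whatever DataFlux actually computes: it assumes a small nickname table, ignores prefixes and middle names, and treats sensitivity as the share of the given name that is kept. The table names crm and billing and their columns are also hypothetical.

```python
import sqlite3

# Hypothetical lookup tables; a real knowledge base would be far larger.
NICKNAMES = {"larry": "lawrence"}
PREFIXES = {"mister", "mr", "mrs", "ms", "dr"}


def match_code(name, sensitivity=85):
    """Build a toy match code from the family name and canonical given name."""
    tokens = [t.rstrip(".").lower() for t in name.split()]
    tokens = [t for t in tokens if t not in PREFIXES]
    given, family = tokens[0], tokens[-1]
    given = NICKNAMES.get(given, given)
    # Lower sensitivity keeps fewer characters, so looser matches collide.
    keep = max(1, round(len(given) * sensitivity / 100))
    return f"{family.upper()}|{given[:keep].upper()}"


crm_name = "Mister Larry P. Brown"
billing_name = "Lawrence Brown"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE crm (id INTEGER, name TEXT, match_code TEXT)")
conn.execute("CREATE TABLE billing (id INTEGER, name TEXT, match_code TEXT)")
conn.execute("INSERT INTO crm VALUES (?, ?, ?)", (1, crm_name, match_code(crm_name)))
conn.execute(
    "INSERT INTO billing VALUES (?, ?, ?)", (7, billing_name, match_code(billing_name))
)

# Link the two records by joining on the shared match code.
rows = conn.execute(
    "SELECT crm.name, billing.name FROM crm "
    "JOIN billing ON crm.match_code = billing.match_code"
).fetchall()
print(rows)
# [('Mister Larry P. Brown', 'Lawrence Brown')]
```

Storing the match code as an ordinary column is what lets plain SQL do the linking, with no fuzzy logic needed at query time.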




Checkmate!

If you're embarking on a data quality initiative, DataFlux is definitely worth considering. It goes beyond the usual name and address standardizations that many vendors offer. One of the reasons that many organizations prefer in-house quality solutions to off-the-shelf packages is the general inability to customize packaged tools to meet user requirements. Not only can you easily customize DataFlux, but you can also leverage its features to enhance existing applications with minimum overhead, which makes DataFlux a top-quality solution indeed.


Ganesh Variar [ganesh_variar@yahoo.com] is a lead analyst at Regence BlueCross BlueShield of Oregon. He has managed and designed business intelligence solutions for nine years.


RESOURCES

Ascential Software Corp.: www.ascentialsoftware.com

DataLever Corp.: www.datalever.com

DataMentors Inc.: www.datamentors.com

The Data Warehousing Institute: www.dw-institute.com/dqreport

Evoke Software Corp.: www.evokesoft.com

FirstLogic Inc.: www.firstlogic.com

Fuzzy! Informatic AG: www.fuzzy-informatik.com/englisch

Group 1 Software: www.g1.com

InfoRoute Inc.: www.inforouteinc.com

Innovative Systems Inc.: www.innovativesystems.com

Netrics Inc.: www.netrics.com

Paladyne Corp.: www.paladyne.com

Sagent Technology Inc.: www.centrus.com

Trillium Software (a division of Harte-Hanks Inc.): www.trilliumsoft.com








