
Intelligent Enterprise

Better Insight for Business Decisions





April 16, 2002

Validation Victory

Which of the many data mining algorithms and options is best for your solution? With validation techniques, you can make a solid decision.

By Barry Grushkin

Continued from Page 1

Interestingly, a new company, Sightward Inc., offers software that automatically uses this method to pick the best algorithm for several applications from a wide range of popular data mining algorithms. Thanks to validation methods, the company is so confident it can improve models that it audaciously guarantees an increase in its clients' ROI.

Narrowing It Down to One Model

The output of some popular algorithms is really not one model at all, but a range of models. Figure 2 shows an example of this in a decision-tree process. Using the primary data set, independent variables are selected one after another and sliced at break points to produce increasingly focused definitions of who bought product A vs. who didn't. This slicing can easily go too far, producing rules that fit the points of the primary data set well but generalize poorly. The set of business rules up to any point can be considered a model. So where do you stop?

In Figure 2, the numbers in brackets represent acceptance rates: the left for the generating data and the right for a validation sample. Looking at acceptance rates in the validation sample, you can see that, as of Rule 3, the rules start becoming less general: the validation acceptance rate falls from 37 percent to 29 percent. The apparent advantages of Rule 3 are merely happenstance features unique to the generating (primary) data. Validation indicates that a superior model would include Rules 1 and 2, but not Rule 3.
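The stopping decision described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure any particular product uses: the function name `rules_to_keep` is hypothetical, and the Rule 1 rate of 0.30 is a made-up placeholder, since the text gives only the 37 percent and 29 percent validation figures.

```python
def rules_to_keep(validation_rates):
    """Return how many leading rules to keep.

    validation_rates[i] is the acceptance rate measured on the validation
    sample after applying rules 1..i+1. We stop just before the first rule
    that makes the validation rate fall.
    """
    keep = 0
    previous = float("-inf")
    for rate in validation_rates:
        if rate < previous:
            break  # this rule fits the primary data but generalizes worse
        keep += 1
        previous = rate
    return keep


# Validation acceptance rates after Rules 1, 2 and 3.
# 0.37 and 0.29 come from the text; 0.30 is an assumed placeholder.
validation_rates = [0.30, 0.37, 0.29]
print(rules_to_keep(validation_rates))  # 2
```

With these numbers the sketch keeps Rules 1 and 2 and drops Rule 3, matching the conclusion above. Note that the generating-data rates play no part in the decision; only the validation sample is consulted.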

As another example, memory-based reasoning techniques can in fact produce an infinite number of models. Figure 3 illustrates a fundamental and perplexing issue in learning and modeling: What does it mean for two events to be like each other?

The diagram shows a primary data set mapped into a space defined by the independent variables (W, X). The white Ys and Ns represent known successes and failures, respectively. A forecast for a new case is determined by its proximity to these values. The concentric ovals indicate the wide range of ways to define proximity. With the addition of a validation sample (the lowercase Ys), the light-green oval becomes the extension of choice from the white Ys, because it is the definition of proximity that best accounts for the validation cases.
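A minimal sketch of the same idea follows, assuming a simplified setup that is not taken from Figure 3: proximity is a plain circular radius rather than an oval, every data point is invented for illustration, and the helper names (`predict`, `accuracy`, `choose_radius`) are hypothetical. Each candidate radius is effectively a different model, and the validation sample picks among them.

```python
import math


def predict(primary, point, radius):
    """Majority vote among primary cases within `radius` of `point`.

    Falls back to the single nearest case if none lie inside the radius.
    """
    votes = [label for (w, x, label) in primary
             if math.dist((w, x), point) <= radius]
    if not votes:
        nearest = min(primary, key=lambda c: math.dist((c[0], c[1]), point))
        votes = [nearest[2]]
    return "Y" if votes.count("Y") >= votes.count("N") else "N"


def accuracy(primary, validation, radius):
    """Share of validation cases forecast correctly at this radius."""
    hits = sum(predict(primary, (w, x), radius) == label
               for (w, x, label) in validation)
    return hits / len(validation)


def choose_radius(primary, validation, candidates):
    """Pick the proximity definition that does best on the validation sample."""
    return max(candidates, key=lambda r: accuracy(primary, validation, r))


# Invented (W, X, outcome) cases; (2.0, 2.0, "N") is a stray failure
# sitting near the cluster of successes.
primary = [
    (1.0, 1.0, "Y"), (1.5, 1.0, "Y"), (1.0, 1.5, "Y"), (2.0, 2.0, "N"),
    (5.0, 5.0, "N"), (5.5, 5.0, "N"), (5.0, 5.5, "N"), (5.5, 5.5, "N"),
]
validation = [
    (1.9, 1.9, "Y"), (1.2, 1.2, "Y"), (5.2, 5.2, "N"), (4.8, 5.0, "N"),
]

print(choose_radius(primary, validation, [0.3, 1.5, 10.0]))  # 1.5
```

On this toy data the smallest radius is misled by the stray failure, the largest lumps everything into one forecast, and the middle radius wins on the validation sample.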

The Right Choice




This situation is no trivial matter: Actions based on correct models can be highly lucrative; conversely, actions based on incorrect views of the world can result in bankruptcy. One reason CRM initiatives produce such widely varying results (and, therefore, such varying opinions about their value) is that analysts differ in their ability to understand the implications of options such as those I've mentioned.

Data mining can rapidly generate intricate models but, unfortunately, far too many of them. Which is right? Validation techniques offer a set of powerful methods to help researchers make the right choices.


Barry Grushkin is chairman and CTO of The Machine Intelligence Development Co., a group specializing in sophisticated data mining and constantly improving data mining techniques and methods.


RESOURCES

SAS Institute Inc.: www.sas.com

Sightward Inc.: www.sightward.com

Silicon Graphics Inc.: www.sgi.com

SPSS Inc.: www.spss.com

Related Articles at IntelligentEnterprise.com:

"Connect the Dots," March 1, 2000: www.intelligententerprise.com/000301/decision.jhtml

"The Quest for Speed," Nov. 12, 2001: www.intelligententerprise.com/011112/417decision1_1.jhtml







