
Intelligent Enterprise

Better Insight for Business Decisions

October 4, 2001



Hard Core Mining

This data mining tool is not for the meek or frail

By Greg James


"Model" functions include the Regression, Tree, Neural Network, Principal Components/DMNeural, Memory-Based Reasoning, Two-Stage, Ensemble, and User Defined nodes. The Regression node provides standard linear and logistic regression with nearly all the options available in the underlying SAS/STAT package. The Tree node generates decision trees using a composite algorithm that SAS assembled from features of the CHAID and CART algorithms. It supports both automatic and interactive training.
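To make the CART side of that description concrete, here is a minimal Python sketch of the split criterion CART uses for classification trees, Gini impurity: for each candidate threshold, weight each child node's impurity by its size and keep the threshold with the lowest total. This is not SAS's composite algorithm (CHAID, for instance, chooses splits with chi-square tests instead), and the function names and toy data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(xs, ys):
    """Return (threshold, weighted_impurity) for the best binary split x <= t.

    Candidate thresholds are the distinct observed values of x; the split that
    minimizes the size-weighted Gini impurity of the two children wins.
    """
    n = len(xs)
    best = (None, float("inf"))
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue  # a split must produce two non-empty children
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best


# Hypothetical toy data: customer age vs. whether the customer responded.
ages = [22, 25, 31, 38, 45, 52, 60]
responded = ["no", "no", "no", "yes", "yes", "yes", "yes"]
print(best_split(ages, responded))  # (31, 0.0): a perfect split at age <= 31
```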

Enterprise Miner's Principal Components/DMNeural node, as its name states, combines principal component analysis (PCA) with neural networks. PCA is a popular dimension-reduction technique. Because neural network algorithms are sensitive to large sets of input variables and to variables with many possible values (large domains), they commonly rely on PCA or some other form of feature selection or dimension reduction. This node conveniently and intelligently combines these operations.
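The reduce-then-model idea can be sketched in plain Python. The hypothetical `pca` function below centers the inputs, builds their covariance matrix, and extracts the top principal components by power iteration with deflation. The resulting scores are the smaller set of inputs a neural network would then be trained on (the network itself is omitted). This is a generic illustration under those assumptions, not the DMNeural node's internal procedure.

```python
import math
import random


def pca(rows, k, iters=500, seed=0):
    """Top-k principal components of `rows` via power iteration with deflation.

    Returns (components, scores): unit-length component vectors, and each row's
    projection onto them (the reduced inputs a network would then train on).
    """
    n, d = len(rows), len(rows[0])
    # Center each variable on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(k):
        # Start from a random unit vector.
        v = [rng.random() + 0.1 for _ in range(d)]
        s = math.sqrt(sum(x * x for x in v))
        v = [x / s for x in v]
        # Power iteration converges to the eigenvector with the largest eigenvalue.
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # no variance left in the deflated matrix
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
        components.append(v)
        # Deflate: subtract this component so the next pass finds the next one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    scores = [[sum(r[j] * c[j] for j in range(d)) for c in components]
              for r in centered]
    return components, scores


# Toy data: three inputs that move almost entirely along the direction (1, 2, 2).
rows = [[t + (0.01 if t % 2 == 0 else -0.01), 2 * t, 2 * t] for t in range(-5, 6)]
components, scores = pca(rows, k=1)
print([round(x, 3) for x in components[0]])  # roughly [0.333, 0.667, 0.667]
```

Three correlated inputs collapse to one score per row, which is the kind of reduction that keeps a network's input layer small.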

Two-Stage is really a composite of two models: a class prediction model and an interval prediction model. It is useful for problems like, "Will Customer A buy Product X (yes/no) and, if so, how much (quantity)?" The Ensemble node combines individual models' outputs (posterior probabilities or predicted values) into one composite output. Ensembles can be homogeneous (built from models of the same type) or heterogeneous (built from models of different types).
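A minimal sketch of both ideas, assuming hypothetical model outputs: `ensemble_average` combines several models' posterior probabilities by simple averaging (one common ensemble rule among several), and `two_stage_expected_quantity` multiplies the stage-one purchase probability by the stage-two predicted quantity to get an expected quantity. Whether Enterprise Miner's Two-Stage and Ensemble nodes combine outputs exactly this way is an assumption here, and the function names are illustrative.

```python
from statistics import fmean


def ensemble_average(posteriors_by_model):
    """Average each case's posterior probability across several models.

    posteriors_by_model[m][i] is model m's estimate of P(buy) for case i.
    """
    return [fmean(case) for case in zip(*posteriors_by_model)]


def two_stage_expected_quantity(p_buy, qty_if_buy):
    """Combine a class model and an interval model into one expected value.

    p_buy[i]      : stage 1, probability that customer i buys (yes/no target)
    qty_if_buy[i] : stage 2, predicted quantity if customer i does buy
    """
    return [p * q for p, q in zip(p_buy, qty_if_buy)]


# Hypothetical outputs for two customers from three stage-1 models.
p_buy = ensemble_average([[0.2, 0.8], [0.4, 0.6], [0.3, 0.7]])
expected_qty = two_stage_expected_quantity(p_buy, [10.0, 4.0])
print([round(p, 2) for p in p_buy])         # [0.3, 0.7]
print([round(q, 2) for q in expected_qty])  # [3.0, 2.8]
```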

"Assess" functions boil down to the Assess and Reporter nodes. The Assess node is a comprehensive module for comparing models' expected results with their current performance, or for comparing models with one another. The Reporter node automatically generates HTML-formatted reports of complete data mining project flows. It is "intelligent" in that it understands its position within the project stream and generates reports of differing detail depending on where it is placed and what its inputs are.
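Lift is one common way to compare models: rank cases by model score, then divide the response rate among the top-ranked fraction by the overall response rate. The sketch below computes it for two hypothetical models on toy data. It illustrates the kind of comparison an assessment step makes and is not the Assess node's own computation; `cumulative_lift` is a hypothetical name.

```python
def cumulative_lift(scores, actuals, depth=0.1):
    """Lift in the top `depth` fraction of cases ranked by model score.

    Lift = (response rate among the top-scored cases) / (overall response rate).
    `actuals` holds 1 for responders and 0 for non-responders.
    """
    n = len(scores)
    k = max(1, int(round(n * depth)))
    ranked = sorted(zip(scores, actuals), key=lambda pair: pair[0], reverse=True)
    top_rate = sum(a for _, a in ranked[:k]) / k
    overall_rate = sum(actuals) / n
    return top_rate / overall_rate


# Toy data: 10 customers, 2 responders (cases 0 and 3).
actuals = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
model_a = [0.9, 0.2, 0.1, 0.8, 0.3, 0.25, 0.15, 0.05, 0.12, 0.07]
model_b = [0.9, 0.85, 0.1, 0.2, 0.3, 0.25, 0.15, 0.05, 0.12, 0.07]
print(round(cumulative_lift(model_a, actuals, depth=0.2), 2))  # 5.0
print(round(cumulative_lift(model_b, actuals, depth=0.2), 2))  # 2.5
```

Model A places both responders in its top 20 percent and so earns twice the lift of model B at that depth.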

Architecture Options

Enterprise Miner sits on top of a large, bundled collection of SAS statistical products and is available in a standalone workstation configuration or in a client/server configuration. In the latter case, you can perform analysis on the workstation and the server simultaneously. As for scalability, I regularly run jobs that process several hundred million rows of data. Just make sure you give Enterprise Miner plenty of free disk space and RAM: a typical workstation should have 1GB of RAM and 100GB of disk space.
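A back-of-the-envelope estimate shows why disk space matters at that scale. SAS numeric variables occupy 8 bytes by default, so a raw table needs roughly rows × variables × 8 bytes before indexes, compression, or any intermediate copies a flow writes. The row and variable counts below are hypothetical, as is the function name.

```python
def raw_table_size_gb(rows, numeric_vars, bytes_per_value=8):
    """Approximate uncompressed size in decimal gigabytes (10**9 bytes)."""
    return rows * numeric_vars * bytes_per_value / 1e9


# Hypothetical job: 300 million rows with 40 numeric inputs.
print(raw_table_size_gb(300_000_000, 40))  # 96.0
```

A single copy of such a table nearly fills a 100GB disk on its own.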

There are two ways to deploy Enterprise Miner data mining procedures: the Score and C*Score nodes. Score generates SAS code that can be run on any Enterprise Miner installation. The generated code applies all data processing, transformation, and replacement logic, executes all required models, and writes the model scores to an output data set. The C*Score node works similarly but generates compilable C programs or parsable XML.
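Conceptually, generated score code replays the training-time logic on new records in a fixed order: replace missing values, apply transformations, run the model, and write the score. The Python sketch below mirrors that sequence for a single logistic regression. It is not the SAS code the Score node emits, and the variable names, replacement value, and coefficients are all hypothetical.

```python
import math

# Hypothetical parameters that a training flow might have produced.
IMPUTE_MEDIAN_INCOME = 52_000.0  # replacement value for missing income
LOGIT_INTERCEPT = -4.0
LOGIT_COEF_LOG_INCOME = 0.3
LOGIT_COEF_TENURE = 0.05


def score_record(record):
    """Apply replacement, transformation, and model logic to one input record."""
    # 1. Replacement: fill a missing value with the training-time median.
    income = record.get("income")
    if income is None:
        income = IMPUTE_MEDIAN_INCOME
    # 2. Transformation: log-transform a skewed input.
    log_income = math.log(income)
    # 3. Model: logistic regression on the transformed inputs.
    z = (LOGIT_INTERCEPT
         + LOGIT_COEF_LOG_INCOME * log_income
         + LOGIT_COEF_TENURE * record["tenure_years"])
    return 1.0 / (1.0 + math.exp(-z))


def score_data_set(records):
    """Return an output 'data set': each input record plus its model score."""
    return [dict(r, p_response=score_record(r)) for r in records]


scored = score_data_set([
    {"income": None, "tenure_years": 3},
    {"income": 85_000.0, "tenure_years": 12},
])
for row in scored:
    print(row)
```

Keeping every training-time step inside the scoring routine is what lets the same logic run unchanged on new data.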




Enterprise Miner also employs a new Data Mining Database (DMDB) to store data for mining. The DMDB is a special SAS data set optimized specifically for data mining operations. For example, certain algorithms require variance and covariance statistics; because these statistics are precalculated and stored in the DMDB, Enterprise Miner's algorithms can avoid many passes through the data. The concept is already making its way into the mainstream: Oracle, IBM, and Microsoft are all adding similar statistics and logic to their database systems.
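The payoff of precalculated statistics can be sketched with running totals. One pass accumulates the count, per-variable sums, and cross-product sums, and any covariance (or variance, on the diagonal) can then be derived without rereading the data. The `SummaryStats` class is hypothetical and is not the DMDB's actual format. For very large or poorly scaled values, an incremental method such as Welford's algorithm is numerically safer than raw sums of squares.

```python
class SummaryStats:
    """Count, sums, and cross-product sums gathered in a single pass."""

    def __init__(self, d):
        self.n = 0
        self.sums = [0.0] * d
        self.cross = [[0.0] * d for _ in range(d)]

    def add(self, row):
        """Fold one row into the running totals."""
        self.n += 1
        for i, xi in enumerate(row):
            self.sums[i] += xi
            for j, xj in enumerate(row):
                self.cross[i][j] += xi * xj

    def covariance(self):
        """Sample covariance matrix derived from the stored totals alone:
        cov(i, j) = (sum(x_i * x_j) - sum(x_i) * sum(x_j) / n) / (n - 1)."""
        n = self.n
        d = len(self.sums)
        return [[(self.cross[i][j] - self.sums[i] * self.sums[j] / n) / (n - 1)
                 for j in range(d)] for i in range(d)]


stats = SummaryStats(2)
for row in [[1, 2], [2, 4], [3, 7], [4, 8]]:
    stats.add(row)
cov = stats.covariance()
print(round(cov[0][0], 4), round(cov[0][1], 4))  # 1.6667 3.5
```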

Hard Core

Enterprise Miner is not a tool for the uninitiated. Its documentation is complete but technical. Although the "Getting Started" guide will get experienced SAS users up to speed quickly, nascent data miners will be lost without additional training.

The combination of Enterprise Miner's new data mining algorithms and access to core SAS procedures means this product will do just about everything. SAS is obviously not standing still, as evidenced by the many new and experimental nodes in this release.



Greg James [greg.james@nationalcity.com] is a vice president of National City Corp. and manager of the Retail Marketing Quantitative Methods group. He also teaches university computer science and data mining classes.






