Intelligent Enterprise

Better Insight for Business Decisions

August 31, 2001



Analysts' Darling

Under SPSS, Clementine continues to earn favor

By Greg James

Continued from Page 1

Tree-based classification and prediction techniques are among the oldest data mining methods in use. Prior versions of Clementine included a Build Rule node that was largely superseded by the companion C5.0 node. Version 6.0 drops the Build Rule node and adds Classification and Regression Trees, another traditional algorithm. It complements the powerful C5.0 algorithm well because it can handle both numeric and symbolic output fields.
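To illustrate what handling both numeric and symbolic output fields means for a tree algorithm, here is a minimal Python sketch of a single split search. It assumes the textbook approach for Classification and Regression Trees: Gini impurity for a symbolic target and variance for a numeric one. The function names and toy data are hypothetical, and nothing here is taken from Clementine's own implementation.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of symbolic labels (0.0 means pure)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def variance(values):
    """Population variance of a list of numeric values."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def best_split(xs, ys, impurity):
    """Return (threshold, weighted impurity) of the best split x <= threshold."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    best = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical input values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * impurity(left) + len(right) * impurity(right)) / n
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        if best is None or score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical toy data: one numeric input, one symbolic and one numeric output.
ages = [22, 25, 31, 38, 44, 52]
responded = ["no", "no", "no", "yes", "yes", "yes"]
spend = [10.0, 12.0, 11.0, 40.0, 42.0, 41.0]

print(best_split(ages, responded, gini))   # classification-style split
print(best_split(ages, spend, variance))   # regression-style split
```

The same search routine serves both cases; only the impurity measure changes with the type of the output field.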

The benefit of new ownership is nowhere more apparent than in the inclusion of SPSS's Linear and Logistic Regression algorithms. These fundamental modeling techniques are the benchmark against which newer algorithms are compared. Their inclusion fills a big omission from prior releases.

Clementine's Neural Network node provides both Multi-Layer Perceptron and Radial Basis Function (RBF) Networks. RBFs are good in situations where the data contains additive noise or there are many input variables. Clementine's implementation of both of these algorithms automates the "sample, preprocess, train, and test" sequence common to neural net development. A complete dashboard of settings and options is also available to control the overall process and network topology.

Also new to this version, Factor Analysis/Principal Components Analysis is a well-documented technique used for attribute reduction, a common requirement since many algorithms take exponentially longer to run as the number of variables increases.
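As a rough illustration of attribute reduction, the following Python sketch finds the first principal component of a small table by power iteration on its covariance matrix and reports the share of total variance that component captures. It is a teaching sketch under stated assumptions: the function name and the data are hypothetical, and it says nothing about how Clementine computes its components.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (unit direction, share of total variance) of the first component."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centered columns.
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0] * d
    for _ in range(iterations):
        # Power iteration: multiply by the covariance matrix, then renormalize.
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d))
                     for a in range(d))
    total_variance = sum(cov[a][a] for a in range(d))
    return v, eigenvalue / total_variance

# Hypothetical data: the second column is roughly twice the first,
# and the third column is nearly constant.
rows = [
    (1.0, 2.0, 1.10),
    (2.0, 4.1, 0.90),
    (3.0, 5.9, 1.00),
    (4.0, 8.2, 1.05),
    (5.0, 9.9, 0.95),
]

direction, share = first_principal_component(rows)
print(direction, share)
```

When one component carries nearly all of the variance, as it should with data this strongly correlated, the three original attributes can be replaced by a single derived one with little loss of information.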

CaprI, another new module, stands for "Clementine Apriori Intervals." As its name suggests, it is a modified version of the traditional Apriori algorithm that discovers sequential association patterns. For example, it could identify the most common Web site clickstreams that precede product purchases.
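The clickstream example can be made concrete with a short Python sketch. It simply counts which fixed-length page paths immediately precede a purchase and keeps those seen in a minimum number of sessions. This is a deliberately simplified stand-in, not the CaprI algorithm; the function name, parameters, and session data are all hypothetical.

```python
from collections import Counter

def frequent_paths_before(sessions, target="purchase", length=2, min_support=2):
    """Return (path, session count) pairs for paths that directly precede target.

    Each path is counted at most once per session, in the spirit of
    support counting in association mining.
    """
    counts = Counter()
    for pages in sessions:
        seen = set()
        for i, page in enumerate(pages):
            if page == target and i >= length:
                seen.add(tuple(pages[i - length:i]))
        counts.update(seen)
    return [(path, n) for path, n in counts.most_common() if n >= min_support]

# Hypothetical Web sessions, one list of page names per visitor.
sessions = [
    ["home", "search", "product", "cart", "purchase"],
    ["home", "product", "cart", "purchase"],
    ["home", "search", "product", "exit"],
    ["ad", "product", "cart", "purchase"],
]

print(frequent_paths_before(sessions))
```

A real sequence miner would also consider non-contiguous patterns and paths of varying length; the fixed window here keeps the idea visible in a few lines.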

SYSTEM ARCHITECTURE

As is to be expected, Clementine likes its resources. A minimal system will have 256MB of RAM. Disk space consumption, however, is small when compared to the size of the data files that are to be analyzed.

Clementine was originally developed on Unix running under an X-Windows server. This legacy requirement has carried over to the Windows platform where Clementine comes bundled with Hummingbird's eXceed package.

Clementine generates Predictive Model Markup Language (PMML) documents. PMML is the de facto standard for model exchange among data mining systems and databases.

Clementine is available in a client/server version for more demanding applications. The server component is optimized to run compute-intensive operations on a server platform and can take advantage of built-in database support such as Microsoft's OLE DB for Data Mining, or IBM's IntelligentMiner for DB2 scoring facility. Unfortunately, its core algorithms are single-threaded and therefore can't exploit parallelism beyond what's provided by the DBMS or operating system.

A MATURE ENTERPRISE PRODUCT

Clementine has additional features, beyond the scope of this review, that make it a serious enterprise solution. It has a built-in scripting language, facilities for storing project templates and work-in-process, a "solution publisher" for translating streams into runtime systems, an external module interface, and prebuilt application templates, to name a few.



Ultimately, Clementine's productivity lies in its integration of the project workspace, analytical and support functions, access to operational parameters, and the ease with which all of this can be used.

With this release, Clementine's feature set is well rounded, its client/server architecture is more robust, and its overall integration with operating systems and database servers is excellent. It is clearly a mature product.



Greg James [greg.james@nationalcity.com] is a vice president of National City Corp. and manager of the Retail Marketing Quantitative Methods group. He also teaches university computer science and data mining classes.






