Analysts' Darling
Under SPSS, Clementine continues to earn favor
By Greg James

Tree-based classification and prediction techniques are among the oldest data mining methods in use. Prior versions of Clementine included a Build Rule node that was largely superseded by the companion C5.0 node. Version 6.0 drops the Build Rule node and adds Classification and Regression Trees, another traditional algorithm. It is a more complementary partner to the powerful C5.0 algorithm because it can handle both numeric and symbolic output fields.

The benefit of new ownership is nowhere more apparent than in the inclusion of SPSS's Linear and Logistic Regression algorithms. These fundamental modeling techniques are the benchmark against which newer algorithms are compared. Their inclusion fills a major gap in prior releases.

Clementine's Neural Network node provides both Multi-Layer Perceptron and Radial Basis Function (RBF) Networks. RBFs are good in situations where the data contains additive noise or there are many input variables. Clementine's implementation of both algorithms automates the "sample, preprocess, train, and test" sequence common to neural net development. A complete dashboard of settings and options is also available to control the overall process and network topology.

Also new to this version, Factor Analysis/Principal Components Analysis is a well-documented technique used for attribute reduction, a common requirement since many algorithms take exponentially longer to run as the number of variables increases.

CaprI, another new module, stands for "Clementine Apriori Intervals." As its name suggests, it is a modified version of the traditional Apriori algorithm that discovers sequential association patterns. For example, it could identify the most common Web site clickstreams that precede product purchases.

SYSTEM ARCHITECTURE

As might be expected, Clementine likes its resources. A minimal system will have 256MB of RAM.
Disk space consumption, however, is small when compared to the size of the data files that are to be analyzed. Clementine was originally developed on Unix running under the X Window System. This legacy requirement has carried over to the Windows platform, where Clementine comes bundled with Hummingbird's eXceed package.

Clementine generates Predictive Model Markup Language (PMML), the de facto standard for model exchange among data mining systems and databases.

Clementine is available in a client/server version for more demanding applications. The server component is optimized to run compute-intensive operations on a server platform and can take advantage of built-in database support such as Microsoft's OLE DB for Data Mining or IBM's IntelligentMiner for DB2 scoring facility. Unfortunately, its core algorithms are single-threaded and therefore can't exploit parallelism beyond what's provided by the DBMS or operating system.

A MATURE ENTERPRISE PRODUCT

Clementine has additional features, beyond the scope of this review, that make it a serious enterprise solution. It has a built-in scripting language, facilities for storing project templates and work-in-process, a "solution publisher" for translating streams into runtime systems, an external module interface, and prebuilt application templates, to name a few.

Ultimately, Clementine's productivity lies in its integration of the project workspace, analytical and support functions, and access to operational parameters, and in the ease with which all of this can be used. With this release, Clementine's feature set is well rounded, its client/server architecture is more robust, and its overall integration with operating systems and database servers is excellent. It is clearly a mature product.

Greg James [greg.james@nationalcity.com] is a vice president of National City Corp. and manager of the Retail Marketing Quantitative Methods group. He also teaches university computer science and data mining classes.