Suite Smarts
Find the right data mining suite for you
by Stephen Penn

First, determine how many distinct algorithms the suite offers in each of the following categories: clustering, classification, prediction, association, and time-series analysis. Typical clustering algorithms include neural networks, demographic clustering, Cobweb, EM (expectation maximization), and Gaussian neighborhood. Typical classification algorithms include neural networks and decision trees. Typical prediction algorithms include neural networks, radial basis functions, and naive Bayes. Typical association (or market-basket analysis) algorithms include Apriori and generalized rule induction. Finally, typical time-series analysis algorithms include periodicity analysis and trend analysis.

Visualization (Importance: Medium-4). Patterns, groupings, and scores can be complex, which makes data mining results hard to display. How the results are visualized directly affects your ability to analyze and evaluate them. Charts and graphs also make it much easier than typical columnar reports to bring domain experts and end users into the process. Good visualization tools are interactive as well. Be sure to ask how many algorithm categories the visual reporting tools support. Most support clustering and associations; histograms, bar charts, and decision-tree views may also be part of the data mining suite.

Statistics (Importance: Medium-5). Being able to move back and forth between traditional statistical operations and data mining activities can be a tremendous help. The more statistical tests available, the more options you have for testing hypotheses developed from reviewing the data mining results. Make sure you ask which statistical tests are available. Tests I've found particularly useful include chi-squared, F-test, principal component analysis, factor analysis, curve fitting, ANOVA, and log regression.

Project Management

Methodology (Importance: Low-3).
If the data mining suite supports a methodology directly, your data mining team will be much more organized and better prepared. Although it's not essential for the tool to support a methodology, the team definitely should follow one. Ask whether the suite supports a particular methodology, and be clear about why your team favors one methodology over another. The best-known methodologies include SEMMA (developed by SAS Institute Inc.) and CRISP-DM (developed by a consortium whose chief contributors included SPSS Inc., NCR, and DaimlerChrysler).

Coupling (Importance: High-7). The plumbing behind the flow of data can be critical to a project's overall success or failure. The questions in this section aim to determine how easily data can move into and out of the suite in your working environment. Questions you should ask:
An Extendable System

Now you're equipped to explore potential data mining suites in detail and compare them systematically. Along the way, you may discover other areas that should be evaluated as well. If so, simply add more rows (with the appropriate description and weight) to the comparison chart. Using this comparison chart broadens your evaluation of data mining suites well beyond specific algorithms. Remember, it's just as vital to evaluate how a product visualizes results, integrates with existing databases, and performs statistical analysis.

Stephen Penn [stelyn@att.net] is an application software developer at Lockheed Martin, where he focuses on business intelligence. He has more than 10 years' experience and holds an MBA from Frostburg State University.

RESOURCES
See the Business Intelligence Information Center for data mining articles, books, and other resources.
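A weighted comparison chart comes down to simple arithmetic: multiply each suite's rating on a criterion by that criterion's importance weight, then add up the products. The Python below is a minimal sketch of that idea, not the chart's exact scoring scheme. It assumes that the number in each importance rating (for example, the 7 in "High-7") serves as the weight and that each suite is rated on a hypothetical 0-10 scale. The suite names, the ratings, and the extra "Vendor support" row are purely illustrative and are not taken from this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    name: str
    weight: int  # assumed: numeric part of an importance rating, e.g. "High-7" -> 7


# Weights taken from the importance ratings given in this part of the article.
CRITERIA = [
    Criterion("Visualization", 4),
    Criterion("Statistics", 5),
    Criterion("Methodology", 3),
    Criterion("Coupling", 7),
]


def weighted_score(ratings: dict[str, float], criteria: list[Criterion]) -> float:
    """Sum weight * rating over all criteria; an unrated criterion counts as 0."""
    return sum(c.weight * ratings.get(c.name, 0) for c in criteria)


def rank_suites(
    suites: dict[str, dict[str, float]], criteria: list[Criterion]
) -> list[tuple[str, float]]:
    """Return (suite, weighted total) pairs, highest total first."""
    totals = [(name, weighted_score(r, criteria)) for name, r in suites.items()]
    return sorted(totals, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical 0-10 ratings for two unnamed suites.
    suites = {
        "Suite A": {"Visualization": 8, "Statistics": 6, "Methodology": 5, "Coupling": 4},
        "Suite B": {"Visualization": 5, "Statistics": 7, "Methodology": 3, "Coupling": 9},
    }
    for name, total in rank_suites(suites, CRITERIA):
        print(f"{name}: {total}")
    # Suite B: 127
    # Suite A: 105

    # Extending the chart just means appending a row (hypothetical criterion).
    extended = CRITERIA + [Criterion("Vendor support", 6)]
    suites["Suite A"]["Vendor support"] = 7
    suites["Suite B"]["Vendor support"] = 4
    print(rank_suites(suites, extended))
    # [('Suite B', 151), ('Suite A', 147)]
```

Because the weights multiply the ratings, Suite B's strong showing on coupling (a High-7 criterion) outweighs Suite A's better marks on visualization and methodology, which carry lower weights. This is the reason for weighting the chart in the first place.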