Intelligent Enterprise

Better Insight for Business Decisions

May 13, 2003

Suite Smarts

Find the right data mining suite for you

by Stephen Penn

Continued from Page 1

First, determine how many distinct algorithms the suite offers in each of the following categories: clustering, classification, prediction, associations, and time-series analysis. Typical clustering algorithms are neural network, demographic, Cobweb, EM, and Gaussian neighborhood. Typical classification algorithms are neural networks and decision trees. Typical prediction algorithms are neural networks, radial basis function, and naive Bayes. Typical association (or market-basket analysis) algorithms are Apriori and generalized rule induction. Finally, typical time-series analysis algorithms are periodicity analysis and trend analysis.
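If you want to keep the tally in something more durable than a notepad, a few lines of Python will do. This is only a sketch: the vendor answers in `suite_algorithms` are hypothetical placeholders, and the function name `count_distinct` is my own invention, not part of any suite.

```python
# The five categories named in the article.
CATEGORIES = [
    "clustering",
    "classification",
    "prediction",
    "associations",
    "time-series analysis",
]

# Hypothetical vendor answers: category -> algorithms the suite claims to ship.
suite_algorithms = {
    "clustering": ["neural network", "demographic", "EM"],
    "classification": ["neural network", "decision trees"],
    "prediction": ["neural network", "radial basis function", "naive Bayes"],
    "associations": ["Apriori"],
    # No entry for time-series analysis: the vendor listed nothing.
}


def count_distinct(algorithms_by_category):
    """Count distinct algorithm names per category, ignoring case and padding."""
    return {
        category: len(
            {name.strip().lower() for name in algorithms_by_category.get(category, [])}
        )
        for category in CATEGORIES
    }


coverage = count_distinct(suite_algorithms)
for category, n in coverage.items():
    print(f"{category}: {n}")
```

A category that comes back with a count of zero is worth raising with the vendor before you score anything else.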

Visualization (Importance: Medium-4). The complexity of patterns, groupings, and scorings can make it difficult to display results from data mining activities. How the results are visualized can directly affect your ability to analyze and evaluate the results. Also, it's much easier to bring domain experts and end users into the process when you have charts and graphs rather than typical columnar reports. Good visualization tools offer interactivity as well.

Be sure to ask how many algorithm categories the visual reporting tools support. Usually they support clustering and associations; histograms, bar charts, and decision trees may be a part of the data mining suite as well.

Statistics (Importance: Medium-5). The ability to jump back and forth between traditional statistical operations and data mining activities can be a tremendous help. The more statistical tests available, the more options you have when testing hypotheses that were developed from reviewing the data mining results.

Make sure you ask which statistical tests are available. Tests I've found particularly useful include Chi-Squared, F-Test, Principal Component Analysis, Factor Analysis, curve fitting, ANOVA, and log regression.
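To make the idea of testing a hypothesis concrete, here is a minimal sketch of Pearson's chi-squared statistic computed by hand on a contingency table. The scenario and the counts are hypothetical: suppose a clustering run suggested that customers in cluster A churn more often than those in cluster B. A real suite or statistics package would also report a p-value, and this sketch applies no continuity correction.

```python
def chi_squared(observed):
    """Pearson's chi-squared statistic for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (count - expected) ** 2 / expected
    return stat


# Hypothetical counts. Rows: cluster A, cluster B. Columns: churned, retained.
observed = [
    [30, 70],
    [10, 90],
]

statistic = chi_squared(observed)

# 3.841 is the standard 5% critical value for one degree of freedom (a 2x2 table).
CRITICAL_5_PERCENT_DF1 = 3.841
print(f"chi-squared = {statistic:.2f}")
print("significant at 5%" if statistic > CRITICAL_5_PERCENT_DF1 else "not significant at 5%")
```

With these made-up counts the statistic is 12.5, well above the critical value, so the churn difference between the two clusters would be unlikely to be a fluke.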

Project Management

Methodology (Importance: Low-3). If the data mining suite supports a methodology directly, your data mining team will be much more organized and prepared. Although it's certainly not essential for the tool to support a methodology, the team definitely should.

Ask whether the suite supports a particular methodology. Why does the team support one methodology over another? The best-known methodologies include SEMMA (developed by SAS Institute Inc.) and CRISP-DM (developed by a consortium whose chief contributors were SPSS Inc., NCR, and DaimlerChrysler).

Coupling (Importance: High-7). The plumbing behind the flow of data can be critical to the overall success or failure of a project. The questions in this section attempt to determine how easy it will be to move data into and out of the suite, depending on your working environment.

Questions you should ask:

  • How well does it tie into your existing databases and data warehouses?
  • Which data set formats will it accept? Flat files, Microsoft Excel spreadsheets, and comma-separated values are common file formats.
  • Can it mine database tables directly? Can it mine text, such as email and documentation manuals?
  • Can it mine multidimensional data, such as OLAP cubes?
  • Can you manipulate data sets from within the suite? Aggregation, calculation, filtering, and discretization are useful utilities that can be performed from within the suite instead of inside the database.
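For readers unfamiliar with the utilities named in the last question, the sketch below shows filtering, discretization, and aggregation on a tiny in-memory data set. It illustrates the operations themselves, not any suite's interface; the records, the age bands, and the helper `discretize` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical customer records.
rows = [
    {"region": "East", "age": 23, "spend": 120.0},
    {"region": "East", "age": 47, "spend": 310.0},
    {"region": "West", "age": 35, "spend": 90.0},
    {"region": "West", "age": 61, "spend": 0.0},
]

AGE_EDGES = [30, 50]  # upper bounds of the first two bins
AGE_LABELS = ["under 30", "30-49", "50+"]


def discretize(value, edges, labels):
    """Return the label of the first bin whose upper edge exceeds the value."""
    for edge, label in zip(edges, labels):
        if value < edge:
            return label
    return labels[-1]  # value is at or above every edge


# Filtering: keep only customers who spent something.
active = [r for r in rows if r["spend"] > 0]

# Discretization: turn the continuous age into a categorical age band.
prepared = [
    {**r, "age_band": discretize(r["age"], AGE_EDGES, AGE_LABELS)} for r in active
]

# Aggregation: total spend per region.
spend_by_region = defaultdict(float)
for r in prepared:
    spend_by_region[r["region"]] += r["spend"]

print([r["age_band"] for r in prepared])
print(dict(spend_by_region))
```

If the suite can do this kind of preparation itself, you avoid a round trip to the database every time an algorithm needs its input in a slightly different shape.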

An Extendable System

Now you're equipped to explore potential data mining suites in detail and make a systematic comparison. Along the way, you may discover other areas that should be evaluated as well. If so, simply add more rows (with the appropriate description and weight) to the comparison chart.
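As a rough illustration of how such a chart can be totaled and extended, the sketch below uses the four weights quoted on this page (Visualization 4, Statistics 5, Methodology 3, Coupling 7). Everything else is an assumption on my part: the suite names, the 0-to-10 ratings, the extra "Vendor support" row, and the scoring rule itself (rating multiplied by weight, then summed), which is a common scheme but may differ from the one your chart uses.

```python
# Weights taken from the importance values quoted in the article.
WEIGHTS = {"Visualization": 4, "Statistics": 5, "Methodology": 3, "Coupling": 7}

# Hypothetical ratings on a 0-10 scale; replace with your own evaluation.
RATINGS = {
    "Suite A": {"Visualization": 8, "Statistics": 6, "Methodology": 5, "Coupling": 9},
    "Suite B": {"Visualization": 9, "Statistics": 7, "Methodology": 8, "Coupling": 4},
}


def weighted_score(ratings, weights):
    """Sum of rating x weight over every row in the chart (assumed scoring rule)."""
    return sum(weight * ratings.get(row, 0) for row, weight in weights.items())


def add_row(weights, ratings_by_suite, row, weight, new_ratings):
    """Return copies of the chart with one extra row; the inputs are not modified."""
    weights = {**weights, row: weight}
    ratings_by_suite = {
        suite: {**ratings, row: new_ratings[suite]}
        for suite, ratings in ratings_by_suite.items()
    }
    return weights, ratings_by_suite


before = {suite: weighted_score(r, WEIGHTS) for suite, r in RATINGS.items()}

# Hypothetical extra row discovered during evaluation.
weights2, ratings2 = add_row(
    WEIGHTS, RATINGS, "Vendor support", 6, {"Suite A": 3, "Suite B": 9}
)
after = {suite: weighted_score(r, weights2) for suite, r in ratings2.items()}

print(before)
print(after)
```

In this made-up case Suite A leads 140 to 123 on the original four rows, but the added row flips the ranking to 158 versus 177, which is exactly why it pays to add rows as you discover what matters to you.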



By using this comparison chart for evaluating data mining suites, you expand your product options well beyond specific algorithms. Remember, it's just as vital to evaluate how a product visualizes results, integrates with existing databases, and performs statistical analysis.


Stephen Penn [stelyn@att.net] is an application software developer at Lockheed Martin, where he focuses on business intelligence issues. He has more than 10 years' experience and holds an MBA from Frostburg State University.


RESOURCES

See the Business Intelligence Information Center for data mining articles, books, and other resources.









