Intelligent Enterprise

Better Insight for Business Decisions





December 5, 2002

Full Analysis

PolyAnalyst mines data and text, and its engines run the algorithmic gamut

by Greg James

Continued from Page 1

Most k-nearest neighbor algorithms incorporate a set of assumptions about the data they explore. For example, many algorithms assume that data points are normally distributed; therefore, they use a Gaussian distance function to determine how close cases are to each other. PolyAnalyst can automatically test several common distribution functions and choose the most appropriate one. Another important task is to determine which combination of parameters will yield the best overall classification accuracy. PolyAnalyst's Nearest Neighbor module incorporates a genetic algorithm specifically designed to find this combination, relieving the data miner of the need to discover it through trial and error.
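To make the idea concrete, here is a minimal Python sketch of choosing nearest-neighbor settings by measured accuracy rather than by hand. It is not PolyAnalyst's implementation: the two candidate distance functions, the leave-one-out scoring, the toy data, and every function name are assumptions made for illustration, and a plain exhaustive search stands in for the genetic algorithm the product uses.

```python
import math
from collections import Counter
from itertools import product

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def knn_predict(train, query, k, dist):
    """Majority vote among the k training cases closest to the query."""
    neighbors = sorted(train, key=lambda case: dist(case[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

def loo_accuracy(data, k, dist):
    """Leave-one-out accuracy: classify each case using all the others."""
    hits = 0
    for i, (features, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_predict(rest, features, k, dist) == label
    return hits / len(data)

def best_settings(data, ks, dists):
    """Exhaustive search over (k, distance); a stand-in for a genetic search."""
    return max(product(ks, dists), key=lambda s: loo_accuracy(data, s[0], s[1]))

# Toy data, invented for this sketch: two well-separated groups.
DATA = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b"),
]

if __name__ == "__main__":
    k, dist = best_settings(DATA, ks=[1, 3], dists=[euclidean, manhattan])
    print(k, dist.__name__, loo_accuracy(DATA, k, dist))
```

With only a handful of candidate settings, trying them all is practical. A genetic search becomes attractive when the parameter space is too large to enumerate.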

Text Analysis

Perhaps the most exciting exploration engine is the new Text Analysis module. It's no surprise that the vast majority of a corporation's data is textual. Text mining has been gaining mindshare for several years as systems and algorithms that deal with this form of unstructured data have matured.

PolyAnalyst steps into this domain by providing algorithms that scan through semistructured text; extract novel terms, concepts, and relationships; and represent these results in the same SRL as its other models. This uniformity, in turn, lets the data miner analyze textual data values with the same algorithms used for numeric or categorical data values.

A classic application of this brand of text mining is the analysis of customer feedback in call center logs. PolyAnalyst can scan through these comment fields, extract the important concepts and terms, and turn these results into numeric values that data miners can further analyze with Link Analysis, Classification, Clustering, and so on. PolyAnalyst's Text Mining exploration engine provides both directed and undirected modeling.
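The general idea of turning comment fields into numeric values can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of PolyAnalyst's Text Analysis engine: the stopword list, the document-frequency ranking, the sample comments, and all function names are invented for the example.

```python
import re
from collections import Counter

# A tiny, hand-picked stopword list (an assumption for this sketch).
STOPWORDS = {"the", "a", "an", "is", "was", "and", "to", "my", "i", "on", "for", "of"}

def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]

def top_terms(comments, n):
    """Rank terms by how many comments mention them; ties break alphabetically."""
    counts = Counter()
    for comment in comments:
        counts.update(set(tokenize(comment)))
    return sorted(counts, key=lambda t: (-counts[t], t))[:n]

def to_features(comment, vocabulary):
    """Represent one comment as a row of term counts, one column per term."""
    counts = Counter(tokenize(comment))
    return [counts[term] for term in vocabulary]

# Invented call center comments.
COMMENTS = [
    "The billing statement was wrong and billing support was slow",
    "Slow response on my billing question",
    "Great service, quick response",
]

if __name__ == "__main__":
    vocabulary = top_terms(COMMENTS, 3)
    for comment in COMMENTS:
        print(to_features(comment, vocabulary), comment)
```

Each comment becomes a numeric row, so the same classification or clustering routines that handle numeric columns can take it as input.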

Proper Guidance

The PolyAnalyst user's manual thoroughly describes all the exploration engines, with the exception of the new Text Analysis module. What I find especially helpful are the usage guidelines, including minimum, maximum, and optimum data set sizes. Too many vendors neglect to provide this information, leaving the data miner to guess the system's capabilities.

Each algorithm's description discusses applicable problems, data format requirements, data preprocessing suggestions, the underlying algorithm, recommendations for when to use the algorithm, and the system's outputs for that algorithm. A slightly more technical description of each algorithm is also included in a separate appendix.

The Interfaces

PolyAnalyst augments the exploration engines with a basic set of charting, graphing, and reporting options. Each exploration engine automatically generates the appropriate outputs for its analysis, but the data miner can also create customized outputs using the system's interactive interface to these functions. PolyAnalyst's built-in data visualization and reporting features are adequate but spartan.

PolyAnalyst's SRL, in combination with its OLE DB Data Links module, provides the data-processing functionality needed for practical data-mining projects. Data-processing steps, as well as generated models, are all stored as SRL rules, much like an expert system's knowledge base. All rules, both generated and user-written, are listed in the project tree and executed when the user manually applies them to a specific data set. Although this design has a certain elegance, it doesn't eliminate the need for a third-party, heavy-duty data-processing tool for large data-mining projects.

PolyAnalyst is available in a workstation or client/server configuration. An "In-Place" data mining interface is also available for Microsoft SQL Server. This option pushes the most I/O-intensive operations on very large data sets down to the DBMS, which is optimized to perform these functions. Organizations working with very large data sets or data warehouses should definitely evaluate this option.
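The benefit of pushing work down to the database can be shown with a small Python sketch. It uses the standard library's sqlite3 module as a stand-in for Microsoft SQL Server, and the table, column, and function names are hypothetical; it shows only the general pattern of in-database aggregation, not PolyAnalyst's In-Place interface.

```python
import sqlite3

def build_demo_db():
    """Create an in-memory table of invented call records."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE calls (region TEXT, minutes REAL)")
    conn.executemany(
        "INSERT INTO calls VALUES (?, ?)",
        [("east", 4.0), ("east", 6.0), ("west", 10.0)],
    )
    return conn

def mean_minutes_client_side(conn):
    """Pull every row to the client, then aggregate locally."""
    totals = {}
    for region, minutes in conn.execute("SELECT region, minutes FROM calls"):
        total, n = totals.get(region, (0.0, 0))
        totals[region] = (total + minutes, n + 1)
    return {region: total / n for region, (total, n) in totals.items()}

def mean_minutes_in_database(conn):
    """Let the DBMS scan and aggregate; only one row per group comes back."""
    rows = conn.execute("SELECT region, AVG(minutes) FROM calls GROUP BY region")
    return dict(rows)

if __name__ == "__main__":
    conn = build_demo_db()
    print(mean_minutes_client_side(conn))
    print(mean_minutes_in_database(conn))
```

Both functions return the same averages. The difference is how much data leaves the database: every row in the first case, one row per group in the second, which is what matters at data warehouse scale.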




Megaputer also offers the PolyAnalyst Software Development Kit for customers interested in embedding PolyAnalyst's functionality in their own customized applications. PolyAnalyst is a Windows COM application split into a three-tier, client/server architecture: PolyAnalyst Client, Knowledge Server, and Data Access Module. As a COM application, PolyAnalyst can integrate with standard Microsoft Office applications such as Excel and Access.

In Balance, A Positive

The more I work with PolyAnalyst, the more I appreciate the sophistication of its exploration engines. What it lacks in data preprocessing and visualization capabilities, it overcomes by automatically handling complex data-mining processes. The integrated text-mining feature adds a whole new dimension to what this package can analyze. And the in-place data-mining option lets several core exploration engines scale to data warehouse proportions. Megaputer is constantly enhancing this product, proactively participating in standardization efforts such as Predictive Model Markup Language, and demonstrating its willingness to develop and enhance its innovative algorithms.


Greg James [greg.james@nationalcity.com] is a vice president of National City Corp. and manager of the Retail Marketing Quantitative Methods group. He also teaches university computer science and data-mining classes.


RESOURCES

Other data mining package reviews from Intelligent Enterprise:

SAS Enterprise Miner 4.1

SPSS Clementine 6.0








