Intelligent Enterprise

Better Insight for Business Decisions

December 5, 2002

Full Analysis

PolyAnalyst mines data and text, and its engines run the algorithmic gamut

by Greg James

    Megaputer Intelligence Inc., the U.S.-based corporation behind PolyAnalyst, traces its roots back to the Artificial Intelligence (AI) Research and Development group at Moscow State University. PolyAnalyst debuted in 1994 and has been enhanced continually ever since. Version 4.5 adds decision forests, transactional market basket analysis, and link analysis to the base product. Notably, it's also the first of several commercial packages to offer integrated text mining within the same system as numeric data mining. Until recently, text and numeric mining were separate endeavors.

    Product Spec Sheet

    PolyAnalyst 4.5

    Megaputer Intelligence Inc.
    120 West 7th Street, Suite 310
    Bloomington, IN 47404
    (812) 330-0110

    Minimum System Requirements: Microsoft Windows NT/2000/XP, 256MB RAM, 80MB disk space.

    Pricing: Standalone licenses range from $4,000 to $20,000 depending on selected exploration engines. Client/server licenses start at $30,000 for one server and three workstations.

    PolyAnalyst's user interface is a familiar, interactive development environment: standard menus and a horizontal toolbar across the top; a hierarchical project tree on the left; a multiview workspace in the center; and a system message window across the bottom. Its design focuses on high-level data-mining functions and user-defined data-mining projects. These projects comprise Attributes, Data Sets, Illustrations and Graphs, Rules, Reports, Mining Models, and Data Links. These components are listed in the project tree and visualized in the multiview workspace. (See Figure 1.)

    What most distinguishes PolyAnalyst is its holistic approach to combining low-level algorithms into high-level data-mining functions. Megaputer's goal is to provide a comprehensive set of "exploration engines" built out of best-of-class machine-learning, statistical, and data-mining algorithms.

    The current version of PolyAnalyst includes 15 exploration engines: Summary Statistics, Linear Regression, Find Dependencies, Classify, Cluster, Decision Tree, Decision Forest, Discriminate, Find Laws, Nearest Neighbor, PolyNet Predictor, Basket Analysis, Transactional Basket Analysis, Link Analysis, and Text Analysis. PolyAnalyst is methodologically agnostic: its complete set of exploration engines supports almost any data-mining strategy. I review several here to provide insight into PolyAnalyst's algorithmic sophistication.

    Find Laws

    The Find Laws exploration engine is unique to PolyAnalyst and sits at the core of the system. Find Laws automatically generates candidate formulas and tests them to find the ones that best fit the data. Find Laws can describe complex, multidimensional, nonlinear relationships even though its equations are restricted to rational expressions (ratios of polynomials).

    Megaputer's developers drew upon their extensive background in AI to develop an algorithm that keeps Find Laws' equation generator from producing numerous trivial and redundant expressions. A special search algorithm directs the system toward the candidate equations most likely to improve on previous ones. Find Laws evaluates candidate equations in terms of their standard errors, the same measure you use with standard regression models.

    The benefit of this design is that Find Laws generates complex models stated in algebraic terms, thus making them relatively easy to deploy in production environments. PolyAnalyst uses its own Symbolic Rule Language (SRL) that doubles as its scripting language and internal model representation language.
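
    The generate-and-test idea behind Find Laws can be illustrated with a minimal Python sketch. It is not Megaputer's algorithm: it exhaustively enumerates a tiny family of rational terms rather than using a directed search, fits a single coefficient per candidate, and ranks candidates by residual standard error. The function names, the exponent range, and the sample data (a target z built as 3·x/y) are all assumptions made for illustration.

```python
import itertools
import math

def candidate_terms(names, exponents=(-2, -1, 0, 1, 2)):
    """Enumerate rational monomials such as x / y or x**2 * y (hypothetical grammar)."""
    for powers in itertools.product(exponents, repeat=len(names)):
        if any(powers):  # skip the constant term (all exponents zero)
            yield dict(zip(names, powers))

def evaluate(term, row):
    """Value of one candidate monomial for one record."""
    value = 1.0
    for name, power in term.items():
        value *= row[name] ** power
    return value

def fit_and_score(term, rows, target):
    """Least-squares fit of target ~ c * term; returns (standard error, c)."""
    t = [evaluate(term, r) for r in rows]
    y = [r[target] for r in rows]
    c = sum(ti * yi for ti, yi in zip(t, y)) / sum(ti * ti for ti in t)
    ssr = sum((yi - c * ti) ** 2 for ti, yi in zip(t, y))
    # One fitted parameter, so n - 1 residual degrees of freedom.
    return math.sqrt(ssr / (len(rows) - 1)), c

def find_law(rows, inputs, target):
    """Return (standard_error, coefficient, term) for the best-fitting candidate."""
    scored = []
    for term in candidate_terms(inputs):
        err, c = fit_and_score(term, rows, target)
        scored.append((err, c, term))
    return min(scored, key=lambda s: s[0])

# Synthetic data (an assumption for this sketch): z = 3 * x / y.
rows = [{"x": x, "y": y, "z": 3.0 * x / y}
        for x in range(1, 6) for y in range(1, 5)]
err, coef, term = find_law(rows, ["x", "y"], "z")
print(term, round(coef, 6), err)
```

    On this synthetic data the search recovers the term with exponent 1 on x and -1 on y and a coefficient of about 3. Because the winning model is a plain algebraic expression, it can be copied into production code directly, which is the deployment benefit the review describes.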

    Find Dependencies

    The Find Dependencies exploration engine is another example of higher-level data-mining functionality. One of the first steps in predictive modeling projects is identifying the internal relationships between specific target (dependent) variables and all relevant input (independent) variables. Next, feature selection reduces the set of inputs to only those with the strongest predictive relationship to the target. This process becomes exponentially more challenging as the number of inputs and their possible values grow.

    PolyAnalyst's Find Dependencies algorithm accomplishes both these steps simultaneously, requiring you merely to specify the target and input variables. Furthermore, Megaputer encourages data miners to consider selecting all available input variables if very little is known about the data set.

    Find Dependencies can run in two modes: strict or liberal. Use strict mode to discover which attributes have the strongest influence on the chosen target. Liberal mode, on the other hand, will identify exceptional cases or outliers, depending upon the research objective. Used in tandem, the two modes make the Find Dependencies engine an extremely efficient way of getting a handle on new data sets.
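
    The two steps that Find Dependencies combines (measuring each input's relationship to the target, then keeping only the most predictive inputs) can be sketched in Python. The review does not describe Megaputer's actual algorithm or how its strict and liberal modes work, so this sketch swaps in a much simpler technique: ranking inputs by absolute Pearson correlation and applying a cutoff. The column names, the 0.5 threshold, and the data are assumptions for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def rank_inputs(rows, inputs, target, threshold=0.5):
    """Score every input against the target, then keep the strong ones."""
    ys = [r[target] for r in rows]
    scores = {name: abs(pearson([r[name] for r in rows], ys)) for name in inputs}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, score) for name, score in ranked if score >= threshold]

# Hypothetical data: "spend" depends on "income" only; "noise" is unrelated.
pattern = [1, -1, -1, 1, 1, -1, -1, 1]
rows = [{"income": float(i), "noise": float(b), "spend": 2.0 * i}
        for i, b in enumerate(pattern)]
selected = rank_inputs(rows, ["income", "noise"], "spend")
print(selected)
```

    Here every available input is passed in and the scoring decides what survives, which mirrors Megaputer's advice to select all inputs when little is known about the data. A univariate score like this one misses interactions between inputs, so it should be read as an illustration of the workflow only.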

    Nearest Neighbor

    The Nearest Neighbor exploration engine is useful for classifying cases into one of several mutually exclusive categories and predicting categories for new cases. Thus, you can use it for both exploration and prediction. Nearest Neighbor's accuracy improves with more cases, but Megaputer warns that runs on data sets in excess of 100,000 records can take a significant amount of time to complete.
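
    A generic k-nearest-neighbor classifier, sketched below in Python, shows the technique in its textbook form. PolyAnalyst's own implementation is not described in the review, so the function name, the choice of k = 3, the Euclidean distance, and the sample cases are assumptions.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k closest stored cases.

    train: list of (feature_tuple, label) pairs; query: feature tuple.
    """
    # Brute force: the distance to every stored case is computed per query.
    nearest = sorted(train, key=lambda case: math.dist(case[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical cases in two mutually exclusive categories.
train = [
    ((1.0, 1.0), "low"), ((1.2, 0.8), "low"), ((0.9, 1.1), "low"),
    ((5.0, 5.0), "high"), ((5.2, 4.9), "high"), ((4.8, 5.1), "high"),
]
first = knn_predict(train, (1.1, 1.0))
second = knn_predict(train, (5.1, 5.0))
print(first, second)
```

    In a brute-force version like this one, each prediction scans every stored case, so running time grows with the number of records. That is one plausible reason large data sets are slow, though the review does not say which search strategy PolyAnalyst uses.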






