CMP -- United Business Media

Intelligent Enterprise

Better Insight for Business Decisions

January 1, 2002

Out in the Open

In the world of analytics, the open-source torrent is little more than a trickle

Seth Grimes

The open-source (OS) software movement differs greatly from explosive Internet-era fads — think mid-1990s "push" technology — in that its substance outshines the hype. Progress has instead been quiet and steady, to the point where OS software is now widely used, often displacing commercial alternatives. This acceptance has come in the face of denigration of the OS development process and business model by the savviest businesspeople of recent times and despite reluctant, incomplete support by leading software vendors. Nonetheless, OS's reach continues to expand to the point where even that bastion of corporate self-affirmation, Gartner Inc., has jumped ship, recommending the Apache Web server over Microsoft's Internet Information Server, which Gartner states requires a complete rewrite with (this time) security in mind.

Yet the OS movement has focused almost exclusively on tools like scripting languages and compilers, operating-system platforms, and Web services. There are significant OS database management systems, but no apparent development activity in enterprise applications or business intelligence (BI). Although I describe some strong niche OS analytic tools, they have little of the presence earned (and deserved) by the Apache Web server, the GNU/Linux operating system, or the Perl and Python scripting languages.

Search for Open-Source Decision Support


The Free Software Foundation's GNU project Web site is the starting point for research into OS software. It's still not widely understood that GNU (standing, recursively, for GNU's Not Unix) offers a comprehensive computing platform licensed under the GNU General Public License (GPL), which allows community modifications so long as they remain available for inclusion in the free code base. (Some OS software, notably Berkeley Software Distribution [BSD] Unix operating-system variants, is provided under other, non-GNU licensing schemes.)

Some software, Linux for example, was OS from the start, while other packages, including Borland's InterBase RDBMS, were released to the OS community after failing in the commercial market. InterBase is an exception to the rule that most significant OS efforts are the work of academic and nonacademic researchers. The developers don't profit directly from their efforts beyond benefiting from the collaborative effort. The business model, if there is any, usually involves packaging or supporting goods that can be obtained and used for free. This approach is unlikely to lead to world domination, as market forces brutally taught Linux distributors including Red Hat Inc., Caldera International Inc., and Corel Corp., as well as the recently demised PostgreSQL RDBMS provider Great Bridge LLC.

In contrasting those OS successes with examples of the best OS analytic tools (the R statistical programming language, the gretl econometric and modeling package, the OpenCyc inference engine, and Snort, a rule-based network intrusion detection [NID] package), I hope to draw useful conclusions about both the OS movement and the decision-support market.

DECISION-SUPPORT SYSTEMS

Let's look at the state of OS decision-support systems.

R is a dialect of Bell Laboratories' S programming language, the work of John Chambers and colleagues, that provides a statistical analysis and graphing environment via command line and graphical interfaces. It appears to be doing quite well without commercial packaging or support.

R programming facilities include conditionals, looping, recursive functions, and support for a spectrum of data structures and specialized operators that simplify work with arrays, time series, and simultaneous-equation models. R programs can be extended with calls to dynamically linked external routines and through packages that come with the distribution or are available for download.

R competes head-on with S-Plus, a commercial implementation of the S language. My impression is that the competition, by enlarging the overall S user community, benefits both S-Plus's publisher, Insightful Corp., and the R effort.

R competes less directly with commercial statistical analysis software vendors including SAS Institute Inc., whose SAS product is the market leader. Many R/S users also use SAS; the packages complement each other. R/S is an excellent programming environment that is very well suited to developing statistical algorithms — for methodology research — while SAS provides a great environment for production work, extensive database and import/export interfaces, and vertical subject-domain modules.

SAS statistical tools link easily to enterprise applications: systems for CRM, manufacturing, sales force and marketing automation, and so on. Although you can code your own links into R, the development community apparently supports links only to gretl, a GNU-licensed econometric-analysis package. Gretl started from Professor Ramu Ramanathan's Econometrics Software Library and adds many extensions, including a graphical client, an integrated scripting language, gnuplot graphics, and interfaces to a variety of data file and database formats.

Rule processing systems lie at the opposite decision-support pole from statistical modeling and number crunching. Snort, an OS network intrusion detection (NID) system, typifies highly specialized, rule-based packages. Developer Martin Roesch describes Snort as a TCP/IP packet sniffer — now you know where the name comes from — that performs content-pattern matching to detect attacks and probes with logging and realtime alert capabilities. Roesch states, "Snort is a tool for small, lightly utilized networks." It's not going to replace commercial packages.
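To make the idea of rule-based content-pattern matching concrete, here is a minimal sketch in Python. It is not Snort's rule language or its detection engine; the `Rule` and `Packet` classes, the rule names, and the sample payloads are all hypothetical, chosen only to show how a set of content rules can be applied to packet payloads to produce alerts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical content rule: fire when `content` appears in a payload."""
    name: str
    dst_port: int   # destination port the rule applies to
    content: bytes  # byte pattern to look for in the payload


@dataclass(frozen=True)
class Packet:
    """Simplified stand-in for a captured TCP/IP packet (illustrative only)."""
    src: str
    dst_port: int
    payload: bytes


def match_rules(packet, rules):
    """Return the names of all rules whose port and content pattern match."""
    return [
        rule.name
        for rule in rules
        if rule.dst_port == packet.dst_port and rule.content in packet.payload
    ]


def inspect(packets, rules):
    """Yield one alert string per (packet, matching rule) pair."""
    for packet in packets:
        for name in match_rules(packet, rules):
            yield f"ALERT {name} from {packet.src} on port {packet.dst_port}"


# Illustrative rules, not taken from any real rule set.
RULES = [
    Rule("directory-traversal", 80, b"../.."),
    Rule("cmd-exe-probe", 80, b"cmd.exe"),
]

if __name__ == "__main__":
    traffic = [
        Packet("10.0.0.5", 80, b"GET /index.html HTTP/1.0"),
        Packet("10.0.0.9", 80, b"GET /scripts/../../winnt/system32/cmd.exe HTTP/1.0"),
    ]
    for alert in inspect(traffic, RULES):
        print(alert)
```

A real intrusion detection system also has to capture and decode packets, and to log and deliver alerts in real time. The sketch leaves all of that out and shows only the matching step.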

By contrast, one of Cycorp Inc.'s motives in releasing OpenCyc, an OS version of the Cyc "general knowledge base and common-sense reasoning engine," may have been to protect and even expand the market position of its commercial software. Release 1.0, now in limited beta release, will be available under the GNU Lesser General Public License (LGPL, formerly the GNU Library General Public License) and will include a subset of the Cyc knowledge base (KB), compiled versions of the inference engine and KB browser, and tools for KB building.

The OpenCyc KB will include 6,000 concepts constituting "an upper ontology for all of human consensus reality" and 60,000 assertions about these concepts. An R&D version, ResearchCyc, will include a "substantially larger" subset of the Cyc KB. Cycorp openly hopes that the market will standardize on its formats and interfaces and that some OpenCyc users will license the full KB.

Statistical analysis and algorithm development lie at decision support's core, and econometrics is an important decision-support application, although only one of many. NID is a tightly focused rule-processing application, and Snort is a limited implementation. Of the handful of OS packages I've cited, that leaves only OpenCyc as a contender to dominate a decision-support category in the way that Apache and Perl dominate Web hosting and scripting. In my research for this column, I found no OS activity in the most prominent decision-support subdomain, BI, or in linking decision support to enterprise applications.

JUMP-START

There are several clear reasons for the paucity of OS BI software. First, commercial BI vendors do not have business models that focus on consulting services. Next, they have not experienced market shifts that would induce them to free their code bases.







