Intelligent Enterprise

Better Insight for Business Decisions





October 8, 2002

The 80 Percent Solution

Products make unstructured data easier to incorporate into enterprise applications

by Dan Sullivan


So who will benefit from the Relevancy Server? According to David Marshak, senior vice president at Patricia Seybold Group and a Relevancy Server user, the value of the application lies not only in giving sales and customer service representatives more efficient access to the information they need. What's really valuable is that the tool delivers the same information to people who are not proficient with CRM applications or who may not know what information about a topic is scattered throughout the organization.

A Future Standard Feature

The desktop-level integration of LumaPath works well for users who juggle several disparate applications and work on a narrowly targeted topic, such as a specific customer. When users need access to a broader range of information and must navigate a large number of potentially relevant pieces of content, categorization and hierarchical classification are the better fit. For example, users of decision support systems and portals often need rapid access to a range of unstructured information but have to settle for basic search engine result sets. The Stratify Discovery Server targets this type of user.

Like the Relevancy Server, the Discovery Server builds and maintains a repository of information about the content it has crawled. The Stratify tool also provides a Windows client that analyzes the content of the Microsoft Office, HTML, and PDF files a user is working with and offers links to related content. The client also builds user interest profiles automatically.
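
How a desktop client might find "related content" for an open file is not described here, so the following is only a minimal sketch under an assumed, deliberately simple technique: represent each document as a bag of word counts and rank the repository by cosine similarity to the active document. The interest profile is likewise assumed to be nothing more than summed word counts of documents the user has viewed. The document IDs and texts are hypothetical, and nothing here reflects Stratify's actual algorithms.

```python
import math
import re
from collections import Counter

def term_vector(text):
    """Lowercase word counts: a crude stand-in for real text analysis (assumption)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())  # missing terms count as 0
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0

def related(active_text, repository, top_n=2):
    """Rank repository documents by similarity to the document the user has open."""
    active = term_vector(active_text)
    scored = [(cosine(active, term_vector(body)), doc_id) for doc_id, body in repository.items()]
    scored.sort(reverse=True)
    return [doc_id for score, doc_id in scored[:top_n] if score > 0]

def interest_profile(viewed_texts):
    """Hypothetical user interest profile: summed term counts of viewed documents."""
    profile = Counter()
    for text in viewed_texts:
        profile.update(term_vector(text))
    return profile

# Hypothetical crawled repository.
repository = {
    "memo-17": "Acme renewal pricing and support contract terms",
    "faq-3": "How to reset a password on the intranet",
    "call-42": "Acme asked about contract renewal and pricing tiers",
}
print(related("Draft proposal for Acme contract renewal", repository))  # prints ['memo-17', 'call-42']
```

A production system would add stemming, stop-word removal, and term weighting such as TF-IDF; raw counts keep the idea visible.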

Unlike LumaPath, Stratify has chosen to keep this content metadata in a relational database in a form accessible to other applications.
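
Keeping metadata in a relational database means any application that speaks SQL can reuse the categorization work. The schema below is entirely hypothetical (the article does not describe Stratify's tables); it uses Python's built-in `sqlite3` module to show a portal or BI application pulling documents for a category, filtered by classifier confidence.

```python
import sqlite3

# Hypothetical schema: one row per document, one row per (document, category) assignment.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE content (
    doc_id TEXT PRIMARY KEY,
    title  TEXT NOT NULL,
    url    TEXT NOT NULL
);
CREATE TABLE content_category (
    doc_id   TEXT REFERENCES content(doc_id),
    category TEXT NOT NULL,
    score    REAL NOT NULL  -- classifier confidence, 0..1
);
""")
conn.executemany("INSERT INTO content VALUES (?, ?, ?)", [
    ("d1", "Q3 churn analysis", "http://intranet.example/d1"),
    ("d2", "Support call notes", "http://intranet.example/d2"),
])
conn.executemany("INSERT INTO content_category VALUES (?, ?, ?)", [
    ("d1", "Customers/Retention", 0.91),
    ("d2", "Customers/Retention", 0.64),
    ("d2", "Products/Billing", 0.80),
])

def documents_in_category(conn, category, min_score=0.5):
    """A query another application might run against the shared metadata."""
    rows = conn.execute(
        """SELECT c.title, c.url, cc.score
           FROM content c JOIN content_category cc ON c.doc_id = cc.doc_id
           WHERE cc.category = ? AND cc.score >= ?
           ORDER BY cc.score DESC""",
        (category, min_score),
    )
    return rows.fetchall()

for title, url, score in documents_in_category(conn, "Customers/Retention"):
    print(f"{score:.2f}  {title}  {url}")
```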

The Discovery Server architecture centers on the classifier, which receives data from crawlers and uses existing hierarchical categorization information. The classifier uses several automated methods for categorizing content, including statistical, keyword, Bayesian, and neural network algorithms. It combines the results of all algorithms to produce categorization metadata for each piece of content it analyzes.

Because categorization algorithms depend upon an inductive bias to formulate a best guess at classification rules, combining the results of different algorithms with different biases can yield better overall results when dealing with a wide range of content and classification problems.
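
The article does not say how the Discovery Server combines its algorithms' results. One simple, commonly used option is a weighted average of per-category scores, sketched below under the assumptions that every classifier reports scores in [0, 1] and that a category must average at least 0.5 to be assigned. The classifier names, weights, and threshold are all hypothetical.

```python
from collections import defaultdict

def combine(scores_by_classifier, weights=None, threshold=0.5):
    """Weighted average of per-category scores from several classifiers.

    scores_by_classifier maps a classifier name to {category: score in [0, 1]}.
    A classifier that says nothing about a category contributes 0 for it.
    """
    names = list(scores_by_classifier)
    weights = weights or {name: 1.0 for name in names}
    total_weight = sum(weights[name] for name in names)
    combined = defaultdict(float)
    for name in names:
        for category, score in scores_by_classifier[name].items():
            combined[category] += weights[name] * score
    averaged = {cat: total / total_weight for cat, total in combined.items()}
    return {cat: s for cat, s in averaged.items() if s >= threshold}

# Hypothetical outputs from classifiers with different inductive biases.
votes = {
    "keyword": {"Billing": 0.9, "Retention": 0.2},
    "bayesian": {"Billing": 0.7, "Retention": 0.6},
    "neural": {"Billing": 0.4, "Retention": 0.5},
}
print(combine(votes))
```

Here Billing averages about 0.67 and is kept, while Retention averages about 0.43 and is dropped: the keyword classifier's weak Retention score outvotes the other two, which is exactly the kind of disagreement that combining biases is meant to smooth out.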

Of course, no automated text analysis system is 100 percent accurate. The Discovery Server supports rule-based classification and manual tagging as well.
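
The article does not specify how rules and manual tags interact with automated output. The sketch below assumes one plausible precedence policy: a manual tag overrides everything, a matching rule overrides the automated classifiers, and automated categories are used only when neither applies. The regular-expression rules and category names are hypothetical.

```python
import re

# Hypothetical rules: if the pattern matches, the document is forced into the category.
RULES = [
    (re.compile(r"\binvoice\b|\brefund\b", re.IGNORECASE), "Products/Billing"),
    (re.compile(r"\bcancel\w*", re.IGNORECASE), "Customers/Retention"),
]

def final_categories(text, automated, manual=None):
    """Resolve categories with precedence: manual tags, then rules, then automated output."""
    if manual:
        return sorted(manual)
    ruled = {category for pattern, category in RULES if pattern.search(text)}
    if ruled:
        return sorted(ruled)
    return sorted(automated)

print(final_categories("Customer wants a refund", {"Customers/Retention"}))
print(final_categories("Refund request", {"Customers/Retention"}, manual={"Legal"}))
```

An alternative policy would let rules add categories alongside the automated ones instead of replacing them; which is better depends on how much the rules are trusted.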

EAI Need Not Apply

Organizations are scrambling to integrate data in business intelligence, CRM, and other enterprise applications. Now unstructured data is being thrown into the mix. Fortunately, this doesn't entail the same level of analysis and development as structured EAI and extract, transform, and load systems, for the simple reason that we don't need to manage the details of fine-grained data structures. In the end, the ease of integration comes at the cost of less-than-perfect results. Some documents might be misclassified and some relevant content might be missed, but the results are still far better than what many of us have to work with today.
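
The two failure modes named above map onto standard measures: misclassified documents lower precision, and missed documents lower recall. The function below is a generic sketch, not a metric reported for either product; the document IDs in the example are made up.

```python
def precision_recall(assigned, relevant):
    """Precision: share of assigned documents that belong.
    Recall: share of truly relevant documents that were assigned."""
    assigned, relevant = set(assigned), set(relevant)
    hits = len(assigned & relevant)
    precision = hits / len(assigned) if assigned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical: 10 documents filed under a category, 8 of them correctly,
# while 12 documents actually belong there.
p, r = precision_recall(range(10), range(2, 14))
print(f"precision={p:.2f} recall={r:.2f}")
```

With 8 correct out of 10 assigned and 8 found out of 12 relevant, precision is 0.80 and recall is about 0.67: imperfect, but a concrete way to judge whether "far better than what many of us have to work with today" holds for a given collection.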


The Liberation of Content

Unstructured data doesn't need to be formally structured to be integrated with enterprise applications, as long as metadata about content is available. The key to successfully working with unstructured data is finding, indexing, and categorizing it and then linking it to applications based on the semantics of the content rather than the data structure that holds it.
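
Linking on content metadata rather than on data structure can be as simple as an index from metadata values to documents. In the hypothetical sketch below, a CRM screen asks for documents tagged with a customer and a category, without knowing whether they came from e-mail, PDF, or call notes. The field names and document IDs are assumptions for illustration.

```python
from collections import defaultdict

def build_metadata_index(docs):
    """Map each (field, value) metadata pair to the set of document IDs that carry it."""
    index = defaultdict(set)
    for doc_id, metadata in docs.items():
        for field, values in metadata.items():
            for value in values:
                index[(field, value)].add(doc_id)
    return index

def linked_content(index, **criteria):
    """Documents matching every metadata criterion, e.g. customer='Acme', category='Billing'."""
    sets = [index.get((field, value), set()) for field, value in criteria.items()]
    return sorted(set.intersection(*sets)) if sets else []

# Hypothetical metadata produced by crawling, indexing, and categorizing.
docs = {
    "email-1": {"customer": ["Acme"], "category": ["Billing"]},
    "pdf-7": {"customer": ["Acme", "Globex"], "category": ["Contracts"]},
    "call-3": {"customer": ["Acme"], "category": ["Billing", "Retention"]},
}
index = build_metadata_index(docs)
print(linked_content(index, customer="Acme", category="Billing"))  # prints ['call-3', 'email-1']
```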

This is one step in our liberation from the headaches of EAI-style integration, which focuses on structural elements. Both LumaPath's Relevancy Server and Stratify's Discovery Server highlight the core issues in unstructured data management: accessing and indexing content, organizing it according to application requirements, and making it readily accessible to users without the poor-quality results of full-text search engines. The recent merger of Inktomi and Quiver, a categorization software vendor, and Verity's continued development of the Verity Intelligent Classifier, tightly integrated with its search engine, are further examples of the trend away from commodity search toward more sophisticated text analysis techniques.

Even with better analytic tools, the timing and quality of content analysis remain important design considerations. As Dyche noted, "The challenge isn't so much how to integrate the data as it is maintaining its currency. For instance, knowing which products high-value customers inquired about last month is too late for Marketing. They need to know what products people asked about yesterday — whether through email, by phone, or via live chat. It's the increasing demand for near-real-time data in different formats that's flummoxing everyone." Certainly there are performance and architectural issues that have to be addressed, but the proverbial genie has been let out of the bottle and the integration of structured and unstructured data is here to stay.
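
One common way to keep analyzed content current without reprocessing everything is incremental re-indexing: on each pass, analyze only documents that are new or have changed since they were last indexed. The sketch assumes each source can report a last-modified time per document; the IDs and timestamps are hypothetical.

```python
from datetime import datetime, timedelta

def needs_reindex(documents, last_indexed):
    """Return IDs of documents never analyzed or modified after their last analysis.

    documents maps doc_id -> last-modified datetime;
    last_indexed maps doc_id -> datetime of the last analysis.
    """
    return sorted(
        doc_id for doc_id, modified in documents.items()
        if doc_id not in last_indexed or modified > last_indexed[doc_id]
    )

now = datetime(2002, 10, 8, 9, 0)
documents = {
    "chat-881": now - timedelta(hours=2),    # yesterday's live chat, never indexed
    "email-12": now - timedelta(days=40),    # unchanged since last analysis
    "call-77": now - timedelta(hours=20),    # call notes edited after last analysis
}
last_indexed = {
    "email-12": now - timedelta(days=30),
    "call-77": now - timedelta(days=1),
}
print(needs_reindex(documents, last_indexed))  # prints ['call-77', 'chat-881']
```

Running such a pass frequently narrows the gap Dyche describes, at the cost of more crawler and classifier load.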


Dan Sullivan [dsullivan@redmontcorp.com] is the chief technology officer of Redmont Corp. and specializes in evaluating, designing, and implementing enterprise information management systems. Dan is the author of Document Warehousing and Text Mining (Wiley, 2001) and is working on a new book, Proven Portals (Addison-Wesley, 2003).
