The 80 Percent Solution
Products make unstructured data easier to incorporate into enterprise applications
by Dan Sullivan
In this Issue: In the past, unstructured data management was a problem left to document management and content management administrators. Those days are over. Both organizations and vendors are realizing the value of integrating unstructured data with existing enterprise applications. There's a lot of talk about keeping a 360-degree view of the customer, but without access to customer emails, notes in customer relationship management (CRM) systems, memos logged in sales force automation systems, and purchase contracts kept in document management systems, it's just talk.

Unstructured data management is moving out of the realm of portals and content management systems and into CRM, sales force automation, and other enterprise-level applications. Here I'll examine two products that exemplify this move: LumaPath's Real Time Relevance Server and Stratify's Discovery System. Of course, the real impetus in this process is not the vendor offerings; it's the realization that unstructured data permeates the organization and that access to it is just as essential as access to structured data.

Getting the Full Picture

By now we all realize that a customer has multiple points of contact with an enterprise, from the sales process to the calls for service. Keeping track of these contacts is an enormous problem even when dealing with structured data, and, to add insult to injury, we are now starting to realize that we need to deal with unstructured data as well.

"The need for unstructured data as part of the overall CRM data infrastructure is clear," says Jill Dyche, a vice president at Baseline Consulting Group and author of The CRM Handbook (Addison-Wesley, 2001). "What's less clear is its availability on demand. The value proposition for unstructured data is in combining it with other data, for instance a customer's latest email as part of his profile screen. This is easier said than done."
A recently launched product, LumaPath's Real Time Information Integration (RTII) server, is one attempt to take on the problem of on-demand integration. The RTII software combines features of enterprise application integration (EAI), search engines, and portals. It crawls and indexes a variety of unstructured data sources and makes that content available from a desktop toolbar that constantly analyzes the contents of the active window.

Consider a hypothetical example: When a customer service representative reviews a record in Siebel eBusiness 2000, the toolbar queries the LumaPath server and passes it information about the customer and the contents of notes on the screen. The server replies with an indication of the types of relevant content available about this customer, including emails from the customer, memos in Lotus Notes from account reps, and news articles from Hoovers or OneSource about the customer. The LumaPath toolbar acts as a utility for multiple applications and can use structured data, such as customer account information, when it's available. Using structured information, such as account numbers and customer names, improves the relevance of the content that the server identifies.

The Real-Time Relevance Server comprises four components: the Relevancy Server, the client LensBar, data adaptors, and an administrator console.

The LumaPath Relevancy Server is an ISAPI application that requires Microsoft Windows 2000 and uses Microsoft SQL Server for persistence. The server responds to requests from both the LensBar and the administrator console and communicates using XML over HTTP. The server does not store the actual content or even an XML structure of key features; instead, it maps content to a space-saving bitmap representation. This technique radically reduces storage requirements and has proven itself in other information retrieval techniques such as file signatures.
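The file-signature idea mentioned above can be illustrated with a small sketch. This is not LumaPath's encoding, which the article does not describe; it is a generic superimposed-coding scheme of the kind used in signature files, shown only to make the storage trade-off concrete. The names SIGNATURE_BITS, HASHES_PER_TERM, term_positions, signature, and may_match, as well as the chosen bit width, are assumptions made for illustration.

```python
import hashlib

# Assumed parameters for illustration only; a real system would tune these.
SIGNATURE_BITS = 256   # width of each document's bitmap
HASHES_PER_TERM = 3    # bits set per term


def term_positions(term):
    """Map a term to a few bit positions derived from a SHA-256 digest."""
    digest = hashlib.sha256(term.lower().encode("utf-8")).digest()
    return [
        int.from_bytes(digest[4 * i:4 * i + 4], "big") % SIGNATURE_BITS
        for i in range(HASHES_PER_TERM)
    ]


def signature(terms):
    """Superimpose the bits of every term into one integer bitmap."""
    bits = 0
    for term in terms:
        for pos in term_positions(term):
            bits |= 1 << pos
    return bits


def may_match(doc_sig, query_sig):
    """True if every query bit is set in the document signature.

    A True result can be a false positive, so a hit still has to be
    confirmed against the source content; a False result is definitive.
    """
    return (doc_sig & query_sig) == query_sig


doc_sig = signature(["acme", "contract", "renewal"])
query_sig = signature(["acme", "renewal"])
print(may_match(doc_sig, query_sig))
```

The document is reduced to a fixed 256-bit value regardless of its length, which is where the storage saving comes from. The price is that a match is only probable, so the original content has to be fetched from the source system to confirm it.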
To maintain real-time speeds, the server keeps all data in memory and uses the relational database for persistence. This technique, like so many good ideas, has its limits. Without enough RAM, the server will page and experience the relatively slow I/O operations the server is trying to avoid. Because the content is not stored in the Relevancy Server, the server will query the source system each time a user fetches content. These are by no means problems for every implementation, but in high-volume environments you need to attend to system architecture issues created by introducing unstructured data management services.

The LensBar is a Win32 client application that gathers content from the active window on the desktop, sends information to the server, and receives results of the automatic search. Unlike with traditional portal applications, a user doesn't need to leave one application to look up information in another. If the server finds a relevant piece of content, an icon on the toolbar changes to indicate that information is available. Icons are application-specific, indicating the source of the relevant text such as Siebel, Pivotal, Microsoft Outlook, Lotus Notes, or a relational database.

The Relevancy Server gathers indexing data through data adaptors. These adaptors crawl content sources such as databases, mail servers, file servers, Web sites, and enterprise applications. The final component is the administrator utility, which is a Microsoft Management Console plug-in.

The Underlying Technology

Like many unstructured data management tools today, the Relevancy Server uses a combination of natural language processing and statistical techniques. The language analysis depends upon both syntactic information, like parts of speech, and semantic information. It then analyzes the preliminary results of the language analysis with statistical techniques to identify distinguishing features.
In the last step, it applies numerical models to map the reduced feature set to the bitmap representation stored on the server. In addition to analyzing the content of active windows, the system also monitors the user's reading habits and builds a profile that it uses to improve the relevance of content it returns to the user.
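To make the statistical step more concrete, here is a minimal sketch of one common way to pick distinguishing features: scoring each term by how frequent it is in a document and how rare it is across the collection (a TF-IDF style weighting). The article does not say which statistics the Relevancy Server uses, so this is a stand-in technique, and it skips the language analysis (parts of speech, semantics) entirely. The functions tokenize and distinguishing_terms and the sample sentences are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def distinguishing_terms(doc, corpus, top_n=3):
    """Return the top_n terms of doc ranked by a smoothed TF-IDF score."""
    doc_counts = Counter(tokenize(doc))

    # Document frequency: in how many corpus documents each term appears.
    doc_freq = Counter()
    for other in corpus:
        doc_freq.update(set(tokenize(other)))

    n_docs = len(corpus)
    scores = {}
    for term, count in doc_counts.items():
        # Terms that appear in every document get an IDF of zero.
        idf = math.log((1 + n_docs) / (1 + doc_freq[term]))
        scores[term] = count * idf

    # Highest score first; ties are broken alphabetically for stable output.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:top_n]


corpus = [
    "the customer sent an email about the invoice",
    "the customer renewed the contract",
    "the server stores the index",
]
print(distinguishing_terms(corpus[1], corpus, top_n=2))
```

A word such as "the" appears in every sample document and scores zero, while words unique to one document rise to the top. A reduced feature set of this kind is what a final encoding step would then map to a compact representation.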