A Closer Look

Giving customer emails the same analytic scrutiny you give transactions could be a valuable competitive advantage
By Dan Sullivan

Continued from Page 1

The final product of the information extraction process is either a fully structured representation of an email in relational form or a semistructured XML document that can be easily parsed and loaded into a database. Once in the database, extracted elements can be mapped to existing dimensional models and hierarchies, and aggregate measures can be calculated about the extracted facts. At that point, users are ready to pose questions such as, "How many complaints were received about consumer electronics equipment in the last three months?"

EXTRACTION TOOLS FOR EMAIL

Compared to their text analysis siblings (search, classification, and taxonomy generators), information extraction tools are far less common. Tools from three vendors (Solutions-United, Temis, and WhizBang Labs) are representative examples of this technology.

Solutions-United's MetaMarker provides a full range of lexical, syntactic, semantic, and pragmatic analysis. The product includes a set of distinct modules for individual tasks, allowing developers to choose processing features a la carte through a Java- and XML-based API. From a system architecture perspective, MetaMarker is a transformation tool that takes free-form text as input and generates a formatted data stream ready to load into the database.

MetaMarker works with three basic objects: resources, tasks, and controls. There are two types of resources: rule sets, which work on a single problem such as lexical analysis or noun phrase identification, and knowledge bases, such as a lexicon, that support analysis. Tasks are sets of resources that are applied to a text in a particular order. For example, the part-of-speech tagging task is specified by:
Other predefined tasks include cleaner, tokenizer, sentence detector, stemmer, classifier, and formatter. You can group tasks into profiles for easier reference when processing documents. Controls specify which tasks are applied to a particular section of the email; for example, the classification task might be applied only to the body of an email, excluding the header information.

Objects are specified at the environment level using an XML configuration file and at the document level, where the specifications are embedded into the text stream. (A typical message ready for analysis would look something like what you see in Listing 1.) This example assumes we have defined two profiles, core and classify, using the appropriate tasks, which in turn specify the resources required to complete the operation.

MetaMarker is initiated through a Java program in several steps: a document object is created, the source file is parsed, a MetaMarker object is created, and then the processDocument method is invoked. The basic template is:
The resulting string in the xmldoc object is the fully augmented message with tags such as
Temis also provides a set of tools for analyzing customer emails. Its Insight Discoverer Extractor combines morphological and syntactic analysis to produce an intermediate representation of a message, which is then further analyzed using a set of grammatical rules and thesauri stored in "skill cartridges." The skill cartridges contain both lexical (word-based) information and sentence patterns for identifying key features. The Opinions Analysis cartridge identifies particular opinions and their relationship to specific objects, such as a customer's dissatisfaction with a particular product. Industry-specific cartridges, such as the Banking Cartridge, are also available.

Although not specifically targeted at email analytics, WhizBang Labs is another vendor in the information extraction market. WhizBang's products use machine learning techniques to develop extraction rules from example texts labeled by users. In addition, extracted information is assigned confidence measures that can be used to route texts that fall below a predefined threshold to a human for verification. These tools have been used in a range of related applications, from extracting targeted information from resumes to adding XML structures to unstructured texts.

TO THE LETTER

Not all the information we need to manage customer relations is neatly packaged and ready for analysis. Customer emails can act as early indicators of trends, such as dissatisfaction with a shipping policy, that will not show up in structured databases for some time, or not in as much detail. Emerging information extraction tools are bridging the gap between free-form texts and analytic tools.

Dan Sullivan [dsullivan@redmontcorp.com], author of Document Warehousing and Text Mining: Techniques for Improving Business Operations, Marketing and Sales (John Wiley, 2001), is CTO of Redmont Corp., a firm specializing in the design and development of content management and business intelligence applications.
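The confidence-based routing described above for WhizBang's tools can be illustrated with a short sketch. This is not WhizBang's API, which the article does not show; the class names (ConfidenceRouter, Extraction, Routed), the 0.75 threshold, and the sample field values are all hypothetical, chosen only to show how a threshold might separate facts that can be loaded directly from those sent to a person for verification.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: none of these names come from a vendor API.
public class ConfidenceRouter {

    /** One extracted fact plus the extractor's confidence in it (0.0 to 1.0). */
    public static final class Extraction {
        public final String field;
        public final String value;
        public final double confidence;

        public Extraction(String field, String value, double confidence) {
            this.field = field;
            this.value = value;
            this.confidence = confidence;
        }
    }

    /** Routing result: facts accepted as-is and facts queued for human review. */
    public static final class Routed {
        public final List<Extraction> accepted = new ArrayList<>();
        public final List<Extraction> needsReview = new ArrayList<>();
    }

    /** Facts at or above the threshold are accepted; the rest go to review. */
    public static Routed route(List<Extraction> extractions, double threshold) {
        Routed routed = new Routed();
        for (Extraction e : extractions) {
            if (e.confidence >= threshold) {
                routed.accepted.add(e);
            } else {
                routed.needsReview.add(e);
            }
        }
        return routed;
    }

    public static void main(String[] args) {
        // Made-up sample values, not output from any real extraction tool.
        List<Extraction> facts = Arrays.asList(
            new Extraction("product", "DVD player", 0.93),
            new Extraction("order_number", "A-10442", 0.41));

        Routed routed = route(facts, 0.75);
        System.out.println("accepted: " + routed.accepted.size());
        System.out.println("needs review: " + routed.needsReview.size());
    }
}
```

In practice the threshold would be tuned against a sample of manually verified messages: set it too low and errors reach the database, too high and reviewers are swamped.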
Read All About It: The Right Tool for the Job

Information extraction tools are beginning to make their mark in e-business analytics, but like other contemporary text analysis tools, they have their limitations. First, it is difficult to develop rule sets for identifying interesting patterns. Some progress has been made in applying machine-learning techniques to discover these rules (for example, in WhizBang's products), but you should assume results will still require some manual review. A second shortcoming is that pattern-matching techniques are not foolproof. Without the correct context, order numbers can be confused with account numbers, and noun phrases may not be parsed correctly. Finally, specialized thesauri and knowledge bases may be required for accurate information extraction in domains with specialized terminology.

However, if you need to extract a relatively fixed set of information from free-form text, an information extraction tool might meet your needs. When evaluating these tools, consider these factors:
RESOURCES

Solutions-United: www.solutions-united.com
Temis: www.temis-group.com
WhizBang Labs: www.whizbang.com









