CMP -- United Business Media

Intelligent Enterprise

Better Insight for Business Decisions






December 5, 2001

A Closer Look

Giving customer emails the same analytic scrutiny you give transactions could be a valuable competitive advantage

By Dan Sullivan

Imagine for a moment that your company is a Web retailer that plans to implement a free-shipping policy to boost sales, but wants to measure the effectiveness of the promotion. No problem: You would use a data warehouse fed from the online ordering system to determine how sales have changed since the policy's implementation. With a decent online analytic processing (OLAP) tool, your analysts could slice and dice the data to determine whether the policy affects certain types of products and hone it as needed, dropping items with high shipping costs while extending the promotion for more profitable products.

This scenario is common in the world of e-business analytics. The problem is that the numbers tell only part of the story: Sales are up, but what about customer satisfaction? Has quality control suffered with increased volume? Without intervention, these factors will eventually show up in returns and customer churn. But if you could detect such patterns earlier (that is, if you gave customer communications the same analytic scrutiny you give structured data), you would have access to business-critical information about how customers perceive your organization before the relationship goes sour.

Fortunately, an emerging breed of natural language processing and information extraction tools can map targeted pieces of data from free-form text into standardized XML or relational formats for analysis. In this article, I will discuss the types of information that can be culled from customer emails, outline the basic steps in the information extraction process, and provide some fundamental guidelines for evaluating information extraction tools.

EXECUTIVE SUMMARY

Customer emails can reflect measurable trends if you use the right tools to extract and structure appropriate information. Combining new sources of information with traditional BI measures can provide a richer picture of operations than conventional measures alone.

NEXT GENERATION EMAIL PROCESSING

To make customer interaction as easy as possible, many companies offer general email accounts for any type of customer question or comment. Automated classification tools then route messages to particular groups within the organization. These tools analyze word patterns in the text to determine if the email is a support question, complaint, request for information, or some other predefined type.
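As a rough illustration of classification by word patterns, the sketch below routes a message to the category whose keywords it mentions most often. The keyword lists and the `classify_email` function are hypothetical; commercial routing tools typically rely on statistical models trained on labeled mail rather than hand-picked words.

```python
import re
from collections import Counter

# Hypothetical keyword lists, chosen only for illustration.
CATEGORY_KEYWORDS = {
    "support": {"install", "error", "crash", "help", "password"},
    "complaint": {"late", "broken", "refund", "disappointed", "damaged"},
    "information": {"price", "catalog", "availability", "hours"},
}


def classify_email(text, default="other"):
    """Return the category with the most keyword hits, or `default` if none match."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for category, keywords in CATEGORY_KEYWORDS.items():
        counts[category] = sum(1 for word in words if word in keywords)
    category, hits = counts.most_common(1)[0]
    return category if hits > 0 else default


if __name__ == "__main__":
    print(classify_email("My order arrived late and broken, I want a refund"))
```

A router like this says which group should read the message, but nothing about which product or order the message concerns.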

While useful, and sometimes essential, these classification systems do not provide access to the substantive content of an email. For example, which product is the customer complaining about? Is it a complaint about price, quality, shipping, or something else? When did the customer purchase the product? To pull this type of detailed information from free-form text, you need to use information extraction tools.

The basic premise of information extraction is that patterns of phrases occur repeatedly in similar types of messages, articles, advertisements, and documents. A customer complaint will typically contain a product name, a purchase date, a customer account number, a description of the problem and, possibly, the cost of the item. These elements can be identified and transformed into a structured format suitable for aggregate analysis by following a logical four-step process.

The first logical step is to apply morphological and lexical analyses in which the "part of speech" of each word is identified. For example, the text "shipment of sports equipment" could be tagged as:


<NOUN_SINGULAR>SHIPMENT</NOUN_SINGULAR>
<PREPOSITION>OF</PREPOSITION>
<NOUN_PLURAL>SPORTS</NOUN_PLURAL>
<NOUN_SINGULAR>EQUIPMENT</NOUN_SINGULAR>
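The tagging above can be approximated with a dictionary lookup plus a fallback guess for unknown words. This is a minimal sketch: the four-word `LEXICON`, the suffix heuristic in `guess_tag`, and the function names are assumptions made for illustration, not the method of any particular tool.

```python
# Tiny illustrative lexicon; real taggers use large dictionaries and context models.
LEXICON = {
    "shipment": "NOUN_SINGULAR",
    "of": "PREPOSITION",
    "sports": "NOUN_PLURAL",
    "equipment": "NOUN_SINGULAR",
}


def guess_tag(word):
    """Crude fallback for words missing from the lexicon (assumes a noun)."""
    return "NOUN_PLURAL" if word.endswith("s") else "NOUN_SINGULAR"


def tag_words(text):
    """Return a list of (WORD, TAG) pairs for whitespace-separated words."""
    tagged = []
    for word in text.lower().split():
        tag = LEXICON.get(word) or guess_tag(word)
        tagged.append((word.upper(), tag))
    return tagged


def to_markup(tagged):
    """Render (WORD, TAG) pairs in the tag style used in the example above."""
    return "\n".join(f"<{tag}>{word}</{tag}>" for word, tag in tagged)


if __name__ == "__main__":
    print(to_markup(tag_words("shipment of sports equipment")))
```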

Of course, grammatical texts are analyzed more reliably than ungrammatical ones, in part because part-of-speech programs have to "guess" the syntactic category of unknown terms and frequently use syntactic clues from surrounding words. With basic lexical information available, information extraction tools can move to more complex phrase-level analysis that depends on basic syntactic information, such as which words are nouns, verbs, or determiners. The previous example could be grouped into a single noun phrase:


<NOUN_PHRASE>
<NOUN_SINGULAR>SHIPMENT</NOUN_SINGULAR>
<PREPOSITION>OF</PREPOSITION>
<NOUN_PLURAL>SPORTS</NOUN_PLURAL>
<NOUN_SINGULAR>EQUIPMENT</NOUN_SINGULAR>
</NOUN_PHRASE>
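Grouping tagged words into a phrase can be sketched as a single pass over the tag sequence. The rule used here (nouns, optionally linked by a preposition that is followed by another noun) is a deliberately simple assumption; real phrase-level analyzers use much richer grammars.

```python
NOUN_TAGS = {"NOUN_SINGULAR", "NOUN_PLURAL"}


def chunk_noun_phrases(tagged):
    """Group runs of nouns, optionally joined by a preposition, into noun phrases.

    `tagged` is a list of (WORD, TAG) pairs. The result is a list of
    (LABEL, words) pairs, where words that are not part of a noun phrase
    keep their own tag as the label.
    """
    chunks = []
    i = 0
    while i < len(tagged):
        if tagged[i][1] in NOUN_TAGS:
            j = i + 1
            while j < len(tagged):
                tag = tagged[j][1]
                if tag in NOUN_TAGS:
                    j += 1
                elif (tag == "PREPOSITION" and j + 1 < len(tagged)
                      and tagged[j + 1][1] in NOUN_TAGS):
                    j += 2  # absorb the preposition and the noun after it
                else:
                    break
            chunks.append(("NOUN_PHRASE", tagged[i:j]))
            i = j
        else:
            chunks.append((tagged[i][1], [tagged[i]]))
            i += 1
    return chunks


if __name__ == "__main__":
    example = [
        ("SHIPMENT", "NOUN_SINGULAR"),
        ("OF", "PREPOSITION"),
        ("SPORTS", "NOUN_PLURAL"),
        ("EQUIPMENT", "NOUN_SINGULAR"),
    ]
    print(chunk_noun_phrases(example))
```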

This second step adds higher-level syntactic information by tagging dates, monetary amounts, noun phrases, and other basic elements used during later semantic processing. For example, after the second phase, a message such as:

On Aug 10, 2001, I purchased a copy of Corporate Portals and E-Business Integration (order # 980127302) for $34.95 and still have not received it.

becomes something like (individual word tags are eliminated for ease of reading):

On <DATE> Aug 10, 2001 </DATE> I purchased a copy of <NOUN_PHRASE> Corporate Portals and E-Business Integration </NOUN_PHRASE> (order # <NUMERIC> 980127302 </NUMERIC>) for <MONEY CURRENCY="USD"> 34.95 </MONEY> and still have not received it.
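Dates, monetary amounts, and bare numbers are regular enough that simple patterns recover many of them. The sketch below uses regular expressions for U.S.-style dates and dollar amounts only; the patterns and the `tag_entities` function are illustrative assumptions, and noun-phrase tagging is left out.

```python
import re

# Each alternative is a named group so the matching branch can be identified.
DATE_RE = (r"(?P<date>\b(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"
           r"[a-z]*\.? \d{1,2}, \d{4}\b)")
MONEY_RE = r"(?P<money>\$\d+(?:\.\d{2})?)"
NUMERIC_RE = r"(?P<numeric>\b\d+\b)"
ENTITY_RE = re.compile("|".join([DATE_RE, MONEY_RE, NUMERIC_RE]))


def tag_entities(text):
    """Wrap dates, dollar amounts, and remaining numbers in markup tags."""
    def replace(match):
        kind = match.lastgroup
        value = match.group()
        if kind == "date":
            return f"<DATE> {value} </DATE>"
        if kind == "money":
            # Assumes a leading "$" means U.S. dollars.
            return f'<MONEY CURRENCY="USD"> {value[1:]} </MONEY>'
        return f"<NUMERIC> {value} </NUMERIC>"
    return ENTITY_RE.sub(replace, text)


if __name__ == "__main__":
    message = ("On Aug 10, 2001, I purchased a copy (order # 980127302) "
               "for $34.95 and still have not received it.")
    print(tag_entities(message))
```

Listing the date pattern first matters: otherwise the day and year would be tagged as separate numbers.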

In the third step, features and functional structures are identified. Features generally include the names of persons, places, and organizations; they are identified using "gazetteers" that list names of major corporations and geographic locations. Pattern-matching rules then help identify components of commonly used phrases. (For example, "Mary Johnson" is clearly a person's name by virtue of its position in the phrase "Mary Johnson, Director of Marketing.") Functional structures are application-specific but can include purchase dates, sales amounts, and descriptions of products.
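A gazetteer lookup combined with one positional pattern might look like the sketch below. The gazetteer entries, the list of job titles, and the `find_features` function are all hypothetical; production gazetteers hold many thousands of entries and the pattern rules are far more numerous.

```python
import re

# Hypothetical gazetteers, kept tiny for illustration.
ORGANIZATIONS = {"Acme Corp", "Globex"}
LOCATIONS = {"Boston", "Chicago"}

# A capitalized multi-word name followed by ", <job title>" is taken to be a person.
TITLE_WORDS = r"(?:Director|Manager|Vice President|President)"
PERSON_BEFORE_TITLE = re.compile(
    r"\b([A-Z][a-z]+(?: [A-Z][a-z]+)+), " + TITLE_WORDS + r"\b"
)


def find_features(text):
    """Return (FEATURE_TYPE, value) pairs found by gazetteer lookup and pattern rules."""
    features = []
    for name in sorted(ORGANIZATIONS):
        if name in text:
            features.append(("ORGANIZATION", name))
    for name in sorted(LOCATIONS):
        if name in text:
            features.append(("LOCATION", name))
    for match in PERSON_BEFORE_TITLE.finditer(text):
        features.append(("PERSON", match.group(1)))
    return features


if __name__ == "__main__":
    print(find_features("Mary Johnson, Director of Marketing at Globex, visited Boston."))
```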

Effective information extraction requires fine-grained rules so that syntactically similar but semantically different elements are distinguishable. For example, the numeric element in the previous example could just as easily be a model number if all we considered was the lexical element itself. The context, however, conveys the number's meaning to the reader.
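One simple way to let context decide what a number means is to inspect the words immediately to its left. In this sketch the rule set, the 20-character window, and the `label_numbers` function are assumptions chosen for illustration.

```python
import re

# Each rule looks at the text just before a number; the first rule that matches wins.
CONTEXT_RULES = [
    ("ORDER_NUMBER", re.compile(r"\border\s*(?:#|no\.?|number)?\s*$", re.IGNORECASE)),
    ("MODEL_NUMBER", re.compile(r"\bmodel\s*(?:#|no\.?|number)?\s*$", re.IGNORECASE)),
]


def label_numbers(text, window=20):
    """Label each integer in `text` using up to `window` characters of left context."""
    labeled = []
    for match in re.finditer(r"\b\d+\b", text):
        left = text[max(0, match.start() - window):match.start()]
        label = "NUMERIC"  # default when no context rule applies
        for name, rule in CONTEXT_RULES:
            if rule.search(left):
                label = name
                break
        labeled.append((label, match.group()))
    return labeled


if __name__ == "__main__":
    print(label_numbers("Order # 980127302 for model 4500, quantity 2"))
```

The same digits receive different labels depending only on the preceding words, which is the distinction a purely lexical tagger cannot make.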

In the last logical step, discourse-level attributes are identified, including a measure of the emotional tone and urgency of the message. Not all information extraction tools extend to this level, nor do they need to, but the value of doing so is obvious. Solutions-United Inc.'s MetaMarker, for example, measures emotional intention as strongly negative, negative, neutral, or positive and urgency as very urgent, urgent, or neutral.
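To make the idea of tone and urgency scales concrete, the sketch below counts words from small hand-built lists and maps the counts onto the same category names. It is not how MetaMarker works (the article does not describe its method); the word lists, thresholds, and `score_message` function are assumptions for illustration only.

```python
import re

# Hypothetical word lists; a real system would use far richer linguistic cues.
NEGATIVE_WORDS = {"not", "never", "late", "broken", "disappointed", "unacceptable", "refund"}
POSITIVE_WORDS = {"thanks", "great", "love", "excellent", "pleased"}
URGENT_WORDS = {"immediately", "urgent", "asap", "now", "today"}


def score_message(text):
    """Return (tone, urgency) labels based on simple word counts."""
    words = re.findall(r"[a-z]+", text.lower())
    negative = sum(word in NEGATIVE_WORDS for word in words)
    positive = sum(word in POSITIVE_WORDS for word in words)
    urgent = sum(word in URGENT_WORDS for word in words)

    balance = positive - negative
    if balance <= -3:
        tone = "strongly negative"
    elif balance < 0:
        tone = "negative"
    elif balance == 0:
        tone = "neutral"
    else:
        tone = "positive"

    if urgent >= 2:
        urgency = "very urgent"
    elif urgent == 1:
        urgency = "urgent"
    else:
        urgency = "neutral"
    return tone, urgency


if __name__ == "__main__":
    print(score_message("I still have not received my order. Please ship it today."))
```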






