http://www.intelligententerprise.com/010723/411feat2_1.jhtml

8 Steps to Better Customer Identification

The first step in retaining profitable customers is to know who they are

By David Cameron

A simple truth exists in sales and marketing: To make sales, you need customers, and the customers must tell you who they are. When you have a million customers, this concept is not trivial. When you want to sell to a million customers using a one-to-one, personalized approach, simplicity becomes incredibly complex. But that is what you're doing when you build marketing infrastructures that combine networks, large-scale marketing software, and giant customer databases.

EXECUTIVE SUMMARY


Customer identification has always been a foundational requirement for presenting the right offer to the right customer at the right time. Internet marketing forces the transition of this capability from a back-office batch routine to real time in order to optimize the offer process and apply closed-loop marketing techniques. This process requires careful thinking and best practices. Here are eight guidelines for your consideration.


Retailers thrive on new leads to grow customer bases, but keeping customer data clean and fresh can drive operational costs up to $4.6 million per year. Retailers can't avoid this expense because consumer data error rates typically range from 1 percent to 5 percent.
Source: The Forrester Report, March 2001

At least 50 percent of financial service providers will have poor customer data quality through 2004 (0.8 probability).
Source: Gartner Group, September 2000

Marketing technology all comes down to an ability to send a customized offer via email, direct mail, telephone, or other medium to a specific person (Mr. Jones, for example) and be certain that that person is indeed 42-year-old John P. Jones of 123 Anystreet, Citytown, Somewhere, USA (phone: 999-000-0000, email: jones@aol.com) who is a hiking and biking enthusiast.

Frankly, marketers could not do that before in real time, which is partially why you receive so much junk mail from the post office and spam in your email inbox. Customer identification is the key to one-to-one and permission marketing. Customer identification, the assignment and maintenance of a constant unique identifier for each customer, lets marketers create a dialog with a customer over time. Without this capability, marketing remains hit-or-miss as it is today - a guessing game of where and who customers are, based on demographics and targeting of groups with similar characteristics. Essentially, mainstream marketing still circles a large group of names and says, "Our customers are somewhere in here."

The main agent of change in customer identification is Internet marketing, where customer interaction occurs in milliseconds, not in weeks or months. However, the collapse of time places complex requirements on the systems that identify your customers. Traditional batch processing of customer identities is too slow and inefficient. In addition, the Internet creates additional fields related to a customer, such as an email address and URL where the customer appeared. Relating this data to a traditional name and address field requires careful consideration.

Further, with the growth of international marketing and complex B2B data models, the process for identifying your customers takes on new and subtle textures that come down to hard realities, such as how to record a European postal code in a U.S. ZIP Code database.

Many ways exist to establish customer identification - some better than others. You should follow best practices to build a customer identification architecture that is scalable and extensible as marketing moves from batch to interactive identification, B2C to B2B, and across national boundaries. Here are eight guidelines that you should follow when building customer identification systems.

FOCUS ON PROCESS

Good matching is more a function of good process than good name matching rules or algorithms.

Good process is knowing when to evaluate changes in your matching fields, which matching algorithms to apply in which order, and when to resolve conflicting matches. This process marks the difference between effective and ineffective customer identification. For example, efficient matching includes a change-detection step prior to rematching unchanged records. The change-detection step ensures that records matched in previous runs will stay matched even when addresses have not been updated on all information source systems. Including this step prevents the headache of old addresses from out-of-date sources overwriting new ones and irritating your customers who have already changed their addresses with your company only to find that the change didn't take.
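
One reading of the change-detection step can be sketched in Python as follows. This is only an illustration under stated assumptions, not the author's implementation: the field list, the fingerprint helper, and the in-memory store of previous fingerprints are all hypothetical.

```python
import hashlib

# Hypothetical list of the fields that take part in matching.
MATCH_FIELDS = ("name", "street", "city", "postal_code")

def fingerprint(record):
    """Hash only the matching fields, normalized for case and spacing."""
    parts = [" ".join(str(record.get(f, "")).lower().split()) for f in MATCH_FIELDS]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()

def records_to_rematch(incoming, previous):
    """Return the records whose matching fields changed since the last run.

    `previous` maps (source, source_key) to the fingerprint stored on the
    prior run (an assumed storage scheme); it is updated in place.
    """
    changed = []
    for rec in incoming:
        key = (rec["source"], rec["source_key"])
        fp = fingerprint(rec)
        if previous.get(key) != fp:
            changed.append(rec)
            previous[key] = fp
    return changed
```

Records that come back unchanged from a source keep the match they already have, so a stale address on one system never gets the chance to displace a newer one.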

Furthermore, performing lower-level matches, such as matching individual names, prior to performing higher level matches, such as matching names to a household or business site, is critical. The correct sequence of events plays a pivotal role in providing for referential integrity - the relationships between individual, household, business site, and so forth.

Additionally, resolving indirect matches is purely a function of implementing the right process in the right order. For example, Customer A matches Customer B. Customer B matches Customer C. Therefore Customer A matches Customer C. Indirect matching is one of the most effective matching techniques and adds tremendous value and accuracy to the matching process.
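
The A-B-C example amounts to grouping records by chains of direct matches. A minimal Python sketch, assuming direct matches arrive as pairs of record identifiers, uses a union-find structure; that choice is an assumption made here for illustration, not something the article prescribes.

```python
def resolve_indirect_matches(record_ids, direct_matches):
    """Group records so that any chain of direct matches lands in one group."""
    parent = {rid: rid for rid in record_ids}

    def find(x):
        # Walk up to the group representative, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in direct_matches:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for rid in record_ids:
        groups.setdefault(find(rid), []).append(rid)
    return sorted(sorted(g) for g in groups.values())
```

Given records A, B, C, and D with direct matches A-B and B-C, the function places A, B, and C in one group and leaves D on its own.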

Good process creates an optimal balance between accuracy and speed, which is critical in realtime relationship identification.

SEPARATE MATCHING RULES

Customer identification is a journey, not a goal. New data sources and changes to existing data sources require vigilance and adjustments to maintain the accuracy of a matching system. Reaching an 80 percent accuracy level in name matching doesn't take much work, but reaching the upper 90th percentiles of name accuracy takes constant database tuning. If you build rules into a system so that you can't change them, you'll slow the system down or fail to reach high levels of matching accuracy. Therefore, marketers should always separate the matching rules from the matching system, which also enables rule reuse by simply copying matching parameters from one information source to another.

Database tuning requires easy access to matching rules that handle complex patterns and routines. Simple rule-setting capability is not enough. Rules should be accessible to marketers without programming skills. In an open, multiplatform marketing system that draws on databases from across the enterprise, marketers need to replicate exact customer identification logic at each customer touchpoint, whether it is telemarketing, direct mail, or email. This precision is the only way to ensure consistency of customer identification and treatment. You must always accurately identify Mary Clark on Double Diamond Avenue regardless of whether she touches the organization at a call center or Web site, or arrives on a list of recent purchasers or attendees to a seminar.
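
To make the separation concrete, here is a small Python sketch in which the rules are plain data and the engine is generic. The source names, rule keys, and field names are hypothetical, and real rule sets would need to express far more complex patterns.

```python
import copy

# Hypothetical rule sets, keyed by information source. In practice these could
# live in a file or table that marketers edit without touching the engine.
RULES = {
    "call_center": {
        "fields": ["last_name", "postal_code"],
        "ignore_case": True,
    },
}

def reuse_rules(rules, from_source, to_source, **overrides):
    """Copy one source's matching parameters to another, with optional tweaks."""
    new_rules = copy.deepcopy(rules[from_source])
    new_rules.update(overrides)
    rules[to_source] = new_rules
    return new_rules

def is_match(rec_a, rec_b, rule):
    """Generic engine: it knows nothing about specific fields or sources."""
    for field in rule["fields"]:
        a, b = str(rec_a.get(field, "")), str(rec_b.get(field, ""))
        if rule.get("ignore_case"):
            a, b = a.lower(), b.lower()
        if a.strip() != b.strip():
            return False
    return True
```

Tuning then means editing the rule data, and applying the call center's logic at another touchpoint means copying its parameters rather than rewriting code.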

CREATE MULTILEVEL AND MULTIVIEW CAPABILITIES

As marketing organizations have become more complex, a single definition of "customer" has been replaced by multiple definitions, which is especially true in B2B or "mixed" B2B and B2C marketing. But multiple customer definitions and views create a complexity that challenges even the most refined matching systems.

For example, in a B2B view of a customer, the matching levels could be individual, department, site, and enterprise. In a B2C view, the matching levels are typically individual and household. This distinction can create a situation in which Mary Clark on Double Diamond Avenue is identified as both a householder and vice president at Acme Company without distinction. If I want to sell Mary Clark business services, approaching her as a householder is the wrong way to go. On the other hand, if I want to sell Mary Clark a household product, addressing her as the vice president of Acme is just as wrong.

Multilevel and multiview capabilities are essential in a complex marketing environment, but they require careful maintenance to assure referential integrity.
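
One possible way to keep the two views of Mary Clark distinct is to store, for each individual, a separate set of level identifiers per view. The Python sketch below is only an assumed data layout; the identifiers and the offer types are invented for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Individual:
    individual_id: int
    name: str
    # One entry per view; each maps a matching level to the identifier
    # assigned at that level (hypothetical identifiers).
    views: dict = field(default_factory=dict)

def context_for_offer(person, offer_type):
    """Pick the view that fits the offer: business offers use B2B, others B2C."""
    view = "B2B" if offer_type == "business" else "B2C"
    return view, person.views.get(view, {})

mary = Individual(
    1,
    "Mary Clark",
    {
        "B2C": {"household": "H-17"},
        "B2B": {"site": "S-3", "enterprise": "Acme Company"},
    },
)
```

A business-services offer then resolves Mary through her site and enterprise, while a household offer resolves her through her household, and the two never blur.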

ALLOW FOR COUNTRY-BASED BUSINESS RULES

Marketers must account for national name and address conventions when identifying customers. In New York City, the address "20 5th Avenue 203" probably means house number 20 on 5th Avenue, unit 203. In Holland, "Vanhelsdingenlaan 16" means house number 16 on Vanhelsdingenlaan. Such varied use of numerics after street names requires different business rules to ensure you match the correct fields. In Spain, multiple middle names are common. In Quebec, women routinely retain their maiden names. In Britain, first initials replace first names in most instances. Without country-by-country adjustments to the matching rules, marketers can quickly and extensively introduce errors into the system and corrupt customer identification.
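
The two address examples can be handled by selecting a parsing rule per country. The Python sketch below assumes just two deliberately simplified patterns; real national conventions have many more variants, and the country codes and pattern table are hypothetical.

```python
import re

# Hypothetical per-country patterns, simplified for illustration.
ADDRESS_PATTERNS = {
    # US: house number first, optional trailing unit ("20 5th Avenue 203").
    "US": re.compile(r"^(?P<house>\d+)\s+(?P<street>.+?)(?:\s+(?P<unit>\d+))?$"),
    # NL: street name first, house number last ("Vanhelsdingenlaan 16").
    "NL": re.compile(r"^(?P<street>.+?)\s+(?P<house>\d+)$"),
}

def parse_address(line, country):
    """Split an address line into fields using the rule for its country."""
    m = ADDRESS_PATTERNS[country].match(line.strip())
    if m is None:
        return None
    return {k: v for k, v in m.groupdict().items() if v is not None}
```

Applying the U.S. rule to the Dutch address would find no leading house number at all, which is exactly the kind of field confusion that corrupts a match.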

MINIMIZE TOTAL ERROR

In traditional merge-purge processes, the goal is to minimize undermatching. What is undermatching? Here's an example: Two identical records fail to match only because one has an apartment number and one doesn't. One common way to fix this error is to remove the requirement that apartment numbers match, which prevents duplicate mail from being sent to the same individual. What happens, unfortunately, is that matching rules are biased to minimize undermatching at the expense of overmatching, which results in erroneous matches of completely different records.

But pushing too far to fix one problem can inadvertently cause another. "Fixing" an undermatch causes too much overmatch in other areas, such as when an apartment number is the only thing that keeps legitimately different records separate from one another. Most often marketers never notice that they have actually increased error in an effort to reduce it. By adjusting rules so that overmatching and undermatching are treated equally, marketers can reduce total error.
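
Treating both kinds of error equally implies counting both. The Python sketch below scores three candidate apartment-number rules against a tiny hand-labeled sample. The records and the rules are invented purely for illustration and are not data from the article.

```python
def error_counts(rule, labeled_pairs):
    """Count overmatches and undermatches of `rule` on hand-labeled pairs.

    Each labeled pair is (record_a, record_b, same_customer).
    """
    over = under = 0
    for a, b, same_customer in labeled_pairs:
        predicted = rule(a, b)
        if predicted and not same_customer:
            over += 1      # different customers wrongly merged
        elif not predicted and same_customer:
            under += 1     # same customer wrongly kept apart
    return over, under

def strict_rule(a, b):
    # Apartment numbers must agree, so a missing one blocks the match.
    return a["street"] == b["street"] and a.get("apt") == b.get("apt")

def loose_rule(a, b):
    # Apartment numbers are ignored entirely.
    return a["street"] == b["street"]

def balanced_rule(a, b):
    # A missing apartment number is tolerated; two different ones are not.
    if a["street"] != b["street"]:
        return False
    return not (a.get("apt") and b.get("apt") and a["apt"] != b["apt"])

# Illustrative labeled sample (made-up records).
SAMPLE = [
    ({"street": "12 Elm St", "apt": "4"}, {"street": "12 Elm St"}, True),
    ({"street": "12 Elm St", "apt": "4"}, {"street": "12 Elm St", "apt": "7"}, False),
    ({"street": "12 Elm St", "apt": "4"}, {"street": "12 Elm St", "apt": "4"}, True),
]
```

On this sample the strict rule undermatches once, the loose rule overmatches once, and the balanced rule makes neither error. Only a count that includes both kinds of mistake shows that the loose rule merely swapped one error for another.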

INCLUDE DATA QUALITY IN THE PROCESS

Most errors in matching are not caused by inadequate business rules. In fact, confusion in identifying address elements causes most matching errors. If you identify a house number incorrectly, you can't use it in a match. The same is especially true when cleaning up business names. An important step is removing common words both universally (for example, "the," "a," "an," and so on) and geographically (for instance, "Bay State" in Massachusetts names, or the name of the street from the business name, as in 31st Street Antiques). These matching errors are data quality issues.

You also need to flag records based on common words. For example, key words like Corp., Inc., GmbH, SA, and so forth, connote corporations, whose names you can't use elsewhere. The absence of this suffix (for example, Town Pizza, which occurs in virtually every town in America) is also significant. Other flags to set and account for include common-word flags (where a name consists of all common words, such as "Boston Consulting Group") and franchise flags (McDonald's, Burger King, Pizza Hut, and so on). These flags play a crucial role in the match.

Also, creating a separate copy of the address purely for matching purposes is extremely important. For instance, you must not change Beacon Hill to Boston for a mailing because it often angers customers, but for matching reasons, you need to substitute the true city for this vanity reference.

Incorporating data quality into the matching process answers all these issues and dramatically improves accuracy.
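
A few of these data quality steps can be sketched in Python. The word lists below are tiny, hypothetical stand-ins for the much larger tables a real system would maintain, and the function names are assumptions made for this example.

```python
import re

# Hypothetical word lists, kept very short for illustration.
UNIVERSAL_NOISE = {"the", "a", "an"}
CORPORATE_SUFFIXES = {"corp", "inc", "gmbh", "sa"}
VANITY_CITIES = {"beacon hill": "boston"}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def business_match_key(name):
    """Return (match_key, is_corporation) for a business name."""
    tokens = tokenize(name)
    # Flag the record when the name ends in a corporate suffix.
    is_corporation = bool(tokens) and tokens[-1] in CORPORATE_SUFFIXES
    if is_corporation:
        tokens = tokens[:-1]
    kept = [t for t in tokens if t not in UNIVERSAL_NOISE]
    return " ".join(kept), is_corporation

def matching_copy(address):
    """Build a separate copy for matching; the mailing address is untouched."""
    copy_for_match = dict(address)
    city = address["city"].strip().lower()
    copy_for_match["city"] = VANITY_CITIES.get(city, city)
    return copy_for_match
```

The mailing record keeps "Beacon Hill" exactly as the customer wrote it, while the matching copy carries the true city.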

IMPLEMENT BOTH CALLABLE AND BATCH-RULE EXECUTION

This practice is critical to realtime relationship identification and successful one-to-one and permission marketing on the Internet. Basically, marketers must have the capability of running matching rules on demand or periodically.

Most batch processes run periodically to clean up duplicates in customer databases or incorporate new activity into the customer view. Although this step is necessary, equally important are identifying individuals in real time while you interact with them and linking them to the total customer view in order to establish a dialog. Customer identification must be done consistently across all processes and customer touchpoints, regardless of platform. Technically, this process requires that your matching rules are available in library form so that you can match one record at a time into a database - also known as callable matching. You should use the same business rules in many-to-many batch matching processes.

Most systems have inconsistent business rules for callable and batch matching and inconsistent rules across different platforms. Such discrepancy means it is impossible to consistently match data in both real time and during periodic updates, which prevents the establishment of good customer dialogs.
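
The simplest way to guarantee consistency is to have the batch process call the same library function the realtime path uses. The Python sketch below assumes a deliberately trivial shared rule (a normalized email comparison) and an in-memory dictionary standing in for the customer database; both are hypothetical.

```python
def same_customer(a, b):
    """The single shared rule (deliberately simple and hypothetical)."""
    return a["email"].strip().lower() == b["email"].strip().lower()

def match_one(record, database):
    """Callable matching: identify one record against the database on demand."""
    for customer_id, existing in database.items():
        if same_customer(record, existing):
            return customer_id
    return None

def match_batch(records, database):
    """Batch matching: reuse match_one and add unmatched records as new customers."""
    assigned = []
    for rec in records:
        customer_id = match_one(rec, database)
        if customer_id is None:
            customer_id = max(database, default=0) + 1
            database[customer_id] = rec
        assigned.append(customer_id)
    return assigned
```

Because both entry points go through `same_customer`, a visitor identified on the Web site in real time resolves to the same customer the nightly batch run would assign.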

IMPLEMENT A TIERED RULE SYSTEM

The most effective matching systems are multitiered and involve multiple points of control.

The content of data fields is the first point of control, such as data quality. The second point of control is which fields you use - for example, name and address; name and email; name, address, and email. It should be possible to choose any set of fields for matching. The third point of control is the thresholds for a match when comparing any one field. For instance, comparing two email addresses requires an exact match, but when comparing two last names, one or two errors is acceptable. The fourth point of control is the combination of matches required to associate two records. For example, if the first and last name are exact, but the city is totally different while the street address is close, that might be considered a match. However, if errors occur in the first and last name, the city might increase in importance to overcome those errors.

These complex points of control ensure that you can implement sophisticated business rules to accommodate special situations.
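
The points of control can be sketched as layers in Python. Everything specific here is an assumption chosen for illustration: the use of edit distance to count errors, the threshold values, and the particular combination rule, which loosely follows the name, street, and city example above.

```python
def edit_distance(s, t):
    """Levenshtein distance, computed row by row."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

# Third point of control: hypothetical per-field error thresholds.
THRESHOLDS = {"email": 0, "first_name": 1, "last_name": 2, "street": 3, "city": 0}

def field_results(a, b, fields):
    """Second and third points: chosen fields, each judged by its own threshold."""
    results = {}
    for f in fields:
        d = edit_distance(a[f].lower(), b[f].lower())
        results[f] = "exact" if d == 0 else "close" if d <= THRESHOLDS[f] else "different"
    return results

def records_match(a, b):
    """Fourth point: the combination of field outcomes that counts as a match."""
    r = field_results(a, b, ["first_name", "last_name", "street", "city"])
    if r["first_name"] == "exact" and r["last_name"] == "exact":
        # Exact names: a close street is enough, even if the city differs.
        return r["street"] in ("exact", "close")
    if "different" in (r["first_name"], r["last_name"]):
        return False
    # Errors in the names: the city must agree to make up for them.
    return r["city"] == "exact" and r["street"] in ("exact", "close")
```

The first point of control, the content of the fields themselves, is assumed to have been handled by data quality steps before these functions are called.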

In summary, the realtime requirements of Internet marketing and the rise of global marketing have forced significant changes on traditional customer identification systems. Achieving scalability and extensibility takes forethought and a well-designed architecture, and marketers need to retrain themselves accordingly. Reliance on traditional methods will result in an inability to cross from the present generation of marketing science to the next.

DAVID CAMERON [david.cameron@wheelhouse.com] is vice president of data integration and analytics for Wheelhouse Corp., a Burlington, Mass.-based marketing infrastructure services provider. Cameron is a frequent consultant, speaker, and publisher of topics in this field.


