CMP -- United Business Media

Intelligent Enterprise

Better Insight for Business Decisions




June 22, 1999, Volume 2 - Number 9


Mining the Wallet


Use data mining to determine the best next offer

By Michael J. A. Berry




Cross selling (selling additional services to the customers you already have) and its cousin, up selling (getting existing customers to trade up to more profitable products), are traditional direct marketing techniques. In the past, however, these techniques were implemented more crudely than is now possible. Today you can use data mining to maximize the potential of these strategies by determining each customer’s best next offer: the offer that is most likely to elicit a positive response from that individual.

Cross-selling opportunities are so important that they are often the primary justification to shareholders for merging with another company. A general-purpose retail bank will buy a monoline credit card company because, in part, it hopes to mine the rich vein of behavioral information represented by credit card purchase histories in order to sell mortgages, home-improvement loans, and brokerage services to the credit card holders.

Increasingly, companies are devising such opportunistic strategies based more on increasing the profitability of existing customers than on simply increasing the number of customers. In some industries, such as retail banking, most customers are likely to be unprofitable. In these industries, it makes sense to put your effort into improving the performance of the customers you have before going after more of them.

Another reason to cross sell is to increase customer loyalty by widening the breadth of the firm’s relationships with its customers. If you have a checking account at one bank, a mortgage from another, and a car loan from a third, you have been trained to shop around for each financial service. If, however, you have all of those plus an IRA at a single institution, you are likely to think of that institution as “your bank,” and you will tend to go there for your next financial need.


Aim Precisely

Successful cross selling requires more than having a lot of products that someone might like. The old Sears or Montgomery Ward catalog approach is no longer effective. The successful cross seller must figure out what products it should offer to whom. This principle is true even in an environment where the outbound communication is essentially free, as with email or banner ads on the company’s e-commerce Web site. When there are too many messages, none is effective. When there are too many emails, the customer will regard them as spam and withdraw permission to be contacted.

Cross selling and up selling are natural applications for data mining because you generally know much more about current customers than you could possibly find out about external prospects. Furthermore, the information you gather about customers in the course of normal business operations is much more reliable than the data you can purchase on external prospects.

In general, our approach at the Decision-Support Systems Laboratory (the Lab) to building a best-next-offer model is first to build an individual propensity-to-buy model for each product. We then combine these models to create a set of best next offers for each customer. We give each customer a set of scores (usually numbers between zero and one) representing the likelihood that the customer will want to purchase each product. The top-scoring products for each customer are the individual’s best next offers.

In order to be useful for the best-next-offer model, the scores from the various product propensity models must be comparable. But what does it mean to be comparable? The Lab looks for models that meet the following list of requirements:

1. All scores must fall into the same range: zero to one.

2. Anyone who already has a product should score zero for it.

3. The scores should reflect the products’ relative popularity.

The third requirement poses the biggest problem. Many algorithms designed to accommodate the first requirement would grant the people most likely to want a given product a score of one and the people least likely to want it a score of zero, regardless of the number of people who might possibly want the product. To see the problem with that method, imagine two products: one that anyone can use, and one that only left-handed people can use. The vast majority of people are right-handed and so have no interest in the left-handed product. Any system that gives right-handed people a higher score for the left-handed product than for the hand-neutral product is misleading. It is fine for the occasional left-handed person to score very high for the left-handed product, but the average score for that product will necessarily be much lower than the average score for the hand-neutral product, to reflect the fact that most customers are not interested in it at all.
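The left-handed example can be sketched in a few lines of Python. The customer names, the score values, and the helper functions below are all hypothetical, and min-max rescaling is only one example of an algorithm that forces every product's scores to span zero to one. The sketch shows how such rescaling can push a right-handed customer toward the left-handed product, while scores that keep their relative popularity do not.

```python
# Hypothetical raw propensities (estimated purchase probabilities) per product.
raw_scores = {
    "hand_neutral": {"alice": 0.30, "bob": 0.10, "carol": 0.20},
    # Only carol is left-handed, so only she scores high here.
    "left_handed": {"alice": 0.01, "bob": 0.02, "carol": 0.40},
}

def min_max_rescale(scores):
    """Stretch one product's scores to span 0..1, discarding its popularity."""
    lo, hi = min(scores.values()), max(scores.values())
    return {name: (s - lo) / (hi - lo) for name, s in scores.items()}

def best_product(scores_by_product, customer):
    """Return the product with the highest score for one customer."""
    return max(scores_by_product, key=lambda p: scores_by_product[p][customer])

def mean_score(scores):
    """Average score for one product across all customers."""
    return sum(scores.values()) / len(scores)

rescaled = {p: min_max_rescale(s) for p, s in raw_scores.items()}

for name in ("alice", "bob", "carol"):
    print(name, best_product(raw_scores, name), best_product(rescaled, name))
```

In this toy data, bob is right-handed and has the lowest hand-neutral score, so rescaling gives him zero for the hand-neutral product and a small positive score for the left-handed one. The raw scores, whose averages still reflect how popular each product is, rank the hand-neutral product first for him.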


Propensity Models

The Lab’s approach to forming offers starts by building models that characterize the attitudes toward each product of those who have already purchased it. We then give prospects, meaning any existing customers who do not already have the product, a score based on the extent to which they resemble those who have purchased it. The precise definition of what it means to resemble a past purchaser depends on the data mining technique employed.

The clustering-based approach assigns the prospect to a preexisting cluster and uses the relative popularity of the various products within that cluster to assign scores. The memory-based reasoning approach is similar except that, instead of using preexisting clusters, it surveys all the customers within a certain distance of the record to be scored and translates their “votes” into scores. Our preferred approach is to assign each prospect to a leaf node of a decision tree built for the product and use the percentage of existing customers at that leaf who have the product as the score. I describe in detail the data mining algorithms for clustering, memory-based reasoning, and decision trees in Data Mining Techniques for Marketing, Sales, and Customer Support (John Wiley & Sons, 1997).
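The leaf-based scoring idea can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the tree is hand-written rather than learned, and the fields (age, tenure_years), the split points, and the customer records are invented. A real tree would come from a decision-tree algorithm run on the bank's own data.

```python
from collections import defaultdict

def leaf_for(customer):
    """Hypothetical hand-written tree for one product; returns a leaf name."""
    if customer["age"] >= 50:
        return "older_long" if customer["tenure_years"] >= 5 else "older_new"
    return "younger"

def leaf_scores(customers, product):
    """Fraction of customers at each leaf who already hold the product."""
    totals = defaultdict(int)
    holders = defaultdict(int)
    for c in customers:
        leaf = leaf_for(c)
        totals[leaf] += 1
        holders[leaf] += product in c["products"]  # True counts as 1
    return {leaf: holders[leaf] / totals[leaf] for leaf in totals}

def score(customer, product, scores):
    """Score a customer; anyone who already has the product scores zero."""
    if product in customer["products"]:
        return 0.0
    return scores[leaf_for(customer)]

# Invented customer records for the sketch.
customers = [
    {"id": 1, "age": 62, "tenure_years": 12, "products": {"checking", "cd"}},
    {"id": 2, "age": 58, "tenure_years": 9, "products": {"checking"}},
    {"id": 3, "age": 66, "tenure_years": 7, "products": {"cd"}},
    {"id": 4, "age": 70, "tenure_years": 6, "products": {"checking"}},
    {"id": 5, "age": 54, "tenure_years": 2, "products": {"checking"}},
    {"id": 6, "age": 30, "tenure_years": 4, "products": {"checking"}},
    {"id": 7, "age": 41, "tenure_years": 8, "products": {"checking"}},
    {"id": 8, "age": 25, "tenure_years": 1, "products": {"checking"}},
    {"id": 9, "age": 35, "tenure_years": 3, "products": {"checking", "cd"}},
]

cd_scores = leaf_scores(customers, "cd")
for c in customers:
    print(c["id"], leaf_for(c), score(c, "cd", cd_scores))
```

Because each score is a plain proportion of holders at a leaf, it falls between zero and one and stays low for a product that few customers hold, which is what makes scores from different product models comparable.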


Decision-Tree Pitfalls

You must avoid the following pitfalls of the decision-tree approach to building cross-sell models.

Becoming a customer changes the way you look. The first problem is that, as I originally stated, we are building a model that scores prospects based on their similarity to past purchasers, but past purchasers may look different now from how they looked when they signed up. Certificates of deposit (CDs) provide a good example: Customers who own CDs probably do not have large balances lying around in their ordinary savings accounts. Does that mean that we should look for CD prospects among the customers with low savings balances? Surely not! Prior to purchasing a CD, the customer must have had the purchase price available somewhere else — quite probably in a savings account.

The solution is to build models based on the way established purchasers looked just before they purchased the product. This approach requires a fairly sophisticated data warehouse (and a fairly complex query) to get the requisite data. To stay with the banking example, for each current CD holder, you need to go back months, or years, to get a snapshot of that person’s account in the month before the CD was opened. For any one product, each customer would have a different base month. Furthermore, each product would be modeled on a completely different data set because each product has a different population of customers with different account opening dates.
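As a rough sketch of the snapshot lookup, suppose the warehouse can supply one row per customer per month. The data layout, the field name savings_balance, and the helper functions below are hypothetical; in practice this would be a query against the data warehouse rather than in-memory dictionaries.

```python
def previous_month(year_month):
    """Return the month before a 'YYYY-MM' string, e.g. '1999-01' -> '1998-12'."""
    year, month = map(int, year_month.split("-"))
    if month == 1:
        return f"{year - 1}-12"
    return f"{year}-{month - 1:02d}"

def pre_purchase_rows(open_month_by_customer, snapshots):
    """Build one modeling row per purchaser from the month before the purchase.

    snapshots maps (customer_id, 'YYYY-MM') to that month's account attributes.
    Purchasers with no snapshot for their base month are skipped.
    """
    rows = {}
    for customer_id, opened in open_month_by_customer.items():
        base_month = previous_month(opened)
        snapshot = snapshots.get((customer_id, base_month))
        if snapshot is not None:
            rows[customer_id] = {"base_month": base_month, **snapshot}
    return rows

# Invented monthly snapshots: balances drop once the CD is opened.
snapshots = {
    ("c1", "1998-12"): {"savings_balance": 12000},
    ("c1", "1999-01"): {"savings_balance": 500},
    ("c2", "1998-05"): {"savings_balance": 8000},
    ("c2", "1998-06"): {"savings_balance": 300},
}
cd_opened = {"c1": "1999-01", "c2": "1998-06"}

training_rows = pre_purchase_rows(cd_opened, snapshots)
print(training_rows)
```

Each customer gets a different base month, so the training rows for the CD model show the savings balance as it stood before the money moved into the CD.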

Past purchasers reflect past policy. A look into any company’s database will show that some products are much more popular than others. To some extent, this inequality reflects naturally occurring patterns. For example, the number of people who want checking accounts is larger than the number seeking home equity lines, because there are more people who pay bills than there are homeowners. In other cases, the data reflects a past or present marketing policy. If the company marketed a product in the past only to women, the models will predict that men aren’t interested.

A particularly pernicious form of this problem involves past discrimination or “redlining.” If a bank had a policy of refusing mortgages on buildings in certain zip codes, the model would show a low propensity for mortgages in those zip codes — which, if not caught, could perpetuate the discrimination.


Building Decision Trees

Building decision-tree models in an environment where only a small minority of the customer population bought any one product poses some challenges. The algorithms used to build decision trees measure their own success in terms of the classification error rate. The algorithm assigns each record to a leaf as it builds the tree. If more than half the training records reaching a certain leaf node have the product, it classifies that leaf as a “yes” for that product. If fewer than half the records reaching the node have the product, the node will become a “no” leaf. When the tree is used as a classifier, new records are fed through the tree and classified according to the labels on the nodes.

The problem is that when a product is rare, the best decision tree has only a single node, and it is labeled “no”; the safest prediction is that each prospect will not want the product. If only one or two percent of the population has a product, this straightforward prediction is hard to beat.
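The arithmetic behind this problem is easy to check. The sketch below uses invented counts (20 holders among 1,000 customers, and one leaf with 15 holders among 100 records) and two small helper functions that are not from any particular tool.

```python
def leaf_label(holders, total):
    """Label a leaf "yes" only if more than half its records have the product."""
    return "yes" if holders > total / 2 else "no"

def error_rate(predicted_yes, holders, total):
    """Misclassification rate when every record at a node gets the same label."""
    wrong = (total - holders) if predicted_yes else holders
    return wrong / total

# Hypothetical population: 20 of 1,000 customers (2 percent) hold the product.
root_label = leaf_label(20, 1000)
root_error = error_rate(False, 20, 1000)

# A hypothetical leaf where holders are 7.5 times as concentrated (15 percent)
# is still labeled "no", so splitting it off does not reduce the error rate.
rich_leaf_label = leaf_label(15, 100)

print(root_label, root_error, rich_leaf_label)
```

A single "no" node is wrong only 2 percent of the time, and even a leaf that concentrates holders well above the base rate gets the same label, so an algorithm judged purely on classification error has no incentive to find it.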

The solution is to oversample. Build the model on a training set in which the percentage of people with the product you are trying to predict is artificially boosted to 20 or 30 percent to give the algorithm a sporting chance of learning to recognize what you are seeking. After this oversampled data goes into building the structure of the decision tree, the model must be backfitted with data that reflects the true population densities so that the scores will be calibrated correctly. The resulting model has a much higher error rate than the model that predicts no one will buy, but it is much more useful because it produces scores that can rank prospects by their likelihood of buying, even if the highest scores are still quite low.
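One way to sketch the two steps is shown below, under several assumptions. The oversampling here keeps every holder and samples non-holders until holders reach the target rate, which is only one of several ways to boost the percentage. The leaf assignment is a stand-in (a segment field) for whatever tree structure was learned on the oversampled set, and the population counts are invented.

```python
import random

def oversample(records, target_rate, seed=0):
    """Keep every holder; sample non-holders so holders make up target_rate."""
    rng = random.Random(seed)
    holders = [r for r in records if r["has_product"]]
    others = [r for r in records if not r["has_product"]]
    n_others = round(len(holders) * (1 - target_rate) / target_rate)
    return holders + rng.sample(others, min(n_others, len(others)))

def backfit(leaf_for, records):
    """Recompute each leaf's score from data at the true population density."""
    totals, holders = {}, {}
    for r in records:
        leaf = leaf_for(r)
        totals[leaf] = totals.get(leaf, 0) + 1
        holders[leaf] = holders.get(leaf, 0) + r["has_product"]
    return {leaf: holders[leaf] / totals[leaf] for leaf in totals}

# Invented population: 1,000 customers, 20 holders (2 percent overall).
population = (
    [{"segment": "A", "has_product": True} for _ in range(15)]
    + [{"segment": "A", "has_product": False} for _ in range(85)]
    + [{"segment": "B", "has_product": True} for _ in range(5)]
    + [{"segment": "B", "has_product": False} for _ in range(895)]
)

training = oversample(population, target_rate=0.25)
training_rate = sum(r["has_product"] for r in training) / len(training)

# Stand-in for the tree structure learned from the oversampled training set.
def leaf_for(record):
    return record["segment"]

calibrated = backfit(leaf_for, population)
print(len(training), training_rate, calibrated)
```

The training set has 80 records with holders at 25 percent, but the backfitted scores come from the full population, so the best leaf scores 0.15 rather than the inflated proportion seen during training.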

Finally, you can apply all the models to the customer database to generate propensity scores for each product. After that, use a query to find the best next offer, the name of the product scoring highest for each prospect. Although this sort of query is not easy to express in SQL, it is straightforward for an OLAP tool.
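Outside of SQL or an OLAP tool, the same selection is a short loop. The following Python sketch assumes the propensity scores have already been computed and loaded as a dictionary per customer; the customer IDs, product names, and score values are invented.

```python
def best_next_offers(scores_by_customer, top_n=1):
    """Return each customer's top-scoring products, highest score first.

    Products with a score of zero (already owned, or no propensity) are
    never offered, so a customer may end up with fewer than top_n offers.
    """
    offers = {}
    for customer, scores in scores_by_customer.items():
        ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
        offers[customer] = [product for product, s in ranked[:top_n] if s > 0]
    return offers

# Invented propensity scores; zero means the customer already has the product.
scores_by_customer = {
    "c1": {"cd": 0.0, "mortgage": 0.12, "ira": 0.30},
    "c2": {"cd": 0.25, "mortgage": 0.0, "ira": 0.05},
    "c3": {"cd": 0.0, "mortgage": 0.0, "ira": 0.0},
}

top_offer = best_next_offers(scores_by_customer)
top_two = best_next_offers(scores_by_customer, top_n=2)
print(top_offer)
print(top_two)
```

Returning a list rather than a single product makes it easy to keep a second or third offer in reserve, and a customer who holds everything simply gets an empty list.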

After applying these data mining techniques in pursuit of responsive cross selling and up selling targets, your customers might actually appreciate your next direct marketing campaign because they’ll be more likely to want what you’re offering. And when you think about it, that kind of courtesy is another way to build customer loyalty.

You can download a detailed white paper about how to build cross-sell models from the Decision-Support Systems Laboratory Web site at www.dsslab.com.



Michael J. A. Berry is a founder and principal of Data Miners (www.data-miners.com) and co-creator of DSS Lab (www.dsslab.com) in Cambridge, Mass. You can write to him at mjab@dsslab.com.




