CMP -- United Business Media

Intelligent Enterprise

Better Insight for Business Decisions

May 9, 2002

Wrangling Behavior Tags

Behavior tags are true "text facts." How do we handle them in a data warehouse?

By Ralph Kimball

In my previous column, I argued that behavior was the new marquee application of the 2000s ("Behavior: The Next Marquee Application," April 16, 2002). As we're entering the third decade of data warehousing, we have progressed beyond the shipments-and-share applications of the '80s, past the customer-profitability applications of the '90s, to this new focus on individual customer behavior.

I also pointed out that the granularity of the data has increased by roughly a factor of 1,000 each decade. The megabyte databases of the '80s gave way to gigabytes in the '90s. Gigabytes are clearly giving way to terabytes in the 2000s. As I remarked in the previous column, the transition to terabyte databases caught me a little by surprise in the last few years because I was hoping that we had finally stopped growing our databases once we recorded every atomic sales transaction in our largest businesses.

But our databases are still growing without bounds because we're recording more and more subtransactions in advance of the sales transaction. Even if the customer eventually makes only a single purchase, we might capture all the behavior that led up to it. When we extract data from all the customer-facing processes of the business, we see physical visits to brick-and-mortar stores, Web site page requests from e-store visits, calls to support lines, responses to mailings, receipt records of HTML emails containing Web bugs that report back the display of the email on the end user's screen, product deliveries, product returns, and payments made either by regular mail or online. The flood of data from all the customer-facing processes surrounds and explains the final solitary sales transaction. The scary thing about all these subtransactions is that there's no obvious barrier or limit to the amount of data you might collect.

Well, it's nice that we have all this data available for describing customer behavior, but how can we boil the terabytes down to simple, understandable behavior tracking reports?

In the previous column I described how our data mining colleagues can assign behavior tags to complex patterns of subtransactions. I'll describe a simple, classic example. Let's use our standard data warehouse reporting techniques to summarize three customer behavior metrics: recency, frequency, and intensity (RFI).

Recency is a measure of how recently you've interacted with the customer in any transaction or subtransaction. The metric of recency is the number of days elapsed since the last interaction. Frequency is a measure of how often you've interacted with the customer; its natural metric is the number of interactions during the reporting period. Finally, intensity is a numeric measure of how productive the interactions have been. The most obvious measure of intensity is the total amount of purchases, but you might decide that the total number of Web pages visited is a good measure of intensity, too.

All the RFI measures can be subdivided into separate measures for each customer-facing process, but I'll keep this example simple.

Now for every customer, we compute the RFI metrics for a rolling time period, such as the latest month. The result is three numbers. Imagine plotting the RFI results in a three-dimensional cube with the axes Recency, Frequency, and Intensity.
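As a rough sketch of this computation, assume the subtransactions have already been extracted into simple (customer, date, purchase amount) records. The function name compute_rfi, the record layout, the 30-day window, and the choice of purchase amount as the intensity measure are all illustrative assumptions, not part of any particular warehouse design.

```python
from collections import defaultdict
from datetime import date, timedelta


def compute_rfi(interactions, as_of, window_days=30):
    """Return {customer_id: (recency_days, frequency, intensity)}.

    interactions: iterable of (customer_id, interaction_date, amount).
    Only interactions inside the rolling window ending at `as_of` count.
    Customers with no interaction in the window are omitted (an assumption).
    """
    window_start = as_of - timedelta(days=window_days)
    last_seen = {}
    frequency = defaultdict(int)
    intensity = defaultdict(float)

    for customer_id, when, amount in interactions:
        if not (window_start < when <= as_of):
            continue  # outside the rolling time period
        frequency[customer_id] += 1
        intensity[customer_id] += amount
        if customer_id not in last_seen or when > last_seen[customer_id]:
            last_seen[customer_id] = when

    return {
        cid: ((as_of - last_seen[cid]).days, frequency[cid], intensity[cid])
        for cid in last_seen
    }


# Hypothetical sample data: a zero amount stands for a non-purchase
# interaction such as a Web site visit.
interactions = [
    ("alice", date(2002, 5, 1), 120.0),
    ("alice", date(2002, 5, 7), 0.0),
    ("bob", date(2002, 4, 20), 35.5),
    ("bob", date(2002, 3, 1), 500.0),  # falls outside the 30-day window
]

rfi = compute_rfi(interactions, as_of=date(2002, 5, 9))
# rfi["alice"] is (2, 2, 120.0); rfi["bob"] is (19, 1, 35.5)
```

Each customer ends up with one point in the RFI cube. In a real warehouse the same three aggregates would more likely come from a query against the fact tables; the loop above only makes the definitions concrete.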

Now you call in your data mining colleagues and ask them to identify the natural clusters of customers in this cube. You really don't want all the numeric results; you want the behavioral clusters that are meaningful for your marketing department. After running the cluster-identification step, you might find eight natural clusters of customers. After studying where the centroids of the clusters are located in your RFI cube, you're able to assign behavior descriptions to the eight behavior clusters:

A: High-volume, repeat customer, good credit, few product returns

B: High-volume, repeat customer, good credit, but many product returns

C: Recent new customer, no established credit pattern

D: Occasional customer, good credit

E: Occasional customer, poor credit

F: Former good customer, hasn't been seen recently

G: Frequent window shopper, mostly unproductive

H: Other
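The clustering step can be sketched in a few lines. The column does not say which algorithm the data miners use; the example below substitutes plain k-means as one common choice, with a deliberately simplistic initialization (the first k points) so the run is deterministic. The function names scale and kmeans, the min-max scaling of each axis, and the six sample points are all assumptions made for illustration.

```python
import math


def scale(points):
    """Min-max scale each axis to [0, 1] so no single metric dominates."""
    lows = [min(axis) for axis in zip(*points)]
    highs = [max(axis) for axis in zip(*points)]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(p, lows, highs))
        for p in points
    ]


def kmeans(points, k, iterations=100):
    """Plain k-means; returns (centroids, assignments)."""
    centroids = list(points[:k])  # simplistic init: first k points
    assignments = None
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, new_assignments) if a == c]
            if members:
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
        if new_assignments == assignments:
            break  # assignments stopped changing
        assignments = new_assignments
    return centroids, assignments


# Hypothetical (recency_days, frequency, intensity) points for six customers.
rfi_points = [
    (2, 12, 900.0), (3, 10, 850.0), (25, 1, 20.0),
    (28, 2, 35.0), (1, 11, 950.0), (27, 1, 10.0),
]

centroids, assignments = kmeans(scale(rfi_points), k=2)
for point, cluster in zip(rfi_points, assignments):
    print(point, "-> cluster", cluster)
```

The algorithm only returns numbered clusters and their centroids. Deciding that a given cluster means "High-volume, repeat customer" or "Frequent window shopper" remains a human judgment, made by studying where each centroid sits in the cube.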





