The Model Customer
With Web site complexity increasing, direct marketing principles can suggest a more robust approach to clickstream analysis
By Thomas F. Richebacher

What do you really know about your Web site customers? If you think you're getting enough data from your server logs, think again. Typical server logs don't give you enough information about your visitors to support important strategic decisions. And without a complete customer view, you'll be left guessing about user behavior.

In today's fast-paced economy, your company must be able to develop and extract value-added data from your Web site and integrate it with offline information to provide managers with data that moves them away from scorekeeping and toward strategic analysis.

Unfortunately, while the need for better information is increasingly vital, Web site complexity is growing exponentially. Web sites are deployed on geographically distributed servers and traffic on popular sites keeps going up. Static "brochure" pages are long gone. Instead, dynamically developed Web pages are based on query strings or profile criteria. While some of this complexity is technology driven, competitive pressures and higher customer expectations are the main market drivers.

To succeed, your company must be able to gauge its ability to meet its own objectives, as well as its users'. Your measurements have to be results-oriented, designed to support decision making, and, because Web sites are dynamic, flexible enough to support continuous improvement efforts.

Web Site Data Collection Today

Current Web site data collection methods and analysis rely primarily on Web server logs that collect data on user actions such as page requests, clicking on pictures, sending email, or filling out forms. Log analyzer tools summarize this data as counts: most requested pages, exit-and-entry pages, popular paths traveled, visitor counts, referrer counts, and so on.

Traditionally, this information was used to ensure smooth site operation, monitor Web traffic, and determine potential bandwidth problems to meet customer demand and response time objectives.
Because of its real-time delivery, marketers are now also using this information to evaluate response to marketing campaigns, adjust ad placement, and redirect traffic. However, three problems exist with using Web server log data as the basis for strategic analysis:

Data volumes. Microsoft.com alone receives 60 million page views and 300 million hits per day from 4.1 million daily users. The group of sites that it operates (including msn.com, microsoft.com, expedia.com, and hotmail.com) generates approximately 200GB of data per day, or 73TB per year. If Microsoft wished to develop unified user clickstream profiles across all sites, it would first have to combine, sort, and process all server logs -- a time-consuming task, and perhaps an impossible one within reasonable time constraints. Granted, most firms will not have to support Microsoft's data volume. But even if a site generates only one one-hundredth of Microsoft's traffic, 2GB per day still has to be processed and stored. That is no small task -- especially when you should keep data for at least four years for analytical purposes. This volume is expected to increase. A Forrester study, Online Retail Data Strategies (May 1999), states that 20 out of 54 online retailers expect data volumes to increase tenfold within two years. When wireless application use achieves critical mass in 2004, click volumes will increase even more sharply.

Incomplete data. During the Internet "Stone Age," approximately 1993, Web sites were usually based on a Unix file system and static HTML, which made it easy to track user behavior. You simply followed the page request entries from the Web server log. Things have changed: Now every site with a search engine accepts user input and matches it against the text in a file system or database to create personalized pages.
Here's the problem: In all such cases the Web server log records only the name of the program that processes the input, but the nature of the information returned to the user bypasses the log altogether. In effect, what users input or see is hidden in the communication layer between the browser, server, and back-end application -- making it impossible to interpret user actions intelligibly.

Lack of integration with non-Web data. Web site data does not exist in a vacuum. You need to develop information within its full context to pinpoint strengths and weaknesses. When your company uses multichannel distribution or communication methods, you must examine your Web site's effectiveness in relation to all direct (mail, phone, email, and so on) and indirect (distributors, retailers, VARs) channels. You can't assume that a Web site is profitable because revenues exceed costs; rather, revenues migrate across channels according to customers' preferences and business policies that encourage the use of more efficient channels. As customer contact points increase, the focus of customer analysis moves from detecting lifts to understanding shifts.

What Can You Do?

Obviously, when you collect and analyze data from Web sites, you need to take into consideration the increased complexities that have resulted from more elaborate Web site designs, increased traffic, and channel proliferation during the last five years. And you can no longer rely on log file data alone.

There are situations in which most log-file data needs to be stored because of an outside agency's requirements. For instance, ABC audit requirements for magazine publishers specify that log variables (HTML tags of pages requested, protocol, browser software, and so on) should be retained for four years for users who complete critical transactions, such as purchases or address changes.
But in general, log files are becoming too large to process, too incomplete, and, in their raw form, impossible to combine with offline data. Therefore, we need an alternative method to develop data.

Take a Conceptual Perspective

Traditional direct marketing principles can assist you here. They can help you understand the significance of data definition. Data definition is the first step toward analysis that allows tracking the behavior of individuals, modeling response and profitability, developing customer segments, and estimating lifetime value.

The main problem with applying direct marketing principles to Web sites is that they evolved within a very controlled environment. The direct marketing firm decides on the offer, the copy, and the list, and therefore the data model that describes how variables are defined, collected, and related to each other is unambiguous.

Unfortunately, this is not true on the Internet. Web site users select themselves through search engines, links, banner ads, or other media. They enter and leave sites from anywhere and decide when and what they want to see. The users are in control, not the Web site publisher, and this means that a practically unlimited number of variations of data elements can exist. While variation has its analytic advantages, too much variation becomes meaningless. Also, storing every possible variable creates the twin problems of too much data and too much processing time.

Does this mean that direct marketing principles are useless on the Internet? No, it means that these principles need to be proactively extended and applied during the site's conceptualization and design stages. The three most basic steps to accomplish this goal are:

Defining objectives and critical activities. Knowing your objectives is central to understanding what data you should collect and how to define it. But decisions about which activities to measure are even more important because objectives overlap more easily than activities.
Objectives overlap because every firm needs to accomplish similar things and know:
But while every firm needs this knowledge, each will acquire it in a different way based on its own philosophy and values. Ultimately you'll create differentiation only after you decide which activities are unique and critical toward achieving your firm's objectives and how their impact will be measured. (See Sidebar, "Who Is Your Customer?" for some basic examples.)
Although each of the definitions shown in the sidebar has its merits, it is also clear that the outcome of any future analysis based on them will differ. The simplest solution would be to decide on one definition. But what if you change your mind six months later? The crux of the matter is that knowing individual customer identities is insufficient; knowledge about their behavior is what counts -- we want to define their activities. In the sidebar examples, each customer carried out or omitted an activity whose nature can be captured. You can store this information in an order-status field and design different customer handling plans based on order status. In a multichannel environment, you would perform the additional step of combining the order status information from different channels and assigning an overall customer status score that would drive customer contact strategies.

Summarizing data. After you have identified activities and defined corresponding data elements, the next step is deciding on the level of summarization. This is the key to data reduction because here you decide on data groupings. These groupings are based on objectives or activities. For example:
Regardless of the summarization method, you must maintain consistency among measurements. If you track revenues at the individual customer level, you should store costs there as well.

Converting data. You want to extract the essence of Web site activities, not the clutter. In the context of collecting Web site data, this goal implies gathering only what is critical in an easy-to-analyze format that is compatible with other data sets.
For instance, instead of storing actual text values based on form selections, you could collect indicator variables that represent significant form content categories. In its simplest structure, the form selection

Why would you want to do that? Text values are hard to analyze. Converting them into numerical indicator values makes the data easier to analyze. Ideally, indicators would be in the form of binaries; they are the easiest to integrate into statistical procedures. Binaries are simple on/off switches that indicate whether a certain condition was met: A value of 1 means that the condition holds and 0 that it does not. Indicators are often used to customize content or flag customers who have started a critical action such as loading a shopping cart. The drawback of defining data at this binary level is a proliferation of variables and consequently greater storage needs.

You can use similar conventions with navigational data. Instead of collecting entire URL addresses, you could maintain information by page type or content category. This approach reduces processing time and storage demands and makes it easier to determine significant pathways.
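To make the conversion step concrete, here is a minimal Python sketch of both ideas: turning form selections into binary indicators and reducing full URLs to page types. It is an illustration only, not a method described in the article. The category names, the `ind_` prefix, the URL prefixes, and the function names are all hypothetical; a real site would define its own.

```python
from urllib.parse import urlsplit

# Hypothetical form-content categories; a real site would choose its own.
INTEREST_CATEGORIES = ("pricing", "support", "newsletter")

# Hypothetical mapping from URL path prefix to page type.
PAGE_TYPES = {
    "/products": "catalog",
    "/cart": "cart",
    "/checkout": "checkout",
    "/search": "search",
}


def form_indicators(selections):
    """Return a 0/1 flag per category: 1 if the visitor selected it, else 0."""
    chosen = {s.strip().lower() for s in selections}
    return {f"ind_{name}": int(name in chosen) for name in INTEREST_CATEGORIES}


def page_type(url):
    """Reduce a full URL to a coarse page type, ignoring host and query string."""
    path = urlsplit(url).path
    for prefix, kind in PAGE_TYPES.items():
        if path == prefix or path.startswith(prefix + "/"):
            return kind
    return "other"


def pathway(urls):
    """Collapse a visit's URLs into page types, merging consecutive repeats."""
    steps = []
    for url in urls:
        kind = page_type(url)
        if not steps or steps[-1] != kind:
            steps.append(kind)
    return steps
```

Under these assumptions, a visit that searches, views two product pages, and then opens the cart is stored as the three-step pathway search, catalog, cart rather than as four full URLs. The cost is the one noted above: every category you want to test becomes another variable to store.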
Clearly, the three tasks I've outlined here are far beyond what you can accomplish with a browser and Web server alone. You don't just want to format text; you want to detect interaction -- which means you need tools that tap the communication process.