Clickstream Data MartIt's part of the 360-degree customer view. It's being requested more. Are you ready?
By Joe Caserta The clickstream's image is finally transforming, deservedly so, from that of a glorified "hit counter" to that of a crucial customer information source. Case in point: Over the past few months, many people from even non-dot-com corporate environments have asked me about clickstream data marts. Most of this change of perception probably owes to the endless effort businesses are exerting to represent all 360 degrees of the customer in their data warehouses. After all, the clickstream data mart is a strategy that allows companies to truly complete the customer profile, by providing a way to integrate all Web-based customer activity data with conventional transactions. SHEER VOLUMEImportant as the clickstream is, as most extract-transform-load (ETL) developers have come to realize, analysis of the raw Web logs that yield clickstream data can be a daunting task. The sheer volume and unformatted nature of a Web log makes for very complex parsing strategies. Data warehouse architects and developers already deal with very large volumes of data, but Web logs can easily be 10 times larger. Therefore, it is 10 times more critical that the dimensional data model be precise and that the ETL process be efficient. Because of the effort required to conquer this Goliath data source, many companies purchase preprogrammed applications to gather the statistics from their Web sites, creating isolated reporting solutions for single subject areas. Any data warehouse manager knows that creating such stovepipes is one of the taboos of a successful data warehouse implementation. Companies with these solutions in place are now realizing the shortcomings of these disconnected applications. Web log data has much more to offer than basic Web statistics, a fact that cannot be fully appreciated unless the Web log is integrated into the data warehouse. This important data includes key customer behavior habits that have never before been available to corporate decision makers. SAME, BUT DIFFERENTThe fact is the clickstream data mart can be implemented just like any other data mart in your data warehouse with some slight modifications. The data warehouse manager faces the following challenges in the clickstream data mart development life cycle:
The logs themselves, regardless of the particular Web server that creates them, conform to a field layout standard that the World Wide Web Consortium (W3C) sets. As the W3C standardizes available data elements, the content of these elements varies. Additionally, the data within some of these fields is highly customizable, especially the query string and the cookie. BUS TO THE DATA WAREHOUSEThe query string and cookie fields are what let us "plugin" a properly designed clickstream fact table into the existing data warehouse, using the data warehouse bus architecture described in Ralph Kimball's books and articles. You should use existing conformed dimensions in the data warehouse to add dimensionality to the clickstream data mart where possible. As Figure 1 shows, the Page Events Fact table utilizes the Customer, Product, Promotion, Date, and Time dimensions. These are all conformed dimensions that have been built previously, during unrelated data mart projects. This technique not only enables timely delivery of the data mart, but also provides "drill-across" access to other facts throughout the data warehouse thus enabling you to link Web-based and conventional customer activity.
Figure 1 Use existing conformed dimensions in the data warehouse
to add dimensionality to the clickstream data mart where possible. PANNING FOR GOLDFinding and parsing the data required for joining each clickstream fact to the conformed dimensions is another challenge. You must analyze each Web log entry, searching for specific name=value pairs that have meaning to a particular application the Web page supports. These name=value pairs are primarily stored in the cookie and user request line (also referred to as the query string).
|
Most Popular This Week
IE Weekly Newsletter
Subscribe to the newsletter
|
|
|












