
Intelligent Enterprise

Better Insight for Business Decisions

December 5, 2001

Clickstream Data Mart

It's part of the 360-degree customer view. It's being requested more. Are you ready?

By Joe Caserta
Edited by Ralph Kimball


The values in the name=value pairs contain production keys that the ETL engine looks up to obtain the surrogate keys of the appropriate rows in the conformed dimensions. These values can appear at any position in the query string or cookie, so knowing the exact name, or label, as it appears in the log entry is crucial for accurate parsing. The webmaster or Web page developers need to provide these labels. Once the exact strings are known, the ETL process uses nested substring and instring functions to capture the value associated with each label. This string manipulation can be slow inside the DBMS, so it should always be performed outside the database and managed by an ETL tool.
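
The label-based extraction described above can be sketched in Python. Here str.find() plays the role of instring and slicing plays the role of substring. This is a minimal illustration under stated assumptions, not the author's ETL code: the labels prod_id and sess, the separators, and the PRODUCT_KEYS table standing in for the conformed Product dimension are all hypothetical.

```python
from typing import Dict, Optional

def extract_value(text: str, label: str, sep: str = "&") -> Optional[str]:
    """Return the value following 'label=' in a query string or cookie, or None."""
    token = label + "="
    # Match the label only at the start or right after a separator, so that
    # looking up "id" does not hit the tail of "prod_id=".
    if text.startswith(token):
        start = len(token)
    else:
        pos = text.find(sep + token)        # the "instring" step
        if pos == -1:
            return None
        start = pos + len(sep) + len(token)
    end = text.find(sep, start)
    return text[start:] if end == -1 else text[start:end]   # the "substring" step

# Hypothetical production-key -> surrogate-key map for the conformed Product dimension.
PRODUCT_KEYS: Dict[str, int] = {"SKU123": 1001, "SKU456": 1002}
UNKNOWN_KEY = -1   # hypothetical key of an "unknown product" row

def product_surrogate_key(text: str, sep: str = "&") -> int:
    """Look up the surrogate key for the prod_id value carried in a log field."""
    prod = extract_value(text, "prod_id", sep)
    return PRODUCT_KEYS.get(prod, UNKNOWN_KEY) if prod else UNKNOWN_KEY
```

For example, extract_value("sess=42&prod_id=SKU123&ref=home", "prod_id") returns "SKU123". For a cookie, the same function works with sep="; ". Anchoring the match to a separator is what makes knowing the exact label important: a loose search for a short label could match inside a longer one.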

NEED FOR SPEED

To achieve the greatest ETL performance, it is essential that all Web log parsing be performed outside the DBMS, in a managed ETL tool environment. Many of the ETL tools available today are "clickstream ready," meaning they have native features and transformations designed specifically for parsing and managing Web logs. Even though ETL vendors are striving to accommodate clickstream data analysis, many Web log movement and management utilities still have to be explicitly programmed. Features I'd like to see in future versions of these tools include traversing Web farms and server directories to detect new logs, file-date awareness, conditional movement of log files, log compression, and in-stream processing of compressed data.
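
Until the tools provide such features, the hand-programmed pieces might look something like the following Python sketch. It walks a set of server directories for log files newer than the last run and reads plain or gzip-compressed logs in-stream, without unpacking them to disk. The directory layout, the file suffixes, and the use of modification time as the "file date" are assumptions for illustration.

```python
import gzip
import os
from pathlib import Path
from typing import Iterator, List, Sequence, Tuple

def find_new_logs(roots: Sequence[str], since: float,
                  suffixes: Tuple[str, ...] = (".log", ".log.gz")) -> List[Path]:
    """Walk each root directory and return log files modified after 'since'."""
    found: List[Path] = []
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                if name.endswith(suffixes):
                    path = Path(dirpath) / name
                    # File-date awareness: skip logs already seen in earlier runs.
                    if path.stat().st_mtime > since:
                        found.append(path)
    return sorted(found)

def read_log_lines(path: Path) -> Iterator[str]:
    """Yield lines from a plain or gzip-compressed log, decompressing in-stream."""
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            yield line.rstrip("\n")
```

In practice, since would be the timestamp recorded at the end of the previous successful load. Persisting that timestamp, and moving or compressing processed files, is left out of this sketch.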

In some cases, certain tasks can be performed at the operating system level to gain maximum throughput. Specifically, when processing page views, filtering out unwanted hits, such as requests for GIF and JPG images and robot visits, can be sped up significantly by moving this initial filtering to the operating system level, before the explicit ETL tool procedures run. With some ingenuity, you can use the ETL tool to dynamically create and execute customized OS-level scripts that sweep through the logs at astonishing speeds and pass only the essential entries to the ETL tool for processing.
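
One way to let the ETL process generate such an OS-level script is sketched below in Python. The function builds a grep command that discards image requests and robot visits, so that only the remaining lines reach the ETL tool. The extension and robot lists, the file names, and the choice of grep as the filter are assumptions; a real robot list would be longer and site-specific.

```python
import shlex
from typing import Sequence

# Hypothetical filter lists; a production robot list would be longer and site-specific.
IMAGE_EXTENSIONS = ("gif", "jpg", "jpeg")
ROBOT_AGENTS = ("googlebot", "bingbot", "slurp")

def build_prefilter_command(log_path: str, out_path: str,
                            extensions: Sequence[str] = IMAGE_EXTENSIONS,
                            robots: Sequence[str] = ROBOT_AGENTS) -> str:
    """Build a grep command line that drops image hits and robot visits."""
    # An image request looks like "GET /logo.gif HTTP/1.0" or "/logo.gif?v=2".
    image_pattern = r"\.(" + "|".join(extensions) + r")[ ?]"
    # Robot names are used as plain regex fragments here (not escaped).
    robot_pattern = "|".join(robots)
    pattern = image_pattern + "|" + robot_pattern
    # -v keeps non-matching lines, -i ignores case, -E enables | alternation.
    return "grep -v -i -E {} {} > {}".format(
        shlex.quote(pattern), shlex.quote(log_path), shlex.quote(out_path))
```

With the defaults, build_prefilter_command("access.log", "pages.log") yields `grep -v -i -E '\.(gif|jpg|jpeg)[ ?]|googlebot|bingbot|slurp' access.log > pages.log`. The string could then be run with subprocess.run(cmd, shell=True). Note that grep exits with status 1 when it selects no lines, so a caller using check=True should allow for that case.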

NEW VIEWPOINTS

When integrating clickstream data, the data warehouse manager must account for new viewpoints of the dimensional data. For example, before the introduction of the clickstream, every customer-based activity associated with a product started at the stock keeping unit (SKU) level of the product hierarchy. During Web navigation, by contrast, a customer will most likely begin at the product category level and traverse the hierarchy from the top down, with no guarantee of ever reaching the SKU. Moreover, the customer can now enter or exit the Product dimension at any level within the hierarchy.

To support this new requirement, the Product dimension, and any other dimension that offers new hierarchical viewpoints, must have new rows inserted for every level of the hierarchy at which the browsing end user may do something interesting. This technique lets your business drill across facts even when the lowest-level SKUs are unspecified. You can apply the same strategy to analyze the effect of marketing campaign initiatives when you don't know customers' names but do know certain higher-level demographics about them.
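
A minimal Python sketch of the idea follows. For each hypothetical (category, subcategory, SKU) path, the loader inserts one Product row per hierarchy level, so a page view that identifies only a category still finds a surrogate key. The level names, the sample products, and the sequential key assignment are assumptions for illustration, not a prescribed design.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple

# (category, subcategory, sku); None marks a level above the SKU.
HierarchyPath = Tuple[str, Optional[str], Optional[str]]

@dataclass(frozen=True)
class ProductRow:
    key: int                     # surrogate key
    level: str                   # "category", "subcategory", or "sku"
    category: str
    subcategory: Optional[str] = None
    sku: Optional[str] = None

def build_product_dimension(
    products: Iterable[Tuple[str, str, str]],
) -> Tuple[List[ProductRow], Dict[HierarchyPath, int]]:
    """Create one row per hierarchy level; return the rows and a path -> key lookup."""
    rows: List[ProductRow] = []
    lookup: Dict[HierarchyPath, int] = {}

    def add(level: str, category: str, subcategory: Optional[str] = None,
            sku: Optional[str] = None) -> None:
        path = (category, subcategory, sku)
        if path not in lookup:          # each level row is inserted only once
            key = len(rows) + 1         # hypothetical sequential surrogate keys
            rows.append(ProductRow(key, level, category, subcategory, sku))
            lookup[path] = key

    for category, subcategory, sku in products:
        add("category", category)
        add("subcategory", category, subcategory)
        add("sku", category, subcategory, sku)
    return rows, lookup
```

A category-level page view for "Outdoor" would be keyed by lookup[("Outdoor", None, None)], while an order line carries the full SKU path. Because SKU rows also carry their category, order facts can be rolled up to the same level at which the clickstream facts arrive. The same pattern would apply to a Customer dimension with rows at demographic levels.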

The ease with which decision makers can now combine Web-derived data with conventional activity data has a major consequence: the data warehouse is becoming more vital to the business than ever. As a result, the responsibilities of the data warehouse manager are growing. Soon, integrating clickstream data with the data warehouse will no longer be optional but a requirement for competitive survival. Are you ready?


Guest columnist Joe Caserta [joe@casertaconcepts.com] is the founder of Caserta Concepts LLC, a data warehouse architecture and implementation consulting firm.

