Guide to the TechWeb Network

Intelligent Enterprise

Better Insight for Business Decisions

Intelligent Enterprise - Better Insight for Business Decisions
search Intelligent Enterprise
Advanced Search
RSS
Webcasts
Whitepapers
Subscribe
Home




August 10, 2003

Better Storytelling

Dimensional design techniques bind events into stories

by Jim Stagnitto
edited by Ralph Kimball

Continued from Page 1

Notice that by constraining on purchase steps that are one step from the end, only page visits that immediately precede shopping cart adds will be returned, regardless of their position in the session sequence or absolute time. This is powerful medicine because it allows simple query tools to identify and scrutinize the Web page pathways that consistently lead to purchases, helping retailers rearchitect their sites and promotions to better drive Web traffic into these "purchase vortexes."

Similarly, you could use the step dimension in alternative roles, such as identifying page visit steps that lead to abandoned shopping carts.

The step dimension is easy to build in a spreadsheet; prepopulating it with up to 300 total steps requires only about 45,000 rows, a reasonable dimension size that covers enough potential steps for most typical Web site, travel, or network-traffic analytic applications. You can create a special "Over Max Steps" dimension row to handle any exceptionally high numbers of steps.

The page dimension contains a row for each Web page on all sites that publish clickstream data to the data warehouse, and is typically a slowly changing dimension. The data model in Figure 1 has three roles for this dimension: prior page, current page, and next page (referring to the pages associated with the previous, current, and next page visits of the session respectively). Having all three of these relationships available allows standard query tools to easily produce reports that show where visitors originated and where they went next for any Web page. A special "Not Applicable" row stands in for the previous-page and next-page roles of a session's first and last page visits. Reporting page-to-page traffic this way is a whole lot easier than using correlated subqueries.

I use the topographic dimension for Web sites with structured content, perhaps using style templates consisting of several containers for dropping content into. This dimension stores the template and container names and codes, along with other descriptive information about the placement (screen geography) of the container on the page and any special appearance characteristics of the container. The model embraces two distinct roles — one for the container that was clicked in the session's prior page visit and one for the container on the current page that will be clicked next.

This data modeling technique simplifies finding hot spots of high-value Web page real estate and identifying which types of visitors the template and container presentation styles attract.

For more complete coverage of this clickstream data model, including associated dimensions such as session profile, page visit profile, and browser and customer dimensions, see Figure 2.


Figure 2

Buy the ETL Team Lunch

Of course, all of this increased storytelling power comes with a price, and as usual, the data warehouse ETL team will have to take on the additional work and complexity.



Rate This Article

Comments:

Optional e-mail address:

In particular, these techniques require the ETL software to hold in memory and profile a complete image of the individual story before its constituent event facts are written out. In the clickstream example, this requirement means that Web events need to be consolidated, standardized in granularity, "sessionized," and sorted by session prior to being handed off to a downstream layer that buffers each session, assigns surrogate keys to each page visit, and creates the page visit fact rows.

Thankfully, there are some terrific clickstream data preparation tools on the market today that can handle some of these consolidation, standardization, and sessionization tasks. Type "Clickstream Collection Technology" into Google to get a list of some of the players in this arena.


Jim Stagnitto [jimstag@comcast.net] is a freelance data warehouse architect and integration specialist.








IE Weekly Newsletter
Subscribe to the newsletter
    Email Address







InformationWeek Business Technology Network
InformationWeekInformationWeek 500InformationWeek 500 ConferenceInformationWeek AnalyticsInformationWeek CIO
InformationWeek EventsInformationWeek ReportsInformationWeek MagazinebMightyByte and SwitchDark Reading
Digital LibraryIntelligent EnterpriseInternet EvolutionNetwork ComputingNo Jitter
space
Techweb Events Network
InteropVoiceConWeb 2.0 ExpoWeb 2.0 SummitEnterprise 2.0 ConferenceMobile Business ExpoSoftware ConferenceCSI - Computer Security Institute
Black HatGTECEnergy CampMashup CampStartup Camp
space
Light Reading Communications Network
Light ReadingLight Reading EuropeUnstrungLight Reading's Cable Digital NewsConstantinopleInternet Evolution
Heavy ReadingLight Reading Live!Light Reading InsiderEthernet ExpoOptical ExpoTeleco TVTower Technology Summit
space
Financial Technology Network
Advanced TradingBank Systems & TechnologyInsurance & TechnologyWall Street & TechnologyAccelerating Wall StreetBank Systems & Technology Executive SummitBuyside Trading SummitInsurance & Technology Executive Summit
space
Microsoft Technology Network
MSDN MagazineTechNetThe Architecture Journal
space