|
OLAP is arguably today's most popular decision-support approach, but OLAP tools have very limited ability to tease knowledge from time-varying data. If you need to know anything beyond the difference or percentage change in a value over time, or you need to do more than year-to-date and other period-to-date roll-up aggregations, you may be out of luck.
If you handle data of mixed frequencies or perform advanced analysis such as forecasting along the time dimension, you need a way to model these values as series rather than merely as sets of discrete points in an OLAP cube. These needs are common, but many tools don't meet the challenge. You probably don't like to "leave money on the table," but that's what you're doing if you're unaware of OLAP tools' time-dimension-related analytic shortcomings.
The Underlying Problem
The problems start with the data structures that underlie OLAP analyses. The first job is to express analysis objectives in terms of measures - functions that you need to maximize (such as profit), minimize (such as cost), or simply study (such as output, which you want to maximize subject to constraints including market conditions and marginal cost). Next, you would determine the variables from which the measures are derived. By variables, I mean abstract, algebraic x and y representations of observed data values, which in an OLAP cube lie at the intersection of dimensions in a multidimensional space. The variables also represent hypothetical values that let you perform "what if?" type analyses of the outcome of different, conjectural scenarios.
(Data mining techniques can be of some help in showing dimensions and the value classifications along those dimensions. They also expose the relationships among variables given those dimensions and classifications.)
For example, if you're attempting to maximize profit in a retail operation, one variable might be sales and another cost, with the measure profit equal to the difference between the two. But cost at a given point in time, say the week starting April 29, 2001, really reflects not only that week's labor and other costs but also that week's expenditures on inventory that remains unsold until some future week. You should more realistically decompose cost into a current labor component and a lagged term that reflects your previous weeks' inventory-acquisition expenditures. Does your OLAP tool help you create measures based on lagged time-based values or leading, future-period values?
Different Frequencies
I deliberately chose as one of my variables a quantity that you can typically observe on a weekly basis: Most employers pay their employees weekly or biweekly. Leading OLAP databases have a time dimension type that lets you create facts - fields for observed data values - for a weekly payroll amount. But what if your company reports daily sales? Does your OLAP tool let you analyze cubes mixing data observed with different frequencies, such as data from a week and from a month?
Common OLAP-speak would say that the time dimension has multiple granularities, that is, more than one observation frequency. This characteristic is not the same as creating multiple time dimensions, which the Microsoft OLE DB for OLAP multidimensional expressions (MDX) interface calls multiple hierarchies within the time dimension, for reporting purposes.
If necessary, you can easily sum sales for each day in the week to create a weekly value or divide weekly labor costs by seven to create an average daily labor cost. But what if your company reports your measure, profit, monthly? You would need a systematic way of rolling-up weekly data to months.
Some tools offer options - splitting a week's values according to the number of days of that week in each month, allocating the whole week's value to the month in which the first or last day of the week falls, or allocating weeks to months in a calendar quarter based on a 4-5-4 split - but most do not have capabilities built-in. If you want them, you'd have to program them yourself. And what if you want to study the effects of different allocations to determine the best model for your business? Must you hard-wire a separate time dimension for each allocation method, or can you change it on the fly with a function or computational option?
|
|
|
|
|
|
|











