|
||
|
http://www.intelligententerprise.com/010507/decision.jhtml
|
||
OLAP is arguably today's most popular decision-support approach, but OLAP tools have very limited ability to tease knowledge from time-varying data. If you need to know anything beyond the difference or percentage change in a value over time, or you need to do more than year-to-date and other period-to-date roll-up aggregations, you may be out of luck.
If you handle data of mixed frequencies or perform advanced analysis such as forecasting along the time dimension, you need a way to model these values as series rather than merely as sets of discrete points in an OLAP cube. These needs are common, but many tools don't meet the challenge. You probably don't like to "leave money on the table," but that's what you're doing if you're unaware of OLAP tools' time-dimension-related analytic shortcomings.
The problems start with the data structures that underlie OLAP analyses. The first job is to express analysis objectives in terms of measures - functions that you need to maximize (such as profit), minimize (such as cost), or simply study (such as output, which you want to maximize subject to constraints including market conditions and marginal cost). Next, you would determine the variables from which the measures are derived. By variables, I mean abstract, algebraic x and y representations of observed data values, which in an OLAP cube lie at the intersection of dimensions in a multidimensional space. The variables also represent hypothetical values that let you perform "what if?" type analyses of the outcome of different, conjectural scenarios.
(Data mining techniques can be of some help in showing dimensions and the value classifications along those dimensions. They also expose the relationships among variables given those dimensions and classifications.)
For example, if you're attempting to maximize profit in a retail operation, one variable might be sales and another cost, with the measure profit equal to the difference between the two. But cost at a given point in time, say the week starting April 29, 2001, really reflects not only that week's labor and other costs but also that week's expenditures on inventory that remains unsold until some future week. You should more realistically decompose cost into a current labor component and a lagged term that reflects your previous weeks' inventory-acquisition expenditures. Does your OLAP tool help you create measures based on lagged time-based values or leading, future-period values?
I deliberately chose as one of my variables a quantity that you can typically observe on a weekly basis: Most employers pay their employees weekly or biweekly. Leading OLAP databases have a time dimension type that lets you create facts - fields for observed data values - for a weekly payroll amount. But what if your company reports daily sales? Does your OLAP tool let you analyze cubes mixing data observed with different frequencies, such as data from a week and from a month?
Common OLAP-speak would say that the time dimension has multiple granularities, that is, more than one observation frequency. This characteristic is not the same as creating multiple time dimensions, which the Microsoft OLE DB for OLAP multidimensional expressions (MDX) interface calls multiple hierarchies within the time dimension, for reporting purposes.
If necessary, you can easily sum sales for each day in the week to create a weekly value or divide weekly labor costs by seven to create an average daily labor cost. But what if your company reports your measure, profit, monthly? You would need a systematic way of rolling-up weekly data to months.
Some tools offer options - splitting a week's values according to the number of days of that week in each month, allocating the whole week's value to the month in which the first or last day of the week falls, or allocating weeks to months in a calendar quarter based on a 4-5-4 split - but most do not have capabilities built-in. If you want them, you'd have to program them yourself. And what if you want to study the effects of different allocations to determine the best model for your business? Must you hard-wire a separate time dimension for each allocation method, or can you change it on the fly with a function or computational option?
And a last point on this subject: What if your company reports some data monthly, but your analysis must be done with a daily output frequency? Is dividing a monthly figure by the number of days in the month enough, or must you account for weekends and holidays? Or is any kind of division simply wrong, because data along the particular dimension is not additive? For example, having 300 employees at the end of April doesn't mean you only had 10 employees on a given day in that month.
A more realistic method is an average value for the month, although a straight-line interpolation between the current and previous-period values or a moving average (an average of the values immediately before and after the time point in question) could be more meaningful. Regardless, the key is having the right capabilities in your software tools and the flexibility to use the approach appropriate to each dimension in your cube.
Analyzing time-varying data can be an even bigger challenge than simply storing it and transforming frequencies. I'll look at two examples: seasonal adjustment and forecasting.
The concept of seasonality is familiar to all of us: the changes in farming and construction employment due to winter weather and the increase in retail sales and retail-sales employment in the months before the end-of-year holidays. Simply measuring employment and sales levels doesn't help you understand the underlying conditions that are independent of seasonal factors. Decomposing demand for labor and retail sales into seasonal and nonseasonal terms can help provide that understanding. Oracle's Express Server and Lucent Technologies Inc.'s Strategist, however, are the only OLAP tools I know of with built-in capabilities to help analysts meet this widespread need.
The goal of forecasting is to discover trends that you can use to predict the future values of variables. You could forecast measure values directly, but this method is less safe than first estimating the future values of the underlying variables from which you can then derive the measure values.
The most basic forecasting technique is regression and extrapolation: fitting a straight-line or more complex curve to a set of data points and projecting to future times. Although Microsoft's MDX includes linear regression, some mass-market OLAP tools don't even have this ability.
But even linear regression, which may not reveal the dependencies across time that are inherent in most business data, can be insufficient for your needs. Think what would happen if you fit a straight line to the outside temperature. If you were working with many years of data, you might be able to detect warming or cooling trends. But if you live in the northeast United States. as I do and you only had data from January to June of one year, you might conclude that you need to put heavy-duty air conditioners on your holiday wish list.
Seasonal analysis and forecasting techniques will be interesting topics for a later column, as will specialized software and techniques for managing time-varying data. In many fields, understanding and respecting the time basis of the data is the key to meaningful analysis. We have much to gain by examining these cases - and looking at techniques for management and analysis of time series data - in more depth.
For now, take a good look at your analysis tools and decide if they're handling your time-varying data adequately. If they aren't, you have several alternatives worth exploring in both OLAP and more specialized tools. In the OLAP realm, Accrue Software Inc.'s Pilot Decision Support Suite comes to mind. And relational-database object extensions for time series management and other complex-typed data are gaining wider acceptance.
Being able to do advanced analyses like forecasting requires understanding the advantages of software that truly understands time - and the limitations of mass-market OLAP tools that don't.