|
March 30, 1999 Volume 2 Number 5
Symmetry Lost
|
The notion of symmetry starts with the assumption that we already think in terms of dimensions and measures. Whatever we are tracking sales and costs, for example are measures; the ways we organize what we track are our dimensions. Symmetry then goes on in two different directions. One direction is called type varying because the underlying type structure required to support the two symmetrical views changes. It asserts that we can treat the entire collection of measures as members of a measures dimension with the implicit addition of a value dimension. Furthermore, this measures dimension should have all the properties associated with any other dimension. This is the sense that Microsofts OLE DB for OLAP programmers reference gives, stating that OLE DB for OLAP is designed to enforce symmetry between dimensions and measures.
Another direction is called type-fixed symmetry, which assumes that each dimension and measure can swap roles. As a result, each dimension can serve as a measure, and each measure can serve as a dimension. Figure 1, page 26, illustrates the concepts of type-varying and type-fixed symmetry.
Notice that in type-varying symmetry, the dimensions are not (explicitly) treated as measures. Also notice that the underlying type structure as represented by the table column structure changes in a type-varying symmetry. For type-fixed symmetry, however, we dont treat the measures collectively as a dimension, but the set of types as represented by the table columns remains fixed.
FIGURE 1 Type-varying vs. fixed-type symmetry.
Not only does the term symmetry mean different things, but the degree to which a particular product or API supports symmetry is a function of the type of activity, or purpose, for which the measures and dimensions are inverted. In other words, even though you may be able to treat dimensions and measures symmetrically for querying purposes, you may not be able to use the values of a measure to create the x-axis in a line graph, aggregate the values of a dimension, or use a dimension as a response variable in a predictive model. The degree to which products support dimensional symmetry varies as a function of whether you are trying to perform data selection or filtering, data representation, aggregations, or other analyses.
Querying
Basic querying is the area in which symmetry is most prevalent. Consider a table composed of time, stores, sales, and costs. Assume the typical configuration in which time and stores are keys, and sales and costs are nonkey attributes. If you were to query for sales, for example:
Select sales from table where Time in quarter
1 and stores in Eastern region
You treat the data definition language (DDL)-defined dimensions as dimensions in the query, and you treat the DDL-defined measures as measures in the query. (Keep in mind that the general form of a query is Ty (Ti, Ti,.. ) => Iy or Select the Iy associated with Ty from source where Ti, Ti. In other words, the Ti tuples define the query-specific dimensions and the Ty given, which returns the Iy, defines the query-specific measures.)
On the other hand, if you wanted to know which stores met certain criteria, you would need to treat dimensions and measures symmetrically by composing a query for stores. For example:
Select stores from table where sales > $500
Here, we treat the DDL-defined stores dimension as a measure in the query, and we treat the DDL-defined sales measure as a dimension in the query.
SQL supports fixed-type symmetry at the query level. Nearly all OLAP products support this form of query symmetry, though not always directly. The product interfaces make it easy to select DDL-defined measures data by traversing dimension values. But to select dimension values by traversing measure values requires defining formulas, usually in the form of filters. Nevertheless, most products do support fixed-type symmetry because you can create this formula-based filter in most products.
What if you wanted to see the measures with values that had increased by more than 20 percent during the previous year? How would you express that? You need type-varying symmetry as illustrated in the following query:
Select measures from table where value/ [prev.year]:value > 1.2
SQL does not support this type of symmetry because it would entail selecting columns. This type of symmetry is also less supported in the OLAP world in which products such as Oracles Express would have a hard time selecting variables. OLE DB for OLAP, however, does support type-varying symmetry.
Representation
Analytical visualization relies heavily on flexible viewing schemes or representational symmetry. Assuming that the query tool is delivering only a result set (that is, were not judging the query tools ability to define representations) and that the representational tool maps from the result set to the visual form, the limiting factor in visual representation is the semantics of the visual form. Many representational forms such as line graphs, pie charts, and surface plots have an implicit asymmetry. Some parts of the visual form are most effective for representing dimensions, yet other parts are most effective for representing measures.
Take for example the ubiquitous line graph. Imagine a graph using sales as a function of time, in which time is the dimension represented on the x-axis and sales is on the y-axis. Note that for each unique value of time, there exists only one sales value. This rule is one of the basic function rules, and it explains why the graph looks like a line graph. Line graphs are decidedly asymmetrical.
FIGURE 2 Time as pseudo-function symmetry.
Now look at the graph in Figure 2, which uses fixed-type symmetry and shows the sales measure along the x-axis and the time dimension along the y-axis. It doesnt even look like a line graph; it looks more like a scatter plot. In fact, Figure 2 violates the function rule because, for some sales values, there may be either more than or less than one value for time. A more likely graph which obeys the function rule, relies on fixed-type symmetry, and places sales along the x-axis and time along the y-axis would show unique sales values on the x-axis and the count of times on the y-axis. Sales would be treated ordinally; weve reduced sales to sales rank. (See Figure 3.)
FIGURE 3 Count of times as a function of sales.
What if you wanted to compare the changes across several measures by superposing them on a line graph? You would still be treating dimensions as dimensions. However, you would be treating the collection of measures as members of a measures dimension (that is, varying-type symmetry) while abstracting out the units common to the measures and portraying that common unit in this case, dollars as the new measure along the y-axis.
Although most OLAP products will let you create the necessary result sets, they wont let you create the representation form directly from the query. You will have to instantiate the result set and then querying into the result set to create your representations. The way that you choose to map from result set to representation form will determine how you treat dimensions and measures. Most products support symmetry with querying, but they do not support full representational symmetry of either fixed or varying types.
Aggregation and Analysis
More sophisticated analyses frequently require some form of dimensional symmetry such as using measures as aggregation principles (dimensions). The grouping operations that statisticians and data miners rely on in which a measure such as sales is binned and used to aggregate additional information, such as profits is a typical example of using a measure as an aggregation dimension and relying on fixed-type dimensional symmetry. You may also use dimensions or dimension attributes as dependent variables. Consider the following equations:
Distance traveled = F(price, discount, promotion, competition)
Count new stores (where new stores is a member of an aggregation level) = F(prev year.sales)
Count new products = F(prev year sales)
To a lesser extent, you can aggregate dimension values as measures. Space and time (the original dimensions) are cardinally ordered and aggregated frequently, as they are with the following aggregate queries:
What is the average time between first contact and the closing of a deal?
What is the average distance traveled by 75 percent of our customers?
Typical dimension values are less frequently aggregated because most DSS dimensions such as stores, products, employees, and customers are ordered nominally. Remember that these dimensions are composed of names. A product dimension is a product name dimension, and its easy to see why aggregating names is less than illuminating. However, even nominal dimensions benefit from aggregation. Consider a store-size ranking based on square feet per store. Now imagine averaging the number of rank units per unique instance of sales. The average of the ranks would indicate which store sizes generated the most sales. This type of analytical aggregation assumes fixed-type symmetry.
Listing 1. The MDX example.
WITH
SET Ordered_Stores AS
Order (Stores.Members, Measures.Sales, BASC)
MEMBER Measures.NewStoreSalesFlag AS
iif (Measures.Sales <> (Measures.Sales, Ordered_Stores.Item (Rank (Stores.CurrentMember,
OrderedStores) 2)), 1, 0)
MEMBER Measures.CumStoreSqFt AS
iif (Measures.NewStoreSalesFlag = 1,
Stores.StoreSqFt,
Stores.StoreSqFt + (Measures.CumStoreSqFt, Ordered_Stores.Item (
Rank (Stores.CurrentMember, OrderedStores) 2 ))
)
MEMBER Measures.CumStorecnt AS
iif (Measures.NewStoreSalesFlag = 1,
1,
1+ (Measures.CumStorecnt, Ordered_Stores.Item (
Rank (Stores.CurrentMember, OrderedStores) 2 ))
)
MEMBER Measures.CumStoreavg AS
Measures.CumStoreSqFt / Measures.CumStorecnt
SET Reduced_Stores AS
Filter (Ordered_Stores,
(Measures.NewStoreSalesFlag, Ordered_Stores.Item (
Rank (Stores.CurrentMember, OrderedStores) 1 +1 )) = 1
OR Rank (Stores.CurrentMember, OrderedStores) = Count (OrderedStores, INCLUDEEMPTY ))
)
SELECT
{ Measures.Sales, Measures.CumStoreavg } on axis (0),
{ Reduced_Stores } on axis (1)
FROM Cube
Technically it is possible to provide for fixed-type symmetry in analytical aggregations using OLE DB for OLAP, but using Microsofts multidimensional query language (MDX) to perform this aggregation is brutal. (See Listing 1 at Intelligent Enterprise Online. You can find the full notes surrounding this MDX example on Dimensional Systems Web site, www.dimsys.com ) Listing 1 would return the highlighted rows of Table 1, and if there were some ideal store size, it might appear as in Figure 4. (See both Table 1 and Figure 4 at Intelligent Enterprise Online.)
|
Store |
Sales |
Flag |
CumStoreavg |
|
a |
10 |
1 |
10 |
|
b |
10 |
0 |
20 |
|
c |
10 |
0 |
30 |
|
d |
50 |
1 |
50 |
|
e |
50 |
0 |
100 |
|
f |
50 |
0 |
150 |
|
g |
100 |
1 |
100 |
|
h |
100 |
0 |
200 |
|
i |
150 |
1 |
150 |
Only logic should constrain analysis. With DSS tools, there is still a tradeoff between ease of basic analysis and analytical flexibility. OLAP tools and de facto APIs facilitate the creation of multilevel aggregates and context-specific formulas, but they do so at the expense of analytical flexibility. Statistical and data mining tools provide more analytical symmetry at the expense of offering no dimensional support. It is encouraging to see the increased analytical symmetry provided in OLE DB for OLAP, and further improvements in DSS capabilities will require the trend to continue.
Erik Thomsen is cofounder of the Cambridge, Mass.-based consultancy Dimensional Systems and author of the book OLAP Solutions (John Wiley & Sons, 1997). You can reach him via email at erik@dimsys.com.
|
|
|
| |||||||||||||||||||||||||||||||




















