Intelligent Enterprise

Better Insight for Business Decisions

Intelligent Enterprise - Better Insight for Business Decisions
search Intelligent Enterprise
Advanced Search
RSS
Webcasts
Digital Library
Subscribe
Home


March 30, 1999 Volume 2 Number 5

Symmetry Lost

How new standards of dimensional symmetry are improving querying and analysis - and what's still missing


Although interactions among elementary particles are reasonably symmetric, their aggregates’ interactions are most decidedly not. In my last column, I explained how the existence of multiple dimensions of the same type created large amounts of sparsity for multidimensional OLAP (MOLAP) applications. I also mentioned that we need to treat some dimensions as measures — effectively swapping the roles of dimensions and measures — and that I would treat that issue at a later time. Well, now I am. The notion of role swapping or symmetry between dimensions and measures is crucial for DSS.

The notion of symmetry starts with the assumption that we already think in terms of dimensions and measures. Whatever we are tracking — sales and costs, for example — are measures; the ways we organize what we track are our dimensions. Symmetry then goes on in two different directions. One direction is called type varying because the underlying type structure required to support the two symmetrical views changes. It asserts that we can treat the entire collection of measures as members of a measure’s dimension with the implicit addition of a value dimension. Furthermore, this measure’s dimension should have all the properties associated with any other dimension. This is the sense that Microsoft’s OLE DB for OLAP programmers reference gives, stating that “OLE DB for OLAP is designed to enforce symmetry between dimensions and measures.”

Another direction is called type-fixed symmetry, which assumes that each dimension and measure can swap roles. As a result, each dimension can serve as a measure, and each measure can serve as a dimension. Figure 1, page 26, illustrates the concepts of type-varying and type-fixed symmetry.

Notice that in type-varying symmetry, the dimensions are not (explicitly) treated as measures. Also notice that the underlying type structure as represented by the table column structure changes in a type-varying symmetry. For type-fixed symmetry, however, we don’t treat the measures collectively as a dimension, but the set of types as represented by the table columns remains fixed.

FIGURE 1 Type-varying vs. fixed-type symmetry.

Not only does the term “symmetry” mean different things, but the degree to which a particular product or API supports symmetry is a function of the type of activity, or purpose, for which the measures and dimensions are inverted. In other words, even though you may be able to treat dimensions and measures symmetrically for querying purposes, you may not be able to use the values of a measure to create the x-axis in a line graph, aggregate the values of a dimension, or use a dimension as a response variable in a predictive model. The degree to which products support dimensional symmetry varies as a function of whether you are trying to perform data selection or filtering, data representation, aggregations, or other analyses.

Querying

Basic querying is the area in which symmetry is most prevalent. Consider a table composed of time, stores, sales, and costs. Assume the typical configuration in which time and stores are keys, and sales and costs are nonkey attributes. If you were to query for sales, for example:


Select sales from table where Time in quarter 
1 and stores in Eastern region

You treat the data definition language (DDL)-defined dimensions as dimensions in the query, and you treat the DDL-defined measures as measures in the query. (Keep in mind that the general form of a query is Ty (Ti, Ti,.. ) => Iy or Select the Iy associated with Ty from source where Ti, Ti. In other words, the Ti tuples define the query-specific dimensions and the Ty given, which returns the Iy, defines the query-specific measures.)

On the other hand, if you wanted to know which stores met certain criteria, you would need to treat dimensions and measures symmetrically by composing a query for stores. For example:


Select stores from table where sales > ‘$500’


Here, we treat the DDL-defined stores dimension as a measure in the query, and we treat the DDL-defined sales measure as a dimension in the query.

SQL supports fixed-type symmetry at the query level. Nearly all OLAP products support this form of query symmetry, though not always directly. The product interfaces make it easy to select DDL-defined measures data by traversing dimension values. But to select dimension values by traversing measure values requires defining formulas, usually in the form of filters. Nevertheless, most products do support fixed-type symmetry because you can create this formula-based filter in most products.

What if you wanted to see the measures with values that had increased by more than 20 percent during the previous year? How would you express that? You need type-varying symmetry as illustrated in the following query:

Select measures from table where value/ [prev.year]:value > 1.2

SQL does not support this type of symmetry because it would entail selecting columns. This type of symmetry is also less supported in the OLAP world in which products such as Oracle’s Express would have a hard time selecting variables. OLE DB for OLAP, however, does support type-varying symmetry.

Representation

Analytical visualization relies heavily on flexible viewing schemes or representational symmetry. Assuming that the query tool is delivering only a result set (that is, we’re not judging the query tool’s ability to define representations) and that the representational tool maps from the result set to the visual form, the limiting factor in visual representation is the semantics of the visual form. Many representational forms such as line graphs, pie charts, and surface plots have an implicit asymmetry. Some parts of the visual form are most effective for representing dimensions, yet other parts are most effective for representing measures.

Take for example the ubiquitous line graph. Imagine a graph using sales as a function of time, in which time is the dimension represented on the x-axis and sales is on the y-axis. Note that for each unique value of time, there exists only one sales value. This rule is one of the basic function rules, and it explains why the graph looks like a line graph. Line graphs are decidedly asymmetrical.

FIGURE 2 Time as pseudo-function symmetry.


Now look at the graph in Figure 2, which uses fixed-type symmetry and shows the sales measure along the x-axis and the time dimension along the y-axis. It doesn’t even look like a line graph; it looks more like a scatter plot. In fact, Figure 2 violates the function rule because, for some sales values, there may be either more than or less than one value for time. A more likely graph — which obeys the function rule, relies on fixed-type symmetry, and places sales along the x-axis and time along the y-axis — would show unique sales values on the x-axis and the count of times on the y-axis. Sales would be treated ordinally; we’ve reduced sales to sales rank. (See Figure 3.)

FIGURE 3 Count of times as a function of sales.


What if you wanted to compare the changes across several measures by superposing them on a line graph? You would still be treating dimensions as dimensions. However, you would be treating the collection of measures as members of a measures dimension (that is, varying-type symmetry) while abstracting out the units common to the measures and portraying that common unit — in this case, dollars — as the new measure along the y-axis.

Although most OLAP products will let you create the necessary result sets, they won’t let you create the representation form directly from the query. You will have to instantiate the result set and then querying into the result set to create your representations. The way that you choose to map from result set to representation form will determine how you treat dimensions and measures. Most products support symmetry with querying, but they do not support full representational symmetry of either fixed or varying types.

Aggregation and Analysis

More sophisticated analyses frequently require some form of dimensional symmetry such as using measures as aggregation principles (dimensions). The grouping operations that statisticians and data miners rely on — in which a measure such as sales is binned and used to aggregate additional information, such as profits — is a typical example of using a measure as an aggregation dimension and relying on fixed-type dimensional symmetry. You may also use dimensions or dimension attributes as dependent variables. Consider the following equations:

Distance traveled = F(price, discount, promotion, competition)

Count new stores (where new stores is a member of an aggregation level) = F(prev year.sales)

Count new products = F(prev year sales)

To a lesser extent, you can aggregate dimension values as measures. Space and time (the original dimensions) are cardinally ordered and aggregated frequently, as they are with the following aggregate queries:

• What is the average time between first contact and the closing of a deal?

• What is the average distance traveled by 75 percent of our customers?

Typical dimension values are less frequently aggregated because most DSS dimensions — such as stores, products, employees, and customers — are ordered nominally. Remember that these dimensions are composed of names. A product dimension is a product name dimension, and it’s easy to see why aggregating names is less than illuminating. However, even nominal dimensions benefit from aggregation. Consider a store-size ranking based on square feet per store. Now imagine averaging the number of rank units per unique instance of sales. The average of the ranks would indicate which store sizes generated the most sales. This type of analytical aggregation assumes fixed-type symmetry.


Listing 1. The MDX example.

WITH
SET Ordered_Stores AS 
‘Order (Stores.Members, Measures.Sales, BASC)’
MEMBER Measures.NewStoreSalesFlag AS 
‘iif (Measures.Sales <> (Measures.Sales, Ordered_Stores.Item (Rank (Stores.CurrentMember, 
OrderedStores) — 2)), 1, 0)’
MEMBER Measures.CumStoreSqFt AS
‘iif (Measures.NewStoreSalesFlag = 1, 
Stores.StoreSqFt, 
Stores.StoreSqFt + (Measures.CumStoreSqFt, Ordered_Stores.Item (
		Rank (Stores.CurrentMember, OrderedStores) —2 ))
)’
MEMBER Measures.CumStorecnt AS
‘iif (Measures.NewStoreSalesFlag = 1, 
1, 
1+ (Measures.CumStorecnt, Ordered_Stores.Item (
		Rank (Stores.CurrentMember, OrderedStores) —2 ))
)’
MEMBER Measures.CumStoreavg AS
‘Measures.CumStoreSqFt / Measures.CumStorecnt’
SET Reduced_Stores AS
‘Filter (Ordered_Stores, 
(Measures.NewStoreSalesFlag, Ordered_Stores.Item (
		Rank (Stores.CurrentMember, OrderedStores) —1 +1 )) = 1
OR Rank (Stores.CurrentMember, OrderedStores) = Count (OrderedStores, INCLUDEEMPTY ))
)’
SELECT
	{ Measures.Sales, Measures.CumStoreavg } on axis (0),
	{ Reduced_Stores } on axis (1)
FROM Cube

Technically it is possible to provide for fixed-type symmetry in analytical aggregations using OLE DB for OLAP, but using Microsoft’s multidimensional query language (MDX) to perform this aggregation is brutal. (See Listing 1 at Intelligent Enterprise Online. You can find the full notes surrounding this MDX example on Dimensional Systems’ Web site, www.dimsys.com ) Listing 1 would return the highlighted rows of Table 1, and if there were some ideal store size, it might appear as in Figure 4. (See both Table 1 and Figure 4 at Intelligent Enterprise Online.)



Store

Sales

Flag

CumStoreavg

a

10

1

10

b

10

0

20

c

10

0

30

d

50

1

50

e

50

0

100

f

50

0

150

g

100

1

100

h

100

0

200

i

150

1

150



Only logic should constrain analysis. With DSS tools, there is still a tradeoff between ease of basic analysis and analytical flexibility. OLAP tools and de facto APIs facilitate the creation of multilevel aggregates and context-specific formulas, but they do so at the expense of analytical flexibility. Statistical and data mining tools provide more analytical symmetry at the expense of offering no dimensional support. It is encouraging to see the increased analytical symmetry provided in OLE DB for OLAP, and further improvements in DSS capabilities will require the trend to continue.



Erik Thomsen is cofounder of the Cambridge, Mass.-based consultancy Dimensional Systems and author of the book OLAP Solutions (John Wiley & Sons, 1997). You can reach him via email at erik@dimsys.com.





IE Weekly Newsletter
Subscribe to the newsletter
    Email Address







InformationWeek Business Technology Network
InformationWeekInformationWeek 500InformationWeek 500 ConferenceInformationWeek AnalyticsInformationWeek CIO
InformationWeek EventsInformationWeek ReportsInformationWeek MagazinebMightyByte and SwitchDark Reading
Digital LibraryIntelligent EnterpriseInternet EvolutionNetwork ComputingNo JitterPlug Into The Cloud
space
Techweb Events Network
InteropVoiceConWeb 2.0 ExpoWeb 2.0 SummitEnterprise 2.0 ConferenceMobile Business ExpoSoftware ConferenceCSI - Computer Security Institute
Black HatGTECEnergy CampMashup CampStartup Camp
space
Light Reading Communications Network
Light ReadingLight Reading EuropeUnstrungLight Reading's Cable Digital NewsConstantinopleInternet EvolutionPyramid Research
Heavy ReadingLight Reading Live!Light Reading InsiderEthernet ExpoOptical ExpoTeleco TVTower Technology Summit
space
Financial Technology Network
Advanced TradingBank Systems & TechnologyInsurance & TechnologyWall Street & TechnologyAccelerating Wall StreetBank Systems & Technology Executive SummitBuyside Trading SummitInsurance & Technology Executive Summit
space
Microsoft Technology Network
MSDN MagazineTechNetThe Architecture Journal
space