Guide to the TechWeb Network

Intelligent Enterprise

Better Insight for Business Decisions

Intelligent Enterprise - Better Insight for Business Decisions
search Intelligent Enterprise
Advanced Search
RSS
Webcasts
Whitepapers
Subscribe
Home




February 16, 2001




Smarter Data Warehouses


We need a better analytic language than SQL


by Joy Mundy

Continued from Page 1

The Extension Problem

The addition of the WINDOW clause to the SQL standard greatly increases the usefulness of the SQL language for numerical analysis. Analysts will receive the syntax gratefully, and I expect the burden of computation to shift somewhat toward the database engine and away from the end-user tool. But there are several types of analytic problems that are not addressed by the standard; two of which are:

  • Aggregations using multiplication
  • Applying different rules to different branches of a hierarchy (such as computing returns in Europe using a different formula than for returns in the United States).

These are second-order issues. The fundamental problem with using SQL - even SQL-99 - to perform complex computations is that it is just too hard to formulate the queries. Subqueries and HAVING clauses have been part of SQL for many years, yet the syntaxes are poorly supported by query tools. Unfortunately, I suspect it will be several years before we see many products with an intuitive interface for building ad hoc queries that use the new SQL-99 OLAP extensions.

OLAP Is the Answer

The answer is to place an analytic tier between the data store and the users. This analytic tier will provide the kinds of functionality that Ralph Kimball described in his "dimensionally friendly" criteria. (See his Intelligent Enterprise columns "Is Your Dimensional Data Warehouse Expressive?" May 15, 2000, and "Rating Your Dimensional Data Warehouse," April 28, 2000.) For the purposes of the current discussion, I'm agnostic about whether the underlying data store is in the relational database or another format, although if the underlying database is relational, it should support the OLAP extensions. The analytic criteria should include:

  • Performing effective and consistent analysis at multiple levels of aggregation
  • Exploring the relationships among multiple attributes of the data
  • Predefining analytic expressions in a flexible way
  • Natively supporting a wide variety of analytics
  • Leaving the analytic door open: Use APIs to seamlessly integrate external computation engines for specialpurpose analytics that are beyond the scope of the OLAP engine.

The OLAP products on the market today address these criteria with varying, but generally good, success. OLAP technology is most widely known for solving the "aggregate navigation" problem, but it also provides an effective platform for interesting analytics. All OLAP products provide significantly greater flexibility for analysis than SQL does, at a cost: Each uses its own proprietary query or calculation language.

For example, Microsoft's analytic language MDX (from "multidimensional expressions") provides a facility for the power analyst to store an analytic expression in the Analysis Services database. The computed member, as these stored expressions are called, can be staggeringly complex: They can even be defined to call out to an external module that would perform multivariate time series analysis. All the functionality I discuss in this column is natively supported.

Returning to the business problem in Listing 1, the power analyst would define in the database a computed member, Price Moving Average as:

Avg(Time.CurrentMember.Lag(2):Time.

CurrentMember, [Closing Price])

The casual user of the system can drag this measure into a query. It will work correctly regardless of the requested level of aggregation and no matter which dimensions are displayed. You could create a more complex expression, transparently creating a 50-day moving average if the measure is viewed on the day level; a 10-week moving average at the week level; and a three-month moving average for data viewed by month.

Listing 3 shows a full query that uses this predefined measure.

No matter how complex the MDX of the computed measure, its use is straightforward. The MDX-enabled query and reporting tool can be designed to graphically present elements on rows, columns, and pages; the numeric measures that fill in the grid are computed by the Analysis Services code.

I don't mean to trivialize the complexity of MDX and the other analytic languages. They are difficult syntaxes to master. I have seen MDX that is breathtakingly complex, and there are few tools to help develop analytic measures. Again, the saving grace is that the diff icult work needs be done only once, and is easily shared with many.



Rate This Article

Comments:

Optional e-mail address:

Looking to the future, we can expect to see a broader use of analytic systems in closed-loop applications. Query and reporting requirements are well served by languages and programming interfaces that were designed to support ad hoc analysis. Similarly, the "graduate level" statistical requirements of closed-loop systems call for analysis systems vendors to incorporate data-mining APIs into their products. And the data-mining algorithm developers must take this movement seriously, or risk having their tool set dismissed as too difficult to use and too awkward to integrate into operational systems.



Joy Mundy (joy@microsoft.com) was one of those "power users" until the early 1990s when she joined data warehousing on the IT side. She has built, managed, analyzed, consulted, and productized data warehouses since that time. Joy currently works on Microsoft's SQL Server product team.




RESOURCES

SQL-99 specification (SQL-3): www.anis.org

OLAP extensions discussion: ftp://jerry.ece.umassd.edu/isowg3/x3h2/1999docs,document99-154








IE Weekly Newsletter
Subscribe to the newsletter
    Email Address







InformationWeek Business Technology Network
InformationWeekInformationWeek 500InformationWeek 500 ConferenceInformationWeek AnalyticsInformationWeek CIO
InformationWeek EventsInformationWeek ReportsInformationWeek MagazinebMightyByte and SwitchDark Reading
Digital LibraryIntelligent EnterpriseInternet EvolutionNetwork ComputingNo Jitter
space
Techweb Events Network
InteropVoiceConWeb 2.0 ExpoWeb 2.0 SummitEnterprise 2.0 ConferenceMobile Business ExpoSoftware ConferenceCSI - Computer Security Institute
Black HatGTECEnergy CampMashup CampStartup Camp
space
Light Reading Communications Network
Light ReadingLight Reading EuropeUnstrungLight Reading's Cable Digital NewsConstantinopleInternet Evolution
Heavy ReadingLight Reading Live!Light Reading InsiderEthernet ExpoOptical ExpoTeleco TVTower Technology Summit
space
Financial Technology Network
Advanced TradingBank Systems & TechnologyInsurance & TechnologyWall Street & TechnologyAccelerating Wall StreetBank Systems & Technology Executive SummitBuyside Trading SummitInsurance & Technology Executive Summit
space
Microsoft Technology Network
MSDN MagazineTechNetThe Architecture Journal
space