CMP -- United Business Media

Intelligent Enterprise

Better Insight for Business Decisions

September 17, 2002

Access to Intelligence

The New OLAP APIs

by George Spofford

Historically, data mining and online analytic processing (OLAP) have been solely human activities: humans specify analytic models, build them, and then consume the results. But with the emergence of the Web services computing model, which involves the creation of new, lightweight plumbing for connecting disparate systems, the landscape is changing: Analytics can now be more easily interwoven with other computing interactions. (See the sidebar, "Not Just for Humans Anymore.") Humans are no longer the sole specifiers or consumers of analytic services. The possibilities, as you might imagine, are fascinating.

Until recently, the options for connecting analytic engines have been limited. Developers have been able to use a variety of vendor APIs, or Microsoft's OLE DB for OLAP (or ODBO), which is supported by some servers besides Microsoft's. But ODBO is only available on Win32, rendering it less than universally accepted for Web services or enterprise mid-tier applications. The Java version of the OLAP Council's MD-API was never accepted by server or client vendors, so non-Win32 application servers haven't had any multivendor APIs at all.

However, new choices are emerging. In this article, I'll discuss two up-and-coming APIs for accessing OLAP-related information. One, XML for Analysis (XML/A), is already at version 1.0 and will be entering version 1.1 in the near future. The other, Java OLAP Interface (JOLAP), is currently working its way through the Java Community Process, having been in public review stage from June 20 to July 19, 2002.

One benefit of each of these APIs that deserves mentioning up front is their support for non-Win32-based mid-tier applications, whether you wish to consider such apps "Web services" or not; however, they are both suitable for other purposes. For client-side applications, each API also provides at least the benefit of enabling access to a previously unavailable choice of server types. Along the way, I'll discuss what the APIs provide, and how a range of vendors view the APIs with regard to their own interests.

Executive Summary

Web services offer the possibility of breaking down proprietary barriers between platforms as well as resolving differences between analytic and transactional systems. Two new APIs — JOLAP and XML for Analysis — have been proposed as a means of bringing these benefits to reality. Here's what they bring to the table.

The XML for Analysis Approach

XML/A is a Simple Object Access Protocol (SOAP)-based API derived from, and encompassing, Microsoft's ODBO. Its main related specifications are SOAP and the entirety of ODBO, to which it adds SOAP actions and XML schemas for the ODBO data structures, with a few enhancements.

SOAP provides the logical mechanism for transmitting a client request to a server and retrieving information. A client may make only two SOAP actions: Discover (to retrieve metadata) and Execute (to issue DDL or DML). Because XML/A is based on SOAP, providers have latitude to use a variety of transports for communication, but HTTP is likely to be the one supported by the most vendors of XML/A providers. Also, like HTTP, XML/A is designed to be stateless, with a minimalist and lightweight approach to handling sessions.
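As a rough illustration of the two actions, the following Python sketch builds (but does not send) one Discover and one Execute request. The namespace URI, the element names, and the MDSCHEMA_CUBES request type are given here as they are commonly documented for XML/A 1.0 and should be checked against the specification; the data source string, the catalog name, and the MDX statement are placeholders, not values from any real provider.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
XMLA_NS = "urn:schemas-microsoft-com:xml-analysis"  # assumed XML/A namespace


def _x(tag):
    """Qualify a tag name with the XML/A namespace."""
    return f"{{{XMLA_NS}}}{tag}"


def _envelope(action):
    """Create a SOAP envelope whose body holds a single XML/A action element."""
    env = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(env, f"{{{SOAP_NS}}}Body")
    return env, ET.SubElement(body, _x(action))


def _add_list(parent, outer, inner, items):
    """Add <outer><inner><name>value</name>...</inner></outer> under parent."""
    lst = ET.SubElement(ET.SubElement(parent, _x(outer)), _x(inner))
    for name, value in items.items():
        ET.SubElement(lst, _x(name)).text = value


def discover_request(request_type, restrictions, props):
    """Build a Discover request, which asks the provider for metadata."""
    env, act = _envelope("Discover")
    ET.SubElement(act, _x("RequestType")).text = request_type
    _add_list(act, "Restrictions", "RestrictionList", restrictions)
    _add_list(act, "Properties", "PropertyList", props)
    return ET.tostring(env, encoding="unicode")


def execute_request(statement, props):
    """Build an Execute request, which carries a DDL or DML statement."""
    env, act = _envelope("Execute")
    ET.SubElement(ET.SubElement(act, _x("Command")), _x("Statement")).text = statement
    _add_list(act, "Properties", "PropertyList", props)
    return ET.tostring(env, encoding="unicode")


# Placeholder connection details (hypothetical).
common = {"DataSourceInfo": "Provider=ExampleOLAP", "Catalog": "SalesDemo"}

discover_xml = discover_request(
    "MDSCHEMA_CUBES", {"CATALOG_NAME": "SalesDemo"}, {**common, "Format": "Tabular"}
)
execute_xml = execute_request(
    "SELECT [Measures].MEMBERS ON COLUMNS FROM [Sales]",
    {**common, "Format": "Multidimensional"},
)
```

Because each request carries its own properties, nothing in either message depends on an earlier call, which is consistent with the stateless design described above.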

By virtue of its basis in ODBO, many of the OLAP aspects of XML/A will be familiar to developers who already use ODBO. The same MDX dimensional modeling language is used for queries and DDL, for example, and the same essential data structures are used for representing tables of OLAP metadata. XML/A itself occupies a lower level than ODBO, as an XML string by itself still needs parsing to become a data structure that resembles ODBO data structures. A DOM object representation of a query result or metadata set resembles the data structures of ODBO. Most developers who use ODBO today use Microsoft's ADO MD library because it provides simpler, high-level access. Microsoft will introduce versions of ADO MD for XML/A as well; I'll talk about that later.

Earlier, I stated that XML/A is an API for "accessing information." As with OLE DB for OLAP, a client can request multidimensional information either in a relational recordset format or as a multidimensional cellset. However, the specification is sufficiently open-ended that other provider types can be connected to and accessed without violating the specification.
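To make the difference between the two result shapes concrete, here is a small Python sketch that flattens a toy cellset into recordset-style rows. It is a conceptual illustration only: the data layout and the function name are invented for this example and do not reproduce the flattening rules that ODBO or XML/A actually define.

```python
from itertools import product


def flatten_cellset(axes, cells):
    """Turn a toy cellset into a list of flat rows.

    axes  : list of (axis_name, [member, ...]) pairs
    cells : dict mapping one member per axis (as a tuple) to a value
    Empty cells are skipped, as a sparse result would do.
    """
    names = [name for name, _ in axes]
    rows = []
    for coords in product(*(members for _, members in axes)):
        if coords in cells:
            row = dict(zip(names, coords))
            row["value"] = cells[coords]
            rows.append(row)
    return rows


# Hypothetical 2 x 2 cellset with one empty cell.
axes = [("Time", ["2001", "2002"]), ("Product", ["Bikes", "Helmets"])]
cells = {
    ("2001", "Bikes"): 120.0,
    ("2002", "Bikes"): 150.0,
    ("2002", "Helmets"): 40.0,
}

rows = flatten_cellset(axes, cells)
```

The cellset keeps the axes separate from the cell values, while the flattened form repeats the member names on every row, which is what a relational consumer expects.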

Whereas Microsoft's ODBO and OLE DB for Data Mining are two separate APIs, XML/A as an analytic query and result transport spec will cover the functionality of both. (OLE DB for Data Mining isn't currently part of XML/A, but the vendors involved are working to incorporate its functionality into a future version.) Microsoft views XML/A as a generalized, extensible framework that may well be used for query and reporting tasks as well. Furthermore, nothing prevents a client from connecting to a relational provider, issuing a SQL query, and obtaining a rowset back. The key is simply that other analytic interactions, such as relational reporting, creating GIS queries, or solving an optimization model, can be handled through this framework. XML/A doesn't specify data structures or interaction modes that preclude them taking place through an XML/A framework, but it currently doesn't explicitly support them.

XML/A version 1.0 provides XML modeling of metadata and query results, albeit as relatively simple arrays, and specifies MDX as the query language. However, a markup form of MDX called mdXML is currently under development. As of this writing, mdXML isn't yet finalized, but it will function as an object description of MDX queries. The chief benefit of a markup form is to provide a programmatic query object model to all developers and provide a mechanism for different tiers of software to contribute to analytic queries. As a textual interface, it isn't clear that mdXML will enhance MDX semantics. However, developers can programmatically manipulate a DOM representation of mdXML to create and modify queries and can also use XPath and DOM manipulations to incorporate prior queries and query results into portions of new queries.
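The kind of manipulation a markup query form makes possible can be sketched as follows in Python. Because mdXML is not final, the element and attribute names below (query, axis, set, slicer) are purely hypothetical stand-ins and should not be read as the mdXML schema; only the general idea of editing a query tree and then rendering it as MDX is being shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for a query; NOT the mdXML schema.
QUERY = """
<query cube="Sales">
  <axis name="COLUMNS"><set>[Measures].MEMBERS</set></axis>
  <axis name="ROWS"><set>[Time].[Year].MEMBERS</set></axis>
</query>
"""


def to_mdx(query):
    """Render the hypothetical query tree as MDX text."""
    axes = [f"{a.find('set').text} ON {a.get('name')}" for a in query.findall("axis")]
    mdx = f"SELECT {', '.join(axes)} FROM [{query.get('cube')}]"
    slicer = query.find("slicer")
    if slicer is not None:
        mdx += f" WHERE ({slicer.text})"
    return mdx


query = ET.fromstring(QUERY)

# One tier of software swaps the row set, located with an XPath-style path.
query.find("axis[@name='ROWS']/set").text = "[Product].[Category].MEMBERS"

# Another tier contributes a slicer without touching the axes.
ET.SubElement(query, "slicer").text = "[Time].[2002]"

mdx = to_mdx(query)
```

Neither tier has to parse or regenerate MDX text to make its change; each edits only the part of the tree it cares about.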

One area of concern for adopters as well as implementers is the "wordiness" of XML compared to binary formats, particularly as that wordiness affects data communications. Straightforward XML consumes more network bandwidth in transport than binary protocols for two reasons: binary data must be represented as text, which frequently makes it larger, and the opening and closing tags themselves may consume more space than the equivalent markers in binary wire protocols. Vendors at work on XML/A expect to leverage techniques under development for improving XML communications in general.
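A quick Python sketch shows both effects on a handful of numeric cells. The Cell and Value tag names and the binary layout (a 4-byte count followed by 8-byte doubles) are assumptions made for the comparison, not formats taken from XML/A or from any vendor's wire protocol.

```python
import struct

values = [1234.5, 0.25, 98765.4321, 42.0]

# Text form: every value is written as digits and wrapped in tags.
xml_payload = "".join(
    f"<Cell><Value>{v!r}</Value></Cell>" for v in values
).encode("utf-8")

# Binary form: a 4-byte little-endian count, then one 8-byte double per value.
binary_payload = struct.pack(f"<I{len(values)}d", len(values), *values)

xml_size = len(xml_payload)
binary_size = len(binary_payload)  # 4 + 4 * 8 bytes
```

Here the tags alone account for 28 bytes per cell, against 8 bytes for the entire binary value.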

Schema-based XML compression dramatically reduces the number of bytes transmitted between machines based on knowledge of the data structures used. When the savings in transmission time outweigh the CPU demands of compression and decompression (more likely in a WAN environment), this could be very helpful. However, as some of the client vendors I spoke with agreed, the lack of standard compression in XML means the benefits only appear when a single vendor supports the client and server sides. Work is also under way on binary encoding of XML data, which could eliminate the need to represent some of the information in textual form as well. In the medium to long term, that approach could ease any performance issues that arise in use.
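The trade-off can be explored with a few lines of Python. Note that this sketch uses general-purpose deflate compression from the standard zlib module, not a schema-based scheme, so it only approximates the kind of savings described above; the row structure and values are made up.

```python
import zlib

# 500 structurally identical rows, as a repetitive XML result might contain.
rows = "".join(
    f"<row><Year>2002</Year><Product>P{i:04d}</Product><Sales>{i * 10}</Sales></row>"
    for i in range(500)
)
raw = f"<rowset>{rows}</rowset>".encode("utf-8")

packed = zlib.compress(raw, 9)       # highest compression level
restored = zlib.decompress(packed)   # the receiver pays this CPU cost

ratio = len(packed) / len(raw)       # fraction of the original size sent
```

Both sides must agree on the compression used, which is the interoperability concern the client vendors raised: without a standard, only a matched client and server can rely on it.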

The Juice on JOLAP

JOLAP is Java-based, and all aspects of the API are object-based. The specification provides an object model for metadata traversal, query results, and query construction. Notably, no textual query language is specified. Instead, the object model provides classes that can be combined to specify selections and orientation for results.

JOLAP (Java Specification Request-69) leverages other specifications, including the OMG specs for Common Warehouse Metamodel (CWM), Meta Object Facility (MOF) and XML Metadata Interchange (XMI), and the Java Community Process' Java Metadata Interface (JMI, JSR-40). It is designed to be compatible with both J2EE and J2SE environments; connection and transaction facilities are patterned after the J2EE Connector Architecture. Developers who are already invested in the other related standards, particularly CWM and JMI, will benefit from the ability to leverage code and extend functionality into OLAP-specific areas.

JOLAP provides a fairly rich mechanism for describing and obtaining metadata. Dimensions, cubes, hierarchies, levels, attributes, and the associations among dimensions and cubes are metadata objects obtained from a root schema object. Objects describing members are further queried from dimensions and their components. Resource management via cursors is available for all actions that retrieve members, so the API is suitable for retrieving member information from levels having hundreds of thousands of members. (Microsoft's ADO MD provides an object model for metadata as well, but the JOLAP model is somewhat richer and better tuned for resource usage.)
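The value of cursor-based retrieval for very large levels can be sketched in Python as follows. The MemberCursor class, its page size, and the fake_fetch function are hypothetical and are not part of the JOLAP API (which is a Java interface); the sketch only shows the general pattern of pulling members in pages as the client consumes them.

```python
from itertools import islice
from typing import Callable, Iterator, List


class MemberCursor:
    """Hypothetical cursor that pulls members from a server one page at a time."""

    def __init__(self, fetch_page: Callable[[int, int], List[str]], page_size: int = 1000):
        self._fetch_page = fetch_page
        self._page_size = page_size

    def __iter__(self) -> Iterator[str]:
        offset = 0
        while True:
            page = self._fetch_page(offset, self._page_size)
            yield from page
            if len(page) < self._page_size:
                return  # a short (or empty) page means the level is exhausted
            offset += len(page)


# Stand-in for a server-side level with 250,000 members.
LEVEL = [f"Customer {i}" for i in range(250_000)]
fetch_offsets = []


def fake_fetch(offset, count):
    """Pretend to be the server; records which pages were requested."""
    fetch_offsets.append(offset)
    return LEVEL[offset:offset + count]


cursor = MemberCursor(fake_fetch, page_size=1000)
first_five = list(islice(cursor, 5))
```

A client that displays only the first screen of members triggers a single page fetch rather than transferring the whole level.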

Query object models and languages are critical for developers who use an analytic API. Every dynamic client tool has some internal model for structuring of query components as well as metadata, so that components of queries can be assembled into a whole and modified. JOLAP models all queries as collections of objects designed to reflect GUI gestures in constructing and refining a query. There is no subsequent step required of an application to traverse the objects and generate a textual query.
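To give a feel for queries modeled as collections of objects, here is a deliberately simplified Python sketch. The class and function names (CubeQuery, Edge, DimensionSelection, drill_down) are invented for illustration and are not the JOLAP classes; the cube, dimension, and member names are placeholders.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DimensionSelection:
    dimension: str
    members: List[str]


@dataclass
class Edge:
    name: str  # for example "columns" or "rows"
    selections: List[DimensionSelection] = field(default_factory=list)


@dataclass
class CubeQuery:
    cube: str
    edges: List[Edge] = field(default_factory=list)

    def edge(self, name: str) -> Edge:
        """Return the named edge, creating it on first use."""
        for e in self.edges:
            if e.name == name:
                return e
        e = Edge(name)
        self.edges.append(e)
        return e


def drill_down(query, edge_name, dimension, member, children):
    """Replace one selected member with its children, as a drill-down gesture would."""
    for sel in query.edge(edge_name).selections:
        if sel.dimension == dimension and member in sel.members:
            i = sel.members.index(member)
            sel.members[i:i + 1] = children


# GUI gestures translated directly into object changes.
query = CubeQuery("Sales")
query.edge("columns").selections.append(DimensionSelection("Measures", ["Unit Sales"]))
query.edge("rows").selections.append(DimensionSelection("Time", ["2001", "2002"]))
drill_down(query, "rows", "Time", "2002", ["2002-Q1", "2002-Q2"])

row_members = query.edge("rows").selections[0].members
```

The query object itself is what would be handed to the engine; at no point does the application assemble a query string.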

The JOLAP object model is conceptually similar in many ways to the exposed object models of most if not all OLAP client tools. The object model is a double-edged sword: enhancements to query capabilities require library updates. However, using any enhancements to an underlying textual query language would also require updates to the client software, so the need for library updates is more of a synchronization issue.

JOLAP provides one very interesting feature for support of interactive clients: The transactions on the object constructs that define the queries allow client applications to conveniently manage the modification and rollback of portions of a query, such as a series of drill-downs and modifications to filters. Reversing a sequence of changes and picking an earlier query state as a point of departure for a new path can be managed by the API, as opposed to user code. This technique lets the client spend less code on managing query state, and the API implementation may be able to leverage knowledge of the connection between two states to optimize data transfer.
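The idea of rolling a query definition back to an earlier state can be sketched in Python with a simple savepoint helper. This is a hypothetical, client-side stand-in written for illustration; it is not how JOLAP implements its transactions, and the class name, method names, and the dictionary used for the query state are all assumptions.

```python
import copy


class QueryTransaction:
    """Hypothetical helper that snapshots a query definition at savepoints."""

    def __init__(self, state):
        self.state = state
        self._savepoints = []

    def savepoint(self):
        """Record the current state and return a handle to it."""
        self._savepoints.append(copy.deepcopy(self.state))
        return len(self._savepoints) - 1

    def rollback_to(self, handle):
        """Restore an earlier state and discard any later savepoints."""
        self.state = copy.deepcopy(self._savepoints[handle])
        del self._savepoints[handle + 1:]


tx = QueryTransaction({"rows": ["2002"], "filters": []})

start = tx.savepoint()
tx.state["rows"] = ["2002-Q1", "2002-Q2", "2002-Q3", "2002-Q4"]  # drill down
tx.savepoint()
tx.state["filters"].append("Region = West")                       # add a filter

# Abandon that path and branch off from the starting query instead.
tx.rollback_to(start)
tx.state["filters"].append("Region = East")
```

When the API provides this behavior itself, bookkeeping of this kind no longer has to live in user code.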






