Center of the Universe: The Microsoft View
The vision of 21st-century data management as told through Web-exclusive interviews with key strategists at IBM, Microsoft, and Oracle
by Ken North

Microsoft researchers Jim Gray and Michael Rys shared with Ken North the Microsoft view on data management topics, including SQL and XML integration, the importance of metadata, the integration of messaging, transactions and databases, and the future of grid data access. Gray, who received the ACM Turing Award for his work on transaction processing, focuses on databases at Microsoft, is active in the research community, edits a series of books on data management, and has been active in building the online databases TerraServer (terraservice.net) and the Sloan Digital Sky Survey (skyserver.sdss.org). Rys focuses on information integration and on the database integration, data model integration, and query language aspects of XML. He is Microsoft's representative to the W3C XQuery working group and other standards groups.

North: Critics of SQL often point to an impedance mismatch between object-oriented languages and SQL databases because of different type systems and computational models. Does adding XML (hierarchical data) to the mix introduce yet another impedance mismatch for developers using objects and SQL databases? Is an XML-centric database programming language the solution?

Gray: There are two parts to your question: the SQL/OO [object-oriented] impedance mismatch and the SQL/XML data model mismatch. Let's take them in turn. SQL is a language for defining relations and operations on relations. Relations are sets of records, and records are vectors of values. So far, so good. When SQL starts defining values, it gets into deep water. SQL has numbers, strings, time, and a few other basic types, but essentially no constructors. SQL:1999 introduced an object model, but that object model doesn't match any of the popular OO languages. The next generation of Microsoft SQL Server, codenamed Yukon, addresses this problem by integrating the SQL system with the Common Language Runtime (CLR, pronounced "clear").
CLR is the native type system of all the tools on the Windows platform. In this new world, there's no difference between the SQL types and the CLR types. Any CLR type is an SQL type. SQL does relational operators and records; indeed, SQL just implements the relation class. In this new world, there's no inside-the-database/outside-the-database dichotomy. There's just a persistent object store. It may well be that this puts the impedance-mismatch discussion to bed. Michael is the XML expert. I leave the second point to him.

Rys: The SQL/XML data model mismatch has two major aspects: XML mapping to SQL and SQL mapping to XML. The SQL to XML mapping is fairly easy in most cases, because the SQL data model is mostly a subset of the XML model with the need to map the SQL types into XML types. Both SQL Server 2000 and Yukon provide several ways to generate XML schemas and data from SQL.

The XML to SQL mapping has two different usage scenarios. [In one,] XML is the model to represent nonrelational data (such as document or semi-structured data). [In the second,] XML is the transport model of relational data sets. The first scenario is addressed by extending the SQL value space with an XML datatype (both the ANSI standard SQL:2003 and Yukon provide this). This allows SQL databases to operate on native XML data without the need for shredding or marshalling. It bypasses the impedance mismatch by explicit mappings of the more complex XML model to the relational model.

In the second scenario (XML as a transport), the impedance mismatch on the data model really doesn't exist on the data semantic level. The data is relational as Jim described. XML is the universal encoding. It can represent XML, of course, but it can also represent complete SQL databases. Indeed, this is the basis of the .Net dataset: a collection of recordsets connected via foreign keys and described by an XML schema.
Because it encodes data in XML that is best represented relationally, we also need clever, schema-driven mappings that provide the marshalling when XML is being used as a transport format of relational data and not as the primary data model. SQL Server 2000 provides such mapping mechanisms already through the SQLXML 3.0 technologies and will continue to do so. Even relational views over nonrelational XML data are provided in SQL Server. Eventually, such mappings will be done behind the scenes but, in order to guarantee the efficient processing of the relational-data case, the decision still needs user intervention.

North: Do XQuery and XPath present an impedance mismatch problem for SQL, particularly when formulating queries for parallel execution?

Gray: This is Michael's expertise. I don't think there's an impedance mismatch, and I think many of our old tricks (parallelism, cost-based optimizers, hash joins, transactions, and lock granularity) apply to XML, but certainly we need to invent some new techniques as well.

Rys: Jim is correct. There's not an impedance mismatch for query formulation if you take a declarative query language such as XQuery. Many existing tricks can be applied. For example, XQuery's FLWR iterator is well suited for parallelization. However, there's a need to add more tricks in the context of SQL. These include allowing sub-cell locking granularity on an XML datatype instance and more operators for the optimizer to deal with, such as document order, nesting, and un-nesting. Because the XML model has some performance-critical characteristics such as order-preservation and sequences, we also need clever strategies to provide well-performing XML processing.

North: The major software vendors have adopted common specifications (such as XML and Web services). One area of competition is development and deployment platforms. Some companies offer a tight integration of their SQL database with message queues and messaging software.
Does this provide an inherent advantage for Web services and other XML messaging applications? How do .Net and SQL Server approach the problem of integrating databases, queuing and messaging?

Gray: Yes, the story here is proprietary implementations of standard specifications. Unfortunately, the SQL Server solution in this space is an unannounced product, so we can only speak about it in vague terms. It's going to be great! Yes, we expect tight integration of Visual Studio, SQL Server, XML, Web services, and some other components. I predict that you'll be astonished at how tight the integration is. Microsoft has concluded that the only way to make it easy to build apps is to reuse a few core concepts in all parts of the system. This means that everything gets integrated with everything.

North: Different solutions have been proposed for collaborative Internet transactions, Web services and grid services, including compensating transactions and document-centric (XML) transactions. What direction is Microsoft taking for collaborative transactions?

Gray: Microsoft is a big company with some brilliant people. They're not all of one mind. Transactions and workflow is an area that has been in ferment for decades. Everyone can agree on atomic flat transactions. Most people can agree on save points. But, even there, different religions emerge. When you get to workflow, there's an enormous diversity. Everyone is for workflow, everyone is for compensation, everyone is for sagas, and everyone is for contracts. But, they differ on the details. Today, right this minute, BizTalk provides a sagas and contracts model (complete with commit and abort dependencies and compensation). We are helping define the Business Process Execution Language (BPEL) as a way of scripting and exchanging flows. Microsoft is actively participating in the WS-Transaction and WS-Coordination specs. These are underlying plumbing that can support BPEL.
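
[Editor's illustration] For readers unfamiliar with the saga and compensation ideas Gray mentions, the following is a minimal sketch of the general pattern only: each step carries an undo action, and when a later step fails, the steps already completed are compensated in reverse order. It is not BizTalk's model or any BPEL or WS-Transaction API. The names Step, run_saga, and demo, and the ticket-booking scenario, are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One unit of work in a saga, paired with the action that undoes it."""
    name: str
    action: Callable[[], None]
    compensate: Callable[[], None]


def run_saga(steps: List[Step]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse.

    Returns True if every step succeeded, False if the saga was rolled
    back through compensation. (Illustrative sketch, not a product API.)
    """
    completed: List[Step] = []
    for step in steps:
        try:
            step.action()
        except Exception:
            # Undo already-completed work, most recent step first.
            for done in reversed(completed):
                done.compensate()
            return False
        completed.append(step)
    return True


def demo() -> List[str]:
    """Hypothetical booking flow in which the payment step fails."""
    log: List[str] = []

    def fail_payment() -> None:
        raise RuntimeError("payment declined")

    steps = [
        Step("reserve", lambda: log.append("reserve seat"),
             lambda: log.append("release seat")),
        Step("charge", fail_payment,
             lambda: log.append("refund card")),
        Step("ship", lambda: log.append("ship ticket"),
             lambda: log.append("recall ticket")),
    ]
    ok = run_saga(steps)
    log.append("committed" if ok else "compensated")
    return log


if __name__ == "__main__":
    print(demo())
```

Unlike an atomic flat transaction, nothing here is isolated: the reserved seat is visible to others until the compensation runs, which is one of the details on which, as Gray notes, the competing models differ.
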

Rys: I think it's also important to note that workflow-centric models, multilevel transaction models, contract-based models, and so on all have application areas in which they are the most natural and most scalable. For collaborative Internet transactions to be interoperable, they must have several characteristics. They need to allow a loosely coupled architecture and they must provide plug and play of components and scalability. I think a research breakthrough is needed to give this space the simplicity and scalability of the atomic flat-transaction model. Relaxing some of the so-called ACID properties of the flat-transaction model is necessary. Compensation will be part of it. But current workflow proposals lack sufficiently automatic conflict detection and resolution mechanisms.

North: What direction is Microsoft taking on grid computing and grid data access?

Gray: Grid computing is Internet-scale distributed computing. As such it needs discovery, naming, security, resource management, invocation, queuing, and all the other services that make up a distributed computing system. Microsoft is fully committed to the Web services model of distributed computing and has been working closely with IBM, BEA, and others to define the W3C standards for these services. We've been building that infrastructure into products since 1996. .Net is the brand name for Microsoft's implementations of these open standards. The .Net products started rolling out in 2001 and are now arriving in bulk. We still have a lot of unannounced products in the pipeline. Still, the .Net tools are a reality today. I see my colleagues being very productive in building Web services with the Microsoft toolkits, and the resulting services interact well with the Apache Axis platform.

I'm more focused on the data grid (publishing and finding data) and the collaboration grid (enhancing communication among people) than I am on the computation grid (harvesting spare CPU cycles).
But, I think all three initiatives are scientifically interesting.









