What's Next for the Database?
AN EXCLUSIVE ROUNDTABLE DISCUSSION
HADERLE: In my view, SQL is going to be around not for years or decades, but for centuries. SQL was a response to a very important change; in the 1970s, database models were evolving through the network model, which was very complicated and aimed largely at engineering and scientific environments. SQL offered a much simpler view of the data, with a language that is about value-based relationships. Now, as we get further into this next wave we're riding, we are finding that the data out there is maintained without value-based relationships. We're faced with the old network models of navigation.
As we add this network-model orientation into SQL, it's like adding English to French: it's not a natural fit. You can extend SQL, but it becomes a bit esoteric and not as friendly. XML is evolving to become more descriptive and capable for navigational data structures and collection class needs; but honestly, there's no standard query language in there. Something like that has yet to come out of the hopper. XML evolved largely to deal with something other than the classical records used in business enterprises today, which are highly structured information sets.
Not only do I see the language living on, I see relational technology living on. XML can contain everything relational has, and then some. All the predicate handling that's in your relational database can be reused to create an XML database. So, we aren't going to see the market need for start-from-scratch XML database managers; we're going to see folks repurposing their relational database servers for use with tools sitting on top that use an XML dialect.
ULLMAN: I sort of agree, but take a slightly different slant. Object-oriented database approaches didn't really solve all that many problems that couldn't be solved by existing means. Nobody wanted to put in the investment to convert from relational for what might have been a 10 percent improvement. However, OQL [Object Query Language] offers some potential. It has evolved to look very much like SQL, but rather than having to differentiate between SQL and what's navigational or hierarchical, you can think of objects from the value point of view.
IE: While object databases, for various reasons, couldn't penetrate the relational-dominated market, some of the technology headed for the middleware layer. In fact, the middleware layer seems to be the target of many new technologies for integration and distributed intelligence. Would you say that the database server layer is pretty well defined now, and that the action is in the middle?
BROBST: I wouldn't necessarily agree with that because, as we discussed earlier, more and more functionality is moving into the database: we are bringing work to the database, rather than moving data out into the middleware. Middleware provides the ability to acquire and deliver data; it can do the sniffing that we talked about earlier.
JACOBS: I would agree with what Stephen said. I don't think it's an either/or; there's plenty of interesting activity happening in both layers, with many of the same trends toward integration and support for different functions. We used to think of transaction-processing monitors or message-oriented middleware as distinct pieces of the middleware layer. Cache capabilities then entered the picture. Now, all of these things have come together as part of the evolving definition of what an application server is. Much of the activity on both layers is synergistic.
HADERLE: I'd amplify one of Ken's points: Databases have been extended outward, so they're not just some monolithic island. We provide caching in front of the databases: caching at the JDBC or ODBC layer, at the edge of the network layer, and into application server layers that have a hand-in-hand communication with the "central server," if there even is such a thing in the architecture. These caches all are adopting database properties. They intercept the query and respond to whatever the application server's dialect is, which may be some form of OQL or an object query. Depending on the structure of the application, the system may materialize the cache. The cache could even have replication or synchronization mechanisms coordinated with the back-end set of servers to upload, with some concurrency and consistency specifications, to the cache layer.
BROBST: Databases today are not only responding to simple queries, retrievals, and updates: There's enough intelligence in the database servers that they are often the ones making decisions, creating events, and then pushing the results back out to the applications.
HADERLE: Absolutely.
BROBST: That's a significant evolutionary change and certainly affects EAI middleware developments.
IE: There's a lot of activity right now in moving BI analytics into the database engines. Where do you see this leading in the next two, three, or five years?
BROBST: Well, today, almost all data mining is done outside the database. But in two to five years, I think it will be done almost completely inside the database. There's no reason to be moving data back and forth between these two environments. Within the database, you can have the ability to take trickle feeds of data (external events) and then apply rule-based systems to create knowledge and events that need to be acted upon.
Some of the issues that Jeff brought up about being able to process huge volumes of data in real time need to be solved. People are working very hard on these problems, especially to be able to essentially fire off rules-based systems incrementally through a triggering mechanism so that you don't need to apply the whole rule base every time the system sees a new event.
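To make the incremental idea concrete, here is a minimal sketch, not something the panelists described in detail. It assumes SQLite through Python's standard `sqlite3` module, and the `events` and `alerts` tables, the `large_amount_rule` trigger, and the 10000 threshold are all hypothetical names and values chosen for illustration. The trigger evaluates its rule against the newly arrived row only, so the rule base is not reapplied to the whole table on every event.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (id INTEGER PRIMARY KEY, account TEXT, amount REAL);
CREATE TABLE alerts (event_id INTEGER, rule TEXT);

-- Hypothetical rule: flag any single event above 10000.
-- The trigger sees only the newly inserted row (NEW), so the rule is
-- evaluated incrementally rather than over the whole events table.
CREATE TRIGGER large_amount_rule AFTER INSERT ON events
WHEN NEW.amount > 10000
BEGIN
    INSERT INTO alerts (event_id, rule) VALUES (NEW.id, 'large_amount');
END;
""")


def feed(account, amount):
    """Trickle one external event into the database."""
    conn.execute(
        "INSERT INTO events (account, amount) VALUES (?, ?)",
        (account, amount),
    )


for account, amount in [("a1", 250.0), ("a2", 12500.0), ("a1", 9999.0)]:
    feed(account, amount)

alerts = conn.execute("SELECT event_id, rule FROM alerts").fetchall()
print(alerts)
```

Only the second event crosses the threshold, so only it produces an alert row. A production rule engine would need far more than a row-level predicate, but the shape is the same: the work done per event depends on the event, not on the size of the history.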
JACOBS: Real, technical integration is happening, not just marketing integration on the price list. I'm talking about having an OLAP engine inside the database that can provide SQL access to multidimensional cubes, to join those cubes to relational data and to mine those cubes. There's tremendous potential once these technologies are integrated. We will be able to move on to address entirely new problems.
IE: Before we close our discussion, we can't let this particular panel go without talking a bit about parallelism, shared disk, shared memory, and related issues that are a big part of the competitive landscape in the database industry today.
JACOBS: Well, it's no secret that there's competitive activity in this space. We're seeing some very exciting hardware trends, which are moving things quickly and attracting a lot of attention because of the potential dollar savings. First, we have storage area networks [SANs], which will allow us to manage storage in a unified way, independent of the computers and applications accessing the storage. That's one trend. Another is the emergence of low-cost, commodity computers that can be arrayed in a rack of "blades" assembled in tens or hundreds. This development will enable tremendous scalability at a low cost compared with very high-end symmetric multiprocessors [SMPs].
Now the question becomes, what sort of database architecture can you use to support applications hosted on such a platform? The whole debate, of course, has been around this notion of shared disk and shared nothing. IBM has had considerable success with its DB2 Sysplex environment for years on the mainframe, and this uses a shared disk model where all of the processors have access to the entire database. Oracle9i, with its Real Application Clusters, has adopted that model and deployed it in an open systems world on Unix, Linux, and Windows platforms. It has done so with just these commodity components I've been talking about: for example, an Intel box running a Linux cluster and accessing a single database on a SAN.
The major advantages of this approach are simplified management, the single appearance of the database close to the application and administrator, and the ability to work with packaged applications without change but on top of a scalable, dynamic, cost-effective environment. We believe that to exploit these developments in hardware architectures and address the realities of an OLTP environment, a shared cache model makes the most sense. It's certainly true that shared nothing can address decision support and data warehousing applications quite well. But I think the harder problem is to provide a highly scalable, available, and manageable environment that will run on this new emerging hardware architecture I've described.
HADERLE: In the open-systems environment we have shared nothing, largely because it makes better sense given the hardware platforms in that environment. With the OS/390, we have not just shared disk but shared everything. The DB2 OS/390 environment shares memory structures as well as disk. In fact, to get reasonably good performance given the workloads, you have to be pretty bright in terms of what you're going to put into the shared memory cache so that you don't bring the whole workload down. It takes a lot of good, thoughtful engineering to make it work.
In the end, solutions will always be a hybrid: a little shared this, a little shared that. We see hardware vendors providing "boosters," which provide some shared memory and shared disks across SMP nodes. So to me, we are in the midst of an evolution. We provide the full range and scale inside of IBM's databases, from smaller environments to great big ones.
A more interesting question is about mixed workloads. Today, systems are evolving to where they're not strictly OLTP or decision support. If you project out over the next few years, you're going to find more and more mixed workload environments. We've been able to neatly tune our designs and implementations for analytical environments, with fragmentation and partitioning in ways that are quite different from what we'd do in an OLTP environment. But now we're heading into the tricky part: How do we create a "uniform" (and I use that word carefully) system that can respond to the demands of a mixed-mode environment? I use the word "uniform" carefully because it's not clear to me that we will have one copy of the data, or one single data organization. When you organize the data one way, it works well for direct access; organize it another way, and it doesn't work well for sequential access. You have to be careful with indexes or you can just kill transaction times and incur maintenance problems.
BROBST: As CPUs continue to get faster and faster, the notion of "fat" SMPs (and I'm including NUMA [non-uniform memory architecture] here) becomes much less attractive from a database perspective. The CPUs need to be able to communicate with each other and with memory, so the whole system risks becoming increasingly imbalanced. The CPUs in a fat SMP/NUMA deployment can become underutilized; they're spending a lot of time idling, waiting for useful work. If you look at small SMPs as building blocks in a shared-nothing environment, that's when you begin to get better utilization. However, we have to remember that shared-nothing hardware does not necessarily mean shared-nothing software. Each of us has implementations that will run on reverse configurations.
But clearly, from a hardware perspective, shared nothing is much more efficient for databases. Shared-everything hardware will continue to exist for applications that have been developed with a shared-memory programming model. That's what the kids learned in school, and what applications have been working with for many, many years. But databases are fortunate to have this functional abstraction with SQL, which means that we don't need to depend on a shared-memory model to deliver performance, scalability, and other important things.
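The point about SQL as a functional abstraction can be illustrated with a toy scatter-gather query. This is a sketch under stated assumptions, not any vendor's architecture: each "node" is a separate in-memory SQLite database standing in for a machine with its own disk and memory, and `NUM_NODES`, `node_for`, `insert_order`, `total_sales`, and the `orders` table are hypothetical names invented for the example.

```python
import sqlite3
import zlib

NUM_NODES = 3  # hypothetical node count

# Each "node" owns a private in-memory database: no shared disk, no shared memory.
nodes = [sqlite3.connect(":memory:") for _ in range(NUM_NODES)]
for db in nodes:
    db.execute("CREATE TABLE orders (customer TEXT, total REAL)")


def node_for(customer):
    """Stable hash so a given customer always lands on the same node."""
    return zlib.crc32(customer.encode("utf-8")) % NUM_NODES


def insert_order(customer, total):
    """Route the row to the single node that owns this customer's partition."""
    nodes[node_for(customer)].execute(
        "INSERT INTO orders VALUES (?, ?)", (customer, total)
    )


def total_sales():
    """Scatter the aggregate to every node, then gather the partial results."""
    partials = [
        db.execute("SELECT COALESCE(SUM(total), 0), COUNT(*) FROM orders").fetchone()
        for db in nodes
    ]
    # Only these small partial results cross node boundaries, not the rows.
    return sum(p[0] for p in partials), sum(p[1] for p in partials)


for customer, total in [("ann", 10.0), ("bob", 20.0), ("cy", 5.0), ("ann", 15.0)]:
    insert_order(customer, total)

print(total_sales())
```

The caller asks a declarative question and never sees how the rows are spread out, which is why the same query could run unchanged over one node or many. Joins across partitions, rebalancing, and failure handling are where the real engineering lies, and none of that is attempted here.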
IE: Thanks very much to everyone.