
Intelligent Enterprise

Better Insight for Business Decisions





May 9, 2002

What's Next for the Database?

AN EXCLUSIVE ROUNDTABLE DISCUSSION
Strategic business applications count on the database to be rock solid, always available, and yet highly flexible in the face of new demands. A discussion with four industry veterans reveals some key directions.

By David Stodder and Justin Kestelyn

Stephen Brobst
Chief Technology Officer
NCR, Teradata Division

Don Haderle
VP, Database Technology
IBM Corp.

Ken Jacobs
VP, Product Strategy Server Technologies Division
Oracle Corp.

Jeffrey Ullman
Professor of Computer Science
Stanford University

The database: Few modern applications would survive without this data-thumping beast. While the market has consolidated, present and future demands upon this technology show no signs of slowing. Very large databases, once the exclusive province of very large organizations, are becoming commonplace. Centralized databases are giving way to distributed, networked systems, which will undergo even more change as Web services employ XML to speed data and analytics around the Internet.

For this special data management issue, we brought together four individuals, each with a sizeable base of experience in the field, to discuss present and future directions. Stephen Brobst is chief technology officer of NCR's Teradata division. He joined Teradata in 1999 from Strategic Technologies and Systems, a consulting firm he founded to specialize in the design and construction of data warehouse solutions. IBM Fellow Donald J. Haderle is vice president of Database Technology at IBM. Haderle served as the founding and chief architect of DB2 in the 1970s and 1980s and continues to be the guiding force as IBM moves advanced research technology into DB2 Universal Database. Ken Jacobs is vice president of Product Strategy for Oracle's Server Technologies Division. Jacobs, who recently reached his 20th year with Oracle, is responsible for technology planning activities for the Oracle database product family. Jeffrey D. Ullman is Stanford W. Ascherman Professor of Computer Science at Stanford University. Ullman is the coauthor of two leading database texts as well as numerous other books and articles. He has long been an influential figure in information technology research.

IE: Over the course of 30 or 40 years now, advances in database technology have served as change agents in information systems. Likewise, database technology has had to adjust to dramatic changes in IT architectures, such as client/server and the Internet. From our current vantage point, what problems in database technology would you consider "solved," and what remain "unsolved"?

HADERLE: I would say that basic OLTP [online transaction processing] and batch decision support are "solved," although these systems continue to be challenging as they scale in size, number of users, and so forth. What concerns folks most today is the integration of all their information assets, not simply for accessing information but for discovering things about it. Businesses want integration at the semantic level so that they can bring together information from across departments and possibly across businesses. That's why XML is rising in importance. The types of data businesses are dealing with are more than just simple structured records; they're dealing with documents, spreadsheets, voice, and other types of information.

Thirty years ago, we had the "application backlog." People talked about having five years of application backlog. IT needed to respond to business requirements much faster, and that's something relational databases were able to address. We've been able to speed up the capability to deliver applications, but even today, the backlog is still at the top of the list. We still need to respond to business requirements faster. Web services and the whole set of related initiatives are really about moving up a notch in being able to encapsulate and describe assets and move more quickly to deploy applications.

JACOBS: I would second much of what Don said. However, I think it's always hard to say things are "solved." Even within OLTP, there's plenty of room for innovation and, indeed, unconventional approaches to solving problems. If you go back and look at some of the work that Don and Jim Gray did in the early 1970s or before with transaction management, things like concurrency control have a nice formal, according-to-the-book definition; but in fact, even in the last few years, there's been a lot of work done on unconventional concurrency models. Oracle, for example, has been distinctive in not implementing a locking-based model.

So, I think there's room for innovation. It's not just a matter of incrementally improving availability or scalability. We need to look a bit beyond — at what enables one system to address a certain set of requirements, while another cannot.

BROBST: What I think hasn't been solved is the ability to combine decision support workloads against the single source of truth for an organization. I mean, we can talk about it in theory, but if you look at practical implementations at a very, very large scale, it hasn't happened yet. I would also consider manageability an unsolved problem — particularly the ability to automate management and maintain high availability. It still takes a great deal of human work to design, architect, and deploy these systems when it really should be possible to automate much of the work.

IE: It does seem that, for a fairly mature technology, databases still require significant human effort to design, build, and maintain.

JACOBS: Yes, Stephen made a great point. We are actually seeing a shift right now, where systems are getting the ability to automate things, simplify what can't be automated, and do things like enable applications to recover on their own. We are building features that exploit the intelligence of the tools rather than of the human being.

HADERLE: Availability is just one focus of automation; another extremely important goal is to employ automation to make sure systems are operational. Trying to set up and manage, for example, an SAP AG R/3 system with 20,000 tables and 40,000 indexes is an awesome task. Performance management of this kind of system is a huge consumer of human intelligence and time. At IBM, we have an ongoing learning optimization research project called eLiza that will help us automate adjustments in configuration parameters, memory space allotment, and schemas in ways that accept the fact that, by golly, the tables are going to change over time.

IE: "Real time" has become a kind of rallying cry across the business intelligence (BI) industry, with users demanding access to data much closer to the time it was recorded or updated. Event-driven applications and role-based workflow are also potentially changing how systems work with databases. Should we expect a lot of innovation in database systems around the concept of "real-time" information?

BROBST: It's already happening. At Teradata, we have one customer who feeds a data warehouse directly from point-of-sale systems in many, many retail stores across North America. Based on data coming from each individual cashier, they can use pattern matching to determine whether fraud is taking place; if it's detected, they can page security people and take that cashier out of action within an hour.

HADERLE: In relational databases we have triggers for active event or alert management. But you have to consider that the vast majority of stored information is still in nonrelational applications. I think we will see things evolve from EAI [enterprise application integration]; that is, a rich set of capabilities wrapped around the ability to intercept messages to and from application systems. We'll have "sniffers," if you will, on the wire (on a ticker-tape system, for example) working with a publish-and-subscribe mechanism to make event-driven information available to users. We can expand that thought to Web services, which is a very neat construct for identifying the set of services that allow us to encapsulate a set of information systems and make things publicly available using open directories and registries.

Federated structures will be important to real-time data access. At the back end, we will have rich query systems with parallelization and integration with relational operators to allow people on the front end to work with standard SQL. In a federated system, relational engines become the consumers of information from a diverse set of local data sources. Creating real-time access is about putting sniffers where the data is located and generated; from there, the federated system pumps the data to an engine for transformation, cleansing, and integration with other information assets.
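
As a rough illustration of the "sniffer" plus publish-and-subscribe idea described above, here is a minimal in-memory sketch. Everything in it is hypothetical and invented for illustration: the `EventBus` class, the `sniff` function, the message fields, and the 10,000-share threshold. A production system would sit on real messaging middleware rather than a Python dictionary.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List

Handler = Callable[[dict], None]


class EventBus:
    """Tiny in-memory publish-and-subscribe hub (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        # Register a callback that runs for every event published on this topic.
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        # Deliver the event to every subscriber of the topic, in order.
        for handler in self._subscribers[topic]:
            handler(event)


def sniff(bus: EventBus, raw_messages: Iterable[dict]) -> None:
    """Play the 'sniffer': inspect each message on the wire and republish it by type."""
    for msg in raw_messages:
        bus.publish(msg["type"], msg)


bus = EventBus()
alerts: List[dict] = []


def large_trade_alert(event: dict) -> None:
    # Assumed business rule: only trades of 10,000 shares or more are interesting.
    if event["qty"] >= 10_000:
        alerts.append(event)


bus.subscribe("trade", large_trade_alert)

# Assumed sample of messages passing by on a ticker-style feed.
sniff(bus, [
    {"type": "trade", "symbol": "IBM", "qty": 500},
    {"type": "trade", "symbol": "ORCL", "qty": 25_000},
    {"type": "quote", "symbol": "NCR", "bid": 38.1},
])

print(alerts)
```

The subscriber never touches the source application; it only sees events the sniffer republishes, which is the decoupling the publish-and-subscribe mechanism is meant to provide.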

JACOBS: Web services are clearly going to play a very big role in all IT systems going forward; but there are other kinds of integration taking place. There's consolidation, centralization, and globalization going on that will help people better manage their systems thanks to fewer moving parts. The information is all in one place, or in fewer places. Federation is one choice; consolidation and centralization are another.

Certainly, the integration of SQL and XML and having a single repository that can store both relational and XML data will give businesses a great deal of power. You will be able to do data mining or other BI activities on XML data. Of course, this integration is going to be critical to Web services.

Another important type of integration is to bring data mining, ETL [extract, transform, and load], and OLAP technologies into a database engine. This is going to shorten the information cycle, bringing us closer to the real-time notion that you suggested. At one large European telco that I'm familiar with, they're capturing all the phone calls in their country and doing analytics within an hour of data capture. This allows them to make decisions quickly. For e-commerce, integrating these technologies will allow companies to look at B2B or B2C transactions to see current and historical behavior and then use data mining algorithms and models to predict future behavior and make recommendations.

ULLMAN: There's another style of integration we should consider: the creation of logical integration products. This is what Gio Wiederhold called "mediators." Since Sept. 11, we've heard it said that there should be an integrated product that has access to the enrollments at every school in the world. Then, you could ask a query like, "How many guys are enrolled in flight schools and don't have any affiliation with airlines?" Answering this query would require a logical transformation of the semantics of thousands of databases, featuring all kinds of formats. With a mediator, we could focus on the common concepts (a student, and so on). We could take queries at a central site and translate them into queries at thousands of participating sites.
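
The mediator idea can be sketched in a few lines: one question posed in terms of a common concept is translated into a differently shaped query for each participating site. The sketch below assumes two hypothetical sites held in in-memory SQLite databases; the table names, column names, and sample rows are invented for illustration, and a real mediator would generate the per-site queries from schema mappings rather than hard-code them.

```python
import sqlite3


def make_site(ddl, insert_sql, rows):
    """Create one hypothetical participating site as an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    return conn


# Site A records an employer name; NULL means no known affiliation.
site_a = make_site(
    "CREATE TABLE students (name TEXT, employer TEXT)",
    "INSERT INTO students VALUES (?, ?)",
    [("Ann", "Acme Air"), ("Bob", None)],
)

# Site B uses a different schema: an optional airline identifier.
site_b = make_site(
    "CREATE TABLE enrollee (full_name TEXT, airline_id INTEGER)",
    "INSERT INTO enrollee VALUES (?, ?)",
    [("Cy", None), ("Di", None), ("Ed", 7)],
)

# The mediator's mapping: the common concept "student with no airline
# affiliation" expressed in each site's own schema.
SITES = [
    (site_a, "SELECT COUNT(*) FROM students WHERE employer IS NULL"),
    (site_b, "SELECT COUNT(*) FROM enrollee WHERE airline_id IS NULL"),
]


def count_unaffiliated_students(sites):
    """Send the translated query to every site and combine the partial answers."""
    return sum(conn.execute(sql).fetchone()[0] for conn, sql in sites)


total = count_unaffiliated_students(SITES)
print(total)
```

Only the small partial counts travel back to the central site; the detailed records stay where they are, which is what distinguishes this style of logical integration from physically consolidating the data.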

BROBST: I would definitely agree; this kind of integration has not been solved. The whole notion of federated databases, which we've been talking about here, can work for simple OLTP queries. But when you talk about doing deep analytics, which requires very high volumes of detailed data, the reality is (and I think I'm echoing Don's earlier point) that you have to have "sniffers" or EAI play a critical role in acquiring data. We need to take data from a physically distributed world and bring it together so that you can do sophisticated analytics. As it stands today, you really cannot do this across physically distributed repositories.

ULLMAN: Let me propose another idea, which my colleague Jennifer Widom has examined: stream processing. We've touched on the notion that this stuff is whizzing by incredibly fast. Maybe you can capture it and send it to analysts in the back room, but the technology for real-time capture of important events will have to be different. For example, let's say you're monitoring IP packets going through a national switch network and you want to detect denial of service; or, you want to identify somebody who has a series of stepping stones to hide the fact that they're breaking into a computer through a long series of telnets. You need to be able to detect important events in data that is so voluminous that you can't do anything useful with it in real time.

This brings up another important issue in database research, which we've touched on here: improving optimization. We are looking at how systems will be able to issue very high-level queries with very high-level languages, and have efficient systems provide the results. I think the stream application is one very good example. You need to say something in SQL that more or less describes the kinds of events you're looking at and have it turn that into an algorithm that stores a little bit of data — that remembers only what it has to — and allow the system to handle this very fast stream of data in real time.
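
To make the "remembers only what it has to" point concrete, here is a minimal sketch of a one-pass stream detector. It is an assumption-laden illustration, not a description of any research system mentioned in this discussion: the function name `detect_floods`, the tumbling-window approach, and the window and threshold values are all invented. The only state kept is one counter per source address seen in the current window; no packet is stored.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def detect_floods(
    packets: Iterable[Tuple[float, str]],
    window: float = 1.0,
    threshold: int = 3,
) -> Iterator[Tuple[float, str]]:
    """Yield (window_start, source) the moment a source exceeds `threshold`
    packets within the current tumbling window.

    `packets` is a stream of (timestamp, source) pairs with non-decreasing
    timestamps. Each packet is examined once and then discarded.
    """
    window_start = None
    counts: Counter = Counter()
    for ts, src in packets:
        # Start a fresh window (and forget old counts) once the current one expires.
        if window_start is None or ts - window_start >= window:
            window_start = ts
            counts.clear()
        counts[src] += 1
        # Report each offending source once per window, when it first crosses the line.
        if counts[src] == threshold + 1:
            yield (window_start, src)


# Assumed sample stream: source "a" sends four packets in the first second.
stream = [
    (0.0, "a"), (0.1, "b"), (0.2, "a"), (0.3, "a"),
    (0.4, "a"), (1.5, "a"), (1.6, "b"),
]
flagged = list(detect_floods(stream))
print(flagged)
```

Memory use is bounded by the number of distinct sources in a single window, however long the stream runs. The optimization problem raised above is getting a system to derive this kind of algorithm automatically from a high-level declarative query.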

IE: With XML gaining in importance, what do you see as its impact on SQL's future? Are we reaching the end of SQL's life as we've known it?

ULLMAN: No, the spirit will remain alive. SQL will adapt. There's been a lot of research into semi-structured data, of which XML is just an example. People have just begun to scratch the surface of how you optimize SQL-like queries on XML or tree-like structures. This is a very exciting area for the future.

BROBST: With more and more analytics and other work coming into the database, I see SQL remaining very much alive. At Teradata, we often talk about "ELT" — extract, load, and transform — rather than ETL. Data transformation becomes much more interesting once you've got the data in the database, where you've got parallelism, scalability, and all the other desirable features. Advanced analytic functions like data mining are going to move into SQL. There's just a ton of stuff that's going to happen with SQL going forward. The nice thing about SQL is that it's a functional language and therefore has very desirable properties for parallelization.
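
The "ELT" ordering can be illustrated with a small sketch: raw rows are loaded into a staging table untouched, and the cleansing and aggregation are then expressed in SQL so that the database engine does the work. This uses an in-memory SQLite database purely as a stand-in; the table names, columns, and sample rows are assumptions made for illustration, and SQLite offers none of the parallelism discussed here.

```python
import sqlite3

# Extract: assumed raw sales records, with inconsistent product spelling
# and amounts still stored as text.
raw_rows = [
    ("2002-05-01", " widget ", "19.99"),
    ("2002-05-01", "GADGET", "5.00"),
    ("2002-05-02", "Widget", "19.99"),
]

conn = sqlite3.connect(":memory:")

# Load: put the data into the database as-is, with no cleanup on the way in.
conn.execute(
    "CREATE TABLE staging_sales (sale_date TEXT, product TEXT, amount TEXT)"
)
conn.executemany("INSERT INTO staging_sales VALUES (?, ?, ?)", raw_rows)

# Transform: normalize product names, convert amounts to numbers, and
# aggregate, all inside the engine using plain SQL.
conn.execute(
    """
    CREATE TABLE daily_sales AS
    SELECT sale_date,
           LOWER(TRIM(product))       AS product,
           SUM(CAST(amount AS REAL))  AS revenue
    FROM staging_sales
    GROUP BY sale_date, LOWER(TRIM(product))
    """
)

result = conn.execute(
    "SELECT sale_date, product, revenue FROM daily_sales "
    "ORDER BY sale_date, product"
).fetchall()

for row in result:
    print(row)
```

Because the transformation is a single declarative statement, an engine is free to decide how to execute it, which is the property that makes in-database transformation attractive on a parallel platform.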

JACOBS: SQL will be revitalized by these trends and continued expansion, not only to handle multimedia data and analytics, but also to express business functions. We have only begun to scratch the surface of exploiting this integration. We'll be able to do data mining or OLAP on collections of documents and then drive the results back into an operational OLTP environment.

IE: For what seems like forever we've had these two separate worlds: structured and unstructured. Integration of SQL and XML, enterprise portals, and Internet applications generally seem to be bringing these worlds together. Would you say that SQL is growing to handle structured and unstructured data simultaneously?

JACOBS: Yes. I think it's a near-term reality, in fact. SQL-X is an emerging standard for using SQL together with XML syntax to navigate XML documents and express XML-related queries. We are also seeing what we saw a few years back with the object databases; that is, an effort to establish query languages designed from the point of view of a non-SQL environment. Object query languages haven't gotten very far because there's always been a higher value in integrating functions with SQL. Other languages will continue to emerge and die over time, while SQL will remain a very vibrant language.






