Third Eye View, by Rajan Chandras
Rajan Chandras is a consultant with a global IT consulting, systems integration and outsourcing firm. Write to him at rchandras@gmail.com.
Of 'Elephants,' Column-Store Databases and the Von Neumann Architecture
Listening to Dr. Michael Stonebraker decry "elephants" and extol the virtues of column-store databases in general and Vertica in particular, I am becoming convinced that a totally new data storage architecture is the need of the day. Dr. Stonebraker is, of course, a venerable figure in the world of databases, best known for his pioneering work on Ingres at UC Berkeley more than a quarter century ago. These days, however, in his role as CTO of Vertica, he is constrained to speak more or less one-sidedly on the topic.

In a recent presentation on Vertica, Dr. Stonebraker didn't actually call the leading relational database vendors (Oracle, IBM, Microsoft) "large, lumbering and slow." He did, however, repeatedly refer to them as "elephants." Very clever.

You probably know of column-store databases and about Vertica, so I won't go into too many details here; IntelligentEnterprise.com has plenty of information to offer.

Here's what's interesting. Towards the end of the presentation, I thought I heard Dr. Stonebraker state, or at least strongly imply, that column-store databases are wonderful not just for data warehouses but are pretty good for conventional (transactional) uses as well. That, of course, doesn't seem right.

The central premise of all conventional relational databases is to store the entire row on a single database "page," as far as possible. This makes for efficient storage and retrieval of a single row of data (i.e. a single tuple or entity instance), which in turn makes it efficient for systems that read or write transactions, since one transaction typically deals with a single entity instance: one customer order, one invoice or, for that matter, a single customer. Hence, careful planning around row size and page size is a key component of database design optimization.
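The page arithmetic behind this premise can be sketched in a few lines of Python. This is a deliberately simplified model of my own, not how Vertica or any other product actually lays out data: the function names, the page capacity and the table dimensions are all hypothetical, and it ignores indexes, compression and caching. It counts the pages touched when a whole row sits on one page, and, for contrast, when each page holds values from a single column.

```python
import math


def pages_for_point_read(layout, num_cols, page_capacity):
    """Pages touched to fetch every column of ONE row (a typical transaction)."""
    if num_cols > page_capacity:
        raise ValueError("this toy model assumes a whole row fits on one page")
    if layout == "row":
        return 1  # the entire row sits on a single page
    if layout == "column":
        return num_cols  # one page per column, since each column is stored apart
    raise ValueError(f"unknown layout: {layout}")


def pages_for_scan(layout, num_rows, num_cols, cols_needed, page_capacity):
    """Pages touched to read `cols_needed` columns across ALL rows (a warehouse query)."""
    if num_cols > page_capacity:
        raise ValueError("this toy model assumes a whole row fits on one page")
    if layout == "row":
        # Every page must be read, because each page carries whole rows.
        rows_per_page = page_capacity // num_cols
        return math.ceil(num_rows / rows_per_page)
    if layout == "column":
        # Only the pages of the columns the query names are read.
        return cols_needed * math.ceil(num_rows / page_capacity)
    raise ValueError(f"unknown layout: {layout}")


if __name__ == "__main__":
    # Hypothetical table: 10,000 rows, 20 columns, 100 values per page.
    for layout in ("row", "column"):
        point = pages_for_point_read(layout, num_cols=20, page_capacity=100)
        scan = pages_for_scan(layout, num_rows=10_000, num_cols=20,
                              cols_needed=3, page_capacity=100)
        print(f"{layout:>6}: point read = {point} pages, 3-column scan = {scan} pages")
```

Under these made-up numbers, the row layout reads 1 page for a single record against 20 for the column layout, while a three-column scan reads 2,000 pages in the row layout against 300 in the column layout. The exact figures mean nothing; the opposite directions of the two ratios are the point.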
The row-per-page layout that serves transactions so well turns into a weakness for large, star-join sorts of queries, since the typical data warehouse query only needs to look at a few columns (specifically, those in the SELECT and WHERE clauses) and not the entire row of data. That's where column-store databases get their strength: because they store data by the column, each page now holds values from a single column, organized in whatever sort order is chosen. Queries now need to read fewer pages to get all the values, and sorting and matching are faster.

Now consider what happens when we use a column-store database to read a single transaction, say, that customer master record or the customer order. The data is now spread across many pages, and reading the transaction suddenly becomes much less efficient. Now imagine a large-scale OLTP system. It's not clear how column-store databases will cater to this need. Conventional or column-store representation: there's no getting away from the Yin and Yang of database organization.

This reminded me (rather laterally) of the bottleneck in the classic Von Neumann machine, a single-instruction, single-data (SISD) design. How fast can you process data if each instruction is constrained to operate on a single piece of data, one after another? Subsequent architectures, such as vector processing (SIMD) and parallel processing (MIMD, whether small-scale clustering or large-scale parallelism), got around the bottleneck through a fundamental shift in paradigm. Similarly, we need an equally fundamental shift in database storage architecture that will take us past two critical bottlenecks in database organization and performance that exist today:
This is interesting and highly pertinent stuff. Stay tuned for more in the future. Your own insight is also invited.