Intelligent Enterprise

Better Insight for Business Decisions

Tradeoffs In Splitting DBMS Work Among MPP Nodes | Intelligent Enterprise Blog
Data Frontiers, by Curt Monash
Curt Monash runs Monash Research, which provides strategic, analysis-based advice to users and vendors of advanced information technology. He also writes the blogs DBMS2, Text Technologies, and Strategic Messaging.

Tradeoffs In Splitting DBMS Work Among MPP Nodes

Posted by Curt Monash
Tuesday, September 9, 2008
12:16 PM

I talk with lots of vendors of MPP data warehouse DBMS. I've now heard enough different approaches to MPP architecture that I think it might be interesting to contrast some of the alternatives. The base-case MPP DBMS architecture is one in which there are two kinds of nodes:

• A boss node, whose jobs include:
  - Receiving and parsing queries
  - Optimizing queries, determining execution plans, and sending execution plans to the worker nodes
  - Receiving result sets and sending them back to the querier
• Worker nodes, which do their part of the query execution job and eventually ship data back to the boss node (the "head").

In primitive forms of this architecture, there's a "fat head" that does altogether too much aggregation and query resolution. In more mature versions, data is shipped intelligently from worker nodes to their peers, reducing or eliminating "fat head" bottlenecks.
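To make the "fat head" point concrete, here is a minimal sketch in Python. It is not any vendor's implementation: the partition data, the function names, and the single-process simulation are all assumptions made for illustration. It contrasts a boss node that aggregates every raw row itself with worker nodes that pre-aggregate locally and ship only partial totals. It does not model peer-to-peer shipping between workers.

```python
from collections import Counter

# Hypothetical data: each inner list is the partition held by one worker node,
# as (region, amount) rows. In a real system these would live on separate machines.
PARTITIONS = [
    [("east", 10), ("west", 5), ("east", 7)],
    [("west", 3), ("east", 1)],
    [("north", 4), ("west", 8), ("north", 6)],
]


def fat_head_sum(partitions):
    """Workers ship every raw row; the boss node does all the aggregation."""
    shipped = [row for part in partitions for row in part]
    totals = Counter()
    for region, amount in shipped:
        totals[region] += amount
    return dict(totals), len(shipped)


def worker_partial_sum(partition):
    """Runs on a worker: aggregate locally before shipping anything."""
    partial = Counter()
    for region, amount in partition:
        partial[region] += amount
    return partial


def pushed_down_sum(partitions):
    """Workers ship one partial total per group; the boss only merges them."""
    partials = [worker_partial_sum(part) for part in partitions]
    totals = Counter()
    for partial in partials:
        totals.update(partial)  # Counter.update adds counts together
    shipped = sum(len(partial) for partial in partials)
    return dict(totals), shipped


if __name__ == "__main__":
    print(fat_head_sum(PARTITIONS))
    print(pushed_down_sum(PARTITIONS))
```

Both functions return the same totals, but the second sends fewer items to the boss node. In this toy data the saving is small; it grows with the number of rows per group on each worker.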

Exceptions to the base case include Vertica and Exasol. In their systems, all nodes run identical software. At the other extreme, some vendors use dedicated nodes for particular purposes. For example, Aster Data famously has special nodes for bulk data loading and export. Greenplum has a logical split between nodes that execute queries and nodes that talk to storage, and is considering offering the option of physically separating them in a future release.

The basic tradeoffs between these schemes go something like this:

• If there are more kinds of dedicated nodes, real-time load-balancing is harder; you're more likely to have idle capacity.
• If there are more kinds of dedicated nodes, you can optimize hardware better, by using different kinds of hardware for different kinds of nodes. Potentially, this is a bigger factor if some kinds of nodes have dedicated disks attached and some don't.
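The load-balancing half of this tradeoff can be illustrated with a toy capacity model, sketched below in Python. Everything in it is an assumption for illustration: work is measured in abstract units, where one unit keeps one node fully busy, and there are only two kinds of work (querying and loading). It deliberately ignores the hardware-optimization benefit from the second bullet.

```python
def pooled(total_nodes, query_demand, load_demand):
    """All nodes run identical software, so any node can take either kind of work."""
    demand = query_demand + load_demand
    served = min(total_nodes, demand)
    return {"idle": total_nodes - served, "unserved": demand - served}


def dedicated(query_nodes, load_nodes, query_demand, load_demand):
    """Each kind of work can only run on its own kind of node."""
    q_served = min(query_nodes, query_demand)
    l_served = min(load_nodes, load_demand)
    return {
        "idle": (query_nodes - q_served) + (load_nodes - l_served),
        "unserved": (query_demand - q_served) + (load_demand - l_served),
    }


if __name__ == "__main__":
    # Same 10 nodes and same demand (9 units of querying, 1 unit of loading).
    print(pooled(10, 9, 1))
    print(dedicated(7, 3, 9, 1))
```

With a 7/3 split, two loader nodes sit idle while two units of query work go unserved, even though the pooled cluster of the same size handles the whole demand. A different split would fit this particular demand, which is exactly why a fixed split handles shifting workloads poorly.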

Calpont, which hasn't actually shipped a DBMS yet, has an interesting twist. They're building a columnar DBMS in which the querying work is split between a kind of worker node, which does the query processing, and a storage node, which talks to disk. These nodes are not in any kind of one-to-one correspondence; any worker node can talk with any storage node. Calpont believes that in the future some of the storage node logic can migrate into storage systems themselves, in almost a Netezza-like strategy, but on more standard equipment.

The Calpont story may actually make more sense in a shared-disk storage-area-network implementation than for a fully shared-nothing MPP, but that's a subject for a different post.


