Intelligent Enterprise

Better Insight for Business Decisions

Intelligent Enterprise - Better Insight for Business Decisions
search Intelligent Enterprise
Advanced Search
RSS
Webcasts
Digital Library
Subscribe
Home



Why MapReduce Matters to SQL Data Warehousing | Intelligent Enterprise Blog
Why MapReduce Matters to SQL Data Warehousing

Posted by Curt Monash
Thursday, August 28, 2008
8:53 AM

Greenplum and Aster Data have both just announced the integration of MapReduce into their SQL MPP data warehouse products. So why do I think this could be a big deal? The short answer is "Because MapReduce offers dramatic performance gains in analytic application areas that still need great performance speed-up." The long answer goes something like this.

The core ideas of MapReduce are:

• For large problems, parallel computing is much more cost effective and/or feasible than the alternatives.
• If you shoehorn programs into a certain very simple framework — namely that you're limited to only having map and reduce steps — then building a general execution engine that gives parallelism "for free" is straightforward.
• A lot more problems can be solved within that framework than one might at first expect.

In essence, you can do almost anything to a single record* — that's a map step. But you are sharply limited in how you combine information about multiple (often intermediate) records — that's a reduce step. Still, reduce steps let you do counts, sums, or other aggregations. That, plus the general power of map steps, makes MapReduce useful for at least three major classes of applications:

1. Text tokenization, indexing, and search
2. Creation of other kinds of data structures (e.g., graphs)
3. Data mining and machine learning

Except for the building of entire search engines, these are all application areas that data warehouse users should and do care about. And they all still could benefit from large performance increases, as is evidenced by the routine compromises analysts make in areas such as data reduction, sampling, over-simplified models and the like.

*Technically, MapReduce doesn't allow for records. Instead, you process key-value pairs and lists of same. But so far as I can tell, that's a distinction without a difference. LISP long ago proved that lists are a very general construct indeed.

MapReduce can be superior to pure SQL for these application areas, because they involve creation of data structures that are awkward to fit into a SQL rows-and-tables paradigm. Inverted-list text indexes just aren't tables. Formally, graphs can always be fit into tables; but even so, if you want to follow a graph for numerous hops, relational structures can be problematic. Data mining can involve very high-dimensional problems with super-sparse tables. And while exhaustive text extraction into flat tables works OK, getting from there to common-sense semantic hierarchies can be a bit of a kludge.

Additional links about MapReduce:

Three major applications of MapReduce
Another application of MapReduce
Sound bites about MapReduce
Other links about MapReduce



E-MAIL | SLASHDOT | DIGG




This is a public forum. CMP Technology and its affiliates are not responsible for and do not control what is posted herein. CMP Technology makes no warranties or guarantees concerning any advice dispensed by its staff members or readers.

Community standards in this comment area do not permit hate language, excessive profanity, or other patently offensive language. Please be aware that all information posted to this comment area becomes the property of CMP Media LLC and may be edited and republished in print or electronic format as outlined in CMP Technology's Terms of Service.

Important Note: This comment area is NOT intended for commercial messages or solicitations of business.


 




    Subscribe to RSS feed of all blogs


 



InformationWeek Business Technology Network
InformationWeekInformationWeek 500InformationWeek 500 ConferenceInformationWeek AnalyticsInformationWeek CIO
InformationWeek EventsInformationWeek ReportsInformationWeek MagazinebMightyByte and SwitchDark Reading
Digital LibraryIntelligent EnterpriseInternet EvolutionNetwork ComputingNo JitterPlug Into The Cloud
space
Techweb Events Network
InteropVoiceConWeb 2.0 ExpoWeb 2.0 SummitEnterprise 2.0 ConferenceMobile Business ExpoSoftware ConferenceCSI - Computer Security Institute
Black HatGTECEnergy CampMashup CampStartup Camp
space
Light Reading Communications Network
Light ReadingLight Reading EuropeUnstrungLight Reading's Cable Digital NewsConstantinopleInternet EvolutionPyramid Research
Heavy ReadingLight Reading Live!Light Reading InsiderEthernet ExpoOptical ExpoTeleco TVTower Technology Summit
space
Financial Technology Network
Advanced TradingBank Systems & TechnologyInsurance & TechnologyWall Street & TechnologyAccelerating Wall StreetBank Systems & Technology Executive SummitBuyside Trading SummitInsurance & Technology Executive Summit
space
Microsoft Technology Network
MSDN MagazineTechNetThe Architecture Journal
space