Intelligent Enterprise

Better Insight for Business Decisions

November 12, 2001

The Quest for Speed

More users, more data, and more complex analytic demands are requiring new kinds of software

By Barry Grushkin

Continued from Page 1

However, anyone who has run traditional software on one of these multiprocessor boxes has found that usually only one processor does nearly all the work. Software architectures must be designed for parallelism to take advantage of parallel hardware: problems must be broken down into multiple processing threads that can run concurrently on separate processors.
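
As a minimal sketch of that decomposition (in Python, with hypothetical function names), the example below splits one large job into nearly equal slices, hands each slice to a worker, and combines the partial results. One caveat: in CPython, threads share a global interpreter lock, so a ThreadPoolExecutor shows the structure but will not speed up CPU-bound arithmetic. Passing ProcessPoolExecutor as executor_cls is the standard-library way to spread such work across processors.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def split_evenly(data: Sequence, n_parts: int) -> list:
    """Split data into n_parts contiguous slices whose sizes differ by at most one."""
    size, extra = divmod(len(data), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        parts.append(data[start:end])
        start = end
    return parts


def partial_sum_of_squares(part: Sequence[int]) -> int:
    """The per-worker task: each worker handles only its own slice."""
    return sum(x * x for x in part)


def parallel_sum_of_squares(data: Sequence[int], workers: int = 4,
                            executor_cls: Callable = ThreadPoolExecutor) -> int:
    """Decompose the problem, run the pieces concurrently, then combine the results."""
    parts = split_evenly(data, workers)
    with executor_cls(max_workers=workers) as pool:
        return sum(pool.map(partial_sum_of_squares, parts))


if __name__ == "__main__":
    numbers = list(range(1_000_000))
    print(parallel_sum_of_squares(numbers))
```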

SPEED READING

THE GROWING AMOUNT OF DATA REQUIRES FASTER PROCESSING

Visa stores one petabyte of data in five years.

One petabyte = 1,024 terabytes.
One terabyte = 1,024 gigabytes.
One gigabyte = 1,024 megabytes.
One megabyte = 1,024 kilobytes.
One kilobyte = 1,024 bytes.

If reading a byte takes one second, then reading a petabyte would take more than 35 million years.
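
The arithmetic behind that claim can be checked directly. This sketch assumes an average year of 365.25 days and uses the binary (1,024-based) units listed above.

```python
# Bytes in one petabyte, using the binary units listed above.
BYTES_PER_PETABYTE = 1024 ** 5

# Seconds in an average year of 365.25 days (an assumption for this estimate).
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

# At one byte per second, the number of seconds equals the number of bytes.
years = BYTES_PER_PETABYTE / SECONDS_PER_YEAR
print(f"{BYTES_PER_PETABYTE:,} bytes -> about {years / 1e6:.1f} million years")
# prints: 1,125,899,906,842,624 bytes -> about 35.7 million years
```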

BI processing demands generally differ from scientific ones. In BI, scalability problems stem from the size of the data set and the number of concurrent users; in science, the issue is usually the complexity of the mathematics.

SOME SPEEDY SOLUTIONS

Software from MicroStrategy Inc. and Accrue Software Inc. addresses scaling with very elaborate middle tiers. MicroStrategy has a relational online analytical processing (ROLAP) product that draws all the information it needs directly from the data warehouse. Its processing strategy echoes the approach taken by the human mind: dividing a problem among many specialized problem solvers.
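
To make the ROLAP idea concrete, here is a small sketch of how a requested view can be translated into a single aggregate SQL query and run against the relational store. It assumes a hypothetical sales table and a hand-written list of allowed dimensions and measures, and an in-memory SQLite database stands in for the warehouse. It is not MicroStrategy's SQL engine, only an illustration of the general pattern.

```python
import sqlite3

# Hypothetical star-schema metadata. Checking requests against these sets keeps
# user input out of the SQL identifiers; filter values are passed as parameters.
DIMENSIONS = {"region", "product"}
MEASURES = {"revenue", "units"}


def build_view_sql(dims, measures, filters):
    """Translate a requested view into one aggregate SQL query (ROLAP style)."""
    if not set(dims) <= DIMENSIONS or not set(measures) <= MEASURES:
        raise ValueError("unknown dimension or measure")
    if not set(filters) <= DIMENSIONS:
        raise ValueError("filters may only reference dimensions")
    select = ", ".join(list(dims) + [f"SUM({m}) AS {m}" for m in measures])
    sql = f"SELECT {select} FROM sales"
    params = []
    if filters:
        sql += " WHERE " + " AND ".join(f"{col} = ?" for col in filters)
        params = list(filters.values())
    if dims:
        sql += " GROUP BY " + ", ".join(dims) + " ORDER BY " + ", ".join(dims)
    return sql, params


# A tiny in-memory table standing in for the data warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, revenue REAL, units INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [("East", "A", 100.0, 2), ("East", "B", 50.0, 1), ("West", "A", 70.0, 3)],
)

sql, params = build_view_sql(["region"], ["revenue"], {"product": "A"})
print(sql)
print(conn.execute(sql, params).fetchall())
```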

Every user-requested view is split into many subtasks: formatting, filtering, SQL generation, calculations, communication with the database, XML generation, and so forth. Each specialist engine in an array focuses on efficiently completing all demands of a given type, and each engine is dynamically given a larger or smaller share of the available processors as the combined demand of all tasks across all users requires.
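
One simple way to model that dynamic sharing is to give each specialist engine a share of the processors proportional to its pending subtasks, using largest-remainder rounding so every processor is assigned. This is an illustrative assumption, not MicroStrategy's actual scheduler, and the subtask names and counts below are hypothetical.

```python
from collections import Counter


def allocate_processors(demand: Counter, total_processors: int) -> dict:
    """Share processors among specialist engines in proportion to pending demand.

    Uses largest-remainder rounding so the shares always add up to total_processors.
    """
    total_demand = sum(demand.values())
    if total_demand == 0:
        return {task_type: 0 for task_type in demand}
    exact = {t: total_processors * n / total_demand for t, n in demand.items()}
    shares = {t: int(x) for t, x in exact.items()}
    leftover = total_processors - sum(shares.values())
    # Hand the remaining processors to the engines with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda t: exact[t] - shares[t], reverse=True)
    for t in by_remainder[:leftover]:
        shares[t] += 1
    return shares


# Hypothetical pending subtasks, summed across all users' requested views.
pending = Counter({"sql_generation": 5, "database_io": 30, "calculation": 15, "formatting": 10})
print(allocate_processors(pending, total_processors=8))
```

Recomputing the shares whenever the pending counts change is what lets an engine grow or shrink with demand in this model.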

Accrue's G2, an e-commerce analytics platform that Jupiter Communications rated the most scalable, contains a hybrid OLAP reporting component. It meets the bulk of user demands by calculating a multidimensional cube daily, intelligently choosing which intersections to create based on user demand.
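
The article does not say how G2 picks its intersections. As a hypothetical sketch of demand-driven selection, the code below precomputes the group-by sets requested most often in a query log. It answers other requests from the smallest precomputed superset, which can be rolled up to the coarser level; requests with no usable superset would be computed on demand. The dimension names and log are invented for illustration.

```python
from collections import Counter


def choose_intersections(query_log, budget):
    """Return the `budget` most frequently requested group-by sets (cube intersections)."""
    demand = Counter(frozenset(q) for q in query_log)
    return [dims for dims, _count in demand.most_common(budget)]


def source_for(query, precomputed):
    """Return the smallest precomputed intersection that can answer `query`, or None.

    An intersection grouped by a superset of the query's dimensions can answer it
    by rolling up (summing over) the extra dimensions.
    """
    wanted = frozenset(query)
    usable = [p for p in precomputed if wanted <= p]
    return min(usable, key=len) if usable else None


# Hypothetical log: the dimensions each report grouped by during the day.
query_log = [
    {"day", "page"}, {"day"}, {"day", "page"},
    {"campaign"}, {"day", "page"}, {"day"},
]
cube = choose_intersections(query_log, budget=2)
print(cube)
print(source_for({"page"}, cube))      # answered by rolling up {"day", "page"}
print(source_for({"campaign"}, cube))  # None: compute on demand
```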

Most interesting is how G2 populates that cube. For one client, 7TB of daily Web traffic is reduced to key statements about users via intelligent selection and reinterpretation. Hundreds of gigabytes of information are then clustered, transformed, preaggregated, and precalculated in Accrue's Analyzer. The Analyzer can store massive amounts of information and calculate on it rapidly across many processors because all record information is held in RAM in a highly compressed bit form, avoiding I/O delays. As a result, clients can get updates daily rather than weekly.
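
One well-known structure that matches the description of records held "in bit form in RAM" is a bitmap index. The sketch below is a simplified, uncompressed version with hypothetical record fields, not Accrue's format. Each attribute value gets a bit vector with one bit per record, so a multi-condition filter becomes a bitwise AND and a count becomes a population count, all without touching disk. Production systems typically add run-length or similar compression on top of these bit vectors.

```python
from collections import defaultdict


class BitmapIndex:
    """Hypothetical in-memory bitmap index: one bit per record for each attribute value.

    Bit i of bitmaps[(attr, value)] is 1 when record i has that value.
    Python ints serve as arbitrary-length bit vectors.
    """

    def __init__(self, records):
        self.n = len(records)
        self.bitmaps = defaultdict(int)
        for i, rec in enumerate(records):
            for attr, value in rec.items():
                self.bitmaps[(attr, value)] |= 1 << i

    def match(self, **conditions):
        """AND together the bitmaps for every attr=value condition."""
        result = (1 << self.n) - 1  # start with every record selected
        for attr, value in conditions.items():
            result &= self.bitmaps.get((attr, value), 0)
        return result

    def count(self, **conditions):
        """Count matching records by counting the set bits."""
        return bin(self.match(**conditions)).count("1")


records = [
    {"browser": "IE", "country": "US"},
    {"browser": "Netscape", "country": "US"},
    {"browser": "IE", "country": "UK"},
    {"browser": "IE", "country": "US"},
]
idx = BitmapIndex(records)
print(idx.count(browser="IE", country="US"))
```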

Torrent Systems Inc.'s Orchestrate takes another approach. It offers rewrites of many widely used statistical, data mining, extract-transform-and-load, and data-refinement methods (including SAS Institute Inc. methods, SNCHSORT, i.d.Centric, third-party applications, and wrappers for your own in-house methods). In Orchestrate's GUI, the user drags in and strings together icons representing any of these processes, creating a visual representation of a serial program. Orchestrate handles the rest: consistent integration of all the metadata and parallel execution of the calculations on whatever multiprocessor, multinode system you specify.
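
A rough sketch of that dataflow idea, using hypothetical stage functions and not Orchestrate's API: the user composes a serial pipeline, and the framework partitions the input rows and runs the whole pipeline on each partition concurrently. This works as written only because every stage processes rows independently; stages that aggregate across rows would need a merge step. Threads keep the sketch self-contained. Real multiprocessor speedup in Python would require processes, and picklable stage functions, which the lambda returned by compose is not.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def compose(*stages):
    """Chain stages left to right, as a user would string icons together."""
    return lambda rows: reduce(lambda acc, stage: stage(acc), stages, rows)


# Hypothetical stages: each takes and returns a list of record dicts.
def clean(rows):
    return [r for r in rows if r.get("amount") is not None]


def convert(rows):
    return [{**r, "amount_usd": round(r["amount"] * r["rate"], 2)} for r in rows]


def flag_large(rows):
    return [{**r, "large": r["amount_usd"] >= 100} for r in rows]


def run_partitioned(pipeline, rows, partitions=4):
    """Round-robin partition the rows, then run the whole pipeline on each partition concurrently."""
    parts = [rows[i::partitions] for i in range(partitions)]
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        results = list(pool.map(pipeline, parts))
    return [row for part in results for row in part]


rows = [{"id": i, "amount": None if i % 5 == 0 else i * 10, "rate": 1.5} for i in range(20)]
pipeline = compose(clean, convert, flag_large)
parallel_rows = run_partitioned(pipeline, rows)
print(len(parallel_rows))
```

The output order differs from a serial run because of the round-robin partitioning, but the set of output rows is the same.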

For one customer, an operation that usually took 30 hours to run on an eight-way multiprocessor box was cut to four hours, because Orchestrate divided the work evenly among the processors.
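
Those figures imply a speedup of 30 / 4 = 7.5 on eight processors, assuming both runs used the same machine. The sketch below computes that speedup, the resulting parallel efficiency, and the Karp-Flatt metric. That metric estimates the serial fraction of the work that would explain the measured speedup if Amdahl's law holds.

```python
def speedup_metrics(serial_hours, parallel_hours, processors):
    """Return speedup, parallel efficiency, and the Karp-Flatt serial fraction."""
    speedup = serial_hours / parallel_hours
    efficiency = speedup / processors
    # Karp-Flatt: the serial fraction that would explain this speedup under Amdahl's law.
    serial_fraction = (1 / speedup - 1 / processors) / (1 - 1 / processors)
    return speedup, efficiency, serial_fraction


s, e, f = speedup_metrics(30, 4, 8)
print(f"speedup {s:.2f}x, efficiency {e:.0%}, implied serial fraction {f:.1%}")
# prints: speedup 7.50x, efficiency 94%, implied serial fraction 1.0%
```

Under that model, only about 1 percent of the work (exactly 1/105) would have to remain serial to produce the reported result.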

I have only scratched the surface here. The architectural options for gaining scalable BI solutions are growing rapidly. In my next column, I will drill down with more details and describe some trends. I am sure the math professor would approve.


Barry Grushkin is principal and founder of the Machine Intelligence Co., specializing in deep, insightful data mining and comparative analysis of business intelligence techniques and technologies.