The Quest for Speed

More users, more data, and more complex analytic demands require new kinds of software

However, anyone who has tried traditional software on one of these multiprocessor boxes has found that usually only one of the processors is doing nearly all the work. Parallel software architectures must be designed to take advantage of the parallel hardware: problems must be broken down into multiple processing threads so that they can run across multiple processors.
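To make "breaking a problem into multiple threads" concrete, here is a minimal Python sketch. It assumes a job that splits into independent pieces (a hypothetical sum of squares over a list); the names split_evenly, chunk_total, and parallel_sum_of_squares are illustrative and do not come from any product mentioned in this column. The work is cut into nearly equal chunks, each chunk runs as a separate task on a worker pool, and the partial results are combined at the end.

```python
from concurrent.futures import ThreadPoolExecutor


def split_evenly(items, n_workers):
    """Split items into n_workers contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), n_workers)
    chunks, start = [], 0
    for i in range(n_workers):
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def chunk_total(chunk):
    # Hypothetical stand-in for real per-partition work (filtering, aggregating, scoring).
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(items, n_workers=8):
    chunks = split_evenly(items, n_workers)
    # One task per chunk; the pool runs the tasks concurrently.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(chunk_total, chunks))
    # Combine the partial results from every worker.
    return sum(partials)


print(parallel_sum_of_squares(list(range(1000))))  # 332833500
```

One caveat on the design choice: in CPython, threads share a global interpreter lock, so pure-Python CPU-bound work like this will not actually occupy several processors at once. Swapping ThreadPoolExecutor for ProcessPoolExecutor, which offers the same map interface, gives each chunk its own process; the decomposition itself stays the same.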
BI processing demands generally differ from scientific ones. In BI, scalability issues involve the size of the data set and the demands of many concurrent users; in science, the issue is usually the complexity of the mathematics.

SOME SPEEDY SOLUTIONS

MicroStrategy Inc. and Accrue Software Inc. software are examples where scaling is addressed with very elaborate middle tiers.

MicroStrategy has a relational online analytic processing (ROLAP) product that obtains all the information it needs directly from the data warehouse. Its processing strategy echoes the approach taken by the human mind: dividing a problem among many specialized problem solvers. Every user-requested view is split into many subtasks (formatting, filtering, SQL generation, calculations, communication with the database, XML generation, and so forth). Each specialist engine in an array focuses on efficiently completing all demands of a given type, and each engine is dynamically allotted a larger or smaller share of the processors' resources as the combined demands of all tasks across all users require.

Accrue's G2, an e-commerce analytics platform rated by Jupiter Communications as the most scalable, contains a hybrid OLAP reporting component. It meets the bulk of user demands by calculating a multidimensional cube daily, intelligently choosing which intersections to create based on user demand. Most interesting is how G2 populates that cube. For one client, 7TB of daily Web traffic is reduced to key statements about users via intelligent selection and reinterpretation. Hundreds of gigabytes of information are then clustered, transformed, preaggregated, and precalculated in Accrue's Analyzer. The Analyzer can store massive amounts of information and calculate on it rapidly across many processors because all record information is kept in a highly compressed bit form in RAM, avoiding disk I/O delays. As a result, clients can get updates daily rather than weekly.
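The column does not describe what Accrue's "highly compressed bit form" looks like. One common way to keep record attributes as bits in RAM is a bitmap index, and the Python sketch below assumes that approach purely for illustration; it is not Accrue's actual format. Each distinct attribute value gets a bit string with one bit per record, so a filter becomes a bitwise AND and a count becomes a population count, instead of a scan over each record's fields. The attributes and data are hypothetical.

```python
def build_bitmaps(values):
    """Map each distinct value to an int whose bit i is set when record i has that value.

    Assumption: bitmaps are left uncompressed here; production systems typically
    also compress them (for example with run-length encoding).
    """
    bitmaps = {}
    for i, v in enumerate(values):
        bitmaps[v] = bitmaps.get(v, 0) | (1 << i)
    return bitmaps


def count_bits(bitmap):
    # Population count: the number of records whose bit is set.
    return bin(bitmap).count("1")


# Hypothetical example: four Web sessions described by two attributes.
country = build_bitmaps(["US", "DE", "US", "US"])
bought = build_bitmaps([True, False, False, True])

# "US sessions that made a purchase": one AND plus a popcount.
us_buyers = country["US"] & bought[True]
print(count_bits(us_buyers))  # 2
```

Because every query reduces to bitwise operations on in-memory integers, different attribute bitmaps (or different slices of one bitmap) can also be handed to different processors, which fits the column's point about calculating on many processors without I/O delays.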
Torrent Systems Inc.'s Orchestrate offers another strategy. It provides rewritten versions of many widely used statistical, data mining, extract-transform-and-load (ETL), and data-refinement methods (including SAS Institute Inc. methods). For one customer, an operation that usually took 30 hours to run on an eight-way processing box was reduced to four hours because Orchestrate divided the work evenly among the processors.

I have only scratched the surface here. The architectural options for building scalable BI solutions are growing rapidly. In my next column, I will drill down into more details and describe some trends. I am sure the math professor would approve.

Barry Grushkin is principal and founder of the Machine Intelligence Co., specializing in deep, insightful data mining and comparative analysis of business intelligence techniques and technologies.