Mass Movement

Scalable data integration is crucial as businesses face increasing data volumes and transactions
By Philip Russom
Data Integration Challenges

As if rapidly growing data sets, transaction volumes, application numbers, and user counts weren't troublesome enough, the situation is further exacerbated by barriers to the performance of data integration strategies. (I charted these trends as the horizontal axis in Figure 1.)

Shrinking batch windows. The tradition of executing data integration as batch processes in the dark of night - when user and system activity are lowest - may soon be a limited option. For example, IT departments typically conduct data integration in support of e-commerce at night for functions like dot-com order fulfillment and product catalog data movement. But nightly "batch windows" are shrinking as business in general becomes more global and as e-business is increasingly conducted over an Internet that knows no geography or time zones. Corporations fuel the data integration crisis by asking IT professionals to integrate unprecedented volumes of data while giving them ever-decreasing amounts of time in which to accomplish this Herculean feat.

Complexity of distributed environments. The emergence of client/server in the 1980s greatly increased the complexity of computing architectures. Following the infusion of Web-based architectures in the 1990s, computing architectures have pushed forward into a new extreme of distributed computing. In terms of data integration, many corporations face a daunting list of source and target components that must be integrated. As the number of components grows, so does the number of data transformations, staging areas, and data caches.

Preserving network bandwidth. In fully digitized businesses, network bandwidth is already in peril. Numerous applications and processes depend on the network and, as data integration efforts scale up, the regular movement of massive data sets may degrade network performance considerably. Employees, customers, and partners in e-business environments have little patience for sluggish applications - to the point that a loss of performance may lead to a loss of revenue. Although new network technologies - which double bandwidth capacity - appear every 18 months or so, companies cannot afford the time and expense of constant upgrades. These restrictions force companies to choose data integration processes that place lighter demands on bandwidth.

Rising performance expectations. As recently as the mid-1990s, data integration processes could run daily or weekly, and no one worried much if the process failed occasionally. Those days are long gone, because business people in today's fast-paced markets need to analyze enterprise performance today instead of waiting a day or a week to review the results. Furthermore, once you provide self-service systems to employees, customers, and partners, they expect the most up-to-date information possible. Therefore, data integration processes must run frequently and without error.

Scalable Technology Requirements

As these trends in volume and performance move forward, they push data integration toward a crisis of scalability. To achieve scalable data integration under extreme conditions, IT personnel need to adopt different types of scalable technologies:

Parallelism, the top priority. The most important technology for scaling data integration is parallelism. Running multiple computing processes in parallel allows a job to complete sooner than running the same processes one after another. Here are some "parallelized" features to look for in scalable data integration tools:
Scalable extract and load. When you're extracting data from source databases and loading it into target databases, your scalability depends on the query optimizers and bulk loaders of the source and target databases involved. Unfortunately, the capabilities of query optimizers and bulk loaders vary greatly from one brand of DBMS to another and sometimes among releases of the same brand. Most of them, however, support multiple performance modes. To avoid scalability-limiting bottlenecks, the programs you write and the tools you use should support the highest performance mode, even if you must alter your application design and data structures to achieve it. Extract and load can also be accomplished with SQL statements transported via ODBC. When query optimizers and bulk loaders are not available, ODBC is a fine stop-gap measure, but the poor performance of most ODBC drivers makes them unacceptable to organizations that demand scalable data integration.
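
As a rough illustration of the two ideas above (parallel processing and a high-performance load mode), here is a minimal Python sketch. It is not taken from any particular data integration tool. SQLite stands in for the target database, and a batched `executemany` inside a single transaction stands in for a real bulk loader. The `orders` table, the `transform` rule, and the function names are hypothetical. Threads mainly help when each partition waits on I/O; CPU-bound transformations would more likely call for a process pool.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def transform(partition):
    """Hypothetical transformation: tidy the customer name, convert cents to dollars."""
    return [(order_id, name.strip().upper(), cents / 100)
            for order_id, name, cents in partition]


def partitioned(rows, parts):
    """Split rows into at most `parts` contiguous slices of roughly equal size."""
    size = max(1, -(-len(rows) // parts))  # ceiling division, never zero
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def integrate(rows, workers=4):
    """Transform partitions concurrently, then load them in one transaction."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        transformed = list(pool.map(transform, partitioned(rows, workers)))

    target = sqlite3.connect(":memory:")  # assumption: in-memory stand-in target
    target.execute(
        "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    with target:  # commits once at the end instead of once per row
        for part in transformed:
            target.executemany("INSERT INTO orders VALUES (?, ?, ?)", part)
    return target
```

Calling `integrate(rows, workers=3)` on a list of `(order_id, name, cents)` tuples returns a connection whose `orders` table holds the transformed rows. The point of the design is the contrast with a row-at-a-time loop that commits after every insert, which is the kind of slow path the paragraph above warns against.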