Search Powered
by Thunderstone:

Intelligent Enterprise
DBPD Online
DBMS Archives
 


October 26, 1999, Volume 2 Number 15

E-Business Everlasting

Planning ahead for scalability under growing customer demand — whether through partitioning or replication — may be the best e-business move you can make

By Ike Van Cruyningen



It’s not hard to imagine: Your development team has done an outstanding job on your latest Web application, a high-visibility corporate purchasing system. Performance is wonderful during development and testing, so you deploy the application with great confidence. Your first customers are delighted and your customer list begins to snowball. You’re really riding high.

Suddenly, your application slows to a crawl, and the complaints pour in. You realize that unless you repair the problem immediately, customers will soon leave in droves. A system can meet all its functional and performance goals, but still fail to scale in the real world. America Online (AOL) found this out the hard way just after it offered unlimited monthly access to its online service. System overloads led to millions of unhappy customers and legal threats from dozens of states. Only after a round-the-clock effort to upgrade the system did the service get back on track. AOL had to issue refunds totaling millions of dollars to restore customer confidence.

Whether you’re managing, sponsoring, funding, or developing a Web application, you absolutely must plan for scalability before you deploy. In the frenetic world of the Web, it is exceedingly difficult to patch in a solution later on. In this article, I’ll discuss some field-tested replication and partitioning techniques that will ensure your applications can handle enormous audiences without missing a beat.

Congestion Takes Its Toll

Why would a well-written, well-behaved application suddenly start acting up under load? The explanation is remarkably similar to your commuting experience. Performance as measured during development is like driving home on a Sunday afternoon, while actual performance under load is like driving home at rush hour. The freeway and your application both behave according to this formula, which is charted in Figure 1: actual response time = unloaded service time/ ( 1 – utilization ).



FIGURE 1 Service time multiplier vs. utilization percentage (X10).


At 50-percent utilization, your customers wait twice as long for a response. At 75-percent utilization, they wait four times as long. It gets exponentially worse from there: Adding just a few more customers at high utilization rates can easily double the wait time for everyone. Naturally, the response time doesn’t go to infinity as utilization approaches 100 percent. When application traffic is too congested, some customers lose patience and go somewhere else, just as some commuters adjust their work schedules to avoid rush hour.

Of course, you don’t want your customers to go somewhere else, so you must plan for scalability. Alloy.com recently decided to supplement its online marketing with a mail-order catalog. When the catalog hit, response was so enthusiastic it took out the e-commerce servers, negating much of the mailing’s value. You must plan for scalability early in the development process because your application responsiveness will deteriorate exponentially under load. You cannot put this issue on the back burner and hope it will resolve itself. It will escalate and cause you more and more stress — and put your business at more and more risk — as time goes on.

If you’re lucky, you can start the project off with a demand forecast from marketing studies. Capacity planning will give you an estimate for your server size based on the forecast. Don’t question the methodology behind the forecast too closely, because all too often, it simply seems to be someone’s best guess.

But whether you have a reliable forecast or not, you have to ask yourself, “What if we get twice that many customers?” The goal of a scalable design is a system that will accommodate more and more customers simply by adding hardware. If you start with a scalable hardware platform, you’ll be able to double capacity without significant disruption by simply replacing the server with a more powerful one.

However, the Web is growing exponentially, so you have to ask about doubling, redoubling, and so on. Regardless of your hardware vendor’s skill and reputation, at some point you’ll reach the capacity limit for a single server. Then the only solution is to replicate or partition your application across several servers.

In application replication, you add duplicate hardware in parallel and distribute requests to the duplicate servers. In application partitioning, you split the application so you can move parts of it to other servers as the system outgrows the original hardware’s capacity. The two strategies are complementary. As your customer base increases, you should first partition your initial deployment. After further growth, you can replicate the partitions that outgrow their new servers.

Replication

Replicating your application across duplicate hardware is like adding lanes to a congested freeway. The major design issues are: First, determine where bottlenecks will occur, because that’s where the new hardware will help most; and second, determine how to distribute requests among the duplicate hardware systems to balance the load. Figure 2 (page 32) shows five potential replication strategies at different tiers of a Web application.



FIGURE 2 Replication strategies for the tiers in a Web application.


If disk operations are the bottleneck, supporting additional disk drives is relatively easy. (See Figure 2a.) Transactional databases rely heavily on disk operations to guarantee transaction durability and store recovery logs. In these databases, transaction rates are often limited by disk I/O, so it makes sense to invest in additional drives. Redundant arrays of inexpensive disk (RAID) technology makes it easy to add drives, transparently load-balance requests to the drives, and provide redundancy and failover in case a drive fails. For transactional databases, invest in RAID.

The database engine itself may be a bottleneck during complex joins or in the long-running queries in data mining or exploration tasks. If your system bogs down when the marketing folks connect for sophisticated reports, you probably have a database engine bottleneck. As Figure 2b shows, a parallel or replicated database will increase capacity. A parallel database runs one database instance on parallel hardware. In such products, database developers have figured out how to balance the load and maintain data consistency. In replicated databases, two separate systems manage two instances of the same data. Replicated databases incur major overheads in maintaining consistency between the two sets of data. They work for slowly changing content — such as situations where several hours of inconsistency between the databases is tolerable — but they perform poorly for databases that have to stay synchronized at all times. In such cases, trying to segment or partition the database (as described in the next section) is a better idea. (The architectures in Figures 2c, 2d, and 2e assume the use of a parallel database engine.)

The application server may become a bottleneck if the business rules are very complex. In my experience, relatively minor changes in user interface layout and application navigation can automatically satisfy a surprising number of complex business rules. For example, if a workflow has to occur in a specific order, guide your clients through the application in that order. If dependencies between data elements exist, make it impossible to enter dependent data without having already entered the antecedent data. If certain clients do not have the authority to take specific actions, simply don’t offer such actions to those clients. In fact, one of the primary benefits of a consultant is to offer a fresh, independent look at application design, with an eye to removing as much code as possible. The fastest line of code to develop and run is the line of code you don’t write at all.

An app server may also become a bottleneck is if it has to manipulate or rearrange data. Databases excel at sorting, shuffling, joining, aggregating, and otherwise massaging data. If you’re writing app server code containing nested loops or arrays, you should consider a more advanced SQL statement or a stored procedure instead.

Finally, an app server may become a bottleneck if the technology is underpowered. Interpreted languages are slower than compiled languages, so long scripts with lots of interpreted code will cause problems. Active Server Page (ASP) scripts are notoriously slow; even Microsoft advises the use of compiled components for significant app server tasks. And server-side Java may have many benefits, but high performance is not among them.

Compiled components certainly improve the performance of an app server, but how do you replicate that capability for larger loads? Although you could deploy additional networked app servers (see Figure 2c) and implement load balancing, the communication overhead tends to overwhelm performance benefits. A LAN remote procedure call (RPC) takes roughly two orders of magnitude more time than a local RPC because it requires tens of thousands of machine instructions to get a message onto the LAN. Load balancing while the request is still on the LAN by replicating the Web server is a better idea.

The Web server itself is almost never a bottleneck, but the architecture of Figure 2d is attractive because it is an easy way to load balance the rest of the system. The simplest implementations use a router to distribute requests to Web servers based on the IP packet source address; each Web server manages a specific range of client IP addresses. This approach neatly solves the problem of session state. HTTP does not maintain a continuous connection between client and server during a session; rather, each request is entirely independent. However, many applications have to maintain state information, such as shopping cart contents, between requests. If the state information is maintained in the app server, then all the requests from a client during a session have to route to the same server. Routing based on source IP address ensures this process occurs.

One drawback here is that requests are not distributed evenly among source IP addresses. Requests from major networks such as AOL travel through just a few gateways. You may have to dedicate one Web server just for clients from AOL. When that server becomes overloaded, this approach doesn’t scale further. A second problem is that the load balancing here is static. Dynamic load balancing based on current server workloads is more effective; if the server dedicated to AOL is overloaded or fails, you can shunt some or all of the requests to an idle server. To achieve dynamic load balancing, you must replace the simple router with one of the following:

•A smarter redirector that polls the Web servers for current load and understands not only source IP addresses, but also session information in cookies, logins, URLs, SSL session IDs, or specific hidden fields in forms. Examining the contents of HTTP requests within IP packets definitely slows down a redirector.

•A cluster manager to distribute requests among equivalent machines in a cluster. The clustered machines transparently exchange session state information using a separate LAN protocol. This solution is the best one for load balancing as well as failover at the intranet level, but it is also the most expensive.

•A dedicated server for your home or login pages that redirects client requests to peer servers for the rest of a session. The redirection is based on the peer server’s current workload. This approach, which Netscape Netcenter and other large sites use, is relatively easy to implement.

For an Internet application (as opposed to an intranet one), the bottleneck is most likely the communication across the network. The architecture of Figure 2e, which includes several replicated or mirror servers close to different backbones and in widely dispersed geographic locations, is most attractive for an Internet application. The Domain Name System (DNS) allows association of multiple IP addresses with a single name and will return those IP addresses in a round-robin fashion.



FIGURE 3 Round-robin name resolution with DNS.


As Figure 3 shows, three repetitions of the nslookup command for www.yahoo. com returns three different IP address from four possibilities. DNS provides elementary load balancing; each Yahoo server will get one quarter of the initial requests. The allocation among servers is independent of current server load or availability, so it may still produce hotspots or bottlenecks, particularly if the servers are not equal in capability.

The browser caches the DNS-returned IP address during the time-to-live (TTL) interval. If a server fails, the client has to wait at least one TTL interval before a new DNS request returns a different server, so failover is not transparent. If the client’s session is all within a TTL interval, the requests will all go to one server and the server session state will be maintained correctly. If the TTL interval expires, resulting in a new DNS lookup, a client may be directed to a different server and the session state will be lost. If you’re maintaining session state in the server and plan to use the round-robin DNS to distribute requests, make sure your server session timeout is shorter than the DNS TTL interval.

Several companies have developed smarter DNS servers to improve on the round-robin DNS scheme. A smarter DNS server acts as a front end for several Web servers supporting the same URL. The enhanced DNS server tracks the load and availability of the Web servers. In response to a DNS name lookup, it returns the IP address of the least busy, but still available, Web server. The client then works with that IP address for the duration of the session.

Another approach is to redirect clients when they make an HTTP request. A primary server corresponding to the IP address returned by DNS tracks the load and availability of a number of secondary servers. When the primary server receives an HTTP request, it redirects the request to the secondary server with the lowest utilization. The secondary servers might be geographically dispersed, so a smart primary server also takes into account the proximity of the secondary server to the client, to minimize total client response time.

Replication can be used at each tier in a Web application to remove bottlenecks. For an intranet application, the architecture of Figure 2d is most attractive from a scalability and availability point of view. For an Internet application with widely dispersed clients, the architecture of Figure 2e provides the best performance, scalability, and availability.

Partitioning

Application partitioning is another technique for distributing workload. Here different server tasks are allocated to different hardware systems, splitting the application either horizontally by tier or vertically by function.



FIGURE 4 Partitioning Web applications.


In discussing replication strategies, I didn’t specify whether the Web server, app server, and database server should all run on the same hardware system. Putting them all on one system (see Figure 4a) minimizes communication overhead among servers, but increases utilization for the single hardware system. There is no clear-cut rule for partitioning the three services among hardware platforms. If the Web server has enough memory to cache static content, it will be network I/O bound; it’s waiting to send files to the client through the network card. The app server implements business rules, contacts a database server, and generates HTML. The business rule calculation and HTML generation are CPU bound, while the database actions in the database server are typically disk I/O bound. To a performance analyst, an I/O bound Web server and a CPU bound app server is a match made in heaven. For your initial system deployment, put the Web and app servers on the same hardware (See Figure 4b.) Then monitor the CPU and network loads. You might think that when the CPU becomes a bottleneck, you should offload the app server to another system. (See Figure 4c.) However, this approach often simply converts a CPU bound app server into an I/O bound app server limited by network communication overhead. You’re better off replicating the entire Web and app server combination onto a new hardware system. (See Figures 2d and 2e.)

Database operations are commonly I/O bound because disk access times are thousands of times slower than CPU operations. Combining a Web server waiting on network I/O, an app server waiting for the CPU, and a database server waiting on disk I/O could result in a balanced system, provided it has enough memory to keep all the processes and cached files memory resident. More commonly, the database server resides on a separate system. Often the database server has been operational for years and serves many other applications, so it is already on a separate system that should not be modified.

You might also partition your Web application vertically by function. You separate the different tasks onto hardware systems optimized to perform those tasks. For example, my company worked with Mindscape, a leading developer of innovative software titles for the education and entertainment markets, to develop an application scalable to hundreds of thousands of customers. Mindscape wanted to let customers download additional content for its software titles using cost-effective Web distribution. It also wanted to market additional content from within the application and support distribution of application updates across the Web.

These requirements map to two distinctly different types of hardware systems. Downloading static content such as clip-art, styles, design tips, or application updates requires a simple Web server optimized for static content — a system with lots of memory and multiple network cards but only modest CPU and disk I/O subsystems. Verifying client licenses and supporting purchases of additional content requires an e-commerce Web application that accesses a database. Database access requires a system with fast disk I/O. Based on the relative frequencies of the two types of operations, we designed a system with one database-backed system supported by multiple content-download systems. (See Figure 5.) This architecture makes the same hardware and database server investment scale much better than multiple identical systems that support transactions as well as download content.



FIGURE 5 Cost-effective e-commerce architectures.


Separating read-only data from volatile read-write data — which reflects the division between online analytical processing (OLAP) and OLTP — can yield enormous dividends in other tiers, as well. A database that is read-only for most users can omit logging and largely be cached in memory. This solution would eliminate most disk accesses, thereby improving performance by an order of magnitude. A read-only database is also easy to replicate, which vastly improves the entire Web application’s performance and scalability. For example, in any e-commerce application, the product descriptions or shopping catalog is read-only for most clients. The content authors do change it, but much less frequently than shoppers browse it. However, the customer and inventory data is volatile. If these two types of data are separated, replicated servers close to the client would quickly serve the product descriptions while a centralized server manages orders. In browsing the site the client works with a fast local server, but when the shopper orders an item, the order request is directed to a central system managing customer and inventory data. The central system performs only order processing and customer updates, so utilization stays low. The frequently changing customer and order data does not incur the costs of replication and consistency checks.

Separating data that changes quickly from data that changes slowly improves performance and scalability in many contexts. A financial application should replicate historical data and price quotes, but centralize trades and customer accounts. A human resources application should replicate benefit plans and company policies, but centralize employee requests and individual benefits tracking. An online bookseller should replicate the “Books in Print” catalog, but centralize orders and customer preferences.

Beyond simply separating static data from volatile data, you might consider a distributed database. In this architecture, you segment your data by geography, customer, product line, or some other criterion and then put each segment in its own database. If you choose your segmentation correctly, most of your database operations will access only one of the split databases. For example, real-estate databases segment very nicely by location; you can easily use one database for each state.

Unfortunately, most corporate data does not separate quite as cleanly. In distributed databases, the scalability and performance risks arise from transactions that span more than one database: The transaction monitor that coordinates distributed transactions uses two-phase commit, which can be a major performance drain. Even worse is the potential for cross-database deadlocks. The bottom line is, be careful with distributed databases. If you have to divide a database, you may be better off splitting the relatively static data from the volatile data. Keep all the transactions and write locks in a single database containing all the volatile data.

Beyond separating the static and volatile data, it makes sense to further partition data onto separate systems. As long as you can cache all the content or most of the data in memory, you’re better off replicating the entire system, rather than partitioning. Replication lets you balance the load and support failover, providing a more flexible system than specialized servers would. With HTTP 1.1 persistent connections and pipelined requests, supplying as much of a session as possible from one server will improve client response time.

Be Prepared

Deploying a new e-business application on the exponentially expanding Web can be daunting, but the partitioning and replication techniques I’ve described here can give you an edge. Overall, appropriate application partitioning gives you the best bang for your hardware buck, and replication eliminates bottlenecks and supports failover. A combination of these techniques ensures a cost-effective solution capable of handling phenomenal growth.



Ike van Cruyningen is a principal of Architier, a firm specializing in development of multi-tier enterprise systems based on Web technologies. This article is derived from his forthcoming book, COM+ Web Applications (McGraw-Hill). You can reach him through www.architier.com



 

Copyright © 2004 CMP Media Inc. ALL RIGHTS RESERVED
No Reproduction without permission