http://www.intelligententerprise.com/010507/feat1_1.jhtml
Executive Summary
Once upon a time, storage was boring; it was simply a disk drive storing your data. Application servers were the center of the universe: they required strategic planning. After all, once you decided on the server vendor, you just bought the storage that came with the server. Times have changed. Today, storage rules.
Now, your first decision is which storage vendor to use, then which application vendor. Finally, you are ready to choose the servers. Visualize these elements with storage in the center, applications in the next ring, and servers in the outermost ring. There are numerous reasons for this shift, but it primarily comes down to the following factors:
Storage costs as a percentage of the total IT budget have steadily climbed and now make up its largest piece. In the early 1990s the split between server and storage spending for enterprise class servers was 70/30; now the ratio is reversed. Also, the significance of data has evolved from being an important part of an organization to being the lifeblood of a company today. The sheer competitiveness of your company depends upon how much relevant customer and market data you collect and, equally importantly, how you manipulate it to extract information for decision-making. Clearly, the type and quality of storage infrastructure is more important than ever before.
The Internet has been single-handedly responsible for the explosion of data. Enterprises adding clicks to bricks and those based purely upon clicks are adding huge amounts of data - with no end in sight. Most organizations are doubling the amount of data they have to manage every year. To get a sense of this incredible growth, consider the fact that in 2001 alone, enterprises will add 400 to 500PB (1 petabyte equals 1,000TB) of new data, which is only slightly less than all the historical data that enterprises have ever created. Also, companies have a greater need to maintain more data online for business or legal reasons.
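Annual doubling compounds quickly: after n years the data volume is 2^n times today's, so five years of doubling means 32 times as much data to store. A minimal Python sketch of that arithmetic, assuming a hypothetical enterprise that manages 10TB today and grows at the article's rate of doubling every year:

```python
def projected_capacity_tb(current_tb: float, years: int, annual_growth: float = 1.0) -> float:
    """Data volume after `years` of compound growth.

    annual_growth=1.0 models "doubling every year" (100 percent growth).
    """
    return current_tb * (1 + annual_growth) ** years


if __name__ == "__main__":
    # Hypothetical starting point: an enterprise managing 10TB today.
    for year in range(6):
        print(f"year {year}: {projected_capacity_tb(10, year):,.0f} TB")
```

Passing a different `annual_growth` (for example 0.5 for 50 percent a year) shows how sensitive the plan is to the growth assumption.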
The type of data is also changing dramatically; it is no longer predominantly textual and numeric. The user now expects to see graphics, audio, and video file attachments. A simple audio file can easily take up 1MB; video files are even hungrier for storage. A two-minute video file can require 10MB of storage.
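As a rough check on these figures, a two-minute, 10MB video implies an average rate of about 667 kbit/s, and an hour of video at that rate needs about 300MB. The sketch below does that arithmetic, assuming decimal units (1MB = 1,000,000 bytes) and a constant bit rate, which real compressed video does not always have:

```python
def average_bitrate_kbps(size_mb: float, duration_s: float) -> float:
    """Average bit rate implied by a file size, in kilobits per second.

    Uses decimal units: 1MB = 1,000,000 bytes and 1 kbit = 1,000 bits.
    """
    bits = size_mb * 1_000_000 * 8
    return bits / duration_s / 1_000


def storage_mb(bitrate_kbps: float, duration_s: float) -> float:
    """Storage needed to hold `duration_s` seconds at a constant bit rate, in MB."""
    return bitrate_kbps * 1_000 * duration_s / 8 / 1_000_000


if __name__ == "__main__":
    rate = average_bitrate_kbps(10, 120)  # the two-minute, 10MB clip from the text
    print(f"implied rate: {rate:.0f} kbit/s")
    print(f"one hour at that rate: {storage_mb(rate, 3600):.0f} MB")
```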
One hundred percent uptime is the only acceptable status for your Web site in the global economy. Your Web site must always be open for business. And just think, only a few years ago, vendors talked grandly about 99 percent uptime. In the past, vendors and the IT community distinguished between unplanned and planned downtime, and planned downtime was somehow acceptable, or at least condoned, by end users. Today, the end user couldn't care less about the distinction and has zero tolerance for any downtime whatsoever. Consider online buyers: If they go to a Web site and the site is unavailable, for whatever reason, they move on and the sale goes to the next vendor.
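The gap between 99 percent and 100 percent uptime is easier to see in hours. A 365-day year has 8,760 hours, so 99 percent uptime still allows about 87.6 hours of downtime per year, and each additional nine cuts that by a factor of ten. A minimal sketch, ignoring leap years:

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored


def annual_downtime_hours(availability_pct: float) -> float:
    """Hours per year a system may be down at the given availability percentage."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99, 99.999):
        print(f"{pct}% uptime -> {annual_downtime_hours(pct):.2f} hours down per year")
```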
Time to market means life or death to an enterprise in today's Internet economy. If your company can introduce and ship a product a quarter earlier, it could potentially mean millions of dollars in terms of incremental revenue and other benefits associated with a first-mover advantage.
All of these factors combine to make effective gathering, moving, and storing of data the most important aspect of an enterprise. That means as an IT executive, if you don't make the right strategic decision regarding storage, you are potentially jeopardizing your company's entire IT infrastructure. The day of storage is finally here.
Unfortunately, existing storage architectures are practically useless for managing vast amounts of data - further complicating the situation. So what is an IT executive to do? Before I provide you with some guidelines for success, let's look at the current state of storage infrastructures and then examine what's new today and what technology is almost here.
The dominant storage infrastructure today is called direct attached storage (DAS). In a DAS environment, application servers, database servers, and file servers sit on a network, typically a TCP/IP-based Ethernet interconnected with routers or switches, and each server has its own storage attached directly to it, usually over a SCSI bus. The client computers, mostly PCs, Macs, or Unix workstations, sit on these same networks with varying amounts of local disk storage. Because each server has a maximum storage capacity, once you reach that limit you have to add another server, regardless of how much processing power the existing CPU has left (see Figure 1). Similarly, if the server runs out of processing power, you have to add another one, even if the existing server has plenty of storage capacity left. This one-to-one relationship between server and storage is a fundamental obstacle to the scalability and manageability of storage that today's economy requires.
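The one-to-one coupling can be made concrete with a simple model. The sketch below assumes, purely for illustration, that every DAS server contributes a fixed amount of disk and a fixed amount of processing (in arbitrary "CPU units"); the server count is then set by whichever resource runs out first, and the surplus of the other resource is stranded:

```python
import math


def das_servers_needed(storage_tb: float, cpu_units: float,
                       tb_per_server: float, cpu_per_server: float) -> int:
    """Servers required under DAS, where each server supplies both CPU and its own disks.

    The count is driven by whichever resource runs out first.
    """
    for_storage = math.ceil(storage_tb / tb_per_server)
    for_cpu = math.ceil(cpu_units / cpu_per_server)
    return max(for_storage, for_cpu)


def idle_cpu_units(servers: int, cpu_units: float, cpu_per_server: float) -> float:
    """Processing capacity bought only because storage forced extra servers."""
    return servers * cpu_per_server - cpu_units


if __name__ == "__main__":
    # Hypothetical workload: 20TB of data but only 10 servers' worth of processing,
    # with each server able to hold 0.5TB of direct attached disk.
    n = das_servers_needed(storage_tb=20, cpu_units=10, tb_per_server=0.5, cpu_per_server=1)
    print(n, "servers,", idle_cpu_units(n, 10, 1), "CPU units idle")
```

With these hypothetical numbers, 20TB at 0.5TB per server forces 40 servers even though 10 would supply enough processing, leaving 30 CPU units idle.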
Because data on a storage unit is "owned" by the server to which the unit is attached, accessing information simultaneously from multiple storage units involves all associated servers, a highly inefficient process for your databases and many e-commerce applications. Also, consider the impact when one of these servers is down: Data associated with that server is unavailable. This server-storage association also requires that you manage each storage unit separately, making overall storage management in a large enterprise with hundreds or thousands of servers an absolute nightmare. Small wonder that the IT community, already reeling from budget crunches and a lack of storage-centric personnel, is fighting a losing battle and crying for help.
Storage Planning for the Future
Two storage infrastructures have arrived on the scene recently: network attached storage (NAS) and storage area network (SAN). I will first discuss these conceptually and then look at the technologies that lie underneath, with their pros and cons.
Network attached storage. Conceptually, you can think of a NAS device as a dedicated file server that is attached to the network - typically Ethernet - and serves files to a multitude of clients on that network. NAS provides common storage for needed files that a large number of heterogeneous clients can access. The most common access protocols are NFS (often used by Unix clients) and CIFS (used by Windows clients). In the mid-1980s, general-purpose Unix servers played the role of a file server in addition to performing other tasks, such as application and database management. Then in the late 1980s, Network Appliance Inc. developed the concept of an appliance, a system designed to perform only one function, in this case file serving, that was significantly better in terms of performance and management than a general purpose server.
While the terminology for NAS came later, this appliance was indeed a NAS box. It connected via Ethernet, offered vast storage expandability for its time, and, because it required little or no configuration, could be put into service within minutes. The role of NAS had been increasing steadily, but has reached a crescendo due to the maddening pace of data growth. The NAS market is already at $2 billion, and I expect it to grow at about 70 percent a year for the next few years. The two major players are Network Appliance Inc. and EMC Corp. Several server vendors (Compaq Computer Corp., Dell Computer Corp., Hewlett-Packard, Sun Microsystems, and so forth) have also jumped into the fray.
Besides excellent performance and ease of management, the most notable aspect of NAS is the fact that it deals exclusively with file-level rather than block-level data (explained later in this article).
Storage area network. Conceptually, a SAN's primary purpose is transferring data between computer systems and storage units, or between storage units. While there is no theoretical tie between a SAN and Fibre Channel (FC) protocol, in reality the terms are used interchangeably today. With the advent of InfiniBand and iSCSI protocols, this association is bound to change (see the sidebar, "Storage Planning for the Future," for an explanation of terms).
Figure 2 shows a simple SAN based upon FC protocol. In effect, I have stripped the servers of all storage units, created an FC network on the other side of the TCP/IP-based Ethernet network, and connected all the servers via high-speed 100MBps links to an FC switch, to which I have attached FC disks and a tape library via an FC-SCSI bridge. This configuration lets you consolidate storage and scale storage and servers independently, based upon requirements rather than architectural limitations. You can allocate portions of storage to a given server and reallocate them as necessary by simple drag-and-drop methods. All storage is managed as a common pool, easily increasing the amount of storage each administrator can manage by a factor of 5 to 10. You can interconnect the switches into a fabric and scale them almost infinitely, adding bandwidth incrementally as you need it. This storage networking concept lets bandwidth, performance, and capacity scale independently along three axes. Application servers can then devote more of their performance to application processing because they no longer carry the burden of storage management. Congestion on your TCP/IP network eases because most data now travels on the high-speed SAN. You can share tape resources and make backup and restore more efficient. Cumulatively, these are huge benefits.
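The pooled allocation described above can be modeled as a single shared capacity from which servers draw space and to which they return it. This is a hypothetical sketch: the `StoragePool` class and its methods are illustrative names, not any vendor's management API. It tracks per-server allocations in GB and refuses requests the pool cannot satisfy:

```python
from dataclasses import dataclass, field


@dataclass
class StoragePool:
    """Hypothetical model of a SAN's shared storage pool (capacities in GB)."""
    capacity_gb: int
    allocations: dict[str, int] = field(default_factory=dict)

    @property
    def free_gb(self) -> int:
        return self.capacity_gb - sum(self.allocations.values())

    def allocate(self, server: str, size_gb: int) -> None:
        """Grant `size_gb` from the pool to `server`, if enough space is free."""
        if size_gb > self.free_gb:
            raise ValueError(f"only {self.free_gb} GB free, {size_gb} GB requested")
        self.allocations[server] = self.allocations.get(server, 0) + size_gb

    def release(self, server: str, size_gb: int) -> None:
        """Return `size_gb` from `server` to the pool."""
        held = self.allocations.get(server, 0)
        if size_gb > held:
            raise ValueError(f"{server} holds only {held} GB")
        remaining = held - size_gb
        if remaining:
            self.allocations[server] = remaining
        else:
            self.allocations.pop(server, None)

    def reallocate(self, src: str, dst: str, size_gb: int) -> None:
        """Move capacity between servers without adding hardware."""
        self.release(src, size_gb)
        self.allocate(dst, size_gb)


if __name__ == "__main__":
    pool = StoragePool(capacity_gb=1000)
    pool.allocate("db", 400)
    pool.allocate("mail", 200)
    pool.reallocate("db", "mail", 100)
    print(pool.allocations, "free:", pool.free_gb)
```

Reallocating 100GB from `db` to `mail` in this 1,000GB pool leaves each server with 300GB and 400GB still free, without touching either server's hardware; under DAS the same move would mean physically recabling disks.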
A key aspect of a SAN is that it deals with block-level data. To understand this data attribute requires a quick understanding of SCSI - a protocol used by a computer to talk to a storage device. The transfer of data between these devices occurs in chunks known as "blocks." Certain applications, most notably database applications, use this method of communicating with a disk in order to maximize performance. Consequently, some of the earliest SAN implementations have been for database applications (financial, e-commerce, CRM, ERP, and so on).
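The difference between file-level and block-level access shows up in how a request is addressed. A NAS client asks for a byte range within a named file; a file system then translates that into SCSI commands naming logical block addresses (LBAs) and a block count, which is the form of traffic a SAN carries. The sketch below shows only the arithmetic of that translation, assuming the traditional 512-byte block and a file stored contiguously from LBA 0; real file systems locate a file's blocks through their own metadata:

```python
BLOCK_SIZE = 512  # bytes; the traditional disk block size, assumed for illustration


def byte_range_to_blocks(offset: int, length: int,
                         block_size: int = BLOCK_SIZE) -> tuple[int, int]:
    """Return (first LBA, block count) covering bytes [offset, offset + length).

    Assumes the data is laid out contiguously starting at LBA 0.
    """
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be positive")
    first = offset // block_size
    last = (offset + length - 1) // block_size  # block holding the final byte
    return first, last - first + 1


if __name__ == "__main__":
    # A file-level request for 2,000 bytes starting at byte 1,000 ...
    lba, count = byte_range_to_blocks(1000, 2000)
    # ... becomes a block-level request for whole blocks.
    print(f"read {count} blocks starting at LBA {lba}")
```

Because the disk only moves whole blocks, a request that straddles block boundaries reads more bytes than were asked for; databases that manage blocks themselves avoid that mismatch, which is one reason they favor block access.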
For the reasons already stated, the SAN market has reached $5 billion and is expected to grow at 50 percent or more a year over the next few years.
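Compounding the growth rates quoted in this article shows how quickly the two markets would expand if those rates held: NAS starting at $2 billion with 70 percent annual growth, SAN at $5 billion with 50 percent. The three-year horizon below is an arbitrary choice, and the output is arithmetic on the article's figures, not a forecast:

```python
def project(start_billion: float, growth: float, years: int) -> list[float]:
    """Market size for each year 0..years under constant compound growth."""
    return [start_billion * (1 + growth) ** y for y in range(years + 1)]


if __name__ == "__main__":
    # Starting sizes and growth rates as stated in the article; the horizon is arbitrary.
    for name, start, growth in (("NAS", 2.0, 0.70), ("SAN", 5.0, 0.50)):
        sizes = ", ".join(f"${s:.1f}B" for s in project(start, growth, 3))
        print(f"{name}: {sizes}")
```

At these rates NAS would nearly quintuple in three years (from $2 billion to about $9.8 billion), while SAN would more than triple (from $5 billion to about $16.9 billion).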
NAS vs. SAN. A hot debate is brewing in the market regarding the superiority of one technology over the other. But, in a nutshell, the debate is completely useless. Both technologies are required to alleviate the issues of storage growth and management. As shown in Figure 3, it is only a matter of time before most solutions on the market let you store all data in a common storage pool and access it by block or by file, depending upon application and user needs. Figure 3 shows an Internet application where you need both NAS and SAN to get the job done. Also bear in mind that even NAS devices that connect via Ethernet on the front end often connect to their disks via a SAN on the back end. All of these situations should lead you to one conclusion: Both are necessary in spite of what vendors tell you. Storage networking is the term that encompasses both NAS and SAN and captures the essence of storage and networking convergence; it is also the term I prefer.
Your application determines which of these two methodologies will be most appropriate. For instance, a financial application that deals with transactions (an OLTP application) will benefit from a SAN, because the database behind the application is most likely designed to handle block-level data. An email application, on the other hand, deals with files, so a NAS device would be most effective. You can establish very quickly that a typical enterprise has applications that span NAS and SAN boundaries and therefore requires a well-thought-out storage strategy to smoothly incorporate both. (See Table 1.)
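The rule of thumb in this paragraph, block-level workloads on a SAN and file-level workloads on NAS, can be written as a small lookup. This is a hypothetical sketch: the `Access` enum, the helper names, and the two-entry inventory are illustrative, and a real plan would also weigh performance, sharing, and backup needs:

```python
from enum import Enum


class Access(Enum):
    BLOCK = "block"  # application talks to disks in SCSI blocks (e.g. a database)
    FILE = "file"    # application opens and reads whole files (e.g. email)


def storage_for(access: Access) -> str:
    """Rule of thumb: block-level workloads on SAN, file-level workloads on NAS."""
    return "SAN" if access is Access.BLOCK else "NAS"


def storage_plan(apps: dict[str, Access]) -> dict[str, str]:
    """Map each application in an inventory to the storage type it favors."""
    return {name: storage_for(access) for name, access in apps.items()}


if __name__ == "__main__":
    # Hypothetical inventory built from the two examples in the text.
    inventory = {"OLTP financial system": Access.BLOCK, "email": Access.FILE}
    plan = storage_plan(inventory)
    print(plan)
    print("needs both SAN and NAS:", set(plan.values()) == {"SAN", "NAS"})
```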
SCSI. As I mentioned earlier, SCSI is the primary protocol that enterprise servers use to communicate with storage devices, including disks, disk arrays, and tape libraries. It is extremely efficient and has stood the test of time. To complicate matters somewhat, SCSI also defines the transport that carries this protocol, so SCSI devices have everything they need to communicate with each other. Because that transport is parallel, however, SCSI suffers from serious distance limitations, which have restricted its appeal to DAS, where storage is attached directly to the server over short distances. To implement SANs, which require server-storage separation, you need a protocol that can span longer distances. Hence the creation of Fibre Channel.
Fibre Channel. FC is a transport protocol. It basically carries the SCSI protocol, redefined to be serial, which lets it run over optical fiber media at distances of up to 10 km. The initial implementation of FC ran at 100MBps (1 gigabit per second), and 200MBps devices are now becoming available. Given these speed and distance specifications, you can implement a SAN with each server and storage unit located up to 10 km from the switch. You can interconnect the switches in a mesh to create a fabric of almost unlimited complexity. In addition to SCSI, FC has also been defined to carry IP and other protocols. FC is really the only choice available today to build SANs and start reducing the horrendous costs of managing storage.
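The link speeds and distances above translate into transfer times and delays as follows. This sketch assumes decimal units, a fully sustained link rate with no protocol overhead, and the common approximation that light in optical fiber travels about 200,000 km/s (roughly 5 microseconds per kilometer):

```python
# Propagation delay: light in optical fiber covers roughly 200,000 km/s,
# which is about 5 microseconds per kilometer (an approximation).
FIBER_US_PER_KM = 5.0


def transfer_hours(data_gb: float, link_mb_per_s: float) -> float:
    """Time to move data_gb at a sustained link rate, ignoring protocol overhead (decimal units)."""
    return data_gb * 1_000 / link_mb_per_s / 3_600


def round_trip_us(distance_km: float) -> float:
    """Round-trip propagation delay over fiber, excluding switch and device latency."""
    return 2 * distance_km * FIBER_US_PER_KM


if __name__ == "__main__":
    for rate in (100, 200):  # FC link speeds in MBps, as given in the text
        print(f"1TB at {rate}MBps: {transfer_hours(1_000, rate):.1f} hours")
    print(f"10 km round trip: {round_trip_us(10):.0f} microseconds")
```

Under those assumptions, moving 1TB takes about 2.8 hours at 100MBps and about 1.4 hours at 200MBps, while the 10 km maximum adds about 100 microseconds of round-trip propagation delay.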
The advantages of FC are well understood: excellent bandwidth, long distances, and the ability to carry SCSI traffic - meaning no changes are required to thousands of applications that speak SCSI.
The disadvantages are: interoperability issues primarily caused by vendor in-fighting and delayed creation of standards (a practical limitation, not an inherent technological one); requirements for implementing a new, separate network (all enterprises have Ethernet networks today that run TCP/IP); and the need to purchase a lot of new equipment.
Other technologies. For other related technologies, see the sidebar, "Storage Planning for the Future."
The bottom line is: Storage needs are roughly doubling every year; storage management costs are spiraling out of control; the storage skill set is quite scarce; and traditional storage infrastructures are at a breaking point. SAN and NAS are excellent technologies to bring some order to this chaos. But selecting technologies is not easy. Should you jump into the fray with FC today, or wait for the iSCSI or InfiniBand panacea? My recommendation is simple: Jump in with both feet with FC right away, because your needs are immediate and the other technologies are not fully implemented yet. The good news is that many vendors are beginning to build bridge products from FC to iSCSI or InfiniBand, so your investment in FC will not go to waste. By starting now, you will gain experience with SANs and NAS, and no matter which technology gains momentum in the future, you will be poised and ready. Without that experience, the storage tsunami is sure to drown those who do not act now.
Arun Taneja [arunt@enterprisestoragegroup.com, 508-482-0188] is a senior analyst with the Enterprise Storage Group, an analyst group focused on storage technologies. He has 25 years of experience in the storage and server industry.
Cisco Systems
Compaq Computer Corp.
Dell Computer Corp.
EMC Corp.
Hewlett-Packard
IBM
Network Appliance Inc.
Sun Microsystems
Fibre Channel Industry Association
InfiniBand Trade Association
Internet Engineering Task Force Internet Draft for iSCSI
Storage Networking Industry Association