
Intelligent Enterprise

Better Insight for Business Decisions

Adjust Your Thinking for SANs

Storage area networks offer interesting advantages to the data warehouse

by Ralph Kimball


Continued from Page 1

High-performance direct transfer from disk to disk. Data warehouse operations have all sorts of needs for major data copying from disk to disk. This kind of data copying does not involve a complex application. You may need to move a suite of databases from a test machine onto a production machine. Or perhaps you will need to physically replicate an entire application in order to scale it upward to meet increased demand. (And of course, you can scale down an application pretty easily on a SAN when the surge of demand passes.) Maybe we should call this kind of scaling "naive parallelization" because it lets you copy both servers and data without very much planning. All that seems to be required is to segment the incoming demand so that each of the servers on the SAN can satisfy its part separately. Busy Web sites take this approach, bringing "proxy servers" that are identical clones of each other online to handle high load conditions.

An interesting wrinkle in the copying-to-accomplish-scaling scenario allows a DBA to copy granular base data onto separate storage devices without necessarily copying the aggregated data. Separate application servers on the SAN could each have their own copy of the base data but navigate to the same aggregate tables on a single storage device if it makes sense to do so. Given the usual rule of thumb that the total size of the aggregated data is roughly equal to the size of the base data, the disk requirements for a parallelized application might be reduced significantly under the right conditions.
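The disk arithmetic behind this point can be sketched as follows. This is only a rough illustration, assuming the aggregates are about the same size as the base data (the rule of thumb above). The function name and the sample figures (four servers, 500GB of base data) are hypothetical and do not come from any particular installation.

```python
def disk_requirements(servers, base_gb, aggregate_ratio=1.0):
    """Compare total disk needed when every server copies everything
    with the case where servers copy base data but share one set of aggregates.

    aggregate_ratio is the assumed size of the aggregates relative to the
    base data (1.0 reflects the rule of thumb; adjust for your own warehouse).
    """
    aggregate_gb = base_gb * aggregate_ratio
    full_copies = servers * (base_gb + aggregate_gb)       # everything replicated
    shared_aggregates = servers * base_gb + aggregate_gb   # one aggregate copy
    return full_copies, shared_aggregates


# Hypothetical example: four application servers, 500GB of base data.
full, shared = disk_requirements(servers=4, base_gb=500)
saving = 1 - shared / full
print(f"full copies: {full:.0f}GB, shared aggregates: {shared:.0f}GB, saving: {saving:.1%}")
```

Under these assumptions, four servers need 2,500GB rather than 4,000GB, a saving of 37.5 percent. As the number of servers grows, the saving approaches 50 percent, because the ratio (N + 1) / 2N tends toward one half.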

Justified economics for higher-performance devices. Although this benefit may sound like vendorspeak, it is probably true that the economics of centralizing the physical storage really do justify investing in higher-performance devices. This seems especially true for high-performance tape-backup devices. It might not make sense to have a terabyte-capacity tape-backup system for any one application, but it probably would if the device could be shared by the whole enterprise. High-end (expensive) tape subsystems can handle 20TB of data with transfer rates of 500GB per hour. This scenario supports the next point as well.
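To put the tape figures in perspective, here is a minimal sketch of the backup-window arithmetic. It assumes decimal units (1TB = 1,000GB) and a sustained transfer rate with no overhead for tape changes or verification, both of which are simplifications. The function name is hypothetical.

```python
def backup_hours(data_tb, rate_gb_per_hour, gb_per_tb=1000):
    """Hours needed to stream data_tb terabytes at a sustained rate.

    Assumes decimal units and ignores tape changes, verification,
    and any contention on the storage network.
    """
    return data_tb * gb_per_tb / rate_gb_per_hour


# The figures quoted above: 20TB at 500GB per hour.
hours = backup_hours(data_tb=20, rate_gb_per_hour=500)
print(f"{hours:.0f} hours")
```

Under these assumptions a full pass over 20TB takes about 40 hours, which helps explain why a device of this class is easier to justify as a shared enterprise resource than as part of any single application.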

Elimination of bandwidth-intensive data transfers from the LAN. The huge data transfers needed to back up all the various stages of the data warehouse can be removed from the main LAN.

More efficient use of expensive administrative personnel. The arguments for centralizing backup facilities also apply to centralizing the personnel who perform these functions. Staff who carry the full production-level responsibilities of the whole enterprise can work more efficiently, and the enterprise can justify hiring more skilled people and paying them more.

A single centralized scratch space for major database table manipulations. The standard formula of planning for five times the storage space of your largest fact table can be significantly scaled back because a number of large applications can share a common scratch space for temporary copies of database tables.
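One way to picture the saving is the sketch below. It rests on two assumptions that are not stated above and should be checked against your own workload: that one of the five multiples is the fact table itself and the other four are scratch space, and that only a limited number of applications perform major table manipulations at the same time. The function names and table sizes are hypothetical.

```python
def isolated_storage(fact_tables_gb, factor=5):
    """Each application plans for `factor` times its own largest fact table."""
    return sum(factor * size for size in fact_tables_gb)


def shared_storage(fact_tables_gb, factor=5, concurrent=1):
    """Each application keeps its own fact table, but scratch space is pooled.

    Assumption: the pool is sized for the `concurrent` largest tables being
    manipulated at once, with (factor - 1) times each table as scratch.
    """
    largest = sorted(fact_tables_gb, reverse=True)[:concurrent]
    scratch = sum((factor - 1) * size for size in largest)
    return sum(fact_tables_gb) + scratch


# Hypothetical largest fact table per application, in GB.
tables = [400, 250, 100]
print(isolated_storage(tables))
print(shared_storage(tables))
```

With these hypothetical sizes, three applications planned in isolation need 3,750GB, while the pooled arrangement needs 2,350GB. The saving shrinks as more applications need the scratch space at the same time, so the `concurrent` parameter is the figure to estimate carefully.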

No need for applications to know where the data is physically located. The SAN is between the application and the physical storage.



Openness, letting multiple technologies access the stored data. A typical data warehouse staging area could have the production OLTP system running under Unix, the data extract-transform step running under NT, and various data marts running under Unix or Windows. The extract-transform-load tool could control the flow of data as it is being transformed, reading from and writing to the native files of each of these systems with the performance of local disk storage.

Configurable to support disaster recovery and fault-tolerant computing. You can extend the SAN to include a separate physical facility up to 10km away from the other devices.

SAN vendors have not developed their technology exclusively for data warehousing, but given the length of the preceding list, we should be a principal target for these vendors. As we in the data warehouse community become more and more distracted by the physical issues of managing our huge data sets, we should welcome this new set of services and products offered by the storage technology vendors.

 

Ralph Kimball co-invented the Star Workstation at Xerox and founded Red Brick Systems. He has three best-selling data warehousing books in print, including the newly released The Data Webhouse Toolkit (Wiley, 2000). Ralph teaches dimensional data warehouse design through Kimball University and critically reviews large data warehouse projects. You can reach Ralph through his Web site at www.ralphkimball.com.





