The Power of PrototypingWith small steps focused on gaining and retaining user buy-in, a data warehouse can arrive and thriveBy Chris Lawson Software development for large, sophisticated information systems is traditionally an extremely structured process, with many days spent on up-front analysis, requirements analysis, design reviews, and so on. The strategy for these types of projects is to invest much time early, when mistakes are cheaper to fix. For large government-funded projects, this strategy is reasonable because the customer requirements are fixed and nonnegotiable. In fact, the customer may even mandate strict standards, such as those defined by the Software Engineering Institute. These standards demand thorough requirements analysis and rigorous design reviews. The emphasis is on repeatable processes.
CHANGE IN GROUND RULESThis software development strategy often does not work well with private-sector projects. Simply put, the environment and ground rules are different. For instance, in many projects, the requirements are not actually fixed. The precise requirement is negotiable, elusive, and will probably change. As Steve Tracy points out in "Close the Loop" (March 27, 2001), "Just when you are about to deliver, expect the rules to change then change again." In other words, the entire nature of the project development cycle is fluid. Data warehouse projects in particular have fared poorly using the traditional development process. For example, by the time the various data sources have been identified and transformed into some form of viable product, the in-house customers have often lost interest or turned elsewhere for the information. It's a well-known fact that 50 percent of large data warehouse projects fall short of goal and many data warehouse projects fail on first or even second attempts. Thus, traditional, extremely structured development cycles may not meet the corporation's business needs, especially for a complex data warehouse. DIGITALTHINK PROJECT STRATEGY: PROTOTYPINGFor all these reasons, DigitalThink chose a different model for its new data warehouse. The product manager realized that "selling the customer" would be a key ingredient for success. Selling the project in-house was recognized as a problem equal in importance to technical issues. The data warehouse team used a simple prototype approach to quickly showcase a working model with some features that could be demonstrated. The idea in the DigitalThink data warehouse prototype was to deliberately follow a simpler development process. Thus, early phases in the project were designed to be very limited in scope. In fact, product management identified just six key reports to produce for the initial rollout. The (good) assumption was that early success in these critical reports would prompt enthusiasm for the more challenging later stages. A prototype model requires trade-offs. Because much less time will be spent on design analysis, some mistakes will be made; designers will probably make some poor design choices. However, prototyping provides the opportunity to quickly uncover "show stoppers," as Vijay Saradhi and Martin Simoneau advise in their article on prototyping, "Look Before You Leap" (Feb. 16, 2001). In other words, with a narrow scope of work, you will encounter serious roadblocks quickly, and the cost of correcting them will be small. A second benefit you reap via the prototype method is the ability to gain support among potential users. The DigitalThink working prototype was available after just a few months. It provided for the demonstration of a few key reports in order to get "buy-in" from interested parties and increase the probability of the project's success. Saradhi and Simoneau point out that this scheme can "illustrate the power and capabilities of data warehousing to the user community." The various demos showed the customers that the project was on track and the output was useful. The product manager acted as customer liaison and focused on publicizing the project and coordinating demos. Both in-house and external customers were presented with actual working versions of the prototype. One important demo was at a conference for the American Society for Training and Development. For this event, the team prepared a select group of reports that would best illustrate the project's value (and could also be used later, in production). A separate Oracle database server was set up just for this conference. One drawback to the demos was that they consumed 100 percent of some resources for the week preceding the demo. At the beginning of the project, the magnitude of this impact was unknown. Despite the large preparation time, however, these demonstrations were not just tangents off the "real work" of solving technical problems. After time, it became clear that they were the real work. THE DESIGNThe DigitalThink data warehouse provides customers online statistical reports on student training activity. For instance, a training administrator can request a list of student scores for a particular course enrollment or list how many students are enrolled in each course. Most reports present information both graphically and in tabular form. Most data warehouses use a process called extraction, transformation, and loading (ETL). In the DigitalThink design nightly runs perform data ETL from the production Sybase Inc. online transaction processing (OLTP) system into an Oracle 8i database (see Figure 1). Initially, the system performed the data extraction using Perl scripts that transferred data from Sybase into Oracle, using Oracle's Sql*Loader utility. Later, because of concern about maintaining custom shell scripts, the team used Data Junction Corp. for data extraction. In order to expedite daily processing, the system inserted only the changed production data into the data warehouse. The identification of changed production records required several minor modifications to the production OLTP system. These changes were surprisingly easy to institute, in large part because the customers had bought in. That is, everyone knew that the new data warehouse was going to work because it was already working. The system uses E.piphany Inc.'s ETL tool to transform data into the appropriate star schema format. With this tool, all database definitions, such as table and index names, are stored in E.piphany metadata. This metadata comes in handy if a complete database rebuild should ever be necessary. After E.piphany transfers all data from Sybase to Oracle, the Actuate Corp. reporting system produces the reports. When a user requests a report, Actuate reads the preexisting report definition to create a database query. It then executes the query on the Oracle 8i database, adds formatting and graphics, then forwards the report to the Web server for delivery to the client's Web browser. A typical report queries the Oracle database to determine the student enrollment history for a particular customer. (See Figure 2). This report is called the "dashboard report" because it is the starting point from which the user "drives" the reporting application. FOUR-SECOND GOALThe biggest technical hurdle was the need for extremely rapid generation of certain critical reports. The entire project's success would be marred if the reports could not be produced more rapidly than by the existing NT-based system. The product manager set a goal of four seconds' elapsed time to produce the initial dashboard report page that the customer sees. (See Figure 2). The idea was to use the six prototype reports to identify performance bottlenecks early. Various bottlenecks revealed themselves in the database and use patterns, which the team was able to counteract: Database performance. Surprisingly, some of the newest Oracle features were a mixed blessing. One Oracle feature, specially designed for data warehouses, is called "Star Transformation." This feature recognizes the unique layout of data warehouse tables and optimizes query performance. Star Transformation did improve performance of some reports, but severely degraded others, causing more work for the database administrator. After enabling this feature, one report in particular had to be temporarily disabled because of its abysmal performance. Report "factories." Some warehouse queries require extensive processing that is, more than just a few minutes. This need presents a risk that numerous reports of this type might be requested simultaneously, thus degrading the system for everyone. The team mitigated this risk with "factories." The Actuate reporting tool assigns report requests to factories, which restrict the number of simultaneous queries against the Oracle database. It assigns reports to particular factories based on expected run time. The factory for the typical, fast-running report was configured to allow simultaneous execution of two reports per CPU.
|
Most Popular This Week
IE Weekly Newsletter
Subscribe to the newsletter
|
| ||||||||||||||||||||||||||||||||









