Look Leap Before You LeapPrototyping your data warehouse can save your team time and money
By Vijay Saradhi & Martin Simoneau
Building a PrototypeHere's the scenario: You've finished your logical modeling, and now you want to test it for "data sufficiency"; in other words, you would like to know if you could generate all the reports that you promised to your users. Here the steps involved. Step 1: Identify the focus of the prototype. As engineers would say, to find out how long a cow falling off a cliff would take to hit the ground, make an assumption ("imagine that the cow is spherical"). By identifying the focus of the problem (time to reach the ground), engineers are able to make assumptions, which reduces the complexity of the problem. Here your primary focus is data sufficiency. Performance, tools, hardware, and software are not the focus of your prototype. Step 2: Create database objects using your model and connect to the database using your tool. To test your model, you need a database. Ideally, you should test your model in a development environment that mimics your production data warehouse. However, it is possible that your team is still trying to decide on the target RDBMS or you don't have your development environment ready. In that case, get some space in any database that you could get your hands on. If nothing else is available, you could even use a Microsoft Access database. (If you are using Access, you could use username: admin so you don't need to enter a password.) Then forward-engineer your model and configure your OLAP tool to connect to the database. (But remember that unlike with mainstream databases, if you forward-engineer your model into Access, the referential integrity/ER relationships might not be created. If the relationships are missing, you might have to create them manually.) In addition, consider the configuration effort required to connect to the database using the OLAP tool. It could be as simple as pointing to the database through an ODBC connection, or as complicated as creating a metadata database and going through a complicated configuration process that could also include installing middleware such as Net8, DB2connect, or DirectConnect (depending on your database). Step 3: Load a good size sample of real data. Obviously, it is best to work with real data and with a large enough data sample. The required data amount could range from as few as 1,000 records to a month's worth of data, provided you keep in mind the size of your sample set while interpreting your results. However, depending on the time period in which you are prototyping, it is very possible that you might not have access to large amounts of real data. However, it is well worth your while to put the effort into loading ample representative data into your test database. If you are using Access, it is easy enough to input data using an Excel spreadsheet or create a quick form in a few minutes using Access forms. If you have multiple sources of data, make sure you include all the sources during prototyping. Step 4: Generate the required reports and run ad hoc queries. Generate all the user-defined reports. Make sure you not only run the SQL queries necessary to generate the reports, but also use your chosen ad hoc query tool to generate the reports. In the case of webhouses, you might even have to formulate questions based on your data (depending on how your data mining tool works). To determine the probable ad hoc queries, make the prototype available to the users and ask them to run queries against the prototype. Log all the queries that are run against the database. If your prototyping platform is a mainstream database such as Oracle, DB2, Teradata, Informix or Sybase, it is very easy to log all the queries for a particular session or at the database level. For example, with Oracle, you simply need to run utilities such as sql*trace or utilities that come with "tuning packs." They would easily give you the most frequently run SQL along with their query plans and performance statistics. Also, some of the query/analysis tools or business intelligence tools - as they are referred to now - come with administration tools to monitor your data warehouse. You can use these tools to obtain performance statistics or other information to help you gain some insight. Step 5: Digesting the results from your prototype. Being able to run all the reports and ad hoc queries using your favorite tool means that your model has sufficient data elements to support all the user required reports and "most probable" ad hoc queries. Even though performance statistics are meaningless at this point, collect them anyway because when they are evaluated in the proper context, they still yield important insights into your warehouse architecture and data flow within your application. Errors in InterpretationThere are several things that you should not do while interpreting the prototyping results. The following are a few of the important ones: Don't draw unreasonable conclusions. Don't draw conclusions from aspects of your project that are not part of the focus of your prototype. Consider our earlier example of prototyping using an Access database. If you draw any conclusions about query performance, data quality, or timeline to finish, the ETL programming at best gives you a very rough estimate and at worst gives unreasonable expectations to your users and upper management. If your prototype doesn't include all the sources of data, there is potential for conflict when you try to integrate the disparate data sources later. Sizing estimates based on prototype only minimize the margin of error. They do not eliminate the error(s) completely. Manage user expectations. It is impressive to see the results from your ad hoc queries appear on your screen in a split-second. But before you show these results to your users, make sure you prep them first or you could be making a tremendous mistake - because whatever you show them becomes the de facto standard. After examining your prototype, if your users are expecting the same query performance from your fact table, which will have 500 million rows, then they are likely to be disappointed despite your best efforts. Last WordPrototyping should give you very good insight into the nature of your warehouse. If something goes wrong despite all your efforts, at least nobody can blame you for not doing your homework. We wish you all happy prototyping. Vijay Saradhi (vsaradhi@aatnet.net) is the chief technology officer for Advanced American Technologies (www.aatnet.net),a consultancy specializing in complex systems design, integration, data warehousing and webhousing. Martin Simoneau (mcsimoneau@aol.com) is a director of information technology in the financial services industry. He is responsible for the design development and implementation of enterprise decision support systems, specializing in data warehousing solutions.
|
Most Popular This Week
IE Weekly Newsletter
Subscribe to the newsletter
|
| |||||||||||||||||||||||||||||||





















