
Intelligent Enterprise

Better Insight for Business Decisions

August 12, 2002

Mapping Your Enterprise Genome

How much do you know about your IT infrastructure? It pays to find out

by Karen Kazimer-Shockley

Almost any new strategic software project involves a consideration of legacy systems and their perceived functionality. Unfortunately, system documentation is often out of date, the developers of the systems are long gone, and the users themselves may insist on the status quo simply because they've "always" done things that way.

This familiar scenario further complicates the process of determining legacy requirements where more than one legacy system is to be replaced. Much of the same functionality will be duplicated across these systems, most of which were developed as stovepipes. In addition, even if the functionality isn't exactly duplicated, the data will be — although often in different forms or with different data definitions.

One of my recent projects involved just such challenges: My team was assigned to build a data warehouse supporting decision-making across the enterprise for a disparate group of users. Among those users, some were permitted to query databases directly, some could run queries on cubes or subsets of the data, and some could view preformatted reports. In addition to the new data elements that were to be incorporated, this system had the daunting goal of replacing no fewer than 11 legacy systems, a few of which dated back almost 20 years.

These legacy systems were scheduled to be shut down when the new system was operational, so we needed to ensure that our requirements encompassed all levels of functionality within the systems. We had to accept data from several source systems, present data to several different levels of users while assuring them they were getting the data that they had come to know and love, and consolidate data elements where possible, providing one version of the "truth." Because this new system also needed to be efficient, we had to eliminate those items or processes that were no longer necessary, or at least not high on the list of priorities.

Executive Summary

The biggest risk in implementing innovative e-business solutions is that requirements and existing processes may not be fully documented or understood. Unmitigated, this risk can result in new systems that don't fully provide the features users require. This article describes a strategy for avoiding that risk.



Mark Riggle's "Breaking the Cycle of Failure" (Aug. 10, 2001) describes a requirements analysis approach for data warehousing.

Our first approach was to perform due diligence by examining all available system documentation. After this one-day task was completed (well, it couldn't have taken much longer; there wasn't a lot to read), we expanded our approach to simulate how you might determine requirements for new systems using the legacy systems' users and their processes as data points. In addition, we decided to make our approach data driven because the final goal was to store data in a data warehouse.

Although all strategic systems aren't data warehouses, all systems do involve data, so the approach I'll describe here is generally applicable. In essence, we focused on three areas: storage and processing of data, data sources, and target output.

Storage and Processing

The most complicated requirements involved data storage and processing, so we addressed those first. The required data itself depended on the data received from the sources, the transformation applied to this data, and the data that was to be delivered to the user in reports, queries, or feeds to other systems. We addressed each legacy system by defining its data, the reports provided, and the applied transformations:

1. Legacy system data. For each of the 11 systems, we annotated each data element along with its metadata.

2. Reports and queries. Next, we set up programs on the legacy systems (where possible) to determine when, how often, and by whom certain reports were run. We then tracked over a six-week period which users requested that custom queries be provided, as well as when these queries were requested. We also compiled a list of all reports that were run, how often they were run, and who the target users were. Based on this information, we ranked the reports and queries in two ways: by those that were run the most and those that were used by the highest-level decision makers.

3. Transformations. This last step was the most difficult. We set up joint application development (JAD)-style groups with user advocates and then attempted to capture the business rules used to transform the input data into the resulting report and query data.
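The two rankings described in step 2 can be sketched roughly as follows. This is only an illustration, not the programs the team actually ran: the `ReportRun` record, the numeric `user_level` scale (1 meaning the most senior decision maker), and the sample log entries are all assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportRun:
    report: str      # report or query name
    user: str        # who ran or requested it
    user_level: int  # hypothetical scale: 1 = most senior decision maker


def rank_by_frequency(runs):
    """Report names ordered from most-run to least-run."""
    counts = Counter(run.report for run in runs)
    # Descending count, then name, so ties come out in a stable order.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ordered]


def rank_by_seniority(runs):
    """Report names ordered by the most senior user who ran each one."""
    best = {}
    for run in runs:
        best[run.report] = min(best.get(run.report, run.user_level), run.user_level)
    ordered = sorted(best.items(), key=lambda kv: (kv[1], kv[0]))
    return [name for name, _ in ordered]


# Hypothetical usage log, for illustration only.
usage_log = [
    ReportRun("Weekly Sales", "analyst_a", 3),
    ReportRun("Weekly Sales", "analyst_b", 3),
    ReportRun("Weekly Sales", "analyst_a", 3),
    ReportRun("Regional Forecast", "vp_sales", 1),
    ReportRun("Customer List", "manager_c", 2),
    ReportRun("Customer List", "analyst_a", 3),
]

print(rank_by_frequency(usage_log))
print(rank_by_seniority(usage_log))
```

Keeping the two rankings separate makes it easier to spot a report that is rarely run but still matters because a senior decision maker depends on it.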

As an aid to this effort, we created a spreadsheet containing the sum of our knowledge: input, processes, and output. (See Table 1 [PDF, 12K].) The left-hand column of the matrix lists all input data elements, grouped by legacy system. The next set of columns contains the metadata for each element. Across the top are listed the top queries run and reports produced, along with the targeted user involved. (The targeted user is identified both by organization and by level within the organization.) An X is placed in each cell where the data element in that row appears in the report or query heading that column. If any raw input data was processed, the business rule involved is listed as well.
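For readers who prefer to see the matrix as a data structure, here is a minimal sketch. The layout of the real Table 1 is not reproduced here; the `ElementRow` class, the metadata keys, the report names, and the "Order Amount" element with its business rule are hypothetical stand-ins.

```python
from dataclasses import dataclass, field


@dataclass
class ElementRow:
    system: str    # legacy system the element comes from
    name: str      # data element name
    metadata: dict # e.g. type, length, definition
    used_in: set = field(default_factory=set)  # reports/queries marked with an X
    rules: dict = field(default_factory=dict)  # report -> business rule text

    def mark(self, report, rule=None):
        """Place an X under a report and optionally record the rule applied."""
        self.used_in.add(report)
        if rule is not None:
            self.rules[report] = rule


def render(rows, reports):
    """Render the matrix as text: one row per element, one column per report."""
    lines = [" | ".join(["System", "Element"] + list(reports))]
    for row in rows:
        cells = ["X" if report in row.used_in else "-" for report in reports]
        lines.append(" | ".join([row.system, row.name] + cells))
    return "\n".join(lines)


# Hypothetical reports and elements, for illustration only.
reports = ["Customer List", "Weekly Sales"]
name = ElementRow("Sales", "Name", {"type": "char", "length": 40})
amount = ElementRow("Sales", "Order Amount", {"type": "decimal", "length": 10})
name.mark("Customer List")
amount.mark("Weekly Sales", rule="Sum order amounts by week")

print(render([name, amount], reports))
```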

When the matrix was complete, we had a comprehensive list of each system's data elements, those that were used as output, and the transformations applied to that data. There were instances, however, where a data element wasn't directly displayed in a report or query, but was used in a transformation or query. This data element was also then captured in the matrix and annotated in the Required Field column. An example of such a field is the Modification Date, which is required for ensuring that only the latest updates are applied to a field, but doesn't appear routinely in a user report.
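The Modification Date case can be made concrete with a small sketch of a "latest update wins" rule. The record layout, field names, and dates below are assumptions for illustration, not the rule as the legacy systems implemented it. Note that the date drives the outcome but never appears in the result, which is why such a field has to be flagged as required even though no report displays it.

```python
from datetime import date


def apply_latest(updates):
    """Keep, for each (record_id, field), the value with the newest modification date.

    `updates` is an iterable of (record_id, field_name, value, modified) tuples.
    """
    latest = {}
    for record_id, field_name, value, modified in updates:
        key = (record_id, field_name)
        if key not in latest or modified > latest[key][1]:
            latest[key] = (value, modified)
    # The modification date is dropped here: it was needed only to pick a winner.
    return {key: value for key, (value, _) in latest.items()}


# Hypothetical updates arriving out of order.
updates = [
    (1, "Last Name", "Smith", date(2002, 1, 5)),
    (1, "Last Name", "Smythe", date(2002, 3, 1)),
    (1, "Last Name", "Smyth", date(2002, 2, 1)),
]

print(apply_latest(updates))
```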

The users of the legacy systems then validated the business rules and the list of data elements represented. In some cases, they added data elements to the list. (In any computer system, some important elements don't routinely surface during data access.) The users also agreed to use these completed spreadsheets as a baseline for new system requirements. Our target system was built to be flexible; new items could easily be added during the implementation phase, if necessary. We were also lucky in that our users understood that when a terabyte-sized data warehouse is involved, storing unnecessary data can be not only expensive but also a drain on system resources, causing delays in response time.

For each legacy system, we now had those data elements that the users considered most important, the systems they came from, and the business rules applied to achieve any required transformations. Because we now had a subset of elements from each system, we were able to repeat the initial step and create a "super" spreadsheet, which contained all input elements from each of the systems, along with the attendant metadata, the compiled list of reports and queries, and the transformations applied.

The next step was extremely time-consuming because all of the data elements were reviewed, based on the definitions, to determine which were duplicates and, thus, candidates for consolidation. This step was where the metadata was extremely useful. When elements were found to be duplicates, the user advocates had the job of deciding which element would be carried forward into the new systems, as well as the length, type, definition, and transformation rules that would be applied to each report.
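Part of this review can be pre-sorted mechanically before people look at it. The sketch below groups elements whose definitions match after simple normalization and reports only groups that span more than one system. It is an assumption-laden illustration: exact matching on normalized text will miss differently worded definitions, so it only produces candidates for the user advocates to judge, and the element names, the "Billing" system, and the definitions are hypothetical.

```python
import re
from collections import defaultdict


def normalize(text):
    """Lowercase and collapse punctuation and whitespace to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def duplicate_candidates(elements):
    """Group (system, name, definition) tuples that share a normalized definition.

    Returns only groups that span more than one legacy system.
    """
    groups = defaultdict(list)
    for system, name, definition in elements:
        groups[normalize(definition)].append((system, name))
    return [
        members
        for members in groups.values()
        if len({system for system, _ in members}) > 1
    ]


# Hypothetical metadata, for illustration only.
elements = [
    ("Sales", "CUST_NO", "Customer number"),
    ("Marketing", "Customer ID", "customer number."),
    ("Billing", "Inv Date", "Invoice date"),
]

print(duplicate_candidates(elements))
```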

Although this step contained some risk (based mostly on the fact that not everyone could be assured they were choosing the right element and the right transformation), it at least provided a methodology for creating one version of the truth. In the new system, each data element would have only one definition and would come from a designated source or sources. In Table 1, an example of a data element that would not be carried forward from the legacy system is the element "Name" in the sales system, which was dropped in favor of the four elements "First Name," "Middle Initial," "Last Name," and "Suffix," which are carried in the marketing system. (The sales system was a fairly old legacy system that didn't offer the flexibility that the marketing system was built to provide.)
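The "one definition, designated source" rule lends itself to a simple consistency check. The following is a sketch under stated assumptions: the `TargetElement` class and the definition wording are hypothetical, and only the four element names and their marketing-system source come from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetElement:
    name: str
    definition: str
    sources: tuple  # designated source system(s)


def build_catalog(decisions):
    """Build a name -> element catalog, rejecting a second, conflicting definition."""
    catalog = {}
    for element in decisions:
        existing = catalog.get(element.name)
        if existing is not None and existing != element:
            raise ValueError(f"conflicting definitions for {element.name!r}")
        catalog[element.name] = element
    return catalog


# Definitions below are placeholders written for this example.
decisions = [
    TargetElement("First Name", "Customer's given name", ("Marketing",)),
    TargetElement("Middle Initial", "Customer's middle initial", ("Marketing",)),
    TargetElement("Last Name", "Customer's family name", ("Marketing",)),
    TargetElement("Suffix", "Name suffix such as Jr. or III", ("Marketing",)),
]

catalog = build_catalog(decisions)
print(sorted(catalog))
```

Raising an error on a conflict, instead of silently keeping the first or last entry, forces the disagreement back to the user advocates who own the decision.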






