CMP -- United Business Media

Intelligent Enterprise

Better Insight for Business Decisions

UBM
Intelligent Enterprise - Better Insight for Business Decisions
Part of the TechWeb Network
Intelligent Enterprise
search Intelligent Enterprise



March 20, 2000, Volume 3 - Number 5



Solving the ERP Paradox

Sourcing data warehouses from ERP systems eases the task of data consistency, but makes creating accessible data marts much more difficult

By Mark Riggle



Enterprise resource planning (ERP) systems provide mission-critical operational workflow for the enterprise. In many companies, hundreds of legacy subsystems retire when an ERP system is installed. However, because ERP systems do not support decision-support systems (DSS) directly for several reasons, the need to source a data warehouse from the ERP system is obvious.

This need has been a continuous theme in writings about ERP systems and data warehousing. However, the distinction between building data warehouses sourced from ERP systems vs. those sourced from pre-ERP legacy systems is nearly always ignored. As I’ll explain here, this distinction is crucial for reasons that lie at the heart of the data warehouse. Some of the tasks in constructing a data warehouse on top of an ERP system have become easier and others have become harder, but the good news is that the more difficult tasks may become easier with the imminent emergence of a new set of tools.

To fully understand this change, we need to take a close look at ERP design and data warehousing basics. A data warehouse has two reasons to exist: to make data consistent and to make it accessible. Figure 1 (page 54), which draws on the methodology proposed by Ralph Kimball, gives you an idea of the relationship between these tasks. Generally, the staging area makes data consistent while the data marts provide data accessibility. But the work required to meet both of these goals differs when an ERP-based data warehouse is involved. Let me explain how.



FIGURE 1 Elements of a data warehouse.


Achieving Data Consistency

To make data consistent, usually you need to address problems associated with data noise, multiple-source data, multidata encoding, and data structuring:

•Data noise arises from incorrect data entered or data entered multiple times, such as an address with small differences. Networks of legacy systems often have this problem because the same data is entered in many separate systems.

•When different data originates from multiple sources, it may differ in value or be otherwise incompatible.

•The difficult problem of multidata encoding can occur during the evolution of a legacy system. Because it is often faster to modify programs than to change table structures, developers may have encoded an attribute so that it could hold information that would otherwise comprise two or more attributes. The drawback is that the information is now hidden in the code that uses that attribute, and other uses of the data are now complicated by the required decoding. Making the data consistent requires separating the multiple parts.

•Data structure problems among legacy systems can be vexing. In one system, a concept could be an attribute on an entity, and in another system, it could expand into several entities. For example, system A may have address recorded as three text lines and system B may define the constituent parts of an address as entities, to enable understanding of relationships among addresses. Combining data from these systems requires a lot of work, either by collapsing the structure or adding it to the flattened structure.

In addition, legacy systems are often poorly organized and very hard to understand. The people with the job of ensuring data consistency must be familiar with the original systems as well as the models of the desired data warehouse. Many data warehouse tools are geared toward aiding this process by providing a place to record the mappings, but the task remains difficult and time-consuming.

You must resolve all these problems — and more — to make the data in your data warehouse consistent. To complicate matters, political problems in resolving them may arise when the different legacy systems span departments or even larger units; for example, the parties may disagree about which kinds of data are more important than others. Indeed, political problems can be the most difficult ones in a data warehouse project. Departmental-level projects may largely avoid them, but to gain an enterprise data warehouse view, you’ll have to resolve such problems eventually.

As has been well documented, ensuring data consistency often consumes a large proportion of the overall project timeline. Although the potential problems are generally not intellectually difficult, in traditional data warehousing projects, they can hijack 60 to 90 percent of project time and design money. In fact, this task can slow overall progress so severely that the legacy systems may change before the project is completed.

The ERP Change

First of all, ERP systems solve all the data consistency problems I’ve described during design and installation. Legacy systems may remain after an ERP package is installed, but that package becomes the standard for the enterprise and the central point for resolving consistency problems. The ERP system can span an enterprise because it has a consistent enterprise data model: objects are modeled once and reused over and over. Thus, ERP systems offer data consistency as “icing on the cake.”

Although the ERP system will obviate data consistency problems, political problems among departments can again be the greatest risk factor in an ERP installation. If a department chooses not to cooperate, information in the ERP will not be properly used and may be incorrect, putting the whole system at risk. Thus, the ERP system must be supported at the highest level of the company and its form of the data made supreme for the enterprise; otherwise, the installation project will fail.

This support is clearly great news for data warehousing projects because it ensures enterprisewide consistency in the data.

The Other Twist

We have seen the data consistency side, but to understand completely how ERP systems change data warehousing, we need to look closely at the second justification for a data warehouse: data accessibility; that is, making the data understandable and navigable to the user, and ensuring that tools and applications correctly interpret the data.

Data accessibility means keeping the data in a form that end users can use; usually through data marts with a star or snowflake schema. The task of making the data accessible is intellectually demanding and requires you to understand user requirements, available data types and structures, and business rules and processes. This goal is a difficult one in any environment.

We have seen that with an ERP system serving as the base source system, the task of making data consistent is far easier because it supplies a populated enterprisewide data model that all departments agree upon. The burning question is: If before the ERP era the consistency phase consumed 80 to 90 percent of the data warehouse project’s resources, why aren’t more ERP data warehouse implementations — in which consistency is a given — faster and more successful? The answer is that a different problem now arises in that the price for this enterprise data model is paid in the dimensional modeling design task phase. This phase becomes harder for a subtle but important reason: The data models of the ERP system must be highly abstract.

Proprietary legacy systems usually model the enterprise at a very concrete level. This approach makes them initially easy to create and move to a dimensional model, but very difficult to modify to support changes in the business. In fact, this induced brittleness is often a factor in deciding to implement an ERP system, which you can configure to handle thousands of different business processes, all without data model changes. These systems, in order to handle this wide variety of businesses, use a more abstract model than the in-house systems of most organizations. (The difference between concrete modeling and abstracted modeling has been well covered in Intelligent Enterprise, particularly by Terry Moriarty; see Resources.)

Take the example of a common data model patterned as a hierarchy. (See Figure 2.) The concrete representation of the departments, divisions, and companies and the allowed relationships among them are very easy to understand and transfer to a dimensional model. In the higher level of abstraction, the three entities are modeled by one entity and the relationships are transformed into an entity. In addition, you must model the constraints of the prior relationships with additional entities that carry the needed restrictions as data.

Initially, things are more complicated in the more abstract model, but supporting change and variability is much simpler. A new organization level becomes new entries or instances, not new entities. In the case of ERP systems, you can support different configurations without model changes and often without program changes. However, how should you represent this hierarchy in the data warehouse to make it accessible? The end user wants to see department, division, and company — not organizations and reified relationships. The greater dependence on higher abstraction in the source system (the ERP) — abstraction that can confuse the data warehouse end user — now makes data accessibility more difficult. To make that data accessible to the data warehouse end user, you have to reduce the level of abstraction.



FIGURE 2 Common data model patterned as a hierarchy.


Abstraction in Modeling

There are more than two levels of abstraction for various modeling problems. Levels of abstraction can vary greatly in the representation — from none to so abstract that communication becomes impossible. The appropriate level of abstraction will not only accommodate the right level of change, but also retain meaning for good communication and process modeling. The ERP vendors hope that they chose the right level.

A good example of a highly abstracted pattern is the “parameterization” pattern David Hay describes in his book Data Model Patterns: Conventions of Thought (Dorset House, 1995). This pattern applies to entities such as products, customers, vendors, and work orders. For example, in a large company, distinct products could number in the thousands, and thus the distinct attributes of its products could number in the thousands or tens of thousands (most of them disjoint). For instance, a computer has a CPU speed and a TV has a picture tube diagonal measure, and they do not share those attributes. Of course, it would be very poor modeling practice to make a table of thousands of attributes in which nearly every one would be missing values for most rows. An extensive hierarchy of subtypes would be better, but that would make adding new attribute types difficult.

“Abstracting out” the column from an entity yields the parameterization pattern. (See Figure 3.) For example, say we’re looking at shoes. Shoes have eyelets for the laces, and you could represent the number of eyelets by a parameter name such as “number of eyelets” and a parameter value (of say, 4).

FIGURE 3 Parameterization pattern.


An actual implementation is much more complicated, however. The problem with parameterization when constructing a data warehouse is that usually you want to use a more concrete representation that a user can understand. Clearly, if the business analyst is looking at shoes, she expects to see that “number of eyelets” is an attribute of shoes, and that she will be able to manipulate the shoe data in those terms. But burying the information in a parameterization pattern renders it useless to that analyst. Many such abstractions occur in ERP systems, all of them requiring transformation into a dimensional model and data mart.

The Tools Scene

Thus, we see that ERP systems make creating dimensional models harder due to their abstracted nature, while making the consistency problem easier to solve because they supply the enterprisewide data model. The currently available tools mostly address the clerical side of the problem; they are tireless scribes that do notation well and deliver the low-level information. They are the best we can hope for to make data consistent.

The complexity of the consistency phase for ERP data warehouses is still high enough that current extract-transform-load (ETL) tools provide a good value, especially because you can add other data sources outside the ERP system. However, those ETL tools do nothing to ease the structural understanding and transformation tasks required in the modeling phase.

Different tools are needed for that process, tools that bridge the transition from clerical help to “knowledge” help. These “intelligent” tools would be aware of the abstractions and commonly used data structure patterns in the system. A tool that understands the structural features of the ERP data model would give the user the link between the abstractions in the ERP model and the concrete representations required by the data mart that is accessed directly by an analyst. It is in that transformation that the data held by an ERP is made visible and useful. For example, an intelligent tool that understands the hierarchy pattern could determine if the hierarchy is balanced and create an appropriate data mart where the levels are expanded.

An even deeper problem would be solved if the tool were to understand the parameterization pattern. In practice, the parameterization pattern and the hierarchy pattern are used together to create a product hierarchy (for example) where the different nodes and levels of the hierarchy can have different attributes. Such tools would let users navigate the hierarchy and parameters to create a data mart that has the classifications they want.

The Upshot

ERP systems have changed data warehousing. They ease the task of making data consistent, but make the task of creating accessible data marts much more difficult because of their overall complexity and the higher level of abstraction in their data models. Unfortunately, the available tools don’t help in the intellectually difficult task of making the data accessible. We need a new kind of tool — a more “intelligent” one — for this task.

Mark Riggle (mark.riggle@sas.com or rigglem@mindspring.com) is on the research and development staff in the data warehousing technologies group of SAS Institute Inc. in Cary, N.C., and is currently investigating ERP-based data warehousing. He has more than 20 years of experience in software research and development.


RESOURCES

Terry Moriarty’s Metaprise columns:
www.intelligententerprise.com/ports/search_metaprise.jhtml





IE Weekly Newsletter
Subscribe to the newsletter
    Email Address