CMP -- United Business Media

Intelligent Enterprise

Better Insight for Business Decisions




March 20, 2000, Volume 3 - Number 5


Debashish Bhattacharjee   



Bridge of the Enterprise

Using Tivoli to manage multiple ERP systems

Welcome to the new evolution of In the Field! Rather than send a reporter to describe a field implementation second-hand, from now on we’re going to get the story straight from the source: the project manager in whose footsteps you may need to follow. You can expect In the Field to describe not just the “as advertised” features of a product, but also the implementer’s business problem, the thought process that went into the solution, the headaches and unexpected obstacles along the way, and the final resolution. We invite feedback of any kind. Email Jeanette Burriesci, jburriesci@cmp.com.

Many large corporations have become increasingly dependent on ERP systems such as SAP and PeopleSoft to perform core business functions. Any downtime caused by system failure or poor performance can be costly and can even halt daily business operations. Unfortunately, most ERP systems are very difficult to manage, and the task can overwhelm IT departments at corporations with large installations. To solve this problem, many IT departments are looking to enterprise systems management (ESM) tools, such as the suite from IBM division Tivoli Systems Inc. and Computer Associates’ Unicenter TNG suite, to manage the enterprise.

ESM encompasses a broad range of functions. The IT department at the Fortune 500 company profiled here was interested in four of them: systems monitoring, software distribution, backups, and job scheduling. Numerous products claim to perform the ESM function, but two notable leaders in the field, Unicenter and Tivoli, differentiate themselves by incorporating the concept of a framework in their architecture. A framework lets the customer start by purchasing the bare minimum and then add functionality, or plug in competitors’ modules, as the need arises.

Because our client was already a Big Blue shop, it chose Tivoli. The modules it purchased were systems monitoring for Unix, NT, Oracle, and SAP; software distribution; ADSM for backup and recovery; and Maestro to handle job scheduling. The purpose of these modules was to support the ERP systems and provide an integrated package with all modules “talking” to each other and reporting errors to a central location.

A Business Case for ESM

The organization had implemented SAP, PeopleSoft, and Manugistics to streamline manufacturing, supply-chain, and back-office functions. These ERP systems require a significant infrastructure — in this case, 25 Unix servers, 12 NT servers, and seven Oracle databases. Prior to ESM, these systems were mostly unattended because system health checks had to be done manually by a few system administrators, and these staff members were usually busy performing routine maintenance functions.

When a server became unavailable or experienced system problems, the first form of notification was usually an irate user, and at that point, it took additional time to identify the entity within the distributed enterprise that had caused the problem. The IT department needed a tool that could proactively monitor the infrastructure, identify a problem as soon as it occurred, notify the appropriate staff members, and sometimes even attempt to fix the problem itself.

In addition to systems monitoring, the IT department wanted to automate some of the functions required to support ERP systems. SAP, PeopleSoft, and Manugistics all have a “client,” or desktop, component. IT needed a way to push this software out to hundreds of users, many of whom would be at remote locations. The IT department did not have the skilled personnel needed to handle this kind of workload and had to make a decision between hiring new technicians or using a tool to automate the function of software distribution.

The department also needed to centralize backup, recovery, and job scheduling — and perform these functions with a single tool, regardless of operating system or database. This approach would serve to simplify operations, improve error handling, and reduce the staff required to support these systems.

Architecture

The first task in a Tivoli implementation is to design the architecture of the Tivoli Management Environment (TME). The Tivoli Management Region (TMR) server is the central orchestrating body for the TME, so you must position it strategically within the corporate network. Tivoli’s protocol of choice is TCP/IP. There can be a significant amount of network traffic in the form of “chatting” between the TMR server and each managed server or PC, so the TME’s architecture is critical. The principal design goals are to minimize network traffic, ensure optimum performance in the TME, and minimize the effect of bottlenecks caused by slow WAN links.

We were able to distribute the software during standard working hours because doing so does not significantly affect the user. The only side effect was a slight degradation in client PC performance.

Systems Monitoring

In the days of the mainframe, you could monitor all the mainframe-run applications with a single set of tools. Tivoli systems monitoring is an attempt to do the same for the client/server world. Tivoli provides distributed monitoring modules for SAP, Unix, Windows NT, and Oracle but none for PeopleSoft or Manugistics.

Therefore, we had to monitor PeopleSoft and Manugistics at the Unix, NT, and Oracle level, but could monitor SAP using the SAP application module. (See Table 1.) Because an application-level module was available for SAP, Tivoli was able to look at the guts of the SAP processes, whereas PeopleSoft was just an NT process to Tivoli.

Table 1 Tivoli modules used to monitor ERP applications.
  Unix Module NT Module Oracle Module Application Module
SAP  
PeopleSoft      
Manugistics  

After we defined the monitors (see Table 2), we needed to decide how to display their results and what response they should prompt. This task required an understanding of the way the IT department operates.

Table 2 List of monitors.
Application   Monitors
AIX           Memory, Swap Space, % CPU Utilization, File Systems, Error Log
NT            Memory, Virtual Memory, % CPU Utilization, Logical Partitions, Event Log
Oracle        Max Extents, Tablespaces, Processes, Listener, Buffers
SAP           Buffers, Syslogs, Alerts, Work Processes, Oracle Database


The data center operators were the eyes and ears of the IT department, responsible for detecting any errors and notifying the appropriate people, so we decided to place the Tivoli Enterprise Console (TEC) in the data center.

The TEC is an application that can display messages with various priority levels, such as fatal, critical, minor, and warning. The TEC is particularly powerful because you can program it to process rules and take corrective action on its own.

We decided to display all messages from the TME on the TEC, which the operators monitored around the clock. When a message appeared on the TEC, the operator responded to fatal and critical messages by calling or paging the person responsible for the application that had posted the message. Minor and warning messages resulted simply in email.

In general, fatal messages appeared only when an application or server was unavailable. Critical messages showed up when an application was in imminent danger of becoming unavailable, for example, when an Oracle tablespace was full. Minor and warning messages served primarily as an escalation procedure and for early warnings to administrators.
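The notification policy above is simple enough to express as a lookup. The following Python sketch is an illustration only: it is not written in the TEC’s own rule language, and the names Severity and notification_channel are hypothetical, chosen for this example.

```python
from enum import Enum


class Severity(Enum):
    """The four priority levels named in the article."""
    FATAL = "fatal"
    CRITICAL = "critical"
    MINOR = "minor"
    WARNING = "warning"


# Assumption for this sketch: "page" stands for the operator calling or
# paging the application owner, and "email" for an email notification.
PAGE_SEVERITIES = {Severity.FATAL, Severity.CRITICAL}


def notification_channel(severity: Severity) -> str:
    """Return 'page' for fatal and critical messages, 'email' otherwise."""
    return "page" if severity in PAGE_SEVERITIES else "email"


if __name__ == "__main__":
    for level in Severity:
        print(level.value, "->", notification_channel(level))
```

Keeping the policy in one small table like this makes it easy to review with the operators who have to act on it.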

The ESM team designed each monitor and the notification process. When determining the specifications for the TEC, the ESM team followed a number of guidelines. For example, the operator was not required to interpret anything or understand the message. The operator needed only to identify the application that generated the message and notify the appropriate people. The TEC, therefore, needed to be “operator friendly.” We had to construct rules that made the console somewhat intelligent about ERP applications. For example, if a Unix server running SAP became unavailable, any further messages from the SAP module reporting the same outage would be dropped.
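The suppression rule in that last example can be sketched as follows. This is a minimal Python illustration of the idea, not the rule we wrote for the TEC; the Event and Console classes, the "unavailable" and "available" event kinds, and the host name in the test are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    host: str    # server the event refers to
    source: str  # reporting module, e.g. "unix" or "sap" (hypothetical labels)
    kind: str    # "unavailable", "available", or any other event type


@dataclass
class Console:
    down_hosts: set = field(default_factory=set)
    displayed: list = field(default_factory=list)

    def receive(self, event: Event) -> bool:
        """Display the event unless it repeats a known outage.

        Returns True if the event was displayed, False if it was dropped.
        """
        if event.kind == "unavailable":
            if event.host in self.down_hosts:
                return False  # outage already on the console; drop the repeat
            self.down_hosts.add(event.host)
        elif event.kind == "available":
            self.down_hosts.discard(event.host)  # outage is over
        self.displayed.append(event)
        return True
```

With this logic, the operator sees one message per outage no matter how many modules notice it, which fits the goal of a console that requires no interpretation.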

Also, any messages that popped up on the TEC were required to be resolvable by an action. Therefore, the ESM team had to be careful with metrics that were ambiguous about system performance. For example, a message indicating that an application server was unavailable could be resolved by starting up the application server. However, a message indicating that the page-in rate on the Unix server had exceeded a threshold required further analysis of the page-in to page-out ratio over time.

Other Support Functions

The ESM team implemented several additional tools to support the ERP applications: a help-desk system from Remedy Corp.; Maestro, a job scheduler; ADSM, the enterprise backup and recovery tool; Dazel, for printing and faxing large volumes from SAP; and finally Mercator, to enable EDI transactions from SAP. We required all tools to work with Tivoli so it could remain at the center of the enterprise’s systems monitoring.

Maestro, the job scheduler, had an agent running on each server and was able to schedule jobs for SAP, Unix, and NT. Each schedule captured the dependencies among its jobs. When a job failed, Maestro reported the failure to the TEC.
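To show what dependency-aware scheduling with failure reporting involves, here is a small Python sketch (Python 3.9 or later, for the standard graphlib module). It does not use Maestro’s schedule syntax or API. The function run_schedule and its callbacks are hypothetical, and the choice to skip jobs whose prerequisites failed is an assumption made for this example.

```python
from graphlib import TopologicalSorter


def run_schedule(dependencies, run_job, report_failure):
    """Run jobs in dependency order and report failures.

    dependencies: dict mapping each job name to the set of jobs it depends on.
    run_job: callable(job) -> bool, True on success (hypothetical callback).
    report_failure: callable(job), e.g. one that posts to a central console.
    Returns the list of jobs that completed successfully.
    """
    not_ok = set()   # jobs that failed or were blocked by a failed prerequisite
    completed = []
    for job in TopologicalSorter(dependencies).static_order():
        if set(dependencies.get(job, ())) & not_ok:
            not_ok.add(job)  # assumption: do not run a job after an upstream failure
            continue
        if run_job(job):
            completed.append(job)
        else:
            not_ok.add(job)
            report_failure(job)
    return completed
```

Only the job that actually failed is reported; the jobs blocked behind it are skipped silently, so a single failure produces a single console message.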

We scheduled ADSM, the backup and recovery tool, to perform full file system backups of the Unix and NT servers and online backups of the Oracle databases. In fact, ADSM proved its worth more than once on the engagement itself. When an upgrade corrupted the TME environment, we had to restore several Unix and NT servers to a point-in-time snapshot. In the world of ADSM, this was simply a matter of specifying the file systems on the servers that needed to be restored and the date of a successful backup; ADSM took care of the rest. This result was very consistent with the IT department’s goal of a single interface for backup and recovery. It is worth noting, however, that your selection of backup media should take into account the limits of your backup and recovery time window. For data-intensive ERP systems such as SAP, I recommend a very fast storage system such as one from EMC.

The help-desk system was the last piece of the puzzle. We tied the TEC to the help desk so that an operator could generate a trouble ticket for a message that popped up on the console. This arrangement ensured that the service-level agreements predefined within the help-desk system would kick in both for problems detected by Tivoli and for direct calls from an end user. If 24 hours were to elapse with no action on a trouble ticket, for example, the system would notify the support manager that no action had been taken.
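The 24-hour escalation check amounts to a comparison of timestamps. The Python sketch below illustrates it; it is not Remedy’s API, and the Ticket fields and the needs_escalation function are hypothetical names for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# The 24-hour window is the example given in the text.
ESCALATION_WINDOW = timedelta(hours=24)


@dataclass
class Ticket:
    ticket_id: str
    opened_at: datetime
    last_action_at: Optional[datetime] = None  # None means no action yet


def needs_escalation(ticket: Ticket, now: datetime) -> bool:
    """True if the ticket has had no action for the full escalation window."""
    return (
        ticket.last_action_at is None
        and now - ticket.opened_at >= ESCALATION_WINDOW
    )
```

Because the check depends only on the ticket, it treats tickets opened from a console message and tickets opened from a user call the same way.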

ERP systems take a significant amount of time and money to maintain. ESM tools such as Tivoli, if implemented correctly, will ease the burden of maintenance. Not many ESM implementations are successful, however. Perhaps it’s because the field is still in its early stages and the products available are difficult to implement. The key to a successful implementation is active sponsorship from management and an understanding that the ESM implementation by itself is a formidable undertaking and requires dedicated time and resources. For this company and probably others like it, the long-term benefits from such a product justify the initial investment. This Tivoli installation was to support the ERP systems, but we expect eventually to use it to monitor all applications enterprisewide.

Field Notes

Product in focus: Tivoli Systems Inc.’s suite of ESM software (www.tivoli.com)
Business problem: Multiple ERP systems difficult for IT to manage
Environment: Client/server environment with IBM RISC 6000 AIX and IBM Netfinity NT machines, Oracle databases, SAP, PeopleSoft, Manugistics. Fortune 500 IT shop
Product requirements: Client/server systems-integration experience
Project duration: Eight months
Percent of original problem solved: 100
Next steps planned: Use this implementation model to manage all applications enterprisewide
Wish list for product features: Distributed monitoring module for PeopleSoft and Manugistics

If you have implemented an enterprise application and would like to share the insight you gained about it, we may be able to print your story here. Please see the submission guidelines for In the Field by clicking on Submit an Article at IntelligentEnterprise.com.


Debashish Bhattacharjee (dev.bhattacharjee@us.pwcglobal.com) is a management consultant with PriceWaterhouseCoopers. He has seven years of experience in the IT industry, integrating information systems for Fortune 500 clients.




