|
March 30 1999, Volume 2 Number 5 Steve ElkinsShare and Share AlikeInformatica's PowerMart and PowerCenter combine into a complex yet impressive solution for enterprise data mart architecture
PowerMart provides a set of intuitive graphical design tools for documenting data sources, designing data marts, and developing the data mappings that transform the former into the latter, as well as a powerful engine for executing the transformations. PowerCenter, as a superset of PowerMart, provides a global data-mart repository that weaves numerous data marts together into a data warehouse based on an enterprise data-mart architecture in which all component data marts share common definitions of business measures and dimensions. PowerCenter also includes several optional add-ons appropriate to large-scale data warehouse implementations. (See Figure 1, page 56.)
PowerMart has five major components: PowerMart Designer, Server, Server Manager, Repository, and Repository Manager. PowerMart Designer comprises an integrated set of four tools that graphically document the complete transformation of data from its operational sources to the finished data warehouse. The first tool, Source Analyzer, reverse engineers the content of source database tables to speed the process of validating the content of data warehouse inputs. You can extract metadata - such as field names, types and lengths, and table names - directly from source RDBMS catalogs and mainframe flat-file definitions. You can then easily edit and enhance these definitions to document the data entering the warehouse fully.
At the other end of the pipeline, the Warehouse Designer documents the target data mart's design. You can create target table definitions from scratch or base them on existing tables with fields added, edited, or deleted. Designers can also prototype dimensional database designs with the Cube Wizard, a template that walks you through the process of creating a star schema complete with the surrogate keys recommended by Ralph Kimball. (Note: The graphical representation of dimensional hierarchies is backward - from bottom to top - and you cannot modify the resulting prototypes using the wizard.) You can also replicate complete database designs, created with leading graphical data modeling tools, in the Informatica Repository via optional PowerPlugs that enable the exchange of design metadata via Informatica's UML-compliant Metadata Exchange (MX) API.
The third tool, Mapping Designer, provides a graphical tool for defining data transformation from source to target schema. Transformation mappings appear in the form of data flow diagrams that connect the source tables defined with Source Analyzer. You design the target tables using the Warehouse Designer via graphical representations of more than a dozen transformation objects (see Figure 2, page 56), including source qualification, joining, normalizing, and external procedure calls. Impressively, you can combine these objects to automate the otherwise difficult maintenance of "slowly changing dimension tables" with surrogate keys in a star schema. This capability is invaluable for data marts that must reflect ever-evolving sales department organizational charts, product classifications, and so on. During evaluation, a few of my developer colleagues complained that the transformations layout seems a bit cluttered, which is understandable considering how much functionality is crammed into it. It can also be a bit unforgiving; you must meticulously check all the right boxes on each object for everything to work properly. The Transformation Developer lets you create additional reusable transformations based on existing ones - or develop custom transformations in C, C++, Visual Basic, and so on - and import them into the repository via the Transformation Expression (TX) API. Responding to criticism that the suite includes no robust "data scrubbing" utility - which is especially important for name and address data in customer relationship marketing applications - Informatica has integrated Trillium Software's data cleanser with PowerCenter via this API. In addition, Informatica has integrated PowerCenter with the SAP Business Information Warehouse (BW) as a tool for importing non-SAP data. (It would also be nice to see templates for getting data out of SAP and other ERP systems.)
The PowerMart/PowerCenter Repository serves as the central metadata library. Through the Repository Manager, the warehouse administrator organizes the design components in the library and controls access to them. The Repository Manager's Metadata Browser provides a familiar Windows Explorer-type interface for browsing the contents of the "card catalog," which is organized into folders. Each folder contains the mappings and mapping components (source, target, and transformation objects) that define the process of populating a particular target data mart. The Repository Manager gives the warehouse administrator control over who can create, modify, read, and use each folder, mapping, and object at the user and group levels. With PowerCenter in the mix, you can promote the metadata components in a data mart's folder to the warehouse level and then share and reuse them within other data marts. A master version of the component, which exists in the PowerCenter Global Metadata Repository, then publishes to dependent data marts through hotlink shortcuts. Among other benefits, this approach facilitates the maintenance of a rigorous conformance among the dimension tables (which are a kind of target object) in a dimensional data mart.
The Server Manager includes a graphical user interface for configuring, scheduling, and monitoring Informatica loads (time based, event based, and on demand). The optional PowerCapture module supports change-data-capture warehouse-update strategies from DB2, Sybase, or Oracle sources. The PowerMart (or PowerCenter) Server is the engine that drives the physical transformation of data from source to target. The server runs on single-processor or symmetrical-multiprocessor (SMP) Windows NT or Unix platforms. It features overlapped, parallel execution of three process engines: the Extractor, Transformation Engine, and Loader. Although the data throughput of early versions of PowerMart was suspect, it should no longer be an issue. In one simple transformation on a Sun Microsystems 4000E Unix server, I used the PowerMart Server to denormalize five- and eight-million row Oracle8 tables into a single eight-million row table in a little over an hour, inserting more than 1,600 rows per second during the write phase. The server can extract and transform data from heterogeneous sources during a single load. A new PowerCenter module, PowerConnect, provides high-speed direct access to IBM DB2 sources on MVS mainframes.
Among the most attractive features of the Informatica suite is its MX API for exchanging metadata with business intelligence tools. Introduced in 1998, one of the API's first implementations was an impressive integration between the Informatica repository and MicroStrategy Inc.'s DSS Suite. The metadata "bridge" between these two tool suites lets MicroStrategy data-mart developers read preexisting dimensional metadata from the Informatica repository directly into the DSS Architect repository. This bridge saves an enormous amount of duplicate metadata mapping effort and speeds data mart deployment dramatically. Furthermore, this bridge lets DSS Agent client front-end users see data lineage information - such as the definition of the data, its source, and last update - by right-clicking on a number on a view. The new COM-based MX2 API extends the capabilities of MX to a wider set of business intelligence tools. One of the keys is a richer set of multidimensional metadata that facilitates integration with multidimensional database servers such as Hyperion Solutions Corp.'s Essbase. In addition, a bidirectional metadata bridge lets a manager using Brio Technology Inc.'s BrioQuery, for example, write data definitions from BrioQuery back to the Informatica repository for access by other managers. The list of business intelligence vendors with implemented MX2 metadata bridges now includes MicroStrategy, Brio, Hyperion, Business Objects SA, and Hummingbird Communication Ltd., and will soon include Cognos, Information Advantage Inc., Viador Inc. (formerly InfoSpace), and Seagate Software Inc. One of the practical benefits of this broad support from the leading business intelligence tools vendors is that preexisting front-end tools in different departments are no longer an obstacle to collecting your independent data marts into an enterprise data-mart architecture with conformed business rules and dimensions. You can define a "common prayer book" using the PowerCenter repository, and each department can keep its favorite front-end tool. If you're starting from scratch, so much the better. In this scenario, the PowerCenter Global Metadata Repository maintains and enforces conformed definitions of measures and dimensions across all the atomic detailed, dimensionally modeled data "supermarts" in the data warehouse (one per data source). Using these detailed data marts as a master source, the MX2 metadata bridges make it easy to carve out more focused, summary data marts designed to address particular business problems using whichever business intelligence tool works best. For example, finance departments can use Essbase for budget analysis, marketing can use Information Advantage for customer relationship marketing, and both groups can draw from the same data sources and use the same terminology. The bottom line is that although the Informatica suite will support Inmon-style third normal form data warehouses as well as Kimball-style dimensional data warehouses, it also provides the ideal framework for executing the enterprise data mart architecture Kimball calls data warehouse bus architecture in The Data Warehouse Lifecycle Toolkit (Wiley, 1998) and in his November 1998 column in Intelligent Enterprise ("Coping with the Brave New Requirements"). Just make sure you keep an experienced DBA involved who knows what's going on under the covers.
|
Most Popular This Week
IE Weekly Newsletter
Subscribe to the newsletter
|
| |||||||||||||||||||||||||||||||




















