The Oracle WayThe company claims BI is a key area for consolidating functionality around its solutions, but does it deliver?
by Jack Hakim and Tom Spitzer Continued from Page 1 Oracle Data Mining (ODM). ODM incorporates supervised and unsupervised learning models. Supervised learning models, sometimes called directed models, are used to predict a value, such as the probability that a customer will respond favorably (buy) to a promotional mailing. These techniques are appropriate for scenarios where you identify a dependent variable and want to model how a group of independent variables influence it. One supervised model, classification, is provided by the Naive Bayes (NB) algorithm, and by decision trees in the form of Adaptive Bayes Networks (ABN). ABN significantly expands the classification capabilities of ODM. In addition to supporting a larger number of attributes, ABN is "transparent" - the rules used to make decisions by the algorithms are understandable by a business analyst. NB is often faster than ABN, but not always as accurate. Your choice of algorithm will depend on the data's characteristics as well as the amount of transparency you want into how the algorithm reaches its decisions. In conjunction with these classification tools, ODM also includes a "Model Seeker" algorithm that can build a series of NB and ABN models, evaluate them, and recommend which one to use. Clustering or association algorithms are unsupervised-learning approaches used to detect similarities and patterns intrinsic to the data. The algorithms partition the data into groups such that each member of the group has more intragroup adhesion than adhesion to members of other groups. ODM includes two clustering algorithms, a relatively stock k-means approach, and a relatively new proprietary model Oracle calls O-Cluster. O-Cluster is better suited for more highly dimensional data (like that encountered in life sciences applications) and analyzes both numeric and categorical attributes, whereas k-means analyzes only numeric attributes. ODM performs attribute importance using the Predictor Variance algorithm. The objective of this algorithm is to provide insight into how various attributes influence behaviors. This approach is used to obtain a ranking of the influence of consumer attributes on their decisions. ODM also provides a SQL-based algorithm called Apriori to determine association rules. Apriori highlights one of the major distinctions between ODM and other data mining approaches: the fact that all the algorithms and data reside within the database. Keeping everything in the database eliminates the need to extract data sets for building and testing the model, and facilitates the insertion of the results back into the database or the joining of data mining results with either entity information or summative statistics using relatively simple SQL queries. Data mining projects can be built as custom solutions that integrate the work of the Java-based ODM API with direct SQL database access and multidimensional analysis with the OLAP API. Oracle wants to put powerful data mining capabilities in the hands of a much wider range of users. To that end, its data mining group is developing wizards that greatly facilitate the creation of data mining modules, and a visualization tool that enables interactive review of data mining results in much the same way as an analyst uses a pivot-table style interface to interact with multidimensional data. Oracle9i Warehouse Builder (OWB). OWB assembles several elements that enable warehouse architects to design a logical model of a warehouse, define ETL rules for moving data from various sources into the warehouse, deploy and populate the warehouse, and manage ongoing updates. Information about data sources - the structure and configuration of the warehouse and ETL processes - persists in the OWB repository, a set of tables structured in accordance with the Common Warehouse Metamodel (CWM) specifications adopted by OMG. The OWB Client presents a series of forms and workflows that contain user interface objects representing the source data and target structures and the transformations required to get from the source to the target. ETL operations are defined in mappings that can include filters, joins, splits, deduplicators, and transformations. A graphical expression builder facilitates defining these operations. When all the definitions and transformation rules are complete, OWB provides a one-click generation facility for generating all the PL/SQL ETL code they imply. OWB supports a variety of data sources, including flat files, SAP systems, and other DBMS products, generating extraction code appropriate to the data source. The OWB run-time environment comprises a library of tables, sequences, packages, and triggers that are installed in the target warehouse schema, and which are referenced by the generated ETL code. These database objects provide auditing and error-detecting capabilities. A set of reports available through the 9iAS Portal will display the ETL audit trail and error reports. There are also lineage reports that describe how information in the data warehouse was derived as well as a graphical visualization of impact analysis that displays all dependent objects. This analysis drives a powerful change reconciliation feature that automatically updates all the affected PL/SQL procedures when a change is made. OWB integrates with Oracle Workflow and Enterprise Manager so that dependencies in the order of ETL operations can be defined and schedules can be established. Oracle OLAP and the OLAP Catalog. When you have assembled the data in the warehouse, you create the OLAP Catalog metadata. The OLAP Catalog is one component of Oracle OLAP, comprising a set of tables that maintain OLAP metadata. Like the OWB repository, these structures conform to CWM specifications. In the current database (9.2) incarnation, there are really two repositories: a CWM Lite repository managed by Oracle Enterprise Manager, and a CWM2 repository that can be managed with a set of PL/SQL packages and views included with the 9.2 Enterprise edition of the database. Among other extensions, the CWM2 repository affords more flexibility in specifying hierarchies and support for aggregations. Creating OLAP Catalog metadata a prerequisite to using the OLAP API. At this point you need to decide whether to store historical and derived data in fact tables in the data warehouse or in multidimensional objects in analytic workspaces, which Oracle has created to support analytic processing. Multidimensional objects support a richer set of analytic functions than the relational engine supports, including hierarchical weighted averages, aggregations to varying levels, statistical forecasting and allocations. Oracle offers the flexibility of using an entirely relational schema consisting of fact and dimension tables, a multidimensional schema consisting of dimensions and variables in the analytic workspace, or a combination of the two. In addition, it provides the ability to define materialized views in conjunction with relational storage. Materialized views are essentially summary tables that contain aggregate values that get dynamically updated when the underlying detail data changes. Oracle's query optimizer will redirect queries to materialized views where appropriate. Metadata in the OLAP Catalog maps the data structures defined in either the relational warehouse schema or the multidimensional schema onto multidimensional objects such as measures, dimensions and their internal hierarchies, attributes, and cubes. The general process is to first create the dimensions, specifying the levels, attributes, and hierarchies associated with each; create cubes and specify the dimensions they include; and create the measures that represent the fact data, associating each measure with a cube. All the measures associated with a cube must share the same set of dimensions. When cubes are created, these metadata entities can be mapped to source data. Finally, analysis services can begin and analytic applications can access multidimensional data through the OLAP API. Up At BatThus far, we've examined database functionality and tools that manage metadata to enable analytic reporting and query applications. Not all analytic applications will require implementing this entire set of technologies. However, within the scope of enterprise activities, it's likely that all will have some relevance. Although Oracle hasn't yet completely integrated this diverse set of technologies, its progress thus far is impressive: A substantial amount of analytic power is packaged into the Oracle database now. Over the past few years, Oracle has also made a great deal of progress in improving the usability of its administrative tools, such as OWB and Oracle Enterprise Manager, which makes the engine functionality more accessible and reduces training and administration costs. To a large extent, we accept the strategy of packaging the analytic engine and the data mining functionality into the database engine - not only for the performance, administrative, and security reasons that Oracle likes to emphasize, but also because moving toward a consistent set of APIs across different data analysis engines makes all functionality more accessible. In our next installment, we'll examine the Oracle tools that analysts and decision makers can use to obtain the benefits of this infrastructure. We'll then review concerns about scaling, usability, and performance, and summarize how this all fits into the IT context. Jack Hakim [jhakim@ecwise.com] and Tom Spitzer [tspitzer@ecwise.com] are principals at ECWise Inc., with offices in the San Francisco Bay area and Seattle. RESOURCESOracle BI Web page: www.oracle.com/solutions/business_intelligence/index.html?featureoverview.html Related Articles at IntelligentEnterprise.com: "Enterprise Foundation," Oct. 24, 2001
|
Most Popular This Week
IE Weekly Newsletter
Subscribe to the newsletter
|
| ||||||||||||||||||||||||||||||||









