Data Mining: A Call To Action
Businesses can no longer afford to let data warehouse teams serve as passive onlookers in the data mining process
by Michael L. Gonzales
Continued from Page 1

Could it be that we too often associate data mining with intense math and statistics, and that this association becomes a mental barrier and a source of excuses for warehouse planners? Or are most warehouse teams still grappling with the fundamentals, such as ad hoc reporting and OLAP techniques and technologies? In either case, the answer doesn't bode well for many warehouse teams. If we recognize data mining as an integral part of BI, then we as warehouse practitioners must do more to ensure that we're actively supporting the technology to address our most complex business requirements.

How Your Warehouse Must Support Mining

Traditional warehouse environments should support several aspects of data mining.

Data acquisition. Approximately 40 percent of any mining effort involves acquiring the necessary data for the mining run. Data warehouse resources are well suited to this task; most warehouse environments have resident extract, transform, load (ETL) programmers as well as the tools needed to perform the job efficiently. There's absolutely no reason why these same resources can't be used to extract the necessary data for mining, just as they're expected to do for any other BI-centric application, such as OLAP.

Data cleansing and transformation. The workload associated with cleansing and transforming data mining source data is estimated to be 10 percent of the overall mining effort. Again, this task is especially well suited to warehouse personnel and resources. Many warehouse teams have already invested heavily in ETL software and supplemental tools for data quality efforts. It's not unusual to see Ascential Software Corp.'s DataStage used in conjunction with Vality (Ascential acquired Vality last year), nor is it uncommon to see Informatica PowerCenter and the Trillium Software System working side by side in warehouse efforts.
So why aren't these tools being employed in mining efforts?

Data loading. Certainly, the ETL tools that data architects implement in warehouse environments can support the loading of mining model data. However, warehouse administrators also provide an extremely flexible environment from which to propagate data within the warehouse structures. Therefore, as data is loaded into an atomic level of a warehouse or a specific data mart, it can also be fed through mining models, especially if the mining model itself can be executed with SQL. And with the establishment of the Predictive Model Markup Language (PMML), this approach is increasingly feasible.

Production implementation of mining models. PMML is providing new flexibility in the implementation of mining models. For example, with this standard, you can develop a mining model in SAS and execute that model in a DB2 relational database. (See Listing 1.) Although the PMML standard is still new and being developed, it holds a lot of promise for mining practitioners and warehouse architects; the fact that warehouse administrators can implement and maintain the model wherever SQL can be executed makes it especially powerful.

Moreover, PMML and SQL in-database mining functions allow data mining designers to control precisely where and when the model is executed to best support the business issue being addressed. For example, when you telephone a call center and the operator retrieves your warehoused record, a mining model can automatically score you as a potential candidate for a new product or service based on the extensive history stored about you in the warehouse. This technique is very powerful, especially when you're working toward zero latency in your BI environment. (Some relational databases, such as Teradata, already support mining models in SQL. For Teradata Warehouse Miner, the model is generated in SQL and is executed anywhere within the Teradata database.)
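
The idea of executing a model wherever SQL can be executed can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not PMML itself and not the method of any product named above: the decision tree, the `customer` and `customer_score` tables, and every column name are hypothetical, and an in-memory SQLite database stands in for the warehouse. The sketch renders a small tree as a SQL CASE expression and lets the database score each row and write the result into a warehouse table.

```python
import sqlite3

# Hypothetical decision tree standing in for a model exported from a mining tool.
# Internal nodes test "column <= threshold"; leaves carry a propensity score.
TREE = {
    "column": "months_as_customer",
    "threshold": 24,
    "yes": {"score": 0.10},
    "no": {
        "column": "calls_last_90_days",
        "threshold": 2,
        "yes": {"score": 0.35},
        "no": {"score": 0.80},
    },
}


def tree_to_sql(node):
    """Render a tree node as a (possibly nested) SQL CASE expression."""
    if "score" in node:
        return repr(float(node["score"]))
    # Column names come from the trusted model definition, not from user input.
    return (
        f"CASE WHEN {node['column']} <= {node['threshold']} "
        f"THEN {tree_to_sql(node['yes'])} "
        f"ELSE {tree_to_sql(node['no'])} END"
    )


def score_customers(conn):
    """Score every customer row inside the database and store the results."""
    conn.execute(
        "INSERT INTO customer_score (customer_id, propensity) "
        f"SELECT customer_id, {tree_to_sql(TREE)} FROM customer"
    )
    conn.commit()


def build_demo_warehouse():
    """Create an in-memory stand-in for the warehouse with three sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer ("
        "customer_id INTEGER PRIMARY KEY, "
        "months_as_customer INTEGER, "
        "calls_last_90_days INTEGER)"
    )
    conn.execute(
        "CREATE TABLE customer_score ("
        "customer_id INTEGER PRIMARY KEY, propensity REAL)"
    )
    conn.executemany(
        "INSERT INTO customer VALUES (?, ?, ?)",
        [(1, 12, 0), (2, 36, 1), (3, 60, 5)],
    )
    return conn


if __name__ == "__main__":
    conn = build_demo_warehouse()
    score_customers(conn)
    query = "SELECT customer_id, propensity FROM customer_score ORDER BY customer_id"
    for row in conn.execute(query):
        print(row)
```

Because the scoring logic is plain SQL, the same statement could in principle run as part of a load job or at lookup time, which is the flexibility the call center example depends on.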
Deployment of results. If your warehouse administrators control the implementation of mining models using SQL, then it becomes natural to structure your SQL around the model so that the net results of the run feed directly into warehouse tables. As illustrated in Listing 2, a frequency distribution model is used to examine a data set (

Although the traditional warehouse team may not comprise mining experts, that doesn't negate their obligation to actively participate in the mining process. If the warehouse team provides much of the necessary data preparation, then mining experts can spend more time doing what they were hired to do in the first place: mine the data.

To Support And Serve

The best approach in data mining projects is to perform much of the data acquisition and preparation as a natural part of the data warehousing process. This method minimizes any delays in the use of mining results. And if data architects can implement the mining model itself, you ensure not only a faster decision-making process but also the inclusion of mining results in the warehouse structure. The results will include improved use of the corporate information asset as well as the delivery of informational insights to the broadest possible audience.

Michael L. Gonzales [mlg@starfocus.com] is the president of The Focus Group Ltd., a consulting firm specializing in data warehousing. He has written several books, speaks frequently at industry user conferences, and conducts data warehouse courses internationally.

RESOURCES

Related Articles at IntelligentEnterprise.com:
"The Origin Of Data," Feb. 1, 2002
"To Believe or Not To Believe," March 8, 2002
"The Golden Rules," May 9, 2002