Intelligent Enterprise

Better Insight for Business Decisions

April 5, 2003

Data Mining: A Call To Action

Businesses can no longer afford to let data warehouse teams serve as passive onlookers in the data mining process

by Michael L. Gonzales

Continued from Page 1

Could it be that we too often associate data mining with intense math and statistics, an association that becomes a mental barrier and a ready source of excuses for warehouse planners? Or are most warehouse teams still grappling with fundamentals such as ad hoc reporting and OLAP? In either case, the answer doesn't bode well for many warehouse teams. If we recognize data mining as an integral part of BI, then we as warehouse practitioners must do more to ensure that we're actively supporting the technology to address our most complex business requirements.

How Your Warehouse Must Support Mining

Traditional warehouse environments should support several aspects of data mining.

Data acquisition. Approximately 40 percent of any mining effort involves acquiring the necessary data for the mining run. This particular task is one that data warehouse resources are well suited to provide; most warehouse environments have resident extract, transform, load (ETL) programmers as well as the necessary tools to perform the job in an efficient manner. There's absolutely no reason why these same resources can't be used to extract the necessary data for mining — just as they're expected to do for any other BI-centric application, such as OLAP.

Data cleansing and transformation. Cleansing and transforming the source data is estimated to account for 10 percent of the overall mining effort. Again, this task is especially well suited to warehouse personnel and resources. Many warehouse teams have already invested heavily in ETL software and supplemental data quality tools. It's not unusual to see Ascential Software Corp.'s DataStage used in conjunction with Vality (Ascential acquired Vality last year), nor is it uncommon to see Informatica PowerCenter and the Trillium Software System working side by side in warehouse efforts. So why aren't these tools being employed in mining efforts?

Data loading. Certainly, the ETL tools that data architects implement in warehouse environments can support the loading of mining model data. However, warehouse administrators also provide an extremely flexible environment from which to propagate data within the warehouse structures. Therefore, as data is loaded into the atomic level of a warehouse or into a specific data mart, it can also be fed through mining models, especially if the mining model itself can be executed with SQL. And with the establishment of the Predictive Model Markup Language (PMML), this approach is increasingly feasible.
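To make the idea of scoring during the load concrete, here is a minimal sketch in Python, using the standard library's sqlite3 module as a stand-in for a warehouse database. The table names, column names, and model coefficients are all hypothetical; the point is only that a simple model can be expressed as a SQL expression inside the same INSERT ... SELECT that moves rows from staging into a data mart.

```python
import sqlite3

# Hypothetical linear scoring model; the coefficients are invented for illustration.
INTERCEPT = 0.10
COEF_TENURE = 0.02   # weight per year as a customer
COEF_ORDERS = 0.05   # weight per order in the last 12 months

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staging_customer (
        customer_id INTEGER, tenure_years REAL, orders_12m INTEGER
    );
    CREATE TABLE mart_customer (
        customer_id INTEGER PRIMARY KEY,
        tenure_years REAL,
        orders_12m INTEGER,
        propensity_score REAL
    );
""")
conn.executemany(
    "INSERT INTO staging_customer VALUES (?, ?, ?)",
    [(1, 2.0, 4), (2, 10.0, 0), (3, 0.5, 12)],
)

# Load and score in one statement: the model is just a SQL expression.
conn.execute(
    """
    INSERT INTO mart_customer
        (customer_id, tenure_years, orders_12m, propensity_score)
    SELECT customer_id, tenure_years, orders_12m,
           ? + ? * tenure_years + ? * orders_12m
    FROM staging_customer
    """,
    (INTERCEPT, COEF_TENURE, COEF_ORDERS),
)
conn.commit()

# Map each customer_id to the score stored alongside the loaded row.
scores = dict(conn.execute(
    "SELECT customer_id, propensity_score FROM mart_customer"
))
```

Because the score is persisted with each row, downstream reports can use it without rerunning the model. The trade-off is that scores go stale if the model changes between loads.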

Production implementation of mining models. PMML is providing new flexibility in the implementation of mining models. For example, with this standard, you can develop a mining model in SAS and execute that model in a DB2 relational database. (See Listing 1.) Although the PMML standard is still new and being developed, it holds a lot of promise for mining practitioners and warehouse architects; the fact that the warehouse administration can implement and maintain the model wherever SQL can be executed makes it especially powerful.
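As a rough illustration of why PMML makes this portability possible, the Python sketch below reads a simplified PMML-style regression fragment and turns it into a SQL expression. The fragment is an assumption made for brevity: it omits the XML namespace, data dictionary, and mining schema that a real PMML document carries, and its coefficients are invented. SQLite stands in for the target database, and the regression_to_sql helper is hypothetical rather than part of any vendor's tooling.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Simplified PMML-style fragment (no namespace, no DataDictionary); values are invented.
PMML_FRAGMENT = """
<RegressionModel functionName="regression">
  <RegressionTable intercept="0.10">
    <NumericPredictor name="tenure_years" coefficient="0.02"/>
    <NumericPredictor name="orders_12m" coefficient="0.05"/>
  </RegressionTable>
</RegressionModel>
"""

def regression_to_sql(pmml_text: str) -> str:
    """Translate the first RegressionTable into a SQL arithmetic expression."""
    table = ET.fromstring(pmml_text).find("RegressionTable")
    terms = [repr(float(table.get("intercept")))]
    for pred in table.findall("NumericPredictor"):
        name = pred.get("name")
        # Refuse anything that is not a plain column name before building SQL text.
        if not name.isidentifier():
            raise ValueError(f"unsafe column name: {name!r}")
        terms.append(f"{float(pred.get('coefficient'))!r} * {name}")
    return " + ".join(terms)

score_sql = regression_to_sql(PMML_FRAGMENT)

# Run the generated expression wherever SQL runs; here, an in-memory SQLite table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (customer_id INTEGER, tenure_years REAL, orders_12m INTEGER)"
)
conn.execute("INSERT INTO customer VALUES (1, 2.0, 4)")
row = conn.execute(
    f"SELECT customer_id, {score_sql} AS score FROM customer"
).fetchone()
```

The sketch handles only numeric predictors with an implicit exponent of 1. Categorical predictors, transformations, and other model types would each need their own translation.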

Moreover, PMML and SQL in-database mining functions allow data mining designers to control precisely where and when the model is executed to best support the business issue being addressed. For example, when you telephone a call center and the operator retrieves your warehoused record, a mining model automatically scores you as a potential candidate for a new product or service based on the extensive history stored about you in the warehouse. This technique is very powerful, especially when you're working toward zero latency in your BI environment. (Some relational databases, such as Teradata, already support mining models in SQL. For Teradata Warehouse Miner, the model is generated in SQL and is executed anywhere within the Teradata database.)
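The call center scenario can be sketched as a database view that scores a customer at the moment the record is read. The Python example below uses SQLite; the schema, the scoring formula, and the 0.4 offer threshold are all assumptions chosen for illustration, not a description of any particular product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE purchase (customer_id INTEGER, amount REAL);

    -- Hypothetical model: score rises with purchase count and total spend.
    CREATE VIEW customer_offer_score AS
    SELECT c.customer_id,
           c.name,
           0.05 * COUNT(p.customer_id)
             + 0.001 * COALESCE(SUM(p.amount), 0) AS score
    FROM customer c
    LEFT JOIN purchase p ON p.customer_id = c.customer_id
    GROUP BY c.customer_id, c.name;

    INSERT INTO customer VALUES (1, 'Ada'), (2, 'Ben');
    INSERT INTO purchase VALUES (1, 100.0), (1, 300.0);
""")

def lookup(customer_id: int, threshold: float = 0.4) -> dict:
    """Fetch the record an operator would see, scored at read time."""
    name, score = conn.execute(
        "SELECT name, score FROM customer_offer_score WHERE customer_id = ?",
        (customer_id,),
    ).fetchone()
    return {"name": name, "score": score, "offer": score >= threshold}

ada = lookup(1)   # two purchases totaling 400
ben = lookup(2)   # no purchase history
```

Because the view computes the score from current history on every read, nothing has to be rescored in batch. The cost is extra work per lookup, which matters for large histories.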

Deployment of results. If your warehouse administrators control the implementation of mining models using SQL, then it becomes natural to structure your SQL around the model so that the results of the run feed directly into warehouse tables. Listing 2 illustrates the idea: a frequency distribution model examines a data set (the SELECT statement), and additional SQL creates a target table for the distribution results. Because the model is written in SQL, it can run anywhere SQL can. This particular mining model can easily be wrapped in a stored procedure and executed by any means available to SQL.
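The general pattern of a frequency distribution whose output lands in a warehouse table might look like the following Python sketch, again using SQLite. It is not a reconstruction of the article's listing: the customer table, the region column, and the region_frequency target table are hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, region TEXT);
    INSERT INTO customer (region)
    VALUES ('East'), ('West'), ('East'), ('East'), ('North');

    -- Target table for the distribution results.
    CREATE TABLE region_frequency (
        region TEXT PRIMARY KEY, freq INTEGER, pct REAL
    );

    -- The "model": a frequency distribution written straight into the target table.
    INSERT INTO region_frequency (region, freq, pct)
    SELECT region,
           COUNT(*),
           100.0 * COUNT(*) / (SELECT COUNT(*) FROM customer)
    FROM customer
    GROUP BY region;
""")

# Read the results back like any other warehouse table.
distribution = conn.execute(
    "SELECT region, freq, pct FROM region_frequency ORDER BY freq DESC, region"
).fetchall()
```

Once the results sit in an ordinary table, any reporting or OLAP tool that can query the warehouse can use them without knowing a mining model produced them.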

Although the traditional warehouse team may not comprise mining experts, that doesn't negate their obligation to actively participate in the mining process. If the warehouse team provides much of the necessary data preparation, then mining experts can spend more time doing what they were hired to do in the first place: mine the data.




To Support And Serve

The best approach in data mining projects is to perform much of the data acquisition and preparation as a natural part of the data warehousing process. This method minimizes delays in the use of mining results. And, if data architects can implement the mining model itself, you ensure not only a faster decision-making process but also the inclusion of mining results in the warehouse structure. The results will include improved use of the corporate information asset as well as the delivery of informational insights to the broadest possible audience.

Michael L. Gonzales [mlg@starfocus.com] is the president of The Focus Group Ltd., a consulting firm specializing in data warehousing. He has written several books, speaks frequently at industry user conferences, and conducts data warehouse courses internationally.










