Data Mining: A Call To Action
Businesses can no longer afford to let data warehouse teams serve as passive onlookers in the data mining process
by Michael L. Gonzales
We've all heard and read about business intelligence (BI). And we all agree that traditional data warehousing concepts such as data marts, cubes, star schemas, and atomic-level warehouses all play an important part in the BI process. The operative word is part: BI is much bigger than just traditional warehousing. It spans concepts such as portals, dashboards, spatial analysis, and, especially, data mining.
Data mining is uniquely qualified to inspire informational insight from massive amounts of detailed data, not unlike that found in atomic-level warehouses. For that reason, it's incumbent on warehouse planners and data architects to ensure that they're active participants in the mining effort, as opposed to passive onlookers. If the warehouse simply dishes up raw data to mining teams, then your organization loses in at least two important respects: timeliness of decision making and maximum return on investment from corporate information assets.
Doling out raw data to mining projects ensures that substantial cleansing and transformation will be required before mining can occur, causing significant delays before any actual results are found and implemented. Moreover, because the mining effort is outside the mainstream of warehousing, mining results often aren't fed back into the warehouse. Consequently, critical insight found in your warehoused data is never made known to warehouse users. For example, if you score customers as high or low risk, then that information must be made part of your warehoused customer data so that it can be analyzed with any warehouse-centric tool, including ad hoc reports and online analytic processing (OLAP).
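As a rough illustration of feeding mining results back into the warehouse, the sketch below writes a high/low risk label onto a customer table and then reads it with an ordinary aggregate query, the kind an ad hoc report might issue. Everything in it is an assumption made for the example: the `customer` table, its columns, the in-memory SQLite database standing in for a warehouse, and the sample scores. It is not taken from any particular warehouse or mining product.

```python
import sqlite3


def build_demo_warehouse():
    """Create a tiny stand-in warehouse (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer ("
        "customer_id INTEGER PRIMARY KEY, revenue REAL, risk_segment TEXT)"
    )
    conn.executemany(
        "INSERT INTO customer (customer_id, revenue) VALUES (?, ?)",
        [(1, 100.0), (2, 250.0), (3, 50.0)],
    )
    conn.commit()
    return conn


def write_back_risk_scores(conn, scores):
    """Store mining output (customer_id -> 'high' or 'low') on the customer table."""
    conn.executemany(
        "UPDATE customer SET risk_segment = ? WHERE customer_id = ?",
        [(segment, customer_id) for customer_id, segment in scores.items()],
    )
    conn.commit()


def revenue_by_risk(conn):
    """A plain warehouse query that can now slice revenue by the mined segment."""
    rows = conn.execute(
        "SELECT risk_segment, SUM(revenue) FROM customer GROUP BY risk_segment"
    ).fetchall()
    return dict(rows)


conn = build_demo_warehouse()
# Made-up scores standing in for the output of a mining model.
write_back_risk_scores(conn, {1: "high", 2: "low", 3: "high"})
summary = revenue_by_risk(conn)
print(summary)
```

Once the segment is a column on the customer record, any warehouse-centric tool can group or filter on it without knowing how the score was produced.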
But as critical as mining is in BI efforts, most warehouse teams seem unable or unwilling to support this aspect of BI. To the detriment of their organization and its information assets, these teams seem intent only on providing a passive repository serving up data to be mined by other teams, departments, or even third-party vendors.
Mining And The BI Environment
Today's BI solutions must grapple with the rising flood of data, in terms of both the number of records and their size. For example, not only do businesses keep information about existing customers, but more and more they also keep information about previous customers for win-back campaigns and about prospective customers for acquisition models. Many businesses are attempting to analyze incredibly detailed data as well. For example, telecommunications providers need to analyze all their call detail records. These logs are not unlike Web logs in their sheer volume and messiness. Then there's the growth of the records themselves, not only in number but in attributes per record.
Although the rising volume and size of data sets definitely compromise the effectiveness of many warehouse-centric tools, it's a characteristic that mining is uniquely suited to address. For example, OLAP often preaggregates data in order to deal with large data sets. With each aggregation, important detail is lost. Mining, by contrast, feasts on the detail data and can crunch through mountains of it. Mining thrives in this environment, which is one of the critical reasons that mining must be an integral part of your BI process.
Finally, data mining must be integrated into the business process to let information flow across the organization. The integration of data mining into a warehouse gives the analyst fast and efficient access to the data. The integration of data mining into end-to-end solutions, such as CRM, makes data mining results quickly available to a much wider group of knowledge users.
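The point about preaggregation losing detail can be made concrete with a small sketch. The call records, customer labels, and function names below are invented for illustration; the code simply contrasts a monthly roll-up of the kind an OLAP cube might hold with a feature that can only be computed from the detail rows.

```python
from collections import defaultdict

# Made-up call detail records: (customer, month, call length in minutes).
call_records = [
    ("A", "Jan", 1), ("A", "Jan", 1), ("A", "Jan", 58),
    ("B", "Jan", 20), ("B", "Jan", 20), ("B", "Jan", 20),
]


def monthly_totals(records):
    """Roll detail records up to one total per customer and month."""
    totals = defaultdict(int)
    for customer, month, minutes in records:
        totals[(customer, month)] += minutes
    return dict(totals)


def longest_call(records):
    """A detail-level feature that cannot be recovered from the totals."""
    longest = {}
    for customer, _month, minutes in records:
        longest[customer] = max(longest.get(customer, 0), minutes)
    return longest


totals = monthly_totals(call_records)
longest = longest_call(call_records)
print(totals)
print(longest)
```

In this made-up data both customers total 60 minutes for the month, so the aggregate cannot tell them apart, while the detail records show one customer with a single 58-minute call and the other with three 20-minute calls.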
Work Distribution Of Mining Efforts
With this massive data growth challenge in mind, one aspect of the mining effort in particular stands out: Data preprocessing, acquisition, and cleansing represent 80 percent of the overall mining project, which is an extraordinary amount of time spent just preparing the right data (see Figure 1). Fortunately, this aspect of data mining can be actively supported by the warehouse team.
Traditionally, mining teams must go directly to production systems
for source data if the warehouse doesn't contain the necessary data.
But what's especially alarming is that even when the warehouse has
the needed data, mining teams must still perform excessive data
transformation and cleansing to prepare the data, for two reasons. First,
data mining, not unlike other BI applications, often requires special
transformation of the data. For example, instead of just having the
customer's age in the record, miners might want to store a value
signifying
In essence, as opposed to being an active participant in the mining effort, most warehouse environments simply serve up what can only be defined as raw mining data. The question we data architects must ask ourselves is "Why?"