CMP -- United Business Media

Intelligent Enterprise

Better Insight for Business Decisions

UBM
Intelligent Enterprise - Better Insight for Business Decisions
Part of the TechWeb Network
Intelligent Enterprise
search Intelligent Enterprise





January 30, 2001



Data Mining for the People

Both administrators and users may be pleased with the latest SQL Server release

PRODUCT SPEC SHEET
SQL Server 2000


Microsoft Corp.
One Microsoft Way
Redmond, WA 98052-6399
Phone 425-882-8080
www.microsoft.com

Released: October 2000

Pricing: Personal Edition $499 total, Developer Edition $499 total, Standard Edition $4,999 per processor, Enterprise Edition $19,999 per processor.

Minimum Requirements: For Enterprise Edition: Pentium 166MHz or faster; 64MB RAM (128MB recommended); Hard disk space -- 95 to 270MB (250 typical) for SQL Server database components, 50MB minimum (130MB typical) for Analysis Services, 80MB for English Query, 44MB for Desktop Engine; VGA monitor or better (8003600 or better resolution required for the SQL Server graphical tools); CD-ROM drive.

by Geoffrey Hollander     

Several years ago, word processors and spreadsheets were rudimentary, expensive products. Now they're commonplace, moderately priced, and powerful. The former condition was once true for industrial-strength database applications also. But Microsoft SQL Server 2000 has radically changed the face of enterprise database applications by including several new features and services that formerly could be added only by expensive third-party products that too often required experts to run them.

The transformation began with version 7, first introduced with built-in online analytic processing (OLAP) services, English Query, and Unicode support. New Full-Text Search is another potent example, providing easy access to your data through a custom front-end interface, such as commonly used for Internet searches, that retrieves any queried text from within fields, documents (typically stored in image fields), or both.

SQL Server 2000 is optimized for enterprise and e-commerce use in that it supports up to 32 multiprocessors and can utilize, in Enterprise Edition, as much as 64GB of RAM. These traits translate into efficient handling of databases as large as 2TB and readiness either to increase server processing power with more powerful hardware or to scale up.

SQL Server can "scale out" as well, by increasing processing power in a modular fashion. You can create a cluster of computers that you can expand by adding Windows 2000 computers (nodes) made of low-cost components right off the shelf.

Huge Web sites, whose growth often generates huge processing loads that can exceed an individual server's capacity, may find scaling out to be their optimum solution for boosting their system's processing performance. SQL Server also supports distributed partitioned views, called federations, used transparently across a group of servers. Each server is "independent," in that it has separate operational rules and can support independent processes and data. In fact, SQL Server 2000 can often support much larger processing loads when partitioned across a federation than on servers with eight or more processors. In addition, SQL Server's failover technology ensures that the system as a whole continues to perform if one of the federated servers goes down, as well as lets you add or exchange individual systems without processing interruptions.

While these features are great for administrators, SQL Server 2000 also brings serious value to its users and the decision makers who depend on their database for ease of use, efficiency, and timely decision support. The new Data Analysis and Data Mining tools, continuing what version 7 started, represent Microsoft's stated strategy for SQL Server to include all the basic services that are intrinsic to a serious database, and to make them accessible to anybody.

Transactions and Analysis in One

Enterprise-level database products are optimized for transaction-generated (OLTP) data, and are therefore organized primarily to support those transactions (such as recording an order from a point-of-sale terminal). They are not designed around providing summary information from which managers can analyze trends that affect their organization or workgroup.

Systems designed to handle queries that can discover key trends and critical factors are called OLAP systems and, since OLAP queries normally require large amounts of data, running them against original detail data in an OLTP system typically slows reporting. Aggregating all of the detail records takes a tremendously long time and the workload generated by the query is so heavy that it can decelerate normal transaction recording times.

SQL Server organizes data into multidimensional OLAP cubes, making queries perform better than when they're run on data in relational tables. And because no two databases are the same, SQL Server supports three modes of cube storage: multidimensional (MOLAP), relational (ROLAP), and hybrid (HOLAP). Because MOLAP storage provides the fastest query response times, it's usually implemented for frequently used cubes needed for rapid query response. ROLAP storage is most often used for large, infrequently queried data sets, and its query response is generally slower than that of the other two storage modes. For queries that access only summary data contained in a partition's aggregations, HOLAP is the equivalent of MOLAP; while for queries that access source data, such as a drilldown for which there is no aggregation, it behaves like ROLAP, and will be slower than a MOLAP structure.

Data Mining Included

SQL Server 2000 incorporates a data mining technology that can train data mining models with data from any OLE DB-enabled relational source, as well as from MOLAP cubes created with Analysis Services. You can also use data mining results to create a dimension you add to a cube, in order to further analyze your data.

SQL Server 2000's new built-in data mining technology provides the kind of power previously available only in expensive ($50,000 to $250,000) third-party applications. If the cost didn't kill you, the time and energy needed to transform your relational data into an appropriately joined, aggregated, and summarized flat file format, with the necessary application-specific data transformations (such as changing text flags in character fields to corresponding numbers in numeric fields) would. In a move that breaks open the worst bottleneck in this process, SQL Server now lets you work directly with the relational data structure already in your database -- saving you a lot of money and the IT personnel a lot of conversion time and effort. For many decision makers, SQL Server's new Decision Tree (for identifying most likely responders and predicting behavior) and Neural Network (often applied to cross-selling) data mining tools, which work equally well with OLAP or relational input, may well be the most exciting features of all. (See Figure 1)

Both included algorithms work very well right out of the box for most generic data. But if you are a statistical specialist or work with data that contains known skews or other crucial anomalies, you can write your own algorithms (using the OLE DB for Data Mining specification) and run them right from within SQL Server 2000's infrastructure. Say, as a simple example, you use direct mail as an advertising medium. You have a stored response history of customers and prospects, as well as some key characteristics such as age, gender, and income. You could set up a decision tree that would, based on these demographics, determine from the "good responder" profile, which customers or prospects are most likely to respond to future mailings.

That being said, my real question was how all this would work on actual business data (skimpy and inconsistent as it often is) in a real world situation. I took a client's data, collected from five years of transactional input, and ran a direct-mail response analysis, first creating my responder model from a subset of known responders. As in most businesses, the data was less than perfect. Even after a basic cleaning, the only factors SQL Server had to work with were gender, age range, income range, and location.

Using the decision tree, I was easily able to determine the most likely responders -- about 12 percent -- from the larger database and segment them off into a separate file for mailing. SQL Server took the data in stride, intuitively changing the character-based gender, age, and income ranges into categories and working with them directly, no conversions needed. The results showed that the most important factors in this case were gender and income -- a surprise, but one that ultimately made sense.

After your modeling technique has evolved over time, you can then establish a Data Transformation Services package that will periodically (for example, before each new mailing is generated) rerun the decision tree using your most current response data and then output the best candidates for your next marketing effort.

Neural Network models can increase your bottom line by increasing your sales to existing customers, which is often more cost-effective than finding new ones, and encouraging them to send more of their business your way. For example, Amazon.
com uses clustering technology to suggest books you might like based on the titles you've ordered in the past.

All in all, SQL Server 2000's enhancement of features such as English Query, VB support, and OLAP services, as well as its inclusion of new features such as federated servers, failover technology, XML support, and true data mining, clearly makes it a significant improvement over previous versions. Whether your business is bricks and mortar or totally cyber, you're going to want to look at this product.

Geoffrey Hollander (gch@nwdatabase.com) owns and operates Northwest Database Services, a direct-marketing database and computer services business in Lake Oswego, Ore.






IE Weekly Newsletter
Subscribe to the newsletter
    Email Address