CMP -- United Business Media

Intelligent Enterprise

Better Insight for Business Decisions






December 05, 2000



Knowledge Nurturance

Choosing a successful modeling approach requires sensitivity to the subject matter's needs

By Barry Grushkin

People love to ponder the nature vs. nurture question: how much of our personality is formed by our genetic inheritance, and how much by the environment in which we are raised? It turns out the nature vs. nurture paradigm offers a nice way to talk about some of the key issues that differentiate the many modeling approaches exploited in the decision-support world. With this framework I point out crucial differences between traditional statistics, Bayesian statistics, and data mining, and then focus on what makes data discovery modeling different from modeling for long-term forecasting.

The term "nurture" has two different meanings in this context. "Nurture" can mean providing the nutrients and conducive environment so something can unpack as designed. For example, an acorn will turn into an oak when placed in well-drained, moist topsoil in a temperate climate.

But by "nurture" here, I refer to how the environment partly determines the way something will unpack. That is, the system brings in information about its environment to selectively produce results potentially better suited to the problems of that environment. Code that compiles differently for different operating systems is one example.

For living organisms, the boundary between nature and nurture is fuzzy, and the interaction between the two influences occurs at many levels. A single DNA strand creates different tissues under various localized chemical environments. One of those tissues, brain matter, builds differently depending on stimuli - with nerve endings and their synapses growing in the direction of activity. These initial conditions or trajectories (which could very well include the physical foundation for the universal grammar linguist Noam Chomsky believes common to all languages) yet again interact with a part of the environment, say speech in process, to produce the language skill. Our nature is to be nurtured in an iterative process of unpacking.

Thus biological development shows, among other things, an interplay of components with various levels of plasticity. Some architectural and value components stay the same (are more solid) and some parts change forms and change values (are more plastic) so the system can more successfully interact with the environment by learning and gaining necessary components.

Describing the components of information processing systems (both biological and electronic) in terms of levels of solidity and plasticity emphasizes that the nature vs. nurture polarity is, in many cases, better thought of as a continuum. This view is truer to the potential complexities and becomes a valuable tool for examining and comparing quantitative analytic methods and the applications where we put them to use.

Traditional Statistics

Let us start with a simple example from traditional statistics - fitting a straight line via regression, as we might do when we are looking for a relation between average purchase size and the age of the customer. You might say that the decision to use a straight line to model the situation (at least in this phase of the analysis) is solid. It is the nature of the model.

However, which line do you use? You still need to determine the parameters: the slope and intercept of that line. This is the nurture part. The environment comes into play by giving you data points and then the statistical methods (as represented perhaps in software) give you a number of ways to learn from this experience and pick a good line.

Standard statistics comes with all sorts of measures that tell you how well your model fits the data and, potentially, reality. Regression, for example, gives you an R-squared value. A value of one is a perfect fit (too good to be true in real life); a value of zero means the line explains none of the variation in the data, so no amount of nurturing is going to make this nature (a line) tell you anything about this data.
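To make the split between nature and nurture concrete, here is a minimal sketch in Python. The straight-line form is fixed in the code (the nature); the slope, intercept, and R-squared are learned from the data (the nurture). The ages and purchase sizes are hypothetical numbers invented for illustration, and the function names `fit_line` and `r_squared` are my own, not from any particular package.

```python
# Hypothetical data: customer ages and average purchase sizes (illustrative only).
ages = [22, 30, 38, 45, 53, 61]
purchases = [18.0, 25.0, 31.0, 36.0, 44.0, 50.0]


def fit_line(xs, ys):
    """Ordinary least squares for the fixed form y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Share of the variation in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


slope, intercept = fit_line(ages, purchases)
fit_quality = r_squared(ages, purchases, slope, intercept)
print(f"slope={slope:.3f} intercept={intercept:.3f} R-squared={fit_quality:.3f}")
```

Feeding the same functions a different set of points changes the parameters and the R-squared, but never the fact that the answer is a line.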

Bayesian Statistics

In traditional statistics, the nature is the chosen molding form. The nurturing is done by parameter estimation, with viability or success given by calculating various confidence measures. Bayesian statistics adds something else to the nature side that you can use in iterative layers. It's a bit analogous to the repeated nature and nurture layers that lead to people being able to learn speech. In addition to the model used in traditional statistics, a Bayesian statistician adds to the model's initial nature "priors," or preliminary beliefs about parameters, adding some solidity to this otherwise plastic component. A Bayesian thinks less in terms of measured frequencies, more in terms of beliefs about the world. If chosen reasonably, these beliefs can make for rapid convergence to solutions that might never otherwise have been found.

These preliminary assumptions are sometimes required to get any result at all. For example, for the eye to see or the ear to hear, the brain has to make assumptions about the world - there is just not enough information otherwise to fill in visual blanks, understand whispers, or learn full-blown generative speech from the limited examples we are exposed to. A Bayesian might, for example, start a model with the belief that the text query "hot" is as likely to refer to temperature as to "being in vogue."

Then the nurturing part starts. You see which articles a user actually goes to from this query, and with the new information you update. If a user clearly picks articles where "hot" means "in vogue," you now have a posterior (after-the-fact) belief, perhaps 55 percent, that this is the intended meaning. That posterior becomes the prior of a new model, which is waiting to be nurtured with still more data, and so forth.
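The update cycle can be sketched with Bayes' rule for two competing meanings. This is only an illustration: the function name `update_belief` is hypothetical, and the two likelihood values are assumptions I chose so that a single click moves the 50-percent prior to the 55 percent mentioned above.

```python
def update_belief(prior, likelihood_if_vogue, likelihood_if_temperature):
    """Bayes' rule: belief that "hot" means "in vogue" after one observed click.

    prior                      -- belief in the "in vogue" meaning before the click
    likelihood_if_vogue        -- chance of this click if the user meant "in vogue"
    likelihood_if_temperature  -- chance of this click if the user meant temperature
    """
    evidence = (prior * likelihood_if_vogue
                + (1.0 - prior) * likelihood_if_temperature)
    return prior * likelihood_if_vogue / evidence


# Assumed likelihoods (not from the article): a click on a fashion article is
# slightly more probable when the user meant "in vogue" than when they did not.
P_CLICK_IF_VOGUE = 0.55
P_CLICK_IF_TEMPERATURE = 0.45

belief = 0.50  # the prior: both meanings equally likely
for click in range(1, 4):
    # Each posterior becomes the prior for the next round of nurturing.
    belief = update_belief(belief, P_CLICK_IF_VOGUE, P_CLICK_IF_TEMPERATURE)
    print(f"after click {click}: belief in 'in vogue' = {belief:.3f}")
```

The first pass lands on 0.55; each later click on a fashion article pushes the belief higher, while a click on a weather article (with the two likelihoods swapped) would pull it back down.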

A knowledge management system such as Autonomy Corp.'s does not update its beliefs on a global basis; instead, it keeps separate values for each person in each of their contexts. Thus if you are wearing your medical hat, the system's belief that AMA means the American Medical Association rather than the American Management Association will over time converge to near certainty. A Bayesian could even make the molding method more plastic. If uncertain about whether a line or a quadratic might fit best, the analyst might assign preliminary probabilities to these options as well.
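The per-person, per-context bookkeeping can be sketched as a table of beliefs keyed by user and context. This is a toy illustration under my own assumptions, not a description of how Autonomy's product works: the names `beliefs`, `bayes_update`, and `record_choice`, the user "pat", and the 0.9 evidence strength are all hypothetical.

```python
from collections import defaultdict

PRIOR = 0.5  # starting belief that "AMA" means American Medical Association

# One belief per (user, context) pair instead of a single global value.
beliefs = defaultdict(lambda: PRIOR)


def bayes_update(prior, likelihood_if_true, likelihood_if_false):
    """Bayes' rule for a belief with two competing hypotheses."""
    evidence = prior * likelihood_if_true + (1.0 - prior) * likelihood_if_false
    return prior * likelihood_if_true / evidence


def record_choice(user, context, chose_medical, strength=0.9):
    """Update one user's belief in one context after an observed choice.

    strength is an assumed probability that a user who means the medical
    association picks a medical article (and vice versa).
    """
    key = (user, context)
    if chose_medical:
        beliefs[key] = bayes_update(beliefs[key], strength, 1.0 - strength)
    else:
        beliefs[key] = bayes_update(beliefs[key], 1.0 - strength, strength)
    return beliefs[key]


# The same person behaves differently under different hats.
for _ in range(5):
    record_choice("pat", "medical", chose_medical=True)
record_choice("pat", "management", chose_medical=False)

print(f"medical hat:    {beliefs[('pat', 'medical')]:.5f}")
print(f"management hat: {beliefs[('pat', 'management')]:.5f}")
```

After a handful of consistent choices the medical-context belief sits close to one, while the management-context belief has moved the other way and every untouched context still holds the original prior.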

Data Mining

Data mining, in most cases, differentiates itself from both traditional and Bayesian statistics in that part of the modeling form or architecture is plastic and determined by data (experience) just like the parameters. The choice of a method - say, association rules - picks out a general category of modeling forms to contain potential solutions and does not require choosing a specific function type, such as a linear fit, as traditional statistics demands.

For example, the choice of a decision tree method (say, applied to an e-commerce problem) commits you to using a hierarchy of conditionals on your variables to classify your population (say, by who you expect will or will not buy a red snow suit). But it does not define exactly which variables will be used or in which order. The variables chosen, their ordering, and their split points are defined by the nature of the algorithm as the environment nurtures it.
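A minimal sketch of a single tree-growing step shows where the plasticity lives. The algorithm (its nature) only says "try every variable and every split point and keep the one that separates the classes best"; the data decides which variable and threshold win. The customer records, the variable names, and the use of Gini impurity as the scoring rule are my assumptions for illustration, not a specific vendor's method.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 means perfectly pure)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(rows, labels):
    """Search every variable and midpoint threshold; return the purest split.

    Returns (variable_name, threshold, weighted_impurity).
    """
    best = None
    n = len(rows)
    for feature in rows[0]:
        values = sorted({row[feature] for row in rows})
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [lab for row, lab in zip(rows, labels)
                    if row[feature] <= threshold]
            right = [lab for row, lab in zip(rows, labels)
                     if row[feature] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (feature, threshold, score)
    return best


# Hypothetical customers: age and typical winter temperature where they live.
customers = [
    {"age": 25, "winter_temp_f": 20},
    {"age": 47, "winter_temp_f": 15},
    {"age": 33, "winter_temp_f": 60},
    {"age": 52, "winter_temp_f": 65},
    {"age": 29, "winter_temp_f": 25},
    {"age": 41, "winter_temp_f": 70},
]
bought_snow_suit = [1, 1, 0, 0, 1, 0]  # 1 = bought, 0 = did not

variable, threshold, impurity = best_split(customers, bought_snow_suit)
print(f"split on {variable} at {threshold} (impurity {impurity:.2f})")
```

A full tree would repeat the same search inside each branch. With a different set of customers, the same code could just as easily put age at the top of the hierarchy.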

You can compare this plasticity to the way the developing human brain grows new connections to be more perceptive of a repeated stimulus. (Mice, for example, when given sore feet for a few weeks after birth are measurably more sensitive to pain later in life and have grown noticeably more pain nerve endings in their toes.)






