Knowledge Nurturance

Choosing a successful modeling approach requires sensitivity to the subject matter's needs

By Barry Grushkin

(continued from Page 1)

Modeling and Forecasting

How does modeling fit into this picture? I see this term used differently depending on how much data is at hand. When you have a lot of data, you might call any of the previously mentioned methods "modeling." Fitting a curve, for example, can be called "finding a model." You can in turn use the model to forecast. When a new record comes in, its context (independent variables) can be input to the model and out comes a forecast for the dependent variables - a value or a classification, for example. This forecast might imply a recommended action - such as to buy or sell, solicit or not, or open a store or not. Although we call this process "forecasting," because it assigns a value to an unknown, mathematicians would usually call it interpolation - finding values between known data points.

The terms "modeling" and "forecasting" take on an entirely different flavor in the absence of sufficient data. On a recent project for the Congressional Budget Office, I forecast veteran hospital demand 10 years into the future with only a year and a half of past data. This project lent itself to only minimal use of function fitting and automated methods. In a case such as this, the prior beliefs you bring in - which you use to construct the model architecture itself - center on the nature of and assumptions about people and the world.
The model should line up with whatever information is at hand. But you cannot create the model, let alone test it, on data you will not have for 10 years or more. The resulting forecasts are needed for the design of a major infrastructure development project so the office can meet upcoming demand. As with any business situation, the department cannot possibly be ready if it waits for all the data to come in.

In short, this use of the term "modeling" involves applying outside knowledge about the past and about how things evolve in order to construct a system or architecture that has a very good chance of predicting the future accurately. You have to focus on nature but leave open the possibility of added nurturing as more information comes in - for the functions and parameters and even the overall model itself.

On my project, I knew the number of veterans, and how they used the medical system, in many categories - age ranges, locations, and comparative complexity of medical problems, for example. I also knew how many new veterans were arriving annually, how people in general age (that is, a year at a time), how often people move (rarely), and statistics on sickness and death as a function of the foregoing attributes. I needed to make assumptions about who would continue to use other medical care based on what they did in the past, and so forth.

Basically, I constructed in qualitative terms a world view: a connected, interacting system that contained the influences that would affect the quantity of interest, VA hospital use. These beliefs would be updated as more information arrived, but here the updating is part of the modeling process, with the nature of the model itself being amended and created from real-world understandings. A model's nature (its beliefs about systemic relationships in the world) should be nurtured over time, as is typical of successful longer-term modeling endeavors.
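The kind of connected system described above can be sketched as a very small cohort projection. This is only an illustration of the idea, not the model built for the project: the function names, the single-year age slots, and every number are hypothetical placeholders. It captures three of the stated beliefs: people age a year at a time, some die at age-dependent rates, and new veterans arrive annually.

```python
# Hypothetical sketch of a cohort projection. None of the names or numbers
# here come from the actual project; they only illustrate the structure.

def step_year(pop_by_age, death_rate_by_age, entrants_by_age):
    """Advance the population one year: survivors age by one, then entrants join."""
    n = len(pop_by_age)
    next_pop = [0.0] * n
    for age in range(n):
        survivors = pop_by_age[age] * (1.0 - death_rate_by_age[age])
        # Assumption: the last slot is open-ended, so its survivors stay in it.
        target = min(age + 1, n - 1)
        next_pop[target] += survivors
    return [p + e for p, e in zip(next_pop, entrants_by_age)]


def hospital_days(pop_by_age, days_per_person_by_age):
    """Total demand implied by a population and age-specific usage rates."""
    return sum(p * d for p, d in zip(pop_by_age, days_per_person_by_age))


def project_demand(pop_by_age, death_rate_by_age, entrants_by_age,
                   days_per_person_by_age, years):
    """Return projected demand for each of the next `years` years."""
    pop = list(pop_by_age)
    demand = []
    for _ in range(years):
        pop = step_year(pop, death_rate_by_age, entrants_by_age)
        demand.append(hospital_days(pop, days_per_person_by_age))
    return demand


if __name__ == "__main__":
    # Made-up inputs for three age slots.
    forecast = project_demand(
        pop_by_age=[100.0, 100.0, 100.0],
        death_rate_by_age=[0.0, 0.0, 0.5],
        entrants_by_age=[10.0, 0.0, 0.0],
        days_per_person_by_age=[1.0, 2.0, 3.0],
        years=10,
    )
    print(forecast)
```

The point of the structure is that each piece (death rates, entrant counts, usage rates, even the step rule itself) is a separate belief that can be revised as more information arrives, without refitting a single curve to the short history.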
To forecast, you put a lot of effort into the solidity of the conceptualization so that an extension of it has the greatest chance of succeeding. This method is vastly different from finding the function that best fits a data set. Given a few data points, you can find a polynomial that goes right through them: a perfect fit. But nearly always, once you get past the base values, the purely mathematically derived curves fly off far from any future real values. The curve fits perfectly because the idea of choosing any polynomial that would fit best is too plastic - plastic to the current experience - with no insight solidifying a truth about the future.

Data mining contains a variation of the same conundrum: are you finding something that is true only of the given sample, or discovering something about a perhaps yet unobserved population? Comparing which parts of your automatically produced models change with changes in samples gives you some idea of which parts should be considered solid, which are plastic (and appropriately changeable), and which are so changeable as to be considered just the results of noise. But this is far from being a science.

As you can see, there is data-driven modeling, which characterizes most statistics and data mining, and there is the modeling used for long-term forecasting, which is based on constructing a systematic representation of the factors and their relationships that evolve over time. The two strike very different balances between nature and nurture. To succeed, each application requires a correct determination of what is solid and what is plastic, and the correspondingly appropriate tools should be used. With too much solidity, you may be claiming more than you really know. Without developing enough truths to produce some solidity, you may just not have enough data to get the answers you want.
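The earlier point about a polynomial that passes through every data point can be seen in a small sketch. The six data values are made up for illustration, and the straight line used for comparison is only an assumed stand-in for a more "solid" belief (steady growth); neither comes from a real project. The interpolating polynomial is evaluated with the standard Lagrange formula.

```python
# Hypothetical data: six noisy observations of a roughly linear trend.
XS = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
YS = [10.0, 10.8, 12.3, 12.9, 14.2, 14.8]


def lagrange_eval(xs, ys, x):
    """Evaluate the unique degree-(n-1) polynomial through all n points at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    slope, intercept = fit_line(XS, YS)
    for x in [5.0, 6.0, 8.0, 10.0]:
        poly = lagrange_eval(XS, YS, x)
        line = slope * x + intercept
        print(f"x={x:4.1f}  polynomial={poly:10.1f}  line={line:6.1f}")
```

Inside the observed range the polynomial reproduces every point exactly, which the line does not. A few steps beyond the last observation it heads toward large negative values, while the line, which fits the sample less well, stays in a plausible range.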
With too much plasticity or the wrong kind, you are going to be affected too much by the uniqueness of the moment and will be tossed by the winds. With too little, you will not be able to take advantage of facts that appear in the data and will miss this valuable anchoring in reality altogether. Part of the art is to know whether each part should be solid or plastic, with each of the methods I mentioned and their variations emphasizing a differing balance. Being able to differentiate between what changes and what stays the same, what is predictably consistent and what changes rapidly or is unpredictable - what is the sea and what are the waves - is fundamental to the survival of business or beast. It is also fundamental in modeling and forecasting.

Barry Grushkin (BLG23@cornell.edu) is Senior Researcher at the Machine Intelligence Company.









