Matching Patterns
Patterns in historical data are the lifeblood of business intelligence and knowledge
By Girish Keshav Palshikar
What Is a Pattern?

For the purpose of understanding historical databases, I define a pattern as a significant, high-level structure present in the data. A pattern condenses and summarizes vast amounts of numeric data. Detecting a pattern (and measuring its attributes) in a given source is a significant observation: you see where, when, and how strongly it occurred, how it varies from a standard reference pattern, how often it repeats, and so forth. Clearly, a pattern is a valuable piece of knowledge. Describing and recognizing such patterns contributes to knowledge building and knowledge reuse within an intelligent enterprise.

If a pattern consists of nothing but a group of records, then queries and reports should be able to find it. If a pattern is a group of data values (for example, most guests who stay for more than three nights are men, are corporate executives, and use air travel), then clustering algorithms are generally well equipped to detect it. Data mining algorithms often detect such associations of data values as well.

But what about patterns that are temporal in nature, like shapes in electrocardiogram (ECG) signals? Also, most experienced businesspeople know that leopards don't change their spots. That is, experts often have a repository of well-understood patterns that they wish to compare with the current data. So how does one detect a pattern that is known a priori, rather than an unknown one? What my company's researchers are saying is that often you may need to match a known pattern against given data, rather than automatically detect an unknown pattern.

For example, finance managers often describe the health of a company in approximate symbolic patterns. In the context of accounts receivable (AR), they may say "collection is high," "collection is slow," "collection is not improving," or various combinations like "collections are high, not slow, and improving."
The last one indicates a healthy status, which may be described in more detail as, "The AR turnover is increasing, the days' sales outstanding (DSO) is well below the given norm, and the average collection period (ACP) remains lower than the credit terms." Here, AR turnover, DSO, and ACP are domain concepts; their values vary over time and can be computed from sales and AR databases. The pattern is clearly composed from smaller subpatterns. As another example, you may look for periods in which "collections are high and slowing down."

In the domain of manufacturing systems, experts often know the possible faults that can occur in the system, and they usually characterize these faults by means of inexact symbolic descriptions: If the temperature steadily increases and no sustained pressure builds up, then the out valve may be leaking. Finally, we at TRDDC ask a more general question: "Is there a way to characterize the patterns more logically to resemble their users' verbal informal description?"

Characteristics of Temporal Patterns

What then are the characteristics of temporal patterns? First and foremost, managers and other decision makers often describe a pattern in terms that are qualitative (nonnumeric or symbolic) in nature. (See the sidebar "Ups and Downs" for specific examples.) A temporal pattern is qualitative in that it typically does not specify actual numeric values, time instants, and intervals but deals with temporal relationships between events. (See the sidebar "Time Factors.") Symbolic descriptors like high, low, close, rapidly, and so on replace numeric values.

A pattern description deals with domain-specific concepts, rather than database columns. A decision maker's descriptions of temporal patterns are conceptually at a very high level of abstraction, removed from table structure, field types, keys, and so on. Users often use inexact, approximate, probabilistic, or fuzzy terms in describing temporal patterns.
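To make the leaking-valve description above concrete, here is a minimal Python sketch of how such an inexact, symbolic pattern might be matched against readings. Everything in it is an illustrative assumption rather than the method used by the author or TRDDC: the function names, the way "steadily increases" and "sustained buildup" are graded on a 0-to-100 scale, the `min_pressure_rise` threshold, the sample readings, and the choice of minimum and complement as the "and" and "not" operators.

```python
from typing import Sequence


def steadily_increasing(values: Sequence[float]) -> float:
    """Grade (0-100) for 'steadily increases'.

    Assumed definition: the share of consecutive steps that go up.
    """
    if len(values) < 2:
        return 0.0
    ups = sum(1 for a, b in zip(values, values[1:]) if b > a)
    return 100.0 * ups / (len(values) - 1)


def sustained_buildup(values: Sequence[float], min_rise: float) -> float:
    """Grade (0-100) for 'sustained buildup'.

    Assumed definition: net rise over the window relative to a
    hypothetical threshold min_rise, clipped to the range 0..100.
    """
    if len(values) < 2:
        return 0.0
    rise = values[-1] - values[0]
    return max(0.0, min(100.0, 100.0 * rise / min_rise))


def fuzzy_not(grade: float) -> float:
    """Complement of a grade on the 0-100 scale (an assumed operator)."""
    return 100.0 - grade


def fuzzy_and(*grades: float) -> float:
    """Conjunction as the minimum of the grades (an assumed operator)."""
    return min(grades)


def leaking_valve_grade(temperature: Sequence[float],
                        pressure: Sequence[float],
                        min_pressure_rise: float = 5.0) -> float:
    """Grade for: temperature steadily increases AND NOT sustained pressure buildup."""
    return fuzzy_and(
        steadily_increasing(temperature),
        fuzzy_not(sustained_buildup(pressure, min_pressure_rise)),
    )


# Made-up sample readings, for illustration only.
temperature = [70.0, 71.5, 73.0, 72.8, 75.1, 77.0]
pressure = [30.0, 30.4, 30.1, 30.6, 30.2, 30.5]
print(leaking_valve_grade(temperature, pressure))
```

The point of the sketch is the shape of the description, not the particular formulas: each symbolic phrase becomes a small graded subpattern over domain concepts, and the larger pattern is built by combining those grades.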
A pattern is approximate in the sense that its instances may occur several times in a given source, and they are usually similar but not identical. Looked at another way, a pattern is not black and white; it is present in a graded way (described, for example, as a fuzzy degree of truth by a number between 0 and 100) rather than in a Boolean (true or false) fashion. Interestingly, a pattern description is composed using smaller, more primitive patterns; typically, the composition operators are logical (