Preemptive Intelligence

The Department of Homeland Security's concept of an information sharing and analysis center (ISAC), designed to watch for crisis signals so that it can respond rapidly and intelligently if a crisis does unfold, has merit for private businesses. Here's why.
by Lisa Sokol

Continued from Page 1

Data Collection

Given a set of signals of interest, the challenge is to determine the actual data observables associated with each event. ISACs start by collecting data that users believe can help predict these events of interest. The data may take many forms, including structured, unstructured, qualitative, and quantitative. Documents (unstructured text) are one of the largest sources of data. These documents can take the form of email, company documents, newspaper articles, or even articles found on the Internet. The potential of the ISAC is realized as users acquire more and more data that is potentially relevant to the signals they are trying to detect.

Data Transformation

Until recently, distilling and analyzing the relevant facts from text (text mining) has been a challenging, time-consuming process. A new natural language technology called entity extraction has revolutionized our ability to take unstructured text-based documents and automatically extract signal information in the form of people, places, things, and events: names, places, currency fluctuations, organizations, phone numbers, riots, legislative initiatives, product introductions, and more. The entity extraction process is truly exciting because it lets us apply reasoning to the facts embedded in text. Several entity extraction software products are on the market, including those made by SPSS, Attensity, ClearForest, SRA, and Inxight.

XML tags are an important component of entity extraction technology. Entity extraction software automatically tags data with appropriate XML tags specifying name, organization, location, and so on. Incoming data is also tagged with metadata concerning content, originator, level of classification, date, and so forth. For structured data transformation, XML tags can contain metadata that can dramatically improve the interoperability of the data.
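To make the tagging step concrete, here is a minimal sketch of what XML-tagged output might look like. It is not how any of the commercial products named above work: it assumes a tiny hand-built lookup table (`GAZETTEER`) in place of real linguistic analysis, and the entity names, element names, and metadata values are all hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical lookup table. A real entity extraction engine relies on
# linguistic models rather than a fixed list of known strings.
GAZETTEER = {
    "Jane Doe": "name",
    "Acme Corp": "organization",
    "Geneva": "location",
}

def tag_entities(text, metadata):
    """Wrap known entities in XML elements and attach document metadata."""
    doc = ET.Element("document", attrib=metadata)
    # Try longer entries first so overlapping names resolve predictably.
    keys = sorted(GAZETTEER, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    pos = 0
    last = None  # most recently added entity element
    for match in pattern.finditer(text):
        between = text[pos:match.start()]
        if last is None:
            doc.text = between   # text before the first entity
        else:
            last.tail = between  # text between two entities
        last = ET.SubElement(doc, GAZETTEER[match.group()])
        last.text = match.group()
        pos = match.end()
    rest = text[pos:]
    if last is None:
        doc.text = rest
    else:
        last.tail = rest
    return doc

text = "Jane Doe met Acme Corp officials in Geneva."
doc = tag_entities(text, {"originator": "newswire", "classification": "public"})
print(ET.tostring(doc, encoding="unicode"))
```

Because the entities and the document metadata travel together in one XML document, a downstream tool can select, say, every `organization` element from documents with a given `originator` without re-reading the raw text.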
The data transformation process also focuses on creating database rows that are relevant to the topics that we wish to mine. The temporal aspect of the data is an important component of the transformation. Temporal analysis lets us reason with the time between events. The XML tag, whether it is derived or comes already associated with data or documents, helps improve our ability to exploit the data and use more sophisticated analysis tools. XML facilitates distributed queries, helps integrate the results from different entity extraction engines, and helps control access to specific data sources and data elements.

Data Warehouse

An important contributor to the success of an ISAC is the creation of a data warehouse that integrates data from structured sources with data derived from unstructured text. Database integration is one of the most challenging tasks, because each data source has its own database design and data dictionary. The warehouse structure must facilitate the analysis required by users. The data warehouse must also act as a repository for the business rules associated with the detection of a signal. The data integration can occur either virtually or within a centralized data warehouse.
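The integration problem described above can be sketched in a few lines. The example below assumes the centralized option, using an in-memory SQLite database as a stand-in for the warehouse. The two sources, their field names, the `events` table, and the seven-day rule are all hypothetical, chosen only to show how per-source data dictionaries map onto one schema and how a business rule can reason about the time between events.

```python
import sqlite3

# Hypothetical data dictionaries: each source's field name -> warehouse column.
FIELD_MAPS = {
    "news_feed": {"headline_entity": "entity", "evt": "event_type",
                  "published": "event_time"},
    "crm": {"company": "entity", "activity": "event_type",
            "logged_at": "event_time"},
}

def load(conn, source, records):
    """Rename each record's fields to the shared schema and insert it."""
    mapping = FIELD_MAPS[source]
    for rec in records:
        row = {mapping[k]: v for k, v in rec.items() if k in mapping}
        conn.execute(
            "INSERT INTO events (source, entity, event_type, event_time) "
            "VALUES (?, ?, ?, ?)",
            (source, row["entity"], row["event_type"], row["event_time"]),
        )

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (source TEXT, entity TEXT, event_type TEXT, event_time TEXT)"
)
load(conn, "news_feed", [{"headline_entity": "Acme Corp",
                          "evt": "product introduction",
                          "published": "2004-03-01T09:00:00"}])
load(conn, "crm", [{"company": "Acme Corp",
                    "activity": "price change",
                    "logged_at": "2004-03-04T09:00:00"}])

# Illustrative business rule: signal when two different event types for the
# same entity occur within seven days of each other.
RULE = """
SELECT a.entity, a.event_type, b.event_type
FROM events AS a
JOIN events AS b
  ON a.entity = b.entity
 AND a.event_type <> b.event_type
 AND a.event_time < b.event_time
 AND julianday(b.event_time) - julianday(a.event_time) <= 7
"""
signals = conn.execute(RULE).fetchall()
print(signals)
```

Keeping the rule as a stored query next to the data reflects the point that the warehouse also serves as the repository for signal-detection rules. A virtual integration would apply the same mappings at query time instead of at load time.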









