Integrating Structured Data and Text

Treating text as a relational application is a viable alternative for many data warehouses

By David Grossman and Ophir Frieder
Edited by Erik Thomsen

Continued from Page 1

Text as a Relational App

Treating text as a relational application has numerous benefits. For starters, you don't need to acquire, install, or integrate a text package into the data warehouse to support access to a few text columns. For example, almost every warehouse has a "comments" column or two that lets users enter whatever unstructured data they feel is relevant to the transactional record. But searching these text columns with a

Treating text as a relational application also opens the door to parallel processing, something that has eluded the commercial text world because of the inherently sequential nature of the inverted index. The downside, obviously, is that a relational application incurs extra overhead. But didn't we go through this argument in the '70s, when people were griping that the relational approach was too slow and that the best thing to do was to stick with ISAM files?

More Next Month

In the next column, we'll show how you can implement more complex text functionality (such as relevance ranking) and give more details, performance statistics, and tuning hints on this approach. The bottom line is that treating text as a relational application is a viable alternative for many data warehouses, and it has been deployed in a number of real-world applications. We suspect that as the need to integrate structured data and text increases, more applications will consider solutions similar to the one discussed here.

David Grossman [grossman@iit.edu] is an assistant professor of computer science and Ophir Frieder [frieder@iit.edu] is the IITRI professor of computer science at the Information Retrieval Laboratory, Illinois Institute of Technology.

RESOURCES

Frieder, O., A. Chowdhury, D. Grossman, and M. C. McCabe. "On the Integration of Structured Data and Text: A Review of the SIRE Architecture." DELOS Workshop on Information Seeking, Searching, and Querying in Digital Libraries, Zurich, Switzerland, December 2000.

Grossman, D., D. Holmes, and O. Frieder. "A Parallel DBMS Approach to IR in TREC-3." Overview of the Third Text Retrieval Conference (TREC-3), NIST Special Publication 500-225, April 1995.

Grossman, D. and O. Frieder. Information Retrieval: Algorithms and Heuristics. Kluwer Academic Press, 1998.

Grossman, D., D. Holmes, O. Frieder, and D. Roberts. "Integrating Structured Data and Text: A Relational Approach." Journal of the American Society for Information Science, February 1997.