Integrating Structured Data and Text: Part 2

Build relationally integrated systems to fully leverage your warehouse investments

By David Grossman and Ophir Frieder

Continued from Page 1

TAND

At times users want a threshold
As we said earlier, any of these queries may be joined with structured data. If we want to find
documents written only by employees, we can easily join the selected document list with the employee
table. Assume that we have an Employee table with columns

Relevance Ranking

Multiterm queries work, and existing search engines use them, but they do not incorporate term weights. Some terms are simply more important than others. To ensure that unimportant terms don't dominate retrieval, most Web search engines use queries with term weights. Search engines rank documents based on a relevance measure that computes the relevance of a given
document to a given query. A common document ranking strategy, known as the vector space model,
represents each document and query as a vector and ranks the documents according to the similarity
between the vectors. One way to compute this similarity is to take the inner product of the two
vectors. Various weights may be used for each term; assume each term has a weight that is stored in
the
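
The inner-product ranking described above can be sketched as follows. This is only an illustration under stated assumptions: the table names `doc_term` and `query_term`, their columns, and the helper `rank_documents` are hypothetical and are not the schema used in this article, and the weights are made-up values. The sketch uses Python's built-in `sqlite3` module so that the SQL can be run as is.

```python
import sqlite3

def rank_documents(doc_terms, query_terms):
    """Rank documents by the inner product of document and query term weights.

    doc_terms:   iterable of (doc_id, term, weight) rows
    query_terms: iterable of (term, weight) rows
    Returns a list of (doc_id, score) tuples, highest score first.
    """
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per (document, term) pair and one row per query term.
    conn.execute("CREATE TABLE doc_term (doc_id INTEGER, term TEXT, weight REAL)")
    conn.execute("CREATE TABLE query_term (term TEXT, weight REAL)")
    conn.executemany("INSERT INTO doc_term VALUES (?, ?, ?)", doc_terms)
    conn.executemany("INSERT INTO query_term VALUES (?, ?)", query_terms)
    # Joining on term keeps only terms shared by the document and the query;
    # summing the products of their weights gives the inner product per document.
    rows = conn.execute(
        """
        SELECT d.doc_id, SUM(d.weight * q.weight) AS score
        FROM doc_term AS d
        JOIN query_term AS q ON d.term = q.term
        GROUP BY d.doc_id
        ORDER BY score DESC, d.doc_id
        """
    ).fetchall()
    conn.close()
    return rows

# Made-up sample data: three documents and a two-term weighted query.
docs = [
    (1, "vehicle", 2.0), (1, "sales", 1.0),
    (2, "vehicle", 1.0),
    (3, "weather", 3.0),
]
query = [("vehicle", 1.0), ("sales", 2.0)]
ranking = rank_documents(docs, query)
print(ranking)
```

Documents that share no term with the query (document 3 in the sample data) drop out of the join and therefore do not appear in the ranking at all.
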
This is a very simplistic similarity measure. For a survey of other ranking strategies and SQL to support them, check out our book, Information Retrieval: Algorithms and Heuristics, or our Web site at www.ir.iit.edu.

TUNING

As with any other kind of application, getting it to run correctly is not the same as getting it to run well. So expect to spend some time tuning your text applications once you have them running. DBMSs that permit a clustered index on term in the

David Grossman [grossman@iit.edu] is an assistant professor of computer science and Ophir Frieder [frieder@iit.edu] is the IITRI professor of computer science at the Information Retrieval Laboratory, Illinois Institute of Technology.

RESOURCES

Chowdhury, A., O. Frieder, D. Grossman, and M. McCabe. "Collection Statistics for Fast Duplicate Document Detection." To appear in ACM Transactions on Information Systems (TOIS).

Frieder, O., A. Chowdhury, D. Grossman, and M. C. McCabe. "On the Integration of Structured Data and Text: A Review of the SIRE Architecture." DELOS Workshop on Information Seeking, Searching, and Querying in Digital Libraries, Zurich, Switzerland, December 2000.

Grossman, D. and O. Frieder. Information Retrieval: Algorithms and Heuristics. Kluwer Academic Press, 1998.

Grossman, D., D. Holmes, O. Frieder, and D. Roberts. "Integrating Structured Data and Text: A Relational Approach." Journal of the American Society of Information Science, February 1997.