Knowledge-Enhanced Latent Semantic Indexing
Authors: David Guo, Michael W Berry, Bryan B Thompson, Sidney Bailin
Institution: (1) Department of Computer Science, University of Tennessee, 203 Claxton Complex, Knoxville, TN 37996-3450, USA; (2) Global Wisdom, Inc., 1737 Harvard Street, NW Washington, DC, 20009, USA; (3) Knowledge Evolution, Inc., 1050 17th Street, NW Washington, DC, 20036, USA
Abstract: Latent Semantic Indexing (LSI) is a popular information retrieval model for concept-based searching. As with many vector space IR models, LSI requires an existing term-document association structure such as a term-by-document matrix. The term-by-document matrix, constructed during document parsing, can only capture weighted vocabulary occurrence patterns in the documents. However, for many knowledge domains there are pre-existing semantic structures that could be used to organize and categorize information. The goals of this study are (i) to demonstrate how such semantic structures can be automatically incorporated into the LSI vector space model, and (ii) to measure the effect of these structures on query matching performance. The new approach, referred to as Knowledge-Enhanced LSI, is applied to documents in the OHSUMED medical abstracts collection using the semantic structures provided by the UMLS Semantic Network and MeSH. Results based on precision-recall data (11-point average precision values) indicate that a MeSH-enhanced search index is capable of delivering noticeable incremental performance gain (as much as 35%) over the original LSI for modest constraints on precision. This performance gain is achieved by replacing the original query with the MeSH heading extracted from the query text via regular expression matches.
Keywords: latent semantic indexing, MeSH, metathesaurus, OHSUMED, semantic network, UMLS
This article is indexed in SpringerLink and other databases.
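The two mechanical steps named in the abstract, building a term-by-document matrix during parsing and replacing a query with a MeSH heading found by regular expression matching, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the three toy documents, the two-entry heading list, and the function names `term_document_matrix`, `mesh_query`, and `rank` are hypothetical and are not taken from the paper, OHSUMED, or the MeSH vocabulary files. The sketch uses raw term counts and plain cosine matching, and it stops short of the truncated SVD that LSI applies to the matrix, so it is not the authors' Knowledge-Enhanced LSI method.

```python
import math
import re
from collections import Counter

# Hypothetical toy data; not drawn from OHSUMED or the real MeSH files.
DOCS = [
    "Aspirin treatment after myocardial infarction",
    "Insulin therapy in diabetes mellitus",
    "Heart attack outcomes with aspirin therapy",
]
MESH_HEADINGS = ["Myocardial Infarction", "Diabetes Mellitus"]


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_document_matrix(docs):
    """Return (vocab, matrix) where matrix[i][j] counts term i in document j."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[t] for c in counts] for t in vocab]
    return vocab, matrix


def mesh_query(query, headings):
    """Return the first heading matched in the query, else the query itself."""
    for heading in headings:
        # Allow any run of whitespace between the words of a heading.
        pattern = r"\b" + r"\s+".join(map(re.escape, heading.split())) + r"\b"
        if re.search(pattern, query, flags=re.IGNORECASE):
            return heading
    return query


def rank(query, vocab, matrix):
    """Return document indices ordered by cosine similarity to the query."""
    q_counts = Counter(tokenize(query))
    q_vec = [q_counts[t] for t in vocab]
    q_norm = math.sqrt(sum(a * a for a in q_vec))
    scores = []
    for j in range(len(matrix[0])):
        column = [row[j] for row in matrix]
        d_norm = math.sqrt(sum(b * b for b in column))
        dot = sum(a * b for a, b in zip(q_vec, column))
        scores.append(dot / (q_norm * d_norm) if q_norm and d_norm else 0.0)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


vocab, matrix = term_document_matrix(DOCS)
query = mesh_query("elderly patients with myocardial  infarction", MESH_HEADINGS)
ranking = rank(query, vocab, matrix)
```

In a fuller pipeline the matrix would be term-weighted and reduced with a truncated SVD before matching, which is what lets LSI relate documents such as the first and third above even when they share few literal terms.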