Improving English and Chinese Ad-Hoc Retrieval: A Tipster Text Phase 3 Project Report
Authors:KL Kwok
Institution:Computer Science Department, Queens College, CUNY, 65-30 Kissena Boulevard, Flushing, NY 11367
Abstract:Both English and Chinese ad-hoc information retrieval were investigated in this Tipster 3 project. One of our objectives was to study the use of various term-level and phrase-level evidence to improve retrieval accuracy. For short queries, we studied five term-level techniques that together can lead to good improvements over standard ad-hoc two-stage retrieval in the TREC5-8 experiments. For long queries, we studied the use of linguistic phrases to re-rank retrieval lists. The effect is small but consistently positive. For Chinese IR, we investigated three simple representations for documents and queries: short-words, bigrams and characters. Both approximate short-word segmentation and bigrams, when augmented with characters, give highly effective results. Accurate word segmentation does not appear to be crucial to the overall result for a query set. Character indexing by itself is not competitive. Additional improvements may be obtained using collection enrichment and combination of retrieval lists. Our PIRCS document-focused retrieval is also shown to be similar to a simple language model approach to IR.
Keywords:language model and PIRCS retrieval model; ad-hoc two-stage retrieval; pseudo-relevance feedback; collection enrichment; segmentation and Chinese IR
This article is indexed in SpringerLink and other databases.