
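To address the limitations of search engines in information retrieval, this paper proposes an ontology-based multi-agent intelligent retrieval system model and presents its architecture, workflow, and functional description. Intelligent agents in the system use ontology knowledge to normalize the description of retrieval requests, which improves retrieval precision and coverage. The agents divide the work and cooperate to deliver information retrieval and automatic update services, reflecting the system's intelligent and personalized character and laying a foundation for research on efficient intelligent retrieval systems.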

10.

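Problems in China's Information Retrieval Systems as Seen from the Frequent Changes in Authoritative Foreign Retrieval Tools

李占兵 李欣欣 于双成 《图书情报工作》1997,41(8):59

Lists and analyzes the frequent changes made to CA and IM, argues that such changes are an important means by which a retrieval system improves itself, and points out the problems that China's information retrieval systems have in this respect.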

11.

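Research and Implementation of Chinese Full-Text Retrieval Technology Total citations: 9, self-citations: 0, citations by others: 9

李梅 王庆林 《情报学报》2003,22(1):10-17

This paper designs a Chinese full-text retrieval system, studies full-text retrieval algorithms on top of a single-Chinese-character full-text database, and proposes calculation formulas for specific retrieval strategies. It also discusses how to rank the result set and uses the amount of user feedback so that the retrieved results are continually refined in use.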
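
The abstract does not give the system's data structures or formulas, so the following is only a minimal sketch of the general idea of single-character full-text indexing, not the authors' design. It assumes a positional inverted index that maps each character to the documents and offsets where it occurs, and answers a query by checking that the query's characters appear at consecutive offsets. The names `build_char_index` and `search_phrase` are hypothetical.

```python
from collections import defaultdict


def build_char_index(docs):
    """Build a positional inverted index: character -> {doc_id: [positions]}.

    `docs` is a dict mapping a document id to its text. Every character,
    including spaces and punctuation, is indexed (a simplifying assumption).
    """
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, ch in enumerate(text):
            index[ch][doc_id].append(pos)
    return index


def search_phrase(index, query):
    """Return sorted ids of documents that contain `query` as a contiguous string."""
    if not query:
        return []
    # Candidate start positions come from the postings of the first character.
    candidates = {
        doc_id: set(positions)
        for doc_id, positions in index.get(query[0], {}).items()
    }
    # Keep a start position only if every later character sits at start + offset.
    for offset, ch in enumerate(query[1:], start=1):
        postings = index.get(ch, {})
        narrowed = {}
        for doc_id, starts in candidates.items():
            here = set(postings.get(doc_id, ()))
            kept = {s for s in starts if s + offset in here}
            if kept:
                narrowed[doc_id] = kept
        candidates = narrowed
        if not candidates:
            break
    return sorted(candidates)


docs = {
    1: "中文全文检索系统",
    2: "全文数据库检索",
    3: "information retrieval",
}
index = build_char_index(docs)
print(search_phrase(index, "全文检索"))
print(search_phrase(index, "检索"))
```

Indexing by single characters avoids any dependence on a word segmenter, at the cost of larger postings lists for frequent characters. Ranking and user feedback, which the abstract mentions, are left out of this sketch.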

12.

Applying Machine Learning to Text Segmentation for Information Retrieval Total citations: 2, self-citations: 0, citations by others: 2

Xiangji Huang Fuchun Peng Dale Schuurmans Nick Cercone Stephen E. Robertson 《Information Retrieval》2003,6(3-4):333-362

We propose a self-supervised word segmentation technique for text segmentation in Chinese information retrieval. This method combines the advantages of traditional dictionary based, character based and mutual information based approaches, while overcoming many of their shortcomings. Experiments on TREC data show this method is promising. Our method is completely language independent and unsupervised, which provides a promising avenue for constructing accurate multi-lingual or cross-lingual information retrieval systems that are flexible and adaptive. We find that although the segmentation accuracy of self-supervised segmentation is not as high as some other segmentation methods, it is enough to give good retrieval performance. It is commonly believed that word segmentation accuracy is monotonically related to retrieval performance in Chinese information retrieval. However, for Chinese, we find that the relationship between segmentation and retrieval performance is in fact nonmonotonic; that is, at around 70% word segmentation accuracy an over-segmentation phenomenon begins to occur which leads to a reduction in information retrieval performance. We demonstrate this effect by presenting an empirical investigation of information retrieval on Chinese TREC data, using a wide variety of word segmentation algorithms with word segmentation accuracies ranging from 44% to 95%, including 70% word segmentation accuracy from our self-supervised word-segmentation approach. It appears that the main reason for the drop in retrieval performance is that correct compounds and collocations are preserved by accurate segmenters, while they are broken up by less accurate (but reasonable) segmenters, to a surprising advantage. This suggests that words themselves might be too broad a notion to conveniently capture the general semantic meaning of Chinese text. 
Our research suggests machine learning techniques can play an important role in building adaptable information retrieval systems and different evaluation standards for word segmentation should be given to different applications.

13.

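Research on Full-Text Retrieval Processing Technology I: Chinese Character Full-Text Retrieval Technology

王源 秦聿昌 刘滨 《情报学报》1997,(1)

Proposes a new data structure and new index-building and retrieval algorithms for a Chinese character full-text retrieval system, and completes the program design, which is used to search the titles and abstracts of the Chinese chemical literature database (中国化学文献数据库). Index-building time, space consumption, and retrieval response time were measured; the number of high-frequency characters and the index space consumption were calculated for documents in different length ranges; and the relationship between the index expansion ratio and document length is discussed.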

14.

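Key Technologies and Implementation of a Chinese Character Full-Text Retrieval System Total citations: 14, self-citations: 1, citations by others: 13

张俭恭 陈定权 《现代图书情报技术》2001,17(2):16-18

Full-text retrieval, a rapidly developing information retrieval technique, has attracted wide attention in recent years and has entered the market. Building on an analysis and comparison of Chinese and Western-language full-text retrieval systems, this paper makes a new attempt: it proposes a new way of building the index and, on that basis, implements full-text retrieval that supports fuzzy queries. The authors designed their own data structures and algorithms and implemented them with Visual C++ under Windows. Finally, some problems that remain in Chinese character full-text retrieval are discussed.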

15.

An empirical study of tokenization strategies for biomedical information retrieval

Jing Jiang ChengXiang Zhai 《Information Retrieval》2007,10(4-5):341-363

Due to the great variation of biological names in biomedical text, appropriate tokenization is an important preprocessing step for biomedical information retrieval. Despite its importance, there has been little study on the evaluation of various tokenization strategies for biomedical text. In this work, we conducted a careful, systematic evaluation of a set of tokenization heuristics on all the available TREC biomedical text collections for ad hoc document retrieval, using two representative retrieval methods and a pseudo-relevance feedback method. We also studied the effect of stemming and stop word removal on the retrieval performance. As expected, our experiment results show that tokenization can significantly affect the retrieval accuracy; appropriate tokenization can improve the performance by up to 96%, measured by mean average precision (MAP). In particular, it is shown that different query types require different tokenization heuristics, stemming is effective only for certain queries, and stop word removal in general does not improve the retrieval performance on biomedical text.
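
For readers unfamiliar with the metric named in the abstract, the sketch below shows how mean average precision is commonly computed from ranked result lists. It is an illustration under simple assumptions (binary relevance, no duplicate ids in a ranking, unretrieved relevant documents counted as misses), not the evaluation code used in the paper. The function names and the toy data are hypothetical.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision for one query.

    Sums precision at each rank where a relevant document appears and
    divides by the total number of relevant documents.
    """
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """Mean of average precision over (ranked_ids, relevant_ids) pairs, one per query."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Toy data: two queries with made-up document ids.
runs = [
    (["a", "b", "c", "d"], {"a", "c"}),  # relevant at ranks 1 and 3
    (["x", "y"], {"y"}),                 # relevant at rank 2
]
print(round(mean_average_precision(runs), 4))
```

Because every relevant document contributes a term, a tokenization choice that pushes relevant documents lower in the ranking, or loses them entirely, lowers MAP directly.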

16.

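Research on an Agent-Based Fuzzy Retrieval Interface for Information Systems Total citations: 2, self-citations: 0, citations by others: 2

刘建勋 张申生 王英林 《情报学报》2001,20(4):464-470

This paper describes a method that uses full-text retrieval technology and agent technology to remedy a shortcoming in the retrieval functions of information systems built on an RDBMS, namely the lack of fuzzy retrieval. It also presents an example solved with the method and an in-depth analysis of that example's performance. The analysis shows that the method is effective, reliable, and of practical value. The method is helpful for problems such as fuzzy queries of library catalogs, fuzzy queries of tourist addresses, and fuzzy retrieval of cases in CBD (Case-Based Design).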

17.

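Research on a Hypertext Full-Text Retrieval System Total citations: 4, self-citations: 3, citations by others: 1

张子枫 方正 《现代图书情报技术》1996,12(1):7-11

Combining full-text retrieval with hypertext technology is one direction of development in information retrieval, but existing full-text retrieval systems lack hypertext capabilities, and hypertext systems lack full-text retrieval. This paper proposes a model of a hypertext full-text retrieval system and introduces HFTRS (Hypertext Full Text Retrieval System), an experimental system based on the model, as an exploration of how the two technologies can be combined.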

18.

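Design of a Lucene-Based FTP Search Engine Total citations: 2, self-citations: 0, citations by others: 2

郭一平 向晖 王亮 《图书情报工作》2006,50(4):122-125

Database-backed FTP search engines currently used on networks have no standard resource documents and support neither Chinese word segmentation nor dynamic data updates. To address these shortcomings, this paper proposes a design for an FTP search engine based on Lucene, a powerful full-text indexing toolkit. The search engine automatically generates XML resource documents in a standard format and uses dictionary-based forward maximum matching for Chinese word segmentation to update the full-text index in Lucene dynamically. The design can also analyze and search query keywords that mix Chinese and English.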
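
Dictionary-based forward maximum matching, the segmentation method named in the abstract, can be sketched in a few lines. The version below is a generic illustration, not the paper's implementation: it assumes the dictionary is a plain Python set, emits a single character when no dictionary word matches, and does not handle mixed Chinese and English input. The function name and the sample dictionary are hypothetical.

```python
def forward_max_match(text, dictionary, max_len=None):
    """Segment `text` greedily from left to right.

    At each position, try the longest window first and shrink it until the
    window is a dictionary word; fall back to a single character otherwise.
    """
    if max_len is None:
        # Longest dictionary entry bounds the window; default to 1 if empty.
        max_len = max((len(word) for word in dictionary), default=1)
    max_len = max(1, max_len)
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


# A made-up dictionary for illustration only.
dictionary = {"中文", "全文", "检索", "全文检索", "系统"}
print(forward_max_match("中文全文检索系统", dictionary))
```

The greedy choice is fast but can mis-segment: with a dictionary containing 研究, 研究生, 生命 and 起源, the string 研究生命起源 comes out as 研究生 / 命 / 起源, which is why such segmenters are often paired with backward matching or statistical checks.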

19.

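Applied Research on the BDSIRS Full-Text Information Retrieval System

方小容 《现代图书情报技术》1999,15(5):34-37

After introducing the two online retrieval modes used by the BDSIRS full-text information retrieval system, the paper tries a new one: online retrieval over an Internet link combined with the dedicated BDSIRS retrieval software. The results show that this mode is feasible and an effective way to reduce costs. The paper then examines the system's full-text retrieval methods and techniques; the retrieval tips derived from them also have some practical value.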

20.

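Cross-Database Application of a Lucene/XML-Based Full-Text Retrieval System

彭哲 《图书情报工作》2008,52(6):110-110

A full-text retrieval system consists of three functional modules: an indexing module, a retrieval module, and a storage module. This paper focuses on the system's composition and on technical difficulties such as the design of the XML database, the construction of the inverted index file, and Chinese word segmentation. On this basis, it builds a Lucene/XML-based full-text retrieval system for journal literature.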