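An information retrieval optimization method based on topics and folksonomy   Cited by: 1 total, 0 self-citations, 1 by others
To address the low precision caused by the lack of organization in current search engine results, this paper proposes an information retrieval optimization method based on topics and folksonomy. The user's search topic is first captured and represented. Then, with social tags as the clustering items, a vector space model is used to cluster documents by topic on the basis of the folksonomy, and the search results are ranked by a composite of similarity and tag "popularity", which improves retrieval precision and optimizes retrieval.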
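The composite ranking by similarity and tag popularity can be illustrated roughly as follows. The abstract does not state the scoring formula, so this is only a minimal sketch under assumptions: documents and the query are represented as bags of tags, similarity is cosine similarity in the vector space model, and the two signals are blended linearly. The weight `alpha`, the popularity counts, and the function names are all hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse vectors given as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def composite_rank(query_tags, docs, alpha=0.7):
    """Rank documents by alpha * similarity + (1 - alpha) * popularity.

    docs maps doc_id -> (list of tags, popularity count).
    alpha is an assumed blending weight, not a value from the paper.
    """
    query_vec = Counter(query_tags)
    # Normalize popularity to [0, 1]; fall back to 1 to avoid dividing by 0.
    max_pop = max((pop for _, pop in docs.values()), default=0) or 1
    scored = []
    for doc_id, (tags, pop) in docs.items():
        sim = cosine(query_vec, Counter(tags))
        score = alpha * sim + (1 - alpha) * pop / max_pop
        scored.append((score, doc_id))
    # Highest score first; ties broken by doc_id for a stable order.
    return [doc_id for _, doc_id in sorted(scored, key=lambda s: (-s[0], s[1]))]


if __name__ == "__main__":
    example_docs = {
        "d1": (["python", "search"], 10),
        "d2": (["python", "retrieval", "search"], 50),
        "d3": (["cooking"], 100),
    }
    print(composite_rank(["python", "search"], example_docs))
```

Setting `alpha` closer to 1 favors topical similarity, while values closer to 0 favor popular tags; a real system would tune this weight on relevance judgments.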

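The ordering of the results returned by an information retrieval system is called relevance ranking; the order of the entries reflects how relevant each result is to the query. Building on the related literature, this paper studies concept-lattice-based relevance ranking of documents and proposes a new ranking algorithm. The algorithm is relatively simple, gives satisfactory results, and can rank large numbers of documents.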

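The ontology is the key to the performance of ontology-based information retrieval. Current ontology learning is not tailored to the characteristics of query expansion and result organization in information retrieval, which leads to poor retrieval performance. This paper proposes an ontology learning framework oriented to information retrieval. It learns concept hierarchy relations with a method based on compatible classes, extracting the domain concepts at each level from the document sets that correspond to the compatible classes. The domain concepts are then represented quantitatively, synonyms among the concepts are mined, and the concept space of the document collection is rebuilt on the basis of these synonyms. The acquired ontology is applied in information retrieval experiments, which show that the ontology obtained with the framework improves retrieval accuracy and efficiency.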

江腾蛟  万常选 《情报杂志》2006,25(10):48-50
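This paper studies the characteristics of XML document retrieval and summarizes the factors that influence structural relaxation and content relaxation in fuzzy retrieval of XML documents. On this basis, it designs a relevance ranking model for the results of fuzzy structure-and-content retrieval, together with a top-K ranking algorithm that satisfies this model and a search engine architecture.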

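To address the problems of retrieving food safety traceability information on the Web and the shortcomings of the KS method, an information retrieval model based on an improved KS method is established. Using the key techniques of the improved KS method and a metasearch engine architecture, the model extracts domain index terms and assigns them weights, generates expanded queries that reflect the characteristics of the literature on food safety traceability information, and then ranks the results returned by the search engines. Experiments show that the model achieves good retrieval performance and provides regulatory agencies with valuable food safety traceability information.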

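Research on a microblog topic retrieval model combining text clustering and LDA   Cited by: 1 total, 0 self-citations, 1 by others
As microblogs grow in popularity, microblog retrieval is becoming a way for people to get breaking news. Text clustering and topic discovery are effective methods in information retrieval, and the choice of method is a key factor in the quality of retrieval over short microblog texts. Exploiting the complementary characteristics of text clustering and the LDA topic model, and taking into account the distinctive style of microblogs and the efficiency of short-text clustering, the article proposes a microblog retrieval method that combines frequent-term-set-based text clustering with cluster-based LDA topic mining, and presents a new topic retrieval model for microblog text. Experiments show that the method not only partitions microblog texts effectively but also clearly uncovers the latent topics within the clusters.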

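With the exponential growth of information on the Internet, people want search engines to rank the information users care about most at the top for easy browsing. This paper proposes a method for re-ranking retrieval results based on classification feature selection, which fuses classification features with other retrieval features. While maintaining the document recall of the categorized search engine's results, the method effectively improves the average precision of the retrieval results.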

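In information retrieval, the attribute information of documents is uncertain or incomplete, which makes decisions difficult. This paper therefore proposes applying rough set theory to the information retrieval model: an adjacency matrix of the corpus is constructed, the relevance of a document to a query is determined by comparing the degree of overlap between the upper and lower approximation sets of the expanded terms and of the document, and documents are accepted or rejected according to this relevance. Experiments show that the method improves retrieval precision.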

A discussion of vector space model information retrieval techniques   Cited by: 9 total, 0 self-citations, 9 by others
刘斌  陈桦 《情报杂志》2006,25(7):92-93,91
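Traditional vector space model retrieval simply counts how often the search terms occur in a document, so the retrieved results are often inconsistent with the documents and fail to reflect true relevance. This paper proposes an improved weighting algorithm and, with the help of an auxiliary thesaurus and a personalized information base, designs a new retrieval system model that improves the retrieval method.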

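Research on a UIMA-based enterprise information retrieval system   Cited by: 1 total, 0 self-citations, 1 by others
Enterprise information retrieval differs greatly from traditional Web retrieval in how search needs are described and in retrieval methods, retrieval precision, the objects retrieved, applications, and the retrieval process. Enterprise information retrieval systems therefore need to be studied in terms of retrieval algorithms as well as the technical environment and system architecture they use. The paper studies a UIMA-based enterprise information retrieval system and describes its system architecture and the technical implementation of a prototype. Experiments show that the system is highly flexible: it makes it easy to exploit a variety of text analysis techniques to improve system performance, and it can meet enterprise users' diverse information filtering needs.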

刘秀娟 《现代情报》2010,30(7):138-139,142
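Modern information technology is profoundly changing the objectives, content, teaching methods, and assessment of the traditional literature retrieval course. Computer-assisted instruction can no longer fully cover the impact of information technology on information literacy education. The integration of information technology with the curriculum is opening up a new field of research and practice. This paper therefore discusses the meaning, levels, and points of integration between information technology and the literature retrieval course, with the aim of exploring new ideas for teaching reform, in terms of teaching and learning methods and instructional structure, under new technological conditions.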

A new method is described to extract significant phrases from the title and the abstract of scientific or technical documents. The method is based upon a text structure analysis and uses a relatively small dictionary. The dictionary has been constructed from knowledge about concepts in the relevant field of science or technology and some lexical knowledge, because significant phrases and their component items may be used with different meanings in different fields. A text analysis approach has been applied to select significant phrases as substantial and semantic information carriers of the contents of the abstract. The results of the experiment for five sets of documents have shown that the significant phrases are effectively extracted in all cases, and that the number of phrases per document and the processing time are fairly satisfactory. The information representation of the document, partly using the method, is discussed in relation to the construction of the document information retrieval system.

On the use of database retrieval systems for bibliometric analysis   Cited by: 6 total, 1 self-citation, 5 by others
陈光祚 《情报科学》1998,16(2):122-127
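The author built a bibliographic database, supported by the CDS/ISIS software, containing more than 60,000 document records in library science, information science, documentation science, and archival science. The database covers not only journal articles but also monographs, textbooks, reference works, conference literature, dissertations, and selected chapters and sub-entries from large handbooks and textbooks, making it a comprehensive database of many document types. Using functions built into the retrieval software, such as building sub-databases, editing inverted files, combining keyword indexing with single-Chinese-character searching of titles to strengthen the retrieval language, post-controlled vocabularies (ANY tables), and function-based searching on the numeric publication-year field, the author carried out bibliometric analyses on various indicators of the literature of China's library and information science disciplines since the 1980s. The author argues that bibliometric methods built on database retrieval systems that cover many document types and a long time span, and that apply a range of software functions, are the direction in which bibliometric analysis in China should develop.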

With the growing focus on what is collectively known as “knowledge management”, a shift continues to take place in commercial information system development: a shift away from the well-understood data retrieval/database model, to the more complex and challenging development of commercial document/information retrieval models. While document retrieval has had a long and rich legacy of research, its impact on commercial applications has been modest. At the enterprise level most large organizations have little understanding of, or commitment to, high quality document access and management. Part of the reason for this is that we still do not have a good framework for understanding the major factors which affect the performance of large-scale corporate document retrieval systems. The thesis of this discussion is that document retrieval, specifically access to intellectual content, is a complex process which is most strongly influenced by three factors: the size of the document collection; the type of search (exhaustive, existence or sample); and the determinacy of document representation. Collectively, these factors can be used to provide a useful framework for, or taxonomy of, document retrieval, and highlight some of the fundamental issues facing the design and development of commercial document retrieval systems. This is the first of a series of three articles. Part II (D.C. Blair, The challenge of commercial document retrieval. Part II. A strategy for document searching based on identifiable document partitions, Information Processing and Management, 2001b, this issue) will discuss the implications of this framework for search strategy, and Part III (D.C. Blair, Some thoughts on the reported results of Text REtrieval Conference (TREC), Information Processing and Management, 2002, forthcoming) will consider the importance of the TREC results for our understanding of operating information retrieval systems.

Opinion mining is one of the most important research tasks in the information retrieval research community. With the huge volume of opinionated data available on the Web, approaches must be developed to differentiate opinion from fact. In this paper, we present a lexicon-based approach for opinion retrieval. Generally, opinion retrieval consists of two stages: relevance to the query and opinion detection. In our work, we focus on the second stage, which detects opinionated documents. We compare the document to be analyzed with opinionated sources that contain subjective information. We hypothesize that a document with a strong similarity to opinionated sources is more likely to be opinionated itself. Typical lexicon-based approaches treat and choose their opinion sources according to their test collection, then calculate the opinion score based on the frequency of subjective terms in the document. In our work, we use different open opinion collections without any specific treatment and consider them as a reference collection. We then use language models to determine opinion scores. The analysis document and reference collection are represented by different language models (i.e., Dirichlet, Jelinek-Mercer and two-stage models). These language models are generally used in information retrieval to represent the relationship between documents and queries. However, in our study, we modify these language models to represent opinionated documents. We carry out several experiments using Text REtrieval Conference (TREC) Blogs 06 as our analysis collection and Internet Movie Database (IMDB), Multi-Perspective Question Answering (MPQA) and CHESLY as our reference collections. To improve opinion detection, we study the impact of using different language models to represent the document and reference collection alongside different combinations of opinion and retrieval scores. We then use this data to deduce the best opinion detection models. Using the best models, our approach improves on the best baseline of TREC Blog (baseline4) by 30%.
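For readers unfamiliar with the smoothing methods named in this abstract, the sketch below shows the textbook forms of Jelinek-Mercer and Dirichlet smoothing for a unigram language model. It does not reproduce the authors' modified models for opinionated documents; the function names, the toy reference probabilities, and the parameter defaults (`lam`, `mu`) are assumptions chosen for illustration.

```python
import math
from collections import Counter


def jelinek_mercer(word, doc_counts, doc_len, coll_prob, lam=0.5):
    """(1 - lam) * P_ml(word | doc) + lam * P(word | collection)."""
    p_ml = doc_counts.get(word, 0) / doc_len if doc_len else 0.0
    return (1 - lam) * p_ml + lam * coll_prob.get(word, 0.0)


def dirichlet(word, doc_counts, doc_len, coll_prob, mu=2000.0):
    """(count(word, doc) + mu * P(word | collection)) / (|doc| + mu)."""
    return (doc_counts.get(word, 0) + mu * coll_prob.get(word, 0.0)) / (doc_len + mu)


def log_likelihood(terms, doc_tokens, coll_prob, smooth=dirichlet, **params):
    """Sum of log smoothed probabilities of `terms` under the document model.

    Returns -inf if any term has zero probability even after smoothing,
    that is, when the term is absent from both document and collection.
    """
    counts = Counter(doc_tokens)
    doc_len = len(doc_tokens)
    total = 0.0
    for term in terms:
        p = smooth(term, counts, doc_len, coll_prob, **params)
        if p <= 0.0:
            return float("-inf")
        total += math.log(p)
    return total


if __name__ == "__main__":
    # Toy reference-collection probabilities (hypothetical values).
    reference = {"good": 0.25, "film": 0.25, "bad": 0.5}
    document = ["good", "good", "film", "film"]
    print(log_likelihood(["good", "bad"], document, reference, smooth=jelinek_mercer, lam=0.5))
    print(log_likelihood(["good", "bad"], document, reference, smooth=dirichlet, mu=4.0))
```

Jelinek-Mercer interpolates with a fixed weight regardless of document length, whereas Dirichlet smoothing trusts longer documents more, since the collection term's influence shrinks as `doc_len` grows relative to `mu`.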

Several statistical sampling methods are evaluated for estimating the total number of relevant documents in a collection for a given query. The total number of relevant documents is needed in order to compute recall values for use in evaluating document retrieval systems. The simplest method considered uses simple random sampling to estimate the number of relevant documents. Another type of random sampling, which assigns unequal selection probabilities to the individual documents in the collection, is also investigated. An alternative approach considered uses curve fitting and extrapolation, where a smooth curve is developed which relates precision to document rank. Another curve relates a function of precision to the query-document score. In either case, the curve is extrapolated to the total number of documents in order to estimate the number of relevant documents. Empirical comparisons are made of all three methods.
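The simplest of these methods, simple random sampling, can be sketched as follows. This is an illustration under assumptions rather than the paper's procedure: the sample is drawn without replacement, relevance judgments are binary, and the interval uses a normal approximation with a finite population correction. The function name and the default `z` value are hypothetical.

```python
import math


def estimate_relevant(collection_size, sample_judgments, z=1.96):
    """Estimate the number of relevant documents from a simple random sample.

    sample_judgments: 0/1 relevance judgments for documents sampled
    without replacement from a collection of `collection_size` documents.
    Returns (point_estimate, (lower, upper)), where the interval is an
    approximate normal-theory confidence interval.
    """
    n = len(sample_judgments)
    if n < 2 or n > collection_size:
        raise ValueError("need 2 <= sample size <= collection size")
    p_hat = sum(sample_judgments) / n
    estimate = collection_size * p_hat
    # Estimated variance of p_hat with finite population correction.
    var_p = p_hat * (1 - p_hat) / (n - 1) * (1 - n / collection_size)
    std_err = collection_size * math.sqrt(var_p)
    lower = max(0.0, estimate - z * std_err)
    upper = min(collection_size, estimate + z * std_err)
    return estimate, (lower, upper)


if __name__ == "__main__":
    # 5 relevant documents found in a sample of 100 from 1000 documents.
    print(estimate_relevant(1000, [1] * 5 + [0] * 95))
```

When relevant documents are rare, the interval is wide relative to the estimate, which is one reason the abstract also considers unequal-probability sampling and curve extrapolation.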

The indirect retrieval method proposed by Goffman is outlined and some similarities to other retrieval methods are indicated. The method is then evaluated and the results are compared with those obtained on the same document collection with cluster-based retrieval using single-link clustering. The comparisons show that although the effectiveness of the indirect retrieval method can be comparable to cluster-based retrieval, the efficiency is lower.

This paper describes our novel retrieval model that is based on contexts of query terms in documents (i.e., document contexts). Our model is novel because it explicitly takes the document contexts into account instead of implicitly using the document contexts to find query expansion terms. Our model is based on simulating a user making relevance decisions, and it is a hybrid of various existing effective models and techniques. It estimates the relevance decision preference of a document context as the log-odds and uses smoothing techniques as found in language models to solve the problem of zero probabilities. It combines these estimated preferences of document contexts using different types of aggregation operators that comply with different relevance decision principles (e.g., aggregate relevance principle). Our model is evaluated using retrospective experiments (i.e., with full relevance information), because such experiments can (a) reveal the potential of our model, (b) isolate the problems of the model from those of the parameter estimation, (c) provide information about the major factors affecting the retrieval effectiveness of the model, and (d) show whether the model obeys the probability ranking principle. Our model is promising as its mean average precision is 60–80% in our experiments using different TREC ad hoc English collections and the NTCIR-5 ad hoc Chinese collection. Our experiments showed that (a) the operators that are consistent with the aggregate relevance principle were effective in combining the estimated preferences, and (b) estimating probabilities using the contexts in the relevant documents can produce better retrieval effectiveness than using the entire relevant documents.
