Similar documents
 Found 20 similar documents; search took 46 ms
1.
In information retrieval, users’ information needs are hard to identify, and many approaches try to address this by expanding the initial query and reweighting the terms of the expanded query using users’ relevance judgments. Relevance feedback is most effective when users provide relevance information about the retrieved documents, but that information is not always available. Another solution is to use correlated terms for query expansion. The main difficulty with this approach is constructing term-term correlations that can be used effectively to improve retrieval performance. In this study, we construct query concepts that denote users’ information needs from a document space, rather than reformulating initial queries using term correlations and/or users’ relevance feedback. To form query concepts, we extract features from each document and then cluster the features into primitive concepts, which are in turn used to form query concepts. Experiments are performed on the Associated Press (AP) dataset taken from the TREC collection. The experimental evaluation shows that our proposed framework, called QCM (Query Concept Method), outperforms a baseline probabilistic retrieval model on TREC retrieval.

2.
Multilingual information retrieval is generally understood to mean the retrieval of relevant information in multiple target languages in response to a user query in a single source language. In a multilingual federated search environment, different information sources contain documents in different languages. A general search strategy in such environments is to translate the user query into the language of each information source and run a monolingual search in each source. It is then necessary to obtain a single ranked document list by merging the individual ranked lists from the information sources, which are in different languages. This is known as the results merging problem for multilingual information retrieval. Previous research has shown that the simple approach of normalizing source-specific document scores is not effective. On the other hand, a more effective merging method has been proposed that downloads and translates all retrieved documents into the source language and generates the final ranked list by running a monolingual search in the search client. This method is more effective but incurs large online communication and computation costs. This paper proposes an effective and efficient approach to the results merging task for multilingual ranked lists. In particular, for each user query it downloads only a small number of documents from the individual ranked lists and calculates comparable document scores for them using both the query-based translation method and the document-based translation method. Query-specific and source-specific transformation models can then be trained for the individual ranked lists using the information from these downloaded documents. These transformation models are used to estimate comparable document scores for all retrieved documents, so that the documents can be sorted into a final ranked list.
This merging approach is efficient because only a subset of the retrieved documents is downloaded and translated online. Furthermore, an extensive set of experiments on Cross-Language Evaluation Forum (CLEF) data has demonstrated the effectiveness of the query-specific and source-specific results merging algorithm against other alternatives. The new research in this paper proposes different variants of the query-specific and source-specific results merging algorithm with different transformation models. This paper also provides thorough experimental results as well as detailed analysis. All of the work substantially extends the preliminary research in (Si and Callan, in: Peters (ed.) Results of the cross-language evaluation forum-CLEF 2005, 2005).
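The merging idea in entry 2 can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the transformation model is a simple least-squares line (the abstract only says "transformation models"), and the names `fit_linear`, `merge_ranked_lists`, `source_scores`, and `comparable_samples` are hypothetical. The comparable scores of the few downloaded documents are taken as given inputs.

```python
from typing import Dict, List, Tuple


def fit_linear(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Least-squares fit of y = a * x + b (assumed model form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        # All sampled scores are identical: fall back to a constant model.
        return 0.0, mean_y
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov / var_x
    return a, mean_y - a * mean_x


def merge_ranked_lists(
    source_scores: Dict[str, Dict[str, float]],
    comparable_samples: Dict[str, Dict[str, float]],
) -> List[Tuple[str, float]]:
    """Map every source-specific score to a comparable score, then sort.

    source_scores[source][doc] is the score the source itself assigned.
    comparable_samples[source][doc] is the comparable score computed for
    the small set of documents that were downloaded from that source.
    """
    merged: List[Tuple[str, float]] = []
    for source, scores in source_scores.items():
        samples = comparable_samples[source]
        xs = [scores[doc] for doc in samples]
        ys = [samples[doc] for doc in samples]
        # One model per source for this query (query- and source-specific).
        a, b = fit_linear(xs, ys)
        merged.extend((doc, a * s + b) for doc, s in scores.items())
    return sorted(merged, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    source_scores = {
        "en": {"en1": 10.0, "en2": 6.0, "en3": 2.0},
        "fr": {"fr1": 0.9, "fr2": 0.5, "fr3": 0.1},
    }
    # Only two documents per source were downloaded and rescored.
    comparable_samples = {
        "en": {"en1": 0.8, "en3": 0.4},
        "fr": {"fr1": 0.9, "fr3": 0.1},
    }
    for doc, score in merge_ranked_lists(source_scores, comparable_samples):
        print(doc, round(score, 3))
```

Only the sampled documents need to be downloaded; every other document receives an estimated comparable score from the fitted model, which is where the efficiency gain described in the abstract comes from.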

3.
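Interactive cross-language information retrieval is an important branch of information retrieval. Building on an analysis of theoretical research into the interactive cross-language retrieval process, evaluation metrics, and progress in user behavior studies, we design a user retrieval experiment in which users take part in the entire cross-language retrieval process. The results show that: users' query terms come mainly from the titles of the search topics; users judge document relevance with fairly high accuracy; full text in the target language, translated abstracts, and translated full text are all accepted by users as a basis for judgment; translation refinement, and the combination of translation refinement with query expansion, are very effective in an interactive setting; users remain willing to make further selections among translations after feedback; and users both need and approve of interaction with cross-language information retrieval systems. User behavior analysis helps guide the design and practice of interactive cross-language information retrieval systems.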

4.
We propose a method for evaluating relevance feedback based on simulating real users. The user simulation applies a model that defines the user’s relevance threshold for accepting individual documents as feedback in a graded relevance environment, the user’s patience in browsing the initial list of retrieved documents, and his/her effort in providing the feedback. We evaluate the result using cumulated gain-based evaluation together with freezing all documents seen by the user, in order to simulate the point of view of a user who is browsing the documents during the retrieval process. We demonstrate the method by performing a simulation in a laboratory setting and present the “branching” curve sets characteristic of the presented evaluation method. Both the average and topic-by-topic results indicate that, if the freezing approach is adopted, giving feedback of mixed quality makes sense for various usage scenarios even though the modeled users prefer finding especially the most relevant documents.
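The combination of freezing and cumulated gain in entry 4 can be illustrated with a small sketch. It assumes the common reading of freezing (documents the user has already seen keep their ranks, and the feedback run fills the remaining ranks); the function names and the gain values are hypothetical and not taken from the paper.

```python
from typing import Dict, List


def freeze_and_merge(initial: List[str], feedback: List[str], seen_count: int) -> List[str]:
    """Keep the first `seen_count` documents of the initial run at their
    ranks, then append the feedback run without the documents already seen."""
    frozen = initial[:seen_count]
    seen = set(frozen)
    rest = [doc for doc in feedback if doc not in seen]
    return frozen + rest


def cumulated_gain(ranking: List[str], gains: Dict[str, int]) -> List[int]:
    """Cumulated gain at each rank; unjudged documents contribute 0."""
    total = 0
    curve = []
    for doc in ranking:
        total += gains.get(doc, 0)
        curve.append(total)
    return curve


if __name__ == "__main__":
    initial_run = ["d1", "d2", "d3", "d4"]
    feedback_run = ["d5", "d1", "d4", "d6"]
    graded_gains = {"d1": 1, "d4": 3, "d5": 2}  # graded relevance (assumed values)

    # The simulated user browsed two documents before giving feedback.
    final = freeze_and_merge(initial_run, feedback_run, seen_count=2)
    print(final)
    print(cumulated_gain(final, graded_gains))
```

Varying `seen_count` (the simulated user's patience) yields one curve per setting, each departing from the initial run's curve at a different rank, which is one way to picture the "branching" curve sets the abstract mentions.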

5.
An information retrieval (IR) system can often fail to retrieve relevant documents due to the incomplete specification of information need in the user’s query. Pseudo-relevance feedback (PRF) aims to improve IR effectiveness by exploiting potentially relevant aspects of the information need present in the documents retrieved in an initial search. Standard PRF approaches utilize the information contained in these top ranked documents from the initial search with the assumption that documents as a whole are relevant to the information need. However, in practice, documents are often multi-topical where only a portion of the documents may be relevant to the query. In this situation, exploitation of the topical composition of the top ranked documents, estimated with statistical topic modeling based approaches, can potentially be a useful cue to improve PRF effectiveness. The key idea behind our PRF method is to use the term-topic and the document-topic distributions obtained from topic modeling over the set of top ranked documents to re-rank the initially retrieved documents. The objective is to improve the ranks of documents that are primarily composed of the relevant topics expressed in the information need of the query. Our RF model can further be improved by making use of non-parametric topic modeling, where the number of topics can grow according to the document contents, thus giving the RF model the capability to adjust the number of topics based on the content of the top ranked documents. We empirically validate our topic model based RF approach on two document collections of diverse length and topical composition characteristics: (1) ad-hoc retrieval using the TREC 6-8 and the TREC Robust ’04 dataset, and (2) tweet retrieval using the TREC Microblog ’11 dataset. 
Results indicate that our proposed approach increases MAP by up to 9% in comparison with the results obtained with an LDA-based language model (for initial retrieval) coupled with the relevance model (for feedback). Moreover, the non-parametric version of our proposed approach is shown to be more effective than its parametric counterpart because it can adapt the number of topics, improving MAP by up to 5.6% compared to the parametric version.
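As a rough illustration of re-ranking with term-topic and document-topic distributions (entry 5), the sketch below scores each top-ranked document by a topic-mixture query likelihood. This is a generic LDA-style document model chosen for illustration, not the scoring function of the paper; the distributions are assumed to come from a topic model fitted elsewhere, and `rerank_by_topics` and all numbers are hypothetical.

```python
import math
from typing import Dict, List


def rerank_by_topics(
    query: List[str],
    initial_ranking: List[str],
    term_topic: Dict[str, List[float]],
    doc_topic: Dict[str, List[float]],
    epsilon: float = 1e-9,
) -> List[str]:
    """Re-rank documents by sum over query terms of
    log( sum_k P(term | topic k) * P(topic k | doc) ).

    term_topic[term][k] approximates P(term | topic k);
    doc_topic[doc][k] approximates P(topic k | doc).
    `epsilon` avoids log(0) for terms the topic model has not seen.
    """
    def score(doc: str) -> float:
        total = 0.0
        for term in query:
            probs = term_topic.get(term, [])
            mix = sum(p * w for p, w in zip(probs, doc_topic[doc]))
            total += math.log(mix + epsilon)
        return total

    # sorted() is stable, so ties keep their initial order.
    return sorted(initial_ranking, key=score, reverse=True)


if __name__ == "__main__":
    term_topic = {"jaguar": [0.5, 0.1], "cat": [0.4, 0.05]}
    doc_topic = {"d1": [0.9, 0.1], "d2": [0.2, 0.8]}
    print(rerank_by_topics(["jaguar", "cat"], ["d2", "d1"], term_topic, doc_topic))
```

A document dominated by the topic that explains the query terms (here `d1`) moves up, which mirrors the stated objective of promoting documents composed mainly of query-relevant topics.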

6.
Users are often faced with complex information needs that are not easily represented as a single query. With current technology, the burden of issuing the individual queries, analysing retrieved documents for relevance, and aggregating results falls upon the time-poor and informationally overloaded user. Aggregated search techniques represent a new generation of search applications that endeavour to help users perform these complex tasks. However, current aggregated search applications often combine different data types using static, hard-coded structures. We suggest that a useful alternative is to apply techniques from natural language generation, such as text planning and summarisation, to determine dynamically the best organisation of the retrieved information. These organisations can be motivated by linguistic theories that consider issues such as the role the information plays in facilitating a task and the relationships between different pieces of information. With reference to a discourse strategy, it is possible to draw on several data sources automatically to generate a useful, focused, and coherent answer. We focus on exploring the parallels between aggregated search and natural language generation in the hope that the fields can be mutually informed, leading to further advances in the way search technologies serve the user. These issues are discussed and presented with examples of existing systems across different domains.

7.
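The paper argues that, in a controlled web environment, users' subjective factors are important variables affecting information retrieval performance and play a vital role in the retrieval process. Canonical correlation analysis (CCA) is used to analyze the relationships between subjective factors and factors such as search behavior. The results show that: (1) perceived ease is significantly correlated with total task time, and users who judge a task to be difficult after searching spend more time and effort on it; (2) the number of documents marked, the rate of repeated browsing, and the number of web pages browsed are correlated with feeling and the degree of being lost; (3) happiness is negatively correlated with relevance and completeness, suggesting that users who are less cheerful before searching may obtain better retrieval results.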

8.
9.
Genetic Approach to Query Space Exploration   Total citations: 2, self-citations: 0, citations by others: 2
This paper describes a genetic algorithm approach to intelligent information retrieval. The goal is to find an optimal set of documents that best matches the user's needs by exploring and exploiting the document space. More precisely, we define a specific genetic algorithm for information retrieval that is based on knowledge-based operators and guided by a heuristic for solving the relevance multi-modality problem. Experiments with TREC-6 French data and queries show the effectiveness of our approach.

10.
A structured document retrieval (SDR) system aims to minimize the effort users spend to locate relevant information by retrieving parts of documents. To evaluate the range of SDR tasks, from element to passage to tree retrieval, numerous task-specific measures have been proposed. This has resulted in SDR evaluation measures that cannot easily be compared with each other or across tasks. In previous work, we defined the SDR task of tree retrieval, of which passage and element retrieval are special cases. In this paper, we look in greater detail into tree retrieval to identify the main components of SDR evaluation: relevance, navigation, and redundancy. Our goal is to evaluate SDR within a single probabilistic framework based on these components. This framework, called Extended Structural Relevance (ESR), calculates the user's expected gain in relevant information depending on whether it is seen via hits (relevant results retrieved), unseen via misses (relevant results not retrieved), or possibly seen via near-misses (relevant results accessed via navigation). We use these expectations as parameters to formulate evaluation measures for tree retrieval. We then demonstrate how existing task-specific measures, if viewed as tree retrieval, can be formulated, computed, and compared using our framework. Finally, we experimentally validate ESR across a range of SDR tasks.

11.
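An Analysis of the Factors in Information Users' Relevance Judgments in Information Retrieval   Total citations: 2, self-citations: 0, citations by others: 2
The paper explains that relevance is the core concept of information retrieval science and that the user view is the dominant view in relevance research. Studying relevance theory from the user's perspective and using an experimental method, it seeks to show that there is a core set of factors in users' relevance judgments that holds across user types, problem situations, and information source environments. On that basis it discusses how to improve retrieval precision and help users find the information they need promptly and accurately.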

12.
This paper presents a Graph Inference retrieval model that integrates structured knowledge resources, statistical information retrieval methods, and inference in a unified framework. Key components of the model are a graph-based representation of the corpus and retrieval driven by an inference mechanism implemented as a traversal over the graph. The model is proposed to tackle the semantic gap problem: the mismatch between the raw data and the way a human being interprets it. We break down the semantic gap problem into five core issues, each requiring a specific type of inference in order to be overcome. Our model and evaluation are applied to the medical domain because search within this domain is particularly challenging and, as we show, often requires inference. In addition, this domain features both structured knowledge resources and unstructured text. Our evaluation shows that inference can be effective, retrieving many new relevant documents that are not retrieved by state-of-the-art information retrieval models. We show that many retrieved documents were not pooled by keyword-based search methods, prompting us to perform additional relevance assessment on these new documents. A third of the newly retrieved documents that were judged were found to be relevant. Our analysis provides a thorough understanding of when and how to apply inference for retrieval, including a categorisation of queries according to the effect of inference. The inference mechanism promoted recall by retrieving new relevant documents not found by previous keyword-based approaches. In addition, it promoted precision through an effective reranking of documents. When inference is used, performance gains can generally be expected on hard queries. However, inference should not be applied universally: for easy, unambiguous queries and queries with few relevant documents, inference adversely affected effectiveness.
These conclusions reflect the fact that, for retrieval as inference to be effective, a careful balancing act is involved. Finally, although the Graph Inference model is developed for and applied to medical search, it is a general retrieval model applicable to other areas such as web search, where an emerging research trend is to utilise structured knowledge resources for more effective semantic search.

13.
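When searching massive amounts of information, information relevant to the user's query is often missed, while information irrelevant to the query (information garbage) appears in the results in large quantities. Improving the quality of text information retrieval systems and raising retrieval effectiveness has become a pressing problem. This paper addresses an easily overlooked factor that can affect retrieval effectiveness, the modifier, and studies its role in text information retrieval. To this end, a Modified Vector Space Model (MVSM) is constructed and tested on English text to illustrate the role of modifiers.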

14.
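Personalized Retrieval Based on User Interests   Total citations: 8, self-citations: 0, citations by others: 8
Most current retrieval tools are designed for all users and do not consider individual users' interests and preferences. This paper proposes a personalized retrieval method based on user interests. The method automatically learns from the user's query history, builds a user interest model, and uses it to infer the real intent behind the user's new queries. Experimental results show that the method is better suited to retrieval with keywords that span multiple categories and can improve retrieval precision.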

15.
Large-scale retrieval systems are often implemented as a cascading sequence of phases: a first filtering step, in which a large set of candidate documents is extracted using a simple technique such as Boolean matching and/or static document scores; and then one or more ranking steps, in which the pool of documents retrieved by the filter is scored more precisely using dozens or perhaps hundreds of different features. The documents returned to the user are then taken from the head of the final ranked list. Here we examine methods for measuring the quality of filtering and preliminary ranking stages, and show how to use these measurements to tune the overall performance of the system. Standard top-weighted metrics used for overall system evaluation are not appropriate for assessing filtering stages, since the output is a set of documents rather than an ordered sequence of documents. Instead, we use an approach in which a quality score is computed from the discrepancy between filtered and full evaluation. Unlike previous approaches, our methods do not require relevance judgments, and thus can be used with virtually any query set. We show that this quality score directly correlates with actual differences in measured effectiveness when relevance judgments are available. Since the quality score does not require relevance judgments, it can be used to identify queries that perform particularly poorly for a given filter. Using these methods, we explore a wide range of filtering options using thousands of queries, categorize the relative merits of the different approaches, and identify useful parameter combinations.
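The idea in entry 15 of scoring a filter by the discrepancy between filtered and full evaluation, without relevance judgments, can be sketched with a deliberately simple stand-in measure. The abstract does not specify the quality score, so the top-k overlap below is an assumption made for illustration only, and `filtered_ranking` and `topk_overlap` are hypothetical names.

```python
from typing import List, Set


def filtered_ranking(full_ranking: List[str], candidates: Set[str]) -> List[str]:
    """Ranking the final stage would produce if it only saw the filter's
    candidate set (assumes the ranker orders documents independently)."""
    return [doc for doc in full_ranking if doc in candidates]


def topk_overlap(full_ranking: List[str], candidates: Set[str], k: int) -> float:
    """Fraction of the full system's top-k that survives the filter.

    1.0 means the filter loses nothing at depth k; lower values mean the
    filter discards documents the full ranker would have returned.
    """
    reference = full_ranking[:k]
    if not reference:
        return 1.0
    kept = set(filtered_ranking(full_ranking, candidates)[:k])
    return sum(1 for doc in reference if doc in kept) / len(reference)


if __name__ == "__main__":
    full = ["a", "b", "c", "d", "e"]   # ranking with no filtering stage
    survivors = {"a", "c", "d", "e"}   # documents passed by the filter
    print(topk_overlap(full, survivors, k=3))
```

Because the score needs only the two rankings, it can be computed for any query log; queries with low overlap are the ones a given filter handles poorly.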

16.
We present a system for multilingual information retrieval that allows users to formulate queries in their preferred language and retrieve relevant information from a collection containing documents in multiple languages. The system is based on a process of document-level alignment, in which documents in different languages are paired according to their similarity. The resulting mapping allows us to produce a multilingual comparable corpus. Such a corpus has several interesting applications. It allows us to build a data structure for query translation in cross-language information retrieval (CLIR). We also perform pseudo-relevance feedback on the alignments to improve our retrieval results. Finally, multiple retrieval runs can be merged into one unified result list. The resulting system is inexpensive, adaptable to domain-specific collections and new languages, and performed very well in the CLIR system comparison at the TREC-7 conference.

17.
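The study surveys 82 publications from 1967 to 2013 related to research on user relevance judgment and selects 55 that are methodologically representative as the sample. Analysis shows that the sample shares many common, recurring views and practices in methodological thinking and in specific research methods. These can be organized into a reference framework whose core is three methodological principles: the situational dependence of relevance judgment, the primacy of cognitive factors, and conducting research in real-world settings. Vertically, the framework spans several levels, including methodological thinking, research strategy, and the design of specific research plans; horizontally, it covers key stages of research design such as sample selection, data collection, and the formulation of analysis strategies. The study holds that the cognitive viewpoint in information seeking and retrieval is the key driver of the framework's formation, development, and further evolution, and analyzes the framework's future development on that basis.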

18.
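The Development of Information Storage and Retrieval Technology in the Network Environment   Total citations: 7, self-citations: 0, citations by others: 7
Information storage and retrieval technology is an important link in information transfer. The retrieval language is closely tied to retrieval efficiency and provides linguistic support in the information retrieval process. To enable different users to retrieve the information they need, retrieval languages will inevitably develop toward natural language and user-friendly interfaces.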

19.
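Combining scarcity theory with user cognition theory makes it possible to redefine users' cognitive behavior in the information retrieval process, explain how that behavior is likely to develop, and improve the validity and rigor of information retrieval research. This paper analyzes the dynamic process by which a scarcity mindset influences user cognition and, on that basis, explains the underlying causes of users' latent information needs. Experimental results demonstrate that there is a semantic relationship between users' social network behavior and their information retrieval behavior. Extracting users' social network data provides a theoretical basis and a reference for studying users' personalized needs and supports the delivery of personalized information retrieval services.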

20.
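Using think-aloud protocols, observation, and interviews, the study collects data from five cases of collaborative information seeking and retrieval by groups of users in contexts of real information needs. Grounded theory is used to derive 17 main conceptual categories of collaborative information seeking and retrieval behavior, which are consolidated into 5 core categories. The five cases are compared at three levels (actor, situation, and behavior), and the collaborative information seeking and retrieval process is described. Based on an analysis of the conceptual categories of collaborative relevance judgment, the study identifies five main factors that influence groups' collaborative relevance judgments and constructs a relevance judgment model for collaborative information seeking and retrieval. The findings show that relevance criteria in collaborative seeking and retrieval are broadly similar to those in individual seeking and retrieval, that group communication is the biggest difference between the two, and that context plays an important role in forming group consensus.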
