共查询到20条相似文献,搜索用时 62 毫秒
1.
Coverage-based search result diversification 总被引:1,自引:0,他引:1
Traditional retrieval models may provide users with less satisfactory search experience because documents are scored independently and the top ranked documents often contain excessively redundant information. Intuitively, it is more desirable to diversify search results so that the top-ranked documents can cover different query subtopics, i.e., different pieces of relevant information. In this paper, we study the problem of search result diversification in an optimization framework whose objective is to maximize a coverage-based diversity function. We first define the diversity score of a set of search results through measuring the coverage of query subtopics in the result set, and then discuss how to use them to derive diversification methods. The key challenge here is how to define an appropriate coverage function given a query and a set of search results. To address this challenge, we propose and systematically study three different strategies to define coverage functions. They are based on summations, loss functions and evaluation measures respectively. Each of these coverage functions leads to a result diversification method. We show that the proposed coverage based diversification methods not only cover several state-of-the-art methods but also allows us to derive new ones. We compare these methods both analytically and empirically. Experiment results on two standard TREC collections show that all the methods are effective for diversification and the new methods can outperform existing ones. 相似文献
2.
Olivier Chapelle Shihao Ji Ciya Liao Emre Velipasaoglu Larry Lai Su-Lin Wu 《Information Retrieval》2011,14(6):572-592
We study the problem of web search result diversification in the case where intent based relevance scores are available. A
diversified search result will hopefully satisfy the information need of user-L.s who may have different intents. In this
context, we first analyze the properties of an intent-based metric, ERR-IA, to measure relevance and diversity altogether.
We argue that this is a better metric than some previously proposed intent aware metrics and show that it has a better correlation
with abandonment rate. We then propose an algorithm to rerank web search results based on optimizing an objective function
corresponding to this metric and evaluate it on shopping related queries. 相似文献
3.
User queries to the Web tend to have more than one interpretation due to their ambiguity and other characteristics. How to diversify the ranking results to meet users’ various potential information needs has attracted considerable attention recently. This paper is aimed at mining the subtopics of a query either indirectly from the returned results of retrieval systems or directly from the query itself to diversify the search results. For the indirect subtopic mining approach, clustering the retrieval results and summarizing the content of clusters is investigated. In addition, labeling topic categories and concept tags on each returned document is explored. For the direct subtopic mining approach, several external resources, such as Wikipedia, Open Directory Project, search query logs, and the related search services of search engines, are consulted. Furthermore, we propose a diversified retrieval model to rank documents with respect to the mined subtopics for balancing relevance and diversity. Experiments are conducted on the ClueWeb09 dataset with the topics of the TREC09 and TREC10 Web Track diversity tasks. Experimental results show that the proposed subtopic-based diversification algorithm significantly outperforms the state-of-the-art models in the TREC09 and TREC10 Web Track diversity tasks. The best performance our proposed algorithm achieves is α-nDCG@5 0.307, IA-P@5 0.121, and α#-nDCG@5 0.214 on the TREC09, as well as α-nDCG@10 0.421, IA-P@10 0.201, and α#-nDCG@10 0.311 on the TREC10. The results conclude that the subtopic mining technique with the up-to-date users’ search query logs is the most effective way to generate the subtopics of a query, and the proposed subtopic-based diversification algorithm can select the documents covering various subtopics. 相似文献
4.
基于自适应直觉模糊推理的查新站质量评价方法研究 总被引:1,自引:0,他引:1
5.
In this article we introduce a novel search paradigm for microblogging sites resulting from the intersection of Information Retrieval and Social Network Analysis (SNA). This approach is based on a formal model of on-line conversations and a set of ranking measures including SNA centrality metrics, time-related conversational metrics and other specific features of current microblogging sites. The ranking approach has been compared to other methods and tested on two well known social network sites (Twitter and Friendfeed) showing that the inclusion of SNA metrics in the ranking function and the usage of a model of conversation can improve the results of search tasks. 相似文献
6.
7.
网络资源是科技查新的重要资源之一,利用搜索引擎,可以弥补新产品科技查新中数据库的信息不足。本文通过典型查新案例分析,说明搜索引擎在新产品科技查新中的作用。 相似文献
8.
Oren Kurland 《Information Retrieval》2009,12(4):437-460
To obtain high precision at top ranks by a search performed in response to a query, researchers have proposed a cluster-based
re-ranking paradigm: clustering an initial list of documents that are the most highly ranked by some initial search, and using
information induced from these (often called) query-specific clusters for re-ranking the list. However, results concerning the effectiveness of various automatic cluster-based re-ranking methods have been inconclusive. We show that using query-specific clusters for automatic re-ranking
of top-retrieved documents is effective with several methods in which clusters play different roles, among which is the smoothing of document language models. We do so by adapting previously-proposed cluster-based retrieval approaches, which are based on (static) query-independent
clusters for ranking all documents in a corpus, to the re-ranking setting wherein clusters are query-specific. The best performing
method that we develop outperforms both the initial document-based ranking and some previously proposed cluster-based re-ranking
approaches; furthermore, this algorithm consistently outperforms a state-of-the-art pseudo-feedback-based approach. In further
exploration we study the performance of cluster-based smoothing methods for re-ranking with various (soft and hard) clustering
algorithms, and demonstrate the importance of clusters in providing context from the initial list through a comparison to
using single documents to this end.
相似文献
Oren KurlandEmail: |
9.
Search engine results are often biased towards a certain aspect of a query or towards a certain meaning for ambiguous query terms. Diversification of search results offers a way to supply the user with a better balanced result set increasing the probability that a user finds at least one document suiting her information need. In this paper, we present a reranking approach based on minimizing variance of Web search results to improve topic coverage in the top-k results. We investigate two different document representations as the basis for reranking. Smoothed language models and topic models derived by Latent Dirichlet?allocation. To evaluate our approach we selected 240 queries from Wikipedia disambiguation pages. This provides us with ambiguous queries together with a community generated balanced representation of their (sub)topics. For these queries we crawled two major commercial search engines. In addition, we present a new evaluation strategy based on Kullback-Leibler divergence and Wikipedia. We evaluate this method using the TREC sub-topic evaluation on the one hand, and manually annotated query results on the other hand. Our results show that minimizing variance in search results by reranking relevant pages significantly improves topic coverage in the top-k results with respect to Wikipedia, and gives a good overview of the overall search result. Moreover, latent topic models achieve competitive diversification with significantly less reranking. Finally, our evaluation reveals that our automatic evaluation strategy using Kullback-Leibler divergence correlates well with α-nDCG scores used in manual evaluation efforts. 相似文献
10.
Ranking information resources is a task that usually happens within more complex workflows and that typically occurs in any form of information retrieval, being commonly implemented by Web search engines. By filtering and rating data, ranking strategies guide the navigation of users when exploring large volumes of information items. There exist a considerable number of ranking algorithms that follow different approaches focusing on different aspects of the complex nature of the problem, and reflecting the variety of strategies that are possible to apply. With the growth of the web of linked data, a new problem space for ranking algorithms has emerged, as the nature of the information items to be ranked is very different from the case of Web pages. As a consequence, existing ranking algorithms have been adapted to the case of Linked Data and some specific strategies have started to be proposed and implemented. Researchers and organizations deploying Linked Data solutions thus require an understanding of the applicability, characteristics and state of evaluation of ranking strategies and algorithms as applied to Linked Data. We present a classification that formalizes and contextualizes under a common terminology the problem of ranking Linked Data. In addition, an analysis and contrast of the similarities, differences and applicability of the different approaches is provided. We aim this work to be useful when comparing different approaches to ranking Linked Data and when implementing new algorithms. 相似文献
11.
《Journal of Informetrics》2014,8(2):318-328
Citation based approaches, such as the impact factor and h-index, have been used to measure the influence or impact of journals for journal rankings. A survey of the related literature for different disciplines shows that the level of correlation between these citation based approaches is domain dependent. We analyze the correlation between the impact factors and h-indices of the top ranked computer science journals for five different subjects. Our results show that the correlation between these citation based approaches is very low. Since using a different approach can result in different journal rankings, we further combine the different results and then re-rank the journals using a combination method. These new ranking results can be used as a reference for researchers to choose their publication outlets. 相似文献
12.
Search result diversification aims to diversify search results to cover different query subtopics, i.e., pieces of relevant information. The state of the art diversification methods often explicitly model the diversity based on query subtopics, and their performance is closely related to the quality of subtopics. Most existing studies extracted query subtopics only from the unstructured data such as document collections. However, there exists a huge amount of information from structured data, which complements the information from the unstructured data. The structured data can provide valuable information about domain knowledge, but is currently under-utilized. In this article, we study how to leverage the integrated information from both structured and unstructured data to extract high quality subtopics for search result diversification. We first discuss how to extract subtopics from structured data. We then propose three methods to integrate structured and unstructured data. Specifically, the first method uses the structured data to guide the subtopic extraction from unstructured data, the second one uses the unstructured data to guide the extraction, and the last one first extracts the subtopics separately from two data sources and then combines those subtopics. Experimental results in both Enterprise and Web search domains show that the proposed methods are effective in extracting high quality subtopics from the integrated information, which can lead to better diversification performance. 相似文献
13.
从科技查新服务的实践出发,探讨客户关系管理在查新服务管理中的应用。通过总结目前科技查新服务的
创新模式,论述基于客户关系管理的查新服务的特点和优势。认为科技查新服务中实施客户关系管理的目标是转变科
技查新被动服务的局面,挖掘价值用户以提供增值服务,进而提升科技查新服务价值。本文从客户分析、客户满意度
与忠诚度管理、客户关系保持与发展以及价值客户的发现四个方面探讨了服务模式构建的思路,并以中科院文献情报
中心科技查新服务为案例,讨论服务实践及效果。 相似文献
14.
15.
Relevance feedback is an effective technique for improving search accuracy in interactive information retrieval. In this paper, we study an interesting optimization problem in interactive feedback that aims at optimizing the tradeoff between presenting search results with the highest immediate utility to a user (but not necessarily most useful for collecting feedback information) and presenting search results with the best potential for collecting useful feedback information (but not necessarily the most useful documents from a user’s perspective). Optimizing such an exploration–exploitation tradeoff is key to the optimization of the overall utility of relevance feedback to a user in the entire session of relevance feedback. We formally frame this tradeoff as a problem of optimizing the diversification of search results since relevance judgments on more diversified results have been shown to be more useful for relevance feedback. We propose a machine learning approach to adaptively optimizing the diversification of search results for each query so as to optimize the overall utility in an entire session. Experiment results on three representative retrieval test collections show that the proposed learning approach can effectively optimize the exploration–exploitation tradeoff and outperforms the traditional relevance feedback approach which only does exploitation without exploration. 相似文献
16.
充分发挥高校科技查新咨询的情报评价作用 总被引:7,自引:0,他引:7
科技查新是为辅助科技评估和同行专家评议提供客观文献信息依据的一项新型的科技信息咨询服务业务.本文在分析科技查新咨询中以科技查新报告进行定性情报评价、以查新课题相关论文进行定量情报评价的基础上,建议高校拓展定题情报服务、专题情报服务、竞争情报服务等多层次系列化情报增值服务,以充分发挥高校科技查新咨询的情报评价作用.参考文献8. 相似文献
17.
Query suggestions have become pervasive in modern web search, as a mechanism to guide users towards a better representation of their information need. In this article, we propose a ranking approach for producing effective query suggestions. In particular, we devise a structured representation of candidate suggestions mined from a query log that leverages evidence from other queries with a common session or a common click. This enriched representation not only helps overcome data sparsity for long-tail queries, but also leads to multiple ranking criteria, which we integrate as features for learning to rank query suggestions. To validate our approach, we build upon existing efforts for web search evaluation and propose a novel framework for the quantitative assessment of query suggestion effectiveness. Thorough experiments using publicly available data from the TREC Web track show that our approach provides effective suggestions for adhoc and diversity search. 相似文献
18.
19.
基于关键字匹配的搜索引擎排序网页时仅仅考虑评价网页的重要性,而忽视分类;基于分类目录的搜索引擎很难动态分析Web信息。本文在分析它们不足的前提下,提出利用模糊聚类的方法对搜索引擎的检索结果进行动态分类,依据超链分析算法PageRank和Web文档隶属度相结合进行分类排序,并给出具有调节值的结合公式。实验证明,该算法能够更有效地满足用户的需要,提高检索效率。 相似文献
20.
本文以PageRank算法和HITS算法为例,对基于超链接分析技术的搜索引擎排序算法进行分析,并总结了超链接分析技术应用于搜索引擎结果排序的局限性。 相似文献