首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 62 毫秒
1.
Coverage-based search result diversification   总被引:1,自引:0,他引:1  
Traditional retrieval models may provide users with less satisfactory search experience because documents are scored independently and the top ranked documents often contain excessively redundant information. Intuitively, it is more desirable to diversify search results so that the top-ranked documents can cover different query subtopics, i.e., different pieces of relevant information. In this paper, we study the problem of search result diversification in an optimization framework whose objective is to maximize a coverage-based diversity function. We first define the diversity score of a set of search results through measuring the coverage of query subtopics in the result set, and then discuss how to use them to derive diversification methods. The key challenge here is how to define an appropriate coverage function given a query and a set of search results. To address this challenge, we propose and systematically study three different strategies to define coverage functions. They are based on summations, loss functions and evaluation measures respectively. Each of these coverage functions leads to a result diversification method. We show that the proposed coverage based diversification methods not only cover several state-of-the-art methods but also allows us to derive new ones. We compare these methods both analytically and empirically. Experiment results on two standard TREC collections show that all the methods are effective for diversification and the new methods can outperform existing ones.  相似文献   

2.
We study the problem of web search result diversification in the case where intent based relevance scores are available. A diversified search result will hopefully satisfy the information need of user-L.s who may have different intents. In this context, we first analyze the properties of an intent-based metric, ERR-IA, to measure relevance and diversity altogether. We argue that this is a better metric than some previously proposed intent aware metrics and show that it has a better correlation with abandonment rate. We then propose an algorithm to rerank web search results based on optimizing an objective function corresponding to this metric and evaluate it on shopping related queries.  相似文献   

3.
User queries to the Web tend to have more than one interpretation due to their ambiguity and other characteristics. How to diversify the ranking results to meet users’ various potential information needs has attracted considerable attention recently. This paper is aimed at mining the subtopics of a query either indirectly from the returned results of retrieval systems or directly from the query itself to diversify the search results. For the indirect subtopic mining approach, clustering the retrieval results and summarizing the content of clusters is investigated. In addition, labeling topic categories and concept tags on each returned document is explored. For the direct subtopic mining approach, several external resources, such as Wikipedia, Open Directory Project, search query logs, and the related search services of search engines, are consulted. Furthermore, we propose a diversified retrieval model to rank documents with respect to the mined subtopics for balancing relevance and diversity. Experiments are conducted on the ClueWeb09 dataset with the topics of the TREC09 and TREC10 Web Track diversity tasks. Experimental results show that the proposed subtopic-based diversification algorithm significantly outperforms the state-of-the-art models in the TREC09 and TREC10 Web Track diversity tasks. The best performance our proposed algorithm achieves is α-nDCG@5 0.307, IA-P@5 0.121, and α#-nDCG@5 0.214 on the TREC09, as well as α-nDCG@10 0.421, IA-P@10 0.201, and α#-nDCG@10 0.311 on the TREC10. The results conclude that the subtopic mining technique with the up-to-date users’ search query logs is the most effective way to generate the subtopics of a query, and the proposed subtopic-based diversification algorithm can select the documents covering various subtopics.  相似文献   

4.
基于自适应直觉模糊推理的查新站质量评价方法研究   总被引:1,自引:0,他引:1  
夏蕾 《图书情报工作》2010,54(18):40-43
基于用户为中心原则,构建查新站质量评价的指标体系,将直觉模糊集理论引入查新站质量评价领域,提出基于自适应神经直觉模糊推理系统的查新站评价方法。首先,采用减法聚类确定自适应直觉模糊神经网络的结构;然后利用混合学习算法训练该网络的前件参数和结论参数;最后,通过采集数据对模型进行训练和测试,结果表明采用该方法所建立的评价模型是有效的。  相似文献   

5.
In this article we introduce a novel search paradigm for microblogging sites resulting from the intersection of Information Retrieval and Social Network Analysis (SNA). This approach is based on a formal model of on-line conversations and a set of ranking measures including SNA centrality metrics, time-related conversational metrics and other specific features of current microblogging sites. The ranking approach has been compared to other methods and tested on two well known social network sites (Twitter and Friendfeed) showing that the inclusion of SNA metrics in the ranking function and the usage of a model of conversation can improve the results of search tasks.  相似文献   

6.
基于DIT理论建模的付费搜索排名质量测度研究   总被引:1,自引:0,他引:1  
针对付费搜索对搜索引擎结果排名的影响,建立以“顺序浏览”为特征的用户搜索模型,通过将模型预测结果与实际数据分布的比较对模型进行检验,进一步定义以信息代价为衡量标准的排名质量定量测度方法,并设计一个数值实验评价两种不同排名机制下付费搜索对排名质量的影响,指出付费排名相对自然排名给用户所增加的搜索代价比与SERP结果质量差异成正相关关系,但在有效竞价排名下该比例会有所减小。  相似文献   

7.
网络资源是科技查新的重要资源之一,利用搜索引擎,可以弥补新产品科技查新中数据库的信息不足。本文通过典型查新案例分析,说明搜索引擎在新产品科技查新中的作用。  相似文献   

8.
To obtain high precision at top ranks by a search performed in response to a query, researchers have proposed a cluster-based re-ranking paradigm: clustering an initial list of documents that are the most highly ranked by some initial search, and using information induced from these (often called) query-specific clusters for re-ranking the list. However, results concerning the effectiveness of various automatic cluster-based re-ranking methods have been inconclusive. We show that using query-specific clusters for automatic re-ranking of top-retrieved documents is effective with several methods in which clusters play different roles, among which is the smoothing of document language models. We do so by adapting previously-proposed cluster-based retrieval approaches, which are based on (static) query-independent clusters for ranking all documents in a corpus, to the re-ranking setting wherein clusters are query-specific. The best performing method that we develop outperforms both the initial document-based ranking and some previously proposed cluster-based re-ranking approaches; furthermore, this algorithm consistently outperforms a state-of-the-art pseudo-feedback-based approach. In further exploration we study the performance of cluster-based smoothing methods for re-ranking with various (soft and hard) clustering algorithms, and demonstrate the importance of clusters in providing context from the initial list through a comparison to using single documents to this end.
Oren KurlandEmail:
  相似文献   

9.
Search engine results are often biased towards a certain aspect of a query or towards a certain meaning for ambiguous query terms. Diversification of search results offers a way to supply the user with a better balanced result set increasing the probability that a user finds at least one document suiting her information need. In this paper, we present a reranking approach based on minimizing variance of Web search results to improve topic coverage in the top-k results. We investigate two different document representations as the basis for reranking. Smoothed language models and topic models derived by Latent Dirichlet?allocation. To evaluate our approach we selected 240 queries from Wikipedia disambiguation pages. This provides us with ambiguous queries together with a community generated balanced representation of their (sub)topics. For these queries we crawled two major commercial search engines. In addition, we present a new evaluation strategy based on Kullback-Leibler divergence and Wikipedia. We evaluate this method using the TREC sub-topic evaluation on the one hand, and manually annotated query results on the other hand. Our results show that minimizing variance in search results by reranking relevant pages significantly improves topic coverage in the top-k results with respect to Wikipedia, and gives a good overview of the overall search result. Moreover, latent topic models achieve competitive diversification with significantly less reranking. Finally, our evaluation reveals that our automatic evaluation strategy using Kullback-Leibler divergence correlates well with α-nDCG scores used in manual evaluation efforts.  相似文献   

10.
Ranking information resources is a task that usually happens within more complex workflows and that typically occurs in any form of information retrieval, being commonly implemented by Web search engines. By filtering and rating data, ranking strategies guide the navigation of users when exploring large volumes of information items. There exist a considerable number of ranking algorithms that follow different approaches focusing on different aspects of the complex nature of the problem, and reflecting the variety of strategies that are possible to apply. With the growth of the web of linked data, a new problem space for ranking algorithms has emerged, as the nature of the information items to be ranked is very different from the case of Web pages. As a consequence, existing ranking algorithms have been adapted to the case of Linked Data and some specific strategies have started to be proposed and implemented. Researchers and organizations deploying Linked Data solutions thus require an understanding of the applicability, characteristics and state of evaluation of ranking strategies and algorithms as applied to Linked Data. We present a classification that formalizes and contextualizes under a common terminology the problem of ranking Linked Data. In addition, an analysis and contrast of the similarities, differences and applicability of the different approaches is provided. We aim this work to be useful when comparing different approaches to ranking Linked Data and when implementing new algorithms.  相似文献   

11.
Citation based approaches, such as the impact factor and h-index, have been used to measure the influence or impact of journals for journal rankings. A survey of the related literature for different disciplines shows that the level of correlation between these citation based approaches is domain dependent. We analyze the correlation between the impact factors and h-indices of the top ranked computer science journals for five different subjects. Our results show that the correlation between these citation based approaches is very low. Since using a different approach can result in different journal rankings, we further combine the different results and then re-rank the journals using a combination method. These new ranking results can be used as a reference for researchers to choose their publication outlets.  相似文献   

12.
Search result diversification aims to diversify search results to cover different query subtopics, i.e., pieces of relevant information. The state of the art diversification methods often explicitly model the diversity based on query subtopics, and their performance is closely related to the quality of subtopics. Most existing studies extracted query subtopics only from the unstructured data such as document collections. However, there exists a huge amount of information from structured data, which complements the information from the unstructured data. The structured data can provide valuable information about domain knowledge, but is currently under-utilized. In this article, we study how to leverage the integrated information from both structured and unstructured data to extract high quality subtopics for search result diversification. We first discuss how to extract subtopics from structured data. We then propose three methods to integrate structured and unstructured data. Specifically, the first method uses the structured data to guide the subtopic extraction from unstructured data, the second one uses the unstructured data to guide the extraction, and the last one first extracts the subtopics separately from two data sources and then combines those subtopics. Experimental results in both Enterprise and Web search domains show that the proposed methods are effective in extracting high quality subtopics from the integrated information, which can lead to better diversification performance.  相似文献   

13.
从科技查新服务的实践出发,探讨客户关系管理在查新服务管理中的应用。通过总结目前科技查新服务的 创新模式,论述基于客户关系管理的查新服务的特点和优势。认为科技查新服务中实施客户关系管理的目标是转变科 技查新被动服务的局面,挖掘价值用户以提供增值服务,进而提升科技查新服务价值。本文从客户分析、客户满意度 与忠诚度管理、客户关系保持与发展以及价值客户的发现四个方面探讨了服务模式构建的思路,并以中科院文献情报 中心科技查新服务为案例,讨论服务实践及效果。  相似文献   

14.
通过实证方法分析消费者感知搜索成本降低的影响因素,并在双源渠道环境下初步探讨消费者的信息搭便车行为与搜索成本之间的关系。研究结果表明,搜索便利性、卖方努力、个人产品知识及渠道多样化会对消费者感知搜索成本降低产生正的影响;同时,搜索成本的降低能够促成消费者的信息搭便车行为。  相似文献   

15.
Relevance feedback is an effective technique for improving search accuracy in interactive information retrieval. In this paper, we study an interesting optimization problem in interactive feedback that aims at optimizing the tradeoff between presenting search results with the highest immediate utility to a user (but not necessarily most useful for collecting feedback information) and presenting search results with the best potential for collecting useful feedback information (but not necessarily the most useful documents from a user’s perspective). Optimizing such an exploration–exploitation tradeoff is key to the optimization of the overall utility of relevance feedback to a user in the entire session of relevance feedback. We formally frame this tradeoff as a problem of optimizing the diversification of search results since relevance judgments on more diversified results have been shown to be more useful for relevance feedback. We propose a machine learning approach to adaptively optimizing the diversification of search results for each query so as to optimize the overall utility in an entire session. Experiment results on three representative retrieval test collections show that the proposed learning approach can effectively optimize the exploration–exploitation tradeoff and outperforms the traditional relevance feedback approach which only does exploitation without exploration.  相似文献   

16.
充分发挥高校科技查新咨询的情报评价作用   总被引:7,自引:0,他引:7  
科技查新是为辅助科技评估和同行专家评议提供客观文献信息依据的一项新型的科技信息咨询服务业务.本文在分析科技查新咨询中以科技查新报告进行定性情报评价、以查新课题相关论文进行定量情报评价的基础上,建议高校拓展定题情报服务、专题情报服务、竞争情报服务等多层次系列化情报增值服务,以充分发挥高校科技查新咨询的情报评价作用.参考文献8.  相似文献   

17.
Query suggestions have become pervasive in modern web search, as a mechanism to guide users towards a better representation of their information need. In this article, we propose a ranking approach for producing effective query suggestions. In particular, we devise a structured representation of candidate suggestions mined from a query log that leverages evidence from other queries with a common session or a common click. This enriched representation not only helps overcome data sparsity for long-tail queries, but also leads to multiple ranking criteria, which we integrate as features for learning to rank query suggestions. To validate our approach, we build upon existing efforts for web search evaluation and propose a novel framework for the quantitative assessment of query suggestion effectiveness. Thorough experiments using publicly available data from the TREC Web track show that our approach provides effective suggestions for adhoc and diversity search.  相似文献   

18.
高校社科查新体系构建探讨   总被引:2,自引:1,他引:1  
介绍高校社科查新体系构建的必要性,分析社科查新体系与现有的科技查新体系之间的区别,指出高校社科查新的现存问题,就高校社科查新体系的构建提出五点建议:一、发挥政府导向功能,推动社科查新体系建设;二、借鉴科技查新经验,健全社科查新行业规范;三、自建社科成果数据库,加强社科文献资源建设;四、配备高级查新人才,建立专业社科查新队伍;五、加大宣传和教育力度,提高社科研究者查新意识。  相似文献   

19.
严海兵  崔志明 《情报学报》2007,26(3):361-365
基于关键字匹配的搜索引擎排序网页时仅仅考虑评价网页的重要性,而忽视分类;基于分类目录的搜索引擎很难动态分析Web信息。本文在分析它们不足的前提下,提出利用模糊聚类的方法对搜索引擎的检索结果进行动态分类,依据超链分析算法PageRank和Web文档隶属度相结合进行分类排序,并给出具有调节值的结合公式。实验证明,该算法能够更有效地满足用户的需要,提高检索效率。  相似文献   

20.
本文以PageRank算法和HITS算法为例,对基于超链接分析技术的搜索引擎排序算法进行分析,并总结了超链接分析技术应用于搜索引擎结果排序的局限性。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号