首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 29 毫秒
1.
The goal of this article is to understand the reasons why known-item search queries entered in a discovery system return zero hits. We analyze a sample of 708 known-item queries and classify them into four categories of zero hits with regard to whether the item is held by the library and whether the query is formulated correctly: (1) item in stock, but query incorrect, (2) item not in stock, (3) item in stock, but incomplete or erroneous metadata, (4) query is ambiguous or not understandable. The main reasons for zero hits are caused by acquisition and erroneous search queries. We discuss possible solutions for known-item queries resulting in zero hits from the side of the system and show that 30% of zero hits could easily be avoided by applying automatic spelling correction. We argue that libraries can improve their discovery systems or online catalogs by applying strategies to avoid or cope with zero hits inspired by web search engines and commercial search web sites.  相似文献   

2.
3.
为减少元搜索引擎中无效成员搜索引擎返回的大量重复冗余信息、减轻后期结果处理的负担、提高系统的查准率,文章提出一种基于奖励机制的成员搜索引擎调度策略。该策略引入Agent技术,将每个成员搜索引擎Agent对查询的重要程度进行量化管理,选择检索性能最佳的若干成员搜索引擎进行调度。实验结果证明,这种基于奖励机制的成员搜索引擎调度策略在提高查准率、缩短查询时间、减轻元搜索引擎后期的结果处理负担方面,都优于传统的成员搜索引擎调度策略。  相似文献   

4.
This paper reports findings from an analysis of medical or health queries to different web search engines. We report results: (i). comparing samples of 10000 web queries taken randomly from 1.2 million query logs from the AlltheWeb.com and Excite.com commercial web search engines in 2001 for medical or health queries, (ii). comparing the 2001 findings from Excite and AlltheWeb.com users with results from a previous analysis of medical and health related queries from the Excite Web search engine for 1997 and 1999, and (iii). medical or health advice-seeking queries beginning with the word 'should'. Findings suggest: (i). a small percentage of web queries are medical or health related, (ii). the top five categories of medical or health queries were: general health, weight issues, reproductive health and puberty, pregnancy/obstetrics, and human relationships, and (iii). over time, the medical and health queries may have declined as a proportion of all web queries, as the use of specialized medical/health websites and e-commerce-related queries has increased. Findings provide insights into medical and health-related web querying and suggests some implications for the use of the general web search engines when seeking medical/health information.  相似文献   

5.
中文搜索引擎用户检索式特征探析   总被引:2,自引:0,他引:2  
马寒  冯锦玲 《情报学报》2005,24(6):718-722
这项研究采集了百度、一搜、中搜和搜狗四家中文搜索引擎的七千余项检索式,分别从词汇出现频次、词汇量、类别等方面分析了中文搜索引擎用户的检索行为特征,对开展用户教育和搜索服务设计都有一定的实用价值。  相似文献   

6.
Search engine results are often biased towards a certain aspect of a query or towards a certain meaning for ambiguous query terms. Diversification of search results offers a way to supply the user with a better balanced result set increasing the probability that a user finds at least one document suiting her information need. In this paper, we present a reranking approach based on minimizing variance of Web search results to improve topic coverage in the top-k results. We investigate two different document representations as the basis for reranking. Smoothed language models and topic models derived by Latent Dirichlet?allocation. To evaluate our approach we selected 240 queries from Wikipedia disambiguation pages. This provides us with ambiguous queries together with a community generated balanced representation of their (sub)topics. For these queries we crawled two major commercial search engines. In addition, we present a new evaluation strategy based on Kullback-Leibler divergence and Wikipedia. We evaluate this method using the TREC sub-topic evaluation on the one hand, and manually annotated query results on the other hand. Our results show that minimizing variance in search results by reranking relevant pages significantly improves topic coverage in the top-k results with respect to Wikipedia, and gives a good overview of the overall search result. Moreover, latent topic models achieve competitive diversification with significantly less reranking. Finally, our evaluation reveals that our automatic evaluation strategy using Kullback-Leibler divergence correlates well with α-nDCG scores used in manual evaluation efforts.  相似文献   

7.
王若佳  李培 《图书情报工作》2015,59(11):111-118
[目的/意义] 针对当前我国网络用户的健康信息检索行为, 探索利用中文搜索引擎的健康信息检索规律, 为完善健康搜索引擎和网站建设提供参考。[方法/过程] 基于搜狗搜索引擎的大规模查询日志, 采用日志挖掘的方法, 从查询行为和点击行为两个角度对网络用户的健康信息检索行为进行研究。查询行为的研究指标包括会话层(会话长度、用户重复查询), 查询串层(查询串长度、重复查询)和词项层(高频词汇, 主题分类);点击行为的研究指标为点击位置和点击内容。[结果/结论] 健康相关查询的重复率较高, 提示相关网站可缓存高重复率查询串的返回结果;大众关注的热点领域为疾病、保健、母婴、医疗机构与美容整形, 提示网站的导航设计注意导航方向;用户更偏爱使用问答型平台, 提示网站设计者应更加关注与用户间问答型的互动模式。  相似文献   

8.
Query suggestion, which enables the user to revise a query with a single click, has become one of the most fundamental features of Web search engines. However, it has not been clear what circumstances cause the user to turn to query suggestion. In order to investigate when and how the user uses query suggestion, we analyzed three kinds of data sets obtained from a major commercial Web search engine, comprising approximately 126 million unique queries, 876 million query suggestions and 306 million action patterns of users. Our analysis shows that query suggestions are often used (1) when the original query is a rare query, (2) when the original query is a single-term query, (3) when query suggestions are unambiguous, (4) when query suggestions are generalizations or error corrections of the original query, and (5) after the user has clicked on several URLs in the first search result page. Our results suggest that search engines should provide better assistance especially when rare or single-term queries are input, and that they should dynamically provide query suggestions according to the searcher’s current state.  相似文献   

9.
This study examined ten, selected word pairs, each containing a word's full spelling and its abbreviation, to determine which form search engine users preferred in searching. Using seven search logs gathered from several Internet search engines with approximately 608 MB of data, the study measured the occurrences of the twenty terms. The selected words are important in library cataloging, for some are prescribed abbreviations in metadata content standards. The study found that in eight of the ten word pairs users preferred to search words’ full spellings over the abbreviations, often by a high margin.  相似文献   

10.
中文搜索引擎的搜索结果重合率研究   总被引:1,自引:0,他引:1  
本文的研究目的是测试主流中文搜索引擎搜索结果之间的重合程度和差异程度.利用一个具有11 171条来自真实用户的提问样本集对百度、谷歌和中国雅虎进行实际测试,发现中文搜索引擎搜索结果之间的差异很大,重合率很低.在全部的第一页搜索结果中,三个引擎中任何一个引擎独有的搜索结果总数占89.34%,任何两个引擎之间重合的搜索结果总数占8.11%,三个引擎重合的搜索结果数量占2.54%.三个引擎前两页搜索结果的重合比例更低.通过和已有的英文搜索引擎重合率测试数据相比较,发现中英文搜索引擎的搜索结果重合率都很低,且很相近.  相似文献   

11.
Query recommendation has long been considered a key feature of search engines, which can improve users’ search experience by providing useful query suggestions for their search tasks. Most existing approaches on query recommendation aim to recommend relevant queries, i.e., alternative queries similar to a user’s initial query. However, the ultimate goal of query recommendation is to assist users to reformulate queries so that they can accomplish their search task successfully and quickly. Only considering relevance in query recommendation is apparently not directly toward this goal. In this paper, we argue that it is more important to directly recommend queries with high utility, i.e., queries that can better satisfy users’ information needs. For this purpose, we attempt to infer query utility from users’ sequential search behaviors recorded in their search sessions. Specifically, we propose a dynamic Bayesian network, referred as Query Utility Model (QUM), to capture query utility by simultaneously modeling users’ reformulation and click behaviors. We then recommend queries with high utility to help users better accomplish their search tasks. We empirically evaluated the performance of our approach on a publicly released query log by comparing with the state-of-the-art methods. The experimental results show that, by recommending high utility queries, our approach is far more effective in helping users find relevant search results and thus satisfying their information needs.  相似文献   

12.
Transaction logs of NAVER, a major Korean Web search engine, were analyzed to track the information-seeking behavior of Korean Web users. These transaction logs include more than 40 million queries collected over 1 week. This study examines current transaction log analysis methodologies and proposes a method for log cleaning, session definition, and query classification. A term definition method which is necessary for Korean transaction log analysis is also discussed. The results of this study show that users behave in a simple way: they type in short queries with a few query terms, seldom use advanced features, and view few results' pages. Users also behave in a passive way: they seldom change search environments set by the system. It is of interest that users tend to change their queries totally rather than adding or deleting terms to modify the previous queries. The results of this study might contribute to the development of more efficient and effective Web search engines and services.  相似文献   

13.
When searching for health information, results quality can be judged against available scientific evidence: Do search engines return advice consistent with evidence based medicine? We compared the performance of domain-specific health and depression search engines against a general-purpose engine (Google) on both relevance of results and quality of advice. Over 101 queries, to which the term ‘depression’ was added if not already present, Google returned more relevant results than those of the domain-specific engines. However, over the 50 treatment-related queries, Google returned 70 pages recommending for or against a well studied treatment, of which 19 strongly disagreed with the scientific evidence. A domain-specific index of 4 sites selected by domain experts was only wrong in 5 of 50 recommendations. Analysis suggests a tension between relevance and quality. Indexing more pages can give a greater number of relevant results, but selective inclusion can give better quality.  相似文献   

14.
针对当前检索中存在的关键词拼写错误影响检索结果的现状,进行OPAC拼写检查功能的应用现状、需求分析、设计思路及其实现方法的研究。利用数据挖掘、ASP和数据库相关技术对汉词、拼音及英语单词进行收集、编辑、存储、查询、纠错及更新,实现拼写检查功能,给出纠错提示与检索建议,以增强OPAC的可用性、提升用户的使用体验。  相似文献   

15.
搜索引擎的最新进展述要   总被引:1,自引:0,他引:1  
搜索引擎已成为人们利用网络的最重要工具.目前,网络上出现了一些具有新颖性、创新性的搜索引擎或挖掘出搜索引擎的新功能,其中有些研究成果直接代表着搜索引擎的发展方向.文章通过跟踪、试用、分析等环节,对新的搜索引擎或具有新功能的搜索引擎进行了归纳,便于人们更好地了解当前世界搜索引擎的现状.  相似文献   

16.
The critical task of predicting clicks on search advertisements is typically addressed by learning from historical click data. When enough history is observed for a given query-ad pair, future clicks can be accurately modeled. However, based on the empirical distribution of queries, sufficient historical information is unavailable for many query-ad pairs. The sparsity of data for new and rare queries makes it difficult to accurately estimate clicks for a significant portion of typical search engine traffic. In this paper we provide analysis to motivate modeling approaches that can reduce the sparsity of the large space of user search queries. We then propose methods to improve click and relevance models for sponsored search by mining click behavior for partial user queries. We aggregate click history for individual query words, as well as for phrases extracted with a CRF model. The new models show significant improvement in clicks and revenue compared to state-of-the-art baselines trained on several months of query logs. Results are reported on live traffic of a commercial search engine, in addition to results from offline evaluation.  相似文献   

17.
This study investigates the information seeking behavior of general Korean Web users. The data from transaction logs of selected dates from August 2006 to August 2007 were used to examine characteristics of Web queries and to analyze click logs that consist of a collection of documents that users clicked and viewed for each query. Changes in search topics are explored for NAVER users from 2003/2004 to 2006/2007. Patterns involving spelling errors and queries in foreign languages are also investigated. Search behaviors of Korean Web users are compared to those of the United States and other countries. The results show that entertainment is the topranked category, followed by shopping, education, games, and computer/Internet. Search topics changed from computer/Internet to entertainment and shopping from 2003/2004 to 2006/2007 in Korea. The ratios of both spelling errors and queries in foreign languages are low. This study reveals differences for search topics among different regions of the world. The results suggest that the analysis of click logs allows for the reduction of unknown or unidentifiable queries by providing actual data on user behaviors and their probable underlying information needs. The implications for system designers and Web content providers are discussed.  相似文献   

18.
Query languages for XML such as XPath or XQuery support Boolean retrieval: a query result is a (possibly restructured) subset of XML elements or entire documents that satisfy the search conditions of the query. This search paradigm works for highly schematic XML data collections such as electronic catalogs. However, for searching information in open environments such as the Web or intranets of large corporations, ranked retrieval is more appropriate: a query result is a ranked list of XML elements in descending order of (estimated) relevance. Web search engines, which are based on the ranked retrieval paradigm, do, however, not consider the additional information and rich annotations provided by the structure of XML documents and their element names.This article presents the XXL search engine that supports relevance ranking on XML data. XXL is particularly geared for path queries with wildcards that can span multiple XML collections and contain both exact-match as well as semantic-similarity search conditions. In addition, ontological information and suitable index structures are used to improve the search efficiency and effectiveness. XXL is fully implemented as a suite of Java classes and servlets. Experiments in the context of the INEX benchmark demonstrate the efficiency of the XXL search engine and underline its effectiveness for ranked retrieval.  相似文献   

19.
Users often issue all kinds of queries to look for the same target due to the intrinsic ambiguity and flexibility of natural languages. Some previous work clusters queries based on co-clicks; however, the intents of queries in one cluster are not that similar but roughly related. It is desirable to conduct automatic mining of queries with equivalent intents from a large scale search logs. In this paper, we take account of similarities between query strings. There are two issues associated with such similarities: it is too costly to compare any pair of queries in large scale search logs, and two queries with a similar formulation, such as “SVN” (Apache Subversion) and support vector machine (SVM), are not necessarily similar in their intents. To address these issues, we propose using the similarities of query strings above the co-click based clustering results. Our method improves precision over the co-click based clustering method (lifting precision from 0.37 to 0.62), and outperforms a commercial search engine’s query alteration (lifting \(F_1\) measure from 0.42 to 0.56). As an application, we consider web document retrieval. We aggregate similar queries’ click-throughs with the query’s click-throughs and evaluate them on a large scale dataset. Experimental results indicate that our proposed method significantly outperforms the baseline method of using a query’s own click-throughs in all metrics.  相似文献   

20.
搜索引擎中Robot搜索算法的优化   总被引:15,自引:0,他引:15  
目前的搜索引擎越来越暴露出不足之处 ,当用户使用搜索引擎时输入特定关键词之后 ,返回的查询结果往往有数千甚至几百万之多 ,而且其中包含大量的重复信息与垃圾信息 ,用户从中筛选出自己感兴趣的网页仍然需要耗费很长的时间。另外一种情况就是 ,Web上明明存在某些重要网页 ,却没有被搜索引擎的robot发现。本文针对这种现象 ,重点讨论搜索引擎中的搜索策略 ,改善搜索算法 ,使Robot在搜索阶段就能够充分处理与Robot频繁交互的URL列表。根据网页的内容、HTML结构以及其中包含的超链信息计算网页的PageRank ,使URL列表能够根据重要性调整排列顺序。初步的试验结果表明 ,本文的优化算法可以较大程度地改进搜索引擎的整体性能  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号