Similar literature
1.
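Measuring the Timeliness of Scientific Information on the Web (cited 3 times: 0 self-citations, 3 by others)
Timeliness is an important factor affecting the quality of online information. Taking publicly accessible scientific information on the web as its object, this paper uses the analytic hierarchy process (AHP) to assign weights to the indicators for measuring information timeliness. Thirty-two subject terms from eight disciplines, including mathematics, life sciences, physics, and materials science, were selected for tracked queries, and the first 50 pages returned by the Google, Yahoo, and Altavista search engines were taken as the measurement sample. The results: the mean timeliness score of scientific information on the web was 2.6482 (total sample of 2,814), and only 34.90% of pages scored above the mean. Among domains, .gov scored best; among resource types, virtual research communities and blogs were the most timely. However, timeliness is only one quality characteristic of web information, and the quality of information cannot be judged by timeliness alone. Overall, the timeliness of scientific information on the web needs improvement. The timeliness evaluation framework and method proposed in this study can help researchers and the public make a preliminary judgment of timeliness when searching for information.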
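The AHP weighting step named in this abstract can be illustrated with a minimal sketch. It uses the row geometric-mean method, a standard approximation to AHP's principal-eigenvector weights; the abstract does not say which variant the authors used, and the function name `ahp_weights` and the two-indicator comparison matrix are hypothetical.

```python
import math

def ahp_weights(pairwise):
    """Approximate AHP priority weights from a pairwise comparison matrix.

    pairwise[i][j] says how much more important indicator i is than
    indicator j (so pairwise[j][i] should be its reciprocal).
    Uses the row geometric-mean approximation, not the exact eigenvector.
    """
    size = len(pairwise)
    # Geometric mean of each row.
    means = [math.prod(row) ** (1.0 / size) for row in pairwise]
    total = sum(means)
    # Normalize so the weights sum to 1.
    return [m / total for m in means]

# Hypothetical example: indicator A judged twice as important as indicator B.
example = [
    [1.0, 2.0],
    [0.5, 1.0],
]
weights = ahp_weights(example)
```

For a perfectly consistent matrix like this one, the geometric-mean weights coincide with the eigenvector weights, here two thirds and one third.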

2.
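A Study of Search Result Overlap among Chinese Search Engines (cited 1 time: 0 self-citations, 1 by others)
This study tests the degree of overlap and difference among the search results of mainstream Chinese search engines. Baidu, Google, and Yahoo China were tested with a sample of 11,171 queries from real users; the results differ greatly between engines and the overlap rate is very low. Across all first-page results, results unique to a single engine account for 89.34% of the total, results shared by any two engines for 8.11%, and results shared by all three for 2.54%. The overlap across the first two pages of results is lower still. Comparison with existing overlap data for English search engines shows that overlap rates for Chinese and English search engines are both very low and very similar.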
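One plausible way to compute overlap shares of the kind reported in this abstract is sketched below. The counting rule (distinct URLs grouped by how many engines returned them) is an assumption, since the abstract does not give the paper's exact procedure or its URL normalization; `overlap_shares` and the sample URLs are hypothetical.

```python
from collections import Counter

def overlap_shares(results_by_engine):
    """Share of distinct URLs returned by exactly 1, 2, ... engines.

    results_by_engine maps an engine name to its list of result URLs
    for one query (or for a pooled set of queries).
    """
    engines_per_url = Counter()
    for urls in results_by_engine.values():
        # set() so a URL repeated within one engine counts once.
        for url in set(urls):
            engines_per_url[url] += 1
    total = len(engines_per_url)
    if total == 0:
        return {}
    by_count = Counter(engines_per_url.values())
    return {
        n: by_count[n] / total
        for n in range(1, len(results_by_engine) + 1)
    }

# Hypothetical first-page results for a single query.
sample = {
    "engine_a": ["u1", "u2", "u3"],
    "engine_b": ["u2", "u3", "u4"],
    "engine_c": ["u3", "u5"],
}
shares = overlap_shares(sample)
```

In this toy sample, three of the five distinct URLs are unique to one engine, one is shared by two engines, and one by all three.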

3.
Measuring Search Engine Quality (cited 12 times: 3 self-citations, 9 by others)
The effectiveness of twenty public search engines is evaluated using TREC-inspired methods and a set of 54 queries taken from real Web search logs. The World Wide Web is taken as the test collection and a combination of crawler and text retrieval system is evaluated. The engines are compared on a range of measures derivable from binary relevance judgments of the first seven live results returned. Statistical testing reveals a significant difference between engines and high intercorrelations between measures. Surprisingly, given the dynamic nature of the Web and the time elapsed, there is also a high correlation between results of this study and a previous study by Gordon and Pathak. For nearly all engines, there is a gradual decline in precision at increasing cutoff after some initial fluctuation. Performance of the engines as a group is found to be inferior to the group of participants in the TREC-8 Large Web task, although the best engines approach the median of those systems. Shortcomings of current Web search evaluation methodology are identified and recommendations are made for future improvements. In particular, the present study and its predecessors deal with queries which are assumed to derive from a need to find a selection of documents relevant to a topic. By contrast, real Web search reflects a range of other information need types which require different judging and different measures.

4.
Bing and Google customize their results to target people with different geographic locations and languages but, despite the importance of search engines for web users and webometric research, the extent and nature of these differences are unknown. This study compares the results of seventeen random queries submitted automatically to Bing for thirteen different English geographic search markets at monthly intervals. Search market choice alters a small majority of the top 10 results but less than a third of the complete sets of results. Variation in the top 10 results over a month was about the same as variation between search markets but variation over time was greater for the complete results sets. Most worryingly for users, there were almost no ubiquitous authoritative results: only one URL was always returned in the top 10 for all search markets and points in time, and Wikipedia was almost completely absent from the most common top 10 results. Most importantly for webometrics, results from at least three different search markets should be combined to give more reliable and comprehensive results, even for queries that return fewer than the maximum number of URLs.

5.
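Wang Ruojia (王若佳), Li Pei (李培). Library and Information Service (《图书情报工作》), 2015, 59(11): 111-118
[Purpose/Significance] To examine the health information retrieval behavior of Chinese web users, this study explores patterns in health information retrieval through Chinese search engines and offers a reference for improving health search engines and website construction. [Method/Process] Based on large-scale query logs from the Sogou search engine, log mining is used to study users' health information retrieval behavior from two perspectives: query behavior and click behavior. Query behavior indicators cover the session level (session length, users' repeated queries), the query string level (query string length, repeated queries), and the term level (high-frequency terms, topic classification); click behavior indicators are click position and click content. [Results/Conclusions] Health-related queries have a high repetition rate, suggesting that relevant websites could cache the results returned for frequently repeated query strings. The areas of greatest public interest are diseases, health care, maternal and infant care, medical institutions, and cosmetic surgery, suggesting that website navigation design should attend to these directions. Users prefer question-and-answer platforms, suggesting that website designers should pay more attention to Q&A-style interaction with users.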

6.
The study reports on a longitudinal and comparative evaluation of Greek language searching on the web. Ten engines, five global (A9, AltaVista, Google, MSN Search, and Yahoo!) and five Greek (Anazitisi, Ano-Kato, Phantis, Trinity, and Visto), were evaluated (a) using navigational queries in 2004 and 2006; and (b) by measuring the freshness of the search engine indices in 2005 and 2006. Homepage finding queries for known Greek organizations were created and searched. Queries included the name of the organization in its Greek and non-Greek, English or transliterated equivalent forms. The organizations represented ten categories: government departments, universities, colleges, travel agencies, museums, media (TV, radio, newspapers), transportation, and banks. The freshness of the indices was evaluated by examining the status of the returned URLs (live versus dead) from the navigational queries, and by identifying whether the engines had indexed 32,480 active (live) Greek domain URLs. Effectiveness measures included (a) qualitative assessment of how engines handle the Greek language; (b) precision at 10 documents (P@10); (c) mean reciprocal rank (MRR); (d) Navigational Query Discounted Cumulative Gain (NQ-DCG), a new heuristic evaluation measure; (e) response time; (f) the ratio of the dead URL links returned; and (g) the presence or absence of URLs and the decay observed over the period of the study. The results report on which of the global and Greek search engines perform best, and whether the performance achieved is good enough from a user's perspective.
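Two of the effectiveness measures listed in this abstract, P@10 and MRR, have standard textbook definitions that can be sketched briefly. The sketch assumes binary relevance judgments given as lists of 0/1 flags in rank order; the function names are illustrative, and NQ-DCG is not shown because the abstract does not define it.

```python
def precision_at_k(relevance, k=10):
    """Fraction of the top-k results judged relevant.

    relevance is a list of 0/1 flags in rank order; results missing
    from a short list count as non-relevant.
    """
    return sum(relevance[:k]) / k

def mean_reciprocal_rank(rankings):
    """Mean over queries of 1/rank of the first relevant result.

    A query with no relevant result contributes 0.
    """
    total = 0.0
    for relevance in rankings:
        for rank, is_relevant in enumerate(relevance, start=1):
            if is_relevant:
                total += 1.0 / rank
                break
    return total / len(rankings)

# Hypothetical judgments: one query for P@10, three queries for MRR.
p10 = precision_at_k([1, 0, 1, 0, 0, 0, 0, 0, 0, 0])
mrr = mean_reciprocal_rank([[0, 1, 0], [1, 0, 0], [0, 0, 0]])
```

For homepage-finding queries, where there is typically one correct answer per query, MRR rewards engines that place that answer near the top.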

7.
This paper reports findings from an analysis of medical or health queries to different web search engines. We report results: (i) comparing samples of 10,000 web queries taken randomly from 1.2 million query logs from the AlltheWeb.com and Excite.com commercial web search engines in 2001 for medical or health queries, (ii) comparing the 2001 findings from Excite and AlltheWeb.com users with results from a previous analysis of medical and health related queries from the Excite Web search engine for 1997 and 1999, and (iii) medical or health advice-seeking queries beginning with the word 'should'. Findings suggest: (i) a small percentage of web queries are medical or health related, (ii) the top five categories of medical or health queries were: general health, weight issues, reproductive health and puberty, pregnancy/obstetrics, and human relationships, and (iii) over time, the medical and health queries may have declined as a proportion of all web queries, as the use of specialized medical/health websites and e-commerce-related queries has increased. Findings provide insights into medical and health-related web querying and suggest some implications for the use of the general web search engines when seeking medical/health information.

8.
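Research on web information retrieval applies the results of text retrieval research and combines them with ideas from web graph theory to study information retrieval on the web; it is an effective route to web knowledge discovery. The traditional HITS method yields information of rather low precision, while PageRank, as a general-purpose search method, cannot be applied to acquiring information on a specific topic. After a thorough analysis of existing algorithms such as PageRank and HITS and of methods for computing the similarity of web documents, this paper proposes the RG-HITS algorithm for discovering information relevant to a specific topic queried on the web. It combines web hyperlinks, information relevance based on the knowledge representation of web pages, and the HITS method to search the web for knowledge on a specific topic.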
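For orientation, the baseline HITS iteration that this abstract builds on can be sketched as follows. This is plain HITS only, not the paper's RG-HITS, whose relevance weighting the abstract does not specify; the graph representation, the function names, and the fixed iteration count are assumptions.

```python
import math

def _normalize(scores):
    """Scale scores to unit Euclidean length (no-op if all are zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {node: v / norm for node, v in scores.items()}

def hits(graph, iterations=50):
    """Plain HITS on a directed graph given as {page: [linked pages]}.

    Returns (hub, authority) score dictionaries.
    """
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    hub = {node: 1.0 for node in nodes}
    auth = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        # Authority: sum of hub scores of pages linking to the node.
        auth = {node: 0.0 for node in nodes}
        for source, targets in graph.items():
            for target in targets:
                auth[target] += hub[source]
        auth = _normalize(auth)
        # Hub: sum of authority scores of the pages the node links to.
        hub = {
            node: sum(auth[t] for t in graph.get(node, ()))
            for node in nodes
        }
        hub = _normalize(hub)
    return hub, auth

# Hypothetical three-page graph: pages a and b both link to c.
hub_scores, auth_scores = hits({"a": ["c"], "b": ["c"], "c": []})
```

In the toy graph, page c receives all the authority weight, and pages a and b share the hub weight equally.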

9.
The authors of this paper investigated the impact of the advanced search features of three common search engines on retrieval result performance: Yahoo, Google, and Live Search. The authors analyzed 240 search queries with different information need emphases to determine retrieval effectiveness differences among regular search, title search, exact phrase search, and PDF file format restriction search. A one-way ANOVA method and regression analysis method were used for the study. It was found that the PDF file format restriction search achieved the best retrieval performance among Yahoo, Google and Live Search. The regular search achieved the best web page ranking performance among Yahoo, Google, and Live Search. The findings of this study can be used to assist users in formulating an appropriate search strategy to improve search effectiveness, and to shed light on how search engines react to different types of search features in terms of retrieval effectiveness.

11.

Objective

The purpose of this study was to investigate the relative effectiveness of three web-scale discovery (WSD) tools in answering health sciences search queries.

Methods

Simple keyword searches, based on topics from six health sciences disciplines, were run at multiple real-world implementations of EBSCO Discovery Service (EDS), Ex Libris's Primo, and ProQuest's Summon. Each WSD tool was evaluated in its ability to retrieve relevant results and in its coverage of MEDLINE content.

Results

All WSD tools returned between 50% and 60% relevant results. Primo returned a higher number of duplicate results than the other 2 WSD products. Summon results were more relevant when search terms were automatically mapped to controlled vocabulary. EDS indexed the largest number of MEDLINE citations, followed closely by Summon. Additionally, keyword searches in all 3 WSD tools retrieved relevant material that was not found with precision (Medical Subject Headings) searches in MEDLINE.

Conclusions

None of the 3 WSD products studied was overwhelmingly more effective in returning relevant results. While it is difficult to place the figure of 50%–60% relevance in context, it implies a strong likelihood that the average user would be able to find satisfactory sources on the first page of search results using a rudimentary keyword search. The discovery of additional relevant material beyond that retrieved from MEDLINE indicates WSD tools' value as a supplement to traditional resources for health sciences researchers.

12.
Search engine results are often biased towards a certain aspect of a query or towards a certain meaning for ambiguous query terms. Diversification of search results offers a way to supply the user with a better balanced result set, increasing the probability that a user finds at least one document suiting her information need. In this paper, we present a reranking approach based on minimizing variance of Web search results to improve topic coverage in the top-k results. We investigate two different document representations as the basis for reranking: smoothed language models and topic models derived by latent Dirichlet allocation. To evaluate our approach we selected 240 queries from Wikipedia disambiguation pages. This provides us with ambiguous queries together with a community generated balanced representation of their (sub)topics. For these queries we crawled two major commercial search engines. In addition, we present a new evaluation strategy based on Kullback-Leibler divergence and Wikipedia. We evaluate this method using the TREC sub-topic evaluation on the one hand, and manually annotated query results on the other hand. Our results show that minimizing variance in search results by reranking relevant pages significantly improves topic coverage in the top-k results with respect to Wikipedia, and gives a good overview of the overall search result. Moreover, latent topic models achieve competitive diversification with significantly less reranking. Finally, our evaluation reveals that our automatic evaluation strategy using Kullback-Leibler divergence correlates well with α-nDCG scores used in manual evaluation efforts.
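The Kullback-Leibler divergence underlying the evaluation strategy in this abstract can be sketched as a comparison of two topic distributions. How the authors derive those distributions from Wikipedia and from the result lists is not stated in the abstract, so the dictionaries below are hypothetical, as is the small `eps` floor used to avoid division by zero.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) in nats for two distributions over the same topics.

    p and q map topic labels to probabilities. Topics missing from q
    are floored at eps (an assumed smoothing choice) so the result
    stays finite.
    """
    total = 0.0
    for topic, p_k in p.items():
        if p_k > 0.0:
            q_k = max(q.get(topic, 0.0), eps)
            total += p_k * math.log(p_k / q_k)
    return total

# Hypothetical (sub)topic shares for one ambiguous query.
reference = {"animal": 0.5, "car": 0.5}    # balanced reference
result_set = {"animal": 0.25, "car": 0.75}  # skewed top-k results
divergence = kl_divergence(reference, result_set)
```

A divergence of zero means the result set covers the topics in the same proportions as the reference; larger values indicate a more skewed result set.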

13.
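Given the sheer volume of results returned by current search engines, a metasearch engine is built to make efficient use of the results returned by multiple member search engines. The paper introduces the basic architecture of a metasearch engine and the main current methods of result fusion, and applies statistical methods to study the correlations among page titles, page snippets, and page text, thereby determining the relevance weights used to judge result relevance. Experiments show that the average precision of the metasearch engine's results is considerably higher than that of each member engine.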

14.
Search engine optimization, or the practice of designing a web site so that it rises to the top of the results page when users search for particular keywords or phrases, has become so prevalent on the modern web that it has a significant influence on Google search results. This article examines the techniques used by search engine optimization practitioners, the difference between “white hat” and “black hat” optimization tactics, and why it is important for library staff to understand these techniques and their impact on search engine results pages. It also looks at ways that library staff can help their users develop awareness of the factors that influence search results and how to better assess the quality and relevance of results listings.

15.
Google is the search engine of choice for most Internet users. For a variety of reasons, librarians and other expert searchers do not always use Google to its full potential, even though it provides capabilities not possible in traditional bibliographic databases and other search engines. Applying expert searching principles and practices, such as the use of advanced search operators, information retrieval strategies, and search hedges, to Google will allow health sciences librarians to find quality information on the Internet more efficiently and effectively.

16.
On May 18, 2009, British computer scientist Stephen Wolfram officially launched a new search product called Wolfram|Alpha (WA). This launch was preceded by months of speculation and hype online about exactly what WA would be and how it would compare to Google and other search engines. This article will explore the basic features of WA, show some example queries and results, and discuss the usefulness and limitations of this new tool.

17.
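A Study of Comprehensive Academic Search Engines (cited 8 times: 1 self-citation, 8 by others)
Four typical, free-to-use comprehensive academic search engines from outside China (Google Scholar, Scirus, BASE, and Athenus) are studied. The study analyzes the distinctive features of each, compares their resource coverage, search functions, and retrieval performance, and identifies the main problems of this class of academic search engine: the scholarly quality of resources, the merging and deduplication of results, and relevance ranking. On this basis, it discusses their favorable and unfavorable effects on library and information institutions, users, and resource providers.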

18.
Transaction logs of NAVER, a major Korean Web search engine, were analyzed to track the information-seeking behavior of Korean Web users. These transaction logs include more than 40 million queries collected over 1 week. This study examines current transaction log analysis methodologies and proposes a method for log cleaning, session definition, and query classification. A term definition method which is necessary for Korean transaction log analysis is also discussed. The results of this study show that users behave in a simple way: they type in short queries with a few query terms, seldom use advanced features, and view few result pages. Users also behave in a passive way: they seldom change search environments set by the system. It is of interest that users tend to change their queries totally rather than adding or deleting terms to modify the previous queries. The results of this study might contribute to the development of more efficient and effective Web search engines and services.

19.
Query recommendation has long been considered a key feature of search engines, which can improve users’ search experience by providing useful query suggestions for their search tasks. Most existing approaches to query recommendation aim to recommend relevant queries, i.e., alternative queries similar to a user’s initial query. However, the ultimate goal of query recommendation is to assist users in reformulating queries so that they can accomplish their search task successfully and quickly. Considering only relevance in query recommendation does not directly serve this goal. In this paper, we argue that it is more important to directly recommend queries with high utility, i.e., queries that can better satisfy users’ information needs. For this purpose, we attempt to infer query utility from users’ sequential search behaviors recorded in their search sessions. Specifically, we propose a dynamic Bayesian network, referred to as Query Utility Model (QUM), to capture query utility by simultaneously modeling users’ reformulation and click behaviors. We then recommend queries with high utility to help users better accomplish their search tasks. We empirically evaluated the performance of our approach on a publicly released query log by comparing with the state-of-the-art methods. The experimental results show that, by recommending high utility queries, our approach is far more effective in helping users find relevant search results and thus satisfying their information needs.

20.
This article critically examines four Google search products (Google Advanced Search, Google News Advanced Search, Google Books Advanced Search, and Google Advanced Scholar Search) and shows how each uses metadata to enhance or improve search results. In addition, the article shows how metadata can increase search precision and recall in information discovery systems. From a library perspective, this article analyzes some of the metadata-enabled features of Google's advanced search pages and compares these features to those found in a typical online library catalog. From a serials perspective, Google News Advanced Search demonstrates how Google indexes news websites, sites that are essentially continuing resources. As Google incorporates more and more metadata functionality into its advanced search pages, they increasingly begin to function more like online library catalogs and less like search pages found in a traditional Internet search engine. The simple search box has many limitations, and like libraries, Google is increasingly creating and offering metadata-enabled search features that improve search precision and recall in its products.
