Similar literature
 20 similar documents found
1.
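A study of the overlap rate of search results among Chinese search engines   Total citations: 1, self-citations: 0, citations by others: 1
This study measures the degree of overlap and difference among the search results of mainstream Chinese search engines. Using a sample of 11,171 queries from real users, we tested Baidu, Google, and Yahoo! China and found that their search results differ greatly and overlap very little. Across all first-page results, results unique to a single engine account for 89.34% of the total, results shared by any two engines account for 8.11%, and results shared by all three engines account for 2.54%. The overlap is even lower for the first two pages of results. A comparison with existing overlap measurements for English search engines shows that the overlap rates of Chinese and English search engines are both very low and very close to each other.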
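The overlap shares reported in this abstract can be illustrated with a minimal Python sketch. It assumes that each engine's first page is reduced to a set of normalized URLs and that shares are taken over the distinct URLs returned by all engines combined; the function name `overlap_shares` and the sample URLs are hypothetical, and the study's own counting procedure may differ.

```python
from collections import Counter


def overlap_shares(*result_lists):
    """Return {k: share} where share is the fraction of distinct URLs
    that were returned by exactly k of the given engines."""
    engines_per_url = Counter()
    for results in result_lists:
        # Use a set so a URL listed twice by one engine counts once.
        for url in set(results):
            engines_per_url[url] += 1

    total = len(engines_per_url)
    if total == 0:
        return {k: 0.0 for k in range(1, len(result_lists) + 1)}

    urls_by_k = Counter(engines_per_url.values())
    return {k: urls_by_k.get(k, 0) / total
            for k in range(1, len(result_lists) + 1)}


# Hypothetical first-page results for one query on three engines.
engine_a = ["u1", "u2", "u3"]
engine_b = ["u3", "u4"]
engine_c = ["u3", "u2", "u5"]
shares = overlap_shares(engine_a, engine_b, engine_c)
# shares[1]: unique to one engine, shares[2]: shared by two, shares[3]: shared by all
```

Summing the per-query counts over a whole query sample before dividing would give aggregate figures of the kind quoted above.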

2.
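Wang Ruojia, Li Pei. Library and Information Service (《图书情报工作》), 2015, 59(11): 111-118
[Purpose/significance] Focusing on the health information retrieval behavior of web users in China, this study explores patterns in health information searching with Chinese search engines, to inform the improvement of health search engines and websites. [Method/process] Based on a large-scale query log from the Sogou search engine, log mining is used to study users' health information retrieval behavior from two perspectives: query behavior and click behavior. The query behavior indicators cover the session level (session length, repeated queries by users), the query-string level (query length, repeated queries), and the term level (high-frequency terms, topic classification); the click behavior indicators are click position and clicked content. [Result/conclusion] Health-related queries have a high repetition rate, suggesting that relevant websites could cache the results returned for frequently repeated query strings. The topics of greatest public interest are diseases, health care, maternal and infant care, medical institutions, and cosmetic surgery, suggesting that site navigation should be organized around these areas. Users prefer question-and-answer platforms, suggesting that site designers should pay more attention to Q&A-style interaction with users.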

3.
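An analysis of the characteristics of user queries in Chinese search engines   Total citations: 2, self-citations: 0, citations by others: 2
Ma Han, Feng Jinling. Journal of the China Society for Scientific and Technical Information (《情报学报》), 2005, 24(6): 718-722
This study collected more than seven thousand queries from four Chinese search engines (Baidu, Yisou, Zhongsou, and Sogou) and analyzed the search behavior of Chinese search engine users in terms of term frequency, number of terms, and category. The findings have practical value for user education and for the design of search services.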

4.
The goal of this article is to understand the reasons why known-item search queries entered in a discovery system return zero hits. We analyze a sample of 708 known-item queries and classify them into four categories of zero hits with regard to whether the item is held by the library and whether the query is formulated correctly: (1) item in stock, but query incorrect, (2) item not in stock, (3) item in stock, but incomplete or erroneous metadata, (4) query is ambiguous or not understandable. The main causes of zero hits are acquisition gaps (items not in stock) and erroneous search queries. We discuss possible solutions for known-item queries resulting in zero hits from the side of the system and show that 30% of zero hits could easily be avoided by applying automatic spelling correction. We argue that libraries can improve their discovery systems or online catalogs by applying strategies to avoid or cope with zero hits inspired by web search engines and commercial search web sites.

5.
6.
It is known that users of internet search engines often enter queries with misspellings in one or more search terms. Several web search engines make suggestions for correcting misspelled words, but the methods used are proprietary and unpublished to our knowledge. Here we describe the methodology we have developed to perform spelling correction for the PubMed search engine. Our approach is based on the noisy channel model for spelling correction and makes use of statistics harvested from user logs to estimate the probabilities of different types of edits that lead to misspellings. The unique problems encountered in correcting search engine queries are discussed and our solutions are outlined.
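The noisy channel idea mentioned in this abstract can be sketched in a few lines of Python: choose the candidate c that maximizes P(c) · P(w | c) for the typed word w. This is only an illustration under strong simplifications, not the PubMed method itself. The term counts are invented, candidates are limited to one edit, and a single assumed constant `p_error` stands in for the per-edit-type probabilities that the authors estimate from user logs.

```python
import string


def edits1(word):
    """All strings one delete, transpose, replace, or insert away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right
                for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def correct(word, term_counts, p_error=0.05):
    """Return the known term maximizing prior * channel probability.

    prior   = relative frequency of the candidate in term_counts
    channel = 1 - p_error if the word is kept, p_error for any single edit
              (a uniform simplification, assumed for illustration)
    """
    total = sum(term_counts.values())
    best, best_score = word, 0.0
    # sorted() makes tie-breaking deterministic
    for candidate in sorted({word} | edits1(word)):
        if candidate not in term_counts:
            continue
        prior = term_counts[candidate] / total
        channel = (1 - p_error) if candidate == word else p_error
        score = prior * channel
        if score > best_score:
            best, best_score = candidate, score
    return best


# Hypothetical term frequencies, e.g. harvested from a query log.
counts = {"aspirin": 900, "asparin": 1, "insulin": 500}
suggestion = correct("asprin", counts)
```

A word with no known neighbor is returned unchanged, which keeps the sketch from "correcting" rare but valid terms it has never seen.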

7.
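A comparison of image search in several search engines   Total citations: 4, self-citations: 0, citations by others: 4
As users' demand for image search on the web keeps growing, a variety of web-based image search engines have emerged. However, these engines differ considerably in response time, number of images retrieved, accuracy, and ranking of results. This paper first briefly describes image search modes and then compares the Image Search of the search engines Google, Excite, Yahoo, and Ixquick.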

8.
The study reports on a longitudinal and comparative evaluation of Greek language searching on the web. Ten engines, five global (A9, AltaVista, Google, MSN Search, and Yahoo!) and five Greek (Anazitisi, Ano-Kato, Phantis, Trinity, and Visto), were evaluated using (a) navigational queries in 2004 and 2006; and (b) by measuring the freshness of the search engine indices in 2005 and 2006. Homepage finding queries for known Greek organizations were created and searched. Queries included the name of the organization in its Greek and non-Greek, English or transliterated equivalent forms. The organizations represented ten categories: government departments, universities, colleges, travel agencies, museums, media (TV, radio, newspapers), transportation, and banks. The freshness of the indices was evaluated by examining the status of the returned URLs (live versus dead) from the navigational queries, and by identifying if the engines have indexed 32480 active (live) Greek domain URLs. Effectiveness measures included (a) qualitative assessment of how engines handle the Greek language; (b) precision at 10 documents (P@10); (c) mean reciprocal rank (MRR); (d) Navigational Query Discounted Cumulative Gain (NQ-DCG), a new heuristic evaluation measure; (e) response time; (f) the ratio of the dead URL links returned; and (g) the presence or absence of URLs and the decay observed over the period of the study. The results report on which of the global and Greek search engines perform best; and if the performance achieved is good enough from a user’s perspective.
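Two of the effectiveness measures named in this abstract, P@10 and MRR, have standard definitions that can be written down as a short Python sketch. The function names and sample URLs are hypothetical, and NQ-DCG is not shown because the abstract does not define it.

```python
def precision_at_k(ranked, relevant, k=10):
    """Fraction of the top-k results that are relevant.
    Divides by k even when fewer than k results were returned."""
    return sum(1 for url in ranked[:k] if url in relevant) / k


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant result, or 0.0 if none is found."""
    for rank, url in enumerate(ranked, start=1):
        if url in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """Average reciprocal rank over (ranked_list, relevant_set) pairs."""
    return sum(reciprocal_rank(ranked, relevant)
               for ranked, relevant in runs) / len(runs)


# Hypothetical homepage-finding queries: each has one correct homepage.
runs = [
    (["a.gr", "b.gr", "c.gr"], {"b.gr"}),   # homepage found at rank 2
    (["x.gr", "y.gr"], {"z.gr"}),           # homepage not returned
]
mrr = mean_reciprocal_rank(runs)
```

For navigational queries with a single correct homepage, MRR is usually the more informative of the two, since P@10 can never exceed 0.1 in that setting.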

9.
When searching for health information, results quality can be judged against available scientific evidence: Do search engines return advice consistent with evidence based medicine? We compared the performance of domain-specific health and depression search engines against a general-purpose engine (Google) on both relevance of results and quality of advice. Over 101 queries, to which the term ‘depression’ was added if not already present, Google returned more relevant results than those of the domain-specific engines. However, over the 50 treatment-related queries, Google returned 70 pages recommending for or against a well studied treatment, of which 19 strongly disagreed with the scientific evidence. A domain-specific index of 4 sites selected by domain experts was only wrong in 5 of 50 recommendations. Analysis suggests a tension between relevance and quality. Indexing more pages can give a greater number of relevant results, but selective inclusion can give better quality.

10.
The availability of web search engines offers opportunities in addition to those provided by bibliographic databases for identifying academic literature, but their usefulness for retrieving research is uncertain. A rigorous literature search was undertaken to investigate whether web search engines might replace bibliographic databases, using empirical research in health and social care as a case study. Eight databases and five web search engines were searched between 20 July and 6 August 2015. Sixteen unique studies which compared at least one database with at least one web search engine were examined, as well as drawing lessons from the authors’ own search process. Web search engines were limited in that the searcher cannot be certain that the principles of Boolean logic apply and they were more limited than bibliographic databases in their functions, such as exporting abstracts. Recommendations are made for improving the rigour and quality of reporting studies of academic literature searching.

11.
The authors of this paper investigated the impact of the advanced search features of three common search engines on retrieval result performance: Yahoo, Google, and Live Search. The authors analyzed 240 search queries with different information need emphases to determine retrieval effectiveness differences among regular search, title search, exact phrase search, and PDF file format restriction search. A one-way ANOVA method and regression analysis method were used for the study. It was found that the PDF file format restriction search achieved the best retrieval performance among Yahoo, Google and Live Search. The regular search achieved the best web page ranking performance among Yahoo, Google, and Live Search. The findings of this study can be used to assist users in formulating an appropriate search strategy to improve search effectiveness, and to shed light on how search engines react to different types of search features in terms of retrieval effectiveness.

12.
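Research on and use of the WWW search engine Excite   Total citations: 1, self-citations: 1, citations by others: 0
This paper introduces the development and retrieval technology of Excite, a well-known WWW search engine, analyzes its search modes and features in depth, and on that basis recommends Excite's Chinese-language search site to users.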

13.
Background: Research is essential for evidence‐based practice yet many health professionals do not have enough time to find research. Studies relevant to occupational therapists can be particularly difficult to find. Most search engines are broad and return a large number of irrelevant articles. Occupational Therapy Systematic Evaluation of Evidence (OTseeker) is an occupational therapy database available at http://www.otseeker.com. Developed by Australian occupational therapists, the resource aims to increase access to research and support clinical decision making. This discipline‐specific database contains pre‐appraised information from a variety of sources and decreases the time required to locate best evidence. Objectives: The aims of this paper are to: (i) describe how health librarians can use OTseeker to help allied health students, researchers and practitioners, particularly in occupational therapy, to find quality evidence; (ii) provide a teaching resource for health librarians based around the OTseeker evidence database; and (iii) highlight new features contained on the OTseeker database. Methods: A case study is provided which focuses on searching for evidence on the effectiveness of upper limb rehabilitation after stroke using OTseeker. Conclusion: This paper may increase the knowledge, skills and competencies of health librarians, helping them to access evidence‐based databases, and educate other professionals.

15.
Automating the Construction of Internet Portals with Machine Learning   Total citations: 11, self-citations: 0, citations by others: 11
Domain-specific internet portals are growing in popularity because they gather content from the Web and organize it for easy access, retrieval and search. For example, www.campsearch.com allows complex queries by age, location, cost and specialty over summer camps. This functionality is not possible with general, Web-wide search engines. Unfortunately these portals are difficult and time-consuming to maintain. This paper advocates the use of machine learning techniques to greatly automate the creation and maintenance of domain-specific Internet portals. We describe new research in reinforcement learning, information extraction and text classification that enables efficient spidering, the identification of informative text segments, and the population of topic hierarchies. Using these techniques, we have built a demonstration system: a portal for computer science research papers. It already contains over 50,000 papers and is publicly available at www.cora.justresearch.com. These techniques are widely applicable to portal creation in other domains.

16.
Physicians are becoming aware of the World Wide Web as a resource for medical information. In spring 1999, first-year students at the University of Louisville's School of Medicine were given an assignment to review and evaluate Internet search engines and directories, medicine-specific search engines and meta lists, and health-related Web sites. Students found that general search engines were easier to learn and use and produced better results than either meta medical sites or medicine-specific search engines. Students were very severe in judging the quality of health-related Web sites. Our students' impressions are compared to those of physicians in similar studies. Solutions to the problems of searching the Web for health information are reviewed.

17.
Bing and Google customize their results to target people with different geographic locations and languages but, despite the importance of search engines for web users and webometric research, the extent and nature of these differences are unknown. This study compares the results of seventeen random queries submitted automatically to Bing for thirteen different English geographic search markets at monthly intervals. Search market choice alters a small majority of the top 10 results but less than a third of the complete sets of results. Variation in the top 10 results over a month was about the same as variation between search markets but variation over time was greater for the complete results sets. Most worryingly for users, there were almost no ubiquitous authoritative results: only one URL was always returned in the top 10 for all search markets and points in time, and Wikipedia was almost completely absent from the most common top 10 results. Most importantly for webometrics, results from at least three different search markets should be combined to give more reliable and comprehensive results, even for queries that return fewer than the maximum number of URLs.

18.
The TREC 2009 web ad hoc and relevance feedback tasks used a new document collection, the ClueWeb09 dataset, which was crawled from the general web in early 2009. This dataset contains 1 billion web pages, a substantial fraction of which are spam: pages designed to deceive search engines so as to deliver an unwanted payload. We examine the effect of spam on the results of the TREC 2009 web ad hoc and relevance feedback tasks, which used the ClueWeb09 dataset. We show that a simple content-based classifier with minimal training is efficient enough to rank the “spamminess” of every page in the dataset using a standard personal computer in 48 hours, and effective enough to yield significant and substantive improvements in the fixed-cutoff precision (estP10) as well as rank measures (estR-Precision, StatMAP, MAP) of nearly all submitted runs. Moreover, using a set of “honeypot” queries the labeling of training data may be reduced to an entirely automatic process. The results of classical information retrieval methods are particularly enhanced by filtering, moving from among the worst to among the best.

19.
Query recommendation has long been considered a key feature of search engines, which can improve users’ search experience by providing useful query suggestions for their search tasks. Most existing approaches to query recommendation aim to recommend relevant queries, i.e., alternative queries similar to a user’s initial query. However, the ultimate goal of query recommendation is to help users reformulate queries so that they can accomplish their search task successfully and quickly. Considering only relevance in query recommendation does not serve this goal directly. In this paper, we argue that it is more important to directly recommend queries with high utility, i.e., queries that can better satisfy users’ information needs. For this purpose, we attempt to infer query utility from users’ sequential search behaviors recorded in their search sessions. Specifically, we propose a dynamic Bayesian network, referred to as the Query Utility Model (QUM), to capture query utility by simultaneously modeling users’ reformulation and click behaviors. We then recommend queries with high utility to help users better accomplish their search tasks. We empirically evaluated the performance of our approach on a publicly released query log by comparing it with state-of-the-art methods. The experimental results show that, by recommending high utility queries, our approach is far more effective in helping users find relevant search results and thus satisfying their information needs.

20.
The problem of language in Web searching has been discussed primarily in the area of cross-language information retrieval (CLIR). However, much CLIR research centers on investigation of the effectiveness of automatic translation techniques. The case study reported here explored bilingual user behaviors, perceptions, and preferences with respect to the capability of the Web as a multilingual information resource. Twenty-eight bilingual academic users from Myongji University in Korea were recruited for the study. Findings show that the subjects did not use Web search engines as multilingual tools. For search queries, they selected a language that represents their information need most accurately depending on the types of information task rather than choosing their first language. Subjects expressed concerns about the accuracy of machine translation of scholarly terminologies and preferred to have user control over multilingual Web searches.
