Similar literature
 Found 20 similar documents; search took 31 ms
1.
Comparing rankings of search results on the Web   Total citations: 1, self-citations: 0, citations by others: 1
The Web has become an information source for professional data gathering. Because of the vast amounts of information on almost all topics, one cannot systematically go over the whole set of results, and therefore must rely on the ordering of the results by the search engine. It is well known that search engines on the Web have low overlap in terms of coverage. In this study we measure how similar the rankings of search engines are on the overlapping results. We compare rankings of results for identical queries retrieved from several search engines. The method is based only on the set of URLs that appear in the answer sets of the engines being compared. For comparing the similarity of rankings of two search engines, the Spearman correlation coefficient is computed. When comparing more than two sets, Kendall’s W is used. These are well-known measures and the statistical significance of the results can be computed. The methods are demonstrated on a set of 15 queries that were submitted to four large Web search engines. The findings indicate that the large public search engines on the Web employ considerably different ranking algorithms.
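
The ranking comparison described in this abstract can be sketched in a few lines. The sketch below is only an illustration under stated assumptions: the function names and the toy URL lists are hypothetical, each result list is assumed to contain no duplicate URLs, and the URLs common to all engines are re-ranked from 1 so that no ties occur. It uses the textbook formulas for Spearman's coefficient and Kendall's W, not necessarily the authors' exact procedure.

```python
from typing import List


def overlap_ranks(result_lists: List[List[str]]) -> List[List[int]]:
    """Re-rank only the URLs that every engine returned (1 = top)."""
    common = set(result_lists[0]).intersection(*map(set, result_lists[1:]))
    urls = sorted(common)  # fixed URL order shared by all engines
    ranks = []
    for results in result_lists:
        kept = [url for url in results if url in common]
        position = {url: i + 1 for i, url in enumerate(kept)}
        ranks.append([position[url] for url in urls])
    return ranks


def spearman(r1: List[int], r2: List[int]) -> float:
    """Spearman rank correlation for two tie-free rankings of n >= 2 items."""
    n = len(r1)
    d_squared = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * d_squared / (n * (n * n - 1))


def kendalls_w(ranks: List[List[int]]) -> float:
    """Kendall's coefficient of concordance for m tie-free rankings."""
    m, n = len(ranks), len(ranks[0])
    totals = [sum(r[i] for r in ranks) for i in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12 * s / (m * m * (n ** 3 - n))


# Hypothetical result lists for one query on three engines.
engine_a = ["u1", "u2", "u3", "u4", "x1"]
engine_b = ["u2", "u1", "u3", "y1", "u4"]
engine_c = ["u4", "u3", "u2", "u1"]
ranks_a, ranks_b, ranks_c = overlap_ranks([engine_a, engine_b, engine_c])
print(spearman(ranks_a, ranks_b), kendalls_w([ranks_a, ranks_b, ranks_c]))
```

A value of W near 1 means the engines order the shared URLs almost identically; a value near 0 means there is little agreement.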

2.
Stochastic simulation has been very effective in many domains but has never been applied to the WWW. This study is the first to use neural networks in stochastic simulation of the number of rejected Web pages per search query. The evaluation of the quality of search engines should involve not only the resulting set of Web pages but also an estimate of the rejected set of Web pages. The iterative radial basis functions (RBF) neural network developed by Meghabghab and Nasr [Iterative RBF neural networks as meta-models for stochastic simulations, in: Second International Conference on Intelligent Processing and Manufacturing of Materials, IPMM’99, Honolulu, Hawaii, 1999, pp. 729–734] was adapted to the actual evaluation of the number of rejected Web pages on four search engines, i.e., Yahoo, Alta Vista, Google, and Northern Light. Nine input variables were selected for the simulation: (1) precision, (2) overlap, (3) response time, (4) coverage, (5) update frequency, (6) boolean logic, (7) truncation, (8) word and multi-word searching, (9) portion of the Web pages indexed. Typical stochastic simulation meta-modeling uses regression models in response surface methods. RBF networks are a natural fit for such an attempt because they use a family of surfaces, each of which naturally divides an input space into two regions X+ and X−, and the n test patterns are assigned to either class X+ or X−. This technique divides the resulting set of responses to a query into accepted and rejected Web pages. To test the hypothesis that the evaluation of any search engine query should involve an estimate of the number of rejected Web pages, an RBF meta-model was trained on 937 examples from a set of 9000 different simulation runs on the nine input variables. Results show that two of the variables, response time and portion of the Web indexed, can be eliminated without affecting evaluation results. Results also show that the number of rejected Web pages for a specific set of search queries on these four engines is very high. In addition, a goodness measure of a search engine for a given set of queries can be designed as a function of the coverage of the search engine and the normalized age of a new document in the result set for the query. This study concludes that unless search engine designers address the issue of rejected Web pages, indexing, and crawling, the usage of the Web as a research tool for academic and educational purposes will stay hindered.

3.
Search engines are the gateway for users to retrieve information from the Web. There is a crucial need for tools that allow effective analysis of search engine queries to provide a greater understanding of Web users' information seeking behavior. The objective of the study is to develop an effective strategy for the selection of samples from large-scale data sets. Millions of queries are submitted to Web search engines daily and new sampling techniques are required to bring these databases to a manageable size, while preserving the statistically representative characteristics of the entire data set. This paper reports results from a study using data logs from the Excite Web search engine. We use Poisson sampling to develop a sampling strategy, and show how sample sets selected by Poisson sampling effectively represent, in statistical terms, the characteristics of the entire dataset. In addition, this paper discusses the use of Poisson sampling in continuous monitoring of stochastic processes, such as Web site dynamics.
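
The abstract does not spell out the sampling procedure, so the following is a minimal sketch of one standard reading of Poisson sampling: every record is kept independently with a fixed inclusion probability, which makes the gaps between sampled records geometric (the discrete analogue of the exponential gaps of a Poisson process). The function name, the synthetic log, and the 10% rate are assumptions for illustration only.

```python
import random
from typing import List, Sequence


def poisson_sample(records: Sequence[str], inclusion_prob: float,
                   seed: int = 0) -> List[str]:
    """Keep each record independently with probability inclusion_prob."""
    rng = random.Random(seed)  # seeded so that a sample can be reproduced
    return [rec for rec in records if rng.random() < inclusion_prob]


def mean_terms(queries: Sequence[str]) -> float:
    """Average number of whitespace-separated terms per query."""
    return sum(len(q.split()) for q in queries) / len(queries)


# Synthetic stand-in for a query log: query i has 1 to 4 terms.
log = [" ".join(["term"] * (1 + i % 4)) for i in range(10_000)]
sample = poisson_sample(log, inclusion_prob=0.1, seed=1)

# Compare a summary statistic of the sample with that of the full log.
print(len(sample), mean_terms(sample), mean_terms(log))
```

The sample size is itself random under this scheme, so in practice the inclusion probability is chosen to give a manageable expected size rather than an exact one.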

4.
Web queries in question format are becoming a common element of a user's interaction with Web search engines. Web search services such as Ask Jeeves, a publicly accessible question and answer (Q&A) search engine, request users to enter question format queries. This paper provides results from a study examining queries in question format submitted to two different Web search engines: Ask Jeeves, which explicitly encourages queries in question format, and the Excite search service, which does not. We identify the characteristics of queries in question format in two different data sets, 30,000 Ask Jeeves queries and 15,575 Excite queries, including the nature, length, and structure of queries in question format. Findings include: (1) 50% of Ask Jeeves queries and less than 1% of Excite queries were in question format, (2) most users entered only one query in question format with little query reformulation, (3) there was a limited range of formats for queries in question format, mainly “where”, “what”, or “how” questions, (4) the most common question query format was “Where can I find…” for general information on a topic, and (5) non-question queries may be in request format. Overall, four types of user Web queries were identified: keyword, Boolean, question, and request. These findings provide an initial mapping of the structure and content of queries in question and request format. Implications for Web search services are discussed.

5.
Ecommerce is developing into a fast-growing channel for new business, so a strong presence in this domain could prove essential to the success of numerous commercial organizations. However, there is little research examining ecommerce at the individual customer level, particularly on the success of everyday ecommerce searches. This is critical for the continued success of online commerce. The purpose of this research is to evaluate the effectiveness of search engines in the retrieval of relevant ecommerce links. The study examines the effectiveness of five different types of search engines in response to ecommerce queries by comparing the engines’ quality of ecommerce links using topical relevancy ratings. This research employs 100 ecommerce queries, five major search engines, and more than 3540 Web links. The findings indicate that links retrieved using an ecommerce search engine are significantly better than those obtained from most other engine types but do not significantly differ from links obtained from a Web directory service. We discuss the implications for Web system design and ecommerce marketing campaigns.

6.
The Web and especially major Web search engines are essential tools in the quest to locate online information for many people. This paper reports results from research that examines characteristics and changes in Web searching from nine studies of five Web search engines based in the US and Europe. We compare interactions occurring between users and Web search engines from the perspectives of session length, query length, query complexity, and content viewed among the Web search engines. The results of our research show that (1) users are viewing fewer result pages, (2) searchers on US-based Web search engines use more query operators than searchers on European-based search engines, (3) there are statistically significant differences in the use of Boolean operators and result pages viewed, and (4) one cannot necessarily apply results from studies of one particular Web search engine to another Web search engine. The widespread use of Web search engines, employment of simple queries, and decreased viewing of result pages may have resulted from algorithmic enhancements by Web search engine companies. We discuss the implications of the findings for the development of Web search engines and design of online content.

7.
Search engines are essential for finding information on the World Wide Web. We conducted a study to see how effective eight search engines are. Expert searchers sought information on the Web for users who had legitimate needs for information, and these users assessed the relevance of the information retrieved. We calculated traditional information retrieval measures of recall and precision at varying numbers of retrieved documents and used these as the bases for statistical comparisons of retrieval effectiveness among the eight search engines. We also calculated the likelihood that a document retrieved by one search engine was retrieved by other search engines as well.
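
The two kinds of measure named in this abstract, precision and recall at a cutoff and the chance that one engine's document is also returned by another, can be sketched as follows. The function names and the toy document IDs are hypothetical, and precision is divided by the cutoff k (one common convention); the study's exact definitions are not given in the abstract.

```python
from typing import List, Set, Tuple


def precision_recall_at_k(retrieved: List[str], relevant: Set[str],
                          k: int) -> Tuple[float, float]:
    """Precision and recall over the first k retrieved documents (k > 0)."""
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    precision = hits / k  # convention: divide by the cutoff itself
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def overlap_likelihood(results_a: List[str], results_b: List[str]) -> float:
    """Fraction of engine A's documents that engine B also retrieved."""
    docs_a = set(results_a)
    if not docs_a:
        return 0.0
    return len(docs_a & set(results_b)) / len(docs_a)


# Hypothetical ranked results and user relevance judgements.
engine_a = ["d1", "d2", "d3", "d4", "d5"]
engine_b = ["d3", "d7", "d1", "d8"]
relevant = {"d1", "d3", "d6"}

for k in (1, 2, 5):
    print(k, precision_recall_at_k(engine_a, relevant, k))
print(overlap_likelihood(engine_a, engine_b))
```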

8.
A study of metasearch engines   Total citations: 12, self-citations: 0, citations by others: 12
This article describes the types of metasearch engines, discusses their searching characteristics and gives a detailed introduction to the commonly-used search engine directories and simultaneous metasearch engines.

9.
The Web has become a worldwide source of information and a mainstream business tool. It is changing the way people conduct the daily business of their lives. As these changes are occurring, we need to understand what Web searching trends are emerging within the various global regions. What are the regional differences and trends in Web searching, if any? What is the effectiveness of Web search engines as providers of information? As part of a body of research studying these questions, we have analyzed two data sets collected from queries by mainly European users submitted to AlltheWeb.com on 6 February 2001 and 28 May 2002. AlltheWeb.com is a major and highly rated European search engine. Each data set contains approximately a million queries submitted by over 200,000 users and spans a 24-h period. This longitudinal benchmark study shows that European Web searching is evolving in certain directions. There was some decline in query length, with extremely simple queries. European search topics are broadening, with a notable percentage decline in sexual and pornographic searching. The majority of Web searchers view fewer than five Web documents, spending only seconds on a Web document. Approximately 50% of the Web documents viewed by these European users were topically relevant. We discuss the implications for Web information systems and information content providers.

10.
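To address the limitations of existing search engines and users' current demand for personalization, this paper studies the basic principles, structure, methods, and key technologies of a personalized metasearch engine based on a user interest model, and on that basis proposes a simple implementation of a user-personalized metasearch engine.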

11.
In this paper, we define and present a comprehensive classification of user intent for Web searching. The classification consists of three hierarchical levels of informational, navigational, and transactional intent. After deriving attributes of each, we then developed a software application that automatically classified queries using a Web search engine log of over a million and a half queries submitted by several hundred thousand users. Our findings show that more than 80% of Web queries are informational in nature, with about 10% each being navigational and transactional. In order to validate the accuracy of our algorithm, we manually coded 400 queries and compared the results from this manual classification to the results determined by the automated method. This comparison showed that the automatic classification has an accuracy of 74%. Of the remaining 25% of the queries, the user intent is vague or multi-faceted, pointing to the need for probabilistic classification. We discuss how search engines can use knowledge of user intent to provide more targeted and relevant results in Web searching.

12.
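Research on Web spider search strategies has been one of the focal points of specialized search engine research in recent years. How to enable a search engine to obtain the required resources quickly and accurately from a huge volume of Web page data is an important problem at present. This paper focuses on the search strategies and search optimization measures of a search engine's Web spider, proposes a simple Web spider design based on a breadth-first algorithm, and analyzes the optimization measures taken during the design process.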
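
A breadth-first spider of the kind this abstract mentions can be sketched as a queue-driven traversal. The version below is an illustration only and not the paper's design: the function name is hypothetical, and page fetching is replaced by a caller-supplied `get_links` function over a toy in-memory link graph so that the example runs without network access.

```python
from collections import deque
from typing import Callable, Iterable, List


def bfs_crawl(seed: str, get_links: Callable[[str], Iterable[str]],
              max_pages: int = 100) -> List[str]:
    """Visit pages level by level from the seed; return the visit order."""
    seen = {seed}          # URLs already queued, so none is fetched twice
    queue = deque([seed])  # FIFO queue gives breadth-first order
    order = []
    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order


# Toy link graph standing in for fetched and parsed pages.
graph = {
    "a": ["b", "c"],
    "b": ["d", "a"],
    "c": ["d", "e"],
    "d": [],
    "e": ["a"],
}
print(bfs_crawl("a", lambda url: graph.get(url, [])))
```

In a real spider, `get_links` would download the page and extract its anchors, and the `seen` set and page limit are where optimizations such as URL normalization or politeness rules would typically attach.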

13.
A growing body of research is beginning to explore the information-seeking behavior of Web users. The vast majority of these studies have concentrated on the area of textual information retrieval (IR). Little research has examined how people search for non-textual information on the Internet, and few large-scale studies have investigated visual information-seeking behavior with general-purpose Web search engines. This study examined visual information needs as expressed in users’ Web image queries. The data set examined consisted of 1,025,908 sequential queries from 211,058 users of Excite, a major Internet search service. Twenty-eight terms were used to identify queries for both still and moving images, resulting in a subset of 33,149 image queries by 9855 users. We provide data on: (1) image queries: the number of queries and the number of search terms per user, (2) image search sessions: the number of queries per user and the modifications made to subsequent queries in a session, and (3) image terms: their rank/frequency distribution and the most highly used search terms. On average, there were 3.36 image queries per user containing an average of 3.74 terms per query. Image queries contained a large number of unique terms. The most frequently occurring image related terms appeared less than 10% of the time, with most terms occurring only once. We contrast this to earlier work by P.G.B. Enser, Journal of Documentation 51 (2) (1995) 126–170, who examined written queries for pictorial information in a non-digital environment. Implications for the development of models for visual information retrieval, and for the design of Web search engines are discussed.

14.
Across the world, millions of users interact with search engines every day to satisfy their information needs. As the Web grows bigger over time, such information needs, manifested through user search queries, also become more complex. However, there has been no systematic study that quantifies the structural complexity of Web search queries. In this research, we make an attempt towards understanding and characterizing the syntactic complexity of search queries using a multi-pronged approach. We use traditional statistical language modeling techniques to quantify and compare the perplexity of queries with natural language (NL). We then use complex network analysis for a comparative analysis of the topological properties of queries issued by real Web users and those generated by statistical models. Finally, we conduct experiments to study whether search engine users are able to identify real queries, when presented along with model-generated ones. The three complementary studies show that the syntactic structure of Web queries is more complex than what n-grams can capture, but simpler than NL. Queries, thus, seem to represent an intermediate stage between syntactic and non-syntactic communication.
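
Perplexity under an n-gram model, the first of the three measures in this abstract, can be illustrated with a small bigram model. This is a minimal sketch, assuming add-one (Laplace) smoothing, whitespace tokenization, and a three-query toy corpus; the function names are hypothetical and the study's actual models and data are not described in the abstract.

```python
import math
from collections import Counter
from typing import List, Tuple


def train_bigram(corpus: List[str]) -> Tuple[Counter, Counter, int]:
    """Count bigram contexts and bigrams; return them with vocabulary size."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for query in corpus:
        tokens = ["<s>"] + query.split() + ["</s>"]
        contexts.update(tokens[:-1])             # each token used as a context
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent token pairs
        vocab.update(tokens[1:])                 # tokens that can be predicted
    return contexts, bigrams, len(vocab) + 1     # +1 slot for unseen words


def perplexity(query: str, contexts: Counter, bigrams: Counter,
               vocab_size: int) -> float:
    """Per-token perplexity of a query under the add-one smoothed model."""
    tokens = ["<s>"] + query.split() + ["</s>"]
    log_prob = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        prob = (bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size)
        log_prob += math.log(prob)
    return math.exp(-log_prob / (len(tokens) - 1))


corpus = ["cheap flights london", "cheap hotels london", "cheap flights paris"]
model = train_bigram(corpus)
print(perplexity("cheap flights london", *model))
print(perplexity("london flights cheap", *model))
```

A word order the model has seen scores a lower perplexity than the same words shuffled, which is the kind of contrast a perplexity comparison between queries and natural language relies on.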

15.
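An analysis of RSS-based search engine technology and its development trends   Total citations: 3, self-citations: 0, citations by others: 3
With the rapid growth of RSS resources, RSS-based search engines have emerged. RSS search engines are used in the same way as Google and Baidu: the user enters keywords to search for the desired content. The difference is that traditional portal search engines search the content of crawled Web pages, whereas RSS search engines directly search RSS feeds or Web pages that contain RSS feeds. RSS search engines offer high accuracy, a dynamic aggregation mechanism, and efficient, fast searching, making them the most important new type of information retrieval tool for current Web resources. In view of this, this paper offers a preliminary discussion of the state of research on RSS search engines in China and abroad, their technical characteristics, and their application mechanisms. On this basis, the author further looks ahead to the future of RSS-based search engine technology.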

16.
Search Engine for South-East Europe (SE4SEE) is a socio-cultural search engine running on the grid infrastructure. It offers a personalized, on-demand, country-specific, category-based Web search facility. The main goal of SE4SEE is to attack the page freshness problem by performing the search on the original pages residing on the Web, rather than on the previously fetched copies as done in the traditional search engines. SE4SEE also aims to obtain high download rates in Web crawling by making use of the geographically distributed nature of the grid. In this work, we present the architectural design issues and implementation details of this search engine. We conduct various experiments to illustrate performance results obtained on a grid infrastructure and justify the use of the search strategy employed in SE4SEE.

17.
Real time search is an increasingly important area of information seeking on the Web. In this research, we analyze 1,005,296 user interactions with a real time search engine over a 190 day period. Using query log analysis, we investigate searching behavior, categorize search topics, and measure the economic value of this real time search stream. We examine aggregate usage of the search engine, including number of users, queries, and terms. We then classify queries into subject categories using the Google Directory topical hierarchy. We next estimate the economic value of the real time search traffic using the Google AdWords keyword advertising platform. Results show that 30% of the queries were unique (used only once in the entire dataset), which is low compared to traditional Web searching. Also, 60% of the search traffic comes from the search engine’s application program interface, indicating that real time search is heavily leveraged by other applications. There are many repeated queries over time via these application program interfaces, perhaps indicating both long term interest in a topic and the polling nature of real time queries. Concerning search topics, the most used terms dealt with technology, entertainment, and politics, reflecting both the temporal nature of the queries and, perhaps, an early adopter user base. However, 36% of the queries indicate some geographical affinity, pointing to a location-based aspect to real time search. In terms of economic value, we calculate this real time search stream to be worth approximately US $33,000,000 (US $33 M) on the online advertising market at the time of the study. We discuss the implications for search engines and content providers as real time content increasingly enters the main stream as an information source.

18.
Students use general web search engines as their primary source of research while trying to find answers to school-related questions. Although search engines are highly relevant for the general population, they may return results that are out of educational context. Social community question answering websites, another rising trend, are the second choice for students who try to get answers from other peers online. We attempt to discover possible improvements in educational search by leveraging both of these information sources. For this purpose, we first implement a classifier for educational questions. This classifier is built by an ensemble method that employs several regular learning algorithms and retrieval based approaches that utilize external resources. We also build a query expander to facilitate classification. We further improve the classification using search engine results and obtain 83.5% accuracy. Although our work is entirely based on the Turkish language, the features could easily be mapped to other languages as well. In order to find out whether search engine ranking can be improved in the education domain using the classification model, we collect and label a set of query results retrieved from a general web search engine. We propose five ad-hoc methods to improve search ranking based on the idea that the query-document category relation is an indicator of relevance. We evaluate these methods for overall performance, for varying query length, and on factoid and non-factoid queries. We show that some of the methods significantly improve the rankings in the education domain.

19.
XML has become a universal standard for information exchange over the Web due to features such as simple syntax and extensibility. Processing queries over these documents has been the focus of several research groups. In fact, there is a broad literature on efficient XML query processing which explores indexes, fragmentation techniques, etc. However, for answering complex queries, existing approaches mainly analyze information that is explicitly defined in the XML document. A few works investigate the use of Prolog to increase the query possibilities, allowing inference over the data content. This can cause a significant increase in the query possibilities and expressive power, allowing access to non-obvious information. However, this requires translating the XML documents into Prolog facts. But for regular queries (which do not require inference), is this a good alternative? What kind of queries could benefit from the Prolog translation? Can we always use Prolog engines to execute XML queries in an efficient way? There are many questions involved in adopting an alternative approach to run XML queries. In this work, we investigate this matter by translating XML queries into Prolog queries and comparing the query processing times using Prolog and native XML engines. Our work contributes by providing a set of heuristics that helps users to decide when to use Prolog engines to process a given XML query. In summary, our results show that queries that search elements by a key value or by position (simple search) are more efficient when run in Prolog than in native XML engines. Also, queries over large datasets, or queries that search for substrings, perform better when run by native XML engines.

20.
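Zhao Fazhen. 《现代情报》 (Journal of Modern Information), 2013, 33(6): 91-95
Using the Yahoo! and Bing search engines, this paper obtains the total number of Web pages, total number of links, numbers of internal and external links, and PR values of 30 online community websites, and calculates their Web impact factors, among other measures. Grey relational analysis is then applied to produce a comprehensive ranking from these link indicators. The results show that the top-ranked of the 30 online community websites in terms of Web influence are: 51.com, Tencent Weibo, Tencent Blog, Tencent Forum, NetEase Weibo, NetEase Blog, Sina Blog, and Douban. Finally, by comparing the link data obtained from the Yahoo! and Bing search engines, the paper verifies that both search engines are viable for website link analysis, although analysis based on data collected with the Yahoo search engine is somewhat more accurate.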
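
Grey relational analysis, the ranking method named in this abstract, can be sketched with Deng's textbook formulation. Everything concrete below is an assumption for illustration: the function name, the three made-up sites and their indicator values, the treatment of every indicator as larger-is-better, the ideal reference sequence, and the distinguishing coefficient of 0.5. The paper's actual indicators and settings are not given in the abstract.

```python
from typing import Dict, List


def grey_relational_grades(data: Dict[str, List[float]],
                           rho: float = 0.5) -> Dict[str, float]:
    """Grey relational grade of each site against an ideal reference."""
    sites = list(data)
    columns = list(zip(*(data[s] for s in sites)))  # one tuple per indicator
    deltas = []
    for col in columns:
        lo, hi = min(col), max(col)
        span = hi - lo
        # Min-max normalize (larger is better); the ideal value is 1.0,
        # so the deviation from the reference is 1 - normalized value.
        deltas.append([1.0 - ((v - lo) / span if span else 1.0) for v in col])
    flat = [d for col in deltas for d in col]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0:  # all sites identical on every indicator
        return {s: 1.0 for s in sites}
    grades = {}
    for i, site in enumerate(sites):
        coeffs = [(d_min + rho * d_max) / (col[i] + rho * d_max)
                  for col in deltas]
        grades[site] = sum(coeffs) / len(coeffs)  # equal indicator weights
    return grades


# Hypothetical indicators per site: [pages, inbound links, PR value].
sites = {
    "site_a": [100, 50, 5],
    "site_b": [200, 100, 7],
    "site_c": [150, 20, 6],
}
grades = grey_relational_grades(sites)
print(sorted(grades, key=grades.get, reverse=True))
```

Sites are then ranked by grade, so a site that leads on every indicator receives a grade of 1 and the others fall below it according to how far they trail the best observed values.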
