
1.

Document replication strategies for geographically distributed web search engines

Enver Kayaaslan B. Barla Cambazoglu Cevdet Aykanat 《Information processing & management》2013

Large-scale web search engines are composed of multiple data centers that are geographically distant from each other. Typically, a user query is processed in a data center that is geographically close to the origin of the query, over a replica of the entire web index. Compared to a centralized, single-center search engine, this architecture offers lower query response times as the network latencies between the users and data centers are reduced. However, it does not scale well with increasing index sizes and query traffic volumes because queries are evaluated on the entire web index, which has to be replicated and maintained in all data centers. As a remedy to this scalability problem, we propose a document replication framework in which documents are selectively replicated on data centers based on regional user interests. Within this framework, we propose three different document replication strategies, each optimizing a different objective: reducing the potential search quality loss, the average query response time, or the total query workload of the search system. For all three strategies, we consider two alternative types of capacity constraints on index sizes of data centers. Moreover, we investigate the performance impact of query forwarding and result caching. We evaluate our strategies via detailed simulations, using a large query log and a document collection obtained from the Yahoo! web search engine.

2.

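A comparative study of different search engines in web impact factor analysis Total citations: 11, self-citations: 0, citations by others: 11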

吴茵茵《情报科学》2005,23(3):431-435

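The web impact factor is an important branch of webometrics research, and search engines play an important role in the study of web impact factors. Using three search engines, this paper analyzes the overall web impact factors of 10 Chinese universities and compares the search engines with one another.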

3.

Research on a topic-specific search engine based on Heritrix and Lucene

贾超卫文学《中国科技信息》2012,(10):95-96

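A topic-specific search engine, also called a vertical search engine, mainly serves user needs in a particular domain. Heritrix is an open-source web crawler, but its WebUI launch mode is not easy for general users to use. This paper departs from the usual way of using Heritrix: it abandons the WebUI launch mode, modifies the Heritrix source code, and integrates Lucene into Heritrix to build a complete search engine. Listeners monitor the search engine's state so that it can crawl and update its data automatically. The paper also adds a web page filtering module and improves the ranking algorithm for query results, which improves the search engine's usability and query accuracy.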

4.

Bid keyword suggestion in sponsored search based on competitiveness and relevance

Ying Zhang Weinan Zhang Bin Gao Xiaojie Yuan Tie-Yan Liu 《Information processing & management》2014

In sponsored search, many advertisers do not achieve their expected performance, while the search engine also has considerable room to improve its revenue. Specifically, due to improper keyword bidding, many advertisers cannot survive the competitive ad auctions to get their desired ad impressions; meanwhile, a significant portion of search queries have no ads displayed in their search result pages, even though many of them have commercial value. We propose recommending a group of relevant yet less competitive keywords to an advertiser. The advertiser then has the chance to win some (originally empty) ad slots and accumulate a number of impressions. At the same time, the revenue of the search engine can also be boosted, since many empty ad slots are filled. Mathematically, we model the problem as a mixed integer programming problem, which maximizes the advertiser revenue and the relevance of the recommended keywords, while minimizing the keyword competitiveness, subject to the bid and budget constraints. By solving the problem, we can offer an optimal group of keywords and their optimal bid prices to an advertiser. Simulation results show that the proposed method is highly effective in increasing ad impressions, expected clicks, advertiser revenue, and search engine revenue.

5.

How are we searching the World Wide Web? A comparison of nine search engine transaction logs

Bernard J. Jansen Amanda Spink 《Information processing & management》2006

The Web, and especially major Web search engines, are essential tools for many people in the quest to locate online information. This paper reports results from research that examines characteristics and changes in Web searching from nine studies of five Web search engines based in the US and Europe. We compare interactions occurring between users and Web search engines from the perspectives of session length, query length, query complexity, and content viewed among the Web search engines. The results of our research show that (1) users are viewing fewer result pages, (2) searchers on US-based Web search engines use more query operators than searchers on European-based search engines, (3) there are statistically significant differences in the use of Boolean operators and result pages viewed, and (4) one cannot necessarily apply results from studies of one particular Web search engine to another Web search engine. The widespread use of Web search engines, employment of simple queries, and decreased viewing of result pages may have resulted from algorithmic enhancements by Web search engine companies. We discuss the implications of the findings for the development of Web search engines and design of online content.

6.

Exploring features for the automatic identification of user goals in web search

Mauro Rojas Herrera Edleno Silva de Moura Marco Cristo Thomaz Philippe Silva Altigran Soares da Silva 《Information processing & management》2010

Queries submitted to search engines can be classified according to the user goals into three distinct categories: navigational, informational, and transactional. Such classification may be useful, for instance, as additional information for advertisement selection algorithms and for search engine ranking functions, among other possible applications. This paper presents a study of the impact of using several features extracted from the document collection and query logs on the task of automatically identifying the users’ goals behind their queries. We propose the use of new features not previously reported in the literature and study their impact on the quality of the query classification task. Further, we study the impact of each feature on different web collections, showing that the choice of the best set of features may change according to the target collection.

7.

Multimedia search capabilities of Chinese language search engines

Yun-Ke Chang Miguel A. Morales-Arroyo Amanda Spink 《Information processing & management》2010

This paper reports results from a study exploring the multimedia search functionality of Chinese language search engines. Web searching in Chinese (Mandarin) is a growing research area and a technical challenge for popular commercial Web search engines. Few studies have been conducted on Chinese language search engines. We investigate two research questions: which Chinese language search engines provide multimedia searching, and what multimedia search functionalities are available in Chinese language Web search engines. Specifically, we examine each Web search engine’s (1) features permitting Chinese language multimedia searches, (2) extent of search personalization and user control of multimedia search variables, and (3) the relationships between Web search engines and their features in the Chinese context. Key findings show that Chinese language Web search engines offer limited multimedia search functionality, and general search engines provide a wider range of features than specialized multimedia search engines. Study results have implications for Chinese Web users, Website designers and Web search engine developers.

8.

A study of results overlap and uniqueness among major Web search engines

Amanda Spink Bernard J. Jansen Chris Blakely Sherry Koshman 《Information processing & management》2006

The performance and capabilities of Web search engines are an important area of research. Millions of people worldwide use Web search engines every day. This paper reports the results of a major study examining the overlap among results retrieved by multiple Web search engines for a large set of more than 10,000 queries. Previous smaller studies have discussed a lack of overlap in results returned by Web search engines for the same queries. The goal of the current study was to measure, at large scale, the overlap of search results on the first result page (both non-sponsored and sponsored) across the four most popular Web search engines, at specific points in time using a large number of queries. The Web search engines included in the study were MSN Search, Google, Yahoo! and Ask Jeeves. Our study then compares these results with the first page results retrieved for the same queries by the metasearch engine Dogpile.com. Two sets of randomly selected user-entered queries from Infospace’s Dogpile.com search engine, one of 10,316 queries and the other of 12,570 queries (the first set was from Dogpile, the second from across the Infospace Network of search properties), were submitted to the four single Web search engines. Findings show that the percent of total results unique to only one of the four Web search engines was 84.9%, shared by two of the Web search engines was 11.4%, shared by three of the Web search engines was 2.6%, and shared by all four Web search engines was 1.1%. This small degree of overlap shows the significant difference in the way major Web search engines retrieve and rank results in response to given queries. Results point to the value of metasearch engines in Web retrieval to overcome the biases of individual search engines.
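
The overlap breakdown reported in this abstract (share of results returned by exactly one, two, three, or all four engines) can be illustrated with a minimal sketch. The function name `overlap_distribution`, the engine labels, and the toy URLs are all hypothetical and are not taken from the study; the sketch assumes results are compared by exact URL match, which may differ from the matching rule the authors used.

```python
from collections import Counter


def overlap_distribution(results_by_engine):
    """Return {k: percent of distinct results returned by exactly k engines}.

    results_by_engine maps an engine label to its list of first-page result URLs.
    """
    engines_per_url = Counter()
    for urls in results_by_engine.values():
        for url in set(urls):  # count each engine at most once per URL
            engines_per_url[url] += 1

    total = len(engines_per_url)
    n_engines = len(results_by_engine)
    if total == 0:
        return {k: 0.0 for k in range(1, n_engines + 1)}

    # How many distinct URLs were returned by exactly k engines
    urls_per_k = Counter(engines_per_url.values())
    return {k: 100.0 * urls_per_k.get(k, 0) / total for k in range(1, n_engines + 1)}


# Toy data (hypothetical): four engines, six distinct URLs in total
toy_results = {
    "engine_a": ["u1", "u2", "u3"],
    "engine_b": ["u1", "u4"],
    "engine_c": ["u1", "u2", "u5"],
    "engine_d": ["u1", "u6"],
}
distribution = overlap_distribution(toy_results)
```

In the toy data, four of the six distinct URLs appear in only one engine's results, one is shared by two engines, and one by all four, so the percentages sum to 100 in the same way as the figures quoted in the abstract.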

9.

Users can change their web search tactics: Design guidelines for categorized overviews

Bill Kules Ben Shneiderman 《Information processing & management》2008

Categorized overviews of web search results are a promising way to support user exploration, understanding, and discovery. These search interfaces combine a metadata-based overview with the list of search results to enable a rich form of interaction. A study of 24 sophisticated users carrying out complex tasks suggests how searchers may adapt their search tactics when using categorized overviews. This mixed methods study evaluated categorized overviews of web search results organized into thematic, geographic, and government categories. Participants conducted four exploratory searches during a 2-hour session to generate ideas for newspaper articles about specified topics such as “human smuggling.” Results showed that subjects explored deeper while feeling more organized, and that the categorized overview helped subjects better assess their results, although no significant differences were detected in the quality of the article ideas. A qualitative analysis of searcher comments identified seven tactics that participants reported adopting when using categorized overviews. This paper concludes by proposing a set of guidelines for the design of exploratory search interfaces. An understanding of the impact of categorized overviews on search tactics will be useful to web search researchers, search interface designers, information architects and web developers.

10.

A search-engine-based evaluation method for Chinese word segmentation

王华栋饶培伦《情报科学》2007,25(1):108-112

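The result of Chinese word segmentation is an important factor affecting the quality of Chinese retrieval results in search engines; accurate and effective segmentation is critical to improving both the relevance of search results and user satisfaction. This paper reviews and organizes the theoretical basis for evaluating Chinese word segmentation and establishes a complete search-engine-based evaluation method. The method covers the extraction of evaluation samples, the selection of evaluators, the formulation of evaluation criteria, and the design of the evaluation workflow. A case analysis shows that the method is effective. On this basis, the authors discuss the experimental evaluation results in depth and offer several suggestions for improving evaluation, including how to take evaluators' backgrounds into account and how to select evaluation items.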

11.

Analysis of multiple query reformulations on the web: The interactive information retrieval context

Soo Young Rieh Hong Xie 《Information processing & management》2006

This study examines the facets and patterns of multiple Web query reformulations with a focus on reformulation sequences. Based on IR interaction models, it was presumed that query reformulation is the product of the interaction between the user and the IR system. Query reformulation also reflects the interplay between the surface and deeper levels of user interaction. Query logs were collected from a Web search engine through the selection of search sessions in which users submitted six or more unique queries per session. The final data set was composed of 313 search sessions. Three facets of query reformulation (content, format, and resource) as well as nine sub-facets were derived from the data. In addition, analysis of modification sequences identified eight distinct patterns: specified, generalized, parallel, building-block, dynamic, multitasking, recurrent, and format reformulation. Adapting Saracevic’s stratified model, the authors develop a model of Web query reformulation based on the results of the study. The implications for Web search engine design are finally discussed and the functions of an interactive reformulation tool are suggested.

12.

The influence of task and gender on search and evaluation behavior using Google

Lori Lorigo Bing Pan Helene Hembrooke Thorsten Joachims Laura Granka Geri Gay 《Information processing & management》2006

Efforts to improve search engine effectiveness have brought increased interest in gathering additional feedback about users’ information needs that goes beyond the queries they type in. Adaptive search engines use explicit and implicit feedback indicators to model users or search tasks. In order to create appropriate models, it is essential to understand how users interact with search engines, including the determining factors of their actions. Using eye tracking, we extend this understanding by analyzing the sequences and patterns with which users evaluate the query results returned to them when using Google. We find that the query result abstracts are viewed in the order of their ranking in only about one fifth of the cases, and only an average of about three abstracts per result page are viewed at all. We also compare search behavior variability with respect to different classes of users and different classes of search tasks to reveal whether user models or task models may be greater predictors of behavior. We discover that gender and task significantly influence different kinds of search behaviors discussed here. The results suggest improvements to query-based search interface designs with respect to both their use of space and workflow.

13.

A heuristic hierarchical scheme for academic search and retrieval

Emmanouil Amolochitis Ioannis T. Christou Zheng-Hua Tan Ramjee Prasad 《Information processing & management》2013

We present PubSearch, a hybrid heuristic scheme for re-ranking academic papers retrieved from standard digital libraries such as the ACM Portal. The scheme is based on the hierarchical combination of a custom implementation of the term frequency heuristic, a time-depreciated citation score and a graph-theoretic computed score that relates the paper’s index terms with each other. We designed and developed a meta-search engine that submits user queries to standard digital repositories of academic publications and re-ranks the repository results using the hierarchical heuristic scheme. We evaluate our proposed re-ranking scheme via user feedback against the results of ACM Portal on a total of 58 different user queries submitted by 15 different users. The results show that our proposed scheme significantly outperforms ACM Portal in terms of retrieval precision as measured by the most common metrics in Information Retrieval, including Normalized Discounted Cumulative Gain (NDCG) and Expected Reciprocal Rank (ERR), as well as a newly introduced lexicographic rule (LEX) for ranking search results. In particular, PubSearch outperforms ACM Portal by more than 77% in terms of ERR, by more than 11% in terms of NDCG, and by more than 907.5% in terms of LEX. We also re-rank the top-10 results of a subset of the original 58 user queries produced by Google Scholar, Microsoft Academic Search, and ArnetMiner; the results show that PubSearch compares very well against these search engines as well. The proposed scheme can be easily plugged into any existing search engine for retrieval of academic publications.
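
For readers unfamiliar with the two standard metrics named in this abstract, the following is a minimal sketch of NDCG and ERR. It uses common textbook formulations (exponential gain `2^rel - 1` with a `log2` discount for NDCG, and the cascade-model definition of ERR); the abstract does not say which variants the authors used, so these choices, the function names, and the graded relevance scale are assumptions. The LEX rule introduced in the paper is not reproduced here because the abstract does not define it.

```python
import math


def dcg(rels, k=None):
    """Discounted cumulative gain of a ranked list of graded relevance labels."""
    rels = rels[:k] if k is not None else rels
    # Rank 1 has discount log2(2) = 1, rank 2 has log2(3), and so on
    return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(rels))


def ndcg(rels, k=None):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(rels, reverse=True), k)
    return dcg(rels, k) / ideal if ideal > 0 else 0.0


def err(rels, max_grade):
    """Expected reciprocal rank under the cascade user model."""
    p_reach = 1.0  # probability the user reaches the current rank
    score = 0.0
    for rank, r in enumerate(rels, start=1):
        p_stop = (2 ** r - 1) / (2 ** max_grade)  # chance the user is satisfied here
        score += p_reach * p_stop / rank
        p_reach *= 1.0 - p_stop
    return score
```

As a usage illustration, `ndcg([3, 2, 1, 0])` evaluates to 1.0 because the list is already in ideal order, while swapping relevant and irrelevant items lowers the score.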

14.

Using a new relational concept to improve the clustering performance of search engines

Lin-Chih Chen 《Information processing & management》2011

In this paper, we present a novel clustering algorithm to generate a number of candidate clusters from other web search results. The candidate clusters generate a connective relation among the clusters and the relation is semantic. Moreover, the algorithm also contains the following attractive properties: (1) it can be applied to multilingual web documents, (2) it improves the clustering performance of any search engine, (3) its unsupervised learning can automatically identify potentially relevant knowledge without using any corpus, and (4) clustering results are generated on the fly and fitted into search engines.

15.

Improving the performance of personal name disambiguation using web directories

Quang Minh Vu Atsuhiro Takasu Jun Adachi 《Information processing & management》2008

Users frequently ask search engines on the World Wide Web for information about people by personal name. Current search engines only return sets of documents containing the name queried, but, as several people usually share a personal name, the resulting sets often contain documents relevant to several people. It is necessary to disambiguate people in these result sets to help users find the person of interest more readily. In the task of name disambiguation, effective measurement of similarities in the documents is a crucial step towards the final disambiguation. We propose a new method that uses web directories as a knowledge base to find common contexts in documents and uses the common contexts measure to determine document similarities. Experiments, conducted on documents mentioning real people on the web, together with several famous web directory structures, suggest that there are significant advantages in using web directories to disambiguate people compared with other conventional methods.

16.

SSEM: a domain-based semantic search engine model

李江华时鹏《情报杂志》2012,31(4):112-116

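The Internet has become the world's richest data source, with diverse and dynamically changing data types; how to retrieve the information users need from it quickly and accurately is a pressing problem. Traditional search engines search on a syntactic basis and lack semantic information, so they struggle to express accurately either users' query needs or the semantics of the documents being retrieved, which results in low precision and recall and a limited search scope. This paper studies existing semantic retrieval methods and analyzes their problems. On this basis it proposes a domain-based semantic search engine model that, combined with Semantic Web technologies, uses a domain ontology metadata model to normalize user queries semantically, and extracts knowledge from documents according to the domain ontology schema and converts it to RDF. The model accurately expresses both the user's query semantics and the semantics of the documents being queried, which can greatly improve retrieval accuracy and efficiency. The paper describes the model's architecture, basic functions, and working principles in detail.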

17.

Improving educational web search for question-like queries through subject classification

Tolga Yilmaz Rifat Ozcan Ismail Sengor Altingovde Özgür Ulusoy 《Information processing & management》2019,56(1):228-246

Students use general web search engines as their primary source of research while trying to find answers to school-related questions. Although search engines are highly relevant for the general population, they may return results that are out of educational context. Social community question answering websites, another rising trend, are the second choice for students who try to get answers from other peers online. We attempt to discover possible improvements in educational search by leveraging both of these information sources. For this purpose, we first implement a classifier for educational questions. This classifier is built by an ensemble method that employs several regular learning algorithms and retrieval-based approaches that utilize external resources. We also build a query expander to facilitate classification. We further improve the classification using search engine results and obtain 83.5% accuracy. Although our work is entirely based on the Turkish language, the features could easily be mapped to other languages as well. In order to find out whether search engine ranking can be improved in the education domain using the classification model, we collect and label a set of query results retrieved from a general web search engine. We propose five ad-hoc methods to improve search ranking based on the idea that the query-document category relation is an indicator of relevance. We evaluate these methods for overall performance, for varying query length, and separately for factoid and non-factoid queries. We show that some of the methods significantly improve the rankings in the education domain.

18.

A review of the best resources on foreign web search engines (1): desktop search tools, search engine guides, directories, and publications

赵金海《现代情报》2007,27(3):62-64

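Starting from desktop search tools, search engine guides, directories, and publications, this paper reviews the types, performance, and features of the main existing foreign resources that discuss search engines. On this basis, it recommends the best resources on search engines, providing reference material and learning paths for people who want to learn about search engine resources, search techniques and methods, and optimized retrieval strategies.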

19.

Web-based image search engines Total citations: 1, self-citations: 0, citations by others: 1

蔡颖《情报科学》2002,20(10):1075-1077

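With the rapid spread of the Internet and the vigorous rollout of broadband networks, image information on the web is expanding rapidly and multimedia files are increasingly numerous; at the same time, users' demand for online image search keeps growing. Against this background, traditional text search can no longer meet users' special needs: how can one find the needed images or multimedia files on the web more conveniently and quickly? Various Web-based image search engines have emerged in response. Each works in a different way, and together they make searching for image information on the web very simple. This paper introduces image search engines from three aspects: their working principles, their search methods, and the major image search engines in China and abroad.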

20.

A review of excellent resources on foreign web search engines: search engine websites, forums, news, and academic conference resources

赵金海赵西安《现代情报》2008,28(1):218-220,223

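Starting from excellent search engines, search engine websites, search engine forums, search engine news, and search engine conferences, this paper reviews the main existing foreign resources that discuss search engines, along with their types, performance, and features. On this basis, it recommends the best resources on search engines, providing reference material and learning paths for people who want to learn about search engine resources, search techniques and methods, and optimized retrieval strategies.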