首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 578 毫秒
1.
李慧 《现代情报》2015,35(2):159-164
排序算法的好坏很大程度上影响了搜索引擎的用户体验,尤其是近些年随着语义检索技术的发展,使其检索和排序的对象不仅仅局限于文档和网页,更包括了实体和关系等。在对现有研究与应用调研的基础上,对当前语义检索研究进行了综述,并按照排序的阶段将其分为实体排序、关系排序和本体文档排序,并详细阐述了每种排序算法的研究进展,最后指出,将用户的社会网络因素同已有的排序算法相结合,是未来语义排序的发展趋势之一。  相似文献   

2.
In the whole world, the internet is exercised by millions of people every day for information retrieval. Even for a small to smaller task like fixing a fan, to cook food or even to iron clothes persons opt to search the web. To fulfill the information needs of people, there are billions of web pages, each having a different degree of relevance to the topic of interest (TOI), scattered throughout the web but this huge size makes manual information retrieval impossible. The page ranking algorithm is an integral part of search engines as it arranges web pages associated with a queried TOI in order of their relevance level. It, therefore, plays an important role in regulating the search quality and user experience for information retrieval. PageRank, HITS, and SALSA are well-known page ranking algorithm based on link structure analysis of a seed set, but ranking given by them has not yet been efficient. In this paper, we propose a variant of SALSA to give sNorm(p) for the efficient ranking of web pages. Our approach relies on a p-Norm from Vector Norm family in a novel way for the ranking of web pages as Vector Norms can reduce the impact of low authority weight in hub weight calculation in an efficient way. Our study, then compares the rankings given by PageRank, HITS, SALSA, and sNorm(p) to the same pages in the same query. The effectiveness of the proposed approach over state of the art methods has been shown using performance measurement technique, Mean Reciprocal Rank (MRR), Precision, Mean Average Precision (MAP), Discounted Cumulative Gain (DCG) and Normalized DCG (NDCG). The experimentation is performed on a dataset acquired after pre-processing of the results collected from initial few pages retrieved for a query by the Google search engine. Based on the type and amount of in-hand domain expertise 30 queries are designed. The extensive evaluation and result analysis are performed using MRR, [email protected], MAP, DCG, and NDCG as the performance measuring statistical metrics. Furthermore, results are statistically verified using a significance test. Findings show that our approach outperforms state of the art methods by attaining 0.8666 as MRR value, 0.7957 as MAP value. Thus contributing to the improvement in the ranking of web pages more efficiently as compared to its counterparts.  相似文献   

3.
Topic distillation is one of the main information needs when users search the Web. Previous approaches for topic distillation treat single page as the basic searching unit, which has not fully utilized the structure information of the Web. In this paper, we propose a novel concept for topic distillation, named sub-site retrieval, in which the basic searching unit is sub-site instead of single page. A sub-site is the subset of a website, consisting of a structural collection of pages. The key of sub-site retrieval includes (1) extracting effective features for the representation of a sub-site using both the content and structure information, (2) delivering the sub-site-based retrieval results with a friendly and informative user interface. For the first point, we propose Punished Integration algorithm, which is based on the modeling of the growth of websites. For the second point, we design a user interface to better illustrate the search results of sub-site retrieval. Testing on the topic distillation task of TREC 2003 and 2004, sub-site retrieval leads to significant improvement of retrieval performance over the previous methods based on single pages. Furthermore, time complexity analysis shows that sub-site retrieval can be integrated into the index component of search engines.  相似文献   

4.
Students use general web search engines as their primary source of research while trying to find answers to school-related questions. Although search engines are highly relevant for the general population, they may return results that are out of educational context. Another rising trend; social community question answering websites are the second choice for students who try to get answers from other peers online. We attempt discovering possible improvements in educational search by leveraging both of these information sources. For this purpose, we first implement a classifier for educational questions. This classifier is built by an ensemble method that employs several regular learning algorithms and retrieval based approaches that utilize external resources. We also build a query expander to facilitate classification. We further improve the classification using search engine results and obtain 83.5% accuracy. Although our work is entirely based on the Turkish language, the features could easily be mapped to other languages as well. In order to find out whether search engine ranking can be improved in the education domain using the classification model, we collect and label a set of query results retrieved from a general web search engine. We propose five ad-hoc methods to improve search ranking based on the idea that the query-document category relation is an indicator of relevance. We evaluate these methods for overall performance, varying query length and based on factoid and non-factoid queries. We show that some of the methods significantly improve the rankings in the education domain.  相似文献   

5.
Document similarity search (i.e. query by example) aims to retrieve a ranked list of documents similar to a query document in a text corpus or on the Web. Most existing approaches to similarity search first compute the pairwise similarity score between each document and the query using a retrieval function or similarity measure (e.g. Cosine), and then rank the documents by the similarity scores. In this paper, we propose a novel retrieval approach based on manifold-ranking of document blocks (i.e. a block of coherent text about a subtopic) to re-rank a small set of documents initially retrieved by some existing retrieval function. The proposed approach can make full use of the intrinsic global manifold structure of the document blocks by propagating the ranking scores between the blocks on a weighted graph. First, the TextTiling algorithm and the VIPS algorithm are respectively employed to segment text documents and web pages into blocks. Then, each block is assigned with a ranking score by the manifold-ranking algorithm. Lastly, a document gets its final ranking score by fusing the scores of its blocks. Experimental results on the TDT data and the ODP data demonstrate that the proposed approach can significantly improve the retrieval performances over baseline approaches. Document block is validated to be a better unit than the whole document in the manifold-ranking process.  相似文献   

6.
The Web and especially major Web search engines are essential tools in the quest to locate online information for many people. This paper reports results from research that examines characteristics and changes in Web searching from nine studies of five Web search engines based in the US and Europe. We compare interactions occurring between users and Web search engines from the perspectives of session length, query length, query complexity, and content viewed among the Web search engines. The results of our research shows (1) users are viewing fewer result pages, (2) searchers on US-based Web search engines use more query operators than searchers on European-based search engines, (3) there are statistically significant differences in the use of Boolean operators and result pages viewed, and (4) one cannot necessary apply results from studies of one particular Web search engine to another Web search engine. The wide spread use of Web search engines, employment of simple queries, and decreased viewing of result pages may have resulted from algorithmic enhancements by Web search engine companies. We discuss the implications of the findings for the development of Web search engines and design of online content.  相似文献   

7.
We develop a two-phased survey design—based on the uses and gratifications approach and the theory of planned behavior—to analyze competitive relations between search engines and traditional information sources. We apply the survey design in a large-scale empirical study with 14-to 66-year-old Internet users (mean age 32) to find out whether complementary or substitutional dependencies predominate between search engines and three traditional information sources—paper-based encyclopedias and yellow pages and telephone-based directory assistance. We find that search engines, compared to the traditional alternatives, are gratifying a wider spread of users' needs. Although yellow pages and directory assistance are potentially substitutable, encyclopedias serve those needs that search engines cannot (yet) fulfill. The traditional media companies face increased competition, but do not necessarily have to be in an inferior competitive position.  相似文献   

8.
We develop a two-phased survey design—based on the uses and gratifications approach and the theory of planned behavior—to analyze competitive relations between search engines and traditional information sources. We apply the survey design in a large-scale empirical study with 14-to 66-year-old Internet users (mean age 32) to find out whether complementary or substitutional dependencies predominate between search engines and three traditional information sources—paper-based encyclopedias and yellow pages and telephone-based directory assistance. We find that search engines, compared to the traditional alternatives, are gratifying a wider spread of users' needs. Although yellow pages and directory assistance are potentially substitutable, encyclopedias serve those needs that search engines cannot (yet) fulfill. The traditional media companies face increased competition, but do not necessarily have to be in an inferior competitive position.  相似文献   

9.
曲琳琳 《情报科学》2021,39(8):132-138
【目的/意义】跨语言信息检索研究的目的即在消除因语言的差异而导致信息查询的困难,提高从大量纷繁 复杂的查找特定信息的效率。同时提供一种更加方便的途径使得用户能够使用自己熟悉的语言检索另外一种语 言文档。【方法/过程】本文通过对国内外跨语言信息检索的研究现状分析,介绍了目前几种查询翻译的方法,包括: 直接查询翻译、文献翻译、中间语言翻译以及查询—文献翻译方法,对其效果进行比较,然后阐述了跨语言检索关 键技术,对使用基于双语词典、语料库、机器翻译技术等产生的歧义性提出了解决方法及评价。【结果/结论】使用自 然语言处理技术、共现技术、相关反馈技术、扩展技术、双向翻译技术以及基于本体信息检索技术确保知识词典的 覆盖度和歧义性处理,通过对跨语言检索实验分析证明采用知识词典、语料库和搜索引擎组合能够提高查询效 率。【创新/局限】本文为了解决跨语言信息检索使用词典、语料库中词语缺乏的现象,提出通过搜索引擎从网页获 取信息资源来充实语料库中语句对不足的问题。文章主要针对中英文信息检索问题进行了探讨,解决方法还需要 进一步研究,如中文切词困难以及字典覆盖率低等严重影响检索的效率。  相似文献   

10.
郭金兰 《现代情报》2009,29(7):53-56
作者通过调研说明了政府网站优化的必要性,并且提出了政府网站优化的方法,如避免复杂的结构,修改关键字结构与密度,加入一些合适的外部链接,可以让一般的政府职能部门网站在搜索引擎上获得很好排名,但对政府门户网站优化还要精心布局网站内部结构,巧妙使用网站内部链接来提高其子页在搜索引擎上的排名。  相似文献   

11.
A fast and efficient page ranking mechanism for web crawling and retrieval remains as a challenging issue. Recently, several link based ranking algorithms like PageRank, HITS and OPIC have been proposed. In this paper, we propose a novel recursive method based on reinforcement learning which considers distance between pages as punishment, called “DistanceRank” to compute ranks of web pages. The distance is defined as the number of “average clicks” between two pages. The objective is to minimize punishment or distance so that a page with less distance to have a higher rank. Experimental results indicate that DistanceRank outperforms other ranking algorithms in page ranking and crawling scheduling. Furthermore, the complexity of DistanceRank is low. We have used University of California at Berkeley’s web for our experiments.  相似文献   

12.
Many machine learning technologies such as support vector machines, boosting, and neural networks have been applied to the ranking problem in information retrieval. However, since originally the methods were not developed for this task, their loss functions do not directly link to the criteria used in the evaluation of ranking. Specifically, the loss functions are defined on the level of documents or document pairs, in contrast to the fact that the evaluation criteria are defined on the level of queries. Therefore, minimizing the loss functions does not necessarily imply enhancing ranking performances. To solve this problem, we propose using query-level loss functions in learning of ranking functions. We discuss the basic properties that a query-level loss function should have and propose a query-level loss function based on the cosine similarity between a ranking list and the corresponding ground truth. We further design a coordinate descent algorithm, referred to as RankCosine, which utilizes the proposed loss function to create a generalized additive ranking model. We also discuss whether the loss functions of existing ranking algorithms can be extended to query-level. Experimental results on the datasets of TREC web track, OHSUMED, and a commercial web search engine show that with the use of the proposed query-level loss function we can significantly improve ranking accuracies. Furthermore, we found that it is difficult to extend the document-level loss functions to query-level loss functions.  相似文献   

13.
网络信息分类检索问题研究   总被引:4,自引:0,他引:4  
This paper studies network information classification retrieval from the theory of information management. With a brief introduction to search engines, it focuses on analyzing the characteristics of network documents and their classification system. Problems in network document classification are pointed out. Suggestions such as constructing a catalog classification search engine system are made.  相似文献   

14.
Searching for relevant material that satisfies the information need of a user, within a large document collection is a critical activity for web search engines. Query Expansion techniques are widely used by search engines for the disambiguation of user’s information need and for improving the information retrieval (IR) performance. Knowledge-based, corpus-based and relevance feedback, are the main QE techniques, that employ different approaches for expanding the user query with synonyms of the search terms (word synonymy) in order to bring more relevant documents and for filtering documents that contain search terms but with a different meaning (also known as word polysemy problem) than the user intended. This work, surveys existing query expansion techniques, highlights their strengths and limitations and introduces a new method that combines the power of knowledge-based or corpus-based techniques with that of relevance feedback. Experimental evaluation on three information retrieval benchmark datasets shows that the application of knowledge or corpus-based query expansion techniques on the results of the relevance feedback step improves the information retrieval performance, with knowledge-based techniques providing significantly better results than their simple relevance feedback alternatives in all sets.  相似文献   

15.
刘俊熙  盛宇 《现代情报》2009,29(3):143-145
垂直搜索被普遍认为将是下个潜力市场,是搜索引擎的细分和延伸。是对某类网页资源和结构化资源的深度整合。本文综合分析了垂直搜索的特性,并从信息采集、信息索引和信息处理方面分析其同通用搜索引擎的差异,然后通过垂直搜索在电子政务上的强势进入的案例来分析其应用发站的特性。  相似文献   

16.
Recent research in the human computer interaction and information retrieval areas has revealed that search response latency exhibits a clear impact on the user behavior in web search. Such impact is reflected both in users’ subjective perception of the usability of a search engine and in their interaction with the search engine in terms of the number of search results they engage with. However, a similar impact analysis has been missing so far in the context of sponsored search. Since the predominant business model for commercial search engines is advertising via sponsored search results (i.e., search advertisements), understanding how response latency influences the user interaction with the advertisements displayed on the search engine result pages is crucial to increase the revenue of a commercial search engine. To this end, we conduct a large-scale analysis using query logs obtained from a commercial web search. We analyze the short-term and long-term impact of search response latency on the querying and clicking behaviors of users using desktop and mobile devices to access the search engine, as well as the corresponding impact on the revenue of the search engine. This analysis demonstrates the importance of serving sponsored search results with low latency and provides insight into the ad serving policy of commercial search engines to ensure long-term user engagement and search revenue.  相似文献   

17.
搜索引擎搜索结果的评价技术   总被引:6,自引:0,他引:6  
陶跃华  孙茂松 《情报科学》2001,19(8):861-863,873
搜索引擎根据用户查询在自己的索引数据库中进行查找,并根据相关性分析将查询结果返回给用户。本文就传统信息检索系统的性能效率评价技术,针对Internet的特点,对搜索引擎搜索结果的评价进行了讨论。  相似文献   

18.
利用分类法和主题法改善搜索引擎的性能   总被引:6,自引:0,他引:6  
苏瑞竹  吴英姿 《情报科学》2001,19(11):1170-1175
本文对Internet上的检索工具搜索引擎的工作机理和性能进行了全方位的探讨,指出了常见搜索引擎信息检索缺点。同时还指出了Meta搜索引擎、智能搜索引擎和代理搜索引擎虽然提高了网络信息检索的质量,但由于分类体系不统一,类目划分标准模糊,因而仍然未能从根本上改变搜索引擎主要以关键词(自然语言)作为检索入口的现状,不能实现分类检索与主题检索的一体化。要实现搜索引擎信息检索的突破,笔者认为有必要运用情报检索语言的理论和方法来完善因特网搜索引擎的性能,实现分类、主题一体化的检索机制,克服分类检索语言单纯以学科聚类、主题语言单纯以事物聚类的局限性。  相似文献   

19.
Comparing rankings of search results on the Web   总被引:1,自引:0,他引:1  
The Web has become an information source for professional data gathering. Because of the vast amounts of information on almost all topics, one cannot systematically go over the whole set of results, and therefore must rely on the ordering of the results by the search engine. It is well known that search engines on the Web have low overlap in terms of coverage. In this study we measure how similar are the rankings of search engines on the overlapping results.We compare rankings of results for identical queries retrieved from several search engines. The method is based only on the set of URLs that appear in the answer sets of the engines being compared. For comparing the similarity of rankings of two search engines, the Spearman correlation coefficient is computed. When comparing more than two sets Kendall’s W is used. These are well-known measures and the statistical significance of the results can be computed. The methods are demonstrated on a set of 15 queries that were submitted to four large Web search engines. The findings indicate that the large public search engines on the Web employ considerably different ranking algorithms.  相似文献   

20.
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号