Similar documents
 20 similar documents found; search took 500 ms
1.
The combination of evidence can increase retrieval effectiveness. In this paper, we investigate the effectiveness of a decision mechanism for the selective combination of evidence for Web Information Retrieval and particularly for topic distillation. We introduce two measures of a query’s broadness and use them to select an appropriate combination of evidence for each query. The results from our experiments show that there is a statistically significant association between the output of the decision mechanism and the relative effectiveness of the different combinations of evidence. Moreover, we show that the proposed methodology can be applied in an operational setting, where relevance information is not available, by setting the decision mechanism’s thresholds automatically.

2.
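Addressing the representation and querying of provenance records in Web applications, this paper first clarifies the main concepts involved. Based on an in-depth analysis of how provenance records are located, passed on and implemented in Web applications, it identifies four types of mechanisms for locating and discovering provenance metadata and two types of query mechanisms. Combining semantically annotated Web pages with provenance representation techniques, it uses a case study of tracing an online paper to display HTML pages containing RDFa provenance records and to reveal provenance visually, and finally discusses the query service issues raised by the case.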

3.
4.
The collective feedback of the users of an Information Retrieval (IR) system has been shown to provide semantic information that, while hard to extract using standard IR techniques, can be useful in Web mining tasks. In the last few years, several approaches have been proposed to process the logs stored by Internet Service Providers (ISP), Intranet proxies or Web search engines. However, the solutions proposed in the literature only partially represent the information available in the Web logs. In this paper, we propose to use a richer data structure, which is able to preserve most of the information available in the Web logs. This data structure consists of three groups of entities: users, documents and queries, which are connected in a network of relations. Query refinements correspond to separate transitions between the corresponding query nodes in the graph, while users are linked to the queries they have issued and to the documents they have selected. The classical query/document transitions, which connect a query to the documents selected by the users in the returned result page, are also considered. The resulting data structure is a complete representation of the collective search activity performed by the users of a search engine or of an Intranet. The experimental results show that this more powerful representation can be successfully used in several Web mining tasks like discovering semantically relevant query suggestions and Web page categorization by topic.
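
The data structure described in this abstract can be pictured with a small sketch. The log format, the field names and the rule that two consecutive distinct queries by the same user count as a refinement are all assumptions made for illustration; the abstract does not specify them.

```python
from collections import defaultdict

# Hypothetical log records: (user, query, clicked_document or None).
# The values are illustrative and not taken from the paper.
LOG = [
    ("u1", "jaguar", None),
    ("u1", "jaguar car", "doc_a"),
    ("u2", "jaguar car", "doc_a"),
    ("u2", "jaguar car", "doc_b"),
]


def build_graph(log):
    """Build typed edge sets linking users, queries and documents."""
    edges = defaultdict(set)
    last_query = {}  # most recent query issued by each user
    for user, query, doc in log:
        edges["user-query"].add((user, query))
        prev = last_query.get(user)
        if prev is not None and prev != query:
            # Assumption: consecutive distinct queries by one user
            # are treated as a query refinement.
            edges["query-query"].add((prev, query))
        last_query[user] = query
        if doc is not None:
            edges["user-doc"].add((user, doc))
            edges["query-doc"].add((query, doc))
    return edges


graph = build_graph(LOG)
```

Keeping one edge set per relation type makes it possible to use, for example, only the query-query edges for query suggestion or only the query-doc edges for page categorization.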

5.
Transaction logs of NAVER, a major Korean Web search engine, were analyzed to track the information-seeking behavior of Korean Web users. These transaction logs include more than 40 million queries collected over 1 week. This study examines current transaction log analysis methodologies and proposes a method for log cleaning, session definition, and query classification. A term definition method which is necessary for Korean transaction log analysis is also discussed. The results of this study show that users behave in a simple way: they type in short queries with a few query terms, seldom use advanced features, and view few result pages. Users also behave in a passive way: they seldom change search environments set by the system. It is of interest that users tend to change their queries totally rather than adding or deleting terms to modify the previous queries. The results of this study might contribute to the development of more efficient and effective Web search engines and services.

6.
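A feasibility study of evaluating Web information using link relationships   Total citations: 43, self-citations: 3, citations by others: 40
Liu Yanshu (刘雁书), Fang Ping (方平). 《情报学报》 (Journal of the China Society for Scientific and Technical Information), 2002, 21(4): 401-406
Objective: To assess the feasibility of using link relationships to evaluate Web information, by surveying the link characteristics of representative websites and the types and characteristics of external inlinks. Methods: The search engine FastSearch was used to retrieve the number of inlinks to general-purpose and specialized websites, and the number of links to Sina (新浪) from different types of web pages, in order to analyze the sites' link characteristics; the types and characteristics of Sina's external inlinks were also analyzed. Results: The inlink counts of general-purpose and specialized websites differ markedly, and external links account for a larger share of total links than internal links. Sina is widely linked from pages in different languages and countries. External links fall into six types: recommendation, cooperation, related, resource, communication and advertising links. The first three types all point to Sina's home page and channels and are of higher value for evaluating Web information. Conclusion: External links reflect the overall extent to which the linked pages are used and recommended, and are positively associated with the quality of the linked pages; using external links to evaluate Web information is therefore feasible.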

7.
Query suggestions have become pervasive in modern web search, as a mechanism to guide users towards a better representation of their information need. In this article, we propose a ranking approach for producing effective query suggestions. In particular, we devise a structured representation of candidate suggestions mined from a query log that leverages evidence from other queries with a common session or a common click. This enriched representation not only helps overcome data sparsity for long-tail queries, but also leads to multiple ranking criteria, which we integrate as features for learning to rank query suggestions. To validate our approach, we build upon existing efforts for web search evaluation and propose a novel framework for the quantitative assessment of query suggestion effectiveness. Thorough experiments using publicly available data from the TREC Web track show that our approach provides effective suggestions for ad hoc and diversity search.

8.
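A study of methods for accurately searching for Chinese institutions in Web of Science   Total citations: 1, self-citations: 0, citations by others: 1
Analyzes the causes of missed, duplicate and false retrievals when searching for Chinese institutions in Web of Science; describes the characteristics of two classes of Chinese institution names and the search method suited to each; summarizes a five-step method for writing search queries for Chinese institutions, illustrated with a worked example; and offers suggestions for improving the search functions of Web of Science.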

9.
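Hou Li (侯丽), Li Jiao (李姣), Hou Zhen (侯震), Chen Songjing (陈松景). 《图书情报工作》 (Library and Information Service), 2015, 59(23): 115-123
[Purpose/significance] To discover the health terms used by the public from Internet query data, providing a basis for mapping consumer health terms to professional medical terms and thereby improving knowledge organization and management in health knowledge service platforms. [Method/process] A recognition model for new health terms is designed that combines rules with N-Gram. Public query data are collected and experiments are run; over several rounds the filtering corpora are progressively refined and, together with manual review, the scheme is optimized and its effectiveness verified. [Result/conclusion] Rules are extracted from questions asked by the public on the Internet and combined with a statistical algorithm to extract new health terms used by the public. The approach is reasonably general for recognizing consumer health terms and can provide a data basis for mapping consumer terms to medical terms. The experiments show that rule-based preprocessing of public log data yields good input text for the subsequent steps, and that term recognition combining N-Gram with various filtering rules identifies new words in short texts well.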
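
As a rough illustration of combining rules with N-Gram statistics, the sketch below splits each query at stop words (a stand-in for the rule-based preprocessing), counts word n-grams inside the remaining segments, and filters out infrequent n-grams and terms already in a lexicon. The stop-word list, the lexicon, the frequency threshold and the English toy queries are all assumptions; the paper's actual rules and Chinese-language processing are not given in the abstract.

```python
from collections import Counter


def segments(tokens, stop_words):
    """Split a token list at stop words so n-grams never span them."""
    seg = []
    for tok in tokens:
        if tok in stop_words:
            if seg:
                yield seg
            seg = []
        else:
            seg.append(tok)
    if seg:
        yield seg


def ngrams(seg, n_min=2, n_max=3):
    """Yield every word n-gram of length n_min..n_max in a segment."""
    for n in range(n_min, n_max + 1):
        for i in range(len(seg) - n + 1):
            yield " ".join(seg[i:i + n])


def candidate_terms(queries, known_terms, stop_words, min_count=2):
    """Return frequent n-grams that are not already in the lexicon."""
    counts = Counter()
    for query in queries:
        for seg in segments(query.split(), stop_words):
            counts.update(ngrams(seg))
    return {g: c for g, c in counts.items()
            if c >= min_count and g not in known_terms}


# Toy data, purely illustrative.
QUERIES = ["high blood sugar what to eat", "how to lower high blood sugar"]
found = candidate_terms(
    QUERIES,
    known_terms={"blood sugar"},
    stop_words={"what", "to", "how", "do"},
)
```

In a setting like the one described, the surviving candidates would still go through manual review before being accepted as new terms.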

10.
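Describing semi-structured information resources on the Web   Total citations: 2, self-citations: 0, citations by others: 2
To manage and query Web information resources better, a sound resource description mechanism must first be established. Metadata is a powerful tool for describing Web information resources, but a newer description mechanism, the linking mechanism, can not only express the content of metadata but also convey richer semantics than metadata, making up for some shortcomings that metadata cannot overcome on its own.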

11.
Entity ranking has recently emerged as a research field that aims at retrieving entities as answers to a query. Unlike entity extraction where the goal is to tag names of entities in documents, entity ranking is primarily focused on returning a ranked list of relevant entity names for the query. Many approaches to entity ranking have been proposed, and most of them were evaluated on the INEX Wikipedia test collection. In this paper, we describe a system we developed for ranking Wikipedia entities in answer to a query. The entity ranking approach implemented in our system utilises the known categories, the link structure of Wikipedia, as well as the link co-occurrences with the entity examples (when provided) to retrieve relevant entities as answers to the query. We also extend our entity ranking approach by utilising the knowledge of predicted classes of topic difficulty. To predict the topic difficulty, we generate a classifier that uses features extracted from an INEX topic definition to classify the topic into an experimentally pre-determined class. This knowledge is then utilised to dynamically set the optimal values for the retrieval parameters of our entity ranking system. Our experiments demonstrate that the use of categories and the link structure of Wikipedia can significantly improve entity ranking effectiveness, and that topic difficulty prediction is a promising approach that could also be exploited to further improve the entity ranking performance.

12.
User queries to the Web tend to have more than one interpretation due to their ambiguity and other characteristics. How to diversify the ranking results to meet users’ various potential information needs has attracted considerable attention recently. This paper is aimed at mining the subtopics of a query either indirectly from the returned results of retrieval systems or directly from the query itself to diversify the search results. For the indirect subtopic mining approach, clustering the retrieval results and summarizing the content of clusters is investigated. In addition, labeling topic categories and concept tags on each returned document is explored. For the direct subtopic mining approach, several external resources, such as Wikipedia, Open Directory Project, search query logs, and the related search services of search engines, are consulted. Furthermore, we propose a diversified retrieval model to rank documents with respect to the mined subtopics for balancing relevance and diversity. Experiments are conducted on the ClueWeb09 dataset with the topics of the TREC09 and TREC10 Web Track diversity tasks. Experimental results show that the proposed subtopic-based diversification algorithm significantly outperforms the state-of-the-art models in the TREC09 and TREC10 Web Track diversity tasks. The best performance our proposed algorithm achieves is α-nDCG@5 0.307, IA-P@5 0.121, and α#-nDCG@5 0.214 on the TREC09, as well as α-nDCG@10 0.421, IA-P@10 0.201, and α#-nDCG@10 0.311 on the TREC10. The results indicate that the subtopic mining technique with the up-to-date users’ search query logs is the most effective way to generate the subtopics of a query, and that the proposed subtopic-based diversification algorithm can select the documents covering various subtopics.

13.
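Query clusters and session-unit datasets are obtained by analyzing Web logs. On this basis, a knowledge-map construction algorithm is proposed that stores and manages the learned knowledge and builds a Web-log-based knowledge map. The query knowledge in the map can be used to filter knowledge and present it to users, so that they can quickly obtain the query knowledge they need.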

14.
Search engine results are often biased towards a certain aspect of a query or towards a certain meaning for ambiguous query terms. Diversification of search results offers a way to supply the user with a better balanced result set increasing the probability that a user finds at least one document suiting her information need. In this paper, we present a reranking approach based on minimizing variance of Web search results to improve topic coverage in the top-k results. We investigate two different document representations as the basis for reranking: smoothed language models and topic models derived by latent Dirichlet allocation. To evaluate our approach we selected 240 queries from Wikipedia disambiguation pages. This provides us with ambiguous queries together with a community generated balanced representation of their (sub)topics. For these queries we crawled two major commercial search engines. In addition, we present a new evaluation strategy based on Kullback-Leibler divergence and Wikipedia. We evaluate this method using the TREC sub-topic evaluation on the one hand, and manually annotated query results on the other hand. Our results show that minimizing variance in search results by reranking relevant pages significantly improves topic coverage in the top-k results with respect to Wikipedia, and gives a good overview of the overall search result. Moreover, latent topic models achieve competitive diversification with significantly less reranking. Finally, our evaluation reveals that our automatic evaluation strategy using Kullback-Leibler divergence correlates well with α-nDCG scores used in manual evaluation efforts.
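
To make the idea of a Kullback-Leibler-based evaluation concrete, the sketch below compares a reference subtopic distribution with the subtopic distribution of a result list. The smoothing constant, the direction of the divergence and the toy distributions are assumptions for illustration and are not claimed to match the measure defined in the paper.

```python
import math


def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) over the union of keys, with additive smoothing eps.

    p and q map subtopic labels to non-negative weights; both are
    smoothed and renormalized before the divergence is computed.
    """
    keys = set(p) | set(q)
    ps = {k: p.get(k, 0.0) + eps for k in keys}
    qs = {k: q.get(k, 0.0) + eps for k in keys}
    zp, zq = sum(ps.values()), sum(qs.values())
    return sum(
        (ps[k] / zp) * math.log((ps[k] / zp) / (qs[k] / zq))
        for k in keys
    )


# Toy example: an ambiguous query with two equally weighted senses.
reference = {"animal": 0.5, "car": 0.5}
balanced_results = {"animal": 0.5, "car": 0.5}
skewed_results = {"animal": 1.0}

d_balanced = kl_divergence(reference, balanced_results)
d_skewed = kl_divergence(reference, skewed_results)
```

Under these assumptions a result list that covers the reference subtopics evenly gets a divergence near zero, while a list dominated by one sense gets a larger value.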

15.
Query suggestion, which enables the user to revise a query with a single click, has become one of the most fundamental features of Web search engines. However, it has not been clear what circumstances cause the user to turn to query suggestion. In order to investigate when and how the user uses query suggestion, we analyzed three kinds of data sets obtained from a major commercial Web search engine, comprising approximately 126 million unique queries, 876 million query suggestions and 306 million action patterns of users. Our analysis shows that query suggestions are often used (1) when the original query is a rare query, (2) when the original query is a single-term query, (3) when query suggestions are unambiguous, (4) when query suggestions are generalizations or error corrections of the original query, and (5) after the user has clicked on several URLs in the first search result page. Our results suggest that search engines should provide better assistance especially when rare or single-term queries are input, and that they should dynamically provide query suggestions according to the searcher’s current state.

16.
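A study of the characteristics of language use in Web search   Total citations: 1, self-citations: 0, citations by others: 1
Taking the characteristics of language use in Web search as its subject, this exploratory study examines the syntax and semantics of search queries. It relies mainly on mining search engine query logs, compares the findings with the conclusions of an online questionnaire survey, and derives characteristics relating to syntax, lexical categories, auxiliary words and subject words.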

17.
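By mining the semantic relations among query terms in Web logs, the semantic knowledge of 《知网》 (HowNet) is incorporated into a clustering algorithm to optimize search engines. The method uses machine learning algorithms to mine query logs in depth, computing concept similarity and semantic clusters for the query strings, so that the returned pages are more reasonable, more accurate results are presented to users, and user needs are better met.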

18.
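Developing a Web database query system based on ASP+ADO   Total citations: 2, self-citations: 0, citations by others: 2
Introduces the basic concepts and main features of ASP and briefly describes ADO, an ASP component. Explains how to design a Web database query system and illustrates the approach with an example.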

19.
This paper provides an overview of the research into current medical vocabularies and their impact on searching the Web for health information. The Web provides growing opportunities for laypersons to gain knowledge about specific health conditions, though research to date has been incomplete. Many studies have examined aspects of controlled medical vocabularies. Other studies have examined aspects of medical Web searching vocabularies. In this context, there is a growing need to examine more closely laypersons' Web queries using controlled medical vocabularies that were designed to serve the needs of medical professionals. It may be the case that the average consumer of Web health services is not able to use correct medical terminology, and may not be able to choose analogous or synonymous terms from a search result list. Our review suggests a growing need for studies to examine the current applicability of controlled medical vocabularies as well as alternatives to semantic query by Web search engine users.

20.
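Automatic classification of Web databases based on query results   Total citations: 2, self-citations: 0, citations by others: 2
Guo Shaoyou (郭少友). 《情报学报》 (Journal of the China Society for Scientific and Technical Information), 2006, 25(4): 481-487
This paper proposes a method for automatically classifying Web databases based on query results. The method queries a database using category terms from the Yahoo directory hierarchy as query terms and classifies the Web database according to the query results. A prototype system is used to test the method's classification performance.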
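
The query-probing idea in this abstract can be sketched as follows. The scoring rule (summing hit counts per category and picking the maximum), the `search` callable and the in-memory mock database are assumptions for illustration; the abstract does not say how query results are turned into a class label.

```python
def classify_database(search, category_terms):
    """Probe a database with category terms and pick the best category.

    search: callable taking a term and returning a hit count
            (a hypothetical stand-in for a Web database query interface).
    category_terms: dict mapping a category name to its probe terms.
    """
    scores = {
        category: sum(search(term) for term in terms)
        for category, terms in category_terms.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Mock database and probe terms, purely illustrative.
RECORDS = [
    "heart disease treatment",
    "diabetes diet advice",
    "football results",
]


def mock_search(term):
    """Count the records that contain the probe term."""
    return sum(term in record for record in RECORDS)


CATEGORIES = {
    "Health": ["disease", "diabetes", "treatment"],
    "Sports": ["football", "tennis"],
}

label, scores = classify_database(mock_search, CATEGORIES)
```

Raw hit counts favor categories with more probe terms, so a real system would likely normalize the scores, for example by the number of probe terms per category.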
