Similar literature
 20 similar documents found (search time: 15 ms)
1.
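A discussion of personalized information recommendation based on social tagging systems   Total citations: 4 (self-citations: 0, citations by others: 4)
Personalized information recommendation, which delivers accurate and appropriate information tailored to a user's individual characteristics, has long been a focus of both academia and industry. Using a post-controlled vocabulary, the method processes users' scattered, individualized tags and represents user interests as vectors. Borrowing from collaborative filtering, it then finds the set of similar users and the resources held within that set. On this basis, and adopting a relative matching strategy, the paper proposes a personalized recommendation method based on social tagging systems.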
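
The pipeline in the abstract above (tag-based interest vectors, then similar users, then their resources) can be sketched as follows. This is a minimal illustration, not the paper's method: the function names `cosine` and `recommend`, the toy users, and the choice of cosine similarity are all assumptions, and the post-controlled vocabulary step and the relative matching strategy are omitted.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse tag-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def recommend(target, tag_vectors, resources, k=1):
    """Return resources of the k most similar users that `target` lacks."""
    others = [u for u in tag_vectors if u != target]
    neighbours = sorted(
        others,
        key=lambda u: cosine(tag_vectors[target], tag_vectors[u]),
        reverse=True,
    )[:k]
    pooled = {r for u in neighbours for r in resources[u]}
    return sorted(pooled - resources[target])

# Hypothetical toy data: tag counts per user and the resources each user holds.
tag_vectors = {
    "alice": Counter({"retrieval": 3, "ontology": 1}),
    "bob": Counter({"retrieval": 2, "ontology": 2}),
    "carol": Counter({"cooking": 4}),
}
resources = {"alice": {"r1"}, "bob": {"r1", "r2"}, "carol": {"r3"}}

print(recommend("alice", tag_vectors, resources))
```

Here `bob` is the nearest neighbour of `alice`, so the only resource suggested is the one `bob` holds and `alice` does not.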

2.
Patent prior art search is a type of search in the patent domain where documents are searched for that describe the work previously carried out related to a patent application. The goal of this search is to check whether the idea in the patent application is novel. Vocabulary mismatch is one of the main problems of patent retrieval which results in low retrievability of similar documents for a given patent application. In this paper we show how the term distribution of the cited documents in an initially retrieved ranked list can be used to address the vocabulary mismatch. We propose a method for query modeling estimation which utilizes the citation links in a pseudo relevance feedback set. We first build a topic dependent citation graph, starting from the initially retrieved set of feedback documents and utilizing citation links of feedback documents to expand the set. We identify the important documents in the topic dependent citation graph using a citation analysis measure. We then use the term distribution of the documents in the citation graph to estimate a query model by identifying the distinguishing terms and their respective weights. We then use these terms to expand our original query. We use the CLEF-IP 2011 collection to evaluate the effectiveness of our query modeling approach for prior art search. We also study the influence of different parameters on the performance of the proposed method. The experimental results demonstrate that the proposed approach significantly improves the recall over a state-of-the-art baseline which uses the link-based structure of the citation graph but not the term distribution of the cited documents.
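
The steps in the abstract above (expand the feedback set along citation links, score documents, then weight terms) can be sketched roughly as below. The abstract does not name its citation analysis measure, so this sketch substitutes a simple in-degree count as an assumed stand-in; `expansion_terms` and the toy documents are hypothetical.

```python
from collections import Counter

def expansion_terms(feedback, cites, doc_terms, query, n=2):
    """Pick n expansion terms from a one-hop citation graph of feedback docs."""
    # Expand the pseudo relevance feedback set along citation links.
    graph_docs = set(feedback)
    for d in feedback:
        graph_docs.update(cites.get(d, ()))
    # Assumed importance measure: 1 + number of graph documents citing d.
    importance = {
        d: 1 + sum(d in cites.get(s, ()) for s in graph_docs)
        for d in graph_docs
    }
    # Weight each term by its relative frequency, scaled by document importance.
    weights = Counter()
    for d in graph_docs:
        terms = doc_terms.get(d, [])
        for t in terms:
            weights[t] += importance[d] / len(terms)
    for t in query:
        weights.pop(t, None)  # keep only terms that are new to the query
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:n]]

# Hypothetical toy collection: p1 and p2 are feedback patents, c1 and c2 are cited.
cites = {"p1": ["c1"], "p2": ["c1", "c2"]}
doc_terms = {
    "p1": ["valve", "pump"],
    "p2": ["valve", "seal"],
    "c1": ["impeller", "pump"],
    "c2": ["gasket"],
}
print(expansion_terms(["p1", "p2"], cites, doc_terms, ["valve"]))
```

Terms from the cited documents outrank terms found only in the feedback patents, which is the intended effect when query and prior art use different vocabulary.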

3.
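The paper proposes organizing user requirements with linked data technology and presents an architecture for doing so, the requirement semantic network model. The model consists of a data layer, a requirement information layer, and an application layer. The requirement information layer is the core of the model, and building it involves five steps: requirement information modeling, requirement information naming, conversion of requirement information to RDF, requirement information publishing, and open querying. The key points and difficulties in building a requirement semantic network include defining and describing user requirements and their relations, linking and decomposing user requirements, cooperation and communication among the layers of the network, and extending and expanding the matching server. Finally, the theory is applied to personalized knowledge services in university libraries, yielding a linked-data-based model for constructing a semantic network of book requirements in university libraries.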

4.
Enterprise search is important, and the search quality has a direct impact on the productivity of an enterprise. Enterprise data contain both structured and unstructured information. Since these two types of information are complementary and the structured information such as relational databases is designed based on ER (entity-relationship) models, there is a rich body of information about entities in enterprise data. As a result, many information needs of enterprise search center around entities. For example, a user may formulate a query describing a problem that she encounters with an entity, e.g., the web browser, and want to retrieve relevant documents to solve the problem. Intuitively, information related to the entities mentioned in the query, such as related entities and their relations, would be useful to reformulate the query and improve the retrieval performance. However, most existing studies on query expansion are term-centric. In this paper, we propose a novel entity-centric query expansion framework for enterprise search. Specifically, given a query containing entities, we first utilize both unstructured and structured information to find entities that are related to the ones in the query. We then discuss how to adapt existing feedback methods to use the related entities and their relations to improve search quality. Experimental results over two real-world enterprise collections show that the proposed entity-centric query expansion strategies are more effective and robust in improving search performance than the state-of-the-art pseudo feedback methods for long natural language-like queries with entities. Moreover, results over a TREC ad hoc retrieval collection show that the proposed methods can also work well for short keyword queries in the general search domain.

5.
Despite a clear improvement of search and retrieval temporal applications, current search engines are still mostly unaware of the temporal dimension. Indeed, in most cases, systems are limited to offering the user the chance to restrict the search to a particular time period or to simply rely on an explicitly specified time span. If the user is not explicit in his/her search intents (e.g., “philip seymour hoffman”), search engines are likely to fail to present an overall historic perspective of the topic. In most such cases, they are limited to retrieving the most recent results. One possible solution to this shortcoming is to understand the different time periods of the query. In this context, most state-of-the-art methodologies consider any occurrence of temporal expressions in web documents and other web data as equally relevant to an implicit time sensitive query. To approach this problem in a more adequate manner, we propose in this paper the detection of temporal expressions relevant to the query. Unlike previous metadata and query log-based approaches, we show how to achieve this goal based on information extracted from document content. However, instead of simply focusing on the detection of the most obvious date, we are also interested in retrieving the set of dates that are relevant to the query. Towards this goal, we define a general similarity measure that makes use of co-occurrences of words and years based on corpus statistics and a classification methodology that is able to identify the set of top relevant dates for a given implicit time sensitive query, while filtering out the non-relevant ones. Through extensive experimental evaluation, we mean to demonstrate that our approach offers promising results in the field of temporal information retrieval (T-IR), as demonstrated by the experiments conducted over several baselines on web corpora collections.

6.
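Traditional query expansion methods cannot fundamentally eliminate the semantic gap between the user's query intent and the retrieval results, nor the ambiguity of user queries. Interactive query expansion, by contrast, can help users find the information they need in massive web resources more quickly and precisely, and give them more satisfying search results. Combining a literature review with a questionnaire survey, the paper analyzes interactive query expansion empirically along several dimensions: users' usage and needs, reasons for use, and evaluations and suggestions. It proposes that simpler operation, personalized query expansion, user-friendly interactive display, more precise retrieval results, and mobile retrieval environments are the research priorities and main development directions for interactive query expansion.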

7.
Vocabulary incompatibilities arise when the terms used to index a document collection are largely unknown, or at least not well-known to the users who eventually search the collection. No matter how comprehensive or well-structured the indexing vocabulary, it is of little use if it is not used effectively in query formulation. This paper demonstrates that techniques for mapping user queries into the controlled indexing vocabulary have the potential to radically improve document retrieval performance. We also show how controlled indexing vocabulary can be employed to achieve performance gains for collection selection. Finally, we demonstrate the potential benefit of combining these two techniques in an interactive retrieval environment. Given a user query, our evaluation approach simulates the human user's choice of terms for query augmentation given a list of controlled vocabulary terms suggested by a system. This strategy lets us evaluate interactive strategies without the need for human subjects.

8.
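With the explosion of information, people demand more of retrieval systems in functionality, intelligence, and retrieval effectiveness, and expect results that are more accurate, more concise, and better suited to individual needs. This paper proposes a personalized retrieval method based on user interests. Drawing on the idea of classification schemes, it represents user interests with "categories" instead of "keywords", improves the information filtering method, and optimizes the retrieval results so that they better match user needs, thereby achieving personalized information retrieval based on user interests. In addition, a personalized retrieval system based on user interests was developed and experiments were carried out, which verified that the method does clearly improve retrieval effectiveness.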

9.
In the patent domain significant efforts are invested to assist researchers in formulating better queries, preferably via automated query expansion. Currently, automatic query expansion in patent search is mostly limited to computing co-occurring terms for the searchable features of the invention. Additional query terms are extracted automatically from patent documents based on entropy measures. Learning synonyms in the patent domain for automatic query expansion has been a difficult task. No dedicated sources providing synonyms for the patent domain, such as patent domain specific lexica or thesauri, are available. In this paper we focus on the highly professional search setting of patent examiners. In particular, we use query logs to learn synonyms for the patent domain. For automatic query expansion, we create term networks based on the query logs specifically for several USPTO patent classes. Experiments show good performance in automatic query expansion using these automatically generated term networks. Specifically, the performance of the learned term networks increases as more query logs become available for a specific US patent class.

10.
The collective feedback of the users of an Information Retrieval (IR) system has been shown to provide semantic information that, while hard to extract using standard IR techniques, can be useful in Web mining tasks. In the last few years, several approaches have been proposed to process the logs stored by Internet Service Providers (ISP), Intranet proxies or Web search engines. However, the solutions proposed in the literature only partially represent the information available in the Web logs. In this paper, we propose to use a richer data structure, which is able to preserve most of the information available in the Web logs. This data structure consists of three groups of entities: users, documents and queries, which are connected in a network of relations. Query refinements correspond to separate transitions between the corresponding query nodes in the graph, while users are linked to the queries they have issued and to the documents they have selected. The classical query/document transitions, which connect a query to the documents selected by the users in the returned result page, are also considered. The resulting data structure is a complete representation of the collective search activity performed by the users of a search engine or of an Intranet. The experimental results show that this more powerful representation can be successfully used in several Web mining tasks like discovering semantically relevant query suggestions and Web page categorization by topic.
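
The three-entity structure described in the abstract above (users, documents, queries, plus query refinement transitions) might be held in memory along these lines. This is an illustrative sketch only: the class `SearchLogGraph`, its method names, and the session format are assumptions, and ranking suggestions by refinement count is a simple placeholder for whatever mining the paper performs.

```python
from collections import defaultdict

class SearchLogGraph:
    """Users, queries and documents linked by the relations found in a log."""

    def __init__(self):
        self.user_queries = defaultdict(set)   # user  -> queries issued
        self.user_docs = defaultdict(set)      # user  -> documents selected
        self.query_docs = defaultdict(set)     # query -> documents selected
        # query -> {refined query: number of times the transition was seen}
        self.refinements = defaultdict(lambda: defaultdict(int))

    def add_session(self, user, events):
        """Record one session; events is a list of (query, clicked_docs) in order."""
        prev = None
        for query, clicked in events:
            self.user_queries[user].add(query)
            self.user_docs[user].update(clicked)
            self.query_docs[query].update(clicked)
            if prev is not None and prev != query:
                self.refinements[prev][query] += 1
            prev = query

    def suggest(self, query):
        """Refinements of `query`, most frequent first, ties broken by name."""
        nxt = self.refinements.get(query, {})
        return sorted(nxt, key=lambda q: (-nxt[q], q))

# Hypothetical toy log with three users refining the same ambiguous query.
g = SearchLogGraph()
g.add_session("u1", [("jaguar", []), ("jaguar car", ["d1"])])
g.add_session("u2", [("jaguar", []), ("jaguar cat", ["d2"])])
g.add_session("u3", [("jaguar", []), ("jaguar car", ["d1"])])
print(g.suggest("jaguar"))
```

Keeping the user links alongside the query/document links is what distinguishes this layout from a plain query-click bipartite graph.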

11.
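Interactive cross-language information retrieval is an important branch of information retrieval. Building on an analysis of theoretical research into the interactive cross-language retrieval process, evaluation metrics, and user behavior, the paper designs a user retrieval experiment in which users take part in the entire cross-language retrieval process. The results show that: users' search terms come mainly from the titles of the search topics; users judge document relevance with fairly high accuracy; full text in the target language, translated abstracts, and translated full text are all accepted by users as a basis for judgment; translation refinement, and its combination with query expansion, are very effective in an interactive setting; users are still willing to make further selections among translations after feedback; and users both need and approve of interacting with a cross-language information retrieval system. User behavior analysis helps guide the design and practice of interactive cross-language information retrieval systems.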

12.
Query expansion (QE) is an important process in information retrieval applications that improves the user query and helps in retrieving relevant results. In this paper, we introduce a hybrid query expansion model (HQE) that investigates how external resources can be combined with association rule mining and used to enhance the generation and selection of expansion terms. The HQE model can be run in different configurations, starting from methods based on association rules and combining them with external knowledge. The HQE model handles the two main phases of a QE process, namely the candidate term generation phase and the selection phase. For the first phase, we propose statistical, semantic and conceptual methods to generate new related terms for a given query. For the second phase, we introduce a similarity measure, ESAC, based on Explicit Semantic Analysis, that computes the relatedness between a query and the set of candidate terms. The performance of the proposed HQE model is evaluated within two experimental validations. The first one addresses the tweet search task proposed by TREC Microblog Track 2011 and an ad-hoc IR task related to the hard topics of the TREC Robust 2004. The second experimental validation concerns the tweet contextualization task organized by INEX 2014. Global results highlighted the effectiveness of our HQE model and of association rule mining for QE combined with external resources.
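
As a rough illustration of the candidate term generation phase mentioned above, association rules between terms can be mined with the usual support and confidence thresholds. The sketch below treats each document as a term set and mines only one-to-one rules; `term_rules`, `candidates`, the thresholds, and the toy tweets are all hypothetical, and the ESAC selection phase is not reproduced.

```python
from itertools import permutations

def term_rules(docs, min_support=0.4, min_confidence=0.6):
    """Mine rules a -> b between single terms; returns {(a, b): (support, confidence)}."""
    n = len(docs)
    sets = [set(d) for d in docs]
    vocab = sorted(set().union(*sets))
    rules = {}
    for a, b in permutations(vocab, 2):
        both = sum(1 for s in sets if a in s and b in s)
        has_a = sum(1 for s in sets if a in s)  # at least 1, since a is in vocab
        support = both / n
        confidence = both / has_a
        if support >= min_support and confidence >= min_confidence:
            rules[(a, b)] = (support, confidence)
    return rules

def candidates(query, rules):
    """Terms implied by any query term through a mined rule."""
    return sorted({b for (a, b) in rules if a in query and b not in query})

# Hypothetical toy collection of tokenized tweets.
docs = [
    ["tweet", "hashtag", "trend"],
    ["tweet", "hashtag"],
    ["tweet", "retweet"],
    ["news", "trend"],
    ["tweet", "hashtag", "news"],
]
rules = term_rules(docs)
print(candidates(["tweet"], rules))
```

Raising `min_confidence` trims noisy candidates at the cost of fewer expansion terms, which is the usual trade-off in rule-based expansion.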

13.
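Applying metadata requires controlled vocabularies suited to the subject domain in order to meet users' retrieval needs, yet little is known about which terms users actually search with. To learn what terms users of a digital education library use when searching, the authors apply search log mining. After a preliminary analysis of more than one million search terms contained in over 400,000 search records, the authors focus on how controlled terms are used in searches and, by comparison, which uncontrolled terms users employ. They find that users use controlled terms far more frequently than uncontrolled terms when looking for information. Further analysis of uncontrolled term usage can provide a source for supplementing and updating the controlled vocabulary, and can also supply important information for establishing semantic correspondences between controlled and uncontrolled terms.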

14.
Relevance feedback methods generally suffer from topic drift caused by word ambiguities and synonymous uses of words. Topic drift is an important issue in patent information retrieval as people tend to use different expressions describing similar concepts, causing low precision and recall at the same time. Furthermore, failing to retrieve relevant patents to an application during the examination process may cause legal problems arising from granting a patent on an existing invention. A possible cause of topic drift is utilizing a relevance feedback-based search method. As a way to alleviate the inherent problem, we propose a novel query phrase expansion approach utilizing semantic annotations in Wikipedia pages, trying to enrich queries with phrases disambiguating the original query words. The idea was implemented for patent search where patents are classified into a hierarchy of categories, and the analyses of the experimental results showed not only the positive roles of phrases and words in retrieving additional relevant documents through query expansion but also their contributions to alleviating the query drift problem. More specifically, our query expansion method was compared against the relevance-based language model, a state-of-the-art query expansion method, to show its superiority in terms of MAP on all levels of the classification hierarchy.

15.
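Research on expansion techniques for information retrieval   Total citations: 1 (self-citations: 0, citations by others: 1)
To address the shortcomings of information retrieval in query expansion, this paper proposes a query expansion method that combines ontology theory with user relevance feedback. The proposed algorithm is validated with FirteX as the retrieval platform and WordNet as the ontology resource for expansion. The experimental results show that the method outperforms query expansion based on cosine similarity in average recall and average precision.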

16.
Query Expansion with Long-Span Collocates   Total citations: 1 (self-citations: 0, citations by others: 1)
The paper presents two novel approaches to query expansion with long-span collocates, that is, words that co-occur significantly with query terms in topic-size windows. In the first approach, global collocation analysis, collocates of query terms are extracted from the entire collection; in the second, local collocation analysis, they are extracted from a subset of retrieved documents. The significance of association between collocates was estimated using modified Mutual Information and Z score. The techniques were tested using the Okapi IR system. The effect of different parameters on performance was evaluated: window size, number of expansion terms, measures of collocation significance and types of expansion terms. We present performance results of these techniques and provide comparison with related approaches.
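
To make the two significance measures concrete, the sketch below counts collocates in a fixed window around a query term and scores them with textbook forms of mutual information and the Z score. The abstract speaks of modified versions whose details are not given here, so the formulas, the function names `collocates` and `scores`, and the tiny window are assumptions chosen for illustration; a topic-size window would be far larger.

```python
import math
from collections import Counter

def collocates(tokens, node, window=3):
    """Count tokens within `window` positions of each occurrence of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            counts.update(t for j, t in enumerate(tokens[lo:hi], lo) if j != i)
    return counts

def scores(tokens, node, window=3):
    """Return {collocate: (mutual_information, z_score)} for `node`."""
    n = len(tokens)
    freq = Counter(tokens)
    span = 2 * window  # positions inspected around each node occurrence
    result = {}
    for y, observed in collocates(tokens, node, window).items():
        p = freq[y] / n                    # chance of y at a random position
        if p >= 1:
            continue                       # degenerate corpus of one word type
        expected = p * freq[node] * span   # co-occurrences expected by chance
        mi = math.log2(observed / expected)
        z = (observed - expected) / math.sqrt(expected * (1 - p))
        result[y] = (mi, z)
    return result

# Hypothetical toy text; a real run would use a whole collection (global
# analysis) or the top retrieved documents (local analysis).
tokens = ["oil", "spill", "a", "b", "c", "d", "oil", "spill", "e", "f"]
for term, (mi, z) in sorted(scores(tokens, "oil", window=1).items()):
    print(term, round(mi, 3), round(z, 3))
```

In this toy text both collocates get the same mutual information, while the Z score separates them because it also accounts for how often each pair was actually observed.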

17.
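A personalized semantic query expansion method oriented to user interests   Total citations: 1 (self-citations: 0, citations by others: 1)
Building on research into ontology-based semantic query expansion and on user modeling, the paper proposes combining the user interest model with query expansion to achieve personalized semantic query expansion. The process is divided into two stages: mapping search keywords to concepts of the personalized domain ontology in the user model, and semantically expanding the mapped concepts at the ontology level. An implementation algorithm is given for each stage. Experiments show that the method improves the precision and recall of information retrieval and, to some extent, meets personalized query needs.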

18.
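The paper studies the application models and progress of an emerging search approach, social search, in Internet search engines. Examining a variety of Web 2.0 phenomena of recent years from the background and perspective of social search, it identifies three directions in which social search is developing on the basis of Web 2.0 communities' contributions to search: community search, social search engines, and personalized search. It summarizes the characteristics and significance of each. It argues that these three paths are interrelated and together meet users' needs for diverse, democratic, and autonomous search. Finally, it offers an outlook on future development.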

19.
In this paper, we present a framework that can process a user query for retrieval of information from documents of different properties across multiple domains, with specific application to patent laws and regulations. The framework has three basic components. The first component is ontology mapping and generation: the keywords entered by users are mapped into a subset of relevant keywords by looking up those words in an ontology database. The second component is the joint and cross search in various document domains; in our case, they are patents and scientific publications. The last component is to modify the search results by applying user feedback statistics. The results of feedback will be saved as metadata for future use. A case example is given to demonstrate how results from multiple domain searches can be combined using ontology and cross referencing. We use an example of well-known biotechnology patents on erythropoietin (EPO) and give detailed analysis on each document domain with this keyword. Relationships between each domain are demonstrated. A user feedback mechanism is also discussed in this paper. The ability to take user feedback into the framework is important. There is no doubt that domain knowledge from expert or experienced users could be a very good complement to the proposed system. Both direct and indirect user feedback are discussed.

20.
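Pseudo-relevance feedback query expansion based on feature term extraction and relevance fusion   Total citations: 2 (self-citations: 0, citations by others: 2)
To address the term mismatch problem in existing information retrieval systems, the paper proposes a pseudo-relevance feedback query expansion algorithm based on feature term extraction and relevance fusion, together with a new method for computing expansion term weights. The algorithm extracts feature terms related to the original query from the top n documents of the initial retrieval and, according to each feature term's frequency in the initial document set and its relevance to the original query, selects the final expansion terms. Experimental results show that the method is effective and improves information retrieval performance.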
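
A minimal sketch of this kind of pseudo-relevance feedback is given below. It scores each candidate term from the top-ranked documents by its frequency multiplied by a crude relatedness estimate (the share of documents containing the term that also contain a query term). The paper's actual weighting formula is not stated in the abstract, so this scoring rule, the name `prf_expand`, and the toy ranking are assumptions.

```python
from collections import Counter

def prf_expand(query, ranked_docs, n_docs=3, n_terms=2):
    """Append the n_terms best-scoring terms from the top n_docs documents."""
    top = ranked_docs[:n_docs]
    q = set(query)
    # Frequency of every non-query term in the feedback documents.
    freq = Counter(t for d in top for t in d if t not in q)
    scores = {}
    for t, f in freq.items():
        holding = [d for d in top if t in d]
        # Assumed relatedness: fraction of those documents sharing a query term.
        related = sum(1 for d in holding if q & set(d)) / len(holding)
        scores[t] = f * related
    best = sorted(scores, key=lambda t: (-scores[t], t))[:n_terms]
    return list(query) + best

# Hypothetical initial ranking of tokenized documents, best first.
ranked_docs = [
    ["query", "expansion", "feedback", "feedback", "term"],
    ["expansion", "feedback", "weight"],
    ["term", "index"],
    ["noise"],
]
print(prf_expand(["query", "expansion"], ranked_docs))
```

Terms that appear only in feedback documents unrelated to the query, such as `index` here, score zero and are never added.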
