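Latent semantic indexing (LSI) is an automatic document indexing and retrieval technique based on term dependencies and semantic structure. It uses term-frequency statistics and singular value decomposition (SVD) to capture the semantic structure of documents. This yields vector representations of index terms, queries and documents, and the retrieval system then computes the similarity between documents and a query to perform retrieval. Relevance feedback is an effective way to improve retrieval performance. Through repeated interaction with the user, it analyzes and adjusts the search strategy and controls the importance of query terms, which strengthens the response to relevant documents and effectively suppresses non-relevant ones. This paper describes the mathematical basis of relevance feedback and how it works within the LSI method. Guided by the basic theories of systems theory and cybernetics, the authors built an experimental LSI system with a relevance feedback mechanism and carried out ...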
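The abstract does not say which feedback formula the system uses. The sketch below assumes the classic Rocchio update to show how feedback raises the weight of terms from relevant documents and suppresses terms from non-relevant ones. The parameters alpha, beta and gamma are conventional textbook defaults, not values from the paper. In an LSI system the same update could be applied to reduced-dimension vectors instead of raw term weights.

```python
from collections import defaultdict


def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio update over sparse vectors (dict: term -> weight).

    q' = alpha*q + beta*centroid(relevant) - gamma*centroid(nonrelevant)
    Terms whose weight ends up <= 0 are dropped, a common convention.
    alpha/beta/gamma are assumed textbook defaults.
    """
    updated = defaultdict(float)
    for term, w in query.items():
        updated[term] += alpha * w
    for docs, coef in ((relevant, beta), (nonrelevant, -gamma)):
        for doc in docs:
            for term, w in doc.items():
                # each judged document contributes 1/len(docs) of its centroid
                updated[term] += coef * w / len(docs)
    return {t: w for t, w in updated.items() if w > 0}


# One feedback round: the user marks one relevant and one non-relevant document.
query = {"semantic": 1.0, "indexing": 1.0}
relevant = [{"semantic": 1.0, "indexing": 1.0, "svd": 1.0}]
nonrelevant = [{"indexing": 1.0, "book": 1.0}]
new_query = rocchio(query, relevant, nonrelevant)
print(new_query)
```

After one round, "svd" enters the query from the relevant document. "indexing" gains less weight than "semantic" because it also appears in the non-relevant document. "book" is suppressed entirely.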

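Application of the LSI (Latent Semantic Indexing) Method in Information Retrieval   Times cited: 9 (self-citations: 2, by others: 7)  
This paper introduces latent semantic indexing, an automatic document indexing and retrieval technique based on a semantic structure derived from term dependencies. It uses term-frequency statistics and singular value decomposition to capture the semantic structure of documents and obtain vector representations of index terms, queries and documents. The retrieval system can then predict the relevance between documents and a query to perform retrieval.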
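The following dependency-free sketch makes the SVD step concrete with rank-k LSI. It assumes a raw term-frequency matrix, because the abstract does not specify term weighting. The truncated SVD is computed by power iteration with deflation, which is only practical for toy matrices; a real system would call a library routine such as numpy.linalg.svd. Queries are folded in with the standard projection q^T U_k Σ_k^{-1}, and documents are represented by the rows of V_k. All function names are illustrative.

```python
import math
import random


def _matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def _normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else v


def cosine(x, y):
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    if nx == 0 or ny == 0:
        return 0.0
    return sum(a * b for a, b in zip(x, y)) / (nx * ny)


def truncated_svd(a, k, iters=500, seed=0):
    """Top-k singular triplets of `a` (terms x docs) by power iteration on A^T A,
    deflating after each component. Returns (sigmas, u_vectors, v_vectors)."""
    n_terms, n_docs = len(a), len(a[0])
    gram = [[sum(a[t][i] * a[t][j] for t in range(n_terms)) for j in range(n_docs)]
            for i in range(n_docs)]
    rng = random.Random(seed)
    sigmas, us, vs = [], [], []
    for _ in range(k):
        v = _normalize([rng.random() for _ in range(n_docs)])
        for _ in range(iters):
            v = _normalize(_matvec(gram, v))
        lam = sum(x * y for x, y in zip(v, _matvec(gram, v)))  # Rayleigh quotient
        if lam <= 1e-12:
            break  # k exceeds the numerical rank of the matrix
        sigma = math.sqrt(lam)
        # left singular vector: u = A v / sigma
        u = [sum(a[t][j] * v[j] for j in range(n_docs)) / sigma for t in range(n_terms)]
        sigmas.append(sigma)
        us.append(u)
        vs.append(v)
        # deflation: remove this component so the next run finds the next one
        gram = [[gram[i][j] - lam * v[i] * v[j] for j in range(n_docs)]
                for i in range(n_docs)]
    return sigmas, us, vs


def fold_in(q, sigmas, us):
    """Map a term vector into the latent space: q^T U_k Sigma_k^{-1}."""
    return [sum(x * y for x, y in zip(q, u)) / s for s, u in zip(sigmas, us)]


# Terms (rows): car, automobile, engine, flower, petal; 4 documents (columns).
A = [[1, 0, 0, 0],
     [0, 1, 0, 0],
     [1, 1, 0, 0],
     [0, 0, 1, 1],
     [0, 0, 1, 0]]
sigmas, U, V = truncated_svd(A, k=2)
q = fold_in([1, 0, 0, 0, 0], sigmas, U)                   # query: "car"
doc_vecs = [[v[j] for v in V] for j in range(len(A[0]))]  # row j of V_k
scores = [cosine(q, d) for d in doc_vecs]
print([round(s, 3) for s in scores])
```

Document 1 ("automobile engine") scores close to 1.0 for the query "car" even though the two share no term. Both co-occur with "engine" in the collection, and the rank-2 space merges them. This is the synonymy effect LSI is meant to capture.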

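Research on Dynamic Organization of Digital Library Search Results Based on Domain Ontology   Times cited: 1 (self-citations: 1, by others: 0)  
Building on an analysis of existing methods for organizing digital library search results, and aiming to stay faithful to the user's query, this paper proposes a domain-ontology-based method for dynamically organizing search results. The basic idea is to connect document descriptors effectively with the user's query. A query model is built from the user's query and a descriptor model is built from the search results. The two models are then mapped onto each other at the semantic level through a domain ontology, so that document descriptors and user queries become interoperable at the semantic level. Finally, the search results are presented in terms of the semantics of the user's query.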
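The mapping step can be sketched as follows. The sketch makes the simplifying assumption that the domain ontology is reduced to a lookup table from surface terms (synonyms, abbreviations) to concept identifiers. A real ontology would also use hierarchical and associative relations. Results are grouped under the concepts of the user's query, so the grouping follows the query's semantics rather than the documents' own vocabulary. The function name and the table are hypothetical.

```python
def organize_results(query_terms, results, term_to_concept):
    """Group search results under the ontology concepts of the user's query.

    query_terms: terms from the user's query (the query model)
    results: list of (doc_id, keywords) pairs (the descriptor model)
    term_to_concept: hypothetical table mapping surface terms to concepts
    Returns (groups, unmatched_doc_ids).
    """
    def to_concept(term):
        return term_to_concept.get(term, term)

    # preserve query order, drop duplicate concepts
    query_concepts = list(dict.fromkeys(to_concept(t) for t in query_terms))
    groups = {c: [] for c in query_concepts}
    unmatched = []
    for doc_id, keywords in results:
        doc_concepts = {to_concept(k) for k in keywords}
        matched = [c for c in query_concepts if c in doc_concepts]
        for c in matched:
            groups[c].append(doc_id)
        if not matched:
            unmatched.append(doc_id)
    return groups, unmatched


term_to_concept = {
    "LSI": "latent semantic indexing",
    "latent semantic analysis": "latent semantic indexing",
    "RF": "relevance feedback",
}
results = [("d1", ["latent semantic analysis"]), ("d2", ["RF", "SVD"]), ("d3", ["ontology"])]
groups, unmatched = organize_results(["LSI", "relevance feedback"], results, term_to_concept)
print(groups, unmatched)
```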

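To overcome the shortcomings of traditional keyword search, People's Publishing House (人民出版社) adopted recent semantic search technology. It built a semantic model of political-theory literature and developed a knowledge-point annotation platform that extracts and organizes the individual knowledge points in documents, turning complex theoretical texts into structured, organized knowledge. On this basis it built several semantic retrieval models that filter out noise and spurious results. The resulting People's Classics (人民金典) semantic retrieval system has run on the People's Publishing House website for more than a year. Over that period, both the semantic accuracy and the effectiveness rate of its knowledge-point retrieval exceeded 70%, and the accuracy of the "人民金典语义查询" (People's Classics Semantic Query) system exceeded 95%.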

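Research on Algorithms for Automatically Constructing Boolean Search Queries   Times cited: 6 (self-citations: 0, by others: 6)  
This paper analyzes and evaluates two typical algorithms for automatically constructing Boolean search queries. On that basis it proposes a new algorithm that constructs Boolean queries from sample-document queries. The algorithm computes term weights from the sample-document query and builds the Boolean query according to the distribution of those weights. Its main purpose is to simplify the user's interaction with the information retrieval system and thereby improve retrieval efficiency. The authors validated the algorithm with the AUBO retrieval system. The results show that, at the same recall level, the algorithm's precision is generally higher than that of manually formulated queries.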
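The abstract gives neither the weighting formula nor the rule for turning the weight distribution into Boolean operators. The sketch below is therefore a hypothetical reading with three parts. Term weight is the document frequency within the user's sample documents. High-weight terms are ANDed as required concepts. Mid-weight terms are ORed as alternatives. The thresholds high and low are illustrative assumptions.

```python
def build_boolean_query(sample_docs, high=0.75, low=0.3):
    """Build a Boolean query from the term sets of sample documents.

    Hypothetical weighting: a term's weight is the fraction of sample documents
    containing it. Terms with weight >= high are required (AND); terms with
    low <= weight < high form one OR group; rarer terms are ignored.
    """
    if not sample_docs:
        return ""
    counts = {}
    for doc in sample_docs:
        for term in set(doc):
            counts[term] = counts.get(term, 0) + 1
    weights = {t: c / len(sample_docs) for t, c in counts.items()}
    required = sorted(t for t, w in weights.items() if w >= high)
    optional = sorted(t for t, w in weights.items() if low <= w < high)
    clauses = list(required)
    if len(optional) == 1:
        clauses.append(optional[0])
    elif optional:
        clauses.append("(" + " OR ".join(optional) + ")")
    return " AND ".join(clauses)


samples = [
    {"retrieval", "boolean", "query", "weight"},
    {"retrieval", "boolean", "algorithm"},
    {"retrieval", "query", "weight"},
    {"retrieval", "boolean", "query"},
    {"retrieval", "boolean"},
]
print(build_boolean_query(samples))  # boolean AND retrieval AND (query OR weight)
```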

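Through a web image search experiment, we collected the queries users issued during image search tasks together with the sequences of their reformulations, and conducted a small-scale empirical study. Content coding and statistical analysis produced three findings. (1) The vast majority of image query reformulations concern content. The four most frequent types are narrowing, broadening, lateral shifting, and following the system's related-search suggestions, and users tend to start with broad searches and gradually narrow the scope. (2) Image query reformulation follows four basic patterns and four mixed patterns, each with its own characteristics. (3) The evolution of web image search tends to be self-driven, and existing image search systems clearly offer insufficient support for interactive evolution. 1 figure, 2 tables, 7 references.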
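The study coded reformulations manually. A rough automatic approximation of its main categories compares the term sets of consecutive queries, as sketched below. This term-set heuristic and the category labels are assumptions for illustration, not the authors' coding scheme.

```python
def classify_reformulation(prev_query, next_query, suggestions=()):
    """Label one query change with a coarse, term-set-based category.

    Hypothetical heuristic; the study itself used manual content coding.
    """
    prev_terms = set(prev_query.lower().split())
    next_terms = set(next_query.lower().split())
    if next_query.lower() in {s.lower() for s in suggestions}:
        return "follow suggestion"  # user picked a system related-search term
    if next_terms == prev_terms:
        return "repeat"
    if prev_terms < next_terms:
        return "narrow"    # terms only added: more specific
    if next_terms < prev_terms:
        return "broaden"   # terms only removed: more general
    return "lateral"       # some terms replaced: shift to a parallel topic


def classify_session(queries, suggestions=()):
    """Label every consecutive pair of queries in one search session."""
    return [classify_reformulation(a, b, suggestions) for a, b in zip(queries, queries[1:])]


session = ["dog", "dog puppy", "golden retriever puppy", "golden retriever"]
print(classify_session(session))  # ['narrow', 'lateral', 'broaden']
```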

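GoPubMed is a semantic intelligent search engine built on PubMed, and Quertle is a semantic intelligent search engine that uses PubMed as its main data source. Both are ontology-based attempts to move toward future semantic search. GoPubMed's most distinctive feature is the categorized statistics and visualization of search results. Supported by ontology and natural language processing technology, Quertle offers Power Term™ search, which recognizes the specific meaning of capitalized words and matches queries against a relation network to obtain highly relevant results.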

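A Preliminary Study of "Manual Search Strategy"   Times cited: 1 (self-citations: 0, by others: 1)  
1. What is a "manual search strategy"? In computer-based information retrieval, given a certain database quality and system functionality, the quality of the search strategy directly affects the recall and precision of relevant documents and thus the effectiveness of the computer search service. Search strategy is equally an issue when searching manually. For a given information request, for example, the searcher must determine the perspective, depth and breadth of search it really requires and choose retrieval tools suited to it. The searcher must also decide which access route to start from and which index to use, and which classes to check and which subject headings or keywords to use. In addition, one must ...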

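Through semantic parsing of retrieval resources and user queries, and a sentence similarity computation based on conceptual graph matching, the approach retrieves both information resources that exactly match the search conditions and implicit resources that are semantically related to them. This improves recall and precision. Finally, an experimental semantic retrieval system is used to verify the feasibility and effectiveness of the system analysis and design.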
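The abstract does not give its similarity formula. As an assumed, simplified illustration, a conceptual graph can be stored as a set of (concept, relation, concept) triples. Two graphs can then be compared by blending concept overlap with relation overlap, here using Dice coefficients and a hypothetical mixing weight w. The paper's retrieval of implicit resources depends on matching semantically related but non-identical concepts, which would need a concept hierarchy and is not shown here.

```python
def dice(a, b):
    """Dice coefficient of two sets (1.0 when both are empty)."""
    return 2 * len(a & b) / (len(a) + len(b)) if (a or b) else 1.0


def graph_similarity(g1, g2, w=0.5):
    """g1, g2: sets of (concept, relation, concept) triples.

    Hypothetical combination: w * concept overlap + (1 - w) * relation overlap.
    """
    concepts1 = {c for s, _, o in g1 for c in (s, o)}
    concepts2 = {c for s, _, o in g2 for c in (s, o)}
    return w * dice(concepts1, concepts2) + (1 - w) * dice(set(g1), set(g2))


query_graph = {("user", "agent", "search"), ("search", "object", "document")}
doc_graph = {("user", "agent", "search"), ("search", "object", "paper")}
print(round(graph_similarity(query_graph, doc_graph), 4))  # 0.5833
```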

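This paper designs an information retrieval evaluation that takes into account the business characteristics of patent search and the points where intelligent semantic analysis can be combined with it. The goal of the evaluation is to assess the state of research on semantic patent retrieval and the effectiveness of existing systems. To automate the evaluation further, the paper proposes an automatic evaluation method for related-patent retrieval based on cited documents. Experimental results show that this method produces results broadly consistent with manual assessment. The work on this evaluation provides a useful reference for patent retrieval research.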
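The core idea is to use citations as relevance judgments so that no human assessor is needed, and it can be sketched as below. The abstract does not state which citations count as relevant (for example examiner versus applicant citations) or which metrics were reported. Using precision@k, recall@k and average precision here is an assumption.

```python
def evaluate_with_citations(ranked, cited, k=10):
    """Score one patent query, treating the patents cited by the query patent
    as the (assumed) relevant set. Returns (precision@k, recall@k, AP)."""
    cited = set(cited)
    hits_at_k = sum(1 for d in ranked[:k] if d in cited)
    precision = hits_at_k / k
    recall = hits_at_k / len(cited) if cited else 0.0
    # average precision over the whole ranked list
    hits, precision_sum = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in cited:
            hits += 1
            precision_sum += hits / rank
    ap = precision_sum / len(cited) if cited else 0.0
    return precision, recall, ap


p, r, ap = evaluate_with_citations(["P1", "P7", "P3", "P9", "P2"], {"P3", "P2", "P5"}, k=5)
print(round(p, 3), round(r, 3), round(ap, 3))  # 0.4 0.667 0.244
```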

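Common information retrieval models have two major shortcomings. Queries and content lose semantics when they are represented, and results are limited to single documents. This paper proposes solutions to both and, building on them, an ontology-based retrieval model with family-style (grouped) returns. It also discusses key issues in the model, such as family-style return, query and document representation, and semantic matching.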

This paper presents a Graph Inference retrieval model that integrates structured knowledge resources, statistical information retrieval methods and inference in a unified framework. Key components of the model are a graph-based representation of the corpus and retrieval driven by an inference mechanism achieved as a traversal over the graph. The model is proposed to tackle the semantic gap problem, that is, the mismatch between the raw data and the way a human being interprets it. We break down the semantic gap problem into five core issues, each requiring a specific type of inference in order to be overcome. Our model and evaluation are applied to the medical domain because search within this domain is particularly challenging and, as we show, often requires inference. In addition, this domain features both structured knowledge resources and unstructured text. Our evaluation shows that inference can be effective, retrieving many new relevant documents that are not retrieved by state-of-the-art information retrieval models. We show that many retrieved documents were not pooled by keyword-based search methods, prompting us to perform additional relevance assessment on these new documents. A third of the newly retrieved documents that were judged were found to be relevant. Our analysis provides a thorough understanding of when and how to apply inference for retrieval, including a categorisation of queries according to the effect of inference. The inference mechanism promoted recall by retrieving new relevant documents not found by previous keyword-based approaches. In addition, it promoted precision by an effective reranking of documents. When inference is used, performance gains can generally be expected on hard queries. However, inference should not be applied universally: for easy, unambiguous queries and queries with few relevant documents, inference adversely affected effectiveness. 
These conclusions reflect the fact that effective retrieval as inference requires a careful balancing act. Finally, although the Graph Inference model was developed and applied for medical search, it is a general retrieval model applicable to other areas such as web search, where an emerging research trend is to utilise structured knowledge resources for more effective semantic search.
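The sketch below gives a rough illustration of retrieval as graph traversal. It spreads a score from query nodes through a graph of terms, concepts and documents, and the score decays with each hop. It is a generic breadth-first spreading scheme, not the paper's inference mechanism. The "doc:" node prefix, the decay factor and the hop limit are all assumptions. The hop limit is one simple way to control the balancing act noted above: deeper traversal reaches more documents but can drift from the query.

```python
from collections import deque


def graph_retrieve(graph, query_nodes, decay=0.5, max_hops=3):
    """Score document nodes by decayed distance from the query nodes.

    graph: node -> list of neighbour nodes (adjacency lists)
    Document nodes are those named "doc:..." (hypothetical convention).
    Each query node adds decay**hops to every document it reaches within
    max_hops, where hops is the shortest-path length found by BFS.
    """
    scores = {}
    for start in query_nodes:
        seen = {start}
        frontier = deque([(start, 0)])
        while frontier:
            node, hops = frontier.popleft()
            if node.startswith("doc:"):
                scores[node] = scores.get(node, 0.0) + decay ** hops
            if hops == max_hops:
                continue
            for nb in graph.get(node, ()):
                if nb not in seen:
                    seen.add(nb)
                    frontier.append((nb, hops + 1))
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


graph = {
    "term:heart attack": ["concept:myocardial infarction", "doc:2"],
    "concept:myocardial infarction": ["doc:1", "concept:troponin"],
    "concept:troponin": ["doc:3"],
}
ranking = graph_retrieve(graph, ["term:heart attack"])
print(ranking)  # [('doc:2', 0.5), ('doc:1', 0.25), ('doc:3', 0.125)]
```

Here doc:2 is a direct keyword match. doc:1 and doc:3 are reached only through concept nodes, so they stand in for the new documents that keyword-based search would miss.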

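Traditional information retrieval models do not satisfy user needs well. To address this, the paper analyzes existing research and proposes a knowledge retrieval model based on a domain ontology. A domain ontology is built and documents are semantically annotated. Queries undergo concept extraction and semantic expansion. The resulting semantic index terms serve as the knowledge representation of both documents and user requests. The paper also studies a model for computing semantic relations between terms in the domain ontology. Because semantic similarity and semantic relatedness are intrinsically linked, it defines a correlation coefficient to measure how well a candidate matches the retrieval target. Finally, the model is validated, and the results show a significant improvement in retrieval performance.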
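The abstract does not specify its coefficient, which combines semantic similarity with semantic relatedness. The sketch below illustrates only the similarity half. It substitutes the well-known Wu-Palmer measure on an is-a hierarchy and uses it for semantic query expansion. The toy ontology, the threshold and the function names are hypothetical.

```python
def path_to_root(concept, parent):
    """Concept followed by its ancestors up to the root of the is-a tree."""
    path = [concept]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


def wu_palmer(a, b, parent):
    """Wu-Palmer similarity: 2*depth(lcs) / (depth(a) + depth(b)), root depth = 1."""
    path_a, path_b = path_to_root(a, parent), path_to_root(b, parent)
    common = set(path_b)
    lcs = next((c for c in path_a if c in common), None)  # lowest common subsumer
    if lcs is None:
        return 0.0
    depth_lcs = len(path_to_root(lcs, parent))
    return 2 * depth_lcs / (len(path_a) + len(path_b))


def expand_query(concepts, vocabulary, parent, threshold=0.85):
    """Add every ontology concept at least `threshold` similar to a query concept."""
    return sorted({c for q in concepts for c in vocabulary
                   if wu_palmer(q, c, parent) >= threshold})


parent = {  # child -> parent (is-a); a toy ontology
    "document": "resource",
    "software": "resource",
    "patent": "document",
    "paper": "document",
    "journal article": "paper",
}
vocabulary = set(parent) | {"resource"}
print(expand_query(["paper"], vocabulary, parent))  # ['journal article', 'paper']
```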

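A Knowledge Retrieval Model Based on Chinese Natural Language Understanding   Times cited: 6 (self-citations: 0, by others: 6)  
The model works in four steps. User queries and Web documents are processed with natural language processing at the semantic level to build concepts and concept networks. The user's actual information need is matched by inference against the concept network and the source documents it maps to. The results are then ranked and returned to the user. 2 figures, 5 references.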

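A Preliminary Exploration of Latent Semantic Indexing for Text Retrieval   Times cited: 5 (self-citations: 0, by others: 5)  
Traditional text retrieval relies on simple term matching between the set of queries and the set of texts, which cannot resolve the synonymy and polysemy encountered in practice. This article explains the principle of latent semantic indexing for text retrieval. It verifies experimentally that latent semantic indexing can address synonymy and polysemy and improve the performance of retrieval systems.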

The collective feedback of the users of an Information Retrieval (IR) system has been shown to provide semantic information that, while hard to extract using standard IR techniques, can be useful in Web mining tasks. In the last few years, several approaches have been proposed to process the logs stored by Internet Service Providers (ISP), Intranet proxies or Web search engines. However, the solutions proposed in the literature only partially represent the information available in the Web logs. In this paper, we propose to use a richer data structure that is able to preserve most of the information available in the Web logs. This data structure consists of three groups of entities (users, documents and queries) that are connected in a network of relations. Query refinements correspond to separate transitions between the corresponding query nodes in the graph, while users are linked to the queries they have issued and to the documents they have selected. The classical query/document transitions, which connect a query to the documents selected by the users in the returned result page, are also considered. The resulting data structure is a complete representation of the collective search activity performed by the users of a search engine or of an Intranet. The experimental results show that this more powerful representation can be successfully used in several Web mining tasks, such as discovering semantically relevant query suggestions and categorizing Web pages by topic.
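The following is a minimal sketch of such a user/query/document structure. It assumes a simple click-log format of (user, query, clicked documents) records in time order. Consecutive distinct queries from the same user are treated as refinements, which is a simplification because real logs need session segmentation. The scoring rule in suggest (refinement count plus the number of co-clicked documents) is an illustrative assumption rather than the paper's method.

```python
from collections import Counter, defaultdict


def build_search_graph(log):
    """log: chronological (user, query, clicked_doc_ids) records.

    Returns the relation types described in the abstract: query refinements
    (query -> next query of the same user), query -> clicked documents,
    user -> issued queries, and user -> selected documents.
    """
    refinements = defaultdict(Counter)
    query_docs = defaultdict(Counter)
    user_queries = defaultdict(set)
    user_docs = defaultdict(set)
    last_query = {}
    for user, query, clicks in log:
        user_queries[user].add(query)
        for doc in clicks:
            query_docs[query][doc] += 1
            user_docs[user].add(doc)
        prev = last_query.get(user)
        if prev is not None and prev != query:
            refinements[prev][query] += 1
        last_query[user] = query
    return refinements, query_docs, user_queries, user_docs


def suggest(query, refinements, query_docs, n=3):
    """Rank candidate suggestions by refinement count plus shared clicked documents."""
    scores = Counter(refinements.get(query, {}))
    clicked = set(query_docs.get(query, {}))
    for other, docs in query_docs.items():
        if other != query:
            shared = len(clicked & set(docs))
            if shared:
                scores[other] += shared
    return [q for q, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:n]]


log = [
    ("u1", "jaguar", ["d1"]),
    ("u1", "jaguar car", ["d2"]),
    ("u2", "jaguar", []),
    ("u2", "jaguar car", ["d2"]),
    ("u3", "jaguar animal", ["d1"]),
]
refinements, query_docs, _, _ = build_search_graph(log)
print(suggest("jaguar", refinements, query_docs))  # ['jaguar car', 'jaguar animal']
```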

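Research on the Architecture of an NLP-Based Knowledge Extraction System   Times cited: 1 (self-citations: 0, by others: 1)  
Drawing on the system architectures of natural language processing platforms and knowledge extraction systems, this paper proposes a detailed design for an NLP-based knowledge extraction system. The natural language processing stage comprises eight major modules, including word segmentation, part-of-speech tagging, syntactic parsing and semantic analysis. The knowledge extraction stage comprises four major modules: paper-type analysis, discourse structure analysis, knowledge extraction and knowledge representation. By studying this architecture, the paper clarifies the relationship between natural language processing and knowledge extraction and identifies the workflow and key technologies of knowledge extraction.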
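The two-stage architecture can be sketched as a chain of modules that each read and enrich a shared document record, so that the NLP modules feed the knowledge-extraction modules. Only three toy modules are shown. Whitespace splitting stands in for Chinese word segmentation, and the cue-word rule used for paper type and knowledge extraction is hypothetical.

```python
from typing import Callable, Dict, List

Stage = Callable[[Dict], Dict]
CUE_WORDS = {"propose", "proposes", "present", "presents"}  # hypothetical cues


def run_pipeline(record: Dict, stages: List[Stage]) -> Dict:
    """Pass the record through each module in order."""
    for stage in stages:
        record = stage(record)
    return record


# --- NLP stage (toy) ---
def segment(record: Dict) -> Dict:
    # whitespace tokenization stands in for real word segmentation
    record["sentences"] = [s.split() for s in record["text"].split(".") if s.strip()]
    return record


# --- knowledge extraction stage (toy) ---
def classify_paper_type(record: Dict) -> Dict:
    has_cue = any(CUE_WORDS & {w.lower() for w in s} for s in record["sentences"])
    record["paper_type"] = "method" if has_cue else "other"
    return record


def extract_knowledge(record: Dict) -> Dict:
    record["knowledge"] = [" ".join(s) for s in record["sentences"]
                           if CUE_WORDS & {w.lower() for w in s}]
    return record


result = run_pipeline({"text": "Retrieval is hard. We propose a new model. It works."},
                      [segment, classify_paper_type, extract_knowledge])
print(result["paper_type"], result["knowledge"])  # method ['We propose a new model']
```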

Large-scale retrieval systems are often implemented as a cascading sequence of phases. A first filtering step extracts a large set of candidate documents using a simple technique such as Boolean matching and/or static document scores. One or more ranking steps then score the pool of documents retrieved by the filter more precisely, using dozens or perhaps hundreds of different features. The documents returned to the user are then taken from the head of the final ranked list. Here we examine methods for measuring the quality of the filtering and preliminary ranking stages, and show how to use these measurements to tune the overall performance of the system. Standard top-weighted metrics used for overall system evaluation are not appropriate for assessing filtering stages, since the output is a set of documents rather than an ordered sequence of documents. Instead, we use an approach in which a quality score is computed based on the discrepancy between filtered and full evaluation. Unlike previous approaches, our methods do not require relevance judgments, and thus can be used with virtually any query set. We show that this quality score directly correlates with actual differences in measured effectiveness when relevance judgments are available. Since the quality score does not require relevance judgments, it can be used to identify queries that perform particularly poorly for a given filter. Using these methods, we explore a wide range of filtering options using thousands of queries, categorize the relative merits of the different approaches, and identify useful parameter combinations.
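One simple way to express quality as the discrepancy between filtered and full evaluation, without relevance judgments, is sketched below. The final ranker is run once over the whole collection, and the score measures how much of its top-k survives the filter, weighted toward the top ranks. Because the same ranker orders both runs, documents the filter keeps stay in the same relative order, so only the dropped ones matter. The geometric rank weights (borrowed from rank-biased precision) and the parameter values are assumptions, not the authors' exact metric.

```python
def filter_quality(full_ranking, filtered_set, k=10, p=0.8):
    """Top-weighted share of the full ranker's top-k that the filter retained.

    1.0 means the filter dropped nothing the final ranker would have shown.
    Weights (1 - p) * p**i emphasize the highest ranks (assumed scheme).
    """
    top = full_ranking[:k]
    if not top:
        return 1.0
    weights = [(1 - p) * p ** i for i in range(len(top))]
    kept = sum(w for w, doc in zip(weights, top) if doc in filtered_set)
    return kept / sum(weights)


# The filter lost "b", the full ranker's second-best document.
score = filter_quality(["a", "b", "c", "d"], {"a", "c", "d", "x"}, k=4, p=0.5)
print(round(score, 4))  # 0.7333
```

Averaging this score over many queries compares filters, and the lowest per-query scores flag queries that a given filter handles poorly.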

The application of word sense disambiguation (WSD) techniques to information retrieval (IR) has yet to provide convincing retrieval results. Major obstacles to effective WSD in IR include coverage and granularity problems of word sense inventories, sparsity of document context, and limited information provided by short queries. In this paper, to alleviate these issues, we propose the construction of latent context models for terms using latent Dirichlet allocation. We propose building one latent context per word, using a well-principled representation of local context based on word features. In particular, context words are weighted using a decaying function according to their distance to the target word, which is learnt from data in an unsupervised manner. The resulting latent features are used to discriminate word contexts, so as to constrain the query's semantic scope. Consistent and substantial improvements, including on difficult queries, are observed on TREC test collections, and the technique combines well with blind relevance feedback. Compared to traditional topic modeling, WSD and positional indexing techniques, the proposed retrieval model is more effective and scales well on large-scale collections.
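The distance-based weighting of local context can be sketched as follows. The resulting weighted bag of context words is the kind of input a per-word LDA model would be trained on; LDA itself is not shown. The paper learns its decay function from data, so the fixed exponential form, the rate and the window size here are assumptions.

```python
import math
from collections import defaultdict


def context_vector(tokens, target_index, window=5, decay=0.5):
    """Weight each word within `window` positions of the target word by
    exp(-decay * distance). A fixed exponential with a hand-set rate is an
    assumption; the paper learns its decay function from data."""
    weights = defaultdict(float)
    lo = max(0, target_index - window)
    hi = min(len(tokens), target_index + window + 1)
    for i in range(lo, hi):
        if i != target_index:
            weights[tokens[i]] += math.exp(-decay * abs(i - target_index))
    return dict(weights)


tokens = "the bank raised interest rates again".split()
vec = context_vector(tokens, tokens.index("bank"), window=3)
print({w: round(v, 3) for w, v in vec.items()})
```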
