Similar literature
 18 similar documents found; search took 221 ms
1.
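[Purpose/Significance] By summarizing domestic and international research on mapping techniques for multilingual ontologies and analyzing the EuroWordNet case, this paper offers a reference for building mapping mechanisms in cross-language information retrieval systems in China. [Method/Process] EuroWordNet, a relatively mature multilingual ontology base, is taken as the case, and its Inter-Lingual-Index (ILI) mechanism is analyzed from four aspects: database design, ontology construction, concept storage, and the mapping treatment of cultural differences between languages. [Result/Conclusion] An embedded database structure, concept extraction together with the definition of synset correspondences, fine-grained concept storage, and the establishment of complex equivalence relations are the keys to building a mapping mechanism for cross-language information retrieval.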

2.
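Research on ontology-driven cross-language information retrieval   Total citations: 5, self-citations: 0, citations by others: 5
This paper analyzes the translation ambiguity problem in cross-language information retrieval and points out that introducing multilingual ontologies can improve the accuracy of semantic disambiguation. Two foreign cross-language information retrieval systems are analyzed in detail, and on that basis a Chinese-English cross-language information retrieval model based on a bilingual ontology is proposed, together with an implementation scheme.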

3.
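[Purpose/Significance] Building a cross-language information retrieval model based on a multilingual ontology helps users obtain information resources in different languages while working in the language they know best. [Method/Process] Through ontology design and the design of the retrieval model's functional modules, a Chinese-English cross-language information retrieval model based on a digital publishing domain ontology is built and implemented in Java on the Lucene search engine architecture. [Result/Conclusion] A multilingual domain ontology is explicit, formal, shared, conceptualized, and clearly structured, so it can serve as the semantic layer of a cross-language information retrieval system and provide a semantic representation of information resources. Tests show that the model handles word segmentation, query expansion, and semantic association well, moving cross-language information retrieval toward the semantic level.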
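
The query expansion step described in this abstract can be sketched as follows. This is a minimal illustration in Python, not the Java/Lucene implementation the authors report: the `Concept` class, the toy `ONTOLOGY` entries, and the `expand_query` function are all hypothetical, and word segmentation is assumed to have already produced the query terms.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One ontology concept with labels per language (hypothetical structure)."""
    labels: dict                                  # language code -> list of labels
    related: list = field(default_factory=list)   # ids of semantically related concepts


# Toy bilingual ontology; the entries are invented for illustration only.
ONTOLOGY = {
    "ebook": Concept({"zh": ["电子书"], "en": ["e-book", "electronic book"]},
                     related=["epub"]),
    "epub": Concept({"zh": ["EPUB格式"], "en": ["EPUB"]}),
}


def expand_query(terms, ontology, source_lang, target_lang):
    """Map source-language terms to target-language labels via shared concepts,
    then add the labels of directly related concepts."""
    expanded = []
    for term in terms:
        for concept in ontology.values():
            if term not in concept.labels.get(source_lang, []):
                continue
            candidates = list(concept.labels.get(target_lang, []))
            for related_id in concept.related:
                candidates.extend(ontology[related_id].labels.get(target_lang, []))
            for label in candidates:
                if label not in expanded:   # keep order, drop duplicates
                    expanded.append(label)
    return expanded


print(expand_query(["电子书"], ONTOLOGY, "zh", "en"))
```

Because the concept identifier, not the surface word, links the two languages, the ontology acts as the pivot for translation and expansion at the same time.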

4.
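With the successive emergence of ontology construction methods and experimental ontologies, research on ontology alignment aimed at cross-ontology communication and collaboration has drawn wide attention from the international academic community in recent years. To maximize the reuse of existing ontologies and address cross-ontology mapping, the core problem of ontology alignment, this paper reviews current methods for computing concept similarity in ontology mapping and proposes computing concept similarity in cross-ontology mapping with the "object-attribute similarity based on concept lattice" (OASBCL) method. An application example in cross-ontology mapping illustrates the method's effectiveness. The method is then discussed from three aspects: the complementarity of concept lattices and ontologies, the factors that make up the similarity measure, and the properties of the mapping. The aim is to explore a formal concept similarity computation method that can support cross-ontology mapping between heterogeneous ontologies.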
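
The abstract does not give the OASBCL formula, so the sketch below only illustrates the general idea of scoring two formal concepts by both their objects (extent) and their attributes (intent). It assumes a weighted Jaccard overlap, which is a stand-in and not the authors' measure; the names `jaccard`, `concept_similarity`, and the weight `w` are hypothetical.

```python
def jaccard(a, b):
    """Jaccard overlap of two collections; defined as 0.0 when both are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def concept_similarity(concept1, concept2, w=0.5):
    """Similarity of two formal concepts given as (extent, intent) pairs.

    w weights the object (extent) overlap; 1 - w weights the attribute
    (intent) overlap. The weighted-Jaccard form is an assumption.
    """
    extent1, intent1 = concept1
    extent2, intent2 = concept2
    return w * jaccard(extent1, extent2) + (1 - w) * jaccard(intent1, intent2)


# Two concepts from different ontologies sharing one object and all attributes.
c1 = ({"aspirin", "ibuprofen"}, {"analgesic", "oral"})
c2 = ({"ibuprofen", "naproxen"}, {"analgesic", "oral"})
print(concept_similarity(c1, c2))
```

A pair of concepts whose score exceeds a chosen threshold would then be a candidate cross-ontology mapping.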

5.
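Intelligence is an important direction for information retrieval technology, and introducing ontology techniques into the retrieval process to make it intelligent has attracted wide attention and study. Existing research, however, is limited to using ontologies to standardize a shared human-machine understanding of the lexical concepts in a text, which yields retrieval at the level of word semantics; it lacks an effective way to apply the reasoning capability of ontologies to information retrieval. To solve this problem, this paper introduces description logic into information retrieval. On the one hand, a mapping between description logic and domain ontologies is established to regulate the construction of domain ontologies and give them automatic reasoning capability. On the other hand, semantic annotation is used so that the domain ontology describes the documents to be retrieved. In this way, information retrieval over a document collection can be converted into a reasoning process over a description logic knowledge base, which brings reasoning services into the retrieval process and makes retrieval intelligent. The article defines the concept of description-logic-based information retrieval in detail, describes the retrieval services it provides, and uses an application in an enterprise environment to show the new retrieval capabilities that description logic enables. Finally, it introduces an implementation scheme for description-logic-based information retrieval.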

6.
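From the perspectives of the concept of ontology, the levels of knowledge organization systems, ontology construction methods, and applications in information retrieval systems, this paper analyzes the relationship between ontologies and information retrieval languages. Considering the characteristics of ontologies, it points out that the development trends of information linguistics are stronger semantics, a broader range of users, greater "transparency", and sharing and compatibility.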

7.
张兆伦 《兰台世界》2006,(20):65-66
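The article first introduces the concept, role, and construction methods of ontologies, and then describes the general methods of ontology-based information retrieval.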

8.
郝斌 《图书情报知识》2007,287(6):67-71
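Relevance theory is one of the foundational theories of information science and a key indicator of information retrieval effectiveness, while ontology-based information retrieval is a frontier topic and development direction in the field. Based on Mizzaro's four-dimensional relevance model, this paper compares relevance performance under different types of ontology-based retrieval models and finds that relevance improves considerably under ontology-based retrieval.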

9.
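Heterogeneity among ontologies of different domains, together with semantic divergence and the difficulty of interoperation, hinders ontology reuse and knowledge sharing. Taking currently mature domain ontologies as the object of study, this paper analyzes the concept lattices of heterogeneous domain ontologies on a common semantic basis, obtains the extents and intents of the relevant concepts without breaking the partial order, builds cross-ontology mappings using a concept similarity computation based on objects (extent) and attributes (intent), and finally constructs a multi-ontology collaborative knowledge map that enables knowledge presentation and retrieval across heterogeneous ontologies.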

10.
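Research and implementation of a patent information retrieval system based on domain ontology   Total citations: 1, self-citations: 0, citations by others: 1
To address the problems that traditional information retrieval methods face in today's networked information environment, a patent information retrieval model based on domain ontology is proposed. The processes and technical implementation of user query processing, ontology construction, ontology visualization and semantic expansion, and retrieval and storage are studied, and a prototype patent information retrieval system based on a clothing domain ontology is developed. Comparative tests show that the model greatly improves the completeness of retrieval while preserving its accuracy.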

11.
We present a system for multilingual information retrieval that allows users to formulate queries in their preferred language and retrieve relevant information from a collection containing documents in multiple languages. The system is based on a process of document-level alignment, where documents of different languages are paired according to their similarity. The resulting mapping allows us to produce a multilingual comparable corpus. Such a corpus has multiple interesting applications. It allows us to build a data structure for query translation in cross-language information retrieval (CLIR). Moreover, we also perform pseudo-relevance feedback on the alignments to improve our retrieval results. Finally, multiple retrieval runs can be merged into one unified result list. The resulting system is inexpensive, adaptable to domain-specific collections and new languages, and has performed very well at the TREC-7 conference CLIR system comparison.

12.
Semantic Web and semantics-based web information retrieval   Total citations: 54, self-citations: 3, citations by others: 51
张晓林 《情报学报》2002,21(4):413-420
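This paper describes the need for semantic retrieval in the networked environment, analyzes the component framework of the Semantic Web, discusses ontologies and their definition and markup languages, and introduces the basic processes of ontology-based semantic annotation of information resources and semantic reasoning.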

13.
14.
This article reports on the results of an exploratory user-centered study that examined how technological advancements in natural language processing (NLP), such as the availability of multilingual information access (MLIA) tools, impact the information searching behavior of bi/multilingual academic users. Thirty-one bi/multilingual students participated in a controlled lab-based user experiment in which they carried out two assigned tasks each on Google and WorldCat, for a total of four tasks, and then completed a post-experiment questionnaire. The captures from the experiment showed 86.7% of the participants using multilingual information access tools. Further analyses of the captures also showed that participants were more likely to use MLIA tools when the instructions for the task were stated in their native language. An independent samples t-test revealed that participants spent less time on their searches when they used MLIA tools. The study revealed considerable diversity in the information searching behavior of the participants, even within the same pair of languages, and even for the same user. Diversity was noted, for instance, in which tasks MLIA tools were used on and in how these tools were used. User-centered, personalized multilingual information retrieval (PMLIR) models could hold promise for best representing the information searching behavior of bi/multilingual users.

15.
Multilingual information retrieval is generally understood to mean the retrieval of relevant information in multiple target languages in response to a user query in a single source language. In a multilingual federated search environment, different information sources contain documents in different languages. A general search strategy in multilingual federated search environments is to translate the user query to each language of the information sources and run a monolingual search in each information source. It is then necessary to obtain a single ranked document list by merging the individual ranked lists from the information sources that are in different languages. This is known as the results merging problem for multilingual information retrieval. Previous research has shown that the simple approach of normalizing source-specific document scores is not effective. On the other hand, a more effective merging method has been proposed that downloads and translates all retrieved documents into the source language and generates the final ranked list by running a monolingual search in the search client. The latter method is more effective but incurs large online communication and computation costs. This paper proposes an effective and efficient approach for the results merging task of multilingual ranked lists. In particular, it downloads only a small number of documents from the individual ranked lists of each user query to calculate comparable document scores by utilizing both the query-based translation method and the document-based translation method. Then, query-specific and source-specific transformation models can be trained for individual ranked lists by using the information of these downloaded documents. These transformation models are used to estimate comparable document scores for all retrieved documents and thus the documents can be sorted into a final ranked list.
This merging approach is efficient as only a subset of the retrieved documents is downloaded and translated online. Furthermore, an extensive set of experiments on the Cross-Language Evaluation Forum (CLEF) data has demonstrated the effectiveness of the query-specific and source-specific results merging algorithm against other alternatives. The new research in this paper proposes different variants of the query-specific and source-specific results merging algorithm with different transformation models. This paper also provides thorough experimental results as well as detailed analysis. All of the work substantially extends the preliminary research in (Si and Callan, in: Peters (ed.) Results of the cross-language evaluation forum-CLEF 2005, 2005).
Hao Yuan
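
The merging idea in this abstract can be sketched as: for each source, fit a transformation from source-specific scores to comparable scores on the few downloaded documents, apply it to every retrieved document, and sort. The sketch below assumes a simple linear transformation fitted by least squares; the abstract mentions several transformation models without specifying them, so this choice, and the names `fit_linear` and `merge_ranked_lists`, are illustrative assumptions.

```python
def fit_linear(xs, ys):
    """Ordinary least squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:                      # degenerate sample: predict the mean
        return 0.0, mean_y
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov / var_x
    return slope, mean_y - slope * mean_x


def merge_ranked_lists(ranked_lists, comparable_scores):
    """Merge per-source ranked lists for one query.

    ranked_lists:      source -> [(doc_id, source_specific_score), ...]
    comparable_scores: source -> {doc_id: comparable_score} for the small
                       downloaded subset (assumed to be computed elsewhere).
    """
    merged = []
    for source, docs in ranked_lists.items():
        sample = comparable_scores[source]
        xs = [score for doc_id, score in docs if doc_id in sample]
        ys = [sample[doc_id] for doc_id, _ in docs if doc_id in sample]
        slope, intercept = fit_linear(xs, ys)   # one model per query and source
        merged.extend((doc_id, slope * score + intercept) for doc_id, score in docs)
    return sorted(merged, key=lambda pair: pair[1], reverse=True)


# Hypothetical scores: only the top two documents per source were downloaded.
ranked = {"en": [("e1", 10.0), ("e2", 8.0), ("e3", 6.0)],
          "fr": [("f1", 0.9), ("f2", 0.5), ("f3", 0.1)]}
sampled = {"en": {"e1": 1.0, "e2": 0.8}, "fr": {"f1": 0.9, "f2": 0.5}}
print([doc_id for doc_id, _ in merge_ranked_lists(ranked, sampled)])
```

Fitting a separate model per query and per source is what lets raw scores on incompatible scales (here 6 to 10 versus 0.1 to 0.9) be compared in one list.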

16.
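A preliminary study on the visualization of multilingual information retrieval systems   Total citations: 1, self-citations: 0, citations by others: 1
Research on multilingual retrieval is very important now that the variety of information keeps growing. Besides retrieval technology and translation functions, the use of information visualization and interface design is another research focus. According to previous studies and literature reviews, information visualization has been shown to be an effective way to help users carry out multilingual information retrieval. This study proposes a visualization model for multilingual information retrieval systems and its design scheme, and points out future directions for the field.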

17.
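The structural complexity of domain ontologies and the heterogeneity among them make domain ontology mapping methods one of the difficulties in realizing ontology mapping. A theoretical model for multi-domain ontology mapping and clustering is proposed. Guided by this model, the drug domain ontologies RxNorm and NDF-RT (National Drug File - Reference Terminology) are selected for a mapping case study, a new method for mapping between the two is proposed, and the drug classification information provided by NDF-RT is used to classify and aggregate the drug information in the RxNorm ontology, offering a new approach to the semantic interlinking of digital resources.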

18.
Technical term translations are important for cross-lingual information retrieval. In many languages, new technical terms have a common origin rendered with different spellings of the underlying sounds, also known as cross-lingual spelling variants (CLSV). To find the best CLSV in a text database index, we contribute a formulation of the problem in a probabilistic framework, and implement this with an instance of the general edit distance using weighted finite-state transducers. Some training data is required when estimating the costs for the general edit distance. We demonstrate that after some basic training our new multilingual model is robust and requires little or no adaptation for covering additional languages, as the model takes advantage of language-independent transliteration patterns. We train the model with medical terms in seven languages and test it with terms from varied domains in six languages. Two test languages are not in the training data. Against a large text database index, we achieve 64–78% precision at the point of 100% recall. This is a relative improvement of 22% on the simple edit distance.
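
A weighted edit distance of the kind this abstract refers to can be sketched with plain dynamic programming. This is not the authors' weighted finite-state transducer implementation: it handles single-character operations only, and the cost table passed as `sub_cost` is a hypothetical stand-in for costs that the paper estimates from training data.

```python
def weighted_edit_distance(source, target, sub_cost=None, ins_cost=1.0, del_cost=1.0):
    """Edit distance where substituting character a with b costs sub_cost[(a, b)].

    Pairs missing from sub_cost cost 1.0, so an empty table gives the
    ordinary (simple) edit distance.
    """
    sub_cost = sub_cost or {}
    rows, cols = len(source) + 1, len(target) + 1
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):                 # delete every source character
        dist[i][0] = dist[i - 1][0] + del_cost
    for j in range(1, cols):                 # insert every target character
        dist[0][j] = dist[0][j - 1] + ins_cost
    for i in range(1, rows):
        for j in range(1, cols):
            a, b = source[i - 1], target[j - 1]
            substitution = 0.0 if a == b else sub_cost.get((a, b), 1.0)
            dist[i][j] = min(dist[i - 1][j] + del_cost,
                             dist[i][j - 1] + ins_cost,
                             dist[i - 1][j - 1] + substitution)
    return dist[-1][-1]


# Hypothetical learned cost: "c" rendered as "k" is a cheap spelling variation.
costs = {("c", "k"): 0.2}
print(weighted_edit_distance("calcium", "kalcium", costs))
print(weighted_edit_distance("calcium", "kalcium"))
```

Ranking index terms by this distance to a query term, lowest first, gives candidate spelling variants; cheap costs for common transliteration patterns are what separate it from the simple edit distance.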
