首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到18条相似文献,搜索用时 223 毫秒
1.
英汉交互式跨语言检索系统设计与实现   总被引:1,自引:0,他引:1  
针对跨语言信息检索的查询翻译歧义性问题,采用交互式系统开发设计方法,对基于相关反馈的跨语言信息检索技术进行研究和分析,提出一个英汉交互式跨语言信息检索系统,实现用户辅助查询翻译、多级用户相关性判断,以及翻译优化与查询扩展等相关反馈功能,结果明显提高了检索效果。  相似文献   

2.
分析跨语言信息检索的基本模式和翻译消歧关键技术,采用基于词语对共现率和词语间距加权计算的方法,对查询式翻译进行消歧优化,在此基础上构建跨语言商品信息检索系统并应用于图书商品搜索,实验结果证明翻译质量和检索效果得到提高。  相似文献   

3.
跨语言信息检索进展研究   总被引:7,自引:1,他引:6  
根据研究对象的变迁,国外关于跨语言信息检索的历程主要分为三个阶段。跨语言信息检索目前的主要解决方法是在单语言信息检索系统上增加一个语言转换机制。解决查询条件与查询文档集间的语言障碍有五种不同的技术路线。跨语言信息检索主要研究热点有翻译歧义研究、翻译资源构建、专有名词识别与音译研究等五个领域。  相似文献   

4.
交互式跨语言信息检索是信息检索的一个重要分支。在分析交互式跨语言信息检索过程、评价指标、用户行为进展等理论研究基础上,设计一个让用户参与跨语言信息检索全过程的用户检索实验。实验结果表明:用户检索词主要来自检索主题的标题;用户判断文档相关性的准确率较高;目标语言文档全文、译文摘要、译文全文都是用户认可的判断依据;翻译优化方法以及翻译优化与查询扩展的结合方法在用户交互环境下非常有效;用户对于反馈后的翻译仍然愿意做进一步选择;用户对于与跨语言信息检索系统进行交互是有需求并认可的。用户行为分析有助于指导交互式跨语言信息检索系统的设计与实践。  相似文献   

5.
面对日益膨胀的多语种信息资源,跨语言信息检索已成为实现全球知识存取和共享的关键技术手段。构建一个实用型的跨语言检索查询翻译接口,可方便地嵌入任意的信息检索平台,扩展现有信息检索平台的多语言信息处理能力。该查询翻译接口采用基于最长短语、查询分类和概率词典等多种翻译消歧策略,并从查询翻译的准确性和接口的运行效率两个角度对构建的查询翻译接口进行评测,实验结果验证所采用方法具有可行性。  相似文献   

6.
本体驱动的跨语言信息检索研究   总被引:5,自引:0,他引:5  
分析跨语言信息检索技术的翻译歧异性问题,指出多语本体的引入可以提高语义排歧的准确性,详细分析两个国外的跨语言信息检索系统,并在此基础上提出一个基于双语本体的中英跨语言信息检索模型及实现方案。  相似文献   

7.
许多研究已经探讨了跨语言和多语言信息检索问题,并提出了多种实现方法,特别是针对查询的翻译.但是大多数的方法都将跨语言检索问题看成是两个分开的步骤查询的翻译和单语检索.而对于多语言信息检索,则另外再加上一个结果合成的步骤.在本文中,我们提出一种一体化的检索方法,即将查询的翻译看成是整个检索过程的一部分.使用这种一体化的方法能充分将翻译和检索中的不确定性结合起来,从而达到更好的整体优化,也能将单语言信息检索的方法用于跨语言及多语言信息检索.  相似文献   

8.
针对现有的命名实体识别方法不能很好地处理专业领域特定命名抽取的问题,提出一种基于启发式规则的专业命名识别方法。以中文文本中化学物质命名为研究对象,分析其领域特征及统计语言特征,建立适用于化学领域文献命名识别的启发式规则,为专业领域的命名实体识别提供新的解决方案。对比实验证明本文的方法能有效提升专业命名识别的效率。  相似文献   

9.
跨语言信息检索方法概述   总被引:4,自引:0,他引:4  
本文介绍了跨语言信息检索问题的由来与发展,通过对单语言信息检索技术的介绍引入了跨语言检索的关键问题,并分别介绍了跨语言信息检索中基于词典、基于语料和基于机器翻译模块这三种主流方法,最后对跨语言检索的一体化方法等较新的思想和跨语言检索评测等做了简要说明.  相似文献   

10.
跨语言信息检索的发展与展望   总被引:7,自引:0,他引:7  
随着计算机网络技术的发展,全球互联网用户快速增长,网络信息资源语种也日益多样,跨语言信息检索已成为越来越重要的研究课题。本文介绍了跨语言信息检索的概念,阐述了跨语言信息检索的实现方法,分析了各翻译方式的技术难点问题,简介了相关的国际测评会议,并对如何进一步提高跨语言的检索精度,提出了自己的看法。  相似文献   

11.
A usual strategy to implement CLIR (Cross-Language Information Retrieval) systems is the so-called query translation approach. The user query is translated for each language present in the multilingual collection in order to compute an independent monolingual information retrieval process per language. Thus, this approach divides documents according to language. In this way, we obtain as many different collections as languages. After searching in these corpora and obtaining a result list per language, we must merge them in order to provide a single list of retrieved articles. In this paper, we propose an approach to obtain a single list of relevant documents for CLIR systems driven by query translation. This approach, which we call 2-step RSV (RSV: Retrieval Status Value), is based on the re-indexing of the retrieval documents according to the query vocabulary, and it performs noticeably better than traditional methods. The proposed method requires query vocabulary alignment: given a word for a given query, we must know the translation or translations to the other languages. Because this is not always possible, we have researched on a mixed model. This mixed model is applied in order to deal with queries with partial word-level alignment. The results prove that even in this scenario, 2-step RSV performs better than traditional merging methods.  相似文献   

12.
Prior-art search in patent retrieval is concerned with finding all existing patents relevant to a patent application. Since patents often appear in different languages, cross-language information retrieval (CLIR) is an essential component of effective patent search. In recent years machine translation (MT) has become the dominant approach to translation in CLIR. Standard MT systems focus on generating proper translations that are morphologically and syntactically correct. Development of effective MT systems of this type requires large training resources and high computational power for training and translation. This is an important issue for patent CLIR where queries are typically very long sometimes taking the form of a full patent application, meaning that query translation using MT systems can be very slow. However, in contrast to MT, the focus for information retrieval (IR) is on the conceptual meaning of the search words regardless of their surface form, or the linguistic structure of the output. Thus much of the complexity of MT is not required for effective CLIR. We present an adapted MT technique specifically designed for CLIR. In this method IR text pre-processing in the form of stop word removal and stemming are applied to the MT training corpus prior to the training phase. Applying this step leads to a significant decrease in the MT computational and training resources requirements. Experimental application of the new approach to the cross language patent retrieval task from CLEF-IP 2010 shows that the new technique to be up to 23 times faster than standard MT for query translations, while maintaining IR effectiveness statistically indistinguishable from standard MT when large training resources are used. Furthermore the new method is significantly better than standard MT when only limited translation training resources are available, which can be a significant issue for translation in specialized domains. The new MT technique also enables patent document translation in a practical amount of time with a resulting significant improvement in the retrieval effectiveness.  相似文献   

13.
Given a user question, the goal of a Question Answering (QA) system is to retrieve answers rather than full documents or even best-matching passages, as most Information Retrieval systems currently do. In this paper, we present BRUJA, a QA system for the management of multilingual collections. BRUJ rkstions (English, Spanish and French). The BRUJA architecture is not formed with three monolingual QA systems but instead uses English as Interlingua to make usual QA tasks such as question classifications and answer extractions. In addition, BRUJA uses Cross Language Information Retrieval (CLIR) techniques to retrieve relevant documents from a multilingual collection. On the one hand, we have more documents to find answers from but on the other hand, we are introducing noise into the system because of translations to the Interlingua (English) and the CLIR module. The question is whether the difficulty of managing three languages is worth it or whether a monolingual QA system delivers better results. We report on in-depth experimentation and demonstrate that our multilingual QA system gets better results than its monolingual counterpart whenever it uses good translation resources and, especially, CLIR techniques that are state-of-the-art.  相似文献   

14.
在指出跨语言信息检索技术中大部分实现方法存在分离现象的基础上,介绍将提问式翻译与检索过程统一的思想,并探讨将提问式翻译与检索过程统一的方法。  相似文献   

15.
In this paper, we study different applications of cross-language latent topic models trained on comparable corpora. The first focus lies on the task of cross-language information retrieval (CLIR). The Bilingual Latent Dirichlet allocation model (BiLDA) allows us to create an interlingual, language-independent representation of both queries and documents. We construct several BiLDA-based document models for CLIR, where no additional translation resources are used. The second focus lies on the methods for extracting translation candidates and semantically related words using only per-topic word distributions of the cross-language latent topic model. As the main contribution, we combine the two former steps, blending the evidences from the per-document topic distributions and the per-topic word distributions of the topic model with the knowledge from the extracted lexicon. We design and evaluate the novel evidence-rich statistical model for CLIR, and prove that such a model, which combines various (only internal) evidences, obtains the best scores for experiments performed on the standard test collections of the CLEF 2001–2003 campaigns. We confirm these findings in an alternative evaluation, where we automatically generate queries and perform the known-item search on a test subset of Wikipedia articles. The main importance of this work lies in the fact that we train translation resources from comparable document-aligned corpora and provide novel CLIR statistical models that exhaustively exploit as many cross-lingual clues as possible in the quest for better CLIR results, without use of any additional external resources such as parallel corpora or machine-readable dictionaries.  相似文献   

16.
With the increasing availability of machine-readable bilingual dictionaries, dictionary-based automatic query translation has become a viable approach to Cross-Language Information Retrieval (CLIR). In this approach, resolving term ambiguity is a crucial step. We propose a sense disambiguation technique based on a term-similarity measure for selecting the right translation sense of a query term. In addition, we apply a query expansion technique which is also based on the term similarity measure to improve the effectiveness of the translation queries. The results of our Indonesian to English and English to Indonesian CLIR experiments demonstrate the effectiveness of the sense disambiguation technique. As for the query expansion technique, it is shown to be effective as long as the term ambiguity in the queries has been resolved. In the effort to solve the term ambiguity problem, we discovered that differences in the pattern of word-formation between the two languages render query translations from one language to the other difficult.  相似文献   

17.
This paper reviews literature on dictionary-based cross-language information retrieval (CLIR) and presents CLIR research done at the University of Tampere (UTA). The main problems associated with dictionary-based CLIR, as well as appropriate methods to deal with the problems are discussed. We will present the structured query model by Pirkola and report findings for four different language pairs concerning the effectiveness of query structuring. The architecture of our automatic query translation and construction system is presented.  相似文献   

18.
This study develops regression models for predicting the performance of cross-language information retrieval (CLIR). The model assumes that CLIR performance can be explained by two factors: (1) the ease of search inherent in each query and (2) the translation quality in the process of CLIR systems. As operational variables, monolingual information retrieval (IR) performance is used for measuring the ease of search, and the well-known evaluation metric BLEU is used to measure the translation quality. This study also proposes an alternative metric, weighted average for matched unigrams (WAMU), which is tailored to gauging translation quality for special IR purposes. The data for regression analysis are obtained from a retrieval experiment of English-to-Italian bilingual searches using the CLEF 2003 test collection. The CLIR and monolingual IR performances are measured by average precision score. The result shows that the proposed regression model can explain about 60% of the variation in CLIR performance, and WAMU has more predictive power than BLEU. A back translation method for applying the regression model to operational CLIR systems in real situations is discussed.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号