首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到17条相似文献,搜索用时 265 毫秒
1.
分析跨语言信息检索的基本模式和翻译消歧关键技术,采用基于词语对共现率和词语间距加权计算的方法,对查询式翻译进行消歧优化,在此基础上构建跨语言商品信息检索系统并应用于图书商品搜索,实验结果证明翻译质量和检索效果得到提高。  相似文献   

2.
跨语言信息检索中的查询翻译方法研究   总被引:1,自引:0,他引:1  
文章介绍了跨语言信息检索查询翻译的四种基本方法,并且对目前查询翻译过程中所遇到的问题及现阶段的研究进展进行了总结分析,最后总结出跨语言信息检索查询翻译未来的发展方向。  相似文献   

3.
吴丹 《图书情报工作》2009,53(13):120-81
查询翻译歧义性问题是影响跨语言信息检索结果的关键,因此针对查询翻译的消歧研究已成为信息检索领域的研究热点。在对现有研究与应用调研的基础上,详细分析四类自动消歧方法,分别是:对查询进行结构化处理、通过语言分析帮助消歧、借助机读化语言资源进行消歧以及通过人机交互消歧,以期为跨语言信息检索查询翻译提供较好的消歧方法。  相似文献   

4.
许多研究已经探讨了跨语言和多语言信息检索问题,并提出了多种实现方法,特别是针对查询的翻译.但是大多数的方法都将跨语言检索问题看成是两个分开的步骤查询的翻译和单语检索.而对于多语言信息检索,则另外再加上一个结果合成的步骤.在本文中,我们提出一种一体化的检索方法,即将查询的翻译看成是整个检索过程的一部分.使用这种一体化的方法能充分将翻译和检索中的不确定性结合起来,从而达到更好的整体优化,也能将单语言信息检索的方法用于跨语言及多语言信息检索.  相似文献   

5.
跨语言信息检索进展研究   总被引:7,自引:1,他引:6  
根据研究对象的变迁,国外关于跨语言信息检索的历程主要分为三个阶段。跨语言信息检索目前的主要解决方法是在单语言信息检索系统上增加一个语言转换机制。解决查询条件与查询文档集间的语言障碍有五种不同的技术路线。跨语言信息检索主要研究热点有翻译歧义研究、翻译资源构建、专有名词识别与音译研究等五个领域。  相似文献   

6.
文章旨在探讨查询分类技术和跨语言检索技术的关系,前者的应用能否改善后者的系统性能是核心问题。首先提出一种基于查询分类的标准化折扣累积增量评价指标,通过对采用查询分类技术前后信息检索系统的标准化折扣累积增量评价指标的变化进行判断,来检验该评价指标的可用性和有效性。同时,查询分类可以作为降低跨语言检索系统查询翻译的歧义性的技术手段。对大规模查询集随机抽样的查询翻译实验结果表明,本文提出的基于查询分类的查询翻译消歧方法对大部分查询有效,在一些情况下甚至可以直接通过本方法完成查询翻译。结合其他方法进一步消除翻译的歧义性则是下一步的工作内容。  相似文献   

7.
本文提出了一种新的基于相关反馈的跨语言信息检索查询翻译优化技术,就实现该技术的关键步骤"估计检索词在相关文献集合中的翻译概率"设计了4种不同的算法,并通过伪相关反馈实验比较了这4种算法,验证了查询翻译优化技术的有效性.实验结果显示,4种翻译优化算法都能够提高检索结果的精度,其中基于词对齐的翻译算法相对更优越.此外,查询式的长度和检索主题的特征对不同查询翻译优化算法产生着不同程度的影响.  相似文献   

8.
英汉交互式跨语言检索系统设计与实现   总被引:1,自引:0,他引:1  
针对跨语言信息检索的查询翻译歧义性问题,采用交互式系统开发设计方法,对基于相关反馈的跨语言信息检索技术进行研究和分析,提出一个英汉交互式跨语言信息检索系统,实现用户辅助查询翻译、多级用户相关性判断,以及翻译优化与查询扩展等相关反馈功能,结果明显提高了检索效果。  相似文献   

9.
综述命名实体识别与翻译研究现状,提出基于信息抽取的命名实体识别与翻译方法,以及对该方法进行一系列集成优化处理,并实现了基于命名实体识别与翻译的跨语言信息检索实验。实验结果显示出命名实体识别与翻译在跨语言信息检索中的重要性,并证明了所提出的翻译加权和网络挖掘未登录命名实体方法的应用能显著提高跨语言信息检索的性能。  相似文献   

10.
基于伪相关反馈的跨语言查询扩展   总被引:3,自引:2,他引:1  
相关反馈是一种重要的查询重构技术,本文分析了两类相关反馈技术,一是按用户是否参与可分为伪相关反馈和交互式相关反馈,二是按作用于查询的方式可分为查询扩展与检索词重新加权.在此基础上,本文重点探讨了将相关反馈技术应用于跨语言信息检索,提出了翻译前查询扩展、翻译后查询扩展、翻译前与翻译后相结合的查询扩展三种方法.最后,本文通过伪相关反馈实验对这三种方法进行了比较,实验结果显示,三种跨语言查询扩展方法都能够有效地提高检索结果的精度,其中翻译后查询扩展方法相对更优越.此外,查询式的长度对不同跨语言查询扩展方法产生着不同程度的影响.  相似文献   

11.
面对基于双语词典的跨语言检索查询翻译方法中固有的一对多等翻译模糊问题,已有研究成果存在对于非组合型复合词无法进行准确翻译、双语词典和其他翻译资源联合使用引入较大计算开销等弊端。为建立英汉双向跨语言检索实用性系统,在现有的一部包含若干科技词汇和短语的双语科技词典的基础上,着重研究如何引入平行语料来改进已有的双语词典问题。目标是生成一部基于句对齐平行语料的科技类双语概率词典,为跨语言检索查询翻译消歧提供实时性支持。  相似文献   

12.
The problem of language in Web searching has been discussed primarily in the area of cross-language information retrieval (CLIR). However, much CLIR research centers on investigation of the effectiveness of automatic translation techniques. The case study reported here explored bilingual user behaviors, perceptions, and preferences with respect to the capability of the Web as a multilingual information resource. Twenty-eight bilingual academic users from Myongji University in Korea were recruited for the study. Findings show that the subjects did not use Web search engines as multilingual tools. For search queries, they selected a language that represents their information need most accurately depending on the types of information task rather than choosing their first language. Subjects expressed concerns about the accuracy of machine translation of scholarly terminologies and preferred to have user control over multilingual Web searches.  相似文献   

13.
This study develops regression models for predicting the performance of cross-language information retrieval (CLIR). The model assumes that CLIR performance can be explained by two factors: (1) the ease of search inherent in each query and (2) the translation quality in the process of CLIR systems. As operational variables, monolingual information retrieval (IR) performance is used for measuring the ease of search, and the well-known evaluation metric BLEU is used to measure the translation quality. This study also proposes an alternative metric, weighted average for matched unigrams (WAMU), which is tailored to gauging translation quality for special IR purposes. The data for regression analysis are obtained from a retrieval experiment of English-to-Italian bilingual searches using the CLEF 2003 test collection. The CLIR and monolingual IR performances are measured by average precision score. The result shows that the proposed regression model can explain about 60% of the variation in CLIR performance, and WAMU has more predictive power than BLEU. A back translation method for applying the regression model to operational CLIR systems in real situations is discussed.  相似文献   

14.
邱悦 《图书情报工作》2006,50(10):82-86
认为网络语言和用户语言的多样化使跨语言信息检索成为一个重要的研究领域,该领域所采用的技术主要包括基于机器翻译的方法、基于机读双语词典的方法、基于主题词表的方法以及基于平行语料库的方法。跨语言信息检索的实现除以技术为基础外,还需要查询扩展技术的辅助。  相似文献   

15.
The paper studies concept-based cross-language information retrieval (CLIR). The document collection was a subset of the TREC collection. The test requests were formed from TREC's health related topics. As translation dictionaries the study used a general dictionary and a domain-specific (=medical) dictionary. The effects of translation method, conjunction, and facet order on the effectiveness of concept-based cross-language queries were studied, and concept-based structuring of cross-language queries was compared to mechanical structuring based on the output of dictionaries. The performance of translated Finnish queries against English documents was compared to the performance of original English queries against the English documents, and the performance of different CLIR query types was compared with one another. No major difference was found between concept-based and mechanical structuring. The best translation method was a simultaneous look-up in the medical dictionary and the general dictionary, in which case cross-language queries performed as well as the original English queries. The results showed that especially at high exhaustivity (the number of mutually restrictive concepts in a request) levels cross-language queries perform well in relation to monolingual queries. This suggests that conjunction disambiguates cross-language queries. An extensive study was made of the relative importance of the concepts of requests. On the basis of the classification data of request concepts it was shown how the order of facets in a query affects cross-language as well as monolingual queries.  相似文献   

16.
With the increasing availability of machine-readable bilingual dictionaries, dictionary-based automatic query translation has become a viable approach to Cross-Language Information Retrieval (CLIR). In this approach, resolving term ambiguity is a crucial step. We propose a sense disambiguation technique based on a term-similarity measure for selecting the right translation sense of a query term. In addition, we apply a query expansion technique which is also based on the term similarity measure to improve the effectiveness of the translation queries. The results of our Indonesian to English and English to Indonesian CLIR experiments demonstrate the effectiveness of the sense disambiguation technique. As for the query expansion technique, it is shown to be effective as long as the term ambiguity in the queries has been resolved. In the effort to solve the term ambiguity problem, we discovered that differences in the pattern of word-formation between the two languages render query translations from one language to the other difficult.  相似文献   

17.
Cross-language information retrieval (CLIR) has so far been studied with the assumption that some rich linguistic resources such as bilingual dictionaries or parallel corpora are available. But creation of such high quality resources is labor-intensive and they are not always at hand. In this paper we investigate the feasibility of using only comparable corpora for CLIR, without relying on other linguistic resources. Comparable corpora are text documents in different languages that cover similar topics and are often naturally attainable (e.g., news articles published in different languages at the same time period). We adapt an existing cross-lingual word association mining method and incorporate it into a language modeling approach to cross-language retrieval. We investigate different strategies for estimating the target query language models. Our evaluation results on the TREC Arabic–English cross-lingual data show that the proposed method is effective for the CLIR task, demonstrating that it is feasible to perform cross-lingual information retrieval with just comparable corpora.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号