18 similar documents found; search took 171 ms
1.
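With multilingual information resources growing rapidly, cross-language information retrieval has become a key technology for global knowledge access and sharing. This work builds a practical query translation interface for cross-language retrieval that can be easily embedded in any information retrieval platform to extend that platform's multilingual processing capability. The interface combines several translation disambiguation strategies based on longest-phrase matching, query classification, and a probabilistic dictionary. It is evaluated on both query translation accuracy and runtime efficiency, and the experimental results confirm that the approach is feasible.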
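The longest-phrase and probabilistic-dictionary strategies named in this abstract can be illustrated with a minimal sketch. It is not the authors' interface: the function `translate_query`, the `ProbDict` type, the `max_len` parameter, and the toy dictionary entries are all assumptions made for illustration, and the query-classification step is omitted.

```python
from typing import Dict, List, Tuple

# Hypothetical toy probabilistic dictionary:
# source phrase (as a token tuple) -> [(translation, probability), ...]
ProbDict = Dict[Tuple[str, ...], List[Tuple[str, float]]]


def translate_query(tokens: List[str], dictionary: ProbDict, max_len: int = 4) -> List[str]:
    """Greedy longest-phrase-first translation that keeps the most probable candidate."""
    out: List[str] = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest span first so that phrases win over their component words.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = tuple(tokens[i:i + n])
            candidates = dictionary.get(span)
            if candidates:
                best, _ = max(candidates, key=lambda c: c[1])
                out.append(best)
                i += n
                matched = True
                break
        if not matched:
            # Out-of-vocabulary tokens are passed through untranslated.
            out.append(tokens[i])
            i += 1
    return out


# Illustrative entries only; the probabilities are made up for the example.
toy_dict: ProbDict = {
    ("信息", "检索"): [("information retrieval", 0.9)],
    ("信息",): [("information", 0.7), ("message", 0.3)],
    ("检索",): [("retrieval", 0.6), ("search", 0.4)],
    ("跨语言",): [("cross-language", 1.0)],
}

print(translate_query(["跨语言", "信息", "检索", "接口"], toy_dict))
```

Matching the two-token phrase before its single-token parts avoids translating "信息" and "检索" independently, which is the point of a longest-phrase strategy.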
2.
3.
4.
5.
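A Dictionary Lookup Mechanism Based on a Triple-Array Trie Index Tree* Total citations: 1, self-citations: 0, citations by others: 1
Improves on the double-array Trie by designing and implementing a Chinese dictionary lookup mechanism based on the triple-array Trie index tree principle, and uses a recursive algorithm to build the word-formation state table automatically.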
6.
7.
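This article explores the relationship between query classification and cross-language retrieval; the core question is whether applying the former can improve the system performance of the latter. It first proposes a normalized discounted cumulative gain (NDCG) evaluation metric based on query classification, and tests the metric's usability and validity by examining how it changes for an information retrieval system before and after query classification is applied. Query classification can also serve as a technique for reducing the ambiguity of query translation in cross-language retrieval systems. Query translation experiments on a random sample drawn from a large-scale query set show that the proposed query-classification-based disambiguation method is effective for most queries, and that in some cases query translation can be completed with this method alone. Combining it with other methods to further reduce translation ambiguity is left as future work.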
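For readers unfamiliar with the metric, the sketch below computes the standard form of normalized discounted cumulative gain with a log2 rank discount. The abstract does not specify how the query-classification-based variant is defined, so this is only the baseline metric; the function names `dcg` and `ndcg` are assumptions.

```python
import math
from typing import Sequence


def dcg(gains: Sequence[float]) -> float:
    """Discounted cumulative gain: the gain at rank r is divided by log2(r + 1)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(gains: Sequence[float]) -> float:
    """DCG of the ranking divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0


# Graded relevance of the results in ranked order (illustrative values).
print(ndcg([3, 2, 1]))  # an already ideal ordering
print(ndcg([1, 3, 2]))  # a worse ordering of the same results
```

Comparing the value before and after a system change, as the abstract describes, amounts to calling `ndcg` on the two result rankings for the same query.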
8.
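Describes how online resources such as online dictionaries, translation software, and search engines are used in translating documents, and points out that while these resources improve translation efficiency, they also have shortcomings.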
9.
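Design and Implementation of an English-Chinese Interactive Cross-Language Retrieval System Total citations: 1, self-citations: 0, citations by others: 1
Wu Dan (吴丹) 《现代图书情报技术》2009,3(2):89-95
To address the ambiguity of query translation in cross-language information retrieval, the paper adopts an interactive system design approach, studies and analyzes relevance-feedback-based cross-language information retrieval techniques, and proposes an English-Chinese interactive cross-language information retrieval system. The system implements relevance feedback functions including user-assisted query translation, multi-level user relevance judgments, and translation refinement with query expansion; the results show a marked improvement in retrieval effectiveness.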
10.
11.
Dictionary-Based Cross-Language Information Retrieval: Problems, Methods, and Research Findings Total citations: 3, self-citations: 1, citations by others: 2
Ari Pirkola Turid Hedlund Heikki Keskustalo Kalervo Järvelin 《Information Retrieval》2001,4(3-4):209-230
This paper reviews literature on dictionary-based cross-language information retrieval (CLIR) and presents CLIR research done at the University of Tampere (UTA). The main problems associated with dictionary-based CLIR, as well as appropriate methods to deal with the problems are discussed. We will present the structured query model by Pirkola and report findings for four different language pairs concerning the effectiveness of query structuring. The architecture of our automatic query translation and construction system is presented.
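Query structuring is commonly described as grouping all dictionary translations of one source term into a synonym set, so that the set is scored as a single term: its term frequency is the sum over members, and its document frequency counts documents containing any member. The sketch below illustrates that idea under stated assumptions; the tf-idf style weight, the function names, and the toy collection are illustrative and are not taken from the paper.

```python
import math
from typing import List, Sequence


def syn_tf(doc: Sequence[str], synset: Sequence[str]) -> int:
    """Term frequency of a synonym set: occurrences of any member, summed."""
    return sum(doc.count(t) for t in synset)


def syn_df(docs: Sequence[Sequence[str]], synset: Sequence[str]) -> int:
    """Document frequency of a synonym set: documents containing any member."""
    return sum(1 for d in docs if any(t in d for t in synset))


def score(docs: Sequence[Sequence[str]], query_synsets: Sequence[Sequence[str]]) -> List[float]:
    """Score each document with an assumed tf * log((N + 1) / df) weight per synonym set."""
    n = len(docs)
    scores: List[float] = []
    for d in docs:
        s = 0.0
        for synset in query_synsets:
            df = syn_df(docs, synset)
            if df:
                s += syn_tf(d, synset) * math.log((n + 1) / df)
        scores.append(s)
    return scores


# Toy target-language collection and a query with two translated source terms.
docs = [["bank", "river"], ["bank", "money", "finance"], ["tree"]]
synsets = [["bank", "shore"], ["money", "cash"]]
print(score(docs, synsets))
```

Because a set's document frequency is shared by all its members, a rare but wrong translation cannot dominate the score through a high inverse document frequency of its own.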
12.
Using Statistical Term Similarity for Sense Disambiguation in Cross-Language Information Retrieval Total citations: 2, self-citations: 0, citations by others: 2
Mirna Adriani 《Information Retrieval》2000,2(1):71-82
With the increasing availability of machine-readable bilingual dictionaries, dictionary-based automatic query translation has become a viable approach to Cross-Language Information Retrieval (CLIR). In this approach, resolving term ambiguity is a crucial step. We propose a sense disambiguation technique based on a term-similarity measure for selecting the right translation sense of a query term. In addition, we apply a query expansion technique which is also based on the term similarity measure to improve the effectiveness of the translation queries. The results of our Indonesian to English and English to Indonesian CLIR experiments demonstrate the effectiveness of the sense disambiguation technique. As for the query expansion technique, it is shown to be effective as long as the term ambiguity in the queries has been resolved. In the effort to solve the term ambiguity problem, we discovered that differences in the pattern of word-formation between the two languages render query translations from one language to the other difficult.
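One way to picture similarity-based sense selection is to keep, for each query term, the candidate translation that is most similar to the candidate translations of the other query terms. The sketch below does this with a hypothetical precomputed similarity table; the abstract does not give the actual similarity measure or selection rule, so the `Sim` type, the function names, and the scores are assumptions.

```python
from typing import Dict, FrozenSet, List

# Hypothetical symmetric term-similarity table (for example, derived from co-occurrence).
Sim = Dict[FrozenSet[str], float]


def sim(a: str, b: str, table: Sim) -> float:
    """Look up the similarity of an unordered term pair; unknown pairs score 0."""
    return table.get(frozenset((a, b)), 0.0)


def disambiguate(candidates: List[List[str]], table: Sim) -> List[str]:
    """For each source term, pick the candidate with the highest summed similarity
    to the candidate translations of all other source terms."""
    chosen: List[str] = []
    for i, cands in enumerate(candidates):
        others = [t for j, c in enumerate(candidates) if j != i for t in c]
        best = max(cands, key=lambda c: sum(sim(c, o, table) for o in others))
        chosen.append(best)
    return chosen


# Made-up similarity scores for illustration.
table: Sim = {
    frozenset(("bank", "money")): 0.8,
    frozenset(("bank", "cash")): 0.6,
    frozenset(("shore", "money")): 0.1,
}
print(disambiguate([["bank", "shore"], ["money", "cash"]], table))
```

Ties are resolved in favor of the first-listed candidate, which in practice would be the dictionary's default ordering.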
13.
Focused web crawling in the acquisition of comparable corpora Total citations: 2, self-citations: 0, citations by others: 2
Tuomas Talvensaari Ari Pirkola Kalervo Järvelin Martti Juhola Jorma Laurikkala 《Information Retrieval》2008,11(5):427-445
Cross-Language Information Retrieval (CLIR) resources, such as dictionaries and parallel corpora, are scarce for special domains. Obtaining comparable corpora automatically for such domains could be an answer to this problem. The Web, with its vast volumes of data, offers a natural source for this. We experimented with focused crawling as a means to acquire comparable corpora in the genomics domain. The acquired corpora were used to statistically translate domain-specific words. The same words were also translated using a high-quality, but non-genomics-related parallel corpus, which fared considerably worse. We also evaluated our system with standard information retrieval (IR) experiments, combining statistical translation using the Web corpora with dictionary-based translation. The results showed improvement over pure dictionary-based translation. Therefore, mining the Web for comparable corpora seems promising.
14.
Turid Hedlund Eija Airio Heikki Keskustalo Raija Lehtokangas Ari Pirkola Kalervo Järvelin 《Information Retrieval》2004,7(1-2):99-119
In this study the basic framework and performance analysis results are presented for the three-year-long development process of the dictionary-based UTACLIR system. The tests expand from bilingual CLIR for three language pairs (Swedish, Finnish and German to English) to six language pairs (from English to French, German, Spanish, Italian, Dutch and Finnish), and from bilingual to multilingual. In addition, transitive translation tests are reported. The development process of the UTACLIR query translation system will be regarded from the point of view of a learning process. The contribution of the individual components, the effectiveness of compound handling, proper name matching and structuring of queries are analyzed. The results and the fault analysis have been valuable in the development process. Overall the results indicate that the process is robust and can be extended to other languages. The individual effects of the different components are in general positive. However, performance also depends on the topic set and the number of compounds and proper names in the topic, and to some extent on the source and target language. The dictionaries used affect the performance significantly.
15.
Multilingual retrieval (querying of multiple document collections each in a different language) can be achieved by combining several individual techniques which enhance retrieval: machine translation to cross the language barrier, relevance feedback to add words to the initial query, decompounding for languages with complex term structure, and data fusion to combine monolingual retrieval results from different languages. Using the CLEF 2001 and CLEF 2002 topics and document collections, this paper evaluates these techniques within the context of a monolingual document ranking formula based upon logistic regression. Each individual technique yields improved performance over runs which do not utilize that technique. Moreover the techniques are complementary, in that combining the best techniques outperforms individual technique performance. An approximate but fast document translation using bilingual wordlists created from machine translation systems is presented and evaluated. The fast document translation is as effective as query translation in multilingual retrieval. Furthermore, when fast document translation is combined with query translation in multilingual retrieval, the performance is significantly better than that of query translation or fast document translation.
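The data fusion step, merging ranked lists from several monolingual collections into one list, can be sketched as follows. The abstract does not say which fusion rule the paper uses; this example assumes min-max score normalization per run followed by a merge on the normalized scores, and the function names and scores are illustrative.

```python
from typing import Dict, List, Tuple


def min_max(run: Dict[str, float]) -> Dict[str, float]:
    """Rescale one run's scores to [0, 1] so that runs become comparable."""
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:
        return {doc: 1.0 for doc in run}
    return {doc: (s - lo) / (hi - lo) for doc, s in run.items()}


def fuse(runs: List[Dict[str, float]]) -> List[Tuple[str, float]]:
    """Sum normalized scores per document and sort by score, then by document id."""
    fused: Dict[str, float] = {}
    for run in runs:
        if not run:
            continue  # skip collections that returned nothing
        for doc, s in min_max(run).items():
            fused[doc] = fused.get(doc, 0.0) + s
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


# Raw scores from two monolingual runs on very different scales (made-up values).
english_run = {"en1": 12.0, "en2": 6.0, "en3": 0.0}
german_run = {"de1": 0.8, "de2": 0.2}
print(fuse([english_run, german_run]))
```

Normalizing before merging matters because raw retrieval scores from different collections are generally not on the same scale.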
16.
This study develops regression models for predicting the performance of cross-language information retrieval (CLIR). The model assumes that CLIR performance can be explained by two factors: (1) the ease of search inherent in each query and (2) the translation quality in the process of CLIR systems. As operational variables, monolingual information retrieval (IR) performance is used for measuring the ease of search, and the well-known evaluation metric BLEU is used to measure the translation quality. This study also proposes an alternative metric, weighted average for matched unigrams (WAMU), which is tailored to gauging translation quality for special IR purposes. The data for regression analysis are obtained from a retrieval experiment of English-to-Italian bilingual searches using the CLEF 2003 test collection. The CLIR and monolingual IR performances are measured by average precision score. The result shows that the proposed regression model can explain about 60% of the variation in CLIR performance, and WAMU has more predictive power than BLEU. A back translation method for applying the regression model to operational CLIR systems in real situations is discussed.
17.
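Query Translation Methods in Cross-Language Information Retrieval and Their Research Progress Total citations: 10, self-citations: 0, citations by others: 10
Introduces the three basic approaches to cross-language text retrieval: query translation, document translation, and no translation. It then discusses the basic issues and research progress of query translation, currently the most commonly used approach, and concludes with a summary of the current state and trends of cross-language information retrieval.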
18.
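Compares dictionary-style mixed-arrangement indexes with divided indexes in terms of compilation and use, and traces the development and decline of dictionary-style mixed-arrangement indexes in China. It focuses on the underlying causes of that decline: a turbulent historical period, negative aspects of traditional culture, and influences from the ideological sphere. It identifies two questions that deserve reflection: how to keep the negative aspects of traditional culture, and disputes in the ideological sphere, from harming scholarly culture.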