首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
As digital libraries grow to global scale, the provision of interactive access to content in many languages will become increasingly important. In systems that support query-based searching, the presence of multilingual content will affect both the search technology itself and the user interface components that support query formulation, document selection and query refinement. This article describes the interactions among these components and presents a practical way of evaluating the adequacy of the selection interface. A categorization-based model for the user's selection process is presented and an experimental methodology suitable for obtaining process centered results in this context is developed. The methodology is applied to assess the adequacy of a selection interface in which multiple candidate translations for a term can be simultaneously presented. The results indicate that the modeled selection process is somewhat less effective when users are presented with multi-translation glosses from Japanese to English rather than materials generated originally in English, but that users with access to the gloss translations substantially outperform a Naive Bayes classification algorithm.  相似文献   

2.
Query translation is a viable method for cross-language information retrieval (CLIR), but it suffers from translation ambiguities caused by multiple translations of individual query terms. Previous research has employed various methods for disambiguation, including the method of selecting an individual target query term from multiple candidates by comparing their statistical associations with the candidate translations of other query terms. This paper proposes a new method where we examine all combinations of target query term translations corresponding to the source query terms, instead of looking at the candidates for each query term and selecting the best one at a time. The goodness value for a combination of target query terms is computed based on the association value between each pair of the terms in the combination. We tested our method using the NTCIR-3 English–Korean CLIR test collection. The results show some improvements regardless of the association measures we used.  相似文献   

3.
Dictionary-based query translation for cross-language information retrieval often yields various translation candidates having different meanings for a source term in the query. This paper examines methods for solving the ambiguity of translations based on only the target document collections. First, we discuss two kinds of disambiguation technique: (1) one is a method using term co-occurrence statistics in the collection, and (2) a technique based on pseudo-relevance feedback. Next, these techniques are empirically compared using the CLEF 2003 test collection for German to Italian bilingual searches, which are executed by using English language as a pivot. The experiments showed that a variation of term co-occurrence based techniques, in which the best sequence algorithm for selecting translations is used with the Cosine coefficient, is dominant, and that the PRF method shows comparable high search performance, although statistical tests did not sufficiently support these conclusions. Furthermore, we repeat the same experiments for the case of French to Italian (pivot) and English to Italian (non-pivot) searches on the same CLEF 2003 test collection in order to verity our findings. Again, similar results were observed except that the Dice coefficient outperforms slightly the Cosine coefficient in the case of disambiguation based on term co-occurrence for English to Italian searches.  相似文献   

4.
曲琳琳 《情报科学》2021,39(8):132-138
【目的/意义】跨语言信息检索研究的目的即在消除因语言的差异而导致信息查询的困难,提高从大量纷繁 复杂的查找特定信息的效率。同时提供一种更加方便的途径使得用户能够使用自己熟悉的语言检索另外一种语 言文档。【方法/过程】本文通过对国内外跨语言信息检索的研究现状分析,介绍了目前几种查询翻译的方法,包括: 直接查询翻译、文献翻译、中间语言翻译以及查询—文献翻译方法,对其效果进行比较,然后阐述了跨语言检索关 键技术,对使用基于双语词典、语料库、机器翻译技术等产生的歧义性提出了解决方法及评价。【结果/结论】使用自 然语言处理技术、共现技术、相关反馈技术、扩展技术、双向翻译技术以及基于本体信息检索技术确保知识词典的 覆盖度和歧义性处理,通过对跨语言检索实验分析证明采用知识词典、语料库和搜索引擎组合能够提高查询效 率。【创新/局限】本文为了解决跨语言信息检索使用词典、语料库中词语缺乏的现象,提出通过搜索引擎从网页获 取信息资源来充实语料库中语句对不足的问题。文章主要针对中英文信息检索问题进行了探讨,解决方法还需要 进一步研究,如中文切词困难以及字典覆盖率低等严重影响检索的效率。  相似文献   

5.
This paper presents a laboratory based evaluation study of cross-language information retrieval technologies, utilizing partially parallel test collections, NTCIR-2 (used together with NTCIR-1), where Japanese–English parallel document collections, parallel topic sets and their relevance judgments are available. These enable us to observe and compare monolingual retrieval processes in two languages as well as retrieval across languages. Our experiments focused on (1) the Rosetta stone question (whether a partially parallel collection helps in cross-language information access or not?) and (2) two aspects of retrieval difficulties namely “collection discrepancy” and “query discrepancy”. Japanese and English monolingual retrieval systems are combined by dictionary based query translation modules so that a symmetrical bilingual evaluation environment is implemented.  相似文献   

6.
The paper presents two approaches to interactively refining user search formulations and their evaluation in the new High Accuracy Retrieval from Documents (HARD) track of TREC-12. The first method consists of asking the user to select a number of sentences that represent documents. The second method consists of showing to the user a list of noun phrases extracted from the initial document set. Both methods then expand the query based on the user feedback. The TREC results show that one of the methods is an effective means of interactive query expansion and yields significant performance improvements. The paper presents a comparison of the methods and detailed analysis of the evaluation results.  相似文献   

7.
We will explore various ways to apply query structuring in cross-language information retrieval. In the first test, English queries were translated into Finnish using an electronic dictionary, and were run in a Finnish newspaper database of 55,000 articles. Queries were structured by combining the Finnish translation equivalents of the same English query key using the syn-operator of the InQuery retrieval system. Structured queries performed markedly better than unstructured queries. Second, the effects of compound-based structuring using a proximity operator for the translation equivalents of query language compound components were tested. The method was not useful in syn-based queries but resulted in decrease in retrieval effectiveness. Proper names are often non-identical spelling variants in different languages. This allows n-gram based translation of names not included in a dictionary. In the third test, a query structuring method where the Boolean and-operator was used to assign more weight to keys translated through n-gram matching gave good results.  相似文献   

8.
With the ever increasing size of the web, relevant information extraction on the Internet with a query formed by a few keywords has become a big challenge. Query Expansion (QE) plays a crucial role in improving searches on the Internet. Here, the user’s initial query is reformulated by adding additional meaningful terms with similar significance. QE – as part of information retrieval (IR) – has long attracted researchers’ attention. It has become very influential in the field of personalized social document, question answering, cross-language IR, information filtering and multimedia IR. Research in QE has gained further prominence because of IR dedicated conferences such as TREC (Text Information Retrieval Conference) and CLEF (Conference and Labs of the Evaluation Forum). This paper surveys QE techniques in IR from 1960 to 2017 with respect to core techniques, data sources used, weighting and ranking methodologies, user participation and applications – bringing out similarities and differences.  相似文献   

9.
目前大多数机器翻译和跨语言检索系统都是基于通用语料,对外文科技资料的翻译效果不理想,本文结合科技文献的加工方法,研究面向科技文献的跨语言信息检索系统的模型。首先对跨语言信息检索的概念和特点进行简单的概述,从3个角度介绍跨语言信息检索的研究方法,然后讨论构建跨语言信息检索系统的必要性,在此基础上设计出一个面向科技文献的跨语言信息检索系统模型以及主要功能结构。  相似文献   

10.
双语机读词典是基于查询翻译的跨语言信息检索中的常用资源,但是传统的手工构建词典的方法费时费力,本文利用统计方法从英汉句对齐平行语料库中自动获取翻译词典,以用于查询翻译过程中。  相似文献   

11.
We present an efficient document clustering algorithm that uses a term frequency vector for each document instead of using a huge proximity matrix. The algorithm has the following features: (1) it requires a relatively small amount of memory and runs fast, (2) it produces a hierarchy in the form of a document classification tree and (3) the hierarchy obtained by the algorithm explicitly reveals a collection structure. We confirm these features and thus show the algorithm's feasibility through clustering experiments in which we use two collections of Japanese documents, the sizes of which are 83,099 and 14,701 documents. We also introduce an application of this algorithm to a document browser. This browser is used in our Japanese-to-English translation aid system. The browsing module of the system consists of a huge database of Japanese news articles and their English translations. The Japanese article collection is clustered into a hierarchy by our method. Since each node in the hierarchy corresponds to a topic in the collection, we can use the hierarchy to directly access articles by topic. A user can learn general translation knowledge of each topic by browsing the Japanese articles and their English translations. We also discuss techniques of presenting a large tree-formed hierarchy on a computer screen.  相似文献   

12.
13.
The rapid growth of documents in different languages, the increased accessibility of electronic documents, and the availability of translation tools have caused cross-lingual plagiarism detection research area to receive increasing attention in recent years. The task of cross-language plagiarism detection entails two main steps: candidate retrieval and assessing pairwise document similarity. In this paper we examine candidate retrieval, where the goal is to find potential source documents of a suspicious text. Our proposed method for cross-language plagiarism detection is a keyword-focused approach. Since plagiarism usually happens in parts of the text, there is a requirement to segment the texts into fragments to detect local similarity. Therefore we propose a topic-based segmentation algorithm to convert the suspicious document to a set of related passages. After that, we use a proximity-based model to retrieve documents with the best matching passages. Experiments show promising results for this important phase of cross-language plagiarism detection.  相似文献   

14.
15.
Knowledge acquisition and bilingual terminology extraction from multilingual corpora are challenging tasks for cross-language information retrieval. In this study, we propose a novel method for mining high quality translation knowledge from our constructed Persian–English comparable corpus, University of Tehran Persian–English Comparable Corpus (UTPECC). We extract translation knowledge based on Term Association Network (TAN) constructed from term co-occurrences in same language as well as term associations in different languages. We further propose a post-processing step to do term translation validity check by detecting the mistranslated terms as outliers. Evaluation results on two different data sets show that translating queries using UTPECC and using the proposed methods significantly outperform simple dictionary-based methods. Moreover, the experimental results show that our methods are especially effective in translating Out-Of-Vocabulary terms and also expanding query words based on their associated terms.  相似文献   

16.
Two probabilistic approaches to cross-lingual retrieval are in wide use today, those based on probabilistic models of relevance, as exemplified by INQUERY, and those based on language modeling. INQUERY, as a query net model, allows the easy incorporation of query operators, including a synonym operator, which has proven to be extremely useful in cross-language information retrieval (CLIR), in an approach often called structured query translation. In contrast, language models incorporate translation probabilities into a unified framework. We compare the two approaches on Arabic and Spanish data sets, using two kinds of bilingual dictionaries––one derived from a conventional dictionary, and one derived from a parallel corpus. We find that structured query processing gives slightly better results when queries are not expanded. On the other hand, when queries are expanded, language modeling gives better results, but only when using a probabilistic dictionary derived from a parallel corpus.We pursue two additional issues inherent in the comparison of structured query processing with language modeling. The first concerns query expansion, and the second is the role of translation probabilities. We compare conventional expansion techniques (pseudo-relevance feedback) with relevance modeling, a new IR approach which fits into the formal framework of language modeling. We find that relevance modeling and pseudo-relevance feedback achieve comparable levels of retrieval and that good translation probabilities confer a small but significant advantage.  相似文献   

17.
李江华  时鹏 《情报杂志》2012,31(4):112-116
Internet已成为全球最丰富的数据源,数据类型繁杂且动态变化,如何从中快速准确地检索出用户所需要的信息是一个亟待解决的问题.传统的搜索引擎基于语法的方式进行搜索,缺乏语义信息,难以准确地表达用户的查询需求和被检索对象的文档语义,致使查准率和查全率较低且搜索范围有限.本文对现有的语义检索方法进行了研究,分析了其中存在的问题,在此基础上提出了一种基于领域的语义搜索引擎模型,结合语义Web技术,使用领域本体元数据模型对用户的查询进行语义化规范,依据领域本体模式抽取文档中的知识并RDF化,准确地表达了用户的查询语义和作为被查询对象的文档语义,可以大大提高检索的准确性和检索效率,详细地给出了模型的体系结构、基本功能和工作原理.  相似文献   

18.
数字资源整合及检索初探   总被引:8,自引:0,他引:8  
本文介绍了目前整合数字资源的方式以及相关标准、协议和接口技术,讨论了数字资源整合系统的信息检索原理及跨语言翻译、学科专业分类表、检索语言转换等问题,分析了数字资源整合研究将面临的挑战。  相似文献   

19.
This paper reviews state-of-the-art techniques and methods for enhancing effectiveness of cross-language information retrieval (CLIR). The following research issues are covered: (1) matching strategies and translation techniques, (2) methods for solving the problem of translation ambiguity, (3) formal models for CLIR such as application of the language model, (4) the pivot language approach, (5) methods for searching multilingual document collection, (6) techniques for combining multiple language resources, etc.  相似文献   

20.
This paper presents a study of relevance feedback in a cross-language information retrieval environment. We have performed an experiment in which Portuguese speakers are asked to judge the relevance of English documents; documents hand-translated to Portuguese and documents automatically translated to Portuguese. The goals of the experiment were to answer two questions (i) how well can native Portuguese searchers recognise relevant documents written in English, compared to documents that are hand translated and automatically translated to Portuguese; and (ii) what is the impact of misjudged documents on the performance improvement that can be achieved by relevance feedback. Surprisingly, the results show that machine translation is as effective as hand translation in aiding users to assess relevance in the experiment. In addition, the impact of misjudged documents on the performance of RF is overall just moderate, and varies greatly for different query topics.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号