首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 296 毫秒
1.
A usual strategy to implement CLIR (Cross-Language Information Retrieval) systems is the so-called query translation approach. The user query is translated for each language present in the multilingual collection in order to compute an independent monolingual information retrieval process per language. Thus, this approach divides documents according to language. In this way, we obtain as many different collections as languages. After searching in these corpora and obtaining a result list per language, we must merge them in order to provide a single list of retrieved articles. In this paper, we propose an approach to obtain a single list of relevant documents for CLIR systems driven by query translation. This approach, which we call 2-step RSV (RSV: Retrieval Status Value), is based on the re-indexing of the retrieval documents according to the query vocabulary, and it performs noticeably better than traditional methods. The proposed method requires query vocabulary alignment: given a word for a given query, we must know the translation or translations to the other languages. Because this is not always possible, we have researched on a mixed model. This mixed model is applied in order to deal with queries with partial word-level alignment. The results prove that even in this scenario, 2-step RSV performs better than traditional merging methods.  相似文献   

2.
Background: Question‐answering systems (or QA Systems) stand as a new alternative for Information Retrieval Systems. Most users frequently need to retrieve specific information about a factual question to obtain a whole document. Objectives: The study evaluates the efficiency of QA systems as terminological sources for physicians, specialised translators and users in general. It assesses the performance of one open‐domain QA system, START, and one restricted‐domain QA system, MedQA. Method: The study collected two hundred definitional questions (What is…?), either general or specialised, from the health website WebMD. Sources used by the open‐domain QA system, START, and the restricted‐domain QA system, MedQA, were studied to retrieve answers, and later a range of evaluation measures (precision, Mean Reciprocal Rank, Total Reciprocal Rank, First Hit Success) were applied to mark the quality of answers. Results: It was established that both systems are useful in the retrieval of valid definitional healthcare information, with an acceptable degree of coherent and precise responses from both. The answers supplied by MedQA were more reliable that those of START in the sense that they came from specialised clinical or academic sources, most of them showing links to further research articles. Conclusions: Results obtained show the potential of this type of tool in the more general realm of information access, and the retrieval of health information. They may be considered a good, reliable and reasonably precise alternative in alleviating the information overload. Both QA systems can help professionals and users can obtain healthcare information.  相似文献   

3.
面对日益膨胀的多语种信息资源,跨语言信息检索已成为实现全球知识存取和共享的关键技术手段。构建一个实用型的跨语言检索查询翻译接口,可方便地嵌入任意的信息检索平台,扩展现有信息检索平台的多语言信息处理能力。该查询翻译接口采用基于最长短语、查询分类和概率词典等多种翻译消歧策略,并从查询翻译的准确性和接口的运行效率两个角度对构建的查询翻译接口进行评测,实验结果验证所采用方法具有可行性。  相似文献   

4.
We present a system for multilingual information retrieval that allows users to formulate queries in their preferred language and retrieve relevant information from a collection containing documents in multiple languages. The system is based on a process of document level alignments, where documents of different languages are paired according to their similarity. The resulting mapping allows us to produce a multilingual comparable corpus. Such a corpus has multiple interesting applications. It allows us to build a data structure for query translation in cross-language information retrieval (CLIR). Moreover, we also perform pseudo relevance feedback on the alignments to improve our retrieval results. And finally, multiple retrieval runs can be merged into one unified result list. The resulting system is inexpensive, adaptable to domain-specific collections and new languages and has performed very well at the TREC-7 conference CLIR system comparison.  相似文献   

5.
英汉交互式跨语言检索系统设计与实现   总被引:1,自引:0,他引:1  
针对跨语言信息检索的查询翻译歧义性问题,采用交互式系统开发设计方法,对基于相关反馈的跨语言信息检索技术进行研究和分析,提出一个英汉交互式跨语言信息检索系统,实现用户辅助查询翻译、多级用户相关性判断,以及翻译优化与查询扩展等相关反馈功能,结果明显提高了检索效果。  相似文献   

6.
分析跨语言信息检索的基本模式和翻译消歧关键技术,采用基于词语对共现率和词语间距加权计算的方法,对查询式翻译进行消歧优化,在此基础上构建跨语言商品信息检索系统并应用于图书商品搜索,实验结果证明翻译质量和检索效果得到提高。  相似文献   

7.
国外跨语言信息检索中的翻译歧义性问题研究综述   总被引:2,自引:0,他引:2  
翻译歧义是影响跨语言信息检索效果的主要因素之一。本文论述了跨语言信息检索中翻译歧义性问题产生的原因,并且总结了目前消除这种歧义的方法和技术。  相似文献   

8.
We describe the Eurospider component for Cross-Language Information Retrieval (CLIR) that has been employed for experiments at all three CLEF campaigns to date. The central aspect of our efforts is the use of combination approaches, effectively combining multiple language pairs, translation resources and translation methods into one multilingual retrieval system. We discuss the implications of building a system that allows flexible combination, give details of the various translation resources and methods, and investigate the impact of merging intermediate results generated by the individual steps. An analysis of the resulting combination system is given which also takes into account additional requirements when deploying the system as a component in an operational, commercial setting.  相似文献   

9.
跨语言信息检索中的询问翻译方法及其研究进展   总被引:10,自引:0,他引:10  
主要介绍了跨语言文本信息检索的三类基本方法:询问翻译、文献翻译和不翻译,并且对目前最常用的询问翻译方法所涉及的一些基本问题及其研究进展进行了阐述,最后总结出跨语言信息检索的现状和动向。  相似文献   

10.
This article presents FIDJI, a question-answering (QA) system for French. FIDJI combines syntactic information with traditional QA techniques such as named entity recognition and term weighting; it does not require any pre-processing other than classical search engine indexing. Among other uses of syntax, we experiment in this system the validation of answers through different documents, as well as specific techniques for answering different types of questions (e.g., yes/no or list questions). We present several experiments which show the benefits of syntactic analysis, as well as multi-document validation. Different types of questions and corpora are tested, and specificities are commented. Links with result aggregation are also discussed.  相似文献   

11.
The paper studies concept-based cross-language information retrieval (CLIR). The document collection was a subset of the TREC collection. The test requests were formed from TREC's health related topics. As translation dictionaries the study used a general dictionary and a domain-specific (=medical) dictionary. The effects of translation method, conjunction, and facet order on the effectiveness of concept-based cross-language queries were studied, and concept-based structuring of cross-language queries was compared to mechanical structuring based on the output of dictionaries. The performance of translated Finnish queries against English documents was compared to the performance of original English queries against the English documents, and the performance of different CLIR query types was compared with one another. No major difference was found between concept-based and mechanical structuring. The best translation method was a simultaneous look-up in the medical dictionary and the general dictionary, in which case cross-language queries performed as well as the original English queries. The results showed that especially at high exhaustivity (the number of mutually restrictive concepts in a request) levels cross-language queries perform well in relation to monolingual queries. This suggests that conjunction disambiguates cross-language queries. An extensive study was made of the relative importance of the concepts of requests. On the basis of the classification data of request concepts it was shown how the order of facets in a query affects cross-language as well as monolingual queries.  相似文献   

12.
With the increasing availability of machine-readable bilingual dictionaries, dictionary-based automatic query translation has become a viable approach to Cross-Language Information Retrieval (CLIR). In this approach, resolving term ambiguity is a crucial step. We propose a sense disambiguation technique based on a term-similarity measure for selecting the right translation sense of a query term. In addition, we apply a query expansion technique which is also based on the term similarity measure to improve the effectiveness of the translation queries. The results of our Indonesian to English and English to Indonesian CLIR experiments demonstrate the effectiveness of the sense disambiguation technique. As for the query expansion technique, it is shown to be effective as long as the term ambiguity in the queries has been resolved. In the effort to solve the term ambiguity problem, we discovered that differences in the pattern of word-formation between the two languages render query translations from one language to the other difficult.  相似文献   

13.
Research on cross-language information retrieval (CLIR) has typically been restricted to settings using binary relevance assessments. In this paper, we present evaluation results for dictionary-based CLIR using graded relevance assessments in a best match retrieval environment. A text database containing newspaper articles and a related set of 35 search topics were used in the tests. First, monolingual baseline queries were automatically formed from the topics. Secondly, source language topics (in English, German, and Swedish) were automatically translated into the target language (Finnish), using structured target queries. The effectiveness of the translated queries was compared to that of the monolingual queries. Thirdly, pseudo-relevance feedback was used to expand the original target queries. CLIR performance was evaluated using three relevance thresholds: stringent, regular, and liberal. When regular or liberal threshold was used, a reasonable performance was achieved. Using stringent threshold, equally high performance could not be achieved. On all the relevance thresholds the performance of the translated queries was successfully raised by pseudo-relevance feedback based query expansion. However, the performance of the stringent threshold in relation to the other thresholds could not be raised by this method.  相似文献   

14.
This study develops regression models for predicting the performance of cross-language information retrieval (CLIR). The model assumes that CLIR performance can be explained by two factors: (1) the ease of search inherent in each query and (2) the translation quality in the process of CLIR systems. As operational variables, monolingual information retrieval (IR) performance is used for measuring the ease of search, and the well-known evaluation metric BLEU is used to measure the translation quality. This study also proposes an alternative metric, weighted average for matched unigrams (WAMU), which is tailored to gauging translation quality for special IR purposes. The data for regression analysis are obtained from a retrieval experiment of English-to-Italian bilingual searches using the CLEF 2003 test collection. The CLIR and monolingual IR performances are measured by average precision score. The result shows that the proposed regression model can explain about 60% of the variation in CLIR performance, and WAMU has more predictive power than BLEU. A back translation method for applying the regression model to operational CLIR systems in real situations is discussed.  相似文献   

15.
Cross-language information retrieval (CLIR) has so far been studied with the assumption that some rich linguistic resources such as bilingual dictionaries or parallel corpora are available. But creation of such high quality resources is labor-intensive and they are not always at hand. In this paper we investigate the feasibility of using only comparable corpora for CLIR, without relying on other linguistic resources. Comparable corpora are text documents in different languages that cover similar topics and are often naturally attainable (e.g., news articles published in different languages at the same time period). We adapt an existing cross-lingual word association mining method and incorporate it into a language modeling approach to cross-language retrieval. We investigate different strategies for estimating the target query language models. Our evaluation results on the TREC Arabic–English cross-lingual data show that the proposed method is effective for the CLIR task, demonstrating that it is feasible to perform cross-lingual information retrieval with just comparable corpora.  相似文献   

16.
Observations from a unique investigation of failure analysis of Information Retrieval research engines held in 2003 are presented. The Reliable Information Access Workshop invited seven leading IR research groups to supply both their systems and their experts to an effort to analyze why their systems fail on some topics and whether the failures are due to system flaws, approach flaws, or the topic itself. There were surprising results from this cross-system failure analysis. One is that despite systems retrieving very different documents, the major cause of failure for any particular topic was almost always the same across all systems. Another is that relationships between aspects of a topic are not especially important for state-of-the-art systems; the systems are failing at a much more basic level where the top-retrieved documents are not reflecting some aspect at all. The investigatory framework and the lessons learned can serve as a model for needed future research in this area.  相似文献   

17.
交互式跨语言信息检索是信息检索的一个重要分支。在分析交互式跨语言信息检索过程、评价指标、用户行为进展等理论研究基础上,设计一个让用户参与跨语言信息检索全过程的用户检索实验。实验结果表明:用户检索词主要来自检索主题的标题;用户判断文档相关性的准确率较高;目标语言文档全文、译文摘要、译文全文都是用户认可的判断依据;翻译优化方法以及翻译优化与查询扩展的结合方法在用户交互环境下非常有效;用户对于反馈后的翻译仍然愿意做进一步选择;用户对于与跨语言信息检索系统进行交互是有需求并认可的。用户行为分析有助于指导交互式跨语言信息检索系统的设计与实践。  相似文献   

18.
The problem of language in Web searching has been discussed primarily in the area of cross-language information retrieval (CLIR). However, much CLIR research centers on investigation of the effectiveness of automatic translation techniques. The case study reported here explored bilingual user behaviors, perceptions, and preferences with respect to the capability of the Web as a multilingual information resource. Twenty-eight bilingual academic users from Myongji University in Korea were recruited for the study. Findings show that the subjects did not use Web search engines as multilingual tools. For search queries, they selected a language that represents their information need most accurately depending on the types of information task rather than choosing their first language. Subjects expressed concerns about the accuracy of machine translation of scholarly terminologies and preferred to have user control over multilingual Web searches.  相似文献   

19.
Automatic question answering using the web: Beyond the Factoid   总被引:4,自引:0,他引:4  
In this paper we describe and evaluate a Question Answering (QA) system that goes beyond answering factoid questions. Our approach to QA assumes no restrictions on the type of questions that are handled, and no assumption that the answers to be provided are factoids. We present an unsupervised approach for collecting question and answer pairs from FAQ pages, which we use to collect a corpus of 1 million question/answer pairs from FAQ pages available on the Web. This corpus is used to train various statistical models employed by our QA system: a statistical chunker used to transform a natural language-posed question into a phrase-based query to be submitted for exact match to an off-the-shelf search engine; an answer/question translation model, used to assess the likelihood that a proposed answer is indeed an answer to the posed question; and an answer language model, used to assess the likelihood that a proposed answer is a well-formed answer. We evaluate our QA system in a modular fashion, by comparing the performance of baseline algorithms against our proposed algorithms for various modules in our QA system. The evaluation shows that our system achieves reasonable performance in terms of answer accuracy for a large variety of complex, non-factoid questions.  相似文献   

20.
New Mexico State University's Computing Research Lab has participated in research in all three phases of the US Government's Tipster program. Our work on information retrieval has focused on research and development of multilingual and cross-language approaches to automatic retrieval. The work on automatic systems has been supplemented by additional research into the role of the IR system user in interactive retrieval scenarios: monolingual, multilingual and cross-language. The combined efforts suggest that universal text retrieval, in which a user can find, access and use documents in the face of language differences and information overload, may be possible.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号