Similar literature
1.
吴丹  齐和庆 《现代情报》2009,29(7):215-221
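An important theoretical question in the development of information retrieval is how to match queries against documents, and different answers to it have produced different information retrieval models. Cross-language information retrieval is a branch of information retrieval research and has been a hot topic in recent years. This paper reviews and analyzes research progress on information retrieval models and on their application in cross-language information retrieval.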

2.
Due to their ready availability, database management systems (DBMS) are being applied to bibliographic databases with increasing frequency. This is being done even though DBMS query languages, while very powerful, are far too complex for the casual user. It is proposed that PSI, an existing virtual-system intermediary for document retrieval systems, be extended to include access to DBMS containing bibliographic data in order to circumvent the complexity problem for the casual user. PSI currently provides a common command language for access to multiple document retrieval systems. It is shown that PSI could be extended to provide this same command language to access DBMS, whether the DBMS are relational or network.

3.
曲琳琳 《情报科学》2021,39(8):132-138
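[Purpose/Significance] Research on cross-language information retrieval aims to remove the difficulty of searching for information caused by language differences and to improve the efficiency of finding specific information within large, heterogeneous collections. It also offers users a more convenient way to retrieve documents written in another language using a language they know. [Method/Process] Based on an analysis of the current state of cross-language information retrieval research in China and abroad, this paper introduces several current translation approaches, including direct query translation, document translation, interlingua translation, and combined query-document translation, and compares their effectiveness. It then describes the key technologies of cross-language retrieval and proposes and evaluates solutions to the ambiguity that arises when using bilingual dictionaries, corpora, and machine translation. [Results/Conclusions] Natural language processing, co-occurrence analysis, relevance feedback, query expansion, bidirectional translation, and ontology-based retrieval techniques are used to ensure the coverage of the knowledge dictionary and to handle ambiguity. Analysis of cross-language retrieval experiments shows that combining a knowledge dictionary, a corpus, and a search engine can improve query efficiency. [Innovation/Limitations] To address missing vocabulary in the dictionaries and corpora used for cross-language information retrieval, the paper proposes using a search engine to harvest web pages and so supplement the corpus where sentence pairs are lacking. The discussion focuses on Chinese-English retrieval; the solutions need further study, since problems such as the difficulty of Chinese word segmentation and low dictionary coverage seriously reduce retrieval efficiency.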

4.
Estimating the query model is an important task in language modeling (LM) approaches to information retrieval (IR). The ideal estimate is expected to be not only effective, in terms of high mean retrieval performance over all queries, but also stable, in terms of low variance of retrieval performance across different queries. In practice, however, improving effectiveness can sacrifice stability, and vice versa. In this paper, we propose to study this tradeoff from a new perspective, i.e., the bias–variance tradeoff, which is a fundamental theory in statistics. We formulate the notion of bias–variance regarding retrieval performance and estimation quality of query models. We then investigate several estimated query models by analyzing when and why the bias–variance tradeoff will occur, and how the bias and variance can be reduced simultaneously. A series of experiments on four TREC collections has been conducted to systematically evaluate our bias–variance analysis. Our approach and results will potentially form an analysis framework and a novel evaluation strategy for query language modeling.

5.
Two probabilistic approaches to cross-lingual retrieval are in wide use today: those based on probabilistic models of relevance, as exemplified by INQUERY, and those based on language modeling. INQUERY, as a query net model, allows the easy incorporation of query operators, including a synonym operator, which has proven to be extremely useful in cross-language information retrieval (CLIR), in an approach often called structured query translation. In contrast, language models incorporate translation probabilities into a unified framework. We compare the two approaches on Arabic and Spanish data sets, using two kinds of bilingual dictionaries: one derived from a conventional dictionary, and one derived from a parallel corpus. We find that structured query processing gives slightly better results when queries are not expanded. On the other hand, when queries are expanded, language modeling gives better results, but only when using a probabilistic dictionary derived from a parallel corpus. We pursue two additional issues inherent in the comparison of structured query processing with language modeling. The first concerns query expansion, and the second is the role of translation probabilities. We compare conventional expansion techniques (pseudo-relevance feedback) with relevance modeling, a new IR approach which fits into the formal framework of language modeling. We find that relevance modeling and pseudo-relevance feedback achieve comparable levels of retrieval and that good translation probabilities confer a small but significant advantage.
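The synonym-operator idea behind structured query translation can be illustrated with a minimal Python sketch. It assumes the usual reading of the operator: all dictionary translations of one source-language query term are treated as occurrences of a single term, so their term frequencies are summed and the document frequency counts documents containing any of them. The tf-idf weighting, the function names `synonym_stats` and `score`, and the toy data are illustrative assumptions, not INQUERY's actual belief function or API.

```python
import math


def synonym_stats(translations, docs):
    """Treat every translation as the same term.

    docs maps doc_id -> list of tokens. Returns (tf, df) where tf maps
    doc_id -> summed count of all translations, and df is the number of
    documents containing at least one translation.
    """
    tf = {}
    for doc_id, tokens in docs.items():
        count = sum(tokens.count(t) for t in translations)
        if count:
            tf[doc_id] = count
    return tf, len(tf)


def score(query_translations, docs):
    """Score documents for a query given as one translation list per source term.

    Uses a simple (1 + log tf) * idf weight; this weighting is an assumption
    made only for illustration.
    """
    n = len(docs)
    scores = {doc_id: 0.0 for doc_id in docs}
    for translations in query_translations:
        tf, df = synonym_stats(set(translations), docs)
        if df == 0:
            continue  # no translation of this term occurs anywhere
        idf = math.log((n + 1) / df)
        for doc_id, count in tf.items():
            scores[doc_id] += (1 + math.log(count)) * idf
    return scores


# Toy example: English "bank" with two Spanish translations.
docs = {
    "d1": ["banco", "rio", "banco"],
    "d2": ["orilla", "rio"],
    "d3": ["casa"],
}
scores = score([["banco", "orilla"]], docs)
```

Because the translations share one document frequency, a rare but wrong translation cannot dominate the score through a high idf, which is the usual motivation given for this kind of structuring.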

6.
The Multilevel Information System (MLIS), an extension of the typical information retrieval system towards more complete data processing, is discussed. MLIS integrates functions typical of database management systems and retrieval-oriented systems. Several levels of data access are provided, each level developed for a different class of users. The end-user level is based on a simple query language, the trained-user level on a relational model, and the application-programmer level on a Data Manipulation Language nested in a high-level programming language. The last two levels are discussed in detail.

7.
The ECDIN (Environmental Chemicals Data and Information Network) project started in 1973. During the pilot phase of operation the feasibility of the system was demonstrated using a data base of 4000 compounds and the SIMAS information retrieval system. It was quickly realised that, for ECDIN, data management was as important as information retrieval, and in November 1977, after a study of available software, the ADABAS data base management system was installed at JRC Ispra for ECDIN and other JRC data banks. A design exercise for the ECDIN ADABAS data base has been completed and parts of the existing ECDIN data base have been converted to the new system. The problems encountered and the solutions adopted are discussed. The user interface to ECDIN is still under development. When fully operational, ECDIN will be available through EURONET to both casual and specialist users and, in consequence, at least two levels of user interface will be required: (a) a user-friendly conversational language designed for the casual user and capable of dealing with the more common types of question, (b) a sophisticated query language capable of answering the more difficult questions, producing “one-off” reports and probably requiring both a specialist knowledge of the data base and a programmer-oriented background. The first tentative steps in this direction are described.

8.
This paper presents a probabilistic information retrieval framework in which the retrieval problem is formally treated as a statistical decision problem. In this framework, queries and documents are modeled using statistical language models, user preferences are modeled through loss functions, and retrieval is cast as a risk minimization problem. We discuss how this framework can unify existing retrieval models and accommodate systematic development of new retrieval models. As an example of using the framework to model non-traditional retrieval problems, we derive retrieval models for subtopic retrieval, which is concerned with retrieving documents to cover many different subtopics of a general query topic. These new models differ from traditional retrieval models in that they relax the traditional assumption of independent relevance of documents.

9.
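This paper discusses how to implement a score query system in a Web environment using PHP and MySQL, and gives a detailed design method. It covers the bulk import of data into the database and the implementation of two query modes: simple conditional queries and compound queries over multiple conditions. The example uses foreign-language exam score lookup as a template; the underlying principles and methods can be applied to similar general-purpose score query systems.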

10.
11.
OCR errors in text harm information retrieval performance. Much research has been reported on modelling and correction of Optical Character Recognition (OCR) errors. Most of the prior work employs language-dependent resources or training texts in studying the nature of errors. However, not much research has been reported that focuses on improving retrieval performance from erroneous text in the absence of training data. We propose a novel approach for detecting OCR errors and improving retrieval performance from the erroneous corpus in a situation where training samples are not available to model errors. In this paper we propose a method that automatically identifies erroneous term variants in the noisy corpus, which are used for query expansion, in the absence of clean text. We employ an effective combination of contextual information and string matching techniques. Our proposed approach automatically identifies the erroneous variants of query terms and consequently leads to improvement in retrieval performance through query expansion. Our proposed approach does not use any training data or any language-specific resources such as a thesaurus for identification of error variants. It also does not assume any knowledge about the language except that the word delimiter is a blank space. We have tested our approach on erroneous Bangla (Bengali in English) and Hindi FIRE collections, and also on TREC Legal IIT CDIP and TREC 5 Confusion track English corpora. Our proposed approach has achieved statistically significant improvements over the state-of-the-art baselines on most of the datasets.

12.
An experimental best match retrieval system is described based on the serial file organisation. Documents and queries are characterised by fixed-length bit strings, and the time-consuming character-by-character term match is preceded by a bit string search to eliminate large numbers of documents which cannot possibly satisfy the query. Two methods, one fully automatic and one partially manual in character, are described for the generation of such bit string characterisations. Retrieval experiments with a large document test collection show that the two-level search can increase substantially the efficiency of serial searching while maintaining retrieval effectiveness, and that a single-level search based only upon the bit strings results in only a small decrease in effectiveness in some cases.
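The two-level search described in this abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the system from the paper: the 64-bit signature width, the use of `zlib.crc32` to pick one bit per term, the overlap-count ranking, and the names `signature` and `best_match` are all hypothetical. A production system would also precompute and store each document's bit string rather than rebuilding it per query.

```python
import zlib


def signature(terms, bits=64):
    """Superimpose one hashed bit per term into a fixed-length bit string."""
    sig = 0
    for term in terms:
        sig |= 1 << (zlib.crc32(term.encode("utf-8")) % bits)
    return sig


def best_match(query_terms, documents, min_shared_bits=1):
    """Rank documents by number of shared terms, using a bit-string prefilter.

    documents maps doc_id -> list of terms.
    """
    q_terms = set(query_terms)
    q_sig = signature(q_terms)
    ranked = []
    for doc_id, doc_terms in documents.items():
        # Level 1: cheap bit-string test. A document that shares a term with
        # the query must share that term's bit, so no true match is dropped
        # when min_shared_bits is 1.
        shared_bits = bin(q_sig & signature(doc_terms)).count("1")
        if shared_bits < min_shared_bits:
            continue
        # Level 2: the expensive exact term comparison, run on survivors only.
        overlap = len(q_terms & set(doc_terms))
        if overlap:
            ranked.append((overlap, doc_id))
    ranked.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in ranked]


documents = {
    "d1": ["bit", "string", "search"],
    "d2": ["medline", "index"],
    "d3": ["serial", "search", "file", "bit"],
}
result = best_match(["bit", "search", "file"], documents)
```

Hash collisions can let a non-matching document pass the first level, which is why the exact term match is kept as the second level.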

13.
MEDLINE is presented as a prototype for on-line bibliographic search systems. Creation of the data base, indexing language, and file organization are reviewed. On accessing the files, search logic is illustrated with a sample MEDLINE search. NLM's development of a document delivery system to complement its bibliographic retrieval system is discussed.

14.
A binary approach to data storage and retrieval is introduced. It views the data base as a two-dimensional matrix that relates entities to all possible values the attributes of these entities may take. As such, it provides a unified solution to the two conflicting types of data base transactions: operational and managerial. An analytical investigation of the feasibility of binary storage and a compression method for reducing meaningless areas of the matrix are presented. Storage efficiencies of binary and conventional inverted file methods are compared and evaluated. An analysis of retrieval considerations associated with the binary matrix is given, particularly the issue of going from high to low orders of compression. Results of these analyses indicate that the binary data base's efficiency increases with increases in query complexity. Future research directions are cited and discussed.
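The entity-by-attribute-value matrix can be pictured with a small Python sketch. It assumes one bit column per (attribute, value) pair, stored as a Python integer whose bit i is set when entity i has that value; the compression method mentioned in the abstract is not shown, and the names `build_binary_matrix` and `select` and the sample records are hypothetical.

```python
from collections import defaultdict


def build_binary_matrix(records):
    """Return a dict mapping (attribute, value) -> bit column over entities."""
    columns = defaultdict(int)
    for row, record in enumerate(records):
        for attribute, value in record.items():
            columns[(attribute, value)] |= 1 << row
    return dict(columns)


def select(columns, n_rows, **conditions):
    """Return row indices of entities matching every attribute=value condition."""
    mask = (1 << n_rows) - 1  # start with all entities selected
    for attribute, value in conditions.items():
        # One bitwise AND per condition, regardless of how many rows exist.
        mask &= columns.get((attribute, value), 0)
    return [row for row in range(n_rows) if (mask >> row) & 1]


records = [
    {"dept": "sales", "city": "Oslo"},
    {"dept": "sales", "city": "Rome"},
    {"dept": "ops", "city": "Oslo"},
]
columns = build_binary_matrix(records)
matches = select(columns, len(records), dept="sales", city="Oslo")
```

Each extra query condition costs only one more bitwise AND over the columns, which gives some intuition for the reported result that efficiency improves as query complexity grows.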

15.
Cross-language information retrieval (CLIR) systems allow users to find documents written in different languages from that of their query. Simple knowledge structures such as bilingual term lists have proven to be a remarkably useful basis for bridging that language gap. A broad array of dictionary-based techniques have demonstrated utility, but comparison across techniques has been difficult because evaluation results often span only a limited range of conditions. This article identifies the key issues in dictionary-based CLIR, develops unified frameworks for term selection and term translation that help to explain the relationships among existing techniques, and illustrates the effect of those techniques using four contrasting languages for systematic experiments with a uniform query translation architecture. Key results include identification of a previously unseen dependence of pre- and post-translation expansion on orthographic cognates and development of a query-specific measure for translation fanout that helps to explain the utility of structured query methods.

16.
The success of information retrieval depends on the ability to measure the effective relationship between a query and its response. If both are posed in natural language, one might expect that understanding the meaning of that language could not be avoided. The aim of this research is to demonstrate that it is perhaps unnecessary to be able to determine the meaning in the absolute sense; it may be sufficient to measure how far there is a conformity in meaning, and then only in the context of the set of documents in which the answer to a query is sought. Handling a particular language using a computer is made possible through replacing certain texts by special sets. A given text has a ‘syntactic trace’, the set of all the overlapping trigrams forming part of the text. When determining the effective relationship between a query and its answer, not only do their syntactic traces play a role, but so do the traces of all other documents in the set. This is known as the ‘information trace method’.
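The ‘syntactic trace’ defined in this abstract, the set of all overlapping trigrams of a text, is easy to compute. The Python sketch below shows that set plus a plain Jaccard overlap between two traces. The case folding, whitespace normalization, and the Jaccard measure are assumptions added for illustration; the information trace method itself also draws on the traces of every other document in the collection, which this sketch does not attempt.

```python
def syntactic_trace(text):
    """Return the set of all overlapping character trigrams in the text."""
    # Assumed normalization: lower-case and collapse runs of whitespace.
    normalized = " ".join(text.lower().split())
    return {normalized[i:i + 3] for i in range(len(normalized) - 2)}


def trace_overlap(a, b):
    """Jaccard similarity between the syntactic traces of two texts."""
    trace_a, trace_b = syntactic_trace(a), syntactic_trace(b)
    if not trace_a or not trace_b:
        return 0.0
    return len(trace_a & trace_b) / len(trace_a | trace_b)


query_trace = syntactic_trace("retrieval")
similarity = trace_overlap("information retrieval", "retrieval of information")
```

Because the trace is built from characters rather than words, it needs no dictionary, stemmer, or tokenizer for the language being handled.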

17.
Networked information retrieval aims at the interoperability of heterogeneous information retrieval (IR) systems. In this paper, we show how differences concerning search operators and database schemas can be handled by applying data abstraction concepts in combination with uncertain inference. Different data types with vague predicates are required to allow for queries referring to arbitrary attributes of documents. Physical data independence separates search operators from access paths, thus solving text search problems related to noun phrases, compound words and proper nouns. Projection and inheritance on attributes support the creation of unified views on a set of IR databases. Uncertain inference allows for query processing even on incompatible database schemas.

18.
In ad hoc querying of document collections, current approaches to ranking primarily rely on identifying the documents that contain the query terms. Methods such as query expansion, based on thesaural information or automatic feedback, are used to add further terms, and can yield significant though usually small gains in effectiveness. Another approach to adding terms, which we investigate in this paper, is to use natural language technology to annotate - and thus disambiguate - key terms by the concept they represent. Using biomedical research documents, we quantify the potential benefits of tagging users’ targeted concepts in queries and documents in domain-specific information retrieval. Our experiments, based on the TREC Genomics track data, both on passage and full-text retrieval, found no evidence that automatic concept recognition in general is of significant value for this task. Moreover, the issues raised by these results suggest that it is difficult for such disambiguation to be effective.

19.
A concept for an end-user query language with facilities for expressing relationships between objects kept in a data base is presented. The idea of nesting these facilities in a typical document system query language is shown. Special kinds of referring terms are designed. Examples of the use of the new facilities are included.

20.
许跃军 《情报科学》2008,26(6):866-871
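This paper discusses ontology-based information retrieval for government knowledge bases. Unlike traditional full-text retrieval, the technique uses an ontology to analyze the query a user submits: it analyzes the lexical, syntactic, and semantic information in a natural-language question, identifies the question category, and extracts keywords. The keywords can then be expanded according to the relationships among domain terms in the ontology and assigned different weights. Finally, the question category and the weighted keyword sequence are passed to the system's retrieval engine for further processing.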
