首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
董文鸳 《现代情报》2006,26(9):57-58
作者在简介了LISA数据库之后,分析LISA的检索规则和可检字段,最后从快速检索、高级检索和检索工具三个方面介绍了LISA的检索技巧与方法。  相似文献   

2.
本文作者力求在现阶段的信息科学业内推介LISA Database。在细致介绍了LISA 3种不同版本的检索特点后,着重分析与探究了“网络LISA”数据库的检索流程与结果揭示,最后对以LISA为代表的外文Database之优势做了简要概括,并指出其未来信息资源整合前景。  相似文献   

3.
It is well-known that relevance feedback is a method significant in improving the effectiveness of information retrieval systems. Improving effectiveness is important since these information retrieval systems must gain access to large document collections distributed over different distant sites. As a consequence, efforts to retrieve relevant documents have become significantly greater. Relevance feedback can be viewed as an aid to the information retrieval task. In this paper, a relevance feedback strategy is presented. The strategy is based on back-propagation of the relevance of retrieved documents using an algorithm developed in a neural approach. This paper describes a neural information retrieval model and emphasizes the results obtained with the associated relevance back-propagation algorithm in three different environments: manual ad hoc, automatic ad hoc and mixed ad hoc strategy (automatic plus manual ad hoc).  相似文献   

4.
相关概念的关联参照检索是概念检索的重要研究内容。本文提出了一种基于主题的语义关联的参照检索模型,通过融合语义网、本体论的相关知识及信息提取等语言处理技术,提取关于特定主题的文档的主题概念及概念之间的关联构成该主题的语义关联模型,并辅助于参照检索过程。  相似文献   

5.
This paper reports our experimental investigation into the use of more realistic concepts as opposed to simple keywords for document retrieval, and reinforcement learning for improving document representations to help the retrieval of useful documents for relevant queries. The framework used for achieving this was based on the theory of Formal Concept Analysis (FCA) and Lattice Theory. Features or concepts of each document (and query), formulated according to FCA, are represented in a separate concept lattice and are weighted separately with respect to the individual documents they present. The document retrieval process is viewed as a continuous conversation between queries and documents, during which documents are allowed to learn a set of significant concepts to help their retrieval. The learning strategy used was based on relevance feedback information that makes the similarity of relevant documents stronger and non-relevant documents weaker. Test results obtained on the Cranfield collection show a significant increase in average precisions as the system learns from experience.  相似文献   

6.
7.
How to merge and organise query results retrieved from different resources is one of the key issues in distributed information retrieval. Some previous research and experiments suggest that cluster-based document browsing is more effective than a single merged list. Cluster-based retrieval results presentation is based on the cluster hypothesis, which states that documents that cluster together have a similar relevance to a given query. However, while this hypothesis has been demonstrated to hold in classical information retrieval environments, it has never been fully tested in heterogeneous distributed information retrieval environments. Heterogeneous document representations, the presence of document duplicates, and disparate qualities of retrieval results, are major features of an heterogeneous distributed information retrieval environment that might disrupt the effectiveness of the cluster hypothesis. In this paper we report on an experimental investigation into the validity and effectiveness of the cluster hypothesis in highly heterogeneous distributed information retrieval environments. The results show that although clustering is affected by different retrieval results representations and quality, the cluster hypothesis still holds and that generating hierarchical clusters in highly heterogeneous distributed information retrieval environments is still a very effective way of presenting retrieval results to users.  相似文献   

8.
Satisfying non-trivial information needs involves collecting information from multiple resources, and synthesizing an answer that organizes that information. Traditional recall/precision-oriented information retrieval focuses on just one phase of that process: how to efficiently and effectively identify documents likely to be relevant to a specific, focused query. The TREC Interactive Track has as its goal the location of documents that pertain to different instances of a query topic, with no reward for duplicated coverage of topic instances. This task is similar to the task of organizing answer components into a complete answer. Clustering and classification are two mechanisms for organizing documents into groups. In this paper, we present an ongoing series of experiments that test the feasibility and effectiveness of using clustering and classification as an aid to instance retrieval and, ultimately, answer construction. Our results show that users prefer such structured presentations of candidate result set to a list-based approach. Assessment of the structured organizations based on the subjective judgement of the experiment subjects suggests that the structured organization can be more effective; however, assessment based on objective judgements shows mixed results. These results indicate that a full determination of the success of the approach depends on assessing the quality of the final answers generated by users, rather than on performance during the intermediate stages of answer construction.  相似文献   

9.
This paper presents a probabilistic information retrieval framework in which the retrieval problem is formally treated as a statistical decision problem. In this framework, queries and documents are modeled using statistical language models, user preferences are modeled through loss functions, and retrieval is cast as a risk minimization problem. We discuss how this framework can unify existing retrieval models and accommodate systematic development of new retrieval models. As an example of using the framework to model non-traditional retrieval problems, we derive retrieval models for subtopic retrieval, which is concerned with retrieving documents to cover many different subtopics of a general query topic. These new models differ from traditional retrieval models in that they relax the traditional assumption of independent relevance of documents.  相似文献   

10.
In the information retrieval systems, one of the most important and difficult operations is to extract appropriate keywords from documents. This paper proposes an effective substring search method by extending a pattern matching machine for multi-keyword based on Aho and Corasick (AC) called AC machine. The proposed method enables us to extract keyword candidates as much as possible and to select the suitable keywords for users' purpose at a retrieval stage. This method contains four types of substring search methods (exact, prefix, suffix and proper substring search). This paper also proposes a construction algorithm of the retrieval structure for speeding up the substring search. From the simulation results, it is shown that the retrieval time of the presented method is as fast as the key retrieval method based on the trie.  相似文献   

11.
12.
13.
This paper is an interim report on our efforts at NIST to construct an information discovery tool through the fusion of hypertext and information retrieval (IR) technologies. The tool works by parsing a contiguous document base into smaller documents and inserting semantic links between these documents using document–document similarity measures based on IR techniques. The focus of the paper is a case study in which domain experts evaluate the utility of the tool in the performance of information discovery tasks on a large, dynamic procedural manual. The results of the case study are discussed, and their implications for the design of large-scale automatic hypertext generation systems are described.  相似文献   

14.
Traditional information retrieval techniques that primarily rely on keyword-based linking of the query and document spaces face challenges such as the vocabulary mismatch problem where relevant documents to a given query might not be retrieved simply due to the use of different terminology for describing the same concepts. As such, semantic search techniques aim to address such limitations of keyword-based retrieval models by incorporating semantic information from standard knowledge bases such as Freebase and DBpedia. The literature has already shown that while the sole consideration of semantic information might not lead to improved retrieval performance over keyword-based search, their consideration enables the retrieval of a set of relevant documents that cannot be retrieved by keyword-based methods. As such, building indices that store and provide access to semantic information during the retrieval process is important. While the process for building and querying keyword-based indices is quite well understood, the incorporation of semantic information within search indices is still an open challenge. Existing work have proposed to build one unified index encompassing both textual and semantic information or to build separate yet integrated indices for each information type but they face limitations such as increased query process time. In this paper, we propose to use neural embeddings-based representations of term, semantic entity, semantic type and documents within the same embedding space to facilitate the development of a unified search index that would consist of these four information types. We perform experiments on standard and widely used document collections including Clueweb09-B and Robust04 to evaluate our proposed indexing strategy from both effectiveness and efficiency perspectives. Based on our experiments, we find that when neural embeddings are used to build inverted indices; hence relaxing the requirement to explicitly observe the posting list key in the indexed document: (a) retrieval efficiency will increase compared to a standard inverted index, hence reduces the index size and query processing time, and (b) while retrieval efficiency, which is the main objective of an efficient indexing mechanism improves using our proposed method, retrieval effectiveness also retains competitive performance compared to the baseline in terms of retrieving a reasonable number of relevant documents from the indexed corpus.  相似文献   

15.
16.
17.
The paper introduces a new method for the visualization of information retrieval. Angle attributes of a document are used to construct the angle–angle-based visual space. The retrieved documents are perceived, several traditional information retrieval evaluation models are visualized and interpreted, and new non-traditional retrieval control means based on the model are explored in the two-dimensional angle display space. The impacts of different metrics on the visualization of information retrieval are discussed. Ambiguity, future research directions and other relevant issues are also addressed.  相似文献   

18.
Document retrieval systems based on probabilistic or fuzzy logic considerations may order documents for retrieval. Users then examine the ordered documents until deciding to stop, based on the estimate that the highest ranked unretrieved document will be most economically not retrieved. We propose an expected precision measure useful in estimating the performance expected if yet unretrieved documents were to be retrieved, providing information that may result in more economical stopping decisions. An expected precision graph, comparing expected precision versus document rank, may graphically display the relative expected precision of retrieved and unretrieved documents and may be used as a stopping aid for online searching of text data bases. The effectiveness of relevance feedback may be examined as a search progresses. Expected precision values may also be used as a cutoff for systems consistent with probabilistic models operating in batch modes. Techniques are given for computing the best expected precision obtainable and the expected precision of subject neutral documents.  相似文献   

19.
The Internet, together with the large amount of textual information available in document archives, has increased the relevance of information retrieval related tools. In this work we present an extension of the Gambal system for clustering and visualization of documents based on fuzzy clustering techniques. The tool allows to structure the set of documents in a hierarchical way (using a fuzzy hierarchical structure) and represent this structure in a graphical interface (a 3D sphere) over which the user can navigate.Gambal allows the analysis of the documents and the computation of their similarity not only on the basis of the syntactic similarity between words but also based on a dictionary (Wordnet 1.7) and latent semantics analysis.  相似文献   

20.
An information retrieval performance measure that is interpreted as the percent of perfect performance (PPP) can be used to study the effects of the inclusion of specific document features or feature classes or techniques in an information retrieval system. Using this, one can measure the relative quality of a new ranking algorithm, the result of incorporating specific types of metadata or folksonomies from natural language, or determine what happens when one makes modifications to terms, such as stemming or adding part-of-speech tags. For example, knowledge that removing stopwords in a specific system improves the performance 5% of the way from the level of random performance to the best possible result is relatively easy to interpret and to use in decision making; using this percent based measure also allows us to simply compute and interpret that there remains 95% of the possible performance to be obtained using other methods. The PPP measure as used here is based on the average search length, a measure of the ordering quality of a set of data, and may be used when evaluating all the documents or just the first N documents in an ordered list of documents. Because the ASL may be computed empirically or may be estimated analytically, the PPP measure may also be computed empirically or performance may be estimated analytically. Different levels of upper bound performance are discussed.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号