Similar literature
Found 20 similar documents; search took 218 ms
1.
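In information retrieval, the attribute information of documents is often uncertain or incomplete, which makes relevance decisions difficult. To address this, rough set theory is applied to the information retrieval model: an adjacency matrix of the corpus is constructed, and the relevance of a document to a query is determined by comparing how much the expanded feature terms overlap with the document's upper and lower approximation sets. Documents are then kept or discarded according to this relevance score. Experiments show that the method can improve retrieval precision.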
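The lower and upper approximations mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give its formulas, so the grouping of terms into equivalence classes (`classes`), the Jaccard-style `overlap` measure, and all example terms are hypothetical.

```python
def lower_approximation(classes, target):
    """Union of the equivalence classes that lie entirely inside `target`."""
    result = set()
    for cls in classes:
        if cls <= target:
            result |= cls
    return result


def upper_approximation(classes, target):
    """Union of the equivalence classes that share at least one term with `target`."""
    result = set()
    for cls in classes:
        if cls & target:
            result |= cls
    return result


def overlap(a, b):
    """Jaccard overlap of two term sets (an assumed similarity measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical term classes, e.g. derived from term co-occurrence.
classes = [{"car", "automobile"}, {"engine", "motor"}, {"road"}]
document = {"car", "automobile", "engine"}
query = {"automobile", "motor"}

lower = lower_approximation(classes, document)
upper = upper_approximation(classes, document)
score_lower = overlap(query, lower)
score_upper = overlap(query, upper)
```

A document could then be kept when, say, `score_upper` exceeds a chosen threshold; the threshold and the way the two scores are combined are left open here.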

2.
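Modern enterprises achieve leapfrog development through information sharing and collaboration, and computer-supported cooperative work (CSCW) is undoubtedly an important way to improve their operating efficiency. This paper proposes a CSCW-based integrated management system for heterogeneous databases. Starting from Web Services technology, it designs the system structure and functions, studies the best scheme for acquiring data from heterogeneous data sources, and accomplishes integrated management of heterogeneous databases by solving problems such as document format conversion, resource service abstraction, and query decomposition, providing secure and effective query results for users in a network environment.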

3.
赵英 《现代情报》2008,28(5):14-16
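In a network environment, heterogeneous and distributed data are unavoidable. This paper proposes integrating distributed data by means of middleware plus a set of data-integration caches and, on that basis, focuses on three methods for seamlessly integrating heterogeneous data in two situations, providing theoretical and technical support for solving the "information island" problem and achieving comprehensive sharing of distributed heterogeneous data.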

4.
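For heterogeneous data sources in distributed storage, integrating LDAP with the individual databases can improve system efficiency. The design of a heterogeneous distributed data query system is analyzed from three aspects: system design, system model, and the integration of LDAP with heterogeneous databases.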

5.
Distributed enterprise information integration model   Cited by: 5 (self-citations: 0; cited by others: 5)
常明  刘烨 《情报杂志》2006,25(1):100-101,104
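To address the information-sharing problems caused by the distributed, heterogeneous nature of the ERP systems currently used by enterprises in various sectors in China, a distributed heterogeneous data integration system built on CORBA and XML is proposed, enabling information sharing between enterprises in a network environment and meeting users' needs for information integration.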

6.
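Research on heterogeneous data integration for digital libraries in the big data era   Cited by: 1 (self-citations: 0; cited by others: 1)
In the big data era, the data processing and services of digital libraries will change markedly, shifting from traditional services such as information lookup and push delivery toward analyzing and mining potentially valuable information from massive data. The structure and mechanisms of relational databases do not adapt well to this change. To address the integration of heterogeneous data in digital libraries under big data conditions, a data integration method based on a NoSQL middleware model is proposed. The method helps digital libraries store data of various structures and adapts well to distributed storage of massive data.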

7.
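Research on models of distributed digital libraries   Cited by: 4 (self-citations: 0; cited by others: 4)
Given that information resources on the World Wide Web are distributed, open, and heterogeneous, three structural models of distributed digital libraries are introduced, analyzed, and compared. Interoperability between heterogeneous systems is identified as the key problem in building distributed digital libraries.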

8.
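Keyword-based hierarchical document query   Cited by: 1 (self-citations: 0; cited by others: 1)
Document querying is an important part of scientific work. In terms of implementation, its core is constructing query statements, that is, designing the query interface and converting its input into database query statements. This paper proposes the concept of a query tree for document queries: each leaf node corresponds to one SQL statement, while branch nodes represent set operations (union, intersection, difference) and other operations among their child nodes, which makes complex document query requirements easy to express. An algorithm is designed to convert the query tree into SQL, merging the whole tree into a single SQL statement so as to take full advantage of the DBMS's query optimization.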
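The query-tree idea can be illustrated with a small sketch. It assumes only what the abstract states (leaves hold one SQL statement, branches hold a set operation); the class names `Leaf` and `Branch`, the function `to_sql`, the `docs` table, and the subquery-wrapping strategy are hypothetical and are not the paper's algorithm.

```python
import sqlite3
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    sql: str  # one complete SELECT statement


@dataclass
class Branch:
    op: str  # "UNION", "INTERSECT" or "EXCEPT"
    left: "Node"
    right: "Node"


Node = Union[Leaf, Branch]


def to_sql(node: Node) -> str:
    """Merge a whole query tree into one SQL statement.

    Each operand is wrapped in an aliased subquery so that nesting,
    not operator precedence, decides the evaluation order.
    """
    if isinstance(node, Leaf):
        return node.sql
    if node.op not in {"UNION", "INTERSECT", "EXCEPT"}:
        raise ValueError(f"unsupported set operation: {node.op}")
    return (
        f"SELECT * FROM ({to_sql(node.left)}) AS q "
        f"{node.op} "
        f"SELECT * FROM ({to_sql(node.right)}) AS q"
    )


# Hypothetical data: which keywords are attached to which document id.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER, keyword TEXT)")
conn.executemany(
    "INSERT INTO docs VALUES (?, ?)",
    [(1, "xml"), (1, "corba"), (2, "xml"), (3, "ldap")],
)

# (xml OR ldap) AND NOT corba, expressed as a tree of set operations.
tree = Branch(
    "EXCEPT",
    Branch(
        "UNION",
        Leaf("SELECT id FROM docs WHERE keyword = 'xml'"),
        Leaf("SELECT id FROM docs WHERE keyword = 'ldap'"),
    ),
    Leaf("SELECT id FROM docs WHERE keyword = 'corba'"),
)
rows = sorted(row[0] for row in conn.execute(to_sql(tree)))
```

Because the whole tree becomes a single statement, the database engine sees the full query at once and can plan it as a unit, which is the benefit the abstract attributes to the approach.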

9.
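Intelligent interactive document retrieval in fund project review management   Cited by: 4 (self-citations: 0; cited by others: 4)
Retrieval models for unstructured document information are discussed, traditional interactive information retrieval methods are analyzed, and an intelligent interactive retrieval process and workflow for querying fund project documents is proposed. Based on the feedback users give when evaluating project documents, the ID3 algorithm, the CLCC algorithm, and SVM classification functions are each used to learn the latent intent and goals of user queries, and the learned rules or classification functions are then applied to support project document queries. Experiments and analysis are carried out using document queries in the review management of a fund as an example.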

10.
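In a distributed database system, the distribution and redundancy of data add much new content and complexity to query processing. Different query processing methods differ greatly in query cost and degree of parallelism, so query optimization is more important, and its effect more pronounced, in distributed database systems than in centralized ones. Based on the characteristics of distributed database systems, this paper briefly introduces the goals, strategies, and basic methods of distributed query optimization.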

11.
The problem of results merging in distributed information retrieval environments has gained significant attention in recent years. Two generic approaches have been introduced in research. The first approach aims at estimating the relevance of the documents returned from the remote collections through ad hoc methodologies (such as weighted score merging, regression etc.) while the other is based on downloading all the documents locally, completely or partially, in order to calculate their relevance. Both approaches have advantages and disadvantages. Download methodologies are more effective but they pose a significant overhead on the process in terms of time and bandwidth. Approaches that rely solely on estimation, on the other hand, usually depend on document relevance scores being reported by the remote collections in order to achieve maximum performance. In addition to that, regression algorithms, which have proved to be more effective than weighted scores merging algorithms, need a significant number of overlap documents in order to function effectively, practically requiring multiple interactions with the remote collections. The new algorithm that is introduced is based on adaptively downloading a limited, selected number of documents from the remote collections and estimating the relevance of the rest through regression methodologies. Thus it reconciles the above two approaches, combining their strengths, while minimizing their drawbacks, achieving the limited time and bandwidth overhead of the estimation approaches and the increased effectiveness of the download. The proposed algorithm is tested in a variety of settings and its performance is found to be significantly better than the former, while approximating that of the latter.
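The hybrid idea (download a few documents, estimate the rest by regression) can be sketched as follows. This is not the paper's algorithm: the evenly spaced sampling rule, the simple least-squares line, and the names `fit_line`, `merge_results`, and `download_and_score` are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("sampled remote scores must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def merge_results(collections, download_and_score, sample_size=2):
    """Merge ranked lists from several collections into one list.

    collections: {name: [(doc_id, remote_score), ...]} ranked best first.
    download_and_score: hypothetical callable that fetches a document
    and returns its locally computed relevance score.
    """
    if sample_size < 2:
        raise ValueError("need at least two sampled documents to fit a line")
    merged = []
    for ranked in collections.values():
        n = len(ranked)
        k = min(sample_size, n)
        if k < 2:
            raise ValueError("each collection needs at least two results")
        # Assumed sampling rule: evenly spaced positions in the ranked list.
        picks = sorted({round(i * (n - 1) / (k - 1)) for i in range(k)})
        local = {ranked[i][0]: download_and_score(ranked[i][0]) for i in picks}
        slope, intercept = fit_line(
            [ranked[i][1] for i in picks],
            [local[ranked[i][0]] for i in picks],
        )
        for doc_id, remote_score in ranked:
            # Downloaded documents keep their true local score;
            # the rest are estimated from the fitted line.
            estimate = slope * remote_score + intercept
            merged.append((doc_id, local.get(doc_id, estimate)))
    return sorted(merged, key=lambda pair: pair[1], reverse=True)


# Hypothetical collections whose scores are on incomparable scales.
collections = {
    "A": [("a1", 0.9), ("a2", 0.5), ("a3", 0.1)],
    "B": [("b1", 10.0), ("b2", 4.0), ("b3", 2.0)],
}
true_local = {"a1": 2.8, "a2": 2.0, "a3": 1.2, "b1": 3.0, "b2": 1.5, "b3": 1.0}
downloaded = []


def download_and_score(doc_id):
    downloaded.append(doc_id)  # track bandwidth use
    return true_local[doc_id]


merged = merge_results(collections, download_and_score)
```

In this toy setup only four of the six documents are downloaded, yet the merged order matches the order given by the true local scores, which illustrates the trade-off the abstract describes.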

12.
How to merge and organise query results retrieved from different resources is one of the key issues in distributed information retrieval. Some previous research and experiments suggest that cluster-based document browsing is more effective than a single merged list. Cluster-based retrieval results presentation is based on the cluster hypothesis, which states that documents that cluster together have a similar relevance to a given query. However, while this hypothesis has been demonstrated to hold in classical information retrieval environments, it has never been fully tested in heterogeneous distributed information retrieval environments. Heterogeneous document representations, the presence of document duplicates, and disparate qualities of retrieval results are major features of a heterogeneous distributed information retrieval environment that might disrupt the effectiveness of the cluster hypothesis. In this paper we report on an experimental investigation into the validity and effectiveness of the cluster hypothesis in highly heterogeneous distributed information retrieval environments. The results show that although clustering is affected by different retrieval results representations and quality, the cluster hypothesis still holds and that generating hierarchical clusters in highly heterogeneous distributed information retrieval environments is still a very effective way of presenting retrieval results to users.

13.
In this paper, a new source selection algorithm for uncooperative distributed information retrieval environments is presented. The algorithm functions by modeling each information source as an integral, using the relevance score and the intra-collection position of its sampled documents in reference to a centralized sample index, and selects the collections that cover the largest area in the rank-relevance space. Based on the above novel metric, the algorithm explicitly focuses on addressing the two goals of source selection: high recall, which is important for source recommendation applications, and high precision, which is important for distributed information retrieval, aiming to produce a high-precision final merged list.
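One way to picture "area in the rank-relevance space" is a trapezoidal sum over each collection's sampled documents. The sketch below is an assumption-laden illustration, not the paper's metric: it places a collection's sampled documents at unit-spaced positions 1..n and integrates their relevance scores from the centralized sample index. The names `area_under_scores` and `select_sources` are hypothetical.

```python
def area_under_scores(scores):
    """Trapezoidal area under scores placed at unit-spaced positions 1..n.

    A collection with a single sampled document gets area 0 in this
    simplification.
    """
    return sum((a + b) / 2 for a, b in zip(scores, scores[1:]))


def select_sources(central_ranking, top_k):
    """Pick the `top_k` collections covering the largest area.

    central_ranking: [(collection, relevance_score), ...] as returned by a
    query against the centralized sample index, best first.
    """
    by_collection = {}
    for collection, score in central_ranking:
        by_collection.setdefault(collection, []).append(score)
    areas = {name: area_under_scores(s) for name, s in by_collection.items()}
    return sorted(areas, key=areas.get, reverse=True)[:top_k]


# Hypothetical ranking of sampled documents for one query.
ranking = [("A", 0.9), ("B", 0.8), ("A", 0.7), ("C", 0.6), ("B", 0.2), ("A", 0.1)]
selected = select_sources(ranking, top_k=2)
```

Here collection A covers the largest area because it contributes several highly scored sampled documents, so it is selected first, followed by B.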

14.
This paper examines the estimation of global term weights (such as IDF) in information retrieval scenarios where a global view on the collection is not available. In particular, the two options of either sampling documents or of using a reference corpus independent of the target retrieval collection are compared using standard IR test collections. In addition, the possibility of pruning term lists based on frequency is evaluated.
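Estimating IDF from a document sample, with frequency-based pruning of the term list, might look like the sketch below. The formula `log(N / df)` is one common IDF variant and is an assumption here, as are the function name `estimate_idf`, the `min_df` cutoff, and the toy documents.

```python
import math


def estimate_idf(sample_docs, min_df=1):
    """Estimate IDF from sampled documents (each a list of terms).

    Terms that occur in fewer than `min_df` sampled documents are pruned
    from the returned term list.
    """
    n_docs = len(sample_docs)
    doc_freq = {}
    for doc in sample_docs:
        for term in set(doc):  # count each term once per document
            doc_freq[term] = doc_freq.get(term, 0) + 1
    return {
        term: math.log(n_docs / df)
        for term, df in doc_freq.items()
        if df >= min_df
    }


# Hypothetical sample standing in for an unavailable global collection.
sample = [
    ["distributed", "retrieval"],
    ["retrieval", "index"],
    ["retrieval"],
    ["index", "merge"],
]
idf = estimate_idf(sample, min_df=2)
```

With `min_df=2` the rare terms "distributed" and "merge" are dropped, leaving weights only for "retrieval" and "index".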

15.
Numerous feature-based models have been recently proposed by the information retrieval community. The capability of features to express different relevance facets (query- or document-dependent) can explain such a success story. Such models are most of the time supervised, thus requiring a learning phase. To leverage the advantages of feature-based representations of documents, we propose TournaRank, an unsupervised approach inspired by real-life game and sport competition principles. Documents compete against each other in tournaments using features as evidence of relevance. Tournaments are modeled as a sequence of matches, which involve pairs of documents playing in turn their features. Once a tournament is ended, documents are ranked according to their number of won matches during the tournament. This principle is generic since it can be applied to any collection type. It also provides great flexibility since different alternatives can be considered by changing the tournament type, the match rules, the feature set, or the strategies adopted by documents during matches. TournaRank was evaluated on several collections to assess our model in different contexts and to compare it with related approaches such as Learning To Rank and fusion ones: the TREC Robust2004 collection for homogeneous documents, the TREC Web2014 (ClueWeb12) collection for heterogeneous web documents, and the LETOR3.0 collection for comparison with supervised feature-based models.
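The tournament principle can be illustrated with a round-robin sketch. The abstract leaves the tournament type and match rules open, so the choices here are assumptions: every pair of documents plays once, features are compared one by one, and the document that wins more feature comparisons wins the match. The names `play_match` and `round_robin_rank` are hypothetical and this is not the TournaRank implementation.

```python
from itertools import combinations


def play_match(features_a, features_b):
    """Return 1 if A wins, -1 if B wins, 0 for a draw.

    Assumed match rule: compare the documents feature by feature and
    award the match to whoever wins more comparisons.
    """
    points = sum((a > b) - (a < b) for a, b in zip(features_a, features_b))
    return (points > 0) - (points < 0)


def round_robin_rank(docs):
    """Rank documents by number of matches won in a round-robin tournament.

    docs: {doc_id: [feature values, higher meaning more relevant]}.
    """
    wins = {doc_id: 0 for doc_id in docs}
    for first, second in combinations(docs, 2):
        outcome = play_match(docs[first], docs[second])
        if outcome > 0:
            wins[first] += 1
        elif outcome < 0:
            wins[second] += 1
    return sorted(wins, key=wins.get, reverse=True)


# Hypothetical documents with three relevance features each.
docs = {
    "d1": [0.9, 0.8, 0.3],
    "d2": [0.5, 0.6, 0.7],
    "d3": [0.1, 0.2, 0.9],
}
ranking = round_robin_rank(docs)
```

No training data is needed: the ranking emerges purely from pairwise feature comparisons, which is what makes the approach unsupervised.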

16.
Conventional approaches to information retrieval search through all applicable entries in an inverted file for a particular collection in order to find those documents with the highest scores. For particularly large collections this may be extremely time consuming.

17.
It is well known that relevance feedback is an effective method for improving the effectiveness of information retrieval systems. Improving effectiveness is important since these information retrieval systems must gain access to large document collections distributed over different distant sites. As a consequence, efforts to retrieve relevant documents have become significantly greater. Relevance feedback can be viewed as an aid to the information retrieval task. In this paper, a relevance feedback strategy is presented. The strategy is based on back-propagation of the relevance of retrieved documents using an algorithm developed in a neural approach. This paper describes a neural information retrieval model and emphasizes the results obtained with the associated relevance back-propagation algorithm in three different environments: manual ad hoc, automatic ad hoc and mixed ad hoc strategy (automatic plus manual ad hoc).

18.
Users of search engines express their needs as queries, typically consisting of a small number of terms. The resulting search engine query logs are valuable resources that can be used to predict how people interact with the search system. In this paper, we introduce two novel applications of query logs, in the context of distributed information retrieval. First, we use query log terms to guide sampling from uncooperative distributed collections. We show that while our sampling strategy is at least as efficient as current methods, it consistently performs better. Second, we propose and evaluate a pruning strategy that uses query log information to eliminate terms. Our experiments show that our proposed pruning method maintains the accuracy achieved by complete indexes, while decreasing the index size by up to 60%. While such pruning may not always be desirable in practice, it provides a useful benchmark against which other pruning strategies can be measured.
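A very simple form of query-log-based pruning is sketched below. The abstract does not state the actual pruning criterion, so the rule used here (keep only index terms that appear somewhere in the query log) is an assumption, as are the function name `prune_index` and the toy index.

```python
def prune_index(inverted_index, query_log):
    """Keep only the index terms that occur in at least one logged query.

    inverted_index: {term: [doc ids]}; query_log: list of query strings.
    """
    queried_terms = {
        term for query in query_log for term in query.lower().split()
    }
    return {
        term: postings
        for term, postings in inverted_index.items()
        if term in queried_terms
    }


# Hypothetical index and query log.
index = {"retrieval": [1, 2], "zebra": [3], "merge": [2]}
log = ["distributed retrieval", "Merge results"]
pruned = prune_index(index, log)
```

The term "zebra" is never queried, so its postings list is dropped while the lists for "retrieval" and "merge" are kept unchanged.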

19.
Research on classified retrieval of network information   Cited by: 4 (self-citations: 0; cited by others: 4)
This paper studies network information classification retrieval from the theory of information management. With a brief introduction to search engines, it focuses on analyzing the characteristics of network documents and their classification system. Problems in network document classification are pointed out. Suggestions such as constructing a catalog classification search engine system are made.

20.
In ad hoc querying of document collections, current approaches to ranking primarily rely on identifying the documents that contain the query terms. Methods such as query expansion, based on thesaural information or automatic feedback, are used to add further terms, and can yield significant though usually small gains in effectiveness. Another approach to adding terms, which we investigate in this paper, is to use natural language technology to annotate - and thus disambiguate - key terms by the concept they represent. Using biomedical research documents, we quantify the potential benefits of tagging users’ targeted concepts in queries and documents in domain-specific information retrieval. Our experiments, based on the TREC Genomics track data, both on passage and full-text retrieval, found no evidence that automatic concept recognition in general is of significant value for this task. Moreover, the issues raised by these results suggest that it is difficult for such disambiguation to be effective.

