Similar literature
 Found 20 similar documents; search time 125 ms
1.
The National Library of Medicine has offered TOXLINE, an online interactive bibliographic database of biomedical (toxicology) information since 1972. Files from 11 secondary sources comprise the TOXLINE database. The sources supplied bibliographic records in different formats and data structures. Data from each supplier's format had to be converted into a format suitable for TOXLINE. Three different, successive retrieval systems were used for the TOXLINE database which required reformatting of the data. Algorithms for generating terms for inverted file search methods were tested. Special characters peculiar to the scientific literature were evaluated during search term generation. Developing search term algorithms for chemical names in the scientific literature required techniques different from those used for nonscientific literature. Problems with replication of bibliographic records from multiple secondary sources are described. Some observations about online interactive databases since TOXLINE was first offered are noted.

2.
A prefix trie index (originally called trie hashing) is applied to the problem of providing fast search times, fast load times and fast update properties in a bibliographic or full text retrieval system. For all but the largest dictionaries a single key search in the dictionary under trie hashing takes exactly one disk read. Front compression of search keys is used to enhance performance. Partial combining of the postings into the dictionary is analyzed as a method to give both faster retrieval and improved update properties for the trie hashing inverted file. Statistics are given for a test database consisting of an online catalog at the Graduate School of Library and Information Science Library of the University of Western Ontario. The effects of changing various parameters of prefix tries are tested in this application.
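
The front compression of search keys mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not give the encoding used in the trie hashing index, so the scheme below is the generic textbook form of front coding, and the function names `front_encode` and `front_decode` are hypothetical.

```python
def front_encode(sorted_keys):
    """Front-code a sorted key list (generic scheme, assumed for illustration).

    Each key is stored as (length of prefix shared with the previous key,
    remaining suffix).
    """
    encoded = []
    previous = ""
    for key in sorted_keys:
        shared = 0
        limit = min(len(previous), len(key))
        # Count the leading characters this key shares with the previous one.
        while shared < limit and previous[shared] == key[shared]:
            shared += 1
        encoded.append((shared, key[shared:]))
        previous = key
    return encoded


def front_decode(encoded):
    """Rebuild the original keys from (shared_length, suffix) pairs."""
    keys = []
    previous = ""
    for shared, suffix in encoded:
        key = previous[:shared] + suffix
        keys.append(key)
        previous = key
    return keys
```

Because neighbouring keys in a sorted dictionary tend to share long prefixes, only the short suffixes need to be stored, which is why the technique suits dictionary blocks that must fit in a single disk read.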

3.
MEDLINE is presented as a prototype for on-line bibliographic search systems. Creation of the data base, indexing language, and file organization are reviewed. On accessing the files, search logic is illustrated with a sample MEDLINE search. NLM's development of a document delivery system to complement its bibliographic retrieval system is discussed.

4.
The database to be used with an online bibliographic information system must meet a number of requirements which are often not satisfied by conventional database management systems. Most important of these is the requirement for full authority file control over the indexes to the database. This paper reviews the special requirements of a bibliographic database and shows how they are met in the database system of DOBIS-LIBIS (Dortmund Library System-Leuven Library System).

5.
A variety of data structures such as inverted file, multi-lists, quad tree, k-d tree, range tree, polygon tree, quintary tree, multidimensional tries, segment tree, doubly chained tree, the grid file, d-fold tree, super B-tree, Multiple Attribute Tree (MAT), etc. have been studied for multidimensional searching and related problems. Physical data base organization, which is an important application of multidimensional searching, is traditionally and mostly handled by employing inverted file. This study proposes MAT data structure for bibliographic file systems, by illustrating the superiority of MAT data structure over inverted file. Both methods are compared in terms of preprocessing, storage and query costs. Worst-case complexity analysis of both methods, for a partial match query, is carried out in two cases: (a) when directory resides in main memory, (b) when directory resides in secondary memory. In both cases, MAT data structure is shown to be more efficient than the inverted file method. Arguments are given to illustrate the superiority of MAT data structure in an average case also. An efficient adaptation of MAT data structure, that exploits the special features of MAT structure and bibliographic files, is proposed for bibliographic file systems. In this adaptation, suitable techniques for fixing and ranking of the attributes for MAT data structure are proposed. Conclusions and proposals for future research are presented.

6.
The response time characteristics of the National Library of Medicine's (NLM) ELHILL bibliographic search system are examined in this article. Transactions for a five-week period are analyzed and average response times are calculated for typical search commands, by time of day, and by file being searched. Overall, the response time of the system was found to be 2.1 seconds, a very low value. Based on statistical tests of significance applied to the data, it was concluded that response time differences can be explained in terms of the number of users on the system and not the command issued by the user nor the file the user searched.

7.
The inverted file is the most popular indexing mechanism for document search in an information retrieval system. Compressing an inverted file can greatly improve document search rate. Traditionally, the d-gap technique is used in the inverted file compression by replacing document identifiers with usually much smaller gap values. However, fluctuating gap values cannot be efficiently compressed by some well-known prefix-free codes. To smoothen and reduce the gap values, we propose a document-identifier reassignment algorithm. This reassignment is based on a similarity factor between documents. We generate a reassignment order for all documents according to the similarity to reassign closer identifiers to the documents having closer relationships. Simulation results show that the average gap values of sample inverted files can be reduced by 30%, and the compression rate of d-gapped inverted file with prefix-free codes can be improved by 15%.
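
The d-gap technique described in this abstract can be sketched as follows. This is an illustration only: the abstract does not say which prefix-free code was used, so Elias gamma is chosen here as one well-known example, document identifiers are assumed to be positive integers in ascending order, and the function names are hypothetical. The reassignment algorithm itself is not reproduced.

```python
def to_dgaps(doc_ids):
    """Replace an ascending posting list of document identifiers with d-gaps."""
    gaps = []
    previous = 0
    for doc_id in doc_ids:
        gaps.append(doc_id - previous)
        previous = doc_id
    return gaps


def from_dgaps(gaps):
    """Recover the document identifiers by summing the gaps."""
    doc_ids = []
    total = 0
    for gap in gaps:
        total += gap
        doc_ids.append(total)
    return doc_ids


def elias_gamma(n):
    """Elias gamma code of a positive integer, returned as a bit string."""
    if n < 1:
        raise ValueError("Elias gamma is defined for positive integers only")
    binary = bin(n)[2:]
    # Unary length prefix (zeros) followed by the binary representation.
    return "0" * (len(binary) - 1) + binary


def encoded_bits(doc_ids):
    """Total bits needed to gamma-code the d-gaps of one posting list."""
    return sum(len(elias_gamma(gap)) for gap in to_dgaps(doc_ids))
```

Under these assumptions, a posting list whose identifiers are close together (`[1, 2, 3, 4]`) needs fewer bits than one whose identifiers are spread out (`[1, 9, 17, 25]`), which is the effect an identifier reassignment aims for.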

8.
Online information retrieval systems continued to reach wider audiences. The authors discuss a particular text retrieval system and its techniques for helping the common unsophisticated user through both the search for and understanding of information based on the vocabular file concept. In addition, methods for easy construction and maintenance of a suitable data base organization are described.

9.
Conventional approaches to information retrieval search through all applicable entries in an inverted file for a particular collection in order to find those documents with the highest scores. For particularly large collections this may be extremely time consuming.

10.
This paper outlines an online acquisition system and describes in brief the various factors involved in the automated environment. The acquisition system database is based on a network model and is structured to handle the complex nature of bibliographic data efficiently and economically to achieve an integrated library system. The comprehensibility of the system has been reflected in the database design stage itself. System design and development factors concerning hierarchical menu-driven operation, as well as considerations of portability with respect to hardware/system software requirements are highlighted.

11.
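A preliminary discussion of method metadata for resource and environment application models   Total citations: 6 (self-citations: 0; citations by others: 6)
This paper discusses the necessity and feasibility of writing method metadata for models, synthesizes the basic framework of a method metadata standard and the management models suited to method metadata, and studies a model management mode based on method metadata. It argues that model method metadata should cover eight aspects: identification, application domain, model parameters, operating conditions, performance, principles, model implementation, and management information. Method metadata can be managed in text files, relational databases, or object-oriented databases; a modeling language built on method metadata makes it convenient to manage the running of resource and environment application models.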

12.
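Wang Zexian, 《情报探索》, 2014, (5): 95-100
Using Lucene's full-text indexing and search technology, the author developed Bookle, a full-text-search OPAC system integrated with ILAS III. The paper introduces Bookle's architecture and the design and implementation of its parameter manager, indexer, searcher, and user interface. Bookle automatically harvests extended bibliographic information and stores it locally, extends the bibliographic access points, and offers readers any-word full-text search over bibliographic records and their extended bibliographic information, making up for the shortcomings of the ILAS III OPAC.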

13.
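Qiu Zhuangli, 《现代情报》, 2013, 33(2): 52-55
This paper summarizes the shortcomings of current intellectual property literature databases in China and proposes the design goals, database composition, and conceptual structure of a national intellectual property literature database system. To achieve the retrieval goals of being "fast, comprehensive, and accurate", it is necessary to establish metadata to standardize the database, adopt a scientific classification system to support class-based retrieval, develop an intellectual property domain ontology to expand the vocabulary of user search entry terms, and implement full-text retrieval through a search engine. Finally, an implementation approach for the system is proposed.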

14.
The purpose of this study was to propose a design for a Superintendent of Documents (SuDocs) number search key to retrieve bibliographic records for United States Government documents from OCLC's On-Line Union Catalog. Experimentation with a test file of 25,000 records indicated that a search key derived from a maximum of the first 14 digits in the SuDocs number is sufficiently distinctive to obtain an expected average retrieval of 2.5 records per search. OCLC will implement a SuDocs number search key in the future. It is expected that this key will be a valuable tool for library catalogers and users.

15.
Unknown words such as proper nouns, abbreviations, and acronyms are a major obstacle in text processing. Abbreviations, in particular, are difficult to read/process because they are often domain specific. In this paper, we propose a method for automatic expansion of abbreviations by using context and character information. In previous studies dictionaries were used to search for abbreviation expansion candidates (candidate words for the original form of abbreviations) to expand abbreviations. We use a corpus with few abbreviations from the same field instead of a dictionary. We calculate the adequacy of abbreviation expansion candidates based on the similarity between the context of the target abbreviation and that of its expansion candidate. The similarity is calculated using a vector space model in which each vector element consists of words surrounding the target abbreviation and those of its expansion candidate. Experiments using approximately 10,000 documents in the field of aviation showed that the accuracy of the proposed method is 10% higher than that of previously developed methods.
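
The context comparison described in this abstract can be sketched with a simple vector space model. This is a sketch under stated assumptions, not the authors' method: the window size, the raw-count weighting, the use of cosine similarity, and the names `context_vector` and `cosine` are all illustrative choices that the abstract does not specify, and the character information it mentions is left out.

```python
import math
from collections import Counter


def context_vector(tokens, target, window=2):
    """Count the words that appear within `window` positions of `target`."""
    vector = Counter()
    for i, token in enumerate(tokens):
        if token == target:
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vector[tokens[j]] += 1
    return vector


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

For example, with the made-up sentences `"the acft landed on the runway"` and `"the aircraft landed on the runway"`, the context vectors of `acft` and `aircraft` are identical, so the candidate `aircraft` would rank above a word that occurs in unrelated contexts.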

16.
17.
18.
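As digital research on oracle bone inscriptions deepens, the amount of oracle bone information to be processed keeps growing, and extracting that information has become very difficult. This project studies how to use Lucene, a Java-based full-text search toolkit, to build a full-text retrieval system for oracle bone inscriptions that can perform full-text search and full-text matching on a local hard disk. By building an index over a target folder, the system can, once a keyword is entered, retrieve detailed information about the target files that contain that keyword.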

19.
A fast algorithm is described for comparing the lists of terms representing documents in automatic classification experiments. The speed of the procedure arises from the fact that all of the non-zero-valued coefficients for a given document are identified together, using an inverted file to the terms in the document collection. The complexity and running time of the algorithm are compared with previously described procedures.
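
The idea in this abstract, finding all non-zero coefficients for one document in a single pass over an inverted file, can be sketched as below. The abstract does not name the coefficient, so the Dice coefficient is assumed here for illustration, and the function names and sample data are hypothetical.

```python
from collections import defaultdict


def build_inverted_file(documents):
    """Map each term to the list of documents that contain it."""
    inverted = defaultdict(list)
    for doc_id, terms in documents.items():
        for term in set(terms):
            inverted[term].append(doc_id)
    return inverted


def dice_coefficients(doc_id, documents, inverted):
    """Return Dice coefficients between doc_id and every document sharing a term.

    Only documents reached through the inverted file are touched, so pairs
    with a zero coefficient are never examined.
    """
    own_terms = set(documents[doc_id])
    shared = defaultdict(int)
    for term in own_terms:
        for other in inverted[term]:
            if other != doc_id:
                shared[other] += 1
    return {
        other: 2 * count / (len(own_terms) + len(set(documents[other])))
        for other, count in shared.items()
    }
```

With three sample documents where only `d1` and `d2` share terms, `dice_coefficients("d1", ...)` returns an entry for `d2` alone; `d3` is never compared, which is where the speed-up over all-pairs comparison comes from.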

20.
The automatic text summary concerns the language industries. This work proposes a system automatically and directly transforming a source text into a reduced target text. The system deals exclusively with scientific and technical texts. It is based on the identification of specific expressions allowing an evaluation of the relevance of the sentence concerned, which can then be selected for the elaboration of the summary. The procedure consists in attributing a score to each sentence of the text and then eliminating those having the lowest scores. To produce the RAFI system (automatic summary based on indicative fragments), we resorted to the linguistic means of discourse analysis and the computing capacity of data processing instruments. This system could be adapted to the Internet.
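
The score-and-eliminate procedure described in this abstract can be sketched in a few lines. This is not the RAFI system: the cue expressions, their weights, and the function name `summarize` are hypothetical, and simple substring matching stands in for the discourse analysis the abstract mentions.

```python
def summarize(sentences, cue_weights, keep):
    """Keep the `keep` highest-scoring sentences, in their original order.

    A sentence's score is the sum of the weights of the cue expressions
    (assumed, illustrative markers of relevance) that it contains.
    """
    scored = []
    for position, sentence in enumerate(sentences):
        lowered = sentence.lower()
        score = sum(w for cue, w in cue_weights.items() if cue in lowered)
        scored.append((score, position))
    # Highest score first; earlier sentences win ties.
    best = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:keep]
    kept_positions = sorted(position for _, position in best)
    return [sentences[p] for p in kept_positions]
```

Given the sentences "We propose a new index.", "The weather was fine." and "Results show a 15% gain." with cues `"we propose"` and `"results show"`, asking for two sentences drops the middle one, which matches no cue.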
