20 similar documents found.
1.
R J Schultheisz. Journal of the American Society for Information Science, 1981, 32(6): 421-429
The National Library of Medicine has offered TOXLINE, an online interactive bibliographic database of biomedical (toxicology) information, since 1972. Files from 11 secondary sources make up the TOXLINE database. The sources supplied bibliographic records in different formats and data structures, so data from each supplier's format had to be converted into a format suitable for TOXLINE. Three different retrieval systems were used in succession for the TOXLINE database, and each change required the data to be reformatted. Algorithms for generating terms for inverted-file search methods were tested, and special characters peculiar to the scientific literature were evaluated during search term generation. Developing search term algorithms for chemical names in the scientific literature required techniques different from those used for nonscientific literature. Problems with the replication of bibliographic records from multiple secondary sources are described. Some observations about online interactive databases since TOXLINE was first offered are also noted.
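The abstract does not give TOXLINE's term-generation rules. The following is therefore only a minimal sketch of the problem it describes, and the tokenizing rule is a hypothetical one. A general-purpose indexer that splits on every non-alphanumeric character breaks a chemical name such as "2,4-dichlorophenol" into fragments. The hypothetical `chemical_terms` function instead keeps commas between digits (locants) and internal hyphens inside one term, and it also indexes the hyphen-separated parts.

```python
import re

def naive_terms(title):
    """General-purpose rule: split on every non-alphanumeric character."""
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", title) if t]

# Hypothetical chemistry-aware rule: letters and digits, plus commas that sit
# between two digits (locants such as "2,4") and internal hyphens.
CHEM_TOKEN = re.compile(r"[A-Za-z0-9]+(?:(?:(?<=\d),(?=\d)|-)[A-Za-z0-9]+)*")

def chemical_terms(title):
    """Generate inverted-file terms that keep chemical names intact."""
    terms = []
    for match in CHEM_TOKEN.finditer(title):
        token = match.group().lower()
        terms.append(token)
        if "-" in token:
            # Also index the hyphen-separated parts so "dichlorophenol" alone still matches.
            terms.extend(part for part in token.split("-") if part)
    return terms

print(naive_terms("Toxicity of 2,4-dichlorophenol in rats"))
# ['toxicity', 'of', '2', '4', 'dichlorophenol', 'in', 'rats']
print(chemical_terms("Toxicity of 2,4-dichlorophenol in rats"))
# ['toxicity', 'of', '2,4-dichlorophenol', '2,4', 'dichlorophenol', 'in', 'rats']
```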
2.
Michael J. Nelson. Information Processing & Management, 1997, 33(6): 739-744
A prefix trie index (originally called trie hashing) is applied to the problem of providing fast search times, fast load times, and fast updates in a bibliographic or full-text retrieval system. For all but the largest dictionaries, a single-key search in the dictionary under trie hashing takes exactly one disk read. Front compression of search keys is used to enhance performance. Partially combining the postings into the dictionary is analyzed as a way to obtain both faster retrieval and better update properties for the trie hashing inverted file. Statistics are given for a test database consisting of an online catalog at the Graduate School of Library and Information Science Library of the University of Western Ontario. The effect of changing various prefix trie parameters is tested in this application.
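Front compression (front coding) takes advantage of the fact that neighboring keys in a sorted dictionary often share long prefixes. The minimal sketch below uses the generic scheme: each key is stored as a pair made of the length of the prefix it shares with the previous key and the remaining suffix. It does not reproduce the trie hashing structure itself or the paper's on-disk layout, and the function names are illustrative.

```python
def front_encode(sorted_keys):
    """Store each key as (length of prefix shared with previous key, suffix)."""
    encoded, previous = [], ""
    for key in sorted_keys:
        shared, limit = 0, min(len(previous), len(key))
        while shared < limit and previous[shared] == key[shared]:
            shared += 1
        encoded.append((shared, key[shared:]))
        previous = key
    return encoded

def front_decode(encoded):
    """Rebuild the full keys by reusing the prefix of the previous key."""
    keys, previous = [], ""
    for shared, suffix in encoded:
        key = previous[:shared] + suffix
        keys.append(key)
        previous = key
    return keys

keys = ["catalog", "catalogue", "category", "cattle"]
print(front_encode(keys))
# [(0, 'catalog'), (7, 'ue'), (3, 'egory'), (3, 'tle')]
```

Decoding must run sequentially from the start of a block. Practical implementations therefore usually restart front coding periodically, which trades a little space for random access.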
3.
D B McCarn. Journal of the American Society for Information Science, 1980, 31(3): 181-192
MEDLINE is presented as a prototype for online bibliographic search systems. The creation of the database, the indexing language, and the file organization are reviewed. For file access, the search logic is illustrated with a sample MEDLINE search. NLM's development of a document delivery system to complement its bibliographic retrieval system is discussed.
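Boolean search logic of the kind used in MEDLINE-style searching can be sketched as set operations over an inverted file. This is a minimal illustration and not ELHILL's command language. The nested-tuple query format, the `evaluate` function, and the toy postings are all hypothetical.

```python
# Toy inverted file (hypothetical): term -> set of document numbers.
INDEX = {
    "asthma": {1, 2, 5},
    "children": {2, 3, 5},
    "adult": {1, 5},
}

def evaluate(query, index):
    """Evaluate a Boolean query given as a term or a nested (op, left, right) tuple.

    "AND" intersects, "OR" unites, and "NOT" keeps documents in left but not in right.
    """
    if isinstance(query, str):
        return set(index.get(query, set()))
    op, left, right = query
    a, b = evaluate(left, index), evaluate(right, index)
    if op == "AND":
        return a & b
    if op == "OR":
        return a | b
    if op == "NOT":
        return a - b
    raise ValueError(f"unknown operator: {op}")

print(evaluate(("NOT", ("AND", "asthma", "children"), "adult"), INDEX))  # {2}
```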
4.
The database to be used with an online bibliographic information system must meet a number of requirements which are often not satisfied by conventional database management systems. Most important of these is the requirement for full authority file control over the indexes to the database. This paper reviews the special requirements of a bibliographic database and shows how they are met in the database system of DOBIS-LIBIS (Dortmund Library System-Leuven Library System).
5.
S. V. Nageswara Rao, S. Sitharama Iyengar, C. E. Veni Madhavan. Information Processing & Management, 1985, 21(5): 433-442
A variety of data structures, such as the inverted file, multi-lists, quad tree, k-d tree, range tree, polygon tree, quintary tree, multidimensional tries, segment tree, doubly chained tree, grid file, d-fold tree, super B-tree, and Multiple Attribute Tree (MAT), have been studied for multidimensional searching and related problems. Physical database organization, an important application of multidimensional searching, has traditionally been handled mostly with the inverted file. This study proposes the MAT data structure for bibliographic file systems and demonstrates its superiority over the inverted file. The two methods are compared in terms of preprocessing, storage, and query costs. A worst-case complexity analysis of both methods for a partial match query is carried out for two cases: (a) when the directory resides in main memory, and (b) when the directory resides in secondary memory. In both cases, the MAT data structure is shown to be more efficient than the inverted file method. Arguments are also given for the superiority of the MAT data structure in the average case. An efficient adaptation of the MAT data structure is proposed for bibliographic file systems. It exploits the special features of both the MAT structure and bibliographic files, and it includes suitable techniques for fixing and ranking the attributes of the MAT data structure. Conclusions and proposals for future research are presented.
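A partial match query specifies values for some attributes and leaves the others unspecified. The sketch below assumes a simplified MAT in which each tree level corresponds to one attribute in a fixed order, and nested Python dicts stand in for the sorted node lists. It does not reproduce the paper's directory placement or its attribute fixing and ranking techniques, and the records are invented for illustration. At a specified level the search follows at most one branch. At an unspecified level it must visit every branch.

```python
def build_mat(records, attributes):
    """Nested dicts, one level per attribute in the given (fixed) order;
    the last level maps a value to the list of matching records."""
    root = {}
    for record in records:
        node = root
        for attr in attributes[:-1]:
            node = node.setdefault(record[attr], {})
        node.setdefault(record[attributes[-1]], []).append(record)
    return root

def partial_match(node, query, depth=0):
    """query[i] is the required value of attribute i, or None if unspecified."""
    if depth == len(query):
        return list(node)  # reached a leaf: the records stored there
    value = query[depth]
    if value is None:
        branches = list(node.values())  # unspecified: explore every branch
    else:
        branches = [node[value]] if value in node else []  # specified: at most one
    results = []
    for child in branches:
        results.extend(partial_match(child, query, depth + 1))
    return results

ATTRIBUTES = ["author", "journal", "year"]
RECORDS = [
    {"author": "Rao", "journal": "IPM", "year": 1985},
    {"author": "Rao", "journal": "JASIS", "year": 1983},
    {"author": "Nelson", "journal": "IPM", "year": 1997},
]
mat = build_mat(RECORDS, ATTRIBUTES)
print(partial_match(mat, (None, "IPM", None)))  # the 1985 and 1997 records
```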
6.
M D Cooper. Journal of the American Society for Information Science, 1983, 34(6): 374-380
This article examines the response time characteristics of the National Library of Medicine's (NLM) ELHILL bibliographic search system. Transactions for a five-week period are analyzed, and average response times are calculated for typical search commands, by time of day, and by file searched. The overall response time of the system was found to be 2.1 seconds, a very low value. Based on statistical significance tests applied to the data, the study concluded that differences in response time can be explained by the number of users on the system, not by the command issued or the file searched.
7.
Information Processing & Management, 2003, 39(1): 117-131
The inverted file is the most popular indexing mechanism for document search in an information retrieval system. Compressing an inverted file can greatly improve document search speed. Traditionally, inverted file compression uses the d-gap technique, which replaces document identifiers with gap values that are usually much smaller. However, some well-known prefix-free codes cannot compress fluctuating gap values efficiently. To smooth and reduce the gap values, we propose a document-identifier reassignment algorithm based on a similarity factor between documents. We generate a reassignment order for all documents according to their similarity, so that documents with closer relationships receive closer identifiers. Simulation results show that the average gap values of sample inverted files can be reduced by 30%, and the compression rate of d-gapped inverted files with prefix-free codes can be improved by 15%.
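The sketch below illustrates the d-gap idea and the effect of reassigning identifiers, under stated assumptions. Elias gamma stands in for "well-known prefix-free codes". Jaccard similarity over term sets stands in for the paper's similarity factor. A greedy nearest-neighbor walk stands in for its reassignment order. The six toy documents are hypothetical, and the bit counts they produce say nothing about the 30% and 15% figures reported in the abstract.

```python
def d_gaps(doc_ids):
    """Turn a sorted postings list into gaps: the first id, then differences."""
    gaps, previous = [], 0
    for doc_id in doc_ids:
        gaps.append(doc_id - previous)
        previous = doc_id
    return gaps

def gamma_bits(n):
    """Length in bits of the Elias gamma code for a positive integer n."""
    return 2 * n.bit_length() - 1

def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def greedy_reassign(docs):
    """Map old ids to new ids (1-based) by a greedy nearest-neighbor walk.

    Assumption: start at the smallest id, then repeatedly move to the most
    similar unvisited document (ties go to the smallest id).
    """
    remaining = set(docs)
    current = min(remaining)
    order = [current]
    remaining.remove(current)
    while remaining:
        anchor = docs[current]
        current = max(sorted(remaining), key=lambda d: jaccard(anchor, docs[d]))
        order.append(current)
        remaining.remove(current)
    return {old: new for new, old in enumerate(order, start=1)}

def inverted_file(docs, id_map):
    """Build term -> sorted list of (re)assigned document ids."""
    index = {}
    for old_id, terms in docs.items():
        for term in terms:
            index.setdefault(term, []).append(id_map[old_id])
    return {term: sorted(ids) for term, ids in index.items()}

def total_gamma_bits(index):
    return sum(gamma_bits(g) for ids in index.values() for g in d_gaps(ids))

# Hypothetical toy collection: document id -> set of index terms.
DOCS = {
    1: {"trie", "hashing"},
    2: {"medline", "search"},
    3: {"trie", "index"},
    4: {"medline", "toxline"},
    5: {"trie", "hashing", "index"},
    6: {"search", "toxline"},
}
original = total_gamma_bits(inverted_file(DOCS, {d: d for d in DOCS}))
reassigned = total_gamma_bits(inverted_file(DOCS, greedy_reassign(DOCS)))
print(original, reassigned)  # 41 29
```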
8.
T. G. Burket. Information Processing & Management, 1979, 15(6): 281-289
Online information retrieval systems continue to reach wider audiences. The authors discuss a particular text retrieval system and the techniques it uses, based on the vocabulary file concept, to guide the unsophisticated end user both in searching for information and in understanding it. Methods for easily constructing and maintaining a suitable data base organization are also described.
9.
Conventional approaches to information retrieval search through all applicable entries in a collection's inverted file to find the documents with the highest scores. For very large collections, this can be extremely time-consuming.
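For reference, the conventional exhaustive approach the abstract describes can be sketched as term-at-a-time scoring. Every posting of every query term updates a per-document accumulator, and the top k accumulators are selected at the end. The cost grows with the total length of the postings lists, which is why this approach becomes slow on large collections. The excerpt does not describe the paper's own faster method, so it is not shown. The index layout of (document id, weight) pairs and the weights themselves are assumptions.

```python
import heapq
from collections import defaultdict

def top_k(query_terms, index, k):
    """Exhaustive term-at-a-time scoring: visit every posting of every query term.

    index maps term -> list of (doc_id, weight) pairs (an assumed layout).
    """
    accumulators = defaultdict(float)
    for term in query_terms:
        for doc_id, weight in index.get(term, []):
            accumulators[doc_id] += weight
    # Select the k highest-scoring documents from all accumulators.
    return heapq.nlargest(k, accumulators.items(), key=lambda item: item[1])

WEIGHTED_INDEX = {
    "inverted": [(1, 0.5), (2, 0.2), (3, 0.9)],
    "file": [(1, 0.4), (3, 0.1), (4, 0.7)],
}
print([doc_id for doc_id, _ in top_k(["inverted", "file"], WEIGHTED_INDEX, 2)])  # [3, 1]
```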
10.
This paper outlines an online acquisition system and briefly describes the various factors involved in the automated environment. The acquisition system database is based on a network model. It is structured to handle the complex nature of bibliographic data efficiently and economically, with the aim of achieving an integrated library system. The comprehensiveness of the system is already reflected at the database design stage. The paper highlights system design and development factors concerning hierarchical menu-driven operation, as well as portability considerations with respect to hardware and system software requirements.
12.
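A full-text-search OPAC system, Bookle, integrated with ILAS III, was developed using Lucene's full-text indexing and search technology. The paper describes Bookle's architecture and the design and implementation of its parameter manager, indexer, searcher, and user interface. Bookle automatically harvests extended bibliographic information and stores it locally, and it adds bibliographic access points. It offers readers services such as any-word full-text search over bibliographic records and their extended bibliographic information, and it makes up for the shortcomings of the ILAS III OPAC.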
13.
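This paper summarizes the shortcomings of existing intellectual property literature databases in China. It then proposes the design goals, database composition, and conceptual structure of a national intellectual property literature database system. The system's retrieval goal is to be "fast, comprehensive, and accurate". To meet that goal, four measures are needed:

- metadata should be established to standardize the database;
- a scientific classification system should be adopted to support class-based (generic) retrieval;
- an ontology of the intellectual property domain should be developed to expand the entry vocabulary available to users;
- full-text retrieval should be provided through a search engine.

Finally, an approach to implementing the system is proposed.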
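The abstract does not describe the ontology's structure. The following is a minimal sketch of ontology-based expansion of user entry vocabulary. It assumes a hypothetical mapping from synonyms to concepts and from concepts to narrower entry terms, and the terms are invented for illustration.

```python
# Hypothetical ontology fragment: synonym -> preferred concept,
# and concept -> narrower entry terms.
SYNONYMS = {"brand": "trademark"}
NARROWER = {
    "patent": {"invention patent", "utility model", "design patent"},
    "trademark": {"registered mark", "service mark"},
}

def expand_query(terms):
    """Expand each user term with its preferred concept and narrower terms."""
    expanded = []
    for term in terms:
        concept = SYNONYMS.get(term, term)
        for candidate in [term, concept, *sorted(NARROWER.get(concept, ()))]:
            if candidate not in expanded:  # keep first occurrence, preserve order
                expanded.append(candidate)
    return expanded

print(expand_query(["brand"]))
# ['brand', 'trademark', 'registered mark', 'service mark']
```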
14.
The purpose of this study was to propose a design for a Superintendent of Documents (SuDocs) number search key to retrieve bibliographic records for United States Government documents from OCLC's On-Line Union Catalog. Experimentation with a test file of 25,000 records indicated that a search key derived from a maximum of the first 14 digits in the SuDocs number is sufficiently distinctive to obtain an expected average retrieval of 2.5 records per search. OCLC will implement a SuDocs number search key in the future. It is expected that this key will be a valuable tool for library catalogers and users.
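The following is a minimal sketch of deriving such a key. It assumes that "digits" means the characters of the SuDocs number and that spaces and punctuation are dropped before truncating. Both are assumptions, since the abstract does not state OCLC's normalization rules. The hypothetical `expected_retrieval` function estimates the average number of records returned per search. It assumes that each record is equally likely to be sought and that a search returns every record sharing its key.

```python
import re
from collections import Counter

MAX_KEY_LENGTH = 14  # "a maximum of the first 14 digits", per the abstract

def sudocs_key(sudocs_number):
    """Assumed normalization: uppercase, drop spaces and punctuation, truncate."""
    normalized = re.sub(r"[^A-Z0-9]", "", sudocs_number.upper())
    return normalized[:MAX_KEY_LENGTH]

def expected_retrieval(sudocs_numbers):
    """Average records returned per search, assuming every record is equally
    likely to be sought and a search returns all records sharing its key."""
    sizes = Counter(sudocs_key(n) for n in sudocs_numbers)
    return sum(size * size for size in sizes.values()) / len(sudocs_numbers)

# Hypothetical SuDocs-style numbers, for illustration only.
SAMPLE = ["C 3.134:2010", "C 3.134:2011",
          "Y 4.AG 8/1:S.HRG.110-123", "Y 4.AG 8/1:S.HRG.110-124"]
print(sudocs_key(SAMPLE[2]))        # Y4AG81SHRG1101
print(expected_retrieval(SAMPLE))   # 1.5
```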
15.
Information Processing & Management, 2004, 40(1): 31-45
Unknown words such as proper nouns, abbreviations, and acronyms are a major obstacle in text processing. Abbreviations in particular are difficult to read and process because they are often domain-specific. In this paper, we propose a method for the automatic expansion of abbreviations using context and character information. Previous studies used dictionaries to look up abbreviation expansion candidates (candidate words for the original form of an abbreviation). Instead of a dictionary, we use a corpus from the same field that contains few abbreviations. We calculate the adequacy of each expansion candidate from the similarity between the context of the target abbreviation and the context of the candidate. The similarity is calculated with a vector space model whose elements are the words surrounding the target abbreviation and the words surrounding its expansion candidate. Experiments on approximately 10,000 documents in the field of aviation showed that the proposed method is 10% more accurate than previously developed methods.
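The context-similarity step can be sketched with bag-of-words context vectors and cosine similarity. The sketch assumes a fixed window of two words on each side, raw counts as weights, and single-word candidates, and its function names are hypothetical. It does not model the character information the paper also uses, and the sentences are invented.

```python
import math
from collections import Counter

def context_vector(tokens, target, window=2):
    """Count words within `window` positions of each occurrence of `target`."""
    vector = Counter()
    for i, token in enumerate(tokens):
        if token == target:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            vector.update(t for t in tokens[lo:hi] if t != target)
    return vector

def cosine(a, b):
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rank_expansions(text_tokens, abbreviation, corpus_tokens, candidates, window=2):
    """Rank candidate expansions by context similarity to the abbreviation."""
    target = context_vector(text_tokens, abbreviation, window)
    scores = {c: cosine(target, context_vector(corpus_tokens, c, window)) for c in candidates}
    return sorted(candidates, key=scores.get, reverse=True)

text = "climb to alt of 5000 feet".split()
corpus = ("the aircraft will climb to altitude of 5000 feet "
          "then divert to alternate airport due to weather").split()
print(rank_expansions(text, "alt", corpus, ["alternate", "altitude"]))
# ['altitude', 'alternate']
```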
19.
Peter Willett. Information Processing & Management, 1981, 17(2): 53-60
A fast algorithm is described for comparing the lists of terms representing documents in automatic classification experiments. The speed of the procedure arises from the fact that all of the non-zero-valued coefficients for a given document are identified together, using an inverted file to the terms in the document collection. The complexity and running time of the algorithm are compared with previously described procedures.
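The key idea is to find all non-zero coefficients for one document in a single pass over the postings of its terms. The sketch below uses the Dice coefficient, which is an assumption chosen for illustration because the abstract does not name the coefficient. Documents that share no terms with the given document are never touched, and that is where the savings come from.

```python
from collections import defaultdict

def build_inverted_file(docs):
    """term -> list of ids of documents containing it."""
    index = defaultdict(list)
    for doc_id, terms in docs.items():
        for term in terms:
            index[term].append(doc_id)
    return index

def similarities(doc_id, docs, index):
    """All non-zero Dice coefficients for doc_id, found together in one pass
    over the postings of its terms."""
    shared = defaultdict(int)  # other doc -> number of terms in common
    for term in docs[doc_id]:
        for other in index[term]:
            if other != doc_id:
                shared[other] += 1
    size = len(docs[doc_id])
    return {other: 2 * common / (size + len(docs[other])) for other, common in shared.items()}

# Hypothetical documents represented as sets of terms.
DOCS = {
    1: {"inverted", "file", "search"},
    2: {"inverted", "file"},
    3: {"trie"},
    4: {"search", "cluster"},
}
print(sorted(similarities(1, DOCS, build_inverted_file(DOCS)).items()))
# [(2, 0.8), (4, 0.4)]
```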
20.
Information Processing & Management, 1999, 35(2): 181-191
Automatic text summarization is of interest to the language industries. This work proposes a system that automatically and directly transforms a source text into a reduced target text. The system deals exclusively with scientific and technical texts. It identifies specific expressions that allow the relevance of the sentence containing them to be evaluated, so that the sentence can then be selected for the summary. The procedure assigns a score to each sentence of the text and then eliminates the sentences with the lowest scores. To produce the RAFI system (automatic summary based on indicative fragments), we drew on the linguistic methods of discourse analysis and the computing capacity of data processing tools. The system could be adapted to the Internet.
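The following is a minimal sketch of scoring sentences by indicator expressions and dropping the lowest-scoring ones. The indicator list, the weights, the naive sentence splitter, and the `keep_ratio` parameter are hypothetical and are not RAFI's actual resources.

```python
import re

# Hypothetical indicator expressions and weights (not RAFI's actual list).
INDICATORS = {"in this paper": 3, "we propose": 3, "results show": 2, "in conclusion": 2}

def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def score(sentence):
    lowered = sentence.lower()
    return sum(weight for expr, weight in INDICATORS.items() if expr in lowered)

def summarize(text, keep_ratio=0.5):
    """Score every sentence, drop the lowest-scoring ones, keep original order."""
    sentences = split_sentences(text)
    keep = max(1, round(len(sentences) * keep_ratio))
    # sorted() is stable, so ties keep the earlier sentence first.
    ranked = sorted(range(len(sentences)), key=lambda i: -score(sentences[i]))
    return " ".join(sentences[i] for i in sorted(ranked[:keep]))

TEXT = ("Aircraft maintenance is costly. In this paper we propose a scheduling method. "
        "The weather was mild. Results show reduced downtime.")
print(summarize(TEXT))
# In this paper we propose a scheduling method. Results show reduced downtime.
```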