Similar documents: 20 results found (search time: 328 ms)
1.
FACTS is an APL-based interactive on-line system for retrieving budget and accounting data. The system provides selective retrieval and manipulation of financial data for management in a development laboratory. The terms “teilnehmer” and “teilhaber” are defined, and it is argued that using a teilnehmer system such as APL can considerably reduce the programming and monetary investment required for information science applications. A brief discussion of APL's text editing facilities is included to introduce this relatively little-known language to information scientists.

2.
Information retrieval systems consist of many complicated components. Research and development of such systems is often hampered by the difficulty of evaluating how each particular component would behave across multiple systems. We present a novel integrated information retrieval system—the Query, Cluster, Summarize (QCS) system—which is portable, modular, and permits experimentation with different instantiations of each of the constituent text analysis components. Most importantly, the combination of the three types of methods in the QCS design improves retrievals by providing users more focused information organized by topic.

We demonstrate the improved performance in a series of experiments using standard test sets from the Document Understanding Conferences (DUC), as measured by the best-known automatic metric for summarization system evaluation, ROUGE. Although the DUC data and evaluations were originally designed to test multidocument summarization, we developed a framework to extend them to evaluating each of the three components: query, clustering, and summarization. Under this framework, we then demonstrate that the QCS system (end-to-end) achieves performance as good as or better than the best summarization engines.

Given a query, QCS retrieves relevant documents, separates the retrieved documents into topic clusters, and creates a single summary for each cluster. In the current implementation, Latent Semantic Indexing is used for retrieval, generalized spherical k-means is used for document clustering, and a method coupling sentence “trimming” and a hidden Markov model, followed by a pivoted QR decomposition, is used to create a single extract summary for each cluster. The user interface is designed to provide access to detailed information in a compact and useful format.

Our system demonstrates the feasibility of assembling an effective IR system from existing software libraries, the usefulness of the modularity of the design, and the value of this particular combination of modules.
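To make the pipeline concrete, here is a minimal Python sketch of a QCS-style query-cluster-summarize flow using scikit-learn. The LSI dimensionality, cluster count, and the naive "first member" summarizer are illustrative assumptions; the paper's own summarizer couples sentence trimming with an HMM and a pivoted QR decomposition.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import normalize

def qcs_sketch(docs, query, k_topics=3, n_retrieved=20, lsi_dims=100):
    # Represent documents and the query in the same TF-IDF space.
    vec = TfidfVectorizer(stop_words="english")
    X = vec.fit_transform(docs + [query])
    # Retrieval via Latent Semantic Indexing: rank by cosine similarity
    # in a reduced-dimensional space.
    lsi = TruncatedSVD(n_components=min(lsi_dims, X.shape[1] - 1))
    Z = normalize(lsi.fit_transform(X))
    doc_z, query_z = Z[:-1], Z[-1]
    order = np.argsort(doc_z @ query_z)[::-1][:n_retrieved]
    # Clustering: k-means on unit-length vectors approximates
    # spherical k-means (cosine-based clustering).
    labels = KMeans(n_clusters=k_topics, n_init=10).fit_predict(doc_z[order])
    # "Summarization" placeholder: the first retrieved document of each
    # cluster stands in for a real extract summary.
    clusters = {}
    for label, idx in zip(labels, order):
        clusters.setdefault(int(label), []).append(docs[idx])
    return {label: members[0][:200] for label, members in clusters.items()}
```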

3.
The research examines the notion that the principles underlying the procedure doctors use to diagnose a patient's disease are useful in the design of “intelligent” IR systems, because the doctor's task is conceptually similar to the computer (or human) intermediary's task in “intelligent information retrieval”: to draw out, through interaction with the IR system, the user's query/information need. The research is reported in two parts. In Part II, an information retrieval tool is described that is based on “intelligent information retrieval” assumptions about the information user. In Part I, presented here, the theoretical framework for the tool is set out. This framework is borrowed from the diagnostic procedure currently used in medicine, called “differential diagnosis”. Because of the severe consequences that attend misdiagnosis, the operating principle in differential diagnosis is (1) to expand the uncertainty in the diagnostic situation so that all possible hypotheses and evidence are considered, then (2) to contract the uncertainty step by step (from an examination of the patient's symptoms, through the patient's history and a physical examination (signs), to laboratory tests). The IR theories of Taylor, Kuhlthau, and Belkin are used to demonstrate that these medical diagnosis procedures are already present in IR and that differential diagnosis is a viable model with which to design “intelligent” IR tools and systems.

4.
Accurately and quickly finding the needed information within massive unstructured document collections is a central problem in information retrieval research. Full-text retrieval is an important branch of modern information retrieval and a key technique for meeting the retrieval needs of unstructured data. Taking the full-text retrieval requirements of published management specifications for communication operations as the starting point, this work designs and implements an unstructured-document full-text retrieval system suited to national-level meteorological-informatization operations management. The system is built on Java and adopts the Lucene framework; it analyzes the operational specification information and reorganizes the data to ensure good retrieval latency and accuracy. Once deployed, the system can respond quickly to operational changes and locate relevant material rapidly, accurately, and comprehensively within the large existing body of regulations, specifications, standards, and official documents, helping users grasp the development of meteorological informatization.
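The paper's system is built on Java and Lucene; as a rough stand-in, the sketch below shows the same index-then-search pattern with the pure-Python Whoosh library. The schema fields and the sample query are assumptions, and a production system for Chinese text would also need a Chinese-aware analyzer (Whoosh's default tokenizer does not segment Chinese).

```python
import os
from whoosh.fields import ID, TEXT, Schema
from whoosh.index import create_in
from whoosh.qparser import QueryParser

# Illustrative schema: one indexed document per managed specification.
schema = Schema(doc_id=ID(stored=True, unique=True),
                title=TEXT(stored=True),
                body=TEXT)

os.makedirs("specs_index", exist_ok=True)
ix = create_in("specs_index", schema)

writer = ix.writer()
writer.add_document(doc_id="MS-001",
                    title="Network operations specification",
                    body="Procedures for national-level data transmission ...")
writer.commit()

with ix.searcher() as searcher:
    # Parse a boolean full-text query against the body field.
    query = QueryParser("body", ix.schema).parse("data AND transmission")
    for hit in searcher.search(query):
        print(hit["doc_id"], hit["title"])
```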

5.
An indexing technique for text data based on word fragments is described. In contrast to earlier approaches, the fragments are allowed to overlap and are linked in a directed graph structure, reflecting the fact that many fragments (“superstrings”) contain other fragments as substrings. This leads to a redundancy-free set of primary data pointers. By classifying the set of superstrings belonging to a fragment according to the position of the fragment within the superstring, one gains a novel way of supporting exact-match, partial-match, and masked partial-match retrieval through an index. The search strategies for the various retrieval cases are described.
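A toy Python sketch of the idea, assuming a tiny fragment inventory: each fragment is linked to the words ("superstrings") containing it, classified by the fragment's position, which is what enables masked partial-match lookups.

```python
from collections import defaultdict

# Illustrative fragment inventory (a real system derives these from corpus
# statistics; overlapping fragments are allowed).
FRAGMENTS = ["ing", "str", "string", "ring"]

def classify(fragment, word):
    """Return the fragment's positional role in the word, if present."""
    pos = word.find(fragment)
    if pos < 0:
        return None
    if pos == 0:
        return "prefix"
    if pos + len(fragment) == len(word):
        return "suffix"
    return "infix"

def build_index(words):
    # fragment -> positional role -> set of containing superstrings
    index = defaultdict(lambda: defaultdict(set))
    for w in words:
        for f in FRAGMENTS:
            role = classify(f, w)
            if role:
                index[f][role].add(w)
    return index

idx = build_index(["string", "boring", "restring"])
# Masked partial match: words where "str" occurs but not as a prefix.
print(idx["str"]["infix"])   # {'restring'}
```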

6.
Traditional Cranfield test collections represent an abstraction of a retrieval task that Sparck Jones calls the “core competency” of retrieval: a task that is necessary, but not sufficient, for user retrieval tasks. The abstraction facilitates research by controlling for (some) sources of variability, thus increasing the power of experiments that compare system effectiveness while reducing their cost. However, even within the highly abstracted case of the Cranfield paradigm, meta-analysis demonstrates that the user/topic effect is greater than the system effect, so experiments must include a relatively large number of topics to distinguish systems’ effectiveness. The evidence further suggests that changing the abstraction slightly to include just a bit more characterization of the user will result in a dramatic loss of power or increase in the cost of retrieval experiments. Defining a new, feasible abstraction for supporting adaptive IR research will require winnowing the list of all possible factors that can affect retrieval behavior down to a minimum number of essential factors.
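As an illustration of the meta-analytic point, here is a hedged sketch that compares topic and system effects on a synthetic topic-by-system score matrix; with real TREC-style data, the topic variance typically dominates, which is why many topics are needed to separate systems reliably.

```python
import numpy as np

rng = np.random.default_rng(0)
# 50 topics x 8 systems of effectiveness scores (synthetic stand-in data).
scores = rng.beta(2, 5, size=(50, 8))

grand = scores.mean()
topic_effect = scores.mean(axis=1) - grand    # per-topic deviation
system_effect = scores.mean(axis=0) - grand   # per-system deviation

print("topic effect variance: ", topic_effect.var())
print("system effect variance:", system_effect.var())
```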

7.
Information systems are classified into two types, termed “evidence-of-existence” and “presentation” of information. The objective of the evidence-type system lies in the domain of documentation and retrieval of information. The structure of this system type is developed, with application of cybernetic concepts, as an isomorphic model in analogy to the system structure of communication technology. The latter postulates three criteria of structuring: (1) source-channel-sink, with input-output characteristics; (2) a filter-type communication channel; (3) a reversible code. These criteria are applied to the structuring of information systems of the evidence-of-existence type. For the purpose of two-way communication, the information systems have to be represented by closed-loop models. The selective-retrieval requirements necessitate that the system channel be a filter of information. These information filters are implemented by keyword phrases, which are identical with the codewords. They yield a uniquely decodable code that is totally reversible, so as to serve both the documentation and the retrieval of documents. It is proven that hierarchic information systems, which apply categorization or subject headings to objects of information, do not meet the mandatory code requirements. The inherent coding deficiencies of hierarchic systems generate intolerable retrieval ambiguities. The same critique applies to the thesaurus concept. The development of a novel species of thesaurus is suggested: a kind of Linnéan encyclopedia of general human knowledge presenting all relevant interrelations of objects of knowledge. Such a thesaurus would provide the much-needed support for formulating efficient search queries. Other relevant features of communication technology, like the information potential, should likewise be isomorphically transformed into information-system models.

8.
Researchers in indexing and retrieval systems have been advocating the inclusion of more contextual information to improve results. The proliferation of full-text databases and advances in computer storage capacity have made it possible to carry out text analysis by means of linguistic and extra-linguistic knowledge. Since the mid-1980s, research has tended to pay more attention to context, giving discourse analysis a more central role. The research presented in this paper aims to check whether discourse variables have an impact on modern information retrieval and classification algorithms. In order to evaluate this hypothesis, a functional framework for information analysis in an automated environment has been proposed, in which n-gram filtering and the k-means and Chen’s classification algorithms were tested against sub-collections of documents based on the following discourse variables: “genre”, “register”, “domain terminology”, and “document structure”. The results obtained with the algorithms for the different sub-collections were compared to the MeSH information structure. They demonstrate that n-gram filtering does not appear to have a clear dependence on discourse variables; the k-means classification algorithm does, but only on domain terminology and document structure; and Chen’s algorithm has a clear dependence on all of the discourse variables. This information could be used to design better classification algorithms in which discourse variables are taken into account. Other minor conclusions drawn from these results are also presented.
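As a sketch of how such a dependence might be tested, the snippet below clusters a placeholder corpus with k-means over TF-IDF vectors and measures agreement with a known discourse label; it illustrates the experimental logic rather than the paper's exact setup.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import adjusted_rand_score

# Placeholder documents, each tagged with a discourse variable ("genre").
docs = ["case report of a tumor in a young patient",
        "randomized trial protocol for a new therapy",
        "review of imaging methods in oncology",
        "case report of sarcoma with unusual presentation"]
genre = ["case", "trial", "review", "case"]

X = TfidfVectorizer().fit_transform(docs)
labels = KMeans(n_clusters=3, n_init=10).fit_predict(X)

# An adjusted Rand index near 1 would suggest the clustering output
# depends on the genre variable; near 0, that it is independent of it.
print(adjusted_rand_score(genre, labels))
```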

9.
Single-character indexing is one of the main indexing methods for Chinese full-text retrieval, but it has shortcomings in both index space and retrieval efficiency. This paper introduces word-unit indexing and, through analysis of experimental data, shows that with word-unit indexing both the space efficiency of the index and the time efficiency of retrieval improve.
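A toy illustration of the contrast, with an assumed two-document corpus and a hand-segmented word list standing in for a real Chinese segmenter:

```python
from collections import defaultdict

docs = {1: "全文检索系统", 2: "检索单元词索引"}
# Hand-segmented word units (a real system would use a segmenter).
words = {1: ["全文", "检索", "系统"], 2: ["检索", "单元词", "索引"]}

def build(postings_source):
    index = defaultdict(set)
    for doc_id, keys in postings_source.items():
        for k in keys:
            index[k].add(doc_id)
    return index

char_index = build({d: list(t) for d, t in docs.items()})  # one key per character
word_index = build(words)                                  # one key per word unit

# Word-unit indexing yields fewer keys, and a multi-character query is a
# direct lookup instead of an intersection of per-character postings.
print(len(char_index), "single-character keys vs", len(word_index), "word keys")
print(word_index["检索"])   # direct lookup instead of intersecting 检 and 索
```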

10.
向禹, 吴世明. 《现代情报》, 2014, 34(6): 75-78
This paper describes the relevant techniques and their application through the implementation of a dual-layer PDF full-text database, index creation, and full-text search. Building on the full-text database, it studies the combined management of structured information and unstructured data, along with the synchronized indexing of catalog data and full-text data. Based on Lucene, it implements one-stop, intelligent full-text retrieval for an archive management system and improves archive recall.
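A minimal sketch (not the paper's implementation) of keeping the catalog layer and the full-text layer in one posting space, so a single query hits both; the field names and the per-character toy tokenizer are assumptions:

```python
from collections import defaultdict

records = [
    {"id": "A001", "title": "年度工作报告", "fulltext": "本年度档案数字化工作..."},
    {"id": "A002", "title": "设备采购合同", "fulltext": "合同双方就采购事项..."},
]

inverted = defaultdict(set)
for rec in records:
    # Index structured catalog fields and unstructured full text into one
    # posting space, tagging each token with the field it came from.
    for field in ("title", "fulltext"):
        for token in rec[field]:          # toy tokenizer: one character per token
            inverted[(field, token)].add(rec["id"])

def search(token):
    # A hit in either the catalog layer or the full-text layer counts,
    # which is what lifts recall over catalog-only search.
    return inverted[("title", token)] | inverted[("fulltext", token)]

print(search("合"))   # {'A002'}
```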

11.
This article describes the architecture of a generic platform for building distributed locational systems over stand-alone applications. The proposed platform integrates ideas and technology from areas such as distributed and parallel databases, transaction processing systems, and workflow management. The main contribution of this research effort is to propose a “kernel” locational system providing the essentials for distributed processing and to show the important role database technology can play in supporting workflow-management functionality. The platform includes a powerful process management environment based on the principles of the Problem Structuring Methodology (PSM), which was created as a generalization of workflow ideas and incorporates transactional notions such as spheres of isolation, atomicity, and persistence; a transactional engine enforcing these “quality guarantees” based on the nested and multi-level transaction models; and a tool kit providing externalized database functionality that enables physical database design over heterogeneous data repositories.

12.
Wouter Stam. Research Policy, 2009, 38(8): 1288-1299
This study examined how participation in open innovation communities influences the innovative and financial performance of firms commercializing open source software. Using an original dataset of open source companies in the Netherlands, I found that the community participation–performance relationship is curvilinear. In addition, results indicate that extensive technical participation in open source projects is more strongly related to performance for firms that also engage in social (“offline”) community activities, for companies of larger size, and for firms with high R&D intensities. Overall, this research refines our understanding of the boundary conditions under which engagement in community-based innovation yields private returns to commercial actors.

13.
A new method is described for extracting significant phrases from the titles and abstracts of scientific or technical documents. The method is based upon a text structure analysis and uses a relatively small dictionary. The dictionary has been constructed from knowledge about concepts in the field of science or technology and some lexical knowledge, since significant phrases and their component items may be used with different meanings in different fields. A text analysis approach has been applied to select significant phrases as substantial and semantic carriers of the contents of an abstract.

The results of the experiment on five sets of documents show that significant phrases are effectively extracted in all cases, and that the number of phrases per document and the processing time are satisfactory. The information representation of the document, partly using this method, is discussed in relation to the construction of a document information retrieval system.
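A hedged sketch of dictionary-assisted phrase selection: keep word windows from the title/abstract whose members include a domain-dictionary term. The dictionary and windowing rule are assumptions, not the authors' procedure.

```python
import re

# Illustrative domain dictionary (a real one encodes field-specific concepts).
DOMAIN_DICT = {"retrieval", "indexing", "thesaurus", "query"}

def significant_phrases(text, max_len=3):
    """Return 2..max_len word windows anchored by a domain term."""
    tokens = re.findall(r"[a-z]+", text.lower())
    phrases = []
    for i in range(len(tokens)):
        for n in range(2, max_len + 1):
            window = tokens[i:i + n]
            if len(window) == n and DOMAIN_DICT & set(window):
                phrases.append(" ".join(window))
    return phrases

print(significant_phrases("Automatic indexing and query expansion for retrieval"))
```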

14.
Citing statements can be used to aid retrieval, to increase the efficiency of citation indexes, and to study information flow and use. These uses are only feasible on a large scale if computers can identify citing statements within the texts of documents with reasonable accuracy.

Computer recognition of multi-sentence citing statements is not easy. Procedures developed for chemistry papers in an earlier experiment were tested on biomedical papers (dealing with various aspects of cancer) and were almost as successful. Specifically, (1) 78% of the words in computer-recognized citing statements were correctly attributable to the corresponding cited papers; and (2) the computer procedures missed 4% of the words in the actual citing statements. When the procedures were modified on the basis of those results and tested on a new sample of cancer papers, the results were comparable: 72% and 3%, respectively.

In an earlier experiment on using full-text searching to retrieve answer-passages from cancer papers, recall in the “test phase” averaged about 70%, and the false retrieval rate was thirteen falsely retrieved sentences per answer-paper retrieved. Unretrieved answer-papers from that experiment's “development phase”, and citing statements referring to them, were studied to develop computer procedures for using citing statements to increase recall. The procedures developed produced only slight recall increases for development-phase answer-papers, and similarly for the test-phase papers on which they were then tested. Specifically, in the test phase, recall was increased from 70% to 74%, with no increase in false retrieval. This contrasts with an earlier experiment in which 50% recall of chemistry papers by search of index terms and abstract words was increased to 70% by the addition of words from citing statements. The difference may be because the average number of citing papers per unretrieved cancer paper was only six, while that for chemistry papers was thirteen.
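A minimal sketch of the recognition task, under assumed heuristics: a sentence containing a citation marker opens a citing statement, and subsequent sentences beginning with anaphoric cues are attached to it. The marker pattern and cue list are illustrative, not the procedures developed in the papers above.

```python
import re

CITATION = re.compile(r"\[\d+\]|\(\w+,? \d{4}\)")   # e.g. [12] or (Smith, 1979)
CUES = ("they", "their", "this method", "these results", "it")

def citing_statements(sentences):
    """Group sentences into multi-sentence citing statements."""
    statements, current = [], []
    for s in sentences:
        if CITATION.search(s):
            if current:
                statements.append(" ".join(current))
            current = [s]                       # a new citing statement begins
        elif current and s.lower().startswith(CUES):
            current.append(s)                   # anaphoric continuation
        else:
            if current:
                statements.append(" ".join(current))
                current = []
    if current:
        statements.append(" ".join(current))
    return statements
```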

15.
An expert system was developed in the area of information retrieval, with the objective of performing the job of an information specialist who assists users in selecting the right vocabulary terms for a database search.

The system is composed of two components. One is the knowledge base, represented as a semantic network, in which the nodes are words, concepts, and phrases comprising the vocabulary of the application area, and the links express semantic relationships between those nodes. The second component is the rules, or procedures, which operate upon the knowledge base, analogous to the decision rules or work patterns of the information specialist.

The consulting process of the system comprises two major stages. During the “search” stage, relevant knowledge in the semantic network is activated, and search and evaluation rules are applied in order to find appropriate vocabulary terms to represent the user's problem. During the “suggest” stage, those terms are further evaluated, dynamically rank-ordered according to relevancy, and suggested to the user. The system can explain its findings, and backtracking is possible in order to find alternatives in case a suggested term is rejected by the user.

This article presents the principles, procedures, and rules utilized in the expert system.
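A toy sketch of the two-stage consultation: spreading activation over a small semantic network ("search"), then rank-ordering candidate terms by activation ("suggest"). The network, weights, and decay rule are assumptions.

```python
# (node, related node, relation weight) -- an illustrative vocabulary network.
EDGES = [("car", "automobile", 1.0), ("automobile", "vehicles", 0.8),
         ("car", "traffic", 0.5), ("vehicles", "transportation", 0.7)]

graph = {}
for a, b, w in EDGES:
    graph.setdefault(a, []).append((b, w))
    graph.setdefault(b, []).append((a, w))

def suggest(user_words, depth=2):
    """Spread activation from the user's words, then rank candidates."""
    activation, frontier = {}, {w: 1.0 for w in user_words}
    for _ in range(depth):
        nxt = {}
        for node, act in frontier.items():
            for nbr, weight in graph.get(node, []):
                spread = act * weight           # activation decays per link
                if spread > nxt.get(nbr, 0.0):
                    nxt[nbr] = spread
                if spread > activation.get(nbr, 0.0):
                    activation[nbr] = spread
        frontier = nxt
    # Suggest vocabulary terms rank-ordered by activation (relevancy).
    ranked = [(t, a) for t, a in activation.items() if t not in user_words]
    return sorted(ranked, key=lambda kv: -kv[1])

print(suggest(["car"]))   # [('automobile', 1.0), ('vehicles', 0.8), ('traffic', 0.5)]
```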

16.
Due to the large repository of documents available on the web, users are usually inundated by a large volume of information, much of which is irrelevant. Since user perspectives vary, a client-side text filtering system that learns the user's perspective can reduce the problem of irrelevant retrieval. In this paper, we present the design of a customized text information filtering system that learns user preferences and modifies the initial query to fetch better documents. It uses a rough-fuzzy reasoning scheme: the rough-set-based reasoning handles natural language nuances, such as synonyms, elegantly, while the fuzzy decider assigns qualitative grades to the documents for the user's perusal. We provide the detailed design of the various modules and some results related to the performance analysis of the system.
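A minimal sketch of the fuzzy grading step, assuming triangular membership functions over a normalized profile-match score; the breakpoints and grade labels are illustrative, not the paper's.

```python
def triangular(x, a, b, c):
    """Triangular fuzzy membership on [a, c], peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fuzzy_grade(score):
    """Map a 0..1 match score to the qualitative grade with highest membership."""
    grades = {
        "irrelevant": triangular(score, -0.01, 0.0, 0.4),
        "possibly relevant": triangular(score, 0.2, 0.5, 0.8),
        "relevant": triangular(score, 0.6, 1.0, 1.01),
    }
    return max(grades, key=grades.get)

print(fuzzy_grade(0.72))   # near the grade boundary; resolves to 'relevant'
```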

17.
Drawing on research results from the text retrieval field, this paper uses annotation text to describe the semantics of individual 3D models, adopts a semantic tree to express the semantics among models, computes the semantic similarity between query keywords and semantic-tree nodes using WordNet, and returns the models with the strongest semantic relevance. A flexible return strategy is proposed that selects representative models from each semantically related node, making it easier for users to further refine the retrieval results. Experimental results show that semantic-tree-based 3D model retrieval improves retrieval efficiency and has considerable theoretical and practical value.
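A hedged sketch of the WordNet similarity step using NLTK: score how close a query keyword is to a semantic-tree node label. Wu-Palmer similarity is one common choice, not necessarily the measure used in the paper.

```python
from nltk.corpus import wordnet as wn   # requires: nltk.download('wordnet')

def keyword_node_similarity(keyword, node_label):
    """Best Wu-Palmer similarity over all synset pairs of the two terms."""
    best = 0.0
    for s1 in wn.synsets(keyword):
        for s2 in wn.synsets(node_label):
            sim = s1.wup_similarity(s2) or 0.0   # None for incomparable synsets
            best = max(best, sim)
    return best

print(keyword_node_similarity("car", "vehicle"))   # high similarity expected
print(keyword_node_similarity("car", "tree"))      # low similarity expected
```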

18.
The structure of a retrospective information retrieval system that uses equifrequent word or text fragments is described, and its advantages over word-oriented systems are mentioned briefly. Word fragments are proposed as retrieval elements, and the changes required to process a query are discussed. Some necessary modifications in the treatment of logical operators are described, and two conditions are postulated as necessary for the successful operation of the system. Aspects of query processing are illustrated by experimental results obtained from single- and two-term queries applied to a portion of the MARC tapes.
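A toy sketch of fragment-based query processing under an assumed fragment inventory: the query word is decomposed into the indexed fragments it contains, whose posting lists are AND-combined; the candidates must then be screened for false drops.

```python
# Illustrative fragment index: fragment -> posting list of document ids.
FRAGMENTS = {"ret": {1, 2}, "rie": {1, 3}, "val": {1, 3, 4}}

def fragment_query(word):
    """Candidate documents for a word query over a fragment index."""
    hits = None
    for frag, postings in FRAGMENTS.items():
        if frag in word:
            # AND-combine the postings of every fragment the word contains.
            hits = postings if hits is None else hits & postings
    return hits or set()

# Candidates for "retrieval"; each must still be checked against the full
# word, since fragment co-occurrence alone can yield false drops.
print(fragment_query("retrieval"))   # {1}
```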

19.
王昊, 陈雅. 《情报科学》, 2005, 23(6): 955-960
This paper examines in detail the reverse Polish (postfix) conversion of retrieval expressions encountered when implementing full-text retrieval with inverted-file indexing. Two algorithms for the conversion, one based on a stack and one on a binary tree, are proposed and analyzed; the paper then describes how to program both algorithms in the FoxPro environment, and finally compares the two algorithms.
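The stack-based conversion is essentially the classic shunting-yard algorithm. Here is a minimal Python sketch for boolean retrieval expressions (the paper's implementations were written for the FoxPro environment):

```python
PRECEDENCE = {"NOT": 3, "AND": 2, "OR": 1}

def to_postfix(tokens):
    """Convert an infix boolean retrieval expression to reverse Polish form."""
    output, stack = [], []
    for tok in tokens:
        if tok == "(":
            stack.append(tok)
        elif tok == ")":
            while stack and stack[-1] != "(":
                output.append(stack.pop())
            stack.pop()                      # discard the matching "("
        elif tok in PRECEDENCE:
            # Pop operators of equal or higher precedence before pushing.
            while stack and stack[-1] != "(" and \
                  PRECEDENCE.get(stack[-1], 0) >= PRECEDENCE[tok]:
                output.append(stack.pop())
            stack.append(tok)
        else:                                # a search term
            output.append(tok)
    while stack:
        output.append(stack.pop())
    return output

print(to_postfix(["information", "AND", "(", "retrieval", "OR", "索引", ")"]))
# ['information', 'retrieval', '索引', 'OR', 'AND']
```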

20.
Noetica is a tool for structuring knowledge about concepts and the relationships between them. It differs from typical information systems in that the knowledge it represents is abstract, highly connected, and includes meta-knowledge (knowledge about knowledge). Noetica represents knowledge using a strongly-typed semantic network. By providing a rich type system, it is possible to represent conceptual information using formalised structures. A class hierarchy provides a basic classification for all objects. This allows for a consistency of representation not often found in “free” semantic networks and makes it possible to extend a knowledge model easily while retaining its semantics.

We also provide visualisation and query tools for this data model. Visualisation can be used to explore complete sets of link-classes, show paths while navigating through the database, or visualise the results of queries. Noetica supports goal-directed queries (a series of user-supplied goals that the system attempts to satisfy in sequence) and path-finding queries (where the system finds relationships between objects in the database by following links).
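A toy sketch of a path-finding query over a typed semantic network: breadth-first search returns the chain of typed links connecting two objects. The example triples are assumptions, not Noetica's data model.

```python
from collections import deque

# (subject, link_class, object) triples of a strongly-typed network.
TRIPLES = [("whale", "is-a", "mammal"), ("mammal", "is-a", "animal"),
           ("whale", "lives-in", "ocean"), ("ocean", "part-of", "biosphere")]

adjacency = {}
for s, link, o in TRIPLES:
    adjacency.setdefault(s, []).append((link, o))

def find_path(start, goal):
    """BFS for the shortest chain of typed links from start to goal."""
    queue, seen = deque([(start, [])]), {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for link, nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, link, nxt)]))
    return None

print(find_path("whale", "biosphere"))
# [('whale', 'lives-in', 'ocean'), ('ocean', 'part-of', 'biosphere')]
```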
