Similar documents
Found 20 similar documents (search time: 484 ms)
1.
Information retrieval systems consist of many complicated components. Research and development of such systems is often hampered by the difficulty in evaluating how each particular component would behave across multiple systems. We present a novel integrated information retrieval system, the Query, Cluster, Summarize (QCS) system, which is portable, modular, and permits experimentation with different instantiations of each of the constituent text analysis components. Most importantly, the combination of the three types of methods in the QCS design improves retrievals by providing users more focused information organized by topic. We demonstrate the improved performance by a series of experiments using standard test sets from the Document Understanding Conferences (DUC) as measured by the best known automatic metric for summarization system evaluation, ROUGE. Although the DUC data and evaluations were originally designed to test multidocument summarization, we developed a framework to extend it to the task of evaluation for each of the three components: query, clustering, and summarization. Under this framework, we then demonstrate that the QCS system (end-to-end) achieves performance as good as or better than the best summarization engines. Given a query, QCS retrieves relevant documents, separates the retrieved documents into topic clusters, and creates a single summary for each cluster. In the current implementation, Latent Semantic Indexing is used for retrieval, generalized spherical k-means is used for the document clustering, and a method coupling sentence “trimming” and a hidden Markov model, followed by a pivoted QR decomposition, is used to create a single extract summary for each cluster.
The user interface is designed to provide access to detailed information in a compact and useful format. Our system demonstrates the feasibility of assembling an effective IR system from existing software libraries, the usefulness of the modularity of the design, and the value of this particular combination of modules.
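To make the clustering step of abstract 1 more concrete, the following is a minimal sketch of plain spherical k-means, not the generalized variant that the abstract names and not the QCS implementation. The function names, the toy vectors, and the choice to initialize centroids from the first k points are all assumptions made for illustration.

```python
import math


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def dot(u, v):
    """Dot product; for unit vectors this equals cosine similarity."""
    return sum(a * b for a, b in zip(u, v))


def spherical_kmeans(vectors, k, iterations=20):
    """Cluster vectors by cosine similarity and return one label per vector."""
    points = [normalize(v) for v in vectors]
    # Assumption: deterministic initialization from the first k points.
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: pick the centroid with the highest cosine similarity.
        labels = [max(range(k), key=lambda j: dot(p, centroids[j])) for p in points]
        # Update step: each centroid becomes the normalized sum of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = normalize([sum(col) for col in zip(*members)])
    return labels


# Hypothetical term-weight vectors for four documents on two topics.
docs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
labels = spherical_kmeans(docs, k=2)
```

Working on unit vectors means the assignment step needs only a dot product, which is one reason this family of methods suits sparse document vectors.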

2.
FACTS is an APL-based interactive on-line system used for retrieval of budget and accounting data. The system provides selective retrieval and manipulation of financial data for management in a development laboratory. The terms “teilnehmer” and “teilhaber” are defined and it is argued that use of a teilnehmer system, such as APL, can considerably reduce the programming and monetary investment for information science systems applications. A brief discussion of APL's text editing facilities is also included to introduce this relatively unknown language to information scientists.

3.
The research examines the notion that the principles underlying the procedure used by doctors to diagnose a patient's disease are useful in the design of “intelligent” IR systems because the task of the doctor is conceptually similar to the computer (or human) intermediary's task in “intelligent information retrieval”: to draw out, through interaction with the IR system, the user's query/information need. The research is reported in two parts. In Part II, an information retrieval tool is described which is based on “intelligent information retrieval” assumptions about the information user. In Part I, presented here, the theoretical framework for the tool is set out. This framework is borrowed from the diagnostic procedure currently used in medicine, called “differential diagnosis”. Because of the severe consequences that attend misdiagnosis, the operating principle in differential diagnosis is (1) to expand the uncertainty in the diagnosis situation so that all possible hypotheses and evidence are considered, then (2) to contract the uncertainty in a step by step fashion (from an examination of the patient's symptoms, through the patient's history and a physical (signs), to laboratory tests). The IR theories of Taylor, Kuhlthau and Belkin are used to demonstrate that these medical diagnosis procedures are already present in IR and that it is a viable model with which to design “intelligent” IR tools and systems.

4.
This research contributes to the intra-organization, inter-organization, and new product development (NPD) management literature by studying the impact of a firm's internal organizational design on the communication within and performance of NPD projects conducted with strategic alliance partners. The empirical data were collected from three in-depth case studies of network lead companies (NLCs) operating in different industries. The three NLCs have different internal organizational designs, ranging from very flexible “organic” to very rigid “mechanistic.” In each NLC, a successful new-to-firm product development project was chosen for further detailed investigation. First, we identify the role the alliance's NPD project characteristics and industry characteristics play in determining the “intensity level” and “media richness” of communication required between the alliance's NPD project partners. Then, we examine how the internal organizational design influences the actual intensity and media richness of communication of the alliance's NPD project that matches our assumptions of what is required.

5.
ERLI was asked by the French TELECOM to develop a specific system to query the professional headings of the French Yellow Pages directory. Approximately 4 million end users now have access (via their “Minitel” terminals) to some 6 million professionals registered under 2500 different headings. (A second application has also been developed using a similar system: the Minitel Applications Directory, which gives information on all the available applications in the Minitel network.) Although the retrieval of a heading is a necessary step in accessing data, it is of no real interest to the user, who wishes only to retrieve the phone number of a given professional or tradesperson. The general aims of the Natural Language System (NLS) are to facilitate access to headings by intelligent query processing (or even to bypass completely the necessity of choosing between headings). This is done through (1) the association of a specific knowledge base with the list of headings and (2) the construction of a “grammar” ensuring a consistent interpretation of the queries. ERLI's system is an alternative to the existing one, which is based on a key-word indexing technique. The weaknesses and insufficiencies of such a technique are well known, especially in this context, where queries are expressed by unqualified users, who are unfamiliar with the data (i.e., the headings of the directory). Finally, it is important to note that the NLS was developed with regard to industrial considerations (in particular, the minimizing of the average processing time per query). The system is not a prototype. Extensive on-site testing is scheduled to begin in July 1988 and a complete installation will be carried out at the end of the year.

6.
Information-systems are classified into two types, termed “Evidence-of-Existence” and “Presentation” of information. The objective of the evidence-type system lies in the domain of documentation and retrieval of information. The structure of this system-type is developed, with application of cybernetic concepts, as an isomorphic model in analogy to the system-structure of communication technology. The latter postulates three criteria of structuring: (1) Source-Channel-Sink, with input-output characteristics, (2) Filter-type communication-channel, (3) Reversible code. These criteria are applied to the structuring of information-systems of the evidence-of-existence type. For the purpose of two-way communication the information-systems have to be represented by closed-loop models. The selective-retrieval requirements necessitate the system-channel to be a filter of information. These information-filters are implemented by keyword-phrases, being identical with the codewords. They yield a uniquely decodable code which is totally reversible to adequately serve both the documentation and the retrieval of documents. It is proven that hierarchic information-systems, applying categorization or subject-heading objects of information, do not meet the mandatory code-requirements. The inherent coding-deficiencies of hierarchic systems generate intolerable retrieval ambiguities. The same critique applies to the thesaurus concept. The development of a novel species of thesaurus is suggested, realizing a kind of Linnéan encyclopedia of general human knowledge, presenting all relevant interrelations of objects of knowledge. Such a thesaurus would provide the much needed support for formulating efficient search queries. Other relevant features of communication technology, like the information-potential, should be isomorphically transformed into information-system models.

7.
Citing statements can be used to aid retrieval, to increase the efficiency of citation indexes and for the study of information flow and use. These uses are only feasible on a large scale if computers can identify citing statements within the texts of documents with reasonable accuracy. Computer recognition of multi-sentence citing statements is not easy. Procedures developed for chemistry papers in an earlier experiment were tested on biomedical papers (dealing with various aspects of cancer) and were almost as successful. Specifically, (1) 78% of the words in computer-recognized citing statements were correctly attributable to the corresponding cited papers; and (2) the computer procedures missed 4% of the words in the actual citing statements. When the procedures were modified on the basis of those results and tested on a new sample of cancer papers the results were comparable: 72% and 3%, respectively. In an earlier experiment in use of full-text searching to retrieve answer-passages from cancer papers, recall in the “test phase” averaged about 70% and the false retrieval rate was thirteen falsely retrieved sentences per answer-paper retrieved. Unretrieved answer-papers in that experiment's “development phase”, and citing statements referring to them, were studied to develop computer procedures for using citing statements to increase recall. The procedures developed only produced slight recall increases for development phase answer-papers, and similarly for the test phase papers on which they were then tested. Specifically, the test phase results were the following: recall was increased from 70 to 74%, and there was no increase in false retrieval. This contrasts with an earlier experiment in which 50% recall of chemistry papers by search of index terms and abstract words was increased to 70% by the addition of words from citing statements.
The difference may be because the average number of citing papers per unretrieved cancer paper was only six while that for chemistry papers was thirteen.

8.
Many operational IR indexes are non-normalized, i.e. no lemmatization or stemming techniques, etc. have been employed in indexing. This poses a challenge for dictionary-based cross-language retrieval (CLIR), because translations are mostly lemmas. In this study, we face the challenge of dictionary-based CLIR in a non-normalized index. We test two optional approaches: FCG (Frequent Case Generation) and s-gramming. The idea of FCG is to automatically generate the most frequent inflected forms for a given lemma. FCG has been tested in monolingual retrieval and has been shown to be a good method for inflected retrieval, especially for highly inflected languages. S-gramming is an approximate string matching technique (an extension of n-gramming). The language pairs in our tests were English–Finnish, English–Swedish, Swedish–Finnish and Finnish–Swedish. Both our approaches performed quite well, but the results varied depending on the language pair. S-gramming and FCG performed about equally in all language pairs except Finnish–Swedish, where s-gramming outperformed FCG.

9.
In this essay, I argue that popular entertainment can be understood in terms of Husserl’s concepts of epochē, reduction and constitution, and, conversely, that epochē, reduction and constitution can be explicated in terms of popular entertainment. To this end I use Husserl’s concepts to explicate and reflect upon the psychological and ethical effects of an exemplary instance of entertainment, the renowned Star Trek episode entitled “The Measure of a Man.” The importance of such an exercise is twofold: (1) to demonstrate, once again, the fecundity of the methodological procedures Husserl bequeathed to us; more than any other philosopher, he tapped into the fundamental manners in which we lose, make and remake the meanings of our lives; and (2) to demonstrate how popular entertainment, similarly, plays a central role in the making and remaking of the meanings of our lives. If my zig-zag procedure between Husserl’s philosophy and popular entertainment is productive and cogent, in addition to elucidating Husserl’s philosophy, it will demonstrate the reality-generating potency and the constitutive power of entertainment in the contemporary world. Entertainment, via ourselves, has become the primary producer of the meanings via which consciousness constitutes the world.

10.
n-grams have been used widely and successfully for approximate string matching in many areas. s-grams have been introduced recently as an n-gram based matching technique, where di-grams are formed of both adjacent and non-adjacent characters. s-grams have proved successful in approximate string matching across language boundaries in Information Retrieval (IR). s-grams however lack precise definitions. Also their similarity comparison lacks precise definition. In this paper, we give precise definitions for both. Our definitions are developed in a bottom-up manner, only assuming character strings and elementary mathematical concepts. Extending established practices, we provide novel definitions of s-gram profiles and the L1 distance metric for them. This is a stronger string proximity measure than the popular Jaccard similarity measure because Jaccard is insensitive to the counts of each n-gram in the strings to be compared. However, due to the popularity of Jaccard in IR experiments, we define the reduction of s-gram profiles to binary profiles in order to precisely define the (extended) Jaccard similarity function for s-grams. We also show that n-gram similarity/distance computations are special cases of our generalized definitions.
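The contrast that abstract 10 draws between count-sensitive L1 distance and count-insensitive Jaccard similarity can be illustrated with a small sketch. This is not the paper's formal definition: the function names are hypothetical, no string padding is applied, and pooling the grams from several skip distances into one profile is a simplifying assumption.

```python
from collections import Counter


def sgram_profile(text, skips=(0,)):
    """Count di-grams whose two characters are separated by k skipped characters.

    skips=(0,) yields ordinary adjacent di-grams; skips=(0, 1) also pairs
    characters that have one character between them.
    """
    profile = Counter()
    for k in skips:
        step = k + 1
        for i in range(len(text) - step):
            profile[text[i] + text[i + step]] += 1
    return profile


def l1_distance(p, q):
    """Sum of absolute count differences over every gram in either profile."""
    return sum(abs(p[g] - q[g]) for g in p.keys() | q.keys())


def jaccard(p, q):
    """Jaccard similarity on binary profiles (gram present or absent)."""
    a, b = set(p), set(q)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# "aaa" contains the gram "aa" twice and "aa" contains it once.
p, q = sgram_profile("aaa"), sgram_profile("aa")
distance = l1_distance(p, q)   # the counts differ
similarity = jaccard(p, q)     # the binary profiles are identical
```

In this toy case the L1 distance is nonzero while the Jaccard similarity reports a perfect match, which is the insensitivity to counts that the abstract describes.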

11.
The essay examines the basic issues confronting information science education, issues that must be resolved if information science itself is to evolve in an orderly fashion. The essay is organized in three parts. The first part considered definitions and, in a historical context, the emergence, evolution and current state of information science and its education. This second part considers the problems and unresolved questions that deal with external aspects (“externalities”) of information science education: (i) academic affiliation, (ii) degree levels, (iii) admission requirements, (iv) jurisdiction and (v) financing. The third part will deal with problems and unresolved questions in respect to internal aspects (“internalities”) of education: (i) objectives, (ii) content, (iii) teachers and (iv) teaching. It is suggested that information science cannot prosper or even survive if concentrated action is not undertaken in the “externalities” and “internalities” of its education. A majority of the specific situations discussed pertain to North America; however, general aspects are valid for information science education everywhere. Recommendations about areas that need action are made.

12.
The article treats the problem of “rationality” in learning processes in research policies. The underlying hypothesis is that there are contemporary efforts in research policy-making, which, against views in organisational sociology like “bounded rationality” or “garbage-can”, endeavour to “rationalise” the process of decision-making in research policies. This hypothesis is worked out by taking one example, the setting-up of the “National Centres of Competence in Research” (NCCR) in Switzerland, and analysing the processes that have contributed to the acceptance of this funding measure. Our finding is that Switzerland has introduced some “rationalising devices” but that these devices are still insufficiently institutionalised and can be further elaborated. In addition, it is made clear that goal-oriented problem-solving and interests are closely intertwined and cannot be dissociated from one another. This may have distorting effects on the rationality of the learning process. It is, nevertheless, a necessary condition in order to learn at all.

13.
An expert system was developed in the area of information retrieval, with the objective of performing the job of an information specialist, who assists users in selecting the right vocabulary terms for a database search. The system is composed of two components: One is the knowledge base, represented as a semantic network, in which the nodes are words, concepts, phrases, comprising a vocabulary of the application area and the links express semantic relationships between those nodes. The second component is the rules, or procedures, which operate upon the knowledge-base, analogous to the decision rules or work patterns of the information specialist. Two major stages comprise the consulting process of the system: During the “search” stage relevant knowledge in the semantic network is activated, and search and evaluation rules are applied in order to find appropriate vocabulary terms to represent the user's problem. During the “suggest” stage those terms are further evaluated, dynamically rank-ordered according to relevancy, and suggested to the user. Explanations of the findings can be provided by the system and backtracking is possible in order to find alternatives in case some suggested term is rejected by the user. This article presents the principles, procedures and rules which are utilized in the expert system.

14.
Abbreviations adversely affect information retrieval and text comprehensibility. We describe a software tool to decipher abbreviations by finding their whole-word equivalents or “disabbreviations”. It uses a large English dictionary and a rule-based system to guess the most-likely candidates, with users having final approval. The rule-based system uses a variety of knowledge to limit its search, including phonetics, known methods of constructing multiword abbreviations, and analogies to previous abbreviations. The tool is especially helpful for retrieval from computer programs, a form of technical text in which abbreviations are notoriously common; disabbreviation of programs can make programs more reusable, improving software engineering. It also helps decipher the often-specialized abbreviations in technical captions. Experimental results confirm that the prototype tool is easy to use, finds many correct disabbreviations, and improves text comprehensibility.

15.
The paper re-examines the twin concepts of knowledge “tacitness” and “codification”, which both the literature on (broadly defined) industrial districts, and some recent econometric literature on “localized knowledge spillovers” have possibly mis-handled. Even within specialized local small and medium enterprises (SMEs) clusters, knowledge may be highly codified and firm-specific. The case study on Brescia mechanical firms shows that knowledge, rather than flowing freely within the cluster boundaries, circulates within a few smaller “epistemic communities”, each centered around the mechanical engineers of individual machine producers, and spanning to a selected number of suppliers’ and customers’ technicians. Physical distance among members of each community varies a lot, but even local messages may be highly codified.

16.
Egghe’s three papers regarding the universal IR surface (2004, 2007, 2008) clearly represent an original and significant contribution to the IR evaluation literature. However, Egghe’s attempt to find a complete set of universal IR evaluation points (P,R,F,M) fell short of his goal: his universal IR surface equation did not suffice in and of itself, and his continuous extension argument was insufficient to find all the remaining points (quadruples). Egghe found only two extra universal IR evaluation points, (1,1,0,0) and (0,0,1,1), but it turns out that a total of 15 additional, valid, universal IR evaluation points exist. The gap first appeared in Egghe’s earliest paper and was carried into subsequent papers. The mathematical method used here for finding the additional universal IR evaluation points involves defining the relevance metrics P,R,F,M in terms of the Swets variables a,b,c,d. Then the maximum possible number of additional quadruples is deduced, and finally, all the invalid quadruples are eliminated so that only the valid, universal IR points remain. Six of these points may be interpreted as being continuous extensions of the universal IR surface, while the other nine points may be interpreted as being “off the universal IR surface.” This completely solves the problem of finding the maximum range possible of universal IR evaluation points.
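As a rough illustration of how the quadruple (P,R,F,M) in abstract 16 can be computed from the Swets variables, the sketch below assumes the common contingency-table convention: a = retrieved and relevant, b = retrieved and not relevant, c = not retrieved and relevant, d = not retrieved and not relevant. That convention, the function names, and the use of `None` for an undefined ratio are assumptions, not details taken from the paper.

```python
from fractions import Fraction


def ratio(numerator, denominator):
    """Exact ratio, or None when the denominator is zero (measure undefined)."""
    return Fraction(numerator, denominator) if denominator else None


def prfm(a, b, c, d):
    """Return (precision, recall, fallout, miss) from contingency counts."""
    precision = ratio(a, a + b)  # share of retrieved documents that are relevant
    recall = ratio(a, a + c)     # share of relevant documents that are retrieved
    fallout = ratio(b, b + d)    # share of non-relevant documents that are retrieved
    miss = ratio(c, c + d)       # share of non-retrieved documents that are relevant
    return precision, recall, fallout, miss


# Perfect retrieval: everything retrieved is relevant and nothing relevant is left out.
perfect = prfm(a=3, b=0, c=0, d=5)
# Completely wrong retrieval: only non-relevant documents are retrieved.
inverted = prfm(a=0, b=4, c=2, d=0)
```

Under these assumptions the two toy cases land on the two extra points that the abstract attributes to Egghe, (1,1,0,0) and (0,0,1,1). Counts that zero out a denominator give undefined entries, which is where the abstract's discussion of additional points begins.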

17.
Bibliometric maps of field of science   Total citations: 2; self-citations: 1; citations by others: 2
The present paper is devoted to two directions in algorithmic classificatory procedures: the journal co-citation analysis as an example of citation networks and lexical analysis of keywords in the titles and texts. What is common to those approaches is the general idea of normalization of deviations of the observed data from the mathematical expectation. The application of the same formula leads to discovery of statistically significant links between objects (journals in one case, keywords in the other). The results of the journal co-citation analysis are reflected in tables and a map for the field “Women’s Studies” and for the field “Information Science and Library Science”. An experimental attempt at establishing textual links between words was carried out on two samples from the SSCI database: (1) EDUCATION and (2) ETHICS. The EDUCATION file included 2180 documents (of which 751 had abstracts); the ETHICS file included 807 documents (289 abstracts). Some examples of the results of this pilot study are given in tabular form. The binary links between words discovered in this way may form triplets or other groups with more than two member words.

18.
This article deals with the “teething problems” of the profession of researchers in Italy. This group of professionals has in fact grown very rapidly over the last two decades, but they still seem to be striving to find their proper place in the organizations they work in and recognition from society in general. At the same time they have acquired sufficient self-awareness as an emerging group and started claiming their own “rights”. The article examines some problems researchers encounter in their specific working settings (industry, university, public research agencies), such as mobility, status, and career prospects. Mobility within the Italian science and technology (S&T) system is very low and is essentially one way, from industry, the professions, and public research agencies to university. This peculiarity is very closely linked to the high prestige enjoyed by university professors in this country. Moreover, recent laws concerning university teaching staff have de facto saturated the permanent staff, severely restricting both mobility to university and the intake of new blood. An indication of the self-awareness of the profession may be found in the mobilisation of researchers in public research agencies and their claim to their “ecological niche”. It is concluded that researchers, who may be considered a substantially homogeneous group, feel mature and numerous enough to demand the status and prestige adequate to their contribution to a modern society; it is also asserted that the delay in granting recognition is due to the organizations and institutions.

19.
An imperfect document selection system is represented as the analogy of a system in which symbols are selected and transmitted through a noisy channel. Provided that transmission reception uncertainties and not meaning are considered, it is suggested that one of Shannon's equations is applicable, and a single figure measure of system efficiency, Ht, is proposed. Values obtained using this new yardstick are compared with Recall/Precision values obtained for a typical system. Further research is required to test whether system “improvements” resulting in higher values of Ht are perceived as such by users.

20.
In this paper, we present several insights regarding the influence of institutional design on the process of Research Joint Venture (RJV) formation. Our results are obtained with a firm-level dataset on RJVs formed under the umbrella of the Eureka initiative and of the European Union’s Framework Programmes (EU-FPs) for science and technology. We focus on firms that are known to have a high probability of forming RJVs, with the latter identified as firms with a past experience in collaborative research. The results indicate that EU-FP RJVs are consistent with a “top-down” and “mission oriented” research policy. By contrast, Eureka RJVs appear as more market driven and “bottom-up”.
