Similar Articles
20 similar articles found.
1.
2.
Researchers in indexing and retrieval systems have been advocating the inclusion of more contextual information to improve results. The proliferation of full-text databases and advances in computer storage capacity have made it possible to carry out text analysis by means of linguistic and extra-linguistic knowledge. Since the mid-1980s, research has tended to pay more attention to context, giving discourse analysis a more central role. The research presented in this paper aims to check whether discourse variables have an impact on modern information retrieval and classification algorithms. To evaluate this hypothesis, a functional framework for information analysis in an automated environment is proposed, in which n-gram filtering and the k-means and Chen’s classification algorithms are tested against sub-collections of documents defined by the following discourse variables: “Genre”, “Register”, “Domain terminology”, and “Document structure”. The results obtained with the algorithms for the different sub-collections were compared to the MeSH information structure. They demonstrate that n-gram filtering shows no clear dependence on the discourse variables; that the k-means algorithm depends only on domain terminology and document structure; and that Chen’s algorithm depends clearly on all of the discourse variables. This information could be used to design better classification algorithms in which discourse variables are taken into account. Other minor conclusions drawn from these results are also presented.
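As a hedged illustration of the kind of experiment described above (not the authors' code), a sub-collection could be vectorized with character n-grams and clustered with k-means using scikit-learn; the documents and parameter values below are placeholders:

```python
# Illustrative sketch: vectorize one sub-collection (e.g. a single "Genre")
# with character n-grams, then cluster it with k-means. Documents and
# settings are placeholders, not the paper's data or configuration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

documents = [
    "randomized trial of drug A in hypertension",
    "case report: adverse reaction to drug A",
    "protein folding simulation with lattice models",
    "molecular dynamics of membrane proteins",
]
vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5))
X = vectorizer.fit_transform(documents)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)  # cluster assignment per document in the sub-collection
```

Comparing such cluster assignments across sub-collections split by each discourse variable would reveal whether the algorithm's behavior depends on that variable.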

3.
In this essay, I argue that popular entertainment can be understood in terms of Husserl’s concepts of epochē, reduction and constitution, and, conversely, that epochē, reduction and constitution can be explicated in terms of popular entertainment. To this end I use Husserl’s concepts to explicate and reflect upon the psychological and ethical effects of an exemplary instance of entertainment, the renowned Star Trek episode entitled “The Measure of a Man.” The importance of such an exercise is twofold: (1) to demonstrate, once again, the fecundity of the methodological procedures Husserl bequeathed to us; more than any other philosopher, he tapped into the fundamental manners in which we lose, make and remake the meanings of our lives; and (2) to demonstrate how popular entertainment, similarly, plays a central role in the making and remaking of the meanings of our lives. If my zig-zag procedure between Husserl’s philosophy and popular entertainment is productive and cogent, in addition to elucidating Husserl’s philosophy, it will demonstrate the reality-generating potency and the constitutive power of entertainment in the contemporary world. Entertainment, via ourselves, has become the primary producer of the meanings via which consciousness constitutes the world.

4.
The paper re-examines the twin concepts of knowledge “tacitness” and “codification”, which both the literature on (broadly defined) industrial districts and some recent econometric literature on “localized knowledge spillovers” have possibly mishandled. Even within specialized local clusters of small and medium enterprises (SMEs), knowledge may be highly codified and firm-specific. The case study on Brescia mechanical firms shows that knowledge, rather than flowing freely within the cluster boundaries, circulates within a few smaller “epistemic communities”, each centered around the mechanical engineers of individual machine producers and extending to a select number of suppliers’ and customers’ technicians. Physical distance among the members of each community varies considerably, but even local messages may be highly codified.

5.
The use of domain-specific concepts in biomedical text summarization
Text summarization is a method for data reduction. The use of text summarization enables users to reduce the amount of text that must be read while still assimilating the core information. The data reduction offered by text summarization is particularly useful in the biomedical domain, where physicians must continuously find clinical trial study information to incorporate into their patient treatment efforts, and where such efforts are often hampered by the high volume of publications. This paper presents two independent methods (BioChain and FreqDist) for identifying salient sentences in biomedical texts using concepts derived from domain-specific resources. Our semantic-based method (BioChain) is effective at identifying thematic sentences, while our frequency-distribution method (FreqDist) removes information redundancy. The two methods are then combined to form a hybrid method (ChainFreq). An evaluation of each method is performed using the ROUGE system to compare system-generated summaries against a set of manually generated summaries. The BioChain and FreqDist methods outperform some common summarization systems, while the ChainFreq method improves upon the base approaches. Our work shows that the best performance is achieved when the two methods are combined. The paper also presents a brief physician's evaluation of three randomly selected papers from an evaluation corpus, showing that the author's abstract does not always reflect the entire contents of the full text.
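A minimal sketch in the spirit of the frequency-distribution idea (illustrative, not the authors' implementation): score each sentence by the aggregate frequency of the concepts it contains, here crudely approximated by word tokens rather than concepts from domain-specific resources.

```python
# Hedged sketch of frequency-based extractive scoring: "concepts" are
# stubbed with lowercase word tokens; the real FreqDist method draws
# concepts from domain-specific resources such as UMLS-style vocabularies.
from collections import Counter

def summarize(sentences, k=2):
    freq = Counter(w for s in sentences for w in s.lower().split())
    scored = sorted(sentences,
                    key=lambda s: sum(freq[w] for w in s.lower().split()),
                    reverse=True)
    return scored[:k]  # top-k highest-scoring sentences as the extract

print(summarize([
    "Aspirin reduced stroke risk in the trial.",
    "The trial enrolled 500 patients.",
    "Weather was mild that year.",
]))
```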

6.
One of the best-known measures of information retrieval (IR) performance is the F-score, the harmonic mean of precision and recall. In this article we show that the curve of the F-score as a function of the number of retrieved items always has the same shape: a fast concave increase to a maximum, followed by a slow decrease. In other words, there exists a single maximum, referred to as the tipping point, where the retrieval situation is ‘ideal’ in terms of the F-score. The tipping point thus indicates the optimal number of items to be retrieved, with more or fewer items resulting in a lower F-score. This empirical result is found in IR and link-prediction experiments and can be partially explained theoretically, expanding on earlier results by Egghe. We discuss the implications and argue that, when comparing F-scores, one should compare the F-score curves’ tipping points.
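To make the single-peaked shape concrete (our notation, not the article's): let $R$ be the total number of relevant items and $r(n)$ the number of relevant items among the first $n$ retrieved, so that precision is $r(n)/n$ and recall is $r(n)/R$. The harmonic mean then simplifies to

$$F(n) = \frac{2\,\frac{r(n)}{n}\cdot\frac{r(n)}{R}}{\frac{r(n)}{n}+\frac{r(n)}{R}} = \frac{2\,r(n)}{n+R}.$$

While the early retrieved items are mostly relevant, $r(n)$ grows almost linearly and $F(n)$ rises; once the relevant items are exhausted, $r(n)$ is constant and $F(n)$ decays like $1/(n+R)$, which yields the concave rise and slow fall around a single tipping point.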

7.
Citing statements can be used to aid retrieval, to increase the efficiency of citation indexes, and for the study of information flow and use. These uses are only feasible on a large scale if computers can identify citing statements within the texts of documents with reasonable accuracy.

Computer recognition of multi-sentence citing statements is not easy. Procedures developed for chemistry papers in an earlier experiment were tested on biomedical papers (dealing with various aspects of cancer) and were almost as successful. Specifically, (1) 78% of the words in computer-recognized citing statements were correctly attributable to the corresponding cited papers; and (2) the computer procedures missed 4% of the words in the actual citing statements. When the procedures were modified on the basis of those results and tested on a new sample of cancer papers, the results were comparable: 72% and 3%, respectively.

In an earlier experiment in the use of full-text searching to retrieve answer-passages from cancer papers, recall in the “test phase” averaged about 70%, and the false-retrieval rate was thirteen falsely retrieved sentences per answer-paper retrieved. Unretrieved answer-papers from that experiment's “development phase”, and citing statements referring to them, were studied to develop computer procedures for using citing statements to increase recall. The procedures developed produced only slight recall increases for development-phase answer-papers, and similarly for the test-phase papers on which they were then tested. Specifically, the test-phase results were the following: recall was increased from 70% to 74%, with no increase in false retrieval. This contrasts with an earlier experiment in which 50% recall of chemistry papers by search of index terms and abstract words was increased to 70% by the addition of words from citing statements. The difference may be because the average number of citing papers per unretrieved cancer paper was only six, while that for chemistry papers was thirteen.

8.
Task-based evaluation of text summarization using Relevance Prediction
This article introduces a new task-based evaluation measure called Relevance Prediction that is a more intuitive measure of an individual’s performance on a real-world task than interannotator agreement. Relevance Prediction parallels what a user does in the real-world task of browsing a set of documents using standard search tools, i.e., the user judges relevance based on a short summary, and then that same user—not an independent user—decides whether to open (and judge) the corresponding document. This measure is shown to be a more reliable measure of task performance than LDC Agreement, a current gold-standard-based measure used in the summarization evaluation community. Our goal is to provide a stable framework within which developers of new automatic measures may make stronger statistical statements about the effectiveness of their measures in predicting summary usefulness. We demonstrate—as a proof-of-concept methodology for automatic metric developers—that a current automatic evaluation measure has a better correlation with Relevance Prediction than with LDC Agreement, and that the significance level for detected differences is higher for the former than for the latter.
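A hedged sketch of the measure as described above: score how often the same user's summary-based relevance judgment matches their own subsequent judgment of the full document. The data layout is illustrative, not the paper's evaluation harness.

```python
# Hedged sketch: Relevance Prediction scores a summary by the agreement
# between the SAME user's summary-based judgment and that user's later
# judgment of the full document. Field names are illustrative.
def relevance_prediction(judgments):
    """judgments: list of (summary_says_relevant, document_says_relevant)
    boolean pairs, both produced by the same user."""
    agree = sum(s == d for s, d in judgments)
    return agree / len(judgments)

print(relevance_prediction([(True, True), (True, False), (False, False)]))
```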

9.
Beginning with the initial premise that as the Internet has a global character, the paper will argue that the normative evaluation of digital information on the Internet necessitates an evaluative model that is itself universal and global in character (I agree, therefore, with Gorniak-Kocikowska's claim that because of its global nature “computer ethics has to be regarded as global ethics” (Gorniak-Kocikowska, Science and Engineering Ethics, 1996)). The paper will show that information has a dual normative structure that commits all disseminators of information to both epistemological and ethical norms that are in principle universal and thus global in application. Based on this dual normative characterization of information, the paper will seek to demonstrate: (1) that information, and internet information (interformation) specifically, as a process and product of communication, has an inherent normative structure that commits its producers, disseminators, communicators and users, everyone in fact who deals with information, to certain mandatory epistemological and ethical commitments; and (2) that the negligent or purposeful abuse of information in violation of the epistemological and ethical commitments to which its inherent normative structure gives rise is also a violation of the universal rights to freedom and wellbeing to which all agents are entitled by virtue of being agents, and in particular informational agents.

10.
Bibliometric maps of field of science
The present paper is devoted to two directions in algorithmic classificatory procedures: journal co-citation analysis as an example of citation networks, and lexical analysis of keywords in titles and texts. What is common to these approaches is the general idea of normalizing the deviations of the observed data from the mathematical expectation. The application of the same formula leads to the discovery of statistically significant links between objects (journals in one case, keywords in the other). The results of the journal co-citation analysis are reflected in tables and a map for the fields “Women’s Studies” and “Information Science and Library Science”. An experimental attempt at establishing textual links between words was carried out on two samples from the SSCI database: (1) EDUCATION and (2) ETHICS. The EDUCATION file included 2180 documents (of which 751 had abstracts); the ETHICS file included 807 documents (289 abstracts). Some examples of the results of this pilot study are given in tabular form. The binary links between words discovered in this way may form triplets or other groups with more than two member words.
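The abstract only names "normalization of deviations of the observed data from the mathematical expectation"; one standard reading of that idea is a z-score-like statistic on co-occurrence counts. A minimal sketch under that assumption (the threshold and Poisson-style variance are our choices, not the paper's):

```python
# Hedged sketch: flag a link between two objects (journals or keywords) as
# significant when the observed co-occurrence count exceeds its expectation
# by some number of standard deviations. Illustrative reading only.
import math

def link_strength(observed, expected):
    return (observed - expected) / math.sqrt(expected)

def significant(observed, expected, z_threshold=2.0):
    return link_strength(observed, expected) >= z_threshold

print(significant(observed=14, expected=6.5))  # True: well above expectation
```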

11.
This item introduces and reviews Editorial Philosophy and Practice (《编辑哲思与践行》), the latest book compiled by You Suning, the leading figure of the Chinese Medical Association Publishing House, and surveys the author's academic achievements. The book consists of five chapters: holding fast to the original aspiration and forging ahead; probing the details of the craft and refining it day by day; editorial philosophy and the secrets of running a journal; a century-old institution cultivating virtue and nurturing talent; and a leading light of the academic community and a model for the profession. The book collects 106 representative papers from nearly 30 years of editology research at the Chinese Medical Association Publishing House; it is not only a digest of ideas from research on medical science and technology journal editing in China, but also a faithful record of the development and practice of this flagship of Chinese medical journals.

12.
C. Brotman, Endeavour, 2001, 25(4): 144
In the years after the publication of Darwin's On the Origin of Species, Alfred Russel Wallace became a prominent critic of the argument that evolution provided a sufficient account of human origins. Unbeknownst to many historians of science, Wallace partly based his case on his belief that man's musical sense and aesthetic powers could not have evolved by natural selection. Although he witnessed a variety of musical practices during his travels abroad, Wallace, like many contemporaries in Victorian England, assumed that music uniquely belonged to the ‘civilized’ world he inhabited. In the late 19th century, some evolutionists would challenge this view by reconceiving the nature of music itself.

13.
The high-quality evaluation of generated summaries is needed if we are to improve automatic summarization systems. Although human evaluation provides better results than automatic evaluation methods, it is costly and its results are difficult to reproduce. Therefore, we need an automatic method that simulates human evaluation if we are to improve our summarization systems efficiently. Although automatic evaluation methods have been proposed, they are unreliable when used for individual summaries. To solve this problem, we propose a supervised automatic evaluation method based on a new regression model called the voted regression model (VRM). VRM has two characteristics: (1) model selection based on ‘corrected AIC’ to avoid multicollinearity, and (2) voting by the selected models to alleviate the problem of overfitting. Evaluation results obtained for TSC3 and DUC2004 show that our method achieved error reductions of about 17–51% compared with conventional automatic evaluation methods. Moreover, our method obtained the highest correlation coefficients in several different experiments.
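A hedged sketch of the two ingredients named above, in generic form: fit ordinary least-squares models on feature subsets, select the models with the lowest corrected AIC (AICc), and average ("vote") their predictions. The subset search, voting rule, and AICc use here are illustrative; the paper's VRM details differ.

```python
# Hedged sketch of a voted-regression idea, not the authors' VRM.
import itertools
import numpy as np

def aicc(rss, n, k):
    # Corrected AIC for a Gaussian linear model with k parameters
    # (assumes n > k + 1).
    return n * np.log(rss / n) + 2 * k + 2 * k * (k + 1) / (n - k - 1)

def voted_regression(X, y, top_m=3):
    n, p = X.shape
    candidates = []
    for size in range(1, p + 1):
        for cols in itertools.combinations(range(p), size):
            A = np.column_stack([np.ones(n), X[:, cols]])
            beta = np.linalg.lstsq(A, y, rcond=None)[0]
            rss = float(np.sum((y - A @ beta) ** 2))
            candidates.append((aicc(rss, n, A.shape[1]), cols, beta))
    best = sorted(candidates, key=lambda c: c[0])[:top_m]  # lowest AICc
    def predict(Xnew):
        preds = [np.column_stack([np.ones(len(Xnew)), Xnew[:, cols]]) @ b
                 for _, cols, b in best]
        return np.mean(preds, axis=0)  # the "vote": average selected models
    return predict
```

Restricting each model to a feature subset limits multicollinearity, and averaging several near-best models damps the overfitting that any single selected model would exhibit.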

14.
This research contributes to the intra-organization, inter-organization, and new product development (NPD) management literature by studying the impact of a firm's internal organizational design on the communication within, and performance of, NPD projects conducted with strategic alliance partners. The empirical data were collected from three in-depth case studies of network lead companies (NLCs) operating in different industries. The three NLCs have different internal organizational designs, ranging from very flexible (“organic”) to very rigid (“mechanistic”). In each NLC, a successful new-to-firm product development project was chosen for further detailed investigation. First, we identify the role that the alliance's NPD project characteristics and industry characteristics play in determining the “intensity level” and “media richness” of the communication required between the alliance's NPD project partners. Then, we examine how the internal organizational design influences the actual intensity and media richness of communication in the alliance's NPD project, relative to what we identify as required.

15.
Information retrieval systems consist of many complicated components. Research and development of such systems is often hampered by the difficulty of evaluating how each particular component would behave across multiple systems. We present a novel integrated information retrieval system—the Query, Cluster, Summarize (QCS) system—which is portable and modular and permits experimentation with different instantiations of each of the constituent text analysis components. Most importantly, the combination of the three types of methods in the QCS design improves retrievals by providing users more focused information organized by topic.

We demonstrate the improved performance in a series of experiments using standard test sets from the Document Understanding Conferences (DUC), as measured by ROUGE, the best-known automatic metric for summarization system evaluation. Although the DUC data and evaluations were originally designed to test multidocument summarization, we developed a framework to extend them to the evaluation of each of the three components: query, clustering, and summarization. Under this framework, we then demonstrate that the QCS system (end-to-end) achieves performance as good as or better than the best summarization engines.

Given a query, QCS retrieves relevant documents, separates the retrieved documents into topic clusters, and creates a single summary for each cluster. In the current implementation, Latent Semantic Indexing is used for retrieval, generalized spherical k-means is used for document clustering, and a method coupling sentence “trimming” with a hidden Markov model, followed by a pivoted QR decomposition, is used to create a single extract summary for each cluster. The user interface is designed to provide access to detailed information in a compact and useful format.

Our system demonstrates the feasibility of assembling an effective IR system from existing software libraries, the usefulness of the modularity of the design, and the value of this particular combination of modules.
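A hedged sketch of the query-cluster-summarize architecture assembled from off-the-shelf parts: LSI via truncated SVD for retrieval, spherical k-means approximated by k-means on L2-normalized vectors, and the HMM/pivoted-QR summarizer stubbed out with the first sentence of each cluster. This illustrates the pipeline shape only, not the authors' implementation.

```python
# Hedged sketch of a QCS-like pipeline; parameter values are placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.preprocessing import normalize
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.cluster import KMeans

def qcs(query, docs, n_clusters=2, top_k=4):
    X = TfidfVectorizer(stop_words="english").fit_transform(docs + [query])
    k = min(10, min(X.shape) - 1)               # keep the SVD rank feasible
    lsi = normalize(TruncatedSVD(n_components=k).fit_transform(X))
    sims = cosine_similarity(lsi[-1:], lsi[:-1]).ravel()
    top = sims.argsort()[::-1][:top_k]          # Query: retrieve top docs
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=0).fit_predict(lsi[top])
    clusters = {}
    for i, c in enumerate(labels):              # Cluster: group by topic
        clusters.setdefault(c, []).append(docs[top[i]])
    return {c: members[0].split(".")[0] + "."   # Summarize: naive stub
            for c, members in clusters.items()}
```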

16.
The problem addressed in this article is to use Bertram Brookes' ‘fundamental equation’ as a starting point for a conceptual exercise whose purpose is to set out a method for calculating the information content of an information process. The knowledge-structure variables in Brookes' equation are first operationalized, following principles set out in Claude Shannon's mathematical theory of communication. The set of ‘a priori’ alternatives, and the a priori probabilities assigned to each member of the set by the person undergoing the information process, operationally define the variable ‘K[S]’ from the ‘fundamental equation’, which represents the person's knowledge structure ‘before’ the information process takes place. The set of ‘a posteriori’ alternatives, and the revised probabilities assigned to each member of the set by the person undergoing the information process, operationally define the Brookes variable ‘K[S + ΔS]’, which is the person's knowledge structure ‘after’ the information process takes place. To illustrate how the variables can be determined, an example of an information process is taken from a recent real-life archeological discovery.
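A hedged sketch of the operationalization described above: represent K[S] and K[S + ΔS] as probability distributions over the same set of alternatives, and measure the information content of the process as the divergence between them. Using KL divergence is one standard Shannon-style reading, not necessarily the article's exact formula; the archaeological example below is invented for illustration.

```python
# Hedged sketch: information content of a process as the KL divergence
# between prior (K[S]) and posterior (K[S + dS]) probability assignments.
# Assumes the prior gives nonzero probability to every alternative.
import math

def information_content(prior, posterior):
    # prior, posterior: dicts mapping each alternative to its probability.
    return sum(q * math.log2(q / prior[a])
               for a, q in posterior.items() if q > 0)

# e.g. a find that shifts belief among three competing hypotheses:
prior = {"site A": 1/3, "site B": 1/3, "site C": 1/3}
posterior = {"site A": 0.8, "site B": 0.15, "site C": 0.05}
print(information_content(prior, posterior))  # information gained, in bits
```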

17.
The emergence of Web-based IR systems calls for the need to support ease-of-use as well as user control. This study investigates users’ perceptions of ease-of-use versus user control, and the desired functionalities and interface structure of online IR systems in supporting both. Forty subjects who had an opportunity to learn and use five online databases participated in the study. Multiple methods were employed to collect data. The qualitative and quantitative analyses of the data show that users consider both ease-of-use and user control essential for effective retrieval. The results are discussed within the context of a model of optimal support for ease-of-use and user control, particularly emphasizing the balance between system role and user involvement in achieving various IR sub-tasks.

18.
This paper reports findings from an exploratory study investigating the working notes created during encoding and external storage (EES) processes by human search intermediaries using a Boolean information retrieval (IR) system. EES processes have been an important area of research in educational contexts, where students create and use notes to facilitate learning. In the context of interactive IR, encoding can be conceptualized as the process of creating working notes to help in understanding and translating a user's information problem into a search strategy suitable for use with an IR system. External storage is the process of using working notes to facilitate interaction with IR systems. Analysis of 221 sets of working notes created by human search intermediaries revealed extensive use of EES processes and the creation of working notes containing textual, numerical and graphical entities. Nearly 70% of recorded working notes were textual/numerical entities, nearly 30% were graphical entities, and 0.73% were indiscernible. Segmentation devices were also used in 48% of the working notes. The creation of working notes during EES processes was a fundamental element within the mediated, interactive IR process. Implications for the design of IR interfaces to support users' EES processes, and for further research, are discussed.

19.
Social media systems have encouraged end-user participation in the Internet, for the purpose of storing and distributing Internet content, sharing opinions and maintaining relationships. Collaborative tagging allows users to annotate the resulting user-generated content, and enables effective retrieval of otherwise uncategorised data. However, compared to professional web content production, collaborative tagging systems face the challenge that end-users assign tags in an uncontrolled manner, resulting in unsystematic and inconsistent metadata.

This paper introduces a framework for the personalization of social media systems. We pinpoint three tasks that would benefit from personalization: collaborative tagging, collaborative browsing and collaborative search. We propose a ranking model for each task that integrates the individual user’s tagging history in the recommendation of tags and content, to align its suggestions to the individual user's preferences. We demonstrate on two real data sets that, for all three tasks, the personalized ranking should take into account both the user’s own preference and the opinion of others.
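A hedged sketch of the core ranking idea above: mix the individual user's tagging history with the crowd's aggregate history under an interpolation weight. The linear mixture and parameter `lam` are illustrative simplifications, not the paper's exact model.

```python
# Hedged sketch of personalized tag ranking: interpolate the user's own
# tag distribution with the crowd's. lam = 1 ranks purely by the user's
# history; lam = 0 purely by the opinion of others.
from collections import Counter

def rank_tags(user_history, all_histories, lam=0.5, top_k=5):
    own = Counter(user_history)
    crowd = Counter(t for h in all_histories for t in h)
    def score(tag):
        p_user = own[tag] / max(1, sum(own.values()))
        p_crowd = crowd[tag] / max(1, sum(crowd.values()))
        return lam * p_user + (1 - lam) * p_crowd
    return sorted(crowd, key=score, reverse=True)[:top_k]

print(rank_tags(
    user_history=["jazz", "vinyl", "jazz"],
    all_histories=[["rock", "vinyl"], ["jazz", "rock"], ["pop"]],
))
```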

20.
A hybrid text/citation-based method is used to cluster journals covered by the Web of Science database in the period 2002–2006. The objective is to use this clustering to validate and, if possible, improve existing journal-based subject-classification schemes. Cross-citation links are determined by an item-by-paper procedure for individual papers assigned to the corresponding journal. Text mining for the textual component is based on the same principle; textual characteristics of individual papers are attributed to the journals in which they have been published. In a first step, the 22-field subject-classification scheme of the Essential Science Indicators (ESI) is evaluated and visualised. In a second step, the hybrid clustering method is applied to classify the roughly 8300 journals meeting the selection criteria concerning continuity, size and impact. The hybrid method proves superior to its two components when applied separately. The choice of 22 clusters also allows a direct field-to-cluster comparison, and we substantiate that the science areas resulting from the cluster analysis form a more coherent structure than the “intellectual” reference scheme, the ESI subject scheme. Moreover, the textual component of the hybrid method allows labelling the clusters using cognitive characteristics, while the citation component allows visualising the cross-citation graph and determining representative journals suggested by the PageRank algorithm. Finally, the analysis of journal ‘migration’ allows the improvement of existing classification schemes on the basis of the concordance between fields and clusters.
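A hedged sketch of one common way to realize such a hybrid: combine the text-based and citation-based journal similarity matrices with a mixing weight before clustering. The linear mixture, the weight `alpha`, and average-linkage clustering are our illustrative choices; the paper's actual integration and clustering details differ.

```python
# Hedged sketch: mix two symmetric journal-similarity matrices, convert to
# distances, and cut an agglomerative dendrogram into a fixed cluster count.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def hybrid_clusters(S_text, S_cite, alpha=0.5, n_clusters=22):
    S = alpha * S_text + (1 - alpha) * S_cite  # hybrid similarity matrix
    D = 1.0 - S                                # similarity -> distance
    np.fill_diagonal(D, 0.0)
    Z = linkage(squareform(D, checks=False), method="average")
    return fcluster(Z, t=n_clusters, criterion="maxclust")

# Tiny demo with random symmetric similarities for six "journals":
rng = np.random.default_rng(0)
A, B = rng.random((6, 6)), rng.random((6, 6))
print(hybrid_clusters((A + A.T) / 2, (B + B.T) / 2, n_clusters=3))
```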
