Similar literature
 Found 20 similar documents; search took 31 ms
1.
王德鹏 《科教文汇》2012,(32):67-67,73
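Multi-word expressions play an increasingly important role in natural language processing, and research on them matters for machine learning and machine translation. Chinese has a special class of multi-word expressions, idioms, which carry specific meanings in text, so idiom extraction occupies an important place in Chinese language processing. This paper studies idiom extraction by combining a collocation co-occurrence correlation model with a direct extraction model.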

2.
Summarizing court decisions   Total citations: 2, self-citations: 0, citations by others: 2
In the field of law there is an absolute need for summarizing the texts of court decisions in order to make the content of the cases easily accessible to legal professionals. During the SALOMON and MOSAIC projects we investigated the summarization and retrieval of legal cases. This article presents some of the main findings while integrating the research results of experiments on legal document summarization by other research groups. In addition, we propose novel avenues of research for automatic text summarization, which we currently exploit when summarizing court decisions in the ACILA project. Techniques for automated concept learning and argument recognition are the most challenging here.

3.
4.
In the internet era, crime-related information is scattered across many sources, namely news media, social networks, blogs, and video repositories. Crime reports published in online newspapers are often considered more reliable than crowdsourced data such as social media, and they contain crime information not only as unstructured text but also as images. Given the volume and availability of crime-related information in online newspapers, gathering and integrating crime entities from multiple modalities and representing them as a machine-readable knowledge base would help law enforcement agencies analyze and prevent criminal activities. Existing research on generating crime knowledge bases does not address the extraction of all non-redundant entities from the text and image data present in multiple newspapers. Hence, this work proposes Crime Base, an entity-relationship-based system that extracts and integrates crime-related text and image data from online newspapers, with a focus on reducing duplication and loss of information in the knowledge base. The proposed system uses a rule-based approach to extract entities from text and image captions. Entities extracted from text are correlated using contextual as well as semantic similarity measures, and image entities are correlated using low-level and high-level image features. The proposed system also presents an integrated view of these entities and their relations in the form of a knowledge base using OWL. The system is tested on a collection of crime-related articles from popular Indian online newspapers.

5.
Natural Language Processing (NLP) techniques have been successfully used to automatically extract information from unstructured text through a detailed analysis of its content, often to satisfy particular information needs. In this paper, an automatic concept map construction technique, Fuzzy Association Concept Mapping (FACM), is proposed for the conversion of abstracted short texts into concept maps. The approach consists of a linguistic module and a recommendation module. The linguistic module is a text mining method that does not require the user to have any prior knowledge of NLP techniques. It incorporates rule-based reasoning (RBR) and case-based reasoning (CBR) for anaphora resolution. It aims to extract the propositions in a text so as to construct a concept map automatically. The recommendation module is built on fuzzy set theory. It is an interactive process that suggests propositions for further human refinement of the automatically generated concept maps. The suggested propositions are relationships among the concepts that are not explicitly found in the paragraphs. This technique helps to stimulate individual reflection and generate new knowledge. Evaluation was carried out using the Science Citation Index (SCI) abstract database and CNET News as test data, both well-known sources whose text quality is assured. Experimental results show that the automatically generated concept maps conform to the outputs generated manually by domain experts, since the degree of difference between them is proportionally small. The method gives users the ability to convert scientific and short texts into a structured format that can be easily processed by computer. Moreover, it gives knowledge workers extra time to rethink their written text and to view their knowledge from another angle.

6.
A survey of automatic abstracting   Total citations: 2, self-citations: 0, citations by others: 2
刘挺  吴岩  王开铸 《情报科学》1998,16(1):63-69
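This paper reviews the history of automatic abstracting technology and introduces three main abstracting methods: methods based on analysis of the physical information of the text, methods based on natural language understanding, and methods based on analysis of text structure. It also points out the current problems of automatic abstracting, including redundant content and a lack of coherence in the language of the abstract, and discusses future directions for the technology.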

7.
李冰  张璐 《科教文汇》2011,(32):139-139,169
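Putting words together into sentences and applying what one has learned is the goal of grammar teaching. Yet grammar teaching receives less and less attention in English instruction, so many university students speak and write ungrammatically. The purpose of learning grammar is therefore to master a tool for learning the language; only with this tool can students master the language and learn to use it through "doing" and "speaking".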

8.
罗显良 《科教文汇》2013,(35):121-121,123
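English grammar is a systematic body of knowledge. Junior middle school students, who are at an early stage of cognitive development, generally dislike being taught complex grammar rules when learning English grammar and prefer to learn and use grammar in lively classroom activities. This study proposes factors that should be considered when choosing English grammar teaching methods, together with teaching methods that promote grammar learning among junior middle school students. It aims to help junior middle school English teachers better understand the factors that affect grammar teaching and reflect on methods suited to their own students. It thereby provides a basis for choosing grammar teaching methods and carrying out grammar teaching activities, and has research value and significance.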

9.
Arabic is a widely spoken language but few mining tools have been developed to process Arabic text. This paper examines the crime domain in the Arabic language (unstructured text) using text mining techniques. The development and application of a Crime Profiling System (CPS) is presented. The system is able to extract meaningful information, in this case the type of crime, location and nationality, from Arabic language crime news reports. The system has two unique attributes; firstly, information extraction that depends on local grammar, and secondly, dictionaries that can be automatically generated. It is shown that the CPS improves the quality of the data through reduction where only meaningful information is retained. Moreover, the Self Organising Map (SOM) approach is adopted in order to perform the clustering of the crime reports, based on crime type. This clustering technique is improved because only refined data containing meaningful keywords extracted through the information extraction process are inputted into it, i.e. the data are cleansed by removing noise. The proposed system is validated through experiments using a corpus collated from different sources; it was not used during system development. Precision, recall and F-measure are used to evaluate the performance of the proposed information extraction approach. Also, comparisons are conducted with other systems. In order to evaluate the clustering performance, three parameters are used: data size, loading time and quantization error.
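
The evaluation measures named in this abstract can be illustrated with a minimal Python sketch. It assumes the balanced F-measure (F1) and treats each extracted item as a (field, value) pair; the function name and the sample values are hypothetical and do not come from the paper.

```python
def precision_recall_f1(predicted, gold):
    """Compute precision, recall and F1 for two collections of extracted items."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Guard against empty sets so the ratios are always defined.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical (field, value) pairs for a single crime report.
gold = {("type", "theft"), ("location", "Riyadh"), ("nationality", "unknown")}
predicted = {("type", "theft"), ("location", "Jeddah")}
p, r, f = precision_recall_f1(predicted, gold)
```

Here one of the two predicted pairs is correct and one of the three gold pairs is found, so precision and recall differ and F1 falls between them.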

10.
甘玲 《科教文汇》2012,(1):122-123
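The experiential teaching model is "learning by doing", a cycle of experience, reflection, theory, application, and experience again. Listening is a receptive skill that requires listeners to reconcile all their linguistic knowledge and experience with what they hear in order to understand the text. The teacher introduces a dialogue stage in pre-listening training and, during listening, uses top-down and bottom-up procedures, that is, teaching from two sides: grammar, vocabulary, and structure on the one hand, and context and background on the other. Finally, the salience principle is applied so that students grasp the gist of the text from the information they already have.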

11.
Compact graphic display of phrases from the original text is among abstracting assistance features being prototyped in the TEXNET text network management system. Compaction is achieved by embedding subphrases and by enabling the user to select rapidly word by word. Phrases displayed would not necessarily be those selected for automatic indexing.

12.
With the popularity of online educational platforms, English learners can learn and practice wherever they are and whatever they are doing. English grammar is one of the important components of learning English. Learning English grammar effectively requires students to practice questions that target specific grammar knowledge. In this paper, we study a novel problem of retrieving English grammar questions with similar grammatical focus. Since grammatical focus similarity differs from textual similarity and sentence syntactic similarity, existing approaches cannot be applied directly to our problem. To address this problem, we propose a syntax-based approach for English grammar question retrieval that can effectively retrieve related grammar questions with similar grammatical focus. In this approach, we first propose a new syntactic tree, namely the parse-key tree, to capture the grammatical focus of English grammar questions. Next, we propose two kernel functions, namely the relaxed tree kernel and the part-of-speech order kernel, to compute the similarity between the parse-key trees of the query and of the grammar questions in the collection. Then, the retrieved grammar questions are ranked according to the similarity between the parse-key trees. In addition, if a query is submitted together with answer choices, conceptual similarity and textual similarity are also incorporated to further improve the retrieval accuracy. The performance results show that our proposed approach outperforms state-of-the-art methods based on statistical analysis and syntactic analysis.

13.
This paper describes a technique for automatic book indexing. The technique requires a dictionary of terms that are to appear in the index, along with all text strings that count as instances of the term. It also requires that the text be in a form suitable for processing by a text formatter. A program searches the text for each occurrence of a term or its associated strings and creates an entry to the index when either is found. The results of the experimental application to a portion of a book text are presented, including measures of precision and recall, with precision giving the ratio of terms correctly assigned in the automatic process to the total assigned, and recall giving the ratio of correct terms automatically assigned to the total number of term assignments according to a human standard. Results indicate that the technique can be applied successfully, especially for texts that employ a technical vocabulary and where there is a premium on indexing exhaustivity.
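
The dictionary-driven search described in this abstract might look roughly like the Python sketch below. It is an illustration under assumptions, not the paper's program: the page-keyed input, whole-word case-insensitive matching, the function name `build_index`, and the sample terms are all hypothetical.

```python
import re
from collections import defaultdict


def build_index(pages, term_strings):
    """Map each index term to the sorted page numbers where it occurs.

    pages: dict of page number -> page text.
    term_strings: dict of index term -> list of strings that count as
    instances of that term (the term itself should be included).
    """
    index = defaultdict(set)
    for term, strings in term_strings.items():
        # Match any listed string as a whole word, ignoring case.
        alternatives = "|".join(re.escape(s) for s in strings)
        pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
        for page_no, text in pages.items():
            if pattern.search(text):
                index[term].add(page_no)
    return {term: sorted(nums) for term, nums in index.items()}


# Hypothetical book pages and term dictionary.
pages = {
    1: "Inverted files speed up retrieval.",
    2: "A thesaurus lists related terms.",
    3: "Each inverted file stores postings.",
}
term_strings = {
    "inverted file": ["inverted file", "inverted files"],
    "thesaurus": ["thesaurus", "thesauri"],
}
index = build_index(pages, term_strings)
```

Listing the plural forms explicitly mirrors the abstract's requirement that the dictionary supply every string counted as an instance of a term, rather than relying on stemming.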

14.
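A criminal case is a special form of dispute, and not every criminal case has to be resolved by enforcing the power to punish. Negotiated justice, as a transplantable way of resolving criminal disputes, protects victims' rights and interests and restores damaged social order, and it has practical significance for the peaceful resolution of criminal disputes within a certain scope of China's criminal procedure. Criminal reconciliation is the localized product of combining the idea of negotiated justice with Chinese practice, and it was confirmed in legislation when the new Criminal Procedure Law took effect in 2013. The dispute-resolving nature of criminal procedure should be examined from the perspective of dispute resolution, and on that basis accurate institutional design should be carried out to achieve the peaceful resolution of disputes in criminal procedure.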

15.
Cross-Lingual Link Discovery (CLLD) is a new problem in Information Retrieval. The aim is to automatically identify meaningful and relevant hypertext links between documents in different languages. This is particularly helpful in knowledge discovery if a multi-lingual knowledge base is sparse in one language or another, or the topical coverage in each language is different; such is the case with Wikipedia. Techniques for identifying new and topically relevant cross-lingual links are a current topic of interest at NTCIR where the CrossLink task has been running since the 2011 NTCIR-9. This paper presents the evaluation framework for benchmarking algorithms for cross-lingual link discovery evaluated in the context of NTCIR-9.

16.
Applying text mining techniques to legal issues has been an emerging research topic in recent years. Although a few previous studies focused on assisting professionals in the retrieval of related legal documents, to our knowledge, no previous studies could provide relevant statutes to the general public using problem statements. In this work, we design a text mining based method, the three-phase prediction (TPP) algorithm, which allows the general public to use everyday vocabulary to describe their problems and find pertinent statutes for their cases. The experimental results indicate that our approach can help the general public, who are not familiar with professional legal terms, to acquire relevant statutes more accurately and effectively.

17.
The automated classification of texts into predefined categories has attracted booming interest, due to the increased availability of documents in digital form and the ensuing need to organize them. An important problem for text classification is feature selection, whose goals are to improve classification effectiveness, computational efficiency, or both. Because of category imbalance and feature sparsity in social text collections, filter methods may work poorly. In this paper, we perform feature selection in the training process, automatically selecting the best feature subset by learning, from a set of preclassified documents, the characteristics of the categories. We propose a generative probabilistic model that describes categories by distributions and handles the feature selection problem by introducing a binary exclusion/inclusion latent vector, which is updated via an efficient Metropolis search. Real-life examples illustrate the effectiveness of the approach.
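
To make the idea of a Metropolis search over a binary inclusion vector concrete, here is a minimal Python sketch. It is not the paper's model: the abstract does not give the posterior, so `log_score` is a stand-in supplied by the caller, and the single-bit-flip proposal, the function names, and the toy gains are assumptions made for illustration.

```python
import math
import random


def metropolis_feature_search(log_score, n_features, n_steps=2000, seed=0):
    """Search over binary inclusion vectors with single-bit-flip proposals.

    log_score(mask) must return a log-scale score for a tuple of 0/1 flags;
    it stands in for the model's posterior, which the abstract does not give.
    """
    rng = random.Random(seed)
    mask = [0] * n_features
    current = log_score(tuple(mask))
    best_mask, best = tuple(mask), current
    for _ in range(n_steps):
        j = rng.randrange(n_features)
        mask[j] ^= 1  # propose flipping one feature in or out
        proposed = log_score(tuple(mask))
        # Symmetric proposal: accept with probability min(1, exp(delta)).
        if proposed >= current or rng.random() < math.exp(proposed - current):
            current = proposed
            if current > best:
                best_mask, best = tuple(mask), current
        else:
            mask[j] ^= 1  # reject: undo the flip
    return best_mask, best


# Toy score (hypothetical): features 0 and 2 help, the others only add cost.
gains = [3.0, -1.0, 2.5, -1.0, -0.5]


def toy_log_score(mask):
    return sum(g for g, m in zip(gains, mask) if m)


best_mask, best_score = metropolis_feature_search(toy_log_score, len(gains))
```

Because worse proposals are still accepted with some probability, the chain can leave a poor subset instead of getting stuck, which is the property that distinguishes this kind of search from greedy forward or backward selection.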

18.
Wireless networks   Total citations: 1, self-citations: 0, citations by others: 1
郑海波 《现代情报》2005,25(10):195-197
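This paper discusses wireless local area network technology, which has developed rapidly in recent years, and introduces the relevant knowledge through a real engineering case.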

19.
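[Purpose/Significance] Compared with its existing form, the knowledge flow mechanism of virtual education will change substantially from the metaverse perspective. The application of the metaverse in virtual education needs to be studied and promoted so that knowledge flows efficiently among social groups. [Method/Process] By mining relevant cases and analyzing the essence of the knowledge flow stage each belongs to, the paper constructs a knowledge flow mechanism for virtual education from the metaverse perspective. [Result/Conclusion] From the metaverse perspective, knowledge flow in virtual education proceeds from the knowledge source to the reconstruction of virtual resources, then to learners' internalization and absorption, then to decentralized knowledge creation, sharing, and storage, and finally returns to society's overall body of knowledge. [Innovation/Limitation] The paper reconstructs the knowledge flow mechanism of virtual education and builds such a mechanism from the metaverse perspective. However, the metaverse is still at an early exploratory stage, conditions for empirical study are lacking, and relevant practical case samples are not yet plentiful.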

20.
Most knowledge accumulated through scientific discoveries in genomics and related biomedical disciplines is buried in the vast amount of biomedical literature. Since understanding gene regulation is fundamental to biomedical research, summarizing all the existing knowledge about a gene based on the literature is highly desirable to help biologists digest it. In this paper, we present a study of methods for automatically generating gene summaries from biomedical literature. Unlike most existing work on automatic text summarization, in which the generated summary is often a list of extracted sentences, we propose to generate a semi-structured summary which consists of sentences covering specific semantic aspects of a gene. Such a semi-structured summary is more appropriate for describing genes and poses special challenges for automatic text summarization. We propose a two-stage approach to generate such a summary for a given gene: first retrieving articles about the gene and then extracting sentences for each specified semantic aspect. We address the issue of gene name variation in the first stage and propose several different methods for sentence extraction in the second stage. We evaluate the proposed methods using a test set with 20 genes. Experimental results show that the proposed methods can generate useful semi-structured gene summaries automatically from biomedical literature, and that they outperform general-purpose summarization methods. Among all the proposed methods for sentence extraction, a probabilistic language modeling approach that models gene context performs the best.
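
The two-stage pipeline in this abstract can be sketched in Python as follows. This is only an outline under assumptions: name variation is handled by a simple normalization of case, hyphens, and spaces, and sentence extraction uses keyword overlap as a stand-in for the paper's probabilistic language model. The function names, aspect labels, and keyword sets are hypothetical.

```python
import re


def normalize(name):
    """Collapse case, hyphens and spaces so 'Wnt-1' and 'WNT 1' compare equal."""
    return re.sub(r"[\s\-]+", "", name.lower())


def retrieve(articles, gene_names):
    """Stage 1: keep articles that mention any known variant of the gene name."""
    variants = {normalize(n) for n in gene_names}
    hits = []
    for text in articles:
        # A token is a word optionally followed by a hyphen/space and digits.
        tokens = re.findall(r"[A-Za-z0-9]+(?:[\-\s][0-9]+)?", text)
        if any(normalize(t) in variants for t in tokens):
            hits.append(text)
    return hits


def extract(articles, aspect_keywords):
    """Stage 2: for each aspect, pick the sentence sharing the most keywords."""
    sentences = [s.strip() for a in articles
                 for s in re.split(r"(?<=[.!?])\s+", a) if s.strip()]
    summary = {}
    for aspect, keywords in aspect_keywords.items():
        def overlap(sentence):
            words = set(re.findall(r"[a-z]+", sentence.lower()))
            return len(words & keywords)
        best = max(sentences, key=overlap, default=None)
        if best is not None and overlap(best) > 0:
            summary[aspect] = best
    return summary


# Hypothetical articles, gene name, and aspect keyword sets.
articles = [
    "Wnt-1 is expressed in the developing midbrain. "
    "Loss of WNT 1 causes cerebellar defects.",
    "Unrelated work on yeast metabolism.",
]
aspects = {
    "expression": {"expressed", "expression"},
    "phenotype": {"loss", "defects", "mutant"},
}
hits = retrieve(articles, ["Wnt1"])
summary = extract(hits, aspects)
```

Keeping the stages separate means the sentence scorer can be swapped for a stronger model without touching the retrieval step, which matches the abstract's comparison of several extraction methods over the same retrieved articles.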

