Similar Literature
20 similar documents found
1.
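[Purpose/Significance] To shift opinion extraction from large volumes of academic text from manual work to machines, improving efficiency while preserving the accuracy and objectivity of the extracted opinions. [Method/Process] The UniLM unified pre-trained language model was used and fine-tuned during training, with a manually annotated dataset used for machine learning. An academic abstract is treated as a text sequence of length a; after training, the model generates a sentence sequence of length b (a ≥ b), which is output as the opinion sentence of the academic paper. [Result/Conclusion] The UniLM model achieved opinion-generation accuracy of 94.36%, 77.27%, and 57.43% for standard-format, semi-standard-format, and non-standard-format abstracts respectively, performing best on standard-format abstracts. Applying a machine learning model to opinion generation from long texts offers a new method for generating opinions from academic papers. A limitation is that the model depends on the structure of the abstract and performs less well on non-standard-format abstracts.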
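In UniLM's sequence-to-sequence mode, one Transformer reads the source segment (here, the abstract of length a) bidirectionally while generating the target segment (the opinion sentence of length b) left to right; this behavior is controlled by the self-attention mask. The sketch below builds such a mask in plain Python. The function name is hypothetical, special tokens such as [CLS] and [SEP] are ignored, and it is not the code used in the study.

```python
def seq2seq_attention_mask(a: int, b: int) -> list[list[int]]:
    """Return an (a+b) x (a+b) mask; mask[i][j] == 1 means token i may attend to token j.

    Positions 0..a-1 are source (abstract) tokens; positions a..a+b-1 are
    target (opinion sentence) tokens.
    """
    n = a + b
    mask = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if j < a:
                # Every token may see the whole source segment (bidirectional context).
                mask[i][j] = 1
            elif i >= a and j <= i:
                # Target tokens see earlier target tokens and themselves (left to right).
                mask[i][j] = 1
            # Source tokens never see target tokens, so those entries stay 0.
    return mask
```

For a = 3 and b = 2, the first three rows are [1, 1, 1, 0, 0], and the two target rows are [1, 1, 1, 1, 0] and [1, 1, 1, 1, 1], so the last token can attend to everything before it but the abstract tokens cannot look ahead into the generated sentence.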

2.
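[Purpose/Significance] Automatically identifying scientific elements in the abstracts of project proposals is important for revealing the scientific knowledge contained in science and technology projects. Identifying these elements relies on structured project abstracts, but corpora of structured project abstracts are currently scarce, which seriously constrains further research. This study builds a move corpus of project proposal abstracts to provide data support for related research. [Method/Process] First, the content of project abstracts was grouped into four move types (background and problem, objectives and tasks, methods and content, value and significance); the characteristic markers of each move were summarized and move annotation guidelines were drawn up. Next, rule-based and then deep-learning-based methods were used to assist manual annotation of the move structure of project abstracts, and the quality of the corpus was evaluated after each annotation round. [Result/Conclusion] Nearly 25,000 sentences were annotated with the two methods, and the annotation consistency coefficient reached 0.9839, indicating that the corpus can basically distinguish the different move structures within project abstracts and meets the basic requirements of corpus construction.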
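The abstract reports an annotation consistency coefficient of 0.9839 without naming the statistic. Cohen's kappa is one common measure of agreement between two annotators; the sketch below assumes that statistic (an assumption, not confirmed by the source) and shows how it is computed from two annotators' move labels.

```python
from collections import Counter


def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Cohen's kappa for two annotators who labeled the same sentences."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty set of sentences")
    n = len(labels_a)
    # Observed agreement: share of sentences given the same move label.
    p_o = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's own label distribution.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label throughout; kappa is undefined, report 1.0.
        return 1.0
    return (p_o - p_e) / (1 - p_e)
```

With hypothetical labels ["background", "objective", "method", "method"] and ["background", "objective", "method", "value"], observed agreement is 0.75, chance agreement is 0.25, and kappa is about 0.667.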

3.
The Reference Librarian, 2013, 54(22): 297-308
Reference librarians, who are thoroughly familiar with the purpose, scope, and arrangement of abstract journals, are uniquely qualified for the task of writing abstracts. The procedures described here offer a relatively simple way for them to write acceptable abstracts from the outset. Although research is being conducted on machine-generated abstracts, there will continue to be a role for human abstractors, who must determine what combination of the author's original words and/or paraphrasing best conveys the central message of the article. The completed abstract must be accurate and must include the terms and phrases that will both facilitate manual indexing and enhance free-text searching of online abstracts.

4.
In clinical research and clinical decision-making, it is important to know whether a study changes or merely supports the current standards of care for managing a specific disease. We define research that brings such a change as transformative and research that offers support as incremental. Making this distinction usually requires a great deal of domain expertise and time. Faculty Opinions provides a well-annotated corpus indicating whether a study challenges or only confirms established research. In this study, we propose a machine learning approach for distinguishing transformative from incremental clinical evidence. Texts from both the abstracts and the citing sentences within a 2-year window were collected for a training set of clinical studies recommended and labeled by Faculty Opinions experts. We achieve the best performance, an average AUC of 0.755 (0.705–0.875), using Random Forest as the classifier and citing sentences as the features. The results show that transformative research has more typical language patterns in citing sentences than in abstract sentences. We provide an efficient tool that helps clinicians and researchers identify clinical evidence that challenges or only confirms established claims.
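The reported AUC summarizes how well the classifier's scores rank transformative studies above incremental ones. A minimal, dependency-free sketch of the metric (not the study's Random Forest pipeline) is the pairwise Mann-Whitney formulation below; the 1/0 label encoding is an assumption.

```python
def roc_auc(scores: list[float], labels: list[int]) -> float:
    """ROC AUC as the probability that a random positive outranks a random negative.

    labels: 1 = transformative, 0 = incremental (hypothetical encoding).
    A tie between a positive and a negative counts as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, scores [0.9, 0.8, 0.4, 0.3] with labels [1, 0, 1, 0] give an AUC of 0.75, since three of the four positive/negative pairs are ordered correctly. The quadratic loop is fine for illustration; a rank-based version scales better.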

5.
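Cao Yang, Cheng Ying, Pei Lei. Library and Information Service (图书情报工作), 2014, 58(18): 122-130
This paper discusses the main stages of machine-learning-based automatic summarization: feature selection, algorithm selection, model training, summary extraction, and model evaluation. It focuses on three major machine learning algorithms (naive Bayes, hidden Markov models, and conditional random fields), explains the basic idea of each, and, after systematically reviewing related studies, offers the authors' reflections. Common issues shared by the three algorithms, concerning training methods, co-training and active learning, class balance, and vocabulary distribution, are discussed in depth, and the main directions for future research are proposed.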
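Of the three algorithms reviewed, naive Bayes is the simplest to illustrate: each sentence is described by a few binary features and scored by how likely it is to belong in the summary. The sketch below is a generic Bernoulli naive Bayes with add-one (Laplace) smoothing; the feature names are hypothetical and the code is not taken from the reviewed studies.

```python
import math


def train_nb(samples: list[tuple[dict[str, bool], bool]], features: list[str]):
    """Estimate P(in summary) and P(feature is true | class) with add-one smoothing."""
    counts = {True: 0, False: 0}
    true_counts = {c: {f: 0 for f in features} for c in (True, False)}
    for feats, in_summary in samples:
        counts[in_summary] += 1
        for f in features:
            if feats.get(f, False):
                true_counts[in_summary][f] += 1
    total = counts[True] + counts[False]
    prior = {c: (counts[c] + 1) / (total + 2) for c in (True, False)}
    cond = {
        c: {f: (true_counts[c][f] + 1) / (counts[c] + 2) for f in features}
        for c in (True, False)
    }
    return prior, cond


def summary_log_odds(feats: dict[str, bool], prior, cond) -> float:
    """Log-odds that a sentence belongs in the summary; rank sentences by this value."""
    score = math.log(prior[True]) - math.log(prior[False])
    for f, p_true in cond[True].items():
        p_false = cond[False][f]
        if feats.get(f, False):
            score += math.log(p_true) - math.log(p_false)
        else:
            # Smoothing keeps every probability strictly below 1, so these logs are defined.
            score += math.log(1 - p_true) - math.log(1 - p_false)
    return score
```

An extractive summarizer built this way scores every sentence with summary_log_odds and keeps the top-ranked ones up to a length budget.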

6.
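[Purpose/Significance] Paper abstracts are an important indexing object in information organization, and indexing abstracts according to a defined structure benefits scientific communication, knowledge discovery, and intelligence analysis. How to index existing unstructured abstracts automatically, accurately, and quickly is a pressing practical problem. [Method/Process] We assume that different types of abstracts are internally consistent, i.e., that research on structured abstracts can provide a methodological and technical reference for the automatic indexing of unstructured abstracts. On this basis, drawing on the US National Library of Medicine's structured-abstract label terminology and its label-to-category mappings, we propose the BOMRC structural-element system and a method for identifying structured abstracts and indexing them in normalized form. Next, we selected a research sample and used text mining to analyze words, verbs, three-word lexical chunks, and four-word lexical chunks in the sample corpus statistically on multiple indicators such as term frequency and TF-IDF value, and built a semantic feature dictionary capable of identifying structural elements. Finally, we tested the validity of the semantic feature dictionary on a test set of unstructured abstracts. [Result/Conclusion] The results show that the semantic feature dictionary method can effectively identify the various elements of unstructured abstracts and can be used to optimize automatic recognition models built around machine learning.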

7.
[Purpose/significance] The abstract of a scientific paper is a vital indexing object in information organization. Indexing abstracts according to defined rules is conducive not only to scientific communication and knowledge discovery but also to intelligence analysis. Thus, how to index the millions of existing unstructured abstracts automatically, accurately, and quickly is a crucial problem to be addressed. [Method/process] This study assumed that different categories of abstracts are inherently consistent, that is, that the study of structured abstracts can provide a methodological and technical reference for the automatic indexing of unstructured abstracts. Acting on this assumption and based on the US National Library of Medicine's structural element labeling terminology, this study mapped across abstract element classifications and proposed the BOMRC system, a normalized indexing method for structured abstracts. We then collected a research sample and used text mining to analyze multiple features of structured abstracts quantitatively, such as word frequency and TF-IDF value, for words, verbs, three-word lexical chunks, and four-word lexical chunks, which enabled us to build a semantic feature dictionary for structural elements. Finally, we used unstructured abstracts to test the validity of the semantic feature dictionary. [Result/conclusion] The results show that the semantic feature dictionary method can effectively identify the various structural elements of scientific paper abstracts and can be used to optimize automatic recognition models based on machine learning methods.
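The frequency and TF-IDF statistics described above can be sketched by treating each structural element (for example, all methods sentences pooled together) as one document, so that words with high TF-IDF in an element become candidate entries for that element's semantic feature dictionary. The grouping, whitespace tokenization, and element labels are assumptions for illustration; the abstract does not specify the exact weighting scheme.

```python
import math
from collections import Counter


def tfidf_by_element(corpus: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """corpus maps an element label (e.g. 'M' for a methods element) to its sentences.

    Returns, per element, TF-IDF = (term count / element length) * log(N / df),
    where N is the number of elements and df the number of elements containing the term.
    """
    tokens = {
        label: [w.lower() for s in sents for w in s.split()]
        for label, sents in corpus.items()
    }
    n_docs = len(tokens)
    df = Counter(w for words in tokens.values() for w in set(words))
    scores = {}
    for label, words in tokens.items():
        tf = Counter(words)
        scores[label] = {w: (c / len(words)) * math.log(n_docs / df[w]) for w, c in tf.items()}
    return scores
```

A word that occurs in every element (such as "a" in a toy corpus) scores 0, while words concentrated in one element rise to the top of that element's list, which is the behavior a feature dictionary needs.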

8.
9.
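Guo Yanhui, Zhong Yixin. Journal of the China Society for Scientific and Technical Information (情报学报), 2003, 22(4): 472-475
Statistical language models are now widely used in speech recognition, machine translation, automatic summarization, and other fields. Accurately judging whether the sentences produced by a language model are coherent and fluent is an important problem for evaluating and improving language models. This paper uses a set of features based on word-frequency statistics and a decision tree algorithm to evaluate the semantic coherence of generated sentences automatically. The method has broad practical value in natural language processing fields that need to generate or recognize coherent sentences.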

10.
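Patent term translation demands a high degree of accuracy and domain expertise, and automatically acquiring patent term translations has significant practical value for natural language processing tasks such as machine translation, automatic dictionary compilation, and cross-language information retrieval. This study extracts terms separately from each side of bilingual patent abstracts, combines multiple term recognition methods, and uses rule-based translation and statistical machine translation to dynamically assist a lexicalization method in aligning terms, aiming to obtain as many accurate patent term translation pairs from bilingual patent documents as possible. Experiments on patent abstracts show that the accuracy of the patent term translation pairs reaches 80%.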

11.
Objective: We sought to determine how many abstracts presented at the 2012 and 2014 Medical Library Association (MLA) annual conferences were later published as full-text journal articles and which features of the abstract and first author influence the likelihood of future publication. To do so, we replicated a previous study on MLA conference abstracts presented in 2002 and 2003. The secondary objective was to compare the publication rates between the prior and current study. Methods: Presentations and posters delivered at the 2012 and 2014 MLA meetings were coded to identify factors associated with publication. Postconference publication of abstracts as journal articles was determined using a literature search and a survey sent to first authors. Chi-squared tests were used to assess differences in the publication rate, and logistic regression was used to assess the influence of abstract factors on publication. Results: The combined publication rate for the 2012 and 2014 meetings was 21.8% (137/628 abstracts), a statistically significant decrease compared to the previously reported rate for 2002 and 2003 (27.6%, 122/442 abstracts). The odds that an abstract would later be published as a journal article increased if the abstract was multi-institutional or if it reported research, specifically surveys or mixed methods research. Conclusions: The lower publication rate of MLA conference abstracts may be due to an increased number of accepted program or nonresearch abstracts or to a more competitive peer review process at journals. MLA could increase the publication rate by encouraging and enabling multi-institutional research projects among its members.
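The reported comparison can be checked with a Pearson chi-squared test on the 2×2 table of published versus unpublished abstracts (137 of 628 versus 122 of 442). The sketch below uses no continuity correction; the abstract does not say whether the authors applied one, so this is an illustrative reconstruction rather than their analysis.

```python
import math


def chi2_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-squared (no continuity correction) for the table [[a, b], [c, d]].

    Returns the statistic and its upper-tail p-value for 1 degree of freedom.
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 df, P(X > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Rows: 2012+2014 meetings, 2002+2003 meetings; columns: published, not published.
stat, p = chi2_2x2(137, 628 - 137, 122, 442 - 122)
print(f"chi2 = {stat:.2f}, p = {p:.3f}")
```

This gives a statistic of about 4.73 with p of about 0.03, consistent with the decrease being reported as significant at the conventional 0.05 level.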

12.
The present formatting and content of abstracts in educational research might be one barrier preventing wider dissemination and use of such research. Structured abstracts, with specific formatting and content requirements, might help researchers disseminate their work more effectively and efficiently. The purpose of this study was to investigate 2 years of abstracts of empirical research articles submitted to Research in the Schools and to determine the extent to which the abstracts were underdeveloped, thereby suggesting the need for structured abstracts. Of the 74 articles reviewed, 35 (44.3%) contained an underdeveloped abstract. Articles with underdeveloped abstracts were approximately twice as likely to be rejected as articles with developed abstracts. Finally, 34.3% of the articles contained information in the abstract (e.g., purpose statement, sample size, findings) that was inconsistent with information provided elsewhere in the article.

13.
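The National Index to Chinese Newspapers and Periodicals (全国报刊索引) covers a large number of journals but abstracts relatively few articles from each, whereas the abstracting journals of individual disciplines do the opposite. Both deviate, to different degrees, from the actual scattering (dispersion) characteristics of journal information. Consequently, when different indexing and abstracting journals are used to study the same problem related to the scattering of journal information, it is difficult to obtain the same results. Standardizing the number of journals and articles covered by indexing and abstracting journals, and adopting sound integrative methods, are effective ways to reflect the scattering of journal information more faithfully.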

14.
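Guo Jinjing, Huang Qi. Library and Information Service (图书情报工作), 2021, 65(20): 123-134
[Purpose/Significance] Scientific research, especially medical research, is full of uncertainty. Identifying contradictory knowledge claims in research helps reveal scientific disagreements and inconsistent conclusions, supporting the identification of potentially transformative research and the refinement of related studies. [Method/Process] Taking Alzheimer's disease as a case, PubMed abstracts were used as the data source and the SemRep tool was used to extract triples. Rules were formulated for identifying knowledge claims with contradictory meaning; source sentences were divided according to their degree of uncertainty, and two approaches, single-sentence identification and cross-sentence identification, were used to identify contradictory medical knowledge claims represented as triples. [Result/Conclusion] A total of 49 groups of contradictory knowledge claims (involving 277 pairs of triples) were identified from 6,574 PubMed abstracts. Research on the diagnosis and treatment of Alzheimer's disease still contains some disputes and contradictions that require further verification. Identifying contradictory knowledge claims offers a new approach to discovering potentially transformative frontiers in medical research, can be used for knowledge discovery based on knowmetrics, and provides a reference for computing the credibility of knowledge graphs.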
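One simple form of the rules described above is to flag pairs of triples that share a subject and an object but carry opposing predicates, for example a predication and its negated form. The sketch assumes SemRep-style (subject, predicate, object) triples in which negation appears as a NEG_ prefix, and it uses a small, hypothetical table of opposing predicates and made-up entities; the study's actual rule set and its uncertainty-based sentence grouping are not given in the abstract.

```python
from itertools import combinations

# Hypothetical predicate pairs treated as contradictory (an assumption, not the study's rules).
OPPOSING = {frozenset({"STIMULATES", "INHIBITS"})}


def contradicts(p1: str, p2: str) -> bool:
    """True if one predicate negates the other or the pair is listed as opposing."""
    if p1 == "NEG_" + p2 or p2 == "NEG_" + p1:
        return True
    return frozenset({p1, p2}) in OPPOSING


def find_contradictions(triples: list[tuple[str, str, str]]):
    """Return pairs of (subject, predicate, object) triples that make opposing claims."""
    pairs = []
    for t1, t2 in combinations(triples, 2):
        same_arguments = t1[0] == t2[0] and t1[2] == t2[2]
        if same_arguments and contradicts(t1[1], t2[1]):
            pairs.append((t1, t2))
    return pairs
```

Given the toy triples ("DrugX", "TREATS", "Alzheimer's Disease") and ("DrugX", "NEG_TREATS", "Alzheimer's Disease"), the function returns that pair as one contradictory claim; triples from different sentences can be pooled before the call to mimic cross-sentence identification.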

15.
Common defects in English abstract writing deduced from analysis of abstracts corrected by Chemical Abstracts   Total citations: 6 (self-citations: 3; citations by others: 3)
Jing Xia, Zhou Chuanjing. Acta Editologica (编辑学报), 2001, 13(Z1): 41-42

16.
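Chinese sci-tech journals should maintain an international cultural perspective and a dialectical view of translation, increase the information content of their English text, advocate the principle that "form serves content", and refine the norms for English translation in sci-tech journals. The article recommends specific requirements for standardizing English abstracts and suggests that editors and authors work together to improve the quality of English translation and the international reach of sci-tech journals.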

17.
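Ontology-based knowledge extraction from patent abstracts*   Total citations: 4 (self-citations: 0; citations by others: 4)
Using a knowledge engineering approach, this study analyzes Chinese patent abstracts on new energy vehicles and proposes an ontology-based model for extracting knowledge from Chinese patent abstracts. An ontology is built, relevant vocabularies are collected, and extraction rules are written; these rules are then applied to extract knowledge from patent abstracts, and the extraction results support the automatic construction of a patent knowledge base. The study thus explores how to organize unstructured information and how to build a knowledge base automatically, and it demonstrates the feasibility of ontology-based knowledge extraction from patent abstracts.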

18.
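Wu Junling. Acta Editologica (编辑学报), 2017, 29(6): 557-558
One hundred papers were sampled from 100 medical journals published in 2017; each paper contained at least one sentence in which both 收集 ("collect") and 患者 ("patient") appear. Of the 100 papers, 45 (45.0%) contained the erroneous expression 收集患者 ("collecting patients"). There were 172 sentences containing both words, of which 61 (35.5%) were erroneous. There was no statistically significant difference between core and non-core journals in the number of errors or the error rate (Z = -0.365, P = 0.715; χ² = 0.400, P = 0.527), nor between the error rates in abstracts and main text (χ² = 0.576, P = 0.448). The paper analyzes the errors involving 收集患者 and, drawing on examples, summarizes correct ways of expressing the idea.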
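For a single degree of freedom, the reported P values follow directly from the test statistics: the two-sided P for a standard-normal Z is erfc(|Z|/√2), and the upper-tail P for a χ² statistic with 1 degree of freedom is erfc(√(χ²/2)). The sketch below assumes two-sided tests and 1 degree of freedom (assumptions, since the abstract does not state them) and reproduces the three reported values.

```python
import math


def p_from_z(z: float) -> float:
    """Two-sided P value for a standard-normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def p_from_chi2_1df(chi2: float) -> float:
    """Upper-tail P value for a chi-squared statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


for label, p in [
    ("core vs. non-core, error counts (Z = -0.365)", p_from_z(-0.365)),
    ("core vs. non-core, error rate (chi2 = 0.400)", p_from_chi2_1df(0.400)),
    ("abstracts vs. main text (chi2 = 0.576)", p_from_chi2_1df(0.576)),
]:
    print(f"{label}: P = {p:.3f}")
```

The loop prints P = 0.715, 0.527, and 0.448, matching the abstract, which also confirms that the garbled statistic for the abstract-versus-main-text comparison is a χ² value.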

19.
BACKGROUND: The use of a structured abstract has been recommended in reporting medical literature to quickly convey necessary information to editors and readers. The use of structured abstracts increased during the mid-1990s; however, recent practice has yet to be analyzed. OBJECTIVES: This article explored actual reporting patterns of abstracts recently published in selected medical journals and examined what these journals required of abstracts (structured or otherwise and, if structured, which format). METHODS: The top thirty journals according to impact factors noted in the "Medicine, General and Internal" category of the ISI Journal Citation Reports (2000) were sampled. Articles of original contributions published by each journal in January 2001 were examined. Cluster analysis was performed to classify the patterns of structured abstracts objectively. Journals' instructions to authors for writing an article abstract were also examined. RESULTS: Among 304 original articles that included abstracts, 188 (61.8%) had structured and 116 (38.2%) had unstructured abstracts. One hundred twenty-five (66.5%) of the abstracts used the introduction, methods, results, and discussion (IMRAD) format, and 63 (33.5%) used the 8-heading format proposed by Haynes et al. Twenty-one journals requested structured abstracts in their instructions to authors; 8 journals requested the 8-heading format; and 1 journal requested it only for intervention studies. CONCLUSIONS: Even in recent years, not all abstracts of original articles are structured. The eight-heading format was neither commonly used in actual reporting patterns nor noted in journal instructions to authors.

20.
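Kang Zhusheng, Wang Yan, Xie Xuan. Acta Editologica (编辑学报), 2011, 23(1): 33-34
A good English abstract must not only reflect the core value of the paper but also convey its ideas in correct language. Addressing these two aspects, this paper proposes that sci-tech journals apply process control to the quality of English abstracts. It discusses what should be reviewed, edited, and revised in English abstracts and, in light of the manuscript-handling workflow, specifies the tasks and requirements of process quality control.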


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  京ICP备09084417号