
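A Method for Extracting Concept Phrases Based on Sequence Labeling
Citation: Li Xuesi, Zhang Zhixiong, Liu Huan. A Method for Extracting Concept Phrases Based on Sequence Labeling[J]. Library and Information Service, 2022, 66(11): 121-128.
Authors: Li Xuesi  Zhang Zhixiong  Liu Huan
Institutions: 1. National Science Library, Chinese Academy of Sciences, Beijing 100190; 2. Department of Library, Information and Archives Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190; 3. Hubei Key Laboratory of Science and Technology for Big Data, Wuhan 430072
Funding: This paper is one of the research outcomes of the Chinese Academy of Sciences literature and information capacity-building project "Construction of an Artificial Intelligence (AI) Engine Based on Scientific and Technical Literature Knowledge" (Project No. E0290906) and the National Science and Technology Library (NSTL) project "Optimized Integration and System Development of Key Technologies for the Next-Generation Open Knowledge Service Platform" (Project No. 2020XM05).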
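Abstract: [Purpose/Significance] Concepts in scientific and technical literature are highly condensed expressions of the knowledge that literature contains, and they usually appear in the form of definition sentences. Automatically extracting concepts from concept definition sentences makes it possible to further mine the important knowledge embedded in scientific and technical literature. [Method/Process] By analyzing pattern features of concept definition sentences such as their structure and sentence patterns, this paper proposes a corpus construction scheme based on the WCL dataset and uses a BERT+BiLSTM+CRF model to learn the patterns of concept definition sentences, thereby extracting concept phrases. [Result/Conclusion] Building on previous research into the pattern features of concept definition sentences, this paper proposes a novel approach that learns the composition patterns of concept definition sentences through sequence labeling in order to extract concept phrases. The BERT+BiLSTM+CRF model effectively learns pattern features of concept definition sentences, including contextual semantics, sentence structure and the distribution of constituent terms, and extracts the concept phrases contained in those sentences.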
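The sequence-labeling formulation in the abstract can be illustrated with a small sketch: each token of a definition sentence receives a tag, and concept phrases are read back from the contiguous tagged spans. The sketch assumes a BIO scheme with a hypothetical `CONCEPT` label; the tag names, the example sentence and the helper `extract_spans` are illustrative assumptions, and the label set the authors actually derived from the WCL dataset may differ.

```python
from typing import List, Optional, Tuple


def extract_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Decode BIO tags into (label, phrase) spans.

    Assumes a hypothetical BIO scheme: 'B-X' begins a span of label X,
    'I-X' continues it, 'O' is outside any span. An 'I-X' that does not
    continue an open span of label X is treated as the start of a new span.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    spans: List[Tuple[str, str]] = []
    current_label: Optional[str] = None
    current_tokens: List[str] = []
    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        # A span ends at 'O', at a new 'B-', or when the label changes.
        starts_new = tag == "O" or prefix == "B" or label != current_label
        if starts_new and current_label is not None:
            spans.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = None, []
        if tag != "O":
            current_label = label
            current_tokens.append(token)
    # Flush a span that runs to the end of the sentence.
    if current_label is not None:
        spans.append((current_label, " ".join(current_tokens)))
    return spans


if __name__ == "__main__":
    # Hypothetical definition sentence with the defined concept tagged.
    tokens = ["A", "knowledge", "graph", "is", "a", "graph",
              "of", "entities", "and", "relations", "."]
    tags = ["O", "B-CONCEPT", "I-CONCEPT"] + ["O"] * 8
    print(extract_spans(tokens, tags))  # [('CONCEPT', 'knowledge graph')]
```

Phrases are rebuilt with a simple whitespace join for readability; with subword tokens such as those produced by a BERT tokenizer, one would instead merge word pieces or map spans back through character offsets.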

Keywords: sequence labeling  concept definition sentence  concept phrase  automatic extraction
Received: 2021-10-07
Revised: 2021-12-08

A Method for Extracting Concept Phrases Based on Sequence Labeling
Li Xuesi,Zhang Zhixiong,Liu Huan.A Method for Extracting Concept Phrases Based on Sequence Labeling[J].Library and Information Service,2022,66(11):121-128.
Authors:Li Xuesi  Zhang Zhixiong  Liu Huan
Institution:1. National Science Library, Chinese Academy of Sciences, Beijing 100190;2. Department of Library, Information and Archives Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190;3. Hubei Key Laboratory of Science and Technology for Big Data, Wuhan 430072
Abstract: [Purpose/Significance] Concepts in scientific and technical literature are highly condensed expressions of the knowledge it contains and usually appear in the form of definition sentences. Automatic extraction of concepts from concept definition sentences is an important research topic in scientific and technical literature mining. [Method/Process] By analyzing the structure, syntax and other pattern features of concept definition sentences, this paper proposes a corpus construction scheme based on the WCL dataset and uses a BERT+BiLSTM+CRF model to learn the patterns of concept definition sentences, thereby extracting concept phrases. [Result/Conclusion] Building on previous studies of the pattern features of concept definition sentences, this paper proposes a novel sequence-labeling approach that learns the composition patterns of concept definition sentences in order to extract concept phrases. The BERT+BiLSTM+CRF model effectively learns pattern features such as contextual semantics, sentence structure and the distribution of constituent terms in concept definition sentences, and thereby extracts the concept phrases they contain.
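In a BERT+BiLSTM+CRF tagger, the CRF layer selects the highest-scoring tag sequence as a whole rather than one tag at a time, which lets it suppress invalid transitions such as `O` followed by `I-CONCEPT`. The sketch below is a minimal, framework-free Viterbi decoder over per-token emission scores (standing in for the BiLSTM outputs) and a tag-transition score table. The tag names, scores and the function `viterbi` are hypothetical illustrations, not values or code from the paper, and start/end transition scores are omitted for brevity.

```python
from typing import Dict, List, Sequence, Tuple


def viterbi(
    emissions: Sequence[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    tags: Sequence[str],
) -> List[str]:
    """Return the tag sequence maximizing total emission + transition score.

    emissions[t][tag]: score of `tag` at position t (e.g. from a BiLSTM).
    transitions[(prev, cur)]: score for moving from `prev` to `cur`;
    pairs missing from the table score 0. Start/end scores are omitted.
    """
    if not emissions:
        return []
    # best[tag] = best total score of any path ending in `tag` so far.
    best = {tag: emissions[0][tag] for tag in tags}
    backpointers: List[Dict[str, str]] = []
    for scores in emissions[1:]:
        new_best: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in tags:
            # Pick the predecessor that maximizes path score into `cur`.
            prev_tag, prev_score = max(
                ((p, best[p] + transitions.get((p, cur), 0.0)) for p in tags),
                key=lambda item: item[1],
            )
            new_best[cur] = prev_score + scores[cur]
            pointers[cur] = prev_tag
        best = new_best
        backpointers.append(pointers)
    # Trace back from the highest-scoring final tag.
    path = [max(tags, key=lambda tag: best[tag])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    tags = ["B-CONCEPT", "I-CONCEPT", "O"]
    # Greedy per-token choice would give ['O', 'I-CONCEPT'], an invalid sequence.
    emissions = [
        {"B-CONCEPT": 1.0, "I-CONCEPT": 0.0, "O": 1.2},
        {"B-CONCEPT": 0.1, "I-CONCEPT": 2.0, "O": 0.5},
    ]
    # A large penalty effectively forbids O -> I-CONCEPT.
    transitions = {("O", "I-CONCEPT"): -10000.0}
    print(viterbi(emissions, transitions, tags))  # ['B-CONCEPT', 'I-CONCEPT']
```

In a trained model the transition scores are learned parameters rather than hand-set penalties; the hand-set value here only makes the effect of joint decoding visible.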
Keywords:sequence labeling  concept definition sentence  concept phrase  automatic extraction  