
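Research on Science and Technology Policy Text Classification Based on the BERT Model
Cite this article: 沈自强, 李晔, 丁青艳, 王金颖, 白全民. Research on Science and Technology Policy Text Classification Based on the BERT Model [J]. 数字图书馆论坛 (Digital Library Forum), 2022(1).
Authors: 沈自强, 李晔, 丁青艳, 王金颖, 白全民
Affiliations: School of Economics and Management, Qilu University of Technology (Shandong Academy of Sciences); Institute of Science and Technology for Development of Shandong; Shandong Computer Science Center (National Supercomputer Center in Jinan)
Funding: Supported by the Shandong Province Higher Education Institutions Youth Innovation Science and Technology Support Program, project "Industrial Transformation in the Intelligent Era: Technology, Institutions, and Entrepreneurial Orientation" (No. 2020RWG009).
Abstract: In the context of smart government applications, using deep learning to classify large volumes of science and technology policy texts automatically can lower the cost of manual processing and make policy matching more efficient. This study runs automatic classification experiments on science and technology policies with the BERT deep learning model. Keywords are extracted from the policy texts with the TextRank and TF-IDF algorithms, then fused with the policy titles and fed into the BERT model to optimize the experiment. The classification results of different deep learning models are compared to verify the effectiveness of the method. The results show that, with the BERT model, fusing the title with TF-IDF policy keywords gives the best classification performance, with accuracy reaching 94.41%. This demonstrates that adding policy keywords to the title improves the accuracy of automatic policy text classification with BERT and achieves effective classification of science and technology policy texts.

Keywords: science and technology policy; text classification; BERT model; keyword extraction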

Research on Science and Technology Policy Text Classification Based on BERT Model
SHEN ZiQiang, LI Ye, DING QingYan, WANG JinYing, BAI QuanMin. Research on Science and Technology Policy Text Classification Based on BERT Model [J]. Digital Library Forum, 2022(1).
Authors: SHEN ZiQiang, LI Ye, DING QingYan, WANG JinYing, BAI QuanMin
Institution: School of Economics and Management, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250014, P.R. China; Institute of Science and Technology for Development of Shandong, Jinan 250014, P.R. China; Shandong Computer Science Center (National Super Computer Center in Jinan), Jinan 250014, P.R. China
Abstract: In the context of smart government applications, automatically classifying large volumes of science and technology policy texts with deep learning can reduce the cost of manual processing and improve the efficiency of policy matching. This paper uses the BERT deep learning model to classify science and technology policies automatically. Keywords are extracted from the policy texts with the TextRank and TF-IDF algorithms, and the policy titles are combined with these keywords as input to the BERT model in order to improve classification accuracy. The classification results of different deep learning models are also compared to verify the effectiveness of the method. The results show that the BERT model performs best when the title is combined with TF-IDF keywords, reaching an accuracy of 94.41%. This indicates that adding policy keywords to the title improves the accuracy of automatic policy text classification with BERT and enables effective classification of science and technology policy texts.
Keywords: Science and Technology Policy; Text Classification; BERT Model; Keyword Extraction
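
The abstract describes extracting TF-IDF keywords from each policy text and combining them with the policy title before classification. The record does not give implementation details, so the following is only a minimal sketch of that preprocessing idea under stated assumptions: the documents are already tokenized (real Chinese text would need a word segmenter), the fusion is plain string concatenation, and the function names, toy documents, and titles are all hypothetical. The BERT classification step itself is not shown.

```python
import math
from collections import Counter


def tfidf_keywords(docs, top_k=3):
    """Return the top_k highest-scoring TF-IDF terms for each tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    results = []
    for doc in docs:
        tf = Counter(doc)
        scores = {
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:top_k])
    return results


def fuse_title_and_keywords(title, keywords, sep=" "):
    """Concatenate a title with its keywords (an assumed fusion scheme)."""
    return sep.join([title, *keywords])


# Hypothetical, pre-tokenized toy documents; not data from the paper.
docs = [
    ["talent", "subsidy", "policy", "talent", "housing"],
    ["tax", "relief", "policy", "enterprise", "tax"],
    ["patent", "funding", "policy", "patent", "enterprise"],
]
titles = ["Talent measures", "Tax measures", "Patent measures"]

keywords = tfidf_keywords(docs, top_k=2)
inputs = [fuse_title_and_keywords(t, k) for t, k in zip(titles, keywords)]
```

In this sketch, a term such as "policy" that occurs in every document receives a score of zero and is never selected, which is why TF-IDF keywords can add class-specific signal to a short title. Each string in `inputs` would then be passed to a BERT tokenizer and classifier.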
This article is indexed in 维普 (VIP) and other databases.