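A study of the research front identification method based on machine learning
Cite this article: Li Xin, Wen Yang, Huang Lucheng, Miao Hong. A study of the research front identification method based on machine learning [J]. Science Research Management (科研管理), 2021, 42(1): 20-32.
Authors: Li Xin  Wen Yang  Huang Lucheng  Miao Hong
Affiliation: College of Economics and Management, Beijing University of Technology, Beijing 100124
Funding: National Natural Science Foundation of China, General Program: "Research on the formation mechanism of emerging technologies based on multi-source heterogeneous data" (71673018, 2017.01-2020.12).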
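Abstract: Research fronts are the most promising and forward-looking research directions in the process of scientific and technological innovation, and identifying them as early as possible is crucial for scientific research, for the optimal allocation of enterprise R&D resources, and for the forward-looking deployment of government innovation strategies. To address the shortcomings of existing work on research front identification, this paper proposes a research front identification method based on machine learning. First, the method builds a machine learning model to identify potentially highly cited papers, which addresses the time lag of identifying research fronts through citation analysis, and adds these potentially highly cited papers to the core document set of highly cited papers used for research front identification. Second, taking the core document set of highly cited papers as the data source, it applies cluster analysis to identify research front topics, then compares and evaluates these topics to identify the research fronts. Finally, an empirical study in the field of solar photovoltaic cells verifies the feasibility and effectiveness of the method, which provides a new approach to research front identification.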

Keywords: machine learning  research front  citation analysis  evaluation  identification
Received: 2020-11-07
Revised: 2020-12-03
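
The first step of the method, training several classifiers and keeping the one that generalizes best to unseen papers, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the indicator matrix is synthetic (generated with `make_classification`), the F1 score as the selection metric is an assumption, and scikit-learn's `GradientBoostingClassifier` is swapped in as a stand-in for XGBoost so that the sketch depends on scikit-learn only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the indicator matrix (assumption, not the paper's data):
# one row per paper, one column per indicator, y = 1 marks a highly cited paper.
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=5,
    weights=[0.8, 0.2], random_state=0,
)

# The held-out split plays the role of the "newly published papers"
# used to check how well each model generalizes.
X_train, X_new, y_train, y_new = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

candidates = {
    # SVMs are sensitive to feature scale, so standardize first.
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0)),
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    # Stand-in for XGBoost (assumption): scikit-learn gradient boosting.
    "GBDT": GradientBoostingClassifier(random_state=0),
}

# Fit every candidate and score it on the held-out papers.
scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    scores[name] = f1_score(y_new, model.predict(X_new))

# Keep the model with the best held-out F1 and flag the papers it
# predicts to become highly cited.
best_name = max(scores, key=scores.get)
best_model = candidates[best_name]
potentially_highly_cited = best_model.predict(X_new) == 1

print(scores)
print("selected model:", best_name)
```

In a real study the flagged papers would then be merged into the core document set of highly cited papers before the clustering step.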

A study of the research front identification method based on machine learning
Li Xin, Wen Yang, Huang Lucheng, Miao Hong. A study of the research front identification method based on machine learning [J]. Science Research Management, 2021, 42(1): 20-32.
Authors:Li Xin  Wen Yang  Huang Lucheng  Miao Hong
Institution: College of Economics and Management, Beijing University of Technology, Beijing 100124, China
Abstract: Research fronts are the most promising and forward-looking research directions in the process of technological innovation. Identifying them early is very important for scientific research, for the optimal allocation of enterprises' R&D resources, and for the formulation of governments' innovation strategies. Faced with a massive amount of scientific research output data, how to identify research fronts quickly and accurately has become a focus of the academic community. Many scholars have used bibliometric methods to identify research fronts. Citation analysis is one of the most commonly used of these methods, and highly cited papers are regarded as an important data source. However, papers take time to accumulate citations, so existing citation analysis methods cannot incorporate newly published papers, or papers that will be highly cited in the future, into the set of highly cited papers used to identify research fronts. To address this deficiency, this paper proposes a novel framework for identifying research fronts based on machine learning methods. The framework proceeds as follows. First, we used Web of Science (WoS) as the data source to download historical highly cited papers and their references. Second, we constructed an indicator system for identifying highly cited papers, calculated the corresponding indicator values, and divided the resulting data into a training set and a test set for the machine learning models. Third, we constructed support vector machine (SVM), random forest (RF), and eXtreme Gradient Boosting (XGBoost) models and tuned the parameters of each model to optimize its performance. Fourth, we downloaded newly published papers from WoS to verify the generalization ability of each model, selected the model with the best generalization ability to predict the future citations of the newly published papers and identify potentially highly cited papers, and incorporated these potentially highly cited papers into the core data set of highly cited papers. Fifth, we used the core data set of highly cited papers as the data source to identify research front topics by applying cluster analysis. Finally, we compared and evaluated the research front topics to identify the research fronts. We selected solar cells as a case study to verify the validity and feasibility of the framework. The results show that the emerging research fronts in the field of solar cells include ternary organic solar cells/ternary polymer solar cells, PbS quantum-dot solar cells, and inverted planar perovskite solar cells; the growing research fronts include non-fullerene polymer solar cells/non-fullerene organic solar cells and CH3NH3PbI3 perovskite solar cells. The research fronts we identified were largely consistent with those reported for the field of solar cells in existing authoritative research reports. In addition, we invited three well-known experts in the field of solar cells to evaluate the identified research fronts, and they largely agreed with the results. This verifies the effectiveness and feasibility of the proposed method.
Keywords:machine learning  research front  citation analysis  evaluation  identification