

Deep Learning-based Extraction of Algorithmic Metadata in Full-Text Scholarly Documents
Institution: 1. Information Technology University, 346-B, Ferozepur Road, Lahore, Pakistan; 2. Deree College - The American College of Greece, 6 Gravias Street, 153-42 Aghia Paraskevi, Athens, Greece; 3. Faculty of Information and Communication Technology, Mahidol University, Thailand; 4. Department of Operations, Technology Events and Hospitality Management, Manchester Metropolitan University, Manchester, United Kingdom; 1. Xianyang Vocational Technical College, Xianyang, P. R. China; 2. China Electric Power Research Institute, Beijing, P. R. China; 3. GuiZhou University, Guizhou Provincial Key Laboratory of Public Big Data, Guiyang, P. R. China; 4. State Key Laboratory of Integrated Service Networks, School of Telecommunications Engineering, Xidian University, Xi’an, P. R. China; 5. Pedagogical University of Krakow, Podchorazych 2 St., 30-084 Kraków, Poland; 1. Key Laboratory of Computer Vision and System (Ministry of Education), Tianjin University of Technology, Tianjin, China; 2. Institute of AI, Shandong Computer Science Center (National Supercomputer Center in Jinan), QILU University of Technology, China; 1. The Hong Kong Polytechnic University, Hong Kong, China; 2. Shenzhen Institute of Artificial Intelligence and Robotics for Society, Shenzhen, China; 3. College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China; 4. Bio-Computing Research Center, Harbin Institute of Technology, Shenzhen, China; 5. Shenzhen Key Laboratory of Visual Object Detection and Recognition, Shenzhen, China
Abstract: Advances in search engines for traditional text documents have enabled effective, resource-efficient retrieval of massive amounts of textual information. However, such conventional search methods often suffer from poor retrieval accuracy, especially when documents have unique properties that call for specialized, deeper semantic extraction. Recently, AlgorithmSeer, a search engine for algorithms, has been proposed. It extracts pseudo-codes and shallow textual metadata from scientific publications and treats them as traditional documents so that conventional search engine methodology can be applied. However, such a system cannot serve user queries that seek algorithm-specific information, such as the datasets on which algorithms operate, the performance of algorithms, and their runtime complexity. In this paper, a set of enhancements to the previously proposed algorithm search engine is presented. Specifically, we propose a set of methods to automatically identify and extract algorithmic pseudo-codes and the sentences that convey related algorithmic metadata using machine-learning techniques. In an experiment with over 93,000 text lines, we introduce 60 novel features, comprising content-based, font-style-based, and structure-based feature groups, to extract algorithmic pseudo-codes. Our proposed pseudo-code extraction method achieves a 93.32% F1-score, outperforming state-of-the-art techniques by 28%. Additionally, we propose a method to extract algorithm-related sentences using deep neural networks and achieve an accuracy of 78.5%, outperforming a rule-based model and a support vector machine model by 28% and 16%, respectively.
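As a minimal sketch of the metric named in the abstract, the F1-score of a line-level pseudo-code extractor can be computed as the harmonic mean of precision and recall over per-line labels. The function name `f1_score` and the toy labels below are assumptions made for illustration; they are not taken from the paper and do not reproduce its results.

```python
from typing import Sequence


def f1_score(gold: Sequence[bool], predicted: Sequence[bool]) -> float:
    """F1 for the positive class (True = the line belongs to a pseudo-code block)."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    # Count true positives, false positives, and false negatives per line.
    tp = sum(g and p for g, p in zip(gold, predicted))
    fp = sum((not g) and p for g, p in zip(gold, predicted))
    fn = sum(g and (not p) for g, p in zip(gold, predicted))
    if tp == 0:
        # No correct positive prediction: define F1 as 0 to avoid dividing by zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels for six text lines, invented for illustration only.
gold = [True, True, True, False, False, False]
predicted = [True, True, False, True, False, False]
score = f1_score(gold, predicted)
```

With these toy labels there are two true positives, one false positive, and one false negative, so precision and recall are both 2/3 and so is the F1-score.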
This article is indexed in ScienceDirect and other databases.