A deep search method to survey data portals in the whole web: toward a machine learning classification model
Institution: 1. Department of Information Sciences and Technology, Penn State University, 2809 Saucon Valley Rd, Center Valley, PA 18034, USA; 2. School of Public Affairs, Penn State University, 200 University Drive, Schuylkill Haven, PA 17972, USA
Abstract: The emergence of standardized open data software platforms has provided a similar set of features that sustain the lifecycle of open data practices, including storing, managing, publishing, and visualizing data, in addition to an out-of-the-box solution for data portals. Accordingly, the spread of data portals that implement such platforms has paved the way for automation, in which (meta)data extraction meets the demand for quantity-oriented metrics, mainly for benchmarking purposes. This raises the question of how to survey data portals globally while reducing manual effort and still covering a wide variety of sources that may not implement standardized solutions. This study therefore addresses two main problems: searching for standardized open data software platforms, and identifying custom-developed web-based software that operates as a data portal. It aims to develop a method that deeply searches each web page on the internet and formalizes a machine learning classification model to improve the identification of data portals, regardless of how these portals implement a standardized open data software platform or comply with open data technical guidelines. The contribution of this work is demonstrated through a list of 1,650 open data portals, generalized into a trained model that makes it feasible to distinguish a data portal (which may or may not implement a standardized platform) from an ordinary web page. The results provide new insights into how artificial intelligence affects machine-readable, publicly available data, with a special focus on how it can be used to understand data openness worldwide.
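The two problems named in the abstract can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not state which features or which classifier the study uses. Here, problem 1 (finding standardized platforms) is approximated by matching hypothetical platform fingerprint strings (`PLATFORM_MARKERS`), and problem 2 (recognizing portals that may not use such a platform) is approximated by a small logistic regression over hypothetical keyword and file-link features (`DATA_TERMS`, `FILE_LINK`). All names, marker strings, and toy pages are illustrative assumptions.

```python
import math
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical fingerprints of standardized open data platforms. These strings
# are illustrative assumptions, not the study's actual detection rules.
PLATFORM_MARKERS: Dict[str, List[str]] = {
    "CKAN": ["ckan", "/api/3/action/"],
    "Socrata": ["socrata"],
    "DKAN": ["dkan"],
    "OpenDataSoft": ["opendatasoft"],
}

# Hypothetical vocabulary suggesting that a page publishes data (assumption).
DATA_TERMS = ["dataset", "csv", "json", "api", "download", "license", "metadata", "open data"]

# Links that point directly at common machine-readable file formats.
FILE_LINK = re.compile(r'href="[^"]+\.(?:csv|json|xlsx?|xml)"')


def detect_platform(html: str) -> Optional[str]:
    """Problem 1: return the name of a recognized platform, or None."""
    text = html.lower()
    for name, markers in PLATFORM_MARKERS.items():
        if any(marker in text for marker in markers):
            return name
    return None


def extract_features(html: str) -> List[float]:
    """Turn raw HTML into a fixed-length numeric feature vector."""
    text = html.lower()
    feats = [1.0 if term in text else 0.0 for term in DATA_TERMS]
    # Number of direct file links, capped at 10 and scaled to [0, 1].
    feats.append(min(len(FILE_LINK.findall(text)), 10) / 10.0)
    # Whether a standardized platform was detected.
    feats.append(1.0 if detect_platform(html) else 0.0)
    return feats


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(weights: List[float], feats: List[float]) -> float:
    # The last weight is the bias term; zip stops at the last feature.
    return sum(w * x for w, x in zip(weights, feats)) + weights[-1]


def train(samples: List[Tuple[str, int]], epochs: int = 500, lr: float = 0.5) -> List[float]:
    """Problem 2: fit logistic regression with plain stochastic gradient descent.

    Each sample is (html, label), with label 1 for a data portal and 0 otherwise.
    """
    data = [(extract_features(html), label) for html, label in samples]
    weights = [0.0] * (len(data[0][0]) + 1)
    for _ in range(epochs):
        for feats, label in data:
            err = sigmoid(score(weights, feats)) - label
            for j, x in enumerate(feats):
                weights[j] -= lr * err * x
            weights[-1] -= lr * err
    return weights


def is_data_portal(weights: List[float], html: str, threshold: float = 0.5) -> bool:
    return sigmoid(score(weights, extract_features(html))) >= threshold


if __name__ == "__main__":
    # Toy, made-up pages for illustration only; not data from the study.
    toy = [
        ('<html>CKAN dataset <a href="a.csv">csv</a> license metadata api</html>', 1),
        ('<html>Open data portal: download dataset <a href="b.json">json</a> license</html>', 1),
        ("<html>Welcome to our bakery. Fresh bread daily.</html>", 0),
        ("<html>Contact us for news and events.</html>", 0),
    ]
    weights = train(toy)
    page = "<html>Browse each dataset and download csv files via the api</html>"
    print(detect_platform(page), is_data_portal(weights, page))
```

Logistic regression is used here only because its weights are easy to inspect; any classifier trained on labeled portal and non-portal pages could take its place, and a real survey would need far richer features and a labeled list of portals such as the one the study describes.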
This article is indexed in ScienceDirect and other databases.