Towards an automated method to assess data portals in the deep web
Authors: Andreiwid Sheffer Correa, Raul Mendes de Souza, Flavio Soares Correa da Silva
Institution: 1. Federal Institute of Education, Science and Technology of Sao Paulo - IFSP, Rodovia D. Pedro I (SP-65), Km 143,6 Campinas, Sao Paulo (SP) CEP 13069-901, Brazil; 2. Institute of Mathematics and Statistics, University of Sao Paulo, Brazil
Abstract:The rising number of data portals has been increasing demand for new techniques to assess data openness in an automated manner. Some methods have emerged that presuppose well-organized data catalogs, the availability of API interfaces and natively exposed metadata. However, many data portals, particularly those of local governments, appear to be misimplemented and developed with the classic website model in mind, which provides access to data only through user interaction with web forms. Data in such portals resides in the hidden part of the web, as it is dynamically produced only in response to direct requests. This paper proposes an automated method for assessing government-related data in the deep web on the basis of compliance with open data principles and requirements. To validate our method, we apply it in an experiment using the government websites of the 27 Brazilian capitals. The method is fully carried out for 22 of the capitals' websites, resulting in the analysis of 5.6 million government web pages. The results indicate that the keyword search approach utilized in the method, along with the checking of web pages for multifield web forms, is effective for identifying deep web data sources, as 1.5% of web pages with potential government data that are analyzed are found to contain data stored in the deep web. This work contributes to the development of a novel method that allows for the continuous checking and identification of government data from surface web data portals. In addition, this method can be scaled and repeated to assure the widest possible content coverage.
Keywords: Deep web, Data portals, Assessment, Open government data, Benchmarking
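
The abstract describes identifying deep web data sources by combining a keyword search with a check for multifield web forms. The following is a minimal sketch of that idea, not the authors' implementation: the keyword list, the two-field threshold, and all function and class names are assumptions made for illustration, since the abstract does not specify them.

```python
from html.parser import HTMLParser

# Hypothetical keyword list; the abstract does not give the actual terms.
KEYWORDS = ("budget", "expenditure", "procurement")

# Input types a user does not fill in; ignored when counting fields.
NON_DATA_INPUT_TYPES = {"hidden", "submit", "button", "reset", "image"}


class FormFieldCounter(HTMLParser):
    """Counts user-fillable fields in each <form> of a page."""

    def __init__(self):
        super().__init__()
        self.field_counts = []  # one entry per form encountered
        self._in_form = False

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self._in_form = True
            self.field_counts.append(0)
        elif self._in_form and tag in ("input", "select", "textarea"):
            # A missing or valueless type attribute defaults to "text".
            input_type = (dict(attrs).get("type") or "text").lower()
            if tag != "input" or input_type not in NON_DATA_INPUT_TYPES:
                self.field_counts[-1] += 1

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_form = False


def has_multifield_form(html, min_fields=2):
    """True if any form has at least min_fields fillable fields (assumed threshold)."""
    parser = FormFieldCounter()
    parser.feed(html)
    return any(count >= min_fields for count in parser.field_counts)


def is_deep_web_candidate(html, keywords=KEYWORDS):
    """Flags a page that mentions a keyword and contains a multifield form.

    The keyword match is a plain substring test on the raw, lowercased HTML.
    """
    text = html.lower()
    return any(k in text for k in keywords) and has_multifield_form(html)
```

A crawler could call is_deep_web_candidate on each fetched page and queue the flagged pages for closer inspection. A real pipeline would likely need more careful text extraction and form analysis than this substring test and field count provide.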
This article is indexed in ScienceDirect and other databases.