A Google Trends spatial clustering approach for a worldwide Twitter user geolocation
Institution: 1. Institute for Informatics and Telematics (IIT) of the National Research Council of Italy (CNR), Pisa, Italy; 2. ANIMA Sgr S.p.a., Corso Giuseppe Garibaldi 99, 20121 Milan, Italy; 3. ALGORITMI Centre, Department of Information Systems, University of Minho, 4804-533 Guimarães, Portugal
Abstract: User location data is valuable for diverse social media analytics. In this paper, we address the non-trivial task of estimating the city-level location of Twitter users worldwide, considering only their historical tweets. We propose a purely unsupervised approach based on a synthetic geographic sampling of Google Trends (GT) city-level frequencies of tweet nouns and three clustering algorithms. The approach was validated empirically using a recently collected dataset with 3,268 worldwide city-level locations of Twitter users, obtaining competitive results when compared with a state-of-the-art Word Distribution (WD) user location estimation method. The best overall results were achieved by the GT noun DBSCAN (GTN-DB) method, which is computationally fast and correctly predicts the ground truth locations of 15%, 23%, 39% and 58% of the users for tolerance distances of 250 km, 500 km, 1,000 km and 2,000 km, respectively.
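
The results in the abstract are reported as the share of users whose predicted location falls within a tolerance distance of the ground truth. The abstract does not say how distances were computed, so the following is only a minimal sketch of such a metric, assuming great-circle (haversine) distance and a mean Earth radius of 6,371 km. The function names `haversine_km` and `accuracy_within` and the sample coordinates are hypothetical and are not taken from the paper.

```python
import math

# Assumption: mean Earth radius in km; the paper does not state its distance formula.
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against tiny floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def accuracy_within(predicted, truth, tolerance_km):
    """Fraction of users whose predicted location is within tolerance_km of the truth."""
    if not truth or len(predicted) != len(truth):
        raise ValueError("predicted and truth must be non-empty and of equal length")
    hits = sum(
        1
        for (plat, plon), (tlat, tlon) in zip(predicted, truth)
        if haversine_km(plat, plon, tlat, tlon) <= tolerance_km
    )
    return hits / len(truth)


if __name__ == "__main__":
    # Hypothetical toy data: two users, (latitude, longitude) in degrees.
    predicted = [(45.4642, 9.1900), (43.7167, 10.4000)]
    truth = [(45.4642, 9.1900), (41.5503, -8.4200)]
    for tolerance in (250, 500, 1000, 2000):
        print(tolerance, accuracy_within(predicted, truth, tolerance))
```

Evaluating the same predictions at several tolerances, as in the loop above, mirrors how the abstract reports one accuracy figure per distance threshold.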
This article is indexed in ScienceDirect and other databases.