ReGR: Relation-aware graph reasoning framework for video question answering
Institutions:
1. Department of Information Management, Dongbei University of Finance & Economics, Dalian, Liaoning, China; 2. Department of Information Resources Management, Business School of Nankai University, Tianjin, China; 3. Center for Network Society Governance of Nankai University, Tianjin, China
1. College of Big Data and Intelligent Engineering, Yangtze Normal University, Chongqing 408100, China; 2. Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, China; 3. College of Computer and Information Science, Southwest University, Chongqing 400715, China
Abstract: As a challenging cross-modal task, video question answering (VideoQA) aims to fully understand video content and answer relevant questions. Mainstream approaches extract appearance and motion features to characterize videos separately, ignoring the interactions between the two streams and with the question; crucial semantic interaction details between visual objects are also overlooked. In this paper, we propose a novel Relation-aware Graph Reasoning (ReGR) framework for video question answering, which jointly models appearance–motion and location–semantic interaction relations between visual objects. For the interaction between appearance and motion, we design the Appearance–Motion Block, which is question-guided to capture the interdependence between the two streams. For the interaction between location and semantics, we design the Location–Semantic Block, which uses a Multi-Relation Graph Attention Network to capture geometric position and semantic interactions between objects. Finally, a question-driven Multi-Visual Fusion module produces more accurate multimodal representations. Extensive experiments on three benchmark datasets, TGIF-QA, MSVD-QA, and MSRVTT-QA, demonstrate the superiority of ReGR over state-of-the-art methods.
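The question-guided interaction between appearance and motion described above can be sketched in miniature: a question vector scores each modality, and a softmax over the two scores mixes the streams. This is a minimal illustration under assumed toy inputs, not the authors' implementation (the paper's actual block operates on learned deep features).

```python
import math

def dot(u, v):
    # Inner product of two equal-length feature vectors.
    return sum(a * b for a, b in zip(u, v))

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def question_guided_fusion(appearance, motion, question):
    """Blend appearance and motion features under question guidance.

    All three arguments are equal-length feature vectors (toy setting).
    Returns the fused vector and the per-modality attention weights.
    """
    # Score each modality by its alignment with the question.
    scores = [dot(question, appearance), dot(question, motion)]
    w_app, w_mot = softmax(scores)
    # Attention-weighted mixture of the two streams.
    fused = [w_app * a + w_mot * m for a, m in zip(appearance, motion)]
    return fused, (w_app, w_mot)

# A question aligned with the appearance stream shifts weight toward it.
fused, (w_app, w_mot) = question_guided_fusion(
    appearance=[1.0, 0.0], motion=[0.0, 1.0], question=[1.0, 0.0])
print(w_app > w_mot)
```

In the full model the modality scores would come from learned projections rather than raw dot products, but the mixing mechanism is the same attention pattern.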
Keywords: Video question answering; Cross-modal; Graph neural network; Interaction relation reasoning; Attention mechanism
This article is indexed in ScienceDirect and other databases.