HCA: Hierarchical Compare Aggregate model for question retrieval in community question answering
Institution: 1. Key Laboratory of Computer Vision and System (Ministry of Education), Tianjin University of Technology, Tianjin, China; 2. Institute of AI, Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology, China
Abstract: We address the problem of finding historical questions that are semantically equivalent or relevant to an input query question on community question-answering (CQA) sites. A main challenge for this task is that questions are often long and contain peripheral information alongside the question's main goal. To address this, we propose an end-to-end Hierarchical Compare Aggregate (HCA) model that handles the problem without any task-specific features. We first split questions into sentences and compare every sentence pair of the two questions using a proposed Word-Level Compare-Aggregate model (WLCA-model); the comparison results are then aggregated by a proposed Sentence-Level Compare-Aggregate model to make the final decision. To handle the insufficient-training-data problem, we propose a sequential transfer learning approach that pre-trains the WLCA-model on a large paraphrase detection dataset. Our experiments on two editions of the SemEval benchmark datasets and the domain-specific AskUbuntu dataset show that our model outperforms the state-of-the-art models.
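The hierarchical compare-aggregate idea described above can be sketched with plain Python. This is not the paper's WLCA/SLCA neural networks: the word-level comparison is replaced by a bag-of-words cosine similarity, and the sentence-level aggregation by a best-match average, so that peripheral sentences contribute less to the final score. All function names and the toy questions are illustrative assumptions.

```python
# Hierarchical compare-aggregate sketch (illustrative, not the paper's model):
# split each question into sentences, compare every sentence pair, then
# aggregate the pairwise scores into one question-level similarity.
import math
from collections import Counter

def sentence_similarity(s1: str, s2: str) -> float:
    """Word-level comparison stand-in: cosine similarity of term counts."""
    c1, c2 = Counter(s1.lower().split()), Counter(s2.lower().split())
    overlap = sum(c1[w] * c2[w] for w in c1)
    norm = (math.sqrt(sum(v * v for v in c1.values()))
            * math.sqrt(sum(v * v for v in c2.values())))
    return overlap / norm if norm else 0.0

def question_similarity(q1: str, q2: str) -> float:
    """Sentence-level aggregation stand-in: match each sentence of q1 to its
    best counterpart in q2, then average the best-match scores."""
    sents1 = [s for s in q1.split(".") if s.strip()]
    sents2 = [s for s in q2.split(".") if s.strip()]
    best = [max(sentence_similarity(a, b) for b in sents2) for a in sents1]
    return sum(best) / len(best)

q_a = "My wifi stops working after suspend. I am running the latest Ubuntu."
q_b = "Wireless stops working after suspend. How can I restart the driver."
print(round(question_similarity(q_a, q_b), 3))
```

In the actual model, the learned WLCA-model plays the role of `sentence_similarity` and a learned sentence-level network replaces the best-match averaging, but the two-level structure is the same.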
This article is indexed by ScienceDirect and other databases.