HClaimE: A tool for identifying health claims in health news headlines
Authors: Shi Yuan, Bei Yu
Institution: 1. School of Economics and Management, Beihang University, Beijing 100191, China; 2. School of Information Studies, Syracuse University, Syracuse, NY 13244, USA
Abstract: This study tackles the problem of extracting health claims from health research news headlines so that their veracity can be checked. A health claim can be formally defined as a triplet consisting of an independent variable (IV: what is being manipulated), a dependent variable (DV: what is being measured), and the relation between the two. In this study, we develop HClaimE, an information extraction tool for identifying health claims in news headlines. Unlike existing open information extraction (OpenIE) systems, which rely on verbs as relation indicators, HClaimE focuses on finding relations between nouns and draws on the linguistic characteristics of news headlines. HClaimE uses a Naïve Bayes classifier that combines syntactic and lexical features to identify IV and DV nouns, and recognizes relations between IV and DV through a rule-based method. We conducted an evaluation on a set of health news headlines from ScienceDaily.com, and the results show that HClaimE outperforms current OpenIE systems: the F-measure for identifying headlines without health claims is 0.60, and that for extracting IV-relation-DV triplets is 0.69. Our study shows that nouns can provide more clues than verbs for identifying health claims in news headlines. It also shows that dependency relations and bag-of-words features can distinguish IV-DV noun pairs from other noun pairs. In practice, HClaimE can serve as a tool for identifying health claims in news headlines, which can then be compared against authoritative health claims to assess their veracity. Given the linguistic similarity between health claims and other causal claims (e.g., the impact of pollution on the environment), HClaimE may also be applicable to extracting claims in other domains.
Keywords: Health claim; Syntactic feature; Lexical feature; Information quality; Information extraction; Natural language processing
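
To make the abstract's two central ideas concrete, the following is a minimal sketch, not HClaimE's actual implementation. It represents a health claim as an IV-relation-DV triplet and trains a small Naïve Bayes classifier with add-one smoothing to separate IV-DV noun pairs from other noun pairs. The class names, the feature strings (`word=...` for bag-of-words, `dep=...` for dependency relations), the labels, the example claim, and the four training rows are all hypothetical and invented for illustration; the abstract does not specify the real feature set or data.

```python
import math
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class HealthClaim:
    """A health claim as a triplet: IV, relation, DV."""
    iv: str        # independent variable: what is being manipulated
    relation: str  # the relation linking IV to DV
    dv: str        # dependent variable: what is being measured


class TinyNaiveBayes:
    """Multinomial Naive Bayes over string features (hypothetical helper)."""

    def __init__(self):
        self.class_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, samples, labels):
        for feats, label in zip(samples, labels):
            self.class_counts[label] += 1
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, feats):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Log prior plus add-one smoothed log likelihood of each feature.
            score = math.log(count / total)
            denom = sum(self.feature_counts[label].values()) + len(self.vocab)
            for f in feats:
                score += math.log((self.feature_counts[label][f] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy data: lexical (word=) and syntactic (dep=) features per noun pair.
train_samples = [
    ["word=linked", "word=to", "dep=nmod"],
    ["word=raises", "word=risk", "dep=nmod"],
    ["word=study", "word=finds", "dep=compound"],
    ["word=researchers", "word=say", "dep=compound"],
]
train_labels = ["iv_dv", "iv_dv", "other", "other"]

model = TinyNaiveBayes().fit(train_samples, train_labels)

# A made-up claim, shown only to illustrate the triplet shape.
claim = HealthClaim(iv="coffee", relation="linked to", dv="longer life")
```

Calling `model.predict(["word=linked", "dep=nmod"])` on this toy data classifies the pair as an IV-DV pair. In a real pipeline the dependency features would come from a syntactic parser and the relation would be extracted by rules, neither of which is shown here.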
This article is indexed in ScienceDirect and other databases.