Multi-lingual opinion mining on YouTube
Abstract:In order to successfully apply opinion mining (OM) to the large amounts of user-generated content produced every day, we need robust models that handle noisy input well yet can easily be adapted to a new domain or language. Here we focus on opinion mining for YouTube by (i) modeling classifiers that predict the type of a comment and its polarity, while distinguishing whether the polarity is directed towards the product or the video; (ii) proposing a robust shallow syntactic structure (STRUCT) that adapts well when tested across domains; and (iii) evaluating the effectiveness of the proposed structure in two languages, English and Italian. We rely on tree kernels to automatically extract and learn features with better generalization power than traditionally used bag-of-words models. Our extensive empirical evaluation shows that (i) STRUCT outperforms the bag-of-words model within the same domain (up to 2.6% and 3% absolute improvement for Italian and English, respectively); (ii) it is particularly useful when tested across domains (more than 4% absolute improvement for both languages), especially when little training data is available (up to 10% absolute improvement); and (iii) the proposed structure is also effective in a lower-resource language scenario, where only less accurate linguistic processing tools are available.
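The following is a minimal, illustrative sketch, not the authors' implementation, contrasting the two representations compared in the abstract: a bag-of-words polarity classifier and a shallow syntactic tree serialized in the bracketed format consumed by tree-kernel SVM packages such as SVM-light-TK. The toy comments and labels, the choice of scikit-learn and NLTK, and the flat chunk-free tree are assumptions for illustration; the paper's STRUCT representation is richer (it also encodes chunks and lexicon information).

```python
# Illustrative sketch (not the authors' code) of bag-of-words vs. a shallow
# syntactic tree representation for YouTube comment polarity classification.
# Assumptions: scikit-learn and nltk are installed; data and tree layout are toys.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

# --- Bag-of-words baseline: predict comment polarity ---
comments = [
    "love this phone, the camera is amazing",
    "the video is boring and way too long",
    "great review, very helpful",
    "worst product I have ever bought",
]
polarity = ["positive", "negative", "positive", "negative"]  # toy labels

bow_clf = make_pipeline(CountVectorizer(ngram_range=(1, 2)), LinearSVC())
bow_clf.fit(comments, polarity)
print(bow_clf.predict(["this camera is great"]))

# --- Shallow syntactic structure: serialize a POS-tagged comment as a flat
# bracketed tree, the kind of input a tree-kernel learner (e.g. SVM-light-TK)
# operates on; simplified relative to the paper's STRUCT ---
import nltk  # requires: nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')

def shallow_tree(comment: str) -> str:
    tokens = nltk.word_tokenize(comment.lower())
    tagged = nltk.pos_tag(tokens)
    # (ROOT (POS word) (POS word) ...) -- flat and therefore tolerant to noisy input
    nodes = " ".join(f"({pos} {word})" for word, pos in tagged)
    return f"(ROOT {nodes})"

print(shallow_tree("the camera is amazing"))
# e.g. (ROOT (DT the) (NN camera) (VBZ is) (JJ amazing))
```

The tree kernel then measures similarity between two comments by counting shared tree fragments, which is what gives the structure better generalization than surface n-grams when the domain or language changes.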
Keywords:Natural Language Processing  Opinion mining  Social media