Unsupervised word sense disambiguation for Korean through the acyclic weighted digraph using corpus and dictionary
Authors: Yeohoon Yoon, Choong-Nyoung Seon, Songwook Lee, Jungyun Seo
Institution: 1. Department of Computer Science, Sogang University, 1 Sinsu-dong, Mapo-gu, Seoul, Korea; 2. Division of Computer Engineering, Dongseo University, San 69-1 Jurye-dong, Sasang-gu, Busan 617-716, Korea
Abstract: Word sense disambiguation (WSD) is the task of assigning the most appropriate sense to a polysemous word according to its context. We present a method for automatic WSD that uses only two resources: a raw text corpus and a machine-readable dictionary (MRD). The system learns a similarity matrix between word pairs from the unlabeled corpus and uses it to derive vector representations of the sense definitions in the MRD. To disambiguate all polysemous words in a sentence, the system constructs a separate acyclic weighted digraph (AWD) for each occurrence of a polysemous word. The structure of each AWD takes into account the senses of the context words that co-occur with the target word in the sentence. Once the AWD for a polysemous word is built, we search for its optimal path using the Viterbi algorithm and assign to the target word the sense that lies on that path. In experiments, our system achieves 76.4% accuracy on semantically ambiguous Korean words.
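The optimal-path search described in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify how the AWD is laid out or how its edges are weighted, so everything here is an assumption made for illustration: each word in the sentence forms one layer of candidate senses, each sense carries a small definition vector, edge weights are the cosine similarity between senses in adjacent layers, and the best path maximizes the sum of edge weights. The function names `cosine` and `best_sense_path` and the toy sense labels are hypothetical, not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def best_sense_path(layers):
    """Viterbi-style search over a layered acyclic weighted digraph.

    layers: list of dicts, one per word in sentence order, each mapping a
            sense label to its (assumed) definition vector.
    Returns (score, path), where path holds one sense label per layer and
    score is the sum of edge weights along that path.
    """
    prev_layer = layers[0]
    # best[s] = (score, path) of the best path ending at sense s
    best = {sense: (0.0, [sense]) for sense in prev_layer}
    for layer in layers[1:]:
        new_best = {}
        for sense, vec in layer.items():
            # Choose the predecessor sense that maximizes the accumulated score.
            score, path = max(
                ((best[p][0] + cosine(prev_layer[p], vec), best[p][1])
                 for p in prev_layer),
                key=lambda t: t[0],
            )
            new_best[sense] = (score, path + [sense])
        prev_layer, best = layer, new_best
    return max(best.values(), key=lambda t: t[0])


# Toy input: hypothetical senses and vectors, not data from the paper.
layers = [
    {"bank/finance": [1.0, 0.0], "bank/river": [0.0, 1.0]},
    {"money": [1.0, 0.0]},
    {"deposit/finance": [1.0, 0.1], "deposit/geology": [0.1, 1.0]},
]
score, path = best_sense_path(layers)
print(path)
```

Because each layer only needs the best scores of the previous layer, the search is linear in the number of words and quadratic in the number of senses per word, which is the usual appeal of Viterbi decoding over enumerating every sense combination.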
Keywords: Acyclic weighted digraph (AWD); Hidden Markov model (HMM); Machine-readable dictionary (MRD); Part of speech (POS); Word sense disambiguation (WSD)
This article is indexed in ScienceDirect and other databases.