

Automatic extraction of bilingual word pairs using inductive chain learning in various languages
Authors: Hiroshi Echizen-ya, Kenji Araki, Yoshio Momouchi
Institutions: 1. Department of Electronics and Information Engineering, Hokkai-Gakuen University, South-26 West-11, Chuo-ku, Sapporo 064-0926, Japan; 2. Graduate School of Information Science and Technology, Hokkaido University, Kita-14, Nishi-9, Kita-ku, Sapporo 060-0814, Japan
Abstract: In this paper, we propose a new learning method for extracting bilingual word pairs from parallel corpora in various languages. Cross-language information retrieval systems must handle many languages, so automatically extracting bilingual word pairs from parallel corpora in various languages is important. However, previous statistical methods are insufficient because of the sparse data problem. Our learning method automatically acquires rules that are effective against the sparse data problem, using only parallel corpora and requiring no prior bilingual resources (e.g., a bilingual dictionary or a machine translation system). We call this learning method Inductive Chain Learning (ICL). Moreover, a system using ICL can extract bilingual word pairs even from bilingual sentence pairs in which the grammatical structure of the source language differs from that of the target language, because the acquired rules contain information for coping with differing source and target word orders in local parts of the sentence pairs. Evaluation experiments demonstrated that ICL improved the recall of systems based on several statistical approaches.
Keywords: Learning method; Bilingual word pairs; Various languages; Sparse data problem; Parallel corpora; Statistical approach
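To make the sparse data problem concrete, the following is a minimal sketch of one common statistical baseline: scoring candidate word pairs by the Dice coefficient of their sentence-level co-occurrence in a sentence-aligned parallel corpus. This is an assumption chosen for illustration; the abstract does not say which statistical approaches were evaluated, and this is not ICL. The function name `dice_pairs`, the `threshold` parameter, and the toy English-German corpus are all hypothetical.

```python
from collections import Counter
from itertools import product

# Hypothetical toy corpus: (source sentence, target sentence) pairs,
# already aligned at the sentence level.
corpus = [
    ("the cat sleeps", "die katze schlaeft"),
    ("the dog sleeps", "der hund schlaeft"),
    ("a cat eats", "eine katze frisst"),
]


def dice_pairs(corpus, threshold=0.5):
    """Illustrative statistical baseline (not ICL): score each
    (source word, target word) pair by the Dice coefficient
    2 * joint / (src_freq + tgt_freq), counted over sentence pairs."""
    src_freq = Counter()  # number of sentence pairs containing each source word
    tgt_freq = Counter()  # number of sentence pairs containing each target word
    joint = Counter()     # number of sentence pairs containing both words
    for src_sent, tgt_sent in corpus:
        src_words = set(src_sent.split())
        tgt_words = set(tgt_sent.split())
        src_freq.update(src_words)
        tgt_freq.update(tgt_words)
        joint.update(product(src_words, tgt_words))

    scores = {}
    for (s, t), c in joint.items():
        dice = 2 * c / (src_freq[s] + tgt_freq[t])
        if dice >= threshold:
            scores[(s, t)] = dice
    return scores


scores = dice_pairs(corpus)
print(scores[("cat", "katze")])  # frequent pair, correctly scored 1.0
# "dog" occurs only once, so it ties with every word in its target sentence.
print(scores[("dog", "hund")], scores[("dog", "der")])
```

In this toy run, "cat"/"katze" is found reliably because the pair co-occurs twice, but "dog" appears in only one sentence pair, so "hund" and the article "der" receive the same score. Such ties among low-frequency words are one way the sparse data problem lowers recall for purely statistical extraction.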
This article is indexed in ScienceDirect and other databases.