
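A Method for Constructing Parallel Corpora Based on a Pivot Language
Cite this article: SHAN Hua, ZHANG YuJie, ZHOU Wen, XU JinAn, CHEN YuFeng. A method for constructing parallel corpora based on a pivot language [J]. 情报工程, 2017, 3(3): 029-039.
Authors: SHAN Hua, ZHANG YuJie, ZHOU Wen, XU JinAn, CHEN YuFeng
Affiliation: School of Computer and Information Technology, Beijing Jiaotong University (all authors)
Funding: This work was supported by the National Natural Science Foundation of China (61370130, 61473294).
Abstract: The size of a parallel corpus plays an important role in improving the performance of statistical machine translation, but building parallel corpora manually is very costly. To address this problem, this paper proposes a low-cost, high-efficiency method for constructing parallel corpora that uses a pivot language as a bridge, drawing on existing machine translation technology and incorporating active learning, to build a large-scale, high-quality parallel corpus for the target language pair. Through a case study of constructing a Japanese-Chinese parallel corpus with English as the pivot language, and using mature phrase-based statistical machine translation technology, the paper describes a method for selecting good translations based on automatic translation evaluation, a corpus selection method based on active learning, and the iterative updating of the translation system together with evaluation experiments. The experimental results show that the proposed method can rapidly construct a Japanese-Chinese parallel corpus and effectively improve the performance of the Japanese-Chinese translation system.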

Keywords: pivot language, machine translation, parallel corpus, active learning

Approach of Constructing Parallel Corpus Based on Pivot Language
Authors: SHAN Hua, ZHANG YuJie, ZHOU Wen, XU JinAn and CHEN YuFeng
Institution: School of Computer and Information Technology, Beijing Jiaotong University (all authors)
Abstract: A large-scale parallel corpus plays an important role in improving the performance of machine translation, but constructing one manually is expensive. This paper proposes a pivot-based approach for constructing a high-quality parallel corpus at low cost, combining existing machine translation technology with active learning. We describe a domain adaptation method based on active learning, a method for selecting good translations based on automatic translation evaluation, and iterative retraining of the translation system. We applied the approach to the construction of a Japanese-Chinese parallel corpus with English as the pivot and conducted evaluation experiments. The experimental results show that the proposed approach effectively yields a high-quality Japanese-Chinese parallel corpus and that the constructed corpus improves the performance of a Japanese-Chinese machine translation system.
Keywords: pivot language, machine translation, parallel corpus, active learning
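
The abstract gives only a high-level outline, so the following is a minimal sketch of how a pivot-based selection step might look, not the authors' implementation. It assumes an existing Japanese-English corpus whose English side is machine-translated into Chinese. The names `build_pivot_corpus`, `translate_en_zh`, `score`, and `threshold` are all hypothetical, as is the idea of routing rejected sentences to a pool for later selection.

```python
from typing import Callable, List, Sequence, Tuple


def build_pivot_corpus(
    ja_en_pairs: Sequence[Tuple[str, str]],
    translate_en_zh: Callable[[str], List[str]],  # hypothetical: returns candidate translations
    score: Callable[[str, str], float],           # hypothetical: automatic quality score
    threshold: float = 0.5,                       # assumed cutoff, not from the paper
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Pair each Japanese sentence with the best-scoring Chinese candidate.

    Returns (accepted, pending): accepted holds (Japanese, Chinese) pairs whose
    best candidate reached the threshold; pending holds the (Japanese, English)
    pairs that did not, e.g. for later selection or manual translation.
    """
    accepted: List[Tuple[str, str]] = []
    pending: List[Tuple[str, str]] = []
    for ja, en in ja_en_pairs:
        candidates = translate_en_zh(en)
        # Score every candidate once and keep the highest-scoring one.
        scored = [(score(en, zh), zh) for zh in candidates]
        if scored:
            best_score, best_zh = max(scored, key=lambda item: item[0])
            if best_score >= threshold:
                accepted.append((ja, best_zh))
                continue
        pending.append((ja, en))
    return accepted, pending
```

In an iterative setup, the accepted pairs could be added to the training data of the Japanese-Chinese system before the next round, while the pending pool is revisited; how the paper actually schedules these rounds is not stated in the abstract.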