Character-Tagging Chinese Word Segmentation Based on Three Word Positions |
| |
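Cite this article: | 王希杰, 黄勇杰. Character-tagging Chinese word segmentation based on three word positions [J]. Journal of Anyang Normal University, 2013(5): 49-52. |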
| |
Authors: | 王希杰, 黄勇杰 |
| |
Affiliation: | School of Computer and Information Engineering, Anyang Normal University |
| |
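Abstract: | Using a statistical language model to recast Chinese word segmentation as a character-sequence tagging task has become the mainstream approach to Chinese word segmentation in recent years, but the long training time of the statistical language model has remained the biggest problem with this approach. This paper proposes a character-tagging method for Chinese word segmentation based on three word positions and runs comparative experiments on the corpora provided by Bakeoff-2005. The results show that the method achieves performance close to that of four-word-position character tagging while requiring markedly less model training time.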
|
Keywords: | Chinese word segmentation; three word positions; conditional random fields; feature template; context window |
Three Word-positions-based Tagging for Chinese Word Segmentation |
| |
Institution: | WANG Xi-jie, HONG Yong-jie (School of Computer and Information Engineering of Anyang Normal University, Anyang 455000, China) |
| |
Abstract: | In recent years, treating Chinese word segmentation as a sequence tagging problem with the help of a statistical language model has become the mainstream method, but its biggest problem is that training the model takes too long. A method based on three-word-position tagging is proposed for Chinese word segmentation, and comparative experiments are performed on the corpora of the Second International Chinese Word Segmentation Bakeoff (Bakeoff-2005). Experimental results show that the method achieves performance close to that of four-word-position tagging, while the training time is significantly reduced. |
| |
Keywords: | Chinese word segmentation; three word-positions; conditional random fields; feature template; context window |
This article is indexed in CNKI, VIP (维普), and other databases.
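
The abstract's central idea, recasting segmentation as per-character tagging, can be illustrated with a small sketch. The abstract does not list the tag inventories, so both are assumptions here: the four-position set is taken to be B/M/E/S (begin, middle, end, single) and the three-position set B/I/S (begin, non-initial, single). The function names `tag_word`, `words_to_tags`, and `tags_to_words` are hypothetical, and the conditional random field that would learn to predict the tags is not shown.

```python
def tag_word(word, positions):
    """Return one position tag per character of a single word.

    ASSUMPTION: positions=4 means B/M/E/S and positions=3 means B/I/S.
    The source abstract does not specify either tag set.
    """
    n = len(word)
    if n == 0:
        raise ValueError("empty word")
    if n == 1:
        return ["S"]
    if positions == 4:
        return ["B"] + ["M"] * (n - 2) + ["E"]
    if positions == 3:
        return ["B"] + ["I"] * (n - 1)
    raise ValueError("positions must be 3 or 4")


def words_to_tags(words, positions):
    """Flatten a segmented sentence into a character-level tag sequence."""
    return [tag for word in words for tag in tag_word(word, positions)]


def tags_to_words(chars, tags):
    """Rebuild words from characters and tags (works for both tag sets)."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        # B or S always opens a new word, so flush any unfinished one.
        if tag in ("B", "S") and current:
            words.append(current)
            current = ""
        current += ch
        # S and E explicitly close the word; under B/I/S a multi-character
        # word is closed by the next B/S or by the end of the sentence.
        if tag in ("S", "E"):
            words.append(current)
            current = ""
    if current:
        words.append(current)
    return words


if __name__ == "__main__":
    sentence = ["我", "喜欢", "自然语言"]
    chars = "".join(sentence)
    for positions in (3, 4):
        tags = words_to_tags(sentence, positions)
        print(positions, tags, tags_to_words(chars, tags))
```

Both tag sequences decode to the same segmentation, so under these assumptions the smaller set loses no boundary information. In a linear-chain CRF the cost of inference during training grows with the square of the number of labels, which is one plausible reason a three-tag model would train faster; the abstract reports the speed-up but does not state its cause.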