基于虚词停顿的中文分词消歧研究 (Research on Disambiguation in Chinese Word Segmentation Based on Function-Word Pauses)
Cite this article: Mai Fanjin, Li Dongpu. Research on Disambiguation in Chinese Word Segmentation Based on Function-Word Pauses[J]. Library and Information Service, 2010, 54(14): 121-125.
Authors: Mai Fanjin, Li Dongpu
Institutions: 1. Modern Education Technology Centre, Guilin University of Technology; 2. Electronic and Computer Department, Guilin University of Technology
Funding: Scientific Research Project of the Guangxi Education Department; Guangxi Graduate Education Innovation Program
Abstract: This paper proposes a model that resolves ambiguity in Chinese word segmentation by exploiting the pauses marked by function words (虚词). First, the text is roughly segmented against a function-word knowledge base, which divides it into pause-bounded phrases. Next, each phrase between pauses is segmented with bidirectional maximum matching, and the fragments on which the two directions disagree are extracted as ambiguous. Finally, the ambiguities are resolved with an N-Gram model combined with data-smoothing techniques. The whole procedure therefore comprises three stages: rough segmentation, fine segmentation, and disambiguation; the first two stages are sketched in code after this record. Test results show that the model effectively lowers the rate of erroneous cuts caused by word ambiguity.

Keywords: word segmentation; pause; maximum matching; N-Gram model; data smoothing
Received: 2009-11-23
Revised: 2010-05-04
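
As a reading aid, the following is a minimal Python sketch of the first two stages described in the abstract above: cutting the text at function words, then running forward and reverse maximum matching over each pause-bounded phrase. The function-word set, the dictionary, and every identifier here (FUNCTION_WORDS, DICTIONARY, rough_split, fmm, rmm) are illustrative toys, not the authors' actual knowledge base or implementation.

FUNCTION_WORDS = {"的", "了", "和", "在"}   # toy stand-in for the function-word knowledge base
DICTIONARY = {"基于", "虚词", "停顿", "中文", "分词", "消歧", "研究"}
MAX_WORD_LEN = 4

def rough_split(text):
    # Stage 1: cut the text at function words, yielding pause-bounded phrases.
    phrases, buf = [], ""
    for ch in text:
        if ch in FUNCTION_WORDS:
            if buf:
                phrases.append(buf)
                buf = ""
            phrases.append(ch)   # the function word itself marks the pause
        else:
            buf += ch
    if buf:
        phrases.append(buf)
    return phrases

def fmm(phrase):
    # Forward maximum matching: greedily take the longest dictionary word
    # starting at the current position; fall back to a single character.
    words, i = [], 0
    while i < len(phrase):
        for L in range(min(MAX_WORD_LEN, len(phrase) - i), 0, -1):
            if L == 1 or phrase[i:i + L] in DICTIONARY:
                words.append(phrase[i:i + L])
                i += L
                break
    return words

def rmm(phrase):
    # Reverse maximum matching: the same idea, scanning right to left.
    words, j = [], len(phrase)
    while j > 0:
        for L in range(min(MAX_WORD_LEN, j), 0, -1):
            if L == 1 or phrase[j - L:j] in DICTIONARY:
                words.insert(0, phrase[j - L:j])
                j -= L
                break
    return words

def rough_then_fine(text):
    # Stage 2: segment each pause-bounded phrase both ways; with a real
    # dictionary the two directions disagree on ambiguous fragments, and
    # those fragments are handed to stage 3 for disambiguation.
    for phrase in rough_split(text):
        f, r = fmm(phrase), rmm(phrase)
        yield phrase, f, r, f != r   # last field: needs disambiguation

for item in rough_then_fine("基于虚词停顿的中文分词消歧研究"):
    print(item)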

Research on Disambiguation in Chinese Word Segmentation Based on Function-Word Pauses
Mai Fanjin, Li Dongpu. Research on Disambiguation in Chinese Word Segmentation Based on Function-Word Pauses[J]. Library and Information Service, 2010, 54(14): 121-125.
Authors: Mai Fanjin, Li Dongpu
Institution: 1. Modern Education Technology Centre, Guilin University of Technology; 2. Electronic and Computer Department, Guilin University of Technology
Abstract: This paper puts forward a model that eliminates ambiguity in Chinese word segmentation based on the pauses marked by function words. First, the model segments the text roughly against a function-word knowledge base, splitting it into phrases bounded by pauses. Second, it segments each phrase with both forward maximum matching (MM) and reverse maximum matching (RMM) and extracts the fragments on which the two directions disagree as ambiguous. Finally, it resolves the ambiguity with an N-Gram model supported by data-smoothing techniques (a sketch of this stage follows the keywords below). The process thus divides into three parts: rough segmentation, fine segmentation, and disambiguation. Test results show that the model reduces the rate of segmentation errors caused by word ambiguity.
Keywords: word segmentation; pause; maximum matching; N-Gram model; data smoothing
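
The third stage, as the abstract describes it, scores the competing readings of an ambiguous fragment with an N-Gram model and data smoothing. The sketch below assumes a bigram model with add-one (Laplace) smoothing over toy counts; the paper does not specify in this abstract which N-Gram order or smoothing method it uses, so both choices, and all counts and identifiers, are illustrative.

import math
from collections import Counter

BIGRAMS = Counter({("中文", "分词"): 8, ("分词", "消歧"): 5})   # toy corpus counts
UNIGRAMS = Counter({"中文": 20, "分词": 15, "消歧": 6})
V = len(UNIGRAMS)   # vocabulary size used by the smoothing term

def bigram_logprob(prev, word):
    # Add-one (Laplace) smoothed log P(word | prev): unseen pairs keep a
    # small non-zero probability instead of zeroing out a candidate.
    return math.log((BIGRAMS[(prev, word)] + 1) / (UNIGRAMS[prev] + V))

def score(words):
    # Log-probability of a whole segmentation under the bigram model.
    return sum(bigram_logprob(p, w) for p, w in zip(words, words[1:]))

def disambiguate(mm_words, rmm_words):
    # Keep whichever of the two candidate segmentations the model prefers.
    return max((mm_words, rmm_words), key=score)

print(disambiguate(["中文", "分词"], ["中", "文分词"]))   # -> ['中文', '分词']

Whichever direction of maximum matching yields the higher-probability word sequence wins, which is why the smoothing matters: without it, a single unseen bigram would assign zero probability to an otherwise plausible candidate.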