基于虚词停顿的中文分词消歧研究 (Research on Disambiguation in Chinese Word Segmentation Based on Function-Word Pauses)
Cite this article: Mai Fanjin, Li Dongpu. Research on Disambiguation in Chinese Word Segmentation Based on Function-Word Pauses[J]. Library and Information Service, 2010, 54(14): 121-125.
Authors: Mai Fanjin, Li Dongpu
Institutions: 1. Modern Education Technology Centre, Guilin University of Technology; 2. Electronic and Computer Department, Guilin University of Technology
Funding: Scientific Research Project of the Guangxi Education Department; Guangxi Graduate Education Innovation Program
Abstract: This paper proposes a model that resolves ambiguity in Chinese word segmentation by exploiting the pauses marked by function words (虚词). First, the text is roughly segmented against a function-word knowledge base, which divides it into pause-bounded phrases. Next, each phrase between pauses is segmented with bidirectional maximum matching, and the fragments on which the two directions disagree are extracted as ambiguous. Finally, the ambiguities are resolved with an N-Gram model combined with data-smoothing techniques. The whole procedure therefore comprises three stages: rough segmentation, fine segmentation, and disambiguation; the first two stages are sketched in code after this record. Test results show that the model effectively lowers the rate of erroneous cuts caused by word ambiguity.

Keywords: word segmentation; pause; maximum matching; N-Gram model; data smoothing
Received: 2009-11-23
Revised: 2010-05-04
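
As a reading aid, the following is a minimal Python sketch of the first two stages described in the abstract above: cutting the text at function words, then running forward and reverse maximum matching over each pause-bounded phrase. The function-word set, the dictionary, and every identifier here (FUNCTION_WORDS, DICTIONARY, rough_split, fmm, rmm) are illustrative toys, not the authors' actual knowledge base or implementation.

FUNCTION_WORDS = {"的", "了", "和", "在"}   # toy stand-in for the function-word knowledge base
DICTIONARY = {"基于", "虚词", "停顿", "中文", "分词", "消歧", "研究"}
MAX_WORD_LEN = 4

def rough_split(text):
    # Stage 1: cut the text at function words, yielding pause-bounded phrases.
    phrases, buf = [], ""
    for ch in text:
        if ch in FUNCTION_WORDS:
            if buf:
                phrases.append(buf)
                buf = ""
            phrases.append(ch)   # the function word itself marks the pause
        else:
            buf += ch
    if buf:
        phrases.append(buf)
    return phrases

def fmm(phrase):
    # Forward maximum matching: greedily take the longest dictionary word
    # starting at the current position; fall back to a single character.
    words, i = [], 0
    while i < len(phrase):
        for L in range(min(MAX_WORD_LEN, len(phrase) - i), 0, -1):
            if L == 1 or phrase[i:i + L] in DICTIONARY:
                words.append(phrase[i:i + L])
                i += L
                break
    return words

def rmm(phrase):
    # Reverse maximum matching: the same idea, scanning right to left.
    words, j = [], len(phrase)
    while j > 0:
        for L in range(min(MAX_WORD_LEN, j), 0, -1):
            if L == 1 or phrase[j - L:j] in DICTIONARY:
                words.insert(0, phrase[j - L:j])
                j -= L
                break
    return words

def rough_then_fine(text):
    # Stage 2: segment each pause-bounded phrase both ways; with a real
    # dictionary the two directions disagree on ambiguous fragments, and
    # those fragments are handed to stage 3 for disambiguation.
    for phrase in rough_split(text):
        f, r = fmm(phrase), rmm(phrase)
        yield phrase, f, r, f != r   # last field: needs disambiguation

for item in rough_then_fine("基于虚词停顿的中文分词消歧研究"):
    print(item)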

Research on Disambiguation in Chinese Word Segmentation Based on Function-Word Pauses
Mai Fanjin, Li Dongpu. Research on Disambiguation in Chinese Word Segmentation Based on Function-Word Pauses[J]. Library and Information Service, 2010, 54(14): 121-125.
Authors: Mai Fanjin, Li Dongpu
Institution: 1. Modern Education Technology Centre, Guilin University of Technology; 2. Electronic and Computer Department, Guilin University of Technology
Abstract: This paper puts forward a model that eliminates ambiguity in Chinese word segmentation based on the pauses marked by function words. First, the model segments the text roughly against a function-word knowledge base, splitting it into phrases bounded by pauses. Second, it segments each phrase with both forward maximum matching (MM) and reverse maximum matching (RMM) and extracts the fragments on which the two directions disagree as ambiguous. Finally, it resolves the ambiguity with an N-Gram model supported by data-smoothing techniques (a sketch of this stage follows the keywords below). The process thus divides into three parts: rough segmentation, fine segmentation, and disambiguation. Test results show that the model reduces the rate of segmentation errors caused by word ambiguity.
Keywords: word segmentation; pause; maximum matching; N-Gram model; data smoothing
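
The third stage, as the abstract describes it, scores the competing readings of an ambiguous fragment with an N-Gram model and data smoothing. The sketch below assumes a bigram model with add-one (Laplace) smoothing over toy counts; the paper does not specify in this abstract which N-Gram order or smoothing method it uses, so both choices, and all counts and identifiers, are illustrative.

import math
from collections import Counter

BIGRAMS = Counter({("中文", "分词"): 8, ("分词", "消歧"): 5})   # toy corpus counts
UNIGRAMS = Counter({"中文": 20, "分词": 15, "消歧": 6})
V = len(UNIGRAMS)   # vocabulary size used by the smoothing term

def bigram_logprob(prev, word):
    # Add-one (Laplace) smoothed log P(word | prev): unseen pairs keep a
    # small non-zero probability instead of zeroing out a candidate.
    return math.log((BIGRAMS[(prev, word)] + 1) / (UNIGRAMS[prev] + V))

def score(words):
    # Log-probability of a whole segmentation under the bigram model.
    return sum(bigram_logprob(p, w) for p, w in zip(words, words[1:]))

def disambiguate(mm_words, rmm_words):
    # Keep whichever of the two candidate segmentations the model prefers.
    return max((mm_words, rmm_words), key=score)

print(disambiguate(["中文", "分词"], ["中", "文分词"]))   # -> ['中文', '分词']

Whichever direction of maximum matching yields the higher-probability word sequence wins, which is why the smoothing matters: without it, a single unseen bigram would assign zero probability to an otherwise plausible candidate.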