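Word Form Based Chinese Text Segmentation Approach
Cite this article: Fu Guohong, Wang Xiaolong. Word Form Based Chinese Text Segmentation Approach[J]. Journal of the China Society for Scientific and Technical Information, 1999, 18(3).
Authors: Fu Guohong, Wang Xiaolong
Affiliation: Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin 150001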
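Abstract: Building on an analysis of the general model of Chinese word segmentation, this paper introduces the concepts of word form probability, word integration coefficient, and word form lattice, proposes a word-form-based model for segmenting Chinese text, and implements a two-pass segmentation algorithm that combines backward dynamic programming with forward stack decoding. Because it incorporates word form probability and the word integration coefficient, the model not only captures the statistical regularities of word formation but also reflects, to some extent, the longest-word-first segmentation principle. Preliminary tests show that the method reaches a segmentation accuracy of 99.6% and a disambiguation rate of 93.44%.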

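Keywords: Chinese word segmentation; word form probability; integration coefficient; word form lattice
Revised manuscript received: June 8, 1998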

Word Form Based Chinese Text Segmentation Approach
Fu Guohong and Wang Xiaolong. Word Form Based Chinese Text Segmentation Approach[J]. Journal of the China Society for Scientific and Technical Information, 1999, 18(3).
Authors: Fu Guohong and Wang Xiaolong
Abstract: This paper introduces word form probability, a word form coefficient, and a word form lattice to construct a word-form-based segmentation model, and implements a two-pass scanning segmentation algorithm that combines backward dynamic programming with forward stack decoding. Because it incorporates the word form probability and coefficient, the model reflects not only the statistical regularities of word forms but also, to some extent, the longest-word-first principle. In a preliminary experiment, the method achieves a segmentation accuracy of 99.6% and a disambiguation rate of 93.44%.
Keywords: Chinese word segmentation; word form probability; word form coefficient; word form lattice
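
The abstract names a two-pass design (backward dynamic programming plus forward stack decoding over a lattice) without giving formulas, so the following is only a minimal sketch of that general pattern, not the authors' model. It assumes plain unigram log-probabilities as a stand-in for word form probability, leaves out the integration coefficient entirely, and uses a hypothetical toy lexicon (`LEXICON`) with made-up values. The names `build_lattice`, `backward_scores`, and `segment` are illustrative.

```python
import heapq
import math

# Hypothetical toy lexicon: word form -> probability (illustrative values only).
LEXICON = {
    "研究": 0.02, "研究生": 0.01, "生命": 0.015, "命": 0.002,
    "生": 0.003, "的": 0.05, "起源": 0.008,
}
MAX_WORD_LEN = max(len(w) for w in LEXICON)


def build_lattice(text):
    """edges[i] lists (j, word, log_prob) for every lexicon word text[i:j]."""
    edges = [[] for _ in text]
    for i in range(len(text)):
        for j in range(i + 1, min(len(text), i + MAX_WORD_LEN) + 1):
            word = text[i:j]
            if word in LEXICON:
                edges[i].append((j, word, math.log(LEXICON[word])))
    return edges


def backward_scores(text, edges):
    """Backward DP: best[i] is the highest log-probability of segmenting text[i:]."""
    n = len(text)
    best = [-math.inf] * (n + 1)
    best[n] = 0.0
    for i in range(n - 1, -1, -1):
        for j, _, logp in edges[i]:
            best[i] = max(best[i], logp + best[j])
    return best


def segment(text):
    """Forward stack decoding, ranking partial paths by prefix score + backward score."""
    edges = build_lattice(text)
    best = backward_scores(text, edges)
    if best[0] == -math.inf:
        return None  # the lexicon cannot cover the whole input
    # Heap entries: (negated estimated total, position, prefix score, words so far).
    stack = [(-best[0], 0, 0.0, [])]
    while stack:
        _, pos, prefix, words = heapq.heappop(stack)
        if pos == len(text):
            return words  # first complete hypothesis popped is the best one
        for j, word, logp in edges[pos]:
            if best[j] == -math.inf:
                continue  # dead end: the remaining text cannot be segmented
            score = prefix + logp
            heapq.heappush(stack, (-(score + best[j]), j, score, words + [word]))
    return None


print(segment("研究生命的起源"))  # ['研究', '生命', '的', '起源']
```

In this sketch the backward pass gives an exact estimate of the best completion from each position, so the forward pass can expand hypotheses best-first and stop at the first complete path. With the toy values above, the ambiguous prefix "研究生" loses to "研究" followed by "生命".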
This article is indexed in CNKI, Wanfang Data, and other databases.