

Fast dictionary mechanism for Chinese word segmentation

Authors:	WU Jing-Jing, JING Ji-Wu, NIE Xiao-Feng, WANG Ping-Jian

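Institutions:	1. Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei 230027, China; 2. State Key Laboratory of Information Security, Graduate University of the Chinese Academy of Sciences, Beijing 100049, China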

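Funding:	Supported by the National High Technology Research and Development Program of China (863 Program) (2006AA01Z454), the National Information Security 242 Program (2005B23), and the National Natural Science Foundation of China (60573015)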

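Abstract:	After studying the various segmentation mechanisms currently used in Chinese word segmentation, we observe that the key to fast Chinese word segmentation lies in identifying single-character and two-character words. Building on this observation, we propose a fast Chinese word segmentation mechanism, the double-character-and-long-word hash mechanism, which improves Chinese word segmentation by making lookups of single- and two-character words more efficient. Experiments show that the mechanism improves the efficiency of Chinese text segmentation.
Keywords:	real-time text processing; Chinese word segmentation; dictionary-based word segmentation; double-character-and-long-word hash mechanism
Received:	2008-10-16
Revised:	2009-04-21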

English abstract:	With the growth of global networking through the Internet, the number of articles written in Chinese and other native languages is increasing rapidly. Because these character-based languages lack explicit word separators, word segmentation is a prerequisite for processing them, and its performance therefore affects the performance of the whole system. In this paper, we propose a new lexicon-based solution to the Chinese word segmentation problem, named double-character-and-long-word-hash-indexing (DCLWHI). Compared with the traditional lexicon mechanism, DCLWHI improves the speed and efficiency of word segmentation without extra memory cost while achieving the same accuracy.

English keywords:	real-time text processing; Chinese word segmentation; lexicon mechanism; double-character-and-long-word-hash-indexing (DCLWHI)
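The abstract does not spell out how the double-character-and-long-word hash index is laid out. As a minimal sketch under stated assumptions, the Python below keys every dictionary word of two or more characters by its first two characters, so a single hash lookup per text position answers both "is this a two-character word?" and "which longer words could start here?". Segmentation uses forward maximum matching, which is an assumption and not the authors' confirmed strategy; the class name `DoubleCharLongWordIndex` and the method `segment` are hypothetical.

```python
from collections import defaultdict


class DoubleCharLongWordIndex:
    """Hypothetical index in the spirit of DCLWHI (layout assumed, not from the paper).

    Every word of length >= 2 is keyed by its first two characters:
      - is_double holds the two-character words themselves;
      - long_words maps a two-character prefix to the longer words
        sharing it, sorted longest first.
    Single-character words need no entry: any position without a longer
    match is emitted as one character.
    """

    def __init__(self, words):
        self.is_double = set()
        self.long_words = defaultdict(list)
        for w in words:
            if len(w) == 2:
                self.is_double.add(w)
            elif len(w) > 2:
                self.long_words[w[:2]].append(w)
        for prefix in self.long_words:
            self.long_words[prefix].sort(key=len, reverse=True)

    def segment(self, text):
        """Forward maximum matching with one prefix lookup per position."""
        result = []
        i = 0
        while i < len(text):
            key = text[i:i + 2]  # may be one character at the end of text
            match = None
            # Try longer words sharing the two-character prefix, longest first.
            # .get() does not insert missing keys into the defaultdict.
            for w in self.long_words.get(key, ()):
                if text.startswith(w, i):
                    match = w
                    break
            if match is None and key in self.is_double:
                match = key
            if match is None:
                match = text[i]  # fall back to a single character
            result.append(match)
            i += len(match)
        return result


if __name__ == "__main__":
    index = DoubleCharLongWordIndex(["中文", "分词", "中文分词", "词典", "机制"])
    print(index.segment("中文分词词典机制"))  # ['中文分词', '词典', '机制']
```

Keeping short and long words under the same two-character key is what makes the common case cheap in this sketch: most positions resolve with one set or dict lookup. It ignores memory layout and any tie-breaking rules the paper may use.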

