
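A fast dictionary mechanism for Chinese word segmentation
Authors: 吴晶晶 (WU Jing-Jing), 荆继武 (JING Ji-Wu), 聂晓峰 (NIE Xiao-Feng), 王平建 (WANG Ping-Jian)
Affiliations: 1. Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei 230027; 2. State Key Laboratory of Information Security, Graduate University of the Chinese Academy of Sciences, Beijing 100049
Funding: Supported by the National High Technology Research and Development Program of China (863 Program) (2006AA01Z454), the National Information Security 242 Program (2005B23), and the National Natural Science Foundation of China (60573015)
Abstract: A study of the word segmentation mechanisms currently used for Chinese shows that the key to fast Chinese word segmentation lies in recognizing single-character and two-character words. Based on this observation, we propose a fast Chinese word segmentation mechanism, the double-character-word and long-word hash mechanism, which improves on existing segmentation mechanisms by raising the lookup efficiency for single-character and two-character words. Experiments show that the mechanism improves the efficiency of Chinese text segmentation.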

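Keywords: real-time text processing; Chinese word segmentation; dictionary-based segmentation; double-character-word and long-word hash mechanism
Received: 2008-10-16
Revised: 2009-04-21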

Fast dictionary mechanism for Chinese word segmentation
Authors: WU Jing-Jing, JING Ji-Wu, NIE Xiao-Feng, WANG Ping-Jian
Institution: 1. Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei 230027, China; 2. State Key Laboratory of Information Security, Graduate University of the Chinese Academy of Sciences, Beijing 100049, China
Abstract: With the growth of global networking through the Internet, the number of articles in Chinese and other native languages is increasing rapidly. Because these character-based languages lack explicit word separators, word segmentation is a prerequisite for processing them and therefore affects the performance of the whole system. In this paper, we propose a new lexicon-based solution to the Chinese word segmentation problem, named double-character-and-long-word-hash-indexing (DCLWHI). Compared with traditional lexicon mechanisms, DCLWHI improves the speed and efficiency of word segmentation without extra memory cost while achieving the same accuracy.
Keywords: real-time text processing; Chinese word segmentation; lexicon mechanism; double-character-and-long-word-hash-indexing (DCLWHI)
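
The abstract does not give the layout of the DCLWHI lexicon, so the following is only a minimal sketch of the general idea it describes: one hash lookup on the first two characters decides whether a two-character word exists and gives access to the longer words that share that prefix. The class name `TwoCharHashLexicon`, the function `segment`, the tail-set layout, and the use of forward maximum matching are all assumptions made for illustration, not the authors' implementation.

```python
class TwoCharHashLexicon:
    """Hypothetical lexicon keyed by the first two characters of each word."""

    def __init__(self, words):
        # prefix -> [is_two_char_word, set of tails of longer words]
        self.pairs = {}
        self.max_len = 2
        for w in words:
            if len(w) < 2:
                continue  # single characters are the fallback token
            entry = self.pairs.setdefault(w[:2], [False, set()])
            if len(w) == 2:
                entry[0] = True
            else:
                entry[1].add(w[2:])
            self.max_len = max(self.max_len, len(w))

    def longest_match(self, text, i):
        """Length of the longest lexicon word starting at text[i], else 1."""
        entry = self.pairs.get(text[i:i + 2])  # one hash lookup
        if entry is None:
            return 1
        is_word, tails = entry
        # Try long words first, longest candidate down to three characters.
        for n in range(min(self.max_len, len(text) - i), 2, -1):
            if text[i + 2:i + n] in tails:
                return n
        return 2 if is_word else 1


def segment(text, lex):
    """Forward maximum matching (an assumed strategy) over the lexicon."""
    out, i = [], 0
    while i < len(text):
        n = lex.longest_match(text, i)
        out.append(text[i:i + n])
        i += n
    return out


lex = TwoCharHashLexicon(["中文", "分词", "中文分词", "词典", "机制"])
print(segment("中文分词词典机制", lex))
```

In this sketch, text positions whose two-character prefix is absent from the hash table are resolved with a single failed lookup, which is the case the abstract identifies as critical for speed.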