
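A Study of Recognition Methods for Word-Combination Unknown Words
Cite this article: ZHOU Lei, ZHU Qiao-ming. A Study of Recognition Methods for Word-Combination Unknown Words [J]. Journal of Changshu Institute of Technology, 2012(4): 110-114.
Authors: ZHOU Lei, ZHU Qiao-ming
Affiliations: 1. School of Computer Science and Engineering, Changshu Institute of Technology, Changshu, Jiangsu 215500
2. Jiangsu Provincial Key Laboratory for Computer Information Processing Technology, Suzhou, Jiangsu 215006
Funding: Natural Science Foundation of Jiangsu Province project "Personal Office Mobile Desktop Based on a Hypermedia Engine" (BK2003030); Natural Science Foundation of the Jiangsu Provincial Department of Education project "Research on Automatic Extraction of New Chinese Words and an Information Grid for Publishing Them" (04KKB320134)
Abstract: This paper presents a method for recognizing unknown (out-of-vocabulary) words by extracting word combinations. The method builds a bigram model over text that has already been segmented into fragments, then combines mutual information with rule-based filtering to extract unknown words (or phrases) formed by joining several words. In testing, precision was 84.71% and recall was 72.13%.

Keywords: unknown words; bigram model; mutual information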

Research on the Recognition Method of Unknown Chinese Words Based on Compound Words Recognition
ZHOU Lei, ZHU Qiao-ming. Research on the Recognition Method of Unknown Chinese Words Based on Compound Words Recognition [J]. Journal of Changshu Institute of Technology, 2012(4): 110-114.
Authors:ZHOU Lei  ZHU Qiao-ming
Institution: 1. School of Computer Science and Engineering, Changshu Institute of Technology, Changshu 215500, China; 2. Jiangsu Provincial Key Laboratory for Computer Information Processing Technology, Suzhou 215006, China
Abstract: This paper introduces a method for extracting unknown Chinese words based on compound-word recognition. The method builds a bigram model over text that has undergone fragment segmentation, then uses mutual information and rules to combine adjacent words into unknown words. On the open test sets, precision is 84.71% and recall is 72.13%.
Keywords:unknown Chinese words  bi-gram model  mutual information
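
The abstract names the ingredients of the method (a bigram model over segmented text, mutual information, and rule-based filtering) but not its details. The following is only a minimal sketch of how those pieces could fit together, not the authors' implementation. The function name `extract_candidates`, the thresholds `min_count` and `mi_threshold`, and the `stop_tokens` filter (a stand-in for the paper's unspecified rules) are all assumptions made for illustration, and the score used here is pointwise mutual information.

```python
import math
from collections import Counter


def extract_candidates(sentences, min_count=2, mi_threshold=1.0,
                       stop_tokens=frozenset()):
    """Return (w1, w2, mi) for adjacent word pairs that pass all filters.

    `sentences` is a list of token lists, i.e. text that has already been
    segmented. All parameters are hypothetical, not taken from the paper.
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        # Adjacent pairs within one sentence form the bigram model.
        bigrams.update(zip(tokens, tokens[1:]))

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    results = []
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # too rare to estimate reliably
        if w1 in stop_tokens or w2 in stop_tokens:
            continue  # assumed rule: drop pairs containing function words
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        mi = math.log2(p_xy / (p_x * p_y))  # pointwise mutual information
        if mi >= mi_threshold:
            results.append((w1, w2, mi))

    # Strongest associations first.
    results.sort(key=lambda item: item[2], reverse=True)
    return results


# Toy corpus, invented for this example.
sentences = [
    ["自然", "语言", "处理", "的", "方法"],
    ["自然", "语言", "的", "研究"],
    ["方法", "的", "研究"],
]
candidates = extract_candidates(sentences, stop_tokens={"的"})
for w1, w2, mi in candidates:
    print(w1 + w2, round(mi, 3))
```

On this toy corpus the only pair that survives is 自然 + 语言: it occurs twice, and the other repeated pair (的, 研究) is removed by the stop-token rule. In practice the thresholds and rules would need tuning on real data.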
This article is indexed in databases including CNKI and Wanfang Data.