
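Dictionary-Free Extraction of Mixed Chinese-English Terms and Its Algorithm
Cite this article: Jiang Shaohua, Dang Yanzhong. Dictionary-free extraction of mixed Chinese-English terms and its algorithm [J]. Journal of the China Society for Scientific and Technical Information (情报学报), 2006, 25(3): 301-305.
Authors: Jiang Shaohua, Dang Yanzhong
Affiliation: Institute of Systems Engineering, Dalian University of Technology, Dalian 116024
Abstract: Mixed Chinese-English terms can serve as auxiliary information for out-of-vocabulary word handling, weighting, and disambiguation, and help improve the quality of Chinese information processing. Based on the ideas of decreasing length and string-frequency statistics, this paper proposes a method for extracting mixed Chinese-English terms. The method needs no dictionary, no prior learning from a corpus, and no character index; instead it relies on statistical information to extract mixed Chinese-English terms whose support is greater than or equal to a threshold. The algorithm can effectively extract newly emerging general words, technical terms, and proper nouns from a text. Experiments show that the method is not restricted by the corpus and extracts mixed Chinese-English terms quickly and accurately.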

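Keywords: mixed Chinese-English terms; Chinese information processing; string frequency; longest string first
Revised manuscript received: August 2, 2005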

Research on Terms Combined with Chinese and English Extracting and Algorithm with no Thesaurus
Jiang Shaohua, Dang Yanzhong. Research on Terms Combined with Chinese and English Extracting and Algorithm with no Thesaurus [J]. Journal of the China Society for Scientific and Technical Information, 2006, 25(3): 301-305.
Authors:Jiang Shaohua  Dang Yanzhong
Abstract: Terms that mix Chinese and English can provide supplementary information for out-of-vocabulary word processing, term weighting, and word disambiguation, and can improve the quality of Chinese information processing. This paper presents an algorithm for extracting such terms, based on descending string length and string-frequency statistics. The algorithm extracts mixed Chinese-English terms automatically without a thesaurus, without acquiring inter-word probabilities in advance, and without a character index. It can effectively extract new general-purpose words, specialized terms, and proper nouns. The experimental results show that it works on arbitrary text and is fast and accurate.
Keywords:terms combined with Chinese and English  Chinese information processing  string frequency  matching longer string first  
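
The abstract names two ingredients: candidate strings are examined in order of decreasing length, and a candidate is kept when its frequency (support) reaches a threshold. The following Python sketch illustrates that idea only; it is not the authors' published algorithm. The tokenization rule (one CJK character or one run of ASCII letters/digits per unit), the function name `extract_terms`, and the parameters `min_support`, `max_len`, and `mixed_only` are all assumptions introduced here for illustration.

```python
import re
from collections import Counter

# Assumption: one unit is a single CJK character or a whole ASCII letter/digit run.
UNIT = re.compile(r"[\u4e00-\u9fff]|[A-Za-z0-9]+")
# Assumption: anything else (punctuation, whitespace) ends a segment.
BREAK = re.compile(r"[^\u4e00-\u9fffA-Za-z0-9]+")


def tokenize(text):
    """Split text into segments, each a list of units."""
    return [UNIT.findall(seg) for seg in BREAK.split(text) if seg]


def is_mixed(gram):
    """True if the candidate contains both an ASCII unit and a CJK unit."""
    return any(u.isascii() for u in gram) and any(not u.isascii() for u in gram)


def extract_terms(text, min_support=2, max_len=6, mixed_only=False):
    """Return {term: frequency}, trying longer candidates before shorter ones."""
    segments = tokenize(text)
    # Positions already claimed by a longer accepted term are skipped later.
    covered = [[False] * len(seg) for seg in segments]
    terms = {}
    for n in range(max_len, 1, -1):  # decreasing length, down to 2 units
        counts = Counter()
        positions = {}
        for si, seg in enumerate(segments):
            for i in range(len(seg) - n + 1):
                if any(covered[si][i:i + n]):
                    continue
                gram = tuple(seg[i:i + n])
                counts[gram] += 1
                positions.setdefault(gram, []).append((si, i))
        for gram, freq in counts.items():
            if freq < min_support:
                continue
            if mixed_only and not is_mixed(gram):
                continue
            terms["".join(gram)] = freq
            for si, i in positions[gram]:
                for k in range(i, i + n):
                    covered[si][k] = True
    return terms


sample = "IP地址很重要。请检查IP地址。X射线检查,X射线设备。"
print(extract_terms(sample, mixed_only=True))
```

Marking accepted positions as covered is one simple way to realize "longest string first": once a longer string has passed the threshold, its substrings are not counted again at those positions. Candidates of the same length may still overlap each other in this sketch, which a fuller implementation would need to resolve.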
This article is indexed in CNKI, Wanfang Data, and other databases.