
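Classification of Scientific and Technical Literature Based on the K-Nearest Neighbor Method
Cite this article: Bao Wen, Hu Qinghua, Yu Daren. Classification of Scientific and Technical Literature Based on the K-Nearest Neighbor Method [J]. 情报学报 (Journal of the China Society for Scientific and Technical Information), 2003, 22(4): 451-456.
Authors: Bao Wen, Hu Qinghua, Yu Daren
Affiliation: Harbin Institute of Technology, Harbin 150001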
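Abstract: This paper proposes a method for building a classifier for scientific and technical literature that works with small samples and requires no word segmentation. After analyzing the characteristics of scientific and technical literature, it proposes extracting the keywords of the documents as the classification feature terms, and building the classifier from term-frequency statistics over each document's title, keywords, and abstract, which are taken as its topical information. Finally, classification experiments were carried out with both the nearest neighbor decision rule and the K-nearest neighbor decision rule. The experiments show no significant difference in text classification performance between the Euclidean distance similarity measure and the cosine similarity measure, and the K-nearest neighbor decision rule outperforms the nearest neighbor decision rule.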

Keywords: text classification; nearest neighbor; K-nearest neighbor; similarity measure
Revised manuscript received: August 12, 2002

Academic Text Classification Based on K-Nearest Neighbor Method
Bao Wen, Hu Qinghua and Yu Daren. Academic Text Classification Based on K-Nearest Neighbor Method [J]. Journal of the China Society for Scientific and Technical Information, 2003, 22(4): 451-456.
Authors: Bao Wen, Hu Qinghua and Yu Daren
Abstract: A method for constructing classifiers for academic text from small samples, without word segmentation, is proposed. The paper analyzes the characteristics of academic text and introduces the keywords of academic texts as classification features, which avoids word segmentation during classification. Computing word frequencies over only the titles, keywords, and abstracts to build the word-document matrix is reported to reduce noise and improve classification. Tests showed no significant difference between similarity measures based on cosine and on Euclidean distance, while the K-nearest neighbor method classified better than the nearest neighbor method.
Keywords: text classification; nearest neighbor; K-nearest neighbor; similarity measure
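
The approach summarized in the abstract (keyword term frequencies as features, then a nearest neighbor or K-nearest neighbor decision under a Euclidean or cosine measure) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the substring-counting shortcut used to avoid word segmentation, the majority-vote rule, and the toy vocabulary and documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def term_frequency_vector(text, vocabulary):
    """Count each vocabulary keyword in `text` by substring matching.

    Substring counting is an assumed stand-in for "no word segmentation":
    the text is never tokenized, only searched for known keywords.
    """
    lowered = text.lower()
    return [lowered.count(term) for term in vocabulary]


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity; all-zero vectors get the maximum distance 1.0."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def knn_classify(query, training, k=3, distance=cosine_distance):
    """Majority vote among the k training vectors closest to `query`.

    `training` is a list of (vector, label) pairs. With k=1 this reduces
    to the nearest neighbor decision. Vote ties go to the label whose
    member appears first among the sorted neighbors.
    """
    neighbors = sorted(training, key=lambda item: distance(query, item[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical keyword vocabulary and labeled documents (invented toy data).
vocabulary = ["neural network", "classifier", "turbine", "combustion"]
labeled_docs = [
    ("Neural network classifier for text. Neural network training", "computing"),
    ("A classifier based on nearest neighbor", "computing"),
    ("Gas turbine combustion control", "energy"),
    ("Turbine blade combustion chamber design turbine", "energy"),
]
training = [(term_frequency_vector(text, vocabulary), label)
            for text, label in labeled_docs]

query = term_frequency_vector("Combustion instability in a gas turbine", vocabulary)
for name, dist in (("euclidean", euclidean_distance), ("cosine", cosine_distance)):
    for k in (1, 3):
        print(name, "k =", k, "->", knn_classify(query, training, k, dist))
```

On data this small both measures and both values of k agree, so the sketch only shows the mechanics; it says nothing about the accuracy comparison reported in the paper.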
This article is indexed in CNKI, Wanfang Data, and other databases.