
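A Research on Web Text Categorization Based on Semantic Similarity
Cite this article: Wang Jingting. A Research on Web Text Categorization Based on Semantic Similarity [J]. Researches in Library Science, 2012(9): 64-69.
Author: Wang Jingting
Affiliation: Department of Military Information Management, Shanghai Branch, Nanjing Institute of Politics
Abstract: Traditional Web text classification methods use the similarity of keywords in the text as the basis for classification. This discards much important semantic information, so the classification results are not accurate enough and the computational cost is high. To address this, the article proposes a Web text classification method based on semantic similarity. It uses a domain ontology to convert the keyword-based text feature vector into a matching set of semantic-concept feature vectors, defines a formula for computing the similarity of Web texts, and designs and implements a KNN algorithm based on semantic similarity. Experimental results show that the method represents and processes Web text at the level of semantic concepts, lowers the dimensionality of the text feature space, reduces the amount of computation, and improves classification precision.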

Keywords: ontology; semantic similarity; text classification; K-nearest neighbor

A Research on Web Text Categorization Based on Semantic Similarity
Wang Jingting. A Research on Web Text Categorization Based on Semantic Similarity [J]. Researches in Library Science, 2012(9): 64-69.
Authors: Wang Jingting
Institution: Department of Military Information Management, Shanghai Branch, Nanjing Institute of Politics
Abstract: Traditional Web text categorization methods are usually based on the similarity of keywords appearing in documents. Because these methods lose much semantic information, their results are not accurate enough and they often require a large amount of computation. A new method for Web text categorization based on semantic similarity is proposed. The method first uses a domain ontology to transform keyword-based vectors into the corresponding semantic-concept-based vectors. A formula is then given for calculating the similarity between documents, and a KNN algorithm based on semantic similarity is designed and implemented. The experimental results show that the method represents and processes Web text at the level of semantic concepts. It reduces the amount of computation by reducing the dimension of the feature space, and it improves classification accuracy.
Keywords: ontology; semantic similarity; text categorization; KNN
This article is indexed in CNKI and other databases.
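
The pipeline outlined in the abstract (keywords mapped to ontology concepts, a document similarity measure, then KNN) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the abstract does not give the paper's similarity formula, so plain cosine similarity over concept counts stands in for it, and the toy ontology `KEYWORD_TO_CONCEPT`, the sample `TRAINING_SET`, and all function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy "domain ontology": keyword -> concept (not from the paper).
KEYWORD_TO_CONCEPT = {
    "cpu": "hardware",
    "processor": "hardware",
    "gpu": "hardware",
    "compiler": "software",
    "interpreter": "software",
    "goal": "sport",
    "striker": "sport",
    "football": "sport",
}


def to_concept_vector(keywords):
    """Collapse a bag of keywords into a bag of ontology concepts.

    Keywords missing from the ontology are dropped; several keywords can
    share one concept, so the feature space shrinks.
    """
    return Counter(
        KEYWORD_TO_CONCEPT[k] for k in keywords if k in KEYWORD_TO_CONCEPT
    )


def cosine_similarity(a, b):
    """Cosine similarity of two sparse concept vectors (assumed measure)."""
    dot = sum(a[c] * b[c] for c in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(query_keywords, training_set, k=3):
    """Label a document by a similarity-weighted vote of its k nearest neighbours."""
    query = to_concept_vector(query_keywords)
    scored = [
        (cosine_similarity(query, to_concept_vector(keywords)), label)
        for keywords, label in training_set
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = Counter()
    for similarity, label in scored[:k]:
        votes[label] += similarity
    return votes.most_common(1)[0][0]


# Hypothetical labelled documents, each given as a keyword list.
TRAINING_SET = [
    (["cpu", "gpu"], "computing"),
    (["compiler", "processor"], "computing"),
    (["goal", "striker"], "sports"),
    (["football", "goal"], "sports"),
]

print(knn_classify(["interpreter"], TRAINING_SET, k=3))
```

In this toy setup the query keyword "interpreter" appears in no training document, so a keyword-level comparison would find no overlap, while the concept-level comparison still matches it to the document containing "compiler" through the shared "software" concept.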