A Survey on Text Representation and Similarity Calculation in Text Clustering



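Cite this article:	WU Su-hui, CHENG Ying, ZHENG Yan-ning, PAN Yun-tao. A Survey on Text Representation and Similarity Calculation in Text Clustering[J]. 情报科学 (Information Science), 2012(4): 622-627.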

Authors:	WU Su-hui, CHENG Ying, ZHENG Yan-ning, PAN Yun-tao

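Affiliations:	Department of Information Management, Nanjing University; Institute of Scientific and Technical Information of China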

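Funding:	National Social Science Fund of China (10CTQ027); Ministry of Education Humanities and Social Sciences Research Planning Fund (07JA870006); cooperative research project of the Institute of Scientific and Technical Information of China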

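Abstract:	Focusing on two basic problems in text clustering, text representation and similarity calculation, this paper classifies and gives a fairly comprehensive survey of the text representation methods and similarity calculation methods proposed in the literature. Text representation models are divided into the vector space model, language models, the suffix tree model, ontologies, and others. Similarity calculation methods are divided into those based on the vector space model, those based on phrases, and those based on ontologies.
Keywords:	text clustering; text representation; similarity calculation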
A Survey on Text Representation and Similarity Calculation in Text Clustering

WU Su-hui, CHENG Ying, ZHENG Yan-ning, PAN Yun-tao. A Survey on Text Representation and Similarity Calculation in Text Clustering[J]. Information Science, 2012(4): 622-627.

Authors:	WU Su-hui, CHENG Ying, ZHENG Yan-ning, PAN Yun-tao

Institution:	1. Department of Information Management, Nanjing University, Nanjing 210093, China; 2. Institute of Scientific and Technical Information of China, Beijing 100038, China

Abstract:	Text representation and similarity calculation are the two basic problems of text clustering. This paper classifies the existing text representation models and similarity calculation methods and surveys them in detail. Text representation models are classified as the vector space model (VSM), the language model, the suffix tree model, and ontology. Similarity calculation methods are classified into three categories: VSM-based, phrase-based, and ontology-based methods.

Keywords:	text clustering; text representation; similarity calculation
This article is indexed in CNKI and other databases.
