Text Representation Based on Sentence and Chinese Text Categorization


Citation:	He Wei, Wang Yu. Text Representation Based on Sentence and Chinese Text Categorization [J]. Journal of the China Society for Scientific and Technical Information (情报学报), 2009, 28(6).

Authors:	He Wei (何维), Wang Yu (王宇)

Affiliation:	School of Management, Dalian University of Technology, Dalian 116024

Funding:	Key Program of the National Natural Science Foundation of China

Abstract:	Text mining is a key technology in information resources management. The vector space model is a mature text representation model in text mining. It usually takes words or phrases as feature items, but these items carry little semantic information. To support content-based text mining, this paper raises the segmentation granularity from words or phrases to sentences: a text is represented as a bag of sentences, and text similarity is defined in terms of sentence similarity. To validate the model, a Chinese text classifier was built with the KNN algorithm. In the experiments, the bag-of-sentences KNN classifier achieved an average precision of 92.12% and a recall of 92.01%.

Keywords:	information resources management; bag of sentences; text representation; text categorization
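
The pipeline summarized in the abstract (bag of sentences, text similarity built from sentence similarity, KNN classification) can be illustrated with a minimal sketch. The abstract does not state which sentence similarity measure or aggregation rule the paper uses, so everything concrete here is an assumption made for illustration: sentence similarity is taken as Jaccard overlap of character bigrams, text similarity as the symmetric average of each sentence's best match in the other text, and the function names and toy corpus are hypothetical.

```python
import re
from collections import Counter


def split_sentences(text):
    """Split a text into its bag of sentences on sentence-ending punctuation."""
    parts = re.split(r"[。！？!?；;]+", text)
    return [p.strip() for p in parts if p.strip()]


def char_bigrams(sentence):
    """Character bigrams: an assumed stand-in for proper word segmentation."""
    s = re.sub(r"\s+", "", sentence)
    if len(s) < 2:
        return {s} if s else set()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def sentence_similarity(a, b):
    """Jaccard overlap of bigram sets (an assumption, not the paper's measure)."""
    ga, gb = char_bigrams(a), char_bigrams(b)
    if not ga or not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)


def text_similarity(doc_a, doc_b):
    """Symmetric average of each sentence's best match in the other text."""
    sa, sb = split_sentences(doc_a), split_sentences(doc_b)
    if not sa or not sb:
        return 0.0

    def directed(xs, ys):
        return sum(max(sentence_similarity(x, y) for y in ys) for x in xs) / len(xs)

    return 0.5 * (directed(sa, sb) + directed(sb, sa))


def knn_classify(query, labeled_docs, k=3):
    """Majority vote among the k labeled texts most similar to the query."""
    ranked = sorted(
        labeled_docs,
        key=lambda item: text_similarity(query, item[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy corpus, not data from the paper.
train = [
    ("股市今天大幅上涨。投资者情绪乐观。", "finance"),
    ("央行宣布下调利率。股市随即上涨。", "finance"),
    ("球队昨晚赢得比赛。球迷非常兴奋。", "sports"),
    ("比赛在主场进行。球队表现出色。", "sports"),
]
prediction = knn_classify("股市上涨，投资者很乐观。", train, k=3)
print(prediction)  # finance
```

The point of the sketch is the structure: the classifier never builds a term vector, and all comparison happens between whole sentences. Swapping in a different sentence similarity only requires replacing `sentence_similarity`.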
This article is indexed in Wanfang Data and other databases.
