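Research on Multi-Class News Text Classification Based on FastText

Cite this article:	ZHANG Chao-chao, LU Xin-ming. Research on Multi-Class News Text Classification Based on FastText[J]. 教育技术导刊 (Introduction of Educational Technology), 2020, 19(3): 44-47.

Authors:	ZHANG Chao-chao, LU Xin-ming

Affiliation:	College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao, Shandong 266590, China

Funding:	National Key Research and Development Program of China (2017YFC0804406); Shandong Province Key Research and Development Program (2016ZDJS02A05)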

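Abstract:	Amid rapidly growing volumes of data, text makes up a large proportion. Text classification, the most common text mining technique, can uncover valuable information in large amounts of disorganized text and is therefore of considerable importance. The foremost problem in text classification is how to shorten classification time while preserving accuracy. This paper proposes using the FastText classification model to learn word features to address this problem, together with stop-word processing on the dataset to reduce the effect of noisy data on the classifier. Experimental results show that the FastText text classification model achieves 96.11% accuracy on the dataset, nearly 4% higher than traditional models, and that the average processing time per text is 1.5 ms, about one third shorter.
Keywords:	text classification; word vectors; FastText; stop words; noisy data
Received:	2019-11-14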
Research on News Text Classification Based on FastText

ZHANG Chao-chao, LU Xin-ming. Research on News Text Classification Based on FastText[J]. Introduction of Educational Technology, 2020, 19(3): 44-47.

Authors:	ZHANG Chao-chao, LU Xin-ming

Institution:	College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China

Abstract:	As data volumes grow rapidly, text accounts for a large share of the total. Text classification, the most common text mining technique, is important for finding valuable information in large amounts of unorganized text. The primary goal in text classification is to reduce classification time while maintaining classification accuracy. To that end, this paper uses the FastText classification model to learn word features, and applies a stop-word processing method to the dataset to reduce the influence of noisy data on the model. Experimental results show that the FastText text classification model reaches an accuracy of 96.11%, nearly 4% higher than the traditional model, and that the average time the model spends processing each text is 1.5 ms, a reduction of about one third.

Keywords:	text classification; term vectors; FastText; stop words; noise data
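
The stop-word step described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the stop-word list, the sample sentences, and the helper names `remove_stop_words` and `to_fasttext_line` are hypothetical, and the regex tokenizer assumes space-delimited text (Chinese news text would first need a word segmenter). The `__label__` prefix is the default label convention of fastText's supervised input format.

```python
import re

# Hypothetical stop-word list for illustration; the abstract does not give the one used.
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is"}


def remove_stop_words(text, stop_words=STOP_WORDS):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in stop_words]


def to_fasttext_line(label, text, stop_words=STOP_WORDS):
    """Format one sample as '__label__<label> <filtered tokens>'."""
    tokens = remove_stop_words(text, stop_words)
    return f"__label__{label} " + " ".join(tokens)


# Made-up samples, not taken from the paper's dataset.
samples = [
    ("sports", "The team won the final in a dramatic match"),
    ("finance", "Shares of the bank fell on weak earnings"),
]
lines = [to_fasttext_line(label, text) for label, text in samples]
print("\n".join(lines))
```

Lines in this form, written one per row to a text file, are what the `fasttext` Python package accepts via `fasttext.train_supervised(input=path)`. Whether the authors used that package, and with which hyperparameters, is not stated in the abstract.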
