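Research on Multi-class Classification of News Text Based on FastText
Cite this article: ZHANG Chao-chao, LU Xin-ming. Research on Multi-class Classification of News Text Based on FastText[J]. Introduction of Educational Technology (教育技术导刊), 2020, 19(3): 44-47.
Authors: ZHANG Chao-chao, LU Xin-ming
Institution: College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao, Shandong 266590
Funding: National Key Research and Development Program of China (2017YFC0804406); Key Research and Development Program of Shandong Province (2016ZDJS02A05)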
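Abstract: Text accounts for a large share of the rapidly growing mass of data. Text classification, the most common text mining technique, can surface valuable information from large volumes of disorganized text and is therefore of considerable importance. The foremost problem in text classification is how to shorten classification time while preserving classification accuracy. To solve it, the FastText classification model is used to learn word features, and stop-word processing is applied to the dataset to reduce the influence of noisy data on the classifier. Experimental results show that FastText reaches an accuracy of 96.11% on the dataset, nearly 4% higher than conventional models, and that its average processing time per text is 1.5 ms, about one third shorter.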

Keywords: text classification; word vectors; FastText; stop words; noisy data
Received: 2019-11-14
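The abstract applies stop-word processing to the dataset to reduce noisy data before training. It does not specify the stop-word list or the segmentation tool. The following is therefore only a minimal sketch of that step. It assumes the text has already been segmented into whitespace-separated words (Chinese news text would first need a word segmenter). It also uses a small hypothetical `STOP_WORDS` set in place of a real stop-word list.

```python
from typing import Iterable, List, Set

# Hypothetical, illustrative stop-word set (not the paper's list). A real
# pipeline would load a full stop-word file for the corpus language.
STOP_WORDS: Set[str] = {"的", "了", "在", "是", "和", "the", "a", "an", "of", "in", "is", "and", "to"}


def remove_stop_words(tokens: Iterable[str], stop_words: Set[str] = STOP_WORDS) -> List[str]:
    """Return the tokens that are not stop words, preserving their order."""
    return [tok for tok in tokens if tok not in stop_words]


def preprocess(line: str, stop_words: Set[str] = STOP_WORDS) -> str:
    """Split an already word-segmented line on whitespace, drop stop words, rejoin."""
    return " ".join(remove_stop_words(line.split(), stop_words))


if __name__ == "__main__":
    print(preprocess("the team won the final of the league"))  # → team won final league
```

The idea is that high-frequency function words carry little information about a news category. Removing them means the classifier sees fewer uninformative features.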

Research on News Text Classification Based on FastText
ZHANG Chao-chao, LU Xin-ming. Research on News Text Classification Based on FastText[J]. Introduction of Educational Technology, 2020, 19(3): 44-47.
Authors: ZHANG Chao-chao, LU Xin-ming
Institution: College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China
Abstract: Textual data accounts for a large proportion of the rapidly growing volume of data. As the most common text mining technique, text classification is of great significance for finding valuable information in large amounts of messy text. The primary goal in text classification is to reduce classification time while maintaining classification accuracy. This paper therefore uses the FastText classification model to learn word features, and applies stop-word processing to reduce the influence of noisy data on the model. Experimental results show that the FastText model achieves an accuracy of 96.11%, nearly 4% higher than the traditional model. Its average processing time per text is 1.5 ms, a reduction of about one third.
Keywords: text classification; word vectors; FastText; stop words; noisy data
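The abstract uses FastText to learn word features for classification, but it does not give the architecture details or hyperparameters. What follows is a hedged, self-contained sketch of a fastText-style classifier in plain Python. The embeddings of a text's words and word bigrams are averaged into one vector, and a linear layer with a softmax maps that vector to class probabilities. Everything is trained with SGD on cross-entropy.

The class `TinyFastText`, its parameters, and the toy sports/finance data are hypothetical. For brevity, the sketch hashes every feature into buckets with `zlib.crc32` and uses a full softmax. The fastText library instead keeps a word vocabulary, hashes only the n-grams, and also offers hierarchical softmax.

```python
import math
import random
import zlib
from typing import Dict, List, Sequence, Tuple


class TinyFastText:
    """Hypothetical fastText-style classifier: average the embeddings of a
    text's words and bigrams, then apply a linear layer and a softmax."""

    def __init__(self, labels: Sequence[str], dim: int = 16, buckets: int = 2 ** 20,
                 lr: float = 0.5, seed: int = 0) -> None:
        self.labels = list(labels)
        self.dim, self.buckets, self.lr = dim, buckets, lr
        self.rng = random.Random(seed)
        self.E: Dict[int, List[float]] = {}          # bucket id -> input embedding, created lazily
        self.W = [[0.0] * dim for _ in self.labels]  # output weights, one row per label, zero-initialised

    def _features(self, text: str) -> List[int]:
        """Map unigrams and bigrams of a whitespace-tokenised text to hash buckets."""
        toks = text.split()
        grams = toks + [f"{a} {b}" for a, b in zip(toks, toks[1:])]
        return [zlib.crc32(g.encode("utf-8")) % self.buckets for g in grams]

    def _emb(self, i: int) -> List[float]:
        if i not in self.E:
            self.E[i] = [self.rng.uniform(-0.1, 0.1) for _ in range(self.dim)]
        return self.E[i]

    def _hidden(self, ids: List[int]) -> List[float]:
        """Average the feature embeddings into a single text vector."""
        h = [0.0] * self.dim
        for i in ids:
            for d, v in enumerate(self._emb(i)):
                h[d] += v
        n = max(len(ids), 1)
        return [x / n for x in h]

    def _probs(self, h: List[float]) -> List[float]:
        scores = [sum(w * x for w, x in zip(row, h)) for row in self.W]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        return [e / z for e in exps]

    def train(self, data: Sequence[Tuple[str, str]], epochs: int = 100) -> None:
        """Plain SGD on the softmax cross-entropy loss, one example at a time."""
        for _ in range(epochs):
            for text, label in data:
                ids = self._features(text)
                if not ids:
                    continue
                h = self._hidden(ids)
                p = self._probs(h)
                y = self.labels.index(label)
                grad_h = [0.0] * self.dim
                for k, row in enumerate(self.W):
                    g = p[k] - (1.0 if k == y else 0.0)  # dLoss/dScore_k
                    for d in range(self.dim):
                        grad_h[d] += g * row[d]  # read W before updating it
                        row[d] -= self.lr * g * h[d]
                for i in ids:  # h is a mean, so each feature receives grad_h / n
                    emb = self._emb(i)
                    for d in range(self.dim):
                        emb[d] -= self.lr * grad_h[d] / len(ids)

    def predict(self, text: str) -> Tuple[str, float]:
        """Return the most probable label and its probability."""
        p = self._probs(self._hidden(self._features(text)))
        k = max(range(len(p)), key=p.__getitem__)
        return self.labels[k], p[k]


if __name__ == "__main__":
    toy_data = [
        ("team wins league final", "sports"),
        ("striker scores twice in match", "sports"),
        ("coach praises team after match", "sports"),
        ("stock market closes higher", "finance"),
        ("central bank raises interest rates", "finance"),
        ("shares fall as market slides", "finance"),
    ]
    model = TinyFastText(["sports", "finance"])
    model.train(toy_data)
    print(model.predict("stock market slides"))
```

The averaging step keeps this kind of model cheap. Classifying a text costs one pass over its features plus one small matrix-vector product, with no convolution or recurrence. For real experiments, one would use the official fastText library. Its Python binding exposes this model through `fasttext.train_supervised`, with options such as `dim`, `wordNgrams` and `bucket`.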