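Analysis and Realization of Spam Filtering Based on Support Vector Machines
Cite this article: Chen Qing, Wang Pengming. Analysis and Realization of Spam Filtering Based on Support Vector Machines [J]. Science Mosaic (科技广场), 2008(3): 89-92.
Authors: Chen Qing, Wang Pengming
Affiliation: College of Computer Information Engineering, Jiangxi Normal University, Nanchang, Jiangxi 330022
Abstract: The growth of the Internet has gradually changed how people live, and e-mail has become popular because it is convenient and fast. At the same time, spam has spread across the network, occupying large amounts of mail-server storage and forcing users to spend considerable time deleting it. Research on automatic mail filtering is therefore important. Automatic filtering falls into two main approaches: rule-based and statistics-based. Among current statistical filters, commonly used methods such as Bayesian classification are founded on empirical risk minimization, and the resulting filters generalize poorly. The support vector machine (SVM) is a newer pattern recognition method developed from statistical learning theory, with distinctive advantages on small-sample, nonlinear, and high-dimensional pattern recognition problems. It takes generalization ability into account and also seeks the best result obtainable from limited information. This paper therefore applies SVM to mail filtering, and experiments show that it filters well.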

Keywords: spam filtering; support vector machine; statistical learning theory
Article ID: 1671-4792-(2008)3-0067-04

Analysis and Realization of Spam Filtering Based on Support Vector Machines (SVM)
Chen Qing, Wang Pengming. Analysis and Realization of Spam Filtering Based on Support Vector Machines (SVM) [J]. Science Mosaic, 2008(3): 89-92.
Authors: Chen Qing, Wang Pengming
Abstract: With the development of the Internet, e-mail has gradually gained favor for its convenience. Unfortunately, the spam that spreads across the network at the same time not only fills mail-server storage but also costs users considerable time to delete. It is therefore worthwhile to explore automated e-mail filtering. There are two major approaches to automated mail filtering: rule-based and probability-based. At present, most probability-based filters use the Naive Bayes (NB) algorithm or K-Nearest Neighbor (KNN), which rest on Empirical Risk Minimization, so their generalization performance is limited. The Support Vector Machine (SVM) is a newer machine learning method. It handles small-sample learning problems better by using Structural Risk Minimization in place of Empirical Risk Minimization. Moreover, through the kernel function idea, it transforms a problem that is nonlinear in the input space into a linear one in a feature space, which reduces algorithmic complexity. SVMs have become a focus of machine learning research because of their strong learning performance. In this article, an SVM is applied to e-mail filtering, and the experimental results show that the filter performs well.
Keywords: Spam Filtering; Support Vector Machines; Statistical Learning Theory
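
The abstract does not describe the authors' implementation, so the following is only a minimal sketch of the general idea of applying an SVM to mail filtering, under several assumptions: messages are represented as bag-of-words counts, the kernel is linear, there is no bias term, and training uses Pegasos-style stochastic subgradient descent on the regularized hinge loss. The training messages, the function names, and the parameters `lam` and `epochs` are all invented for illustration and are not taken from the paper.

```python
import random
from collections import Counter

# Toy training data, invented purely for illustration (not from the paper).
# Label +1 = spam, -1 = legitimate mail ("ham").
TRAIN = [
    ("win free money now", +1),
    ("claim your free prize today", +1),
    ("cheap pills limited offer", +1),
    ("meeting agenda for monday", -1),
    ("please review the attached report", -1),
    ("lunch with the project team", -1),
]

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()

def build_vocab(texts):
    """Map each distinct training word to a feature index."""
    vocab = {}
    for text in texts:
        for word in tokenize(text):
            vocab.setdefault(word, len(vocab))
    return vocab

def featurize(text, vocab):
    """Sparse bag-of-words vector {index: count}; unknown words are ignored."""
    counts = Counter(tokenize(text))
    return {vocab[w]: float(c) for w, c in counts.items() if w in vocab}

def score(w, x):
    """Linear decision value w . x for a sparse vector x."""
    return sum(w[j] * v for j, v in x.items())

def train_linear_svm(samples, labels, dim, lam=0.01, epochs=200, seed=0):
    """Minimize lam/2 * ||w||^2 + mean hinge loss by stochastic subgradient steps."""
    rng = random.Random(seed)
    w = [0.0] * dim
    order = list(range(len(samples)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)                # decaying step size
            x, y = samples[i], labels[i]
            violated = y * score(w, x) < 1.0     # inside the margin or misclassified
            shrink = 1.0 - eta * lam             # step on the L2 regularizer
            w = [shrink * wj for wj in w]
            if violated:                         # step on the hinge loss
                for j, v in x.items():
                    w[j] += eta * y * v
    return w

VOCAB = build_vocab(text for text, _ in TRAIN)
X = [featurize(text, VOCAB) for text, _ in TRAIN]
Y = [label for _, label in TRAIN]
W = train_linear_svm(X, Y, len(VOCAB))

def classify(text):
    """Return "spam" for a positive decision value, otherwise "ham"."""
    return "spam" if score(W, featurize(text, VOCAB)) > 0 else "ham"
```

Calling `classify("free prize offer inside")` or `classify("monday project meeting notes")` exercises the trained weights on unseen messages. A practical filter would need a real labeled corpus, better tokenization and feature selection, and possibly a nonlinear kernel as the abstract suggests; none of that is shown here.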
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.