Augmenting Naive Bayes Classifiers with Statistical Language Models
Authors: Fuchun Peng, Dale Schuurmans, Shaojun Wang
Institutions: 1. Center for Intelligent Information Retrieval, Department of Computer Science, University of Massachusetts at Amherst, 140 Governors Drive, Amherst, MA, USA, 01003
2. Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada, T6G 2E8
3. Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada, T6G 2E8
Abstract: We augment naive Bayes models with statistical n-gram language models to address shortcomings of the standard naive Bayes text classifier. The result is a generalized naive Bayes classifier which allows for a local Markov dependence among observations, a model we refer to as the Chain Augmented Naive (CAN) Bayes classifier. CAN models have two advantages over standard naive Bayes classifiers. First, they relax some of the independence assumptions of naive Bayes—allowing a local Markov chain dependence in the observed variables—while still permitting efficient inference and learning. Second, they permit straightforward application of sophisticated smoothing techniques from statistical language modeling, which allows one to obtain better parameter estimates than the standard Laplace smoothing used in naive Bayes classification. In this paper, we introduce CAN models and apply them to various text classification problems. To demonstrate the language independent and task independent nature of these classifiers, we present experimental results on several text classification problems—authorship attribution, text genre classification, and topic detection—in several languages—Greek, English, Japanese and Chinese. We then systematically study the key factors in the CAN model that can influence the classification performance, and analyze the strengths and weaknesses of the model.
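The core idea described in the abstract — replacing naive Bayes's conditional independence over characters or words with a per-class Markov chain (an n-gram language model), then classifying by maximum smoothed log-likelihood plus prior — can be sketched as below. This is a minimal illustration, not the authors' implementation: it uses a character bigram model with simple Laplace smoothing for brevity, whereas the paper advocates more sophisticated language-model smoothing techniques.

```python
from collections import defaultdict
import math

class ChainAugmentedNB:
    """Sketch of a CAN-style classifier: one character bigram model per class.

    Classifies a document d into argmax_c [ log P(c) + sum_i log P(ch_i | ch_{i-1}, c) ].
    Laplace smoothing is used here for simplicity; the paper uses stronger
    language-model smoothing (e.g., absolute discounting) in its place.
    """

    def __init__(self):
        self.bigram = {}              # class -> counts of (prev_char, char)
        self.context = {}             # class -> counts of prev_char
        self.prior = defaultdict(int) # class -> number of training documents
        self.vocab = set()            # observed characters (for smoothing)

    def fit(self, docs, labels):
        for text, y in zip(docs, labels):
            self.prior[y] += 1
            bg = self.bigram.setdefault(y, defaultdict(int))
            ctx = self.context.setdefault(y, defaultdict(int))
            padded = "^" + text       # "^" marks the document start
            for prev, ch in zip(padded, padded[1:]):
                bg[(prev, ch)] += 1
                ctx[prev] += 1
                self.vocab.add(ch)

    def _log_score(self, text, y):
        bg, ctx = self.bigram[y], self.context[y]
        v = len(self.vocab)
        score = math.log(self.prior[y] / sum(self.prior.values()))
        padded = "^" + text
        for prev, ch in zip(padded, padded[1:]):
            # Laplace-smoothed chain probability P(ch | prev, class)
            score += math.log((bg[(prev, ch)] + 1) / (ctx[prev] + v))
        return score

    def predict(self, text):
        return max(self.prior, key=lambda y: self._log_score(text, y))
```

Setting the chain order to zero recovers standard naive Bayes; the bigram dependence above is the "local Markov chain dependence in the observed variables" the abstract refers to, and it still trains and classifies in a single pass over the characters.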
This article is indexed in SpringerLink and other databases.