A Bayesian feature selection paradigm for text classification

Authors:	Guozhong Feng, Jianhua Guo, Bing-Yi Jing, Lizhu Hao

Institution:	1. Key Laboratory for Applied Statistics of MOE, Northeast Normal University, Changchun 130024, Jilin Province, China; 2. School of Mathematics and Statistics, Northeast Normal University, Changchun 130024, Jilin Province, China; 3. Department of Mathematics, Hong Kong University of Science and Technology, Hong Kong

Abstract:	The automated classification of texts into predefined categories has attracted growing interest, due to the increased availability of documents in digital form and the ensuing need to organize them. An important problem in text classification is feature selection, whose goals are to improve classification effectiveness, computational efficiency, or both. Because of category imbalance and feature sparsity in social text collections, filter methods may work poorly. In this paper, we perform feature selection during training, automatically selecting the best feature subset by learning the characteristics of the categories from a set of preclassified documents. We propose a generative probabilistic model that describes categories by distributions and handles feature selection by introducing a binary exclusion/inclusion latent vector, which is updated via an efficient Metropolis search. Real-life examples illustrate the effectiveness of the approach.

Keywords:	Bayesian feature selection; Metropolis search; Mixture model; Text classification
This article is indexed in ScienceDirect and other databases.
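
Illustrative sketch:	The abstract describes a binary exclusion/inclusion latent vector updated by a Metropolis search but gives no model details, so the following Python sketch is only a toy stand-in and not the authors' method. It assumes binary term-presence features, a Beta(a, b) prior on each Bernoulli parameter, and an independent prior inclusion probability per feature. An included feature gets one Bernoulli parameter per category, while an excluded feature shares a single parameter across categories. The names `feature_log_scores` and `metropolis_search` and all parameters are hypothetical.

```python
import math
import random


def log_beta(a, b):
    """Log of the Beta function, computed via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def feature_log_scores(X, y, a=1.0, b=1.0):
    """Per feature, return (log marginal if excluded, log marginal if included).

    Assumed toy model: excluded features share one Bernoulli parameter
    across categories; included features get one parameter per category.
    """
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        ones = sum(row[j] for row in X)
        excluded = log_beta(a + ones, b + len(X) - ones) - log_beta(a, b)
        included = 0.0
        for c in classes:
            col = [row[j] for row, label in zip(X, y) if label == c]
            c_ones = sum(col)
            included += log_beta(a + c_ones, b + len(col) - c_ones) - log_beta(a, b)
        scores.append((excluded, included))
    return scores


def metropolis_search(X, y, n_iter=2000, prior_inclusion=0.5, seed=0):
    """Single-flip Metropolis walk over the inclusion vector gamma.

    Returns the fraction of iterations in which each feature was included.
    """
    rng = random.Random(seed)
    scores = feature_log_scores(X, y)
    log_prior = (math.log(1.0 - prior_inclusion), math.log(prior_inclusion))
    gamma = [0] * len(scores)   # start with every feature excluded
    visits = [0] * len(scores)
    for _ in range(n_iter):
        j = rng.randrange(len(gamma))        # symmetric proposal: flip one bit
        old, new = gamma[j], 1 - gamma[j]
        log_ratio = (scores[j][new] + log_prior[new]) - (scores[j][old] + log_prior[old])
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            gamma[j] = new                   # accept the flip
        for k, g in enumerate(gamma):
            visits[k] += g
    return [v / n_iter for v in visits]


# Toy corpus: feature 0 separates the two categories, feature 1 does not.
X = [[0, i % 2] for i in range(10)] + [[1, i % 2] for i in range(10)]
y = [0] * 10 + [1] * 10
freq = metropolis_search(X, y)
print(freq)
```

Under these simplifying assumptions the posterior factorizes across features, so each inclusion probability could also be computed directly. The search loop is kept only to show the flip-and-accept mechanics that a richer model with dependent features would need.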
