
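Text categorization based on fuzzy classification rules tree
Cite this article: Guo Yuqin, Yuan Fang, Liu Haibo. Text categorization based on fuzzy classification rules tree [J]. Journal of Southeast University, 2008, 24(3): 339-342.
Authors: Guo Yuqin  Yuan Fang  Liu Haibo
Affiliations: [1] College of Mathematics and Computer Science, Hebei University, Baoding 071002; [2] Tianjin Branch of the People's Bank of China, Tianjin 300040
Funding: National Natural Science Foundation of China, Technology Research Project of Hebei Province, Research Plan of Education Office of Hebei Province
Abstract: Conventional text categorization methods based on association rules must traverse every rule in the classifier to classify a text, which makes classification very inefficient. To address this problem, a text categorization method based on the fuzzy classification rules tree (FCR-tree) is proposed. The rules in the classifier are stored as a tree; because the tree structure avoids storing repeated nodes, it saves storage space. Compared with ordinary classification rules, fuzzy classification association rules contain not only words but also the fuzzy sets corresponding to word frequencies, so the construction process and structure of an FCR-tree differ from those of an ordinary rule tree (CR-tree). To reduce the difficulty of building and traversing the FCR-tree, multiple k-FCR-trees are constructed. When a rule tree is searched, if the word in a node does not appear in the text to be classified, the sub-tree led by that node need not be searched, which greatly reduces the number of rules to be matched. Experiments show that the method is feasible and that, compared with classification by traversing the classifier, classification efficiency is clearly improved.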

Keywords: text categorization  fuzzy classification association rule  classification rules tree  fuzzy classification rules tree

Text categorization based on fuzzy classification rules tree
Guo Yuqin, Yuan Fang, Liu Haibo. Text categorization based on fuzzy classification rules tree [J]. Journal of Southeast University (English Edition), 2008, 24(3): 339-342.
Authors:Guo Yuqin  Yuan Fang  Liu Haibo
Institution: 1. College of Mathematics and Computer Science, Hebei University, Baoding 071002, China; 2. Tianjin Branch of the People's Bank of China, Tianjin 300040, China
Abstract: The conventional fuzzy class-association method scans the classifier repeatedly to classify new texts, which is inefficient. To address this problem, a new approach to text categorization based on the FCR-tree (fuzzy classification rules tree) is proposed. The compactness of the FCR-tree saves significant space when storing a large set of rules that share many repeated words. Compared with ordinary classification rules, fuzzy classification rules contain not only words but also the fuzzy sets corresponding to the frequencies of the words appearing in texts. Therefore, the construction of an FCR-tree and its structure differ from those of a CR-tree. To reduce the difficulty of FCR-tree construction and rule retrieval, multiple k-FCR-trees are built. When a new text is classified, the sub-trees led by words that do not appear in the text need not be searched, which reduces the number of rules that must be traversed. Experimental results show that the proposed approach is clearly more efficient than the conventional method.
Keywords:text categorization  fuzzy classification association rule  classification rules tree  fuzzy classification rules tree
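
The pruning idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `FCRTree`, `fuzzy_set`, and `classify`, the crisp "low/mid/high" frequency labels and their thresholds, and the rule of picking the matched rule with the highest confidence are all assumptions made for illustration. The sketch also uses a single tree and does not model the split into multiple k-FCR-trees.

```python
from dataclasses import dataclass, field


def fuzzy_set(freq: int) -> str:
    """Map a term frequency to a label (hypothetical thresholds, crisp for simplicity)."""
    if freq <= 2:
        return "low"
    if freq <= 5:
        return "mid"
    return "high"


@dataclass
class Node:
    children: dict = field(default_factory=dict)  # (word, fuzzy label) -> Node
    labels: list = field(default_factory=list)    # (class, confidence) of rules ending here


class FCRTree:
    """Rules stored as a prefix tree, so rules sharing leading items share nodes."""

    def __init__(self):
        self.root = Node()

    def insert(self, items, label, confidence):
        node = self.root
        # Sorting gives a canonical item order, which lets common prefixes merge.
        for item in sorted(items):
            node = node.children.setdefault(item, Node())
        node.labels.append((label, confidence))

    def match(self, text_freqs):
        """Return the rules whose items all occur in the text."""
        present = {(w, fuzzy_set(f)) for w, f in text_freqs.items()}
        matched, stack = [], [self.root]
        while stack:
            node = stack.pop()
            matched.extend(node.labels)
            for item, child in node.children.items():
                # If the item is absent from the text, the whole sub-tree is skipped.
                if item in present:
                    stack.append(child)
        return matched


def classify(tree, text_freqs):
    """Assumed decision rule: take the class of the most confident matched rule."""
    matched = tree.match(text_freqs)
    if not matched:
        return None
    return max(matched, key=lambda rule: rule[1])[0]


tree = FCRTree()
tree.insert([("ball", "high"), ("team", "low")], "sports", 0.9)
tree.insert([("ball", "high"), ("goal", "low")], "sports", 0.8)
tree.insert([("bank", "high")], "finance", 0.95)

print(classify(tree, {"ball": 7, "team": 1}))  # sports
```

In this example the two "sports" rules share the node for `("ball", "high")`, and the sub-tree under `("bank", "high")` is never visited for the sample text, which is the source of the efficiency gain claimed over scanning every rule.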
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.