
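Classification Strategy for Focus Crawling Based on Multi-classifier Combination and Ranking Approach
Cite as: Qiao Jianzhong. Classification Strategy for Focus Crawling Based on Multi-classifier Combination and Ranking Approach[J]. Library and Information Service, 2013, 57(14): 114-120.
Author: Qiao Jianzhong
Affiliation: Information Management Center, PLA Academy of Arts
Abstract: A single classification algorithm used in focused crawling generalizes poorly when the crawler must fetch and classify Web pages for multiple topics. To address this limitation, this paper designs a strategy that builds a classifier combination from several strong classification algorithms. For the current topic task, the focused crawler evaluates and ranks the classifiers online and classifies pages with the best-ranked classifier. Classification experiments were run on several topic crawling tasks, comparing the accuracy of each classification algorithm with the average classification accuracy of the combination and jointly analyzing evaluation indicators such as classification efficiency. The results show that the strategy partly overcomes domain locality and is broadly applicable.

Keywords: focused crawling; focused crawler; Web page classification; classification algorithm; multi-classifier combination; classification accuracy; classification efficiency
Received: 2013-06-18
Revised: 2013-07-05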
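The abstract does not name the classification algorithms or the ranking criterion, so the following is only a minimal sketch of "evaluate and rank online, then classify with the best" under stated assumptions: each classifier is a hypothetical callable that maps page text to on-topic/off-topic, the crawler holds a small labelled sample for the current topic, and classifiers are ranked by accuracy with ties broken by time per page (a stand-in for classification efficiency). The toy classifiers and sample are invented for illustration.

```python
import time
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical interface: a classifier maps page text to True (on-topic) or False.
Classifier = Callable[[str], bool]
Sample = Sequence[Tuple[str, bool]]  # (page text, is_on_topic) pairs for the current topic


def evaluate(clf: Classifier, sample: Sample) -> Tuple[float, float]:
    """Return (accuracy, mean seconds per page) of clf on a labelled sample."""
    start = time.perf_counter()
    correct = sum(clf(text) == label for text, label in sample)
    elapsed = time.perf_counter() - start
    return correct / len(sample), elapsed / len(sample)


def rank_classifiers(classifiers: Dict[str, Classifier],
                     sample: Sample) -> List[Tuple[str, float, float]]:
    """Rank by accuracy (highest first); ties go to the faster classifier."""
    scores = [(name, *evaluate(clf, sample)) for name, clf in classifiers.items()]
    return sorted(scores, key=lambda s: (-s[1], s[2]))


def select_best(classifiers: Dict[str, Classifier], sample: Sample) -> Classifier:
    """Re-run when the crawl topic changes: rank, then keep the top classifier."""
    best_name = rank_classifiers(classifiers, sample)[0][0]
    return classifiers[best_name]


# Toy usage for a hypothetical "football" topic (illustrative data only).
sample = [
    ("football match report", True),
    ("stock market news", False),
    ("world cup football final", True),
    ("cooking recipe", False),
]
classifiers = {
    "keyword": lambda t: "football" in t,
    "always_on": lambda t: True,
    "length": lambda t: len(t.split()) > 3,
}
for name, acc, sec in rank_classifiers(classifiers, sample):
    print(f"{name}: accuracy={acc:.2f}")
```

On this toy sample, `keyword` ranks first (accuracy 1.00), followed by `length` (0.75) and `always_on` (0.50), so `select_best` returns the keyword classifier. When the topic changes, re-running `select_best` on a new topic sample may promote a different classifier.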

Classification Strategy for Focus Crawling Based on Multi-classifier Combination and Ranking Approach
Qiao Jianzhong. Classification Strategy for Focus Crawling Based on Multi-classifier Combination and Ranking Approach[J]. Library and Information Service, 2013, 57(14): 114-120.
Authors:Qiao Jianzhong
Institution:Information Management Center of PLA Academy of Arts, Beijing 100081
Abstract: To address the limited generalization capacity of a single classification algorithm when a focused crawler faces multi-topic Web crawling and classification, this paper proposes a strategy that uses a multi-classifier combination formed from several strong classification algorithms. The focused crawler evaluates and ranks the classifiers online according to the current topic and classifies Web pages with the best-ranked classifier. Classification experiments on multiple topic crawling tasks compare the accuracy of each classification algorithm with the average classification accuracy of the multi-classifier combination, and jointly analyze two indicators: classification accuracy and classification efficiency. The results show that the proposed method is more broadly applicable and, to a certain extent, overcomes the limitations of a single classifier.
Keywords: focused crawling; focused crawler; Web page classification; classification algorithm; multi-classifier combination; classification accuracy; classification efficiency
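The experiments compare each algorithm's accuracy with the average accuracy achieved by the combination across topics. The sketch below shows one way such a comparison could be computed. It is an assumption, not the paper's protocol: the classifier is chosen per topic on validation accuracy and scored on separate test accuracy, which avoids rewarding the selection step with the same data it was chosen on. The classifier names (`svm`, `knn`, `nb`), topics, and all accuracy values are invented placeholders, not results from the paper.

```python
from statistics import mean
from typing import Dict

# Accuracy table: topic -> classifier name -> accuracy (hypothetical values below).
Table = Dict[str, Dict[str, float]]


def single_averages(test_acc: Table) -> Dict[str, float]:
    """Average test accuracy of each classifier used alone on every topic."""
    names = next(iter(test_acc.values())).keys()
    return {n: mean(test_acc[t][n] for t in test_acc) for n in names}


def combined_average(val_acc: Table, test_acc: Table) -> float:
    """Per topic, pick the classifier ranked first on validation data,
    then average the picked classifiers' test accuracy over all topics."""
    picked = []
    for topic, scores in val_acc.items():
        best = max(scores, key=scores.get)
        picked.append(test_acc[topic][best])
    return mean(picked)


# Invented numbers for illustration only.
val_acc = {
    "sports": {"svm": 0.90, "knn": 0.80, "nb": 0.85},
    "finance": {"svm": 0.70, "knn": 0.88, "nb": 0.75},
    "health": {"svm": 0.78, "knn": 0.72, "nb": 0.86},
}
test_acc = {
    "sports": {"svm": 0.88, "knn": 0.79, "nb": 0.84},
    "finance": {"svm": 0.72, "knn": 0.86, "nb": 0.74},
    "health": {"svm": 0.76, "knn": 0.70, "nb": 0.85},
}
for name, avg in single_averages(test_acc).items():
    print(f"{name} alone: {avg:.3f}")
print(f"combination: {combined_average(val_acc, test_acc):.3f}")
```

With these made-up tables, the combination averages about 0.863, while the best single classifier (`nb`) averages 0.810. The gap arises only because a different classifier wins on each topic. If one classifier dominated every topic, the combination would simply match it.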