Research on a Dynamic Task Scheduling Model for Distributed Product Data Collection |
| |
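Cite this article: | Yu Fan, Cheng Hong, Wang Chao, Yu Hongwei, Xu Wei. Research on a Dynamic Task Scheduling Model for Distributed Product Data Collection [J]. 现代情报, 2014, 34(4): 7-12, 17. |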
| |
Authors: | Yu Fan, Cheng Hong, Wang Chao, Yu Hongwei, Xu Wei |
| |
Affiliation: | Institute of Quality Development Strategy, Wuhan University, Wuhan, Hubei 430072 |
| |
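Funding: | This paper is one of the research outcomes of the National Social Science Fund of China Major Project "Research on Quality Safety Evaluation and Network Early-Warning Methods in China" (Project No. 11&ZD158) and the Ministry of Science and Technology Special Research Project for the Quality Inspection Public Welfare Industry "Research on Quality Supervision Technology and Supporting Safety Risk Information Systems" (Project No. 201210117). |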
| |
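Abstract: | Web data collection is foundational work for data mining and analysis in the big data era. This paper attempts to formulate a dynamic task scheduling strategy that uses the information generated while tasks are crawled on different nodes as scheduling indicators. It builds a task scheduling model from three perspectives (a task scheduling strategy, a task modification strategy, and a task recall strategy) and then analyzes the model's feasibility through experiments. The experimental results show that the dynamic task scheduling model can improve the efficiency of data collection. |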
Keywords: | data collection; dynamic task scheduling; task scheduling strategy; task modification strategy; task recall strategy |
Distributed Products Data Crawling System Based on Dynamic Task Scheduling Module |
| |
Authors: | Yu Fan, Cheng Hong, Wang Chao, Yu Hongwei, Xu Wei |
| |
Institution: | Wuhan University Institute of Quality Development Strategy, Wuhan 430072, China |
| |
Abstract: | Web data crawling is a basic task for data mining and analysis in the big data era. This paper attempts to construct a task scheduling module based on a dynamic task scheduling strategy. The dynamic task scheduling strategy comprises a task scheduling strategy, a task modifying strategy, and a task recalling strategy. It is driven by scheduling indices derived from information generated on different nodes during crawling. Experimental results show that the dynamic task scheduling strategy can improve the efficiency of data crawling. |
| |
Keywords: | data crawling; dynamic task scheduling; task scheduling strategy; task modifying strategy; task recalling strategy |
This article is indexed in CNKI, 维普 (VIP), and other databases. |