
Research on Block-Based Collection of Web Academic Resource Information for Digital Libraries

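Cite this article:	Wang Lancheng, Zhu Jianhua. Research on Block-Based Collection of Web Academic Resource Information for Digital Libraries [J]. China Information Review (中国信息导报), 2012(6): 76-80.

Authors:	Wang Lancheng, Zhu Jianhua

Affiliation:	Department of Military Information Management, Shanghai Campus, Nanjing Political College, Shanghai 200433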

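Abstract:	To optimize the collection of Web academic information resources for digital libraries, web pages are segmented into blocks by effectively combining their spatial features, content features, and tag information. The identification and merging of the resulting blocks are studied, after which the page's topic text and its set of related link blocks are output. Finally, experimental analysis shows that the method can further remove noise from pages, accurately analyze the topical relevance of pages, and improve the quality of Web topic information collection.
Keywords:	digital library; Web academic resources; automatic collection; information system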
Research of Page Segmentation for Digital Library Based on Web Academic Resource Crawling

Wang Lancheng, Zhu Jianhua. Research of Page Segmentation for Digital Library Based on Web Academic Resource Crawling [J]. China Information Review, 2012(6): 76-80.

Authors:	Wang Lancheng, Zhu Jianhua

Institution:	Department of Information Management, Shanghai Political College PLA, Shanghai 200433

Abstract:	Crawling Web academic resources for digital libraries is an important research area. This paper studies how to segment web pages into blocks by effectively integrating the pages' spatial characteristics, content characteristics, and tag information. The identification and merging of the segmentation results are studied, and the page's topic text and the set of related link blocks are then output. Experimental analysis shows that the method allows more accurate analysis of a page's topical relevance and improves the quality of Web topic information collection.

Keywords:	digital library; Web academic resource; automatic crawling; information system

