A Web Information Extractor Based on the Combination of Ontology and DOM



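Cite this article:	Liu Jiagang, Chen Shan, He Lingya. A Web Information Extractor Based on the Combination of Ontology and DOM [J]. New Technology of Library and Information Service, 2009, 25(5): 44-49.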

Authors:	Liu Jiagang, Chen Shan, He Lingya

Institution:	Department of Computer Science, Hunan Institute of Technology, Hengyang 421002

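Abstract:	Information extraction based on an ontology of Web page information cannot accurately delimit the extraction region. To address this shortcoming, a Web information extractor that combines an ontology with the DOM is designed. Using the DOM tree, an algorithm is designed that inductively learns the paths of information items in sample pages. The algorithm accurately delimits the extraction region, reduces page noise, and serves as a preprocessing step for the Web page. Experiments show that the improved extraction method raises the precision of Web information extraction.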
Keywords:	information extraction; wrapper; ontology; Document Object Model; inductive learning
Received:	2009-03-23
Revised:	2009-04-22
A Web Information Extractor Based on the Combination of Ontology and DOM

Liu Jiagang, Chen Shan, He Lingya. A Web Information Extractor Based on the Combination of Ontology and DOM [J]. New Technology of Library and Information Service, 2009, 25(5): 44-49.

Authors:	Liu Jiagang, Chen Shan, He Lingya

Institution:	Department of Computer Science, Hunan Institute of Technology, Hengyang 421002, China

Abstract:	Information extraction based on an ontology of Web page information items cannot accurately delimit the areas to extract from. To address this weakness, an improved Web information extractor that combines an ontology with the DOM is designed. The paper uses the DOM tree to design an inductive learning algorithm over the paths of information items in sample Web pages. With this algorithm, the extraction areas can be delimited accurately, page noise can be reduced, and the Web page can be preprocessed before extraction. Experiments show that the improved approach increases the precision of information extraction.

Keywords:	Information extraction; Wrapper; Ontology; DOM; Inductive learning
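
The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea of inducing an extraction region from DOM paths, not the authors' method. It assumes that each sample page comes with a few hand-labeled item values, that a DOM path is a sequence of (tag, sibling index) steps, and that the region is the longest path prefix shared by all labeled items. The names `PathCollector`, `induce_region`, and `extract_region`, and the sample pages, are hypothetical.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class PathCollector(HTMLParser):
    """Record the DOM path of every non-empty text node."""

    def __init__(self):
        super().__init__()
        self.stack = []        # open elements as (tag, index among same-tag siblings)
        self.counters = [{}]   # child-tag counts for the root and each open element
        self.text_paths = []   # (path tuple, text)

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        counts = self.counters[-1]
        counts[tag] = counts.get(tag, 0) + 1
        self.stack.append((tag, counts[tag]))
        self.counters.append({})

    def handle_endtag(self, tag):
        # Close the nearest matching open element, tolerating unclosed children.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                del self.counters[i + 1:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.text_paths.append((tuple(self.stack), text))


def text_paths(html):
    parser = PathCollector()
    parser.feed(html)
    parser.close()
    return parser.text_paths


def induce_region(samples):
    """samples: list of (html, set of labeled item values) pairs.
    Returns the longest DOM path prefix shared by all labeled items."""
    item_paths = [path for html, values in samples
                  for path, text in text_paths(html) if text in values]
    prefix = []
    for steps in zip(*item_paths):
        if len(set(steps)) != 1:
            break
        prefix.append(steps[0])
    return tuple(prefix)


def extract_region(html, region):
    """Keep only the text nodes located under the induced region."""
    return [text for path, text in text_paths(html)
            if path[:len(region)] == region]


def make_page(title, author, year):
    # Made-up page layout: navigation, item block, footer.
    return ("<html><body><div><a>Home</a><a>Login</a></div>"
            f"<div><h1>{title}</h1><p>{author}</p><p>{year}</p></div>"
            "<div>Copyright</div></body></html>")


samples = [(make_page("Book A", "Author A", "2001"), {"Book A", "Author A"}),
           (make_page("Book B", "Author B", "2002"), {"Book B", "Author B"})]
region = induce_region(samples)
items = extract_region(make_page("Book C", "Author C", "2003"), region)
```

In this sketch the navigation links and the footer fall outside the induced region and are dropped as noise. Only the remaining text would then be handed to the ontology-based matching step, which is not modeled here.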
This article is indexed in Wanfang Data and other databases.