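A Web Information Extractor Based on the Combination of Ontology and DOM
Cite this article: Liu Jiagang, Chen Shan, He Lingya. A Web Information Extractor Based on the Combination of Ontology and DOM[J]. New Technology of Library and Information Service (现代图书情报技术), 2009, 25(5): 44-49.
Authors: Liu Jiagang  Chen Shan  He Lingya
Affiliation: Department of Computer Science, Hunan Institute of Technology, Hengyang 421002
Abstract: Information extraction based on an ontology of Web page information cannot accurately delimit the extraction region. To address this shortcoming, a Web information extractor that combines ontology with the DOM is designed. The DOM tree is used to design an algorithm that performs inductive learning on the paths of information items in sample pages. The algorithm accurately delimits the information extraction region, reduces page noise, and thereby preprocesses the Web page. Experiments show that the improved extraction method raises the precision of Web information extraction.

Keywords: information extraction  wrapper  ontology  Document Object Model  inductive learning
Received: 2009-03-23
Revised: 2009-04-22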

A Web Information Extractor Based on the Combination of Ontology and DOM
Liu Jiagang, Chen Shan, He Lingya. A Web Information Extractor Based on the Combination of Ontology and DOM[J]. New Technology of Library and Information Service, 2009, 25(5): 44-49.
Authors: Liu Jiagang  Chen Shan  He Lingya
Institution: Department of Computer Science, Hunan Institute of Technology, Hengyang 421002, China
Abstract: Information extraction that relies on an ontology of the information items in a Web page cannot accurately delimit the extraction area. To address this weakness, an improved Web information extractor based on ontology and DOM is designed. The paper uses the DOM tree to design an inductive learning algorithm over the paths of information items in sample Web pages. With this algorithm, the extraction area can be delimited accurately, the noise in the page can be reduced, and the Web page can be preprocessed. The experiment shows that the improved approach increases the precision of information extraction.
Keywords: Information extraction  Wrapper  Ontology  DOM  Inductive learning
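
The abstract does not give the details of the path-learning algorithm, so the following Python sketch is only one plausible reading of "inductive learning over the DOM paths of information items", not the authors' method. It assumes that labeled information items in the sample pages are identified by exact text match, that a path is a sequence of tag names with sibling positions, and that the extraction area is generalized as the longest path prefix shared by all labeled items. The names `PathCollector`, `learn_region`, and `extract_region_text` are hypothetical, and the ontology-based matching step is not covered.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag; they are skipped when building paths.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class PathCollector(HTMLParser):
    """Record, for every non-empty text node, the path of its ancestor elements."""

    def __init__(self):
        super().__init__()
        self.stack = []        # current path, e.g. ["html[1]", "body[1]", "div[2]"]
        self.counts = [{}]     # per-depth counters of same-tag siblings seen so far
        self.text_paths = []   # list of (text, path tuple)

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        siblings = self.counts[-1]
        siblings[tag] = siblings.get(tag, 0) + 1
        self.stack.append(f"{tag}[{siblings[tag]}]")
        self.counts.append({})

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        # Assumption: well-formed input, so the closing tag matches the stack top.
        if self.stack and self.stack[-1].startswith(tag + "["):
            self.stack.pop()
            self.counts.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.text_paths.append((text, tuple(self.stack)))


def collect(html):
    """Parse one page and return its (text, path) pairs."""
    collector = PathCollector()
    collector.feed(html)
    collector.close()
    return collector.text_paths


def common_prefix(paths):
    """Longest sequence of path steps shared by every path."""
    prefix = []
    for steps in zip(*paths):
        if any(step != steps[0] for step in steps):
            break
        prefix.append(steps[0])
    return tuple(prefix)


def learn_region(samples):
    """samples: list of (html, labeled item texts). Returns the generalized region path."""
    item_paths = []
    for html, items in samples:
        wanted = set(items)
        item_paths.extend(path for text, path in collect(html) if text in wanted)
    return common_prefix(item_paths)


def extract_region_text(html, region):
    """Keep only text nodes inside the learned region; everything else is treated as noise."""
    return [text for text, path in collect(html) if path[:len(region)] == region]


if __name__ == "__main__":
    page_a = "<html><body><div>Ad banner</div><div><p>Title A</p><p>Price 10</p></div></body></html>"
    page_b = "<html><body><div>Other ad</div><div><p>Title B</p><p>Price 20</p></div></body></html>"
    region = learn_region([(page_a, ["Title A", "Price 10"]),
                           (page_b, ["Title B", "Price 20"])])
    new_page = "<html><body><div>Noise</div><div><p>Title C</p><p>Price 30</p></div></body></html>"
    print(region)
    print(extract_region_text(new_page, region))
```

In this toy setup the learned region is the second `div` under `body`, so the advertisement text in the first `div` is dropped before any ontology-based matching would run. Positional paths are brittle when page layouts vary; a real wrapper would likely need a more tolerant path generalization.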
This article is indexed in Wanfang Data (万方数据) and other databases.