

Study on WEB Data Extraction Model Based on XML

Cite this article:	HUANG Shu-qin (黄淑芹). Study on WEB Data Extraction Model Based on XML[J]. Journal of Tonghua Teachers College, 2012, 33(2): 31-33.

Author:	HUANG Shu-qin (黄淑芹)

Affiliation:	School of Management Science and Engineering, Anhui University of Finance and Economics, Bengbu, Anhui 233030

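Funding:	Excellent Young Talents Fund Program of Anhui Province Higher Education Institutions (2011SQRL069); Natural Science Foundation of Anhui Province Higher Education Institutions (KJ2011Z007); Young Researchers Project of Anhui University of Finance and Economics (ACKYQ1129)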

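Abstract:	This paper introduces a method for WEB information extraction based on XML technology. It builds a three-layer data model for WEB information extraction, focusing on the data extraction layer. In that layer, the Tidy tool first converts HTML into XHTML; path expressions then locate the anchors related to the content to be extracted; finally, XSL maps the extraction results into an XML file. This XML file can serve directly as an information source for decision support, or be stored in a database for use by other applications. The method is one way of converting unstructured data into structured data, and it makes WEB data usable by application programs. A working system that extracts weather-forecast information is also implemented; its extraction rules are simple and robust, and its code is highly portable.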
Keywords:	Extensible Markup Language; WEB information extraction; Extensible Stylesheet Language; unstructured data; structured data
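As an illustration of the data extraction layer summarized in the abstract, here is a minimal Python sketch. It assumes the page has already been converted to XHTML; the sample page, the `table id="forecast"` anchor, the `class` names and the SQLite table are hypothetical, since the paper's actual page layout and rules are not given here. Python's standard library has no XSLT processor and `ElementTree` supports only a subset of XPath, so the XSL mapping step is replaced by building the output XML directly with `ElementTree`; treat it as a stand-in for the author's XSL stylesheet, not a reproduction of it.

```python
import sqlite3
import xml.etree.ElementTree as ET

XHTML_NS = {"x": "http://www.w3.org/1999/xhtml"}

# Hypothetical cleaned-up (XHTML) weather page; the real layout is not given in the paper.
SAMPLE_XHTML = """<html xmlns="http://www.w3.org/1999/xhtml">
<body>
  <table id="forecast">
    <tr><td class="city">Bengbu</td><td class="weather">Sunny</td><td class="temp">12~20</td></tr>
    <tr><td class="city">Hefei</td><td class="weather">Cloudy</td><td class="temp">10~18</td></tr>
  </table>
</body>
</html>"""

# Hypothetical extraction rules: a path to the anchor (one forecast row)
# plus paths, relative to that row, for each output field.
ROW_PATH = ".//x:table[@id='forecast']/x:tr"
FIELD_PATHS = {
    "city": "x:td[@class='city']",
    "weather": "x:td[@class='weather']",
    "temperature": "x:td[@class='temp']",
}


def extract_forecasts(xhtml: str) -> ET.Element:
    """Locate anchor rows by path and map each one to a <forecast> record."""
    root = ET.fromstring(xhtml)
    result = ET.Element("forecasts")
    for row in root.findall(ROW_PATH, XHTML_NS):
        values = {}
        for field, path in FIELD_PATHS.items():
            cell = row.find(path, XHTML_NS)
            if cell is None:
                break  # row does not match every rule: skip it entirely
            values[field] = (cell.text or "").strip()
        else:
            record = ET.SubElement(result, "forecast")
            for field, value in values.items():
                ET.SubElement(record, field).text = value
    return result


def save_to_sqlite(forecasts: ET.Element, conn: sqlite3.Connection) -> None:
    """Store the structured records so other applications can query them."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS forecast (city TEXT, weather TEXT, temperature TEXT)"
    )
    conn.executemany(
        "INSERT INTO forecast VALUES (?, ?, ?)",
        [
            (r.findtext("city"), r.findtext("weather"), r.findtext("temperature"))
            for r in forecasts.findall("forecast")
        ],
    )
    conn.commit()


if __name__ == "__main__":
    forecasts = extract_forecasts(SAMPLE_XHTML)
    print(ET.tostring(forecasts, encoding="unicode"))
    conn = sqlite3.connect(":memory:")
    save_to_sqlite(forecasts, conn)
    print(conn.execute("SELECT city, temperature FROM forecast").fetchall())
```

Keeping the row path and the per-field paths in data (`ROW_PATH`, `FIELD_PATHS`) rather than in code is one way to keep extraction rules short and easy to adjust when the page changes; rows that fail any rule are skipped instead of producing partial records.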
Study on WEB Data Extraction Model Based on XML

HUANG Shu-qin. Study on WEB Data Extraction Model Based on XML[J]. Journal of Tonghua Teachers College, 2012, 33(2): 31-33.

Authors:	HUANG Shu-qin

Institution:	HUANG Shu-qin (School of Management Science and Engineering, Anhui University of Finance and Economics, Bengbu, Anhui 233030, China)

Abstract:	The paper introduces a method of WEB information extraction based on XML technology and constructs a three-layer data model for WEB information extraction, of which the data extraction layer is the most important. In this layer, the HTML is first converted to XHTML with the Tidy tool; path expressions then locate the anchors related to the content to be extracted, and XSL maps the extraction result to an XML file. A system for extracting weather-forecast information was implemented as an example. The extraction rules are simple and robust, and the code is highly portable.

Keywords:	XML; WEB information extraction; XSLT; unstructured data; structured data
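The paper performs the HTML-to-XHTML step with Tidy. Purely to show what that step has to achieve, the sketch below is a rough stand-in built on Python's standard `html.parser` (it is not HTML Tidy and not the author's code): it self-closes void elements, applies a few of HTML's implied end-tag rules (`td`, `th`, `tr`, `li`, `p`), drops stray end tags, escapes text and attributes, and closes anything left open, so the result parses as XML. The names `TidyLike`, `IMPLIED_CLOSE` and `to_xhtml` are hypothetical.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr

# HTML elements that never have content; XHTML requires them to be self-closed.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}

# A start tag named by the key implicitly closes an open element in the value set
# (a small, illustrative subset of HTML's optional end-tag rules).
IMPLIED_CLOSE = {
    "td": {"td", "th"},
    "th": {"td", "th"},
    "tr": {"tr", "td", "th"},
    "li": {"li"},
    "p": {"p"},
}


class TidyLike(HTMLParser):
    """Tiny HTML -> well-formed XHTML normalizer; a stand-in, not HTML Tidy."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []    # serialized XHTML fragments
        self.stack = []  # currently open (non-void) elements

    @staticmethod
    def _attrs(attrs):
        seen = {}
        for name, value in attrs:
            # Keep the first of duplicated attributes; valueless ones such as
            # <input disabled> get their own name as value, as XHTML requires.
            seen.setdefault(name, name if value is None else value)
        return "".join(f" {n}={quoteattr(v)}" for n, v in seen.items())

    def handle_starttag(self, tag, attrs):
        closers = IMPLIED_CLOSE.get(tag, set())
        while self.stack and self.stack[-1] in closers:
            self.out.append(f"</{self.stack.pop()}>")
        if tag in VOID:
            self.out.append(f"<{tag}{self._attrs(attrs)}/>")
        else:
            self.out.append(f"<{tag}{self._attrs(attrs)}>")
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Explicit <tag/> syntax: emit an empty element without opening it.
        self.out.append(f"<{tag}{self._attrs(attrs)}/>")

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray end tag (e.g. </br>, or </span> never opened): drop it
        # Close elements left open inside this one, then the element itself.
        while True:
            open_tag = self.stack.pop()
            self.out.append(f"</{open_tag}>")
            if open_tag == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data))

    def close(self):
        super().close()  # flush any buffered text
        while self.stack:  # close whatever the page left open
            self.out.append(f"</{self.stack.pop()}>")


def to_xhtml(html: str) -> str:
    parser = TidyLike()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    messy = "<table id=forecast><tr><td class=city>Bengbu<td>Sunny<br></table>"
    xhtml = to_xhtml(messy)
    print(xhtml)
    ET.fromstring(xhtml)  # raises ParseError if the output is not well-formed
```

Real pages contain many cases this ignores (misnested inline tags, invalid attribute names, `p` closed by block elements), which is why a mature tool such as Tidy is the practical choice for this step.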
This article is indexed in CNKI, VIP, Wanfang Data and other databases.
