Research on PDF Documents Information Extraction System Based on XML*



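Cite this article:	Song Yanjuan, Zhang Wende. Research on PDF Documents Information Extraction System Based on XML[J]. New Technology of Library and Information Service, 2005, 21(9): 10-13.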

Authors:	Song Yanjuan, Zhang Wende

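Affiliations:	1. College of Mathematics and Computer Science, Fuzhou University, Fuzhou 350002, China; 2. Library of Fuzhou University, Fuzhou 350002, China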

Funding:	This work is one of the research results of the Fujian Provincial Higher Education Science and Technology Project (JA04164).

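Abstract:	We first design a DTD for scientific papers and then analyze the structure of PDF documents. On this basis, we present the design framework of a PDF document information extraction system. Using this DTD as a template, the framework parses scientific papers in PDF format and converts them into valid XML documents.
Keywords:	information extraction; PDF; XML
Received:	2005-05-23
Revised:	2005-06-07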
Research on PDF Documents Information Extraction System Based on XML

Song Yanjuan, Zhang Wende. Research on PDF Documents Information Extraction System Based on XML[J]. New Technology of Library and Information Service, 2005, 21(9): 10-13.

Authors:	Song Yanjuan, Zhang Wende

Institution:	(College of Mathematics and Computer Science, Fuzhou University, Fuzhou 350002, China) (Library of Fuzhou University, Fuzhou 350002, China)

Abstract:	We first design a DTD for scientific and technical articles and then analyze the structure of PDF documents. On that basis, we describe the design of a PDF information extraction system that uses this DTD as a template to convert a scientific article in PDF format into a valid XML document.

Keywords:	Information Extraction; PDF; XML
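
The abstract describes using a DTD as a template for the XML output. The following is a minimal sketch of that idea only, not the authors' system: the DTD, its element names, and the function article_to_xml are all hypothetical, since the paper's actual DTD is not reproduced here. It assumes the fields have already been extracted from the PDF and only shows how they might be serialized in the order such a DTD would require.

```python
import xml.etree.ElementTree as ET

# Hypothetical DTD for a scientific article. The element names are
# illustrative assumptions, not the DTD designed in the paper.
ARTICLE_DTD = """<!DOCTYPE article [
  <!ELEMENT article (title, authors, abstract, keywords)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT authors (author+)>
  <!ELEMENT author (#PCDATA)>
  <!ELEMENT abstract (#PCDATA)>
  <!ELEMENT keywords (keyword+)>
  <!ELEMENT keyword (#PCDATA)>
]>"""


def article_to_xml(fields):
    """Serialize already-extracted fields in the order the DTD sketch expects."""
    root = ET.Element("article")
    ET.SubElement(root, "title").text = fields["title"]
    authors = ET.SubElement(root, "authors")
    for name in fields["authors"]:
        ET.SubElement(authors, "author").text = name
    ET.SubElement(root, "abstract").text = fields["abstract"]
    keywords = ET.SubElement(root, "keywords")
    for word in fields["keywords"]:
        ET.SubElement(keywords, "keyword").text = word
    body = ET.tostring(root, encoding="unicode")
    # Prepend the XML declaration and the internal DTD subset.
    return '<?xml version="1.0"?>\n' + ARTICLE_DTD + "\n" + body


if __name__ == "__main__":
    print(article_to_xml({
        "title": "Research on PDF Documents Information Extraction System Based on XML",
        "authors": ["Song Yanjuan", "Zhang Wende"],
        "abstract": "A DTD-driven conversion of PDF articles to XML.",
        "keywords": ["Information Extraction", "PDF", "XML"],
    }))
```

Python's standard library parser only checks that the result is well-formed; confirming that the document is valid against the DTD would need a validating parser, for example the DTD class in the third-party lxml package.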
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.
