Exploration on Automatic Extraction Metadata of CNKI Papers Based on Regular Expression



Citation (Chinese):	曹俊, 万晓云, 廖顺宝. 基于正则表达式批量提取CNKI文献元数据技术探究[J]. 图书情报工作, 2010, 54(19): 111-114.

Authors:	Cao Jun (曹俊), Wan Xiaoyun (万晓云), Liao Shunbao (廖顺宝)

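Affiliation:	State Key Laboratory of Resources and Environmental Information System, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences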

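Funding:	Independent research project of the State Key Laboratory of Resources and Environmental Information System; Phase III frontier-field project of the Knowledge Innovation Program, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences (topic: Geo-information Methodology System)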

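Abstract:	This paper introduces a method for extracting metadata using only the CNKI article files stored on local disk. Metadata extracted from downloaded CNKI articles is used to build a personal literature database, on which a personal literature management system can then be built. Although the CNKI database offers no external database access interface, each article's metadata is presented as a web page. By analyzing the structure of the properties page bound to each article and extracting the metadata with regular expressions, the metadata can be imported into a database in batches.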
Keywords:	CNKI; metadata; regular expression; batch extraction
Received:	2010-06-28
Revised:	2010-08-14
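The abstract describes extracting metadata from each article's properties page with regular expressions, but it does not reproduce the page markup or the patterns used. The following is therefore only a minimal Python sketch of that step under stated assumptions: the page layout (one `<td class="label">` / `<td class="value">` table row per field), the field labels, and the names `FIELD_RE` and `extract_metadata` are all hypothetical. The real page has to be inspected and the pattern adapted to it.

```python
import html
import re

# HYPOTHETICAL page layout: one table row per metadata field, e.g.
#   <tr><td class="label">Authors</td><td class="value">...</td></tr>
# The real CNKI properties page differs; inspect it and adapt FIELD_RE.
FIELD_RE = re.compile(
    r'<td class="label">\s*(?P<label>[^<]+?)\s*</td>\s*'
    r'<td class="value">\s*(?P<value>.*?)\s*</td>',
    re.DOTALL | re.IGNORECASE,
)
# Matches any HTML tag so inline markup (links, <b>, ...) can be removed.
TAG_RE = re.compile(r"<[^>]+>")


def extract_metadata(page: str) -> dict[str, str]:
    """Return {label: value} for every label/value row found in the page."""
    record: dict[str, str] = {}
    for m in FIELD_RE.finditer(page):
        # Drop inline tags, then decode entities such as &amp;.
        value = html.unescape(TAG_RE.sub("", m.group("value"))).strip()
        record[m.group("label")] = value
    return record


# Placeholder page in the assumed layout (not real CNKI markup).
SAMPLE_PAGE = """
<table>
  <tr><td class="label">Title</td><td class="value">Sample title</td></tr>
  <tr><td class="label">Authors</td>
      <td class="value"><a href="#">Cao Jun</a>; Wan Xiaoyun; Liao Shunbao</td></tr>
  <tr><td class="label">Year</td><td class="value">2010</td></tr>
</table>
"""

print(extract_metadata(SAMPLE_PAGE))
```

Matching label/value pairs generically, rather than writing one pattern per field, means a field added to the page is picked up without code changes; the trade-off is that the non-greedy `.*?` value pattern assumes a value never contains a nested `</td>`.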
Exploration on Automatic Extraction Metadata of CNKI Papers Based on Regular Expression

Cao Jun,Wan Xiaoyun,Liao Shunbao.Exploration on Automatic Extraction Metadata of CNKI Papers Based on Regular Expression[J].Library and Information Service,2010,54(19):111-114.

Authors:	Cao Jun, Wan Xiaoyun, Liao Shunbao

Institution:	State Key Laboratory of Resources and Environmental Information System, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences

Abstract:	This paper introduces a method for extracting the metadata of CNKI papers relying only on files stored on the hard disk. From downloaded CNKI papers, the method extracts metadata and builds a personal literature database, on which a personal paper information management system can then be created. Although the CNKI archive does not provide an external database access interface, each paper's metadata is shown on a web page. By analyzing the structure of the properties web page bound to each paper and extracting the metadata with regular expressions, the metadata can be batch-imported into a database.

Keywords:	CNKI; metadata; regular expression; batch extraction
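The abstract ends with batch-importing the extracted metadata into a database. The database system and schema are not specified here, so this minimal sketch assumes SQLite (Python's standard `sqlite3` module), a hypothetical `paper` table, and records already extracted as dictionaries keyed by field label (`Title`, `Authors`, `Journal`, `Year`, also assumptions). The helper names and sample records are placeholders, not data or code from the paper.

```python
import sqlite3
from typing import Iterable, Mapping, Optional

# HYPOTHETICAL schema for a personal literature database.
SCHEMA = """
CREATE TABLE IF NOT EXISTS paper (
    id      INTEGER PRIMARY KEY,
    title   TEXT NOT NULL UNIQUE,
    authors TEXT,
    journal TEXT,
    year    INTEGER
)
"""


def to_row(record: Mapping[str, str]) -> tuple:
    """Map one extracted record (assumed label keys) to a table row."""
    year: Optional[int] = int(record["Year"]) if record.get("Year") else None
    return (record["Title"], record.get("Authors"), record.get("Journal"), year)


def import_records(conn: sqlite3.Connection,
                   records: Iterable[Mapping[str, str]]) -> int:
    """Batch-insert records in one transaction; return the number of new rows."""
    conn.execute(SCHEMA)
    rows = [to_row(r) for r in records]
    before = conn.total_changes
    with conn:  # commit on success, roll back if any insert raises
        conn.executemany(
            "INSERT OR IGNORE INTO paper (title, authors, journal, year) "
            "VALUES (?, ?, ?, ?)",
            rows,
        )
    # Rows skipped by OR IGNORE (duplicate titles) are not counted as changes.
    return conn.total_changes - before


# Placeholder records, not data from the paper.
batch = [
    {"Title": "Paper A", "Authors": "X; Y", "Journal": "J1", "Year": "2010"},
    {"Title": "Paper B", "Authors": "Z", "Journal": "J2", "Year": "2009"},
    {"Title": "Paper A", "Authors": "X; Y", "Journal": "J1", "Year": "2010"},
]
conn = sqlite3.connect(":memory:")
print(import_records(conn, batch))  # the repeated "Paper A" is skipped
```

Wrapping `executemany` in a single `with conn:` transaction keeps each batch atomic, and the `UNIQUE` constraint on `title` combined with `INSERT OR IGNORE` lets the same set of downloaded papers be re-imported without creating duplicates. Deduplicating on title alone is a simplification; a real system might key on a more stable identifier.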
This article is indexed in Wanfang Data and other databases.
