
Research and Implementation of a Transaction Information Extraction Tool for C2C E-commerce Sites
Cite this article: WANG Hong-wei, WU Yang-yang. Research and Implementation of a Transaction Information Extraction Tool for C2C E-commerce Sites[J]. Journal of Quanzhou Normal College, 2010, 28(4): 12-17.
Authors: WANG Hong-wei (王鸿伟), WU Yang-yang (吴扬扬)
Institutions: 1. School of Mathematics and Computer Science, Quanzhou Normal University, Quanzhou, Fujian 362000, China
2. College of Computer Science, Huaqiao University, Quanzhou, Fujian 362000, China
Abstract: Taobao and Baidu Youa are two representative C2C e-commerce platforms in China. This paper studies how to extract sales records and user information from these two platforms. For the shop sales records on the two sites, a Web data extraction algorithm is designed that is based on JerichoHtmlParser and uses HTML tags as landmarks. For the user registration information on the two sites, a second Web data extraction algorithm based on regular-expression matching is designed. A Web information extraction system is then designed and implemented that extracts data from different sites according to different extraction rules. Finally, experiments on real data extracted from both platforms verify the effectiveness of the design and show that the prototype system achieves high recall and accuracy.

Keywords: Web data extraction; C2C e-commerce; regular expression
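The abstract describes two kinds of extraction rules (HTML-tag landmarks for sales records, regular expressions for user information) plus a system that switches rule sets per site. The following is a minimal, hypothetical sketch of that idea, not the authors' implementation: it does not use JerichoHtmlParser (landmarks are located with plain `String.indexOf`), and the class name `RuleExtractor`, the field names, the sample markup, and the recall/precision helpers (standard information-extraction definitions, assumed because the abstract does not give the paper's formulas) are all illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical rule-driven extractor sketch; not the authors' code and not the Jericho API. */
public class RuleExtractor {

    /** A rule turns one HTML page into zero or more field values. */
    interface Rule {
        List<String> apply(String html);
    }

    /** Landmark rule: the text between each start landmark and the next end landmark. */
    static final class LandmarkRule implements Rule {
        private final String start;
        private final String end;

        LandmarkRule(String start, String end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public List<String> apply(String html) {
            List<String> values = new ArrayList<>();
            int from = 0;
            while (true) {
                int s = html.indexOf(start, from);
                if (s < 0) break;
                int contentStart = s + start.length();
                int e = html.indexOf(end, contentStart);
                if (e < 0) break; // unclosed landmark: stop rather than guess
                // Drop nested tags such as <a>...</a>, then trim whitespace.
                values.add(html.substring(contentStart, e).replaceAll("<[^>]*>", "").trim());
                from = e + end.length();
            }
            return values;
        }
    }

    /** Regex rule: capture group 1 of every match. */
    static final class RegexRule implements Rule {
        private final Pattern pattern;

        RegexRule(String regex) {
            this.pattern = Pattern.compile(regex);
        }

        @Override
        public List<String> apply(String html) {
            List<String> values = new ArrayList<>();
            Matcher m = pattern.matcher(html);
            while (m.find()) values.add(m.group(1).trim());
            return values;
        }
    }

    /** Applies every field rule in one site's rule set, preserving field order. */
    static Map<String, List<String>> extract(Map<String, Rule> siteRules, String html) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, Rule> entry : siteRules.entrySet()) {
            result.put(entry.getKey(), entry.getValue().apply(html));
        }
        return result;
    }

    /** Assumed standard definition: recall = correct / expected. */
    static double recall(int correct, int expected) {
        return expected == 0 ? 0.0 : (double) correct / expected;
    }

    /** Assumed standard definition: precision = correct / extracted. */
    static double precision(int correct, int extracted) {
        return extracted == 0 ? 0.0 : (double) correct / extracted;
    }

    public static void main(String[] args) {
        // Made-up markup; real Taobao/Youa pages look different and change over time.
        String salesPage = "<table><tr><td class=\"item\">Phone case</td><td class=\"price\">12.50</td></tr>"
                + "<tr><td class=\"item\"><a href=\"#\">USB cable</a></td><td class=\"price\">8.00</td></tr></table>";
        String userPage = "<div>Nickname: alice88</div><div>Registered: 2009-03-15</div>";

        // One rule set per site/page type; switching sites means switching rule sets.
        Map<String, Rule> salesRules = new LinkedHashMap<>();
        salesRules.put("item", new LandmarkRule("<td class=\"item\">", "</td>"));
        salesRules.put("price", new LandmarkRule("<td class=\"price\">", "</td>"));

        Map<String, Rule> userRules = new LinkedHashMap<>();
        userRules.put("nickname", new RegexRule("Nickname:\\s*([^<]+)"));
        userRules.put("registered", new RegexRule("Registered:\\s*(\\d{4}-\\d{2}-\\d{2})"));

        System.out.println(extract(salesRules, salesPage));
        System.out.println(extract(userRules, userPage));
    }
}
```

Running `main` prints `{item=[Phone case, USB cable], price=[12.50, 8.00]}` followed by `{nickname=[alice88], registered=[2009-03-15]}`. Keeping rules as data (a map from field name to rule) is what lets one engine serve several sites: supporting a new site means writing a new rule set, not new extraction code. Landmark rules suit regular, table-like listings, while regex rules suit label/value text; both break when a site changes its markup.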
This article is indexed in databases including VIP and Wanfang Data.