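Automated Extraction of Search Engine Results

Cite this article:	Ou Jun (藕军), Ren Minglun (任明仑). Automated Extraction of Search Engine Results [J]. 现代图书情报技术 (New Technology of Library and Information Service), 2007, 2(2): 49-52.

Authors:	Ou Jun (藕军), Ren Minglun (任明仑)

Affiliation:	Institute of Computer Network, Hefei University of Technology, Hefei 230009, China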

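Abstract:	This paper proposes a method for automatically extracting result records and subsequent-page links from a search engine's result page and generating a wrapper for them. For a valid result page, the similarity of nodes in its HTML tag tree is compared to identify candidate record blocks. Heuristic rules then separate the result record blocks from the subsequent-page links among those candidates, and a wrapper is constructed for each from its position in the tag tree. Experimental results and a comparison with existing methods indicate that the method is simple, feasible, and efficient.
Keywords:	Web information extraction; wrapper generation; HTML tag tree; node similarity
Received:	2006-11-24
Revised:	2006-11-24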
Automated Extraction of Search Engine Results

Ou Jun, Ren Minglun. Automated Extraction of Search Engine Results [J]. New Technology of Library and Information Service, 2007, 2(2): 49-52.

Authors:	Ou Jun, Ren Minglun

Institution:	Institute of Computer Network, Hefei University of Technology, Hefei 230009, China

Abstract:	This paper presents a new method for automatically extracting Search Result Records (SRRs) and Subsequent Result Page Links (SRPLs) from a search engine's response page. The similarity of nodes in the HTML tag tree of a valid response page is compared to recognize Candidate Record Blocks (CRBs). SRRs and SRPLs are then recognized from the CRBs using several heuristic rules, and a wrapper is built for each from its location in the tag tree. Experiments and a comparison with other methods show that the method is useful and efficient.

Keywords:	Search engine; Web information extraction; Wrapper generation; HTML tag tree; Node similarity
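
The pipeline outlined in the abstract (tag tree, node similarity, candidate record blocks, heuristic split, location-based wrapper) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the similarity measure, the heuristic rules, or the wrapper format, so everything below is an assumption chosen for illustration. Similarity is taken to be the `difflib.SequenceMatcher` ratio over pre-order tag sequences, the `threshold` and `min_run` parameters are hypothetical, the `classify` rule (runs of bare `<a>` elements are page links) is a stand-in heuristic, and a "wrapper" is reduced to a tag path string.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class Node:
    """One element of the HTML tag tree (text content is ignored)."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []

    def tag_sequence(self):
        """Pre-order list of tag names in this subtree."""
        seq = [self.tag]
        for child in self.children:
            seq.extend(child.tag_sequence())
        return seq

    def path(self):
        """Tag path from the document root, e.g. 'html/body/ul'."""
        parts, node = [], self
        while node.parent is not None:
            parts.append(node.tag)
            node = node.parent
        return "/".join(reversed(parts))


class TreeBuilder(HTMLParser):
    """Builds a Node tree from start and end tags."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag, if there is one.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent


def similarity(a, b):
    """Assumed measure: ratio of matching tags in the two subtrees."""
    return SequenceMatcher(None, a.tag_sequence(), b.tag_sequence()).ratio()


def candidate_record_blocks(node, threshold=0.8, min_run=3):
    """Return (parent_path, nodes) for runs of similar adjacent siblings."""
    found = []
    run = node.children[:1]
    for child in node.children[1:]:
        if similarity(run[-1], child) >= threshold:
            run.append(child)
        else:
            if len(run) >= min_run:
                found.append((node.path(), run))
            run = [child]
    if len(run) >= min_run:
        found.append((node.path(), run))
    for child in node.children:
        found.extend(candidate_record_blocks(child, threshold, min_run))
    return found


def classify(run):
    """Hypothetical heuristic: runs of bare <a> elements are page links."""
    if all(n.tag == "a" and not n.children for n in run):
        return "SRPL"
    return "SRR"


def build_wrappers(html):
    """Map each block type to the tag paths where its items were found."""
    builder = TreeBuilder()
    builder.feed(html)
    wrappers = {}
    for parent_path, run in candidate_record_blocks(builder.root):
        wrappers.setdefault(classify(run), []).append(f"{parent_path}/{run[0].tag}")
    return wrappers


PAGE = """
<html><body>
<div id="nav"><a href="/">Home</a></div>
<ul>
  <li><a href="/1"><b>Title 1</b></a><p>Snippet 1</p></li>
  <li><a href="/2"><b>Title 2</b></a><p>Snippet 2</p></li>
  <li><a href="/3"><b>Title 3</b></a><p>Snippet 3</p></li>
</ul>
<div id="pager"><a href="?p=2">2</a><a href="?p=3">3</a><a href="?p=4">4</a></div>
</body></html>
"""

print(build_wrappers(PAGE))
```

On this toy page the three `<li>` items form one candidate block and the three pager links form another. A tag path without sibling indices is a deliberately coarse locator (here `html/body/div/a` would also match the navigation link), so a practical wrapper would likely need positional or attribute information as well.
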
This article is indexed in databases including CNKI, VIP, and Wanfang Data.