Research on SLCS-Based Duplicate Removal for Meta Search (基于SLCS的元搜索去重技术研究)

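Cite as:	Qin Jie, Xie Hui, Wang Chunyun. 基于SLCS的元搜索去重技术研究 [J]. 图书情报工作 (Library and Information Service), 2010, 54(15): 113-116.

Authors:	Qin Jie, Xie Hui, Wang Chunyun

Affiliation:	College of Information Science and Engineering, Henan University of Technology

Funding:	Contract-Based Component Testability Design and Active Testing Technology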

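Abstract:	To address duplicate web pages in meta search results, this paper applies a web page deduplication method based on the longest common subsequence (LCS) to meta search engines and proposes an SLCS-based deduplication method, where the leading S stands for Summary. After the summary of each result page is obtained, a weight is computed for every sentence in the summary from the number of times the query terms occur in the sentence and from the sentence length. The sentence with the largest weight is extracted as the summary's feature sentence, and the similarity of result pages is computed by comparing the LCS of their feature sentences, with the aim of improving the retrieval quality of the meta search engine. Experiments show that the method achieves high precision.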
Keywords:	web page deduplication; meta search engine; LCS; feature code
Received:	2010-03-23
Revised:	2010-05-28
The Study on the Duplicated Detection Algorithm Based on SLCS with Meta Search Engine

Qin Jie, Xie Hui, Wang Chunyun. The Study on the Duplicated Detection Algorithm Based on SLCS with Meta Search Engine [J]. Library and Information Service, 2010, 54(15): 113-116.

Authors:	Qin Jie, Xie Hui, Wang Chunyun

Institution:	College of Information Science and Engineering, Henan University of Technology

Abstract:	Building on a study of duplicate web page detection algorithms, this paper proposes a duplicate detection algorithm based on LCS (Longest Common Subsequence) and studies duplicate web page detection based on SLCS in a meta search engine, where the first S stands for Summary. The SLCS algorithm has three main steps. First, each sentence of a page summary is assigned a weight according to the length of the sentence and the frequency with which the keywords submitted by the user occur in it. Second, the sentence with the largest weight is taken as the feature sentence. Finally, the similarity of the web page summaries is obtained by comparing their feature sentences. Experiments show that the new method achieves high precision.

Keywords:	LCS
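
The three steps named in the abstract can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the weight formula, the similarity normalisation, or a decision threshold, so the weight (query-term occurrences divided by sentence length), the similarity (LCS length divided by the longer feature sentence's length), the 0.8 threshold, the sentence-splitting rule, and all function names are assumptions made here for illustration.

```python
import re


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def split_sentences(summary: str) -> list[str]:
    # Assumption: sentences end at Chinese or Western terminal punctuation.
    parts = re.split(r"[。！？.!?]+", summary)
    return [p.strip() for p in parts if p.strip()]


def sentence_weight(sentence: str, query_terms: list[str]) -> float:
    # Assumed weighting: query-term occurrences normalised by sentence length.
    hits = sum(sentence.count(term) for term in query_terms)
    return hits / len(sentence)


def feature_sentence(summary: str, query_terms: list[str]) -> str:
    """Return the highest-weight sentence of the summary ('' if it has none)."""
    sentences = split_sentences(summary)
    if not sentences:
        return ""
    return max(sentences, key=lambda s: sentence_weight(s, query_terms))


def slcs_similarity(summary_a: str, summary_b: str, query_terms: list[str]) -> float:
    """Similarity in [0, 1] from the LCS of the two feature sentences."""
    fa = feature_sentence(summary_a, query_terms)
    fb = feature_sentence(summary_b, query_terms)
    if not fa or not fb:
        return 0.0
    # Assumed normalisation: divide by the longer feature sentence.
    return lcs_length(fa, fb) / max(len(fa), len(fb))


def is_duplicate(summary_a: str, summary_b: str, query_terms: list[str],
                 threshold: float = 0.8) -> bool:
    # The threshold value is hypothetical; the abstract does not state one.
    return slcs_similarity(summary_a, summary_b, query_terms) >= threshold
```

A meta search engine would call `is_duplicate` on the summaries of result pairs returned by different member engines for the same query, keeping one result from each duplicate pair. Comparing only one feature sentence per result keeps the quadratic-time LCS computation short, which is presumably part of the appeal of working on summaries rather than full pages.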
This article is indexed in Wanfang Data and other databases.