
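Research on SLCS-Based Deduplication for Meta Search
Cite this article: Qin Jie, Xie Hui, Wang Chunyun. Research on SLCS-Based Deduplication for Meta Search [J]. Library and Information Service, 2010, 54(15): 113-116.
Authors: Qin Jie, Xie Hui, Wang Chunyun
Affiliation: College of Information Science and Engineering, Henan University of Technology
Funding: Contract-Based Component Testability Design and Active Testing Technology
Abstract: To address duplicate web pages in meta-search results, this paper applies a web page deduplication method based on the longest common subsequence (LCS) to meta search engines and proposes a deduplication method based on SLCS (the leading S stands for Summary). After the summary of each web page is obtained, the weight of every sentence in the summary's sentence set is computed from the number of times the query terms occur in the sentence and from the sentence length. The sentence with the largest weight is extracted as the feature sentence of the page summary, and the similarity of result pages is computed by comparing the LCS of their feature sentences, with the aim of improving the retrieval quality of the meta search engine. Experiments show that the method achieves relatively high precision.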
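
The pipeline described in the abstract can be sketched roughly as follows. The abstract does not give the weight formula, the similarity normalisation, or a duplicate threshold, so everything concrete here is an assumption made for illustration: the weight is taken as query-term occurrences divided by sentence length, the similarity as LCS length divided by the longer sentence's length, and the threshold of 0.8 is arbitrary. All function names (`lcs_length`, `feature_sentence`, `deduplicate`, and so on) are hypothetical and do not come from the paper.

```python
from __future__ import annotations

import re


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def split_sentences(summary: str) -> list[str]:
    """Naive split on common Chinese and English sentence terminators."""
    parts = re.split(r"[。！？；!?;.]+", summary)
    return [p.strip() for p in parts if p.strip()]


def sentence_weight(sentence: str, query_terms: list[str]) -> float:
    """ASSUMED weight: query-term occurrences divided by sentence length."""
    if not sentence:
        return 0.0
    lowered = sentence.lower()
    hits = sum(lowered.count(t.lower()) for t in query_terms if t)
    return hits / len(sentence)


def feature_sentence(summary: str, query_terms: list[str]) -> str:
    """Pick the highest-weight sentence of a summary as its feature sentence."""
    sentences = split_sentences(summary)
    if not sentences:
        return ""
    return max(sentences, key=lambda s: sentence_weight(s, query_terms))


def similarity(s1: str, s2: str) -> float:
    """ASSUMED normalisation: LCS length over the longer sentence's length."""
    if not s1 or not s2:
        return 0.0
    return lcs_length(s1, s2) / max(len(s1), len(s2))


def deduplicate(results: list[dict], query_terms: list[str],
                threshold: float = 0.8) -> list[dict]:
    """Keep a result only if its feature sentence is not too similar to an
    already kept one. The threshold value is an arbitrary illustration."""
    kept: list[dict] = []
    kept_features: list[str] = []
    for result in results:
        feat = feature_sentence(result["summary"], query_terms)
        if any(similarity(feat, f) >= threshold for f in kept_features):
            continue
        kept.append(result)
        kept_features.append(feat)
    return kept
```

Comparing only one feature sentence per page keeps the quadratic LCS computation short, which is presumably why the method works on summaries rather than on full page text.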

Keywords: web page deduplication; meta search engine; LCS; feature code
Received: 2010-03-23
Revised: 2010-05-28

The Study on the Duplicated Detection Algorithm Based on SLCS with Meta Search Engine
Qin Jie,Xie Hui,Wang Chunyun.The Study on the Duplicated Detection Algorithm Based on SLCS with Meta Search Engine[J].Library and Information Service,2010,54(15):113-116.
Authors:Qin Jie  Xie Hui  Wang Chunyun
Institution: College of Information Science and Engineering, Henan University of Technology
Abstract: Building on a study of duplicate web page detection algorithms, the paper applies a detection algorithm based on LCS (Longest Common Subsequence) to meta search engines and proposes SLCS (the first S stands for Summary) for detecting duplicate pages in meta-search results. The main steps of the SLCS algorithm are as follows: first, the weight of each sentence in a page summary is computed from the length of the sentence and the frequency with which the user's query keywords occur in it; next, the sentence with the largest weight is taken as the feature sentence; finally, the similarity of page summaries is obtained by comparing their feature sentences. Experiments show that the new method achieves high precision.
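
For reference, the LCS length that the comparison step relies on satisfies the standard recurrence below, where $X = x_1 \dots x_m$ and $Y = y_1 \dots y_n$ are two feature sentences and $c[i,j]$ is the LCS length of their prefixes of lengths $i$ and $j$. The recurrence is the textbook definition; the normalised similarity in the last line is only an assumed form, since the abstract does not state how the LCS is turned into a similarity score.

```latex
c[i,j] =
\begin{cases}
0 & \text{if } i = 0 \text{ or } j = 0,\\
c[i-1,j-1] + 1 & \text{if } i,j > 0 \text{ and } x_i = y_j,\\
\max\bigl(c[i-1,j],\; c[i,j-1]\bigr) & \text{if } i,j > 0 \text{ and } x_i \neq y_j,
\end{cases}
\qquad
% Assumed normalisation, not taken from the paper:
\mathrm{sim}(X,Y) = \frac{c[m,n]}{\max(m,n)}.
```

Under this assumed normalisation, two identical feature sentences score 1 and two sentences with no characters in common score 0.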
Keywords:LCS
This article is indexed in Wanfang Data and other databases.