Similar literature
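1.
Link farm spam and duplicate links severely degrade the performance of link-based ranking algorithms such as HITS. To detect and suppress Web link farm spam and duplicate links, this paper makes full use of duplication information between pages, in particular using complete hyperlink information to identify suspicious link targets. It proposes a bipartite graph structure composed of page documents and complete links. By constructing and analyzing this bipartite graph, the method searches for Web pages that share anchor text and link targets, identifies link farms and duplicate links in the process, and reduces the influence of suspicious links through a weighted adjacency matrix with a penalty factor. Results from real-time experiments and simulated user tests show that the proposed algorithm significantly improves the search quality of traditional HITS-style methods.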

2.
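This paper introduces a core algorithm in the design and development of a topic-focused Web information collection system: the hyperlink topic prediction algorithm. Building on existing theory and experimental analysis, it finds that the topic of a hyperlink depends mainly on three factors: the topical relevance of the parent page, the topical relevance of the anchor text, and the link-structure characteristics of the Web subgraph. On this basis it proposes a hyperlink topic prediction algorithm based on Web page content and link structure. System evaluation results show that the algorithm performs well.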

3.
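Using co-word analysis, this study retrieves papers on link analysis from the CNKI database, identifies high-frequency keywords, builds a keyword co-word matrix with Bicomb, and performs factor analysis and cluster analysis with SPSS in order to examine the current state and hot topics of link analysis research in China. It finds that the methods applied to link analysis mainly include citation analysis, co-link analysis, visualization, and social network analysis; that link analysis algorithms mainly include PageRank, HITS, and Web page ranking; and that applied research concentrates on evaluating network information resources, evaluating the Web impact of websites, and evaluating universities.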
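
The co-word matrix step can be illustrated with a short sketch. It does not reproduce Bicomb; the function name `coword_matrix`, the frequency threshold, and the choice to store each keyword's own frequency on the diagonal are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def coword_matrix(papers, min_freq=2):
    """Build a symmetric keyword co-occurrence matrix.

    papers: list of keyword lists, one list per paper.
    Returns (keywords, matrix), where matrix[i][j] counts the papers in
    which keywords i and j appear together. The diagonal holds each
    keyword's own paper frequency (an assumed convention).
    """
    freq = Counter(kw for kws in papers for kw in set(kws))
    # Keep only high-frequency keywords, in a stable sorted order.
    keywords = sorted(kw for kw, count in freq.items() if count >= min_freq)
    index = {kw: i for i, kw in enumerate(keywords)}
    matrix = [[0] * len(keywords) for _ in keywords]
    for kw, i in index.items():
        matrix[i][i] = freq[kw]
    for kws in papers:
        present = sorted(kw for kw in set(kws) if kw in index)
        for a, b in combinations(present, 2):
            matrix[index[a]][index[b]] += 1
            matrix[index[b]][index[a]] += 1
    return keywords, matrix


papers = [
    ["PageRank", "HITS", "link analysis"],
    ["PageRank", "link analysis"],
    ["HITS", "webometrics"],
]
keywords, matrix = coword_matrix(papers)
```

A matrix of this shape is what factor analysis and cluster analysis tools typically take as input.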

5.
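The paper notes that Web mining developed from data mining and is a comprehensive technology integrating Web technology, data mining, information science, and other fields. It introduces the concept and classification of Web mining, along with algorithms such as HITS and PageRank for mining the link structure between Web pages, and proposes an information retrieval method based on feature extraction from sample patterns. Finally, it analyzes the problems facing Web link mining and future research trends.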

6.
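Citation analysis is a distinctive research method of traditional bibliometrics and scientometrics. This paper analyzes and discusses the development and application of citation analysis in the networked environment, mainly in terms of Web link analysis research, the development of search engine ranking algorithms based on Web page link analysis, and the compilation of new Web citation indexing tools, with the aim of forming a sound understanding and evaluation of the citation analysis method and its value.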

7.
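To address the lack of explicit feedback, sparse borrowing data, and poor performance of traditional recommendation algorithms in university library settings, this paper proposes a recommendation algorithm that optimizes collaborative filtering with time context. It comprises three elements: scoring of readers' reading behavior, time context, and shifts in content interest. In the data preparation stage, an interest scoring model based on user behavior is built by defining score conversion rules and designing a normalization function, which addresses the absence of user ratings. In the recall stage, a nonlinear time decay model is proposed to optimize the rating matrix and improve recommendation quality. In the ranking stage, an interest capture model is proposed to finely rank the recalled results by book category, which alleviates data sparsity and further improves recommendation quality. Experimental results show that the F-measure at Top 5 of the proposed optimized algorithm is 141% higher than that of unoptimized collaborative filtering.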
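
As a rough illustration of time-decayed ratings, the sketch below down-weights older borrowing events before they enter a rating matrix. The abstract does not give the decay function, so the exponential half-life form, the 180-day half-life, and the names `decay_weight` and `decayed_ratings` are all assumptions.

```python
def decay_weight(age_days, half_life_days=180.0):
    """Nonlinear (exponential) decay: the weight halves every half-life.

    The half-life form is an assumed stand-in for the paper's model.
    """
    return 0.5 ** (age_days / half_life_days)


def decayed_ratings(events, now_day, half_life_days=180.0):
    """Aggregate (user, item, rating, day) events into decayed scores."""
    scores = {}
    for user, item, rating, day in events:
        weight = decay_weight(now_day - day, half_life_days)
        key = (user, item)
        scores[key] = scores.get(key, 0.0) + rating * weight
    return scores


# Two borrowing events with the same behavior score, one year apart.
events = [("reader1", "book_old", 4.0, 0), ("reader1", "book_new", 4.0, 360)]
scores = decayed_ratings(events, now_day=360)
```

The decayed scores can then replace raw scores in an ordinary user-item matrix, so that recent interests weigh more in the similarity computation.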

8.
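Research Progress on Personalized Recommendation Based on Social Tagging   Total citations: 6 (self-citations: 2, citations by others: 4)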
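Social tagging is currently a hot topic in Internet research. After a brief introduction to the meaning and structure of social tagging, this paper focuses on progress in recommendation based on social tagging. It first clarifies the significance of tags for user models, then discusses clustering algorithms based on social tagging from three perspectives: users, resources, and tags. It also analyzes ranking algorithms based on social tagging and further divides them into three classes: dependent-supplementary, independent ranking, and general ranking algorithms. It then examines research on tag recommendation, which centers on content analysis, collaborative analysis, and semantic analysis. Finally, it analyzes research on personalized information recommendation in social tagging and finds three main approaches, based on matrix, clustering, and network analysis.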

9.
严海兵  崔志明 《情报学报》2007,26(3):361-365
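Search engines based on keyword matching consider only the importance of Web pages when ranking them and ignore classification, while search engines based on category directories have difficulty analyzing Web information dynamically. After analyzing these shortcomings, this paper proposes using fuzzy clustering to classify search engine results dynamically, ranks results within each category by combining the hyperlink analysis algorithm PageRank with the membership degree of Web documents, and gives a combination formula with an adjusting value. Experiments show that the algorithm meets users' needs more effectively and improves retrieval efficiency.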
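
One plausible way to combine PageRank with fuzzy membership inside a cluster is a convex combination controlled by an adjusting value. The abstract does not state the actual formula, so the linear form, the parameter name `alpha`, and the function `rank_within_cluster` are assumptions for illustration only.

```python
def rank_within_cluster(pages, alpha=0.5):
    """Order pages in one cluster by a blend of PageRank and membership.

    pages: {url: (pagerank, membership)} with membership in [0, 1].
    alpha: assumed adjusting value; 1.0 ranks by PageRank alone,
           0.0 ranks by fuzzy membership alone.
    """
    # Scale PageRank to [0, 1] so both terms are comparable.
    max_pr = max(pr for pr, _ in pages.values()) or 1.0
    score = {
        url: alpha * (pr / max_pr) + (1.0 - alpha) * membership
        for url, (pr, membership) in pages.items()
    }
    return sorted(score, key=score.get, reverse=True)


pages = {"a": (0.4, 0.2), "b": (0.2, 0.9), "c": (0.1, 0.5)}
balanced = rank_within_cluster(pages, alpha=0.5)
pagerank_only = rank_within_cluster(pages, alpha=1.0)
```

Varying `alpha` shifts the ordering between link-based importance and how strongly a document belongs to the category.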

10.
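Building on a summary of existing methods for identifying the local structure of academic networks, which rely mainly on co-link analysis and social network analysis, this study proposes an improved two-step K-core analysis method. It introduces, for the first time, community detection algorithms from complex networks to partition link networks, and attempts through an applicability evaluation to verify the effectiveness of a fast clustering algorithm in identifying the topical structure of homogeneous Web link networks. The experimental results show that the improved K-core analysis method can effectively discover topical clustering in link networks. The fast clustering algorithm introduced in the study clustered the websites of 93 universities into six topical classes, with an average accuracy of 72% as measured by the clustering accuracy metric. These findings confirm that the methodology adopted in this study, from link relationship measurement and data matrix construction to link network analysis, is effective.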
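
For readers unfamiliar with K-core analysis, the sketch below shows the plain (unimproved) k-core computation by repeated peeling of low-degree nodes. It is the textbook procedure, not the two-step variant proposed in the study; the function name `k_core` and the toy graph are assumptions.

```python
def k_core(adj, k):
    """Return the node set of the k-core of an undirected graph.

    adj: {node: set of neighbors}, assumed symmetric.
    The k-core is the largest subgraph in which every node has at
    least k neighbors inside the subgraph.
    """
    alive = set(adj)
    degree = {n: len(adj[n]) for n in adj}
    queue = [n for n in alive if degree[n] < k]
    while queue:
        node = queue.pop()
        if node not in alive:
            continue
        alive.discard(node)
        # Removing a node lowers the degree of its surviving neighbors.
        for neighbor in adj[node]:
            if neighbor in alive:
                degree[neighbor] -= 1
                if degree[neighbor] < k:
                    queue.append(neighbor)
    return alive


# A triangle of mutually linked sites plus one site attached to "a" only.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
core = k_core(adj, 2)
```

On a link network, the surviving nodes form the densely interlinked core that later clustering steps can work on.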

11.
Web search algorithms that rank Web pages by examining the link structure of the Web are attractive from both theoretical and practical aspects. Today's prevailing link-based ranking algorithms rank Web pages by using the dominant eigenvector of certain matrices, such as the co-citation matrix or variations thereof. Recent analyses of ranking algorithms have focused attention on the case where the corresponding matrices are irreducible, thus avoiding singularities of reducible matrices. Consequently, rank analysis has been concentrated on authority connected graphs, which are graphs whose co-citation matrix is irreducible (after deleting zero rows and columns). Such graphs conceptually correspond to thematically related collections, in which most pages pertain to a single, dominant topic of interest. A link-based search algorithm A is rank-stable if minor changes in the link structure of the input graph, which is usually a subgraph of the Web, do not affect the ranking it produces; algorithms A, B are rank-similar if they produce similar rankings. These concepts were introduced and studied recently for various existing search algorithms. This paper studies the rank-stability and rank-similarity of three link-based ranking algorithms (PageRank, HITS and SALSA) in authority connected graphs. For this class of graphs, we show that neither HITS nor PageRank is rank-stable. We then show that HITS and PageRank are not rank-similar on this class, nor is either of them rank-similar to SALSA. This research was supported by the Fund for the Promotion of Research at the Technion, and by the Barnard Elkin Chair in Computer Science.

12.
As the volume of scientific articles has grown rapidly over the last decades, evaluating their impact becomes critical for tracing valuable and significant research output. Many studies have proposed various ranking methods to estimate the prestige of academic papers using bibliometric methods. However, the weight of the links in bibliometric networks has rarely been considered for article ranking in the existing literature. Such incomplete investigation in bibliometric methods could lead to biased ranking results. Therefore, a novel scientific article ranking algorithm, W-Rank, is introduced in this study proposing a weighting scheme. The scheme assigns weight to the links of the citation network and authorship network by measuring citation relevance and author contribution. Combining the weighted bibliometric networks and a propagation algorithm, W-Rank is able to obtain article ranking results that are more reasonable than existing PageRank-based methods. Experiments are conducted on both arXiv hep-th and Microsoft Academic Graph datasets to verify W-Rank and compare it with three renowned article ranking algorithms. Experimental results show that the proposed weighting scheme assists W-Rank in obtaining ranking results of higher accuracy and, in certain perspectives, outperforming the other algorithms.
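
The effect of link weights on a propagation-based ranking can be sketched with a weighted PageRank. This is a generic weighted PageRank, not W-Rank itself: how W-Rank derives citation relevance and author contribution is not given here, so the weights in the example are made-up inputs and the function name `weighted_pagerank` is an assumption.

```python
def weighted_pagerank(weights, damping=0.85, iterations=100):
    """PageRank where each node splits its rank in proportion to link weight.

    weights: {(src, dst): w} with w > 0, e.g. a citation relevance score.
    """
    nodes = sorted({n for edge in weights for n in edge})
    count = len(nodes)
    out_total = {n: 0.0 for n in nodes}
    for (src, _), w in weights.items():
        out_total[src] += w
    rank = {n: 1.0 / count for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing links is spread uniformly.
        dangling = sum(rank[n] for n in nodes if out_total[n] == 0.0)
        base = (1.0 - damping) / count + damping * dangling / count
        new_rank = {n: base for n in nodes}
        for (src, dst), w in weights.items():
            new_rank[dst] += damping * rank[src] * w / out_total[src]
        rank = new_rank
    return rank


# Paper p1 cites a and b; the citation to a is assumed far more relevant.
rank = weighted_pagerank({("p1", "a"): 0.9, ("p1", "b"): 0.1})
```

With equal weights the two cited papers would tie; the weighting breaks the tie in favor of the more relevant citation.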

13.
We address the feature extraction problem for document ranking in information retrieval. We then propose LifeRank, a Linear feature extraction algorithm for Ranking. In LifeRank, we regard each document collection for ranking as a matrix, referred to as the original matrix. We try to optimize a transformation matrix, so that a new matrix (dataset) can be generated as the product of the original matrix and a transformation matrix. The transformation matrix projects high-dimensional document vectors into lower dimensions. Theoretically, there could be very large transformation matrices, each leading to a new generated matrix. In LifeRank, we produce a transformation matrix so that the generated new matrix can match the learning to rank problem. Extensive experiments on benchmark datasets show the performance gains of LifeRank in comparison with state-of-the-art feature selection algorithms.
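
The projection step described above, a new matrix obtained as the product of the original matrix and a transformation matrix, can be shown on a tiny example. The sketch only illustrates the shapes involved; the transformation matrix here is hand-picked, whereas LifeRank optimizes it, and the helper name `matmul` is an assumption.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


# Original matrix: 2 documents described by 3 features each.
original = [
    [1, 0, 2],
    [0, 3, 1],
]
# Hand-picked 3x2 transformation matrix (illustrative, not learned).
transformation = [
    [1, 0],
    [0, 1],
    [1, 1],
]
# New matrix: the same 2 documents, now described by 2 extracted features.
projected = matmul(original, transformation)
```

Each column of the transformation matrix defines one extracted feature as a linear combination of the original features.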

14.
Information Retrieval Journal - Ranking models are the main components of information retrieval systems. Several approaches to ranking are based on traditional machine learning algorithms using a...

15.
Ranking information resources is a task that usually happens within more complex workflows and that typically occurs in any form of information retrieval, being commonly implemented by Web search engines. By filtering and rating data, ranking strategies guide the navigation of users when exploring large volumes of information items. There exist a considerable number of ranking algorithms that follow different approaches focusing on different aspects of the complex nature of the problem, and reflecting the variety of strategies that are possible to apply. With the growth of the web of linked data, a new problem space for ranking algorithms has emerged, as the nature of the information items to be ranked is very different from the case of Web pages. As a consequence, existing ranking algorithms have been adapted to the case of Linked Data and some specific strategies have started to be proposed and implemented. Researchers and organizations deploying Linked Data solutions thus require an understanding of the applicability, characteristics and state of evaluation of ranking strategies and algorithms as applied to Linked Data. We present a classification that formalizes and contextualizes under a common terminology the problem of ranking Linked Data. In addition, an analysis and contrast of the similarities, differences and applicability of the different approaches is provided. We intend this work to be useful when comparing different approaches to ranking Linked Data and when implementing new algorithms.

16.
OBJECTIVE: To quantify the impact of Pakistani Medical Journals using the principles of citation analysis. METHODS: References of articles published in 2006 in three selected Pakistani medical journals were collected and examined. The number of citations for each Pakistani medical journal was totalled. The first ranking of journals was based on the total number of citations; second ranking was based on impact factor 2006 and third ranking was based on the 5-year impact factor. Self-citations were excluded in all the three ratings. RESULTS: A total of 9079 citations in 567 articles were examined. Forty-nine separate Pakistani medical journals were cited. The Journal of the Pakistan Medical Association remains on the top in all three rankings, while Journal of College of Physicians and Surgeons-Pakistan attains second position in the ranking based on the total number of citations. The Pakistan Journal of Medical Sciences moves to second position in the ranking based on the impact factor 2006. The Journal of Ayub Medical College, Abbottabad moves to second position in the ranking based on the 5-year impact factor. CONCLUSION: This study examined the citation pattern of Pakistani medical journals. The impact factor, despite its limitations, is a valid indicator of quality for journals.
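
For reference, the conventional two-year impact factor for 2006 and its five-year analogue can be written as below. This is the standard definition, not a formula quoted from the study, which computed its figures from its own reference sample and excluded self-citations. The symbols are notation assumed here: C_{2006}(y) is the number of citations received in 2006 by items the journal published in year y, and N_y is the number of citable items it published in year y.

```latex
\[
\mathrm{IF}_{2006} = \frac{C_{2006}(2004) + C_{2006}(2005)}{N_{2004} + N_{2005}},
\qquad
\mathrm{IF}^{(5)}_{2006} = \frac{\sum_{y=2001}^{2005} C_{2006}(y)}{\sum_{y=2001}^{2005} N_{y}}
\]
```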

17.
We investigate temporal factors in assessing the authoritativeness of web pages. We present three different metrics related to time: age, event, and trend. These metrics measure recentness, special event occurrence, and trend in revisions, respectively. An experimental dataset is created by crawling selected web pages for a period of several months. This data is used to compare page rankings by human users with rankings computed by the standard PageRank algorithm (which does not include temporal factors) and three algorithms that incorporate temporal factors, including the Time-Weighted PageRank (TWPR) algorithm introduced here. Analysis of the rankings shows that all three temporal-aware algorithms produce rankings more like those of human users than does the PageRank algorithm. Of these, the TWPR algorithm produces rankings most similar to human users', indicating that all three temporal factors are relevant in page ranking. In addition, analysis of parameter values used to weight the three temporal factors reveals that the age factor has the most impact on page rankings, while the trend and event factors have the second and the least impact. Proper weighting of the three factors in the TWPR algorithm provides the best ranking results.
