Similar literature
 20 similar documents found (search time: 515 ms)
1.
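Research on Personalized Website Services Based on Web Log Mining   Total citations: 1, self-citations: 0, citations by others: 1
This paper discusses personalized website services based on Web log mining and proposes mining frequent access paths, clustering users, and clustering pages as the core techniques for such services. These algorithms help move a Web site from a "site-centered" to a "user-centered" design: the site should not only provide information that users are interested in collectively, but also offer services tailored to each individual user.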

2.
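吕娜 (Lü Na). 《图书情报工作》 (Library and Information Service), 2007, 51(5): 138-141
The service systems of digital library websites hold large numbers of log files that record user access, and mining these files can reveal how digital library resources are used and what users are interested in. In practice, however, these log files are rarely exploited. To address this, the author ran a mining experiment on the log files and derived users' frequent download sets, which can be used to recommend links and to evaluate database resources.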

3.
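Research on Data Preprocessing Methods for Web Log Mining   Total citations: 1, self-citations: 0, citations by others: 1
Web log mining is the most important application of Web data mining. Analyzing server log files makes it possible to improve a website's organization and performance, add personalized services, and discover potential reader groups. The quality of Web log mining depends on data preprocessing, which consists of data cleaning, user identification, session identification, and formatting. Its purpose is to split the server log into the access sequence of a single visit by each unique user. An algorithm implementation is also given.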
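
The preprocessing steps named in this abstract (data cleaning, user identification, session identification) can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the record layout, the static-file filter, the use of the (IP, user agent) pair as a user key, and the 30-minute timeout are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical record layout: (client IP, user agent, timestamp, requested URL).
SESSION_TIMEOUT = timedelta(minutes=30)  # assumed threshold, not taken from the abstract
STATIC_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js")  # assumed noise filter


def clean(records):
    """Data cleaning: drop requests for embedded static resources."""
    return [r for r in records if not r[3].lower().endswith(STATIC_SUFFIXES)]


def sessionize(records, timeout=SESSION_TIMEOUT):
    """Split records into per-user sessions.

    A user is approximated by the (IP, user agent) pair. A new session starts
    whenever the gap between two consecutive requests exceeds `timeout`.
    Returns a list of (user, [urls in visit order]) tuples.
    """
    by_user = defaultdict(list)
    for ip, agent, ts, url in records:
        by_user[(ip, agent)].append((ts, url))

    sessions = []
    for user, hits in by_user.items():
        hits.sort()  # order each user's requests by timestamp
        current = [hits[0][1]]
        for (prev_ts, _), (ts, url) in zip(hits, hits[1:]):
            if ts - prev_ts > timeout:
                sessions.append((user, current))
                current = []
            current.append(url)
        sessions.append((user, current))
    return sessions
```

Calling `sessionize(clean(records))` yields one URL sequence per visit, which is the kind of input that path mining or clustering would then consume.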

4.
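Building a Web Log Mining System Based on SQL Server 2005   Total citations: 6, self-citations: 0, citations by others: 6
This paper analyzes the application of Web log mining in libraries and proposes a method for building a log mining system on the SQL Server 2005 data mining platform. It describes in detail the functions and implementation of the system's main components, presents the corresponding system framework diagram, and finally discusses the advantages of the design.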

5.
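Research on Web Mining Based on XML and Association Rules*   Total citations: 4, self-citations: 1, citations by others: 4
The paper first gives a brief introduction to Web mining, association rule analysis, and XML, and proposes an approach to Web mining based on XML and association rules. It then discusses XML structure mining, XML content mining, and XML-based Web log mining, establishing a fairly complete XML mining framework.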

6.
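A Discussion of the Application of Web Log Mining in Libraries   Total citations: 1, self-citations: 0, citations by others: 1
The paper first analyzes the model construction and processing workflow of Web log mining, and on that basis discusses the application of Web log mining in libraries.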

7.
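Following the standard server log format, this paper analyzes the records in a library Web server's log files and, by mining the data for a given period, examines how the library website is used.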
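
As a rough illustration of analyzing records in a standard server log format, the sketch below parses lines and counts successful page requests. The abstract does not name the format, so NCSA Common Log Format is assumed here; the function names are hypothetical.

```python
import re
from collections import Counter

# Assumption: NCSA Common Log Format, e.g.
# 127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326
CLF = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = CLF.match(line)
    return match.groupdict() if match else None


def page_counts(lines):
    """Count requests per path, keeping only responses with status 200."""
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record and record["status"] == "200":
            counts[record["path"]] += 1
    return counts
```

`page_counts(open_file)` can be fed the lines of a log file directly; malformed lines are skipped rather than raising an error.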

8.
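Search engine logs record the entire course of interaction between users and the system. Mining these log files can reveal the characteristics and patterns of users' Web search behavior and effectively improve search engine performance. After systematically reviewing and summarizing related research in China and abroad, the article proposes a research framework for Web search engine log mining. The framework covers the research topics of log mining, methods for selecting data sets, methods for data preprocessing, the characteristics and comparison of user behavior across regions, and ways of applying the findings to improve system performance.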

9.
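Analysis and Research Based on Web Log Mining in University Libraries   Total citations: 1, self-citations: 0, citations by others: 1
Taking the Web log files of the Yangtze University Library homepage server as an example, the paper introduces the Web log analysis workflow and the Web log format, analyzes the data preprocessing stage of Web log mining, and finally presents an application example.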

10.
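Studying the Search Behavior of Library Resource Discovery Service Users Through Log Mining   Total citations: 1, self-citations: 0, citations by others: 1
The search behavior of users of a library resource discovery service is studied by means of log mining. An intermediate search log platform was built to generate and record the relevant log information, and a log processing program written in Java stores the log information in a MySQL database. Based on the log results, user search behavior is analyzed in terms of search mode, query language and query length, facet clicks, and query reformulation, with the aim of informing the deployment, application, and optimization of library resource discovery services.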

11.
12.
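A Personalized Web Page Recommendation Model Based on Sequential Patterns*   Total citations: 1, self-citations: 1, citations by others: 0
Based on the sequential pattern method in data mining, the paper proposes a personalized Web page recommendation model. The model first extracts a set of Web transactions by preprocessing Web usage data, then applies a sequential pattern algorithm to mine frequent (contiguous) sequences, and finally builds a frequent (contiguous) sequence tree to generate a user preference view, from which the personalized set of recommended Web pages is produced.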
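
The idea of mining frequent contiguous sequences and using them for recommendation can be sketched as follows. This is a simplified stand-in, not the paper's model: a flat dictionary replaces the frequent sequence tree and the user preference view, and the function names, `min_support`, and `max_len` are illustrative assumptions.

```python
from collections import Counter


def frequent_contiguous_sequences(sessions, min_support, max_len=3):
    """Return contiguous page sequences that occur in at least `min_support` sessions."""
    support = Counter()
    for pages in sessions:
        seen = set()
        for n in range(1, max_len + 1):
            for i in range(len(pages) - n + 1):
                seen.add(tuple(pages[i:i + n]))
        support.update(seen)  # each sequence counts once per session
    return {seq: count for seq, count in support.items() if count >= min_support}


def recommend(frequent, recent_pages):
    """Suggest next pages: the last element of every frequent sequence whose
    prefix matches the end of the user's recent path, ranked by support."""
    recent = tuple(recent_pages)
    scores = Counter()
    for seq, count in frequent.items():
        if len(seq) < 2:
            continue
        prefix = seq[:-1]
        if recent[-len(prefix):] == prefix:
            scores[seq[-1]] = max(scores[seq[-1]], count)
    return [page for page, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]
```

A tree keyed by page would avoid rescanning every frequent sequence for each request; the flat dictionary is used here only to keep the sketch short.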

13.
14.
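Starting from the fact that Web pages and Web sites can be collected and classified, this article examines the proposition that the WWW constitutes a library. The conclusion is that the Web is not a digital library, but that libraries can select and collect material from the Web. Web documents change in two ways. The first is the "persistence" discussed in this article; the second is change in the information carried by a Web page or Web site. The article seeks a better understanding of the lifespan of Web pages and Web sites. Changes in lifespan affect the integrity and usefulness of libraries that hold Web information, but once these changes are understood they can be controlled and managed.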

15.
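[Purpose/Significance] To further examine the validity of altmetric indicators in the Web 2.0 environment. [Method/Process] Using "data mining" as the search term, the set of documents indexed on both Mendeley and Web of Science was obtained. Correlation tests were run on this overlapping set for citation counts versus reader counts and for citation counts versus tag counts; from each pair, the 100 documents with the largest and the 100 with the smallest differences in indicator values were then selected for detailed analysis. [Results/Conclusions] The traditional indicator, citation count, is weakly correlated with both reader counts and tag counts on Mendeley. This confirms that altmetric indicators such as reader counts and tag counts can assess the impact of documents to some extent, and that document type, publication year, and author h-index influence how users read, cite, and otherwise use documents. Future evaluation of document impact should combine traditional bibliometric methods with altmetric methods.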
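
A correlation test between citation counts and reader or tag counts could look like the sketch below. The abstract does not say which test was used; Spearman rank correlation is assumed here because count data are typically skewed, and the implementation is pure Python so that it needs no external library.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

With hypothetical lists `citations` and `readers` for the overlapping documents, `spearman(citations, readers)` returns a coefficient in [-1, 1]; a significance test would be a separate step.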

16.
The most common approach to measuring the effectiveness of Information Retrieval systems is by using test collections. The Contextual Suggestion (CS) TREC track provides an evaluation framework for systems that recommend items to users given their geographical context. The specific nature of this track allows the participating teams to identify candidate documents either from the Open Web or from the ClueWeb12 collection, a static version of the web. In the judging pool, the documents from the Open Web and ClueWeb12 collection are distinguished. Hence, each system submission should be based only on one resource, either Open Web (identified by URLs) or ClueWeb12 (identified by ids). To achieve reproducibility, ranking web pages from ClueWeb12 should be the preferred method for scientific evaluation of CS systems, but it has been found that the systems that build their suggestion algorithms on top of input taken from the Open Web achieve consistently a higher effectiveness. Because most of the systems take a rather similar approach to making CSs, this raises the question whether systems built by researchers on top of ClueWeb12 are still representative of those that would work directly on industry-strength web search engines. Do we need to sacrifice reproducibility for the sake of representativeness? We study the difference in effectiveness between Open Web systems and ClueWeb12 systems through analyzing the relevance assessments of documents identified from both the Open Web and ClueWeb12. Then, we identify documents that overlap between the relevance assessments of the Open Web and ClueWeb12, observing a dependency between relevance assessments and the source of the document being taken from the Open Web or from ClueWeb12. After that, we identify documents from the relevance assessments of the Open Web which exist in the ClueWeb12 collection but do not exist in the ClueWeb12 relevance assessments. We use these documents to expand the ClueWeb12 relevance assessments. 
Our main findings are twofold. First, our empirical analysis of the relevance assessments of 2 years of the CS track shows that Open Web documents receive better ratings than ClueWeb12 documents, especially if we look at the documents in the overlap. Second, our approach for selecting candidate documents from the ClueWeb12 collection based on information obtained from the Open Web makes an improvement step towards partially bridging the gap in effectiveness between Open Web and ClueWeb12 systems, while at the same time we achieve reproducible results on a well-known representative sample of the web.

17.
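Research on Web information retrieval applies the results of text retrieval research and, combined with ideas from Web graph theory, studies information retrieval on the Web; it is an effective route to Web knowledge discovery. The traditional HITS method retrieves information with rather low precision, while PageRank, as a general-purpose search method, cannot be applied to retrieving information on a specific topic. After a thorough analysis of existing algorithms such as PageRank and HITS and of similarity measures for Web documents, the paper proposes the RG-HITS algorithm for discovering information relevant to a topic-specific query on the Web. It combines Web hyperlinks, the information relevance of Web pages' knowledge representation, and the HITS method to search the Web for knowledge related to a specific topic.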
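
For orientation, the baseline HITS method that this abstract builds on can be sketched in a few lines of Python. This is the standard hub/authority iteration only; the content-relevance weighting that distinguishes RG-HITS is not described in the abstract and is not reproduced here. The graph representation and iteration count are assumptions.

```python
from math import sqrt


def hits(links, iterations=50):
    """Standard HITS on a link graph.

    `links` maps each page to the list of pages it links to.
    Returns (hub, authority) score dicts, each normalized to unit length.
    """
    pages = set(links) | {q for targets in links.values() for q in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority update: sum of hub scores of pages linking in.
        auth = {p: 0.0 for p in pages}
        for p, targets in links.items():
            for q in targets:
                auth[q] += hub[p]
        norm = sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {p: v / norm for p, v in auth.items()}

        # Hub update: sum of authority scores of pages linked to.
        hub = {p: sum(auth[q] for q in links.get(p, ())) for p in pages}
        norm = sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {p: v / norm for p, v in hub.items()}

    return hub, auth
```

A topic-sensitive variant would typically scale each link's contribution by a relevance score between the query topic and the page text, which is one plausible reading of how hyperlinks and content relevance are combined.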

18.
This article reports a national study of Internet users' usage of cable television Web site features to illustrate the dynamics of cross-media use in electronic media and explore the role of cable television network Web sites in network branding and viewership. Our findings indicate that younger Internet users are much more likely to use television Web sites than older Internet users. Despite the low use of the enhanced features of television Web sites, an increase in the number of Web site features used positively predicts viewer loyalty, subscriber loyalty, and, to a lesser extent, new subscriber attraction for cable networks.

19.
Web site usage statistics are a widely used tool for Web site development, but libraries are still learning how to use them successfully. This case study summarizes how Morris Library at Southern Illinois University Carbondale implemented Google Analytics on its Web site and used the reports to inform a site redesign. As the main campus library at a research university with about 20,000 undergraduate and graduate students, the library included resources from multiple library departments on a single site. In planning the redesign, Morris Library's Virtual Library Group combined usage reports with information from other sources, such as usability tests and user comments. The Virtual Library Group faced barriers to interpreting and applying the usage statistics in the site redesign, including some that were specific to the library's implementation of the Google Analytics tool and some limitations inherent in Web usage statistics in general. Some key barriers in applying the usage statistics to a redesign included sifting through data that did not have implications for the site redesign, interpreting the implications of usage numbers for the site redesign, and balancing competing interests within the library. Nevertheless, the usage statistics enabled the Virtual Library Group to make better decisions by providing a source of factual information about the site's use rather than relying on staff members’ opinions and conjectures.

20.
This article reports the results of an online survey that examined the development of information architecture of Australian library Web sites with reference to documented methods and guidelines. A broad sample of library Web managers responded from across the academic, public, and special sectors. A majority of libraries used either in-house or external documents or both, but the nature of these documents varied greatly. Most external documents were guidelines handed down by libraries’ parent bodies, though some documents produced by independent organizations were used. More general guides on best IA practice were also consulted. The extent of libraries’ control over their own Web sites also varied widely, from minimal control to complete autonomy. Although guiding documentation was considered useful in some ways, respondents were more interested in developing the necessary IA skills and competencies than in cross-site standardization. The lack of these skills and resource and management issues were a greater concern than a lack of documentation. The influence of parent bodies and the diverse purpose and context of library Web sites suggest that a generic set of guidelines for libraries would not be particularly helpful. Instead, librarians with greater IA skills would be in a better position to apply the most appropriate standards and guidelines according to their local contexts.
