Similar literature
20 similar documents found.
1.
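张皓, 周学广. 《教育技术导刊》, 2013, 12(11): 135-137
Based on an analysis of the working principle and architecture of the open-source web crawler Heritrix, and given that stock Heritrix can only perform a full crawl of an entire website, this work improves Heritrix by adding an incremental crawling module based on a hash algorithm. Experiments show that the improved Heritrix can effectively crawl web pages incrementally.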
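
The hash-based incremental crawling idea in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' Heritrix module: `IncrementalStore` and `should_process` are hypothetical names, and SHA-256 is an assumed choice of hash, since the abstract does not say which one is used.

```python
import hashlib


class IncrementalStore:
    """Remembers one content digest per URL (hypothetical helper, not a Heritrix API)."""

    def __init__(self):
        self._digests = {}

    def should_process(self, url: str, content: bytes) -> bool:
        """Return True if the page is new or its content changed since the last crawl."""
        digest = hashlib.sha256(content).hexdigest()  # assumed hash function
        if self._digests.get(url) == digest:
            return False  # unchanged since the last visit: skip re-processing
        self._digests[url] = digest  # record the new or updated digest
        return True
```

A crawler would call `should_process` after fetching each page and store or index the page only when it returns `True`, so repeated crawls touch only new or modified pages.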

2.
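Taking the open-source web crawler Heritrix as a basis, the paper describes its working principle and architecture. An index is built from a fishery information lexicon, and a Heritrix-based focused crawling algorithm is proposed that filters web pages by both link and content. A fishery information crawler, FishInfoCrawler, is built on this algorithm. Experiments show that the algorithm can crawl web pages relevant to the fishery information domain.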
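
Filtering by link and by content against a term lexicon might look like the sketch below. It does not reproduce FishInfoCrawler: the `LEXICON` terms, the function names, and the `min_ratio` threshold are all assumptions made for illustration, and tokenization is simplified to lowercase ASCII words.

```python
import re

# Assumed toy lexicon; the paper's fishery term base is not reproduced here.
LEXICON = {"fishery", "aquaculture", "trawler", "fish"}


def tokenize(text: str) -> list:
    """Split text into lowercase ASCII word tokens (a simplification)."""
    return re.findall(r"[a-z]+", text.lower())


def link_is_relevant(url: str, anchor_text: str) -> bool:
    """Link filter: keep a link if its URL or anchor text contains a lexicon term."""
    return any(tok in LEXICON for tok in tokenize(url + " " + anchor_text))


def content_is_relevant(text: str, min_ratio: float = 0.05) -> bool:
    """Content filter: keep a page if enough of its tokens are lexicon terms."""
    tokens = tokenize(text)
    if not tokens:
        return False
    hits = sum(1 for tok in tokens if tok in LEXICON)
    return hits / len(tokens) >= min_ratio
```

The link filter decides whether a URL is worth queuing before it is fetched, while the content filter decides whether a fetched page is kept.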

3.
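Taking the open-source web crawler Heritrix as a basis, the paper describes its working principle and architecture. An index is built from a fishery information lexicon, and a Heritrix-based focused crawling algorithm is proposed that filters web pages by both link and content. A fishery information crawler, FishInfoCrawler, is built on this algorithm. Experiments show that the algorithm can crawl web pages relevant to the fishery information domain.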

4.
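The paper introduces the basic concepts of topic-specific search engines and web crawlers and the architecture of the Heritrix system, analyzes Heritrix's workflow, and extends and optimizes the Heritrix framework. A worked example crawls book information from JD.com (京东网), providing web page resources for building a vertical search engine for book information.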

5.
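For a semantics-based intelligent retrieval system model for networked Chuci (楚辞) literature, the three key functional modules are built as follows. The Chuci literature ontology is constructed with the Protégé tool using an object-oriented construction method. A focused crawler is designed by extending the relevant classes of the open-source Heritrix crawler; it crawls networked Chuci literature and saves it to a database. The Lucene package is used to build a semantic index that underpins a topic-oriented intelligent search engine. By exploiting the semantic strengths of the ontology, the framework overcomes the limitation of traditional retrieval systems, which match only documents containing the keywords and lack semantic associations between concepts. This improves the relevance and associativity of search results, as well as the system's reusability, reliability, standardization, and retrieval speed.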

6.
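Based on the open-source crawler Heritrix and the characteristics of e-commerce websites, the paper proposes an incremental crawling strategy for e-commerce sites and implements it on Heritrix, adding product-page extraction and incremental crawling. Crawling experiments on e-commerce websites show that the strategy effectively extracts product information and achieves incremental crawling.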

7.
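As a fast and efficient tool for accessing the vast amount of data on the Web, general-purpose search engines have been popular since their inception. However, they have many design shortcomings and, with the rapid growth of the World Wide Web, increasingly fail to meet users' needs. Against this background, focused crawlers, which crawl web pages on a targeted topic, emerged. The design goal of a focused crawler is to fetch the pages users care about as quickly and accurately as possible with minimal resources, and such crawlers are now widely used. First, the paper reviews the historical background of focused crawlers and the current state of development in China and abroad, and analyzes the technologies relevant to their design, such as the HTTP protocol, HTML parsing, and Chinese word segmentation. Second, it proposes computing topic relevance with the vector space model; to make full use of the rich heuristic information in web pages, it combines page content analysis with link analysis. Finally, based on this study of focused crawler design and implementation, a multithreaded focused crawler is developed in Java.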
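
Topic relevance under the vector space model is commonly computed as the cosine similarity between a topic vector and a page vector. The sketch below assumes raw term-frequency weights and ASCII word tokenization; the paper's actual weighting scheme and its Chinese word segmentation step are not specified in the abstract and are not modeled here. The function names are hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    """Build a term-frequency vector from lowercase ASCII words (a simplification)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

A focused crawler would compare each fetched page against the topic vector and keep the page, or follow its links, only when the similarity exceeds a chosen threshold.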

8.
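To address the low topical relevance of results returned by topic-specific search engines, the paper proposes a search strategy that combines a genetic algorithm with a content-based vector space model. The vector space model determines the relevance of a page to the topic, and the genetic algorithm is applied to relevance judgment, improving the precision and recall of topical search. The functionality was implemented on the Heritrix framework using Eclipse 3.3. Experimental results show that, with the improved search strategy, the proportion of on-topic pages crawled is about 30% higher than with the original system.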

9.
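Focused web crawlers with a centralized architecture place high demands on the performance of a single server and scale poorly. The paper proposes a Hadoop-based distributed focused crawler architecture: the crawler is deployed across different machines in a distributed cluster, and the MapReduce programming model is used to crawl and analyze data so that the machines jointly complete a given crawl task. Experiments show that the distributed architecture, by dynamically adjusting the number of nodes in the cluster, markedly improves the crawler's performance.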

10.
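A focused web crawler for topic-based search fetches only pages related to the user's topic. Based on an in-depth analysis of the distribution characteristics of topical pages and of topic relevance algorithms, the paper proposes a web crawler model for topic-oriented search that overcomes shortcomings of general-purpose search engines such as relatively low precision, relatively stale content, and uneven coverage. Experimental results show that, although search based on a focused crawler increases memory usage, it also improves search accuracy several-fold and raises both crawl efficiency and the utilization of crawl results.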

11.
This paper focuses primarily on the integration of Web 2.0 technologies into social studies education. It documents how various Web 2.0 tools can be utilized in the social studies context to support and enhance teaching and learning. For the purposes of focusing on one specific topic, global connections at the middle school level will be the overlapping theme across the Web 2.0 technologies.

12.
Epidural analgesia has become a popular way to reduce pain during labor. Because epidural use is not limited to women who plan its use, but extends to some who originally planned a nonmedicated birth, it is important for the childbirth educator to provide information on this topic to all women. In this column, the authors provide examples of Web sites that address epidural anesthesia. Web sites for professionals and consumers are included. Encouraging the use of such resources by expectant parents can provide them with good information and allow the class time on this topic to be that of clarification.

13.
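Application of Web data mining in search engines   Total citations: 1, self-citations: 0, citations by others: 1
The paper analyzes the structure of search engines and describes the application of Web mining in search engines from three perspectives: Web structure mining, content mining, and usage mining.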

14.
Focused crawling is an important technique for topical resource discovery on the Web. The key issue in focused crawling is to prioritize uncrawled uniform resource locators (URLs) in the frontier to focus the crawling on relevant pages. Traditional focused crawlers mainly rely on content analysis. Link-based techniques are not effectively exploited despite their usefulness. In this paper, we propose a new frontier prioritizing algorithm, namely the on-line topical importance estimation (OTIE) algorithm. OTIE combines link- and content-based analysis to evaluate the priority of an uncrawled URL in the frontier. We performed real crawling experiments over 30 topics selected from the Open Directory Project (ODP) and compared harvest rate and target recall of the four crawling algorithms: breadth-first, link-context-prediction, on-line page importance computation (OPIC) and our OTIE. Experimental results showed that OTIE significantly outperforms the other three algorithms on the average target recall while maintaining an acceptable harvest rate. Moreover, OTIE is much faster than the traditional focused crawling algorithm.
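
Frontier prioritization in general can be illustrated with a priority queue keyed on a combined score. The sketch below is not OTIE, whose formulas are not given in the abstract: the linear combination, the `alpha` weight, and the `Frontier` class are assumptions chosen only to show how link-based and content-based scores might jointly order uncrawled URLs.

```python
import heapq
import itertools


class Frontier:
    """Priority queue of uncrawled URLs; the highest combined score is popped first."""

    def __init__(self, alpha: float = 0.5):
        self.alpha = alpha  # assumed weight of the content score vs. the link score
        self._heap = []
        self._counter = itertools.count()  # tie-breaker that preserves insertion order

    def push(self, url: str, content_score: float, link_score: float) -> None:
        # Assumed linear combination; OTIE's actual estimate is not reproduced here.
        priority = self.alpha * content_score + (1 - self.alpha) * link_score
        # heapq is a min-heap, so store the negated priority.
        heapq.heappush(self._heap, (-priority, next(self._counter), url))

    def pop(self) -> str:
        """Remove and return the URL with the highest combined priority."""
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)
```

The crawl loop would repeatedly `pop` the best URL, fetch it, score its outgoing links, and `push` them back into the frontier.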

15.
The World Wide Web has become a major information resource for adolescents (i.e., 10–19 years of age), offering an unprecedented amount of information on virtually any topic. While the Web can potentially offer new learning opportunities, it also presents several challenges. Reading and learning on the Web requires a set of advanced literacy skills that adolescents do not necessarily possess and need to develop in order to effectively deal with the complexity of information encountered online. This special issue brings together five empirical articles and a discussion paper that examine internal and external factors that are beneficial (or detrimental) to adolescents’ reading and learning on the Web, and contribute to explaining how young learners develop complex literacy skills. Theoretically, the special issue contributes to the conceptualization of what researchers refer to as ‘multiple documents literacy’. In practice, it informs researchers and educators of emerging empirical results regarding adolescents’ information behaviour, as well as on instructional strategies that can be effective for developing adolescents’ literacy skills.

16.
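Searching for and using online multimedia teaching resources is an important part of information education. Building on the application of topic search technology in specialized domains, the authors construct an education topic vocabulary, extract descriptive information for online multimedia, and improve the topic search strategy. On the basis of an existing topic searcher, they design and implement an online multimedia topic search system for finding teaching-related video, audio, animation, and other multimedia resources on the Web, providing a platform for making effective use of online multimedia teaching resources. Experimental results show that the system improves the efficiency of searching for multimedia teaching resources and is of practical value for teaching.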

17.
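As a new development in education, online education is receiving growing attention. The discussion and application of learning theories in online teaching environments has become a focus of interest. The advance organizer teaching strategy of D. P. Ausubel takes on new meaning in online teaching, and the traditional teaching process based on the advance organizer strategy should likewise be reconstructed as an online teaching process based on that strategy.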

18.
Where do infants go? A longstanding assumption is that infants primarily crawl or walk to reach destinations viewed while stationary. However, many bouts of spontaneous locomotion do not end at new people, places, or things. Study 1 showed that half of 10- and 13-month-old crawlers’ (N = 29) bouts end at destinations—more than previously found with walkers. Study 2 confirmed that, although infants do not commonly go to destinations, 12-month-old crawlers go to proportionally more destinations than age-matched walkers (N = 16). Head-mounted eye tracking revealed that crawlers and walkers mostly take steps in place while fixating something within reach. When infants do go to a destination, they take straight, short paths to a target fixated while stationary.

19.
Web searching is a timely topic whose importance is recognized by researchers, educators and instructional designers. This paper aims to guide these practitioners in developing instructional materials for learning to search the Web. It does so by articulating ten design principles that attend to the content and presentation of Web searching instruction. These principles convey a mixture of insights gleaned from instructional theory, empirical research, and many hours of classroom experience. Together, these design recommendations elucidate the key characteristics of effective Web searching instruction, explaining not only what the instructional materials look like, but also why they look the way they do.

20.
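The Internet provides people with a virtually unlimited wealth of information, yet current browsing tools offer no effective means of shortening network connection time. The paper therefore proposes a multi-agent architecture to assist wireless mobile users in software retrieval and installation, an important topic in intelligent information retrieval.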
