Similar Documents
19 similar documents found (search time: 375 ms)
1.
To address the low topical relevance of results returned by topic-focused search engines, a search strategy is proposed that combines a genetic algorithm with a content-based vector space model. The vector space model is used to measure the relevance between a web page and the topic, and the genetic algorithm is applied to relevance discrimination, improving both the precision and the recall of topical search. The corresponding functionality was implemented with Eclipse 3.3 on top of the Heritrix framework. Experimental results show that, with the improved search strategy, the proportion of on-topic pages crawled rose by about 30% compared with the original system.
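The vector-space relevance step can be sketched as follows. The tokenised page and topic below, and the idea of thresholding links in the crawler, are illustrative assumptions; the genetic-algorithm stage of the paper is not shown.

```python
from collections import Counter
from math import sqrt

def cosine_relevance(page_terms, topic_terms):
    """Cosine similarity between a page's term-frequency vector
    and the topic's term-frequency vector (vector space model)."""
    p, t = Counter(page_terms), Counter(topic_terms)
    dot = sum(p[w] * t[w] for w in set(p) & set(t))
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in t.values()))
    return dot / norm if norm else 0.0

# A topic crawler would only follow links whose relevance exceeds a threshold.
page = ["genetic", "algorithm", "search", "engine", "crawler"]
topic = ["topic", "search", "engine", "crawler"]
score = cosine_relevance(page, topic)
```

A page identical to the topic vector scores 1.0; an unrelated page scores 0.0, so a fixed cut-off between the two separates on-topic from off-topic pages.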

2.
吴红艳 《现代情报》2006,26(10):144-145,148
This paper systematically analyses the characteristics of "grey literature" — dispersed ownership, dispersion in time and space, and dispersion of value — together with its current distribution, and then proposes applying a genetic algorithm in a program to achieve fast and effective location on the Internet, enabling readers to quickly find the address of specific grey-literature information and retrieve it.

3.
The rollout and practical use of the State Grid "SG186" marketing business application system has laid the technical foundation for collecting, monitoring and analysing marketing data across the province. By building a unified marketing audit and monitoring platform that integrates the marketing business application, 95598 customer-service resources, the electric-energy data acquisition system, distribution-network information management and other system resources, the platform provides marketing operation display; monitoring of power-supply quality and emergency handling; monitoring and auditing of business results, work quality, data quality and service resources; and topic analysis and query, audit-topic management, audit-task management and audit evaluation. This supports a marketing audit and monitoring system that is "vertically integrated, horizontally connected, scientifically standardised, lean and efficient", realising lean management of the whole marketing process.

4.
Research on Information Filtering Based on a Meta-Genetic Algorithm in Internet Queries   Cited in total: 1 (self-citations: 0; citations by others: 1)
郑红军  杨冰 《情报杂志》2005,24(11):70-71,74
This paper describes the basic concepts of information filtering and the state of information-filtering systems, analyses genetic-algorithm-based information filtering and its background, and on this basis discusses the meta-genetic algorithm, an improvement on the genetic algorithm.

5.
Journal report: The 16th World Computer Congress opened on the morning of 21 August at the Beijing International Convention Centre; President Jiang Zemin attended and delivered a speech. The five-day congress, themed "Information Science and Technology beyond 2000", focused on the development of information technology in the new century and its profound impact on society, covering chip design automation, software theory and applications, communications, information and network security, intelligent information processing, signal processing, information and communication technology in modern education, and information technology in enterprise management. The congress set up a youth forum for university students from home and abroad and a pioneers' forum for renowned scholars; during the congress there was also held, under the theme "Internet technology and business applications", the 16th World…

6.
A brief analysis of improvements to, and applications of, genetic algorithms in the design of topic-oriented meta search engines.

7.
Humanity is entering an era of intelligent interconnection of everything, in which humans, machines and things are deeply integrated. This calls for a new type of information infrastructure: a global-scale, high-throughput, low-entropy computing-power network, figuratively dubbed the "information high-speed rail" (信息高铁). The article introduces the vision of this infrastructure, including its fundamental requirements, key scientific and technical problems, and system architecture. Compared with existing network computing systems such as the Internet, cloud computing, big data and the Internet of Things, its goal is native support for human-machine-thing integration and low-entropy order, reducing the negative impact of system disorder while improving system throughput and application quality.

8.
[Purpose/Significance] To identify the research topics in the domestic user-profiling literature and trace their development, providing a reference for building library user profiles. [Method/Process] An LDA topic model was applied to mine the titles, abstracts and keywords of domestic user-profiling papers; hot topics were analysed by year to reveal each topic's evolution. [Result/Conclusion] Domestic user-profiling research falls into roughly eight topics: new-media marketing; e-commerce systems and precision marketing; recommendation algorithms and recommender systems; health information services; education and teaching; financial services; social networks and content analysis; and academic libraries and information services. By yearly evolution the topics divide into three classes: rising, stable and declining. Academic libraries and information services is the fastest-rising topic, indicating that researchers pay growing attention to applying user profiling in libraries and related fields.
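The rising/stable/declining split described above can be sketched with a simple least-squares slope test over a topic's yearly share of papers; the threshold `eps` and the sample series are illustrative assumptions, not values from the paper.

```python
def topic_trend(yearly_share, eps=0.005):
    """Classify a topic's yearly share series as 'rising', 'stable'
    or 'declining' by the sign of its least-squares slope."""
    n = len(yearly_share)
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(yearly_share) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, yearly_share))
    den = sum((x - x_mean) ** 2 for x in xs)
    slope = num / den
    if slope > eps:
        return "rising"
    if slope < -eps:
        return "declining"
    return "stable"

# e.g. a topic whose share grows year over year is classified as rising
trend = topic_trend([0.05, 0.08, 0.12, 0.18])
```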

9.
王雪华  郑佳露 《现代情报》2006,26(4):204-206,208
To address the widespread problems of "information islands" and redundant construction in e-government informatisation, a J2EE-based integration platform for e-government application systems is designed. Applying the platform "integrates" existing infrastructure and application systems, improving the quality and efficiency of e-government construction and operation.

10.
The "2006 Urban Informatisation Forum", jointly organised by the China Information Industry Chamber of Commerce and the China Association of Mayors, was held on 28 September at the Beijing Friendship Hotel. The theme of the forum was to vigorously promote the application of information technology in urban construction and management, and to foster integration and interaction between the information industry and urban informatisation.

11.
Aiming at cleaning web pages and extracting their topical content, a topic-content extraction algorithm based on page layout is proposed. Following the original page's layout, the algorithm builds a tag tree to segment and classify the page into blocks, then computes each content block's topical relevance to identify the page topic, discard irrelevant information and extract the topical content. Experiments show that the algorithm is suitable for "denoising" and content extraction of topic-oriented pages, and that it performs well in practical applications.
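The block-scoring step can be sketched as below. The real algorithm segments blocks from an HTML tag tree; this sketch assumes the blocks are already tokenised, and the relevance measure (share of topic terms) and threshold are illustrative assumptions.

```python
def extract_topic_blocks(blocks, topic_terms, threshold=0.2):
    """Keep content blocks whose share of topic terms exceeds a threshold;
    `blocks` is a list of token lists, one per layout block."""
    topic = set(topic_terms)
    kept = []
    for tokens in blocks:
        if not tokens:
            continue
        relevance = sum(t in topic for t in tokens) / len(tokens)
        if relevance >= threshold:
            kept.append(tokens)
    return kept

# navigation and copyright blocks score 0 and are dropped as noise
blocks = [["nav", "home", "login"],
          ["genetic", "algorithm", "topic", "crawler"],
          ["copyright", "2006"]]
kept = extract_topic_blocks(blocks, {"genetic", "algorithm", "topic", "crawler"})
```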

12.
Traditional web pages are designed and implemented by professional programmers according to specific requirements, so non-professionals cannot set about building a personalised website. A visual web-page design system is proposed: using ExtJS to build basic and advanced tool modules, and combining modular and object-oriented design ideas, it realises an easy-to-understand web-page design system. The system has an interactive interface similar to desktop application software and is simple to operate. Developing a website with it not only reduces development cost and improves web-page development efficiency, but also lets users quickly and conveniently build a website of their own, realising a personalised web-service mechanism.

13.
With the rapid development of the network, the number of web pages has expanded dramatically, growing exponentially in recent years. Search engines face ever more severe challenges, as it is hard to find pages matching a user's needs accurately and quickly among this mass of pages. Web-page classification is one effective means of addressing this problem; classification by topic and classification by genre are its two mainstreams, and both effectively improve search-engine retrieval efficiency. Web-page genre classification means classifying pages by their form of presentation and intended use. This paper introduces the definition of web-page genre and the classification features commonly used in genre-classification research, and surveys several common feature-selection methods, classification models and classifier evaluation methods, giving researchers an overview of web-page genre classification.

14.
Every day, millions of people worldwide use the internet for information retrieval. Even for small tasks such as fixing a fan, cooking food or ironing clothes, people opt to search the web. To meet these information needs there are billions of web pages scattered across the web, each with a different degree of relevance to the topic of interest (TOI), and this huge size makes manual information retrieval impossible. The page-ranking algorithm is an integral part of a search engine, as it arranges the web pages associated with a queried TOI in order of their relevance; it therefore plays an important role in determining search quality and the user experience of information retrieval. PageRank, HITS and SALSA are well-known page-ranking algorithms based on link-structure analysis of a seed set, but the rankings they give are not yet efficient. In this paper, we propose sNorm(p), a variant of SALSA, for more efficient ranking of web pages. Our approach applies a p-norm from the vector-norm family in a novel way, since vector norms can efficiently reduce the impact of low authority weights in hub-weight calculation. Our study then compares the rankings given by PageRank, HITS, SALSA and sNorm(p) on the same pages for the same queries. The effectiveness of the proposed approach over state-of-the-art methods is shown using the performance measures Mean Reciprocal Rank (MRR), Precision, Mean Average Precision (MAP), Discounted Cumulative Gain (DCG) and Normalized DCG (NDCG). The experiments are performed on a dataset obtained by pre-processing the results collected from the first few pages retrieved for each query by the Google search engine. Thirty queries were designed based on the type and amount of in-hand domain expertise.
The extensive evaluation and result analysis use MRR, Precision, MAP, DCG and NDCG as the statistical performance metrics, and the results are verified with a significance test. The findings show that our approach outperforms state-of-the-art methods, attaining an MRR of 0.8666 and a MAP of 0.7957, thus ranking web pages more efficiently than its counterparts.
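The core idea — replacing the plain sum in the hub-weight update with a p-norm over the authority weights a page points to — can be sketched in a HITS-style iteration. This is a simplified illustration under assumed normalisation choices, not the authors' exact sNorm(p) formulation.

```python
def pnorm_hits(out_links, p=2.0, iters=50):
    """HITS-style iteration where each hub weight is the p-norm
    (rather than the plain sum) of the authority weights it points to."""
    nodes = sorted(set(out_links) | {v for vs in out_links.values() for v in vs})
    hub = {n: 1.0 for n in nodes}
    for _ in range(iters):
        # authority update: sum of hub weights of in-neighbours
        auth = {n: sum(hub[m] for m in nodes if n in out_links.get(m, ())) for n in nodes}
        norm = sum(auth.values()) or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        # hub update: p-norm of the authority weights pointed to
        hub = {n: sum(auth[v] ** p for v in out_links.get(n, ())) ** (1 / p) for n in nodes}
        norm = sum(hub.values()) or 1.0
        hub = {n: v / norm for n, v in hub.items()}
    return auth, hub

# toy seed graph: pages a and b both link to c, so c is the top authority
auth, hub = pnorm_hits({"a": ["c"], "b": ["c"], "c": []})
```

With p > 1, pages pointed to by low-authority neighbours contribute relatively less to a hub's weight than under the plain sum, which is the dampening effect the abstract describes.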

15.
With the increase of information on the Web, it is difficult to quickly find the desired information among the documents retrieved by a search engine. One way to address this problem is to classify web documents according to various criteria. Most document classification has focused on the subject or topic of a document. A genre or style is a view of a document different from its subject or topic, and the genre is also a criterion for classifying documents. In this paper, we propose multiple sets of features for classifying the genres of web documents. The basic set of features, proposed in previous studies, is derived from textual properties of documents, such as the number of sentences or the frequency of a certain word. However, web documents differ from plain text documents in that they contain URLs and HTML tags within their pages. We therefore introduce new sets of features specific to web documents, extracted from URLs and HTML tags. The present work evaluates the performance of the proposed feature sets and discusses their characteristics. Finally, we conclude which set of features is appropriate for automatic genre classification of web documents.
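URL- and tag-based features of the kind described above can be sketched as follows; the concrete feature list here (URL depth, tag counts, word count) is an illustrative assumption, not the paper's actual feature set.

```python
import re
from urllib.parse import urlparse

def web_genre_features(url, html):
    """Extract simple URL- and HTML-tag-based features for genre
    classification, alongside a basic textual feature (word count)."""
    path = urlparse(url).path
    tags = re.findall(r"<\s*([a-zA-Z][a-zA-Z0-9]*)", html)  # opening tags only
    text = re.sub(r"<[^>]+>", " ", html)                    # strip all markup
    return {
        "url_depth": len([p for p in path.split("/") if p]),
        "num_links": tags.count("a"),
        "num_images": tags.count("img"),
        "num_forms": tags.count("form"),
        "num_words": len(text.split()),
    }

feats = web_genre_features(
    "http://example.com/news/2006/story.html",
    '<html><body><a href="#">x</a><img src="y"><p>hello world</p></body></html>')
```

A vector of such features per page would then be fed to any standard classifier.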

16.
Selecting an Optimal Automatic Indexing Scheme for Web Pages and Evaluating Its Indexing Performance   Cited in total: 2 (self-citations: 0; citations by others: 2)
仲云云  侯汉清  薛鹏军 《情报科学》2002,20(10):1108-1110
This paper introduces three automatic indexing schemes for web pages. By comparing manual indexing with the automatic indexing of 50 pages from "China Economic Net" (中国经济网), one scheme is selected: weighting different parts of the full page text and applying frequency-weighted term statistics. Finally, the scheme's automatic subject indexing and classification indexing are each evaluated in terms of the human-machine agreement rate.
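The selected scheme — position-weighted term-frequency statistics — can be sketched as below. The region weights and the top-k cut-off are illustrative assumptions, not the weights used in the paper.

```python
from collections import Counter

# Illustrative region weights: title terms count more than body terms.
REGION_WEIGHTS = {"title": 5, "heading": 3, "body": 1}

def weighted_index_terms(regions, k=3):
    """Score terms by position-weighted frequency and return the top-k
    as index terms; `regions` maps a region name to its token list."""
    scores = Counter()
    for region, tokens in regions.items():
        w = REGION_WEIGHTS.get(region, 1)
        for t in tokens:
            scores[t] += w
    return [t for t, _ in scores.most_common(k)]

terms = weighted_index_terms({
    "title": ["economy"],
    "heading": ["economy", "trade"],
    "body": ["news", "trade", "news"],
})
```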

17.
Through in-depth analysis and identification of web-page structure and content features, this paper studies and experiments with methods for filtering noise pages. First, a threshold filter removes noise pages with obvious features; then feature vectors are built for the remaining pages, which an SVM classifies. Experiments on page data collected from the Web lead to the research conclusions, and future work is outlined.
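The first-stage threshold filter can be sketched as follows; the features (link-word ratio, total word count) and thresholds are illustrative assumptions, and the second-stage SVM classification is not shown.

```python
def threshold_filter(pages, max_link_ratio=0.7, min_words=30):
    """First-stage filter: drop pages whose obvious features mark them as
    noise (link-dominated or nearly empty); the rest go on to a classifier."""
    kept, noise = [], []
    for page in pages:
        link_ratio = page["link_words"] / max(page["total_words"], 1)
        if link_ratio > max_link_ratio or page["total_words"] < min_words:
            noise.append(page)
        else:
            kept.append(page)
    return kept, noise

# a link farm and a near-empty page are filtered; the article survives
kept, noise = threshold_filter([
    {"link_words": 90, "total_words": 100},   # link-dominated -> noise
    {"link_words": 5, "total_words": 200},    # normal article -> kept
    {"link_words": 2, "total_words": 10},     # nearly empty   -> noise
])
```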

18.
This research is part of an ongoing study to better understand citation analysis on the Web. It builds on Kleinberg's research (J. Kleinberg, R. Kumar, P. Raghavan, P. Rajagopalan, A. Tomkins, Invited survey at the International Conference on Combinatorics and Computing, 1999) showing that hyperlinks between web pages constitute a web graph structure, and tries to classify different web graphs in a new coordinate space: out-degree, in-degree. The out-degree coordinate is defined as the number of web pages a given web page links out to; the in-degree coordinate is the number of web pages that point to a given web page. In this coordinate space a metric is built to classify how close or far apart different web graphs are. Kleinberg's web algorithm (J. Kleinberg, Proceedings of the ACM-SIAM Symposium on Discrete Algorithms, 1998, pp. 668–677) for discovering "hub web pages" and "authority web pages" is applied in this new coordinate space. Some very uncommon phenomena have been discovered and interesting new results interpreted. This study does not look at enhancing web retrieval by adding context information; it considers web hyperlinks only as a source for analyzing citations on the web. The author believes that understanding the underlying web page structure as a graph will help design better web algorithms, enhance retrieval and web performance, and recommends using graphs as part of a visual aid for search-engine designers.
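The (out-degree, in-degree) coordinate space can be sketched as below. The Euclidean distance between mean coordinates is one illustrative way to compare web graphs in that space, not necessarily the metric the paper constructs.

```python
def degree_coordinates(out_links):
    """Map each page to its (out-degree, in-degree) coordinate."""
    in_deg = {n: 0 for n in out_links}
    for targets in out_links.values():
        for t in targets:
            in_deg[t] = in_deg.get(t, 0) + 1
    nodes = set(out_links) | set(in_deg)
    return {n: (len(out_links.get(n, ())), in_deg.get(n, 0)) for n in nodes}

def graph_distance(g1, g2):
    """Euclidean distance between the mean coordinates of two web graphs,
    a simple way to measure how far apart they lie in the degree space."""
    def mean(coords):
        xs = [c[0] for c in coords.values()]
        ys = [c[1] for c in coords.values()]
        return sum(xs) / len(xs), sum(ys) / len(ys)
    (x1, y1), (x2, y2) = mean(g1), mean(g2)
    return ((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5

# page a links to b: a is a pure hub (1, 0), b a pure authority (0, 1)
coords = degree_coordinates({"a": ["b"], "b": []})
```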
