Similar documents
Found 20 similar documents (search time: 31 ms)
1.
Users frequently ask World Wide Web search engines for information about people by personal name. Current search engines only return sets of documents containing the queried name, but because several people usually share a personal name, the result sets often contain documents relevant to several different people. These people must be disambiguated in the result sets to help users find the person of interest more readily. In name disambiguation, effectively measuring similarities between documents is a crucial step towards the final disambiguation. We propose a new method that uses web directories as a knowledge base to find common contexts in documents and uses a common-contexts measure to determine document similarities. Experiments conducted on documents mentioning real people on the web, together with several well-known web directory structures, suggest that using web directories to disambiguate people has significant advantages over conventional methods.

2.
Over the past decade, worldwide Internet usage has grown tremendously, with the most rapid growth in emerging regions such as Latin America and the Middle East, where people speaking different languages actively seek information on the web. Global search engines may not adequately address local users’ needs, while regional web portals may lack rich web content. Unlike search engines, web directories organize sites and pages into intuitive hierarchical structures to facilitate browsing. However, high-quality web directories in users’ native languages often do not exist, and developing them requires much domain knowledge that is not readily available. In this research, we propose a novel semi-automatic approach to facilitate web repository management. We applied the approach to developing web directories in the business and health-care domains for the Spanish-speaking and Arabic-speaking communities, respectively. The two directories contain 4735 and 5107 unique sites and pages, respectively, with a maximum depth of 5 levels. Results of experiments involving 37 native speakers show that these directories outperformed existing benchmark directories in terms of browsing effectiveness and efficiency, providing strong implications for information professionals and multinational enterprise managers.

3.
We present three fundamental, interrelated approaches to support multiple access paths to each terminal object in information hierarchies: faceted classification, faceted search, and web directories with embedded symbolic links. This survey aims to demonstrate how each approach supports users who seek information from multiple perspectives. We achieve this by exploring each approach, the relationships between these approaches, including tradeoffs, and how they can be used in concert, while focusing on a core set of hypermedia elements common to all. This approach provides a foundation from which to study, understand, and synthesize applications which employ these techniques. This survey does not aim to be comprehensive, but rather focuses on thematic issues.

4.
One major approach to finding information on the WWW is to navigate through Web directories and browse them until the target pages are found. However, such directories are generally constructed manually and may suffer from narrow coverage and inconsistency. Moreover, most existing directories provide only monolingual hierarchies that organize Web pages in terms a user may not be familiar with. In this work, we propose an approach that automatically arranges multilingual Web pages into a multilingual Web directory to break the language barriers in Web navigation. In this approach, a self-organizing map is trained on each set of monolingual Web pages to obtain two feature maps, which reveal the relationships among Web pages and among thematic keywords, respectively, for that language. We then apply a hierarchy generation process to these maps to obtain the monolingual hierarchy for these Web pages. A hierarchy alignment method is then applied to these monolingual hierarchies to discover the associations between nodes in different hierarchies. Finally, a multilingual Web directory is constructed according to these associations. We applied the proposed approach to a set of Web pages and obtained results that demonstrate the feasibility of our method for multilingual Web navigation.

5.
Broken hypertext links are a frequent problem on the Web. Sometimes the page that a link points to has disappeared forever, but in many other cases the page has simply been moved to another location in the same web site or to another one. In some cases the page, besides being moved, has been updated, so that it differs slightly from the original but remains rather similar. In all these cases it can be very useful to have a tool that provides pages highly related to the broken link, so that we can select the most appropriate one. The relationship between the broken link and its candidate replacement pages can be defined as a function of many factors. In this work we have employed several resources, both in the context of the link and on the Web, to look for pages related to a broken link. From the resources in the context of a link, we have analyzed several sources of information such as the anchor text, the text surrounding the anchor, the URL and the page containing the link. We have also extracted information about a link from the Web infrastructure, such as search engines, Internet archives and social tagging systems. We have combined all of these resources to design a system that recommends pages that can be used to recover the broken link. A novel methodology is presented to evaluate the system without resorting to user judgments, thus increasing the objectivity of the results and helping to adjust the parameters of the algorithm. We have also compiled a web page collection with true broken links, which has been used for a human evaluation of the full system.

6.
Ecommerce is developing into a fast-growing channel for new business, so a strong presence in this domain could prove essential to the success of numerous commercial organizations. However, there is little research examining ecommerce at the individual customer level, particularly on the success of everyday ecommerce searches, even though such success is critical for the continued success of online commerce. The purpose of this research is to evaluate the effectiveness of search engines in the retrieval of relevant ecommerce links. The study examines the effectiveness of five different types of search engines in response to ecommerce queries by comparing the engines’ quality of ecommerce links using topical relevancy ratings. This research employs 100 ecommerce queries, five major search engines, and more than 3540 Web links. The findings indicate that links retrieved using an ecommerce search engine are significantly better than those obtained from most other engine types but do not significantly differ from links obtained from a Web directory service. We discuss the implications for Web system design and ecommerce marketing campaigns.

7.
We propose the use of ISI-JCR categories as units of cocitation and measurement for the construction of heliocentric maps. The use of a spatial metaphor allows us to illustrate, analyze and compare domains in terms of the categories and their interconnections or links. We can also move around within the structure of these domains for further analysis, and access the documents associated with the categories and with the links that cocite or relate them.

8.
This case study analyzes the Internet-based resources that a software engineer uses in his daily work. Methodologically, we studied the web browser history of the participant, classifying all the web pages he had seen over a period of 12 days into web genres. We interviewed him before and after the analysis of the web browser history. In the first interview, he spoke about his general information behavior; in the second, he commented on each web genre, explaining why and how he used it. As a result, three approaches allow us to describe the set of 23 web genres obtained: (a) the purposes they serve for the participant; (b) the role they play in the various work and search phases; and (c) the way they are used in combination with each other. Further observations concern the way the participant assesses the quality of web-based resources, and his information behavior as a software engineer.

9.
Many Web sites have begun allowing users to submit items to a collection and tag them with keywords. The folksonomies built from these tags are an interesting topic that has seen little empirical research. This study compared the search information retrieval (IR) performance of folksonomies from social bookmarking Web sites against search engines and subject directories. Thirty-four participants created 103 queries for various information needs. Results from each IR system were collected and participants judged relevance. Folksonomy search results overlapped with those from the other systems, and documents found by both search engines and folksonomies were significantly more likely to be judged relevant than those returned by any single IR system type. The search engines in the study had the highest precision and recall, but the folksonomies fared surprisingly well. Del.icio.us was statistically indistinguishable from the directories in many cases. Overall the directories were more precise than the folksonomies but they had similar recall scores. Better query handling may enhance folksonomy IR performance further. The folksonomies studied were promising, and may be able to improve Web search performance.
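For reference, the precision and recall figures compared in this abstract are conventionally computed per query as in the minimal Python sketch below. The function name `precision_recall` and the sample document IDs are illustrative assumptions, not taken from the study; the abstract does not say whether recall was measured against a complete relevant set or against pooled results, so the sketch simply takes the relevant set as given.

```python
from typing import Iterable, Set, Tuple


def precision_recall(retrieved: Iterable[str], relevant: Set[str]) -> Tuple[float, float]:
    """Return (precision, recall) for one query.

    precision = relevant retrieved / all retrieved
    recall    = relevant retrieved / all relevant
    Both default to 0.0 when their denominator is empty.
    """
    retrieved_set = set(retrieved)          # ignore duplicate hits
    hits = len(retrieved_set & relevant)    # documents judged relevant AND returned
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical result list and relevance judgments for a single query.
    returned = ["doc_a", "doc_b", "doc_c", "doc_d"]
    judged_relevant = {"doc_a", "doc_c", "doc_e"}
    p, r = precision_recall(returned, judged_relevant)
    print(f"precision={p:.2f} recall={r:.2f}")  # precision = 2/4, recall = 2/3
```

Averaging these per-query values over all queries for each system type would give one number per system that can then be compared statistically.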

10.
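Zhao Jian. 《情报科学》 (Information Science), 2012, (3): 377-380, 413
Search engines, subject directories, and folksonomies are currently the three main ways of classifying and organizing information resources on the Internet, and they correspond to three different types of online information retrieval systems. Building on an in-depth analysis of their respective characteristics, this paper presents the first comparative study of the three and offers some personal views on development trends in the classification and organization of web information.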

11.
Ontologies and folksonomies are currently the most prominent web content classification schemes. While their roles are similar, their engineering is different. In an attempt to combine and harness their distinct powers, web and information scientists are attempting to integrate them, merging the flexibility, collaboration and information aggregation of folksonomies with the standardisation, automated validation and interoperability of ontologies. This paper explores the basics of web information classification engineering, identifies the strengths and weaknesses of the existing methodologies, assesses their effectiveness and investigates a number of key quality issues. It then investigates the existing methods for integrating ontologies and folksonomies and examines the integration requirements. It finally proposes a common framework for reconciliation of the two classification approaches and quality assurance.

12.
Previous studies of academic web interlinking have tended to hypothesise that the relationship between the research of a university and links to or from its web site should follow a linear trend, yet the typical distribution of web data, in general, seems to be a non-linear power law. This paper assesses whether a linear trend or a power law is the more appropriate model for the relationship between research and web site size or outlinks. Following linear regression, analysis of the confidence intervals for the logarithmic graphs, and analysis of the outliers, the results suggest that a linear trend is more appropriate than a non-linear power law.
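The comparison described in this abstract can be illustrated with a minimal Python sketch that fits both a straight line and a power law to the same data. This is a simplified illustration under stated assumptions, not the paper's procedure: it uses ordinary least squares only, omits the confidence-interval and outlier analysis, and the function names (`least_squares`, `r_squared`, `compare_models`) and sample data are hypothetical. A power law y = c·x^k becomes a straight line on log-log axes, so a fitted exponent k close to 1 is consistent with a proportional (linear) relationship.

```python
import math
from typing import Dict, Sequence, Tuple


def least_squares(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(ys: Sequence[float], preds: Sequence[float]) -> float:
    """Coefficient of determination, computed in the original (untransformed) space."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def compare_models(xs: Sequence[float], ys: Sequence[float]) -> Dict[str, float]:
    """Fit a linear trend and a power law; all x and y values must be positive."""
    # Linear trend: y = a + b * x
    b, a = least_squares(xs, ys)
    linear_preds = [a + b * x for x in xs]
    # Power law: y = c * x**k, fitted as log y = log c + k * log x
    k, log_c = least_squares([math.log(x) for x in xs], [math.log(y) for y in ys])
    power_preds = [math.exp(log_c) * x ** k for x in xs]
    return {
        "linear_r2": r_squared(ys, linear_preds),
        "power_r2": r_squared(ys, power_preds),
        "exponent": k,
    }


if __name__ == "__main__":
    # Hypothetical data: research output (x) versus outlink counts (y).
    research = [1.0, 2.0, 3.0, 4.0, 5.0]
    outlinks = [2.1, 3.9, 6.2, 7.8, 10.1]
    print(compare_models(research, outlinks))
```

Comparing both fits in the original space keeps the two R² values on the same scale; a fuller analysis would also inspect residuals and outliers, as the abstract indicates.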

13.
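With the development of the Internet and social networks, more and more personal information is exposed in cyberspace, and effectively collecting and mining this information can uncover the talent information an organization needs. This paper designs a talent discovery and recommendation system. Built on the Hadoop platform, the system uses a web crawler to find web pages, obtains page content through information extraction, extracts keywords from the text with the Lucene tokenizer, mines associated keywords with an association rule algorithm, and recommends talent using an item-similarity strategy. The system gives enterprises a tool for discovering and recommending technical talent based on web page data, saving considerable time and cost.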

14.
Queries submitted to search engines can be classified according to the user goals into three distinct categories: navigational, informational, and transactional. Such classification may be useful, for instance, as additional information for advertisement selection algorithms and for search engine ranking functions, among other possible applications. This paper presents a study of the impact of using several features extracted from the document collection and query logs on the task of automatically identifying the users’ goals behind their queries. We propose the use of new features not previously reported in the literature and study their impact on the quality of the query classification task. Further, we study the impact of each feature on different web collections, showing that the choice of the best set of features may change according to the target collection.

15.
Due to the proliferation and abundance of information on the web, ranking algorithms play an important role in web search. Currently, there are some ranking algorithms based on content and connectivity such as BM25 and PageRank. Unfortunately, these algorithms have low precision and are not always satisfying for users. In this paper, we propose an adaptive method, called A3CRank, based on the content, connectivity, and click-through data triple. Our method tries to aggregate ranking algorithms such as BM25, PageRank, and TF-IDF. We have used reinforcement learning to incorporate user behavior and find a measure of user satisfaction for each ranking algorithm. Furthermore, OWA, an aggregation operator, is used for merging the results of the various ranking algorithms. A3CRank adapts itself to user needs and makes use of user clicks to aggregate the results of ranking algorithms. A3CRank is designed to overcome some of the shortcomings of existing ranking algorithms by combining them and producing an overall better ranking criterion. Experimental results indicate that A3CRank outperforms other combinational ranking algorithms such as Ranking SVM in terms of P@n and NDCG metrics. We have used 130 queries on University of California at Berkeley’s web to train and evaluate our method.
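The OWA (ordered weighted averaging) operator named in this abstract can be sketched in Python as follows. This is a minimal illustration, assuming each ranker's score for a document has already been normalized to [0, 1]; the function names, the weight vector and the sample scores are hypothetical, and the sketch does not reproduce how A3CRank learns from click-through data. The defining property of OWA is that weights attach to rank positions after sorting, not to particular rankers.

```python
from typing import Dict, List, Sequence


def owa(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Ordered weighted averaging: sort scores in descending order, then take
    the weighted sum, so weights[0] applies to the largest score."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    ordered = sorted(scores, reverse=True)
    return sum(w * s for w, s in zip(weights, ordered))


def rank_documents(doc_scores: Dict[str, Sequence[float]],
                   weights: Sequence[float]) -> List[str]:
    """Rank documents by the OWA aggregate of their per-ranker scores."""
    return sorted(doc_scores, key=lambda d: owa(doc_scores[d], weights), reverse=True)


if __name__ == "__main__":
    # Hypothetical normalized scores per document, e.g. from three rankers.
    scores = {
        "page_1": [0.2, 0.9, 0.5],
        "page_2": [0.6, 0.6, 0.6],
        "page_3": [0.1, 0.3, 0.2],
    }
    # Hypothetical weights that emphasize the highest score for each page.
    print(rank_documents(scores, [0.5, 0.3, 0.2]))
```

With weights `[1, 0, 0]` the operator reduces to the maximum and with `[0, 0, 1]` to the minimum, so the weight vector controls how optimistic the aggregation is.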

16.
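Li Jian, Xiao Youguo. 《情报科学》 (Information Science), 2002, 20(1): 103-104, 112
This paper introduces the characteristics and common forms of online advertising, analyzes the advantages and drawbacks of online advertising in conveying advertising information, and proposes some solutions to the information problems in online advertising.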

17.
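This paper proposes an MVC controller design suited to REST (Representational State Transfer) style Web applications. The design describes route mappings and handling logic in a directory-table format file that supports simple Chinese syntax, which keeps the documents clear and concise. Experimental results show that the controller can be configured and used conveniently, improving the efficiency of application development.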

18.
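Yang Lili. 《中国科技信息》 (China Science and Technology Information), 2007, (23): 128-129, 131
Color is the essence of design, and this is especially true in web page design. Color not only gives a design its splendid appearance but is also a language rich in emotion and symbolism. A color scheme that reflects the site's theme and suits visitors' aesthetics is therefore the key to increasing a page's visual appeal and improving the efficiency of information delivery. By analyzing the characteristics and functions of color in web pages and drawing on concrete examples, this paper discusses the principles and methods of using color in web page design.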

19.
The goal of the study presented in this article is to investigate to what extent the classification of a web page by a single genre matches the users’ perspective. The extent of agreement on a single genre label for a web page can help understand whether there is a need for a different classification scheme that overrides the single-genre labelling. My hypothesis is that a single genre label does not account for the users’ perspective. In order to test this hypothesis, I submitted a restricted number of web pages (25 web pages) to a large number of web users (135 subjects) asking them to assign only a single genre label to each of the web pages. Users could choose from a list of 21 genre labels, or select one of the two ‘escape’ options, i.e. ‘Add a label’ and ‘I don’t know’. The rationale was to observe the level of agreement on a single genre label per web page, and draw some conclusions about the appropriateness of limiting the assignment to only a single label when doing genre classification of web pages. Results show that users largely disagree on the label to be assigned to a web page.

20.
Categorized overviews of web search results are a promising way to support user exploration, understanding, and discovery. These search interfaces combine a metadata-based overview with the list of search results to enable a rich form of interaction. A study of 24 sophisticated users carrying out complex tasks suggests how searchers may adapt their search tactics when using categorized overviews. This mixed methods study evaluated categorized overviews of web search results organized into thematic, geographic, and government categories. Participants conducted four exploratory searches during a 2-hour session to generate ideas for newspaper articles about specified topics such as “human smuggling.” Results showed that subjects explored deeper while feeling more organized, and that the categorized overview helped subjects better assess their results, although no significant differences were detected in the quality of the article ideas. A qualitative analysis of searcher comments identified seven tactics that participants reported adopting when using categorized overviews. This paper concludes by proposing a set of guidelines for the design of exploratory search interfaces. An understanding of the impact of categorized overviews on search tactics will be useful to web search researchers, search interface designers, information architects and web developers.
