Similar Documents
20 similar documents found.
1.
Recently, two new indicators (Equalized Mean-based Normalized Proportion Cited, EMNPC; Mean-based Normalized Proportion Cited, MNPC) were proposed which are intended for sparse scientometrics data, e.g., alternative metrics (altmetrics). The indicators compare the proportion of mentioned papers (e.g., on Facebook) of a unit (e.g., a researcher or institution) with the proportion of mentioned papers in the corresponding fields and publication years (the expected values). In this study, we propose a third indicator (Mantel-Haenszel quotient, MHq) belonging to the same indicator family. The MHq is based on Mantel-Haenszel (MH) analysis, an established statistical method for the comparison of proportions. We test (using citations and assessments by peers, i.e., F1000Prime recommendations) whether the three indicators can distinguish between different quality levels as defined on the basis of the assessments by peers; that is, we test their convergent validity. We find that the MHq is able to distinguish between the quality levels in most cases, while MNPC and EMNPC are not. Since the MHq is shown in this study to be a valid indicator, we apply it to six types of zero-inflated altmetrics data and test whether different altmetrics sources are related to quality. The results for the various altmetrics demonstrate that the relationship between altmetrics (Wikipedia, Facebook, blogs, and news data) and assessments by peers is not as strong as the relationship between citations and assessments by peers. In fact, the relationship between citations and peer assessments is about two to three times stronger than the association between altmetrics and assessments by peers.
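The abstract does not reproduce the exact MHq formula, so the following is only a minimal sketch of the statistical machinery the indicator family builds on: the classical Mantel-Haenszel estimator of a common ratio across strata, with each stratum being a field/publication-year cell. The function name and the 2x2 cell layout are illustrative assumptions, not the authors' implementation.

    def mantel_haenszel_quotient(strata):
        """Mantel-Haenszel estimate of a common odds ratio across strata.

        strata: iterable of (a, b, c, d) tuples, one per field/publication-year cell:
            a = unit's papers that are mentioned (e.g., on Facebook)
            b = unit's papers that are not mentioned
            c = reference set's papers that are mentioned
            d = reference set's papers that are not mentioned
        Values above 1 suggest the unit's papers are mentioned more often than
        expected for their fields and publication years.
        """
        num = den = 0.0
        for a, b, c, d in strata:
            n = a + b + c + d
            if n == 0:
                continue  # skip empty field/year cells
            num += a * d / n
            den += b * c / n
        if den == 0:
            raise ValueError("MH quotient undefined: no discordant counts")
        return num / den


    # Toy example: two field/year strata for one research unit (made-up numbers).
    print(mantel_haenszel_quotient([(12, 88, 300, 2700), (5, 45, 120, 1380)]))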

2.
Word-based byte-oriented compression has succeeded on large natural language text databases by providing competitive compression ratios, fast random access, and direct sequential searching. We show that by just rearranging the target symbols of the compressed text into a tree-shaped structure, and using negligible additional space, we obtain a new implicitly indexed representation of the compressed text in which search times are drastically improved. The occurrences of a word can be listed directly, without any text scanning, and in general any inverted-index-like capability, such as efficient phrase searches, can be emulated without storing any inverted-list information. We show experimentally that our proposal performs much more efficiently not only than sequential searches over compressed text but also than explicit inverted indexes and other types of indexes when using little extra space. Our representation is especially successful when searching for single words and short phrases.
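The abstract does not detail the tree-shaped rearrangement itself. As a rough, hedged sketch of the underlying idea, the following builds a plain binary wavelet tree over integer word IDs (rather than the paper's byte-oriented codewords), so that counting a word's occurrences becomes a chain of rank queries instead of a scan of the text; class and method names are assumptions for illustration.

    class WaveletTree:
        """Minimal binary wavelet tree over a sequence of integer word IDs."""

        def __init__(self, seq, lo=None, hi=None):
            if lo is None:
                lo, hi = min(seq), max(seq)
            self.lo, self.hi, self.n = lo, hi, len(seq)
            if lo == hi or not seq:
                self.bits = None          # leaf: every symbol here equals self.lo
                return
            mid = (lo + hi) // 2
            self.bits = [1 if s > mid else 0 for s in seq]
            self.ones = [0]               # prefix sums of set bits for O(1) rank
            for b in self.bits:
                self.ones.append(self.ones[-1] + b)
            self.left = WaveletTree([s for s in seq if s <= mid], lo, mid)
            self.right = WaveletTree([s for s in seq if s > mid], mid + 1, hi)

        def rank(self, sym, i):
            """Number of occurrences of word ID `sym` in the first i positions."""
            if self.bits is None:
                return i if self.lo == sym else 0
            mid = (self.lo + self.hi) // 2
            ones = self.ones[i]
            if sym <= mid:
                return self.left.rank(sym, i - ones)
            return self.right.rank(sym, ones)

        def count(self, sym):
            """Total occurrences of `sym`, obtained without scanning the text."""
            return self.rank(sym, self.n)


    # Toy text encoded as word IDs: "to be or not to be"
    wt = WaveletTree([0, 1, 2, 3, 0, 1])
    print(wt.count(1))  # occurrences of word ID 1 ("be") -> 2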

3.
The changes in the global information landscape, as epitomized by the reaction of governments to the 9/11 attacks, resulted in legislation, policy, and the formation of agencies that have affected many issues related to information and its use. This article examines the recent multiplicity of challenges that affect citizens' control and use of information. In the name of the war on terror, greater national security, and globalization trends, information laws and policies often go further than is necessary and impinge on the information rights of citizens. In this article, we advocate bringing together what are at times disparate information issues under one label, namely "information rights" (which include privacy, freedom of expression, access, etc.). Information rights are approached from a user-centered perspective (i.e., users as citizens, not just consumers). They cover many different aspects of the information life cycle and the roles and responsibilities of individuals and communities. Such an approach provides an alternative way of framing current information issues as they relate to national security policies and civil liberties in the broader sense.

4.
For the purposes of classification it is common to represent a document as a bag of words. Such a representation consists of the individual terms making up the document together with the number of times each term appears in the document. All classification methods make use of the terms. It is also common to make use of the local term frequencies, at the price of some added complication in the model. Examples are the naïve Bayes multinomial model (MM), the Dirichlet compound multinomial model (DCM) and the exponential-family approximation of the DCM (EDCM), as well as support vector machines (SVM). Although it is usually claimed that incorporating local word frequency in a document improves text classification performance, we test here whether such claims are true. In this paper we show experimentally that simplified forms of the MM, EDCM, and SVM models which ignore the frequency of each word in a document perform at about the same level as MM, DCM, EDCM and SVM models which incorporate local term frequency. We also present a new form of the naïve Bayes multivariate Bernoulli model (MBM) which is able to make use of local term frequency, and show again that it offers no significant advantage over the plain MBM. We conclude that word burstiness is so strong that additional occurrences of a word essentially add no useful information to a classifier.
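As a minimal illustration of the kind of comparison reported here, the sketch below trains the same multinomial naïve Bayes classifier twice with scikit-learn, once on raw term counts and once on binarized presence/absence counts. The library, dataset (20 Newsgroups), and parameters are assumptions for illustration, not the paper's own setup.

    from sklearn.datasets import fetch_20newsgroups
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.metrics import accuracy_score
    from sklearn.naive_bayes import MultinomialNB

    train = fetch_20newsgroups(subset="train", remove=("headers", "footers", "quotes"))
    test = fetch_20newsgroups(subset="test", remove=("headers", "footers", "quotes"))

    for binary in (False, True):
        # binary=True keeps only presence/absence, discarding local term frequency
        vec = CountVectorizer(binary=binary, max_features=50000)
        X_train = vec.fit_transform(train.data)
        X_test = vec.transform(test.data)
        clf = MultinomialNB().fit(X_train, train.target)
        acc = accuracy_score(test.target, clf.predict(X_test))
        print(f"binary counts: {binary}  accuracy: {acc:.3f}")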

5.
This study examined ten selected word pairs, each containing a word's full spelling and its abbreviation, to determine which form search engine users preferred in searching. Using seven search logs gathered from several Internet search engines with approximately 608 MB of data, the study measured the occurrences of the twenty terms. The selected words are important in library cataloging, for some are prescribed abbreviations in metadata content standards. The study found that in eight of the ten word pairs users preferred to search the words' full spellings over the abbreviations, often by a wide margin.

6.
"公文"释源     
针对"公文"一词的起源诸说,采用文献检索统计、同类词比较和文本分析等方法,本文发现,"公文"一词起源于东汉末年,却到宋、明、清才日渐盛行;它的出现次数和频次总体呈递增趋势;其文书类用法较为广泛,应用范围始终稳定在"官方政务文书"的基本范畴,是古代汉语在近现代得以发扬光大的典型之一.  相似文献   

7.
Despite a general decline in recent years in academic libraries' reference desk statistics, research indicates that library users continue to have complex research questions but are largely unaware that librarians are waiting and ready to assist them. The challenge for librarians is to connect with users at their point of need. At Bowling Green State University, we are making a move in this direction with proactive (pop-up) chat widgets embedded within our library Web pages, catalog, and databases. Since implementation, the number of chat reference questions received has more than doubled, helping us reach additional users from on- and off-campus.

8.
A Library Book-Lending System and a Single-Scale Bipartite Network Model (total citations: 8; self-citations: 0; citations by others: 8)
This paper studies the library, an interesting complex system, from a network perspective. Readers and books are connected through lending and can be described in network terms at two levels: as a bipartite (reader-book) network and as unipartite (reader-reader, book-book) networks. Taking the degree distribution as our main tool, we studied the network formed by 14 months of lending records from the circulation department of the Beijing Normal University library and found that it exhibits clear single-scale behavior, i.e., the degree distribution decays exponentially. We then propose a single-scale bipartite network model to explain this, which qualitatively reproduces the empirical result.
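Below is a minimal sketch, under assumed toy data, of how the bipartite reader-book network can be built from lending records and its degree ("coordination number") distribution inspected for the exponential decay that characterizes single-scale behavior. The networkx library and the record format are illustrative choices, not the authors'.

    from collections import Counter

    import networkx as nx

    # Assumed lending records: (reader_id, book_id) pairs from circulation logs.
    loans = [("r1", "b1"), ("r1", "b2"), ("r2", "b1"), ("r3", "b3"), ("r3", "b1")]

    G = nx.Graph()
    G.add_nodes_from({r for r, _ in loans}, bipartite="reader")
    G.add_nodes_from({b for _, b in loans}, bipartite="book")
    G.add_edges_from(loans)

    # Degree distribution of the reader side; single-scale behavior shows up
    # as an exponential tail, P(k) ~ exp(-k / k0).
    reader_degrees = [d for n, d in G.degree() if G.nodes[n]["bipartite"] == "reader"]
    dist = Counter(reader_degrees)
    for k in sorted(dist):
        print(k, dist[k] / len(reader_degrees))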

9.
Research Progress in Semantic Analysis of Natural Language (total citations: 5; self-citations: 0; citations by others: 5)
Following the structural levels of natural language (words, sentences, and discourse), this paper analyzes what semantic analysis means at each level, the existing research strategies, their theoretical foundations, and the main methods, and compares the two principal research strategies. Word-level semantic analysis determines word meaning and measures the semantic similarity or relatedness between two words; sentence-level semantic analysis covers both sentence-meaning analysis and sentence-similarity analysis; and text-level semantic analysis is the process of identifying semantic information such as a text's meaning, topic, and category. Current semantic analysis of natural language follows two main research strategies: analysis based on knowledge or semantic rules, and analysis based on statistics. Methods that combine statistics with rules will be the mainstream of future semantic analysis, and ontological semantics is an important foundation for the semantic analysis of natural language.

10.
The numerical-algorithmic procedures of fractional counting and field normalization are often mentioned as indispensable requirements for bibliometric analyses. Against the background of the increasing importance of statistics in bibliometrics, a multilevel Poisson regression model (level 1: publication, level 2: author) shows possible ways to consider fractional counting and field normalization in a statistical model (fractional counting I). However, due to the assumption of duplicate publications in the data set, the approach is not quite optimal. Therefore, a more advanced approach, a multilevel multiple membership model, is proposed that no longer provides for duplicates (fractional counting II). It is assumed that the citation impact can essentially be attributed to time-stable dispositions of researchers as authors who contribute with different fractions to the success of a publication’s citation. The two approaches are applied to bibliometric data for 254 scientists working in social science methodology. A major advantage of fractional counting II is that the results no longer depend on the type of fractional counting (e.g., equal weighting). Differences between authors in rankings are reproduced more clearly than on the basis of percentiles. In addition, the strong importance of field normalization is demonstrated; 60% of the citation variance is explained by field normalization.
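The multilevel multiple-membership model itself requires specialized software and the authors' data. As a hedged, single-level stand-in, the sketch below shows how a Poisson citation model and field normalization can be combined in one regression, with the field/year expected citation rate entering as an exposure (offset); all variable names and the simulated data are assumptions.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 200

    # Assumed data: citation counts per publication, the field/year expected
    # citation rate (the normalization baseline), and one author-level covariate.
    expected = rng.uniform(2, 20, size=n)          # field- and year-specific baseline
    author_disposition = rng.normal(size=n)        # stand-in for a time-stable author effect
    y = rng.poisson(expected * np.exp(0.4 * author_disposition))

    X = sm.add_constant(author_disposition)
    # exposure=expected makes the model predict citations *relative to* the
    # field baseline, i.e., field normalization inside the statistical model.
    model = sm.GLM(y, X, family=sm.families.Poisson(), exposure=expected).fit()
    print(model.summary())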

11.
Municipalities face a dilemma as they pursue technologically enabled modes of providing traditional services. The planning stages of e-government amount to triage: which specific municipal functions and services can a municipality afford to implement (or which services can they afford not to implement) given the costs of technology and technological capability? Little exists in the way of defining the leading edge of innovation among cities. To date, the literature on e-government “best practices” tends to stress creating standards for evaluating web-enabled services rather than for benchmarking the actual status of e-government implementation. In other words, a well-developed literature is emerging around standards by which municipal websites can be evaluated, such as navigability and content standards. These standards do not give us insight, however, into the specific functions and services as they emerge on municipal websites. As a means of addressing this lacuna, the authors created a rubric for benchmarking implementation among cities nationwide using a broad range of functional dimensions and assigning municipalities “e-scores.” In this paper, the authors describe these efforts, their approach, and their findings.

12.
In this paper, a novel neighborhood-based document smoothing model for information retrieval is proposed. Lexical association between terms is used to provide a context-sensitive indexing weight to the document terms, i.e., the term weights are redistributed based on the lexical association with the context words. A generalized retrieval framework is presented, and it is shown that the vector space model (VSM), divergence from randomness (DFR), Okapi Best Matching 25 (BM25) and the language model (LM) based retrieval frameworks are special cases of this generalized framework. Being formulated in the generalized retrieval framework, the neighborhood-based document smoothing model is applicable to all indexing models that use the term-document frequency scheme. The proposed smoothing model is as efficient as the baseline retrieval frameworks at runtime. Experiments over the TREC datasets show that the neighborhood-based document smoothing model consistently improves the retrieval performance of VSM, DFR, BM25 and LM, and the improvements are statistically significant.
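The paper's exact redistribution formula is not given in the abstract; the sketch below illustrates the general idea under assumed notation: each term's indexing weight in a document is mixed with the weights of lexically associated neighbor terms, and the smoothed weights can then be fed to any indexing model that uses term-document frequencies (VSM, DFR, BM25, LM).

    def smooth_term_weights(tf, association, alpha=0.3):
        """Redistribute term weights using lexical association with context terms.

        tf:          dict term -> raw frequency in one document
        association: dict (term, neighbor) -> association strength in [0, 1]
        alpha:       share of weight taken from neighbors (assumed parameter)
        """
        smoothed = {}
        for t in tf:
            neighbor_mass = sum(
                association.get((t, u), 0.0) * tf[u] for u in tf if u != t
            )
            smoothed[t] = (1 - alpha) * tf[t] + alpha * neighbor_mass
        return smoothed


    doc_tf = {"heart": 3, "attack": 2, "cardiac": 0}   # "cardiac" absent but related
    assoc = {("cardiac", "heart"): 0.8, ("heart", "cardiac"): 0.8}
    print(smooth_term_weights(doc_tf, assoc))
    # "cardiac" now receives non-zero weight through its association with "heart".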

13.
Somewhere in the vicinity of 80 percent of all governmental information has some “geographic” element, and the vast majority is called “geospatial” because of the nature of describing spatial phenomena of the earth. Geospatial information has been increasing steadily in popularity and use since the advent of geographic information systems in the 1960s. From the early 1990s until the present, research libraries have seen an increase in the availability of geospatial information, and they have also seen a substantial change in the services provided to support the needs of clients seeking that information. However, recent events have altered the “access landscape.” September 11, 2001, and subsequent events, caused many policy shifts to take place as to how, or whether, access to governmental geospatial information should be granted. This paper explores those policy developments with the goal of prognosticating on the future of access to governmental geospatial information.

14.
15.
In this paper the authors present a preliminary discussion of some results from a survey aimed at exploring, describing and explaining some of the usability characteristics of digital library evaluation in the Mexican context. The study is framed in the evaluation of a multinational, monolingual digital library: the Miguel de Cervantes Virtual Library, from the University of Alicante in Spain. The evaluators were Mexican “expert” users (i.e., Spanish-speaking professional university librarians specialized in electronic reference services) who were asked to apply an evaluation instrument based on usability criteria taken from models in developed countries. Some questions that might be answered with future research are outlined in this paper.

16.
Recogito 2 is an open source annotation tool currently under development by Pelagios, an international initiative aimed at facilitating better linkages between online resources documenting the past. With Recogito 2, we aim to provide an environment for efficient semantic annotation—i.e., the task of enriching content with references to controlled vocabularies—in order to facilitate links between online data. At the same time, we address a perceived gap in the performance of existing tools, by emphasizing the development of mechanisms for manual intervention and editorial control that support the curation of quality data. While Recogito 2 provides an online workspace for general-purpose document annotation, it is particularly well-suited for geo-annotation, in other words annotating documents with references to gazetteers, and supports the annotation of both texts and images (i.e., digitized maps). Already available for testing at http://recogito.pelagios.org, its formal release to the public occurred in December 2016.

17.
Since 2009, Open Access (OA) Week has been celebrated worldwide in October each year. It is an opportunity for librarians to engage with the research community and demonstrate the value that they bring to their organisations in the area of disseminating scholarly output. Although thousands of events have been held since the inception of OA Week, a minimal amount of research has been carried out regarding the impact of these events. This article presents a review of the literature on OA Week and evaluates the effectiveness of three events held during OA Week 2015 in Ireland through the use of statistics and a survey. The three events evaluated were: a seminar run by Repository Network Ireland (RNI), a D.E.A.R. (Drop Everything And Read) campaign using OA materials organized by Dr. Steevens' Library, and a collaborative OA seminar between Dr. Steevens' Library and Dublin Institute of Technology (DIT) libraries. The author concludes that a collaborative approach to planning and managing OA Week between librarians from academic and other sectors can have tangible benefits, both in terms of promoting OA and in promoting the role of the librarian in the OA movement.

18.
This study compares usage figures between equivalent e-books and print books owned by the Texas A&M University Libraries in the physical sciences and technology. For NetLibrary, the top 10 science e-books were used over six times more than the print books, and the top 10 chemistry e-books were used over three times more than their print counterparts. For ebrary, the top 17 science e-books were used at least 17 times more than the same print books. In Safari, the top 10 computer science e-books were used 207 times more than their print counterparts. Usage statistics such as these can help librarians make informed e-book purchase decisions, especially in times of retrenchment.

19.
In many probabilistic modeling approaches to Information Retrieval we are interested in estimating how well a document model “fits” the user’s information need (query model). In statistics, on the other hand, goodness of fit tests are well established techniques for assessing assumptions about the underlying distribution of a data set. Supposing that the query terms are randomly distributed across the documents of the collection, we actually want to know whether the query terms occur in a particular document more often than chance alone would explain. This can be quantified by goodness of fit tests. In this paper, we present a new document ranking technique based on Chi-square goodness of fit tests. Given the null hypothesis that there is no association between the query terms q and the document d beyond chance occurrences, we perform a Chi-square goodness of fit test to assess this hypothesis and calculate the corresponding Chi-square values. Our retrieval formula ranks the documents in the collection according to these calculated Chi-square values. The method was evaluated over the entire TREC test collection, on disks 4 and 5, using the topics of the TREC-7 and TREC-8 conferences (50 topics each). It performs well, steadily outperforming the classical Okapi term-frequency weighting formula, though falling below the KL-divergence language modeling approach. Despite this, we believe the technique is an important non-parametric way of thinking about retrieval, offering the possibility of trying simple alternative retrieval formulas within the framework of goodness-of-fit statistical tests, modeling the data in various ways by estimating or assigning an arbitrary theoretical distribution to the terms.
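Below is a minimal sketch of the ranking idea under the stated null hypothesis, assuming that a query term's expected count in a document is proportional to the document's share of the collection's tokens; counting only over-represented terms is an additional assumption of this sketch, and the authors' exact formulation may differ.

    from collections import Counter


    def chi_square_score(query_terms, doc_tokens, collection_tf, collection_len):
        """Chi-square-style score of how much more often the query terms occur
        in the document than expected by chance (one-sided; an assumption --
        the paper's treatment of under-represented terms may differ)."""
        doc_tf = Counter(doc_tokens)
        share = len(doc_tokens) / collection_len      # document's share of all tokens
        score = 0.0
        for q in query_terms:
            observed = doc_tf[q]
            expected = collection_tf[q] * share       # occurrences expected by chance
            if expected > 0 and observed > expected:  # count only over-representation
                score += (observed - expected) ** 2 / expected
        return score


    docs = {
        "d1": "the cat sat on the mat".split(),
        "d2": "information retrieval with chi square tests".split(),
    }
    collection_tf = Counter(t for toks in docs.values() for t in toks)
    collection_len = sum(len(toks) for toks in docs.values())

    query = ["chi", "square"]
    ranked = sorted(
        docs,
        key=lambda d: chi_square_score(query, docs[d], collection_tf, collection_len),
        reverse=True,
    )
    print(ranked)  # ['d2', 'd1']: d2 over-represents the query terms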

20.
Automatic Mining of New Chinese Words from BBS Text (total citations: 1; self-citations: 0; citations by others: 1)
To address the problem of automatically mining new words from BBS text, this paper proposes a simple and practical method that combines statistics and rules, using key techniques such as Chinese word segmentation, frequency counting, part-of-speech filtering, and recombination of word fragments. A system developed with this method can automatically mine arbitrary context-independent new words of any length, from any domain, and of any category.
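A minimal sketch of the statistics-plus-rules pipeline described above, using the jieba segmenter as an assumed tool (the paper does not name its software): segment each post, recombine adjacent single-character fragments the segmenter could not merge, count candidate frequencies, and filter candidates by part of speech. Which candidates surface depends on the segmenter's dictionary; all names, tag sets, and thresholds are illustrative.

    from collections import Counter

    import jieba.posseg as pseg

    # POS tags treated as function words and filtered out (assumed rule set).
    FUNCTION_POS = {"u", "p", "c", "d", "e", "y", "x", "w"}


    def candidate_new_words(posts, min_freq=3):
        """Mine candidate new words by recombining adjacent segmentation fragments."""
        counts = Counter()
        for text in posts:
            tokens = [(w.word, w.flag) for w in pseg.cut(text)]
            for (w1, f1), (w2, f2) in zip(tokens, tokens[1:]):
                # Fragments: adjacent single-character tokens the segmenter left unmerged.
                if len(w1) == 1 and len(w2) == 1 and f1 not in FUNCTION_POS and f2 not in FUNCTION_POS:
                    counts[w1 + w2] += 1
        # The frequency threshold plays the role of the statistical filter.
        return [(w, c) for w, c in counts.most_common() if c >= min_freq]


    posts = ["这个游戏太给力了", "楼主的帖子真给力", "给力！顶一下", "今天也很给力啊"]
    print(candidate_new_words(posts, min_freq=2))  # unmerged fragments surface as candidates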
