Similar Documents
20 similar documents retrieved (search time: 500 ms)
1.
In earlier papers the authors focused on differences in the ageing of journal literature in the sciences and the social sciences. It was shown that for several fields and topics, standard bibliometric indicators based on journal articles need to be modified in order to provide valid results. In fields where monographs, books or reports are important means of scientific information, standard models of scientific communication are not reflected by journal literature alone. To identify fields where the role of non-serial literature is considerable, or critical in terms of standard bibliometric methods, the complete set of bibliographic citations indexed in the 1993 annual cumulation of the SCI and SSCI databases was processed. The analysis is based on three indicators: the percentage of references to serials, the mean reference age, and the mean reference rate. Applying these measures at different levels of aggregation (i.e., to journals in selected science and social science fields) leads to the following conclusions. (1) The percentage of references to serials proved to be a sensitive measure for characterising typical differences in communication behaviour between the sciences and the social sciences. (2) However, there is an overlap zone which includes fields such as mathematics, technology-oriented science, and some social science areas. (3) In certain social sciences, part of the information even appears to originate from non-scientific sources: references to non-serials do not always represent monographs, pre-prints or reports. Consequently, the model of information transfer from scientific literature to scientific (journal) literature assumed by standard bibliometrics requires substantial revision before valid results can be expected from its application to social science areas.
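The three indicators named above can be computed directly from reference records. The following Python sketch is only an illustration (the field names is_serial and year are hypothetical), not code from the study.

```python
from statistics import mean

def reference_indicators(papers, citing_year):
    """Compute the three indicators for a set of papers.

    `papers` is a list of papers, each a list of reference dicts with
    hypothetical keys: 'is_serial' (bool) and 'year' (int).
    """
    all_refs = [ref for refs in papers for ref in refs]
    pct_serial = 100.0 * sum(r["is_serial"] for r in all_refs) / len(all_refs)
    mean_ref_age = mean(citing_year - r["year"] for r in all_refs)
    mean_ref_rate = len(all_refs) / len(papers)          # references per paper
    return pct_serial, mean_ref_age, mean_ref_rate

# toy example: two papers citing a mix of serials and monographs
papers = [
    [{"is_serial": True, "year": 1990}, {"is_serial": False, "year": 1985}],
    [{"is_serial": True, "year": 1991}, {"is_serial": True, "year": 1989},
     {"is_serial": False, "year": 1970}],
]
print(reference_indicators(papers, citing_year=1993))
```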

2.
In 2019, the International Journal of Information Management (IJIM) celebrated its 40th year of publication. This study commemorates the event by presenting a retrospective of the journal. Using a range of bibliometric tools, we find that the journal has grown impressively in terms of publications and citations. Contributions come from all over the world, but the majority are from Europe and the United States. The journal has mostly published empirical articles, with its authors predominantly using quantitative methodology. Further, the culture of collaboration among authors has increased over the years. The journal publishes on a number of topics, including managing information systems, information technologies and their application in business, technology acceptance among consumers, using information systems for decision making, social perspectives on knowledge management, and information research from the social science perspective. Regression analysis reveals that article attributes such as article order, methodology, presence of authors from Europe, number of references, number of keywords, and abstract length have a significant association with citations. Finally, we find that conceptual and review articles have a positive association with citations.

3.
Merging the citation counts of arXiv-deposited e-prints (the arXiv version) with those of their corresponding published journal articles (the publisher version) is an important issue in citation analysis. Using examples of arXiv-deposited e-prints, this article adopts a manual approach to investigate how bibliographic repositories such as Google Scholar, Web of Science, Scopus, the Astrophysics Data System (ADS), and INSPIRE handle citation merging. Google Scholar and ADS consolidate all citations from the two versions onto the publisher version, whereas INSPIRE accumulates the consolidated citations onto the arXiv version. All these methods ignore the categories of the arXiv-deposited versions and the corresponding availability dates. Web of Science and Scopus count the citations of the two versions separately, effectively treating them as two independent articles. Focusing on journal articles that also appeared as arXiv e-prints, we classify them into two categories and identify two public availability dates as the starting points of citation statistics. We present four feasible schemes for consolidating citation counts of articles with both versions and also propose a universal scheme based on the research output. We further investigated 2,662 e-prints in the "Computer Science - Digital Libraries" subject (cs.DL) on arXiv.org from 1998 to 2018 and manually calculated the consolidated citation counts of arXiv-deposited articles under the corresponding citation merging schemes. These citation consolidation methods are then applied to the evaluation of articles, authors, and journals. This empirical testing demonstrates the feasibility of the schemes proposed in the article.
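As an illustration of the consolidation idea (not the paper's exact four schemes), the sketch below merges the citing-paper sets of the two versions so that each citing paper is counted once, and credits the total to whichever version is chosen as primary.

```python
def consolidate_citations(arxiv_citers, publisher_citers, target="publisher"):
    """Merge the citing-paper sets of an arXiv e-print and its published
    version, counting each citing paper only once.

    arxiv_citers / publisher_citers: sets of citing-paper identifiers.
    target: which version receives the consolidated count ("publisher",
            as Google Scholar/ADS do, or "arxiv", as INSPIRE does).
    """
    merged = set(arxiv_citers) | set(publisher_citers)
    return {"version": target, "citations": len(merged)}

arxiv_citers = {"p1", "p2", "p3"}
publisher_citers = {"p2", "p4"}
print(consolidate_citations(arxiv_citers, publisher_citers))   # 4, not 5
```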

4.
An h-type index is proposed which depends on the citations obtained by the articles belonging to the h-core. This weighted h-index, denoted hw, is presented in a continuous setting and in a discrete one. It is shown that in the continuous setting the new index enjoys many good properties. In the discrete setting some small deviations from the ideal may occur.
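The abstract does not reproduce the formula. The sketch below implements one published discrete formulation of the citation-weighted h-index (hw as the square root of the citations accumulated in the weighted core); treat the exact definition used here as an assumption.

```python
import math

def h_index(citations):
    """Classical h-index: largest h with at least h papers cited >= h times."""
    cites = sorted(citations, reverse=True)
    return sum(1 for i, c in enumerate(cites, start=1) if c >= i)

def weighted_h_index(citations):
    """Discrete citation-weighted h-index hw (definition assumed, see lead-in)."""
    cites = sorted(citations, reverse=True)
    h = h_index(cites)
    if h == 0:
        return 0.0
    cum, core_sum = 0, 0
    for c in cites:
        cum += c
        if cum / h <= c:        # weighted rank still within this paper's citations
            core_sum = cum      # citations accumulated up to the last valid rank
        else:
            break
    return math.sqrt(core_sum)

cites = [10, 8, 5, 4, 3, 1]
print(h_index(cites), weighted_h_index(cites))   # 4 and sqrt(18) ~ 4.24
```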

5.
[Purpose] To emphasize the role of citations in scholarly argumentation within research papers and to improve citation quality. [Method] According to the different argumentative roles that citations play in research papers, citations are divided into two broad categories, those serving a key argumentative function and those serving a non-key function, and on this basis the concept of the "key citation" and a method for marking key citations are proposed. [Result] A key citation is a cited reference that plays a crucial role in the scholarly argumentation of a paper and is indispensable to its core content. Methods for identifying and annotating key citations are proposed. [Conclusion] A correct understanding of key citations provides useful guidance for the appropriate citing of references in scientific writing, for manuscript reviewing, and for citation analysis, and helps improve the quality of scholarly communication.

6.
[Objective] To partition the core zone of journal literature based on citation frequency using the h-index and the 80/20 rule, providing researchers with a new way to identify research hotspots. [Method] Taking 15,592 papers from 84 library and information science journals indexed in the Web of Science as the sample, the core zone of the journal literature was delimited by both the h-index and the 80/20 rule based on citation counts. [Results] Under the 80/20 rule, the core zone accounted for 6.12%-48.61% of the total literature, with a mean of 33.81%; under the h-index, the core zone accounted for 0.56%-26.09%, with a mean of 7.80%. At the level of the whole discipline, the 80/20-rule core zone accounting for 80% of citations comprised 38.05% of the literature, while the h-index core zone comprised 0.67%. The mean citations per paper in the core zone were higher under the h-index partition than under the 80/20-rule partition. [Conclusion] The h-index partition of the core zone shows a stronger concentration than the 80/20 rule, so the h-index delimits the core zone of journal literature more effectively.
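A minimal sketch of the two partition rules as described above, assuming a plain list of citation counts per paper: the h-index core is the h most-cited papers, while the 80/20 core is the smallest top-cited set accumulating 80% of all citations.

```python
def core_by_h_index(citations):
    """Core zone = the h most-cited papers (the h-core)."""
    cites = sorted(citations, reverse=True)
    h = sum(1 for i, c in enumerate(cites, start=1) if c >= i)
    return cites[:h]

def core_by_80_20(citations, share=0.8):
    """Core zone = smallest top-cited set accumulating `share` of all citations."""
    cites = sorted(citations, reverse=True)
    target, cum, core = share * sum(cites), 0, []
    for c in cites:
        if cum >= target:
            break
        core.append(c)
        cum += c
    return core

cites = [120, 60, 30, 20, 10, 5, 3, 2, 1, 0]
h_core, pareto_core = core_by_h_index(cites), core_by_80_20(cites)
# share of papers falling in each core zone
print(len(h_core) / len(cites), len(pareto_core) / len(cites))
```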

7.
Research Policy, 2023, 52(7): 104815
Due to the inadequacy of official notices in disseminating retraction information, a significant proportion of retracted articles continue to be cited after retraction. Citing such questionable articles has adverse consequences. This study extends the literature on official versus unofficial information channels by examining three key roles that unofficial channels can play in disseminating retraction information (providing broader reach for information dissemination, packaging information from different sources, and creating new information), as well as the effects of these roles. An unofficial information channel affords a broader reach for information dissemination, which reduces post-retraction citations. Moreover, according to information processing theory, different types of additional information (arising from an unofficial channel's ability to package information from different sources or to create new information) can moderate this effect. Leveraging the launch of Retraction Watch (RW), an unofficial information channel for reporting retractions, this study designed a natural experiment and found that reporting retractions on RW significantly reduced post-retraction citations of non-swiftly retracted articles in the biomedical sciences. Furthermore, additional author-related and retraction-related information provided on RW enhanced the main effect, whereas additional article-related information provided on RW weakened it.

8.
We present a new variable-length encoding scheme for sequences of integers, Directly Addressable Codes (DACs), which enables direct access to any element of the encoded sequence without the need for any sampling method. Our proposal is a kind of implicit data structure that introduces synchronism into the encoded sequence while using asymptotically no extra space. We present experiments demonstrating that the technique is not only simple, but also competitive in time and space with existing solutions in several applications, such as the representation of LCP arrays or high-order entropy-compressed sequences.
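A much-simplified toy version of the DAC idea, for illustration only: each integer is split into fixed-width chunks, the chunks of each level are stored contiguously together with a continuation bitmap, and access follows the bitmaps level by level (real DACs replace the naive rank used here with a compact rank structure).

```python
class SimpleDAC:
    """Toy Directly Addressable Codes: b-bit chunks per level plus a
    continuation bitmap per level.  Rank is computed naively here; real
    DACs use o(n)-space rank structures for constant-time access."""

    def __init__(self, values, b=4):
        self.b = b
        self.levels = []            # list of (chunks, continuation_bits)
        current = list(values)
        while current:
            chunks, bits, nxt = [], [], []
            for v in current:
                chunks.append(v & ((1 << b) - 1))      # lowest b bits
                rest = v >> b
                bits.append(1 if rest > 0 else 0)
                if rest > 0:
                    nxt.append(rest)
            self.levels.append((chunks, bits))
            current = nxt

    def access(self, i):
        """Return the i-th encoded integer without decoding its neighbours."""
        value, shift, pos = 0, 0, i
        for chunks, bits in self.levels:
            value |= chunks[pos] << shift
            shift += self.b
            if not bits[pos]:
                break
            pos = sum(bits[:pos])   # rank1(bits, pos): position at the next level
        return value

seq = [3, 130, 7, 1025, 16]
dac = SimpleDAC(seq, b=4)
print([dac.access(i) for i in range(len(seq))])   # [3, 130, 7, 1025, 16]
```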

9.
10.
Automatic document summarization using citations is based on summarizing what others explicitly say about a document, by extracting a summary from the text around the citations (citances). While this technique works quite well for summarizing the impact of scientific articles, other genres of documents, as well as other types of summaries, require different approaches. In this paper, we introduce a new family of methods developed for legal document summarization, which generate catchphrases for legal cases (catchphrases being a form of legal summary). Our methods use both incoming and outgoing citations, and we show how citances can be combined with other elements of cited and citing documents, including the full text of the target document and the catchphrases of cited and citing cases. On a legal summarization corpus, our methods outperform competitive baselines. The combination of full-text sentences and catchphrases from cited and citing cases is particularly successful. We also apply and evaluate the methods on scientific paper summarization, where they perform at the level of state-of-the-art techniques. Our family of citation-based summarization methods is powerful and flexible enough to successfully target a range of different domains and summarization tasks.
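One ingredient of citation-based summarization can be sketched as follows: score the target document's sentences by their TF-IDF similarity to the incoming citances and keep the top-ranked sentences as candidate catchphrases. This is an illustrative simplification, not the authors' full method.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def citance_based_summary(target_sentences, citances, k=2):
    """Rank the target document's sentences by similarity to its citances."""
    vec = TfidfVectorizer()
    X = vec.fit_transform(target_sentences + citances)
    sent_vecs, cit_vecs = X[: len(target_sentences)], X[len(target_sentences):]
    scores = cosine_similarity(sent_vecs, cit_vecs).max(axis=1)  # best citance match
    top = np.argsort(scores)[::-1][:k]
    return [target_sentences[i] for i in sorted(top)]            # keep document order

sentences = ["The court held that the contract was void.",
             "Costs were awarded to the appellant.",
             "The hearing was adjourned twice."]
citances = ["In Smith v Jones the contract was held void for uncertainty."]
print(citance_based_summary(sentences, citances, k=1))
```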

11.
Research Policy, 2019, 48(7): 1855-1865
Quantitative research evaluation requires measures that are transparent, relatively simple, and free of disciplinary and temporal bias. We document and provide a solution to a hitherto unaddressed temporal bias – citation inflation – which arises from the basic fact that scientific publication is growing steadily at roughly 4% per year. Moreover, because the total production of citations grows by a factor of 2 every 12 years, the real value of a citation depends on when it was produced. Consequently, failing to convert nominal citation values into real citation values produces significant mis-measurement of scientific impact. To address this problem, we develop a citation deflator method, outline the steps to generalize and implement it using the Web of Science portal, and analyze a large set of researchers from biology and physics to demonstrate how two common evaluation metrics – total citations and the h-index – can differ by a remarkable amount depending on whether the underlying citation counts are deflated or not. In particular, our results show that the scientific impact of prior generations is likely to be significantly underestimated when citations are not deflated, often by 100% or more of the nominal value. Thus, our study points to the need for a systemic overhaul of the counting methods used in evaluating citation impact – especially for researchers, journals, and institutions – whose records can span several decades and thus several doubling periods.
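A minimal sketch of the deflation idea, assuming a constant annual growth rate derived from the 12-year doubling period mentioned above; the paper's actual deflator may be defined differently.

```python
import math

# citation production doubles roughly every 12 years (see abstract),
# i.e. an annual growth rate g with (1 + g) ** 12 == 2
G = 2 ** (1 / 12) - 1        # ~0.0595

def deflate(nominal_citations, year_received, base_year=2000, growth=G):
    """Convert a nominal citation count into 'real' citations expressed in
    base_year units (a sketch of the idea, not the paper's exact procedure)."""
    return nominal_citations / (1 + growth) ** (year_received - base_year)

def deflated_total(citations_by_year, base_year=2000):
    return sum(deflate(n, y, base_year) for y, n in citations_by_year.items())

record = {1995: 10, 2005: 10, 2015: 10}
# early citations count for more when expressed in year-2000 units
print(round(deflated_total(record), 1))
```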

12.
Prior art patent citations have become a popular measure of patent quality and knowledge flow between firms. Interpreting these measurements is complicated, in some cases, because prior art citations are added by patent examiners as well as by patent applicants. The U.S. Patent and Trademark Office (USPTO) adopted new reporting procedures in 2001, making it possible to measure examiner and applicant citations separately for the first time. We analyzed prior art citations listed in all U.S. patents granted in 2001-2003, and found that examiners played a significant role in identifying prior art, adding 63% of the citations on the average patent, and all of the citations on 40% of granted patents. An analysis of variance found that firm-specific variables explain most of the variation in examiner-citation shares. Using multivariate regression, we found that foreign applicants to the USPTO had the highest proportion of citations added by examiners. High-volume patent applicants had a greater proportion of examiner citations, and a substantial number of firms won patents without listing a single applicant citation. In terms of technology, we found higher examiner shares among patents in electronics, communications, and computer-related fields. Taken together, our findings suggest that firm-level patenting practices, particularly among high-volume applicants, have a strong influence on citation data and merit additional research.
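Computing an examiner-citation share per patent and per firm is straightforward once citations are labeled by who added them; the pandas sketch below uses hypothetical column names (patent_id, assignee, added_by) and toy data.

```python
import pandas as pd

# hypothetical schema: one row per prior-art citation on a granted patent
cites = pd.DataFrame({
    "patent_id": ["P1", "P1", "P1", "P2", "P2", "P3"],
    "assignee":  ["AcmeCo", "AcmeCo", "AcmeCo", "AcmeCo", "AcmeCo", "BetaInc"],
    "added_by":  ["examiner", "applicant", "examiner",
                  "examiner", "examiner", "applicant"],
})

cites["is_examiner"] = (cites["added_by"] == "examiner").astype(int)

# examiner-citation share per patent, then averaged per assignee
per_patent = cites.groupby(["assignee", "patent_id"])["is_examiner"].mean()
per_firm = per_patent.groupby("assignee").mean()
print(per_firm)
# fraction of patents on which *all* citations were added by the examiner
print((per_patent == 1.0).mean())
```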

13.
This paper describes, evaluates and compares the use of Latent Dirichlet Allocation (LDA) as an approach to authorship attribution. Based on this generative probabilistic topic model, we can model each document as a mixture of topic distributions, with each topic specifying a distribution over words. Based on author profiles (the aggregation of all texts written by the same writer), we suggest computing the distance to a disputed text to determine its possible writer. This distance is based on the difference between the two topic distributions. To evaluate different attribution schemes, we carried out an experiment based on 5,408 newspaper articles (Glasgow Herald) written by 20 distinct authors. To complement this experiment, we used 4,326 articles extracted from the Italian newspaper La Stampa and written by 20 journalists. This research demonstrates that the LDA-based classification scheme tends to outperform the Delta rule and the χ² distance, two classical approaches in authorship attribution based on a restricted number of terms. Compared to the Kullback–Leibler divergence, the LDA-based scheme can provide better effectiveness when considering a larger number of terms.
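A minimal sketch of the attribution scheme as described: fit LDA over the author profiles plus the disputed text and attribute the text to the author with the closest topic distribution. The distance used here (Jensen-Shannon) and the toy corpus are illustrative; the paper compares several distance measures.

```python
from scipy.spatial.distance import jensenshannon
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

profiles = {                 # author profile = all known texts of one author
    "authorA": "the match ended with a late goal scored by the striker at home",
    "authorB": "the central bank raised interest rates as inflation climbed again",
}
disputed = "rates were cut after inflation eased and markets rallied"

docs = list(profiles.values()) + [disputed]
X = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=5, random_state=0)
theta = lda.fit_transform(X)          # per-document topic distributions

# distance between each profile's topic mix and the disputed text's topic mix
dists = {name: jensenshannon(theta[i], theta[-1])
         for i, name in enumerate(profiles)}
print(min(dists, key=dists.get), dists)
```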

14.
Extractive summarization of academic articles in the natural sciences and medicine has long attracted attention. However, most existing extractive summarization models process academic articles with sentence classification models, which struggle to produce comprehensive summaries. To address this issue, we explore a new view of extractive summarization of academic articles in the natural sciences and medicine by treating it as a question-answering process. We propose a novel framework, MRC-Sum, in which the extractive summarization of such articles is cast as a machine reading comprehension (MRC) task. To instantiate MRC-Sum, article-summary pairs in the summarization datasets are first reconstructed into (Question, Answer, Context) triples for the MRC task. Several questions are designed to cover the main aspects (e.g., Background, Method, Result, Conclusion) of the articles. A novel strategy is proposed to handle the non-existence of ground-truth answer spans. MRC-Sum is then trained on the reconstructed datasets with large-scale pre-trained models. During inference, the answer spans for the four predefined questions produced by MRC-Sum are concatenated to form the final summary of each article. Experiments on three publicly available benchmarks, the Covid, PubMed, and arXiv datasets, demonstrate the effectiveness of MRC-Sum. Specifically, MRC-Sum outperforms advanced extractive summarization baselines on the Covid dataset and achieves competitive results on the PubMed and arXiv datasets. We also propose a novel metric, COMPREHS, to automatically evaluate the comprehensiveness of system summaries for academic articles in the natural sciences and medicine. Extensive experiments verify the reliability of the proposed metric, and its results show that MRC-Sum generates more comprehensive summaries than the baseline models.
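The inference stage can be illustrated with a generic extractive question-answering model standing in for the trained MRC-Sum model; the questions and the model name below are illustrative assumptions, not the authors' artifacts.

```python
from transformers import pipeline

# A generic extractive QA model stands in for the trained MRC-Sum model here.
qa = pipeline("question-answering",
              model="distilbert-base-cased-distilled-squad")

questions = [
    "What is the background of the study?",
    "What method does the study use?",
    "What are the results?",
    "What is the conclusion?",
]

article = ("... full text of an academic article in the natural sciences "
           "or medicine goes here ...")

# one answer span per predefined question, concatenated into the summary
spans = [qa(question=q, context=article)["answer"] for q in questions]
summary = " ".join(spans)
print(summary)
```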

15.
We present a comparative study of four impact measures: the h-index, the g-index, the R-index and the j-index. The g-index satisfies the transfer principle, the j-index satisfies the opposite transfer principle while the h- and R-indices do not satisfy any of these principles. We study general inequalities between these measures and also determine their maximal and minimal values, given a fixed total number of citations.
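For reference, the sketch below computes three of the four measures from a list of citation counts using their standard definitions (the j-index is omitted because its definition is not given in the abstract).

```python
import math
from itertools import accumulate

def h_index(citations):
    """Largest h such that h papers each have at least h citations."""
    cites = sorted(citations, reverse=True)
    return sum(1 for i, c in enumerate(cites, 1) if c >= i)

def g_index(citations):
    """Largest g such that the g most-cited papers together have >= g^2 citations."""
    cites = sorted(citations, reverse=True)
    return sum(1 for i, s in enumerate(accumulate(cites), 1) if s >= i * i)

def r_index(citations):
    """Square root of the total citations received by the h-core."""
    cites = sorted(citations, reverse=True)
    return math.sqrt(sum(cites[: h_index(cites)]))

cites = [25, 17, 12, 3, 2, 1, 1, 0]
print(h_index(cites), g_index(cites), round(r_index(cites), 2))
```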

16.
Textual entailment is a task for which the application of supervised learning mechanisms has received considerable attention, driven by the successive Recognizing Textual Entailment (RTE) data challenges. We developed a linguistic analysis framework in which a number of similarity/dissimilarity features are extracted for each entailment pair in a data set, and various classifier methods are evaluated on the instance data derived from the extracted features. The focus of the paper is to compare and contrast the performance of single and ensemble-based learning algorithms on a number of data sets. We show that there is some benefit to the use of ensemble approaches but, based on the extracted features, Naïve Bayes proved to be the strongest learning mechanism. Only one ensemble approach demonstrated a slight improvement over Naïve Bayes.
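A minimal sketch of the setup, with illustrative (not the paper's) similarity/dissimilarity features, comparing Naive Bayes against a small soft-voting ensemble.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

def pair_features(text, hypothesis):
    """Illustrative similarity/dissimilarity features for one entailment pair."""
    t, h = set(text.lower().split()), set(hypothesis.lower().split())
    overlap = len(t & h) / len(h)          # coverage of hypothesis words
    length_ratio = len(h) / len(t)
    unmatched = len(h - t)                 # hypothesis words missing from the text
    return [overlap, length_ratio, unmatched]

pairs = [
    ("a man is playing a guitar on stage", "a man plays music", 1),
    ("a man is playing a guitar on stage", "a woman is swimming", 0),
    ("the cat sat on the mat", "a cat is on a mat", 1),
    ("the cat sat on the mat", "the dog barked loudly", 0),
] * 5                                      # repeat the toy data for cross-validation

X = np.array([pair_features(t, h) for t, h, _ in pairs])
y = np.array([label for _, _, label in pairs])

nb = GaussianNB()
ensemble = VotingClassifier([("nb", GaussianNB()),
                             ("lr", LogisticRegression()),
                             ("rf", RandomForestClassifier(n_estimators=50))],
                            voting="soft")
for name, clf in [("naive bayes", nb), ("ensemble", ensemble)]:
    print(name, cross_val_score(clf, X, y, cv=5).mean())
```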

17.
Previous studies have repeatedly demonstrated that the relevance of a citing document is related to the number of times the source document is cited within it. Despite the ease with which electronic documents would permit the incorporation of this information into citation-based document search and retrieval systems, the possibilities of repeated citations remain untapped. Part of this under-utilization may be due to the fact that very little is known about the pattern of repeated citations in the scholarly literature or how this pattern varies as a function of journal, academic discipline or self-citation. The current research addresses these unanswered questions in order to facilitate the future incorporation of repeated-citation information into document search and retrieval systems. Using data mining of electronic texts, the citation characteristics of nine different journals, covering three academic fields (economics, computing, and medicine & biology), were characterized. It was found that the frequency (f) with which a reference is cited N or more times within a document is consistent across the sampled journals and academic fields. Self-citation causes an increase in frequency, and this effect becomes more pronounced for large N. The objectivity, automatability, and insensitivity of repeated citations to journal and discipline present powerful opportunities for improving citation-based document search.
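Given the in-text citation occurrences extracted from one document, the frequency f with which a reference is cited N or more times can be tabulated as follows (a sketch, not the study's pipeline).

```python
from collections import Counter

def repeated_citation_profile(in_text_citations, max_n=5):
    """in_text_citations: list of reference identifiers, one entry per
    in-text citation occurrence within a single document."""
    per_reference = Counter(in_text_citations)   # times each reference is cited
    total = len(per_reference)
    # f(N): fraction of references cited N or more times within the document
    return {n: sum(c >= n for c in per_reference.values()) / total
            for n in range(1, max_n + 1)}

citations = ["ref1", "ref2", "ref1", "ref3", "ref1", "ref2", "ref4"]
print(repeated_citation_profile(citations))
# {1: 1.0, 2: 0.5, 3: 0.25, 4: 0.0, 5: 0.0}
```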

18.
A Comparative Study of the Retrieval Efficiency of MeSH Term Searching and Free-Text Searching in PubMed   Total citations: 5 (self-citations: 0, citations by others: 5)
胡德华, 梁丽明. 《情报科学》 (Information Science), 2006, 24(5): 717-721
PubMed is a free medical literature retrieval system developed by the NCBI in the United States. Because of its authoritative and rich resources, powerful search functions, and friendly interface, it has long been the retrieval system of choice for many medical professionals searching the literature. PubMed offers several search methods, so comparing these methods is essential for improving users' retrieval efficiency. This paper tests selected terms with the two search modes most commonly used by PubMed users, MeSH (subject heading) searching and free-text searching, compares their retrieval efficiency, characterizes the two methods, and offers recommendations for users.
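For readers who want to reproduce a comparison of this kind, the sketch below uses Biopython's Entrez interface to count PubMed hits for the same concept expressed as a MeSH search and as a free-text search; the query terms are illustrative, and NCBI requires a contact e-mail address.

```python
from Bio import Entrez

Entrez.email = "you@example.org"        # required by NCBI; use your own address

def pubmed_count(term):
    """Return the total number of PubMed records matching a query."""
    handle = Entrez.esearch(db="pubmed", term=term)
    record = Entrez.read(handle)
    handle.close()
    return int(record["Count"])

# the same concept expressed as a MeSH search and as a free-text search
mesh_query = '"myocardial infarction"[MeSH Terms]'
free_query = '"heart attack"[Title/Abstract]'

print("MeSH search hits:     ", pubmed_count(mesh_query))
print("free-text search hits:", pubmed_count(free_query))
```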

19.
One of the most time-critical challenges for the Natural Language Processing (NLP) community is to combat the spread of fake news and misinformation. Existing approaches to misinformation detection use neural network models, statistical methods, linguistic traits, fact-checking strategies, etc. However, the menace of fake news seems to grow more vigorous with the advent of humongous and unusually creative language models. The relevant literature reveals that one major characteristic of the virality of fake news is the presence of an element of surprise in the story, which attracts immediate attention and invokes a strong emotional stimulus in the reader. In this work, we leverage this idea and propose textual novelty detection and emotion prediction as two tasks related to automatic misinformation detection. We re-purpose textual entailment for novelty detection and use models trained on large-scale entailment and emotion datasets to classify fake information. Our results support this idea, as we achieve state-of-the-art (SOTA) performance (7.92%, 1.54%, 17.31% and 8.13% improvements in accuracy) on four large-scale misinformation datasets. We hope that our current probe will motivate the community to explore further research on misinformation detection along this line. The source code is available on GitHub.
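The idea can be illustrated with off-the-shelf pipelines: an NLI model scores whether a claim is entailed by trusted text (a novelty signal) and a sentiment classifier stands in for the emotion model. The models and the simple fusion below are generic stand-ins, not the authors' trained systems.

```python
from transformers import pipeline

# Stand-in models: the paper trains its own entailment and emotion models;
# here generic pretrained pipelines merely illustrate the two signals.
nli = pipeline("text-classification", model="roberta-large-mnli")
affect = pipeline("sentiment-analysis")   # crude stand-in for an emotion model

def misinformation_signals(claim, trusted_context):
    # novelty signal: is the claim NOT entailed by the trusted context?
    nli_out = nli([{"text": trusted_context, "text_pair": claim}])[0]
    novelty = nli_out["label"] != "ENTAILMENT"
    # emotional-stimulus signal from the claim itself
    affect_out = affect(claim)[0]
    return {"novel": novelty, "nli": nli_out, "affect": affect_out}

claim = "Scientists confirm the vaccine rewrites human DNA within hours."
context = "Clinical trials report that the vaccine is safe and does not modify DNA."
print(misinformation_signals(claim, context))
```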

20.
In the past decade, news consumption has shifted from printed news media to online alternatives. Although these come with advantages, online news poses challenges as well. Notable here is the increased competition among online newspapers and other online news providers to attract readers, in which speed is often favored over quality. As a consequence, the need for new tools to monitor online news accuracy has grown. In this work, a fundamentally new and automated procedure for monitoring online news accuracy is proposed. The approach relies on the fact that online news articles are often updated after initial publication, thereby also correcting errors. Automated observation of the changes made to online articles and detection of the errors that are corrected may offer useful insights into news accuracy. The potential of the presented automated error-correction detection model is illustrated by building supervised classification models for the detection of objective, subjective and linguistic errors in online news updates. The models are built using a large news-update data set collected over two consecutive years for six different Flemish online newspapers. A subset of 21,129 changes is annotated using a combination of automated and human annotation via an online annotation platform. Finally, manually crafted features and text embeddings obtained with four different language models (TF-IDF, word2vec, BERTje and SBERT) are fed to three supervised machine learning algorithms (logistic regression, support vector machines and decision trees), and the performance of the obtained models is evaluated. Results indicate that small differences in performance exist between the different learning algorithms and language models. Using the best-performing models, F2-scores of 0.45, 0.25 and 0.80 are obtained for the classification of objective, subjective and linguistic errors respectively.
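One of the evaluated configurations (TF-IDF features fed to logistic regression, one binary classifier per error type) can be sketched as follows; the toy update records and the [SEP] pairing convention are illustrative assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import fbeta_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# toy update records: (sentence before edit, sentence after edit, is_objective_error)
updates = [
    ("Three people were injured.", "Four people were injured.", 1),
    ("The minister spoke on Tuesday.", "The minister spoke on Wednesday.", 1),
    ("The fire broke out downtown.", "The fire broke out downtown yesterday.", 0),
    ("He recieved the award.", "He received the award.", 0),
] * 10

texts = [old + " [SEP] " + new for old, new, _ in updates]   # pair old/new versions
labels = [label for _, _, label in updates]

X_tr, X_te, y_tr, y_te = train_test_split(texts, labels, test_size=0.25,
                                          random_state=0, stratify=labels)
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(X_tr, y_tr)
print("F2:", fbeta_score(y_te, clf.predict(X_te), beta=2))   # F2 as in the study
```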
