Similar literature
 20 similar articles found (search time: 31 ms)
1.
The number of clinical citations received from clinical guidelines or clinical trials has been considered as one of the most appropriate indicators for quantifying the clinical impact of biomedical papers. Therefore, the early prediction of clinical citation count of biomedical papers is critical to scientific activities in biomedicine, such as research evaluation, resource allocation, and clinical translation. In this study, we designed a four-layer multilayer perceptron neural network (MPNN) model to predict the clinical citation count of biomedical papers in the future by using 9,822,620 biomedical papers published from 1985 to 2005. We extracted ninety-one paper features from three dimensions as the input of the model, including twenty-one features in the paper dimension, thirty-five in the reference dimension, and thirty-five in the citing paper dimension. In each dimension, the features can be classified into three categories, i.e., the citation-related features, the clinical translation-related features, and the topic-related features. Besides, in the paper dimension, we also considered the features that have previously been demonstrated to be related to the citation counts of research papers. The results showed that the proposed MPNN model outperformed the other five baseline models, and the features in the reference dimension were the most important. In all the three dimensions, the citation-related and topic-related features were more important than the clinical translation-related features for the prediction. It also turned out that the features helpful in predicting the citation count of papers are not important for predicting the clinical citation count of biomedical papers. Furthermore, we explored the MPNN model based on different categories of biomedical papers. 
The results showed that the clinical translation-related features were more important for predicting the clinical citation count of basic papers than of papers closer to clinical science. This study provides a novel dimension (i.e., the reference dimension) for the research community and could be applied to other related research tasks, such as research assessment for translational programs. In addition, the findings could help biomedical authors (especially those in basic science) attract more attention from clinical research.

2.
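This paper constructs a feature space of the factors that influence citations to academic papers and, taking our university's SCI and SSCI papers as a case study, verifies the effectiveness and accuracy of machine learning models in predicting citation counts; the conclusions can serve as a reference for university libraries that provide decision-support services. The paper reviews research on the factors affecting citation counts and on prediction methods, combines traditional bibliometric indicators with Altmetrics indicators to build the feature space, and experimentally compares three machine learning models (linear regression, neural networks, and support vector machines) for predicting citation counts. The results show that the feature space built from the Altmetrics perspective substantially improves prediction accuracy, and that the support vector machine model performs excellently in the empirical study of predicting paper impact.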

3.
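Research on methods for identifying sleeping beauty and prince papers   Total citations: 1, self-citations: 0, citations by others: 1
[Purpose/Significance] To study methods for identifying sleeping beauty and prince papers, and to analyze the awakening mechanism, providing a theoretical basis for discovering "prince" authors in the scholarly communication system and for uncovering and awakening the latent value of low-cited and uncited papers. [Method/Process] Two objective indicators, the citation speed indicator and the sleeping beauty index, were used to identify sleeping beauty papers published in the four leading clinical medicine journals between 1970 and 2005. Prince papers that awakened the sleeping beauties were sought according to four principles: (1) the prince paper was published in a year near the citation burst; (2) it is itself relatively highly cited; (3) it has a high co-citation count with the sleeping beauty paper; (4) on the annual citation curve, it exerts a marked "pulling" effect on the sleeping beauty paper, i.e., at least in the years around the sleeping beauty's citation burst, the prince paper's annual citations should exceed those of the sleeping beauty paper. [Results/Conclusions] Because it takes the citation curve over the whole citation window into account, the citation speed indicator can identify papers with long citation life cycles that continue to be cited frequently today. The sleeping beauty index can quickly identify sleeping beauty papers but cannot reflect the citation curve after annual citations reach their peak. Combining the citation speed indicator with the average annual citations in the first five years after publication identifies sleeping beauty papers better. The analysis found that "consensus-type" documents such as reviews, guidelines, and monographs played a key role in triggering citation bursts for sleeping beauty papers that had proposed new ideas not yet accepted. It is suggested that retrospective identification of sleeping beauty papers combine objective indicators with subjective judgment, that prospective prediction track whether a paper is recommended and cited by "consensus-type" documents, and that research evaluation pay special attention to papers with low citation speed.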
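To make the sleeping beauty index (睡美人指数) in the abstract above more concrete, the following Python sketch computes the beauty coefficient B proposed by Ke et al. (2015). That the study used exactly this index is an assumption, since the abstract gives no formula; the function name and the sample citation series are hypothetical.

```python
def beauty_coefficient(yearly_citations):
    """Beauty coefficient B (Ke et al., 2015), assumed here to be the index meant.

    yearly_citations[t] is the number of citations received t years after
    publication. B sums, for every year up to the peak year t_m, the gap
    between the straight line joining (0, c_0) and (t_m, c_tm) and the
    actual citation curve, scaled by max(1, c_t).
    """
    c = list(yearly_citations)
    if not c:
        return 0.0
    # Year of maximum annual citations (first one if there are ties).
    t_m = max(range(len(c)), key=lambda t: c[t])
    if t_m == 0:
        return 0.0  # peak in the publication year: no sleeping period
    slope = (c[t_m] - c[0]) / t_m
    return sum((slope * t + c[0] - c[t]) / max(1, c[t]) for t in range(t_m + 1))


# Hypothetical series: four silent years, then a burst.
sleeper = [0, 0, 0, 0, 10]
# Hypothetical series: steady linear growth, no sleeping period.
steady = [0, 1, 2, 3, 4]
print(beauty_coefficient(sleeper), beauty_coefficient(steady))
```

Because the sum stops at the peak year, this form of the index says nothing about the citation curve after the peak, which matches the limitation noted in the abstract.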

4.
Citation prediction of scholarly papers is of great significance in guiding funding allocations, recruitment decisions, and rewards. However, little is known about how citation patterns evolve over time. By exploring the inherent involution property in scholarly paper citation, we introduce the Paper Potential Index (PPI) model based on four factors: the inherent quality of the scholarly paper, the decay of the paper's impact over time, early citations, and early citers' impact. In addition, by analyzing factors that drive citation growth, we propose a multi-feature model for impact prediction. Experimental results demonstrate that the two models improve the accuracy of predicting scholarly paper citations. Compared to the multi-feature model, the PPI model yields superior predictive performance in terms of range-normalized RMSE, and it better interprets the changes in citation without the need to adjust parameters. Compared to the PPI model, the multi-feature model predicts better in terms of Mean Absolute Percentage Error and Accuracy; however, its predictive performance is more dependent on parameter adjustment.
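The two error measures named in this abstract can be sketched in Python as follows. This is a minimal illustration using the common textbook definitions (RMSE divided by the range of the observed values, and the mean of absolute relative errors); the abstract does not state the exact variants the authors used, and the function names and sample numbers are hypothetical.

```python
import math


def range_normalized_rmse(actual, predicted):
    """RMSE divided by (max - min) of the actual values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    value_range = max(actual) - min(actual)
    if value_range == 0:
        raise ValueError("actual values have zero range")
    return math.sqrt(mse) / value_range


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical observed and predicted citation counts for three papers.
observed = [10, 20, 30]
forecast = [12, 18, 33]
print(range_normalized_rmse(observed, forecast), mape(observed, forecast))
```

Note that MAPE is undefined for papers with zero observed citations, so a real evaluation would need a rule for such papers; this sketch simply raises an error.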

5.
In this paper, we discuss the feasibility of early recognition of highly cited papers with citation prediction tools. Because papers' citation behavior is noisy, the soft fuzzy rough set (SFRS), which is robust to noise, is introduced to construct a case-based classifier (CBC) for highly cited papers. The design comprises (a) feature reduction by SFRS; (b) case selection by the combined use of SFRS and the concept of case coverage; and (c) reasoning by two classification techniques, case coverage based prediction and case score based prediction. With this design, the study demonstrates that highly cited papers can be predicted from objectively assessed factors. It shows that features including the research capabilities of the first author, the paper's quality, and the reputation of the journal are the most relevant predictors of highly cited papers.

6.
In citation network analysis, complex behavior is reduced to a simple edge, namely, node A cites node B. The implicit assumption is that A is giving credit to, or acknowledging, B. It is also the case that the contributions of all citations are treated equally, even though some citations appear multiple times in a text and others appear only once. In this study, we apply text-mining algorithms to a relatively large dataset (866 information science articles containing 32,496 bibliographic references) to demonstrate the differential contributions made by references. We (1) look at the placement of citations across the different sections of a journal article, and (2) identify highly cited works using two different counting methods (CountOne and CountX). We find that (1) the most highly cited works appear in the Introduction and Literature Review sections of citing papers, and (2) the citation rankings produced by CountOne and CountX differ. That is to say, counting the number of times a bibliographic reference is cited in a paper, rather than treating all references the same no matter how many times they are invoked in the citing article, reveals the differential contributions made by the cited works to the citing paper.
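The difference between the two counting methods can be sketched in Python. This is a minimal illustration based only on the description in the abstract (CountOne counts a reference once per citing paper, CountX counts every in-text mention); the function names, data layout, and sample data are hypothetical.

```python
from collections import Counter


def count_one(papers):
    """Count each cited work once per citing paper, however often it is mentioned."""
    counts = Counter()
    for mentions in papers:
        counts.update(set(mentions))
    return counts


def count_x(papers):
    """Count every in-text mention of each cited work."""
    counts = Counter()
    for mentions in papers:
        counts.update(mentions)
    return counts


# Hypothetical data: each inner list holds the in-text citation mentions of one citing paper.
papers = [
    ["A", "A", "A", "B"],
    ["B", "C"],
    ["B", "A"],
]
print(count_one(papers).most_common())
print(count_x(papers).most_common())
```

In this toy data, work B leads under CountOne (cited by three papers) while work A leads under CountX (four mentions), which is the kind of ranking difference the abstract reports.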

7.
Review papers tend to be cited more frequently than regular research articles. This fact, together with the continuous increase in the share of reviews in the scientific literature, can have important consequences for the measurement of individuals' research output, which is usually based on citation analysis. However, studies evaluating the differences in citations between review papers and original research articles are almost non-existent in the literature. This paper presents a thorough analysis of the overcitation and overrepresentation of review papers among the most cited papers of the 35 largest subject categories in Science Citation Index-Expanded. Results indicate that the average number of citations received by reviews depends largely on the research area considered, varying from 1.34 to 6.74 times the citations received by original research articles (the average value is 2.95). Correlated with this overcitation, there is an important overrepresentation of reviews among the most cited papers. This overrepresentation is greater when the most highly cited papers are considered, i.e., the 0.05% and 0.1% most cited papers, where the share of reviews has increased from 16-18% in 1990 to around 40% in 2010. Interestingly, the overcitation and overrepresentation among the most cited papers are more pronounced in the areas with the lowest shares of reviews in total publications.

8.
Delayed recognition is a concept applied to articles that receive very few to no citations for a certain period of time following publication, before becoming actively cited. To determine whether such a time spent in relative obscurity had an effect on subsequent citation patterns, we selected articles that received no citations before the passage of ten full years since publication, investigated the subsequent yearly citations received over a period of 37 years and compared them with the citations received by a group of papers without such a latency period. Our study finds that papers with delayed recognition do not exhibit the typical early peak, then slow decline in citations, but that the vast majority enter decline immediately after their first – and often only – citation. Middling papers’ citations remain stable over their lifetime, whereas the more highly cited papers, some of which fall into the “sleeping beauty” subtype, show non-stop growth in citations received. Finally, papers published in different disciplines exhibit similar behavior and did not differ significantly.

9.
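Liu Yang, Cui Lei. Library and Information Service (《图书情报工作》), 2014, 58(6): 101-104
Taking citation contexts as the object of study, this paper examines the degree of agreement among three sources: the citation contexts, the abstracts of the target papers, and the Medical Subject Headings assigned to the target papers (hereafter "subject headings"). It quantitatively analyzes the role of citation contexts in representing the content of the target papers. Five research papers frequently cited in the journal Circulation were taken as target papers. All citation contexts in their citing papers were extracted, segmented into words, and matched to subject headings, and the results were compared pairwise with the subject headings extracted from the target papers' abstracts and with the subject headings assigned to the papers. The results show that citation contexts agree closely with the target papers' abstracts and have a clear advantage in representing the content of the cited papers.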

10.
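An empirical study of the validity of the p-index for evaluating researchers   Total citations: 2, self-citations: 0, citations by others: 2
The h-index is effective for evaluating scholars with many publications and many citations, but it is flawed for scholars with few publications and many citations, and its values often coincide, which makes scholars hard to distinguish. The p-index has dimensions consistent with the h-index in evaluating research performance; it considers not only a scholar's citation count (C) but also an indicator of research quality, the average citation rate (C/N). Taking 49 experts in library, information, and documentation science as an example, the study compares their publication counts (N), citation counts (C), average citation rates, h-index, g-index, and p-index, and conducts a correlation analysis. Conclusion: the p-index is superior to the existing h-index and g-index, offers a more reasonable evaluation, and should be applied more widely.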
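The three indices compared in this abstract can be sketched in Python. This is a minimal illustration that assumes the usual definitions: h is the largest rank at which a paper has at least that many citations, g is the largest rank whose top papers together have at least rank squared citations (capped here at the number of papers), and p = (C · C/N)^(1/3). The abstract itself states only that the p-index combines C and C/N; the function names and sample data are hypothetical.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)


def g_index(citations):
    """Largest g (capped at the paper count) whose top g papers total >= g*g citations."""
    ranked = sorted(citations, reverse=True)
    total, g = 0, 0
    for rank, c in enumerate(ranked, start=1):
        total += c
        if total >= rank * rank:
            g = rank
    return g


def p_index(citations):
    """p = (C * C / N) ** (1/3), with C total citations and N papers (assumed form)."""
    n = len(citations)
    if n == 0:
        return 0.0
    total = sum(citations)
    return (total * total / n) ** (1 / 3)


# Hypothetical scholars: few papers with many citations vs. few papers with few citations.
high_impact = [60, 50]
low_impact = [2, 2]
for name, cites in (("high_impact", high_impact), ("low_impact", low_impact)):
    print(name, h_index(cites), g_index(cites), p_index(cites))
```

Both hypothetical scholars get h = 2, while their p values differ widely, which illustrates the abstract's point that h-index values easily coincide for low-output scholars.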

11.
The non-citation rate refers to the proportion of papers that do not attract any citation over a period of time following their publication. After reviewing all the related papers in the Web of Science, Google Scholar, and Scopus databases, we find that the current literature on citation distribution focuses more on the distribution of the percentages and citations of papers receiving at least one citation, while there are fewer studies on the time-dependent patterns of the percentage of never-cited papers, on which distribution model can fit these patterns, and on the factors influencing the non-citation rate. Here, we perform an empirical pilot analysis of the time-dependent distribution of the percentages of never-cited papers in a series of different, consecutive citation time windows following publication in six selected sample journals, and study the influence of paper length on a paper's chance of being cited. From this analysis, the following general conclusions are drawn: (1) a three-parameter negative exponential model fits the time-dependent distribution curve of the percentages of never-cited papers well; (2) in the initial citation time window, the percentage of never-cited papers in each journal is very high. However, as the citation time window widens, the percentage of never-cited papers drops rapidly at first and then more slowly, and the total decline for most journals is very large; (3) with wider citation time windows, the percentage of never-cited papers for each journal approaches a stable value, after which it changes very little unless a large number of "Sleeping Beauty" type papers appear; (4) the length of a paper has a great influence on whether it will be cited.

12.
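[Purpose/Significance] In citation analysis, a paper's attribute features can be used to predict its future citations, and the predictions can be used to evaluate papers, their authors, the authors' institutions, and publications. [Method/Process] Multiple factors that influence citations are studied from three aspects: publication, author, and paper. Using SCI-indexed papers in library and information science as analysis and validation data, predictions are made with logistic regression, GBDT, XGBoost, AdaBoost, random forest, and other algorithms. Several sets of evaluation metrics are used to compare the prediction methods, and GBDT is used to identify the factors with the greatest influence on citations. [Results/Conclusions] The study determines how strongly the factors in the three aspects affect citation prediction, builds prediction models, and predicts papers' citations over a future period reasonably well. Extensive experiments show that GBDT, XGBoost, and random forest have stronger predictive ability, and that the longer the prediction period, the better the results.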

13.
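A study of the correlation between peer review and bibliometrics based on F1000 and WoS   Total citations: 1, self-citations: 1, citations by others: 0
To compare the validity and correlation of peer review and bibliometric methods in research evaluation, the F1000 and Web of Science databases were selected, and SPSS 16.0 was used to compare the F1000 factors of nearly 2,000 papers with indicators from Web of Science. The results show that the F1000 factor is significantly positively correlated with citation counts within the statistical window, while some papers with very high F1000 factors were not highly cited, and vice versa. The study concludes that, from a statistical perspective, bibliometric indicators are positively correlated with peer review results, but that either peer review or bibliometrics alone is a biased standard for research evaluation. Combining quantitative indicators, represented by citation analysis, with peer review will be the mainstream of future research evaluation.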

14.
Researchers have investigated factors thought to affect the total number of citations in various academic disciplines, and some general trends have emerged. However, there are still limited data for many fields, including aquatic sciences. Using papers published in 2003–2005 (n = 785), we investigated marine and freshwater biology articles to identify factors that may contribute to the probability of citation and for cumulative citation counts over 10 years. We found no relationships with probability of citation; however, we found evidence that for those that were cited at least once, cumulative citations were related to several factors. Articles cited by books received more citations than those never cited by books, which we hypothesized to be indicative of the impact an article may have in the field. We also found that articles first cited within 2 years of publication received more cumulative citations than those first cited after 2 years. We found no evidence that self‐citation (as the first citation) had a significant effect on total citations. Our findings were compared with previous studies in other disciplines, and it was found that aquatic science citation patterns are comparable to fields in science and technology but less so to humanities and social sciences.

15.
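Analysis of highly cited papers in Acta Editologica (《编辑学报》)   Total citations: 5, self-citations: 1, citations by others: 4
Zhang Jianhe. Acta Editologica (《编辑学报》), 2010, 22(6): 562-564
Using CNKI's China Academic Literature Network Publishing Database (《中国学术文献网络出版总库》) as the statistical source, this paper analyzes the distribution of highly cited papers in Acta Editologica from the perspective of citation. The results show that the database contains 3,508 full-text articles published in the journal from 1989 to 2009, of which 2,545 have been cited, a cited rate of 73%; total citations number 15,863, and the most-cited single paper has 71 citations. A small number of papers account for a large share of the citations, broadly in line with the 80/20 rule. Among the top 100 highly cited papers, the column with the most highly cited papers is "Theoretical Research" (46 papers), and the author with the most highly cited papers is You Suning (6 papers). The top 10 highly cited papers have been cited in every year and remain vigorous.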

16.
Understanding paper citation dynamics and accurately predicting future citation counts of papers is of significant interest, and thus modeling citation dynamics as an information cascade has recently attracted considerable attention. Nevertheless, most of these recent deep learning-based information cascade prediction models are focused on the embedding of each individual node rather than the entire structure of the cascade graph, which limits the robustness of the model. Thus, instead of learning the representation of each node in the cascade, we propose learning the dynamic structural representation of the entire information cascade graph with the degree distribution vectors corresponding to different timestamps as the input of a sequential deep neural network, named CasDENN. Extensive experiments on datasets from academic paper citations (APS) and social media post forwards (Weibo) show a dramatic improvement over state-of-the-art baselines, where the prediction error can be reduced by approximately 8%–10% and the running time is less than 10% of the fast baseline.

17.
The findings of Bornmann, Leydesdorff, and Wang (2013b) revealed that the consideration of journal impact improves the prediction of long-term citation impact. This paper further explores the possibility of improving citation impact measurements on the basis of a short citation window by considering journal impact and other variables, such as the number of authors, the number of cited references, and the number of pages. The dataset contains 475,391 journal papers published in 1980 and indexed in Web of Science (WoS, Thomson Reuters), and all annual citation counts (from 1980 to 2010) for these papers. As an indicator of citation impact, we used percentiles of citations calculated using the approach of Hazen (1914). Our results show that citation impact measurement can indeed be improved: if factors generally influencing citation impact are considered in the statistical analysis, the explained variance in the long-term citation impact increases considerably. However, this increase is only visible when using the years shortly after publication, not when using later years.
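The percentile indicator mentioned in this abstract can be sketched in Python. This is a minimal illustration of the Hazen (1914) plotting position, 100 · (i − 0.5) / n for the paper at ascending rank i among n papers. Giving tied papers their average rank is an assumption of this sketch, as the abstract does not say how ties were handled; the function name and sample counts are hypothetical.

```python
def hazen_percentiles(citations):
    """Hazen percentile 100 * (i - 0.5) / n for each paper, in input order.

    Papers are ranked in ascending order of citations; tied papers share
    the average of their ranks (an assumption of this sketch).
    """
    n = len(citations)
    order = sorted(range(n), key=lambda k: citations[k])
    ranks = [0.0] * n
    pos = 0
    while pos < n:
        # Extend the run of papers tied with the one at position pos.
        end = pos
        while end + 1 < n and citations[order[end + 1]] == citations[order[pos]]:
            end += 1
        average_rank = (pos + 1 + end + 1) / 2
        for k in order[pos:end + 1]:
            ranks[k] = average_rank
        pos = end + 1
    return [100 * (r - 0.5) / n for r in ranks]


# Hypothetical citation counts for four papers from the same field and year.
print(hazen_percentiles([0, 5, 5, 20]))  # [12.5, 50.0, 50.0, 87.5]
```

With this formula no paper reaches exactly 0 or 100, and the percentiles of any reference set have a mean of 50.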

18.
We address issues concerning what one may learn from how citation instances are distributed in scientific articles. We visualize and analyze patterns of citation distributions in the full text of 350 articles published in the Journal of Informetrics. In particular, we visualize and analyze the distributions of citations in articles that are organized in a commonly seen four-section structure, namely, introduction, method, results, and conclusions (IMRC). We examine the locations of citations to the groundbreaking h-index paper by Hirsch in 2005 and how patterns associated with citation locations evolve over time. The results show that citations are highly concentrated in the first section of an article. The density of citations in the first section is about three times higher than that in subsequent sections. The distributions of citations to highly cited papers are even more uneven.

19.
《Journal of Informetrics》2019,13(2):485-499
With the growing number of published scientific papers worldwide, the need for evaluation and quality assessment methods for research papers is increasing. Scientific fields such as scientometrics, informetrics, and bibliometrics establish quantified analysis methods and measurements for evaluating scientific papers. In this area, an important problem is to predict the future influence of a published paper. In particular, early discrimination between influential and insignificant papers may find important applications. In this regard, one of the most important metrics is the number of citations to the paper, since this metric is widely used in the evaluation of scientific publications and also serves as the basis for many other metrics such as the h-index. In this paper, we propose a novel method for predicting the long-term citations of a paper based on the number of its citations in the first few years after publication. To train a citation count prediction model, we employed an artificial neural network, a powerful machine learning tool with growing applications in many domains, including image and text processing. The empirical experiments show that our proposed method outperforms state-of-the-art methods with respect to prediction accuracy in both yearly and total prediction of the number of citations.

20.
Identifying the future influential papers among the newly published ones is an important yet challenging issue in bibliometrics. As newly published papers have no or limited citation history, linear extrapolation of their citation counts, which is motivated by the well-known preferential attachment mechanism, is not applicable. We translate the recently introduced notion of discoverers to the citation network setting, and show that there are authors who frequently cite recent papers that become highly cited in the future; these authors are referred to as discoverers. We develop a method for early identification of highly cited papers based on the early citations from discoverers. The results show that the identified discoverers have a consistent citing pattern over time, and that the early citations from them can be used as a valuable indicator to predict the future citation counts of a paper. The discoverers themselves are potential future outstanding researchers, as they receive more citations than average.
