Similar documents
20 similar documents found (search time: 31 ms)
1.
This study explored the potential of using sentiment analysis of tweets to predict referendum choices (Brexit). The feasibility of using StreamKM++ in the Massive Online Analysis framework was examined over five categories, ranging from strongly agree to strongly disagree (with exiting). A Naïve Bayes classifier was used to classify people's opinions into these categories. The prediction model achieved high accuracy (97.98%), making it usable for predicting opinions about public events and issues. The findings may help practitioners and policymakers understand the value of social media sentiment analysis in assessing public opinion and, accordingly, in making voting predictions.
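The classification step described above can be sketched with a minimal multinomial Naïve Bayes classifier implemented from scratch. This is an illustrative toy, not the study's pipeline: the two-class training texts and labels below are invented, whereas the study used five stance categories and a real tweet corpus.

```python
from collections import Counter
import math

class NaiveBayesText:
    """Minimal multinomial Naive Bayes for short texts (illustrative sketch)."""

    def __init__(self):
        self.class_word_counts = {}   # label -> Counter of word counts
        self.class_doc_counts = Counter()
        self.vocab = set()

    def fit(self, docs, labels):
        for doc, label in zip(docs, labels):
            self.class_doc_counts[label] += 1
            counts = self.class_word_counts.setdefault(label, Counter())
            for w in doc.lower().split():
                counts[w] += 1
                self.vocab.add(w)

    def predict(self, doc):
        total_docs = sum(self.class_doc_counts.values())
        best, best_lp = None, float("-inf")
        for label, wc in self.class_word_counts.items():
            # log prior + sum of log likelihoods with Laplace smoothing
            lp = math.log(self.class_doc_counts[label] / total_docs)
            total = sum(wc.values())
            for w in doc.lower().split():
                lp += math.log((wc[w] + 1) / (total + len(self.vocab)))
            if lp > best_lp:
                best, best_lp = label, lp
        return best

# Invented toy stance data, two classes instead of the study's five.
train = ["leave now take back control", "strongly agree leave",
         "remain in europe stay", "strongly disagree stay together"]
labels = ["agree", "agree", "disagree", "disagree"]
clf = NaiveBayesText()
clf.fit(train, labels)
print(clf.predict("take back control now"))  # → agree
```

A real system would add tokenization, stop-word handling, and the streaming clustering (StreamKM++) step that precedes classification in the study.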

2.
Using the information visualization software CiteSpace II, this study analyzes 521 community informatics papers indexed in Web of Science (SCI, ISTP) from 1998 to 2011, together with their 17,968 cited references, examining publication years, authors, institutions, disciplines, and references. High-frequency keywords and high-centrality, highly cited references are analyzed to identify research hotspots in community informatics. The results show that community informatics has not yet entered a period of rapid growth, but its knowledge base is accumulating at an accelerating pace. The field comprises two branches, social networks and social capital, with social networks being the main one; online learning, social networks, complex networks, and health care will be key research themes in the future.

3.
This paper proposes a new node centrality index (the c-index) and its derivatives (the iterative c-index and the cg-index) to measure the collaboration competence of a node in a weighted network. We prove that the c-index follows a power-law distribution in weighted scale-free networks. A case study of a very large scientific collaboration network shows that the proposed indexes differ from common centrality measures (degree, betweenness, closeness, and eigenvector centrality, and node strength) and from other h-type indexes (lobby index, w-lobby index, and h-degree). The c-index and its derivatives jointly use the number of a node's neighbors, the link strengths, and the centrality of neighboring nodes, forming a new and distinctive centrality measure of collaborative competence.

4.
[Purpose/Significance] To identify important nodes in complex networks, this paper designs a node centrality algorithm with applications in epidemic prevention and control, public opinion monitoring, product marketing, and talent discovery. [Method/Process] Considering both the number of a node's high-influence neighbors and their overall influence, the HHa node centrality algorithm is proposed. On real and synthetic networks, the SIR epidemic model is used to simulate information diffusion, and the monotonicity function M and Kendall's correlation coefficient serve as evaluation metrics to verify the effectiveness, accuracy, and stability of the HHa algorithm. [Result/Conclusion] Experiments show that, compared with seven classical centrality algorithms, the ranking produced by HHa achieves an M value of 0.999, placing second, while its Kendall coefficient of 0.845 is about 0.15 higher than those of the other algorithms, placing first with stable performance. The HHa centrality algorithm is therefore a feasible way to identify important nodes in a network.

5.
We analyze whether preferential attachment in scientific coauthorship networks is different for authors with different forms of centrality. Using a complete database for the scientific specialty of research about “steel structures,” we show that betweenness centrality of an existing node is a significantly better predictor of preferential attachment by new entrants than degree or closeness centrality. During the growth of a network, preferential attachment shifts from (local) degree centrality to betweenness centrality as a global measure. An interpretation is that supervisors of PhD projects and postdocs broker between new entrants and the already existing network, and thus become focal to preferential attachment. Because of this mediation, scholarly networks can be expected to develop differently from networks which are predicated on preferential attachment to nodes with high degree centrality.
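The brokerage idea above can be made concrete on a toy graph: a "supervisor" node bridging two tightly knit groups has only modest degree but the highest betweenness centrality. The sketch below implements Brandes' betweenness algorithm for unweighted, undirected graphs; the seven-node graph is an invented illustration, not the steel-structures data.

```python
from collections import deque

def betweenness(adj):
    """Brandes' algorithm for unweighted undirected graphs given as dict-of-sets."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack, pred = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}; sigma[s] = 1   # number of shortest paths
        dist = {v: -1 for v in adj}; dist[s] = 0
        q = deque([s])
        while q:                                     # BFS from s
            v = q.popleft(); stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1; q.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]; pred[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                                 # back-propagate dependencies
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}         # undirected: halve pair counts

# Two cliques {a1,a2,a3} and {b1,b2,b3} bridged by broker node "c".
adj = {
    "a1": {"a2", "a3", "c"}, "a2": {"a1", "a3"}, "a3": {"a1", "a2"},
    "b1": {"b2", "b3", "c"}, "b2": {"b1", "b3"}, "b3": {"b1", "b2"},
    "c": {"a1", "b1"},
}
bc = betweenness(adj)
print(max(bc, key=bc.get))  # → c (highest betweenness despite degree only 2)
```

Nodes a1 and b1 have the highest degree (3), yet every shortest path between the two cliques runs through c, which is the intuition behind betweenness predicting attachment by new entrants.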

6.
In the era of the knowledge economy, the value of knowledge flow in stimulating knowledge innovation and promoting scientific and technological development has become increasingly prominent. Building on a fused three-dimensional citation network of direct citation, co-citation, and bibliographic coupling, this paper maps topic-level associations to mine latent knowledge flows within a field. Link prediction indices are used as features to build a classifier and a regressor: the classifier predicts knowledge flows that do not yet exist but are very likely to emerge, while the regressor predicts flows whose current link weights are low and have attracted little attention but will carry high link weights in the future. The two prediction levels complement each other, allowing a more comprehensive detection of research frontiers and emerging trends in a field. Applying this approach to the currently popular field of gene-editing technology, we identify latent knowledge flows and potential research hotspots, providing a reference for researchers in choosing research directions.

7.
[Purpose/Significance] To automatically identify academic query intent and improve the efficiency of academic search engines. [Method/Process] Combining existing query-intent features with the characteristics of academic search, features of query expressions were constructed on four levels: basic information, specific keywords, entities, and occurrence frequency. Four classification algorithms (Naïve Bayes, logistic regression, SVM, and random forest) were used in a pilot experiment on automatic query-intent identification, computing the precision, recall, and F-measure of each. A two-layer deep learning classifier for academic query-intent identification was then built by extending the results predicted by logistic regression to a large-scale dataset and extracting "keyword-class" features. [Result/Conclusion] The two-layer classifier achieves a macro-averaged F1 of 0.651, outperforming the other algorithms and effectively balancing precision and recall across academic query-intent categories. It performs best on academic exploration queries, with an F1 of 0.783.

8.
Author keywords for scientific literature are terms selected and created by authors. Although most studies have focused on using author keywords to represent research interests, little is known about how authors select keywords. To fill this research gap, this study presents a pilot study of author keyword-selection behavior. Our empirical results show that, on average, 31% of author keywords appear in titles, 52.1% in abstracts, and 56.7% in either titles or abstracts. Keywords also appear in references and among high-frequency keywords: the proportions of author-selected keywords appearing there are 41.6% and 56.1%, respectively. In addition, keywords of papers written by core (productive) authors appear less frequently in the titles and abstracts of their papers than those of other authors, and more frequently in references and high-frequency keywords. The percentage of keywords appearing in titles and abstracts is negatively correlated with a paper's citation count, whereas the percentage appearing among high-frequency keywords is positively associated with it.
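The overlap percentages reported above can be reproduced per paper with a small helper that measures what share of a paper's author keywords appear verbatim in a given text field. The sample title, abstract, and keyword list below are invented for illustration; the study matched against real titles, abstracts, references, and field-level high-frequency keyword lists.

```python
def keyword_overlap(keywords, text):
    """Share of author keywords that occur (case-insensitive substring) in text."""
    text = text.lower()
    hits = [k for k in keywords if k.lower() in text]
    return len(hits) / len(keywords)

# Invented sample paper.
title = "Sentiment analysis of social media for election prediction"
abstract = ("We apply sentiment analysis and a naive bayes classifier "
            "to tweets about an election.")
keywords = ["sentiment analysis", "naive bayes", "social media", "link prediction"]

print(keyword_overlap(keywords, title))     # share of keywords found in the title
print(keyword_overlap(keywords, abstract))  # share found in the abstract
```

Averaging these per-paper shares over a corpus yields statistics comparable to the 31% (titles) and 52.1% (abstracts) figures in the abstract; real matching would also want stemming or phrase normalization.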

9.
Current link prediction for knowledge networks is mostly based on the similarity of network topology and rarely considers authors' research fields, leaving available information underused. This paper therefore proposes hypernet2vec, a link prediction framework over a two-layer knowledge network consisting of a co-authorship network and an academic-field network. Using network representation learning, the nodes in each layer are mapped into a low-dimensional vector space and then fed into a specially designed convolutional neural network for link prediction. Compared with classical link prediction indices such as RA, LP, and LRW, the hypernet2vec model significantly improves the AUC (area under curve), by 11.17% on average. The paper also discusses the deeper mechanism behind the model from the perspectives of information generation and complex systems.

10.
[Purpose/Significance] As an important part of prediction in the science of science, forecasting the popularity of disciplinary topics aims to reveal academic frontiers and development trends, help scholars discover cutting-edge topics, and support evidence-based project approval by research administrators. [Design/Method] A topic popularity indicator based on journal impact factors (TP-JIF) is proposed, and a topic popularity prediction model based on an LSTM neural network (TPP-LSTM) is constructed. Taking the LIS field as an example, topic popularity sequences are extracted and computed via time slicing, and model errors are examined for time series of different lengths. [Conclusion/Findings] Compared with RBF-SVM, Linear-SVM, KNN, and naïve Bayes models, the TPP-LSTM model effectively captures the characteristics of topic popularity time series, performing best when the series length is four years. [Originality/Value] The impact-factor-based popularity indicator effectively reflects the differing influence of journals on a discipline and avoids the drawbacks of computing popularity from raw frequency alone; the prediction model captures the temporal dynamics of disciplinary topics, reduces prediction errors, and performs well.

11.
For the purposes of classification it is common to represent a document as a bag of words, consisting of the individual terms making up the document together with the number of times each term appears. All classification methods make use of the terms, and it is common to also use the local term frequencies at the price of some added model complexity. Examples are the naïve Bayes multinomial model (MM), the Dirichlet compound multinomial model (DCM), the exponential-family approximation of the DCM (EDCM), and support vector machines (SVM). Although it is usually claimed that incorporating local word frequency improves text classification performance, we test here whether such claims hold. We show experimentally that simplified forms of the MM, EDCM, and SVM models which ignore the frequency of each word in a document perform at about the same level as MM, DCM, EDCM, and SVM models which incorporate local term frequency. We also present a new form of the naïve Bayes multivariate Bernoulli model (MBM) that can make use of local term frequency and show, again, that it offers no significant advantage over the plain MBM. We conclude that word burstiness is so strong that additional occurrences of a word add essentially no useful information to a classifier.

12.
[Purpose/Significance] To reveal tacit knowledge hidden in massive literature by constructing a two-mode complex network model. [Method/Process] Using the NetworkX complex network toolkit, a two-mode network is built from the co-occurrence relations between pairs of nodes; the co-occurrence relations are weighted, the network's topological properties are computed, affinity propagation (AP) clustering is applied, and direct relations between nodes are extracted. The AUC method is used to evaluate four link prediction algorithms (AA, JC, and their weighted variants wAA and wJC) in order to select the most suitable one and to predict latent relations in the network. [Result/Conclusion] A case study on mining potential drug targets shows that wAA is the best of the four link prediction algorithms, and that the two-mode network model, indicators, and methodology are effective for drug-target mining in the Chemical Abstracts Service (CAS) database. Future work will test the approach in other databases and research fields to further verify the model's generality and effectiveness.
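The evaluation workflow above can be sketched with the plain (unweighted) Adamic-Adar index and the standard AUC criterion for link prediction: hide some true edges, score them against node pairs that never existed, and measure how often a hidden edge outscores a non-edge (ties count 0.5). The five-node graph is an invented assumption; the paper's wAA variant additionally weights by link strength.

```python
import math

def adamic_adar(adj, u, v):
    """AA index: sum of 1/log(deg(z)) over common neighbors z of u and v."""
    return sum(1.0 / math.log(len(adj[z]))
               for z in adj[u] & adj[v] if len(adj[z]) > 1)

def auc(pos_scores, neg_scores):
    """AUC: probability a held-out true edge scores above a non-existent pair."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

# Observed graph (dict of neighbor sets) with true edge a-d held out for testing.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
       "d": {"c", "e"}, "e": {"d"}}
pos = [adamic_adar(adj, "a", "d")]                                   # held-out edge
neg = [adamic_adar(adj, u, v) for u, v in [("a", "e"), ("b", "e"), ("b", "d")]]
print(auc(pos, neg))
```

Here the held-out edge a-d shares hub c with the non-edge b-d, so the two tie and the AUC lands between 0.5 (random) and 1.0 (perfect), which is exactly the kind of discrimination the paper's AUC comparison of AA, JC, wAA, and wJC quantifies.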

13.
A study of the Zipf-Pareto distribution of betweenness centrality in co-citation networks (cited 4 times: 0 self-citations, 4 by others)
To study the statistical properties of co-citation networks, a co-citation network was built using Scientometrics, an authoritative journal in the field of scientometrics, as the data source, characterizing the journal's highly cited references and high-centrality references. Applying complex network analysis to this network, its statistical properties were examined, including the degree distribution and the distribution of betweenness centrality. The results show that the co-citation network is a complex network with small-world and scale-free properties, and that betweenness centrality follows a Zipf-Pareto distribution: only a small fraction of cited references have high betweenness centrality, while most have very small values.

14.
The number of received citations has been used as an indicator of the impact of academic publications, and developing tools to find papers with the potential to become highly cited has recently attracted increasing scientific attention. The topics scholars attend to may change over time with research trends, producing changes in received citations. Author-defined keywords, the title, and the abstract provide valuable information about a research article. This study applies latent Dirichlet allocation to extract topics and keywords from articles, and defines five keyword popularity (KP) features as indicators of an article's alignment with emerging trends. Binary classification models built with a number of supervised learning techniques are used to predict whether papers become highly cited or not. We empirically compare the KP features of articles with the commonly used journal-related and author-related features proposed in previous studies. The results show that prediction models with KP features are more effective than those with journal and/or author features, especially in the management information systems discipline.

15.
Hotspots and evolution paths of domestic information behavior research: a complex network analysis (cited 1 time: 0 self-citations, 1 by others)
Keyword data were collected from research papers on information behavior in CNKI, and an undirected weighted network of information-behavior research concepts (keywords) was built using social network methods. After verifying that the network exhibits social-network properties, the degree centrality and betweenness centrality of its nodes were computed. The Girvan-Newman (G-N) clustering algorithm was applied to partition the domestic information-behavior concept network into ten subfields, and a diachronic analysis based on temporal membership was conducted to trace the development trajectory of information behavior research in China.

16.
Convexity in a network (graph) has been recently defined as a property of each of its subgraphs to include all shortest paths between the nodes of that subgraph. It can be measured on the scale [0, 1], with 1 assigned to fully convex networks. The largest convex component of a graph that emerges after the removal of the least number of edges is called a convex skeleton. It is basically a tree of cliques, which has been shown to have many interesting features. In this article the notions of convexity and convex skeletons are discussed in the context of scientific collaboration networks. More specifically, we analyze the co-authorship networks of Slovenian researchers in computer science, physics, sociology, mathematics, and economics and extract convex skeletons from them. We then compare these convex skeletons with the residual graphs (remainders) in terms of collaboration frequency distributions by various parameters such as the publication year and type, co-authors' birth year, status, gender, discipline, etc. We also show the top-ranked scientists by four basic centrality measures as calculated on the original networks and their skeletons and conclude that convex skeletons may help detect influential scholars that are hardly identifiable in the original collaboration network. As their inherent feature, convex skeletons retain the properties of collaboration networks, including high-level structural properties and the fact that the same authors are highlighted by centrality measures. Moreover, the most important ties, and thus the most important collaborations, are retained in the skeletons.
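The convexity definition above can be checked directly on a small graph: a node set S is convex iff no node outside S lies on any shortest path between two members of S, and a node w lies on a shortest u-v path exactly when dist(u,w) + dist(w,v) = dist(u,v). The sketch below uses that criterion on an invented four-node path graph, not the Slovenian co-authorship data.

```python
from collections import deque
from itertools import combinations

def bfs_dist(adj, s):
    """Unweighted shortest-path distances from s via BFS."""
    dist = {s: 0}
    q = deque([s])
    while q:
        v = q.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                q.append(w)
    return dist

def is_convex(adj, subset):
    """True iff every shortest path between members of subset stays inside it."""
    dist = {v: bfs_dist(adj, v) for v in subset}
    for u, v in combinations(subset, 2):
        d = dist[u][v]
        for w in adj:
            # w is on some shortest u-v path iff the distances add up exactly
            if (w not in subset and
                    dist[u].get(w, float("inf")) + dist[v].get(w, float("inf")) == d):
                return False
    return True

# Path graph 1-2-3-4: {1,2,3} is convex; {1,3} is not, since node 2 is skipped.
adj = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(is_convex(adj, {1, 2, 3}))  # → True
print(is_convex(adj, {1, 3}))     # → False
```

Extracting a convex skeleton then amounts to finding a large edge subset whose components all pass this test, which is what makes the skeleton a tree of cliques.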

17.
Keywords of scientific papers exhibit multiple types and multiple association relations, which can be represented and modeled with a hypernetwork of multiple layers and hyperedges. This study constructs a hypernetwork model composed of four keyword subnetworks (research object, experimental breed, research purpose, and technical method) connected by various association hyperedges, and applies it to an empirical analysis of scientific papers in the field of "regulation of germ cells and stem cells in agricultural animals." The model reveals homogeneous associations within each single-layer keyword subnetwork while also mining latent heterogeneous associations across layers, thereby identifying the field's common technical methods, experimental breeds, research objects, and research purposes, as well as its technical gaps and technical-application gaps, which are likely to become future research hotspots.

18.
An empirical analysis of library science research in China, 1999-2008 (Part II) (cited 4 times: 2 self-citations, 2 by others)
Keyword frequency analysis and co-word analysis were applied to the hotspots, structure, and characteristics of library science research in China from 1999 to 2008, and strategic diagrams were drawn to analyze the development of the field's nine research structures. By scanning high-frequency keywords year by year, 68 high-frequency keywords were grouped into four categories, revealing four trends of change; 84 low-frequency keywords that newly appeared or grew rapidly in the later five years were examined from humanistic, technical, managerial, and comprehensive perspectives, suggesting eight micro-level themes for future library science research in China. Future work could build a standard thesaurus based on the Chinese Classified Thesaurus and statistics on newly emerging terms, enabling statistical analysis of complete keyword frequencies; combining citation analysis, author co-citation, and other bibliometric methods would make such empirical research more objective and accurate.

19.
Journal of Informetrics, 2019, 13(2): 485-499
With the growing number of scientific papers published worldwide, the need for evaluation and quality-assessment methods for research papers is increasing. Fields such as scientometrics, informetrics, and bibliometrics establish quantitative analysis methods and measurements for evaluating scientific papers. An important problem in this area is predicting the future influence of a published paper; in particular, early discrimination between influential and insignificant papers may find important applications. One of the most important metrics here is the number of citations to the paper, since this metric is widely used in the evaluation of scientific publications and, moreover, serves as the basis for many other metrics such as the h-index. In this paper, we propose a novel method for predicting the long-term citations of a paper from the number of citations it receives in the first few years after publication. To train the citation count prediction model, we employ an artificial neural network, a powerful machine learning tool with growing applications in many domains including image and text processing. Empirical experiments show that the proposed method outperforms state-of-the-art methods in prediction accuracy, for both yearly and total citation counts.
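The core mapping described above — early citation counts in, long-term counts out — can be illustrated with a deliberately simpler stand-in than the paper's neural network: a one-feature ordinary least-squares fit. The citation numbers below are invented toy data, not from the study.

```python
# Invented toy data: citations in years 1-3 vs. total citations after 10 years.
early = [2, 5, 1, 8, 3, 12, 0, 6]
total = [10, 24, 6, 40, 15, 61, 2, 28]

n = len(early)
mean_x = sum(early) / n
mean_y = sum(total) / n

# Ordinary least squares: slope = cov(x, y) / var(x).
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(early, total))
         / sum((x - mean_x) ** 2 for x in early))
intercept = mean_y - slope * mean_x

def predict_total(early_citations):
    """Predicted long-term citations from early citations (linear stand-in)."""
    return intercept + slope * early_citations

print(round(slope, 2), round(intercept, 2))
```

A neural network, as used in the paper, replaces this single line with a learned nonlinear function of several early-year counts, but the training objective (minimizing prediction error against observed long-term citations) is the same in spirit.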

20.
Despite its rising position as a first-class research object, scientific software remains a marginal object in studies of scholarly communication. This study aims to fill the gap by examining the co-mention network of R packages across all Public Library of Science (PLoS) journals. To that end, we developed a software-entity extraction method and identified 14,310 instances of R packages across the 13,684 PLoS journal papers mentioning or citing R. A paper-level co-mention network of these packages was visualized and analyzed using three major centrality measures: degree centrality, betweenness centrality, and PageRank. We analyzed the distribution of R packages across all PLoS papers, identified the top packages mentioned in these papers, and examined the clustering structure of the network; in particular, we found that the discipline and function of the packages can partly explain the largest clusters. The present study offers the first large-scale analysis of the extensive use of R packages in scientific research and, as such, lays the foundation for future explorations of the various roles played by software packages in the scientific enterprise.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号