Similar literature
20 similar documents found
1.
There is no doubt that scientific discoveries have always brought changes to society. New technologies help solve social problems such as transportation and education, while research brings benefits such as curing diseases and improving food production. Despite the impact that science and society have on each other, this relationship is rarely studied, and the two are often seen as different universes. Previous literature focuses on a single domain, detecting social demands or research fronts for example, without crossing the results for new insights. In this work, we create a system that assesses the relationship between social and scholarly data using the topics discussed in social networks and research topics. We use articles as science sensors and humans as social sensors via social networks. Topic modeling algorithms are used to extract and label social subjects and research themes, and topic correlation metrics are then used to create links between them if they have a significant relationship. The proposed system is based on topic modeling, labeling and correlation from heterogeneous sources, so it can be used in a variety of scenarios. We evaluate the approach using a large-scale Twitter corpus combined with a PubMed article corpus. In both, we work with data on the worldwide Zika epidemic, as this scenario provides topics and discussions in both domains. Our approach discovered links between various topics of different domains, which suggests that some of the relationships can be automatically inferred by the sensors. The results can open new opportunities for forecasting social behavior, assessing community interest in a scientific subject, or directing research toward population welfare.

2.
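[Purpose/Significance] Social network analysis is a quantitative analysis method developed by sociologists on the basis of mathematical methods and graph theory. This paper analyzes the association between social network analysis methods and research topics in library and information science. [Method/Process] Using terms from the social network field as search terms, we collected from CNKI the library and information science papers that apply social network analysis, and performed an association analysis between the papers' research topics and their research tools, data types, and social network analysis indicators, revealing the links among them. [Result/Conclusion] The study finds that when library and information science scholars in China use social network analysis, the research topics they focus on and the research tools and data types they use all show a pattern of both concentration and dispersion. In addition, as research topics and points of focus differ, the social network analysis measures adopted also differ to some extent. [Innovation/Limitation] The study is characterized by diversity and fine granularity. It focuses on revealing the latent associations between network analysis measures and research topics, and also attends to research tools and data source types, but it does not explore in depth the specific problems that the research tools are used to solve.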

3.
A news article’s online audience provides useful insights about the article’s identity. However, fake news classifiers using such information risk relying on profiling. In response to the rising demand for ethical AI, we present a profiling-avoiding algorithm that leverages Twitter users during model optimisation while excluding them when an article’s veracity is evaluated. For this, we take inspiration from the social sciences and introduce two objective functions that maximise correlation between the article and its spreaders, and among those spreaders. We applied our profiling-avoiding algorithm to three popular neural classifiers and obtained results on fake news data discussing a variety of news topics. The positive impact on prediction performance demonstrates the soundness of the proposed objective functions to integrate social context in text-based classifiers. Moreover, statistical visualisation and dimension reduction techniques show that the user-inspired classifiers better discriminate between unseen fake and true news in their latent spaces. Our study serves as a stepping stone to resolve the underexplored issue of profiling-dependent decision-making in user-informed fake news detection.

4.
The growing role of social media in our lives has popularized the posting of short texts. Short texts contain limited context and have unique characteristics that make them difficult to handle. Every day, billions of short texts are produced in the form of tags, keywords, tweets, phone messages, messenger conversations, social network posts, etc. The analysis of these short texts is imperative in the field of text mining and content analysis. Extracting precise topics from large-scale short text documents is a critical and challenging task. Conventional approaches fail to obtain word co-occurrence patterns in topics due to the sparsity problem in short texts, such as text on the web, social media like Twitter, and news headlines. Therefore, in this paper, the sparsity problem is ameliorated by a novel fuzzy topic modeling (FTM) approach for short text. In this research, the local and global term frequencies are computed through a bag-of-words (BOW) model. To remove the negative impact of high dimensionality on the global term weighting, principal component analysis is adopted; thereafter, the fuzzy c-means algorithm is employed to retrieve the semantically relevant topics from the documents. The experiments are conducted on three real-world short text datasets: the snippets dataset is a small dataset, whereas the other two, Twitter and questions, are larger. Experimental results show that the proposed approach discovered the topics more precisely and performed better than other state-of-the-art baseline topic models such as GLTM, CSTM, LTM, LDA, Mix-gram, BTM, SATM, and DREx+LDA. The performance of FTM is also demonstrated in classification, clustering, topic coherence and execution time. FTM's classification accuracy on the snippets dataset is 0.95, 0.94, 0.91, 0.89 and 0.87 with 50, 75, 100, 125 and 200 topics, respectively. On the questions dataset it is 0.73, 0.74, 0.70, 0.68 and 0.78 with 50, 75, 100, 125 and 200 topics, respectively. The classification accuracies of FTM on the snippets and questions datasets are higher than those of the state-of-the-art baseline topic models.

5.
Mainstream social media, such as Facebook, Twitter, and Weibo, provide enterprises an opportunity to innovate and develop. User-generated content on social media platforms can help determine the needs of the user and identify a target market, providing a basis for enterprise innovation. In this study, we propose a user-interactive innovation knowledge acquisition model. The comment data on a selected forum were first crawled using web crawler software. We then pre-processed the data to obtain a semi-structured user corpus, and used the Latent Dirichlet Allocation model to cluster topics and obtain the subject words hidden in each comment text. A user demand ontology was built based on the subject words, and, with reference to expert input, a product function ontology was established. Through semantic similarity matching, we integrated the two ontologies to obtain the user-interactive innovation knowledge acquisition model. Finally, the model was validated using the Volvo XC60 automobile as an example. The empirical results showed that the proposed model could assist enterprises by providing ideas for follow-up innovation and product development.

6.
The rising popularity of social media posts, most notably Twitter posts, as a data source for social science research poses significant problems with regard to access to representative, high-quality data for analysis. Cheap, publicly available data such as that obtained from Twitter's public application programming interfaces is often of low quality, while high-quality data is expensive both financially and computationally. Moreover, data is often available only in real-time, making post-hoc analysis difficult or impossible. We propose and test a methodology for inexpensively creating an archive of Twitter data through population sampling, yielding a database that is highly representative of the targeted user population (in this test case, the entire population of Japanese-language Twitter users). Comparing the tweet volume, keywords, and topics found in our sample data set with the ground truth of Twitter's full data feed confirmed a very high degree of representativeness in the sample. We conclude that this approach yields a data set that is suitable for a wide range of post-hoc analyses, while remaining cost effective and accessible to a wide range of researchers.

7.
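熊文靓, 付慧真. 《情报科学》 (Information Science), 2021, 39(11): 117-126
[Purpose/Significance] Interdisciplinarity is a prominent feature of contemporary science. Research on interdisciplinarity, which centers on the characteristics of interdisciplinary research, not only provides important clues for clarifying interdisciplinary research topics but also provides a basis for managing and evaluating interdisciplinary research. [Method/Process] Taking interdisciplinarity research as the object of study, we use a topic mining model that combines Coherence Score with LDA to identify the main topics of interdisciplinarity research, and use bibliometric methods to explore its evolutionary characteristics at the macro and micro levels. [Result/Conclusion] The results show that the assessment of the interdisciplinarity of interdisciplinary research is in a period of rapid development. Interdisciplinary research stems both from the internal drive of cross-boundary exploration in disciplines such as the social sciences and ecology and from the external drive of complex problems such as climate change and ecological vulnerability. Evaluation indicators and methods for interdisciplinary research are complex and comprehensive, and integrating new technologies such as big data and artificial intelligence is the trend. Multiple forms of interdisciplinary education and research collaboration are fundamental to putting interdisciplinary research into practice. [Innovation/Limitation] The paper analyzes the hotspots and future trends of interdisciplinarity research from multiple dimensions, providing a reference for national science and technology policy-making and for researchers conducting related studies.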

8.
Centrality is one of the most studied concepts in social network analysis. There is a large body of literature on centrality measures as ways to identify the most relevant users in a social network. The challenge is to find measures that can be computed efficiently and that classify users according to relevance criteria as close as possible to reality. We address this problem in the context of the Twitter network, an online social networking service with millions of users and an impressive flow of messages that are published and spread daily through interactions between users. Twitter has different types of users, but the greatest utility lies in finding the most influential ones. The purpose of this article is to collect and classify the different Twitter influence measures that exist so far in the literature. These measures are very diverse. Some are based on simple metrics provided by the Twitter API, while others are based on complex mathematical models. Several measures are based on the PageRank algorithm, traditionally used to rank websites on the Internet. Others consider the timeline of publication or the content of the messages, some focus on specific topics, and others try to make predictions. We consider all these aspects, and some additional ones. Furthermore, we include measures of activity and popularity, the traditional mechanisms to correlate measures, and some important aspects of computational complexity for this particular context.
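
As a rough illustration of the PageRank-based family of influence measures mentioned in this abstract, the following sketch runs plain power-iteration PageRank on a tiny retweet graph. The graph, the user names, the damping factor of 0.85, and the choice of "a retweeted b" as the edge meaning are all assumptions made for illustration; the surveyed measures each define their own graphs and weightings.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank over a directed graph given as (source, target) pairs."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-links is spread uniformly over all nodes.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        # Every node passes its damped rank on, split evenly among its out-links.
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


# Hypothetical retweet graph: an edge (a, b) means "a retweeted b".
retweets = [("ana", "carla"), ("ben", "carla"), ("dave", "carla"), ("carla", "ben")]
scores = pagerank(retweets)
most_influential = max(scores, key=scores.get)
print(most_influential, round(scores[most_influential], 3))
```

In this toy graph the user who is retweeted by everyone else ends up with the highest score, which is the intuition behind using PageRank as an influence measure.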

9.
The paper presents new annotated corpora for performing stance detection on Spanish Twitter data, most notably health-related tweets. The objectives of this research are threefold: (1) to develop a manually annotated benchmark corpus for emotion recognition taking into account different variants of Spanish in social posts; (2) to evaluate the efficiency of semi-supervised models for extending such a corpus with unlabelled posts; and (3) to describe such short text corpora via specialised topic modelling. A corpus of 2,801 tweets about COVID-19 vaccination was annotated by three native speakers as in favour (904), against (674) or neither (1,223), with a Fleiss’ kappa score of 0.725. Results show that the self-training method with an SVM base estimator can alleviate annotation work while ensuring high model performance. The self-training model outperformed the other approaches and produced a corpus of 11,204 tweets with a macro-averaged F1 score of 0.94. The combination of sentence-level deep learning embeddings and density-based clustering was applied to explore the contents of both corpora. Topic quality was measured in terms of the trustworthiness and the validation index.
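
For readers unfamiliar with the agreement statistic reported in this abstract, the sketch below computes Fleiss' kappa for three annotators and three stance labels. The six rating rows are toy data invented for illustration; they are not taken from the paper's corpus and do not reproduce its 0.725 score.

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa; ratings[i][j] is the number of annotators who put item i in category j."""
    n_items = len(ratings)
    n_raters = sum(ratings[0])  # assumes every item was rated by the same number of annotators
    n_categories = len(ratings[0])

    # Observed agreement: share of agreeing annotator pairs, averaged over items.
    per_item = [
        (sum(count * count for count in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in ratings
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement from the overall share of each category.
    shares = [
        sum(row[j] for row in ratings) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    p_expected = sum(share * share for share in shares)

    return (p_observed - p_expected) / (1.0 - p_expected)


# Toy data (hypothetical): columns are counts for "in favour", "against", "neither".
toy_ratings = [
    [3, 0, 0],
    [0, 3, 0],
    [0, 0, 3],
    [2, 1, 0],
    [0, 2, 1],
    [1, 0, 2],
]
kappa = fleiss_kappa(toy_ratings)
print(round(kappa, 3))
```

A value of 1 means perfect agreement and 0 means agreement no better than chance, so a reported score above 0.7 indicates that the three annotators agreed on most tweets.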

10.
Inferring users’ interests from their activities on social networks has been an emerging research topic in recent years. Most existing approaches rely heavily on the explicit contributions (posts) of a user and overlook users’ implicit interests, i.e., those potential interests that the user did not explicitly mention but might have. Given a set of active topics present in a social network in a specified time interval, our goal is to build an interest profile for a user over these topics by considering both the explicit and the implicit interests of the user. The reason for this is that the interests of free-riders and cold start users, who constitute a large majority of social network users, cannot be directly identified from their explicit contributions to the social network. Specifically, to infer users’ implicit interests, we propose a graph-based link prediction schema that operates over a representation model consisting of three types of information: user explicit contributions to topics, relationships between users, and the relatedness between topics. Through extensive experiments on different variants of our representation model, and considering both homogeneous and heterogeneous link prediction, we investigate how topic relatedness and users’ homophily relation impact the quality of inferring users’ implicit interests. Comparison with state-of-the-art baselines on a real-world Twitter dataset demonstrates the effectiveness of our model in inferring users’ interests in terms of perplexity and in the context of a retweet prediction application. Moreover, we show that the impact of our work is especially meaningful for free-riders and cold start users.

11.
We deal with the task of authorship attribution, i.e., identifying the author of an unknown document, proposing the use of Part Of Speech (POS) tags as features for language modeling. The experimentation is carried out on corpora untypical for the task, i.e., with documents written by non-professional writers, such as movie reviews or tweets. The former corpus is homogeneous with respect to topic, making the task more challenging. The latter corpus places language models in a setting of continuously and rapidly evolving language, unique and noisy writing style, and limited length of social media messages. While we find that language models based on POS tags are competitive in only one of the corpora (movie reviews), they generally provide efficiency benefits and robustness against data sparsity. Furthermore, we experiment with model fusion, where language models based on different modalities are combined. By linearly combining three language models, based on characters, words, and POS trigrams, respectively, we achieve the best generalization accuracy of 96% on movie reviews, while the combination of language models based on characters and POS trigrams provides 54% accuracy on the Twitter corpus. In fusion, POS language models prove to be essential and effective components.
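
The linear model fusion described in this abstract can be pictured with a small sketch: each model scores every candidate author, the scores are combined with fixed weights, and the author with the highest combined score is predicted. The author names, the per-model probabilities, and the weights below are hypothetical, and the abstract does not say how the weights were chosen or at what level the models were combined.

```python
def fuse_scores(model_scores, weights):
    """Linearly combine per-author scores from several models using fixed weights."""
    authors = list(next(iter(model_scores.values())))
    return {
        author: sum(weights[model] * model_scores[model][author] for model in model_scores)
        for author in authors
    }


# Hypothetical probabilities that each candidate wrote the document, one dict per model.
model_scores = {
    "char": {"alice": 0.50, "bob": 0.30, "carol": 0.20},
    "word": {"alice": 0.30, "bob": 0.45, "carol": 0.25},
    "pos": {"alice": 0.40, "bob": 0.35, "carol": 0.25},
}
# Assumed interpolation weights; they sum to 1 so the fused scores stay normalised.
weights = {"char": 0.5, "word": 0.3, "pos": 0.2}

fused = fuse_scores(model_scores, weights)
predicted_author = max(fused, key=fused.get)
print(predicted_author)
```

Here the word model alone would pick a different author than the character and POS models, and the weighted combination resolves the disagreement.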

12.
Marketing professionals face increasingly complex challenges in adapting classic marketing strategies to the phenomenon of social networks. Companies are currently trying to take advantage of the collective knowledge available on social networks to support different types of marketing decisions. The appropriate analysis of this information can offer marketing professionals important competitive advantages. This work proposes a new methodology to extract the social collective behavior of Twitter users concerning a group of brands based on the users’ temporal activity. Time series of mentions made by individual users to each company’s Twitter account are aggregated to obtain collective activity data for the companies, which is a consequence of both the company’s and other users’ actions. These data are processed using classical unsupervised machine learning techniques, such as temporal clustering and hidden Markov models, to extract collective temporal behavior patterns and models of the dynamics of customers over time for a single brand and for groups of brands. The derived knowledge can be used for different tasks, such as identifying the impact of a marketing campaign on Twitter and comparatively assessing the social behaviors of different brands and groups of brands to assist in making marketing decisions. Our methodology is validated in a case study from the wine market. Twitter data were gathered from four regions of different countries around the world with important wineries (Italy: Veneto, Portugal: Porto and Douro Valley, Spain: La Rioja, and United States: Napa Valley), and a comparative behavior analysis was carried out from the perspective of the use of Twitter as a communication channel for marketing campaigns.

13.
Social media platforms allow users to express their opinions on various topics online. Oftentimes, users’ opinions are not static, but may change over time under the influence of their neighbors in social networks, or be updated when they encounter arguments that undermine their beliefs. In this paper, we propose to use a Recurrent Neural Network (RNN) to model each user’s posting behavior on Twitter and to incorporate their neighbors’ topic-associated context as attention signals, using an attention mechanism, for user-level stance prediction. Moreover, our proposed model operates in an online setting in that its parameters are continuously updated with the Twitter stream data and can be used to predict a user’s topic-dependent stance. Detailed evaluation on two Twitter datasets, related to Brexit and the US General Election, confirms the superior performance of our neural opinion dynamics model over both static and dynamic alternatives for user-level stance prediction.

14.
The emergence of social media has fundamentally changed scholarly communication and allows scientific research to be shared at an unprecedented speed and scale. While many studies have discussed which papers attract the most online attention, how they spread online is unclear. In this paper, we explore the diffusion patterns of ~170,000 papers from different journal tiers from 2012 to 2019 based on over 3 million Twitter mentions. We first categorize journals as elite (the top of Q1) or non-elite (Q2~Q4) according to their journal impact factor quartiles, then use network analysis and time series analysis to characterize papers’ dynamic diffusion process, and finally discuss papers’ engaged users and disciplinary contexts. Results show that though elite journal papers spread significantly faster and more broadly than non-elite ones, some non-elite journal papers reached a sizable audience. For non-elite journal papers, a decent social media reach can be achieved with the aid of highly influential users or through multiple waves of small spread that span a long period. As a result, popular non-elite journal papers tend to be more viral than elite ones, focusing discussions around topics close to daily life. This study provides a new perspective for characterizing the diffusion process of scientific papers and helps to further the understanding of such a process.

15.
This paper conducts a comparative literature survey of Open Government Data (OGD) and Freedom of Information (FOI), with a view to tracking the central themes in the two civil society campaigns. Although the two movements appear similar and are increasingly popular in research, the major themes framing research on them have not clearly emerged. Topic modelling, text mining and document analysis methods are used to extract the themes as well as key named entities. The topics are subsequently labeled and, with expert guidance, their semantic meanings are provided. The results indicate that the major theme in FOI research borders on issues relating to disclosure, publishing, access and cost of requests. On the other hand, themes in OGD research have largely centered on technology and related concepts. The approach also helped in determining key similarities and differences between the two campaigns as reported in research.

16.
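司莉, 何依. 《现代情报》 (Journal of Modern Information), 2016, 36(6): 165-170
A corpus is an electronic database made up of naturally occurring language material collected according to a given method. Since 2000, research on multilingual corpora in China has grown rapidly. Based on a comprehensive literature survey, this paper summarizes and organizes the current state of research on multilingual corpora in China. Chinese scholars' research on multilingual corpora is concentrated mostly in linguistics, followed by computer science. Research topics fall mainly into two areas: key technologies for multilingual corpora and applications of multilingual corpora.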

17.
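As international competition in science and technology intensifies and China's scientific and technological strength rises rapidly, the national scientific image has become an increasingly important component of a multidimensional national image. An analysis of discussions of China-related science topics on the international social media platform Twitter finds that Western publics currently pay little attention to science topics related to China. The discussion is dominated by a small number of major scientific events, is clearly guided and controlled by Western mainstream news media and political discourse, and leans negative in perception. This negative image, constructed by others, is to some extent drawn into the discourse systems of politics and international relations, and manifests as attention to negative news reports about Chinese science and technology and to conspiracy theories. The positive scientific image, by contrast, manifests more in a "depoliticized" context, as appreciation of China's breakthrough scientific achievements and of scientists who have won international science awards, and as particular attention to Chinese science fiction. Accordingly, improving China's international scientific image requires recognizing social media platforms as an important arena of public opinion and, in light of the research findings, formulating targeted strategies for international science communication.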

18.
Social networks are becoming a key communication tool for organizations, but also for top managers such as CEOs. Among the available platforms, Twitter is one of the most prominent and is considered one of the most suitable for sharing information and engaging in dialogue with stakeholders. This paper analyzes the presence of CEOs on the most active social network sites, and assesses the activity and interaction of these top managers on Twitter. CEOs from Global and Latin American companies were selected in order to compare their performance. The results of the study show that the presence of CEOs on social networks is very low, and the majority of those who are present are not using their Twitter accounts adequately. Although general presence and performance are low, LatAm CEOs have a better presence on social networks and are more active on Twitter, while Global CEOs have better interaction results on their accounts. This area of strategic communication should therefore be improved by communication practitioners, since CEO communication is nowadays a key communication issue for any organization.

19.
20.
We empirically explore the associations between social media use at home and shopping preferences using survey data. We focus on popular retail firms, including brick-and-mortar firms such as Walmart, Target, Nordstrom, and Best Buy, and online retailers such as Amazon, Walmart, Target, and Best Buy. We analyze the use of popular platforms such as Facebook, Twitter, LinkedIn, and Skype, along with a general category, Other Social Media. We find that use of LinkedIn, Skype and Other Social Media at home, in the model without control variables, is associated with shopping at Nordstrom, Walmart and Target. Shopping online at Amazon, Best Buy and Walmart, without control variables in the model specification, is associated with use of Facebook, Skype, Twitter and Other Social Media at home. We report additional insights using an alternative specification that includes social media use at work. Media Richness Theory (MRT), Strength of Weak Ties from Social Network Analysis (SNA), and related theories help explain our results. Our results have important implications for social marketing campaigns and social media policies for consumer retail firms.
