
1.

The impact of deep learning on document classification using semantically rich representations

Zenun Kastrati, Ali Shariq Imran, Sule Yildirim Yayilgan 《Information processing & management》2019,56(5):1618-1632

This paper presents a semantically rich document representation model for automatically classifying financial documents into predefined categories using deep learning. The model architecture consists of two main modules: document representation and document classification. In the first module, a document is enriched with semantics using background knowledge provided by an ontology and through the acquisition of its relevant terminology. Integrating the acquired terminology into the ontology extends semantically rich document representations with in-depth coverage of concepts, thereby capturing the whole conceptualization involved in documents. The semantically rich representations obtained from the first module serve as input to the document classification module, which aims to find the most appropriate category for each document through deep learning. Three deep learning networks, each belonging to a different category of machine learning techniques, are used for ontological document classification with a real-life ontology. Multiple simulations are carried out with various deep neural network configurations, and our findings reveal that a three-hidden-layer feedforward network with 1024 neurons obtains the highest document classification performance on the INFUSE dataset. For the same network configuration, the F1 score increases further, by almost five percentage points to 78.10%, when the relevant terminology integrated into the ontology is used to enrich the document representation. Furthermore, we conducted a comparative performance evaluation against various state-of-the-art document representation approaches and classification techniques, including shallow and conventional machine learning classifiers.
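
As a rough illustration of ontology-based enrichment, the sketch below maps a document to a vector of concept frequencies. It assumes a tiny hypothetical ontology in which each concept carries its acquired terminology; the concept names and terms are invented, and the paper's actual ontology and weighting scheme are not reproduced.

```python
import re
from collections import Counter

# Hypothetical ontology: concept -> terms acquired for it (label plus terminology).
ONTOLOGY = {
    "Loan": {"loan", "credit", "borrowing"},
    "Investment": {"investment", "equity", "fund"},
    "Tax": {"tax", "vat", "levy"},
}

def concept_vector(text: str, ontology: dict[str, set[str]]) -> list[int]:
    """Count occurrences of each concept's terms; features follow sorted concept names."""
    tokens = Counter(re.findall(r"[a-z]+", text.lower()))
    return [sum(tokens[term] for term in ontology[c]) for c in sorted(ontology)]

doc = "The fund raised equity while the loan and credit terms remained unchanged."
print(sorted(ONTOLOGY))                # ['Investment', 'Loan', 'Tax']
print(concept_vector(doc, ONTOLOGY))   # [2, 2, 0]
```

Because "credit" is listed as terminology for Loan, a document that never uses the word "loan" still activates that feature; such vectors could then be fed to a feedforward classifier.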

2.

Knowledge representation learning with entity descriptions, hierarchical types, and textual relations

Xing Tang, Ling Chen, Jun Cui, Baogang Wei 《Information processing & management》2019,56(3):809-822


3.

Causality-based CTR prediction using graph neural networks

《Information processing & management》2023,60(1):103137

As a prevalent problem in online advertising, click-through rate (CTR) prediction has attracted considerable attention from both academia and industry. Recent studies have built CTR prediction models within the graph neural network (GNN) framework. However, most GNN-based models handle feature interactions in a complete graph while ignoring causal relationships among features, which causes a large drop in performance on out-of-distribution data. This paper develops a causality-based CTR prediction model in the GNN framework (Causal-GNN) that integrates representations of a feature graph, a user graph and an ad graph in the context of online advertising. In our model, a structured representation learning method (GraphFwFM) is designed to capture high-order representations on the feature graph, based on causal discovery among field features in gated graph neural networks (GGNNs), and GraphSAGE is employed to obtain graph representations of users and ads. Experiments on three public datasets demonstrate the superiority of Causal-GNN in AUC and Logloss and the effectiveness of GraphFwFM in capturing high-order representations on the causal feature graph.
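
The abstract does not give GraphFwFM's formulas. For orientation only, here is the pairwise term of a standard field-weighted factorization machine (FwFM), with an optional edge set restricting interactions to pairs connected in a feature graph. The field names, weights and the "causal" edge set are hypothetical inputs; the paper's causal discovery step is not shown.

```python
from itertools import combinations

def fwfm_interactions(embeddings, field_weights, edges=None):
    """Pairwise FwFM term: sum over field pairs of r_fg * <v_f, v_g>.

    embeddings:    field -> embedding of the active feature in that field.
    field_weights: frozenset({f, g}) -> scalar weight r_fg (missing pairs count as 0).
    edges:         optional set of frozenset({f, g}); if given, only these pairs
                   interact (e.g. edges of a causal feature graph, hypothetical here).
                   If None, every pair interacts (the complete-graph case).
    """
    total = 0.0
    for f, g in combinations(sorted(embeddings), 2):
        pair = frozenset((f, g))
        if edges is not None and pair not in edges:
            continue
        dot = sum(a * b for a, b in zip(embeddings[f], embeddings[g]))
        total += field_weights.get(pair, 0.0) * dot
    return total

emb = {"user_age": [1.0, 0.5], "ad_category": [0.2, 2.0], "device": [1.0, -1.0]}
r = {
    frozenset(("user_age", "ad_category")): 0.5,
    frozenset(("user_age", "device")): 1.0,
    frozenset(("ad_category", "device")): 2.0,
}
print(fwfm_interactions(emb, r))  # complete graph: all three pairs
print(fwfm_interactions(emb, r, edges={frozenset(("user_age", "ad_category"))}))
```

Dropping edges removes the corresponding interaction terms entirely, which is the practical difference between scoring on a complete graph and on a sparser, causally filtered one.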

4.

FFTree: A flexible tree to handle multiple fairness criteria

《Information processing & management》2022,59(6):103099

The demand for transparency and fairness in AI-based decision-making systems is constantly growing. Organisations need to be assured that their applications based on these technologies behave fairly, without introducing negative social implications with respect to sensitive attributes such as gender or race. Since the notion of fairness is context-dependent and not uniquely defined, studies in the literature have proposed various formalisations. In this work, we propose a novel, flexible, discrimination-aware decision tree that allows the user to employ different fairness criteria depending on the application domain. Our approach enhances decision-tree classifiers to provide transparent and fair rules to end users.
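
The abstract does not list which fairness criteria FFTree supports. As an illustration, two widely used group-fairness criteria, demographic parity and equal opportunity, can be measured as below for a binary sensitive attribute. The criterion names used for dispatch are hypothetical, and how FFTree uses such a measure during tree induction is not shown.

```python
def _positive_rate(preds):
    return sum(preds) / len(preds) if preds else 0.0

def demographic_parity_gap(y_pred, group):
    """|P(pred=1 | group 0) - P(pred=1 | group 1)|."""
    g0 = [p for p, g in zip(y_pred, group) if g == 0]
    g1 = [p for p, g in zip(y_pred, group) if g == 1]
    return abs(_positive_rate(g0) - _positive_rate(g1))

def equal_opportunity_gap(y_true, y_pred, group):
    """Difference in true-positive rates, P(pred=1 | y=1), between the groups."""
    g0 = [p for t, p, g in zip(y_true, y_pred, group) if g == 0 and t == 1]
    g1 = [p for t, p, g in zip(y_true, y_pred, group) if g == 1 and t == 1]
    return abs(_positive_rate(g0) - _positive_rate(g1))

def fairness_gap(criterion, y_true, y_pred, group):
    """Dispatch on a user-chosen criterion name (names are illustrative)."""
    if criterion == "demographic_parity":
        return demographic_parity_gap(y_pred, group)
    if criterion == "equal_opportunity":
        return equal_opportunity_gap(y_true, y_pred, group)
    raise ValueError(f"unknown fairness criterion: {criterion}")

y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0]
group = [0, 0, 0, 0, 1, 1, 1, 1]
for c in ("demographic_parity", "equal_opportunity"):
    print(c, fairness_gap(c, y_true, y_pred, group))
```

The same predictions satisfy equal opportunity exactly (gap 0.0) but not demographic parity (gap 0.25), which is why letting the user pick the criterion matters.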

5.

Joint contrastive triple-learning for deep multi-view clustering

《Information processing & management》2023,60(3):103284

Deep multi-view clustering (MVC) aims to mine and exploit the complex relationships among views to learn compact data clusters with deep neural networks in an unsupervised manner. Recent deep contrastive learning (CL) methods have shown promising performance in MVC by learning cluster-oriented deep feature representations, which is achieved by contrasting positive and negative sample pairs. However, most existing deep contrastive MVC methods focus on only one side of contrastive learning, such as feature-level or cluster-level contrast, failing to integrate the two sides or to bring in other important aspects of contrast. Additionally, most of them work in a separate two-stage manner, i.e., first feature learning and then data clustering, so the two stages cannot benefit each other. To address these challenges, in this paper we propose a novel joint contrastive triple-learning framework that learns multi-view discriminative feature representations for deep clustering. It is threefold: feature-level alignment-oriented CL, feature-level commonality-oriented CL, and cluster-level consistency-oriented CL. The first two submodules contrast the encoded feature representations of data samples at different feature levels, while the last contrasts the data samples in the cluster-level representations. Benefiting from this triple contrast, more discriminative representations of views can be obtained. Meanwhile, a view weight learning module is designed to learn and exploit the quantitative complementary information across the learned discriminative features of each view. The contrastive triple-learning module, the view weight learning module and the data clustering module operating on the fused features are thus performed jointly, so that these modules benefit each other.
Extensive experiments on several challenging multi-view datasets show the superiority of the proposed method over many state-of-the-art methods, with especially large accuracy improvements of 15.5% and 8.1% on Caltech-4V and CCV, respectively. Given its promising performance on visual datasets, the proposed method can be applied to many practical visual applications such as visual recognition and analysis. The source code of the proposed method is provided at https://github.com/ShizheHu/Joint-Contrastive-Triple-learning.
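
The abstract describes contrasting positive and negative pairs across views without giving the loss. A common choice for this is a cross-view InfoNCE loss, sketched below in plain Python. Treating sample i in both views as the positive pair, cosine similarity and the temperature value are assumptions; the paper's three sub-losses and view weighting are not reproduced.

```python
import math

def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def info_nce(view1, view2, temperature=0.5):
    """Mean cross-view InfoNCE loss: sample i in view2 is the positive for
    sample i in view1, and every other sample in view2 is a negative."""
    n = len(view1)
    total = 0.0
    for i in range(n):
        logits = [cosine(view1[i], view2[j]) / temperature for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denominator - logits[i]
    return total / n

view1 = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]  # positives match
swapped = [[0.0, 1.0], [1.0, 0.0]]  # positives point at the wrong sample
print(info_nce(view1, aligned), info_nce(view1, swapped))
```

The loss is small when matching samples are most similar across views and large otherwise. Applying the same function to the columns of two views' soft cluster-assignment matrices is one common way to obtain a cluster-level contrast.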

6.

Debiaser for Multiple Variables to enhance fairness in classification tasks

《Information processing & management》2023,60(2):103226

Nowadays, ensuring that search and recommendation systems are fair and do not discriminate against any population group has become of paramount importance, as also highlighted by some of the Sustainable Development Goals proposed by the United Nations. These systems typically rely on machine learning algorithms that solve a classification task. Although fairness has been widely addressed in binary classification, fairness in multi-class classification still lacks well-established solutions and needs further investigation. For these reasons, in this paper we present the Debiaser for Multiple Variables (DEMV), an approach able to mitigate unbalanced groups bias (i.e., bias caused by an unequal distribution of instances in the population) in both binary and multi-class classification problems with multiple sensitive variables. The proposed method is compared, under several conditions, with a set of well-established baselines using different categories of classifiers. First, we conduct a study to identify the best generation strategies and their impact on DEMV’s ability to improve fairness. Then, we evaluate our method on a heterogeneous set of datasets and show that it outperforms the established algorithms in the literature in the multi-class classification setting, and in the binary classification setting when more than two sensitive variables are involved. Finally, based on the conducted experiments, we discuss the strengths and weaknesses of our method and of the other baselines.
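
One way to make "unequal distribution of instances" concrete is to compare each group's observed size with the size expected if the sensitive variables and the label were independent. The sketch below does this for several sensitive variables handled jointly. It is an illustrative assumption, not necessarily DEMV's exact balancing rule, and the data are made up.

```python
from collections import Counter

def group_imbalance(sensitive, labels):
    """Map each (sensitive-combination, label) group to (observed, expected) counts.

    sensitive: one tuple of sensitive-variable values per instance, so several
               sensitive variables are treated jointly.
    labels:    class label per instance (binary or multi-class).
    expected = count(s) * count(y) / N, i.e. the size under independence.
    """
    n = len(labels)
    s_counts = Counter(sensitive)
    y_counts = Counter(labels)
    observed = Counter(zip(sensitive, labels))
    return {
        (s, y): (observed[(s, y)], s_counts[s] * y_counts[y] / n)
        for s in s_counts
        for y in y_counts
    }

sensitive = [("F", "young"), ("F", "young"), ("M", "old"), ("M", "old")]
labels = ["A", "A", "B", "A"]
for key, (obs, exp) in sorted(group_imbalance(sensitive, labels).items()):
    print(key, "observed:", obs, "expected:", exp)
```

A resampling debiaser could then oversample groups whose observed count falls below the expected count and undersample the others; that sampling step is not implemented here.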

7.

Heterogeneous graph-based joint representation learning for users and POIs in location-based social network

《Information processing & management》2020,57(2):102151

Learning latent representations for users and points of interest (POIs) is an important task in location-based social networks (LBSN), which can greatly benefit multiple location-based services, such as POI recommendation and social link prediction. Many contextual factors, such as geographical influence, user social relationships and temporal information, are available in LBSN and would be useful for this task. However, incorporating all these contextual factors into user and POI representation learning in LBSN remains challenging because of their heterogeneous nature. Although existing representation learning methods for LBSN deliver encouraging performance on POI recommendation and social link prediction, most of them incorporate only one or two of these contextual factors. In this paper, we propose a novel joint representation learning framework for users and POIs in LBSN, named UP2VEC. In UP2VEC, we construct a heterogeneous LBSN graph that incorporates all the aforementioned factors. Specifically, the transition probabilities between nodes in the heterogeneous graph are derived by jointly considering these contextual factors. The latent representations of users and POIs are then learnt by matching the topological structure of the heterogeneous graph. To evaluate the effectiveness of UP2VEC, a series of experiments on POI recommendation and social link prediction is conducted with two real-world datasets (Foursquare and Gowalla). Experimental results demonstrate that UP2VEC significantly outperforms existing state-of-the-art alternatives. A further experiment shows the superiority of UP2VEC in handling the cold-start problem for POI recommendation.
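
The abstract says transition probabilities between nodes of the heterogeneous graph are derived from several contextual factors. A minimal sketch of how such weights drive a walk over user and POI nodes is shown below. The graph, its weights and the walk procedure are hypothetical; the abstract does not say whether UP2VEC samples walks this way.

```python
import random

# Hypothetical heterogeneous LBSN graph: "u:" = user node, "p:" = POI node.
# Edge weights stand in for scores combining social, geographical and temporal
# factors; the values are made up for illustration.
GRAPH = {
    "u:alice": {"u:bob": 1.0, "p:cafe": 3.0},
    "u:bob": {"u:alice": 1.0, "p:museum": 2.0},
    "p:cafe": {"u:alice": 3.0, "p:museum": 0.5},
    "p:museum": {"u:bob": 2.0, "p:cafe": 0.5},
}

def weighted_random_walk(graph, start, length, rng):
    """Walk up to `length` steps from `start`; each step moves to a neighbour
    with probability proportional to the edge weight."""
    walk = [start]
    node = start
    for _ in range(length):
        neighbours = graph.get(node)
        if not neighbours:  # dead end: stop early
            break
        nodes, weights = zip(*neighbours.items())
        node = rng.choices(nodes, weights=weights, k=1)[0]
        walk.append(node)
    return walk

walk = weighted_random_walk(GRAPH, "u:alice", 5, random.Random(42))
print(walk)
```

In walk-based embedding methods such as DeepWalk and node2vec, walks like these are then fed to a skip-gram model so that users and POIs that co-occur on walks receive nearby vectors.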

8.

GRU+CRF Method for Entity-Attribute Extraction

Wang Renwu, Meng Xianru, Kong Qi 《Journal of Modern Information》2018,38(10):57-64

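[Purpose/Significance] This study uses the gated recurrent unit (GRU), a recurrent neural network from deep learning, combined with a conditional random field (CRF) to predict labels for annotated Chinese text sequences, in order to extract entities and attributes from online review text. [Method/Process] First, following a predefined sequence-annotation scheme, the review corpus is word-segmented and then annotated with named-entity tags for entities and their attributes, yielding word sequences, part-of-speech sequences and tag sequences. Next, the word and part-of-speech sequences are converted into distributed word-vector representations and used as input to the GRU network. Finally, the output layer uses a CRF, whose output labels are the entities or attributes. [Results/Conclusion] Experimental results show that the method reduces entity-attribute extraction to named-entity tagging, using the GRU to capture the contextual semantics of the input and the CRF to capture dependencies between neighbouring output labels, and that it has considerable practical advantages over traditional rule-based or general machine learning methods.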
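
A CRF output layer chooses the whole label sequence jointly rather than token by token, and decoding is usually done with the Viterbi algorithm. The sketch below uses hypothetical tags (ENT, ATTR, O) and hand-set scores: in the described model the emission scores would come from the GRU and the transition scores would be learned. The paper's actual annotation scheme is not given in the abstract.

```python
def viterbi(emissions, transitions, tags):
    """Highest-scoring tag sequence for a linear-chain CRF.

    emissions:   list over positions of {tag: score} (e.g. produced by a GRU layer).
    transitions: {(prev_tag, tag): score}; pairs not listed score 0.
    """
    best = {t: emissions[0][t] for t in tags}  # best path score ending in tag t
    backpointers = []
    for scores in emissions[1:]:
        new_best, pointers = {}, {}
        for t in tags:
            prev = max(tags, key=lambda p: best[p] + transitions.get((p, t), 0.0))
            new_best[t] = best[prev] + transitions.get((prev, t), 0.0) + scores[t]
            pointers[t] = prev
        best = new_best
        backpointers.append(pointers)
    # Trace back from the best final tag.
    path = [max(tags, key=lambda t: best[t])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return list(reversed(path))

tags = ["O", "ENT", "ATTR"]
# Tokens (hypothetical): "phone", "battery", "lasts"
emissions = [
    {"O": 0.1, "ENT": 1.0, "ATTR": 0.2},
    {"O": 0.3, "ENT": 0.9, "ATTR": 0.8},
    {"O": 1.0, "ENT": 0.0, "ATTR": 0.1},
]
transitions = {("ENT", "ENT"): -2.0, ("ENT", "ATTR"): 1.0}
print(viterbi(emissions, transitions, tags))  # ['ENT', 'ATTR', 'O']
```

Per-token argmax would give ENT, ENT, O; the transition scores shift the second token to ATTR, which is the label-dependency effect the abstract attributes to the CRF layer.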

9.

On the task assignment with group fairness for spatial crowdsourcing

《Information processing & management》2023,60(2):103175

Task assignment, the core problem of Spatial Crowdsourcing (SC), is often modeled as an optimization problem with multiple constraints, and the quality and efficiency of its solution determine how well the SC system works. Fairness is a critical aspect of task assignment that affects worker participation and satisfaction. Although existing studies on SC have noticed the fairness problem, they mainly focus on fairness at the individual level rather than the group level. However, differences among groups in certain attributes (e.g., race, gender, age) can easily lead to discrimination in task assignment, which raises ethical issues and can even degrade the quality of the SC system. Therefore, we study the problem of task assignment with group fairness for SC. Following the principle of fair budget allocation, we define a constraint that can be incorporated into the task assignment problem of SC systems, yielding assignments with group fairness. We mainly consider task assignment in a common One-to-One SC system (O2-SC), and our goal is to maximize the quality of the task assignment while satisfying group fairness and other constraints such as budget and spatial constraints. Specifically, we first give a formal definition of task assignment with a group fairness constraint for O2-SC. Then, we prove that it is an NP-hard combinatorial optimization problem. Next, we provide a novel fast algorithm with theoretical guarantees to solve it. Finally, we conduct extensive experiments using both synthetic and real datasets. The results show that the proposed constraint can significantly improve the group fairness level of algorithms, even for a completely random algorithm, and that our algorithm completes the task assignment of SC systems efficiently and effectively while ensuring group fairness.

10.

Multiplicity and dynamics of social representations of the COVID-19 pandemic on Chinese social media from 2019 to 2020

《Information processing & management》2022,59(4):102990

Documenting the emergent social representations of COVID-19 in public communication is necessary for critically reflecting on pandemic responses and for guiding global pandemic recovery policies and practices. This study documents the dynamics of changing social representations of the COVID-19 pandemic on Weibo, one of the largest Chinese social media platforms, from December 2019 to April 2020. We draw on social representation theory (SRT) and conceptualize topics and topic networks as a form of social representation. Using machine learning methods, we analyzed a dataset of 40 million COVID-19-related posts from 9.7 million users (including the general public, opinion leaders, and organizations). We identified 12 topics and found an expansion in social representations of COVID-19 from a clinical and epidemiological perspective to a broader perspective that integrated personal illness experiences with economic and sociopolitical discourses. Discussions about COVID-19 science did not take a prominent position in the representations, suggesting a lack of effective science and risk communication. Further, we found that the strongest association of social representations existed between the public and opinion leaders, while the organizations’ representations did not align much with those of the other two groups, suggesting a lack of organizational influence on public representations of COVID-19 on social media in China.

11.

MTGCN: A multi-task approach for node classification and link prediction in graph data

《Information processing & management》2022,59(3):102902

Node classification and link prediction are both popular supervised learning tasks on graph data, but previous works seldom integrate them to capture their complementary information. In this paper, we propose a Multi-Task and Multi-Graph Convolutional Network (MTGCN) that jointly conducts node classification and link prediction in a unified framework. Specifically, MTGCN consists of multiple multi-task learning modules, each of which learns the complementary information between node classification and link prediction. In particular, each module uses different inputs to output representations of the graph data. Moreover, the parameters of one module initialize those of another, so that the useful information in the former can be propagated to the latter. As a result, the information is augmented to guarantee the quality of representations by exploring the complex structure inherent in the graph data. Experimental results on six datasets show that MTGCN outperforms the comparison methods in both node classification and link prediction.
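
The abstract does not spell out the convolution MTGCN uses. For reference, a standard graph convolutional layer (the Kipf and Welling propagation rule, ReLU(D^-1/2 (A + I) D^-1/2 X W)) is sketched below in plain Python. Treating this as MTGCN's exact layer is an assumption.

```python
import math

def gcn_layer(adjacency, features, weight):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 X W).

    adjacency: n x n symmetric 0/1 matrix, features: n x d, weight: d x k.
    """
    n, k = len(adjacency), len(weight[0])
    # Add self-loops, then symmetrically normalise by node degree.
    a_hat = [[adjacency[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # Linear transform X W.
    xw = [[sum(features[i][t] * weight[t][c] for t in range(len(weight))) for c in range(k)]
          for i in range(n)]
    # Neighbourhood aggregation followed by ReLU.
    return [[max(0.0, sum(norm[i][j] * xw[j][c] for j in range(n))) for c in range(k)]
            for i in range(n)]

A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # path graph 0-1-2
X = [[1.0], [2.0], [3.0]]
W = [[1.0]]
print(gcn_layer(A, X, W))
```

In a multi-task setting, embeddings from such layers could feed both a node-classification head and a link-prediction scorer (for example, a dot product between two nodes' embeddings); that pairing is an illustrative assumption.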

12.

Re-examining lexical and semantic attention: Dual-view graph convolutions enhanced BERT for academic paper rating

《Information processing & management》2023,60(2):103216

Automatically assessing academic papers has enormous potential to reduce peer-review burden and individual bias. Existing studies strive to build sophisticated deep neural networks that identify academic value from comprehensive data, e.g., academic graphs and full papers. However, these data are not always easy to access, and in a fair assessment it is the content of the paper, rather than features outside it, that should matter. Furthermore, while BERT models retain general semantics through pre-training on large-scale corpora, they tend to over-smooth because of stacked self-attention layers over unfiltered input tokens. It is therefore nontrivial to identify the distinguishing value of an academic paper from its limited content. In this study, we propose a novel deep neural network, Dual-view Graph Convolutions Enhanced BERT (DGC-BERT), for estimating academic paper acceptance. We combine the title and abstract of the paper as input. A pre-trained BERT model is then employed to extract the paper’s general representations. Apart from the hidden representations of the final layer, we highlight the first and last few layers as lexical and semantic views. In particular, we re-examine the dual-view filtered self-attention matrices by constructing two graphs, respectively. After that, two multi-hop Graph Convolutional Networks (GCNs) are employed separately to capture pivotal and distant dependencies between the tokens. Moreover, the two views reinforce each other through biaffine attention modules, and a re-weighting gate is proposed to further streamline the dual-view representations with the help of the original BERT representation. Finally, whether the submitted paper is likely to be accepted is predicted from the original language model features combined with the dual-view dependencies. Extensive data analyses and studies based on the full-paper MHCNN model provide insights into the task and the structural functions.
Comparison experiments on two benchmark datasets demonstrate that DGC-BERT significantly outperforms alternative approaches, including state-of-the-art models such as MHCNN and BERT variants. Additional analyses reveal the significance and explainability of the proposed modules in DGC-BERT. Our code and settings have been released on GitHub (https://github.com/ECNU-Text-Computing/DGC-BERT).

13.

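Optimizing the Path for China's Science and Technology Innovation Policies to Safeguard Fair Market Competition: In the Context of Implementing the Fair Competition Review System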

Zheng Heyuan, Li Shengli 《Science and Technology Management Research》2019,39(5)

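Optimizing science and technology (S&T) innovation policy is an important issue in implementing the innovation-driven development strategy. In a modern economic system, creating a market environment of fair competition by optimizing S&T innovation policy is a basic means consistent with the guiding, applied and scientific nature of such policy. With its distinctive value in safeguarding fair competition, the fair competition review system can drive the optimization of S&T innovation policy and combine the efforts of government and market in allocating S&T innovation resources. Existing S&T innovation policies contain many factors that hinder fair market competition. The fair competition review system should therefore be used as the lever to identify policy constraints on fair competition, remove barriers in S&T innovation policy, strengthen the construction of evaluation standards, improve the functions of policy-making bodies, and promote multi-actor collaborative governance, ultimately optimizing the S&T innovation policy system and advancing the implementation of the fair competition review system.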

14.

A Knowledge Representation Method for Citation Networks Based on Multi-Source Data Fusion

Chen Wenjie, Xu Haiyun 《Information Studies: Theory & Application》2020,43(1):150-154,134

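[Purpose/Significance] Effectively fusing the multi-source data in citation networks, such as citation relations and textual attributes, strengthens the semantic associations between document nodes and thereby provides solid support for tasks such as data mining and knowledge discovery. [Method/Process] A knowledge representation method for citation networks is proposed: first, a neural network model learns the k-order proximity structure of the citation network; then, the doc2vec model learns textual attributes such as titles and abstracts; finally, a cross-learning mechanism based on vector sharing is introduced to fuse the multi-source data. [Results/Conclusion] Tests on a CNKI citation dataset from the stem-cell domain achieved good link-prediction performance, demonstrating the effectiveness and soundness of the method.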

15.

Towards a real-time processing framework based on improved distributed recurrent neural network variants with fastText for social big data analytics

《Information processing & management》2020,57(1):102122

Big data generated by social media is a valuable source of information and offers an excellent opportunity to mine valuable insights. In particular, user-generated content such as reviews, recommendations, and user behavior data is useful for supporting many companies' marketing activities. Knowing what users say in social media reviews about the products they bought or the services they used is a key factor in decision-making. Sentiment analysis is one of the fundamental tasks in Natural Language Processing. Although deep learning for sentiment analysis has achieved great success and has allowed several firms to analyze and extract relevant information from their textual data, a model that runs in a traditional environment cannot remain effective as the volume of data grows, which underlines the importance of efficient distributed deep learning models for social big data analytics. Besides, social media analysis is a complex process involving a set of complex tasks. It is therefore important to address the challenges of social big data analytics and to improve the classification accuracy of deep learning techniques so as to support better decisions. In this paper, we propose a sentiment analysis approach that combines fastText with recurrent neural network variants to represent textual data efficiently and then uses the new representations to perform classification. Its main objective is to improve the classification accuracy of well-known Recurrent Neural Network (RNN) variants and to handle large-scale data. In addition, we propose a distributed intelligent system for real-time social big data analytics, designed to ingest, store, process, index, and visualize huge amounts of information in real time.
The proposed system adopts distributed machine learning with our proposed method to enhance decision-making processes. Extensive experiments on two benchmark datasets demonstrate that our sentiment analysis approach outperforms well-known distributed recurrent neural network variants (i.e., Long Short-Term Memory (LSTM), Bidirectional Long Short-Term Memory (BiLSTM), and Gated Recurrent Unit (GRU)). Specifically, we tested the efficiency of our approach with these three deep learning models, and the results show that it improves the performance of all three. This work can benefit researchers and practitioners who want to collect, handle, analyze and visualize multiple sources of information in real time. It can also contribute to a better understanding of public opinion and user behavior through the proposed system and the improved variants of the most powerful distributed deep learning and machine learning algorithms. Furthermore, it can increase the classification accuracy of several existing RNN-based works on sentiment analysis.
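
fastText's central idea is to represent a word through its character n-grams, with '<' and '>' marking the word boundaries, so rare or misspelled words common in social media text still receive meaningful vectors. The minimal sketch below shows only the subword extraction step, using the 3 to 6 n-gram range from the original fastText word-vector work. Hashing n-grams into buckets and the classifier itself are omitted.

```python
def char_ngrams(word: str, n_min: int = 3, n_max: int = 6) -> list[str]:
    """fastText-style character n-grams: wrap the word in '<' and '>' and take
    every n-gram with n_min <= n <= n_max. (fastText also keeps a separate
    vector for the whole word; that is not included here.)"""
    wrapped = f"<{word}>"
    return [wrapped[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(wrapped) - n + 1)]

print(char_ngrams("where", 3, 3))  # ['<wh', 'whe', 'her', 'ere', 're>']
```

A word's vector is then composed from the vectors of its n-grams, so "wheere" shares most of its units with "where"; the resulting word vectors are what an RNN layer would consume.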

16.

Fast-DRD: Fast decentralized reinforcement distillation for deadline-aware edge computing

《Information processing & management》2022,59(2):102850

Edge computing has recently gained momentum because it provides computing services to mobile devices through high-speed networks. In edge computing system optimization, deep reinforcement learning (DRL) improves the quality of service (QoS) and shortens the age of information (AoI). However, loosely coupled edge servers create a noisy data space for DRL exploration, and learning a reasonable solution is enormously costly. Most existing works assume that the edge is an exactly observed system that harvests well-labeled data for pretraining DRL neural networks. However, this assumption runs counter to the motivation of driving DRL to explore unknown information, and it increases scheduling and computing costs in large-scale dynamic systems. This article combines DRL with a distillation module to improve learning efficiency for edge computing under partial observation. We formulate the deadline-aware offloading problem as a decentralized partially observable Markov decision process (Dec-POMDP) with distillation, called fast decentralized reinforcement distillation (Fast-DRD). Each edge server makes offloading decisions according to its own observations and learning strategy in a decentralized manner. By defining trajectory observation history (TOH) distillation and trust distillation to avoid overfitting, Fast-DRD learns a suitable offloading model in a noisy, partially observed edge system and reduces the cost of communication among servers. Finally, simulation experiments are presented to evaluate and compare the effectiveness and complexity of Fast-DRD.

17.

On “an improved approach for nonlinear system identification using neural networks”

Andrew I. Hanna, Danilo P. Mandic 《Journal of The Franklin Institute》2003,340(5):363-370

Nonlinear system identification and prediction is a complex task, and non-parametric models such as neural networks are often used in place of intricate mathematics. To that end, an improved approach to nonlinear system identification using neural networks was recently presented in Gupta and Sinha (J. Franklin Inst. 336 (1999) 721). It proposed a learning algorithm in which both the slope of the activation function at a neuron, β, and the learning rate, η, are made adaptive, assuming that η and β are independent variables. Here, we show that the slope and the learning rate are not independent in a general dynamical neural network, and that this should be taken into account when designing a learning algorithm. Further, relationships between η and β are developed that help reduce the number of degrees of freedom and the computational complexity of training a fully adaptive neural network. Simulation results based on Gupta and Sinha (1999) and the proposed approach support the analysis.
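
One well-known relation of this kind is a gain/learning-rate equivalence for a single tanh neuron. It is stated here as an illustration, not as the paper's exact result. Because ∂y/∂w = β·tanh'(β·wᵀx)·x, a neuron with slope β, weights w and learning rate η follows the same output trajectory as a neuron with slope 1, weights βw and learning rate β²η. The check below trains both neurons on the same data; the data, β and η are arbitrary.

```python
import math

def train_step(w, x, d, slope, lr):
    """One gradient-descent step on E = 0.5 * (d - y)^2 with y = tanh(slope * w.x).
    Returns the updated weights and the output y computed before the update."""
    net = sum(wi * xi for wi, xi in zip(w, x))
    y = math.tanh(slope * net)
    # -dE/dw_i = (d - y) * (1 - y^2) * slope * x_i
    grad_factor = (d - y) * (1.0 - y * y) * slope
    return [wi + lr * grad_factor * xi for wi, xi in zip(w, x)], y

def train(w, slope, lr, data, epochs):
    outputs = []
    for _ in range(epochs):
        for x, d in data:
            w, y = train_step(w, x, d, slope, lr)
            outputs.append(y)
    return w, outputs

beta, eta = 2.0, 0.05  # arbitrary illustrative values
data = [([1.0, 0.5], 0.8), ([-0.3, 1.2], -0.4)]
w0 = [0.1, -0.2]

# Neuron A: slope beta, weights w0, learning rate eta.
w_a, out_a = train(w0, beta, eta, data, epochs=10)
# Neuron B: slope 1, weights beta * w0, learning rate beta^2 * eta.
w_b, out_b = train([beta * wi for wi in w0], 1.0, beta ** 2 * eta, data, epochs=10)

print("max output gap:", max(abs(a - b) for a, b in zip(out_a, out_b)))
print("max |w_B - beta * w_A|:", max(abs(b - beta * a) for a, b in zip(w_a, w_b)))
```

Since any change in β can be absorbed into the weights and the learning rate, adapting β and η as if they were independent adds a redundant degree of freedom.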

18.

Language processing and learning models for community question answering in Arabic

《Information processing & management》2019,56(2):274-290

In this paper, we focus on the problem of question ranking in community question answering (cQA) forums in Arabic. We address the task with machine learning algorithms using advanced Arabic text representations, obtained by applying tree kernels to constituency parse trees combined with textual similarities, including word embeddings. Our two main contributions are: (i) an Arabic language processing pipeline based on UIMA, covering everything from segmentation to constituency parsing, built on top of Farasa, a state-of-the-art Arabic language processing toolkit; and (ii) the application of long short-term memory neural networks to identify the best text fragments in questions for use in our tree-kernel-based ranker. Our thorough experimentation on a recently released cQA dataset shows that the Arabic linguistic processing provided by Farasa produces strong results and that neural networks combined with tree kernels further boost performance in terms of both efficiency and accuracy. Our approach also enables an implicit comparison between different processing pipelines, as our tests with the Farasa and Stanford parsers demonstrate.

19.

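The Contemporary Value of Strengthening Political Communication in Gansu's Ethnic Minority Regions from the Perspective of Inclusive Growth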

Yue Bin, Zhang Kezheng 《Science, Economy, Society》2012,30(1):22-25,29

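Using political communication to establish a social development philosophy of fairness and justice, and on that basis building a social institutional system centred on equal rights, equal opportunity, fair rules and fair distribution, so as to continually remove the obstacles that different regions, social strata and groups face in participating in social development and sharing its fruits, is the foundation and precondition of inclusive growth. From the perspective of inclusive growth, strengthening political communication is of great significance for the economic and social development of Gansu's ethnic minority regions, which are home to many ethnic groups, show large natural and cultural differences between areas, and lag behind in modernization. Political communication work in these regions should start with education and learning, strengthen institution-building, use investigation and research as the lever, and be advanced in depth.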

20.

On Public Finance Measures to Promote Educational Equity (total citations: 1; self-citations: 0; citations by others: 1)

Wen Yongbin, Li Dongliang, Wang Zhiyong 《Science-Technology and Management》2008,10(2):135-137

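Educational equity is a key link in social equity, and inequity in education aggravates social inequity. This paper analyses the meaning of educational equity, some of its basic features and the current state of educational equity in China, and on this basis proposes corresponding measures to promote educational equity from the perspective of public finance.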