
1.

何湜程结海王世东王雅萍《青海科技》2020,27(1)

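Supervised classification of remote sensing images requires sufficient, accurately labeled data to train a classifier. Labeling, however, requires manual effort, and for many tasks supervision that meets the requirements cannot be obtained in time, which hampers image classification. Semi-supervised learning is a machine learning approach that trains a classifier on a small amount of labeled data together with a large amount of unlabeled data; it inherently reduces manual effort and improves efficiency. This paper introduces a semi-supervised method, squared-loss mutual information regularization (SMIR), for remote sensing image classification. Experimental results show that, with only a small labeled sample, SMIR can exploit both labeled and unlabeled data to build a multi-class classifier directly, and that its image classification results are better than those of the classic support vector machine (SVM) method.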

2.

Application of a data stream classifier algorithm to water quality environments

曹红郑鑫《科技通报》2014,(1)

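In many real-world applications, the nature of data streams makes it difficult to obtain class labels for all the data. To address the classification of data streams with incomplete class labels, this paper first analyzes how the labeled data set affects the classification error of semi-supervised classification algorithms based on the cluster assumption. Then, drawing on this error analysis and the characteristics of data streams, it proposes an ensemble algorithm, semi-supervised data stream ensemble classifiers under the cluster assumption (SSDSEC), and discusses how to set the weights of the individual classifiers. Finally, simulation experiments verify the effectiveness of the proposed algorithm.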

3.

A domain text feature extraction method based on graph learning

王煜卫莉莉《黑龙江科技信息》2012,(32):69+254

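Based on the characteristics of domain text data, a text vector space model is first built for the domain text samples, and a method combining term frequency with document frequency (DF) is used to narrow the candidate set of feature words. A graph-based semi-supervised learning algorithm then iteratively learns a semi-supervised classifier over a graph built from domain feature relatedness, using a small amount of labeled data to achieve better extraction of domain text feature information. The method was tested on corpora from several domains, including mechanical manufacturing, and analysis of the experimental results shows that it is feasible.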

4.

An efficient deep active learning classification method for steel plate surface defects


周友行孟高磊赵文杰易倩《中国科技信息》2022,36(2):23-31

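To address the need for large amounts of labeled data in conventional deep learning algorithms for classifying steel plate surface defect images, an efficient classification method based on active learning is proposed. The method consists of a lightweight convolutional neural network and an uncertainty-based active learning sample selection strategy. The network uses a simplified convolutional base for feature extraction and replaces the hidden layers of the conventional densely connected classifier with a global pooling layer to reduce overfitting. To better measure the model's uncertainty about the class of an unlabeled image sample, the unlabeled samples are first passed through a model trained on the labeled samples, yielding the model's probability distribution over classes (PDC) for each unlabeled sample; the same model is then used to predict the labeled samples, yielding the model's average PDC for each label. The KL divergence between the two distributions serves as the uncertainty measure for selecting unlabeled images for manual annotation. In comparative experiments on the open NEU-CLS defect dataset, the method reaches 97% accuracy with 44% of the labeled data, greatly reducing annotation cost.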
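The selection step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and two details are assumptions the abstract does not confirm, namely that each unlabeled sample is compared against the mean PDC of its predicted class and that a larger divergence is treated as higher uncertainty.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)

def mean_pdc_per_class(labeled_pdcs, labels, n_classes):
    """Average the model's PDC over the labeled samples of each class."""
    sums = [[0.0] * n_classes for _ in range(n_classes)]
    counts = [0] * n_classes
    for pdc, y in zip(labeled_pdcs, labels):
        counts[y] += 1
        for k, value in enumerate(pdc):
            sums[y][k] += value
    if 0 in counts:
        raise ValueError("every class needs at least one labeled sample")
    return [[value / counts[c] for value in sums[c]] for c in range(n_classes)]

def select_for_annotation(unlabeled_pdcs, class_means, budget):
    """Return indices of the `budget` unlabeled samples with the largest divergence.

    Assumption: each sample is compared with the mean PDC of its predicted class.
    """
    scored = []
    for index, pdc in enumerate(unlabeled_pdcs):
        predicted = max(range(len(pdc)), key=pdc.__getitem__)
        scored.append((kl_divergence(pdc, class_means[predicted]), index))
    scored.sort(reverse=True)
    return [index for _, index in scored[:budget]]

# Toy two-class example with made-up probabilities.
labeled = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
means = mean_pdc_per_class(labeled, [0, 0, 1, 1], n_classes=2)
unlabeled = [[0.88, 0.12], [0.55, 0.45], [0.10, 0.90]]
chosen = select_for_annotation(unlabeled, means, budget=1)
```

In this toy data the ambiguous sample `[0.55, 0.45]` is the one picked for manual labeling, since its distribution is farthest from the typical distribution of its predicted class.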

5.

A study of the effectiveness of ensemble learning

周济文志强林海龙《人天科学研究》2014,(6):199-201

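Ensemble learning is a class of learning algorithms that construct a set of classifiers and then use them to predict the class of new samples. The earliest ensemble method is Bayesian averaging; more recent algorithms include error-correcting output coding, Bagging, and Boosting. The paper explains why an ensemble of classifiers outperforms a single classifier and illustrates several research results on ensemble learning with experiments.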
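One common textbook argument for why an ensemble can beat a single classifier is the majority-vote calculation below. It is offered as an illustration only: the abstract does not say which explanation the paper gives, and the calculation assumes a binary task and classifiers whose errors are independent, which rarely holds exactly in practice. The function name is hypothetical.

```python
from math import comb

def majority_vote_accuracy(n_classifiers, p_correct):
    """Probability that more than half of n independent classifiers are correct.

    Assumes a binary task, an odd n, and identical per-classifier accuracy.
    """
    majority = n_classifiers // 2 + 1
    return sum(
        comb(n_classifiers, k)
        * p_correct ** k
        * (1 - p_correct) ** (n_classifiers - k)
        for k in range(majority, n_classifiers + 1)
    )

single = majority_vote_accuracy(1, 0.7)     # one classifier: just its own accuracy
ensemble = majority_vote_accuracy(21, 0.7)  # 21 independent voters
weak = majority_vote_accuracy(21, 0.4)      # voters worse than chance
```

Under these assumptions voting helps only when each member is better than chance; with members below 50% accuracy the majority vote is worse than any single member.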

6.

A web page classification technique based on incomplete data sets

蔡崇超《人天科学研究》2011,10(1):143-145

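Most common web page classification techniques are based on general text classification methods and do not fully account for what is specific to web pages: their semi-structured nature and the large amount of noise that interferes with classification. In addition, most web page classification work draws the test set and the training set from the same sample set, overlooking the possibility that the test set contains samples that belong to no known class. Based on the vector space model, this paper treats the sample set as consisting of samples with a class and samples without a class, takes the sample set from a single website, and, after removing web page noise, combines a text similarity algorithm with an optimal truncation method to propose LUD (Learning by Unlabeled Data), a web page classification technique based on incomplete data sets that improves classification performance and accuracy. Experiments show that, compared with traditional classification methods, LUD not only improves classification accuracy for samples of existing classes but, more importantly, provides a way to discover samples of new classes.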

7.

A study of an ensemble-learning-based semantic annotation method for dynamic Web pages

邱金鹏《科技通报》2019,35(10):133-136

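Traditional Web page semantic annotation methods require manual processing, or can only assign data to tags that have attributes and leave data without attribute tags unannotated; they are unsuited to large-scale Web page annotation, and their results are unreliable. A new ensemble-learning-based semantic annotation method for dynamic Web pages is therefore proposed. The annotation workflow for dynamic Web pages is given. The Web page is converted into a DOM tree and the text to be annotated is identified. Features of the extracted information and of the training Web pages are selected, content carrying semantic information is assigned to an ontology of abstracted concepts, and multi-classifier ensemble learning is used to classify whether the information to be annotated is an attribute label or a data element; the agreement among the predictions of the different classifiers measures the confidence that a sample is annotated correctly. Semantic annotation is carried out using the attribute annotation rule set covered by the training pages and the attribute names in the extracted information. Experimental results show that the proposed method is suitable for large-scale semantic annotation of dynamic Web pages and that its results are reliable.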

8.

Ensemble-classifier-based typing of X-ray images of early esophageal cancer in Xinjiang Kazakh patients

麦麦提·如则严传波木拉提·哈米提姚娟排孜丽耶·尤山塔依娜迪亚·阿卜杜迪克依木茹仙古丽·艾尔西丁《科技通报》2019,35(7):85-91

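Objective: To examine the classification ability of three ensemble classifiers, Bagging, Adaboost, and Random Forest (RF), in typing esophageal images of Xinjiang Kazakh patients. Methods: Esophageal X-ray images were preprocessed with Matlab image processing software, and image features were extracted from the preprocessed images using the gray-level co-occurrence matrix and Hu invariant moments. Principal component analysis was then used to screen and optimize the feature values, yielding features with stronger discriminative power. Finally, the three ensemble classifiers were applied in Weka to classify images of normal esophagus and early esophageal cancer, and the classification models were evaluated. Results: With the dimension-reduced gray-level co-occurrence matrix features, Bagging, Adaboost, and RF achieved classification accuracies of 82%, 94%, and 88% for normal esophagus and 94%, 88%, and 94% for early esophageal cancer. With the dimension-reduced Hu invariant moment features, the three classifiers achieved 60%, 64%, and 61% for normal esophagus and 57%, 68%, and 65% for early esophageal cancer. Conclusion: For classifying X-ray images of normal esophagus and early esophageal cancer, the three ensemble classifiers achieve markedly higher accuracy with gray-level co-occurrence matrix features than with Hu invariant moments. This indicates that the gray-level co-occurrence matrix combined with the three ensemble classifiers is better suited to distinguishing normal esophagus from early esophageal cancer in X-ray images.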

9.

An unsupervised sentiment classification method based on word2vec and self-training

陶娅芝《科技风》2019,(12)

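To address problems in existing sentiment classification algorithms, this paper proposes an unsupervised sentiment classification method based on word2vec and self-training. The method first builds a domain sentiment lexicon using word2vec and part-of-speech tags, and on that basis incorporates negation words and degree adverbs to compute a sentiment orientation score for each review. Next, reviews with strong sentiment orientation are selected as the labeled training set, and the remainder form the data set to be classified. Finally, a classifier is generated with a machine learning method and self-training proceeds until the iterations end. Experiments on mobile phone reviews confirm the effectiveness of the method.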
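The first two steps of this pipeline (lexicon-based scoring, then picking strongly polarized reviews as seed labels) might look roughly like the sketch below. Everything concrete here is an assumption for illustration: the tiny English lexicon, the rule that a negation word flips and a degree adverb scales the next sentiment word, and the threshold. The abstract does not give the paper's actual scoring formula, and the word2vec lexicon construction and the self-training loop are omitted.

```python
def sentiment_score(tokens, lexicon, negations, degrees):
    """Sum word polarities, letting negations flip and degree adverbs scale
    the next sentiment word (an assumed, simplified combination rule)."""
    score = 0.0
    negate = False
    weight = 1.0
    for token in tokens:
        if token in negations:
            negate = not negate
        elif token in degrees:
            weight *= degrees[token]
        elif token in lexicon:
            polarity = lexicon[token] * weight
            score += -polarity if negate else polarity
            negate, weight = False, 1.0  # modifiers apply to one sentiment word
    return score

def split_by_confidence(reviews, scorer, threshold):
    """Return (seed, remainder): seed holds (review, label) pairs whose
    absolute score reaches the threshold; label is 1 for positive, 0 otherwise."""
    seed, remainder = [], []
    for review in reviews:
        score = scorer(review)
        if abs(score) >= threshold:
            seed.append((review, 1 if score > 0 else 0))
        else:
            remainder.append(review)
    return seed, remainder

# Hypothetical lexicon and modifier lists, for illustration only.
LEXICON = {"good": 1.0, "bad": -1.0}
NEGATIONS = {"not"}
DEGREES = {"very": 2.0, "slightly": 0.5}

def score_review(text):
    return sentiment_score(text.split(), LEXICON, NEGATIONS, DEGREES)

reviews = ["very good", "not good", "slightly bad"]
seed, remainder = split_by_confidence(reviews, score_review, threshold=1.0)
```

The seed set would then train an initial classifier, whose confident predictions on the remainder are added back in each self-training round.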

10.

A study of the application of subagging to personal credit assessment

刘玉峰贺昌政《科技管理研究》2011,(19)

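Subagging, an improved variant of the bagging ensemble classification algorithm, is used to build a method aimed specifically at personal credit assessment, with the goal of better predictive classification. To address the shortcomings of a single classifier in personal credit assessment, an approach based on classifier ensembles is proposed. Using credit data from UCI, single classifiers, bagging ensembles, and subagging ensembles are compared experimentally. The results show that subagging with decision trees and subagging with k-nearest neighbors effectively improve model accuracy when samples are non-independent and imbalanced, and that they are better suited to helping commercial banks control consumer credit risk.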
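Subagging (subsample aggregating) differs from bagging in that each base learner is trained on a subsample drawn without replacement, usually smaller than the training set, rather than on a bootstrap sample. A minimal sketch follows; the one-dimensional nearest-neighbor base learner, the parameter names, and the toy data are placeholders chosen for brevity and are not the decision-tree or k-nearest-neighbor setups evaluated in the paper.

```python
import random
from collections import Counter

def nearest_neighbor_predict(train, x):
    """1-NN on a single numeric feature; `train` is a list of (feature, label)."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]

def subagging_predict(train, x, n_estimators=25, subsample_ratio=0.5, seed=0):
    """Majority vote over base learners fit on subsamples drawn WITHOUT replacement."""
    rng = random.Random(seed)
    size = max(1, int(len(train) * subsample_ratio))
    votes = Counter()
    for _ in range(n_estimators):
        subsample = rng.sample(train, size)  # sampling without replacement
        votes[nearest_neighbor_predict(subsample, x)] += 1
    return votes.most_common(1)[0][0]

# Toy data: label 0 clusters near 0, label 1 clusters near 1.
train = [(0.0, 0), (0.1, 0), (0.2, 0), (0.3, 0),
         (1.0, 1), (1.1, 1), (1.2, 1), (1.3, 1)]
low = subagging_predict(train, 0.05)
high = subagging_predict(train, 1.25)
```

Replacing `rng.sample` with `rng.choices(train, k=len(train))` would give ordinary bagging, which makes the two schemes easy to compare on the same data.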

11.

Class-aware tensor factorization for multi-relational classification

《Information processing & management》2020,57(2):102068

In this paper, we propose a tensor factorization method, called CLASS-RESCAL, which associates the class labels of data samples with their latent representations. Specifically, we extend RESCAL to produce a semi-supervised factorization method that combines a classification error term with the standard factor optimization process. CLASS-RESCAL assimilates information from all the relations of the tensor, while also taking into account classification performance. This procedure forces the data samples within the same class to have similar latent representations. Experimental results on several real-world social network data sets indicate that this is a promising approach for multi-relational classification tasks.

12.

Effect of ensemble classifier composition on offline cursive character recognition

Ashfaqur Rahman Brijesh Verma 《Information processing & management》2013

In this paper we present novel ensemble classifier architectures and investigate their influence on offline cursive character recognition. Cursive characters are represented by feature sets that portray different aspects of character images for recognition purposes. The recognition accuracy can be improved by training an ensemble of classifiers on the feature sets. Given the feature sets and the base classifiers, we have developed multiple ensemble classifier compositions under four architectures. The first three architectures are based on the use of multiple feature sets, whereas the fourth architecture is based on the use of a unique feature set. Type-1 architecture is composed of homogeneous base classifiers and Type-2 architecture is constructed using heterogeneous base classifiers. Type-3 architecture is based on hierarchical fusion of decisions. In Type-4 architecture a unique feature set is learned by a set of homogeneous base classifiers with different learning parameters. The experimental results demonstrate that the recognition accuracy achieved using the proposed ensemble classifier (with the best composition of base classifiers and feature sets) is better than the existing recognition accuracies for offline cursive character recognition.

13.

Search task success evaluation by exploiting multi-view active semi-supervised learning

《Information processing & management》2020,57(2):102180

Search task success rate is an important indicator of search engine performance. In contrast to most previous approaches, which rely on labeled search tasks provided by users or third-party editors, this paper attempts to improve search task success evaluation by exploiting the unlabeled search tasks that exist in search logs together with a small number of labeled ones. Concretely, the Multi-view Active Semi-Supervised Search task Success Evaluation (MA4SE) approach is proposed, which exploits labeled data and unlabeled data by integrating the advantages of both semi-supervised learning and active learning with the multi-view mechanism. In the semi-supervised learning part of MA4SE, we employ a multi-view semi-supervised learning approach that utilizes different parameter configurations to achieve disagreement between base classifiers. The base classifiers are trained separately from the pre-defined action and time views. In the active learning part of MA4SE, each classifier obtained from semi-supervised learning is applied to unlabeled search tasks, and the search tasks that need to be manually annotated are selected based on both the degree of disagreement between base classifiers and a regional density measurement. We evaluate the proposed approach on open datasets with two different definitions of search task success. The experimental results show that MA4SE outperforms the state-of-the-art semi-supervised search task success evaluation approach.

14.

Rank-based self-training for graph convolutional networks

Daniel Carlos Guimarães Pedronette Longin Jan Latecki 《Information processing & management》2021,58(2):102443

Graph Convolutional Networks (GCNs) have been established as a fundamental approach for representation learning on graphs, based on convolution operations on non-Euclidean domains defined by graph-structured data. GCNs and their variants have achieved state-of-the-art results on classification tasks, especially in semi-supervised learning scenarios. A central challenge in semi-supervised classification is how to exploit as much as possible of the useful information encoded in the unlabeled data. In this paper, we address this issue through a novel self-training approach for improving the accuracy of GCNs on semi-supervised classification tasks. A margin score is used through a rank-based model to identify the most confident sample predictions. Such predictions are exploited as an expanded labeled set in a second-stage training step. Our model is suitable for different GCN models. Moreover, we also propose a rank aggregation of labeled sets obtained by different GCN models. The experimental evaluation considers four GCN variations and traditional benchmarks extensively used in the literature. Significant accuracy gains were achieved for all evaluated models, reaching results comparable or superior to the state-of-the-art. The best results were achieved for rank aggregation self-training on combinations of the four GCN models.
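The idea of ranking predictions by a margin score and promoting the top ones to pseudo-labels can be sketched as below. The margin is taken here as the gap between the two largest class probabilities, which is a common definition but an assumption with respect to this paper; the rank-based model and the rank aggregation across GCNs described in the abstract are not reproduced, and the function names are hypothetical.

```python
def margin_score(probs):
    """Gap between the largest and second-largest class probability."""
    top1, top2 = sorted(probs, reverse=True)[:2]
    return top1 - top2

def select_pseudo_labels(predictions, k):
    """Return the k most confident (node, predicted_class) pairs.

    `predictions` maps a node id to its class-probability list, for example
    the softmax output of a first-stage model on unlabeled nodes.
    """
    ranked = sorted(predictions,
                    key=lambda node: margin_score(predictions[node]),
                    reverse=True)
    selected = []
    for node in ranked[:k]:
        probs = predictions[node]
        label = max(range(len(probs)), key=probs.__getitem__)
        selected.append((node, label))
    return selected

# Made-up first-stage predictions for four unlabeled nodes.
predictions = {
    "a": [0.50, 0.40, 0.10],
    "b": [0.05, 0.90, 0.05],
    "c": [0.34, 0.33, 0.33],
    "d": [0.70, 0.10, 0.20],
}
expanded = select_pseudo_labels(predictions, k=2)
```

The selected pairs would be appended to the labeled set before the second-stage training run.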

15.

RGAN-EL: A GAN and ensemble learning-based hybrid approach for imbalanced data classification

《Information processing & management》2023,60(2):103235

Imbalanced sample distribution is usually the main reason for the performance degradation of machine learning algorithms. Based on this, this study proposes a hybrid framework (RGAN-EL) combining generative adversarial networks and ensemble learning to improve the classification performance on imbalanced data. Firstly, we propose a training sample selection strategy based on the roulette wheel selection method to make the GAN pay more attention to the class overlapping area when fitting the sample distribution. Secondly, we design two kinds of generator training loss and propose a noise sample filtering method to improve the quality of generated samples. Then, minority class samples are oversampled using the improved RGAN to obtain a balanced training sample set. Finally, combined with the ensemble learning strategy, the final training and prediction are carried out. We conducted experiments on 41 real imbalanced data sets using two evaluation indexes: F1-score and AUC. Specifically, we compare RGAN-EL with six typical ensemble learning methods, and RGAN with three typical GAN models. The experimental results show that RGAN-EL is significantly better than the other six ensemble learning methods, and RGAN is greatly improved compared with three classical GAN models.

16.

A distance-based weighting framework for boosting the performance of dynamic ensemble selection

《Information processing & management》2019,56(4):1300-1316

The Dynamic Ensemble Selection (DES) strategy is one of the most common and effective techniques in machine learning for dealing with classification problems. DES systems aim to construct an ensemble consisting of the most appropriate classifiers selected from the candidate classifier pool according to the competence level of each individual classifier. Since several classifiers are selected, their combination becomes crucial. However, most current DES approaches focus on the combination of the selected classifiers while ignoring the local information surrounding the query sample to be classified. In order to boost the performance of DES-based classification systems, in this paper we propose a dynamic weighting framework for classifier fusion when obtaining the final output of a DES system. In particular, the proposed method first employs a DES approach to obtain a group of classifiers for a query sample. Then, the hypothesis vector of the selected ensemble is obtained based on the analysis of consensus. Finally, a distance-based weighting scheme is developed to adjust the hypothesis vector depending on the closeness of the query sample to each class. The proposed method is tested on 30 real-world datasets with six well-known DES approaches based on both homogeneous and heterogeneous ensembles. The obtained results, supported by proper statistical tests, show that our method outperforms the original DES framework in terms of both accuracy and kappa measures.

17.

A qualitatively analyzable two-stage ensemble model based on machine learning for credit risk early warning: Evidence from Chinese manufacturing companies

《Information processing & management》2023,60(3):103267

Constructing ensemble models has become a common method for corporate credit risk early warning. For deep learning models, which have better predictive ability, no fixed theoretical model has yet emerged in corporate credit risk early warning, as such models often fail to support further qualitative analysis of the results. This article therefore builds a new two-stage ensemble model using a variety of machine learning methods, represented by deep learning, for corporate credit risk early warning; the model can not only effectively improve prediction performance but also qualitatively analyze the source of corporate credit risk from multiple angles according to the results. In the first stage, the improved entropy method is used to re-assign the instance weight in correlation degree based on grey correlation analysis. In the second stage, this study adopts the Bagging method to integrate multiple one-dimensional convolutional neural networks and borrows the idea of N-fold cross validation to widen the differences among the base classifiers. Empirically, this article selects listed companies in the Chinese manufacturing industry between 2012 and 2021 as the dataset, comprising 467 samples with 51 financial indicators. The new ensemble model has the highest F1-score (87.29%) and G-mean (89.47%) among the compared models, and qualitatively analyzes corporate risk sources. Further, it also analyzes how to improve the early warning effect in terms of the number of indicators and the time span.

18.

Extraction of category orthonormal subspace for multi-class classification

Hao Su Zhiping Lin Lei Sun 《Journal of The Franklin Institute》2021,358(9):5089-5112

Extraction of a discriminative subspace associated with pattern classes is critical to many pattern classification problems. Traditionally, pattern class labels are regarded as indicators to discriminate between pattern classes. In this work, a novel indicator model is proposed to extract a discriminant subspace by projecting samples onto a space where the projected categories are mutually orthogonal and in-category normalized. The category orthonormal property and its connections to discriminative subspace extraction are derived. It is shown that the proposed method has a strong connection with the existing Fukunaga-Koontz Transformation but extends the category number from two to multiple. For applications with a large dimension size but a limited number of samples, an analytic least-norm solver is developed for calculating the projection function. A discriminative subspace extraction method for multiple classes is proposed and is evaluated in combination with classifiers. Experiments demonstrate promising results from using the extracted category orthonormal subspace for multi-class subspace extraction when the number of samples is small.

19.

A novel reasoning mechanism for multi-label text classification

Ran Wang Robert Ridley Xi’ao Su Weiguang Qu Xinyu Dai 《Information processing & management》2021,58(2):102441

The aim in multi-label text classification is to assign a set of labels to a given document. Previous classifier-chain and sequence-to-sequence models have been shown to have a powerful ability to capture label correlations. However, they rely heavily on the label order, while labels in multi-label data are essentially an unordered set. The performance of these approaches is therefore highly variable depending on the order in which the labels are arranged. To avoid being dependent on label order, we design a reasoning-based algorithm named Multi-Label Reasoner (ML-Reasoner) for multi-label classification. ML-Reasoner employs a binary classifier to predict all labels simultaneously and applies a novel iterative reasoning mechanism to effectively utilize the inter-label information, where each instance of reasoning takes the previously predicted likelihoods for all labels as additional input. This approach is able to utilize information between labels, while avoiding the issue of label-order sensitivity. Extensive experiments demonstrate that our method outperforms state-of-the-art approaches on the challenging AAPD dataset. We also apply our reasoning module to a variety of strong neural-based base models and show that it is able to boost performance significantly in each case.

20.

Using semantic similarity to reduce wrong labels in distant supervision for relation extraction

Chengsen Ru Jintao Tang Shasha Li Songxian Xie Ting Wang 《Information processing & management》2018,54(4):593-608

Distant supervision (DS) has the advantage of automatically generating large amounts of labelled training data and has been widely used for relation extraction. However, there are usually many wrong labels in the automatically labelled data in distant supervision (Riedel, Yao, & McCallum, 2010). This paper presents a novel method to reduce the wrong labels. The proposed method uses the semantic Jaccard with word embedding to measure the semantic similarity between the relation phrase in the knowledge base and the dependency phrases between two entities in a sentence, in order to filter the wrong labels. In the process of reducing wrong labels, the semantic Jaccard algorithm selects a core dependency phrase to represent the candidate relation in a sentence, which can capture features for relation classification and avoid the negative impact of irrelevant term sequences from which previous neural network models of relation extraction often suffer. In the process of relation classification, the core dependency phrases are also used as the input of a convolutional neural network (CNN). The experimental results show that, compared with the methods using original DS data, the methods using filtered DS data performed much better in relation extraction. This indicates that the semantic-similarity-based method is effective in reducing wrong labels. The relation extraction performance of the CNN model using the core dependency phrases as input is the best of all, which indicates that using the core dependency phrases as input of the CNN is enough to capture the features for relation classification and could avoid the negative impact of irrelevant terms.