
1.

何湜, 程结海, 王世东, 王雅萍《青海科技》2020,27(1)

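Supervised classification of remote sensing images requires sufficient, accurately labeled data to train the classifier. Labeling, however, requires manual effort, and for many tasks supervision that meets the requirements cannot be obtained in time, which hampers image classification. Semi-supervised learning is a machine learning approach that trains a classifier on a small amount of labeled data together with a large amount of unlabeled data; by design it reduces manual effort and improves efficiency. This paper introduces a semi-supervised method, squared-loss mutual information regularization (SMIR), for remote sensing image classification. Experimental results show that, with only a small amount of labeled data, SMIR can exploit both labeled and unlabeled data to build a multi-class classifier directly, and that its image classification results are better than those of the classical support vector machine (SVM).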

2.

A GRU+CRF Method for Entity-Attribute Extraction

王仁武, 孟现茹, 孔琦《现代情报》2018,38(10):57-64

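[Purpose/Significance] This study uses a GRU recurrent neural network combined with a conditional random field (CRF) to make predictions over annotated Chinese text sequences, in order to extract entity-attribute pairs from online review text. [Method/Process] First, following a predefined sequence annotation specification, the review corpus is word-segmented and entities and their attributes are tagged in the manner of named entity annotation, yielding word sequences, part-of-speech sequences, and tag sequences. The word and part-of-speech sequences are then converted into distributed vector representations and fed to the GRU network. Finally, a CRF output layer emits labels that mark each token as entity or attribute. [Results/Conclusions] Experiments show that the method reduces entity-attribute extraction to named entity tagging, uses the GRU to capture the contextual semantics of the input and the CRF to capture dependencies between adjacent output labels, and offers a clear practical advantage over traditional rule-based or conventional machine learning methods.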
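To illustrate how a CRF output layer can take the relationship between adjacent labels into account, here is a minimal Viterbi decoding sketch. It is not the authors' implementation: the label set, the per-token scores (standing in for GRU outputs), and the transition scores are all made-up values chosen for illustration.

```python
def viterbi(emissions, transitions, labels):
    """Return the highest-scoring label sequence.

    emissions: one dict per token mapping label -> score (assumed to come
               from an upstream network such as a GRU)
    transitions: dict mapping (previous_label, label) -> score
    """
    # best[i][y] is the best score of any path that ends in label y at token i
    best = [{y: emissions[0][y] for y in labels}]
    back = []
    for scores in emissions[1:]:
        row, ptr = {}, {}
        for y in labels:
            prev = max(labels, key=lambda p: best[-1][p] + transitions[(p, y)])
            row[y] = best[-1][prev] + transitions[(prev, y)] + scores[y]
            ptr[y] = prev
        best.append(row)
        back.append(ptr)
    # Follow the back-pointers from the best final label
    path = [max(labels, key=lambda y: best[-1][y])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


labels = ["O", "ENT", "ATTR"]  # hypothetical tag set
transitions = {(p, y): 0.0 for p in labels for y in labels}
transitions[("ENT", "ENT")] = -1.0   # illustrative: discourage ENT after ENT
transitions[("ENT", "ATTR")] = 1.0   # illustrative: favor ATTR after ENT
emissions = [
    {"O": 0.5, "ENT": 2.0, "ATTR": 0.0},
    {"O": 0.0, "ENT": 1.0, "ATTR": 0.9},
]
path = viterbi(emissions, transitions, labels)
```

Choosing the best label per token independently would tag the second token as `ENT`; with the transition scores included, the decoded path is `ENT` followed by `ATTR`.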

3.

A Dynamic Web Page Semantic Annotation Method Based on Ensemble Learning

邱金鹏《科技通报》2019,35(10):133-136

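Traditional Web page semantic annotation methods either require manual processing or can only assign data to tags that carry attributes, leaving data under attribute-less tags unannotated; they are unsuitable for large-scale Web page annotation and their results are unreliable. To address this, a new dynamic Web page semantic annotation method based on ensemble learning is proposed. The annotation workflow for dynamic Web pages is given. Web pages are converted into DOM trees and the text to be annotated is identified. Features of the extracted information and of the training Web pages are selected, content carrying semantic information is assigned to an abstracted conceptual ontology, and multi-classifier ensemble learning is used to decide whether each item to be annotated is an attribute label or a data element. The consistency of the different classifiers' predictions measures the confidence that a sample has been annotated correctly. Semantic annotation is then carried out using the attribute annotation rule set covered by the training pages together with the attribute names in the extracted information. Experimental results show that the proposed method is suitable for large-scale dynamic Web page semantic annotation and yields reliable annotations.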
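The idea of measuring confidence by the consistency of several classifiers' predictions can be sketched as a majority vote whose confidence is the share of agreeing voters. This is only one plausible reading of the abstract; the function name and the label strings are hypothetical.

```python
from collections import Counter


def vote_with_confidence(predictions):
    """Majority-vote label plus the fraction of classifiers that agree with it.

    predictions: one predicted label per base classifier.
    """
    counts = Counter(predictions)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(predictions)


# Three hypothetical base classifiers judge one text node of the DOM tree
label, confidence = vote_with_confidence(["attribute", "attribute", "data"])
```

A sample on which all classifiers agree gets confidence 1.0; lower values flag annotations that may deserve review.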

4.

An Ensemble Model for Customer Credit Evaluation Based on Semi-Supervised Learning

黄静, 薛书田, 肖进《软科学》2017,(7):131-134

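Semi-supervised learning is combined with the Bagging multi-classifier ensemble model to build a Bagging-based semi-supervised ensemble model for imbalanced class distributions (SSEBI), which uses both labeled and unlabeled samples to improve model performance. The model has three stages: (1) selectively label some samples from the unlabeled dataset and train several base classifiers; (2) classify the test samples with the trained base classifiers; (3) combine the classification results to obtain the final result. An empirical analysis on five customer credit evaluation datasets demonstrates the effectiveness of the proposed SSEBI model.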

5.

Water Quality Evaluation Based on LM Neural Network Recognition of Water Color Images

许新华《黑龙江科技信息》2019,(6)

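The collected images are first cropped, and the color moment features of the water images are computed to serve as samples. An LM neural network model is then built to classify and evaluate water quality and is applied to aquaculture water quality samples. The algorithm is simple and easy to implement, and the results are reasonably reliable.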
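As a sketch of the feature step, the first three color moments of one image channel can be computed as below, using the common definition (mean, standard deviation, and the signed cube root of the third central moment). The abstract does not state which moments the author used, so the definition and the pixel values are assumptions.

```python
import math


def color_moments(channel):
    """First three color moments of a flat list of pixel values."""
    n = len(channel)
    mean = sum(channel) / n
    variance = sum((v - mean) ** 2 for v in channel) / n
    third = sum((v - mean) ** 3 for v in channel) / n
    std = math.sqrt(variance)
    # Signed cube root keeps the direction of the skew
    skew = math.copysign(abs(third) ** (1 / 3), third)
    return mean, std, skew


# Toy three-pixel channel; a real image would supply one list per channel
mean, std, skew = color_moments([0, 0, 3])
```

Applying the function to each of the three color channels gives a nine-value feature vector per image, which could then be fed to a classifier.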

6.

A Comparative Study of the Annotation Effects of Three Image Semantic Annotation Modes

陆泉, 韩阳, 陈静, 赵雅琴《情报理论与实践》2015,(3):122-127

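User image annotation results were collected experimentally to compare the annotation effects of three image semantic annotation modes: annotation based on tag scoring, annotation based on image comparison under a single tag, and annotation based on image comparison under multiple tags. The study finds that: (1) the tag-scoring mode and the single-tag image-comparison mode help users annotate the semantic strength of each tag of an image effectively; (2) the multi-tag image-comparison mode helps users annotate the proportional relationship among the semantic strengths of an image's tags effectively; (3) whether the annotation interface displays all of an image's tags at once may affect users' judgment of the proportional relationship among the image's semantic strengths across tags.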

7.

Domain Adaptation for Machine Translation Combining Domain Knowledge and Deep Learning

丁亮, 何彦青《情报科学》2017,35(10):125-132

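[Purpose/Significance] In both statistical and neural machine translation, training data usually come from complex sources with varied topics and styles and cannot be guaranteed to match the domain of the text to be translated, which gives rise to the domain adaptation problem. Most current domain adaptation methods for machine translation obtain topic information from topic models and coarsely split the data into in-domain and out-domain, without more explicit domain labels. [Method/Process] This study uses Chinese Library Classification numbers as domain labels and labels Chinese sentences with domains automatically in two ways: a labeling method based on a domain knowledge base built from knowledge organization resources such as paper keywords and a scientific and technical vocabulary system, and a deep learning labeling method that trains a convolutional neural network. A deep neural fusion model combines the two into a more effective domain labeler, and the set of domain labels obtained from the machine translation test set is used to filter the training data. [Results/Conclusions] In tests on a neural machine translation system with two domain-specific test sets, using only part of the training data yields translations that score about 1.3 BLEU higher (5.4% relative) than those obtained with the original training data, demonstrating the effectiveness and feasibility of the method.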
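As a rough consistency check on the two reported figures, and assuming the 5.4% is measured relative to the score obtained with the original training data (the abstract does not say so explicitly), the implied absolute scores can be back-calculated. The symbols $B_{\text{orig}}$ and $B_{\text{filt}}$ are introduced here for illustration only.

```latex
\frac{1.3}{B_{\text{orig}}} \approx 0.054
\;\Longrightarrow\;
B_{\text{orig}} \approx \frac{1.3}{0.054} \approx 24.1,
\qquad
B_{\text{filt}} \approx B_{\text{orig}} + 1.3 \approx 25.4
```

These are implied approximations, not values reported in the abstract.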

8.

Face Recognition Based on PCA and Neural Networks

唐赫《人天科学研究》2013,(6):33-34

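In MATLAB, part of the ORL face database is taken as the face sample set. PCA is used to extract facial features and form the eigenface space, and each face sample is projected onto this space to obtain a vector of projection coefficients that represents the sample in a low-dimensional space; these vectors form the training set. Another part of the ORL database is processed in the same way to obtain the test set. Classification is first performed with the nearest-neighbor algorithm to obtain a recognition rate, then a BP neural network is used for face recognition, and finally the faces to be recognized are classified by a combined decision based on the neural network and nearest-neighbor algorithms.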
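The projection and nearest-neighbor steps can be sketched in a few lines. The sketch assumes the mean face and the principal components have already been computed; the three-pixel "faces", the two components, and the person labels are toy values, not data from the paper, and the paper itself works in MATLAB rather than Python.

```python
def project(sample, mean, components):
    """Projection coefficients of a sample onto the given components."""
    centered = [s - m for s, m in zip(sample, mean)]
    return [sum(c * x for c, x in zip(comp, centered)) for comp in components]


def nearest_neighbor(query, train_vectors, train_labels):
    """Label of the training vector closest to the query (Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    idx = min(range(len(train_vectors)),
              key=lambda i: sq_dist(query, train_vectors[i]))
    return train_labels[idx]


mean_face = [1.0, 1.0, 1.0]                       # assumed precomputed
components = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # assumed orthonormal basis
train = [project(f, mean_face, components)
         for f in ([3.0, 1.0, 0.0], [1.0, 4.0, 2.0])]
labels = ["person_a", "person_b"]
query = project([2.5, 1.5, 5.0], mean_face, components)
predicted = nearest_neighbor(query, train, labels)
```

The third pixel differs strongly between the query and the first training face, but it lies outside the retained components and so does not affect the match.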

9.

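BP Neural Network Extraction of Wetland Cover Information Based on Tolerance Rough Sets: A Case Study of the Shuangtaizi Estuary Wetland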

周林飞, 姚雪, 芦晓峰《资源科学》2016,38(8):1538-1549

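BP neural networks are widely used in remote sensing image classification because of their self-learning, adaptive, and massively parallel processing capabilities, but training easily falls into local minima and converges slowly. To address these shortcomings, a BP neural network classification method based on tolerance rough sets is proposed. Taking the Shuangtaizi Estuary wetland as the study area and Landsat-8 OLI imagery as the data source, the sample dataset is preprocessed with tolerance rough set theory, the resulting data are used as new training samples, and a BP neural network wetland cover classification model is built in Matlab to extract wetland cover information. The classification results are compared with those of a plain BP neural network and of preprocessing by rough set attribute reduction of the samples. The results show that the BP neural network based on tolerance rough sets can remove noisy data from the training samples, raise the training success rate, and shorten convergence time. It classifies well, with an overall accuracy of 91.25% and a Kappa coefficient of 0.8969, which are 7.92% and 0.0926 higher than the plain BP neural network and 3.03% and 0.0357 higher than the rough set attribute reduction preprocessing method, making it an effective wetland cover classification method.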
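Overall accuracy and the Kappa coefficient are both derived from a confusion matrix. The sketch below shows the standard computation on a made-up two-class matrix; the numbers are not the paper's data.

```python
def overall_accuracy_and_kappa(cm):
    """cm[i][j] = number of samples of true class i predicted as class j."""
    total = sum(sum(row) for row in cm)
    k = len(cm)
    observed = sum(cm[i][i] for i in range(k)) / total
    # Chance agreement: sum over classes of (row total * column total) / total^2
    expected = sum(
        sum(cm[i]) * sum(row[i] for row in cm) for i in range(k)
    ) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical confusion matrix for two cover classes
accuracy, kappa = overall_accuracy_and_kappa([[45, 5], [10, 40]])
```

Here the accuracy is 0.85 and Kappa is 0.70: Kappa is lower because it discounts the 0.50 agreement expected by chance.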

10.

User Interest Discovery and Personalized Recommendation Based on Social Tagging

王晓耘, 赵菁, 徐作宁《现代情报》2018,38(7):67

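Tag-based personalized recommendation is increasingly common, but the semantic ambiguity and temporal dynamics of tags degrade recommendation quality, and existing research considers the relationship between users and tags only in terms of quantity and structure. The proposed personalized recommendation approach for social tagging systems first mines latent semantic topics from tags fused with social relations, then builds a multi-layer, multi-dimensional user interest model and proposes a model update strategy, and finally produces personalized recommendations. Experiments on data collected from the CiteUlike site show that the improved algorithm expresses user interest preferences more accurately than the traditional algorithm and effectively improves recommendation accuracy.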

11.

A Survey of Automatic Image Annotation Methods

徐勇, 张慧《现代情报》2016,36(3):144-150

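With the steady development of Web 2.0, massive amounts of user-generated images fill the major online platforms, and automatic image annotation has become one of the key problems in image retrieval and image understanding. This paper collects and organizes the literature on existing automatic image annotation methods, reviews research progress on the basis of comparing and analyzing the underlying theory and implementation techniques of each method, and summarizes their strengths and weaknesses. It concludes that automatic image annotation methods and image processing techniques still need further research and improvement from the machine learning side, and that the work can be extended from annotating image information to annotating video information.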

12.

Self-supervised learning for medieval handwriting identification: A case study from the Vatican Apostolic Library

《Information processing & management》2022,59(3):102875

13.

Semisupervised SAR image change detection based on a siamese variational autoencoder

《Information processing & management》2022,59(1):102726

In synthetic aperture radar (SAR) image change detection, deep learning has attracted increasing attention because the difference images (DIs) of traditional unsupervised techniques are vulnerable to speckle noise. However, most existing deep networks do not constrain the distributional characteristics of the hidden space, which may affect feature representation performance. This paper proposes a variational autoencoder (VAE) network with a siamese structure to detect changes in SAR images. The VAE encodes the input as a probability distribution in the hidden space to obtain regular hidden layer features with good representation ability. Furthermore, subnetworks with the same parameters and structure can extract the spatial consistency features of the original image, which is conducive to the subsequent classification. The proposed method includes three main steps. First, the training samples are selected based on the false labels generated by a clustering algorithm. Then, the proposed model is trained with a semisupervised learning strategy that includes unsupervised feature learning and supervised network fine-tuning. Finally, the original data, rather than the DIs, are fed into the trained network to obtain the change detection results. The experimental results on four real SAR datasets show the effectiveness and robustness of the proposed method.
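In a standard VAE, the constraint on the hidden-space distribution comes from a KL divergence term between the encoded Gaussian and a standard normal prior. The sketch below computes that closed-form term; it assumes a diagonal Gaussian encoder and a standard normal prior, which the abstract does not confirm for this particular network.

```python
import math


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) in closed form."""
    return 0.5 * sum(
        m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var)
    )


# An encoding that already matches the prior incurs no penalty
kl_zero = kl_to_standard_normal([0.0, 0.0], [0.0, 0.0])
# Shifting one mean by 1 adds 0.5 to the penalty
kl_shifted = kl_to_standard_normal([1.0, 0.0], [0.0, 0.0])
```

Adding this term to the reconstruction loss pulls the encoded distributions toward the prior, which is what makes the hidden features "regular".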

14.

RoSAS: Deep semi-supervised anomaly detection with contamination-resilient continuous supervision

《Information processing & management》2023,60(5):103459

Semi-supervised anomaly detection methods leverage a few anomaly examples to yield drastically improved performance compared to unsupervised models. However, they still suffer from two limitations: 1) unlabeled anomalies (i.e., anomaly contamination) may mislead the learning process when all the unlabeled data are employed as inliers for model training; 2) only discrete supervision information (such as binary or ordinal data labels) is exploited, which leads to suboptimal learning of anomaly scores that essentially take on a continuous distribution. Therefore, this paper proposes a novel semi-supervised anomaly detection method, which devises contamination-resilient continuous supervisory signals. Specifically, we propose a mass interpolation method to diffuse the abnormality of labeled anomalies, thereby creating new data samples labeled with continuous abnormal degrees. Meanwhile, the contaminated area can be covered by new data samples generated via combinations of data with correct labels. A feature learning-based objective is added to serve as an optimization constraint to regularize the network and further enhance the robustness w.r.t. anomaly contamination. Extensive experiments on 11 real-world datasets show that our approach significantly outperforms state-of-the-art competitors by 20%–30% in AUC-PR and obtains more robust and superior performance in settings with different anomaly contamination levels and varying numbers of labeled anomalies.
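One simple way to obtain samples with continuous abnormality labels is a convex combination of two samples and their labels, in the style of mixup. The sketch below uses that stand-in technique; the paper's mass interpolation may differ in how pairs and weights are chosen, and the feature values are invented.

```python
def interpolate(x_a, y_a, x_b, y_b, lam):
    """Convex combination of two samples and of their abnormality labels."""
    x = [lam * a + (1 - lam) * b for a, b in zip(x_a, x_b)]
    y = lam * y_a + (1 - lam) * y_b
    return x, y


labeled_anomaly = ([4.0, 2.0], 1.0)   # label 1.0 = known anomaly
unlabeled_point = ([0.0, 0.0], 0.0)   # treated as normal (label 0.0)
new_x, new_y = interpolate(*labeled_anomaly, *unlabeled_point, lam=0.25)
```

The new sample sits a quarter of the way toward the anomaly and carries the label 0.25, so the supervision is no longer restricted to 0 or 1.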

15.

Semi-supervised document retrieval (total citations: 2; self-citations: 0; citations by others: 2)

Ming Li, Hang Li, Zhi-Hua Zhou《Information processing & management》2009

This paper proposes a new machine learning method for constructing ranking models in document retrieval. The method, which is referred to as SSRank, aims to use the advantages of both the traditional Information Retrieval (IR) methods and the supervised learning methods for IR proposed recently. The advantages include the use of a limited amount of labeled data and rich model representation. To do so, the method adopts a semi-supervised learning framework in ranking model construction. Specifically, given a small number of labeled documents with respect to some queries, the method effectively labels the unlabeled documents for the queries. It then uses all the labeled data to train a machine learning model (in our case, Neural Network). In the data labeling, the method also makes use of a traditional IR model (in our case, BM25). A stopping criterion based on machine learning theory is given for the data labeling process. Experimental results on three benchmark datasets and one web search dataset indicate that SSRank consistently and almost always significantly outperforms the baseline methods (unsupervised and supervised learning methods), given the same amount of labeled data. This is because SSRank can effectively leverage unlabeled data in learning.
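BM25, the traditional IR model named in the abstract, scores a document by summing a saturating term-frequency weight times an inverse document frequency for each query term. The sketch below uses the common `k1`/`b` parameterization with the non-negative IDF variant; the parameter values and the toy corpus are assumptions, not settings from the paper.

```python
import math


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """BM25 score of every tokenized document for a tokenized query."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    scores = []
    for doc in docs:
        score = 0.0
        for term in query:
            df = sum(1 for d in docs if term in d)
            if df == 0:
                continue  # term absent from the corpus contributes nothing
            idf = math.log((n - df + 0.5) / (df + 0.5) + 1)
            tf = doc.count(term)
            norm = tf + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    ["semi", "supervised", "ranking"],
    ["neural", "network"],
    ["ranking", "ranking", "model"],
]
scores = bm25_scores(["ranking"], docs)
```

Scores like these could serve to propose labels for unlabeled query-document pairs before a learned ranker is trained on them.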

16.

Search task success evaluation by exploiting multi-view active semi-supervised learning

《Information processing & management》2020,57(2):102180

Search task success rate is an important indicator to measure the performance of search engines. In contrast to most of the previous approaches that rely on labeled search tasks provided by users or third-party editors, this paper attempts to improve the performance of search task success evaluation by exploiting unlabeled search tasks that exist in search logs as well as a small amount of labeled ones. Concretely, the Multi-view Active Semi-Supervised Search task Success Evaluation (MA4SE) approach is proposed, which exploits labeled data and unlabeled data by integrating the advantages of both semi-supervised learning and active learning with the multi-view mechanism. In the semi-supervised learning part of MA4SE, we employ a multi-view semi-supervised learning approach that utilizes different parameter configurations to achieve the disagreement between base classifiers. The base classifiers are trained separately from the pre-defined action and time views. In the active learning part of MA4SE, each classifier received from semi-supervised learning is applied to unlabeled search tasks, and the search tasks that need to be manually annotated are selected based on both the degree of disagreement between base classifiers and a regional density measurement. We evaluate the proposed approach on open datasets with two different definitions of search task success. The experimental results show that MA4SE outperforms the state-of-the-art semi-supervised search task success evaluation approach.
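The selection step of the active learning part can be sketched by scoring each unlabeled task with classifier disagreement times a density value and sending the top-scoring tasks to annotators. Vote entropy as the disagreement measure and the product as the combination rule are assumptions made for illustration; the abstract names the two ingredients but not how they are combined.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Entropy of the label distribution among the base classifiers' votes."""
    n = len(votes)
    return -sum((c / n) * math.log(c / n) for c in Counter(votes).values())


def select_for_annotation(votes_per_task, densities, k):
    """Indices of the k tasks with the highest disagreement * density."""
    scores = [vote_entropy(v) * d for v, d in zip(votes_per_task, densities)]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Two hypothetical views (action, time) vote on three unlabeled search tasks
votes = [["success", "success"], ["success", "fail"], ["fail", "success"]]
densities = [0.9, 0.2, 0.8]   # hypothetical regional density per task
chosen = select_for_annotation(votes, densities, k=1)
```

Task 0 is skipped because the views agree; of the two contested tasks, the one in the denser region is chosen, since labeling it is likely to inform more neighbors.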

17.

Ensemble transfer learning-based multimodal sentiment analysis using weighted convolutional neural networks

《Information processing & management》2022,59(3):102929

18.

Deep gesture interaction for augmented anatomy learning

《International Journal of Information Management》2019

Augmented reality is very useful in medical education because it is difficult to provide body organs in a regular classroom. In this paper, we propose to apply augmented reality to improve the way of teaching in medical schools and institutes. We propose a novel convolutional neural network (CNN) for gesture recognition, which recognizes a human's gestures as certain instructions. We use augmented reality technology for anatomy learning, which simulates scenarios where students can learn anatomy with HoloLens instead of rare specimens. We have used mesh reconstruction to reconstruct the 3D specimens. A user interface featuring augmented reality has been designed which fits the common process of anatomy learning. To improve the interaction services, we have applied gestures as an input source and improved the accuracy of gesture recognition with an updated deep convolutional neural network. Our proposed learning method includes many separate training procedures using cloud computing. Each training model and its related inputs are sent to our cloud and the results are returned to the server. The suggested cloud includes Windows and Android devices, which are able to install deep convolutional learning libraries. Compared with previous gesture recognition, our approach is not only more accurate but also has more potential for adding new gestures. Furthermore, we have shown that neural networks can be combined with augmented reality as a rising field, and that augmented reality and neural networks have great potential to be employed in medical learning and education systems.

19.

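An Improved Graph Neural Network Text Classification Model and Its Application: A Case Study of NSTL Scientific Journal Literature Classification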

张晓丹《情报杂志》2021,(1):184-188

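[Purpose/Significance] With the rapid growth of digital resources on the Internet, mining valuable information from massive data has become a hot topic in data mining, and classification of large-scale text data is one of the key problems in the field. Advances in deep learning have made deep learning based classification of large-scale text data feasible. [Method/Process] To address the low efficiency of the graph neural network text classifiers that have emerged in recent years, an improved method is proposed. Texts, sentences, and keywords are used to build a topological relation graph and a topological relation matrix; a Markov chain sampling algorithm samples the nodes of each layer; a multi-level dimensionality reduction method reduces the feature dimension; and inductive inference performs the final text classification. [Results/Conclusions] To test the method, experiments were run on commonly used public corpora and a self-built NSTL scientific journal literature corpus, comparing accuracy and inference time against text classification models in common use. The results show that the proposed method effectively improves classification efficiency while maintaining accuracy on large-scale text and literature data.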

20.

Contrastive Graph Convolutional Networks with adaptive augmentation for text classification

《Information processing & management》2022,59(4):102946

Text classification is an important research topic in natural language processing (NLP), and Graph Neural Networks (GNNs) have recently been applied in this task. However, in existing graph-based models, text graphs constructed by rules are not real graph data and introduce massive noise. More importantly, for a fixed corpus-level graph structure, these models cannot sufficiently exploit the labeled and unlabeled information of nodes. Meanwhile, contrastive learning has been developed as an effective method in the graph domain to fully utilize the information of nodes. Therefore, we propose a new graph-based model for text classification named CGA2TC, which introduces contrastive learning with an adaptive augmentation strategy to obtain more robust node representations. First, we explore word co-occurrence and document-word relationships to construct a text graph. Then, we design an adaptive augmentation strategy for the noisy text graph to generate two contrastive views that effectively solve the noise problem and preserve essential structure. Specifically, we design noise-based and centrality-based augmentation strategies on the topological structure of the text graph to disturb the unimportant connections and thus highlight the relatively important edges. As for the labeled nodes, we take the nodes with the same label as multiple positive samples and assign them to the anchor node, while we employ consistency training on unlabeled nodes to constrain model predictions. Finally, to reduce the resource consumption of contrastive learning, we adopt a random sampling method to select some nodes to calculate the contrastive loss. The experimental results on several benchmark datasets demonstrate the effectiveness of CGA2TC on the text classification task.
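A contrastive loss rewards an anchor node for being close to its positive sample and far from negatives. The sketch below uses the widely used InfoNCE form with cosine similarity for a single anchor; whether CGA2TC uses exactly this form, and the temperature value, are assumptions, and the two-dimensional embeddings are toy values.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.5):
    """InfoNCE loss for one anchor, one positive, and a list of negatives."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


anchor = [1.0, 0.0]
# Positive aligned with the anchor, negative orthogonal: low loss
loss_good = info_nce(anchor, [1.0, 0.0], [[0.0, 1.0]])
# Roles swapped: high loss
loss_bad = info_nce(anchor, [0.0, 1.0], [[1.0, 0.0]])
```

Computing the loss only for a random subset of anchors, as the abstract describes, reduces the number of such pairwise similarity evaluations.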