Found 20 similar documents; search took 31 ms
1.
Wanpeng Song, Liu Wenyin, Naijie Gu, Xiaojun Quan, Tianyong Hao 《Information processing & management》2011
Question categorization, which suggests one of a set of predefined categories for a user’s question according to the question’s topic or content, is a useful technique in user-interactive question answering systems. In this paper, we propose an automatic method for question categorization in a user-interactive question answering system. The method has four steps: feature space construction, topic-wise word identification and weighting, semantic mapping, and similarity calculation. We first construct the feature space from all accumulated questions and calculate the feature vector of each predefined category, which contains certain accumulated questions. When a new question is posted, the semantic pattern of the question is used to identify and weight its important words. The question is then semantically mapped into the constructed feature space to enrich its representation. Finally, the similarity between the question and each category is calculated from their feature vectors, and the category with the highest similarity is assigned to the question. Experimental results show that the proposed method achieves good categorization precision and outperforms traditional categorization methods on the selected test questions.
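The final step above (similarity calculation and category assignment) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name the similarity measure or the weighting scheme, so plain term counts and cosine similarity are swapped in here, and `vectorize`, `cosine`, `categorize`, and the toy `CATEGORIES` data are hypothetical names, not the authors' implementation.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts; a stand-in for the paper's weighted features."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def categorize(question, category_questions):
    """Return the category whose accumulated questions are most similar."""
    q_vec = vectorize(question)
    # One feature vector per category, built from its accumulated questions.
    cat_vecs = {
        name: vectorize(" ".join(questions))
        for name, questions in category_questions.items()
    }
    return max(cat_vecs, key=lambda name: cosine(q_vec, cat_vecs[name]))


# Hypothetical toy data, not taken from the paper.
CATEGORIES = {
    "travel": ["how do i book a cheap flight", "best hotel near the airport"],
    "cooking": ["how long to boil an egg", "best recipe for tomato soup"],
}
```

Calling `categorize("cheap flight to the airport", CATEGORIES)` selects the category with the largest cosine score. The semantic mapping step described in the abstract would enrich `q_vec` before this comparison and is not modeled here.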
2.
3.
《Information processing & management》2023,60(3):103320
We propose a CNN-BiLSTM-Attention classifier for online short messages in Chinese posted by users on government web portals, so that a message can be directed to one or more government offices. Our model performs multi-label classification using both hierarchical text features and label information. In particular, our method extracts label meaning, the CNN layer extracts local semantic features of the texts, the BiLSTM layer fuses the contextual features of the texts with the local semantic features, and the attention layer selects the most relevant features for each label. We evaluate our model on two large public corpora and on our high-quality handcrafted e-government multi-label dataset, which was constructed with the text annotation tool doccano and consists of 29920 data points. Experimental results show that the proposed method is effective under common multi-label evaluation metrics, achieving micro-F1 of 77.22%, 84.42%, 87.52% and macro-F1 of 77.68%, 73.37%, 83.57% on these three datasets respectively, confirming that our classifier is robust. We conduct an ablation study to evaluate our label embedding method and attention mechanism. Moreover, a case study on our handcrafted e-government multi-label dataset verifies that our model integrates all types of semantic information in short messages, based on different labels, to achieve text classification.
4.
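Partner selection is key to the success of a purchasing alliance, and partner search is the important first step of partner selection. This paper recasts the partner search problem for purchasing alliances in e-commerce as a semantic matching problem over purchasing-requirement texts, and presents a partner search model based on a domain ontology and semantic similarity. The model quantifies the degree of semantic match between purchasing requirements by filling the concept vectors of the requirement texts with hypernyms and then computing semantic similarity.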
5.
《Information processing & management》2022,59(5):103037
This paper constructs a novel enhanced latent semantic model based on users’ comments, and employs regularization factors to capture the temporal evolution of users’ potential topics for each commodity, so as to improve recommendation accuracy. The adaptive temporal weighting of multiple preference features is also improved: the preferences of different users in different time periods are calculated using human forgetting features, item interest overlap, and semantic-level similarity of the review text, to improve accuracy on sparse evaluation data. The paper conducts comparison experiments against six temporal matrix decomposition baselines on nine datasets. The results show that the accuracy is 31.64% better than TimeSVD++, 21.08% better than BTMF, 15.51% better than TMRevCo, 13.99% better than BPTF, 9.24% better than TCMF, and 3.19% better than MUTPD, which indicates that the model is more effective in capturing users’ temporal interest drift and better reflects the evolutionary relationship between users’ latent topics and item ratings.
6.
《Information processing & management》2022,59(2):102813
With the rapid development of remote sensing technology, remote sensing has become an important means of monitoring dynamic changes in land cover and ecology. In view of the complexity of mangrove ecological monitoring in Dongzhaigang, Hainan Province, China, we propose a semantic understanding method for mangrove remote sensing images that combines a multi-feature kernel sparse classifier with a decision rule model. First, on the basis of multi-feature extraction, we take into account the spatial context relations of the samples and introduce the kernel function into the sparse representation classifier, yielding a multi-feature kernel sparse representation classifier that classifies the cover types of mangroves and their surrounding objects. Second, in view of the growth conditions of the mangrove area, we put forward a semantic understanding method for mangrove remote sensing images based on decision rules, and divide mangrove and non-mangrove areas by combining the classification results of the multi-feature kernel sparse representation classifier. We carry out a divisibility analysis on the extracted spatial-domain and spectral-domain features, then select the best split attribute by the maximum information gain criterion to generate a semantic tree and extract semantic rules. Finally, we apply the decision rules to the semantic understanding of mangrove areas and further divide them into two categories: excellent growth and poor growth. Experimental results show that the proposed method can effectively identify mangrove areas and make decisions on mangrove growth.
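The maximum information gain criterion mentioned above can be illustrated with a small sketch. It assumes categorical attributes and Shannon entropy in bits; the attribute names (`ndvi`, `near_water`) and the toy samples are hypothetical and do not come from the paper, which works with features extracted from the spatial and spectral domains.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (n / total) * math.log2(n / total) for n in Counter(labels).values()
    )


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting on a categorical attribute."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    # Weighted entropy of the subsets produced by the split.
    remainder = sum(
        len(group) / len(labels) * entropy(group) for group in groups.values()
    )
    return entropy(labels) - remainder


def best_split(rows, labels, attributes):
    """Attribute with the maximum information gain."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))


# Hypothetical toy samples; attribute names are illustrative only.
ROWS = [
    {"ndvi": "high", "near_water": "yes"},
    {"ndvi": "high", "near_water": "no"},
    {"ndvi": "low", "near_water": "yes"},
    {"ndvi": "low", "near_water": "no"},
]
LABELS = ["mangrove", "mangrove", "other", "other"]
```

In this toy data, splitting on `ndvi` separates the two classes completely, so it has the larger gain and would become the root of the semantic tree. Applying `best_split` recursively to each subset would grow the rest of the tree.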
7.
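This article proposes a comprehensive descriptive information model for computing the similarity of linked-data resource sets. The model describes a resource set through three modules: basic description, content description, and external links. Depending on the characteristics of each information item, it selects among string similarity, set similarity, the vector space model, and statistics-based and semantics-based similarity algorithms to compute resource-set similarity. To some extent, this addresses the current problem of manually configuring related resource sets when creating links.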
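Two of the measures named above, string similarity and set similarity, can be sketched as follows. The abstract does not say which string metric is used or how the per-item scores are combined, so the use of `difflib.SequenceMatcher`, the Jaccard coefficient, the equal weights, and the `title` and `links` field names are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def string_similarity(a, b):
    """Ratio in [0, 1] based on matching character blocks (difflib)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def jaccard_similarity(a, b):
    """Set similarity: size of the intersection over size of the union."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def dataset_similarity(ds1, ds2, weights=(0.5, 0.5)):
    """Weighted combination of per-item similarities (weights are assumed)."""
    title_sim = string_similarity(ds1["title"], ds2["title"])
    link_sim = jaccard_similarity(ds1["links"], ds2["links"])
    return weights[0] * title_sim + weights[1] * link_sim
```

A string measure suits short textual items such as titles, while a set measure suits unordered items such as the targets of external links, which is one way to read the abstract's remark that the algorithm is chosen per information item.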
8.
《Information processing & management》2020,57(6):102309
Effectively detecting supportive knowledge of answers is a fundamental step towards automated question answering. While pre-trained semantic vectors for texts have enabled semantic computation for background-answer pairs, they are limited in representing structured knowledge relevant for question answering. Recent studies have shown interest in incorporating structured knowledge graphs into text processing; however, their focus was more on semantics than on graph structure. This study, by contrast, takes a special interest in exploring the structural patterns of knowledge graphs. Inspired by human cognitive processes, we propose novel methods of feature extraction for capturing the local and global structural information of knowledge graphs. These features not only exhibit good indicative power, but can also facilitate text analysis with explainable meanings. Moreover, aiming to better combine structural and semantic evidence for prediction, we propose a Neural Knowledge Graph Evaluator (NKGE), which showed superior performance over existing methods. Our contributions include a novel set of interpretable structural features and the effective NKGE for compatibility evaluation between knowledge graphs. The methods of feature extraction and the structural patterns indicated by the features may also provide insights for related studies in computational modeling and processing of knowledge.
9.
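The independent variables x and y in the iteration formula of the standard planar crystallographic group P1 are replaced by x^3+c1 and y^3+c2 to construct a visualizable planar dynamical system. By replacing the original linear relationship of the independent variables with a nonlinear one, the paper proposes a new mapping method for the independent variables. A Monte Carlo search is used to find parameters, the Lyapunov exponent is used to determine the characteristics of the dynamical system, and the system's chaotic attractors and filled Julia set patterns are drawn.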
10.
11.
12.
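The semantic types of the online lexical resource FrameNet play an important role in deep natural language understanding. The article first discusses the functions of FrameNet's semantic types, pointing out that they consist of frame types, ontological types, and lexical types, and that they describe the inherent part-of-speech and semantic characteristics of words at the levels of concept, sentiment, collocation, and so on. On this basis, the authors explore the specific applications of semantic types in natural language processing: constraining the range of values that frame elements can take; supporting natural language understanding tasks such as question answering, summarization, paraphrasing, and translation; facilitating the computation of relation relevance in concept retrieval; enabling inference at the semantic level; and establishing mappings to other ontologies across multiple domains.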
13.
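Potential standard-essential patents (SEPs) have very high strategic and economic value in future markets. How enterprises identify these patents ahead of others matters greatly for building an innovation-oriented country, optimizing corporate patent portfolios, accelerating technological innovation, improving industry standing, and avoiding patent hold-up. However, research on automatically identifying potential SEPs is still scarce. From the perspective of extracting the semantic features of SEPs, this paper proposes using a Bert-CNN network model that draws on context to extract both the implicit global semantic features and the high-dimensional hierarchical semantic features of known SEPs. Potential SEPs are identified from the extracted features, and the standards to which they may correspond are predicted by computing Bert vector similarity. In the empirical part, a validation dataset is built from the SEPs published by ETSI (the European Telecommunications Standards Institute) to verify the model's performance. The results show that the model's precision, recall, and F1 score in large-scale patent data experiments are better than those of existing studies.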
14.
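Digital documents such as books and journal articles have few textual features, which leads to inaccurate semantic representation by feature vectors and poor classification performance. To address this problem, this paper proposes a digital document classification method based on semantic expansion of features. The method first uses TF-IDF to obtain core feature words that represent the digital document text well and have high TF-IDF values. It then uses the HowNet semantic dictionary and the open knowledge base Wikipedia to expand the core feature word set with semantic concepts, building a low-dimensional, semantically rich concept vector space. Finally, it builds classifiers with several algorithms, such as MaxEnt and SVM, to classify digital documents automatically. Experimental results show that, compared with traditional short-text classification methods based on feature selection, the method effectively expands the semantics of short-text features and improves digital document classification performance.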
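The first step above, selecting core feature words by TF-IDF value, can be sketched as follows. The abstract does not state which TF-IDF variant is used, so this sketch assumes one common form (term frequency normalized by document length, inverse document frequency as the natural log of N over document frequency). The function names and the toy `DOCS` corpus are hypothetical, and the HowNet and Wikipedia expansion step is not modeled.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: tf-idf} dict per tokenized document."""
    n_docs = len(docs)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


def core_feature_words(doc_scores, k):
    """Top-k terms by TF-IDF value (ties broken alphabetically)."""
    ranked = sorted(doc_scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical tokenized documents, not taken from the paper.
DOCS = [
    ["library", "catalog", "search", "library"],
    ["journal", "search", "index"],
    ["library", "journal", "loan"],
]
```

For the first toy document, `core_feature_words(tfidf(DOCS)[0], 2)` ranks `catalog` ahead of `library`: `catalog` is rarer across the corpus, which outweighs the higher in-document frequency of `library`. The selected words would then be the input to the concept expansion step.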
15.
Matheus A. Marins, Felipe M.L. Ribeiro, Sergio L. Netto, Eduardo A.B. da Silva 《Journal of The Franklin Institute》2018,355(4):1913-1930
Similarity-based modeling (SBM) is a technique whereby the normal operation of a system is modeled in order to detect faults by analyzing their similarity to the normal system states. First proposed around two decades ago, SBM has been used successfully for fault detection in varied systems. Despite this success, few studies in the literature address its design, which encompasses both similarity metrics and model training. This work contributes an in-depth study of SBM for fault detection considering these two design aspects. This is done in the context of proposing a novel system to identify rotating-machinery faults based on SBM, which is employed either as a standalone classifier or to generate features for a random forest classifier. New approaches for training the model and new similarity metrics are investigated. Experimental results are shown for the recently developed Machinery Fault Database (MaFaulDa), which has an extensive set of sequences and fault types, and for the Case Western Reserve University (CWRU) bearing database. Results for both databases indicate that the proposed techniques increase the generalization power of the similarity model and of the associated classifier, achieving accuracies of 98.5% on MaFaulDa and 98.9% on the CWRU database.
16.
17.
《Information processing & management》2023,60(5):103476
Log parsing is a critical task that converts unstructured raw logs into structured data for downstream tasks. Existing methods often rely on manual string-matching rules to extract template tokens, leading to lower adaptability across different log datasets. To address this issue, we propose an automated log parsing method, PVE, which leverages a Variational Auto-Encoder (VAE) to build a semi-supervised model for categorizing log tokens. Inspired by the observation that log template tokens often consist of words, we choose common words and their combinations as training data to enhance the diversity of the structure features of template tokens. Specifically, PVE constructs two types of embedding vectors, the sum embedding and the n-gram embedding, for each word and word combination. The structure features of template tokens can be learned by training the VAE on these embeddings. During log parsing, PVE categorizes a token as a template token if it is similar to the training data. To improve efficiency, we use the average similarity between the token embedding and VAE samples, rather than the reconstruction error, to determine the token type. Evaluations on 16 real-world log datasets demonstrate that our method has an average accuracy of 0.878 and outperforms the comparison methods in parsing accuracy and adaptability.
18.
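[Purpose/Significance] Intent classification of short texts from social networks that relies solely on statistical natural language processing suffers from sparse features, ambiguous semantics, and insufficient labeled data. To address these problems, this paper proposes a Co-training intent classification method that incorporates psycholinguistic information. [Method/Process] First, to enrich semantic information, psycholinguistic cues carrying sentiment orientation are fused in during text feature extraction to expand the feature dimensions. Second, to cope with limited labeled data, a semi-supervised ensemble approach is used in the training stage to co-train two machine learning classifiers (a classifier based on event content expression and a classifier based on sentiment event expression). Finally, classification is performed by a voting scheme based on the product of confidences. [Conclusion/Result] Experimental results show that corpora enriched with psycholinguistic information and then co-trained yield better classification performance.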
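The final voting step can be read as choosing the label whose two classifier confidences have the largest product. The sketch below follows that reading, which is an assumption: the abstract gives no formula, and the function name, the label names, and the example confidences are hypothetical.

```python
def vote_by_confidence_product(probs_a, probs_b):
    """Pick the label maximizing the product of both classifiers' confidences.

    Each argument maps label -> confidence as produced by one classifier.
    """
    labels = probs_a.keys() & probs_b.keys()
    if not labels:
        raise ValueError("classifiers share no labels")
    # Sorting makes tie-breaking deterministic (alphabetical order).
    return max(sorted(labels), key=lambda label: probs_a[label] * probs_b[label])


# Hypothetical outputs of the content-based and sentiment-based classifiers.
content_probs = {"purchase_intent": 0.6, "no_intent": 0.4}
sentiment_probs = {"purchase_intent": 0.3, "no_intent": 0.7}
predicted = vote_by_confidence_product(content_probs, sentiment_probs)
```

With these example values the products are 0.18 and 0.28, so the sentiment-based classifier's stronger confidence overrides the content-based classifier's weaker preference. Under this reading, a product rewards labels on which both views agree with reasonable confidence.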
19.
Automated visual inspection of fabric defects is a challenge due to the diversity of fabric patterns and defects. Although there are many automated inspection methods for identifying fabric defects, most of them process images containing fabric patterns classified as the crystallographic group p1 and implicitly assume that the fabric patterns are arranged in fixed directions. This paper proposes an automated defect inspection method that calibrates the fabric image and then segments it into non-overlapping sub-images called lattices. Thus, the image is represented by hundreds of lattices sharing some common features instead of millions of unrelated pixels. The defect inspection problem is transformed into comparing lattice similarity based on the shared features and identifying the defective lattices as outliers in the feature space. The performance of the proposed method, ILS (Isotropic Lattice Segmentation), is evaluated on databases of images containing fabric patterns arranged orthogonally and arbitrarily. By comparing the resultant images with ground-truth images, an overall detection rate of 0.955 is achieved, which is comparable with the state-of-the-art methods.