Similar Articles
1.
Question categorization, which assigns a user's question to one of a set of predefined categories according to its topic or content, is a useful technique in user-interactive question answering systems. In this paper, we propose an automatic method for question categorization in such a system. The method has four steps: feature space construction, identification and weighting of topic-wise words, semantic mapping, and similarity calculation. We first construct the feature space from all accumulated questions and compute a feature vector for each predefined category from the accumulated questions it contains. When a new question is posted, its semantic pattern is used to identify and weight its important words. The question is then semantically mapped into the constructed feature space to enrich its representation. Finally, the similarity between the question and each category is calculated from their feature vectors, and the category with the highest similarity is assigned to the question. Experimental results show that the proposed method achieves good categorization precision and outperforms traditional categorization methods on the selected test questions.
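The final step above can be sketched as follows: the question is assigned to the category whose feature vector is most similar to it. This is a minimal illustration. It assumes plain term-frequency vectors and cosine similarity. The paper's topic-word weighting and semantic mapping are not reproduced, and the example questions and category names are hypothetical.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def build_category_vectors(history):
    """Sum the term frequencies of all accumulated questions in each category."""
    categories = {}
    for tokens, label in history:
        categories.setdefault(label, Counter()).update(tokens)
    return categories

def categorize(tokens, categories):
    """Assign the category whose vector is most similar to the new question."""
    question = Counter(tokens)
    return max(categories, key=lambda c: cosine(question, categories[c]))

# Hypothetical accumulated questions, already tokenized and labeled.
history = [
    (["how", "to", "reset", "password"], "account"),
    (["refund", "for", "my", "order"], "billing"),
    (["order", "not", "delivered"], "shipping"),
]
categories = build_category_vectors(history)
print(categorize(["forgot", "password", "reset"], categories))  # prints "account"
```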

2.
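To address the lack of semantics in the vector space model, a semantic dictionary (HowNet) is applied to text classification to improve its accuracy. To handle polysemy in Chinese text, an improved method for computing lexical semantic similarity is proposed. Word sense disambiguation selects the sense used when computing word similarity. Words whose similarity exceeds a threshold are clustered, which reduces the dimensionality of the text feature vectors. A semantics-based text classification algorithm is then presented and analyzed experimentally. The results show that the algorithm effectively improves Chinese text classification.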
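Here is a rough sketch of the dimensionality-reduction idea. Words whose similarity exceeds a threshold are grouped, and each group becomes one feature dimension. The similarity function is a hypothetical lookup table that stands in for the HowNet-based measure with sense disambiguation. The greedy single-pass grouping is an assumed clustering strategy, not necessarily the one the paper uses.

```python
def cluster_words(words, sim, threshold=0.8):
    """Greedy clustering: each word joins the first cluster whose
    representative (first member) is more similar than the threshold."""
    clusters = []
    for w in words:
        for cluster in clusters:
            if sim(w, cluster[0]) > threshold:
                cluster.append(w)
                break
        else:  # no sufficiently similar cluster: start a new one
            clusters.append([w])
    return clusters

def reduce_vector(term_freqs, clusters):
    """Collapse a term-frequency dict to one dimension per word cluster."""
    return [sum(term_freqs.get(w, 0) for w in cluster) for cluster in clusters]

# Hypothetical stand-in for a HowNet-based word similarity.
PAIR_SIM = {frozenset(("computer", "pc")): 0.95,
            frozenset(("car", "automobile")): 0.9}

def toy_sim(a, b):
    return 1.0 if a == b else PAIR_SIM.get(frozenset((a, b)), 0.0)

vocab = ["computer", "car", "pc", "automobile", "price"]
clusters = cluster_words(vocab, toy_sim)
print(clusters)  # [['computer', 'pc'], ['car', 'automobile'], ['price']]
print(reduce_vector({"computer": 2, "pc": 1, "price": 1}, clusters))  # [3, 0, 1]
```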

3.
We propose a CNN-BiLSTM-Attention classifier for online short messages in Chinese posted by users on government web portals, so that each message can be directed to one or more government offices. For multi-label classification, the model exploits all available information, including hierarchical text features at different levels and the label information. In particular, our method encodes the meaning of each label, the CNN layer extracts local semantic features of the texts, the BiLSTM layer fuses contextual features with these local semantic features, and the attention layer selects the most relevant features for each label. We evaluate the model on two large public corpora and on our high-quality hand-crafted e-government multi-label dataset, which was annotated with the text annotation tool doccano and consists of 29,920 data points. Experimental results show that the method is effective under common multi-label evaluation metrics. It achieves micro-F1 of 77.22%, 84.42%, and 87.52% and macro-F1 of 77.68%, 73.37%, and 83.57% on the three datasets, respectively, confirming that the classifier is robust. We conduct an ablation study to evaluate the label embedding method and the attention mechanism. Moreover, a case study on the hand-crafted e-government dataset verifies that the model integrates all types of semantic information in short messages with respect to different labels to achieve text classification.
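The abstract reports both micro-F1 and macro-F1, so a small sketch of the standard multi-label definitions may help. Micro-F1 pools true and false positives across all labels before computing F1. Macro-F1 averages the per-label F1 scores, so rare labels weigh as much as frequent ones. The labels and predictions below are made up.

```python
def f1(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); defined as 0 when all counts are 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def micro_macro_f1(y_true, y_pred, labels):
    """y_true / y_pred: lists of label sets, one set per message."""
    counts = {l: [0, 0, 0] for l in labels}  # [tp, fp, fn] per label
    for gold, pred in zip(y_true, y_pred):
        for l in labels:
            if l in pred and l in gold:
                counts[l][0] += 1
            elif l in pred:
                counts[l][1] += 1
            elif l in gold:
                counts[l][2] += 1
    # Micro: pool the counts of all labels, then compute a single F1.
    tp, fp, fn = (sum(c[i] for c in counts.values()) for i in range(3))
    micro = f1(tp, fp, fn)
    # Macro: compute F1 per label, then take the unweighted mean.
    macro = sum(f1(*c) for c in counts.values()) / len(labels)
    return micro, macro

labels = ["A", "B", "C"]
y_true = [{"A"}, {"A", "B"}, {"C"}]
y_pred = [{"A"}, {"A"}, {"A"}]
print(micro_macro_f1(y_true, y_pred, labels))  # micro about 0.571, macro about 0.267
```

In the toy data, label A is predicted well but B and C are always missed, so macro-F1 falls much lower than micro-F1.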

4.
He Xiaoqin. Modern Information (《现代情报》), 2012, 32(8): 45-48
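Choosing partners is key to the success of a purchasing alliance, and partner search is the important first step of partner selection. This article recasts partner search for purchasing alliances in e-commerce as a problem of semantically matching procurement-requirement texts. It introduces a partner search model based on a domain ontology and semantic similarity. The model fills the concept vectors of procurement-requirement texts with their hypernyms (upper-level concepts) and computes semantic similarity. This quantifies how closely two sets of procurement requirements match semantically.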
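Hypernym ("upper-level") filling can be illustrated with a toy ontology. Each requirement's concept set is extended with all ancestor concepts before similarity is computed, so requirements that share no literal concepts can still match. The ontology fragment, the binary concept vectors, and the use of cosine similarity are assumptions made for illustration. The paper's actual weighting is not given in the abstract.

```python
import math

def fill_hypernyms(concepts, parent):
    """Add every ancestor of each concept (upper-level concept filling)."""
    filled = set()
    for c in concepts:
        # Walk up the hierarchy; stop at the root or at an already-seen concept.
        while c is not None and c not in filled:
            filled.add(c)
            c = parent.get(c)
    return filled

def concept_similarity(req_a, req_b, parent):
    """Cosine similarity of binary concept vectors after hypernym filling."""
    a, b = fill_hypernyms(req_a, parent), fill_hypernyms(req_b, parent)
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

# Hypothetical fragment of a procurement domain ontology (child -> parent).
PARENT = {"laptop": "computer", "desktop": "computer",
          "computer": "electronics", "printer": "office_equipment"}

print(concept_similarity({"laptop"}, {"desktop"}, PARENT))  # about 0.667
```

Without filling, {"laptop"} and {"desktop"} share no concepts and would score 0. After filling, they share "computer" and "electronics".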

5.
This paper constructs a novel enhanced latent semantic model based on users' comments. It employs regularization factors to capture how each user's latent topics for each item evolve over time, in order to improve recommendation accuracy. The model also improves the adaptive temporal weighting of multiple preference features. It computes the preferences of different users in different time periods using human forgetting characteristics, item-interest overlap, and semantic-level similarity of review texts, which improves accuracy on sparse rating data. The paper compares the model with six temporal matrix-factorization baselines on nine datasets. Its accuracy is 31.64% better than TimeSVD++, 21.08% better than BTMF, 15.51% better than TMRevCo, 13.99% better than BPTF, 9.24% better than TCMF, and 3.19% better than MUTPD. This indicates that the model captures users' temporal interest drift more effectively and better reflects the evolving relationship between users' latent topics and item ratings.
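One plausible reading of "human forgetting characteristics" is an exponential forgetting curve that down-weights older ratings. The sketch below assumes that form and a hypothetical decay rate of 0.1 per day. The paper's combination with item-interest overlap and review-text similarity is not shown.

```python
import math

def forgetting_weight(age, decay=0.1):
    """Exponential forgetting curve: weight 1 for a brand-new rating,
    shrinking as the rating ages (decay is a hypothetical rate per day)."""
    return math.exp(-decay * age)

def time_weighted_preference(ratings, now, decay=0.1):
    """ratings: list of (score, day) pairs for one user; returns the
    forgetting-weighted mean score, so recent behaviour dominates."""
    weights = [forgetting_weight(now - day, decay) for _, day in ratings]
    return sum(w * score for w, (score, _) in zip(weights, ratings)) / sum(weights)

# A user who rated an item category highly long ago but poorly recently.
history = [(5.0, 0), (1.0, 30)]
print(round(time_weighted_preference(history, now=30), 2))  # 1.19
```

An unweighted mean would give 3.0. The forgetting weights pull the estimate toward the recent rating.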

6.
With the rapid development of remote sensing technology, remote sensing has become an important means of monitoring dynamic changes in land cover and ecology. Given the complexity of ecological monitoring of the mangroves in Dongzhaigang, Hainan Province, China, this paper proposes a semantic understanding method for mangrove remote sensing images. The method combines a multi-feature kernel sparse representation classifier with a decision rule model. First, building on multi-feature extraction, we take the spatial context of the samples into account and introduce a kernel function into the sparse representation classifier. This yields a multi-feature kernel sparse representation classifier that classifies the cover types of mangroves and their surrounding objects. Second, considering the growth conditions of the mangrove area, we propose a decision-rule-based semantic understanding method. It separates mangrove from non-mangrove areas using the classifier's results. We analyze the separability of the extracted spatial- and spectral-domain features. We then select the best split attribute by the maximum information gain criterion to generate a semantic tree and extract semantic rules. Finally, we apply these decision rules to the mangrove areas and further divide them into two categories: excellent growth and poor growth. Experimental results show that the proposed method can effectively identify mangrove areas and assess mangrove growth.
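Selecting the split attribute by maximum information gain, as in the decision-rule step, can be sketched for discrete attributes as follows. The attribute names ("ndvi", "texture") and the samples are hypothetical. Continuous spectral or spatial features would first need to be discretized or thresholded.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on a discrete attribute."""
    n = len(labels)
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[attr], []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def best_split(rows, labels, attrs):
    """Attribute with the maximum information gain."""
    return max(attrs, key=lambda a: information_gain(rows, labels, a))

# Hypothetical discretized features for four image samples.
rows = [{"ndvi": "high", "texture": "rough"}, {"ndvi": "high", "texture": "smooth"},
        {"ndvi": "low", "texture": "rough"}, {"ndvi": "low", "texture": "smooth"}]
labels = ["mangrove", "mangrove", "other", "other"]
print(best_split(rows, labels, ["ndvi", "texture"]))  # ndvi
```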

7.
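This article proposes a comprehensive description information model for computing the similarity of Linked Data datasets (resource sets). The model describes a dataset through three modules: basic description, content description, and external links. According to the characteristics of each information item, it selects algorithms such as string similarity, set similarity, the vector space model, and statistics- and semantics-based similarity to compute dataset similarity. To some extent, this solves the problem that related datasets currently have to be configured manually when links are created.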
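Two of the similarity families named above, string similarity and set similarity, can be sketched with Python's standard library and combined with weights. The dataset fields ("title", "vocabularies") and the equal weights are hypothetical. The abstract does not specify the model's actual information items or weighting.

```python
from difflib import SequenceMatcher

def string_similarity(a, b):
    """Character-level similarity in [0, 1] from difflib's ratio()."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def set_similarity(a, b):
    """Jaccard similarity |A & B| / |A | B|; two empty sets count as identical."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

def dataset_similarity(d1, d2, w_title=0.5, w_vocab=0.5):
    """Weighted combination of two signals; fields and weights are hypothetical."""
    return (w_title * string_similarity(d1["title"], d2["title"])
            + w_vocab * set_similarity(d1["vocabularies"], d2["vocabularies"]))

ds1 = {"title": "DBpedia", "vocabularies": {"foaf", "dcterms", "skos"}}
ds2 = {"title": "dbpedia-live", "vocabularies": {"foaf", "dcterms"}}
print(round(dataset_similarity(ds1, ds2), 3))  # 0.702
```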

8.
Effectively detecting the knowledge that supports an answer is a fundamental step towards automated question answering. Pre-trained semantic vectors for texts have enabled semantic computation for background-answer pairs. However, they are limited in representing the structured knowledge relevant to question answering. Recent studies have shown interest in incorporating structured knowledge graphs into text processing, but they have focused more on semantics than on graph structure. This study, by contrast, explores the structural patterns of knowledge graphs. Inspired by human cognitive processes, we propose novel feature extraction methods that capture the local and global structural information of knowledge graphs. These features not only have good indicative power but also support text analysis with explainable meanings. Moreover, to better combine structural and semantic evidence for prediction, we propose a Neural Knowledge Graph Evaluator (NKGE), which outperforms existing methods. Our contributions include a novel set of interpretable structural features and the effective NKGE for evaluating compatibility between knowledge graphs. The feature extraction methods and the structural patterns the features reveal may also provide insights for related studies in the computational modeling and processing of knowledge.

9.
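The independent variables x and y in the iteration formula of the standard planar crystallographic group P1 are replaced with x³ + c1 and y³ + c2 to construct a visualizable planar dynamical system. Replacing the original linear relationship of the independent variables with a nonlinear one yields a new mapping method for the variables. Parameters are found by Monte Carlo search, and Lyapunov exponents are used to determine the system's behavior. The system's chaotic attractors and patterns filled with Julia sets are then plotted.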

10.
11.
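[Objective] To describe semantic information in a vector space and to study an automatic summarization method based on word-vector bags. [Methods] A summary is a shortened, accurate expression of a document's content. A word-vector bag can represent words, phrases, sentences, paragraphs, and whole documents in the same vector space, where spatial distance reflects semantic similarity. We propose an automatic summarization method based on word-vector bags. It measures the semantic similarity between each sentence and the whole document by the distance between their representations, and it extracts the sentences most similar to the document to form the summary. [Results] Experiments on the DUC01 dataset show that the method generates high-quality summaries and clearly outperforms other methods. [Conclusions] The experiments show that the method significantly improves automatic summarization performance.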
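The extraction step can be sketched under two assumptions: a sentence or document vector is the mean of its word vectors, and cosine similarity measures closeness. The paper's exact word-vector-bag construction may differ, and the 2-D embeddings below are invented for the example.

```python
import math

def mean_vector(words, emb):
    """Average the embeddings of the words that have one; None if none do."""
    vecs = [emb[w] for w in words if w in emb]
    if not vecs:
        return None
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def summarize(sentences, emb, k=1):
    """Pick the k sentences most similar to the whole document, in original order."""
    doc_vec = mean_vector([w for s in sentences for w in s], emb)
    if doc_vec is None:
        return []
    scored = []
    for i, s in enumerate(sentences):
        vec = mean_vector(s, emb)
        scored.append((cosine(vec, doc_vec) if vec is not None else 0.0, i))
    best = sorted(scored, reverse=True)[:k]
    return [sentences[i] for _, i in sorted(best, key=lambda t: t[1])]

# Toy 2-D embeddings (made up for illustration).
EMB = {"summary": [1.0, 0.0], "text": [0.9, 0.1], "method": [0.8, 0.2],
       "weather": [0.0, 1.0], "sunny": [0.1, 0.9]}
document = [["summary", "method"], ["weather", "sunny"], ["summary", "text"]]
print(summarize(document, EMB, k=2))  # [['summary', 'method'], ['summary', 'text']]
```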

12.
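The semantic types in the online lexical resource FrameNet play an important role in deep natural language understanding. The article first discusses the functions of FrameNet's semantic types. It points out that they consist of frame types, ontological types, and lexical types, which describe the inherent part-of-speech and semantic features of words at the conceptual, emotional, and collocational levels. On this basis, the author explores specific applications of semantic types in natural language processing. These include constraining the value ranges of frame elements, and supporting natural language understanding tasks such as question answering, summarization, paraphrasing, and translation. They also include facilitating the computation of relational relevance in concept retrieval, enabling semantic-level reasoning, and establishing mappings to other ontologies across domains.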

13.
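Potential standard-essential patents (SEPs) have high strategic and economic value in future markets. Identifying them early matters for companies because it supports building an innovative country, optimizing patent portfolios, accelerating technological innovation, strengthening industry position, and avoiding patent hold-up. However, little research has addressed the automatic identification of potential SEPs. From the perspective of extracting the semantic features of SEPs, this paper uses a BERT-CNN network that draws on context to extract two kinds of features from known SEPs: implicit global semantic features and high-dimensional hierarchical semantic features. It identifies potential SEPs from the extracted features and predicts which standards a potential SEP may correspond to by computing BERT vector similarity. In the empirical part, the model is validated on a dataset built from SEPs published by ETSI (the European Telecommunications Standards Institute). In large-scale patent data experiments, its precision, recall, and F1 exceed those of existing studies.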

14.
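Digital documents such as books and journal articles have few textual features, so their feature vectors express semantics imprecisely and classification performs poorly. To address this, this paper proposes a digital-document classification method based on semantic feature expansion. First, the method uses TF-IDF to obtain core feature words with high TF-IDF values that represent the documents well. Next, it expands the set of core feature words with semantic concepts, using the HowNet semantic dictionary and the open knowledge base Wikipedia, to build a low-dimensional, semantically rich concept vector space. Finally, it builds classifiers with algorithms such as MaxEnt and SVM to classify digital documents automatically. Experimental results show that, compared with traditional short-text classification methods based on feature selection, the method effectively expands short-text features semantically and improves classification performance on digital documents.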
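The first step, picking core feature words by TF-IDF, might look like the sketch below. It uses one common TF-IDF variant: term frequency normalized by document length, with idf = log(N / df). The paper's exact weighting is not stated, and the later HowNet and Wikipedia expansion is not shown.

```python
import math
from collections import Counter

def top_tfidf_terms(docs, k=2):
    """Return the k highest-scoring TF-IDF terms of each tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    result = []
    for doc in docs:
        if not doc:
            result.append([])
            continue
        tf = Counter(doc)
        scores = {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        # Sort by descending score; break ties alphabetically for determinism.
        result.append(sorted(scores, key=lambda t: (-scores[t], t))[:k])
    return result

docs = [["library", "catalog", "metadata"],
        ["library", "journal", "citation"],
        ["library", "catalog", "journal"]]
print(top_tfidf_terms(docs))
# [['metadata', 'catalog'], ['citation', 'journal'], ['catalog', 'journal']]
```

"library" appears in every document, so its idf (and score) is 0 and it is never chosen as a core feature word.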

15.
Similarity-based modeling (SBM) is a technique that models the normal operation of a system in order to detect faults by analyzing their similarity to normal system states. First proposed around two decades ago, SBM has been used successfully for fault detection in a variety of systems. Despite this success, few studies have examined its design, which encompasses both similarity metrics and model training. This work contributes an in-depth study of SBM for fault detection that considers both of these design aspects. It does so by proposing a novel SBM-based system for identifying rotating-machinery faults. In this system, SBM serves either as a standalone classifier or as a feature generator for a random forest classifier. New approaches for training the model and new similarity metrics are investigated. Experimental results are reported for the recently developed Machinery Fault Database (MaFaulDa), which has an extensive set of sequences and fault types, and for the Case Western Reserve University (CWRU) bearing database. On both databases, the proposed techniques increase the generalization power of the similarity model and of the associated classifier, achieving accuracies of 98.5% on MaFaulDa and 98.9% on CWRU.
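The core SBM idea is to estimate an observation from memorized normal states and to treat a large residual as a fault indicator. The sketch below is a simplified kernel-weighted variant. It assumes a Gaussian similarity with a hypothetical bandwidth. SBM as commonly described also uses the similarity matrix among the memory vectors, and the paper's own metrics and training approaches are not reproduced.

```python
import math

def similarity(a, b, bandwidth=1.0):
    """Gaussian similarity in (0, 1]; the bandwidth is a hypothetical choice."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2 / (2 * bandwidth ** 2))

def fault_score(memory, x, bandwidth=1.0):
    """Residual between an observation and its similarity-weighted estimate
    from memorized normal states; large residuals suggest a fault."""
    weights = [similarity(m, x, bandwidth) for m in memory]
    total = sum(weights)
    if total == 0.0:  # every weight underflowed: x is far from all normal states
        return math.inf
    estimate = [sum(w * m[k] for w, m in zip(weights, memory)) / total
                for k in range(len(x))]
    return math.sqrt(sum((xi - ei) ** 2 for xi, ei in zip(x, estimate)))

# Hypothetical memory of normal operating states (two sensor features).
memory = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(fault_score(memory, [0.5, 0.5]))  # close to 0: looks normal
print(fault_score(memory, [3.0, 3.0]))  # roughly 2.9: flagged as anomalous
```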

16.
Zuo Xiaofei, Liu Huailiang, Fan Yunjie, Zhao Hui. Journal of Intelligence (《情报杂志》), 2012, 31(5): 180-184, 191
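Traditional keyword-based text clustering algorithms struggle to exploit the semantic features of texts, so their clustering results are unsatisfactory. The authors propose the notion of a concept semantic field and give an algorithm for building concept semantic fields from HowNet. First, HowNet is used to construct a sememe masking layer that masks sememes with weak descriptive power. Then, based on an analysis of HowNet's structure, the authors give rules for extracting related concepts, along with methods for constructing simple and complex concept semantic fields. Finally, a text clustering algorithm based on concept semantic fields is presented. The algorithm makes full use of the semantic relations between feature words and also performs well on irregularly shaped clusters. Experiments show that it effectively improves clustering quality.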

17.
Log parsing is a critical task that converts unstructured raw logs into structured data for downstream tasks. Existing methods often rely on manual string-matching rules to extract template tokens, which limits their adaptability across log datasets. To address this issue, we propose PVE, an automated log parsing method that uses a Variational Auto-Encoder (VAE) to build a semi-supervised model for categorizing log tokens. Based on the observation that log template tokens often consist of words, we use common words and their combinations as training data, which diversifies the structural features of template tokens. Specifically, PVE constructs two types of embedding vectors for each word and word combination: the sum embedding and the n-gram embedding. The structural features of template tokens are learned by training the VAE on these embeddings. During log parsing, PVE categorizes a token as a template token if it is similar to the training data. To improve efficiency, the token type is determined by the average similarity between the token embedding and VAE samples rather than by the reconstruction error. Evaluations on 16 real-world log datasets show that the method achieves an average accuracy of 0.878 and outperforms the comparison methods in parsing accuracy and adaptability.
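The decision rule described above compares a token's embedding with VAE samples by average similarity instead of by reconstruction error. It can be sketched once the samples are available. The VAE itself is omitted here. The sample vectors, the use of cosine similarity, and the 0.8 threshold are all hypothetical.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def is_template_token(token_vec, vae_samples, threshold=0.8):
    """Template if the token's mean similarity to vectors sampled from the
    trained VAE reaches the (hypothetical) threshold."""
    mean_sim = sum(cosine(token_vec, s) for s in vae_samples) / len(vae_samples)
    return mean_sim >= threshold

# Stand-ins for samples drawn from a trained VAE's decoder.
samples = [[1.0, 0.1], [0.9, 0.0], [1.0, -0.1]]
print(is_template_token([1.0, 0.0], samples))  # True
print(is_template_token([0.0, 1.0], samples))  # False
```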

18.
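[Objective/Significance] Purely statistical natural language processing techniques for intent classification of short texts from social networks suffer from sparse features, semantic ambiguity, and insufficient labeled data. To address these problems, a co-training intent classification method that incorporates psycholinguistic information is proposed. [Methods/Process] First, to enrich semantic information, psycholinguistic cues that carry emotional tendencies are fused in during text feature extraction, expanding the feature dimensions. Second, to cope with limited labeled data, a semi-supervised ensemble approach co-trains two machine-learning classifiers during training: one based on event-content expressions and one based on emotional-event expressions. Finally, classification is decided by voting on the product of the classifiers' confidences. [Conclusions/Results] Experimental results show that co-training on corpora enriched with psycholinguistic information yields better classification.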
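The final voting step chooses the class whose product of confidences across the co-trained classifiers is largest. It can be sketched as follows. The class names and probabilities are hypothetical, and the co-training loop itself is not shown.

```python
def product_vote(*prob_dicts):
    """Pick the class with the largest product of the classifiers' confidences."""
    classes = sorted(set.intersection(*(set(p) for p in prob_dicts)))

    def product(c):
        result = 1.0
        for p in prob_dicts:
            result *= p[c]
        return result

    return max(classes, key=product)

# Hypothetical outputs of the two co-trained classifiers for one short text.
content_view = {"purchase": 0.6, "no_intent": 0.4}
emotion_view = {"purchase": 0.3, "no_intent": 0.7}
print(product_vote(content_view, emotion_view))  # no_intent (0.28 > 0.18)
```

Multiplying confidences lets a strongly confident classifier outweigh a mildly confident one. Here the emotion-based view's 0.7 overrides the content view's weaker preference.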

19.
Automated visual inspection of fabric defects is challenging because of the diversity of fabric patterns and defects. Many automated methods for identifying fabric defects exist. However, most process only images of fabric patterns in crystallographic group p1 and implicitly assume that the patterns are arranged in fixed directions. This paper proposes an automated defect inspection method that calibrates the fabric image and then segments it into non-overlapping sub-images called lattices. The image is thus represented by hundreds of lattices that share common features instead of by millions of unrelated pixels. Defect inspection is transformed into comparing lattice similarity based on the shared features and identifying defective lattices as outliers in the feature space. The proposed method, ILS (Isotropic Lattice Segmentation), is evaluated on image databases of fabric patterns arranged orthogonally and in arbitrary directions. Comparing the resulting images with ground-truth images gives an overall detection rate of 0.955, which is comparable with state-of-the-art methods.
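Treating defective lattices as outliers in feature space can be sketched with a robust distance rule. The sketch measures each lattice's distance from the per-feature median and flags lattices far beyond the median distance, scaled by the median absolute deviation (MAD). The features and the cut-off k = 3 are assumptions, because the abstract does not say which outlier criterion ILS uses.

```python
import math
import statistics

def defective_lattices(features, k=3.0):
    """Return indices of lattices whose feature vectors are outliers.
    features: list of equal-length feature vectors, one per lattice."""
    dim = len(features[0])
    center = [statistics.median(f[i] for f in features) for i in range(dim)]
    dists = [math.dist(f, center) for f in features]
    med = statistics.median(dists)
    mad = statistics.median(abs(d - med) for d in dists)
    if mad == 0:  # degenerate spread: flag anything farther than the median
        return [i for i, d in enumerate(dists) if d > med]
    return [i for i, d in enumerate(dists) if (d - med) / mad > k]

# Hypothetical 2-D features; the last lattice contains a defect.
lattices = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 1.05], [1.05, 1.0], [3.0, 0.2]]
print(defective_lattices(lattices))  # [5]
```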

20.