期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Semantic trees for natural language representation

Camilla B. Schwind 《Information processing & management》1983,19(4):223-235

This paper presents a formalism for the representation of complex semantic relations among concepts of natural language. We define a semantic algebra as a set of atomic concepts together with an ordered set of semantic relations. Semantic trees are a graphical representation of a semantic algebra (comparable to Kantorovic trees for boolean or arithmetical expressions). A semantic tree is an ordered tree with nodes labeled with relation and concept names. We generate semantic trees from natural language texts in such a way that they represent the semantic relations which hold among the concepts occurring within that text. This generation process is carried out by a transformational grammar which transforms directly natural language sentences into semantic trees. We present an example for concepts and relations within the domain of computer science where we have generated semantic trees from definition texts by means of a metalanguage for transformational grammars (a sort of metacompiler for transformational grammars). The semantic trees generated so far serve for thesaurus entries in an information retrieval system. 相似文献

2.

Semantic audio content-based music recommendation and visualization based on user preference examples

Dmitry Bogdanov Martín Haro Ferdinand Fuhrmann Anna Xambó Emilia Gómez Perfecto Herrera 《Information processing & management》2013

相似文献

3.

Graph-based Arabic text semantic representation

《Information processing & management》2020,57(3):102183

Semantic representation reflects the meaning of the text as it may be understood by humans. Thus, it contributes to facilitating various automated language processing applications. Although semantic representation is very useful for several applications, a few models were proposed for the Arabic language. In that context, this paper proposes a graph-based semantic representation model for Arabic text. The proposed model aims to extract the semantic relations between Arabic words. Several tools and concepts have been employed such as dependency relations, part-of-speech tags, name entities, patterns, and Arabic language predefined linguistic rules. The core idea of the proposed model is to represent the meaning of Arabic sentences as a rooted acyclic graph. Textual entailment recognition challenge is considered in order to evaluate the ability of the proposed model to enhance other Arabic NLP applications. The experiments have been conducted using a benchmark Arabic textual entailment dataset, namely, ArbTED. The results proved that the proposed graph-based model is able to enhance the performance of the textual entailment recognition task in comparison to other baseline models. On average, the proposed model achieved 8.6%, 30.2%, 5.3% and 16.2% improvement in terms of accuracy, recall, precision, and F-score results, respectively. 相似文献

4.

Automatic,semantics-based indexing of natural language texts for information retrieval systems

Stephan Braun Camilla Schwind 《Information processing & management》1976,12(2):147-153

The fundamental idea of the work reported here is to extract index phrases from texts with the help of a single word concept dictionary and a thesaurus containing relations among concepts. The work is based on the fact, that, within every phrase, the single words the phrase is composed of are related in a certain well denned manner, the type of relations holding between concepts depending only on the concepts themselves. Therefore relations can be stored in a semantic network. The algorithm described extracts single word concepts from texts and combines them to phrases using the semantic relations between these concepts, which are stored in the network. The results obtained show that phrase extraction from texts by this semantic method is possible and offers many advantages over other (purely syntactic or statistic) methods concerning preciseness and completeness of the meaning representation of the text. But the results show, too, that some syntactic and morphologic “filtering” should be included for effectivity reasons. 相似文献

5.

Improving relational similarity measurement using symmetries in proportional word analogies

Danushka Bollegala Tomokazu Goto Nguyen Tuan Duc Mitsuru Ishizuka 《Information processing & management》2013

Measuring the similarity between the semantic relations that exist between words is an important step in numerous tasks in natural language processing such as answering word analogy questions, classifying compound nouns, and word sense disambiguation. Given two word pairs (A, B) and (C, D), we propose a method to measure the relational similarity between the semantic relations that exist between the two words in each word pair. Typically, a high degree of relational similarity can be observed between proportional analogies (i.e. analogies that exist among the four words, A is to B such as C is to D). We describe eight different types of relational symmetries that are frequently observed in proportional analogies and use those symmetries to robustly and accurately estimate the relational similarity between two given word pairs. We use automatically extracted lexical-syntactic patterns to represent the semantic relations that exist between two words and then match those patterns in Web search engine snippets to find candidate words that form proportional analogies with the original word pair. We define eight types of relational symmetries for proportional analogies and use those as features in a supervised learning approach. We evaluate the proposed method using the Scholastic Aptitude Test (SAT) word analogy benchmark dataset. Our experimental results show that the proposed method can accurately measure relational similarity between word pairs by exploiting the symmetries that exist in proportional analogies. The proposed method achieves an SAT score of 49.2% on the benchmark dataset, which is comparable to the best results reported on this dataset. 相似文献

6.

Criminal Action Graph: A semantic representation model of judgement documents for legal charge prediction

《Information processing & management》2023,60(5):103421

Semantic information in judgement documents has been an important source in Artificial Intelligence and Law. Sequential representation is the traditional structure for analyzing judgement documents and supporting the legal charge prediction task. The main problem is that it is not effective to represent the criminal semantic information. In this paper, to represent and verify the criminal semantic information such as multi-linked legal features, we propose a novel criminal semantic representation model, which constructs the Criminal Action Graph (CAG) by extracting criminal actions linked in two temporal relationships. Based on the CAG, a Graph Convolutional Network is also adopted as the predictor for legal charge prediction. We evaluate the validity of CAG on the confusing charges which composed of 32,000 judgement documents on five confusing charge sets. The CAG reaches about 88% accuracy averagely, more than 3% over the compared model. The experimental standard deviation also show the stability of our model, which is about 0.0032 on average, nearly 0. The results show the effectiveness of our model for representing and using the semantic information in judgement documents. 相似文献

7.

Applying semantic knowledge to the automatic processing of temporal expressions and events in natural language

Hector Llorens Estela SaqueteBorja Navarro-Colorado 《Information processing & management》2013

This paper addresses the problem of the automatic recognition and classification of temporal expressions and events in human language. Efficacy in these tasks is crucial if the broader task of temporal information processing is to be successfully performed. We analyze whether the application of semantic knowledge to these tasks improves the performance of current approaches. We therefore present and evaluate a data-driven approach as part of a system: TIPSem. Our approach uses lexical semantics and semantic roles as additional information to extend classical approaches which are principally based on morphosyntax. The results obtained for English show that semantic knowledge aids in temporal expression and event recognition, achieving an error reduction of 59% and 21%, while in classification the contribution is limited. From the analysis of the results it may be concluded that the application of semantic knowledge leads to more general models and aids in the recognition of temporal entities that are ambiguous at shallower language analysis levels. We also discovered that lexical semantics and semantic roles have complementary advantages, and that it is useful to combine them. Finally, we carried out the same analysis for Spanish. The results obtained show comparable advantages. This supports the hypothesis that applying the proposed semantic knowledge may be useful for different languages. 相似文献

8.

Extracting temporal and causal relations based on event networks

《Information processing & management》2020,57(6):102319

Event relations specify how different event flows expressed within the context of a textual passage relate to each other in terms of temporal and causal sequences. There have already been impactful work in the area of temporal and causal event relation extraction; however, the challenge with these approaches is that (1) they are mostly supervised methods and (2) they rely on syntactic and grammatical structure patterns at the sentence-level. In this paper, we address these challenges by proposing an unsupervised event network representation for temporal and causal relation extraction that operates at the document level. More specifically, we benefit from existing Open IE systems to generate a set of triple relations that are then used to build an event network. The event network is bootstrapped by labeling the temporal disposition of events that are directly linked to each other. We then systematically traverse the event network to identify the temporal and causal relations between indirectly connected events. We perform experiments based on the widely adopted TempEval-3 and Causal-TimeBank corpora and compare our work with several strong baselines. We show that our method improves performance compared to several strong methods. 相似文献

9.

Wikipedia bi-linear link (WBLM) model: A new approach for measuring semantic similarity and relatedness between linguistic concepts using Wikipedia link structure

《Information processing & management》2023,60(2):103202

Wikipedia links its articles by manually defined semantic relations called the Wikipedia hyperlink (link) structure. The existing Wikipedia link-based semantic similarity (SS) and semantic relatedness (SR) computation models, such as Wikipedia one-way link (WOLM) model and Wikipedia two-way link (WTLM) model, do not assess the strengths of the relationships between a candidate concept and its links (out-links or in-links). These models treat all the links as equally important even though some links are semantically more influential than others and should be given more importance. This phenomenon reduces the accuracy of these models. This paper presents the Wikipedia bi-linear link (WBLM) model that extends the previously proposed WOLM and WTLM models. The WBLM model explores the Wikipedia link structure as a semantic graph and discovers the strongly (bi-linear links) and weakly (out-links or in-links) connected links of a candidate concept. It improves the link-based vector representations of concepts by assigning weights to their connected links according to the strengths of their semantic associations. The experimental results demonstrate that the proposed WBLM model significantly improves the SS and SR computation accuracy of the WOLM model (6.9%, 8%, 24%, 17.3%, 31.2%, 30.6%, 26.5%, and 35.4%) and WTLM model (1.2%, 3.9%, 7.1%, 9.9%, 11%, 6.3%, 12.7%, and 13%), in terms of linear correlations with human judgments on gold standard benchmarks, including MC30, RG65, WS203, SimLex, 353All, MTurk287, MTurk771, and MEN3000, respectively. Moreover, this research offers a deep insight into the Wikipedia link structure and provides an adequate base for understanding it as a semantic graph. 相似文献

10.

A novel T–S fuzzy systems identification with block structured sparse representation

Minnan Luo Fuchun Sun Huaping Liu Zhijun Li 《Journal of The Franklin Institute》2014

相似文献

11.

Planarized sentence representation for nested named entity recognition

《Information processing & management》2023,60(4):103352

One strategy to recognize nested entities is to enumerate overlapped entity spans for classification. However, current models independently verify every entity span, which ignores the semantic dependency between spans. In this paper, we first propose a planarized sentence representation to represent nested named entities. Then, a bi-directional two-dimensional recurrent operation is implemented to learn semantic dependencies between spans. Our method is evaluated on seven public datasets for named entity recognition. It achieves competitive performance in named entity recognition. The experimental results show that our method is effective to resolve nested named entities and learn semantic dependencies between them. 相似文献

12.

Cascade embedding model for knowledge graph inference and retrieval

《Information processing & management》2019,56(6):102093

Knowledge graphs are widely used in retrieval systems, question answering systems (QA), hypothesis generation systems, etc. Representation learning provides a way to mine knowledge graphs to detect missing relations; and translation-based embedding models are a popular form of representation model. Shortcomings of translation-based models however, limits their practicability as knowledge completion algorithms. The proposed model helps to address some of these shortcomings.The similarity between graph structural features of two entities was found to be correlated to the relations of those entities. This correlation can help to solve the problem caused by unbalanced relations and reciprocal relations. We used Node2vec, a graph embedding algorithm, to represent information related to an entity's graph structure, and we introduce a cascade model to incorporate graph embedding with knowledge embedding into a unified framework. The cascade model first refines feature representation in the first two stages (Local Optimization Stage), and then uses backward propagation to optimize parameters of all the stages (Global Optimization Stage). This helps to enhance the knowledge representation of existing translation-based algorithms by taking into account both semantic features and graph features and fusing them to extract more useful information. Besides, different cascade structures are designed to find the optimal solution to the problem of knowledge inference and retrieval.The proposed model was verified using three mainstream knowledge graphs: WIN18, FB15K and BioChem. Experimental results were validated using the hit@10 rate entity prediction task. The proposed model performed better than TransE, giving an average improvement of 2.7% on WN18, 2.3% on FB15k and 28% on BioChem. Improvements were particularly marked where there were problems with unbalanced relations and reciprocal relations. Furthermore, the stepwise-cascade structure is proved to be more effective and significantly outperforms other baselines. 相似文献

13.

Automatic categorization of questions for user-interactive question answering

Wanpeng Song Liu Wenyin Naijie Gu Xiaojun Quan Tianyong Hao 《Information processing & management》2011

Question categorization, which suggests one of a set of predefined categories to a user’s question according to the question’s topic or content, is a useful technique in user-interactive question answering systems. In this paper, we propose an automatic method for question categorization in a user-interactive question answering system. This method includes four steps: feature space construction, topic-wise words identification and weighting, semantic mapping, and similarity calculation. We firstly construct the feature space based on all accumulated questions and calculate the feature vector of each predefined category which contains certain accumulated questions. When a new question is posted, the semantic pattern of the question is used to identify and weigh the important words of the question. After that, the question is semantically mapped into the constructed feature space to enrich its representation. Finally, the similarity between the question and each category is calculated based on their feature vectors. The category with the highest similarity is assigned to the question. The experimental results show that our proposed method achieves good categorization precision and outperforms the traditional categorization methods on the selected test questions. 相似文献

14.

A Cross-Media Deep Relationship Classification Method Using Discrimination Information

《Information processing & management》2020,57(6):102344

Relation classification is one of the most fundamental tasks in the area of cross-media, which is essential for many practical applications such as information extraction, question&answer system, and knowledge base construction. In the cross-media semantic retrieval task, in order to meet the needs of cross-media uniform representation and semantic analysis, it is necessary to analyze the semantic potential relationship and construct semantic-related cross-media knowledge graph. The relationship classification technology is an important part of solving semantic correlation classification. Most of existing methods regard relation classification as a multi-classification task, without considering the correlation between different relationships. However, two relationships in the opposite directions are usually not independent of each other. Hence, this kind of relationships are easily confused in the traditional way. In order to solve the problem of confusing the relationships of the same semantic with different entity directions, this paper proposes a neural network fusing discrimination information for relation classification. In the proposed model, discrimination information is used to distinguish the relationship of the same semantic with different entity directions, the direction of entity in space is transformed into the direction of vector in mathematics by the method of entity vector subtraction, and the result of entity vector subtraction is used as discrimination information. The model consists of three modules: sentence representation module, relation discrimination module and discrimination fusion module. Moreover, two fusion methods are used for feature fusion. One is a Cascade-based feature fusion method, and another is a feature fusion method based on convolution neural network. In addition, this paper uses the new function added by cross-entropy function and deformed Max-Margin function as the loss function of the model. The experimental results show that the proposed discriminant feature is effective in distinguishing confusing relationships, and the proposed loss function can improve the performance of the model to a certain extent. Finally, the proposed model achieves 84.8% of the F1 value without any additional features or NLP analysis tools. Hence, the proposed method has a promising prospect of being incorporated in various cross-media systems. 相似文献

15.

SRL-ESA-TextSum: A text summarization approach based on semantic role labeling and explicit semantic analysis

Muhidin Mohamed Mourad Oussalah 《Information processing & management》2019,56(4):1356-1372

Automatic text summarization attempts to provide an effective solution to today’s unprecedented growth of textual data. This paper proposes an innovative graph-based text summarization framework for generic single and multi document summarization. The summarizer benefits from two well-established text semantic representation techniques; Semantic Role Labelling (SRL) and Explicit Semantic Analysis (ESA) as well as the constantly evolving collective human knowledge in Wikipedia. The SRL is used to achieve sentence semantic parsing whose word tokens are represented as a vector of weighted Wikipedia concepts using ESA method. The essence of the developed framework is to construct a unique concept graph representation underpinned by semantic role-based multi-node (under sentence level) vertices for summarization. We have empirically evaluated the summarization system using the standard publicly available dataset from Document Understanding Conference 2002 (DUC 2002). Experimental results indicate that the proposed summarizer outperforms all state-of-the-art related comparators in the single document summarization based on the ROUGE-1 and ROUGE-2 measures, while also ranking second in the ROUGE-1 and ROUGE-SU4 scores for the multi-document summarization. On the other hand, the testing also demonstrates the scalability of the system, i.e., varying the evaluation data size is shown to have little impact on the summarizer performance, particularly for the single document summarization task. In a nutshell, the findings demonstrate the power of the role-based and vectorial semantic representation when combined with the crowd-sourced knowledge base in Wikipedia. 相似文献

16.

Dynamic commonsense knowledge fused method for Chinese implicit sentiment analysis

《Information processing & management》2022,59(3):102934

Compared with explicit sentiment analysis that attracts considerable attention, implicit sentiment analysis is a more difficult task due to the lack of sentimental words. The abundant information in an external sentimental knowledge base can play a significant complementary and expansion role. In this paper, a sentimental commonsense knowledge graph embedded multi-polarity orthogonal attention model is proposed to learn the implication of the implicit sentiment. We analyzed the effectiveness of different knowledge relations in the ConceptNet knowledge base in detail, and proposed a matching and filtering method to distill useful knowledge tuples for implicit sentiment analysis automatically. By introducing the sentimental information in the knowledge base, the proposed model can extend the semantic of a sentence with an implicit sentiment. Then, a bi-directional long–short term memory model with multi-polarity orthogonal attention is adopted to fuse the distilled sentimental knowledge with the semantic embedding, effectively enriching the representation of sentences. Experiments on the SMP2019-ECISA implicit sentiment dataset show that our model fully utilizes the information of the knowledge base and improves the performance of Chinese implicit sentiment analysis. 相似文献

17.

Multifaceted conceptual image indexing on the world wide web

Fariza Fauzi Mohammed Belkhatir 《Information processing & management》2013

相似文献

18.

Semantic understanding based on multi-feature kernel sparse representation and decision rules for mangrove growth

《Information processing & management》2022,59(2):102813

With the rapid development of remote sensing technology, using remote sensing technology is an important means to monitor the dynamic change of land cover and ecology. In view of the complexity of mangrove ecological monitoring in Dongzhaigang, Hainan Province of China, we propose a semantic understanding method of mangrove remote sensing image by combining a multi-feature kernel sparse classifier with a decision rule model in this paper. First, on the basis of multi-feature extraction, we take into account the spatial context relations of the samples and introduce the kernel function into the sparse representation classifier, a multi-feature kernel sparse representation classifier can be constructed to classify cover types of mangroves and their surrounding objects. Second, in view of growth conditions of mangrove area, we put forward a semantic understanding method of mangrove remote sensing image based on decision rules and divide mangrove and non-mangrove areas by combining classification results of the multi-feature kernel sparse representation classifier. We make a divisibility analysis based on the extracted features of spatial and spectral domains. Then select the best split attribute based on the maximum information gain criterion, to generate a semantic tree and extract semantic rules. Finally, we work on the semantic understanding of mangrove areas in line with decision rules and further divide mangrove areas into two categories: excellent growth and poor growth. Experimental results show that the proposed method can effectively identify mangrove areas and make decisions on mangrove growth. 相似文献

19.

A benchmark for evaluating Arabic contextualized word embedding models

《Information processing & management》2023,60(5):103452

Word embeddings, which represent words as numerical vectors in a high-dimensional space, are contextualized by generating a unique vector representation for each sense of a word based on the surrounding words and sentence structure. They are typically generated using such deep learning models as BERT and trained on large amounts of text data and using self-supervised learning techniques. Resulting embeddings are highly effective at capturing the nuances of language, and have been shown to significantly improve the performance of numerous NLP tasks. Word embeddings represent textual records of human thinking, with all the mental relations that we utilize to produce the succession of sentences that make up texts and discourses. Consequently, the distributed representation of words within embeddings ought to capture the reasoning relations that hold texts together. This paper makes its contribution to the field by proposing a benchmark for the assessment of contextualized word embeddings that probes into their capability for true contextualization by inspecting how well they capture resemblance, contrariety, comparability, identity, relations in time and space, causation, analogy, and sense disambiguation. The proposed metrics adopt a triangulation approach, so they use (1) Hume’s reasoning relations, (2) standard analogy, and (3) sense disambiguation. The benchmark has been evaluated against 22 Arabic contextualized embeddings and has proven to be capable of quantifying their differential performance in terms of these reasoning relations. Results of evaluation of the target embeddings revealed that they do take context into account and that they do reasonably well in sense disambiguation but have weakness in their identification of converseness, synonymy, complementarity, and analogy. Results also show that size of an embedding has diminishing returns because the highly frequent language patterns swamp low frequency patterns. Furthermore, the suggest that future research endeavors should not be concerned with the quantity of data as much as its quality, and that it should focus more on the representativeness of data, and on model architecture, design, and training. 相似文献

20.

LDA单词图像表示的蒙古文古籍图像关键词检索方法

白淑霞鲍玉来《现代情报》2017,37(7):51

[目的]为了克服传统视觉词袋方法（Bag-of-Visual-Words）中忽略视觉单词间的空间关系和语义信息等问题。[方法]本文提出一种与视觉语言模型相结合的基于LDA主题模型,并采用查询似然模型实现检索。[结果]实验数据表明,本文所提出的基于LDA的表示方法可以高效、准确地解决蒙古文古籍的关键词检索问题。[结论]同时,该方法的性能比BoVW方法有显著提高。相似文献