
1.

Tang Xiaobo (唐晓波) Liu Zhiyuan (刘志源) 《情报科学》 (Information Science) 2021,39(5):3-11

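[Purpose/Significance] Entity relation extraction in the financial domain is the foundation for building financial knowledge bases and plays an important role in making use of financial text. This paper proposes a joint entity-relation extraction model for the financial domain that adds recognition of the complex overlapping relations found in financial text and effectively avoids the propagation of recognition errors between tasks that occurs in traditional pipeline models. [Method/Process] The paper builds a high-quality financial text corpus, proposes a new sequence tagging scheme and entity-relation matching rules, and builds an end-to-end sequence labeling model that combines bidirectional gated recurrent units (BiGRU) and a conditional random field (CRF) on top of the pre-trained language model BERT (Bidirectional Encoder Representations from Transformers), achieving joint extraction of entities and relations. [Results/Conclusions] Experiments on financial-domain text show that the proposed joint model reaches F1 scores of 0.627 on relation extraction and 0.543 on overlapping relation extraction, providing preliminary validation of the model's effectiveness for financial entity relation extraction in Chinese. [Innovation/Limitations] A new sequence tagging scheme is proposed based on the characteristics of financial text, and a BERT-based joint entity-relation extraction model for the financial domain is built, enabling recognition of overlapping relations between entities in financial text.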

2.

Boundaries and edges rethinking: An end-to-end neural model for overlapping entity relation extraction

《Information processing & management》2020,57(6):102311

Overlapping entity relation extraction has received extensive research attention in recent years. However, existing methods suffer from the limitation of long-distance dependencies between entities, and fail to extract the relations when the overlapping situation is relatively complex. This issue limits the performance of the task. In this paper, we propose an end-to-end neural model for overlapping relation extraction by treating the task as a quintuple prediction problem. The proposed method first constructs the entity graphs by enumerating possible candidate spans, then models the relational graphs between entities via a graph attention model. Experimental results on five benchmark datasets show that the proposed model achieves the current best performance, outperforming previous methods and baseline systems by a large margin. Further analysis shows that our model can effectively capture the long-distance dependencies between entities in a long sentence.
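The span enumeration step mentioned in this abstract can be illustrated with a minimal sketch. The function name `enumerate_spans`, the `max_width` cap, and the example sentence are assumptions made for illustration; the abstract does not say how the paper bounds or scores its candidate spans.

```python
from typing import List, Tuple


def enumerate_spans(tokens: List[str], max_width: int) -> List[Tuple[int, int]]:
    """Return every (start, end) index pair, end exclusive, covering at most max_width tokens."""
    spans = []
    for start in range(len(tokens)):
        # The last admissible end index is limited by both the width cap and the sentence length.
        last_end = min(start + max_width, len(tokens))
        for end in range(start + 1, last_end + 1):
            spans.append((start, end))
    return spans


# Hypothetical example sentence, not taken from the paper.
tokens = ["Acme", "Corp", "acquired", "Widget", "Inc"]
spans = enumerate_spans(tokens, max_width=2)
candidates = [" ".join(tokens[s:e]) for s, e in spans]
```

Every contiguous span up to the width cap becomes a candidate entity node, so overlapping mentions such as "Acme" and "Acme Corp" are both retained for later scoring.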

3.

AHAB: Aligning heterogeneous knowledge bases via iterative blocking

Ling Chen Weidong Gu Xiaoxue Tian Gencai Chen 《Information processing & management》2019,56(1):1-13

With the development of information extraction, there have been an increasing number of large-scale knowledge bases available in different domains. In recent years, a great many approaches have been proposed for large-scale knowledge base alignment. Most of them are based on iterative matching. If a pair of entities has been aligned, their compatible neighbors are selected as candidate entity pairs. The limitation of these methods is that they discover candidate entity pairs depending on aligned relations, which cannot be used for aligning heterogeneous knowledge bases. Only a few existing methods focus on aligning heterogeneous knowledge bases, and these discover candidate entity pairs only once, using traditional blocking methods. However, the performance of these methods depends heavily on blocking keys, which are hard to select. In this paper, we present an approach for aligning heterogeneous knowledge bases via iterative blocking (AHAB) to improve the discovery and refinement of candidate entity pairs. AHAB iteratively utilizes different relations for blocking, and then matches block pairs based on matched entity pairs. The Cartesian product of unmatched entities in matched block pairs forms candidate entity pairs. By filtering out dissimilar candidate entity pairs, matched entity pairs will be found. The number of matched entity pairs proliferates with iterations, which in turn helps match block pairs in each iteration. Experiments on real-world heterogeneous knowledge bases demonstrate that AHAB is able to yield a competitive performance.
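The candidate generation and filtering step described in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the token-set Jaccard similarity, the `threshold` value, the function names, and the toy entity labels are all hypothetical, since the abstract does not name the similarity measure AHAB uses.

```python
from itertools import product
from typing import List, Set, Tuple


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity between two entity labels (an assumed measure)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def candidate_pairs(
    block_a: List[str],
    block_b: List[str],
    matched: Set[Tuple[str, str]],
    threshold: float,
) -> List[Tuple[str, str]]:
    """Cartesian product of the unmatched entities in a matched block pair, minus dissimilar pairs."""
    matched_a = {a for a, _ in matched}
    matched_b = {b for _, b in matched}
    free_a = [e for e in block_a if e not in matched_a]
    free_b = [e for e in block_b if e not in matched_b]
    return [(a, b) for a, b in product(free_a, free_b) if jaccard(a, b) >= threshold]


# Hypothetical blocks from two knowledge bases; one entity pair is already matched.
block_a = ["Alan Turing", "Turing Award", "Bletchley Park"]
block_b = ["Alan M. Turing", "Bletchley Park", "ACM Turing Award"]
matched = {("Bletchley Park", "Bletchley Park")}
pairs = candidate_pairs(block_a, block_b, matched, threshold=0.5)
```

In an iterative setting, the pairs that survive the filter would be added to `matched` and the procedure repeated with blocks built from a different relation.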

4.

Exploring syntactic structured features over parse trees for relation extraction using kernel methods

Min Zhang GuoDong Zhou Aiti Aw 《Information processing & management》2008

Extracting semantic relationships between entities from text documents is challenging in information extraction and important for deep information processing and management. This paper proposes to use the convolution kernel over parse trees together with support vector machines to model syntactic structured information for relation extraction. Compared with linear kernels, tree kernels can implicitly explore the very large number of syntactic structured features embedded in a parse tree. Our study reveals that the syntactic structured features embedded in a parse tree are very effective in relation extraction and can be well captured by the convolution tree kernel. Evaluation on the ACE benchmark corpora shows that the convolution tree kernel alone achieves performance comparable with the previous best-reported feature-based methods. It also shows that our method significantly outperforms the two previous dependency tree kernels for relation extraction. Moreover, this paper proposes a composite kernel for relation extraction by combining the convolution tree kernel with a simple linear kernel. Our study reveals that the composite kernel can effectively capture both flat and structured features without extensive feature engineering, and can easily scale to include more features. Evaluation on the ACE benchmark corpora shows that the composite kernel outperforms previous best-reported methods in relation extraction.
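One standard way to combine a linear kernel with a tree kernel, shown here only as an assumed illustration, is a convex combination. The symbols K_L (linear kernel over flat features), K_T (convolution tree kernel), the relation instances R_1 and R_2, and the weight α are notation introduced for this sketch; the abstract does not state which combination the paper uses.

```latex
K_{\mathrm{comp}}(R_1, R_2) = \alpha \, K_L(R_1, R_2) + (1 - \alpha) \, K_T(R_1, R_2),
\qquad 0 \le \alpha \le 1
```

Because a non-negative weighted sum of valid kernels is itself a valid kernel, a combination of this form can be used directly in a support vector machine.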

5.

Automatically building templates for entity summary construction

Peng Li Yinglin Wang Jing Jiang 《Information processing & management》2013

In this paper, we propose a novel approach to automatic generation of summary templates from given collections of summary articles. We first develop an entity-aspect LDA model to simultaneously cluster both sentences and words into aspects. We then apply frequent subtree pattern mining on the dependency parse trees of the clustered and labeled sentences to discover sentence patterns that well represent the aspects. Finally, we use the generated templates to construct summaries for new entities. Key features of our method include automatic grouping of semantically related sentence patterns and automatic identification of template slots that need to be filled in. We also implement a new sentence compression algorithm which uses dependency trees instead of parse trees. We apply our method to five Wikipedia entity categories and compare it with three baseline methods. Both quantitative evaluation based on human judgment and qualitative comparison demonstrate the effectiveness and advantages of our method.

6.

Incremental cue phrase learning and bootstrapping method for causality extraction using cue phrase and word pair probabilities

Du-Seong Chang Key-Sun Choi 《Information processing & management》2006

This work aims to extract possible causal relations that exist between noun phrases. Some causal relations are manifested by lexical patterns like causal verbs and their sub-categorization. We use lexical patterns as a filter to find causality candidates and we transform the causality extraction problem into a binary classification problem. To solve the problem, we introduce probabilities for word pairs and concept pairs that could be part of causal noun phrase pairs. We also use the probability that a cue phrase is a causality pattern. These probabilities are learned from the raw corpus in an unsupervised manner. With this probabilistic model, we increase both precision and recall. Our causality extraction shows an F-score of 77.37%, which is an improvement of 21.14 percentage points over the baseline model. Long-distance causal relations are extracted with the binary tree-styled cue phrase. We propose an incremental cue phrase learning method based on the cue phrase confidence score measured after each causal classifier learning step. Recall improves by 15.37 percentage points after the cue phrase learning.

7.

Feature-enriched matrix factorization for relation extraction

Duc-Thuan Vo Ebrahim Bagheri 《Information processing & management》2019,56(3):424-444

Relation extraction aims at finding meaningful relationships between two named entities from within unstructured textual content. In this paper, we define the problem of information extraction as a matrix completion problem where we employ the notion of universal schemas formed as a collection of patterns derived from open information extraction systems as well as additional features derived from grammatical clause patterns and statistical topic models. One of the challenges with earlier work that employs matrix completion methods is that such approaches require a sufficient number of observed relation instances to be able to make predictions. However, in practice there is often an insufficient amount of explicit evidence supporting each relation type that could be used within the matrix model. Hence, existing work suffers from low recall. In our work, we extend the state of the art by proposing novel ways of integrating two sets of features, i.e., topic models and grammatical clause structures, for alleviating the low recall problem. More specifically, we propose that it is possible to (1) employ grammatical clause information from textual sentences to serve as an implicit indication of relation type and argument similarity, on the basis that similar relation types and arguments are likely to be observed within similar grammatical structures, and (2) benefit from statistical topic models to determine relation type and argument similarity based on their co-occurrence within the same topics. We have performed extensive experiments based on both gold standard and silver standard datasets. The experiments show that our approach addresses the low recall problem in existing methods, with an improvement of 21% on recall and 8% on F-measure over the state-of-the-art baseline.

8.

TSVFN: Two-Stage Visual Fusion Network for multimodal relation extraction

《Information processing & management》2023,60(3):103264

Multimodal relation extraction is a critical task in information extraction, aiming to predict the class of relations between head and tail entities from linguistic sequences and related images. However, current works are vulnerable to less relevant visual objects detected from images and are not able to sufficiently fuse visual information into text pre-trained models. To overcome these problems, we propose a Two-Stage Visual Fusion Network (TSVFN) that employs the multimodal fusion approach in vision-enhanced entity relation extraction. In the first stage, we design multimodal graphs, whose novelty lies mainly in transforming sequence learning into graph learning. In the second stage, we merge the transformer-based visual representation into the text pre-trained model by a multi-scale cross-model projector. Specifically, two multimodal fusion operations are implemented inside the pre-trained model respectively. We finally accomplish deep interaction of multimodal multi-structured data in two fusion stages. Extensive experiments on the MNRE dataset show that our model outperforms the current state-of-the-art method by 1.76%, 1.52%, 1.29%, and 1.17% in terms of accuracy, precision, recall, and F1 score, respectively. Moreover, our model also achieves excellent results under the condition of fewer samples.

9.

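Automatic identification and mining of Chinese synonyms for information retrieval  Total citations: 3, self-citations: 0, citations by others: 3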

Lu Yong (陆勇) Hou Hanqing (侯汉清) 《情报理论与实践》 (Information Studies: Theory & Application) 2006,29(4):472-475

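To improve the efficiency of automatic synonym mining, this paper proposes a method for automatically identifying and mining synonyms from dictionary definitions, using a hyperlink analysis algorithm and a pattern matching algorithm to extract synonyms from different angles. The first part treats the relation between a defining word and the word it defines as a link. For a given word, all related words that have a link relation with it are assembled into a word graph, in which each node represents a related word and each arc represents a defining/defined relation between words. Using hyperlink analysis combined with the PageRank algorithm, a PageRank value is computed for each word and treated as a measure of semantic similarity between words; a candidate synonym set is then generated for each word, and the best synonyms are recommended according to certain filtering principles and methods. The second part uses word definition patterns: it analyzes how words are defined, summarizes the patterns in which synonyms appear in dictionary definitions, and then uses pattern matching to identify and mine synonyms. In addition, the pattern matching method was also tested on mining synonyms from Web pages and journal papers. The test results show that using pattern matching and hyperlink analysis to automatically identify and mine synonyms is feasible and practical.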
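The first part of this method, scoring the words in a definition graph with PageRank, can be sketched with a plain power iteration. The damping factor, the iteration count, the handling of words with no outgoing links, and the toy English graph are assumptions for illustration; the abstract does not give the paper's parameters or its filtering rules.

```python
from typing import Dict, List


def pagerank(graph: Dict[str, List[str]], damping: float = 0.85, iterations: int = 50) -> Dict[str, float]:
    """Power-iteration PageRank; every link target must also appear as a key of graph."""
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in graph.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A word with no outgoing links spreads its rank evenly over all words.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical graph: each word points to the words that appear in its dictionary definition.
definition_graph = {
    "quick": ["fast", "rapid"],
    "fast": ["quick"],
    "rapid": ["quick", "fast"],
    "slow": [],
}
scores = pagerank(definition_graph)
ranked_words = sorted(scores, key=scores.get, reverse=True)
```

Words ranked highly in the graph built around a given word would then form its candidate synonym set, to be narrowed by whatever filtering rules are applied afterwards.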

10.

Social relation extraction from texts using a support-vector-machine-based dependency trigram kernel

Maengsik Choi Harksoo Kim 《Information processing & management》2013

We propose a social relation extraction system using dependency-kernel-based support vector machines (SVMs). The proposed system classifies input sentences containing two people’s names on the basis of whether they do or do not describe social relations between two people. The system then extracts relation names (i.e., social-related keywords) from sentences describing social relations. We propose new tree kernels called dependency trigram kernels for effectively implementing these processes using SVMs. Experiments showed that the proposed kernels delivered better performance than the existing dependency kernel. On the basis of the experimental evidence, we suggest that the proposed system can be used as a useful tool for automatically constructing social networks from unstructured texts.

11.

Predicate constraints based question answering over knowledge graph

《Information processing & management》2019,56(3):445-462

Generally, QA systems suffer from the structural difference where a question is composed of unstructured data, while its answer is made up of structured data in a Knowledge Graph (KG). To bridge this gap, most approaches use lexicons to cover data that are represented differently. However, the existing lexicons merely deal with representations for entity and relation mentions rather than consulting the comprehensive meaning of the question. To resolve this, we design a novel predicate constraints lexicon which restricts subject and object types for a predicate. It facilitates a comprehensive validation of a subject, predicate and object simultaneously. In this paper, we propose Predicate Constraints based Question Answering (PCQA). Our method prunes inappropriate entity/relation matchings to reduce the search space, thus leading to an improvement in accuracy. Unlike the existing QA systems, we do not use any templates but generate query graphs to cover diverse types of questions. In query graph generation, we put more focus on matching relations rather than linking entities. This is well-suited to the use of predicate constraints. Our experimental results prove the validity of our approach and demonstrate a reasonable performance compared to other methods which target the WebQuestions and Free917 benchmarks.

12.

An end-to-end joint model for evidence information extraction from court record document

《Information processing & management》2020,57(6):102305

Information extraction is one of the important tasks in the field of Natural Language Processing (NLP). Most of the existing methods focus on general texts and little attention is paid to information extraction in specialized domains such as legal texts. This paper explores the task of information extraction in the legal field, which aims to extract evidence information from court record documents (CRDs). In the general domain, entities and relations are mostly words and phrases, indicating that they do not span multiple sentences. In contrast, evidence information in CRDs may span multiple sentences, and existing models cannot handle this situation. To address this issue, we first add a classification task in addition to the extraction task. We then formulate the two tasks as a multi-task learning problem and present a novel end-to-end model to jointly address the two tasks. The joint model adopts a shared encoder followed by separate decoders for the two tasks. The experimental results on the dataset show the effectiveness of the proposed model, which obtains a 72.36% F1 score, outperforming previous methods and strong baselines by a large margin.

13.

Extracting relation information from text documents by exploring various types of knowledge

GuoDong Zhou Min Zhang 《Information processing & management》2007

Extracting semantic relationships between entities from text documents is challenging in information extraction and important for deep information processing and management. This paper investigates the incorporation of diverse lexical, syntactic and semantic knowledge in feature-based relation extraction using support vector machines. Our study illustrates that the base phrase chunking information is very effective for relation extraction and contributes most of the performance improvement from the syntactic aspect, while the commonly used features from full parsing give limited further enhancement. This suggests that most of the useful information in full parse trees for relation extraction is shallow and can be captured by chunking. This indicates that a cheap and robust solution in relation extraction can be achieved without decreasing performance too much. We also demonstrate how semantic information such as WordNet can be used in feature-based relation extraction to further improve the performance. Evaluation on the ACE benchmark corpora shows that effective incorporation of diverse features enables our system to outperform previously best-reported systems. It also shows that our feature-based system significantly outperforms tree kernel-based systems. This suggests that current tree kernels fail to effectively explore structured syntactic information in relation extraction.

14.

Robot rights? Towards a social-relational justification of moral consideration

Mark Coeckelbergh 《Ethics and Information Technology》2010,12(3):209-221

Should we grant rights to artificially intelligent robots? Most current and near-future robots do not meet the hard criteria set by deontological and utilitarian theory. Virtue ethics can avoid this problem with its indirect approach. However, both direct and indirect arguments for moral consideration rest on ontological features of entities, an approach which incurs several problems. In response to these difficulties, this paper taps into a different conceptual resource in order to be able to grant some degree of moral consideration to some intelligent social robots: it sketches a novel argument for moral consideration based on social relations. It is shown that to further develop this argument we need to revise our existing ontological and social-political frameworks. It is suggested that we need a social ecology, which may be developed by engaging with Western ecology and Eastern worldviews. Although this relational turn raises many difficult issues and requires more work, this paper provides a rough outline of an alternative approach to moral consideration that can assist us in shaping our relations to intelligent robots and, by extension, to all artificial and biological entities that appear to us as more than instruments for our human purposes.

15.

Using cause-effect relations in text to improve information retrieval precision

《Information processing & management》2001,37(1):119-145

This study attempted to use semantic relations expressed in text, in particular cause-effect relations, to improve information retrieval effectiveness. The study investigated whether the information obtained by matching cause-effect relations expressed in documents with the cause-effect relations expressed in users’ queries can be used to improve document retrieval results, in comparison to using just keyword matching without considering relations. An automatic method for identifying and extracting cause-effect information in Wall Street Journal text was developed. Causal relation matching was found to yield a small but significant improvement in retrieval results when the weights used for combining the scores from different types of matching were customized for each query. Causal relation matching did not perform better than word proximity matching (i.e. matching pairs of causally related words in the query with pairs of words that co-occur within document sentences), but the best results were obtained when causal relation matching was combined with word proximity matching. The best kind of causal relation matching was found to be one in which one member of the causal relation (either the cause or the effect) was represented as a wildcard that could match with any word.
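The wildcard form of causal relation matching described at the end of this abstract can be illustrated with a small sketch. Representing a relation as a (cause, effect) pair, using `None` as the wildcard, and counting exact-word matches are assumptions made here; the study's actual matching and score weighting are not specified in the abstract.

```python
from typing import List, Optional, Tuple

# A query relation is (cause, effect); None stands for a wildcard that matches any word.
QueryRelation = Tuple[Optional[str], Optional[str]]
DocRelation = Tuple[str, str]


def matches(query: QueryRelation, doc: DocRelation) -> bool:
    """True when every non-wildcard member of the query equals the document's member."""
    q_cause, q_effect = query
    d_cause, d_effect = doc
    cause_ok = q_cause is None or q_cause == d_cause
    effect_ok = q_effect is None or q_effect == d_effect
    return cause_ok and effect_ok


def count_matches(query: QueryRelation, doc_relations: List[DocRelation]) -> int:
    """Number of cause-effect relations in a document that satisfy the query relation."""
    return sum(matches(query, relation) for relation in doc_relations)


# Hypothetical relations extracted from one document.
doc_relations = [("smoking", "cancer"), ("rain", "flooding"), ("asbestos", "cancer")]
hits_exact = count_matches(("smoking", "cancer"), doc_relations)
hits_wildcard_cause = count_matches((None, "cancer"), doc_relations)
hits_wildcard_effect = count_matches(("smoking", None), doc_relations)
```

A count like this would be only one component of a retrieval score; the abstract notes that the best results came from combining it with word proximity matching.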

16.

Crime base: Towards building a knowledge base for crime entities and their relationships from online news papers

《Information processing & management》2019,56(6):102059

In the current era of the internet, information related to crime is scattered across many sources, namely news media, social networks, blogs, and video repositories. Crime reports published in online newspapers are often considered more reliable than crowdsourced data like social media and contain crime information not only in the form of unstructured text but also in the form of images. Given the volume and availability of crime-related information present in online newspapers, gathering and integrating crime entities from multiple modalities and representing them as a knowledge base in machine-readable form will be useful for law enforcement agencies to analyze and prevent criminal activities. Existing research on generating crime knowledge bases does not address the extraction of all non-redundant entities from text and image data present in multiple newspapers. Hence, this work proposes Crime Base, an entity-relationship-based system to extract and integrate crime-related text and image data from online newspapers with a focus on reducing duplication and loss of information in the knowledge base. The proposed system uses a rule-based approach to extract the entities from text and image captions. The entities extracted from text data are correlated using contextual as well as semantic similarity measures, and image entities are correlated using low-level and high-level image features. The proposed system also presents an integrated view of these entities and their relations in the form of a knowledge base using OWL. The system is tested on a collection of crime-related articles from popular Indian online newspapers.

17.

Spatial multi-scaled chimera states of cerebral cortex network and its inherent structure-dynamics relationship in human brain

Siyu Huo Changhai Tian Muhua Zheng Shuguang Guan Changsong Zhou Zonghua Liu 《National Science Review》2021,8(1)

The human cerebral cortex displays various dynamic patterns under different states; however, the mechanism by which such diverse patterns can be supported by the underlying brain network is still not well understood. The human brain has a unique network structure with different regions of interest that perform cognitive tasks. Using coupled neural mass oscillators on the human cortical network and paying attention to both global and local regions, we observe a new feature of chimera states with multiple spatial scales and a positive correlation between the synchronization preference of a local region and the degree of symmetry of the connectivity of the region in the network. Further, we use the concept of effective symmetry in the network to build structural and dynamical hierarchical trees and find close matching between them. These results help to explain the multiple brain rhythms observed in experiments and suggest a generic principle for the complex brain network as a structural substrate to support diverse functional patterns.

18.

Self-training on refined clause patterns for relation extraction

Duc-Thuan Vo Ebrahim Bagheri 《Information processing & management》2018,54(4):686-706

Within the context of Information Extraction (IE), relation extraction is oriented towards identifying a variety of relation phrases and their arguments in arbitrary sentences. In this paper, we present a clause-based framework for information extraction in textual documents. Our framework focuses on two important challenges in information extraction: 1) Open Information Extraction (OIE), and 2) Relation Extraction (RE). In the plethora of research that focuses on the use of syntactic and dependency parsing for the purposes of detecting relations, there has been increasing evidence of incoherent and uninformative extractions. The extracted relations may even be erroneous at times and fail to provide a meaningful interpretation. In our work, we use the English clause structure and clause types in an effort to generate propositions that can be deemed as extractable relations. Moreover, we propose refinements to the grammatical structure of syntactic and dependency parsing that help reduce the number of incoherent and uninformative extractions from clauses. In our experiments both in the open information extraction and relation extraction domains, we carefully evaluate our system on various benchmark datasets and compare the performance of our work against existing state-of-the-art information extraction systems. Our work shows improved performance compared to the state-of-the-art techniques.

19.

Extracting temporal and causal relations based on event networks

《Information processing & management》2020,57(6):102319

Event relations specify how different event flows expressed within the context of a textual passage relate to each other in terms of temporal and causal sequences. There has already been impactful work in the area of temporal and causal event relation extraction; however, the challenge with these approaches is that (1) they are mostly supervised methods and (2) they rely on syntactic and grammatical structure patterns at the sentence level. In this paper, we address these challenges by proposing an unsupervised event network representation for temporal and causal relation extraction that operates at the document level. More specifically, we benefit from existing Open IE systems to generate a set of triple relations that are then used to build an event network. The event network is bootstrapped by labeling the temporal disposition of events that are directly linked to each other. We then systematically traverse the event network to identify the temporal and causal relations between indirectly connected events. We perform experiments based on the widely adopted TempEval-3 and Causal-TimeBank corpora and compare our work with several strong baselines. We show that our method improves performance compared to several strong methods.
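The traversal step, which derives relations between indirectly connected events from directly labeled ones, can be sketched as a reachability search. This sketch assumes that the only label is BEFORE, that BEFORE is transitive, and that the network has no cycles; the function name and the example events are hypothetical, and the paper's actual traversal rules are not given in the abstract.

```python
from collections import deque
from typing import Dict, List, Set


def events_after(direct_before: Dict[str, List[str]], start: str) -> Set[str]:
    """All events reachable from start by following BEFORE edges (breadth-first)."""
    seen: Set[str] = set()
    queue = deque(direct_before.get(start, []))
    while queue:
        event = queue.popleft()
        if event in seen:
            continue
        seen.add(event)
        queue.extend(direct_before.get(event, []))
    return seen


# Hypothetical network: each key happens BEFORE every event in its list.
direct_before = {
    "earthquake": ["tsunami"],
    "tsunami": ["evacuation"],
    "evacuation": ["relief effort"],
}
all_later = events_after(direct_before, "earthquake")
# Relations that were not labeled directly but follow from the traversal.
inferred = all_later - set(direct_before["earthquake"])
```

Here "earthquake" is linked directly only to "tsunami", yet the traversal also places it before "evacuation" and "relief effort".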

20.

An exploration of ranking models and feedback method for related entity finding

Xitong Liu Wei Zheng Hui Fang 《Information processing & management》2013

Most existing search engines focus on document retrieval. However, information needs are certainly not limited to finding relevant documents. Instead, a user may want to find relevant entities such as persons and organizations. In this paper, we study the problem of related entity finding. Our goal is to rank entities based on their relevance to a structured query, which specifies an input entity, the type of related entities and the relation between the input and related entities. We first discuss a general probabilistic framework, derive six possible retrieval models to rank the related entities, and then compare these models both analytically and empirically. To further improve performance, we study the problem of feedback in the context of related entity finding. Specifically, we propose a mixture model based feedback method that can utilize the pseudo feedback entities to estimate an enriched model for the relation between the input and related entities. Experimental results over two standard TREC collections show that the derived relation generation model combined with a relation feedback method performs better than other models.