首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
Image and text matching bridges visual and textual modality differences and plays a considerable role in cross-modal retrieval. Much progress has been achieved through semantic representation and alignment. However, the distribution of multimedia data is severely unbalanced and contains many low-frequency occurrences, which are often ignored and cause performance degradation, i.e., the long-tail effect. In this work, we propose a novel rare-aware attention network (RAAN), which explores and exploits textual rare content for tackling the long-tail effect of image and text matching. Specifically, we first design a rare-aware mining module, which contains global prior information construction and rare fragment detector for modeling the characteristic of rare content. Then, the rare attention matching utilizes prior information as attention to guide the representation enhancement of rare content and introduces the rareness representation to strengthen the similarity calculation. Finally, we design prior information loss to optimize the model together with the triplet loss. We perform quantitative and qualitative experiments on two large-scale databases and achieve leading performance. In particular, we conduct 0-shot test for rare content and improve rSum by 21.0 and 41.5 on Flickr30K (155,000 image and text pairs) and MSCOCO (616,435 image and text pairs), demonstrating the effectiveness of the proposed method for the long-tail effect.  相似文献   

2.
Information residing in multiple modalities (e.g., text, image) of social media posts can jointly provide more comprehensive and clearer insights into an ongoing emergency. To identify information valuable for humanitarian aid from noisy multimodal data, we first clarify the categories of humanitarian information, and define a multi-label multimodal humanitarian information identification task, which can adapt to the label inconsistency issue caused by modality independence while maintaining the correlation between modalities. We proposed a Multimodal Humanitarian Information Identification Model that simultaneously captures the Correlation and Independence between modalities (CIMHIM). A tailor-made dataset containing 4,383 annotated text-image pairs was built to evaluate the effectiveness of our model. The experimental results show that CIMHIM outperforms both unimodal and multimodal baseline methods by at least 0.019 in macro-F1 and 0.022 in accuracy. The combination of OCR text, object-level features, and the decision rule based on label correlations enhances the overall performance of CIMHIM. Additional experiments on a similar dataset (CrisisMMD) also demonstrate the robustness of CIMHIM. The task, model, and dataset proposed in this study contribute to the practice of leveraging multimodal social media resources to support effective emergency response.  相似文献   

3.
With the explosion of multilingual content on Web, particularly in social media platforms, identification of languages present in the text is becoming an important task for various applications. While automatic language identification (ALI) in social media text is considered to be a non-trivial task due to the presence of slang words, misspellings, creative spellings and special elements such as hashtags, user mentions etc., ALI in multilingual environment becomes even more challenging task. In a highly multilingual society, code-mixing without affecting the underlying language sense has become a natural phenomenon. In such a dynamic environment, conversational text alone often fails to identify the underlying languages present in the text. This paper proposes various methods of exploiting social conversational features for enhancing ALI performance. Although social conversational features for ALI have been explored previously using methods like probabilistic language modeling, these models often fail to address issues related to code-mixing, phonetic typing, out-of-vocabulary etc. which are prevalent in a highly multilingual environment. This paper differs in the way the social conversational features are used to propose text refinement strategies that are suitable for ALI in highly multilingual environment. The contributions in this paper therefore includes the following. First, this paper analyzes the characteristics of various social conversational features by exploiting language usage patterns. Second, various methods of text refinement suitable for language identification are proposed. Third, the effects of the proposed refinement methods are investigated using various sentence level language identification frameworks. From various experimental observations over three conversational datasets collected from Facebook, Youtube and Twitter social media platforms, it is evident that our proposed method of ALI using social conversational features outperforms the baseline counterparts.  相似文献   

4.
Social media users are increasingly using both images and text to express their opinions and share their experiences, instead of only using text in the conventional social media. Consequently, the conventional text-based sentiment analysis has evolved into more complicated studies of multimodal sentiment analysis. To tackle the challenge of how to effectively exploit the information from both visual content and textual content from image-text posts, this paper proposes a new image-text consistency driven multimodal sentiment analysis approach. The proposed approach explores the correlation between the image and the text, followed by a multimodal adaptive sentiment analysis method. To be more specific, the mid-level visual features extracted by the conventional SentiBank approach are used to represent visual concepts, with the integration of other features, including textual, visual and social features, to develop a machine learning sentiment analysis approach. Extensive experiments are conducted to demonstrate the superior performance of the proposed approach.  相似文献   

5.
6.
本文研究一种改进的近邻搜索算法的图像匹配技术。本文采用基于特征的图像匹配方法,利用SIFT算法提取特征点。在特征点匹配的过程中,为提高搜索样本特征点的最近邻和次近邻特征点的速度,本文采用一种基于二叉检索树算法改进的近邻搜索算法,该算法用最近邻与次近邻比值来进行特征点的匹配。用MATLAB语言实现该算法并运用到图像特征匹配中,实验证明优于原算法并具有较高实时性。  相似文献   

7.
Aiming at the problem that watermark information is easy to be lost in two-dimensional text image watermarking, a three-dimensional text image watermarking model is proposed based on multilayer overlapping of extracted two-dimensional information, and a specific method is accordingly realized by means of embedding, extracting and overlapping of multiple watermarks in sequence. We discuss the influence of the sequence order, positional distancing and color difference of the extracted multiple two-dimensional information on displaying a three-dimensional text watermark image. In addition, considering the crucial evaluation indices of the invisibility and robustness for the watermarking algorithm, the selection method for superior embedding regions of multiple watermarks is also constructed to enhance the robust performance of watermarks under the premise of effective invisibility. Moreover, we embed the multiple two-dimensional information into the host image by using the undecimated dual-tree complex wavelet transforms - bidiagonal singular value decomposition algorithm. In this way, the extracted multiple two-dimensional information is automatically overlapped to generate the three-dimensional text watermarking according to the optimal matching parameters. We use standard image processing data set to carry out experiments, the results show that the peak signal to noise ratio before and after embedded watermarks is higher than 39 dB, and the peak signal to noise ratio between the original watermarks and the extracted watermarks is more than 37 dB, which is superior to the current reported watermark model. Therefore, it is suggested that the proposed model shows good invisibility and robustness and can reduce the loss of watermark during transmission and attack as much as possible.  相似文献   

8.
Pseudo-relevance feedback (PRF) is a well-known method for addressing the mismatch between query intention and query representation. Most current PRF methods consider relevance matching only from the perspective of terms used to sort feedback documents, thus possibly leading to a semantic gap between query representation and document representation. In this work, a PRF framework that combines relevance matching and semantic matching is proposed to improve the quality of feedback documents. Specifically, in the first round of retrieval, we propose a reranking mechanism in which the information of the exact terms and the semantic similarity between the query and document representations are calculated by bidirectional encoder representations from transformers (BERT); this mechanism reduces the text semantic gap by using the semantic information and improves the quality of feedback documents. Then, our proposed PRF framework is constructed to process the results of the first round of retrieval by using probability-based PRF methods and language-model-based PRF methods. Finally, we conduct extensive experiments on four Text Retrieval Conference (TREC) datasets. The results show that the proposed models outperform the robust baseline models in terms of the mean average precision (MAP) and precision P at position 10 (P@10), and the results also highlight that using the combined relevance matching and semantic matching method is more effective than using relevance matching or semantic matching alone in terms of improving the quality of feedback documents.  相似文献   

9.
We investigate how, and to what extent, morphological complexity of the language influences text classification using support vector machines (SVM). The Croatian–English parallel corpus provides the basis for direct comparison of two languages of radically different morphological complexity. We quantified, compared, and statistically tested the effects of morphological normalisation on SVM classifier performance based on a series of parallel experiments on both languages, carried over a large scale of different feature subset sizes obtained by different feature selection methods, and applying different levels of morphological normalisation. We also quantified the trade-off between feature space size and performance for different levels of morphological normalisation, and compared the results for both languages. Our experiments have shown that the improvements in SVM classifier performance is statistically significant; they are greater for small and medium number of features, especially for Croatian, whereas for large number of features the improvements are rather small and may be negligible in practice for both languages.  相似文献   

10.
11.
OCR errors in text harm information retrieval performance. Much research has been reported on modelling and correction of Optical Character Recognition (OCR) errors. Most of the prior work employ language dependent resources or training texts in studying the nature of errors. However, not much research has been reported that focuses on improving retrieval performance from erroneous text in the absence of training data. We propose a novel approach for detecting OCR errors and improving retrieval performance from the erroneous corpus in a situation where training samples are not available to model errors. In this paper we propose a method that automatically identifies erroneous term variants in the noisy corpus, which are used for query expansion, in the absence of clean text. We employ an effective combination of contextual information and string matching techniques. Our proposed approach automatically identifies the erroneous variants of query terms and consequently leads to improvement in retrieval performance through query expansion. Our proposed approach does not use any training data or any language specific resources like thesaurus for identification of error variants. It also does not expend any knowledge about the language except that the word delimiter is blank space. We have tested our approach on erroneous Bangla (Bengali in English) and Hindi FIRE collections, and also on TREC Legal IIT CDIP and TREC 5 Confusion track English corpora. Our proposed approach has achieved statistically significant improvements over the state-of-the-art baselines on most of the datasets.  相似文献   

12.
Traditional content based image retrieval attempts to retrieve images using syntactic features for a query image. Annotated image banks and Google allow the use of text to retrieve images. In this paper, we studied the task of using the content of an image to retrieve information in general. We describe the significance of object identification in an information retrieval paradigm that uses image set as intermediate means in indexing and matching. We also describe a unique Singapore Tourist Object Identification Collection with associated queries and relevance judgments for evaluating the new task and the need for efficient image matching using simple image features. We present comprehensive experimental evaluation on the effects of feature dimensions, context, spatial weightings, coverage of image indexes, and query devices on task performance. Lastly we describe the current system developed to support mobile image-based tourist information retrieval.  相似文献   

13.
Arabic is a widely spoken language but few mining tools have been developed to process Arabic text. This paper examines the crime domain in the Arabic language (unstructured text) using text mining techniques. The development and application of a Crime Profiling System (CPS) is presented. The system is able to extract meaningful information, in this case the type of crime, location and nationality, from Arabic language crime news reports. The system has two unique attributes; firstly, information extraction that depends on local grammar, and secondly, dictionaries that can be automatically generated. It is shown that the CPS improves the quality of the data through reduction where only meaningful information is retained. Moreover, the Self Organising Map (SOM) approach is adopted in order to perform the clustering of the crime reports, based on crime type. This clustering technique is improved because only refined data containing meaningful keywords extracted through the information extraction process are inputted into it, i.e. the data are cleansed by removing noise. The proposed system is validated through experiments using a corpus collated from different sources; it was not used during system development. Precision, recall and F-measure are used to evaluate the performance of the proposed information extraction approach. Also, comparisons are conducted with other systems. In order to evaluate the clustering performance, three parameters are used: data size, loading time and quantization error.  相似文献   

14.
In this paper, we investigate two new reflectance and illumination decomposition models based on a nonlocal partial differential equation (PDE) applied to text images. Taking into consideration the higher regularity level of the illumination compared to the reflectance, we propose a nonlocal PDE which deals with repetitive structures and textures that characterize the text image much better compared to the classical local PDEs. The aim of this approach is to use the repetitive features of the reflectance to efficiently extract it from the non-uniform illumination. This idea is motivated by extending the range of application of the nonlocal operators to such a problem. Numerical experiments on both grayscale and color text images show the performance and strength of the proposed nonlocal PDE.  相似文献   

15.
Image–text matching is a crucial branch in multimedia retrieval which relies on learning inter-modal correspondences. Most existing methods focus on global or local correspondence and fail to explore fine-grained global–local alignment. Moreover, the issue of how to infer more accurate similarity scores remains unresolved. In this study, we propose a novel unifying knowledge iterative dissemination and relational reconstruction (KIDRR) network for image–text matching. Particularly, the knowledge graph iterative dissemination module is designed to iteratively broadcast global semantic knowledge, enabling relevant nodes to be associated, resulting in fine-grained intra-modal correlations and features. Hence, vector-based similarity representations are learned from multiple perspectives to model multi-level alignments comprehensively. The relation graph reconstruction module is further developed to enhance cross-modal correspondences by constructing similarity relation graphs and adaptively reconstructing them. We conducted experiments on the datasets Flickr30K and MSCOCO, which have 31,783 and 123,287 images, respectively. Experiments show that KIDRR achieves improvements of nearly 2.2% and 1.6% relative to Recall@1 on Flicr30K and MSCOCO, respectively, compared to the current state-of-the-art baselines.  相似文献   

16.
何小琴 《现代情报》2012,32(8):45-48
采购联盟合作伙伴的选择是采购联盟成功的一个关键,而伙伴搜索是伙伴选择重要的第一步。本文将电子商务中采购联盟伙伴搜索问题转换为采购需求文本的语义匹配问题,介绍了一种基于领域本体和语义相似度的采购联盟伙伴搜索模型。该模型通过对采购需求文本概念向量的上位填充和语义相似度计算来量化采购需求的语义匹配程度。  相似文献   

17.
18.
High quality summary is the target and challenge for any automatic text summarization. In this paper, we introduce a different hybrid model for automatic text summarization problem. We exploit strengths of different techniques in building our model: we use diversity-based method to filter similar sentences and select the most diverse ones, differentiate between the more important and less important features using the swarm-based method and use fuzzy logic to make the risks, uncertainty, ambiguity and imprecise values of the text features weights flexibly tolerated. The diversity-based method focuses to reduce redundancy problems and the other two techniques concentrate on the scoring mechanism of the sentences. We presented the proposed model in two forms. In the first form of the model, diversity measures dominate the behavior of the model. In the second form, the diversity constraint is no longer imposed on the model behavior. That means the diversity-based method works same as fuzzy swarm-based method. The results showed that the proposed model in the second form performs better than the first form, the swarm model, the fuzzy swarm method and the benchmark methods. Over results show that combination of diversity measures, swarm techniques and fuzzy logic can generate good summary containing the most important parts in the document.  相似文献   

19.
XMage is introduced in this paper as a method for partial similarity searching in image databases. Region-based image retrieval is a method of retrieving partially similar images. It has been proposed as a way to accurately process queries in an image database. In region-based image retrieval, region matching is indispensable for computing the partial similarity between two images because the query processing is based upon regions instead of the entire image. A naive method of region matching is a sequential comparison between regions, which causes severe overhead and deteriorates the performance of query processing. In this paper, a new image contents representation, called Condensed eXtended Histogram (CXHistogram), is presented in conjunction with a well-defined distance function CXSim() on the CX-Histogram. The CXSim() is a new image-to-image similarity measure to compute the partial similarity between two images. It achieves the effect of comparing regions of two images by simply comparing the two images. The CXSim() reduces query space by pruning irrelevant images, and it is used as a filtering function before sequential scanning. Extensive experiments were performed on real image data to evaluate XMage. It provides a significant pruning of irrelevant images with no false dismissals. As a consequence, it achieves up to 5.9-fold speed-up in search over the R*-tree search followed by sequential scanning.  相似文献   

20.
The replies of people seeking support in online mental health communities can be analyzed to discover if they feel better after receiving support; feeling better indicates a cognitive change. Most research uses key phrase matching and word frequency statistics to identify psychological cognitive change, methods that result in omissions and inaccuracy. This study constructs an intelligent method for identifying psychological cognitive change based on natural language processing technology. It incorporates information related to emotions that appears in reply text to help identify whether psychological cognitive change has occurred. The model first encodes the emotion information based on rule matching and manual annotation, then adds the encoded emotion lexicon and a cognitive change lexicon to a word2vec high-dimensional semantic word vector training, converts the annotated cognitive change recognition text into a vector matrix using the trained model, and train in the annotated text using TextCNN. To compare the results with those of the traditional methods (key phrase matching and sentiment word frequency statistics), this study uses a semi-automated approach to construct a lexicon of psychological cognitive change, as well as a keyword lexicon without cognitive change, based on word vectors and similarity. We compare the performance of the classifier before and after the fusion of the graphical emotion information, compare the LSTM and Transformer as baselines, and compare traditional word frequency statistics methods. The experimental results show that our proposed classification model performs better than the others; it achieves 84.38% precision, an 84.09% recall rate, and an 84.17% F1 value. Our work bears methodological implications for online mental health platforms.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号