Similar literature
 Found 20 similar documents (search time: 484 ms)
1.
Graph-based recommendation approaches use a graph model to represent the relationships between users and items, and exploit the graph structure to make recommendations. Recent graph-based recommendation approaches have focused on capturing users’ pairwise preferences and have used a graph model to exploit the relationships between different entities in the graph. In this paper, we focus on the impact of pairwise preferences on the diversity of recommendations. We propose a novel graph-based, ranking-oriented recommendation algorithm that exploits both explicit and implicit user feedback. The algorithm uses a user-preference-item tripartite graph model and a modified resource allocation process to match the target user with users who share similar preferences and to make personalized recommendations. The additional preference layer captures users’ pairwise preferences and provides detailed user information for further recommendations. Empirical analysis on four benchmark datasets demonstrates that the proposed algorithm performs better in most situations than other graph-based and ranking-oriented benchmark algorithms.
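The resource allocation idea mentioned in this abstract can be illustrated with a minimal sketch. It assumes the plain two-step allocation (often called mass diffusion) on a user-item bipartite graph, not the paper's modified process or its tripartite preference layer; the function name `mass_diffusion_scores` and the toy data are hypothetical.

```python
from collections import defaultdict

def mass_diffusion_scores(user_items, target_user):
    """Score unseen items for target_user by two-step resource allocation.

    user_items maps each user to the set of items they collected
    (an assumed input format, not taken from the paper).
    """
    # Build the item -> users side of the bipartite graph.
    item_users = defaultdict(set)
    for user, items in user_items.items():
        for item in items:
            item_users[item].add(user)

    # Step 1: every item of the target user holds one unit of resource
    # and splits it equally among all users who collected that item.
    user_resource = defaultdict(float)
    for item in user_items[target_user]:
        share = 1.0 / len(item_users[item])
        for user in item_users[item]:
            user_resource[user] += share

    # Step 2: every user splits the received resource equally among
    # the items they collected.
    item_resource = defaultdict(float)
    for user, resource in user_resource.items():
        share = resource / len(user_items[user])
        for item in user_items[user]:
            item_resource[item] += share

    # Keep only items the target user has not collected yet.
    seen = user_items[target_user]
    return {i: s for i, s in item_resource.items() if i not in seen}

toy = {"u1": {"a", "b"}, "u2": {"a", "c"}, "u3": {"b", "c", "d"}}
scores = mass_diffusion_scores(toy, "u1")
ranking = sorted(scores.items(), key=lambda kv: -kv[1])
```

In this toy graph, item `c` ranks above item `d` for `u1`, because `c` is reachable through two users who share an item with `u1`, while `d` is reachable through only one.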

2.
Entity alignment is an important task for Knowledge Graph (KG) completion, which aims to identify the same entities in different KGs. Most previous works only utilize the relation structures of KGs and ignore the heterogeneity of the relations and attributes of KGs. However, this information can provide additional features and improve the accuracy of entity alignment. In this paper, we propose a novel Multi-Heterogeneous Neighborhood-Aware model (MHNA) for KG alignment. MHNA aggregates multi-heterogeneous information of aligned entities, including the entity name, relations, attributes and attribute values. An important contribution is the design of a variant attention mechanism, which adds the feature information of relations and attributes to the calculation of attention coefficients. Extensive experiments on three well-known benchmark datasets show that MHNA significantly outperforms 12 state-of-the-art approaches, demonstrating that our approach has good scalability and superiority in both cross-language and monolingual KGs. An ablation study further supports the effectiveness of our variant attention mechanism.

3.
General recommenders and sequential recommenders are two modeling paradigms for recommender systems. The main focus of a general recommender is to identify long-term user preferences, while the user’s sequential behaviors are ignored. Sequential recommenders try to capture short-term user preferences by exploring item-to-item relations, but fail to consider general user preferences. Recently, improved performance has been reported by combining these two types of recommenders. However, most previous works treat each item separately and assume that each user–item interaction in a sequence is independent. This assumption may be too simplistic, since there may be a particular purpose behind buying successive items in a sequence. In fact, a user makes a decision through two sequential processes, i.e., starting to shop with a particular intention and then selecting a specific item which satisfies her/his preferences under this intention. Moreover, different users usually have different purposes and preferences, and the same user may have various intentions. Thus, different users may click on the same items with different purposes in mind. Therefore, most current methods do not fully exploit a user’s behavior pattern, and they neglect the distinction between users’ purposes and their preferences. To alleviate these problems, we propose a novel method named CAN, which takes both users’ purposes and preferences into account for next-item recommendation. We propose to use a Purpose-Specific Attention Unit (PSAU) to learn discriminative representations of user purpose and preference. The experimental results on real-world datasets demonstrate the advantages of our approach over state-of-the-art methods.

4.
Dictionary-based classifiers are an essential group of approaches in the field of time series classification. Their distinctive characteristic is that they transform time series into segments made of symbols (words) and then classify time series using these words. Dictionary-based approaches are suitable for datasets containing time series of unequal length. The prevalence of dictionary-based methods inspired the research in this paper. We propose a new dictionary-based classifier called SAFE. The new approach transforms the raw numeric data into a symbolic representation using the Simple Symbolic Aggregate approXimation (SAX) method. We then partition the symbolic time series into a sequence of words and employ a word-embedding neural model known from Natural Language Processing to train the classifying mechanism. The proposed scheme was applied to classify 30 benchmark datasets and compared with a range of state-of-the-art time series classifiers. The name SAFE comes from our observation that this method is safe to use. Empirical experiments have shown that SAFE gives excellent results: it is always in the top 5%–10% when we rank the classification accuracy of state-of-the-art algorithms for various datasets. Our method ranks third in the list of state-of-the-art dictionary-based approaches (after the WEASEL and BOSS methods).
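The symbolic transformation described in this abstract can be sketched as follows. This is a minimal illustration of the commonly described SAX steps (z-normalization, piecewise aggregate approximation, discretization against normal-distribution breakpoints), followed by a sliding-window split into words. The alphabet size of 4, the function names, and the sliding-window word split are assumptions for illustration; the abstract does not state the parameters or the partitioning scheme that SAFE uses.

```python
import statistics
from bisect import bisect_right

# Breakpoints that cut the standard normal distribution into four
# equiprobable regions (alphabet size 4 is an illustrative choice).
BREAKPOINTS = [-0.6745, 0.0, 0.6745]
ALPHABET = "abcd"

def sax_symbols(series, n_segments):
    """Convert a numeric series into a string of n_segments symbols."""
    if not series or n_segments <= 0 or len(series) % n_segments != 0:
        raise ValueError("series length must be a positive multiple of n_segments")
    # Z-normalize; a constant series maps to all zeros.
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)
    normalized = [(x - mean) / std if std > 0 else 0.0 for x in series]
    # Piecewise aggregate approximation: mean of each equal-width segment.
    seg_len = len(series) // n_segments
    paa = [statistics.fmean(normalized[i:i + seg_len])
           for i in range(0, len(series), seg_len)]
    # Map each segment mean to the symbol of the region it falls into.
    return "".join(ALPHABET[bisect_right(BREAKPOINTS, v)] for v in paa)

def split_into_words(symbols, word_len):
    """Slide a window of word_len symbols over the symbolic series."""
    return [symbols[i:i + word_len] for i in range(len(symbols) - word_len + 1)]

symbols = sax_symbols([1, 2, 3, 4, 5, 6, 7, 8], n_segments=4)
words = split_into_words(symbols, word_len=2)
```

The resulting words could then be fed to any word-embedding model as if they were tokens of a sentence, which is the step the abstract attributes to the classifier.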

5.
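Social selection and social influence are the two main factors in the formation of communities in online social networks. If the users and groups in network communities can be classified effectively, different group recommendation strategies can be adopted to maximize group satisfaction. This paper represents the preferences of group users with preference pairs and uses matrix factorization and Bayesian personalized ranking to examine how strongly social selection and social influence affect user preferences. On this basis, group users and groups are classified, and two group recommendation strategies are proposed. Finally, experiments on two datasets show that the proposed group recommendation strategies based on user and group classification are effective.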

6.
We propose an approach to the retrieval of entities that have a specific relationship with the entity given in a query. Our research goal is to investigate whether the related entity finding problem can be addressed by combining a measure of the relatedness of candidate answer entities to the query with the likelihood that the candidate answer entity belongs to the target entity category specified in the query. An initial list of candidate entities, extracted from the top-ranked documents retrieved for the query, is refined using a number of statistical and linguistic methods. The proposed method extracts the category of the target entity from the query, identifies instances of this category as seed entities, and computes the similarity between candidate and seed entities. The evaluation was conducted on the Related Entity Finding task of the Entity Track of TREC 2010, as well as the QA list questions from TREC 2005 and 2006. Evaluation results demonstrate that the proposed methods are effective in finding related entities.

7.
This paper examines several different approaches to exploiting structural information in semi-structured document categorization. The methods under consideration are designed for categorization of documents consisting of a collection of fields, or arbitrary tree-structured documents that can be adequately modeled with such a flat structure. The approaches range from trivial modifications of text modeling to more elaborate schemes, specifically tailored to structured documents. We combine these methods with three different text classification algorithms and evaluate their performance on four standard datasets containing different types of semi-structured documents. The best results were obtained with stacking, an approach in which predictions based on different structural components are combined by a meta classifier. A further improvement of this method is achieved by including the flat text model in the final prediction.

8.
Question answering has become one of the most popular information retrieval applications. Although most question-answering systems try to improve the user experience and the technology used in finding relevant results, many difficulties remain because of the continuous increase in the amount of web content. Question Classification (QC) plays an important role in question-answering systems, and one of the major tasks in enhancing the classification process is the identification of question types. A broad range of QC approaches has been proposed to address the classification problem; most of these are based on bag-of-words or dictionaries. In this research, we present an analysis of the different types of questions based on their grammatical structure. We identify different patterns and use machine learning algorithms to classify them. A framework is proposed for question classification using a grammar-based approach (GQCC) which exploits the structure of the questions. Our findings indicate that using syntactic categories related to different domain-specific types of Common Nouns, Numeral Numbers and Proper Nouns enables the machine learning algorithms to better differentiate between question types. The paper presents a wide range of experiments; the results show that GQCC with the J48 classifier outperforms other classification methods, with 90.1% accuracy.

9.
Transductive classification is a useful way to classify texts when labeled training examples are insufficient. Several algorithms have been proposed to perform transductive classification on text collections represented in a vector space model. However, the use of these algorithms is infeasible in practical applications due to the independence assumption among instances or terms and other drawbacks of these algorithms. Network-based algorithms have emerged to avoid the drawbacks of the algorithms based on the vector space model and to improve transductive classification. Networks are mostly used for label propagation, in which some labeled objects propagate their labels to other objects through the network connections. Bipartite networks are useful for representing text collections as networks and performing label propagation. Generating this type of network avoids requirements such as collections with hyperlinks or citations, the computation of similarities among all texts in the collection, and the setup of a number of parameters. In a bipartite heterogeneous network, objects correspond to documents and terms, and the connections are given by the occurrences of terms in documents. Label propagation is performed from documents to terms and then from terms to documents iteratively. Nevertheless, instead of using terms just as a means of label propagation, in this article we propose using the bipartite network structure to define the relevance scores of terms for classes through an optimization process, and then propagating these relevance scores to define labels for unlabeled documents. The new document labels are used to redefine the relevance scores of terms, which consequently redefine the labels of unlabeled documents in an iterative process. We demonstrate that the proposed approach surpasses the algorithms for transductive classification based on the vector space model or on networks. Moreover, we demonstrate that the proposed algorithm effectively makes use of unlabeled documents to improve classification and that it is faster than other transductive algorithms.
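The baseline mechanism this abstract describes, label propagation from documents to terms and back on a bipartite network, can be sketched as below. This is a plain propagation scheme under assumed inputs (term counts per document, a few seed labels), not the optimization-based relevance scoring that the article proposes; `propagate_labels` and the toy corpus are hypothetical.

```python
def _normalize(scores):
    """Scale class scores in place so they sum to 1 (if non-zero)."""
    total = sum(scores.values())
    if total > 0:
        for c in scores:
            scores[c] /= total

def propagate_labels(doc_terms, seed_labels, classes, n_iter=10):
    """doc_terms: {doc: {term: count}}; seed_labels: {doc: class}."""
    doc_scores = {d: {c: 0.0 for c in classes} for d in doc_terms}
    for d, c in seed_labels.items():
        doc_scores[d][c] = 1.0

    for _ in range(n_iter):
        # Documents -> terms: each term accumulates the class scores of
        # the documents it occurs in, weighted by its count.
        term_scores = {}
        for d, terms in doc_terms.items():
            for t, w in terms.items():
                acc = term_scores.setdefault(t, {c: 0.0 for c in classes})
                for c in classes:
                    acc[c] += w * doc_scores[d][c]
        for acc in term_scores.values():
            _normalize(acc)

        # Terms -> documents: only unlabeled documents are updated;
        # seed documents keep their known label.
        for d, terms in doc_terms.items():
            if d in seed_labels:
                continue
            acc = {c: 0.0 for c in classes}
            for t, w in terms.items():
                for c in classes:
                    acc[c] += w * term_scores[t][c]
            _normalize(acc)
            doc_scores[d] = acc

    return {d: max(classes, key=lambda c: doc_scores[d][c]) for d in doc_terms}

corpus = {
    "d1": {"goal": 2, "match": 1},
    "d2": {"stock": 2, "market": 1},
    "d3": {"goal": 1, "team": 1},
    "d4": {"market": 1, "bank": 1},
}
labels = propagate_labels(corpus, {"d1": "sport", "d2": "finance"},
                          ["sport", "finance"])
```

Here `d3` inherits the `sport` label through the shared term `goal`, and `d4` inherits `finance` through `market`, without any document-to-document similarity being computed.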

10.
Question classification (QC) involves classifying a given question based on the expected answer type and is an important task in Question Answering (QA) systems. Existing approaches for question classification use the full training dataset to fine-tune the models. Developing large labelled datasets is expensive and time-consuming. Hence, there is a need to develop approaches that can achieve comparable or state-of-the-art performance using limited training instances. In this paper, we propose an approach that uses data augmentation as a tool to generate additional training instances. We evaluate our proposed approach on two question classification datasets, namely the TREC and ICHI datasets. Experimental results show that our proposed approach reduces the requirement of labelled instances (a) by up to 81.7% while achieving a new state-of-the-art accuracy of 98.11 on the TREC dataset and (b) by up to 75% while achieving 67.9 on the ICHI dataset.

11.
Integrating useful input information is essential to provide efficient recommendations to users. In this work, we focus on improving item rating prediction by merging the multiple-context and multiple-criteria research directions, which were addressed separately in most of the existing literature. Throughout this article, Criteria refer to item attributes, while Context denotes the circumstances in which the user uses an item. Our goal is to capture more fine-grained preferences to improve item recommendation quality using users’ multiple criteria ratings under specific contextual situations. Therefore, we examine the recommenders’ data from a graph-theoretic perspective by representing three types of entities (users, contextual situations and criteria) as well as their relationships as a tripartite graph. On the assumption that contextually similar users tend to have similar interests in similar item criteria, we perform a high-order co-clustering on the tripartite graph to simultaneously partition the graph entities representing users in similar contextual situations and their evaluated item criteria. To predict cluster-based multi-criteria ratings, we introduce an improved rating prediction method that considers the dependency between users and their contextual situations, and also takes into account the correlation between criteria in the prediction process. The predicted multi-criteria ratings are finally aggregated into a single representative output corresponding to an overall item rating. To guide our investigation, we formulate a research hypothesis to provide insights about the tripartite graph partitioning and design clear and justified preliminary experiments, including quantitative and qualitative analyses, to validate it. Further thorough experiments on the two available context-aware multi-criteria datasets, TripAdvisor and Educational, demonstrate that our proposal exhibits substantial improvements over alternative recommendation approaches.

12.
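俞扬信 (Yu Yangxin), 刘瀛泽 (Liu Yingze). 《情报杂志》 (Journal of Intelligence), 2012, 31(2): 136-140
To address the problems that traditional retrieval methods face in today's networked information environment, a new method for personalized information retrieval is proposed. In this method, based on Formal Concept Analysis (FCA) theory, user preferences are defined as a concept network, and the concepts in the user concept network define the scope and targets of the user's preferences. Using the traditional TF-IDF weighting scheme and the ODP reference concept hierarchy, user preferences are represented as concept vectors in order to expand the user concept network. Comparative tests show that the proposed method is not only feasible to implement but also outperforms traditional retrieval models in retrieval effectiveness, and that it has promising application prospects.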
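The TF-IDF weighting scheme mentioned in this abstract can be illustrated with a minimal sketch. It assumes one common variant (term frequency normalized by document length, inverse document frequency as the natural log of N/df); the abstract does not say which variant the authors use, and the function name `tf_idf` and the toy documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} vector per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors

docs = [
    ["graph", "ranking", "graph"],
    ["graph", "search"],
    ["search", "user", "user"],
]
vectors = tf_idf(docs)
```

With this variant, a term that occurs in every document receives weight 0, so only terms that discriminate between documents contribute to the resulting vectors.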

13.
Entity disambiguation is a fundamental task of semantic Web annotation. Entity Linking (EL) is an essential procedure in entity disambiguation, which aims to link a mention appearing in plain text to a structured or semi-structured knowledge base, such as Wikipedia. Existing research on EL usually annotates the mentions in a text one by one and treats entities as independent of each other. However, this might not be true in many application scenarios. For example, if two mentions appear in one text, they are likely to have certain intrinsic relationships. In this paper, we first propose a novel query expansion method for candidate generation that utilizes information about the co-occurrence of mentions. We further propose a re-ranking model which can be iteratively adjusted based on the prediction in the previous round. Experiments on real-world data demonstrate the effectiveness of our proposed methods for entity disambiguation.

14.
Automatic text classification is the task of organizing documents into pre-determined classes, generally using machine learning algorithms. It is one of the most important methods for organizing and making use of the gigantic amounts of information that exist in unstructured textual format. Text classification is a widely studied research area of language processing and text mining. In traditional text classification, a document is represented as a bag of words in which the words, i.e., terms, are cut from their finer context, i.e., their location in a sentence or in a document. Only the broader context of the document is used, with some type of term frequency information in the vector space. Consequently, the semantics of a word that can be inferred from the finer context of its location in a sentence and its relations with neighboring words are usually ignored. However, the meaning of words and the semantic connections between words, documents and even classes are important, since methods that capture semantics generally reach better classification performance. Several surveys have been published that analyze diverse approaches to traditional text classification. Most of these surveys cover the application of different semantic term relatedness methods in text classification to a certain degree. However, they do not specifically target semantic text classification algorithms and their advantages over traditional text classification. To fill this gap, we undertake a comprehensive discussion of semantic text classification vs. traditional text classification. This survey explores past and recent advancements in semantic text classification and attempts to organize existing approaches under five fundamental categories: domain knowledge-based approaches, corpus-based approaches, deep learning-based approaches, word/character sequence enhanced approaches and linguistic enriched approaches. Furthermore, this survey highlights the advantages of semantic text classification algorithms over traditional text classification algorithms.

15.
Query enrichment is a process of dynamically enhancing a user query based on her preferences and context in order to provide a personalized answer. The central idea is that different users may find different services relevant due to different preferences and contexts. In this paper, we present a preference model that combines user preferences, user context, and domain knowledge to enrich the initial user query. We use CP-nets to rank the preferences using implicit and explicit user preferences and domain knowledge. We present some algorithms for preferential matching. We have implemented the proposed model as a prototype. The initial results look promising.

16.
Recommender systems, as an effective method for reducing information overload, have been widely used in the e-commerce field. Existing studies mainly capture semantic features by considering user-item interactions or behavioral history records, ignoring the sparsity of interactions and the drift of user preferences. To cope with these challenges, we introduce the recently popular Graph Neural Networks (GNN) and propose an Interest Evolution-driven Gated Neighborhood (IEGN) aggregation representation model which can capture accurate user representations and track the evolution of user interests. Specifically, in IEGN, we explicitly model the relational information between neighbor nodes by introducing a gated adaptive propagation mechanism. Then, a personalized time interval function is designed to track the evolution of user interests. In addition, a high-order convolutional pooling operation is used to capture the correlation within the short-term interaction sequence. User preferences are predicted by fusing user dynamic preferences and short-term interaction features. Extensive experiments on Amazon and Alibaba datasets show that IEGN outperforms several state-of-the-art methods in recommendation tasks.

17.
The matrix factorization model based on user-item rating data has been widely studied and applied in recommender systems. However, data sparsity, the cold-start problem, and poor explainability have restricted its performance. Textual reviews usually contain rich information about items’ features and users’ sentiments and preferences, which can compensate for the insufficient information available from user ratings alone. However, most recommendation algorithms that take sentiment analysis of review texts into account are either fine- or coarse-grained, but not both, leading to uncertain accuracy and comprehensiveness regarding user preference. This study proposes a deep learning recommendation model (i.e., DeepCGSR) that integrates textual review sentiments and the rating matrix. DeepCGSR uses the review sets of users and items as a corpus to perform cross-grained sentiment analysis by combining fine- and coarse-grained levels to extract sentiment feature vectors for users and items. Deep learning technology is used to map between the extracted feature vector and the latent factor through the rating-based matrix factorization model and to obtain deep, nonlinear features for predicting the user's rating of an item. Iterative experiments on e-commerce datasets from Amazon show that DeepCGSR consistently outperforms the recommendation models LFM, SVD++, DeepCoNN, TOPICMF, and NARRE. Overall, compared with other recommendation models, the DeepCGSR model improved evaluation results by 14.113% over LFM, 13.786% over SVD++, 9.920% over TOPICMF, 5.122% over DeepCoNN, and 2.765% over NARRE. Meanwhile, DeepCGSR has great potential for addressing the overfitting and cold-start problems. Built upon previous studies and findings, DeepCGSR is the state of the art, moving the design and development of recommendation algorithms forward with improved recommendation accuracy.

18.
Whether to deal with issues related to information ranking (e.g. search engines) or content recommendation (on social networks, for instance), algorithms are at the core of processes that select which information is made visible. Such algorithmic choices have a strong impact on users’ activity de facto, and therefore on their access to information. This raises the question of how to measure the quality of the choices algorithms make and their impact on users. As a first step in that direction, this paper presents a framework with which to analyze the diversity of information accessed by users in the context of musical content. The approach adopted centers on the representation of user activity through a tripartite graph that maps users to products and products to categories. In turn, conducting random walks in this structure makes it possible to analyze how categories catch users’ attention and how this attention is distributed. Building upon this distribution, we propose a new index referred to as the (calibrated) Herfindahl diversity, which is aimed at quantifying the extent to which this distribution is diverse and representative of existing categories. To the best of our knowledge, this paper is the first to connect the output of random walks on graphs with diversity indexes. We demonstrate the benefit of such an approach by applying our index to two datasets that record user activity on online platforms involving musical content. The results are threefold. First, we show that our index can discriminate between different user behaviors. Second, we shed some light on a saturation phenomenon in the diversity of users’ attention. Finally, we show that the lack of diversity observed in the datasets derives from exogenous factors related to the heterogeneous popularity of music styles, as opposed to internal factors such as recurrent user behaviors.
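As a rough illustration of the kind of index this abstract refers to, the sketch below computes the classical Herfindahl concentration index of an attention distribution over categories and reports its inverse as a diversity value. This is an assumption about the uncalibrated form only: the abstract does not give the paper's exact definition, and neither the calibration step nor the random-walk construction of the distribution is reproduced here. The function name and toy counts are hypothetical.

```python
def herfindahl_diversity(category_counts):
    """Inverse Herfindahl index of a {category: count} distribution.

    Returns a value between 1 (all attention on one category) and the
    number of categories (attention spread uniformly).
    """
    total = sum(category_counts.values())
    if total <= 0:
        raise ValueError("category_counts must contain positive mass")
    # Herfindahl index: sum of squared shares.
    hhi = sum((n / total) ** 2 for n in category_counts.values())
    return 1.0 / hhi

uniform = herfindahl_diversity({"rock": 5, "jazz": 5, "folk": 5, "rap": 5})
skewed = herfindahl_diversity({"rock": 8, "jazz": 1, "folk": 1})
single = herfindahl_diversity({"rock": 10})
```

The value can be read as an effective number of categories: the uniform listener above spreads attention over four categories, while the skewed listener behaves as if listening to roughly one and a half.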

19.
Zero-shot object classification aims to recognize objects of unseen classes whose supervised data are unavailable in the training stage. Recent zero-shot learning (ZSL) methods usually propose to generate new supervised data for unseen classes by designing various deep generative networks. In this paper, we propose an end-to-end deep generative ZSL approach that trains the data generation module and the object classification module jointly, rather than separately as in the majority of existing generation-based ZSL methods. Due to the ZSL assumption that unseen data are unavailable in the training stage, the distribution of generated unseen data will shift toward the distribution of seen data, which subsequently causes the projection domain shift problem. Therefore, we further design a novel meta-learning optimization model to improve the proposed generation-based ZSL approach, where the parameter initialization and the parameter update algorithm are meta-learned to assist model convergence. We evaluate the proposed approach on five standard ZSL datasets. The proposed joint training strategy increases average accuracy by 2.7% and 23.0% for the standard ZSL task and the generalized ZSL task respectively, and the meta-learning optimization further improves accuracy by 5.0% and 2.1% on the two ZSL tasks respectively. Experimental results demonstrate that the proposed approach has significant superiority in various ZSL tasks.

20.
Learning semantic representations of documents is essential for various downstream applications, including text classification and information retrieval. Entities, as important sources of information, have been playing a crucial role in assisting latent representations of documents. In this work, we hypothesize that entities are not monolithic concepts; instead they have multiple aspects, and different documents may be discussing different aspects of a given entity. Given that, we argue that from an entity-centric point of view, a document related to multiple entities should (a) be represented differently for different entities (multiple entity-centric representations), and (b) have each entity-centric representation reflect the specific aspects of the entity discussed in the document. In this work, we pose the following research questions: (1) can we confirm that entities have multiple aspects, with different aspects reflected in different documents; (2) can we learn a representation of entity aspects from a collection of documents, and a representation of a document based on the multiple entities and their aspects as reflected in the documents; (3) does this novel representation improve algorithm performance in downstream applications; and (4) what is a reasonable number of aspects per entity? To answer these questions we model each entity using multiple aspects (entity facets), where each entity facet is represented as a mixture of latent topics. Then, given a document associated with multiple entities, we assume multiple entity-centric representations, where each entity-centric representation is a mixture of entity facets for each entity. Finally, a novel graphical model, the Entity Facet Topic Model (EFTM), is proposed in order to learn entity-centric document representations, entity facets, and latent topics. Through experimentation we confirm that (1) entities are multi-faceted concepts which we can model and learn, (2) a multi-faceted entity-centric modeling of documents can lead to effective representations, which (3) can have an impact in downstream applications, and (4) considering a small number of facets is effective enough. In particular, we visualize entity facets within a set of documents, and demonstrate that different sets of documents indeed reflect different facets of entities. Further, we demonstrate that the proposed entity facet topic model generates better document representations in terms of perplexity, compared to state-of-the-art document representation methods. Moreover, we show that the proposed model outperforms baseline methods in the application of multi-label classification. Finally, we study the impact of EFTM’s parameters and find that a small number of facets better captures entity-specific topics, which confirms the intuition that on average an entity has a small number of facets reflected in documents.
