Similar literature
Found 20 similar documents; search took 641 ms.
1.
Information filtering (IF) systems usually filter data items by correlating a vector of terms that represents the user profile with similar vectors of terms that represent data items. Terms that represent data items can be determined by experts or by automatic indexing methods. In this study we employ an artificial neural network (ANN) as an alternative method for both IF and term selection and compare its effectiveness to that of “traditional” methods. In an earlier study we developed and examined the performance of an IF system that employed content-based and stereotypic rule-based filtering methods in the domain of e-mail messages. In this study, we train a large-scale ANN-based filter, which uses meaningful terms in the same database as input, and use it to predict the relevance of those messages. Our results reveal that the ANN relevance prediction outperforms the prediction of the IF system. Moreover, we found very low correlation between the terms in the user profile (explicitly selected by the users) and the positive causal-index (CI) terms of the ANN, which indicate the relative importance of terms in messages. This implies that users underestimate the importance of some terms and fail to include them in their profiles, which may explain the rather low prediction accuracy of the IF system.
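The profile-to-item correlation described in this abstract can be sketched as follows. This is only a minimal illustration, assuming cosine similarity as the correlation measure and a fixed relevance threshold; the abstract names neither, and the function names, the threshold value, and the sample vectors are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def filter_items(profile, items, threshold=0.3):
    """Return (item_id, score) pairs scoring at or above threshold, best first.

    The threshold of 0.3 is an arbitrary illustrative value.
    """
    scored = [(item_id, cosine(profile, vec)) for item_id, vec in items.items()]
    kept = [(item_id, score) for item_id, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)

# Hypothetical user profile and two e-mail messages as term vectors.
profile = {"neural": 1.0, "filtering": 1.0}
messages = {
    "m1": {"neural": 1.0, "network": 1.0},
    "m2": {"lunch": 1.0, "menu": 1.0},
}
relevant = filter_items(profile, messages)
```

Here only `m1` shares a term with the profile, so it is the only message that passes the filter.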

2.
Recommender systems make personalized recommendations of items to users. On e-commerce sites and in online sharing communities, high-quality recommendations help users decide effectively which items to select. Collaborative filtering is an important type of recommender system that produces user-specific recommendations based on patterns of ratings or usage (e.g., purchases). However, the quality of the predicted ratings and the selection of neighbors for each user remain important problems. Selecting a suitable neighbor set for each user improves the accuracy of rating prediction in the recommendation process. In this paper, a novel social recommendation method based on an adaptive neighbor selection mechanism is proposed. First, an initial neighbor set for each user is computed with a clustering algorithm; in this step, historical ratings and social information between users are combined to form the initial neighbor sets. These neighbor sets are then used to predict initial ratings for unseen items. Next, the quality of the initial predicted ratings is evaluated using a reliability measure based on the historical ratings and social information between users. A confidence model is then used to remove useless users from each user’s initial neighbors and form a new, adapted neighbor set. Finally, new ratings for the unseen items are predicted using the adapted neighbor sets, and the top-N items of interest are recommended to the active user. Experimental results on three real-world datasets show that the proposed method significantly outperforms several state-of-the-art recommendation methods.

3.
Searching a large document collection for relevant material that satisfies a user’s information need is a critical activity for web search engines. Query expansion (QE) techniques are widely used by search engines to disambiguate the user’s information need and to improve information retrieval (IR) performance. Knowledge-based, corpus-based, and relevance feedback techniques are the main QE approaches. They expand the user query with synonyms of the search terms (word synonymy) in order to retrieve more relevant documents, and they filter out documents that contain the search terms but with a different meaning than the user intended (the word polysemy problem). This work surveys existing query expansion techniques, highlights their strengths and limitations, and introduces a new method that combines the power of knowledge-based or corpus-based techniques with that of relevance feedback. Experimental evaluation on three information retrieval benchmark datasets shows that applying knowledge-based or corpus-based query expansion to the results of the relevance feedback step improves retrieval performance, with knowledge-based techniques providing significantly better results than their simple relevance feedback alternatives on all datasets.

4.
Collaborative filtering techniques have become very popular in recent years as an effective way to provide personalized recommendations. They generally obtain much better accuracy than other techniques such as content-based filtering, because they are based on the opinions of users whose tastes or interests are similar to those of the user receiving the recommendation. However, this is precisely the reason for one of their main limitations: the cold-start problem, that is, how to recommend new items that have not yet been rated, or how to offer good recommendations to users about whom the system has no information, for example because they have only recently joined. The new-user problem is particularly serious, because an unsatisfied user may stop using the system before it can collect enough information to generate good recommendations. In this article we tackle this problem with a novel approach called “profile expansion”, based on the query expansion techniques used in information retrieval. In particular, we propose and evaluate three kinds of techniques: item-global, item-local and user-local. Our experiments show that both item-global and user-local offer outstanding improvements in precision, of up to 100%. Moreover, the improvements are statistically significant and consistent across different movie recommendation datasets and several training conditions.

5.
王井 《情报科学》2020,38(3):54-59
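[Purpose/Significance] User interests are obtained from subscription records, and collaborative filtering recommendation is applied to personalized book recommendation in order to provide readers with high-quality service. [Method/Process] Building on the collaborative filtering algorithm, user similarity and subscribed-book similarity are each computed from users’ subscription records. To address a flaw of traditional collaborative filtering when computing the similarity of popular subscriptions, a penalty mechanism on subscription weights is introduced, which reduces the likelihood that a popular subscription appears similar to many other subscriptions; recommendations are then generated with the collaborative filtering method. [Result/Conclusion] Validation on publicly available datasets shows that the collaborative filtering algorithm based on subscription records achieves relatively high recommendation accuracy and offers some reference value for research and practice aimed at improving users’ book-borrowing experience.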
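One common way to penalize popular items in item-based collaborative filtering is sketched below. The abstract does not give its exact penalty formula, so this example assumes the widely used normalization of the co-subscription count by the square root of the product of the two items' subscriber counts; the function name and the sample data are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations

def item_similarity(subscriptions):
    """subscriptions: dict user -> set of items. Returns dict (i, j) -> similarity."""
    item_users = defaultdict(set)
    for user, items in subscriptions.items():
        for item in items:
            item_users[item].add(user)
    sims = {}
    for i, j in combinations(sorted(item_users), 2):
        common = len(item_users[i] & item_users[j])
        if common:
            # Assumed penalty: dividing by sqrt(|N(i)| * |N(j)|) damps items
            # that almost every user subscribes to.
            score = common / math.sqrt(len(item_users[i]) * len(item_users[j]))
            sims[(i, j)] = sims[(j, i)] = score
    return sims

# Hypothetical subscription records: every user holds "bestseller".
subs = {
    "u1": {"bestseller", "a"},
    "u2": {"bestseller", "b"},
    "u3": {"bestseller", "a"},
    "u4": {"bestseller"},
}
sims = item_similarity(subs)
```

Without the normalization, every subscriber of `a` also holds `bestseller`, so a plain conditional-probability similarity would be 1.0; with it, the score drops to about 0.71 because `bestseller` is held by everyone.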

6.
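With the rapid development of e-commerce, recommender systems and algorithms have become a hot topic in theoretical research. The support vector machine is a powerful classification tool, and the support vector regression method derived from it handles nonlinear regression problems well. Taking movie recommendation as an example, this paper introduces support vector regression to analyze item content, build user models, and then produce recommendations. Experimental results and theoretical analysis show that, compared with traditional collaborative filtering algorithms, this recommendation algorithm markedly improves recommendation accuracy and significantly shortens the time needed for recommendation; it remains equally efficient with large sample sizes.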

7.
吕果  李法运 《情报探索》2014,(2):101-105,110
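Based on collaborative filtering (CF) personalized recommendation technology, a personalized software recommendation system for mobile devices is proposed. Following collaborative filtering theory, the system first computes software-category interest similarity to select a candidate set of users with similar software categories, which filters the full population of mobile users and reduces the size of the resulting candidate recommendation set. It then computes nearest-neighbor similarity over the candidate set to find the target user’s neighbor set, updating the neighbors’ ratings in real time. Finally, it forms the target user’s Top-N recommendation set from the K neighbors with the highest interest similarity. The system’s effectiveness and accuracy are verified on a third-party mobile software management platform by monitoring downloads or views of the recommended software.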

8.
Rocchio relevance feedback and latent semantic indexing (LSI) are well-known extensions of the vector space model for information retrieval (IR). This paper analyzes the statistical relationship between these extensions. The analysis focuses on each method’s basis in least-squares optimization. Noting that LSI and Rocchio relevance feedback both alter the vector space model in a way that is in some sense least-squares optimal, we ask: what is the relationship between LSI’s and Rocchio’s notions of optimality? What does this relationship imply for IR? Using an analytical approach, we argue that Rocchio relevance feedback is optimal if we understand retrieval as a simplified classification problem. On the other hand, LSI’s motivation comes to the fore if we understand it as a biased regression technique, where projection onto a low-dimensional orthogonal subspace of the documents reduces model variance.
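For readers unfamiliar with Rocchio relevance feedback, the textbook update moves the query vector toward the centroid of the relevant documents and away from the centroid of the non-relevant ones. The sketch below illustrates that standard formulation, not the paper's analysis; the default weights (1.0, 0.75, 0.15) are commonly quoted textbook values assumed here, and the sample vectors are hypothetical.

```python
def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return the modified query as a dict term -> weight.

    query is a dict term -> weight; relevant and nonrelevant are lists of
    such dicts. alpha, beta and gamma are assumed illustrative defaults.
    """
    terms = set(query)
    for doc in relevant + nonrelevant:
        terms.update(doc)
    new_query = {}
    for t in terms:
        # Centroid weight of term t in each feedback set (0.0 if the set is empty).
        pos = sum(d.get(t, 0.0) for d in relevant) / len(relevant) if relevant else 0.0
        neg = sum(d.get(t, 0.0) for d in nonrelevant) / len(nonrelevant) if nonrelevant else 0.0
        weight = alpha * query.get(t, 0.0) + beta * pos - gamma * neg
        if weight > 0:  # non-positive weights are dropped, a common convention
            new_query[t] = weight
    return new_query

# Hypothetical example: the user meant the car, not the animal.
updated = rocchio(
    {"jaguar": 1.0},
    relevant=[{"jaguar": 1.0, "car": 1.0}],
    nonrelevant=[{"jaguar": 1.0, "cat": 1.0}],
)
```

In this example `car` enters the query with weight 0.75, while `cat` would receive a negative weight and is therefore dropped.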

9.
Existing pseudo-relevance feedback (PRF) methods often divide an original query into individual terms for processing and select expansion terms based on the term frequency, proximity, position, etc. This process may lose some contextual semantic information from the original query. In this work, based on the classic Rocchio model, we propose a probabilistic framework that incorporates sentence-level semantics via Bidirectional Encoder Representations from Transformers (BERT) into PRF. First, we obtain the importance of terms at the term level. Then, we use BERT to interactively encode the query and sentences in the feedback document to acquire the semantic similarity score of a sentence and the query. Next, the semantic scores of different sentences are summed as the term score at the sentence level. Finally, we balance the term-level and sentence-level weights by adjusting factors and combine the terms with the top-k scores to form a new query for the next-round processing. We apply this method to three Rocchio-based models (Rocchio, PRoc2, and KRoc). A series of experiments are conducted based on six official TREC data sets. Various evaluation indicators suggest that the improved models achieve a significant improvement over the corresponding baseline models. Our proposed models provide a promising avenue for incorporating sentence-level semantics into PRF, which is feasible and robust. Through comparison and analysis of a case study, expansion terms obtained from the proposed models are shown to be more semantically consistent with the query.
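The final step of this abstract, balancing term-level and sentence-level scores and keeping the top-k terms, might look roughly like the sketch below. It assumes a simple linear interpolation with a single factor `lam`, which the abstract does not specify, and it takes the two score tables as given rather than computing them with BERT; all names and numbers are hypothetical.

```python
def select_expansion_terms(term_scores, sentence_scores, lam=0.5, k=2):
    """Combine two score tables and return the k best terms, best first.

    lam is an assumed balancing factor: 0 uses only term-level scores,
    1 uses only sentence-level scores. Ties are broken alphabetically.
    """
    terms = set(term_scores) | set(sentence_scores)
    combined = {
        t: (1 - lam) * term_scores.get(t, 0.0) + lam * sentence_scores.get(t, 0.0)
        for t in terms
    }
    return sorted(combined, key=lambda t: (-combined[t], t))[:k]

# Hypothetical precomputed scores for three candidate expansion terms.
term_level = {"engine": 0.9, "speed": 0.4, "cat": 0.5}
sentence_level = {"engine": 0.2, "speed": 0.9, "cat": 0.1}
expansion = select_expansion_terms(term_level, sentence_level)
```

With equal weighting, `speed` overtakes `engine` because its sentence-level score is high, which is the kind of reordering that sentence-level evidence is meant to produce.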

10.
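Building user profiles is one of the most important tasks in constructing an information filtering system. This paper first introduces several techniques for building user interest models, and then describes the user profile construction method adopted in the topical literature filtering system that we developed.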

11.
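To address the low efficiency, extreme data sparsity, and strong subjectivity of traditional collaborative filtering in book recommendation, a collaborative filtering book recommendation method based on cloud filling and ant colony clustering is proposed. User groups are first obtained with an ant colony clustering algorithm; then, before collaborative filtering is performed, the user-item matrix is filled in advance using a cloud model to reduce data sparsity. Experimental results show that the algorithm clearly improves recommendation accuracy.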

12.
A recommender system has an obvious appeal in an environment where the amount of on-line information vastly outstrips any individual’s capability to survey it. Music recommendation is a popular application area. In order to make personalized recommendations, many collaborative music recommender systems (CMRS) focus on capturing precise similarities among users or items based on users’ historical ratings. Despite the valuable information carried by the audio features of the music itself, few studies have investigated how to utilize information extracted directly from music for personalized recommendation in CMRS. In this paper, we describe a CMRS based on our proposed item-based probabilistic model, where items are classified into groups and predictions are made for users considering the Gaussian distribution of user ratings. In addition, this model has been extended for improved recommendation performance by utilizing audio features that help alleviate three well-known problems associated with data sparseness in collaborative recommender systems: user bias, non-association, and cold start problems in capturing accurate similarities among items. Experimental results based on two real-world data sets lead us to believe that content information is crucial in achieving better personalized recommendation beyond user ratings. We further show how primitive audio features can be combined into aggregate features for the proposed CMRS and analyze their influence on recommendation performance. Although this model was developed originally for collaborative music recommendation based on audio features, our experiment with the movie data set demonstrates that it can be applied to other domains.

13.
Collaborative filtering aims at predicting a test user’s ratings for new items by integrating other like-minded users’ rating information. The key assumption is that users sharing the same ratings on past items tend to agree on new items. Traditional collaborative filtering methods can mainly be divided into two classes: memory-based and model-based. Memory-based approaches generally suffer from two fundamental problems, sparsity and scalability, whereas model-based approaches are usually costly to build and have many parameters to tune.
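A memory-based predictor of the kind this abstract refers to can be sketched as below. It is a minimal illustration, assuming Pearson correlation over co-rated items as the user similarity and a mean-centred weighted average as the prediction rule; the abstract does not commit to either, and the ratings shown are hypothetical.

```python
import math

def mean(ratings):
    """Average of one user's ratings (dict item -> rating)."""
    return sum(ratings.values()) / len(ratings)

def pearson(a, b):
    """Pearson correlation over the items both users rated; 0.0 if undefined."""
    common = set(a) & set(b)
    if len(common) < 2:
        return 0.0
    ma = sum(a[i] for i in common) / len(common)
    mb = sum(b[i] for i in common) / len(common)
    num = sum((a[i] - ma) * (b[i] - mb) for i in common)
    den = math.sqrt(sum((a[i] - ma) ** 2 for i in common)) * math.sqrt(
        sum((b[i] - mb) ** 2 for i in common)
    )
    return num / den if den else 0.0

def predict(ratings, user, item):
    """Mean-centred, similarity-weighted average of other users' ratings for item."""
    base = mean(ratings[user])
    num = den = 0.0
    for other, r in ratings.items():
        if other == user or item not in r:
            continue
        w = pearson(ratings[user], r)
        num += w * (r[item] - mean(r))
        den += abs(w)
    # Fall back to the user's own mean when no neighbour rated the item.
    return base + num / den if den else base

# Hypothetical rating matrix: bob agrees with alice, carol disagrees.
ratings = {
    "alice": {"i1": 5, "i2": 3},
    "bob": {"i1": 4, "i2": 2, "i3": 5},
    "carol": {"i1": 2, "i2": 4, "i3": 1},
}
prediction = predict(ratings, "alice", "i3")
```

The loop over every other user on every prediction is what makes this style of method hard to scale, and the `len(common) < 2` guard shows where sparsity bites: with few co-rated items no similarity can be computed.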

14.
One of the major problems in information retrieval is the formulation of queries by the user. This entails specifying a set of words or terms that express the user’s information need. However, it is well known that two people can use different terms to refer to the same concept. The techniques that attempt to reduce this problem generally start from a first search and then study how the initial query can be modified to obtain better results. In general, the construction of the new query involves expanding the terms of the initial query and recalculating the importance of each term in the expanded query. Several strategies are distinguished, depending on the technique used to formulate the new query. These strategies are based on the idea that if two terms are similar (with respect to some criterion), the documents in which both terms appear frequently will also be related. The technique we used in this study is known as query expansion using similarity thesauri.

15.
Recommender systems deal with information overload by retrieving the most relevant sources from the wide range of web services. They help users by predicting their interests in many domains such as e-government, social networks, e-commerce and entertainment. Collaborative filtering (CF) is the most promising technique used in recommender systems to give suggestions based on like-minded users’ preferences. Despite the widespread use of CF in providing personalized recommendations, this technique has problems including cold start, data sparsity and gray sheep, which eventually degrade its efficiency. Most existing recommendation methods have been proposed to overcome the problems of CF. However, they fail to suggest the top-n recommendations based on the ordering of the users’ priorities. In this research, to overcome the shortcomings of CF and current recommendation methods on ranked preference datasets, we use a new graph-based structure to model the users’ priorities and capture the association between users and items. Users’ profiles are created based on their past and current interests, because their interests can change with time. Our proposed algorithm keeps the preferred items of the active user at the beginning of the recommendation list, so that these items fall within the top-n recommendations, which improves user satisfaction. The experimental results demonstrate that our algorithm achieves a significant improvement over CF and other proposed recommendation methods in terms of recall, precision, F-measure and MAP on two benchmark datasets, MovieLens and Superstore.

16.
Collaborative frequent itemset mining involves analyzing data shared by multiple business entities to find interesting patterns in it. However, this comes at the cost of high privacy risk, because some of these patterns may contain business-sensitive information and are hence denoted as sensitive patterns. The revelation of such patterns can disclose confidential information. Privacy-preserving data mining (PPDM) includes various sensitive pattern hiding (SPH) techniques, which ensure that sensitive patterns are not revealed when data mining models are applied to shared datasets. In the process of hiding sensitive patterns, some of the non-sensitive patterns also become infrequent. SPH techniques thus affect the results of data mining models. Maintaining a balance between data privacy and data utility is an NP-hard problem because it requires the selection of sensitive items for deletion and also the selection of transactions containing these items such that the side effects of deletion are minimal. Researchers have proposed various algorithms that use evolutionary approaches such as the genetic algorithm (GA), particle swarm optimization (PSO) and ant colony optimization (ACO). These evolutionary SPH algorithms mask sensitive patterns through the deletion of sensitive transactions. Failure to mask sensitive patterns and loss of data have been the biggest challenges for such algorithms. The performance of evolutionary algorithms degrades further when they are applied to dense datasets. In this research paper, a PSO-inspired evolutionary algorithm based on victim item deletion, named VIDPSO, is proposed to sanitize dense datasets. In the proposed algorithm, each particle of the population consists of n sub-particles derived from pre-calculated victim items. The proposed algorithm has a high exploration capability to search the solution space for selecting optimal transactions. Experiments conducted on real and synthetic dense datasets show that the VIDPSO algorithm performs better than GA-, PSO- and ACO-based SPH algorithms in terms of hiding failure, with minimal loss of data.

17.
Research on recommender systems (RSs) has mostly focused on algorithms aimed at improving platform owners’ revenues and users’ satisfaction. However, RSs have additional effects, which are related to their impact on users’ choices. In order to avoid undesired system behaviour and anticipate the effects of an RS, the literature suggests employing simulations. In this article we present a novel, well-grounded and flexible simulation framework. We adopt a stochastic user choice model and simulate users’ repeated choices for items in the presence of alternative RSs. Properties of the simulated choices, such as their diversity and their quality, are analysed. We state four research questions, also motivated by identified research gaps, which are addressed by conducting an experimental study where three different data sets and five alternative RSs are used. We identify some important effects of RSs. We find that non-personalised RSs result in choices for items that have a larger predicted rating compared to personalised RSs. Moreover, when a user’s awareness set, which is the set containing the items that she can choose from, grows, choices become more diverse, but the average quality (rating) of the choices decreases. Additionally, in order to achieve a higher choice diversity, increasing the awareness of the users is shown to be a more effective remedy than increasing the number of recommendations offered to the users.

18.
Automatic text classification is the problem of automatically assigning predefined categories to free-text documents, thus requiring less manual labor than traditional classification methods. When we apply binary classification to multi-class text classification, we usually use the one-against-the-rest method. In this method, if a document belongs to a particular category, the document is regarded as a positive example of that category; otherwise, the document is regarded as a negative example. As a result, each category has a positive data set and a negative data set. However, the one-against-the-rest method has a problem: the documents of a negative data set are not labeled manually, while those of a positive set are labeled by humans. Therefore, the negative data set probably includes a lot of noisy data. In this paper, we propose applying the sliding window technique and the revised EM (Expectation Maximization) algorithm to binary text classification to solve this problem. We improve binary text classification by extracting potentially noisy documents from the negative data set using the sliding window technique and removing the documents that are actually noisy using the revised EM algorithm. The results of our experiments show that our method achieved better performance than the original one-against-the-rest method on all the data sets and with all the classifiers used in the experiments.
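The one-against-the-rest construction described in this abstract can be sketched as follows. The example only builds the positive and negative sets per category; it does not implement the sliding window or revised EM steps, and the function name and sample documents are hypothetical.

```python
def one_vs_rest(labeled_docs):
    """labeled_docs: list of (doc_id, category) pairs.

    Returns a dict category -> (positives, negatives), where the negatives
    are simply every document labeled with some other category.
    """
    categories = sorted({category for _, category in labeled_docs})
    splits = {}
    for cat in categories:
        positives = [doc for doc, c in labeled_docs if c == cat]
        negatives = [doc for doc, c in labeled_docs if c != cat]
        splits[cat] = (positives, negatives)
    return splits

# Hypothetical corpus with three categories.
docs = [("d1", "sports"), ("d2", "politics"), ("d3", "sports"), ("d4", "tech")]
splits = one_vs_rest(docs)
```

Note that a negative set such as the one for `sports` is assembled by exclusion, never by a human judging each document as "not sports", which is the source of the noise the abstract sets out to remove.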

19.
This article explores how to develop complex data-driven user models that go beyond the bag-of-words model and topical relevance. We propose to learn from rich user-specific information and to satisfy complex user criteria under the graphical modelling framework. We carried out a user study with a web-based personal news filtering system and collected extensive user information, including explicit user feedback, implicit user feedback and some contextual information. Experimental results on the collected data set demonstrate that the graphical modelling approach helps us to better understand the complex domain. The results also show that the complex data-driven user modelling approach can improve adaptive information filtering performance. We also discuss some practical issues in learning complex user models, including how to handle data noise and the missing data problem.

20.
One difficult problem in information retrieval (IR) is the proper interpretation of user queries. It is extremely hard for users to express their information needs in a specific yet exhaustive way. In an effort to alleviate this problem, two theoretical models have been proposed to utilize user characteristics maintained in the form of a user profile. Although the idea of integrating user profiles into an IR system is intuitively appealing, and the models seem viable, no research to date has established a foundation for the roles of user profiles in such a system. Aiming at the investigation of the roles of user profiles, therefore, this study first identifies and extends various query/profile interaction models to provide a ground upon which the investigation can be undertaken. From a continuum of models characterized on the basis of interaction types, metrics, and parameters, nearly 400 models are chosen to investigate the “model space.” New measures are developed based on the notion of user satisfaction/frustration. In addition, three different criteria are used to guide users in making judgments on the quality of retrieved items. Analysis of the data obtained from the experiments shows that, for a wide variety of criteria and metrics, there are always some query/profile interaction models that outperform the query-alone model. In addition, preferable characteristics for different criteria are identified in terms of interaction types, parameters, and metrics.
