Similar literature
 Found 20 similar articles (search time: 31 ms)
1.
In recent years, user-generated data in collaborative tagging (a.k.a. folksonomy-based) systems has grown rapidly with the prevalence of Web 2.0 communities. To help users find the resources they want, it is critical to understand user behaviors and preferences. Tag-based profile techniques, which model users and resources as vectors of relevant tags, are widely employed in folksonomy-based systems, mainly because personalized search and recommendation can be facilitated by measuring the relevance between user profiles and resource profiles. However, conventional measurements neglect the sentiment aspect of user-generated tags. In fact, tags can be very emotional and subjective, as users often express their perceptions of and feelings about resources through tags. It is therefore necessary to take sentiment relevance into account in these measurements. In this paper, we present SenticRank, a novel generic framework that incorporates various kinds of sentiment information into personalized search based on user profiles and resource profiles. Within this framework, content-based sentiment ranking and collaborative sentiment ranking methods are proposed to obtain sentiment-based personalized rankings. To the best of our knowledge, this is the first work to integrate sentiment information to address the problem of personalized tag-based search in collaborative tagging systems. Moreover, we compare the proposed sentiment-based personalized search with baselines experimentally, and the results verify the effectiveness of the proposed framework. In addition, we study the influence of popular sentiment dictionaries and find that SenticNet is the most prominent knowledge base for boosting the performance of personalized search in folksonomies.
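As a rough illustration of the tag-vector relevance idea in entry 1, the sketch below scores a resource against a user by blending a tag-overlap term with a sentiment-agreement term. It is not the SenticRank method: the use of cosine similarity, the per-tag polarity dictionaries, the function names, and the mixing weight `alpha` are all assumptions made for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts (tag -> weight)."""
    dot = sum(weight * b.get(tag, 0.0) for tag, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty profile matches nothing
    return dot / (norm_a * norm_b)


def blended_relevance(user_tags, resource_tags, user_sentiment, resource_sentiment, alpha=0.5):
    """Mix tag relevance with sentiment agreement.

    alpha is a hypothetical mixing weight: 0 ignores sentiment, 1 uses only sentiment.
    The sentiment dicts map tag -> polarity in [-1, 1] (an assumed representation).
    """
    tag_score = cosine(user_tags, resource_tags)
    sentiment_score = cosine(user_sentiment, resource_sentiment)
    return (1.0 - alpha) * tag_score + alpha * sentiment_score
```

With `alpha=0` the score reduces to plain tag-profile matching, which makes it easy to compare against a sentiment-free baseline on the same data.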

2.
张泰瑞  陈渝 《现代情报》2019,39(8):92-102
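[Purpose/Significance] Exploratory academic information search is an important part of researchers' scientific work. Drawing on bounded rationality theory, this study examines how users respond to the stimulus of information complexity and constructs a theoretical model of academic information search behavior. [Method/Process] Influencing factors were initially distilled from prior research, the specific factors used in the study were determined through online interviews with experts and scholars, and the analysis was carried out with a second-order structural equation model built on three levels: external stimulus (S), organism cognition (O), and behavioral response (R). [Results/Conclusions] The results show that stimuli along the information dimension and the method dimension significantly strengthen expectation confirmation, and that expectation confirmation promotes academic information search behavior, whereas perceived affordance significantly weakens the effect of external stimuli on expectation confirmation. The results demonstrate that academic search behavior is not a simple need-search behavior but a boundedly rational behavior shaped by cognitive expectations and cognitive ability.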

3.
Two probabilistic approaches to cross-lingual retrieval are in wide use today: those based on probabilistic models of relevance, as exemplified by INQUERY, and those based on language modeling. INQUERY, as a query net model, allows the easy incorporation of query operators, including a synonym operator, which has proven to be extremely useful in cross-language information retrieval (CLIR), in an approach often called structured query translation. In contrast, language models incorporate translation probabilities into a unified framework. We compare the two approaches on Arabic and Spanish data sets, using two kinds of bilingual dictionaries: one derived from a conventional dictionary, and one derived from a parallel corpus. We find that structured query processing gives slightly better results when queries are not expanded. On the other hand, when queries are expanded, language modeling gives better results, but only when using a probabilistic dictionary derived from a parallel corpus. We pursue two additional issues inherent in the comparison of structured query processing with language modeling. The first concerns query expansion, and the second is the role of translation probabilities. We compare conventional expansion techniques (pseudo-relevance feedback) with relevance modeling, a new IR approach which fits into the formal framework of language modeling. We find that relevance modeling and pseudo-relevance feedback achieve comparable levels of retrieval and that good translation probabilities confer a small but significant advantage.

4.
周旭东  王丽爱  陈崚 《现代情报》2006,26(12):133-135,138
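This paper surveys several approaches to Web search personalization, introduces their basic ideas, and analyzes how a number of systems implement Web search personalization, including the user information they use, their interaction with users, how information is stored, and the algorithms used to combine user information with search.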

5.
The nature of the task that leads a person to engage in information interaction, as well as of information seeking and searching tasks, has been shown to influence individuals’ information behavior. Classifying tasks in a domain has been viewed as a departure point for studies on the relationship between tasks and human information behavior. However, previous task classification schemes either classify tasks with respect to the requirements of specific studies or merely classify a certain category of task. Such approaches do not lead to a holistic picture of a task, since a task involves different aspects. Therefore, the present study aims to develop a faceted classification of task, which can incorporate work tasks and information search tasks into the same classification scheme and characterize tasks in such a way as to help people make predictions of information behavior. For this purpose, previous task classification schemes and their underlying facets are reviewed and discussed. The analysis identifies essential facets and categorizes them into Generic facets of task and Common attributes of task. Generic facets of task include Source of task, Task doer, Time, Action, Product, and Goal. Common attributes of task include Task characteristics and User’s perception of task. Corresponding sub-facets and values are identified as well. In this fashion, a faceted classification of task is established which could be used to describe users’ work tasks and information search tasks. This faceted classification provides a framework for further exploring the relationships among work tasks, search tasks, and interactive information retrieval and for advancing the design of adaptive IR systems.

6.
Many enterprise employees may publish content outside their corporate intranet, making the Web a valuable source for identifying company experts. In this article, we thoroughly investigate the usefulness of Web search engines (WSEs) for expert search. In particular, we claim that the ranking of documentary expertise evidence provided by a WSE should also give an indication of the importance of such evidence. To investigate this, we mimic the rankings of seven different WSEs by trying to reproduce their underlying ranking mechanisms in order to search for candidate experts in the TREC CERC collection. Experimental results show that our approach is effective for expert search, and can significantly improve an intranet-based expert search engine. Moreover, when the mimicking of WSEs is further improved by training, expert search performance is also generally enhanced. Finally, we show that WSEs can be mimicked as effectively using only titles and snippets instead of the full content of WSEs’ results, while drastically reducing network costs.

7.
刘俊熙 《现代情报》2010,30(3):7-10,13
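In practical information searching, many problems cannot be solved effectively by traditional keyword search. Against this background, knowledge search, which can turn tacit knowledge into explicit knowledge, has emerged. It gives users the simplest and most convenient way to obtain useful information, serves as a supplement to and extension of search engines, and may become the direction in which next-generation search engine technology develops (a shift from information retrieval to knowledge search). Human flesh search (人肉搜索), a controversial variant of knowledge search, has made knowledge search an even hotter topic in the search field. This paper mainly discusses the nature and concepts of the two and the relationship between them.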

8.
The retrieval effectiveness of the underlying document search component of an expert search engine can have an important impact on the effectiveness of the generated expert search results. In this large-scale study, we perform novel experiments in the context of the document search and expert search tasks of the TREC Enterprise track, to measure the influence that the performance of the document ranking has on the ranking of candidate experts. In particular, our experiments show that while the expert search system performance is related to the relevance of the retrieved documents, surprisingly, it is not always the case that increasing document search effectiveness causes an increase in expert search performance. Moreover, we simulate document rankings designed with expert search performance in mind and, through a failure analysis, show why even a perfect document ranking may not result in a perfect ranking of candidate experts.

9.
熊利红 《情报科学》2003,21(10):1098-1099,1103
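Searching the patent information of various countries over the Internet has become a convenient and effective way for all industries to obtain patent information. Online patent databases now show a number of new functions and trends, such as patent citation searching, concept-based natural language searching, data mining, and database integration, which offer enterprises new methods and channels for obtaining competitive intelligence.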

10.
袁红  黄燕 《现代情报》2019,39(5):48-56
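[Purpose/Significance] Lookup search (查询式搜索) suits question-and-answer information problems with a clear goal, whereas exploratory search places more emphasis on the human-computer interactivity, dynamics, and multifaceted nature of the search process; the two exhibit different behavioral characteristics. Although this is one of the basic questions in search behavior research, relevant studies are still scarce. This paper aims to explore the differences between the behavioral characteristics of lookup search and exploratory search, which has practical significance both for optimizing the functions of information search systems and for guiding users to obtain information efficiently. [Method/Process] Taking health information search as an example, the paper uses a search behavior experiment and, through analysis of screen-recording data, compares the two search behaviors along four dimensions: search strategy, learning behavior, in-depth search, and search performance. [Results/Conclusions] Lookup search and exploratory search differ significantly on six indicators, including the number of keyword changes and the number of web pages visited, and show no significant difference on four indicators: choice of search tool, query string length, paging through the result set, and searching via related links.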

11.
Through the recent NTCIR workshops, patent retrieval has posed many challenging issues to the information retrieval community. Unlike newspaper articles, patent documents are very long and well structured. These characteristics make it necessary to reassess existing retrieval techniques, which have mainly been developed for short, unstructured documents such as newspaper articles. This study investigates cluster-based retrieval in the context of the invalidity search task of patent retrieval. Cluster-based retrieval assumes that clusters provide additional evidence for matching a user’s information need. Thus far, cluster-based retrieval approaches have relied on automatically created clusters. Fortunately, all patents carry manually assigned cluster information in the form of International Patent Classification codes. The International Patent Classification is a standard taxonomy for classifying patents, and currently has about 69,000 nodes organized into a five-level hierarchical system. Patent documents could thus provide the best test bed for developing and evaluating cluster-based retrieval techniques. Experiments using the NTCIR-4 patent collection showed that the cluster-based language model could help improve on the cluster-less baseline language model.

12.
Efficient management of toxicity information as an enterprise asset is increasingly important for the chemical, pharmaceutical, cosmetics and food industries. Many organisations focus on better information organisation and reuse, in an attempt to reduce the costs of testing and manufacturing in the product development phase. Toxicity information is extracted not only from toxicity data but also from predictive models. Accurate and appropriately shared models can bring a number of benefits if we are able to make effective use of existing expertise. Although existing models may provide high-impact insights into the relationships between chemical attributes and specific toxicological effects, they can also be a source of risk for incorrect decisions. Thus, there is a need to provide a framework for efficient model management. To address this gap, this paper introduces a concept of model governance, which is based upon data governance principles. We extend the data governance processes by adding procedures that allow the evaluation of model use and governance for enterprise purposes. The core aspect of model governance is model representation. We propose six rules that form the basis of a model representation schema, called Minimum Information About a QSAR Model Representation (MIAQMR). As a proof-of-concept of our model governance framework we develop a web application called Model and Data Farm (MADFARM), in which models are described by the MIAQMR-ML markup language.

13.
In this paper we describe the design of a groupware framework, CIRLab, for experimenting with collaborative information retrieval (CIR) techniques in different search scenarios. This framework has been designed using design patterns and an object-oriented middleware platform to maximize its reusability and adaptability in new contexts with minimal programming effort. Our collaborative search application comprises three main modules: the Core, which supports various modern state-of-the-art CIR techniques that can be reused or extended in a distributed collaborative environment; the Facades Mediator, an event-driven notification service which allows easy integration between the Core and front-end applications; and finally, the Actions Tracker, which allows researchers to perform experiments on the different elements involved in the collaborative search sessions. The application of this framework is illustrated through the analysis of the collaborative search-driven development case study.

14.
Large-scale web search engines are composed of multiple data centers that are geographically distant from each other. Typically, a user query is processed in a data center that is geographically close to the origin of the query, over a replica of the entire web index. Compared to a centralized, single-center search engine, this architecture offers lower query response times as the network latencies between the users and data centers are reduced. However, it does not scale well with increasing index sizes and query traffic volumes because queries are evaluated on the entire web index, which has to be replicated and maintained in all data centers. As a remedy to this scalability problem, we propose a document replication framework in which documents are selectively replicated on data centers based on regional user interests. Within this framework, we propose three different document replication strategies, each optimizing a different objective: reducing the potential search quality loss, the average query response time, or the total query workload of the search system. For all three strategies, we consider two alternative types of capacity constraints on index sizes of data centers. Moreover, we investigate the performance impact of query forwarding and result caching. We evaluate our strategies via detailed simulations, using a large query log and a document collection obtained from the Yahoo! web search engine.

15.
This paper presents a probabilistic information retrieval framework in which the retrieval problem is formally treated as a statistical decision problem. In this framework, queries and documents are modeled using statistical language models, user preferences are modeled through loss functions, and retrieval is cast as a risk minimization problem. We discuss how this framework can unify existing retrieval models and accommodate systematic development of new retrieval models. As an example of using the framework to model non-traditional retrieval problems, we derive retrieval models for subtopic retrieval, which is concerned with retrieving documents to cover many different subtopics of a general query topic. These new models differ from traditional retrieval models in that they relax the traditional assumption of independent relevance of documents.
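To make the language-modeling ingredient of entry 15 more concrete, here is a minimal sketch of query-likelihood scoring with a smoothed unigram document model. It shows only one familiar kind of retrieval model that a language-model-based framework can accommodate, not the risk-minimization framework itself and not the subtopic models the abstract mentions. The linear interpolation smoothing, the weight `lam`, and the function name are assumptions chosen for the example.

```python
import math
from collections import Counter


def query_log_likelihood(query, doc, collection, lam=0.5):
    """Log-probability of the query terms under a smoothed unigram model of doc.

    query, doc, and collection are lists of tokens. lam is a hypothetical
    interpolation weight given to the collection-wide model.
    """
    doc_tf = Counter(doc)
    coll_tf = Counter(collection)
    doc_len = len(doc)
    coll_len = len(collection)
    score = 0.0
    for term in query:
        p_doc = doc_tf[term] / doc_len if doc_len else 0.0
        p_coll = coll_tf[term] / coll_len if coll_len else 0.0
        p = (1.0 - lam) * p_doc + lam * p_coll
        if p == 0.0:
            return float("-inf")  # term unseen everywhere: the query cannot be generated
        score += math.log(p)
    return score
```

Ranking documents by this score in descending order treats each document independently, which is exactly the independence assumption that the subtopic models in the abstract set out to relax.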

16.
Recently, sentiment classification has received considerable attention within the natural language processing research community. However, since most recent work on sentiment classification has been done for English, sentiment resources in other languages are insufficient. Manual construction of reliable sentiment resources is a very difficult and time-consuming task. Cross-lingual sentiment classification aims to utilize annotated sentiment resources in one language (typically English) for sentiment classification of text documents in another language. Most existing research relies on automatic machine translation services to directly project information from one language to another. However, differences in term distribution between original and translated text documents, together with translation errors, are the two main problems faced when only machine translation is used. To overcome these problems, we propose a novel learning model based on active learning and semi-supervised co-training to incorporate unlabelled data from the target language into the learning process in a bi-view framework. This model attempts to enrich the training data by adding the most confident automatically-labelled examples, as well as a few of the most informative manually-labelled examples from the unlabelled data, in an iterative process. Further, in this model, we consider the density of the unlabelled data so as to select more representative unlabelled examples and avoid outlier selection in active learning. The proposed model was applied to book review datasets in three different languages. Experiments showed that our model can effectively improve cross-lingual sentiment classification performance and reduce labelling effort in comparison with some baseline methods.

17.
黄传慧 《情报探索》2021,(1):129-134
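[Purpose/Significance] This paper aims to reveal the content, characteristics, and trends of users' health information search behavior abroad, in order to provide a reference for research on users' health information search behavior in China. [Method/Process] Using Web of Science as the data source, 86 papers were analyzed comprehensively through synthesis, induction, and comparison; the knowledge context of the research was sorted out in terms of related concepts, research subjects, and research methods, and, separately for older adults, adults, and young people, the health...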

18.
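Intelligent Search Engines and Personalized Services in Digital Libraries   Total citations: 13, self-citations: 0, citations by others: 13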
邹凯  汪全莉 《情报科学》2004,22(7):874-877
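Intelligent search engines perform information retrieval at the knowledge (concept) level and, with their strong natural language understanding and knowledge processing capabilities, show distinctive strengths in personalized information services. Applying intelligent search engines to the personalized service system of digital libraries therefore not only makes effective use of the former's data mining and knowledge discovery functions but also considerably strengthens the latter's advantages in proactivity and intelligence.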

19.
RSS: A framework enabling ranked search on the semantic web   Total citations: 1, self-citations: 0, citations by others: 1
The semantic web contains not only resources but also the heterogeneous relationships among them, which sharply distinguishes it from the current web. As the semantic web grows, specialized search techniques become increasingly important. In this paper, we present RSS, a framework for enabling ranked semantic search on the semantic web. In this framework, the heterogeneity of relationships is fully exploited to determine the global importance of resources. In addition, the search results can be greatly expanded with the entities most semantically related to the query, so that users receive properly ordered semantic search results that combine global ranking values with the relevance between the resources and the query. The proposed semantic search model, which supports inference, is very different from traditional keyword-based search methods. Moreover, RSS also differs from many current methods of accessing semantic web data in that it applies novel ranking strategies to avoid returning search results in no meaningful order. The experimental results show that the framework is feasible and can produce a better ordering of semantic search results than directly applying the standard PageRank algorithm on the semantic web.
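For reference, the baseline that entry 19 compares against, standard PageRank, can be sketched as a short power iteration. This is the textbook algorithm, not the RSS ranking strategy; the adjacency-dict input format, the damping value of 0.85, and the fixed iteration count are assumptions made for the example.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank.

    graph maps each node to the list of nodes it links to; every link target
    must also appear as a key. Nodes with no out-links spread their rank
    evenly over all nodes so that the ranks keep summing to 1.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not graph[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for target in graph[v]:
                new_rank[target] += damping * rank[v] / len(graph[v])
        rank = new_rank
    return rank
```

Note that this version treats every link identically, whereas the abstract describes exploiting the heterogeneity of relationships, so a typed-edge variant would need per-relationship weights that are not specified here.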

20.
Exploratory search is increasingly becoming an important research topic. Our interests focus on task-based information exploration, a specific type of exploratory search performed by a range of professional users, such as intelligence analysts. In this paper, we present an evaluation framework designed specifically for assessing and comparing the performance of innovative information access tools created to support the work of intelligence analysts in the context of task-based information exploration. The motivation for developing this framework came from our need to test systems in task-based information exploration, which cannot be satisfied by existing frameworks. The new framework is closely tied to the kind of tasks that intelligence analysts perform: complex, dynamic, multifaceted, and multistage. It views the user rather than the information system as the center of the evaluation, and examines how well users are served by the systems in their tasks. The evaluation framework examines the support that the systems provide at users’ major information access stages, such as information foraging and sense-making. The framework is accompanied by a reference test collection that has 18 task scenarios and corresponding passage-level ground truth annotations. To demonstrate the usage of the framework and the reference test collection, we present a specific evaluation study on CAFÉ, an adaptive filtering engine designed for supporting task-based information exploration. This study is a successful use case of the framework, and it revealed various aspects of the information systems and their roles in supporting task-based information exploration.
