首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 215 毫秒
1.
The Inductive Query By Example (IQBE) paradigm allows a system to automatically derive queries for a specific Information Retrieval System (IRS). Classic IRSs based on this paradigm [Smith, M., & Smith, M. (1997). The use of genetic programming to build Boolean queries for text retrieval through relevance feedback. Journal of Information Science, 23(6), 423–431] generate a single solution (Boolean query) in each run, that with the best fitness value, which is usually based on a weighted combination of the basic performance criteria, precision and recall.  相似文献   

2.
The quality of stemming algorithms is typically measured in two different ways: (i) how accurately they map the variant forms of a word to the same stem; or (ii) how much improvement they bring to Information Retrieval systems. In this article, we evaluate various stemming algorithms, in four languages, in terms of accuracy and in terms of their aid to Information Retrieval. The aim is to assess whether the most accurate stemmers are also the ones that bring the biggest gain in Information Retrieval. Experiments in English, French, Portuguese, and Spanish show that this is not always the case, as stemmers with higher error rates yield better retrieval quality. As a byproduct, we also identified the most accurate stemmers and the best for Information Retrieval purposes.  相似文献   

3.
New methods and new systems are needed to filter or to selectively distribute the increasing volume of electronic information being produced nowadays. An effective information filtering system is one that provides the exact information that fulfills user's interests with the minimum effort by the user to describe it. Such a system will have to be adaptive to the user changing interest. In this paper we describe and evaluate a learning model for information filtering which is an adaptation of the generalized probabilistic model of Information Retrieval. The model is based on the concept of `uncertainty sampling', a technique that allows for relevance feedback both on relevant and nonrelevant documents. The proposed learning model is the core of a prototype information filtering system called ProFile.  相似文献   

4.
Whereas in language words of high frequency are generally associated with low content [Bookstein, A., & Swanson, D. (1974). Probabilistic models for automatic indexing. Journal of the American Society of Information Science, 25(5), 312–318; Damerau, F. J. (1965). An experiment in automatic indexing. American Documentation, 16, 283–289; Harter, S. P. (1974). A probabilistic approach to automatic keyword indexing. PhD thesis, University of Chicago; Sparck-Jones, K. (1972). A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation, 28, 11–21; Yu, C., & Salton, G. (1976). Precision weighting – an effective automatic indexing method. Journal of the Association for Computer Machinery (ACM), 23(1), 76–88], shallow syntactic fragments of high frequency generally correspond to lexical fragments of high content [Lioma, C., & Ounis, I. (2006). Examining the content load of part of speech blocks for information retrieval. In Proceedings of the international committee on computational linguistics and the association for computational linguistics (COLING/ACL 2006), Sydney, Australia]. We implement this finding to Information Retrieval, as follows. We present a novel automatic query reformulation technique, which is based on shallow syntactic evidence induced from various language samples, and used to enhance the performance of an Information Retrieval system. Firstly, we draw shallow syntactic evidence from language samples of varying size, and compare the effect of language sample size upon retrieval performance, when using our syntactically-based query reformulation (SQR) technique. Secondly, we compare SQR to a state-of-the-art probabilistic pseudo-relevance feedback technique. Additionally, we combine both techniques and evaluate their compatibility. We evaluate our proposed technique across two standard Text REtrieval Conference (TREC) English test collections, and three statistically different weighting models. Experimental results suggest that SQR markedly enhances retrieval performance, and is at least comparable to pseudo-relevance feedback. Notably, the combination of SQR and pseudo-relevance feedback further enhances retrieval performance considerably. These collective experimental results confirm the tenet that high frequency shallow syntactic fragments correspond to content-bearing lexical fragments.  相似文献   

5.
In order to evaluate the effectiveness of Information Retrieval (IR) systems it is key to collect relevance judgments from human assessors. Crowdsourcing has successfully been used as a method to scale-up the collection of manual relevance judgments, and previous research has investigated the impact of different judgment task design elements (e.g., highlighting query keywords in the document) on judgment quality and efficiency. In this work we investigate the positive and negative impacts of presenting crowd human assessors with more than just the topic and the document to be judged. We deploy different variants of crowdsourced relevance judgment tasks following a between-subjects design in which we present different types of metadata to the human assessor. Specifically, we investigate the effect of human metadata (e.g., what other human assessors think of the current document, as in which relevance level has already been selected by the majority crowd workers), machine metadata (e.g., how IR systems scored this document such as its average position in ranked lists, statistics about the document such as term frequencies). We look at the impact of metadata on judgment quality (i.e., the level of agreement with trained assessors) and cost (i.e., the time it takes for workers to complete the judgments) as well as at how metadata quality positively or negatively impact the collected judgments.  相似文献   

6.
Although there has been a great deal of research into Collaborative Information Retrieval (CIR) and Collaborative Information Seeking (CIS), the majority has assumed that team members have the same level of unrestricted access to underlying information. However, observations from different domains (e.g. healthcare, business, etc.) have suggested that collaboration sometimes involves people with differing levels of access to underlying information. This type of scenario has been referred to as Multi-Level Collaborative Information Retrieval (MLCIR). To the best of our knowledge, no studies have been conducted to investigate the effect of awareness, an existing CIR/CIS concept, on MLCIR. To address this gap in current knowledge, we conducted two separate user studies using a total of 5 different collaborative search interfaces and 3 information access scenarios. A number of Information Retrieval (IR), CIS and CIR evaluation metrics, as well as questionnaires were used to compare the interfaces. Design interviews were also conducted after evaluations to obtain qualitative feedback from participants. Results suggested that query properties such as time spent on query, query popularity and query effectiveness could allow users to obtain information about team's search performance and implicitly suggest better queries without disclosing sensitive data. Besides, having access to a history of intersecting viewed, relevant and bookmarked documents could provide similar positive effect as query properties. Also, it was found that being able to easily identify different team members and their actions is important for users in MLCIR. Based on our findings, we provide important design recommendations to help develop new CIR and MLCIR interfaces.  相似文献   

7.
Relevance-Based Language Models, commonly known as Relevance Models, are successful approaches to explicitly introduce the concept of relevance in the statistical Language Modelling framework of Information Retrieval. These models achieve state-of-the-art retrieval performance in the pseudo relevance feedback task. On the other hand, the field of recommender systems is a fertile research area where users are provided with personalised recommendations in several applications. In this paper, we propose an adaptation of the Relevance Modelling framework to effectively suggest recommendations to a user. We also propose a probabilistic clustering technique to perform the neighbour selection process as a way to achieve a better approximation of the set of relevant items in the pseudo relevance feedback process. These techniques, although well known in the Information Retrieval field, have not been applied yet to recommender systems, and, as the empirical evaluation results show, both proposals outperform individually several baseline methods. Furthermore, by combining both approaches even larger effectiveness improvements are achieved.  相似文献   

8.
The thesis of this study is to propose an extended methodology for laboratory based Information Retrieval evaluation under incomplete relevance assessments. This new methodology aims to identify potential uncertainty during system comparison that may result from incompleteness. The adoption of this methodology is advantageous, because the detection of epistemic uncertainty – the amount of knowledge (or ignorance) we have about the estimate of a system’s performance – during the evaluation process can guide and direct researchers when evaluating new systems over existing and future test collections. Across a series of experiments we demonstrate how this methodology can lead towards a finer grained analysis of systems. In particular, we show through experimentation how the current practice in Information Retrieval evaluation of using a measurement depth larger than the pooling depth increases uncertainty during system comparison.  相似文献   

9.
纪明奎  黄丽霞 《现代情报》2007,27(12):166-167,171
本文提出了一个基于语义网的个性化信息检索模型以实现用户的个性化信息检索。文章论述了该模型的模块组成,并对模型的功能模块进行了分析。  相似文献   

10.
The effectiveness of query expansion methods depends essentially on identifying good candidates, or prospects, semantically related to query terms. Word embeddings have been used recently in an attempt to address this problem. Nevertheless query disambiguation is still necessary as the semantic relatedness of each word in the corpus is modeled, but choosing the right terms for expansion from the standpoint of the un-modeled query semantics remains an open issue. In this paper we propose a novel query expansion method using word embeddings that models the global query semantics from the standpoint of prospect vocabulary terms. The proposed method allows to explore query-vocabulary semantic closeness in such a way that new terms, semantically related to more relevant topics, are elicited and added in function of the query as a whole. The method includes candidates pooling strategies that address disambiguation issues without using exogenous resources. We tested our method with three topic sets over CLEF corpora and compared it across different Information Retrieval models and against another expansion technique using word embeddings as well. Our experiments indicate that our method achieves significant results that outperform the baselines, improving both recall and precision metrics without relevance feedback.  相似文献   

11.
The transformation of many governments all around the world into new forms, namely, electronic government (e-Government), could not leave the Greek government unaffected. Therefore, it initiated an e-Government project related to national information systems and finance services, the Greek Taxation Information System (TAXIS). The purpose of this paper is to investigate the success of TAXIS from the perspective of expert employees, who work in public taxation agencies. This topic is interesting, because TAXIS is applied in a tax-driven country, under a mandatory setting. Also, it is the first time that the success of this project is examined, from the perspective of employees, using IS success models. The study adapts DeLone and McLean [DeLone, W. H., & McLean, E. R. (2003). The DeLone and McLean model of information systems success: A ten year update. Journal of Management Information Systems, 19(4), 9–30] and Seddon's [Seddon, P. B. (1997). A respecification and extension of the DeLone and McLean model of IS success. Information Systems Research, 8(3) 240–253] information systems success models. The model developed includes the constructs of information, system and service quality, perceived usefulness and user satisfaction. The results provide evidence that there are strong connections between the five success constructs. All hypothesized relationships are supported, except for the relationship between system quality and user satisfaction. The empirical evidence and discussion presented can help the Greek Government improve and fully exploit the potential of TAXIS as an innovative tool for taxation purposes.  相似文献   

12.
Concurrent concepts of specificity are discussed and differentiated from each other to investigate the relationship between index term specificity and users’ relevance judgments. The identified concepts are term-document specificity, hierarchical specificity, statement specificity, and posting specificity. Among them, term-document specificity, which is a relationship between an index term and the document indexed with the term, is regarded as a fruitful research area. In an experiment involving three searches with 175 retrieved documents from 356 matched index terms, the impact of specificity on relevance judgments is analyzed and found to be statistically significant. Implications for index practice and for future research are discussed.  相似文献   

13.
本文简要介绍了信息检索以及信息检索模型的概念,研究分析和对比了三种经典信息检索模型,其中重点放在概率检索模型上,并给出了一种具体实现。  相似文献   

14.
This paper presents a Foreign-Language Search Assistant that uses noun phrases as fundamental units for document translation and query formulation, translation and refinement. The system (a) supports the foreign-language document selection task providing a cross-language indicative summary based on noun phrase translations, and (b) supports query formulation and refinement using the information displayed in the cross-language document summaries. Our results challenge two implicit assumptions in most of cross-language Information Retrieval research: first, that once documents in the target language are found, Machine Translation is the optimal way of informing the user about their contents; and second, that in an interactive setting the optimal way of formulating and refining the query is helping the user to choose appropriate translations for the query terms.  相似文献   

15.
n-grams have been used widely and successfully for approximate string matching in many areas. s-grams have been introduced recently as an n-gram based matching technique, where di-grams are formed of both adjacent and non-adjacent characters. s-grams have proved successful in approximate string matching across language boundaries in Information Retrieval (IR). s-grams however lack precise definitions. Also their similarity comparison lacks precise definition. In this paper, we give precise definitions for both. Our definitions are developed in a bottom-up manner, only assuming character strings and elementary mathematical concepts. Extending established practices, we provide novel definitions of s-gram profiles and the L1 distance metric for them. This is a stronger string proximity measure than the popular Jaccard similarity measure because Jaccard is insensitive to the counts of each n-gram in the strings to be compared. However, due to the popularity of Jaccard in IR experiments, we define the reduction of s-gram profiles to binary profiles in order to precisely define the (extended) Jaccard similarity function for s-grams. We also show that n-gram similarity/distance computations are special cases of our generalized definitions.  相似文献   

16.
For the linear statistical model y = Xb + e, X of full column rank estimates of b of the form (C + X′X)+X′y are studied, where C commutes with X′X and Q+ is the Moore-Penrose inverse of Q. Such estimators may have smaller mean square error, component by component than does the least squares estimator. It is shown that this class of estimators is equivalent to two apparently different classes considered by other authors. It is also shown that there is no C such that (C + XX)+XY = My, in which My has the smallest mean square error, component by component. Two criteria, other than tmse, are suggested for selecting C. Each leads to an estimator independent of the unknown b and σ2. Subsequently, comparisons are made between estimators in which the C matrices are functions of a parameter k. Finally, it is shown for the no intercept model that standardizing, using a biased estimate for the transformed parameter vector, and retransforming to the original units yields an estimator with larger tmse than the least squares estimator.  相似文献   

17.
This paper investigates two relatively new measures of retrieval effectiveness in relation to the problem of incomplete relevance data. The measures, Bpref and RankEff, which do not take into account documents that have not been relevance judged, are compared theoretically and experimentally. The experimental comparisons involve a third measure, the well-known mean uninterpolated average precision. The results indicate that RankEff is the most stable of the three measures when the amount of relevance data is reduced, with respect to system ranking and absolute values. In addition, RankEff has the lowest error-rate.  相似文献   

18.
The varying uses of the term “relevance” are considered by analyzing retrieval-related activities into three different processes (Formulation, Retrieval, and Utilization) and four sorts of entities (Enquiry, Attribute, Response, and Benefit).Three quite different sorts of “relevance” are identified (Responsiveness, Pertinence, and Beneficiality). It is argued that relevance in the sense of utility cannot properly be used to evaluate the performance of retrieval processes as retrieval processes. Any use of utility implies that the utilization is also being considered.  相似文献   

19.
We present PubSearch, a hybrid heuristic scheme for re-ranking academic papers retrieved from standard digital libraries such as the ACM Portal. The scheme is based on the hierarchical combination of a custom implementation of the term frequency heuristic, a time-depreciated citation score and a graph-theoretic computed score that relates the paper’s index terms with each other. We designed and developed a meta-search engine that submits user queries to standard digital repositories of academic publications and re-ranks the repository results using the hierarchical heuristic scheme. We evaluate our proposed re-ranking scheme via user feedback against the results of ACM Portal on a total of 58 different user queries specified from 15 different users. The results show that our proposed scheme significantly outperforms ACM Portal in terms of retrieval precision as measured by most common metrics in Information Retrieval including Normalized Discounted Cumulative Gain (NDCG), Expected Reciprocal Rank (ERR) as well as a newly introduced lexicographic rule (LEX) of ranking search results. In particular, PubSearch outperforms ACM Portal by more than 77% in terms of ERR, by more than 11% in terms of NDCG, and by more than 907.5% in terms of LEX. We also re-rank the top-10 results of a subset of the original 58 user queries produced by Google Scholar, Microsoft Academic Search, and ArnetMiner; the results show that PubSearch compares very well against these search engines as well. The proposed scheme can be easily plugged in any existing search engine for retrieval of academic publications.  相似文献   

20.
良好的信息能力是大学生由接受被动式教育向自主性学习,进而向终身学习转变的必要条件。本文以湖北民族学院2013年春季学期选修过《信息检索》课的普通本科学生为调查样本,测评他们的信息能力及其信息检索行为与习惯,并检验《信息检索》课程效果。研究发现,该校大学生信息能力堪忧,《信息检索》课改空间大。提出实施《信息检索》课改时,尤其应关注教学层面、领导层面、行业指导层面、授课教师权益保障层面等四方面的问题。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号