Similar literature
1.
In order to evaluate the effectiveness of Information Retrieval (IR) systems it is key to collect relevance judgments from human assessors. Crowdsourcing has successfully been used as a method to scale up the collection of manual relevance judgments, and previous research has investigated the impact of different judgment task design elements (e.g., highlighting query keywords in the document) on judgment quality and efficiency. In this work we investigate the positive and negative impacts of presenting crowd assessors with more than just the topic and the document to be judged. We deploy different variants of crowdsourced relevance judgment tasks following a between-subjects design in which we present different types of metadata to the human assessor. Specifically, we investigate the effect of human metadata (e.g., what other human assessors think of the current document, such as which relevance level has already been selected by the majority of crowd workers) and machine metadata (e.g., how IR systems scored this document, such as its average position in ranked lists, and statistics about the document, such as term frequencies). We look at the impact of metadata on judgment quality (i.e., the level of agreement with trained assessors) and cost (i.e., the time it takes for workers to complete the judgments), as well as at how metadata quality positively or negatively impacts the collected judgments.
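The abstract measures judgment quality as the level of agreement with trained assessors but does not say which agreement statistic is used. As an illustration only, the sketch below assumes Cohen's kappa, a common chance-corrected agreement measure; the function name `cohen_kappa` and the label lists are hypothetical.

```python
from collections import Counter


def cohen_kappa(crowd, expert):
    """Cohen's kappa between two equal-length lists of relevance labels.

    Assumption: kappa is used here only as an example of an agreement
    statistic; the abstract does not name the measure it relies on.
    """
    if len(crowd) != len(expert) or not crowd:
        raise ValueError("label lists must be non-empty and the same length")
    n = len(crowd)
    # Observed agreement: fraction of documents given the same label.
    observed = sum(c == e for c, e in zip(crowd, expert)) / n
    # Expected agreement if both label sources were independent.
    crowd_freq = Counter(crowd)
    expert_freq = Counter(expert)
    expected = sum(crowd_freq[label] * expert_freq[label] for label in crowd_freq) / (n * n)
    if expected == 1.0:
        # Both sources used a single identical label for every document.
        return 1.0
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    crowd_labels = [1, 0, 1, 1, 0, 0]   # hypothetical crowd judgments
    expert_labels = [1, 0, 1, 0, 0, 1]  # hypothetical trained-assessor judgments
    print(round(cohen_kappa(crowd_labels, expert_labels), 3))
```

A value of 1 means perfect agreement and 0 means agreement no better than chance, so task variants can be compared on a chance-corrected scale.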

2.
Term weighting for document ranking and retrieval has been an important research topic in information retrieval for decades. We propose a novel term weighting method based on a hypothesis that a term’s role in accumulated retrieval sessions in the past affects its general importance regardless. It utilizes the availability of past retrieval results consisting of the queries that contain a particular term, retrieved documents, and their relevance judgments. A term’s evidential weight, as we propose in this paper, depends on the degree to which the mean frequency values for the relevant and non-relevant document distributions in the past are different. More precisely, it takes into account the rankings and similarity values of the relevant and non-relevant documents. Our experimental results using standard test collections show that the proposed term weighting scheme improves on conventional TF*IDF and language model based schemes. This indicates that evidential term weights bring in a new aspect of term importance and complement the collection statistics based on TF*IDF. We also show how the proposed term weighting scheme, based on the notion of evidential weights, is related to the well-known weighting schemes based on language modeling and probabilistic models.
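For orientation, the conventional TF*IDF baseline that the abstract compares against can be sketched as follows. This is a minimal textbook variant (raw term frequency times log(N/df)), not the authors' evidential weighting, whose formula the abstract does not give; the function name `tf_idf` and the toy documents are assumptions.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumption: weight = raw term frequency * log(N / document frequency),
    one of several common TF*IDF variants.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({term: tf[term] * math.log(n_docs / df[term]) for term in tf})
    return weights


if __name__ == "__main__":
    toy_docs = [["a", "b", "a"], ["b", "c"]]  # hypothetical tokenized documents
    for doc_weights in tf_idf(toy_docs):
        print(doc_weights)
```

A term that occurs in every document gets weight zero under this variant, which is the kind of purely collection-based signal that the evidential weights are said to complement.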

3.
In information retrieval, the task of query performance prediction (QPP) is concerned with determining in advance the performance of a given query within the context of a retrieval model. QPP has an important role in ensuring proper handling of queries with varying levels of difficulty. Based on the extant literature, query specificity is an important indicator of query performance and is typically estimated using corpus-specific frequency-based specificity metrics. However, such metrics do not consider term semantics and inter-term associations. Our work presented in this paper distinguishes itself by proposing a host of corpus-independent specificity metrics that are based on pre-trained neural embeddings and leverage geometric relations between terms in the embedding space in order to capture the semantics of terms and their interdependencies. Specifically, we propose three classes of specificity metrics based on pre-trained neural embeddings: neighborhood-based, graph-based, and cluster-based metrics. Through two extensive and complementary sets of experiments, we show that the proposed specificity metrics (1) are suitable specificity indicators, based on the gold standards derived from knowledge hierarchies (Wikipedia category hierarchy and DMOZ taxonomy), and (2) have better or competitive performance compared to the state-of-the-art QPP metrics, based on both TREC ad hoc collections (Robust’04, Gov2 and ClueWeb’09) and the ANTIQUE question answering collection. The proposed graph-based specificity metrics, especially those that capture a larger number of inter-term associations, proved to be the most effective in both query specificity estimation and QPP. We have also publicly released two test collections (i.e., specificity gold standards) that we built from the Wikipedia and DMOZ knowledge hierarchies.

4.
Smart cities employ information and communication technologies to improve the quality of life for their citizens, the local economy, transport, traffic management, the environment, and interaction with government. Due to the relevance of smart cities (also referred to using other related terms such as Digital City, Information City, Intelligent City, Knowledge-based City, Ubiquitous City, Wired City) to various stakeholders and the benefits and challenges associated with their implementation, the concept of smart cities has attracted significant attention from researchers within multiple fields, including information systems. This study provides a valuable synthesis of the relevant literature by analysing and discussing the key findings from existing research on issues related to smart cities from an Information Systems perspective. The research analysed and discussed in this study focuses on a number of aspects of smart cities: smart mobility, smart living, smart environment, smart citizens, smart government, and smart architecture, as well as related technologies and concepts. The discussion also focuses on the alignment of smart cities with the UN sustainable development goals. This comprehensive review offers critical insight into the key underlying research themes within smart cities, highlighting the limitations of current developments and potential future directions.

5.
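Studying the correlation between the development of fishery public informatization and fishery economic efficiency helps to analyze the intrinsic link between the two and to identify problems in the development of fishery public informatization. On this basis, using a fuzzy matter-element analysis method based on the entropy method and the DEA-Malmquist index method, and drawing on data for 28 provinces (autonomous regions and municipalities) of China from 2006 to 2018, the study computes the level of fishery public informatization and the fishery economic efficiency of each region. It then applies panel unit root tests, cointegration tests, and an error correction model to analyze the correlation between the two. The results show that fishery public informatization and fishery economic efficiency have both long-run and short-run equilibrium relationships and are positively correlated.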
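The entropy method named in the abstract is commonly used to derive objective indicator weights before an aggregate score is computed. The sketch below shows the standard entropy weight calculation only; it is an assumption that the study uses this textbook form, and the fuzzy matter-element step and the DEA-Malmquist index are not reproduced. The function name `entropy_weights` and the sample matrix are hypothetical.

```python
import math


def entropy_weights(matrix):
    """Entropy weights for the columns of a non-negative indicator matrix.

    matrix: list of rows (e.g., regions), each a list of indicator values.
    Assumption: the textbook entropy weight method, w_j = (1 - e_j) / sum(1 - e_k).
    """
    m = len(matrix)
    if m < 2:
        raise ValueError("need at least two rows to compute entropy")
    n = len(matrix[0])
    diversities = []
    for j in range(n):
        column = [row[j] for row in matrix]
        total = sum(column)
        if total <= 0:
            raise ValueError("each indicator column must have a positive sum")
        # Normalized Shannon entropy of the column's share distribution.
        entropy = 0.0
        for value in column:
            p = value / total
            if p > 0:
                entropy -= p * math.log(p)
        entropy /= math.log(m)
        # Indicators that vary more across rows carry more information.
        diversities.append(1.0 - entropy)
    diversity_sum = sum(diversities)
    if diversity_sum == 0:
        # Every indicator is uniform across rows: fall back to equal weights.
        return [1.0 / n] * n
    return [d / diversity_sum for d in diversities]


if __name__ == "__main__":
    sample = [[1.0, 1.0], [1.0, 3.0]]  # hypothetical: 2 regions x 2 indicators
    print(entropy_weights(sample))
```

In the sample, the first indicator is identical across regions and so receives essentially no weight, while the second carries all of it.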

6.
Relevance judgments occur within an information search process, where time, context and situation can impact the judgments. The determination of relevance is dependent on a number of factors and variables which include the criteria used to determine relevance. The relevance judgment process and the criteria used to make those judgments are manifestations of the cognitive changes which occur during the information search process. Understanding why these relevance criteria choices are made, and how they vary over the information search process can provide important information about the dynamic relevance judgment process. This information can be used to guide the development of more adaptive information retrieval systems which respond to the cognitive changes of users during the information search process. The research data analyzed here was collected in two separate studies which examined a subject’s relevance judgment over an information search process. Statistical analysis was used to examine these results and determine if there were relationships between criteria selections, relevance judgments, and the subject’s progression through the information search process. Findings confirm and extend findings of previous studies, providing strong statistical evidence of an association between the information search process and the choices of relevance criteria by users, and identifying specific changes in the user preferences for specific criteria over the course of the information search process.

7.
While test collections provide the cornerstone for Cranfield-based evaluation of information retrieval (IR) systems, it has become practically infeasible to rely on traditional pooling techniques to construct test collections at the scale of today’s massive document collections (e.g., ClueWeb12’s 700M+ Webpages). This has motivated a flurry of studies proposing more cost-effective yet reliable IR evaluation methods. In this paper, we propose a new intelligent topic selection method which reduces the number of search topics (and thereby costly human relevance judgments) needed for reliable IR evaluation. To rigorously assess our method, we integrate previously disparate lines of research on intelligent topic selection and deep vs. shallow judging (i.e., whether it is more cost-effective to collect many relevance judgments for a few topics or a few judgments for many topics). While prior work on intelligent topic selection has never been evaluated against shallow judging baselines, prior work on deep vs. shallow judging has largely argued for shallow judging, but assumed random topic selection. We argue that for evaluating any topic selection method, ultimately one must ask whether it is actually useful to select topics, or whether one should simply perform shallow judging over many topics. In seeking a rigorous answer to this over-arching question, we conduct a comprehensive investigation over a set of relevant factors never previously studied together: 1) method of topic selection; 2) the effect of topic familiarity on human judging speed; and 3) how different topic generation processes (requiring varying human effort) impact (i) budget utilization and (ii) the resultant quality of judgments.
Experiments on NIST TREC Robust 2003 and Robust 2004 test collections show that not only can we reliably evaluate IR systems with fewer topics, but also that: 1) when topics are intelligently selected, deep judging is often more cost-effective than shallow judging in evaluation reliability; and 2) topic familiarity and topic generation costs greatly impact the evaluation cost vs. reliability trade-off. Our findings challenge conventional wisdom in showing that deep judging is often preferable to shallow judging when topics are selected intelligently.

8.
A new model for aggregating multiple criteria evaluations for relevance assessment is proposed. An Information Retrieval context is considered, where relevance is modeled as a multidimensional property of documents. The usefulness and effectiveness of such a model are demonstrated by means of a case study on personalized Information Retrieval with multi-criteria relevance. The following criteria are considered to estimate document relevance: aboutness, coverage, appropriateness, and reliability.

9.
Research Policy, 2019, 48(10): 103557
Complex societal or environmental problems require fast and substantial socio-technical transitions. For instance, in the case of climate change, these transitions need to take place in the energy, transport and several industry sectors. To induce and accelerate such transitions, numerous policy interventions are required, which interact with each other in policy mixes. While several conceptual studies on policy mixes have been published recently, there is very little empirical research apart from single case or small-n studies. It has been prominently argued that the debate about policy mixes has reached an impasse partly due to this lack of empirical work. This paper addresses this gap by providing a first analysis of the temporal dynamics of complex policy mixes. To do so, we develop a conceptualization and measurement of policy mix balance across instrument types as well as policy mix design features (in the form of intensity as a general and technology specificity as a technology-focused design feature). This allows us to answer the question of how temporal dynamics of policy mixes differ between countries regarding their balance and design features. Our measurement approach is developed bottom-up, i.e., policies are assessed individually and then aggregated systematically at the policy mix level. This enables overcoming the ‘dependent variable problem in the study of policy change’, i.e., the problem of measuring policy output. More specifically, we develop a comparative dataset of 522 renewable energy policies in nine OECD countries. Our analysis shows that countries’ policy mix dynamics vary strongly regarding some variables (e.g., technology specificity) but less regarding others (e.g., balance). As a validity check, we also test the effects of these mix dynamics on policy outcome in the form of renewable energy technology diffusion.
We reflect on our findings in light of the theoretical debates around policy mixes and policy design and discuss how our results provoke an agenda for the new generation of research on policy mixes. We specifically discuss avenues for future research with a particular focus on the ‘politics of policy mixes’.

10.
In this paper, results from three studies examining 1295 relevance judgments by 36 information retrieval (IR) system end-users are reported. Both the region of the relevance judgments, from non-relevant to highly relevant, and the motivations or levels for the relevance judgments are examined. Three major findings are presented. First, the frequency distributions of relevance judgments by IR system end-users tend to take on a bi-modal shape with peaks at the extremes (non-relevant/relevant) and a flatter middle range. Second, the different type of scale (interval or ordinal) used in each study did not alter the shape of the relevance frequency distributions. And third, on an interval scale, the median point of relevance judgment distributions correlates with the point where relevant and partially relevant items begin to be retrieved. The median point of a distribution of relevance judgments may provide a measure of user/IR system interaction to supplement precision/recall measures. The implications of the investigation for relevance theory and IR systems evaluation are discussed.

11.
Document length normalization is one of the fundamental components in a retrieval model because term frequencies can readily be increased in long documents. The key hypotheses in the literature regarding document length normalization are the verbosity and scope hypotheses, which imply that document length normalization should consider the distinguishing effects of verbosity and scope on term frequencies. In this article, we extend these hypotheses in a pseudo-relevance feedback setting by assuming the verbosity hypothesis on the feedback query model, which states that the verbosity of an expanded query should not be high. Furthermore, we postulate the following two effects of document verbosity on a feedback query model that easily and typically hold in modern pseudo-relevance feedback methods: 1) the verbosity-preserving effect: the query verbosity of a feedback query model is determined by feedback document verbosities; 2) the verbosity-sensitive effect: highly verbose documents more significantly and unfairly affect the resulting query model than normal documents do. By considering these effects, we propose verbosity normalized pseudo-relevance feedback, which is straightforwardly obtained by replacing original term frequencies with their verbosity-normalized term frequencies in the pseudo-relevance feedback method. The results of the experiments performed on three standard TREC collections show that the proposed verbosity normalized pseudo-relevance feedback consistently provides statistically significant improvements over conventional methods, under the settings of the relevance model and latent concept expansion.

12.
Three traditional problems in accelerative mechanics are chosen to exemplify the manner of working by the proportionalities method. The formal construction of a proportionality is given and the form of the dimensional matrix of a monomial is discussed. Having chosen the dimension in which the (n+1) term reduction of the variables of a dimensionally homogeneous set is to be explicitly dimensionally homogeneous, the variables of other dimensions may be systematically combined, as necessary, to leave only terms in the first chosen dimension and one other. The variables of the first dimension, together with the proportionalities formed in terms of the first dimension, can then be arranged to encompass the whole relationship between the variables of the set. This is done in a manner which readily allows formal display of all the choices in the nondimensionalization process of reduction from the (n+1) term equation to the n term nondimensional equation.

13.
An experiment was conducted to see how relevance feedback could be used to build and adjust profiles to improve the performance of filtering systems. Data was collected during the system interaction of 18 graduate students with SIFTER (Smart Information Filtering Technology for Electronic Resources), a filtering system that ranks incoming information based on users' profiles. The data set came from a collection of 6000 records concerning consumer health. In the first phase of the study, three different modes of profile acquisition were compared. The explicit mode allowed users to directly specify the profile; the implicit mode utilized relevance feedback to create and refine the profile; and the combined mode allowed users to initialize the profile and to continuously refine it using relevance feedback. Filtering performance, measured in terms of Normalized Precision, showed that the three approaches were significantly different (α=0.05 and p=0.012). The explicit mode of profile acquisition consistently produced superior results. Exclusive reliance on relevance feedback in the implicit mode resulted in inferior performance. The low performance obtained by the implicit acquisition mode motivated the second phase of the study, which aimed to clarify the role of context in relevance feedback judgments. An inductive content analysis of think-aloud protocols revealed dimensions that were highly situational, establishing the important role context plays in feedback relevance assessments. Results suggest the need for better representation of documents, profiles, and relevance feedback mechanisms that incorporate dimensions identified in this research.

14.
Some of the most popular measures to evaluate information filtering systems are usually independent of the users because they are based on relevance judgments obtained from experts. On the other hand, user-centred evaluation shows the different impressions that the users have of the system in operation. This work discusses the problem of user-centred versus system-centred evaluation of a Web content personalization system where the personalization is based on a user model that stores long-term interests (section, categories and keywords) and short-term interests (adapted from user-provided feedback). The user-centred evaluation is based on questionnaires filled in by the users before and after using the system, and the system-centred evaluation is based on the comparison between rankings of documents, obtained from the application of a multi-tier selection process, and binary relevance judgments collected previously from real users. The user-centred and system-centred evaluations performed with 106 users during 14 working days have provided valuable data concerning the behaviour of the users with respect to issues such as document relevance or the relative importance attributed to different ways of personalization. The results obtained show general satisfaction with both the personalization processes (selection, adaptation and presentation) and the system as a whole.

15.
16.
In Part I properties of the scale coördinate, of the form B(n + θ), are discussed. n is shown to be associated with the operation of counting scale marks, θ with the operation of estimating between them, and B with the operational and configurational aspects of that part of apparatus which lies adjunct to the scale system. In Part II three types of measurement codification are discussed: (a) the differential interval; (b) the finite amorphous interval; (c) the scale interval; a relationship among them is postulated. In Part III the finite differences in scale coördinates are defined and simple theorems are used to illustrate these definitions. Simple difference equations in scale coördinates are solved to illustrate macroscopic “selection principles” arising partly out of the methodology of codifying a coincidence in scale coördinates. In Part IV an example of causally related dimensional systems is described by use of the scale coördinate. This example is taken from the perfect gas law and Van der Waals' gas law.
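For reference, the quantities named in this abstract can be written out as below. The scale coördinate form is taken from the abstract; the symbol $x$ for a reading is an assumption, and the two gas laws are given in their standard textbook per-mole forms (molar volume $V_m$, gas constant $R$, Van der Waals constants $a$ and $b$), which may differ from the notation used in the paper itself.

```latex
% Scale coordinate: n counts scale marks, \theta is the estimated
% fraction between marks, B characterizes the apparatus (from the abstract).
x = B\,(n + \theta)

% Perfect (ideal) gas law, per mole (standard textbook form).
P\,V_m = R\,T

% Van der Waals' gas law, per mole (standard textbook form).
\left(P + \frac{a}{V_m^{2}}\right)\left(V_m - b\right) = R\,T
```

Setting $a = b = 0$ in the Van der Waals form recovers the perfect gas law.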

17.
In this paper, a new source selection algorithm for uncooperative distributed information retrieval environments is presented. The algorithm functions by modeling each information source as an integral, using the relevance score and the intra-collection position of its sampled documents in reference to a centralized sample index, and selects the collections that cover the largest area in the rank-relevance space. Based on the above novel metric, the algorithm explicitly focuses on addressing the two goals of source selection: high recall, which is important for source recommendation applications, and high precision, which is important for distributed information retrieval, aiming to produce a high-precision final merged list.

18.
This paper investigates two relatively new measures of retrieval effectiveness in relation to the problem of incomplete relevance data. The measures, Bpref and RankEff, which do not take into account documents that have not been relevance judged, are compared theoretically and experimentally. The experimental comparisons involve a third measure, the well-known mean uninterpolated average precision. The results indicate that RankEff is the most stable of the three measures when the amount of relevance data is reduced, with respect to system ranking and absolute values. In addition, RankEff has the lowest error-rate.
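To make the contrast concrete, the sketch below computes Bpref and uninterpolated average precision for a single ranked list. It assumes the commonly used Bpref variant that divides by min(R, N), where R and N are the numbers of judged relevant and judged non-relevant documents; the paper may use a different variant. RankEff is not sketched because the abstract does not define it. Function names and document identifiers are hypothetical.

```python
def bpref(ranking, relevant, nonrelevant):
    """Bpref for one ranked list; unjudged documents are ignored.

    Assumption: the min(R, N) variant of Bpref.
    """
    num_rel = len(relevant)
    if num_rel == 0:
        return 0.0
    denom = min(num_rel, len(nonrelevant))
    nonrel_seen = 0
    total = 0.0
    for doc in ranking:
        if doc in relevant:
            if denom == 0:
                total += 1.0
            else:
                # Penalize by the judged non-relevant documents ranked above.
                total += 1.0 - min(nonrel_seen, denom) / denom
        elif doc in nonrelevant:
            nonrel_seen += 1
        # Documents in neither set are unjudged and contribute nothing.
    return total / num_rel


def average_precision(ranking, relevant):
    """Uninterpolated average precision; unjudged documents count as non-relevant."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


if __name__ == "__main__":
    rel = {"d1", "d3"}      # hypothetical judged relevant documents
    nonrel = {"d2", "d4"}   # hypothetical judged non-relevant documents
    with_unjudged = ["d1", "u1", "d2", "d3", "d4"]
    without_unjudged = ["d1", "d2", "d3", "d4"]
    print(bpref(with_unjudged, rel, nonrel), bpref(without_unjudged, rel, nonrel))
    print(average_precision(with_unjudged, rel), average_precision(without_unjudged, rel))
```

In this toy case, inserting the unjudged document `u1` leaves Bpref unchanged but lowers average precision, which illustrates why measures of this kind are studied under incomplete relevance data.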

19.
20.
Management innovation and the consultants who promote and support it are both typically associated with the ‘new’, with departures from the norm and from standard approaches. Indeed, standardization is often seen as an impediment to innovation, especially in the current ‘post-bureaucratic’ era. This article challenges such a view, arguing that consultant-led management innovation is often highly standardized. Based upon qualitative research into internal consultancy in large business organizations, both standardizing agendas and standardized methods are identified from a range of consultant-led management innovation programs. The analysis then points to some of the structural and cultural features of organizations that lead to managers favouring incremental, standardized approaches to change, even if these are often contested. In conclusion, the article points to the need to consider a range of different dimensions in the relationship between standardization and management innovation.
