
1.

Tasks, topics and relevance judging for the TREC Genomics Track: five years of experience evaluating biomedical text information retrieval systems

Phoebe M. Roberts Aaron M. Cohen William R. Hersh 《Information Retrieval》2009,12(1):81-97

With the help of a team of expert biologist judges, the TREC Genomics track has generated four large sets of “gold standard” test collections, comprised of over a hundred unique topics, two kinds of ad hoc retrieval tasks, and their corresponding relevance judgments. Over the years of the track, increasingly complex tasks necessitated the creation of judging tools and training guidelines to accommodate teams of part-time short-term workers from a variety of specialized biological scientific backgrounds, and to address consistency and reproducibility of the assessment process. Important lessons were learned about factors that influenced the utility of the test collections including topic design, annotations provided by judges, methods used for identifying and training judges, and providing a central moderator “meta-judge”.

2.

Mining subtopics from different aspects for diversifying search results

Chieh-Jen Wang Yung-Wei Lin Ming-Feng Tsai Hsin-Hsi Chen 《Information Retrieval》2013,16(4):452-483

User queries to the Web tend to have more than one interpretation due to their ambiguity and other characteristics. How to diversify the ranking results to meet users’ various potential information needs has attracted considerable attention recently. This paper aims to mine the subtopics of a query, either indirectly from the returned results of retrieval systems or directly from the query itself, in order to diversify the search results. For the indirect subtopic mining approach, clustering the retrieval results and summarizing the content of clusters is investigated. In addition, labeling topic categories and concept tags on each returned document is explored. For the direct subtopic mining approach, several external resources, such as Wikipedia, Open Directory Project, search query logs, and the related search services of search engines, are consulted. Furthermore, we propose a diversified retrieval model to rank documents with respect to the mined subtopics for balancing relevance and diversity. Experiments are conducted on the ClueWeb09 dataset with the topics of the TREC09 and TREC10 Web Track diversity tasks. Experimental results show that the proposed subtopic-based diversification algorithm significantly outperforms the state-of-the-art models in the TREC09 and TREC10 Web Track diversity tasks. The best performance our proposed algorithm achieves is α-nDCG@5 0.307, IA-P@5 0.121, and α#-nDCG@5 0.214 on TREC09, as well as α-nDCG@10 0.421, IA-P@10 0.201, and α#-nDCG@10 0.311 on TREC10. The results indicate that subtopic mining from up-to-date user search query logs is the most effective way to generate the subtopics of a query, and that the proposed subtopic-based diversification algorithm can select documents covering various subtopics.
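
The trade-off between relevance and subtopic coverage described in this abstract can be illustrated with a generic greedy re-ranker. This is only a sketch under stated assumptions, not the authors' diversified retrieval model: the function name `diversify`, the mixing weight `lam`, the relevance scores, and the subtopic labels are all hypothetical.

```python
def diversify(relevance, coverage, k, lam=0.5):
    """Greedy re-ranking: trade off relevance against newly covered subtopics.

    relevance: dict doc -> relevance score (hypothetical values)
    coverage:  dict doc -> set of subtopics the document covers
    lam:       assumed mixing weight between relevance and novelty
    """
    selected = []
    covered = set()
    candidates = set(relevance)
    while candidates and len(selected) < k:
        def gain(doc):
            # Only subtopics not yet covered by earlier picks count as novelty.
            new_subtopics = coverage.get(doc, set()) - covered
            return (1 - lam) * relevance[doc] + lam * len(new_subtopics)

        # sorted() makes tie-breaking deterministic.
        best = max(sorted(candidates), key=gain)
        selected.append(best)
        covered |= coverage.get(best, set())
        candidates.remove(best)
    return selected


# Hypothetical query "jaguar" with three mined subtopics.
relevance = {"d1": 0.9, "d2": 0.85, "d3": 0.6, "d4": 0.4}
coverage = {
    "d1": {"car"},
    "d2": {"car"},
    "d3": {"animal"},
    "d4": {"operating system"},
}
ranking = diversify(relevance, coverage, k=3)
print(ranking)
```

In this toy run the second-most relevant document is skipped because its only subtopic is already covered by the first pick.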

3.

The strange case of reproducibility versus representativeness in contextual suggestion test collections

Thaer Samar Alejandro Bellogín Arjen P. de Vries 《Information Retrieval》2016,19(3):230-255

The most common approach to measuring the effectiveness of Information Retrieval systems is by using test collections. The Contextual Suggestion (CS) TREC track provides an evaluation framework for systems that recommend items to users given their geographical context. The specific nature of this track allows the participating teams to identify candidate documents either from the Open Web or from the ClueWeb12 collection, a static version of the web. In the judging pool, the documents from the Open Web and ClueWeb12 collection are distinguished. Hence, each system submission should be based only on one resource, either Open Web (identified by URLs) or ClueWeb12 (identified by ids). To achieve reproducibility, ranking web pages from ClueWeb12 should be the preferred method for scientific evaluation of CS systems, but it has been found that the systems that build their suggestion algorithms on top of input taken from the Open Web consistently achieve higher effectiveness. Because most of the systems take a rather similar approach to making CSs, this raises the question of whether systems built by researchers on top of ClueWeb12 are still representative of those that would work directly on industry-strength web search engines. Do we need to sacrifice reproducibility for the sake of representativeness? We study the difference in effectiveness between Open Web systems and ClueWeb12 systems through analyzing the relevance assessments of documents identified from both the Open Web and ClueWeb12. Then, we identify documents that overlap between the relevance assessments of the Open Web and ClueWeb12, observing a dependency between relevance assessments and the source of the document being taken from the Open Web or from ClueWeb12. After that, we identify documents from the relevance assessments of the Open Web which exist in the ClueWeb12 collection but do not exist in the ClueWeb12 relevance assessments. We use these documents to expand the ClueWeb12 relevance assessments.
Our main findings are twofold. First, our empirical analysis of the relevance assessments of 2 years of the CS track shows that Open Web documents receive better ratings than ClueWeb12 documents, especially if we look at the documents in the overlap. Second, our approach for selecting candidate documents from the ClueWeb12 collection based on information obtained from the Open Web is a step towards partially bridging the gap in effectiveness between Open Web and ClueWeb12 systems, while at the same time we achieve reproducible results on a well-known representative sample of the web.

4.

Click-based evidence for decaying weight distributions in search effectiveness metrics

Yuye Zhang Laurence A. F. Park Alistair Moffat 《Information Retrieval》2010,13(1):46-69

Search effectiveness metrics are used to evaluate the quality of the answer lists returned by search services, usually based on a set of relevance judgments. One plausible way of calculating an effectiveness score for a system run is to compute the inner product of the run’s relevance vector and a “utility” vector, where the ith element in the utility vector represents the relative benefit obtained by the user of the system if they encounter a relevant document at depth i in the ranking. This paper uses such a framework to examine the user behavior patterns, and hence utility weightings, that can be inferred from a web query log. We describe a process for extrapolating user observations from query log clickthroughs, and employ this user model to measure the quality of effectiveness weighting distributions. Our results show that for measures with static distributions (that is, utility weighting schemes for which the weight vector is independent of the relevance vector), the geometric weighting model employed in the rank-biased precision effectiveness metric offers the closest fit to the user observation model. In addition, using past TREC data to indicate likelihood of relevance, we also show that the distributions employed in the BPref and MRR metrics are the best fit out of the measures for which static distributions do not exist.
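
The inner-product view of an effectiveness score can be sketched in a few lines. The example below is a minimal illustration, not the authors' code: it assumes binary relevance, the usual rank-biased precision weights (1 - p) * p^(i-1) for rank i, and a hypothetical persistence value p = 0.8; the function names and the sample run are invented for this example.

```python
def rbp_weights(depth, p=0.8):
    """Geometric (static) utility vector: (1 - p) * p**i for 0-based rank i."""
    return [(1.0 - p) * p ** i for i in range(depth)]


def effectiveness(relevance, weights):
    """Inner product of a run's relevance vector and a utility vector."""
    return sum(r * w for r, w in zip(relevance, weights))


# Hypothetical run: relevant documents at ranks 1, 3 and 4.
run = [1, 0, 1, 1, 0]
weights = rbp_weights(len(run), p=0.8)
score = effectiveness(run, weights)
print(round(score, 4))
```

Because `rbp_weights` never looks at the relevance vector, it is a static distribution in the sense used above; a metric such as MRR would need weights that depend on where the first relevant document appears.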

5.

Server selection methods in personal metasearch: a comparative empirical study

Paul Thomas David Hawking 《Information Retrieval》2009,12(5):581-604

Server selection is an important subproblem in distributed information retrieval (DIR) but has commonly been studied with collections of more or less uniform size and with more or less homogeneous content. In contrast, realistic DIR applications may feature much more varied collections. In particular, personal metasearch, a novel application of DIR which includes all of a user’s online resources, may involve collections which vary in size by several orders of magnitude, and which have highly varied data. We describe a number of algorithms for server selection, and consider their effectiveness when collections vary widely in size and are represented by imperfect samples. We compare the algorithms on a personal metasearch testbed comprising calendar, email, mailing list and web collections, where collection sizes differ by three orders of magnitude. We then explore the effect of collection size variations using four partitionings of the TREC ad hoc data used in many other DIR experiments. Kullback-Leibler divergence, previously considered poorly effective, performs better than expected in this application; other techniques thought to be effective perform poorly and are not appropriate for this problem. A strong correlation with size-based rankings for many techniques may be responsible.
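
As a rough illustration of server selection by Kullback-Leibler divergence, the sketch below ranks collections by how closely a language model built from each collection's sample matches the query. It is not necessarily the formulation evaluated in the paper: the additive smoothing, the parameter `mu`, the function name `kl_score`, and the tiny term samples are all assumptions made for this example.

```python
import math
from collections import Counter


def kl_score(query_terms, sample_terms, vocab_size, mu=1.0):
    """Negative KL(query || collection sample); higher means a closer match.

    Additive smoothing with pseudo-count `mu` is an assumption of this sketch.
    """
    q = Counter(query_terms)
    c = Counter(sample_terms)
    q_total = sum(q.values())
    c_total = sum(c.values())
    score = 0.0
    for term, count in q.items():
        p_q = count / q_total
        p_c = (c[term] + mu) / (c_total + mu * vocab_size)
        score -= p_q * math.log(p_q / p_c)
    return score


# Hypothetical samples drawn from three personal collections.
samples = {
    "calendar": ["meeting", "lunch", "meeting", "review"],
    "email": ["budget", "review", "meeting", "budget", "report"],
    "web": ["news", "sports", "weather", "news"],
}
vocab = {term for terms in samples.values() for term in terms}
query = ["budget", "review"]

selection = sorted(
    samples,
    key=lambda name: kl_score(query, samples[name], len(vocab)),
    reverse=True,
)
print(selection)
```

Because the score is computed from sample term proportions rather than raw counts, it is less directly tied to collection size than count-based scores, which may be relevant to the size correlation the abstract mentions.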

6.

Incremental Relevance Feedback in Japanese Text Retrieval

Gareth Jones Tetsuya Sakai Masahiro Kajiura Kazuo Sumita 《Information Retrieval》2000,2(4):361-384

The application of relevance feedback techniques has been shown to improve retrieval performance for a number of information retrieval tasks. This paper explores incremental relevance feedback for ad hoc Japanese text retrieval, examining, separately and in combination, the utility of term reweighting and query expansion using a probabilistic retrieval model. Retrieval performance is evaluated in terms of standard precision-recall measures, and also using number-to-view graphs. Experimental results, on the standard BMIR-J2 Japanese language retrieval collection, show that both term reweighting and query expansion improve retrieval performance. This is reflected in improvements in both precision and recall, and also in a reduction in the average number of documents which must be viewed to find a selected number of relevant items. In particular, using a simple simulation of user searching, incremental application of relevance information is shown to lead to progressively improved retrieval performance and an overall reduction in the number of documents that a user must view to find relevant ones.

7.

Construction of query concepts based on feature clustering of documents

Youjin Chang Minkoo Kim Vijay V. Raghavan 《Information Retrieval》2006,9(3):231-248

In Information Retrieval, since it is hard to identify users’ information needs, many approaches have been tried to solve this problem by expanding initial queries and reweighting the terms in the expanded queries using users’ relevance judgments. Although relevance feedback is most effective when relevance information about retrieved documents is provided by users, it is not always available. Another solution is to use correlated terms for query expansion. The main problem with this approach is how to construct the term-term correlations that can be used effectively to improve retrieval performance. In this study, we try to construct query concepts that denote users’ information needs from a document space, rather than to reformulate initial queries using the term correlations and/or users’ relevance feedback. To form query concepts, we extract features from each document, and then cluster the features into primitive concepts that are then used to form query concepts. Experiments are performed on the Associated Press (AP) dataset taken from the TREC collection. The experimental evaluation shows that our proposed framework, called QCM (Query Concept Method), outperforms a baseline probabilistic retrieval model on TREC retrieval.

8.

Editorial decision making: Risk and reward

Mike Shatzkin 《Publishing Research Quarterly》1999,15(3):55-62

The most critical decisions, whether or not to invest in a publishing project, are made with inadequate analytical infrastructure. Much time is consumed in obtaining the views of generalists within the company to second-guess the decisions of those who really know the market: the editor and the author. Decisions are made on the basis of “title P&Ls” when individual titles do not make a profit or a loss; only businesses do. Little attempt is made to assess relative risk, and the most crucial element for most businesses, cash, is hardly considered. Editors and publishers need new tools to support their decision-making process. Mike Shatzkin is co-chairman of the VISTA Editorial Board. His Idea Logical company produces content for Sportsline USA web sites.

9.

Measured markets: Limited edition publishing and the Grabhorn Press, 1920–1930

Megan Benton 《Publishing Research Quarterly》1995,11(2):90-102

In the postwar prosperity of the 1920s there burgeoned a new interest in fine book-making, which typically featured handcraft production, luxurious materials, “worthy” texts, and, virtually by definition, limited editions. A small but socially prominent community of bibliophiles and wealthy collectors constituted an eager market for these elite books, distinguished by their visible repudiation of mass culture and “commercialism.” This article examines the publishing enterprise of the Grabhorn Press, one of the foremost producers of finely printed books in twentieth-century America. It analyzes the press's editorial and design strategies, pricing and marketing policies, and general business practices in order to better understand the cultural paradoxes of producing such books both “for love” and for profit.

10.

Evaluating the effectiveness of relevance feedback based on a user simulation model: effects of a user scenario on cumulated gain value

Heikki Keskustalo Kalervo Järvelin Ari Pirkola 《Information Retrieval》2008,11(3):209-228

We propose a method for evaluating relevance feedback based on simulating real users. The user simulation applies a model defining the user’s relevance threshold to accept individual documents as feedback in a graded relevance environment; the user’s patience to browse the initial list of retrieved documents; and his/her effort in providing the feedback. We evaluate the result by using cumulated gain-based evaluation together with freezing all documents seen by the user in order to simulate the point of view of a user who is browsing the documents during the retrieval process. We demonstrate the method by performing a simulation in the laboratory setting and present the “branching” curve sets characteristic of the presented evaluation method. Both the average and topic-by-topic results indicate that if the freezing approach is adopted, giving feedback of mixed quality makes sense for various usage scenarios even though the modeled users prefer finding especially the most relevant documents.
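
The combination of cumulated gain and freezing can be made concrete with a small sketch. It assumes the common reading of freezing, in which documents the user has already browsed keep their original ranks and the feedback run fills the remaining positions; the function names, the browsing depth of two, and the graded judgments are hypothetical and not taken from the paper.

```python
from itertools import accumulate


def cumulated_gain(gains):
    """Cumulated gain vector: running sum of graded gains down the ranking."""
    return list(accumulate(gains))


def freeze(initial_ranking, feedback_ranking, seen):
    """Keep the first `seen` documents of the initial ranking in place, then
    append the feedback ranking with already-seen documents removed."""
    frozen = initial_ranking[:seen]
    frozen_set = set(frozen)
    return frozen + [d for d in feedback_ranking if d not in frozen_set]


# Hypothetical graded judgments (0 = non-relevant, 3 = highly relevant).
grades = {"d1": 0, "d2": 1, "d3": 0, "d4": 3, "d5": 0}
initial = ["d1", "d2", "d3", "d4", "d5"]
after_feedback = ["d4", "d1", "d5", "d3", "d2"]

# The simulated user browses two documents before giving feedback.
final = freeze(initial, after_feedback, seen=2)
cg_initial = cumulated_gain([grades[d] for d in initial])
cg_final = cumulated_gain([grades[d] for d in final])
print(final, cg_initial, cg_final)
```

The two cumulated gain curves share their first two points and diverge afterwards, which is one way to picture the “branching” curves the abstract refers to.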

11.

Perfect present, perfect gift: finding a place for archival consciousness in social theory

Brien Brothman 《Archival Science》2010,10(2):141-189

In 1924, Canadian Dominion Archivist Arthur Doughty (1860–1936) characterized archives as “the gift of one generation to another.” This essay takes these words seriously. It sets aside the common habit of thinking of archival work in terms of “keeping” and “preserving” and experiments with (re-imagines) archives as a form of gift giving. However, as a growing body of scholarship across numerous disciplines is discovering, gift giving is a complex social act. Thus, construing archives as a form of gift opens up new avenues of critical inquiry into archives’ unique temporal consciousness and its importance to accounts of the establishment and unmaking of any social order. This article explores the nature of archival consciousness and its place in social theory.

12.

On information retrieval metrics designed for evaluation with incomplete relevance assessments (total citations: 1; self-citations: 0; citations by others: 1)

Tetsuya Sakai Noriko Kando 《Information Retrieval》2008,11(5):447-470

Modern information retrieval (IR) test collections have grown in size, but the available manpower for relevance assessments has more or less remained constant. Hence, how to reliably evaluate and compare IR systems using incomplete relevance data, where many documents exist that were never examined by the relevance assessors, is receiving a lot of attention. This article compares the robustness of IR metrics to incomplete relevance assessments, using four different sets of graded-relevance test collections with submitted runs—the TREC 2003 and 2004 robust track data and the NTCIR-6 Japanese and Chinese IR data from the crosslingual task. Following previous work, we artificially reduce the original relevance data to simulate IR evaluation environments with extremely incomplete relevance data. We then investigate the effect of this reduction on discriminative power, which we define as the proportion of system pairs with a statistically significant difference for a given probability of Type I Error, and on Kendall’s rank correlation, which reflects the overall resemblance of two system rankings according to two different metrics or two different relevance data sets. According to these experiments, Q′, nDCG′ and AP′ proposed by Sakai are superior to bpref proposed by Buckley and Voorhees and to Rank-Biased Precision proposed by Moffat and Zobel. We also point out some weaknesses of bpref and Rank-Biased Precision by examining their formal definitions.
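
Kendall's rank correlation between two system rankings, for instance one computed from full relevance data and one from reduced data, can be sketched as follows. This is a plain tau-a implementation written for illustration only; the system names and scores are hypothetical and do not come from the paper.

```python
from itertools import combinations


def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between two score dicts over the same systems.

    Tied pairs count as neither concordant nor discordant.
    """
    systems = sorted(scores_a)
    concordant = discordant = 0
    for s, t in combinations(systems, 2):
        sign = (scores_a[s] - scores_a[t]) * (scores_b[s] - scores_b[t])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(systems) * (len(systems) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical mean scores under full and reduced relevance data.
full = {"sysA": 0.41, "sysB": 0.35, "sysC": 0.30, "sysD": 0.22}
reduced = {"sysA": 0.38, "sysB": 0.29, "sysC": 0.31, "sysD": 0.20}
tau = kendall_tau(full, reduced)
print(round(tau, 3))
```

Here only one of the six system pairs (sysB and sysC) swaps order under the reduced judgments, so the correlation stays well above zero.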

13.

Steps to Global Licensing Success

Tracey Armstrong 《Publishing Research Quarterly》2012,28(1):23-26

The impact of evolving technology on those who create content and those who use it has raised many interesting copyright-related challenges that legislators, copyright experts, authors, publishers and licensing organizations around the world are looking to address. Several international initiatives underway highlight the evolving global copyright landscape, including a report commissioned by the UK government calling for the creation of a “Digital Copyright Exchange.” Through such international efforts, and through the content licensing experience of collective management organizations, the best solutions to the copyright challenges of our time can deliver efficiency to everyone involved.

14.

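A comparative study of INEX and TREC

Chen Min Song Hongwen 《图书馆杂志》 (Library Journal) 2012,(4):15-19

INEX and TREC are the two major platforms for evaluating retrieval systems. Even as retrieval technology develops rapidly, both remain vigorous and play a very important role in the evaluation of retrieval techniques. This article compares the research goals of INEX and TREC and three components of the two platforms (test collections, the construction of search topics, and relevance assessment) in order to identify where INEX innovates on or differs from the TREC evaluation platform, and thereby to understand INEX's evaluation methods more deeply and comprehensively.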

15.

Inventing book news, 1925–1935: “Publicity hypnosis” and the colophon

Claire Badaracco 《Publishing Research Quarterly》1990,6(4):17-30

The Colophon, published from 1925 through 1935 for an audience of book collectors and connoisseurs, illuminates the debate over the basis of a book's value: genuine quality or artificial scarcity. It also illustrates the distinction between genuine news about the book industry and “publicity hypnosis” (today’s “hype”). The magazine’s authors, designers, typographers, printers, and illustrators were among the finest of their day. Claire Badaracco is an assistant professor at Marquette University, and is currently writing a series of articles about Anglo-American book publishing, journalism, and publicity trades between 1920 and 1940 with the support of The British Academy, the National Endowment, and the Bibliographical Society of America. Her work has been published in American Literary Realism, Journalism Quarterly, Essays in Business and Economic History, and other journals. The Library of Congress, The Center for The Book, is publishing her monograph “The Lakeside Press Four American Books Campaign 1926–1930.”

16.

Exploiting entity relationship for query expansion in enterprise search

Xitong Liu Fei Chen Hui Fang Min Wang 《Information Retrieval》2014,17(3):265-294

Enterprise search is important, and the search quality has a direct impact on the productivity of an enterprise. Enterprise data contain both structured and unstructured information. Since these two types of information are complementary and the structured information such as relational databases is designed based on ER (entity-relationship) models, there is a rich body of information about entities in enterprise data. As a result, many information needs of enterprise search center around entities. For example, a user may formulate a query describing a problem that she encounters with an entity, e.g., the web browser, and want to retrieve relevant documents to solve the problem. Intuitively, information related to the entities mentioned in the query, such as related entities and their relations, would be useful to reformulate the query and improve the retrieval performance. However, most existing studies on query expansion are term-centric. In this paper, we propose a novel entity-centric query expansion framework for enterprise search. Specifically, given a query containing entities, we first utilize both unstructured and structured information to find entities that are related to the ones in the query. We then discuss how to adapt existing feedback methods to use the related entities and their relations to improve search quality. Experimental results over two real-world enterprise collections show that the proposed entity-centric query expansion strategies are more effective and robust in improving search performance than the state-of-the-art pseudo feedback methods for long natural language-like queries with entities. Moreover, results over a TREC ad hoc retrieval collection show that the proposed methods can also work well for short keyword queries in the general search domain.

17.

Diversified search evaluation: lessons from the NTCIR-9 INTENT task

Tetsuya Sakai Ruihua Song 《Information Retrieval》2013,16(4):504-529

The evaluation of diversified web search results is a relatively new research topic and is not as well-understood as the time-honoured evaluation methodology of traditional IR based on precision and recall. In diversity evaluation, one topic may have more than one intent, and systems are expected to balance relevance and diversity. The recent NTCIR-9 evaluation workshop launched a new task called INTENT which included a diversified web search subtask that differs from the TREC web diversity task in several aspects: the choice of evaluation metrics, the use of intent popularity and per-intent graded relevance, and the use of topic sets that are twice as large as those of TREC. The objective of this study is to examine whether these differences are useful, using the actual data recently obtained from the NTCIR-9 INTENT task. Our main experimental findings are: (1) The D♯ evaluation framework used at NTCIR provides more “intuitive” and statistically reliable results than Intent-Aware Expected Reciprocal Rank; (2) Utilising both intent popularity and per-intent graded relevance as is done at NTCIR tends to improve discriminative power, particularly for D♯-nDCG; and (3) Reducing the topic set size, even by just 10 topics, can affect not only significance testing but also the entire system ranking; when 50 topics are used (as in TREC) instead of 100 (as in NTCIR), the system ranking can be substantially different from the original ranking and the discriminative power can be halved. These results suggest that the directions being explored at NTCIR are valuable.

18.

Replicating Web Structure in Small-Scale Test Collections

Cathal Gurrin Alan F. Smeaton 《Information Retrieval》2004,7(3-4):239-263

Linkage analysis as an aid to web search has been assumed to be of significant benefit and we know that it is being implemented by many major Search Engines. Why then have few TREC participants been able to scientifically prove the benefits of linkage analysis in recent years? In this paper we put forward reasons why many disappointing results have been found in TREC experiments and we identify the linkage density requirements of a dataset to faithfully support experiments into linkage-based retrieval by examining the linkage structure of the WWW. Based on these requirements we report on methodologies for synthesising such a test collection.

19.

Enhancing literary understandings through young adult fiction

Maia Pank Mertz 《Publishing Research Quarterly》1992,8(1):23-33

Young adult literature can be used to enhance students' understanding of literary techniques and concepts, providing a “transition” between children's and adult fiction. Two young adult novels, The Pigman and I am the Cheese, are used to illustrate the use of young adult novels to teach about literary elements such as tone, point of view, characterization, motif, symbolism, style, and structure. Maia Pank Mertz’s areas of specialization include the teaching of young adult literature, literature study, and the impact of media on adolescents.

20.

Rallying Point: Lewis Michaux’s National Memorial African Bookstore

David Emblidge 《Publishing Research Quarterly》2008,24(4):267-276

Michaux’s National Memorial African Bookstore, Harlem, NY, was the epicenter of black literary life and bookselling, 1933–c.1975. Michaux migrated from Virginia to escape farm work and his brother’s evangelical church, opting instead, despite the lack of formal education, to become a trafficker in ideas, through bookselling. A self-styled Garveyite, Michaux advised Malcolm X, though he never joined the Nation of Islam or advocated revolution. The bookshop, with a huge inventory of books about black experience and spearheaded by the charismatic bookseller (known as “The Professor”), attracted a loyal clientele, championed famous writers and artists, and hosted international leaders (especially Africans). The store was a rallying point for political speeches, often delivered in front of it; in its period, no other black bookstore in America had Michaux’s influence.