Found 20 similar documents; search took 796 ms.
1.
Precision prediction based on ranked list coherence (total citations: 1; self-citations: 0; citations by others: 1)
We introduce a statistical measure of the coherence of a list of documents called the clarity score. Starting with a document list ranked by the query-likelihood retrieval model, we demonstrate the score's relationship to query ambiguity with respect to the collection. We also show that the clarity score is correlated with the average precision of a query and lay the groundwork for useful predictions by discussing a method of setting decision thresholds automatically. We then show that passage-based clarity scores correlate with average-precision measures of ranked lists of passages, where a passage is judged relevant if it contains correct answer text, which extends the basic method to passage-based systems. Next, we introduce variants of document-based clarity scores to improve the robustness, applicability, and predictive ability of clarity scores. In particular, we introduce the ranked list clarity score that can be computed with only a ranked list of documents, and the weighted clarity score where query terms contribute more than other terms. Finally, we show an approach to predicting queries that perform poorly on query expansion that uses techniques expanding on the ideas presented earlier.
W. Bruce Croft
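The abstract above does not give a formula, so the following is only a minimal sketch of the usual reading of a clarity score: the Kullback-Leibler divergence, in bits, between a language model built from the top-ranked documents and the language model of the whole collection. The function names and the way the query model is estimated (pooling the tokens of the top-ranked documents) are assumptions made for illustration, not the authors' estimator.

```python
import math
from collections import Counter


def language_model(tokens):
    """Maximum-likelihood unigram model: word -> relative frequency."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def clarity_score(top_docs, collection_docs):
    """KL divergence (bits) between a ranked-list model and the collection model.

    Assumption: every document in top_docs also occurs in collection_docs,
    so each word of the ranked-list model has non-zero collection probability.
    """
    collection_model = language_model([w for doc in collection_docs for w in doc])
    # Assumption: the query model is the pooled unigram model of the ranked list.
    query_model = language_model([w for doc in top_docs for w in doc])
    return sum(
        p * math.log2(p / collection_model[word])
        for word, p in query_model.items()
    )
```

Under this sketch a ranked list whose vocabulary mirrors the collection scores close to zero, while a list concentrated on a few topical terms scores higher.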
2.
Due to the heavy use of gene synonyms in biomedical text, people have tried many query expansion techniques using synonyms
in order to improve performance in biomedical information retrieval. However, mixed results have been reported. The main challenge
is that it is not trivial to assign appropriate weights to the added gene synonyms in the expanded query; under-weighting
of synonyms would not bring much benefit, while overweighting some unreliable synonyms can hurt performance significantly.
So far, there has been no systematic evaluation of various synonym query expansion strategies for biomedical text. In this
work, we propose two different strategies to extend a standard language modeling approach for gene synonym query expansion
and conduct a systematic evaluation of these methods on all the available TREC biomedical text collections for ad hoc document
retrieval. Our experiment results show that synonym expansion can significantly improve the retrieval accuracy. However, different
query types require different synonym expansion methods, and appropriate weighting of gene names and synonym terms is critical
for improving performance.
Chengxiang Zhai
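The weighting problem described above can be illustrated with a small sketch. The abstract does not specify the two proposed strategies, so this is not the authors' method: it is a hypothetical expansion in which original terms get weight 1.0, each added synonym gets a tunable `synonym_weight`, and the result is normalized into a query distribution. The gene names in the usage note are placeholders.

```python
def expand_query(query_terms, synonyms, synonym_weight):
    """Return a normalized term-weight map for a synonym-expanded query.

    query_terms: list of original query terms (each gets weight 1.0).
    synonyms: dict mapping a term to a list of its synonyms (hypothetical source).
    synonym_weight: weight given to each added synonym, between 0 and 1.
    """
    weights = {term: 1.0 for term in query_terms}
    for term in query_terms:
        for synonym in synonyms.get(term, []):
            # Never down-weight a term that already appears in the query.
            weights[synonym] = max(weights.get(synonym, 0.0), synonym_weight)
    total = sum(weights.values())
    return {term: weight / total for term, weight in weights.items()}
```

For example, `expand_query(["geneA"], {"geneA": ["geneA_alias"]}, 0.5)` gives the original term twice the probability mass of its synonym. Setting `synonym_weight` near 0 corresponds to the under-weighting case and setting it near 1 to the over-weighting case discussed in the abstract.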
3.
Query structuring and expansion with two-stage term dependence for Japanese web retrieval (total citations: 1; self-citations: 1; citations by others: 0)
In this paper, we propose a new term dependence model for information retrieval, which is based on a theoretical framework
using Markov random fields. We assume two types of dependencies of terms given in a query: (i) long-range dependencies that
may appear for instance within a passage or a sentence in a target document, and (ii) short-range dependencies that may appear
for instance within a compound word in a target document. Based on this assumption, our two-stage term dependence model captures
both long-range and short-range term dependencies differently, when more than one compound word appears in a query. We also
investigate how query structuring with term dependence can improve the performance of query expansion using a relevance model.
The relevance model is constructed using the retrieval results of the structured query with term dependence to expand the
query. We show that our term dependence model works well, particularly when using query structuring with compound words, through
experiments using a 100-gigabyte test collection of web documents mostly written in Japanese. We also show that the performance
of the relevance model can be significantly improved by using the structured query with our term dependence model.
Koji Eguchi
4.
Query Expansion is commonly used in Information Retrieval to overcome vocabulary mismatch issues, such as synonymy between
the original query terms and a relevant document. In general, query expansion experiments exhibit mixed results. Overall TREC
Genomics Track results are also mixed; however, results from the top performing systems provide strong evidence supporting
the need for expansion. In this paper, we examine the conditions necessary for optimal query expansion performance with respect
to two system design issues: IR framework and knowledge source used for expansion. We present a query expansion framework
that improves Okapi baseline passage MAP performance by 185%. Using this framework, we compare and contrast the effectiveness
of a variety of biomedical knowledge sources used by TREC 2006 Genomics Track participants for expansion. Based on the outcome
of these experiments, we discuss the success factors required for effective query expansion with respect to various sources
of term expansion, such as corpus-based cooccurrence statistics, pseudo-relevance feedback methods, and domain-specific and
domain-independent ontologies and databases. Our results show that choice of document ranking algorithm is the most important
factor affecting retrieval performance on this dataset. In addition, when an appropriate ranking algorithm is used, we find
that query expansion with domain-specific knowledge sources provides an equally substantive gain in performance over a baseline
system.
Nicola Stokes
5.
Modern retrieval test collections are built through a process called pooling in which only a sample of the entire document
set is judged for each topic. The idea behind pooling is to find enough relevant documents such that when unjudged documents
are assumed to be nonrelevant the resulting judgment set is sufficiently complete and unbiased. Yet a constant-size pool represents
an increasingly small percentage of the document set as document sets grow larger, and at some point the assumption of approximately
complete judgments must become invalid. This paper shows that the judgment sets produced by traditional pooling when the pools
are too small relative to the total document set size can be biased in that they favor relevant documents that contain topic
title words. This phenomenon is wholly dependent on the collection size and does not depend on the number of relevant documents
for a given topic. We show that the AQUAINT test collection constructed in the recent TREC 2005 workshop exhibits this biased
relevance set; it is likely that the test collections based on the much larger GOV2 document set also exhibit the bias. The
paper concludes with suggested modifications to traditional pooling and evaluation methodology that may allow very large reusable
test collections to be built.
Ellen Voorhees
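The pooling process described above can be sketched as follows. This is a minimal illustration of depth-k pooling as it is commonly described (the union of the top-k documents from each submitted run), not the exact TREC procedure; the function names and the coverage measure are assumptions for illustration.

```python
def build_pool(runs, depth):
    """Union of the top `depth` documents from each ranked run."""
    pool = set()
    for ranking in runs:
        pool.update(ranking[:depth])
    return pool


def pool_coverage(pool, collection_size):
    """Fraction of the collection that would be judged for one topic."""
    return len(pool) / collection_size
```

With a fixed depth, `pool_coverage` shrinks as `collection_size` grows, which is the situation in which the abstract argues that the assumption of approximately complete judgments becomes invalid.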
6.
To put an end to the large copyright trade deficit, both Chinese government agencies and publishing houses have been striving
to enter the international publication market. The article analyzes the background of the going-global strategy and sums
up the performance of both Chinese administrations and publishers.
Qing Fang (Corresponding author)
7.
Smoothing of document language models is critical in language modeling approaches to information retrieval. In this paper,
we present a novel way of smoothing document language models based on propagating term counts probabilistically in a graph
of documents. A key difference between our approach and previous approaches is that our smoothing algorithm can iteratively
propagate counts and achieve smoothing with remotely related documents. Evaluation results on several TREC data sets show that the proposed method significantly outperforms the
simple collection-based smoothing method. Compared with those other smoothing methods that also exploit local corpus structures,
our method is especially effective in improving precision in top-ranked documents through “filling in” missing query terms
in relevant documents, which is attractive since most users only pay attention to the top-ranked documents in search engine
applications.
ChengXiang Zhai
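For context, the collection-based smoothing that the abstract uses as its baseline can be sketched as below. The abstract does not say which variant was used, so Dirichlet prior smoothing is an assumption here, and the graph-based count propagation proposed in the paper is not reproduced. The function name and the default `mu` are illustrative.

```python
from collections import Counter


def dirichlet_smoothed_prob(word, doc_tokens, collection_tokens, mu=2000.0):
    """p(word | doc) with Dirichlet prior smoothing against the collection.

    p(w|d) = (c(w, d) + mu * p(w|C)) / (|d| + mu)
    """
    doc_counts = Counter(doc_tokens)
    collection_counts = Counter(collection_tokens)
    p_collection = collection_counts[word] / len(collection_tokens)
    return (doc_counts[word] + mu * p_collection) / (len(doc_tokens) + mu)
```

A query term missing from a document still receives non-zero probability from the collection model, which is the effect the abstract describes as "filling in" missing query terms, there achieved through related documents rather than the whole collection.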
8.
In retrieving medical free text, users are often interested in answers pertinent to certain scenarios that correspond to common
tasks performed in medical practice, e.g., treatment or diagnosis of a disease. A major challenge in handling such queries is that scenario terms in the query (e.g., treatment) are often too general to match specialized terms in relevant documents (e.g., chemotherapy). In this paper, we propose a knowledge-based query expansion method that exploits the UMLS knowledge source to extend the
original query with additional terms that are specifically relevant to the query's scenario(s). We compared the proposed method
with traditional statistical expansion that expands terms which are statistically correlated but not necessarily scenario
specific. Our study on two standard testbeds shows that the knowledge-based method, by providing scenario-specific expansion,
yields notable improvements over the statistical method in terms of average precision-recall. On the OHSUMED testbed, for
example, the improvement is more than 5% averaging over all scenario-specific queries studied and about 10% for queries that
mention certain scenarios, such as treatment of a disease and differential diagnosis of a symptom/disease.
Wesley W. Chu
9.
We present software that generates phrase-based concordances in real-time based on Internet searching. When a user enters
a string of words for which he wants to find concordances, the system sends this string as a query to a search engine and
obtains search results for the string. The concordances are extracted by performing statistical analysis on search results
and then fed back to the user. Unlike existing tools, this concordance consultation tool is language-independent, so concordances
can be obtained even in a language for which there are no well-established analytical methods. Our evaluation has revealed
that concordances can be obtained more effectively than by only using a search engine directly.
Yuichiro Ishii
10.
E. Herrera-Viedma A. G. López-Herrera S. Alonso J. M. Moreno F. J. Cabrerizo C. Porcel 《Information Retrieval》2009,12(2):179-200
This paper describes a computer-supported learning system to teach students the principles and concepts of Fuzzy Information
Retrieval Systems based on weighted queries. This tool is used to support the teacher’s activity in the degree course Information Retrieval Systems Based on Artificial Intelligence at the Faculty of Library and Information Sciences at the University of Granada. Learning of languages of weighted queries
in Fuzzy Information Retrieval Systems is complex because it is very difficult to understand the different semantics that
could be associated with the weights of queries together with their respective strategies of query evaluation. We have developed
and implemented this computer-supported education system because it supports the teacher’s activity in the classroom
when teaching the use of weighted queries in FIRSs and helps students to develop self-learning processes on the use of such
queries. We have evaluated its use in the learning process according to the students’ perceptions and their
results in the course’s exams. We have observed that, using this software tool, students learn the management
of the weighted query languages better, and their performance in the exams improves.
C. Porcel
11.
Index maintenance strategies employed by dynamic text retrieval systems based on inverted files can be divided into two categories:
merge-based and in-place update strategies. Within each category, individual update policies can be distinguished based on
whether they store their on-disk posting lists in a contiguous or in a discontiguous fashion. Contiguous inverted lists, in
general, lead to higher query performance, by minimizing the disk seek overhead at query time, while discontiguous inverted
lists lead to higher update performance, requiring less effort during index maintenance operations. In this paper, we focus
on retrieval systems with high query load, where the on-disk posting lists have to be stored in a contiguous fashion at all
times. We discuss a combination of re-merge and in-place index update, called Hybrid Immediate Merge. The method performs strictly better than the re-merge baseline policy used in our experiments, as it leads to the same query
performance, but substantially better update performance. The actual time savings achievable depend on the size of the text
collection being indexed; a larger collection results in greater savings. In our experiments, variations of Hybrid Immediate Merge were able to reduce the total index update overhead by up to 73% compared to the re-merge baseline.
Stefan Büttcher
12.
13.
Through a reading of the archived letters of Henry Garnet (1555–1606), Superior of the Jesuit order in England and suspected
Gunpowder plotter, this article investigates the nature of the archive in relation to narrative theory. Figuring the archive
as one of the number of narrating voices accrued by the individual record, I argue that models of communication such as those
put forward by Roman Jakobson, Wayne C. Booth and Seymour Chatman afford useful insights into the ways in which power is inscribed
and reinscribed in the record through successive acts of reading and rewriting.
Paul Wake is a Senior Lecturer in English Literature at Manchester Metropolitan University. He is the author of Conrad’s Marlow (2007), editor, with Simon Malpas, of The Routledge Companion to Critical Theory (2006), and he has published articles on narrative theory and postmodernism.
Paul Wake
14.
Towards enhancing retrieval effectiveness of search engines for diacritisized Arabic documents (total citations: 1; self-citations: 1; citations by others: 0)
Bassam H. Hammo 《Information Retrieval》2009,12(3):300-323
15.
Tim Schlak 《Archival Science》2008,8(2):85-101
This article pursues the varying understandings of the photograph in archival literature. An in-depth review of the scholarship
uncovers several possible reasons why archivists and those writing about photographic archives apparently continue to struggle
with the photograph, including: the sheer difficulty that photographs as an elusive medium present; past debates about photography
in art history, history, and archival literature; and the challenges that the photograph as an evasive document presents to
the contradictory nature of archives themselves and to conceptions of archival science. Having evolved from an understanding
of photographs that conflated content with meaning to postmodernist notions of contingent and plural meanings in which photographs
participate, archival writings on the photograph hold promise as they begin to tread the waters that Schwartz charted in the
last 15 years. This paper follows that historical progression in order to trace the discourse on photographic archives that
has emerged over the past three decades.
Tim Schlak
16.
Andy Weissberg 《Publishing Research Quarterly》2008,24(4):255-260
This article analyzes current industry practices toward the identification of digital book content. It highlights key technology
trends, workflow considerations and supply chain behaviors, and examines the implications of these trends and behaviors on
the production, discoverability, purchasing and consumption of digital book products.
Andy Weissberg
17.
Jacob Soll 《Archival Science》2007,7(4):331-342
This article examines the archival methods developed by Colbert to train his son in state administration. Based on Colbert’s
correspondence with his son, it reveals the practices Colbert thought necessary to collect and manage information in his state
encyclopedic archive during the last half of the 17th century.
Jacob Soll
18.
Nathan Hollier 《Publishing Research Quarterly》2008,24(3):165-174
This article provides a summary of and commentary on ‘A Lovely Kind of Madness: Small and Independent Publishing in Australia’,
an unpublished report by Kate Freeth, commissioned by the Small Press Underground Networking Community (SPUNC), the representative
body for small and independent publishers in Australia, and released in November 2007. Freeth’s 14,000 word report constitutes
the most detailed and comprehensive study of Australian small and independent publishing since the second volume of Michael
Denholm’s Small Press Publishing in Australia (1991) and provides much primary material for policy makers, scholars, and people working in and around the publishing industry.
Nathan Hollier
19.
Beatrice S. Bartlett 《Archival Science》2007,7(4):369-390
This article describes the first half century of the Communist government’s supervision and management of the central-government
archives of the last two dynasties. Immediately with the Communist ascent to power in 1949, the new government took great
interest in assembling and protecting the country’s archival documents, readying the Ming-Qing archives for access to scholars,
and preparing for publication of selected materials. By the 1980s Beijing’s Number One Historical Archives, in charge of the
largest holding of Ming-Qing documents, had become the first Chinese authority to complete a full sorting and preliminary
catalogues for such a collection. Moreover, to facilitate searches, an attempt has recently begun to create a subject-heading
system for these and other holdings in the country. In the first half century’s final decades, foreign researchers were admitted
for the first time and tours and international exchanges began to take place.
Beatrice S. Bartlett
20.
Sandeep Chaufla 《Publishing Research Quarterly》2008,24(3):187-201
A review and analysis of the rules and regulations including the tax aspects of making an investment in India is presented.
The full range from Foreign Direct Investment to different forms of doing business with specific examples from the publishing
industry is explored to help understand current policies and regulations.
Sandeep Chaufla