Found 20 similar documents (search time: 15 ms)
1.
Due to the heavy use of gene synonyms in biomedical text, people have tried many query expansion techniques using synonyms
in order to improve performance in biomedical information retrieval. However, mixed results have been reported. The main challenge
is that it is not trivial to assign appropriate weights to the added gene synonyms in the expanded query; under-weighting
of synonyms would not bring much benefit, while overweighting some unreliable synonyms can hurt performance significantly.
So far, there has been no systematic evaluation of various synonym query expansion strategies for biomedical text. In this
work, we propose two different strategies to extend a standard language modeling approach for gene synonym query expansion
and conduct a systematic evaluation of these methods on all the available TREC biomedical text collections for ad hoc document
retrieval. Our experimental results show that synonym expansion can significantly improve retrieval accuracy. However, different
query types require different synonym expansion methods, and appropriate weighting of gene names and synonym terms is critical
for improving performance.
Chengxiang Zhai
2.
Precision prediction based on ranked list coherence (cited by: 1; self-citations: 0; cited by others: 1)
We introduce a statistical measure of the coherence of a list of documents called the clarity score. Starting with a document list ranked by the query-likelihood retrieval model, we demonstrate the score's relationship to query ambiguity with respect to the collection. We also show that the clarity score is correlated with the average precision of a query and lay the groundwork for useful predictions by discussing a method of setting decision thresholds automatically. We then show that passage-based clarity scores correlate with average-precision measures of ranked lists of passages, where a passage is judged relevant if it contains correct answer text, which extends the basic method to passage-based systems. Next, we introduce variants of document-based clarity scores to improve the robustness, applicability, and predictive ability of clarity scores. In particular, we introduce the ranked list clarity score that can be computed with only a ranked list of documents, and the weighted clarity score where query terms contribute more than other terms. Finally, we show an approach to predicting queries that perform poorly on query expansion that uses techniques expanding on the ideas presented earlier.
W. Bruce Croft
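The clarity score in entry 2 compares the language of a ranked list with the language of the whole collection. The sketch below is a simplified illustration only: it assumes every document in the list is weighted equally (the abstract starts from a query-likelihood ranking, which would weight documents differently), uses unsmoothed unigram estimates, and measures the Kullback-Leibler divergence in bits. The function names are hypothetical.

```python
import math
from collections import Counter


def term_distribution(docs):
    """Maximum-likelihood unigram distribution over tokenized documents."""
    counts = Counter(tok for doc in docs for tok in doc)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def clarity_score(ranked_docs, collection_docs):
    """KL divergence (bits) between the ranked-list model and the collection model.

    Assumption: ranked_docs is a subset of collection_docs, so every term in
    the list also has non-zero probability in the collection.
    """
    p_list = term_distribution(ranked_docs)
    p_coll = term_distribution(collection_docs)
    return sum(p * math.log2(p / p_coll[w]) for w, p in p_list.items())


collection = [
    ["jaguar", "car"],
    ["jaguar", "cat"],
    ["stock", "market"],
    ["stock", "price"],
]
focused = clarity_score(collection[:2], collection)   # list about one topic
unfocused = clarity_score(collection, collection)     # list equal to the collection
```

A list that looks like the collection as a whole scores zero, while a topically focused list scores higher, which is the intuition behind using the score to predict precision.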
3.
Query structuring and expansion with two-stage term dependence for Japanese web retrieval (cited by: 1; self-citations: 1; cited by others: 0)
In this paper, we propose a new term dependence model for information retrieval, which is based on a theoretical framework
using Markov random fields. We assume two types of dependencies of terms given in a query: (i) long-range dependencies that
may appear for instance within a passage or a sentence in a target document, and (ii) short-range dependencies that may appear
for instance within a compound word in a target document. Based on this assumption, our two-stage term dependence model captures
both long-range and short-range term dependencies differently, when more than one compound word appears in a query. We also
investigate how query structuring with term dependence can improve the performance of query expansion using a relevance model.
The relevance model is constructed using the retrieval results of the structured query with term dependence to expand the
query. We show that our term dependence model works well, particularly when using query structuring with compound words, through
experiments using a 100-gigabyte test collection of web documents mostly written in Japanese. We also show that the performance
of the relevance model can be significantly improved by using the structured query with our term dependence model.
Koji Eguchi
4.
To put an end to the large copyright trade deficit, both Chinese government agencies and publishing houses have been striving
to enter the international publication market. The article analyzes the background of the going-global strategy, and sums
up the performance of both Chinese administrations and publishers.
Qing Fang (Corresponding author)
5.
We consider the following autocompletion search scenario: imagine a user of a search engine typing a query; then with every
keystroke display those completions of the last query word that would lead to the best hits, and also display the best such
hits. The following problem is at the core of this feature: for a fixed document collection, given a set D of documents, and an alphabetical range W of words, compute the set of all word-in-document pairs (w, d) from the collection such that w ∈ W and d ∈ D. We present a new data structure with the help of which such autocompletion queries can be processed, on average, in
time linear in the input plus output size, independent of the size of the underlying document collection. At the same time,
our data structure uses no more space than an inverted index. Actual query processing times on a large test collection correlate
almost perfectly with our theoretical bound.
Ingmar Weber
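The core problem stated in entry 5 (find all pairs (w, d) with w in a word range W and d in a document set D) can be made concrete with a naive baseline. This sketch is not the data structure the abstract proposes: it uses a plain inverted index plus binary search over a sorted vocabulary, so its cost grows with the number of words in the range rather than with the output size. The function names and the prefix-to-range trick with `"\uffff"` are assumptions for illustration.

```python
from bisect import bisect_left, bisect_right


def build_index(docs):
    """docs: dict mapping doc_id -> list of words.

    Returns the sorted vocabulary and an inverted index word -> set of doc_ids.
    """
    inverted = {}
    for doc_id, words in docs.items():
        for w in words:
            inverted.setdefault(w, set()).add(doc_id)
    return sorted(inverted), inverted


def completions(vocab, inverted, doc_set, lo, hi):
    """Return all (w, d) pairs with lo <= w <= hi and d in doc_set."""
    start = bisect_left(vocab, lo)
    end = bisect_right(vocab, hi)
    return [
        (w, d)
        for w in vocab[start:end]
        for d in sorted(inverted[w] & doc_set)
    ]


docs = {
    1: ["information", "retrieval"],
    2: ["inference", "index"],
    3: ["inverted", "index"],
}
vocab, inverted = build_index(docs)
# The prefix "inf" is treated as the word range ["inf", "inf\uffff"].
pairs = completions(vocab, inverted, {1, 2}, "inf", "inf\uffff")
```

Here `pairs` holds the completions of "inf" that occur in documents 1 or 2, together with the documents that contain them.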
6.
Bestsellers are an important commercial and social phenomenon. The paper defines and analyses bestsellers in the UK between
1998 and 2005, following on earlier work by one of the authors. It is concluded that there is a core group of genres and
authors dominating the bestseller lists, although there are also unexpected successes, especially at Christmas time. There
is some evidence of long-term changes in taste, including the apparent decline in the popularity of romantic fiction and the
growth of fantasy literature. It is also shown that media and movie adaptations and spin-offs are now an integral part of
this dimension of the book industry.
John Feather
7.
We present software that generates phrase-based concordances in real-time based on Internet searching. When a user enters
a string of words for which he wants to find concordances, the system sends this string as a query to a search engine and
obtains search results for the string. The concordances are extracted by performing statistical analysis on search results
and then fed back to the user. Unlike existing tools, this concordance consultation tool is language-independent, so concordances
can be obtained even in a language for which there are no well-established analytical methods. Our evaluation has revealed
that concordances can be obtained more effectively than by only using a search engine directly.
Yuichiro Ishii
8.
Result merging methods in distributed information retrieval with overlapping databases (cited by: 5; self-citations: 0; cited by others: 5)
In distributed information retrieval systems, document overlaps occur frequently among different component databases. This
paper presents an experimental investigation and evaluation of a group of result merging methods including the shadow document
method and the multi-evidence method in the environment of overlapping databases. We assume, with the exception of resultant
document lists (either with rankings or scores), no extra information about retrieval servers and text databases is available,
which is the usual case for many applications on the Internet and the Web.
The experimental results show that the shadow document method and the multi-evidence method are the two best methods when
overlap is high, while Round-robin is the best for low overlap. The experiments also show that [0,1] linear normalization
is a better option than linear regression normalization for result merging in a heterogeneous environment.
Sally McClean
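Two of the building blocks named in entry 8, [0,1] linear normalization and Round-robin merging, are simple enough to sketch. The code below is an illustration under assumptions, not the paper's implementation: duplicates from overlapping databases are kept at their first occurrence, and a list whose scores are all equal is mapped to 1.0. The shadow document and multi-evidence methods are not shown.

```python
def normalize_01(scores):
    """Linearly rescale a dict doc_id -> score into the interval [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # Assumption: a constant score list carries no ranking information.
        return {d: 1.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}


def round_robin(ranked_lists):
    """Interleave ranked lists rank by rank, keeping each document once."""
    merged, seen = [], set()
    depth = max(map(len, ranked_lists), default=0)
    for rank in range(depth):
        for lst in ranked_lists:
            if rank < len(lst) and lst[rank] not in seen:
                seen.add(lst[rank])
                merged.append(lst[rank])
    return merged


normalized = normalize_01({"a": 2.0, "b": 4.0, "c": 3.0})
merged = round_robin([["a", "b", "c"], ["b", "d"]])
```

Round-robin needs only the rankings, while normalization needs scores; this matches the abstract's assumption that nothing beyond the result lists is available from the servers.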
9.
Evaluation is a major driving force in advancing the state of the art in language technologies. In particular, automatic methods for assessing the quality of machine output are the preferred means of measuring progress, provided that these metrics have been validated against human judgments. Following recent developments in the automatic evaluation of machine translation and document summarization, we present a similar approach, implemented in a measure called POURPRE, an automatic technique for evaluating answers to complex questions based on n-gram co-occurrences between machine output and a human-generated answer key. Until now, the only way to assess the correctness of answers to such questions has involved manual determination of whether an information “nugget” appears in a system's response. The lack of automatic methods for scoring system output is an impediment to progress in the field, which we address with this work. Experiments with the TREC 2003, TREC 2004, and TREC 2005 QA tracks indicate that rankings produced by our metric correlate highly with official rankings, and that POURPRE outperforms direct application of existing metrics.
Dina Demner-Fushman
10.
Andy Weissberg 《Publishing Research Quarterly》2008,24(4):255-260
This article analyzes current industry practices toward the identification of digital book content. It highlights key technology
trends, workflow considerations and supply chain behaviors, and examines the implications of these trends and behaviors on
the production, discoverability, purchasing and consumption of digital book products.
Andy Weissberg
11.
Multilingual information retrieval is generally understood to mean the retrieval of relevant information in multiple target
languages in response to a user query in a single source language. In a multilingual federated search environment, different
information sources contain documents in different languages. A general search strategy in multilingual federated search environments
is to translate the user query to each language of the information sources and run a monolingual search in each information
source. It is then necessary to obtain a single ranked document list by merging the individual ranked lists from the information
sources that are in different languages. This is known as the results merging problem for multilingual information retrieval.
Previous research has shown that the simple approach of normalizing source-specific document scores is not effective. On the
other hand, a more effective merging method has been proposed that downloads and translates all retrieved documents into the source
language and generates the final ranked list by running a monolingual search in the search client. The latter method is more
effective but is associated with a large amount of online communication and computation costs. This paper proposes an effective
and efficient approach for the results merging task of multilingual ranked lists. Particularly, it downloads only a small
number of documents from the individual ranked lists of each user query to calculate comparable document scores by utilizing
both the query-based translation method and the document-based translation method. Then, query-specific and source-specific
transformation models can be trained for individual ranked lists by using the information of these downloaded documents. These
transformation models are used to estimate comparable document scores for all retrieved documents and thus the documents can
be sorted into a final ranked list. This merging approach is efficient as only a subset of the retrieved documents are downloaded
and translated online. Furthermore, an extensive set of experiments on the Cross-Language Evaluation Forum (CLEF) data has demonstrated the effectiveness of the query-specific and source-specific results merging algorithm against other
alternatives. The new research in this paper proposes different variants of the query-specific and source-specific results
merging algorithm with different transformation models. This paper also provides thorough experimental results as well as
detailed analysis. All of the work substantially extends the preliminary research in (Si and Callan, in: Peters (ed.) Results
of the cross-language evaluation forum-CLEF 2005, 2005).
Hao Yuan
12.
Sandeep Chaufla 《Publishing Research Quarterly》2008,24(3):187-201
A review and analysis of the rules and regulations including the tax aspects of making an investment in India is presented.
The full range from Foreign Direct Investment to different forms of doing business with specific examples from the publishing
industry is explored to help understand current policies and regulations.
Sandeep Chaufla
13.
Smoothing of document language models is critical in language modeling approaches to information retrieval. In this paper,
we present a novel way of smoothing document language models based on propagating term counts probabilistically in a graph
of documents. A key difference between our approach and previous approaches is that our smoothing algorithm can iteratively
propagate counts and achieve smoothing with remotely related documents. Evaluation results on several TREC data sets show that the proposed method significantly outperforms the
simple collection-based smoothing method. Compared with those other smoothing methods that also exploit local corpus structures,
our method is especially effective in improving precision in top-ranked documents through “filling in” missing query terms
in relevant documents, which is attractive since most users only pay attention to the top-ranked documents in search engine
applications.
ChengXiang Zhai
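Entry 13 uses a "simple collection-based smoothing method" as its baseline. The abstract does not say which variant was used; the sketch below assumes Jelinek-Mercer interpolation, one common form, in which a document's maximum-likelihood estimate is mixed with the collection model using a weight `lam`. The graph-based count propagation that the paper proposes is not shown, and the function name is hypothetical.

```python
from collections import Counter


def jm_smoothed_model(doc, collection, lam=0.5):
    """Jelinek-Mercer smoothing: (1 - lam) * p(w | doc) + lam * p(w | collection).

    doc is a non-empty list of tokens; collection is a list of such documents.
    """
    doc_counts = Counter(doc)
    coll_counts = Counter(tok for d in collection for tok in d)
    coll_total = sum(coll_counts.values())
    return {
        w: (1 - lam) * doc_counts[w] / len(doc) + lam * c / coll_total
        for w, c in coll_counts.items()
    }


collection = [["a", "b"], ["a", "c"]]
model = jm_smoothed_model(collection[0], collection, lam=0.5)
```

The term "c" never occurs in the first document, yet it receives non-zero probability from the collection model. This is the "filling in" of missing terms that smoothing provides; the paper's method draws that mass from related documents rather than from the whole collection.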
14.
A summary overview of the children’s and young adult publishing industry in China with a focus on the size of the market,
ten major publishing houses, copyright and trends. Special emphasis has been placed on specific transactions for the sale of
translation rights from German-language publishers to China and minimal activities of German rights sold to Chinese publishers.
Jing Bartz
15.
A comparison of analyses of the Scottish publishing industry carried out in 1992, 2002 and 2007 underscores the fragility
of the sector within a small country within the English-language community. A number of indices reveal either stability or
stagnation and the picture emerges of the remarkable tenacity of publishing in Scotland. Although there is already a significant
and vital element of state support for publishing in Scotland, further intervention will be necessary to ensure fulfilment
of its potential.
Alistair McCleery
16.
17.
Oren Kurland 《Information Retrieval》2009,12(4):437-460
To obtain high precision at top ranks by a search performed in response to a query, researchers have proposed a cluster-based
re-ranking paradigm: clustering an initial list of documents that are the most highly ranked by some initial search, and using
information induced from these (often called) query-specific clusters for re-ranking the list. However, results concerning the effectiveness of various automatic cluster-based re-ranking methods have been inconclusive. We show that using query-specific clusters for automatic re-ranking
of top-retrieved documents is effective with several methods in which clusters play different roles, among which is the smoothing of document language models. We do so by adapting previously-proposed cluster-based retrieval approaches, which are based on (static) query-independent
clusters for ranking all documents in a corpus, to the re-ranking setting wherein clusters are query-specific. The best performing
method that we develop outperforms both the initial document-based ranking and some previously proposed cluster-based re-ranking
approaches; furthermore, this algorithm consistently outperforms a state-of-the-art pseudo-feedback-based approach. In further
exploration we study the performance of cluster-based smoothing methods for re-ranking with various (soft and hard) clustering
algorithms, and demonstrate the importance of clusters in providing context from the initial list through a comparison to
using single documents to this end.
Oren Kurland
18.
In retrieving medical free text, users are often interested in answers pertinent to certain scenarios that correspond to common
tasks performed in medical practice, e.g., treatment or diagnosis of a disease. A major challenge in handling such queries is that scenario terms in the query (e.g., treatment) are often too general to match specialized terms in relevant documents (e.g., chemotherapy). In this paper, we propose a knowledge-based query expansion method that exploits the UMLS knowledge source to augment the
original query with additional terms that are specifically relevant to the query's scenario(s). We compared the proposed method
with traditional statistical expansion that expands terms which are statistically correlated but not necessarily scenario
specific. Our study on two standard testbeds shows that the knowledge-based method, by providing scenario-specific expansion,
yields notable improvements over the statistical method in terms of average precision-recall. On the OHSUMED testbed, for
example, the improvement is more than 5% averaging over all scenario-specific queries studied and about 10% for queries that
mention certain scenarios, such as treatment of a disease and differential diagnosis of a symptom/disease.
Wesley W. Chu
19.
This article analyses the extent to which archival exemptions for historical, scientific and statistical research in privacy
legislation support preservation in selected European Union countries, and comparable aspects of Australian, American and
Canadian law within a legal, ethical and digital archival perspective. The authors recommend that the further processing of
personal data under data protection law be given a wider scope of interpretation for archival preservation purposes in both
the public and private sector, coupled with the use of researcher and archival codes in relation to access to personal data.
They also recommend early appraisal and integration of privacy with freedom of information and archival regimes.
Malcolm Todd
20.
Fernando Diaz 《Information Retrieval》2007,10(6):531-562
We adapt the cluster hypothesis for score-based information retrieval by claiming that closely related documents should have
similar scores. Given a retrieval from an arbitrary system, we describe an algorithm which directly optimizes this objective
by adjusting retrieval scores so that topically related documents receive similar scores. We refer to this process as score
regularization. Because score regularization operates on retrieval scores, regardless of their origin, we can apply the technique
to arbitrary initial retrieval rankings. Document rankings derived from regularized scores, when compared to rankings derived
from un-regularized scores, consistently and significantly result in improved performance given a variety of baseline retrieval
algorithms. We also present several proofs demonstrating that regularization generalizes methods such as pseudo-relevance
feedback, document expansion, and cluster-based retrieval. Because of these strong empirical and theoretical results, we argue
for the adoption of score regularization as a general design principle or post-processing step for information retrieval systems.
Fernando Diaz
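The idea in entry 20, adjusting scores so that related documents receive similar scores, can be illustrated with a small iterative smoother. This is only one simple way to realize that idea and is not claimed to be the paper's algorithm: it assumes a symmetric non-negative affinity matrix, row normalization, and a mixing weight `alpha` between each document's original score and the average score of its neighbors. All names are hypothetical.

```python
def regularize_scores(scores, affinity, alpha=0.5, iterations=50):
    """Iterate f <- (1 - alpha) * scores + alpha * (row-normalized affinity) @ f.

    scores: list of initial retrieval scores.
    affinity: square list of lists, affinity[i][j] >= 0 is the similarity
              between documents i and j (diagonal assumed to be 0).
    """
    n = len(scores)
    row_sums = [sum(row) for row in affinity]
    f = list(scores)
    for _ in range(iterations):
        new_f = []
        for i in range(n):
            if row_sums[i] > 0:
                neighbor_avg = sum(affinity[i][j] * f[j] for j in range(n)) / row_sums[i]
            else:
                neighbor_avg = f[i]  # isolated document: nothing to borrow from
            new_f.append((1 - alpha) * scores[i] + alpha * neighbor_avg)
        f = new_f
    return f


initial = [1.0, 0.0, 0.5]
affinity = [
    [0.0, 1.0, 0.0],  # documents 0 and 1 are topically related
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],  # document 2 has no neighbors
]
smoothed = regularize_scores(initial, affinity)
```

After smoothing, the two related documents have closer scores than before, while the isolated document keeps its original score. Because the procedure reads only scores and similarities, it can be applied to the output of any initial ranker, as the abstract notes.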