A pipelined architecture for distributed text query evaluation |
| |
Authors: | Alistair Moffat, William Webber, Justin Zobel, Ricardo Baeza-Yates |
| |
Institution: | (1) Department of Computer Science and Software Engineering, The University of Melbourne, Melbourne, Australia, 3010; (2) School of Computer Science and Information Technology, RMIT University, Melbourne, Australia, 3001; (3) Center for Web Research, Department of Computer Science, University of Chile, Santiago, Chile; (4) Present address: Yahoo! Research, Barcelona, Spain |
| |
Abstract: | Two principal query-evaluation methodologies have been described for cluster-based implementation of distributed information
retrieval systems: document partitioning and term partitioning. In a document-partitioned system, each of the processors hosts
a subset of the documents in the collection, and executes every query against its local sub-collection. In a term-partitioned
system, each of the processors hosts a subset of the inverted lists that make up the index of the collection, and serves them
to a central machine as they are required for query evaluation.
In this paper we introduce a pipelined query-evaluation methodology, based on a term-partitioned index, in which partially
evaluated queries are passed amongst the set of processors that host the query terms. This arrangement retains the disk read
benefits of term partitioning, but more effectively shares the computational load. We compare the three methodologies experimentally,
and show that term distribution is inefficient and scales poorly. The new pipelined approach offers efficient memory utilization
and efficient use of disk accesses, but suffers from problems with load balancing between nodes. Until these problems are
resolved, document partitioning remains the preferred method.
Alistair Moffat was supported by the Australian Research Council, the ARC Special Research Center for Perceptive and Intelligent
Machines in Complex Environments, and the NICTA Victoria Laboratory.
William Webber and Justin Zobel were supported by the Australian Research Council.
Ricardo Baeza-Yates was supported by Grant P01-029-F from the Millennium Initiative of Mideplan, Chile, and by the University
of Melbourne as a visiting scholar at the time this project was undertaken. |
| |
Keywords: | Distributed retrieval; Text searching; Index representations |
This article is indexed in SpringerLink and other databases. |
|
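The pipelined arrangement described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the postings data, the round-robin term assignment, the term-frequency scoring, and the function names `build_term_partition`, `route`, and `pipelined_evaluate` are all assumptions made for illustration. It shows only the core idea that each node hosts a subset of the inverted lists and that a partially evaluated query (its accumulators) is handed from node to node among those hosting the query terms.

```python
from collections import defaultdict

# Hypothetical toy index: term -> list of (doc_id, term_frequency) postings.
POSTINGS = {
    "pipelined": [(1, 2), (3, 1)],
    "query": [(1, 1), (2, 3)],
    "index": [(2, 1), (3, 2)],
}


def build_term_partition(postings, num_nodes):
    """Assign each inverted list to one node (round-robin over sorted terms).

    Round-robin is an assumption; the abstract does not say how terms are assigned.
    """
    nodes = [dict() for _ in range(num_nodes)]
    for i, term in enumerate(sorted(postings)):
        nodes[i % num_nodes][term] = postings[term]
    return nodes


def route(query_terms, nodes):
    """Return the ids of nodes hosting at least one query term, in node order."""
    return [
        n for n, lists in enumerate(nodes)
        if any(term in lists for term in query_terms)
    ]


def pipelined_evaluate(query_terms, nodes):
    """Pass the partially evaluated query (accumulators) along the route."""
    accumulators = defaultdict(int)  # doc_id -> partial score
    visited = []
    for n in route(query_terms, nodes):
        # Each node adds the contribution of the query terms it hosts locally.
        for term in query_terms:
            for doc_id, tf in nodes[n].get(term, []):
                accumulators[doc_id] += tf  # toy score: summed term frequency
        visited.append(n)
    # The last node in the route produces the final ranking.
    ranking = sorted(accumulators.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranking, visited


if __name__ == "__main__":
    nodes = build_term_partition(POSTINGS, num_nodes=2)
    print(pipelined_evaluate(["pipelined", "query"], nodes))
```

In this sketch only the nodes that hold a query term do any work for that query, which mirrors the disk-read benefit the abstract attributes to term partitioning. Because a node that hosts frequently queried terms is visited more often than the others, the same structure also hints at the load-balancing problem the abstract reports.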