Distributed search based on self-indexed compressed text

Authors:	Diego Arroyuelo, Veronica Gil-Costa, Senén González, Mauricio Marin, Mauricio Oyarzún

Institution:	1. Yahoo! Research Latin America, Santiago, Chile; 2. Department of Informatics Engineering, University of Santiago of Chile, Chile; 3. CONICET, National University of San Luis, Argentina

Abstract:	Query response times within a fraction of a second in Web search engines are feasible due to the use of indexing and caching techniques, which are devised for large text collections partitioned and replicated into a set of distributed-memory processors. This paper proposes an alternative query processing method for this setting, which is based on a combination of self-indexed compressed text and posting lists caching. We show that a text self-index (i.e., an index that compresses the text and is able to extract arbitrary parts of it) can be competitive with an inverted index if we consider the whole query process, which includes index decompression, ranking and snippet extraction time. The advantage is that within the space of the compressed document collection, one can carry out the posting lists generation, document ranking and snippet extraction. This significantly reduces the total number of processors involved in the solution of queries. Alternatively, for the same amount of hardware, the performance of the proposed strategy is better than that of the classical approach based on treating inverted indexes and corresponding documents as two separate entities in terms of processors and memory space.

Keywords:	Web search engines; Wavelet trees; Snippet extraction; Self-indexed compressed text; Query processing
This article is indexed in ScienceDirect and other databases.
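
The abstract's central idea, that one structure over the text can produce posting lists and snippets without a separate inverted index or document store, can be illustrated with a toy word-based wavelet tree. This is a minimal sketch under stated assumptions, not the authors' implementation: the class names `WaveletTree` and `SelfIndex`, the whitespace tokenizer, and the use of plain Python lists for the rank counters (instead of compressed bitmaps) are all hypothetical choices made for readability. It also omits ranking, caching, and distribution across processors.

```python
from bisect import bisect_left, bisect_right


class WaveletTree:
    """Toy wavelet tree over integer symbols in [lo, hi] (uncompressed counters)."""

    def __init__(self, seq, lo, hi):
        self.lo, self.hi = lo, hi
        self.left = self.right = None
        if lo >= hi:  # leaf: a single symbol (or an empty range)
            return
        mid = (lo + hi) // 2
        # ones[i] / zeros[i] = number of 1-bits / 0-bits among the first i symbols,
        # where a symbol's bit is 1 when it belongs to the upper half of the range.
        self.ones, self.zeros = [0], [0]
        for s in seq:
            bit = 1 if s > mid else 0
            self.ones.append(self.ones[-1] + bit)
            self.zeros.append(self.zeros[-1] + 1 - bit)
        self.left = WaveletTree([s for s in seq if s <= mid], lo, mid)
        self.right = WaveletTree([s for s in seq if s > mid], mid + 1, hi)

    def access(self, i):
        """Return the symbol stored at position i."""
        node = self
        while node.left is not None:
            if node.ones[i + 1] > node.ones[i]:  # bit at position i is 1
                i, node = node.ones[i], node.right
            else:
                i, node = node.zeros[i], node.left
        return node.lo

    def rank(self, c, i):
        """Count occurrences of symbol c among the first i positions."""
        node = self
        while node.left is not None:
            if c > (node.lo + node.hi) // 2:
                i, node = node.ones[i], node.right
            else:
                i, node = node.zeros[i], node.left
        return i

    def select(self, c, k):
        """Return the position of the k-th (0-based) occurrence of symbol c."""
        if self.left is None:
            return k
        if c > (self.lo + self.hi) // 2:
            return bisect_left(self.ones, self.right.select(c, k) + 1) - 1
        return bisect_left(self.zeros, self.left.select(c, k) + 1) - 1


class SelfIndex:
    """Hypothetical wrapper: the tree replaces both the text and the inverted index."""

    def __init__(self, docs):
        self.vocab, self.words, self.starts = {}, [], []
        seq = []
        for text in docs:
            self.starts.append(len(seq))  # first token position of each document
            for w in text.lower().split():
                if w not in self.vocab:
                    self.vocab[w] = len(self.words)
                    self.words.append(w)
                seq.append(self.vocab[w])
        self.n = len(seq)
        self.tree = WaveletTree(seq, 0, len(self.words) - 1)

    def posting_list(self, term):
        """Build [(doc_id, term_frequency), ...] on the fly from the tree."""
        c = self.vocab.get(term)
        if c is None:
            return []
        freq = {}
        for k in range(self.tree.rank(c, self.n)):
            doc = bisect_right(self.starts, self.tree.select(c, k)) - 1
            freq[doc] = freq.get(doc, 0) + 1
        return sorted(freq.items())

    def snippet(self, doc, term, radius=2):
        """Extract the words around the first occurrence of term in doc."""
        c = self.vocab.get(term)
        start = self.starts[doc]
        end = self.starts[doc + 1] if doc + 1 < len(self.starts) else self.n
        if c is None:
            return ""
        before = self.tree.rank(c, start)
        if self.tree.rank(c, end) == before:  # term does not occur in doc
            return ""
        pos = self.tree.select(c, before)
        lo, hi = max(start, pos - radius), min(end, pos + radius + 1)
        return " ".join(self.words[self.tree.access(i)] for i in range(lo, hi))
```

In this sketch, `posting_list` plays the role of on-the-fly posting list generation (the result could then be cached, as the abstract suggests), while `snippet` recovers text from the same structure via `access`, so no separate copy of the documents is needed.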
