A multi-collection latent topic model for federated search |
| |
Authors: | Mark Baillie, Mark Carman, Fabio Crestani |
| |
Institution: | (1) Department of Computer and Information Sciences, University of Strathclyde, Glasgow, Scotland, UK; (2) Faculty of Informatics, University of Lugano, Lugano, Switzerland |
| |
Abstract: | Collection selection is a crucial function, central to the effectiveness and efficiency of a federated information retrieval
system. A variety of solutions have been proposed for collection selection, adapting proven techniques used in centralised
retrieval. This paper defines a new approach to collection selection that models the topical distribution in each collection.
We describe an extended version of latent Dirichlet allocation that uses a hierarchical hyperprior to enable the different
topical distributions found in each collection to be modelled. Under the model, resources are ranked based on the topical
relationship between query and collection. By modelling collections in a low-dimensional topic space, we can implicitly smooth
their term-based characterisation with appropriate terms from topically related samples, thereby dealing with the problem
of missing vocabulary within the samples. An important advantage of adopting this hierarchical model over current approaches
is that the model generalises well to unseen documents given small samples of each collection. The latent structure of each
collection can therefore be estimated well despite imperfect information for each collection such as sampled documents obtained
through query-based sampling. Experiments demonstrate that this new, fully integrated topical model is more robust than current
state-of-the-art collection selection algorithms. |
| |
This article is indexed in SpringerLink and other databases. |
|