

Reducing hardware hit by queries in web search engines
Institution: 1. Universidad Técnica Federico Santa María, Santiago, Chile; 2. Universidad de Santiago de Chile, Santiago, Chile; 3. CONICET, Universidad Nacional de San Luis, Argentina; 4. Software Competence Center Hagenberg, Austria
Abstract: In this paper, we introduce a new collection selection strategy for search engines with document-partitioned indexes. Our method selects the document partitions that are most likely to deliver the best results for each query, which reduces the number of queries submitted to each partition. It employs learning algorithms that rank the partitions so as to maximize the probability of retrieving documents with high gain. The method builds a vector representation of each partition in the term space spanned by the queries. It generalizes to new queries and produces document lists with high precision for queries not seen during the training phase. To update the representation of each partition, our method employs incremental learning strategies: starting from an inversion test on the partition lists, we identify queries that contribute new information and add them to the training phase. The experimental results show that our collection selection method compares favorably with state-of-the-art methods. In addition, our method achieves suitable performance with low parameter sensitivity, making it applicable to search engines with hundreds of partitions.
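The abstract does not specify the learning algorithm or the exact form of the inversion test, so the following is only a minimal sketch of the general idea, under stated assumptions. Each partition is represented as a vector over query terms, weighted by the gain observed for training queries, and partitions are ranked for a new query by cosine similarity. The function names, the gain-weighted accumulation, and the pairwise inversion count are all hypothetical stand-ins chosen for illustration, not the authors' method.

```python
import math
from collections import defaultdict


def build_partition_vectors(training):
    """Build one term vector per partition from training queries.

    `training` is an iterable of (query_terms, gains) pairs, where `gains`
    maps a partition id to the gain that partition delivered for the query.
    Assumption: a term's weight is the summed gain of queries containing it.
    """
    vectors = defaultdict(lambda: defaultdict(float))
    for terms, gains in training:
        for pid, gain in gains.items():
            for term in set(terms):
                vectors[pid][term] += gain
    return {pid: dict(vec) for pid, vec in vectors.items()}


def rank_partitions(vectors, query_terms):
    """Rank partition ids by cosine similarity to a binary query vector."""
    terms = set(query_terms)
    q_norm = math.sqrt(len(terms))
    scores = {}
    for pid, vec in vectors.items():
        dot = sum(vec.get(t, 0.0) for t in terms)
        p_norm = math.sqrt(sum(w * w for w in vec.values()))
        scores[pid] = dot / (p_norm * q_norm) if p_norm and q_norm else 0.0
    # Highest score first; ties broken by partition id for determinism.
    return sorted(scores, key=lambda p: (-scores[p], p))


def count_inversions(predicted, observed):
    """Count partition pairs ordered differently in the two rankings.

    Assumption: a query whose count exceeds some threshold would be treated
    as contributing new information and added to the training set.
    """
    pos = {pid: i for i, pid in enumerate(observed)}
    seq = [pos[p] for p in predicted if p in pos]
    return sum(
        1
        for i in range(len(seq))
        for j in range(i + 1, len(seq))
        if seq[i] > seq[j]
    )


if __name__ == "__main__":
    training = [
        (["cheap", "flights"], {"p0": 3.0, "p1": 0.0}),
        (["python", "tutorial"], {"p1": 2.0}),
    ]
    vectors = build_partition_vectors(training)
    print(rank_partitions(vectors, ["cheap", "hotels"]))
```

In this sketch, a query would be sent only to the top few partitions of the returned ranking, which is how a selection step of this kind reduces the number of queries each partition has to process.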
This article is indexed in ScienceDirect and other databases.