Variance reduction in large graph sampling |
| |
Authors: | Jianguo Lu Hao Wang |
| |
Institution: | School of Computer Science, University of Windsor, 401 Sunset Avenue, Windsor, Ontario N9B 3P4, Canada |
| |
Abstract: | The norm of practice in estimating graph properties is to use uniform random node (RN) samples whenever possible. Many graphs are large and scale-free, inducing large degree variance and estimator variance. This paper shows that random edge (RE) sampling and the corresponding harmonic mean estimator for average degree can reduce the estimation variance significantly. First, we demonstrate that the degree variance, and consequently the variance of the RN estimator, can grow almost linearly with data size for typical scale-free graphs. Then we prove that the RE estimator has a variance bounded from above. Therefore, the variance ratio between RN and RE samplings can be very large for big data. The analytical result is supported by both simulation studies and 18 real networks. We observe that the variance reduction ratio can be more than a hundred for some real networks such as Twitter. Furthermore, we show that random walk (RW) sampling is always worse than RE sampling, and it can reduce the variance of RN method only when its performance is close to that of RE sampling. |
| |
Keywords: | Uniform random sampling Random walk Graph sampling Online social network Scale-free network Harmonic mean |
本文献已被 ScienceDirect 等数据库收录! |
|