Role of Bootstrap Averaging in Generalized Approximate Message Passing
Abstract
Generalized approximate message passing (GAMP) is a computationally efficient algorithm for estimating an unknown signal $x^{0}\in\mathbb{R}^{N}$ from a random linear measurement $y = Ax^{0} + \xi \in \mathbb{R}^{M}$, where $A\in\mathbb{R}^{M\times N}$ is a known measurement matrix and $\xi$ is the noise vector. The salient feature of GAMP is that it provides an unbiased estimator of $x^{0}$ with Gaussian error, which can be used for various hypothesis-testing methods. In this study, we consider the bootstrap average of the unbiased estimator of GAMP for the elastic net. By numerically analyzing the state evolution of approximate message passing with resampling, which has been proposed for computing bootstrap statistics of the elastic net estimator, we investigate when bootstrap averaging reduces the variance of the unbiased estimator, and the effect of optimizing the size of each bootstrap sample and the hyperparameters of the elastic net regularization, in the asymptotic setting $M,N\to\infty$ with $M/N\to\alpha\in(0,\infty)$. The results indicate that bootstrap averaging effectively reduces the variance of the unbiased estimator when the actual data-generation process is inconsistent with the sparsity assumption of the regularization and the sample size is small. Furthermore, we find that when $x^{0}$ is less sparse and the data size is small, the system undergoes a phase transition. The phase transition indicates the existence of a region where the ensemble average of the unbiased estimators of GAMP for the elastic net norm minimization problem yields the unbiased estimator with the minimum variance.
I Introduction
Consider estimating an unknown signal $x^{0}\in\mathbb{R}^{N}$ from a random linear measurement $y\in\mathbb{R}^{M}$ of the form

$y = Ax^{0} + \xi.$   (1)

In (1), $A\in\mathbb{R}^{M\times N}$ is a known measurement matrix whose elements are independent and identically distributed (i.i.d.) standard Gaussian variables, and $\xi\in\mathbb{R}^{M}$ is the measurement noise. We also assume that each element of the unknown signal $x^{0}$ is i.i.d. according to a distribution $p_{0}$.
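For concreteness, the following minimal Python sketch generates a synthetic instance of model (1) with a Gauss-Bernoulli signal; the problem sizes, sparsity level, and noise level are illustrative choices rather than values used in the experiments reported below.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: N-dimensional signal, M measurements (measurement ratio alpha = M/N)
N, M = 1000, 500
rho = 0.3        # fraction of nonzero entries in the Gauss-Bernoulli signal (illustrative)
sigma = 0.1      # noise standard deviation (illustrative)

# Gauss-Bernoulli signal: each entry is N(0, 1) with probability rho and zero otherwise
x0 = rng.standard_normal(N) * (rng.random(N) < rho)

# Measurement matrix with i.i.d. standard Gaussian entries, as in the text
A = rng.standard_normal((M, N))

# Noisy random linear measurement, Eq. (1)
y = A @ x0 + sigma * rng.standard_normal(M)
```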
Generalized approximate message passing (GAMP) [1, 2] is a computationally efficient algorithm for solving this problem. A striking feature of GAMP is its applicability to various hypothesis-testing procedures [3]. Specifically, in the high-dimensional asymptotic setting $M,N\to\infty$ with $M/N\to\alpha\in(0,\infty)$, GAMP can provide an unbiased estimator of $x^{0}$ [2] whose error behaves as white Gaussian noise, with a variance that depends on the quality of the measurement and on the denoising function used in GAMP. This unbiased estimator has been used to test the significance of estimated signals [4, 5, 3, 6].
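As an illustration of how such an unbiased estimator can be used for significance testing, the sketch below computes coordinate-wise two-sided p-values from a debiased estimate. The names `x_debiased` and `sigma_hat` stand for the GAMP output and its effective noise level (assumptions for illustration), and the construction is the standard Gaussian z-test rather than the specific procedures of [4, 5, 3, 6].

```python
import numpy as np
from scipy.stats import norm

def significance_pvalues(x_debiased, sigma_hat):
    """Two-sided p-values for H0: x0_i = 0.

    Assumes the debiased estimate behaves as x_debiased ~ x0 + sigma_hat * z with
    z standard Gaussian, so x_debiased_i / sigma_hat is a z-score under H0.
    """
    z_scores = np.asarray(x_debiased) / sigma_hat
    return 2.0 * norm.sf(np.abs(z_scores))
```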
The statistical power of these tests depends on this variance, with a lower variance leading to higher statistical power. However, reducing the variance is not a trivial task. Replacing the denoising function used in GAMP with a more powerful one based on nonconvex regularization, for example, can worsen the convergence of GAMP [7], or, even if convergence is achieved, the improvement may be insignificant [8]. This study aims to find an alternative way to reduce the variance without resorting to nonconvex regularization.
To reduce the variance, we use bootstrap averaging [9] from computational statistics (commonly known as ensemble learning in machine learning [10, 11]). Specifically, we consider averaging the unbiased estimators of GAMP over multiple bootstrap samples of arbitrary size $\mu_{B}M$, where $\mu_{B}>0$ is a tunable ratio. For the denoising function of GAMP, we consider the one associated with the elastic net [12].
However, the efficient computation and theoretical analysis of such an averaged unbiased estimator remain unresolved. To resolve this problem, we use AMP with resampling (AMPR) [13, 14], which has been proposed for computing bootstrap statistics of the elastic net estimator by running a variant of GAMP only once. We argue that AMPR is, in fact, computing the bootstrap average of the unbiased estimators of GAMP. That is, the averaged unbiased estimator can be computed efficiently, and its variance can be analyzed using the state evolution (SE) of AMPR, which was developed to analyze the performance of AMPR in [13, 14]. We then conduct a thorough numerical analysis of the SE of AMPR to investigate when bootstrap averaging reduces the variance of the unbiased estimator, and what phenomena occur when the bootstrap sample size and the hyperparameters of the elastic net regularization are optimized.
The findings of this study are summarized as follows:
• As mentioned above, the averaged unbiased estimator is obtained by AMPR. Furthermore, its variance can be estimated from the output of AMPR without knowing the actual signal $x^{0}$. Thus, we can minimize the variance by adjusting the bootstrap sample size and the hyperparameters of the elastic net (see Sections III and IV).
• The variance of the averaged unbiased estimator can be reduced via bootstrapping, especially when the true data-generation process is inconsistent with the sparsity assumption of the regularization and the data size is insufficient (see Section V-B).
• When $x^{0}$ is less sparse and the data size is small, a phase transition occurs. This phase transition indicates the existence of a region in which the optimal regularization strength is infinitesimally small and the number of unique data points in each bootstrap sample is smaller than the dimension of $x^{0}$. That is, in this region, the ensemble average of the unbiased estimators of GAMP for the elastic net norm minimization problem (also known as minimum norm interpolation in machine learning [15, 16, 17]) yields the best averaged unbiased estimator (see Section V-C).
I-A Notation
$\mathcal{N}(m, v)$ denotes a Gaussian distribution with mean $m$ and variance $v$, and $\mathrm{Poisson}(\mu_{B})$ denotes a Poisson distribution with mean $\mu_{B}$. For a random variable $X$, $\mathbb{E}_{X}[\cdots]$ denotes the average with respect to $X$. Given a vector $u$ and a scalar function $f$, we write $f(u)$ for the vector obtained by applying $f$ componentwise. For vectors, products, divisions, and similar operations are understood componentwise, and the empirical average of the components is denoted by $\langle\cdot\rangle$.
II Background on AMPR
II-A AMPR
Algorithm 1 shows AMPR [13] with an elastic net denoising function. The function $\eta$ is the elastic net denoising function and $\eta'$ is its derivative with respect to the first argument:

$\eta(r;\tau) = \operatorname{arg\,min}_{x\in\mathbb{R}} \left[ \frac{(x-r)^{2}}{2\tau} + \lambda\gamma|x| + \frac{\lambda(1-\gamma)}{2}x^{2} \right] = \frac{\mathrm{sign}(r)\max(|r| - \lambda\gamma\tau,\,0)}{1 + \lambda(1-\gamma)\tau},$   (4)

$\eta'(r;\tau) = \frac{\partial\eta(r;\tau)}{\partial r} = \frac{\mathbb{1}[|r| > \lambda\gamma\tau]}{1 + \lambda(1-\gamma)\tau},$   (7)

where $\lambda > 0$ represents the regularization strength and $\gamma\in[0,1]$ is the $\ell_{1}$-ratio.
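For reference, a minimal sketch of the soft-thresholding form of (4) and its derivative (7) is given below, assuming the parametrization written above; the argument names and the effective-variance parameter `tau` are our choices, and Algorithm 1 may parametrize the denoiser differently.

```python
import numpy as np

def eta(r, tau, lam, gamma):
    """Elastic net denoiser of Eq. (4): argmin_x (x - r)^2 / (2 tau)
    + lam * (gamma * |x| + (1 - gamma) / 2 * x^2)."""
    shrink = np.sign(r) * np.maximum(np.abs(r) - lam * gamma * tau, 0.0)  # soft thresholding
    return shrink / (1.0 + lam * (1.0 - gamma) * tau)                     # ridge-type shrinkage

def eta_prime(r, tau, lam, gamma):
    """Derivative of eta with respect to its first argument r, Eq. (7)."""
    return (np.abs(r) > lam * gamma * tau).astype(float) / (1.0 + lam * (1.0 - gamma) * tau)
```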
At a fixed point, AMPR offers bootstrap statistics of the elastic net estimator as follows:
Proposition 1 (bootstrap statistics based on AMPR [13])
Let $\hat{x}(c)$ be the elastic net estimator for a bootstrap sample of size $\mu_{B}M$:

$\hat{x}(c) = \operatorname{arg\,min}_{x\in\mathbb{R}^{N}} \left[ \sum_{\mu=1}^{M} \frac{c_{\mu}}{2}\left(y_{\mu} - a_{\mu}^{\top}x\right)^{2} + \lambda\left(\gamma\|x\|_{1} + \frac{1-\gamma}{2}\|x\|_{2}^{2}\right) \right],$   (8)

where $c_{\mu}$ represents the number of times the data point $(y_{\mu}, a_{\mu})$, $a_{\mu}$ being the $\mu$-th row of $A$, appears in the bootstrap sample, and $c = (c_{\mu})_{\mu=1}^{M}$. Then, once AMPR reaches its fixed point at a sufficiently large iteration number, the bootstrap statistics of $\hat{x}(c)$ can be computed as
(9) |
where the test function appearing in (9) is required only to make the expression well defined and is otherwise arbitrary. The variables without iteration indexes are the output of AMPR at its fixed point.
Note that the Gaussian averages appearing in (9) can be written in closed form using the error function; thus, computing these quantities is easy. Also, the remaining average over the resampling counts is an average over a one-dimensional discrete random variable, which can be computed numerically without much overhead. Hence, the computational complexity of evaluating the right-hand side of (9) is dominated by the matrix-vector products in lines 11-12 of Algorithm 1, rather than by repeatedly computing $\hat{x}(c)$ for numerous realizations of $c$, making AMPR a computationally efficient algorithm for computing the bootstrap statistics.
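To make explicit what this replaces, a brute-force Monte Carlo reference would repeatedly draw the counts $c_{\mu}$ and refit the weighted elastic net of (8), as sketched below. The Poisson model for the counts with mean `mu_B`, the scikit-learn solver, and the mapping between its `(alpha, l1_ratio)` and $(\lambda,\gamma)$ are assumptions of this sketch; it also averages the elastic net estimators themselves rather than the debiased estimators of GAMP.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

def bootstrap_elastic_net_average(A, y, mu_B, alpha_sk, l1_ratio, n_boot=100, seed=0):
    """Brute-force bootstrap average of elastic net estimators.

    Each data point is included c_mu ~ Poisson(mu_B) times, mimicking resampling
    with a bootstrap sample of average size mu_B * M.
    """
    rng = np.random.default_rng(seed)
    M, N = A.shape
    estimates = np.zeros((n_boot, N))
    for b in range(n_boot):
        c = rng.poisson(mu_B, size=M)              # resampling counts c_mu
        keep = c > 0
        A_b = np.repeat(A[keep], c[keep], axis=0)  # replicate rows according to counts
        y_b = np.repeat(y[keep], c[keep])
        model = ElasticNet(alpha=alpha_sk, l1_ratio=l1_ratio, fit_intercept=False)
        model.fit(A_b, y_b)
        estimates[b] = model.coef_
    return estimates.mean(axis=0), estimates       # bootstrap average and all replicates
```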
II-B SE of AMPR
AMPR displays a remarkable behavior. Let $\hat{h}$ denote the effective estimate constructed from the AMPR iterates. Then $\hat{h}$ behaves like a white Gaussian noise-corrupted version of the true signal $x^{0}$ [13]. Furthermore, the variance of this effective noise can be estimated using SE.
Proposition 2 (SE of AMPR [13])
The estimate $\hat{h}$ behaves as a white Gaussian noise-corrupted version of the actual signal $x^{0}$:

$\hat{h} = x^{0} + \hat{\sigma}z, \qquad z \sim \mathcal{N}(0, I_{N}),$   (10)

for some positive value $\hat{\sigma}^{2}$, indicating that $\hat{h}$ can be used as an unbiased estimate of $x^{0}$. The variance $\hat{\sigma}^{2}$ is predicted in the asymptotic setting $M,N\to\infty$, $M/N\to\alpha\in(0,\infty)$, using the scalar SE specified in Algorithm 2. There, one of the SE variables corresponds to the mean squared error (MSE) of the AMPR estimate of $x^{0}$. To track the performance of AMPR, this variable should be initialized with the MSE of the initial AMPR estimate.
III Averaged unbiased estimator
Here, we explain the meaning of the SE of AMPR. Subsequently, we argue that the estimate $\hat{h}$ of AMPR is the bootstrap average of the unbiased estimators of GAMP.
The first point is the meaning of the random variables that appear in the SE. Propositions 1 and 2 indicate that, once AMPR reaches its fixed point at a sufficiently large iteration number, for any function such that the following expression is well defined, the following holds in the asymptotic setting $M,N\to\infty$, $M/N\to\alpha\in(0,\infty)$:
(11) |
where the variables are the outputs of the SE at a fixed point. In (11), the two sources of randomness can be interpreted as effectively representing the randomness originating from the resampling counts $c$ and from the measurement randomness $(A, \xi)$, respectively.
The next point is the relationship between AMPR and GAMP. For this, note that certain SE variables of AMPR can be combined so that their evolution is equivalent to the SE of GAMP for computing $\hat{x}(c)$ for one realization of $c$ [1, 2]; recall that $\hat{x}(c)$ is specified in (8). In addition, the combined variance corresponds to the variance of the unbiased estimate of GAMP at each iteration. These observations indicate that the unbiased estimator of GAMP at each iteration can be decomposed into two Gaussian variables:
(12) |
where the two Gaussian terms represent the randomness originating from the measurement $(A, \xi)$ and from the resampling counts $c$, respectively. We can interpret AMPR as computing the bootstrap statistics by decomposing the effective noise in this way and averaging over $c$. This yields the unbiased estimator in (10) as the bootstrap average of the unbiased estimators of GAMP. We verify this point numerically in Section V-A (see Figs. 1 and 2).
IV Variance optimization
In this section, we describe the properties of the variance of the unbiased estimate of AMPR.
First, the following proposition states that bootstrap averaging with an optimal choice of the parameters does not increase the variance of the unbiased estimator.
Proposition 3
Let $\hat{\sigma}_{\mathrm{GAMP}}^{2}$ be the variance of the unbiased estimator of GAMP at a fixed point when computing the elastic net estimator from the full data set $(y, A)$. We can select the parameters $(\mu_{B}, \lambda, \gamma)$ such that the variance of AMPR’s unbiased estimator at a fixed point does not exceed $\hat{\sigma}_{\mathrm{GAMP}}^{2}$.
Proof:
We denote by unindexed variables the values of the SE quantities at the fixed point of the SE of AMPR, and consider the limit $\mu_{B}\to\infty$ (rescaling the regularization strength $\lambda$ proportionally). In this limit, the random variable $c_{\mu}$ that appears in (8) and Algorithm 1 behaves deterministically: the mean and variance of $c_{\mu}/\mu_{B}$ converge to $1$ and $0$, respectively. Proposition 1 then indicates that the bootstrap average reduces to the elastic net estimator computed from the full data set $(y, A)$. Moreover, from line 12 in Algorithm 2, the fluctuation induced by the resampling vanishes, and AMPR reduces to the GAMP algorithm for computing the elastic net estimator of $(y, A)$ [1]. Hence, in this limit the variance of AMPR’s unbiased estimator equals $\hat{\sigma}_{\mathrm{GAMP}}^{2}$, and optimizing the parameters can only decrease it. ∎
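Under the weighted form of (8) written above, the deterministic limit used in the proof can be spelled out as follows.

```latex
% c_mu ~ Poisson(mu_B), hence the normalized counts concentrate:
\mathbb{E}\!\left[\tfrac{c_\mu}{\mu_B}\right] = 1,
\qquad
\operatorname{Var}\!\left[\tfrac{c_\mu}{\mu_B}\right] = \tfrac{1}{\mu_B}
  \;\xrightarrow[\;\mu_B \to \infty\;]{}\; 0,
% so the weighted objective in (8) is asymptotically proportional to the
% full-data elastic net objective with effective regularization lambda / mu_B:
\sum_{\mu=1}^{M} \frac{c_\mu}{2}\bigl(y_\mu - a_\mu^\top x\bigr)^2
  + \lambda\Bigl(\gamma\|x\|_1 + \tfrac{1-\gamma}{2}\|x\|_2^2\Bigr)
\;\approx\;
\mu_B\!\left[\sum_{\mu=1}^{M} \tfrac{1}{2}\bigl(y_\mu - a_\mu^\top x\bigr)^2
  + \frac{\lambda}{\mu_B}\Bigl(\gamma\|x\|_1 + \tfrac{1-\gamma}{2}\|x\|_2^2\Bigr)\right].
```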
Although the SE prediction of the variance $\hat{\sigma}^{2}$ requires information on the unknown signal $x^{0}$, the variance can also be predicted from the data alone. In other words, estimating the variance of the unbiased estimator does not require explicit knowledge of $x^{0}$.
Proposition 4 (Variance estimation from data)
In the asymptotic setting $M,N\to\infty$, $M/N\to\alpha\in(0,\infty)$, the variance $\hat{\sigma}^{2}$ of the unbiased estimate can be estimated as
(13) |
Proof:
In the SE of AMPR, $\hat{\sigma}^{2}$ is determined by the MSE of the AMPR estimate and the variance of the measurement noise. For linear models, this combination corresponds to the prediction error for a new sample and can therefore be estimated using the leave-one-out error (LOOE) [20]. The LOOE can, in turn, be estimated from the AMPR output because the relevant residual is proportional to the leave-one-out estimate for each data point, namely AMPR’s estimate obtained without that sample (Equation (19) of reference [13]). ∎
V Numerical analysis
In the sequel, by numerically minimizing the variance using the SE, we investigate when bootstrapping reduces the variance of the unbiased estimator and what phenomena occur when optimizing the bootstrap sample size ratio $\mu_{B}$ and the hyperparameters $(\lambda, \gamma)$ of the elastic net regularization.
For this, we searched for the optimal parameters that yield the minimum variance using the SE of AMPR and the Nelder-Mead algorithm in the Optim.jl library [18]. We obtained the fixed point of the SE by iterating it a sufficient number of times. For comparison, the same optimization was performed for the non-bootstrap case. For the signal distribution $p_{0}$, we consider the Gauss-Bernoulli model $p_{0}(x) = \rho\,\mathcal{N}(x; 0, 1) + (1-\rho)\,\delta(x)$ with sparsity $\rho$.
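A sketch of the corresponding optimization loop is shown below in Python with SciPy’s Nelder-Mead implementation; the objective `se_variance` is only a smooth stand-in for the SE fixed-point prediction of Algorithm 2 (which is not reproduced here), and the actual computations in this study were performed with Optim.jl.

```python
import numpy as np
from scipy.optimize import minimize

def se_variance(log_params):
    """Stand-in for the SE fixed-point prediction of the variance (Algorithm 2).

    A real implementation would iterate the SE of AMPR to convergence for the
    parameters (mu_B, lambda) and return the predicted variance; here a smooth
    toy function is used only so that the snippet runs end to end.
    """
    mu_B, lam = np.exp(log_params)
    return (mu_B - 2.0) ** 2 + (np.log10(lam) + 1.0) ** 2

# Derivative-free search over log-parameters so that mu_B and lambda stay positive,
# mirroring the Nelder-Mead search described in the text.
x_init = np.log([1.0, 0.1])                      # initial guess for (mu_B, lambda), illustrative
result = minimize(se_variance, x_init, method="Nelder-Mead")
mu_B_opt, lam_opt = np.exp(result.x)
print(mu_B_opt, lam_opt)
```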
In this section, we denote the outputs of AMPR or GAMP at their fixed points by unindexed variables.
V-A Distribution of the unbiased estimator
We verified the interpretation of AMPR’s output described in Section III. For this, we compared the output of GAMP for individual realizations of $c$ in (8) with the output of AMPR. The parameters used to produce the figures were fixed to a single representative setting.
Fig. 1 shows the sample quantiles of the effective noise $\hat{h} - x^{0}$ versus those of the normal distribution with the variance $\hat{\sigma}^{2}$ predicted by the SE. The scattered points are approximately aligned with a line of slope one and intercept zero. Fig. 2 shows a scatter plot of the GAMP outputs against the corresponding AMPR outputs. Again, the scattered points are approximately aligned with a line of slope one and intercept zero. This is consistent with the decomposition of the Gaussian noise in (12). Thus, AMPR indeed computes the bootstrap average of the unbiased estimators of GAMP.
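A sketch of the kind of Gaussianity check behind Fig. 1 is given below; `h_hat`, `x0`, and `sigma_hat` stand for the AMPR output, the true signal, and the SE-predicted standard deviation, and the script is illustrative rather than the one used to produce the figure.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

def gaussianity_check(h_hat, x0, sigma_hat):
    """Quantile-quantile check of the claim h_hat ~ x0 + sigma_hat * z, z ~ N(0, I):
    the standardized residuals should follow a standard normal, so the Q-Q points
    should fall on the identity line (cf. the comparison reported in Fig. 1)."""
    residual = (np.asarray(h_hat) - np.asarray(x0)) / sigma_hat
    stats.probplot(residual, dist="norm", plot=plt)
    plt.show()
```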
V-B Variance reduction
We quantitatively compare the variance of the averaged unbiased estimator with that of the unbiased estimator without bootstrapping. Fig. 3 shows the ratio of the optimal $\hat{\sigma}^{2}$ to the optimal $\hat{\sigma}_{\mathrm{GAMP}}^{2}$. Panels (a) and (b) show the results when the $\ell_{1}$-ratio $\gamma$ is fixed, and panel (c) shows the results when $\gamma$ is optimized. As expected from Proposition 3, the variance is reduced compared with the case without bootstrapping. The magnitude of the reduction is larger when $\gamma$ is fixed close to one (the LASSO [21] case). In particular, the largest improvement is obtained when the sparsity $\rho$ is large (a less sparse signal) and the measurement ratio $\alpha$ is small. The improvement is minor when $\gamma$ is optimized. This suggests that the bootstrap average is effective when the actual data-generation process is inconsistent with the sparsity assumption of the regularization and the data size is insufficient. However, when the data size is too small, a meaningful improvement cannot be obtained.
V-C Phase transition to an ensemble of interpolators
Fig. 4 shows the optimal regularization strength $\lambda$. Panels (a) and (b) show the results when the $\ell_{1}$-ratio $\gamma$ is fixed, and panel (c) shows the results when $\gamma$ is optimized. In all cases, it is clear that a phase transition occurs in which $\lambda$ drops to an infinitesimally small value (although, for numerical reasons, $\lambda$ is constrained to exceed a small positive lower bound) as the measurement ratio $\alpha$ decreases or the sparsity $\rho$ increases. Moreover, Fig. 5 shows $\alpha_{\mathrm{uniq}} \equiv \alpha(1 - e^{-\mu_{B}})$, the typical number of unique data points in each bootstrap sample scaled by $N$. From Fig. 5, it is clear that the typical number of unique data points is always smaller than $N$ in the region where $\lambda\to 0$; this holds even if the measurement ratio satisfies $\alpha > 1$. Thus, when $\lambda\to 0$, the elastic net estimator in (8) becomes the minimum elastic net norm estimator
$\hat{x}(c) = \operatorname{arg\,min}_{x\in\mathbb{R}^{N}} \left[ \gamma\|x\|_{1} + \frac{1-\gamma}{2}\|x\|_{2}^{2} \right] \ \ \text{subject to}\ \ y_{\mu} = a_{\mu}^{\top}x \ \text{for all}\ \mu\ \text{such that}\ c_{\mu} > 0,$   (14)
which is commonly known as the minimum elastic net norm interpolator in machine learning [15, 16, 17]. These results suggest that when the elastic net regularization cannot identify an appropriate sparse structure of $x^{0}$, it is better to use an over-parameterized setting, in which the number of unique data points in each bootstrap sample is smaller than the dimension of $x^{0}$, and to rely on an ensemble of interpolators.
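The quantity $\alpha_{\mathrm{uniq}}$ plotted in Fig. 5 follows directly from the Poisson model for the resampling counts assumed above:

```latex
% Each data point is absent from a bootstrap sample with probability e^{-mu_B}, so
\Pr\bigl[c_\mu \ge 1\bigr] = 1 - e^{-\mu_B},
\qquad
\alpha_{\mathrm{uniq}}
  = \frac{1}{N}\,\mathbb{E}\Bigl[\#\{\mu : c_\mu \ge 1\}\Bigr]
  = \frac{M}{N}\bigl(1 - e^{-\mu_B}\bigr)
  = \alpha\bigl(1 - e^{-\mu_B}\bigr),
% and the interpolating regime of Section V-C corresponds to alpha_uniq < 1.
```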
VI Summary and discussion
In this study, we investigated the behavior of the bootstrap-averaged unbiased estimator of GAMP using AMPR and its SE. We found that the bootstrap averaging procedure can effectively reduce the variance of the unbiased estimator when the actual data-generation process is inconsistent with the sparsity assumption of the regularization and the data size is insufficient. We also found a phase transition in which the regularization strength drops to an infinitesimally small value as the measurement ratio $\alpha$ decreases or the sparsity $\rho$ increases.
Although increasing the variance of weak learners is key to the success of ensemble learning [22, 23, 24], the phase transition to an ensemble of interpolators may be unexpected. Investigating whether similar phase transitions occur in other more sophisticated machine-learning models, such as neural networks, would be an interesting future direction.
We also remark that the landscape of the variance in the space of the parameters $(\mu_{B}, \lambda, \gamma)$ may not be convex in general. Thus, the uniqueness of the optimizer should be investigated in the future, paying attention to the relationship with implicit regularization [25, 26, 27].
On the technical side, the key to this study was the precise performance characterization of the averaged estimator by AMPR, which was developed by combining GAMP and the replica method of statistical physics [28, 29]. Such combinations of approximate inference algorithms and the replica method have been used to develop approximate computation algorithms [13, 30, 14, 31, 32, 33, 34] but have not been applied to the precise performance analysis of ensemble methods. It would be an interesting direction to attempt a similar performance analysis for other bootstrap methods or ensemble learning schemes.
Acknowledgement
This study was supported by JSPS KAKENHI Grant Numbers 21K21310 and 17H00764.
References
- [1] S. Rangan, “Generalized approximate message passing for estimation with random linear mixing,” in 2011 IEEE International Symposium on Information Theory Proceedings. IEEE, 2011, pp. 2168–2172.
- [2] A. Javanmard and A. Montanari, “State evolution for general approximate message passing algorithms, with applications to spatial coupling,” Information and Inference: A Journal of the IMA, vol. 2, no. 2, pp. 115–144, 2013.
- [3] P. Sur and E. J. Candès, “A modern maximum-likelihood theory for high-dimensional logistic regression,” Proceedings of the National Academy of Sciences, vol. 116, no. 29, pp. 14 516–14 525, 2019.
- [4] A. Javanmard and A. Montanari, “Hypothesis testing in high-dimensional regression under the gaussian random design model: Asymptotic theory,” IEEE Transactions on Information Theory, vol. 60, no. 10, pp. 6522–6554, 2014.
- [5] T. Takahashi and Y. Kabashima, “A statistical mechanics approach to de-biasing and uncertainty estimation in lasso for random measurements,” Journal of Statistical Mechanics: Theory and Experiment, vol. 2018, no. 7, p. 073405, 2018.
- [6] P. Sur, Y. Chen, and E. J. Candès, “The likelihood ratio test in high-dimensional logistic regression is asymptotically a rescaled chi-square,” Probability theory and related fields, vol. 175, no. 1, pp. 487–558, 2019.
- [7] A. Sakata and T. Obuchi, “Perfect reconstruction of sparse signals with piecewise continuous nonconvex penalties and nonconvexity control,” Journal of Statistical Mechanics: Theory and Experiment, vol. 2021, no. 9, p. 093401, 2021.
- [8] L. Zheng, A. Maleki, H. Weng, X. Wang, and T. Long, “Does $\ell_p$-minimization outperform $\ell_1$-minimization?” IEEE Transactions on Information Theory, vol. 63, no. 11, pp. 6896–6935, 2017.
- [9] B. Efron, “Bootstrap Methods: Another Look at the Jackknife,” The Annals of Statistics, vol. 7, no. 1, pp. 1 – 26, 1979. [Online]. Available: https://doi.org/10.1214/aos/1176344552
- [10] Z.-H. Zhou, Ensemble methods: foundations and algorithms. CRC press, 2012.
- [11] K. P. Murphy, Probabilistic machine learning: an introduction. MIT press, 2022.
- [12] H. Zou and T. Hastie, “Regularization and variable selection via the elastic net,” Journal of the royal statistical society: series B (statistical methodology), vol. 67, no. 2, pp. 301–320, 2005.
- [13] T. Obuchi and Y. Kabashima, “Semi-analytic resampling in lasso,” The Journal of Machine Learning Research, vol. 20, no. 1, pp. 2544–2576, 2019.
- [14] T. Takahashi and Y. Kabashima, “Semi-analytic approximate stability selection for correlated data in generalized linear models,” Journal of Statistical Mechanics: Theory and Experiment, vol. 2020, no. 9, p. 093402, 2020.
- [15] V. Muthukumar, K. Vodrahalli, V. Subramanian, and A. Sahai, “Harmless interpolation of noisy data in regression,” IEEE Journal on Selected Areas in Information Theory, vol. 1, no. 1, pp. 67–83, 2020.
- [16] P. L. Bartlett, P. M. Long, G. Lugosi, and A. Tsigler, “Benign overfitting in linear regression,” Proceedings of the National Academy of Sciences, vol. 117, no. 48, pp. 30 063–30 070, 2020.
- [17] T. Hastie, A. Montanari, S. Rosset, and R. J. Tibshirani, “Surprises in high-dimensional ridgeless least squares interpolation,” The Annals of Statistics, vol. 50, no. 2, pp. 949–986, 2022.
- [18] P. K. Mogensen and A. N. Riseth, “Optim: A mathematical optimization package for Julia,” Journal of Open Source Software, vol. 3, no. 24, p. 615, 2018.
- [19] P. Virtanen, R. Gommers, T. E. Oliphant, M. Haberland, T. Reddy, D. Cournapeau, E. Burovski, P. Peterson, W. Weckesser, J. Bright, S. J. van der Walt, M. Brett, J. Wilson, K. J. Millman, N. Mayorov, A. R. J. Nelson, E. Jones, R. Kern, E. Larson, C. J. Carey, İ. Polat, Y. Feng, E. W. Moore, J. VanderPlas, D. Laxalde, J. Perktold, R. Cimrman, I. Henriksen, E. A. Quintero, C. R. Harris, A. M. Archibald, A. H. Ribeiro, F. Pedregosa, P. van Mulbregt, and SciPy 1.0 Contributors, “SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python,” Nature Methods, vol. 17, pp. 261–272, 2020.
- [20] T. Obuchi and Y. Kabashima, “Cross validation in lasso and its acceleration,” Journal of Statistical Mechanics: Theory and Experiment, vol. 2016, no. 5, p. 053304, 2016.
- [21] R. Tibshirani, “Regression shrinkage and selection via the lasso: a retrospective,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 73, no. 3, pp. 273–282, 2011.
- [22] A. Krogh and P. Sollich, “Statistical mechanics of ensemble learning,” Physical Review E, vol. 55, no. 1, p. 811, 1997.
- [23] P. Sollich and A. Krogh, “Learning with ensembles: How overfitting can be useful,” Advances in neural information processing systems, vol. 8, 1995.
- [24] L. I. Kuncheva and C. J. Whitaker, “Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy,” Machine learning, vol. 51, no. 2, pp. 181–207, 2003.
- [25] D. LeJeune, H. Javadi, and R. Baraniuk, “The implicit regularization of ordinary least squares ensembles,” in International Conference on Artificial Intelligence and Statistics. PMLR, 2020, pp. 3525–3535.
- [26] J.-H. Du, P. Patil, and A. K. Kuchibhotla, “Subsample ridge ensembles: Equivalences and generalized cross-validation,” arXiv preprint arXiv:2304.13016, 2023.
- [27] T. Yao, D. LeJeune, H. Javadi, R. G. Baraniuk, and G. I. Allen, “Minipatch learning as implicit ridge-like regularization,” in 2021 IEEE International Conference on Big Data and Smart Computing (BigComp). IEEE, 2021, pp. 65–68.
- [28] M. Mézard and A. Montanari, Information, physics, and computation. Oxford University Press, 2009.
- [29] A. Montanari and S. Sen, “A short tutorial on mean-field spin glass techniques for non-physicists,” arXiv preprint arXiv:2204.02909, 2022.
- [30] T. Takahashi and Y. Kabashima, “Replicated vector approximate message passing for resampling problem,” arXiv preprint arXiv:1905.09545, 2019.
- [31] D. Malzahn and M. Opper, “An approximate analytical approach to resampling averages,” Journal of Machine Learning Research, vol. 4, no. Dec, pp. 1151–1173, 2003.
- [32] ——, “A statistical mechanics approach to approximate analytical bootstrap averages,” in Advances in Neural Information Processing Systems, 2003, pp. 343–350.
- [33] ——, “Approximate analytical bootstrap averages for support vector classifiers,” in Advances in Neural Information Processing Systems, S. Thrun, L. Saul, and B. Schölkopf, Eds., vol. 16. MIT Press, 2003. [Online]. Available: https://proceedings.neurips.cc/paper/2003/file/2c6ae45a3e88aee548c0714fad7f8269-Paper.pdf
- [34] M. Opper, “An approximate inference approach for the pca reconstruction error,” Advances in Neural Information Processing Systems, vol. 18, 2005.