Enhancing Computational Efficiency in High-Dimensional Bayesian Analysis: Applications to Cancer Genomics
Abstract
In this study, we present a comprehensive evaluation of the Two-Block Gibbs (2BG) sampler as a robust alternative to the traditional Three-Block Gibbs (3BG) sampler in Bayesian shrinkage models. Through extensive simulation studies, we demonstrate that the 2BG sampler exhibits superior computational efficiency and faster convergence rates, particularly in high-dimensional settings where the ratio of predictors to samples is large. We apply these findings to real-world data from the NCI-60 cancer cell panel, leveraging gene expression data to predict protein expression levels. Our analysis incorporates feature selection, identifying key genes that influence protein expression while shedding light on the underlying genetic mechanisms in cancer cells. The results indicate that the 2BG sampler not only produces more effective samples than the 3BG counterpart but also significantly reduces computational costs, thereby enhancing the applicability of Bayesian methods in high-dimensional data analysis. This contribution extends the understanding of shrinkage techniques in statistical modeling and offers valuable insights for cancer genomics research.
Keywords: Bayesian Lasso; Spike-and-Slab; Gibbs Sampler; High-Dimensional Statistics; MCMC Methods; Cancer Genomics
1. Introduction
Linear regression aims to model the relationship between a response variable, $y$, and a set of predictor variables, $X$, using regression coefficients $\beta$. In situations where the number of predictors, $p$, far exceeds the number of observations, $n$, penalization techniques are often employed. These methods regularize the model by applying a penalty to the regression coefficients, effectively shrinking some $\beta_j$'s towards zero. This leads to sparse estimates, which can improve model stability and prediction accuracy in high-dimensional settings. However, despite their effectiveness, penalization methods face the challenge of providing uncertainty quantification for the estimated parameters.
In the Bayesian paradigm, inference is achieved by incorporating prior distributions on the parameters, allowing for direct quantification of uncertainty through the posterior distribution. In recent years, Bayesian shrinkage models have gained popularity for addressing high-dimensional problems, with two main approaches: the use of spike-and-slab priors, and continuous shrinkage priors. Spike-and-slab models, such as Laplace-Zero mixtures (Johnstone and Silverman, 2004), combine a point mass at zero with a continuous distribution, providing a flexible framework for inducing sparsity. On the other hand, purely continuous shrinkage priors, such as the Horseshoe prior (Carvalho et al., 2010), have emerged as effective alternatives for handling high-dimensional data while quantifying parameter uncertainties.
Additionally, the Bayesian lasso, introduced by Park and Casella (2008), extends the frequentist lasso by interpreting its objective as a Bayesian model. This approach places an independent Laplace prior on the regression coefficients, connecting it to the original lasso formulation developed by Tibshirani (1996), which has been widely used for regularization and variable selection in high-dimensional regression.
However, as with most Bayesian problems, there is no closed-form expression for the posterior quantities of Bayesian shrinkage models. One approach to approximating the posterior distribution is via variational Bayes algorithms such as the mean field variational Bayes (MFVB) methodology. While this is a fast algorithm, Neville et al. (2014) point out that MFVB can perform quite poorly due to posterior dependence among auxiliary variables. Markov chain Monte Carlo (MCMC), an indispensable tool in contemporary Bayesian inference, provides alternative methods for approximating the posterior distribution and hence making inferences about the parameters, since the shrinkage priors of these models capitalize on hierarchical representations. Utilizing the power of MCMC, Park and Casella (2008) provided the three-step Gibbs sampler, otherwise known as the three-block Gibbs sampler (3BG), to explore the posterior distribution for arbitrary $n$ and $p$. While Khare and Hobert (2013) proved that the 3BG is geometrically ergodic for arbitrary values of $n$ and $p$, it tends to suffer from slow convergence, especially when $p$ is large, due to high correlation between components of the different blocks, in particular between the regression coefficients in one block and the variance parameters in another (Rajaratnam and Sparks, 2015). Owing to this deterioration in the convergence properties of the 3BG in high-dimensional settings, Rajaratnam et al. (2019) proposed an efficient version known as the two-block Gibbs sampler (2BG) to overcome the sluggish convergence of the former. Theoretical convergence properties can be found in the aforementioned paper for interested readers.
In this study, an extensive simulation analysis is conducted to compare the computational efficiencies of the 3BG and the 2BG using both the spike-and-slab prior and the Bayesian lasso model, following Rajaratnam et al. (2019). Here, computational efficiency is measured as the effective sample size per second (ESS/sec). Further, mixing rates are evaluated by examining lag-one autocorrelations $\rho_1$, with values closer to zero indicating improved mixing (Rajaratnam and Sparks, 2015). We then apply these methods to the well-known NCI-60 cancer cell panel from the National Cancer Institute.
Jin and Tan (2021) introduced the 2BG for Bayesian group lasso, sparse group lasso, and fused lasso models, comparing its performance to the 3BG and Hamiltonian Monte Carlo (HMC). While their work focused on group-structured priors, this study takes a different direction by evaluating the performance of the 2BG relative to the 3BG for two classic Bayesian shrinkage models: the spike-and-slab prior and the Bayesian lasso. Furthermore, this paper extends the application to real-world data from the NCI-60 cancer cell line panel, using gene expression data to predict protein expression levels. By incorporating feature selection, the analysis not only identifies the most influential genes but also provides insights into the genetic mechanisms driving protein expression in cancer cells. This contribution deepens the understanding of shrinkage methods in high-dimensional settings while offering novel applications in cancer genomics.
The rest of the article is organized as follows: Section 2 presents an overview of the Bayesian shrinkage framework for regression, including a review of the spike-and-slab priors, the Bayesian lasso, and the 2BG algorithms for exploring the posterior distributions. Sections 3 and 4 detail the simulation studies and real data analyses conducted to empirically compare the two algorithms. The article concludes with a discussion of the findings in Section 5. The code for this project is available at https://github.com/bosafoagyare/2-BGS.
2. Methods
2.1. Bayesian Shrinkage Models
Consider a dataset $(y, X)$, where $y \in \mathbb{R}^n$ represents the response vector, and $X$ is an $n \times p$ design matrix of standardized covariates. Let $\beta \in \mathbb{R}^p$ denote the vector of regression coefficients, $\sigma^2$ the residual variance, and $\beta_0$ an unknown intercept. The regression model is expressed as:
$$y \mid \beta_0, \beta, \sigma^2 \sim N_n\!\left(\beta_0 1_n + X\beta,\; \sigma^2 I_n\right). \tag{1}$$
The focus here is on high-dimensional regression problems where the number of covariates, $p$, greatly exceeds the sample size, $n$. In a Bayesian framework where sparsity is desired, the use of spike-and-slab priors, which combine a normal density concentrated near zero (the spike) with another normal density that is flat near zero (the slab), has been explored as a method to shrink regression coefficients toward zero. Alternatively, purely continuous shrinkage priors have also been used to achieve this effect. In many Bayesian high-dimensional regression settings, the prior is typically specified as:
$$\beta \mid \sigma^2, \tau \sim N_p\!\left(0_p,\; \sigma^2 D_\tau\right), \qquad \tau \sim \pi(\tau), \qquad D_\tau = \operatorname{diag}(\tau_1, \ldots, \tau_p), \tag{2}$$
where $\pi(\tau)$ represents the prior distribution on $\tau = (\tau_1, \ldots, \tau_p)$. Additionally, the prior for $(\beta_0, \sigma^2)$ is assumed to be the improper prior $\pi(\beta_0, \sigma^2) \propto 1/\sigma^2$, which is independent of $\tau$. By integrating out $\beta_0$ and combining Equations 1 and 2, the following conditional distributions are obtained:
$$\begin{aligned} \beta \mid \sigma^2, \tau, y &\sim N_p\!\left(A_\tau^{-1} X^\top \tilde{y},\; \sigma^2 A_\tau^{-1}\right), \\ \sigma^2 \mid \beta, \tau, y &\sim \text{Inverse-Gamma}\!\left(\frac{n - 1 + p}{2},\; \frac{\|\tilde{y} - X\beta\|^2 + \beta^\top D_\tau^{-1} \beta}{2}\right), \end{aligned} \tag{3}$$
where $A_\tau = X^\top X + D_\tau^{-1}$ and $\tilde{y} = y - \bar{y} 1_n$.
Given the ability to sample from $\tau \mid \beta, \sigma^2, y$, these conditional distributions can be used to construct a three-block Gibbs sampler (3BG) for drawing from the joint posterior of $(\beta, \sigma^2, \tau)$. The one-step transition density of the $(\beta, \sigma^2)$ chain with respect to Lebesgue measure on $\mathbb{R}^p \times \mathbb{R}_+$ is given by:
$$k\!\left[(\beta, \sigma^2), (\tilde{\beta}, \tilde{\sigma}^2)\right] = \int_{\mathbb{R}_+^p} \pi\!\left(\tilde{\beta} \mid \tilde{\sigma}^2, \tau, y\right)\, \pi\!\left(\tilde{\sigma}^2 \mid \beta, \tau, y\right)\, \pi\!\left(\tau \mid \beta, \sigma^2, y\right)\, d\tau. \tag{4}$$
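To make the three-block update concrete, the following R sketch performs one 3BG iteration using the conditionals in Equation 3. It is a minimal sketch rather than the authors' code: the function and argument names are illustrative, and `draw_tau` stands in for the model-specific draw of $\tau$ given $(\beta, \sigma^2, y)$.

```r
## One 3BG iteration under the model above (a minimal sketch, not the authors' code).
## X: n x p standardized design matrix; y_tilde: centered response y - mean(y);
## draw_tau: model-specific draw from pi(tau | beta, sigma2, y).
three_block_update <- function(beta, sigma2, X, y_tilde, draw_tau) {
  n <- nrow(X); p <- ncol(X)
  tau    <- draw_tau(beta, sigma2)                                # tau | beta, sigma2, y
  A_tau  <- crossprod(X) + diag(1 / tau, p)                       # A_tau = X'X + D_tau^{-1}
  resid  <- y_tilde - X %*% beta
  sigma2 <- 1 / rgamma(1, shape = (n - 1 + p) / 2,
                       rate = 0.5 * (sum(resid^2) + sum(beta^2 / tau)))  # sigma2 | beta, tau, y
  A_inv  <- chol2inv(chol(A_tau))
  mu     <- A_inv %*% crossprod(X, y_tilde)
  beta   <- as.vector(mu + t(chol(sigma2 * A_inv)) %*% rnorm(p))  # beta | sigma2, tau, y
  list(beta = beta, sigma2 = sigma2, tau = tau)
}
```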
2.1.1. The Spike-and-Slab Prior
Under the spike-and-slab prior framework, each $\tau_j$ is assigned an independent discrete prior that places probability $w$ on the value $\kappa_1$ and probability $1 - w$ on the value $\kappa_2$, where $\kappa_1$ is a small value, $\kappa_2$ is large, and $w \in (0, 1)$. This setup leads to a conditional posterior distribution for $\tau \mid \beta, \sigma^2, y$ which is a product of independent discrete distributions. Each distribution assigns probability $q_j$ to the point $\kappa_1$ and probability $1 - q_j$ to the point $\kappa_2$ (Rajaratnam et al., 2019). The value of $q_j$ is given by:
$$q_j = \frac{w\, \kappa_1^{-1/2} \exp\!\left(-\beta_j^2 / (2\sigma^2 \kappa_1)\right)}{w\, \kappa_1^{-1/2} \exp\!\left(-\beta_j^2 / (2\sigma^2 \kappa_1)\right) + (1 - w)\, \kappa_2^{-1/2} \exp\!\left(-\beta_j^2 / (2\sigma^2 \kappa_2)\right)}. \tag{5}$$
In this study, $w$, $\kappa_1$, and $\kappa_2$ are treated as constants, with values chosen to satisfy the constraints of the subsequent analyses.
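As an illustration of this two-point draw, a minimal R sketch follows. The notation $w$, $\kappa_1$, $\kappa_2$ matches the constants described above; the helper itself is our own sketch rather than the authors' code.

```r
## Spike-and-slab update of tau: each tau_j equals kappa1 (spike) with probability q_j,
## and kappa2 (slab) otherwise, with q_j as in Equation 5.
draw_tau_ss <- function(beta, sigma2, w, kappa1, kappa2) {
  w1 <- w       * kappa1^(-1/2) * exp(-beta^2 / (2 * sigma2 * kappa1))  # weight of kappa1
  w2 <- (1 - w) * kappa2^(-1/2) * exp(-beta^2 / (2 * sigma2 * kappa2))  # weight of kappa2
  q  <- w1 / (w1 + w2)                                                  # Eq. (5), componentwise
  ifelse(runif(length(beta)) < q, kappa1, kappa2)
}
```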
2.1.2. The Bayesian Lasso
In the Bayesian lasso framework, each $\tau_j$ is assigned an independent Exponential prior with rate parameter $\lambda^2/2$, where $\lambda$ plays the role of the lasso regularization parameter. This setup implies that the marginal prior of $\beta$ (given $\sigma^2$) consists of independent Laplace distributions for each component, effectively shrinking the regression coefficients. Consequently, the conditional posterior distribution of $\tau \mid \beta, \sigma^2, y$ assigns independent inverse-Gaussian distributions to each $1/\tau_j$, which allows for efficient sampling. This full framework is extensively discussed by Park and Casella (2008) within the context of the 3BG algorithm.
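A corresponding sketch of the Bayesian lasso $\tau$ update is given below; it draws $1/\tau_j$ from its inverse-Gaussian conditional using statmod::rinvgauss. The use of the statmod package and the Park and Casella (2008) parameterization (mean $\sqrt{\lambda^2 \sigma^2 / \beta_j^2}$, shape $\lambda^2$) are assumptions made for illustration, not details taken from the paper.

```r
library(statmod)  # provides rinvgauss(); an assumed dependency, not named in the paper

## Bayesian lasso update of tau: 1/tau_j | beta, sigma2, y ~ Inverse-Gaussian.
draw_tau_bl <- function(beta, sigma2, lambda) {
  mu      <- sqrt(lambda^2 * sigma2 / beta^2)                      # conditional mean of 1/tau_j
  inv_tau <- rinvgauss(length(beta), mean = mu, shape = lambda^2)
  1 / inv_tau
}
```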
2.2. The Two-Block Gibbs Sampler for Bayesian Shrinkage Models
Rajaratnam et al. (2019) introduce a lemma that forms the foundation for a two-block Gibbs sampler, which addresses the slow convergence properties of the three-block Gibbs sampler.
Lemma 1. For the Bayesian model specified in Equation 2, the conditional posterior distribution of $\sigma^2$ given $(\tau, y)$ is an inverse gamma distribution with shape parameter $(n-1)/2$ and scale parameter $\tilde{y}^\top \left(I_n - X A_\tau^{-1} X^\top\right) \tilde{y} / 2$.
This lemma enables the development of a 2BG sampler to generate samples from the joint posterior distribution of $(\beta, \sigma^2, \tau)$, which retains the computational tractability of the original 3BG sampler. The key difference lies in the way $\sigma^2$ is sampled: instead of drawing $\sigma^2 \mid \beta, \tau, y$ as done in the 3BG (refer to Equation 3), the 2BG sampler uses the draw $\sigma^2 \mid \tau, y$, as derived in Lemma 1. The algorithm alternates between sampling $(\beta, \sigma^2)$ and $\tau$. Specifically, $(\beta, \sigma^2)$ is sampled by first drawing $\sigma^2 \mid \tau, y$, followed by drawing $\beta \mid \sigma^2, \tau, y$.
The cyclical steps of the 2BG sampler can be summarized as follows:

$$\begin{aligned} \sigma^2 \mid \tau, y &\sim \text{Inverse-Gamma}\!\left(\frac{n - 1}{2},\; \frac{\tilde{y}^\top \left(I_n - X A_\tau^{-1} X^\top\right) \tilde{y}}{2}\right), \\ \beta \mid \sigma^2, \tau, y &\sim N_p\!\left(A_\tau^{-1} X^\top \tilde{y},\; \sigma^2 A_\tau^{-1}\right), \\ \tau \mid \beta, \sigma^2, y &\sim \pi(\tau \mid \beta, \sigma^2, y) \quad \text{(model-specific; see Sections 2.1.1 and 2.1.2)}, \end{aligned}$$

where $A_\tau = X^\top X + D_\tau^{-1}$, and $D_\tau$ is the diagonal matrix formed by the $\tau_j$'s.
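The following R sketch implements one such cycle under the reconstructed formulas above. It is a sketch rather than the authors' implementation; `draw_tau` is the model-specific helper, for instance one of the two sketched earlier, wrapped with fixed hyperparameters.

```r
## One 2BG iteration: tau | beta, sigma2, y, then (sigma2, beta) drawn jointly given tau.
two_block_update <- function(beta, sigma2, X, y_tilde, draw_tau) {
  n <- nrow(X); p <- ncol(X)
  tau   <- draw_tau(beta, sigma2)                            # block 1: tau | beta, sigma2, y
  A_tau <- crossprod(X) + diag(1 / tau, p)                   # A_tau = X'X + D_tau^{-1}
  A_inv <- chol2inv(chol(A_tau))
  Xty   <- crossprod(X, y_tilde)
  # sigma2 | tau, y (Lemma 1): Inverse-Gamma((n-1)/2, y~'(I - X A^{-1} X')y~ / 2)
  rate   <- 0.5 * as.numeric(sum(y_tilde^2) - crossprod(Xty, A_inv %*% Xty))
  sigma2 <- 1 / rgamma(1, shape = (n - 1) / 2, rate = rate)
  # beta | sigma2, tau, y: N_p(A^{-1} X' y~, sigma2 * A^{-1})
  mu   <- A_inv %*% Xty
  beta <- as.vector(mu + t(chol(sigma2 * A_inv)) %*% rnorm(p))
  list(beta = beta, sigma2 = sigma2, tau = tau)
}
```

For the Bayesian lasso, for example, one might pass `draw_tau = function(b, s2) draw_tau_bl(b, s2, lambda = 1)`, where the value of $\lambda$ is a placeholder.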
3. Simulation studies
We assess the computational efficiency of the 2BG and 3BG samplers through a series of simulation studies. All simulations were conducted on a Windows 11 Pro PC with 16.0 GB RAM, 8 cores, and an Intel(R) Core(TM) i7-8650U @ 1.90GHz processor.
3.1. Data Generation
The data is simulated from the following linear model:

$$y = X\beta^* + \varepsilon, \qquad \varepsilon \sim N_n(0, I_n),$$

where $\varepsilon$ is an $n$-vector of standard normal random variables, and $\beta^*$ represents the true regression coefficients. We conduct simulations for two distinct models:
1. Spike-and-Slab Model: Two sets of simulations are performed with .
2. Bayesian Lasso Model: A single set of simulations is performed with .
For both models, the dimension $p$ is chosen such that the ratio $p/n$ varies across settings. This results in 10 datasets for each $(n, p)$ combination for the Spike-and-Slab model and for the Bayesian Lasso model.
For each dataset, the design matrix $X$ is generated by drawing $n$ rows independently from the $p$-dimensional standard multivariate normal distribution. The columns of $X$ are then standardized to have a mean of zero and a squared Euclidean norm of $n$. For each dataset, the leading elements of $\beta^*$ are nonzero and drawn independently from a fixed distribution, with the remaining elements set to zero.
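The data-generating step can be sketched as follows; the number of nonzero coefficients and the distribution of their values are placeholders, since those constants are not legible in the text above.

```r
## Simulate one dataset: Gaussian design with standardized columns, sparse true beta.
simulate_data <- function(n, p, n_nonzero = 10) {     # n_nonzero is an illustrative placeholder
  X <- matrix(rnorm(n * p), nrow = n, ncol = p)       # rows drawn from N_p(0, I_p)
  X <- scale(X, center = TRUE, scale = FALSE)         # column means zero
  X <- sweep(X, 2, sqrt(colSums(X^2) / n), "/")       # squared Euclidean norm n per column
  beta_true <- numeric(p)
  beta_true[seq_len(n_nonzero)] <- rnorm(n_nonzero)   # placeholder distribution for nonzero effects
  y <- as.vector(X %*% beta_true + rnorm(n))          # standard normal noise
  list(X = X, y = y, beta_true = beta_true)
}
```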
3.2. Simulation Setup
We use the foreach package in R to run 6 parallel Markov chains, each with 15,000 iterations. The first 10% of iterations are discarded as burn-in. All chains are initialized with common starting values of $\beta$ and $\sigma^2$. The regularization parameter $\lambda$ for the Bayesian Lasso model is held fixed.
For the Spike-and-Slab model, the hyperparameters $w$, $\kappa_1$, and $\kappa_2$ are held fixed across all settings.
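A skeleton of the simulation driver is sketched below. The doParallel backend, the $(n, p)$ values, the initial values, and the value of $\lambda$ are placeholders (the paper's constants are not legible in the text), and the sampler and data helpers are the sketches introduced earlier.

```r
library(foreach)
library(doParallel)

cl <- makeCluster(6)          # 6 parallel chains
registerDoParallel(cl)

chains <- foreach(chain = 1:6, .packages = "statmod",
                  .export = c("simulate_data", "two_block_update", "draw_tau_bl")) %dopar% {
  dat     <- simulate_data(n = 100, p = 500)          # illustrative (n, p)
  y_tilde <- dat$y - mean(dat$y)
  n_iter  <- 15000; burn <- n_iter / 10               # first 10% discarded as burn-in
  beta    <- rep(0, ncol(dat$X)); sigma2 <- 1         # illustrative initial values
  draw_tau <- function(b, s2) draw_tau_bl(b, s2, lambda = 1)   # lambda is a placeholder
  sigma2_trace <- numeric(n_iter)
  for (t in seq_len(n_iter)) {
    state  <- two_block_update(beta, sigma2, dat$X, y_tilde, draw_tau)
    beta   <- state$beta; sigma2 <- state$sigma2
    sigma2_trace[t] <- sigma2
  }
  sigma2_trace[-seq_len(burn)]                        # post-burn-in sigma^2 draws
}
stopCluster(cl)
```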
3.3. Performance Metrics
To evaluate computational efficiency, we estimate the average lag-1 autocorrelation of the marginal $\sigma^2$ chain after burn-in. This metric is chosen because Rajaratnam and Sparks (2015) suggest that the autocorrelation of the $\sigma^2$ component offers better insight into chain mixing than that of $\beta$. Additionally, we estimate the effective sample size per second, ESS/sec, where the ESS is calculated using the R package coda (Plummer et al., 2006) as:

$$\mathrm{ESS} = \frac{N}{1 + 2 \sum_{k=1}^{\infty} \rho_k},$$

with $N$ denoting the total number of iterations and $\rho_k$ the lag-$k$ autocorrelation.
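These two metrics can be computed from the stored $\sigma^2$ draws as sketched below; coda's effectiveSize is its ESS estimator, while the helper name and the wall-clock argument are illustrative.

```r
library(coda)

## Lag-one autocorrelation and ESS per second for the sigma^2 chain of one sampler run.
compute_metrics <- function(sigma2_draws, wall_clock_seconds) {
  rho1 <- acf(sigma2_draws, lag.max = 1, plot = FALSE)$acf[2]   # empirical lag-1 autocorrelation
  ess  <- effectiveSize(mcmc(sigma2_draws))                     # coda's effective sample size
  c(lag1 = rho1, ess_per_sec = ess / wall_clock_seconds)
}

## Example: average the metrics over the six chains run above (timing from system.time()).
# metrics <- sapply(chains, compute_metrics, wall_clock_seconds = 12.3)  # 12.3 s is illustrative
# rowMeans(metrics)
```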
3.4. Results for the Spike-and-slab model
The left panel of Figure 1 presents the empirical average lag-one autocorrelation of the $\sigma^2$ component across the six MCMC chains for the spike-and-slab model at the smaller sample size. It is evident that the new 2BG sampler consistently exhibits smaller average autocorrelations than the 3BG sampler across all $(n, p)$ combinations. This suggests that the 2BG sampler has a better mixing rate and is thus computationally more efficient than the 3BG algorithm.
Moreover, as the dimension $p$ of the covariates increases relative to the sample size $n$, the performance gap between the 2BG and 3BG samplers widens, further emphasizing the efficiency of the 2BG sampler in high-dimensional settings.
The right panel of Figure 1 shows the average effective sample size per second, ESS/sec, on a base-10 log scale for each sampler. The results demonstrate that the 2BG sampler consistently produces more effective samples than the 3BG sampler. As $p$ grows larger relative to $n$, the computational advantage of the 2BG sampler becomes more pronounced. This suggests that the 2BG sampler incurs a smaller computational cost per chain, making it a more suitable algorithm for high-dimensional problems.
A similar trend is observed when the sample size is scaled up, as depicted in Figure 2. The 2BG sampler continues to outperform the 3BG in terms of both autocorrelation and effective sample size.

[Figure 1: Average lag-one autocorrelation of $\sigma^2$ (left) and average ESS/sec on a $\log_{10}$ scale (right) for the 2BG and 3BG samplers under the spike-and-slab model, smaller sample size.]

[Figure 2: The same comparison under the spike-and-slab model at the larger sample size.]
3.5. Results for the Bayesian lasso model
A similar analysis to that in Section 3.4 is performed under the Bayesian Lasso model. The left panel of Figure 3 presents the empirical average lag-one autocorrelation of the $\sigma^2$ component across six MCMC chains. Consistent with the findings for the spike-and-slab model, the 2BG sampler exhibits smaller average autocorrelations than the 3BG sampler across all $(n, p)$ combinations. This trend holds under both the "$n$-small, $p$-large" and "$n$-large, $p$-small" regimes.
Interestingly, the closest average lag-one autocorrelation between the two algorithms occurs at the smallest ratio of $p$ to $n$ considered, suggesting comparable performance in that specific scenario. However, as $p$ increases relative to $n$, the 2BG sampler consistently outperforms the 3BG sampler in terms of mixing efficiency.
The right panel of Figure 3 shows the average effective sample size per second, ESS/sec, also on a base-10 log scale. Unsurprisingly, the 2BG sampler generates significantly more effective samples per second than the 3BG counterpart, further demonstrating its computational superiority. This indicates that the 2BG sampler achieves a higher degree of efficiency, especially in high-dimensional settings, making it a more suitable choice for Bayesian Lasso models.

[Figure 3: Average lag-one autocorrelation of $\sigma^2$ (left) and average ESS/sec on a $\log_{10}$ scale (right) for the 2BG and 3BG samplers under the Bayesian lasso model.]
4. Application to Real Data
We now apply both the Bayesian lasso and the spike-and-slab models to a real dataset: the well-known NCI-60 cancer cell panel from the National Cancer Institute. The dataset comprises the expression levels of a specific protein, selected as the response variable, and the gene expressions of the 100 genes with the highest (robustly estimated) correlations with the response, which are screened as candidate predictors. This pre-processing is done using the R package robustHD (Alfons, 2021). Hence, we have $n = 59$ samples and $p = 100$ covariates. Each covariate was further standardized to have mean zero and squared Euclidean norm $n$. The prior hyperparameters are again held fixed at constant values. For both shrinkage models, we run chains of 18,000 iterations and set the first 10% aside as burn-in.
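The screening step can be sketched with robustHD as follows; the nci60 dataset name, the use of corHuber for the robust correlations, and the particular protein column are assumptions based on the package's documented example, not details stated in the paper.

```r
library(robustHD)

data("nci60")                                    # assumed to provide 'protein' and 'gene' matrices
y <- protein[, 92]                               # one protein as response; column index illustrative
robust_cor <- apply(gene, 2, corHuber, y)        # robust correlation of each gene with y
keep <- order(abs(robust_cor), decreasing = TRUE)[1:100]    # 100 most correlated genes
X <- gene[, keep]
X <- scale(X, center = TRUE, scale = FALSE)                 # mean-zero columns
X <- sweep(X, 2, sqrt(colSums(X^2) / nrow(X)), "/")         # squared Euclidean norm n per column
```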
As illustrated in Figures 4 and 5, the chains for both the spike-and-slab and Bayesian lasso models have attained stationarity, indicating that they provide sufficient samples for posterior estimation. Further, from Table 1, we observe that the lag-one autocorrelations of the 2BG sampler for the spike-and-slab and Bayesian lasso models (0.387 and 0.161, respectively) are much smaller than those of the 3BG counterparts, indicating better mixing.
Moreover, the 2BG sampler generates about twice as many effective samples as the 3BG sampler in the spike-and-slab model, and approximately five times as many for the Bayesian Lasso model. This demonstrates that the 2BG sampler is not only computationally more efficient but also highly effective, particularly in high-dimensional scenarios.
Table 1. Lag-one autocorrelation of $\sigma^2$ and effective sample size per second (ESS/sec) for the 2BG and 3BG samplers on the NCI-60 data.

| Dataset | $n$ | $p$ | Autocorrelation (2BG) | Autocorrelation (3BG) | ESS/sec (2BG) | ESS/sec (3BG) |
|---|---|---|---|---|---|---|
| Gene (spike-and-slab) | 59 | 100 | 0.387 | 0.774 | 3,263 | 1,639 |
| Gene (Bayesian lasso) | 59 | 100 | 0.161 | 0.700 | 10,921 | 2,856 |

[Figure 4: Trace plots of the MCMC chains for the spike-and-slab model on the NCI-60 data.]

[Figure 5: Trace plots of the MCMC chains for the Bayesian lasso model on the NCI-60 data.]
5. Discussion
In this study, we have used both simulation studies and real data analysis to demonstrate the computational prowess of the two-block Gibbs (2BG) sampler algorithm, specifically within the context of the spike-and-slab and Bayesian Lasso models. Our results consistently show that the 2BG is faster and more efficient relative to the three-block Gibbs (3BG) sampler. This efficiency is particularly evident in its superior mixing rates, as evidenced by lower lag-one autocorrelations, and its ability to produce substantially higher effective sample sizes per unit of time. These computational advantages make the 2BG sampler a highly attractive choice for practitioners working in high-dimensional settings, where computational complexity can often be a bottleneck.
Furthermore, the application to the NCI-60 cancer cell line panel illustrates how gene expression data can be used within this framework to predict protein expression levels. By incorporating feature selection, the analysis identifies the most influential genes and provides insight into the genetic mechanisms driving protein expression in cancer cells. The results of this real data application show that the 2BG sampler can handle real-world, high-dimensional data efficiently, providing both robust estimates and computational scalability.
References
- Alfons, A. (2021). robustHD: An R package for robust regression with high-dimensional data. Journal of Open Source Software, 6(67), 3786.
- Carvalho, C. M. et al. (2010). The horseshoe estimator for sparse signals. Biometrika, 97(2), 465–480.
- Jin, R. and Tan, A. (2021). Fast Markov chain Monte Carlo for high-dimensional Bayesian regression models with shrinkage priors. Journal of Computational and Graphical Statistics, 30(3), 632–646.
- Johnstone, I. M. and Silverman, B. W. (2004). Needles and straw in haystacks: Empirical Bayes estimates of possibly sparse sequences. The Annals of Statistics, 32(4), 1594–1649.
- Khare, K. and Hobert, J. P. (2013). Geometric ergodicity of the Bayesian lasso. Electronic Journal of Statistics, 7, 2150–2163.
- Neville, S. E. et al. (2014). Mean field variational Bayes for continuous sparse signal shrinkage: Pitfalls and remedies. Electronic Journal of Statistics, 8(1), 1113–1151.
- Park, T. and Casella, G. (2008). The Bayesian lasso. Journal of the American Statistical Association, 103(482), 681–686.
- Plummer, M. et al. (2006). CODA: Convergence diagnosis and output analysis for MCMC. R News, 6(1), 7–11.
- Rajaratnam, B. and Sparks, D. (2015). MCMC-based inference in the era of big data: A fundamental analysis of the convergence complexity of high-dimensional chains. arXiv preprint arXiv:1508.00947.
- Rajaratnam, B. et al. (2019). Uncertainty quantification for modern high-dimensional regression via scalable Bayesian methods. Journal of Computational and Graphical Statistics, 28(1), 174–184.
- Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1), 267–288.