
A Statistical Approach to Estimating Adsorption-Isotherm Parameters in Gradient-Elution Preparative Liquid Chromatography

Jiaji Su, Zhigang Yao (corresponding author; e-mail: [email protected] or [email protected]), Cheng Li, and Ye Zhang

Affiliations: Department of Statistics and Data Science, National University of Singapore, 117546 Singapore; Center of Mathematical Sciences and Applications, Harvard University, 02138 Cambridge, USA; School of Mathematics and Statistics, Beijing Institute of Technology, 100081 Beijing, China; Shenzhen MSU-BIT University, 518172 Shenzhen, China.
Abstract

Determining adsorption isotherms is an issue of significant importance in preparative chromatography. A modern technique for estimating adsorption isotherms is to solve an inverse problem so that the simulated batch separation coincides with actual experimental results. However, due to the ill-posedness and high non-linearity of the corresponding physical model, and the need to quantify the uncertainty in its parameters, existing deterministic inversion methods are usually inefficient in real-world applications. To overcome these difficulties and study the uncertainties of the adsorption-isotherm parameters, in this work we propose a statistical approach, based on the Bayesian sampling framework, for estimating adsorption isotherms in various chromatography systems. Two modified Markov chain Monte Carlo algorithms are developed for the numerical realization of our statistical approach. Numerical experiments with both synthetic and real data are conducted to show the efficiency of the proposed method.

Keywords: Liquid chromatography; adsorption isotherm; inverse problem; Bayesian sampling; Gaussian-mixture model.

1 Introduction

The separation and purification of mixtures are important processes in many fields, including the fine-chemical, biomedical, pharmaceutical, food, and environmental industries. A popular technique for high-quality separation and purification is liquid chromatography, whose operating principle can be roughly described as follows: a mixed sample is injected into a fluid stream that is pumped through a pipe filled with small porous beads. These beads slow down the components traveling through the column because the components adsorb on the surface of the beads. Since the components adsorb at different rates, they travel at different speeds through the column and exit it at different times, and are thereby separated from each other. The foundational monograph by Guiochon and Lin (2003) provides a detailed theoretical analysis of preparative liquid chromatography.

In preparative chromatography, the adsorption isotherm can be considered the most important quantity, as it can be used to calculate the specific surface area of materials, the mean size of deposited particles, the pore-size or particle-size distribution, and so on. Loosely speaking, the adsorption isotherm describes the dependence of the amount of adsorbed substance on the partial pressure or solution concentration at a given temperature. It can be viewed as an intrinsic physical quantity of a given chromatography system. Various approaches exist for estimating the adsorption isotherm in practice; they can be classified into two categories: experimental methods and computational methods. Most traditional methods, such as frontal analysis (Lisec, Hugo and Seidel-Morgenstern (2001)) and the perturbation peak method (Dose, Jacobson and Guiochon (1991)), belong to the category of experimental methods. They are usually conducted in a series of programmed concentration steps, each step resulting in a so-called breakthrough front that gives one point on the adsorption-isotherm curve. Hence, to reduce the experimental costs, over the last two decades numerous computational methods have been developed for efficiently estimating the adsorption isotherm in various chromatography systems (see, for example, Felinger, Zhou and Guiochon (2003); Forssén, Arnell and Fornstedt (2006); Zhang et al. (2016a, b); Cheng et al. (2017)). These methods numerically estimate the adsorption-isotherm parameters so that the simulated batch separation coincides with the actual experimental results. Most such computational approaches require only a few injections of different sample concentrations, so solute consumption and time requirements are very modest. The mathematical foundation of most computational methods is the numerical estimation of adsorption-isotherm parameters by solving an inverse problem for partial differential equations (PDEs). However, to the best of our knowledge, all existing approaches belong to the class of deterministic models. Hence, the main goal of this paper is to develop a probabilistic model for estimating the adsorption-isotherm parameters, which constitutes a methodological contribution to the field of chromatography.

It should be noted that the complex, highly non-linear structure of the parameter-to-measurement mapping makes the corresponding optimization problem highly non-convex. Consequently, the global optimal estimator of the adsorption-isotherm parameters cannot be efficiently obtained through conventional optimization solvers. As with the fitting of Gaussian-mixture models, the estimation of adsorption-isotherm parameters is hindered by multiple global solutions, and optimization algorithms are valid only under certain constraints. Therefore, this paper proposes a hybrid method of optimization and sampling to estimate the adsorption-isotherm parameters, and the algorithm is shown to be valid on Gaussian-mixture data, Gamma-mixture data, and data from experimental gradient-elution preparative liquid chromatography. We adopt a Bayesian approach and model the solution path obtained from the numerical solver with additive white noise. The framework provides uncertainty quantification for the model parameters using draws from the Bayesian posterior distribution based on modified Markov chain Monte Carlo (MCMC) algorithms.

Bayesian modeling of PDEs and complex dynamic systems has already been studied in some previous literature, though their focus is mostly on the estimation and uncertainty quantification for the PDE solution instead of the finite-dimensional system parameters. For example, Xun et al. (2013) uses B-splines in a Bayesian hierarchical model to estimate the solution of PDEs. Their model assumption is different from ours since the PDEs in our problem have an existing numerical solver while theirs do not. Chkrebtii et al. (2016) proposes a systematic Bayesian calibration method to estimate both the PDE solution and the model parameters, characterizing the uncertainty from both the discretization error in numerical solvers and the random error in observed data. We refer readers to Cockayne et al. (2019) for a detailed review on Bayesian methods on PDEs. Although the framework considered in Chkrebtii et al. (2016) is very general, their main focus is on using the Gaussian process prior to measure the discretization error in the PDE solution. In contrast, in our chromatography problem, the primary interest is in estimating the finite-dimensional system parameters rather than the PDE solution itself, and the main challenge comes from the lack of identification in these parameters due to the Gaussian-mixture-like solution paths. This unique nonidentification issue cannot be addressed by the generic Bayesian MCMC Algorithm 2 in Chkrebtii et al. (2016). Instead, to decouple the strong dependence between model parameters, we propose a dimension-reduction strategy based on the knowledge of mixture-alike solution paths and then incorporate either a gradient descent or a Langevin dynamics subroutine into the MCMC algorithm, leading to significantly improved mixing of the posterior samples.

Section 2 describes the gradient-elution preparative liquid chromatography model from mathematical and statistical perspectives, while in Section 3 a statistical approach is developed to estimate the adsorption-isotherm parameters. Numerical simulations with synthetic data are presented in Section 4 to demonstrate the robustness of the proposed method. Finally, an experimental gradient data set is tested in Section 5, and a short discussion and concluding remarks are provided in Section 6.

2 Modeling of preparative liquid chromatography

In this section, we briefly review the mathematical models used in this paper for chromatographic processes and give the parameter-to-measurement mapping of the considered inverse problem of estimating adsorption isotherms. Following this, a statistical model with noise terms and the corresponding notation is introduced. Finally, we visualize the structure of the objective function in the optimization, to illustrate intuitively the multiple solutions and the correlation among the parameters.

2.1 Mathematical background

Without loss of generality, and for the convenience of readers who are interested only in our statistical approach that can be used for other types of real-world problems, we consider the following mass-balance equation of a two-component system for a fixed-bed chromatography column with the Danckwerts boundary condition, which is the most commonly used one for column chromatography (Ruthven (1984); Guiochon and Lin (2003); Horvath (1988); Javeed et al. (2011); Lin et al. (2017); Zhang et al. (2016b)).

\left\{\begin{array}{ll}\frac{\partial C_{i}}{\partial t}+F\frac{\partial q_{i}}{\partial t}+u\frac{\partial C_{i}}{\partial x}=D_{a}\frac{\partial^{2}C_{i}}{\partial x^{2}},&x\in\mathcal{X}\equiv[0,L],\ t\in(0,T]\\ C_{i}(x,0)=g_{i}(x),&x\in\mathcal{X},\ t=0\\ uC_{i}(0,t)-D_{a}\frac{\partial C_{i}(0,t)}{\partial x}=uh_{i}(t),&x=0,\ t\in(0,T]\\ D_{a}\frac{\partial C_{i}(L,t)}{\partial x}=0,&x=L,\ t\in(0,T]\end{array}\right. \qquad (1)

where $x$ is distance, $t$ is time, and $i=1,2$ refers to the two components. $C$ and $q$ are the concentrations in the mobile and stationary phases, respectively, $u$ is the mobile-phase velocity, and $F$ is the stationary-to-mobile phase ratio. $D_a$ is the diffusion parameter. $L$ is the length of the chromatographic column, and $T$ is an appropriate time point slightly larger than the chromatographic dead time $T_0=L/u$. In this paper, we set $T=1.5T_0$. In addition, $g(x)$ is the initial condition and $h(t)$ is the boundary condition, which describes the injection profile in the experiment. We adopt a simplified model here for ease of illustration; a more detailed version of gradient-elution liquid chromatography, with further discussion, can be found in Section S1 of the supplementary material.

Throughout this paper, we focus on the case in which the adsorption-energy distribution is bimodal. In this case, the bi-Langmuir adsorption isotherm is usually adopted as follows:

q_{1}(C_{1},C_{2})=\frac{a_{I,1}C_{1}}{1+b_{I,1}C_{1}+b_{I,2}C_{2}}+\frac{a_{II,1}C_{1}}{1+b_{II,1}C_{1}+b_{II,2}C_{2}}, \qquad (2)
q_{2}(C_{1},C_{2})=\frac{a_{I,2}C_{2}}{1+b_{I,1}C_{1}+b_{I,2}C_{2}}+\frac{a_{II,2}C_{2}}{1+b_{II,1}C_{1}+b_{II,2}C_{2}},

where subscripts I and II refer to two adsorption sites with different levels of adsorption energy.
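As a concrete illustration, the isotherm map (2) is cheap to evaluate. Below is a minimal NumPy sketch (the function name and layout are ours, not from an existing code base), with the parameters ordered as $(a_{I,1},a_{II,1},b_{I,1},b_{II,1},a_{I,2},a_{II,2},b_{I,2},b_{II,2})$:

```python
import numpy as np

def bi_langmuir(C1, C2, xi):
    """Evaluate the bi-Langmuir isotherm (2) at concentrations C1, C2."""
    aI1, aII1, bI1, bII1, aI2, aII2, bI2, bII2 = xi
    den_I = 1.0 + bI1 * C1 + bI2 * C2     # site-I denominator
    den_II = 1.0 + bII1 * C1 + bII2 * C2  # site-II denominator
    q1 = aI1 * C1 / den_I + aII1 * C1 / den_II
    q2 = aI2 * C2 / den_I + aII2 * C2 / den_II
    return q1, q2
```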

In this paper, the collection of adsorption-isotherm parameters is denoted by

\bm{\xi}=(a_{I,1},a_{II,1},b_{I,1},b_{II,1},a_{I,2},a_{II,2},b_{I,2},b_{II,2})^{T}. \qquad (3)

Now, we consider the measurement-data structure. In most laboratory and industry environments, the total response $R(\bm{\xi},t)$ is observed at the column outlet $x=L$ with

R(\bm{\xi},t)=\sum_{i=1}^{2}C_{i}(L,t), \qquad (4)

where $C(x,t)$ is the solution to problem (1) with the bi-Langmuir adsorption-isotherm model (2), and $C_i(L,t)$ represents the concentration of the $i$-th component at the outlet $x=L$. The parameter-to-measurement map $\mathcal{A}:\mathbb{R}^{8}\to L^{2}(\mathcal{T})$ can be expressed as

\mathcal{A}(\bm{\xi})=R(\bm{\xi},t), \qquad (5)

where the model operator $\mathcal{A}$ is defined through (4). To be more precise, for a given parameter $\bm{\xi}$, a bi-Langmuir adsorption-isotherm model is constructed according to (2). Then, the concentration in the mobile phase, i.e. $C$, can be obtained by solving PDE (1). Finally, the experimental data are collected by the designed sensor according to the physical law (4). The aim of this paper is to estimate the adsorption-isotherm parameters $\bm{\xi}$ from the time-series data $R(\bm{\xi},t)$ and the integrated mathematical model (5) via a statistical approach.
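Schematically, the composition of (1), (2), and (4) can be organized as a single forward map. The sketch below is purely illustrative: `solve_mass_balance` is a hypothetical stand-in for whatever numerical solver of (1) is available, assumed to return the outlet concentrations $C_i(L,t)$ on a given time grid.

```python
import numpy as np

def forward_map(xi, t_grid, solve_mass_balance):
    """Parameter-to-measurement map A in (5): xi -> R(xi, t) on t_grid.

    `solve_mass_balance` is an assumed stand-in that solves PDE (1)
    with the bi-Langmuir isotherm (2) and returns (C1(L,t), C2(L,t)).
    """
    C1_out, C2_out = solve_mass_balance(xi, t_grid)
    return C1_out + C2_out  # total response (4)
```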

2.2 A statistical model

In a liquid-chromatography experiment, a sampler brings the sample mixture into the mobile-phase stream, which carries it into a column, and pumps deliver the desired flow and composition of the mobile phase through the column. The detector located at the end of the column records a signal proportional to the amount of sample component emerging from the column over the time period $\mathcal{T}=[0,T]$. The recorded signal is the observation of interest.

To build up a statistical model, let $\bm{r}=(r(t_1),\cdots,r(t_n))^T$ be the observations of the experiment, collected at discrete time points $\mathcal{T}_n=\{t_1,\cdots,t_n\}\subseteq\mathcal{T}$. The liquid-chromatography data measured at time $t$ can be modeled by

r(t)=R(\bm{\xi},t)+\epsilon(t),\quad t\in\mathcal{T}_{n}, \qquad (6)

where $\epsilon(t)$ represents the measurement noise. The clean liquid-chromatography data are $R(\bm{\xi},t)=\mathbb{E}[r(t)]$, where $\{C_i(L,t;\bm{\xi}): i=1,2\}$ is the solution of a system of differential equations with static parameters $\bm{\xi}$, and $\bm{\xi}=(a_{I,1},a_{II,1},b_{I,1},b_{II,1},a_{I,2},a_{II,2},b_{I,2},b_{II,2})$ represents the parameter of interest.

Let $\bm{R}(\bm{\xi})=(R(\bm{\xi},t_1),\cdots,R(\bm{\xi},t_n))^T$ be the collection of the exact chromatography signals with parameter $\bm{\xi}$. Then, the general framework of the chromatography measurement $\bm{r}$ is represented by

\bm{r}=\bm{R}(\bm{\xi})+\bm{\epsilon}. \qquad (7)

Here $\bm{\epsilon}=(\epsilon(t_1),\cdots,\epsilon(t_n))^T$ stands for the measurement noise; for simplicity, we assume the noises $\epsilon(t)$ are pairwise uncorrelated, with zero mean and identical variance $\sigma^2_\epsilon$.

Throughout this paper, the framework is based on a single observation, but it can be generalized to a case with multiple observations or multiple injection groups. Assuming there is a single observation $\bm{r}_{\text{obs}}=\bm{R}(\bm{\xi}^*)+\bm{\epsilon}^*$ with fixed parameter $\bm{\xi}^*$, the aim of the paper is to estimate $\bm{\xi}$ through the posterior distribution as described in the following sections, with all parameters apart from $\bm{\Theta}=\{\bm{\xi},\sigma^2_\epsilon\}$ assumed to be known.

2.3 Difficulties in optimization

A natural idea is to estimate the adsorption-isotherm parameters with optimization methods. For example, one may regard the minimizer of the $L^2$ distance between $\bm{r}_{\text{obs}}$ and $\bm{R}(\bm{\xi})$, with respect to $\bm{\xi}$, as a good estimator. However, the non-smooth loss function and the non-unique global minimum prevent us from obtaining valid estimates by optimization.

More specifically, let us consider a four-dimensional parameter $\bm{\xi}$ by setting the parameters related to the second component to 0. Since all the elements of $\bm{\xi}$ are positive, we can decompose $\bm{\xi}$ into a two-dimensional unbounded parameter $\bm{\nu}=(a_{I,1}+a_{II,1},\,b_{I,1}+b_{II,1})^T$ and a two-dimensional bounded parameter $\bm{\eta}=(a_{I,1}/\nu_1,\,b_{I,1}/\nu_2)^T$ without losing any information. Let $\bm{\xi}=g(\bm{\eta},\bm{\nu})$, and

\mathcal{L}(\bm{\eta},\bm{\nu})=\|\bm{r}_{\text{obs}}-\bm{R}(g(\bm{\eta},\bm{\nu}))\|_{2}

be the loss function. While studying the structure of $\mathcal{L}$, we notice that it is convex with respect to $\bm{\nu}$ for any $\bm{\eta}$, as shown in Fig. 1(a). The topography suggests that an optimal estimator $\hat{\bm{\nu}}(\bm{\eta})=\arg\min_{\bm{\nu}}\mathcal{L}(\bm{\eta},\bm{\nu})$ can easily be obtained via optimization algorithms for each $\bm{\eta}$, as presented in Fig. 3(a,c). However, if we try to optimize both $\bm{\eta}$ and $\bm{\nu}$ at the same time, the non-smooth loss function and the potential multiple solutions, illustrated in Fig. 1(b), make the estimation invalid.
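In code, this lossless decomposition and its inverse might look as follows (a small sketch under our own naming, for the four-dimensional example above):

```python
import numpy as np

def g(eta, nu):
    """Restore xi = (a_I1, a_II1, b_I1, b_II1) from eta in [0,1]^2 and nu."""
    aI1, bI1 = eta[0] * nu[0], eta[1] * nu[1]
    return np.array([aI1, nu[0] - aI1, bI1, nu[1] - bI1])

def g_inv(xi):
    """Split xi into the bounded part eta and the unbounded part nu."""
    nu = np.array([xi[0] + xi[1], xi[2] + xi[3]])
    eta = np.array([xi[0] / nu[0], xi[2] / nu[1]])
    return eta, nu
```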

Figure 1: The structure of the loss function $\mathcal{L}(\bm{\eta},\bm{\nu})$ for the chromatography system. (a) The $L^2$ distance between $\bm{r}_{\text{obs}}$ and $\bm{R}(g(\bm{\eta},\bm{\nu}))$, where $\bm{\eta}$ is set to the truth and $\bm{\nu}$ is sampled in a rectangular area containing the truth. (b) The $L^2$ distance between $\bm{r}_{\text{obs}}$ and $\bm{R}(g(\bm{\eta},\hat{\bm{\nu}}(\bm{\eta})))$, where $\bm{\eta}\in[0,1]^2$ is sampled uniformly and $\hat{\bm{\nu}}(\bm{\eta})=\arg\min_{\bm{\nu}}\mathcal{L}(\bm{\eta},\bm{\nu})$ is calculated with gradient descent for each $\bm{\eta}$.
Figure 2: The structure of the loss function for a two-component Gaussian-mixture model. (a) The $L^2$ distance between the observed density function and the signal simulated with weights $\bm{\eta}$ and means $\bm{\nu}$, where $\bm{\eta}$ equals the truth and $\bm{\nu}$ is sampled in a rectangular area containing the truth. (b) The $L^2$ distance between the observed density function and the signal simulated with weights $\bm{\eta}$ and means $\hat{\bm{\nu}}(\bm{\eta})$, where $\bm{\eta}\in[0,1]^2$ and $\hat{\bm{\nu}}(\bm{\eta})$ is calculated with gradient descent for each $\bm{\eta}$.
Figure 3: Visualization of running the gradient descent algorithm on the chromatography system (two left panels) and the two-component Gaussian-mixture model (two right panels). (a,b) The $L^2$ distance between the observation and the simulated signal versus the number of iterations. (c,d) The trajectory of the $\bm{\nu}$ updates in the gradient descent algorithm.

Similar loss functions can be observed when dealing with mixture models. Consider the simplest case: assume the signal is the density function of a two-component Gaussian-mixture model with weights $\bm{\eta}$ and means $\bm{\nu}$. The corresponding loss function is visualized in Fig. 2. As with the chromatography system, we can easily calculate the corresponding optimal means for any fixed weights, but the multiple solutions and saddle points make the optimization unreliable. Unfortunately, this problem worsens as the number of components grows. Therefore, we usually use Markov chain-based Bayesian methods to make inferences about such parameters of interest. The ergodicity of these chains and the possible occurrence of "label switching" also reduce the impact of multiple solutions.
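The toy loss behind Fig. 2 can be reproduced in a few lines. Here is a minimal sketch, assuming unit-variance components (our choice for illustration):

```python
import numpy as np

def mixture_signal(eta, nu, t):
    """Two-component Gaussian-mixture signal: weights eta, means nu."""
    return sum(w * np.exp(-0.5 * (t - m) ** 2) / np.sqrt(2 * np.pi)
               for w, m in zip(eta, nu))

def loss(eta, nu, t, r_obs):
    """L2 distance between the observed and simulated signals."""
    return np.linalg.norm(mixture_signal(eta, nu, t) - r_obs)

# Scanning `loss` over a grid of nu with eta fixed at the truth
# reproduces the convex-in-nu topography of Fig. 2(a).
```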

In summary, the adsorption-isotherm parameters are unlikely to be estimable directly via gradient-based optimization algorithms, although such methods can provide a partial solution under certain restricted conditions. Meanwhile, since mixture models imitate the chromatography system well in terms of the loss function, and are more computationally efficient, they can be used as toy examples to verify our proposed methods.

3 Methodology

In this section, we introduce the main methodology of this paper. The section is organized as follows: first, the standard Bayesian sampling framework is constructed; then a strategy of dimensionality reduction and restoration is presented; finally, two algorithms based on this strategy and Metropolis-within-Gibbs sampling are described.

3.1 Bayesian approach

To investigate the parameter of interest, we aim to draw samples from the posterior distribution of $\bm{\xi}$ given the entire observation $\bm{r}_{\text{obs}}$. In this section, we adopt a Bayesian framework to sample from the posterior distribution $\pi(\bm{\theta}|\bm{r}_{\text{obs}})$. Throughout this paper, $p(\cdot)$ is a generic symbol for continuous probability densities, $\pi(\cdot)$ denotes prior densities, $\pi(\cdot|\bm{r}_{\text{obs}})$ denotes posterior densities, and $q(\cdot|\cdot)$ denotes the proposal densities used in Markov chain Monte Carlo (MCMC) algorithms.

For Bayesian inference, we assume that the measurement errors $\{\epsilon(t), t\in\mathcal{T}\}$ are normally distributed, and impose the following prior distributions on $\bm{\xi}$ and the error-term variance $\sigma^2_\epsilon$:

\epsilon(t_{1}),\ldots,\epsilon(t_{n})~|~\sigma^{2}_{\epsilon} \overset{\text{i.i.d.}}{\sim} N(0,\sigma^{2}_{\epsilon}),
\sigma^{2}_{\epsilon}~|~\alpha,\beta \sim IG(\alpha,\beta),
\pi(\bm{\xi}~|~\gamma) \propto \exp\left\{-\gamma\|\bm{R}(\bm{\xi})-\bm{r}_{\text{obs}}\|_{2}^{2}\right\},

where $N(\mu,\sigma^2)$ represents the normal distribution with mean $\mu$ and variance $\sigma^2$, $IG(\alpha,\beta)$ represents the inverse gamma distribution with shape parameter $\alpha$ and scale parameter $\beta$, and $\gamma$ is a tuning parameter. We let $\bm{\psi}=(\alpha,\beta,\gamma)$, which collects all hyperparameters.

Given this prior specification, the posterior distribution of $(\bm{\xi},\sigma^2_\epsilon)$ can be written explicitly as

\pi(\bm{\xi},\sigma^{2}_{\epsilon}~|~\bm{r}_{\text{obs}},\bm{\psi}) \propto \phi(\bm{r}_{\text{obs}};\bm{R}(\bm{\xi}),\sigma^{2}_{\epsilon}I_{n})\cdot\pi(\sigma^{2}_{\epsilon}|\alpha,\beta)\cdot\pi(\bm{\xi}|\gamma) \propto (\sigma^{2}_{\epsilon})^{-(\frac{n}{2}+\alpha+1)}\exp\left\{-\sigma^{-2}_{\epsilon}\left(\frac{E^{T}E}{2}+\beta\right)-\gamma E^{T}E\right\} \qquad (8)

where $\phi(x;\mu,\Sigma)$ stands for the multivariate normal density with argument $x$, mean $\mu$, and covariance matrix $\Sigma$, $I_n$ is the $n\times n$ identity matrix, and $E=\bm{R}(\bm{\xi})-\bm{r}_{\text{obs}}$.
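For later use in the samplers, the unnormalized log of (8) is cheap to evaluate once $\bm{R}(\bm{\xi})$ has been computed. A minimal sketch (the callable `R` abstracts the numerical solver):

```python
import numpy as np

def log_posterior(xi, sigma2, r_obs, R, psi):
    """Unnormalized log-density of (8); psi = (alpha, beta, gamma)."""
    alpha, beta, gamma = psi
    n = len(r_obs)
    E = R(xi) - r_obs                      # residual vector
    sse = E @ E                            # E^T E
    return (-(n / 2 + alpha + 1) * np.log(sigma2)
            - (sse / 2 + beta) / sigma2
            - gamma * sse)
```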

After identifying the posterior distribution, we can use the Metropolis-Hastings algorithm to sample from it. Because $\bm{\eta}$ and $\bm{\nu}$ are highly correlated in the posterior, an independent random-walk proposal may converge slowly or even get stuck. For this reason, we consider the Metropolis-within-Gibbs sampler instead. The method is outlined in Algorithm 1, in which the acceptance probability is calculated as

\rho(\bm{\theta};\bm{\theta}')=\min\left\{\frac{\pi(\bm{\xi}',\sigma^{2\prime}_{\epsilon}|\bm{r}_{\text{obs}},\bm{\psi})}{\pi(\bm{\xi},\sigma^{2}_{\epsilon}|\bm{r}_{\text{obs}},\bm{\psi})}\frac{q(\bm{\xi},\sigma_{\epsilon}^{2}|\bm{\xi}',\sigma_{\epsilon}^{2\prime})}{q(\bm{\xi}',\sigma_{\epsilon}^{2\prime}|\bm{\xi},\sigma_{\epsilon}^{2})},1\right\}.

The hyperparameters $\bm{\psi}=(\alpha,\beta,\gamma)$, together with the variances of the proposal distributions, are tuned to make the chain as stable as possible and to achieve an acceptance rate of approximately 40% to 80% for each element of $(\bm{\xi},\sigma^2_\epsilon)$. The sampler then generates a recurrent Markov chain whose stationary distribution is the target posterior distribution $\pi(\bm{\xi},\sigma^2_\epsilon~|~\bm{r}_{\text{obs}},\bm{\psi})$.

Algorithm 1 Metropolis-within-Gibbs sampler

Input: Observed data $\bm{r}_{\text{obs}}$; hyperparameter $\bm{\psi}$.
Output: $K$ posterior samples $\{\bm{\theta}^{(k)}, k=1,\cdots,K\}$.
Initialize $\bm{\theta}^{(0)}$;
for $k=1:K$ do
    $\bm{\theta}'\leftarrow\bm{\theta}^{(k-1)}$;
    for $j=1:\text{length}(\bm{\theta})$ do
        $\bm{\theta}^{\star}\leftarrow\bm{\theta}'$;
        Draw $\bm{\theta}_{j}^{\star}\sim q(\bm{\theta}_{j}|\bm{\theta}_{j}')$;
        Accept $\bm{\theta}'\leftarrow\bm{\theta}^{\star}$ with probability $\rho(\bm{\theta}';\bm{\theta}^{\star})$;
    end for
    $\bm{\theta}^{(k)}\leftarrow\bm{\theta}'$.
end for
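A generic Python sketch of Algorithm 1 is given below. It assumes a symmetric proposal so that the $q$-ratio cancels in $\rho$; for asymmetric proposals, such as the truncated normals used later, the log-ratio of proposal densities must be added to the acceptance step.

```python
import numpy as np

rng = np.random.default_rng(0)

def metropolis_within_gibbs(theta0, log_post, propose, K):
    """One-at-a-time Metropolis updates of theta = (xi, sigma2_eps).

    `propose(theta, j)` draws a candidate for coordinate j; it is
    assumed symmetric here so the proposal ratio cancels.
    """
    theta = np.asarray(theta0, dtype=float)
    samples = np.empty((K, theta.size))
    for k in range(K):
        for j in range(theta.size):
            cand = theta.copy()
            cand[j] = propose(theta, j)
            # accept with probability rho = min(pi(cand)/pi(theta), 1)
            if np.log(rng.uniform()) < log_post(cand) - log_post(theta):
                theta = cand
        samples[k] = theta
    return samples
```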

3.2 Strategy of dimensionality reduction and restoration

Modeling inverse problems through posterior distributions introduces correlation among the parameters of interest. When more than one of the parameters is sampled directly from the posterior, this strong correlation can disrupt the motion of the Markov chain. To surmount this problem, we propose a dimension-reduction strategy with two modified MCMC algorithms to stabilize the posterior chains.

We first consider the degenerate case, in which the posterior distribution of $\bm{\xi}$ is almost singular and supported only on a lower-dimensional set as the size of the observed data tends to infinity. Specifically, suppose that the parameter of interest $\bm{\xi}\in\mathbb{R}^D$ can be written as $\bm{\xi}=g(\bm{\eta},\bm{\nu})$, where $\bm{\eta}\in\mathbb{R}^d$ with $d\ll D$, $\bm{\nu}\in\mathbb{R}^{D-d}$, and $g(\cdot):\mathbb{R}^D\to\mathbb{R}^D$ is a one-to-one mapping determined by a pilot study. Roughly speaking, we first set $d$ as the largest dimension that leads to a stable posterior chain of $\bm{\eta}$. Then the dimensionality is reduced by combining the elements of $\bm{\xi}$ according to the pilot study. See Section S4 of the supplementary material for a detailed discussion of how to choose $\bm{\eta},\bm{\nu},g$, together with some empirical results.

In the extreme case that $\bm{\nu}$ and $\bm{\eta}$ are perfectly dependent in the posterior as the size of the observed data $n$ tends to infinity, the conditional posterior $\pi(\bm{\nu}|\bm{\eta},\bm{r}_{\text{obs}},\bm{\psi})$ degenerates to a Dirac distribution $\delta_{h(\bm{\eta})}$ for some unknown function $h$, i.e. almost surely $\bm{\nu}=h(\bm{\eta})$. Then, the support of the posterior of $\bm{\xi}$ can be expressed as

\{\bm{\xi}\in\mathbb{R}^{D}\mid\bm{\xi}=g(\bm{\eta},h(\bm{\eta}))\},

and the induced posterior distribution of $(\bm{\eta},\bm{\nu})$ is

\pi(\bm{\eta},\bm{\nu}|\bm{r}_{\text{obs}},\bm{\psi})=\pi(\bm{\eta}|\bm{r}_{\text{obs}},\bm{\psi})\pi(\bm{\nu}|\bm{\eta},\bm{r}_{\text{obs}},\bm{\psi})=\pi(\bm{\eta}|\bm{r}_{\text{obs}},\bm{\psi})\delta_{h(\bm{\eta})}.

In such a scenario, we can find the least-squares estimator of $\bm{\nu}$ in terms of $\bm{\eta},\bm{r}_{\text{obs}},\bm{\psi}$ from the following equation:

\hat{\bm{\nu}}(\bm{\eta})=\arg\min_{\bm{\nu}'}\|\bm{R}(g(\bm{\eta},\bm{\nu}'))-\bm{r}_{\text{obs}}\|_{2}, \qquad (9)

such that $\hat{\bm{\nu}}(\bm{\eta})$ approximates $h(\bm{\eta})$ as $n\to\infty$. In what follows, we assume that the optimization problem in (9) can be solved by the following gradient descent algorithm (GD); we do not discuss the optimization accuracy.

Algorithm 2 Gradient Descent (GD)

Input: Observed data $\bm{r}_{\text{obs}}$; recording time $\mathcal{T}_n$; argument $\bm{\eta}$.
Output: Restored parameter $\hat{\bm{\xi}}$.
Initialize complementary information $\bm{\nu}=\bm{\nu}_0$ and step size $\gamma$.
for iter = 1 : MaxIter do
    $L(\bm{\nu})\leftarrow\|\bm{R}(g(\bm{\eta},\bm{\nu}))-\bm{r}_{\text{obs}}\|_{2}^{2}$
    $\bm{G}\leftarrow$ numerical gradient of $L$ at point $\bm{\nu}$
    if $\|\bm{G}\|<10^{-5}$ then
        Break
    end if
    for i = 1 : 100 do
        $\bm{\nu}'=\bm{\nu}-0.9^{(i-1)}\gamma\cdot\bm{G}/\|\bm{G}\|$
        if $L(\bm{\nu}')<L(\bm{\nu})$ then
            Break
        end if
    end for
    $\bm{\nu}=\bm{\nu}'$
end for
Return: $\hat{\bm{\nu}}=\bm{\nu}$ and $\hat{\bm{\xi}}=g(\bm{\eta},\hat{\bm{\nu}})$.
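A NumPy rendering of this subroutine might look as follows. Here `loss_nu` is assumed to close over $\bm{\eta}$ and $\bm{r}_{\text{obs}}$ and evaluate $\|\bm{R}(g(\bm{\eta},\bm{\nu}))-\bm{r}_{\text{obs}}\|_2^2$, and the numerical gradient is taken by forward differences (one possible choice):

```python
import numpy as np

def gd(loss_nu, nu0, step=0.1, max_iter=500, tol=1e-5, h=1e-6):
    """Normalized gradient descent with 0.9-backtracking, as in Algorithm 2."""
    nu = np.asarray(nu0, dtype=float)
    for _ in range(max_iter):
        L0 = loss_nu(nu)
        grad = np.array([(loss_nu(nu + h * e) - L0) / h
                         for e in np.eye(nu.size)])  # forward differences
        gnorm = np.linalg.norm(grad)
        if gnorm < tol:
            break
        nu_new = nu
        for i in range(100):                 # backtracking line search
            trial = nu - 0.9 ** i * step * grad / gnorm
            if loss_nu(trial) < L0:
                nu_new = trial
                break
        nu = nu_new
    return nu  # nu_hat(eta); restore xi_hat = g(eta, nu_hat)
```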

To incorporate the optimization step into the MCMC algorithm, we keep the overall structure of Algorithm 1 for sampling $\bm{\eta}$ and $\sigma_\epsilon^2$. In the extreme case of $\bm{\nu}=h(\bm{\eta})$ for some function $h$, instead of sampling $\bm{\nu}$, we can calculate $\hat{\bm{\nu}}(\bm{\eta})$ from (9) in each MCMC iteration. Together with the function $g$, $\hat{\bm{\xi}}=g(\bm{\eta},\hat{\bm{\nu}}(\bm{\eta}))$ can be obtained for any specific $\bm{\eta}$. The detailed algorithm is provided in Algorithm 3.

Our theoretical justification for this dimension-reduction strategy is as follows. We first introduce a series of technical assumptions; our theoretical results will depend on different subsets of these assumptions. We define the loss function $L_R(\bm{\xi})=\int_{\mathcal{T}}[R(\bm{\xi},t)-R(\bm{\xi}^*,t)]^2\,dt$ associated with the solution function $R$, for any $\bm{\xi}\in\bm{\Xi}$.

• (A.1) $(\bm{\nu},\bm{\eta})$ lies in a compact space $\bm{\Theta}\subseteq\mathbb{R}^D$, where $\bm{\eta}\in\bm{\Theta}_{\bm{\eta}}\subseteq\mathbb{R}^d$ and $\bm{\nu}\in\bm{\Theta}_{\bm{\nu}}\subseteq\mathbb{R}^{D-d}$. There is a continuous one-to-one mapping such that $\bm{\xi}=g(\bm{\eta},\bm{\nu})$ and $\bm{\Xi}=g(\bm{\Theta})$ is also a compact space for $\bm{\xi}$. The true parameter $\bm{\xi}^*=g(\bm{\eta}^*,\bm{\nu}^*)$ is an interior point of $\bm{\Xi}$. $\mathcal{T}$ is a compact set, and $\{\epsilon(t):t\in\mathcal{T}\}$ are independent and identically distributed as $N(0,\sigma_\epsilon^{2*})$ for some $\sigma_\epsilon^{2*}<\infty$.

• (A.2) Let $|\mathcal{T}|$ be the Lebesgue measure of $\mathcal{T}$. As $n\to\infty$,

\sup_{\bm{\xi}\in\bm{\Xi}}\left|\frac{|\mathcal{T}|}{n}\left\|\bm{R}(\bm{\xi})-\bm{R}(\bm{\xi}^{*})\right\|_{2}^{2}-\int_{\mathcal{T}}[R(\bm{\xi},t)-R(\bm{\xi}^{*},t)]^{2}dt\right|\to 0. \qquad (10)
• (A.3) There exist a function $h(\bm{\eta})$ and positive constants $a_{1R},c_{1R},\kappa_{1R}$ such that, for any $(\bm{\eta},\bm{\nu})\in\bm{\Theta}$,

L_{R}(g(\bm{\eta},\bm{\nu}))-L_{R}(g(\bm{\eta},h(\bm{\eta})))\geq\min\left(a_{1R}\|\bm{\nu}-h(\bm{\eta})\|_{2}^{\kappa_{1R}},c_{1R}\right). \qquad (11)
• (A.4) There exist positive constants $a_{2R},c_{2R},\kappa_{2R}$ such that, for all $\bm{\xi}\in\bm{\Xi}$ satisfying $\|\bm{\xi}-\bm{\xi}^*\|_2\leq c_{2R}$, $L_R(\bm{\xi})\leq a_{2R}\|\bm{\xi}-\bm{\xi}^*\|_2^{\kappa_{2R}}$.

• (A.5) There exist positive constants $a_{3R},c_{3R},\kappa_{3R}$ such that, for all $\bm{\xi}\in\bm{\Xi}$, $L_R(\bm{\xi})\geq\min\left(a_{3R}\|\bm{\xi}-\bm{\xi}^*\|_2^{\kappa_{3R}},c_{3R}\right)$.

Assumption A.1 assumes a compact parameter space and normal errors. Assumption A.2 assumes the uniform convergence of the empirical $L_2$ distance from the function $R(\bm{\xi},t)$ to the true function $R(\bm{\xi}^*,t)$ over the compact parameter space $\bm{\Xi}$. This assumption usually holds when $\mathcal{T}_n=\{t_1,\ldots,t_n\}$ are densely distributed in $\mathcal{T}$ and $R(\bm{\xi},t)$ is continuous in both $\bm{\xi}$ and $t$. Assumption A.3 is the identification condition for $\bm{\nu}$, where the lower bound guarantees that $h(\bm{\eta})$ is the unique minimizer of the loss function $L_R(g(\bm{\eta},\bm{\nu}))$ over the $\bm{\nu}$ argument for any given $\bm{\eta}$. This assumption is only used for justifying Algorithm 3 in the degenerate case of $\bm{\nu}=h(\bm{\eta})$. Assumption A.4 imposes a local-continuity condition on the loss function $L_R(\bm{\xi})$ in a small neighborhood of $\bm{\xi}^*$. Since we have already assumed in A.1 that $\bm{\xi}^*$ is an interior point of $\bm{\Xi}$, the radius $c_{2R}$ can be made small enough that the whole ball $\{\bm{\xi}:\|\bm{\xi}-\bm{\xi}^*\|_2\leq c_{2R}\}$ is included in $\bm{\Xi}$. Assumption A.5 imposes an identification condition on the loss function $L_R(\cdot)$, which guarantees that the true parameter $\bm{\xi}^*$ is uniquely identified in $\bm{\Xi}$. This is similar to the identification condition for moment estimation (see, for example, Condition ZE.2 in Belloni and Chernozhukov (2009)). The power constants $\kappa_{1R}$, $\kappa_{2R}$, and $\kappa_{3R}$ in A.3, A.4, and A.5 are left unspecified to allow flexibility in the local-continuity property of $L_R(\cdot)$. In Section S2.2 of the supplementary material, we provide partial verification of these assumptions using the model of simulation Case 1 in Section 4.1.

Theorem 3.1.

• (i) Suppose that Assumptions A.1, A.2, and A.3 hold. Then, for $\hat{\bm{\nu}}(\bm{\eta})$ defined in (9),

\sup_{\bm{\eta}\in\bm{\Theta}_{\bm{\eta}}}\left\|\hat{\bm{\nu}}(\bm{\eta})-h(\bm{\eta})\right\|_{2}^{2}\to 0, \qquad (12)

as $n\to\infty$, almost surely.

• (ii) Suppose that Assumptions A.1, A.2, A.3, and A.4 hold. Then, for any $\varepsilon>0$, every $\bm{\eta}\in\bm{\Theta}_{\bm{\eta}}$, and every $\sigma_\epsilon^2>0$,

\Pi(\|\bm{\nu}-h(\bm{\eta})\|_{2}>\varepsilon~|~\sigma_{\epsilon}^{2},\bm{r}_{\text{obs}},\bm{\psi})\to 0, \qquad (13)

as $n\to\infty$, almost surely, where $\Pi(\cdot|\sigma_\epsilon^2,\bm{r}_{\text{obs}},\bm{\psi})$ denotes the conditional posterior measure of $\bm{\xi}$ given $\sigma_\epsilon^2$.

• (iii) Suppose that Assumptions A.1, A.2, A.4, and A.5 hold. Then, for any $\varepsilon>0$, every $\bm{\eta}\in\bm{\Theta}_{\bm{\eta}}$, and every $\sigma_\epsilon^2>0$,

\Pi(\|\bm{\xi}-\bm{\xi}^{*}\|_{2}>\varepsilon~|~\sigma_{\epsilon}^{2},\bm{r}_{\text{obs}},\bm{\psi})\to 0, \qquad (14)

as $n\to\infty$, almost surely.

Theorem 3.1 provides justification for Algorithm 3, in the sense that, in each MCMC iteration, we can numerically solve for $\bm{\nu}$ in terms of $\bm{\eta}$ from (9). Part (i) shows that, as the sample size $n$ increases, the empirical solution $\hat{\bm{\nu}}(\bm{\eta})$ becomes increasingly close to $h(\bm{\eta})$, the unique minimizer of the loss function $L_R(\cdot)$ over the $\bm{\nu}$ argument. Part (ii) shows that, under the additional continuity condition on $L_R(\cdot)$, most of the posterior probability mass of $\bm{\nu}$ concentrates around $h(\bm{\eta})$. Part (iii) gives the stronger result of posterior consistency of $\bm{\xi}$ at the truth $\bm{\xi}^*$ (Chapter 6, Ghosal and van der Vaart (2017)) under the additional identification condition A.5 on $\bm{\xi}$, which implies that most of the posterior probability mass concentrates around the true parameter $\bm{\xi}^*$ asymptotically. The detailed proof of Theorem 3.1 can be found in Section S2.1 of the supplementary material. We emphasize that, even when the strong assumption A.3, which implies $\bm{\nu}=h(\bm{\eta})$, does not hold, Part (iii) of Theorem 3.1 on the posterior consistency of the whole parameter vector $\bm{\xi}$ remains valid as the sample size $n\to\infty$.

3.3 Main algorithms

After identifying the method of dimensionality reduction and restoration, we propose a modified Metropolis-embedded gradient-descent-within-Gibbs sampler (MGDG). The detailed algorithm is presented below.

Algorithm 3 Metropolis-embedded gradient-descent-within-Gibbs sampler (MGDG)

Input: Observed data $\bm{r}_{\text{obs}}$; recording time $\mathcal{T}_n$; hyperparameter $\bm{\psi}$.
Output: $K$ posterior samples $\{\bm{\eta}^{(k)},\sigma^{2,(k)}_{\epsilon},\hat{\bm{\xi}}^{(k)},\,k=1:K\}$.
Initialize $\bm{\eta}^{(0)},\sigma^{2,(0)}_{\epsilon}$;
Set $\bm{\xi}^{(0)}\leftarrow g(\bm{\eta}^{(0)},\hat{\bm{\nu}}(\bm{\eta}^{(0)}))$, and calculate $\bm{R}(\bm{\xi}^{(0)})$ from the numerical solver;
for $k=1:K$ do
    Let $\bm{\eta},\,\sigma^{2}_{\epsilon}\leftarrow\bm{\eta}^{(k-1)},\sigma^{2,(k-1)}_{\epsilon}$ and $E\leftarrow\bm{R}(\bm{\xi})-\bm{r}_{\text{obs}}$;
    Propose $\bm{\eta}',\,\sigma^{2\prime}_{\epsilon}\sim q(\,\cdot\,|\bm{\eta},\,\sigma^{2}_{\epsilon})$;
    (Sampling of $\sigma^{2}_{\epsilon}$)
    $\rho_{\sigma^{2}_{\epsilon}}\leftarrow\min\left\{\left(\frac{\sigma_{\epsilon}^{2\prime}}{\sigma_{\epsilon}^{2}}\right)^{-\frac{n}{2}}\exp\left\{-\frac{1}{2}\left[(\sigma_{\epsilon}^{2\prime})^{-1}-(\sigma_{\epsilon}^{2})^{-1}\right]E^{T}E\right\}\frac{\pi(\sigma^{2\prime}_{\epsilon}|\bm{\psi})}{\pi(\sigma^{2}_{\epsilon}|\bm{\psi})}\frac{q(\sigma^{2}_{\epsilon}|\sigma^{2\prime}_{\epsilon})}{q(\sigma^{2\prime}_{\epsilon}|\sigma^{2}_{\epsilon})},1\right\}$;
    if $U[0,1]<\rho_{\sigma^{2}_{\epsilon}}$ then
        $\sigma_{\epsilon}^{2,(k)}\leftarrow\sigma_{\epsilon}^{2\prime}$;
    else
        $\sigma_{\epsilon}^{2,(k)}\leftarrow\sigma_{\epsilon}^{2}$;
    end if
    Let $\sigma_{\epsilon}^{2}\leftarrow\sigma_{\epsilon}^{2,(k)}$;
    (Sampling of $\bm{\eta}$)
    for $j=1:d$ do
        $\bm{\eta}_{cand}\leftarrow(\eta_{1},\cdots,\eta_{j-1},\eta_{j}',\eta_{j+1},\cdots,\eta_{d})$;
        $\bm{\xi}_{cand}\leftarrow GD(\bm{r}_{\text{obs}},\bm{\eta}_{cand})$;
        Calculate $\bm{R}(\bm{\xi}_{cand})$ from the numerical solver, $E'\leftarrow\bm{R}(\bm{\xi}_{cand})-\bm{r}_{\text{obs}}$;
        $\rho_{\bm{\eta}}\leftarrow\min\left\{\exp\left\{-\frac{1}{2\sigma_{\epsilon}^{2}}(E'^{T}E'-E^{T}E)\right\}\frac{\pi(\bm{\eta}_{cand}|\bm{\psi})}{\pi(\bm{\eta}|\bm{\psi})}\frac{q(\bm{\eta}|\bm{\eta}_{cand})}{q(\bm{\eta}_{cand}|\bm{\eta})},1\right\}$;
        if $U[0,1]<\rho_{\bm{\eta}}$ then
            $\eta_{j}^{(k)}\leftarrow\eta_{j}'$;
        else
            $\eta_{j}^{(k)}\leftarrow\eta_{j}$;
        end if
        $\hat{\bm{\xi}}^{(k)}\leftarrow GD(\bm{r}_{\text{obs}},\bm{\eta}^{(k)})$;
    end for
end for
Return: $\{\bm{\eta}^{(k)},\sigma^{2,(k)}_{\epsilon},\hat{\bm{\xi}}^{(k)},\,k=1:K\}$.

The main strategy of this algorithm is to sample only $\bm{\eta}\in\mathbb{R}^d$ and the error-term variance $\sigma^2_\epsilon$ from the posterior distribution, while $\bm{\nu}$ is treated as a function of $\bm{\eta}$ in the posterior. We can restore a posterior sample of $\bm{\xi}=g(\bm{\eta},\bm{\nu})$ using the posterior draws of $\bm{\eta}$ and $\bm{\nu}$. The lower dimensionality of $\bm{\eta}$ helps to improve the mixing of the Markov chains and prevents the "label switching" phenomenon that occurs in mixture models (Jasra, Holmes and Stephens (2005)). After reaching the stationary distribution, the samples $\{\hat{\bm{\nu}}^{(k)}\}$ should be distributed around the truth $h(\bm{\eta})$, as shown in Theorem 3.1.

Now we consider the general case in which $\bm{\nu}$ is not a function of $\bm{\eta}$ but may be highly correlated with $\bm{\eta}$ in the posterior distribution $\pi(\bm{\xi},\sigma_\epsilon^2|\bm{r}_{\text{obs}},\bm{\psi})$. In this scenario, we use the Metropolis-adjusted Langevin algorithm (MALA) (Roberts and Rosenthal (1998); Roberts and Tweedie (1996)) to sample $\bm{\nu}$. We still use the Metropolis-within-Gibbs algorithm to draw $\bm{\eta}$ and $\sigma^2_\epsilon$, and then use MALA to draw $\bm{\nu}$ using the gradient information from the conditional posterior of $\bm{\nu}$. We call this algorithm the Metropolis-adjusted Langevin-dynamics-within-Gibbs sampler (MALG).

Algorithm 4 Metropolis-adjusted Langevin-dynamics-within-Gibbs sampler (MALG)

Input: Observed data $\bm{r}_{\text{obs}}$; recording time $\mathcal{T}_n$; hyperparameter $\bm{\psi}$; scaling parameter $\tau$; length of Langevin Monte Carlo $m$.
Output: $K$ posterior samples $\{\bm{\eta}^{(k)},\bm{\nu}^{(k)},\sigma^{2,(k)}_{\epsilon},\,k=1:K\}$.
Initialize $\bm{\eta}^{(0)},\bm{\nu}^{(0)},\sigma^{2,(0)}_{\epsilon}$;
Set $\bm{\xi}^{(0)}\leftarrow g(\bm{\eta}^{(0)},\bm{\nu}^{(0)})$, and calculate $\bm{R}(\bm{\xi}^{(0)})$ from the numerical solver;
for $k=1:K$ do
    Let $\bm{\xi},\bm{\eta},\bm{\nu},\sigma^{2}_{\epsilon}\leftarrow\bm{\xi}^{(k-1)},\bm{\eta}^{(k-1)},\bm{\nu}^{(k-1)},\sigma^{2,(k-1)}_{\epsilon}$;
    Let $E\leftarrow\bm{R}(\bm{\xi})-\bm{r}_{\text{obs}}$;
    (Sampling of $\sigma^{2}_{\epsilon}$)
    Propose $\sigma^{2\prime}_{\epsilon}\sim q(\cdot|\,\sigma^{2}_{\epsilon})$;
    $\rho_{\sigma^{2}_{\epsilon}}\leftarrow\min\left\{\left(\sigma_{\epsilon}^{2\prime}/\sigma_{\epsilon}^{2}\right)^{-\frac{n}{2}}\exp\left\{-\frac{1}{2}\left[(\sigma_{\epsilon}^{2\prime})^{-1}-(\sigma_{\epsilon}^{2})^{-1}\right]E^{T}E\right\}\frac{\pi(\sigma^{2\prime}_{\epsilon}|\bm{\psi})}{\pi(\sigma^{2}_{\epsilon}|\bm{\psi})}\frac{q(\sigma^{2}_{\epsilon}|\sigma^{2\prime}_{\epsilon})}{q(\sigma^{2\prime}_{\epsilon}|\sigma^{2}_{\epsilon})},1\right\}$;
    if $U[0,1]<\rho_{\sigma^{2}_{\epsilon}}$ then
        $\sigma_{\epsilon}^{2,(k)}\leftarrow\sigma_{\epsilon}^{2\prime}$;
    else
        $\sigma_{\epsilon}^{2,(k)}\leftarrow\sigma_{\epsilon}^{2}$;
    end if
    Let $\sigma_{\epsilon}^{2}\leftarrow\sigma_{\epsilon}^{2,(k)}$;
    (Sampling of $\bm{\nu}$)
    $\bm{\nu}^{(k,0)}\leftarrow\bm{\nu}^{(k-1)}$;
    for $j=1:m$ do
        $\bm{\nu}_{cand}\leftarrow\bm{\nu}^{(k,j-1)}+\tau\nabla_{\bm{\nu}}\log\pi(\bm{\nu}^{(k,j-1)}|\bm{\eta},\sigma_{\epsilon}^{2},\bm{R}(\bm{\xi}),\bm{r}_{\text{obs}})+\sqrt{2\tau}N(0,I_{D-d})$;
        $\bm{\xi}_{cand}\leftarrow g(\bm{\eta},\bm{\nu}_{cand})$;
        Calculate $\bm{R}(\bm{\xi}_{cand})$ from the numerical solver, $E'\leftarrow\bm{R}(\bm{\xi}_{cand})-\bm{r}_{\text{obs}}$;
        Calculate $q_{LD}(\bm{\nu}_{cand}|\bm{\nu})$ and $q_{LD}(\bm{\nu}|\bm{\nu}_{cand})$ from the formula displayed after the algorithm;
        $\rho_{\bm{\nu}}\leftarrow\min\left\{\exp\left\{-\frac{1}{2\sigma_{\epsilon}^{2}}\left(E'^{T}E'-E^{T}E\right)\right\}\frac{\pi(\bm{\xi}_{cand}|\bm{\psi})}{\pi(\bm{\xi}|\bm{\psi})}\frac{q_{LD}(\bm{\nu}|\bm{\nu}_{cand})}{q_{LD}(\bm{\nu}_{cand}|\bm{\nu})},1\right\}$;
        if $U[0,1]<\rho_{\bm{\nu}}$ then
            $\bm{\nu}^{(k,j)}\leftarrow\bm{\nu}_{cand}$;
        else
            $\bm{\nu}^{(k,j)}\leftarrow\bm{\nu}^{(k,j-1)}$;
        end if
    end for
    Let $\bm{\nu}\leftarrow\bm{\nu}^{(k)}\leftarrow\bm{\nu}^{(k,m)}$;
    (Sampling of $\bm{\eta}$)
    Propose $\bm{\eta}'\sim q(\cdot|\bm{\eta})$; let $\bm{\xi}_{cand}\leftarrow g(\bm{\eta}',\bm{\nu})$;
    Calculate $\bm{R}(\bm{\xi}_{cand})$ from the numerical solver, $E'\leftarrow\bm{R}(\bm{\xi}_{cand})-\bm{r}_{\text{obs}}$;
    $\rho_{\bm{\eta}}\leftarrow\min\left\{\exp\left\{-\frac{1}{2\sigma_{\epsilon}^{2}}\left(E'^{T}E'-E^{T}E\right)\right\}\frac{\pi(\bm{\eta}'|\bm{\psi})}{\pi(\bm{\eta}|\bm{\psi})}\frac{q(\bm{\eta}|\bm{\eta}')}{q(\bm{\eta}'|\bm{\eta})},1\right\}$;
    if $U[0,1]<\rho_{\bm{\eta}}$ then
        $\bm{\eta}^{(k)}\leftarrow\bm{\eta}'$;
    else
        $\bm{\eta}^{(k)}\leftarrow\bm{\eta}$;
    end if
end for
Return: $\{\bm{\eta}^{(k)},\bm{\nu}^{(k)},\sigma^{2,(k)}_{\epsilon},\,k=1:K\}$.

Here the values of the proposal densities $q_{LD}(\bm{\nu}'|\bm{\nu})$ and $q_{LD}(\bm{\nu}|\bm{\nu}')$ are calculated from

q_{LD}(\bm{\nu}'|\bm{\nu})=\exp\left(-\frac{\|\bm{\nu}'-\bm{\nu}-\tau\nabla_{\bm{\nu}}\log\pi(\bm{\nu}|\bm{\eta},\sigma_{\epsilon}^{2},\bm{R}(\bm{\xi}),\bm{r}_{\text{obs}})\|_{2}^{2}}{4\tau}\right).

Because the log-likelihood term $\log\pi(\bm{\nu}|\bm{\eta},\sigma_\epsilon^2,\bm{R}(\bm{\xi}),\bm{r}_{\text{obs}})$ is, up to an additive constant, proportional to $-\|\bm{R}(g(\bm{\eta},\bm{\nu}))-\bm{r}_{\text{obs}}\|_2^2$, the gradient term involves the same gradient $\nabla_{\bm{\nu}}\|\bm{R}(g(\bm{\eta},\bm{\nu}))-\bm{r}_{\text{obs}}\|_2^2$ as used in Algorithm 3. After choosing suitable values for the step size $\tau$ and the sub-chain length $m$, this algorithm returns MCMC samples of all parameters, including $\bm{\nu}$.
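For concreteness, one MALA update of $\bm{\nu}$ can be sketched as below (our generic rendering; `log_target` and `grad_log_target` stand for the conditional log-posterior of $\bm{\nu}$ and its gradient, both up to additive constants):

```python
import numpy as np

rng = np.random.default_rng(0)

def mala_step(nu, log_target, grad_log_target, tau):
    """One Metropolis-adjusted Langevin step (cf. the inner loop of MALG)."""
    mean_fwd = nu + tau * grad_log_target(nu)
    cand = mean_fwd + np.sqrt(2 * tau) * rng.standard_normal(nu.shape)
    mean_bwd = cand + tau * grad_log_target(cand)
    # log q_LD(nu | cand) - log q_LD(cand | nu), squared-norm kernel
    log_q_ratio = (np.sum((cand - mean_fwd) ** 2)
                   - np.sum((nu - mean_bwd) ** 2)) / (4 * tau)
    log_rho = log_target(cand) - log_target(nu) + log_q_ratio
    return cand if np.log(rng.uniform()) < log_rho else nu
```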

4 Simulation study

In this section, we consider three numerical solvers based on mixture models to confirm the robustness of our algorithm in three respects: initialization, the dimension of $\bm{\xi}$, and the skewness of $\bm{R}(\bm{\xi})$. To this end, repeated sampling is performed in all three simulation cases to account for the effect of the initialization; the second case is designed by adding dimensions to $\bm{\xi}$ on the basis of the first case; and the third case is constructed by introducing skewness into the first case. Although these numerical solvers have relatively simple structures, their parameters are highly correlated, which mimics the problem with the data from real experimental gradient-elution preparative liquid chromatography. Meanwhile, the calculation can be implemented through vectorization, which makes larger sample sizes and more repeated trials possible. We present only the key plots in this section and leave the other plots to Section S6 of the supplementary material.

4.1 Simulation Case 1

For the first case, we considered a Gaussian-mixture model with two components. Let the parameter of interest be $\bm{\xi}\in\mathbb{R}_+^4$, with a single observation $\bm{r}_{\text{obs}}=\bm{R}(\bm{\xi}^*)+\bm{\epsilon}$, where $\bm{\xi}^*=(\frac{1}{3},\frac{2}{3},\frac{8}{3},\frac{4}{3})$ and the noise variance is $\sigma^{2*}_\epsilon=0.001$, from the following numerical solver:

R(\bm{\xi},t)=\sum_{i=1}^{2}\frac{\xi_{2i-1}}{\xi_{2i-1}+\xi_{2i}}\frac{1}{\sqrt{2\pi}}e^{-\frac{(t-(\xi_{2i-1}+\xi_{2i}))^{2}}{2}},\quad t\in\mathcal{T}. \qquad (15)

To perform the dimension reduction, we selected $\bm{\eta}\in\mathbb{R}^2$ and $\bm{\nu}\in\mathbb{R}^2$ as follows:

\eta_{i}=\frac{\xi_{2i-1}}{\xi_{2i-1}+\xi_{2i}},\quad\nu_{i}=\xi_{2i-1}+\xi_{2i},\qquad i=1,2.

These two parameters can be regarded as the weights and means in the Gaussian-mixture model. Because $\bm{\eta}\in[0,1]^2$, the MGDG algorithm can be initialized by uniformly sampling $\{\bm{\eta}_i\}_{i=1}^{1000}\sim\text{Uniform}(0,1)^2$ and setting $\bm{\eta}^{(0)}$ as the candidate minimizing the $L^2$ norm between $\bm{R}(g(\bm{\eta}_i,\hat{\bm{\nu}}(\bm{\eta}_i)))$ and $\bm{r}_{\text{obs}}$, i.e.

\bm{\eta}^{(0)}=\arg\min_{\bm{\eta}_{i}}\|\bm{R}(g(\bm{\eta}_{i},\hat{\bm{\nu}}(\bm{\eta}_{i})))-\bm{r}_{\text{obs}}\|_{2},

with $\hat{\bm{\nu}}(\bm{\eta})$ defined in (9). The noise variance $\sigma^2_\epsilon$ is initialized with a random value from its prior distribution.
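The setup of this case is straightforward to reproduce. Below is a self-contained sketch of the solver (15) in the $(\bm{\eta},\bm{\nu})$ parameterization together with the initialization step; for brevity, a coarse grid search over $\bm{\nu}$ stands in for the gradient descent solve of (9):

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(-2, 7, 50)                    # T_n: n = 50 points in [-2, 7]

def R(eta, nu):
    """Solver (15) written in (eta, nu): weights eta, means nu."""
    return sum(w * np.exp(-0.5 * (t - m) ** 2) / np.sqrt(2 * np.pi)
               for w, m in zip(eta, nu))

# Ground truth and one noisy observation
xi_star = np.array([1 / 3, 2 / 3, 8 / 3, 4 / 3])
eta_star = xi_star[[0, 2]] / (xi_star[[0, 2]] + xi_star[[1, 3]])
nu_star = xi_star[[0, 2]] + xi_star[[1, 3]]
r_obs = R(eta_star, nu_star) + np.sqrt(0.001) * rng.standard_normal(t.size)

# Initialization: 1000 uniform candidates for eta; keep the best fit.
nu_grid = np.stack(np.meshgrid(np.linspace(0.5, 5, 12),
                               np.linspace(0.5, 5, 12)), -1).reshape(-1, 2)

def fitted_loss(eta):
    # coarse grid over nu as a crude stand-in for nu_hat(eta) in (9)
    return min(np.linalg.norm(R(eta, nu) - r_obs) for nu in nu_grid)

eta0 = min(rng.uniform(size=(1000, 2)), key=fitted_loss)
```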

Let $TN(\mu,\sigma^2,l,u)$ stand for the truncated normal density on the interval $[l,u]$ with mean $\mu$ and variance $\sigma^2$. The other hyperparameters, prior distributions, and proposal distributions are summarized in Table 1. Since the two elements of $\bm{\eta}$ can always be exchanged in the sampling, we set the smaller one as $\eta_1$ and the larger one as $\eta_2$ after each iteration.

Table 1: Hyperparameters in Case 1: the prior distributions, proposal distributions, and other parameters.
Proposal distribution of $\eta_i$ ($i=1,2$): $\eta_i'\mid\eta_i\sim TN(\eta_i,0.02^2,0,1)$
Prior of $\bm{\eta}$: $\pi(\bm{\eta})\propto\exp\{-\gamma\|\bm{R}(g(\bm{\eta},\hat{\bm{\nu}}(\bm{\eta})))-\bm{r}_{\text{obs}}\|^2_2\}$
Recording time: $\mathcal{T}=[-2,7]$
Time points used in calculation: $\mathcal{T}_n\subseteq\mathcal{T}$, $n=50$ equally spaced points
Number of burn-in samples: $B=500$
Hyperparameters: $\bm{\psi}=(\alpha,\beta,\gamma)=(2,\|\bm{R}(g(\bm{\eta}^{(0)},\hat{\bm{\nu}}(\bm{\eta}^{(0)})))-\bm{r}_{\text{obs}}\|_2^2/n,8)$
Step size and chain length for sampling $\bm{\nu}$ in MALG: $(\tau,m)=(0.001,200)$
Figure 4: An example of the sampling results for $\hat{\bm{\xi}}$ from (a) MGDG, (b) MALG, and (c) the Metropolis-within-Gibbs sampler. In each panel, the four solid lines represent the traces of the four elements of $\hat{\bm{\xi}}$ with sample size $K=1\times10^4$.

Fig. 4 shows an example of the samples from our algorithms and from the Metropolis-within-Gibbs sampler. The traces of both MGDG and MALG are distributed closely around the ground truth, which suggests that our method can stably sample from the correlated posterior and that the samples are reliable for inferring $\bm{\xi}$. The performance of our method in this case differs from that of the Metropolis-within-Gibbs sampler (Hastings (1970)), whose samples are less stationary and more biased for inferring $\bm{\xi}$. A possible reason is that the elements of $\bm{\xi}$ are highly correlated in the posterior, which makes the Metropolis-within-Gibbs sampler less reliable.

The algorithms are evaluated by the residuals in one sampling trial, i.e. $\bm{\eta}-\bm{\eta}^*$, $\bm{\nu}-\bm{\nu}^*$, and $\hat{\bm{\xi}}-\bm{\xi}^*$. The corresponding scatter-plot matrices with sample size $K=1\times10^4$ are presented in Fig. 5. These plots suggest that the residuals are approximately normally distributed around 0 and that the bias is of the same order as the threshold in the gradient descent algorithm. The correlation between the elements of $\bm{\eta}$ is not significant, but the variables in the same group of $\bm{\xi}$ have a strong linear correlation.

[Figure 5: panels (a) $\bm{\eta}-\bm{\eta}^*$ and (b) $\hat{\bm{\xi}}-\bm{\xi}^*$ from MGDG; panels (c) $\bm{\eta}-\bm{\eta}^*$, (d) $\bm{\nu}-\bm{\nu}^*$, and (e) $\hat{\bm{\xi}}-\bm{\xi}^*$ from MALG.]
Figure 5: Distribution of residuals in one running trial of Case 1 with sample size $K=1\times10^4$. (a,b) Scatter-plot matrices of results from MGDG. (c,d,e) Scatter-plot matrices of results from MALG.

To validate the robustness of our MGDG algorithm, it was run 100 times, and the results are summarized in Fig. 6. The two panels in the first column show boxplots of the sampled $\bm{\eta}$ at selected iterations. These panels suggest that the distribution of the sampled $\bm{\eta}$ over multiple repeated trials is very stable and that the overall error is controlled within an acceptable range. Therefore, the restored $\hat{\bm{\xi}}$ should also have a stable distribution that is not far from $\bm{\xi}^*$, which is consistent with the remaining four panels, where the boxplots of $\hat{\bm{\xi}}$ have similar quantiles and the distance between $\bm{\xi}^*$ and the box is of the same order as the gradient descent threshold. In general, the MGDG algorithm can robustly estimate the posterior distribution in multiple repeated experiments. The other algorithm, MALG, performed similarly; the results can be found in Section S6 of the supplementary material.

[Figure 6: panels (a) $\eta_1$, (b) $\hat{\xi}_1$, (c) $\hat{\xi}_2$, (d) $\eta_2$, (e) $\hat{\xi}_3$, (f) $\hat{\xi}_4$.]
Figure 6: Boxplots of 100 repeated runs of sampling $\bm{\eta}$ and $\hat{\bm{\xi}}$ with sample size $K=1\times10^4$ via the MGDG algorithm in Case 1. Each box represents the distribution of 100 sampled values of the corresponding variable, with the sampling iteration on the x-axis; the ground truth is depicted by the dotted cyan line.

Overall, our method is able to handle the solver defined in (15). The algorithms provide samples of $\bm{\eta}$ that are stably distributed around $\bm{\eta}^*$. From these samples, $\hat{\bm{\xi}}$ can be restored and is also close to the ground truth $\bm{\xi}^*$. Compared with the Metropolis-within-Gibbs sampler, our MGDG algorithm is more reliable for estimating $\bm{\xi}$ in this case.

4.2 Simulation Case 2

To test our algorithm with more local minima than in Case 1, another experiment with a 4-component Gaussian-mixture model and $\bm{\xi}\in\mathbb{R}_+^8$ was performed. In this scenario, the observation was $\bm{r}_{\text{obs}}=\bm{R}(\bm{\xi}^*)+\bm{\epsilon}$, with $\bm{\xi}^*=(\frac{1}{6},\frac{5}{6},\frac{5}{2},\frac{5}{2},\frac{16}{3},\frac{8}{3},9,3)$ and noise variance $\sigma^{2*}_\epsilon=0.001$, from the following numerical solver:

R(\bm{\xi},t)=\sum_{i=1}^{4}\frac{\xi_{2i-1}}{\xi_{2i-1}+\xi_{2i}}\frac{1}{\sqrt{2\pi}}e^{-\frac{(t-(\xi_{2i-1}+\xi_{2i}))^{2}}{2}},\quad t\in\mathcal{T}. \qquad (16)

To perform the dimension reduction, we selected $\bm{\eta}\in\mathbb{R}^4$ and $\bm{\nu}\in\mathbb{R}^4$ as follows:

\eta_{i}=\frac{\xi_{2i-1}}{\xi_{2i-1}+\xi_{2i}},\quad\nu_{i}=\xi_{2i-1}+\xi_{2i},\qquad i=1,\cdots,4,

where $\bm{\eta}\in[0,1]^4$. As before, these parameters can be regarded as the weights and means in the Gaussian-mixture model, and the MGDG algorithm was initialized by uniformly sampling $\{\bm{\eta}_i\}_{i=1}^{1000}\sim\text{Uniform}(0,1)^4$ and setting $\bm{\eta}^{(0)}$ as the sample minimizing the $L^2$ norm between $\bm{R}(g(\bm{\eta}_i,\hat{\bm{\nu}}(\bm{\eta}_i)))$ and $\bm{r}_{\text{obs}}$. The noise-term variance $\sigma^2_\epsilon$ was initialized with a random value from its prior distribution. The other hyperparameters, prior distributions, and proposal distributions are summarized in Table 2. Given that each element of $\bm{\eta}=(\eta_1,\eta_2,\eta_3,\eta_4)$ plays the same role in (16), we sort them in ascending order after each sampling iteration.

Table 2: Hyperparameters in Case 2: the prior distributions, proposal distributions, and other parameters.
Proposal distribution of $\eta_i$ ($i=1,\cdots,4$): $\eta_i'\mid\eta_i\sim TN(\eta_i,0.02^2,0,1)$
Prior of $\bm{\eta}$: $\pi(\bm{\eta})\propto\exp\{-\gamma\|\bm{R}(g(\bm{\eta},\hat{\bm{\nu}}(\bm{\eta})))-\bm{r}_{\text{obs}}\|^2_2\}$
Recording time: $\mathcal{T}=[-4,15]$
Time points used in calculation: $\mathcal{T}_n\subseteq\mathcal{T}$, $n=100$ equally spaced points
Number of burn-in samples: $B=500$
Hyperparameters: $\bm{\psi}=(\alpha,\beta,\gamma)=(2,\|\bm{R}(g(\bm{\eta}^{(0)},\hat{\bm{\nu}}(\bm{\eta}^{(0)})))-\bm{r}_{\text{obs}}\|_2^2/n,10)$
Step size and chain length for sampling $\bm{\nu}$ in MALG: $(\tau,m)=(0.001,200)$
Figure 7: Distribution of residuals in one trial of Case 2 with sample size $K=1\times 10^{4}$. (a, b) Scatter-plot matrices of $\bm{\eta}-\bm{\eta}^{*}$ and $\hat{\bm{\xi}}-\bm{\xi}^{*}$ from MGDG. (c, d, e) Scatter-plot matrices of $\bm{\eta}-\bm{\eta}^{*}$, $\bm{\nu}-\bm{\nu}^{*}$, and $\hat{\bm{\xi}}-\bm{\xi}^{*}$ from MALG.

Fig. 7 shows the scatter-plot matrices of the residuals from one trial of MGDG and MALG with sample size $K=1\times 10^{4}$. The panels suggest that all elements are unimodally distributed around 0 and that the correlations among the elements of $\bm{\eta}$ and $\bm{\nu}$ are weak, although some adjacent elements of $\hat{\bm{\xi}}$ are almost linearly correlated. Overall, both $\bm{\eta}$ and $\hat{\bm{\xi}}$ are approximately normally distributed, and the distance from their centers to 0 is within an acceptable range. The algorithms were also repeated 100 times to investigate their robustness with the solver defined in (16); the boxplots can be found in Section S6 of the supplementary material.

In general, our algorithms can still effectively estimate the parameters in (16) with acceptable bias. After the dimension is increased by adding components to the solver, the exchangeable components give $\|\bm{R}(g(\bm{\eta},\hat{\bm{\nu}}(\bm{\eta})))-\bm{r}_{\text{obs}}\|_{2}$ more local minima. Our algorithm is not much affected by these local solutions and still provides samples that are normally distributed around the ground truth.

4.3 Simulation Case 3

To complement Case 1, we adopted a Gamma-mixture model to study the performance of our algorithm on skewed observations and steep loss functions. Let the parameter of interest be $\bm{\xi}\in\mathbb{R}_{+}^{4}$, and let the single observation be $\bm{r}_{\text{obs}}=\bm{R}(\bm{\xi}^{*})+\bm{\epsilon}$ with $\bm{\xi}^{*}=(4,\frac{3}{4},2,\frac{1}{4})$ and $\sigma^{2*}_{\epsilon}=0.001$, from the following numerical solver:

R(\bm{\xi},t)=\sum_{i=1}^{2}\frac{1}{\Gamma(\xi_{2i-1})\,\xi_{2i}^{\xi_{2i-1}}}\,t^{\xi_{2i-1}-1}e^{-\frac{t}{\xi_{2i}}},\quad t\in\mathcal{T}. \qquad (17)

To perform the dimension reduction, we selected $\bm{\eta}\in\mathbb{R}^{2}$ and $\bm{\nu}\in\mathbb{R}^{2}$ as follows:

\eta_{i}=\xi_{2i},\quad\nu_{i}=\xi_{2i-1},\qquad i=1,2.

These two parameters can be regarded as the scale and shape parameters, respectively, of a Gamma-mixture model. With the prior knowledge that $\bm{\eta}\in[0,1]^{2}$, we again uniformly sample $\{\bm{\eta}_{i}\}_{i=1}^{1000}\sim\text{Uniform}(0,1)^{2}$ and initialize $\bm{\eta}^{(0)}$ as the minimizer of $\|\bm{R}(g(\bm{\eta}_{i},\hat{\bm{\nu}}(\bm{\eta}_{i})))-\bm{r}_{\text{obs}}\|_{2}$. The noise variance $\sigma^{2}_{\epsilon}$ was initialized by sampling from its prior distribution. The other hyperparameters, prior distributions, and proposal distributions are summarized in Table 3. Although the two elements of $\bm{\eta}$ remain exchangeable in the sampling, we do not sort them in this case. A minimal sketch of this solver is given below.
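As a companion to the Case 2 sketch, the following is a minimal implementation of solver (17) using scipy.stats.gamma; the naming is ours, and the observation is constructed as described above.

import numpy as np
from scipy.stats import gamma

t_grid = np.linspace(0, 10, 200)                # the n = 200 points of T_n

def solver_R(xi, t=t_grid):
    """Equation (17): unweighted sum of two Gamma densities
    with shape xi_{2i-1} and scale xi_{2i}."""
    return sum(gamma.pdf(t, a=xi[2 * i], scale=xi[2 * i + 1]) for i in range(2))

rng = np.random.default_rng(1)
xi_star = np.array([4.0, 3 / 4, 2.0, 1 / 4])    # shapes 4 and 2, scales 3/4 and 1/4
r_obs = solver_R(xi_star) + rng.normal(0.0, np.sqrt(0.001), t_grid.size)
# Here eta = (xi_2, xi_4) are the scales, constrained to (0, 1), and
# nu = (xi_1, xi_3) are the shapes handled by the inner optimization.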

Table 3: Hyperparameters in Case 3: the prior distributions, proposal distributions, and other parameters.
$\eta_{i}^{\prime}\mid\eta_{i}\sim TN(\eta_{i},\sigma^{2}_{q,i},0,1)$  Proposal distribution of $\eta_{i}$ for $i=1,2$
$(\sigma_{q,1},\sigma_{q,2})=(0.05,0.15)$  Proposal standard deviations for MALG
$(\sigma_{q,1},\sigma_{q,2})=(0.08,0.33)$  Proposal standard deviations for MGDG
$\pi(\bm{\eta})\propto\exp\{-\gamma\|\bm{R}(g(\bm{\eta},\hat{\bm{\nu}}(\bm{\eta})))-\bm{r}_{\text{obs}}\|^{2}_{2}\}$  Prior of $\bm{\eta}$
$\mathcal{T}=[0,10]$  Recording time
$\mathcal{T}_{n}\subseteq\mathcal{T},\,n=200$  Equally spaced time points used in calculation
$B=500$  Number of burn-in samples
$\bm{\psi}=(\alpha,\beta,\gamma)=(2,\,\|\bm{R}(g(\bm{\eta}^{(0)},\hat{\bm{\nu}}(\bm{\eta}^{(0)})))-\bm{r}_{\text{obs}}\|_{2}^{2}/n,\,8)$
$(\tau,m)=(0.0002,200)$  Step size and chain length for sampling $\bm{\nu}$ in MALG
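The truncated-normal proposals $TN(\eta_{i},\sigma_{q}^{2},0,1)$ listed in Tables 2 and 3 (and later in Table 5) can be sampled as sketched below; the helper names are ours. The sketch also records the asymmetry correction that such a state-dependent truncation requires in the Metropolis-Hastings ratio; note that scipy.stats.truncnorm expects its bounds in standardized units.

import numpy as np
from scipy.stats import truncnorm

def propose(eta_i, sigma_q, lo=0.0, hi=1.0, rng=None):
    # Bounds must be given relative to the mean, in units of sigma_q.
    a, b = (lo - eta_i) / sigma_q, (hi - eta_i) / sigma_q
    return truncnorm.rvs(a, b, loc=eta_i, scale=sigma_q, random_state=rng)

def proposal_logpdf(eta_new, eta_old, sigma_q, lo=0.0, hi=1.0):
    a, b = (lo - eta_old) / sigma_q, (hi - eta_old) / sigma_q
    return truncnorm.logpdf(eta_new, a, b, loc=eta_old, scale=sigma_q)

# Because the truncation depends on the current state, the proposal is not
# symmetric: the ratio q(eta_old | eta_new) / q(eta_new | eta_old) must enter
# the Metropolis-Hastings acceptance probability.
eta_new = propose(0.4, 0.05, rng=np.random.default_rng(2))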
Figure 8: Input data and sampling result from the MGDG algorithm in Case 3. (a) The input $\bm{r}_{\text{obs}}$ (solid black line) and the clean $\bm{R}(\bm{\xi}^{*})$ without noise (dotted red line). (b) The MGDG result $\hat{\bm{\xi}}$ with sample size $K=1\times 10^{4}$ in one trial.

Fig. 8(a) illustrates the shape and noise level of the input data in this simulation case. With such a skewed observation, the $L^{2}$ distance between $\bm{R}(\bm{\xi})$ and $\bm{r}_{\text{obs}}$ is no longer as symmetric as in the previous cases. The steep slope on one side makes the gradient descent more difficult, and the rejection rate during sampling increases. As shown in Fig. 8(b), although the sample trajectory is still stable, it is far less smooth than in the previous cases.

The estimation in one trial is again evaluated through the residuals of the samples; the corresponding scatter-plot matrices with sample size $K=1\times 10^{4}$ are shown in Fig. 9. According to these scatter plots, the residuals of each parameter are symmetrically distributed with a single peak, and the peak centers are very close to 0, within the range of the gradient-descent threshold. The correlation among the elements of $\bm{\eta}$ is not significant, but there appear to be straight lines in the scatter plots, which may be caused by rejected proposals. As for $\hat{\bm{\xi}}$, the residuals of each element are also symmetrically and unimodally distributed around 0, and the two elements belonging to the same component are highly correlated. In general, in a single experiment, samples can be effectively drawn from the posterior distribution.

Figure 9: Distribution of residuals in one trial of Case 3 with sample size $K=1\times 10^{4}$. (a, b) Scatter-plot matrices of $\bm{\eta}-\bm{\eta}^{*}$ and $\hat{\bm{\xi}}-\bm{\xi}^{*}$ from MGDG. (c, d, e) Scatter-plot matrices of $\bm{\eta}-\bm{\eta}^{*}$, $\bm{\nu}-\bm{\nu}^{*}$, and $\hat{\bm{\xi}}-\bm{\xi}^{*}$ from MALG.

According to the results from both algorithms, the bias of the parameters related to the second component is much larger; this is because the second peak of the signal is flatter, so such a bias does not change the shape of the observation significantly. We also repeated this sampling experiment 100 times, and the results were similar to those of the single trial (see Section S6 of the supplementary material). Overall, our method is able to handle the skewed solver built from the Gamma-mixture density.

5 Real data application

In this section, we verify our algorithm with real data. First, we briefly introduce the background of the real experiment, after which the model, parameter settings, and data-generation process are presented in detail. Finally, we evaluate the performance of our method through repeated experiments.

5.1 Experiment background

Our data consist of gradient separations of a cycloheptanone/cyclohexanone mixture. The experiments were conducted on an Agilent 1200 system (Palo Alto, CA, USA) with a 150 mm × 4.6 mm Kromasil column (AkzoNobel Eka, Bohus, Sweden) filled with C18-bonded porous silica with an average particle diameter of 5 μm. The system contained an auto-sampler with a 900 μL injection loop, a binary pump system, a diode-array UV detector, and a column thermostat. In all experiments, the column was held at a constant temperature of 22 °C, and the flow rate was 1.0 mL/min.

The solutes adopted in the experiments were cyclohexanone (≥ 95%) and cycloheptanone (≥ 95%) from Sigma-Aldrich (Steinheim, Germany), while the solvents used in the pycnometer measurements were dichloromethane (≥ 99.5%) from VWR International (Paris, France) and isopropanol (HPLC grade) from Fisher Scientific (Loughborough, UK). The mobile phase was composed of HPLC-grade methanol from Fisher Scientific (Loughborough, UK) and deionized water with a resistivity of 18.2 MΩ·cm, supplied by a Milli-Q Plus 185 water-purification system from Millipore (Merck Millipore, MA, USA).

During the experiments, calibration curves for cyclohexanone and cycloheptanone were recorded at 280 nm for several mobile-phase compositions. The column hold-up volume, measured with a pycnometer, was 1.38 mL. The total area under the peaks in the elution profiles was adjusted to match the injected amount of solute. The injection profile was recorded and used in the calculations; inlet and outlet effects, which cannot be distinguished from it, were included in this profile.

5.2 Model setting and data generation

Based on the experimental background, we consider a chromatography system with a 150 mm × 4.6 mm column, a flow rate of 0.7 mL/min, and 9000 theoretical plates. The hold time is 1.5 min, and 400 μL samples are introduced using rectangular injection profiles. In this section, we focus on a single observation from

R(\bm{\xi},t)=\sum_{i=1}^{2}C_{i}(L,t;\,\bm{\xi}),\quad t\in\mathcal{T},

where $C_{i}(L,t;\,\bm{\xi})$ is the solution of the time-dependent convection-diffusion system defined in (1); all other parameters adopted in that PDE system are summarized in Table 4. When obtaining real data, researchers usually record multiple chromatographic curves under the same parameter settings and average the records to reduce errors: averaging shrinks the variance of the noise term and, by the central limit theorem, makes the averaged noise closer to Gaussian. Overall, the noise level of the observation $\bm{r}_{\text{obs}}$ used in this section is similar to the one shown in Fig. 10.
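As a toy illustration of this averaging argument (ours, not an experiment from the paper), one can check numerically that the variance of an $m$-run average is $\sigma^{2}/m$ even when the raw noise is heavy-tailed:

import numpy as np

rng = np.random.default_rng(3)
sigma2, m, n = 0.01, 25, 400
# Heavy-tailed raw noise (Student t with 5 df), rescaled to variance sigma2;
# Var(t_5) = 5/3, so multiplying by sqrt(sigma2 * 3/5) gives variance sigma2.
raw = rng.standard_t(df=5, size=(m, n)) * np.sqrt(sigma2 * 3 / 5)
averaged = raw.mean(axis=0)
print(raw.var(), averaged.var())   # approximately sigma2 versus sigma2 / m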

Table 4: Main parameters adopted in the real-data solver.
$u=0.125$  Linear velocity [cm/s]
$L=15$  Column length [cm]
$T=L/u$  Dead time [s]
$F=0.7806$  Phase ratio
$D_{a}=0.00010417$  Diffusion constant
$g_{i}(x)\equiv 0$  Initial condition
$h=[5,0]$  Injection profile [mM]

To simplify the problem, we fix some of the parameters at zero. The solid black line in Fig. 10 is an example of a clean bimodal observation from $R(\bm{\xi}_{0},t)$ with the full parameter set $\bm{\xi}_{0}=(a_{I,1},a_{II,1},b_{I,1},b_{II,1},a_{I,2},a_{II,2},b_{I,2},b_{II,2})=(2,1,0.1,0.05,4,2,0.2,0.1)$ and $h=[15,15]$. Each peak in that signal roughly corresponds to one set of parameters $(a_{I,i},a_{II,i},b_{I,i},b_{II,i})$, that is, to one component of the sample. These peaks can be separated by adjusting the injection parameter $h$: the second peak has almost the same shape as the observation from $R(\bm{\xi}_{2},t)$ with $\bm{\xi}_{2}=(0,0,0,0,4,2,0.2,0.1)$ and $h=[0,15]$, shown as Set 2 in Fig. 10. Therefore, $(a_{I,2},a_{II,2},b_{I,2},b_{II,2})$ can be estimated directly from the signal segment containing only the second peak, and this estimate then helps determine the remaining parameters. Since multiple parameter sets only add computational difficulty, we focus on the one-peak case by setting the injection profile of the second group to 0 and taking the parameter of interest to be $\bm{\xi}=(a_{I},a_{II},b_{I},b_{II})$ in this section.

Figure 10: (a) An example observation with noise from the time-dependent convection-diffusion system. (b) An illustration of the two-peaked data: Set 0 is the clean data with the full set of parameters; Set 1 (resp. Set 2) is the observation with the second (resp. first) group of parameters set to 0.

5.3 Parameter estimation

After identifying the model and parameter settings, we let the observation of interest be $\bm{r}_{\text{obs}}$ from $\bm{R}(\bm{\xi}^{*})$, where the noise level estimated from the flat part of the signal (i.e., $t\in[0,300]$) is $\widehat{\sigma}^{2}_{\epsilon}=0.001$. To estimate $\bm{\xi}^{*}$ with the MGDG algorithm, we set $\bm{\eta}\in\mathbb{R}^{2}$ and $\bm{\nu}\in\mathbb{R}^{2}$ as

\eta_{1}=\frac{a_{I}}{a_{I}+a_{II}},\quad\eta_{2}=\frac{b_{I}}{b_{I}+b_{II}},\quad\nu_{1}=a_{I}+a_{II},\quad\nu_{2}=b_{I}+b_{II}.

To ensure the algorithm MALG produces a valid Markov chain, we adopt the element-wise transformation

\bm{\eta}=\frac{\tanh(\widetilde{\bm{\eta}})+1}{2},\quad\bm{\nu}=\exp(\widetilde{\bm{\nu}}),

and sample the elements $(\widetilde{\bm{\eta}},\widetilde{\bm{\nu}})$ on the entire real line.
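A compact sketch of these two maps and their inverses (our naming, assuming nothing beyond the definitions above):

import numpy as np

def to_eta_nu(xi):
    aI, aII, bI, bII = xi
    eta = np.array([aI / (aI + aII), bI / (bI + bII)])
    nu = np.array([aI + aII, bI + bII])
    return eta, nu

def from_eta_nu(eta, nu):
    # Inverse map: aI = eta1 * nu1, aII = (1 - eta1) * nu1, likewise for b.
    return np.array([eta[0] * nu[0], (1 - eta[0]) * nu[0],
                     eta[1] * nu[1], (1 - eta[1]) * nu[1]])

# MALG proposes (eta_tilde, nu_tilde) on R^2 x R^2 and maps back via
to_unit = lambda x: (np.tanh(x) + 1) / 2        # eta in (0, 1)
to_pos = np.exp                                 # nu in (0, inf)
# with inverses arctanh(2 * eta - 1) and log(nu), so no proposal is ever
# rejected merely for leaving the feasible region.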

Since $(a_{I},a_{II},b_{I},b_{II})$ are non-negative and $\bm{\eta}\in[0,1]^{2}$, we again sample $\{\bm{\eta}_{i}\}_{i=1}^{600}\sim\text{Uniform}(0,1)^{2}$ and initialize $\bm{\eta}^{(0)}$ as the sample minimizing $\|\bm{R}(g(\bm{\eta}_{i},\hat{\bm{\nu}}(\bm{\eta}_{i})))-\bm{r}_{\text{obs}}\|_{2}$ over $\bm{\eta}_{i}$. The noise variance $\sigma^{2}_{\epsilon}$ is initialized with a value sampled from its prior. Together with the other distributions and hyperparameters summarized in Table 5, $K=3000$ posterior samples can be obtained from our algorithms.

Table 5: Settings in the real-data application: the prior distributions, proposal distributions, and other parameters.
$\eta_{i}^{\prime}\mid\eta_{i}\sim TN(\eta_{i},0.02^{2},0,1)$  Proposal distribution of $\eta_{i}$ for $i=1,2$ in MGDG
$\widetilde{\eta}_{i}^{\prime}\mid\widetilde{\eta}_{i}\sim N(\widetilde{\eta}_{i},0.05^{2})$  Proposal distribution of $\widetilde{\eta}_{i}$ for $i=1,2$ in MALG
$\pi(\bm{\eta})\propto\exp\{-\gamma\|\bm{R}(g(\bm{\eta},\hat{\bm{\nu}}(\bm{\eta})))-\bm{r}_{\text{obs}}\|^{2}_{2}\}$  Prior of $\bm{\eta}$
$\mathcal{T}=[0,750]$  Recording time
$\mathcal{T}_{n}\subseteq[300,500],\,n=100$  Equally spaced time points used in calculation
$B=500$  Number of burn-in samples
$\bm{\psi}=(\alpha,\beta,\gamma)=(2,\,\|\bm{R}(g(\bm{\eta}^{(0)},\hat{\bm{\nu}}(\bm{\eta}^{(0)})))-\bm{r}_{\text{obs}}\|_{2}^{2}/n,\,8)$
$(\tau,m)=(1\times 10^{-8},20)$  Step size and chain length for sampling $\bm{\nu}$ in MALG

The sampling results are summarized in Fig. 11: the $\bm{\eta}$ sampled with MGDG is spread over a wide range, while the samples from MALG show a clear pattern and a higher rejection rate. This difference in acceptance rate is by design: a new proposal of $\bm{\eta}$ is more easily accepted in one MGDG iteration because an optimal $\bm{\nu}$ has been chosen for it. From the samples, an empirical 95% credible interval (CI) can be constructed from the quantiles of $R(\bm{\xi}^{(i)},t)$ for each $t\in\mathcal{T}$. Figs. 11(c) and 11(f) show this 95% CI, in light blue, from one trial of MGDG and MALG, respectively. The CIs are so narrow that they almost coincide with the clean truth $\bm{R}(\bm{\xi}^{*})$. Since there are multiple solutions, the samples are evaluated by the relative error between $\bm{R}(\hat{\bm{\xi}})$ and $\bm{R}(\bm{\xi}^{*})$, that is,

RE(\hat{\bm{\xi}})=\frac{\left\|\bm{R}(\hat{\bm{\xi}})-\bm{R}(\bm{\xi}^{*})\right\|_{2}}{\left\|\bm{R}(\bm{\xi}^{*})\right\|_{2}},

where the lower and upper bounds of the 95% CI in Fig. 11(c) have relative errors of approximately 3.21% and 2.65%, while $\bm{r}_{\text{obs}}$ itself has a relative error of approximately 26%. Moreover, the two traces of $\sigma_{\epsilon}^{2}$ fluctuate stably around $\widehat{\sigma}^{2}_{\epsilon}=0.001$. This further confirms the validity of our algorithm on real data.
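For reference, a minimal sketch (our naming) of how such a pointwise credible band and the relative error can be computed from posterior samples; here `curves` stands for the $K\times n$ array of $R(\bm{\xi}^{(i)},t)$ evaluated over the posterior draws:

import numpy as np

def credible_band(curves, level=0.95):
    # Pointwise quantiles across the K posterior curves.
    lo = np.quantile(curves, (1 - level) / 2, axis=0)
    hi = np.quantile(curves, (1 + level) / 2, axis=0)
    return lo, hi

def relative_error(r_hat, r_star):
    return np.linalg.norm(r_hat - r_star) / np.linalg.norm(r_star)

# e.g., relative_error(lo, r_star) and relative_error(hi, r_star) give the
# band-edge errors quoted in the text.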

Figure 11: Summary plots of MGDG and MALG with sample size $K=3000$ in the real-data case. (a) $\bm{\eta}$ from MGDG; (d) $\bm{\eta}$, (e) $\bm{\nu}$, (g) $\widetilde{\bm{\eta}}$, and (h) $\widetilde{\bm{\nu}}$ from MALG (scatter plots of the variables of interest); (b, i) sample traces of $\sigma_{\epsilon}^{2}$ from MGDG and MALG; (c, f) the 95% credible interval (light-blue shaded band) from MGDG and MALG, with the input data $\bm{r}_{\text{obs}}$ drawn as a grey solid line and the clean truth $\bm{R}(\bm{\xi}^{*})$ as a dotted red line.
Table 6: Comparison of the sample means and the maximum relative error of the 95% credible interval from MGDG and MALG. Standard deviations computed over 20 repetitions are shown in parentheses.

         $\bar{\eta}_{1}$    $\bar{\eta}_{2}$    $\bar{\nu}_{1}$    $\bar{\nu}_{2}$    max RE within 95% CI
MGDG   0.4341 (0.0944)   0.4509 (0.1313)   2.9936 (0.0000)   0.1515 (0.0000)   0.0310 (0.0020)
MALG   0.4144 (0.1041)   0.3348 (0.1970)   3.0012 (0.0225)   0.1510 (0.0020)   0.0676 (0.0290)

To validate the robustness of our algorithms, the sampling was repeated 20 times; the results are summarized in Table 6. The large standard deviations of $\bm{\eta}$ can be attributed to the multiple solutions of the time-dependent convection-diffusion system in (1). In contrast, the distribution of $\bm{\nu}$ is very concentrated, and in most cases the 95% CIs are very close to $\bm{R}(\bm{\xi}^{*})$. Overall, these experiments indicate that our algorithms can robustly infer $\bm{\xi}^{*}$ from the observations $\bm{r}_{\text{obs}}$.

6 Conclusion

The primary objective of this study was to develop a probabilistic model and estimate the adsorption-isotherm parameters in gradient-elution preparative liquid chromatography from a statistical perspective.

To estimate the adsorption-isotherm parameters reliably, we proposed a statistical observation model with spatially correlated noise. Because the estimation was affected by the correlation between the parameters in the preliminary experiment, we designed two modified MCMC algorithms to reduce this effect. These algorithms were verified on several numerical solvers with highly correlated parameters, where they produced approximately normally distributed estimators with acceptable bias, indicating that our method is reliable in the presence of correlated parameters. The real-data application revealed that the parameter estimate is not unique and that our method leads to reasonable results.

Our method has major implications for estimating parameters whose elements are correlated. On the one hand, these algorithms avoid being trapped in local optima; on the other hand, they do not require the objective function to be differentiable in all variables. The combination of random sampling and optimization methods expands their scope of application, and under certain conditions the performance of the proposed approach is guaranteed theoretically.

It should be noted that the estimation procedure has two limitations. First, the gradient of the objective function is computed numerically. Although this does not consume too much time when implemented in parallel, the thousands of gradient-descent runs embedded in the sampling iterations still require substantial computing resources. Second, when real data are processed, the proposed approach can only produce a group of possible estimators, and we cannot evaluate these results further: because the signals deduced from these estimates are very close to the observation, they have almost the same likelihood from a statistical point of view. Further research on the forward model could reduce the demand for computing resources and make these parameters identifiable; with such a forward model, the efficiency and accuracy of the proposed approach could be further improved.
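For concreteness, the kind of numerical gradient meant in the first limitation is a central difference over the misfit, sketched below (our code, not the paper's implementation); each coordinate costs two solver evaluations, which is what makes the embedded descents expensive:

import numpy as np

def num_grad(loss, x, h=1e-6):
    # Central finite-difference gradient of a scalar function `loss` at x.
    g = np.zeros_like(x, dtype=float)
    for j in range(x.size):
        e = np.zeros_like(g)
        e[j] = h
        g[j] = (loss(x + e) - loss(x - e)) / (2 * h)
    return g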

Overall, a reliable estimation of the adsorption isotherm requires dealing with the correlation efficiently and describing the signal accurately. Our method enables us to explore the combination of sampling and optimization in a more advanced form to further estimate the adsorption-isotherm parameters. Beyond the chromatography application considered here, our statistical approach can potentially be applied to other inverse problems associated with time-dependent convection-diffusion PDEs, such as the hydrogen plasma model in astrophysics (Berryman and Holland (1978)), the autowave model in environmental pollution (Chaikovskii and Zhang (2022)), and the atherosclerosis inflammatory disease model in biology (Hidalgo, Tello and Toro (2014)). We are currently looking into these extensions.

Supplementary Material

Supplementary material for "A Statistical Approach to Estimating Adsorption-Isotherm Parameters in Gradient-Elution Preparative Liquid Chromatography" (PDF). We include all materials omitted from the main text.

Acknowledgements

The authors would like to thank the Fundamental Separation Science Group (FSSG), under the supervision of Professor Torgny Fornstedt at Karlstad University, Sweden, for providing the real data (cyclohexanone and cycloheptanone). Z.Y. would like to thank the Singapore MOE Tier 1 grants R-155-000-196-114 and A-0004826-00-00 and Tier 2 grant A-0008520-00-00 at the National University of Singapore. C.L. would like to thank the Singapore MOE Tier 1 grant A-0004822-00-00 at the National University of Singapore.

References

  • Belloni, A. and Chernozhukov, V. (2009). On the computational complexity of MCMC-based estimators in large samples. The Annals of Statistics 37, 2011–2055.
  • Berryman, J. G. and Holland, C. J. (1978). Nonlinear diffusion problem arising in plasma physics. Physical Review Letters 40, 1720.
  • Chaikovskii, D. and Zhang, Y. (2022). Convergence analysis for forward and inverse problems in singularly perturbed time-dependent reaction-advection-diffusion equations. Journal of Computational Physics 470, 111609.
  • Cheng, X., Lin, G., Zhang, Y., Gong, R. and Gulliksson, M. (2017). A modified coupled complex boundary method for an inverse chromatography problem. Journal of Inverse and Ill-Posed Problems, 1–17.
  • Chkrebtii, O. A., Campbell, D. A., Calderhead, B. and Girolami, M. A. (2016). Bayesian solution uncertainty quantification for differential equations. Bayesian Analysis 11, 1239–1267.
  • Cockayne, J., Oates, C. J., Sullivan, T. J. and Girolami, M. A. (2019). Bayesian probabilistic numerical methods. SIAM Review 61, 756–789.
  • Dose, E. V., Jacobson, S. and Guiochon, G. (1991). Determination of isotherms from chromatographic peak shapes. Analytical Chemistry 63, 833–839.
  • Felinger, A., Zhou, D. and Guiochon, G. (2003). Determination of the single component and competitive adsorption isotherms of the 1-indanol enantiomers by the inverse method. Journal of Chromatography A 1005, 35–49.
  • Forssén, P., Arnell, R. and Fornstedt, T. (2006). An improved algorithm for solving inverse problems in liquid chromatography. Computers and Chemical Engineering 30, 1381–1391.
  • Ghosal, S. and van der Vaart, A. W. (2017). Fundamentals of Nonparametric Bayesian Inference. Cambridge University Press.
  • Guiochon, G. and Lin, B. (2003). Modeling for Preparative Chromatography. New York: Academic Press.
  • Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57, 97–109.
  • Hidalgo, A., Tello, L. and Toro, E. F. (2014). Numerical and analytical study of an atherosclerosis inflammatory disease model. Journal of Mathematical Biology 68, 1785–1814.
  • Horvath, C. (1988). High-Performance Liquid Chromatography: Advances and Perspectives, Volume 5. New York: Academic Press.
  • Jasra, A., Holmes, C. C. and Stephens, D. A. (2005). Markov chain Monte Carlo methods and the label switching problem in Bayesian mixture modeling. Statistical Science 20, 50–67.
  • Javeed, S., Qamar, S., Seidel-Morgenstern, A. and Warnecke, G. (2011). Efficient and accurate numerical simulation of nonlinear chromatographic processes. Computers and Chemical Engineering 35, 2294–2305.
  • Lin, G., Zhang, Y., Cheng, X., Gulliksson, M., Forssén, P. and Fornstedt, T. (2017). A regularizing Kohn-Vogelius formulation for the model-free adsorption isotherm estimation problem in chromatography. Applicable Analysis, 1–28.
  • Lisec, O., Hugo, P. and Seidel-Morgenstern, A. (2001). Frontal analysis method to determine competitive adsorption isotherms. Journal of Chromatography A 908, 19–34.
  • Roberts, G. O. and Rosenthal, J. S. (1998). Optimal scaling of discrete approximations to Langevin diffusions. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 60, 255–268.
  • Roberts, G. O. and Tweedie, R. L. (1996). Exponential convergence of Langevin distributions and their discrete approximations. Bernoulli 2, 341–363.
  • Ruthven, D. M. (1984). Principles of Adsorption and Adsorption Processes. New York: Wiley.
  • Xun, X., Cao, J., Mallick, B., Maity, A. and Carroll, R. J. (2013). Parameter estimation of partial differential equation models. Journal of the American Statistical Association 108, 1009–1020.
  • Zhang, Y., Lin, G., Forssén, P., Gulliksson, M., Fornstedt, T. and Cheng, X. (2016a). A regularization method for the reconstruction of adsorption isotherms in liquid chromatography. Inverse Problems 32, 105005.
  • Zhang, Y., Lin, G., Gulliksson, M., Forssén, P., Fornstedt, T. and Cheng, X. (2016b). An adjoint method in inverse problems of chromatography. Inverse Problems in Science and Engineering, 1–26.