
Bayesian D-Optimal Design of Experiments with Quantitative and Qualitative Responses

Lulu Kang (Illinois Institute of Technology, Chicago, IL), Xinwei Deng (Virginia Tech, Blacksburg, VA), Ran Jin (Virginia Tech, Blacksburg, VA)
Abstract

Systems with both quantitative and qualitative responses are widely encountered in many applications. Design of experiments methods are needed when experiments are conducted to study such systems. Classic experimental design methods are unsuitable here because they often focus on one type of response. In this paper, we develop a Bayesian D-optimal design method for experiments with one continuous and one binary response. Both noninformative and conjugate informative prior distributions on the unknown parameters are considered. The proposed design criterion has meaningful interpretations in terms of the D-optimality of the models for both types of responses. An efficient point-exchange search algorithm is developed to construct local D-optimal designs for given parameter values. Global D-optimal designs are obtained by accumulating the frequencies of the design points in local D-optimal designs, where the parameters are sampled from the prior distributions. The performance of the proposed methods is evaluated through two examples.
KEYWORDS: Bayesian D-optimal design, Conjugate prior, Generalized linear model, Multivariate responses, Noninformative prior, Point-exchange.

1 Introduction

In many applications, both quantitative and qualitative responses are collected for evaluating the quality of a system, and the two types of responses are often mutually dependent. We call such a system, with both types of quality responses, a quantitative-qualitative system. Such systems are widely encountered in practice (Kang et al., 2022, 2021, 2018). In Kang et al. (2018), the authors studied an experiment on the lapping stage of a wafer manufacturing process. The qualitative response is the conformity of the site total indicator reading (STIR) of the wafer, which has two possible outcomes: whether or not the STIR of a wafer is within the tolerance. The quantitative response is the total thickness variation (TTV) of the wafer. Kang et al. (2021) focused on birth records and examined the mutual dependency of birth weight and preterm birth. The birth weight of an infant is a quantitative outcome, and preterm birth is a binary indicator of whether an infant is born before 36 gestational weeks. The two outcomes are correlated, as an infant is usually underweight if born preterm. In Kang et al. (2022), two case studies of quantitative-qualitative systems from material science and gene expression are illustrated. In the gene expression study, the qualitative response has three possible outcomes: healthy individuals, patients with Crohn's disease, and patients with ulcerative colitis.

This work is motivated by a study of an etching process in wafer manufacturing. In the production of silicon wafers, the silicon ingot is sliced into wafers with fine geometric parameters. Inevitably, this step leaves scratches on the wafers' surface. An etching process is used to improve the surface finish, during which the wafers are submerged in a container of etchant for chemical reaction. The quality of the wafers after etching is measured by two response variables: the total thickness variation (TTV) of the wafer and a binary judgment of whether the wafer has cloudy stains in its appearance. The two responses measure the quality from different but connected aspects, and there is a hidden dependency between the continuous TTV and the binary judgment of stains. To improve the etching quality, expensive experiments must be carried out to reveal this hidden dependency and to model the quality-process relationship. Therefore, an ideal experimental design for continuous and binary responses should extract such information with an economical run size.

The classic design of experiments methods mainly focus on experiments with a single continuous response. Various methods have also been developed for a single discrete response, including Sitter and Torsney (1995); Woods et al. (2006); Russell et al. (2009); Woods (2010); Woods and Van de Ven (2012). For multiple responses, Draper and Hunter (1966) proposed the seminal work for continuous responses, and Denman et al. (2011) developed a design method for bivariate binary responses modeled by copula functions. For mixed types of responses, the literature is very scarce. A naive design method is to combine two designs that are constructed separately for each type of response. However, such a naive strategy could be reasonable for one type of response but problematic for the other, because it ignores the dependency between the two types of responses, as shown in Example 1.

Example 1.

Denote by $Y$ and $Z$ a continuous response and a binary response, respectively. Assume that the true model of the binary response $Z$ is $\mathbb{E}(Z|x)=\pi(x)=\exp(1+x)/(1+\exp(1+x))$. The true model of $Y$ is related to $Z$ in the form $Y|Z=z\sim N(1-(1-z)x^{2},0.3^{2})$. Thus, $\mathbb{E}(Y|Z=1)=1$ and $\mathbb{E}(Y|Z=0)=1-x^{2}$. Using the naive design method, a 14-point design is constructed, which consists of an 8-point local $D$-optimal design for the model of $Z$ with $\log(\pi(x)/(1-\pi(x)))=\eta_{0}+\eta_{1}x$, and a 6-point $D$-optimal design for linear regression with a quadratic model in $x$. Given the design, we generate the responses from the true models of $Y$ and $Z$. Figure 1 (a)-(c) shows the data $(x_{i},y_{i},z_{i})$ from the 8-point, 6-point, and combined 14-point designs, respectively.

Figure 1: (a) Observations from the design for $Z$; (b) observations from the design for $Y$; (c) observations from the combined design. The dashed line "- - -" denotes $\mathbb{E}(Y|Z=1)=1$; the solid line "—" denotes $\mathbb{E}(Y|Z=0)=1-x^{2}$; "+" denotes $(x_{i},y_{i})$ with $z_{i}=1$; "o" denotes $(x_{i},y_{i})$ with $z_{i}=0$.

In this example, there is a strong dependency between the two responses, since the true underlying models of $\mathbb{E}(Y|Z)$ differ between $Z=1$ and $Z=0$. In both single-response designs, shown in Figure 1 (a) and (b), the design points are balanced and reasonably distributed for the targeted response. However, since there are no $Y$ observations with $Z=0$ at $x=1.0$ in Figure 1 (c), the quadratic model for $Y|Z=0$ is not estimable. Clearly, the combined design is not suitable here. Note that this problem is not caused by outliers, since all the points with $Z=1$ ("+") vary around $Y=1$ and the points with $Z=0$ ("o") vary around $Y=1-x^{2}$. In fact, $P(Z=0|x=1)=0.12$, which is relatively small, so it is unlikely to observe $Y$ with $Z=0$ at $x=1.0$. A simple solution is to add more replications at $x=1.0$, but it is not clear how many replications would be sufficient. It becomes even more difficult to spot a direct solution when the experiments get more complicated.
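The shortfall in Example 1 can be quantified with a short simulation. The sketch below uses the true model and parameter values stated in Example 1 to compute the chance of a $Z=0$ observation at $x=1$ and, as an illustration of the replication question raised above, how many runs at $x=1$ would be needed to see at least one $Z=0$ with probability 0.95:

```python
import math

import numpy as np

rng = np.random.default_rng(0)

def pi_true(x):
    """True success probability P(Z = 1 | x) from Example 1."""
    return math.exp(1 + x) / (1 + math.exp(1 + x))

# Chance of a Z = 0 observation at x = 1, needed to fit the model of Y | Z = 0
p0 = 1 - pi_true(1.0)   # ~0.12, as stated in the text

# Replications at x = 1 so that at least one Z = 0 occurs with probability 0.95:
# solve pi(1)^n <= 0.05 for n
n95 = math.ceil(math.log(0.05) / math.log(pi_true(1.0)))

# Monte Carlo check of the Z = 0 frequency at x = 1
z = rng.binomial(1, pi_true(1.0), size=100_000)
print(round(p0, 2), n95, round(1 - z.mean(), 2))
```

So roughly two dozen replications at $x=1$ would be required just to protect the estimability of $Y|Z=0$ at that single point, which motivates building the dependency into the design criterion instead.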

Such experiments call for new experimental design methods that account for both continuous and binary responses. Note that under the experimental design framework, the linear model is often used for the continuous response, and the generalized linear model (GLM) is often used for the qualitative response. A joint model must be developed to incorporate both types of responses. Compared to the classic design methods for linear models or GLMs, design for the joint model is more challenging in the following aspects. First, the design criterion for the joint model is more complicated, as the joint model is more complicated than the separate ones. Second, experimental design for GLMs is itself more difficult than for linear models, a difficulty naturally inherited by the design for the joint model. Third, efficient construction algorithms are needed to handle the complexity of the design criterion based on the joint model. Kang and Huang (2019) proposed an $A$-optimal design for experiments with both quantitative and qualitative responses, with the $A$-optimality derived under the Bayesian framework proposed in Kang et al. (2018). Although Kang and Huang (2019) addressed the three challenges to a degree, the $A$-optimality is not a commonly used criterion. More importantly, only informative priors were considered, which circumvents some difficulties brought by noninformative priors on the parameters.

In this paper, we choose the most commonly used $D$-optimal design criterion and propose a novel Bayesian design method for continuous and binary responses. The proposed method considers both noninformative and informative priors. With noninformative priors, the Bayesian framework is equivalent to the frequentist approach; in this case, we also establish some regularity conditions on the experimental run sizes. With informative priors, we develop the $D$-optimal design using conjugate priors. The derived design criterion has meaningful interpretations in terms of the $D$-optimality criteria for the models of both continuous and binary responses. Moreover, we develop an efficient point-exchange algorithm to construct the proposed designs. The construction algorithm can be applied to more general settings beyond factorial designs.

The rest of the paper is organized as follows. Section 2 reviews the general Bayesian quantitative-qualitative (QQ) model and the optimal design criterion. The Bayesian $D$-optimal design criterion is derived under noninformative prior distributions in Section 3. In Section 4, the design criterion is derived with conjugate informative priors. Efficient algorithms for constructing optimal designs are elaborated in Section 5. One artificial example and the etching experimental design are shown in Section 6. Section 7 concludes this paper with some discussions.

2 General Bayesian QQ Model and Design

We first review the general Bayesian QQ model introduced in Kang et al. (2018), focusing on the scenario where $Y$ is a continuous response and $Z$ is a binary response. The input variable $\bm{x}=(x_{1},\ldots,x_{p})^{\prime}\in\mathbb{R}^{p}$ has $p$ dimensions. Denote the data as $(\bm{x}_{i},y_{i},z_{i})$, $i=1,\ldots,n$, where $y_{i}\in\mathbb{R}$ and $z_{i}\in\{0,1\}$. The vectors $\bm{y}=(y_{1},\ldots,y_{n})^{\prime}$ and $\bm{z}=(z_{1},\ldots,z_{n})^{\prime}$ collect the response observations. To jointly model the continuous response $Y$ and the binary response $Z$ given $\bm{x}$, consider the joint probability of $Y|Z$ and $Z$. The conditional model of $Y|Z$ is assumed to be a linear regression model, while the model of $Z$ is a logistic regression model. Specifically, we consider the joint model of $Y$ and $Z$ as follows,

Z=\begin{cases}1,&\textrm{with probability }\pi(\bm{x})\\ 0,&\textrm{with probability }1-\pi(\bm{x})\end{cases}\quad\textrm{with }\pi(\bm{x},\bm{\eta})=\frac{\exp(\bm{f}(\bm{x})^{\prime}\bm{\eta})}{1+\exp(\bm{f}(\bm{x})^{\prime}\bm{\eta})},  (3)

where $\bm{f}(\bm{x})=\left(f_{1}(\bm{x}),\ldots,f_{q}(\bm{x})\right)^{\prime}$ contains $q$ modeling effects, including the intercept, main, interaction, and quadratic effects, etc., and $\bm{\eta}=(\eta_{1},\ldots,\eta_{q})^{\prime}$ is the vector of coefficients. Conditional on $Z=z$, the quantitative variable $Y$ has the distribution

Y|Z=z\sim N\left(\mu_{0}+z\bm{f}(\bm{x})^{\prime}\bm{\beta}^{(1)}+(1-z)\bm{f}(\bm{x})^{\prime}\bm{\beta}^{(2)},\ \sigma^{2}\right),  (4)

where $\bm{\beta}^{(i)}=(\beta^{(i)}_{1},\ldots,\beta^{(i)}_{q})^{\prime}$, $i=1,2$, are the corresponding coefficients of the model effects. The parameter $\mu_{0}$ is the mean and $\sigma^{2}$ is the noise variance. The conditional model (4) indicates that $Y|Z=1\sim N(\mu_{0}+\bm{f}(\bm{x})^{\prime}\bm{\beta}^{(1)},\sigma^{2})$ and $Y|Z=0\sim N(\mu_{0}+\bm{f}(\bm{x})^{\prime}\bm{\beta}^{(2)},\sigma^{2})$. We assume the same variance $\sigma^{2}$ for the two conditional distributions of $Y|Z=1$ and $Y|Z=0$; the design method developed in this paper can be easily adapted to the case with different variances.
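To make the data-generating process in (3)-(4) concrete, the following sketch draws $(z,y)$ pairs from the joint model. The effect vector $\bm{f}(x)=(1,x,x^{2})^{\prime}$ and all parameter values here are illustrative choices (they reproduce the true models of Example 1), not quantities fixed by the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):
    """Assumed model effects: intercept, linear, and quadratic terms."""
    return np.array([1.0, x, x**2])

# Illustrative parameter values matching Example 1's true models
eta = np.array([1.0, 1.0, 0.0])     # logistic coefficients in (3)
beta1 = np.array([0.0, 0.0, 0.0])   # coefficients of Y | Z = 1 in (4)
beta2 = np.array([0.0, 0.0, -1.0])  # coefficients of Y | Z = 0 in (4)
mu0, sigma = 1.0, 0.3

def draw(x):
    """One draw of (z, y) at design point x: Z from (3), then Y | Z from (4)."""
    p = 1 / (1 + np.exp(-f(x) @ eta))                      # pi(x, eta)
    z = rng.binomial(1, p)
    mean = mu0 + z * (f(x) @ beta1) + (1 - z) * (f(x) @ beta2)
    return z, rng.normal(mean, sigma)

# Conditional means at x = 0.5: E(Y | Z = 1) = 1 and E(Y | Z = 0) = 1 - x^2
m1 = mu0 + f(0.5) @ beta1
m0 = mu0 + f(0.5) @ beta2
data = [draw(x) for x in np.linspace(-1, 1, 8)]
print(m1, m0)
```

The two conditional mean functions coincide only when $\bm{\beta}^{(1)}=\bm{\beta}^{(2)}$, which is exactly the dependency the joint model captures.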

The association between the two responses $Y$ and $Z$ is represented through the conditional model $Y|Z$. When the two linear models for $Y|Z=0$ and $Y|Z=1$ are different, i.e., $\bm{\beta}^{(1)}\neq\bm{\beta}^{(2)}$, it is important to account for the influence of the qualitative response $Z$ when modeling the quantitative response $Y$. Let $\bm{X}=(\bm{x}_{1},\ldots,\bm{x}_{n})^{\prime}$ be the $n\times p$ design matrix with $\bm{x}_{i}$ as the $i$th design point. Based on the joint model (3)-(4), we can express the sampling distributions as

\bm{y}|\bm{z},\bm{\beta}^{(1)},\bm{\beta}^{(2)},\mu_{0},\sigma^{2},\bm{X}\sim N(\mu_{0}{\bf 1}+\bm{V}_{1}\bm{F}\bm{\beta}^{(1)}+\bm{V}_{2}\bm{F}\bm{\beta}^{(2)},\ \sigma^{2}\bm{I}_{n}),  (5)
z_{i}|\bm{\eta},\bm{X}\sim\textrm{Bernoulli}(\pi(\bm{x}_{i},\bm{\eta}))\textrm{ independently for }i=1,\ldots,n,\textrm{ and}  (6)
p(\bm{z}|\bm{\eta},\bm{X})\propto\exp\left\{\sum_{i=1}^{n}\left(z_{i}\bm{f}(\bm{x}_{i})^{\prime}\bm{\eta}-\log(1+e^{\bm{f}(\bm{x}_{i})^{\prime}\bm{\eta}})\right)\right\},

where $p(\cdot)$ denotes a generic density function. Here $\bm{V}_{1}=\mbox{diag}\{z_{1},\ldots,z_{n}\}$ is a diagonal matrix, $\bm{I}_{n}$ is the $n\times n$ identity matrix, $\bm{V}_{2}=\bm{I}_{n}-\bm{V}_{1}$, $\bm{F}$ is the model matrix whose $i$th row is $\bm{f}(\bm{x}_{i})^{\prime}$, and ${\bf 1}$ is a vector of ones.

Denote by $p(\bm{\beta}^{(1)})$, $p(\bm{\beta}^{(2)})$, and $p(\bm{\eta})$ the prior distributions of the parameters $\bm{\beta}^{(1)}$, $\bm{\beta}^{(2)}$, and $\bm{\eta}$. Note that we focus on the estimation accuracy of these three groups of parameters; the mean $\mu_{0}$ and variance $\sigma^{2}$ are considered nuisance parameters and are thus excluded from the optimal design criterion. In this work, we assume that the priors of $\bm{\beta}^{(1)}$, $\bm{\beta}^{(2)}$, and $\bm{\eta}$ are independent. Under this assumption, the conditional posterior distributions of $\bm{\eta}$, $\bm{\beta}^{(1)}$, and $\bm{\beta}^{(2)}$ are also independent, as explained in Sections 3 and 4. Under the Bayesian framework, the conditional posterior distribution of the parameters $(\bm{\beta}^{(1)},\bm{\beta}^{(2)},\bm{\eta})$ can be derived as

p(\bm{\beta}^{(1)},\bm{\beta}^{(2)},\bm{\eta}|\bm{y},\bm{z},\mu_{0},\sigma^{2},\bm{X})\propto p(\bm{y}|\bm{z},\bm{\beta}^{(1)},\bm{\beta}^{(2)},\mu_{0},\sigma^{2},\bm{X})\,p(\bm{\beta}^{(1)})\,p(\bm{\beta}^{(2)})\,p(\bm{z}|\bm{\eta},\bm{X})\,p(\bm{\eta}).  (7)

Using (7), we develop the general Bayesian optimal design criterion. Let $\psi$ be a criterion function on the conditional posterior distribution of the parameters. For example, it can be the Shannon information (or equivalently, the Kullback-Leibler distance), the $A$/$I$-optimality (Fedorov, 1972), or another design criterion. However, $\psi(\cdot)$ cannot be used directly as the final design criterion, because its value depends on the random parameters $(\bm{\beta}^{(1)},\bm{\beta}^{(2)},\bm{\eta})$ and on the experimental outputs $(\bm{y},\bm{z})$, which are not yet observed. The randomness of $(\bm{\beta}^{(1)},\bm{\beta}^{(2)},\bm{\eta})$ is removed by taking the expectation of $\psi$ with respect to these parameters, and the uncertainty of $(\bm{y},\bm{z})$ is removed by the further expectation $\mathbb{E}(\mathbb{E}(\psi|\bm{y},\bm{z}))$. Therefore, the general Bayesian optimal design criterion for the design matrix $\bm{X}$ is

\Psi(\bm{X}|\mu_{0},\sigma^{2})=\int p(\bm{y},\bm{z}|\mu_{0},\sigma^{2},\bm{X})\left(\int\psi\!\left(p(\bm{\beta}^{(1)},\bm{\beta}^{(2)},\bm{\eta}|\bm{y},\bm{z},\mu_{0},\sigma^{2},\bm{X})\right)p(\bm{\beta}^{(1)},\bm{\beta}^{(2)},\bm{\eta}|\bm{y},\bm{z},\mu_{0},\sigma^{2},\bm{X})\,d\bm{\beta}^{(1)}d\bm{\beta}^{(2)}d\bm{\eta}\right)d\bm{y}\,d\bm{z}.  (8)

It is well known that the Bayesian $D$-optimal design is equivalent to the design maximizing the Shannon information criterion (Chaloner and Verdinelli, 1995), omitting the terms constant in $\bm{X}$. The criterion function $\psi(\cdot)$ for Shannon information is $\log(\cdot)$. Next, we develop the Bayesian $D$-optimal design criterion (8) under different prior distributions.

3 Optimal Design under Noninformative Priors

When domain knowledge or proper historical data are lacking, experimenters often favor the frequentist approach, as no priors need to be specified. The frequentist approach can be seen as a Bayesian approach with noninformative priors. In this section, we derive the optimal design criterion and the regularity conditions under noninformative priors.

3.1 Design Criterion

Assume the noninformative priors $p(\bm{\beta}^{(i)})\propto 1$ for $i=1,2$ and $p(\bm{\eta})\propto 1$. The conditional posterior distribution in (7) is then proportional to the joint distribution of the data. It can be further factorized into

p(\bm{\beta}^{(1)},\bm{\beta}^{(2)},\bm{\eta}|\bm{y},\bm{z},\mu_{0},\sigma^{2},\bm{X})\propto p(\bm{\eta}|\bm{z},\bm{X})\prod_{i=1}^{2}p(\bm{\beta}^{(i)}|\bm{y},\bm{z},\mu_{0},\sigma^{2},\bm{X}),

with the posterior distributions

\bm{\beta}^{(i)}|\bm{y},\bm{z},\mu_{0},\sigma^{2},\bm{X}\sim N\left((\bm{F}^{\prime}\bm{V}_{i}\bm{F})^{-1}\bm{F}^{\prime}\bm{V}_{i}(\bm{y}-\mu_{0}{\bf 1}),\ \sigma^{2}(\bm{F}^{\prime}\bm{V}_{i}\bm{F})^{-1}\right)\quad\textrm{for }i=1,2,  (9)
p(\bm{\eta}|\bm{z},\bm{X})\propto\exp\left\{\sum_{i=1}^{n}\left(z_{i}\bm{f}(\bm{x}_{i})^{\prime}\bm{\eta}-\log(1+e^{\bm{f}(\bm{x}_{i})^{\prime}\bm{\eta}})\right)\right\}.  (10)

Conditional on $\bm{z}$, the posterior distributions of $(\bm{\beta}^{(1)},\bm{\beta}^{(2)})$ and $\bm{\eta}$ are independent. Note that although the noninformative prior $p(\bm{\eta})\propto 1$ is improper, it is legitimate here because it leads to a proper posterior $p(\bm{\eta}|\bm{z},\bm{X})$. Under the noninformative priors, the Bayesian estimation is identical to the frequentist estimation.

Using the posterior distributions (9)-(10) and the criterion function $\psi(\cdot)=\log(\cdot)$ in the general Bayesian optimal design criterion (8), we obtain the Bayesian $D$-optimal design criterion (11).

\Psi(\bm{X}|\mu_{0},\sigma^{2})=\mathbb{E}_{\bm{z},\bm{\eta}}\left\{\log(p(\bm{\eta}|\bm{z},\bm{X}))\right\}+\frac{1}{2}\sum_{i=1}^{2}\mathbb{E}_{\bm{\eta}}\mathbb{E}_{\bm{z}|\bm{\eta}}\left\{\log\det(\bm{F}^{\prime}\bm{V}_{i}\bm{F})\right\}+\textrm{constants}.  (11)

The derivation is in Appendix A1. The first additive term in (11) is exactly the Bayesian $D$-optimal design criterion for GLMs. Unfortunately, its exact integration is not tractable. The common approach in experimental design for GLMs is to use a normal approximation of the posterior distribution $p(\bm{\eta}|\bm{z},\bm{X})$ (Chaloner and Verdinelli, 1995; Khuri et al., 2006). This approximation leads to

\mathbb{E}_{\bm{z},\bm{\eta}}\left\{\log(p(\bm{\eta}|\bm{z},\bm{X}))\right\}\approx\mathbb{E}_{\bm{\eta}}\{\log\det\bm{I}(\bm{\eta}|\bm{X})\}+\textrm{constant},  (12)

where $\bm{I}(\bm{\eta}|\bm{X})$ is the Fisher information matrix. It is straightforward to show that

\bm{I}(\bm{\eta}|\bm{X})=-\mathbb{E}_{\bm{z}}\left(\frac{\partial^{2}l(\bm{z},\bm{\eta}|\bm{X})}{\partial\bm{\eta}\partial\bm{\eta}^{T}}\right)=\sum_{i=1}^{n}\bm{f}(\bm{x}_{i})\bm{f}(\bm{x}_{i})^{\prime}\pi(\bm{x}_{i},\bm{\eta})(1-\pi(\bm{x}_{i},\bm{\eta}))=\bm{F}^{\prime}\bm{W}_{0}\bm{F},

where $\bm{W}_{0}$ is the diagonal weight matrix

\bm{W}_{0}=\mbox{diag}\{\pi(\bm{x}_{1},\bm{\eta})(1-\pi(\bm{x}_{1},\bm{\eta})),\ldots,\pi(\bm{x}_{n},\bm{\eta})(1-\pi(\bm{x}_{n},\bm{\eta}))\}.

Omitting the irrelevant constant, we approximate the exact criterion $\Psi(\bm{X}|\mu_{0},\sigma^{2})$ in (11) as follows.

\Psi(\bm{X}|\mu_{0},\sigma^{2})\approx\mathbb{E}_{\bm{\eta}}\{\log\det(\bm{F}^{\prime}\bm{W}_{0}\bm{F})\}+\frac{1}{2}\sum_{i=1}^{2}\mathbb{E}_{\bm{\eta}}\mathbb{E}_{\bm{z}|\bm{\eta}}\left\{\log\det(\bm{F}^{\prime}\bm{V}_{i}\bm{F})\right\}.  (13)

To construct the optimal design, we consider maximizing the approximate criterion (13). This is not trivial, because the second additive term involves the expectation over the $Z_{i}$'s. To overcome this challenge, Hainy et al. (2013) constructed optimal designs by simulating samples from the joint distribution of the responses and the unknown parameters, but this method can be computationally expensive even for slightly larger dimensions of experimental factors. Instead of simulating the $Z_{i}$'s, we derive the following Theorem 1, which gives a tractable upper bound $Q(\bm{X})$, and we propose using $Q(\bm{X})$ as the design criterion.

Theorem 1.

Assume that the matrices $\bm{F}^{\prime}\bm{W}_{0}\bm{F}$, $\bm{F}^{\prime}\bm{V}_{1}\bm{F}$, and $\bm{F}^{\prime}\bm{V}_{2}\bm{F}$ are all nonsingular. Omitting the irrelevant constant, an upper bound of the approximated $\Psi(\bm{X}|\mu_{0},\sigma^{2})$ in (13) is

Q(\bm{X})=\mathbb{E}_{\bm{\eta}}\left\{\log\det(\bm{F}^{\prime}\bm{W}_{0}\bm{F})+\frac{1}{2}\sum_{i=1}^{2}\log\det(\bm{F}^{\prime}\bm{W}_{i}\bm{F})\right\},  (14)

where $\bm{W}_{1}=\mbox{diag}\{\pi(\bm{x}_{1},\bm{\eta}),\ldots,\pi(\bm{x}_{n},\bm{\eta})\}$ and $\bm{W}_{2}=\bm{I}_{n}-\bm{W}_{1}$.

The proof of Theorem 1 is in Appendix A1. Note that Theorem 1 requires $\bm{F}^{\prime}\bm{W}_{i}\bm{F}$, $i=0,1,2$, to be nonsingular. Obviously $\bm{W}_{0}=\bm{W}_{1}\bm{W}_{2}$, and it is easy to see that $\bm{F}^{\prime}\bm{W}_{0}\bm{F}$ is nonsingular if and only if both $\bm{F}^{\prime}\bm{W}_{1}\bm{F}$ and $\bm{F}^{\prime}\bm{W}_{2}\bm{F}$ are nonsingular.

The matrices $\bm{F}^{\prime}\bm{V}_{1}\bm{F}$ and $\bm{F}^{\prime}\bm{V}_{2}\bm{F}$ involve the responses $Z_{i}$'s, which are not yet observed at the design stage. We can only choose the experimental run size and the design points so as to avoid the singularity problem with high probability for given values of $\bm{\eta}$. Once the run size is chosen, the design points can be optimally arranged by maximizing $Q(\bm{X})$. The weight matrix $\bm{W}_{1}$ (or $\bm{W}_{2}$) gives more weight to feasible design points that are more likely to yield $Z=1$ (or $Z=0$) observations, so that the parameters $\bm{\beta}^{(1)}$ (or $\bm{\beta}^{(2)}$) of the linear model for $Y|Z=1$ (or $Y|Z=0$) are more likely to be estimable. Next, we introduce some regularity conditions on the run size and the number of replications to alleviate the singularity problem.
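For a fixed $\bm{\eta}$, the quantity inside the expectation in (14) can be evaluated directly, which is all a point-exchange search needs per candidate design. A minimal sketch follows; the effect vector $\bm{f}(x)=(1,x)^{\prime}$, the $\bm{\eta}$ value, and the two candidate designs are illustrative assumptions:

```python
import numpy as np

def Q_local(xs, eta, f):
    """Inner quantity of (14) for a fixed eta:
    log det(F'W0F) + 0.5 * [log det(F'W1F) + log det(F'W2F)]."""
    F = np.array([f(x) for x in xs])
    p = 1 / (1 + np.exp(-(F @ eta)))     # pi(x_i, eta)
    W1, W2 = np.diag(p), np.diag(1 - p)
    W0 = W1 @ W2                          # diag{pi_i (1 - pi_i)}
    ld = lambda M: np.linalg.slogdet(M)[1]
    return ld(F.T @ W0 @ F) + 0.5 * (ld(F.T @ W1 @ F) + ld(F.T @ W2 @ F))

f = lambda x: np.array([1.0, x])          # assumed effects (intercept, x)
eta = np.array([1.0, 1.0])

# A spread-out design should score higher than a collapsed one
q_spread = Q_local([-1.0, -0.5, 0.0, 0.5, 1.0], eta, f)
q_tight = Q_local([-0.1, -0.05, 0.0, 0.05, 0.1], eta, f)
print(q_spread > q_tight)
```

Averaging `Q_local` over draws of $\bm{\eta}$ from its prior gives a Monte Carlo estimate of $Q(\bm{X})$ itself.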

3.2 Regularity Conditions

Let $m$ be the number of distinct design points in the design matrix $\bm{X}$, and let $n_{i}$ be the number of replications of point $\bm{x}_{i}$ in $\bm{X}$ for $i=1,\ldots,m$. Thus $n=\sum_{i=1}^{m}n_{i}$ and $n_{i}\geq 1$ for $i=1,\ldots,m$. First, $m\geq q$ is necessary for the linear regression model to be estimable under the noninformative priors. The necessary and sufficient condition for $\bm{F}^{\prime}\bm{W}_{0}\bm{F}$ to be nonsingular is $\textrm{rank}(\bm{F}^{\prime}\bm{W}_{0}\bm{F})=q$. If $m\geq q$ and $\pi(\bm{x}_{i},\bm{\eta})\in(0,1)$ for $i=1,\ldots,m$, then $\bm{F}^{\prime}\bm{W}_{i}\bm{F}$ for $i=0,1,2$ are all nonsingular and thus positive definite. To ensure $\pi(\bm{x}_{i},\bm{\eta})\in(0,1)$ for $i=1,\ldots,m$, it is sufficient to assume that $\bm{\eta}$ is finitely bounded. This condition is typically used for the frequentist $D$-optimal design for GLMs. For instance, Woods et al. (2006) suggested using the centroid of the finite bounded space of $\bm{\eta}$ to develop local $D$-optimal designs for GLMs, and Dror and Steinberg (2006) clustered local $D$-optimal designs with $\bm{\eta}$ randomly sampled from its bounded space.

The necessary and sufficient condition for $\bm{F}^{\prime}\bm{V}_{1}\bm{F}$ and $\bm{F}^{\prime}\bm{V}_{2}\bm{F}$ to be nonsingular is

\sum_{i=1}^{m}I\left(\sum_{j=1}^{n_{i}}Z_{ij}>0\right)\geq q\quad\textrm{and}\quad\sum_{i=1}^{m}I\left(\sum_{j=1}^{n_{i}}Z_{ij}<n_{i}\right)\geq q,  (15)

where $Z_{ij}$ is the $j$th random binary response at the unique design point $\bm{x}_{i}$ and $I(\cdot)$ is the indicator function. In the following, we discuss how to choose the sample sizes $n_{i}$ and $n$ under two scenarios: (i) $m=q$ and (ii) $m>q$.

Proposition 1.

Assume $m=q$. Both $\bm{F}^{\prime}\bm{V}_{1}\bm{F}$ and $\bm{F}^{\prime}\bm{V}_{2}\bm{F}$ are nonsingular if and only if $I(0<\sum_{j=1}^{n_{i}}Z_{ij}<n_{i})=1$ for $i=1,2,\ldots,m$. For any given $\kappa\in(0,1)$, a sufficient condition on $n_{i}$ for $\Pr(0<\sum_{j=1}^{n_{i}}Z_{ij}<n_{i})\geq\kappa$ is

n_{i}\geq 1+\left\lceil\frac{\log(1-\kappa)}{\log\left(\max\left\{\pi(\bm{x}_{i},\bm{\eta}),1-\pi(\bm{x}_{i},\bm{\eta})\right\}\right)}\right\rceil\quad\textrm{for }i=1,2,\ldots,m,  (16)

and a necessary condition is

n_{i}\geq\left\lceil\frac{2\log\left(\frac{1-\kappa}{2}\right)}{\log\pi(\bm{x}_{i},\bm{\eta})+\log(1-\pi(\bm{x}_{i},\bm{\eta}))}\right\rceil\quad\textrm{for }i=1,2,\ldots,m.  (17)
Proposition 2.

Assume $m>q$. To make both $\bm{F}^{\prime}\bm{V}_{1}\bm{F}$ and $\bm{F}^{\prime}\bm{V}_{2}\bm{F}$ nonsingular with large probability, or equivalently, to satisfy

\mathbb{E}\left\{\sum_{i=1}^{m}I\left(\sum_{j=1}^{n_{i}}Z_{ij}>0\right)\right\}\geq q\quad\textrm{and}\quad\mathbb{E}\left\{\sum_{i=1}^{m}I\left(\sum_{j=1}^{n_{i}}Z_{ij}<n_{i}\right)\right\}\geq q,
  • (i) a sufficient condition is

    n_{0}\geq\left\lceil\max\left\{1,\frac{\log(1-q/m)}{\log(1-\pi_{\min})},\frac{\log(1-q/m)}{\log\pi_{\max}}\right\}\right\rceil,  (18)

    which is equivalent to

    n\geq\left\lceil m\cdot\max\left\{1,\frac{\log(1-q/m)}{\log(1-\pi_{\min})},\frac{\log(1-q/m)}{\log\pi_{\max}}\right\}\right\rceil,  (19)

    where $n_{0}=\min\{n_{1},\ldots,n_{m}\}$, $\pi_{\min}=\min_{i=1,\ldots,m}\pi(\bm{x}_{i},\bm{\eta})$, $\pi_{\max}=\max_{i=1,\ldots,m}\pi(\bm{x}_{i},\bm{\eta})$, and the $\bm{x}_{i}$'s are the unique design points;

  • (ii) a necessary condition is

    n_{0}\geq\left\lceil\max\left\{1,\frac{\log(1-q/m)}{\log(1-\pi_{\max})},\frac{\log(1-q/m)}{\log\pi_{\min}}\right\}\right\rceil,  (20)

    which is equivalent to

    n\geq\left\lceil m\cdot\max\left\{1,\frac{\log(1-q/m)}{\log(1-\pi_{\max})},\frac{\log(1-q/m)}{\log\pi_{\min}}\right\}\right\rceil.  (21)

Proposition 1 gives a sufficient condition on the lower bound of $n_{i}$ when a saturated design ($m=q$) is used. Under the sufficient condition, the nonsingularity of $\bm{F}^{\prime}\bm{V}_{1}\bm{F}$ and $\bm{F}^{\prime}\bm{V}_{2}\bm{F}$ holds with probability at least $\kappa^{m}$. For Example 1 in Section 1, suppose the possible values of $x$ are only $-1,0,1$; then $m=q=3$. Let $\bm{\eta}=(1,1)^{\prime}$. If $\kappa=0.5$, the numbers of replications at $x=-1,0,1$ need to satisfy $n_{1}\geq 2$, $n_{2}\geq 4$, and $n_{3}\geq 7$, respectively. If $\kappa=0.9$, then $n_{1}\geq 5$, $n_{2}\geq 9$, and $n_{3}\geq 20$. Proposition 1 is useful for constructing the initial design in Step 1 of Algorithm 2 in Section 5.
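The replication numbers quoted above follow directly from the bound (16). A small sketch that reproduces them for Example 1, where $\bm{\eta}=(1,1)^{\prime}$ gives $\pi(x)=\exp(1+x)/(1+\exp(1+x))$:

```python
import math

def n_sufficient(pi_x, kappa):
    """Sufficient replication count at one point from (16):
    n_i >= 1 + ceil(log(1 - kappa) / log(max{pi, 1 - pi}))."""
    return 1 + math.ceil(math.log(1 - kappa) / math.log(max(pi_x, 1 - pi_x)))

# pi(x) under eta = (1, 1)' as in Example 1
pi = lambda x: math.exp(1 + x) / (1 + math.exp(1 + x))

lo = [n_sufficient(pi(x), 0.5) for x in (-1, 0, 1)]   # [2, 4, 7]
hi = [n_sufficient(pi(x), 0.9) for x in (-1, 0, 1)]   # [5, 9, 20]
print(lo, hi)
```

As expected, the closer $\pi(\bm{x}_{i},\bm{\eta})$ is to 0 or 1, the more replications the point needs before both response classes are likely to appear there.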

Proposition 2 provides one sufficient condition and one necessary condition on the smallest number of replications and the overall run size when $m>q$. These conditions only ensure the nonsingularity of the two matrices with large probability, which is weaker than Proposition 1. For a given $\bm{\eta}$ value, Algorithm 2 in Section 5.1 returns the local $D$-optimal design. Proposition 2 can be used to check the local $D$-optimal design, as $\pi_{\min}$ and $\pi_{\max}$ depend on $\bm{\eta}$. Take the artificial example in Section 6.1 for instance. The local $D$-optimal design for $\rho=0$ (Table A1 in Appendix A2) has $m=50$ unique design points, and there are $q=22$ effects. According to Proposition 2, the sufficient condition requires $n_{0}\geq 7$ and the necessary condition requires $n_{0}\geq 1$. The local $D$-optimal design in Table A1 only satisfies the necessary condition; to meet the sufficient condition, $n$ would have to be much larger. For the global optimal design considering all possible $\bm{\eta}$ values, Proposition 2 can provide some guidance for design construction when $\bm{\eta}$ varies in a relatively small range.
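The sufficient bound (18) of Proposition 2 is likewise a one-line computation. In the sketch below, $m=50$ and $q=22$ match the Section 6.1 example, but the probability range is a hypothetical illustration, not the values behind Table A1:

```python
import math

def n0_sufficient(m, q, pi_min, pi_max):
    """Sufficient lower bound (18) on n0 = min{n_1, ..., n_m} when m > q."""
    c = math.log(1 - q / m)
    return math.ceil(max(1.0, c / math.log(1 - pi_min), c / math.log(pi_max)))

# m = 50 points, q = 22 effects; the pi-range (0.2, 0.9) is an assumed example
print(n0_sufficient(50, 22, 0.2, 0.9))   # -> 6
```

The bound grows as $\pi_{\max}$ approaches 1 (or $\pi_{\min}$ approaches 0), reflecting that extreme success probabilities make one of the response classes rare.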

4 Optimal Design under Conjugate Priors

When prior information on the parameters $\bm{\beta}^{(1)}$, $\bm{\beta}^{(2)}$, and $\bm{\eta}$ is available, it is desirable to consider the optimal design under informative priors. In this section, we detail the proposed Bayesian $D$-optimal design using conjugate priors.

4.1 Design Criterion

For the parameters $\bm{\beta}^{(1)}$ and $\bm{\beta}^{(2)}$, the conjugate priors are normal distributions, since $Y|Z$ follows a normal distribution. Thus we consider the priors

\bm{\beta}^{(1)}\sim N(\bm{0},\tau^{2}\bm{R}_{1}),\quad\bm{\beta}^{(2)}\sim N(\bm{0},\tau^{2}\bm{R}_{2}),

where $\tau^{2}$ is the prior variance and $\bm{R}_{i}$ is the prior correlation matrix of $\bm{\beta}^{(i)}$ for $i=1,2$. Here we use the same prior variance $\tau^{2}$ for both only for simplicity. The matrix $\bm{R}_{i}$ can be specified flexibly, for example as $(\bm{F}^{\prime}\bm{F})^{-1}$, or as in Joseph (2006) for factorial designs.

For the parameter vector $\bm{\eta}$, we choose the conjugate prior derived in Chen and Ibrahim (2003), which takes the form

\bm{\eta}\sim D(s,\bm{b})\propto\exp\left\{\sum_{i=1}^{n}s\left(b_{i}\bm{f}(\bm{x}_{i})^{\prime}\bm{\eta}-\log(1+e^{\bm{f}(\bm{x}_{i})^{\prime}\bm{\eta}})\right)\right\},  (22)

where $D(s,\bm{b})$ denotes the distribution with parameters $(s,\bm{b})$. Here $s$ is a scalar factor and $\bm{b}\in(0,1)^{n}$ is the marginal mean of $\bm{z}$, as shown in Diaconis and Ylvisaker (1979). The value of $\bm{b}$ can be interpreted as a prior prediction (or guess) of $\mathbb{E}(\bm{Z})$. Based on the priors for $(\bm{\beta}^{(1)},\bm{\beta}^{(2)},\bm{\eta})$, we can derive the posteriors as follows.

Proposition 3.

For priors $\bm{\beta}^{(i)}\sim N(\bm{0},\tau^{2}\bm{R}_{i})$, $i=1,2$, and $\bm{\eta}\sim D(s,\bm{b})$, the posterior distributions of $\bm{\beta}^{(1)}$, $\bm{\beta}^{(2)}$, and $\bm{\eta}$ are independent of each other, with the following forms:

\bm{\beta}^{(i)}|\bm{y},\bm{z},\mu_{0},\sigma^{2},\bm{X}\sim N\left(\bm{H}_{i}^{-1}\bm{F}^{\prime}\bm{V}_{i}(\bm{y}-\mu_{0}\bm{1}),\;\sigma^{2}\bm{H}_{i}^{-1}\right),\qquad\textrm{for }i=1,2, (23)
\bm{\eta}|\bm{z},\bm{X}\sim D\left(1+s,\frac{\bm{z}+s\bm{b}}{1+s}\right), (24)

where $\bm{H}_{i}=\bm{F}^{\prime}\bm{V}_{i}\bm{F}+\rho\bm{R}_{i}^{-1}$ with $\rho=\sigma^{2}/\tau^{2}$.
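The posterior moments in (23) can be computed with a few lines of linear algebra; the sketch below is illustrative (the function name and the diagonal matrix passed for $\bm{V}_i$ are our choices, not notation from the paper).

```python
import numpy as np

def beta_posterior(F, V, y, mu0, sigma2, tau2, R):
    """Posterior N(mean, cov) of beta^(i) as in (23):
    H_i = F'V_iF + rho * R_i^{-1},  rho = sigma^2 / tau^2."""
    rho = sigma2 / tau2
    H = F.T @ V @ F + rho * np.linalg.inv(R)
    mean = np.linalg.solve(H, F.T @ V @ (y - mu0))   # H^{-1} F'V(y - mu0*1)
    cov = sigma2 * np.linalg.inv(H)
    return mean, cov
```

With an intercept-only model, identity weights, and $\rho=1$, the posterior mean shrinks the sample mean toward the prior mean $\bm{0}$, as expected.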

The proof of Proposition 3 follows the standard Bayesian framework and is thus omitted. To derive the Bayesian $D$-optimal design criterion, we substitute the posterior distributions in Proposition 3 into (8) and set $\psi(\cdot)=\log(\cdot)$. The derivation is very similar to that in (11), and we obtain the exact design criterion

\Psi(\bm{X}|\mu_{0},\sigma^{2})=\mathbb{E}_{\bm{z},\bm{\eta}}\left\{\log(p(\bm{\eta}|\bm{z},\bm{X}))\right\}+\frac{1}{2}\sum_{i=1}^{2}\mathbb{E}_{\bm{\eta}}\mathbb{E}_{\bm{z}|\bm{\eta}}\left\{\log\det(\bm{F}^{\prime}\bm{V}_{i}\bm{F}+\rho\bm{R}_{i}^{-1})\right\}+\textrm{constant}. (25)

As the integral $\mathbb{E}_{\bm{z},\bm{\eta}}\left\{\log(p(\bm{\eta}|\bm{z},\bm{X}))\right\}$ is not tractable, we adopt the same normal approximation of the posterior distribution $p(\bm{\eta}|\bm{z},\bm{X})$ as in (12). A straightforward calculation gives the Fisher information matrix $\bm{I}(\bm{\eta}|\bm{X})=(1+s)\bm{F}^{\prime}\bm{W}_{0}\bm{F}$. Thus we have

\mathbb{E}_{\bm{z},\bm{\eta}}\left\{\log(p(\bm{\eta}|\bm{z},\bm{X}))\right\}\approx\mathbb{E}_{\bm{\eta}}\{\log\det(\bm{F}^{\prime}\bm{W}_{0}\bm{F})\}+\textrm{constant}. (26)

Disregarding the constant, we can approximate the exact $\Psi(\bm{X}|\mu_{0},\sigma^{2})$ by

\Psi(\bm{X}|\mu_{0},\sigma^{2})\approx\mathbb{E}_{\bm{\eta}}\{\log\det(\bm{F}^{\prime}\bm{W}_{0}\bm{F})\}+\frac{1}{2}\sum_{i=1}^{2}\mathbb{E}_{\bm{\eta}}\mathbb{E}_{\bm{z}|\bm{\eta}}\left\{\log\det(\bm{F}^{\prime}\bm{V}_{i}\bm{F}+\rho\bm{R}_{i}^{-1})\right\}.

The following Theorem 2 gives an upper bound of the approximated criterion $\Psi(\bm{X}|\mu_{0},\sigma^{2})$ that avoids the integration with respect to $\bm{z}$; it plays the same role as Theorem 1.

Theorem 2.

Assume that the prior distributions of $\bm{\beta}^{(i)}$ are $\bm{\beta}^{(i)}\sim N(\bm{0},\tau^{2}\bm{R}_{i})$ for $i=1,2$, and that $\bm{\eta}$ has either the conjugate prior $\bm{\eta}\sim D(s,\bm{b})$ or the noninformative prior $p(\bm{\eta})\propto 1$. If $\bm{F}^{\prime}\bm{W}_{0}\bm{F}$ is nonsingular, an upper bound of the approximated $\Psi(\bm{X}|\mu_{0},\sigma^{2})$ is

Q(\bm{X})=\mathbb{E}_{\bm{\eta}}\left(\log\det(\bm{F}^{\prime}\bm{W}_{0}\bm{F})+\frac{1}{2}\sum_{i=1}^{2}\log\det(\bm{F}^{\prime}\bm{W}_{i}\bm{F}+\rho\bm{R}_{i}^{-1})\right). (27)

By the same argument as in Section 3, we use the upper bound in (27) as the optimal design criterion. Note that since $\rho\bm{R}_{i}^{-1}$ is added to $\bm{F}^{\prime}\bm{V}_{i}\bm{F}$ and $\bm{F}^{\prime}\bm{W}_{i}\bm{F}$, the matrices $\bm{F}^{\prime}\bm{V}_{i}\bm{F}+\rho\bm{R}_{i}^{-1}$ and $\bm{F}^{\prime}\bm{W}_{i}\bm{F}+\rho\bm{R}_{i}^{-1}$ are nonsingular. The derivation of $\Psi(\bm{X}|\mu_{0},\sigma^{2})$ and $Q(\bm{X})$ only needs $\bm{F}^{\prime}\bm{W}_{0}\bm{F}$ to be nonsingular, which requires $m\geq q$ and $\bm{\eta}$ to be bounded, as in Section 3.
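As a hedged illustration, the criterion (27) can be estimated by Monte Carlo over prior draws of $\bm{\eta}$, with $\bm{W}_{0}=\mathrm{diag}(\pi_{i}(1-\pi_{i}))$, $\bm{W}_{1}=\mathrm{diag}(\pi_{i})$, and $\bm{W}_{2}=\mathrm{diag}(1-\pi_{i})$, the weight structure implied by (33); the function name is ours.

```python
import numpy as np

def Q_monte_carlo(F, eta_samples, rho, R1_inv, R2_inv):
    """Monte Carlo estimate of Q(X) in (27) over prior draws of eta."""
    vals = []
    for eta in eta_samples:
        p = 1.0 / (1.0 + np.exp(-F @ eta))          # pi(x_i, eta)
        W0, W1, W2 = p * (1 - p), p, 1 - p
        vals.append(
            np.linalg.slogdet(F.T * W0 @ F)[1]
            + 0.5 * np.linalg.slogdet(F.T * W1 @ F + rho * R1_inv)[1]
            + 0.5 * np.linalg.slogdet(F.T * W2 @ F + rho * R2_inv)[1])
    return np.mean(vals)
```

For a two-run design with $\bm{\eta}=\bm{0}$ and $\rho=0$, all $\pi_{i}=1/2$ and the value reduces to $\log\det(\tfrac{1}{4}\bm{F}^{\prime}\bm{F})$ plus the two linear-model terms.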

4.2 Interpretation

Note that the criterion in (27) has a formulation similar to $Q(\bm{X})$ in (14); the only difference is that (14) does not involve $\rho\bm{R}_{i}^{-1}$. For consistency, we use the formula (27) as the design criterion $Q(\bm{X})$ in both cases. When noninformative priors for $\bm{\beta}^{(1)}$ and $\bm{\beta}^{(2)}$ are used, we set $\rho=0$. From another point of view, as $\tau^{2}\rightarrow\infty$ and $\rho\rightarrow 0$, the variances in the priors $p(\bm{\beta}^{(1)})$ and $p(\bm{\beta}^{(2)})$ diffuse and result in noninformative priors.

The criterion $Q(\bm{X})$, consisting of three additive terms, can be interpreted intuitively. The first term $\mathbb{E}_{\bm{\eta}}\{\log\det(\bm{F}^{\prime}\bm{W}_{0}\bm{F})\}$ is the Bayesian $D$-optimal criterion for logistic regression, and each $\mathbb{E}_{\bm{\eta}}\{\log\det(\bm{F}^{\prime}\bm{W}_{i}\bm{F}+\rho\bm{R}_{i}^{-1})\}$ is the Bayesian $D$-optimal criterion for the linear regression model of $Y$. To explain the weights, we rewrite $Q(\bm{X})$ as follows.

Q(\bm{X})=1\cdot\mathbb{E}_{\bm{\eta}}\left(\log\det(\bm{F}^{\prime}\bm{W}_{0}\bm{F})\right)+1\cdot\left(\frac{1}{2}\sum_{i=1}^{2}\mathbb{E}_{\bm{\eta}}(\log\det(\bm{F}^{\prime}\bm{W}_{i}\bm{F}+\rho\bm{R}_{i}^{-1}))\right).

Since there are equal numbers of binary and continuous response observations, the design criterion should put the same weight (equal to 1) on the design criteria for $Z$ and for $Y$. For the two criteria for the linear regression models, the same weight $1/2$ is used. This is also reasonable because we assume $\pi_{i}\in(0,1)$: none of the diagonal entries of $\bm{W}_{1}$ and $\bm{W}_{2}$ is zero, so the two terms should split the total weight 1 assigned to the entire linear regression part. Therefore, even though $Q(\bm{X})$ is derived analytically, all the additive terms and their weights make sense intuitively.

4.3 Prior Parameters

Note that the conjugate prior $p(\bm{\eta})$ requires the prior parameters $(s,\bm{b})$ to be specified. Moreover, the prior distribution (22) contains $\bm{f}(\bm{x}_{i})$, which depends on the design points. When sampling $\bm{\eta}$ from the prior (22), it does not matter whether the $\bm{f}(\bm{x}_{i})$'s actually come from the design points. If relevant historical data are available, we can simply sample $\bm{\eta}$ from the likelihood of the data. Alternatively, one can adopt the method in Chen and Ibrahim (2003) to estimate the parameters $(s,\bm{b})$. Without relevant data, we would use the noninformative prior for $\bm{\eta}$, i.e., $p(\bm{\eta})\propto 1$ on the bounded region for $\bm{\eta}$.
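When sampling $\bm{\eta}$ from (22), for example by Metropolis-Hastings, only the unnormalized log-density is needed. A minimal sketch with illustrative names:

```python
import numpy as np

def log_conjugate_prior(eta, F, s, b):
    """Unnormalized log-density of D(s, b) in (22); rows of F are f(x_i)'."""
    lin = F @ eta                                   # f(x_i)' eta for each i
    return np.sum(s * (b * lin - np.log1p(np.exp(lin))))
```

At $\bm{\eta}=\bm{0}$ every term reduces to $-s\log 2$, which gives a quick sanity check.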

The design criterion $Q(\bm{X})$ contains some unknown parameters, including the noise-to-signal ratio $\rho=\sigma^{2}/\tau^{2}$ and the correlation matrices $\bm{R}_{i}$ for $i=1,2$. The value of $\rho$ has to be specified either from historical data or from domain knowledge. Typically we assume $\rho<1$, so that the measurement error has a smaller variance than the signal.

The matrix $\bm{R}_{i}$ can also be specified flexibly. If historical data are available, $\bm{R}_{i}$ can be set as the estimated correlation matrix of $\bm{\beta}^{(i)}$. Otherwise, we can use the correlation matrix in Joseph (2006) and Kang and Joseph (2009), which is targeted at factorial designs. Specifically, let $\bm{\beta}$ be the unknown coefficients of the linear regression model with prior distribution $\bm{\beta}\sim N(\bm{0},\tau^{2}\bm{R})$. For 2-level factors coded as $-1$ and $1$, Joseph (2006) suggests that $\bm{R}$ is a diagonal matrix and the priors for the individual $\beta_{j}$ are

\beta_{0}\sim N(0,\tau^{2}), (28)
\beta_{j}\sim N(0,\tau^{2}r),\qquad j=1,\ldots,p,
\beta_{j}\sim N(0,\tau^{2}r^{2}),\qquad j=p+1,\ldots,p+\binom{p}{2},
\vdots
\beta_{2^{p}-1}\sim N(0,\tau^{2}r^{p}),

where $\beta_{j}$, $j=1,\ldots,p$, are the main effects, $\beta_{j}$, $j=p+1,\ldots,p+\binom{p}{2}$, are the 2-factor interactions, and so on up to the $p$-factor interaction $\beta_{2^{p}-1}$. The variance of $\beta_{j}$ decreases exponentially with the order of the corresponding effect by a factor $r\in(0,1)$, which incorporates the effect hierarchy principle (Wu and Hamada, 2011). Joseph (2006) showed that if $\bm{f}(\bm{x})$ contains all the $2^{p}$ effects of all $p$ orders, $\tau^{2}\bm{R}$ can be represented alternatively by the Kronecker product $\tau^{2}\bm{R}=\varsigma^{2}\bigotimes_{j=1}^{p}\bm{F}_{j}(x_{j})^{-1}\bm{\Psi}_{j}(x_{j})(\bm{F}_{j}(x_{j})^{-1})^{\prime}$. The model matrix for a 2-level factor and the correlation matrix are

\bm{F}_{j}(x_{j})=\left(\begin{array}{cc}1&-1\\ 1&1\end{array}\right)\;\;\textrm{and}\;\;\bm{\Psi}_{j}(x_{j})=\left(\begin{array}{cc}1&\zeta\\ \zeta&1\end{array}\right). (29)

To keep the two representations equivalent, let $\zeta=\frac{1-r}{1+r}$ and $\tau^{2}=\left(\frac{1+\zeta}{2}\right)^{p}\varsigma^{2}$. For mixed-level experiments with 2- and 3-level factors, Kang and Joseph (2009) extended the 2-level case to

\tau^{2}\bm{R}=\varsigma^{2}\bigotimes_{j=1}^{p_{2}+p_{3,c}+p_{3,q}}\bm{F}_{j}(x_{j})^{-1}\bm{\Psi}_{j}(x_{j})(\bm{F}_{j}(x_{j})^{-1})^{\prime}, (30)

where $p_{2}$ is the number of 2-level factors, $p_{3,c}$ is the number of 3-level qualitative (categorical) factors, and $p_{3,q}$ is the number of 3-level quantitative factors. For all the 3-level factors, the model matrix is

\bm{F}_{j}(x_{j})=\left(\begin{array}{ccc}1&-\sqrt{\frac{3}{2}}&\sqrt{\frac{1}{2}}\\ 1&0&-\sqrt{2}\\ 1&\sqrt{\frac{3}{2}}&\sqrt{\frac{1}{2}}\end{array}\right). (31)

An isotropic correlation function is recommended for the 3-level qualitative factors and a Gaussian correlation function for the quantitative factors. Thus, the correlation matrices for the 3-level qualitative and quantitative factors are

\bm{\Psi}_{j}(x_{j})=\left(\begin{array}{ccc}1&\zeta&\zeta\\ \zeta&1&\zeta\\ \zeta&\zeta&1\end{array}\right)\;\;\textrm{and}\;\;\bm{\Psi}_{j}(x_{j})=\left(\begin{array}{ccc}1&\zeta&\zeta^{4}\\ \zeta&1&\zeta\\ \zeta^{4}&\zeta&1\end{array}\right), (32)

respectively. To keep the covariance (30) consistent with the 2-level case, we again set $\zeta=\frac{1-r}{1+r}$. To keep the variance of the intercept equal to $\tau^{2}$ (Kang and Joseph, 2009), we set

\tau^{2}=\varsigma^{2}\left(\frac{1+\zeta}{2}\right)^{p_{2}}\left(\frac{1+2\zeta}{3}\right)^{p_{3,c}}\left(\frac{3+4\zeta+2\zeta^{4}}{9}\right)^{p_{3,q}},

and thus

\bm{R}=\left(\left(\frac{1+\zeta}{2}\right)^{p_{2}}\left(\frac{1+2\zeta}{3}\right)^{p_{3,c}}\left(\frac{3+4\zeta+2\zeta^{4}}{9}\right)^{p_{3,q}}\right)^{-1}\times\bigotimes_{j=1}^{p_{2}+p_{3,c}+p_{3,q}}\bm{F}_{j}(x_{j})^{-1}\bm{\Psi}_{j}(x_{j})(\bm{F}_{j}(x_{j})^{-1})^{\prime}.

It is straightforward to show that $\bm{R}$ is a diagonal matrix if only 2-level and 3-level qualitative factors are involved, but not if any 3-level quantitative factor is involved, and that the first diagonal entry of $\bm{R}$ is always 1.
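These properties are easy to verify numerically. The sketch below builds $\bm{R}$ from the Kronecker form (30) for 2-level factors only and checks that it reproduces the diagonal pattern $1, r, \ldots, r^{p}$ of (28); the function name is illustrative.

```python
import numpy as np

def kron_R_2level(p, r):
    """R from the Kronecker form (30) with p 2-level factors, normalized so
    the intercept entry is 1 (i.e., tau^2 / varsigma^2 is divided out)."""
    zeta = (1 - r) / (1 + r)
    Fj = np.array([[1.0, -1.0], [1.0, 1.0]])
    Psi = np.array([[1.0, zeta], [zeta, 1.0]])
    Fi = np.linalg.inv(Fj)
    M = Fi @ Psi @ Fi.T                              # one-factor block
    R = np.array([[1.0]])
    for _ in range(p):
        R = np.kron(R, M)
    return R / ((1 + zeta) / 2) ** p
```

For $p=2$ and $r=1/3$ the result is diagonal with entries $\{1, r, r, r^{2}\}$, matching (28).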

To specify different prior distributions for $\bm{\beta}^{(1)}$ and $\bm{\beta}^{(2)}$, we only need different values $r_{1}$ (or $\zeta_{1}$) and $r_{2}$ (or $\zeta_{2}$) to construct the prior correlation matrices. If the prior knowledge assumes that the two responses $Z$ and $Y$ are independent, one can set $r_{1}=r_{2}=r$ so that the two correlation matrices $\bm{R}_{i}$ are the same, denoted as $\bm{R}$. Kang and Joseph (2009) used $r=1/3$ (equivalently $\zeta=1/2$) according to a meta-analysis of 113 data sets from published experiments (Li et al., 2006). Thus we also use $r=1/3$ in all the examples. Readers can specify different values for $r_{1}$ and $r_{2}$ if needed.

In computation, we construct $\bm{R}$ using the Kronecker product in (30). Such an $\bm{R}$, however, is for $\bm{f}(\bm{x})$ containing effects of all possible orders. Usually we assume the model contains only lower-order effects, so we pick the rows and columns of $\bm{R}$ that correspond to the lower-order effects assumed in the model.

5 Design Search Algorithm

In this work, we focus on the construction of optimal designs based on factorial designs, which suits the prior distributions introduced in Section 4.3. To optimize the design criterion $Q(\bm{X})$, we consider two cases. First, for a fixed $\bm{\eta}$ value, we develop a point-exchange algorithm to construct a local optimal design that maximizes the criterion $Q(\bm{X}|\bm{\eta})$. Second, we construct a global optimal design based on the prior distribution of $\bm{\eta}$. Specifically, we construct local optimal designs for different $\bm{\eta}$'s sampled from the prior distribution; the global optimal continuous design is then obtained by accumulating the frequencies of the design points selected into those local optimal designs.

5.1 Local Optimal Design for Fixed $\bm{\eta}$

For a fixed $\bm{\eta}$, we adapt the point-exchange algorithm to maximize the criterion

Q(\bm{X}|\bm{\eta})=\log\det(\bm{F}^{\prime}\bm{W}_{0}\bm{F})+\frac{1}{2}\sum_{i=1}^{2}\log\det(\bm{F}^{\prime}\bm{W}_{i}\bm{F}+\rho\bm{R}_{i}^{-1}).

The point-exchange algorithm is commonly used to construct $D$-optimal designs. It was first introduced by Fedorov (1972) and has since been widely used (Cook and Nachtsheim, 1980; Nguyen and Miller, 1992).

The point-exchange algorithm finds the optimal design from a candidate set. Here the candidate set is chosen to be the full factorial design without replicates. For now, we develop the method for 2- and 3-level factors, but it can be generalized to factors with more levels. As before, let $p_{2}$, $p_{3,c}$, and $p_{3,q}$ denote the numbers of 2-level, 3-level categorical, and 3-level quantitative factors. The total number of full factorial design points is $N=2^{p_{2}}3^{p_{3,c}+p_{3,q}}$, which can be large if the experiment involves many factors. To make the algorithm efficient, we filter out the candidate points that are unlikely to be optimal design points. Following the suggestion of Dror and Steinberg (2005), we exclude the candidate design points whose probabilities $\pi(\bm{x},\bm{\eta})$ are outside $[0.15,0.85]$. This range is used because the approximate variance of $\log\left(\frac{\pi_{i}}{1-\pi_{i}}\right)$ is nearly constant for $\pi_{i}\in(0.2,0.8)$ but increases rapidly outside that range (Wu, 1985). Denote the reduced candidate set by $\bm{X}_{c}$, with size $N^{\prime}$.

Next we construct an initial design of size $n$ such that $\bm{F}^{\prime}\bm{W}_{0}\bm{F}$ is nonsingular, and, if $\rho=0$, so are $\bm{F}^{\prime}\bm{W}_{i}\bm{F}$ for $i=1,2$. If $N^{\prime}\geq q$, we construct the initial design by reduction. Starting with $\bm{X}_{c}$ as the initial design, we remove design points one by one until $q$ points are left. The other $n-q$ design points are then sampled from these $q$ initial design points with probabilities proportional to the lower bounds in the sufficient condition of Proposition 1. To remove one design point, we select the one with the smallest deletion function $d(\bm{x})$ defined in (33). Shortcut formulas are developed in the Appendix for updating the inverses of the matrices $\bm{F}^{\prime}\bm{W}_{0}\bm{F}$ and $\bm{F}^{\prime}\bm{W}_{i}\bm{F}+\rho\bm{R}_{i}^{-1}$, $i=1,2$, after one design point is removed. If $N^{\prime}<q$, we restore the candidate set to the full factorial design and construct the initial design in the same reduction fashion.

To simplify the notation for $d(\bm{x})$, define $v_{i}(\bm{x}_{1},\bm{x}_{2})=\bm{f}(\bm{x}_{1})^{\prime}\bm{M}_{i}\bm{f}(\bm{x}_{2})$ and $v_{i}(\bm{x})=\bm{f}(\bm{x})^{\prime}\bm{M}_{i}\bm{f}(\bm{x})$ for $i=0,1,2$, where $\bm{M}_{0}=\left(\bm{F}^{\prime}\bm{W}_{0}\bm{F}\right)^{-1}$ and $\bm{M}_{i}=\left(\bm{F}^{\prime}\bm{W}_{i}\bm{F}+\rho\bm{R}_{i}^{-1}\right)^{-1}$ for $i=1,2$. Denote by $\bm{X}$ the current design and by $\bm{X}_{-i}$ the design $\bm{X}$ with the $i$th row removed. Then the deletion function can be derived as

d(\bm{x}_{i})=Q(\bm{X}|\bm{\eta})-Q(\bm{X}_{-i}|\bm{\eta})=-\left\{\log\left[1-\pi(\bm{x}_{i},\bm{\eta})(1-\pi(\bm{x}_{i},\bm{\eta}))v_{0}(\bm{x}_{i})\right]+\frac{1}{2}\log\left[1-\pi(\bm{x}_{i},\bm{\eta})v_{1}(\bm{x}_{i})\right]+\frac{1}{2}\log\left[1-(1-\pi(\bm{x}_{i},\bm{\eta}))v_{2}(\bm{x}_{i})\right]\right\}. (33)

The smaller $d(\bm{x}_{i})$ is, the less the corresponding point contributes to the overall objective $Q(\bm{X}|\bm{\eta})$.
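The deletion function follows from the rank-one downdate identity $\log\det(\bm{A}-c\bm{f}\bm{f}^{\prime})=\log\det(\bm{A})+\log(1-c\,\bm{f}^{\prime}\bm{A}^{-1}\bm{f})$, which can be checked numerically; a sketch with illustrative names:

```python
import numpy as np

def criterion_Q(F, pis, rho, R_inv):
    """Q(X | eta) from the log-determinants, with the weights in (33)."""
    W0, W1, W2 = pis * (1 - pis), pis, 1 - pis
    return (np.linalg.slogdet(F.T * W0 @ F)[1]
            + 0.5 * np.linalg.slogdet(F.T * W1 @ F + rho * R_inv)[1]
            + 0.5 * np.linalg.slogdet(F.T * W2 @ F + rho * R_inv)[1])

def deletion_function(F, pis, i, rho, R_inv):
    """d(x_i) in (33), without re-factorizing the three matrices."""
    f, pi = F[i], pis[i]
    W0, W1, W2 = pis * (1 - pis), pis, 1 - pis
    v0 = f @ np.linalg.solve(F.T * W0 @ F, f)
    v1 = f @ np.linalg.solve(F.T * W1 @ F + rho * R_inv, f)
    v2 = f @ np.linalg.solve(F.T * W2 @ F + rho * R_inv, f)
    return -(np.log(1 - pi * (1 - pi) * v0)
             + 0.5 * np.log(1 - pi * v1)
             + 0.5 * np.log(1 - (1 - pi) * v2))
```

A brute-force check confirms that $d(\bm{x}_i)$ equals the direct difference $Q(\bm{X}|\bm{\eta})-Q(\bm{X}_{-i}|\bm{\eta})$.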

One key step of the point-exchange algorithm is to compute $\Delta(\bm{x},\bm{x}_{i})=Q(\bm{X}^{*}|\bm{\eta})-Q(\bm{X}|\bm{\eta})$, the change in the criterion after the candidate design point $\bm{x}$ replaces $\bm{x}_{i}$ in the current design $\bm{X}$, where $\bm{X}^{*}$ is the design matrix after the exchange. The change $\Delta(\bm{x},\bm{x}_{i})$ can be computed efficiently via

\Delta(\bm{x},\bm{x}_{i})=Q(\bm{X}^{*}|\bm{\eta})-Q(\bm{X}|\bm{\eta})=\log\Delta_{0}(\bm{x},\bm{x}_{i})+\frac{1}{2}\sum_{k=1}^{2}\log\Delta_{k}(\bm{x},\bm{x}_{i}), (34)

where the $\Delta_{k}(\bm{x},\bm{x}_{i})$, $k=0,1,2$, are derived in the Appendix. The matrices $\bm{M}_{k}$, $k=0,1,2$, need to be updated after the exchange of design points. Denote the updated matrices by $\bm{M}_{k}^{*}$ for the updated design $\bm{X}^{*}$. Shortcut formulas to compute the $\bm{M}_{k}^{*}$ are also given in the Appendix.

Given the initial design, we iteratively exchange current design points with candidate design points to improve the objective $Q(\bm{X}|\bm{\eta})$. The details are listed in Algorithm 1.

Algorithm 1 Point-Exchange Algorithm for Local $D$-Optimal Design

Step 0: Generate the candidate design set from the full factorial design. Filter out the points with probabilities $\pi(\bm{x},\bm{\eta})$ outside $[0.15,0.85]$.

Step 1: Generate the initial design $\bm{X}$. Based on $\bm{X}$, update the matrices $\bm{F}$, $\bm{W}_{i}$, and $\bm{M}_{i}$ for $i=0,1,2$, and compute the current objective value $Q(\bm{X}|\bm{\eta})$.

Step 2: Compute the deletion function $d(\bm{x}_{i})$ for each $\bm{x}_{i}$ in $\bm{X}$. Randomly sample one design point with probability inversely proportional to the $d(\bm{x}_{i})$'s; denote it by $\bm{x}_{i_{0}}$.

Step 3: Find the candidate point $\bm{x}^{*}$ with the largest $\Delta(\bm{x},\bm{x}_{i_{0}})$. If $\Delta(\bm{x}^{*},\bm{x}_{i_{0}})>0$, exchange $\bm{x}^{*}$ with $\bm{x}_{i_{0}}$ in $\bm{X}$ and update the objective value to $Q(\bm{X}|\bm{\eta})+\Delta(\bm{x}^{*},\bm{x}_{i_{0}})$.

Step 4: Repeat Steps 2 and 3 until the objective function stabilizes or the maximum number of iterations is reached.

Algorithm 1 can return different optimal designs due to different initial designs and the random sampling in Step 2. Thus, we run Algorithm 1 several times and return the design with the best objective value. Several remarks are in order. (1) The initial design generated via reduction does not have singularity issues. (2) The updated design from point exchange does not have the singularity problem either, based on the way $\bm{x}^{*}$ is selected and the $\bm{M}_{i}^{*}$, $i=0,1,2$, are computed. (3) To avoid being trapped in a local maximum, Step 2 randomly samples the design point for an exchange instead of deterministically picking the "worst" point. (4) Unlike some other point-exchange algorithms, the candidate set remains the same throughout Steps 1-4: points are not deleted from it once selected into the design. This enables the resulting optimal design to have replicated design points.
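A simplified sketch of Algorithm 1 follows (deterministic accept-first-improvement instead of the randomized Step 2, and full log-determinants instead of the shortcut updates; all names are ours):

```python
import numpy as np

def Q_local(F, eta, rho, R_inv):
    """Q(X | eta): logistic-model term plus the two penalized linear terms."""
    p = 1.0 / (1.0 + np.exp(-F @ eta))
    W0, W1, W2 = p * (1 - p), p, 1 - p
    return (np.linalg.slogdet(F.T * W0 @ F)[1]
            + 0.5 * np.linalg.slogdet(F.T * W1 @ F + rho * R_inv)[1]
            + 0.5 * np.linalg.slogdet(F.T * W2 @ F + rho * R_inv)[1])

def point_exchange(cand, eta, n, rho, R_inv, max_pass=50, seed=0):
    """Greedy point exchange: keep swapping one design row for a candidate
    row that improves Q(X | eta), until no single swap helps."""
    rng = np.random.default_rng(seed)
    X = cand[rng.choice(len(cand), n, replace=True)]   # random initial design
    best = Q_local(X, eta, rho, R_inv)
    for _ in range(max_pass):
        improved = False
        for i in range(n):
            for c in cand:
                Xtry = X.copy()
                Xtry[i] = c
                q = Q_local(Xtry, eta, rho, R_inv)
                if q > best + 1e-10:
                    X, best, improved = Xtry, q, True
        if not improved:
            break
    return X, best
```

On a tiny $2^2$ candidate set the returned design is locally optimal: no single swap can raise the objective.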

5.2 Global Optimal Design

Based on Algorithm 1 for the local $D$-optimal design, we use the following Algorithm 2 to construct the global optimal design.

Algorithm 2 Algorithm for Global $D$-Optimal Design

Step 0: If $p(\bm{\eta})$ is informative, simulate $\bm{\eta}_{j}\sim p(\bm{\eta})$ for $j=1,\ldots,B$. Otherwise, sample $\bm{\eta}_{j}$ uniformly from a bounded rectangular region.

Step 1: For each $\bm{\eta}_{j}$, call Algorithm 1 to construct the local optimal design $\bm{X}_{j}$.

Step 2: For each point in the candidate set, count its frequency of being selected into the local optimal designs. The continuous optimal design is the discrete distribution formed by the normalized frequencies.

Step 3: To obtain a discrete optimal design, sample $n$ design points from the continuous optimal design.

In Step 0, when generating $\bm{\eta}$ uniformly, we can use a uniform design (Fang et al., 2000), a maximin Latin hypercube design (Morris and Mitchell, 1995), or other space-filling design methods (Joseph and Hung, 2008; Lin et al., 2010; Qian, 2012) to select the samples $\bm{\eta}_{j}$, $j=1,\ldots,B$. The discrete design obtained in Step 3 of Algorithm 2 is likely to have some design points with $n_{i}=1$. When experimenters prefer replication at every design point, they can choose a saturated design by sampling $m=q$ unique design points in Step 3, then sample some $\bm{\eta}$ values as in Step 0, compute the lower bounds for the $n_{i}$ for every $\bm{\eta}$ sample according to Proposition 1, and set the $n_{i}$ by the averaged lower bounds. If $\sum_{i=1}^{m}n_{i}$ exceeds $n$, the experimenters have to either increase the experimental budget or reduce the $\kappa$ value.
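Steps 2-3 of Algorithm 2 are simple bookkeeping; a sketch under the assumption that local designs are given as index lists into the candidate set (names are ours):

```python
import numpy as np

def accumulate_design(local_designs, n_cand, n, seed=0):
    """Steps 2-3 of Algorithm 2: count how often each candidate point enters
    a local optimal design, normalize, then sample an n-run discrete design."""
    freq = np.zeros(n_cand)
    for idx in local_designs:                 # idx: candidate indices, with
        np.add.at(freq, idx, 1)               # replicates counted each time
    d = freq / freq.sum()                     # continuous (probability) design
    rng = np.random.default_rng(seed)
    return d, rng.choice(n_cand, size=n, p=d)
```

Candidate points that never enter a local design get frequency zero and are never sampled into the discrete design.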

6 Examples

In this section, we use two examples to demonstrate the proposed Bayesian $D$-optimal design and the construction method. For both examples, we set $r=1/3$ (equivalently $\zeta=1/2$), as explained in Section 4.3. Since there are few existing works on experimental design for continuous and binary responses, we compare the proposed method with three alternative designs: the optimal design for the quantitative response only, the optimal design for the binary response only, and the naively combined design mentioned in Example 1.

6.1 Artificial Example

In this artificial experiment, there are three 2-level factors $x_{1},x_{2},x_{3}$, one 3-level categorical factor $x_{4}$, and one 3-level quantitative factor $x_{5}$. The underlying model is assumed to be the complete quadratic model, and $\bm{f}(\bm{x})$ contains $q=22$ model effects: the intercept and the following effects.

First order effects: $x_{1},x_{2},x_{3},x_{4,1},x_{4,2},x_{5,l}$;
Second order effects: $x_{1}x_{2},x_{1}x_{3},x_{1}x_{4,1},x_{1}x_{4,2},x_{1}x_{5,l},x_{2}x_{3},x_{2}x_{4,1},x_{2}x_{4,2},x_{2}x_{5,l},x_{3}x_{4,1},x_{3}x_{4,2},x_{3}x_{5,l},x_{4,1}x_{5,l},x_{4,2}x_{5,l},x_{5,\text{quad}}$.

Here, for the 3-level factors $x_{4}$ and $x_{5}$, the effects $x_{4,1}$ (1st comparison) and $x_{5,l}$ (linear effect) take values $\left\{-\sqrt{\frac{3}{2}},0,\sqrt{\frac{3}{2}}\right\}$, and $x_{4,2}$ (2nd comparison) and $x_{5,\text{quad}}$ (quadratic effect) take values $\left\{\sqrt{\frac{1}{2}},-\sqrt{2},\sqrt{\frac{1}{2}}\right\}$, following the model matrix (31). For the 2-level factors, the effects $x_{i}$, $i=1,2,3$, take the same values as the design settings $\{-1,1\}$. We consider an independent uniform distribution for each $\eta_{i}$: $\eta_{i}\sim\textrm{Uniform}[-1,1]$ for the intercept and the first order effects, and $\eta_{i}\sim\textrm{Uniform}[-0.5,0.5]$ for the second order effects. The ranges of the $\eta_{i}$'s satisfy the effect hierarchy principle.

Table 1: An example of an $\bm{\eta}$ value for the local $D$-optimal design.
Effect $\eta$ | Effect $\eta$ | Effect $\eta$ | Effect $\eta$
Intercept -0.0153 | $x_{1}$ -0.6067 | $x_{2}$ 0.7212 | $x_{1}x_{2}$ 0.0080
$x_{3}$ -0.1682 | $x_{1}x_{3}$ 0.0010 | $x_{2}x_{3}$ 0.1349 | $x_{4,1}$ 0.0283
$x_{1}x_{4,1}$ 0.0594 | $x_{2}x_{4,1}$ -0.1719 | $x_{3}x_{4,1}$ 0.1492 | $x_{4,2}$ -0.1468
$x_{1}x_{4,2}$ 0.0553 | $x_{2}x_{4,2}$ -0.0634 | $x_{3}x_{4,2}$ -0.2629 | $x_{5,l}$ -0.0660
$x_{1}x_{5,l}$ -0.1054 | $x_{2}x_{5,l}$ -0.0857 | $x_{3}x_{5,l}$ -0.0807 | $x_{4,1}x_{5,l}$ -0.1198
$x_{4,2}x_{5,l}$ -0.0292 | $x_{5,\text{quad}}$ -0.1336

We set the experimental run size to $n=66$. Table 1 lists the values of a randomly chosen $\bm{\eta}$. Using Algorithm 1 with this $\bm{\eta}$, we construct the proposed local $D$-optimal designs for the QQ model, denoted $D_{QQ}$, for $\rho=0$ and $\rho=0.3$. For comparison, we also generate three alternative designs via the R package AlgDesign (Wheeler, 2014): (i) the 66-run classic $D$-optimal design for the linear regression model, denoted $D_{L}$; (ii) the 66-run local $D$-optimal design for the logistic regression model given $\bm{\eta}$, denoted $D_{G}$; and (iii) the naively combined design consisting of a 44-run local $D$-optimal design for the logistic regression model and a 22-run $D$-optimal design for the linear regression model, denoted $D_{C}$. The details of these designs can be found in Table A1 in the Appendix.

To evaluate the performance of the proposed design against the alternatives, we consider the efficiency between two designs (Woods et al., 2006),

\textrm{eff}(D_{1},D_{2}|\bm{\eta})=\exp\left\{\frac{1}{q}\left(Q(D_{1}|\bm{\eta})-Q(D_{2}|\bm{\eta})\right)\right\}. (35)
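The efficiency (35) is a one-liner; a value above 1 means $D_{1}$ beats $D_{2}$ for the given $\bm{\eta}$ (the function name is ours):

```python
import numpy as np

def efficiency(Q1, Q2, q):
    """Relative efficiency (35) of design D1 over D2, given eta."""
    return np.exp((Q1 - Q2) / q)
```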

Table 2 reports the efficiency of $D_{QQ}$ relative to $D_{L}$, $D_{G}$, and $D_{C}$, respectively. The proposed QQ optimal design $D_{QQ}$ is more efficient than all three alternative designs, with the combined design $D_{C}$ the second best.

Table 2: Design efficiency between the proposed local design $D_{QQ}$ and three alternative designs.
$\rho$ | $\textrm{eff}(D_{QQ},D_{L}|\bm{\eta})$ | $\textrm{eff}(D_{QQ},D_{G}|\bm{\eta})$ | $\textrm{eff}(D_{QQ},D_{C}|\bm{\eta})$
0 | 1.08 | 1.11 | 1.05
0.3 | 1.10 | 1.14 | 1.07

Next, we focus on the comparison of $D_{QQ}$ with $D_{C}$ under different $\bm{\eta}$ values. We generate a maximin Latin hypercube design of $B=500$ runs (R package lhs, Carnell (2012)) for $\bm{\eta}$ with the lower and upper bounds specified earlier. For each $\bm{\eta}$ value, we construct a local QQ optimal design $D_{QQ}$ and a combined design $D_{C}$. Figure 2 shows the histogram of $\textrm{eff}(D_{QQ},D_{C}|\bm{\eta})$ over the $\bm{\eta}$ values. All $\textrm{eff}(D_{QQ},D_{C}|\bm{\eta})$ values are larger than 1, indicating that the local QQ optimal design outperforms the combined design.

Figure 2: Artificial example: the efficiency between each local $D_{QQ}$ and $D_{C}$ under different $\bm{\eta}$ values: (a) $\rho=0$; (b) $\rho=0.3$.

Based on Algorithm 2, we accumulate the frequencies of the local optimal designs and obtain the global $D$-optimal designs shown in Figure 3. Let $d_{QQ}$ and $d_{C}$ denote the proposed global optimal design for the QQ model and the global combined design, respectively. The bar plots show the normalized frequencies of all the candidate points, with the largest 22 frequencies colored blue. From Figure 3, for $d_{QQ}$, the points in the middle have much smaller frequencies than the other points. These middle points correspond to the points with $x_{5}=0$ in Table A1. Note that such points are only necessary for estimating the coefficient of $x_{5,\text{quad}}$, whose variance is the smallest in the prior for the $\bm{\beta}^{(i)}$'s based on the effect hierarchy principle. In contrast, such a pattern is not observed for points with $x_{4}=0$: $x_{4}$ is a categorical variable, and $x_{4}=-1,0,1$ are equally necessary for estimating the effects $x_{4,1}$ and $x_{4,2}$. For $d_{C}$, the points with the largest 22 frequencies correspond to the 22-run $D$-optimal design for the linear regression model, which is independent of $\bm{\eta}$ and remains the same in every local design. The points with $x_{5}=1$ and $-1$ have only slightly higher frequencies than those with $x_{5}=0$, due to the way we specify the prior distribution of $\bm{\eta}$.

To compare the performance of the global designs, the design efficiency in (35) is used with $Q(d|\bm{\eta})$ adapted as

Q(d|\bm{\eta})=\log\det\left(n\sum_{i=1}^{N}d(\bm{x}_{i})\pi(\bm{x}_{i},\bm{\eta})(1-\pi(\bm{x}_{i},\bm{\eta}))\bm{f}(\bm{x}_{i})\bm{f}(\bm{x}_{i})^{\prime}\right)
+\frac{1}{2}\log\det\left(n\sum_{i=1}^{N}d(\bm{x}_{i})\pi(\bm{x}_{i},\bm{\eta})\bm{f}(\bm{x}_{i})\bm{f}(\bm{x}_{i})^{\prime}+\rho\bm{R}^{-1}\right)
+\frac{1}{2}\log\det\left(n\sum_{i=1}^{N}d(\bm{x}_{i})(1-\pi(\bm{x}_{i},\bm{\eta}))\bm{f}(\bm{x}_{i})\bm{f}(\bm{x}_{i})^{\prime}+\rho\bm{R}^{-1}\right)

for a global design $d$ given the $\bm{\eta}$ value, where $d(\bm{x}_{i})$ is the frequency (probability) that design $d$ assigns to candidate design point $\bm{x}_{i}$ and $\sum_{i=1}^{N}d(\bm{x}_{i})=1$. For the global optimal designs $d_{QQ}$ and $d_{C}$ obtained previously, Figure 4 shows the histograms of the $\textrm{eff}(d_{QQ},d_{C}|\bm{\eta})$ values, where the $\bm{\eta}$ values are generated from another 100-run maximin Latin hypercube design. Clearly $d_{QQ}$ is universally better than $d_{C}$; the proposed design is thus more robust to different values of $\bm{\eta}$.

Figure 3: Artificial example: global QQ optimal designs for (a) $\rho=0$ and (b) $\rho=0.3$, and (c) the global combined design.
Figure 4: Artificial example: the efficiency between the global QQ optimal design and the combined design under different $\bm{\eta}$ values: (a) $\rho=0$; (b) $\rho=0.3$.
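As a sketch, the adapted criterion $Q(d|\bm{\eta})$ can be evaluated directly from the frequency vector $d$; we write the penalty term as $\rho\bm{R}^{-1}$ for consistency with (27), and the names are illustrative.

```python
import numpy as np

def Q_continuous(d, Fc, eta, n, rho, R_inv):
    """Q(d | eta) for a continuous design d over the candidate set Fc;
    d must be nonnegative and sum to 1."""
    p = 1.0 / (1.0 + np.exp(-Fc @ eta))
    w0, w1, w2 = d * p * (1 - p), d * p, d * (1 - p)
    A0 = n * (Fc.T * w0 @ Fc)
    A1 = n * (Fc.T * w1 @ Fc) + rho * R_inv
    A2 = n * (Fc.T * w2 @ Fc) + rho * R_inv
    return (np.linalg.slogdet(A0)[1]
            + 0.5 * np.linalg.slogdet(A1)[1]
            + 0.5 * np.linalg.slogdet(A2)[1])
```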

6.2 Etching Experiment

In the etching process described in Section 1, the etchant circulates at a certain flow rate. The wafers are rotated and swung horizontally and vertically, while air is blown into the etchant under certain pressure. Five factors are involved in the etching process: the wafer rotation speed ($x_{1}$), the pressure for blowing the bubbles ($x_{2}$), the horizontal and vertical frequencies for swinging the wafers ($x_{3}$, $x_{4}$), and the flow rate of the circulating etchant ($x_{5}$). The engineers intend to conduct an experiment to study the relationship between these factors and the two QQ responses.

Because the process is newly developed, historical data on similar processes are not directly applicable to this experiment. Based on some exploratory analysis, we set $\rho=0.5$. Both domain knowledge and data have shown that the wafer appearance is worst when both the rotation speed ($x_{1}$) and the bubble pressure ($x_{2}$) are low. Accordingly, we set the prior of $\bm{\eta}$ as follows: the intercept $\eta_{0}\sim\textrm{Uniform}[0,6]$; the linear effects of rotation speed and bubble pressure follow $\textrm{Uniform}[1,5]$; the other linear effects follow $\textrm{Uniform}[-1,1]$; and the second order interactions and quadratic effects follow $\textrm{Uniform}[-0.3,0.3]$. The experimental run size is set to $n=21\times 6=126$, six times the number of effects $q=21$.

We generate a maximin Latin hypercube design of B=500 runs for the η values. For each η value, we obtain the local optimal designs D_{QQ} and D_{C}. Here the local combined design D_{C} has 2/3 of the runs generated from the local D-optimal design for logistic regression and 1/3 of the runs from the D-optimal design for linear regression. The efficiency between each pair of local designs D_{QQ} and D_{C} is reported in Figure 6(a). We can see that almost every local design D_{QQ} is better than D_{C}. Moreover, we obtain the global optimal designs d_{QQ} and d_{C} by accumulating the frequencies of the local designs. To compare d_{QQ} and d_{C}, we generate another 100-run maximin Latin hypercube design for the η values and compute the efficiencies between d_{QQ} and d_{C} under different η values, which are shown in Figure 6(b). Clearly, d_{QQ} is universally better and more robust to η than d_{C}.

Fractional factorial designs (Wu and Hamada, 2011) are also commonly used in practice. We compare the proposed design with a 3^{5-2} minimum aberration (MA) fractional factorial design, defining the contrast subgroup as I=ABD^{2}=AB^{2}CE^{2}=AC^{2}DE=BCDE^{2}. Each design point is replicated 5 times, so the overall run size is 3^{5-2}×5=135. Figure 6(c) shows the histogram of the efficiencies between d_{QQ} and the MA design; the proposed global optimal design is still superior.

Figure 5: Etching experiment: (a) the global Bayesian QQ DD-optimal design for ρ=0.5\rho=0.5 and (b) the global combined design.
Figure 6: Etching experiment: histograms of the efficiencies (a) efficiencies between local designs DQQD_{QQ} and DCD_{C} for different 𝜼\bm{\eta}’s; (b) efficiencies between global designs dQQd_{QQ} and dCd_{C}; (c) efficiencies between global design dQQd_{QQ} and the 3523^{5-2} MA fractional factorial design.

7 Discussion

In this paper, we propose the Bayesian D-optimal design criterion for QQ responses. The Bayesian approach allows us to consider both noninformative priors, as in the frequentist approach, and informative priors when domain knowledge or historical data are available. A new point-exchange algorithm is developed for efficiently constructing the proposed designs. The algorithm can also be used to construct non-factorial designs when the candidate set is not a full factorial design. Moreover, the proposed method can be directly generalized to sequential design with QQ responses. In the following, we discuss some other scenarios for the proposed method that were not considered in detail previously.

Non-conjugate Prior for η\bm{\eta}

Other than the conjugate prior p(η), we can also use a non-conjugate prior distribution \bm{\eta}\sim N(\bm{0},\tau_{0}^{2}\bm{R}_{0}). In this situation, one can apply a normal approximation to the posterior distribution p(η|z). Then the design criterion for the binary response becomes (Chaloner and Verdinelli, 1995) \mathbb{E}_{\bm{z},\bm{\eta}}\{\log p(\bm{\eta}|\bm{z})\}\approx\mathbb{E}_{\bm{\eta}}\left\{\log\det(\bm{F}^{\prime}\bm{W}_{0}\bm{F}+\rho_{0}\bm{R}_{0}^{-1})\right\}. The overall design criterion Q(X) can be updated as

Q(\bm{X})=\sum_{i=0}^{2}\mathbb{E}_{\bm{\eta}}\left\{\log\det\left(\bm{F}^{\prime}\bm{W}_{i}\bm{F}+\rho_{i}\bm{R}_{i}^{-1}\right)\right\},

where ρ_i = σ²/τ_i, and τ_i and R_i are the prior variance and correlation parameters of η, β^{(1)}, and β^{(2)}, respectively. The proposed design construction algorithm can still be applied with minor modifications.
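As a rough illustration, the summand of this criterion can be evaluated for one η value as follows; the function names are ours, the prior scaling constants are written generically as c_i, and the outer expectation over η would be approximated by averaging over prior draws.

```python
import numpy as np

def logit_weights(F, eta):
    """Diagonal entries of W0, W1, W2 under the logit link."""
    p = 1.0 / (1.0 + np.exp(-F @ eta))     # pi(x_i, eta)
    return p * (1 - p), p, 1 - p

def qq_criterion(F, eta, c, R_list):
    """Q(X | eta) = sum_i log det(F' W_i F + c_i R_i^{-1}), where the
    c_i are the prior scaling constants and R_i the prior correlations."""
    total = 0.0
    for w, c_i, R_i in zip(logit_weights(F, eta), c, R_list):
        M = F.T @ (w[:, None] * F) + c_i * np.linalg.inv(R_i)
        sign, logdet = np.linalg.slogdet(M)   # stable log-determinant
        total += logdet
    return total
```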

Multiple QQ Responses

In this paper, we focus on optimal designs for one quantitative response and one qualitative response. The proposed method can be generalized to accommodate QQ models with multiple quantitative responses Y_1, …, Y_l and binary responses Z_1, …, Z_k. For example, a multi-level qualitative response can be transformed into a set of dummy binary responses. One idea is to generalize the QQ models in (3) and (4) for both l ≥ 1 and k ≥ 1. For l ≥ 1 and k = 1, we generalize the QQ model by introducing the correlation matrix between Y_1, …, Y_l as in standard multi-response regression (Breiman and Friedman, 1997). The corresponding optimal design can then be established by studying its likelihood function. For l ≥ 1 and k > 1 with multiple binary responses, considering all 2^k conditional models (Y_1,…,Y_l | Z_1=1,…,Z_k=1), …, (Y_1,…,Y_l | Z_1=0,…,Z_k=0) only works for small k. Moreover, the construction algorithm becomes more complicated, as it needs to involve the multi-logit model (McCullagh, 1980) for the multiple binary responses. When k is relatively large, we plan to pursue an alternative QQ model and develop its corresponding optimal design method as a future research topic.

Continuous Design

The point-exchange algorithm constructs exact discrete optimal designs, which differ from theoretical continuous optimal designs. As described in Sections 5 and 6, the way we generate the frequencies of the local optimal designs is heuristic. Rigorously, the local continuous D-optimal design is defined as the probability measure ξ on the design space Ω that maximizes

\displaystyle Q(\bm{X}|\bm{\eta})=\log\det\left(\int\pi(\bm{x})(1-\pi(\bm{x}))f(\bm{x})f(\bm{x})^{\prime}d\xi(\bm{x})\right)+
\displaystyle\log\det\left(\int\left(\pi(\bm{x})f(\bm{x})f(\bm{x})^{\prime}+\rho\bm{R}_{1}^{-1}\right)d\xi(\bm{x})\right)+\log\det\left(\int\left((1-\pi(\bm{x}))f(\bm{x})f(\bm{x})^{\prime}+\rho\bm{R}_{2}^{-1}\right)d\xi(\bm{x})\right).

Yang et al. (2013) developed a method to obtain the optimal ξ for nonlinear models. It would be interesting to extend their framework to obtain the optimal ξ for QQ models.

Different QQ Models

The proposed design is not restricted to the logit model for the binary response. For example, if the probit model is used, the Bayesian D-optimal design criterion can be obtained by replacing the logit transformation with the probit transformation in both p(z|η) and p(η). The criterion can be derived similarly with minor modifications: the formula remains the same except for the following diagonal matrices,

𝑾0\displaystyle\bm{W}_{0} =diag{ϕ2(𝒇(𝒙1)𝜼)Φ(𝒇(𝒙1)𝜼)(1Φ(𝒇(𝒙1)𝜼)),,ϕ2(𝒇(𝒙n)𝜼)Φ(𝒇(𝒙n)𝜼)(1Φ(𝒇(𝒙n)𝜼))},\displaystyle=\mbox{diag}\left\{\frac{\phi^{2}(\bm{f}(\bm{x}_{1})^{\prime}\bm{\eta})}{\Phi(\bm{f}(\bm{x}_{1})^{\prime}\bm{\eta})\left(1-\Phi(\bm{f}(\bm{x}_{1})^{\prime}\bm{\eta})\right)},\ldots,\frac{\phi^{2}(\bm{f}(\bm{x}_{n})^{\prime}\bm{\eta})}{\Phi(\bm{f}(\bm{x}_{n})^{\prime}\bm{\eta})\left(1-\Phi(\bm{f}(\bm{x}_{n})^{\prime}\bm{\eta})\right)}\right\},
𝑾1\displaystyle\bm{W}_{1} =diag{Φ(𝒇(𝒙1)𝜼),,Φ(𝒇(𝒙n)𝜼)},𝑾2=𝑰𝑾1,\displaystyle=\mbox{diag}\left\{\Phi\left(\bm{f}(\bm{x}_{1})^{\prime}\bm{\eta}\right),\ldots,\Phi\left(\bm{f}(\bm{x}_{n})^{\prime}\bm{\eta}\right)\right\},\quad\bm{W}_{2}=\bm{I}-\bm{W}_{1},

where Φ and ϕ are the CDF and PDF of the standard normal distribution.
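A small sketch of these probit weights (our own helper, using the erf-based normal CDF rather than any particular statistics library):

```python
import numpy as np
from math import erf, sqrt, pi

def probit_weights(F, eta):
    """Diagonal entries of W0, W1, W2 under the probit link, per the
    formulas above: W0 has phi^2/(Phi(1-Phi)), W1 has Phi, W2 = I - W1."""
    u = F @ eta                                                  # f(x_i)' eta
    Phi = np.array([0.5 * (1 + erf(v / sqrt(2))) for v in u])    # normal CDF
    phi = np.exp(-u**2 / 2) / sqrt(2 * pi)                       # normal PDF
    return phi**2 / (Phi * (1 - Phi)), Phi, 1 - Phi
```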

It is worth pointing out that the design criterion in this work is based on the QQ model constructed from the joint model of Y|Z in (3) and Z in (4). Kang et al. (2021) created a new QQ model based on Z|U, where U is a latent continuous variable assumed to be correlated with the observed continuous response Y. Besides these conditional model structures, other model structures such as mixed graphical models (Yang et al., 2014) can also be used as long as the D-optimality criterion can be derived.

Acknowledgement

The authors were partly supported by the U.S. National Science Foundation for this research project. Dr. Lulu Kang was supported by grants CMMI-1435902, DMS-1916467, and DMS-2153029; Dr. Xinwei Deng by CMMI-1233571 and CMMI-1435996; and Dr. Ran Jin by CMMI-1435996.

References

  • Breiman and Friedman (1997) Breiman, L., and Friedman, J. H. (1997), “Predicting multivariate responses in multiple linear regression,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 59(1), 3–54.
  • Carnell (2012) Carnell, R. (2012), lhs: Latin Hypercube Samples. R package version 0.10.
    http://CRAN.R-project.org/package=lhs
  • Chaloner and Verdinelli (1995) Chaloner, K., and Verdinelli, I. (1995), “Bayesian experimental design: A review,” Statistical Science, 10(3), 273–304.
  • Chen and Ibrahim (2003) Chen, M. H., and Ibrahim, J. G. (2003), “Conjugate priors for generalized linear models,” Statistica Sinica, 13(2), 461–476.
  • Cook and Nachtsheim (1980) Cook, R. D., and Nachtsheim, C. J. (1980), “A comparison of algorithms for constructing exact D-optimal designs,” Technometrics, 22(3), 315–324.
  • Denman et al. (2011) Denman, N., McGree, J., Eccleston, J., and Duffull, S. B. (2011), “Design of experiments for bivariate binary responses modelled by Copula functions,” Computational Statistics & Data Analysis, 55(4), 1509–1520.
  • Diaconis and Ylvisaker (1979) Diaconis, P., and Ylvisaker, D. (1979), “Conjugate priors for exponential families,” The Annals of Statistics, 7(2), 269–281.
  • Draper and Hunter (1966) Draper, N. R., and Hunter, W. G. (1966), “Design of experiments for parameter estimation in multiresponse situations,” Biometrika, 53(3-4), 525–533.
  • Dror and Steinberg (2005) Dror, H. A., and Steinberg, D. M. (2005), Approximate Local D-optimal Experimental Design for Binary Response, Technical Report RP SOR-0501, Tel Aviv University.
  • Dror and Steinberg (2006) Dror, H. A., and Steinberg, D. M. (2006), “Robust experimental design for multivariate generalized linear models,” Technometrics, 48(4), 520–529.
  • Fang et al. (2000) Fang, K.-T., Lin, D. K., Winker, P., and Zhang, Y. (2000), “Uniform design: theory and application,” Technometrics, 42(3), 237–248.
  • Fedorov (1972) Fedorov, V. V. (1972), Theory of Optimal Experiments, Vol. 12, New York: Academic Press.
  • Hainy et al. (2013) Hainy, M., Müller, W. G., and Wynn, H. P. (2013), “Approximate Bayesian computation design (ABCD), an introduction,” in mODa 10–Advances in Model-Oriented Design and Analysis, Switzerland: Springer, pp. 135–143.
  • Joseph (2006) Joseph, V. R. (2006), “A Bayesian approach to the design and analysis of fractionated experiments,” Technometrics, 48(2), 219–229.
  • Joseph and Hung (2008) Joseph, V. R., and Hung, Y. (2008), “Orthogonal-maximin Latin hypercube designs,” Statistica Sinica, 18(1), 171–186.
  • Kang and Huang (2019) Kang, L., and Huang, X. (2019), “Bayesian A-Optimal Design of Experiment with Quantitative and Qualitative Responses,” Journal of Statistical Theory and Practice, 13(4), 64.
    https://doi.org/10.1007/s42519-019-0063-6
  • Kang and Joseph (2009) Kang, L., and Joseph, V. R. (2009), “Bayesian optimal single arrays for robust parameter design,” Technometrics, 51(3), 250–261.
  • Kang et al. (2018) Kang, L., Kang, X., Deng, X., and Jin, R. (2018), “A Bayesian hierarchical model for quantitative and qualitative responses,” Journal of Quality Technology, 50(3), 290–308.
    https://doi.org/10.1080/00224065.2018.1489042
  • Kang et al. (2022) Kang, X., Kang, L., Chen, W., and Deng, X. (2022), “A generative approach to modeling data with quantitative and qualitative responses,” Journal of Multivariate Analysis, 190, 104952.
    https://www.sciencedirect.com/science/article/pii/S0047259X22000045
  • Kang et al. (2021) Kang, X., Ranganathan, S., Kang, L., Gohlke, J., and Deng, X. (2021), “Bayesian auxiliary variable model for birth records data with qualitative and quantitative responses,” Journal of Statistical Computation and Simulation, 91(16), 3283–3303.
    https://doi.org/10.1080/00949655.2021.1926459
  • Khuri et al. (2006) Khuri, A. I., Mukherjee, B., Sinha, B. K., and Ghosh, M. (2006), “Design issues for generalized linear models: A review,” Statistical Science, 21(3), 376–399.
  • Li et al. (2006) Li, X., Sudarsanam, N., and Frey, D. D. (2006), “Regularities in data from factorial experiments,” Complexity, 11(5), 32–45.
  • Lin et al. (2010) Lin, C. D., Bingham, D., Sitter, R. R., and Tang, B. (2010), “A new and flexible method for constructing designs for computer experiments,” The Annals of Statistics, 38, 1460–1477.
  • McCullagh (1980) McCullagh, P. (1980), “Regression models for ordinal data,” Journal of the Royal Statistical Society. Series B (Methodological), 42, 109–142.
  • Morris and Mitchell (1995) Morris, M. D., and Mitchell, T. J. (1995), “Exploratory designs for computational experiments,” Journal of Statistical Planning and Inference, 43(3), 381–402.
  • Nguyen and Miller (1992) Nguyen, N.-K., and Miller, A. J. (1992), “A review of some exchange algorithms for constructing discrete D-optimal designs,” Computational Statistics and Data Analysis, 14(4), 489–498.
  • Qian (2012) Qian, P. Z. (2012), “Sliced Latin hypercube designs,” Journal of the American Statistical Association, 107(497), 393–399.
  • Russell et al. (2009) Russell, K. G., Woods, D. C., Lewis, S., and Eccleston, J. (2009), “D-optimal designs for Poisson regression models,” Statistica Sinica, 19, 721–730.
  • Sitter and Torsney (1995) Sitter, R. R., and Torsney, B. (1995), “Optimal designs for binary response experiments with two design variables,” Statistica Sinica, 5, 405–419.
  • Wheeler (2014) Wheeler, B. (2014), AlgDesign: Algorithmic Experimental Design. R package version 1.1-7.2.
    http://CRAN.R-project.org/package=AlgDesign
  • Woods (2010) Woods, D. (2010), “Robust designs for binary data: applications of simulated annealing,” Journal of Statistical Computation and Simulation, 80(1), 29–41.
  • Woods and Van de Ven (2012) Woods, D. C., and Van de Ven, P. (2012), “Blocked designs for experiments with correlated non-normal response,” Technometrics.
  • Woods et al. (2006) Woods, D., Lewis, S., Eccleston, J., and Russell, K. (2006), “Designs for generalized linear models with several variables and model uncertainty,” Technometrics, 48(2), 284–292.
  • Wu (1985) Wu, C. F. J. (1985), “Efficient Sequential Designs With Binary Data,” Journal of the American Statistical Association, 80(392), 974–984.
  • Wu and Hamada (2011) Wu, C. J., and Hamada, M. S. (2011), Experiments: planning, analysis, and optimization, Vol. 552, New Jersey: John Wiley & Sons.
  • Yang et al. (2014) Yang, E., Baker, Y., Ravikumar, P., Allen, G., and Liu, Z. (2014), “Mixed Graphical Models via Exponential Families,” in Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, pp. 1042–1050.
  • Yang et al. (2013) Yang, M., Biedermann, S., and Tang, E. (2013), “On optimal designs for nonlinear models: a general and efficient algorithm,” Journal of the American Statistical Association, 108(504), 1411–1420.

Appendix: Proofs, Derivations, and Extra Table

A1 Proofs and Derivations

Proof of (11)

Proof.
Ψ(𝑿|μ0,σ2)=p(𝒚,𝒛|μ0,σ2)\displaystyle\Psi(\bm{X}|\mu_{0},\sigma^{2})=\int p(\bm{y},\bm{z}|\mu_{0},\sigma^{2}) log(p(𝜷(1),𝜷(2),𝜼|𝒚,𝒛,μ0,σ2))\displaystyle\int\log\left(p(\bm{\beta}^{(1)},\bm{\beta}^{(2)},\bm{\eta}|\bm{y},\bm{z},\mu_{0},\sigma^{2})\right)
p(𝜷(1),𝜷(2),𝜼|𝒚,𝒛,μ0,σ2)d𝜷(1)d𝜷(2)d𝜼d𝒚d𝒛\displaystyle p(\bm{\beta}^{(1)},\bm{\beta}^{(2)},\bm{\eta}|\bm{y},\bm{z},\mu_{0},\sigma^{2})d\bm{\beta}^{(1)}d\bm{\beta}^{(2)}d\bm{\eta}d\bm{y}d\bm{z}
=p(𝒚,𝒛|μ0,σ2)\displaystyle=\int p(\bm{y},\bm{z}|\mu_{0},\sigma^{2}) {log(p(𝜼|𝒛))p(𝜼|𝒛)d𝜼\displaystyle\left\{\int\log\left(p(\bm{\eta}|\bm{z})\right)p(\bm{\eta}|\bm{z})d\bm{\eta}\right.
+i=12log(p(𝜷(i)|𝒚,𝒛,μ0,σ2))p(𝜷(i)|𝒚,𝒛,μ0,σ2)d𝜷(i)}d𝒚d𝒛\displaystyle+\left.\sum_{i=1}^{2}\int\log\left(p(\bm{\beta}^{(i)}|\bm{y},\bm{z},\mu_{0},\sigma^{2})\right)p(\bm{\beta}^{(i)}|\bm{y},\bm{z},\mu_{0},\sigma^{2})d\bm{\beta}^{(i)}\right\}d\bm{y}d\bm{z}
=p(𝒚,𝒛|μ0,σ2)\displaystyle=\int p(\bm{y},\bm{z}|\mu_{0},\sigma^{2}) {log(p(𝜼|𝒛))p(𝜼|𝒛)d𝜼\displaystyle\left\{\int\log\left(p(\bm{\eta}|\bm{z})\right)p(\bm{\eta}|\bm{z})d\bm{\eta}\right.
12i=12logdet{σ2(𝑭𝑽i𝑭)1}nlog(2π)n}d𝒚d𝒛\displaystyle-\left.\frac{1}{2}\sum_{i=1}^{2}\log\det\{\sigma^{-2}(\bm{F}^{\prime}\bm{V}_{i}\bm{F})^{-1}\}-n\log(2\pi)-n\right\}d\bm{y}d\bm{z}
=log(p(𝜼|𝒛))p(𝜼|𝒛)\displaystyle=\int\log(p(\bm{\eta}|\bm{z}))p(\bm{\eta}|\bm{z}) p(𝒛)d𝜼d𝒛\displaystyle p(\bm{z})d\bm{\eta}d\bm{z}
12i=12logdet{σ2(𝑭𝑽i𝑭)1}p(𝒛)d𝒛+constants\displaystyle-\frac{1}{2}\sum_{i=1}^{2}\int\log\det\{\sigma^{-2}(\bm{F}^{\prime}\bm{V}_{i}\bm{F})^{-1}\}p(\bm{z})d\bm{z}+\textrm{constants}
=log(p(𝜼|𝒛))p(𝒛,𝜼)\displaystyle=\int\log(p(\bm{\eta}|\bm{z}))p(\bm{z},\bm{\eta}) d𝜼d𝒛\displaystyle d\bm{\eta}d\bm{z}
+12i=12logdet{(𝑭𝑽i𝑭)}p(𝒛|𝜼)p(𝜼)d𝒛d𝜼+constants.\displaystyle+\frac{1}{2}\sum_{i=1}^{2}\int\log\det\{(\bm{F}^{\prime}\bm{V}_{i}\bm{F})\}p(\bm{z}|\bm{\eta})p(\bm{\eta})d\bm{z}d\bm{\eta}+\textrm{constants}.

Writing the integrals as expectations,

Ψ(𝑿|μ0,σ2)\displaystyle\Psi(\bm{X}|\mu_{0},\sigma^{2}) =𝔼𝒛,𝜼{log(p(𝜼|𝒛))}\displaystyle=\mathbb{E}_{\bm{z},\bm{\eta}}\left\{\log(p(\bm{\eta}|\bm{z}))\right\}
+12i=12𝔼𝜼𝔼𝒛|𝜼{logdet(𝑭𝑽i𝑭)}+constants.\displaystyle+\frac{1}{2}\sum_{i=1}^{2}\mathbb{E}_{\bm{\eta}}\mathbb{E}_{\bm{z}|\bm{\eta}}\left\{\log\det(\bm{F}^{\prime}\bm{V}_{i}\bm{F})\right\}+\textrm{constants}.

Proof of Theorem 1

Proof.

To show (14), we just need to show that \mathbb{E}_{\bm{z}|\bm{\eta}}\left(\log\det(\bm{F}^{\prime}\bm{V}_{i}\bm{F})\right)\leq\log\det(\bm{F}^{\prime}\bm{W}_{i}\bm{F}) for i=1,2.

First, we show that \log\det(\bm{F}^{\prime}\bm{A}\bm{F})=\log\det(\sum_{i=1}^{n}a_{i}\bm{f}(\bm{x}_{i})\bm{f}(\bm{x}_{i})^{\prime}), with \bm{A}=\mbox{diag}\{a_{1},\ldots,a_{n}\}, is a concave function of \bm{a}=(a_{1},\ldots,a_{n})^{\prime} for \bm{a}\in[0,1]^{n}. Denote \bm{B}=\mbox{diag}\{b_{1},\ldots,b_{n}\} for \bm{b}=(b_{1},\ldots,b_{n})^{\prime}\in[0,1]^{n}. We assume that \bm{F}^{\prime}\bm{A}\bm{F} is nonsingular, so that \bm{F}^{\prime}\bm{A}\bm{F} is positive definite. For any scalar t, define the function g(t)

g(t)\displaystyle g(t) =logdet(𝑭𝑨𝑭+𝑭(t𝑩)𝑭)\displaystyle=\log\det\left(\bm{F}^{\prime}\bm{A}\bm{F}+\bm{F}^{\prime}(t\bm{B})\bm{F}\right)
=logdet(𝑭𝑨𝑭)+logdet(𝑰q+t(𝑭𝑨𝑭)1/2(𝑭𝑩𝑭)(𝑭𝑨𝑭)1/2)\displaystyle=\log\det(\bm{F}^{\prime}\bm{A}\bm{F})+\log\det(\bm{I}_{q}+t(\bm{F}^{\prime}\bm{A}\bm{F})^{-1/2}(\bm{F}^{\prime}\bm{B}\bm{F})(\bm{F}^{\prime}\bm{A}\bm{F})^{-1/2})
=logdet(𝑭𝑨𝑭)+i=1qlog(1+tλi),\displaystyle=\log\det(\bm{F}^{\prime}\bm{A}\bm{F})+\sum_{i=1}^{q}\log(1+t\lambda_{i}),

where the \lambda_{i}’s are the eigenvalues of the positive semidefinite matrix (\bm{F}^{\prime}\bm{A}\bm{F})^{-1/2}(\bm{F}^{\prime}\bm{B}\bm{F})(\bm{F}^{\prime}\bm{A}\bm{F})^{-1/2}. Thus g(t) is a concave function of t for any choice of \bm{a} and \bm{b}, which is a necessary and sufficient condition for \log\det(\bm{F}^{\prime}\bm{A}\bm{F}) to be a concave function of \bm{a}. By Jensen’s inequality, if \pi(\bm{x}_{i},\bm{\eta})\in(0,1) for i=1,\ldots,n, then

𝔼𝒛|𝜼(logdet(𝑭𝑽1𝑭))\displaystyle\mathbb{E}_{\bm{z}|\bm{\eta}}\left(\log\det(\bm{F}^{\prime}\bm{V}_{1}\bm{F})\right) logdet(i=1n𝔼(Zi|𝜼)𝒇(𝒙i)𝒇(𝒙i))\displaystyle\leq\log\det\left(\sum_{i=1}^{n}\mathbb{E}(Z_{i}|\bm{\eta})\bm{f}(\bm{x}_{i})\bm{f}(\bm{x}_{i})^{\prime}\right)
=logdet(i=1nπ(𝒙i,𝜼)𝒇(𝒙i)𝒇(𝒙i)),\displaystyle=\log\det\left(\sum_{i=1}^{n}\pi(\bm{x}_{i},\bm{\eta})\bm{f}(\bm{x}_{i})\bm{f}(\bm{x}_{i})^{\prime}\right),
𝔼𝒛|𝜼(logdet(𝑭𝑽2𝑭))\displaystyle\mathbb{E}_{\bm{z}|\bm{\eta}}\left(\log\det(\bm{F}^{\prime}\bm{V}_{2}\bm{F})\right) logdet(i=1n𝔼((1Zi)|𝜼)𝒇(𝒙i)𝒇(𝒙i))\displaystyle\leq\log\det\left(\sum_{i=1}^{n}\mathbb{E}((1-Z_{i})|\bm{\eta})\bm{f}(\bm{x}_{i})\bm{f}(\bm{x}_{i})^{\prime}\right)
=logdet(i=1n(1π(𝒙i,𝜼))𝒇(𝒙i)𝒇(𝒙i)).\displaystyle=\log\det\left(\sum_{i=1}^{n}(1-\pi(\bm{x}_{i},\bm{\eta}))\bm{f}(\bm{x}_{i})\bm{f}(\bm{x}_{i})^{\prime}\right).

So we have proved (14). ∎
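A quick numerical sanity check of the concavity claim underlying this Jensen step, with an arbitrary model matrix and arbitrary weight vectors (all values illustrative):

```python
import numpy as np

# Midpoint concavity of a -> log det(F' diag(a) F) on (0, 1]^n.
rng = np.random.default_rng(1)
n, q = 8, 3
F = rng.standard_normal((n, q))   # stand-in model matrix, full column rank a.s.

def logdet_info(a):
    """log det( sum_i a_i f(x_i) f(x_i)' ) = log det(F' diag(a) F)."""
    sign, val = np.linalg.slogdet(F.T @ (a[:, None] * F))
    return val

for _ in range(100):
    a, b = rng.uniform(0.05, 1, n), rng.uniform(0.05, 1, n)
    # concavity: value at the midpoint dominates the average of the values
    assert logdet_info((a + b) / 2) >= 0.5 * (logdet_info(a) + logdet_info(b)) - 1e-9
```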

Proof of Proposition 1

Proof.
m=i=1mI(0<j=1niZij<ni)+i=1mI(j=1niZij=0)+i=1mI(j=1niZij=ni)qm=\sum_{i=1}^{m}I\left(0<\sum_{j=1}^{n_{i}}Z_{ij}<n_{i}\right)+\sum_{i=1}^{m}I\left(\sum_{j=1}^{n_{i}}Z_{ij}=0\right)+\sum_{i=1}^{m}I\left(\sum_{j=1}^{n_{i}}Z_{ij}=n_{i}\right)\geq q

If m=q, (15) is equivalent to I\left(0<\sum_{j=1}^{n_{i}}Z_{ij}<n_{i}\right)=1 for i=1,2,\ldots,m. A sufficient condition for \Pr\left(0<\sum_{j=1}^{n_{i}}Z_{ij}<n_{i}\right)\geq\kappa can be derived as follows.

Pr(0<j=1niZij<ni)\displaystyle\Pr\left(0<\sum_{j=1}^{n_{i}}Z_{ij}<n_{i}\right) =1π(𝒙i,𝜼)ni(1π(𝒙i,𝜼))niκ\displaystyle=1-\pi(\bm{x}_{i},\bm{\eta})^{n_{i}}-(1-\pi(\bm{x}_{i},\bm{\eta}))^{n_{i}}\geq\kappa
\displaystyle\Longleftrightarrow\quad π(𝒙i,𝜼)ni+(1π(𝒙i,𝜼))ni1κ.\displaystyle\pi(\bm{x}_{i},\bm{\eta})^{n_{i}}+(1-\pi(\bm{x}_{i},\bm{\eta}))^{n_{i}}\leq 1-\kappa.

Next, we develop an upper bound for \pi(\bm{x}_{i},\bm{\eta})^{n_{i}}+(1-\pi(\bm{x}_{i},\bm{\eta}))^{n_{i}}. If \pi(\bm{x}_{i},\bm{\eta})\geq 1/2, denote a=(1-\pi(\bm{x}_{i},\bm{\eta}))/\pi(\bm{x}_{i},\bm{\eta}), so that a\leq 1. Then

π(𝒙i,𝜼)ni+(1π(𝒙i,𝜼))ni=1+ani(1+a)ni1+a(1+a)ni.\pi(\bm{x}_{i},\bm{\eta})^{n_{i}}+(1-\pi(\bm{x}_{i},\bm{\eta}))^{n_{i}}=\frac{1+a^{n_{i}}}{(1+a)^{n_{i}}}\leq\frac{1+a}{(1+a)^{n_{i}}}.

If

1+a(1+a)ni1κni1+log(1κ)logπ(𝒙i,𝜼),\frac{1+a}{(1+a)^{n_{i}}}\leq 1-\kappa\quad\Longleftrightarrow\quad n_{i}\geq 1+\frac{\log(1-\kappa)}{\log\pi(\bm{x}_{i},\bm{\eta})},

then \pi(\bm{x}_{i},\bm{\eta})^{n_{i}}+(1-\pi(\bm{x}_{i},\bm{\eta}))^{n_{i}}\leq 1-\kappa. If \pi(\bm{x}_{i},\bm{\eta})<1/2, denote a=\pi(\bm{x}_{i},\bm{\eta})/(1-\pi(\bm{x}_{i},\bm{\eta})), so that a<1. Then

π(𝒙i,𝜼)ni+(1π(𝒙i,𝜼))ni=1+ani(1+a)ni<1+a(1+a)ni.\pi(\bm{x}_{i},\bm{\eta})^{n_{i}}+(1-\pi(\bm{x}_{i},\bm{\eta}))^{n_{i}}=\frac{1+a^{n_{i}}}{(1+a)^{n_{i}}}<\frac{1+a}{(1+a)^{n_{i}}}.

The sufficient condition becomes

1+a(1+a)ni1κni1+log(1κ)log(1π(𝒙i,𝜼)).\frac{1+a}{(1+a)^{n_{i}}}\leq 1-\kappa\quad\Longleftrightarrow\quad n_{i}\geq 1+\frac{\log(1-\kappa)}{\log(1-\pi(\bm{x}_{i},\bm{\eta}))}.

Combining the two cases, the sufficient condition on nin_{i} for i=1,,mi=1,\ldots,m is (16). It is known that

2\left(\pi(\bm{x}_{i},\bm{\eta})(1-\pi(\bm{x}_{i},\bm{\eta}))\right)^{n_{i}/2}\leq\pi(\bm{x}_{i},\bm{\eta})^{n_{i}}+(1-\pi(\bm{x}_{i},\bm{\eta}))^{n_{i}}\leq 1-\kappa.

The necessary condition is

2\left(\pi(\bm{x}_{i},\bm{\eta})(1-\pi(\bm{x}_{i},\bm{\eta}))\right)^{n_{i}/2}\leq 1-\kappa\quad\Longleftrightarrow\quad n_{i}\geq\frac{2\log\left(\frac{1-\kappa}{2}\right)}{\log\pi(\bm{x}_{i},\bm{\eta})+\log(1-\pi(\bm{x}_{i},\bm{\eta}))}.
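The sufficient condition above can be turned into a minimal replicate-size calculator; `min_replicates` is our illustrative helper, combining the two cases through max(π, 1−π):

```python
import math

def min_replicates(pi_i, kappa):
    """Smallest n_i meeting the sufficient condition derived above:
    n_i >= 1 + log(1 - kappa) / log(max(pi_i, 1 - pi_i))."""
    p = max(pi_i, 1 - pi_i)
    return math.ceil(1 + math.log(1 - kappa) / math.log(p))

# the sufficient condition indeed forces pi^n + (1 - pi)^n <= 1 - kappa
for pi_i in (0.1, 0.3, 0.5, 0.8, 0.95):
    for kappa in (0.8, 0.9, 0.99):
        n = min_replicates(pi_i, kappa)
        assert pi_i**n + (1 - pi_i)**n <= 1 - kappa + 1e-12
```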

Proof of Proposition 2

Proof.

If m>qm>q, (15) is equivalent to

i=1mI(j=1niZij=0)mq, and i=1mI(j=1niZij=ni)mq.\sum_{i=1}^{m}I\left(\sum_{j=1}^{n_{i}}Z_{ij}=0\right)\leq m-q,\textrm{ and }\sum_{i=1}^{m}I\left(\sum_{j=1}^{n_{i}}Z_{ij}=n_{i}\right)\leq m-q.

For the two inequalities to hold with high probability, we require

i=1m𝔼{I(j=1niZij=0)}=i=1m(1π(𝒙i,𝜼))nimq,\displaystyle\sum_{i=1}^{m}\mathbb{E}\left\{I\left(\sum_{j=1}^{n_{i}}Z_{ij}=0\right)\right\}=\sum_{i=1}^{m}(1-\pi(\bm{x}_{i},\bm{\eta}))^{n_{i}}\leq m-q, (36)
and i=1m𝔼{I(j=1niZij=ni)}=i=1mπ(𝒙i,𝜼)nimq.\displaystyle\sum_{i=1}^{m}\mathbb{E}\left\{I\left(\sum_{j=1}^{n_{i}}Z_{ij}=n_{i}\right)\right\}=\sum_{i=1}^{m}\pi(\bm{x}_{i},\bm{\eta})^{n_{i}}\leq m-q. (37)

It is known that

m(i=1m(1πmax)ni)1/mi=1m(1πmax)nii=1m(1π(𝒙i,𝜼))nii=1m(1πmin)nim(1πmin)n0.m\left(\prod_{i=1}^{m}(1-\pi_{\max})^{n_{i}}\right)^{1/m}\leq\sum_{i=1}^{m}(1-\pi_{\max})^{n_{i}}\leq\sum_{i=1}^{m}(1-\pi(\bm{x}_{i},\bm{\eta}))^{n_{i}}\leq\sum_{i=1}^{m}(1-\pi_{\min})^{n_{i}}\leq m(1-\pi_{\min})^{n_{0}}.

Thus one sufficient condition for (36) is

m(1πmin)n0mqn0log(1q/m)log(1πmin).m(1-\pi_{\min})^{n_{0}}\leq m-q\quad\Longleftrightarrow\quad n_{0}\geq\frac{\log(1-q/m)}{\log(1-\pi_{\min})}.

One necessary condition for (36) is

m(i=1m(1πmax)ni)1/mmqi=1mnimlog(1q/m)log(1πmax).m\left(\prod_{i=1}^{m}(1-\pi_{\max})^{n_{i}}\right)^{1/m}\leq m-q\quad\Longleftrightarrow\sum_{i=1}^{m}n_{i}\geq m\frac{\log(1-q/m)}{\log(1-\pi_{\max})}.

Similarly,

m(i=1mπminni)1/mi=1mπminnii=1mπ(𝒙i,𝜼)nii=1mπmaxnimπmaxn0.m\left(\prod_{i=1}^{m}\pi_{\min}^{n_{i}}\right)^{1/m}\leq\sum_{i=1}^{m}\pi_{\min}^{n_{i}}\leq\sum_{i=1}^{m}\pi(\bm{x}_{i},\bm{\eta})^{n_{i}}\leq\sum_{i=1}^{m}\pi_{\max}^{n_{i}}\leq m\cdot\pi_{\max}^{n_{0}}.

Thus one sufficient condition for (37) is

mπmaxn0mqn0log(1q/m)logπmax.m\cdot\pi_{\max}^{n_{0}}\leq m-q\quad\Longleftrightarrow\quad n_{0}\geq\frac{\log(1-q/m)}{\log\pi_{\max}}.

One necessary condition for (37) is

m(i=1mπminni)1/mmqi=1mnimlog(1q/m)logπmin.m\left(\prod_{i=1}^{m}\pi_{\min}^{n_{i}}\right)^{1/m}\leq m-q\quad\Longleftrightarrow\sum_{i=1}^{m}n_{i}\geq m\frac{\log(1-q/m)}{\log\pi_{\min}}.

Thus the sufficient condition for (36) and (37) derived here is

n0max{1,log(1q/m)log(1πmin),log(1q/m)logπmax},n_{0}\geq\max\left\{1,\frac{\log(1-q/m)}{\log(1-\pi_{\min})},\frac{\log(1-q/m)}{\log\pi_{\max}}\right\},

or equivalently,

i=1mnimn0mmax{1,log(1q/m)log(1πmin),log(1q/m)logπmax}.\sum_{i=1}^{m}n_{i}\geq m\cdot n_{0}\geq m\cdot\max\left\{1,\frac{\log(1-q/m)}{\log(1-\pi_{\min})},\frac{\log(1-q/m)}{\log\pi_{\max}}\right\}.

The necessary condition for (36) and (37) derived here is

\sum_{i=1}^{m}n_{i}\geq m\cdot\max\left\{1,\frac{\log(1-q/m)}{\log(1-\pi_{\max})},\frac{\log(1-q/m)}{\log\pi_{\min}}\right\}.

It is easy to see that the lower bound in the necessary condition is smaller than the one in the sufficient condition. ∎
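The sufficient condition on the common replicate size n_0 admits a small calculator (our illustrative helper; it assumes m > q so that log(1 − q/m) is defined):

```python
import math

def min_common_replicates(q, m, pi_min, pi_max):
    """Smallest common replicate size n_0 meeting the sufficient condition:
    n_0 >= max{1, log(1-q/m)/log(1-pi_min), log(1-q/m)/log(pi_max)}.
    Requires m > q and 0 < pi_min <= pi_max < 1."""
    c = math.log(1 - q / m)
    return math.ceil(max(1.0, c / math.log(1 - pi_min), c / math.log(pi_max)))

n0 = min_common_replicates(q=21, m=40, pi_min=0.2, pi_max=0.8)
# with this n0, both (36) and (37) hold:
assert 40 * (1 - 0.2)**n0 <= 40 - 21 and 40 * 0.8**n0 <= 40 - 21
```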

Proof of Theorem 2

Proof.

We only need to show that \log\det(\bm{F}^{\prime}\bm{A}\bm{F}+\rho\bm{R}_{i}^{-1})=\log\det(\sum_{i=1}^{n}a_{i}\bm{f}(\bm{x}_{i})\bm{f}(\bm{x}_{i})^{\prime}+\rho\bm{R}_{i}^{-1}), with \bm{A}=\mbox{diag}\{a_{1},\ldots,a_{n}\}, is a concave function of \bm{a}=(a_{1},\ldots,a_{n})^{\prime} for \bm{a}\in[0,1]^{n}. Denote \bm{B}=\mbox{diag}\{b_{1},\ldots,b_{n}\} for \bm{b}=(b_{1},\ldots,b_{n})^{\prime}\in[0,1]^{n} and \bm{K}=\bm{F}^{\prime}\bm{A}\bm{F}+\rho\bm{R}_{i}^{-1}. Define g(t) as follows.

\displaystyle g(t)=\log\det\left(\bm{F}^{\prime}\bm{A}\bm{F}+\rho\bm{R}_{i}^{-1}+\bm{F}^{\prime}(t\bm{B})\bm{F}\right)
\displaystyle=\log\det(\bm{F}^{\prime}\bm{A}\bm{F}+\rho\bm{R}_{i}^{-1})+\log\det\left(\bm{I}_{q}+t\bm{K}^{-1/2}(\bm{F}^{\prime}\bm{B}\bm{F})\bm{K}^{-1/2}\right)
\displaystyle=\log\det(\bm{F}^{\prime}\bm{A}\bm{F}+\rho\bm{R}_{i}^{-1})+\sum_{j=1}^{q}\log(1+t\lambda_{j}),

where the \lambda_{j}’s are the eigenvalues of the positive semidefinite matrix \bm{K}^{-1/2}(\bm{F}^{\prime}\bm{B}\bm{F})\bm{K}^{-1/2}. Therefore, \lambda_{j}\geq 0 for j=1,\ldots,q. Thus g(t) is a concave function of t for any choice of \bm{a}\in[0,1]^{n}, which is a necessary and sufficient condition for \log\det(\bm{F}^{\prime}\bm{A}\bm{F}+\rho\bm{R}_{i}^{-1}) to be a concave function of \bm{a}. ∎

Proof of the deletion function (33)

Proof.

Denote 𝑭i\bm{F}_{-i} as the model matrix for 𝑿i\bm{X}_{-i}, which is the design matrix without the iith design point, and 𝑾0,i\bm{W}_{0,-i}, 𝑾1,i\bm{W}_{1,-i}, and 𝑾2,i\bm{W}_{2,-i} the weight matrices accordingly. According to the properties of matrix determinants, we can show the following.

\displaystyle\det\left(\bm{F}^{\prime}_{-i}\bm{W}_{0,-i}\bm{F}_{-i}\right)=\det\left(\sum_{j\neq i}\pi(\bm{x}_{j},\bm{\eta})(1-\pi(\bm{x}_{j},\bm{\eta}))\bm{f}(\bm{x}_{j})\bm{f}(\bm{x}_{j})^{\prime}\right)
=det(𝑭𝑾0𝑭π(𝒙i,𝜼)(1π(𝒙i,𝜼))𝒇(𝒙i)𝒇(𝒙i))\displaystyle=\det\left(\bm{F}^{\prime}\bm{W}_{0}\bm{F}-\pi(\bm{x}_{i},\bm{\eta})(1-\pi(\bm{x}_{i},\bm{\eta}))\bm{f}(\bm{x}_{i})\bm{f}(\bm{x}_{i})^{\prime}\right)
=det(𝑭𝑾0𝑭)[1π(𝒙i,𝜼)(1π(𝒙i,𝜼))𝒇(𝒙i)(𝑭𝑾0𝑭)1𝒇(𝒙i)]\displaystyle=\det(\bm{F}^{\prime}\bm{W}_{0}\bm{F})\left[1-\pi(\bm{x}_{i},\bm{\eta})(1-\pi(\bm{x}_{i},\bm{\eta}))\bm{f}(\bm{x}_{i})^{\prime}(\bm{F}^{\prime}\bm{W}_{0}\bm{F})^{-1}\bm{f}(\bm{x}_{i})\right]
=det(𝑭𝑾0𝑭)[1π(𝒙i,𝜼)(1π(𝒙i,𝜼))v0(𝒙i)].\displaystyle=\det(\bm{F}^{\prime}\bm{W}_{0}\bm{F})\left[1-\pi(\bm{x}_{i},\bm{\eta})(1-\pi(\bm{x}_{i},\bm{\eta}))v_{0}(\bm{x}_{i})\right].
\displaystyle\det\left(\bm{F}^{\prime}_{-i}\bm{W}_{1,-i}\bm{F}_{-i}+\rho\bm{R}_{1}^{-1}\right)
\displaystyle=\det\left(\sum_{j\neq i}\pi(\bm{x}_{j},\bm{\eta})\bm{f}(\bm{x}_{j})\bm{f}(\bm{x}_{j})^{\prime}+\rho\bm{R}_{1}^{-1}\right)
\displaystyle=\det\left(\bm{F}^{\prime}\bm{W}_{1}\bm{F}+\rho\bm{R}_{1}^{-1}-\pi(\bm{x}_{i},\bm{\eta})\bm{f}(\bm{x}_{i})\bm{f}(\bm{x}_{i})^{\prime}\right)
\displaystyle=\det\left(\bm{F}^{\prime}\bm{W}_{1}\bm{F}+\rho\bm{R}_{1}^{-1}\right)\left[1-\pi(\bm{x}_{i},\bm{\eta})\bm{f}(\bm{x}_{i})^{\prime}(\bm{F}^{\prime}\bm{W}_{1}\bm{F}+\rho\bm{R}_{1}^{-1})^{-1}\bm{f}(\bm{x}_{i})\right]
\displaystyle=\det\left(\bm{F}^{\prime}\bm{W}_{1}\bm{F}+\rho\bm{R}_{1}^{-1}\right)\left[1-\pi(\bm{x}_{i},\bm{\eta})v_{1}(\bm{x}_{i})\right].

Similarly,

\det\left(\bm{F}^{\prime}_{-i}\bm{W}_{2,-i}\bm{F}_{-i}+\rho\bm{R}_{2}^{-1}\right)=\det\left(\bm{F}^{\prime}\bm{W}_{2}\bm{F}+\rho\bm{R}_{2}^{-1}\right)\left[1-(1-\pi(\bm{x}_{i},\bm{\eta}))v_{2}(\bm{x}_{i})\right].

The deletion function (33) is then obtained. ∎
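The factorizations above are instances of the matrix determinant lemma, det(M − w f f′) = det(M)(1 − w f′M⁻¹f); a quick numerical check with arbitrary illustrative values:

```python
import numpy as np

rng = np.random.default_rng(2)
q = 4
F = rng.standard_normal((10, q))
M = F.T @ F + 0.5 * np.eye(q)        # an arbitrary positive definite matrix
f = rng.standard_normal(q)           # f(x_i) of the deleted point
w = 0.3                              # its weight, e.g. pi(1 - pi)

# matrix determinant lemma for a rank-one downdate
lhs = np.linalg.det(M - w * np.outer(f, f))
rhs = np.linalg.det(M) * (1 - w * f @ np.linalg.solve(M, f))
assert abs(lhs - rhs) < 1e-6 * max(1.0, abs(rhs))
```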

Shortcut formulas for Mi,j\bm{M}_{i,-j}.

Proof.

Denote 𝑴i,j\bm{M}_{i,-j} for i=0,1,2i=0,1,2 as the updated 𝑴i\bm{M}_{i} when the jjth design point is removed. The following shortcut formulas are used in constructing the initial design.

\bm{M}_{i,-j}=\bm{M}_{i}+\left[\frac{w_{i}(\bm{x}_{j})}{1-w_{i}(\bm{x}_{j})v_{i}(\bm{x}_{j})}\right]\bm{M}_{i}\bm{f}(\bm{x}_{j})\bm{f}(\bm{x}_{j})^{\prime}\bm{M}_{i},\quad\textrm{ for }i=0,1,2,

where w_{0}(\bm{x}_{j})=\pi(\bm{x}_{j},\bm{\eta})(1-\pi(\bm{x}_{j},\bm{\eta})), w_{1}(\bm{x}_{j})=\pi(\bm{x}_{j},\bm{\eta}), and w_{2}(\bm{x}_{j})=1-\pi(\bm{x}_{j},\bm{\eta}) are the diagonal weights of \bm{W}_{0}, \bm{W}_{1}, and \bm{W}_{2}, respectively. ∎
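This rank-one update is an instance of the Sherman–Morrison formula; a numerical check with arbitrary illustrative values (not the paper's matrices):

```python
import numpy as np

rng = np.random.default_rng(3)
q = 4
A = rng.standard_normal((q, q))
A = A @ A.T + np.eye(q)              # a positive definite information matrix
Minv = np.linalg.inv(A)              # plays the role of M_i
f = rng.standard_normal(q)           # f(x_j) of the removed point
w = 0.2                              # its diagonal weight

# Sherman-Morrison: (A - w f f')^{-1} = A^{-1} + w A^{-1} f f' A^{-1} / (1 - w f' A^{-1} f)
updated = Minv + (w / (1 - w * f @ Minv @ f)) * np.outer(Minv @ f, Minv @ f)
direct = np.linalg.inv(A - w * np.outer(f, f))
assert np.allclose(updated, direct)
```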

Proof of Δ(x,xi)\Delta(\bm{x},\bm{x}_{i}) in (34).

Proof.

Denote 𝑭\bm{F}^{*} as the updated model matrix for updated design 𝑿\bm{X}^{*}, and 𝑾i\bm{W}_{i}^{*} for i=0,1,2i=0,1,2 as the updated weight matrices accordingly. Let

\bm{G}_{0}=\left[\sqrt{\pi(\bm{x},\bm{\eta})(1-\pi(\bm{x},\bm{\eta}))}\,\bm{f}(\bm{x}),\;i\sqrt{\pi(\bm{x}_{i},\bm{\eta})(1-\pi(\bm{x}_{i},\bm{\eta}))}\,\bm{f}(\bm{x}_{i})\right],

where i=1i=\sqrt{-1}.

\displaystyle\det(\bm{F}^{*^{\prime}}\bm{W}_{0}^{*}\bm{F}^{*})=\det\left(\sum_{j=1}^{n}\pi(\bm{x}_{j},\bm{\eta})(1-\pi(\bm{x}_{j},\bm{\eta}))\bm{f}(\bm{x}_{j})\bm{f}(\bm{x}_{j})^{\prime}\right.
π(𝒙i,𝜼)(1π(𝒙i,𝜼))𝒇(𝒙i)𝒇(𝒙i)+π(𝒙,𝜼)(1π(𝒙,𝜼))𝒇(𝒙)𝒇(𝒙))\displaystyle\left.-\pi(\bm{x}_{i},\bm{\eta})(1-\pi(\bm{x}_{i},\bm{\eta}))\bm{f}(\bm{x}_{i})\bm{f}(\bm{x}_{i})^{\prime}+\pi(\bm{x},\bm{\eta})(1-\pi(\bm{x},\bm{\eta}))\bm{f}(\bm{x})\bm{f}(\bm{x})^{\prime}\right)
=det(𝑭𝑾0𝑭+𝑮0𝑮0)\displaystyle=\det(\bm{F}^{\prime}\bm{W}_{0}\bm{F}+\bm{G}_{0}\bm{G}_{0}^{\prime})
=det(𝑭𝑾0𝑭)det(𝑰2+𝑮0𝑴0𝑮0).\displaystyle=\det(\bm{F}^{\prime}\bm{W}_{0}\bm{F})\det\left(\bm{I}_{2}+\bm{G}_{0}^{\prime}\bm{M}_{0}\bm{G}_{0}\right).

It can be computed that

det(𝑰2+𝑮0𝑴0𝑮0)=Δ0(𝒙,𝒙i),\det\left(\bm{I}_{2}+\bm{G}_{0}^{\prime}\bm{M}_{0}\bm{G}_{0}\right)=\Delta_{0}(\bm{x},\bm{x}_{i}),

where Δ0(𝒙,𝒙i)\Delta_{0}(\bm{x},\bm{x}_{i}) is

Δ0(𝒙,𝒙i)\displaystyle\Delta_{0}(\bm{x},\bm{x}_{i}) =[1+π(𝒙,𝜼)(1π(𝒙,𝜼))v0(𝒙)][1π(𝒙i,𝜼)(1π(𝒙i,𝜼))v0(𝒙i)]\displaystyle=\left[1+\pi(\bm{x},\bm{\eta})(1-\pi(\bm{x},\bm{\eta}))v_{0}(\bm{x})\right]\left[1-\pi(\bm{x}_{i},\bm{\eta})(1-\pi(\bm{x}_{i},\bm{\eta}))v_{0}(\bm{x}_{i})\right]
+π(𝒙,𝜼)(1π(𝒙,𝜼))π(𝒙i,𝜼)(1π(𝒙i,𝜼))v0(𝒙,𝒙i)2.\displaystyle+\pi(\bm{x},\bm{\eta})(1-\pi(\bm{x},\bm{\eta}))\pi(\bm{x}_{i},\bm{\eta})(1-\pi(\bm{x}_{i},\bm{\eta}))v_{0}(\bm{x},\bm{x}_{i})^{2}.

Similarly, define

\bm{G}_{1} =\left[\sqrt{\pi(\bm{x},\bm{\eta})}\,\bm{f}(\bm{x}),\;i\sqrt{\pi(\bm{x}_{i},\bm{\eta})}\,\bm{f}(\bm{x}_{i})\right]\quad\textrm{and}
\bm{G}_{2} =\left[\sqrt{1-\pi(\bm{x},\bm{\eta})}\,\bm{f}(\bm{x}),\;i\sqrt{1-\pi(\bm{x}_{i},\bm{\eta})}\,\bm{f}(\bm{x}_{i})\right].

Following a similar derivation,

\det(\bm{F}^{*^{\prime}}\bm{W}_{1}^{*}\bm{F}^{*}+\rho\bm{R}^{-1}) =\det(\bm{F}^{\prime}\bm{W}_{1}\bm{F}+\rho\bm{R}^{-1})\Delta_{1}(\bm{x},\bm{x}_{i}),
\det(\bm{F}^{*^{\prime}}\bm{W}_{2}^{*}\bm{F}^{*}+\rho\bm{R}^{-1}) =\det(\bm{F}^{\prime}\bm{W}_{2}\bm{F}+\rho\bm{R}^{-1})\Delta_{2}(\bm{x},\bm{x}_{i}),

where Δ1(𝒙,𝒙i)\Delta_{1}(\bm{x},\bm{x}_{i}) and Δ2(𝒙,𝒙i)\Delta_{2}(\bm{x},\bm{x}_{i}) are

Δ1(𝒙,𝒙i)\displaystyle\Delta_{1}(\bm{x},\bm{x}_{i}) =[1+π(𝒙,𝜼)v1(𝒙)][1π(𝒙i,𝜼)v1(𝒙i)]+π(𝒙,𝜼)π(𝒙i,𝜼)v1(𝒙,𝒙i)2,\displaystyle=\left[1+\pi(\bm{x},\bm{\eta})v_{1}(\bm{x})\right]\left[1-\pi(\bm{x}_{i},\bm{\eta})v_{1}(\bm{x}_{i})\right]+\pi(\bm{x},\bm{\eta})\pi(\bm{x}_{i},\bm{\eta})v_{1}(\bm{x},\bm{x}_{i})^{2},
Δ2(𝒙,𝒙i)\displaystyle\Delta_{2}(\bm{x},\bm{x}_{i}) =[1+(1π(𝒙,𝜼))v2(𝒙)][1(1π(𝒙i,𝜼))v2(𝒙i)]\displaystyle=\left[1+(1-\pi(\bm{x},\bm{\eta}))v_{2}(\bm{x})\right]\left[1-(1-\pi(\bm{x}_{i},\bm{\eta}))v_{2}(\bm{x}_{i})\right]
+(1π(𝒙,𝜼))(1π(𝒙i,𝜼))v2(𝒙,𝒙i)2.\displaystyle+(1-\pi(\bm{x},\bm{\eta}))(1-\pi(\bm{x}_{i},\bm{\eta}))v_{2}(\bm{x},\bm{x}_{i})^{2}.

Thus Δ(𝒙,𝒙i)\Delta(\bm{x},\bm{x}_{i}) is computed as in (34). ∎
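The exchange factor can also be verified numerically: swapping one design point perturbs the information matrix by a rank-two term, and the determinant ratio should equal $\Delta_{1}(\bm{x},\bm{x}_{i})$ exactly. A sketch with stand-in quantities (all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
p = 4
B = rng.standard_normal((p, p))
A = B @ B.T + p * np.eye(p)          # stands in for F' W_1 F + rho * R^{-1}
M = np.linalg.inv(A)
f_new, f_old = rng.standard_normal(p), rng.standard_normal(p)
w_new, w_old = 0.3, 0.2              # pi(x, eta) and pi(x_i, eta)

v = lambda a, b: a @ M @ b           # plays the role of v_1(., .)
# Delta_1 = [1 + w v(x)][1 - w_i v(x_i)] + w w_i v(x, x_i)^2
delta1 = ((1 + w_new * v(f_new, f_new)) * (1 - w_old * v(f_old, f_old))
          + w_new * w_old * v(f_new, f_old) ** 2)
# rank-two exchange: add the new point, remove the old one
lhs = np.linalg.det(A + w_new * np.outer(f_new, f_new)
                      - w_old * np.outer(f_old, f_old))
assert np.isclose(lhs, np.linalg.det(A) * delta1)
```

Replacing the weights with $\pi(1-\pi)$ or $1-\pi$ gives the analogous checks for $\Delta_{0}$ and $\Delta_{2}$.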

Proof of update formulas $\bm{M}_{i}^{*}$ for $i=0,1,2$.

Proof.

Use the same notation of 𝑮i\bm{G}_{i} for i=0,1,2i=0,1,2 as in the previous proof. Define the functions

𝑺i(𝒙)=𝑴i𝒇(𝒙)𝒇(𝒙)𝑴i,for i=0,1,2,\displaystyle\bm{S}_{i}(\bm{x})=\bm{M}_{i}\bm{f}(\bm{x})\bm{f}(\bm{x})^{\prime}\bm{M}_{i},\quad\textrm{for }i=0,1,2,
𝑺i(𝒙1,𝒙2)=𝑴i𝒇(𝒙1)𝒇(𝒙2)𝑴i,for i=0,1,2.\displaystyle\bm{S}_{i}(\bm{x}_{1},\bm{x}_{2})=\bm{M}_{i}\bm{f}(\bm{x}_{1})\bm{f}(\bm{x}_{2})^{\prime}\bm{M}_{i},\quad\textrm{for }i=0,1,2.

It is straightforward to derive

𝑴0\displaystyle\bm{M}_{0}^{*} =(𝑭𝑾0𝑭+𝑮0𝑮0)1\displaystyle=\left(\bm{F}^{\prime}\bm{W}_{0}\bm{F}+\bm{G}_{0}\bm{G}_{0}^{\prime}\right)^{-1}
=\bm{M}_{0}-\bm{M}_{0}\bm{G}_{0}\left(\bm{I}_{2}+\bm{G}_{0}^{\prime}\bm{M}_{0}\bm{G}_{0}\right)^{-1}\bm{G}_{0}^{\prime}\bm{M}_{0}.

For simpler notation, denote a(𝒙)=π(𝒙,𝜼)(1π(𝒙,𝜼))a(\bm{x})=\pi(\bm{x},\bm{\eta})(1-\pi(\bm{x},\bm{\eta})) and a(𝒙i)=π(𝒙i,𝜼)(1π(𝒙i,𝜼))a(\bm{x}_{i})=\pi(\bm{x}_{i},\bm{\eta})(1-\pi(\bm{x}_{i},\bm{\eta})).

(𝑰2+𝑮0𝑴0𝑮0)1\displaystyle\left(\bm{I}_{2}+\bm{G}_{0}^{\prime}\bm{M}_{0}\bm{G}_{0}\right)^{-1}
=\displaystyle= (1+a(𝒙)v0(𝒙),ia(𝒙)a(𝒙i)v0(𝒙,𝒙i)ia(𝒙)a(𝒙i)v0(𝒙,𝒙i),1a(𝒙i)v0(𝒙i))1\displaystyle\left(\begin{array}[]{ll}1+a(\bm{x})v_{0}(\bm{x}),&i\sqrt{a(\bm{x})a(\bm{x}_{i})}v_{0}(\bm{x},\bm{x}_{i})\\ i\sqrt{a(\bm{x})a(\bm{x}_{i})}v_{0}(\bm{x},\bm{x}_{i}),&1-a(\bm{x}_{i})v_{0}(\bm{x}_{i})\end{array}\right)^{-1}
=\displaystyle= 1Δ0(𝒙,𝒙i)(1a(𝒙i)v0(𝒙i),ia(𝒙)a(𝒙i)v0(𝒙,𝒙i)ia(𝒙)a(𝒙i)v0(𝒙,𝒙i),1+a(𝒙)v0(𝒙)).\displaystyle\frac{1}{\Delta_{0}(\bm{x},\bm{x}_{i})}\left(\begin{array}[]{ll}1-a(\bm{x}_{i})v_{0}(\bm{x}_{i}),&-i\sqrt{a(\bm{x})a(\bm{x}_{i})}v_{0}(\bm{x},\bm{x}_{i})\\ -i\sqrt{a(\bm{x})a(\bm{x}_{i})}v_{0}(\bm{x},\bm{x}_{i}),&1+a(\bm{x})v_{0}(\bm{x})\end{array}\right).
\bm{M}_{0}^{*} =\bm{M}_{0}-\frac{1}{\Delta_{0}(\bm{x},\bm{x}_{i})}\bm{M}_{0}\left[\sqrt{a(\bm{x})}\bm{f}(\bm{x}),\;i\sqrt{a(\bm{x}_{i})}\bm{f}(\bm{x}_{i})\right]
\times\left(\begin{array}{ll}1-a(\bm{x}_{i})v_{0}(\bm{x}_{i}),&-i\sqrt{a(\bm{x})a(\bm{x}_{i})}v_{0}(\bm{x},\bm{x}_{i})\\ -i\sqrt{a(\bm{x})a(\bm{x}_{i})}v_{0}(\bm{x},\bm{x}_{i}),&1+a(\bm{x})v_{0}(\bm{x})\end{array}\right)\left[\begin{array}{c}\sqrt{a(\bm{x})}\bm{f}(\bm{x})^{\prime}\\ i\sqrt{a(\bm{x}_{i})}\bm{f}(\bm{x}_{i})^{\prime}\end{array}\right]\bm{M}_{0}
=𝑴01Δ0(𝒙,𝒙i){π(𝒙,𝜼)(1π(𝒙,𝜼))[1π(𝒙i,𝜼)(1π(𝒙i,𝜼))v0(𝒙i)]𝑺0(𝒙)\displaystyle=\bm{M}_{0}-\frac{1}{\Delta_{0}(\bm{x},\bm{x}_{i})}\left\{\pi(\bm{x},\bm{\eta})(1-\pi(\bm{x},\bm{\eta}))\left[1-\pi(\bm{x}_{i},\bm{\eta})(1-\pi(\bm{x}_{i},\bm{\eta}))v_{0}(\bm{x}_{i})\right]\bm{S}_{0}(\bm{x})\right.
π(𝒙i,𝜼)(1π(𝒙i,𝜼))[1+π(𝒙,𝜼)(1π(𝒙,𝜼))v0(𝒙)]𝑺0(𝒙i)\displaystyle\left.-\pi(\bm{x}_{i},\bm{\eta})(1-\pi(\bm{x}_{i},\bm{\eta}))\left[1+\pi(\bm{x},\bm{\eta})(1-\pi(\bm{x},\bm{\eta}))v_{0}(\bm{x})\right]\bm{S}_{0}(\bm{x}_{i})\right.
+π(𝒙i,𝜼)(1π(𝒙i,𝜼))π(𝒙,𝜼)(1π(𝒙,𝜼))v0(𝒙,𝒙i)[𝑺0(𝒙,𝒙i)+𝑺0(𝒙i,𝒙)]}\displaystyle+\left.\pi(\bm{x}_{i},\bm{\eta})(1-\pi(\bm{x}_{i},\bm{\eta}))\pi(\bm{x},\bm{\eta})(1-\pi(\bm{x},\bm{\eta}))v_{0}(\bm{x},\bm{x}_{i})\left[\bm{S}_{0}(\bm{x},\bm{x}_{i})+\bm{S}_{0}(\bm{x}_{i},\bm{x})\right]\right\}

Update formulas for 𝑴1\bm{M}_{1}^{*} and 𝑴2\bm{M}_{2}^{*} can be derived similarly as follows.

𝑴1=𝑴1\displaystyle\bm{M}_{1}^{*}=\bm{M}_{1} 1Δ1(𝒙,𝒙i){π(𝒙,𝜼)[1π(𝒙i,𝜼)v1(𝒙i)]𝑺1(𝒙)π(𝒙i,𝜼)[1+π(𝒙,𝜼)v1(𝒙)]𝑺1(𝒙i)\displaystyle-\frac{1}{\Delta_{1}(\bm{x},\bm{x}_{i})}\left\{\pi(\bm{x},\bm{\eta})\left[1-\pi(\bm{x}_{i},\bm{\eta})v_{1}(\bm{x}_{i})\right]\bm{S}_{1}(\bm{x})-\pi(\bm{x}_{i},\bm{\eta})\left[1+\pi(\bm{x},\bm{\eta})v_{1}(\bm{x})\right]\bm{S}_{1}(\bm{x}_{i})\right.
+π(𝒙i,𝜼)π(𝒙,𝜼)v1(𝒙,𝒙i)[𝑺1(𝒙,𝒙i)+𝑺1(𝒙i,𝒙)]},\displaystyle+\left.\pi(\bm{x}_{i},\bm{\eta})\pi(\bm{x},\bm{\eta})v_{1}(\bm{x},\bm{x}_{i})\left[\bm{S}_{1}(\bm{x},\bm{x}_{i})+\bm{S}_{1}(\bm{x}_{i},\bm{x})\right]\right\},
𝑴2=𝑴2\displaystyle\bm{M}_{2}^{*}=\bm{M}_{2} 1Δ2(𝒙,𝒙i){(1π(𝒙,𝜼))[1(1π(𝒙i,𝜼))v2(𝒙i)]𝑺2(𝒙)\displaystyle-\frac{1}{\Delta_{2}(\bm{x},\bm{x}_{i})}\left\{(1-\pi(\bm{x},\bm{\eta}))\left[1-(1-\pi(\bm{x}_{i},\bm{\eta}))v_{2}(\bm{x}_{i})\right]\bm{S}_{2}(\bm{x})\right.
(1π(𝒙i,𝜼))[1+(1π(𝒙,𝜼))v2(𝒙)]𝑺2(𝒙i)\displaystyle\left.-(1-\pi(\bm{x}_{i},\bm{\eta}))\left[1+(1-\pi(\bm{x},\bm{\eta}))v_{2}(\bm{x})\right]\bm{S}_{2}(\bm{x}_{i})\right.
+(1π(𝒙i,𝜼))(1π(𝒙,𝜼))v2(𝒙,𝒙i)[𝑺2(𝒙,𝒙i)+𝑺2(𝒙i,𝒙)]}.\displaystyle+\left.(1-\pi(\bm{x}_{i},\bm{\eta}))(1-\pi(\bm{x},\bm{\eta}))v_{2}(\bm{x},\bm{x}_{i})\left[\bm{S}_{2}(\bm{x},\bm{x}_{i})+\bm{S}_{2}(\bm{x}_{i},\bm{x})\right]\right\}.
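As a sanity check, the rank-two update formula for $\bm{M}_{1}^{*}$ can be compared against direct inversion of the exchanged information matrix. A sketch with stand-in quantities (all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
p = 4
B = rng.standard_normal((p, p))
A = B @ B.T + p * np.eye(p)          # stands in for F' W_1 F + rho * R^{-1}
M = np.linalg.inv(A)                 # plays the role of M_1
f_new, f_old = rng.standard_normal(p), rng.standard_normal(p)
w_new, w_old = 0.3, 0.2              # pi(x, eta) and pi(x_i, eta)

S = lambda a, b: M @ np.outer(a, b) @ M   # S_1(x1, x2) = M f(x1) f(x2)' M
v = lambda a, b: a @ M @ b                # v_1(., .)
delta1 = ((1 + w_new * v(f_new, f_new)) * (1 - w_old * v(f_old, f_old))
          + w_new * w_old * v(f_new, f_old) ** 2)
# rank-two Woodbury update from the formula for M_1^*
M_star = M - (1 / delta1) * (
    w_new * (1 - w_old * v(f_old, f_old)) * S(f_new, f_new)
    - w_old * (1 + w_new * v(f_new, f_new)) * S(f_old, f_old)
    + w_new * w_old * v(f_new, f_old) * (S(f_new, f_old) + S(f_old, f_new)))
# direct inversion after exchanging the two points
direct = np.linalg.inv(A + w_new * np.outer(f_new, f_new)
                         - w_old * np.outer(f_old, f_old))
assert np.allclose(M_star, direct)
```

The checks for $\bm{M}_{0}^{*}$ and $\bm{M}_{2}^{*}$ are identical with weights $\pi(1-\pi)$ and $1-\pi$, respectively.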

Appendix A2. Table

Table A1: Local Bayesian $D$-optimal designs for $\rho=0$ and $\rho=0.3$, and three other alternative designs. The values in columns 7–11 are frequencies.
Point  $x_{1}$  $x_{2}$  $x_{3}$  $x_{4}$  $x_{5}$  $D_{QQ}$ ($\rho=0$)  $D_{QQ}$ ($\rho=0.3$)  $D_{G}$  $D_{L}$  $D_{n}$
1 -1 -1 -1 -1 -1 1 2 1 1 2
2 1 -1 -1 -1 -1 1 1 2 1 1
3 -1 1 -1 -1 -1 1 1 2 1 1
4 1 1 -1 -1 -1 2 1 2 1 2
5 -1 -1 1 -1 -1 1 0 2 1 1
6 1 -1 1 -1 -1 1 1 0 1 1
7 -1 1 1 -1 -1 2 2 1 1 2
8 1 1 1 -1 -1 1 2 1 1 1
9 -1 -1 -1 0 -1 2 2 1 1 1
10 1 -1 -1 0 -1 0 0 1 1 1
11 -1 1 -1 0 -1 1 1 2 1 2
12 1 1 -1 0 -1 1 2 1 1 1
13 -1 -1 1 0 -1 1 1 1 1 2
14 1 -1 1 0 -1 2 2 1 1 1
15 -1 1 1 0 -1 1 1 0 1 0
16 1 1 1 0 -1 1 1 1 1 2
17 -1 -1 -1 1 -1 1 0 1 1 2
18 1 -1 -1 1 -1 2 2 1 1 1
19 -1 1 -1 1 -1 2 2 1 1 1
20 1 1 -1 1 -1 1 1 1 1 1
21 -1 -1 1 1 -1 2 2 1 1 1
22 1 -1 1 1 -1 0 1 1 1 1
23 -1 1 1 1 -1 0 1 1 1 2
24 1 1 1 1 -1 2 1 1 1 1
25 -1 -1 -1 -1 0 1 0 1 1 0
26 1 -1 -1 -1 0 0 0 1 1 1
27 -1 1 -1 -1 0 0 0 0 1 0
28 1 1 -1 -1 0 0 1 1 1 0
29 -1 -1 1 -1 0 0 1 1 1 1
30 1 -1 1 -1 0 0 0 0 1 0
31 -1 1 1 -1 0 0 0 0 1 0
32 1 1 1 -1 0 1 0 1 1 0
33 -1 -1 -1 0 0 0 0 1 1 0
34 1 -1 -1 0 0 1 1 1 1 1
35 -1 1 -1 0 0 0 0 0 1 0
36 1 1 -1 0 0 1 0 1 1 0
37 -1 -1 1 0 0 1 1 1 1 0
38 1 -1 1 0 0 0 0 1 1 0
39 -1 1 1 0 0 0 0 0 1 0
40 1 1 1 0 0 0 0 1 1 0
41 -1 -1 -1 1 0 0 2 1 0 0
42 1 -1 -1 1 0 0 0 1 1 0
43 -1 1 -1 1 0 0 0 0 1 0
44 1 1 -1 1 0 1 0 1 0 0
45 -1 -1 1 1 0 0 0 1 1 0
46 1 -1 1 1 0 1 1 1 0 0
47 -1 1 1 1 0 1 0 1 0 0
48 1 1 1 1 0 0 1 1 1 1
49 -1 -1 -1 -1 1 1 1 1 1 1
50 1 -1 -1 -1 1 2 2 1 1 2
51 -1 1 -1 -1 1 1 1 0 1 1
52 1 1 -1 -1 1 1 1 1 1 1
53 -1 -1 1 -1 1 2 1 1 1 2
54 1 -1 1 -1 1 0 0 0 1 0
55 -1 1 1 -1 1 1 1 1 1 1
56 1 1 1 -1 1 1 2 1 1 2
57 -1 -1 -1 0 1 1 1 1 1 2
58 1 -1 -1 0 1 1 1 1 1 0
59 -1 1 -1 0 1 2 2 1 1 1
60 1 1 -1 0 1 1 1 1 1 2
61 -1 -1 1 0 1 2 2 1 0 1
62 1 -1 1 0 1 1 1 1 1 2
63 -1 1 1 0 1 0 0 1 1 1
64 1 1 1 0 1 2 2 1 0 1
65 -1 -1 -1 1 1 2 1 1 1 2
66 1 -1 -1 1 1 1 1 1 1 1
67 -1 1 -1 1 1 1 1 1 1 1
68 1 1 -1 1 1 1 2 1 1 2
69 -1 -1 1 1 1 1 1 1 1 1
70 1 -1 1 1 1 1 1 0 1 1
71 -1 1 1 1 1 2 2 1 1 2
72 1 1 1 1 1 1 0 1 1 1