
A response-adaptive multi-arm design for continuous endpoints based on a weighted information measure

Gianmarco Caruso
MRC Biostatistics Unit
University of Cambridge
Cambridge, UK CB2 0SR
[email protected]
Pavel Mozgunov
MRC Biostatistics Unit
University of Cambridge
Cambridge, UK CB2 0SR
[email protected]
corresponding author
Abstract

Multi-arm trials are gaining interest in practice given the statistical and logistical advantages that they can offer. The standard approach is to use an allocation ratio that is fixed throughout the trial, but there is a call for making it adaptive and skewing the allocation of patients towards better performing arms. However, among other challenges, it is well known that these approaches might suffer from lower statistical power. We present a response-adaptive design for continuous endpoints which explicitly allows the trade-off between the number of patients allocated to the "optimal" arm and the statistical power to be controlled. Such a balance is achieved through the calibration of a tuning parameter, and we explore various strategies to select it effectively. The proposed criterion is based on a context-dependent information measure which gives greater weight to those treatment arms whose characteristics are close to a pre-specified clinical target. We also introduce a simulation-based hypothesis testing procedure which focuses on selecting the target arm, and we discuss strategies to effectively control the type-I error rate. The potential advantages of the proposed criterion over currently used alternatives are evaluated in simulations, and its practical implementation is illustrated in the context of early Phase IIa proof-of-concept oncology clinical trials.

Keywords: information gain · best arm identification · type-I error control · phase II trials · patient benefit

1 Introduction

Clinical trials are medical research studies involving patients with the aim of identifying the best treatment for future recommendation. For many rare diseases, they may be the only chance for patients to receive a promising new treatment promptly. However, the clinical trial dilemma often requires some participants to receive suboptimal treatments in order to estimate the treatment effect and draw reliable conclusions. In other terms, ethical responsibility in clinical trials rests on the balance of two components: individual ethics, prioritising the allocation of patients to the better treatment to ensure their own well-being, and collective ethics, focusing on identifying the better treatment with high probability (Antognini and Giovagnoli, 2010; Bartroff and Leung Lai, 2011). Balanced allocation strategies are the most commonly used in practice, due to their easy implementation and their even exploration of all the treatments, which generally leads to high statistical power (Azriel et al., 2012). Although they comply quite well with the collective objective, their indiscriminate balanced allocation does not allow for a preventive stochastic dropping of suboptimal treatments, thereby often neglecting individual ethics requirements (Thall et al., 1989). In contrast, response-adaptive designs are able to skew the allocation of patients towards those treatments that are observed to perform better, often without a significant sacrifice in terms of statistical power. In fact, rather than using allocation probabilities fixed throughout the trial, this class of designs sequentially allocates patients to the treatment arms on the basis of the previous patient responses (Rosenberger and Lachin, 1993; Berry and Eick, 1995; Robertson et al., 2023).

We consider the case of a multi-armed clinical trial comparing several alternative treatments and where the primary endpoint is measurable on a continuous scale. These types of measurements are often available in clinical trials, although much of the methodological literature focuses on binary endpoints (Rombach et al., 2020) and, in many real applications, continuous efficacy endpoints tend to be dichotomised (Altman and Royston, 2006). In early phase drug development, common measurements on a continuous scale are biomarkers, which are measurable indicators of biological conditions and are often used as primary endpoints in clinical trials to evaluate the efficacy (or a proxy of it) of treatments (Zhao et al., 2015). While in most applications one targets the highest or the lowest value of the endpoint, in some cases the focus may rather be on an optimal range or a specific target value for the selected biomarker. An example is given by the level of glucose in the blood measured through the HbA1c biomarker: values of HbA1c much above a given threshold indicate poor glucose control and risk of complications due to hyperglycaemia, while values that are too low can indicate risk of hypoglycaemia (Kaiafa et al., 2021). In this context, we consider clinical trials with a clinical target $\gamma\in\mathbb{R}$ set by clinicians in advance. In the case of targeting the highest (or lowest) value of an endpoint, $\gamma$ can be set to the highest (lowest) value which is clinically attainable.

While it is common in the literature to emphasise comparisons with the control and to test hypotheses against it, this approach does not always align with the objectives of some multi-armed trials, such as selecting the best treatment arm. Rather than comparing each arm against the control, we aim at identifying the treatment arm which gives responses on average closest to the target $\gamma$. Under this assumption, competing treatment arms can be ranked from best to worst according to the proximity of their true mean responses to the target. In line with this objective, we propose a hypothesis testing procedure which compares the first and second best arms in the ranking, while taking into account the correct identification of these two arms. We also discuss strategies to implement such a hypothesis testing procedure and to control type-I error rate inflation.

In this work, we introduce a class of response-adaptive designs for studies with continuous endpoints aiming at identifying the best treatment arm under ethical constraints (e.g. maximising the number of patients assigned to it). Drawing from the theory of weighted information measures (Kelbert and Mozgunov, 2015; Suhov et al., 2016), we propose to use the information gain, i.e. the difference between the Shannon differential entropy and its weighted version, as a measure for decision-making in clinical trials. Indeed, the use of suitable parametric weight functions makes it possible to comply with ethical constraints by informing entropy measures about which outcomes are more desirable. In this work, we consider weight functions with a particular interest in arms whose mean responses are in the neighbourhood of $\gamma$. The idea of taking the "context" of the experiment into account directly in the information measure has also been the basis of some recently proposed clinical trial designs with either binary (Kasianova et al., 2021; Mozgunov and Jaki, 2020a; Kasianova et al., 2023) or multinomial responses (Mozgunov and Jaki, 2019, 2020b). In line with these recent developments, we define the information measures based on the Bayesian posterior distribution of the parameter of interest (i.e. the true mean response), which arises from the assumption of an objective prior on it. We introduce a parameter in the weight function to control the so-called "exploration vs exploitation" trade-off (Azriel et al., 2011), and we discuss strategies to perform a robust calibration of this tuning parameter according to the objective of the trial.

The current work can be framed within the broader class of response-adaptive designs, relaxing the assumption of independence between competing arms to maximise patient benefit. An exhaustive review of the topic has recently been provided by Robertson et al. (2023). Multi-Arm Bandit (MAB) approaches are a popular sub-class of response-adaptive designs which can be adopted in multi-armed trials to find a balance between exploration and exploitation. Many MAB designs have recently been proposed both for discrete (Villar et al., 2015; Aziz et al., 2021) and continuous (Smith and Villar, 2018; Williamson and Villar, 2020) responses, but they all focus on the highest (or lowest) response rather than on a specific target $\gamma$. For this reason, we also introduce two extensions of Smith and Villar (2018)'s design that incorporate the information about $\gamma$.

This work is motivated by an early Phase IIa proof-of-concept oncology clinical trial of a novel inhibitor (called A; the name is masked) in combination with one or several currently approved immunotherapy treatments. The working hypothesis is that, due to its mechanism of action, this inhibitor can enhance the effect of an immunotherapy, and several doses of the inhibitor have undergone a Phase I dose-escalation trial. However, the challenge is to select one immunotherapy for further development in a Phase IIb (or Phase III) trial. To answer this question, a multi-arm trial was proposed, studying three combination arms of the novel inhibitor with three different approved immunotherapies (called B, C, and D) and one arm of A+B+C. Due to the expected mechanism of action, and to allow a rapid decision on the development, the primary endpoint is a PD-marker (at Day 15) that was agreed to be an accurate proxy for the expected interaction effect of the drug on efficacy. A specific value of this PD-marker is being targeted, as lower values would indicate insufficient efficacy, while higher values could lead to adverse effects. The objectives of the trial are (i) to find the most promising combination among the four, (ii) to check whether the difference between the most promising and the second most promising arm is "significant", and (iii) to collect the most information about the most promising arms (due to the constraint on resources). Taking all three objectives into account, the proposal is to use the response-adaptive design presented in this work.

The remainder of the paper is organised as follows. Section 2 illustrates the derivation of the proposed criterion for a particular weight function symmetric around the target $\gamma$, and highlights the central role of the parameter $\kappa$ controlling the trade-off between exploration and exploitation. Section 3 introduces a novel simulation-based hypothesis testing procedure which formalises the objective of the trial outlined above, and discusses strategies to control type-I error rates. Section 4 illustrates methods for a robust selection of the tuning parameter $\kappa$. In Section 5, the performance of our criterion is evaluated against several competitors through an extensive simulation study inspired by the motivating setting described above, while Section 6 illustrates the application of our multivariate proposal to a trial with co-primary endpoints. The paper concludes with a discussion in Section 7.

2 Methodology

2.1 Context-dependent information measure

Consider a clinical trial where a continuous endpoint is observed for each of the $K$ considered treatment arms. In addition, consider the $n_j$ responses from an arm $j$ to be an independent and identically distributed (iid) sequence of realisations of the random variables $X_{1,\,j},\dots,X_{n_j,\,j}\overset{iid}{\sim}N(\mu_j,\,\sigma_j^2)$, with $\mu_j\in\mathbb{R}$ and $\sigma_j^2\in\mathbb{R}^{+}$ ($j=1,\dots,K$). We assume an objective Bayesian setting where $\mu_j$ is given a non-informative prior distribution and $\sigma_j^2$ is supposed to be known. Notably, we assume for $\mu_j$ an improper uniform prior on the whole real line, resulting in a posterior distribution $N(\bar{x}_{n_j},\,\sigma^2_{n_j})$, where $\bar{x}_{n_j}=\frac{1}{n_j}\sum_{i=1}^{n_j}x_{i,\,j}$ is the sample mean and $\sigma^2_{n_j}=\frac{\sigma_j^2}{n_j}$. The posterior density of $\mu_j$ is given by

\pi_{n_j}(\mu_j)=\bigl(2\pi\sigma_{n_j}^{2}\bigr)^{-\frac{1}{2}}\,\exp\Biggl\{-\frac{1}{2}\biggl(\frac{\bar{x}_{n_j}-\mu_j}{\sigma_{n_j}}\biggr)^{2}\Biggr\}\,. (1)

Assume that $\bar{x}_{n_j}\rightarrow m_j$, namely that the posterior density in (1) tends to concentrate in the neighbourhood of a certain value $m_j\in\mathbb{R}$ as $n_j$ grows. The amount of information needed to estimate $m_j$ is given by the Shannon differential entropy of $\pi_{n_j}(\mu_j)$, namely

h(\pi_{n_j})=-\int_{\mathbb{R}}\,\pi_{n_j}(\mu_j)\log\pi_{n_j}(\mu_j)\,d\mu_j\,. (2)

However, the previous quantity does not account for the fact that the information about the unknown parameter is not uniformly valuable across the entire parametric space. Indeed, along with the amount of information needed to estimate a parameter, in experimental studies it is also relevant to consider the nature of the outcome itself and how desirable it is. For example, given a pre-specified target mean response $\gamma_j\in\mathbb{R}$, one can consider a context-dependent measure such as the weighted Shannon entropy, i.e.

h^{\phi_\gamma}(\pi_{n_j})=-\int_{\mathbb{R}}\,\phi_\gamma(\mu_j)\pi_{n_j}(\mu_j)\log\pi_{n_j}(\mu_j)\,d\mu_j\,, (3)

where the weight function, $\phi_\gamma(\mu_j)$, emphasises the interest in a particular subset of the parametric space (e.g. the neighbourhood of $\gamma_j$) rather than in the whole space. Following Mozgunov and Jaki (2020b), the information gain from considering the estimation problem in the sensitive area is given by

\Delta(\mu_j)=h(\pi_{n_j})-h^{\phi_\gamma}(\pi_{n_j})\,. (4)

This quantity can be interpreted as the amount of additional information needed to estimate the mean response $\mu_j$ when the degree of interest in the true value of $\mu_j$ is not uniform across the whole parametric space (i.e. some values of $\mu_j$ are more desirable than others).

Theorem 1 provides a closed-form expression for the information gain when a family of weight functions with a Gaussian kernel centred around the target $\gamma_j$ is considered, and it offers some insights into its asymptotic behaviour.

Theorem 1.

Let $h(\pi_{n_j})$ and $h^{\phi_\gamma}(\pi_{n_j})$ be the standard and weighted Shannon entropy of the posterior normal distribution in (1) corresponding to arm $j$. Assume a family of weight functions of the form

\phi_\gamma(\mu_j)=C(\bar{x}_{n_j},\sigma^2_j,\gamma_j,n_j)\,\exp\biggl\{-\frac{1}{2}\Bigl(\frac{\mu_j-\gamma_j}{\sigma_{\phi,\,j}}\Bigr)^{2}\biggr\}\,, (5)

where $\sigma_{\phi,\,j}^2=\frac{\sigma_j^p}{n_j^\kappa}$ ($\kappa\in\mathbb{R}$, $p\in\mathbb{R}$) is the variance term of the Gaussian kernel and $C(\bar{x}_{n_j},\sigma^2_j,\gamma_j,n_j)$ satisfies the normalisation condition $\int_{\mathbb{R}}\,\phi_\gamma(\mu_j)\pi_{n_j}(\mu_j)\,d\mu_j=1$. Then,

\Delta_{n_j}=\frac{1}{2}\frac{\sigma_j^{2-p}n_j^{\kappa}}{\sigma_j^{2-p}n_j^{\kappa}+n_j}-\frac{1}{2}\Biggl(\frac{\gamma_j-\bar{x}_{n_j}}{\sigma_j}\sqrt{n_j}\Biggr)^{2}\Biggl(\frac{\sigma_j^{2-p}n_j^{\kappa}}{\sigma_j^{2-p}n_j^{\kappa}+n_j}\Biggr)^{2}\,. (6)

Additionally, if $\lim_{n_j\rightarrow\infty}\bar{x}_{n_j}=m_j$, then: (i) $\Delta_{n_j}=O\bigl(-\frac{1}{2}(\gamma_j-m_j)^{2}\,\sigma_j^{2(1-p)}\,n_j^{2\kappa-1}\bigr)$ when $\kappa<1$; (ii) $\Delta_{n_j}=O\Bigl(-\frac{1}{2}(\gamma_j-m_j)^{2}\Bigl(\frac{\sigma_j^{1-p}}{\sigma_j^{2-p}+1}\Bigr)^{2}n_j\Bigr)$ when $\kappa=1$; (iii) $\Delta_{n_j}=O\bigl(-\frac{1}{2}\bigl(\frac{\gamma_j-m_j}{\sigma_j}\bigr)^{2}n_j\bigr)$ when $\kappa>1$.

The proof is provided in Section A.1 of the Supplementary Material (SM, henceforth). Notice that the function in (5) is symmetric around its maximum $\gamma_j$, while $\sigma_{\phi,\,j}^2$ quantifies how the curve is dispersed around this target mean. Theorem 1 shows that the leading term of the information gain in (6) is always non-positive, with the asymptotic term achieving its maximum value of 0 when the sample mean equals the target mean. Notably, as $n_j\rightarrow\infty$, its asymptotic behaviour varies with $\kappa$. For example, the information gain converges to 0 when $\kappa<0.5$, meaning that the context of the study tends to be neglected as more information is collected on a specific arm. For this reason, we focus on values of $\kappa$ greater than or equal to $0.5$. When $\kappa>0.5$, the information gain diverges to $-\infty$, while, when $\kappa=0.5$, the information gain converges to a finite quantity which, for $p=1$, depends on the squared distance between $m_j$ and $\gamma_j$ and, for $p=2$, on a standardised version of this distance.
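The closed-form expression in (6) is cheap to evaluate. As a minimal illustration (the function and argument names are ours, not from the paper), the gain for a single arm can be computed as:

```python
def info_gain(xbar, n, sigma, gamma, kappa=0.5, p=2):
    """Information gain Delta_{n_j} of Eq. (6) for a single arm.

    xbar  : current sample mean of the arm
    n     : current number of observations on the arm
    sigma : (assumed known) response standard deviation
    gamma : clinical target for the mean response
    kappa, p : tuning parameters of the weight function (5)
    """
    # common ratio sigma^(2-p) n^kappa / (sigma^(2-p) n^kappa + n)
    r = sigma ** (2 - p) * n ** kappa / (sigma ** (2 - p) * n ** kappa + n)
    return 0.5 * r - 0.5 * ((gamma - xbar) / sigma) ** 2 * n * r ** 2
```

Ceteris paribus, the gain is maximised when the sample mean hits the target: for instance, `info_gain(0.0, 10, 1.0, 0.0)` exceeds `info_gain(0.5, 10, 1.0, 0.0)`.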

Notice that the expression in (6) achieves its maximum at $\bar{x}_{n_j}=\gamma_j$, ceteris paribus (cf. Figure 1a).

Figure 1: Behaviour of the information gain $\Delta_{n_j}$ in (6) as a function of, ceteris paribus, the current sample mean $\bar{x}_{n_j}$ (left panel), the standard deviation $\sigma_j^p$ (centre panel) and the current sample size $n_j^\kappa$ (right panel) of the $j$-th arm.

This means that collecting more information about the arm with mean response $\mu_j$ close to $\gamma_j$ implies a maximisation of the information gain $\Delta_{n_j}$. This property guarantees that the more valuable the learning about $\mu_j$, the higher the information gain associated with this arm. In addition, when $p\geq 1$, $\Delta_{n_j}$ is monotonically increasing with $\sigma_j$ (cf. Figure 1b), reflecting the idea that the greater the uncertainty about arm $j$, the greater the learning value associated with sampling from it. When $\kappa\geq 0.5$, as more observations on arm $j$ are collected, the value of the information gain decreases, with higher values of $\kappa$ leading to a steeper decline of this function (cf. Figure 1c).

Given the desirable properties illustrated above, we suggest using the information gain in (6) as a patient's gain criterion to govern patient allocation in a sequential experiment.

2.2 Allocation rule

Let $N_j$ be the total number of patients allocated to arm $j$ at the end of the trial and $N=\sum_{j=1}^{K}N_j$ the total number of patients treated at the end of the trial. The scheme to allocate the $N$ patients to the $K$ arms works as follows.

After a burn-in phase in which an equal batch of $B\geq 1$ individuals is allocated to each of the $K$ arms, the $t$-th patient ($t=KB+1,\dots,N$) is allocated following the rule:

\texttt{assign patient } t \texttt{ to arm } j^{(t)} \texttt{ such that } j^{(t)}=\underset{j=1,\dots,K}{\text{argmax}}\,\Delta_{n_j}\,. (7)

This rule may be seen as a deterministic response-adaptive criterion, where the next patient is almost surely allocated to the arm associated with the highest information gain. The algorithm proceeds until the total number of $N$ patients is attained. Finally, the arm $j^*$ whose final sample mean $\bar{x}_{N_j}$ is closest to the target is selected for the final recommendation; namely, $j^*=\underset{j=1,\dots,K}{\text{argmin}}\,|\bar{x}_{N_j}-\gamma_j|$. The burn-in phase is necessary to obtain a first estimate of the sample means of the $K$ arms, all the more so when the sequential design is based on non-informative prior distributions.
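The scheme above can be sketched end to end. The following is a minimal simulator of one trial under the stated assumptions (known variances, flat priors); `info_gain` implements Eq. (6), and all function and argument names are illustrative rather than from the paper:

```python
import numpy as np

def info_gain(xbar, n, sigma, gamma, kappa, p=2):
    # Information gain of Eq. (6) for a single arm
    r = sigma ** (2 - p) * n ** kappa / (sigma ** (2 - p) * n ** kappa + n)
    return 0.5 * r - 0.5 * ((gamma - xbar) / sigma) ** 2 * n * r ** 2

def run_trial(mu, sigma, gamma, N, B=2, kappa=0.5, p=2, seed=1):
    """Simulate one trial under the allocation rule (7).

    mu, sigma, gamma : per-arm true means, known SDs and target means
    N, B             : total sample size and per-arm burn-in batch size
    Returns (per-arm sample sizes, index of the recommended arm).
    """
    rng = np.random.default_rng(seed)
    K = len(mu)
    # burn-in: B observations on each arm
    x = [list(rng.normal(mu[j], sigma[j], B)) for j in range(K)]
    for _ in range(K * B, N):
        gains = [info_gain(np.mean(x[j]), len(x[j]), sigma[j], gamma[j], kappa, p)
                 for j in range(K)]
        j_next = int(np.argmax(gains))               # rule (7)
        x[j_next].append(rng.normal(mu[j_next], sigma[j_next]))
    # recommend the arm whose final sample mean is closest to its target
    j_star = int(np.argmin([abs(np.mean(x[j]) - gamma[j]) for j in range(K)]))
    return [len(x[j]) for j in range(K)], j_star

sizes, best = run_trial(mu=[0.0, 1.5, 3.0], sigma=[1.0, 1.0, 1.0],
                        gamma=[0.0, 0.0, 0.0], N=60)
```

In repeated runs of such a simulation, the allocation is typically skewed towards the arm whose true mean is closest to its target.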

The rule in (7) accounts for patient benefit by rewarding arms with mean responses closer to the target, while encouraging the exploration of arms with higher uncertainty. Notably, smaller values of $p$ tend to mitigate the impact that different values of the standard deviations have on the information gain, thus $p$ can be regarded as a parameter controlling the impact of the response variability on this information measure. Moreover, an overly greedy algorithm may lead to insufficient exploration of the alternative arms, resulting in poor performance in terms of statistical power (Villar et al., 2015). For this reason, a key role in the "exploration vs exploitation" trade-off is played by the parameter $\kappa$, which penalises an increasing number of observations on the same arm, offering other arms more chances to be explored. Notably, a larger $\kappa$ corresponds to a higher penalisation of arms with many patients, favouring a more spread-out allocation. In Section 4, we show some strategies to select $\kappa$ in accordance with the trial objectives, while, in Section 5, we offer further insights on how this parameter affects the operating characteristics of the proposed design.

As an alternative to (5), a weight function asymmetric around $\gamma_j$, allowing for a different weighting of values above and below the target, is discussed in Section B of the SM.

2.3 Multivariate generalisation

Let us now consider the case where $q>1$ endpoints are measured for each of the $K$ arms of a clinical trial. In addition, consider the $n_j$ responses from arm $j$ as an iid sequence of realisations of the $q\times 1$ random vectors $\bm{X}_{1,\,j},\dots,\bm{X}_{n_j,\,j}\overset{iid}{\sim}MVN(\bm{\mu}_j,\,\bm{\Sigma}_j)$, with mean vector $\bm{\mu}_j\in\mathbb{R}^q$ and covariance matrix $\bm{\Sigma}_j$ supposed to be known. Under these assumptions, an improper prior uniform on $\mathbb{R}^q$ for $\bm{\mu}_j$ results in a posterior distribution $MVN(\bar{\bm{x}}_{n_j},\,\bm{\Sigma}_{n_j})$, where $\bar{\bm{x}}_{n_j}$ is the vector of sample means for the $j$-th arm and $\bm{\Sigma}_{n_j}=\frac{\bm{\Sigma}_j}{n_j}$. If $\bm{\Omega}_{n_j}=\bm{\Sigma}_{n_j}^{-1}$ is the posterior precision matrix, the posterior density of $\bm{\mu}_j$ is given by

\pi_{n_j}(\bm{\mu}_j)=(2\pi)^{-\frac{q}{2}}\,\det(\bm{\Sigma}_{n_j})^{-\frac{1}{2}}\,\exp\biggl\{-\frac{1}{2}\bigl(\bar{\bm{x}}_{n_j}-\bm{\mu}_j\bigr)^{\prime}\bm{\Omega}_{n_j}\bigl(\bar{\bm{x}}_{n_j}-\bm{\mu}_j\bigr)\biggr\}\,. (8)

Theorem 2 provides a closed-form expression for the information gain when a family of weight functions having a multivariate Gaussian kernel centred around the vector of target means $\bm{\gamma}_j$ is considered.

Theorem 2.

Let $h(\pi_{n_j})$ and $h^{\phi_\gamma}(\pi_{n_j})$ be the standard and weighted Shannon entropy of the density in (8) corresponding to arm $j$. Assume a family of weight functions of the form

\phi_\gamma(\bm{\mu}_j)=C(\bar{\bm{x}}_{n_j},\bm{\Sigma}_j,\bm{\gamma}_j,n_j)\,\exp\biggl\{-\frac{1}{2}\bigl(\bm{\mu}_j-\bm{\gamma}_j\bigr)^{\prime}\bm{\Omega}_{\phi,\,j}\bigl(\bm{\mu}_j-\bm{\gamma}_j\bigr)\biggr\}\,, (9)

where $\bm{\Omega}_{\phi,\,j}=\bm{\Sigma}_{\phi,\,j}^{-1}=n_j^{\kappa}\,\bm{\Sigma}_j^{-1}$ is the precision matrix term of the weight function and $C(\bar{\bm{x}}_{n_j},\bm{\Sigma}_j,\bm{\gamma}_j,n_j)$ satisfies the normalisation condition $\int_{\mathbb{R}^q}\,\phi_\gamma(\bm{\mu}_j)\pi_{n_j}(\bm{\mu}_j)\,d\bm{\mu}_j=1$. Then,

\Delta_{n_j}=\frac{q}{2}\frac{n_j^{\kappa}}{n_j^{\kappa}+n_j}-\frac{1}{2}\bigl(\bm{\gamma}_j-\bar{\bm{x}}_{n_j}\bigr)^{\prime}\bm{\Sigma}_j^{-1}\bigl(\bm{\gamma}_j-\bar{\bm{x}}_{n_j}\bigr)\Biggl(\frac{n_j^{\kappa+\frac{1}{2}}}{n_j^{\kappa}+n_j}\Biggr)^{2}\,. (10)

The proof is provided in Section A.2 of the SM. Interestingly, for $q=1$ the information gain in (10) coincides with the one in (6) with $p=2$. The information gain attains higher values when the vector of sample means is close to the vector of target means, and it is also monotonically increasing with the variances of the endpoints (see Section C.1 of the SM). The asymptotic behaviour of (10) is similar to that of the information gain in (6), thus we consider $\kappa\geq 0.5$ for the same reasons argued in the univariate case.

In the bivariate case ($q=2$), the function in (10) can be expressed in scalar terms as

\Delta_{n_j}=\frac{q}{2}\frac{n_j^{\kappa}}{n_j^{\kappa}+n_j}-\frac{1}{2}\,\frac{\bigl(\gamma_j^{\scriptscriptstyle(1)}-\bar{x}_{n_j}^{\scriptscriptstyle(1)}\bigr)^{2}\sigma_j^{\scriptscriptstyle(22)}-2\bigl(\gamma_j^{\scriptscriptstyle(1)}-\bar{x}_{n_j}^{\scriptscriptstyle(1)}\bigr)\bigl(\gamma_j^{\scriptscriptstyle(2)}-\bar{x}_{n_j}^{\scriptscriptstyle(2)}\bigr)\sigma_j^{\scriptscriptstyle(12)}+\bigl(\gamma_j^{\scriptscriptstyle(2)}-\bar{x}_{n_j}^{\scriptscriptstyle(2)}\bigr)^{2}\sigma_j^{\scriptscriptstyle(11)}}{\sigma_j^{\scriptscriptstyle(11)}\sigma_j^{\scriptscriptstyle(22)}-\sigma_j^{{\scriptscriptstyle(12)}^{2}}}\Biggl(\frac{n_j^{\kappa+\frac{1}{2}}}{n_j^{\kappa}+n_j}\Biggr)^{2}\,,

where $\bm{\gamma}_j=\bigl(\gamma_j^{\scriptscriptstyle(1)},\gamma_j^{\scriptscriptstyle(2)}\bigr)^{\prime}$, $\bar{\bm{x}}_{n_j}=\bigl(\bar{x}_{n_j}^{\scriptscriptstyle(1)},\bar{x}_{n_j}^{\scriptscriptstyle(2)}\bigr)^{\prime}$ and $\bm{\Sigma}_j=\bigl(\begin{smallmatrix}\sigma_j^{\scriptscriptstyle(11)}&\sigma_j^{\scriptscriptstyle(12)}\\ \sigma_j^{\scriptscriptstyle(12)}&\sigma_j^{\scriptscriptstyle(22)}\end{smallmatrix}\bigr)$ with the variances on the diagonal and the covariance between the endpoints off-diagonal. The behaviour of the information gain as a function of the correlation $\rho_j=\frac{\sigma_j^{\scriptscriptstyle(12)}}{\sqrt{\sigma_j^{\scriptscriptstyle(11)}\sigma_j^{\scriptscriptstyle(22)}}}$ is discussed in Section C.2 of the SM.

We adopt the function in (10) as the patients' allocation criterion. As in the univariate case, the next patient in the sequential scheme is assigned to the arm with the highest information gain, while the arm selection is still based on the proximity between the final estimates of the means and the respective target mean values, i.e. the estimated best arm is the one whose final estimated mean vector is closest to the target vector.
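A direct transcription of (10), with illustrative names of our choosing, could read:

```python
import numpy as np

def info_gain_mv(xbar, n, Sigma, gamma, kappa=0.5):
    """Multivariate information gain of Eq. (10) for a single arm.

    xbar, gamma : length-q sample-mean and target-mean vectors
    Sigma       : (assumed known) q x q covariance matrix of the responses
    """
    xbar, gamma = np.asarray(xbar, float), np.asarray(gamma, float)
    q = xbar.size
    d = gamma - xbar
    # quadratic form (gamma - xbar)' Sigma^{-1} (gamma - xbar)
    quad = d @ np.linalg.solve(np.asarray(Sigma, float), d)
    w = n ** (kappa + 0.5) / (n ** kappa + n)
    return 0.5 * q * n ** kappa / (n ** kappa + n) - 0.5 * quad * w ** 2
```

For q = 1 this reproduces the univariate gain (6) with p = 2, as noted after Theorem 2.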

3 Simulation-based hypothesis testing procedure

3.1 Hypothesis testing

We assume a generic context where a $K$-armed trial is considered and no control arm is adopted for comparison. Thus, $\binom{K}{2}$ pairwise comparisons may be necessary to test the superiority of an arm over the others, and this can be a computationally demanding task. Here, following the motivating trial and to be aligned with the final recommendation strategy, we propose a testing procedure which only compares the two best performing arms, while also taking their correct identification into account.

Let $j^*:=\underset{j=1,\dots,K}{\text{argmin}}\,|\mu_j-\gamma_j|$ be the best treatment arm and $j^{**}:=\underset{j=1,\dots,K,\,j\neq j^*}{\text{argmin}}\,|\mu_j-\gamma_j|$ be the second best treatment arm. In case of a tie between two or more treatment arms, the best (or second best) arm is determined at random among them. Denoting by $\mu_{j^*}$ and $\mu_{j^{**}}$ the true mean responses of arm $j^*$ and arm $j^{**}$, respectively, we propose a test procedure which assesses whether $\mu_{j^*}$ is sufficiently closer to the target $\gamma_j$ than $\mu_{j^{**}}$. Formally, we consider the following set of hypotheses:

H_0:\,|\mu_{j^*}-\gamma_j|=|\mu_{j^{**}}-\gamma_j|\quad\text{vs}\quad H_1:\,|\mu_{j^*}-\gamma_j|<|\mu_{j^{**}}-\gamma_j|\,. (11)

If $\widehat{x}_j$ is the sample mean of arm $j$ at the end of the trial, the estimated best and second best arms are $\widehat{j}^*:=\underset{j=1,\dots,K}{\text{argmin}}\,|\widehat{x}_j-\gamma_j|$ and $\widehat{j}^{**}:=\underset{j=1,\dots,K,\,j\neq\widehat{j}^*}{\text{argmin}}\,|\widehat{x}_j-\gamma_j|$. The hypotheses in (11) are tested under a Bayesian framework where $H_0$ is rejected at a significance level $\alpha$ if the posterior probability that $|\mu_{\widehat{j}^*}-\gamma_j|<|\mu_{\widehat{j}^{**}}-\gamma_j|$ exceeds a fixed cut-off probability $\eta_\alpha$. Strategies to calibrate $\eta_\alpha$ to achieve type-I error control are proposed in Section 3.2.
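Under the flat prior, this posterior probability has no convenient closed form, but it is easily estimated by sampling from the two normal posteriors. A minimal sketch (function and argument names are ours):

```python
import numpy as np

def posterior_prob_closer(xbar1, n1, s1, xbar2, n2, s2, gamma,
                          ndraws=100_000, seed=0):
    """Monte Carlo estimate of P(|mu_1 - gamma| < |mu_2 - gamma| | data),
    where mu_j | data ~ N(xbar_j, s_j^2 / n_j) under the flat prior."""
    rng = np.random.default_rng(seed)
    mu1 = rng.normal(xbar1, s1 / np.sqrt(n1), ndraws)
    mu2 = rng.normal(xbar2, s2 / np.sqrt(n2), ndraws)
    return float(np.mean(np.abs(mu1 - gamma) < np.abs(mu2 - gamma)))

# H0 is rejected at level alpha when this probability exceeds the
# calibrated cut-off eta_alpha of Section 3.2
pi_hat = posterior_prob_closer(0.1, 30, 1.0, 1.0, 30, 1.0, gamma=0.0)
```

In the example, the estimated best arm (mean 0.1, near the target 0) is almost surely closer to the target than the second best (mean 1.0), so the estimated probability is close to 1.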

However, comparing distances between the means of the two estimated best arms is not sufficient to make a claim about the true ones, thus we need to guarantee that the ground truth is correctly identified. For this reason, we propose two definitions of power incorporating their correct identification, under the assumption that $\mathbb{P}\bigl(\{\widehat{j}^*=j^*,\,\widehat{j}^{**}=j^{**}\}\big|H_0\bigr)=1$.

The first proposal is to consider a conditional power,

(1-\beta)_R=\mathbb{P}\bigl(\{\pi_{\widehat{j}^*,\widehat{j}^{**}}>\eta_\alpha\}\big|\{\widehat{j}^*=j^*,\,\widehat{j}^{**}=j^{**}\},\,H_1\bigr)\,, (12)

where $\pi_{\widehat{j}^*,\widehat{j}^{**}}$ is the posterior probability that $|\mu_{\widehat{j}^*}-\gamma_j|<|\mu_{\widehat{j}^{**}}-\gamma_j|$. Rejections of the null are counted in (12) only for pseudo-trials whose two best arms are correctly identified. Alternatively, one can consider a two-components power, namely

(1-\beta)_{TC}=\mathbb{P}\bigl(\{\pi_{\widehat{j}^*,\widehat{j}^{**}}>\eta_\alpha\}\,\cap\,\{\widehat{j}^*=j^*,\,\widehat{j}^{**}=j^{**}\}\big|H_1\bigr)\,, (13)

where correct identification is counted as a "success". Notice that $(1-\beta)_{TC}\leq(1-\beta)_R$.
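Given a set of simulated trials, both notions of power are simple functions of two per-trial indicators. A sketch with illustrative names, where `reject` indicates that the posterior probability exceeded the cut-off and `ident` indicates correct identification of the two best arms:

```python
import numpy as np

def powers(reject, ident):
    """Estimate the conditional power (12) and two-components power (13)."""
    reject = np.asarray(reject, bool)
    ident = np.asarray(ident, bool)
    cond = (reject & ident).sum() / ident.sum()  # (12): among correctly identified
    two_comp = (reject & ident).mean()           # (13): rejection AND identification
    return cond, two_comp

cond, two_comp = powers([1, 1, 0, 1], [1, 0, 1, 1])
```

The toy example gives 2/3 for the conditional power and 1/2 for the two-components power, consistent with the inequality above.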

3.2 Controlling type-I error: mean and strong control

Let us consider a wide set $\mathcal{S}_0=\{H_0^{(1)},\dots,H_0^{(S)}\}$ of plausible null scenarios, where $H_0^{(s)}$ is characterised by a particular vector of true mean responses. Given a specific design and a maximum tolerated type-I error probability $\alpha$, we define $\eta_\alpha^{(s)}$ as the individual cut-off value calibrated at level $\alpha$ under the null scenario $H_0^{(s)}$ ($s=1,\dots,S$), namely a threshold value $\eta_\alpha^{(s)}$ such that $\mathbb{P}\bigl(\pi_{\widehat{j}^*,\widehat{j}^{**}}>\eta_\alpha^{(s)}\big|H_0^{(s)}\bigr)=\alpha$. Notice that, for a fixed $\eta_\alpha^{(s)}$, this probability can be efficiently evaluated via Monte Carlo simulations.

To guarantee strong control of the type-I error, one can fix the threshold $\eta_\alpha$ at the maximum of the individual cut-off probabilities, i.e. $\eta_\alpha^{\text{max}}=\underset{s=1,\dots,S}{\max}\,\eta_\alpha^{(s)}$. This is a conservative strategy which ensures that the probability of incorrectly detecting the presence of a superior arm under the null does not exceed $\alpha$ in any of the scenarios in $\mathcal{S}_0$. Ideally, $\mathcal{S}_0$ should contain a wide variety of scenarios in order to control as many situations as possible. In practice, there may be scenarios which are thought to be highly unlikely and can be ruled out. As a rule of thumb, we suggest considering null scenarios with $\mu_j$ not more distant from $\gamma_j$ than $10$ times the largest standard deviation. Recall that for normal responses the response $X_j$ will be in the interval $\mu_j\pm 3\sigma_j$ with probability higher than $0.97$. Thus, a distance $|\mu_j-\gamma_j|$ much higher than $3\sigma_j$ would imply that values in the neighbourhood of $\gamma_j$ are very unlikely for arm $j$. Furthermore, in some contexts, null scenarios with distances larger than $10\cdot\max_j\sigma_j$ might signify the ineffectiveness of all the treatments and, therefore, having control in such scenarios may be outside the scope of the trial.

In some cases, strong control of the type-I error may be too stringent a requirement, leading to a loss of power. This is even more evident when only a few scenarios require a substantially higher cut-off value than the rest. Similarly to Daniells et al. (2024), we propose to relax the required type of control, targeting an average control of the type-I error probability rather than a strong one. The resulting threshold value is $\eta_{\alpha}^{\text{ave}}$ such that $\frac{1}{S}\sum_{s=1}^{S}\alpha^{(s)}=\alpha$, where $\alpha^{(s)}$ is the individual type-I error probability associated with scenario $H_{0}^{(s)}$. In this case, $\alpha$ can be regarded as the average type-I error probability over the set of scenarios in $\mathcal{S}_{0}$. Notice that the simple average can be replaced by a weighted average, giving larger weight to scenarios which are more relevant than others.
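Since the average Monte Carlo error rate is non-increasing in the threshold, $\eta_{\alpha}^{\text{ave}}$ can be found by a simple bisection on the simulated statistics. A minimal sketch, assuming the per-scenario simulations are already available (names are illustrative):

```python
import numpy as np

def calibrate_average_cutoff(pi_sim_list, alpha=0.05, tol=1e-6):
    """Single threshold eta_alpha^ave such that the average, over the null
    scenarios, of the Monte Carlo error rates P(pi > eta | H0^(s)) is alpha.
    The average error rate is non-increasing in eta, so bisection applies."""
    def avg_error(eta):
        return float(np.mean([np.mean(sims > eta) for sims in pi_sim_list]))
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if avg_error(mid) > alpha:
            lo = mid   # error still too large: raise the threshold
        else:
            hi = mid
    return hi
```

A weighted average control is obtained by replacing the plain mean over scenarios with `np.average(..., weights=w)`.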

4 Robust strategies for selecting $\kappa$

4.1 Strategy

The choice of the penalisation parameter $\kappa$ is crucial since it controls the trade-off between exploration and exploitation. We consider as "optimal" a value of $\kappa$ which maximises a specific operating characteristic of interest (e.g. statistical power, or the percentage of times the best arm is selected in repeated trials). The optimal value of $\kappa$ depends on several parameters: the vector of true means, the vector of standard deviations, the vector of target means, the total sample size, the number of arms, the burn-in size and the parameter $p$. Apart from the vectors of true means and true standard deviations, we suppose that all the other parameters are fixed in advance by clinicians. Particular caution must be placed on the vector of standard deviations, which may be supposed to be either known (i.e. a prior guess used as estimate) or unknown.

Following Mozgunov and Jaki (2020b), we discuss a general approach to select a robust optimal value of $\kappa$ which does not require prior knowledge of the mean responses and the standard deviations. The idea is to consider a large variety of plausible alternative scenarios, each characterised by a vector of mean responses (and a vector of standard deviations, in case they are assumed to be unknown), and to select the value of $\kappa$ associated with the best average performance over all the considered scenarios.

In practice, one considers a set $\mathcal{S}_{1}=\{H_{1}^{(1)},\dots,H_{1}^{(S)}\}$ of $S$ plausible alternative scenarios, where $H_{1}^{(s)}$ is characterised by the set of unknown parameters, and a grid of values for $\kappa$. Then, the algorithm to find the robust optimal $\kappa$ consists of the following steps:

1. Define an objective function $g(u^{(s)}(\kappa))$, where $u^{(s)}(\kappa)$ is the value assumed by a particular operating characteristic of interest for a fixed $\kappa$ and scenario $s$ ($s=1,\dots,S$).

2. Compute $u^{(s)}(\kappa)$ for each scenario $s$ and each value of $\kappa$ in the considered grid.

3. Find the optimal $\kappa$: $\kappa^{*}=\text{argmin}_{\kappa}\,\frac{1}{S}\sum_{s=1}^{S}g\bigl(u^{(s)}(\kappa)\bigr)$.

Averaging over a wide set of scenarios offers a robust way to select a suitable value of $\kappa$ regardless of the true values of the parameters: although the selected value of $\kappa$ may not be the most convenient for some particular scenario, this procedure ensures that it is nearly optimal on average.

4.2 Objective functions

The objective functions should be chosen in accordance with the operating characteristic that one wants to target in the calibration of $\kappa$. If the main objective is to maximise the patient benefit (PB), we propose to target the average percentage of experimentation of the best arm (evaluated over $M$ pseudo-trials), namely

$$PB=\frac{1}{M}\sum_{m=1}^{M}\Biggl\{\frac{1}{N}\sum_{t=1}^{N}\mathbb{I}\bigl(a_{tm}=j^{*}\bigr)\Biggr\}\,,\qquad(14)$$

where $a_{tm}=j^{*}$ if and only if patient $t$ is allocated to the true best arm $j^{*}$ during pseudo-trial $m$. Supposing that $u_{1}(\kappa)=(PB)_{\kappa}$ for any $\kappa$, we adopt the following squared difference as the corresponding objective function:

$$g_{1}(\kappa)=\frac{1}{S}\sum_{s=1}^{S}\Bigl(u_{1}^{(s)}(\kappa)-u^{(s)}_{\text{max}}\Bigr)^{2}\,,\qquad(15)$$

quantifying how far each value $u^{(s)}_{1}(\kappa)$ is from $u^{(s)}_{\text{max}}=\max_{\kappa}u_{1}^{(s)}(\kappa)$.
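The PB measure in (14) and the calibration of $\kappa$ via (15) can be sketched as follows, assuming the operating characteristic has already been estimated on a grid of $\kappa$ values for each scenario (function names are our own illustration):

```python
import numpy as np

def patient_benefit(alloc, j_star):
    """PB of Eq. (14): share of patients allocated to the true best arm
    j_star, averaged over the M pseudo-trials. alloc is an (M, N) array
    whose (m, t) entry is the arm index assigned to patient t in trial m."""
    return float(np.mean(np.asarray(alloc) == j_star))

def robust_kappa_pb(u1, kappa_grid):
    """Robust kappa for the patient-benefit objective g1 in Eq. (15):
    u1[s, k] holds the PB under scenario s for kappa_grid[k]; squared gaps
    to the scenario-wise best PB are averaged over scenarios and the
    minimising kappa is returned."""
    u1 = np.asarray(u1, dtype=float)
    gaps = (u1 - u1.max(axis=1, keepdims=True)) ** 2
    return kappa_grid[int(np.argmin(gaps.mean(axis=0)))]
```

For example, with two scenarios and two candidate values of $\kappa$, the value whose PB sits closest to the scenario-wise maxima is returned.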

In case the objective is to achieve a pre-specified level of power, we instead use

$$g_{2}(\kappa)=\begin{cases}\kappa&\text{if }\frac{1}{S}\sum_{s=1}^{S}\mathbb{I}\Bigl(u_{2}^{(s)}(\kappa)\geq 0.8\times u^{(s)}_{2,FR}\Bigr)\,\geq\,\xi\\ \infty&\text{otherwise,}\end{cases}\qquad(16)$$

where $u_{2}^{(s)}(\kappa)$ is the power defined either in (12) or in (13), $u^{(s)}_{2,FR}$ is the power attained using fixed and equal randomisation (FR), and $\xi\in[0,1]$. The optimal $\kappa$ is then the smallest $\kappa$ for which the probability of obtaining a power higher than $80\%$ of that attained by FR is at least $\xi$. This avoids an excessive penalisation of increasing arm sample sizes once the requirement in terms of power is met.
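Minimising $g_{2}$ amounts to scanning the grid for the smallest feasible $\kappa$. A minimal sketch of this rule, with illustrative names and assuming the per-scenario powers have been pre-computed:

```python
import numpy as np

def robust_kappa_power(u2, u2_fr, kappa_grid, xi=0.9):
    """Robust kappa for the power objective g2 in Eq. (16): the smallest
    kappa for which the fraction of scenarios reaching at least 80% of the
    fixed-randomisation power is >= xi. u2 is (S, K); u2_fr has length S.
    Returns None when no kappa in the grid is feasible (g2 = infinity)."""
    u2 = np.asarray(u2, dtype=float)
    hit = (u2 >= 0.8 * np.asarray(u2_fr, dtype=float)[:, None]).mean(axis=0)
    feasible = [k for k, frac in zip(kappa_grid, hit) if frac >= xi]
    return min(feasible) if feasible else None
```

Returning `None` mirrors the $\infty$ branch of (16): no $\kappa$ in the grid meets the power requirement.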

5 Simulation study of Phase II clinical trials

5.1 Setting

Let us consider an early Phase IIa proof-of-concept oncology clinical trial such as the one introduced in the motivating example of Section 1. We consider $K=4$ alternative treatments (no control arm) and a total of $N=100$ patients recruited by the end of the trial. We assume that the primary endpoint is a PD-marker measured on a continuous scale, and that responses from arm $j$ closer to the target $\gamma_{j}$ ($j=1,\dots,K$) are more desirable. For simplicity, we assume $\gamma_{j}=\gamma=0$ for $j=1,\dots,K$, implying that sample means equal to $\epsilon$ and $-\epsilon$ ($\epsilon>0$) are treated equally by our proposed design. Thus, for a given scenario, the best treatment arm is the one associated with the smallest mean response in absolute value.

We investigate the performance of our method over $S=500$ random scenarios under the alternative hypothesis in (11). Each scenario is characterised by a vector of means $(\mu_{1},\dots,\mu_{K})$, where $\mu_{j}$ is generated from a $\text{Unif}[-4,4]$, $\forall j=1,\dots,K$, while variances are assumed to be known and equal to $\sigma_{j}^{2}=2^{2}$, $j=1,2,3$, and $\sigma_{4}^{2}=4^{2}$ for each scenario. Details and a full analysis for the case of unknown variances are presented in Section F of the SM.

Operating characteristics of interest are (i) the type-I error rates under specific null scenarios, (ii.a) conditional ($(1-\beta)_{C}$) and (ii.b) two-components ($(1-\beta)_{TC}$) statistical power (cf. Section 3.1), (iii) the average percentage of experimentation of the best arm, cf. (14), as a measure of patient benefit (PB), and (iv.a) the percentage of pseudo-trials with correct identification of the best arm (CS$_{I}(\%)$) and (iv.b) of the two best arms (CS$_{I\&II}(\%)$). For each given scenario, operating characteristics are assessed over $M=10^{4}$ pseudo-trials (Monte Carlo simulations).

In the following, we will denote by WE($p$, $\kappa$) the class of designs proposed in Section 2.2, and we will consider the cases $p=\{1,\,2\}$: this will offer insights on how different degrees of discrimination between arm-specific variances affect the resulting operating characteristics.

We compare its performance with several alternative allocation procedures which have good properties either in terms of statistical power or patient benefit. For a fair comparison with our proposal, and to narrow down the set of competitors, we restrict attention to response-adaptive designs based on deterministic allocation rules. Additionally, in line with the objective of the trial, we only consider those adaptive designs that try to skew allocations in favour of arms with means closer to the target. Specifically, we compare with the following designs: (i.) Fixed and equal randomisation (FR), where each patient is treated in a specific arm with fixed probability $\frac{1}{K}$; (ii.) Current belief (CB), where the next patient is treated in the arm with posterior mean $\bar{x}_{n_{j}}$ closest to the target $\gamma_{j}$; (iii.) Thompson Sampling (TS), where patient $t$ is treated in the arm which maximises the adjusted posterior probability of being the best, i.e. $\pi_{j}=\frac{\mathbb{P}(j=j^{*}|\texttt{data})^{c}}{\sum_{l=1}^{K}\mathbb{P}(l=j^{*}|\texttt{data})^{c}}$, where $c=\frac{t}{2N}$ helps to stabilise the resulting allocations; (iv.) Symmetric Gittins Index (SGI) and (v.) Targeted Gittins Index (TGI), two modifications of Smith and Villar (2018)'s proposal involving $\gamma_{j}$ (full details are provided in Section D of the SM). We use objective priors and fix $d=0.99$, in line with the considerations of Wang (1991) and Williamson and Villar (2020) for a trial of size $N=100$. Notice that, apart from FR, all the alternative designs can be classified as response-adaptive. FR is taken as the benchmark for power comparisons, due to its even exploration of all the arms and its wide application in real trials. CB and TS are common and intuitive response-adaptive designs, which are generally associated with good performance in terms of patient benefit.
Finally, SGI and TGI are modifications of a forward-looking design which is oriented toward a patient benefit objective (Smith and Villar, 2018; Williamson and Villar, 2020).
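For illustration, the CB rule and the posterior probability of being the best (the building block of the TS rule above) can be sketched as follows, assuming independent normal posteriors for the arm means; the function names and the Monte Carlo approximation are our own illustration, not the paper's implementation:

```python
import numpy as np

def cb_next_arm(post_means, gamma):
    """Current belief (CB): treat the next patient in the arm whose
    posterior mean is closest to the target."""
    return int(np.argmin(np.abs(np.asarray(post_means) - np.asarray(gamma))))

def prob_best(post_means, post_sds, gamma, n_draws=20_000, seed=0):
    """Monte Carlo estimate of P(j = j* | data) under independent normal
    posteriors N(mean_j, sd_j^2): the share of joint posterior draws in
    which arm j's mean is the closest to the target."""
    rng = np.random.default_rng(seed)
    draws = rng.normal(post_means, post_sds, size=(n_draws, len(post_means)))
    winners = np.argmin(np.abs(draws - np.asarray(gamma)), axis=1)
    return np.bincount(winners, minlength=len(post_means)) / n_draws
```

Raising these probabilities to the power $c=\frac{t}{2N}$ and renormalising gives the adjusted TS quantities $\pi_{j}$ described above.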

In the absence of prior knowledge about the treatments, the burn-in size is fixed to $B=5$ for all the response-adaptive designs, to help the sequential algorithms collect sufficient preliminary information about all the arms. The same burn-in size is chosen for both the case of known and unknown variances, in order to isolate its effect in the comparison between these two settings. Further details are provided in Section E.1 of the SM.

5.2 Calibration of cut-off probabilities

For a given combination of $p$ and $\kappa$, we calibrate design-specific cut-off probabilities under the set of null scenarios $\mathcal{S}_{0}=\bigl\{(\mu_{1},\dots,\mu_{K}):\,\mu_{j}-\gamma=c,\,c\geq 0,\,\forall j\bigr\}$, considering both the strong and the average control of the type-I error at level $\alpha=5\%$. Two considerations should be made on this choice of null scenarios: (i) calibration is done under the most challenging situation where there is no sub-optimal arm; (ii) one only considers $c\geq 0$ since, for $\gamma=0$, values $c$ and $-c$ are treated equally in the definition of the best arm. Details on the choice of the null scenarios are presented in Section E.2 of the SM.

Results are reported in Figure 2, where only four WE designs are shown based on the results of Section 5.3.

Figure 2: Individual type-I error rates for a grid of values of $c:\,\mu_{j}-\gamma=c,\,\forall j,\,c\geq 0$ and for several designs. Design-specific cut-off probabilities are calibrated both in the case of strong control (left panel) and average control (right panel) of the type-I error probability at level $\alpha=5\%$.

Response-adaptive designs tend to be associated with higher type-I error probabilities when $c$ is small, while the FR design is associated with type-I error rates which increase with $c$ before reaching a plateau. This different behaviour can be explained by the peculiar nature of response-adaptive designs. Indeed, variability in the arm responses can overly penalise some arms at the beginning of the trial in favour of others. Notably, when $c$ is very small (i.e. all true means are very close to the target value), most of the patients are likely to be assigned to the arm which, by chance, performs better at the beginning of the trial: as a result, the other arms may not have the chance to be further explored, and can thus be falsely identified as inferior at the end of the trial. If a practitioner aims at further reducing the resulting inflation of the type-I error rate associated with response-adaptive designs in this type of scenario, a possible solution is to consider larger burn-in sizes (cf. Section E.1 of the SM), possibly at the expense of a lower patient benefit: although a larger $B$ is associated with a more exhaustive exploration of all the arms and with more precise estimates, too high a value may lead to assigning a high percentage of patients to sub-optimal treatment arms.

5.3 Robust selection of the parameter $\kappa$

The implementation of the proposed design requires fixing the penalisation parameter $\kappa$ in advance. We apply the procedure presented in Section 4 both for the objective function maximising the patient benefit and for the one achieving a specific level of statistical power.

We perform two independent analyses for the cases $p=1$ and $p=2$. After preliminary checks, the grid of values for $\kappa$ is restricted to the interval $[0.5,\,1.5]$ with step $0.05$. For the power objective function, we require that $80\%$ of the power of FR is achieved with probability greater than or equal to $\xi=0.9$. This condition should ensure that in most of the scenarios the resulting power is not too far below the one achieved by the FR design. As a measure for $u_{2}(\kappa)$, we consider the two-components power defined in (13), with design-specific cut-off probabilities guaranteeing an average control of the type-I error rate at $\alpha=5\%$ over the set of scenarios considered in Section 5.2.

Figure 3 illustrates the result of the optimisation of $g_{1}$ and $g_{2}$.

Figure 3: Objective function $g_{1}$ (left panel) and probability to achieve at least $80\%$ of the power of FR (right panel) for several values of $\kappa$, for both $p=1$ (black squares) and $p=2$ (red circles). Filled shapes correspond to the robust optimal values of $\kappa$. The target probability to achieve $80\%$ of the FR power is $\xi=0.9$ (dashed line).

As anticipated, lower values of $\kappa$ are suggested when the criterion optimises the patient benefit, while larger values of $\kappa$ are necessary to achieve the power objective. At the same time, higher values of $p$ require higher values of $\kappa$ to attain the specific objective, to compensate for the higher relevance attributed to the variance. Therefore, the values $\kappa=0.55$ and $\kappa=0.8$ will be used when $p=1$, and $\kappa=0.7$ and $\kappa=1.1$ when $p=2$.

5.4 Numerical results

A wide overview of the operating characteristics of the considered designs is reported in Figure 4, which summarises results across all the $500$ randomly generated scenarios.

Figure 4: Operating characteristics across $S=500$ randomly generated scenarios for the considered set of designs. Each scenario is characterised by a vector of mean responses. For all scenarios, we assume $K=4$, $N=100$, $\bm{\sigma}=(2,2,2,4)$ (known) and $\gamma_{j}=0$, for $j=1,\dots,4$. The burn-in size is fixed to $B=5$ for all the response-adaptive designs.

WE designs achieve better performance in terms of patient benefit and identification of the best arm than their competitors, with WE($1$, $0.8$) and WE($2$, $1.1$) achieving a power almost comparable to FR. The even exploration of all the arms which characterises FR also leads to a higher percentage of correct identification of the two best arms, but with high variability across the scenarios. CB and TS generally perform worse than the other response-adaptive designs: notably, they both achieve high medians in terms of patient benefit, but with high variability around them. In contrast, SGI and WE($1$, $0.55$) achieve comparable patient benefit measures, but with much less variability around their medians.

To further investigate the performance of our proposal, we pick two scenarios from the ones considered in Figure 4. Notably, given $\bm{\sigma}=(2,2,2,4)$ and $\gamma_{j}=0\,\forall j$, we consider: (i) Scenario I, characterised by $\bm{\mu}=(1.91,-3.36,-0.37,3.99)$, with the worst treatment arm having the highest variability; (ii) Scenario II, characterised by $\bm{\mu}=(1.13,-3.48,-3.57,0.34)$, with the best treatment arm having the highest variability. Table 1 illustrates the operating characteristics of the considered designs under Scenarios I and II.

Under Scenario I, TGI, WE($1$, $0.8$) and WE($2$, $1.1$) achieve a conditional power higher than or comparable to FR, while their two-components power drops due to a greater difficulty in identifying the true second best arm at the end of the trial: indeed, the difference between CS$_{I}$ and CS$_{I\&II}$ is higher than $14\%$ in these three response-adaptive designs, while it is lower than $2\%$ for FR. Moreover, FR detects the two best arms and claims the superiority of the best one in $88\%$ of the simulated trials, while this percentage is reduced by $9$ points when WE($2$, $1.1$) is considered. This moderate sacrifice in terms of power allows WE($2$, $1.1$) to assign, on average, more than $3$ out of $4$ patients to the best treatment arm (around $1$ out of $4$ with FR), with a large gain in terms of patient benefit. All the other considered response-adaptive designs achieve moderately higher PB at the cost of a much lower two-components power. Notably, the highest PB is achieved by WE($1$, $0.55$), which makes this design an appealing alternative whenever one aims at optimising the benefit of the patients, also due to its low variability (s.e. $=0.06$).

Scenario I: $(\mu_{j},\,\sigma_{j})=\bigl\{(1.91,\,2),\,(-3.36,\,2),\,(-0.37,\,2),\,(3.99,\,4)\bigr\}$, $\gamma_{j}=0$
      Design       PB (s.e.)       CS$_{I}(\%)$       CS$_{I\&II}(\%)$       $(1-\beta)_{C}$       $(1-\beta)_{TC}$
      FR       24.99 (0.04)       99.63       97.97       0.89       0.88
      CB       81.22 (0.14)       97.10       74.49       0.72       0.53
      TS       81.57 (0.13)       96.43       73.90       0.70       0.52
      SGI       81.81 (0.07)       99.67       82.75       0.85       0.70
      TGI       80.99 (0.08)       94.84       75.17       0.95       0.71
      WE(1,0.55)       82.22 (0.06)       99.88       82.49       0.81       0.66
      WE(2,0.7)       80.92 (0.07)       99.85       84.46       0.86       0.73
      WE(1,0.8)       81.12 (0.06)       99.89       83.36       0.88       0.73
      WE(2,1.1)       77.68 (0.08)       99.93       85.57       0.93       0.79

Scenario II: $(\mu_{j},\,\sigma_{j})=\bigl\{(1.13,\,2),\,(-3.48,\,2),\,(-3.57,\,2),\,(0.34,\,4)\bigr\}$, $\gamma_{j}=0$
      Design       PB (s.e.)       CS$_{I}(\%)$       CS$_{I\&II}(\%)$       $(1-\beta)_{C}$       $(1-\beta)_{TC}$
      FR       25.05 (0.04)       75.72       75.72       0.07       0.06
      CB       38.93 (0.37)       43.31       39.31       0.52       0.21
      TS       35.40 (0.37)       47.88       43.89       0.52       0.23
      SGI       63.67 (0.28)       76.94       72.49       0.39       0.28
      TGI       23.31 (0.30)       78.71       76.33       0.22       0.17
      WE(1,0.55)       67.59 (0.26)       82.67       77.86       0.35       0.28
      WE(2,0.7)       76.78 (0.14)       91.99       86.67       0.29       0.25
      WE(1,0.8)       72.12 (0.17)       88.24       83.81       0.35       0.29
      WE(2,1.1)       76.70 (0.11)       91.19       86.51       0.33       0.28

Table 1: Operating characteristics of the WE($p$, $\kappa$) design for different combinations of $\kappa$ and $p$, and of the FR, CB, TS, SGI and TGI designs under Scenarios I and II. Both $(1-\beta)_{C}$ and $(1-\beta)_{TC}$ are based on the design-specific cut-off probabilities $\eta_{0.05}^{\text{ave}}$ determined in Section 5.2. Results are based on $10^{4}$ replicated trials.

Under Scenario II, WE($2$, $0.7$) and WE($2$, $1.1$) achieve the highest values of PB, with the latter associated with lower variability around this measure and, thus, deemed to be more reliable. CB performs best in terms of conditional power compared to the other designs, but fails to correctly identify the best treatment more than $50\%$ of the time. In contrast, WE designs correctly identify the two best arms in a high percentage of replications (i.e. more than $75\%$) and are associated with higher values of the two-components power. Notice that FR is associated with a much lower power than the other designs, probably due to the relatively lower exploration ($25$ patients on average) of the best arm.

Overall, the performance of the considered WE designs is comparable to that of the other response-adaptive designs, with small differences in terms of patient benefit but larger gains in terms of correct identification of the best arm and two-components power. In addition, WE designs generally tend to achieve a much higher patient benefit than FR, due to their ability to skew allocations according to the accrued data, at the cost of a moderately lower power. With a selection of $\kappa$ targeting the power, WE can result in a statistical power closer to the one of FR, but with remarkably larger values of PB. Finally, when the best arm also has the most variable response, the flexibility of WE designs allows an even higher gain over FR in terms of power. Further details on how operating characteristics are affected by different choices of $p$ and $\kappa$ are given in Section E.3 of the SM.

6 Illustration: co-primary endpoints in Phase II trials

6.1 Setting

In the previous sections, we focused on a single quantitative endpoint. However, multiple endpoints are generally collected in clinical trials, and we can adopt the design proposed in Section 2.3 to deal with this context. In particular, we illustrate the performance of the novel response-adaptive design in the case of an oncology clinical trial with $K=4$ competing drug combinations, a total sample size of $N=100$ and two co-primary endpoints. The first endpoint is a PD-marker (PDM), e.g. see the motivating setting in Section 1, targeting a specific value $\gamma_{j}^{(1)}=\gamma^{(1)}=0$, for all $j=1,\dots,K$, while the second endpoint is the tumour shrinkage rate (TSR, $\%$), whose theoretical target is assumed to be $\gamma_{j}^{(2)}=\gamma^{(2)}=100$ (i.e. full disappearance of the tumour), for all $j=1,\dots,K$. Covariance matrices are assumed to be known and equal to $\bm{\Sigma}_{j}=\bigl(\begin{smallmatrix}2^{2}&0\\ 0&8^{2}\end{smallmatrix}\bigr)$, $\forall j$. The goal of the study is to identify the best arm, to claim its superiority and to allocate most of the patients to it during the trial.

As an illustration, we consider the design for bivariate endpoints described in Section 2.3 with $\kappa=\{0.5,\,0.75\}$. We compare its performance to two alternative approaches: (a.) fixed and equal randomisation (FR), and (b.) current belief (CB). As already discussed in Section 5, FR is expected to achieve high power due to its even exploration of all the arms, while CB is expected to result in high PB. We choose a burn-in size of $B=1$ patient per arm for the response-adaptive designs, and we consider a scenario where $\bm{\mu}^{(1)}=(1,-1,2,-2.5)^{\prime}$ and $\bm{\mu}^{(2)}=(10,25,55,60)^{\prime}$. The four treatment arms can be characterised as follows: the first two treatments have values of the PDM relatively close to the target but are associated with a relatively small TSR ($10\%$ and $25\%$, respectively); the third has twice the PDM of the first but is associated with a remarkable average TSR (i.e. $55\%$); finally, the fourth is associated with a slightly poorer average PDM (i.e. more distant from the target) than the third treatment, but it has the highest average TSR. Observe that the coefficients of variation of the PDM are relatively small (range: 0.5-1.25) compared with those of the TSR (1.25-7.5).

The definition of "best" arm in the bivariate case may be more challenging, and it strictly depends on the adopted distance. The means of the two endpoints can vary greatly in magnitude, and the use of a Minkowski distance (e.g. Euclidean, Manhattan) may lead to the endpoint with higher magnitude prevailing over the others, unless some sort of standardisation is applied. Notably, for a given arm jj, we propose to use a standardised version of the distance between the target vector and the true mean vector, i.e. d~(𝝁j,𝜸j)=l=12|μj(l)γj(l)|σj(ll).\widetilde{d}(\bm{\mu}_{j},\bm{\gamma}_{j})=\sum_{l=1}^{2}\frac{\bigl{|}\mu_{j}^{\scriptscriptstyle(l)}-\gamma_{j}^{\scriptscriptstyle(l)}\bigr{|}}{\sqrt{\sigma_{j}^{\scriptscriptstyle(ll)}}}\,. Therefore, the final recommendation is j^:=argminj=1,,Kd~(𝝁j,𝜸j)\widehat{j}^{*}:=\underset{j=1,\dots,K}{\text{argmin}}\,\widetilde{d}(\bm{\mu}_{j},\bm{\gamma}_{j}). Adopting this standardised distance, the fourth treatment arm can be regarded as the best one.

We consider the same operating characteristics as in Section 5 to assess the performance of the considered designs. Hypothesis testing is performed following the same procedure described for the univariate case (cf. Section 3), with the only exception that $H_{0}$ is rejected at significance level $\alpha$ if the posterior probability that $\widetilde{d}(\bm{\mu}_{\hat{j}^{*}},\,\bm{\gamma}_{j})<\widetilde{d}(\bm{\mu}_{\hat{j}^{**}},\,\bm{\gamma}_{j})$ exceeds the cut-off probability $\eta_{\alpha}$, where $\hat{j}^{*}$ and $\hat{j}^{**}$ are, respectively, the estimated best and second best treatment arms according to the distance $\widetilde{d}(\cdot)$.

6.2 Type-I error control

Design-specific cut-off probabilities are calibrated under the set of null scenarios $\mathcal{S}^{\prime\prime}_{0}=\bigl\{(\bm{\mu}_{1},\dots,\bm{\mu}_{K}):\,\bm{\mu}_{j}-\bm{\gamma}=\bm{c},\,\forall j\bigr\}$, with $\bm{c}=(c_{1},\,c_{2})^{\prime}$, to achieve an average control of the type-I error at level $\alpha=5\%$. Figure 5 shows the resulting individual type-I error rates for $|\mathcal{S}^{\prime\prime}_{0}|=36$ combinations $(c_{1},c_{2})$.

Figure 5: Individual type-I error rates under several null scenarios (combinations of $c_{1}$ and $c_{2}$) for the considered designs under average control of the type-I error probability at $\alpha=5\%$. Cut-off probabilities are $0.911$ (FR), $0.918$ (CB), $0.898$ (WE(0.5)), $0.904$ (WE(0.75)).

The calibration under strong control is available in Section G of the SM. Notably, the FR and WE designs are associated with individual type-I error rates which do not exceed $6\%$, with the exception of WE(0.5), which registers a $9\%$ type-I error rate under the configuration $\bm{c}=(0,\,75)^{\prime}$. FR and WE(0.75) have more conservative type-I error rates when the mean of the PDM endpoint equals the target in all the treatment arms. On the contrary, there is a large inflation of the type-I error rate of CB under these scenarios, while elsewhere it is below $5\%$.

6.3 Results

Table 2 shows the operating characteristics of the considered designs. FR correctly selects the best two arms in a high proportion of replications ($82\%$), but loses power by treating only $50\%$ of patients (on average) in these two arms. Both WE(0.5) and CB achieve a high patient benefit measure, but the standard error for CB is more than five times larger than for WE(0.5). Such a high standard error suggests an unpredictable behaviour of CB, which highly depends on the responses of the first treated patients. Overall, despite the low burn-in size ($B=1$), WE designs obtain the best operating characteristics, with WE(0.5) prioritising the patient benefit objective and WE(0.75) achieving the highest power.

      Design       PB (s.e.)       CS$_{I}(\%)$       CS$_{I\&II}(\%)$       $(1-\beta)_{C}$       $(1-\beta)_{TC}$
      FR       24.98 (0.04)       0.82       0.82       0.41       0.34
      CB       64.41 (0.44)       0.67       0.65       0.33       0.21
      WE(0.5)       77.18 (0.08)       0.89       0.89       0.51       0.45
      WE(0.75)       49.78 (0.03)       0.89       0.89       0.52       0.47
Table 2: Operating characteristics of FR, CB, WE(0.5) and WE(0.75) under the scenario where $\bm{\mu}^{(1)}=(1,-1,2,-2.5)^{\prime}$, $\bm{\mu}^{(2)}=(10,25,55,60)^{\prime}$, $\bm{\gamma}_{j}=(0,\,100)^{\prime}$, $\bm{\Sigma}_{j}=\bigl(\begin{smallmatrix}2^{2}&0\\ 0&8^{2}\end{smallmatrix}\bigr)$, $j=1,\dots,K$ and $N=100$. Both $(1-\beta)_{C}$ and $(1-\beta)_{TC}$ are based on the design-specific cut-off probabilities $\eta_{0.05}^{\text{ave}}$ determined in Section 6.2. Results are based on $10^{4}$ replicated trials.

7 Discussion and future directions

In this work, we proposed a class of designs for the selection of arms in a multi-armed setting with continuous outcomes. It is based on a weighted information measure which retains the information about which outcomes are more desirable, a feature of considerable interest in settings with ethical constraints. Particular emphasis was given to a weight function with Gaussian kernel centred around a pre-specified clinical target $\gamma$. The resulting information gain criterion was used to govern the allocation of patients to the arms during the trial, and it showed flexibility in tackling the exploration vs. exploitation trade-off. The careful tuning of a penalisation parameter $\kappa$ was of great importance to ensure desirable operating characteristics of the design, and a general procedure to select $\kappa$ was illustrated.

In Section 5, we selected the burn-in size based on considerations regarding the need to control type-I error rates, taking into account both the case of known and unknown variances. In Section 6, we demonstrated that the proposed methodology can be made fully adaptive. In any case, the burn-in size plays a relevant role in sequential designs since it mitigates the impact of initial random fluctuations that might occur at the start of the trial, though it may also reduce the ethical benefits of full adaptiveness when an optimal treatment truly exists. Future research could explore the joint optimisation of $\kappa$ and $B$, with the former maximising a patient benefit criterion and the latter targeting a specific power or type-I error rate.

For the criterion presented in Section 2.2, we introduced an additional parameter $p$ controlling the impact of different variances on the allocation strategy. We do not recommend a robust selection of $p$, as the simultaneous calibration of multiple tuning parameters (e.g., $\kappa$, $B$ and $p$) could reduce the feasibility of our proposal. In general, we suggest fixing $p=1$ when heteroscedasticity among the arms is thought to be evident. This would mitigate the effect of a single large variance taking the lead over the others.

The choice of the weight function is crucial and should be based on the setting of interest. For example, a multivariate generalisation of the symmetric weight function was presented to evaluate multiple endpoints simultaneously. An asymmetric weight function was also presented in the SM: this type of weight function might be particularly useful in some dose-finding studies where, for example, overdosing and underdosing may need to be treated differently. Although normal (or multinormal) distributions were assumed throughout the work, the general procedure outlined in Section 2 can be adapted to accommodate various types of responses and weight functions.

In line with the primary objective of selecting the best treatment, we presented a hypothesis testing procedure which compares the two best treatments and assesses their correct identification. In this context, two measures of power were defined: the conditional power quantifies the ability of a design to correctly detect the superiority of the best treatment arm over the second best, whenever these two arms are properly identified at the end of the trial; the two-component power accounts for both the ability to identify the best two arms and to correctly detect the superiority of the best one. While the first measure isolates the actual gain in using the best treatment rather than the second best, the second provides a more comprehensive assessment of the decision-making performance of the design. A procedure to calibrate cut-off values was illustrated, along with different strategies to control the type-I error rate (achieving either strong or average control).
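As a rough illustration of how the two power measures could be estimated by simulation, the sketch below identifies the "best" arm as the one closest in mean to the target and uses equal allocation plus a plain two-sample z-test on the two selected arms; both are hypothetical stand-ins for the paper's adaptive design and simulation-based test, and all names are ours.

```python
import numpy as np

def power_measures(mus, sigma, n_per_arm, gamma, cutoff=1.96,
                   n_rep=1000, seed=1):
    """Monte-Carlo sketch of conditional and two-component power.

    'Best' means closest in mean to the target gamma. Identification and the
    superiority test use equal allocation and a plain z-test on the two
    selected arms -- simplified stand-ins for the paper's procedure.
    """
    rng = np.random.default_rng(seed)
    mus = np.asarray(mus, dtype=float)
    true_best, true_second = np.argsort(np.abs(mus - gamma))[:2]
    se = sigma * np.sqrt(2.0 / n_per_arm)        # SE of a difference in means
    n_ident = n_reject_and_ident = 0
    for _ in range(n_rep):
        xbars = np.array([rng.normal(m, sigma, n_per_arm).mean() for m in mus])
        order = np.argsort(np.abs(xbars - gamma))
        ident = order[0] == true_best and order[1] == true_second
        reject = abs(xbars[order[0]] - xbars[order[1]]) / se > cutoff
        n_ident += ident
        n_reject_and_ident += ident and reject
    cond_power = n_reject_and_ident / max(n_ident, 1)   # P(reject | identified)
    two_comp_power = n_reject_and_ident / n_rep         # P(identified and reject)
    return cond_power, two_comp_power
```

By construction the two-component power can never exceed the conditional power, mirroring the relationship between the two measures described above.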

We adopted a hybrid approach which combines Bayesian and frequentist methodologies. A Bayesian perspective was employed to define information measures based on the posterior distribution of $\mu$ derived from an objective prior. While non-informative priors treat all therapies equally and minimise potential subjective bias (Ghosh, 2011), integrating available historical information for some or all treatment arms can be readily implemented within the proposed framework. On the other hand, frequentist properties (such as type-I error rates) of the Bayesian hypothesis testing procedures were evaluated. This practice is common, as regulatory bodies often require explicit evidence that frequentist error rates are appropriately controlled (Shi and Yin, 2019). In addition, in the proposed allocation criterion, unknown variances are updated sequentially using their maximum likelihood estimates, similarly to the approach adopted in many sequential designs (e.g., Liu et al. (2009)). Although this method has shown good performance in practice, an attempt to make the procedure fully Bayesian is the subject of ongoing research.

In this work, we used a deterministic "select the best" rule to allocate patients in the sequential experiments, and we compared the proposed criterion to several allocation procedures known to have good properties either in terms of statistical power or patient benefit. For a fair comparison, we only considered response-adaptive designs based on deterministic rules. When appropriately calibrated through a robust selection of the parameter $\kappa$, the proposed class of designs offers a good balance between patient benefit and statistical power, making the correct decision in a remarkable proportion of replicated trials. In future work, a randomised allocation rule based on the information gain in (4) is worth considering; the feasibility of such a criterion is shown in Mozgunov and Jaki (2020b). Other good randomised options can be found in the literature on "Top-Two" allocation criteria, a class of allocation rules that, at each iteration, randomly assign patients to one of the two best-performing treatments only. Top-Two Probability Sampling and Top-Two Thompson Sampling are two common examples (Russo, 2020; Jourdan et al., 2022; Wang and Tiwari, 2023), but more sophisticated criteria based on weighted information measures may be explored.
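One allocation step of Top-Two Thompson Sampling can be sketched as follows, in the spirit of Russo (2020), for normal arms with known variance and flat priors. The maximisation version is shown; a weighted-information variant would replace the ranking criterion of the posterior draws. The implementation details (names, the resampling cap) are ours.

```python
import numpy as np

def top_two_thompson(xbars, ns, sigma, beta=0.5, rng=None, max_resample=100):
    """One allocation step of Top-Two Thompson Sampling (sketch).

    With probability beta the 'leader' (the arm maximising a posterior draw)
    is treated next; otherwise posterior draws are repeated until a different
    arm comes out on top (the 'challenger').
    """
    rng = np.random.default_rng() if rng is None else rng
    sd = sigma / np.sqrt(np.asarray(ns, dtype=float))   # posterior sd of each mean
    leader = int(np.argmax(rng.normal(xbars, sd)))
    if rng.random() < beta:
        return leader
    for _ in range(max_resample):
        challenger = int(np.argmax(rng.normal(xbars, sd)))
        if challenger != leader:
            return challenger
    return leader  # fallback if no distinct challenger appears
```

The parameter `beta` balances exploitation of the apparent best arm against exploration of its closest competitor, which is what makes this family attractive for best-arm identification.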

References

  • Antognini and Giovagnoli [2010] Alessandro Baldi Antognini and Alessandra Giovagnoli. Compound optimal allocation for individual and collective ethics in binary clinical trials. Biometrika, 97(4):935–946, 2010.
  • Bartroff and Leung Lai [2011] Jay Bartroff and Tze Leung Lai. Incorporating individual and collective ethics into Phase I cancer trial designs. Biometrics, 67(2):596–603, 2011.
  • Azriel et al. [2012] David Azriel, Micha Mandel, and Yosef Rinott. Optimal allocation to maximize the power of two-sample tests for binary response. Biometrika, 99(1):101–113, 2012.
  • Thall et al. [1989] PF Thall, R Simon, and SS Ellenberg. A two-stage design for choosing among several experimental treatments and a control in clinical trials. Biometrics, 45(2):537–547, 1989.
  • Rosenberger and Lachin [1993] William F Rosenberger and John M Lachin. The use of response-adaptive designs in clinical trials. Controlled Clinical Trials, 14(6):471–484, 1993.
  • Berry and Eick [1995] Donald A Berry and Stephen G Eick. Adaptive assignment versus balanced randomization in clinical trials: a decision analysis. Statistics in Medicine, 14(3):231–246, 1995.
  • Robertson et al. [2023] David S Robertson, Kim May Lee, Boryana C López-Kolkovska, and Sofía S Villar. Response-adaptive randomization in clinical trials: from myths to practical considerations. Statistical Science, 38(2):185, 2023.
  • Rombach et al. [2020] Ines Rombach, Ruth Knight, Nicholas Peckham, Jamie R Stokes, and Jonathan A Cook. Current practice in analysing and reporting binary outcome data—a review of randomised controlled trial reports. BMC Medicine, 18:1–8, 2020.
  • Altman and Royston [2006] Douglas G Altman and Patrick Royston. The cost of dichotomising continuous variables. BMJ, 332(7549):1080, 2006.
  • Zhao et al. [2015] Xuemei Zhao, Vijay Modur, Leonidas N Carayannopoulos, and Omar F Laterza. Biomarkers in pharmaceutical research. Clinical Chemistry, 61(11):1343–1353, 2015.
  • Kaiafa et al. [2021] Georgia Kaiafa, Stavroula Veneti, George Polychronopoulos, Dimitrios Pilalas, Stylianos Daios, Ilias Kanellos, Triantafyllos Didangelos, Stamatina Pagoni, and Christos Savopoulos. Is HbA1c an ideal biomarker of well-controlled diabetes? Postgraduate Medical Journal, 97(1148):380–383, 2021.
  • Kelbert and Mozgunov [2015] Mark Yakovlevich Kelbert and Pavel Mozgunov. Asymptotic behaviour of the weighted Rényi, Tsallis and Fisher entropies in a Bayesian problem. Eurasian Mathematical Journal, 6(2):6–17, 2015.
  • Suhov et al. [2016] Yuri Suhov, Izabella Stuhl, Salimeh Yasaei Sekeh, and Mark Kelbert. Basic inequalities for weighted entropies. Aequationes Mathematicae, 90:817–848, 2016.
  • Kasianova et al. [2021] Ksenia Kasianova, Mark Kelbert, and Pavel Mozgunov. Response adaptive designs for Phase II trials with binary endpoint based on context-dependent information measures. Computational Statistics & Data Analysis, 158:107187, 2021.
  • Mozgunov and Jaki [2020a] Pavel Mozgunov and Thomas Jaki. Improving safety of the continual reassessment method via a modified allocation rule. Statistics in Medicine, 39(7):906–922, 2020a.
  • Kasianova et al. [2023] Ksenia Kasianova, Mark Kelbert, and Pavel Mozgunov. Response-adaptive randomization for multiarm clinical trials using context-dependent information measures. Biometrical Journal, 65(8):2200301, 2023.
  • Mozgunov and Jaki [2019] Pavel Mozgunov and Thomas Jaki. An information theoretic Phase I–II design for molecularly targeted agents that does not require an assumption of monotonicity. Journal of the Royal Statistical Society Series C: Applied Statistics, 68(2):347–367, 2019.
  • Mozgunov and Jaki [2020b] Pavel Mozgunov and Thomas Jaki. An information theoretic approach for selecting arms in clinical trials. Journal of the Royal Statistical Society Series B: Statistical Methodology, 82(5):1223–1247, 2020b.
  • Azriel et al. [2011] David Azriel, Micha Mandel, and Yosef Rinott. The treatment versus experimentation dilemma in dose finding studies. Journal of Statistical Planning and Inference, 141(8):2759–2768, 2011.
  • Villar et al. [2015] Sofia Villar, Jack Bowden, and James Wason. Multi-armed bandit models for the optimal design of clinical trials: Benefits and challenges. Statistical Science, 30:199–215, 2015.
  • Aziz et al. [2021] Maryam Aziz, Emilie Kaufmann, and Marie-Karelle Riviere. On multi-armed bandit designs for dose-finding trials. Journal of Machine Learning Research, 22(14):1–38, 2021.
  • Smith and Villar [2018] Adam L Smith and Sofía S Villar. Bayesian adaptive bandit-based designs using the Gittins index for multi-armed trials with normally distributed endpoints. Journal of Applied Statistics, 45(6):1052–1076, 2018.
  • Williamson and Villar [2020] S Faye Williamson and Sofía S Villar. A response-adaptive randomization procedure for multi-armed clinical trials with normally distributed outcomes. Biometrics, 76(1):197–209, 2020.
  • Daniells et al. [2024] Libby Daniells, Pavel Mozgunov, Helen Barnett, Alun Bedding, and Thomas Jaki. How to add baskets to an ongoing basket trial with information borrowing. arXiv preprint arXiv:2407.06069, 2024.
  • Wang [1991] You-Gan Wang. Sequential allocation in clinical trials. Communications in Statistics-Theory and Methods, 20(3):791–805, 1991.
  • Ghosh [2011] Malay Ghosh. Objective priors: An introduction for frequentists. Statistical Science, 26(2):187–202, 2011.
  • Shi and Yin [2019] Haolun Shi and Guosheng Yin. Control of Type I error rates in Bayesian sequential designs. Bayesian Analysis, 14(2):399–425, 2019.
  • Liu et al. [2009] Guohui Liu, William F Rosenberger, and Linda M Haines. Sequential designs for ordinal Phase I clinical trials. Biometrical Journal, 51(2):335–347, 2009.
  • Russo [2020] Daniel Russo. Simple Bayesian algorithms for best-arm identification. Operations Research, 68(6):1625–1647, 2020.
  • Jourdan et al. [2022] Marc Jourdan, Rémy Degenne, Dorian Baudry, Rianne de Heide, and Emilie Kaufmann. Top two algorithms revisited. Advances in Neural Information Processing Systems, 35:26791–26803, 2022.
  • Wang and Tiwari [2023] Jixian Wang and Ram Tiwari. Adaptive designs for best treatment identification with top-two Thompson sampling and acceleration. Pharmaceutical Statistics, 22(6):1089–1103, 2023.
  • Gittins et al. [2011] John Gittins, Kevin Glazebrook, and Richard Weber. Multi-armed bandit allocation indices. John Wiley & Sons, 2011.

Supplementary Material

A Proofs

A.1 Proof of Theorem 1

Proof.

Consider the generic arm $j$. It is well known that the Shannon entropy of a normal distribution with mean $\mu_{j}$ and variance $\sigma_{n_{j}}^{2}$ is given by

$$h(\pi_{n_{j}})=\frac{1}{2}+\frac{1}{2}\log\bigl(2\pi\sigma_{n_{j}}^{2}\bigr)\,,$$

while the weighted Shannon entropy depends on the adopted weight function.

To begin, one can derive the explicit formula of the (normalised) weight function. For this purpose, the following equation is useful:

$$\biggl(\frac{\mu_{j}-\gamma_{j}}{\sigma_{\phi,\,j}}\biggr)^{2}+\biggl(\frac{\mu_{j}-\bar{x}_{n_{j}}}{\sigma_{n_{j}}}\biggr)^{2}=\biggl(\frac{\mu_{j}-\widetilde{\mu}_{j}}{\widetilde{\sigma}_{j}}\biggr)^{2}-\biggl(\frac{\widetilde{\mu}_{j}}{\widetilde{\sigma}_{j}}\biggr)^{2}+\biggl(\frac{\gamma_{j}}{\sigma_{\phi,\,j}}\biggr)^{2}+\biggl(\frac{\bar{x}_{n_{j}}}{\sigma_{n_{j}}}\biggr)^{2}\,,\tag{17}$$

where $\widetilde{\mu}_{j}=\frac{\gamma_{j}\sigma_{n_{j}}^{2}+\bar{x}_{n_{j}}\sigma_{\phi,\,j}^{2}}{\sigma_{n_{j}}^{2}+\sigma_{\phi,\,j}^{2}}$ and $\widetilde{\sigma}^{2}_{j}=\frac{\sigma_{n_{j}}^{2}\,\sigma_{\phi,\,j}^{2}}{\sigma_{n_{j}}^{2}+\sigma_{\phi,\,j}^{2}}$, and find that the normalisation condition $\int_{\mathbb{R}}\phi_{\gamma}(\mu_{j})\pi_{n_{j}}(\mu_{j})\,d\mu_{j}=1$ is satisfied by

$$\begin{aligned}\phi_{\gamma}(\mu_{j})&=C(\bar{x}_{n_{j}},\sigma^{2}_{j},\gamma_{j},n_{j})\,\exp\biggl\{-\frac{1}{2}\Bigl(\frac{\mu_{j}-\gamma_{j}}{\sigma_{\phi,\,j}}\Bigr)^{2}\biggr\}\\&=\frac{\sigma_{n_{j}}}{\widetilde{\sigma}_{j}}\,\exp\Biggl\{-\frac{1}{2}\Biggl[\biggl(\frac{\mu_{j}-\gamma_{j}}{\sigma_{\phi,\,j}}\biggr)^{2}+\biggl(\frac{\widetilde{\mu}_{j}}{\widetilde{\sigma}_{j}}\biggr)^{2}-\biggl(\frac{\gamma_{j}}{\sigma_{\phi,\,j}}\biggr)^{2}-\biggl(\frac{\bar{x}_{n_{j}}}{\sigma_{n_{j}}}\biggr)^{2}\Biggr]\Biggr\}\,.\end{aligned}\tag{18}$$

Notably,

$$\phi_{\gamma}(\mu_{j})\pi_{n_{j}}(\mu_{j})=\bigl(2\pi\widetilde{\sigma}_{j}^{2}\bigr)^{-\frac{1}{2}}\exp\Biggl\{-\frac{1}{2}\biggl(\frac{\mu_{j}-\widetilde{\mu}_{j}}{\widetilde{\sigma}_{j}}\biggr)^{2}\Biggr\}=\varphi(\mu_{j};\widetilde{\mu}_{j},\widetilde{\sigma}_{j}^{2})\,,$$

with $\varphi(\mu_{j};\widetilde{\mu}_{j},\widetilde{\sigma}_{j}^{2})$ being the pdf in $\mu_{j}$ of a normal distribution with mean $\widetilde{\mu}_{j}$ and variance $\widetilde{\sigma}_{j}^{2}$.

As a consequence, the weighted Shannon entropy can be written as

$$\begin{aligned}h^{\phi_{\gamma}}(\pi_{n_{j}})&=-\int_{\mathbb{R}}\phi_{\gamma}(\mu_{j})\pi_{n_{j}}(\mu_{j})\log\pi_{n_{j}}(\mu_{j})\,d\mu_{j}\\&=\int_{\mathbb{R}}\varphi(\mu_{j};\widetilde{\mu}_{j},\widetilde{\sigma}_{j}^{2})\,\biggl[\frac{1}{2}\log\bigl(2\pi\sigma_{n_{j}}^{2}\bigr)+\frac{1}{2}\biggl(\frac{\mu_{j}-\bar{x}_{n_{j}}}{\sigma_{n_{j}}}\biggr)^{2}\biggr]\,d\mu_{j}\\&=\frac{1}{2}\log\bigl(2\pi\sigma_{n_{j}}^{2}\bigr)+\frac{1}{2\sigma_{n_{j}}^{2}}\int_{\mathbb{R}}(\mu_{j}-\bar{x}_{n_{j}})^{2}\,\varphi(\mu_{j};\widetilde{\mu}_{j},\widetilde{\sigma}_{j}^{2})\,d\mu_{j}\,.\end{aligned}$$

Exploiting the fact that $(\mu_{j}-\bar{x}_{n_{j}})^{2}=(\mu_{j}-\widetilde{\mu}_{j})^{2}+2\mu_{j}(\widetilde{\mu}_{j}-\bar{x}_{n_{j}})-\widetilde{\mu}_{j}^{2}+\bar{x}_{n_{j}}^{2}$, the weighted Shannon entropy can be finally expressed as

$$h^{\phi_{\gamma}}(\pi_{n_{j}})=\frac{1}{2}\log\bigl(2\pi\sigma_{n_{j}}^{2}\bigr)+\frac{\widetilde{\sigma}_{j}^{2}+(\widetilde{\mu}_{j}-\bar{x}_{n_{j}})^{2}}{2\sigma_{n_{j}}^{2}}\,.$$

The resulting information gain is then

$$\Delta(\mu_{j})=h(\pi_{n_{j}})-h^{\phi_{\gamma}}(\pi_{n_{j}})=\frac{\sigma_{n_{j}}^{2}-\widetilde{\sigma}_{j}^{2}-(\widetilde{\mu}_{j}-\bar{x}_{n_{j}})^{2}}{2\sigma_{n_{j}}^{2}}\,.$$

By observing that $\widetilde{\mu}_{j}-\bar{x}_{n_{j}}=\frac{(\gamma_{j}-\bar{x}_{n_{j}})\sigma_{n_{j}}^{2}}{\sigma_{n_{j}}^{2}+\sigma_{\phi,\,j}^{2}}$, one can easily derive the final expression of the information gain for the weight function in (18), i.e.

$$\Delta_{n_{j}}=\frac{1}{2}\,\frac{\sigma_{n_{j}}^{2}}{\sigma_{n_{j}}^{2}+\sigma_{\phi,\,j}^{2}}-\frac{1}{2}\Biggl(\frac{\gamma_{j}-\bar{x}_{n_{j}}}{\sigma_{n_{j}}}\Biggr)^{2}\Biggl(\frac{\sigma_{n_{j}}^{2}}{\sigma_{n_{j}}^{2}+\sigma_{\phi,\,j}^{2}}\Biggr)^{2}\,.$$

If $\sigma_{\phi,\,j}^{2}=\frac{\sigma_{j}^{p}}{n_{j}^{\kappa}}$, the result immediately follows.
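The closed form derived above can be checked numerically: the sketch below compares the formula for $\Delta_{n_j}$ (before substituting $\sigma_{\phi,\,j}^{2}=\sigma_j^p/n_j^\kappa$) with a direct quadrature of $h(\pi_{n_j})-h^{\phi_\gamma}(\pi_{n_j})$. The grid width and illustrative values are our choices.

```python
import numpy as np

def delta_closed_form(xbar, sigma_n2, sigma_phi2, gamma):
    """Information gain in the form derived in the proof of Theorem 1."""
    r = sigma_n2 / (sigma_n2 + sigma_phi2)
    return 0.5 * r - 0.5 * ((gamma - xbar) ** 2 / sigma_n2) * r ** 2

def delta_numeric(xbar, sigma_n2, sigma_phi2, gamma, m=200001, half_width=15.0):
    """h(pi) - h^phi(pi) evaluated by trapezoidal quadrature on a fine grid."""
    def trapz(y, x):
        return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)
    sd = np.sqrt(sigma_n2)
    mu = np.linspace(xbar - half_width * sd, xbar + half_width * sd, m)
    pi = np.exp(-0.5 * (mu - xbar) ** 2 / sigma_n2) / np.sqrt(2 * np.pi * sigma_n2)
    w = np.exp(-0.5 * (mu - gamma) ** 2 / sigma_phi2)
    w /= trapz(w * pi, mu)                   # normalisation: int w * pi dmu = 1
    h = -trapz(pi * np.log(pi), mu)          # Shannon entropy of pi
    h_w = -trapz(w * pi * np.log(pi), mu)    # weighted Shannon entropy
    return h - h_w
```

Agreement of the two functions to several decimal places confirms the algebra of the proof for the chosen parameter values.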

A.2 Proof of Theorem 2

Proof.

Consider the generic arm $j$. The Shannon entropy of a multivariate normal distribution with $q$-dimensional mean vector $\bm{\mu}_{j}$ and $q\times q$ covariance matrix $\bm{\Sigma}_{n_{j}}$ is given by

$$h(\pi_{n_{j}})=\frac{q}{2}+\frac{q}{2}\log(2\pi)+\frac{1}{2}\log\det(\bm{\Sigma}_{n_{j}})\,.$$

Recall that $\bm{\Omega}_{\phi,\,j}=\bm{\Sigma}_{\phi,\,j}^{-1}$ and $\bm{\Omega}_{n_{j}}=\bm{\Sigma}^{-1}_{n_{j}}$. To derive the (normalised) weight function, the following result is of particular interest:

$$\begin{aligned}(\bm{\mu}_{j}-\bm{\gamma}_{j})^{\prime}&\bm{\Omega}_{\phi,\,j}(\bm{\mu}_{j}-\bm{\gamma}_{j})+(\bm{\mu}_{j}-\bar{\bm{x}}_{n_{j}})^{\prime}\bm{\Omega}_{n_{j}}(\bm{\mu}_{j}-\bar{\bm{x}}_{n_{j}})\\&=\bm{\mu}_{j}^{\prime}\widetilde{\bm{\Omega}}_{j}\bm{\mu}_{j}-2\bm{\mu}_{j}^{\prime}\bm{b}_{j}+\bm{\gamma}_{j}^{\prime}\bm{\Omega}_{\phi,\,j}\bm{\gamma}_{j}+\bar{\bm{x}}_{n_{j}}^{\prime}\bm{\Omega}_{n_{j}}\bar{\bm{x}}_{n_{j}}\\&=\bm{\mu}_{j}^{\prime}\widetilde{\bm{\Omega}}_{j}\bm{\mu}_{j}-2\bm{\mu}_{j}^{\prime}\bm{b}_{j}+\bm{b}_{j}^{\prime}\widetilde{\bm{\Sigma}}_{j}\bm{b}_{j}-\bm{b}_{j}^{\prime}\widetilde{\bm{\Sigma}}_{j}\bm{b}_{j}+\bm{\gamma}_{j}^{\prime}\bm{\Omega}_{\phi,\,j}\bm{\gamma}_{j}+\bar{\bm{x}}_{n_{j}}^{\prime}\bm{\Omega}_{n_{j}}\bar{\bm{x}}_{n_{j}}\\&=(\bm{\mu}_{j}-\widetilde{\bm{\mu}}_{j})^{\prime}\widetilde{\bm{\Omega}}_{j}(\bm{\mu}_{j}-\widetilde{\bm{\mu}}_{j})-\bm{b}_{j}^{\prime}\widetilde{\bm{\Sigma}}_{j}\bm{b}_{j}+\bm{\gamma}_{j}^{\prime}\bm{\Omega}_{\phi,\,j}\bm{\gamma}_{j}+\bar{\bm{x}}_{n_{j}}^{\prime}\bm{\Omega}_{n_{j}}\bar{\bm{x}}_{n_{j}}\,,\end{aligned}\tag{19}$$

where $\bm{b}_{j}=\bm{\Omega}_{\phi,\,j}\bm{\gamma}_{j}+\bm{\Omega}_{n_{j}}\bar{\bm{x}}_{n_{j}}$, $\widetilde{\bm{\mu}}_{j}=\widetilde{\bm{\Sigma}}_{j}\bm{b}_{j}$ and $\widetilde{\bm{\Omega}}_{j}=\widetilde{\bm{\Sigma}}_{j}^{-1}=\bm{\Omega}_{\phi,\,j}+\bm{\Omega}_{n_{j}}$.

Then, using the result in (19), the normalisation condition $\int_{\mathbb{R}^{q}}\phi_{\gamma}(\bm{\mu}_{j})\pi_{n_{j}}(\bm{\mu}_{j})\,d\bm{\mu}_{j}=1$ is satisfied by

$$\begin{aligned}\phi_{\gamma}(\bm{\mu}_{j})&=C(\bm{\bar{x}}_{n_{j}},\bm{\Sigma}_{j},\bm{\gamma}_{j},n_{j})\,\exp\biggl\{-\frac{1}{2}\bigl(\bm{\mu}_{j}-\bm{\gamma}_{j}\bigr)^{\prime}\bm{\Omega}_{\phi,\,j}\bigl(\bm{\mu}_{j}-\bm{\gamma}_{j}\bigr)\biggr\}\\&=\sqrt{\det\bigl(\bm{\Sigma}_{n_{j}}\widetilde{\bm{\Omega}}_{j}\bigr)}\,\exp\biggl\{-\frac{1}{2}\biggl[(\bm{\mu}_{j}-\bm{\gamma}_{j})^{\prime}\bm{\Omega}_{\phi,\,j}(\bm{\mu}_{j}-\bm{\gamma}_{j})+\widetilde{\bm{\mu}}_{j}^{\prime}\widetilde{\bm{\Omega}}_{j}\widetilde{\bm{\mu}}_{j}-\bm{\gamma}_{j}^{\prime}\bm{\Omega}_{\phi,\,j}\bm{\gamma}_{j}-\bar{\bm{x}}_{n_{j}}^{\prime}\bm{\Omega}_{n_{j}}\bar{\bm{x}}_{n_{j}}\biggr]\biggr\}\,,\end{aligned}$$

Notably,

$$\phi_{\gamma}(\bm{\mu}_{j})\pi_{n_{j}}(\bm{\mu}_{j})=(2\pi)^{-\frac{q}{2}}\bigl(\det\widetilde{\bm{\Sigma}}_{j}\bigr)^{-\frac{1}{2}}\exp\biggl\{-\frac{1}{2}(\bm{\mu}_{j}-\widetilde{\bm{\mu}}_{j})^{\prime}\widetilde{\bm{\Omega}}_{j}(\bm{\mu}_{j}-\widetilde{\bm{\mu}}_{j})\biggr\}=\varphi(\bm{\mu}_{j};\,\widetilde{\bm{\mu}}_{j},\,\widetilde{\bm{\Sigma}}_{j})\,,$$

with $\varphi(\bm{\mu}_{j};\,\widetilde{\bm{\mu}}_{j},\,\widetilde{\bm{\Sigma}}_{j})$ being the pdf in $\bm{\mu}_{j}$ of a normal distribution with mean vector $\widetilde{\bm{\mu}}_{j}$ and covariance matrix $\widetilde{\bm{\Sigma}}_{j}$.

As a consequence, the weighted Shannon entropy can be written as

$$\begin{aligned}h^{\phi_{\gamma}}(\pi_{n_{j}})&=-\int_{\mathbb{R}^{q}}\phi_{\gamma}(\bm{\mu}_{j})\pi_{n_{j}}(\bm{\mu}_{j})\log\pi_{n_{j}}(\bm{\mu}_{j})\,d\bm{\mu}_{j}\\&=\int_{\mathbb{R}^{q}}\varphi(\bm{\mu}_{j};\,\widetilde{\bm{\mu}}_{j},\,\widetilde{\bm{\Sigma}}_{j})\,\biggl[\frac{q}{2}\log(2\pi)+\frac{1}{2}\log\det(\bm{\Sigma}_{n_{j}})+\frac{1}{2}\bigl(\bm{\mu}_{j}-\bm{\bar{x}}_{n_{j}}\bigr)^{\prime}\bm{\Omega}_{n_{j}}\bigl(\bm{\mu}_{j}-\bm{\bar{x}}_{n_{j}}\bigr)\biggr]\,d\bm{\mu}_{j}\\&=\frac{q}{2}\log(2\pi)+\frac{1}{2}\log\det(\bm{\Sigma}_{n_{j}})+\frac{1}{2}\int_{\mathbb{R}^{q}}\Bigl[(\bm{\mu}_{j}-\bm{\bar{x}}_{n_{j}})^{\prime}\bm{\Omega}_{n_{j}}(\bm{\mu}_{j}-\bm{\bar{x}}_{n_{j}})\Bigr]\varphi(\bm{\mu}_{j};\,\widetilde{\bm{\mu}}_{j},\,\widetilde{\bm{\Sigma}}_{j})\,d\bm{\mu}_{j}\,.\end{aligned}$$

To solve the integral in the last line of the previous expression, we proceed in three steps:

  1. Exploiting the properties of the trace of a matrix:

    $$(\bm{\mu}_{j}-\bm{\bar{x}}_{n_{j}})^{\prime}\bm{\Omega}_{n_{j}}(\bm{\mu}_{j}-\bm{\bar{x}}_{n_{j}})=\operatorname{tr}\Bigl[\bm{\Omega}_{n_{j}}(\bm{\mu}_{j}-\bm{\bar{x}}_{n_{j}})(\bm{\mu}_{j}-\bm{\bar{x}}_{n_{j}})^{\prime}\Bigr]\,.$$
  2. Adding and subtracting the quantity $\bm{\widetilde{\mu}}_{j}$:

    $$\begin{aligned}(\bm{\mu}_{j}-\bm{\bar{x}}_{n_{j}})(\bm{\mu}_{j}-\bm{\bar{x}}_{n_{j}})^{\prime}&=\bigl[(\bm{\mu}_{j}-\bm{\widetilde{\mu}}_{j})+(\bm{\widetilde{\mu}}_{j}-\bm{\bar{x}}_{n_{j}})\bigr]\bigl[(\bm{\mu}_{j}-\bm{\widetilde{\mu}}_{j})+(\bm{\widetilde{\mu}}_{j}-\bm{\bar{x}}_{n_{j}})\bigr]^{\prime}\\&=(\bm{\mu}_{j}-\bm{\widetilde{\mu}}_{j})(\bm{\mu}_{j}-\bm{\widetilde{\mu}}_{j})^{\prime}+2\bm{\mu}_{j}(\bm{\widetilde{\mu}}_{j}-\bm{\bar{x}}_{n_{j}})^{\prime}+\bm{\bar{x}}_{n_{j}}\bm{\bar{x}}_{n_{j}}^{\prime}-\bm{\widetilde{\mu}}_{j}\bm{\widetilde{\mu}}_{j}^{\prime}\,,\end{aligned}$$

    where the cross terms have been symmetrised, which is harmless under the trace.
  3. Combining the previous results to obtain a sum of three tractable integrals:

    $$\int_{\mathbb{R}^{q}}\Bigl[(\bm{\mu}_{j}-\bm{\bar{x}}_{n_{j}})^{\prime}\bm{\Omega}_{n_{j}}(\bm{\mu}_{j}-\bm{\bar{x}}_{n_{j}})\Bigr]\varphi(\bm{\mu}_{j};\,\bm{\widetilde{\mu}}_{j},\,\bm{\widetilde{\Sigma}}_{j})\,d\bm{\mu}_{j}=\sum_{i=1}^{3}I_{i}\,,$$

    where

    $$\begin{aligned}(i)\quad I_{1}&=\operatorname{tr}\biggl[\bm{\Omega}_{n_{j}}\int_{\mathbb{R}^{q}}(\bm{\mu}_{j}-\bm{\widetilde{\mu}}_{j})(\bm{\mu}_{j}-\bm{\widetilde{\mu}}_{j})^{\prime}\varphi(\bm{\mu}_{j};\,\bm{\widetilde{\mu}}_{j},\,\bm{\widetilde{\Sigma}}_{j})\,d\bm{\mu}_{j}\biggr]=\operatorname{tr}\bigl[\bm{\Omega}_{n_{j}}\bm{\widetilde{\Sigma}}_{j}\bigr]\\(ii)\quad I_{2}&=\operatorname{tr}\biggl[2\,\bm{\Omega}_{n_{j}}\biggl(\int_{\mathbb{R}^{q}}\bm{\mu}_{j}\,\varphi(\bm{\mu}_{j};\,\bm{\widetilde{\mu}}_{j},\,\bm{\widetilde{\Sigma}}_{j})\,d\bm{\mu}_{j}\biggr)(\bm{\widetilde{\mu}}_{j}-\bm{\bar{x}}_{n_{j}})^{\prime}\biggr]=\operatorname{tr}\bigl[2\,\bm{\Omega}_{n_{j}}\bm{\widetilde{\mu}}_{j}(\bm{\widetilde{\mu}}_{j}-\bm{\bar{x}}_{n_{j}})^{\prime}\bigr]\\(iii)\quad I_{3}&=\operatorname{tr}\biggl[\bm{\Omega}_{n_{j}}(\bm{\bar{x}}_{n_{j}}\bm{\bar{x}}_{n_{j}}^{\prime}-\bm{\widetilde{\mu}}_{j}\bm{\widetilde{\mu}}_{j}^{\prime})\int_{\mathbb{R}^{q}}\varphi(\bm{\mu}_{j};\,\bm{\widetilde{\mu}}_{j},\,\bm{\widetilde{\Sigma}}_{j})\,d\bm{\mu}_{j}\biggr]=\operatorname{tr}\bigl[\bm{\Omega}_{n_{j}}(\bm{\bar{x}}_{n_{j}}\bm{\bar{x}}_{n_{j}}^{\prime}-\bm{\widetilde{\mu}}_{j}\bm{\widetilde{\mu}}_{j}^{\prime})\bigr]\,.\end{aligned}$$

Then,

$$\begin{aligned}h^{\phi_{\gamma}}(\pi_{n_{j}})&=\frac{q}{2}\log(2\pi)+\frac{1}{2}\log\det(\bm{\Sigma}_{n_{j}})+\frac{1}{2}\sum_{i=1}^{3}I_{i}\\&=\frac{q}{2}\log(2\pi)+\frac{1}{2}\log\det(\bm{\Sigma}_{n_{j}})+\frac{1}{2}\operatorname{tr}\bigl[\bm{\Omega}_{n_{j}}\bm{\widetilde{\Sigma}}_{j}\bigr]+\frac{1}{2}\operatorname{tr}\bigl[(\bm{\widetilde{\mu}}_{j}-\bm{\bar{x}}_{n_{j}})^{\prime}\bm{\Omega}_{n_{j}}(\bm{\widetilde{\mu}}_{j}-\bm{\bar{x}}_{n_{j}})\bigr]\\&=\frac{q}{2}\log(2\pi)+\frac{1}{2}\log\det(\bm{\Sigma}_{n_{j}})+\frac{1}{2}\operatorname{tr}\bigl[\bm{\Omega}_{n_{j}}\bm{\widetilde{\Sigma}}_{j}\bigr]+\frac{1}{2}(\bm{\widetilde{\mu}}_{j}-\bm{\bar{x}}_{n_{j}})^{\prime}\bm{\Omega}_{n_{j}}(\bm{\widetilde{\mu}}_{j}-\bm{\bar{x}}_{n_{j}})\end{aligned}$$

and, after replacing the values of $\bm{\widetilde{\mu}}_{j}$ and $\bm{\widetilde{\Sigma}}_{j}$,

$$\begin{aligned}\Delta_{n_{j}}&=\frac{q}{2}-\frac{1}{2}\operatorname{tr}\Bigl[\bm{\Omega}_{n_{j}}\bigl(\bm{\Omega}_{n_{j}}+\bm{\Omega}_{\phi,\,j}\bigr)^{-1}\Bigr]\\&\quad-\frac{1}{2}\Bigl[\bigl(\bm{\gamma}_{j}-\bar{\bm{x}}_{n_{j}}\bigr)^{\prime}\bm{\Omega}_{\phi,\,j}\bigl(\bm{\Omega}_{n_{j}}+\bm{\Omega}_{\phi,\,j}\bigr)^{-1}\bm{\Omega}_{n_{j}}\bigl(\bm{\Omega}_{n_{j}}+\bm{\Omega}_{\phi,\,j}\bigr)^{-1}\bm{\Omega}_{\phi,\,j}\bigl(\bm{\gamma}_{j}-\bar{\bm{x}}_{n_{j}}\bigr)\Bigr]\,.\end{aligned}$$

If $\bm{\Omega}_{\phi,\,j}=\bm{\Sigma}_{\phi,\,j}^{-1}=n_{j}^{\kappa}\,\bm{\Sigma}_{j}^{-1}$, the result immediately follows.
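As a quick consistency check, the multivariate gain above can be implemented with a few matrix operations; for $q=1$ it must reduce to the univariate expression of Theorem 1. This is a sketch under our own naming, using the precision-matrix form before the substitution of $\bm{\Omega}_{\phi,\,j}$.

```python
import numpy as np

def delta_multivariate(xbar, gamma, Omega_n, Omega_phi):
    """Information gain of Theorem 2 for a q-dimensional endpoint
    (precision-matrix parametrisation)."""
    xbar = np.atleast_1d(np.asarray(xbar, dtype=float))
    gamma = np.atleast_1d(np.asarray(gamma, dtype=float))
    q = xbar.size
    A = np.linalg.inv(Omega_n + Omega_phi)   # (Omega_n + Omega_phi)^{-1}
    d = gamma - xbar
    quad = d @ Omega_phi @ A @ Omega_n @ A @ Omega_phi @ d
    return q / 2.0 - 0.5 * np.trace(Omega_n @ A) - 0.5 * quad
```

With $q=1$, $\Omega_{n_j}=1/\sigma_{n_j}^2$ and $\Omega_{\phi,j}=1/\sigma_{\phi,j}^2$, the trace term collapses to $\sigma_{\phi,j}^2/(\sigma_{n_j}^2+\sigma_{\phi,j}^2)$ and the quadratic term to the univariate penalty, so the two formulas agree.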

B Asymmetric weight function

The allocation rule for single endpoints presented in the main text has the property of giving the same importance to values of the sample mean lying above or below the target mean by the same constant $x\in\mathbb{R}$. More formally, if $\bar{x}_{n_{j}}^{-}=\gamma-x$ and $\bar{x}_{n_{j}}^{+}=\gamma+x$, then $\phi(\bar{x}_{n_{j}}^{-})=\phi(\bar{x}_{n_{j}}^{+})$ and $\Delta_{n_{j}}(\bar{x}_{n_{j}}^{-})=\Delta_{n_{j}}(\bar{x}_{n_{j}}^{+})$, ceteris paribus. However, the assumption of symmetric behaviour around the target may not be tenable whenever the investigator wishes to weight values above and below the target differently.

To accommodate such settings, we propose the following family of piecewise asymmetric weight functions:

$$\phi_{\gamma}(\mu_{j})=C(\bar{x}_{n_{j}},\sigma^{2}_{j},\gamma_{j},n_{j})\,\cdot\,\begin{cases}\exp\Bigl\{-\frac{1}{2}\Bigl(\frac{\mu_{j}-\gamma_{j}}{\sigma_{\phi_{a},\,j}}\Bigr)^{2}\Bigr\}&\mu_{j}\leq\gamma_{j}\\[4pt]\exp\Bigl\{-\frac{1}{2}\Bigl(\frac{\mu_{j}-\gamma_{j}}{\sigma_{\phi_{b},\,j}}\Bigr)^{2}\Bigr\}&\mu_{j}>\gamma_{j}\end{cases}\,,\tag{20}$$

where both pieces are proportional to truncated normal densities with mean $\gamma_{j}$: the left piece is truncated to $(-\infty,\gamma_{j}]$ with variance $\sigma^{2}_{\phi_{a},j}=a^{2}\frac{\sigma_{j}^{2}}{n}$, while the right piece is truncated to $(\gamma_{j},+\infty)$ with variance $\sigma^{2}_{\phi_{b},j}=b^{2}\frac{\sigma_{j}^{2}}{n}$, with $a,b\in\mathbb{R}^{+}$, and with $C(\bar{x}_{n_{j}},\sigma^{2}_{j},\gamma_{j},n_{j})$ such that $\int_{\mathbb{R}}\phi_{\gamma}(\mu_{j})\pi_{n_{j}}(\mu_{j})\,d\mu_{j}=1$. Notice that the symmetric weight function is a special case of the asymmetric one in (20) when $a=b=1$. In Theorem 3, we derive the corresponding information gain, $\Delta_{n_{j}}^{a,b}$.

Theorem 3.

Consider the same assumptions as in Theorem 1 (cf. main text), but with the family of weight functions in (20). Then the corresponding information gain is given by

$$\Delta_{n_{j}}^{a,b}=\frac{1}{2}-\frac{\dot{\sigma}_{a,\,j}^{2}+(\dot{\mu}_{a,\,j}-\bar{x}_{n_{j}})^{2}}{2\sigma_{n_{j}}^{2}}\cdot\frac{D_{a,\,j}}{D_{a,\,j}+D_{b,\,j}}-\frac{\dot{\sigma}_{b,\,j}^{2}+(\dot{\mu}_{b,\,j}-\bar{x}_{n_{j}})^{2}}{2\sigma_{n_{j}}^{2}}\cdot\frac{D_{b,\,j}}{D_{a,\,j}+D_{b,\,j}}\,,\tag{21}$$

with

$$D_{a,\,j}=\tilde{\sigma}_{a,\,j}\,\exp\Bigl\{-\frac{1}{2}\Bigl(\frac{\gamma_{j}}{\sigma_{\phi_{a},\,j}}\Bigr)^{2}+\frac{1}{2}\Bigl(\frac{\tilde{\mu}_{a,\,j}}{\tilde{\sigma}_{a,\,j}}\Bigr)^{2}\Bigr\}\,\Phi(z_{a,\,j})$$
$$D_{b,\,j}=\tilde{\sigma}_{b,\,j}\,\exp\Bigl\{-\frac{1}{2}\Bigl(\frac{\gamma_{j}}{\sigma_{\phi_{b},\,j}}\Bigr)^{2}+\frac{1}{2}\Bigl(\frac{\tilde{\mu}_{b,\,j}}{\tilde{\sigma}_{b,\,j}}\Bigr)^{2}\Bigr\}\,\bigl[1-\Phi(z_{b,\,j})\bigr]$$
$$z_{i,\,j}=\frac{\gamma_{j}-\tilde{\mu}_{i,\,j}}{\tilde{\sigma}_{i,\,j}}\,,\quad\tilde{\mu}_{i,\,j}=\frac{\bar{x}_{n_{j}}\sigma_{\phi_{i},\,j}^{2}+\gamma_{j}\sigma_{n_{j}}^{2}}{\sigma_{\phi_{i},\,j}^{2}+\sigma_{n_{j}}^{2}}\,,\quad\tilde{\sigma}_{i,\,j}^{2}=\frac{\sigma_{\phi_{i},\,j}^{2}\,\sigma_{n_{j}}^{2}}{\sigma_{\phi_{i},\,j}^{2}+\sigma_{n_{j}}^{2}}\,,\quad i\in\{a,b\}$$
$$\dot{\mu}_{a,\,j}=\tilde{\mu}_{a,\,j}-\tilde{\sigma}_{a,\,j}\frac{\varphi(z_{a,\,j})}{\Phi(z_{a,\,j})}\,,\qquad\dot{\sigma}^{2}_{a,\,j}=\tilde{\sigma}^{2}_{a,\,j}\Biggl[1-\frac{z_{a,\,j}\,\varphi(z_{a,\,j})}{\Phi(z_{a,\,j})}-\Bigl(\frac{\varphi(z_{a,\,j})}{\Phi(z_{a,\,j})}\Bigr)^{2}\Biggr]\,,$$
$$\dot{\mu}_{b,\,j}=\tilde{\mu}_{b,\,j}+\tilde{\sigma}_{b,\,j}\frac{\varphi(z_{b,\,j})}{1-\Phi(z_{b,\,j})}\,,\qquad\dot{\sigma}^{2}_{b,\,j}=\tilde{\sigma}^{2}_{b,\,j}\Biggl[1+\frac{z_{b,\,j}\,\varphi(z_{b,\,j})}{1-\Phi(z_{b,\,j})}-\Bigl(\frac{\varphi(z_{b,\,j})}{1-\Phi(z_{b,\,j})}\Bigr)^{2}\Biggr]\,,$$

and $\Phi(\cdot)$ and $\varphi(\cdot)$ being, respectively, the cdf and the pdf of a standard normal distribution.

When $a\neq b$, the information gain in (21) does not generally attain its maximum at $\bar{x}_{n_{j}}=\gamma_{j}$ (cf. Figure B.1).

Figure B.1: Behaviour of the asymmetric information gain as a function of $\bar{x}_{n_{j}}$ when $\gamma_{j}=0$. Filled dots show the maximum value attained by each curve.

Therefore, one needs to find combinations of the values $a$ and $b$ that lead to an information gain with this desirable property (henceforth referred to as optimal combinations).

B.1 Optimal combinations of a and b

Here, we focus on the case where $a>b$ is required. Once we fix $a=a^{*}$, the corresponding value $b^{*}$ leading to an information gain attaining its maximum at $\bar{x}_{n_{j}}=\gamma_{j}$ is such that

$$b^{*}=\underset{0<b<a^{*}}{\operatorname{arg\,min}}\,\Bigl|\,\underset{\bar{x}_{n_{j}}}{\operatorname{arg\,max}}\,\Delta_{n_{j}}^{a^{*},b}-\gamma_{j}\Bigr|\,.\tag{22}$$

The closed-form derivation of $\frac{\partial}{\partial\bar{x}_{n_{j}}}\Delta^{a,b}_{n_{j}}$ is cumbersome; thus we solve (22) via numerical optimisation. Figure B.2 suggests that, under the constraint $a>b$, the value $b^{*}$ exists and is unique if and only if $a^{*}>\sqrt{2}$.
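The numerical search in (22) can be sketched as a grid search: implement $\Delta^{a,b}_{n_j}$ from Theorem 3 and locate its maximiser over a grid of sample means. The defaults ($\gamma_j=0$, $\sigma_j=2$, $n_j=10$, matching Figure B.3) and all names are our illustrative choices.

```python
import numpy as np
from statistics import NormalDist

_SND = NormalDist()  # standard normal: pdf/cdf

def delta_asym(xbar, a, b, gamma=0.0, sigma=2.0, n=10):
    """Asymmetric information gain of Theorem 3 (variance assumed known)."""
    s_n2 = sigma ** 2 / n                     # posterior variance sigma_{n_j}^2
    D, penalty = [], []
    for c, left in ((a, True), (b, False)):
        s_phi2 = c ** 2 * sigma ** 2 / n      # kernel variance of this piece
        mu_t = (xbar * s_phi2 + gamma * s_n2) / (s_phi2 + s_n2)
        s_t = np.sqrt(s_phi2 * s_n2 / (s_phi2 + s_n2))
        z = (gamma - mu_t) / s_t
        pdf, cdf = _SND.pdf(z), _SND.cdf(z)
        mass = cdf if left else 1.0 - cdf
        lam = pdf / mass                      # inverse Mills ratio
        mu_dot = mu_t - s_t * lam if left else mu_t + s_t * lam
        s_dot2 = s_t ** 2 * (1 - z * lam - lam ** 2) if left \
            else s_t ** 2 * (1 + z * lam - lam ** 2)
        D.append(s_t * np.exp(-0.5 * (gamma ** 2 / s_phi2)
                              + 0.5 * (mu_t / s_t) ** 2) * mass)
        penalty.append((s_dot2 + (mu_dot - xbar) ** 2) / (2 * s_n2))
    wa, wb = D[0] / (D[0] + D[1]), D[1] / (D[0] + D[1])
    return 0.5 - wa * penalty[0] - wb * penalty[1]

def argmax_delta(a, b, grid=None):
    """Grid-search maximiser of the asymmetric gain over sample means."""
    grid = np.linspace(-2.0, 2.0, 2001) if grid is None else grid
    vals = [delta_asym(x, a, b) for x in grid]
    return float(grid[int(np.argmax(vals))])
```

Scanning `argmax_delta(a_star, b)` over a grid of candidate `b` values and keeping the one whose maximiser is closest to $\gamma_j$ reproduces the search in (22); in the symmetric case $a=b$ the maximiser sits exactly at the target.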

Figure B.2: Optimal values $b^{*2}$ for each considered $a^{*2}$. Combinations $(a^{*},b^{*})$ below the red dashed line are the only admissible values under the condition $a^{*}>b^{*}$.

The parameters $a^{*}$ and $b^{*}$ can thus be characterised as asymmetry parameters. For example, the larger $a^{*}>\sqrt{2}$ (analogously, $b^{*}>\sqrt{2}$), the more skewed to the left (right) the resulting information gain and the higher the penalisation of values of $\bar{x}_{n_{j}}$ above (below) $\gamma_{j}$ (cf. Figure B.3).

Figure B.3: Behaviour of the asymmetric information gain for several values of $a^{*}$. The other parameters are fixed to $\gamma_{j}=0$, $n_{j}=10$, $\sigma_{j}=2$ and $\kappa=1$.

The practical selection of $a^{*}$ (or $b^{*}$, when $b>a$) should depend on the objective of the trial.

Finally, Figure B.4 shows that the optimal combination $(a^{*},b^{*})$ remains the same regardless of $\sigma_{j}$, $n_{j}$ and $\gamma_{j}$.

Figure B.4: Asymmetric information gain $\Delta_{n_{j}}^{a,b}$ ($a^{*}=2.236$, $b^{*}=0.906$) as a function of the sample mean for different values of either $\sigma_{j}$ (left panel), $n_{j}$ (centre panel) or $\gamma_{j}$ (right panel). Grey vertical lines help visualise the maxima of the curves.

B.2 Proof of Theorem 3

Proof.

Consider the generic arm $j$. First, one needs to derive the normalisation constant $C(\bar{x}_{n_{j}},\sigma^{2}_{j},\gamma_{j},n_{j})$ such that

μjγjϕγ(μj)πnj(μj)𝑑μj+μj>γjϕγ(μj)πnj(μj)𝑑μj=1\int_{\mu_{j}\leq\gamma_{j}}\,\phi_{\gamma}(\mu_{j})\pi_{n_{j}}(\mu_{j})\,d\mu_{j}+\int_{\mu_{j}>\gamma_{j}}\,\phi_{\gamma}(\mu_{j})\pi_{n_{j}}(\mu_{j})\,d\mu_{j}=1
\iff
C(x¯nj,σj2,γj,nj)[Ia, 1+Ib, 1]=1,C(\bar{x}_{n_{j}},\sigma^{2}_{j},\gamma_{j},n_{j})\cdot\bigl{[}I_{a,\,1}+I_{b,\,1}\bigr{]}=1\,,

where

  • Ia, 1=μjγjexp{12(μjγjσϕa,j)2}πnj(μj)𝑑μj;I_{a,\,1}=\displaystyle\int_{\mu_{j}\leq\gamma_{j}}\,\exp\biggl{\{}-\frac{1}{2}\Bigl{(}\frac{\mu_{j}-\gamma_{j}}{\sigma_{\phi_{a},\,j}}\Bigr{)}^{2}\biggr{\}}\pi_{n_{j}}(\mu_{j})\,d\mu_{j}\,;

  • Ib, 1=μj>γjexp{12(μjγjσϕb,j)2}πnj(μj)𝑑μj.I_{b,\,1}=\displaystyle\int_{\mu_{j}>\gamma_{j}}\,\exp\biggl{\{}-\frac{1}{2}\Bigl{(}\frac{\mu_{j}-\gamma_{j}}{\sigma_{\phi_{b},\,j}}\Bigr{)}^{2}\biggr{\}}\pi_{n_{j}}(\mu_{j})\,d\mu_{j}\,.

Exploiting the equation in (17), the first integral can be re-written as

Ia, 1\displaystyle I_{a,\,1} =(2πσnj2)12μjγjexp{12[(μjγjσϕa,j)2+(μjx¯njσnj)2]}𝑑μj\displaystyle=\bigl{(}2\pi\sigma_{n_{j}}^{2}\bigr{)}^{-\frac{1}{2}}\int_{\mu_{j}\leq\gamma_{j}}\,\exp\biggl{\{}-\frac{1}{2}\biggl{[}\Bigl{(}\frac{\mu_{j}-\gamma_{j}}{\sigma_{\phi_{a},\,j}}\Bigr{)}^{2}+\Bigl{(}\frac{\mu_{j}-\bar{x}_{n_{j}}}{\sigma_{n_{j}}}\Bigr{)}^{2}\biggr{]}\biggr{\}}\,d\mu_{j}
=σ~a,jσnjexp{12[(x¯njσnj)2+(γjσϕa,j)2(μ~a,jσ~a,j)2]}μjγj(2πσ~a,j2)12exp{12(μjμ~a,jσ~a,j)2}𝑑μj\displaystyle=\frac{\tilde{\sigma}_{a,\,j}}{\sigma_{n_{j}}}\cdot\exp\biggl{\{}-\frac{1}{2}\biggl{[}\Bigl{(}\frac{\bar{x}_{n_{j}}}{\sigma_{n_{j}}}\Bigr{)}^{2}+\Bigl{(}\frac{\gamma_{j}}{\sigma_{\phi_{a},\,j}}\Bigr{)}^{2}-\Bigl{(}\frac{\widetilde{\mu}_{a,\,j}}{\tilde{\sigma}_{a,\,j}}\Bigr{)}^{2}\biggr{]}\biggr{\}}\int_{\mu_{j}\leq\gamma_{j}}\,\bigl{(}2\pi\widetilde{\sigma}_{a,\,j}^{2}\bigr{)}^{-\frac{1}{2}}\exp\biggl{\{}-\frac{1}{2}\Bigl{(}\frac{\mu_{j}-\widetilde{\mu}_{a,\,j}}{\widetilde{\sigma}_{a,\,j}}\Bigr{)}^{2}\biggr{\}}\,d\mu_{j}
=σ~a,jσnjexp{12[(x¯njσnj)2+(γjσϕa,j)2(μ~a,jσ~a,j)2]}Φ(za,j),\displaystyle=\frac{\widetilde{\sigma}_{a,\,j}}{\sigma_{n_{j}}}\cdot\exp\biggl{\{}-\frac{1}{2}\biggl{[}\Bigl{(}\frac{\bar{x}_{n_{j}}}{\sigma_{n_{j}}}\Bigr{)}^{2}+\Bigl{(}\frac{\gamma_{j}}{\sigma_{\phi_{a},\,j}}\Bigr{)}^{2}-\Bigl{(}\frac{\tilde{\mu}_{a,\,j}}{\tilde{\sigma}_{a,\,j}}\Bigr{)}^{2}\biggr{]}\biggr{\}}\cdot\Phi\bigl{(}z_{a,\,j}\bigr{)}\,,

and, similarly,

$$I_{b,\,1}=\frac{\tilde{\sigma}_{b,\,j}}{\sigma_{n_{j}}}\cdot\exp\biggl\{-\frac{1}{2}\biggl[\Bigl(\frac{\bar{x}_{n_{j}}}{\sigma_{n_{j}}}\Bigr)^{2}+\Bigl(\frac{\gamma_{j}}{\sigma_{\phi_{b},\,j}}\Bigr)^{2}-\Bigl(\frac{\tilde{\mu}_{b,\,j}}{\tilde{\sigma}_{b,\,j}}\Bigr)^{2}\biggr]\biggr\}\cdot\bigl[1-\Phi\bigl(z_{b,\,j}\bigr)\bigr]\,,$$

where $\Phi(\cdot)$ is the cdf of a standard normal distribution, $z_{i,\,j}=\frac{\gamma_{j}-\tilde{\mu}_{i,\,j}}{\tilde{\sigma}_{i,\,j}}$, $\tilde{\mu}_{i,\,j}=\frac{\bar{x}_{n_{j}}\sigma_{\phi_{i},\,j}^{2}\,+\,\gamma_{j}\sigma_{n_{j}}^{2}}{\sigma_{\phi_{i},\,j}^{2}\,+\,\sigma_{n_{j}}^{2}}$ and $\tilde{\sigma}_{i,\,j}^{2}=\frac{\sigma_{\phi_{i},\,j}^{2}\,\sigma_{n_{j}}^{2}}{\sigma_{\phi_{i},\,j}^{2}\,+\,\sigma_{n_{j}}^{2}}$, for $i\in\{a,\,b\}$.

After some computations, the normalisation constant of the weight function ϕγ(μj)\phi_{\gamma}(\mu_{j}) can be expressed as

C(x¯nj,σj2,γj,nj)=σnjexp{12(x¯njσnj)2}Da,j+Db,j,C(\bar{x}_{n_{j}},\sigma^{2}_{j},\gamma_{j},n_{j})=\frac{\sigma_{n_{j}}\exp\Bigl{\{}\frac{1}{2}\Bigl{(}\frac{\bar{x}_{n_{j}}}{\sigma_{n_{j}}}\Bigr{)}^{2}\Bigr{\}}}{D_{a,\,j}+D_{b,\,j}}\,,

where Da,j=σ~a,jexp{12(γjσϕa,j)2+12(μ~a,jσ~a,j)2}Φ(za,j)D_{a,\,j}=\tilde{\sigma}_{a,\,j}\cdot\exp\biggl{\{}-\frac{1}{2}\Bigl{(}\frac{\gamma_{j}}{\sigma_{\phi_{a},\,j}}\Bigr{)}^{2}+\frac{1}{2}\Bigl{(}\frac{\tilde{\mu}_{a,\,j}}{\tilde{\sigma}_{a,\,j}}\Bigr{)}^{2}\biggr{\}}\cdot\Phi(z_{a,\,j}) and Db,j=σ~b,jexp{12(γjσϕb,j)2+12(μ~b,jσ~b,j)2}[1Φ(zb,j)]D_{b,\,j}=\tilde{\sigma}_{b,\,j}\cdot\exp\biggl{\{}-\frac{1}{2}\Bigl{(}\frac{\gamma_{j}}{\sigma_{\phi_{b},\,j}}\Bigr{)}^{2}+\frac{1}{2}\Bigl{(}\frac{\tilde{\mu}_{b,\,j}}{\tilde{\sigma}_{b,\,j}}\Bigr{)}^{2}\biggr{\}}\cdot\biggl{[}1-\Phi(z_{b,\,j})\biggr{]}.
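As a numerical sanity check of the closed form above, one can compute $C$ from $D_{a,\,j}$ and $D_{b,\,j}$ and verify that the weighted posterior integrates to one. A minimal sketch (the variable names are ours; `sigma_n` is the posterior sd, `sig_a`/`sig_b` the weight-function sds on the two sides of $\gamma_{j}$):

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

def norm_constant(x_bar, sigma_n, gamma, sig_a, sig_b):
    """Normalisation constant C of the weight function, following the closed
    form above (D_a uses Phi(z_a), D_b uses 1 - Phi(z_b))."""
    D = {}
    for side, s_phi in (("a", sig_a), ("b", sig_b)):
        mu_t = (x_bar * s_phi**2 + gamma * sigma_n**2) / (s_phi**2 + sigma_n**2)
        sd_t = np.sqrt(s_phi**2 * sigma_n**2 / (s_phi**2 + sigma_n**2))
        z = (gamma - mu_t) / sd_t
        core = sd_t * np.exp(-0.5 * (gamma / s_phi)**2 + 0.5 * (mu_t / sd_t)**2)
        D[side] = core * (norm.cdf(z) if side == "a" else 1.0 - norm.cdf(z))
    return sigma_n * np.exp(0.5 * (x_bar / sigma_n)**2) / (D["a"] + D["b"])

# Sanity check: C times the integral of (weight x posterior) should equal 1.
x_bar, sigma_n, gamma, sig_a, sig_b = 1.0, 0.5, 0.0, 1.0, 2.0
post = lambda m: norm.pdf(m, loc=x_bar, scale=sigma_n)
weight = lambda m: np.exp(-0.5 * ((m - gamma) / (sig_a if m <= gamma else sig_b))**2)
C = norm_constant(x_bar, sigma_n, gamma, sig_a, sig_b)
total = C * (quad(lambda m: weight(m) * post(m), -np.inf, gamma)[0]
             + quad(lambda m: weight(m) * post(m), gamma, np.inf)[0])
# total is approximately 1
```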


Now, let’s derive the weighted Shannon entropy:

hϕγ(πnj)=ϕγ(μj)πnj(μj)logπnj(μj)𝑑μj=[Ia, 2+Ib, 2],h^{\phi_{\gamma}}(\pi_{n_{j}})=-\int_{\mathbb{R}}\,\phi_{\gamma}(\mu_{j})\pi_{n_{j}}(\mu_{j})\log\pi_{n_{j}}(\mu_{j})\,d\mu_{j}=-\bigl{[}I_{a,\,2}+I_{b,\,2}\bigr{]}\,,

where

  • Ia, 2=μjγjϕγ(μj)πnj(μj)logπnj(μj)𝑑μj;I_{a,\,2}=\displaystyle\int_{\mu_{j}\leq\gamma_{j}}\,\phi_{\gamma}(\mu_{j})\pi_{n_{j}}(\mu_{j})\log\pi_{n_{j}}(\mu_{j})\,d\mu_{j}\,;

  • Ib, 2=μj>γjϕγ(μj)πnj(μj)logπnj(μj)𝑑μj.I_{b,\,2}=\displaystyle\int_{\mu_{j}>\gamma_{j}}\,\phi_{\gamma}(\mu_{j})\pi_{n_{j}}(\mu_{j})\log\pi_{n_{j}}(\mu_{j})\,d\mu_{j}\,.

To solve the two integrals, the following four results are particularly useful:

  1. 1.

Once again, exploiting the equation in (17) and letting $\varphi(\mu_{j};\,\widetilde{\mu}_{i,\,j},\,\widetilde{\sigma}_{i,\,j}^{2})$ denote the pdf at $\mu_{j}$ of a normal distribution with mean $\widetilde{\mu}_{i,\,j}$ and variance $\widetilde{\sigma}_{i,\,j}^{2}$ ($i\in\{a,b\}$), one obtains that

\begin{align*}
\phi_{\gamma}(\mu_{j})\pi_{n_{j}}(\mu_{j})&=\frac{1}{\sqrt{2\pi}\,(D_{a,\,j}+D_{b,\,j})}\cdot\begin{cases}\exp\Bigl\{-\frac{1}{2}\Bigl[\Bigl(\frac{\mu_{j}-\widetilde{\mu}_{a,\,j}}{\widetilde{\sigma}_{a,\,j}}\Bigr)^{2}+\Bigl(\frac{\gamma_{j}}{\sigma_{\phi_{a},\,j}}\Bigr)^{2}-\Bigl(\frac{\widetilde{\mu}_{a,\,j}}{\widetilde{\sigma}_{a,\,j}}\Bigr)^{2}\Bigr]\Bigr\}&\mu_{j}\leq\gamma_{j}\\[6pt]\exp\Bigl\{-\frac{1}{2}\Bigl[\Bigl(\frac{\mu_{j}-\widetilde{\mu}_{b,\,j}}{\widetilde{\sigma}_{b,\,j}}\Bigr)^{2}+\Bigl(\frac{\gamma_{j}}{\sigma_{\phi_{b},\,j}}\Bigr)^{2}-\Bigl(\frac{\widetilde{\mu}_{b,\,j}}{\widetilde{\sigma}_{b,\,j}}\Bigr)^{2}\Bigr]\Bigr\}&\mu_{j}>\gamma_{j}\end{cases}\\
&=\frac{1}{D_{a,\,j}+D_{b,\,j}}\cdot\begin{cases}D_{a,\,j}\cdot\frac{\varphi(\mu_{j};\,\widetilde{\mu}_{a,\,j},\,\widetilde{\sigma}_{a,\,j}^{2})}{\Phi(z_{a,\,j})}&\mu_{j}\leq\gamma_{j}\\[6pt]D_{b,\,j}\cdot\frac{\varphi(\mu_{j};\,\widetilde{\mu}_{b,\,j},\,\widetilde{\sigma}_{b,\,j}^{2})}{1-\Phi(z_{b,\,j})}&\mu_{j}>\gamma_{j}\end{cases}\,.
\end{align*}
  2. 2.

    Let’s define two probability density functions:

    1. (a)

      φ(,γj](μj;μ~a,j,σ~a,j2)=φ(μj;μ~a,j,σ~a,j2)Φ(za,j)\varphi_{(-\infty,\,\gamma_{j}]}(\mu_{j};\,\widetilde{\mu}_{a,\,j},\,\widetilde{\sigma}_{a,\,j}^{2})=\frac{\varphi(\mu_{j};\,\widetilde{\mu}_{a,\,j},\,\widetilde{\sigma}_{a,\,j}^{2})}{\Phi(z_{a,\,j})}, which is the pdf of a N(μ~a,j,σ~a,j2)N(\widetilde{\mu}_{a,\,j},\,\widetilde{\sigma}_{a,\,j}^{2}) truncated in (,γj](-\infty,\,\gamma_{j}] and computed in μj\mu_{j};

    2. (b)

      φ[γj,)(μj;μ~b,j,σ~b,j2)=φ(μj;μ~b,j,σ~b,j2)1Φ(zb,j)\varphi_{[\gamma_{j},\,\infty)}(\mu_{j};\,\widetilde{\mu}_{b,\,j},\,\widetilde{\sigma}_{b,\,j}^{2})=\frac{\varphi(\mu_{j};\,\widetilde{\mu}_{b,\,j},\,\widetilde{\sigma}_{b,\,j}^{2})}{1-\Phi(z_{b,\,j})}, which is the pdf of a N(μ~b,j,σ~b,j2)N(\widetilde{\mu}_{b,\,j},\,\widetilde{\sigma}_{b,\,j}^{2}) truncated in [γj,)[\gamma_{j},\,\infty) and computed in μj\mu_{j}.

  3. 3.

    In addition, let’s define

    1. (a)

      the truncated means:

      μ˙a,j=μjγjμjφ(,γj](μj;μ~a,j,σ~a,j2)𝑑μj=μ~a,jσ~a,jφ(za,j)Φ(za,j)\dot{\mu}_{a,\,j}=\int_{\mu_{j}\leq\gamma_{j}}\,\mu_{j}\,\cdot\,\varphi_{(-\infty,\,\gamma_{j}]}(\mu_{j};\,\widetilde{\mu}_{a,\,j},\widetilde{\sigma}_{a,\,j}^{2})\,d\mu_{j}=\tilde{\mu}_{a,\,j}-\tilde{\sigma}_{a,\,j}\frac{\varphi(z_{a,\,j})}{\Phi(z_{a,\,j})}
      μ˙b,j=μj>γjμjφ[γj,)(μj;μ~b,j,σ~b,j2)𝑑μj=μ~b,j+σ~b,jφ(zb,j)1Φ(zb,j).\dot{\mu}_{b,\,j}=\int_{\mu_{j}>\gamma_{j}}\,\mu_{j}\,\cdot\,\varphi_{[\gamma_{j},\,\infty)}(\mu_{j};\,\widetilde{\mu}_{b,\,j},\widetilde{\sigma}_{b,\,j}^{2})\,d\mu_{j}=\tilde{\mu}_{b,\,j}+\tilde{\sigma}_{b,\,j}\frac{\varphi(z_{b,\,j})}{1-\Phi(z_{b,\,j})}\,.
    2. (b)

      the truncated variances:

      σ˙a,j2\displaystyle\dot{\sigma}_{a,\,j}^{2} =μjγj(μjμ˙a,j)2φ(,γj](μj;μ~a,j,σ~a,j2)𝑑μj\displaystyle=\int_{\mu_{j}\leq\gamma_{j}}\,(\mu_{j}-\dot{\mu}_{a,\,j})^{2}\,\cdot\,\varphi_{(-\infty,\,\gamma_{j}]}(\mu_{j};\,\widetilde{\mu}_{a,\,j},\widetilde{\sigma}_{a,\,j}^{2})\,d\mu_{j}
      =σ~a,j2[1za,jφ(za,j)Φ(za,j)(φ(za,j)Φ(za,j))2]\displaystyle=\tilde{\sigma}^{2}_{a,\,j}\cdot\Biggl{[}1-\frac{z_{a,\,j}\cdot\varphi(z_{a,\,j})}{\Phi(z_{a,\,j})}-\Bigl{(}\frac{\varphi(z_{a,\,j})}{\Phi(z_{a,\,j})}\Bigr{)}^{2}\Biggr{]}
      σ˙b,j2\displaystyle\dot{\sigma}_{b,\,j}^{2} =μj>γj(μjμ˙b,j)2φ[γj,)(μj;μ~b,j,σ~b,j2)𝑑μj\displaystyle=\int_{\mu_{j}>\gamma_{j}}\,(\mu_{j}-\dot{\mu}_{b,\,j})^{2}\,\cdot\,\varphi_{[\gamma_{j},\,\infty)}(\mu_{j};\,\widetilde{\mu}_{b,\,j},\widetilde{\sigma}_{b,\,j}^{2})\,d\mu_{j}
      =σ~b,j2[1+zb,jφ(zb,j)1Φ(zb,j)(φ(zb,j)1Φ(zb,j))2]\displaystyle=\tilde{\sigma}^{2}_{b,\,j}\cdot\Biggl{[}1+\frac{z_{b,\,j}\cdot\varphi(z_{b,\,j})}{1-\Phi(z_{b,\,j})}-\Bigl{(}\frac{\varphi(z_{b,\,j})}{1-\Phi(z_{b,\,j})}\Bigr{)}^{2}\Biggr{]}
  4. 4.

    For i={a,b}i=\{a,b\}: (μjx¯nj)2=(μjμ˙i,j)2+2μj(μ˙i,jx¯nj)μ˙i,j2+x¯nj2.(\mu_{j}-\bar{x}_{n_{j}})^{2}=(\mu_{j}-\dot{\mu}_{i,\,j})^{2}+2\mu_{j}(\dot{\mu}_{i,\,j}-\bar{x}_{n_{j}})-\dot{\mu}_{i,\,j}^{2}+\bar{x}_{n_{j}}^{2}\,.
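The truncated means and variances in result 3 are the standard one-sided truncated-normal moments, and can be cross-checked against a library implementation; a small sketch with illustrative values of $\tilde{\mu}$, $\tilde{\sigma}$ and $\gamma$:

```python
import numpy as np
from scipy.stats import norm, truncnorm

# Illustrative values for mu~, sigma~ and the threshold gamma:
mu_t, sd_t, gamma = 0.8, 0.45, 0.0
z = (gamma - mu_t) / sd_t

# Closed-form mean and variance of N(mu_t, sd_t^2) truncated to (-inf, gamma],
# as in result 3 above (Mills-ratio formulas):
mu_dot = mu_t - sd_t * norm.pdf(z) / norm.cdf(z)
var_dot = sd_t**2 * (1.0 - z * norm.pdf(z) / norm.cdf(z)
                     - (norm.pdf(z) / norm.cdf(z))**2)

# Cross-check against scipy's truncated normal (truncation points are given
# in standardised units):
ref = truncnorm(a=-np.inf, b=z, loc=mu_t, scale=sd_t)
# mu_dot matches ref.mean() and var_dot matches ref.var()
```

The right-truncated moments $\dot{\mu}_{b,\,j}$, $\dot{\sigma}^{2}_{b,\,j}$ can be checked in the same way with `truncnorm(a=z, b=np.inf, ...)`.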

Combining the previous four results, one obtains that

\begin{align*}
I_{a,\,2}&=\frac{D_{a,\,j}}{D_{a,\,j}+D_{b,\,j}}\int_{\mu_{j}\leq\gamma_{j}}\varphi_{(-\infty,\,\gamma_{j}]}(\mu_{j};\,\widetilde{\mu}_{a,\,j},\,\widetilde{\sigma}_{a,\,j}^{2})\cdot\log\pi_{n_{j}}(\mu_{j})\,d\mu_{j}\\
&=-\frac{D_{a,\,j}}{D_{a,\,j}+D_{b,\,j}}\cdot\Bigl[\frac{1}{2}\log(2\pi\sigma_{n_{j}}^{2})+\frac{1}{2\sigma_{n_{j}}^{2}}\int_{\mu_{j}\leq\gamma_{j}}(\mu_{j}-\bar{x}_{n_{j}})^{2}\,\varphi_{(-\infty,\,\gamma_{j}]}(\mu_{j};\,\widetilde{\mu}_{a,\,j},\,\widetilde{\sigma}_{a,\,j}^{2})\,d\mu_{j}\Bigr]\\
&=-\frac{D_{a,\,j}}{D_{a,\,j}+D_{b,\,j}}\cdot\Bigl[\frac{1}{2}\log(2\pi\sigma_{n_{j}}^{2})+\frac{\dot{\sigma}^{2}_{a,\,j}+(\dot{\mu}_{a,\,j}-\bar{x}_{n_{j}})^{2}}{2\sigma_{n_{j}}^{2}}\Bigr]\,,
\end{align*}

and, similarly,

$$I_{b,\,2}=-\frac{D_{b,\,j}}{D_{a,\,j}+D_{b,\,j}}\cdot\Bigl[\frac{1}{2}\log(2\pi\sigma_{n_{j}}^{2})+\frac{\dot{\sigma}^{2}_{b,\,j}+(\dot{\mu}_{b,\,j}-\bar{x}_{n_{j}})^{2}}{2\sigma_{n_{j}}^{2}}\Bigr]\,.$$

Finally,

hϕγ(πnj)\displaystyle h^{\phi_{\gamma}}(\pi_{n_{j}}) =[Ia, 2+Ib, 2]\displaystyle=-\bigl{[}I_{a,\,2}+I_{b,\,2}\bigr{]}
=12log(2πσnj2)+σ˙a,j2+(μ˙a,jx¯nj)22σnj2Da,jDa,j+Db,j+σ˙b,j2+(μ˙b,jx¯nj)22σnj2Db,jDa,j+Db,j\displaystyle=\frac{1}{2}\log(2\pi\sigma_{n_{j}}^{2})+\frac{\dot{\sigma}_{a,\,j}^{2}+(\dot{\mu}_{a,\,j}-\bar{x}_{n_{j}})^{2}}{2\sigma_{n_{j}}^{2}}\cdot\frac{D_{a,\,j}}{D_{a,\,j}+D_{b,\,j}}+\frac{\dot{\sigma}_{b,\,j}^{2}+(\dot{\mu}_{b,\,j}-\bar{x}_{n_{j}})^{2}}{2\sigma_{n_{j}}^{2}}\cdot\frac{D_{b,\,j}}{D_{a,\,j}+D_{b,\,j}}

The resulting information gain immediately follows.

C Multivariate generalisation: how information gain changes as function of the main parameters (case q=2q=2)

C.1 Means and variances

Figures C.1 and C.2 show the information gain (bivariate case) as a function of the vector of means and the vector of variances, respectively. As in the univariate case, the information gain achieves its maximum when the sample means are equal to their corresponding target values. Additionally, the larger the variances, the higher the information gain.

Figure C.1: Information gain as a function of the two sample means. The other parameters are set as follows: $\sigma_{j}^{(11)}=4$, $\sigma_{j}^{(22)}=5$, $\rho_{j}=0.4$, $n_{j}=10$ and $\kappa=0.75$. Both target means are equal to 0.
Figure C.2: Information gain as a function of the two variances. The other parameters are set as follows: $\bar{x}_{n_{j}}^{(1)}=\bar{x}_{n_{j}}^{(2)}=2$, $\rho_{j}=0.4$, $n_{j}=10$ and $\kappa=0.75$. Both target means are equal to 0.

C.2 Correlation

The maximum of the information gain as a function of $\rho_{j}$, ceteris paribus, is attained at

ρj=sign{(γj(1)x¯nj(1))(γj(2)x¯nj(2))}×min{|γj(1)x¯nj(1)γj(2)x¯nj(2)σj(22)σj(11)|,|γj(2)x¯nj(2)γj(1)x¯nj(1)σj(11)σj(22)|},\rho_{j}=\text{sign}\Bigl{\{}\bigl{(}\gamma_{j}^{\scriptscriptstyle(1)}-\bar{x}_{n_{j}}^{\scriptscriptstyle(1)}\bigr{)}\bigl{(}\gamma_{j}^{\scriptscriptstyle(2)}-\bar{x}_{n_{j}}^{\scriptscriptstyle(2)}\bigr{)}\Bigr{\}}\times\min\biggl{\{}\biggl{|}\frac{\gamma_{j}^{\scriptscriptstyle(1)}-\bar{x}_{n_{j}}^{\scriptscriptstyle(1)}}{\gamma_{j}^{\scriptscriptstyle(2)}-\bar{x}_{n_{j}}^{\scriptscriptstyle(2)}}\sqrt{\frac{\sigma_{j}^{\scriptscriptstyle(22)}}{\sigma_{j}^{\scriptscriptstyle(11)}}}\biggl{|},\,\biggl{|}\frac{\gamma_{j}^{\scriptscriptstyle(2)}-\bar{x}_{n_{j}}^{\scriptscriptstyle(2)}}{\gamma_{j}^{\scriptscriptstyle(1)}-\bar{x}_{n_{j}}^{\scriptscriptstyle(1)}}\sqrt{\frac{\sigma_{j}^{\scriptscriptstyle(11)}}{\sigma_{j}^{\scriptscriptstyle(22)}}}\biggl{|}\biggr{\}}\,,

with the convention that $\frac{0}{0}=0$. In other words, for the $j$-th arm, the maximum of the information gain is attained when the correlation equals the ratio of the standardised differences between the endpoint-specific targets and sample means. If the variances of the two endpoints are equal and the sample means are at the same distance from their targets, then the maximum is attained at $\rho_{j}=+1$ when the two endpoint-specific sample means are equal and at $\rho_{j}=-1$ when they are unequal (cf. Figures C.3a and C.3b, respectively). Intuitively, in the first case the responses of the two endpoints tend to lie on the same side of their target means (e.g. on average, both above or both below the corresponding target): this suggests that a positive correlation may exist between the two endpoints and, coherently, higher positive correlations lead to a higher information gain. Conversely, when the responses of the two endpoints tend to lie on opposite sides of their target means (e.g. on average, one above and one below the corresponding target), higher negative correlations lead to a higher information gain. Finally, when at least one of the two sample means equals the corresponding target, the information gain is symmetric around the null correlation point (cf. Figure C.3c).
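The closed-form maximiser above is straightforward to implement; a sketch (argument names are ours, and the $0/0=0$ convention is handled explicitly):

```python
import numpy as np

def optimal_rho(x1, x2, g1, g2, s11, s22):
    """Correlation maximising the bivariate information gain, following the
    closed form above, with the convention 0/0 = 0."""
    d1, d2 = g1 - x1, g2 - x2   # endpoint-specific target minus sample mean
    if d1 == 0 and d2 == 0:
        return 0.0
    # a zero numerator makes its ratio 0, which then wins the min
    t1 = abs(d1 / d2) * np.sqrt(s22 / s11) if d2 != 0 else np.inf
    t2 = abs(d2 / d1) * np.sqrt(s11 / s22) if d1 != 0 else np.inf
    return float(np.sign(d1 * d2) * min(t1, t2))

# Equal variances, equal distances, same side of the target -> rho = +1;
# opposite sides -> rho = -1; one mean on target -> rho = 0:
rhos = [optimal_rho(2, 2, 0, 0, 4, 4),
        optimal_rho(2, -2, 0, 0, 4, 4),
        optimal_rho(0, -10, 0, 0, 4, 4)]
# rhos == [1.0, -1.0, 0.0], matching panels a), b) and c) of Figure C.3
```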

Figure C.3: Information gain as a function of $\rho_{j}$ when $\sigma_{j}^{(11)}=\sigma_{j}^{(22)}=4$ and $n_{j}=10$. In each panel, a different vector of sample means is considered: a) $(2,2)$, b) $(2,-2)$ and c) $(0,-10)$.

D Target-oriented modified Gittins index rules

Smith and Villar [2018]'s design is oriented toward a patient benefit objective and is thus worth considering as a competitor to our proposal. In their formulation, the allocation rule proceeds as follows: if larger responses are desirable, the next patient is assigned to the arm $j$ ($j=1,\dots,K$) with the largest Gittins index (GI), $\mathcal{G}_{j}^{+}=\bar{x}_{n_{j}}+\sigma_{j}\,\bar{\mathcal{G}}_{j}(n_{j},d)$, where the values of the standardised index $\bar{\mathcal{G}}_{j}(n_{j},d)$ can be interpolated from the tables printed in Gittins et al. [2011, pp. 261–262] for several possible combinations of the current sample size $n_{j}$ and $d\in(0,1)$. Vice versa, if smaller responses are desirable, the next patient is assigned to the arm $j$ with the smallest quantity $\mathcal{G}_{j}^{-}=\bar{x}_{n_{j}}-\sigma_{j}\,\bar{\mathcal{G}}_{j}(n_{j},d)$. The quantity $\sigma_{j}\,\bar{\mathcal{G}}_{j}(n_{j},d)$ is a measure of the learning reward associated with continuing to sample an arm which has already been sampled $n_{j}$ times, with $d$ being a tuning parameter quantifying the value of learning. However, this design cannot be directly adapted to our setting, where desirable values are those as close as possible to $\gamma_{j}$. Thus, we propose two heuristic extensions of the rule above: the symmetric Gittins index criterion (SGI) and the targeted Gittins index criterion (TGI).

The first proposal is to use an allocation rule which assigns the next patient to the arm jj which minimises the quantity

$$\mathcal{G}^{\gamma_{j}}_{j}=|\bar{x}_{n_{j}}-\gamma_{j}|-\sigma_{j}\,\bar{\mathcal{G}}_{j}(n_{j},d)\,, \quad (23)$$

which is symmetric around $\bar{x}_{n_{j}}=\gamma_{j}$, where it attains its minimum (i.e. it takes the same value at $\bar{x}_{n_{j}}=\gamma_{j}-c$ and at $\bar{x}_{n_{j}}=\gamma_{j}+c$, $c\in\mathbb{R}$, ceteris paribus). The basic idea is to discount the learning component $\sigma_{j}\,\bar{\mathcal{G}}_{j}(n_{j},d)$ from the distance between the current sample mean and the target mean.

The second proposal arises from considering a targeted Gittins index for arm $j$ ($j=1,\dots,K$), namely the Gittins index that would be associated with arm $j$ if its response were $N(\gamma_{j},\,\sigma_{j}^{2})$: as $n_{j}$ goes to infinity, the value of learning goes to 0 and the targeted Gittins index tends to $\gamma_{j}$ by the Law of Large Numbers. For this reason, our proposal is to adopt the following rule, which allocates the next patient to the arm $j$ minimising the distance between the current Gittins index and the asymptotic targeted Gittins index (i.e., $\gamma_{j}$):

$$\delta_{\mathcal{G}_{j},\mathcal{G}_{\gamma_{j}}}=\begin{cases}|\mathcal{G}_{j}^{+}-\gamma_{j}|&\text{if }\bar{x}_{n_{j}}\leq\gamma_{j}\\|\mathcal{G}_{j}^{-}-\gamma_{j}|&\text{if }\bar{x}_{n_{j}}>\gamma_{j}\end{cases}\,, \quad (24)$$

where $\mathcal{G}_{j}^{+}$ (respectively, $\mathcal{G}_{j}^{-}$) is adopted when the current sample mean is below (above) the target. Interestingly, as $n_{j}$ tends to infinity, $\delta_{\mathcal{G}_{j},\mathcal{G}_{\gamma_{j}}}$ tends to $|\mu_{j}-\gamma_{j}|$, which is the key quantity of our adopted recommendation rule.
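Both heuristics reduce to ranking the arms by a scalar score. A sketch of (23) and (24) follows; the standardised Gittins values $\bar{\mathcal{G}}_{j}(n_{j},d)$ are taken as given (the `gi_std` numbers below are made up for illustration, not interpolated from the published tables):

```python
import numpy as np

def sgi_scores(x_bar, sigma, gi_std, gamma):
    """Symmetric Gittins index (23): distance from the target minus the
    learning bonus; the next patient goes to the arm minimising this."""
    return np.abs(x_bar - gamma) - sigma * gi_std

def tgi_scores(x_bar, sigma, gi_std, gamma):
    """Targeted Gittins index rule (24): distance between the one-sided
    Gittins index (G+ below the target, G- above it) and the target."""
    g_plus = x_bar + sigma * gi_std
    g_minus = x_bar - sigma * gi_std
    return np.where(x_bar <= gamma,
                    np.abs(g_plus - gamma), np.abs(g_minus - gamma))

# Hypothetical 3-arm state (all quantities illustrative):
x_bar = np.array([-1.5, 0.3, 2.0])
sigma = np.array([2.0, 2.0, 2.0])
gi_std = np.array([0.2, 0.1, 0.15])
gamma = 0.0
next_arm_sgi = int(np.argmin(sgi_scores(x_bar, sigma, gi_std, gamma)))
next_arm_tgi = int(np.argmin(tgi_scores(x_bar, sigma, gi_std, gamma)))
# here both rules pick arm index 1, the arm closest to the target
```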

E Additional results for the simulation study with single endpoint

E.1 Burn-in size and type-I error

Figure E.1 shows how the type-I error rate decreases as the burn-in size ($B$) increases. We choose the scenario where all true means are equal to the target: this scenario can be considered challenging for our proposed design, as it is generally associated with a high inflation of the type-I error rate. The drop is surprisingly rapid when passing from $B=2$ to $B=5$, meaning that an equal allocation of $20\%$ of the patients to each arm prior to the adaptive allocation may be useful to stabilise the algorithm and reduce its dependence on the initial values. This is even more evident in the case of unknown variances, since an additional parameter must be estimated in each arm.

For the full simulation study, we adopt $B=5$, which seems to offer a good balance between protecting patients from default allocations to sub-optimal arms and controlling the type-I error. Notably, in the considered example, the choice $B=5$ leads to a small difference between the type-I error rates of the known- and unknown-variance cases.

Figure E.1: Type-I error rates under $H_{0}:\,\mu_{j}=\gamma_{j}$ and $\eta_{\alpha}=0.95$ when the design WE($2$,$1$) is used in combination with different burn-in sizes, in the case of either known or unknown variances. The empty shapes at burn-in size $25$ correspond to the extreme case of full burn-in (i.e. no adaptive phase). Other parameters are set as follows: $K=4$, $N=100$, $\bm{\sigma}=(2,2,2,4)$ and $\gamma_{j}=0,\,\forall j$.

E.2 Choice of the null scenarios used for calibration of cut-off probabilities

Recall that we focus on the set of null scenarios $\mathcal{S}_{0}=\{H_{0}:\,\mu_{j}-\gamma=c,\,\forall j,\,c\geq 0\}$ for the calibration of the cut-off probabilities. Our strategy consists of considering values of $c$ from 0 to $40$, increasing on a quadratic scale. This reflects the interest in having a larger number of scenarios where $c$ is close to $0$, which are frequently associated with high inflation or deflation of the type-I error. Indeed, as $c$ increases, the type-I error rate tends to stabilise.

In practice, we first consider a grid of evenly spaced values from 0 to $\sqrt{40}$; we then square these values to obtain the final grid of values for $c$.
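In code, this quadratic grid is a one-liner; the grid size of 21 points below is illustrative:

```python
import numpy as np

# Evenly spaced points on the sqrt scale, then squared: the resulting grid
# of shifts c concentrates near c = 0.
c_grid = np.linspace(0.0, np.sqrt(40.0), 21) ** 2
# c_grid starts at 0, ends at 40, and consecutive gaps grow linearly
```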

E.3 How pp and κ\kappa affect operating characteristics

Figure E.2 illustrates the key role of the parameter $\kappa$ in regulating the trade-off between patient benefit (individual objective) and power requirements (collective objective). Notably, in Scenario I, the power tends to increase and the patient benefit to decrease with $\kappa$, as a result of the more spread-out allocation of patients across the treatment arms. Moreover, comparing the operating characteristics under the two choices of $p$, the case $p=2$ uniformly leads to higher power but lower patient benefit than $p=1$. A reasonable explanation is that a high $p$ gives the arm with the highest variance more chance to be explored: while this generally helps to improve the power, it may slightly reduce the exploitation of the best treatment when the most variable arm is suboptimal, as in Scenario I. In contrast, Scenario II tells a partially different story: WE designs with lower $p$ are uniformly (across $\kappa$'s) dominated by those with higher $p$ in terms of patient benefit, since the best treatment is also the one with the most variable response; unsurprisingly, the opposite tends to happen in terms of power. The only unexpected behaviour is the patient benefit increasing with $\kappa$ when $p=1$, which is in clear contrast with the value $\kappa=0.55$ optimising the patient benefit criterion. Once again, it is important to recall that this value of $\kappa$ was selected by averaging across a large number of possible configurations, of which scenarios like Scenario II can be regarded as outliers.

Figure E.2: Average percentage of experimentation of the best arm during the trial (PB, i.e. the patient benefit measure) and two-components power (cut-off values based on average type-I error control) for the WE design for different values of $\kappa$, under either $p=1$ or $p=2$. Dashed and dotted lines highlight the values of the penalisation parameter $\kappa$ chosen using the patient benefit and power criteria. Scenario I is characterised by $\bm{\mu}=(1.91,-3.36,-0.37,3.99)$, while Scenario II by $\bm{\mu}=(1.13,-3.48,-3.57,0.34)$. Both scenarios consider $\bm{\sigma}=(2,2,2,4)$ and $\gamma=0$.

F Unknown variance case with single endpoint

When the variance of the response is unknown, we rely on the unbiased sample variance $\frac{1}{n_{j}-1}\sum_{i=1}^{n_{j}}(x_{i,\,j}-\bar{x}_{n_{j}})^{2}$ ($j=1,\dots,K$) as a plug-in estimate of the true one. This requires a minimum burn-in of $B=2$ patients per arm to obtain a first estimate of the arm-specific variances when response-adaptive designs are used. However, in practice, a larger burn-in may be needed to achieve good operating characteristics when the variances are unknown (cf. Section E.1). Here, we adopt the same setting as in Section 5 of the main text, apart from the scenarios, which are now characterised by the generic set $\{(\mu_{1},\dots,\mu_{K}),\,(\sigma_{1},\dots,\sigma_{K})\}$, where $\mu_{j}$ is generated from $\text{Unif}[-4,4]$ and $\sigma_{j}$ from $\text{Unif}[2,4]$, $\forall j=1,\dots,K$. We run a simulation study over $S=500$ alternative scenarios.

F.1 Type-I error control and cut-off probabilities

We consider the set of null scenarios $\mathcal{S}^{\prime}_{0}=\bigl\{(\mu_{1},\dots,\mu_{K}):\,\mu_{j}-\gamma=c,\,c\geq 0,\,\forall j\bigr\}\times\bigl\{(\sigma_{1},\dots,\sigma_{K}):\,\sigma_{j}=d_{j},\,d_{j}>0,\,\forall j\bigr\}$, whose elements are all the possible combinations of the values $c\in\{0,\,10,\,40\}$ and $d_{j}\in\{2,\,4\}$. Figures F.1 and F.2 show the type-I error rates of each considered design for the scenarios in $\mathcal{S}^{\prime}_{0}$ when the cut-off probabilities are calibrated under strong and average control, respectively, of the type-I error at level $\alpha=5\%$.

Interestingly, most of the designs are associated with an inflation of the type-I error when the arms have different variances, and this is even more evident in the case of $\bm{\sigma}=(2,2,4,4)$. The magnitude of this inflation strongly depends on the particular vector of means: for example, when $c=0$, most of the response-adaptive designs are associated with higher type-I error rates than in the case of $c=10$ or $c=40$. An interesting exception is WE($2$, $0.75$), which is associated with higher type-I error rates when $c=40$ and all the variances are equal. Finally, FR is associated with higher type-I error rates when $c$ becomes large.

Figure F.1: Individual type-I error rates for the null scenarios in $\mathcal{S}^{\prime}_{0}$ for several designs in the case of unknown variances. The target mean is set to $\gamma=0$ for all the scenarios. Design-specific cut-off probabilities are calibrated under strong control of the type-I error probability at level $\alpha=5\%$.
Figure F.2: Same as Figure F.1, but with design-specific cut-off probabilities calibrated under average control of the type-I error probability at level $\alpha=5\%$.

F.2 Selection of κ\kappa

Figure F.3 illustrates the result of the optimisation of the patient benefit and power criteria for $p=1$ and $p=2$. The probability of achieving $80\%$ of the FR power does not reach the threshold $\xi=90\%$; therefore, we choose $\kappa$ in order to maximise this probability. The proposed values of $\kappa$ are $\{0.55,\,1.2\}$ for $p=1$ and $\{0.75,\,1.45\}$ for $p=2$.

Figure F.3: Values of the objective function $g_{1}$ (left panel) and the probability of achieving at least $80\%$ of the power of FR (right panel) for several values of $\kappa$, for both $p=1$ (black squares) and $p=2$ (red circles), in the case of unknown variances. Filled shapes (either circles or squares) correspond to the robust optimal values of $\kappa$. The two-components power is considered, along with design-specific cut-off probabilities calibrated to achieve a $5\%$ type-I error probability on average over the set of scenarios considered in Section F.1.

F.3 Numerical results

Figure F.4 shows the results of the simulation study in the case of unknown variances. As anticipated, the uncertainty about the estimates of the unknown variances is reflected in poorer performance of all the methods. Similarly to the case of known variances (cf. main text), the WE designs with $\kappa$ optimising the power criterion, i.e. WE($1$, $1.2$) and WE($2$, $1.45$), are associated with a high percentage of identification of the best arm at the end of the trial. TGI, WE($1$, $1.2$) and WE($2$, $1.45$) exhibit similar distributions of the two-components power, but TGI attains lower performance in terms of identification of the best arm and patient benefit. Surprisingly, in a few random scenarios the WE designs may perform even worse than FR in terms of the average percentage of experimentation of the best arm (i.e. the patient benefit measure): specifically, this occurs in 14 scenarios for WE($2$, $0.75$) and only in 2 scenarios for WE($1$, $0.55$). However, it is interesting to note that these scenarios are all characterised by the mean of the best treatment being relatively far from the target and close to that of the other treatments (the median two-components power under FR is $3.1\%$ in this subset of scenarios).

Figure F.4: Operating characteristics across S=500S=500 randomly generated scenarios for the considered set of designs. Each scenario is characterised by a vector of mean responses. For all scenarios, we assume K=4K=4, N=100N=100, 𝝈=(2,2,2,4)\bm{\sigma}=(2,2,2,4) (unknown) and γj=0\gamma_{j}=0, for j=1,,4j=1,\dots,4. Burn-in size is fixed to B=5B=5 for all the response-adaptive designs.

Overall, WE(1,1.2) achieves slightly lower power than FR but with more satisfactory performance in terms of correct identification of the best arm and, above all, patient benefit.

Now, we assess the operating characteristics of the considered designs in two specific scenarios:

  1. i

    Scenario Ibis, characterised by 𝝁=(1.91,3.36,0.37,3.99)\bm{\mu}=(1.91,-3.36,-0.37,3.99) and 𝝈=(3.48,2.16,2.91,4)\bm{\sigma}=(3.48,2.16,2.91,4), with the worst treatment arm having the highest variability but the second worst having the lowest variability;

  2. ii

    Scenario IIbis, characterised by 𝝁=(1.13,3.48,3.57,0.34)\bm{\mu}=(1.13,-3.48,-3.57,0.34) and 𝝈=(3.28,2.13,2.11,3.08)\bm{\sigma}=(3.28,2.13,2.11,3.08), with the best two treatment arms having the highest variability.

Table F.1 illustrates the operating characteristics of the considered designs under Scenarios Ibis and IIbis. In both cases, WE(1,1.2) and WE(2,1.45) perform well in terms of correct identification of the best arm at the end of the trial and power: in Scenario Ibis, their two-components power is 11 points lower than that of FR, but with a remarkable gain in terms of patient benefit, while in Scenario IIbis they both achieve the highest two-components power, along with SGI. Interestingly, SGI results in slightly better operating characteristics than WE(1,0.55) and WE(2,0.75) in the two considered scenarios. TGI achieves a relatively high conditional power (higher than FR), but performs poorly in terms of the patient benefit measure, with a correspondingly high standard error.

Scenario Ibis (unknown variances): $(\mu_{j},\,\sigma_{j})=\{(1.91,\,3.48),\,(-3.36,\,2.16),\,(-0.37,\,2.91),\,(3.99,\,4)\}$, $\gamma_{j}=0$

Design | PB (s.e.) | CS$_{I}$(%) | CS$_{I\&II}$(%) | $(1-\beta)_{C}$ | $(1-\beta)_{TC}$
FR | 24.99 (0.04) | 94.49 | 88.81 | 0.57 | 0.50
CB | 74.31 (0.24) | 90.40 | 59.71 | 0.37 | 0.22
TS | 75.10 (0.23) | 89.01 | 58.52 | 0.37 | 0.22
SGI | 74.42 (0.17) | 95.80 | 66.58 | 0.50 | 0.33
TGI | 70.33 (0.24) | 86.60 | 59.59 | 0.58 | 0.34
WE(1,0.55) | 74.23 (0.18) | 95.88 | 66.73 | 0.44 | 0.29
WE(2,0.75) | 69.93 (0.22) | 93.98 | 66.23 | 0.39 | 0.26
WE(1,1.2) | 71.95 (0.16) | 97.80 | 71.11 | 0.54 | 0.39
WE(2,1.45) | 69.84 (0.17) | 97.04 | 71.46 | 0.54 | 0.39

Scenario IIbis (unknown variances): $(\mu_{j},\,\sigma_{j})=\{(1.13,\,3.28),\,(-3.48,\,2.13),\,(-3.57,\,2.11),\,(0.34,\,3.08)\}$, $\gamma_{j}=0$

Design | PB (s.e.) | CS$_{I}$(%) | CS$_{I\&II}$(%) | $(1-\beta)_{C}$ | $(1-\beta)_{TC}$
FR | 25.05 (0.04) | 78.25 | 78.03 | 0.23 | 0.18
CB | 60.26 (0.35) | 71.87 | 58.59 | 0.30 | 0.17
TS | 60.09 (0.35) | 71.00 | 57.40 | 0.30 | 0.17
SGI | 62.85 (0.27) | 80.72 | 67.42 | 0.35 | 0.24
TGI | 54.69 (0.36) | 75.43 | 64.86 | 0.32 | 0.21
WE(1,0.55) | 62.67 (0.29) | 80.08 | 66.59 | 0.31 | 0.21
WE(2,0.75) | 60.95 (0.29) | 79.39 | 66.57 | 0.24 | 0.16
WE(1,1.2) | 63.60 (0.23) | 85.40 | 73.56 | 0.34 | 0.25
WE(2,1.45) | 62.69 (0.23) | 84.83 | 73.46 | 0.33 | 0.24

Table F.1: Operating characteristics of the WE($p$,$\kappa$) design for different combinations of $\kappa$ and $p$, and of the FR, CB, TS, SGI and TGI designs under Scenarios Ibis and IIbis (unknown variances). Both $(1-\beta)_{C}$ and $(1-\beta)_{TC}$ are based on the design-specific cut-off probabilities $\eta_{0.05}^{\text{ave}}$ shown in Figure F.2. Results are based on $10^{4}$ replicated trials.

G Additional results for illustration with co-primary endpoints

Figure G.1 shows the individual type-I error rates for the $|\mathcal{S}^{\prime\prime}_{0}|=36$ combinations of values $c_{1}$ and $c_{2}$ when the cut-off probabilities are derived under strong control of the type-I error probability. Results are in line with those discussed in the main text for the case of average control of the type-I error.

Figure G.1: Individual type-I error rates for several null scenarios (combinations of $c_{1}$ and $c_{2}$) for the considered designs under strong control of the type-I error probability. Cut-off probabilities are $0.919$ (FR), $0.964$ (CB), $0.928$ (WE($0.5$)), $0.912$ (WE($0.75$)).