

Comparing HIV Vaccine Immunogenicity across Trials with Different Populations and Study Designs

Yutong Jin¹,∗ Alex Luedtke² Zoe Moodie³ Holly Janes³ David Benkeser¹
¹Department of Biostatistics and Bioinformatics [email protected] Emory University Atlanta GA USA
²Department of Statistics University of Washington Seattle WA USA
³ Vaccine and Infectious Disease Division Fred Hutchinson Cancer Research Center Seattle WA USA

(Received October 2007. Revised February 2008. Accepted March 2008.; 2008)

Abstract

Safe and effective preventive vaccines have the potential to help stem the HIV epidemic. The efficacy of such vaccines is typically measured in randomized, double-blind phase IIb/III trials and described as a reduction in newly acquired HIV infections. However, such trials are often expensive, time-consuming, and/or logistically challenging. These challenges have led to great interest in immune responses induced by vaccination, and in identifying which immune responses predict vaccine efficacy. Immune responses that predict efficacy are termed vaccine correlates of protection. Studies of vaccine-induced immunogenicity vary in size and design, ranging from small, early phase trials to case-control studies nested in a broader late-phase randomized trial. Moreover, trials can be conducted in geographically diverse study populations across the world. Such diversity presents a challenge for objectively comparing vaccine-induced immunogenicity. To address these practical challenges, we propose a framework that is capable of identifying appropriate causal estimands and estimators, which can be used to provide standardized comparisons of vaccine-induced immunogenicity across trials. We evaluate the performance of the proposed estimators via extensive simulation studies. Our estimators are well-behaved and enjoy robustness properties. The proposed technique is applied to compare vaccine immunogenicity using data from three recent HIV vaccine trials.

keywords:

Causal inference, HIV/AIDS, infectious disease, randomized clinical trials, vaccine immunogenicity.


1 Introduction

Outbreaks of infectious diseases remain a major concern. In combating such diseases, the development of safe and effective preventive vaccines is crucial. Vaccines are often designed to generate immune responses that protect individuals against infection and/or disease caused by pathogens such as viruses, bacteria or parasites. The efficacy of candidate vaccines is typically measured in randomized, double-blind, placebo-controlled clinical trials. Vaccine efficacy (VE) is typically quantified as one minus a relative risk, comparing risk of infection or disease under vaccination to risk under a placebo or control vaccine. However, estimation of VE against a clinical endpoint can be time-consuming, costly, and difficult to assess in randomized trials. For rare endpoints, it can take thousands of participants and years to complete such a trial. For emerging pathogens such as Chikungunya, Lassa fever, and Nipah virus, it is logistically challenging to implement randomized trials due to unpredictable and short-lived outbreaks. Moreover, randomized trials are unlikely to generate all evidence needed to guide policies around vaccines, such as whether vaccines should be updated with the emergence of new strains of a pathogen in the population.

While randomized trial-generated evidence is the gold standard for demonstrating efficacy of a vaccine, there is intense interest in identifying immune responses that are predictive of VE. Such responses, termed vaccine correlates of protection (CoP), may serve as surrogate endpoints in lieu of a formal evaluation of clinical VE, thereby potentially opening accelerated pathways for new vaccine products to be brought to market and/or for updates to the strain included in existing vaccines. Therefore, it is often of interest to study immunological endpoints and compare vaccine immunogenicity across different vaccines. The most common statistical approach to quantifying differences in immune responses across various vaccines is to use a t-test or Wilcoxon Signed-Rank test. Sometimes these simple procedures must be extended to account for the sampling design, for example, case-control studies nested in a broader randomized trial (Banzhoff et al., 2003; Chung et al., 2014; Furuya-Kanamori et al., 2021).

We consider the problem of extending these methods to enable more objective comparisons of vaccines when immunogenicity is evaluated in different studies across diverse geographic sites, using varying study designs. When vaccines are evaluated at different study sites, there may be important differences in clinical and/or demographic characteristics across the various trial populations. If these characteristics also impact immune responses, then simple approaches may yield biased inference regarding differences in vaccine immunogenicity. Moreover, deriving proper standard errors for estimators and tests can be challenging when comparing data generated under different designs. We make use of efficient estimators based on influence functions to tackle both of these challenges.

Our motivation for studying this problem arises from the field of HIV vaccines. Over the past decade, there have been many small and several large trials of preventive HIV vaccines. These trials have been conducted across Southeast Asia, sub-Saharan Africa, and in the Americas, each with its own specific set of enrollment criteria. Much of the recent work in HIV vaccine development has been motivated by the results of the RV144 trial, which demonstrated modest but significant vaccine efficacy against HIV-1 infection (Rerks-Ngarm et al., 2009). A key consideration for HIV vaccine development is the selection of vaccine immunogens. Immunogens are molecules that are capable of eliciting a host immune response. An immunogen in the vaccine studied in the RV144 trial was designed to protect against clades circulating locally in Thailand. The RV144 trial showed modest preventive vaccine efficacy, with an estimated 31.2% reduction in the cumulative incidence of HIV-1 infection over 42 months. These results were encouraging, led to a large correlates analysis, and prompted several smaller follow-up trials including the HVTN 097 trial, which was designed to evaluate immunogenicity of the same vaccine regimen in South Africa (Gray et al., 2019). The HVTN 097 results indicated that response rates and magnitudes of putatively protective immune responses in South Africa were similar to or better than those observed in RV144 in Thailand, providing support for continuing with this vaccine approach for research in sub-Saharan Africa. A subsequent study, HVTN 100, evaluated a version of the RV144 vaccine regimen that was updated with immunogens based on HIV-1 subtype C, which is prevalent in South Africa, and with a different adjuvant (MF59 in place of alum). HVTN 100 determined that the South African-adapted vaccine successfully met the pre-specified immunological criteria for advancement to efficacy testing (Bekker et al., 2018).
Based on these results, a larger phase IIb/III randomized, double-blind, controlled efficacy trial, HVTN 702, was conducted to evaluate the vaccine efficacy of the subtype C-adapted vaccine against HIV-1 acquisition in South Africa. Unfortunately, this trial was halted after pre-specified non-efficacy criteria were met at an interim analysis (Gray et al., 2021). To help identify potential explanations for the lack of efficacy in HVTN 702 and to inform future directions for the HIV vaccine field, it is important to examine possible differences in immunogenicity between the vaccines used in HVTN 097, HVTN 100, and HVTN 702 in South Africa. Moodie et al. (2022) studied immune correlates of risk and concluded that the CD4+ T cell response rate in HVTN 097 to 92TH023 (74%) was similar to that in HVTN 702 against antigen ZM96 (63%), but that the IgG response rate to IgG 1086.C V1V2 in RV144 was significantly higher than that in HVTN 702 (100% vs. 67%). We developed the framework described herein to support these comparisons across different trial populations.

The need for a comparison of vaccine-induced immune responses across vaccines that are evaluated in different trials is not unique to HIV vaccines. Indeed, this is a common and important problem in many domains of vaccine research. For example, recent studies that separately evaluated the Moderna mRNA-1273 preventive COVID-19 vaccine and the Pfizer/BioNTech BNT162b2 vaccine had different sampling designs and different covariate distributions for enrolled participants (Baden et al., 2020; Polack et al., 2020). Similarly, recent studies of dengue vaccines have been conducted in diverse populations across Southeast Asia and South America. Past dengue exposure may be a key modifier of the immunogenicity and efficacy of the vaccines, and circulating serotypes of dengue virus may differ across geography (Rabaa et al., 2017; Sridhar et al., 2018).

The above scientific context highlights a clear need for understanding the causal relationship between a vaccine regimen and its immunogenicity in a particular population. Such information can guide the design of new vaccines, as well as the prioritization of current vaccines for further research. In this paper, we develop a framework that identifies appropriate causal estimands and estimators that can be used to provide standardized comparisons of vaccine immunogenicity. We propose estimators of these causal estimands and establish theory that dictates the large sample behavior of the estimators. Our estimators account for two practical difficulties that arise in vaccine trials. First, we propose methodology that accounts for different sampling designs that may be used to measure immune responses across trials. For example, HVTN 100 and HVTN 097 measured immune responses in all participants; however, in HVTN 702 immune responses were measured using case-control sampling. Second, our methodology allows for pooling of trial data to gain efficiency when the same vaccine is evaluated in multiple trials. For example, an identical vaccine was evaluated in HVTN 100 and HVTN 702 and therefore, we may wish to pool data from these trials when evaluating immunogenicity. We clarify the formal causal assumptions and semiparametric efficiency theory that allows such pooling. Our work relates to other recent work on transportability of causal effects (Stuart et al., 2015; Bareinboim and Pearl, 2016; Li and Luedtke, 2023), focusing on the specific challenges of these approaches in the context of vaccine immunogenicity studies.

2 Materials and Method

2.1 Notation and Data Structure

Vaccine trials can be generally categorized as being early or late phase studies. Early phase studies are often designed specifically to evaluate immunogenicity of one or several vaccine candidates and/or candidate doses of vaccine. These trials typically have smaller sample sizes, generally fewer than several hundred participants, and do not assess VE on a clinical endpoint of interest. They may include one or several doses of a single vaccine, or one or several variations of a vaccine (e.g., vaccines with different adjuvants). We use the variable $T\in\mathcal{T}=\{1,2,\dots,N_{T}\}$ to denote an arbitrary numeric label applied to the various trials considered in a particular application. Data in each of these trials contain a categorical variable indicating which of the vaccine formulations/doses a participant receives, denoted by the label $A\in\mathcal{A}=\{0,1,2,\dots,N_{A}\}$ . A given vaccine $a\in\mathcal{A}$ could be evaluated in multiple trials; however, in our notation we use only a single, unique label for each vaccine and we denote by $\mathcal{T}_{a}\subseteq\mathcal{T}$ the trials in which the immunogenicity of vaccine $a$ was evaluated. The observed data also include measurements of one or several immune responses of interest $S$ . In practice $S$ may be a vector, but we focus here only on scalar-valued $S$ , as we can separately apply our methods to each immune response of interest. As a concrete example, we may consider HVTN 702, a phase IIb/III trial where participants were randomized to receive either an active vaccine or a placebo, and the immune responses of interest included various CD4+ T cell responses and IgG binding antibody responses.

Each trial’s data will also generally include other participant-level information collected prior to vaccine assignment. The specific baseline characteristics measured may vary across trials, and we introduce $\bm{W}(t)$ to denote covariates measured in trial $t\in\mathcal{T}$ . We use $\bm{W}=\bm{W}(1)\cup\bm{W}(2)\cup\dots\cup\bm{W}(N_{T})$ to denote the superset of covariates consisting of all covariates collected in at least one of the trials considered. In our HIV vaccine example, participants in HVTN 097, 100 and 702 had their age, gender, body mass index (BMI), region of enrollment and educational level recorded. Thus, in this example $\bm{W}$ would include five demographic variables that were available in all three trials.

In addition to vaccine, immune response, and covariates, some trials will also have data available on clinical endpoints of interest. This will almost always be the case for larger phase IIb/III trials that are designed explicitly to evaluate VE. For example, in HVTN 702, the primary outcome was time to first detection of HIV-1 acquisition or censoring and this information is recorded for all participants in the trial. Thus, we can assume that for some trials, the observed data will also include a clinical outcome of interest, which we denote by $Y$ . The outcome could be binary (e.g., indicator of disease by a fixed time-point) or it may be a time-to-event endpoint (e.g., time from vaccination until first occurrence of clinical disease). Our methods apply readily to both situations; however, for simplicity we henceforth assume $Y$ is binary. In early phase trials, $Y$ may be missing or right-censored for most or all individuals. This missingness pattern has no adverse impact on our developments, since we are primarily interested in comparing $S$ across vaccines and $Y$ typically occurs after $S$ . If $Y$ is subject to missingness it will be entirely appropriate for our developments to consider this variable as a three-level categorical variable with levels 0, 1, and missing.

An interesting aspect of the design of many vaccine trials is that $S$ may not be measured on every participant. Therefore, we introduce two versions of the data structure that allow us to differentiate between settings where $S$ is measured on every participant in every trial and settings where $S$ is only measured on a subset of participants in at least some of the trials considered. We refer to a datum collected in the former setting as a full data unit and in the latter setting as an observed data unit. Explicitly considering the full data unit in this setting is useful mathematically for describing requisite assumptions for identification of our causal estimands of interest. In the full data setting, for each participant in each trial, we record $X=(T,A,\bm{W},S,Y)\sim P_{X}$ , while recalling that without loss of generality, $Y$ will generally be coded as right-censored or missing for most or all individuals enrolled in early phase trials. In our notation, we use $P_{X}$ to denote the probability distribution of $X$ , which is assumed to follow a statistical model $\mathcal{M}_{X}$ that is nonparametric up to certain assumptions detailed in Section 2.3 below. We use $E_{X}$ to denote expectation of a random variable under sampling from $P_{X}$ .

We now turn to the observed data unit, where $S$ may not be measured on all individuals. Many phase IIb/III trials employ a two-stage sampling design to efficiently determine the subset of participants in which $S$ should be measured (Breslow, 2005). In these designs, all participants have specimens (e.g., serum) collected at a clinic visit following vaccination but immune responses are only measured for participants in the subset. For example, in HVTN 702, a case-control design was used, wherein all vaccinated participants assigned female at birth who tested positive for HIV-1 after the month 6.5 study visit but before the end of primary follow-up (month 24) were selected and measured for $S$ at months 6.5 (and 12.5 for those who acquired HIV-1 thereafter). A covariate-matched set of controls was also selected to have $S$ measured. Hence, out of the 1168 female per-protocol participants eligible for case-control sampling in HVTN 702, $S$ was measured in 130 of these individuals (Moodie et al., 2022). To accommodate the potential for the presence of two-stage sampling, we introduce the observed data unit $O=(T,A,\bm{W},\Delta,\Delta S,Y)\sim P$ , which is a coarsened version of the full data unit $X$ . A typical observed data unit includes $T$ , $A$ , $\bm{W}$ and $Y$ (possibly subject to missingness) as above; however, the immune response $S$ is measured only in a subset of participants. The random variable $\Delta$ takes value 1 if the immune response $S$ is measured and zero otherwise. In the data unit $O$ , without loss of generality we represent the observed value of $S$ by $\Delta S$ , thereby arbitrarily recording a value of 0 for $S$ in individuals not selected for two-phase sampling. We note that for early phase trials generally we will have $\Delta=1$ for all participants, while for late phase trials, $\Delta=1$ for only a subset of participants. 
The statistical model $\mathcal{M}$ for $P$ is implied by the model for the distribution of the full data unit $P_{X}$ and the model for the sampling variable $\Delta$ given $(T,A,\bm{W},Y)$ , where these sampling probabilities are generally known by design. We use $E$ to denote the expectation of a random variable under sampling from $P$ .
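As an illustration of how the observed data unit $O$ arises from the full data unit $X$, the following minimal Python sketch coarsens a list of full data units under sampling probabilities that depend only on $(T,A,\bm{W},Y)$. The function names, trial labels, and the 10% control sampling fraction are hypothetical choices made for illustration and are not taken from any of the trials discussed here.

```python
import random

def coarsen(full_units, sampling_prob, seed=0):
    """Map full data units X=(T,A,W,S,Y) to observed units O=(T,A,W,Delta,Delta*S,Y)."""
    rng = random.Random(seed)
    observed = []
    for x in full_units:
        # Probability of measuring S; by design it may depend on (T, A, W, Y) only.
        pi = sampling_prob(x["T"], x["A"], x["W"], x["Y"])
        delta = 1 if rng.random() < pi else 0
        observed.append({
            "T": x["T"], "A": x["A"], "W": x["W"], "Y": x["Y"],
            "Delta": delta,
            "DeltaS": delta * x["S"],  # recorded as 0 when S is not measured
        })
    return observed

def case_control_prob(t, a, w, y):
    # Hypothetical design: early phase trials measure S on everyone; the
    # late phase trial samples all cases (Y=1) and 10% of non-cases.
    if t != "late":
        return 1.0
    return 1.0 if y == 1 else 0.1

# Toy full data units (illustrative values only; Y is missing in the early phase trial).
full = [
    {"T": "early", "A": 1, "W": ("18-25", "F"), "S": 2.3, "Y": None},
    {"T": "late", "A": 2, "W": ("26-35", "F"), "S": 1.1, "Y": 1},
    {"T": "late", "A": 2, "W": ("18-25", "F"), "S": 0.7, "Y": 0},
]
obs = coarsen(full, case_control_prob)
```

In this sketch the early phase participant and the late phase case always have $\Delta=1$, whereas the late phase non-case has $S$ measured only with probability 0.1.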

We provide an example visualization of the type of observed data that is used in our motivating example in Supplementary Table 1. In this example, our data set consists of data from three trials pooled into a single data set. Our two covariates $\bm{W}$ of interest are categorical age ( $W_{1}$ ) and sex at birth ( $W_{2}$ ). There is one late phase trial, HVTN 702, and two early phase trials, HVTN 100 and HVTN 097. In the late phase trial, HIV-1 acquisition $Y$ is recorded for all participants (for simplicity, we present an idealization of the actual data where we ignore right-censoring of $Y$ ). However, the immune response $S$ of interest is only measured in a subset of participants in this trial, as indicated by rows where $\Delta=1$ ; $S$ is missing for all rows in which $\Delta=0$ . On the other hand, in the early phase trials, $S$ is measured for everyone, while $Y$ is generally missing. At times, we will refer back to Supplementary Table 1 to make concrete our general estimation strategies.

2.2 Causal Estimands

Traditionally, the average immune response induced by each vaccine is estimated in each trial separately. This approach targets the estimand $\mu_{a}:=E_{X}(S\mid A=a,T=t)$ . However, in some situations there may be components of $\bm{W}$ that are correlated with both trial enrollment $T$ and immune responses $S$ . For example, age may vary across trials and correlate with the magnitude of vaccine-induced immune responses, rendering the comparison of two vaccine candidates $\mu_{a}-\mu_{a^{\prime}}$ evaluated in different trials $t$ and $t^{\prime}$ biased.

To address this concern, we propose a causal framework to provide such comparisons in an appropriate way. In particular, we can consider a counterfactual variable $S(a)$ that corresponds to the immune response that would be observed if an individual were given vaccine $a$ . We assume that causal consistency holds and that there is no interference between individuals. Both assumptions are reasonable in the present context, where causal consistency stipulates that there are not “multiple formulations” of a single vaccine. This assumption is generally reasonable for most vaccines, where often a key goal of the pre-clinical vaccine development process is developing consistent manufacturing processes to ensure comparable vaccines across lots. No interference is also likely to be plausible in the present context as the immune response of one individual is unlikely to depend on vaccines received by other individuals in the study.

In this counterfactual scenario, it is possible for all individuals who could potentially enroll in any of the trials to receive any of the $N_{A}$ vaccines considered. Thus, we can conceptualize a counterfactual data unit $\mathbb{X}=(T,\bm{W},\{S(a),Y(a):a\in\{1,\dots,N_{A}\}\})\sim P_{\mathbb{X}}$ , where for completeness we define $Y(a)$ as the counterfactual clinical endpoint that would be observed under vaccination with $a$ , though this quantity does not play a role in our development. As above, we denote by $E_{\mathbb{X}}$ expectation of a random variable under $P_{\mathbb{X}}$ .

We are ultimately interested in comparing immunogenicity, for example, by comparing the average value of $S(a)$ vs. $S(a^{\prime})$ for vaccines $a,a^{\prime}$ that were evaluated in different trials. However, when these vaccines are evaluated in different trials that enroll from different populations, there are several such comparisons that could be of interest. In the context of HIV vaccines, a series of trials were conducted in several countries across several years. As described in the introduction, in our motivating example, the population of HVTN 702 was of primary interest, as our goal is to compare the immunogenicity across vaccines to aid in the interpretation of the null signal in the primary vaccine efficacy analysis of HVTN 702. Thus, we may be interested in understanding whether and how the immunogenicity of the vaccine formulation studied in the earlier HVTN 097 trial compares to the formulation studied in HVTN 702, while making this comparison in the HVTN 702 trial population. That is, we are asking a hypothetical question about the immunogenicity that would have been observed had we evaluated the HVTN 097 vaccine alongside the HVTN 702 vaccine, in the HVTN 702 study population. Using the labels from Supplementary Table 1, this estimand would be denoted $E_{\mathbb{X}}[S(2)-S(1)\mid T=\text{HVTN 702}]$ . We label this type of causal estimand a standardized comparison of immunogenicity.

While our motivating example focuses on a setting where a single trial’s population is of interest, more generally we could consider standardized comparisons of the form $E_{\mathbb{X}}[S(a)-S(a^{\prime})\mid T\in\mathcal{T}_{\text{ref}}]$ , where $\mathcal{T}_{\text{ref}}\subseteq\mathcal{T}$ may include multiple trials. We refer to $\mathcal{T}_{\text{ref}}$ as the referent trial(s) to which we are standardizing our comparison. The choice of referent trial should be dictated by the scientific context. While we generally expect that $\mathcal{T}_{\text{ref}}$ will consist of a single trial, in some situations we may wish to include multiple trials in our referent. For example, if vaccines $a$ and $a^{\prime}$ are evaluated in trials that enroll from very similar or identical populations, then we may wish for $\mathcal{T}_{\text{ref}}=\mathcal{T}_{a}\cup\mathcal{T}_{a^{\prime}}$ . A trivial situation where this might occur is when vaccines $a$ and $a^{\prime}$ are evaluated in the same trial and we are interested in inference on their immunogenicity in that trial’s study population. However, we may also have settings where vaccines are evaluated in different trials, but the distributions of common baseline covariates are largely similar among trials $\mathcal{T}_{a}$ and $\mathcal{T}_{a^{\prime}}$ . This could happen when vaccines are evaluated at the same study sites using trials with similar enrollment criteria. In the absence of effect heterogeneity by covariates, inference on this quantity may have greater precision than inference based on an estimand standardized to either $\mathcal{T}_{a}$ or $\mathcal{T}_{a^{\prime}}$ alone.

Remark: Another potential setting is one where there exists a common referent population that is not sampled directly from any of the observed trials. For example, we may wish to compare immunogenicity of two vaccines in an age- and sex-standardized way against a known referent population distribution. We provide theory for estimation and inference pertaining to this estimand in Supplementary C.

2.3 Identification of standardized immunogenicity using full data

To identify the standardized immunogenicity comparison described in our motivating example, we require certain causal assumptions regarding the distribution of $\mathbb{X}$ , in addition to assumptions pertaining to the sampling design of $S$ as encoded in the distribution of $O$ . Key to both sets of assumptions is the consideration of which baseline covariates are available across the various trials.

We introduce the general notation $\bm{W}_{\cap}(\mathcal{T}_{0})$ to denote covariates that are available across all of a given set of trials $\mathcal{T}_{0}\subseteq\mathcal{T}$ . Thus, $\bm{W}_{\cap}(\mathcal{T}_{\text{ref}})\subseteq\bm{W}$ refers to the covariates common to all referent trial(s) and $\bm{W}_{\cap}(\mathcal{T}_{a})$ denotes covariates common to all trials where vaccine $a$ is evaluated. We denote by $\bm{W}_{\cap}(\mathcal{T}_{\text{ref}})\cap\bm{W}_{\cap}(\mathcal{T}_{a})$ the set of covariates that are available in all referent trials and all trials where vaccine $a$ is evaluated. This set of covariates is particularly important for identification. As we presently show, we must be able to identify a subset of these covariates that is sufficient to control for differences in counterfactual immunogenicity between the individuals receiving vaccine $a$ and individuals in the referent population. We make the simplifying assumption that such covariates must be available in all of the trials where the immunogenicity of vaccine $a$ is actually measured, so that we can identify the vaccine’s expected immunogenicity conditional on this set of covariates. Moreover, we also need the same set of covariates to be available in the referent trial(s) so that the covariate-conditional immunogenicity can be properly standardized to the referent trial. We further discuss the identification under a weaker assumption that covariates are only available in at least one trial where vaccine $a$ is evaluated (Supplementary F).
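The set $\bm{W}_{\cap}(\mathcal{T}_{\text{ref}})\cap\bm{W}_{\cap}(\mathcal{T}_{a})$ is a plain set intersection, which the following minimal Python sketch makes concrete. The trial labels and covariate availability shown are hypothetical and chosen only so that the trials differ in what they record; they do not describe the HVTN trials.

```python
def common_covariates(covariates_by_trial, trials):
    """Return W_cap(trials): the covariates recorded in every trial in `trials`."""
    if not trials:
        raise ValueError("need at least one trial")
    sets = [set(covariates_by_trial[t]) for t in trials]
    return set.intersection(*sets)

# Hypothetical covariate availability by trial (illustrative only).
covariates_by_trial = {
    "trial_1": {"age", "sex", "bmi", "region"},
    "trial_2": {"age", "sex", "bmi"},
    "trial_3": {"age", "sex", "education"},
}
T_ref = ["trial_3"]           # referent trial(s)
T_a = ["trial_1", "trial_2"]  # trials in which vaccine a was evaluated

# Candidate pool from which the adjustment set W_S must be drawn.
candidates = (common_covariates(covariates_by_trial, T_ref)
              & common_covariates(covariates_by_trial, T_a))
```

Here only age and sex are available in every referent trial and every trial evaluating vaccine $a$, so any adjustment set would have to be chosen from those two covariates.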

Formally, identification of $\psi_{\mathbb{X}}(a)=E_{\mathbb{X}}[S(a)\mid T\in\mathcal{T}_{\text{ref}}]$ for an arbitrary vaccine $a$ using the full data requires the following assumptions.

(A1) Ignorability of trial enrollment and vaccine assignment conditional on common covariates. There exists a set of common baseline covariates $\bm{W}_{S}\subseteq\bm{W}_{\cap}(\mathcal{T}_{\text{ref}})\cap\bm{W}_{\cap}(\mathcal{T}_{a})$ such that: (A1.1) $S(a)\perp A\mid T\in\mathcal{T}_{\text{ref}},\bm{W}_{S}$ and (A1.2) $S\perp T\mid A,\bm{W}_{S}$ .

(A2) Positivity of vaccine assignment. $P_{X}\{P_{X}(A=a\mid\bm{W}_{S})>0\mid T\in\mathcal{T}_{\text{ref}}\}=1$ .

Assumption (A1.1) stipulates that we must be able to identify a set of covariates that are measured in both the referent trial and the trial(s) where vaccine $a$ is evaluated such that conditional on this set of covariates, the vaccine which a referent-trial participant is observed to receive provides no additional information about their potential outcome $S(a)$ . This condition will generally hold by design if vaccines are randomly assigned, but may require additional scrutiny in observational studies. Assumption (A1.2) stipulates that conditional on vaccine assignment $A$ and $\bm{W}_{S}$ , the particular trial provides no additional information about the immune response outcome $S$ . Generally, we can think about two sub-assumptions that are needed to satisfy this assumption. First, there cannot be a direct effect of trial on vaccine immunogenicity. This assumption would be violated if, for example, a trial had inappropriate cold storage procedures thereby causing weakened immunogenicity of the vaccine. Second, we require that $\bm{W}_{S}$ includes all characteristics that are related to vaccine immunogenicity and that may differ across trial populations. For example, consider a scenario where certain compositions of the gut microbiome have a positive impact on vaccine immunogenicity and microbiome data are not available as part of $\bm{W}_{S}$ . If microbiome composition differs across trials in $\mathcal{T}_{a}$ , then assumption (A1.2) would be violated. Graphical approaches may be useful for scrutinizing this assumption in each specific scientific context. We remark that our notation $\bm{W}_{S}$ indicates that the choice of covariates may differ depending on the immune response that is being studied, as different responses may have different biological drivers. The choice of covariates $\bm{W}_{S}$ may also differ depending on the particular vaccine $a$ and the particular choice of referent trial $\mathcal{T}_{\text{ref}}$ .
However, for simplicity we have elected to suppress this dependency in our notation. Assumption (A2) stipulates that there is a positive probability of receiving vaccine $a$ for all values of $\bm{W}_{S}$ that are observable in $\mathcal{T}_{\text{ref}}$ . This assumption would be violated if, for example, there were certain combinations of covariates that are observable in the referent trials $\mathcal{T}_{\text{ref}}$ , but not in any of the trials in which vaccine $a$ was studied. This condition could be scrutinized empirically using standard methods for evaluating propensity score overlap, for example by evaluating an estimate of $P_{X}(A=a\mid\bm{W}_{S})$ using observations in $\mathcal{T}_{\text{ref}}$ (Austin and Stuart, 2015).
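For discrete $\bm{W}_{S}$, a crude empirical check of (A2) is to list the covariate strata that appear in the referent trial(s) but contain no recipients of vaccine $a$ in the pooled data. The Python sketch below does this; it is a simplified stand-in for the propensity score overlap diagnostics mentioned above, and the function name, field names, and toy records are hypothetical.

```python
from collections import Counter

def positivity_violations(rows, a, ref_trials):
    """Return W_S strata observed in the referent trials with no recipients of vaccine a."""
    ref_strata = {r["W_S"] for r in rows if r["T"] in ref_trials}
    # Count recipients of vaccine a per stratum, pooling over all trials.
    n_vaccine_a = Counter(r["W_S"] for r in rows if r["A"] == a)
    return sorted(w for w in ref_strata if n_vaccine_a[w] == 0)

# Toy records (illustrative only): vaccine 1 was never given to anyone in the "old" stratum.
rows = [
    {"T": "ref", "A": 2, "W_S": ("young", "F")},
    {"T": "ref", "A": 2, "W_S": ("old", "F")},
    {"T": "other", "A": 1, "W_S": ("young", "F")},
]
missing_for_1 = positivity_violations(rows, a=1, ref_trials={"ref"})
missing_for_2 = positivity_violations(rows, a=2, ref_trials={"ref"})
```

An empty result does not establish positivity; with continuous or high-dimensional covariates a modeled estimate of $P_{X}(A=a\mid\bm{W}_{S})$ would be needed instead.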

Theorem 2.1

If (A1) and (A2) hold, then $\psi_{\mathbb{X}}(a)=E_{X}\left[E_{X}(S\mid A=a,\bm{W}_{S})\mid T\in\mathcal{T}_{\text{ref}}\right]$ .

A detailed proof can be found in Supplementary B. We henceforth use $\psi_{X}(a)=E_{X}[E_{X}(S\mid A=a,\bm{W}_{S})\mid T\in\mathcal{T}_{\text{ref}}]$ to refer to the identifying estimand as distinct from the causal estimand $\psi_{\mathbb{X}}(a)$ . The implication of Theorem 2.1 is that if (A1) and (A2) hold then $\psi_{\mathbb{X}}(a)=\psi_{X}(a)$ and a causal standardized immunogenicity comparison is possible using data sampled from $P_{X}$ . However, even in the ideal context where $\mathcal{T}_{a}$ consists of only randomized trials, assumption (A1.2) may yet be considered unreasonable. For example, the various trials in $\mathcal{T}_{a}$ may collect different sets of key covariates, rendering this assumption difficult to satisfy based on the set of covariates common to $\mathcal{T}_{\text{ref}}$ and $\mathcal{T}_{a}$ . In this case, $\psi_{X}(a)$ does not have a causal interpretation. Nevertheless, we argue that a comparison of $\psi_{X}(a)$ and $\psi_{X}(a^{\prime})$ still retains a useful non-causal interpretation as a covariate-adjusted comparison of vaccines $a$ and $a^{\prime}$ , standardizing the set of available common covariates to their distribution in the referent trial(s). So long as $\bm{W}_{S}$ contains at least some covariates that are prognostic of immune response and whose distributions differ across trials, we argue that a comparison of $\psi_{X}(a)$ and $\psi_{X}(a^{\prime})$ may still be preferred over a naïve estimand that compares $\mu_{a}$ and $\mu_{a^{\prime}}$ directly.
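To make the contrast between the naïve and standardized comparisons concrete, the following minimal Python sketch computes a simple plug-in version of $\psi_{X}(a)$ for a discrete $\bm{W}_{S}$ using full data. It is only an illustration of the identifying formula and should not be read as the efficient estimator developed in this paper; the function name and toy data are hypothetical. In the toy data the immune response depends on age alone, so the two vaccines are equally immunogenic, yet the trials enroll different age mixes.

```python
from collections import defaultdict
from statistics import mean

def standardized_mean(rows, a, ref_trials):
    """Plug-in estimate of psi_X(a) = E[ E(S | A=a, W_S) | T in T_ref ] for discrete W_S."""
    # Inner step: stratum-specific mean of S among recipients of vaccine a, pooled over trials.
    sums, counts = defaultdict(float), defaultdict(int)
    for r in rows:
        if r["A"] == a:
            sums[r["W_S"]] += r["S"]
            counts[r["W_S"]] += 1
    # Outer step: average the stratum means over the covariate distribution of the referent trials.
    ref = [r for r in rows if r["T"] in ref_trials]
    if not ref:
        raise ValueError("no observations in the referent trials")
    total = 0.0
    for r in ref:
        w = r["W_S"]
        if counts[w] == 0:
            raise ValueError(f"empirical positivity violated in stratum {w!r}")
        total += sums[w] / counts[w]
    return total / len(ref)

# Toy full data (illustrative only): S is 2.0 in the young and 1.0 in the old, for both vaccines.
rows = (
    [{"T": "trial_A", "A": 1, "W_S": "young", "S": 2.0}] * 3
    + [{"T": "trial_A", "A": 1, "W_S": "old", "S": 1.0}]
    + [{"T": "trial_B", "A": 2, "W_S": "young", "S": 2.0}]
    + [{"T": "trial_B", "A": 2, "W_S": "old", "S": 1.0}] * 3
)
naive_diff = (mean(r["S"] for r in rows if r["A"] == 1)
              - mean(r["S"] for r in rows if r["A"] == 2))
standardized_diff = (standardized_mean(rows, 1, {"trial_B"})
                     - standardized_mean(rows, 2, {"trial_B"}))
```

The naïve difference in means is 0.5, reflecting only the younger population of the first trial, whereas the difference standardized to the second trial is zero.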

2.4 Identification of standardized immunogenicity using observed data

We now describe how $\psi_{X}(a)$ can be identified in the coarsened data setting, where we are sampling data from $P$ rather than $P_{X}$ . In a particular trial $t$ , sampling probabilities for $S$ could be determined based on $A$ (e.g., we may over-sample vaccine recipients and under-sample placebo recipients), $Y$ (e.g., it is common to sample all cases and only a subset of the remaining trial participants), and/or a subset of available covariates $\bm{W}(t)$ (e.g., we may over-sample particular populations to ensure appropriate representation in the observed data). We denote by $\bm{W}_{\Delta}(t)\subseteq\bm{W}(t)$ the set of covariates, if any, that are used to determine sampling probabilities in trial $t$ . Here, to simplify the exposition, we assume that all trials in $\mathcal{T}_{a}$ use two-stage sampling and that the covariates used for sampling, $\bm{W}_{\Delta}(t)$ , are the same for all such trials. We refer to this set of covariates as $\bm{W}_{\Delta}$ , suppressing the dependence on $a$ for simplicity. In future work, we will demonstrate how this assumption may be relaxed to allow different sampling designs across $\mathcal{T}_{a}$ ; we expect this generalization to be straightforward.

To identify $\psi_{X}(a)$ using the observed data we require the following assumptions.

(A3) Missing at random: $S\perp\Delta\mid T,A,\bm{W}_{\Delta},Y$.

(A4) Positivity of sampling: $\forall\ t\in\mathcal{T}_{a}$, $P\{P(\Delta=1\mid t,a,\bm{W}_{\Delta},Y)>0\mid A=a,T=t\}=1$.

Assumption (A3) stipulates that, given $(T,A,\bm{W}_{\Delta},Y)$, the probability of having immune responses measured cannot depend on the underlying immune response itself. Sampling probabilities are generally selected a priori by design in late-phase vaccine trials, so we expect this assumption will typically be satisfied. If, instead, trials are designed such that immune responses are measured subject to some form of convenience sampling (e.g., participants can self-select into an immunogenicity sub-study), then this assumption would require further scrutiny. Assumption (A4) stipulates that there is a positive probability of sampling immune responses for measurement for each available covariate profile in each trial where vaccine $a$ is administered. Again, this assumption can generally be ensured by design. We define $\bm{W}_{\Delta,S}=\bm{W}_{S}\cup\bm{W}_{\Delta}$ to be the union of covariates required to satisfy (A1) and (A3). We have the following identification result for $\psi_{X}(a)$.

Theorem 2.2

If Assumptions (A3)-(A4) hold then $\psi_{X}(a)=E\{E[E(S\mid\Delta=1,A=a,T\in\mathcal{T}_{a},Y,\bm{W}_{\Delta,S})\mid A=a,T\in\mathcal{T}_{a},\bm{W}_{S}]\mid T\in\mathcal{T}_{\text{ref}}\}$ .
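The nested expectations in Theorem 2.2 can be made concrete with a simple plug-in calculation. The following Python sketch assumes discrete covariates, so that each conditional mean can be replaced by an empirical cell mean. The data frame layout, the column names (`T`, `A`, `Y`, `S`, `Delta`) and the function name `plug_in_psi` are hypothetical choices made for illustration; this is a sketch of the identification formula under those assumptions, not the estimator developed in Section 2.6.

```python
import pandas as pd

def plug_in_psi(df, a, trials_a, trials_ref, w_s, w_delta_s):
    """Plug-in evaluation of the Theorem 2.2 formula using empirical cell means.

    Assumes discrete covariates (hypothetical column names) and that every
    (Y, W_{Delta,S}) cell among recipients of vaccine `a` has sampled data.
    """
    # Recipients of vaccine a who enrolled in one of the trials in T_a.
    vacc = df[(df["A"] == a) & df["T"].isin(trials_a)]

    # Innermost mean: E(S | Delta=1, A=a, T in T_a, Y, W_{Delta,S}).
    keys2 = ["Y"] + list(w_delta_s)
    q2 = (vacc[vacc["Delta"] == 1]
          .groupby(keys2, as_index=False)["S"].mean()
          .rename(columns={"S": "q2"}))
    vacc = vacc.merge(q2, on=keys2, how="left")

    # Middle mean: average q2 given A=a, T in T_a, W_S.
    q1 = (vacc.groupby(list(w_s), as_index=False)["q2"].mean()
          .rename(columns={"q2": "q1"}))

    # Outer mean: average over the covariate distribution in the referent trial(s).
    ref = df[df["T"].isin(trials_ref)].merge(q1, on=list(w_s), how="left")
    if ref["q1"].isna().any():
        raise ValueError("a referent covariate profile has no usable vaccine data")
    return ref["q1"].mean()

# Toy data: trial 1 is the referent; vaccine 2 was evaluated in trial 2 only.
nan = float("nan")
toy = pd.DataFrame({
    "T":     [1, 1, 1, 1, 2, 2, 2, 2],
    "A":     [3, 3, 3, 3, 2, 2, 2, 2],
    "W1":    [1, 1, 1, 0, 1, 0, 0, 0],
    "Y":     [0, 0, 1, 0, 0, 1, 0, 0],
    "Delta": [0, 0, 0, 0, 1, 1, 1, 1],
    "S":     [nan, nan, nan, nan, 3.0, 2.0, 2.0, 2.0],
})
psi = plug_in_psi(toy, a=2, trials_a=[2], trials_ref=[1],
                  w_s=["W1"], w_delta_s=["W1"])
naive = toy.loc[toy["A"] == 2, "S"].mean()
print(psi, naive)
```

In this toy data set the immune response is higher when `W1 = 1`, and `W1 = 1` is more common in the referent trial than in the trial that evaluated the vaccine, so the standardized value exceeds the unadjusted mean.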

2.5 Towards estimation: efficiency theory for identifying estimands

In this section, we provide the efficient influence function (EIF) of $\psi_{X}(a)$ and $\psi(a)$ in models that assume (A1)-(A4). We recall that an estimator’s influence function is a function of the data unit that has mean zero and finite variance. In particular, an estimator $\psi_{n}(a)$ of $\psi(a)$ is said to have influence function $D$ if $\psi_{n}(a)=\psi(a)+n^{-1}\sum_{i=1}^{n}D(O_{i})+o_{P}(n^{-1/2})$ . Influence functions are particularly useful for so-called regular estimators, as they can also be used to characterize the efficiency bound of all such estimators of a given parameter. The influence function of the regular estimator with the smallest asymptotic variance is called the efficient influence function. Influence functions are often indexed by so-called nuisance parameters, parameters of the data generating distribution that are not directly of interest, but are useful for constructing and studying the large sample behavior of estimators of the estimand of interest. Thus, influence functions can provide a means to generate final estimates with desirable large sample behavior (Rose and van der Laan, 2011).
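To illustrate how an influence function is used for inference, the asymptotic linearity above implies that the variance of $\psi_{n}(a)$ can be approximated by the sample variance of the estimated influence function values divided by $n$. The following Python sketch builds a Wald-type interval on that basis. The function name `wald_ci` and its arguments are hypothetical, and the sketch assumes the influence function values have already been evaluated at estimated nuisance parameters.

```python
import numpy as np
from statistics import NormalDist

def wald_ci(estimate, eif_values, level=0.95):
    """Wald-type interval from estimated influence function values.

    Assumes an asymptotically linear estimator, so that
    Var(estimate) is approximately Var(D(O)) / n.
    """
    eif_values = np.asarray(eif_values, dtype=float)
    n = eif_values.size
    se = np.sqrt(np.var(eif_values, ddof=1) / n)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # e.g. about 1.96 for level=0.95
    return estimate - z * se, estimate + z * se, se

lower, upper, se = wald_ci(2.0, [-1.0, 1.0, -1.0, 1.0])
```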

We introduce some additional notation to represent the nuisance parameters indexing our efficient influence function. We define $\bar{Q}_{X}(\bm{W}_{i,S})=E_{X}(S\mid A=a,\bm{W}_{S}=\bm{W}_{i,S})$ as the conditional mean immune response, $g_{A}(a\mid\bm{W}_{i,S})=P_{X}(A=a\mid\bm{W}_{S}=\bm{W}_{i,S})$ as the conditional probability of vaccine $a$ given covariates $\bm{W}_{i,S}$ , $g_{T}(\mathcal{T}_{0}\mid\bm{W}_{i,S})=P_{X}(T\in\mathcal{T}_{0}\mid\bm{W}_{S}=\bm{W}_{i,S})$ as the conditional probability of enrollment in one of the trials in a given set $\mathcal{T}_{0}\subseteq\mathcal{T}$ given covariates $\bm{W}_{i,S}$ , and $g_{T}(\mathcal{T}_{\text{ref}})=P_{X}(T\in\mathcal{T}_{\text{ref}})$ as the marginal probability of enrollment in one of the trials in $\mathcal{T}_{\text{ref}}$ .

Theorem 2.3

The EIF for $\psi_{X}(a)$ in a model for $P_{X}$ that only assumes (A1)-(A2) is

D_{X}(X_{i})=\frac{\mathbbm{1}_{a}(A_{i})}{g_{A}(a\mid\bm{W}_{i,S})}\frac{g_{T}(\mathcal{T}_{\text{ref}}\mid\bm{W}_{i,S})}{g_{T}(\mathcal{T}_{\text{ref}})}\left\{S_{i}-\bar{Q}_{X}(\bm{W}_{i,S})\right\}+\frac{\mathbbm{1}_{\mathcal{T}_{\text{ref}}}(T_{i})}{g_{T}(\mathcal{T}_{\text{ref}})}\left\{\bar{Q}_{X}(\bm{W}_{i,S})-\psi_{X}(a)\right\}\ .

We can also define the EIF for $\psi(a)$ (see Supplementary G), which is indexed by the following additional nuisance parameters:

	$\displaystyle\bar{Q}_{2}(Y_{i},\bm{W}_{i,\Delta,S})$	$\displaystyle=E(S\mid\Delta=1,A=a,T\in\mathcal{T}_{a},Y=Y_{i},\bm{W}_{\Delta,S}=\bm{W}_{i,\Delta,S})\ ,$
	$\displaystyle\bar{Q}_{1}(\bm{W}_{i,S})$	$\displaystyle=E[\bar{Q}_{2}(Y,\bm{W}_{\Delta,S})\mid A=a,T\in\mathcal{T}_{a},\bm{W}_{S}=\bm{W}_{i,S}]\ ,$
	$\displaystyle g_{\Delta}(1\mid T_{i},A_{i},Y_{i},\bm{W}_{i,\Delta,S})$	$\displaystyle=P(\Delta=1\mid T=T_{i},A=A_{i},Y=Y_{i},\bm{W}_{\Delta,S}=\bm{W}_{i,\Delta,S})\ .$

Theorem 2.4

The EIF for $\psi(a)$ in a model for $P$ that assumes (A1)-(A4) is

	$\displaystyle D(O_{i})$	$\displaystyle=\frac{\mathbbm{1}_{1}(\Delta_{i})}{g_{\Delta}(1\mid T_{i},A_{i},Y_{i},\bm{W}_{i,\Delta,S})}\frac{\mathbbm{1}_{a}(A_{i})}{g_{A}(a\mid\bm{W}_{i,S})}\frac{g_{T}(\mathcal{T}_{\text{ref}}\mid\bm{W}_{i,S})}{g_{T}(\mathcal{T}_{\text{ref}})}\{S_{i}-\bar{Q}_{2}(Y_{i},\bm{W}_{i,\Delta,S})\}$
		$\displaystyle+\frac{\mathbbm{1}_{a}(A_{i})}{g_{A}(a\mid\bm{W}_{i,S})}\frac{g_{T}(\mathcal{T}_{\text{ref}}\mid\bm{W}_{i,S})}{g_{T}(\mathcal{T}_{\text{ref}})}\{\bar{Q}_{2}(Y_{i},\bm{W}_{i,\Delta,S})-\bar{Q}_{1}(\bm{W}_{i,S})\}+\frac{\mathbbm{1}_{\mathcal{T}_{\text{ref}}}(T_{i})}{g_{T}(\mathcal{T}_{\text{ref}})}\{\bar{Q}_{1}(\bm{W}_{i,S})-\psi(a)\}\ .$
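As an illustration of how the three terms of $D(O_{i})$ combine, the following Python sketch evaluates the expression in Theorem 2.4 for vectors of nuisance parameter values. All argument names are hypothetical, and the nuisance values are assumed to be supplied by the user (for example, from fitted regressions). Because $S_{i}$ is unobserved when $\Delta_{i}=0$, the sketch sets the first residual to zero for those individuals rather than multiplying a missing value by zero.

```python
import numpy as np

def eif_observed(S, delta, A_is_a, T_in_ref, g_delta, g_A,
                 g_T_ref_W, g_T_ref, Q2, Q1, psi):
    """Evaluate D(O_i) from Theorem 2.4 for each individual.

    Q2 and Q1 must be finite for every individual; their values are
    irrelevant (multiplied by zero) for those who did not receive vaccine a.
    """
    S, delta, A_is_a, T_in_ref = map(np.asarray, (S, delta, A_is_a, T_in_ref))
    g_delta, g_A, g_T_ref_W = map(np.asarray, (g_delta, g_A, g_T_ref_W))
    Q2, Q1 = np.asarray(Q2), np.asarray(Q1)

    # Common weight: 1_a(A) / g_A * g_T(T_ref | W_S) / g_T(T_ref).
    w = A_is_a / g_A * g_T_ref_W / g_T_ref
    # S is missing when delta == 0, so the residual is defined as zero there.
    resid = np.where(delta == 1, S - Q2, 0.0)
    term1 = delta / g_delta * w * resid
    term2 = w * (Q2 - Q1)
    term3 = T_in_ref / g_T_ref * (Q1 - psi)
    return term1 + term2 + term3

# With complete sampling (delta = 1, g_delta = 1) the expression reduces to
# the form in Theorem 2.3 with Q1 playing the role of Q_X.
S = np.array([2.5, 1.0, 0.0])
A_is_a = np.array([1, 1, 0])
T_in_ref = np.array([0, 1, 1])
g_A = np.array([0.5, 0.5, 0.5])
g_T_ref_W = np.array([0.4, 0.6, 0.6])
g_T_ref = 2.0 / 3.0
Q2 = np.array([2.0, 1.5, 1.2])
Q1 = np.array([1.8, 1.4, 1.1])
D = eif_observed(S, np.ones(3), A_is_a, T_in_ref, np.ones(3), g_A,
                 g_T_ref_W, g_T_ref, Q2, Q1, psi=1.5)
D_X = (A_is_a / g_A * g_T_ref_W / g_T_ref * (S - Q1)
       + T_in_ref / g_T_ref * (Q1 - 1.5))
```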

2.6 Targeted minimum loss estimation

The form of the efficient influence function suggests a natural targeted minimum loss-based estimation (TMLE) approach involving sequential regression (van der Laan and Rubin, 2006). TMLE in general consists of two major steps that are sometimes implemented iteratively. In the first step, estimators of the nuisance parameters indexing the efficient influence function are obtained. TMLE is agnostic as to how such parameters are estimated, though regression stacking or super learning is commonly used towards this end in practice (van der Laan et al., 2007). The second step of TMLE uses empirical risk minimization in a low-dimensional parametric model to simultaneously (i) improve the fit of the initial nuisance parameter estimates and (ii) ensure that, at the end of the TMLE procedure, the so-called efficient influence function estimating equation is solved.

A TMLE for $\psi(a)$ may be implemented in the following specific steps. Further information pertaining to hypothesis testing is available in Supplementary Section H.

1.

Estimate the probability of enrollment in referent trial(s) given covariates. To estimate $g_{T}(\mathcal{T}_{\text{ref}}\mid\bm{W}_{S})$ , we can use data from all observed trials to fit a regression with the outcome $\mathbbm{1}_{\mathcal{T}_{\text{ref}}}(T)$ and predictors $\bm{W}_{S}$ . This regression could be estimated using any binary regression approach. Denote the estimate by $g_{n,T}$ and define the estimated marginal probability of enrollment in $\mathcal{T}_{\text{ref}}$ as $g_{n,T}(\mathcal{T}_{\text{ref}})=n^{-1}\sum_{i=1}^{n}\mathbbm{1}_{\mathcal{T}_{\text{ref}}}(T_{i})$ .
2.

Estimate the pooled probability of receiving vaccine $a$ given covariates. To estimate $g_{A}(a\mid\bm{W}_{S})$, we can use all the data to fit a regression with outcome $\mathbbm{1}_{a}(A)$ and predictors $\bm{W}_{S}$. This regression could be estimated using any approach that is appropriate for binary outcome regression. Denote the estimate by $g_{n,A}$.
3.

Compute sampling probabilities for each individual. Next, we need to evaluate sampling probabilities $g_{\Delta}(1\mid T_{i},A_{i},Y_{i},\bm{W}_{i,\Delta,S})$ for each individual who received vaccine $a$ and who was enrolled in one of the trials included in $\mathcal{T}_{a}$. These probabilities are generally known by design; if unknown, they could be estimated separately for each trial in $\mathcal{T}_{a}$ using regression of the binary outcome $\Delta$ on predictors $A,Y,\bm{W}_{\Delta,S}$. Denote by $g_{n,\Delta}$ the estimates (or true values) of the conditional sampling probabilities.
4.

Estimate vaccine-specific conditional mean immunogenicity given sampling and other covariates. To obtain an estimate of $\bar{Q}_{2}$ , we can use data from individuals who received vaccine $a$ across all trials in $\mathcal{T}_{a}$ to fit a regression with the outcome $S$ and predictors $Y,\bm{W}_{\Delta,S}$ . As above, any suitable regression technique can be used and we denote by $\bar{Q}_{n,2}$ the estimate of the conditional mean immunogenicity.

5.

Target the vaccine-specific conditional mean immunogenicity given sampling and other covariates. For simplicity, suppose $S\in(0,1)$. If this assumption does not hold, then $S$ can be re-scaled to fall in this interval and the same approach can be adopted (Gruber and van der Laan, 2010). Using all individuals with a measured immune response ($\Delta=1$) who received vaccine $a$ in trials $\mathcal{T}_{a}$, fit a logistic regression with outcome $S$, an offset equal to $\mbox{logit}[\bar{Q}_{n,2}(Y,\bm{W}_{\Delta,S})]$, and a single covariate, defined as

H_{n,2}(T,A,Y,\bm{W}_{\Delta,S})=\frac{g_{n,T}(\mathcal{T}_{\text{ref}}\mid\bm{W}_{S})}{g_{n,\Delta}(1\mid T,A,Y,\bm{W}_{\Delta,S})g_{n,A}(a\mid\bm{W}_{S})g_{n,T}(\mathcal{T}_{\text{ref}})}\ .

Note that this regression model for $\bar{Q}_{2}$ has a single coefficient $\beta_{2}$ and the model can be expressed as $\bar{Q}_{2,\beta_{2}}=\mbox{expit}[\mbox{logit}(\bar{Q}_{n,2})+\beta_{2}H_{n,2}],\beta_{2}\in\mathbb{R}.$ Let $\beta_{n,2}$ denote the maximum likelihood estimate of $\beta_{2}$ and define $\bar{Q}_{n,2}^{*}=\bar{Q}_{2,\beta_{n,2}}$ as the resulting targeted estimate of $\bar{Q}_{2}$.
6.

Estimate vaccine-specific conditional mean immunogenicity excluding sampling covariates. To estimate $\bar{Q}_{1}$ , we regress the pseudo-outcome $\bar{Q}_{n,2}^{*}(T,A,Y,\bm{W}_{\Delta,S})$ onto $\bm{W}_{S}$ using observations that received vaccine $a$ . As above, any suitable regression technique can be used and we denote by $\bar{Q}_{n,1}$ the estimate of conditional mean immunogenicity, now conditioning only on baseline covariates $\bm{W}_{S}$ .

7.

Target the vaccine-specific conditional mean immunogenicity excluding sampling covariates. For simplicity, we again suppose that the initial estimates obtained in the previous step are such that $\bar{Q}_{n,1}(\bm{W}_{S})\in(0,1)$ for all $\bm{W}_{S}$; re-scaling can again be applied as needed. Now, using all individuals who received vaccine $a$, fit a logistic regression with outcome $\bar{Q}_{n,2}^{*}(T,A,Y,\bm{W}_{\Delta,S})$, an offset equal to $\mbox{logit}[\bar{Q}_{n,1}(\bm{W}_{S})]$, and a single covariate, defined as

H_{n,1}(\bm{W}_{S})=\frac{g_{n,T}(\mathcal{T}_{\text{ref}}\mid\bm{W}_{S})}{g_{n,A}(a\mid\bm{W}_{S})g_{n,T}(\mathcal{T}_{\text{ref}})}\ .

Note that this regression model for $\bar{Q}_{1}$ has a single coefficient $\beta_{1}$ and the model can be expressed as $\bar{Q}_{1,\beta_{1}}=\mbox{expit}[\mbox{logit}(\bar{Q}_{n,1})+\beta_{1}H_{n,1}],\beta_{1}\in\mathbb{R}.$ Let $\beta_{n,1}$ denote the maximum likelihood estimate of $\beta_{1}$ and define $\bar{Q}_{n,1}^{*}=\bar{Q}_{1,\beta_{n,1}}$ as the resulting targeted estimate of $\bar{Q}_{1}$.

8.

Construct the final TMLE estimate. The final estimate is

\psi_{n}^{*}(a)=\frac{1}{\sum_{j}\mathbbm{1}_{\mathcal{T}_{\text{ref}}}(T_{j})}\sum_{i=1}^{n}\mathbbm{1}_{\mathcal{T}_{\text{ref}}}(T_{i})\bar{Q}_{n,1}^{*}(\bm{W}_{i,S})\ .
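The targeting steps above each reduce to a one-parameter logistic regression with an offset and a single covariate. The following Python sketch shows one way such a fluctuation could be computed, together with the final averaging step. It assumes the outcome has already been scaled to $(0,1)$ and that the initial estimates and the covariate $H$ are supplied as arrays. The function names `fluctuate` and `final_estimate` are hypothetical, and the hand-written Newton iteration is a choice made here to keep the sketch free of dependencies beyond NumPy; any routine that fits a logistic regression with an offset could be substituted.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def logit(p):
    return np.log(p / (1.0 - p))

def fluctuate(y, q_init, h, n_iter=50, tol=1e-10):
    """One-parameter logistic fluctuation: expit(logit(q_init) + beta * h).

    y      : outcome in (0, 1) (S in step 5, the targeted Q2 in step 7)
    q_init : initial conditional mean estimates in (0, 1)
    h      : covariate H evaluated for each individual
    Returns the fitted beta and the targeted estimates.
    """
    y, h = np.asarray(y, dtype=float), np.asarray(h, dtype=float)
    offset = logit(np.clip(np.asarray(q_init, dtype=float), 1e-6, 1 - 1e-6))
    beta = 0.0
    for _ in range(n_iter):
        p = expit(offset + beta * h)
        score = np.sum(h * (y - p))          # derivative of the log-likelihood
        info = np.sum(h ** 2 * p * (1 - p))  # negative second derivative
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta, expit(offset + beta * h)

def final_estimate(q1_star, in_ref):
    """Average the targeted Q1 over individuals enrolled in the referent trial(s)."""
    return np.mean(np.asarray(q1_star)[np.asarray(in_ref, dtype=bool)])

# Illustration on arbitrary simulated inputs (not data from the paper).
rng = np.random.default_rng(0)
y = rng.uniform(0.05, 0.95, 200)
q0 = rng.uniform(0.2, 0.8, 200)
h = rng.uniform(0.5, 2.0, 200)
beta_hat, q_star = fluctuate(y, q0, h)
```

At convergence, the sum of $H$ times the residual is numerically zero, which is the sense in which the fluctuation solves the corresponding component of the efficient influence function estimating equation.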

3 Simulation Studies

We evaluated the proposed estimators via simulation studies in terms of their bias, variance, mean squared error (MSE), coverage probability of 95% Wald-type confidence intervals and mean width of confidence intervals.

In the first simulation, we compared the immunogenicity of vaccines evaluated in two different studies that were imbalanced on key covariates. In this context, we considered three scenarios: (i) comparing vaccines evaluated in separate early-phase trials; (ii) comparing vaccines where one is evaluated in an early-phase trial and the other in a late-phase trial that used two-phase sampling for measurement of immune responses; (iii) comparing vaccines evaluated in separate late-phase trials that both used two-phase sampling for measurement of immune responses. In each setting, we simulated two binary covariates $W_{1}\mid T\sim\mbox{Bernoulli}(0.65-0.15\mathbbm{1}_{2}(T))$ and $W_{2}\mid T\sim\mbox{Bernoulli}(0.8-0.5\mathbbm{1}_{2}(T))$, so that the covariate prevalences in each trial match those listed in Table 1. We use $A=1$ to denote the vaccine evaluated in trial $T=1$ and $A=2$ to denote the vaccine evaluated in trial $T=2$. Trials 1 and 2 were both simulated to have 1:1 randomization to either their respective active vaccine or a control vaccine (arbitrarily labeled $A=3$). The immune response was simulated as $S\mid A,\bm{W}\sim\mbox{Normal}((W_{1}-W_{2}+2\mathbbm{1}_{\{1,2\}}(A)),1)$ and the clinical outcome $Y$ was simulated as $Y\mid S,A,\bm{W}\sim\mbox{Bernoulli}(\text{expit}(-2+\mathbbm{1}_{\{1,2\}}(A)+W_{1}/2-S/2))$. To simulate two-phase sampling, we allowed sampling probabilities to depend on vaccine $A$ and outcome $Y$. In early-phase trials, $P(\Delta=1\mid A=a,Y=y)=1$ for all $a,y$, consistent with the standard design of measuring immunogenicity in all participants in such trials. For late-phase trials, we generated data with two-phase sampling according to the probabilities listed in Table 1.
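The data generating process described above can be sketched in a few lines of Python. This is an illustrative reconstruction rather than the simulation code used for the reported results: the function name `simulate_trial` is hypothetical, the covariate prevalences are taken from Table 1, and the sampling probabilities $P_{y,a}$ are assumed to be indexed by outcome $y$ and by an arm indicator equal to 1 for the active vaccine and 0 for control.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def simulate_trial(n, trial, p_w1, p_w2, p_sample, rng):
    """Simulate one trial; p_sample[y, arm] is P(Delta=1 | Y=y, arm).

    arm = 1 for the active vaccine (labeled A = trial), 0 for control (A = 3);
    this indexing of the sampling probabilities is an assumption.
    """
    p_sample = np.asarray(p_sample, dtype=float)
    W1 = rng.binomial(1, p_w1, size=n)
    W2 = rng.binomial(1, p_w2, size=n)
    arm = rng.binomial(1, 0.5, size=n)             # 1:1 randomization
    A = np.where(arm == 1, trial, 3)
    S = rng.normal(W1 - W2 + 2 * arm, 1.0)         # immune response
    Y = rng.binomial(1, expit(-2 + arm + W1 / 2 - S / 2))
    Delta = rng.binomial(1, p_sample[Y, arm])      # two-phase sampling indicator
    S_obs = np.where(Delta == 1, S, np.nan)        # S observed only if sampled
    return {"T": np.full(n, trial), "A": A, "W1": W1, "W2": W2,
            "Y": Y, "Delta": Delta, "S": S_obs}

# Scenario 2 of Table 1: a late-phase trial 1 and an early-phase trial 2.
rng = np.random.default_rng(2024)
late_phase = np.array([[0.05, 0.10], [0.05, 0.10]])  # rows: y = 0, 1; cols: arm = 0, 1
early_phase = np.ones((2, 2))
trial1 = simulate_trial(5000, 1, 0.65, 0.80, late_phase, rng)
trial2 = simulate_trial(150, 2, 0.50, 0.30, early_phase, rng)
stacked = {k: np.concatenate([trial1[k], trial2[k]]) for k in trial1}
```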

Table 1: Details of the generating scheme for each simulated trial set. $n$ is the sample size and $P_{y,a}$ is the sampling probability in the sub-population with $Y=y$ and $A=a$.

Scenario	Trial	Vaccine	$n$	$P(W_{1})$	$P(W_{2})$	$P_{1,0}$	$P_{1,1}$	$P_{0,0}$	$P_{0,1}$
1	1	1	200	0.65	0.80	1	1	1	1
	2	2	150	0.5	0.30	1	1	1	1
2	1	1	5000	0.65	0.80	0.05	0.1	0.05	0.1
	2	2	150	0.5	0.30	1	1	1	1
3	1	1	2000	0.65	0.80	0.05	0.1	0.05	0.1
	2	2	1500	0.5	0.30	0.05	0.1	0.05	0.1

The two simulated trial data sets were stacked into a single data set and used to estimate the two target parameters via TMLE. This procedure was repeated 1000 times for each scenario, and the results were summarized by comparison with the corresponding true values. Our estimators exhibited low bias and reasonable MSE in all scenarios (Table 2). Confidence interval coverage was close to the nominal level in most settings, with reasonable interval widths; coverage was somewhat below nominal (0.915 to 0.917) for vaccine 2 in scenario 3.
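The performance measures reported for the simulation can be computed from the replicate-level estimates and confidence limits as in the following Python sketch. The function name `summarize_replicates` and the toy inputs are hypothetical; the sketch assumes one point estimate and one interval per simulation replicate.

```python
import numpy as np

def summarize_replicates(estimates, ci_lower, ci_upper, truth):
    """Bias, variance, MSE, CI coverage, and mean CI width across replicates."""
    estimates = np.asarray(estimates, dtype=float)
    ci_lower = np.asarray(ci_lower, dtype=float)
    ci_upper = np.asarray(ci_upper, dtype=float)
    return {
        "bias": estimates.mean() - truth,
        "variance": estimates.var(ddof=1),
        "mse": np.mean((estimates - truth) ** 2),
        "coverage": np.mean((ci_lower <= truth) & (truth <= ci_upper)),
        "width": np.mean(ci_upper - ci_lower),
    }

# Toy example with four replicates and a true value of 2.0.
est = np.array([1.9, 2.1, 2.0, 2.4])
summary = summarize_replicates(est, est - 0.2, est + 0.2, truth=2.0)
```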

Table 2: Bias, variance, mean squared error (MSE), coverage probability, and width of 95% confidence intervals (CIs) for the first simulation. Results for the three scenarios are summarized for two choices of referent population. Our methods perform consistently, with small bias, low MSE, and CI coverage close to the nominal 95% level in most settings. $\text{CI}_{c}$: CI coverage; $\text{CI}_{w}$: CI width.

Scenario	$\mathcal{T}_{\text{ref}}$	Vaccine	Truth	Bias	Variance	MSE	$\text{CI}_{c}$	$\text{CI}_{w}$
1	$\{1,2\}$	1	2.0000	0.0014	0.0081	0.0080	0.9450	0.3513
1	$\{1,2\}$	2	2.0000	0.0005	0.0112	0.0109	0.9500	0.4122
1	$\{1\}$	1	1.8500	0.0001	0.0069	0.0065	0.9540	0.3262
1	$\{1\}$	2	1.8500	-0.0004	0.0184	0.0180	0.9490	0.5267
2	$\{1,2\}$	1	1.8600	-0.0023	0.0041	0.0043	0.9450	0.2497
2	$\{1,2\}$	2	1.8600	0.0047	0.0161	0.0162	0.9370	0.4925
2	$\{1\}$	1	1.8500	-0.0022	0.0041	0.0042	0.9430	0.2497
2	$\{1\}$	2	1.8500	0.0050	0.0167	0.0168	0.9360	0.5010
3	$\{1,2\}$	1	2.0000	-0.0008	0.0134	0.0139	0.9450	0.4493
3	$\{1,2\}$	2	2.0000	-0.0016	0.0196	0.0214	0.9170	0.5418
3	$\{1\}$	1	1.8500	-0.0015	0.0108	0.0102	0.9520	0.4057
3	$\{1\}$	2	1.8500	-0.0016	0.0311	0.0347	0.9150	0.6789

Additional simulation studies evaluating our estimators are provided in Supplementary Sections D and E. These studies provide further evidence supporting the applicability of our method across a variety of contexts. Across a range of vaccine trial settings, the proposed estimators consistently exhibited low bias, small MSE, and well-calibrated 95% Wald-type confidence intervals.

4 Application to HVTN Trials

The proposed methods were applied to three investigational HIV vaccine trials: HVTN 702 (Gray et al., 2021), HVTN 100 (Bekker et al., 2018) and HVTN 097 (Gray et al., 2019). The vaccine regimen used in HVTN 097 is ALVAC-HIV-vCP1521 + subtypes B/E gp120 protein with alum adjuvant, labeled as $\text{P}_{AE/B}/\text{alum}$; the vaccine regimen used in HVTN 100 and HVTN 702 is ALVAC-HIV-vCP2438 + subtype C gp120 protein with MF59 adjuvant, labeled as $\text{P}_{C}/\text{MF59}$. Our analysis characterized the immunogenicity of these vaccines in terms of their impact on CD4+ T cells expressing cytokines in response to three antigens: ZM96, TV1 and 1086. The percentage of CD4+ T cells expressing cytokines in response to an antigen is measured by the intracellular cytokine staining (ICS) assay. We evaluated the readout of this assay as both a continuous response magnitude and a binary response (0: Yes, 1: No), the latter indicating whether the assay readout met the positivity criteria. The analysis adjusted for the following baseline participant characteristics: age, sex at birth, BMI, region of enrollment, and educational level. In HVTN 702, the immune responses were measured subject to a case-control sampling scheme with known sampling weights. Participants who were vaccinated in either of the other two trials were assigned a weight of one because the target immune markers were measured in all such participants.

We present results comparing vaccine $\text{P}_{AE/B}/\text{alum}$, evaluated in HVTN 097, with vaccine $\text{P}_{C}/\text{MF59}$, evaluated in HVTN 100 and HVTN 702, using the HVTN 702 population as the referent population. HVTN 702 was conducted in South Africa, where the most prevalent HIV subtype is clade C. To provide a benchmark for comparison, we included unadjusted estimators based on the sample average immune response within each trial arm.

Table 3 displays results for both estimators of the positive response rate (RR) of CD4+ T cells expressing two cytokines in response to a specific antigen. Comparing RR between HVTN 097 and HVTN 702 for antigen ZM96, the unadjusted estimated difference in average RR is 0.097 (-0.055, 0.249), which is smaller than the TMLE estimate of 0.182 (-0.014, 0.365). The ratios of geometric mean response magnitudes show a similar trend: 1.169 (0.844, 1.618) for the unadjusted method and 1.470 (0.964, 2.241) using TMLE. For the HVTN 702 population, the TMLE analysis revealed a statistically significant decrease in the CD4+ T-cell RR (difference 0.287 (0.085, 0.466), $p=0.006$) and geometric mean (ratio 1.935 (1.275, 2.938), $p=0.002$) of the immune response to antigen 1086 for the vaccine $\text{P}_{C}/\text{MF59}$ compared to the vaccine $\text{P}_{AE/B}/\text{alum}$, but our analysis showed no evidence of a difference for antigens ZM96 and TV1 based on either the unadjusted or the TMLE estimators.

Table 3: Differences in the average immune responses of CD4+ cells between the vaccines administered in HVTN 702 and HVTN 097 within the referent population HVTN 702. The point estimates for each vaccine regimen in the referent population HVTN 702 are provided in the second and third columns. Comparisons are summarized for the unadjusted approach and our proposed method for both contrasts: the difference between response rates and the ratio of geometric means. ICS: intracellular cytokine staining; RR: response rate; GM: geometric mean.

Trial	HVTN 702&100	HVTN 097	Difference/Ratio
Vaccine	$\text{P}_{C}/\text{MF59}$	$\text{P}_{AE/B}/\text{alum}$	-
Antigen: ZM96
RR (unadj)	0.639	0.736	0.097
CI	(0.527, 0.752)	(0.634, 0.838)	(-0.055, 0.249)
p value	–	–	0.210
RR (TMLE)	0.573	0.755	0.182
CI	(0.472, 0.668)	(0.559, 0.882)	(-0.014, 0.365)
p value	–	–	0.068
GM (unadj)	0.077	0.090	1.169
CI	(0.060, 0.099)	(0.073, 0.112)	(0.844, 1.618)
p value	–	–	0.348
GM (TMLE)	0.074	0.109	1.470
CI	(0.060, 0.092)	(0.076, 0.158)	(0.964, 2.241)
p value	–	–	0.074
Antigen: TV1
RR (unadj)	0.723	0.736	0.013
CI	(0.618, 0.827)	(0.634, 0.838)	(-0.133, 0.160)
p value	–	–	0.857
RR (TMLE)	0.649	0.755	0.106
CI	(0.554, 0.734)	(0.559, 0.882)	(-0.083, 0.288)
p value	–	–	0.272
GM (unadj)	0.080	0.090	1.129
CI	(0.063, 0.101)	(0.073, 0.112)	(0.819, 1.557)
p value	–	–	0.458
GM (TMLE)	0.076	0.109	1.441
CI	(0.062, 0.093)	(0.076, 0.158)	(0.948, 2.190)
p value	–	–	0.087
Antigen: 1086
RR (unadj)	0.546	0.736	0.190
CI	(0.430, 0.662)	(0.634, 0.838)	(0.035, 0.344)
p value	–	–	0.016
RR (TMLE)	0.468	0.755	0.287
CI	(0.370, 0.569)	(0.559, 0.882)	(0.085, 0.466)
p value	–	–	0.006
GM (unadj)	0.060	0.090	1.510
CI	(0.048, 0.075)	(0.073, 0.112)	(1.103, 2.067)
p value	–	–	0.010
GM (TMLE)	0.057	0.109	1.935
CI	(0.046, 0.069)	(0.076, 0.158)	(1.275, 2.938)
p value	–	–	0.002
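As a consistency check on how the contrasts in Table 3 relate to the arm-specific summaries, the following Python sketch recomputes the unadjusted RR difference for antigen ZM96 from the two reported arm-level estimates and intervals. It assumes that the unadjusted intervals are symmetric Wald-type intervals and that the two arms are independent; these are assumptions of the sketch, since the method used for the unadjusted intervals is not stated in this section. The function names are hypothetical. The TMLE intervals in Table 3 are not symmetric around the point estimates, so the same calculation is not expected to apply to them.

```python
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)  # about 1.96

def se_from_ci(lower, upper):
    """Back out a standard error from a symmetric 95% Wald interval."""
    return (upper - lower) / (2 * Z)

def difference_with_ci(est1, ci1, est2, ci2):
    """Difference est2 - est1 with a Wald interval, assuming independent arms."""
    diff = est2 - est1
    se = (se_from_ci(*ci1) ** 2 + se_from_ci(*ci2) ** 2) ** 0.5
    p_value = 2 * (1 - NormalDist().cdf(abs(diff) / se))
    return diff, (diff - Z * se, diff + Z * se), p_value

# Unadjusted RR for antigen ZM96, values copied from Table 3.
diff, ci, p_value = difference_with_ci(0.639, (0.527, 0.752),
                                       0.736, (0.634, 0.838))
print(round(diff, 3), tuple(round(x, 3) for x in ci), round(p_value, 3))
```

Under these assumptions the recomputed difference, interval and p value agree with the unadjusted ZM96 row of Table 3 up to rounding.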

5 Discussion

Our framework explicitly outlines sufficient assumptions for a causal interpretation of vaccine immunogenicity comparisons across trials. We acknowledge that these assumptions are strong and may not be justifiable in many practical applications, particularly the assumption of ignorability of trial enrollment given measured covariates. This may limit the interpretability of our results in the language of counterfactuals. Nevertheless, we argue that transparently explicating these assumptions is critical. This clarity allows researchers to understand whether and how we should adjust for covariates when calculating average immune responses. Moreover, we argue that the observed data parameter $\psi_{X}(a)$ is likely to be closer to the true counterfactual immune response $\psi_{\mathbb{X}}(a)$ than the naïve estimand $\mu_{a}$. Thus, an interpretation of $\psi_{X}(a)$ as a covariate-standardized immune response may prove satisfactory for advancing scientific aims even in settings where the assumptions required for full counterfactual interpretation fail.

Our framework leads to at least two practical recommendations for the conduct of vaccine immunogenicity studies. First, because the assumptions required for a causal comparison hinge on the availability of key covariates across all trials in which the vaccines of interest are evaluated, standardizing the collection of covariates across different trials should be a priority. While fully standardizing a set of covariates across trials with different sponsors or vaccine developers may be unrealistic, funding organizations and vaccine trial networks may consider developing standardized operating procedures for covariate collection for all studies of candidate vaccines. This may be particularly important for pathogens for which there is considerable prior exposure in the trial population. In these instances, pre-existing immunity may significantly modify the immunogenicity of vaccines and may differ across trials. Therefore, every effort should be made to standardize the assays that measure pre-existing immunity in each trial. We discuss other avenues for relaxing assumptions related to covariate availability in Supplementary Section F.

A second practical recommendation suggested by our framework is that it may be desirable to publish covariate-conditional estimates of vaccine immunogenicity. While our framework has focused on nonparametric estimation in settings where data from all trials are simultaneously available to the analyst, our results suggest that publishing even simple covariate-conditional immunogenicity models (e.g., based on logistic regression) may lead to more objective immunogenicity comparisons. Moreover, such models may help stimulate new hypotheses pertaining to individual-level factors that influence vaccine immunogenicity. We argue that reporting conditional models may be particularly important for performing improved meta-analyses for establishing vaccine correlates of protection. In such analyses, it is typical to visually depict estimated VE from several trials along the vertical axis plotted against some marker of vaccine-induced immunogenicity on the horizontal axis. If the immune response in question is a strong candidate for a correlate of protection, we would expect a positive correlation between vaccine-induced immune responses and protective efficacy of the vaccine. However, we note that these trials themselves may be conducted across diverse populations and thus we should consider standardizing not only the vaccine-induced immunogenicity readout, but also the estimated VE readout from the trial, in order to have the most appropriate means of evaluating an immune response as a potential correlate of protection. We hypothesize that applying covariate standardization to meta-analyses in certain contexts could enhance the power for detecting correlates of protection.

References

Austin and Stuart (2015) Austin, P. C. and Stuart, E. A. (2015). Moving towards best practice when using inverse probability of treatment weighting (IPTW) using the propensity score to estimate causal treatment effects in observational studies. Statistics in Medicine 34, 3661–3679.
Baden et al. (2020) Baden, L. R., El Sahly, H. M., Essink, B., Kotloff, K., Frey, S., Novak, R., Diemert, D., Spector, S. A., Rouphael, N., Creech, C. B., et al. (2020). Efficacy and safety of the mRNA-1273 SARS-CoV-2 vaccine. New England Journal of Medicine.
Banzhoff et al. (2003) Banzhoff, A., Nacci, P., and Podda, A. (2003). A new MF59-adjuvanted influenza vaccine enhances the immune response in the elderly with chronic diseases: results from an immunogenicity meta-analysis. Gerontology 49, 177–184.
Bareinboim and Pearl (2016) Bareinboim, E. and Pearl, J. (2016). Causal inference and the data-fusion problem. Proceedings of the National Academy of Sciences 113, 7345–7352.
Bekker et al. (2018) Bekker, L.-G., Moodie, Z., Grunenberg, N., Laher, F., Tomaras, G. D., Cohen, K. W., Allen, M., Malahleha, M., Mngadi, K., Daniels, B., et al. (2018). Subtype C ALVAC-HIV and bivalent subtype C gp120/MF59 HIV-1 vaccine in low-risk, HIV-uninfected, South African adults: a phase 1/2 trial. The lancet HIV 5, e366–e378.
Breslow (2005) Breslow, N. E. (2005). Case–Control Study, Two-Phase. John Wiley & Sons, Ltd.
Chung et al. (2014) Chung, A. W., Ghebremichael, M., Robinson, H., Brown, E., Choi, I., Lane, S., Dugast, A.-S., Schoen, M. K., Rolland, M., Suscovich, T. J., et al. (2014). Polyfunctional Fc-effector profiles mediated by IgG subclass selection distinguish RV144 and VAX003 vaccines. Science translational medicine 6, 228ra38–228ra38.
Furuya-Kanamori et al. (2021) Furuya-Kanamori, L., Xu, C., Doi, S. A., Clark, J., Wangdi, K., Mills, D. J., and Lau, C. L. (2021). Comparison of immunogenicity and safety of licensed Japanese encephalitis vaccines: A systematic review and network meta-analysis. Vaccine 39, 4429–4436.
Gray et al. (2021) Gray, G. E., Bekker, L.-G., Laher, F., Malahleha, M., Allen, M., Moodie, Z., Grunenberg, N., Huang, Y., Grove, D., Prigmore, B., et al. (2021). Vaccine efficacy of ALVAC-HIV and bivalent subtype C gp120–MF59 in adults. New England Journal of Medicine 384, 1089–1100.
Gray et al. (2019) Gray, G. E., Huang, Y., Grunenberg, N., Laher, F., Roux, S., Andersen-Nissen, E., De Rosa, S. C., Flach, B., Randhawa, A. K., Jensen, R., et al. (2019). Immune correlates of the Thai RV144 HIV vaccine regimen in South Africa. Science translational medicine 11, eaax1880.
Gruber and van der Laan (2010) Gruber, S. and van der Laan, M. J. (2010). A targeted maximum likelihood estimator of a causal effect on a bounded continuous outcome. The International Journal of Biostatistics 6.
Li and Luedtke (2023) Li, S. and Luedtke, A. (2023). Efficient estimation under data fusion. Biometrika 110, 1041–1054.
Moodie et al. (2022) Moodie, Z., Dintwe, O., Sawant, S., Grove, D., Huang, Y., Janes, H., Heptinstall, J., Omar, F. L., Cohen, K., De Rosa, S. C., et al. (2022). Analysis of the HIV Vaccine Trials Network 702 phase 2b–3 HIV-1 vaccine trial in South Africa assessing RV144 antibody and T-cell correlates of HIV-1 acquisition risk. The Journal of Infectious Diseases 226, 246–257.
Polack et al. (2020) Polack, F. P., Thomas, S. J., Kitchin, N., Absalon, J., Gurtman, A., Lockhart, S., Perez, J. L., Pérez Marc, G., Moreira, E. D., Zerbini, C., et al. (2020). Safety and efficacy of the BNT162b2 mRNA Covid-19 vaccine. New England Journal of Medicine 383, 2603–2615.
Rabaa et al. (2017) Rabaa, M. A., Girerd-Chambaz, Y., Hue, K. D. T., Tuan, T. V., Wills, B., Bonaparte, M., van Der Vliet, D., Langevin, E., Cortes, M., Zambrano, B., et al. (2017). Genetic epidemiology of dengue viruses in phase III trials of the CYD tetravalent dengue vaccine and implications for efficacy. Elife 6, e24196.
Rerks-Ngarm et al. (2009) Rerks-Ngarm, S., Pitisuttithum, P., Nitayaphan, S., Kaewkungwal, J., Chiu, J., Paris, R., Premsri, N., Namwat, C., de Souza, M., Adams, E., et al. (2009). Vaccination with ALVAC and AIDSVAX to prevent HIV-1 infection in Thailand. New England Journal of Medicine 361, 2209–2220.
Rose and van der Laan (2011) Rose, S. and van der Laan, M. J. (2011). A targeted maximum likelihood estimator for two-stage designs. The International Journal of Biostatistics 7.
Sridhar et al. (2018) Sridhar, S., Luedtke, A., Langevin, E., Zhu, M., Bonaparte, M., Machabert, T., Savarino, S., Zambrano, B., Moureau, A., Khromava, A., et al. (2018). Effect of dengue serostatus on dengue vaccine safety and efficacy. New England Journal of Medicine 379, 327–340.
Stuart et al. (2015) Stuart, E. A., Bradshaw, C. P., and Leaf, P. J. (2015). Assessing the generalizability of randomized trial results to target populations. Prevention Science 16, 475–485.
van der Laan et al. (2007) van der Laan, M. J., Polley, E. C., and Hubbard, A. E. (2007). Super learner. Statistical Applications in Genetics and Molecular Biology 6.
van der Laan and Rubin (2006) van der Laan, M. J. and Rubin, D. (2006). Targeted maximum likelihood learning. The International Journal of Biostatistics 2.