Precise binding free energy calculations for multiple molecules using an optimal measurement network of pairwise differences

Pengfei Li Silicon Therapeutics, Suzhou, Jiangsu 215000, China Zhijie Li Silicon Therapeutics, Suzhou, Jiangsu 215000, China Yu Wang Silicon Therapeutics, Suzhou, Jiangsu 215000, China Huaixia Dou Silicon Therapeutics, Suzhou, Jiangsu 215000, China Brian K. Radak Roivant Sciences, Boston, Massachusetts 02210, United States Woody Sherman Roivant Sciences, Boston, Massachusetts 02210, United States Huafeng Xu [email protected] Roivant Sciences, New York, New York 10036, United States

Abstract

Alchemical binding free energy (BFE) calculations offer an efficient and thermodynamically rigorous approach to in silico binding affinity predictions. As a result of decades of methodological improvements and recent advances in computer technology, alchemical BFE calculations are now widely used in drug discovery research. They help guide the prioritization of candidate drug molecules by predicting their binding affinities for a biomolecular target of interest (and potentially selectivity against undesirable anti-targets). Statistical variance associated with such calculations, however, may undermine the reliability of their predictions, introducing uncertainty both in ranking candidate molecules and in benchmarking their predictive accuracy. Here, we present a computational method that substantially improves the statistical precision in BFE calculations for a set of ligands binding to a common receptor by dynamically allocating computational resources to different BFE calculations according to an optimality objective established in a previous work from our group and extended in this work. Our method, termed Network Binding Free Energy (NetBFE), performs adaptive binding free energy calculations in iterations, re-optimizing the allocations in each iteration based on the statistical variances estimated from previous iterations. Using examples of NetBFE calculations for protein-binding of congeneric ligand series, we demonstrate that NetBFE approaches the optimal allocation in a small number ( $\leq 5$ ) of iterations and that NetBFE reduces the statistical variance in the binding free energy estimates by approximately a factor of two when compared to a previously published and widely used allocation method at the same total computational cost.

1 Introduction

In drug discovery research, candidate drug molecules are increasingly ranked and selected by their computationally predicted affinities for the biomolecular target of interest before experimental synthesis and testing ^{1, 2, 3, 4, 5, 6}. This allows for the exploration of many more molecular design ideas than is practical with experimental approaches alone. Alchemical binding free energy (BFE) calculations have become a valuable tool in predicting the protein-ligand binding affinities and their utility continues to increase thanks to advances in the methodologies ^{7, 8, 9, 10, 11, 12, 4, 13} force field ^{14, 15, 16, 17, 18, 19}, computational power afforded by the graphical prcessing units (GPUs) ^{11, 9, 8}, and the availability of simulation software and simplified workflows ^{11, 9, 4, 20, 21, 22, 23, 24, 25, 26}.

BFE calculations can either compute the binding free energy of an individual molecule, by a technique commonly referred to as "absolute" binding free energy (ABFE) calculations ^{27, 28, 29, 30}, or compute the difference between the binding free energies of two molecules, by "relative" binding free energy (RBFE) calculations ^{31, 32, 11, 33}. The binding free energies of a set of molecules can be determined by a combination of ABFE calculations of select molecules and RBFE calculations of select pairs of molecules ³⁴.

Statistical errors in BFE calculations confound decision-making in candidate molecule selection and the assessment of their predictive accuracy. There have been significant methodological progress in minimizing statistical errors in single BFE calculations ^{35, 36}, but works have only recently been published addressing the question of how to reduce the collective statistical errors for BFE calculations of a set of molecules ^{34, 37}, which represents the majority of applications of BFE calculations in drug discovery.

As suggested previously, at a fixed total computational cost, the overall statistical errors in BFE calculations for multiple molecules can be minimized by optimally allocating the computational resources to different RBFE and ABFE calculations. In particular, minimizing the total variance in the estimated binding free energies with respect to the allocation corresponds to the A-optimal in experimental design ³⁴. Such optimal allocations, however, require knowledge of the statistical fluctuations in individual BFE calculations, which are unknown a priori. Because of this reason, A-optimal BFE calculations have not yet been reported for realistic examples.

In this work, we develop a method based on the A-optimal design to improve the statistical precision of BFE calculations. Our method, Network Binding Free Energy (NetBFE), performs adaptive BFE calculations in multiple iterations. In each iteration, NetBFE estimates the statistical fluctuations in different BFE calculations based on the results from previous iterations, and it reallocates the computational resources accordingly. This allows the A-optimal allocation to be used in realistic BFE calculations. We also extend the previous theoretical work by deriving new equations for the A-optimal design and the covariance matrix when only RBFE calculations are included (the previous theory required the inclusion of at least one ABFE calculation), a common scenario in the lead optimization stage of drug discovery.

Here, we apply NetBFE to calculating the binding free energies of 16 ligands against the TYK2 protein ^{11, 38} and 8 ligands against the HIF-2 $\alpha$ protein ³⁹. We compare the statistical errors in the estimated binding free energies from NetBFE calculations and those in the BFE calculations using RBFE pairs selected by the LOMAP algorithm developed by the Mobley lab ⁴⁰, which is widely used in such calculations. At the same total computational cost, NetBFE reduces the statistical variance by about half compared to LOMAP. Our results suggest that NetBFE can substantially improve the statistical precision of BFE calculations in drug discovery, which should accelerate the computational selection of candidate drug molecules and thereby increase the efficiency of the design-make-test cycle.

2 Methods

2.1 A-optimal design of network binding free energy calculations

The A-optimal design of measurement network of pairwise differences was described previously in detail ³⁴, and is briefly explained here. The binding free energy values for multiple molecules in the same binding site of a receptor are the quantities of interest in BFE calculations. Let $x_{i=1,2,\ldots,m}$ be the binding free energy of molecule $i$ ; here we use $x_{i}$ instead of $\Delta G_{i}$ to make the notations in the mathematical presentation more compact, and to follow the same notation in Reference 34. ABFE computes the individual $x_{i}$ (i.e. $\Delta G_{i}$ ) and RBFE computes the difference between any pair of $x_{i}$ and $x_{j}$ : $x_{ij}=x_{i}-x_{j}$ (i.e. $\Delta\Delta G_{ij}=\Delta G_{i}-\Delta G_{j}$ ). These ABFE and RBFE calculations form our difference network, which can be represented by a graph $\mathcal{G}$ of $m+1$ vertices and $m(m+1)/2$ edges, where the vertices $i=1,2,\ldots,m$ stand for the $m$ quantities $\left\{x_{i}\right\}$ , the edge between vertices $i\neq j>0$ stands for the difference measurement $\hat{x}_{ij}$ (i.e. the RBFE calculations), and the edge between vertices 0 and $i>0$ stands for the individual measurement $\hat{x}_{i}$ (i.e. the ABFE calculations). For each BFE calculation, its asymptotic statistical variance $\sigma_{e}^{2}$ , for $e\in\{i|i=1,2,\ldots,m\}\cup\{(i,j)\mid i,j=1,2,..,m,i\neq j\}$ , decreases with $n_{e}$ —the resource allocated to this calculation—as

\sigma_{e}^{2}=s_{e}^{2}/n_{e}

(1)

where $s_{e}$ is the statistical fluctuation in the corresponding BFE calculation.

Some of the binding free energy values, say $\left\{x_{a}\mid a\in Q\right\}$ , may have been measured experimentally. These experimental values and their concomitant statistical errors in the experiment, $\left\{\tilde{x}_{a}\pm\delta_{a}\mid a\in Q\right\}$ , can be included in the estimate of the binding free energies from BFE calculations. If the true binding free energies are $\{x_{i}\}$ , the likelihood that our BFE calculations produce results of $\{\hat{x}_{e}\pm\sigma_{e}\}$ and the experimental measurements produce results of $\{\tilde{x}_{a}\pm\delta_{a}\mid a\in Q\}$ is

$\displaystyle L$	$\displaystyle=$	$\displaystyle\prod_{i}(\sqrt{2\pi\sigma_{i}})^{-1}\exp\left(-\frac{(x_{i}-\hat{x}_{i})^{2}}{2\sigma_{i}^{2}}\right)$	(2)
	$\displaystyle\cdot$	$\displaystyle\prod_{a\in Q}(\sqrt{2\pi\delta_{a}})^{-1}\exp\left(-\frac{(x_{a}-\tilde{x}_{a})^{2}}{2\delta_{a}^{2}}\right)$
	$\displaystyle\cdot$	$\displaystyle\prod_{i<j}(\sqrt{2\pi\sigma_{ij}})^{-1}\exp\left(-\frac{(x_{i}-x_{j}-\hat{x}_{ij})^{2}}{2\sigma_{ij}^{2}}\right)$

Maximizing the log-likelihood $\ln L$ with respect to $\{x_{i}\}$ yields the maximum likelihood estimator for $\left\{x_{i}\right\}$ :

\mathbf{F}\cdot\vec{x}=\vec{z}

(3)

where $\mathbf{F}$ is the Fisher information matrix with the elements

F_{ij}=\left\{\begin{array}[]{cl}\delta_{i}^{-2}+\sigma_{i}^{-2}+\sum_{k\neq i}\sigma_{ik}^{-2}&\text{if }i=j\text{ and }i\in Q\\ \sigma_{i}^{-2}+\sum_{k\neq i}\sigma_{ik}^{-2}&\text{if }i=j\text{ and }i\notin Q\\ -\sigma_{ij}^{-2}&\text{if }i\neq j\end{array}\right.

(4)

and

z_{i}=\left\{\begin{array}[]{cl}\delta_{i}^{-2}\tilde{x}_{i}+\sigma_{i}^{-2}\hat{x}_{i}+\sum_{j\neq i}\sigma_{ij}^{-2}\hat{x}_{ij}&\text{ if }i\in Q\\ \sigma_{i}^{-2}\hat{x}_{i}+\sum_{j\neq i}\sigma_{ij}^{-2}\hat{x}_{ij}&\text{if }i\notin Q.\end{array}\right.

(5)

The covariance in the estimates of $\left\{x_{i}\right\}$ is given by the inverse of the Fisher information matrix:

\mathbf{C}=\mathbf{F}^{-1}.

(6)

The A-optimal minimizes the total variance, tr( $\mathbf{C}$ ), subject to the constraints of non-negativity

n_{e}\geq 0,

(7)

and of the total fixed computational cost

\sum_{e}n_{e}=\sum_{i}n_{i}+\sum_{i<j}n_{ij}=N.

(8)

This minimization can be solved by cone programming⁴¹, and it is implemented in the publicly available software DiffNet. ⁴²

2.2 A-optimal for a network with only relative binding free energy calculations

Sometimes we may choose to perform only relative binding free energy calculations and compute only $\hat{x}_{ij}$ . In this case the free energies can only be determined up to an arbitrary constant $\bar{x}$ , which can be chosen to minimize the root mean square deviation from the known experimental values.

The log-likelihood in this case is given by

\ln L=-\sum_{i<j}\frac{(x_{i}-x_{j}-\hat{x}_{ij})^{2}}{2\sigma_{ij}^{2}}

(9)

The corresponding Fisher information matrix

F_{ij}=\left\{\begin{array}[]{cl}\sum_{j\neq i}\sigma_{ij}^{-2}&\text{ if }i=j\\ -\sigma_{ij}^{-2}&\text{ if }i\neq j\end{array}\right.

(10)

is singular. Consequently, the covariance matrix $\mathbf{C}=\mathbf{F}^{-1}$ cannot be defined.

This problem can be solved by introducing an additional term to the log-likelihood to restrain the mean $m^{-1}\sum_{i}x_{i}$ to a constant $\bar{x}$ :

\ln L^{\ast}(w)=\ln L-2^{-1}\omega\left(m^{-1}\sum_{i}x_{i}-\bar{x}\right)^{2}

(11)

which is equivalent to specifying that the mean of $\{x_{i}\}$ is normally distributed around $\bar{x}$ with the standard deviation of $1/\sqrt{\omega}$ . The corresponding Fisher information matrix is

\mathbf{F}^{\ast}(w)=\mathbf{F}+\omega m^{-2}\mathbf{1}

(12)

where $\mathbf{1}$ is a $m\times m$ matrix of elements 1. We substitute this augmented Fisher matrix $\mathbf{F}^{\ast}(\omega)$ for $\mathbf{F}$ in the semi-definite programming for A-optimal when there are no absolute binding free energy calculations. It can be demonstrated that the optimal allocation does not depend on the value of $\omega>0$ .

The corresponding covariance matrix is

\mathbf{C}^{\ast}(\omega)=(\mathbf{F}^{\ast}(\omega))^{-1}

(13)

We are interested in $\mathbf{C}^{\ast}=\lim_{\omega\rightarrow\infty}\mathbf{C}^{\ast}(\omega)$ , such that the mean of $\{x_{i}\}$ is constrained to the constant $\bar{x}$ . To derive $\mathbf{C}^{\ast}$ , consider the expansion of $\mathbf{C}^{\ast}(\omega)$ in $\omega^{-1}$ :

\mathbf{C}^{\ast}(\omega)=\mathbf{C}^{\ast}+\omega^{-1}\mathbf{C}_{1}+o(\omega^{-2}).

(14)

We have

	$\displaystyle\mathbf{I}$	$\displaystyle=$	$\displaystyle(\mathbf{F}+\omega m^{-2}\mathbf{1})(\mathbf{C}^{\ast}+\omega^{-1}\mathbf{C}_{1}+o(\omega^{-2}))$		(15)
		$\displaystyle=$	$\displaystyle\mathbf{F}\mathbf{C}^{\ast}+m^{-2}\mathbf{1}\mathbf{C}_{1}+\omega m^{-2}\mathbf{1}\mathbf{C}+\omega^{-1}\mathbf{F}\mathbf{C}_{1}+o(\omega^{-2}).$		(15)

Comparing the terms grouped by different orders of $\omega$ , we have

	$\displaystyle\vec{1}^{t}\mathbf{C}^{\ast}$	$\displaystyle=$	$\displaystyle\vec{0}$
	$\displaystyle\mathbf{F}\mathbf{C}^{\ast}+\vec{1}m^{-2}\vec{C}_{1}^{t}$	$\displaystyle=$	$\displaystyle\mathbf{I},$		(16)

where $\vec{C}_{1}^{t}$ is a vector whose elements are the column sums of $\mathbf{C}_{1}$ . This can be written in the matrix form

\left(\begin{array}[]{cc}\vec{1}^{t}&0\\ \mathbf{F}&\vec{1}\end{array}\right)\left(\begin{array}[]{c}\mathbf{C}^{\ast}\\ m^{-2}\vec{C}_{1}^{t}\end{array}\right)=\left(\begin{array}[]{c}\vec{0}^{t}\\ \mathbf{I}\end{array}\right),

(17)

the solution of which gives a well-defined covariance matrix $\mathbf{C}^{\ast}$ . Incidentally, $\vec{C}_{1}^{t}=m\vec{1}^{t}$ .

2.3 Iterative network binding free energy calculations

The statistical fluctuations $\{s_{e}\}$ depend on both the thermodynamic length between the two end states in an alchemical calculation and the ratio of the relaxation time of relevant motions to the length of the simulation.^{36, 7} In practice, they have to be estimated (approximately) from the completed BFE calculations. In order to obtain $\{s_{e}\}$ values for allocating the computational resources according to the A-optimal, NetBFE calculations proceed in iterations, with each iteration consuming a predetermined amount, $\Delta N^{(i)}$ , of total sampling. After each iteration $i$ , estimates of $\{s_{e}^{(i+1)}\}$ are updated as described in Section. 2.6 below, and a new A-optimal allocation $\{n_{e}^{(i+1)}\}$ is determined from the updated $\{s_{e}^{(i+1)}\}$ and used to extend the BFE calculations for iteration $i+1$ . The iterations continue until a specified total simulation samples, $N=\sum_{i}\Delta N^{(i)}$ , or a specified target average statistical variance tr( $\mathbf{C}$ )/m is reached. Data accumulated from all the iterations are included to compute the binding free energy value for each BFE calculation.

In each NetBFE calculation reported in this work, a total of $T_{0}=m\times 200\mathrm{ns}$ simulation time is used for a set of $m$ molecules. This is divided into $I=5$ iterations and in each iteration $T_{i}=T_{0}/I$ of simulation time is allocated.

2.4 A-optimal on sparse 2-connected sub-networks

In order to reduce the number of BFE calculations in each iteration, we select a 2-connected subgraph out of the full graph $\mathcal{G}$ in each iteration, and obtain the A-optimal on this sub-network. It was demonstrated previously that the 2-connected sub-network can generate near-optimal allocations.³⁴ The allocation in each iteration thus proceeds in two steps in our NetBFE implementation. First, we determine the A-optimal allocation of the computational resources for the next iteration on the full graph $\mathcal{G}$ . The edges with the largest allocations are selected to form the 2-connected subgraph. Here, the edge selection can be based on either the total cumulative allocation including all previous iterations or only the allocation for this iteration. We have implemented both selection methods, and we refer to NetBFE calculations with the former method as NetBFE( $N$ ) and those with the latter method as NetBFE( $\Delta N$ ). Second, A-optimal allocation is obtained including only the BFE calculations in the selected 2-connected subgraph.

2.5 Estimates of the statistical variances in BFE calculations

The statistical variance $\sigma_{ij}^{2}$ in the computed $\Delta\Delta G_{ij}$ (by RBFE for $i\neq j$ and by ABFE for $i=j$ ) is estimated by block bootstrapping, using all the data up to the current iteration. The sampled data points in free energy simulations are divided into $B=10$ non-overlapping blocks, each block containing samples from a simulation time interval. The data are resampled by randomly drawing $B$ blocks with replacement. A $\Delta\Delta G_{ij}$ value is computed from each set of resampled data, and the corresponding variance $\sigma_{ij}^{2}$ is computed from $K=50$ repeats of the resampling.

2.6 Estimates of the statistical fluctuations in BFE calculations

The statistical fluctuations $s_{ij}$ associated with the calculation of $\Delta\Delta G_{ij}$ can be estimated from its statistical variance $\sigma_{ij}^{2}$ and its given computational cost $n_{ij}$ by inverting Eq. 1:

s_{ij}^{2}=\sigma_{ij}^{2}n_{ij}

(18)

We use a simple model to predict $s_{ij}$ of BFE calculations that have not yet been performed (thus $\sigma_{ij}$ is unavailable):

s_{ij}=w_{0}+w_{1}\max(h_{ij},h_{ji})^{1/2}+w_{2}\max(H_{ij},H_{ji})^{1/2}

(19)

where for $i\neq j$ , $h_{ij}$ is the number of heavy atoms in molecule $i$ that do not map onto atoms in molecule $j$ and $H_{ij}$ is the number of hydrogen atoms in molecule $i$ that do not map onto atoms in molecule j in the RBFE calculation; for $i=j$ , $h_{ii}$ and $H_{ii}$ are the respective numbers of heavy and hydrogen atoms in molecule $i$ in the ABFE calculation.

Before the first iteration, the parameters are initialized to $w_{0}=1.0$ , $w_{1}=1.0$ and $w_{2}=0.5$ . In each iteration, the $s_{ij}$ values for the performed RBFE calculations are updated by Eq. 18, and the parameters $w_{0}$ , $w_{1}$ , and $w_{2}$ are updated by minimizing the root-mean-square-error (RMSE) between the values predicted from Eq. 19 and those computed from Eq. 18.

2.7 Maximum likelihood estimate of binding free energies

The individual (absolute) binding free energies can be derived from the pairwise (relative) binding free energy differences and their corresponding variances by using the maximum likelihood estimator (Eq. 3). In practical drug discovery projects, a few molecules usually have experimentally determined binding free energies—reference binding free energies–which can be included in the computational results to derive a maximum likelihood estimate of binding free energies (Eqs. 4 and 5).

2.8 Molecular dynamic simulations

In each individual RBFE/ABFE calculation, the protein-ligand complex was first equilibrated by a 5 ns simulation, where the snapshots were saved every 2500 MD steps and the total 2000 snapshots are randomly drawn as the starting structures for replicates of RBFE/ABFE calculations. Hydrogen mass repartition ⁴³ in conjunction with SHAKE ⁴⁴/SETTLE ⁴⁵ for constraining the distances between hydrogen atoms and their covalently bonded heavy atoms was employed to enable the use of a large time step of 4 fs in all simulations. The potential energies at all $\lambda$ values were computed and saved every 5 ps. Each alchemical simulation consisted of both the "complex" stage, which simulated the protein-ligand complex, and the "solvated" stage, which simulated the ligand in aqueous solution. The temperature was kept at 298.15 K using Langevin dynamics ⁴⁶ with the collision frequency 2.0 ps^-1, and the pressure was kept at 1.01325 bar with Monte Carlo barostat ⁴⁷ and a pressure relaxation time of 1.0 ps. The proteins were modeled by AMEBR ff14SB force field ⁴⁸, and the small molecules were parameterized using our in-house force field generation program; the force field parameters and starting protein-ligand complex structures are provided as AMBER20 input files in the Supplementary Materials. The entire system was solvated in a periodic box of TIP3P water molecules with all the solute atoms no less than 12 Å away from the boundary of the unit cell. In this work, the initial structures of TYK2 protein target with 16 ligands come from reference 11 and HIF-2 $\alpha$ (PDB ID: 4XT2) protein target with 8 ligands are constructed according to reference39. All simulations were performed using the AMBER20 package.⁴⁹

Each individual RBFE calculation consists of $\Lambda=22$ simulations (decoupled protocol) or $\Lambda=12$ simulations (concerted protocol) and each individual ABFE calculation consists of $\Lambda=21$ simulations (decoupled protocol) or $\Lambda=16$ simulations (concerted protocol) at different $\lambda$ values. Decoupled and concerted protocols are defined in references 5 and 8. If a BFE calculation is allocated a total simulation time of $t_{0}$ , each $\lambda$ -valued simulation will run for a total of $t_{\lambda}=t_{0}/\Lambda$ simulation time. If $t_{\lambda}<0.1$ ns, this BFE calculation is omitted and its time $t_{0}$ is allocated proportionally to other BFE calculations, otherwise the simulation time is divided into $\lceil t_{\lambda}/(2\mathrm{ns})\rceil$ replicates initialized with slightly different binding poses and random number seeds (thus no individual simulation exceeds 2 ns simulation time, which helps load balance when the simulations are executed in parallel). In each iteration, new replicates for each individual BFE calculation receiving new allocations are initiated; existing simulations are not extended. The allocation only includes the "complex" stage of each individual BFE calculation; if a BFE calculation is performed, its corresponding "solvated" stage is simulated for a fixed time of $2\Lambda$ ns, which incurs a negligible computational cost and is not included in the total allocated computational resource.

3 Results

We first illustrate the iterative allocation in NetBFE consisting of both ABFE and RBFE calculations, using the small example of eight inhibitors against the protein kinase TYK2. No experimental binding free energies were included in the allocation and analysis, i.e. $\delta_{i}^{-1}=0$ in Eq. 4. The estimated fluctuations $\{s_{e}\}$ and the corresponding A-optimal allocation of $\{n_{e}\}$ in each iteration are shown in Fig. 1A and Fig. 1B, respectively. The total variance $\mathrm{tr}(\mathbf{C}(n^{(i)},s^{(i)}))$ decreases with increasing iterations.

How well does this iterative procedure approach the true A-optimal allocation? If we assume that the estimate of $\{s_{e}^{(i=5)}\}$ after the last iteration represents the true statistical fluctuations, $\mathrm{tr}(\mathbf{C}(n^{(5)},s^{(5)}))$ approximates the variance of the A-optimal allocation. We measure how far the allocation $n^{(i)}$ in the $i$ ’th iteration—normalized to $n^{\prime(i)}\equiv\sum_{e}n_{e}^{(5)}(\sum_{e}n_{e}^{(i)})^{-1}n^{(i)}$ so that its total number of samples is the same as that in the last iteration—is from the this approximate A-optimal by comparing $\mathrm{tr}(\mathbf{C}(n^{\prime(i)},s^{(5)}))$ to $\mathrm{tr}(\mathbf{C}((n^{(5)},s^{(5)}))$ . The former indeed approaches the latter (Fig. 1C) after only three iterations, suggesting that the iterative allocation is converging quickly to the true A-optimal.

Because ABFE converges much slower than RBFE (on average ABFE has a much higher statistical fluctuation than RBFE: $\langle\{s_{i}\}\rangle_{\mathrm{ABFE}}/\langle\{s_{ij}\}\rangle_{\mathrm{RBFE}}$ $\approx$ 6.2), most of the computations (approximately 81 $\%$ of total resources) are allocated to ABFE in order to improve the precision of the absolute values of the computed binding free energies.

Refer to caption — Figure 1: The evolution of a NetBFE calculation of 8 inhibitors against TYK2. A. The estimated statistical fluctuations $s=\{s_{e}\}$ in each iteration. The width of each edge is proportional to the corresponding $s_{e}$ . B. The allocation of $n=\{n_{e}\}$ in each iteration. The width of each edge is proportional to the corresponding $n_{e}$ . In (A) and (B), each filled circle represents an inhibitor, an edge between two filled circles represents a RBFE calculation between the two corresponding inhibitors, and an edge between the large empty circle and a filled circle represents an ABFE calculation for the corresponding inhibitor. C. The total variance $\mathrm{tr}(\mathbf{C})\equiv$ tr $(\mathbf{C}(n^{(i)},s^{(i)}))$ according to the estimate of $s^{(i)}=\{s_{e}^{(i)}\}$ and the allocation $n^{(i)}=\{n_{e}^{(i)}\}$ in each iteration $i$ . Also shown is expected total variance tr $(\mathbf{C}(n^{\prime(i)},s^{(5)}))$ resulting from the normalized allocation $n^{\prime(i)}=\sum_{e}n_{e}^{(5)}(\sum_{e}n_{e}^{(i)})^{-1}n^{(i)}$ in the $i$ th iteration if $\{s_{e}^{(i=5)}\}$ from the last iteration is the true statistical fluctuation; the convergence of $\mathrm{tr}(\mathbf{C}(n^{\prime(i)},s^{(5)}))$ to $\mathrm{tr}(\mathbf{C}(n^{(5)},s^{(5)}))$ demonstrates that NetBFE approaches the A-optimal allocation via the iterative updates.

Often it is unnecessary to predict the absolute values of the binding free energies, such as in the case when the values for some similar molecules have been experimentally determined, thus the values for the molecules of interest can be determined by predicting the differences between the molecules, or in the case of ranking the molecules by their binding affinities, such that only their relative differences will suffice. In such cases ABFE calculations may be omitted to avoid the most expensive part of the computations, as noted above. In the following, we consider NetBFE with only RBFE calculations.

We used NetBFE to compute the binding free energies of 16 inhibitors against TYK2 ^{38, 11} and 8 inhibitors against HIF-2 $\alpha$ ³⁹. These molecules are depicted in Fig. 2 and listed in the Supplementary Materials. We first characterized the progression of statistical errors after each iteration of extending the total simulation time (Fig. 3). The standard deviations and the total variance by and large decreases with increasing total simulation time. Moreover, the distribution of the standard errors becomes narrower, indicating that most of the free energy results have converged to similar precision.

In the case of HIF-2 $\alpha$ inhibitors, the agreement between the experimental binding free energies and their predicted values improves at each iteration (Fig. 3), indicating that the improvement in precision is accompanied by the improvement in accuracy; for TYK2 inhibitors, the accuracy is quite good (RMSE < 1 kcal/mol) in all 5 iterations. Of course, accuracy in the binding free energy calculations depends on other factors such as the force field parametrization and the consistency between the computational setup and the experimental conditions. It remains unknown whether in other protein-ligand combinations or in other force fields the accuracy of the predicted binding free energies improves with the statistical precision.

In each iteration of the NetBFE calculation, the allocations $\{n_{e}\}$ are optimized according to the estimate of statistical fluctuations $\{s_{e}\}$ based on the results of the previous iteration, thus both $\{s_{e}\}$ and $\{n_{e}\}$ change from one iteration to the next. We measure the difference between two allocations of $\{n_{e}\}$ and $\{n^{\ast}_{e}\}$ by the Kullback-Leibler (KL) divergence:

D_{\mathrm{KL}}[n_{e}||n^{\ast}_{e}]=\sum_{e}f_{e}\ln\frac{f_{e}}{f^{\ast}_{e}}

(20)

where $f_{e}\equiv n_{e}/\sum_{e^{\prime}}n_{e^{\prime}}$ and similarly $f^{\ast}_{e}$ are the fractional allocations. In the two cases above, $\{n_{e}\}$ approach their limiting values with the progression of iterations (Fig. 4): the KL-divergence between $\{n_{e}^{(i)}\}$ and $\{n_{e}^{(\mathrm{opt})}\}$ decreases toward 0.69 for TYK2 and 0.28 for HIF-2 $\alpha$ with each iteration, where $\{n_{e}^{(\mathrm{opt})}\}$ is the the A-optimal allocation based on $s^{(i=5)}$ from the last iteration, assuming, as in the above, that the estimate of $s^{(i=5)}$ from the last iteration is a good approximation to the true statistical fluctuations. Taking the A-optimal variance to be $\mathbf{C}_{\mathrm{opt}}\approx\mathbf{C}(n^{(\mathrm{opt})},s^{(5)})$ , $\mathrm{tr}(\mathbf{C}(n^{\prime(i)},s^{(5)})/\mathrm{tr}(\mathbf{C}_{\mathrm{opt}})$ rapidly approaches one with each iteration for both TYK2 and HIF-2 $\alpha$ examples (as above, $n^{\prime(i)}=\sum_{e}n_{e}^{(5)}(\sum_{e}n_{e}^{(i)})^{-1}n^{(i)}$ is the normalized allocation in iteration $i$ ). Note that $n^{(i)}$ is determined on a 2-connected sub-network (see Section 2.4) while $n^{(\mathrm{opt})}$ is determined on the full network. These observations suggest that the iterations in NetBFE using a two-connected sub-network result in near-optimal allocations.

A simple consistency check can help identify problematic RBFE calculations by comparing the $\Delta\Delta G_{ij,\mathrm{RBFE}}$ in the RBFE calculation against the difference $\Delta G_{i,\mathrm{NetBFE}}-\Delta G_{j,\mathrm{NetBFE}}$ , where $\Delta G_{i,\mathrm{NetBFE}}$ and $\Delta G_{j,\mathrm{NetBFE}}$ are derived by the maximum likelihood estimator of Eq. 3. Fig. 5 shows such comparison for the NetBFE calculations of TYK2 and HIF-2 $\alpha$ inhibitors; points significantly deviating from the diagonal (e.g. points for which $|\Delta\Delta G_{ij,\mathrm{RBFE}}-(\Delta G_{i,\mathrm{NetBFE}}-\Delta G_{j,\mathrm{NetBFE}})|\geq 3\sqrt{\sigma_{ij}^{2}+\sigma_{i}^{2}+\sigma_{j}^{2}}$ ) indicate potentially problematic RBFE calculations.

Having demonstrated above that the iterative procedure in NetBFE allocates the computational resources approximately according to the A-optimal, we next investigate whether such allocations yield improved precision in the computed binding free energies. We performed $g=3$ independent replicates of NetBFE calculations of the TYK2 and HIF-2 $\alpha$ examples, initializing each replicate with the same binding poses but different random number seeds in equilibration and alchemical simulations, and computed the statistical variances in the predicted binding free energies across the replicates. In comparison, we computed the same sets of binding free energies—and their statistical variances from $g=3$ independent replicates—by allocating the same total computational cost evenly to pairs of RBFE calculations selected by a previously published method, LOMAP ⁴⁰. Fig. 6 shows the distributions of variances for the two NetBFE allocation methods (NetBFE( $N$ ) and NetBFE( $\Delta N$ ), explained in Section 2.4) and the LOMAP allocation method. The mean and median values of the variance distributions are listed in Table 1. The NetBFE allocations indeed significantly outperform the LOMAP allocations in terms of the statistical precision in the predicted binding free energies. The reduction in the statistical variance by approximately a factor of two is consistent with the previous results from simulated data ³⁴.

Table 1: The mean and median values of the variances distributions in the estimated binding free energies of NetBFE(

N

), NetBFE(

\Delta N

) and LOMAP schemes for the TYK2 and HIF-2

\alpha

examples, corresponding to the dashed and solid lines in Fig. 6. The error bars for these values are computed from bootstrapping. The variances are in the units of (kcal/mol)².

	TYK2		HIF-2 $\alpha$
methods	mean	median	mean	median
NetBFE( $N$ )	0.11 $\pm$ 0.05	0.04 $\pm$ 0.02	0.13 $\pm$ 0.07	0.05 $\pm$ 0.06
NetBFE( $\Delta N$ )	0.03 $\pm$ 0.01	0.03 $\pm$ 0.01	0.11 $\pm$ 0.03	0.10 $\pm$ 0.02
LOMAP	0.10 $\pm$ 0.03	0.07 $\pm$ 0.02	0.25 $\pm$ 0.10	0.13 $\pm$ 0.13

Reference binding free energy values from experimental measurements for a subset of the molecules, when available, may be incorporated in the NetBFE calculations by supplying $\tilde{x}_{i}$ and $\delta_{i}^{-1}>0$ in Eq. 4, 5. Here we tested the allocation and the corresponding precision and accuracy in NetBFE calculations that include different numbers of reference values. We performed four independent NetBFE calculations for the 16 inhibitors against TYK2, supplying reference experimental values for $k=0,1,2,4$ molecules, respectively.

The optimal allocations at the end of five iterations are shown in Fig. 7. When there are a small number of reference molecules ( $k=1,2$ ), they serve as network hubs, in that there is a RBFE edge connecting them to every other molecule. As we increase the number of reference molecules ( $k=4$ ), however, only a partial subset of the molecules are connected to each reference molecule for which RBFE calculations are quick to converge. The predicted binding free energies are compared to the experimental values in Fig. 8. The precision and accuracy of the NetBFE calculations are summarized in Table 2. In this case, including more reference values does not significantly improve either the precision or the accuracy.

k	0	1	2	4
$m^{-1}\mathrm{tr}(\mathbf{C})$	0.0030 $\pm$ 0.0011	0.0040 $\pm$ 0.0006	0.0020 $\pm$ 0.0003	0.0020 $\pm$ 0.0002
MUE	0.82 $\pm$ 0.12	0.74 $\pm$ 0.11	0.72 $\pm$ 0.10	0.79 $\pm$ 0.14
RMSE	0.93 $\pm$ 0.11	0.86 $\pm$ 0.12	0.82 $\pm$ 0.11	0.97 $\pm$ 0.13
Pearson $R^{2}$	0.87 $\pm$ 0.07	0.82 $\pm$ 0.09	0.84 $\pm$ 0.07	0.76 $\pm$ 0.11
Spearman $\rho$	0.88 $\pm$ 0.09	0.89 $\pm$ 0.10	0.87 $\pm$ 0.10	0.90 $\pm$ 0.08
Kendall $\tau$	0.72 $\pm$ 0.11	0.72 $\pm$ 0.12	0.72 $\pm$ 0.12	0.75 $\pm$ 0.10

Table 2: The precision and accuracy of NetBFE calculations for

m=16

TYK2 inhibitors when reference experimental values are included for

k=0,1,2,4

molecules. The precision is measured by the average statistical variance in the free energy estimates (in (kcal/mol)²). The accuracy is measured by the mean unsigned error (MUE, in kcal/mol) and root mean square error (RMSE, in kcal/mol) from the experimental values and by the correlations (Pearson

R^{2}

, Spearman

\rho

, and Kendall

\tau

) with the experimental values. The error bars in the metrics are computed from bootstrapping.

4 Conclusions

In this work, we have developed a new method to improve the overall statistical precision of binding free energy calculations for multiple molecules against a common biomolecular target. NetBFE (Network Binding Free Energy) implements and extends a previous theoretical work from our group on A-optimal experimental design for measuring pairwise differences, making it practical for realistic binding free energy calculations. We have demonstrated that, at a fixed total computational cost, NetBFE can reduce the statistical errors in BFE calculations by approximately two-fold. The improved precision will increase the confidence in the computational ranking of candidate drug molecules, which is instrumental to prioritizing molecules for synthesis in the design-make-test cycle. In addition, as binding free energy calculations are increasingly used for validating and benchmarking force field models and free energy methods^{4, 11, 7, 50}, the reduced statistical errors afforded by NetBFE will make the benchmark assessments more reliable and thus accelerate the improvements in in silico binding assays.

References

Chodera et al. 2011 Chodera, J.; Mobley, D.; Shirts, M.; Dixon, R.; Branson, K.; Pande, V. Alchemical Free Energy Methods for Drug Discovery: Progress and Challenges. Current Opinion in Structural Biology 2011, 21, 150–160
Mobley and Klimovich 2012 Mobley, D. L.; Klimovich, P. V. Perspective: Alchemical Free Energy Calculations for Drug Discovery. The Journal of Chemical Physics 2012, 137, 230901
Abel et al. 2017 Abel, R.; Wang, L.; Harder, E. D.; Berne, B. J.; Friesner, R. A. Advancing Drug Discovery through Enhanced Free Energy Calculations. Accounts of Chemical Research 2017, 50, 1625–1632
Schindler et al. 2020 Schindler, C. E. M.; Baumann, H.; Blum, A.; Böse, D.; Buchstaller, H.-P.; Burgdorf, L.; Cappel, D.; Chekler, E.; Czodrowski, P.; Dorsch, D.; Eguida, M. K. I.; Follows, B.; Fuch $\mathcal{B}$ , T.; Grädler, U.; Gunera, J.; Johnson, T.; Jorand Lebrun, C.; Karra, S.; Klein, M.; Knehans, T.; Koetzner, L.; Krier, M.; Leiendecker, M.; Leuthner, B.; Li, L.; Mochalkin, I.; Musil, D.; Neagu, C.; Rippmann, F.; Schiemann, K.; Schulz, R.; Steinbrecher, T.; Tanzer, E.-M.; Unzue Lopez, A.; Viacava Follis, A.; Wegener, A.; Kuhn, D. Large-Scale Assessment of Binding Free Energy Calculations in Active Drug Discovery Projects. Journal of Chemical Information and Modeling 2020, 60, 5457–5474
Song and Merz 2020 Song, L. F.; Merz, K. M. Evolution of Alchemical Free Energy Methods in Drug Discovery. Journal of Chemical Information and Modeling 2020, 60, 5308–5318
Cournia et al. 2020 Cournia, Z.; Allen, B. K.; Beuming, T.; Pearlman, D. A.; Radak, B. K.; Sherman, W. Rigorous Free Energy Simulations in Virtual Screening. Journal of Chemical Information and Modeling 2020, 60, 4153–4169
Mey et al. 2020 Mey, A. S. J. S.; Allen, B. K.; Macdonald, H. E. B.; Chodera, J. D.; Hahn, D. F.; Kuhn, M.; Michel, J.; Mobley, D. L.; Naden, L. N.; Prasad, S.; Rizzi, A.; Scheen, J.; Shirts, M. R.; Tresadern, G.; Xu, H. Best Practices for Alchemical Free Energy Calculations [Article v1.0]. Living Journal of Computational Molecular Science 2020, 2, 18378
Tsai et al. 2020 Tsai, H.-C.; Tao, Y.; Lee, T.-S.; Merz, K. M.; York, D. M. Validation of Free Energy Methods in AMBER. Journal of Chemical Information and Modeling 2020, 60, 5296–5300
Lee et al. 2020 Lee, T.-S.; Allen, B. K.; Giese, T. J.; Guo, Z.; Li, P.; Lin, C.; McGee, T. D.; Pearlman, D. A.; Radak, B. K.; Tao, Y.; Tsai, H.-C.; Xu, H.; Sherman, W.; York, D. M. Alchemical Binding Free Energy Calculations in AMBER20: Advances and Best Practices for Drug Discovery. Journal of Chemical Information and Modeling 2020, 60, 5595–5623
He et al. 2020 He, X.; Liu, S.; Lee, T.-S.; Ji, B.; Man, V. H.; York, D. M.; Wang, J. Fast, Accurate, and Reliable Protocols for Routine Calculations of Protein–Ligand Binding Affinities in Drug Design Projects Using AMBER GPU-TI with ff14SB/GAFF. ACS Omega 2020, 5, 4611–4619
Wang et al. 2015 Wang, L.; Wu, Y.; Deng, Y.; Kim, B.; Pierce, L.; Krilov, G.; Lupyan, D.; Robinson, S.; Dahlgren, M. K.; Greenwood, J.; Romero, D. L.; Masse, C.; Knight, J. L.; Steinbrecher, T.; Beuming, T.; Damm, W.; Harder, E.; Sherman, W.; Brewer, M.; Wester, R.; Murcko, M.; Frye, L.; Farid, R.; Lin, T.; Mobley, D. L.; Jorgensen, W. L.; Berne, B. J.; Friesner, R. A.; Abel, R. Accurate and Reliable Prediction of Relative Ligand Binding Potency in Prospective Drug Discovery by Way of a Modern Free-Energy Calculation Protocol and Force Field. Journal of the American Chemical Society 2015, 137, 2695–2703
Shih et al. 2020 Shih, A. Y.; Hack, M.; Mirzadegan, T. Impact of Protein Preparation on Resulting Accuracy of FEP Calculations. Journal of Chemical Information and Modeling 2020, 60, 5287–5289
Chatterjee et al. 2017 Chatterjee, P.; Botello-Smith, W. M.; Zhang, H.; Qian, L.; Alsamarah, A.; Kent, D.; Lacroix, J. J.; Baudry, M.; Luo, Y. Can Relative Binding Free Energy Predict Selectivity of Reversible Covalent Inhibitors? Journal of the American Chemical Society 2017, 139, 17945–17952, PMID: 29124934
Roos et al. 2019 Roos, K.; Wu, C.; Damm, W.; Reboul, M.; Stevenson, J. M.; Lu, C.; Dahlgren, M. K.; Mondal, S.; Chen, W.; Wang, L.; Abel, R.; Friesner, R. A.; Harder, E. D. OPLS3e: Extending Force Field Coverage for Drug-Like Small Molecules. Journal of Chemical Theory and Computation 2019, 15, 1863–1874
Lu et al. 2021 Lu, C.; Wu, C.; Ghoreishi, D.; Chen, W.; Wang, L.; Damm, W.; Ross, G. A.; Dahlgren, M. K.; Russell, E.; Von Bargen, C. D.; Abel, R.; Friesner, R. A.; Harder, E. D. OPLS4: Improving Force Field Accuracy on Challenging Regimes of Chemical Space. Journal of Chemical Theory and Computation 2021,
Vanommeslaeghe et al. 2010 Vanommeslaeghe, K.; Hatcher, E.; Acharya, C.; Kundu, S.; Zhong, S.; Shim, J.; Darian, E.; Guvench, O.; Lopes, P.; Vorobyov, I.; Mackerell Jr., A. D. CHARMM General Force Field: A Force Field for Drug-Like Molecules Compatible with the CHARMM All-Atom Additive Biological Force Fields. Journal of Computational Chemistry 2010, 31, 671–690
Wang et al. 2004 Wang, J.; Wolf, R. M.; Caldwell, J. W.; Kollman, P. A.; Case, D. A. Development and Testing of a General Amber Force Field. Journal of Computational Chemistry 2004, 25, 1157–1174
Halgren 1996 Halgren, T. A. Merck Molecular Force Field. I. Basis, Form, Scope, Parameterization, and Performance of MMFF94. Journal of Computational Chemistry 1996, 17, 490–519
Horta et al. 2016 Horta, B. A. C.; Merz, P. T.; Fuchs, P. F. J.; Dolenc, J.; Riniker, S.; Hünenberger, P. H. A GROMOS-Compatible Force Field for Small Organic Molecules in the Condensed Phase: The 2016H66 Parameter Set. Journal of Chemical Theory and Computation 2016, 12, 3825–3850
Christ and Fox 2014 Christ, C. D.; Fox, T. Accuracy Assessment and Automation of Free Energy Calculations for Drug Design. Journal of Chemical Information and Modeling 2014, 54, 108–120
Loeffler et al. 2015 Loeffler, H. H.; Michel, J.; Woods, C. FESetup: Automating Setup for Alchemical Free Energy Simulations. Journal of Chemical Information and Modeling 2015, 55, 2485–2490
Fu et al. 2018 Fu, H.; Gumbart, J. C.; Chen, H.; Shao, X.; Cai, W.; Chipot, C. BFEE: A User-Friendly Graphical Interface Facilitating Absolute Binding Free-Energy Calculations. Journal of Chemical Information and Modeling 2018, 58, 556–560
Lundborg and Lindahl 2015 Lundborg, M.; Lindahl, E. Automatic GROMACS Topology Generation and Comparisons of Force Fields for Solvation Free Energy Calculations. The Journal of Physical Chemistry B 2015, 119, 810–823
Klimovich and Mobley 2015 Klimovich, P. V.; Mobley, D. L. A Python Tool to Set Up Relative Free Energy Calculations in GROMACS. Journal of Computer-Aided Molecular Design 2015, 29, 1007–1014
Gapsys et al. 2015 Gapsys, V.; Michielssens, S.; Seeliger, D.; de Groot, B. L. Pmx: Automated Protein Structure and Topology Generation for Alchemical Perturbations. Journal of Computational Chemistry 2015, 36, 348–354
Senapathi et al. 2020 Senapathi, T.; Suruzhon, M.; Barnett, C. B.; Essex, J.; Naidoo, K. J. BRIDGE: An Open Platform for Reproducible High-Throughput Free Energy Simulations. Journal of Chemical Information and Modeling 2020, 60, 5290–5295
Boresch et al. 2003 Boresch, S.; Tettinger, F.; Leitgeb, M.; Karplus, M. Absolute Binding Free Energies: A Quantitative Approach for Their Calculation. The Journal of Physical Chemistry B 2003, 107, 9535–9551
Mobley et al. 2007 Mobley, D. L.; Graves, A. P.; Chodera, J. D.; McReynolds, A. C.; Shoichet, B. K.; Dill, K. A. Predicting Absolute Ligand Binding Free Energies to A Simple Model Site. Journal of Molecular Biology 2007, 371, 1118–1134
Woo and Roux 2005 Woo, H.-J.; Roux, B. Calculation of Absolute Protein–Ligand Binding Free Energy from Computer Simulations. Proceedings of the National Academy of Sciences of the United States of America 2005, 102, 6825–30
Aldeghi et al. 2016 Aldeghi, M.; Heifetz, A.; Bodkin, M. J.; Knapp, S.; Biggin, P. C. Accurate Calculation of the Absolute Free Energy of Binding for Drug Molecules. Chemical Science 2016, 7, 207–218
Reddy et al. 1994 Reddy, M. R.; Varney, M. D.; Kalish, V.; Viswanadhan, V. N.; Appelt, K. Calculation of Relative Differences in the Binding Free Energies of HIV1 Protease Inhibitors: A Thermodynamic Cycle Perturbation Approach. Journal of Medicinal Chemistry 1994, 37, 1145–1152
Reddy and Erion 2001 Reddy, M. R.; Erion, M. D. Calculation of Relative Binding Free Energy Differences for Fructose 1,6-Bisphosphatase Inhibitors Using the Thermodynamic Cycle Perturbation Approach. Journal of the American Chemical Society 2001, 123, 6246–6252
Cournia et al. 2017 Cournia, Z.; Allen, B.; Sherman, W. Relative Binding Free Energy Calculations in Drug Discovery: Recent Advances and Practical Considerations. Journal of Chemical Information and Modeling 2017, 57, 2911–2937
Xu 2019 Xu, H. Optimal Measurement Network of Pairwise Differences. Journal of Chemical Information and Modeling 2019, 59, 4720–4728
Shirts and Chodera 2008 Shirts, M. R.; Chodera, J. D. Statistically Optimal Analysis of Samples from Multiple Equilibrium States. The Journal of Chemical Physics 2008, 129, 124105
Shenfeld et al. 2009 Shenfeld, D. K.; Xu, H.; Eastwood, M. P.; Dror, R. O.; Shaw, D. E. Minimizing Thermodynamic Length to Select Intermediate States for Free-Energy Calculations and Replica-Exchange Simulations. Physical Review E 2009, 80, 046705
Yang et al. 2020 Yang, Q.; Burchett, W.; Steeno, G. S.; Liu, S.; Yang, M.; Mobley, D. L.; Hou, X. Optimal Designs for Pairwise calculation: An Application to Free Energy Perturbation in Minimizing Prediction Variability. Journal of Computational Chemistry 2020, 41, 247–257
Liang et al. 2013 Liang, J.; Tsui, V.; Van Abbema, A.; Bao, L.; Barrett, K.; Beresini, M.; Berezhkovskiy, L.; Blair, W. S.; Chang, C.; Driscoll, J.; Eigenbrot, C.; Ghilardi, N.; Gibbons, P.; Halladay, J.; Johnson, A.; Kohli, P. B.; Lai, Y.; Liimatta, M.; Mantik, P.; Menghrajani, K.; Murray, J.; Sambrone, A.; Xiao, Y.; Shia, S.; Shin, Y.; Smith, J.; Sohn, S.; Stanley, M.; Ultsch, M.; Zhang, B.; Wu, L. C.; Magnuson, S. Lead Optimization of a 4-Aminopyridine Benzamide Scaffold To Identify Potent, Selective, and Orally Bioavailable TYK2 Inhibitors. Journal of Medicinal Chemistry 2013, 56, 4521–4536
Scheuermann et al. 2015 Scheuermann, T. H.; Stroud, D.; Sleet, C. E.; Bayeh, L.; Shokri, C.; Wang, H.; Caldwell, C. G.; Longgood, J.; MacMillan, J. B.; Bruick, R. K.; Gardner, K. H.; Tambar, U. K. Isoform-Selective and Stereoselective Inhibition of Hypoxia Inducible Factor-2. Journal of Medicinal Chemistry 2015, 58, 5930–5941
Liu et al. 2013 Liu, S.; Wu, Y.; Lin, T.; Abel, R.; Redmann, J. P.; Summa, C. M.; Jaber, V. R.; Lim, N. M.; Mobley, D. L. Lead Optimization Mapper: Automating Free Energy Calculations for Lead Optimization. Journal of Computer-Aided Molecular Design 2013, 27, 755–770
Stephen Boyd 2004 Stephen Boyd, L. V. Convex Optimization, Chapter 7; Cambridge University Press: Cambridge, 2004
42 Xu, H. DiffNet. https://github.com/forcefield/DiffNet
Hopkins et al. 2015 Hopkins, C. W.; Le Grand, S.; Walker, R. C.; Roitberg, A. E. Long-Time-Step Molecular Dynamics through Hydrogen Mass Repartitioning. Journal of Chemical Theory and Computation 2015, 11, 1864–1874
Ryckaert et al. 1977 Ryckaert, J.-P.; Ciccotti, G.; Berendsen, H. J. Numerical integration of the cartesian equations of motion of a system with constraints: molecular dynamics of n-alkanes. Journal of Computational Physics 1977, 23, 327–341
Miyamoto and Kollman 1992 Miyamoto, S.; Kollman, P. A. Settle: An analytical version of the SHAKE and RATTLE algorithm for rigid water models. Journal of Computational Chemistry 1992, 13, 952–962
Grønbech-Jensen and Farago 2013 Grønbech-Jensen, N.; Farago, O. A Simple and Effective Verlet-Type Algorithm for Simulating Langevin Dynamics. Molecular Physics 2013, 111, 983–991
Åqvist et al. 2004 Åqvist, J.; Wennerström, P.; Nervall, M.; Bjelic, S.; Brandsdal, B. O. Molecular Dynamics Simulations of Water and Biomolecules with a Monte Carlo Constant Pressure Algorithm. Chemical Physics Letters 2004, 384, 288–294
Maier et al. 2015 Maier, J. A.; Martinez, C.; Kasavajhala, K.; Wickstrom, L.; Hauser, K. E.; Simmerling, C. ff14SB: Improving the Accuracy of Protein Side Chain and Backbone Parameters from ff99SB. Journal of Chemical Theory and Computation 2015, 11, 3696–3713
Case et al. 2020 Case, D. A.; Belfon, K.; Ben-Shalom, I. Y.; Brozell, S. R.; Cerutti, D. S.; Cheatham III, T. E.; Cruzeiro, V. W. D.; Darden, T. A.; Duke, R. E.; Giambasu, G.; Gilson, M. K.; Gohlke, H.; Goetz, A. W.; Harris, R.; Izadi, S.; Izmailov, S. A.; Kasavajhala, K.; Kovalenko, K.; Krasny, R.; Kurtzman, T.; Lee, T.; Le-Grand, S.; Li, P.; Lin, C.; Liu, J.; Luchko, T.; Luo, R.; Man, V.; Merz, K.; Miao, Y.; Mikhailovskii, O.; Monard, G.; ; Nguyen, H.; Onufriev, A.; Pan, F.; Pantano, S.; Qi, R.; Roe, D. R.; Roitberg, A.; Sagui, C.; Schott-Verdugo, S.; Shen, J.; Simmerling, C. L.; Skrynnikov, N.; Smith, J.; Swails, J.; Walker, R. C.; Wang, J.; Wilson, R. M.; Wolf, R. M.; Wu, X.; Xiong, Y.; Xue, Y.; York, D. M.; Kollman, P. A. AMBER 20. 2020; https://ambermd.org/
Hahn et al. 2021 Hahn, D. F.; Bayly, C. I.; Macdonald, H. E. B.; Chodera, J. D.; Mey, A. S. J. S.; Mobley, D. L.; Benito, L. P.; Schindler, C. E. M.; Tresadern, G.; Warren, G. L. Best Practices for Constructing, Preparing, and Evaluating Protein-Ligand Binding Affinity Benchmarks. 2021; https://arxiv.org/abs/2105.06222