This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Precise binding free energy calculations for multiple molecules using an optimal measurement network of pairwise differences

Pengfei Li Silicon Therapeutics, Suzhou, Jiangsu 215000, China    Zhijie Li Silicon Therapeutics, Suzhou, Jiangsu 215000, China    Yu Wang Silicon Therapeutics, Suzhou, Jiangsu 215000, China    Huaixia Dou Silicon Therapeutics, Suzhou, Jiangsu 215000, China    Brian K. Radak Roivant Sciences, Boston, Massachusetts 02210, United States    Woody Sherman Roivant Sciences, Boston, Massachusetts 02210, United States    Huafeng Xu [email protected] Roivant Sciences, New York, New York 10036, United States
Abstract

Alchemical binding free energy (BFE) calculations offer an efficient and thermodynamically rigorous approach to in silico binding affinity predictions. As a result of decades of methodological improvements and recent advances in computer technology, alchemical BFE calculations are now widely used in drug discovery research. They help guide the prioritization of candidate drug molecules by predicting their binding affinities for a biomolecular target of interest (and potentially selectivity against undesirable anti-targets). Statistical variance associated with such calculations, however, may undermine the reliability of their predictions, introducing uncertainty both in ranking candidate molecules and in benchmarking their predictive accuracy. Here, we present a computational method that substantially improves the statistical precision in BFE calculations for a set of ligands binding to a common receptor by dynamically allocating computational resources to different BFE calculations according to an optimality objective established in a previous work from our group and extended in this work. Our method, termed Network Binding Free Energy (NetBFE), performs adaptive binding free energy calculations in iterations, re-optimizing the allocations in each iteration based on the statistical variances estimated from previous iterations. Using examples of NetBFE calculations for protein-binding of congeneric ligand series, we demonstrate that NetBFE approaches the optimal allocation in a small number (5\leq 5) of iterations and that NetBFE reduces the statistical variance in the binding free energy estimates by approximately a factor of two when compared to a previously published and widely used allocation method at the same total computational cost.

1 Introduction

In drug discovery research, candidate drug molecules are increasingly ranked and selected by their computationally predicted affinities for the biomolecular target of interest before experimental synthesis and testing 1, 2, 3, 4, 5, 6. This allows for the exploration of many more molecular design ideas than is practical with experimental approaches alone. Alchemical binding free energy (BFE) calculations have become a valuable tool in predicting the protein-ligand binding affinities and their utility continues to increase thanks to advances in the methodologies 7, 8, 9, 10, 11, 12, 4, 13 force field 14, 15, 16, 17, 18, 19, computational power afforded by the graphical prcessing units (GPUs) 11, 9, 8, and the availability of simulation software and simplified workflows 11, 9, 4, 20, 21, 22, 23, 24, 25, 26.

BFE calculations can either compute the binding free energy of an individual molecule, by a technique commonly referred to as "absolute" binding free energy (ABFE) calculations 27, 28, 29, 30, or compute the difference between the binding free energies of two molecules, by "relative" binding free energy (RBFE) calculations 31, 32, 11, 33. The binding free energies of a set of molecules can be determined by a combination of ABFE calculations of select molecules and RBFE calculations of select pairs of molecules 34.

Statistical errors in BFE calculations confound decision-making in candidate molecule selection and the assessment of their predictive accuracy. There have been significant methodological progress in minimizing statistical errors in single BFE calculations 35, 36, but works have only recently been published addressing the question of how to reduce the collective statistical errors for BFE calculations of a set of molecules 34, 37, which represents the majority of applications of BFE calculations in drug discovery.

As suggested previously, at a fixed total computational cost, the overall statistical errors in BFE calculations for multiple molecules can be minimized by optimally allocating the computational resources to different RBFE and ABFE calculations. In particular, minimizing the total variance in the estimated binding free energies with respect to the allocation corresponds to the A-optimal in experimental design 34. Such optimal allocations, however, require knowledge of the statistical fluctuations in individual BFE calculations, which are unknown a priori. Because of this reason, A-optimal BFE calculations have not yet been reported for realistic examples.

In this work, we develop a method based on the A-optimal design to improve the statistical precision of BFE calculations. Our method, Network Binding Free Energy (NetBFE), performs adaptive BFE calculations in multiple iterations. In each iteration, NetBFE estimates the statistical fluctuations in different BFE calculations based on the results from previous iterations, and it reallocates the computational resources accordingly. This allows the A-optimal allocation to be used in realistic BFE calculations. We also extend the previous theoretical work by deriving new equations for the A-optimal design and the covariance matrix when only RBFE calculations are included (the previous theory required the inclusion of at least one ABFE calculation), a common scenario in the lead optimization stage of drug discovery.

Here, we apply NetBFE to calculating the binding free energies of 16 ligands against the TYK2 protein 11, 38 and 8 ligands against the HIF-2α\alpha protein 39. We compare the statistical errors in the estimated binding free energies from NetBFE calculations and those in the BFE calculations using RBFE pairs selected by the LOMAP algorithm developed by the Mobley lab 40, which is widely used in such calculations. At the same total computational cost, NetBFE reduces the statistical variance by about half compared to LOMAP. Our results suggest that NetBFE can substantially improve the statistical precision of BFE calculations in drug discovery, which should accelerate the computational selection of candidate drug molecules and thereby increase the efficiency of the design-make-test cycle.

2 Methods

2.1 A-optimal design of network binding free energy calculations

The A-optimal design of measurement network of pairwise differences was described previously in detail 34, and is briefly explained here. The binding free energy values for multiple molecules in the same binding site of a receptor are the quantities of interest in BFE calculations. Let xi=1,2,,mx_{i=1,2,\ldots,m} be the binding free energy of molecule ii; here we use xix_{i} instead of ΔGi\Delta G_{i} to make the notations in the mathematical presentation more compact, and to follow the same notation in Reference 34. ABFE computes the individual xix_{i} (i.e. ΔGi\Delta G_{i}) and RBFE computes the difference between any pair of xix_{i} and xjx_{j}: xij=xixjx_{ij}=x_{i}-x_{j} (i.e. ΔΔGij=ΔGiΔGj\Delta\Delta G_{ij}=\Delta G_{i}-\Delta G_{j}). These ABFE and RBFE calculations form our difference network, which can be represented by a graph 𝒢\mathcal{G} of m+1m+1 vertices and m(m+1)/2m(m+1)/2 edges, where the vertices i=1,2,,mi=1,2,\ldots,m stand for the mm quantities {xi}\left\{x_{i}\right\}, the edge between vertices ij>0i\neq j>0 stands for the difference measurement x^ij\hat{x}_{ij} (i.e. the RBFE calculations), and the edge between vertices 0 and i>0i>0 stands for the individual measurement x^i\hat{x}_{i} (i.e. the ABFE calculations). For each BFE calculation, its asymptotic statistical variance σe2\sigma_{e}^{2}, for e{i|i=1,2,,m}{(i,j)i,j=1,2,..,m,ij}e\in\{i|i=1,2,\ldots,m\}\cup\{(i,j)\mid i,j=1,2,..,m,i\neq j\}, decreases with nen_{e}—the resource allocated to this calculation—as

σe2=se2/ne\sigma_{e}^{2}=s_{e}^{2}/n_{e} (1)

where ses_{e} is the statistical fluctuation in the corresponding BFE calculation.

Some of the binding free energy values, say {xaaQ}\left\{x_{a}\mid a\in Q\right\}, may have been measured experimentally. These experimental values and their concomitant statistical errors in the experiment, {x~a±δaaQ}\left\{\tilde{x}_{a}\pm\delta_{a}\mid a\in Q\right\}, can be included in the estimate of the binding free energies from BFE calculations. If the true binding free energies are {xi}\{x_{i}\}, the likelihood that our BFE calculations produce results of {x^e±σe}\{\hat{x}_{e}\pm\sigma_{e}\} and the experimental measurements produce results of {x~a±δaaQ}\{\tilde{x}_{a}\pm\delta_{a}\mid a\in Q\} is

L\displaystyle L =\displaystyle= i(2πσi)1exp((xix^i)22σi2)\displaystyle\prod_{i}(\sqrt{2\pi\sigma_{i}})^{-1}\exp\left(-\frac{(x_{i}-\hat{x}_{i})^{2}}{2\sigma_{i}^{2}}\right) (2)
\displaystyle\cdot aQ(2πδa)1exp((xax~a)22δa2)\displaystyle\prod_{a\in Q}(\sqrt{2\pi\delta_{a}})^{-1}\exp\left(-\frac{(x_{a}-\tilde{x}_{a})^{2}}{2\delta_{a}^{2}}\right)
\displaystyle\cdot i<j(2πσij)1exp((xixjx^ij)22σij2)\displaystyle\prod_{i<j}(\sqrt{2\pi\sigma_{ij}})^{-1}\exp\left(-\frac{(x_{i}-x_{j}-\hat{x}_{ij})^{2}}{2\sigma_{ij}^{2}}\right)

Maximizing the log-likelihood lnL\ln L with respect to {xi}\{x_{i}\} yields the maximum likelihood estimator for {xi}\left\{x_{i}\right\}:

𝐅x=z\mathbf{F}\cdot\vec{x}=\vec{z} (3)

where 𝐅\mathbf{F} is the Fisher information matrix with the elements

Fij={δi2+σi2+kiσik2if i=j and iQσi2+kiσik2if i=j and iQσij2if ijF_{ij}=\left\{\begin{array}[]{cl}\delta_{i}^{-2}+\sigma_{i}^{-2}+\sum_{k\neq i}\sigma_{ik}^{-2}&\text{if }i=j\text{ and }i\in Q\\ \sigma_{i}^{-2}+\sum_{k\neq i}\sigma_{ik}^{-2}&\text{if }i=j\text{ and }i\notin Q\\ -\sigma_{ij}^{-2}&\text{if }i\neq j\end{array}\right. (4)

and

zi={δi2x~i+σi2x^i+jiσij2x^ij if iQσi2x^i+jiσij2x^ijif iQ.z_{i}=\left\{\begin{array}[]{cl}\delta_{i}^{-2}\tilde{x}_{i}+\sigma_{i}^{-2}\hat{x}_{i}+\sum_{j\neq i}\sigma_{ij}^{-2}\hat{x}_{ij}&\text{ if }i\in Q\\ \sigma_{i}^{-2}\hat{x}_{i}+\sum_{j\neq i}\sigma_{ij}^{-2}\hat{x}_{ij}&\text{if }i\notin Q.\end{array}\right. (5)

The covariance in the estimates of {xi}\left\{x_{i}\right\} is given by the inverse of the Fisher information matrix:

𝐂=𝐅1.\mathbf{C}=\mathbf{F}^{-1}. (6)

The A-optimal minimizes the total variance, tr(𝐂\mathbf{C}), subject to the constraints of non-negativity

ne0,n_{e}\geq 0, (7)

and of the total fixed computational cost

ene=ini+i<jnij=N.\sum_{e}n_{e}=\sum_{i}n_{i}+\sum_{i<j}n_{ij}=N. (8)

This minimization can be solved by cone programming41, and it is implemented in the publicly available software DiffNet. 42

2.2 A-optimal for a network with only relative binding free energy calculations

Sometimes we may choose to perform only relative binding free energy calculations and compute only x^ij\hat{x}_{ij}. In this case the free energies can only be determined up to an arbitrary constant x¯\bar{x}, which can be chosen to minimize the root mean square deviation from the known experimental values.

The log-likelihood in this case is given by

lnL=i<j(xixjx^ij)22σij2\ln L=-\sum_{i<j}\frac{(x_{i}-x_{j}-\hat{x}_{ij})^{2}}{2\sigma_{ij}^{2}} (9)

The corresponding Fisher information matrix

Fij={jiσij2 if i=jσij2 if ijF_{ij}=\left\{\begin{array}[]{cl}\sum_{j\neq i}\sigma_{ij}^{-2}&\text{ if }i=j\\ -\sigma_{ij}^{-2}&\text{ if }i\neq j\end{array}\right. (10)

is singular. Consequently, the covariance matrix 𝐂=𝐅1\mathbf{C}=\mathbf{F}^{-1} cannot be defined.

This problem can be solved by introducing an additional term to the log-likelihood to restrain the mean m1ixim^{-1}\sum_{i}x_{i} to a constant x¯\bar{x}:

lnL(w)=lnL21ω(m1ixix¯)2\ln L^{\ast}(w)=\ln L-2^{-1}\omega\left(m^{-1}\sum_{i}x_{i}-\bar{x}\right)^{2} (11)

which is equivalent to specifying that the mean of {xi}\{x_{i}\} is normally distributed around x¯\bar{x} with the standard deviation of 1/ω1/\sqrt{\omega}. The corresponding Fisher information matrix is

𝐅(w)=𝐅+ωm2𝟏\mathbf{F}^{\ast}(w)=\mathbf{F}+\omega m^{-2}\mathbf{1} (12)

where 𝟏\mathbf{1} is a m×mm\times m matrix of elements 1. We substitute this augmented Fisher matrix 𝐅(ω)\mathbf{F}^{\ast}(\omega) for 𝐅\mathbf{F} in the semi-definite programming for A-optimal when there are no absolute binding free energy calculations. It can be demonstrated that the optimal allocation does not depend on the value of ω>0\omega>0.

The corresponding covariance matrix is

𝐂(ω)=(𝐅(ω))1\mathbf{C}^{\ast}(\omega)=(\mathbf{F}^{\ast}(\omega))^{-1} (13)

We are interested in 𝐂=limω𝐂(ω)\mathbf{C}^{\ast}=\lim_{\omega\rightarrow\infty}\mathbf{C}^{\ast}(\omega), such that the mean of {xi}\{x_{i}\} is constrained to the constant x¯\bar{x}. To derive 𝐂\mathbf{C}^{\ast}, consider the expansion of 𝐂(ω)\mathbf{C}^{\ast}(\omega) in ω1\omega^{-1}:

𝐂(ω)=𝐂+ω1𝐂1+o(ω2).\mathbf{C}^{\ast}(\omega)=\mathbf{C}^{\ast}+\omega^{-1}\mathbf{C}_{1}+o(\omega^{-2}). (14)

We have

𝐈\displaystyle\mathbf{I} =\displaystyle= (𝐅+ωm2𝟏)(𝐂+ω1𝐂1+o(ω2))\displaystyle(\mathbf{F}+\omega m^{-2}\mathbf{1})(\mathbf{C}^{\ast}+\omega^{-1}\mathbf{C}_{1}+o(\omega^{-2})) (15)
=\displaystyle= 𝐅𝐂+m2𝟏𝐂1+ωm2𝟏𝐂+ω1𝐅𝐂1+o(ω2).\displaystyle\mathbf{F}\mathbf{C}^{\ast}+m^{-2}\mathbf{1}\mathbf{C}_{1}+\omega m^{-2}\mathbf{1}\mathbf{C}+\omega^{-1}\mathbf{F}\mathbf{C}_{1}+o(\omega^{-2}).

Comparing the terms grouped by different orders of ω\omega, we have

1t𝐂\displaystyle\vec{1}^{t}\mathbf{C}^{\ast} =\displaystyle= 0\displaystyle\vec{0}
𝐅𝐂+1m2C1t\displaystyle\mathbf{F}\mathbf{C}^{\ast}+\vec{1}m^{-2}\vec{C}_{1}^{t} =\displaystyle= 𝐈,\displaystyle\mathbf{I}, (16)

where C1t\vec{C}_{1}^{t} is a vector whose elements are the column sums of 𝐂1\mathbf{C}_{1}. This can be written in the matrix form

(1t0𝐅1)(𝐂m2C1t)=(0t𝐈),\left(\begin{array}[]{cc}\vec{1}^{t}&0\\ \mathbf{F}&\vec{1}\end{array}\right)\left(\begin{array}[]{c}\mathbf{C}^{\ast}\\ m^{-2}\vec{C}_{1}^{t}\end{array}\right)=\left(\begin{array}[]{c}\vec{0}^{t}\\ \mathbf{I}\end{array}\right), (17)

the solution of which gives a well-defined covariance matrix 𝐂\mathbf{C}^{\ast}. Incidentally, C1t=m1t\vec{C}_{1}^{t}=m\vec{1}^{t}.

2.3 Iterative network binding free energy calculations

The statistical fluctuations {se}\{s_{e}\} depend on both the thermodynamic length between the two end states in an alchemical calculation and the ratio of the relaxation time of relevant motions to the length of the simulation.36, 7 In practice, they have to be estimated (approximately) from the completed BFE calculations. In order to obtain {se}\{s_{e}\} values for allocating the computational resources according to the A-optimal, NetBFE calculations proceed in iterations, with each iteration consuming a predetermined amount, ΔN(i)\Delta N^{(i)}, of total sampling. After each iteration ii, estimates of {se(i+1)}\{s_{e}^{(i+1)}\} are updated as described in Section. 2.6 below, and a new A-optimal allocation {ne(i+1)}\{n_{e}^{(i+1)}\} is determined from the updated {se(i+1)}\{s_{e}^{(i+1)}\} and used to extend the BFE calculations for iteration i+1i+1. The iterations continue until a specified total simulation samples, N=iΔN(i)N=\sum_{i}\Delta N^{(i)}, or a specified target average statistical variance tr(𝐂\mathbf{C})/m is reached. Data accumulated from all the iterations are included to compute the binding free energy value for each BFE calculation.

In each NetBFE calculation reported in this work, a total of T0=m×200nsT_{0}=m\times 200\mathrm{ns} simulation time is used for a set of mm molecules. This is divided into I=5I=5 iterations and in each iteration Ti=T0/IT_{i}=T_{0}/I of simulation time is allocated.

2.4 A-optimal on sparse 2-connected sub-networks

In order to reduce the number of BFE calculations in each iteration, we select a 2-connected subgraph out of the full graph 𝒢\mathcal{G} in each iteration, and obtain the A-optimal on this sub-network. It was demonstrated previously that the 2-connected sub-network can generate near-optimal allocations.34 The allocation in each iteration thus proceeds in two steps in our NetBFE implementation. First, we determine the A-optimal allocation of the computational resources for the next iteration on the full graph 𝒢\mathcal{G}. The edges with the largest allocations are selected to form the 2-connected subgraph. Here, the edge selection can be based on either the total cumulative allocation including all previous iterations or only the allocation for this iteration. We have implemented both selection methods, and we refer to NetBFE calculations with the former method as NetBFE(NN) and those with the latter method as NetBFE(ΔN\Delta N). Second, A-optimal allocation is obtained including only the BFE calculations in the selected 2-connected subgraph.

2.5 Estimates of the statistical variances in BFE calculations

The statistical variance σij2\sigma_{ij}^{2} in the computed ΔΔGij\Delta\Delta G_{ij} (by RBFE for iji\neq j and by ABFE for i=ji=j) is estimated by block bootstrapping, using all the data up to the current iteration. The sampled data points in free energy simulations are divided into B=10B=10 non-overlapping blocks, each block containing samples from a simulation time interval. The data are resampled by randomly drawing BB blocks with replacement. A ΔΔGij\Delta\Delta G_{ij} value is computed from each set of resampled data, and the corresponding variance σij2\sigma_{ij}^{2} is computed from K=50K=50 repeats of the resampling.

2.6 Estimates of the statistical fluctuations in BFE calculations

The statistical fluctuations sijs_{ij} associated with the calculation of ΔΔGij\Delta\Delta G_{ij} can be estimated from its statistical variance σij2\sigma_{ij}^{2} and its given computational cost nijn_{ij} by inverting Eq. 1:

sij2=σij2nijs_{ij}^{2}=\sigma_{ij}^{2}n_{ij} (18)

We use a simple model to predict sijs_{ij} of BFE calculations that have not yet been performed (thus σij\sigma_{ij} is unavailable):

sij=w0+w1max(hij,hji)1/2+w2max(Hij,Hji)1/2s_{ij}=w_{0}+w_{1}\max(h_{ij},h_{ji})^{1/2}+w_{2}\max(H_{ij},H_{ji})^{1/2} (19)

where for iji\neq j, hijh_{ij} is the number of heavy atoms in molecule ii that do not map onto atoms in molecule jj and HijH_{ij} is the number of hydrogen atoms in molecule ii that do not map onto atoms in molecule j in the RBFE calculation; for i=ji=j, hiih_{ii} and HiiH_{ii} are the respective numbers of heavy and hydrogen atoms in molecule ii in the ABFE calculation.

Before the first iteration, the parameters are initialized to w0=1.0w_{0}=1.0, w1=1.0w_{1}=1.0 and w2=0.5w_{2}=0.5. In each iteration, the sijs_{ij} values for the performed RBFE calculations are updated by Eq. 18, and the parameters w0w_{0}, w1w_{1}, and w2w_{2} are updated by minimizing the root-mean-square-error (RMSE) between the values predicted from Eq. 19 and those computed from Eq. 18.

2.7 Maximum likelihood estimate of binding free energies

The individual (absolute) binding free energies can be derived from the pairwise (relative) binding free energy differences and their corresponding variances by using the maximum likelihood estimator (Eq. 3). In practical drug discovery projects, a few molecules usually have experimentally determined binding free energies—reference binding free energies–which can be included in the computational results to derive a maximum likelihood estimate of binding free energies (Eqs. 4 and 5).

2.8 Molecular dynamic simulations

In each individual RBFE/ABFE calculation, the protein-ligand complex was first equilibrated by a 5 ns simulation, where the snapshots were saved every 2500 MD steps and the total 2000 snapshots are randomly drawn as the starting structures for replicates of RBFE/ABFE calculations. Hydrogen mass repartition 43 in conjunction with SHAKE 44/SETTLE 45 for constraining the distances between hydrogen atoms and their covalently bonded heavy atoms was employed to enable the use of a large time step of 4 fs in all simulations. The potential energies at all λ\lambda values were computed and saved every 5 ps. Each alchemical simulation consisted of both the "complex" stage, which simulated the protein-ligand complex, and the "solvated" stage, which simulated the ligand in aqueous solution. The temperature was kept at 298.15 K using Langevin dynamics 46 with the collision frequency 2.0 ps-1, and the pressure was kept at 1.01325 bar with Monte Carlo barostat 47 and a pressure relaxation time of 1.0 ps. The proteins were modeled by AMEBR ff14SB force field 48, and the small molecules were parameterized using our in-house force field generation program; the force field parameters and starting protein-ligand complex structures are provided as AMBER20 input files in the Supplementary Materials. The entire system was solvated in a periodic box of TIP3P water molecules with all the solute atoms no less than 12 Å away from the boundary of the unit cell. In this work, the initial structures of TYK2 protein target with 16 ligands come from reference 11 and HIF-2α\alpha (PDB ID: 4XT2) protein target with 8 ligands are constructed according to reference39. All simulations were performed using the AMBER20 package.49

Each individual RBFE calculation consists of Λ=22\Lambda=22 simulations (decoupled protocol) or Λ=12\Lambda=12 simulations (concerted protocol) and each individual ABFE calculation consists of Λ=21\Lambda=21 simulations (decoupled protocol) or Λ=16\Lambda=16 simulations (concerted protocol) at different λ\lambda values. Decoupled and concerted protocols are defined in references 5 and 8. If a BFE calculation is allocated a total simulation time of t0t_{0}, each λ\lambda-valued simulation will run for a total of tλ=t0/Λt_{\lambda}=t_{0}/\Lambda simulation time. If tλ<0.1t_{\lambda}<0.1ns, this BFE calculation is omitted and its time t0t_{0} is allocated proportionally to other BFE calculations, otherwise the simulation time is divided into tλ/(2ns)\lceil t_{\lambda}/(2\mathrm{ns})\rceil replicates initialized with slightly different binding poses and random number seeds (thus no individual simulation exceeds 2 ns simulation time, which helps load balance when the simulations are executed in parallel). In each iteration, new replicates for each individual BFE calculation receiving new allocations are initiated; existing simulations are not extended. The allocation only includes the "complex" stage of each individual BFE calculation; if a BFE calculation is performed, its corresponding "solvated" stage is simulated for a fixed time of 2Λ2\Lambda ns, which incurs a negligible computational cost and is not included in the total allocated computational resource.

3 Results

We first illustrate the iterative allocation in NetBFE consisting of both ABFE and RBFE calculations, using the small example of eight inhibitors against the protein kinase TYK2. No experimental binding free energies were included in the allocation and analysis, i.e. δi1=0\delta_{i}^{-1}=0 in Eq. 4. The estimated fluctuations {se}\{s_{e}\} and the corresponding A-optimal allocation of {ne}\{n_{e}\} in each iteration are shown in Fig. 1A and Fig. 1B, respectively. The total variance tr(𝐂(n(i),s(i)))\mathrm{tr}(\mathbf{C}(n^{(i)},s^{(i)})) decreases with increasing iterations.

How well does this iterative procedure approach the true A-optimal allocation? If we assume that the estimate of {se(i=5)}\{s_{e}^{(i=5)}\} after the last iteration represents the true statistical fluctuations, tr(𝐂(n(5),s(5)))\mathrm{tr}(\mathbf{C}(n^{(5)},s^{(5)})) approximates the variance of the A-optimal allocation. We measure how far the allocation n(i)n^{(i)} in the ii’th iteration—normalized to n(i)ene(5)(ene(i))1n(i)n^{\prime(i)}\equiv\sum_{e}n_{e}^{(5)}(\sum_{e}n_{e}^{(i)})^{-1}n^{(i)} so that its total number of samples is the same as that in the last iteration—is from the this approximate A-optimal by comparing tr(𝐂(n(i),s(5)))\mathrm{tr}(\mathbf{C}(n^{\prime(i)},s^{(5)})) to tr(𝐂((n(5),s(5)))\mathrm{tr}(\mathbf{C}((n^{(5)},s^{(5)})). The former indeed approaches the latter (Fig. 1C) after only three iterations, suggesting that the iterative allocation is converging quickly to the true A-optimal.

Because ABFE converges much slower than RBFE (on average ABFE has a much higher statistical fluctuation than RBFE: {si}ABFE/{sij}RBFE\langle\{s_{i}\}\rangle_{\mathrm{ABFE}}/\langle\{s_{ij}\}\rangle_{\mathrm{RBFE}} \approx 6.2), most of the computations (approximately 81%\% of total resources) are allocated to ABFE in order to improve the precision of the absolute values of the computed binding free energies.

Refer to caption
Figure 1: The evolution of a NetBFE calculation of 8 inhibitors against TYK2. A. The estimated statistical fluctuations s={se}s=\{s_{e}\} in each iteration. The width of each edge is proportional to the corresponding ses_{e}. B. The allocation of n={ne}n=\{n_{e}\} in each iteration. The width of each edge is proportional to the corresponding nen_{e}. In (A) and (B), each filled circle represents an inhibitor, an edge between two filled circles represents a RBFE calculation between the two corresponding inhibitors, and an edge between the large empty circle and a filled circle represents an ABFE calculation for the corresponding inhibitor. C. The total variance tr(𝐂)\mathrm{tr}(\mathbf{C})\equiv tr(𝐂(n(i),s(i)))(\mathbf{C}(n^{(i)},s^{(i)})) according to the estimate of s(i)={se(i)}s^{(i)}=\{s_{e}^{(i)}\} and the allocation n(i)={ne(i)}n^{(i)}=\{n_{e}^{(i)}\} in each iteration ii. Also shown is expected total variance tr(𝐂(n(i),s(5)))(\mathbf{C}(n^{\prime(i)},s^{(5)})) resulting from the normalized allocation n(i)=ene(5)(ene(i))1n(i)n^{\prime(i)}=\sum_{e}n_{e}^{(5)}(\sum_{e}n_{e}^{(i)})^{-1}n^{(i)} in the iith iteration if {se(i=5)}\{s_{e}^{(i=5)}\} from the last iteration is the true statistical fluctuation; the convergence of tr(𝐂(n(i),s(5)))\mathrm{tr}(\mathbf{C}(n^{\prime(i)},s^{(5)})) to tr(𝐂(n(5),s(5)))\mathrm{tr}(\mathbf{C}(n^{(5)},s^{(5)})) demonstrates that NetBFE approaches the A-optimal allocation via the iterative updates.

Often it is unnecessary to predict the absolute values of the binding free energies, such as in the case when the values for some similar molecules have been experimentally determined, thus the values for the molecules of interest can be determined by predicting the differences between the molecules, or in the case of ranking the molecules by their binding affinities, such that only their relative differences will suffice. In such cases ABFE calculations may be omitted to avoid the most expensive part of the computations, as noted above. In the following, we consider NetBFE with only RBFE calculations.

We used NetBFE to compute the binding free energies of 16 inhibitors against TYK2 38, 11 and 8 inhibitors against HIF-2α\alpha 39. These molecules are depicted in Fig. 2 and listed in the Supplementary Materials. We first characterized the progression of statistical errors after each iteration of extending the total simulation time (Fig. 3). The standard deviations and the total variance by and large decreases with increasing total simulation time. Moreover, the distribution of the standard errors becomes narrower, indicating that most of the free energy results have converged to similar precision.

Refer to caption
Figure 2: 2D structures for the inhibitors studied in this work. A. 16 inhibitors against TYK2; B. 8 inhibitors against HIF-2α\alpha.

In the case of HIF-2α\alpha inhibitors, the agreement between the experimental binding free energies and their predicted values improves at each iteration (Fig. 3), indicating that the improvement in precision is accompanied by the improvement in accuracy; for TYK2 inhibitors, the accuracy is quite good (RMSE < 1 kcal/mol) in all 5 iterations. Of course, accuracy in the binding free energy calculations depends on other factors such as the force field parametrization and the consistency between the computational setup and the experimental conditions. It remains unknown whether in other protein-ligand combinations or in other force fields the accuracy of the predicted binding free energies improves with the statistical precision.

Refer to caption
Figure 3: The progression of precision and accuracy in NetBFE iterations for A. 16 inhibitors against TYK2 and B. 8 inhibitors against HIF-2α\alpha. The upper panels show the total variance tr(𝐂)(\mathbf{C}) (red) and the root mean square error (RMSE) between NetBFE predictions and experimental values of the individual binding free energies (blue). The lower panels show the distributions of the standard deviations in the predicted binding free energies, computed for each inhibitor as the square root of the corresponding diagonal element in 𝐂\mathbf{C}; the dashed and solid lines represent the average and median values, respectively, of the distributions.

In each iteration of the NetBFE calculation, the allocations {ne}\{n_{e}\} are optimized according to the estimate of statistical fluctuations {se}\{s_{e}\} based on the results of the previous iteration, thus both {se}\{s_{e}\} and {ne}\{n_{e}\} change from one iteration to the next. We measure the difference between two allocations of {ne}\{n_{e}\} and {ne}\{n^{\ast}_{e}\} by the Kullback-Leibler (KL) divergence:

DKL[ne||ne]=efelnfefeD_{\mathrm{KL}}[n_{e}||n^{\ast}_{e}]=\sum_{e}f_{e}\ln\frac{f_{e}}{f^{\ast}_{e}} (20)

where fene/enef_{e}\equiv n_{e}/\sum_{e^{\prime}}n_{e^{\prime}} and similarly fef^{\ast}_{e} are the fractional allocations. In the two cases above, {ne}\{n_{e}\} approach their limiting values with the progression of iterations (Fig. 4): the KL-divergence between {ne(i)}\{n_{e}^{(i)}\} and {ne(opt)}\{n_{e}^{(\mathrm{opt})}\} decreases toward 0.69 for TYK2 and 0.28 for HIF-2α\alpha with each iteration, where {ne(opt)}\{n_{e}^{(\mathrm{opt})}\} is the the A-optimal allocation based on s(i=5)s^{(i=5)} from the last iteration, assuming, as in the above, that the estimate of s(i=5)s^{(i=5)} from the last iteration is a good approximation to the true statistical fluctuations. Taking the A-optimal variance to be 𝐂opt𝐂(n(opt),s(5))\mathbf{C}_{\mathrm{opt}}\approx\mathbf{C}(n^{(\mathrm{opt})},s^{(5)}), tr(𝐂(n(i),s(5))/tr(𝐂opt)\mathrm{tr}(\mathbf{C}(n^{\prime(i)},s^{(5)})/\mathrm{tr}(\mathbf{C}_{\mathrm{opt}}) rapidly approaches one with each iteration for both TYK2 and HIF-2α\alpha examples (as above, n(i)=ene(5)(ene(i))1n(i)n^{\prime(i)}=\sum_{e}n_{e}^{(5)}(\sum_{e}n_{e}^{(i)})^{-1}n^{(i)} is the normalized allocation in iteration ii). Note that n(i)n^{(i)} is determined on a 2-connected sub-network (see Section 2.4) while n(opt)n^{(\mathrm{opt})} is determined on the full network. These observations suggest that the iterations in NetBFE using a two-connected sub-network result in near-optimal allocations.

Refer to caption
Figure 4: The deviation of allocation in each iteration from the optimal allocation according to the final estimate of {se}\{s_{e}\}, DKL[ne(i)||ne(opt)]D_{\mathrm{KL}}[n_{e}^{(i)}||n_{e}^{(\mathrm{opt})}], and the corresponding ratio tr(𝐂(n(i),s(5)))(\mathbf{C}(n^{\prime(i)},s^{(5)}))/tr(𝐂opt)(\mathbf{C}_{\mathrm{opt}}).

A simple consistency check can help identify problematic RBFE calculations by comparing the ΔΔGij,RBFE\Delta\Delta G_{ij,\mathrm{RBFE}} in the RBFE calculation against the difference ΔGi,NetBFEΔGj,NetBFE\Delta G_{i,\mathrm{NetBFE}}-\Delta G_{j,\mathrm{NetBFE}}, where ΔGi,NetBFE\Delta G_{i,\mathrm{NetBFE}} and ΔGj,NetBFE\Delta G_{j,\mathrm{NetBFE}} are derived by the maximum likelihood estimator of Eq. 3. Fig. 5 shows such comparison for the NetBFE calculations of TYK2 and HIF-2α\alpha inhibitors; points significantly deviating from the diagonal (e.g. points for which |ΔΔGij,RBFE(ΔGi,NetBFEΔGj,NetBFE)|3σij2+σi2+σj2|\Delta\Delta G_{ij,\mathrm{RBFE}}-(\Delta G_{i,\mathrm{NetBFE}}-\Delta G_{j,\mathrm{NetBFE}})|\geq 3\sqrt{\sigma_{ij}^{2}+\sigma_{i}^{2}+\sigma_{j}^{2}}) indicate potentially problematic RBFE calculations.

Refer to caption
Figure 5: Consistency check between individual RBFE calculations and maximum likelihood estimates of NetBFE results for TYK2 and HIF-2α\alpha calculations. The red point indicates a potentially problematic RBFE calculation inconsistent with NetBFE estimates (see text).

Having demonstrated above that the iterative procedure in NetBFE allocates the computational resources approximately according to the A-optimal, we next investigate whether such allocations yield improved precision in the computed binding free energies. We performed g=3g=3 independent replicates of NetBFE calculations of the TYK2 and HIF-2α\alpha examples, initializing each replicate with the same binding poses but different random number seeds in equilibration and alchemical simulations, and computed the statistical variances in the predicted binding free energies across the replicates. In comparison, we computed the same sets of binding free energies—and their statistical variances from g=3g=3 independent replicates—by allocating the same total computational cost evenly to pairs of RBFE calculations selected by a previously published method, LOMAP 40. Fig. 6 shows the distributions of variances for the two NetBFE allocation methods (NetBFE(NN) and NetBFE(ΔN\Delta N), explained in Section 2.4) and the LOMAP allocation method. The mean and median values of the variance distributions are listed in Table 1. The NetBFE allocations indeed significantly outperform the LOMAP allocations in terms of the statistical precision in the predicted binding free energies. The reduction in the statistical variance by approximately a factor of two is consistent with the previous results from simulated data 34.

Refer to caption
Figure 6: The distributions of variances in the estimated binding free energies (dashed line: mean value of variance distribution; solid line: median value of variance distribution) of NetBFE(NN), NetBFE(ΔN\Delta N) and LOMAP methods for the TYK2 and HIF-2α\alpha examples. The variances are computed from 3 replicates of binding free energy calculations.
Table 1: The mean and median values of the variances distributions in the estimated binding free energies of NetBFE(NN), NetBFE(ΔN\Delta N) and LOMAP schemes for the TYK2 and HIF-2α\alpha examples, corresponding to the dashed and solid lines in Fig. 6. The error bars for these values are computed from bootstrapping. The variances are in the units of (kcal/mol)2.
TYK2 HIF-2α\alpha
methods mean median mean median
NetBFE(NN) 0.11 ±\pm 0.05 0.04 ±\pm 0.02 0.13 ±\pm 0.07 0.05 ±\pm 0.06
NetBFE(ΔN\Delta N) 0.03 ±\pm 0.01 0.03 ±\pm 0.01 0.11 ±\pm 0.03 0.10 ±\pm 0.02
LOMAP 0.10 ±\pm 0.03 0.07 ±\pm 0.02 0.25 ±\pm 0.10 0.13 ±\pm 0.13

Reference binding free energy values from experimental measurements for a subset of the molecules, when available, may be incorporated in the NetBFE calculations by supplying x~i\tilde{x}_{i} and δi1>0\delta_{i}^{-1}>0 in Eq. 45. Here we tested the allocation and the corresponding precision and accuracy in NetBFE calculations that include different numbers of reference values. We performed four independent NetBFE calculations for the 16 inhibitors against TYK2, supplying reference experimental values for k=0,1,2,4k=0,1,2,4 molecules, respectively.

The optimal allocations at the end of five iterations are shown in Fig. 7. When there are a small number of reference molecules (k=1,2k=1,2), they serve as network hubs, in that there is a RBFE edge connecting them to every other molecule. As we increase the number of reference molecules (k=4k=4), however, only a partial subset of the molecules are connected to each reference molecule for which RBFE calculations are quick to converge. The predicted binding free energies are compared to the experimental values in Fig. 8. The precision and accuracy of the NetBFE calculations are summarized in Table 2. In this case, including more reference values does not significantly improve either the precision or the accuracy.

Refer to caption
Figure 7: The network of optimal allocations when including k=0,1,2,4k=0,1,2,4 reference experimental binding free energies. The small empty circles in the graph represent the ligands whose experimental binding free energies are used as reference values. The disconnected apo node (the large empty circle) indicates that there are no ABFE calculations.
k 0 1 2 4
m1tr(𝐂)m^{-1}\mathrm{tr}(\mathbf{C}) 0.0030 ±\pm 0.0011 0.0040 ±\pm 0.0006 0.0020 ±\pm 0.0003 0.0020 ±\pm 0.0002
MUE 0.82 ±\pm 0.12 0.74 ±\pm 0.11 0.72 ±\pm 0.10 0.79 ±\pm 0.14
RMSE 0.93 ±\pm 0.11 0.86 ±\pm 0.12 0.82 ±\pm 0.11 0.97 ±\pm 0.13
Pearson R2R^{2} 0.87 ±\pm 0.07 0.82 ±\pm 0.09 0.84 ±\pm 0.07 0.76 ±\pm 0.11
Spearman ρ\rho 0.88 ±\pm 0.09 0.89 ±\pm 0.10 0.87 ±\pm 0.10 0.90 ±\pm 0.08
Kendall τ\tau 0.72 ±\pm 0.11 0.72 ±\pm 0.12 0.72 ±\pm 0.12 0.75 ±\pm 0.10
Table 2: The precision and accuracy of NetBFE calculations for m=16m=16 TYK2 inhibitors when reference experimental values are included for k=0,1,2,4k=0,1,2,4 molecules. The precision is measured by the average statistical variance in the free energy estimates (in (kcal/mol)2). The accuracy is measured by the mean unsigned error (MUE, in kcal/mol) and root mean square error (RMSE, in kcal/mol) from the experimental values and by the correlations (Pearson R2R^{2}, Spearman ρ\rho, and Kendall τ\tau) with the experimental values. The error bars in the metrics are computed from bootstrapping.
Refer to caption
Figure 8: The predicted binding free energies by NetBFE versus experimental values when including k=0,1,2,4k=0,1,2,4 reference experimental binding free energies. Here, the predicted binding free energies by NetBFE are shifted to minimize the difference between the means of the predicted and the experimental values. The error bars in the predicted values are the square roots of the diagonal elements of the covariance matrix in the NetBFE calculations. The molecules with reference values are represented by unfilled circles; their predicted binding free energies are included in the calculation of MUE, RMSE, and the correlation coefficients.

4 Conclusions

In this work, we have developed a new method to improve the overall statistical precision of binding free energy calculations for multiple molecules against a common biomolecular target. NetBFE (Network Binding Free Energy) implements and extends a previous theoretical work from our group on A-optimal experimental design for measuring pairwise differences, making it practical for realistic binding free energy calculations. We have demonstrated that, at a fixed total computational cost, NetBFE can reduce the statistical errors in BFE calculations by approximately two-fold. The improved precision will increase the confidence in the computational ranking of candidate drug molecules, which is instrumental to prioritizing molecules for synthesis in the design-make-test cycle. In addition, as binding free energy calculations are increasingly used for validating and benchmarking force field models and free energy methods4, 11, 7, 50, the reduced statistical errors afforded by NetBFE will make the benchmark assessments more reliable and thus accelerate the improvements in in silico binding assays.

References

  • Chodera et al. 2011 Chodera, J.; Mobley, D.; Shirts, M.; Dixon, R.; Branson, K.; Pande, V. Alchemical Free Energy Methods for Drug Discovery: Progress and Challenges. Current Opinion in Structural Biology 2011, 21, 150–160
  • Mobley and Klimovich 2012 Mobley, D. L.; Klimovich, P. V. Perspective: Alchemical Free Energy Calculations for Drug Discovery. The Journal of Chemical Physics 2012, 137, 230901
  • Abel et al. 2017 Abel, R.; Wang, L.; Harder, E. D.; Berne, B. J.; Friesner, R. A. Advancing Drug Discovery through Enhanced Free Energy Calculations. Accounts of Chemical Research 2017, 50, 1625–1632
  • Schindler et al. 2020 Schindler, C. E. M.; Baumann, H.; Blum, A.; Böse, D.; Buchstaller, H.-P.; Burgdorf, L.; Cappel, D.; Chekler, E.; Czodrowski, P.; Dorsch, D.; Eguida, M. K. I.; Follows, B.; Fuch\mathcal{B}, T.; Grädler, U.; Gunera, J.; Johnson, T.; Jorand Lebrun, C.; Karra, S.; Klein, M.; Knehans, T.; Koetzner, L.; Krier, M.; Leiendecker, M.; Leuthner, B.; Li, L.; Mochalkin, I.; Musil, D.; Neagu, C.; Rippmann, F.; Schiemann, K.; Schulz, R.; Steinbrecher, T.; Tanzer, E.-M.; Unzue Lopez, A.; Viacava Follis, A.; Wegener, A.; Kuhn, D. Large-Scale Assessment of Binding Free Energy Calculations in Active Drug Discovery Projects. Journal of Chemical Information and Modeling 2020, 60, 5457–5474
  • Song and Merz 2020 Song, L. F.; Merz, K. M. Evolution of Alchemical Free Energy Methods in Drug Discovery. Journal of Chemical Information and Modeling 2020, 60, 5308–5318
  • Cournia et al. 2020 Cournia, Z.; Allen, B. K.; Beuming, T.; Pearlman, D. A.; Radak, B. K.; Sherman, W. Rigorous Free Energy Simulations in Virtual Screening. Journal of Chemical Information and Modeling 2020, 60, 4153–4169
  • Mey et al. 2020 Mey, A. S. J. S.; Allen, B. K.; Macdonald, H. E. B.; Chodera, J. D.; Hahn, D. F.; Kuhn, M.; Michel, J.; Mobley, D. L.; Naden, L. N.; Prasad, S.; Rizzi, A.; Scheen, J.; Shirts, M. R.; Tresadern, G.; Xu, H. Best Practices for Alchemical Free Energy Calculations [Article v1.0]. Living Journal of Computational Molecular Science 2020, 2, 18378
  • Tsai et al. 2020 Tsai, H.-C.; Tao, Y.; Lee, T.-S.; Merz, K. M.; York, D. M. Validation of Free Energy Methods in AMBER. Journal of Chemical Information and Modeling 2020, 60, 5296–5300
  • Lee et al. 2020 Lee, T.-S.; Allen, B. K.; Giese, T. J.; Guo, Z.; Li, P.; Lin, C.; McGee, T. D.; Pearlman, D. A.; Radak, B. K.; Tao, Y.; Tsai, H.-C.; Xu, H.; Sherman, W.; York, D. M. Alchemical Binding Free Energy Calculations in AMBER20: Advances and Best Practices for Drug Discovery. Journal of Chemical Information and Modeling 2020, 60, 5595–5623
  • He et al. 2020 He, X.; Liu, S.; Lee, T.-S.; Ji, B.; Man, V. H.; York, D. M.; Wang, J. Fast, Accurate, and Reliable Protocols for Routine Calculations of Protein–Ligand Binding Affinities in Drug Design Projects Using AMBER GPU-TI with ff14SB/GAFF. ACS Omega 2020, 5, 4611–4619
  • Wang et al. 2015 Wang, L.; Wu, Y.; Deng, Y.; Kim, B.; Pierce, L.; Krilov, G.; Lupyan, D.; Robinson, S.; Dahlgren, M. K.; Greenwood, J.; Romero, D. L.; Masse, C.; Knight, J. L.; Steinbrecher, T.; Beuming, T.; Damm, W.; Harder, E.; Sherman, W.; Brewer, M.; Wester, R.; Murcko, M.; Frye, L.; Farid, R.; Lin, T.; Mobley, D. L.; Jorgensen, W. L.; Berne, B. J.; Friesner, R. A.; Abel, R. Accurate and Reliable Prediction of Relative Ligand Binding Potency in Prospective Drug Discovery by Way of a Modern Free-Energy Calculation Protocol and Force Field. Journal of the American Chemical Society 2015, 137, 2695–2703
  • Shih et al. 2020 Shih, A. Y.; Hack, M.; Mirzadegan, T. Impact of Protein Preparation on Resulting Accuracy of FEP Calculations. Journal of Chemical Information and Modeling 2020, 60, 5287–5289
  • Chatterjee et al. 2017 Chatterjee, P.; Botello-Smith, W. M.; Zhang, H.; Qian, L.; Alsamarah, A.; Kent, D.; Lacroix, J. J.; Baudry, M.; Luo, Y. Can Relative Binding Free Energy Predict Selectivity of Reversible Covalent Inhibitors? Journal of the American Chemical Society 2017, 139, 17945–17952, PMID: 29124934
  • Roos et al. 2019 Roos, K.; Wu, C.; Damm, W.; Reboul, M.; Stevenson, J. M.; Lu, C.; Dahlgren, M. K.; Mondal, S.; Chen, W.; Wang, L.; Abel, R.; Friesner, R. A.; Harder, E. D. OPLS3e: Extending Force Field Coverage for Drug-Like Small Molecules. Journal of Chemical Theory and Computation 2019, 15, 1863–1874
  • Lu et al. 2021 Lu, C.; Wu, C.; Ghoreishi, D.; Chen, W.; Wang, L.; Damm, W.; Ross, G. A.; Dahlgren, M. K.; Russell, E.; Von Bargen, C. D.; Abel, R.; Friesner, R. A.; Harder, E. D. OPLS4: Improving Force Field Accuracy on Challenging Regimes of Chemical Space. Journal of Chemical Theory and Computation 2021,
  • Vanommeslaeghe et al. 2010 Vanommeslaeghe, K.; Hatcher, E.; Acharya, C.; Kundu, S.; Zhong, S.; Shim, J.; Darian, E.; Guvench, O.; Lopes, P.; Vorobyov, I.; Mackerell Jr., A. D. CHARMM General Force Field: A Force Field for Drug-Like Molecules Compatible with the CHARMM All-Atom Additive Biological Force Fields. Journal of Computational Chemistry 2010, 31, 671–690
  • Wang et al. 2004 Wang, J.; Wolf, R. M.; Caldwell, J. W.; Kollman, P. A.; Case, D. A. Development and Testing of a General Amber Force Field. Journal of Computational Chemistry 2004, 25, 1157–1174
  • Halgren 1996 Halgren, T. A. Merck Molecular Force Field. I. Basis, Form, Scope, Parameterization, and Performance of MMFF94. Journal of Computational Chemistry 1996, 17, 490–519
  • Horta et al. 2016 Horta, B. A. C.; Merz, P. T.; Fuchs, P. F. J.; Dolenc, J.; Riniker, S.; Hünenberger, P. H. A GROMOS-Compatible Force Field for Small Organic Molecules in the Condensed Phase: The 2016H66 Parameter Set. Journal of Chemical Theory and Computation 2016, 12, 3825–3850
  • Christ and Fox 2014 Christ, C. D.; Fox, T. Accuracy Assessment and Automation of Free Energy Calculations for Drug Design. Journal of Chemical Information and Modeling 2014, 54, 108–120
  • Loeffler et al. 2015 Loeffler, H. H.; Michel, J.; Woods, C. FESetup: Automating Setup for Alchemical Free Energy Simulations. Journal of Chemical Information and Modeling 2015, 55, 2485–2490
  • Fu et al. 2018 Fu, H.; Gumbart, J. C.; Chen, H.; Shao, X.; Cai, W.; Chipot, C. BFEE: A User-Friendly Graphical Interface Facilitating Absolute Binding Free-Energy Calculations. Journal of Chemical Information and Modeling 2018, 58, 556–560
  • Lundborg and Lindahl 2015 Lundborg, M.; Lindahl, E. Automatic GROMACS Topology Generation and Comparisons of Force Fields for Solvation Free Energy Calculations. The Journal of Physical Chemistry B 2015, 119, 810–823
  • Klimovich and Mobley 2015 Klimovich, P. V.; Mobley, D. L. A Python Tool to Set Up Relative Free Energy Calculations in GROMACS. Journal of Computer-Aided Molecular Design 2015, 29, 1007–1014
  • Gapsys et al. 2015 Gapsys, V.; Michielssens, S.; Seeliger, D.; de Groot, B. L. Pmx: Automated Protein Structure and Topology Generation for Alchemical Perturbations. Journal of Computational Chemistry 2015, 36, 348–354
  • Senapathi et al. 2020 Senapathi, T.; Suruzhon, M.; Barnett, C. B.; Essex, J.; Naidoo, K. J. BRIDGE: An Open Platform for Reproducible High-Throughput Free Energy Simulations. Journal of Chemical Information and Modeling 2020, 60, 5290–5295
  • Boresch et al. 2003 Boresch, S.; Tettinger, F.; Leitgeb, M.; Karplus, M. Absolute Binding Free Energies: A Quantitative Approach for Their Calculation. The Journal of Physical Chemistry B 2003, 107, 9535–9551
  • Mobley et al. 2007 Mobley, D. L.; Graves, A. P.; Chodera, J. D.; McReynolds, A. C.; Shoichet, B. K.; Dill, K. A. Predicting Absolute Ligand Binding Free Energies to A Simple Model Site. Journal of Molecular Biology 2007, 371, 1118–1134
  • Woo and Roux 2005 Woo, H.-J.; Roux, B. Calculation of Absolute Protein–Ligand Binding Free Energy from Computer Simulations. Proceedings of the National Academy of Sciences of the United States of America 2005, 102, 6825–30
  • Aldeghi et al. 2016 Aldeghi, M.; Heifetz, A.; Bodkin, M. J.; Knapp, S.; Biggin, P. C. Accurate Calculation of the Absolute Free Energy of Binding for Drug Molecules. Chemical Science 2016, 7, 207–218
  • Reddy et al. 1994 Reddy, M. R.; Varney, M. D.; Kalish, V.; Viswanadhan, V. N.; Appelt, K. Calculation of Relative Differences in the Binding Free Energies of HIV1 Protease Inhibitors: A Thermodynamic Cycle Perturbation Approach. Journal of Medicinal Chemistry 1994, 37, 1145–1152
  • Reddy and Erion 2001 Reddy, M. R.; Erion, M. D. Calculation of Relative Binding Free Energy Differences for Fructose 1,6-Bisphosphatase Inhibitors Using the Thermodynamic Cycle Perturbation Approach. Journal of the American Chemical Society 2001, 123, 6246–6252
  • Cournia et al. 2017 Cournia, Z.; Allen, B.; Sherman, W. Relative Binding Free Energy Calculations in Drug Discovery: Recent Advances and Practical Considerations. Journal of Chemical Information and Modeling 2017, 57, 2911–2937
  • Xu 2019 Xu, H. Optimal Measurement Network of Pairwise Differences. Journal of Chemical Information and Modeling 2019, 59, 4720–4728
  • Shirts and Chodera 2008 Shirts, M. R.; Chodera, J. D. Statistically Optimal Analysis of Samples from Multiple Equilibrium States. The Journal of Chemical Physics 2008, 129, 124105
  • Shenfeld et al. 2009 Shenfeld, D. K.; Xu, H.; Eastwood, M. P.; Dror, R. O.; Shaw, D. E. Minimizing Thermodynamic Length to Select Intermediate States for Free-Energy Calculations and Replica-Exchange Simulations. Physical Review E 2009, 80, 046705
  • Yang et al. 2020 Yang, Q.; Burchett, W.; Steeno, G. S.; Liu, S.; Yang, M.; Mobley, D. L.; Hou, X. Optimal Designs for Pairwise calculation: An Application to Free Energy Perturbation in Minimizing Prediction Variability. Journal of Computational Chemistry 2020, 41, 247–257
  • Liang et al. 2013 Liang, J.; Tsui, V.; Van Abbema, A.; Bao, L.; Barrett, K.; Beresini, M.; Berezhkovskiy, L.; Blair, W. S.; Chang, C.; Driscoll, J.; Eigenbrot, C.; Ghilardi, N.; Gibbons, P.; Halladay, J.; Johnson, A.; Kohli, P. B.; Lai, Y.; Liimatta, M.; Mantik, P.; Menghrajani, K.; Murray, J.; Sambrone, A.; Xiao, Y.; Shia, S.; Shin, Y.; Smith, J.; Sohn, S.; Stanley, M.; Ultsch, M.; Zhang, B.; Wu, L. C.; Magnuson, S. Lead Optimization of a 4-Aminopyridine Benzamide Scaffold To Identify Potent, Selective, and Orally Bioavailable TYK2 Inhibitors. Journal of Medicinal Chemistry 2013, 56, 4521–4536
  • Scheuermann et al. 2015 Scheuermann, T. H.; Stroud, D.; Sleet, C. E.; Bayeh, L.; Shokri, C.; Wang, H.; Caldwell, C. G.; Longgood, J.; MacMillan, J. B.; Bruick, R. K.; Gardner, K. H.; Tambar, U. K. Isoform-Selective and Stereoselective Inhibition of Hypoxia Inducible Factor-2. Journal of Medicinal Chemistry 2015, 58, 5930–5941
  • Liu et al. 2013 Liu, S.; Wu, Y.; Lin, T.; Abel, R.; Redmann, J. P.; Summa, C. M.; Jaber, V. R.; Lim, N. M.; Mobley, D. L. Lead Optimization Mapper: Automating Free Energy Calculations for Lead Optimization. Journal of Computer-Aided Molecular Design 2013, 27, 755–770
  • Stephen Boyd 2004 Stephen Boyd, L. V. Convex Optimization, Chapter 7; Cambridge University Press: Cambridge, 2004
  • 42 Xu, H. DiffNet. https://github.com/forcefield/DiffNet
  • Hopkins et al. 2015 Hopkins, C. W.; Le Grand, S.; Walker, R. C.; Roitberg, A. E. Long-Time-Step Molecular Dynamics through Hydrogen Mass Repartitioning. Journal of Chemical Theory and Computation 2015, 11, 1864–1874
  • Ryckaert et al. 1977 Ryckaert, J.-P.; Ciccotti, G.; Berendsen, H. J. Numerical integration of the cartesian equations of motion of a system with constraints: molecular dynamics of n-alkanes. Journal of Computational Physics 1977, 23, 327–341
  • Miyamoto and Kollman 1992 Miyamoto, S.; Kollman, P. A. Settle: An analytical version of the SHAKE and RATTLE algorithm for rigid water models. Journal of Computational Chemistry 1992, 13, 952–962
  • Grønbech-Jensen and Farago 2013 Grønbech-Jensen, N.; Farago, O. A Simple and Effective Verlet-Type Algorithm for Simulating Langevin Dynamics. Molecular Physics 2013, 111, 983–991
  • Åqvist et al. 2004 Åqvist, J.; Wennerström, P.; Nervall, M.; Bjelic, S.; Brandsdal, B. O. Molecular Dynamics Simulations of Water and Biomolecules with a Monte Carlo Constant Pressure Algorithm. Chemical Physics Letters 2004, 384, 288–294
  • Maier et al. 2015 Maier, J. A.; Martinez, C.; Kasavajhala, K.; Wickstrom, L.; Hauser, K. E.; Simmerling, C. ff14SB: Improving the Accuracy of Protein Side Chain and Backbone Parameters from ff99SB. Journal of Chemical Theory and Computation 2015, 11, 3696–3713
  • Case et al. 2020 Case, D. A.; Belfon, K.; Ben-Shalom, I. Y.; Brozell, S. R.; Cerutti, D. S.; Cheatham III, T. E.; Cruzeiro, V. W. D.; Darden, T. A.; Duke, R. E.; Giambasu, G.; Gilson, M. K.; Gohlke, H.; Goetz, A. W.; Harris, R.; Izadi, S.; Izmailov, S. A.; Kasavajhala, K.; Kovalenko, K.; Krasny, R.; Kurtzman, T.; Lee, T.; Le-Grand, S.; Li, P.; Lin, C.; Liu, J.; Luchko, T.; Luo, R.; Man, V.; Merz, K.; Miao, Y.; Mikhailovskii, O.; Monard, G.; ; Nguyen, H.; Onufriev, A.; Pan, F.; Pantano, S.; Qi, R.; Roe, D. R.; Roitberg, A.; Sagui, C.; Schott-Verdugo, S.; Shen, J.; Simmerling, C. L.; Skrynnikov, N.; Smith, J.; Swails, J.; Walker, R. C.; Wang, J.; Wilson, R. M.; Wolf, R. M.; Wu, X.; Xiong, Y.; Xue, Y.; York, D. M.; Kollman, P. A. AMBER 20. 2020; https://ambermd.org/
  • Hahn et al. 2021 Hahn, D. F.; Bayly, C. I.; Macdonald, H. E. B.; Chodera, J. D.; Mey, A. S. J. S.; Mobley, D. L.; Benito, L. P.; Schindler, C. E. M.; Tresadern, G.; Warren, G. L. Best Practices for Constructing, Preparing, and Evaluating Protein-Ligand Binding Affinity Benchmarks. 2021; https://arxiv.org/abs/2105.06222