Precise binding free energy calculations for multiple molecules using an optimal measurement network of pairwise differences
Abstract
Alchemical binding free energy (BFE) calculations offer an efficient and thermodynamically rigorous approach to in silico binding affinity predictions. As a result of decades of methodological improvements and recent advances in computer technology, alchemical BFE calculations are now widely used in drug discovery research. They help guide the prioritization of candidate drug molecules by predicting their binding affinities for a biomolecular target of interest (and potentially selectivity against undesirable anti-targets). Statistical variance associated with such calculations, however, may undermine the reliability of their predictions, introducing uncertainty both in ranking candidate molecules and in benchmarking their predictive accuracy. Here, we present a computational method that substantially improves the statistical precision in BFE calculations for a set of ligands binding to a common receptor by dynamically allocating computational resources to different BFE calculations according to an optimality objective established in a previous work from our group and extended in this work. Our method, termed Network Binding Free Energy (NetBFE), performs adaptive binding free energy calculations in iterations, re-optimizing the allocations in each iteration based on the statistical variances estimated from previous iterations. Using examples of NetBFE calculations for protein-binding of congeneric ligand series, we demonstrate that NetBFE approaches the optimal allocation in a small number () of iterations and that NetBFE reduces the statistical variance in the binding free energy estimates by approximately a factor of two when compared to a previously published and widely used allocation method at the same total computational cost.
1 Introduction
In drug discovery research, candidate drug molecules are increasingly ranked and selected by their computationally predicted affinities for the biomolecular target of interest before experimental synthesis and testing 1, 2, 3, 4, 5, 6. This allows for the exploration of many more molecular design ideas than is practical with experimental approaches alone. Alchemical binding free energy (BFE) calculations have become a valuable tool in predicting the protein-ligand binding affinities and their utility continues to increase thanks to advances in the methodologies 7, 8, 9, 10, 11, 12, 4, 13 force field 14, 15, 16, 17, 18, 19, computational power afforded by the graphical prcessing units (GPUs) 11, 9, 8, and the availability of simulation software and simplified workflows 11, 9, 4, 20, 21, 22, 23, 24, 25, 26.
BFE calculations can either compute the binding free energy of an individual molecule, by a technique commonly referred to as "absolute" binding free energy (ABFE) calculations 27, 28, 29, 30, or compute the difference between the binding free energies of two molecules, by "relative" binding free energy (RBFE) calculations 31, 32, 11, 33. The binding free energies of a set of molecules can be determined by a combination of ABFE calculations of select molecules and RBFE calculations of select pairs of molecules 34.
Statistical errors in BFE calculations confound decision-making in candidate molecule selection and the assessment of their predictive accuracy. There have been significant methodological progress in minimizing statistical errors in single BFE calculations 35, 36, but works have only recently been published addressing the question of how to reduce the collective statistical errors for BFE calculations of a set of molecules 34, 37, which represents the majority of applications of BFE calculations in drug discovery.
As suggested previously, at a fixed total computational cost, the overall statistical errors in BFE calculations for multiple molecules can be minimized by optimally allocating the computational resources to different RBFE and ABFE calculations. In particular, minimizing the total variance in the estimated binding free energies with respect to the allocation corresponds to the A-optimal in experimental design 34. Such optimal allocations, however, require knowledge of the statistical fluctuations in individual BFE calculations, which are unknown a priori. Because of this reason, A-optimal BFE calculations have not yet been reported for realistic examples.
In this work, we develop a method based on the A-optimal design to improve the statistical precision of BFE calculations. Our method, Network Binding Free Energy (NetBFE), performs adaptive BFE calculations in multiple iterations. In each iteration, NetBFE estimates the statistical fluctuations in different BFE calculations based on the results from previous iterations, and it reallocates the computational resources accordingly. This allows the A-optimal allocation to be used in realistic BFE calculations. We also extend the previous theoretical work by deriving new equations for the A-optimal design and the covariance matrix when only RBFE calculations are included (the previous theory required the inclusion of at least one ABFE calculation), a common scenario in the lead optimization stage of drug discovery.
Here, we apply NetBFE to calculating the binding free energies of 16 ligands against the TYK2 protein 11, 38 and 8 ligands against the HIF-2 protein 39. We compare the statistical errors in the estimated binding free energies from NetBFE calculations and those in the BFE calculations using RBFE pairs selected by the LOMAP algorithm developed by the Mobley lab 40, which is widely used in such calculations. At the same total computational cost, NetBFE reduces the statistical variance by about half compared to LOMAP. Our results suggest that NetBFE can substantially improve the statistical precision of BFE calculations in drug discovery, which should accelerate the computational selection of candidate drug molecules and thereby increase the efficiency of the design-make-test cycle.
2 Methods
2.1 A-optimal design of network binding free energy calculations
The A-optimal design of measurement network of pairwise differences was described previously in detail 34, and is briefly explained here. The binding free energy values for multiple molecules in the same binding site of a receptor are the quantities of interest in BFE calculations. Let be the binding free energy of molecule ; here we use instead of to make the notations in the mathematical presentation more compact, and to follow the same notation in Reference 34. ABFE computes the individual (i.e. ) and RBFE computes the difference between any pair of and : (i.e. ). These ABFE and RBFE calculations form our difference network, which can be represented by a graph of vertices and edges, where the vertices stand for the quantities , the edge between vertices stands for the difference measurement (i.e. the RBFE calculations), and the edge between vertices 0 and stands for the individual measurement (i.e. the ABFE calculations). For each BFE calculation, its asymptotic statistical variance , for , decreases with —the resource allocated to this calculation—as
(1) |
where is the statistical fluctuation in the corresponding BFE calculation.
Some of the binding free energy values, say , may have been measured experimentally. These experimental values and their concomitant statistical errors in the experiment, , can be included in the estimate of the binding free energies from BFE calculations. If the true binding free energies are , the likelihood that our BFE calculations produce results of and the experimental measurements produce results of is
(2) | |||||
Maximizing the log-likelihood with respect to yields the maximum likelihood estimator for :
(3) |
where is the Fisher information matrix with the elements
(4) |
and
(5) |
The covariance in the estimates of is given by the inverse of the Fisher information matrix:
(6) |
The A-optimal minimizes the total variance, tr(), subject to the constraints of non-negativity
(7) |
and of the total fixed computational cost
(8) |
This minimization can be solved by cone programming41, and it is implemented in the publicly available software DiffNet. 42
2.2 A-optimal for a network with only relative binding free energy calculations
Sometimes we may choose to perform only relative binding free energy calculations and compute only . In this case the free energies can only be determined up to an arbitrary constant , which can be chosen to minimize the root mean square deviation from the known experimental values.
The log-likelihood in this case is given by
(9) |
The corresponding Fisher information matrix
(10) |
is singular. Consequently, the covariance matrix cannot be defined.
This problem can be solved by introducing an additional term to the log-likelihood to restrain the mean to a constant :
(11) |
which is equivalent to specifying that the mean of is normally distributed around with the standard deviation of . The corresponding Fisher information matrix is
(12) |
where is a matrix of elements 1. We substitute this augmented Fisher matrix for in the semi-definite programming for A-optimal when there are no absolute binding free energy calculations. It can be demonstrated that the optimal allocation does not depend on the value of .
The corresponding covariance matrix is
(13) |
We are interested in , such that the mean of is constrained to the constant . To derive , consider the expansion of in :
(14) |
We have
(15) | |||||
Comparing the terms grouped by different orders of , we have
(16) |
where is a vector whose elements are the column sums of . This can be written in the matrix form
(17) |
the solution of which gives a well-defined covariance matrix . Incidentally, .
2.3 Iterative network binding free energy calculations
The statistical fluctuations depend on both the thermodynamic length between the two end states in an alchemical calculation and the ratio of the relaxation time of relevant motions to the length of the simulation.36, 7 In practice, they have to be estimated (approximately) from the completed BFE calculations. In order to obtain values for allocating the computational resources according to the A-optimal, NetBFE calculations proceed in iterations, with each iteration consuming a predetermined amount, , of total sampling. After each iteration , estimates of are updated as described in Section. 2.6 below, and a new A-optimal allocation is determined from the updated and used to extend the BFE calculations for iteration . The iterations continue until a specified total simulation samples, , or a specified target average statistical variance tr()/m is reached. Data accumulated from all the iterations are included to compute the binding free energy value for each BFE calculation.
In each NetBFE calculation reported in this work, a total of simulation time is used for a set of molecules. This is divided into iterations and in each iteration of simulation time is allocated.
2.4 A-optimal on sparse 2-connected sub-networks
In order to reduce the number of BFE calculations in each iteration, we select a 2-connected subgraph out of the full graph in each iteration, and obtain the A-optimal on this sub-network. It was demonstrated previously that the 2-connected sub-network can generate near-optimal allocations.34 The allocation in each iteration thus proceeds in two steps in our NetBFE implementation. First, we determine the A-optimal allocation of the computational resources for the next iteration on the full graph . The edges with the largest allocations are selected to form the 2-connected subgraph. Here, the edge selection can be based on either the total cumulative allocation including all previous iterations or only the allocation for this iteration. We have implemented both selection methods, and we refer to NetBFE calculations with the former method as NetBFE() and those with the latter method as NetBFE(). Second, A-optimal allocation is obtained including only the BFE calculations in the selected 2-connected subgraph.
2.5 Estimates of the statistical variances in BFE calculations
The statistical variance in the computed (by RBFE for and by ABFE for ) is estimated by block bootstrapping, using all the data up to the current iteration. The sampled data points in free energy simulations are divided into non-overlapping blocks, each block containing samples from a simulation time interval. The data are resampled by randomly drawing blocks with replacement. A value is computed from each set of resampled data, and the corresponding variance is computed from repeats of the resampling.
2.6 Estimates of the statistical fluctuations in BFE calculations
The statistical fluctuations associated with the calculation of can be estimated from its statistical variance and its given computational cost by inverting Eq. 1:
(18) |
We use a simple model to predict of BFE calculations that have not yet been performed (thus is unavailable):
(19) |
where for , is the number of heavy atoms in molecule that do not map onto atoms in molecule and is the number of hydrogen atoms in molecule that do not map onto atoms in molecule j in the RBFE calculation; for , and are the respective numbers of heavy and hydrogen atoms in molecule in the ABFE calculation.
Before the first iteration, the parameters are initialized to , and . In each iteration, the values for the performed RBFE calculations are updated by Eq. 18, and the parameters , , and are updated by minimizing the root-mean-square-error (RMSE) between the values predicted from Eq. 19 and those computed from Eq. 18.
2.7 Maximum likelihood estimate of binding free energies
The individual (absolute) binding free energies can be derived from the pairwise (relative) binding free energy differences and their corresponding variances by using the maximum likelihood estimator (Eq. 3). In practical drug discovery projects, a few molecules usually have experimentally determined binding free energies—reference binding free energies–which can be included in the computational results to derive a maximum likelihood estimate of binding free energies (Eqs. 4 and 5).
2.8 Molecular dynamic simulations
In each individual RBFE/ABFE calculation, the protein-ligand complex was first equilibrated by a 5 ns simulation, where the snapshots were saved every 2500 MD steps and the total 2000 snapshots are randomly drawn as the starting structures for replicates of RBFE/ABFE calculations. Hydrogen mass repartition 43 in conjunction with SHAKE 44/SETTLE 45 for constraining the distances between hydrogen atoms and their covalently bonded heavy atoms was employed to enable the use of a large time step of 4 fs in all simulations. The potential energies at all values were computed and saved every 5 ps. Each alchemical simulation consisted of both the "complex" stage, which simulated the protein-ligand complex, and the "solvated" stage, which simulated the ligand in aqueous solution. The temperature was kept at 298.15 K using Langevin dynamics 46 with the collision frequency 2.0 ps-1, and the pressure was kept at 1.01325 bar with Monte Carlo barostat 47 and a pressure relaxation time of 1.0 ps. The proteins were modeled by AMEBR ff14SB force field 48, and the small molecules were parameterized using our in-house force field generation program; the force field parameters and starting protein-ligand complex structures are provided as AMBER20 input files in the Supplementary Materials. The entire system was solvated in a periodic box of TIP3P water molecules with all the solute atoms no less than 12 Å away from the boundary of the unit cell. In this work, the initial structures of TYK2 protein target with 16 ligands come from reference 11 and HIF-2 (PDB ID: 4XT2) protein target with 8 ligands are constructed according to reference39. All simulations were performed using the AMBER20 package.49
Each individual RBFE calculation consists of simulations (decoupled protocol) or simulations (concerted protocol) and each individual ABFE calculation consists of simulations (decoupled protocol) or simulations (concerted protocol) at different values. Decoupled and concerted protocols are defined in references 5 and 8. If a BFE calculation is allocated a total simulation time of , each -valued simulation will run for a total of simulation time. If ns, this BFE calculation is omitted and its time is allocated proportionally to other BFE calculations, otherwise the simulation time is divided into replicates initialized with slightly different binding poses and random number seeds (thus no individual simulation exceeds 2 ns simulation time, which helps load balance when the simulations are executed in parallel). In each iteration, new replicates for each individual BFE calculation receiving new allocations are initiated; existing simulations are not extended. The allocation only includes the "complex" stage of each individual BFE calculation; if a BFE calculation is performed, its corresponding "solvated" stage is simulated for a fixed time of ns, which incurs a negligible computational cost and is not included in the total allocated computational resource.
3 Results
We first illustrate the iterative allocation in NetBFE consisting of both ABFE and RBFE calculations, using the small example of eight inhibitors against the protein kinase TYK2. No experimental binding free energies were included in the allocation and analysis, i.e. in Eq. 4. The estimated fluctuations and the corresponding A-optimal allocation of in each iteration are shown in Fig. 1A and Fig. 1B, respectively. The total variance decreases with increasing iterations.
How well does this iterative procedure approach the true A-optimal allocation? If we assume that the estimate of after the last iteration represents the true statistical fluctuations, approximates the variance of the A-optimal allocation. We measure how far the allocation in the ’th iteration—normalized to so that its total number of samples is the same as that in the last iteration—is from the this approximate A-optimal by comparing to . The former indeed approaches the latter (Fig. 1C) after only three iterations, suggesting that the iterative allocation is converging quickly to the true A-optimal.
Because ABFE converges much slower than RBFE (on average ABFE has a much higher statistical fluctuation than RBFE: 6.2), most of the computations (approximately 81 of total resources) are allocated to ABFE in order to improve the precision of the absolute values of the computed binding free energies.

Often it is unnecessary to predict the absolute values of the binding free energies, such as in the case when the values for some similar molecules have been experimentally determined, thus the values for the molecules of interest can be determined by predicting the differences between the molecules, or in the case of ranking the molecules by their binding affinities, such that only their relative differences will suffice. In such cases ABFE calculations may be omitted to avoid the most expensive part of the computations, as noted above. In the following, we consider NetBFE with only RBFE calculations.
We used NetBFE to compute the binding free energies of 16 inhibitors against TYK2 38, 11 and 8 inhibitors against HIF-2 39. These molecules are depicted in Fig. 2 and listed in the Supplementary Materials. We first characterized the progression of statistical errors after each iteration of extending the total simulation time (Fig. 3). The standard deviations and the total variance by and large decreases with increasing total simulation time. Moreover, the distribution of the standard errors becomes narrower, indicating that most of the free energy results have converged to similar precision.

In the case of HIF-2 inhibitors, the agreement between the experimental binding free energies and their predicted values improves at each iteration (Fig. 3), indicating that the improvement in precision is accompanied by the improvement in accuracy; for TYK2 inhibitors, the accuracy is quite good (RMSE < 1 kcal/mol) in all 5 iterations. Of course, accuracy in the binding free energy calculations depends on other factors such as the force field parametrization and the consistency between the computational setup and the experimental conditions. It remains unknown whether in other protein-ligand combinations or in other force fields the accuracy of the predicted binding free energies improves with the statistical precision.

In each iteration of the NetBFE calculation, the allocations are optimized according to the estimate of statistical fluctuations based on the results of the previous iteration, thus both and change from one iteration to the next. We measure the difference between two allocations of and by the Kullback-Leibler (KL) divergence:
(20) |
where and similarly are the fractional allocations. In the two cases above, approach their limiting values with the progression of iterations (Fig. 4): the KL-divergence between and decreases toward 0.69 for TYK2 and 0.28 for HIF-2 with each iteration, where is the the A-optimal allocation based on from the last iteration, assuming, as in the above, that the estimate of from the last iteration is a good approximation to the true statistical fluctuations. Taking the A-optimal variance to be , rapidly approaches one with each iteration for both TYK2 and HIF-2 examples (as above, is the normalized allocation in iteration ). Note that is determined on a 2-connected sub-network (see Section 2.4) while is determined on the full network. These observations suggest that the iterations in NetBFE using a two-connected sub-network result in near-optimal allocations.

A simple consistency check can help identify problematic RBFE calculations by comparing the in the RBFE calculation against the difference , where and are derived by the maximum likelihood estimator of Eq. 3. Fig. 5 shows such comparison for the NetBFE calculations of TYK2 and HIF-2 inhibitors; points significantly deviating from the diagonal (e.g. points for which ) indicate potentially problematic RBFE calculations.

Having demonstrated above that the iterative procedure in NetBFE allocates the computational resources approximately according to the A-optimal, we next investigate whether such allocations yield improved precision in the computed binding free energies. We performed independent replicates of NetBFE calculations of the TYK2 and HIF-2 examples, initializing each replicate with the same binding poses but different random number seeds in equilibration and alchemical simulations, and computed the statistical variances in the predicted binding free energies across the replicates. In comparison, we computed the same sets of binding free energies—and their statistical variances from independent replicates—by allocating the same total computational cost evenly to pairs of RBFE calculations selected by a previously published method, LOMAP 40. Fig. 6 shows the distributions of variances for the two NetBFE allocation methods (NetBFE() and NetBFE(), explained in Section 2.4) and the LOMAP allocation method. The mean and median values of the variance distributions are listed in Table 1. The NetBFE allocations indeed significantly outperform the LOMAP allocations in terms of the statistical precision in the predicted binding free energies. The reduction in the statistical variance by approximately a factor of two is consistent with the previous results from simulated data 34.

TYK2 | HIF-2 | |||||
---|---|---|---|---|---|---|
methods | mean | median | mean | median | ||
NetBFE() | 0.11 0.05 | 0.04 0.02 | 0.13 0.07 | 0.05 0.06 | ||
NetBFE() | 0.03 0.01 | 0.03 0.01 | 0.11 0.03 | 0.10 0.02 | ||
LOMAP | 0.10 0.03 | 0.07 0.02 | 0.25 0.10 | 0.13 0.13 |
Reference binding free energy values from experimental measurements for a subset of the molecules, when available, may be incorporated in the NetBFE calculations by supplying and in Eq. 4, 5. Here we tested the allocation and the corresponding precision and accuracy in NetBFE calculations that include different numbers of reference values. We performed four independent NetBFE calculations for the 16 inhibitors against TYK2, supplying reference experimental values for molecules, respectively.
The optimal allocations at the end of five iterations are shown in Fig. 7. When there are a small number of reference molecules (), they serve as network hubs, in that there is a RBFE edge connecting them to every other molecule. As we increase the number of reference molecules (), however, only a partial subset of the molecules are connected to each reference molecule for which RBFE calculations are quick to converge. The predicted binding free energies are compared to the experimental values in Fig. 8. The precision and accuracy of the NetBFE calculations are summarized in Table 2. In this case, including more reference values does not significantly improve either the precision or the accuracy.

k | 0 | 1 | 2 | 4 |
---|---|---|---|---|
0.0030 0.0011 | 0.0040 0.0006 | 0.0020 0.0003 | 0.0020 0.0002 | |
MUE | 0.82 0.12 | 0.74 0.11 | 0.72 0.10 | 0.79 0.14 |
RMSE | 0.93 0.11 | 0.86 0.12 | 0.82 0.11 | 0.97 0.13 |
Pearson | 0.87 0.07 | 0.82 0.09 | 0.84 0.07 | 0.76 0.11 |
Spearman | 0.88 0.09 | 0.89 0.10 | 0.87 0.10 | 0.90 0.08 |
Kendall | 0.72 0.11 | 0.72 0.12 | 0.72 0.12 | 0.75 0.10 |

4 Conclusions
In this work, we have developed a new method to improve the overall statistical precision of binding free energy calculations for multiple molecules against a common biomolecular target. NetBFE (Network Binding Free Energy) implements and extends a previous theoretical work from our group on A-optimal experimental design for measuring pairwise differences, making it practical for realistic binding free energy calculations. We have demonstrated that, at a fixed total computational cost, NetBFE can reduce the statistical errors in BFE calculations by approximately two-fold. The improved precision will increase the confidence in the computational ranking of candidate drug molecules, which is instrumental to prioritizing molecules for synthesis in the design-make-test cycle. In addition, as binding free energy calculations are increasingly used for validating and benchmarking force field models and free energy methods4, 11, 7, 50, the reduced statistical errors afforded by NetBFE will make the benchmark assessments more reliable and thus accelerate the improvements in in silico binding assays.
References
- Chodera et al. 2011 Chodera, J.; Mobley, D.; Shirts, M.; Dixon, R.; Branson, K.; Pande, V. Alchemical Free Energy Methods for Drug Discovery: Progress and Challenges. Current Opinion in Structural Biology 2011, 21, 150–160
- Mobley and Klimovich 2012 Mobley, D. L.; Klimovich, P. V. Perspective: Alchemical Free Energy Calculations for Drug Discovery. The Journal of Chemical Physics 2012, 137, 230901
- Abel et al. 2017 Abel, R.; Wang, L.; Harder, E. D.; Berne, B. J.; Friesner, R. A. Advancing Drug Discovery through Enhanced Free Energy Calculations. Accounts of Chemical Research 2017, 50, 1625–1632
- Schindler et al. 2020 Schindler, C. E. M.; Baumann, H.; Blum, A.; Böse, D.; Buchstaller, H.-P.; Burgdorf, L.; Cappel, D.; Chekler, E.; Czodrowski, P.; Dorsch, D.; Eguida, M. K. I.; Follows, B.; Fuch, T.; Grädler, U.; Gunera, J.; Johnson, T.; Jorand Lebrun, C.; Karra, S.; Klein, M.; Knehans, T.; Koetzner, L.; Krier, M.; Leiendecker, M.; Leuthner, B.; Li, L.; Mochalkin, I.; Musil, D.; Neagu, C.; Rippmann, F.; Schiemann, K.; Schulz, R.; Steinbrecher, T.; Tanzer, E.-M.; Unzue Lopez, A.; Viacava Follis, A.; Wegener, A.; Kuhn, D. Large-Scale Assessment of Binding Free Energy Calculations in Active Drug Discovery Projects. Journal of Chemical Information and Modeling 2020, 60, 5457–5474
- Song and Merz 2020 Song, L. F.; Merz, K. M. Evolution of Alchemical Free Energy Methods in Drug Discovery. Journal of Chemical Information and Modeling 2020, 60, 5308–5318
- Cournia et al. 2020 Cournia, Z.; Allen, B. K.; Beuming, T.; Pearlman, D. A.; Radak, B. K.; Sherman, W. Rigorous Free Energy Simulations in Virtual Screening. Journal of Chemical Information and Modeling 2020, 60, 4153–4169
- Mey et al. 2020 Mey, A. S. J. S.; Allen, B. K.; Macdonald, H. E. B.; Chodera, J. D.; Hahn, D. F.; Kuhn, M.; Michel, J.; Mobley, D. L.; Naden, L. N.; Prasad, S.; Rizzi, A.; Scheen, J.; Shirts, M. R.; Tresadern, G.; Xu, H. Best Practices for Alchemical Free Energy Calculations [Article v1.0]. Living Journal of Computational Molecular Science 2020, 2, 18378
- Tsai et al. 2020 Tsai, H.-C.; Tao, Y.; Lee, T.-S.; Merz, K. M.; York, D. M. Validation of Free Energy Methods in AMBER. Journal of Chemical Information and Modeling 2020, 60, 5296–5300
- Lee et al. 2020 Lee, T.-S.; Allen, B. K.; Giese, T. J.; Guo, Z.; Li, P.; Lin, C.; McGee, T. D.; Pearlman, D. A.; Radak, B. K.; Tao, Y.; Tsai, H.-C.; Xu, H.; Sherman, W.; York, D. M. Alchemical Binding Free Energy Calculations in AMBER20: Advances and Best Practices for Drug Discovery. Journal of Chemical Information and Modeling 2020, 60, 5595–5623
- He et al. 2020 He, X.; Liu, S.; Lee, T.-S.; Ji, B.; Man, V. H.; York, D. M.; Wang, J. Fast, Accurate, and Reliable Protocols for Routine Calculations of Protein–Ligand Binding Affinities in Drug Design Projects Using AMBER GPU-TI with ff14SB/GAFF. ACS Omega 2020, 5, 4611–4619
- Wang et al. 2015 Wang, L.; Wu, Y.; Deng, Y.; Kim, B.; Pierce, L.; Krilov, G.; Lupyan, D.; Robinson, S.; Dahlgren, M. K.; Greenwood, J.; Romero, D. L.; Masse, C.; Knight, J. L.; Steinbrecher, T.; Beuming, T.; Damm, W.; Harder, E.; Sherman, W.; Brewer, M.; Wester, R.; Murcko, M.; Frye, L.; Farid, R.; Lin, T.; Mobley, D. L.; Jorgensen, W. L.; Berne, B. J.; Friesner, R. A.; Abel, R. Accurate and Reliable Prediction of Relative Ligand Binding Potency in Prospective Drug Discovery by Way of a Modern Free-Energy Calculation Protocol and Force Field. Journal of the American Chemical Society 2015, 137, 2695–2703
- Shih et al. 2020 Shih, A. Y.; Hack, M.; Mirzadegan, T. Impact of Protein Preparation on Resulting Accuracy of FEP Calculations. Journal of Chemical Information and Modeling 2020, 60, 5287–5289
- Chatterjee et al. 2017 Chatterjee, P.; Botello-Smith, W. M.; Zhang, H.; Qian, L.; Alsamarah, A.; Kent, D.; Lacroix, J. J.; Baudry, M.; Luo, Y. Can Relative Binding Free Energy Predict Selectivity of Reversible Covalent Inhibitors? Journal of the American Chemical Society 2017, 139, 17945–17952, PMID: 29124934
- Roos et al. 2019 Roos, K.; Wu, C.; Damm, W.; Reboul, M.; Stevenson, J. M.; Lu, C.; Dahlgren, M. K.; Mondal, S.; Chen, W.; Wang, L.; Abel, R.; Friesner, R. A.; Harder, E. D. OPLS3e: Extending Force Field Coverage for Drug-Like Small Molecules. Journal of Chemical Theory and Computation 2019, 15, 1863–1874
- Lu et al. 2021 Lu, C.; Wu, C.; Ghoreishi, D.; Chen, W.; Wang, L.; Damm, W.; Ross, G. A.; Dahlgren, M. K.; Russell, E.; Von Bargen, C. D.; Abel, R.; Friesner, R. A.; Harder, E. D. OPLS4: Improving Force Field Accuracy on Challenging Regimes of Chemical Space. Journal of Chemical Theory and Computation 2021,
- Vanommeslaeghe et al. 2010 Vanommeslaeghe, K.; Hatcher, E.; Acharya, C.; Kundu, S.; Zhong, S.; Shim, J.; Darian, E.; Guvench, O.; Lopes, P.; Vorobyov, I.; Mackerell Jr., A. D. CHARMM General Force Field: A Force Field for Drug-Like Molecules Compatible with the CHARMM All-Atom Additive Biological Force Fields. Journal of Computational Chemistry 2010, 31, 671–690
- Wang et al. 2004 Wang, J.; Wolf, R. M.; Caldwell, J. W.; Kollman, P. A.; Case, D. A. Development and Testing of a General Amber Force Field. Journal of Computational Chemistry 2004, 25, 1157–1174
- Halgren 1996 Halgren, T. A. Merck Molecular Force Field. I. Basis, Form, Scope, Parameterization, and Performance of MMFF94. Journal of Computational Chemistry 1996, 17, 490–519
- Horta et al. 2016 Horta, B. A. C.; Merz, P. T.; Fuchs, P. F. J.; Dolenc, J.; Riniker, S.; Hünenberger, P. H. A GROMOS-Compatible Force Field for Small Organic Molecules in the Condensed Phase: The 2016H66 Parameter Set. Journal of Chemical Theory and Computation 2016, 12, 3825–3850
- Christ and Fox 2014 Christ, C. D.; Fox, T. Accuracy Assessment and Automation of Free Energy Calculations for Drug Design. Journal of Chemical Information and Modeling 2014, 54, 108–120
- Loeffler et al. 2015 Loeffler, H. H.; Michel, J.; Woods, C. FESetup: Automating Setup for Alchemical Free Energy Simulations. Journal of Chemical Information and Modeling 2015, 55, 2485–2490
- Fu et al. 2018 Fu, H.; Gumbart, J. C.; Chen, H.; Shao, X.; Cai, W.; Chipot, C. BFEE: A User-Friendly Graphical Interface Facilitating Absolute Binding Free-Energy Calculations. Journal of Chemical Information and Modeling 2018, 58, 556–560
- Lundborg and Lindahl 2015 Lundborg, M.; Lindahl, E. Automatic GROMACS Topology Generation and Comparisons of Force Fields for Solvation Free Energy Calculations. The Journal of Physical Chemistry B 2015, 119, 810–823
- Klimovich and Mobley 2015 Klimovich, P. V.; Mobley, D. L. A Python Tool to Set Up Relative Free Energy Calculations in GROMACS. Journal of Computer-Aided Molecular Design 2015, 29, 1007–1014
- Gapsys et al. 2015 Gapsys, V.; Michielssens, S.; Seeliger, D.; de Groot, B. L. Pmx: Automated Protein Structure and Topology Generation for Alchemical Perturbations. Journal of Computational Chemistry 2015, 36, 348–354
- Senapathi et al. 2020 Senapathi, T.; Suruzhon, M.; Barnett, C. B.; Essex, J.; Naidoo, K. J. BRIDGE: An Open Platform for Reproducible High-Throughput Free Energy Simulations. Journal of Chemical Information and Modeling 2020, 60, 5290–5295
- Boresch et al. 2003 Boresch, S.; Tettinger, F.; Leitgeb, M.; Karplus, M. Absolute Binding Free Energies: A Quantitative Approach for Their Calculation. The Journal of Physical Chemistry B 2003, 107, 9535–9551
- Mobley et al. 2007 Mobley, D. L.; Graves, A. P.; Chodera, J. D.; McReynolds, A. C.; Shoichet, B. K.; Dill, K. A. Predicting Absolute Ligand Binding Free Energies to A Simple Model Site. Journal of Molecular Biology 2007, 371, 1118–1134
- Woo and Roux 2005 Woo, H.-J.; Roux, B. Calculation of Absolute Protein–Ligand Binding Free Energy from Computer Simulations. Proceedings of the National Academy of Sciences of the United States of America 2005, 102, 6825–30
- Aldeghi et al. 2016 Aldeghi, M.; Heifetz, A.; Bodkin, M. J.; Knapp, S.; Biggin, P. C. Accurate Calculation of the Absolute Free Energy of Binding for Drug Molecules. Chemical Science 2016, 7, 207–218
- Reddy et al. 1994 Reddy, M. R.; Varney, M. D.; Kalish, V.; Viswanadhan, V. N.; Appelt, K. Calculation of Relative Differences in the Binding Free Energies of HIV1 Protease Inhibitors: A Thermodynamic Cycle Perturbation Approach. Journal of Medicinal Chemistry 1994, 37, 1145–1152
- Reddy and Erion 2001 Reddy, M. R.; Erion, M. D. Calculation of Relative Binding Free Energy Differences for Fructose 1,6-Bisphosphatase Inhibitors Using the Thermodynamic Cycle Perturbation Approach. Journal of the American Chemical Society 2001, 123, 6246–6252
- Cournia et al. 2017 Cournia, Z.; Allen, B.; Sherman, W. Relative Binding Free Energy Calculations in Drug Discovery: Recent Advances and Practical Considerations. Journal of Chemical Information and Modeling 2017, 57, 2911–2937
- Xu 2019 Xu, H. Optimal Measurement Network of Pairwise Differences. Journal of Chemical Information and Modeling 2019, 59, 4720–4728
- Shirts and Chodera 2008 Shirts, M. R.; Chodera, J. D. Statistically Optimal Analysis of Samples from Multiple Equilibrium States. The Journal of Chemical Physics 2008, 129, 124105
- Shenfeld et al. 2009 Shenfeld, D. K.; Xu, H.; Eastwood, M. P.; Dror, R. O.; Shaw, D. E. Minimizing Thermodynamic Length to Select Intermediate States for Free-Energy Calculations and Replica-Exchange Simulations. Physical Review E 2009, 80, 046705
- Yang et al. 2020 Yang, Q.; Burchett, W.; Steeno, G. S.; Liu, S.; Yang, M.; Mobley, D. L.; Hou, X. Optimal Designs for Pairwise calculation: An Application to Free Energy Perturbation in Minimizing Prediction Variability. Journal of Computational Chemistry 2020, 41, 247–257
- Liang et al. 2013 Liang, J.; Tsui, V.; Van Abbema, A.; Bao, L.; Barrett, K.; Beresini, M.; Berezhkovskiy, L.; Blair, W. S.; Chang, C.; Driscoll, J.; Eigenbrot, C.; Ghilardi, N.; Gibbons, P.; Halladay, J.; Johnson, A.; Kohli, P. B.; Lai, Y.; Liimatta, M.; Mantik, P.; Menghrajani, K.; Murray, J.; Sambrone, A.; Xiao, Y.; Shia, S.; Shin, Y.; Smith, J.; Sohn, S.; Stanley, M.; Ultsch, M.; Zhang, B.; Wu, L. C.; Magnuson, S. Lead Optimization of a 4-Aminopyridine Benzamide Scaffold To Identify Potent, Selective, and Orally Bioavailable TYK2 Inhibitors. Journal of Medicinal Chemistry 2013, 56, 4521–4536
- Scheuermann et al. 2015 Scheuermann, T. H.; Stroud, D.; Sleet, C. E.; Bayeh, L.; Shokri, C.; Wang, H.; Caldwell, C. G.; Longgood, J.; MacMillan, J. B.; Bruick, R. K.; Gardner, K. H.; Tambar, U. K. Isoform-Selective and Stereoselective Inhibition of Hypoxia Inducible Factor-2. Journal of Medicinal Chemistry 2015, 58, 5930–5941
- Liu et al. 2013 Liu, S.; Wu, Y.; Lin, T.; Abel, R.; Redmann, J. P.; Summa, C. M.; Jaber, V. R.; Lim, N. M.; Mobley, D. L. Lead Optimization Mapper: Automating Free Energy Calculations for Lead Optimization. Journal of Computer-Aided Molecular Design 2013, 27, 755–770
- Stephen Boyd 2004 Stephen Boyd, L. V. Convex Optimization, Chapter 7; Cambridge University Press: Cambridge, 2004
- 42 Xu, H. DiffNet. https://github.com/forcefield/DiffNet
- Hopkins et al. 2015 Hopkins, C. W.; Le Grand, S.; Walker, R. C.; Roitberg, A. E. Long-Time-Step Molecular Dynamics through Hydrogen Mass Repartitioning. Journal of Chemical Theory and Computation 2015, 11, 1864–1874
- Ryckaert et al. 1977 Ryckaert, J.-P.; Ciccotti, G.; Berendsen, H. J. Numerical integration of the cartesian equations of motion of a system with constraints: molecular dynamics of n-alkanes. Journal of Computational Physics 1977, 23, 327–341
- Miyamoto and Kollman 1992 Miyamoto, S.; Kollman, P. A. Settle: An analytical version of the SHAKE and RATTLE algorithm for rigid water models. Journal of Computational Chemistry 1992, 13, 952–962
- Grønbech-Jensen and Farago 2013 Grønbech-Jensen, N.; Farago, O. A Simple and Effective Verlet-Type Algorithm for Simulating Langevin Dynamics. Molecular Physics 2013, 111, 983–991
- Åqvist et al. 2004 Åqvist, J.; Wennerström, P.; Nervall, M.; Bjelic, S.; Brandsdal, B. O. Molecular Dynamics Simulations of Water and Biomolecules with a Monte Carlo Constant Pressure Algorithm. Chemical Physics Letters 2004, 384, 288–294
- Maier et al. 2015 Maier, J. A.; Martinez, C.; Kasavajhala, K.; Wickstrom, L.; Hauser, K. E.; Simmerling, C. ff14SB: Improving the Accuracy of Protein Side Chain and Backbone Parameters from ff99SB. Journal of Chemical Theory and Computation 2015, 11, 3696–3713
- Case et al. 2020 Case, D. A.; Belfon, K.; Ben-Shalom, I. Y.; Brozell, S. R.; Cerutti, D. S.; Cheatham III, T. E.; Cruzeiro, V. W. D.; Darden, T. A.; Duke, R. E.; Giambasu, G.; Gilson, M. K.; Gohlke, H.; Goetz, A. W.; Harris, R.; Izadi, S.; Izmailov, S. A.; Kasavajhala, K.; Kovalenko, K.; Krasny, R.; Kurtzman, T.; Lee, T.; Le-Grand, S.; Li, P.; Lin, C.; Liu, J.; Luchko, T.; Luo, R.; Man, V.; Merz, K.; Miao, Y.; Mikhailovskii, O.; Monard, G.; ; Nguyen, H.; Onufriev, A.; Pan, F.; Pantano, S.; Qi, R.; Roe, D. R.; Roitberg, A.; Sagui, C.; Schott-Verdugo, S.; Shen, J.; Simmerling, C. L.; Skrynnikov, N.; Smith, J.; Swails, J.; Walker, R. C.; Wang, J.; Wilson, R. M.; Wolf, R. M.; Wu, X.; Xiong, Y.; Xue, Y.; York, D. M.; Kollman, P. A. AMBER 20. 2020; https://ambermd.org/
- Hahn et al. 2021 Hahn, D. F.; Bayly, C. I.; Macdonald, H. E. B.; Chodera, J. D.; Mey, A. S. J. S.; Mobley, D. L.; Benito, L. P.; Schindler, C. E. M.; Tresadern, G.; Warren, G. L. Best Practices for Constructing, Preparing, and Evaluating Protein-Ligand Binding Affinity Benchmarks. 2021; https://arxiv.org/abs/2105.06222