Protein model quality assessment using rotation-equivariant, hierarchical neural networks

Stephan Eismann
Department of Applied Physics
Stanford University
[email protected]
Patricia Suriana†
Department of Computer Science
Stanford University
[email protected]
Bowen Jing
Department of Computer Science
Stanford University
[email protected]
Raphael J.L. Townshend
Department of Computer Science
Stanford University
[email protected]
Ron O. Dror
Department of Computer Science
Stanford University
[email protected]
† Equal contribution

Abstract

Proteins are miniature machines whose function depends on their three-dimensional (3D) structure. Determining this structure computationally remains an unsolved grand challenge. A major bottleneck involves selecting the most accurate structural model among a large pool of candidates, a task addressed in model quality assessment. Here, we present a novel deep learning approach to assess the quality of a protein model. Our network builds on a point-based representation of the atomic structure and rotation-equivariant convolutions at different levels of structural resolution. These combined aspects allow the network to learn end-to-end from entire protein structures. Our method achieves state-of-the-art results in scoring protein models submitted to recent rounds of CASP, a blind prediction community experiment. Particularly striking is that our method does not use physics-inspired energy terms and does not rely on the availability of additional information (beyond the atomic structure of the individual protein model), such as sequence alignments of multiple proteins.

1 Introduction

Proteins, important components of the cell that perform a wide array of functions, consist of long chains of amino acids that fold into compact, globular 3D structures. Determining this 3D structure is critical not only for understanding how proteins function, but also for designing drugs that bind to a protein and alter its activity. Solving protein structures experimentally is difficult, time-consuming, and expensive, which has led to an ever-increasing gap between available sequence data and available experimental structures. This gap amplifies the need for computational approaches that accurately predict protein structure from amino acid sequence.

Despite recent advances in computational methods [1, 2, 3, 4, 5], protein structure prediction remains an unsolved grand challenge. This challenge generally involves two steps: sampling and scoring. Sampling describes the generation of candidate models of protein structure given a sequence. Scoring aims to select the best model from the large pool of candidates, where "best" means closest to the true structure. This latter task of model quality assessment has attracted the application of a number of deep-learning methods in recent years [6, 7, 8, 9, 10].

Here, we introduce a deep-learning scoring function that assesses model quality given just the atomic coordinates and without the use of physics-inspired energy terms or other pre-computed features. Our method has several key characteristics: (1) equivariance to 3D rotations, which allows the network to recognize structural motifs independent of their orientation, (2) hierarchical layers that preserve rotation equivariance, allowing the network to identify structural motifs at many scales, (3) a focus on local interactions at each hierarchical level, reflecting the fact that inter-atomic forces are predominantly local, and (4) learning directly from atomic coordinates rather than mapping to a grid, allowing high spatial resolution even for large structures.

Our method shows state-of-the-art results in ranking protein models submitted to recent Critical Assessment of protein Structure Prediction (CASP) community experiments (CASP11-12) and does not rely on the availability of additional information, such as multiple sequence alignments.

2 Methods

2.1 Dataset

We train and test our method on candidate models submitted to multiple rounds of CASP, a biennial community experiment. CASP [11] addresses the protein structure prediction problem by withholding newly solved experimental structures (referred to as targets) and allowing computational groups to make predictions (referred to as models). Submitted models are released as sets in two stages (20 models per target for stage 1, 150 models per target for stage 2) for Model Quality Assessment (MQA), a specific subcategory of CASP that aims to assess the performance of scoring functions. Model quality is measured in terms of GDT_TS [12] based on the alignment of native structure and candidate model. GDT_TS ranges between 0 and 1, with higher GDT_TS value indicating better model quality.
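To make the quality metric concrete, the following is a minimal sketch of a GDT_TS-style score on the 0 to 1 scale used above. It assumes that the alpha carbon coordinates of model and native structure are already superposed and matched residue by residue; the full GDT procedure additionally searches over superpositions for each distance cutoff, which this sketch omits. The function name `gdt_ts` and the toy coordinates are illustrative only.

```python
import numpy as np

def gdt_ts(model_ca, native_ca, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS for pre-superposed C-alpha coordinates (in angstroms).

    Assumption: both arrays have shape (n_residues, 3) and row i of each
    array refers to the same residue. No superposition search is performed.
    """
    model_ca = np.asarray(model_ca, dtype=float)
    native_ca = np.asarray(native_ca, dtype=float)
    dist = np.linalg.norm(model_ca - native_ca, axis=1)
    # Fraction of residues within each cutoff, averaged over the cutoffs.
    fractions = [(dist <= c).mean() for c in cutoffs]
    return float(np.mean(fractions))

# Toy example: four residues displaced by 0.5, 1.5, 3.0 and 9.0 angstroms.
native = np.zeros((4, 3))
model = np.array([[0.5, 0.0, 0.0],
                  [1.5, 0.0, 0.0],
                  [3.0, 0.0, 0.0],
                  [9.0, 0.0, 0.0]])
score = gdt_ts(model, native)  # (1/4 + 2/4 + 3/4 + 3/4) / 4 = 0.5625
```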

We mirror the setup of the CASP experiment and split the CASP datasets based on protein target and release year. We train and validate our method on the set of models submitted to CASP 5-10 (500 targets for training, 58 for validation). For testing, we consider models submitted to stage 2 for CASP 11 (84 targets) and 12 (40 targets). We relaxed the structure of all models with the SCWRL4 software [13] to improve side-chain conformations prior to feeding the models into our network.
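The split described above can be sketched as follows. The record layout (dictionaries with `target` and `casp_round` keys), the target identifiers, and the way validation targets are supplied are assumptions made for illustration; the text does not state how the 58 validation targets were chosen.

```python
def split_by_target(models, val_targets):
    """Assign candidate models to train/val/test by CASP round and target.

    models: iterable of dicts with (hypothetical) keys 'target' and
        'casp_round'.
    val_targets: set of CASP 5-10 target ids held out for validation
        (the selection rule is an assumption, not taken from the paper).
    """
    splits = {"train": [], "val": [], "test": []}
    for m in models:
        if m["casp_round"] in (11, 12):
            splits["test"].append(m)
        elif 5 <= m["casp_round"] <= 10:
            key = "val" if m["target"] in val_targets else "train"
            splits[key].append(m)
    return splits

# Made-up records: all models of one target always land in the same split.
example_models = [
    {"target": "target_a", "casp_round": 7},
    {"target": "target_a", "casp_round": 7},
    {"target": "target_b", "casp_round": 10},
    {"target": "target_c", "casp_round": 11},
    {"target": "target_d", "casp_round": 12},
]
example_splits = split_by_target(example_models, val_targets={"target_b"})
```

Splitting by target rather than by individual model keeps all candidate models of a protein in one partition, so no target seen in training reappears in validation or testing.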

2.2 Architecture

Our network builds on recent neural network architectures that are specifically designed to learn from 3D atomic structures [14, 15]. Figure 1 illustrates the architecture. At its core are a point-based representation of atoms and multiple layers of rotation-equivariant convolutions; the goal is to predict a quality score for a given protein model. Each atom has an associated feature vector. At input, this is simply the one-hot encoding of its element type (we represent carbon, oxygen, nitrogen, and sulfur atoms). We then perform rotation-equivariant convolutions that result in a new feature vector for each atom. The convolution filters of each layer are constructed from a truncated series of spherical harmonics (up to rotation order $l=2$), such that the filters can recognize structural motifs independent of their orientation or position in space. Convolutions are restricted to the $k$ nearest neighbors of each atom ($k=40$) to reflect the fact that the physical laws governing intra-molecular interactions are local. The subsequent convolution layer outputs features only for the alpha carbon of each amino acid residue. This subsampling operation aggregates information hierarchically and allows the network to recognize structural motifs at different scales. We average the alpha carbon feature vectors to obtain a fingerprint for the entire protein model. (Alternatively, information could be aggregated at the level of a single point using further subsampling operations, see [15]; the optimal choice of hierarchy is likely application dependent.) From this fingerprint, two dense layers (250 and 150 units, ReLU activation) calculate a single scalar quality score.
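The data flow around the convolutions (one-hot input features, nearest-neighbor lookup, and averaging over alpha carbons) can be sketched with NumPy as below. This is not the network itself: the learned convolutions are omitted, the function names are hypothetical, and whether an atom counts as its own neighbor is an assumption.

```python
import numpy as np

ELEMENTS = ("C", "O", "N", "S")  # element types represented at input

def one_hot_elements(elements):
    """One-hot encode a sequence of element symbols, shape (n_atoms, 4)."""
    index = {e: i for i, e in enumerate(ELEMENTS)}
    feats = np.zeros((len(elements), len(ELEMENTS)))
    for row, e in enumerate(elements):
        feats[row, index[e]] = 1.0
    return feats

def knn_indices(coords, k):
    """Brute-force indices of the k nearest atoms for every atom.

    Assumption: the atom itself (distance 0) is included in its own
    neighbor list. The paper uses k=40; small k is used below for brevity.
    """
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    return np.argsort(dist, axis=1, kind="stable")[:, :k]

def fingerprint(atom_feats, is_alpha_carbon):
    """Average the feature vectors of the alpha carbons only."""
    return atom_feats[np.asarray(is_alpha_carbon, dtype=bool)].mean(axis=0)

# Toy example: five atoms on a line, two of them flagged as alpha carbons.
coords = np.array([[0.0, 0, 0], [1.0, 0, 0], [2.0, 0, 0],
                   [3.0, 0, 0], [10.0, 0, 0]])
feats = one_hot_elements(["N", "C", "C", "O", "C"])
neighbors = knn_indices(coords, k=3)
fp = fingerprint(feats, [False, True, False, False, True])
```

In the actual network the fingerprint would be computed from learned per-atom features rather than from the raw one-hot input used here.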

Figure 1: Network architecture. Given 3D coordinates and element types of every atom as input, the network performs rotation-equivariant convolutions over multiple layers. The network first learns features at the level of every atom before we aggregate information at the level of alpha carbons (a subset of all atoms) in the next layer. We subsequently average the features over all alpha carbons to obtain a fingerprint for the entire protein model. This fingerprint is the input to a shallow dense network that outputs a final scalar score.
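Rotation equivariance can be checked numerically on a toy filter. The sketch below maps scalar input features to vector outputs using the unit vector between atoms, which spans the same space as the $l=1$ real spherical harmonics. The fixed Gaussian radial weight, the all-pairs neighborhood, and the name `vector_conv` are simplifying assumptions; the network described above uses learned radial functions, orders up to $l=2$, and nearest-neighbor restriction.

```python
import numpy as np

def vector_conv(coords, scalar_feats, width=5.0):
    """Toy equivariant filter: out_i = sum_{j != i} w(d_ij) * u_ij * f_j.

    u_ij is the unit vector from atom i to atom j and w is a fixed
    Gaussian of the distance d_ij (an assumption; not a learned filter).
    """
    rel = coords[None, :, :] - coords[:, None, :]   # rel[i, j] = r_j - r_i
    dist = np.linalg.norm(rel, axis=-1)
    np.fill_diagonal(dist, np.inf)                  # exclude self-pairs
    unit = rel / dist[..., None]                    # zero on the diagonal
    radial = np.exp(-(dist / width) ** 2)           # zero on the diagonal
    return np.einsum("ij,ijk,j->ik", radial, unit, scalar_feats)

rng = np.random.default_rng(0)
coords = rng.normal(size=(6, 3))
feats = rng.normal(size=6)

# Random rotation matrix from a QR decomposition (determinant fixed to +1).
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(q) < 0:
    q[:, 0] *= -1

# Rotating the input and then filtering equals filtering and then rotating.
out_of_rotated = vector_conv(coords @ q.T, feats)
rotated_out = vector_conv(coords, feats) @ q.T
is_equivariant = np.allclose(out_of_rotated, rotated_out)
```

Because the filter depends only on relative positions, the same check also passes when all coordinates are shifted by a constant vector.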

2.3 Training and Evaluation

We formulate the training as a regression task aiming to predict the quality metric GDT_TS for each model. The Huber loss between the actual and the predicted GDT_TS is used as the loss function. We train with the Adam optimizer in TensorFlow [16] (learning rate $1.25\cdot 10^{-4}$ ) and monitor the loss on the validation set for every epoch. The weights of the best-performing network are then used to evaluate the predictions on the test set. We use Horovod [17] to distribute training across 4 NVIDIA Titan X GPUs.
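For reference, a minimal NumPy sketch of the Huber loss follows. The threshold `delta` is not stated in the text; the value 1.0 used as the default here mirrors the default of TensorFlow's Huber loss and is an assumption about the actual training setup.

```python
import numpy as np

def huber_loss(y_true, y_pred, delta=1.0):
    """Mean Huber loss: quadratic for |error| <= delta, linear beyond.

    delta=1.0 is an assumed default; the paper does not report its value.
    """
    err = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    abs_err = np.abs(err)
    quadratic = 0.5 * err ** 2
    linear = delta * (abs_err - 0.5 * delta)
    return float(np.mean(np.where(abs_err <= delta, quadratic, linear)))

# Toy GDT_TS targets and predictions, with a small delta so that both
# branches are exercised: errors of 0.05 (quadratic) and 0.3 (linear).
example_loss = huber_loss([0.5, 0.8], [0.55, 0.5], delta=0.1)
```

Note that with targets in [0, 1] and `delta=1.0`, the loss reduces to half the mean squared error whenever predictions stay within the same range, so the linear branch only matters for out-of-range predictions or smaller `delta`.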

3 Results

3.1 Model quality assessment

We examine our results on model quality assessment, finding that we generally improve upon state-of-the-art methods (Table 1). We report multiple correlation metrics per method. Global correlation is the correlation between a method's predictions and the GDT_TS scores of all protein models in a given set. Per-target correlation is the correlation between a method's predictions and the GDT_TS scores of the protein models for a given target, averaged over all targets. The two measures provide complementary information about a method's performance. Good global correlation is desirable to judge the absolute quality of a set of candidate models. Per-target correlation indicates a method's ability to distinguish model quality among a set of models for one target (the main scoring challenge). We report Pearson, Spearman, and Kendall correlation coefficients for both global and per-target correlations. Notably, our method also improves upon ProQ3D [10], which uses information on related proteins to make its predictions. (GraphQA can also leverage information on related proteins, but here we compare against the version of GraphQA that only uses structural information.)
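The difference between global and per-target correlation can be illustrated with a small NumPy sketch. The function names and toy scores below are invented for illustration; Kendall's coefficient is computed in its tau-b form, which is an assumption about the variant reported.

```python
import numpy as np

def pearson(x, y):
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

def average_ranks(x):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    x = np.asarray(x, dtype=float)
    ranks = np.empty(len(x))
    ranks[np.argsort(x, kind="stable")] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        tied = x == value
        ranks[tied] = ranks[tied].mean()
    return ranks

def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))

def kendall(x, y):
    """Kendall tau-b via an explicit loop over all pairs (O(n^2))."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    i, j = np.triu_indices(len(x), k=1)
    sx, sy = np.sign(x[i] - x[j]), np.sign(y[i] - y[j])
    return float(np.sum(sx * sy) / np.sqrt(np.sum(sx != 0) * np.sum(sy != 0)))

def global_and_per_target(targets, true, pred, metric):
    """Return (correlation over all models, mean correlation per target)."""
    targets = np.asarray(targets)
    true, pred = np.asarray(true, dtype=float), np.asarray(pred, dtype=float)
    per_target = [metric(true[targets == t], pred[targets == t])
                  for t in np.unique(targets)]
    return metric(true, pred), float(np.mean(per_target))

# Toy case: predictions separate an easy from a hard target (good global
# correlation) but rank the models within each target in reverse order.
targets = ["a", "a", "a", "b", "b", "b"]
true = [0.2, 0.3, 0.4, 0.7, 0.8, 0.9]
pred = [0.4, 0.3, 0.2, 0.9, 0.8, 0.7]
global_r, per_target_r = global_and_per_target(targets, true, pred, pearson)
global_tau, per_target_tau = global_and_per_target(targets, true, pred, kendall)
```

In this toy case the global Pearson correlation is high while the per-target correlation is -1, which is why the two measures are reported side by side.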

Table 1: Comparison with state-of-the-art methods on CASP 11 and 12 in terms of global and mean per-target correlation coefficients (higher is better). The different coefficients are Pearson (r), Spearman ($\rho$), and Kendall ($\tau$). The top performing method for each metric is shown in bold.

	CASP 11, stage 2						CASP 12, stage 2
	Global			Per target			Global			Per target
Method	r	$\rho$	$\tau$	r	$\rho$	$\tau$	r	$\rho$	$\tau$	r	$\rho$	$\tau$
Ours	0.84	0.84	0.65	0.45	0.43	0.31	0.80	0.79	0.59	0.62	0.55	0.39
3DCNN [6]	0.64	0.69	0.48	0.40	0.39	0.27	0.61	0.64	0.46	0.51	0.45	0.32
Ornate [7]	0.63	0.67	0.48	0.39	0.37	0.26	0.67	0.66	0.47	0.49	0.46	0.32
GraphQA [8]	0.82	0.82	0.62	0.38	0.36	0.25	0.81	0.81	0.62	0.61	0.55	0.40
VoroMQA [18]	0.65	0.69	0.51	0.42	0.41	0.29	0.61	0.60	0.45	0.56	0.50	0.36
SBROD [9]	0.55	0.57	0.39	0.43	0.41	0.29	0.47	0.49	0.34	0.61	0.55	0.40
ProQ3D [10]	0.77	0.80	0.59	0.44	0.43	0.30	0.81	0.80	0.60	0.60	0.54	0.39

3.2 Visualization of learned embeddings

In Figure 2, we explore whether the network has learned to encode certain structural motifs. We project the fingerprint (see Figure 1) of each protein model in the test set into a lower-dimensional space using Principal Component Analysis (PCA). Visual inspection reveals that the fingerprint contains information on both inter-atomic interactions and secondary protein structure.
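The projection step can be sketched as a PCA via singular value decomposition. The fingerprint dimension (16) and the random data below are placeholders, since the text does not state the fingerprint size; `pca_project` is a hypothetical helper name.

```python
import numpy as np

def pca_project(fingerprints, n_components):
    """Project row vectors onto their leading principal components.

    Returns the projected coordinates (one row per protein model) and the
    fraction of total variance explained by each retained component.
    """
    x = np.asarray(fingerprints, dtype=float)
    centered = x - x.mean(axis=0)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[:n_components].T
    explained = s[:n_components] ** 2 / np.sum(s ** 2)
    return scores, explained

# Placeholder data: 50 protein models with 16-dimensional fingerprints.
rng = np.random.default_rng(0)
fingerprints = rng.normal(size=(50, 16))
scores, explained = pca_project(fingerprints, n_components=7)

# Column 0 holds PC 1, so a plot in the plane of PC 1 and PC 5 would use
# scores[:, 0] and scores[:, 4].
pc1, pc5 = scores[:, 0], scores[:, 4]
```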

In Figure 2A, we consider the prevalence of van der Waals interactions in each protein model. We use the software GetContacts [19] to identify van der Waals interactions and divide the total number of interactions by the number of amino acid residues per model. We note that protein models cluster based on the prevalence of van der Waals interactions in the plane of principal component (PC) 5 and PC 1. In Figure 2B, we consider the prevalence of alpha helices in a given protein model. We use DSSP [20] to assign each residue in a protein model to one type of secondary structure ('alpha helix', 'beta sheet', or 'other'). We then calculate the fraction of 'alpha helix' residues over all residues in a model. Models cluster based on this fraction when plotted based on PC 4 and PC 7.
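The two per-model quantities used for coloring can be written down directly. The sketch assumes that secondary structure is available as a string of one-letter DSSP codes and that only the code `H` counts as 'alpha helix'; the exact mapping from DSSP codes to the three classes is not given in the text. Both function names are hypothetical.

```python
def helix_fraction(dssp_codes):
    """Fraction of residues labelled alpha helix.

    Assumption: one DSSP code per residue, with 'H' denoting alpha helix.
    """
    if not dssp_codes:
        raise ValueError("empty secondary-structure string")
    return sum(code == "H" for code in dssp_codes) / len(dssp_codes)

def contacts_per_residue(n_contacts, n_residues):
    """Number of van der Waals interactions normalized by model length."""
    return n_contacts / n_residues

# Toy model with 10 residues: 4 helix, 4 strand, 2 unassigned.
example_fraction = helix_fraction("HHHHEEEE--")
example_density = contacts_per_residue(n_contacts=25, n_residues=10)
```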

4 Discussion

In this work, we presented a hierarchical deep-learning method to assess the quality of candidate models of protein structure. Our method learns end-to-end from the 3D atomic coordinates of protein models without the use of any physics-inspired or statistical energy terms. Thanks to the rotation equivariance of the network filters, the orientation in which motifs and models are presented to the network does not matter.

Our results on the CASP datasets indicate improved global and per-target GDT_TS correlations compared to previous approaches, including approaches that use additional information such as multiple-sequence alignments.

The fact that our network learns to predict a quality score given solely the atomic coordinates of a single protein model makes it suitable to guide sampling in protein structure modelling algorithms such as Rosetta [21], which we leave as future work.

References

[1] A. W. Senior, R. Evans et al., Improved protein structure prediction using potentials from deep learning, Nature 577, 706 (2020).
[2] J. Yang, R. Yan et al., The I-TASSER suite: protein structure and function prediction, Nature Methods 12, 7 (2015).
[3] L. J. McGuffin, J. D. Atkins et al., IntFOLD: an integrated server for modelling protein structures and functions from amino acid sequences, Nucleic Acids Research 43, W169 (2015).
[4] Z. Wang, J. Eickholt and J. Cheng, MULTICOM: a multi-level combination approach to protein structure prediction and its assessments in CASP8, Bioinformatics 26, 882 (2010).
[5] D. Xu and Y. Zhang, Ab initio protein structure assembly using continuous structure fragments and optimized knowledge-based force field, Proteins: Structure, Function, and Bioinformatics 80, 1715 (2012).
[6] J. Hou, R. Cao and J. Cheng, Deep convolutional neural networks for predicting the quality of single protein structural models, bioRxiv (2019).
[7] G. Pagès, B. Charmettant and S. Grudinin, Protein model quality assessment using 3D oriented convolutional neural networks, Bioinformatics 35, 3313 (2019).
[8] F. Baldassarre, D. Menéndez Hurtado et al., GraphQA: protein model quality assessment using graph convolutional networks, Bioinformatics (2020).
[9] M. Karasikov, G. Pagès and S. Grudinin, Smooth orientation-dependent scoring function for coarse-grained protein quality assessment, Bioinformatics 35, 2801 (2019).
[10] K. Uziela, D. M. Hurtado et al., ProQ3D: Improved model quality assessments using deep learning, Bioinformatics (2017).
[11] A. Kryshtafovych, T. Schwede et al., Critical assessment of methods of protein structure prediction (casp)—round xiii, Proteins: Structure, Function, and Bioinformatics 87, 1011 (2019).
[12] A. Zemla, Č. Venclovas et al., Processing and evaluation of predictions in CASP4, Proteins: Structure, Function, and Bioinformatics 45, 13 (2001).
[13] G. G. Krivov, M. V. Shapovalov and R. L. Dunbrack Jr, Improved prediction of protein side-chain conformations with SCWRL4, Proteins: Structure, Function, and Bioinformatics 77, 778 (2009).
[14] N. Thomas, T. Smidt et al., Tensor field networks: Rotation- and translation-equivariant neural networks for 3D point clouds, arXiv preprint arXiv:1802.08219 (2018).
[15] S. Eismann, R. J. Townshend et al., Hierarchical, rotation-equivariant neural networks to predict the structure of protein complexes, arXiv preprint arXiv:2006.09275 (2020).
[16] M. Abadi, A. Agarwal et al., TensorFlow: Large-scale machine learning on heterogeneous systems (2015), Software available from tensorflow.org.
[17] A. Sergeev and M. Del Balso, Horovod: fast and easy distributed deep learning in TensorFlow, arXiv preprint arXiv:1802.05799 (2018).
[18] K. Olechnovič and Č. Venclovas, VoroMQA: Assessment of protein structure quality using interatomic contact areas, Proteins: Structure, Function, and Bioinformatics 85, 1131 (2017).
[19] A. J. Venkatakrishnan, R. Fonseca et al., Uncovering patterns of atomic interactions in static and dynamic structures of proteins, bioRxiv (2019).
[20] W. Kabsch and C. Sander, Dictionary of protein secondary structure: Pattern recognition of hydrogen-bonded and geometrical features, Biopolymers 22, 2577 (1983).
[21] A. Leaver-Fay, M. Tyka et al., Rosetta3: an object-oriented software suite for the simulation and design of macromolecules, Methods in Enzymology 487, 545 (2011).