This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Correlation-enhanced viable core in metabolic networks

Mi Jin Lee Department of Applied Physics, Hanyang University, Ansan 15588, Korea    Sudo Yi School of Computational Sciences, Korea Institute for Advanced Study, Seoul 02455, Korea    Deok-Sun Lee [email protected] School of Computational Sciences, Korea Institute for Advanced Study, Seoul 02455, Korea Center for AI and Natural Sciences, Korea Institute for Advanced Study, Seoul 02455, Korea
Abstract

Cellular ingredient concentrations can be stabilized by adjusting generation and consumption rates through multiple pathways. To explore the portion of cellular metabolism equipped with multiple pathways, we categorize individual metabolic reactions and compounds as viable or inviable: A compound is viable if processed by two or more reactions, and a reaction is viable if all of its substrates and products are viable. Using this classification, we identify the maximal subnetwork of viable nodes, referred to as the viable core, in bipartite metabolic networks across thousands of species. The obtained viable cores are remarkably larger than those in degree-preserving randomized networks, while their broad degree distributions commonly enable the viable cores to shrink gradually as reaction nodes are deleted. We demonstrate that the positive degree-degree correlations of the empirical networks may underlie the enlarged viable cores compared to the randomized networks. By investigating the relation between degree and cross-species frequency of metabolic compounds and reactions, we elucidate the evolutionary origin of the correlations.

I Introduction

Living organisms adapt to fluctuating environments over generations, resulting in the evolution of cellular metabolism through the creation, deletion, and modification of metabolic pathways Horowitz (1945); Yčas (1974); Jensen (1976); Scossa and Fernie (2020). The stochastic nature of evolution, along with different environments that ancestor species have been exposed to, creates differences among contemporary species. On the other hand, metabolism should be able to execute common key functions essential to life via a common subset of metabolic pathways. Consequently, metabolic reactions (enzymes) and compounds exhibit heterogeneity in the number of species containing them in the metabolism Bernhardsson et al. (2011); Kim et al. (2015, 2019) and functional robustness Lemke et al. (2004); Almaas et al. (2005); Grimbs et al. (2007); Smart et al. (2008); Some are in charge of key functions and thus well preserved across species and stable in activity against environmental perturbations while others are variable, interacting closely with the environment Matias Rodrigues and Wagner (2009); Pál et al. (2006). Living organisms should allocate finite resources to diverse cellular functions optimally towards the maximal growth and reproduction Scott et al. (2010), and thus identifying stable and variable part of metabolism can reveal the principle of metabolic resource allocation under the tension between robustness and adaptability Smart et al. (2008).

A stable component of metabolism should consist of the compounds that can maintain steady concentrations against possible perturbations by balancing generation and consumption rates Papoutsakis (1984); Fell and Small (1986). To identify the stable component, one may need to examine the fluxes of all metabolic reactions in all possible conditions, which is however hard experimentally though numerical simulations could be available in limited cases Lee et al. (2009). Here we represent metabolism of each species as a bipartite network connecting reaction nodes to their substrate and product compound nodes by undirected links, assuming that all reactions are reversible, and obtain the stable component by applying a structural criterion. Such a network representation of metabolism Jeong et al. (2000) has been widely used, successfully illustrating key principles of cellular metabolism organization Wuchty and Almaas (2005); Francke et al. (2005); Stelling et al. (2002); Gianchandani et al. (2006); Lee et al. (2008); Lee (2010). In this work we classify individual reaction and compound nodes into viable and inviable ones based on their local connectivity pattern relevant to supporting dynamic stability, applying the criteria introduced in Refs. Lemke et al. (2004); Smart et al. (2008) and suited for our undirected bipartite metabolic networks. Then we define the maximal induced-subgraph consisting of these viable nodes as the viable core, which can be a candidate for the stable component. The procedures to prune inviable nodes and obtain the viable core are the bipartite-network counterpart of the greedy leaf removal process considered for a unipartite network in Ref. Liu et al. (2012).

Our study shows that more (less) resources are allocated for robustness (adaptability) in the real metabolic networks than expected based only on the node degrees. The viable cores of 5469 species from the BioCyc database Karp et al. (2017) are significantly larger than those in the randomized networks sharing the same degree sequences. Individual reactions show different likelihood to belong to the core; Some reactions belong to the core in all species while others do so in just few species. The functional form of the degree distributions can drastically alter how the core shrinks as reactions are removed randomly, which is revealed by the analytic solutions to the viable cores of uncorrelated networks. By generating artificial correlated networks, we find that a positive correlation between the degrees of connected nodes can enlarge the viable core, and to understand it, we investigate how the degree-degree correlation affects the branching ratio of inviable nodes in the pruning process. Indeed, positive degree-degree correlations exist in the real metabolic networks, supporting their large cores. Investigating the relation between the cross-species frequencies and degrees of metabolic reactions, we discuss how living species can acquire the degree-degree correlations during evolution.

This paper is organized as follows. In Sec. II, we define the viable core of a metabolic network and the viability of a node, and study them in the empirical and randomized networks. The effects of correlations on the core size are investigated, along with the cross-species properties that offer an evolutionary perspective in Sec. III. We summarize and discuss the implications of our results in Sec. IV.

II Viable core and viability

We construct the bipartite metabolic networks for the species included in the version 19.1 of the BioCyc database Karp et al. (2017), covering a total of 5469 bacterial species, 7695 distinct compounds, and 12077 reactions. A species’ network is made by connecting each chemical reaction present in the species to its associated compounds, substrates or products; for instance, if a reaction rr converts two substrates c1c_{1} and c2c_{2} into two products c3c_{3} and c4c_{4}, i.e., c1+c2c3+c4c_{1}+c_{2}\to c_{3}+c_{4}, then we assign four undirected links (r,c1),(r,c2),(r,c3)(r,c_{1}),(r,c_{2}),(r,c_{3}), and (r,c4)(r,c_{4}) Jeong et al. (2000); Deville et al. (2003). Considering the possibility of the reactions’ direction to be reversed depending on the environment and the concentrations of substrates and products, we consider all reactions as reversible and thus assign undirected links. We find 1393±4541393\pm 454 reaction nodes and 1499±4371499\pm 437 compound nodes in a network. The number of neighbor nodes, i.e., the connected reactions (compounds) of a compound (reaction), is called degree and denoted by kk (qq).

Refer to caption
Figure 1: Viable cores of metabolic networks. (a) An example of the viable core consisting of viable nodes in a bipartite metabolic network. The two rightmost nodes are inviable, which are pruned to identify the core. The viable core (b) of Klebsiella pneumoniae and (c) of its degree-preserving randomized network. (d) The relative size of the viable core vsv^{s} for 5469 species and (e) the reaction viability vrv_{r} for 12077 reactions. They are plotted versus the counterparts averaged over 100 instances of degree-preserving randomized networks for each species. Color indicates the number of data points. The solid lines are y=xy=x for a guide.

II.1 Definition and computation of viable core

Given our goal of characterizing a stable subset of metabolism by a structural criterion, we exploit the fact that a compound can maintain constant concentration only if at least one reaction produces it and another consumes it, which can support adjusting generation and consumption rates respectively, as introduced in Refs. Lemke et al. (2004); Smart et al. (2008). Translating this condition for our undirected networks, we define a compound cc as viable in a given network if its degree kck_{c}, the number of reactions involving it and thus connected to it, is equal to or larger than two (kc2k_{c}\geq 2); one reaction can generate and another can consume the compound under the assumption that the reactions are reversible. If a compound has no or just a single reaction as its neighbor, then it is considered as inviable. A reaction node rr is defined as viable if all of its qrq_{r} connected compound nodes are viable; A reaction can occur only if none of its substrates and products are inviable. We present the examples of viable and inviable nodes in Fig. 1(a).

Then one can obtain the maximal induced-subgraph consisting of such viable compounds and viable reactions, which we define as the viable core. One can consider the viable core as the subset of metabolism which can maintain constant fluxes and concentrations of the reactions and compounds involved once they are established; Whether the viable core will reach such a steady state depends on the initial condition and the environment.

The viable core of a network can be obtained by removing iteratively all inviable nodes as follows. We first remove inviable compounds, i.e., the compound nodes of degree smaller than two. Their connected reaction nodes then become inviable and are removed. After this removal of inviable compounds and reactions in the first step, there can appear new inviable compounds, which had degree k2k\geq 2 but now have k<2k<2 due to the loss of some of their connected reactions in the previous step. These new inviable compound nodes and their connected reaction nodes are removed in the second step. Such pruning is repeated until there are no more inviable nodes left, when the remaining compound and reaction nodes constitute the viable core. An example is presented in Fig. 1(a). This pruning process is almost the same as the greedy leaf removal (GLR) process considered in Ref. Liu et al. (2012), in which the nodes of degree smaller than two and all of their neighbor nodes are removed iteratively in unipartite networks without distinguishing the type of nodes.

II.2 Enhanced viable core and viability

Applying the procedure described in the previous subsection, we have obtained the viable core of the metabolic network for each species. As an example, the viable core of Klebsiella pneumoniae is shown in Fig. 1(b), which includes 18 viable reactions and 29 viable compounds among a total of 62 reactions and 91 compounds. To understand how large or small the viable core of a species ss is, we consider its relative size vsv^{s}, the ratio of the number of nodes in the viable core to the total number of nodes. We compare it with the average relative size vsran\langle v^{s}\rangle_{\rm ran} of the viable cores of the randomized networks, obtained by rewiring links while preserving the degrees of individual nodes; Each instance of these randomized networks, shown in Fig. 1(c) as an example, has been created by swapping LsL^{s} times the end nodes of two randomly chosen links (LsL^{s} is the total number of links for species ss) and we have generated one hundred such instances for each species. One can see that the viable core of the real network of Klebsiella pneumoniae [Fig. 1(b)] is much larger than that of a randomized network [Fig. 1(c)]. This holds for all species; The empirical core size ranges between 0.1 and 0.5 across species, and vs>vsranv^{s}>\langle v^{s}\rangle_{\rm ran} (P-value less than 6×1066\times 10^{-6}) for every species ss as shown in Fig. 1(d). This result means that the core is larger than expected based on the given degree sequences when others are random, and suggests that the network structure beyond the degree sequence plays a role in forming the viable core. Such enhanced stability has been reported in different contexts for a few selected species, e.g., it has been shown that the cascades of failures are more constrained in the real metabolic networks than in the randomized networks Smart et al. (2008); Güell et al. (2014). Also, as we will show below, highly heterogeneous degrees of compounds Jeong et al. (2000) strongly influence the size of the viable core.

We define the viability of each reaction vrv_{r} as the fraction of the species having the reaction in their viable cores. It can quantify the likelihood that a reaction belongs to the viable core and is thus stable. In Fig. 1(e), it is shown that a significant number of reactions (5549/1207746%5549/12077\simeq 46\%) do not belong to the viable core in any species, while their viability in the randomized networks is mostly non-zero and distributed broadly—5530 reactions have vr=0v_{r}=0 but vrran>0\langle v_{r}\rangle_{\rm ran}>0. On the other hand, the other 6528 reactions have non-zero viability in the empirical networks and tend to have lower viability in the degree-preserving randomized networks, i.e., vr>vrranv_{r}>\langle v_{r}\rangle_{\rm ran}. These findings indicate the stronger heterogeneity of reaction’s viability in empirical networks than in randomized networks and imply that the viable cores of real metabolic networks are primarily formed by selected high-viability reactions.

II.3 Viable core under random failure

Refer to caption
Figure 2: Viable core under random failure. (a) Size of the viable core as a function of the fraction QQ of removed reaction nodes in the original, degree-preserving randomized, and link-preserving randomized networks for Escherichia coli. Lines represent the analytic results from Eq. (3). Error bars are smaller than the data points. (b) Distribution of the characteristic size of the viable core under failure for the original networks of all species is located in the right of that for the degree-preserving randomized networks. Distribution of the degree of (c) compounds and of (d) reactions are shown, scaled by the mean degrees k¯s\bar{k}^{s} and q¯s\bar{q}^{s} and averaged over all species ss.

We have shown that the real-world metabolic networks have larger stable components than expected for the randomized networks. Will it hold also even with perturbations? In terms of the viable core, we can seek the answer by measuring the viable core in the damaged networks.

To be specific, we remove a fraction QQ of randomly chosen reaction nodes to mimic their malfunction due to e.g., the inactivation or mutation of the enzymes catalyzing them, and then compute the viable core. For Escherichia coli [Fig. 2(a)], the real damaged networks have larger cores than the damaged degree-preserving randomized networks in the whole range of QQ. For all species ss, we measure vs(Q)v^{s}(Q) and vsran(Q)\langle v^{s}\rangle_{\rm ran}(Q) as functions of QQ in the original and degree-preserving randomized networks, respectively. We find that the characteristic core size vs~01vs(Q)𝑑Q\widetilde{v^{s}}\equiv\int_{0}^{1}v^{s}(Q)dQ, an average of vv in the range 0Q10\leq Q\leq 1, of the real damaged networks is much larger than that of the damaged randomized networks vs~ran\widetilde{\langle v^{s}\rangle}_{\rm ran} as shown by comparing their distributions over species [Fig. 2(b)].

Since the degree-preserving randomized networks are obtained by shuffling connection among nodes while preserving nodes’ degrees, they have no or very weak correlation between the degrees of connected nodes Park and Newman (2003); Catanzaro et al. (2005). For such uncorrelated networks, the size of the viable core can be obtained analytically by solving self-consistent equations formulated in terms of the degree distributions as done in Refs. Dorogovtsev et al. (2006); Liu et al. (2012); Lee et al. (2023) or the adjacency matrix, called the message-passing method, as done recently in Ref. Bianconi and Dorogovtsev . Here we take the former approach to understand the influence of the degree distribution on the core size in uncorrelated networks.

Following Ref. Liu et al. (2012), let us first consider a compound node reached by following a link from a viable reaction node and denote the probability of the compound node to be inviable by α\alpha. The compound node will be inviable if all the remaining neighboring reaction nodes, except the viable one at the end of the followed link, are inviable. Next, let us consider a reaction node reached by following a link from a viable compound node, and denote the probability of the reaction node to be inviable by β\beta. The reaction node will be viable if (i) it is not removed by the random removal of QQ fraction of reaction nodes and (ii) all of its neighboring compound nodes are viable. These reasonings lead us to α=kkPcmp(k)k¯βk1\alpha=\sum_{k}\frac{kP_{\rm cmp}(k)}{\bar{k}}\beta^{k-1} and 1β=(1Q)qqPrxn(q)q¯(1α)q11-\beta=(1-Q)\sum_{q}\frac{qP_{\rm rxn}(q)}{\bar{q}}(1-\alpha)^{q-1}, where Pcmp(k)[Prxn(q)]P_{\rm cmp}(k)\,[P_{\rm rxn}(q)] and k¯(q¯)\bar{k}\,(\bar{q}) are the degree distribution and the mean degree of compound (reaction) nodes, respectively. Introducing the generating functions Gcmp(z)k=0Pcmp(k)zkG_{\rm cmp}(z)\equiv\sum_{k=0}^{\infty}P_{\rm cmp}(k)z^{k} and Grxn(z)q=0Prxn(q)zqG_{\rm rxn}(z)\equiv\sum_{q=0}^{\infty}P_{\rm rxn}(q)z^{q}, one can represent the previous equations as

α\displaystyle\alpha =Gcmp(β)k¯,\displaystyle={G^{\prime}_{\rm cmp}(\beta)\over\bar{k}},
1β\displaystyle 1-\beta =(1Q)Grxn(1α)q¯.\displaystyle=(1-Q){G^{\prime}_{\rm rxn}(1-\alpha)\over\bar{q}}. (1)

Recalling that a reaction node is viable if it is not removed and all of its neighboring compound nodes are viable and a compound node is viable if it has at least two neighbor reaction nodes that are viable, one can find the probability vrxn(vcmp)v_{\rm rxn}\,(v_{\rm cmp}) of a reaction (compound) node to be viable as

vrxn\displaystyle v_{\rm rxn} =(1Q)qPrxn(q)(1α)q=(1Q)Grxn(1α),\displaystyle=(1-Q)\sum_{q}P_{\rm rxn}(q)(1-\alpha)^{q}=(1-Q)G_{\rm rxn}(1-\alpha),
vcmp\displaystyle v_{\rm cmp} =k2Pcmp(k)s=2k(ks)(1β)sβks\displaystyle=\sum_{k\geq 2}P_{\rm cmp}(k)\sum_{s=2}^{k}\binom{k}{s}(1-\beta)^{s}\beta^{k-s}
=k2Pcmp(k)[1βkk(1β)βk1]\displaystyle=\sum_{k\geq 2}P_{\rm cmp}(k)\left[1-\beta^{k}-k(1-\beta)\beta^{k-1}\right]
=1Gcmp(β)(1β)Gcmp(β).\displaystyle=1-G_{\rm cmp}(\beta)-(1-\beta)G^{\prime}_{\rm cmp}(\beta). (2)

Finally, the relative size vv of the viable core is given by

v=ncmpvcmp+nrxnvrxnv=n_{\rm cmp}v_{\rm cmp}+n_{\rm rxn}v_{\rm rxn} (3)

with ncmp(nrxn)n_{\rm cmp}\,(n_{\rm rxn}) being the fraction of compound (reaction) nodes, satisfying ncmp+nrxn=1n_{\rm cmp}+n_{\rm rxn}=1, in the considered metabolic network.

The solutions to Eqs. (1), (2), and (3) are dependent only on the degree distributions Pcmp(k)P_{\rm cmp}(k) and Prxn(q)P_{\rm rxn}(q). The degree distributions averaged over species are shown in Figs. 2(c) and 2(d). The broad degree distributions for compounds are universal, which leads the viable cores to shrink gradually as QQ increases as shown in Fig. 2(a) Liu et al. (2012); Dorogovtsev et al. (2006). Such a gradual shrink may not be the case when the degree distribution is narrow and the mean degrees are sufficiently large. In the completely randomized networks that are obtained by assigning links to randomly selected node pairs and thus preserve the total number of links but do not preserve the degree sequences, the viable core size drops abruptly to a small value at a critical value QcQ_{c} [Fig. 2(a)], denoted by ‘link-preserving’. With the degree distributions given in the Poissonian form, one can find that such discontinuous transitions can occur if the mean degrees are larger than e=2.718182e=2.718182\ldots, i.e., k¯>e\bar{k}>e and q¯>e\bar{q}>e Liu et al. (2012); Dorogovtsev et al. (2006). See Appendix A for derivation. We find 5463/5469 \simeq 99.8% species satisfy this condition.

Refer to caption
Figure 3: Effects of degree-degree correlations on the size of viable cores. (a) Positive correlations in empirical networks. The color plot of the joint pdf P(k~c,q~r)P(\tilde{k}_{c},\tilde{q}_{r}) of the standardized degrees of the compound node cc and the reaction node rr at both ends of a link, averaged over species, is shown. The conditional mean q~r(k~c)=𝑑q~rq~rP(q~r|k~c)\langle\tilde{q}_{r}\rangle(\tilde{k}_{c})=\int d\tilde{q}_{r}\tilde{q}_{r}P(\tilde{q}_{r}|\tilde{k}_{c}) is represented by white circles along with the standard deviation represented by white bars. In the range 1k~c0-1\lesssim\tilde{k}_{c}\lesssim 0 where the pdf P(k~c)P(\tilde{k}_{c}), shown in the upper part, is large, the increase of q~r(k~c)\langle\tilde{q}_{r}\rangle(\tilde{k}_{c}) with k~c\tilde{k}_{c} is noticeable. (b) The relative size vv of the viable core in the degree-preserving correlated networks of given assortativity ρ\rho. The solid line represents the mean and the shade represents the standard deviation. (c) Branching ratio gg in the correlated networks as a function of assortativity. The solid lines represent the mean values and the shades the standard deviation. Inset: The compound-to-reaction branching ratio gcmprxng_{{\rm cmp}\to{\rm rxn}} and the reaction-to-compound branching ratio grxncmpg_{{\rm rxn}\to{\rm cmp}} versus ρ\rho.

III Origin of enhanced viable cores

The significantly larger viable cores than expected in the degree-preserving randomized networks suggest that correlations or higher-order structural characteristics beyond the degree sequences play a crucial role in the formation of the viable core. In this section, we explore the correlation of the degrees of connected nodes in real metabolic networks and study how they can affect the viable core and the possible origin of such correlations.

III.1 Positive degree-degree correlations

Our analysis shows that the degrees qq and kk of the end nodes, reaction- and compound-type respectively, of a link are positively correlated in the empirical networks. Their Pearson correlation coefficient or assortativity is defined for each species ss as  Newman (2002),

ρs=c,rAcrsLs(kcsknnsσcmps)(qrsqnnsσrxns),\rho^{s}=\sum_{c,r}{A_{cr}^{s}\over L^{s}}\left({k_{c}^{s}-k_{\rm nn}^{s}\over\sigma_{\rm cmp}^{s}}\right)\left({q_{r}^{s}-q_{\rm nn}^{s}\over\sigma_{\rm rxn}^{s}}\right), (4)

where AcrsA_{cr}^{s} is the adjacency matrix of the metabolic network, and knnsc,r(Acrs/Ls)kcsk_{\rm nn}^{s}\equiv\sum_{c,r}(A_{cr}^{s}/L^{s})k_{c}^{s} and qnnsc,r(Acrs/Ls)qrsq_{\rm nn}^{s}\equiv\sum_{c,r}(A_{cr}^{s}/L^{s})q_{r}^{s} are the mean degrees of the end nodes of a link, and σcmps[c,r(Acrs/Ls)(kcsknns)2]1/2\sigma_{\rm cmp}^{s}\equiv[\sum_{c,r}(A_{cr}^{s}/L^{s})(k_{c}^{s}-k_{\rm nn}^{s})^{2}]^{1/2} and σrxns[c,r(Acrs/Ls)(qrsqnns)2]1/2\sigma_{\rm rxn}^{s}\equiv[\sum_{c,r}(A_{cr}^{s}/L^{s})(q_{r}^{s}-q_{\rm nn}^{s})^{2}]^{1/2} are the standard deviations. The assortativity of the real metabolic networks is given by ρemp0.13±0.02\rho_{\rm emp}\simeq 0.13\pm 0.02 and significant with P values less than 0.01 for almost all species. Also we compute the probability density function (pdf) of the standardized degrees k~cs=kcsknnsσcmps\tilde{k}_{c}^{s}={k_{c}^{s}-k_{\rm nn}^{s}\over\sigma_{\rm cmp}^{s}} and q~rs=qrsqnnsσrxns\tilde{q}_{r}^{s}={q_{r}^{s}-q_{\rm nn}^{s}\over\sigma_{\rm rxn}^{s}}, and the conditional mean q~s(k~)=c,r(Acrs/Ls)q~rsδ(k~csk~)/c,r(Acrs/Ls)δ(k~csk~)\langle\tilde{q}^{s}\rangle(\tilde{k})=\sum_{c,r}(A_{cr}^{s}/L^{s})\tilde{q}_{r}^{s}\delta(\tilde{k}_{c}^{s}-\tilde{k})/\sum_{c,r}(A_{cr}^{s}/L^{s})\delta(\tilde{k}_{c}^{s}-\tilde{k}). Note that the assortativity is one of the moments of the pdf, ρ=𝑑k~𝑑q~P(k~,q~)k~q~\rho=\int d\tilde{k}d\tilde{q}P(\tilde{k},\tilde{q})\tilde{k}\tilde{q}. In Fig. 3(a) are shown the pdf and the conditional mean averaged over species. One can see that q~(k~)\langle\tilde{q}\rangle(\tilde{k}) grows with k~\tilde{k} in the most probable range 1k~0-1\lesssim\tilde{k}\lesssim 0 supporting the positive correlation between q~\tilde{q} and k~\tilde{k}.

To scrutinize the influence of the degree-degree correlation on the size of the viable core, we generate an ensemble of artificial degree-preserving correlated networks that exhibit a given assortativity. To be specific, for given species ss, we perform the Monte-Carlo sampling of different configurations—different adjacency matrices AcrA_{cr}—while preserving the given degree sequence by swapping the end nodes of two links with a Hamiltonian =Jc,rAcrkcsqrs\mathcal{H}=-J\sum_{c,r}A_{cr}k_{c}^{s}q_{r}^{s}. The assortativity is measured in the equilibrated networks for each JJ. Note that the degree-preserving randomized networks are generated with J=0J=0. The relative size vv of the viable cores of these degree-preserving correlated networks, averaged over species, increases as assortativity ρ\rho increases over a wide range of ρ\rho, i.e., 0.5ρ0.2-0.5\lesssim\rho\lesssim 0.2, which includes most of the empirical values of assortativity [Fig. 3(b)]. This demonstrates the contribution of the positive degree-degree correlations to the enhancement of the viable cores in the empirical networks. Yet, we also note that they are not the only cause of the core enhancement; The size of the empirical viable cores, v=0.4±0.04v=0.4\pm 0.04, is larger than v=0.22±0.02v=0.22\pm 0.02 of those artificial correlated networks at the empirical values of assortativity ρemp\rho_{\rm emp}.

To understand why a positive (negative) correlation makes a larger (smaller) viable core for a given degree sequence, we investigate the branching ratio gg of inviable compounds between adjacent steps of pruning; If there are C1C_{1} inviable compounds in the current step and after removing them and their connected reactions nodes, there appear C2C_{2} newly inviable compounds in the next step, the branching ratio will be given by gC2C1g\equiv{C_{2}\over C_{1}}. One can expect a larger (smaller) core with a smaller (larger) branching ratio. To see the dependence of the branching ratio on assortativity, we decompose the branching ratio into the compound-to-reaction one and the reaction-to-compound one as

g=gcmprxngrxncmpwithgcmprxnR1C1andgrxncmpC2R1,g=g_{{\rm cmp}\to{\rm rxn}}g_{{\rm rxn}\to{\rm cmp}}\\ \ {\rm with}\ g_{{\rm cmp}\to{\rm rxn}}\equiv{R_{1}\over C_{1}}\ {\rm and}\ g_{{\rm rxn}\to{\rm cmp}}\equiv{C_{2}\over R_{1}}, (5)

where R1R_{1} is the number of the current-step inviable reaction nodes, those connected to the C1C_{1} current-step inviable nodes, and C2C_{2} is the number of the next-step inviable nodes that will be made inviable only in the next step by pruning in the current step.

If a network has a negative degree-degree correlation, the compound-to-reaction branching ratio gcmprxng_{{\rm cmp}\to{\rm rxn}} can be small; Inviable compounds may be concentrated, being connected to a few hub reaction nodes. Each of those current-step inviable reaction nodes has many neighbor compounds and thus can leave many next-step inviable compounds, resulting in large grxncmpg_{{\rm rxn}\to{\rm cmp}}. In contrast, gcmprxng_{{\rm cmp}\to{\rm rxn}} cannot be so small in a network with a positive degree-degree correlation since inviable compounds are distributed, each being connected to each different reaction node of a small degree. Those current-step inviable reaction nodes have small degrees and thus each will leave a small number of next-step inviable compounds, resulting in small grxncmpg_{{\rm rxn}\to{\rm cmp}}.

These reasonings can be summarized as follows: As assortativity increases, gcmprxng_{{\rm cmp}\to{\rm rxn}} increases but grxncmpg_{{\rm rxn}\to{\rm cmp}} decreases. This is confirmed in our artificial correlated networks. In the inset of Fig. 3(c), we show the two branching ratios obtained by computing C1,R1C_{1},R_{1}, and C2C_{2} in the early-stage of pruning in each correlated network and averaging over species, which behave as functions of ρ\rho as expected within a range of interest. A crucial observation is that the increase of gcmprxng_{{\rm cmp}\to{\rm rxn}} with ρ\rho is much weaker than the decrease of grxncmpg_{{\rm rxn}\to{\rm cmp}} with ρ\rho. Such quantitative difference in the dependence on ρ\rho results in the decrease of gg, the product of the two branching ratios, with increasing ρ\rho; g0.4g\simeq 0.4 at ρ=0.4\rho=-0.4 while g0.35g\simeq 0.35 at ρ=0.1\rho=0.1.

Such dependencies of the branching ratios and the core size on assortativity help us understand how a structural correlation can influence the viable core. We should remark that the behaviors of v(ρ)v(\rho) and g(ρ)g(\rho) may be changed in extremely correlated networks; vv decreases as ρ\rho increases in the range ρ0.5\rho\lesssim-0.5 and ρ0.2\rho\gtrsim 0.2 [Fig. 3(b)] and gg increases with ρ\rho in the range ρ0.2\rho\gtrsim 0.2 [Fig. 3(c)]. The investigation of these deviations may be interesting and need further investigation. We also note that the empirical branching ratio g=0.2±0.03g=0.2\pm 0.03 is smaller than the ratio g0.34g\simeq 0.34 from the artificial correlated networks at the empirical value of assortativity, implying again the influence of higher-order structure on the viable core and the branching ratio of pruning.

Refer to caption
Figure 4: Correlation of the degree and frequency of compounds and reactions in real metabolic networks. (a) The color plot of the joint pdf P(fc,k~c)P(f_{c},\tilde{k}_{c}) of the frequency and the standardized degree of the compound node cc at one end of a link, averaged over species, is shown. The conditional mean k~c(fc)=𝑑k~ck~cP(k~c|fc)\langle\tilde{k}_{c}\rangle(f_{c})=\int d\tilde{k}_{c}\tilde{k}_{c}P(\tilde{k}_{c}|f_{c}) is represented by white circles along with the standard deviation represented by white bars. k~c(fc)\langle\tilde{k}_{c}\rangle(f_{c}) increases abruptly with fcf_{c} in the range 0.9fc10.9\lesssim f_{c}\lesssim 1 in which the pdf P(fc)P(f_{c}) is large. (b) The joint pdf P(fr,q~r)P(f_{r},\tilde{q}_{r}) of the frequency and the standardized degree of the reaction node rr at one end of a link is shown, along with the conditional mean q~r(fr)\langle\tilde{q}_{r}\rangle(f_{r}). In the range q~r0.8\tilde{q}_{r}\gtrsim 0.8 with large P(q~r)P(\tilde{q}_{r}), q~r(fr)\langle\tilde{q}_{r}\rangle(f_{r}) exhibits a noticeable increase with frf_{r}. (c) The joint pdf P(fc,fr)P(f_{c},f_{r}) of the compound and the reaction at both ends of a link. The conditional mean fr(fc)\langle f_{r}\rangle(f_{c}) increases persistently with fcf_{c}.

III.2 Evolutionary origin of positive correlations

Given that a positive degree-degree correlation can enlarge the viable core and thereby enhance stability, one can suspect that an evolutionary pressure might have been imposed towards generating the positive correlations in the real metabolic networks during evolution. While it is a simple and compelling scenario, here we present evidence that a neutral evolution of metabolism, without an evolutionary pressure, over a growing tree of species on a long-time scale can generate the positive correlations.

The frequency frf_{r} (fcf_{c}) of a reaction rr (compound cc) to be present in a species’ metabolism among species has been found to be so heterogeneous  Bernhardsson et al. (2011); Kim et al. (2015, 2019) that its distribution takes a power-law form with exponent one Kim et al. (2015, 2019); Lee and Lee (2024). It has been shown in Ref. Lee and Lee (2024) that such heterogeneous frequencies of reactions and compounds may originate from their different times of the first appearance in the metabolism of a living species and their inheritance to the descendant species during the co-evolution of metabolism and species tree. In this framework, the early-recruited compounds and reactions have high frequency and moreover, are expected to have larger degrees in the metabolic networks of the species containing them since they have had more chances to be connected to other nodes during evolution than the late-recruited ones. Therefore we can think of the empirical positive degree-degree correlations as originating from the co-recruitment of connected reactions and compounds in the evolving metabolism in a growing species tree.

The empirical data support this expectation. The Pearson correlation coefficient between the frequency fcf_{c} and the degree kcsk_{c}^{s} of a compound at the end of a link in a species is 0.34±0.030.34\pm 0.03 (P 4×106\lesssim 4\times 10^{-6} for all species) with the fluctuation arising among species, demonstrating their significant correlations. In each species, the end compound node of a link is likely to have high frequency and such high-frequency compounds show clearly a positive correlation between the standardized degree k~c\tilde{k}_{c} and the frequency fcf_{c} as shown by their joint pdf and the conditional mean k~c(fc)\langle\tilde{k}_{c}\rangle(f_{c}) [Fig. 4(a)]. The correlation between frf_{r} and qrsq_{r}^{s} of a reaction is 0.06±0.060.06\pm 0.06, which is weaker than the correlation for compounds’ degree and frequency but still significant for many species (P 0.05\lesssim 0.05 for 4130/546976%4130/5469\approx 76\% species). The pdf P(fr,q~r)P(f_{r},\tilde{q}_{r}) and the conditional mean qr(fr)\langle q_{r}\rangle(f_{r}) show that the correlation is particularly significant for reactions of high frequency in Fig. 4(b).

Also, as expected, the frequencies of a connected pair of reaction and compound nodes are strongly correlated; Their Pearson correlation coefficient is 0.42±0.040.42\pm 0.04 (P value 0.0015\lesssim 0.0015 for all species) and the pdf P(fc,fr)P(f_{c},f_{r}), and the conditional mean fr(fc)\langle f_{r}\rangle(f_{c}) clearly show the correlation in Fig. 4(c). Collecting all these results, we find that for a connected pair of a compound node and a reaction node, their degrees are positively correlated with their frequencies respectively and their frequencies are very similar, as expected from the metabolism evolution model as sketched above. All these factors result in the positive correlations of the degrees of the pair and, in turn, contribute to the enhancement of the viable core.

IV Summary and Discussion

Metabolism serves to perform key life processes robustly and at the same time interacts closely with fluctuating environments, and it is an important question how finite cellular resources are divided between such opposite tasks. By the graph-theoretic approach, we have investigated the viable core of the metabolic network, expected to function stably and contribute to homeostasis, across thousands of species to discover that those cores are larger than would be observed if metabolic compounds and reactions were paired randomly. We have presented evidence that the positive degree-degree correlation of a connected pair of a reaction and a compound, identified empirically, can be a cause of those enhanced viable cores. We have also shown that the co-recruitment of reactions and compounds into the metabolism over a growing phylogenetic tree of species during evolution can give rise to such positive correlations.

Here we have focused on aggregate properties, and therefore extending our study to assessing the structural stability of individual elements of metabolism, reactions and compounds, and individual species’ metabolism. We have identified the strikingly different behaviors in the decay of the viable core under node removal between homogeneous and heterogeneous networks and its analytic understanding can be desirable. It will be also interesting to extend our study to directed bipartite metabolic networks obtained by considering the irreversibility of metabolic reactions.

Acknowledgements.
This work was supported by grants from the National Research Foundation of Korea (NRF) funded by the Korean Government [No. NRF-2021R1C1C1007918 (M.J.L.) and NRF-2019R1A2C1003486 (D.-S.L.)], and a KIAS Individual Grants [No. CG079902(D.-S.L) and No. CG074102 (S.Y.)] from Korea Institute for Advanced Study. We are grateful to the Center for Advanced Computation in KIAS for providing computing resources.

References

Appendix A Size of the viable core in completely random bipartite networks under random failure: Discontinuous or continuous transition

Here we study the viable core of the random bipartite networks consisting of the compound- and reaction-type nodes, which have the Poisson degree distributions Dcmp(k)=k¯kk!ek¯D_{\rm cmp}(k)={\bar{k}^{k}\over k!}e^{-\bar{k}} and Drxn(q)=q¯qq!eq¯D_{\rm rxn}(q)={\bar{q}^{q}\over q!}e^{-\bar{q}} with k¯\bar{k} and q¯\bar{q} the mean degrees of both types of nodes. Using their generating functions Gcmp(z)=e(1z)k¯G_{\rm cmp}(z)=e^{-(1-z)\bar{k}} and Grxn(z)=e(1z)q¯G_{\rm rxn}(z)=e^{-(1-z)\bar{q}} in Eq. (1), we obtain

α=ek¯(1β)and 1β=(1Q)eq¯α.\alpha=e^{-\bar{k}(1-\beta)}\ {\rm and}\ 1-\beta=(1-Q)e^{-\bar{q}\alpha}. (6)

The solution α\alpha and β\beta to Eq. (6) are then used in Eqs. (2) and (3) to give the relative size of the viable core.

One can establish from Eq. (6)

α=f(α)withf(α)ek¯(1Q)eq¯α.\alpha=f(\alpha)\ {\rm with}\ f(\alpha)\equiv e^{-\bar{k}(1-Q)e^{-\bar{q}\alpha}}. (7)

The intersection of y=f(α)y=f(\alpha) and y=αy=\alpha in the (α,y)(\alpha,y) plane gives the solution α\alpha. Note that a small (large) value of α\alpha leads to a large (small) core. The function f(α)f(\alpha) increases monotonically with α\alpha. For sufficiently large k¯\bar{k} and q¯\bar{q}, one can see that y=f(α)y=f(\alpha) is convex (f′′(α)>0f^{\prime\prime}(\alpha)>0) for small α\alpha and concave (f′′(α)<0f^{\prime\prime}(\alpha)<0) for large α\alpha, and meets y=αy=\alpha at one, two, or three points depending on QQ as follows. i) When QQ is small, y=f(α)y=f(\alpha) intersects y=αy=\alpha at three points with the leftmost intersection point, giving the solution α\alpha. ii) At Q=QQ=Q_{*}, y=f(α)y=f(\alpha) is tangential to y=αy=\alpha at α\alpha_{*}. iii) For Q>QQ>Q_{*}, they intersect once, close to α=1\alpha=1. Consequently, the solution α\alpha increases from a small value with increasing QQ and jumps to a large value at Q=QQ=Q_{*}, leading to the discontinuous drop in the core size at a threshold QQ_{*} as seen in Fig. 2(a). When the mean degrees are not large enough, y=f(α)y=f(\alpha) meets y=αy=\alpha only once for all values of QQ, and one can see a gradual increase of α\alpha and a gradual decrease of the core size cc with increasing QQ. The threshold for such discontinuous transitions can be identified by noting that at the threshold, the curve y=f(α)y=f(\alpha) is tangential to y=αy=\alpha at α\alpha_{*} and its convexity also changes at α\alpha_{*}. Therefore, at the threshold, we find three relations α=f(α),1=f(α)\alpha_{*}=f(\alpha_{*}),1=f^{\prime}(\alpha_{*}), and 0=f′′(α)0=f^{\prime\prime}(\alpha_{*}) are satisfied. Using Eq. (7), we find that the relations are satisfied when α=e1\alpha_{*}=e^{-1}, q¯=e\bar{q}=e, and k¯=e1Q\bar{k}={e\over 1-Q_{*}}. The last expression implies that a solution to QQ_{*} can exist between 0 and 11 if k¯>e\bar{k}>e. Therefore we can see that a discontinuous transition in the core size as a function of QQ occurs when the mean degrees are larger than ee, i.e.,

k¯>eandq¯>e.\bar{k}>e\,{\rm and}\,\bar{q}>e. (8)

Appendix B Generating degree-preserving correlated networks

To explore the role of degree-degree correlations in forming a viable core, we generate networks by changing the neighboring of nodes in the original networks while preserving the degree of each node. The preservation of degree sequences makes the moments of degrees constant, resulting in the constant moment-relevant quantities knnsk_{\mathrm{nn}}^{s}, qnnsq_{\mathrm{nn}}^{s}, σcmps\sigma_{\mathrm{cmp}}^{s}, and σrxns\sigma_{\mathrm{rxn}}^{s} in Eq. (4) in all the generated networks for a given species ss, while the term c,rAcrskcsqrs\sum_{c,r}A^{s}_{cr}k^{s}_{c}q^{s}_{r} may be different from network to network.

Therefore, to control the degree-degree correlation in the generated networks, we adopt the Hamiltonian suggested in Ref. Noh (2007), which for the bipartite metabolic network is formulated as

=Jc,rAcrkcsqrs,\mathcal{H}=-J\sum_{c,r}A_{cr}k^{s}_{c}q^{s}_{r}, (9)

where JJ is the coupling strength, kcsk^{s}_{c} and qrsq^{s}_{r} are the degrees of a compound cc and a reaction rr for a given species ss, respectively, and Acr=1A_{cr}=1 if a chemical reaction rr is associated with a compound cc (schematically crc\to r), or Acr=0A_{cr}=0 otherwise (c↛rc\not\to r).

We generated an ensemble of artificial correlated networks for a given JJ by conducting a degree-preserving Monte Carlo (MC) simulation, which involves the swapping of randomly chosen pairs of links. For example, consider a scenario where a pair of links aa and bb, namely, carac_{a}\to r_{a} and cbrbc_{b}\to r_{b} are randomly selected, where ca↛rbc_{a}\not\to r_{b} and cb↛rac_{b}\not\to r_{a}. The swapping can be executed as carbc_{a}\to r_{b} and cbrac_{b}\to r_{a} with ca↛rac_{a}\not\to r_{a} and cb↛rbc_{b}\not\to r_{b}. This swap is accepted with the probability min{1,e()}\min\left\{1,e^{-(\mathcal{H}^{\prime}-\mathcal{H})}\right\}, where \mathcal{H}^{\prime} is the Hamiltonian after the swap is implemented.

Technically, we sample an ensemble of a species for a given JJ as 10 networks every 5 MC steps after stationarity is reached, where link swapping occurs for the total number of links in a single MC step. Consequently, each data point in Figs. 3(c) and (d) is averaged over 10 configurations per species and across all species.