Data-based stochastic modeling reveals sources of activity bursts in single-cell TGF-β signaling

Niklas Kolbe^{1¹¹1Corresponding author \Yinyang} Lorenz Hexemer^2\Yinyang Lukas-Malte Bammert³ Alexander Loewer⁴ Mária Lukáčová-Medviďová^3‡ Stefan Legewie^5*‡

( ¹Institut für Geometrie und Praktische Mathematik, RWTH Aachen University, Aachen, Germany
²Institute of Molecular Biology (IMB), Mainz, Germany
³Institute of Mathematics, Johannes Gutenberg-University, Mainz, Germany
⁴Systems Biology of the Stress Response, Department of Biology, Technische Universität Darmstadt, Darmstadt, Germany
⁵Department of Systems Biology and Stuttgart Research Center for Systems Biology (SRCSB), University of Stuttgart, Stuttgart, Germany
[email protected], [email protected], [email protected], [email protected], [email protected], [email protected]

\YinyangThese authors contributed equally to this work.
‡These authors also contributed equally to this work. )

Abstract

Cells sense their surrounding by employing intracellular signaling pathways that transmit hormonal signals from the cell membrane to the nucleus. TGF-β/SMAD signaling encodes various cell fates, controls tissue homeostasis and is deregulated in diseases such as cancer. The pathway shows strong heterogeneity at the single-cell level, but quantitative insights into mechanisms underlying fluctuations at various time scales are still missing, partly due to inefficiency in the calibration of stochastic models that mechanistically describe signaling processes. In this work we analyze single-cell TGF-β/SMAD signaling and show that it exhibits temporal stochastic bursts which are dose-dependent and whose number and magnitude correlate with cell migration. We propose a stochastic modeling approach to mechanistically describe these pathway fluctuations with high computational efficiency. Employing high-order numerical integration and fitting to burst statistics we enable efficient quantitative parameter estimation and discriminate models that assume noise in different reactions at the receptor level. This modeling approach suggests that stochasticity in the internalization of TGF-β receptors into endosomes plays a key role in the observed temporal bursting. Further, the model predicts the single-cell dynamics of TGF-β/SMAD signaling in untested conditions, e.g., successfully reflects memory effects of signaling noise and cellular sensitivity towards repeated stimulation. Taken together, our computational framework based on burst analysis, noise modeling and path computation scheme is a suitable tool for the data-based modeling of complex signaling pathways, capable of identifying the source of temporal noise.

Introduction

During development and homeostasis of mammalian tissues, proliferation and differentiation are coordinated among thousands of cells. To communicate with each other, cells use cytokines, a group of extracellular signaling molecules. The cytokine transforming growth factor beta (TGF-β) and other members of the TGF-β superfamily play an important role in tissue homeostasis, as they induce antiproliferative and apoptotic responses in adult cells to effectively limit tissue growth [59, 24]. Furthermore TGF-β signaling induces a loss of cell-cell junctions, cytoskeletal reorganization and migration [42], thereby allowing epithelial cells to evade from their original location by acquiring a migratory, mesenchymal phenotype in a process called epithelial-to-mesenchymal transition (EMT).

Since the TGF-β signaling pathway is involved in such central regulatory processes, its deregulation is associated with diseases like fibrosis and cancer. In cancer progression, TGF-β signaling plays a dual role: In normal cells the pathway typically provides a cytostatic response and thus acts as a tumour suppressor. In early tumor stages, the tumor-suppressive function is frequently lost, whereas in late-stage tumors TGF-β signaling may induce EMT and promote metastasis [27]. Hence, TGF-β signaling undergoes a specificity switch from a tumor-suppressing to a migration-enhancing function. To better understand this specificity switch, deeper insights into cellular information processing in the TGF-β pathway are required.

The molecular mechanisms of TGF-β signal transmission within cells are well characterized: Signaling is initiated by binding of TGF-β to the TGF-β receptor type II (TGFBR2). This receptor then recruits a type I receptor (TGFBR1) to form the receptor-ligand complex, which phosphorylates SMAD2 and SMAD3, the intracellular transducers of the signal. In the cytoplasm, phosphorylated SMAD2/3 proteins form trimers with another protein called SMAD4, and the SMAD trimers translocate into the nucleus to act as transcription factors regulating the expression of downstream target genes. Among the induced genes are both positive and negative feedback regulators, e.g., SMAD7 which inhibits TGF-β receptors by inhibiting their catalytic activity and inducing their degradation [57].

Previous studies indicate that the cellular response to TGF-β stimulation is correlated to the temporal dynamics of SMAD activation [44, 62, 60]. For instance, in cancer cell lines EMT and cell migration seems to be induced upon transient SMAD activation, while the cytostatic response requires sustained signaling [44]. Moreover, we recently analyzed the dynamics of SMAD nuclear translocation in single MCF10A cells using live-cell imaging and found that cells with sustained signaling tended to divide less when compared to cells showing a transient signal [62]. Due to high temporal and spatial resolution, this data allowed us to accurately observe the temporal heterogeneity of SMAD signaling at the single-cell level. Interestingly, we found temporal fluctuations in the nuclear translocation of SMAD2 on an intermediate time scale of 30 min to 2 hours which we refer to as activity bursts in the following. These bursts may be physiologically relevant, as stochastic or periodic pulsatile changes in signaling proteins were found to have an influence on the cellular response for several other pathways including calcium [4], NF-kB [3, 53], ERK [2, 1] and P53 [49] signaling.

The temporal regulation of SMAD translocation is well studied and understood by deterministic mathematical models describing the average dynamics of large cell populations, see e.g. [62, 46, 67, 72, 77]. Using these models, ligand degradation due to receptor internalization and the negative feedback via SMAD7 have been identified as important factors in the termination of signal. However, deterministic models do not account for the temporal activity bursts we observe at the single-cell level. For a better understanding of the pathway dynamics and to reproduce the effect of bursting events on the pathway, stochastic simulations are required. A common but computationally expensive way to integrate stochastic effects in time trajectories is to use the Gillespie algorithm [20], also known as kinetic Monte Carlo method. The algorithm delivers a solution of the underlying chemical master equation by sampling from a large number of realizations [22, 16]. Equivalently, the chemical Langevin equation introduces a system of stochastic differential equations (SDEs), that assumes noise in the concentrations of the chemical species to approximate the stochastic dynamics [35, 21]. Other types of SDE and random ordinary differential equation models have been applied to cellular dynamics for example in [55, 65, 75]. The complexity of the TGF-β system under investigation is reflected by a large number of dynamic variables and parameters acting on different time scales leading to a stiff system of ordinary differential equations (ODEs). This reduces the efficiency of stochastic simulations and complicates the quantitative calibration of stochastic models based on large-scale single-cell datasets.

Species that are present in low copy numbers contribute most to the stochasticity. In contrast, species with large copy numbers or fast reaction rates necessitate smaller time steps in the Gillespie algorithm, thereby increasing the required computing time. In common splitting methods, either the species [51] or the reactions are split into a slow and a fast group [54, 73], or combinations of both [5] under quasi-steady-state or partial equilibrium assumption. In other hybrid methods, based on a separation of time scales, fast components are modeled deterministically while other components are treated with a stochastic model [33]. Further hybrid methods determine their stochastic species based on the variance [39, 25]. Derived from fundamental chemical reactions of simple molecules without conformational changes, all of these methods assume Poisson- or normal distributions of reaction events.

In contrast to these methods we propose a hybrid deterministic-stochastic modeling framework, which allows for multi-parametric stochastic processes corresponding to more complex processes, still isolating the underlying and unknown complexity in a systems biology manner. To this end our efficient approach applies stochasticity by means of SDEs to a subset of selected variables within the system. We further propose an analysis framework that uses species splitting to our advantage since it offers the possibility to attribute sources of noise. To quantitatively reproduce bursting events in the SMAD signaling pathway, we apply an automated detection algorithm enabling us to analyze how burst properties vary across the population. Applying the same burst analysis to both the measured data and the simulations, we define an objective function and thereby compare different model variants that consider stochasticity in distinct parts of the signaling cascade to the data. Thus, we were able to determine the contribution of different pathway reactions to generating activity bursts in SMAD signaling and found that TGF-β receptor internalization may play a key role in this respect. Finally, we compared the predictions of our model to experimental data in untested conditions and show that our model faithfully reproduces the temporal heterogeneity of the SMAD signaling network. The proposed framework may be generally applicable to quantitatively model stochastic biochemical signaling networks based on live-cell imaging data.

Results

Refer to caption — Fig 1: Quantification of bursts in TGF-β-induced SMAD translocation

Quantification of temporal stochasticity in single cells by burst analysis

To analyze activity bursts in TGF-β/SMAD signaling, we used previously published measurements of SMAD2 nuclear translocation at the single-cell level [62]. In these experiments, MCF10A breast epithelial cells stably expressing SMAD2 fused to a fluorescent protein were analyzed by time-resolved live-cell microscopy. The nuclear and cytoplasmic SMAD2 concentrations were quantified over a 24h time interval after stimulation with different concentrations of TGF-β. To determine pathway activity, the ratio of nuclear to cytoplasmic SMAD2 (nuc/cyt SMAD2 ratio) was used as a quantitative measure of the translocation of SMAD2 to the nucleus (see Fig. 1).

The analysis in [62] showed that upon stimulation with a saturating TGF- $\beta$ stimulus (100 pM), the population-average response summarizing 352 single cells shows a transient peak that is followed by a plateau above the baseline (black line in Fig. 1B). In contrast to [62], where heterogeneity in the cell population was investigated, we here focus on temporal heterogeneity. At the single-cell level, we observed fluctuations in the nuc/cyt SMAD2 ratio on three different time scales: On a short time scale of less then 100 minutes, there are temporally uncorrelated fluctuations that most likely reflect measurement noise. On a larger time scale of more than 12 hours, we observed drifts of the trajectories possibly caused by photobleaching or environmental conditions like the density of the cells. On an intermediate time scale we observed changes in nuc/cyt SMAD2 ratio that we interpreted as bursting events in SMAD signaling (cf. Fig. 1B). As shown in [62] and S9 Fig, these events cannot be explained by uncorrelated fluctuations in cytoplasmic SMAD.

To quantify how activity bursts vary between cells and TGF-β doses, we developed a method to automatically detect bursting events on an intermediate time scale between 100 and 300 minutes. Our burst analysis consists of three principal steps (see Methods and Fig. 1C for details): (1) Slow drifts on a larger timescale are identified and subtracted from the signal.(2) Fast and uncorrelated fluctuations below the relevant time scale of bursting events are removed from the signal. (3) The position, height and duration of the remaining events are detected and quantified. We validated this burst detection algorithm using in silico-generated datasets, in which we on purpose introduced bursts resembling the experimentally observed ones (see Methods).

This burst detection workflow was applied to experimental single-cell data recorded at low (2.5 pM) and high (100 pM) doses of TGF-β, or in the absence of stimulation (0 pM). In the stimulation datasets, the first signal detected as a burst corresponds to the timing of the initial peak observed in the population average (black line in Fig. 1B). The amplitude of these single-cell peaks increased in a dose-dependent manner much like the population-average response (compare Fig. 1D). Subsequent activity bursts, enumerated by the burst indices 2-5, reflect stochastic events. These show a substantially lower amplitude than the first burst, but are still elevated in a dose-dependent manner in most cells treated with 2.5 and 100 pM TGF-β when compared to the 0 pM control. Moreover, the total burst count per single cell within the first 24h after stimulation gradually increased with the ligand dose (Fig. 1E). In contrast, the duration of the first and subsequent activity bursts remained essentially constant across varying TGF-β doses (compare Fig. 1F), which agrees to estimations of the signals self-similarity by means of autocorrelations (see S1 Fig and S2 Fig). Taken together, these data suggest that certain burst features change in a dose- and time-dependent manner (burst amplitude, total burst count), whereas others apparently do not depend on the external stimulus (burst duration).

Stochastic burst-like events may contribute to regulating cellular responses to TGF-β stimulation such as cell migration. To quantify the relationship between bursts and cell migration, a motility score for individual cells was calculated as the summed absolute displacement of cells between consecutive time points, as previously described in [62] (see Methods). By relating the motility of individual cells to bursting events we confirmed that cells with higher SMAD burst counts also tend to exhibit a higher motility than cells with lower burst counts, see Fig. 1G. As cells with a lower burst count are also expected to exhibit a lower total SMAD signal, this analysis was performed for various bins summarizing single cells with a similar total area-under-curve (AUC). Even after this correction for the total SMAD signal, cells with a higher total burst count (>12) persisted to exhibit a higher mobility score than those with fewer bursts (<8). This suggest that bursting events in the nuc/cyt SMAD2 ratio are partially correlated to cellular migration and may be causally related to this cellular response.

Modeling burst like behaviour in single cells by generalizing a population-average model

Given the dose-dependent nature of SMAD activity bursts and their potential relation to cell migration, we sought to describe them using a stochastic modeling approach. To this end, we built on a mechanistic model of the pathway published in [62] that is based on ordinary differential equations (ODEs). This model comprises three modules describing TGF-β receptor dynamics [11], SMAD phosphorylation/complex formation [69] and SMAD-induced feedback regulation [36], see Fig. 2A for details.

For a description of bursting, we converted the population-average version of this model into a stochastic description of a complete cell population. This was done by simulating an ensemble of cells, in which each cell exhibits independent stochastic fluctuations in certain kinetic parameter values. The individual cells of this ensemble share a common extracellular TGF-β pool that serves as an input into the single-cell model, but is also in turn influenced by the single cells, as these internalize receptor-ligand complexes and thereby degrade the ligand. Hence, the model consists of interdependent equations describing the extracellular TGF-β and intracellular signaling protein dynamics.

The temporal dynamic of the external input, which is shared among many single cells, is given by

{dL}=f_{L}(t,L,\mathbf{Y}^{1},\dots,\mathbf{Y}^{N},\mathbf{P}^{1},\dots,\mathbf{P}^{N}){\,dt},

(1)

where $L$ denotes the concentration of the free ligand TGF-β and $\mathbf{Y}^{i}$ is a vector including the dynamic concentrations $y^{i}_{1},\dots y^{i}_{22}$ relevant to the SMAD pathway in cell $i$ out of $N$ cells. Similarly, the vector $\mathbf{P}^{i}$ contains the relevant kinetic parameters $p^{1}_{i},\dots,p_{55}^{i}$ in cell $i$ , see Table 3 for details. The right hand side $f_{L}$ of equation (1) describes the addition of TGF-β into the cell culture dish and its degradation due to internalization into cells. The detailed equations governing the function $f_{L}$ are given in Table 2 and the dependence on the population size is shown in S12 Fig..

The intracellular SMAD signaling dynamics in the i-th cell in our ensemble model are governed by the equation

{dy^{i}_{k}}=f_{k}(t,L,\mathbf{Y}^{i},\mathbf{P}^{i}){\,dt}

(2)

for $k=1,\dots,22$ species in $i=1,\dots,N$ cells. The right hand sides $f_{k}$ for $k=1,\dots,22$ account for the reactions inside the cell that depend on the ligand $L$ from the outside, the intracellular dynamical concentrations $\mathbf{Y}^{i}$ and the kinetic parameters $\mathbf{P}^{i}$ . Refer to Table 2 for mathematical details.

To account for bursting at the single-cell level, we introduced noise in a subset of kinetic parameters by employing SDEs. This approach leads to different temporal trajectories of the protein concentrations in each cell considered in (1)-(2). We use the Cox-Ingersoll-Ross (CIR) model [9],

dp_{j}(t)=\theta_{j}(p_{j}^{0}-p_{j})dt+\sigma_{j}\sqrt{p_{j}}dB_{j},

(3)

to describe the parameter change $dp_{j}$ at time point $t$ by a reversion force towards the initial value $p_{j}^{0}$ and a stochastic contribution due to a Brownian motion $B_{j}$ . The standard deviation parameter $\sigma_{j}$ controls the time scale of the fluctuations for the corresponding model parameters $p_{j}$ (see Fig 2B), while the parameter $\theta_{j}$ determines the strength of an additional mean reversion force pulling the concentration back to its initial value $p_{j}^{0}$ .

For a suitable model of noise in kinetic parameter we expect first that parameters never fall below zero and thus become nonphysical. Second, we expect temporal stability of the kinetic parameter in the sense that its expectation remains constant and its variance bounded over time. And third, the model should lead to log-normal-like distribution of kinetic parameters over time (compare Figure 2B) as fluctuations in many cellular processes, e.g., in gene expression, typically lead to this type of distribution [45].

The non-negativity of model (3) is verified (see also Methods for non-negativity of its discretization), although this does not apply to fundamental Brownian motion and the more commonly used Ornstein-Uhlenbeck (OU) model. Moreover, the model preserves the mean and its variance is bound over time which is unlike Brownian noise and random ODE models whose variance steadily increases with time $t$ . We could further verify by simulations that the relative frequency of kinetic parameters over time resembles log-normal distributions (compare Figure 2B). Refer also to S3 Fig and S4 Table for a comparison of the fitted CIR model to OU.

The mean reversion property of the model controlled by the parameter $\theta_{j}$ leads to burst-like dynamics already in the paths of the kinetic parameters (see Figure 2B) and our model allowed us to assess how these dynamics propagate to the experimental readout, the nuc/cyt SMAD2 ratio. This ratio is computed from the dynamic concentrations as the sum of nuclear divided by the cytoplasmic SMAD2 species as described in the Methods. The occurrence and strength of fluctuations in the nuc/cyt SMAD2 ratio predicted by the model depend not only on the mean reversion force and variance of the stochastic kinetic parameter. They are also heavily influenced by the choice of the parameter $p_{j}$ in which stochasticity is introduced. This sensitivity allowed us to discriminate different model assumptions and identify key parameters for which added noise can reproduce the burst features of the single cell data shown in Fig. 1.

Quantitative model calibration by fitting to single-cell datasets

By quantitative fitting to experimental data we aimed to reproduce TGF-β-induced bursting in single cells (Fig. 1). To reduce the complexity and to gain insights into the origin of stochastic fluctuations, we considered block models in which we allowed randomness only in sets of parameters describing similar reactions. Technically, each block model uses (3) only for a subset of kinetic parameters $i\in S\subset\{1,2,\dots,55\}$ . The remaining kinetic parameters are assumed to be temporally constant. For introducing stochastic fluctuations, we focused on reactions on the receptor level, as receptor dynamics are rate-limiting in TGF-β signaling [28, 62]. Moreover, we observed in our published single-cell data that complete inhibition of the TGF-β receptors by a small-molecule inhibitor during a stimulation experiment leads to homogeneous decay of SMAD nuclear translocation that showed no sign of stochastic SMAD dynamics at the single-cell level [62].

We considered five stochastic block models (see Table 3 for details). The receptor/ligand block model introduces noise in the reactions related to the binding of TGF-β to its receptors of type 1 and 2 on the cell surface. The internalization block model allows for stochasticity in the transport of receptors and TGF-β/receptor complexes from the cell membrane into the cell. In the endosomal traffic block model noise is considered in the intracellular transport of internalized ligand/receptor complexes. Stochastic synthesis of feedback proteins and type 1 and 2 TGF-β receptors is assumed in the synthesis block model. Finally, the degradation block model adds systematic noise to the degradation rates of the TGF-β receptors and the feedback regulator.

To compare the stochastic models to the experiments, we applied the burst detection and analysis used to characterize the data in Fig. 1 to the model output. This allowed us to define an objective function, which computes a distance measure between simulated and measured cell populations. This function compares summary statistics of the population snapshot distributions (mean and standard deviation of nuc/cyt SMAD2 ratio, Fig. 3B) as well as single-cell characteristics (count, height and duration of bursts, Fig. 3C). Snapshot summary statistics were computed based on the nuc/cyt SMAD2 ratio of all cells at a given time point using the same (5 min) sampling intervals as in the experimental data. As depicted in Fig. 3A, the objective function is a weighted sum of the distance values resulting from the comparison of these single-cell and population statistics (see Methods for further details). The weights were balanced to consider all single-cell features and the standard deviation of the snapshot distribution equally while strictly penalizing deviations from the the snapshot mean through a larger corresponding weight to ensure that the model reproduces the population-average response to TGF-β stimulation. We validated the objective function by applying it to computationally generated data sets that clearly contained differences in the features we aimed to quantify (not shown). Furthermore, for series of data sets with gradually changing features from a reference data set, we verified that the objective function varied only gradually as well, see Fig. 4B.

To calibrate the stochastic models we simulated the nuc/cyt SMAD2 ratio in 375 cells and compared them to the the nuc/cyt SMAD2 ratio measured in 730 cells upon stimulation with 100 pM TGF-β using the objective function. For the simulation of the cells, we implemented a high-order stiff SDE solver that takes into account both the coupling of the ligand degradation by many cells and the stiffness of the system and confirmed its efficiency and accuracy (see Methods and Fig. 4). Interestingly, even in the absence of TGF-β stimulation stochastic events were observed in SMAD nuclear translocation in the experimental data, possibly due to noisy basal shuttling or due to measurement noise. To account for this in the stochastic models, we have added measurement data (normalized by subtracting the mean) from a control experiment without stimulation to the simulated time paths after their numerical computation. As a result even the model version with constant parameters, to which we refer to as ‘deterministic model’, includes temporal noise.

The kinetic parameters of the deterministic ODE model had been previously fitted to time-resolved measurement data obtained in experiments with different doses of TGF-β in [62]. With the exception of the kinetic parameters in which we introduced noise, we maintained these originally obtained parameters in our stochastic model. Hence, only the stochastic parts of the models had to be estimated using the objective function. Independent noise was assumed for each stochastic parameter $p_{i}$ in a block model, i.e., three values were to be determined per $p_{i}$ (the initial value $p_{0}^{(i)}$ , the noise parameter $\sigma_{i}$ and the reversion parameter $\theta_{i}$ ). Depending on the number of stochastic parameters included in a block model between 3 and 9 parameters were estimated during parameter optimization. The extended scatter search [14] within the software package MEIGO [13] was used to carry out a global parameter search within the parameter space. This evolutionary algorithm uses latin-hypercube sampling to identify an initial population of parameter states, which then evolves by forming combinations between its members. These combination states can replace the original members for the following iteration. The search was stopped when the distance between subsequent populations was lower than a defined level for an extended sequence of iterations. This algorithm together with our high-order path computation scheme could optimize the noise parameters in feasible time.

TGF-β internalization explains temporal stochasticity in nuclear SMAD

The optimization results are summarized in Fig. 5 and Table 1. Distributions of the burst height, width and frequency in all block models are presented in Fig 5B. In Table 1 we show the fitting errors of the block models with respect to the objective function. In Fig 5A various paths of the internalization and the synthesis block models can be compared to the nuc/cyt SMAD2 ratio observed in the experiments.

Table 1: Goodness-of-fit of stochastic block models with respect to components of the objective function

model	burst height				burst duration				count	mean	std.	norm
	model error
deterministic	0.12	0.20	0.50	0.57	0.10	0.03	0.33	0.44	0.87	2.75	5.73	11.64
degradation	0.09	0.05	0.14	0.30	0.00	0.03	0.09	0.30	0.38	0.41	0.43	2.21
end. traffic	0.10	0.09	0.17	0.28	0.03	0.02	0.04	0.16	0.17	4.62	3.76	9.44
internalization	0.01	0.00	0.00	0.00	0.00	0.00	0.00	0.01	0.02	1.19	0.14	1.39
receptor/ligand	0.09	0.10	0.23	0.33	0.04	0.02	0.06	0.22	0.26	3.66	3.79	8.80
synthesis	0.12	0.19	0.47	0.54	0.12	0.03	0.26	0.40	0.76	1.33	5.67	9.90

Distance measurements achieved by the five block models after fitting to experimental data for a stimulation with 100 pM TGF-β. The objective function with final value shown in the last column is the weighted sum of mean and standard deviation of the nuc/cyt SMAD2 ratio, as well as burst characteristics in single cells including count, height and duration, for details see Fig. 3 and Methods. Distance measures for burst height and duration are shown for the first four bursts in each model. In case of single cell statistics the table shows summed contributions of median and standard deviation. For the fitted models no computational instabilities occurred and hence the component accounting for failed simulations is zero. The stochastic internalization model achieved the best fit. An alternative validation in terms of the Akaike information criterion led to consistent conclusions, see S5 Table.

In summary, our results show that the internalization block model achieved the best fit to the data (compare Fig. 5A). All tested models succeeded in being consistent with the population average of the nuc/cyt SMAD2 ratio. However, in terms of temporal stochastic fluctuations significant differences between the models were observed: The block models endosomal traffic, receptor-ligand and synthesis could barely achieve any variability in the observable unlike the internalization and degradation block models (Fig. 5B). Moreover, the latter two models attain single cell statistics similar to those of the data, while the other models underestimate the burst count and the variance in their height and duration.

As reductions of the full complexity of intracellular processes, the stochastic models cannot explain the full variability in the data. Yet, the results are accurate enough to discriminate different models. Our model comparison implies that stochastic processes in the receptor-ligand internalization may play a key role in the temporal heterogeneity of the nuc/cyt SMAD2 ratio in the cell. We hypothesize that the predicted stochastic internalization may arise from low receptor numbers on the cell surface combined simultaneous sorting of dozens of receptors into internalization vesicles [74, 70]. Given that TGF-β receptor become only activated after internalization, the formation of a vesicle will give rise to a sudden increase in the signal and thus to a burst in the nuc/cyt SMAD2 ratio (see Discussion).

Stochastic bursting of SMAD2 is a dose-dependent phenomenon, as the burst number and amplitude are reduced at low TGF-β concentrations, whereas the burst duration essentially remains constant (Fig. 1). Since the stochastic model has been calibrated solely on the 100 pM TGF-β stimulus data, we asked whether it could successfully predict changes in bursting dynamics at 5 and 25 pM TGF-β, respectively. Indeed, we found that the internalization model also generated realistic predictions of population average and single cell responses for these doses, which had not been employed for model fitting (see Fig. 6).

Since the stochastic parameters were not fitted to the new stimulation doses, these predictions show the generality of the model. They are also an indication that the stochasticity within TGF-β internalization is an intrinsic property independent of the strength of the signaling. In other words, the processes that lead to internalization noise do not depend on the stimulation dose and the dose mainly influences how strongly the internalization noise is transferred to the nuc/cyt SMAD2 ratio.

Temporally stable and unstable noise contributions explain variability in response to repeated stimulation

The stochastic noise in our model leads to signaling fluctuations with a short memory, as the bursts typically decay within a few hours after their appearance (Figs. 5A and 6A). As shown in [62], SMAD signaling also exhibits a temporally stable noise component, since signaling in sister cells remains similar over longer time scales of more than a few hours after cell division.

Restimulation experiments, in which the same cell population is subjected to a second TGF-β stimulus 6h after the initial treatment, probe the relative contribution of temporally stable and unstable fluctuations, and are thus an interesting test case for our stochastic model: If the response to the first and second stimuli are highly correlated at the single-cell level, the variability is dominated by temporally stable fluctuations . In contrast, temporal fluctuations lead to uncorrelated restimulation responses and feedback loops in SMAD signaling additionally contribute. To investigate the long-term memory of the pathway, we compared the best-fit internalization model to restimulation experiments published in [62] (see Methods for details).

The model-data comparison was performed at the level of population-average trajectories (Fig. 7, top row) and by relating single-cell peak amplitudes approximately 90 minutes after first and second stimulus , respectively (Fig. 7, bottom row). The stochastic model approximates the measured population-average trajectories reasonably well and reproduces that the second peak reaches only approximately 80% of the magnitude of the first peak, thereby indicating signaling refractoriness after the initial stimulus (Fig. 7A). At the single-cell level, the stochastic model performs less well, as the simulated noise reproduces only 48% of the measured variance of the first peak (initial stimulation), and 62% of the variance of the second peak (restimulation). Furthermore, the first and second peaks upon restimulation are negatively correlated in the model (Pearson correlation of r=-0.19), whereas their correlation is positive in the data (r=0.34) (Fig. 7A, bottom). Taken together, the best-fit stochastic model fails to fully reproduce the restimulation behavior of the SMAD pathway .

In our model, SMAD signaling is controlled by transcriptional negative feedback loops (Fig. 2A) which may introduce pathway refractoriness and are known to reduce signaling variability [46]. To investigate whether the feedback strength may be overestimated in the model , we lowered the parameter of SMAD-dependent feedback induction to 30% of its best-fit value (see also S6 Fig for a prediction of feedback reduction in case of single stimulation). In line with a role in pathway refractoriness, the reduction of feedback increased the second peak at the population-average level (Fig. 7B, top) and diminished the negative correlation of first and second peaks at the single-cell level (Fig. 7B, bottom). Furthermore, the feedback-reduced model showed a higher cell-to-cell variability of both signaling peaks as expected (Fig. 7B, bottom). However, significant model improvements at the single-cell level could not be achieved without strong deviations between model and data at the population-average level (Fig. 7B, top). Therefore, feedback strength alone does not account for differences between model and data, and this model variant was not considered further.

In [62], we showed that temporally stable SMAD signaling fluctuations are caused by differences of signaling protein expression levels across cells. Therefore, we argued that a combination of stochastic noise and stable signaling protein fluctuations may be required to reproduce the restimulation experiments.

To assess the effect of varying initial protein concentrations in the best-fit stochastic model, we focused on total SMAD4 levels, as this parameter showed the strongest difference between cellular subpopulations in our previous work [62]. In Fig. 7C, we homogeneously decreased the total SMAD4 concentration by 40% in all cells of the stochastic model ensemble (green). Compared to the best-fit stochastic model (blue), the SMAD4-perturbed model shows a higher population-average response and a higher peak variance across cells (Fig. 7C). Hence, minor variations in SMAD4 levels cause a noticeable change in the behavior of the stochastic model.

To combine temporally stable SMAD4 fluctuations with stochastic noise, we constructed a cellular ensemble (Fig. 7D, cyan) that consists of two stochastic cell populations, one with the best-fit parameters, and one with a homogeneous 40% reduction in total SMAD4 levels in all cells (Fig. 7C, blue and green). This combined model reproduced the major aspects of the restimulation data including the population-average and the variability of the first and second signaling peaks (Fig. 7D). Furthermore, the single-cell correlation between the signaling peaks was slightly positive (r=0.08), and thus more consistent with the experimental data than the previous stochastic model variants (Figs. 7A-C).

Even though the more realistic simulations proposed in this paper were required to capture all aspects of the restimulation data, these results further support our earlier observation that SMAD signaling involves a temporally stable noise component, likely due to variations in signaling protein levels. In addition, stochastic noise and negative feedback also contribute to the restimulation response, e.g., by dampening the positive correlation of the first and second signaling peaks. Therefore, the pathway responds in a complex and non-deterministic fashion to repeated stimulation.

In summary, our restimulation analysis helped to assess the interplay of noise sources and feedback mechanisms in determining the long-term heterogeneity of SMAD signaling in single cells.

Discussion

In this work, we employed stochastic differential equations to model temporal variability observed in single-cell SMAD signaling as stochastic fluctuations in kinetic parameter values. Assuming a CIR process in TGF-β receptor internalization, the proposed model realistically reproduced noisy signaling at the single-cell level. To quantitatively calibrate the model based on experimental data, we developed a method to detect and quantify bursting events. In combination with an adjusted high-order time integration scheme, we thereby developed an efficient optimization scheme to discriminate different model variants based on single-cell measurements.

In previous modeling studies, the SMAD pathway was mostly described at the population-average level, i.e., the simulations described the dynamics of one average cell [40, 67, 58, 7, 8, 6, 72, 76, 43, 69, 32]. Now that more and more single-cell datasets of SMAD dynamics are available [76, 18, 37, 66], quantitative modeling of cell-to-cell variability in the pathway becomes feasible. In [62], a first attempt was made by sampling the initial protein concentrations in a deterministic model. Thereby, a heterogeneous ensemble of single cells was simulated and the heterogeneity was assumed to be temporally stable. Here, we have now focused on understanding the source of temporal fluctuations observed in live-cell imaging of SMAD2 nucleocytoplasmic shuttling.

Specifically, we analyzed SMAD2 bursting events which may contribute to variability in the cellular response to TGF-β, since cells with more bursts exhibit higher motility in the cell culture dish, clf. Fig. 1G. Apparently, stochastic bursts were previously reported for the nuclear translocation of the closely related SMAD4 protein during the development of Xenopus embryos [71]. Our implementation of automated burst detection allowed us to quantify burst height, width and frequency at the level of individual cells. We found that the number and height of bursting events are dose-dependent, and therefore contain information about the extracellular TGF-β concentration. To describe this bursting phenomenon, we have introduced temporal noise in specific kinetic parameter values and analyzed how this noise propagates to the experimentally observed nuc/cyt SMAD2 ratio, clf. Fig. 2B and 2D.

Quantitative stochastic modeling in biology so far is mostly limited to a description of simple reaction networks, e.g., promoter cycles describing the activity of single genes [47, 50, 41, 19, 38]. Our large-scale signaling pathway model (see Fig. 2) contains 45 kinetic parameters and therefore poses a challenge to stochastic modeling, especially when calibrated based on a single-cell dataset with high temporal resolution (0-24h in 5 min intervals) and a large number of cells (n=730). To overcome this challenge, we proposed a hybrid deterministic-stochastic approach where temporal fluctuations were modeled by a CIR process in a subset of the kinetic parameters. In contrast to the Wiener process, which is commonly used in stochastic modeling, the CIR process guarantees non-negative dynamic variables. The properties of the CIR process are well suited for biological modeling of noise: the temporal distribution of the noise tends to a log-normal distribution, clf. Fig. 2B, which was reported for kinetic rates also from experiments, see for example [45].

In this work we have implemented a time integration scheme that is capable to achieve reliable simulations and fast enough to fit the complex model to detailed data. The scheme’s efficiency has been obtained by high order and semi-implicit time stepping. The non-negativity of the CIR process has further contributed to the stability during parameter estimation. The identification of appropriate noise parameters for our model required a calibration that also takes properties of the fluctuations into account. Instead of comparing full distributions, which could be achieved for example using the Wasserstein distance, we used the first two statistical moments of the burst feature in our objective function. While the duration of the bursts did not exhibit significant differences among several tested models the burst height and their count played a crucial role in their discrimination, clf. Fig. 3B.

Using our burst detection technique, we could discriminate different models and quantify the contribution of one or several parameters to the observed noise. We showed that the receptor internalization model performed better than the degradation and the synthesis models, clf. Fig. 3. In the synthesis model, we observed low variability close to the results of the deterministic model. The degradation model variability is closer to the experimental data in terms of burst statistics, but is still outperformed by the internalization model. These results indicate that bursts might be caused by stochastic events in receptor internalization. In recent papers studying the internalization of Epo receptors and of the yeast methionine transporter Mup1 by live-cell imaging it has indeed been shown that receptor transport processes can exhibit strong heterogeneity [30, 10]. A plausible explanation for noise arising in receptor internalization is that large numbers of receptors on the cell surface get constricted and internalized at once during vesicle formation. Given that TGF-β receptor internalization promotes downstream signaling [74], the formation of a vesicle will give rise to a sudden increase in the signal and thus to a burst in the nuc/cyt SMAD2 ratio (see also S10 Fig. for an alternative internalization model). For EGFR receptors, it was indeed observed that each internalized vesicle in the endosome contains around 100 receptors [68]. Given that TGF-β receptors are present on the cell surface in low amounts of few hundreds to thousand molecules [74, 70] and are internalized via coated pits [15], these internalization bursts may indeed give rise to significant fluctuations in downstream signaling as we predict here.

Our findings about the role of receptor internalization in the SMAD pathway suggest promising ways to derive simplified models with similar burst dynamics in the future (e.g. through quantized state systems methods).

The best-fit internalization model was independently validated by analyzing the bursting behavior at low TGF-β concentrations. The successful prediction of bursting at various TGF-β levels proved that the noise parameters found by fitting to the 100 pM are not specific to this condition. Hence, the internalization noise remained constant across conditions, but its propagation through the signaling network changed, so that the net result is a lower signaling output noise in terms of burst count and amplitude upon weak TGF-β stimulation (Figs. 5 and 6).

The restimulation experiments showed that the stochastic model only partially accounted for the pathway behavior upon repeated TGF-β application (Fig. 7): In the experimental data, the first and second signaling peaks are positively correlated, whereas the model, which focuses on temporally unstable noise, predicted a negative correlation between peaks. By performing detailed simulations, we could show that correlated restimulation behavior is explained by the stochastic model if we additionally assumed a temporally stable noise contribution. As in our previous work [62], this temporally stable noise contribution was implemented as cell-to-cell differences in the total signaling protein concentrations which are important determinants of signaling fluctuations in various pathways [17, 61, 29, 31, 30, 12, 56]. The different time scales of these two noise contributions explain why sister cells rapidly desynchronize their SMAD signaling behavior after a division event (temporally unstable noise), but remain partially similar over long time scales (stable noise) [62]. Likewise, the single-cell correlation of the first and second peaks in a restimulation experiment will likely become more and more dissimilar if the second treatment is delayed relative to the first one. Our simulations have shown that stochastic modeling is a valuable tool for the analysis of such experiments which provides insights into the underlying mechanisms.

To the best of our knowledge, this work is the first approach to model bursting events in signaling pathways using CIR processes in kinetic parameters. Our results demonstrated that this approach is suitable to simulate single-cell dynamics and to analyze the origin of noise in large systems. The optimization framework combining burst detection and an efficient time integration scheme are not limited to TGF-β signaling and can be generalized to other systems in the future.

Methods

Single-cell data

The experimental datasets used in this paper are published in [62] and are available online at the Dryad digital repository [64] and in the GitHub repository [34]. In brief, human MCF10A cells expressing SMAD2 fused to the yellow fluorescent protein mVenus and H2B fused to the cyan fluorescent protein mCerulean under the control of the Ubiquitin C promoter were imaged on a Nikon Ti inverted fluorescence microscope with a CCD camera and a 20x plan apo objective using appropriate filter sets. The microscope was surrounded by a custom enclosure to maintain constant temperature and atmosphere. Images were acquired every 5 minutes for the duration of the experiments. Cells were tracked in the corresponding images using custom-written scripts and fluorescent intensities quantified. For parameter estimation of the stochastic models (Fig. 7), the measured nuc/cyt SMAD2 ratio after stimulation with 100 pM of TGF-β from the published dose-titration experiment (Fig. 2A-D in [62]) was merged with a second, previously unpublished dataset with the same experimental setup (N = 378 cells). For model validation (Fig. 6), we used the nuc/cyt SMAD2 ratios from the 5 pM and 25 pM conditions from the same dose titration experiments. In the restimulation analysis (Fig. 7), the experimental data from repeated 5 pM TGF-β stimulation (Fig. 4F in [64]) was used.

Burst analysis

To successfully apply burst detection to experimental data, we tuned the parameters of the above-mentioned processing steps, such as the bandwidth of the Gaussian filters and the height thresholds of the peak detection. To create an in silico benchmark for burst detection, we generated synthetic trajectories by model simulations and added Gaussian hills as artificial burst events as well as fluctuations from unstimulated cells as noise. We chose the burst detection parameters such that in silico added bursts are detected reliably (high sensitivity: high true positive detection rate), while keeping the detection of bursts in unstimulated cells low (high specificity: low false detection rate). Specifically, we randomly sampled all burst detection parameters and calculated sensitivity and specificity for each burst parameter candidate. Thereby, a receiver operating characteristic (ROC) curve was generated [48] and the best parameter set achieved a false detection rate below 25% and a sensitivity of more than 75% (data not shown).

Objective function

The objective function that we used for the model validation and parameter estimation is a weighted sum of several distance measures accounting for single-cell characteristics and summary statistics of the population distribution, see also Fig. 3. All these distance measures were calculated based on the SMAD2 nuc/cyt ratio which in the model is given by

r=\frac{y_{18}+2y_{19}+3y_{20}+y_{21}}{y_{10}+y_{11}+2y_{13}+3y_{14}}.

(4)

Hereby, we refer to Table 2 for details on the dynamic variables employed.

Single cell characteristics

To quantify the fit in burst height and duration we considered the relative difference between model and data in median and standard deviation across simulated and measured cell population. In more details, consider the empirical population $X$ including the nuc/cyt SMAD2 ratio in $N_{X}$ cells either given through measurement data or the model. We define the distribution of the height of the $i-$ th burst computed with our burst detection by $p_{\text{height}}^{i}(X)$ . We note that a) the burst detection might detect less than $i$ bursts and b) in case of a model population the path computation might fail to compute some of the paths. Hence we assume that $p_{\text{height}}^{i}(X)$ considers the $i$ -th burst height only if a) and b) can be excluded. Moreover we define $r^{i}(X)\geq 0$ to be the relative number of paths where either a) or b) holds for the $i$ -the burst deducting the relative number of paths where a) holds in the data. By denoting measurement data by $Y$ and model results by $X$ we set

d^{i,\text{med}}_{\text{height}}(X,Y)=r^{i}(X)+\left(1-r^{i}(X)\right)\frac{|\tilde{p}_{\text{height}}^{i}(Y)-\tilde{p}_{\text{height}}^{i}(X)|}{|\tilde{p}_{\text{height}}^{i}(Y)+\tilde{p}_{\text{height}}^{i}(X)|},

(5)

where $\tilde{p}$ denotes the median of the distribution $p$ . Hence, the distance measure increases if the burst height is different between model and data, and/or if a) or b) holds for a large number of cells. Similarly, we quantify the distance in terms of standard deviation

d^{i,\text{std}}_{\text{height}}(X,Y)=r^{i}(X)+\left(1-r^{i}(X)\right)\frac{|\sigma(p_{\text{height}}^{i}(Y))-\sigma(p_{\text{height}}^{i}(X))|}{|\sigma(p_{\text{height}}^{i}(Y))+\sigma(p_{\text{height}}^{i}(X))|},

(6)

with $\sigma(p)$ referring to the standard deviation of the distribution $p$ . Denoting by $p_{\text{dur}}^{i}(X)$ the distribution of the duration of the $i-$ th burst in an empirical population $X$ , we can define the distances $d^{i,\text{med}}_{\text{dur}}(X,Y)$ and $d^{i,\text{std}}_{\text{dur}}(X,Y)$ in the same way and obtain the corresponding formulas by exchanging the distributions in (5) and (6). The distance measures with respect to both burst height and burst duration are considered for the first four measured bursts and hence, including median and standard deviation, they account for 16 components in the objective function.

The distance with respect to the burst count is, similarly as their height and duration, measured in terms of relative difference in mean and standard deviation. Denoting by $p_{\text{count}}(X)$ and $p_{\text{count}}(Y)$ the distribution of detected bursts in the model and in the data we computed distance measures in mean and standard deviation like in (5) and (6) and included them in the objective function. In the computation we took the relative number of failed path computations, $r^{0}(X)$ , into account. To avoid parameter ranges that cause instabilities during the parameter estimation we included this rate as a separate component increasing the value of the objective function.

Population mean and standard deviation

The mean and standard deviation of the nuc/cyt SMAD2 ratio across the simulated and measured cell population account for two further components of the objective function. At any fixed point in time these quantities can be computed employing the snapshot distribution across all cells. Thus, for populations $X$ and $Y$ time-dependent means and variances $m_{X}^{t},m_{Y}^{t},\sigma_{X}^{t},\sigma_{Y}^{t}$ can be considered. The global distances of the populations with respect to mean and standard deviation sum the differences over time and take relative values using the formulas

d^{\text{mean}}(X,Y)=\frac{\sum_{t}|m_{X}^{t}-m_{Y}^{t}|}{\sum\limits_{t}|m_{X}^{t}|+\sum\limits_{t}|m_{Y}^{t}|},\quad d^{\text{std}}(X,Y)=\frac{\sum_{t}|\sigma_{X}^{t}-\sigma_{Y}^{t}|}{\sum\limits_{t}|\sigma_{X}^{t}|+\sum\limits_{t}|\sigma_{Y}^{t}|}.

Weighting

A balanced weighting of the introduced distance measures with respect to single-cell characteristics and population statistics allowed us to compare measured and simulated populations in terms of their response to TGF-β stimulation. In particular, we considered the height and duration of the first four bursts and weighted burst height, burst duration, burst count and standard deviation of the observable equally. To prevent deviations from the population mean and failed path computations in the parameter estimation we assigned large weights to the corresponding components. Moreover, we weighted differences in median and mean higher than differences in standard deviation.

In more detail, for measurement data $Y$ and model results $X$ , we defined the output of the objective function as the vector

	$\displaystyle d(X,Y)=\left(\right.$	$\displaystyle d^{1,\text{med}}_{\text{height}}(X,Y),\dots,d^{4,\text{med}}_{\text{height}}(X,Y),\frac{1}{2}\,d^{1,\text{std}}_{\text{height}}(X,Y),\dots,\frac{1}{2}\,d^{4,\text{std}}_{\text{height}}(X,Y),$
		$\displaystyle d^{1,\text{med}}_{\text{dur}}(X,Y),\dots,d^{4,\text{med}}_{\text{dur}}(X,Y),\frac{1}{2}\,d^{1,\text{std}}_{\text{dur}}(X,Y),\dots,\frac{1}{2}\,d^{4,\text{std}}_{\text{dur}}(X,Y),$
		$\displaystyle 4\,d^{\text{mean}}_{\text{count}}(X,Y),2\,d^{\text{std}}_{\text{count}}(X,Y),80\,r^{i}(X),40\,d^{\text{mean}}(X,Y),4\,d^{\text{std}}(X,Y)\left.\right),$

and considered as scalar distance the sum over its squared components.

Path computation

For efficient and accurate numerical approximation of the model (1), (2), (3) we employed a method based on the Itō-Taylor formula of strong order $\gamma=1.5$ from [23]. The formula includes the control parameter $\theta$ , which we chose $\theta=1/2$ resulting in a semi-explicit scheme that can both handle the stiff reaction terms of the SMAD signaling model and resolve the dynamics accurately without the need to compute second-order derivatives. Its deterministic counterpart is known as the Crank-Nicolson-method and commonly used for parabolic differential equations. The scheme in its full form reads

	$\displaystyle L_{n+1}=L_{n}+$	$\displaystyle\frac{\Delta t}{2}f_{L}(t_{n+1},L_{n+1},\mathbf{Y}^{1}_{n+1},\dots,\mathbf{Y}^{N}_{n+1},\mathbf{P}^{1}_{n+1},\dots,\mathbf{P}^{N}_{n+1})$
	$\displaystyle+$	$\displaystyle\frac{\Delta t}{2}f_{L}(t_{n},L_{n},\mathbf{Y^{1}_{n}},\dots,\mathbf{Y}^{N}_{n},\mathbf{P}^{1}_{n},\dots,\mathbf{P}^{N}_{n})$
	$\displaystyle y^{i}_{k,n+1}=y^{i}_{k,n}+$	$\displaystyle\frac{\Delta t}{2}\left[f_{k}(t_{n+1},L_{n+1},\mathbf{Y}^{k}_{n+1},\mathbf{P}^{k}_{n+1})+f_{k}(t_{n},L_{n},\mathbf{Y}^{k}_{n},\mathbf{P}^{k}_{n})\right]+G^{i}_{k,n}$
	$\displaystyle p^{i}_{j,n+1}=p^{i}_{j,n}+$	$\displaystyle\frac{\theta_{k}\,\Delta t}{2}\left[2p^{i}_{j,0}-p^{i}_{j,n+1}-p^{i}_{j,n}\right]+\sigma_{j}\sqrt{p^{i}_{j,n}}\Delta W^{i}_{j,n}$
		$\displaystyle+\frac{\sigma_{j}}{4}\left[(\Delta{B}^{i}_{j,n})^{2}-\Delta t\right]+H^{i}_{j,n},$

for $i=1,\dots,N$ , $k=1,\dots,22$ and $j\in S$ . Here $L_{n}$ , $y^{i}_{k,n}$ and $p^{i}_{j,n}$ denote approximates of the ligand, the dynamic concentrations and the stochastic kinetic parameters at time instance $t^{n}=n\Delta t$ . Moreover, $\Delta t$ refers to the time increment and $\Delta{B}_{j,n}^{i}=\sqrt{\Delta t}\mathcal{M}_{j,n}^{i}$ , where $\mathcal{M}_{j,n}^{i}$ are independent and identically distributed normal random variables with expectation zero and variance one. The high-order correction terms $G^{i}_{k,n}$ and $H^{i}_{j,n}$ require the generation of further random variables and its formulas are given in Table 5.

To solve the nonlinear systems in each time step, we used an adaptive Newton method, where the Newton-error is individually computed for each cell. If this error fell below the error tolerance the corresponding cell was removed from the system in the next Newton iteration. This was iterated until convergence was achieved in each cell. This technique significantly reduced the computational costs. We optimized the performance further by employing a sparse Jacobian that we symbolically pre-computed offline. To handle the stiff reactions caused by the step-like addition of the TGF-β stimulus we employed small time increments of size $\Delta t=0.04$ in the stimulation phase, which included only a minor part of the total time interval. For the remaining time steps we switched to $\Delta t=0.6$ for fast and robust computations.

Although our model leads to non-negative protein concentrations and kinetic parameters, the scheme cannot guarantee non-negativity. To avoid nonphysical parameters in simulations we applied clipping and set computationally obtained negative state variables to zero. Further, in the case of kinetic parameters equal to zero we ignored the high-order corrections in the scheme for the next time step. We experimentally verified the strong convergence of the resulting scheme and observed a strong order higher than 1, see Fig. 4A.

Simulations of restimulation experiments

In restimulation experiments (Fig. 7), the system is subjected twice to a 5 pM TGF-β stimulus at 0 and 6h. To simulate these experiments, the concentration of external ligand in the model was raised from 0 to 5 pM at the beginning of the experiment, and from the value at the time point before 6h to 5 pM at t = 6h. The restimulation experiment was performed using a different charge of recombinant TGF-β when compared to the dose-titration experiments used for model fitting and validation (Figs. 5 and 6). Thus, the initial peak amplitudes of the signal 1h after stimulation were slightly different between both types of experiments (compare Figs. 6A and 7A), likely because different ligand charges have slightly different biological activities. To account for this ligand differences in our simulations, the forward and backward reaction rates of ligand binding ( $P_{49}$ and $P_{20}$ , clf. Table 3) were adapted and a dose factor changing the effective external ligand concentration was introduced. To avoid an over-adaptation, only data-points the second stimulus in the restimulation experiments were used in the estimation of these ligand-adjustment parameters.

Supporting information

S1 Fig. Autocorrelation of nuc/cyt SMAD2 ratio

Autocorrelation functions of the nuc/cyt SMAD2 ratio for both the experimental data and the respective simulations (after subtracting the trend) for all 5 considered doses. Autocorrelations for model and data have similar trajectories and reach values close to 0 at similar times fitting within the range considered as intermediate time scale in our definition of bursts (grey shaded area). https://doi.org/10.6084/m9.figshare.19064561

S2 Fig. Autocorrelation of dynamic variables in the model

Autocorrelations of individual species within the pathway in the model after subtracting time averages in case of stimulation with 100 pM TGF-β. The autocorrelation trajectories vary among each other, they mostly drop fast in the first 120 minutes but then approach 0 only slowly. In species with slower decay of self-similarity like nuclear and cytoplasmic unphosphorylated SMAD2, the noise led to faster decay. In species with faster decay of self-similarity on the other hand, like nuclear phosphorylated SMAD2 and heterotromer, it led to slower decay. In both cases the stochasticity brings the signal closer to the autocorrelation of the nuc/cyt SMAD2 ratio. No species alone shows an autocorrelation similar to the one of the SMAD2 nuc/cyt ratio (compare S1 Fig). https://doi.org/10.6084/m9.figshare.19064630

S3 Table. Comparison between the CIR and the OU model

Distance measurements of the OU internalisation model in comparison to the CIR internalisation and the deterministic model to experimental data for a stimulation with 100 pM TGF-β (compare Table 1). The best-fit parameters from fitting the CIR internalisation model were used in the OU internalisation model. Parameters could not be a priori estimated for the OU internalisation model due to instabilities caused by negative stochastic parameter values. The components of the objective function in the OU internalization model are consistent and comparable albeit of an increased magnitude in comparison to the results of the CIR internalization model. https://doi.org/10.6084/m9.figshare.19064483

S4 Fig. Best-fit OU model

Single cell and population average of nuc/cyt SMAD2 ratio for 100 pM TGF-β stimulation predicted by the OU internalisation model (see S2 Table). In comparison to the CIR internalisation model qualitatively similar bursting is observed in the single cells (compare Fig. 3A). https://doi.org/10.6084/m9.figshare.19064663

S5 Table. Log-likelihood and Akaike information criterion of block models

In analogy to simultaneous fitting of an error model proposed in [52], the scaling of the residuals of our cost-function was re-normalized by introducing the rescaling factors $s_{k}$ for the components of the objective function $r_{k}$ minimizing $\sum_{k}(r_{k}/s_{k})^{2}+2\,\log(s_{k})$ . Factors for all components referring to burst height and those referring to burst duration were assumed equal, respectively. This re-scaled cost-function met the requirements on the variance of the residuals to calculate the log-likelihood (LL) and the Akaike information criterion (AIC) of the block models in case of the 100 pM TGF-β stimulation. The AIC ratio was computed with respect to the AIC of the internalization block model and indicates that the internalization model outperforms the other models also taking the degrees of freedom into account. https://doi.org/10.6084/m9.figshare.19064558

S6 Fig. Prediction of SMAD7 knockout

SMAD7 knockout prediction by reducing the parameter of SMAD dependant feedback induction in the internalization model to 30 % of its original value in case of a single stimulation with 100 pM TGF-β. Using burst analysis, the model predictions were compared to data from a SMAD7 knockout experiment. Simulation data and burst analysis is presented analog to Fig. 5. The population average is well described by the model prediction. Furthermore, the model predicts a slight increase in the mean burst count, amplitude and duration in the SMAD7 knockout. Moreover the burst amplitude and duration variance is predicted to increase. Most of these predictions are indeed observed qualitatively in the SMAD7 knockout compared to wildtype data, although the effect sizes are not very large. The prediction was similarly accurate to those of 5 pM and 25 pM doses of TGF-β (compare Fig. 6). https://doi.org/10.6084/m9.figshare.19064687

S7 Fig. Comparison to full and hybrid stochastic simulation techniques

We applied $\tau$ -leaping SSA and a hybrid SSA solver to the TGF-β model in case of stimulation with 100 pM TGF-β using the COPASI toolbox [26] and the model from [63]. In a direct comparison between $\tau$ -leaping SSA and the internalization CIR model by running 375 paths on the same hardware, the CIR internalization model performed nearly 100x faster (2.2 sec/path vs. 210 sec/path). The hybrid approach combined SSA with a Runge-Kutta method and selected the stochastically treated species based on the number of particles, which also correlates with the expected variance. The publicly available hybrid solver has been much slower than our approach. Simulation data and burst analysis is presented analog to Fig. 5. While the computationally expensive SSA reproduced experimental data relatively accurately, the hybrid approach failed to accurately describe the population average and underestimated the number of bursts. Unlike in the CIR internalization model, in both the full SSA and the hybrid method, the nuc/cyt SMAD2 ratio tends to stick to the level before stimulation once the trajectory reached this level again and subsequent stochastic events were less frequently observed. This behaviour was even more prominent in the hybrid model. Moreover the full SSA model predicts a higher number of bursts with high amplitude, in comparison to both data and CIR internalization model, which all attain a similar maximum level. Both SSA models overestimate the heights of later bursts, which was not observed in the CIR internalization model. The predicted bursts thus vary significantly from the ones observed in the experimental data and those predicted by the CIR internalization model. https://doi.org/10.6084/m9.figshare.19064702

S8 Table. Comparison to full and hybrid stochastic simulation techniques

Distance measurements of $\tau$ -leaping SSA and a hybrid SSA solver (see S7 Fig.) in comparison to the CIR internalization and the deterministic model to experimental data for a stimulation with 100 pM TGF-β (compare Table 1). Distance in terms of burst statistics of both $\tau$ -leaping SSA and hybrid SSA is increased in comparison to the CIR internalization model. https://doi.org/10.6084/m9.figshare.19064621

S9 Fig. Bursts in the nuc/cyt SMAD2 ratio are caused by anti-correlated changes in nuclear and cytoplasmic SMAD2

To validate that the bursting events in the nuc/cyt SMAD2 ratio do not arise from technical artefacts, we analyzed the behavior of cytoplasmic as well as nuclear SMAD2 pools in trajectories of single cells. If the bursts are related to nuclear translocation, the two pools must be negatively correlated in single cells during bursting events. Even though over all cells and considered time points, nuclear and cytoplasmic SMAD2 are positively correlated, in the progress of the initial burst, they are indeed negatively correlated. In the scatter plot on the left each trajectory is represented by a blue dot and a red dot encoding nuclear and cytoplasmic SMAD2 at the time before and within a burst event respectively. The histogram on the right shows the distribution of angles of the line connecting the initial state to the burst peak in the cell population, clearly indicating that for most cells, the increase in nuclear SMAD2 matched the decrease of the cytoplasmic SMAD2. This excludes the existence of globally correlated measurement noise and suggests that nuclear and cytoplasmic SMAD2 is negatively correlated within bursts. https://doi.org/10.6084/m9.figshare.19064699

S10 Fig. Comparison to alternative vesicle model

To assess the role of receptor sorting into vesicles we considered an alternative model. This vesicle model assumed the receptors to be attributed to a particular area on the cell surface. We randomly decided whether each vesicle got internalized or not. For each vesicle that gets internalized, all attributed receptors were internalized, independent of their binding states. For simplicity, after internalisation, the receptors are assumed to be released and recycled or degraded according to the ODE model. Thus the underlying hypothesis of vesicles is not yet consistently translated into the new model. Still, even without parameter fitting, the number of bursting events agreed well to the experimental data. Simulation data and burst analysis is presented analog to Fig. 5. While the variance predicted by this model is clearly underestimated, the average height and duration of bursting events were well predicted (see also S11 Table). The variance of the burst properties though, were underestimated. While the results of the vesicle model are promising, further research and modeling are required to better describe the data. https://doi.org/10.6084/m9.figshare.19064675.v1

S11 Table. Comparison to alternative vesicle model

Distance measurements of the vesicle model (see S10 Fig.) in comparison to the CIR model and the deterministic model relative to experimental data for stimulation with 100 pM TGF-β (compare Table 1). While the number of bursts predicted by the vesicle model matches the experimental data, the fit in standard deviation and population average is increased compared to the CIR model. https://doi.org/10.6084/m9.figshare.19064660

S12 Fig. Ligand degradation

The free ligand concentration in the CIR internalization model in case of stimulation with 100 pM TGF-β for different population sizes (left) and their difference from a reference ligand concentration computed by assuming population size $n=750$ (right). Temporal fluctuations in the ligand concentration average out for large population sizes. https://doi.org/10.6084/m9.figshare.19064666

Data availability statement

The source code and data used to produce the results and analyses presented in this manuscript are available from the GitHub repository [34].

Acknowledgements

The experimental data used in this paper was partially produced by Jette Strasen (MDC Berlin).

References

[1] J. G. Albeck, G. B. Mills, and J. S. Brugge. Frequency-Modulated Pulses of ERK Activity Transmit Quantitative Proliferation Signals. Molecular Cell, 49(2):249–261, jan 2013.
[2] K. Aoki, Y. Kumagai, A. Sakurai, N. Komatsu, Y. Fujita, C. Shionyu, and M. Matsuda. Stochastic ERK activation induced by noise and Cell-to-Cell propagation regulates cell Density-Dependent proliferation. Mol. Cell, 52(4):529–540, Nov. 2013.
[3] L. Ashall, C. A. Horton, D. E. Nelson, P. Paszek, C. V. Harper, K. Sillitoe, S. Ryan, D. G. Spiller, J. F. Unitt, D. S. Broomhead, D. B. Kell, D. A. Rand, V. Sée, and M. R. H. White. Pulsatile stimulation determines timing and specificity of NF-kappaB-dependent transcription. Science, 324(5924):242–246, Apr. 2009.
[4] L. Cai, C. K. Dalal, and M. B. Elowitz. Frequency-modulated nuclear localization bursts coordinate gene regulation. Nature, 455(7212):485–490, Sept. 2008.
[5] Y. Cao, D. T. Gillespie, and L. R. Petzold. The slow-scale stochastic simulation algorithm. The Journal of chemical physics, 122(1):014116, 2005.
[6] G. Cellière, G. Fengos, M. Hervé, and D. Iber. Plasticity of TGF- $\beta$ signaling. BMC Systems Biology, 5:184, nov 2011.
[7] S. W. Chung, F. L. Miles, R. A. Sikes, C. R. Cooper, M. C. Farach-Carson, and B. A. Ogunnaike. Quantitative modeling and analysis of the transforming growth factor $\beta$ signaling pathway. Biophysical Journal, 96(5):1733–1750, mar 2009.
[8] D. C. Clarke, M. L. Brown, R. A. Erickson, Y. Shi, and X. Liu. Transforming Growth Factor Depletion Is the Primary Determinant of Smad Signaling Kinetics. Molecular and Cellular Biology, 29(9):2443–2455, may 2009.
[9] J. C. Cox, J. E. Ingersoll Jr, and S. A. Ross. An intertemporal general equilibrium model of asset prices. Econometrica: Journal of the Econometric Society, pages 363–384, 1985.
[10] L. Dharmarajan, H. M. Kaltenbach, F. Rudolf, and J. Stelling. A Simple and Flexible Computational Framework for Inferring Sources of Heterogeneity from Single-Cell Dynamics. Cell systems, 8(1):15–26.e11, jan 2019.
[11] G. M. Di Guglielmo, C. Le Roy, A. F. Goodfellow, and J. L. Wrana. Distinct endocytic pathways regulate tgf-β receptor signalling and turnover. Nature cell biology, 5(5):410, 2003.
[12] P. D. Dixit, E. Lyashenko, M. Niepel, and D. Vitkup. Maximum Entropy Framework for Predictive Inference of Cell Population Heterogeneity and Responses in Signaling Networks. Cell Systems, 10(2):204–212.e8, feb 2020.
[13] J. A. Egea, D. Henriques, T. Cokelaer, A. F. Villaverde, A. MacNamara, D.-P. Danciu, J. R. Banga, and J. Saez-Rodriguez. MEIGO: An open-source software suite based on metaheuristics for global optimization in systems biology and bioinformatics. BMC Bioinformatics, 15(1):136, 2014.
[14] J. A. Egea, R. Martí, and J. R. Banga. An evolutionary method for complex-process optimization. Computers & Operations Research, 37(2):315–324, 2010.
[15] M. Ehrlich, A. Shmuely, and Y. Henis. A single internalization signal from the di-leucine family is critical for constitutive endocytosis of the type II TGF-(β) receptor. Journal of Cell Science, 114(9):1777–1786, May 2001.
[16] H. El Samad, M. Khammash, L. Petzold, and D. Gillespie. Stochastic modelling of gene regulatory networks. International Journal of Robust and Nonlinear Control, 15(15):691–711, 2005.
[17] O. Feinerman, J. Veiga, J. R. Dorfman, R. N. Germain, and G. Altan-Bonnet. Variability and robustness in T cell activation from regulated heterogeneity in protein levels. Science, 321(5892):1081–1084, aug 2008.
[18] C. L. Frick, C. Yarka, H. Nunns, and L. Goentoro. Sensing relative signal in the Tgf- $\beta$ /Smad pathway. Proc. Natl. Acad. Sci. U. S. A., 114(14):E2975–E2982, Apr. 2017.
[19] C. Fritzsch, S. Baumgärtner, M. Kuban, D. Steinshorn, G. Reid, and S. Legewie. Estrogen-dependent control and cell-to-cell variability of transcriptional bursting. Molecular Systems Biology, 14(2):e7678, feb 2018.
[20] D. T. Gillespie. A general method for numerically simulating the stochastic time evolution of coupled chemical reactions. Journal of Computational Physics, 22(4):403–434, Dec. 1976.
[21] D. T. Gillespie. The chemical Langevin equation. The Journal of Chemical Physics, 113(1):297–306, July 2000.
[22] D. T. Gillespie. Stochastic simulation of chemical kinetics. Annual Review of Physical Chemistry, 58:35–55, 2007.
[23] X. Han and P. E. Kloeden. Random ordinary differential equations and their numerical solution. Springer, 2017.
[24] C.-H. Heldin, M. Landström, and A. Moustakas. Mechanism of TGF-β signaling to growth arrest, apoptosis, and epithelial–mesenchymal transition. Current Opinion in Cell Biology, 21(2):166–176, Apr. 2009.
[25] A. Hellander and P. Lötstedt. Hybrid method for the chemical master equation. Journal of Computational Physics, 227(1):100–122, 2007.
[26] S. Hoops, S. Sahle, R. Gauges, C. Lee, J. Pahle, N. Simus, M. Singhal, L. Xu, P. Mendes, and U. Kummer. COPASI—a COmplex PAthway SImulator. Bioinformatics (Oxford, England), 22(24):3067–3074, Oct. 2006.
[27] H. Ikushima and K. Miyazono. TGFβ signalling: A complex web in cancer progression. Nat Rev Cancer, 10(6):415–424, June 2010.
[28] G. J. Inman, F. J. Nicolás, and C. S. Hill. Nucleocytoplasmic Shuttling of Smads 2, 3, and 4 Permits Sensing of TGF-β Receptor Activity. Molecular Cell, 10(2):283–294, Aug. 2002.
[29] S. M. Kallenberger, J. Beaudouin, J. Claus, C. Fischer, P. K. Sorger, S. Legewie, and R. Eils. Intra- and interdimeric caspase-8 self-cleavage controls strength and timing of CD95-induced apoptosis. Science Signaling, 7(316), mar 2014.
[30] S. M. Kallenberger, A. L. Unger, S. Legewie, K. Lymperopoulos, U. Klingmüller, R. Eils, and D. P. Herten. Correlated receptor transport processes buffer single-cell heterogeneity. PLoS Computational Biology, 13(9):e1005779, sep 2017.
[31] M. Karlsson, D. L. Janzén, L. Durrieu, A. Colman-Lerner, M. C. Kjellsson, and G. Cedersund. Nonlinear mixed-effects modelling for single cell estimation: When, why, and how to use it. BMC Systems Biology, 9(1):1–15, sep 2015.
[32] S. Khatibi, H. J. Zhu, J. Wagner, C. W. Tan, J. H. Manton, and A. W. Burgess. Mathematical model of TGF- $\beta$ signalling: Feedback coupling is consistent with signal switching. BMC Systems Biology, 11(1):1–15, apr 2017.
[33] T. R. Kiehl, R. M. Mattheyses, and M. K. Simmons. Hybrid simulation of cellular behavior. Bioinformatics, 20(3):316–322, 2004.
[34] N. Kolbe, L. Hexemer, L.-M. Bammert, A. Loewer, M. Lukáčová-Medviďová, and S. Legewie. Simulation code of the tgf-b/smad pathway in single cells. https://github.com/nklb/stochastic-smad, 2021.
[35] T. G. Kurtz. Strong approximation theorems for density dependent Markov chains. Stochastic Processes and their Applications, 6(3):223–240, Feb. 1978.
[36] S. Legewie, H. Herzel, H. V. Westerhoff, and N. Blüthgen. Recurrent design patterns in the feedback regulation of the mammalian signalling network. Molecular systems biology, 4(1):190, 2008.
[37] Y. Li, M. Lee, N. Kim, G. Wu, D. Deng, J. M. Kim, X. Liu, W. D. Heo, and Z. Zi. Spatiotemporal Control of TGF- $\beta$ Signaling with Light. ACS Synthetic Biology, 7(2):443–451, feb 2018.
[38] C. Loos and J. Hasenauer. Mathematical modeling of variability in intracellular signaling. Current Opinion in Systems Biology, 16:17–24, 2019.
[39] P. Lötstedt and L. Ferm. Dimensional reduction of the fokker–planck equation for stochastic chemical reactions. Multiscale Modeling & Simulation, 5(2):593–614, 2006.
[40] P. Melke, H. Jönsson, E. Pardali, P. Ten Dijke, and C. Peterson. A rate equation approach to elucidate the kinetics and robustness of the TGF- $\beta$ pathway. Biophysical Journal, 91(12):4368–4380, dec 2006.
[41] N. Molina, D. M. Suter, R. Cannavo, B. Zoller, I. Gotic, and F. Naef. Stimulus-induced modulation of transcriptional bursting in a single mammalian gene. Proceedings of the National Academy of Sciences of the United States of America, 110(51):20563–20568, dec 2013.
[42] A. Moustakas and C. H. Heldin. Signaling networks guiding epithelial-mesenchymal transitions during embryogenesis and cancer progression. Cancer Science, 98(10):1512–1520, oct 2007.
[43] D. Nicklas and L. Saiz. Characterization of negative feedback network motifs in the TGF- $\beta$ signaling pathway. PLoS ONE, 8(12):1–12, dec 2013.
[44] F. J. Nicolás and C. S. Hill. Attenuation of the tgf-β-smad signaling pathway in pancreatic tumor cells confers resistance to tgf-β-induced growth arrest. Oncogene, 22(24):3698–3711, 2003.
[45] E. M. Ozbudak, M. Thattai, I. Kurtser, A. D. Grossman, and A. van Oudenaarden. Regulation of noise in the expression of a single gene. Nature Genetics, 31(1):69–73, 2002.
[46] M. Paulsen, S. Legewie, R. Eils, E. Karaulanov, and C. Niehrs. Negative feedback in the bone morphogenetic protein 4 (BMP4) synexpression group governs its dynamic signaling range and canalizes development. Proceedings of the National Academy of Sciences, 108(25):10202–10207, June 2011.
[47] J. Paulsson. Models of stochastic gene expression. Physics of Life Reviews, 2(2):157–175, jun 2005.
[48] D. Powers. Evaluation: From precision, recall and f-measure to ROC, informedness, markedness & correlation. Journal of Machine Learning Technologies, 2(1):37–63, 2011.
[49] J. E. Purvis, K. W. Karhohs, C. Mock, E. Batchelor, A. Loewer, and G. Lahav. p53 dynamics control cell fate. Science, 336(6087):1440–1444, June 2012.
[50] A. Raj and A. van Oudenaarden. Nature, Nurture, or Chance: Stochastic Gene Expression and Its Consequences. Cell, 135(2):216–226, oct 2008.
[51] C. V. Rao and A. P. Arkin. Stochastic chemical kinetics and the quasi-steady-state assumption: Application to the gillespie algorithm. The Journal of chemical physics, 118(11):4999–5010, 2003.
[52] A. Raue, M. Schilling, J. Bachmann, A. Matteson, M. Schelke, D. Kaschek, S. Hug, C. Kreutz, B. D. Harms, F. J. Theis, et al. Lessons learned from quantitative dynamical modeling in systems biology. PloS one, 8(9):e74335, 2013.
[53] J. Sakai, E. Cammarota, J. A. Wright, P. Cicuta, R. A. Gottschalk, N. Li, I. D. C. Fraser, and C. E. Bryant. Lipopolysaccharide-induced NF- $\kappa$ B nuclear translocation is primarily dependent on MyD88, but TNF $\alpha$ expression requires TRIF and MyD88. Sci. Rep., 7(1):1–9, May 2017.
[54] H. Salis and Y. Kaznessis. Accurate hybrid stochastic simulation of a system of coupled chemical or biochemical reactions. The Journal of chemical physics, 122(5):054103, 2005.
[55] M. Samoilov, S. Plyasunov, and A. P. Arkin. Stochastic amplification and signaling in enzymatic futile cycles through noise-induced bistability with oscillations. Proceedings of the National Academy of Sciences, 102(7):2310–2315, Feb. 2005.
[56] U. Sarma, L. Hexemer, U. A. Anyaegbunam, and S. Legewie. Modelling cellular signalling variability based on single-cell data: the TGFb/SMAD signaling pathway. arXiv preprint, 2007.09093, jul 2020. Preprint available from https://arxiv.org/abs/2007.09093v1.
[57] B. Schmierer and C. S. Hill. Tgf-β-smad signal transduction: Molecular specificity and functional flexibility. Nature Reviews Molecular Cell Biology, 8(12):970–982, 2007.
[58] B. Schmierer, A. L. Tournier, P. A. Bates, and C. S. Hill. Mathematical modeling identifies Smad nucleocytoplasmic shuttling as a dynamic signal-interpreting system. Proceedings of the National Academy of Sciences of the United States of America, 105(18):6608–6613, may 2008.
[59] P. M. Siegel and J. Massagué. Cytostatic and apoptotic actions of tgf- $\beta$ in homeostasis and cancer. Nature Reviews Cancer, 3(11):807–820, 2003.
[60] B. Sorre, A. Warmflash, A. H. Brivanlou, and E. D. Siggia. Encoding of Temporal Signals by the TGF- $\beta$ Pathway and Implications for Embryonic Patterning. Developmental Cell, 30(3):334–342, Aug. 2014.
[61] S. L. Spencer, S. Gaudet, J. G. Albeck, J. M. Burke, and P. K. Sorger. Non-genetic origins of cell-to-cell variability in TRAIL-induced apoptosis. Nature, 459(7245):428–432, apr 2009.
[62] J. Strasen, U. Sarma, M. Jentsch, S. Bohn, C. Sheng, D. Horbelt, P. Knaus, S. Legewie, and A. Loewer. Cell-specific responses to the cytokine tgf-β are determined by variability in protein levels. Molecular Systems Biology, 14(1), 2018.
[63] J. Strasen, U. Sarma, M. Jentsch, S. Bohn, C. Sheng, D. Horbelt, P. Knaus, S. Legewie, and A. Loewer. Cell-specific responses to the cytokine tgf-β are determined by variability in protein levels. Biomodel identifier: BIOMD0000000989, 2018.
[64] J. Strasen, U. Sarma, M. Jentsch, S. Bohn, C. Sheng, D. Horbelt, P. Knaus, S. Legewie, and A. Loewer. Data from: Cell-specific responses to the cytokine tgf-β are determined by variability in protein levels. https://doi.org/10.5061/dryad.hc5dp, 2018.
[65] G. M. Süel, J. Garcia-Ojalvo, L. M. Liberman, and M. B. Elowitz. An excitable gene regulatory circuit induces transient cellular differentiation. Nature, 440(7083):545–550, Mar. 2006.
[66] O. Tidin, E. T. Friman, F. Naef, and D. M. Suter. Quantitative relationships between SMAD dynamics and target gene activation kinetics in single live cells. Scientific Reports, 9(1):1–11, 2019.
[67] J. M. Vilar, R. Jansen, and C. Sander. Signal processing in the tgf-β superfamily ligand-receptor network. PLoS Computational Biology, 2(1):0036–0045, jan 2006.
[68] R. Villaseñor, H. Nonaka, P. Del Conte-Zerial, Y. Kalaidzidis, and M. Zerial. Regulation of egfr signal transduction by analogue-to-digital conversion in endosomes. Elife, 4:e06156, 2015.
[69] P. Vizán, D. S. Miller, I. Gori, D. Das, B. Schmierer, and C. S. Hill. Controlling long-term signaling: receptor dynamics determine attenuation and refractory behavior of the tgf-β pathway. Sci. Signal., 6(305):ra106–ra106, 2013.
[70] L. M. Wakefield, D. M. Smith, T. Masui, C. C. Harris, and M. B. Sporn. Distribution and modulation of the cellular receptor for transforming growth factor-beta. Journal of Cell Biology, 105(2):965–975, aug 1987.
[71] A. Warmflash, Q. Zhang, B. Sorre, A. Vonic, E. D. Siggia, and A. H. Brivanlou. Dynamics of TGF- $\beta$ signaling reveal adaptive and pulsatile behaviors reflected in the nuclear localization of transcription factor Smad4. Proceedings of the National Academy of Sciences of the United States of America, 109(28):E1947–E1956, 2012.
[72] K. Wegner, A. Bachmann, J. U. Schad, P. Lucarelli, S. Sahle, P. Nickel, C. Meyer, U. Klingmüller, S. Dooley, and U. Kummer. Dynamics and feedback loops in the transforming growth factor β signaling pathway. Biophysical Chemistry, 162:22–34, 2012.
[73] E. Weinan, D. Liu, and E. Vanden-Eijnden. Nested stochastic simulation algorithms for chemical kinetic systems with multiple time scales. Journal of computational physics, 221(1):158–180, 2007.
[74] I. Yakymovych, M. Yakymovych, and C.-H. Heldin. Intracellular trafficking of transforming growth factor $\beta$ receptors. Acta Biochimica et Biophysica Sinica, 50(1):3–11, jan 2018.
[75] M. Yi, Y. Jia, Q. Liu, J. Li, and C. Zhu. Enhancement of internal-noise coherence resonance by modulation of external noise in a circadian oscillator. Physical Review E, 73(4):041923, Apr. 2006.
[76] Z. Zi, D. A. Chapnick, and X. Liu. Dynamics of TGF- $\beta$ /Smad signaling. FEBS Letters, 586(14):1921–1928, 2012.
[77] Z. Zi and E. Klipp. Constraint-based modeling and kinetic analysis of the Smad dependent tgf-β signaling pathway. PLoS ONE, 2(9):1–11, 2007.

Table 2: Dynamic variables and model equations

	description		governing time derivative
$L$	free TGF-β ligand	$f_{L}=$	$E_{1}(t)+\sum_{i=1}^{N}E_{7}^{i}\,E_{9}(t)\frac{y_{6}^{i}}{N\,P_{40}^{i}}-\sum_{i=1}^{N}P_{49}^{i}E_{9}(t)\frac{Ly^{i}_{2}}{N\,P_{40}^{i}}$
$y_{1}$	TGF-β receptor I (TGFBR1) on cell surface	$f_{1}=$	$E_{2}+E_{6}y_{7}+P_{15}P_{44}y_{9}+P_{16}P_{52}y_{9}+P_{17}P_{37}y_{3}-P_{37}y_{1}-P_{48}y_{1}y_{6}$
$y_{2}$	TGF-β receptor II (TGFBR2) on cell surface	$f_{2}=$	$E_{3}+E_{7}E_{9}(t)y_{6}+P_{16}P_{52}y_{9}+P_{18}P_{38}y_{4}-P_{38}y_{2}-P_{49}E_{9}(t)y_{2}L$
$y_{3}$	endosomal TGFBR1	$f_{3}=$	$-P_{17}P_{37}y_{3}+P_{36}y_{8}+P_{37}y_{1}-P_{43}y_{3}+P_{44}y_{8}+P_{52}y_{8}$
$y_{4}$	endosomal TGFBR2	$f_{4}=$	$-P_{18}P_{38}y_{4}+P_{36}y_{8}+P_{38}y_{2}+P_{43}y_{8}-P_{44}y_{4}+P_{52}y_{8}$
$y_{5}$	endosomal TGF-β	$f_{5}=$	$P_{15}P_{44}y_{9}+P_{36}y_{8}+P_{43}y_{8}+P_{44}y_{8}-P_{52}y_{5}$
$y_{6}$	complex of TGF-β and TGFBR2 on cell surface	$f_{6}=$	$E_{6}y_{7}-E_{7}E_{9}(t)y_{6}-P_{48}y_{1}y_{6}+P_{49}E_{9}(t)y_{2}L$
$y_{7}$	activated complex of TGF-β, TGFBR1 and TGFBR2 on cell surface	$f_{7}=$	$-E_{6}y_{7}+E_{8}y_{9}-P_{14}P_{37}y_{7}+P_{48}y_{1}y_{6}-P_{50}y_{7}y_{17}$
$y_{8}$	activated endosomal complex of TGF-β, TGFBR1 and TGFBR2	$f_{8}=$	$P_{14}P_{37}y_{7}-P_{36}y_{8}-P_{43}y_{8}-P_{44}y_{8}-P_{52}y_{8}$
$y_{9}$	inactivated complex of TGF-β, TGFBR1, TGFBR2 and SMAD7 on cell surface	$f_{9}=$	$-E_{8}y_{9}-P_{15}P_{44}y_{9}-P_{16}P_{52}y_{9}+P_{50}y_{7}y_{17}$
$y_{10}$	cytoplasmic phosphorylated SMAD2	$f_{10}=$	$\frac{P_{5}y_{18}}{2}-P_{6}y_{10}+3P_{21}P_{42}y_{14}-3P_{22}P_{51}y_{10}^{3}+P_{41}y_{8}y_{11}+2P_{42}y_{13}-P_{45}y_{10}+P_{45}y_{13}+2P_{45}y_{14}+2P_{46}y_{13}-2P_{51}y_{10}^{2}y_{12}$
$y_{11}$	cytoplasmic unphosphorylated SMAD2	$f_{11}=$	$E_{4}+\frac{P_{5}y_{21}}{2}-P_{6}y_{11}-P_{41}y_{8}y_{11}-P_{45}y_{11}$
$y_{12}$	cytoplasmic SMAD4	$f_{12}=$	$E_{5}+\frac{P_{8}y_{22}}{2}-P_{9}y_{12}+P_{42}y_{13}+P_{45}y_{13}-P_{46}y_{12}-P_{51}y_{10}^{2}y_{12}$
$y_{13}$	cytoplasmic heterotrimer of two phosphorylated SMAD2 proteins and one SMAD4 protein	$f_{13}=$	$-(P_{11}+P_{42}+P_{45}+P_{46})y_{13}+P_{51}y_{10}^{2}y_{12}$
$y_{14}$	cytoplasmic homotrimer of phosphorylated SMAD2	$f_{14}=$	$-P_{11}y_{14}-P_{21}P_{42}y_{14}+P_{22}P_{51}y_{10}^{3}-P_{45}y_{14}$
$y_{15}$	cytoplasmic mRNA of SMAD7	$f_{15}=$	$\frac{P_{12}y_{16}}{2}-P_{54}y_{15}$
$y_{16}$	nuclear mRNA of SMAD7	$f_{16=}$	$P_{39}~{}\frac{y_{19}^{P_{13}}}{P_{2}^{P_{13}}+y_{19}^{P_{13}}}+P_{55}-P_{12}y_{16}-P_{53}y_{16}$
$y_{17}$	cytoplasmic SMAD7 (feedback protein)	$f_{17}=$	$E_{8}y_{9}+P_{15}P_{44}y_{9}+P_{16}P_{52}y_{9}+P_{35}y_{15}-P_{47}y_{17}-P_{50}y_{7}y_{17}$
$y_{18}$	nuclear phosphorylated SMAD2	$f_{18}=$	$-P_{5}y_{18}+2P_{6}y_{10}+3P_{22}P_{41}y_{20}-3P_{21}P_{51}y_{18}^{3}+P_{24}P_{34}y_{19}-P_{34}y_{18}+2P_{42}y_{19}-P_{45}y_{18}+P_{45}y_{19}+2P_{45}y_{20}+2P_{46}y_{19}-2P_{51}y_{18}^{2}y_{22}$
$y_{19}$	nuclear heterotrimer of two phosphorylated SMAD2 proteins and one SMAD4 protein	$f_{19}=$	$2P_{11}y_{13}-P_{24}P_{34}y_{19}-P_{42}y_{19}-P_{45}y_{19}-P_{46}y_{19}+P_{51}y_{18}^{2}y_{22}$
$y_{20}$	nuclear homotrimer of phosphorylated SMAD2	$f_{20}=$	$2P_{11}y_{14}-P_{21}P_{42}y_{20}+P_{22}P_{51}y_{18}^{3}-P_{45}y_{20}$
$y_{21}$	nuclear unphosphorylated SMAD2	$f_{21}=$	$-P_{5}y_{21}+2P_{6}y_{11}+P_{24}P_{34}y_{19}+P_{34}y_{18}-P_{45}y_{21}$
$y_{22}$	nuclear SMAD4	$f_{22}=$	$-P_{8}y_{22}+2P_{9}y_{12}+P_{24}P_{34}y_{19}+P_{42}y_{19}+P_{45}y_{19}-P_{46}y_{22}-P_{51}y_{18}^{2}y_{22}$

Description of the dynamic variables and their corresponding right hand sides used in the model equations (1) and (2). For brevity, the dependence of the dynamic concentrations on $i$ (indicating the cell) is omitted in the descriptions of $f_{1},\dots,f_{22}$ . Details about the kinetic and experimental parameters used here can be found in Tables 3 and 4.

Table 3: Kinetic model parameters

	name in the code	description	block model
$P_{2}$	K-mran	Smad 7 induction hill parameter (Michaelis constant)
$P_{3}$	R1-total	total number of TGFBR1 receptors	synthesis
$P_{4}$	R2-total	total number of TGFBR2 receptors	synthesis
$P_{5}$	S2-export-from-nuc	rate of SMAD2 export from the nucleus to the cytoplasm
$P_{6}$	S2-import-to-nuc	rate of SMAD2 import from the cytoplasm to the nucleus
$P_{7}$	S2-total	total number of SMAD2 proteins
$P_{8}$	S4-export-from-nuc	rate of SMAD4 export from the nucleus to the cytoplasm
$P_{9}$	S4-import-to-nuc	rate of SMAD4 import from the cytoplasm to the nucleus
$P_{10}$	S4-total	total number of SMAD4 proteins
$P_{11}$	Trimer-import-to-nuc	rate of SMAD trimer import to the nucleus
$P_{12}$	export-cytoplasm	rate of SMAD7 export from the cell
$P_{13}$	hill-fact1	SMAD7 induction hill parameter (hill coefficient)
$P_{14}$	index-active-Rec-internalize	internalization rate of activated receptor complexes	internalization
$P_{15}$	index-induced-R2-deg	decay rate of SMAD7 inactivated TGFBR2 receptors
$P_{16}$	index-induced-ligand-deg	decay rate of TGF-β bound by inactivated TGFBR2 receptor
$P_{17}$	index-k-out-1-relative-speed	speed of the export of TGFBR1 out of the cell relative to its import into the cell
$P_{18}$	index-k-out-2-relative-speed	speed of the export of TGFBR2 out of the cell relative to its import into the cell
$P_{19}$	index-kb-R1	speed of the unbinding of TGFBR1 from the ligand relative to its binding from the ligand
$P_{20}$	index-kb-R2	speed of the unbinding of TGFBR2 from the ligand relative to its binding from the ligand
$P_{21}$	index-kb-homotrimer	rate of the unbinding of SMAD homotrimers into single SMADs
$P_{22}$	index-kf-homotrimer	rate of the binding of single SMADs to homotrimers
$P_{23}$	index-seq-kb	speed of the unbinding of SMAD7 to activated receptor complexes relative to its unbinding
$P_{24}$	index-trimer-dephos	decay rate of pSMAD within a trimer
$P_{25}$	k1	dose constant of 1 pM stimulation
$P_{26}$	k2	dose constant of 2.5 pM stimulation
$P_{27}$	k3	dose constant of 5 pM stimulation
$P_{28}$	k4	dose constant of 25 pM stimulation
$P_{29}$	k5	dose constant of 100 pM stimulation
$P_{34}$	k-Dephos	dephosphorylation rate of SMAD
$P_{35}$	k-S7-protein	synthesis rate of SMAD7 from mRNA (translation)
$P_{36}$	k-disso-Active-Rec	dissociation rate of active receptor complexes	end. traffic
$P_{37}$	k-in-1	internalization rate of TGFBR1 (shutteling)	internalization
$P_{38}$	k-in-2	internalization rate of TGFBR2 (shutteling)	internalization
$P_{39}$	k-induced-S7-production	rate of the SMAD signaling induced production of SMAD7
$P_{40}$	k-medium	cell volume
$P_{41}$	k-phosphorylation	phosphorylation rate of SMAD2 to pSMAD2
$P_{42}$	kb-trimmer	dissociation rate of SMAD trimers to SMADs
$P_{43}$	kdeg-R1	degradation rate of TGFBR1 receptors	degradation
$P_{44}$	kdeg-R2	degradation rate of TGFBR2 receptors	degradation
$P_{45}$	kdeg-S2	degradation rate of SMAD2 proteins
$P_{46}$	kdeg-S4	degradation rate of SMAD4 proteins
$P_{47}$	kdeg-S7	degradation rate of SMAD7 proteins	degradation
$P_{48}$	kf-R1-activation	rate of TGFBR1 binding to activated TGFBR2 complexes	receptor/ligand
$P_{49}$	kf-R2-activation	rate of TGFBR2 binding to the ligand	receptor ligand
$P_{50}$	kf-Seq-S7-Rec	rate of SMAD7 binding to activated receptor complexes
$P_{51}$	kf-trimmer	rate of SMAD binding to build SMAD trimers
$P_{52}$	kin-deg-Ligand	degradation rate of TGF-β
$P_{53}$	kmRNA1deg-S7	degradation rate of SMAD7 mRNA in the nucleus
$P_{54}$	kmRNAdeg-S7	degradation rate of SMAD7 mRNA in the cytoplasm
$P_{55}$	mRNA-prod	basal production rate of SMAD7 mRNA (transcription)	synthesis

Description of the kinetic parameters used in the model, their name in the code. Indications are given when kinetic parameters are used to introduce noise in a block model. The Parameters $P_{25}$ – $P_{29}$ are used for input scaling and are therefore not considered kinetic parameters.

Table 4: Parameters for experimental conditions and substitutions in the model descriptions

$E_{1}(t,\text{1 pM})=$	$P_{25}Q^{+}(0.1,t)\,Q^{-}(2.1,t)$	$E_{1}(t,\text{2.5 pM})=$	$P_{26}Q^{+}(0.1,t)\,Q^{-}(2.1,t)$
$E_{1}(t,\text{5 pM})=$	$P_{27}Q^{+}(0.1,t)\,Q^{-}(2.1,t)$	$E_{1}(t,\text{25 pM})=$	$P_{28}Q^{+}(0.1,t)\,Q^{-}(2.1,t)$
$E_{1}(t,\text{100 pM})=$	$P_{29}Q^{+}(0.1,t)\,Q^{-}(2.1,t)$	$E_{1}(t,\text{2$\times$2.5 pM})=$	$P_{26}Q^{+}(0.1,t)\,Q^{-}(2.1,t)+P_{26}Q^{+}(480.1,t)\,Q^{-}(482.1,t)$
$E_{2}=$	$P_{3}\frac{P_{37}~{}P_{43}^{2}}{P_{17}P_{37}P_{43}+P_{43}^{2}+P_{37}P_{43}}$	$E_{3}=$	$P_{4}\frac{P_{38}~{}P_{44}^{2}}{P_{18}P_{38}P_{44}+P_{44}^{2}+P_{38}P_{44}}$
$E_{4}=$	$P_{7}\frac{P_{6}P_{45}+P_{5}P_{45}+P_{45}^{2}}{P_{5}+P_{45}+2P_{6}}$	$E_{5}=$	$P_{10}\frac{P_{9}P_{46}+P_{8}P_{46}+P_{46}^{2}}{P_{8}+P_{46}+2P_{9}}$
$E_{6}=$	$P_{48}P_{19}$	$E_{7}=$	$P_{49}P_{20}$
$E_{8}=$	$P_{50}P_{23}$	$E_{9}(t)=$	$Q^{+}(0.1,t)$
$Q^{+}(\tau,t)=$	$\frac{1000}{1000+100\exp(\tau-t)}$	$Q^{-}(\tau,t)=$	$\frac{1000}{1000+100\exp(t-\tau))}$

Parameters of experimental conditions that were used in the model (see Table 2). The parameter $E_{1}$ describes the dose dependent bolus induction. The Parameters $E_{2}$ , $E_{3}$ , $E_{4}$ and $E_{5}$ describe the production of TGFBR1, TGFBR2, SMAD2 and SMAD4, respectively. Unbinding from complexes is determined by $E_{6}$ (TGFBR1 from activated complex of TGF-β, TGFBR1 and TGFBR2 ( $y_{7}$ )), $E_{7}$ (TGFBR2 from activated complex of TGF-β and TGFBR2 ( $y_{6}$ )) and $E_{8}$ (SMAD7 from inactivated complex of TGF-β, TGFBR1, TGFBR2 and SMAD7 ( $y_{9}$ )). The parameter $E_{9}$ introduces a short delay in some reactions for numerical stability.

Table 5: High order corrections for the path computation scheme

$G^{i}_{k,n}=$	$\sum_{j\in S}\sigma_{j}\sqrt{p^{i}_{j,n}}\frac{\partial f_{k}(t_{n},L_{n},\mathbf{Y}^{k}_{n},\mathbf{P}^{k}_{n})}{\partial p_{j}}\left[\Delta Z^{i}_{j,n}-\frac{1}{2}\Delta t\Delta{B}^{i}_{j,n}\right]$
$H^{i}_{j,n}=$	$\frac{\theta_{j}\sigma_{j}(p^{i}_{j,0}-p^{i}_{j,n})}{2\,\sqrt{p^{i}_{j,n}}}\left[\Delta{B}^{i}_{j,n}\Delta t-\Delta Z^{i}_{j,n}\right]-\theta_{j}\sigma_{j}\sqrt{p^{i}_{j,n}}\left[\Delta Z^{i}_{j,n}-\frac{1}{2}\Delta{B}^{i}_{j,k}\Delta t\right]$
$\Delta Z^{i}_{j,n}=$	$\frac{1}{2}\sqrt[3]{\Delta t}\left(\mathcal{M}_{j,n}^{i}+\frac{1}{\sqrt{3}}\mathcal{N}_{j,n}^{i}\right)$

Formulas of the high order corrections used in the numerical scheme. $\mathcal{M}_{j,n}^{i}$ and $\mathcal{N}_{j,n}^{i}$ denote independent identically distributed standard normal random variables.