Mapping, modeling, and reprogramming cell-fate decision making systems

Lucy Ham (School of BioSciences, University of Melbourne, Parkville, Australia; School of Mathematics and Statistics, University of Melbourne, Parkville, Australia; ARC Centre of Excellence for the Mathematical Analysis of Cellular Systems, University of Melbourne, Parkville, Australia; these authors contributed equally)

Taylor E. Woodward (School of Mathematics and Statistics, University of Melbourne, Parkville, Australia; these authors contributed equally)

Megan A. Coomer (School of BioSciences, University of Melbourne, Parkville, Australia; Cell Bauhaus, Melbourne, Australia)

Michael P.H. Stumpf (School of BioSciences, University of Melbourne, Parkville, Australia; School of Mathematics and Statistics, University of Melbourne, Parkville, Australia)

Abstract

Many cellular processes involve information processing and decision making. We can probe these processes at increasing molecular detail. The analysis of heterogeneous data remains a challenge that requires new ways of thinking about cells in quantitative, predictive, and mechanistic ways. We discuss the role of mathematical models in the context of cell-fate decision making systems across the tree of life. Complex multi-cellular organisms have been a particular focus, but single-celled organisms also have to sense and respond to their environment. We center our discussion around the idea of design principles, which we can learn from observations and modeling, and exploit in order to (re)-design or guide cellular behavior.

1 Introduction

All cells in our bodies derive from a single ancestor, the fertilized egg cell. A process of division and differentiation gives rise to all 35-40 trillion cells [1]. How the correct types of cells appear in their correct numbers, at the right place, and at the right time remains a riddle central to developmental and stem-cell biology. It is an instructive riddle which goes to the core of biology. The genome alone does not suffice to make an organism; the cellular machinery provides the physiological context and integrates external stimuli, which together orchestrate which genes are expressed when, where, and for how long [2, 3].

Cell-fate decisions are generally considered within the context of multicellular organisms, where stem or progenitor cells undergo a series of commitment steps to adopt specific downstream fates. But it is beneficial to take a broader view and also consider decision making systems in single-celled organisms, including bacteria. Bacillus subtilis is an excellent example of a bacterium that “makes” decisions akin to those we see in multi-cellular organisms [4, 5]. B. subtilis can undergo sporulation, producing semi-inert spores that act as carriers of genetic information, surviving for extended periods and reactivating into normal bacterial cells when conditions improve. Alternatively, B. subtilis can form biofilms [6], which are organised communities with both local and large-scale spatial coherence and have even been described as possessing “embryo-like features” [7]. Not all strains of B. subtilis can exhibit sporulation or biofilm formation, which we must assume reflects different genomic contents and contexts among the different strains subsumed under B. subtilis.

The conservation of biological processes across vast evolutionary distances is well-documented in the literature [8, 9]. Jacques Monod was only slightly – if at all – facetious when saying: “What is true for E. coli is true for the elephant”, highlighting that fundamental mechanisms can be shared across diverse life forms. Key aspects of transcriptional regulation in mammalian systems, for example, were elucidated through foundational studies like Monod’s work on the lac operon in Escherichia coli. While the outcomes of cell-fate decision making systems in microbial organisms may lack the intricacy and variety of multi-cellular systems, they can offer us insights into fundamental aspects of cell-fate decision making systems and processes [7]. But microbial systems are also important in their own right and being able to affect or reprogram their behavior holds great potential for biotechnology, synthetic and engineering biology, and the management of bacterial and fungal pathogens.

We now have access to a bewildering array of experimental assays that allow us to study aspects of cell-fate decision making processes and systems. But data alone does not equate to knowledge, and even the most cutting edge experimental assays and technologies only deliver a partial view of the developmental processes involved in cell-fate decisions. Currently, single-cell data is abundant for mRNA, and a wide range of tools and techniques have emerged to analyze this information, as detailed in excellent reviews elsewhere [10, 11, 12, 13]. Other molecular states are also becoming accessible at single-cell resolution. For example, single-cell ATAC-seq (Assay for Transposase-Accessible Chromatin) reveals genome accessibility to regulatory machinery. Although single-cell proteomics is available, it remains limited to quantifying a small subset of proteins or post-translationally modified proteins. Single-cell metabolic analysis is similarly in an early stage of development [14, 15, 16].

Even though we expect that these single-cell techniques will develop rapidly, there are two impediments to the experimental analysis of cell-fate decision making systems that will continue to persist in the short term: first, we lack the ability to collect several data modalities simultaneously in the same cell; second, we cannot at scale monitor cells over time. This is due to the destructive nature of most assays and the limited resolution of non-destructive (observational or imaging-based) technologies. This is a problem for our ability to integrate and draw unified inferences from multi-omic data. Though we have made progress in integrating bulk data, single-cell analysis presents unique challenges for combining epigenomic, metabolomic, transcriptomic, and proteomic data. This is also a challenge for state-of-the-art machine learning (ML) or artificial intelligence (AI) models. These approaches rely on the detection and analysis of statistical features in large complex data sets, but without simultaneous mRNA and metabolomic data from the same cells, crucial dependencies are lost, reducing the models’ predictive power. There are computational strategies to address some of these challenges, which we will discuss below.

Mechanistic modeling, especially when coupled with ML/AI, offers valuable alternatives for understanding cell-fate decision making processes [17, 18, 19]. Here by “mechanistic models,” we mean mathematical and computational representations of our biological understanding, hypotheses, and assumptions. These models are (i) quantitative; (ii) predictive; and (iii) correspond to hypotheses that can be tested, evaluated, and refined in light of new data and information. In contrast, purely ML/AI models, though predictive, often lack mechanistic insights. While magnificent resources like AlphaFold [20] provide powerful descriptive insights, extracting actionable knowledge from them still requires additional interpretation and refinement [21, 22].

This review focuses on extracting knowledge from experimental data. We argue that mechanistic models are essential tools for interpreting cell-fate decision making systems. Mechanistic models not only serve as conceptual frameworks, but also enable predictive analyses, making them invaluable for deciphering data in a meaningful way. In our view, an effective analysis of cell-fate systems requires a balance: a purely conceptual but non-predictive approach is as limited as a purely predictive model with no functional or mechanistic understanding. One potentially positive outcome of placing mechanistic models at the center of cell-fate research could be a shift away from descriptive data collection to “discriminatory data” collection, meaning data that directly tests and informs our hypotheses and models. Large-scale Cell Atlas projects provide a rich backdrop, allowing for more focused and detailed investigations into the molecular mechanisms driving cell fates.

We begin by establishing foundational concepts in cell-fate decision making processes, with a primary focus on mammalian systems. In this domain, experimental methodologies have progressed enormously and we are now able to map biological processes at single-cell resolution, especially at the transcriptional level. While protein, epigenetic, lipid, and metabolic analyses are still developing, they are expected to reach comparable precision soon. Due to their larger size and advanced study capabilities, mammalian systems offer insights that may also help us understand cell-fate processes in simpler organisms, such as microbes.

Following this, we explore the critical role of modeling in interpreting the wealth of cell-fate data. Modeling serves two primary purposes: (1) tackling the inverse problem in cell-fate decision making, that is, inferring the underlying mechanisms from observed data, and (2) developing frameworks, which we term “CellMaps” (following the precedent set by Sydney Brenner), to potentially reprogram cellular behavior. While this capability is still largely aspirational, advances in synthetic and engineering biology bring us close to achieving it.

We then shift to constructing realistic CellMaps and whole-cell models, focusing on microbial systems where these approaches have already seen success, particularly in synthetic and engineering biology. The aim is to generate new genetic constructs in biological organisms, or to design and construct new strains that are better capable of performing specific functions, from biosensing to bioprocessing. Bacteria and single-celled eukaryotes (archaea are much less well studied), with their simpler genomes and molecular networks, provide a more feasible starting point for whole-cell modeling than multicellular organisms. We outline some of the current approaches, early successes, and promising developments in modeling aimed at supporting synthetic/engineering biology and biotechnology. Finally, we emphasize the importance of comparative biology. The lessons we learn in the context of microbial lifeforms have the potential to enhance our understanding of metazoan and mammalian systems, and vice versa. The feedback between these domains will be invaluable for advancing both our theoretical and practical knowledge of cell-fate decision making across the tree of life.

Figure 1: Illustration of key design principles in cell-fate decision making systems over space and time. (A) Toggle Switches enable cells to adopt and stably maintain specific states through mutually inhibitory feedback loops. (B) Multistability is the ability of cells to attain multiple stable states, allowing for cellular diversity within a genetically identical population. (C) Oscillatory dynamics in gene expression provide timing and rhythmic control in developmental processes. (D) Cell communication is crucial for robust cell-fate control. Cells exchange signals with neighbors, enabling coordinated behavior and stabilization of gene expression patterns across tissues. (E) Spatial patterns emerge from reaction-diffusion processes, allowing cells to self-organize into distinct regions, as observed in animal coat patterns and tissue structures. This figure was created in BioRender.com.

2 Cell-Fate Decision Making Systems

We begin by outlining the role of design principles in cell-fate decision making, followed by working definitions of cell states, types, and fates. We then examine exemplar cell-fate decision making systems, first qualitatively and then quantitatively.

There is still some debate about what constitutes a cellular state, a cell-fate, or a cell type [23, 24]. While various biological definitions have been proposed, greater clarity is also needed from a theoretical and mathematical standpoint.

2.1 Design Principles of Cell-Fate Decision Making Systems

In biological systems, unlike classical or quantum computing, there is no distinct separation between hardware and software. Here, molecules act as the information-processing units (we note that this term is not without controversy), information carriers, and energy sources. Dennis Bray’s term “wetware” captures this unique nature well [25]. What is more, cellular information systems are highly interconnected, lacking the isolation we see in traditional electronics [26, 27].

Nevertheless, cells manage to be (mostly) remarkably precise in their decision making, certainly at a level sufficient to give rise to the tens of trillions of cells in an adult human, all stemming from a single fertilized egg cell. Design principles provide a valuable framework for understanding these complex processes.

To define “design principle” [28, 29, 30] we need to turn to mathematical models, $m$, described below in Section 2.3.2; here design principles are characteristics shared by all mathematical models that exhibit a particular behavior or trait. This concept, influenced by engineering and synthetic biology, has gained traction in developmental biology. Foundational work by Turing, Wolpert, and many others addressed similar ideas regarding the core principles that drive biological organization and its emergence during development [31, 32].

Figure 2: Whole-cell modeling requires data from many layers of the omics pyramid (A) and the integration of models that encode the dynamics of subsystems (B, C, D). Multi-omics data vary in biological resolution and complexity (A), which are often inversely related. Many computational methods model a single microscopic system, or a subset of such systems, within a cell (Model 1, 2, 3). In order to integrate such subsystems into a biologically relevant whole-cell model, their interactions and relationships must be known. As shown in this figure, Model 1 uses transcriptomic data to model signaling pathways (B) whose output regulates transcription (C), which is modeled with proteomics data in Model 2. Model 3 uses metabolic data to model the Krebs cycle, which is necessary for the synthesis of amino acids. This figure highlights the need for whole-cell models to follow the flow of information that drives cell-fate decisions. This figure was created in BioRender.com.

Some design principles are easy to explain and reason about:

Bistability:

A system achieves bistability, or more generally multistability (two or more stable states), only if a positive feedback loop is part of its underlying dynamics [33, 34].

Biochemical oscillations:

All biochemical oscillators are driven by negative feedback, often with a time delay, to maintain cycles [35, 36].

Protein switch:

These arise from proteins having multiple structural states with distinct functions, which shift in response to signals [37, 38].
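As an illustration of the oscillation principle above, the following minimal sketch integrates a single species that represses its own production after a time delay. The model form, the parameter values (`beta`, `n`, `tau`), and the function names are illustrative assumptions and are not taken from the cited references. With these values the delayed loop settles into sustained oscillations, whereas the same loop without delay relaxes to a steady state.

```python
def simulate_delayed_feedback(tau, beta=4.0, n=4, x0=0.1, t_end=200.0, dt=0.01):
    """Euler integration of dx/dt = beta / (1 + x(t - tau)^n) - x(t).

    Production is repressed by the past value x(t - tau) (negative feedback),
    and x is degraded at unit rate. All parameter values are assumptions.
    """
    delay_steps = int(round(tau / dt))
    # Constant history on [-tau, 0]; the last entry is the current value.
    history = [x0] * (delay_steps + 1)
    for _ in range(int(round(t_end / dt))):
        x_now = history[-1]
        x_delayed = history[-1 - delay_steps]
        dx = beta / (1.0 + x_delayed ** n) - x_now
        history.append(x_now + dt * dx)
    return history


def late_amplitude(xs, fraction=0.25):
    """Peak-to-trough range over the final part of a trajectory."""
    tail = xs[int(len(xs) * (1.0 - fraction)):]
    return max(tail) - min(tail)


amp_with_delay = late_amplitude(simulate_delayed_feedback(tau=2.0))
amp_without_delay = late_amplitude(simulate_delayed_feedback(tau=0.0))
print("late amplitude with delay:   ", amp_with_delay)
print("late amplitude without delay:", amp_without_delay)
```

Comparing the two amplitudes shows that, in this toy model, negative feedback alone is not sufficient and the delay is needed to sustain the cycle.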

For most other biological phenomena, design principles may exist but are often subtle and complex [39]. We discuss four additional design principles relevant to cell-fate decision making in the following sections. These examples demonstrate two key points: (1) design principles are useful conceptual tools for analyzing biological systems, and (2) while they do not apply universally, when present, they can have a level of subtlety and complexity that needs to be captured at the relevant level of detail.

2.1.1 Cellular compartments and multistability

Multistability is crucial for cells to adopt and maintain distinct fates. The compartmentalization of key signaling proteins within cellular structures, such as the cytoplasm and nucleus, creates spatial organization that supports this process. In eukaryotic cells, many signaling proteins actively shuttle between the cytoplasm, where they are activated through processes like phosphorylation, and the nucleus, where they can trigger transcriptional responses, before returning to the cytoplasm. This cyclical cytoplasm → nucleus → cytoplasm shuttling acts as a positive feedback loop that promotes multistability [40]. By compartmentalizing signaling pathways, cells can better process and retain information, enabling them to stabilize their state even amidst fluctuating signals. For example, MAPK proteins, which move between these compartments, have been shown to influence stem cell differentiation in vivo [41], highlighting that compartmentalized signaling is a key mechanism for establishing and maintaining cell-fates. This design principle was extracted by extrapolating the mathematical analysis of positive feedback systems to more general systems [42].

2.1.2 Robust perfect adaptation

Robust perfect adaptation (RPA) is a phenomenon in which a stimulus triggers an initial response, often seen as an increase in the concentration of certain molecules, which then returns to baseline levels, even while the stimulus continues. This adaptive response allows cells to reset their signaling activity and avoid overstimulation, a crucial feature for maintaining stable cell states in the face of persistent signals. Araujo and colleagues [43, 44] were able to develop precise algebraic criteria to define the network structures that can achieve RPA. There are only two core network configurations that are capable of producing this adaptation, meaning that all biological networks exhibiting RPA must adopt one of these designs. This insight into RPA’s design principles was achieved through abstraction, which allowed the identification of underlying structural requirements without exhaustive trial-and-error. In the context of cell-fate decision making, RPA provides cells with the ability to respond selectively to new signals while maintaining a stable fate, making it an essential signal-processing mechanism in both biological and technological systems.
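To make the adaptive response described above concrete, here is a minimal sketch of a toy linear integral-feedback loop, a textbook motif that shows perfect adaptation. It is not the algebraic construction of [43, 44]. The equations, the set point `y_set`, the gain `k`, and the stimulus levels are all illustrative assumptions. The output `y` responds transiently to a step in the stimulus `s` and then returns to its pre-stimulus level while the stimulus persists.

```python
def simulate_adaptation(s_before=2.0, s_after=4.0, y_set=1.0, k=1.0,
                        t_step=20.0, t_end=60.0, dt=0.01):
    """Euler integration of a toy integral-feedback loop (assumed model):

        dy/dt = s - y - z        (output driven by stimulus s, opposed by z)
        dz/dt = k * (y - y_set)  (z integrates the deviation of y from y_set)

    The stimulus steps from s_before to s_after at time t_step.
    """
    # Start at the pre-stimulus steady state: y = y_set, z = s_before - y_set.
    y, z = y_set, s_before - y_set
    trace = []
    for i in range(int(round(t_end / dt))):
        t = i * dt
        s = s_before if t < t_step else s_after
        dy = s - y - z
        dz = k * (y - y_set)
        y += dt * dy
        z += dt * dz
        trace.append((t + dt, y))
    return trace


trace = simulate_adaptation()
peak = max(y for _, y in trace)
final = trace[-1][1]
print("transient peak of y:", peak)
print("final value of y:   ", final)
```

Because `z` only stops changing when `y` equals `y_set`, the final output is independent of the stimulus level, which is the defining feature of perfect adaptation in this sketch.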

2.1.3 Toggle switch behavior

Toggle switches allow cells to make binary decisions, often essential for specifying a cell-fate among competing options. Bistability, the ability to adopt one of two stable states, enables cells to lock into a specific fate when exposed to certain signals. Stochasticity is one of the hallmarks of many toggle switch architectures. Noise can help push the system between states, and many designs are only bistable in a stochastic regime. In development, toggle switches help cells decide between pathways, like becoming either a neuron or a glial cell. This principle is also seen in microbial population dynamics, where toggle switches provide population-level heterogeneity, supporting bet-hedging strategies that allow populations to survive in changing environments [45]. Efforts to identify the design principles of toggle switches have relied on a combination of modeling and statistical model selection. However, there remain many different designs that can exhibit toggle switch behavior [37]. This diversity suggests that toggle switches may represent a class of systems with so many possible implementations that it is difficult to define a single set of design principles [46, 37].
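The binary decision described above can be sketched with a deterministic, symmetric model of two mutually repressing genes. The functional form, the parameters `alpha` and `n`, and the initial conditions are illustrative assumptions; the sketch deliberately ignores the stochastic effects discussed in the text. Two initial conditions that differ only in which gene starts slightly higher end up in opposite stable states.

```python
def toggle_rhs(u, v, alpha=10.0, n=2.0):
    """Mutual repression: each gene product inhibits production of the other."""
    du = alpha / (1.0 + v ** n) - u
    dv = alpha / (1.0 + u ** n) - v
    return du, dv


def simulate_toggle(u0, v0, t_end=50.0, dt=0.01, alpha=10.0, n=2.0):
    """Euler integration of the toggle from (u0, v0); returns the final state."""
    u, v = u0, v0
    for _ in range(int(round(t_end / dt))):
        du, dv = toggle_rhs(u, v, alpha, n)
        u += dt * du
        v += dt * dv
    return u, v


# Same circuit, two slightly different starting points.
high_u = simulate_toggle(2.0, 1.0)  # u starts ahead of v
high_v = simulate_toggle(1.0, 2.0)  # v starts ahead of u
print("u starts ahead:", high_u)
print("v starts ahead:", high_v)
```

In this symmetric sketch the line `u = v` separates the two basins of attraction, so whichever gene starts ahead wins. Adding a noise term would allow occasional switching between the two states.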

2.1.4 Turing patterns

The final example we discuss here is the Turing pattern mechanism, defined by Turing to be a purely chemical process by which stable spatial patterns can emerge. It is one of the fundamental patterning mechanisms in developmental biology, alongside positional information, rule-based mechanisms, and phase separation. Arguably positional information and Turing mechanisms have received the greatest attention [47, 48, 49, 50, 51], but were taken up primarily by the biological and mathematical communities, respectively. Over the past decade, however, there has been a convergence of these perspectives, with both mechanisms now widely recognized as working together to create the developmental diversity observed in nature [51, 31].

Turing’s and later work summarized the design principles of these patterning mechanisms as: “local activation and lateral inhibition” [52, 31]. In this process, a molecule $A$ in one cell increases through positive feedback, diffuses to neighboring cells, and activates $A$ in those cells as well. At the same time, it also activates a second molecule $B$, which diffuses and inhibits the production of $A$ in neighboring cells. This interplay of activation and inhibition enables stable spatial patterns to emerge across cell populations.
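For a two-species reaction-diffusion system linearized around a uniform steady state, "local activation and lateral inhibition" translates into the classical conditions for a diffusion-driven instability. The sketch below checks these conditions and cross-checks them against the growth rate of spatial modes. The Jacobian entries and diffusion constants are made-up illustrative numbers, and `fu`, `fv`, `gu`, `gv` are assumed names for the partial derivatives of the activator ($A$) and inhibitor ($B$) kinetics.

```python
import math


def turing_conditions(fu, fv, gu, gv, Du, Dv):
    """Classical conditions for a diffusion-driven (Turing) instability.

    fu, fv, gu, gv: Jacobian of the reaction kinetics at the uniform steady
    state (f for the activator, g for the inhibitor); Du, Dv: diffusion rates.
    """
    trace = fu + gv
    det = fu * gv - fv * gu
    # (1) and (2): the steady state is stable when there is no diffusion.
    stable_without_diffusion = trace < 0 and det > 0
    # (3) and (4): some spatial mode grows once diffusion is included.
    cross = Dv * fu + Du * gv
    unstable_with_diffusion = cross > 0 and cross ** 2 > 4.0 * Du * Dv * det
    return stable_without_diffusion and unstable_with_diffusion


def max_growth_rate(fu, fv, gu, gv, Du, Dv, k2_max=5.0, n_points=501):
    """Largest real part of the growth rate over squared wavenumbers in [0, k2_max]."""
    best = -math.inf
    for i in range(n_points):
        k2 = k2_max * i / (n_points - 1)
        a, d = fu - Du * k2, gv - Dv * k2
        tr, det = a + d, a * d - fv * gu
        disc = tr * tr - 4.0 * det
        lam = 0.5 * (tr + math.sqrt(disc)) if disc >= 0 else 0.5 * tr
        best = max(best, lam)
    return best


# Activator self-activates (fu > 0) and activates the inhibitor (gu > 0);
# the inhibitor represses the activator (fv < 0) and decays (gv < 0).
J = dict(fu=1.0, fv=-2.0, gu=3.0, gv=-4.0)
patterning = turing_conditions(Du=1.0, Dv=20.0, **J)  # fast-diffusing inhibitor
no_pattern = turing_conditions(Du=1.0, Dv=1.0, **J)   # equal diffusion rates
print(patterning, max_growth_rate(Du=1.0, Dv=20.0, **J))
print(no_pattern, max_growth_rate(Du=1.0, Dv=1.0, **J))
```

In this example a pattern-forming instability appears only when the inhibitor diffuses much faster than the activator, which is the "lateral inhibition" part of the principle.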

2.2 Working Definitions for Cell States, Types, and Fates

We distinguish pragmatically between cell states, cell types, and cell fates. Here we make no attempt to further qualify these working definitions, although we believe that doing so will ultimately be necessary.

2.2.1 Cell state

We use this to refer to the molecular census and structural arrangement of all the molecules comprising a cell. In practical terms, not everything will be experimentally accessible. In current applications in mammalian systems, including stem cell systems, we have access to mRNA measurements at the level of individual cells. We currently do not have the ability to sample protein abundances, epigenetic states, metabolome concentrations, and more, at sufficient scale and quality. As a result, current Cell Atlas projects provide only a partial view of cell states.

Imaging data offers a complementary perspective by revealing cell morphology and the spatial organisation of structures like organelles and molecular complexes. Combined with molecular census data, this allows us to build a more integrated view of cell states.

Many definitions of cell states overlook or minimize the influence of environmental feedback, which crucially determines the cellular physiology. To address this limitation, we might consider defining a “renormalized” cell state, encompassing both the molecular inventory and structural arrangement within a cell, along with interactions with its environment.

2.2.2 Cell type

A cell type can be usefully defined as an ensemble of cells that, from a physiological or biological perspective, can be considered as equivalent. In mathematical terms this could, for example, correspond to a region in cellular phase space that, under suitable thermodynamic conditions, is enriched for experimentally observable cells.

Cell types are traditionally defined using morphology or cellular phenotypes. The extent to which these classical definitions of cell types align with molecular signatures remains an open question, one that we will revisit throughout this review. This question is also closely linked to understanding the sources and propagation of molecular noise within cells.

We can usefully differentiate between fully differentiated cell types and cell types that have some differentiation and proliferation potential. The latter include all stem and progenitor cell types. The former include all cells which no longer undergo a cell cycle, and where the only possible fate is cell death.

2.2.3 Cell-fate

This term refers to the cell type that can be reached from an initial cell type, with an inherent temporal aspect, representing a path (or set of paths) through cellular phase space between cell types. Along this path, individual cells can be observed moving in one or both directions.

Important questions include when a cell’s fate is determined at the molecular level versus the phenotypic or morphological level. Similarly important are how many potential fates or paths are open to a given cell type; which cell types can be reached from which others; and whether cell-fates are reversible and, if they are, whether the same path through cellular phase space is traveled in both directions.

2.3 Cell-Fate Decisions

The set of molecular computations and transformations of cellular states by which a cell “decides” (or is “forced”) to follow a certain path are referred to as cell-fate decision making processes.

2.3.1 Biological descriptions of cell-fate decisions

We use the term “cell-fate decision” as a shorthand for a complex set of processes, implying that cells possess some level of information-processing and computational ability, as well as agency: they respond and react to internal and external signals and cues and change their behavior accordingly. There are philosophical discussions to be had on this, we are sure, but we eschew these here.

2.3.2 Mathematical descriptions of cell-fate decisions

To clarify the concepts of cell state, cell type and cell-fate developed above, we can adopt a mathematical perspective [53, 54]. Let $X$ denote the cell state, encompassing the abundances and arrangements of all molecular components within the cell. We can define a suitable phase space, $\Omega_{X}$, in which $X$ resides,

$$X \in \Omega_{X} \subseteq \mathbb{R}_{0}^{+n}.$$

The temporal evolution of $X$ is described in terms of a stochastic differential equation (SDE),

$$dX = \overbrace{f(X;\theta,t)\,dt}^{\text{Deterministic Dynamics}} + \overbrace{g(X;\theta,t)\,dW_{t}}^{\text{Stochastic Dynamics}} + \overbrace{h(X;\Psi,t)\,dt}^{\text{External Influences}}. \tag{1}$$

Here $f(X;\theta,t)$ represents the deterministic dynamics and $g(X;\theta,t)$ the stochastic dynamics, where $W_{t}$ is a Wiener process capturing the inherent noise [55]. The term $h(X;\Psi,t)$ describes the influences from the environment (e.g. a change of growth medium), which we assume to be deterministic, although this can be relaxed. Parameters $\theta$ define the model, while $\Psi$ governs the environment. A cautionary note for readers with experience in dealing with SDEs: in chemical or molecular processes the functional forms of $f(X;\theta,t)$ and $g(X;\theta,t)$ are tightly linked; hence the noise depends explicitly on the system’s state, $X$. We therefore often refer to both terms jointly as the model $m$, which summarizes our understanding and hypotheses about the underlying processes in a biological system, here focusing on cell-fate decision making.
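As a concrete illustration of Eqn. (1), the sketch below simulates a one-dimensional SDE with the Euler-Maruyama scheme. The example system is an assumption chosen for simplicity: a birth-death process in its chemical Langevin form, with constant production rate `k` and degradation rate `gamma`, so that the drift is `k - gamma * x` and the noise amplitude is `sqrt(k + gamma * x)`. This shows the link between the deterministic and stochastic terms noted above, because both are built from the same reaction rates. The function names and parameter values are hypothetical, and the clipping at zero is a crude device to keep the state non-negative.

```python
import math
import random


def euler_maruyama(f, g, h, x0, t_end, dt, seed=0):
    """Simulate dX = f(X,t) dt + g(X,t) dW_t + h(X,t) dt by Euler-Maruyama."""
    rng = random.Random(seed)
    x, t = x0, 0.0
    path = [x]
    for _ in range(int(round(t_end / dt))):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Wiener increment, variance dt
        x = x + (f(x, t) + h(x, t)) * dt + g(x, t) * dw
        x = max(x, 0.0)                     # keep the state in the non-negative orthant
        t += dt
        path.append(x)
    return path


# Assumed example: birth-death process in chemical Langevin form.
k, gamma = 50.0, 1.0


def f(x, t):
    return k - gamma * x              # deterministic drift


def g(x, t):
    return math.sqrt(k + gamma * x)   # state-dependent noise from the same rates


def h(x, t):
    return 0.0                        # no external influence in this sketch


path = euler_maruyama(f, g, h, x0=0.0, t_end=200.0, dt=0.01)
tail = path[len(path) // 2:]
mean_tail = sum(tail) / len(tail)
print("time-averaged abundance over the second half:", mean_tail)
```

The time-averaged abundance should fluctuate around `k / gamma`, the deterministic steady state. Replacing `h` with a time-dependent function is one way to mimic an environmental change such as a switch of growth medium.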

This model $m$ captures cell-fate dynamics and is often associated with Waddington’s epigenetic landscape [56, 57, 55, 58]. These models range from simple, qualitative “back-of-the-envelope” types to more comprehensive frameworks. They may be designed to align with experimental data [59, 58, 60] or to explore broad phenomenological principles [61, 62, 57]. At the most detailed level, a whole-cell model, termed a “CellMap” by Sydney Brenner [63], provides mechanistic, quantitative, and testable insights into cellular processes.

While we focus here on SDE models of the form in Eqn. (1), other mechanistic modeling frameworks can provide valuable perspectives. These include Boolean and generalized logical networks, stochastic Petri nets, agent-based models, and interacting spin-systems. Much of what we discuss here can be applied, with minor adjustments, to these other modeling frameworks. What they share is a foundational mechanistic structure that enables us to test and refine our understanding of the underlying mechanisms driving cellular behavior.

Mathematical models enable us to explore fundamental questions in cell-fate decision making systems such as:

  1. How many cell types exist?

  2. How heterogeneous can cells of the same type be? Or in other words, how many states and how varied can these states be within a single cell type?

  3. How stable are cell types?

  4. What causes a cell to change its type and make a cell-fate decision?

We cannot answer these questions without a tight integration of experiment, observation, and theoretical analysis.

3 The Experimental Signatures Of Cell-Fate Decision Making Systems

We live in a data rich world – but there never seems to be sufficient data. We also live in a hypotheses rich world: our understanding of almost all complex biological processes and systems is still in its infancy and as a result there are many a priori plausible but mutually contradictory hypotheses about the organisation, function, and dynamics of cellular systems [23, 24]. Currently, we lack sufficient high-quality data to thoroughly test, refine, and either validate or invalidate these hypotheses, concepts, and models related to cell-fate decision making systems [64, 65]. A key challenge is that testing our models often requires carefully designed experiments that yield data capable of distinguishing between different models or alternative hypotheses.

In this section, we review the current experimental frameworks for probing cell-fate decision making systems, focusing on data aspects we believe will remain valuable in the near future. Instead of detailed comparisons of specific methods, we emphasize three critical issues impacting cell-fate research. First, current data is often incomplete and affected by experimental noise, which must be factored into our analyses. Second, environmental conditions, cell-cell interactions, and the cell cycle all significantly influence cell fate and can complicate inference from experimental data [66, 67]. Third, most current data lacks time resolution, as we are unable to monitor the same cell over time at the necessary scale and detail. We will revisit these challenges throughout the following discussion.

3.1 The Genomic Context

We distinguish between two facets of the genome: first the inheritable material that is passed down through generations, subject to mutation and recombination; and second, the genome’s dynamic organisation within the cell, including the 3D arrangement of chromosomes, which largely determines which genes are accessible for transcription into mRNA and subsequent translation into protein.

3.1.1 The heritable genome

We now have the DNA sequences of many species, and for several, we also have high-resolution data on genomic diversity, revealing genetic variability within populations. Evolutionary and comparative genomics, supported by bioinformatics tools, enable us to annotate and interpret these genomes, especially for newly sequenced organisms.

In the context of cell-fate decisions, the heritable genome plays a lesser role, except when comparing different species, such as embryonic development across mammals or varying phenotypes in yeast species, where even closely related species can exhibit very different phenotypes. Heritable genomic elements—such as gene sequences, regulatory regions, and arrangements—shape many molecular and phenotypic differences, influencing factors like gene activation timing, mRNA processing, and protein isoform diversity. However, not all sequence-level differences are shaped by selection pressures [68], and some aspects, like the full diversity of protein isoforms, remain largely speculative [69].

3.1.2 The dynamic genome

Since the first draft of the human genome was published, our understanding of genome content and the diverse functions of non-coding DNA has transformed. We now recognise, though still incompletely, that the genome is regulated by a complex epigenetic machinery that actively opens and closes DNA sequences, modifying access to coding regions [70].

These epigenetic mechanisms, while reversible, are sensitive to environmental factors and influence which genes are accessible for transcription, thereby affecting cell fate. These include DNA methylation and demethylation, histone modifications, and chromatin remodeling. Methylation is the addition of a methyl group (CH3) to DNA (or proteins), often suppressing gene expression by limiting transcriptional access, playing a crucial role in maintaining or altering cell states. Histones are the proteins around which DNA is wrapped, and the tightness of this wrapping also leads to decreased accessibility of DNA to the transcriptional machinery. Chromatin remodeling adjusts the structure of chromatin to reveal or hide specific DNA sequences, directly impacting which genes are active in determining cell identity.

A raft of experimental techniques is available to study these epigenetic and dynamic aspects of the genome and its in vivo geometry. ATAC-seq is used to map chromatin accessibility and thus to analyze chromatin remodeling, while DNA methylation is typically profiled by bisulfite sequencing. Histone modifications are assessed through ChIP-seq (chromatin immunoprecipitation followed by sequencing) and similar assays. Hi-C (a chromosome conformation capture method) is widely used to assess the 3D structure of the genome, but other methods, such as genome architecture mapping and related approaches, give a more direct representation of 3D genome structure, including contacts between chromosomes. Real-time methods like live-cell imaging and single-molecule tracking operate at a lower throughput than other techniques. Unlike most methods, they allow us to monitor the same cells over time, though they are typically limited to tracking a small number of molecules. Collectively, these tools help us connect the physical and dynamic organisation of the genome to its functional role in cell-fate decisions, enhancing our ability to map and interpret complex epigenetic landscapes.

3.2 Gene and Protein Expression

Most current analyses of cell-fate decision making systems are based on transcriptomic data. We can measure mRNA content in single cells and gather transcriptome-wide data for tens of thousands of cells [71, 72, 73]. Single-cell mRNA-seq has led to the development of numerous statistical and computational tools, which have been comprehensively reviewed elsewhere. Here, we focus on key limitations: the incomplete nature of this data, the influence of confounding factors, and the lack of time-course data from direct observations, all of which have also been discussed in prior work [12, 73].

The central dogma of molecular biology describes the flow of information from genetic instructions to functional molecules:

$$\text{DNA}\longrightarrow\text{mRNA}\longrightarrow\text{protein}. \tag{2}$$

Each step involves complex processes. As we have discussed above, the “expressability” of DNA depends on numerous factors that regulate which genes can be transcribed. The amount of mRNA in a cell reflects a balance of production and degradation, as well as other regulatory mechanisms that finely tune mRNA levels and activity. Proteins are similarly regulated through degradation pathways, such as the ubiquitin-proteasome system, which plays a major role in protein turnover. Beyond simple degradation, the proteasome also engages in protein splicing, as seen with the non-ubiquitin-dependent 20S proteasome [74]. Crucially, a protein’s activity is often controlled by post-translational modifications, which depend on signaling networks that respond to physiological and environmental cues, adjusting the activities of kinases, phosphatases, transcription factors, and other key molecules [75].

Currently, we can probe mRNA at scale, but we often rely on simplified (and often linear) assumptions to link mRNA levels to active protein levels. Despite this limitation, many findings on cell-fate decision making in embryonic development, hematopoiesis, and tissue homeostasis are likely robust and will be further validated as we incorporate additional data types. However, analyses that aim to fully define cell types or reconstruct gene regulatory networks underscore the need for a more comprehensive understanding of all cellular molecules. Our view of “incompleteness” in data is slightly broader than most. While it is true that transcriptomic data alone may leave gaps, experimental biases and limitations further contribute to an incomplete picture, even at the mRNA level.
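To make the linear assumption concrete, one common minimal sketch is a two-stage model of expression. The symbols are illustrative assumptions, not notation from this paper: mRNA $m$ is produced at rate $k_m$ and degraded at rate $\delta_m$, and protein $p$ is translated at rate $k_p$ per mRNA and degraded at rate $\delta_p$.

$$
\frac{dm}{dt} = k_m - \delta_m m, \qquad \frac{dp}{dt} = k_p m - \delta_p p
\quad\Longrightarrow\quad
m^{*} = \frac{k_m}{\delta_m}, \qquad p^{*} = \frac{k_p}{\delta_p}\, m^{*}.
$$

Under these assumptions the steady-state protein level is simply proportional to the mRNA level. The proportionality breaks down as soon as the rates vary between genes, cells, or conditions, or when activity depends on post-translational modification rather than abundance.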

Chief among the factors complicating our analyses is the cell cycle. A range of tools is available to relate genes and their mRNA levels to cell-cycle stages. Cell-cycle progression is regulated by key proteins, such as cyclin-dependent kinases, and we have an increasingly detailed understanding of cell-cycle checkpoints and their role in cell-fate decisions, although most insights remain at the protein level.

Most single-cell mRNA and emerging single-cell protein assays are destructive, meaning cells must be lysed to be analysed. This prevents true time-course studies of individual cells, so we often rely on computational methods, such as pseudo-time analysis, to infer temporal order among observed cells [76, 77]. Additionally, explicit mathematical models of cell ensembles and their trajectories over time provide insights into temporal—and, hopefully, causal—relationships among molecular processes in cells undergoing fate changes.

3.3 Metabolic Signatures and Determinants of Cell-Fate

Metabolomics seeks to provide a detailed profile of the metabolic processes occurring within a cell [78]. Metabolism operates at a faster rate than processes like gene regulation and was historically one of the first cellular processes amenable to mathematical analysis. It is linked to key clinical phenotypes in humans and is particularly important in microbiology, especially in biotechnological applications involving bacterial or single-celled eukaryotic organisms. In developmental biology, however, metabolic studies have largely focused on the effects of environmental changes on an embryo or the roles of nutrition and drug metabolism in health and disease.

A growing number of single-cell metabolomics techniques are becoming available [16, 14, 15, 79], often developed with a focus on drug metabolism, a process that, in many cases, also involves cell-fate transitions. While single-cell technologies in microbial metabolomics are still developing, there is a clear understanding of the importance of integrating multiple data types. Mechanistic models play a crucial role in achieving this integration from the outset, and we discuss this in more detail in Section 5.

3.4 Molecular Interaction and Regulatory Networks

Our understanding of cell-fate decision making has moved far beyond viewing genes as isolated entities. Instead, the analysis of biochemical, interaction, and regulatory networks is now central to the study of cell-fate decision making systems. In mathematical terms, a network $\mathcal{N}$ consists of a set of vertices/nodes (e.g. metabolites, proteins, or genes), $\mathcal{V}$, and a set of edges/links (chemical reactions, physical or regulatory interactions) between these vertices, $\mathcal{E}$. We then write $\mathcal{N}=(\mathcal{V},\mathcal{E})$. Networks have been used in mathematics, physics, and computer science, but also in the social sciences.

The predominant networks in the cell biology literature are:

Metabolic networks

capture the biochemistry and the set of biochemical reactions happening inside a cell, as well as the influx and efflux of metabolites; here $\mathcal{V}$ is the set of metabolites, and $\mathcal{E}$ is the set of reactions connecting metabolites.

Protein-protein interaction networks

provide a summary of the physical interactions between pairs or groups of proteins. Often this can include reactions that occur during signal transduction. $\mathcal{V}$ is the set of all proteins in an organism, and $\mathcal{E}$ denotes the set of physical interactions between them.

Gene regulation networks

aim to describe the complete set of transcriptional regulatory interactions and relationships; now $\mathcal{V}$ is the set of all genes, and $\mathcal{E}$ contains regulatory relationships between pairs of genes.

While each of these networks was initially studied independently, researchers now recognize the value of examining them together to capture their interdependencies.

We probably understand metabolic networks (MN) best: the pioneering efforts of early biochemists, and arguably, the early mathematical work describing (bio)chemical and catalytic reactions and processes, have meant that we have a good understanding of many of the key metabolic processes. This is helped by the fact that the biochemistry is widely shared across the tree of life, and we have been able to apply lessons learned in one organism, say yeast, or pigeon muscle tissue, to other organisms.

Protein-protein interaction networks (PIN) and gene regulatory networks (GRN) are, by comparison, less well characterized and also show greater dependence on the species than MNs. To reconstruct PINs we have to rely on error-prone experimental assays, and GRNs we typically try to infer from gene expression data. All three networks are condition-dependent and dynamic to a degree that makes inference and analysis challenging. After a surge of interest in simple network models in the 1990s-2000s, the field now takes a more nuanced view, recognizing the complex temporal and conditional dependencies within these systems. Increasingly, we are moving from simple graph models to hypergraphs, in which a single edge can connect more than two nodes [80, 81, 82]. For example, the first phosphorylation of Erk [83],

$$\textrm{Erk}+\textrm{P}\longrightarrow\textrm{Erk-P}$$

would be represented by a single edge with three nodes, Erk, P, and Erk-P.

Networks and hypergraphs thus provide a visual representation of cellular processes, serve as an organizational tool for tracking molecules and interactions, and lay the groundwork for more mechanistic models of cell behavior.
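The hypergraph view of the Erk reaction above can be sketched in a few lines of Python. This is a minimal illustration, not an established library: the `Hyperedge` class and the `pairwise_projection` helper are hypothetical names introduced here. The projection shows what is lost when a reaction is flattened into ordinary pairwise edges.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hyperedge:
    """One reaction stored as a single edge incident on all its species."""
    reactants: frozenset
    products: frozenset

    @property
    def nodes(self):
        # All species touched by this hyperedge
        return self.reactants | self.products


def pairwise_projection(edge):
    """Flatten a hyperedge into reactant -> product pairs.

    The result no longer records that the reactants are needed jointly.
    """
    return sorted((r, p) for r in edge.reactants for p in edge.products)


# First phosphorylation of Erk: Erk + P -> Erk-P
erk_phosphorylation = Hyperedge(frozenset({"Erk", "P"}), frozenset({"Erk-P"}))
print(sorted(erk_phosphorylation.nodes))
print(pairwise_projection(erk_phosphorylation))
```

The single hyperedge carries three nodes, whereas the graph projection yields two separate edges and cannot distinguish "Erk and P together" from "Erk or P alone".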

3.5 Cellular Mechanics

The molecular components of a cell interact not only with each other but also with their environment, and cells are influenced by various physical forces, including mechanical forces, chemical adhesion forces between cells, and electrostatic interactions within the cell.

Mechanical forces can push and pull cells, activating cellular responses—a process seen even in embryos [84, 85]. The movement of cells within tissues is tightly coordinated by morphogen gradients, which guide these mechanical behaviors. However, studying these interactions is challenging, as we currently lack the ability to combine large-scale molecular assays and mechanical measurements in a single experiment. While we can apply and measure mechanical stress and use live imaging to observe responses, advances in spatial transcriptomics (and eventually spatial proteomics) will improve data collection. Nonetheless, integrating these different data types will remain a significant challenge.

4 Modeling Cell-Fate Decision Making Systems

The processes and the data modalities that we have outlined above highlight a set of challenges and opportunities that are amenable to modeling and sophisticated data analysis. We suggest here that mathematical analysis of mechanistic models, which encode our hypotheses, allows us to integrate data into an interpretable and quantitative framework [86] and thereby to improve our understanding of biological processes.

4.1 Data-Driven Analysis

In this section we focus primarily on single-cell transcriptomic data, an area in which data analysis has seen huge advances. We focus on mechanistic perspectives in the next section and begin here with descriptive analysis and networks of gene regulation.

4.1.1 Descriptive analysis

We refer to [72] for a recent review of descriptive methods for single-cell analysis, but we would like to issue a note of caution. The high dimensionality of the data, (tens of) thousands of genes across (tens of) thousands of cells, has given rise to a suite of dimensionality reduction techniques. Arguably their popularity is tied to our preference for visualizing data, which is only possible in low dimensions. Techniques such as t-SNE and UMAP produce deceptively pretty plots, and it can be tempting to apply quantitative interpretations to such visualizations. This should be avoided, as the purposes of analysis and visualization of high-dimensional data diverge and can be incompatible.

4.1.2 Network inference

Networks, as discussed above, offer us the opportunity to study interacting sets of genes/proteins and gain some systems-level insights. Single-cell mRNA data have given rise to a new set of network inference methods. They take the gene $\times$ cell matrix of gene expression measurements and seek to identify sets or pairs of genes that are statistically informative about each other’s expression levels. Network inference has been a notoriously challenging problem in systems biology [87, 88, 89]. At the core sits the large-$p$-small-$N$ problem, i.e. the fact that we have more potential hypotheses about the presence of pairwise interactions, $p$, than measurements, $N$. Single-cell data, with its abundance of measured cells, has overcome much of this.

Two types of approaches have been successful: approaches that are robust to noise and which can capture non-linear dependencies [90]; and flexible statistical or machine learning approaches, especially if combined with additional data [91]. The most promising methods still have appreciable error rates but serve as interpretative tools or hypothesis-generation methods for further testing. Crucially these methods, especially for carefully designed experiments, allow us to infer networks corresponding to different cell types or developmental stages. Capturing the complex temporal dependencies among cells (using e.g. geometric perspectives) even allows us to generate cell-specific networks [92]. This field is still in its infancy, but will likely result in more detailed testable hypotheses about the workings of biological processes.
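As a minimal sketch of what "statistically informative about each other’s expression levels" can mean, the following scores gene pairs by a plug-in estimate of mutual information on binned expression values. It is a deliberately simple stand-in and not the method of [90] or [91]; the function names, the equal-width binning, the threshold, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter


def discretize(values, n_bins=3):
    """Map expression values to equal-width bin indices 0..n_bins-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # constant gene: a single bin
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(x, y, n_bins=3):
    """Plug-in mutual information (in nats) between two expression vectors."""
    bx, by = discretize(x, n_bins), discretize(y, n_bins)
    n = len(bx)
    joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    return sum(
        (c / n) * math.log(c * n / (px[i] * py[j]))
        for (i, j), c in joint.items()
    )


def infer_edges(expression, threshold):
    """Return (gene_a, gene_b, score) for pairs scoring above the threshold."""
    genes = sorted(expression)
    edges = []
    for idx, a in enumerate(genes):
        for b in genes[idx + 1:]:
            score = mutual_information(expression[a], expression[b])
            if score > threshold:
                edges.append((a, b, score))
    return edges


# Toy data (hypothetical): geneB depends on geneA non-monotonically,
# geneC is constant across the nine cells.
cells = list(range(9))
expression = {
    "geneA": [float(c) for c in cells],
    "geneB": [float((c - 4) ** 2) for c in cells],
    "geneC": [5.0] * len(cells),
}
edges = infer_edges(expression, threshold=0.1)
print(edges)
```

In this toy example the Pearson correlation between `geneA` and `geneB` is zero by symmetry, yet the mutual information is clearly positive, which is the sense in which such scores capture non-linear dependencies. With realistic data the plug-in estimator is biased for small samples, so thresholds need calibration.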

4.2 Hypothesis-Driven Analysis

In Eqn. (1) we have assumed the existence of a mathematical model that describes the temporal evolution of a biological system. Below we provide an overview of recent attempts to model sub-cellular and cellular processes related to cell-fate decision making.

4.2.1 Modeling molecular and cellular processes

Mathematical approaches have been applied to the cell cycle, cell signaling, and gene expression, among others. One of the recurring themes has been the origin and maintenance of cell-to-cell variability and its impact on cell-fate decision making. We distinguish between two types of noise: intrinsic noise is due to the random effects affecting molecular reactions and is captured by stochastic modeling approaches; the second term in Eqn. (1) aims to capture these effects. Extrinsic noise, the second component we consider, refers to variability due to factors that are not explicitly captured in our model. This can, for example, include the cellular environment but also cell cycle stage, differences in ribosome abundance, or mitochondrial activity between different cells [93]. Capturing both types of noise is important; dissecting sources of noise is, however, not always possible, though some experimental designs allow this in principle and in practice.

In addition to SDEs of the form (1), we can also employ other modeling frameworks, both exact and approximate, to simulate and analyze molecular processes. For a limited number of cases exact solutions or statements can be derived, but simulation is currently the only route for many situations. We are getting better at describing the design principles for noise propagation and attenuation, but further experimental and theoretical work is required. For integrating transcriptomic and protein levels, stochastic modeling is already useful, and we can explore different alterations and refinements to the model and its dynamics, for example, by including opening and closing of chromatin, degradation of mRNA and protein, or threshold effects [94].
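As a minimal sketch of the simulation route, the following uses Gillespie's stochastic simulation algorithm, an exact sampling method for discrete reaction systems, on a two-state ("telegraph") gene in which chromatin opens and closes and mRNA is transcribed and degraded. The model, the function names, and the rate parameters (`k_on`, `k_off`, `k_tx`, `k_deg`) are illustrative assumptions and are not taken from Eqn. (1) or from [94].

```python
import random


def gillespie_telegraph(k_on, k_off, k_tx, k_deg, t_end, seed=0):
    """Exact stochastic simulation of a two-state gene.

    State: promoter (0 = closed, 1 = open) and mRNA copy number.
    Returns the event times and the mRNA count after each event.
    """
    rng = random.Random(seed)
    t, promoter, mrna = 0.0, 0, 0
    times, counts = [t], [mrna]
    while True:
        rates = [
            k_on * (1 - promoter),  # chromatin opens
            k_off * promoter,       # chromatin closes
            k_tx * promoter,        # transcription, open state only
            k_deg * mrna,           # mRNA degradation
        ]
        total = sum(rates)
        t += rng.expovariate(total)  # waiting time to the next reaction
        if t > t_end:
            break
        r = rng.random() * total     # choose a reaction proportional to rate
        if r < rates[0]:
            promoter = 1
        elif r < rates[0] + rates[1]:
            promoter = 0
        elif r < rates[0] + rates[1] + rates[2]:
            mrna += 1
        else:
            mrna -= 1
        times.append(t)
        counts.append(mrna)
    return times, counts


def time_average(times, counts, t_end):
    """Time-weighted mean copy number over [0, t_end]."""
    total = 0.0
    for i, c in enumerate(counts):
        t_next = times[i + 1] if i + 1 < len(times) else t_end
        total += c * (t_next - times[i])
    return total / t_end


times, counts = gillespie_telegraph(
    k_on=1.0, k_off=1.0, k_tx=20.0, k_deg=1.0, t_end=2000.0, seed=1
)
mean_mrna = time_average(times, counts, 2000.0)
print(mean_mrna)
```

For this model the stationary mean is $(k_{tx}/k_{deg})\,k_{on}/(k_{on}+k_{off})$, which equals 10 for the rates used here, so the time average of a long run should land close to that value. Refinements such as protein production or threshold effects would be added as further entries in the `rates` list.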

4.2.2 Modeling developmental processes

To model developmental processes biologists and mathematicians often turn to Waddington’s epigenetic landscape [53, 3] – a metaphor in which cell differentiation is visualized as a journey through a landscape of valleys and hills. Cells start as less differentiated at the top of the landscape, move downhill, and encounter branching points along the way. These branches represent cell-fate choices, with each path leading to a distinct valley corresponding to a stable cell type. This metaphor, introduced nearly a century ago, has since inspired quantitative and computational frameworks to describe key qualitative dynamics of developmental systems, such as the stability of cell types, bifurcations leading to different cell-fates, and the role of intermediate cell states. Over time, approaches to modeling the developmental landscape have evolved into two broad categories: “gene-centric” [59, 58, 60] and “gene-free” [61, 62, 57] models, each offering distinct perspectives, advantages and disadvantages.

4.2.3 Gene-centric models of the developmental landscape

Gene-centric models are typically rooted in SDEs of the form (1) that describe the evolution of gene expression profiles over time. By presuming a network of gene interactions, these models seek to understand how individual genes and their interactions shape the developmental landscape. Specifically, the probability $p(X)$ of a cell’s gene expression vector $X$ is used to determine valleys (stable states) and paths (transitional states) within the so-called “quasi-potential” landscape:

$$U(X)\propto-\log(p(X)). \tag{3}$$
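In practice $p(X)$ has to be estimated from observed or simulated cells. The sketch below applies Eqn. (3) in one dimension using a simple histogram density estimate; the function name, the choice of binning, and the toy sample are assumptions made for illustration, and real applications would need a density estimator suited to high-dimensional data.

```python
import math
from collections import Counter


def quasi_potential(samples, bin_width):
    """Estimate U(x) = -log p(x) on a 1D grid of bins.

    Returns a dict {bin_centre: U}, with p taken as the fraction of
    samples falling into each bin (empty bins are omitted).
    """
    bins = Counter(math.floor(x / bin_width) for x in samples)
    n = len(samples)
    return {
        (b + 0.5) * bin_width: -math.log(c / n)
        for b, c in sorted(bins.items())
    }


# Toy sample (hypothetical): two well-populated states and a rare
# intermediate state between them.
samples = [1.0] * 40 + [2.0] * 10 + [3.0] * 50
landscape = quasi_potential(samples, bin_width=1.0)
for centre, u in landscape.items():
    print(f"x = {centre:.1f}  U = {u:.3f}")
```

The two frequently observed states appear as valleys (low $U$) and the rarely observed intermediate state as a barrier between them, which is the reading of the landscape used in the text.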

The simple picture conceived by Waddington and analyzed by many since then, has provided valuable insights into the molecular mechanisms underlying stability and transition points within the developmental landscape. These models are effective at capturing the influence of specific gene interactions on cell behavior, which has been important for understanding the roles of critical genes in driving cell-fate choices and maintaining stable cellular states [95, 96].

As we connect this metaphor to the underlying mathematics and biological data, however, certain limitations surface. For example, these models typically rely on a gradient assumption, where the drift component of the underlying SDE is derived from the gradient of a potential function. The system’s dynamics are thus guided by an “energy-minimizing” behavior, with cells tending to move “downhill” in this landscape to reach stable states. This gradient-based perspective has limitations, especially in biological systems with complex, oscillatory, or non-conservative behaviors. Such processes do not strictly follow energy-minimizing paths, which means that they cannot be fully captured by a gradient potential alone. Moreover, the gradient assumption is typically applied within a quasi-steady-state context, where gene expression dynamics are presumed to stabilise over time. This assumption may not hold in developmental systems that undergo active transitions, bifurcations, or shifts between cell states [97].
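The role of the gradient assumption can be made explicit with a short worked example. We assume, for illustration only, constant isotropic noise of strength $D$; the symbols $D$, $f$, $J$, and $p_{\mathrm{ss}}$ are introduced here and Eqn. (1) may take a more general form. For a pure gradient system the stationary density is of Boltzmann form,

$$
dX = -\nabla U(X)\,dt + \sqrt{2D}\,dW
\quad\Longrightarrow\quad
p_{\mathrm{ss}}(X)\propto e^{-U(X)/D},
$$

so that $U(X)\propto-\log p_{\mathrm{ss}}(X)$, in line with Eqn. (3). For a general drift $f(X)$ with the same noise, the stationary probability flux $J = f\,p_{\mathrm{ss}} - D\nabla p_{\mathrm{ss}}$ satisfies $\nabla\cdot J = 0$ but need not vanish, and the drift splits as

$$
f(X) = -D\,\nabla\bigl(-\log p_{\mathrm{ss}}(X)\bigr) + \frac{J(X)}{p_{\mathrm{ss}}(X)}.
$$

The first term is the "downhill" part described by the quasi-potential; the second is a circulating part that is zero for gradient systems and non-zero for oscillatory or otherwise non-conservative dynamics, which is why a landscape alone cannot describe such systems.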

Stochastic dynamics, intrinsic to cellular behavior, can reshape the landscape both quantitatively and qualitatively, introducing elements such as noise-induced transitions. These can drive transitions across the landscape in a way that diverges from simple “downhill” behavior. Differential equation models, however, typically use deterministic frameworks or assume additive noise. Gene-centric models also overlook cell-cell interactions and spatial or temporal factors, which are critical in many developmental processes but are challenging to integrate into a fixed gene network framework.

Figure 3: (A) Yeast cell with major cellular compartments highlighted. Our ambition is to provide mathematical models, or ‘digital twins’, for fungal organisms beyond Saccharomyces cerevisiae. (B) Our digital twins are hybrid models which combine the advantages of mechanistic and biophysical modeling with machine learning/AI models of less well characterized system components that can be learned from experimental data. (C) The mechanistic model is constructed from available biological information, including metabolic, gene regulation, signaling, and protein-protein interaction network data; the molecular machines that make up these networks; and the cellular compartments into which the cell is organized. This information can be curated from bioinformatics resources, public data bases, and literature (through text-mining). This figure was created in BioRender.com.

4.2.4 Gene-free approaches in developmental modeling

In parallel to connecting models of landscapes to gene regulatory systems, recent approaches explore the structure of these landscapes without relying on specific assumptions about gene networks. This shift, pioneered in part by the work of Briscoe, Siggia, and Rand [61, 62, 57], leverages the work of the mathematician René Thom, who had a keen interest in applying catastrophe theory to biological systems. This “gene-free” perspective diverges from traditional gene-centric models, instead focusing on the global structural features that characterize developmental processes. These approaches seek to describe the behavior of the system as a whole, using concepts like stability points and saddle nodes to study the dynamics of cell state transitions. Rather than anchoring models in predefined gene networks, this framework captures the system’s global behavior, focusing on the topological and geometrical features that shape developmental landscapes.
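A standard textbook example from catastrophe theory, the cusp, illustrates how such structural features arise without reference to any gene network. The sketch below is an illustration under assumptions and not one of the landscapes constructed in [61, 62, 57]: the potential $V(x) = x^4/4 - a x^2/2 - b x$, the control parameters `a` and `b`, and the function names are chosen here for demonstration.

```python
def n_stable_states(a, b):
    """Number of stable fixed points of dx/dt = -dV/dx for the cusp
    potential V(x) = x**4/4 - a*x**2/2 - b*x.

    The cubic x**3 - a*x - b = 0 has three real roots (two valleys and
    one barrier) exactly when 4*a**3 > 27*b**2; the boundary of this
    region is where saddle-node (fold) bifurcations occur.
    """
    return 2 if 4 * a**3 > 27 * b**2 else 1


def relax(x0, a, b, dt=0.01, steps=5000):
    """Follow the noise-free downhill dynamics from x0 (forward Euler)."""
    x = x0
    for _ in range(steps):
        x += dt * (-x**3 + a * x + b)
    return x


# Symmetric bistable landscape: the initial state decides the fate.
print(n_stable_states(1.0, 0.0), relax(0.1, 1.0, 0.0), relax(-0.1, 1.0, 0.0))
# Tilting the landscape (b = 1) removes one valley: a single fate remains.
print(n_stable_states(1.0, 1.0), relax(-0.1, 1.0, 1.0))
```

Changing `b` plays the role of a signal that tilts the landscape: once the fold boundary is crossed, one valley disappears and cells that occupied it are forced into the remaining state.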

Despite their innovative approach, these methods rely on the same gradient-based assumptions as gene-centric models and are thus subject to similar limitations when connecting these with data. However, they do offer several advantages. By not relying on specific gene interactions, gene-free approaches provide flexibility in capturing the complex and adaptive behaviors of developmental systems, accommodating phenomena that are difficult to model within a gene-centric framework. The structural features identified through catastrophe theory, such as bifurcations and attractors, offer robust insights into cell state stability and the conditions that lead to cell-fate transitions. Such methods lend themselves well to high-dimensional single-cell data, where the explicit modeling of all gene interactions is computationally prohibitive and often impractical due to incomplete information on gene regulatory networks. As the field progresses, integrating these approaches with empirical data may illuminate previously uncharacterized aspects of developmental biology, such as transient cell states and non-equilibrium dynamics, offering a pathway toward comprehensive, system-level models of development.

Ultimately, combining gene-centric and gene-free models may yield a more comprehensive and nuanced understanding of developmental processes, accommodating both gene-specific interactions and broader system-level behaviors. This combined framework has the potential to capture a richer, multidimensional picture of development, opening new pathways for exploring cell-fate determination and dynamic regulatory landscapes.

5 Hybrid Models Of Cell-Fate Decision Making Systems And Their Applications In Biotechnology

Up to now we have dealt with cell-fate decision making systems in multicellular systems and predominantly in the context of developmental processes. But cell-fate decision making is not exclusive to eukaryotes, nor to multicellular organisms. Bacterial and archaeal cells, as well as single-celled eukaryotic organisms, have to respond and adapt their behavior to different environments. Just as in multicellular eukaryotes, there are decisions that are part of normal cell physiology, such as adapting to different environmental or internal cellular conditions. Prokaryotes – we focus on these for simplicity and because our experience overlaps with this domain – exhibit bistability [98], oscillatory behavior [99], a multitude of switch-like behaviors (including toggle-switches) [98], robust perfect adaptation [100], and even Turing patterns [52]; they exhibit this behavior either naturally or in synthetically engineered systems. In short, they have considerable computational capacities, some of which we can harness and even repurpose to our advantage. There would be neither bread nor beer without Saccharomyces cerevisiae, but genetically engineered cells are poised to revolutionize a number of different industries, including manufacturing, across a much broader range of applications. Harnessing and repurposing microbial metabolism promises a sustainable way to produce chemicals and materials – from fuel to food and cosmetic additives to pharmaceuticals – that have traditionally relied on fossil fuel-based manufacturing practices.

5.1 Synthetic Biology to Control Cell-Fate Decision Making Processes

Synthetic and engineering biology try to optimize and repurpose biological processes, even whole microbial organisms, for applications in bio-manufacturing and beyond. One promising avenue here is to reprogram or rewire the cell-fate decision making machinery of these biotechnologically important organisms using new genomic constructs. Understanding cell-fate decision making systems, the ability to explore their behavior, and the behavior of synthetically engineered systems in silico, promises to make this design process more efficient. The hope is that in silico investigations can triage the biological design space before in vitro or in vivo analyses, reducing the time and associated costs involved in creating new microbial strains. Early successes show that this hope seems justified [101].

Understanding the design principles that drive certain types of cellular behavior would greatly assist in designing appropriately performing biosynthetic systems. As such, the search for design principles has become central to many endeavours in engineering biology, and generally proceeds in one of three ways. First, we can use mathematical analysis, typically by exploring and exploiting abstractions of mathematical models that can exhibit the desired behavior. Robust perfect adaptation is one example where this is possible. Where this is not possible – and that is the overwhelming majority of cases – we can either exhaustively explore model spaces when the design space is relatively small, or sample models statistically to explore larger model spaces. However, even this will ultimately reach a limit because of the associated computational demands. For example, previous analyses that have exploited exhaustive sampling of Turing pattern generating (TPG) mechanisms considered thousands of models (5,760 different models with three nodes, where two are allowed to diffuse). Evidently, if we want to explore parameter spaces, at least to a moderate degree, then the number of model evaluations quickly explodes. In these situations we can either make mathematical simplifications which allow analytic insights into approximate models [48, 50, 102], or sample less idealized (and arguably more realistic) models [49, 47]. Comparisons between different assessments of potential Turing pattern generators then depend on the nature of the systems and, for sampling-based approaches, on the depth of sampling. For example, a more comprehensive analysis where some $10^{11}$ different model/parameter combinations were sampled [47] found more TPGs than an approach which sampled some $10^{8}$ combinations [49].
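The sampling strategy can be illustrated with a deliberately reduced sketch: instead of the three-node models analysed in [47, 49], it uses the classical linear-stability conditions for a two-species reaction-diffusion system and draws random Jacobians and diffusion constants. The function names, parameter ranges, and the two-species simplification are all assumptions made for illustration.

```python
import random


def is_turing_unstable(jac, d_u, d_v):
    """Classical Turing conditions for a two-species system.

    jac is the Jacobian [[f_u, f_v], [g_u, g_v]] at the homogeneous
    steady state; d_u and d_v are the diffusion constants.
    """
    (f_u, f_v), (g_u, g_v) = jac
    trace = f_u + g_v
    det = f_u * g_v - f_v * g_u
    if not (trace < 0 and det > 0):
        return False  # must be stable in the absence of diffusion
    # det(J - k^2 D) must become negative for some wavenumber k > 0
    h = d_v * f_u + d_u * g_v
    return h > 0 and h * h > 4 * d_u * d_v * det


def fraction_turing(n_samples, seed=0):
    """Fraction of randomly sampled parameter sets that can form patterns."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        jac = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
        d_u, d_v = rng.uniform(0.01, 1.0), rng.uniform(0.01, 1.0)
        hits += is_turing_unstable(jac, d_u, d_v)
    return hits / n_samples


# An activator-inhibitor Jacobian: patterns need unequal diffusion.
activator_inhibitor = [[1.0, -2.0], [3.0, -4.0]]
print(is_turing_unstable(activator_inhibitor, d_u=1.0, d_v=40.0))
print(is_turing_unstable(activator_inhibitor, d_u=1.0, d_v=1.0))
print(fraction_turing(10_000))
```

Each sample costs only a few arithmetic operations here, but the estimated fraction depends on the sampled ranges and on the number of samples, which mirrors the dependence on sampling depth noted above. For larger networks each evaluation becomes far more expensive and the space to cover grows rapidly.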

A recurring problem in these studies is that the models we consider are systematically and purposefully designed to focus on what are perceived as the core mechanisms, often ignoring the wider molecular and cellular networks in which these mechanisms are embedded. While this approach can be successful, there are no a priori guarantees. When it fails, it is difficult to decide whether the issue is due to inaccuracies in the model itself, or external factors modulating cellular dynamics beyond the model’s scope [32, 103]. Such cell physiological feedback has been observed in many contexts as we have already argued above. One way to address this challenge is to make models larger. For example, in engineering, we might first model and design an airplane wing in isolation. However, before building a physical prototype, we refine this design within a comprehensive computer model of the entire plane, since airflow around an isolated wing differs from that around a full aircraft. Despite the increased computational cost, we scale the models to capture these complexities. Similarly, we believe it may be essential to develop more detailed models of cellular systems that function as “digital twins” of real cells. This concept, dating back at least to discussions between Sydney Brenner and Francis Crick in 1967, is older than many realize.

5.2 Whole-cell Models and Hybrid Whole-cell Models

Constraint-based metabolic models have been widely used in the design of genetically engineered organisms and microbial strains. While often successful, models focusing solely on metabolism – and thus ignoring gene regulation, signaling, cell-wall biophysics, and other biological processes – have inherent limitations [103]. First, they model only the enzymes involved in metabolism, overlooking genes with functions outside of metabolic reactions. Second, they fail to capture the interactions between different biological processes and how these interactions shape the behavior of the system as a whole. Sydney Brenner’s idea of a CellMap [63] is more comprehensive, and efforts towards whole-cell modeling are well underway. For bacteria in particular, starting with Mycoplasma genitalium [104] ($\approx 525$ genes) and advancing to E. coli [105], and for yeast, we now have powerful models that extend beyond metabolism [106] to include cellular compartments, gene regulation, and signaling. The aim is to generate “digital twins” of these industrially important microbes, allowing them to be studied in silico to establish cause-and-effect relationships that connect biological and physiological mechanisms across scales. This approach can ultimately guide the design of biosynthetic strains with superior performance characteristics (Fig. 3 A,B).

Whole-cell models are conceived to mathematically capture all relevant biomolecular processes inside a cell, Fig. 3B. As we have argued above, mathematical or mechanistic models are particularly well-suited to integrating data across multiple ’omics levels to uncover mechanisms that explain the emergence of system-level behavior. However, in the short term at least, we lack the knowledge to construct these models comprehensively despite the wealth of information on metabolic, gene regulatory, and signaling processes (Fig. 3C). Instead, we now have an opportunity to generate hybrid models that combine mechanistic and data-driven modeling frameworks so that each leverages its distinct features and compensates for the deficiencies of the other. Machine learning can incorporate physics-based knowledge, e.g. governing equations, boundary conditions, and constraints, to resolve ill-posed problems and non-physical predictions that may arise from the naive use of ML on sparse, noisy, or biased data. Additionally, ML can benefit from mechanistic models by using them to generate synthetic data, providing a cost-effective and efficient way to supplement sparse training datasets. In turn, mechanistic models can leverage machine learning to develop surrogate models that approximate complex, computationally intensive mechanistic simulations as seen in [107], identify system dynamics and parameters, analyse sensitivities, and quantify uncertainty.

Currently, there are very few genuinely hybrid models that integrate mechanistic and data-driven approaches. More often, ML is employed to aid in the construction of mechanistic models. For example, in [108] the authors use deep learning to predict catalytic constants ($k_{cat}$ values) across different species which are subsequently used as inputs to enzyme-constrained genome-scale metabolic models (ecGEMs). Similarly, ML is frequently used to extract features from multi-omics data for setting flux balance analysis (FBA) constraints [109]. In a truly hybrid model we encode our knowledge and our hypotheses in a mechanistic mathematical model, e.g. of the type in Eqn. (1); but we also model other less clearly defined processes and cellular features, including uncertainty in our model, through a ML/AI model such as a deep neural network, Fig. 3B. From a design perspective, these models naturally perform best when design alternatives can be represented within the mechanistic framework.
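The structure of such a hybrid model can be sketched in a few lines, under strong simplifying assumptions. Here the mechanistic part encodes only first-order degradation, and a linear least-squares fit stands in for the deep neural network that would, in a realistic setting, learn the processes the mechanistic model leaves out. All function names, rates, and data are hypothetical.

```python
def mechanistic_rhs(x, k_deg=0.5):
    """Known biology (assumed here): first-order degradation only."""
    return -k_deg * x


def fit_residual(xs, dxdt, mechanistic):
    """Fit r(x) = a + b*x to the part of dx/dt the mechanistic model
    does not explain. A linear fit is a stand-in for an ML/AI model."""
    residuals = [d - mechanistic(x) for x, d in zip(xs, dxdt)]
    n = len(xs)
    mean_x, mean_r = sum(xs) / n, sum(residuals) / n
    b = sum((x - mean_x) * (r - mean_r) for x, r in zip(xs, residuals))
    b /= sum((x - mean_x) ** 2 for x in xs)
    a = mean_r - b * mean_x
    return lambda x: a + b * x


def hybrid_rhs(mechanistic, learned):
    """Hybrid model: mechanistic term plus data-driven correction."""
    return lambda x: mechanistic(x) + learned(x)


# Synthetic "measurements" from dynamics with an unmodeled constant
# production term: dx/dt = 2 - 0.5 * x
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
dxdt = [2.0 - 0.5 * x for x in xs]

learned = fit_residual(xs, dxdt, mechanistic_rhs)
model = hybrid_rhs(mechanistic_rhs, learned)
print(model(4.0))  # near zero: x = 4 is the steady state of the true system
```

The design choice illustrated is that the data-driven component is asked to explain only the residual, so the interpretable mechanistic term is retained and design alternatives expressed through it, such as a changed degradation rate, can still be explored.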

There are promising applications, for example in strain design, that demonstrate the benefits of combining metabolic modeling with ML/AI approaches, whether truly hybrid or otherwise [110, 111]. Ultimately, this hybrid approach has the potential to vastly reduce the time and effort required to generate whole-cell models for different species. This has several advantages. First, generating digital twins for related organisms will give us the ability to explore complex genotype-phenotype relationships theoretically, and to address fundamental questions in evolutionary genomics from a new perspective, e.g. the differences in pathogenicity among closely related fungal species. Second, it allows us to study, and even expand (using natural or synthetic strains), our repertoire of biotechnologically useful microbial systems.

Complex interacting networks that define cell-fate decision making systems are now captured in rich datasets, enabling the development and integration of mechanistic and machine learning models to deepen our exploration and understanding of how cells “make” decisions.

6 Summary

Purely descriptive analysis methods cannot capture the complex processes we observe in cell-fate decision making systems. We need comprehensive descriptions that allow us to integrate diverse and heterogeneous data into a single framework. Decision making is a process with an intrinsic time component, but so far we have lacked the ability to observe these processes directly and have instead relied on computational methods to infer temporal dynamics.

Here we argue that CellMaps or mechanistic models of cellular behavior are essential to make sense of the large amounts of heterogeneous data available to us. In addition to the many functional insights they provide, they are pivotal for distilling design principles of cellular behavior. If we are able to identify such design principles, we will have a framework to reason about cell-fate decision making processes, especially if we are able to consider cell-fate decision making systems from across the tree of life. Integrating and interpreting functional and conceptual drivers of cell-fate decision making between species, and across experimental systems, will become a greater priority.

We are on the cusp of being able to affect cell-fate decision making processes in vivo, and we need to build up the ethical and societal understanding and license to do so. Here, too, CellMaps and whole-cell models will be important tools to assess safety, reliability, and social acceptability prior to any in vitro or in vivo interventions.

Disclosure Statement

MAC and MPHS are co-founders and shareholders of Cell Bauhaus.

Acknowledgments

TEW and MPHS acknowledge funding through an Australian Research Council Laureate Fellowship to MPHS (FL220100005). LH acknowledges funding through the ARC Centre of Excellence for the Mathematical Analysis of Cellular Systems. We thank the members of the Theoretical Systems Biology group for lively discussions and the stimulating atmosphere that have been the setting in which this review became possible.

References

  • [1] Hatton IA, Galbraith ED, Merleau NSC, Miettinen TP, Smith BM, Shander JA. 2023. The human cell count and size distribution. Proc Natl Acad Sci U S A 120:e2303077120
  • [2] Rizvi AH, Camara PG, Kandror EK, Roberts TJ, Schieren I, et al. 2017. Single-cell topological RNA-seq analysis reveals insights into cellular differentiation and development. Nature biotechnology 35:551–560
  • [3] MacArthur B. 2022. The geometry of cell fate. Cell Syst 13:1–3
  • [4] Kuchina A, Espinar L, Garcia-Ojalvo J, Süel GM. 2011. Reversible and noisy progression towards a commitment point enables adaptable and reliable cellular decision-making. PLoS Computational Biology 7:e1002273
  • [5] Iwańska O, Latoch P, Kopik N, Kovalenko M, Lichocka M, et al. 2024. Translation in Bacillus subtilis is spatially and temporally coordinated during sporulation. Nat Commun 15:7188
  • [6] Dragoš A, Priyadarshini B, Hasan Z, Strube ML, Kempen PJ, et al. 2021. Pervasive prophage recombination occurs during evolution of spore-forming bacilli. ISME J 15:1344–1358
  • [7] Futo M, Opašić L, Koska S, Čorak N, Široki T, et al. 2021. Embryo-like features in developing Bacillus subtilis biofilms. Mol Biol Evol 38:31–47
  • [8] Stadler T, Pybus OG, Stumpf MPH. 2021. Phylodynamics for cell biologists. Science 371:eaah6266
  • [9] Foley N, Mason V, Harris A, Bredemeyer K, Damas J, et al. 2023. A genomic timescale for placental mammal evolution. Science 380:eabl8189
  • [10] Baysoy A, Bai Z, Satija R, Fan R. 2023. The technological landscape and applications of single-cell multi-omics. Nat Rev Mol Cell Biol 24:695–713
  • [11] Heumos L, Schaar AC, Lance C, Litinetskaya A, Drost F, et al. 2023. Best practices for single-cell analysis across modalities. Nat Rev Genet 24:550–572
  • [12] Gorin G, Fang M, Chari T, Pachter L. 2022. RNA velocity unraveled. PLoS Comput Biol 18:e1010492
  • [13] Moses L, Pachter L. 2022. Museum of spatial transcriptomics. Nat Methods 19:534–546
  • [14] Ahl PJ, Hopkins RA, Xiang WW, Au B, Kaliaperumal N, et al. 2020. Met-flow, a strategy for single-cell metabolic analysis highlights dynamic changes in immune subpopulations. Commun Biol 3:305
  • [15] Rappez L, Stadler M, Triana S, Gathungu RM, Ovchinnikova K, et al. 2021. SpaceM reveals metabolic states of single cells. Nature Methods 18:799–805
  • [16] Hrovatin K, Fischer DS, Theis FJ. 2022. Toward modeling metabolic state from single-cell transcriptomics. Mol Metab 57:101396
  • [17] Casey MJ, Stumpf PS, Macarthur BD. 2020. Theory of cell fate. Wiley Interdisciplinary Reviews: Systems Biology and Medicine 12:e1471
  • [18] Brun-Usan M, Thies C, Watson R. 2020. How to fit in: The learning principles of cell differentiation. PLoS Comput Biol 16:e1006811
  • [19] Babtie AC, Stumpf MPH. 2017. How to deal with parameters for whole-cell modelling. Journal of the Royal Society, Interface / the Royal Society 14:20170237
  • [20] Jumper J, Evans R, Pritzel A, Green T, Figurnov M, et al. 2021. Highly accurate protein structure prediction with AlphaFold. Nature 596:583–589
  • [21] David A, Islam S, Tankhilevich E, Sternberg M. 2022. The AlphaFold database of protein structures: A biologist’s guide. J Mol Biol 434:167336
  • [22] Madsen C, Barbensi A, Zhang S, Ham L, David A, et al. 2023. The topological properties of the protein universe. bioRxiv
  • [23] Clevers H, Rafelski S, Elowitz MB, Klein AM, Shendure J, et al. 2017. What is your conceptual definition of cell type in the context of a mature organism? Cell Systems 4:255–259
  • [24] Rafelski S, Theriot J. 2024. Establishing a conceptual framework for holistic cell states and state transitions. Cell 187:2633–2651
  • [25] Bray D. 2009. Wetware: A computer in every living cell. Yale University Press
  • [26] Lang M, Summers S, Stelling J. 2014. Cutting the wires: modularization of cellular networks for experimental design. Biophysical journal 106:321–331
  • [27] Mc Mahon SS, Lenive O, Filippi S, Stumpf MPH. 2015. Information processing by simple molecular motifs and susceptibility to noise. Journal of the Royal Society, Interface / the Royal Society 12:0597
  • [28] Andrews SS, Wiley HS, Sauro HM. 2024. Design patterns of biological cells. Bioessays 46:e2300188
  • [29] Poyatos JF. 2024. Design principles of multi-map variation in biological systems. Phys Biol 21
  • [30] Lynch MR. 2024. Evolutionary cell biology: the origins of cellular architecture. Oxford University Press
  • [31] Green JBA, Sharpe J. 2015. Positional information and reaction-diffusion: two big ideas in developmental biology combine. Development 142:1203–1211
  • [32] Gunawardena J. 2014. Models in biology: accurate descriptions of our pathetic thinking. BMC biology 12:29
  • [33] Brandman O, Ferrell Jr JE, Li R, Meyer T. 2005. Interlinked fast and slow positive feedback loops drive reliable cell decisions. Science 310:496–498
  • [34] Angeli D, Ferrell Jr JE, Sontag ED. 2004. Detection of multistability, bifurcations, and hysteresis in a large class of biological positive-feedback systems. Proceedings of the National Academy of Sciences 101:1822–1827
  • [35] Novák B, Tyson JJ. 2008. Design principles of biochemical oscillators. Nature Reviews Molecular Cell Biology 9:981–991
  • [36] Zhou Z, Liu Y, Feng Y, Klepin S, Tsimring L, et al. 2023. Engineering longevity-design of a synthetic gene oscillator to slow cellular aging. Science 380:376–381
  • [37] Leon M, Woods ML, Fedorec AJH, Barnes CP. 2016. A computational method for the investigation of multistable systems and its application to genetic switches. BMC Systems Biology 10:130–112
  • [38] Alberstein RG, Guo AB, Kortemme T. 2022. Design principles of protein switches. Curr Opin Struct Biol 72:71–78
  • [39] Araujo R, Vittadello S, Stumpf MPH. 2021. Bayesian and algebraic strategies to design in synthetic biology. Proceedings of the IEEE
  • [40] Harrington HA, Feliu E, Wiuf C, Stumpf MPH. 2013. Cellular compartments cause multistability and allow cells to process more information. Biophysical journal 104:1824–1831
  • [41] Michailovici I, Harrington HA, Azogui HH, Yahalom-Ronen Y, Plotnikov A, et al. 2014. Nuclear to cytoplasmic shuttling of ERK promotes differentiation of muscle stem/progenitor cells. Development 141:2611–2620
  • [42] Ferrell Jr JE. 2012. Bistability, bifurcations, and Waddington's epigenetic landscape. Current Biology 22:R458–R466
  • [43] Araujo RP, Liotta LA. 2018. The topological requirements for robust perfect adaptation in networks of any size. Nature communications 9:1757
  • [44] Araujo R, Liotta L. 2023. Universal structures for adaptation in biochemical reaction networks. Nat Commun 14:2251
  • [45] Kobayashi TJ, Kamimura A. 2011. Dynamics of intracellular information decoding. Physical biology 8:055007
  • [46] Barnes CP, Silk D, Sheng X, Stumpf MPH. 2011. Bayesian design of synthetic biological systems. Proceedings of the National Academy of Sciences of the United States of America 108:15190–15195
  • [47] Scholes NS, Schnoerr D, Isalan M, Stumpf MPH. 2019. A comprehensive network atlas reveals that Turing patterns are common but not robust. Cell Systems 9:243–257.e4
  • [48] Marcon L, Diego X, Sharpe J, Müller P. 2016. High-throughput mathematical analysis identifies Turing networks for patterning with equally diffusing signals. eLife 5
  • [49] Zheng MM, Shao B, Ouyang Q. 2016. Identifying network topologies that can generate Turing pattern. Journal of theoretical biology 408:88–96
  • [50] Leyshon T, Tonello E, Schnoerr D, Siebert H, Stumpf M. 2021. The design principles of discrete Turing patterning systems. J Theor Biol 531:110901
  • [51] Sharpe J. 2019. Wolpert’s French flag: what’s the problem. Development 146:dev185967
  • [52] Turing AM. 1952. The chemical basis of morphogenesis. Philosophical transactions of the Royal Society of London. Series B, Biological sciences 237:38–72
  • [53] Moris N, Pina C, Martinez Arias A. 2016. Transition states and cell fate decisions in epigenetic landscapes. Nature reviews. Genetics 17:693–703
  • [54] Guillemin A, Stumpf MPH. 2020a. Noise and the molecular processes underlying cell fate decision-making. Physical biology 18:011002
  • [55] Coomer M, Ham L, Stumpf M. 2022. Noise distorts the epigenetic landscape and shapes cell-fate decisions. Cell Syst :1–20
  • [56] Huang B, Lu M, Galbraith M, Levine H, Onuchic JN, Jia D. 2020. Decoding the mechanisms underlying cell-fate decision-making during stem cell differentiation by random circuit perturbation. Journal of the Royal Society, Interface / the Royal Society 17:20200500
  • [57] Rand D, Raju A, Sáez M, Corson F, Siggia E. 2021. Geometry of gene regulatory dynamics. Proc Natl Acad Sci U S A 118:e2109729118
  • [58] Sáez M, Blassberg R, Camacho-Aguilar E, Siggia ED, Rand DA, Briscoe J. 2021. Statistically derived geometrical landscapes capture principles of decision-making dynamics during cell fate transitions. Cell Systems
  • [59] Camacho-Aguilar E, Warmflash A, Rand DA. 2021. Quantifying cell transitions in C. elegans with data-fitted landscape models. PLoS computational biology 17:e1009034
  • [60] Liu Y, Zhang SY, Kleijn I, Stumpf MPH. 2024. Approximate Bayesian computation for inferring Waddington landscapes from single-cell data. Royal Society Open Science 11:231697
  • [61] Corson F, Siggia ED. 2012. Geometry, epistasis, and developmental patterning. Proceedings of the National Academy of Sciences, USA 109:5568–5575
  • [62] Corson F, Siggia ED. 2017. Gene-free methodology for cell fate dynamics during development. eLife 6:1947
  • [63] Brenner MP. 2010. Sequences and consequences. Phil. Trans. R. Soc. B 365:207–212
  • [64] Glauche I, Marr C. 2021. Mechanistic models of blood cell fate decisions in the era of single-cell data. Curr Opin Syst Biol 28
  • [65] Stumpf MP. 2021a. Statistical and computational challenges for whole cell modelling. Current Opinion in Systems Biology
  • [66] Rich J, Moses L, Einarsson P, Jackson K, Luebbert L, et al. 2024. The impact of package selection and versioning on single-cell RNA-seq analysis. bioRxiv :2024.04.04.588111
  • [67] Guillemin A, Stumpf MPH. 2020b. Non-equilibrium statistical physics, transitory epigenetic landscapes, and cell fate decision dynamics. Mathematical Biosciences and Engineering 17:7916–7930
  • [68] Lynch M. 2006. The origins of genome architecture. Sinauer Associates
  • [69] Aebersold R, Agar JN, Amster IJ, Baker MS, Bertozzi CR, et al. 2018. How many human proteoforms are there? Nat Chem Biol 14:206–214
  • [70] Sarkies P. 2020. Molecular mechanisms of epigenetic inheritance: Possible evolutionary implications. Semin Cell Dev Biol 97:106–115
  • [71] Regev A, Teichmann S, Rozenblatt-Rosen O, Stubbington M, Ardlie K, et al. 2017. The human cell atlas. eLife 6:e27041
  • [72] Lähnemann D, Köster J, Szczurek E, McCarthy DJ, Hicks SC, et al. 2020. Eleven grand challenges in single-cell data science. Genome biology 21:31
  • [73] Gorin G, Vastola J, Pachter L. 2023. Studying stochastic systems biology of the cell with single-cell genomics data. Cell Syst 14:822–843.e22
  • [74] Liepe J, Marino F, Sidney J, Jeko A, Bunting DE, et al. 2016. A large fraction of HLA class I ligands are proteasome-generated spliced peptides. Science 354:354–358
  • [75] Komorowski M, Miekisz J, Stumpf MPH. 2013. Decomposing noise in biochemical signaling systems highlights the role of protein degradation. Biophysical journal 104:1783–1793
  • [76] Wang S, Karikomi M, Maclean AL, Nie Q. 2019. Cell lineage and communication network inference via optimization for single-cell transcriptomics. Nucleic acids research 47:e66–e66
  • [77] Pham D, Tan X, Balderson B, Xu J, Grice L, et al. 2023. Robust mapping of spatiotemporal trajectories and cell-cell interactions in healthy and diseased tissues. Nat Commun 14:7739
  • [78] Rabinowitz JD, Vastag L. 2012. Teaching the design principles of metabolism. 8:497–501
  • [79] Luzia L, Battjes J, Zwering E, Jansen D, Melkonian C, Teusink B. 2024. A fast method to distinguish between fermentative and respiratory metabolisms in single yeast cells. iScience 27:108767
  • [80] Müller S, Flamm C, Stadler P. 2022. What makes a reaction network “chemical”? J Cheminform 14:63
  • [81] Zhang Y, Lucas M, Battiston F. 2023. Higher-order interactions shape collective dynamics differently in hypergraphs and simplicial complexes. Nat Commun 14:1605
  • [82] Battiston F, Amico E, Barrat A, Bianconi G, Ferraz de Arruda G, et al. 2021. The physics of higher-order interactions in complex systems. Nature Physics 17:1093–1098
  • [83] Filippi S, Barnes CP, Kirk PDW, Kudo T, Kunida K, et al. 2016. Robustness of MEK-ERK dynamics and origins of cell-to-cell variability in MAPK signaling. Cell reports 15:2524–2535
  • [84] Ichbiah S, Delbary F, McDougall A, Dumollard R, Turlier H. 2023. Embryo mechanics cartography: inference of 3D force atlases from fluorescence microscopy. Nat Methods 20:1989–1999
  • [85] Fabrèges D, Corominas-Murtra B, Moghe P, Kickuth A, Ichikawa T, et al. 2024. Temporal variability and cell mechanics control robustness in mammalian embryogenesis. Science 386:eadh1145
  • [86] Huang S. 2018. The tension between big data and theory in the omics era of biomedical research. Perspectives in biology and medicine 61:472–488
  • [87] Stumpf M. 2021b. Inferring better gene regulation networks from single cell data. Current Opinion in Systems Biology
  • [88] Pratapa A, Jalihal AP, Law JN, Bharadwaj A, Murali TM. 2020. Benchmarking algorithms for gene regulatory network inference from single-cell transcriptomic data. Nature methods 17:147–154
  • [89] Wells CA, Choi J. 2019. Transcriptional profiling of stem cells: Moving from descriptive to predictive paradigms. Stem cell reports 13:237–246
  • [90] Chan TE, Stumpf MPH, Babtie AC. 2017. Gene regulatory network inference from single-cell data using multivariate information measures. Cell Systems 5:251–267.e3
  • [91] Petralia F, Wang P, Yang J, Tu Z. 2015. Integrative random forest for gene regulatory network inference. Bioinformatics 31:i197–205
  • [92] Wang X, Choi D, Roeder K. 2021. Constructing local cell-specific networks from single-cell data. Proc Natl Acad Sci U S A 118:e2113178118
  • [93] Ham L, Brackston RD, Stumpf MPH. 2020. Extrinsic noise and heavy-tailed laws in gene expression. Physical Review Letters 124:108101
  • [94] Ham L, Coomer MA, Öcal K, Grima R, Stumpf MPH. 2024. A stochastic vs deterministic perspective on the timing of cellular events. Nature Communications 15:0
  • [95] Brackston RD, Lakatos E, Stumpf MPH. 2018. Transition state characteristics during cell differentiation. PLoS computational biology 14:e1006405
  • [96] Brackston RD, Wynn A, Stumpf MPH. 2018. Construction of quasipotentials for stochastic dynamical systems: An optimization approach. Physical Review E 98:022136
  • [97] Sáez M, Briscoe J, Rand D. 2022. Dynamical landscapes of cell fate decisions. Interface Focus 12:20220002
  • [98] Gardner TS, Cantor CR, Collins JJ. 2000. Construction of a genetic toggle switch in Escherichia coli. Nature 403:339–342
  • [99] Elowitz MB, Leibler S. 2000. A synthetic oscillatory network of transcriptional regulators. Nature 403:335–338
  • [100] Yi TM, Huang Y, Simon MI, Doyle J. 2000. Robust perfect adaptation in bacterial chemotaxis through integral feedback control. Proc. Natl. Acad. Sci. U. S. A. 97:4649–4653
  • [101] Voigt CA. 2020. Synthetic biology 2020-2030: six commercially-available products that are changing our world. Nat. Commun. 11:6379
  • [102] Smith S, Dalchau N. 2018. Model reduction enables Turing instability analysis of large reaction–diffusion models. Journal of the Royal Society Interface 15:20170805
  • [103] Kirk PDW, Babtie AC, Stumpf MPH. 2015. Systems biology (un)certainties. Science 350:386–388
  • [104] Karr JR, Sanghvi JC, Macklin DN, Gutschow MV, Jacobs JM, et al. 2012. A whole-cell computational model predicts phenotype from genotype. Cell 150:389–401
  • [105] Sun G, Ahn-Horst TA, Covert MW. 2021. The E. coli whole-cell modeling project. EcoSal Plus 9:eESP00012020
  • [106] Elsemman I, Rodriguez Prado A, Grigaitis P, Garcia Albornoz M, Harman V, et al. 2022. Whole-cell modeling in yeast predicts compartment-specific proteome constraints that drive metabolic strategies. Nat Commun 13:801
  • [107] Gherman IM, Abdallah ZS, Pang W, Gorochowski TE, Grierson CS, Marucci L. 2023. Bridging the gap between mechanistic biological models and machine learning surrogates. PLoS Comput. Biol. 19:e1010988
  • [108] Li F, Yuan L, Lu H, Li G, Chen Y, et al. 2022. Deep learning-based kcat prediction enables improved enzyme-constrained model reconstruction. Nat. Catal. 5:662–672
  • [109] Rana P, Berry C, Ghosh P, Fong SS. 2020. Recent advances on constraint-based models by integrating machine learning. Curr. Opin. Biotechnol. 64:85–91
  • [110] Sahu A, Blätke MA, Szymański JJ, Töpfer N. 2021. Advances in flux balance analysis by integrating machine learning and mechanism-based models. Comput. Struct. Biotechnol. J. 19:4626–4640
  • [111] Zhang J, Petersen SD, Radivojevic T, Ramirez A, Pérez-Manríquez A, et al. 2020. Combining mechanistic and machine learning models for predictive engineering and optimization of tryptophan metabolism. Nat. Commun. 11:4880