Antifragile Perimeter Control: Anticipating and Gaining from Disruptions with Reinforcement Learning

Linghang Sun [email protected] Michail A. Makridis [email protected] Alexander Genser Cristian Axenie Margherita Grossi Anastasios Kouvelas

Abstract

The optimal operation of transportation systems is often susceptible to unexpected disruptions, such as traffic accidents and social events. Many established control strategies relying on mathematical models can struggle to cope well with real-world disruptions, leading to a significant deviation from their anticipated efficiency. This work applies the cutting-edge concept of antifragility to design a traffic control strategy for urban road networks against disruptions. Antifragility sets itself apart from robustness and resilience as it represents a system’s ability not only to withstand stressors, shocks, and volatility but also to thrive and enhance performance in the presence of such adversarial events. This work proposes an antifragile model-free deep reinforcement learning scheme to control a two-region cordon-shaped urban traffic perimeter network. Promising results demonstrate that the proposed algorithm is capable of anticipating and responding to potential disruptions. Comparisons with state-of-the-art baseline algorithms proved the efficiency of the proposed approach that moreover exhibits antifragile properties due to its design, i.e., (a) to gradually improve its absolute performance under disruptions of similar magnitude, and (b) to improve its relative performance under disruptions of increasing magnitude.

keywords:

antifragility , reinforcement learning , perimeter control , disruptions , macroscopic fundamental diagram

^†^†journal: Elsevier

\affiliation

[a]organization=Institute for Transport Planning and Systems, ETH Zurich, city=Zurich, postcode=8093, country=Switzerland \affiliation[b]organization=Computer Science Department and Center for Artificial Intelligence, Technische Hochschule Nürnberg, city=Nürnberg, postcode=90489, country=Germany \affiliation[c]organization=Intelligent Cloud Technologies Lab, Huawei Munich Research Center, city=Munich, postcode=80992, country=Germany

1 Introduction

Transportation networks serve as vital channels for the movement of people and the flow of goods. The optimization and control of transportation systems has always been a focal point, resulting in a multitude of research endeavors and practical implementations in the field of Intelligent Transportation Systems (ITS), as detailed in Auer et al. (2016). Given that small disturbances and various sorts of disruptions, such as traffic accidents, social events, adversarial weather conditions, etc., often occur unexpectedly in real-world networks, examining the robustness and resilience of transportation systems is highly crucial in the research and application of ITS (Ganin et al., 2019).

With the ever-growing population in major cities worldwide and urbanization, traffic systems have gained both volume and complexity. For example, private motorized road traffic experienced a steady growth of about 50% over the past few decades in Switzerland (Federal Statistical Office, 2020). Similar trends can also be observed in the U.S. (U.S. Department of Transportation, 2019) and many other countries. Researchers have also managed to project possible future traffic demand and demonstrated the demand is highly likely to keep on growing even when environmental and sustainable aspects as well as political and behavioral shifts, are taken into account (Matthias et al., 2020; Zhang and Zhang, 2021). Moreover, the rise in traffic volume can lead to an escalation of congestion and more traffic accidents (Chang and Xiang, 2003). Similar to the BPR function first developed in (U.S. Bureau of Public Roads, 1964), which shows with empirical data that the travel time grows exponentially with traffic volume at the link level, Sun et al. (2024) has demonstrated the fragile nature of road transportation systems at the network level with mathematical proof, indicating the performance degrades exponentially with linearly increasing magnitude of disruptions. Therefore, when facing such traffic demand and disruptions projected for sustained growth, urban road networks should be able to secure a decent level of service even when confronted by various disruptive events of unforeseen magnitudes.

To address such issues, the concept of antifragility has shed light on a feasible solution and provides potential new evaluation criteria for the performance of a system under disruptions. First introduced in the reputable publication Antifragile: Things That Gain from Disorder (Taleb, 2012), and later mathematically formulated as academic works in Taleb (2013); Taleb and Douady (2013), it has provided insights into designing systems that can benefit from disruptions and perform better under growing volatility and randomness. Its counterpart concept, fragility, can be dated earlier in complexity theory in Vespignani (2010), indicating a cascading effect of interdependent variables in complex networks (Buldyrev et al., 2010). Ever since the proposal of the two opposing concepts, antifragility has gained substantial interest from both the general public and academia, particularly in the risk engineering community (Aven, 2015). The potential of antifragility has been leveraged in technical domains to address the growing challenges posed by complex and dynamic systems in modern society (Axenie et al., 2024).

However, the design principles for realizing antifragile transportation systems are generally unexplored. One promising approach to induce antifragility is by using learning-based algorithms. With the rapid advancement of big data and sensor techniques, Reinforcement Learning (RL) is becoming a trending practice in designing traffic management and control strategies (Zhu et al., 2019; Zhou and Gayah, 2021; Haydari and Yılmaz, 2022; Zhou and Gayah, 2023). Through interacting with a given environment, an RL agent can enhance its decision-making ability over time (Sutton and Barto, 2018). One advantage of RL over established controllers based on control theory is that it allows for more flexibility and competence in dealing with multivariate nonlinearities in complex environments (Li, 2018; Mysore et al., 2021). RL agents can also gradually adjust their decision-making when deployed to an environment subject to variations, whereas the established controllers, such as PID controllers, may need intensive manual tuning of parameters. Also, additional information can be fed to the RL agent with ease as a representation of the environment, regardless of the knowledge of system dynamics. In contrast, structured modeling of the system is often required when designing a controller (Bemporad, 2006). This feature of RL can also be exploited to explore hidden information from the vast amount of data collected through various sensors. As a result, modern traffic control systems have the potential to learn from traffic disruptions, respond preemptively with merely early signs, and exhibit increasingly better performance as disruptions escalate.

The main goal of this paper is to design an antifragile perimeter control algorithm against traffic volatility and randomness, which are instantiated through different types of disruptions with various magnitudes. The key contributions are: a) We formulate and distinguish the concept of antifragility as opposed to other related and commonly used terms in the transportation domain; b) We introduce how antifragility can be incorporated into RL algorithms for perimeter control to achieve superior performance compared to baseline methods; and c) With proposed skewness-based quantitative indicator, our proposed algorithm further showcased antifragile properties under increasing magnitude of disruptions.

The remainder of this paper is structured as follows. In Section 2, we introduce relevant literature on multiple aspects covered in this work while Section 3 mathematically formulates the cordon-shaped perimeter control simulation environment. Methodologies related to incorporating antifragility into RL algorithms are detailed in Section 4. Section 5 discusses the simulation setups and parametrization. Results are presented in Section 6, followed by conclusion remarks and further discussions in 7.

2 Literature review

This section reviews the relevant literature with three topics intersected in this work. First, a macroscopic traffic model and the control strategy applied in this paper are introduced, which serve as the basis of model dynamics for the simulation environment. Next, as antifragility is a novel research topic and the link between RL and antifragility is yet absent, we present state-of-the-art research on leveraging RL algorithms to induce robustness and resilience of traffic control strategies. Finally, the design of an antifragile system is covered.

2.1 Macroscopic fundamental diagram and perimeter control

Alleviating urban network congestion can be realized through various traffic control strategies. Since Daganzo (2007) proved the existence of the Macroscopic Fundamental Diagram (MFD) theoretically and Geroliminis and Daganzo (2008) demonstrated the presence of the MFD with empirical data, the relationship between traffic flow and density has been established through the aggregation of individual microscopic data points. This relationship has paved the path for the development of control strategies on a macroscopic level, enabling more computationally feasible real-time control strategies for large-scale networks (Knoop et al., 2012), such as perimeter control (Keyvan-Ekbatani et al., 2012; Geroliminis et al., 2013; Kouvelas et al., 2017; Yang et al., 2017), congestion pricing (Zheng et al., 2012; Zheng and Geroliminis, 2016; Genser and Kouvelas, 2022), route guidance (Yildirimoglu et al., 2015; Fu et al., 2022), and many more.

Perimeter control is among the strategies that have attracted immense attention and research efforts. Real-world implementation as shown in Ambühl et al. (2018) also demonstrated its applicability as an effective approach to regulating urban traffic. By refraining the incoming vehicles from adjacent regions into a protected zone, the traffic density in the protected area remains below the critical density, and a satisfactory level of service can be upheld (Keyvan-Ekbatani et al., 2012). Geroliminis et al. (2013) proposed an optimal perimeter control method using Model Predictive Control (MPC) and proved its effectiveness compared to a greedy controller in a cordon network. One major issue with the previous works is the MFD heterogeneity. To tackle this challenge, a substantial amount of effort has been made in investigating the partitioning algorithms so that a well-defined MFD can be referred to for a certain subordinated network (Ambühl et al., 2019; Saedi et al., 2020). However, MFDs in the real world can hardly be well-defined, as demonstrated in Ambühl et al. (2021) with loop detector data over a year. Wang et al. (2015) and Ji et al. (2015) also showed that adverse weather conditions and traffic incidents can alter the shape of the MFDs. Even a recovery from the peak-hour congestion may lead to a hysteresis (Gayah and Daganzo, 2011). These phenomena could potentially violate the mathematical model that serves as the foundation for the established model-based perimeter controllers.

2.2 Leveraging RL to induce robustness or resilience in traffic control

To tackle the parameter uncertainties in the models caused by real-world disturbances, recent years have also witnessed a growing trend towards utilizing non-parametric learning-based approaches in traffic control (Nguyen et al., 2018). Among different ML algorithms, RL has been researched extensively in transportation operations, e.g., traffic light control (Wei et al., 2019; Chen et al., 2020), dynamic pricing (Wang et al., 2022), delay management (Zhu et al., 2021). Particularly for perimeter control, recent works from Ni and Cassidy (2019) and Zhou and Gayah (2021) have illustrated the capability of RL algorithms to achieve similar or even superior performance compared to the established control methods, with the advantage of the exact system dynamics being agonistic.

Before introducing antifragility, two associated terms robustness and resilience commonly used in evaluating traffic control strategies and how they can be induced with RL are introduced. As explained in Tang et al. (2020), these terms are sometimes used interchangeably across transportation research works and no unanimous is yet agreed upon. Therefore, we follow the principles proposed by Zhou et al. (2019), that robustness is concerned with assessing a system’s capacity to preserve its initial state and resist performance deterioration in the presence of uncertainty and disturbances, while resilience emphasizes the ability and speed of a system to recover from major disruptions to the original state. Note in this work we use disruptions to refer to the negative events more significant than daily disturbances but less severe than natural hazards or large-scale power outages that completely paralyze the whole traffic network. While researchers have endeavored to produce review articles in either the applications of RL in traffic control (Haydari and Yılmaz, 2022; Haghighat et al., 2020) or robustness/resilience in transportation systems (Zhou et al., 2019; Tamvakis and Xenidis, 2012), a collection of using RL algorithms in transportation to achieve robust or resilient urban networks remains absent. Here we reviewed some of the existing studies that apply RL to regulate urban traffic that have demonstrated robustness or resilience and are summarized in Table 1. By studying how robustness and resilience can be induced in RL algorithms, we can induce antifragility in our traffic control design through similar approaches.

Table 1: Traffic control strategies using RL with demonstrated robustness or resilience

Literature	Category	Objective	Scenarios	Robust	Resilient	Benchmark	RL method	State	Action	Reward
Zhou and Gayah (2021)	Perimeter control	Maximize trip completion under uncertainties	Demand and MFD uncertainties	Tested¹		No-control, MPC	DDPG	No. of vehicles, traffic demand	Perimeter control variables	Trip
completion
Chen et al. (2022)	Perimeter control	Minimize total time spent under uncertainties/disturbances	Demand uncertainties, demand surge	Tested		MPC	AC-IRL	No. of vehicles	Perimeter control variables	Divergence from critical accumulation
Su et al. (2023)	Perimeter control	Maximize trip completion under uncertainties	Demand uncertainties	Tested		FT/Webster/ MP + PI	DQN	Flow, density, speed	Perimeter control variables	Ratio of flow over max flow
Zhou and Gayah (2023)	Perimeter control	Maximize trip completion under uncertainties and errors	Demand and MFD uncertainties		Induced¹	No-control, MPC	MADDPG	No. of vehicles, traffic demand, congestion indicator²	Perimeter control variables	Trip
completion
Aslani et al. (2018)	Traffic signal control	Minimize average travel time under disturbances	Incidents, sensor noise	Tested		Cross comparison	DQN, AC, SARSA	Phase, pressure	Green time duration	Queue length
Chu et al. (2020)	Traffic signal control	Minimize queue length and delay, maximize vehicle speed and trip completion under demand uncertainties	Demand uncertainties, demand surge	Induced		IA2C, IQL-LR, IQL-DNN	MAA2C	Total delay, no. of vehicles, adjacent info	Signal configuration	Queue length, total delay
Rodrigues and Azevedo (2019)	Traffic signal control	Minimize total travel time, total delay, and total stop time	Demand surge, incidents, sensor failures	Induced		Self-comparison	DDQN	Queue length, elapsed time in each phase	Time extension, phase selection	Numerical derivative of queue length
Tan et al. (2020)	Traffic signal control	Minimize queue length at the intersection under uncertainties	Demand uncertainties	Induced		Self-comparison	DDQN	phase info, queue length, speed, pressure	Phase configuration	Queue length, duration over detection
Wu et al. (2020)	Traffic signal control	Minimize average vehicle delay under different scenarios	Demand uncertainties, sensor noise	Tested		Fixed-time, actuated	DDQN	No. of vehicles, phase, elapsed time, neighbor phases	No-change/ extend/ terminate	Queue length
Korecki et al. (2023)	Traffic signal control	Minimize average travel time under disruptions	Lane closure		Tested	Random, cyclical, demand, analytical+	DQN, DQN + heuristic	Vehicle occupancies	Green time duration	Negative pressure

1

Tested/Induced: Tested means the robustness or resilience is demonstrated through direct testing against other benchmark methods, without identifying which component in the RL algorithm results in such properties. Induced means the author identified that adding certain terms in the RL algorithm improves the robustness or resilience.
2

Italicized terms represent the additional information the authors applied as effective to design a robust/resilient system.

Table 1 also summarizes the RL setups to demonstrate the property of being either robust or resilient of each traffic control strategy. In these papers, some authors proved the superior robustness or resilience of their proposed methods directly through testing against the established methods as benchmarks, whereas others induced such properties by adding specific terms (italicized in Table 1) in the state space of the RL-algorithm. As a result, the algorithms are given additional information related to disturbances or disruptions. For instance, Rodrigues and Azevedo (2019) induced robustness by adding the elapsed time since the last green signal for each phase, Tan et al. (2020) experimented with speed or pressure (residual queue) as an additional state representation in the state space of the RL algorithm, Chu et al. (2020) supplemented the control policies of neighboring intersections as additional information to the agents, and Zhou and Gayah (2023) used an extra binary congestion indicator in the state space. The analysis of the state-of-the-art RL studies considers reversals or sudden changes in the state-action-reward dynamics, which evokes unanticipated uncertainty. The problem in these contexts is often to respond to unexpected results appropriately since they might indicate a shift in the environment. In this case, exploration refers to the process of looking for new information to improve the RL agent’s understanding of the traffic dynamics under disruptions, which would then be used to identify better courses of action.

2.3 Antifragility: definitions and distinctions

Ever since the concept of antifragility was proposed, it has become an increasingly popular concept in many disciplines, such as economy (Manso et al., 2020), biology (Kim et al., 2020), medicine (Axenie et al., 2022), and robotics (Axenie and Saveriano, 2023). However, current studies on antifragility in engineering mostly pertain to post-disaster treatment and reconstruction efforts, for example, in Fang and Sansavini (2017); Priyadarshini et al. (2022). Recent work in Sun et al. (2024) proved the fragile nature of road transportation networks and provided insights for antifragile network analysis and design. Leveraging the potential of antifragility for the operation of transportation systems is still a novel and unexplored notion.

In Taleb (2012), two levels of antifragility have been indicated: proto-antifragility and antifragility. To differentiate these two terms more clearly, we rename the latter and the more important one as progressive antifragility. In contrast to robustness or resilience, proto-antifragility describes systems that can improve performance from experiencing disruptions of similar magnitude. An illustrated example can be the biological process of hormesis. In this sense, proto-antifragility resembles the concept of adaptiveness, research works such as Fang and Sansavini (2017) are related to proto-antifragility of infrastructure under hazardous events. The higher level of antifragility, i.e., the progressive antifragility, emphasizes the concave response of the system regarding an increasing magnitude of disruptions, which has been proven in Sun et al. (2024) that urban road networks are progressively fragile by nature. A graphical comparison between robustness, resilience, adaptiveness, and antifragility is shown in Fig. 1. As previously mentioned, robustness refers to a system’s resistance to disturbances, without considering whether the system ultimately recovers. In contrast, resilience evaluates how quickly the system recovers from disruptions and the extent of performance loss, as illustrated by the shaded areas. A proto-antifragile (adaptive) system alleviates performance degradation after similar disruptions but does not necessarily guarantee continuous improvement as disruption magnitude increases. On the other hand, progressive antifragility ensures that the system experiences a damped increase in performance loss with linearly increasing disruptions, which can result from either the system’s innate antifragile characteristics or the implementation of antifragile control strategies. And a system demonstrating progressive antifragility through antifragile control is not inherently proto-antifragile on its own (Taleb and Douady, 2013). Likewise, fragility refers to systems that suffer from exponentially growing loss when faced with linearly increasing disruptions (Taleb and Douady, 2013), which is characterized by convexity and can be mathematically formulated with Jensen’s inequality $E[g(X)]\geq g(E[X])$ . On the contrary, the nonlinear relationship between external stressors and responses for antifragile systems is concave with $E[g(X)]\leq g(E[X])$ .

Refer to caption — Figure 1: Illustration for robustness, resilience, adaptiveness, and antifragility

Researchers have also proposed methods to incentive antifragile property of a system by emphasizing the derivatives to capture the temporal evolution patterns of the system dynamics, i.e., how fast the system state deviates towards a possible black swan event, and the curvature of this deviation (Taleb and Douady, 2013; Taleb and West, 2023; Axenie et al., 2022). With this additional information, the system can anticipate ongoing disruptions and be more responsive to drastic changes. Similar to the function of redundancy in resilience (Tan et al., 2019; Kamalahmadi et al., 2022), redundancy can also be added in the system to induce antifragility de Bruijn et al. (2020); Johnson and Gheorghe (2013); Munoz et al. (2022). Other feasible approaches also include time-scale separation and attractor dynamics (Axenie et al., 2022; Axenie, 2022).

3 Problem Formulation

This paper studies the problem of perimeter control between two homogeneous regions. A cordon-shaped urban network is investigated as in Geroliminis et al. (2013), with the inner region representing a city center, as shown in Figure 2(a). New traffic demand at time $t$ for an Origin-Destination (OD) pair from region $i$ to region $j$ is denoted as $q_{ij}(t)$ . The inner and outer regions have different MFDs due to the difference in capacity to accommodate vehicles in the road networks within the city center and the surrounding region, which are defined as $G_{i}(n_{i}(t))$ as illustrated in Figure 2(b). Given the total number of vehicles with presence in region $i$ at time $t$ , denoted as $n_{i}(t)$ , the total trip completion rate for this region $i$ , denoted as $M_{i}(t)$ , can be determined using the corresponding MFD, which comprises both the intraregional trip completion, i.e., $M_{ii}(t)$ and the interregional transfer flow, i.e., $M_{ij}(t)\>(i\neq j)$ with $i,j\in\{1,2\}$ . In order to protect both regions from being overflown by possible high traffic demand, the percentage of the transfer flow allowed to go across the perimeter at time $t$ is regulated by two perimeter controllers denoted as $u_{ij}(t)\>(i\neq j)$ . A list of all notations used in this paper, including the notations used in defining the RL algorithm and the antifragile terms, is summarized in Table 2.

Table 2: List of notations

1. General notations in problem formulation
Symbol	Description
$t$	Time
$\Delta t$	Time step interval
$T$	Total simulation time
$n_{ij}(t)$	Vehicle accumulation with OD from region $i$ to $j$ at time $t$
$n_{i}(t)$	Vehicle accumulation in region $i$ at time $t$
$u_{ij}(t)$	Perimeter control variables regulating flow from region $i$ to $j$ at time $t$
$q_{ij}(t)$	Traffic demand with OD pair $i$ and $j$ at time $t$
$G_{i}(n_{i}(t))$	Sum of trip completion and transfer flow in region $i$ at time $t$
$M_{ij}(t)$	Trip completion with OD from region $i$ to $j$ $(i\neq j)$ at time $t$
$n_{i,\rm cap}(t)$	Maximal number of vehicles (jam accumulation) in region $i$ at time $t$
$n_{i,\rm crit}(t)$	Vehicle accumulation with highest completion rate in region $i$ at time $t$
$J$	Objective function
2. Notations in reinforcement learning
$\mathcal{S}$	State space, the whole set of states the RL agent can transition to
$s_{t}$	$s_{t}\in\mathcal{S}$ , the observable state in simulation at time $t$
$\mathcal{A}$	Action space, the whole set of actions the RL agent can act out
$a_{t}$	$a_{t}\in\mathcal{A}$ , the action taken in simulation at time $t$
$\mathcal{R}$	The reward function for the RL agent
$r_{t}$	$r_{t}=\mathcal{R}(s_{t},a_{t})$ , the received reward with state $s_{t}$ and action $a_{t}$ at time $t$
$\gamma$	Discount factor to favor rewards in the near future
$Q(s(t),a(t))$	Expected long-term return for taking action $a(t)$ in state $s(t)$ at time $t$
$\theta^{\mu}$	Weight parameter of the deep neural network for the actor network
$\theta^{Q}$	Weight parameter of the deep neural network for the critic network
$y_{i}$	Expected long-term return calculated with the target critic network
$L$	The loss of the critic network
$\rho^{\beta}$	All possible trajectories of $s_{t}$
$I$	The objective function for the actor-network
3. Notations for the additional antifragile terms applied in reinforcement learning
$r_{com}(t)$	Reward term in RL based on trip completion, equaling to $\sum^{2}_{i=1}M_{ii}(t)$
$r_{red}(t)$	Additional reward term in RL based on derivatives and redundancy
$r_{dam}(t)$	Additional damping term to counter possible action oscillation
$\omega_{h}$	The weight of first derivative in the additional reward term $r_{red}(t)$
$\omega_{\Delta h}$	The weight of second derivative in the additional reward term $r_{red}(t)$
$\alpha_{i}(t)$	Binary variable determining the term to be reward/penalty
$h_{i}(t)$	The first derivative of traffic state at time $t$
$\Delta h_{i}(t)$	The second derivative of traffic state at time $t$

Eq. 1a describes the change rate of the intraregional vehicle accumulation of region $i$ . It is the sum of intraregional traffic demand in this region, denoted as $q_{ii}(t)$ , together with the perimeter control regulated transfer flow from region $j$ to region $i$ , defined as $u_{ji}(t)\cdot M_{ji}(t)$ , then deducted by the trip completion within region $i$ , denoted as $M_{ii}(t)$ . similarly, the change rate of interregional traffic accumulation, as in Eq. 1b shows, is the difference between the interregional traffic demand, denoted as $q_{ij}(t)$ , and the regulated transfer flow $u_{ij}(t)\cdot M_{ij}(t)$ :


$\displaystyle\frac{dn_{ii}(t)}{dt}$	$\displaystyle=q_{ii}(t)+u_{ji}(t)\cdot M_{ji}(t)-M_{ii}(t)$	(1a)
$\displaystyle\frac{dn_{ij}(t)}{dt}$	$\displaystyle=q_{ij}(t)-u_{ij}(t)\cdot M_{ij}(t),\>(i\neq j)$	(1b)

The total trip completion, i.e., $M_{i}(t)$ for region $i$ at time $t$ is calculated based on the trip accumulation and the related MFD, defined as $G_{i}(n_{i}(t))$ , and is the sum of the intraregional trip completion, i.e., $M_{ii}(t)$ , in Eq. 2a and the interregional transfer flow, i.e., $M_{ij}(t)\>(i\neq j)$ , in Eq. 2b:


$\displaystyle M_{ii}(t)$	$\displaystyle=\frac{n_{ii}(t)}{n_{i}(t)}\cdot G_{i}(n_{i}(t))$	(2a)
$\displaystyle M_{ij}(t)$	$\displaystyle=\frac{n_{ij}(t)}{n_{i}(t)}\cdot G_{i}(n_{i}(t)),\>(i\neq j)$	(2b)
$\displaystyle n_{i}(t)$	$\displaystyle=\sum_{j=1,2}n_{ij}(t)$	(2c)

The objective function is to maximize the throughput of this cordon-shaped network, which is the sum of the intraregional trip completion in both regions.

$J=\max\limits_{u_{ij}(t)}{\int_{0}^{T}\sum_{i=1,2}M_{ii}(t)dt}$		(3)

s.t.	$\displaystyle n_{ij}(t)\geq 0$	(3a)
	$\displaystyle n_{i}(t)\leq n_{i,\rm cap}$	(3b)
	$\displaystyle u_{\min}\leq u_{ij}(t)\leq u_{\max}$	(3c)

Intraregional and interregional vehicle accumulation, i.e., $n_{ii}(t)$ and $n_{ij}(t)$ are non-negative values, and $n_{i,\rm cap}$ is the maximal possible number of vehicles accumulated in the region $i$ . At this vehicle accumulation, a gridlock will occur in the network. $u_{\min}$ and $u_{\max}$ represent the lower and upper limit for the perimeter control variable $u_{ij}(t)$ for both directions and such applications are in line with Geroliminis et al. (2013); Zhou and Gayah (2021). These bounds are necessary as perimeter control is normally implemented through signalization. While $u_{\max}$ accounts for the lost time caused by the interchange between the red and green phases, $u_{\min}$ is necessitated since an indefinite long red light is rare in real-world cases.

In contrast to the objective function to be applied for control-based strategies, the objective function for the RL-based algorithms $J_{\rm RL}$ is:

J_{\rm RL}=\max\limits_{u_{ij}(t)}{\int_{0}^{T}\left(r_{com}(t)+r_{dam}(t)+r_{red}(t)\right)dt}

(4)

with $r_{com}(t)$ standing for the same trip completion as in Eq. 3:

r_{com}(t)=\sum_{i=1,2}M_{ii}(t)

The second term $r_{dam}(t)$ is the damping term to counter the oscillating behavior of the RL agents due to the incorporation of derivatives in RL. The third term $r_{red}(t)$ represents the redundancy term that aims to build up a proper redundancy so that the proposed RL algorithm does not reward the agent for targeting the optimal critical accumulation. More explanation of the damping term $r_{dam}(t)$ and the redundancy term $r_{red}(t)$ can be found in Section 4.2.

4 Methodology

In Section 2, we reviewed prior studies related to traffic control on inducing robustness and resilience using different RL algorithms. Here, we demonstrate how we integrate the concept of antifragility and enhance the performance of traffic control algorithms.

4.1 RL algorithm

In RL algorithms, an agent or multiple agents interact with a preset environment and improve the performance of decision-making, defined as action $a_{t}$ in an action space $\mathcal{A}$ , based on the observable state $s_{t}$ in the state space $\mathcal{S}$ and the reward, defined as $r_{t}=\mathcal{R}(s_{t},a_{t})$ , where $\mathcal{R}$ is the reward function. The improvement of decision-making is commonly realized through a deep neural network as a function approximator. The RL algorithm applied in this work is Deep Deterministic Policy Gradient (DDPG) as proposed in Lillicrap et al. (2015). By applying an actor-critic scheme, DDPG can manage a continuous action space instead of only choosing from a limited set of discrete values as in the Deep Q-Network (DQN) algorithm (Mnih et al., 2013), which are commonly applied as Table 1 shows. Also, Zhou and Gayah (2021) has demonstrated that an RL algorithm with continuous action space can achieve better performance compared to discrete action space. The DDPG algorithm can be divided into two main components, namely the actor and the critic, which are updated at each step through policy gradient and Q-value, respectively. The scheme of the DDPG algorithm applied in this paper is schematically illustrated in Figure 3.

The state $s_{t}\in\mathcal{S}$ is defined distinctively according to different methods applied in this work. Our proposed method consists of three terms, the vehicle accumulation regarding the OD pair $n_{ij}(t)$ , the change rate of vehicle accumulation at each time step $dn_{ij}(t)$ , also referred to as the first derivative, and the change rate of change rate $d^{2}n_{ij}(t)$ , or the second derivative or curvature. In Zhou and Gayah (2021), a state $s_{t}$ defined as $[n_{ij}(t),q_{ij}(t)]$ is adopted. However, since traffic demand in the real world is hardly measurable, $q_{ij}(t)$ would be an unobservable state for the agent. The action $a_{t}\in\mathcal{A}$ is defined the same as the control variables $u_{ij}(t)$ . For the reward $r_{t}$ , while Zhou and Gayah (2021) uses merely the completion rate, in our proposed algorithm, the reward is defined with an additional $r_{red}(t)$ term, as Eq. 4 shows.

The actor-network is represented by $\mu(\cdot)$ and it determines the best action $a_{t}$ , which is the percentage of vehicles that are allowed to travel across the periphery, based on the current state $s_{t}$ and the weight parameters at a certain time step $t$ .

The nature of the best action $a_{t}$ was also explored in the previous work of Axenie (2022), where in the framework of variable structure control, the authors demonstrate the need for a discontinuous signal. The critic network, denoted by $Q(\cdot)$ , takes the responsibility to evaluate whether a specific state-action pair at a certain time step yields the maximal possible discounted future reward $Q(s_{t},a_{t})$ . A common technique used in DDPG is to create a target actor network $\mu^{\prime}(\cdot)$ and a target critic network $Q^{\prime}(\cdot)$ , which are a copy of the original actor and critic network but updated posteriorly to stabilize the training process and prevent overfitting (Zhang et al., 2021). The target maximal discounted future reward for the target critic network can be calculated as in Eq. 5.

y_{i}=r_{i}+\gamma Q^{\prime}(s_{i+1},\mu^{\prime}(s_{i+1}|\theta^{\mu^{\prime}})|\theta^{Q^{\prime}})

(5)

Similar to DQN, the critic network can be updated by calculating the temporal difference between the predicted reward and the target reward and minimizing the loss for a mini-batch $N$ sampled from the replay buffer:

L=\frac{1}{N}\sum_{i}(y_{i}-Q(s_{i},a_{i}|\theta^{Q}))^{2}

(6)

Afterward, the actor network can be updated with sampled deterministic policy gradient (Silver et al., 2014):

I=\mathbb{E}_{s_{t}\sim\rho^{\beta}}[r(s,\mu(s|\theta^{\mu}))|_{s=s_{t}}]

(7)

\nabla_{\theta^{\mu}}I=\mathbb{E}_{s_{t}\sim\rho^{\beta}}[\nabla_{a}Q(s,a|\theta^{Q})|_{s=s_{t},a=\mu(s_{t})}\nabla_{\theta_{\mu}}\mu(s|\theta^{\mu})|_{s=s_{t}}]

(8)

\nabla_{\theta^{\mu}}I\approx\frac{1}{N}\sum_{t}\nabla_{a}Q(s,a|\theta^{Q})|_{s=s_{t},a=\mu(s_{t})}\nabla_{\theta^{\mu}}\mu(s|\theta^{\mu})|_{s_{t}}

(9)

First developed in Horgan et al. (2018), a similar Ape-X architecture as applied in Zhou and Gayah (2021) is also adopted in this work, which allows for multiple generations of simulations to be included and centrally learned for the best policy.

When training the agent, we add disruptions into the simulation environment starting from a certain training episode, such as surging traffic demand or MFD disruption. These various forms of volatility ought to optimally elicit various learning and decision-making processes. The RL agent would include the results of their prior choices to create and update their weight parameters in a situation with high outcome volatility and be capable of generating and updating expectations after sensing a change in a high-volatility environment.

4.2 Antifragility and the antifragile terms in RL

In this work, following the same idea of modifying the state $\mathcal{S}$ as in the research works in Tabel 1, we add additional terms based on derivatives (Taleb and Douady, 2013) and redundancy (de Bruijn et al., 2020), in both the state $\mathcal{S}$ and the reward function $\mathcal{R}$ of the RL algorithm.

The DDPG algorithm introduced in Zhou and Gayah (2021) for perimeter control requires knowledge of vehicle accumulation $n_{ij}(t)$ and a given demand profile $q_{ij}(t)$ , which can be regarded as an estimate of average daily traffic demand, since the exact real-time demand is almost impossible to acquire. Therefore, we reduce the information requirement by replacing $q_{ij}(t)$ with the first and second derivatives of vehicle accumulation, denoted $dn_{ij}(t)$ and $d^{2}n_{ij}(t)$ . These two derivatives can be calculated directly from $n_{ij}(t)$ and the value from previous time steps without excessive knowledge of $q_{ij}(t)$ . Moreover, while the robustness of including $q_{ij}(t)$ against model stochasticity has been demonstrated in Zhou and Gayah (2021), the algorithm remains vulnerable to significant disruptions that cause substantial deviations from the average demand profile. Incorporating the derivative terms brings the advantage of feeding additional information similar to the demand profile $q_{ij}(t)$ , but can easily adapt under the presence of disruptions.

However, when $dn_{ij}(t)$ and $d^{2}n_{ij}(t)$ are incorporated into the state $\mathcal{S}$ of the algorithm, significant oscillations in the perimeter control variables can be quite often observed, especially under scenarios with disruptions. As perimeter control is composed of coordinated traffic lights at the border of different regions, oscillating actions will result in significantly varying green splits between consecutive cycles. Even though oscillations may have little impact based on certain key performance indicators, the operation of traffic lights in the real world should be as stable as possible. Therefore, a damping term $r_{dam}(t)$ is introduced into the reward function to penalize potential oscillatory actions. The exponent $\xi$ determines how fast the penalty decays when the change of control variables becomes small, indicating a large $\xi$ mostly penalizes the agent when the oscillation is substantial.

r_{dam}(t)=-\sum_{i,j\in\{1,2\}}|(u_{i,j}(t)-u_{i,j}(t-1))^{\xi}|,\quad i\neq j

(10)

For the reward function $\mathcal{R}$ , the trip completion at each time step acts as the main component of our proposed method. The term $r_{red}(t)$ in the objective function $J_{\rm RL}$ in Equation (4) acts as an additional term to build up redundancy in the system. Similar to the creation of the additional term in the state space, we create redundancy also through the calculation of the derivates, but instead of the derivatives of the vehicle accumulation, we calculate the derivatives of the traffic state. This creates a one-to-one correspondence between the derivatives of the vehicle accumulation and the derivatives of the traffic state. To explain this antifragile term in the reward, we summarize $r_{red}(t)$ as the sum of two terms, with $H(t)$ being an overall term representing the first derivative of the traffic state and $\Delta H(t)$ representing the second derivative:

r_{red}(t)=H(t)+\Delta H(t)

(11)

Here, $H(t)$ and $\Delta H(t)$ can be expanded as:

H(t)=\sum_{i=1,2}H_{i}(t)=\omega_{h}\sum_{i=1,2}f(n_{i}(t),n_{i,\rm crit},n_{i,\rm cap})\cdot\alpha_{i}(t)\cdot h_{i}(t)

(12)

\Delta H(t)=\sum_{i=1,2}\Delta H_{i}(t)=\omega_{\Delta h}\sum_{i=1,2}f(n_{i}(t),n_{i,\rm crit},n_{i,\rm cap})\cdot\Delta h_{i}(t)

(13)

$h_{i}(t)$ and $\Delta h_{i}(t)$ are the first and second numerical derivatives of the traffic states on the MFD, $h_{i}(t)$ is defined as the difference of trip completion over vehicle accumulation at the end of a time step versus at the beginning of the same time step, as in Eq. 14 shows, and the second derivative $\Delta h_{i}(t)$ is calculated as the difference between the first derivatives of two consecutive time steps, as in Eq. 15 shows:

h_{i}(t)=\frac{M_{i}(t)\ -\ M_{i}(t-1)}{n_{i}(t)\ -\ n_{i}(t-1)}

(14)

\Delta h_{i}\left(t\right)=h_{i}\left(t\right)-h_{i}\left(t-1\right)\

(15)

Since in the RL algorithms implemented in this paper, all variables involved in the deep neural network should be normalized to facilitate the training process, meaning the exact values of the derivatives are not of importance, $\omega_{h}$ and $\omega_{\Delta h}$ are introduced as the weight constants for the first and second derivatives to regulate their impact on the reward side $\mathcal{R}$ .

The binary variable $\alpha_{i}(t)$ was designed in the first derivative to reward the agent when moving towards the desired direction on the MFD. For instance, the derivative of any data point in the congested zone of the MFD is negative. In this case, when the vehicle accumulation is still getting larger, a penalty will be applied. However, if the vehicle accumulation is decreasing through perimeter control, this binary $\alpha_{i}(t)$ variable will turn it into a reward. For the second derivative, an additional binary variable is not necessary since the two consecutive first derivatives are able to determine whether $\Delta h_{i}(t)$ is either positive or negative.

\alpha_{i}(t)=\begin{cases}1,&\text{if }n_{i}(t)\geq n_{i}(t-1),\\ -1,&\text{otherwise.}\end{cases}

(16)

The term $f(n_{i}(t),n_{i,\rm crit},n_{i,\rm cap})$ is a reduction factor to constrain the impact of the $r_{red}(t)$ term when the accumulation is either on a very lower level (empty network) or on a very higher level (gridlock). The area near the critical accumulation is where the $r_{red}(t)$ term should have the greatest impact. Here we use a modified trigonometric function to realize this purpose. It should be noted that other functions, such as normal distribution, are also valid for achieving the same purpose.

f(n_{i}(t),n_{i,\rm crit},n_{i,\rm cap})=\begin{cases}\cfrac{1+\cos\Big{(}-\pi\cdot\cfrac{n_{i,\rm crit}-n_{i}(t)}{n_{i,\rm crit}}\Big{)}}{2},&\text{if }n_{i}(t)\geq n_{i,\rm crit},\\ \cfrac{1+\cos\Big{(}-\pi\cdot\cfrac{n_{i}(t)-n_{i,\rm crit}}{n_{i,\rm cap}-n_{i,\rm crit}}\Big{)}}{2},&\text{otherwise.}\end{cases}

(17)

After considering all the modifiers above, we show $H(t)$ and $\Delta H(t)$ using a single MFD as an example in Figure 4 and 5. The first derivative $H(t)$ , in Figure 4(a), rewards the agent more when it’s moving towards the critical accumulation to maximize its trip completion rate. However, when the number of vehicles approaches the critical accumulation, this term drops significantly and becomes a penalty when it exceeds the critical point. Since the first derivative $H(t)$ is a complementary term in addition to the trip completion in the reward function, we showcase the influence of this term on the MFD after normalization in Figure 4. With increasing weight coefficient $\omega_{h}$ , the critical accumulation of the modified MFD becomes marginally smaller compared to the original MFD, and the reward that the RL agent can receive also decreases faster after the accumulation exceeds the critical accumulation. Although the trip completion still follows the original MFD in the simulation environment, The agent learns to get more rewards following the modified MFD. In this way, redundant overcompensation has been established to prevent accumulation from exceeding the critical accumulation when disruption takes place unexpectedly.

An interesting note is that estimation uncertainty, also known as second-order uncertainty, is another factor that affects disruptions that take place unexpectedly. This is in fact the imprecision of the learner’s current beliefs about the environment, and what the antifragile terms capture. This amount reduces with sampling if beliefs are acquired by learning as opposed to instruction (e.g. anticipation through redundant overcompensation). When estimating uncertainty is substantial, unlikely samples could partly be attributed to the agent’s false assumptions about the environment’s structure rather than a change in that structure (e.g. around critical accumulation).

The second derivative $\Delta H(t)$ is shown in Figure 5. The x-axis is the vehicle accumulation, same as in Figure 4(a), while the y-axis represents how fast the traffic state is changing on the MFD. The faster it increases to reach the critical accumulation, the greater the penalty will be applied to the RL agent. This observation is consistent with the redundant overcompensation and time-scale separation principles formalized in Taleb and Douady (2013) and practically applied in Axenie (2022) and Axenie and Saveriano (2023). On the contrary, if the vehicle accumulation decelerates, a reward will be applied. Similar to $H(t)$ , this complementary term $\Delta H(t)$ is also dependent on the normalization factor $\omega_{\Delta h}$ .

With $H(t)$ and $\Delta H(t)$ , the agent learns to be conservative when regulating the perimeter control variables when the accumulation is about to reach critical, in case disruptions take place. Therefore, as can be concluded, although $H(t)$ and $\Delta H(t)$ apply the same concept of the derivative as the $dn_{ij}(t)$ and $d^{2}n_{ij}$ in the state space $\mathcal{S}$ , the purpose of $H(t)$ and $\Delta H(t)$ is preserving redundancy in the system instead of feeding additional information to the agent. This behavior is consistent with the locally discontinuous shape of the action signals $u_{ij}(t)$ applied to the cordon network, as suggested by the control theoretic study of Axenie (2022).

4.3 Model Predictive Control

Model Predictive Control (MPC) is a well-established control method with wide applications in regulating dynamic systems in various engineering domains, including transportation and particularly perimeter control, Readers interested in Model Predictive Control (MPC) and its applications in traffic engineering can refer to Geroliminis et al. (2013); Haddad and Mirkin (2017); Zhou and Gayah (2021). The MPC toolkit applied in this paper is introduced in Lucia et al. (2017), which uses the CasADi framework (Andersson et al., 2019) and the NLP solver IPOPT (Wächter and Biegler, 2006).

5 Experiment application

This section introduces the most important settings for the numerical simulation to showcase antifragile perimeter control and the corresponding performance evaluation. In addition to No Control (NC), we investigated two state-of-the-art perimeter control algorithms compared to our proposed antifragile algorithm, which are summarized below:

-

NC where the system operates without intervention.
-

MPC proposed in Geroliminis et al. (2013).
-

State-of-the-art RL-based perimeter control algorithm proposed in Zhou and Gayah (2021) as the RL baseline.

The performance of the traffic control algorithms is studied under two different types of disruptions, i.e., a demand disruption like surging traffic demand, or a supply disruption, such as the change of the MFD of a specific network due to adversarial weather or major traffic accidents, as illustrated in Fig. 6. Furthermore, to showcase both proto-antifragility and progressive antifragility, this work examines the impact of different manifestations of the magnitude of disruptions. In the short term, the magnitudes are assumed to remain relatively stable with minor stochastic variations. In the long term, as explained in Section 1, however, as traffic demand continues to increase and contributes to more frequent traffic incidents, the magnitude is also expected to grow, which may also be subject to real-world uncertainties. The static and incremental magnitude of disruptions is intended for testing proto- and progressive antifragility. To sum up, the combination of: (1). demand and supply disruptions; (2). static and incremental disruption; (3). with and without uncertainty in disruptions has resulted in a total number of 8 scenarios.

5.1 Simulation environment parametrization

As introduced in Section 3, we simulate a cordon-shaped urban network with inner and outer regions represented by different MFDs, as Fig. 2(b) shows. These MFDs are largely the same as in Zhou and Gayah (2021) and were originally approximated through the Yokohama loop detector dataset (Geroliminis and Daganzo, 2008). However, a minor modification has been made to allow its differentiability throughout the entire domain, since the MFDs in Zhou and Gayah (2021) are formulated as piecewise functions and are, although continuous, not differentiable. Indifferentiability within MFDs can cause fluctuations when calculating the derivatives and harm the efficacy of the redundancy term. Therefore, the gridlock accumulation of the MFD has been slightly increased with less than $3\%$ , so that the whole MFD can be both continuous and differentiable. Other important indicators, e.g., critical accumulation and maximal trip completion, remain the same as in Zhou and Gayah (2021).

Although real-world disruptions can surely happen to both the outer and the inner region and exert a negative influence, we focus on the bottleneck scenarios where the potential of perimeter control can be better reflected. Therefore, for a demand disruption, a surging traffic demand is assumed to be generated from the outer region toward the inner region, whereas for a supply disruption, the MFD of the inner region is lowered as a result of such disruptive events while the MFD of the outer region remains intact. The two types of disruptions also demonstrate different training difficulties for the two RL-based algorithms involved, which will be detailed in Section 6. The traffic demand under no disruption is approximated based on Geroliminis and Daganzo (2008). As the real-world peak hour traffic demand profile has more resemblance to a Gaussian distribution instead of being a simple trapezoidal shape (Mazloumi et al., 2010), the base demand $q_{ij}(t)=[q_{11}(t),q_{12}(t),q_{21}(t),q_{22}(t)]$ is illustrated as the blue curves in Fig. 7(a). It consists of two components, which are a constant value $q_{ij,c}=[q_{11,c},q_{12,c},q_{21,c},q_{22,c}]=[0.3,0.35,0.2,0.25]$ in $veh/s$ and a Gaussian term $q_{ij,N}(t)=C_{ij,N}\mathcal{N}_{ij}(\mu_{ij,N},\sigma_{ij,N})$ with total number of vehicles following Gaussian as $C_{ij,N}=[3000,15000,2000,4000]$ and the mean and variance of the distribution being $\mu_{ij,N}=[1800,1800,1800,1800]$ and $\sigma_{ij,N}=[1200,1500,900,1200]$ in seconds. To simulate demand disruptions, $q_{12,N}$ is increased to reflect traffic flowing from the periphery into the city center. For testing progressive antifragility, a compounding factor of $0.018$ per episode is applied to escalate $q_{12,N}$ , with the maximal demand disruption after 25 episodes of incremental disruptions reaching $(1.018^{25}-1)C_{12,N}\mathcal{N}_{12}(\mu_{12,N},\sigma_{12,N})$ . For static demand disruptions to validate proto-antifragility, the magnitude is set to be $(1.018^{25/2}-1)C_{12,N}\mathcal{N}_{12}(\mu_{12,N},\sigma_{12,N})$ .

Supply disruptions represent adversarial events that negatively affect the MFD, resulting in impaired traffic performance. However, as these events can arise in different forms, such as adverse weather conditions, traffic accidents, road maintenance works, etc., there has been limited research effort dedicated to understanding the exact correlation between such events and their impact on the MFD. Recent work in Ambühl et al. (2020) proposed the concept of infrastructure potential $\lambda$ to represent how efficiently the network infrastructure is used considering the performance loss due to vehicle infrastructure interaction, and a smaller $\lambda$ indicates the infrastructure is utilized with higher efficiency. In this research, a $\lambda$ equal to $0.07$ is chosen for the MFD without disruption representing the inner region, which lies in the common range of $\lambda$ based on data from real-world cities (Ambühl et al., 2020). And for the incremental magnitude of supply disruptions to test progressive antifragility, $\lambda$ is escalated by $0.003$ per episode with the maximal $\lambda$ after 25 episodes being $0.145$ , as the red solid curve shows in Fig. 7(b). For static supply disruptions to test proto-antifragility, the magnitude of disruption is set to be half of the maximal disruption. Note as the polynomial MFD and the MFD approximated with $\lambda$ are essentially two different functional forms, there are still marginal deviations between each other after careful calibration of parameters, as illustrated with the blue and gray curves in Fig. 7(b).

The total simulation duration $T$ is 3 hours with each time step $\Delta t$ as $180$ seconds, and the third hour has significantly fewer vehicles so that the network would be able to clear the vehicles accumulated in the previous two hours, if there is no gridlock formed in the simulated network. Another notable constraint is the lower and upper bounds for the perimeter control variable $u_{ij}\in[0.1,0.9]$ to represent the real-world constraints from traffic signals (Geroliminis et al., 2013). The initial vehicle accumulation is set to be $n_{ij,0}=[600,1300,300,2400]$ so that the accumulation remains approximately at an equilibrium at the beginning of the simulation.

Each scenario is run $25$ times since randomness is inherent in RL and performance may vary across simulation iterations. Each iteration lasts for $75$ episodes, with the first $50$ under no disruption so that the RL agent can be properly trained and the following $25$ with either static or incremental disruption, and outputs the performance results from NC, MPC, RL baseline, and RL proposed, respectively. The number of episodes with disruption $25$ is deliberately set to be the same as the number of simulation iterations to improve the clarity of the results from the scenarios with disruption uncertainties, as illustrated by the zigzag lines for the simulation scenarios in Fig. 6. A list of multipliers $\epsilon=[\epsilon_{1},\epsilon_{2},\ldots,\epsilon_{24},\epsilon_{25}]$ following a normal distribution $\mathcal{N}\sim(1,0.15)$ is randomly generated to introduce uncertainties by calculating the cross product with the list of disruption magnitudes. The multiplier list will be shuffled by 1 for each simulation run, for example, the list would be $\epsilon=[\epsilon_{2},\epsilon_{3},\ldots,\epsilon_{25},\epsilon_{1}]$ for the second iteration. Through reshuffling, the disruption magnitudes from different episodes will experience exactly the same list of uncertainty multipliers, but with different sequences throughout the simulation iterations. The evaluation will be detailed in Section 5.2 together with distribution skewness as the quantitative indicator.

The most important hyperparameters for both the baseline and the proposed RL algorithms are the same and summarized in Table. 3 below. Note that the minimal learning rates and noise scale are not set to be an extremely small value as common RL algorithms do, we aim to get a trade-off between optimality and adaptiveness when the algorithm is experiencing disruptions, which can be demonstrated by the results for both good performance and being antifragile in Section 6. The exponent coefficient for the damping term $r_{dam}(t)$ to counter the potential oscillations actions in Eq. 10 is set to be $6$ .

Table 3: List of hyperparameters

Hyperparameter	Value
Replay buffer	10,000
Sample size	1,000
Action noise initial scale	0.3
Action noise linear decay	0.003
Action noise minimal scale	0.1
Batch size	256
Target network update	5
Discount factor	0.90
	Actor	Critic
Initial learning rate	0.004	0.008
Learning rate decay	0.975	0.975
Minimal learning rate	0.0005	0.001
Epoch	2	128

5.2 Performance evaluation

Although the reward in the RL algorithm is defined based on trip completion, together with a damping term and a redundancy term, to better showcase antifragility and the associated performance loss represented by the shaded area in Fig. 1 to coincide the relationship between resilience and antifragility, the main performance indicator being evaluated here is the Total Time Spent (TTS), which is calculated by adding up the number of vehicles within the network at each second of the simulation. Another major benefit of choosing TTS over completion is that two scenarios with the same trip completion at the end of the simulation may exhibit distinct TTS and the one with the lower value should be regarded to have demonstrated a superior performance overall.

Since urban road networks are always subject to capacity constraints, and as Sun et al. (2024) has recently proven the fragile nature of urban road networks in terms of progressive fragility, fully antifragile traffic control strategies may be impossible to achieve and will inevitably fall into being fragile when the magnitude of disruptions is large enough. Therefore, this paper aims to demonstrate that the proposed perimeter control algorithm is less fragile than the state-of-the-art baseline algorithms, and we normalize all the other perimeter control algorithms over the RL baseline method to study the relative antifragility. To quantify the progressive antifragility of different algorithms, we calculate the skewness of distribution based on the samples from the last 25 incremental episodes, with $\mu$ and $\sigma$ denoting the mean and the standard deviation:

\displaystyle s=\frac{1}{N_{sample}}\sum_{i=1}^{N_{sample}}\left(\frac{TTS_{i}-\mu}{\sigma}\right)^{3}

(18)

When the performance distribution skewness of a certain system with or without a control algorithm is $0$ , it means that the system itself or the applied algorithm makes the system neither fragile nor antifragile. On the other side, negative skewness indicates the distribution has a longer or fatter right tail and thus a higher degree of concavity in the performance function, which showcases antifragility, whereas a positive skewness signifies fragility. Therefore, other than demonstrating a superior performance regarding a lower TTS, the proposed method should also showcase a smaller skewness than the baseline method.

6 Results

As mentioned in the previous section, there are 8 simulation scenarios based on the combination of demand/supply disruption, static/incremental disruption, and with/without uncertainty, with each one containing the performance of 4 perimeter control algorithms to be tested: NC, MPC, the baseline RL algorithm (Zhou and Gayah, 2021), and our proposed RL-based algorithm. Here we first present the results of simulations with static magnitude of disruptions to showcase proto-antifragility, followed by the results with incremental magnitudes to demonstrate progressive antifragility. Each scenario is tested for 25 simulation iterations.

6.1 Proto-antifragility

Systems with the property of proto-antifragility can learn from past adverse events and anticipate possible ongoing disruptions of similar magnitudes to enhance future performance. To validate such property, particularly for the RL-based algorithms, after training the agent with the base demand profile for 50 episodes, we apply disruptions with static magnitude for another 25 episodes.

6.1.1 Demand disruption

Fig. 8 demonstrates the performance curves of the four selected algorithms without and with uncertainty. Each data point on any performance curve for a particular episode represents the TTS averaged from 25 full simulations of 3 hours. The shaded area represents the performance variance due to the innate randomness in the training process of the RL agents or combined with uncertainties artificially introduced into the magnitude of disruptions.

In Fig. 8(a), both RL-based algorithms are first trained for 50 episodes with only the base demand profile at the beginning of the simulation. At this stage, the proposed RL algorithm outperforms both NC and the RL baseline, while demonstrating performance comparable to that of MPC. Afterward, a static magnitude of disruption occurs in the next 25 episodes, during which both RL-based algorithms exhibited improved performance following an initial rise of TTS at the transition. They also showed a lower TTS compared to NC and MPC. However, two other phenomena can be noticed. First, the proposed RL algorithm performs better compared to the baseline RL algorithm before the introduction of the static disruption and shows an even superior performance of about $4.2\%$ after the disruption comes into play. Furthermore, the performance variance of the proposed RL algorithm is significantly smaller than the RL baseline, particularly with demand disruptions, indicating the proposed algorithm is both superior and more stable compared to the RL baseline algorithm. In Fig. 8(b), as disruption magnitude is set to be stochastic following a Gaussian distribution of $\mathcal{N}\sim(1,0.15)$ , the performance of NC and MPC also contains a shaded area. The shaded area is rather stable under uncertainty due to the reshuffling mechanism explained in Section 5 to reshuffle the uncertainty multiplier list $[\epsilon_{i}]$ by $1$ after each iteration. Despite the enlarged variance for both RL-based methods, the averaged TTS curves are largely the same as in 8(a) with no uncertainty. The performance gain of the proposed RL algorithm compared to the RL baseline under disruption uncertainty is $3.9\%$ , marginally smaller compared to the performance without uncertainty.

6.1.2 Supply disruption

Likewise, for studying proto-antifragility under a static supply disruption, we also tested the algorithms without and with disruption uncertainty, as demonstrated in Fig. 9. Similar results as in the study of demand disruption can be observed, the averaged TTS curves show a $3.0\%$ performance gain of the proposed RL algorithm compared to the baseline RL algorithm under no disruption uncertainty. In Fig. 9(b), the performance gain is marginally higher and comes to $3.5\%$ under the presence of disruption uncertainty. However, the performance variance of the proposed algorithm in Fig. 9(a) is greater with disruption uncertainty compared to the baseline RL algorithm, but still, its shaded area in blue lies predominantly under the gray shaded area, indicating the superior performance of the antifragile RL algorithm.

6.2 Progressive antifragility

Progressive antifragility describes a system that exhibits a nonlinear response to growing disruptions. In terms of TTS, since an increase in traffic density or vehicle accumulation within a network will inevitably reduce average vehicle speed or trip completion rate as per speed-density MFD (Geroliminis and Daganzo, 2008), and thus increase TTS, a concave response between TTS and vehicle accumulation can indicate a progressively antifragile system, and vice versa. Therefore, instead of 25 episodes of a static demand or supply disruption, a linearly increasing magnitude of demand or supply disruption is simulated, and skewness is calculated for each applied algorithm as the metric of (anti-)fragility.

6.2.1 Demand disruption

First, as can be observed in Fig. 10(a), when no disruption uncertainty is present in the last 25 episodes of incremental disruptions, the proposed RL algorithm performs the best out of all the four tested algorithms. The shaded area represents the variance from the 25 iterations of the simulation, and the variance from the proposed method is lower compared to the RL baseline. Furthermore, the performance curve of our proposed method also seems less convex than the other algorithms, demonstrating its relative antifragility. Before discussing the quantitative indicators, it should be noted that the TTS curves from NC and MPC are only partially illustrated, as their simulated network quickly came into gridlock when faced with a major disruption with the vehicle accumulation reaching maximal accumulation. When the network is fully gridlocked, it cannot recover to an uncongested state even when the disruption has passed. In this case, TTS will only grow linearly with the magnitude of distribution and lose the ability to showcase skewness, which will be further detailed below in Fig. 10(c).

In Fig. 10(b), the performance difference normalized over the baseline RL algorithm is illustrated with a polar plot. Only the episodes under incremental demand are shown, i.e., episodes $50-75$ . Instead of TTS variance, the shaded blue area highlights the performance gain of our proposed method compared to the state-of-the-art baseline RL algorithm. As observed, the TTS curves for NC, MPC, and the proposed RL algorithm all diverge from the baseline method at an accelerating rate. The main comparison here is two RL-based algorithms, and here our proposed one shows an ultimate $9.3\%$ performance gain after 25 episodes with incremental demand disruption. Also, the disruption magnitude between the 62-rd and 63-rd episodes under incremental disruption is about the same as in the simulation with static disruption, and it is interesting to see that the performance gain here is approximately $5.2\%$ , significantly higher than the $4.2\%$ improvement with static magnitude of disruption. It shows the proposed method learns and adapts relatively faster under an incremental environment.

To analyze the antifragile properties of each perimeter control algorithm, we compute the distribution skewness for each method up to the $n$ -th episode, as in Fig. 10(c) shows, such that the skewness value at the final episode reflects the distribution skewness across all $25$ episodes with incremental disruptions. Since calculating skewness requires a minimum sample size to yield meaningful results, a buffer of $5$ episodes is included at the start of the incremental disruption phase. Initially, the skewness for all four methods under comparison starts at a low value. However, the skewness for NC and MPC increases rapidly to approximately $1$ , while the skewness for the two RL-based algorithms rises more gradually. Among these, the skewness curve of the proposed algorithm in blue lies mostly below the baseline curve in gray, particularly when the magnitude of disruption is high, achieving the lowest final skewness at $0.29$ . It should be noted that the results from both NC and MPC indicate that these two algorithms render the network quickly into gridlock under the presence of demand disruptions, at which state the TTS will only grow linearly with the magnitude of disruptions, i.e., the additional number of vehicles. In this case, the skewness up to the $n$ -episode will also gradually approach $0$ when the episode number $n$ is sufficiently high. Therefore, we use dashed lines to represent the maximal skewness if the skewness of a particular algorithm has ever plateaued.

When uncertainty is introduced to the magnitude of disruptions, as shown in Fig. 11, the results remain largely consistent with the non-stochastic scenario. However, the performance curves appear less smooth than those under no stochasticity in Fig. 10, suggesting that the presence of uncertainty adds some complexity to the training process. In contrast to the baseline algorithm, the proposed RL algorithm demonstrates a continuously increasing performance gain, as shown in Fig. 11(b), reaching $9.2\%$ when the magnitude of disruption becomes significant. The unevenness in performance is also reflected in the value of skewness in Fig. 11(c), despite the final skewness of the two RL-based algorithms being almost the same as in Fig. 10(c), with $0.32$ and $0.29$ to $0.32$ and $0.28$ . The final skewness values for NC and MPC are noticeably lower, even though their skewness curves substantially follow and overlap the same curves in Fig. 10(c) before plateauing. The premature entering into plateauing for both the NC and MPC can be attributed to the introduction of uncertainty, which causes gridlock to occur sooner, particularly when considering the convex fragile performance of the network.

6.2.2 Supply disruption

TTS curves of the four studied algorithms under deterministic supply disruption are shown in Fig. 12. The magnitude of incremental supply disruption is tuned so that the performance curve from NC is largely the same as with incremental demand disruption. However, the TTS from MPC and RL-based algorithms have increased to different extents. In Episode 65, for example, with supply disruption the TTS of NC is $1.17\cdot 10^{8}$ s, greatly resembling the TTS with $1.19\cdot 10^{8}$ s in 10(a), whereas the TTS for MPC grows from $8.2\cdot 10^{7}$ to $9.2\cdot 10^{7}$ , for baseline RL grows from $6.7\cdot 10^{7}$ to $7.7\cdot 10^{7}$ , and for proposed RL grows from $6.2\cdot 10^{7}$ to $7.3\cdot 10^{7}$ . Especially for the two RL-based algorithms, the TTS has increased by around $15\%$ . The difference in performance between demand and supply disruptions has demonstrated the distinct levels of difficulty they pose for the learning of the RL agents. The relatively faster-growing skewness in Fig. 12(c) is another evidence for such challenges. The state of the baseline RL algorithm is composed of $[n_{ij}(t),q_{ij}(t)]$ , whereas it is $[n_{ij}(t),dn_{ij}(t),d^{2}n_{ij}(t)]$ in the proposed algorithm. And although the baseline method relies on additional information of $[q_{ij}(t)]$ , it becomes misinformation when demand disruption is present, while the derivatives $[dn_{ij}(t),d^{2}n_{ij}(t)]$ have a similar function as $[q_{ij}(t)]$ but being antifragile to demand disruption. When it comes to disruption on the supply side, the situation differs because neither of the state definitions of the two RL-based algorithms is capable of accurately reflecting the model shifting caused by the supply disruption. Still, with a faster changing rate of the derivatives, the agents in the proposed algorithm can make better decisions than the baseline algorithm while gradually adapting to the new model.

As shown in Fig. 12(b), the performance gain can grow up to $12.2\%$ when the supply disruption is substantial with a value of the infrastructure potential being $0.145$ , demonstrating the superior performance of the proposed method compared to the baseline method. Furthermore, in Fig. 12(c), the blue skewness curve lies constantly below the baseline curve in gray, with an ultimate skewness of $0.41$ , which indicates that the proposed RL algorithm is relatively more antifragile than the baseline algorithm.

When supply disruption uncertainty is introduced into the simulation, as shown in Fig. 13, the results are similar to those observed under demand disruption with uncertainty. First, the TTS difference is $11.8\%$ , showing minimal variation between scenarios with and without disruption. The skewness of the two RL-based algorithms remains almost unchanged, shifting only slightly from $0.41$ and $0.52$ to $0.42$ and $0.51$ for the proposed and baseline algorithms respectively. In contrast, the skewness values for NC and MPC decrease from $0.88$ and $0.78$ to $0.82$ and $0.73$ , reflecting the fact that the traffic state has already reached the gridlock accumulation without uncertainty, and by the incorporation of uncertainty, the network is prone to get gridlocked even more easily. Also, even after averaging 25 simulation iterations, the performance curves under supply disruption with uncertainty are less smooth, which is a similar pattern under demand disruption uncertainty.

7 Conclusion

This research work introduces the concept of antifragility by comparing it with two other terms robustness and resilience, which are commonly used in transportation and traffic control strategies. Through a literature review on using RL algorithms in traffic control to realize a robust or resilient design, whether and how such properties can be induced in RL is investigated. Following the same idea, we manage to induce antifragility in perimeter control by modifying the state definition and reward function based on a state-of-the-art RL-based perimeter control algorithm. We incorporate the first and second derivatives, i.e., the change rate and curvature of the traffic states in the state definition to leverage the potential of such derivatives to feed more information to the RL algorithm. To counter the side effect of oscillation from introducing the derivatives, a damping term is included to stabilize the computed actions. Additionally, a redundant overcompensation term in the reward function has been carefully composed to empower the system to be more antifragile to disruptions.

We conducted a comprehensive comparison between our proposed antifragile RL-based perimeter control approach and three other methods: NC, MPC, and the state-of-the-art RL-based method as the baseline. This comparison was carried out on two different levels, disruptions with static and incremental magnitude, for validating proto-antifragility and progressive antifragility, respectively. We investigated two distinct manifestations of disruptions, namely demand disruptions and supply disruptions. Potential uncertainties in the magnitude of disruptions are also considered. The results not only demonstrated the effectiveness of our proposed method based on superior performance regarding lower TTS, but also showcased proto-antifragility under static disruptions. Furthermore, we put forward a novel method for the quantification of antifragility by computing and comparing the distribution skewness of different algorithms. Our proposed antifragile method exhibits the lowest skewness among all the methods examined, indicating its relative progressive antifragility property against an incremental magnitude of disruptions.

In conclusion, this study is the first of its kind to pioneer the application of the antifragility concept in enhancing the daily operation of transportation networks to improve performance during unforeseen disruptions using a learning-based algorithm. It introduces a new possibility for evaluating system operation under disruptive conditions. Moreover, the concept is sufficiently generic to be extended not only to other traffic control systems like traffic signal control and pricing but also to various engineering disciplines, broadening its potential impact.

8 CRediT authorship contribution statement

Linghang Sun: Conceptualization, Investigation, Methodology, Visualization, Writing – original draft. Michail A. Makridis: Conceptualization, Methodology, Supervision, Writing - review & editing. Alexander Genser: Methodology, Visualization, Writing - review & editing. Cristian Axenie: Project administration, Resources, Writing - review & editing. Margherita Grossi: Project administration, Resources, Writing - review & editing. Anastasios Kouvelas: Supervision, Writing - review & editing.

9 Declaration of Competing Interest

This research was kindly funded by the Huawei Munich Research Center under the framework of the Antigones project, with one of our co-authors being employed at the said company. Otherwise, the authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

Ambühl et al. (2020) Ambühl, L., Loder, A., Bliemer, M.C.J., Menendez, M., Axhausen, K.W., 2020. A functional form with a physical meaning for the macroscopic fundamental diagram. Transportation Research Part B: Methodological 137, 119–132. doi:10.1016/j.trb.2018.10.013.
Ambühl et al. (2021) Ambühl, L., Loder, A., Leclercq, L., Menendez, M., 2021. Disentangling the city traffic rhythms: A longitudinal analysis of MFD patterns over a year. Transportation Research Part C: Emerging Technologies 126, 103065. doi:10.1016/j.trc.2021.103065.
Ambühl et al. (2018) Ambühl, L., Loder, A., Menendez, M., Axhausen, K.W., 2018. A case study of Zurich’s two-layered perimeter control , 8 p.doi:10.3929/ETHZ-B-000206987.
Ambühl et al. (2019) Ambühl, L., Loder, A., Zheng, N., Axhausen, K.W., Menendez, M., 2019. Approximative Network Partitioning for MFDs from Stationary Sensor Data. Transportation Research Record 2673, 94–103. doi:10.1177/0361198119843264.
Andersson et al. (2019) Andersson, J.A.E., Gillis, J., Horn, G., Rawlings, J.B., Diehl, M., 2019. CasADi – A software framework for nonlinear optimization and optimal control. Mathematical Programming Computation 11, 1–36. doi:10.1007/s12532-018-0139-4.
Aslani et al. (2018) Aslani, M., Seipel, S., Mesgari, M.S., Wiering, M., 2018. Traffic signal optimization through discrete and continuous reinforcement learning with robustness analysis in downtown Tehran. Advanced Engineering Informatics 38, 639–655. doi:10.1016/j.aei.2018.08.002.
Auer et al. (2016) Auer, A., Feese, S., Lockwood, S., Booz Allen Hamilton, 2016. History of Intelligent Transportation Systems. Technical Report FHWA-JPO-16-329.
Aven (2015) Aven, T., 2015. The Concept of Antifragility and its Implications for the Practice of Risk Analysis. Risk Analysis 35, 476–483. doi:10.1111/risa.12279.
Axenie (2022) Axenie, C., 2022. Antifragile control systems: The case of an oscillator-based network model of urban road traffic dynamics. arXiv preprint arXiv:2210.10460 .
Axenie et al. (2022) Axenie, C., Kurz, D., Saveriano, M., 2022. Antifragile Control Systems: The Case of an Anti-Symmetric Network Model of the Tumor-Immune-Drug Interactions. Symmetry 14, 2034. doi:10.3390/sym14102034.
Axenie et al. (2024) Axenie, C., López-Corona, O., Makridis, M.A., Akbarzadeh, M., Saveriano, M., Stancu, A., West, J., 2024. Antifragility in complex dynamical systems. npj Complexity 1, 1–8. doi:10.1038/s44260-024-00014-y.
Axenie and Saveriano (2023) Axenie, C., Saveriano, M., 2023. Antifragile Control Systems: The case of mobile robot trajectory tracking in the presence of uncertainty.
Bemporad (2006) Bemporad, A., 2006. Model Predictive Control Design: New Trends and Tools, in: Proceedings of the 45th IEEE Conference on Decision and Control, pp. 6678–6683. doi:10.1109/CDC.2006.377490.
de Bruijn et al. (2020) de Bruijn, H., Größler, A., Videira, N., 2020. Antifragility as a design criterion for modelling dynamic systems. Systems Research and Behavioral Science 37, 23–37. doi:10.1002/sres.2574.
Buldyrev et al. (2010) Buldyrev, S.V., Parshani, R., Paul, G., Stanley, H.E., Havlin, S., 2010. Catastrophic cascade of failures in interdependent networks. Nature 464, 1025–1028. doi:10.1038/nature08932.
Chang and Xiang (2003) Chang, G.L., Xiang, H., 2003. The relationship between congestion levels and accidents .
Chen et al. (2022) Chen, C., Huang, Y.P., Lam, W.H.K., Pan, T.L., Hsu, S.C., Sumalee, A., Zhong, R.X., 2022. Data efficient reinforcement learning and adaptive optimal perimeter control of network traffic dynamics. Transportation Research Part C: Emerging Technologies 142, 103759. doi:10.1016/j.trc.2022.103759.
Chen et al. (2020) Chen, C., Wei, H., Xu, N., Zheng, G., Yang, M., Xiong, Y., Xu, K., Li, Z., 2020. Toward A Thousand Lights: Decentralized Deep Reinforcement Learning for Large-Scale Traffic Signal Control. Proceedings of the AAAI Conference on Artificial Intelligence 34, 3414–3421. doi:10.1609/aaai.v34i04.5744.
Chu et al. (2020) Chu, T., Wang, J., Codecà, L., Li, Z., 2020. Multi-Agent Deep Reinforcement Learning for Large-Scale Traffic Signal Control. IEEE Transactions on Intelligent Transportation Systems 21, 1086–1095. doi:10.1109/TITS.2019.2901791.
Daganzo (2007) Daganzo, C.F., 2007. Urban gridlock: Macroscopic modeling and mitigation approaches. Transportation Research Part B: Methodological 41, 49–62. doi:10.1016/j.trb.2006.03.001.
Fang and Sansavini (2017) Fang, Y., Sansavini, G., 2017. Emergence of Antifragility by Optimum Postdisruption Restoration Planning of Infrastructure Networks. Journal of Infrastructure Systems 23, 04017024. doi:10.1061/(ASCE)IS.1943-555X.0000380.
Federal Statistical Office (2020) Federal Statistical Office, 2020. Mobilität und Verkehr: Panorama (in German/French only). 16704292, Neuchâtel.
Fu et al. (2022) Fu, H., Chen, S., Chen, K., Kouvelas, A., Geroliminis, N., 2022. Perimeter Control and Route Guidance of Multi-Region MFD Systems With Boundary Queues Using Colored Petri Nets. IEEE Transactions on Intelligent Transportation Systems 23, 12977–12999. doi:10.1109/TITS.2021.3119017.
Ganin et al. (2019) Ganin, A.A., Mersky, A.C., Jin, A.S., Kitsak, M., Keisler, J.M., Linkov, I., 2019. Resilience in Intelligent Transportation Systems (ITS). Transportation Research Part C: Emerging Technologies 100, 318–329. doi:10.1016/j.trc.2019.01.014.
Gayah and Daganzo (2011) Gayah, V.V., Daganzo, C.F., 2011. Clockwise hysteresis loops in the Macroscopic Fundamental Diagram: An effect of network instability. Transportation Research Part B: Methodological 45, 643–655. doi:10.1016/j.trb.2010.11.006.
Genser and Kouvelas (2022) Genser, A., Kouvelas, A., 2022. Dynamic optimal congestion pricing in multi-region urban networks by application of a Multi-Layer-Neural network. Transportation Research Part C: Emerging Technologies 134, 103485. doi:10.1016/j.trc.2021.103485.
Geroliminis and Daganzo (2008) Geroliminis, N., Daganzo, C.F., 2008. Existence of urban-scale macroscopic fundamental diagrams: Some experimental findings. Transportation Research Part B: Methodological 42, 759–770. doi:10.1016/j.trb.2008.02.002.
Geroliminis et al. (2013) Geroliminis, N., Haddad, J., Ramezani, M., 2013. Optimal Perimeter Control for Two Urban Regions With Macroscopic Fundamental Diagrams: A Model Predictive Approach. IEEE Transactions on Intelligent Transportation Systems 14, 348–359. doi:10.1109/TITS.2012.2216877.
Haddad and Mirkin (2017) Haddad, J., Mirkin, B., 2017. Coordinated distributed adaptive perimeter control for large-scale urban road networks. Transportation Research Part C: Emerging Technologies 77, 495–515. doi:10.1016/j.trc.2016.12.002.
Haghighat et al. (2020) Haghighat, A.K., Ravichandra-Mouli, V., Chakraborty, P., Esfandiari, Y., Arabi, S., Sharma, A., 2020. Applications of Deep Learning in Intelligent Transportation Systems. Journal of Big Data Analytics in Transportation 2, 115–145. doi:10.1007/s42421-020-00020-1.
Haydari and Yılmaz (2022) Haydari, A., Yılmaz, Y., 2022. Deep Reinforcement Learning for Intelligent Transportation Systems: A Survey. IEEE Transactions on Intelligent Transportation Systems 23, 11–32. doi:10.1109/TITS.2020.3008612.
Horgan et al. (2018) Horgan, D., Quan, J., Budden, D., Barth-Maron, G., Hessel, M., Hasselt, H.v., Silver, D., 2018. Distributed Prioritized Experience Replay. doi:10.48550/arXiv.1803.00933.
Ji et al. (2015) Ji, Y., Jiang, R., Chung, E., Zhang, X., 2015. The impact of incidents on macroscopic fundamental diagrams. Proceedings of the Institution of Civil Engineers: Transport 168, 396–405. doi:10.1680/tran.13.00026.
Johnson and Gheorghe (2013) Johnson, J., Gheorghe, A.V., 2013. Antifragility analysis and measurement framework for systems of systems. International Journal of Disaster Risk Science 4, 159–168. doi:10.1007/s13753-013-0017-7.
Kamalahmadi et al. (2022) Kamalahmadi, M., Shekarian, M., Mellat Parast, M., 2022. The impact of flexibility and redundancy on improving supply chain resilience to disruptions. International Journal of Production Research 60, 1992–2020. doi:10.1080/00207543.2021.1883759.
Keyvan-Ekbatani et al. (2012) Keyvan-Ekbatani, M., Kouvelas, A., Papamichail, I., Papageorgiou, M., 2012. Exploiting the fundamental diagram of urban networks for feedback-based gating. Transportation Research Part B: Methodological 46, 1393–1403. doi:10.1016/j.trb.2012.06.008.
Kim et al. (2020) Kim, H., Muñoz, S., Osuna, P., Gershenson, C., 2020. Antifragility Predicts the Robustness and Evolvability of Biological Networks through Multi-Class Classification with a Convolutional Neural Network. Entropy 22, 986. doi:10.3390/e22090986.
Knoop et al. (2012) Knoop, V.L., Hoogendoorn, S.P., Van Lint, J.W.C., 2012. Routing Strategies Based on Macroscopic Fundamental Diagram. Transportation Research Record 2315, 1–10. doi:10.3141/2315-01.
Korecki et al. (2023) Korecki, M., Dailisan, D., Helbing, D., 2023. How Well Do Reinforcement Learning Approaches Cope With Disruptions? The Case of Traffic Signal Control. IEEE Access 11, 36504–36515. doi:10.1109/ACCESS.2023.3266644.
Kouvelas et al. (2017) Kouvelas, A., Saeedmanesh, M., Geroliminis, N., 2017. Enhancing model-based feedback perimeter control with data-driven online adaptive optimization. Transportation Research Part B: Methodological 96, 26–45. doi:10.1016/j.trb.2016.10.011.
Li (2018) Li, Y., 2018. Deep Reinforcement Learning: An Overview.
Lillicrap et al. (2015) Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D., 2015. Continuous control with deep reinforcement learning. doi:10.48550/arXiv.1509.02971.
Lucia et al. (2017) Lucia, S., Tătulea-Codrean, A., Schoppmeyer, C., Engell, S., 2017. Rapid development of modular and sustainable nonlinear model predictive control solutions. Control Engineering Practice 60, 51–62. doi:10.1016/j.conengprac.2016.12.009.
Manso et al. (2020) Manso, G., Balsmeier, B., Fleming, L., 2020. Heterogeneous Innovation and the Antifragile Economy .
Matthias et al. (2020) Matthias, V., Bieser, J., Mocanu, T., Pregger, T., Quante, M., Ramacher, M.O.P., Seum, S., Winkler, C., 2020. Modelling road transport emissions in Germany – Current day situation and scenarios for 2040. Transportation Research Part D: Transport and Environment 87, 102536. doi:10.1016/j.trd.2020.102536.
Mazloumi et al. (2010) Mazloumi, E., Currie, G., Rose, G., 2010. Using GPS Data to Gain Insight into Public Transport Travel Time Variability. Journal of Transportation Engineering 136, 623–631. doi:10.1061/(ASCE)TE.1943-5436.0000126.
Mnih et al. (2013) Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., Riedmiller, M., 2013. Playing Atari with Deep Reinforcement Learning .
Munoz et al. (2022) Munoz, A., Billsberry, J., Ambrosini, V., 2022. Resilience, robustness, and antifragility: Towards an appreciation of distinct organizational responses to adversity. International Journal of Management Reviews 24, 181–187. doi:10.1111/ijmr.12289.
Mysore et al. (2021) Mysore, S., Mabsout, B., Mancuso, R., Saenko, K., 2021. Regularizing Action Policies for Smooth Control with Reinforcement Learning, in: 2021 IEEE International Conference on Robotics and Automation (ICRA), pp. 1810–1816. doi:10.1109/ICRA48506.2021.9561138.
Nguyen et al. (2018) Nguyen, H., Kieu, L.M., Wen, T., Cai, C., 2018. Deep learning methods in transportation domain: a review. IET Intelligent Transport Systems 12, 998–1004. doi:10.1049/iet-its.2018.0064.
Ni and Cassidy (2019) Ni, W., Cassidy, M.J., 2019. Cordon control with spatially-varying metering rates: A Reinforcement Learning approach. Transportation Research Part C: Emerging Technologies 98, 358–369. doi:10.1016/j.trc.2018.12.007.
Priyadarshini et al. (2022) Priyadarshini, J., Singh, R.K., Mishra, R., Bag, S., 2022. Investigating the interaction of factors for implementing additive manufacturing to build an antifragile supply chain: TISM-MICMAC approach. Operations Management Research 15, 567–588. doi:10.1007/s12063-022-00259-7.
Rodrigues and Azevedo (2019) Rodrigues, F., Azevedo, C.L., 2019. Towards Robust Deep Reinforcement Learning for Traffic Signal Control: Demand Surges, Incidents and Sensor Failures, in: 2019 IEEE Intelligent Transportation Systems Conference (ITSC), pp. 3559–3566. doi:10.1109/ITSC.2019.8917451.
Saedi et al. (2020) Saedi, R., Saeedmanesh, M., Zockaie, A., Saberi, M., Geroliminis, N., Mahmassani, H.S., 2020. Estimating network travel time reliability with network partitioning. Transportation Research Part C: Emerging Technologies 112, 46–61. doi:10.1016/j.trc.2020.01.013.
Silver et al. (2014) Silver, D., Lever, G., Heess, N., Degris, T., Wierstra, D., Riedmiller, M., 2014. Deterministic Policy Gradient Algorithms .
Su et al. (2023) Su, Z.C., Chow, A.H.F., Fang, C.L., Liang, E.M., Zhong, R.X., 2023. Hierarchical control for stochastic network traffic with reinforcement learning. Transportation Research Part B: Methodological 167, 196–216. doi:10.1016/j.trb.2022.12.001.
Sun et al. (2024) Sun, L., Zhang, Y., Axenie, C., Grossi, M., Kouvelas, A., Makridis, M.A., 2024. The Fragile Nature of Road Transportation Systems. doi:10.48550/arXiv.2402.00924.
Sutton and Barto (2018) Sutton, R.S., Barto, A.G., 2018. Reinforcement Learning, second edition: An Introduction. MIT Press.
Taleb (2012) Taleb, N.N., 2012. Antifragile: Things That Gain from Disorder. volume 3. Random House.
Taleb (2013) Taleb, N.N., 2013. ’Antifragility’ as a mathematical idea. Nature 494, 430–430. doi:10.1038/494430e.
Taleb and Douady (2013) Taleb, N.N., Douady, R., 2013. Mathematical definition, mapping, and detection of (anti)fragility. Quantitative Finance 13, 1677–1689. doi:10.1080/14697688.2013.800219.
Taleb and West (2023) Taleb, N.N., West, J., 2023. Working with Convex Responses: Antifragility from Finance to Oncology. Entropy 25, 343. doi:10.3390/e25020343.
Tamvakis and Xenidis (2012) Tamvakis, P., Xenidis, Y., 2012. Resilience in Transportation Systems. Procedia - Social and Behavioral Sciences 48, 3441–3450. doi:10.1016/j.sbspro.2012.06.1308.
Tan et al. (2020) Tan, K.L., Sharma, A., Sarkar, S., 2020. Robust Deep Reinforcement Learning for Traffic Signal Control. Journal of Big Data Analytics in Transportation 2, 263–274. doi:10.1007/s42421-020-00029-6.
Tan et al. (2019) Tan, W.J., Zhang, A.N., Cai, W., 2019. A graph-based model to measure structural redundancy for supply chain resilience. International Journal of Production Research 57, 6385–6404. doi:10.1080/00207543.2019.1566666.
Tang et al. (2020) Tang, J., Heinimann, H., Han, K., Luo, H., Zhong, B., 2020. Evaluating resilience in urban transportation systems for sustainability: A systems-based Bayesian network model. Transportation Research Part C: Emerging Technologies 121, 102840. doi:10.1016/j.trc.2020.102840.
U.S. Bureau of Public Roads (1964) U.S. Bureau of Public Roads, 1964. Traffic assignment manual for application with a large, high speed computer. US Department of Commerce.
U.S. Department of Transportation (2019) U.S. Department of Transportation, 2019. Vehicle Miles Traveled.
Vespignani (2010) Vespignani, A., 2010. The fragility of interdependency. Nature 464, 984–985. doi:10.1038/464984a.
Wang et al. (2015) Wang, P., Wada, K., Alamatsu, T., Hara, Y., 2015. An empirical analysis of macroscopic fundamental diagrams for sendai road networks. Interdisciplinary Information Sciences 21, 49–61. doi:10.4036/iis.2015.49.
Wang et al. (2022) Wang, Y., Jin, H., Zheng, G., 2022. CTRL: Cooperative Traffic Tolling via Reinforcement Learning, in: Proceedings of the 31st ACM International Conference on Information & Knowledge Management, Association for Computing Machinery, New York, NY, USA. pp. 3545–3554. doi:10.1145/3511808.3557112.
Wei et al. (2019) Wei, H., Xu, N., Zhang, H., Zheng, G., Zang, X., Chen, C., Zhang, W., Zhu, Y., Xu, K., Li, Z., 2019. CoLight: Learning Network-level Cooperation for Traffic Signal Control, in: Proceedings of the 28th ACM International Conference on Information and Knowledge Management, ACM, Beijing China. pp. 1913–1922. doi:10.1145/3357384.3357902.
Wu et al. (2020) Wu, C., Ma, Z., Kim, I., 2020. Multi-Agent Reinforcement Learning for Traffic Signal Control: Algorithms and Robustness Analysis, in: 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC), pp. 1–7. doi:10.1109/ITSC45102.2020.9294623.
Wächter and Biegler (2006) Wächter, A., Biegler, L.T., 2006. On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming. Mathematical Programming 106, 25–57. doi:10.1007/s10107-004-0559-y.
Yang et al. (2017) Yang, K., Zheng, N., Menendez, M., 2017. Multi-scale Perimeter Control Approach in a Connected-Vehicle Environment. Transportation Research Procedia 23, 101–120. doi:10.1016/j.trpro.2017.05.007.
Yildirimoglu et al. (2015) Yildirimoglu, M., Ramezani, M., Geroliminis, N., 2015. Equilibrium Analysis and Route Guidance in Large-scale Networks with MFD Dynamics. Transportation Research Procedia 9, 185–204. doi:10.1016/j.trpro.2015.07.011.
Zhang and Zhang (2021) Zhang, R., Zhang, J., 2021. Long-term pathways to deep decarbonization of the transport sector in the post-COVID world. Transport Policy 110, 28–36. doi:10.1016/j.tranpol.2021.05.018.
Zhang et al. (2021) Zhang, S., Yao, H., Whiteson, S., 2021. Breaking the Deadly Triad with a Target Network, in: Proceedings of the 38th International Conference on Machine Learning, PMLR. pp. 12621–12631.
Zheng and Geroliminis (2016) Zheng, N., Geroliminis, N., 2016. Modeling and optimization of multimodal urban networks with limited parking and dynamic pricing. Transportation Research Part B: Methodological 83, 36–58. doi:10.1016/j.trb.2015.10.008.
Zheng et al. (2012) Zheng, N., Waraich, R.A., Axhausen, K.W., Geroliminis, N., 2012. A dynamic cordon pricing scheme combining the Macroscopic Fundamental Diagram and an agent-based traffic model. Transportation Research Part A: Policy and Practice 46, 1291–1303. doi:10.1016/j.tra.2012.05.006.
Zhou and Gayah (2021) Zhou, D., Gayah, V.V., 2021. Model-free perimeter metering control for two-region urban networks using deep reinforcement learning. Transportation Research Part C: Emerging Technologies 124, 102949. doi:10.1016/j.trc.2020.102949.
Zhou and Gayah (2023) Zhou, D., Gayah, V.V., 2023. Scalable multi-region perimeter metering control for urban networks: A multi-agent deep reinforcement learning approach. Transportation Research Part C: Emerging Technologies 148, 104033. doi:10.1016/j.trc.2023.104033.
Zhou et al. (2019) Zhou, Y., Wang, J., Yang, H., 2019. Resilience of Transportation Systems: Concepts and Comprehensive Review. IEEE Transactions on Intelligent Transportation Systems 20, 4262–4276. doi:10.1109/TITS.2018.2883766.
Zhu et al. (2019) Zhu, L., Yu, F.R., Wang, Y., Ning, B., Tang, T., 2019. Big Data Analytics in Intelligent Transportation Systems: A Survey. IEEE Transactions on Intelligent Transportation Systems 20, 383–398. doi:10.1109/TITS.2018.2815678.
Zhu et al. (2021) Zhu, Y., Wang, P., Corman, F., 2021. A deep reinforcement learning framework for delay management with passenger re-routing, in: 9th International Conference on Railway Operations Modelling and Analysis.