Social Coordination and Altruism
in Autonomous Driving
Abstract
Despite the advances in the autonomous driving domain, autonomous vehicles (AVs) are still inefficient and limited in terms of cooperating with each other or coordinating with vehicles operated by humans. A group of autonomous and human-driven vehicles (HVs) that work together to optimize an altruistic social utility can co-exist seamlessly and assure safety and efficiency on the road. Achieving this mission without explicit coordination among agents is challenging, mainly due to the difficulty of predicting the behavior of humans with heterogeneous preferences in mixed-autonomy environments. Formally, we model an AV’s maneuver planning in mixed-autonomy traffic as a partially-observable stochastic game and attempt to derive optimal policies that lead to socially-desirable outcomes using a multi-agent reinforcement learning (MARL) framework, and we propose a semi-sequential multi-agent training and policy dissemination algorithm for our MARL problem. We introduce a quantitative representation of the AVs’ social preferences and design a distributed reward structure that induces altruism in their decision-making process. Altruistic AVs are able to form alliances, guide the traffic, and affect the behavior of HVs to handle competitive driving scenarios. We compare egoistic AVs to our altruistic autonomous agents in a highway merging setting and demonstrate the emerging behaviors that lead to improvement in the number of successful merges and the overall traffic flow and safety.
Index Terms:
Cooperative Driving, Social Navigation, Mixed-autonomy Traffic, Multi-agent Reinforcement Learning

I Introduction

Connected and automated vehicles (CAVs) pursue a mission to enhance driving safety and reliability by bringing automation and intelligence into vehicles, which lessens the inherent human limitations such as range of vision, reaction time, and distraction. Adding the communication component to intelligent vehicles further improves their ability to perceive their surroundings and creates an opportunity for mass coordination and cooperative decision-making. This inter-agent coordination is particularly important as the full potential of CAVs does not lie in operating a single vehicle on an empty road but rather from their seamless co-existence with other autonomous and human-driven vehicles (HVs). Hence, we narrow the focus of this work to studying the decision-making problem in the presence of multiple autonomous agents and human drivers, i.e. a mixed-autonomy multi-agent environment.
Leveraging vehicle-to-vehicle (V2V) communication, decision-making in a purely-autonomous environment can be simplified into a centralized control problem with essentially one agent. However, the presence of HVs makes the inter-agent coordination more challenging as they cannot explicitly communicate to coordinate with AVs in real-time. In order to make safe and socially-desirable decisions in the presence of humans, current solutions on social navigation for AVs mainly rely on learned or hand-coded models that predict the behavior of human drivers [1, 2]. We identify two key shortcomings in the existing schemes. First, the fidelity of the human models that are derived in the absence of autonomous agents is questionable in mixed-autonomy settings as human drivers tend to act differently when around AVs [3]. Second, single-agent solutions do not fully exploit the potential of CAVs in constituting a mass intelligence, forming alliances, and performing coordinated multi-agent maneuvers.
We study the mixed-autonomy decision-making problem from a multi-agent point of view, as opposed to the previous individual perspectives. Our key insight is that incentivizing AVs to adopt altruistic behavior and account for the interest of other vehicles allows them to see the big picture and find solutions that are optimal for the group in the longer term. In addition to the potential safety and efficiency benefits of altruistic decision-making, altruism leads to circumstances where no vehicle has superiority over the others, creating more societally beneficial outcomes [4]. To elaborate, Figure 1(a) shows that a group of AVs can guide the behavior of human drivers to improve safety and efficiency, while Figures 1(b) and 1(c) illustrate examples of how AVs can work together to achieve a social goal that benefits another HV or AV.
We focus our work on inherently competitive driving scenarios, such as the examples illustrated in Figure 1, where safe and efficient traffic flow necessarily requires coordination among autonomous agents and egoistic behavior most likely compromises traffic safety and efficiency. We build on our prior work in [5, 6] and propose a novel semi-sequential multi-agent training and policy dissemination algorithm to alleviate the non-stationarity problem. Additionally, we use a method for scoring the entries in the experience replay buffer that improves sample efficiency and speeds up the learning process. Furthermore, we emphasize the importance of finding the optimal social value orientation and, in contrast to other works, formulate it as a convex optimization problem. We formalize the mixed-autonomy driving problem as a partially observable stochastic game (POSG) and derive optimal policies using deep multi-agent reinforcement learning (MARL). With our solution, altruistic autonomous agents not only learn to drive safely but also master inter-agent coordination and social navigation. Our main contributions are as follows:
• We propose a MARL framework to train altruistic agents using a decentralized social reward signal. These agents are able to drive safely on the highway and coordinate with each other in the presence of human drivers.
• We propose a novel semi-sequential multi-agent training and policy dissemination algorithm for our MARL problem and utilize a network architecture that allows our agents to implicitly learn from experience, without the need for an explicit behavioral model of human drivers.
• In contrast with the existing solutions, we formulate the problem of finding the optimal social value orientation angle as a convex optimization objective. We show that an optimal level of altruism exists between the extremes of absolute selflessness and selfishness; when it is chosen properly, the overall traffic safety and flow improve for the group of vehicles, despite some agents’ compromise on their local utility.
II Related Work
This section presents a short literature review on the main topics that are closely related to our problem, namely core MARL solutions, cooperative algorithms, human behavior modeling, and navigation in the presence of humans.
Multi-agent Reinforcement Learning. Early solutions for multi-agent value-learning algorithms assume independently trained agents and have been shown to perform poorly [7]. To alleviate this problem, Foerster et al. present a learning rule that relies on an additional term to account for the effect of other agents’ evolution during the training. They also leverage a multi-agent derivation of importance sampling and remove outdated samples from the experience replay buffer [8] to make it effective for multi-agent settings. Xie et al. employ latent representations of partner strategies to address this problem and enable more scalable partner modeling [9]. Shih et al. further consider the effects of repeated interactions on partner modeling and develop a modular approach that separates rule-dependent representations from partner-dependent conventions [10].
Foerster et al. proposed the counterfactual multi-agent (COMA) algorithm to address the credit assignment problem in multi-agent environments [11]. The COMA algorithm utilizes the set of joint actions of all agents as well as the full state of the world during the training. In contrast, we assume partial observability and a decentralized reward function during both training and execution. More application-oriented related works include the centralized multi-agent solutions proposed by Gupta et al. [12]. More recently, Wang et al. proposed a gifting approach that enables the emergence of prosocial behaviors in general-sum coordination games [13]. Importantly, in contrast with our approach, the existing literature on multi-agent systems relies on assumptions on the social preference of agents [14, 15].
Human Behavior Modeling. Driving styles of human drivers can be learned either from demonstration through inverse RL, as proposed by Kuderer et al., or by employing statistical models such as Gaussian and Dirichlet processes [16, 17]. Kuefler et al. adopt a novel approach and apply generative adversarial networks to imitate the behavior of a human driver [18]. Schmerling et al. study scenarios with inherent multimodal uncertainty, such as our driving scenario, and leverage conditional variational autoencoders (CVAEs) to condition the policy on the present interaction history [19]. Recent data-driven approaches have shown success in classifying human driving maneuvers [20] and predicting human trajectories to enable fully-autonomous navigation of a robot in human-dense environments [21]. In contrast with works in the broad literature on human behavior modeling that take a game-theoretic or optimization-based approach, we rely on implicitly learning from interaction data within our MARL platform.
Social Navigation. Alahi et al. introduced the Social LSTM framework, which leverages recurrent neural networks to extract temporal information from the trajectories of pedestrians in large crowds [1]. Tsoi et al. present their high-fidelity simulation platform, SEAN, to accelerate research on social robot navigation [22]. Vazquez et al. study the social interactions in a human-robot role-playing game and expand their observations to the spatial behavior of a group of robots. More recent works in social navigation have revealed the potential for collaborative planning and interaction with humans. Examples include but are not limited to works by Trautman et al. and Nikolaidis et al., where a mutual reward function is optimized in order to enable joint trajectory planning for humans and robots [23, 24].
Mixed-autonomy Traffic Networks. Lazar et al. take a more abstract, traffic-level perspective to study the emergent behaviors in mixed-autonomy environments using model-free RL solutions [25]. Wu et al. explore the idea of stabilizing the traffic flow that is guided by autonomous vehicles as well as the emergent behaviors in a mixed AV-HV setting [26, 27]. Vinitsky et al. present a benchmark for RL-based traffic control in mixed-autonomy traffic [28]. Biyik et al. formalize the effects of altruistic driving in mixed autonomy at the road level and present a formal model of road congestion that can be used for optimal routing in road networks [29].
III Preliminaries
In this section, we provide the preliminary concepts that are essential in the following section and introduce our formal notation.
Partially-observable Stochastic Games. The decision-making process of a finite set of autonomous agents $\mathcal{I} = \{1, \dots, N\}$ with partial observability in a stochastic environment can be formalized as a partially-observable stochastic game (POSG) defined by the tuple $\langle \mathcal{I}, \mathcal{S}, \{\mathcal{A}_i\}, \{\Omega_i\}, \mathcal{T}, \{\mathcal{O}_i\}, \{R_i\} \rangle$ for $i \in \mathcal{I}$. At a given time, each agent $i$ receives a local observation $o_i \in \Omega_i$ that is correlated with the underlying state $s \in \mathcal{S}$ of the environment and takes an action $a_i$ from its action space $\mathcal{A}_i$. Consequently, the environment evolves to a new state $s'$ with probability $\mathcal{T}(s' \mid s, a)$ and the agent receives a decentralized reward $R_i(s, a)$. The probability distribution over actions at a given state is known as the stochastic policy $\pi$. The goal is to derive a distribution that maximizes the discounted sum of future rewards over an infinite time horizon, i.e., an optimal policy $\pi^*$,

$\pi^{*} = \arg\max_{\pi}\; \mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty} \gamma^{t} R(s_t, a_t)\Big]$   (1)
in which $\gamma \in [0, 1)$ is the discount factor. The optimal policy maximizes the state-action value function, i.e., $\pi^{*}(s) = \arg\max_{a} Q^{\pi^{*}}(s, a)$, where

$Q^{\pi}(s, a) = \mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty} \gamma^{t} R(s_t, a_t) \,\Big|\, s_0 = s,\, a_0 = a\Big]$   (2)
and the optimal state-action value function can then be derived using the Bellman optimality equation,
$Q^{*}(s, a) = \mathbb{E}\Big[ R(s, a) + \gamma \max_{a'} Q^{*}(s', a') \Big]$   (3)
Solving POSGs with Unknown Dynamics. Dynamics of the environment and reward function are usually stochastic and not fully-known in real-world problems. Reinforcement learning (RL) provides a possibility to solve POSGs with unknown reward and state transition functions through continuous interaction with the environment. RL algorithms such as off-policy temporal difference learning enable agents to update the value function from such interactions with the environment,
$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha_k \Big[ R(s_t, a_t) + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \Big]$   (4)

where $\alpha_k$ is the learning rate at the $k$-th iteration.
Deep Q-networks. Parameterizing the state-action value function using a function approximator, i.e., $Q(s, a; \mathbf{w})$, results in more generalizable policies that can scale to larger state-spaces. Parameters $\mathbf{w}$ can be learned through mini-batch gradient descent steps,

$\mathbf{w}_{k+1} = \mathbf{w}_{k} - \alpha_k \nabla_{\mathbf{w}} L(\mathbf{w}) \big|_{\mathbf{w} = \mathbf{w}_k}$   (5)
where the operator $\nabla_{\mathbf{w}}$ estimates the gradient of the loss $L(\mathbf{w})$ at $\mathbf{w} = \mathbf{w}_k$. Deep neural networks are widely used as function approximators and are also applicable to the Q-learning algorithm [30]. A deep Q-network (DQN) builds on two major ideas, namely using two separate networks during training and employing an experience replay buffer to decorrelate the training samples. The former stabilizes the training process: the greedy network is updated at each training iteration to compute the optimal Q-value, while a separate, less-frequently updated target network provides the bootstrapped targets. The loss function in Eq. (5) can be written as

$L(\mathbf{w}) = \mathbb{E}\Big[ \big( R(s, a) + \gamma \max_{a'} Q(s', a'; \mathbf{w}^{-}) - Q(s, a; \mathbf{w}) \big)^{2} \Big]$   (6)
where $\mathbf{w}^{-}$ denotes the parameters of the target network, which is periodically updated during training. Additionally, the DQN algorithm draws batches of training data from an experience replay buffer in order to decorrelate the training samples in Eq. (5), which are generated from simulation or real-world experience and thus naturally have temporal dependencies. This process is challenging in MARL since the transition probabilities perceived by an agent change whenever any other agent updates its policy. In other words, the environment becomes non-stationary when multiple agents evolve concurrently. We will further discuss this issue and provide a solution to stabilize the multi-agent learning process in Section V-D.
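To make these mechanics concrete, the sketch below shows a minimal DQN update step in PyTorch with a greedy network, a periodically synchronized target network, and uniform replay sampling. The class and function names, layer sizes, and hyper-parameters are illustrative assumptions, not the paper's implementation.

```python
# Minimal DQN update sketch (PyTorch); sizes and hyper-parameters are illustrative.
import random

import torch
import torch.nn as nn


class QNetwork(nn.Module):
    """Feature extractor + function approximator, analogous to the FEN/FAN of Section V-D."""

    def __init__(self, obs_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, obs):
        return self.net(obs)


def dqn_update(greedy_net, target_net, optimizer, replay, batch_size=32, gamma=0.99):
    """One mini-batch gradient step on the loss of Eq. (6)."""
    obs, act, rew, next_obs, done = zip(*random.sample(replay, batch_size))
    obs = torch.tensor(obs, dtype=torch.float32)
    act = torch.tensor(act, dtype=torch.int64)
    rew = torch.tensor(rew, dtype=torch.float32)
    next_obs = torch.tensor(next_obs, dtype=torch.float32)
    done = torch.tensor(done, dtype=torch.float32)

    q = greedy_net(obs).gather(1, act.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Bootstrapped target from the less frequently updated target network (w^-).
        target = rew + gamma * (1.0 - done) * target_net(next_obs).max(dim=1).values
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```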
V2V Networks. We are interested in a multi-agent setting where agents have no information about others’ actions and cannot explicitly coordinate. Instead, the decentralized coordination among agents is expected to arise from the social reward signal. We extend the earlier introduced concepts to a coordinated POSG defined by augmenting the POSG tuple with a communication graph $\mathcal{G}_t$, where $\mathcal{G}_t$ is a stochastic, time-varying, undirected graph that encompasses the V2V communication among the agents in the environment. The communicated information can be as simple as kinematics information, e.g., speed, location, and heading, or more bandwidth-intensive forms of sensory data, e.g., camera and LiDAR. Leveraging this shared situational awareness, agents can extend their range of perception and overcome obstacles and line-of-sight visibility limitations [31, 32]. An agent’s local observation is created using the shared situational awareness and clearly depends on $\mathcal{G}_t$, which incorporates the flow of information among agents. We utilize the network analysis from [33] to model the V2V communication on a high-density highway.
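As a small illustration of how the V2V graph extends an agent's perception, the sketch below fuses an AV's own sensing with what its one-hop neighbors share. The data structures (dictionaries of numpy arrays keyed by agent id) are assumptions made for this example, not the paper's implementation.

```python
# Illustrative fusion of local observations over the V2V graph G_t.
from typing import Dict, Set
import numpy as np


def extended_observation(agent_id: int,
                         local_obs: Dict[int, np.ndarray],
                         v2v_graph: Dict[int, Set[int]]) -> np.ndarray:
    """Stack the ego observation with observations received from one-hop neighbors in G_t."""
    rows = [local_obs[agent_id]]
    for neighbor in v2v_graph.get(agent_id, set()):
        rows.append(local_obs[neighbor])  # shared situational awareness
    # A real system would also de-duplicate detections; here we simply stack them.
    return np.vstack(rows)
```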
IV Problem Statement
We investigate the maneuver-level decision-making problem for AVs to explore behaviors that can lead to socially-desirable outcomes. We are interested in the question of how autonomous agents can be trained from scratch to perform an individual task, such as driving safely on a road, while considering the social aspects of their mission, i.e., optimizing for a social utility that also accounts for the interest of other vehicles around them. Figure 1 helps build intuition on the topic by depicting instances of driving scenarios in which altruism leads to socially-valuable outcomes and clearly overcomes the limitations of egoistic and single-agent planning. Each scenario in Figure 1 is an example of an altruistic inter-agent coordination setting that can benefit both HVs and AVs. It is clear that in some instances, altruistic AVs have to compromise on their individual utility, e.g., by slowing down, in order to increase the group’s overall utility. The balance between an AV’s selflessness and selfishness is the key to reaching efficient and safe traffic flow. In [5, 6] we show that tuning the level of altruism in AVs leads to different emerging behaviors and affects the traffic flow and driving safety. In this work, we further explore that finding and formulate the problem as a convex optimization objective to obtain an optimal social value orientation angle. Thus, we continue this section by providing a quantitative representation of an agent’s level of altruism and formally defining our case study scenario, before presenting our proposed solution in the next section.
IV-A Quantifying Social Value Orientation
In order to formally study the social dilemmas between humans and autonomous agents in heterogeneous environments, it is crucial to quantify the social preference of an individual, e.g., whether they will defect or cooperate in a given situation such as opening a gap in our highway merging example. The degree of an agent’s egoism or altruism with regard to its counterparts is defined as Social Value Orientation (SVO), a widely used notion in the social psychology literature that has recently been adopted in robotics research. Specifically, we borrow the angular notation for SVO as defined by Liebrand et al. [34]. The SVO angular preference $\phi_i$ quantifies how an agent weights its own reward against the reward of others. An agent’s total utility can then be written as,

$U_i = \cos(\phi_i)\, r_i + \sin(\phi_i)\, r_{-i}$   (7)

where $r_i$ is the agent’s individual utility and $r_{-i}$ is the total utility of the other agents from the perspective of the $i$-th agent, which in general is a function of their individual utilities,

$r_{-i} = f\big(r_1, \dots, r_{i-1}, r_{i+1}, \dots, r_N\big)$   (8)
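As a minimal illustration of Eqs. (7) and (8), the snippet below computes an SVO-weighted utility, assuming purely for illustration that $f(\cdot)$ is the mean of the other agents' rewards.

```python
# Sketch of the SVO-weighted utility of Eqs. (7)-(8); the mean as f(.) is an assumption.
import math
from typing import Sequence


def svo_utility(ego_reward: float, others_rewards: Sequence[float], phi: float) -> float:
    others = sum(others_rewards) / len(others_rewards) if others_rewards else 0.0
    return math.cos(phi) * ego_reward + math.sin(phi) * others

# phi = 0       -> purely egoistic (only the ego reward counts)
# phi = pi / 4  -> equal weight on the ego and the others
# phi = pi / 2  -> purely altruistic
```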

Autonomous agents require an understanding of human drivers’ social preferences and their willingness to coordinate. However, it is well-established in behavioral decision theory that humans are heterogeneous in SVO and thus their preference is rather ambiguous and unclear [36]. Current works on social navigation for AVs often make restrictive assumptions on human drivers’ social preference and compliance [2], whereas Figure 2 indicates a spectrum of altruism among humans with heterogeneous social value orientations. Thus, due to the large spectrum of altruistic behavior observed among humans, our insight is to rely on the autonomous cars instead to guide the overall system toward more socially desirable objectives. Specifically, we plan to find policies for AVs that improve the utility of the group as a whole through emerging alliances and, more importantly, affecting the behavior of human drivers. In our particular driving example, the desired social outcome is achieving seamless and safe highway merging while maximizing the distance traveled by all vehicles and avoiding collisions.
IV-B Formalism

We choose a highway merging scenario with a mixed group of AVs and HVs as our base experiment scenario, as illustrated in Figure 3. A merging vehicle, which can be either an HV or an AV, approaches the highway on the merging ramp and faces a mixed platoon of vehicles that are cruising on the highway. This configuration contains a group of AVs that hold the same SVO, as well as a group of HVs that are heterogeneous in their SVO, and hence it is unclear whether they are allies or foes. In this setting, it is obvious that the individual interest of the merging vehicle, i.e., seamless merging into the highway, does not align with that of the cruising vehicles, i.e., cruising with optimal speed and energy consumption. We design our case study scenario such that safe and seamless merging necessarily requires all AVs to work together and none of them alone can enable the merging of the mission vehicle without the cooperation of the others. Formally, the road section shown in Figure 3 is shared by a set of AVs that are connected together via V2V communication and governed by a decentralized stochastic policy, a set of HVs operated by humans with heterogeneous and unknown SVOs, and a human-driven or autonomous mission vehicle that attempts to merge into the highway.
A human driver’s perception is often limited by their range of vision, occlusion, and obstacles. In contrast, CAVs share their observations to overcome these limitations. Each CAV constructs a unique local observation using its own sensory measurements as well as the local observations it receives from the neighboring CAVs. As mentioned before, the graph $\mathcal{G}_t$ captures this inter-agent communication. Therefore, an observer AV can detect a subset of the other AVs and a subset of the HVs within its extended perception range. As we elaborated before, our aim is to find a decentralized control scheme that can induce altruism in the behavior of AVs. Hence, each AV must use its local observation to make independent decisions that optimize its utility. The value of the agent’s altruism, i.e., the SVO angular phase $\phi_i$, determines the social implications of an agent’s local actions. To summarize, we state our problem as deriving a utility function that enables the AVs to handle competitive driving scenarios, such as those illustrated in Figure 1, and leads them to socially-desirable outcomes that improve traffic safety and efficiency for the group of vehicles.
V Sympathetic Cooperative Driving Framework
In their recent work, Silver et al. explained how artificial intelligence agents can learn complex tasks through experience and the maximization of a generic reward function, rather than requiring task-specific, specialized problem formulations [37]. Inspired by this approach to solving decision-making problems, rather than breaking down our problem into learning how to drive and learning social coordination, we train our autonomous agents from scratch using a decentralized reward structure and expect them to master the basics of highway driving, e.g., avoiding collisions and unnecessary lane changes or acceleration, while learning inter-agent coordination to eventually achieve the goal of enabling a safe and seamless merge. To reiterate our goal, we seek a decentralized solution that enables the autonomous agents to make independent socially-desirable decisions, with no explicit coordination or sharing of their decisions and future actions. In the rest of this section, we define the action and observation spaces in the POSG framework of Section III and introduce the notions of sympathy and cooperation that are essential for structuring the reward function.
V-A Action and Observation Spaces
We employ a numeric representation for an agent’s observation that embeds the kinematics of the neighboring vehicles. Additionally, we integrate the history of the vehicles’ last meta-actions to capture temporal information and their past trajectories. An ego vehicle observes a set of HVs and AVs in its perception range. The kinematic observation includes the relative Frenet coordinates of the closest vehicles in addition to the absolute Frenet coordinates of the ego vehicle. Formally, agent $i$ receives a local observation $o_i$,

$o_i = \big[\, o_i^{(0)},\, o_i^{(1)},\, \dots,\, o_i^{(N)},\, h_i \,\big]$   (9)

Each row of the local observation matrix is defined as,

$o_i^{(k)} = \big[\, x_k,\; y_k,\; \psi_k,\; \iota_k \,\big]$   (10)

in which $x_k$ and $y_k$ are the longitudinal and lateral Frenet coordinates of the $k$-th vehicle, respectively. The vehicle's yaw angle is denoted by $\psi_k$, and the autonomy flag $\iota_k$ is $1$ if the $k$-th vehicle is autonomous and $0$ otherwise. In case the total number of observed vehicles is smaller than the set size of the observation matrix $o_i$, the remaining rows are filled with zeros. $h_i$ is the unrolled numeric representation of the action history array that contains the last $\tau$ meta-actions taken by agent $i$ and is defined as,

$h_i = \big[\, a_i^{t-1},\, a_i^{t-2},\, \dots,\, a_i^{t-\tau} \,\big]$   (11)
Our interest is in maneuver-level decision-making for autonomous vehicles. Thus, we define the action space as the set of abstract meta-actions $\mathcal{A} = \{$Lane Left, Idle, Lane Right, Accelerate, Decelerate$\}$. These meta-actions are then translated into admissible trajectories and low-level control signals that eventually govern the movement of the vehicle. The implementation details of how meta-actions render into steering and acceleration signals are discussed in Section VI. Additionally, the discrete meta-actions defined above must be translated into numeric values in Eq. (11). We experiment with three encodings and choose the one that leads to the best performance after training (a sketch of the three encodings follows the list):
- Binary: a one-hot encoding with 5 bits for each meta-action.
- Discrete: a single integer in $\{0, \dots, 4\}$ for each meta-action.
- Frenet: two integers in $\{-1, 0, 1\}$ for the lateral and longitudinal components of each meta-action.
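The sketch below illustrates the three encodings. The exact integer and Frenet mappings are assumptions consistent with the five meta-actions; the paper specifies only the encoding types.

```python
# Illustrative encodings of the five meta-actions.
import numpy as np

META_ACTIONS = ["LANE_LEFT", "IDLE", "LANE_RIGHT", "ACCELERATE", "DECELERATE"]


def encode_binary(action: str) -> np.ndarray:
    """One-hot encoding with 5 bits."""
    vec = np.zeros(len(META_ACTIONS), dtype=np.float32)
    vec[META_ACTIONS.index(action)] = 1.0
    return vec


def encode_discrete(action: str) -> int:
    """A single integer index in {0, ..., 4}."""
    return META_ACTIONS.index(action)


def encode_frenet(action: str) -> np.ndarray:
    """Lateral and longitudinal components in {-1, 0, +1}."""
    lateral = {"LANE_LEFT": -1, "LANE_RIGHT": +1}.get(action, 0)
    longitudinal = {"DECELERATE": -1, "ACCELERATE": +1}.get(action, 0)
    return np.array([lateral, longitudinal], dtype=np.int8)
```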
V-B Disentangling Sympathy and Cooperation
Inter-agent relations in our mixed-autonomy problem can be broken down into the interactions among autonomous agents, i.e., AV-AV interactions, as well as those between autonomous agents and human drivers, i.e., human-AI interactions. Decoupling the two enables us to systematically study the interactions between human drivers with ambiguous SVO and our autonomous agents. We refer to an autonomous agent’s altruism toward a human as sympathy and define cooperation as the altruistic behavior among autonomous agents. Our rationale for decoupling the components of altruism is that they differ in nature. For instance, sympathy may not be reciprocal, as humans are heterogeneous in their SVO, whereas cooperation among autonomous agents is essentially homogeneous, assuming that they hold the same SVO. We investigate each component of altruism separately to better understand the emerging behaviors and the mechanics of inducing altruism in autonomous agents. Following this definition, we can rewrite Eq. (7) as,
$U_i = \cos(\phi_i)\, r_i + \sin(\phi_i)\big[\cos(\theta_i)\, \rho_i^{A} + \sin(\theta_i)\, \rho_i^{H}\big]$   (12)

where $\theta_i$ is the sympathy angular phase determining the cooperation-to-sympathy ratio. Parameters $\rho_i^{A}$ and $\rho_i^{H}$ denote the total utility of the other autonomous and human-driven vehicles, respectively, as perceived from the $i$-th agent's perspective. We expand on this topic in Section V-C, where we introduce the distributed reward structure.
V-C Decentralized Reward Structure
Following the notions of sympathy and cooperation and the notation of Eq. (12), we decompose the decentralized reward received by agent $i$ as,

$R_i(s, a) = \cos(\phi_i)\, r_i^{e} + \sin(\phi_i)\big[\cos(\theta_i)\, R_i^{c} + \sin(\theta_i)\, R_i^{s}\big] + R_i^{m}$   (13)

in which $\phi_i$ and $\theta_i$ are the SVO and sympathy angles of Eq. (12). The term $r_i^{e}$ denotes the ego vehicle's driving performance, derived from metrics such as distance traveled, average speed, and a negative cost for changes in acceleration to promote a smooth and efficient movement by the vehicle. The cooperative reward term $R_i^{c}$ accounts for the utility of the ego's allies. It is important to note that the ego vehicle only requires the observation $o_i$ to compute $R_i^{c}$, and not any explicit coordination or knowledge of the actions of the other agents. The sympathetic reward term $R_i^{s}$ is defined as
$R_i^{s} = \sum_{j \in \tilde{\mathcal{H}}_i} \frac{\lambda_1\, u_j}{1 + \lambda_2\, d_{i,j}}$   (14)

where $u_j$ denotes an HV's utility, e.g., its speed, $d_{i,j}$ is the distance between the observer autonomous agent and the $j$-th HV, and $\lambda_1$ and $\lambda_2$ are dimensionless coefficients. Moreover, the sparse scenario-specific mission reward term $R_i^{m}$ in the case of our driving scenario represents the success or failure of the merging maneuver,
$R_i^{m} = \begin{cases} +\eta_m & \text{if the mission vehicle merges successfully} \\ -\eta_m & \text{if the merging maneuver fails} \\ 0 & \text{otherwise} \end{cases}$   (15)
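To make the structure of Eq. (13) concrete, the sketch below combines the four reward components given the two angles. The individual term definitions (ego performance, cooperative, sympathetic, mission) are placeholders supplied by the caller, and the function name is ours.

```python
# Sketch of the decentralized reward of Eq. (13); the weighting structure is what is illustrated.
import math


def decentralized_reward(ego_perf: float,
                         cooperative: float,
                         sympathetic: float,
                         mission: float,
                         phi: float,
                         theta: float) -> float:
    """Combine the reward components using the SVO angle phi and the sympathy angle theta."""
    altruistic = math.cos(theta) * cooperative + math.sin(theta) * sympathetic
    return math.cos(phi) * ego_perf + math.sin(phi) * altruistic + mission
```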
V-D Deep MARL for Sympathetic and Cooperative Driving
Two cascaded multi-layer perceptron (MLP) networks are utilized as the feature extractor network (FEN) and the function approximator network (FAN), with two layers of 256 and 128 neurons, respectively, and rectified linear unit (ReLU) non-linearities. As introduced in Section V-A, the temporal information in a vehicle's observations is captured by integrating the history of past actions into the observations, and the feature extractor network must be able to efficiently extract meaningful patterns from this information. Both networks are trained end-to-end to force the feature extractor network to extract the most vital information required for estimating the state-action value function. The policy is trained offline and deployed to all agents to be executed in a distributed and online fashion, meaning that each agent makes independent decisions based on its observation, but they all follow the same stochastic policy.
As we elaborated in Section III, the non-stationarity of the environment is a major problem in the concurrent training of multiple RL agents. We employ a semi-sequential training and policy dissemination algorithm to cope with this challenge and stabilize the training process. Algorithm 1 summarizes our overall methodology, which proceeds in two stages. First, an experience replay buffer (ERB) is filled with data from simulation episodes, and then random samples drawn from this buffer are used to update the weights of both the FEN and FAN networks. For simplicity, we refer to the set of all weights of both neural networks as w. We use a novel method for scoring the entries in the ERB and drawing them with a probability proportional to that score.
The ERB is highly skewed due to the nature of our highway merging scenario. To elaborate, each episode can be morphologically broken down into two parts: straight driving on the highway and the merging point. The former mostly provides information and training samples that are useful for learning the basics of driving, while the latter contains the important information regarding inter-agent coordination and altruistic behavior, which is our focus. Only a few time steps of each episode contain the merging point and the rest is mostly related to highway cruising. To balance the training data drawn from the experience replay, we randomly draw samples with a probability proportional to a score based on their spatial distance from the merging point. This method showed better performance when compared to the common approach of prioritizing the experience replay based on a sample's most recent reward.
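The following sketch shows score-proportional sampling from the replay buffer. The scoring function is left to the caller; score_fn is a hypothetical argument, and the example weighting in the comment is only illustrative.

```python
# Sketch of score-proportional replay sampling.
import numpy as np


def sample_indices(distances_to_merge, batch_size, score_fn):
    """Draw replay indices with probability proportional to score_fn(distance)."""
    scores = score_fn(np.asarray(distances_to_merge, dtype=np.float64))
    probs = scores / scores.sum()
    return np.random.choice(len(probs), size=batch_size, p=probs)

# Example: emphasize transitions recorded near the merging point.
# batch_idx = sample_indices(dists, batch_size=32, score_fn=lambda d: 1.0 / (1.0 + d))
```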
After drawing a training sample from the ERB, an agent performs a fixed number of training iterations while the weights of all other agents are frozen. The updated weights are then disseminated to the other agents to update their policies. This process is repeated for all agents until convergence. Doing so enables us to stabilize the training and train all agents concurrently. The key idea is to apply incremental updates and keep the environment stationary in-between the updates so that the optimizer can converge. This semi-sequential algorithm is illustrated in Figure 4 and Algorithm 1.
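The sketch below summarizes this semi-sequential loop at a high level. The agent objects and their methods (update, get_weights, load_weights) are hypothetical placeholders, not the paper's code.

```python
# High-level sketch of semi-sequential training and policy dissemination (Algorithm 1).
def semi_sequential_training(agents, replay_buffer, n_rounds, k_iters):
    for _ in range(n_rounds):
        for agent in agents:
            # Freeze everyone else so the environment stays (approximately)
            # stationary while this agent updates its weights.
            for other in agents:
                other.frozen = (other is not agent)
            for _ in range(k_iters):
                batch = replay_buffer.sample()
                agent.update(batch)  # gradient step on the FEN + FAN weights w
            # Disseminate the new weights so all agents keep executing one shared policy.
            for other in agents:
                other.load_weights(agent.get_weights())
```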
VI Implementation Details
We start this section by describing the 2D micro-traffic simulator we employ to generate simulation episodes and by formulating the human driver model that imitates the behavior of an HV in mixed-autonomy environments. Practical details of training and validation are discussed before presenting our results in the next section.
VI-A Driving Simulator
We modified an OpenAI Gym environment [38] to enable multi-agent training and distributed execution in a mixed-autonomy highway merging scenario. The meta-actions determined by the stochastic policy are translated to low-level steering and acceleration control signals through a closed-loop proportional–integral–derivative (PID) controller. Motion of the vehicles is then governed by a Kinematic Bicycle Model that determines the vehicles’ yaw rate and acceleration. As a common practice in robotics, road segments and the motion of the agents are expressed in Frenet-Serret coordinates and broken into lateral and longitudinal movements.
In order to ensure that the function approximator network learns generalizable policies rather than memorizing a sequence of actions, the initial state of each simulation episode is randomized. This episode initialization is particularly critical, as the resulting initial states must still be meaningful and valid for our desired conflictive highway merging scenario. Trivial episodes in which the merging vehicle can easily merge into the highway regardless of the AVs' actions, or episodes in which the AVs do not have an opportunity to enable safe merging, not only add no valuable information to the training process but can also lead to misleading measures. The initial longitude and speed of the cruising vehicles are uniformly randomized, and the initial longitude and speed of the merging vehicle are drawn from a clipped-Gaussian distribution defined as,
$f(z) \propto \mathcal{N}(z; \mu, \sigma^{2})\, H(z - z_{\min})\, H(z_{\max} - z)$   (16)

where $\mathcal{N}$ denotes a Gaussian distribution and $H(\cdot)$ is the Heaviside step function, applied separately to the merging vehicle's initial longitude and initial speed. We elaborate on initializing episodes via the parameters $(\mu, \sigma)$ and the clipping bounds $[z_{\min}, z_{\max}]$ in Section VII-E.
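A simple way to realize Eq. (16) is rejection sampling of a normal draw until it lands inside the allowed interval, as sketched below. The numeric values in the usage lines are illustrative assumptions, not the paper's initialization parameters.

```python
# Sketch of sampling the merging vehicle's initial state from a clipped Gaussian, Eq. (16).
import numpy as np


def clipped_gaussian(mu: float, sigma: float, lo: float, hi: float,
                     rng: np.random.Generator) -> float:
    while True:
        z = rng.normal(mu, sigma)
        if lo <= z <= hi:  # Heaviside clipping of Eq. (16)
            return z


rng = np.random.default_rng(0)
# Illustrative numbers only:
init_longitude = clipped_gaussian(mu=90.0, sigma=20.0, lo=60.0, hi=120.0, rng=rng)
init_speed = clipped_gaussian(mu=20.0, sigma=3.0, lo=15.0, hi=25.0, rng=rng)
```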
VI-B Human Driver Model
Lateral and longitudinal movements of HVs are mimicked by the human driver models proposed by Treiber et al. and Kesting et al. [39, 40]. The lateral actions of HVs, i.e., the decision to perform a lane change, follow the Minimizing Overall Braking Induced by Lane changes (MOBIL) strategy [40]. The MOBIL model allows a lane change only if the resulting acceleration meets the safety criterion and the incentive criterion is also satisfied,

$\tilde{a}_c - a_c + p\big[(\tilde{a}_n - a_n) + (\tilde{a}_o - a_o)\big] > \Delta a_{th}$   (17)

with $a_c$, $a_n$, and $a_o$ being the accelerations of the ego HV, the following vehicle in the target lane, and the following vehicle in the current lane, respectively, and $\tilde{a}_c$, $\tilde{a}_n$, and $\tilde{a}_o$ being the corresponding accelerations assuming the ego HV has performed the lane change. $\Delta a_{th}$ is the threshold that determines whether the ego HV shall perform the lane change. The HV's SVO angle is also referred to as the politeness factor $p$ in the literature and is extracted from the empirical probability distribution illustrated in Figure 2.
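The incentive test of Eq. (17) can be written as a small predicate, sketched below; the safety check on the new follower's braking is omitted for brevity, and the argument names are ours.

```python
# Sketch of the MOBIL incentive criterion of Eq. (17).
def mobil_incentive(a_ego, a_ego_new,
                    a_new_follower, a_new_follower_new,
                    a_old_follower, a_old_follower_new,
                    politeness: float, threshold: float) -> bool:
    """Return True if the politeness-weighted gain of a lane change exceeds the threshold."""
    ego_gain = a_ego_new - a_ego
    others_change = (a_new_follower_new - a_new_follower) + (a_old_follower_new - a_old_follower)
    return ego_gain + politeness * others_change > threshold
```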
The longitudinal acceleration of HVs follows the Intelligent Driver Model (IDM) [39]. The longitudinal Frenet acceleration of an HV, $\dot{v}$, is determined by

$\dot{v} = a_{\max}\Big[ 1 - \big(\tfrac{v}{v_0}\big)^{\delta} - \big(\tfrac{d^{*}(v, \Delta v)}{d}\big)^{2} \Big]$   (18)

where $v$ denotes the longitudinal Frenet speed of the HV, $d$ is the gap to the leading vehicle, and the desired Frenet distance to the leading vehicle is controlled by $d^{*}$, defined as,

$d^{*}(v, \Delta v) = d_0 + v\,T + \frac{v\,\Delta v}{2\sqrt{a_{\max}\, b}}$   (19)

in which $\Delta v$ is the approach rate, and the model parameters $v_0$, $T$, $d_0$, $a_{\max}$, and $b$ are the set speed, set time gap, minimum gap distance, maximum acceleration, and desired deceleration, respectively. Additionally, the acceleration of the vehicle is modeled as a random variable defined as,
$\tilde{\dot{v}}^{(k)} = \dot{v}^{(k)} + \sigma_k\, x$   (20)

with $x$ being a standard Gaussian random variable and $\sigma_k$ the standard deviation of the velocity noise at the $k$-th time step of the simulation.
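The IDM acceleration of Eqs. (18)-(19) can be computed as in the sketch below; the default parameter values are illustrative numbers commonly used with IDM, not the paper's settings.

```python
# Sketch of the IDM longitudinal acceleration, Eqs. (18)-(19); defaults are illustrative.
import math


def idm_acceleration(v, gap, approach_rate,
                     v0=30.0, T=1.5, d0=2.0, a_max=1.5, b=3.0, delta=4.0):
    """Return the IDM acceleration given speed v, gap to the leader, and approach rate."""
    d_star = d0 + v * T + v * approach_rate / (2.0 * math.sqrt(a_max * b))
    return a_max * (1.0 - (v / v0) ** delta - (d_star / gap) ** 2)
```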

VI-C Training and Validation
The autonomous agents are trained using the semi-sequential multi-agent Q-learning algorithm introduced in Figure 4 and Algorithm 1 for 15,000 episodes generated by the procedure discussed in Section VI-A. The training process is repeated and compared across multiple runs to ensure that training is stable and converges to similar policies every time. The trained policies are then evaluated on 2,000 randomized novel test episodes to gauge their efficacy. Test episodes are intentionally generated with a different and broader initialization range than the training episodes to demonstrate that the agents indeed learn generalizable policies and do not merely memorize sequences of actions.
VII Experimental Results
We break down the research questions of our interest into experimental hypotheses and investigate them through our experiments and ablation studies in this section.

VII-A Manipulated Variables
The two key variables in Eq. (13) are $\phi_i$ and $\theta_i$, which determine the level of altruism, the general term we use for altruism toward both HVs and AVs, as well as the level of sympathy, the term for altruism toward HVs only. Our experiments are done in 26 settings with different values of $\phi_i$ and $\theta_i$. Furthermore, we experiment with both an autonomous and a human-driven mission vehicle. Our experiment settings are:
• HV+E. Autonomous agents are egoistic ($\phi_i = 0$), and the mission vehicle is an HV.
• HV+C. Autonomous agents are cooperative only ($\phi_i > 0$, $\theta_i = 0$), and the mission vehicle is an HV.
• HV+SC. Autonomous agents are sympathetic and cooperative ($\phi_i > 0$, $\theta_i > 0$), and the merging vehicle is an HV.
• AV+E/C/SC. Duals of the above cases with an autonomous mission vehicle.
In the HV+SC and AV+SC scenarios, where autonomous agents have both sympathy and cooperation components, we set the sympathy angle to $\theta_i = \pi/4$ for the sake of fairness and to avoid imposing a bias between HVs and AVs, as they both carry humans or goods and neither should have a pre-assumed advantage over the other. The SVO angle is, however, tuned to reach the optimal level of altruism; we elaborate on this topic in Section VII-D and derive the optimal SVO angle $\phi^{*}$.
VII-B Performance Measures
To gauge the impact of the aforementioned manipulated variables and other configurable parameters, three metrics are chosen that, despite being correlated with each other, provide different insights into the efficacy of our solution. As a traffic-level metric, the average distance traveled by HVs and AVs is logged during the simulation episodes. Additionally, counting the percentage of episodes that experience a successful merge enables us to probe the overall social impact of a solution. Safety is gauged by counting the percentage of episodes that contain at least one crash.
VII-C Hypotheses
The social and individual performance of altruistic and purely egoistic agents are compared through three key hypotheses:
• H1. While egoistic AVs fail to account for a merging HV, AVs that hold both sympathy and cooperation components explore ways to enable safe and seamless merging. Therefore, we expect HV+SC to outperform the HV+E and HV+C settings.
• H2. Altruistic AVs ($\phi_i > 0$) are able to implicitly learn the SVO of HVs and guide them to improve the overall performance of the group.
• H3. There exists a social value orientation angle for autonomous agents that can both lessen the number of crashes and improve the number of successful merges.

VII-D Analysis and Results
Examining H1. The main claim of hypothesis H1 is the superiority of sympathetic cooperative AVs in creating socially optimal results when compared to egoistic AVs. To better understand the situation, we reiterate the driving scenario: the merging vehicle, which can be either human-driven or autonomous, approaches a highway with a mixed group of HVs and AVs. It requires the cruising vehicles' assistance in order to be able to merge safely. Per our fundamental assumption, we do not rely on the HVs to compromise on their own utility, as their SVO is unknown. Instead, it is up to the AVs to create a safe corridor for the merging vehicle and, as we will show in Section VII-E, this goal cannot be achieved by a single AV alone and necessarily requires a cooperative action by the group of AVs.

Figure 5 illustrates an overall comparison between the settings defined in Section VII-A. Focusing on the cases with a human-driven merging vehicle, it is evident that in the absence of the sympathy component in AVs, i.e., in the HV+E and HV+C settings, merging fails in the majority of episodes. A failed merge leads to a crash in our simulator, as vehicles cannot stop on the highway or the merging ramp, and a merging vehicle that fails to merge collides with the barrier at the end of the merging ramp. This assumption is made to render our simulations more realistic and avoid infeasible solutions that require a full stop on the highway. Therefore, most of the crash cases shown in Figure 5 are due to unsuccessful merging and not a lack of basic driving skills in HVs and AVs. As additional evidence, independent crashes that are not related to a failed merge are also plotted in Figure 5, which confirms that the vehicles hold sufficient basic skills to maneuver on a highway and avoid collisions.
Figures 5 and 6 clarify the positive social impact that sympathy and cooperation make in terms of reducing the total number of crashes and failed merges. However, a counter-argument against this comparison is that a rather conservative model is used to mimic HVs in our simulations, which might limit their capability to merge. To investigate this claim, we repeat the comparison with an autonomous mission vehicle that is more risk-tolerant and attempts more creative ways to merge into the highway. In the AV+E setting, in which AVs only care about their individual utility, although the results are better compared to HV+E, even the autonomous mission vehicle still fails to merge safely in more than a third of the episodes. We conclude that our test case indeed creates a competitive and conflictive scene for the vehicles and showcases how incorporating the sympathy and cooperation components into the reward structure of AVs leads to socially-desirable outcomes and improves safety and traffic flow. Figure 6 provides further intuition by depicting a sampled set of the mission vehicle's trajectories in different experiment settings. It is evident that un-sympathetic AVs do not allow the mission vehicle to merge, causing its trajectory to end on the merging ramp.
Examining H2. Figure 7 illustrates an example of autonomous agents trained with the sympathetic cooperative reward and a higher-capacity neural network architecture. Although all AVs in this scenario work together to make the merging possible, we focus on the most impactful agent, the “Guide AV” shown in orange. The other AVs in this sample scenario (shown in green) compromise on their individual reward by accelerating, consuming more energy, and thus receiving less reward as defined in Section V-C. Interestingly, the Guide AV learns to first slow down and then change lanes to the left to open up space for the merging vehicle. After the mission vehicle successfully merges, the Guide AV finds its lane blocked by an HV, so it makes another lane change to the right and follows the other AVs. Figure 7 demonstrates how the AVs receive a significant reward when the mission vehicle merges into the highway. Although the reward structure defined in Section V-C contains multiple parameters, the mission reward term of Eq. (15) has an order-of-magnitude larger impact and thus is the dominating reward signal in training our autonomous agents. In other words, the trained agents learn to take sequences of actions that lead to receiving the mission reward. This learning process includes learning to avoid collisions, navigating through traffic, and, if required, affecting the behavior of other HVs.
As emphasized before, the autonomous agents do not have access to an explicit behavior model of human drivers and instead implicitly learn this model from experience during the training episodes. Although we employ a rather conservative model of human drivers to showcase our concept, we expect that, given sufficient training data, the autonomous agents can extract models of more complex human behaviors as well. However, the sensitivity of our solution to these models and the effect of human behaviors on inter-agent coordination is a topic worthy of investigation, which we leave for future work. As a relevant observation, AVs implicitly learn to predict the behavior of HVs and the fact that HVs commonly act egoistically (refer to Figure 2) and do not slow down for the merging vehicle. Hence, they do not rely on the HVs and instead compromise on their individual reward to enable the highway merging.
Examining H3.

Our experimental scenarios in Section VII-A are defined based on the optimal SVO angle of the autonomous agents. This parameter clearly has an important impact on the behavior of AVs and thus on the safety and traffic-flow metrics. We trained a large set of agents with different SVO angles and tested them in our case study driving scenario. The optimal SVO angle is then defined as the angle that results in the best performance metrics, i.e., the fewest episodes with collisions and failed merges. We formulate this simple optimization objective as a convex combination of the two metrics,

$\phi^{*} = \arg\min_{\phi}\ \big[\, \kappa\, P_{c}(\phi) + (1 - \kappa)\, P_{f}(\phi) \,\big]$   (21)

where $P_{c}$ and $P_{f}$ are the percentages of episodes with a crash and a failed mission, respectively. The hyper-parameter $\kappa$ determines the importance of each performance metric, and we choose $\kappa = 0.5$, as otherwise it could bias the training process by putting more emphasis on either of the metrics. Figure 8 illustrates how the two metrics change when the autonomous agents' SVO is varied from $\phi = 0$ (purely egoistic) towards $\phi = \pi/2$ (purely altruistic). It is worth mentioning that neither of the two extremes is optimal; a point between caring about others and being selfish leads to the most socially-desirable outcome.
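The sketch below shows how an angle can be selected under the objective of Eq. (21): evaluate trained policies over a grid of angles and keep the angle minimizing the convex combination of the crash rate and failed-merge rate. The evaluation function is a placeholder for the paper's test-episode procedure.

```python
# Sketch of selecting the SVO angle by the objective of Eq. (21).
import numpy as np


def optimal_svo(angles, evaluate, kappa: float = 0.5) -> float:
    """evaluate(phi) -> (crash_rate, fail_rate), both in [0, 1]."""
    scores = []
    for phi in angles:
        crash_rate, fail_rate = evaluate(phi)
        scores.append(kappa * crash_rate + (1.0 - kappa) * fail_rate)
    return angles[int(np.argmin(scores))]

# e.g., phi_star = optimal_svo(np.linspace(0.0, np.pi / 2, 10), evaluate_policy)
```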
Table I: Single altruistic AV (HV+1SC) vs. the multi-agent setting (HV+SC).
| Setting | Mission Failed | Crashed | Distance Traveled |
| --- | --- | --- | --- |
| Single-agent (HV+1SC) | | | |
| Multi-agent (HV+SC) | | | |
Table II: Ablation of the observation-space representation.
| | Mission Failed | Crashed |
| --- | --- | --- |
| Adding Autonomy Flag | | |
| — Without | | |
| — With | | |
| Including Mission Vehicle | | |
| — Without | | |
| — With | | |
A fair critique of the behavior of the sympathetic cooperative agents is that the Guide AV, i.e., AV3 in Figure 3, decelerates and therefore slows down the group of vehicles behind it only to allow the mission vehicle to merge. In other words, the utility of a large group of vehicles is compromised for the sake of the mission vehicle. To investigate the fairness and effectiveness of this outcome, we measure the average distance traveled by HVs and AVs. Figure 9 reveals that although in the HV+SC setting a group of vehicles needs to slow down to open up space for the mission vehicle, eventually both HVs and AVs manage to travel a greater distance when compared to a similar setup with egoistic agents (HV+E). It should be noted that the effect of the Guide AV's deceleration propagates gradually through the platoon of vehicles behind it and only affects a limited group of vehicles, as the traffic in the platoon is not rigid and can contract and expand.

VII-E Ablation Studies
Necessity of Multi-agent Coordination. Consider the highway merging scenario of Figure 3. Our claim is that all AVs need to work together to enable a safe and seamless merge, and none of them can achieve this goal if the others do not cooperate. As elaborated in Section VI-A, we particularly design our scenarios to gauge the effectiveness of altruistic agents and inter-agent coordination. To complement our results in Figure 5 that back hypothesis H1, we conducted an ablation study in the driving scenario of Figure 3 with the difference that only a single AV, the Guide AV, is sympathetic cooperative; we label this scenario as HV+1SC. Table I demonstrates the necessity of multi-agent coordination and the fact that a single sympathetic cooperative AV, i.e., the Guide AV, is not able to achieve the mission of safe and seamless merging without help from the other AVs.
Designing Non-trivial and Fair Scenarios. Our method for initializing simulation episodes is described in Eq. (16). The distribution parameters and clipping bounds in Eq. (16) determine the range of allowed values for the merging vehicle's initial longitude and speed. Trivial episodes that are too easy, i.e., always lead to a successful merge, or too challenging, i.e., never result in a successful merge, can steer the training process in the wrong direction and must be avoided when initializing the episodes. Furthermore, the initial state of an episode can benefit different agents with various SVOs, and thus one may argue that the superior performance of sympathetic cooperative agents observed in Figures 5 and 6 is an artifact of the episode initialization. We draw the initial values from a region that does not favor either of the social preferences. Two sets of parameters, one for the initial longitude and one for the initial speed of the merging vehicle, are chosen as listed in Table III. Figure 10 illustrates the intuition behind choosing these values.

Observation-space Representation. We discussed the details of how information is embedded into an agent's observation in Section V-A. Here we justify these design choices and show their positive impact on performance. Table II shows the impact of including the mission vehicle in Eq. (9) as well as the autonomy flag of Eq. (10). Figure 11 summarizes the effect of the features integrated in Eq. (10), the history horizon $\tau$, and the type of action encoding. We also experimented with sorting the rows of the observation matrix in Eq. (9) based on vehicle ID and on the vehicles' longitude, as shown in Figure 11.
| Parameter | Value |
| --- | --- |
| Batch size | |
| Initial exploration | |
| Final exploration | |
| Exploration decay | |
| Optimizer | |

VIII Concluding Remarks
Summary. Autonomous vehicles need to learn to co-exist with human-driven vehicles on the same road infrastructure. Deploying egoistic AVs that solely account for their individual interests leads to sub-optimal and undesirable social outcomes. In contrast, we compute the optimal SVO angle that optimizes the traffic metrics and demonstrate how altruistic AVs with the corresponding SVO can be trained to optimize a decentralized social utility that improves traffic flow, safety, and efficiency. We propose practical solutions to mitigate the non-stationarity problem in simultaneous multi-agent training and implicitly learn the behavior of human drivers from experience. Our experiments reveal that altruistic AVs are able to form alliances and affect the behavior of HVs in order to create socially-desirable outcomes that benefit the group of vehicles.
Limitations and Future Work. While this paper captures the fundamentals of social coordination and altruism in autonomous driving, many tangential aspects of the problem can be further studied. For example, we employed a conservative and limited model of human drivers. Although we expect our solution to be effective with other human behavior models as well, it is important to study its performance under different human behaviors. Also, the impact of communication imperfections and packet drops on the inter-agent coordination can be further investigated using more complex communication models than those presented in this work. On the implementation side, more advanced neural architectures such as convolutional and recurrent networks can be leveraged to capture spatial and temporal information more effectively, a direction that we plan to explore in our future work.
References
- [1] A. Alahi, K. Goel, V. Ramanathan, A. Robicquet, L. Fei-Fei, and S. Savarese, “Social lstm: Human trajectory prediction in crowded spaces,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 961–971.
- [2] D. Sadigh, S. Sastry, S. A. Seshia, and A. D. Dragan, “Planning for autonomous cars that leverage effects on human actions.” in Robotics: Science and Systems, vol. 2. Ann Arbor, MI, USA, 2016.
- [3] D. Sadigh, “Influencing interactions between human drivers and autonomous vehicles,” in Frontiers of Engineering: Reports on Leading-Edge Engineering from the 2019 Symposium. National Academies Press, 2020.
- [4] W. Schwarting, A. Pierson, J. Alonso-Mora, S. Karaman, and D. Rus, “Social behavior for autonomous vehicles,” Proceedings of the National Academy of Sciences, vol. 116, no. 50, pp. 24 972–24 978, 2019.
- [5] B. Toghi, R. Valiente, D. Sadigh, R. Pedarsani, and Y. P. Fallah, “Altruistic maneuver planning for cooperative autonomous vehicles using multi-agent advantage actor-critic,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2021.
- [6] ——, “Cooperative autonomous vehicles that sympathize with human drivers,” in 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2021.
- [7] L. Matignon, G. J. Laurent, and N. Le Fort-Piat, “Independent reinforcement learners in cooperative markov games: a survey regarding coordination problems.” Knowledge Engineering Review, vol. 27, no. 1, pp. 1–31, 2012.
- [8] J. Foerster, N. Nardelli, G. Farquhar, T. Afouras, P. H. Torr, P. Kohli, and S. Whiteson, “Stabilising experience replay for deep multi-agent reinforcement learning,” in International conference on machine learning. PMLR, 2017, pp. 1146–1155.
- [9] A. Xie, D. Losey, R. Tolsma, C. Finn, and D. Sadigh, “Learning latent representations to influence multi-agent interaction,” in Proceedings of the 4th Conference on Robot Learning (CoRL), November 2020.
- [10] A. Shih, A. Sawhney, J. Kondic, S. Ermon, and D. Sadigh, “On the critical role of conventions in adaptive human-ai collaboration,” in 9th International Conference on Learning Representations (ICLR), 2021.
- [11] J. Foerster, G. Farquhar, T. Afouras, N. Nardelli, and S. Whiteson, “Counterfactual multi-agent policy gradients,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32, no. 1, 2018.
- [12] J. K. Gupta, M. Egorov, and M. Kochenderfer, “Cooperative multi-agent control using deep reinforcement learning,” in International Conference on Autonomous Agents and Multiagent Systems. Springer, 2017, pp. 66–83.
- [13] W. Z. Wang, M. Beliaev, E. Biyik, D. A. Lazar, R. Pedarsani, and D. Sadigh, “Emergent prosociality in multi-agent games through gifting,” in 30th International Joint Conference on Artificial Intelligence (IJCAI), 2021.
- [14] S. Omidshafiei, J. Pazis, C. Amato, J. P. How, and J. Vian, “Deep decentralized multi-task multi-agent reinforcement learning under partial observability,” in International Conference on Machine Learning. PMLR, 2017, pp. 2681–2690.
- [15] M. Lauer and M. Riedmiller, “An algorithm for distributed reinforcement learning in cooperative multi-agent systems,” in In Proceedings of the Seventeenth International Conference on Machine Learning. Citeseer, 2000.
- [16] M. Kuderer, S. Gulati, and W. Burgard, “Learning driving styles for autonomous vehicles from demonstration,” in 2015 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2015, pp. 2641–2646.
- [17] H. N. Mahjoub, B. Toghi, and Y. P. Fallah, “A stochastic hybrid framework for driver behavior modeling based on hierarchical dirichlet process,” in 2018 IEEE 88th Vehicular Technology Conference (VTC-Fall), 2018, pp. 1–5.
- [18] A. Kuefler, J. Morton, T. Wheeler, and M. Kochenderfer, “Imitating driver behavior with generative adversarial networks,” in 2017 IEEE Intelligent Vehicles Symposium (IV). IEEE, 2017, pp. 204–211.
- [19] E. Schmerling, K. Leung, W. Vollprecht, and M. Pavone, “Multimodal probabilistic model-based planning for human-robot interaction,” in 2018 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2018, pp. 3399–3406.
- [20] B. Toghi, D. Grover, M. Razzaghpour, R. Jain, R. Valiente, M. Zaman, G. Shah, and Y. P. Fallah, “A maneuver-based urban driving dataset and model for cooperative vehicle applications,” 2020.
- [21] Y. F. Chen, M. Everett, M. Liu, and J. P. How, “Socially aware motion planning with deep reinforcement learning,” in 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2017, pp. 1343–1350.
- [22] N. Tsoi, M. Hussein, J. Espinoza, X. Ruiz, and M. Vázquez, “Sean: Social environment for autonomous navigation,” in Proceedings of the 8th International Conference on Human-Agent Interaction, 2020, pp. 281–283.
- [23] P. Trautman and A. Krause, “Unfreezing the robot: Navigation in dense, interacting crowds,” in 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2010, pp. 797–803.
- [24] S. Nikolaidis, R. Ramakrishnan, K. Gu, and J. Shah, “Efficient model learning from joint-action demonstrations for human-robot collaborative tasks,” in 2015 10th ACM/IEEE International Conference on Human-Robot Interaction (HRI). IEEE, 2015, pp. 189–196.
- [25] D. A. Lazar, E. Bıyık, D. Sadigh, and R. Pedarsani, “Learning how to dynamically route autonomous vehicles on shared roads,” arXiv preprint arXiv:1909.03664, 2019.
- [26] C. Wu, A. M. Bayen, and A. Mehta, “Stabilizing traffic with autonomous vehicles,” in 2018 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2018, pp. 6012–6018.
- [27] C. Wu, A. Kreidieh, E. Vinitsky, and A. M. Bayen, “Emergent behaviors in mixed-autonomy traffic,” in Conference on Robot Learning. PMLR, 2017, pp. 398–407.
- [28] E. Vinitsky, A. Kreidieh, L. Le Flem, N. Kheterpal, K. Jang, C. Wu, F. Wu, R. Liaw, E. Liang, and A. M. Bayen, “Benchmarks for reinforcement learning in mixed-autonomy traffic,” in Conference on robot learning. PMLR, 2018, pp. 399–409.
- [29] E. Bıyık, D. Lazar, R. Pedarsani, and D. Sadigh, “Altruistic autonomy: Beating congestion on shared roads,” arXiv preprint arXiv:1810.11978, 2018.
- [30] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller, “Playing atari with deep reinforcement learning,” arXiv preprint arXiv:1312.5602, 2013.
- [31] E. Emad Marvasti, A. Raftari, Y. P. Fallah, R. Guo, and H. Lu, “Feature sharing and integration for cooperative cognition and perception with volumetric sensors,” arXiv e-prints, pp. arXiv–2011, 2020.
- [32] R. Valiente, M. Zaman, S. Ozer, and Y. P. Fallah, “Controlling steering angle for cooperative self-driving vehicles utilizing cnn and lstm-based deep networks,” in 2019 IEEE Intelligent Vehicles Symposium (IV). IEEE, 2019, pp. 2423–2428.
- [33] B. Toghi, M. Saifuddin, H. N. Mahjoub, M. Mughal, Y. P. Fallah, J. Rao, and S. Das, “Multiple access in cellular v2x: Performance analysis in highly congested vehicular networks,” in 2018 IEEE Vehicular Networking Conference (VNC). IEEE, 2018, pp. 1–8.
- [34] W. B. Liebrand and C. G. McClintock, “The ring measure of social values: A computerized procedure for assessing individual differences in information processing and social value orientation,” European journal of personality, vol. 2, no. 3, pp. 217–230, 1988.
- [35] A. Garapin, L. Muller, and B. Rahali, “Does trust mean giving and not risking? experimental evidence from the trust game,” Revue d’économie politique, vol. 125, no. 5, pp. 701–716, 2015.
- [36] R. O. Murphy and K. A. Ackermann, “Social preferences, positive expectations, and trust based cooperation,” Journal of Mathematical Psychology, vol. 67, pp. 45–50, 2015.
- [37] D. Silver, S. Singh, D. Precup, and R. S. Sutton, “Reward is enough,” Artificial Intelligence, p. 103535, 2021.
- [38] E. Leurent, Y. Blanco, D. Efimov, and O.-A. Maillard, “Approximate robust control of uncertain dynamical systems,” arXiv preprint arXiv:1903.00220, 2019.
- [39] M. Treiber, A. Hennecke, and D. Helbing, “Congested traffic states in empirical observations and microscopic simulations,” Physical review E, vol. 62, no. 2, p. 1805, 2000.
- [40] A. Kesting, M. Treiber, and D. Helbing, “General lane-changing model mobil for car-following models,” Transportation Research Record, vol. 1999, no. 1, pp. 86–94, 2007.
Behrad Toghi is a Ph.D. candidate at the University of Central Florida. He received the B.Sc. degree in electrical engineering from Sharif University of Technology in 2016 and has worked as a research intern at Mercedes-Benz R&D North America and Ford Motor Company R&D between 2018 and 2020. His work is at the intersection of artificial intelligence and cooperative networked systems with a focus on autonomous driving.

Rodolfo Valiente is a Ph.D. candidate in Computer Engineering at the University of Central Florida. His research interests include connected autonomous vehicles (CAVs), reinforcement learning, computer vision, and deep learning with a focus on the autonomous driving problem. He received an M.Sc. degree from the University of Sao Paulo (USP) in 2017 and his B.Sc. degree from the Technological University Jose Antonio Echeverria in 2014.

Dorsa Sadigh is an Assistant Professor in the CS and EE departments at Stanford University. Her research interests lie at the intersection of robotics, learning, and control theory. Specifically, she is interested in developing algorithms for safe and adaptive human-robot interaction. Dorsa received her doctoral degree in Electrical Engineering and Computer Sciences (EECS) from UC Berkeley in 2017 and her B.Sc. in EECS from UC Berkeley in 2012.

Ramtin Pedarsani is an Assistant Professor in the ECE Department at the University of California, Santa Barbara. He received the B.Sc. degree in electrical engineering from the University of Tehran in 2009, the M.Sc. degree in communication systems from the Swiss Federal Institute of Technology (EPFL) in 2011, and his Ph.D. from the University of California, Berkeley, in 2015. His research interests include networks, game theory, machine learning, and transportation systems.

Yaser P. Fallah is an Associate Professor in the ECE Department at the University of Central Florida. He received the Ph.D. degree from the University of British Columbia, Vancouver, BC, Canada, in 2007. From 2008 to 2011, he was a Research Scientist with the Institute of Transportation Studies, University of California Berkeley, Berkeley, CA, USA. His research, sponsored by industry, USDOT, and NSF, is focused on intelligent transportation systems and automated and networked vehicle safety systems.