Towards Crossing the Reality Gap with Evolved Plastic Neurocontrollers

Huanneng Qiu The University of New South Wales CanberraNorthcott Drive, CampbellCanberraACTAustralia2600 [email protected] , Matthew Garratt The University of New South Wales CanberraNorthcott Drive, CampbellCanberraACTAustralia2600 [email protected] , David Howard 1234-5678-9012 Robotics and Autonomous Systems Group, CSIRO1 Technology Court, PullenvaleBrisbaneQLDAustralia4069 [email protected] and Sreenatha Anavatti The University of New South Wales CanberraNorthcott Drive, CampbellCanberraACTAustralia2600 [email protected]

(2020)

Abstract.

A critical issue in evolutionary robotics is the transfer of controllers learned in simulation to reality. This is especially the case for small Unmanned Aerial Vehicles (UAVs), as the platforms are highly dynamic and susceptible to breakage. Previous approaches often require simulation models with a high level of accuracy, otherwise significant errors may arise when the well-designed controller is being deployed onto the targeted platform. Here we try to overcome the transfer problem from a different perspective, by designing a spiking neurocontroller which uses synaptic plasticity to cross the reality gap via online adaptation. Through a set of experiments we show that the evolved plastic spiking controller can maintain its functionality by self-adapting to model changes that take place after evolutionary training, and consequently exhibit better performance than its non-plastic counterpart.

evolutionary robotics, spiking neural networks, Hebbian plasticity, neuroevolution, UAV control

^†^†journalyear: 2020^†^†copyright: acmlicensed^†^†conference: Genetic and Evolutionary Computation Conference; July 8–12, 2020; Cancún, Mexico^†^†booktitle: Genetic and Evolutionary Computation Conference (GECCO ’20), July 8–12, 2020, Cancún, Mexico^†^†price: 15.00^†^†doi: 10.1145/3377930.3389843^†^†isbn: 978-1-4503-7128-5/20/07^†^†ccs: Computer systems organization Evolutionary robotics^†^†ccs: Computing methodologies Neural networks^†^†ccs: Computing methodologies Evolutionary robotics^†^†ccs: Applied computing Biological networks

1. Introduction

Unmanned Aerial Vehicles (UAVs) are challenging platforms for developing and testing advanced control techniques, because they are highly dynamic, with strong couplings between different subsystems (Ng et al., 2004). Controller design for these agile platforms is naturally difficult, as a poorly-performing controller can lead to catastrophic consequences, e.g., the UAV crashing. In addition, many learning approaches require large numbers of fitness evaluations. Therefore, there still exist a large group of aerial robotic studies relying on simulations as an intermediate step to develop control algorithms (Kendoul, 2012).

When simulating, it is not uncommon to derive UAV models mathematically from first principles (Pounds et al., 2010; Alaimo et al., 2013). However, such models are ill-suited to capturing every aspect of the system dynamics, because some of them cannot easily be modeled analytically, e.g., actuator kinematic nonlinearities, servo dynamics, etc (Garratt and Anavatti, 2012). Ignoring these effects can significantly deteriorate the performance of the designed controller when being deployed onto the targeted platform. To address this issue, a common practice is to develop control algorithms based on an ‘identified’ model that is a simulated representation of the real plant. This identified model is obtained by applying a data-driven process called system identification that models the exact dynamics from the measured plant’s input and output data. Such implementations have been successful amongst previous research (Ng et al., 2004; Ng et al., 2006; Garratt and Anavatti, 2012; Kendoul, 2012; Hoffer et al., 2014).

While a lot of works have pursued a perfect model that well characterizes UAV platforms, a key issue is that loss of performance is still likely to happen when transferring the well-designed (in simulation) controller onto the real platform that has somewhat different dynamics – the well-known reality gap. In this work we demonstrate a novel approach to compensate the gap across different platform representations, which works specifically with Spiking Neural Networks (SNNs) that exhibit online adaptation ability through Hebbian plasticity (Gerstner and Kistler, 2002). We propose an evolutionary learning strategy for SNNs, which includes topology and weight evolution as per NEAT (Stanley and Miikkulainen, 2002), and integration of biological plastic learning mechanisms. With the goal of simulation-to-reality transfer, we here prove the concept in a time-efficient manner by transferring from a simpler to a more complex model, a transfer that encapsulates some issues inherent in crossing the reality gap, i.e. incomplete capture of true flight dynamics, oversimplification of true conditions.

In this work, we focus on the development of UAV height control. Our approach to resolve this problem is threefold. First, explicit mathematical modeling of the aircraft is not required. Instead, a simplified linear model is identified based on the measurement of the plant’s input and output data. In reality, such models are fast to run and simple to develop. Second, neuroevolution takes place as usual to search through solution space for the construction of high-performance networks. Finally, Hebbian plasticity is implemented by leveraging evolutionary algorithms to optimize plastic rule coefficients that describe how neural connections are updated. Plasticity evolution has been used in conventional ANNs (Urzelai and Floreano, 2001; Soltoggio et al., 2007; Tonelli and Mouret, 2011) and SNNs (Howard et al., 2012), where evolution takes place in the rules that govern synaptic self-organization instead of in the synapses themselves. The evolved controller is able to exhibit online adaptation due to plasticity, which allows successful transfer to a more realistic model and indicates that transfer to reality would be similarly successful.

Organization of the rest of this paper is as follows. Section 2 introduces our SNN package that is utilized to develop our UAV controller, including descriptions of spiking neuron models, the mechanism of plasticity learning and evolutionary learning strategies. Section 3 presents the plant model to be controlled in this work. Section 4, 5 and 6 describe the controller development process in detail. Results and analysis are given in Section 7. Finally, discussions and conclusions are presented in Section 8 and 9.

2. eSpinn: Learning with Spikes

2.1. Background

The current widely used Artificial Neural Networks (ANNs) follow a computation cycle of multiply-accumulate-activate. The neuron model consists of two components: a weighted sum of inputs and an activation function generating the output accordingly. Both the inputs and outputs of these neurons are real-valued. While ANN models have shown exceptional performance in the artificial intelligence domain, they are highly abstracted from their biological counterparts in terms of information representation, transmission and computation paradigms.

SNNs, on the other hand, carry out computation based on biological modeling of neurons and synaptic interactions, and have been of great interest in the computational intelligence community in recent decades. Applications have been both non-behavioral (Abbott et al., 2016) and behavioral (Vasu and Izquierdo, 2017; Qiu et al., 2018). Information transmission in SNNs is by means of discrete spikes generated during a potential integration process. Such spatiotemporal dynamics are able to yield more powerful computation compared with non-spiking neural systems (Maass, 1997). Moreover, neuromorphic hardware implementations of SNNs are believed to provide fast and low-power information processing due to their event-driven sparsity (Bouvier et al., 2019), which perfectly suits embedded applications such as UAVs.

As shown in Fig. 1, spikes are fired at certain points in time, whenever the membrane potential of a neuron exceeds its threshold. They will travel through synapses from the presynaptic neurons and arrive at all forward-connected postsynaptic neurons. The information measured by spikes is in form of timing and frequency, rather than the amplitude or intensity.

Refer to caption — Figure 1. Illustration of spike transmission in SNNs. Membrane potential $v$ accumulates as input spikes arrive and decays with time. Whenever it reaches a given threshold $\theta$ , an output spike will be fired, and the potential will be reset to a resting value.

In order to assist the process of designing our spiking controller, we have developed the eSpinn software package. The eSpinn library stands for Evolving Spiking Neural Networks. It is designed to develop controller learning strategies for nonlinear control models by integrating biological learning mechanisms with neuroevolution algorithms. It is able to accommodate different network implementations (ANNs, SNNs and hybrid models) with specific dataflow schemes. eSpinn is written in C++ and has abundant interfaces to easily archive data through serialization. It also contains scripts for data visualization and integration with MATLAB and Simulink simulations.

2.2. Neuron Model

To date, there have been different kinds of spiking neuron models. When implementing a neuron model, trade-offs must be considered between biological reality and computational efficiency. In this work we use the two-dimensional Izhikevich model (Izhikevich, 2003), because of its capability of exhibiting richness and complexity in neuron firing behavior (detailed in (Izhikevich, 2003)) with only two ordinary differential equations:

(1)		$\displaystyle\dot{v}$	$\displaystyle=0.04v^{2}+5v+140-u+I$
(1)		$\displaystyle\dot{u}$	$\displaystyle=a(bv-u)$

with after-spike resetting following:

(2)

\text{if }v\geq v_{t}\text{, then}\left\{\begin{array}[]{l}v=c\\ u=u+d\end{array}\right.

Here $v$ represents the membrane potential of the neuron; $u$ represents a recovery variable; $\dot{v}$ and $\dot{u}$ denote their derivatives, respectively. $I$ represents the synaptic current that is injected into the neuron. Whenever $v$ exceeds the threshold of membrane potential $v_{t}$ , a spike will be fired and $v$ and $u$ will be reset following Eq. (2). $a,b,c$ and $d$ are dimensionless coefficients that are tunable to form different firing patterns (Izhikevich, 2003). The membrane potential response of an Izhikevich neuron is given in Fig. 2, with an injected current signal.

A spike train is defined as a temporal sequence of firing times:

(3)

s(t)=\sum_{f}\delta(t-t^{(f)})

where $\delta(t)$ is the Dirac $\delta$ function; $t^{(f)}$ represents the firing time, i.e., the moment of $v$ crossing threshold $v_{t}$ from below.

2.3. Network Structure

We use a three-layer architecture that has hidden-layer recurrent connections, illustrated in Fig. 3. The input layer consists of encoding neurons which act as information converters. Hidden-layer spiking neurons are connected via unidirectional weighted synapses among themselves. Such internal recurrence ensures a history of recent inputs is preserved within the network, which exhibits highly nonlinear functionality. Output neurons can be configured as either activation-based or spiking. In this work a linear unit is used to obtain real-value outputs from a weighted sum of outputs from hidden-layer neurons. A bias neuron that has a constant output value is able to connect to any neurons in the hidden and output layers. Connection weights are bounded within [-1, 1]. The NEAT topology and weight evolution scheme is used to form and update network connections and consequently to seek functional network compositions.

In a rate coding scheme, neuron output is defined as the spike train frequency calculated within a given time window. Loss of precision during this process is likely to happen. eSpinn configures a decoding method with high accuracy to derive continuous outputs from discrete spike trains. The implementation involves direct transfer of intermediate membrane potentials as well as decoding of spikes in a rate-based manner.

Figure 3. Spiking network topology that allows internal recurrence. Network inputs (

i

) consist of position error in z-axis

e_{z}

and vertical velocity

v_{z}

. Hidden layer neurons (

h

) are spiking, whose outputs

o_{i}

involve direct transfer of intermediate membrane potential and decoding of firing rate. A bias neuron (

b

) is allowed to connect to any neurons in the hidden and output layer. Output thrust command

T

is calculated based on a weighted sum of incoming neuron activations

\sum w_{i}o_{i}

, which will be fed to the hexacopter plant model. Weights

w_{i}

are bounded within [-1, 1].

2.4. Hebbian Plasticity

In neuroscience, studies have shown that synaptic strength in biological neural systems is not fixed but changes over time (Kandel and Hawkins, 1992) – connections between pre- and postsynaptic neurons are updated according to their degree of causality, which involves changes of synaptic weights or even formation/removal of synapses. This phenomenon is often referred to as Hebbian plasticity as inspired by the Hebb’s postulate (Hebb, 1949).In our work, plastic behaviors are determined by leveraging evolutionary algorithms to optimize plastic rule coefficients, such that each connection is able to develop its own plastic rule.

Modern Hebbian rules generally describe weight change $\Delta w$ as a function of the joint activity of pre- and postsynaptic neurons:

(4)

\Delta w=f(w_{ij},u_{j},u_{i})

where $w_{ij}$ represents the weight of the connection from neuron $j$ to neuron $i$ ; $u_{j}$ and $u_{i}$ represent the firing activity of $j$ and $i$ , respectively.

In a spike-based scheme, we consider the synaptic plasticity at the level of individual spikes. This has led to a phenomenological temporal Hebbian paradigm: Spiking-Timing Dependent Plasticity (STDP) (Gerstner and Kistler, 2002), which modulates synaptic weights between neurons based on the temporal difference of spikes.

While different STDP variants have been proposed (Izhikevich and Desai, 2003), the basic principle of STDP is that the change of weight is driven by the causal correlations between the pre- and postsynaptic spikes. Weight change would be more significant when the two spikes fire closer together in time. The standard STDP learning window is formulated as:

(5)

W(\Delta t)=\left\{\begin{array}[]{ll}A_{+}e^{-\frac{\Delta t}{\tau_{+}}}&\Delta t>0,\\ A_{-}e^{\frac{\Delta t}{\tau_{-}}}&\Delta t<0.\end{array}\right.

where $A_{+}$ and $A_{-}$ are scaling constants of strength of potentiation and depression; $\tau_{+}$ and $\tau_{-}$ represent the time decay constants; $\Delta t$ is the time difference between pre- and post-synaptic firing timings:

(6)

\Delta t=t_{post}-t_{pre}

In eSpinn we have introduced a rate-based Hebbian model derived from the nearest neighbor STDP implementation (Izhikevich and Desai, 2003), with two additional evolvable parameters:

(7)

\dot{w}=u_{i}(\frac{A_{+}}{\tau_{+}^{-1}+u_{i}}+\frac{k_{m}(u_{j}-u_{i}+k_{c})+A_{-}}{\tau_{-}^{-1}+u_{i}})

where $k_{m}$ is a magnitude term that determines the amplitude of weight changes, and $k_{c}$ is a correlation term that determines the correlation between pre- and postsynaptic firing activity. These factors are set to as evolvable so that the best values can be autonomously located. Fig. 4 shows the resulting Hebbian learning curve. The connection weight has a stable converging equilibrium at $u_{\theta}$ , which is due to the correlation term $k_{c}$ . This equilibrium corresponds to a balance of the pre- and postsynaptic firing.

2.5. Learning in Neuroevolution

While gradient methods have been very successful in training traditional MLPs (Demuth et al., 2014), their implementations on SNNs are not as straightforward because they require the availability of gradient information. Instead, eSpinn has developed its own version of a popular neuroevolution approach – NEAT (Stanley and Miikkulainen, 2002), which can accommodate different network implementations and integrate with Hebbian plasticity, as the method to learn the best network controller.

NEAT is a popular neuroevolution algorithm that involves network topology and weight evolution. It enables an incremental network topological growth to discover the (near) minimal effective network structure.

The basis of NEAT is the use of historical markings, which are essentially gene IDs. They are used as a measurement of the genetic similarity of network topology, based on which, genomes are clustered into different species. Then NEAT uses an explicit fitness sharing scheme (Eiben et al., 2015) to preserve network diversities. Meanwhile, these markings are also used to line up genes from variant topologies and allow crossover of divergent genomes in a rationale manner.

eSpinn keeps a global list of innovations (e.g., structural variations), so that when an innovation occurs, we can know whether it has already existed. This mechanism will ensure networks with the same topology will have the exactly same innovation numbers, which is essential during the process of network structural growth.

3. System Modeling

The experimental platform is a commercial hexacopter, Tarot 680 Pro, fitted with a Pixhawk 2 autopilot system. To assist the development and tests of our control paradigms, we have developed a Simulink model based on our previous work (Santoso et al., 2017). The model is derived from first principles, which contains 6-DOF rigid body dynamics and non-linear aerodynamics. Many aspects of the hexacopter dynamics are modeled with C/C++ S-functions, which describe the functionalities of Simulink blocks in C/C++ with MATLAB built-in APIs.

The simulation system is based on a hierarchical architecture. The top-level diagram of the system is given in Fig. 5. The ‘Control Mixing’ block combines controller commands from the ‘Attitude Controller’, ‘Yaw Controller’ and ‘Height Controller’ to calculate appropriate rotor speed commands using a linear mixing matrix.

In the ‘Forces & Moments’ block we take the rotor speeds and calculate the thrust and torque of each rotor based on the relative airflow through the blades. Then the yawing torque will be obtained by simply summing up the torque of each rotor. Rolling and pitching torques can also be calculated by multiplying the thrust of each rotor with corresponding torque arms. Meanwhile, we have also introduced a drag term on the fuselage caused by aircraft climb/descent, of which the direction is opposite to the vector sum of aircraft velocity. The collective thrust would be equal to the sum of thrust of each rotor combined with the drag effect.

Afterwards, the thrust and torques are fed to the ‘Hexacopter Dynamics’ block. Assuming the UAV is a rigid body, Newton’s second law of motion is used to calculate the linear and angular accelerations and hence the state of the drone will be updated. To convert the local velocities of the UAV to the earth-based coordinate we will need a rotation matrix, which is parameterized in terms of quaternion to avoid singularities caused by reciprocating trigonometric functions (gimbal lock).

Finally, closed-loop simulations have been tested to validate the functionality of the Simulink model. Tuned PID controllers that display fast response and low steady output error are used in both the inner and outer loops as a challenging benchmark.

4. Problem Description

In this work, we are aiming to develop an SNN controller for height control of a hexacopter without explicit modeling of the UAV. Hebbian plasticity that is evolved offline enables online adaptation to cross the gap between the identified model and the targeted plant.

The controller takes some known states of the plant model (i.e., error in z-axis between the desired and current position as well as the vertical velocity) and learns to generate a functional action selection policy. The output is a thrust command that will be fed into the plant so that its status can be updated.

Our approach to resolve the problem is threefold. First, system identification is carried out to construct a heave model to loosely approximate the dynamics of the hexacopter. Then neuroevolution is used to search for functional SNN controllers to control the identified heave model. Network topology and initial weight configurations are determined. Finally, the fittest controller is selected for further evolution. Hebbian plasticity is activated so that the network is able to adapt connection weights according to local neural activations. An EA is used to determine the best plasticity rules by evolving the two parameters $k_{m}$ and $k_{c}$ in Eq. 7. Each connection can develop its own plasticity rule. The above-mentioned processes will be offline and only involve the identified model, and the dynamics of the hexacopter are unknown to the controller.

On completion of training, the champion network with the best plasticity rules will be deployed to drive the hexacopter model, which is a more true-to-life representation of the real plant.

5. Identification of Heave Model

We first build a loose approximation to resemble the heave dynamics of the hexacopter. Essentially, this is to model the relationship between the vertical velocity $v_{z}$ , collective thrust $T$ and the vertical acceleration $a_{z}$ . Fig. 6 shows the nonlinear response of vertical acceleration with varying thrust command when the vertical speed is set as -3 m/s to 3 m/s. Note here that the acceleration is actually the net effect of z-axis force acting on the body, which are generated from the rotor thrust, vertical drag caused by rotor downwash and fuselage. The net acceleration $a_{n}$ would be $a_{z}$ plus the gravitational acceleration $g$ .

In our identified model, vertical acceleration $a_{z}$ is approximated as a linear combination of the thrust command $T$ and vertical speed $v_{z}$ . $v_{z}$ , on the other hand, is obtained by integrating the net acceleration of z-axis $a_{n}$ :

(8)	$\displaystyle a_{z}$	$\displaystyle=k_{T}T+k_{v}v_{z}+b$
	$\displaystyle a_{n}$	$\displaystyle=a_{z}+g$
	$\displaystyle v_{z}$	$\displaystyle=\int a_{n}$

where $k_{T}$ and $k_{v}$ are configurable coefficients; $b$ is a bias that is also tunable to make sure that the linear function will be expanded at the point where the net acceleration equals zero, i.e., $a_{z}=-g$ .

We take two of the acceleration curves from Fig. 6 (i.e., for $v_{z}$ = 0 m/s and $v_{z}$ = 1 m/s) to model the linear function. The resulting identified linear model is given in Fig. 7. $k_{T}$ is identified as the slope of $a_{z}$ against $T$ when $v_{z}$ = 0 at the point where $a_{n}=0$ . $k_{v}$ is then calculated from the vertical distance between the two nonlinear curves. Finally, $b$ is set to shift the linear curve vertically, so that the identified model will be tangent with the hexacopter curve at the point where $a_{n}=0$ .

Finally, the same random thrust command is fed to the two different models for validation of functional similarity. System response of the two models are given in Fig. 8. Clearly the response of the identified model differs from the hexacopter model, which is desired, but still the identified model approximates the original system.

6. Controller Development and Deployment

6.1. Evolution of Non-plastic Controllers

With the identified model we have developed according to Eq. 8, we begin to search for optimal network compositions by evolving SNNs using our NEAT implementation. By ‘optimal,’ we mean the SNN controller is defined to be able to drive the plant model to follow a reference signal with minimal error in height during the course of flight. Each simulation (flight) lasts 80 s and is updated every 0.02 s.

To speed up the evolution process, the whole simulation in this part is implemented in C++, with our eSpinn package. At the beginning, a population of non-plastic networks are initialized and categorized into different species. These networks are feed-forward, fully-connected with random connection weights. The initial topology is 2-4-1 (input-hidden-output layer neurons), with an additional bias neuron that is connected to all hidden and output layer neurons. The two inputs consist of error of position in z-axis $e_{z}$ and vertical velocity $v_{z}$ , other than which, the system’s dynamics are unknown to the controller. Output of the controller is thrust command that will be fed to the plant model.

Encoding of sensing data is done by the encoding neurons in the input layer. Input data are first normalized within the range of [0,1], so that the standardized signal can be linearly converted into a current value (i.e., $I$ in Eq. 1). This so-called ‘current coding’ method is a common practice to provide a notional scale to the input metrics.

After initialization, each network will be iterated one by one to be evaluated against the plant model. A fitness value will be assigned to each of them based on their performance. Afterwards, these networks will be ranked within their species according to their fitness values in descending order. A newer generation will be formed from the best parent networks using NEAT: only the top 20% of parents in each species are allowed to reproduce, after which, the previous generation is discarded and the newly created children will form the next generation. During evolution, hidden layer neurons will increase with a probability of 0.005, connections will be added with a probability of 0.01. Connection weights will be bounded within [-1, 1].

The program terminates when the population’s best fitness has been stagnant for 12 generations or if the evolution has reached 50 generations¹¹1empirically determined. During the simulation, outputs of the champion will be saved to files for later visualization. The best fitness will also be saved. Upon completion of simulation, data structure of the whole population will be archived to a text file, which can be retrieved to be constructed in our later work.

6.2. Searching Solutions

Note the control system to be solved is a Constraint Problem (Michalewicz and Schoenauer, 1996), because the height of the UAV must be bounded within some certain range in the real world. However, constraint handling is not straightforward in NEAT – invalid solutions that violate the system’s boundary can be generated, even if their parents satisfy these constraints. Therefore, in this paper we use the feasibility-first principle (Michalewicz and Schoenauer, 1996) to handle the constraints.

We divide the potential solution space into two disjoint regions, the feasible region and the infeasible, by whether the hexacopter is staying in the bounded area during the entire simulation. For infeasible candidates, a penalized fitness function is introduced so that their fitness values are guaranteed to be smaller than those feasible.

We define the fitness function of feasible solutions based on the mean normalized absolute error during the simulation:

(9)

f=1-\bar{\lvert e_{n}\rvert}

where $\lvert e_{n}\rvert$ denotes the normalized absolute error between actual and reference position. Since the error is normalized, desired solution will have a fitness value close to 1.

For infeasible solutions, we define the fitness based on the time that the hexacopter stays in the bounded region:

(10)

f=k(t_{i}/t_{t})

where $t_{i}$ is the steps that the hexacopter successively stays in the bounded region, and $t_{t}$ is the total amount of steps the entire simulation has. Penalty is applied using a scalar $k$ of 0.2.

6.3. Enabling Plasticity

Once the above step is done to discover the optimal network topology, we proceed to consider the plasticity rules. The champion network from the previous step is loaded from file, with the Hebbian rule activated. It is spawned into a NEAT population, where each network connection has randomly initialized Hebbian parameters (i.e., $k_{m}$ and $k_{c}$ in Eq. 7).

Networks are evaluated as previously stated. The best parents will be selected to reproduce. During this step, all evolution is disabled except for that of the plasticity rules, e.g. the EA is only used to determine the optimal configuration of the plasticity rules.

Upon completion of the previous steps, the final network controller is obtained and ready for deployment. To construct the controller in the Simulink hexacopter model, it is implemented as a C++ S-function block.

7. Results and Analysis

10 runs of the controller development process have been conducted to perform statistical analysis. Data are recorded to files and analyzed offline with MATLAB.

7.1. Adaptation in Progress

Table 1 shows the fitness changes of the best controller during the course. From left to right are non-plastic networks controlling the identified model, plastic networks controlling the identified model and plastic networks controlling the hexacopter model, respectively. The fitness values are averaged among the 10 runs.

Table 1. Best Networks’ Mean Fitness Values in Progress

	Non-plastic on id’d model	Plastic on id’d model	Plastic on hexa model
Fitness	0.9189	0.9349	0.9298

As stated in 6.1, evolution would be terminated if the performance does not improve for 12 consecutive generations before the threshold of 50. For non-plastic controllers, only one of the 10 runs has reached the threshold, and its fitness has only increased by 0.0034 in the last 15 generations. This indicates the evolutionary runs of non-plastic controllers have plateaued and further evolution is unlikely to find better solutions. On the other hand, when plasticity is enabled, an increase in fitness can be clearly observed when controlling the same identified model. The plastic controllers demonstrate better performance even when transferred to control the hexacopter model that has different dynamics.

7.2. Plastic vs. Non-plastic

Table 2. Fitness of Non-Plastic vs. Plastic Controllers on the Hexacopter Model

Fitness	Non-plastic	Plastic
Run 1	0.9188	0.9350
Run 2	0.9074	0.9271
Run 3	0.9261	0.9396
Run 4	0.9280	0.9465
Run 5	0.9053	0.9162
Run 6	0.9046	0.9166
Run 7	0.9174	0.9338
Run 8	0.9188	0.9256
Run 9	0.9219	0.9366
Run 10	0.9210	0.9207
Mean	0.9169	0.9298

A second comparison is conducted between non-plastic and plastic controllers on the hexacopter model. Results are given in Table 2. For 9 out of the 10 runs, we can see a performance improvement when plasticity is enabled. The only one not being better, still has a close fitness value. Statistic difference is assessed using the two-tailed Mann-Whitney U-test between the two sets of data. The $U$ -value is 21, showing the plastic controllers are significantly better than the non-plastics at $p<0.05$ .

Fig. 9 shows a typical run using the non-plastic and plastic controller. We can see the plastic control system has a faster response as well as smaller steady error. It is clear that plasticity is a key component to bridge the gap between two models.

7.3. Validation of Plasticity

To verify the contribution of the proposed Hebbian plasticity, we extract the evolved best plastic rule and applied it to other networks that have sub-optimal performance. With plasticity enabled, a sub-optimal network is selected to repetitively drive the hexacopter model to follow the same reference signal. Fig. 10 shows the progress of 4 consecutive runs when a) plasticity is disabled; b-d) plasticity is enabled.

We can see that in Fig. a), there is a considerable steady system output error. When plasticity is turned on, connection weights begin to adjust themselves gradually. The system follows the reference signal with a decreasing steady error until around 0.005 m. Meanwhile a fitness increase is witnessed from a) 0.921296, b) 0.927559, c) 0.932286 to d) 0.933918.

The same results can be obtained when the rule is assigned to other near-optimal networks, while for those with poor initial performance, plasticity learns worse patterns. This analysis has justified our evolutionary approach to search for the optimal plastic function, demonstrating that plasticity narrows the reality gap for evolved spiking neurocontrollers.

7.4. Comparing with PID control

PID control is a classic linear control algorithm that has been dominant in engineering. The aforementioned PID height controller is taken for comparison. Note here the PID controller is designed directly based on the hexacopter model, whereas the SNN controller only relies on the identified model and utilizes Hebbian plasticity to adapt itself to the new plant model. System outputs of the two approaches is given in Fig. 11. Evidently our controller has smaller overshoot and steady output error. The PID controller has a mean absolute error of 0.108 m during the course of flight, while our plastic SNN controller has a value of 0.090 m.

8. Discussion

When transferring the pseudo-optimal controllers to physical-world applications, one may argue we can still rely on evolution to tweak the connection configurations. However, one main problem is that learning in evolution cannot be continuous because the fitness signal during the process is not immediately available. What we propose here is to evolve in advance the adaptive characteristics of the neurocontroller, such that the controller can be self-organizing and adaptive to model changes during the entire lifetime in real-time. There is no guarantee that any Hebbian rules can perform synaptic changes on the desired direction. That is why we use evolution to discover functional Hebbian rules in which synapses build up over time in a meaningful manner.

9. Conclusions

Our work has presented a solution to applied evolutionary aerial robotics, where evolution is used not only in network initial construction, but also to formulate plasticity rules which govern synaptic self-modulation for online adaptation based on local neural activities. We have shown that plasticity can make the controller more adaptive to model changes in a way that evolutionary approaches cannot accommodate. We are currently in the process of applying this controller development strategy to a real hexacopter platform, and expanding from height control to encompass all degrees of freedom in the UAV.

References

(1)
Abbott et al. (2016) L. F. Abbott, Brian DePasquale, and Raoul-Martin Memmesheimer. 2016. Building functional networks of spiking model neurons. Nature Neuroscience 19 (23 Feb 2016), 350 EP –. http://dx.doi.org/10.1038/nn.4241 Perspective.
Alaimo et al. (2013) A. Alaimo, V. Artale, C. Milazzo, A. Ricciardello, and L. Trefiletti. 2013. Mathematical modeling and control of a hexacopter. In 2013 International Conference on Unmanned Aircraft Systems (ICUAS). 1043–1050. https://doi.org/10.1109/ICUAS.2013.6564793
Bouvier et al. (2019) Maxence Bouvier, Alexandre Valentian, Thomas Mesquida, Francois Rummens, Marina Reyboz, Elisa Vianello, and Edith Beigne. 2019. Spiking Neural Networks Hardware Implementations and Challenges: A Survey. J. Emerg. Technol. Comput. Syst. 15, 2, Article 22 (April 2019), 35 pages. https://doi.org/10.1145/3304103
Demuth et al. (2014) Howard B. Demuth, Mark H. Beale, Orlando De Jess, and Martin T. Hagan. 2014. Neural Network Design (2nd ed.). Martin Hagan, USA. http://hagan.okstate.edu/NNDesign.pdf
Eiben et al. (2015) Agoston E Eiben, James E Smith, et al. 2015. Introduction to Evolutionary Computing (second ed.). Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-44874-8
Garratt and Anavatti (2012) Matthew Garratt and Sreenatha Anavatti. 2012. Non-linear Control of Heave for an Unmanned Helicopter Using a Neural Network. Journal of Intelligent & Robotic Systems 66, 4 (01 Jun 2012), 495–504. https://doi.org/10.1007/s10846-011-9634-9
Gerstner and Kistler (2002) Wulfram Gerstner and Werner Kistler. 2002. Spiking Neuron Models: An Introduction. Cambridge University Press, New York, NY, USA. http://icwww.epfl.ch/~gerstner/BUCH.html
Hebb (1949) Donald Olding Hebb. 1949. The organization of behavior: A neuropsychological theory. John Wiley & Sons. http://pubman.mpdl.mpg.de/pubman/item/escidoc:2346268/component/escidoc:2346267/Hebb_1949_The_Organization_of_Behavior.pdf
Hoffer et al. (2014) Nathan V. Hoffer, Calvin Coopmans, Austin M. Jensen, and YangQuan Chen. 2014. A Survey and Categorization of Small Low-Cost Unmanned Aerial Vehicle System Identification. Journal of Intelligent & Robotic Systems 74, 1 (01 Apr 2014), 129–145. https://doi.org/10.1007/s10846-013-9931-6
Howard et al. (2012) G. Howard, E. Gale, L. Bull, B. de Lacy Costello, and A. Adamatzky. 2012. Evolution of Plastic Learning in Spiking Networks via Memristive Connections. IEEE Transactions on Evolutionary Computation 16, 5 (Oct 2012), 711–729. https://doi.org/10.1109/TEVC.2011.2170199
Izhikevich (2003) E. M. Izhikevich. 2003. Simple model of spiking neurons. IEEE Transactions on Neural Networks 14, 6 (Nov 2003), 1569–1572. https://doi.org/10.1109/TNN.2003.820440
Izhikevich and Desai (2003) Eugene M. Izhikevich and Niraj S. Desai. 2003. Relating STDP to BCM. Neural Computation 15, 7 (2003), 1511–1523. https://doi.org/10.1162/089976603321891783
Kandel and Hawkins (1992) Eric R. Kandel and Robert D. Hawkins. 1992. The Biological Basis of Learning and Individuality. Scientific American 267, 3 (1992), 78–87. http://www.jstor.org/stable/24939215
Kendoul (2012) Farid Kendoul. 2012. Survey of advances in guidance, navigation, and control of unmanned rotorcraft systems. Journal of Field Robotics 29, 2 (2012), 315–378. https://doi.org/10.1002/rob.20414
Maass (1997) Wolfgang Maass. 1997. Networks of spiking neurons: The third generation of neural network models. Neural Networks 10, 9 (1997), 1659–1671. https://doi.org/10.1016/S0893-6080(97)00011-7
Michalewicz and Schoenauer (1996) Zbigniew Michalewicz and Marc Schoenauer. 1996. Evolutionary Algorithms for Constrained Parameter Optimization Problems. Evolutionary Computation 4, 1 (1996), 1–32. https://doi.org/10.1162/evco.1996.4.1.1
Ng et al. (2006) Andrew Y. Ng, Adam Coates, Mark Diel, Varun Ganapathi, Jamie Schulte, Ben Tse, Eric Berger, and Eric Liang. 2006. Autonomous Inverted Helicopter Flight via Reinforcement Learning. Springer Berlin Heidelberg, Berlin, Heidelberg, 363–372. https://doi.org/10.1007/11552246_35
Ng et al. (2004) Andrew Y. Ng, H. J. Kim, Michael I. Jordan, and Shankar Sastry. 2004. Autonomous Helicopter Flight via Reinforcement Learning. In Advances in Neural Information Processing Systems 16, S. Thrun, L. K. Saul, and P. B. Schölkopf (Eds.). MIT Press, 799–806. http://papers.nips.cc/paper/2455-autonomous-helicopter-flight-via-reinforcement-learning.pdf
Pounds et al. (2010) P. Pounds, R. Mahony, and P. Corke. 2010. Modelling and control of a large quadrotor robot. Control Engineering Practice 18, 7 (2010), 691 – 699. https://doi.org/10.1016/j.conengprac.2010.02.008 Special Issue on Aerial Robotics.
Qiu et al. (2018) H. Qiu, M. Garratt, D. Howard, and S. Anavatti. 2018. Evolving Spiking Neural Networks for Nonlinear Control Problems. In 2018 IEEE Symposium Series on Computational Intelligence (SSCI). 1367–1373. https://doi.org/10.1109/SSCI.2018.8628848
Santoso et al. (2017) F. Santoso, M. A. Garratt, and S. G. Anavatti. 2017. A self-learning TS-fuzzy system based on the C-means clustering technique for controlling the altitude of a hexacopter unmanned aerial vehicle. In 2017 International Conference on Advanced Mechatronics, Intelligent Manufacture, and Industrial Automation (ICAMIMIA). 46–51. https://doi.org/10.1109/ICAMIMIA.2017.8387555
Soltoggio et al. (2007) A. Soltoggio, P. Durr, C. Mattiussi, and D. Floreano. 2007. Evolving neuromodulatory topologies for reinforcement learning-like problems. In 2007 IEEE Congress on Evolutionary Computation. 2471–2478. https://doi.org/10.1109/CEC.2007.4424781
Stanley and Miikkulainen (2002) Kenneth O. Stanley and Risto Miikkulainen. 2002. Evolving Neural Networks Through Augmenting Topologies. Evolutionary Computation 10, 2 (2002), 99–127. http://nn.cs.utexas.edu/?stanley:ec02
Tonelli and Mouret (2011) Paul Tonelli and Jean-Baptiste Mouret. 2011. On the Relationships between Synaptic Plasticity and Generative Systems. In Proceedings of the 13th Annual Conference on Genetic and Evolutionary Computation (GECCO ’11). Association for Computing Machinery, New York, NY, USA, 1531–1538. https://doi.org/10.1145/2001576.2001782
Urzelai and Floreano (2001) Joseba Urzelai and Dario Floreano. 2001. Evolution of Adaptive Synapses: Robots with Fast Adaptive Behavior in New Environments. Evolutionary Computation 9, 4 (2001), 495–524. https://doi.org/10.1162/10636560152642887
Vasu and Izquierdo (2017) Madhavun Candadai Vasu and Eduardo J. Izquierdo. 2017. Evolution and Analysis of Embodied Spiking Neural Networks Reveals Task-specific Clusters of Effective Networks. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO ’17). ACM, New York, NY, USA, 75–82. https://doi.org/10.1145/3071178.3071336