
Event-Triggered Optimal Attitude Consensus
of Multiple Rigid Body Networks with
Unknown Dynamics

Xin Jin, Shuai Mao, Ljupco Kocarev, Chen Liang, Saiwei Wang,
and Yang Tang
This work was supported in part by the National Key Research and Development Program of China under Grant 2018YFC0809302, in part by the National Natural Science Foundation of China under Grant 61751305, in part by the Program of Shanghai Academic Research Leader under Grant 20XD1401300, and in part by the Programme of Introducing Talents of Discipline to Universities (the 111 Project) under Grant B17017. (Corresponding author: Yang Tang.) Xin Jin, Shuai Mao, Saiwei Wang, Chen Liang and Yang Tang are with the Key Laboratory of Smart Manufacturing in Energy Chemical Process, Ministry of Education, East China University of Science and Technology, Shanghai 200237, China (e-mail: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]). Ljupco Kocarev is with the Macedonian Academy of Sciences and Arts, 1000 Skopje, Macedonia, also with the Faculty of Computer Science and Engineering, Univerzitet “Sv. Kiril i Metodij,” 1000 Skopje, Macedonia, and also with the BioCircuits Institute, University of California at San Diego, La Jolla, CA 92093 USA (e-mail: [email protected]).
Abstract

In this paper, an event-triggered Reinforcement Learning (RL) method is proposed for the optimal attitude consensus of multiple rigid body networks with unknown dynamics. First, the consensus error is constructed through the attitude dynamics. According to the Bellman optimality principle, the implicit form of the optimal controller and the corresponding Hamilton-Jacobi-Bellman (HJB) equations are obtained. Owing to the augmented system, the optimal controller can be obtained directly without relying on the system dynamics. Second, a self-triggered mechanism is applied to reduce the computing and communication burden of updating the controller. To address the difficulty of solving the HJB equations analytically, an RL method is proposed that requires only measurement data at the event-triggered instants. For each agent, only one neural network is designed to approximate the optimal value function, and it is updated only at the event-triggered instants. Meanwhile, the closed-loop system is shown to be Uniformly Ultimately Bounded (UUB), and Zeno behavior is excluded. Finally, simulation results on a multiple rigid body network demonstrate the validity of the proposed method.

Index Terms:
Optimal attitude consensus, multiple rigid body networks, event-triggered control, reinforcement learning

I Introduction

Consensus control, as a fundamental coordination problem in multi-agent systems, aims to design a control protocol for each agent that drives the states of all agents to synchronization [1], [3, 5]. Over recent decades, the attitude consensus problem of multiple rigid body networks has received increasing attention [4] because it plays a significant role in many fields, such as formation control in three-dimensional space [6], [7], cooperation of multi-manipulators [8], and satellite networks [9]. Existing results can be classified into two categories: leaderless attitude consensus [10]-[12] and leader-follower attitude consensus [13]-[15]. Note that none of them considers the performance cost incurred in achieving attitude consensus.

In practical applications, the performance cost is a factor that must be considered, since it affects the efficiency of mission completion and the endurance of limited resources. Optimal attitude consensus control not only drives the attitudes of all rigid body systems to synchronization, but also minimizes the performance cost. In general, the optimal control problem can be transformed into solving the Hamilton-Jacobi-Bellman (HJB) equations. Nevertheless, it is very difficult to find analytic solutions to the HJB equations. With the popularity of reinforcement learning techniques [16, 17, 18] and the rapid growth of processor computing capacity, reinforcement learning based research on the optimal consensus problem has emerged. To the best of our knowledge, the vast majority of the systems studied are linear [19]-[21] or first-order nonlinear [22]. Among these results, knowledge of the system dynamics is required in [19], [22], while the methods in [20] and [21] circumvent the dependence on system dynamics. However, the algorithms in [20] and [21] require measurement data to be acquired in advance and involve many tedious integration operations, which obviously increases the computational burden of the system [46]. At present, relatively few results apply reinforcement learning to the optimal attitude consensus of multiple rigid body networks. In [23], a model-free algorithm is proposed for the optimal consensus of multiple rigid body networks, in which each rigid body is modeled by an Euler-Lagrange equation. However, an extra neural network-based observer is designed to estimate the system dynamics, which imposes additional computational burden. Motivated by these factors, we aim to design a method that needs only real-time measurement data to achieve the optimal attitude consensus of multiple rigid body networks with unknown dynamics.

Updating the controller and the neural networks at every sampling instant of a reinforcement learning method consumes considerable computing and communication resources, especially when the system scale is large. Therefore, it is necessary to integrate an event-triggered mechanism into the reinforcement learning method to reduce this consumption. In recent years, event-triggered control schemes have been widely studied to save control cost and energy resources [41, 42]. In [24]-[27], the event-triggered mechanism is introduced to solve the optimal control of an individual system. The optimal consensus of multi-agent systems is considered in [28]-[30] by using event-triggered reinforcement learning methods. However, the event-triggered conditions in all of the above methods [24]-[30] involve continuous information. Therefore, all agents need to obtain their own and their neighbors' state information in real time to determine whether the event-triggered condition is satisfied, which inevitably increases the communication load. Inspired by [31]-[33], we aim to design an event-triggered reinforcement learning method under a self-triggered mechanism, thereby greatly reducing the consumption of computing and communication resources. Compared with common linear systems and first-order nonlinear systems [19]-[22], it is challenging to combine the self-triggered mechanism with reinforcement learning to solve the optimal attitude consensus problem of multiple rigid body networks, since the dynamic model of a rigid body is a second-order system with state-coupled characteristics and the underlying attitude configuration space is non-Euclidean.

In this paper, we deal with the optimal attitude consensus problem for multiple rigid body networks with unknown system dynamics. A dynamic event-triggered mechanism is first introduced, which significantly reduces the computing resources consumed by controller updates. Based on the dynamic event-triggered condition, a sufficient self-triggered condition is proposed. Under the self-triggered mechanism, continuous communication between rigid bodies is avoided. Moreover, a reinforcement learning method is used to obtain the optimal policy. In detail, each rigid body needs only one neural network to approximate the optimal value function, thanks to the augmented system [43]. Each neural network is updated only when the self-triggered condition is violated. The main contributions are as follows:

1) By using only the measurement data at the event-triggered instants, we achieve the optimal attitude consensus of multiple rigid body networks with unknown system dynamics. Neither an additional actor neural network [20], [21] nor a neural network-based observer [23] is used in this paper, which obviously reduces the complexity of the algorithm implementation.

2) Compared with the results in [23] and [34], both a dynamic event-triggered condition and a self-triggered condition are integrated into the proposed reinforcement learning based method to solve the optimal attitude consensus of multiple rigid body networks. Under the self-triggered mechanism, the neural networks are updated only at the event-triggered instants. Meanwhile, continuous communication is also avoided. Therefore, the consumption of computing and communication resources is greatly reduced.

The rest of this paper is organized as follows. Section II introduces the notation and basics of graph theory. The model-free optimal attitude consensus problem is formulated in Section III, where the event-triggered mechanism is also introduced. An event-triggered reinforcement learning method is designed in Section IV. The method is verified through a simulation in Section V. Section VI concludes the paper.

II Preliminaries

II-A Notations

Throughout this paper, $\mathbb{R}$ denotes the set of all real numbers, $\mathbb{R}_{>0}$ the set of all positive real numbers, $\mathbb{N}$ the set of all non-negative integers, and $\mathbb{N}_{>0}$ the set of all positive integers, i.e., $\mathbb{R}=(-\infty,+\infty)$, $\mathbb{R}_{>0}=(0,+\infty)$, $\mathbb{N}=\{0,1,2,...\}$ and $\mathbb{N}_{>0}=\{1,2,...\}$. $x\in\mathbb{R}^{n}$ indicates an $n$-dimensional vector, $I_{n}$ the $n$-dimensional identity matrix, and $A\in\mathbb{R}^{n\times m}$ an $n\times m$ matrix. For a vector $x$, its Euclidean norm is defined as $\lVert x\rVert=\sqrt{x^{\top}x}$. For a square matrix $B=[b_{ij}]\in\mathbb{R}^{n\times n}$, its trace is defined as $\textrm{tr}(B)=\sum_{i=1}^{n}b_{ii}$ and its Frobenius norm as $\lVert B\rVert=\sqrt{\sum_{i=1}^{n}\sum_{j=1}^{n}|b_{ij}|^{2}}$. $\lambda_{\textrm{min}}(B)$ and $\lambda_{\textrm{max}}(B)$ denote the minimum and maximum eigenvalues of $B$, respectively. $B>0$ ($B\geq 0$) indicates that $B$ is positive (semi-positive) definite.

For any two vectors $\xi=[x_{1},y_{1},z_{1}]^{\top}\in\mathbb{R}^{3}$ and $\zeta=[x_{2},y_{2},z_{2}]^{\top}\in\mathbb{R}^{3}$, their cross product is expressed as follows:

$$\xi\times\zeta=\begin{bmatrix}y_{1}z_{2}-y_{2}z_{1}\\ x_{2}z_{1}-x_{1}z_{2}\\ x_{1}y_{2}-x_{2}y_{1}\end{bmatrix}\in\mathbb{R}^{3}.$$

II-B Graph Theory

Let $\mathcal{G}=(\mathcal{V},\mathcal{E})$ represent the directed communication graph among $N\in\mathbb{N}_{>0}$ rigid bodies, where $\mathcal{V}=\{1,2,...,N\}$ indicates the set of all rigid bodies and $\mathcal{E}\subseteq\mathcal{V}\times\mathcal{V}$ indicates the communication relationship between any two rigid bodies. For rigid body $i$, we use $\mathcal{N}_{i}=\{j\in\mathcal{V}\,|\,(j,i)\in\mathcal{E}\}$ to represent the set of its neighbors. If there is a directed path between any two rigid bodies, the communication graph is called strongly connected. In this paper, we suppose that all directed communication graphs are strongly connected.

In order to express the communication relationship between all rigid bodies more clearly, the weighted adjacency matrix $A=[a_{ij}]\in\mathbb{R}^{N\times N}$ is introduced. If rigid body $i$ can receive the data transmitted by rigid body $j$, then $a_{ij}>0$, and $a_{ij}=0$ otherwise. The in-degree matrix of the directed communication graph can be expressed as $D=\text{diag}\{d_{1},d_{2},...,d_{N}\}\in\mathbb{R}^{N\times N}$, where $d_{i}=\sum_{j\in\mathcal{N}_{i}}a_{ij}$. Let $\mathcal{L}=D-A=[l_{ij}]$ represent the Laplacian matrix, where $l_{ii}=d_{i}$ and $l_{ij}=-a_{ij}$ when $i\neq j$.
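To make the construction concrete, the graph quantities above translate directly into a few lines of code. The following is a minimal NumPy sketch; the three-node edge set is purely illustrative and is not the topology used in the paper's simulation:

```python
import numpy as np

# Weighted adjacency matrix: a_ij > 0 iff rigid body i receives data from j.
# The edge weights and topology here are illustrative only.
A = np.array([[0., 1., 0.],
              [0., 0., 1.],
              [1., 0., 0.]])
D = np.diag(A.sum(axis=1))   # in-degree matrix, d_i = sum_{j in N_i} a_ij
L = D - A                    # Laplacian: l_ii = d_i, l_ij = -a_ij for i != j
```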

III Model-Free Event-Triggered Optimal Attitude Consensus

III-A Model-Free Optimal Attitude Consensus

We consider a multiple rigid body network with $N$ nodes, where the attitude of each node is expressed by Modified Rodrigues Parameters (MRPs) [35]. For rigid body $i$, the attitude is represented by $\sigma_{i}=[\sigma_{i}^{1},\sigma_{i}^{2},\sigma_{i}^{3}]^{\top}=\Psi_{i}\tan\frac{\Phi_{i}}{4}\in\mathbb{R}^{3}$, where $\Psi_{i}\in\mathbb{R}^{3}$ indicates the Euler axis and $\Phi_{i}\in[0,\pi)$ denotes the rotation angle about the Euler axis.

Then, the attitude dynamics of each rigid body is given in the following form:

$$\dot{\sigma}_{i}=G(\sigma_{i})\omega_{i},\qquad(1a)$$
$$J_{i}\dot{\omega}_{i}=-\omega_{i}\times(J_{i}\omega_{i})+\tau_{i},\quad i=1,2,...,N,\qquad(1b)$$

where $\omega_{i}\in\mathbb{R}^{3}$, $J_{i}\in\mathbb{R}^{3\times 3}$ and $\tau_{i}\in\mathbb{R}^{3}$ indicate the angular velocity vector, the inertia matrix and the control input torque, respectively. The matrix $G(\sigma_{i})=\frac{1}{2}(\sigma_{i}^{\times}+\sigma_{i}\sigma_{i}^{\top}+\frac{1-\sigma_{i}^{\top}\sigma_{i}}{2}I_{3})\in\mathbb{R}^{3\times 3}$, where

$$\sigma_{i}^{\times}=\begin{bmatrix}0&-\sigma_{i}^{3}&\sigma_{i}^{2}\\ \sigma_{i}^{3}&0&-\sigma_{i}^{1}\\ -\sigma_{i}^{2}&\sigma_{i}^{1}&0\end{bmatrix}.$$
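As a quick illustration of the attitude kinematics, the matrices $\sigma_{i}^{\times}$ and $G(\sigma_{i})$ can be coded directly from their definitions. This is a minimal NumPy sketch; the function names are our own:

```python
import numpy as np

def skew(s):
    """Skew-symmetric cross-product matrix sigma^x defined above."""
    return np.array([[0.0, -s[2], s[1]],
                     [s[2], 0.0, -s[0]],
                     [-s[1], s[0], 0.0]])

def G(sigma):
    """Kinematics matrix G(sigma_i) appearing in Eq. (1a)."""
    return 0.5 * (skew(sigma) + np.outer(sigma, sigma)
                  + 0.5 * (1.0 - sigma @ sigma) * np.eye(3))

# Eq. (1a) then reads: sigma_dot = G(sigma) @ omega
```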

Definition 1: Given that the communication topology of a multiple rigid body network (1) is strongly connected, the attitude consensus is said to be achieved when the following conditions hold:

$$\lim_{t\to\infty}\big\lVert\sigma_{i}(t)-\sigma_{j}(t)\big\rVert=0,\qquad(2a)$$
$$\lim_{t\to\infty}\big\lVert\omega_{i}(t)-\omega_{j}(t)\big\rVert=0,\ \forall i,j\in\mathcal{V}.\qquad(2b)$$

Considering the communication topology among these $N$ rigid bodies, we can define the following consensus error for rigid body $i$:

$$\delta_{i}=\sum_{j\in\mathcal{N}_{i}}a_{ij}(\omega_{i}-\omega_{j})+\alpha_{i}\sum_{j\in\mathcal{N}_{i}}a_{ij}(\sigma_{i}-\sigma_{j}),\qquad(3)$$

where $\alpha_{i}\in\mathbb{R}_{>0}$. When $\lim_{t\to\infty}\delta_{i}=0$, $i=1,2,...,N$, we can easily obtain that $\sigma_{1}=\sigma_{2}=...=\sigma_{N}$ and $\omega_{1}=\omega_{2}=...=\omega_{N}$ as $t\to\infty$. That is to say, attitude consensus is achieved.
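A sketch of Eq. (3) in code may help fix the indexing. Here sigma and omega are assumed to be $N\times 3$ arrays of MRPs and angular velocities, A the weighted adjacency matrix, and alpha a vector of positive gains; all names are illustrative:

```python
import numpy as np

def consensus_error(i, sigma, omega, A, alpha):
    """Consensus error delta_i of Eq. (3) for rigid body i."""
    delta = np.zeros(3)
    for j in range(A.shape[0]):
        if A[i, j] > 0:  # j is a neighbor of i
            delta += A[i, j] * ((omega[i] - omega[j])
                                + alpha[i] * (sigma[i] - sigma[j]))
    return delta
```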

The dynamics of $\delta_{i}$ can be obtained by taking the derivative of Eq. (3):

$$\dot{\delta}_{i}=\sum_{j\in\mathcal{N}_{i}}a_{ij}(\dot{\omega}_{i}-\dot{\omega}_{j})+\alpha_{i}\sum_{j\in\mathcal{N}_{i}}a_{ij}(\dot{\sigma}_{i}-\dot{\sigma}_{j})=\varGamma_{i}(\delta_{i})+l_{ii}J_{i}^{-1}\tau_{i}-\sum_{j\in\mathcal{N}_{i}}a_{ij}J_{j}^{-1}\tau_{j},\qquad(4)$$

where $\varGamma_{i}(\delta_{i})=\alpha_{i}\sum_{j\in\mathcal{N}_{i}}a_{ij}(\dot{\sigma}_{i}-\dot{\sigma}_{j})+\sum_{j\in\mathcal{N}_{i}}a_{ij}\big[-J_{i}^{-1}\big((G^{-1}(\sigma_{i})\dot{\sigma}_{i})\times(J_{i}G^{-1}(\sigma_{i})\dot{\sigma}_{i})\big)+J_{j}^{-1}\big((G^{-1}(\sigma_{j})\dot{\sigma}_{j})\times(J_{j}G^{-1}(\sigma_{j})\dot{\sigma}_{j})\big)\big]$.

In order to overcome the dependence on model information, a compensator is introduced, which can be expressed by the following affine differential equation:

$$\dot{\tau}_{i}=f(\tau_{i})+l_{ii}g(\tau_{i})u_{i}-\sum_{j\in\mathcal{N}_{i}}a_{ij}g(\tau_{j})u_{j},\qquad(5)$$

where $f(\cdot):\mathbb{R}^{3}\to\mathbb{R}^{3}$ and $g(\cdot):\mathbb{R}^{3}\to\mathbb{R}^{3\times 3}$ are two functions to be designed later, and $u_{i}\in\mathbb{R}^{3}$ is the control input of the compensator. We need to choose appropriate functions $f(\cdot)$ and $g(\cdot)$ to ensure that the compensator is controllable. In this paper, a feasible pair of $f(\cdot)$ and $g(\cdot)$ is given as follows:

$$f(\tau_{i})=-2\tau_{i},\qquad(6a)$$
$$g(\tau_{i})=\text{diag}\{\cos^{2}(\tau_{i}^{1}),\cos^{2}(\tau_{i}^{2}),\cos^{2}(\tau_{i}^{3})\}.\qquad(6b)$$

By combining the consensus error $\delta_{i}$ and the control input torque $\tau_{i}$, we define an augmented consensus error $e_{i}=[\delta_{i}^{\top},\tau_{i}^{\top}]^{\top}\in\mathbb{R}^{6}$. According to Eq. (4) and Eq. (5), the dynamics of the augmented consensus error can be described by the following augmented system:

$$\dot{e}_{i}=X_{i}(e_{i})+l_{ii}Y_{i}(e_{i})u_{i}-\sum_{j\in\mathcal{N}_{i}}a_{ij}Y_{j}(e_{j})u_{j},\qquad(7)$$

where $X_{i}(e_{i})$ and $Y_{i}(e_{i})$ are represented as follows:

$$X_{i}(e_{i})=\begin{bmatrix}\varGamma_{i}(\delta_{i})+l_{ii}J_{i}^{-1}\tau_{i}-\sum_{j\in\mathcal{N}_{i}}a_{ij}J_{j}^{-1}\tau_{j}\\ f(\tau_{i})\end{bmatrix}\in\mathbb{R}^{6},$$
$$Y_{i}(e_{i})=\begin{bmatrix}0\\ g(\tau_{i})\end{bmatrix}\in\mathbb{R}^{6\times 3}.$$
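A compact sketch of the augmented dynamics (5)-(7), under the feasible pair (6), is given below. Gamma_i stands for the drift term of Eq. (4) and is supplied by the caller; J_inv, tau and u are per-agent lists of inertia inverses, torques and compensator inputs (all names illustrative):

```python
import numpy as np

def f(tau):                        # compensator drift, Eq. (6a)
    return -2.0 * tau

def g(tau):                        # compensator input gain, Eq. (6b)
    return np.diag(np.cos(tau) ** 2)

def e_dot(Gamma_i, J_inv, tau, u, i, A):
    """Augmented consensus-error dynamics of Eq. (7) for rigid body i."""
    l_ii = A[i].sum()              # l_ii = d_i = sum_j a_ij
    delta_dot = Gamma_i + l_ii * J_inv[i] @ tau[i]
    tau_dot = f(tau[i]) + l_ii * g(tau[i]) @ u[i]
    for j in range(A.shape[0]):
        if A[i, j] > 0:
            delta_dot -= A[i, j] * J_inv[j] @ tau[j]
            tau_dot -= A[i, j] * g(tau[j]) @ u[j]
    return np.concatenate([delta_dot, tau_dot])
```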

Assumption 1: The matrix $X_{i}(e_{i})$ is bounded, i.e., $\forall i\in\mathcal{V}$, $\lVert X_{i}(e_{i})\rVert\leq X_{M}\lVert e_{i}\rVert$ is satisfied, where $X_{M}\in\mathbb{R}_{>0}$.

In order to measure the performance cost of implementing the attitude consensus, a performance function is defined in the following form:

$$F_{i}(e_{i}(0),u_{i},u_{-i})=\int_{0}^{\infty}\big(e_{i}^{\top}Q_{i}e_{i}+u_{i}^{\top}R_{i}u_{i}\big)dt,\qquad(8)$$

where $u_{-i}=\{u_{j}\,|\,j\in\mathcal{N}_{i}\}$ indicates the set of control inputs of the neighbors of rigid body $i$, $Q_{i}\in\mathbb{R}^{6\times 6}$, $Q_{i}\geq 0$, $R_{i}\in\mathbb{R}^{3\times 3}$ and $R_{i}>0$. According to Eq. (7), we can conclude that $e_{i}$ is driven by $u_{i}$ and $u_{-i}$. Therefore, the left side of Eq. (8) also contains $u_{-i}$.

According to (8), the value function can be defined as

$$V_{i}(e_{i}(t))=\int_{t}^{\infty}\big(e_{i}(v)^{\top}Q_{i}e_{i}(v)+u_{i}(v)^{\top}R_{i}u_{i}(v)\big)dv.\qquad(9)$$

By taking the derivative of Eq. (9), we can obtain the Hamiltonian function in the following form:

$$H_{i}(e_{i},\nabla V_{i},u_{i},u_{-i})=e_{i}^{\top}Q_{i}e_{i}+u_{i}^{\top}R_{i}u_{i}+\nabla V_{i}^{\top}\Big(X_{i}+l_{ii}Y_{i}u_{i}-\sum_{j\in\mathcal{N}_{i}}a_{ij}Y_{j}u_{j}\Big)=0,\qquad(10)$$

where $\nabla V_{i}=\frac{\partial V_{i}}{\partial e_{i}}$.

The implicit form of the model-free optimal controller $u_{i}^{*}$ can be derived from $\frac{\partial H_{i}}{\partial u_{i}}=0$:

$$u_{i}^{*}=-\frac{1}{2}l_{ii}R_{i}^{-1}Y_{i}^{\top}\nabla V_{i}^{*},\qquad(11)$$

where $V_{i}^{*}$ indicates the optimal value function and $\nabla V_{i}^{*}=\frac{\partial V_{i}^{*}}{\partial e_{i}}$.
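For completeness, (11) follows from the first-order stationarity condition of the Hamiltonian (10), which is quadratic in $u_{i}$:

$$\frac{\partial H_{i}}{\partial u_{i}}=2R_{i}u_{i}+l_{ii}Y_{i}^{\top}\nabla V_{i}=0\;\Longrightarrow\;u_{i}^{*}=-\frac{1}{2}l_{ii}R_{i}^{-1}Y_{i}^{\top}\nabla V_{i}^{*}.$$

Since $\partial^{2}H_{i}/\partial u_{i}^{2}=2R_{i}>0$, this stationary point is indeed the minimizer.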

Combining (10) and (11), we can derive the HJB equation for rigid body $i$ as follows:

$$H_{i}(e_{i},\nabla V_{i}^{*},u_{i}^{*},u_{-i}^{*})=e_{i}^{\top}Q_{i}e_{i}+(u_{i}^{*})^{\top}R_{i}u_{i}^{*}+(\nabla V_{i}^{*})^{\top}\Big(X_{i}+l_{ii}Y_{i}u_{i}^{*}-\sum_{j\in\mathcal{N}_{i}}a_{ij}Y_{j}u_{j}^{*}\Big)$$
$$=e_{i}^{\top}Q_{i}e_{i}-\frac{1}{4}l_{ii}^{2}(\nabla V_{i}^{*})^{\top}Y_{i}R_{i}^{-1}Y_{i}^{\top}\nabla V_{i}^{*}+(\nabla V_{i}^{*})^{\top}\Big(X_{i}+\frac{1}{2}\sum_{j\in\mathcal{N}_{i}}a_{ij}l_{jj}Y_{j}R_{j}^{-1}Y_{j}^{\top}\nabla V_{j}^{*}\Big)=0.\qquad(12)$$

From Eq. (11), we can observe that the optimal controller contains $Y_{i}$ and $V_{i}^{*}$. According to the definition of the augmented system (7), $Y_{i}$ is known by construction, which overcomes the dependence on system dynamics. Therefore, we only need to obtain the optimal value function $V_{i}^{*}$ from Eq. (12) to construct the optimal controller.

III-B Dynamic Event-Triggered Mechanism

For the purpose of reducing the computational burden of updating the controller, the event-triggered mechanism is introduced. Under the event-triggered mechanism, we only update the controller at a series of discrete instants $\{t_{i}^{h}\}_{h\in\mathbb{N}}$, where $t_{i}^{h}<t_{i}^{h+1}$ holds for all $h\in\mathbb{N}$ and the initial instant is set as $t_{i}^{0}=0$.

We define the difference between the augmented consensus error at the last event-triggered instant and its real-time value as the measurement error $E_{i}(t)\in\mathbb{R}^{6}$:

$$E_{i}(t)=e_{i}(t_{i}^{h})-e_{i}(t),\ \forall t\in[t_{i}^{h},t_{i}^{h+1}).\qquad(13)$$

For rigid body $i$, the controller $u_{i}$ is only updated at the event-triggered instants $t_{i}^{h}$ and remains unchanged until a new event is triggered. During the event-triggered interval $[t_{i}^{h},t_{i}^{h+1})$, a neighbor $j$ of rigid body $i$ might update its controller at some instants $t_{j}^{h^{\prime}}$, so its control input behaves as a piecewise constant. Letting $\mu_{j}$ indicate the number of triggering instants of neighbor $j$ in this interval, we have $t_{j}^{h^{\prime}}\in\{t_{j}^{0},t_{j}^{1},...,t_{j}^{\mu_{j}}\}$ with $t_{j}^{0}=t_{i}^{h}$.

Therefore, the $e_{i}$-dynamics (7) during $[t_{i}^{h},t_{i}^{h+1})$ is represented as follows:

$$\dot{e}_{i}=X_{i}+l_{ii}Y_{i}\hat{u}_{i}-\sum_{j\in\mathcal{N}_{i}}a_{ij}Y_{j}\hat{u}_{j},\qquad(14)$$

where $\hat{u}_{i}=u_{i}(t_{i}^{h})$ and $\hat{u}_{j}=u_{j}(t_{j}^{h^{\prime}})$ are the control input vectors at the event-triggered instants.

Definition 2 (Event-Triggered Admissible Control): If $u_{i}$ is piecewise continuous, $u_{i}(0)=0$, the $e_{i}$-dynamics (14) is stable, and the performance function (8) is finite, then $u_{i}$ is called an event-triggered admissible control.

Based on the form of the optimal controller (11), we can obtain the event-triggered optimal controller as follows:

$$\hat{u}_{i}^{*}=u_{i}^{*}(t_{i}^{h})=-\frac{1}{2}l_{ii}R_{i}^{-1}Y_{i}^{\top}\nabla\hat{V}_{i}^{*},\ \forall t\in[t_{i}^{h},t_{i}^{h+1}),\qquad(15)$$

where $\nabla\hat{V}_{i}^{*}=\frac{\partial V_{i}^{*}}{\partial e_{i}}(t_{i}^{h})$.

By combining (10) and (15), the event-triggered HJB equation for rigid body $i$ is given as follows:

$$H_{i}(e_{i},\nabla\hat{V}_{i}^{*},\hat{u}_{i}^{*},\hat{u}_{-i}^{*})=e_{i}^{\top}Q_{i}e_{i}-\frac{1}{4}l_{ii}^{2}(\nabla\hat{V}_{i}^{*})^{\top}Y_{i}R_{i}^{-1}Y_{i}^{\top}\nabla\hat{V}_{i}^{*}+(\nabla\hat{V}_{i}^{*})^{\top}\Big(X_{i}+\frac{1}{2}\sum_{j\in\mathcal{N}_{i}}a_{ij}l_{jj}Y_{j}R_{j}^{-1}Y_{j}^{\top}\nabla\hat{V}_{j}^{*}\Big)=0.\qquad(16)$$

The following assumption is proposed for proving the stability of system (14).

Assumption 2 [36]: The controller $u_{i}$ is Lipschitz continuous during the time interval $[t_{i}^{h},t_{i}^{h+1})$, i.e., there exists a constant $P$ such that

$$\big\lVert u_{i}\big(e_{i}(t_{i}^{h})\big)-u_{i}\big(e_{i}(t)\big)\big\rVert\leq P\lVert E_{i}(t)\rVert,\qquad(17)$$

where $P$ indicates the Lipschitz constant. In engineering applications, $P$ should be chosen no smaller than the maximum value of $\lVert\partial u_{i}/\partial e_{i}^{\top}\rVert$.

For event-triggered control, it is necessary to ensure that Zeno behavior does not occur under the proposed event-triggered mechanism. Therefore, inspired by [31], we introduce a dynamic event-triggered mechanism to exclude Zeno behavior implicitly. Firstly, a dynamic variable $y_{i}(t)$ is defined as follows:

$$\dot{y}_{i}(t)=-\gamma_{i}y_{i}(t)+\kappa_{i}\Big(\varpi_{i}\lambda_{\textrm{min}}(Q_{i})\big\lVert e_{i}(t)\big\rVert^{2}-\lambda_{\textrm{max}}(R_{i})P^{2}\big\lVert E_{i}(t)\big\rVert^{2}\Big),\qquad(18)$$

where $y_{i}(0)\in\mathbb{R}_{\geq 0}$, $\gamma_{i}\in\mathbb{R}_{>0}$, $\kappa_{i}\in[0,\frac{1}{2}]$ and $\varpi_{i}\in[0,1]$. Then, the event-triggered instants are determined by the following dynamic event-triggered condition:

$$t_{i}^{0}=0,$$
$$t_{i}^{h+1}=\max_{r\geq t_{i}^{h}}\Big\{r\in\mathbb{R}:y_{i}(t)+\theta_{i}\Big(\varpi_{i}\lambda_{\textrm{min}}(Q_{i})\big\lVert e_{i}(t)\big\rVert^{2}-\lambda_{\textrm{max}}(R_{i})P^{2}\big\lVert E_{i}(t)\big\rVert^{2}\Big)\geq 0,\ \forall t\in[t_{i}^{h},r]\Big\},\qquad(19)$$

where $\theta_{i}\in\mathbb{R}_{>0}$ will be determined later in the stability analysis.
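Checking condition (19) in simulation amounts to evaluating one scalar margin per agent. The following is a minimal sketch, assuming symmetric $Q_{i}$ and $R_{i}$; argument names mirror the symbols above and nothing here is tuned:

```python
import numpy as np

def trigger_violated(y_i, e_i, E_i, Q_i, R_i, P, theta_i, varpi_i):
    """True when the dynamic event-triggered condition (19) fails,
    i.e., when the controller should be updated."""
    margin = y_i + theta_i * (
        varpi_i * np.linalg.eigvalsh(Q_i)[0] * (e_i @ e_i)     # lambda_min(Q_i)
        - np.linalg.eigvalsh(R_i)[-1] * P ** 2 * (E_i @ E_i))  # lambda_max(R_i)
    return margin < 0
```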

Lemma 1: Suppose that the event-triggered instants $t_{i}^{h}$ are determined by (19). If the initial value satisfies $y_{i}(0)\geq 0$, then $y_{i}(t)\geq 0$ holds for all $t\in[0,+\infty)$.

Proof: For all $t\in[0,+\infty)$, the event-triggered condition (19) guarantees the following inequality:

$$y_{i}(t)+\theta_{i}\Big(\varpi_{i}\lambda_{\textrm{min}}(Q_{i})\big\lVert e_{i}(t)\big\rVert^{2}-\lambda_{\textrm{max}}(R_{i})P^{2}\big\lVert E_{i}(t)\big\rVert^{2}\Big)\geq 0.\qquad(20)$$

Since $\theta_{i}>0$, inequality (20) becomes

$$\varpi_{i}\lambda_{\textrm{min}}(Q_{i})\big\lVert e_{i}(t)\big\rVert^{2}-\lambda_{\textrm{max}}(R_{i})P^{2}\big\lVert E_{i}(t)\big\rVert^{2}\geq-\frac{1}{\theta_{i}}y_{i}(t).\qquad(21)$$

By combining (18) and (21), we can easily obtain that for all $t\in[0,+\infty)$,

$$\dot{y}_{i}(t)\geq-\Big(\gamma_{i}+\frac{\kappa_{i}}{\theta_{i}}\Big)y_{i}(t).\qquad(22)$$

According to the comparison lemma in [37], we can deduce that

$$y_{i}(t)\geq y_{i}(0)\exp\Big\{-\Big(\gamma_{i}+\frac{\kappa_{i}}{\theta_{i}}\Big)t\Big\}.\qquad(23)$$

Therefore, $y_{i}(t)\geq 0$ always holds for all $t\in[0,+\infty)$. This completes the proof. $\blacksquare$

Theorem 1: Consider a multiple rigid body network with $N$ nodes under a strongly connected communication topology. Suppose that Assumption 2 holds, and that the performance function and the event-triggered optimal controller are given by (8) and (15), respectively. If the event-triggered instants are determined by the dynamic event-triggered condition (19), then the following two conclusions hold:

1) The $e_{i}$-dynamics (14) is asymptotically stable, i.e., the optimal attitude consensus is achieved.

2) Zeno behavior is excluded, i.e., the interval between $t_{i}^{h+1}$ and $t_{i}^{h}$, $\forall i\in\mathcal{V}$, has a positive lower bound.

Proof: 1) Firstly, we prove that the $e_{i}$-dynamics (14) is asymptotically stable. Choose $\Pi_{i}(t)=V_{i}^{*}\big(e_{i}(t)\big)+y_{i}(t)$ as the Lyapunov function, where $V_{i}^{*}\big(e_{i}(t)\big)$ is the optimal value function in (9) and the dynamic variable $y_{i}(t)$ is governed by (18).

Taking the first-order derivative of $V_{i}^{*}(e_{i})$ with respect to $t$ along the trajectory of the consensus error $e_{i}$, we derive

$$\dot{V}_{i}^{*}(e_{i})=(\nabla V_{i}^{*})^{\top}\dot{e}_{i}=(\nabla V_{i}^{*})^{\top}\Big(X_{i}+l_{ii}Y_{i}\hat{u}_{i}^{*}-\sum_{j\in\mathcal{N}_{i}}a_{ij}Y_{j}\hat{u}_{j}^{*}\Big).\qquad(24)$$

During the event-triggered interval $[t_{i}^{h},t_{i}^{h+1})$, the neighbors of rigid body $i$ execute $\hat{u}_{j}^{*}=u_{j}^{*}(t_{j}^{h^{\prime}})$. According to (11) and (12), it can be easily obtained that

$$(\nabla V_{i}^{*})^{\top}X_{i}=-e_{i}^{\top}Q_{i}e_{i}-(u_{i}^{*})^{\top}R_{i}u_{i}^{*}-(\nabla V_{i}^{*})^{\top}l_{ii}Y_{i}u_{i}^{*}+(\nabla V_{i}^{*})^{\top}\sum_{j\in\mathcal{N}_{i}}a_{ij}Y_{j}\hat{u}_{j}^{*}\qquad(25)$$

and

$$(\nabla V_{i}^{*})^{\top}l_{ii}Y_{i}=-2(u_{i}^{*})^{\top}R_{i}.\qquad(26)$$

Thus, Eq. (24) can be rewritten as

$$\dot{V}_{i}^{*}(e_{i})=-e_{i}^{\top}Q_{i}e_{i}-(u_{i}^{*})^{\top}R_{i}u_{i}^{*}+(\nabla V_{i}^{*})^{\top}l_{ii}Y_{i}(\hat{u}_{i}^{*}-u_{i}^{*})$$
$$=-e_{i}^{\top}Q_{i}e_{i}+(u_{i}^{*})^{\top}R_{i}u_{i}^{*}-2(u_{i}^{*})^{\top}R_{i}\hat{u}_{i}^{*}$$
$$=-e_{i}^{\top}Q_{i}e_{i}-(\hat{u}_{i}^{*})^{\top}R_{i}\hat{u}_{i}^{*}+(u_{i}^{*}-\hat{u}_{i}^{*})^{\top}R_{i}(u_{i}^{*}-\hat{u}_{i}^{*})$$
$$\leq-\lambda_{\textrm{min}}(Q_{i})\big\lVert e_{i}(t)\big\rVert^{2}+\lambda_{\textrm{max}}(R_{i})P^{2}\big\lVert E_{i}(t)\big\rVert^{2}.\qquad(27)$$
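The third equality in (27) is obtained by completing the square in $\hat{u}_{i}^{*}$:

$$(u_{i}^{*})^{\top}R_{i}u_{i}^{*}-2(u_{i}^{*})^{\top}R_{i}\hat{u}_{i}^{*}=-(\hat{u}_{i}^{*})^{\top}R_{i}\hat{u}_{i}^{*}+(u_{i}^{*}-\hat{u}_{i}^{*})^{\top}R_{i}(u_{i}^{*}-\hat{u}_{i}^{*}),$$

and the final inequality follows from $(u_{i}^{*}-\hat{u}_{i}^{*})^{\top}R_{i}(u_{i}^{*}-\hat{u}_{i}^{*})\leq\lambda_{\textrm{max}}(R_{i})P^{2}\lVert E_{i}(t)\rVert^{2}$, which is a consequence of the Lipschitz bound (17) in Assumption 2.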

According to (18) and (27), we can obtain the first-order derivative of $\Pi_{i}(t)$ as follows:

$$\dot{\Pi}_{i}(t)=\dot{V}_{i}^{*}(e_{i})+\dot{y}_{i}(t)$$
$$\leq-\lambda_{\textrm{min}}(Q_{i})\big\lVert e_{i}(t)\big\rVert^{2}+\lambda_{\textrm{max}}(R_{i})P^{2}\big\lVert E_{i}(t)\big\rVert^{2}-\gamma_{i}y_{i}(t)+\kappa_{i}\Big(\varpi_{i}\lambda_{\textrm{min}}(Q_{i})\big\lVert e_{i}(t)\big\rVert^{2}-\lambda_{\textrm{max}}(R_{i})P^{2}\big\lVert E_{i}(t)\big\rVert^{2}\Big)$$
$$\leq-(1-\varpi_{i})\lambda_{\textrm{min}}(Q_{i})\big\lVert e_{i}(t)\big\rVert^{2}-\gamma_{i}y_{i}(t)+(\kappa_{i}-1)\Big(\varpi_{i}\lambda_{\textrm{min}}(Q_{i})\big\lVert e_{i}(t)\big\rVert^{2}-\lambda_{\textrm{max}}(R_{i})P^{2}\big\lVert E_{i}(t)\big\rVert^{2}\Big).\qquad(28)$$

Substituting the dynamic event-triggered condition (19) into (28), $\dot{\Pi}_{i}(t)$ becomes

$$\dot{\Pi}_{i}(t)\leq-(1-\varpi_{i})\lambda_{\textrm{min}}(Q_{i})\big\lVert e_{i}(t)\big\rVert^{2}-\gamma_{i}y_{i}(t)+(\kappa_{i}-1)\Big(-\frac{1}{\theta_{i}}\Big)y_{i}(t)$$
$$\leq-(1-\varpi_{i})\lambda_{\textrm{min}}(Q_{i})\big\lVert e_{i}(t)\big\rVert^{2}-\Big(\gamma_{i}+\frac{\kappa_{i}-1}{\theta_{i}}\Big)y_{i}(t).\qquad(29)$$

Since $\varpi_{i}\in[0,1]$, we have $\dot{\Pi}_{i}(t)\leq 0$ if $\theta_{i}\in[\frac{1-\kappa_{i}}{\gamma_{i}},+\infty)$. Therefore, we can select appropriate $\gamma_{i}\in\mathbb{R}_{>0}$, $\kappa_{i}\in[0,\frac{1}{2}]$, $\varpi_{i}\in[0,1]$ and $\theta_{i}\in[\frac{1-\kappa_{i}}{\gamma_{i}},+\infty)$ to ensure that the $e_{i}$-dynamics (14) is asymptotically stable under the dynamic event-triggered condition (19).

2) Next, we prove that Zeno behavior is excluded.

According to (23), when $\varpi_{i}$ is selected as zero, we can deduce a sufficient condition for the dynamic event-triggered condition (19), expressed as follows:

$$t_{i}^{h+1}=\max_{r\geq t_{i}^{h}}\Big\{r\in\mathbb{R}:\big\lVert E_{i}(t)\big\rVert\leq\sqrt{\frac{y_{i}(0)}{\theta_{i}\lambda_{\textrm{max}}(R_{i})P^{2}}}\exp\Big\{-\frac{1}{2}\Big(\gamma_{i}+\frac{\kappa_{i}}{\theta_{i}}\Big)t\Big\},\ \forall t\in[t_{i}^{h},r]\Big\}.\qquad(30)$$

According to the definition of $Y_{i}$, we can conclude that $Y_{i}$ is bounded, i.e., $\lVert Y_{i}\rVert\leq Y_{M}$ with $Y_{M}\in\mathbb{R}_{>0}$. With Assumption 1 and the definition of the measurement error $E_{i}(t)$, we can obtain that for all $t\in[t_{i}^{h},t_{i}^{h+1})$,

$$\big\lVert\dot{E}_{i}(t)\big\rVert=\big\lVert\dot{e}_{i}(t)\big\rVert=\big\lVert X_{i}\big(e_{i}(t)\big)+l_{ii}Y_{i}u_{i}(t_{i}^{h})-\sum_{j\in\mathcal{N}_{i}}a_{ij}Y_{j}u_{j}(t_{j}^{h^{\prime}})\big\rVert$$
$$\leq X_{M}\big\lVert e_{i}(t)\big\rVert+\sum_{j\in\mathcal{N}_{i}}a_{ij}Y_{M}\big(\big\lVert u_{i}(t_{i}^{h})\big\rVert+\big\lVert u_{j}(t_{j}^{h^{\prime}})\big\rVert\big)$$
$$\leq X_{M}\big\lVert E_{i}(t)\big\rVert+X_{M}\big\lVert e_{i}(t_{i}^{h})\big\rVert+\sum_{j\in\mathcal{N}_{i}}a_{ij}Y_{M}\big(\big\lVert u_{i}(t_{i}^{h})\big\rVert+\big\lVert u_{j}(t_{j}^{h^{\prime}})\big\rVert\big).\qquad(31)$$

Since $E_{i}(t_{i}^{h})=0$ at the event-triggered instants, we can derive the following inequality by using the comparison lemma [37]:

$$\big\lVert E_{i}(t)\big\rVert\leq\exp\{X_{M}(t-t_{i}^{h})\}\big\lVert E_{i}(t_{i}^{h})\big\rVert+\frac{1}{2}\int_{t_{i}^{h}}^{t}\exp\{X_{M}(t-v)\}\Big(X_{M}\big\lVert e_{i}(t_{i}^{h})\big\rVert+\sum_{j\in\mathcal{N}_{i}}a_{ij}Y_{M}\big(\big\lVert u_{i}(t_{i}^{h})\big\rVert+\big\lVert u_{j}(t_{j}^{h^{\prime}})\big\rVert\big)\Big)dv$$
$$\leq\frac{1}{2}\int_{t_{i}^{h}}^{t}\exp\{X_{M}(t-v)\}\Big(X_{M}\big\lVert e_{i}(t_{i}^{h})\big\rVert+\sum_{j\in\mathcal{N}_{i}}a_{ij}Y_{M}\big(\big\lVert u_{i}(t_{i}^{h})\big\rVert+\big\lVert u_{j}(t_{j}^{h^{\prime}})\big\rVert\big)\Big)dv.\qquad(32)$$

Let $\tilde{t}_{i}^{h+1}$ indicate the next event-triggered instant determined by the sufficient condition (30). According to (32), the sufficient condition (30) can be analyzed in two situations during the time interval $[t_{i}^{h},\tilde{t}_{i}^{h+1})$: 1) no event occurs for any rigid body in $\mathcal{N}_{i}$, and 2) at least one event occurs for some rigid body $j\in\mathcal{N}_{i}$.

Situation 1: For all rigid bodies in $\mathcal{N}_{i}$, there are no event-triggered instants during $[t_{i}^{h},\tilde{t}_{i}^{h+1})$. Therefore, we can deduce the following inequality:

$$\sqrt{\frac{y_{i}(0)}{\theta_{i}\lambda_{\textrm{max}}(R_{i})P^{2}}}\exp\Big\{-\frac{1}{2}\Big(\gamma_{i}+\frac{\kappa_{i}}{\theta_{i}}\Big)\tilde{t}_{i}^{h+1}\Big\}\leq\frac{X_{M}\big\lVert e_{i}(t_{i}^{h})\big\rVert+\sum_{j\in\mathcal{N}_{i}}a_{ij}Y_{M}\big(\big\lVert u_{i}(t_{i}^{h})\big\rVert+\big\lVert u_{j}(t_{j}^{h^{\prime}})\big\rVert\big)}{2X_{M}}\Big(\exp\{X_{M}(\tilde{t}_{i}^{h+1}-t_{i}^{h})\}-1\Big).\qquad(33)$$

Situation 2: For a rigid body $j\in\mathcal{N}_{i}$, there exist $\mu_{j}\in\mathbb{N}_{>0}$ event-triggered instants during $[t_{i}^{h},\tilde{t}_{i}^{h+1})$. Using $t_{j}^{0},t_{j}^{1},...,t_{j}^{\mu_{j}}$ to indicate these event-triggered instants, with $t_{j}^{0}=t_{i}^{h}$, we can deduce

$$\sqrt{\frac{y_{i}(0)}{\theta_{i}\lambda_{\textrm{max}}(R_{i})P^{2}}}\exp\Big\{-\frac{1}{2}\Big(\gamma_{i}+\frac{\kappa_{i}}{\theta_{i}}\Big)\tilde{t}_{i}^{h+1}\Big\}\leq\frac{X_{M}\big\lVert e_{i}(t_{i}^{h})\big\rVert+\sum_{j\in\mathcal{N}_{i}}a_{ij}Y_{M}\big\lVert u_{i}(t_{i}^{h})\big\rVert}{2X_{M}}\Big(\exp\{X_{M}(\tilde{t}_{i}^{h+1}-t_{i}^{h})\}-1\Big)$$
$$+\sum_{j\in\mathcal{N}_{i}}\frac{a_{ij}Y_{M}}{2X_{M}}\sum_{s=0}^{\mu_{j}-1}\big\lVert u_{j}(t_{j}^{s})\big\rVert\Big(\exp\{X_{M}(t_{j}^{s+1}-t_{j}^{s})\}-1\Big)+\frac{\sum_{j\in\mathcal{N}_{i}}a_{ij}Y_{M}\big\lVert u_{j}(t_{j}^{\mu_{j}})\big\rVert}{2X_{M}}\Big(\exp\{X_{M}(\tilde{t}_{i}^{h+1}-t_{j}^{\mu_{j}})\}-1\Big).\qquad(34)$$

Combining (33) and (34), we can obtain a unified form covering both situations:

$$\sqrt{\frac{y_{i}(0)}{\theta_{i}\lambda_{\textrm{max}}(R_{i})P^{2}}}\exp\Big\{-\frac{1}{2}\Big(\gamma_{i}+\frac{\kappa_{i}}{\theta_{i}}\Big)\tilde{t}_{i}^{h+1}\Big\}\leq\frac{X_{M}\big\lVert e_{i}(t_{i}^{h})\big\rVert+\Theta_{i}}{2X_{M}}\Big(\exp\{X_{M}(\tilde{t}_{i}^{h+1}-t_{i}^{h})\}-1\Big),\qquad(35)$$

where $\Theta_{i}=\sum_{j\in\mathcal{N}_{i}}a_{ij}Y_{M}\Big(\big\lVert u_{i}(t_{i}^{h})\big\rVert+\max_{s=0,...,\mu_{j}}\big\{\big\lVert u_{j}(t_{j}^{s})\big\rVert\big\}\Big)$.

Since $\tilde{t}_{i}^{h+1}$ is determined by the sufficient condition (30), letting $t_{i}^{h+1}$ indicate the next event-triggered instant determined by (19), we can bound the interval between two adjacent event-triggered instants:

$$t_{i}^{h+1}-t_{i}^{h}\geq\tilde{t}_{i}^{h+1}-t_{i}^{h}\geq\frac{1}{X_{M}}\log\Bigg(\frac{2X_{M}\sqrt{y_{i}(0)}}{\big(X_{M}\big\lVert e_{i}(t_{i}^{h})\big\rVert+\Theta_{i}\big)\sqrt{\theta_{i}\lambda_{\textrm{max}}(R_{i})P^{2}}}\exp\Big\{-\frac{1}{2}\Big(\gamma_{i}+\frac{\kappa_{i}}{\theta_{i}}\Big)\tilde{t}_{i}^{h+1}\Big\}+1\Bigg)>0.\qquad(36)$$

Therefore, Zeno behavior is excluded. This completes the proof. $\blacksquare$

III-C Self-Triggered Mechanism

Under the dynamic event-triggered mechanism, we have to monitor the continuous consensus error $e_{i}(t)$ and the continuous measurement error $E_{i}(t)$ to judge whether the dynamic event-triggered condition (19) is violated. Therefore, each rigid body has to communicate continuously with its neighbors to obtain their absolute attitude information, or to measure continuous relative attitude information with sensors such as cameras. In order to overcome this problem, a self-triggered condition is proposed in this subsection.

Letting $\kappa_{i}=0$, Eq. (18) becomes

$$\dot{y}_{i}(t)=-\gamma_{i}y_{i}(t).\qquad(37)$$

Therefore, $y_{i}(t)=y_{i}(0)\exp\{-\gamma_{i}t\}$ for all $t\in[0,+\infty)$.

Then, letting $\varpi_{i}=0$, the dynamic event-triggered condition (19) becomes

$$t_{i}^{h+1}=\max_{r\geq t_{i}^{h}}\Big\{r\in\mathbb{R}:\big\lVert E_{i}(t)\big\rVert\leq\sqrt{\frac{y_{i}(0)}{\theta_{i}\lambda_{\textrm{max}}(R_{i})P^{2}}}\exp\Big\{-\frac{1}{2}\gamma_{i}t\Big\},\ \forall t\in[t_{i}^{h},r]\Big\},\qquad(38)$$

where $\theta_{i}\in[\frac{1}{\gamma_{i}},+\infty)$.

According to (35), the self-triggered measurement error is defined in the following form:

$$\big\lVert\Delta_{i}(t)\big\rVert=\frac{X_{M}\big\lVert e_{i}(t_{i}^{h})\big\rVert+\Theta_{i}}{2X_{M}}\Big(\exp\{X_{M}(t-t_{i}^{h})\}-1\Big),\qquad(39)$$

which is an upper bound of $\big\lVert E_{i}(t)\big\rVert$.

Thus, we can obtain a new sufficient condition for the dynamic event-triggered condition (19) as follows:

$$t_{i}^{h+1}=\max_{r\geq t_{i}^{h}}\Big\{r\in\mathbb{R}:\big\lVert\Delta_{i}(t)\big\rVert\leq\sqrt{\frac{y_{i}(0)}{\theta_{i}\lambda_{\textrm{max}}(R_{i})P^{2}}}\exp\Big\{-\frac{1}{2}\gamma_{i}t\Big\},\ \forall t\in[t_{i}^{h},r]\Big\}.\qquad(40)$$

According to (39), the value of $\big\lVert\Delta_{i}(t)\big\rVert$ can be calculated without continuous information. Therefore, continuous communication is avoided.
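In practice, the next triggering instant under (40) can be precomputed at $t_{i}^{h}$ by marching the closed-form bound (39) against the exponential threshold. A minimal numerical sketch follows; dt and horizon are illustrative choices, not values from the paper:

```python
import numpy as np

def next_trigger_time(t_h, e_h_norm, Theta_i, X_M, y_0, theta_i,
                      lam_max_R, P, gamma_i, dt=1e-3, horizon=10.0):
    """First time after t_i^h at which the bound (39) on ||E_i(t)||
    crosses the threshold of the self-triggered condition (40)."""
    c = np.sqrt(y_0 / (theta_i * lam_max_R * P ** 2))
    t = t_h + dt
    while t < t_h + horizon:
        bound = (X_M * e_h_norm + Theta_i) / (2.0 * X_M) \
                * (np.exp(X_M * (t - t_h)) - 1.0)
        if bound > c * np.exp(-0.5 * gamma_i * t):
            return t             # (40) is violated: trigger here
        t += dt
    return t_h + horizon         # no crossing within the search horizon
```

Note that every quantity entering this computation ($\lVert e_{i}(t_{i}^{h})\rVert$, $\Theta_{i}$, $X_{M}$, $y_{i}(0)$) is available at the triggering instant itself, which is exactly why no continuous monitoring is required.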

Remark 1: Since the self-triggered condition (40) is a sufficient condition for the dynamic event-triggered condition (19), the number of triggering instants under (40) will be higher than under (19). Our original intention in introducing the self-triggered mechanism is to reduce the consumption of communication resources, even though this inevitably increases the number of triggering instants. That is to say, we trade a small amount of extra computation for a large saving in communication resources.

Remark 2: Compared with the time-triggered methods in [19]-[23], the event-triggered mechanism significantly saves computing and communication resources. Note that the event-triggered attitude stabilization problem is studied based on sliding mode control in [40]; however, the performance cost is not considered in that controller design.

IV Main Results

Up to now, we have derived the form of the optimal controller (15), which contains the optimal value function $\hat{V}_{i}^{*}$. However, it is very difficult to obtain analytic solutions to the event-triggered HJB equations (16). In this section, we first introduce an event-triggered Reinforcement Learning (RL) algorithm to obtain the optimal policy. In order to implement the event-triggered RL algorithm online, a critic neural network is used to approximate the optimal value function $\hat{V}_{i}^{*}$. Only measurement data at the event-triggered instants are needed in the event-triggered RL algorithm, which obviously reduces the computational burden.

IV-A Model-Free Event-Triggered RL Algorithm

This subsection presents a model-free event-triggered algorithm based on reinforcement learning, which is used to seek the optimal policy. The RL algorithm involves two steps: policy evaluation and policy improvement. By repeating these two steps at the event-triggered instants, the optimal policy is obtained when policy improvement no longer changes the control policy.

1: Initialize the event-triggered admissible controllers $u_{i}(0)=0$, $i=1,...,N$, and set $h=0$;
2: for each rigid body $i\in\mathcal{V}$ do
3:  if rigid body $i$ receives information $u_{j}(t)$ transmitted by rigid body $j$, where $j\in\mathcal{N}_{i}$, then
4:   set $h^{\prime}=h^{\prime}+1$ and $t_{j}^{h^{\prime}}=t$;
5:   update $u_{j}(t_{j}^{h^{\prime}})=u_{j}(t)$;
6:  else
7:   $\hat{u}_{j}^{h^{\prime}}=u_{j}(t_{j}^{h^{\prime}})$ remains unchanged;
8:  end if
9:  calculate the self-triggered measurement error $\big\lVert\Delta_{i}(t)\big\rVert$;
10:  if the self-triggered condition (40) is violated then
11:   set $t_{i}^{h+1}=t$;
12:   Step 1 (Policy evaluation): solve
$$H_{i}(e_{i},\nabla\hat{V}_{i}^{h+1},\hat{u}_{i}^{h},\hat{u}_{-i}^{h^{\prime}})=e_{i}^{\top}Q_{i}e_{i}+(\hat{u}_{i}^{h})^{\top}R_{i}\hat{u}_{i}^{h}+\big(\nabla\hat{V}_{i}^{h+1}\big)^{\top}\Big(X_{i}+l_{ii}Y_{i}\hat{u}_{i}^{h}-\sum_{j\in\mathcal{N}_{i}}a_{ij}Y_{j}\hat{u}_{j}^{h^{\prime}}\Big)=0,\qquad(41)$$
   where $\nabla\hat{V}_{i}^{h+1}=\nabla V_{i}(t_{i}^{h+1})$, $\hat{u}_{i}^{h}=u_{i}(t_{i}^{h})$ and $\hat{u}_{-i}^{h^{\prime}}=\hat{u}_{j}^{h^{\prime}}=u_{j}(t_{j}^{h^{\prime}})$, $j\in\mathcal{N}_{i}$;
13:   Step 2 (Policy improvement):
$$\hat{u}_{i}^{h+1}=-\frac{1}{2}l_{ii}R_{i}^{-1}Y_{i}^{\top}\nabla\hat{V}_{i}^{h+1},\qquad(42)$$
   where $\hat{u}_{i}^{h+1}=u_{i}(t_{i}^{h+1})$;
14:   set $h=h+1$;
15:  else
16:   $\hat{u}_{i}^{h}=u_{i}(t_{i}^{h})$ remains unchanged;
17:  end if
18: end for
Algorithm 1: Model-Free Event-Triggered RL Algorithm.
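As a complement to the listing, the control flow of Algorithm 1 for a single rigid body can be sketched as follows. The callables received, trigger_violated, policy_evaluation and policy_improvement are hypothetical placeholders for the operations defined above, so this is a structural sketch rather than a full implementation:

```python
def run_agent(t_grid, u0, received, trigger_violated,
              policy_evaluation, policy_improvement):
    """One rigid body executing Algorithm 1 over a time grid."""
    u, u_neighbors = u0, {}
    for t in t_grid:
        msg = received(t)                       # neighbor update, if any
        if msg is not None:
            j, u_j = msg
            u_neighbors[j] = u_j                # store u_j(t_j^{h'})
        if trigger_violated(t):                 # self-triggered condition (40)
            grad_V = policy_evaluation(u, u_neighbors, t)   # Step 1, Eq. (41)
            u = policy_improvement(grad_V, t)               # Step 2, Eq. (42)
        # otherwise u is held constant until the next event
    return u
```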

Next, we give a theorem to show the convergence of the model-free event-triggered RL algorithm.

Theorem 2: Suppose that agent $i$ updates its control policy according to Algorithm 1. Then the value function converges to the optimal value function, i.e., $\lim_{h\to\infty}\hat{V}_{i}^{h}=\hat{V}_{i}^{*}$, and the control policy converges to the optimal control policy, i.e., $\lim_{h\to\infty}\hat{u}_{i}^{h}=\hat{u}_{i}^{*}$.

Proof: According to Eq. (41), we can obtain that

$$\big(\nabla\hat{V}_{i}^{h}\big)^{\top}\Big[X_{i}+l_{ii}Y_{i}\hat{u}_{i}^{h-1}-\sum_{j\in\mathcal{N}_{i}}a_{ij}Y_{j}\hat{u}_{j}^{h^{\prime}-1}\Big]=-e_{i}^{\top}Q_{i}e_{i}-\big(\hat{u}_{i}^{h-1}\big)^{\top}R_{i}\hat{u}_{i}^{h-1},\qquad(43)$$

and

$$\big(\nabla\hat{V}_{i}^{h+1}\big)^{\top}\Big[X_{i}+l_{ii}Y_{i}\hat{u}_{i}^{h}-\sum_{j\in\mathcal{N}_{i}}a_{ij}Y_{j}\hat{u}_{j}^{h^{\prime}}\Big]=-e_{i}^{\top}Q_{i}e_{i}-\big(\hat{u}_{i}^{h}\big)^{\top}R_{i}\hat{u}_{i}^{h}.\qquad(44)$$

By rearranging Eq. (43), the following equation holds:

$$\big(\nabla\hat{V}_{i}^{h}\big)^{\top}\Big[X_{i}+l_{ii}Y_{i}\hat{u}_{i}^{h}-\sum_{j\in\mathcal{N}_{i}}a_{ij}Y_{j}\hat{u}_{j}^{h^{\prime}}\Big]=-e_{i}^{\top}Q_{i}e_{i}-\big(\hat{u}_{i}^{h-1}\big)^{\top}R_{i}\hat{u}_{i}^{h-1}+\big(\nabla\hat{V}_{i}^{h}\big)^{\top}l_{ii}Y_{i}\big(\hat{u}_{i}^{h}-\hat{u}_{i}^{h-1}\big)-\big(\nabla\hat{V}_{i}^{h}\big)^{\top}\sum_{j\in\mathcal{N}_{i}}a_{ij}Y_{j}\big(\hat{u}_{j}^{h^{\prime}}-\hat{u}_{j}^{h^{\prime}-1}\big).\qquad(45)$$

In the model-free event-triggered RL algorithm, the control policy of rigid body $i$ is updated when the self-triggered condition (40) is violated. Under the distributed asynchronous update pattern, the control policy of each rigid body $j\in\mathcal{N}_{i}$ remains invariant during this update, so that $\hat{u}_{j}^{h^{\prime}}=\hat{u}_{j}^{h^{\prime}-1}$ and the last term in (45) vanishes. Considering the trajectory of the consensus error driven by $\hat{u}_{i}^{h}$, i.e., $\dot{e}_{i}=X_{i}+l_{ii}Y_{i}\hat{u}_{i}^{h}-\sum_{j\in\mathcal{N}_{i}}a_{ij}Y_{j}\hat{u}_{j}^{h^{\prime}}$, we have

$$\hat{V}_{i}^{h+1}-\hat{V}_{i}^{h}=\int_{t}^{\infty}\Big[\big(\nabla\hat{V}_{i}^{h}\big)^{\top}\dot{e}_{i}-\big(\nabla\hat{V}_{i}^{h+1}\big)^{\top}\dot{e}_{i}\Big]dv=\int_{t}^{\infty}\Big[\big(\hat{u}_{i}^{h}\big)^{\top}R_{i}\hat{u}_{i}^{h}-\big(\hat{u}_{i}^{h-1}\big)^{\top}R_{i}\hat{u}_{i}^{h-1}+\big(\nabla\hat{V}_{i}^{h}\big)^{\top}l_{ii}Y_{i}\big(\hat{u}_{i}^{h}-\hat{u}_{i}^{h-1}\big)\Big]dv.\qquad(46)$$

According to Eq. (42), it can be easily obtained that

$$\big(\nabla\hat{V}_{i}^{h}\big)^{\top}l_{ii}Y_{i}=-2\big(\hat{u}_{i}^{h}\big)^{\top}R_{i}.\qquad(47)$$

Then, Eq. (46) becomes

$$\hat{V}_{i}^{h+1}-\hat{V}_{i}^{h}=\int_{t}^{\infty}\Big[\big(\hat{u}_{i}^{h}\big)^{\top}R_{i}\hat{u}_{i}^{h}-\big(\hat{u}_{i}^{h-1}\big)^{\top}R_{i}\hat{u}_{i}^{h-1}-2\big(\hat{u}_{i}^{h}\big)^{\top}R_{i}\big(\hat{u}_{i}^{h}-\hat{u}_{i}^{h-1}\big)\Big]dv$$
$$=\int_{t}^{\infty}\Big[-\big(\hat{u}_{i}^{h}-\hat{u}_{i}^{h-1}\big)^{\top}R_{i}\big(\hat{u}_{i}^{h}-\hat{u}_{i}^{h-1}\big)\Big]dv\leq 0.\qquad(48)$$

Therefore, $\hat{V}_{i}^{h+1}\leq\hat{V}_{i}^{h}$ is always satisfied. Since the value function is positive definite, the non-increasing sequence $\{\hat{V}_{i}^{h}\}$ is bounded below, and by the Weierstrass theorem [38] it converges to the optimal value function $\hat{V}_{i}^{*}$ as $h\to\infty$. Meanwhile, the control policy $\hat{u}_{i}^{h}$ converges to the optimal control policy $\hat{u}_{i}^{*}$. This completes the proof. $\blacksquare$

IV-B Implementation of Event-Triggered PI Algorithm

In this subsection, we implement Algorithm 1 by using a critic neural network to approximate the optimal value function $\hat{V}_{i}^{*}$.

We first define the following neural network for each agent:

$$\hat{V}_{i}(e_{i})=\hat{W}_{c,i}^{\top}\phi_{i}(e_{i}),\ \forall t\in[t_{i}^{h},t_{i}^{h+1}),\qquad(49)$$

where $\hat{W}_{c,i}$ indicates the critic weight estimate at the event-triggered instant $t_{i}^{h}$, and $\phi_{i}(e_{i})$ indicates the critic activation function.

According to Eq. (16), the estimation error of the critic NN can be defined as

$$e_{c,i}=e_{i}^{\top}Q_{i}e_{i}+\hat{u}_{i}^{\top}R_{i}\hat{u}_{i}+\hat{W}_{c,i}^{\top}\nabla\phi_{i}\dot{e}_{i},\qquad(50)$$

where $\nabla\phi_{i}=\partial\phi_{i}(e_{i})/\partial e_{i}^{\top}$.

Figure 1: A strongly connected graph with six nodes.

For a given event-triggered admissible controller u^i\hat{u}_{i}, the update rule of W^c,i\hat{W}_{c,i} is to minimize the following objective function:

$$E_{c,i}=\frac{1}{2}e_{c,i}^{\top}e_{c,i}.\qquad(51)$$

According to the gradient descent method, we can derive the update law of the following form:

$$\dot{\hat{W}}_{c,i}=0,\ t\in(t_{i}^{h},t_{i}^{h+1}),\qquad(52a)$$
$$\hat{W}_{c,i}^{+}=\hat{W}_{c,i}-l_{c,i}k_{i}\big(k_{1,i}^{\top}\hat{W}_{c,i}+e_{i}^{\top}Q_{i}e_{i}+\hat{u}_{i}^{\top}R_{i}\hat{u}_{i}\big),\ t=t_{i}^{h},\qquad(52b)$$

where $l_{c,i}>0$ indicates the learning rate of the critic NN, $k_{1,i}=\nabla\phi_{i}\dot{e}_{i}$ and $k_{i}=k_{1,i}/(k_{1,i}^{\top}k_{1,i}+1)^{2}$.
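One critic update of (52b) is a single normalized gradient step and can be sketched as follows; grad_phi denotes the Jacobian of the activation vector $\phi_{i}$ at $e_{i}$, and all names are illustrative:

```python
import numpy as np

def critic_step(W_c, e, u_hat, e_dot, Q, R, grad_phi, l_c):
    """Event-triggered critic update of Eq. (52b), i.e., one gradient
    descent step on E_c,i = 0.5 * e_c,i^2 with a normalized regressor."""
    k1 = grad_phi @ e_dot                      # k_{1,i} = grad(phi_i) e_dot
    k = k1 / (k1 @ k1 + 1.0) ** 2              # normalized regressor k_i
    e_c = k1 @ W_c + e @ (Q @ e) + u_hat @ (R @ u_hat)   # residual of (50)
    return W_c - l_c * k * e_c                 # updated weight W_c,i^+
```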

Letting $\tilde{W}_{c,i}=\hat{W}_{c,i}-W_{c,i}$, we can deduce

$$\dot{\tilde{W}}_{c,i}=0,\ t\in(t_{i}^{h},t_{i}^{h+1}),\qquad(53a)$$
$$\tilde{W}_{c,i}^{+}=\tilde{W}_{c,i}-l_{c,i}k_{i}\big(k_{1,i}^{\top}\tilde{W}_{c,i}+\epsilon_{c,i}\big),\ t=t_{i}^{h},\qquad(53b)$$

where $W_{c,i}$ denotes the critic target weight, $\tilde{W}_{c,i}$ is the critic weight error, and $\epsilon_{c,i}=e_{i}^{\top}Q_{i}e_{i}+\hat{u}_{i}^{\top}R_{i}\hat{u}_{i}+W_{c,i}^{\top}\nabla\phi_{i}\dot{e}_{i}$ is the critic residual error.

Therefore, the optimal controller can be obtained by (42) and (49), which is expressed in the following form:

$$\hat{u}_{i}=-\frac{1}{2}l_{ii}R_{i}^{-1}Y_{i}^{\top}(\nabla\phi_{i})^{\top}\hat{W}_{c,i}.\qquad(54)$$

Through the above critic NN framework, we can obtain the optimal controller using only measurement data at the event-triggered instants. Therefore, the need for system dynamics is avoided. In addition, the neural network is only updated at the event-triggered instants $t_{i}^{h}$, which are determined by the self-triggered condition (40).

Assumption 3: In the critic NN framework, the target weight matrix, the activation function, and the critic residual error are bounded by positive constants $W_{cM}$, $\phi_{M}$, and $\epsilon_{cM}$, i.e., $\lVert W_{c,i}\rVert\leq W_{cM}$, $\lVert\phi_{i}\rVert\leq\phi_{M}$, and $\lVert\epsilon_{c,i}\rVert\leq\epsilon_{cM}$.

Theorem 3: Consider the consensus error dynamics (14) with the critic neural network (49). If the estimated weight matrix $\hat{W}_{c,i}$ is updated by (52), then the consensus error $e_{i}$ and the critic estimation error $\tilde{W}_{c,i}$ are UUB.

Refer to caption
Refer to caption
Figure 2: (a) The norms of attitude errors and angular velocity errors. The trajectories indicate the norms of attitude errors between the rigid body 11 and the rigid body i{2,3,4,5,6}i\in\{2,3,4,5,6\}. (b) The consensus errors of each rigid body δi\delta_{i}. Three subfigures show three components of the consensus error vector δi\delta_{i} of each agent, respectively.
Refer to caption
Refer to caption
Figure 3: (a) The original control inputs τi\tau_{i} of each rigid body. Three subfigures show three components of the control input τi\tau_{i} of each agent, respectively. (b) The control inputs ui\text{u}_{i} of the augmented systems. Three subfigures show three components of the control input ui\text{u}_{i} of augmented systems of each agent, respectively.
Figure 4: The critic estimated weight matrices; the trajectories show the norm of the weight matrix of each agent.

Proof: Two situations are considered: during the event-triggered intervals and at the event-triggered instants.

Situation 1: During the event-triggered intervals, i.e., t\in(t_{i}^{h},t_{i}^{h+1}).

Consider the Lyapunov function of the following form:

\displaystyle L_{i}=L_{i,1}+L_{i,2}, (55)

where L_{i,1}=e_{i}^{\top}e_{i}+V_{i}(e_{i}) and L_{i,2}=\text{tr}(\tilde{W}_{c,i}^{\top}\tilde{W}_{c,i})/l_{c,i}.

According to (53), we can obtain

\displaystyle\dot{L}_{i,2}=\frac{2\,\text{tr}(\tilde{W}_{c,i}^{\top}\dot{\tilde{W}}_{c,i})}{l_{c,i}}=0. (56)

Therefore, the first-order derivative of L_{i} can be expressed as follows:

\displaystyle\dot{L}_{i}=\dot{L}_{i,1}=2e_{i}^{\top}\dot{e}_{i}+\dot{V}_{i}(e_{i})
=2e_{i}^{\top}\big(X_{i}(e_{i})+l_{ii}Y_{i}\hat{u}_{i}-\sum_{j\in\mathcal{N}_{i}}a_{ij}Y_{j}\hat{u}_{j}\big)-e_{i}^{\top}Q_{i}e_{i}-\hat{u}_{i}^{\top}R_{i}\hat{u}_{i}
\leq\lVert e_{i}\rVert^{2}+\big\lVert X_{i}(e_{i})+l_{ii}Y_{i}\hat{u}_{i}-\sum_{j\in\mathcal{N}_{i}}a_{ij}Y_{j}\hat{u}_{j}\big\rVert^{2}-\lambda_{\textrm{min}}(Q_{i})\lVert e_{i}\rVert^{2}-\lambda_{\textrm{min}}(R_{i})\lVert\hat{u}_{i}\rVert^{2}
\leq\big(1+3X_{M}^{2}-\lambda_{\textrm{min}}(Q_{i})\big)\lVert e_{i}\rVert^{2}+3l_{ii}^{2}Y_{M}^{2}\lVert\hat{u}_{i}\rVert^{2}+3\sum_{j\in\mathcal{N}_{i}}a_{ij}^{2}Y_{M}^{2}\lVert\hat{u}_{j}\rVert^{2}-\lambda_{\textrm{min}}(R_{i})\lVert\hat{u}_{i}\rVert^{2}. (57)

In order to ensure \dot{L}_{i}<0, the following inequality should be satisfied:

\displaystyle\lVert e_{i}\rVert>\sqrt{\frac{\Phi_{i}}{\lambda_{\textrm{min}}(Q_{i})-1-3X_{M}^{2}}}, (58)

where \Phi_{i}=3l_{ii}^{2}Y_{M}^{2}\lVert\hat{u}_{i}\rVert^{2}+3\sum_{j\in\mathcal{N}_{i}}a_{ij}^{2}Y_{M}^{2}\lVert\hat{u}_{j}\rVert^{2}-\lambda_{\textrm{min}}(R_{i})\lVert\hat{u}_{i}\rVert^{2}, and the weight matrix Q_{i} is chosen such that \lambda_{\textrm{min}}(Q_{i})>1+3X_{M}^{2}.

Hence, the consensus error e_{i} is UUB. During the event-triggered intervals, the critic estimation error \tilde{W}_{c,i} remains unchanged, which means \tilde{W}_{c,i} is also UUB.

Situation 2: At the event-triggered instants, i.e., t=t_{i}^{h}.

Choosing the same Lyapunov function as (55), we can obtain:

\displaystyle\Delta L_{i}=\Delta L_{1,i}+\Delta L_{2,i}. (59)

Since the trajectory of e_{i} is continuous, we have e_{i}^{+}=e_{i}. Therefore, one has

\displaystyle\Delta L_{1,i}=(e_{i}^{+})^{\top}e_{i}^{+}+V_{i}(e_{i}^{+})-e_{i}^{\top}e_{i}-V_{i}(e_{i})=0. (60)

Next, according to (53), we have

\displaystyle\Delta L_{2,i}=\frac{\text{tr}\big[(\tilde{W}_{c,i}^{+})^{\top}\tilde{W}_{c,i}^{+}\big]}{l_{c,i}}-\frac{\text{tr}(\tilde{W}_{c,i}^{\top}\tilde{W}_{c,i})}{l_{c,i}}
=\frac{1}{l_{c,i}}\text{tr}\Big[\big(\tilde{W}_{c,i}-l_{c,i}k_{i}(k_{1,i}^{\top}\tilde{W}_{c,i}+\epsilon_{c,i})\big)^{\top}\big(\tilde{W}_{c,i}-l_{c,i}k_{i}(k_{1,i}^{\top}\tilde{W}_{c,i}+\epsilon_{c,i})\big)-\tilde{W}_{c,i}^{\top}\tilde{W}_{c,i}\Big]
=l_{c,i}\text{tr}\Big[\big(k_{i}(k_{1,i}^{\top}\tilde{W}_{c,i}+\epsilon_{c,i})\big)^{\top}\big(k_{i}(k_{1,i}^{\top}\tilde{W}_{c,i}+\epsilon_{c,i})\big)\Big]-2\text{tr}(\tilde{W}_{c,i}^{\top}k_{i}k_{1,i}^{\top}\tilde{W}_{c,i})-2\text{tr}(\tilde{W}_{c,i}^{\top}k_{i}\epsilon_{c,i})
\leq l_{c,i}\lVert k_{i}k_{1,i}^{\top}\tilde{W}_{c,i}+k_{i}\epsilon_{c,i}\rVert^{2}-2k_{1,i}^{\top}k_{i}\lVert\tilde{W}_{c,i}\rVert^{2}+2\lVert\tilde{W}_{c,i}^{\top}k_{i}\epsilon_{c,i}\rVert. (61)

From the definition of k_{1,i} and k_{i}, we can obtain the following inequalities:

\displaystyle\alpha_{k}\leq k_{1,i}^{\top}k_{i}\leq\beta_{k}, (62)
\displaystyle\lVert k_{i}\rVert\leq K_{M}, (63)

where \beta_{k}>\alpha_{k}>0 and K_{M}>0.

Substituting (62) and (63) into (61), and applying Young's inequality 2\lVert\tilde{W}_{c,i}^{\top}k_{i}\epsilon_{c,i}\rVert\leq\epsilon_{cM}(\lVert\tilde{W}_{c,i}\rVert^{2}+K_{M}^{2}), \Delta L_{2,i} becomes

\displaystyle\Delta L_{2,i}\leq 2l_{c,i}\beta_{k}^{2}\lVert\tilde{W}_{c,i}\rVert^{2}+2l_{c,i}\epsilon_{cM}^{2}K_{M}^{2}-2\alpha_{k}\lVert\tilde{W}_{c,i}\rVert^{2}+\epsilon_{cM}(\lVert\tilde{W}_{c,i}\rVert^{2}+K_{M}^{2})
\leq-(2\alpha_{k}-2l_{c,i}\beta_{k}^{2}-\epsilon_{cM})\lVert\tilde{W}_{c,i}\rVert^{2}+(2l_{c,i}\epsilon_{cM}^{2}+\epsilon_{cM})K_{M}^{2}. (64)

Combining (60) and (64), \Delta L_{i} can be transformed into the following form:

\displaystyle\Delta L_{i}=\Delta L_{1,i}+\Delta L_{2,i}
\leq-(2\alpha_{k}-2l_{c,i}\beta_{k}^{2}-\epsilon_{cM})\lVert\tilde{W}_{c,i}\rVert^{2}+(2l_{c,i}\epsilon_{cM}^{2}+\epsilon_{cM})K_{M}^{2}. (65)

In order to simplify the expression, some auxiliary variables are defined as follows:

\displaystyle A_{i}=2\alpha_{k}-2l_{c,i}\beta_{k}^{2}-\epsilon_{cM},
\displaystyle\Gamma_{i}=(2l_{c,i}\epsilon_{cM}^{2}+\epsilon_{cM})K_{M}^{2}.

Therefore, provided the learning rate l_{c,i} is chosen small enough that A_{i}>0, it can be deduced that \Delta L_{i}<0 whenever \lVert\tilde{W}_{c,i}\rVert>\sqrt{\Gamma_{i}/A_{i}}, which signifies that e_{i} and \tilde{W}_{c,i} are UUB at the event-triggered instants.

Combining Situation 1 with Situation 2, it is proved that the consensus error e_{i} and the critic estimation error \tilde{W}_{c,i} are UUB. This completes the proof. \hfill\blacksquare

Remark 3: The attitude consensus problem of multiple rigid body networks has been widely studied in the literature [12, 13, 14, 15]. However, the attitude consensus protocols proposed in [12, 13, 14, 15] all rely on known rigid body dynamics, which is a major limitation in practical applications. In this work, a model-free RL algorithm is proposed to solve the HJB equation of the optimal attitude consensus of multiple rigid body networks. Moreover, compared with the existing results on the model-free consensus problem of multi-agent networks [20, 21, 23], an event-triggered RL algorithm is proposed, which is further extended to a self-triggered RL algorithm. Based on Algorithm 1, the control update actions and the information interaction among agents are executed only at the triggering instants; a structural sketch of this loop is given below. Hence, the computation and communication resources can be significantly reduced compared with the continuous-time approaches [20, 21, 23].
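The following sketch assembles the pieces above into one self-triggered loop for a single agent, reusing the critic_update and optimal_control sketches given earlier. It is structural only: the agent object and the helper next_trigger_time, which stands in for the self-triggered condition (III-C), are hypothetical placeholders, not the paper's Algorithm 1 verbatim.

```python
import numpy as np

def run_agent(agent, T=40.0, dt=0.01):
    """Self-triggered RL loop of one agent (structural sketch only).

    `agent` is assumed to provide: measure() -> (e, e_dot), phi_grad(e),
    matrices Q, R, Y, scalars l_ii and lc, weight W_hat,
    step_dynamics(u, dt), and next_trigger_time(t, e) implementing the
    self-triggered condition (III-C) -- a hypothetical helper here.
    """
    t, t_next = 0.0, 0.0
    u = np.zeros(3)                              # zero-order-hold input
    while t < T:
        if t >= t_next:                          # event instant t_i^h
            e, e_dot = agent.measure()           # data only at the event
            Jac = agent.phi_grad(e)
            agent.W_hat = critic_update(agent.W_hat, Jac, e_dot, e, u,
                                        agent.Q, agent.R, agent.lc)
            u = optimal_control(agent.W_hat, Jac, agent.Y,
                                agent.l_ii, agent.R)
            t_next = agent.next_trigger_time(t, e)  # next event, computed now
        agent.step_dynamics(u, dt)               # plant evolves continuously
        t += dt
```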

V Simulation

This section presents a numerical simulation to verify the effectiveness of the proposed event-triggered reinforcement learning method. We consider a multiple rigid body network with six nodes under a strongly connected communication topology; the communication relationships among the nodes are shown in Fig. 1. The Laplacian matrix is selected as follows:

\displaystyle\mathcal{L}=\begin{bmatrix}4&0&0&0&0&-4\\ -4&8&0&0&0&-4\\ 0&-4&8&-4&0&0\\ 0&0&-4&8&0&-4\\ 0&0&0&-4&4&0\\ 0&0&0&0&-4&4\end{bmatrix}.
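Since the analysis relies on zero row sums and strong connectivity, both properties of this Laplacian can be verified numerically. The snippet below is a direct transcription of the matrix plus a reachability test; the check itself is our addition, not part of the paper's algorithm.

```python
import numpy as np

# Transcription of the Laplacian used in the simulation.
L = np.array([
    [ 4,  0,  0,  0,  0, -4],
    [-4,  8,  0,  0,  0, -4],
    [ 0, -4,  8, -4,  0,  0],
    [ 0,  0, -4,  8,  0, -4],
    [ 0,  0,  0, -4,  4,  0],
    [ 0,  0,  0,  0, -4,  4],
])
assert np.allclose(L.sum(axis=1), 0)            # valid graph Laplacian

# Strong connectivity: (A + I)^(n-1) must be entrywise positive,
# where A is the adjacency pattern taken from the off-diagonal entries.
A = (L - np.diag(np.diag(L)) != 0).astype(int)
reach = np.linalg.matrix_power(A + np.eye(6, dtype=int), 5)
assert (reach > 0).all()                        # every node reaches every node
```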
Figure 5: (a) Triggering instants of the dynamic event-triggered control of each agent. (b) Triggering instants of the self-triggered control of each agent.
Figure 6: (a) The self-triggered measurement errors and their upper bounds for each agent. (b) Comparison of the total number of triggering times and the minimum triggering intervals between the dynamic event-triggered control and the self-triggered control for each agent.

The rigid body i\in\{1,2,3,4,5,6\} can be modeled by the following equations:

\displaystyle\dot{\sigma}_{i}=G(\sigma_{i})\omega_{i},
\displaystyle J_{i}\dot{\omega}_{i}=-\omega_{i}\times(J_{i}\omega_{i})+\tau_{i},

in which \sigma_{i}=[\sigma_{i}^{(1)},\sigma_{i}^{(2)},\sigma_{i}^{(3)}]^{\top} denotes the attitude vector, \omega_{i}=[\omega_{i}^{(1)},\omega_{i}^{(2)},\omega_{i}^{(3)}]^{\top} denotes the angular velocity vector, and J_{i} denotes the inertia matrix. The inertia matrix of each rigid body is selected as follows:

\displaystyle J_{1}=J_{3}=[1.0\ 0.1\ 0.1;\ 0.1\ 1.0\ 0.1;\ 0.1\ 0.1\ 1.0],
\displaystyle J_{2}=J_{4}=[1.2\ 0.1\ 0.1;\ 0.1\ 0.9\ 0.1;\ 0.1\ 0.1\ 1.1],
\displaystyle J_{5}=J_{6}=[1.1\ 0.2\ 0.1;\ 0.2\ 1.0\ 0.3;\ 0.1\ 0.3\ 1.3].

In this simulation, the total duration is set to 40 seconds and the sampling period is 0.01 seconds. The parameters are selected as follows: \alpha_{i}=0.5, P=1, the weight matrices Q_{i}=4I_{6} and R_{i}=I_{3}, and the learning rate l_{c,i}=0.6. In the dynamic event-triggered condition (III-B), y_{i}(0)=4, \gamma_{i}=0.5, \kappa_{i}=0.5, \varpi_{i}=0.6, and \theta_{i}=2. In the self-triggered condition (III-C), the parameters remain the same except that \kappa_{i}=0 and \varpi_{i}=0. The initial states of each rigid body are given by:

\sigma_{i}=\begin{bmatrix}0.05i\\ -0.05i\\ 0.05i\end{bmatrix},\;\omega_{i}=\dot{\omega}_{i}=\begin{bmatrix}0\\ 0\\ 0\end{bmatrix},\;i=1,2,\ldots,6.

The critic activation function is designed as:

\displaystyle\phi_{i}(e_{i})=\big[(e_{i}^{1})^{2}\quad e_{i}^{1}e_{i}^{2}\quad e_{i}^{1}e_{i}^{3}\quad e_{i}^{1}e_{i}^{4}\quad e_{i}^{1}e_{i}^{5}\quad e_{i}^{1}e_{i}^{6}
(e_{i}^{2})^{2}\quad e_{i}^{2}e_{i}^{3}\quad e_{i}^{2}e_{i}^{4}\quad e_{i}^{2}e_{i}^{5}\quad e_{i}^{2}e_{i}^{6}\quad(e_{i}^{3})^{2}
e_{i}^{3}e_{i}^{4}\quad e_{i}^{3}e_{i}^{5}\quad e_{i}^{3}e_{i}^{6}\quad(e_{i}^{4})^{2}\quad e_{i}^{4}e_{i}^{5}\quad e_{i}^{4}e_{i}^{6}
(e_{i}^{5})^{2}\quad e_{i}^{5}e_{i}^{6}\quad(e_{i}^{6})^{2}\big]^{\top}\in\mathbb{R}^{21}.
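This basis and its gradient \nabla\phi_{i}, which the update law (52) requires, can be generated programmatically. The sketch below enumerates the 21 degree-2 monomials in the paper's order; the function names are ours.

```python
from itertools import combinations_with_replacement
import numpy as np

# All index pairs (a, b) with a <= b over e in R^6: 21 pairs, in the
# same order as the basis listed above.
PAIRS = list(combinations_with_replacement(range(6), 2))

def phi(e):
    """Quadratic critic basis phi_i(e), a vector in R^21."""
    return np.array([e[a] * e[b] for a, b in PAIRS])

def phi_grad(e):
    """Jacobian d(phi)/de, shape (21, 6): the row for e_a*e_b has entry
    e_b at column a and e_a at column b (2*e_a when a == b)."""
    Jac = np.zeros((21, 6))
    for row, (a, b) in enumerate(PAIRS):
        Jac[row, a] += e[b]
        Jac[row, b] += e[a]
    return Jac
```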

By using the model-free event-triggered RL method proposed above, the optimal attitude consensus problem for multiple rigid body networks is solved. Fig. 2(a) shows the norms of the attitude errors and angular velocity errors between rigid body 1 and rigid body i\in\{2,3,4,5,6\}. From Fig. 2(a), we can conclude that the optimal attitude consensus is achieved. The same conclusion can be drawn from Fig. 2(b), which shows the trajectories of the consensus errors. Figs. 3(a) and 3(b) show the original control inputs of each rigid body and the control inputs of the augmented systems, respectively. It is worth noting that the control inputs of the augmented systems are updated only at the event-triggered instants and remain unchanged during the event-triggered intervals.

Fig. 4 demonstrates the critic estimated weight matrix of each rigid body. It can be clearly seen that the neural networks are updated only at the event-triggered instants, which obviously reduces the consumption of computing resources. The triggering instants of the dynamic event-triggered control and the self-triggered control are illustrated in Figs. 5(a) and 5(b), respectively. It is clearly shown that both the control update actions and the communication frequency are significantly reduced compared with the continuous-time control approaches [20, 21, 23]. Fig. 6(a) shows the self-triggered measurement errors and their upper bounds, which determine the event-triggered instants. Fig. 6(b) presents the number of triggering times and the minimum triggering interval under the dynamic event-triggered condition and the self-triggered condition, respectively. Since the self-triggered measurement error \Delta_{i}(t) is an upper bound of the measurement error E_{i}(t), the number of triggering times under the self-triggered mechanism is larger than that under the dynamic event-triggered mechanism. Therefore, we can conclude that the self-triggered mechanism leads to an inevitable increase in the number of triggering times, which is the price paid for avoiding continuous communication with neighbors.

VI Conclusion

In this paper, a model-free event-triggered RL method is proposed to deal with the optimal attitude consensus for multiple rigid body networks, which only requires the measurement data at the event-triggered instants. In order to solve the HJB equations, an event-triggered PI algorithm is proposed to obtain the optimal policy. Meanwhile, the critic NN framework is used to approximate the optimal value function online. The critic neural network is updated only when the event-triggered condition is violated, which greatly reduces the consumption of computing and communication resources. The UUB of the consensus error and the weight estimation error is proved and the Zeno behavior is excluded. A numerical simulation for a multiple rigid body network with six nodes shows the feasibility of the proposed method.

In the future, we will further improve this work from the following perspectives. One consideration is to relax the conditions on the communication topology, for example from strongly connected graphs to directed spanning trees or even switching topologies [44]. Since actuator failures can occur in real applications of rigid bodies such as intelligent cars and quadrotor aircraft [45], it is also well motivated to consider the optimal cooperative control of rigid body systems with actuator failures.

References

  • [1] S. Ren, R. Mao and J. Wu, “Passivity-based leader-following consensus control for nonlinear multi-agent systems with fixed and switching topologies,” IEEE Transactions on Network Science and Engineering, vol. 6, no. 4, pp. 844-856, Oct. 2019.
  • [2] H. Hong, W. Yu, J. Fu and X. Yu, “A novel class of distributed fixed-time consensus protocols for second-order nonlinear and disturbed multi-agent systems,” IEEE Transactions on Network Science and Engineering, vol. 6, no. 4, pp. 760-772, Oct. 2019.
  • [3] D. Chen, X. Liu and W. Yu, “Finite-time fuzzy adaptive consensus for heterogeneous nonlinear multi-agent systems,” IEEE Transactions on Network Science and Engineering, vol. 7, no. 4, pp. 3057-3066, Oct. 2020.
  • [4] H. Li, J. Liu, R. W. Liu, N. Xiong, K. Wu and T. Kim, “A dimensionality reduction-based multi-step clustering method for robust vessel trajectory analysis,” Sensors, vol. 17, no. 8, pp. 1792, Aug. 2017.
  • [5] X. Shi, J. Cao, G. Wen and M. Perc, “Finite-time consensus of opinion dynamics and its applications to distributed optimization over digraph,” IEEE Transactions on Cybernetics, vol. 49, no. 10, pp. 3767-3779, Oct. 2019.
  • [6] H. Du, W. Zhu, G. Wen, Z. Duan and J. Lü, “Distributed formation control of multiple quadrotor aircraft based on nonsmooth consensus algorithms,” IEEE Transactions on Cybernetics, vol. 49, no. 1, pp. 342-353, Jan. 2019.
  • [7] Z. Li, Y. Tang, T. Huang and J. Kurths, “Formation control with mismatched orientation in multi-agent systems,” IEEE Transactions on Network Science and Engineering, vol. 6, no. 3, pp. 314-325, 1 Jul. 2019.
  • [8] X. Jin, W. Du, W. He, L. Kocarev, Y. Tang and J. Kurths, “Twisting-based finite-time consensus for Euler-Lagrange systems with an event-triggered strategy,” IEEE Transactions on Network Science and Engineering, vol. 7, no. 3, pp. 1007-1018, Jul. 2020.
  • [9] H. Zhang and P. Gurfil, “Cooperative orbital control of multiple satellites via consensus,” IEEE Transactions on Aerospace and Electronic Systems, vol. 54, no. 5, pp. 2171-2188, Oct. 2018.
  • [10] K. Zhang and M. A. Demetriou, “Adaptation of consensus penalty terms for attitude synchronization of spacecraft formation with unknown parameters,” 52nd IEEE Conference on Decision and Control, Florence, pp. 5491-5496, 2013.
  • [11] H. Qu, F. Yang, Q. Han and Y. Zhang, “Distributed H∞-consensus filtering for attitude tracking using ground-based radars,” IEEE Transactions on Cybernetics, to be published.
  • [12] A. Abdessameud and A. Tayebi, “Attitude synchronization of a group of spacecraft without velocity measurements,” IEEE Transactions on Automatic Control, vol. 54, no. 11, pp. 2642-2648, Nov. 2009.
  • [13] H. Cai, and J. Huang, “Leader-following attitude consensus of multiple rigid body networks by attitude feedback control,” Automatica, vol. 69, pp. 87-92, Jul. 2016.
  • [14] H. Gui, and A.H.J. de Ruiter, “Global finite-time attitude consensus of leader-following spacecraft systems based on distributed observers,” Automatica, vol. 91, pp. 225-232, May 2018.
  • [15] M. Lu and L. Liu, “Leader-following attitude consensus of multiple rigid spacecraft systems under switching networks,” IEEE Transactions on Automatic Control, vol. 65, no. 2, pp. 839-845, Feb. 2020.
  • [16] B. Yi, X. Shen, H. Liu, Z. Zhang, W. Zhang, S. Liu and N. Xiong, “Deep matrix factorization with implicit feedback embedding for recommendation system,” IEEE Transactions on Industrial Informatics, vol. 15, no. 8, pp. 4591-4601, Aug. 2019.
  • [17] B. Lin, F. Zhu, J. Zhang, J. Chen, X. Chen, N. Xiong and J. L. Mauri, “A time-driven data placement strategy for a scientific workflow combining edge computing and cloud computing,” IEEE Transactions on Industrial Informatics, vol. 15, no. 7, pp. 4254-4265, Jul. 2019.
  • [18] J. Sun, X. Wang, N. Xiong and J. Shao, “Learning sparse representation with variational auto-encoder for anomaly detection,” IEEE Access, vol. 6, pp. 33353-33361, 2018.
  • [19] K. G. Vamvoudakis, F. L. Lewis, and G. R. Hudas, “Multi-agent differential graphical games: Online adaptive learning solution for synchronization with optimality,” Automatica, vol. 48, no. 8, pp. 1598-1611, Aug. 2012.
  • [20] J. Li, H. Modares, T. Chai, F. L. Lewis and L. Xie, “Off-policy reinforcement learning for synchronization in multiagent graphical games,” IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 10, pp. 2434-2445, Oct. 2017.
  • [21] J. Qin, M. Li, Y. Shi, Q. Ma and W. X. Zheng, “Optimal synchronization control of multiagent systems with input saturation via off-policy reinforcement learning,” IEEE Transactions on Neural Networks and Learning Systems, vol. 30, no. 1, pp. 85-96, Jan. 2019.
  • [22] M. Abu-Khalaf and F. L. Lewis, “Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach,” Automatica, vol. 41, no. 5, pp. 779-791, May 2005.
  • [23] H. Zhang, J. H. Park and W. Zhao, “Model-free optimal consensus control of networked Euler-Lagrange systems,” IEEE Access, vol. 7, pp. 100771-100779, 2019.
  • [24] L. Dong, X. Zhong, C. Sun and H. He, “Event-triggered adaptive dynamic programming for continuous-time systems with control constraints,” IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 8, pp. 1941-1952, Aug. 2017.
  • [25] X. Zhong and H. He, “An event-triggered ADP control approach for continuous-time system with unknown internal states,” IEEE Transactions on Cybernetics, vol. 47, no. 3, pp. 683-694, Mar. 2017.
  • [26] Y. Zhu, D. Zhao, H. He and J. Ji, “Event-triggered optimal control for partially unknown constrained-input systems via adaptive dynamic programming,” IEEE Transactions on Industrial Electronics, vol. 64, no. 5, pp. 4101-4109, May 2017.
  • [27] Q. Zhang, D. Zhao and D. Wang, “Event-based robust control for uncertain nonlinear systems using adaptive dynamic programming,” IEEE Transactions on Neural Networks and Learning Systems, vol. 29, no. 1, pp. 37-50, Jan. 2018.
  • [28] W. Zhao and H. Zhang, “Distributed optimal coordination control for nonlinear multi-agent systems using event-triggered adaptive dynamic programming method,” ISA Transactions, vol. 91, pp. 184-195, Aug. 2019.
  • [29] W. Zhao, W. Yu and H. Zhang, “Event-triggered optimal consensus tracking control for multi-agent systems with unknown internal states and disturbances,” Nonlinear Analysis Hybrid Systems, vol. 33, pp. 227-248, Aug. 2019.
  • [30] Z. Shi and C. Zhou, “Distributed optimal consensus control for nonlinear multi-agent systems with input saturation based on event-triggered adaptive dynamic programming method,” International Journal of Control, to be published.
  • [31] A. Girard, “Dynamic triggering mechanisms for event-triggered control,” IEEE Transactions on Automatic Control, vol. 60, no. 7, pp. 1992-1997, Jul. 2015.
  • [32] X. Yi, K. Liu, D. V. Dimarogonas and K. H. Johansson, “Dynamic event-triggered and self-triggered control for multi-agent systems,” IEEE Transactions on Automatic Control, vol. 64, no. 8, pp. 3300-3307, Aug. 2019.
  • [33] X. Jin, Y. Shi, Y. Tang and X. Wu, “Event-triggered attitude consensus with absolute and relative attitude measurements,” Automatica, vol. 122, Art. No. 109245, Dec. 2020.
  • [34] S. Wang, X. Jin, S. Mao, A. V. Vasilakos and Y. Tang, “Model-free event-triggered optimal consensus control of multiple Euler-Lagrange systems via reinforcement learning,” IEEE Transactions on Network Science and Engineering, to be published.
  • [35] H. Schaub and J. L. Junkins, Analytical Mechanics of Space Systems, American Institute of Aeronautics and Astronautics, 2009.
  • [36] M. Lemmon, Networked Control Systems, Springer London, 2010.
  • [37] H. K. Khalil, Nonlinear Systems, Upper Saddle River, NJ, USA: Prentice Hall, 2002.
  • [38] Z. Jiang and Y. Jiang, “Robust adaptive dynamic programming for linear and nonlinear systems: An overview,” European Journal of Control, vol. 19, no. 5, pp. 417-425, Sept. 2013.
  • [39] W. Fang, X. Yao, X. Zhao, J. Yin and N. Xiong, “A stochastic control approach to maximize profit on service provisioning for mobile cloudlet platforms,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 48, no. 4, pp. 522-534, Apr. 2018.
  • [40] Y. Liu, B. Jiang, J. Lu, J. Cao and G. Lu, “Event-triggered sliding mode control for attitude stabilization of a rigid spacecraft,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 50, no. 9, pp. 3290-3299, Sept. 2020.
  • [41] B. Li, Y. Liu, K. I. Kou and L. Yu, “Event-triggered control for the disturbance decoupling problem of Boolean control networks,” IEEE Transactions on Cybernetics, vol. 48, no. 9, pp. 2764-2769, Sept. 2018.
  • [42] S. Zhu, Y. Liu, Y. Lou, et al., “Stabilization of logical control networks: An event-triggered control approach,” Science China Information Sciences, vol. 63, no. 1, pp. 1-11, 2020.
  • [43] Y.-J. Liu, Q. Zeng, S. Tong, C. L. P. Chen and L. Liu, “Adaptive neural network control for active suspension systems with time-varying vertical displacement and speed constraints,” IEEE Transactions on Industrial Electronics, vol. 66, no. 12, pp. 9458-9466, Dec. 2019.
  • [44] L. Liu, Y.-J. Liu, A. Chen, S. Tong and C. L. P. Chen, “Integral barrier Lyapunov function-based adaptive control for switched nonlinear systems,” Science China Information Sciences, vol. 63, no. 3, 2020.
  • [45] Y.-J. Liu, Q. Zeng, S. Tong, C. L. P. Chen and L. Liu, “Actuator failure compensation-based adaptive control of active suspension systems with prescribed performance,” IEEE Transactions on Industrial Electronics, vol. 67, no. 8, pp. 7044-7053, Aug. 2020.
  • [46] Y. Qu and N. Xiong, “RFH: A resilient, fault-tolerant and high-efficient replication algorithm for distributed cloud storage,” 2012 41st International Conference on Parallel Processing, pp. 520-529, 2012.
Xin Jin received the B.S. degree in school of automation from the Guangdong University of Technology, Guangzhou, China, in 2016. He was an exchange Ph.D. student at the University of Victoria, Victoria, Canada from Sept. 2019 to Sept. 2020. He is currently working toward the Ph.D. degree at the East China University of Science and Technology. His research interests include rigid body systems, multi-agent systems, event-triggered control and their applications.
Shuai Mao received the B.S. degree in school of control science and engineering from East China University of Science and Technology, in 2017. He is currently pursuing the Ph.D. degree at East China University of Science and Technology. His research interests include multi-agent systems, distributed optimization and their applications in practical engineering.
Ljupco Kocarev (Fellow, IEEE) is currently a member of the Macedonian Academy of Sciences and Arts, a Full Professor with the Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University, Skopje, Macedonia, the Director of the Research Center for Computer Science and Information Technologies, Macedonian Academy, and a Research Professor with the University of California at San Diego. His work has been supported by the Macedonian Ministry of Education and Science, the Macedonian Academy of Sciences and Arts, NSF, AFOSR, DoE, ONR, ONR Global, NIH, STMicroelectronics, NATO, TEMPUS, FP6, FP7, Horizon 2020, and agencies from Spain, Italy, Germany (DAAD and DFG), Hong Kong, and Hungary. His scientific interests include networks, nonlinear systems and circuits, dynamical systems and mathematical modeling, machine learning, and computational biology.
Chen Liang is currently working as a Research Assistant at the Key Laboratory of Smart Manufacturing in Energy Chemical Process, Ministry of Education, and a Faculty Member of the School of Information at East China University of Science and Technology. She received her Master's degree in Computer Applied Technology from Shanghai Normal University in 2013. Her research interests include multi-agent systems, reinforcement learning and networks.
Saiwei Wang received the B.S. degree in school of control science and engineering from East China University of Science and Technology, Shanghai, China, in 2018. He is currently pursuing the M.S. degree at East China University of Science and Technology. His research interests include multi-agent systems, reinforcement learning and their applications.
Yang Tang (Senior Member, IEEE) received the B.S. and Ph.D. degrees in electrical engineering from Donghua University, Shanghai, China, in 2006 and 2010, respectively. From 2008 to 2010, he was a Research Associate with The Hong Kong Polytechnic University, Hong Kong. From 2011 to 2015, he was a Post-Doctoral Researcher with the Humboldt University of Berlin, Berlin, Germany, and with the Potsdam Institute for Climate Impact Research, Potsdam, Germany. Since 2015, he has been a Professor with the East China University of Science and Technology, Shanghai. His current research interests include distributed estimation/control/optimization, cyber-physical systems, hybrid dynamical systems, computer vision, reinforcement learning and their applications. Prof. Tang was a recipient of the Alexander von Humboldt Fellowship and the ISI Highly Cited Researchers Award by Clarivate Analytics from 2017. He is a Senior Board Member of Scientific Reports, an Associate Editor of IEEE Transactions on Neural Networks and Learning Systems, IEEE Transactions on Emerging Topics in Computational Intelligence, IEEE Transactions on Circuits and Systems I: Regular Papers and IEEE Systems Journal, etc.