
Event-Triggered Optimal Attitude Consensus
of Multiple Rigid Body Networks with
Unknown Dynamics

Xin Jin, Shuai Mao, Ljupco Kocarev, Chen Liang, Saiwei Wang,
and Yang Tang
This work was supported in part by the National Key Research and Development Program of China under Grant 2018YFC0809302, in part by the National Natural Science Foundation of China under Grant 61751305, in part by the Program of Shanghai Academic Research Leader under Grant 20XD1401300, and in part by the Programme of Introducing Talents of Discipline to Universities (the 111 Project) under Grant B17017. (Corresponding author: Yang Tang.) Xin Jin, Shuai Mao, Saiwei Wang, Chen Liang and Yang Tang are with the Key Laboratory of Smart Manufacturing in Energy Chemical Process, Ministry of Education, East China University of Science and Technology, Shanghai 200237, China (e-mail: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]). Ljupco Kocarev is with the Macedonian Academy of Sciences and Arts, 1000 Skopje, Macedonia, also with the Faculty of Computer Science and Engineering, Univerzitet “Sv. Kiril i Metodij,” 1000 Skopje, Macedonia, and also with the BioCircuits Institute, University of California at San Diego, La Jolla, CA 92093 USA (e-mail: [email protected]).
Abstract

In this paper, an event-triggered Reinforcement Learning (RL) method is proposed for the optimal attitude consensus of multiple rigid body networks with unknown dynamics. First, the consensus error is constructed through the attitude dynamics. According to the Bellman optimality principle, the implicit form of the optimal controller and the corresponding Hamilton-Jacobi-Bellman (HJB) equations are obtained. Owing to the augmented system, the optimal controller can be obtained directly without relying on the system dynamics. Second, a self-triggered mechanism is applied to reduce the computing and communication burden of updating the controller. To address the difficulty of solving the HJB equations analytically, an RL method is proposed that requires only measurement data at the event-triggered instants. For each agent, only one neural network is designed to approximate the optimal value function, and it is updated only at the event-triggered instants. Meanwhile, the closed-loop system is shown to be Uniformly Ultimately Bounded (UUB), and Zeno behavior is excluded. Finally, simulation results on a multiple rigid body network demonstrate the validity of the proposed method.

Index Terms:
Optimal attitude consensus, multiple rigid body networks, event-triggered control, reinforcement learning

I Introduction

Consensus control, as a fundamental coordination problem in multi-agent systems, aims to design a control protocol for each agent that drives the states of all agents to synchronization [1], [3, 5]. Over recent decades, the attitude consensus problem of multiple rigid body networks has received increasing attention [4] because it plays a significant role in many fields, such as formation control in three-dimensional space [6], [7], cooperation of multi-manipulators [8], and satellite networks [9]. Existing results can be classified into two categories: leaderless attitude consensus [10]-[12] and leader-follower attitude consensus [13]-[15]. Note that none of them considers the performance cost incurred in achieving attitude consensus.

In practical applications, the performance cost is a factor that must be considered, since it affects the efficiency of mission completion and the endurance of limited resources. Optimal attitude consensus control not only drives the attitudes of all rigid body systems to synchronization, but also minimizes the performance cost. In general, the optimal control problem can be transformed into solving the Hamilton-Jacobi-Bellman (HJB) equations. Nevertheless, it is very difficult to find analytic solutions to the HJB equations. With the popularity of reinforcement learning techniques [16, 17, 18] and the rapid growth of processor computing capacity, reinforcement learning based research on the optimal consensus problem has emerged. To the best of our knowledge, the vast majority of the systems studied are linear [19]-[21] or first-order nonlinear [22]. Among these results, knowledge of the system dynamics is required in [19], [22], while the methods in [20] and [21] circumvent the dependence on system dynamics. However, the algorithms in [20] and [21] require measurement data to be acquired in advance and involve many tedious integration operations, which obviously increases the computational burden of the system [46]. At present, relatively few results apply reinforcement learning to the optimal attitude consensus of multiple rigid body networks. In [23], a model-free algorithm is proposed for the optimal consensus of multiple rigid body networks, in which each rigid body is modeled by an Euler-Lagrange equation. However, an extra neural network-based observer is designed to estimate the system dynamics, which imposes additional computational burden. Motivated by these factors, we aim to design a method that needs only real-time measurement data to achieve the optimal attitude consensus of multiple rigid body networks with unknown dynamics.

Updating the controller and the neural networks at every sampling instant of a reinforcement learning method consumes considerable computing and communication resources, especially when the system scale is large. Therefore, it is necessary to integrate an event-triggered mechanism into the reinforcement learning method to reduce this consumption. In recent years, event-triggered control schemes have been widely studied to save control cost and energy resources [41, 42]. In [24]-[27], the event-triggered mechanism is introduced to solve the optimal control of an individual system. The optimal consensus of multi-agent systems is considered in [28]-[30] by using event-triggered reinforcement learning methods. However, the event-triggered conditions in all of the above methods [24]-[30] involve continuous information. Therefore, all agents need to obtain their own and their neighbors' state information in real time to determine whether the event-triggered condition is satisfied, which inevitably increases the communication load. Inspired by [31]-[33], we aim to design an event-triggered reinforcement learning method under a self-triggered mechanism, thereby greatly reducing the consumption of computing and communication resources. Compared with common linear systems and first-order nonlinear systems [19]-[22], it is challenging to combine the self-triggered mechanism with reinforcement learning to solve the optimal attitude consensus problem of multiple rigid body networks, since the dynamic model of a rigid body is a second-order system with state-coupled characteristics and the underlying attitude configuration space is non-Euclidean.

In this paper, we deal with the optimal attitude consensus problem for multiple rigid body networks with unknown system dynamics. A dynamic event-triggered mechanism is first introduced, which significantly reduces the computing resources consumed by controller updates. Based on the dynamic event-triggered condition, a sufficient self-triggered condition is proposed. Under the self-triggered mechanism, continuous communication between rigid bodies is avoided. Moreover, a reinforcement learning method is used to obtain the optimal policy. In detail, each rigid body needs only one neural network to approximate the optimal value function, thanks to the augmented system [43]. Each neural network is updated only when the self-triggered condition is violated. The main contributions are as follows:

1) By using only the measurement data at the event-triggered instants, we achieve the optimal attitude consensus of multiple rigid body networks with unknown system dynamics. Neither an additional actor neural network [20], [21] nor a neural network-based observer [23] is used in this paper, which obviously reduces the complexity of the algorithm implementation.

2) Compared with the results in [23] and [34], both a dynamic event-triggered condition and a self-triggered condition are integrated into the proposed reinforcement learning based method to solve the optimal attitude consensus of multiple rigid body networks. Under the self-triggered mechanism, the neural networks are updated only at the event-triggered instants. Meanwhile, continuous communication is also avoided. Therefore, the consumption of computing and communication resources is greatly reduced.

The rest of this paper is organized as follows. Section II introduces the notation and basics of graph theory. The model-free optimal attitude consensus problem is formulated in Section III, where the event-triggered mechanism is also introduced. An event-triggered reinforcement learning method is designed in Section IV. The method is verified through a simulation in Section V. Section VI concludes the paper.

II Preliminaries

II-A Notations

Throughout this paper, $\mathbb{R}$ denotes the set of all real numbers, $\mathbb{R}_{>0}$ the set of all positive real numbers, $\mathbb{N}$ the set of all non-negative integers, and $\mathbb{N}_{>0}$ the set of all positive integers, i.e., $\mathbb{R}=(-\infty,+\infty)$, $\mathbb{R}_{>0}=(0,+\infty)$, $\mathbb{N}=\{0,1,2,...\}$ and $\mathbb{N}_{>0}=\{1,2,...\}$. $x\in\mathbb{R}^{n}$ indicates an $n$-dimensional vector, $I_{n}$ the $n$-dimensional identity matrix, and $A\in\mathbb{R}^{n\times m}$ an $n\times m$ matrix. For a vector $x$, its Euclidean norm is defined as $\lVert x\rVert=\sqrt{x^{\top}x}$. For a square matrix $B=[b_{ij}]\in\mathbb{R}^{n\times n}$, its trace is defined as $\textrm{tr}(B)=\sum_{i=1}^{n}b_{ii}$ and its Frobenius norm as $\lVert B\rVert=\sqrt{\sum_{i=1}^{n}\sum_{j=1}^{n}|b_{ij}|^{2}}$. $\lambda_{\textrm{min}}(B)$ and $\lambda_{\textrm{max}}(B)$ denote the minimum and maximum eigenvalues of $B$, respectively. $B>0$ ($B\geq 0$) indicates that $B$ is positive (semi-positive) definite.

For any two vectors $\xi=[x_{1},y_{1},z_{1}]^{\top}\in\mathbb{R}^{3}$ and $\zeta=[x_{2},y_{2},z_{2}]^{\top}\in\mathbb{R}^{3}$, their cross product is expressed as follows:

$$\xi\times\zeta=\begin{bmatrix}y_{1}z_{2}-y_{2}z_{1}\\ x_{2}z_{1}-x_{1}z_{2}\\ x_{1}y_{2}-x_{2}y_{1}\end{bmatrix}\in\mathbb{R}^{3}.$$

II-B Graph Theory

Let $\mathcal{G}=(\mathcal{V},\mathcal{E})$ represent the directed communication graph among $N\in\mathbb{N}_{>0}$ rigid bodies, where $\mathcal{V}=\{1,2,...,N\}$ indicates the set of all rigid bodies and $\mathcal{E}\subseteq\mathcal{V}\times\mathcal{V}$ indicates the communication relationship between any two rigid bodies. For rigid body $i$, we use $\mathcal{N}_{i}=\{j\in\mathcal{V}\,|\,(j,i)\in\mathcal{E}\}$ to represent the set of its neighbors. If there is a directed path between any two rigid bodies, the communication graph is called strongly connected. In this paper, we suppose that all directed communication graphs are strongly connected.

In order to express the communication relationship between all rigid bodies more clearly, the weighted adjacency matrix $A=[a_{ij}]\in\mathbb{R}^{N\times N}$ is introduced. If rigid body $i$ can receive the data transmitted by rigid body $j$, then $a_{ij}>0$, and $a_{ij}=0$ otherwise. The in-degree matrix of the directed communication graph can be expressed as $D=\text{diag}\{d_{1},d_{2},...,d_{N}\}\in\mathbb{R}^{N\times N}$, where $d_{i}=\sum_{j\in\mathcal{N}_{i}}a_{ij}$. Let $\mathcal{L}=D-A=[l_{ij}]$ represent the Laplacian matrix, where $l_{ii}=d_{i}$ and $l_{ij}=-a_{ij}$ when $i\neq j$.
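To make the construction concrete, the graph quantities above translate directly into a few lines of code. The following is a minimal NumPy sketch; the three-node edge set is purely illustrative and is not the topology used in the paper's simulation:

```python
import numpy as np

# Weighted adjacency matrix: a_ij > 0 iff rigid body i receives data from j.
# The edge weights and topology here are illustrative only.
A = np.array([[0., 1., 0.],
              [0., 0., 1.],
              [1., 0., 0.]])
D = np.diag(A.sum(axis=1))   # in-degree matrix, d_i = sum_{j in N_i} a_ij
L = D - A                    # Laplacian: l_ii = d_i, l_ij = -a_ij for i != j
```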

III Model-Free Event-Triggered Optimal Attitude Consensus

III-A Model-Free Optimal Attitude Consensus

We consider a multiple rigid body network with $N$ nodes, where the attitude of each node is expressed by Modified Rodrigues Parameters (MRPs) [35]. For rigid body $i$, the attitude is represented by $\sigma_{i}=[\sigma_{i}^{1},\sigma_{i}^{2},\sigma_{i}^{3}]^{\top}=\Psi_{i}\tan\frac{\Phi_{i}}{4}\in\mathbb{R}^{3}$, where $\Psi_{i}\in\mathbb{R}^{3}$ indicates the Euler axis and $\Phi_{i}\in[0,\pi)$ denotes the rotation angle about the Euler axis.

Then, the attitude dynamics of each rigid body is given in the following form:

$$\dot{\sigma}_{i}=G(\sigma_{i})\omega_{i},\qquad(1a)$$
$$J_{i}\dot{\omega}_{i}=-\omega_{i}\times(J_{i}\omega_{i})+\tau_{i},\quad i=1,2,...,N,\qquad(1b)$$

where $\omega_{i}\in\mathbb{R}^{3}$, $J_{i}\in\mathbb{R}^{3\times 3}$ and $\tau_{i}\in\mathbb{R}^{3}$ indicate the angular velocity vector, the inertia matrix and the control input torque, respectively. The matrix $G(\sigma_{i})=\frac{1}{2}(\sigma_{i}^{\times}+\sigma_{i}\sigma_{i}^{\top}+\frac{1-\sigma_{i}^{\top}\sigma_{i}}{2}I_{3})\in\mathbb{R}^{3\times 3}$, where

$$\sigma_{i}^{\times}=\begin{bmatrix}0&-\sigma_{i}^{3}&\sigma_{i}^{2}\\ \sigma_{i}^{3}&0&-\sigma_{i}^{1}\\ -\sigma_{i}^{2}&\sigma_{i}^{1}&0\end{bmatrix}.$$
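As a quick illustration of the attitude kinematics, the matrices $\sigma_{i}^{\times}$ and $G(\sigma_{i})$ can be coded directly from their definitions. This is a minimal NumPy sketch; the function names are our own:

```python
import numpy as np

def skew(s):
    """Skew-symmetric cross-product matrix sigma^x defined above."""
    return np.array([[0.0, -s[2], s[1]],
                     [s[2], 0.0, -s[0]],
                     [-s[1], s[0], 0.0]])

def G(sigma):
    """Kinematics matrix G(sigma_i) appearing in Eq. (1a)."""
    return 0.5 * (skew(sigma) + np.outer(sigma, sigma)
                  + 0.5 * (1.0 - sigma @ sigma) * np.eye(3))

# Eq. (1a) then reads: sigma_dot = G(sigma) @ omega
```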

Definition 1: Given that the communication topology of a multiple rigid body network (1) is strongly connected, the attitude consensus is said to be achieved when the following conditions hold:

$$\lim_{t\to\infty}\big\lVert\sigma_{i}(t)-\sigma_{j}(t)\big\rVert=0,\qquad(2a)$$
$$\lim_{t\to\infty}\big\lVert\omega_{i}(t)-\omega_{j}(t)\big\rVert=0,\ \forall i,j\in\mathcal{V}.\qquad(2b)$$

Considering the communication topology among these $N$ rigid bodies, we can define the following consensus error for rigid body $i$:

$$\delta_{i}=\sum_{j\in\mathcal{N}_{i}}a_{ij}(\omega_{i}-\omega_{j})+\alpha_{i}\sum_{j\in\mathcal{N}_{i}}a_{ij}(\sigma_{i}-\sigma_{j}),\qquad(3)$$

where $\alpha_{i}\in\mathbb{R}_{>0}$. When $\lim_{t\to\infty}\delta_{i}=0$, $i=1,2,...,N$, we can easily obtain that $\sigma_{1}=\sigma_{2}=...=\sigma_{N}$ and $\omega_{1}=\omega_{2}=...=\omega_{N}$ as $t\to\infty$. That is to say, attitude consensus is achieved.
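A sketch of Eq. (3) in code may help fix the indexing. Here sigma and omega are assumed to be $N\times 3$ arrays of MRPs and angular velocities, A the weighted adjacency matrix, and alpha a vector of positive gains; all names are illustrative:

```python
import numpy as np

def consensus_error(i, sigma, omega, A, alpha):
    """Consensus error delta_i of Eq. (3) for rigid body i."""
    delta = np.zeros(3)
    for j in range(A.shape[0]):
        if A[i, j] > 0:  # j is a neighbor of i
            delta += A[i, j] * ((omega[i] - omega[j])
                                + alpha[i] * (sigma[i] - sigma[j]))
    return delta
```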

The dynamics of $\delta_{i}$ can be obtained by taking the derivative of Eq. (3):

$$\dot{\delta}_{i}=\sum_{j\in\mathcal{N}_{i}}a_{ij}(\dot{\omega}_{i}-\dot{\omega}_{j})+\alpha_{i}\sum_{j\in\mathcal{N}_{i}}a_{ij}(\dot{\sigma}_{i}-\dot{\sigma}_{j})=\varGamma_{i}(\delta_{i})+l_{ii}J_{i}^{-1}\tau_{i}-\sum_{j\in\mathcal{N}_{i}}a_{ij}J_{j}^{-1}\tau_{j},\qquad(4)$$

where $\varGamma_{i}(\delta_{i})=\alpha_{i}\sum_{j\in\mathcal{N}_{i}}a_{ij}(\dot{\sigma}_{i}-\dot{\sigma}_{j})+\sum_{j\in\mathcal{N}_{i}}a_{ij}\big[-J_{i}^{-1}\big((G^{-1}(\sigma_{i})\dot{\sigma}_{i})\times(J_{i}G^{-1}(\sigma_{i})\dot{\sigma}_{i})\big)+J_{j}^{-1}\big((G^{-1}(\sigma_{j})\dot{\sigma}_{j})\times(J_{j}G^{-1}(\sigma_{j})\dot{\sigma}_{j})\big)\big]$.

In order to overcome the dependence on model information, a compensator is introduced, which can be expressed by the following affine differential equation:

$$\dot{\tau}_{i}=f(\tau_{i})+l_{ii}g(\tau_{i})u_{i}-\sum_{j\in\mathcal{N}_{i}}a_{ij}g(\tau_{j})u_{j},\qquad(5)$$

where $f(\cdot):\mathbb{R}^{3}\to\mathbb{R}^{3}$ and $g(\cdot):\mathbb{R}^{3}\to\mathbb{R}^{3\times 3}$ are two functions to be designed later, and $u_{i}\in\mathbb{R}^{3}$ is the control input of the compensator. We need to choose appropriate functions $f(\cdot)$ and $g(\cdot)$ to ensure that the compensator is controllable. In this paper, a feasible pair of $f(\cdot)$ and $g(\cdot)$ is given as follows:

$$f(\tau_{i})=-2\tau_{i},\qquad(6a)$$
$$g(\tau_{i})=\text{diag}\{\cos^{2}(\tau_{i}^{1}),\cos^{2}(\tau_{i}^{2}),\cos^{2}(\tau_{i}^{3})\}.\qquad(6b)$$

By combining the consensus error $\delta_{i}$ and the control input torque $\tau_{i}$, we define an augmented consensus error $e_{i}=[\delta_{i}^{\top},\tau_{i}^{\top}]^{\top}\in\mathbb{R}^{6}$. According to Eq. (4) and Eq. (5), the dynamics of the augmented consensus error can be described by the following augmented system:

$$\dot{e}_{i}=X_{i}(e_{i})+l_{ii}Y_{i}(e_{i})u_{i}-\sum_{j\in\mathcal{N}_{i}}a_{ij}Y_{j}(e_{j})u_{j},\qquad(7)$$

where $X_{i}(e_{i})$ and $Y_{i}(e_{i})$ are represented as follows:

$$X_{i}(e_{i})=\begin{bmatrix}\varGamma_{i}(\delta_{i})+l_{ii}J_{i}^{-1}\tau_{i}-\sum_{j\in\mathcal{N}_{i}}a_{ij}J_{j}^{-1}\tau_{j}\\ f(\tau_{i})\end{bmatrix}\in\mathbb{R}^{6},$$
$$Y_{i}(e_{i})=\begin{bmatrix}0\\ g(\tau_{i})\end{bmatrix}\in\mathbb{R}^{6\times 3}.$$
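A compact sketch of the augmented dynamics (5)-(7), under the feasible pair (6), is given below. Gamma_i stands for the drift term of Eq. (4) and is supplied by the caller; J_inv, tau and u are per-agent lists of inertia inverses, torques and compensator inputs (all names illustrative):

```python
import numpy as np

def f(tau):                        # compensator drift, Eq. (6a)
    return -2.0 * tau

def g(tau):                        # compensator input gain, Eq. (6b)
    return np.diag(np.cos(tau) ** 2)

def e_dot(Gamma_i, J_inv, tau, u, i, A):
    """Augmented consensus-error dynamics of Eq. (7) for rigid body i."""
    l_ii = A[i].sum()              # l_ii = d_i = sum_j a_ij
    delta_dot = Gamma_i + l_ii * J_inv[i] @ tau[i]
    tau_dot = f(tau[i]) + l_ii * g(tau[i]) @ u[i]
    for j in range(A.shape[0]):
        if A[i, j] > 0:
            delta_dot -= A[i, j] * J_inv[j] @ tau[j]
            tau_dot -= A[i, j] * g(tau[j]) @ u[j]
    return np.concatenate([delta_dot, tau_dot])
```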

Assumption 1: The matrix $X_{i}(e_{i})$ is bounded, i.e., $\forall i\in\mathcal{V}$, $\lVert X_{i}(e_{i})\rVert\leq X_{M}\lVert e_{i}\rVert$ is satisfied, where $X_{M}\in\mathbb{R}_{>0}$.

In order to measure the performance cost of implementing the attitude consensus, a performance function is defined in the following form:

$$F_{i}(e_{i}(0),u_{i},u_{-i})=\int_{0}^{\infty}\big(e_{i}^{\top}Q_{i}e_{i}+u_{i}^{\top}R_{i}u_{i}\big)dt,\qquad(8)$$

where $u_{-i}=\{u_{j}\,|\,j\in\mathcal{N}_{i}\}$ indicates the set of control inputs of the neighbors of rigid body $i$, $Q_{i}\in\mathbb{R}^{6\times 6}$, $Q_{i}\geq 0$, $R_{i}\in\mathbb{R}^{3\times 3}$ and $R_{i}>0$. According to Eq. (7), we can conclude that $e_{i}$ is driven by $u_{i}$ and $u_{-i}$. Therefore, the left side of Eq. (8) also contains $u_{-i}$.

According to (8), the value function can be defined as

$$V_{i}(e_{i}(t))=\int_{t}^{\infty}\big(e_{i}(v)^{\top}Q_{i}e_{i}(v)+u_{i}(v)^{\top}R_{i}u_{i}(v)\big)dv.\qquad(9)$$

By taking the derivative of Eq. (9), we can obtain the Hamiltonian function in the following form:

$$H_{i}(e_{i},\nabla V_{i},u_{i},u_{-i})=e_{i}^{\top}Q_{i}e_{i}+u_{i}^{\top}R_{i}u_{i}+\nabla V_{i}^{\top}\Big(X_{i}+l_{ii}Y_{i}u_{i}-\sum_{j\in\mathcal{N}_{i}}a_{ij}Y_{j}u_{j}\Big)=0,\qquad(10)$$

where $\nabla V_{i}=\frac{\partial V_{i}}{\partial e_{i}}$.

The implicit form of the model-free optimal controller $u_{i}^{*}$ can be derived from $\frac{\partial H_{i}}{\partial u_{i}}=0$:

$$u_{i}^{*}=-\frac{1}{2}l_{ii}R_{i}^{-1}Y_{i}^{\top}\nabla V_{i}^{*},\qquad(11)$$

where $V_{i}^{*}$ indicates the optimal value function and $\nabla V_{i}^{*}=\frac{\partial V_{i}^{*}}{\partial e_{i}}$.
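For completeness, (11) follows from the first-order stationarity condition of the Hamiltonian (10), which is quadratic in $u_{i}$:

$$\frac{\partial H_{i}}{\partial u_{i}}=2R_{i}u_{i}+l_{ii}Y_{i}^{\top}\nabla V_{i}=0\;\Longrightarrow\;u_{i}^{*}=-\frac{1}{2}l_{ii}R_{i}^{-1}Y_{i}^{\top}\nabla V_{i}^{*}.$$

Since $\partial^{2}H_{i}/\partial u_{i}^{2}=2R_{i}>0$, this stationary point is indeed the minimizer.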

Combining (10) and (11), we can derive the HJB equation for rigid body $i$ as follows:

$$H_{i}(e_{i},\nabla V_{i}^{*},u_{i}^{*},u_{-i}^{*})=e_{i}^{\top}Q_{i}e_{i}+(u_{i}^{*})^{\top}R_{i}u_{i}^{*}+(\nabla V_{i}^{*})^{\top}\Big(X_{i}+l_{ii}Y_{i}u_{i}^{*}-\sum_{j\in\mathcal{N}_{i}}a_{ij}Y_{j}u_{j}^{*}\Big)$$
$$=e_{i}^{\top}Q_{i}e_{i}-\frac{1}{4}l_{ii}^{2}(\nabla V_{i}^{*})^{\top}Y_{i}R_{i}^{-1}Y_{i}^{\top}\nabla V_{i}^{*}+(\nabla V_{i}^{*})^{\top}\Big(X_{i}+\frac{1}{2}\sum_{j\in\mathcal{N}_{i}}a_{ij}l_{jj}Y_{j}R_{j}^{-1}Y_{j}^{\top}\nabla V_{j}^{*}\Big)=0.\qquad(12)$$

From Eq. (11), we can observe that the optimal controller contains $Y_{i}$ and $V_{i}^{*}$. According to the definition of the augmented system (7), $Y_{i}$ is known by construction, which overcomes the dependence on system dynamics. Therefore, we only need to obtain the optimal value function $V_{i}^{*}$ from Eq. (12) to construct the optimal controller.

III-B Dynamic Event-Triggered Mechanism

For the purpose of reducing the computational burden of updating the controller, the event-triggered mechanism is introduced. Under the event-triggered mechanism, we only update the controller at a series of discrete instants $\{t_{i}^{h}\}_{h\in\mathbb{N}}$, where $t_{i}^{h}<t_{i}^{h+1}$ holds for all $h\in\mathbb{N}$ and the initial instant is set as $t_{i}^{0}=0$.

We define the difference between the augmented consensus error at the last event-triggered instant and its real-time value as the measurement error $E_{i}(t)\in\mathbb{R}^{6}$:

$$E_{i}(t)=e_{i}(t_{i}^{h})-e_{i}(t),\ \forall t\in[t_{i}^{h},t_{i}^{h+1}).\qquad(13)$$

For rigid body $i$, the controller $u_{i}$ is only updated at the event-triggered instants $t_{i}^{h}$ and remains unchanged until a new event is triggered. During the event-triggered interval $[t_{i}^{h},t_{i}^{h+1})$, a neighbor $j$ of rigid body $i$ might update its controller at some instants $t_{j}^{h^{\prime}}$, so its control input behaves as a piecewise constant. Letting $\mu_{j}$ indicate the number of triggering instants of neighbor $j$ in this interval, we have $t_{j}^{h^{\prime}}\in\{t_{j}^{0},t_{j}^{1},...,t_{j}^{\mu_{j}}\}$ with $t_{j}^{0}=t_{i}^{h}$.

Therefore, the $e_{i}$-dynamics (7) during $[t_{i}^{h},t_{i}^{h+1})$ is represented as follows:

$$\dot{e}_{i}=X_{i}+l_{ii}Y_{i}\hat{u}_{i}-\sum_{j\in\mathcal{N}_{i}}a_{ij}Y_{j}\hat{u}_{j},\qquad(14)$$

where $\hat{u}_{i}=u_{i}(t_{i}^{h})$ and $\hat{u}_{j}=u_{j}(t_{j}^{h^{\prime}})$ are the control input vectors at the event-triggered instants.

Definition 2 (Event-Triggered Admissible Control): If $u_{i}$ is piecewise continuous, $u_{i}(0)=0$, the $e_{i}$-dynamics (14) is stable, and the performance function (8) is finite, then $u_{i}$ is called an event-triggered admissible control.

Based on the form of the optimal controller (11), we can obtain the event-triggered optimal controller as follows:

$$\hat{u}_{i}^{*}=u_{i}^{*}(t_{i}^{h})=-\frac{1}{2}l_{ii}R_{i}^{-1}Y_{i}^{\top}\nabla\hat{V}_{i}^{*},\ \forall t\in[t_{i}^{h},t_{i}^{h+1}),\qquad(15)$$

where $\nabla\hat{V}_{i}^{*}=\frac{\partial V_{i}^{*}}{\partial e_{i}}(t_{i}^{h})$.

By combining (10) and (15), the event-triggered HJB equation for rigid body $i$ is given as follows:

$$H_{i}(e_{i},\nabla\hat{V}_{i}^{*},\hat{u}_{i}^{*},\hat{u}_{-i}^{*})=e_{i}^{\top}Q_{i}e_{i}-\frac{1}{4}l_{ii}^{2}(\nabla\hat{V}_{i}^{*})^{\top}Y_{i}R_{i}^{-1}Y_{i}^{\top}\nabla\hat{V}_{i}^{*}+(\nabla\hat{V}_{i}^{*})^{\top}\Big(X_{i}+\frac{1}{2}\sum_{j\in\mathcal{N}_{i}}a_{ij}l_{jj}Y_{j}R_{j}^{-1}Y_{j}^{\top}\nabla\hat{V}_{j}^{*}\Big)=0.\qquad(16)$$

The following assumption is proposed for proving the stability of system (14).

Assumption 2 [36]: The controller $u_{i}$ is Lipschitz continuous during the time interval $[t_{i}^{h},t_{i}^{h+1})$, i.e., there exists a constant $P$ such that

$$\big\lVert u_{i}\big(e_{i}(t_{i}^{h})\big)-u_{i}\big(e_{i}(t)\big)\big\rVert\leq P\lVert E_{i}(t)\rVert,\qquad(17)$$

where $P$ indicates the Lipschitz constant. In engineering applications, $P$ should be chosen no smaller than the maximum value of $\lVert\partial u_{i}/\partial e_{i}^{\top}\rVert$.

For event-triggered control, it is necessary to ensure that Zeno behavior does not occur under the proposed event-triggered mechanism. Therefore, inspired by [31], we introduce a dynamic event-triggered mechanism to exclude Zeno behavior implicitly. Firstly, a dynamic variable $y_{i}(t)$ is defined as follows:

$$\dot{y}_{i}(t)=-\gamma_{i}y_{i}(t)+\kappa_{i}\Big(\varpi_{i}\lambda_{\textrm{min}}(Q_{i})\big\lVert e_{i}(t)\big\rVert^{2}-\lambda_{\textrm{max}}(R_{i})P^{2}\big\lVert E_{i}(t)\big\rVert^{2}\Big),\qquad(18)$$

where $y_{i}(0)\in\mathbb{R}_{\geq 0}$, $\gamma_{i}\in\mathbb{R}_{>0}$, $\kappa_{i}\in[0,\frac{1}{2}]$ and $\varpi_{i}\in[0,1]$. Then, the event-triggered instants are determined by the following dynamic event-triggered condition:

$$t_{i}^{0}=0,$$
$$t_{i}^{h+1}=\max_{r\geq t_{i}^{h}}\Big\{r\in\mathbb{R}:y_{i}(t)+\theta_{i}\Big(\varpi_{i}\lambda_{\textrm{min}}(Q_{i})\big\lVert e_{i}(t)\big\rVert^{2}-\lambda_{\textrm{max}}(R_{i})P^{2}\big\lVert E_{i}(t)\big\rVert^{2}\Big)\geq 0,\ \forall t\in[t_{i}^{h},r]\Big\},\qquad(19)$$

where $\theta_{i}\in\mathbb{R}_{>0}$ will be determined later in the stability analysis.
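Checking condition (19) in simulation amounts to evaluating one scalar margin per agent. The following is a minimal sketch, assuming symmetric $Q_{i}$ and $R_{i}$; argument names mirror the symbols above and nothing here is tuned:

```python
import numpy as np

def trigger_violated(y_i, e_i, E_i, Q_i, R_i, P, theta_i, varpi_i):
    """True when the dynamic event-triggered condition (19) fails,
    i.e., when the controller should be updated."""
    margin = y_i + theta_i * (
        varpi_i * np.linalg.eigvalsh(Q_i)[0] * (e_i @ e_i)     # lambda_min(Q_i)
        - np.linalg.eigvalsh(R_i)[-1] * P ** 2 * (E_i @ E_i))  # lambda_max(R_i)
    return margin < 0
```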

Lemma 1: Suppose that the event-triggered instants $t_{i}^{h}$ are determined by (19). If the initial value satisfies $y_{i}(0)\geq 0$, then $y_{i}(t)\geq 0$ holds for all $t\in[0,+\infty)$.

Proof: For all $t\in[0,+\infty)$, the event-triggered condition (19) guarantees the following inequality:

$$y_{i}(t)+\theta_{i}\Big(\varpi_{i}\lambda_{\textrm{min}}(Q_{i})\big\lVert e_{i}(t)\big\rVert^{2}-\lambda_{\textrm{max}}(R_{i})P^{2}\big\lVert E_{i}(t)\big\rVert^{2}\Big)\geq 0.\qquad(20)$$

Since $\theta_{i}>0$, inequality (20) becomes

$$\varpi_{i}\lambda_{\textrm{min}}(Q_{i})\big\lVert e_{i}(t)\big\rVert^{2}-\lambda_{\textrm{max}}(R_{i})P^{2}\big\lVert E_{i}(t)\big\rVert^{2}\geq-\frac{1}{\theta_{i}}y_{i}(t).\qquad(21)$$

By combining (18) and (21), we can easily obtain that for all $t\in[0,+\infty)$,

$$\dot{y}_{i}(t)\geq-\Big(\gamma_{i}+\frac{\kappa_{i}}{\theta_{i}}\Big)y_{i}(t).\qquad(22)$$

According to the comparison lemma in [37], we can deduce that

$$y_{i}(t)\geq y_{i}(0)\exp\Big\{-\Big(\gamma_{i}+\frac{\kappa_{i}}{\theta_{i}}\Big)t\Big\}.\qquad(23)$$

Therefore, $y_{i}(t)\geq 0$ always holds for all $t\in[0,+\infty)$. This completes the proof. $\blacksquare$

Theorem 1: Consider a multiple rigid body network with $N$ nodes under a strongly connected communication topology. Suppose that Assumption 2 holds, and that the performance function and the event-triggered optimal controller are given by (8) and (15), respectively. If the event-triggered instants are determined by the dynamic event-triggered condition (19), then the following two conclusions hold:

1) The $e_{i}$-dynamics (14) is asymptotically stable, i.e., the optimal attitude consensus is achieved.

2) Zeno behavior is excluded, i.e., the interval between $t_{i}^{h+1}$ and $t_{i}^{h}$, $\forall i\in\mathcal{V}$, has a positive lower bound.

Proof: 1) Firstly, we prove that the $e_{i}$-dynamics (14) is asymptotically stable. Choose $\Pi_{i}(t)=V_{i}^{*}\big(e_{i}(t)\big)+y_{i}(t)$ as the Lyapunov function, where $V_{i}^{*}\big(e_{i}(t)\big)$ is the optimal value function in (9) and the dynamic variable $y_{i}(t)$ is governed by (18).

Taking the first-order derivative of $V_{i}^{*}(e_{i})$ with respect to $t$ along the trajectory of the consensus error $e_{i}$, we derive

$$\dot{V}_{i}^{*}(e_{i})=(\nabla V_{i}^{*})^{\top}\dot{e}_{i}=(\nabla V_{i}^{*})^{\top}\Big(X_{i}+l_{ii}Y_{i}\hat{u}_{i}^{*}-\sum_{j\in\mathcal{N}_{i}}a_{ij}Y_{j}\hat{u}_{j}^{*}\Big).\qquad(24)$$

During the event-triggered interval $[t_{i}^{h},t_{i}^{h+1})$, the neighbors of rigid body $i$ execute $\hat{u}_{j}^{*}=u_{j}^{*}(t_{j}^{h^{\prime}})$. According to (11) and (12), it can be easily obtained that

$$(\nabla V_{i}^{*})^{\top}X_{i}=-e_{i}^{\top}Q_{i}e_{i}-(u_{i}^{*})^{\top}R_{i}u_{i}^{*}-(\nabla V_{i}^{*})^{\top}l_{ii}Y_{i}u_{i}^{*}+(\nabla V_{i}^{*})^{\top}\sum_{j\in\mathcal{N}_{i}}a_{ij}Y_{j}\hat{u}_{j}^{*}\qquad(25)$$

and

$$(\nabla V_{i}^{*})^{\top}l_{ii}Y_{i}=-2(u_{i}^{*})^{\top}R_{i}.\qquad(26)$$

Thus, Eq. (24) can be rewritten as

$$\dot{V}_{i}^{*}(e_{i})=-e_{i}^{\top}Q_{i}e_{i}-(u_{i}^{*})^{\top}R_{i}u_{i}^{*}+(\nabla V_{i}^{*})^{\top}l_{ii}Y_{i}(\hat{u}_{i}^{*}-u_{i}^{*})$$
$$=-e_{i}^{\top}Q_{i}e_{i}+(u_{i}^{*})^{\top}R_{i}u_{i}^{*}-2(u_{i}^{*})^{\top}R_{i}\hat{u}_{i}^{*}$$
$$=-e_{i}^{\top}Q_{i}e_{i}-(\hat{u}_{i}^{*})^{\top}R_{i}\hat{u}_{i}^{*}+(u_{i}^{*}-\hat{u}_{i}^{*})^{\top}R_{i}(u_{i}^{*}-\hat{u}_{i}^{*})$$
$$\leq-\lambda_{\textrm{min}}(Q_{i})\big\lVert e_{i}(t)\big\rVert^{2}+\lambda_{\textrm{max}}(R_{i})P^{2}\big\lVert E_{i}(t)\big\rVert^{2}.\qquad(27)$$
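The third equality in (27) is obtained by completing the square in $\hat{u}_{i}^{*}$:

$$(u_{i}^{*})^{\top}R_{i}u_{i}^{*}-2(u_{i}^{*})^{\top}R_{i}\hat{u}_{i}^{*}=-(\hat{u}_{i}^{*})^{\top}R_{i}\hat{u}_{i}^{*}+(u_{i}^{*}-\hat{u}_{i}^{*})^{\top}R_{i}(u_{i}^{*}-\hat{u}_{i}^{*}),$$

and the final inequality follows from $(u_{i}^{*}-\hat{u}_{i}^{*})^{\top}R_{i}(u_{i}^{*}-\hat{u}_{i}^{*})\leq\lambda_{\textrm{max}}(R_{i})P^{2}\lVert E_{i}(t)\rVert^{2}$, which is a consequence of the Lipschitz bound (17) in Assumption 2.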

According to (18) and (27), we can obtain the first-order derivative of $\Pi_{i}(t)$ as follows:

$$\dot{\Pi}_{i}(t)=\dot{V}_{i}^{*}(e_{i})+\dot{y}_{i}(t)$$
$$\leq-\lambda_{\textrm{min}}(Q_{i})\big\lVert e_{i}(t)\big\rVert^{2}+\lambda_{\textrm{max}}(R_{i})P^{2}\big\lVert E_{i}(t)\big\rVert^{2}-\gamma_{i}y_{i}(t)+\kappa_{i}\Big(\varpi_{i}\lambda_{\textrm{min}}(Q_{i})\big\lVert e_{i}(t)\big\rVert^{2}-\lambda_{\textrm{max}}(R_{i})P^{2}\big\lVert E_{i}(t)\big\rVert^{2}\Big)$$
$$\leq-(1-\varpi_{i})\lambda_{\textrm{min}}(Q_{i})\big\lVert e_{i}(t)\big\rVert^{2}-\gamma_{i}y_{i}(t)+(\kappa_{i}-1)\Big(\varpi_{i}\lambda_{\textrm{min}}(Q_{i})\big\lVert e_{i}(t)\big\rVert^{2}-\lambda_{\textrm{max}}(R_{i})P^{2}\big\lVert E_{i}(t)\big\rVert^{2}\Big).\qquad(28)$$

Substituting the dynamic event-triggered condition (19) into (28), $\dot{\Pi}_{i}(t)$ becomes

$$\dot{\Pi}_{i}(t)\leq-(1-\varpi_{i})\lambda_{\textrm{min}}(Q_{i})\big\lVert e_{i}(t)\big\rVert^{2}-\gamma_{i}y_{i}(t)+(\kappa_{i}-1)\Big(-\frac{1}{\theta_{i}}\Big)y_{i}(t)$$
$$\leq-(1-\varpi_{i})\lambda_{\textrm{min}}(Q_{i})\big\lVert e_{i}(t)\big\rVert^{2}-\Big(\gamma_{i}+\frac{\kappa_{i}-1}{\theta_{i}}\Big)y_{i}(t).\qquad(29)$$

Since $\varpi_{i}\in[0,1]$, we have $\dot{\Pi}_{i}(t)\leq 0$ if $\theta_{i}\in[\frac{1-\kappa_{i}}{\gamma_{i}},+\infty)$. Therefore, we can select appropriate $\gamma_{i}\in\mathbb{R}_{>0}$, $\kappa_{i}\in[0,\frac{1}{2}]$, $\varpi_{i}\in[0,1]$ and $\theta_{i}\in[\frac{1-\kappa_{i}}{\gamma_{i}},+\infty)$ to ensure that the $e_{i}$-dynamics (14) is asymptotically stable under the dynamic event-triggered condition (19).

2) Next, we prove that Zeno behavior is excluded.

According to (23), when $\varpi_{i}$ is selected as zero, we can deduce a sufficient condition for the dynamic event-triggered condition (19), expressed as follows:

$$t_{i}^{h+1}=\max_{r\geq t_{i}^{h}}\Big\{r\in\mathbb{R}:\big\lVert E_{i}(t)\big\rVert\leq\sqrt{\frac{y_{i}(0)}{\theta_{i}\lambda_{\textrm{max}}(R_{i})P^{2}}}\exp\Big\{-\frac{1}{2}\Big(\gamma_{i}+\frac{\kappa_{i}}{\theta_{i}}\Big)t\Big\},\ \forall t\in[t_{i}^{h},r]\Big\}.\qquad(30)$$

According to the definition of $Y_{i}$, we can conclude that $Y_{i}$ is bounded, i.e., $\lVert Y_{i}\rVert\leq Y_{M}$ with $Y_{M}\in\mathbb{R}_{>0}$. With Assumption 1 and the definition of the measurement error $E_{i}(t)$, we can obtain that for all $t\in[t_{i}^{h},t_{i}^{h+1})$,

$$\big\lVert\dot{E}_{i}(t)\big\rVert=\big\lVert\dot{e}_{i}(t)\big\rVert=\big\lVert X_{i}\big(e_{i}(t)\big)+l_{ii}Y_{i}u_{i}(t_{i}^{h})-\sum_{j\in\mathcal{N}_{i}}a_{ij}Y_{j}u_{j}(t_{j}^{h^{\prime}})\big\rVert$$
$$\leq X_{M}\big\lVert e_{i}(t)\big\rVert+\sum_{j\in\mathcal{N}_{i}}a_{ij}Y_{M}\big(\big\lVert u_{i}(t_{i}^{h})\big\rVert+\big\lVert u_{j}(t_{j}^{h^{\prime}})\big\rVert\big)$$
$$\leq X_{M}\big\lVert E_{i}(t)\big\rVert+X_{M}\big\lVert e_{i}(t_{i}^{h})\big\rVert+\sum_{j\in\mathcal{N}_{i}}a_{ij}Y_{M}\big(\big\lVert u_{i}(t_{i}^{h})\big\rVert+\big\lVert u_{j}(t_{j}^{h^{\prime}})\big\rVert\big).\qquad(31)$$

Since $E_{i}(t_{i}^{h})=0$ at the event-triggered instants, we can derive the following inequality by using the comparison lemma [37]:

$$\big\lVert E_{i}(t)\big\rVert\leq\exp\{X_{M}(t-t_{i}^{h})\}\big\lVert E_{i}(t_{i}^{h})\big\rVert+\frac{1}{2}\int_{t_{i}^{h}}^{t}\exp\{X_{M}(t-v)\}\Big(X_{M}\big\lVert e_{i}(t_{i}^{h})\big\rVert+\sum_{j\in\mathcal{N}_{i}}a_{ij}Y_{M}\big(\big\lVert u_{i}(t_{i}^{h})\big\rVert+\big\lVert u_{j}(t_{j}^{h^{\prime}})\big\rVert\big)\Big)dv$$
$$\leq\frac{1}{2}\int_{t_{i}^{h}}^{t}\exp\{X_{M}(t-v)\}\Big(X_{M}\big\lVert e_{i}(t_{i}^{h})\big\rVert+\sum_{j\in\mathcal{N}_{i}}a_{ij}Y_{M}\big(\big\lVert u_{i}(t_{i}^{h})\big\rVert+\big\lVert u_{j}(t_{j}^{h^{\prime}})\big\rVert\big)\Big)dv.\qquad(32)$$

Let $\tilde{t}_{i}^{h+1}$ indicate the next event-triggered instant determined by the sufficient condition (30). According to (32), the sufficient condition (30) can be analyzed in two situations during the time interval $[t_{i}^{h},\tilde{t}_{i}^{h+1})$: 1) no event occurs for any rigid body in $\mathcal{N}_{i}$, and 2) at least one event occurs for some rigid body $j\in\mathcal{N}_{i}$.

Situation 1: For all rigid bodies in $\mathcal{N}_{i}$, there are no event-triggered instants during $[t_{i}^{h},\tilde{t}_{i}^{h+1})$. Therefore, we can deduce the following inequality:

$$\sqrt{\frac{y_{i}(0)}{\theta_{i}\lambda_{\textrm{max}}(R_{i})P^{2}}}\exp\Big\{-\frac{1}{2}\Big(\gamma_{i}+\frac{\kappa_{i}}{\theta_{i}}\Big)\tilde{t}_{i}^{h+1}\Big\}\leq\frac{X_{M}\big\lVert e_{i}(t_{i}^{h})\big\rVert+\sum_{j\in\mathcal{N}_{i}}a_{ij}Y_{M}\big(\big\lVert u_{i}(t_{i}^{h})\big\rVert+\big\lVert u_{j}(t_{j}^{h^{\prime}})\big\rVert\big)}{2X_{M}}\Big(\exp\{X_{M}(\tilde{t}_{i}^{h+1}-t_{i}^{h})\}-1\Big).\qquad(33)$$

Situation 2: For a rigid body $j\in\mathcal{N}_{i}$, there exist $\mu_{j}\in\mathbb{N}_{>0}$ event-triggered instants during $[t_{i}^{h},\tilde{t}_{i}^{h+1})$. Using $t_{j}^{0},t_{j}^{1},...,t_{j}^{\mu_{j}}$ to indicate these event-triggered instants, with $t_{j}^{0}=t_{i}^{h}$, we can deduce

$$\sqrt{\frac{y_{i}(0)}{\theta_{i}\lambda_{\textrm{max}}(R_{i})P^{2}}}\exp\Big\{-\frac{1}{2}\Big(\gamma_{i}+\frac{\kappa_{i}}{\theta_{i}}\Big)\tilde{t}_{i}^{h+1}\Big\}\leq\frac{X_{M}\big\lVert e_{i}(t_{i}^{h})\big\rVert+\sum_{j\in\mathcal{N}_{i}}a_{ij}Y_{M}\big\lVert u_{i}(t_{i}^{h})\big\rVert}{2X_{M}}\Big(\exp\{X_{M}(\tilde{t}_{i}^{h+1}-t_{i}^{h})\}-1\Big)$$
$$+\sum_{j\in\mathcal{N}_{i}}\frac{a_{ij}Y_{M}}{2X_{M}}\sum_{s=0}^{\mu_{j}-1}\big\lVert u_{j}(t_{j}^{s})\big\rVert\Big(\exp\{X_{M}(t_{j}^{s+1}-t_{j}^{s})\}-1\Big)+\frac{\sum_{j\in\mathcal{N}_{i}}a_{ij}Y_{M}\big\lVert u_{j}(t_{j}^{\mu_{j}})\big\rVert}{2X_{M}}\Big(\exp\{X_{M}(\tilde{t}_{i}^{h+1}-t_{j}^{\mu_{j}})\}-1\Big).\qquad(34)$$

Combining (33) and (34), we can obtain a unified form covering both situations:

$$\sqrt{\frac{y_{i}(0)}{\theta_{i}\lambda_{\textrm{max}}(R_{i})P^{2}}}\exp\Big\{-\frac{1}{2}\Big(\gamma_{i}+\frac{\kappa_{i}}{\theta_{i}}\Big)\tilde{t}_{i}^{h+1}\Big\}\leq\frac{X_{M}\big\lVert e_{i}(t_{i}^{h})\big\rVert+\Theta_{i}}{2X_{M}}\Big(\exp\{X_{M}(\tilde{t}_{i}^{h+1}-t_{i}^{h})\}-1\Big),\qquad(35)$$

where $\Theta_{i}=\sum_{j\in\mathcal{N}_{i}}a_{ij}Y_{M}\Big(\big\lVert u_{i}(t_{i}^{h})\big\rVert+\max_{s=0,...,\mu_{j}}\big\{\big\lVert u_{j}(t_{j}^{s})\big\rVert\big\}\Big)$.

Since $\tilde{t}_{i}^{h+1}$ is determined by the sufficient condition (30), letting $t_{i}^{h+1}$ indicate the next event-triggered instant determined by (19), we can bound the interval between two adjacent event-triggered instants:

$$t_{i}^{h+1}-t_{i}^{h}\geq\tilde{t}_{i}^{h+1}-t_{i}^{h}\geq\frac{1}{X_{M}}\log\Bigg(\frac{2X_{M}\sqrt{y_{i}(0)}}{\big(X_{M}\big\lVert e_{i}(t_{i}^{h})\big\rVert+\Theta_{i}\big)\sqrt{\theta_{i}\lambda_{\textrm{max}}(R_{i})P^{2}}}\exp\Big\{-\frac{1}{2}\Big(\gamma_{i}+\frac{\kappa_{i}}{\theta_{i}}\Big)\tilde{t}_{i}^{h+1}\Big\}+1\Bigg)>0.\qquad(36)$$

Therefore, Zeno behavior is excluded. This completes the proof. $\blacksquare$

III-C Self-Triggered Mechanism

Under the dynamic event-triggered mechanism, we have to monitor the continuous consensus error $e_{i}(t)$ and the continuous measurement error $E_{i}(t)$ to judge whether the dynamic event-triggered condition (19) is violated. Therefore, each rigid body has to communicate continuously with its neighbors to obtain their absolute attitude information, or to measure continuous relative attitude information with sensors such as cameras. In order to overcome this problem, a self-triggered condition is proposed in this subsection.

Letting $\kappa_{i}=0$, Eq. (18) becomes

$$\dot{y}_{i}(t)=-\gamma_{i}y_{i}(t).\qquad(37)$$

Therefore, $y_{i}(t)=y_{i}(0)\exp\{-\gamma_{i}t\}$ for all $t\in[0,+\infty)$.

Then, letting $\varpi_{i}=0$, the dynamic event-triggered condition (19) becomes

$$t_{i}^{h+1}=\max_{r\geq t_{i}^{h}}\Big\{r\in\mathbb{R}:\big\lVert E_{i}(t)\big\rVert\leq\sqrt{\frac{y_{i}(0)}{\theta_{i}\lambda_{\textrm{max}}(R_{i})P^{2}}}\exp\Big\{-\frac{1}{2}\gamma_{i}t\Big\},\ \forall t\in[t_{i}^{h},r]\Big\},\qquad(38)$$

where $\theta_{i}\in[\frac{1}{\gamma_{i}},+\infty)$.

According to (35), the self-triggered measurement error is defined in the following form:

$$\big\lVert\Delta_{i}(t)\big\rVert=\frac{X_{M}\big\lVert e_{i}(t_{i}^{h})\big\rVert+\Theta_{i}}{2X_{M}}\Big(\exp\{X_{M}(t-t_{i}^{h})\}-1\Big),\qquad(39)$$

which is an upper bound of $\big\lVert E_{i}(t)\big\rVert$.

Thus, we can obtain a new sufficient condition for the dynamic event-triggered condition (19) as follows:

$$t_{i}^{h+1}=\max_{r\geq t_{i}^{h}}\Big\{r\in\mathbb{R}:\big\lVert\Delta_{i}(t)\big\rVert\leq\sqrt{\frac{y_{i}(0)}{\theta_{i}\lambda_{\textrm{max}}(R_{i})P^{2}}}\exp\Big\{-\frac{1}{2}\gamma_{i}t\Big\},\ \forall t\in[t_{i}^{h},r]\Big\}.\qquad(40)$$

According to (39), the value of $\big\lVert\Delta_{i}(t)\big\rVert$ can be calculated without continuous information. Therefore, continuous communication is avoided.
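In practice, the next triggering instant under (40) can be precomputed at $t_{i}^{h}$ by marching the closed-form bound (39) against the exponential threshold. A minimal numerical sketch follows; dt and horizon are illustrative choices, not values from the paper:

```python
import numpy as np

def next_trigger_time(t_h, e_h_norm, Theta_i, X_M, y_0, theta_i,
                      lam_max_R, P, gamma_i, dt=1e-3, horizon=10.0):
    """First time after t_i^h at which the bound (39) on ||E_i(t)||
    crosses the threshold of the self-triggered condition (40)."""
    c = np.sqrt(y_0 / (theta_i * lam_max_R * P ** 2))
    t = t_h + dt
    while t < t_h + horizon:
        bound = (X_M * e_h_norm + Theta_i) / (2.0 * X_M) \
                * (np.exp(X_M * (t - t_h)) - 1.0)
        if bound > c * np.exp(-0.5 * gamma_i * t):
            return t             # (40) is violated: trigger here
        t += dt
    return t_h + horizon         # no crossing within the search horizon
```

Note that every quantity entering this computation ($\lVert e_{i}(t_{i}^{h})\rVert$, $\Theta_{i}$, $X_{M}$, $y_{i}(0)$) is available at the triggering instant itself, which is exactly why no continuous monitoring is required.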

Remark 1: Since the self-triggered condition (40) is a sufficient condition for the dynamic event-triggered condition (19), the number of triggering instants under (40) will be higher than under (19). Our original intention in introducing the self-triggered mechanism is to reduce the consumption of communication resources, even though this inevitably increases the number of triggering instants. That is to say, we trade a small amount of extra computation for a large saving in communication resources.

Remark 2: Compared with the time-triggered methods in [19]-[23], the event-triggered mechanism significantly saves computing and communication resources. Note that the event-triggered attitude stabilization problem is studied based on sliding mode control in [40]; however, the performance cost is not considered in that controller design.

IV Main Results

Up to now, we have derived the form of the optimal controller (15), which contains the optimal value function $\hat{V}_{i}^{*}$. However, it is very difficult to obtain analytic solutions to the event-triggered HJB equations (16). In this section, we first introduce an event-triggered Reinforcement Learning (RL) algorithm to obtain the optimal policy. In order to implement the event-triggered RL algorithm online, a critic neural network is used to approximate the optimal value function $\hat{V}_{i}^{*}$. Only measurement data at the event-triggered instants are needed in the event-triggered RL algorithm, which obviously reduces the computational burden.

IV-A Model-Free Event-Triggered RL Algorithm

This subsection presents a model-free event-triggered algorithm based on reinforcement learning, which is used to seek the optimal policy. The RL algorithm involves two steps: policy evaluation and policy improvement. By repeating these two steps at the event-triggered instants, the optimal policy is obtained when policy improvement no longer changes the control policy.

1: Initialize the event-triggered admissible controllers $u_{i}(0)=0$, $i=1,...,N$, and set $h=0$;
2: for each rigid body $i\in\mathcal{V}$ do
3:  if rigid body $i$ receives information $u_{j}(t)$ transmitted by rigid body $j$, where $j\in\mathcal{N}_{i}$, then
4:   set $h^{\prime}=h^{\prime}+1$ and $t_{j}^{h^{\prime}}=t$;
5:   update $u_{j}(t_{j}^{h^{\prime}})=u_{j}(t)$;
6:  else
7:   $\hat{u}_{j}^{h^{\prime}}=u_{j}(t_{j}^{h^{\prime}})$ remains unchanged;
8:  end if
9:  calculate the self-triggered measurement error $\big\lVert\Delta_{i}(t)\big\rVert$;
10:  if the self-triggered condition (40) is violated then
11:   set $t_{i}^{h+1}=t$;
12:   Step 1 (Policy evaluation): solve
$$H_{i}(e_{i},\nabla\hat{V}_{i}^{h+1},\hat{u}_{i}^{h},\hat{u}_{-i}^{h^{\prime}})=e_{i}^{\top}Q_{i}e_{i}+(\hat{u}_{i}^{h})^{\top}R_{i}\hat{u}_{i}^{h}+\big(\nabla\hat{V}_{i}^{h+1}\big)^{\top}\Big(X_{i}+l_{ii}Y_{i}\hat{u}_{i}^{h}-\sum_{j\in\mathcal{N}_{i}}a_{ij}Y_{j}\hat{u}_{j}^{h^{\prime}}\Big)=0,\qquad(41)$$
   where $\nabla\hat{V}_{i}^{h+1}=\nabla V_{i}(t_{i}^{h+1})$, $\hat{u}_{i}^{h}=u_{i}(t_{i}^{h})$ and $\hat{u}_{-i}^{h^{\prime}}=\hat{u}_{j}^{h^{\prime}}=u_{j}(t_{j}^{h^{\prime}})$, $j\in\mathcal{N}_{i}$;
13:   Step 2 (Policy improvement):
$$\hat{u}_{i}^{h+1}=-\frac{1}{2}l_{ii}R_{i}^{-1}Y_{i}^{\top}\nabla\hat{V}_{i}^{h+1},\qquad(42)$$
   where $\hat{u}_{i}^{h+1}=u_{i}(t_{i}^{h+1})$;
14:   set $h=h+1$;
15:  else
16:   $\hat{u}_{i}^{h}=u_{i}(t_{i}^{h})$ remains unchanged;
17:  end if
18: end for
Algorithm 1: Model-Free Event-Triggered RL Algorithm.
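As a complement to the listing, the control flow of Algorithm 1 for a single rigid body can be sketched as follows. The callables received, trigger_violated, policy_evaluation and policy_improvement are hypothetical placeholders for the operations defined above, so this is a structural sketch rather than a full implementation:

```python
def run_agent(t_grid, u0, received, trigger_violated,
              policy_evaluation, policy_improvement):
    """One rigid body executing Algorithm 1 over a time grid."""
    u, u_neighbors = u0, {}
    for t in t_grid:
        msg = received(t)                       # neighbor update, if any
        if msg is not None:
            j, u_j = msg
            u_neighbors[j] = u_j                # store u_j(t_j^{h'})
        if trigger_violated(t):                 # self-triggered condition (40)
            grad_V = policy_evaluation(u, u_neighbors, t)   # Step 1, Eq. (41)
            u = policy_improvement(grad_V, t)               # Step 2, Eq. (42)
        # otherwise u is held constant until the next event
    return u
```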

Next, we give a theorem to show the convergence of the model-free event-triggered RL algorithm.

Theorem 2: Suppose that agent $i$ updates its control policy according to Algorithm 1. Then the value function converges to the optimal value function, i.e., $\lim_{h\to\infty}\hat{V}_{i}^{h}=\hat{V}_{i}^{*}$, and the control policy converges to the optimal control policy, i.e., $\lim_{h\to\infty}\hat{u}_{i}^{h}=\hat{u}_{i}^{*}$.

Proof: According to Eq. (41), we can obtain that

$$\big(\nabla\hat{V}_{i}^{h}\big)^{\top}\Big[X_{i}+l_{ii}Y_{i}\hat{u}_{i}^{h-1}-\sum_{j\in\mathcal{N}_{i}}a_{ij}Y_{j}\hat{u}_{j}^{h^{\prime}-1}\Big]=-e_{i}^{\top}Q_{i}e_{i}-\big(\hat{u}_{i}^{h-1}\big)^{\top}R_{i}\hat{u}_{i}^{h-1},\qquad(43)$$

and

$$\big(\nabla\hat{V}_{i}^{h+1}\big)^{\top}\Big[X_{i}+l_{ii}Y_{i}\hat{u}_{i}^{h}-\sum_{j\in\mathcal{N}_{i}}a_{ij}Y_{j}\hat{u}_{j}^{h^{\prime}}\Big]=-e_{i}^{\top}Q_{i}e_{i}-\big(\hat{u}_{i}^{h}\big)^{\top}R_{i}\hat{u}_{i}^{h}.\qquad(44)$$

By rearranging Eq. (43), the following equation holds:

$$\big(\nabla\hat{V}_{i}^{h}\big)^{\top}\Big[X_{i}+l_{ii}Y_{i}\hat{u}_{i}^{h}-\sum_{j\in\mathcal{N}_{i}}a_{ij}Y_{j}\hat{u}_{j}^{h^{\prime}}\Big]=-e_{i}^{\top}Q_{i}e_{i}-\big(\hat{u}_{i}^{h-1}\big)^{\top}R_{i}\hat{u}_{i}^{h-1}+\big(\nabla\hat{V}_{i}^{h}\big)^{\top}l_{ii}Y_{i}\big(\hat{u}_{i}^{h}-\hat{u}_{i}^{h-1}\big)-\big(\nabla\hat{V}_{i}^{h}\big)^{\top}\sum_{j\in\mathcal{N}_{i}}a_{ij}Y_{j}\big(\hat{u}_{j}^{h^{\prime}}-\hat{u}_{j}^{h^{\prime}-1}\big).\qquad(45)$$

In the model-free event-triggered RL algorithm, the control policy of rigid body $i$ is updated when the self-triggered condition (40) is violated. Under the distributed asynchronous update pattern, the control policy of each rigid body $j\in\mathcal{N}_{i}$ remains invariant during this update, so that $\hat{u}_{j}^{h^{\prime}}=\hat{u}_{j}^{h^{\prime}-1}$ and the last term in (45) vanishes. Considering the trajectory of the consensus error driven by $\hat{u}_{i}^{h}$, i.e., $\dot{e}_{i}=X_{i}+l_{ii}Y_{i}\hat{u}_{i}^{h}-\sum_{j\in\mathcal{N}_{i}}a_{ij}Y_{j}\hat{u}_{j}^{h^{\prime}}$, we have

$$\hat{V}_{i}^{h+1}-\hat{V}_{i}^{h}=\int_{t}^{\infty}\Big[\big(\nabla\hat{V}_{i}^{h}\big)^{\top}\dot{e}_{i}-\big(\nabla\hat{V}_{i}^{h+1}\big)^{\top}\dot{e}_{i}\Big]dv=\int_{t}^{\infty}\Big[\big(\hat{u}_{i}^{h}\big)^{\top}R_{i}\hat{u}_{i}^{h}-\big(\hat{u}_{i}^{h-1}\big)^{\top}R_{i}\hat{u}_{i}^{h-1}+\big(\nabla\hat{V}_{i}^{h}\big)^{\top}l_{ii}Y_{i}\big(\hat{u}_{i}^{h}-\hat{u}_{i}^{h-1}\big)\Big]dv.\qquad(46)$$

According to Eq. (42), it can be easily obtained that

$$\big(\nabla\hat{V}_{i}^{h}\big)^{\top}l_{ii}Y_{i}=-2\big(\hat{u}_{i}^{h}\big)^{\top}R_{i}.\qquad(47)$$

Then, Eq. (46) becomes

$$\hat{V}_{i}^{h+1}-\hat{V}_{i}^{h}=\int_{t}^{\infty}\Big[\big(\hat{u}_{i}^{h}\big)^{\top}R_{i}\hat{u}_{i}^{h}-\big(\hat{u}_{i}^{h-1}\big)^{\top}R_{i}\hat{u}_{i}^{h-1}-2\big(\hat{u}_{i}^{h}\big)^{\top}R_{i}\big(\hat{u}_{i}^{h}-\hat{u}_{i}^{h-1}\big)\Big]dv$$
$$=\int_{t}^{\infty}\Big[-\big(\hat{u}_{i}^{h}-\hat{u}_{i}^{h-1}\big)^{\top}R_{i}\big(\hat{u}_{i}^{h}-\hat{u}_{i}^{h-1}\big)\Big]dv\leq 0.\qquad(48)$$

Therefore, $\hat{V}_{i}^{h+1}\leq\hat{V}_{i}^{h}$ is always satisfied. Since the value function is positive definite, the non-increasing sequence $\{\hat{V}_{i}^{h}\}$ is bounded below, and by the Weierstrass theorem [38] it converges to the optimal value function $\hat{V}_{i}^{*}$ as $h\to\infty$. Meanwhile, the control policy $\hat{u}_{i}^{h}$ converges to the optimal control policy $\hat{u}_{i}^{*}$. This completes the proof. $\blacksquare$

IV-B Implementation of Event-Triggered PI Algorithm

In this subsection, we implement Algorithm 1 by using a critic neural network to approximate the optimal value function $\hat{V}_{i}^{*}$.

We first define the following neural network for each agent:

$$\hat{V}_{i}(e_{i})=\hat{W}_{c,i}^{\top}\phi_{i}(e_{i}),\ \forall t\in[t_{i}^{h},t_{i}^{h+1}),\qquad(49)$$

where $\hat{W}_{c,i}$ indicates the critic weight estimate at the event-triggered instant $t_{i}^{h}$, and $\phi_{i}(e_{i})$ indicates the critic activation function.

According to Eq. (16), the estimation error of the critic NN can be defined as

$$e_{c,i}=e_{i}^{\top}Q_{i}e_{i}+\hat{u}_{i}^{\top}R_{i}\hat{u}_{i}+\hat{W}_{c,i}^{\top}\nabla\phi_{i}\dot{e}_{i},\qquad(50)$$

where $\nabla\phi_{i}=\partial\phi_{i}(e_{i})/\partial e_{i}^{\top}$.

Figure 1: A strongly connected graph with six nodes.

For a given event-triggered admissible controller u^i\hat{u}_{i}, the update rule of W^c,i\hat{W}_{c,i} is to minimize the following objective function:

$$E_{c,i}=\frac{1}{2}e_{c,i}^{\top}e_{c,i}.\qquad(51)$$

According to the gradient descent method, we can derive the update law of the following form:

$$\dot{\hat{W}}_{c,i}=0,\ t\in(t_{i}^{h},t_{i}^{h+1}),\qquad(52a)$$
$$\hat{W}_{c,i}^{+}=\hat{W}_{c,i}-l_{c,i}k_{i}\big(k_{1,i}^{\top}\hat{W}_{c,i}+e_{i}^{\top}Q_{i}e_{i}+\hat{u}_{i}^{\top}R_{i}\hat{u}_{i}\big),\ t=t_{i}^{h},\qquad(52b)$$

where $l_{c,i}>0$ indicates the learning rate of the critic NN, $k_{1,i}=\nabla\phi_{i}\dot{e}_{i}$ and $k_{i}=k_{1,i}/(k_{1,i}^{\top}k_{1,i}+1)^{2}$.
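One critic update of (52b) is a single normalized gradient step and can be sketched as follows; grad_phi denotes the Jacobian of the activation vector $\phi_{i}$ at $e_{i}$, and all names are illustrative:

```python
import numpy as np

def critic_step(W_c, e, u_hat, e_dot, Q, R, grad_phi, l_c):
    """Event-triggered critic update of Eq. (52b), i.e., one gradient
    descent step on E_c,i = 0.5 * e_c,i^2 with a normalized regressor."""
    k1 = grad_phi @ e_dot                      # k_{1,i} = grad(phi_i) e_dot
    k = k1 / (k1 @ k1 + 1.0) ** 2              # normalized regressor k_i
    e_c = k1 @ W_c + e @ (Q @ e) + u_hat @ (R @ u_hat)   # residual of (50)
    return W_c - l_c * k * e_c                 # updated weight W_c,i^+
```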

Letting $\tilde{W}_{c,i}=\hat{W}_{c,i}-W_{c,i}$, we can deduce

$$\dot{\tilde{W}}_{c,i}=0,\ t\in(t_{i}^{h},t_{i}^{h+1}),\qquad(53a)$$
$$\tilde{W}_{c,i}^{+}=\tilde{W}_{c,i}-l_{c,i}k_{i}\big(k_{1,i}^{\top}\tilde{W}_{c,i}+\epsilon_{c,i}\big),\ t=t_{i}^{h},\qquad(53b)$$

where $W_{c,i}$ denotes the critic target weight, $\tilde{W}_{c,i}$ is the critic weight error, and $\epsilon_{c,i}=e_{i}^{\top}Q_{i}e_{i}+\hat{u}_{i}^{\top}R_{i}\hat{u}_{i}+W_{c,i}^{\top}\nabla\phi_{i}\dot{e}_{i}$ is the critic residual error.

Therefore, the optimal controller can be obtained by (42) and (49), which is expressed in the following form:

$$\hat{u}_{i}=-\frac{1}{2}l_{ii}R_{i}^{-1}Y_{i}^{\top}(\nabla\phi_{i})^{\top}\hat{W}_{c,i}.\qquad(54)$$

Through the above critic NN framework, we can obtain the optimal controller using only measurement data at the event-triggered instants. Therefore, the need for system dynamics is avoided. In addition, the neural network is only updated at the event-triggered instants $t_{i}^{h}$, which are determined by the self-triggered condition (40).

Assumption 3: In the critic NN framework, the target weight matrix, the activation function, and the critic residual error are bounded by positive constants $W_{cM}$, $\phi_{M}$, and $\epsilon_{cM}$, i.e., $\lVert W_{c,i}\rVert\leq W_{cM}$, $\lVert\phi_{i}\rVert\leq\phi_{M}$, and $\lVert\epsilon_{c,i}\rVert\leq\epsilon_{cM}$.

Theorem 3: Consider the consensus error dynamics (14) with the critic neural network (49). If the estimated weight matrix $\hat{W}_{c,i}$ is updated by (52), then the consensus error $e_{i}$ and the critic estimation error $\tilde{W}_{c,i}$ are UUB.

Refer to caption
Refer to caption
Figure 2: (a) The norms of attitude errors and angular velocity errors. The trajectories indicate the norms of attitude errors between the rigid body 11 and the rigid body i{2,3,4,5,6}i\in\{2,3,4,5,6\}. (b) The consensus errors of each rigid body δi\delta_{i}. Three subfigures show three components of the consensus error vector δi\delta_{i} of each agent, respectively.
Refer to caption
Refer to caption
Figure 3: (a) The original control inputs τi\tau_{i} of each rigid body. Three subfigures show three components of the control input τi\tau_{i} of each agent, respectively. (b) The control inputs ui\text{u}_{i} of the augmented systems. Three subfigures show three components of the control input ui\text{u}_{i} of augmented systems of each agent, respectively.
Figure 4: The critic estimated weight matrices; the trajectories show the norm of the weight matrix of each agent.

Proof: Two situations are considered: during the event-triggered intervals and at the event-triggered instants.

Situation 1: During the event-triggered intervals, i.e., t\in(t_{i}^{h},t_{i}^{h+1}).

Consider the Lyapunov function of the following form:

\displaystyle L_{i}=L_{i,1}+L_{i,2}, (55)

where L_{i,1}=e_{i}^{\top}e_{i}+V_{i}(e_{i}) and L_{i,2}=\text{tr}(\tilde{W}_{c,i}^{\top}\tilde{W}_{c,i})/l_{c,i}.

According to (53), we can obtain

\displaystyle\dot{L}_{i,2}=\frac{2\,\text{tr}(\tilde{W}_{c,i}^{\top}\dot{\tilde{W}}_{c,i})}{l_{c,i}}=0. (56)

Therefore, the first-order derivative of L_{i} can be expressed as follows:

\displaystyle\dot{L}_{i}=\dot{L}_{i,1}=2e_{i}^{\top}\dot{e}_{i}+\dot{V}_{i}(e_{i})
=2e_{i}^{\top}\big(X_{i}(e_{i})+l_{ii}Y_{i}\hat{u}_{i}-\sum_{j\in\mathcal{N}_{i}}a_{ij}Y_{j}\hat{u}_{j}\big)-e_{i}^{\top}Q_{i}e_{i}-\hat{u}_{i}^{\top}R_{i}\hat{u}_{i}
\leq\lVert e_{i}\rVert^{2}+\big\lVert X_{i}(e_{i})+l_{ii}Y_{i}\hat{u}_{i}-\sum_{j\in\mathcal{N}_{i}}a_{ij}Y_{j}\hat{u}_{j}\big\rVert^{2}-\lambda_{\textrm{min}}(Q_{i})\lVert e_{i}\rVert^{2}-\lambda_{\textrm{min}}(R_{i})\lVert\hat{u}_{i}\rVert^{2}
\leq\big(1+3X_{M}^{2}-\lambda_{\textrm{min}}(Q_{i})\big)\lVert e_{i}\rVert^{2}+3l_{ii}^{2}Y_{M}^{2}\lVert\hat{u}_{i}\rVert^{2}+3\sum_{j\in\mathcal{N}_{i}}a_{ij}^{2}Y_{M}^{2}\lVert\hat{u}_{j}\rVert^{2}-\lambda_{\textrm{min}}(R_{i})\lVert\hat{u}_{i}\rVert^{2}. (57)

In order to ensure \dot{L}_{i}<0, the following inequality should be satisfied:

\displaystyle\lVert e_{i}\rVert>\sqrt{\frac{\Phi_{i}}{\lambda_{\textrm{min}}(Q_{i})-1-3X_{M}^{2}}}, (58)

where \Phi_{i}=3l_{ii}^{2}Y_{M}^{2}\lVert\hat{u}_{i}\rVert^{2}+3\sum_{j\in\mathcal{N}_{i}}a_{ij}^{2}Y_{M}^{2}\lVert\hat{u}_{j}\rVert^{2}-\lambda_{\textrm{min}}(R_{i})\lVert\hat{u}_{i}\rVert^{2}, and the weight matrix Q_{i} is chosen such that \lambda_{\textrm{min}}(Q_{i})>1+3X_{M}^{2}.

Hence, the consensus error e_{i} is UUB. During the event-triggered intervals, the critic estimation error \tilde{W}_{c,i} remains unchanged, which means \tilde{W}_{c,i} is also UUB.

Situation 2: At the event-triggered instants, i.e., t=t_{i}^{h}.

Choosing the same Lyapunov function as (55), we can obtain:

\displaystyle\Delta L_{i}=\Delta L_{1,i}+\Delta L_{2,i}. (59)

Since the trajectory of e_{i} is continuous, we have e_{i}^{+}=e_{i}. Therefore, one has

\displaystyle\Delta L_{1,i}=(e_{i}^{+})^{\top}e_{i}^{+}+V_{i}(e_{i}^{+})-e_{i}^{\top}e_{i}-V_{i}(e_{i})=0. (60)

Next, according to (53), we have

\displaystyle\Delta L_{2,i}=\frac{\text{tr}\big[(\tilde{W}_{c,i}^{+})^{\top}\tilde{W}_{c,i}^{+}\big]}{l_{c,i}}-\frac{\text{tr}(\tilde{W}_{c,i}^{\top}\tilde{W}_{c,i})}{l_{c,i}}
=\frac{1}{l_{c,i}}\text{tr}\Big[\big(\tilde{W}_{c,i}-l_{c,i}k_{i}(k_{1,i}^{\top}\tilde{W}_{c,i}+\epsilon_{c,i})\big)^{\top}\big(\tilde{W}_{c,i}-l_{c,i}k_{i}(k_{1,i}^{\top}\tilde{W}_{c,i}+\epsilon_{c,i})\big)-\tilde{W}_{c,i}^{\top}\tilde{W}_{c,i}\Big]
=l_{c,i}\text{tr}\Big[\big(k_{i}(k_{1,i}^{\top}\tilde{W}_{c,i}+\epsilon_{c,i})\big)^{\top}\big(k_{i}(k_{1,i}^{\top}\tilde{W}_{c,i}+\epsilon_{c,i})\big)\Big]-2\text{tr}(\tilde{W}_{c,i}^{\top}k_{i}k_{1,i}^{\top}\tilde{W}_{c,i})-2\text{tr}(\tilde{W}_{c,i}^{\top}k_{i}\epsilon_{c,i})
\leq l_{c,i}\lVert k_{i}k_{1,i}^{\top}\tilde{W}_{c,i}+k_{i}\epsilon_{c,i}\rVert^{2}-2k_{1,i}^{\top}k_{i}\lVert\tilde{W}_{c,i}\rVert^{2}+2\lVert\tilde{W}_{c,i}^{\top}k_{i}\epsilon_{c,i}\rVert. (61)

From the definition of k_{1,i} and k_{i}, we can obtain the following inequalities:

\displaystyle\alpha_{k}\leq k_{1,i}^{\top}k_{i}\leq\beta_{k}, (62)
\displaystyle\lVert k_{i}\rVert\leq K_{M}, (63)

where \beta_{k}>\alpha_{k}>0 and K_{M}>0.

Substituting (62) and (63) into (61), and applying Young's inequality 2\lVert\tilde{W}_{c,i}^{\top}k_{i}\epsilon_{c,i}\rVert\leq\epsilon_{cM}(\lVert\tilde{W}_{c,i}\rVert^{2}+K_{M}^{2}), \Delta L_{2,i} becomes

\displaystyle\Delta L_{2,i}\leq 2l_{c,i}\beta_{k}^{2}\lVert\tilde{W}_{c,i}\rVert^{2}+2l_{c,i}\epsilon_{cM}^{2}K_{M}^{2}-2\alpha_{k}\lVert\tilde{W}_{c,i}\rVert^{2}+\epsilon_{cM}(\lVert\tilde{W}_{c,i}\rVert^{2}+K_{M}^{2})
\leq-(2\alpha_{k}-2l_{c,i}\beta_{k}^{2}-\epsilon_{cM})\lVert\tilde{W}_{c,i}\rVert^{2}+(2l_{c,i}\epsilon_{cM}^{2}+\epsilon_{cM})K_{M}^{2}. (64)

Combining (60) and (64), \Delta L_{i} can be transformed into the following form:

\displaystyle\Delta L_{i}=\Delta L_{1,i}+\Delta L_{2,i}
\leq-(2\alpha_{k}-2l_{c,i}\beta_{k}^{2}-\epsilon_{cM})\lVert\tilde{W}_{c,i}\rVert^{2}+(2l_{c,i}\epsilon_{cM}^{2}+\epsilon_{cM})K_{M}^{2}. (65)

In order to simplify the expression, some auxiliary variables are defined as follows:

\displaystyle A_{i}=2\alpha_{k}-2l_{c,i}\beta_{k}^{2}-\epsilon_{cM},
\displaystyle\Gamma_{i}=(2l_{c,i}\epsilon_{cM}^{2}+\epsilon_{cM})K_{M}^{2}.

Therefore, provided the learning rate l_{c,i} is chosen small enough that A_{i}>0, it can be deduced that \Delta L_{i}<0 whenever \lVert\tilde{W}_{c,i}\rVert>\sqrt{\Gamma_{i}/A_{i}}, which signifies that e_{i} and \tilde{W}_{c,i} are UUB at the event-triggered instants.

Combining Situation 1 with Situation 2, it is proved that the consensus error e_{i} and the critic estimation error \tilde{W}_{c,i} are UUB. This completes the proof. \hfill\blacksquare

Remark 3: The attitude consensus problem of multiple rigid body networks has been widely studied in the literature [12, 13, 14, 15]. However, the attitude consensus protocols proposed in [12, 13, 14, 15] all rely on known rigid body dynamics, which is a major limitation in practical applications. In this work, a model-free RL algorithm is proposed to solve the HJB equation of the optimal attitude consensus of multiple rigid body networks. Moreover, compared with the existing results on the model-free consensus problem of multi-agent networks [20, 21, 23], an event-triggered RL algorithm is proposed, which is further extended to a self-triggered RL algorithm. Based on Algorithm 1, the control update actions and the information interaction among agents are executed only at the triggering instants; a structural sketch of this loop is given below. Hence, the computation and communication resources can be significantly reduced compared with the continuous-time approaches [20, 21, 23].
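The following sketch assembles the pieces above into one self-triggered loop for a single agent, reusing the critic_update and optimal_control sketches given earlier. It is structural only: the agent object and the helper next_trigger_time, which stands in for the self-triggered condition (III-C), are hypothetical placeholders, not the paper's Algorithm 1 verbatim.

```python
import numpy as np

def run_agent(agent, T=40.0, dt=0.01):
    """Self-triggered RL loop of one agent (structural sketch only).

    `agent` is assumed to provide: measure() -> (e, e_dot), phi_grad(e),
    matrices Q, R, Y, scalars l_ii and lc, weight W_hat,
    step_dynamics(u, dt), and next_trigger_time(t, e) implementing the
    self-triggered condition (III-C) -- a hypothetical helper here.
    """
    t, t_next = 0.0, 0.0
    u = np.zeros(3)                              # zero-order-hold input
    while t < T:
        if t >= t_next:                          # event instant t_i^h
            e, e_dot = agent.measure()           # data only at the event
            Jac = agent.phi_grad(e)
            agent.W_hat = critic_update(agent.W_hat, Jac, e_dot, e, u,
                                        agent.Q, agent.R, agent.lc)
            u = optimal_control(agent.W_hat, Jac, agent.Y,
                                agent.l_ii, agent.R)
            t_next = agent.next_trigger_time(t, e)  # next event, computed now
        agent.step_dynamics(u, dt)               # plant evolves continuously
        t += dt
```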

V Simulation

This section presents a numerical simulation to verify the effectiveness of the proposed event-triggered reinforcement learning method. We consider a multiple rigid body network with six nodes under a strongly connected communication topology; the communication relationships among the nodes are shown in Fig. 1. The Laplacian matrix is selected as follows:

\displaystyle\mathcal{L}=\begin{bmatrix}4&0&0&0&0&-4\\ -4&8&0&0&0&-4\\ 0&-4&8&-4&0&0\\ 0&0&-4&8&0&-4\\ 0&0&0&-4&4&0\\ 0&0&0&0&-4&4\end{bmatrix}.
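Since the analysis relies on zero row sums and strong connectivity, both properties of this Laplacian can be verified numerically. The snippet below is a direct transcription of the matrix plus a reachability test; the check itself is our addition, not part of the paper's algorithm.

```python
import numpy as np

# Transcription of the Laplacian used in the simulation.
L = np.array([
    [ 4,  0,  0,  0,  0, -4],
    [-4,  8,  0,  0,  0, -4],
    [ 0, -4,  8, -4,  0,  0],
    [ 0,  0, -4,  8,  0, -4],
    [ 0,  0,  0, -4,  4,  0],
    [ 0,  0,  0,  0, -4,  4],
])
assert np.allclose(L.sum(axis=1), 0)            # valid graph Laplacian

# Strong connectivity: (A + I)^(n-1) must be entrywise positive,
# where A is the adjacency pattern taken from the off-diagonal entries.
A = (L - np.diag(np.diag(L)) != 0).astype(int)
reach = np.linalg.matrix_power(A + np.eye(6, dtype=int), 5)
assert (reach > 0).all()                        # every node reaches every node
```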
Figure 5: (a) Triggering instants of the dynamic event-triggered control of each agent. (b) Triggering instants of the self-triggered control of each agent.
Figure 6: (a) The self-triggered measurement errors and their upper bounds for each agent. (b) Comparison of the total number of triggering times and the minimum triggering intervals between the dynamic event-triggered control and the self-triggered control for each agent.

The rigid body i\in\{1,2,3,4,5,6\} can be modeled by the following equations:

\displaystyle\dot{\sigma}_{i}=G(\sigma_{i})\omega_{i},
\displaystyle J_{i}\dot{\omega}_{i}=-\omega_{i}\times(J_{i}\omega_{i})+\tau_{i},

in which \sigma_{i}=[\sigma_{i}^{(1)},\sigma_{i}^{(2)},\sigma_{i}^{(3)}]^{\top} denotes the attitude vector, \omega_{i}=[\omega_{i}^{(1)},\omega_{i}^{(2)},\omega_{i}^{(3)}]^{\top} denotes the angular velocity vector, and J_{i} denotes the inertia matrix. The inertia matrix of each rigid body is selected as follows:

\displaystyle J_{1}=J_{3}=[1.0\ 0.1\ 0.1;\ 0.1\ 1.0\ 0.1;\ 0.1\ 0.1\ 1.0],
\displaystyle J_{2}=J_{4}=[1.2\ 0.1\ 0.1;\ 0.1\ 0.9\ 0.1;\ 0.1\ 0.1\ 1.1],
\displaystyle J_{5}=J_{6}=[1.1\ 0.2\ 0.1;\ 0.2\ 1.0\ 0.3;\ 0.1\ 0.3\ 1.3].

In this simulation, the total duration is set to 40 seconds and the sampling period is 0.01 seconds. The parameters are selected as follows: \alpha_{i}=0.5, P=1, the weight matrices Q_{i}=4I_{6} and R_{i}=I_{3}, and the learning rate l_{c,i}=0.6. In the dynamic event-triggered condition (III-B), y_{i}(0)=4, \gamma_{i}=0.5, \kappa_{i}=0.5, \varpi_{i}=0.6, and \theta_{i}=2. In the self-triggered condition (III-C), the parameters remain the same except that \kappa_{i}=0 and \varpi_{i}=0. The initial states of each rigid body are given by:

\sigma_{i}=\begin{bmatrix}0.05i\\ -0.05i\\ 0.05i\end{bmatrix},\;\omega_{i}=\dot{\omega}_{i}=\begin{bmatrix}0\\ 0\\ 0\end{bmatrix},\;i=1,2,\ldots,6.

The critic activation function is designed as:

\displaystyle\phi_{i}(e_{i})=\big[(e_{i}^{1})^{2}\quad e_{i}^{1}e_{i}^{2}\quad e_{i}^{1}e_{i}^{3}\quad e_{i}^{1}e_{i}^{4}\quad e_{i}^{1}e_{i}^{5}\quad e_{i}^{1}e_{i}^{6}
(e_{i}^{2})^{2}\quad e_{i}^{2}e_{i}^{3}\quad e_{i}^{2}e_{i}^{4}\quad e_{i}^{2}e_{i}^{5}\quad e_{i}^{2}e_{i}^{6}\quad(e_{i}^{3})^{2}
e_{i}^{3}e_{i}^{4}\quad e_{i}^{3}e_{i}^{5}\quad e_{i}^{3}e_{i}^{6}\quad(e_{i}^{4})^{2}\quad e_{i}^{4}e_{i}^{5}\quad e_{i}^{4}e_{i}^{6}
(e_{i}^{5})^{2}\quad e_{i}^{5}e_{i}^{6}\quad(e_{i}^{6})^{2}\big]^{\top}\in\mathbb{R}^{21}.
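This basis and its gradient \nabla\phi_{i}, which the update law (52) requires, can be generated programmatically. The sketch below enumerates the 21 degree-2 monomials in the paper's order; the function names are ours.

```python
from itertools import combinations_with_replacement
import numpy as np

# All index pairs (a, b) with a <= b over e in R^6: 21 pairs, in the
# same order as the basis listed above.
PAIRS = list(combinations_with_replacement(range(6), 2))

def phi(e):
    """Quadratic critic basis phi_i(e), a vector in R^21."""
    return np.array([e[a] * e[b] for a, b in PAIRS])

def phi_grad(e):
    """Jacobian d(phi)/de, shape (21, 6): the row for e_a*e_b has entry
    e_b at column a and e_a at column b (2*e_a when a == b)."""
    Jac = np.zeros((21, 6))
    for row, (a, b) in enumerate(PAIRS):
        Jac[row, a] += e[b]
        Jac[row, b] += e[a]
    return Jac
```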

By using the model-free event-triggered RL method proposed above, the optimal attitude consensus problem for multiple rigid body networks is solved. Fig. 2(a) shows the norms of the attitude errors and angular velocity errors between rigid body 1 and rigid body i\in\{2,3,4,5,6\}. From Fig. 2(a), we can conclude that the optimal attitude consensus is achieved. The same conclusion can be drawn from Fig. 2(b), which shows the trajectories of the consensus errors. Figs. 3(a) and 3(b) show the original control inputs of each rigid body and the control inputs of the augmented systems, respectively. It is worth noting that the control inputs of the augmented systems are updated only at the event-triggered instants and remain unchanged during the event-triggered intervals.

Fig. 4 demonstrates the critic estimated weight matrix of each rigid body. It can be clearly seen that the neural networks are updated only at the event-triggered instants, which obviously reduces the consumption of computing resources. The triggering instants of the dynamic event-triggered control and the self-triggered control are illustrated in Figs. 5(a) and 5(b), respectively. It is clearly shown that both the control update actions and the communication frequency are significantly reduced compared with the continuous-time control approaches [20, 21, 23]. Fig. 6(a) shows the self-triggered measurement errors and their upper bounds, which determine the event-triggered instants. Fig. 6(b) presents the number of triggering times and the minimum triggering interval under the dynamic event-triggered condition and the self-triggered condition, respectively. Since the self-triggered measurement error \Delta_{i}(t) is an upper bound of the measurement error E_{i}(t), the number of triggering times under the self-triggered mechanism is larger than that under the dynamic event-triggered mechanism. Therefore, we can conclude that the self-triggered mechanism leads to an inevitable increase in the number of triggering times, which is the price paid for avoiding continuous communication with neighbors.

VI Conclusion

In this paper, a model-free event-triggered RL method is proposed to deal with the optimal attitude consensus for multiple rigid body networks, which only requires the measurement data at the event-triggered instants. In order to solve the HJB equations, an event-triggered PI algorithm is proposed to obtain the optimal policy. Meanwhile, the critic NN framework is used to approximate the optimal value function online. The critic neural network is updated only when the event-triggered condition is violated, which greatly reduces the consumption of computing and communication resources. The UUB of the consensus error and the weight estimation error is proved and the Zeno behavior is excluded. A numerical simulation for a multiple rigid body network with six nodes shows the feasibility of the proposed method.

In the future, we will further improve this work from the following perspectives. One consideration is to relax the conditions on the communication topology, for example from strongly connected graphs to directed spanning trees or even switching topologies [44]. Since actuator failures can occur in real applications of rigid bodies such as intelligent cars and quadrotor aircraft [45], it is also well motivated to consider the optimal cooperative control of rigid body systems with actuator failures.

References

  • [1] S. Ren, R. Mao and J. Wu, “Passivity-based leader-following consensus control for nonlinear multi-agent systems with fixed and switching topologies,” IEEE Transactions on Network Science and Engineering, vol. 6, no. 4, pp. 844-856, Oct. 2019.
  • [2] H. Hong, W. Yu, J. Fu and X. Yu, “A novel class of distributed fixed-time consensus protocols for second-order nonlinear and disturbed multi-agent systems,” IEEE Transactions on Network Science and Engineering, vol. 6, no. 4, pp. 760-772, Oct. 2019.
  • [3] D. Chen, X. Liu and W. Yu, “Finite-time fuzzy adaptive consensus for heterogeneous nonlinear multi-agent systems,” IEEE Transactions on Network Science and Engineering, vol. 7, no. 4, pp. 3057-3066, Oct. 2020.
  • [4] H. Li, J. Liu, R. W. Liu, N. Xiong, K. Wu and T. Kim, “A dimensionality reduction-based multi-step clustering method for robust vessel trajectory analysis,” Sensors, vol. 17, no. 8, pp. 1792, Aug. 2017.
  • [5] X. Shi, J. Cao, G. Wen and M. Perc, “Finite-time consensus of opinion dynamics and its applications to distributed optimization over digraph,” IEEE Transactions on Cybernetics, vol. 49, no. 10, pp. 3767-3779, Oct. 2019.
  • [6] H. Du, W. Zhu, G. Wen, Z. Duan and J. Lü, “Distributed formation control of multiple quadrotor aircraft based on nonsmooth consensus algorithms,” IEEE Transactions on Cybernetics, vol. 49, no. 1, pp. 342-353, Jan. 2019.
  • [7] Z. Li, Y. Tang, T. Huang and J. Kurths, “Formation control with mismatched orientation in multi-agent systems,” IEEE Transactions on Network Science and Engineering, vol. 6, no. 3, pp. 314-325, 1 Jul. 2019.
  • [8] X. Jin, W. Du, W. He, L. Kocarev, Y. Tang and J. Kurths, “Twisting-based finite-time consensus for Euler-Lagrange systems with an event-triggered strategy,” IEEE Transactions on Network Science and Engineering, vol. 7, no. 3, pp. 1007-1018, Jul. 2020.
  • [9] H. Zhang and P. Gurfil, “Cooperative orbital control of multiple satellites via consensus,” IEEE Transactions on Aerospace and Electronic Systems, vol. 54, no. 5, pp. 2171-2188, Oct. 2018.
  • [10] K. Zhang and M. A. Demetriou, “Adaptation of consensus penalty terms for attitude synchronization of spacecraft formation with unknown parameters,” 52nd IEEE Conference on Decision and Control, Florence, pp. 5491-5496, 2013.
  • [11] H. Qu, F. Yang, Q. Han and Y. Zhang, “Distributed H∞-consensus filtering for attitude tracking using ground-based radars,” IEEE Transactions on Cybernetics, to be published.
  • [12] A. Abdessameud and A. Tayebi, “Attitude synchronization of a group of spacecraft without velocity measurements,” IEEE Transactions on Automatic Control, vol. 54, no. 11, pp. 2642-2648, Nov. 2009.
  • [13] H. Cai, and J. Huang, “Leader-following attitude consensus of multiple rigid body networks by attitude feedback control,” Automatica, vol. 69, pp. 87-92, Jul. 2016.
  • [14] H. Gui, and A.H.J. de Ruiter, “Global finite-time attitude consensus of leader-following spacecraft systems based on distributed observers,” Automatica, vol. 91, pp. 225-232, May 2018.
  • [15] M. Lu and L. Liu, “Leader-following attitude consensus of multiple rigid spacecraft systems under switching networks,” IEEE Transactions on Automatic Control, vol. 65, no. 2, pp. 839-845, Feb. 2020.
  • [16] B. Yi, X. Shen, H. Liu, Z. Zhang, W. Zhang, S. Liu and N. Xiong, “Deep matrix factorization with implicit feedback embedding for recommendation system,” IEEE Transactions on Industrial Informatics, vol. 15, no. 8, pp. 4591-4601, Aug. 2019.
  • [17] B. Lin, F. Zhu, J. Zhang, J. Chen, X. Chen, N. Xiong and J. L. Mauri, “A time-driven data placement strategy for a scientific workflow combining edge computing and cloud computing,” IEEE Transactions on Industrial Informatics, vol. 15, no. 7, pp. 4254-4265, Jul. 2019.
  • [18] J. Sun, X. Wang, N. Xiong and J. Shao, “Learning sparse representation with variational auto-encoder for anomaly detection,” IEEE Access, vol. 6, pp. 33353-33361, 2018.
  • [19] K. G. Vamvoudakis, F. L. Lewis, and G. R. Hudas, “Multi-agent differential graphical games: Online adaptive learning solution for synchronization with optimality,” Automatica, vol. 48, no. 8, pp. 1598-1611, Aug. 2012.
  • [20] J. Li, H. Modares, T. Chai, F. L. Lewis and L. Xie, “Off-policy reinforcement learning for synchronization in multiagent graphical games,” IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 10, pp. 2434-2445, Oct. 2017.
  • [21] J. Qin, M. Li, Y. Shi, Q. Ma and W. X. Zheng, “Optimal synchronization control of multiagent systems with input saturation via off-policy reinforcement learning,” IEEE Transactions on Neural Networks and Learning Systems, vol. 30, no. 1, pp. 85-96, Jan. 2019.
  • [22] M. Abu-Khalaf and F. L. Lewis, “Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach,” Automatica, vol. 41, no. 5, pp. 779-791, May 2005.
  • [23] H. Zhang, J. H. Park and W. Zhao, “Model-free optimal consensus control of networked Euler-Lagrange systems,” IEEE Access, vol. 7, pp. 100771-100779, 2019.
  • [24] L. Dong, X. Zhong, C. Sun and H. He, “Event-triggered adaptive dynamic programming for continuous-time systems with control constraints,” IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 8, pp. 1941-1952, Aug. 2017.
  • [25] X. Zhong and H. He, “An event-triggered ADP control approach for continuous-time system with unknown internal states,” IEEE Transactions on Cybernetics, vol. 47, no. 3, pp. 683-694, Mar. 2017.
  • [26] Y. Zhu, D. Zhao, H. He and J. Ji, “Event-triggered optimal control for partially unknown constrained-input systems via adaptive dynamic programming,” IEEE Transactions on Industrial Electronics, vol. 64, no. 5, pp. 4101-4109, May 2017.
  • [27] Q. Zhang, D. Zhao and D. Wang, “Event-based robust control for uncertain nonlinear systems using adaptive dynamic programming,” IEEE Transactions on Neural Networks and Learning Systems, vol. 29, no. 1, pp. 37-50, Jan. 2018.
  • [28] W. Zhao and H. Zhang, “Distributed optimal coordination control for nonlinear multi-agent systems using event-triggered adaptive dynamic programming method,” ISA Transactions, vol. 91, pp. 184-195, Aug. 2019.
  • [29] W. Zhao, W. Yu and H. Zhang, “Event-triggered optimal consensus tracking control for multi-agent systems with unknown internal states and disturbances,” Nonlinear Analysis Hybrid Systems, vol. 33, pp. 227-248, Aug. 2019.
  • [30] Z. Shi and C. Zhou, “Distributed optimal consensus control for nonlinear multi-agent systems with input saturation based on event-triggered adaptive dynamic programming method,” International Journal of Control, to be published.
  • [31] A. Girard, “Dynamic triggering mechanisms for event-triggered control,” IEEE Transactions on Automatic Control, vol. 60, no. 7, pp. 1992-1997, Jul. 2015.
  • [32] X. Yi, K. Liu, D. V. Dimarogonas and K. H. Johansson, “Dynamic event-triggered and self-triggered control for multi-agent systems,” IEEE Transactions on Automatic Control, vol. 64, no. 8, pp. 3300-3307, Aug. 2019.
  • [33] X. Jin, Y. Shi, Y. Tang and X. Wu, “Event-triggered attitude consensus with absolute and relative attitude measurements,” Automatica, vol. 122, Art. No. 109245, Dec. 2020.
  • [34] S. Wang, X. Jin, S. Mao, A. V. Vasilakos and Y. Tang, “Model-free event-triggered optimal consensus control of multiple Euler-Lagrange systems via reinforcement learning,” IEEE Transactions on Network Science and Engineering, to be published.
  • [35] H. Schaub and J. L. Junkins, Analytical Mechanics of Space Systems, American Institute of Aeronautics and Astronautics, 2009.
  • [36] M. Lemmon, Networked Control Systems, Springer London, 2010.
  • [37] H. K. Khalil, Nonlinear Systems, Upper Saddle River, NJ, USA: Prentice Hall, 2002.
  • [38] Z. Jiang and Y. Jiang, “Robust adaptive dynamic programming for linear and nonlinear systems: An overview,” European Journal of Control, vol. 19, no. 5, pp. 417-425, Sept. 2013.
  • [39] W. Fang, X. Yao, X. Zhao, J. Yin and N. Xiong, “A stochastic control approach to maximize profit on service provisioning for mobile cloudlet platforms,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 48, no. 4, pp. 522-534, Apr. 2018.
  • [40] Y. Liu, B. Jiang, J. Lu, J. Cao and G. Lu, “Event-triggered sliding mode control for attitude stabilization of a rigid spacecraft,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 50, no. 9, pp. 3290-3299, Sept. 2020.
  • [41] B. Li, Y. Liu, K. I. Kou and L. Yu, “Event-triggered control for the disturbance decoupling problem of Boolean control networks,” IEEE Transactions on Cybernetics, vol. 48, no. 9, pp. 2764-2769, Sept. 2018.
  • [42] S. Zhu, Y. Liu, Y. Lou, et al., “Stabilization of logical control networks: An event-triggered control approach,” Science China Information Sciences, vol. 63, no. 1, pp. 1-11, 2020.
  • [43] Y.-J. Liu, Q. Zeng, S. Tong, C. L. P. Chen and L. Liu, “Adaptive neural network control for active suspension systems with time-varying vertical displacement and speed constraints,” IEEE Transactions on Industrial Electronics, vol. 66, no. 12, pp. 9458-9466, Dec. 2019.
  • [44] L. Liu, Y.-J. Liu, A. Chen, S. Tong and C. L. P. Chen, “Integral barrier Lyapunov function-based adaptive control for switched nonlinear systems,” Science China Information Sciences, vol. 63, no. 3, 2020.
  • [45] Y.-J. Liu, Q. Zeng, S. Tong, C. L. P. Chen and L. Liu, “Actuator failure compensation-based adaptive control of active suspension systems with prescribed performance,” IEEE Transactions on Industrial Electronics, vol. 67, no. 8, pp. 7044-7053, Aug. 2020.
  • [46] Y. Qu and N. Xiong, “RFH: A resilient, fault-tolerant and high-efficient replication algorithm for distributed cloud storage,” 2012 41st International Conference on Parallel Processing, pp. 520-529, 2012.
Xin Jin received the B.S. degree in school of automation from the Guangdong University of Technology, Guangzhou, China, in 2016. He was an exchange Ph.D. student at the University of Victoria, Victoria, Canada from Sept. 2019 to Sept. 2020. He is currently working toward the Ph.D. degree at the East China University of Science and Technology. His research interests include rigid body systems, multi-agent systems, event-triggered control and their applications.
Shuai Mao received the B.S. degree in school of control science and engineering from East China University of Science and Technology, in 2017. He is currently pursuing the Ph.D. degree at East China University of Science and Technology. His research interests include multi-agent systems, distributed optimization and their applications in practical engineering.
Ljupco Kocarev (Fellow, IEEE) is currently a member of the Macedonian Academy of Sciences and Arts, a Full Professor with the Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University, Skopje, Macedonia, the Director of the Research Center for Computer Science and Information Technologies, Macedonian Academy, and a Research Professor with the University of California at San Diego. His work has been supported by the Macedonian Ministry of Education and Science, the Macedonian Academy of Sciences and Arts, NSF, AFOSR, DoE, ONR, ONR Global, NIH, STMicroelectronics, NATO, TEMPUS, FP6, FP7, Horizon 2020, and agencies from Spain, Italy, Germany (DAAD and DFG), Hong Kong, and Hungary. His scientific interests include networks, nonlinear systems and circuits, dynamical systems and mathematical modeling, machine learning, and computational biology.
Chen Liang is currently working as a Research Assistant at the Key Laboratory of Smart Manufacturing in Energy Chemical Process, Ministry of Education, and a Faculty Member of the School of Information at East China University of Science and Technology. She received her Master's degree in Computer Applied Technology from Shanghai Normal University in 2013. Her research interests include multi-agent systems, reinforcement learning and networks.
Saiwei Wang received the B.S. degree in school of control science and engineering from East China University of Science and Technology, Shanghai, China, in 2018. He is currently pursuing the M.S. degree at East China University of Science and Technology. His research interests include multi-agent systems, reinforcement learning and their applications.
Yang Tang (Senior Member, IEEE) received the B.S. and Ph.D. degrees in electrical engineering from Donghua University, Shanghai, China, in 2006 and 2010, respectively. From 2008 to 2010, he was a Research Associate with The Hong Kong Polytechnic University, Hong Kong. From 2011 to 2015, he was a Post-Doctoral Researcher with the Humboldt University of Berlin, Berlin, Germany, and with the Potsdam Institute for Climate Impact Research, Potsdam, Germany. Since 2015, he has been a Professor with the East China University of Science and Technology, Shanghai. His current research interests include distributed estimation/control/optimization, cyber-physical systems, hybrid dynamical systems, computer vision, reinforcement learning and their applications. Prof. Tang was a recipient of the Alexander von Humboldt Fellowship and the ISI Highly Cited Researchers Award by Clarivate Analytics from 2017. He is a Senior Board Member of Scientific Reports, an Associate Editor of IEEE Transactions on Neural Networks and Learning Systems, IEEE Transactions on Emerging Topics in Computational Intelligence, IEEE Transactions on Circuits and Systems I: Regular Papers and IEEE Systems Journal, etc.