
Private Inputs for Leader-Follower Game with Feedback Stackelberg Strategy

Yue Sun, Hongdan Li and Huanshui Zhang. This work was supported by the National Natural Science Foundation of China under Grant 61821004, the Natural Science Foundation of Shandong Province (ZR2021ZD14, ZR2021JQ24), the Science and Technology Project of Qingdao West Coast New Area (2019-32, 2020-20, 2020-1-4), the High-level Talent Team Project of Qingdao West Coast New Area (RCTD-JC-2019-05), and the Key Research and Development Program of Shandong Province (2020CXGC01208). *Corresponding author. Y. Sun is with the School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China (e-mail: [email protected]). H. Li and H. Zhang are with the College of Electrical Engineering and Automation, Shandong University of Science and Technology, Qingdao, Shandong 266590, China (e-mail: [email protected]; [email protected]).
Abstract

In this paper, the feedback Stackelberg strategy for the two-player leader-follower game with private inputs is considered. In particular, the follower shares its measurement information with the leader but not its historical control inputs, while the leader shares neither its historical control inputs nor its measurement information with the follower. These private inputs constitute the main obstacle: the estimation gain and the control gain become dependent on each other, so the forward and backward Riccati equations are coupled and the calculation becomes complicated. By introducing a pair of novel observers built on the information structures of the follower and the leader, respectively, a new observer-feedback Stackelberg strategy is designed, and the above obstacle is thereby avoided. Moreover, it is shown that the cost functions under the proposed observer-feedback Stackelberg strategy are asymptotically optimal with respect to the cost functions under the optimal feedback Stackelberg strategy with the feedback form of the state. Finally, a numerical example is given to show the effectiveness of the results.

Index Terms:
feedback Stackelberg strategy, private inputs, observers, asymptotic optimality.

I Introduction

In the traditional control model, centralized control is a basic concept and has been extensively studied, from time-invariant systems to time-varying systems and systems with time delay [1, 2, 3]. However, with the development of wireless sensor networks and artificial intelligence, centralized control is no longer applicable, since the achievable bandwidth is limited by the long delays induced by communication with the centralized controller [4]. Effectively controlling systems with multiple decision-makers in the absence of communication channels is an increasingly interesting and challenging control problem. Correspondingly, the decentralized control of large-scale systems arises, which has widespread applications in electrical power distribution networks, cloud environments, multi-agent systems, reinforcement learning and so on [5, 6, 7, 8], where decisions are made by multiple different decision-makers who have access to different information.

Decentralized control can be traced back to the 1970s [9, 10, 11]. The optimization of decentralized control can be divided into two categories. The first category is decentralized control for multiple controllers with one associated cost function [12, 13, 14]. Nayyar et al. studied decentralized stochastic control with partial sharing of historical observations and control inputs in [15] by using the common information approach, and the n-step delayed sharing information structure was investigated in [16]. [17] focused on decentralized control of networked control systems with asymmetric information by solving the forward and backward coupled Riccati equations through forward iteration, where the historical control inputs were shared only unilaterally, in contrast to the mutually shared information structure in [15, 16]. [18] designed decentralized strategies for a mean-field system, which were further shown to have asymptotic robust social optimality. The second category is decentralized control in game settings [23, 24, 25]. Two-criteria LQG decision problems with the one-step delay observation sharing pattern for stochastic discrete-time systems were considered for the Stackelberg strategy and the Nash equilibrium strategy in [19] and [20], respectively. Necessary conditions for an optimal Stackelberg strategy with output feedback form were given in [21] with incomplete information of the controllers. [22] investigated feedback risk-sensitive Nash equilibrium solutions for two-player nonzero-sum games with complete state observation and shared historical control inputs. The static output-feedback incentive Stackelberg game for Markov jump linear stochastic systems was considered in [26], where a numerical algorithm with guaranteed local convergence was further proposed.

Note that the information structures in the decentralized control systems mentioned above share the following feature: all or part of the historical control inputs of each controller are shared with the other controllers. However, the case where the controllers keep their own control inputs private has not been addressed in decentralized control, although it has applications in personalized healthcare, in the states of virtual keyboard users (e.g., Google GBoard users) and in social robots for second-language education of children [27]. It should be noted that an information structure in which the control inputs are unavailable to the other decision-makers causes the estimation gain to depend on the control gain and vice versa, which means that the forward and backward Riccati equations are coupled and the calculation becomes more complicated. Motivated by [28], which studied the LQ optimal control problem of linear systems with private input and measurement information by using a kind of novel observers to overcome this obstacle, in this paper we are concerned with the feedback Stackelberg strategy for the two-player game with private control inputs. In particular, the follower shares its measurement information with the leader, while the leader does not share any information with the follower due to the hierarchical relationship, and the historical control inputs of the follower and the leader are both private, which is the main obstacle of this paper. To overcome this problem, firstly, novel observers based on the information structure of each controller are proposed. Accordingly, a new kind of observer-feedback Stackelberg strategy for the follower and the leader is designed. Finally, it is proved that the associated cost functions for the follower and the leader under the proposed observer-feedback Stackelberg strategy are asymptotically optimal as compared with the cost functions under the optimal feedback Stackelberg strategy with the feedback form of the state obtained in [29].

The outline of this paper is as follows. The problem formulation is given in Section II. The observers and the observer-feedback Stackelberg strategy with private inputs are designed in Section III. The asymptotic optimality analysis is given in Section IV. A numerical example is presented in Section V. Conclusions are drawn in Section VI.

Notations: \mathbb{R}^{n} represents the space of all real n-dimensional vectors. A^{\prime} means the transpose of the matrix A. A symmetric matrix A>0 (or A\geq 0) means that A is positive definite (or positive semi-definite). \|x\| denotes the Euclidean norm of the vector x, i.e., \|x\|^{2}=x^{\prime}x. \|A\| denotes the spectral norm of the matrix A, i.e., \|A\|=\sqrt{\lambda_{max}(A^{\prime}A)}. \lambda(A) represents the eigenvalues of the matrix A and \lambda_{max}(A) represents the largest eigenvalue of A. I is an identity matrix with compatible dimensions. 0 in a block matrix represents a zero matrix with appropriate dimensions.

II Problem Formulation

Consider a two-player leader-follower game described as:

x(k+1)\displaystyle x(k+1) =\displaystyle= Ax(k)+B1u1(k)+B2u2(k),\displaystyle Ax(k)+B_{1}u_{1}(k)+B_{2}u_{2}(k), (1)
y1(k)\displaystyle y_{1}(k) =\displaystyle= H1x(k),\displaystyle H_{1}x(k), (2)
y2(k)\displaystyle y_{2}(k) =\displaystyle= H2x(k),\displaystyle H_{2}x(k), (3)

where x(k)\in\mathbb{R}^{n} is the state with initial value x(0). u_{1}(k)\in\mathbb{R}^{m_{1}} and u_{2}(k)\in\mathbb{R}^{m_{2}} are the control inputs of the follower and the leader, respectively. y_{i}(k)\in\mathbb{R}^{s_{i}} is the measurement information. A, B_{i} and H_{i} (i=1,2) are constant matrices with compatible dimensions.

The associated cost functions for the follower and the leader are given by

J1\displaystyle J_{1} =\displaystyle= k=0[x(k)Q1x(k)+u1(k)R11u1(k)\displaystyle\sum\limits^{\infty}_{k=0}[x^{\prime}(k)Q_{1}x(k)+u^{\prime}_{1}(k)R_{11}u_{1}(k) (4)
+u2(k)R12u2(k)],\displaystyle+u^{\prime}_{2}(k)R_{12}u_{2}(k)],
J2\displaystyle J_{2} =\displaystyle= k=0[x(k)Q2x(k)+u1(k)R21u1(k)\displaystyle\sum\limits^{\infty}_{k=0}[x^{\prime}(k)Q_{2}x(k)+u^{\prime}_{1}(k)R_{21}u_{1}(k) (5)
+u2(k)R22u2(k)],\displaystyle+u^{\prime}_{2}(k)R_{22}u_{2}(k)],

where the weight matrices satisfy Q_{i}\geq 0, R_{ij}\geq 0 (i\neq j) and R_{ii}>0 (i,j=1,2) with compatible dimensions.

Feedback Stackelberg strategies with different information structures for the controllers have been considered since the 1970s [29], where the information structure was such that each controller shared all or part of its historical inputs with the other. To the best of our knowledge, there has been no efficient technique to deal with the case of private inputs for the controllers. The difficulty lies in the unavailability of the other controller's historical control inputs, which makes the estimation gain depend on the control gain and couples the forward and backward Riccati equations. In this paper, our goal is to design novel observers based on the measurements and private inputs of the follower and the leader, respectively, and to show that the proposed observer-feedback Stackelberg strategy is asymptotically optimal with respect to the deterministic case in [29]. Mathematically, by denoting

Yi(k)\displaystyle Y_{i}(k) =\displaystyle= {yi(0),,yi(k)},\displaystyle\{y_{i}(0),...,y_{i}(k)\},
Ui(k1)\displaystyle U_{i}(k-1) =\displaystyle= {ui(0),,ui(k1)},\displaystyle\{u_{i}(0),...,u_{i}(k-1)\},
F1(k)\displaystyle{F}_{1}(k) =\displaystyle= {Y1(k),U1(k1)},\displaystyle\{Y_{1}(k),U_{1}(k-1)\}, (6)
F2(k)\displaystyle{F}_{2}(k) =\displaystyle= {Y1(k),Y2(k),U2(k1)},\displaystyle\{Y_{1}(k),Y_{2}(k),U_{2}(k-1)\}, (7)

we will design the observer-feedback Stackelberg strategy based on the information F_{i}(k), where u_{i}(k) is F_{i}(k)-causal for i=1,2. The following assumption will be used in this paper.

Assumption 1

(A,B) is stabilizable with B=\left[\begin{array}[]{cc}B_{1}&B_{2}\\ \end{array}\right] and (A,Q_{i}) (i=1,2) is observable.

By denoting the admissible control sets \mathcal{U}_{i} (i=1,2) for the feedback Stackelberg strategy of the follower and the leader:

𝒰1\displaystyle\mathcal{U}_{1} ={\displaystyle=\{ u1:Ω×[0,N]×n×U2U1},\displaystyle u_{1}:\Omega\times[0,N]\times\mathbb{R}^{n}\times U_{2}\longrightarrow U_{1}\},
𝒰2\displaystyle\mathcal{U}_{2} ={\displaystyle=\{ u2:Ω×[0,N]×nU2},\displaystyle u_{2}:\Omega\times[0,N]\times\mathbb{R}^{n}\longrightarrow U_{2}\}, (8)

where U_{1} and U_{2} represent the strategy spaces of the follower and the leader, respectively, the definition of the feedback Stackelberg strategy [30] is given as follows.

Definition 1

(u^{*}_{1}(k),u^{*}_{2}(k))\in\mathcal{U}_{1}\times\mathcal{U}_{2} is the optimal feedback Stackelberg strategy if there holds:

J1(u1(k,u2(k)),u2(k))\displaystyle J_{1}(u^{*}_{1}(k,u^{*}_{2}(k)),u^{*}_{2}(k)) \displaystyle\leq J1(u1(k,u2(k)),u2(k)),u1𝒰1,\displaystyle J_{1}(u_{1}(k,u^{*}_{2}(k)),u^{*}_{2}(k)),\forall u_{1}\in\mathcal{U}_{1},
J2(u1(k,u2(k)),u2(k))\displaystyle J_{2}(u^{*}_{1}(k,u^{*}_{2}(k)),u^{*}_{2}(k)) \displaystyle\leq J2(u1(k,u2(k)),u2(k)),u2𝒰2.\displaystyle J_{2}(u^{*}_{1}(k,u_{2}(k)),u_{2}(k)),\forall u_{2}\in\mathcal{U}_{2}.

Firstly, the optimal feedback Stackelberg strategy in the deterministic case with perfect information structure is given, that is, the information structures of the follower and the leader both satisfy

Yk={x(0),,x(k),ui(0),,ui(k1),i=1,2}.\displaystyle Y_{k}=\{x(0),...,x(k),u_{i}(0),...,u_{i}(k-1),\quad i=1,2\}.
Lemma 1

Under Assumption 1, the optimal feedback Stackelberg strategy with the information structure Y_{k} for both the follower and the leader is given by

u1(k)\displaystyle u_{1}(k) =\displaystyle= K1x(k),\displaystyle K_{1}x(k), (9)
u2(k)\displaystyle u_{2}(k) =\displaystyle= K2x(k),\displaystyle K_{2}x(k), (10)

where the feedback gain matrices K1K_{1} and K2K_{2} satisfy

K1\displaystyle K_{1} =\displaystyle= Γ11Y1,\displaystyle-\Gamma^{-1}_{1}Y_{1}, (11)
K2\displaystyle K_{2} =\displaystyle= Γ21Y2,\displaystyle-\Gamma^{-1}_{2}Y_{2}, (12)

with

Γ1\displaystyle\Gamma_{1} =\displaystyle= R11+B1P1B1,\displaystyle R_{11}+B^{\prime}_{1}P_{1}B_{1},
Γ2\displaystyle\Gamma_{2} =\displaystyle= R22+B2M1P2M1B2+B2SR21SB2,\displaystyle R_{22}+B^{\prime}_{2}M^{\prime}_{1}P_{2}M_{1}B_{2}+B^{\prime}_{2}S^{\prime}R_{21}SB_{2},
M1\displaystyle M_{1} =\displaystyle= IB1S,S=Γ11B1P1,\displaystyle I-B_{1}S,\quad S=\Gamma^{-1}_{1}B^{\prime}_{1}P_{1},
Y1\displaystyle Y_{1} =\displaystyle= B1P1A+B1P1B2K2,\displaystyle B^{\prime}_{1}P_{1}A+B^{\prime}_{1}P_{1}B_{2}K_{2},
Y2\displaystyle Y_{2} =\displaystyle= B2M1P2M1A+B2SR21SA,\displaystyle B^{\prime}_{2}M^{\prime}_{1}P_{2}M_{1}A+B^{\prime}_{2}S^{\prime}R_{21}SA,

where P1P_{1} and P2P_{2} satisfy the following two-coupled algebraic Riccati equations:

P1\displaystyle P_{1} =\displaystyle= Q1+(A+B2K2)P1(A+B2K2)\displaystyle Q_{1}+(A+B_{2}K_{2})^{\prime}P_{1}(A+B_{2}K_{2}) (13)
Y1Γ11Y1+K2R12K2,\displaystyle-Y^{\prime}_{1}\Gamma^{-1}_{1}Y_{1}+K^{\prime}_{2}R_{12}K_{2},
P2\displaystyle P_{2} =\displaystyle= Q2+AM1P2M1A+ASR21SA\displaystyle Q_{2}+A^{\prime}M^{\prime}_{1}P_{2}M_{1}A+A^{\prime}S^{\prime}R_{21}SA (14)
Y2Γ21Y2.\displaystyle-Y^{\prime}_{2}\Gamma^{-1}_{2}Y_{2}.

The optimal cost functions for feedback Stackelberg strategy are such that

J1\displaystyle J^{*}_{1} =\displaystyle= x(0)P1x(0),\displaystyle x^{\prime}(0)P_{1}x(0), (15)
J2\displaystyle J^{*}_{2} =\displaystyle= x(0)P2x(0).\displaystyle x^{\prime}(0)P_{2}x(0). (16)
Proof 1

The optimal feedback Stackelberg strategy for the deterministic case with perfect information structure for the follower and the leader in the finite-time horizon has been given in (18)-(28) of [29] with \theta(t)=\Pi_{1}(t)=\Pi_{2}(t)=0. By using Theorem 2 in [3], the results obtained in [29] can be extended to the infinite horizon, i.e., the recursions (18)-(28) in [29] converge to the algebraic equations (11)-(12) and (13)-(14) in Lemma 1 of this paper by the monotone bounded convergence theorem. This completes the proof.
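The convergence argument above also suggests a direct way to compute P_{1}, P_{2} and the gains K_{1}, K_{2}: iterate the finite-horizon recursions corresponding to (11)-(14) backward until they converge. The following Python sketch is only an illustration of this fixed-point iteration under Assumption 1 and is not part of the derivation; it assumes numpy, that all weight matrices are supplied as two-dimensional arrays of compatible dimensions, and the function name solve_coupled_are is our own.

```python
import numpy as np

def solve_coupled_are(A, B1, B2, Q1, Q2, R11, R12, R21, R22,
                      max_iter=10000, tol=1e-10):
    """Backward fixed-point iteration for the coupled AREs (13)-(14).

    Returns (P1, P2, K1, K2) so that u1 = K1 x and u2 = K2 x realize the
    feedback Stackelberg strategy of Lemma 1 (full state information).
    """
    n = A.shape[0]
    P1, P2 = Q1.copy(), Q2.copy()
    K1 = K2 = None
    for _ in range(max_iter):
        G1 = R11 + B1.T @ P1 @ B1                     # Gamma_1
        S = np.linalg.solve(G1, B1.T @ P1)            # S = Gamma_1^{-1} B1' P1
        M1 = np.eye(n) - B1 @ S
        G2 = (R22 + B2.T @ M1.T @ P2 @ M1 @ B2
              + B2.T @ S.T @ R21 @ S @ B2)            # Gamma_2
        Y2 = B2.T @ M1.T @ P2 @ M1 @ A + B2.T @ S.T @ R21 @ S @ A
        K2 = -np.linalg.solve(G2, Y2)                 # leader gain (12)
        Y1 = B1.T @ P1 @ (A + B2 @ K2)
        K1 = -np.linalg.solve(G1, Y1)                 # follower gain (11)
        Acl = A + B2 @ K2
        P1n = (Q1 + Acl.T @ P1 @ Acl
               - Y1.T @ np.linalg.solve(G1, Y1) + K2.T @ R12 @ K2)
        P2n = (Q2 + A.T @ M1.T @ P2 @ M1 @ A + A.T @ S.T @ R21 @ S @ A
               - Y2.T @ np.linalg.solve(G2, Y2))
        if max(np.linalg.norm(P1n - P1), np.linalg.norm(P2n - P2)) < tol:
            P1, P2 = P1n, P2n
            break
        P1, P2 = P1n, P2n
    return P1, P2, K1, K2
```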

Remark 1

P_{1}>0 and P_{2}>0 in (13)-(14) can be shown by using Theorem 2 in [3], which guarantees the invertibility of \Gamma_{1} and \Gamma_{2}.

Remark 2

Compared with [29], where the historical control inputs of the follower and the leader are shared with each other, the historical control inputs in this paper are private, which leads to the main obstacle.

III The observer-feedback Stackelberg strategy

Based on the discussion above, we are in a position to consider the leader-follower game with private inputs, i.e., u_{i}(k) is F_{i}(k)-causal.

Remark 3

As pointed out in [17], decentralized control with the information structure where one controller (C1) does not share its historical control inputs with the other controller (C2) while C2 shares its historical control inputs with C1 is a challenging problem, because the control gain and the estimator gain are coupled. The case with private inputs for both the follower and the leader is even more complicated due to the unavailability of the historical control inputs of the other controller.

Considering the private inputs of the follower and the leader, the observers \hat{x}_{i}(k) (i=1,2) are designed as follows:

x^1(k+1)\displaystyle\hat{x}_{1}(k+1) =\displaystyle= Ax^1(k)+B1u1(k)+B2K2x^1(k)\displaystyle A\hat{x}_{1}(k)+B_{1}u^{\star}_{1}(k)+B_{2}K_{2}\hat{x}_{1}(k) (17)
+L1[y1(k)H1x^1(k)],\displaystyle+L_{1}[y_{1}(k)-H_{1}\hat{x}_{1}(k)],
x^2(k+1)\displaystyle\hat{x}_{2}(k+1) =\displaystyle= Ax^2(k)+B1K1x^2(k)+B2u2(k)\displaystyle A\hat{x}_{2}(k)+B_{1}K_{1}\hat{x}_{2}(k)+B_{2}u^{\star}_{2}(k) (18)
+L2[y2(k)H2x^2(k)],\displaystyle+L_{2}[y_{2}(k)-H_{2}\hat{x}_{2}(k)],

where the observer gain matrices L_{1} and L_{2} are chosen to make the observers stable. Accordingly, the observer-feedback Stackelberg strategy is designed as follows:

u1(k)\displaystyle u^{\star}_{1}(k) =\displaystyle= K1x^1(k),\displaystyle K_{1}\hat{x}_{1}(k), (19)
u2(k)\displaystyle u^{\star}_{2}(k) =\displaystyle= K2x^2(k),\displaystyle K_{2}\hat{x}_{2}(k), (20)

where K_{1} and K_{2} are given in (11)-(12), respectively.

For convenience of the subsequent discussion, the following notations are introduced beforehand.

𝒜\displaystyle\mathcal{A} =\displaystyle= [A+B2K2L1H1B2K2B1K1A+B1K1L2H2],\displaystyle\left[\begin{array}[]{cc}A+B_{2}K_{2}-L_{1}H_{1}&-B_{2}K_{2}\\ -B_{1}K_{1}&A+B_{1}K_{1}-L_{2}H_{2}\\ \end{array}\right], (23)
\displaystyle\mathcal{B} =\displaystyle= [B1K1B2K2]\displaystyle\left[\begin{array}[]{cc}-B_{1}K_{1}&-B_{2}K_{2}\\ \end{array}\right] (25)
=\displaystyle= [B1S(A+B2K2)B2K2],\displaystyle\left[\begin{array}[]{cc}B_{1}S(A+B_{2}K_{2})&-B_{2}K_{2}\\ \end{array}\right], (27)
A¯\displaystyle\bar{A} =\displaystyle= [A+B1K1+B2K20𝒜],\displaystyle\left[\begin{array}[]{cc}A+B_{1}K_{1}+B_{2}K_{2}&\mathcal{B}\\ 0&\mathcal{A}\\ \end{array}\right], (30)
x~(k)\displaystyle\tilde{x}(k) =\displaystyle= [x~1(k)x~2(k)],\displaystyle\left[\begin{array}[]{cc}\tilde{x}^{\prime}_{1}(k)&\tilde{x}^{\prime}_{2}(k)\\ \end{array}\right]^{\prime}, (32)
x~i(k)\displaystyle\tilde{x}_{i}(k) =\displaystyle= x(k)x^i(k),i=1,2.\displaystyle x(k)-\hat{x}_{i}(k),\quad i=1,2.

Subsequently, the stability of the observers \hat{x}_{i}(k) (i=1,2) and the stability of the closed-loop system (1) under the designed observer-feedback Stackelberg strategy (19)-(20) are shown, respectively.

Theorem 1

If there exist observer gain matrices L_{1} and L_{2} such that the matrix \mathcal{A} is stable, then the observers \hat{x}_{i}(k) for i=1,2 are stable under the controllers (19)-(20) of the follower and the leader, i.e., there holds

limkx(k)x^i(k)=0.\displaystyle\lim_{k\rightarrow\infty}\|x(k)-\hat{x}_{i}(k)\|=0. (33)
Proof 2

By substituting the observer-feedback controllers (19)-(20) into (1), x(k+1) is recalculated as:

x(k+1)\displaystyle x(k+1) =\displaystyle= Ax(k)+B1K1x^1(k)+B2K2x^2(k)\displaystyle Ax(k)+B_{1}K_{1}\hat{x}_{1}(k)+B_{2}K_{2}\hat{x}_{2}(k) (34)
=\displaystyle= [A+B1K1+B2K2]x(k)B1K1x~1(k)\displaystyle[A+B_{1}K_{1}+B_{2}K_{2}]x(k)-B_{1}K_{1}\tilde{x}_{1}(k)
B2K2x~2(k).\displaystyle-B_{2}K_{2}\tilde{x}_{2}(k).

Accordingly, by substituting (19)-(20) into the observers (17)-(18) and combining with (34), the dynamics of \tilde{x}_{i}(k) for i=1,2 are given as

x~1(k+1)\displaystyle\tilde{x}_{1}(k+1) =\displaystyle= (A+B2K2L1H1)x~1(k)B2K2x~2(k),\displaystyle(A+B_{2}K_{2}-L_{1}H_{1})\tilde{x}_{1}(k)-B_{2}K_{2}\tilde{x}_{2}(k),
x~2(k+1)\displaystyle\tilde{x}_{2}(k+1) =\displaystyle= (A+B1K1L2H2)x~2(k)B1K1x~1(k),\displaystyle(A+B_{1}K_{1}-L_{2}H_{2})\tilde{x}_{2}(k)-B_{1}K_{1}\tilde{x}_{1}(k),

that is

x~(k+1)\displaystyle\tilde{x}(k+1) =\displaystyle= 𝒜x~(k).\displaystyle\mathcal{A}\tilde{x}(k). (35)

Subsequently, if there exist matrices L_{1} and L_{2} making \mathcal{A} stable, then the stability of the matrix \mathcal{A} implies that

limkx~(k)=0,\displaystyle\lim_{k\rightarrow\infty}\tilde{x}(k)=0,

i.e., (33) is established. That is to say, the observers \hat{x}_{i}(k) are stable under (19)-(20). The proof is completed.

Remark 4

Note that the key point in Theorem 1 lies in how to select L_{i} (i=1,2) so that the eigenvalues of the matrix \mathcal{A} are within the unit circle. The following analysis gives a method to find L_{i}.

According to the Lyapunov stability criterion, \mathcal{A} is stable if and only if, for any positive definite matrix Q, the Lyapunov equation \mathcal{A}^{\prime}P\mathcal{A}-P=-Q admits a solution P>0. Thus, if there exists a P>0 such that

𝒜P𝒜P<0,\displaystyle\mathcal{A}^{\prime}P\mathcal{A}-P<0, (36)

then \mathcal{A} is stable. Following from elementary row and column transformations, one has

(II0I)(I00𝒜)(P𝒜PP𝒜P)(I00𝒜)\displaystyle\left(\begin{array}[]{cc}I&I\\ 0&I\\ \end{array}\right)\left(\begin{array}[]{cc}I&0\\ 0&\mathcal{A}^{\prime}\\ \end{array}\right)\left(\begin{array}[]{cc}-P&\mathcal{A}^{\prime}P\\ P\mathcal{A}&-P\\ \end{array}\right)\left(\begin{array}[]{cc}I&0\\ 0&\mathcal{A}\\ \end{array}\right)
×(I0II)=(𝒜P𝒜P00𝒜P𝒜)<0,\displaystyle\times\left(\begin{array}[]{cc}I&0\\ I&I\\ \end{array}\right)=\left(\begin{array}[]{cc}\mathcal{A}^{\prime}P\mathcal{A}-P&0\\ 0&-\mathcal{A}^{\prime}P\mathcal{A}\\ \end{array}\right)<0,

that is, \mathcal{A}^{\prime}P\mathcal{A}-P<0 is equivalent to the following matrix inequality

(P𝒜PP𝒜P)<0.\displaystyle\left(\begin{array}[]{cc}-P&\mathcal{A}^{\prime}P\\ P\mathcal{A}&-P\\ \end{array}\right)<0. (41)

Noting that \mathcal{A} is related to L_{i}, in order to use the linear matrix inequality (LMI) Toolbox in Matlab to find L_{i}, (41) will be transformed into an LMI form. Let

P=(P00P),W~=(W100W2),\displaystyle P=\left(\begin{array}[]{cc}P&0\\ 0&P\\ \end{array}\right),\quad\tilde{W}=\left(\begin{array}[]{cc}W_{1}&0\\ 0&W_{2}\\ \end{array}\right),

and rewrite \mathcal{A} in (23) as \mathcal{A}=\tilde{A}-\tilde{L}\tilde{H}, where

A~\displaystyle\tilde{A} =\displaystyle= (A+B2K2B2K2B1K1A+B1K1),\displaystyle\left(\begin{array}[]{cc}A+B_{2}K_{2}&-B_{2}K_{2}\\ -B_{1}K_{1}&A+B_{1}K_{1}\\ \end{array}\right),
L~\displaystyle\tilde{L} =\displaystyle= (L100L2),H~=(H100H2).\displaystyle\left(\begin{array}[]{cc}L_{1}&0\\ 0&L_{2}\\ \end{array}\right),\quad\tilde{H}=\left(\begin{array}[]{cc}H_{1}&0\\ 0&H_{2}\\ \end{array}\right).

To this end, we have

P𝒜=PA~PL~H~=PA~W~H~,\displaystyle P\mathcal{A}=P\tilde{A}-P\tilde{L}\tilde{H}=P\tilde{A}-\tilde{W}\tilde{H},

with \tilde{W}=P\tilde{L}. Based on the discussion above, it is concluded that \mathcal{A} is stable if there exists a P>0 satisfying the following LMI:

(P(PA~W~H~)PA~W~H~P)<0.\displaystyle\left(\begin{array}[]{cc}-P&(P\tilde{A}-\tilde{W}\tilde{H})^{\prime}\\ P\tilde{A}-\tilde{W}\tilde{H}&-P\\ \end{array}\right)<0. (47)

In this way, by using the LMI Toolbox in Matlab, L_{i} can be found accordingly with L_{i}=P^{-1}W_{i}, which stabilizes \mathcal{A}.
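For illustration, the LMI (47) can also be solved with semidefinite programming tools other than the Matlab LMI Toolbox. The following Python sketch is only an example of this step; it assumes cvxpy with an SDP-capable solver (e.g., SCS), approximates the strict inequalities by a small margin eps, and the function name observer_gains is our own.

```python
import numpy as np
import cvxpy as cp

def observer_gains(A, B1, B2, K1, K2, H1, H2, eps=1e-6):
    """Search for L1, L2 stabilizing the error matrix in (23) via the LMI (47)."""
    n = A.shape[0]
    s1, s2 = H1.shape[0], H2.shape[0]
    A_tilde = np.block([[A + B2 @ K2, -B2 @ K2],
                        [-B1 @ K1,    A + B1 @ K1]])
    H_tilde = np.block([[H1, np.zeros((s1, n))],
                        [np.zeros((s2, n)), H2]])

    P = cp.Variable((n, n), symmetric=True)
    W1 = cp.Variable((n, s1))
    W2 = cp.Variable((n, s2))
    P_blk = cp.bmat([[P, np.zeros((n, n))], [np.zeros((n, n)), P]])
    W_blk = cp.bmat([[W1, np.zeros((n, s2))], [np.zeros((n, s1)), W2]])
    M = P_blk @ A_tilde - W_blk @ H_tilde              # P*A_tilde - W_tilde*H_tilde

    lmi = cp.bmat([[-P_blk, M.T], [M, -P_blk]])        # LMI (47), required < 0
    constraints = [P >> eps * np.eye(n), lmi << -eps * np.eye(4 * n)]
    cp.Problem(cp.Minimize(0), constraints).solve()

    L1 = np.linalg.solve(P.value, W1.value)            # L_i = P^{-1} W_i
    L2 = np.linalg.solve(P.value, W2.value)
    return L1, L2
```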

Under the observer-feedback controllers (19)-(20), the stability of (1) is given.

Theorem 2

Under Assumption 1, if there exist L_{i} stabilizing \mathcal{A}, then the closed-loop system (1) is stable under the observer-feedback controllers (19)-(20).

Proof 3

According to (34), the closed-loop system (1) is reformulated as

x(k+1)\displaystyle x(k+1) =\displaystyle= [A+B1K1+B2K2]x(k)+x~(k).\displaystyle[A+B_{1}K_{1}+B_{2}K_{2}]x(k)+\mathcal{B}\tilde{x}(k). (48)

Together with (35), we have

[x(k+1)x~(k+1)]\displaystyle\left[\begin{array}[]{c}x(k+1)\\ \tilde{x}(k+1)\\ \end{array}\right] =\displaystyle= A¯[x(k)x~(k)].\displaystyle\bar{A}\left[\begin{array}[]{c}x(k)\\ \tilde{x}(k)\\ \end{array}\right]. (53)

The stability of A+B_{1}K_{1}+B_{2}K_{2} is guaranteed by the stabilizability of (A,B) and the observability of (A,Q_{i}) for i=1,2. Following from Theorem 1, \mathcal{A} is stabilized by selecting appropriate gain matrices L_{1} and L_{2}. Subsequently, the stability of the closed-loop system (1) is derived. This completes the proof.
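As a complementary check of Theorems 1 and 2, the closed-loop behavior described by (48) and (35) can be verified by directly simulating the plant (1) and the observers (17)-(18) under the strategy (19)-(20). The following Python sketch is illustrative only; it assumes numpy, one-dimensional state vectors, gains K_{1}, K_{2}, L_{1}, L_{2} computed beforehand, and the function name simulate is our own.

```python
import numpy as np

def simulate(A, B1, B2, K1, K2, L1, L2, H1, H2, x0, x1hat0, x2hat0, steps=50):
    """Forward simulation of the plant (1) and the observers (17)-(18)
    under the observer-feedback strategy (19)-(20).

    Returns a list of tuples (x(k+1), x(k+1)-x1hat(k+1), x(k+1)-x2hat(k+1)).
    """
    x, xh1, xh2 = x0.copy(), x1hat0.copy(), x2hat0.copy()
    traj = []
    for _ in range(steps):
        u1 = K1 @ xh1                                  # follower input (19)
        u2 = K2 @ xh2                                  # leader input (20)
        y1, y2 = H1 @ x, H2 @ x                        # measurements (2)-(3)
        x_next = A @ x + B1 @ u1 + B2 @ u2             # plant update (1)
        xh1 = (A @ xh1 + B1 @ u1 + B2 @ (K2 @ xh1)
               + L1 @ (y1 - H1 @ xh1))                 # follower observer (17)
        xh2 = (A @ xh2 + B1 @ (K1 @ xh2) + B2 @ u2
               + L2 @ (y2 - H2 @ xh2))                 # leader observer (18)
        x = x_next
        traj.append((x.copy(), x - xh1, x - xh2))
    return traj
```

Along such a simulation, both \|x(k)\| and \|x(k)-\hat{x}_{i}(k)\| should decay to zero when the conditions of Theorem 2 hold.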

IV The asymptotical optimal analysis

The stability of the state and the observers, i.e., x(k) and \hat{x}_{i}(k) for i=1,2, under the observer-feedback controllers (19)-(20) has been shown in Theorem 1 and Theorem 2. To show the rationality of the design of the observer-feedback controllers (19)-(20), the asymptotic optimality analysis of the associated cost functions is now given. To this end, denote the cost functions for the follower and the leader as

J1(s,M)\displaystyle J_{1}(s,M) =\displaystyle= k=sM[x(k)Q1x(k)+u1(k)R11u1(k)\displaystyle\sum\limits^{M}_{k=s}[x^{\prime}(k)Q_{1}x(k)+u^{\prime}_{1}(k)R_{11}u_{1}(k) (54)
+u2(k)R12u2(k)],\displaystyle+u^{\prime}_{2}(k)R_{12}u_{2}(k)],
J2(s,M)\displaystyle J_{2}(s,M) =\displaystyle= k=sM[x(k)Q2x(k)+u1(k)R21u1(k)\displaystyle\sum\limits^{M}_{k=s}[x^{\prime}(k)Q_{2}x(k)+u^{\prime}_{1}(k)R_{21}u_{1}(k) (55)
+u2(k)R22u2(k)].\displaystyle+u^{\prime}_{2}(k)R_{22}u_{2}(k)].

Now, we are in a position to show that the observer-feedback Stackelberg strategy (19)-(20) is asymptotically optimal with respect to the optimal feedback Stackelberg strategy presented in Lemma 1.

Theorem 3

Under Assumption 1, the corresponding cost functions (54)-(55) under the observer-feedback Stackelberg strategy (19)-(20), with L_{i} (i=1,2) selected as in Theorem 1, are given by

J1(s,)\displaystyle J^{\star}_{1}(s,\infty) =\displaystyle= x(s)P1x(s)\displaystyle x^{\prime}(s)P_{1}x(s) (62)
+k=s[x(k)x~(k)][0T1T1S1][x(k)x~(k)],\displaystyle+\sum\limits^{\infty}_{k=s}\left[\begin{array}[]{c}x(k)\\ \tilde{x}(k)\\ \end{array}\right]^{\prime}\left[\begin{array}[]{cc}0&T_{1}\\ T^{\prime}_{1}&S_{1}\\ \end{array}\right]\left[\begin{array}[]{c}x(k)\\ \tilde{x}(k)\\ \end{array}\right],
J2(s,)\displaystyle J^{\star}_{2}(s,\infty) =\displaystyle= x(s)P2x(s)\displaystyle x^{\prime}(s)P_{2}x(s) (69)
+k=s[x(k)x~(k)][0T2T2S2][x(k)x~(k)],\displaystyle+\sum\limits^{\infty}_{k=s}\left[\begin{array}[]{c}x(k)\\ \tilde{x}(k)\\ \end{array}\right]^{\prime}\left[\begin{array}[]{cc}0&T_{2}\\ T^{\prime}_{2}&S_{2}\\ \end{array}\right]\left[\begin{array}[]{c}x(k)\\ \tilde{x}(k)\\ \end{array}\right],

where

S1\displaystyle S_{1} =\displaystyle= P1[K1R11K100K2R12K2],\displaystyle\mathcal{B}^{\prime}P_{1}\mathcal{B}-\left[\begin{array}[]{cc}K^{\prime}_{1}R_{11}K_{1}&0\\ 0&K^{\prime}_{2}R_{12}K_{2}\\ \end{array}\right],
S2\displaystyle S_{2} =\displaystyle= P2[K1R21K100K2R22K2],\displaystyle\mathcal{B}^{\prime}P_{2}\mathcal{B}-\left[\begin{array}[]{cc}K^{\prime}_{1}R_{21}K_{1}&0\\ 0&K^{\prime}_{2}R_{22}K_{2}\\ \end{array}\right],
T1\displaystyle T_{1} =\displaystyle= (A+B2K2)M1P1,\displaystyle(A+B_{2}K_{2})^{\prime}M^{\prime}_{1}P_{1}\mathcal{B},
T2\displaystyle T_{2} =\displaystyle= (A+B2K2)M1P2.\displaystyle(A+B_{2}K_{2})^{\prime}M^{\prime}_{1}P_{2}\mathcal{B}.

Moreover, the differences, denoted as \delta J_{1}(s,\infty) and \delta J_{2}(s,\infty), between (62)-(69) and the optimal cost functions (15)-(16) obtained in Lemma 1 under the optimal feedback Stackelberg strategy are such that

δJ1(s,)\displaystyle\delta J_{1}(s,\infty) =\displaystyle= J1(s,)J1(s,)\displaystyle J^{\star}_{1}(s,\infty)-J^{*}_{1}(s,\infty) (78)
=\displaystyle= k=s[x(k)x~(k)][0T1T1S1][x(k)x~(k)],\displaystyle\sum\limits^{\infty}_{k=s}\left[\begin{array}[]{c}x(k)\\ \tilde{x}(k)\\ \end{array}\right]^{\prime}\left[\begin{array}[]{cc}0&T_{1}\\ T^{\prime}_{1}&S_{1}\\ \end{array}\right]\left[\begin{array}[]{c}x(k)\\ \tilde{x}(k)\\ \end{array}\right],
δJ2(s,)\displaystyle\delta J_{2}(s,\infty) =\displaystyle= J2(s,)J2(s,)\displaystyle J^{\star}_{2}(s,\infty)-J^{*}_{2}(s,\infty) (85)
=\displaystyle= k=s[x(k)x~(k)][0T2T2S2][x(k)x~(k)].\displaystyle\sum\limits^{\infty}_{k=s}\left[\begin{array}[]{c}x(k)\\ \tilde{x}(k)\\ \end{array}\right]^{\prime}\left[\begin{array}[]{cc}0&T_{2}\\ T^{\prime}_{2}&S_{2}\\ \end{array}\right]\left[\begin{array}[]{c}x(k)\\ \tilde{x}(k)\\ \end{array}\right].
Proof 4

The proof will be divided into two parts. The first part is to consider the cost function of the follower under the observer-feedback controllers (19)-(20). Following from (34), system (1) can be rewritten as

x(k+1)\displaystyle x(k+1) =\displaystyle= [A+B1K1+B2K2]x(k)B1K1x~1(k)\displaystyle[A+B_{1}K_{1}+B_{2}K_{2}]x(k)-B_{1}K_{1}\tilde{x}_{1}(k) (86)
B2K2x~2(k)\displaystyle-B_{2}K_{2}\tilde{x}_{2}(k)
=\displaystyle= (IB1S)(A+B2K2)x(k)+x~(k),\displaystyle(I-B_{1}S)(A+B_{2}K_{2})x(k)+\mathcal{B}\tilde{x}(k),

where K_{1} in (11) has been used in the derivation of the last equality.

Firstly, we will prove that J^{\star}_{1}(s,\infty) satisfies (62). Combining (86) with (13), one has

x(k)P1x(k)x(k+1)P1x(k+1)\displaystyle x^{\prime}(k)P_{1}x(k)-x(k+1)^{\prime}P_{1}x(k+1)
=\displaystyle= x(k)[P1(A+B2K2)(IB1S)P1(IB1S)\displaystyle x^{\prime}(k)[P_{1}-(A+B_{2}K_{2})^{\prime}(I-B_{1}S)^{\prime}P_{1}(I-B_{1}S)
×(A+B2K2)]x(k)x(k)(A+B2K2)M1P1x~(k)\displaystyle\times(A+B_{2}K_{2})]x(k)-x^{\prime}(k)(A+B_{2}K_{2})^{\prime}M^{\prime}_{1}P_{1}\mathcal{B}\tilde{x}(k)
x~(k)P1M1(A+B2K2)x(k)x~(k)P1x~(k)\displaystyle-\tilde{x}^{\prime}(k)\mathcal{B}^{\prime}P_{1}M_{1}(A+B_{2}K_{2})x(k)-\tilde{x}^{\prime}(k)\mathcal{B}^{\prime}P_{1}\mathcal{B}\tilde{x}(k)
=\displaystyle= x(k)[Q1+K2R12K2(A+B2K2)P1B1Γ11B1P1\displaystyle x^{\prime}(k)[Q_{1}+K^{\prime}_{2}R_{12}K_{2}-(A+B_{2}K_{2})^{\prime}P_{1}B_{1}\Gamma^{-1}_{1}B^{\prime}_{1}P_{1}
×(A+B2K2)+(A+B2K2)P1B1S(A+B2K2)\displaystyle\times(A+B_{2}K_{2})+(A+B_{2}K_{2})^{\prime}P_{1}B_{1}S(A+B_{2}K_{2})
+(A+B2K2)SB1P1(A+B2K2)(A+B2K2)S\displaystyle+(A+B_{2}K_{2})^{\prime}S^{\prime}B^{\prime}_{1}P_{1}(A+B_{2}K_{2})-(A+B_{2}K_{2})^{\prime}S^{\prime}
×B1P1B1S(A+B2K2)]x(k)x(k)(A+B2K2)M1\displaystyle\times B^{\prime}_{1}P_{1}B_{1}S(A+B_{2}K_{2})]x(k)-x^{\prime}(k)(A+B_{2}K_{2})^{\prime}M^{\prime}_{1}
×P1x~(k)x~(k)P1M1(A+B2K2)x(k)\displaystyle\times P_{1}\mathcal{B}\tilde{x}(k)-\tilde{x}^{\prime}(k)\mathcal{B}^{\prime}P_{1}M_{1}(A+B_{2}K_{2})x(k)
x~(k)P1x~(k)\displaystyle-\tilde{x}^{\prime}(k)\mathcal{B}^{\prime}P_{1}\mathcal{B}\tilde{x}(k)
=\displaystyle= x(k)[Q1+K2R12K2+K1(R11+B1P1B1)K1\displaystyle x^{\prime}(k)[Q_{1}+K^{\prime}_{2}R_{12}K_{2}+K^{\prime}_{1}(R_{11}+B^{\prime}_{1}P_{1}B_{1})K_{1}
K1B1P1B1K1]x(k)x(k)(A+B2K2)M1P1x~(k)\displaystyle-K^{\prime}_{1}B^{\prime}_{1}P_{1}B_{1}K_{1}]x(k)-x^{\prime}(k)(A+B_{2}K_{2})^{\prime}M^{\prime}_{1}P_{1}\mathcal{B}\tilde{x}(k)
x~(k)P1M1(A+B2K2)x(k)x~(k)P1x~(k)\displaystyle-\tilde{x}^{\prime}(k)\mathcal{B}^{\prime}P_{1}M_{1}(A+B_{2}K_{2})x(k)-\tilde{x}^{\prime}(k)\mathcal{B}^{\prime}P_{1}\mathcal{B}\tilde{x}(k)
=\displaystyle= x(k)[Q1+K1R11K1+K2R12K2]x(k)\displaystyle x^{\prime}(k)[Q_{1}+K^{\prime}_{1}R_{11}K_{1}+K^{\prime}_{2}R_{12}K_{2}]x(k) (87)
x(k)(A+B2K2)M1P1x~(k)x~(k)P1M1\displaystyle-x^{\prime}(k)(A+B_{2}K_{2})^{\prime}M^{\prime}_{1}P_{1}\mathcal{B}\tilde{x}(k)-\tilde{x}^{\prime}(k)\mathcal{B}^{\prime}P_{1}M_{1}
×(A+B2K2)x(k)x~(k)P1x~(k).\displaystyle\times(A+B_{2}K_{2})x(k)-\tilde{x}^{\prime}(k)\mathcal{B}^{\prime}P_{1}\mathcal{B}\tilde{x}(k).

Summing (87) from k=s to k=M on both sides, we have

x(s)P1x(s)x(M+1)P1x(M+1)\displaystyle x^{\prime}(s)P_{1}x(s)-x^{\prime}(M+1)P_{1}x(M+1)
=\displaystyle= J1(s,M)+k=sMx~(k)[K1R11K100K2R12K2]x~(k)\displaystyle J_{1}(s,M)+\sum\limits^{M}_{k=s}\tilde{x}^{\prime}(k)\left[\begin{array}[]{cc}K^{\prime}_{1}R_{11}K_{1}&0\\ 0&K^{\prime}_{2}R_{12}K_{2}\\ \end{array}\right]\tilde{x}(k) (97)
k=sM[x(k)x~(k)][0T1T1P1][x(k)x~(k)].\displaystyle-\sum\limits^{M}_{k=s}\left[\begin{array}[]{c}x(k)\\ \tilde{x}(k)\\ \end{array}\right]^{\prime}\left[\begin{array}[]{cc}0&T_{1}\\ T^{\prime}_{1}&\mathcal{B}^{\prime}P_{1}\mathcal{B}\\ \end{array}\right]\left[\begin{array}[]{c}x(k)\\ \tilde{x}(k)\\ \end{array}\right].

According to Theorem 2, the stability of (1) means that

limMx(M+1)P1x(M+1)=0.\displaystyle\lim_{M\rightarrow\infty}x^{\prime}(M+1)P_{1}x(M+1)=0.

Thus, following from (97) and letting M\rightarrow\infty, (62) is obtained exactly.

The second part is to consider the cost function of the leader under the observer-feedback controllers (19)-(20), that is, we will show that J^{\star}_{2}(s,\infty) satisfies (69). Following from (86), one derives

x(k)P2x(k)x(k+1)P2x(k+1)\displaystyle x^{\prime}(k)P_{2}x(k)-x(k+1)^{\prime}P_{2}x(k+1)
=\displaystyle= x(k)[P2(A+B2K2)M1P2M1(A+B2K2)]x(k)\displaystyle x^{\prime}(k)[P_{2}-(A+B_{2}K_{2})^{\prime}M^{\prime}_{1}P_{2}M_{1}(A+B_{2}K_{2})]x(k)
x(k)(A+B2K2)M1P2x~(k)x~(k)P2M1(A\displaystyle-x^{\prime}(k)(A+B_{2}K_{2})^{\prime}M^{\prime}_{1}P_{2}\mathcal{B}\tilde{x}(k)-\tilde{x}^{\prime}(k)\mathcal{B}^{\prime}P_{2}M_{1}(A
+B2K2)x(k)x~(k)P2x~(k)\displaystyle+B_{2}K_{2})x(k)-\tilde{x}^{\prime}(k)\mathcal{B}^{\prime}P_{2}\mathcal{B}\tilde{x}(k)
=\displaystyle= x(k)[Q2+ASR21SAY2Γ21Y2\displaystyle x^{\prime}(k)[Q_{2}+A^{\prime}S^{\prime}R_{21}SA-Y^{\prime}_{2}\Gamma^{-1}_{2}Y_{2} (98)
AM1P2M1B2K2K2B2M1P2M1A\displaystyle-A^{\prime}M^{\prime}_{1}P_{2}M_{1}B_{2}K_{2}-K^{\prime}_{2}B^{\prime}_{2}M^{\prime}_{1}P_{2}M_{1}A
K2B2M1P2M1B2K2]x(k)x~(k)P2x~(k)\displaystyle-K^{\prime}_{2}B^{\prime}_{2}M^{\prime}_{1}P_{2}M_{1}B_{2}K_{2}]x(k)-\tilde{x}^{\prime}(k)\mathcal{B}^{\prime}P_{2}\mathcal{B}\tilde{x}(k)
x(k)(A+B2K2)M1P2x~(k)\displaystyle-x^{\prime}(k)(A+B_{2}K_{2})^{\prime}M^{\prime}_{1}P_{2}\mathcal{B}\tilde{x}(k)
x~(k)P2M1(A+B2K2)x(k),\displaystyle-\tilde{x}^{\prime}(k)\mathcal{B}^{\prime}P_{2}M_{1}(A+B_{2}K_{2})x(k),

where the algebraic Riccati equation (14) has been used in the derivation of the last equality. For further simplification, we make the following derivation:

x(k)P2x(k)x(k+1)P2x(k+1)\displaystyle x^{\prime}(k)P_{2}x(k)-x(k+1)^{\prime}P_{2}x(k+1)
=\displaystyle= x(k)[Q2+K1R21K1+K2R22K2]x(k)\displaystyle x^{\prime}(k)[Q_{2}+K^{\prime}_{1}R_{21}K_{1}+K^{\prime}_{2}R_{22}K_{2}]x(k)
+x(k)[(A+B2K2)SR21S(A+B2K2)\displaystyle+x^{\prime}(k)[-(A+B_{2}K_{2})^{\prime}S^{\prime}R_{21}S(A+B_{2}K_{2})
K2R22K2+ASR21SAY2Γ21Y2\displaystyle-K^{\prime}_{2}R_{22}K_{2}+A^{\prime}S^{\prime}R_{21}SA-Y^{\prime}_{2}\Gamma^{-1}_{2}Y_{2}
AM1P2M1B2K2K2B2M1P2M1A\displaystyle-A^{\prime}M^{\prime}_{1}P_{2}M_{1}B_{2}K_{2}-K^{\prime}_{2}B^{\prime}_{2}M^{\prime}_{1}P_{2}M_{1}A
K2B2M1P2M1B2K2]x(k)x(k)(A+B2K2)\displaystyle-K^{\prime}_{2}B^{\prime}_{2}M^{\prime}_{1}P_{2}M_{1}B_{2}K_{2}]x(k)-x^{\prime}(k)(A+B_{2}K_{2})^{\prime}
×M1P2x~(k)x~(k)P2M1(A+B2K2)x(k)\displaystyle\times M^{\prime}_{1}P_{2}\mathcal{B}\tilde{x}(k)-\tilde{x}^{\prime}(k)\mathcal{B}^{\prime}P_{2}M_{1}(A+B_{2}K_{2})x(k)
x~(k)P2x~(k)\displaystyle-\tilde{x}^{\prime}(k)\mathcal{B}^{\prime}P_{2}\mathcal{B}\tilde{x}(k)
=\displaystyle= x(k)[Q2+K1R21K1+K2R22K2]x(k)\displaystyle x^{\prime}(k)[Q_{2}+K^{\prime}_{1}R_{21}K_{1}+K^{\prime}_{2}R_{22}K_{2}]x(k) (99)
x(k)(A+B2K2)M1P2x~(k)x~(k)P2M1\displaystyle-x^{\prime}(k)(A+B_{2}K_{2})^{\prime}M^{\prime}_{1}P_{2}\mathcal{B}\tilde{x}(k)-\tilde{x}^{\prime}(k)\mathcal{B}^{\prime}P_{2}M_{1}
×(A+B2K2)x(k)x~(k)P2x~(k).\displaystyle\times(A+B_{2}K_{2})x(k)-\tilde{x}^{\prime}(k)\mathcal{B}^{\prime}P_{2}\mathcal{B}\tilde{x}(k).

Summing (99) from k=s to k=M on both sides, one has

x(s)P2x(s)x(M+1)P2x(M+1)\displaystyle x^{\prime}(s)P_{2}x(s)-x^{\prime}(M+1)P_{2}x(M+1)
=\displaystyle= J2(s,M)+k=sMx~(k)[K1R21K100K2R22K2]x~(k)\displaystyle J_{2}(s,M)+\sum\limits^{M}_{k=s}\tilde{x}^{\prime}(k)\left[\begin{array}[]{cc}K^{\prime}_{1}R_{21}K_{1}&0\\ 0&K^{\prime}_{2}R_{22}K_{2}\\ \end{array}\right]\tilde{x}(k) (109)
k=sM[x(k)x~(k)][0T2T2P2][x(k)x~(k)].\displaystyle-\sum\limits^{M}_{k=s}\left[\begin{array}[]{c}x(k)\\ \tilde{x}(k)\\ \end{array}\right]^{\prime}\left[\begin{array}[]{cc}0&T_{2}\\ T^{\prime}_{2}&\mathcal{B}^{\prime}P_{2}\mathcal{B}\\ \end{array}\right]\left[\begin{array}[]{c}x(k)\\ \tilde{x}(k)\\ \end{array}\right].

Due to \lim_{M\rightarrow\infty}x^{\prime}(M+1)P_{2}x(M+1)=0, (69) can be immediately obtained by letting M\rightarrow\infty in (109).

Moreover, together with Lemma 1, the optimal cost functions of (54)-(55) under the optimal feedback Stackelberg strategy are given by

J1(s,)\displaystyle J^{*}_{1}(s,\infty) =\displaystyle= x(s)P1x(s),\displaystyle x^{\prime}(s)P_{1}x(s), (110)
J2(s,)\displaystyle J^{*}_{2}(s,\infty) =\displaystyle= x(s)P2x(s).\displaystyle x^{\prime}(s)P_{2}x(s). (111)

Together with (62)-(69), \delta J_{1}(s,\infty) and \delta J_{2}(s,\infty) in (78)-(85) are obtained. This completes the proof.
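As a numerical sanity check of Theorem 3, the correction terms in (78)-(85) can be accumulated along the dynamics (48) and (35). The following Python sketch is an illustration only; it assumes numpy, matrix-valued weights, previously computed P_{i}, K_{i}, L_{i}, one-dimensional vectors x0 and xt0 (with xt0 stacking the two initial estimation errors), and the function name optimality_gap is our own.

```python
import numpy as np

def optimality_gap(A, B1, B2, K1, K2, L1, L2, H1, H2,
                   P1, P2, R11, R12, R21, R22, x0, xt0, steps=200):
    """Accumulate truncations of delta J_1 and delta J_2 in (78)-(85)
    along x(k+1) = (A+B1K1+B2K2)x(k) + B_cal xt(k) and xt(k+1) = A_cal xt(k)."""
    n = A.shape[0]
    G1 = R11 + B1.T @ P1 @ B1
    S = np.linalg.solve(G1, B1.T @ P1)
    M1 = np.eye(n) - B1 @ S
    Acal = np.block([[A + B2 @ K2 - L1 @ H1, -B2 @ K2],
                     [-B1 @ K1, A + B1 @ K1 - L2 @ H2]])
    Bcal = np.hstack([-B1 @ K1, -B2 @ K2])
    T = [(A + B2 @ K2).T @ M1.T @ P @ Bcal for P in (P1, P2)]
    Z = np.zeros((n, n))
    Scorr = [Bcal.T @ P1 @ Bcal - np.block([[K1.T @ R11 @ K1, Z],
                                            [Z, K2.T @ R12 @ K2]]),
             Bcal.T @ P2 @ Bcal - np.block([[K1.T @ R21 @ K1, Z],
                                            [Z, K2.T @ R22 @ K2]])]
    x, xt = x0.copy(), xt0.copy()
    gap = np.zeros(2)
    for _ in range(steps):
        for i in range(2):
            gap[i] += 2 * x @ T[i] @ xt + xt @ Scorr[i] @ xt
        x = (A + B1 @ K1 + B2 @ K2) @ x + Bcal @ xt   # (48)
        xt = Acal @ xt                                 # (35)
    return gap
```

Under the conditions of Theorem 2 this sum converges, and shifting the starting time forward makes it arbitrarily small, in line with Theorem 4 below.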

Finally, we will show the asymptotic optimality of the observer-feedback Stackelberg strategy (19)-(20).

Theorem 4

Under the conditions of Theorem 2, the cost functions (62)-(69) under the observer-feedback Stackelberg strategy (19)-(20) are asymptotically optimal with respect to the optimal cost functions (110)-(111) under the optimal feedback Stackelberg strategy (9)-(10), that is to say, for any \varepsilon>0, there exists a sufficiently large integer N such that, for i=1,2,

δJi(N,)<ε.\displaystyle\delta J_{i}(N,\infty)<\varepsilon. (112)
Proof 5

Following from Theorem 2, the matrix \bar{A} is stable. Thus, by [2], there exist constants 0<\lambda<1 and c>0 such that

[x(k)x~(k)]cλk[x(0)x~(0)].\displaystyle\Big{\|}\left[\begin{array}[]{c}x(k)\\ \tilde{x}(k)\\ \end{array}\right]\Big{\|}\leq c\lambda^{k}\Big{\|}\left[\begin{array}[]{c}x(0)\\ \tilde{x}(0)\\ \end{array}\right]\Big{\|}. (117)

In this way, one has

δJi(s,)\displaystyle\delta J_{i}(s,\infty) =\displaystyle= k=s[x(k)x~(k)][0TiTiSi][x(k)x~(k)]\displaystyle\sum\limits^{\infty}_{k=s}\left[\begin{array}[]{c}x(k)\\ \tilde{x}(k)\\ \end{array}\right]^{\prime}\left[\begin{array}[]{cc}0&T_{i}\\ T^{\prime}_{i}&S_{i}\\ \end{array}\right]\left[\begin{array}[]{c}x(k)\\ \tilde{x}(k)\\ \end{array}\right] (124)
\displaystyle\leq k=s[0TiTiSi][x(k)x~(k)]2\displaystyle\sum\limits^{\infty}_{k=s}\Big{\|}\left[\begin{array}[]{cc}0&T_{i}\\ T^{\prime}_{i}&S_{i}\\ \end{array}\right]\Big{\|}\Big{\|}\left[\begin{array}[]{c}x(k)\\ \tilde{x}(k)\\ \end{array}\right]\Big{\|}^{2} (129)
\displaystyle\leq k=sλ2kc2[0TiTiSi][x(0)x~(0)]2\displaystyle\sum\limits^{\infty}_{k=s}\lambda^{2k}\cdot c^{2}\Big{\|}\left[\begin{array}[]{cc}0&T_{i}\\ T^{\prime}_{i}&S_{i}\\ \end{array}\right]\Big{\|}\Big{\|}\left[\begin{array}[]{c}x(0)\\ \tilde{x}(0)\\ \end{array}\right]\Big{\|}^{2} (134)
<\displaystyle< λ2s1λ2c2[0TiTiSi][x(0)x~(0)]2\displaystyle\frac{\lambda^{2s}}{1-\lambda^{2}}\cdot c^{2}\Big{\|}\left[\begin{array}[]{cc}0&T_{i}\\ T^{\prime}_{i}&S_{i}\\ \end{array}\right]\Big{\|}\Big{\|}\left[\begin{array}[]{c}x(0)\\ \tilde{x}(0)\\ \end{array}\right]\Big{\|}^{2} (139)
\displaystyle\doteq c¯λ2s.\displaystyle\bar{c}\lambda^{2s}. (140)

Since 0<\lambda<1, for any \varepsilon>0 there exists a sufficiently large integer N such that

λ2N<1c¯+1ε.\displaystyle\lambda^{2N}<\frac{1}{\bar{c}+1}\varepsilon.

Combining with (124) and taking s=N, one has

δJi(N,)<c¯c¯+1ε<ε.\displaystyle\delta J_{i}(N,\infty)<\frac{\bar{c}}{\bar{c}+1}\varepsilon<\varepsilon. (141)

That is to say, the cost functions (62)-(69) under the observer-feedback Stackelberg strategy (19)-(20) are asymptotically optimal with respect to the cost functions (110)-(111) under the optimal feedback Stackelberg strategy (9)-(10) when the integer N is large enough. The proof is now completed.

V Numerical Examples

To show the validity of the results in Theorem 1 to Theorem 4, the following example is presented. Consider system (1)-(3) with

A\displaystyle A =\displaystyle= [10.710.3],B1=[51],\displaystyle\left[\begin{array}[]{cc}1&-0.7\\ 1&-0.3\\ \end{array}\right],\quad B_{1}=\left[\begin{array}[]{c}-5\\ -1\\ \end{array}\right],
B2\displaystyle B_{2} =\displaystyle= [01],H1=[10],H2=[01],\displaystyle\left[\begin{array}[]{c}0\\ 1\\ \end{array}\right],\quad H_{1}=\left[\begin{array}[]{cc}1&0\\ \end{array}\right],\quad H_{2}=\left[\begin{array}[]{cc}0&1\\ \end{array}\right],

and the associated cost functions (4)-(5) with

Q1\displaystyle Q_{1} =\displaystyle= [1001],Q2=[2001],\displaystyle\left[\begin{array}[]{cc}1&0\\ 0&1\\ \end{array}\right],\quad Q_{2}=\left[\begin{array}[]{cc}2&0\\ 0&1\\ \end{array}\right],
R11\displaystyle R_{11} =\displaystyle= 1,R12=2,R21=0,R22=1.\displaystyle 1,\quad R_{12}=2,\quad R_{21}=0,\quad R_{22}=1.
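For reference, the example data above can be written in matrix form as follows, so that it can be reproduced numerically with the sketches given earlier (a minimal numpy snippet; the scalar weights are stored as 1x1 matrices, with R_{12}=2 as in (4)-(5)).

```python
import numpy as np

# System matrices of the numerical example (Section V).
A  = np.array([[1.0, -0.7],
               [1.0, -0.3]])
B1 = np.array([[-5.0],
               [-1.0]])
B2 = np.array([[0.0],
               [1.0]])
H1 = np.array([[1.0, 0.0]])
H2 = np.array([[0.0, 1.0]])

# Weights of the cost functions (4)-(5); scalars stored as 1x1 matrices.
Q1 = np.eye(2)
Q2 = np.diag([2.0, 1.0])
R11, R12, R21, R22 = (np.array([[1.0]]), np.array([[2.0]]),
                      np.array([[0.0]]), np.array([[1.0]]))
```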

By solving the algebraic Riccati equations (13)-(14), the feedback gains in (11)-(12) are respectively calculated as

K1\displaystyle K_{1} =\displaystyle= [0.20280.1374],\displaystyle\left[\begin{array}[]{cc}0.2028&-0.1374\\ \end{array}\right],
K2\displaystyle K_{2} =\displaystyle= [0.40050.0791].\displaystyle\left[\begin{array}[]{cc}-0.4005&0.0791\\ \end{array}\right].

By using the LMI Toolbox in Matlab, L_{i} (i=1,2) are calculated as

L1=[1.23640.4246],L2=[0.00390.1925],\displaystyle L_{1}=\left[\begin{array}[]{c}1.2364\\ 0.4246\\ \end{array}\right],\quad L_{2}=\left[\begin{array}[]{c}0.0039\\ 0.1925\\ \end{array}\right],

while the four eigenvalues of the matrix \mathcal{A} are calculated as:

λ1(𝒜)\displaystyle\lambda_{1}(\mathcal{A}) =\displaystyle= 0.1949,λ2(𝒜)=0.6791,\displaystyle 0.1949,\quad\lambda_{2}(\mathcal{A})=0.6791,
λ3(𝒜)\displaystyle\lambda_{3}(\mathcal{A}) =\displaystyle= λ4(𝒜)=0.7317,\displaystyle\lambda_{4}(\mathcal{A})=0.7317,

which means that \mathcal{A} in (23) is stable. In this way, following from Theorem 1, the state estimation error \tilde{x}(k) in (35) is stable, as shown in Fig. 1, where data 1 to data 4 represent the four components of the vector \tilde{x}(k)\doteq\left[\begin{array}[]{cccc}\tilde{x}_{11}(k)&\tilde{x}_{21}(k)&\tilde{x}_{31}(k)&\tilde{x}_{41}(k)\\ \end{array}\right]^{\prime}. Moreover, under the observer-feedback Stackelberg strategy (19)-(20), the state x(k) in (1) is also stable, as can be seen in Fig. 2, where data 1 and data 2 represent the two components of x(k)\doteq\left[\begin{array}[]{cc}x_{11}(k)&x_{21}(k)\\ \end{array}\right]^{\prime}. Finally, by analyzing Fig. 1 and Fig. 2 and selecting N=30 in Theorem 4, the asymptotic optimality of the cost functions (62)-(69) under the observer-feedback Stackelberg strategy (19)-(20) is verified.

Figure 1: Trajectory of x~(k)\tilde{x}(k) in (35) under the observer-feedback Stackelberg strategy (19)-(20).
Figure 2: Trajectory of x(k)x(k) in (1) under the observer-feedback Stackelberg strategy (19)-(20).

VI Conclusion

In this paper, we have considered the feedback Stackelberg strategy for the two-player leader-follower game with private inputs, where the follower only shares its measurement information with the leader, while none of the historical control inputs and measurement information of the leader is shared with the follower due to the hierarchical relationship. The unavailability of the historical inputs of both controllers causes the main difficulty. The obstacle is overcome by designing the observers based on the information structure and the corresponding observer-feedback Stackelberg strategy. Moreover, we have shown that the cost functions under the proposed observer-feedback Stackelberg strategy are asymptotically optimal with respect to the cost functions under the optimal feedback Stackelberg strategy.

References

  • [1] B. D. O. Anderson and J. B. Moore, “Linear Optimal Control”, Prentice-Hall, Englewood Cliffs, NJ, 1971.
  • [2] M. Rami, X. Chen, J. Moore and X. Zhou, “Solvability and asymptotic behavior of generalized Riccati equations arising in indefinite stochastic LQ controls”, IEEE Transactions on Automatic Control, 46(3): 428-440, 2001.
  • [3] H. S. Zhang, L. Li, J. J. Xu and M. Y. Fu, “Linear quadratic regulation and stabilization of discrete-time systems with delay and multiplicative noise”, IEEE Transactions on Automatic Control, 60(10): 2599-2613, 2015.
  • [4] N. W. Bauer, M. Donkers, N. van de Wouw and W. Heemels, “Decentralized observer-based control via networked communication”, Automatica, 49: 2074-2086, 2013.
  • [5] F. Blaabjerg, R. Teodorescu, M. Liserre and A. V. Timbus, “Overview of control and grid synchronization for distributed power generation systems”, IEEE Transactions on Industrial Electronics, 53: 1398-1409, 2006.
  • [6] B. Hoogenkamp, S. Farshidi, R. Y. Xin, Z. Shi, P. Chen and Z. M. Zhao, “A decentralized service control framework for decentralized applications in cloud environments”, Service-Oriented and Cloud Computation, 13226: 65-73, 2022.
  • [7] Q. P. Ha, and H. Trinh, “Observer-based control of multi-agent systems under decentralized information structure”, International Journal of Systems Science, 35(12): 719-728, 2004.
  • [8] D. Görges, “Distributed adaptive linear quadratic control using distributed reinforcement learning”, IFAC-PapersOnLine, 52(11): 218-223, 2019.
  • [9] H. Witsenhausen, “A counterexample in stochastic optimum control”, SIAM Journal on Control and Optimization, 6(1): 131-147, 1968.
  • [10] E. Davison, N. Rau and F. Palmay, “The optimal decentralized control of a power system consisting of a number of interconnected synchronous machines”, International Journal of Control, 18(6): 1313-1328, 1973.
  • [11] E. Davison, “The robust decentralized control of a general servomechanism problem”, IEEE Transactions on Automatic Control, AC-21: 14-24, 1976.
  • [12] T. Yoshikawa, “Dynamic programming approach to decentralized stochastic control problem”, IEEE Transactions on Automatic Control, 20(6): 796-797, 1975.
  • [13] J. Swigart and S. Lall, “An explicit state-space solution for a decentralized two-player optimal linear-quadratic regulator”, American Control Conference, 6385-6390, 2010.
  • [14] X. Liang, J. J Xu, H. X. Wang and H. S. Zhang, “Decentralized output-feedback control with asymmetric one-step delayed information”, IEEE Transactions on Automatic Control, doi: 10.1109/TAC.2023.3250161, 2023.
  • [15] A. Nayyar, A. Mahajan and T. Teneketzis, “Decentralized stochastic control with partial history sharing: A common information approach”, IEEE Transactions on Automatic Control, 58(7): 1644-1658, 2013.
  • [16] A. Nayyar, A. Mahajan and T. Teneketzis, “Optimal control strategies in delayed sharing information structures”, IEEE Transactions on Automatic Control, 56(7): 1606-1620, 2011.
  • [17] X. Liang, Q. Q. Qi, H. S. Zhang and L. H. Xie, “Decentralized control for networked control systems with asymmetric information”, IEEE Transactions on Automatic Control, 67(4): 2067-2083, 2021.
  • [18] B. C. Wang, X. Yu and H. L. Dong, “Social optima in linear quadratic mean field control with unmodeled dynamics and multiplicative noise”, Asian Journal of Control, 23(3): 1572-1582, 2019.
  • [19] T. Başar, “Two-criteria LQG decision problems with one-step delay observation sharing pattern”, Information and Control, 38: 21-50, 1978.
  • [20] P. George, “On the linear-quadratic-gaussian Nash game with one-step delay observation sharing pattern”, IEEE Transactions on Automatic Control, 27: 1065-1071, 1982.
  • [21] F. Suzumura and K. Mizukami, “Closed-loop strategy for Stackelberg game problem with incomplete information structures” IFAC 12th Triennial World Congress, Australia, 413-418, 1993.
  • [22] M. B. Klompstra, “Nash equilibria in risk-sensitive dynamic games”, IEEE Transactions on Automatic Control, 45(7): 1397-1401, 2000.
  • [23] M. Pachter, “LQG dynamic games with a control-sharing information pattern”, Dynamic Games and Applications, 7: 289-322, 2017.
  • [24] Y. Sun, J. J. Xu and H. S. Zhang, “Feedback Nash equilibrium with packet dropouts in networked control systems”, IEEE Transactions on Circuits and Systems II: Express Briefs, 70(3): 1024-1028, 2022.
  • [25] Z. P. Li, M. Y. Fu, H. S. Zhang and Z. Z. Wu, “Mean field stochastic linear quadratic games for continuum-parameterized multi-agent systems”, Journal of the Franklin Institute, 355: 5240-5255, 2018.
  • [26] H. Mukaidani, H. Xu and V. Dragan, “Static output-feedback incentive Stackelberg game for discrete-time Markov jump linear stochastic systems with external disturbance”, IEEE Control Systems Letters, 2(4): 701-706, 2016.
  • [27] S. R. Chowdhury, X. Y. Zhou and N. Shroff, “Adaptive control of differentially private linear quadratic systems”, IEEE International Symposium on Information Theory, 485-490, 2021.
  • [28] J. J. Xu and H. S. Zhang, “Decentralized control of linear systems with private input and measurement information”, arXiv:2305.14921, 1-6, 2023.
  • [29] D. Castanon and M. Athans, “On stochastic dynamic Stackelberg strategies”, Automatica, 12: 177-183, 1976.
  • [30] A. Bensoussan, S. Chen and S. P. Sethi, “The maximum principle for global solutions of stochastic Stackelberg differential games”, SIAM Journal of Control and Optimization, 53(4): 1965-1981, 2015.