Data-Driven Pole Placement in LMI Regions with Robustness Constraints

Sayak Mukherjee, Ramij R. Hossain

S. Mukherjee is with the Optimization and Control Group, Pacific Northwest National Laboratory (PNNL), Richland, WA, USA, and R. R. Hossain is with the Department of Electrical and Computer Engineering, Iowa State University, Ames, IA, USA. The authors contributed equally.

Abstract

This paper proposes a robust learning methodology to place the closed-loop poles in desired convex regions of the complex plane. We consider the system state and input matrices to be unknown and use only measurements of the system trajectories. Closed-loop pole placement in linear matrix inequality (LMI) regions is a classic robust control problem; however, it requires knowledge of the state and input matrices of the linear system. We bring in ideas from behavioral system theory and the persistency-of-excitation-based fundamental lemma to develop a data-driven counterpart that satisfies multiple closed-loop robustness specifications, such as $\mathcal{D}$-stability and mixed $H_{2}/H_{\infty}$ performance specifications. Our formulations lead to data-driven semi-definite programs (SDPs) that are coupled with sufficient theoretical guarantees. We validate the theoretical results with numerical simulations on a third-order dynamic system.

Keywords: Robust pole placement, data-driven robust control, stability guarantee, mixed $H_{2}/H_{\infty}$, LMI regions.

1 Introduction

Recent research in automatic control has focused increasingly on converting classic model-based formulations to their data-driven counterparts. Such designs are motivated by the growing scale and complexity of practical dynamic systems, which makes the dynamic model and its parameters less accurately known and leaves several non-idealities unmodeled. Data-driven approaches take many forms, each with distinct characteristics. In the machine learning community, sequential decision-making problems posed as Markov decision processes (MDPs) have garnered a lot of interest under the umbrella of reinforcement learning (RL) [1, 2, 3, 4]. Many underlying concepts of RL have been translated to the dynamic-system viewpoint using adaptive dynamic programming in approaches such as [5, 6, 7, 8, 9, 10], covering both partially and fully model-free designs. The latter class of methods aims to provide theoretical guarantees using dynamic systems theory. Continuing on the path of supplementing data-driven algorithms with strong mathematical backing, behavioral system theory has recently emerged as an effective alternative approach [11]. The underlying idea is that the space of input-output trajectories of an LTI system is spanned by time-shifted copies of a single measured trajectory. Works such as [12, 13, 14] follow this framework for data-driven optimal control designs.

Alongside optimality, practical systems require sufficient robustness margins. Robust control design with an unknown state model still has many open questions. Approaches such as [15, 16, 17, 18] try to add some robustness by incorporating a few robustness considerations into the data-driven RL or optimal control setting. However, a dedicated robust learning methodology is expected to provide much better stabilization and performance guarantees, backed by a strong underlying framework. Behavioral system theory, through Willems' fundamental lemma [11], can provide such a foundation for robust learning control designs, since it offers a one-to-one conversion strategy from model-based to data-driven approaches. In this paper we build upon that framework and consider the pole placement problem in a desired convex region of the complex plane along with sufficient robustness specifications.

Classically, decades of research in robust control have produced a plethora of methods that provide sufficient system performance in the presence of noise, unmodeled dynamics, uncertainty, etc. [19, 20, 21]. The closed-loop pole placement problem in a desired convex region of the complex plane has been formulated through linear matrix inequalities in works such as [22]. However, these classical methods require knowledge of the system state and input matrices. With this motivation, this paper deals with the robust pole placement problem in LMI regions under the assumption that the system state and input matrices are unknown and the designer only has access to trajectory measurements. The system is explored with persistently exciting inputs so that the fundamental requirement of behavioral system theory is met. The model-based formulations can thereafter be converted to data-driven formulations using the closed-loop data-driven parametrized representation. The robust control problem considers the LMI-based pole placement conditions along with robust performance constraints such as mixed $H_{2}/H_{\infty}$ performance requirements.

Contribution. The main contribution of the paper is a data-driven robust control methodology that achieves the desired closed-loop pole placement in convex regions of the complex plane together with robust and optimal system performance, imposed through mixed $H_{2}/H_{\infty}$ performance metrics. We use the fundamental-lemma-based data-driven parametrized representation to obtain convex formulations of the robust pole placement problem in LMI regions whose solutions come very close to the model-based design. We provide theoretical guarantees on the performance of the proposed algorithm along with validation on a third-order dynamic system example.

The rest of the paper is organized as follows. Section 2 describes the model and the problem statement. We recall some fundamentals of data-driven design in Section 3. The main results on data-driven LMI-region pole placement with robustness specifications are given in Section 4. A numerical example is given in Section 5, and we provide concluding remarks in Section 6.

Notations. $S\succ(\succeq)\,0$ denotes a positive definite (semidefinite) matrix. $H_{2}$ norm: the $H_{2}$ norm of a system $G$ in the time domain is given by $\|G\|_{2}=\left(\int_{0}^{\infty}\operatorname{trace}\left(h(t)^{T}h(t)\right)dt\right)^{\frac{1}{2}}$, where $h(t)$ is the impulse response. $H_{\infty}$ norm: the system $H_{\infty}$ norm is given by $\|G\|_{\infty}=\sup_{w}\sigma_{\max}(G(jw))$, where $\sigma_{\max}$ denotes the maximum singular value and $G(jw)$ is the system transfer matrix.

2 Problem Statement

We consider a linear time-invariant (LTI) continuous-time dynamic system of the form

$$\dot{x}=Ax+B_{1}w+B_{2}u,\quad x(0)=x_{0}, \tag{1}$$

where $x\in\mathbb{R}^{n}$, $u\in\mathbb{R}^{m}$, and $w\in\mathbb{R}^{d}$ are the states, control inputs, and extraneous inputs, respectively. We make the following assumption.
Assumption 1: The state matrix $A$ and the control input matrix $B_{2}$ are unknown. However, the values of $n$, $m$, and $d$ are known.

We also consider the following assumption about the availability of measurements.
Assumption 2: The measurements of the states $x(t)$ and control inputs $u(t)$ are available to the designer, and the designer injects known extraneous disturbances $w(t)$ in a controlled environment to perform the control design tasks.
We are interested in placing the closed-loop poles in some desired region of the complex plane. Such regions are classically known as LMI regions and are defined as follows.

Definition 1: LMI pole placement regions - A subset $\mathcal{D}$ of the complex plane can be characterized as a desired pole placement region if there exist a symmetric matrix $\alpha=[a_{ij}]\in\mathbb{R}^{n\times n}$ and a matrix $\beta=[b_{ij}]\in\mathbb{R}^{n\times n}$ such that

$$\mathcal{D}=\{z\in\mathbb{C} : \psi_{\mathcal{D}}(z)<0\}, \tag{2}$$

where

$$\psi_{\mathcal{D}}(z)=\alpha+z\beta+z^{*}\beta^{T}=[a_{ij}+b_{ij}z+b_{ji}z^{*}]_{1\leq i,j\leq n}. \tag{3}$$

The notation $M=[m_{ij}]_{1\leq i,j\leq n}$ denotes an $n\times n$ matrix (resp. block matrix) $M$ with generic entry (resp. block) $m_{ij}$. An LMI region is a subset of the complex plane that is representable by an LMI in $z$ and $z^{*}$, or equivalently, an LMI in $x=\mathrm{Re}(z)$ and $y=\mathrm{Im}(z)$. As a result, LMI regions are convex. Moreover, LMI regions are symmetric with respect to the real axis. The designer can construct many different types of LMI regions. Throughout this paper, we consider the following as our working example.
Example 1: The region in the left half of the complex plane between the lines with slopes $-\frac{1}{\alpha}$ and $\frac{1}{\alpha}$, $\alpha>0$, is given by

$$\mathcal{D}=\left\{z\in\mathbb{C} : \begin{bmatrix}z+z^{*}&\alpha(z^{*}-z)\\ \alpha(z-z^{*})&z+z^{*}\end{bmatrix}<0\right\}. \tag{4}$$

This can be shown directly by substituting $z=x+jy$, which gives

$$\begin{bmatrix}z+z^{*}&\alpha(z^{*}-z)\\ \alpha(z-z^{*})&z+z^{*}\end{bmatrix}=\begin{bmatrix}2x&-j2\alpha y\\ j2\alpha y&2x\end{bmatrix}<0. \tag{5}$$

Therefore, using the Schur complement,

$$x<0, \tag{6}$$
$$2x+(j2\alpha y)\frac{1}{2x}(j2\alpha y)<0. \tag{7}$$

Since $x<0$, multiplying (7) by $2x$ reverses the inequality and gives $4x^{2}-4\alpha^{2}y^{2}>0$, implying

$$-\frac{1}{\alpha}<\frac{y}{x}<\frac{1}{\alpha}. \tag{8}$$

If the sector in the left half of the complex plane is described by its inner angle $\theta$, the LMI region expression becomes

$$\psi_{\mathcal{D}}(z;\theta)=\begin{bmatrix}\sin\theta(z+z^{*})&\cos\theta(z-z^{*})\\ \cos\theta(z^{*}-z)&\sin\theta(z+z^{*})\end{bmatrix}<0. \tag{9}$$

\square
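The membership test behind Example 1 can be sketched in a few lines of Python. This is only an illustration under the definitions above: the function name `in_sector` is hypothetical, and the test uses the fact that the $2\times 2$ Hermitian matrix in (5) has eigenvalues $2x\pm 2\alpha|y|$.

```python
def in_sector(z: complex, alpha: float) -> bool:
    """Return True if z lies in the conic LMI region (4).

    For z = x + jy the matrix in (5) is Hermitian with eigenvalues
    2x - 2*alpha*|y| and 2x + 2*alpha*|y|, so it is negative definite
    exactly when the larger eigenvalue 2x + 2*alpha*|y| is negative.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    x, y = z.real, z.imag
    return 2.0 * x + 2.0 * alpha * abs(y) < 0.0


# A few sample points for alpha = 2 (boundary lines have slope +/- 1/2).
for point in (-1.0 + 0.0j, -1.0 + 0.4j, -1.0 + 0.6j, 0.5 + 0.0j):
    print(point, in_sector(point, 2.0))
```

The same test is equivalent to the slope condition (8), since $2x+2\alpha|y|<0$ with $x<0$ means $|y/x|<1/\alpha$.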
Since LMI regions are convex, more complicated LMI regions, such as convex polygons, can be constructed as intersections of simpler LMI regions. The focus of this paper is to design controllers without knowledge of the state dynamics, using only state and input trajectory measurements. We also incorporate robust optimization objectives along with the LMI-based pole placement constraints, specifically the mixed $H_{2}/H_{\infty}$ objective. To this end, we consider two controlled output variables along with the dynamics,

$$\Sigma:\left\{\begin{array}{l}\dot{x}(t)=Ax(t)+B_{1}w(t)+B_{2}u(t),\\ z_{1}(t)=C_{1}x(t)+D_{11}w(t)+D_{12}u(t),\\ z_{2}(t)=C_{2}x(t)+D_{22}u(t).\end{array}\right. \tag{10}$$

$T_{wz_{1}}$ (respectively $T_{wz_{2}}$) denotes the transfer function from $w(t)$ to the controlled output $z_{1}(t)$ (respectively $z_{2}(t)$). We intend to learn the state-feedback control $u=Kx$ such that the poles of the closed-loop dynamics $A+B_{2}K$ lie in the desired LMI region characterized by the prescribed $\psi_{\mathcal{D}}(z)<0$, thereby maintaining $\mathcal{D}$-stability, while also satisfying mixed $H_{2}/H_{\infty}$ objectives on the regulated variables. The problem statement is as follows:
P. Under Assumptions 1 and 2, learn the state-feedback control $u=Kx$ such that:

  • The prescribed $\mathcal{D}$-stability is maintained for a desired LMI region,

  • A prescribed $H_{\infty}$ robustness criterion is met, i.e., $\|T_{wz_{1}}\|_{\infty}<\gamma$, or the $H_{\infty}$ norm is minimized with the robustness margin treated as a variable,

  • With the desired $\mathcal{D}$-stability and $H_{\infty}$ robustness margin, the $H_{2}$ performance $\|T_{wz_{2}}\|_{2}$ is minimized.

3 Data-Driven Representation Fundamentals

3.1 Recalling fundamental lemma

Consider a signal $s:\mathbb{Z}\to\mathbb{R}^{p}$. The Hankel matrix associated with it is

$$S_{i,L,N}=\begin{bmatrix}s(i)&s(i+1)&\dots&s(i+N-1)\\ s(i+1)&s(i+2)&\dots&s(i+N)\\ \vdots&\vdots&\ddots&\vdots\\ s(i+L-1)&s(i+L)&\dots&s(i+N+L-2)\end{bmatrix}. \tag{11}$$

The Hankel matrix starts with the element $s(i)$ and consists of $L$ block rows and $N$ columns. With $L=1$ we denote

$$S_{i,N}=\begin{bmatrix}s(i)&s(i+1)&\dots&s(i+N-1)\end{bmatrix}. \tag{12}$$
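As a concrete illustration of (11) and (12), the following minimal sketch builds the Hankel matrix from a list of samples. The function name `hankel` and the list-of-rows layout are assumptions made for this example, not part of the original formulation.

```python
from typing import List, Sequence


def hankel(signal: Sequence[Sequence[float]], i: int, L: int, N: int) -> List[List[float]]:
    """Build S_{i,L,N} of (11) as a dense (p*L) x N list of rows.

    signal[k] is the p-dimensional sample s(k); the block in block row r
    and column c is s(i + r + c).
    """
    if L < 1 or N < 1 or i < 0:
        raise ValueError("L and N must be positive and i non-negative")
    if i + L + N - 2 >= len(signal):
        raise ValueError("signal too short for the requested Hankel matrix")
    p = len(signal[0])
    rows = []
    for r in range(L):          # block row index
        for d in range(p):      # component of the p-dimensional sample
            rows.append([float(signal[i + r + c][d]) for c in range(N)])
    return rows


# Scalar signal s(k) = k for k = 0..4.
s = [[0], [1], [2], [3], [4]]
print(hankel(s, 0, 2, 4))   # two block rows, four columns
print(hankel(s, 1, 1, 3))   # the L = 1 case of (12)
```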

Definition 2 [11, 12]: The signal $s_{[0,T-1]}\in\mathbb{R}^{p}$ is persistently exciting of order $L$ if the corresponding Hankel matrix $S_{0,L,T-L+1}$ has full rank $pL$. Therefore, the signal must be sufficiently long, i.e., $T\geq(p+1)L-1$. We recall the fundamental lemma of Willems et al. [11] for the discrete-time dynamic system

$$\begin{aligned}x(k+1)&=Ax(k)+Bu(k),\\ y(k)&=Cx(k)+Du(k),\end{aligned} \tag{13}$$

where $x\in\mathbb{R}^{n}$, $u\in\mathbb{R}^{m}$, and $y\in\mathbb{R}^{p}$.

Lemma 1 [11]: For the discrete-time system (13), if the input $u_{[0,T-1]}$ is persistently exciting of order $n+t$, then

$$\operatorname{rank}\begin{bmatrix}U_{0,t,T-t+1}\\ X_{0,T-t+1}\end{bmatrix}=n+tm. \tag{14}$$

Lemma 2 [11]: For the system (13), if the input $u_{[0,T-1]}$ is persistently exciting of order $n+t$, then any $t$-length input-state trajectory of the system can be expressed in the form

$$\begin{bmatrix}u_{[0,t-1]}\\ x_{[0,t-1]}\end{bmatrix}=\begin{bmatrix}U_{0,t,T-t+1}\\ X_{0,t,T-t+1}\end{bmatrix}g, \tag{15}$$

where $g\in\mathbb{R}^{T-t+1}$. This shows that when $T$ is taken sufficiently large, the rank condition of Lemma 1 can be satisfied, and therefore any input-state trajectory of the system can be represented as a linear combination of the collected input/state data. This property enables us to replace a parametric description of the system with a data-based counterpart. For an input sequence $u_{[0,T-1]}$ that is persistently exciting of order $n+1$, i.e., $t=1$, the condition $T\geq(m+1)n+m$ is necessary for persistence of excitation to hold. This results in

$$\operatorname{rank}\begin{bmatrix}U_{0,1,T-t+1}\\ X_{0,T-t+1}\end{bmatrix}=n+m. \tag{16}$$
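The rank condition (16) can be checked directly on stored data. The sketch below is a minimal illustration using exact rational Gaussian elimination from the Python standard library; the helper names `matrix_rank` and `rank_condition_holds` are hypothetical. For noisy measurements a numerical rank based on singular values with a tolerance would be the more usual choice, so the exact test here is an assumption suited to clean, synthetic data.

```python
from fractions import Fraction
from typing import Sequence


def matrix_rank(rows: Sequence[Sequence[float]]) -> int:
    """Rank of a matrix (list of rows) by exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Find a pivot at or below the current rank row.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            if factor != 0:
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
        if rank == len(m):
            break
    return rank


def rank_condition_holds(U0: Sequence[Sequence[float]], X0: Sequence[Sequence[float]]) -> bool:
    """Check (16): the stacked matrix [U0; X0] has full row rank m + n."""
    stacked = list(U0) + list(X0)
    return matrix_rank(stacked) == len(stacked)


# m = 1 input row and n = 1 state row, T = 3 samples each.
print(rank_condition_holds([[1, 0, 1]], [[0, 1, 1]]))   # independent rows
print(rank_condition_holds([[1, 2, 3]], [[2, 4, 6]]))   # proportional rows
```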

This idea can be extended to continuous-time systems as shown in [12]. For a sampling time $\Delta>0$, the sampled input and state trajectories $U_{0,1,T}$ and $X_{0,T}$ are stored, and the rank condition (16) needs to be checked. With a slight abuse of notation, [12] constructs the continuous-time counterpart of the time-shifted discrete-time states from the derivative information:

$$X_{1,T}=\begin{bmatrix}\dot{x}(0)&\dot{x}(\Delta)&\dots&\dot{x}((T-1)\Delta)\end{bmatrix}. \tag{17}$$

The state-dynamics data gathered over the $T$-length window satisfy

$$X_{1,T}=AX_{0,T}+BU_{0,1,T} \tag{18}$$
$$\hphantom{X_{1,T}}=[B\;\;A]\begin{bmatrix}U_{0,1,T}\\ X_{0,T}\end{bmatrix}. \tag{19}$$

3.2 Data-driven Closed-loop Representation

Following [12], Lemma 2 can be exploited to derive a parametrization of the closed-loop system under a state-feedback law $u=Kx$. For the closed-loop system

$$\dot{x}=Ax+Bu=(A+BK)x, \tag{20}$$

the rank condition (16) makes the stacked data matrix full row rank, so by the Rouché–Capelli theorem there exists a matrix $G\in\mathbb{R}^{T\times n}$ such that

$$\begin{aligned}A+BK&=[B\;\;A]\begin{bmatrix}K\\ I\end{bmatrix},\\ \begin{bmatrix}K\\ I\end{bmatrix}&=\begin{bmatrix}U_{0,1,T}\\ X_{0,T}\end{bmatrix}G.\end{aligned} \tag{21}$$

Therefore, the data-driven representation becomes

$$\dot{x}=[B\;\;A]\begin{bmatrix}U_{0,1,T}\\ X_{0,T}\end{bmatrix}Gx \tag{22}$$
$$\hphantom{\dot{x}}=X_{1,T}Gx, \tag{23}$$

and the model-based closed loop can now be expressed from data as

$$A+BK=[B\;\;A]\begin{bmatrix}K\\ I\end{bmatrix}=X_{1,T}G. \tag{24}$$

The control now becomes $u=U_{0,1,T}Gx$. Therefore, the designer needs to learn the matrix $G$ to implement the feedback control. We now provide the main results of the paper.
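The identity (24) can be checked numerically on a toy case. The sketch below assumes a hypothetical scalar plant ($n=m=1$) with $T=2$ samples, used only to synthesize data; the plant values, the samples, and the helper `solve_2x2` are illustrative and are not taken from the paper.

```python
def solve_2x2(M, rhs):
    """Solve M g = rhs for a 2x2 matrix M by Cramer's rule."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("[U0; X0] is singular: rank condition (16) fails")
    return [(rhs[0] * d - b * rhs[1]) / det, (a * rhs[1] - rhs[0] * c) / det]


# Hypothetical scalar plant x_dot = a*x + b*u, used only to generate data.
a_true, b_true = 1.0, 2.0
X0 = [1.0, 0.5]     # sampled states
U0 = [0.3, -1.0]    # sampled inputs
X1 = [a_true * x + b_true * u for x, u in zip(X0, U0)]  # derivatives, as in (18)

K = -1.5  # an arbitrary state-feedback gain
# Second relation of (21): [U0; X0] G = [K; 1].
G = solve_2x2([U0, X0], [K, 1.0])

closed_loop_from_data = sum(x1 * g for x1, g in zip(X1, G))  # X1 G, as in (24)
closed_loop_from_model = a_true + b_true * K                 # A + B K

print(closed_loop_from_data, closed_loop_from_model)
```

Both quantities agree up to rounding, and the model parameters enter only through the synthesized data `X1`, never through the computation of `G`.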

4 Data-Driven Robust Pole Placement Methodology

We first discuss the challenges in pole placement problems using the data-driven method without invoking any robust performance constraints. Then, we introduce the formulation of the data-driven pole placement problem combined with the $H_{2}/H_{\infty}$ conditions, followed by a comprehensive discussion.

Although the pole placement requirement can be imposed on any LMI region as given by (2), we consider (4) as a working example throughout the methodology development in this section and the numerical example in the following section. For the system $\dot{x}=Ax+B_{2}u$, we defined an example LMI region (4) for $\mathcal{D}$-stability in Section 2. Next, we state the LMI condition that must be satisfied to place the closed-loop poles in the prescribed region using a state-feedback control $u=Kx$.

Lemma 3 [22]: For the system $\dot{x}=Ax+B_{2}u$ and the LMI region $\mathcal{D}$ defined by (4), the closed-loop matrix $\tilde{A}=A+B_{2}K$ is $\mathcal{D}$-stable if there exists a real symmetric matrix $X_{D}\succ 0$ satisfying the following condition:

$$\begin{bmatrix}\tilde{A}X_{D}+X_{D}\tilde{A}^{T}&\alpha(\tilde{A}X_{D}-X_{D}\tilde{A}^{T})\\ \alpha(X_{D}\tilde{A}^{T}-\tilde{A}X_{D})&\tilde{A}X_{D}+X_{D}\tilde{A}^{T}\end{bmatrix}\prec 0. \tag{25}$$

$\square$
Proof [22]: The above condition is obtained by replacing $z$ with $\tilde{A}X_{D}$ and $z^{*}$ with $X_{D}\tilde{A}^{T}$ in (4); a detailed discussion can be found in [22]. This guarantees that the eigenvalues of $\tilde{A}=A+B_{2}K$ belong to the conic region defined by (4) in the left half of the complex plane. $\square$
Note that the condition (25) is not an LMI: replacing $\tilde{A}$ with $A+B_{2}K$ produces the product term $KX_{D}$ of the two unknowns $K$ and $X_{D}$. However, with the simple change of variable $Y=KX_{D}$, (25) can be converted into an LMI condition.
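To make the change of variable concrete, the following worked substitution shows how (25) becomes affine in the unknowns. The shorthand $\Phi$ is introduced here only for illustration and is not part of the original notation.

```latex
% Change of variable Y = K X_D applied to (25); \Phi is illustrative shorthand.
\tilde{A}X_{D} = (A+B_{2}K)X_{D} = AX_{D}+B_{2}Y =: \Phi(X_{D},Y),
\qquad
\begin{bmatrix}
\Phi+\Phi^{T} & \alpha(\Phi-\Phi^{T})\\
\alpha(\Phi^{T}-\Phi) & \Phi+\Phi^{T}
\end{bmatrix}\prec 0,
\qquad X_{D}\succ 0 .
```

Since $\Phi$ is affine in $(X_{D},Y)$, the inequality is an LMI in those variables, and the gain is recovered afterwards as $K=YX_{D}^{-1}$.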

Next, we utilize the relation (24) and derive the data-based condition for placing the poles of the closed-loop system in the region defined by (4).

Theorem 1: For the system $\dot{x}=Ax+B_{2}u$ with unknown state dynamics, let the input sequence $U_{0,1,T}$ be persistently exciting, i.e., let the rank condition (16) hold. Then any matrix $Q\in\mathbb{R}^{T\times n}$ satisfying (26), with $X_{0,T}Q=(X_{0,T}Q)^{T}\succ 0$ playing the role of $X_{D}$ in Lemma 3, yields a feedback gain $K=U_{0,1,T}Q(X_{0,T}Q)^{-1}$ that makes the closed-loop system $\mathcal{D}$-stable for the conic region defined by (4).

$$\begin{bmatrix}X_{1,T}Q+Q^{T}X_{1,T}^{T}&\alpha(X_{1,T}Q-Q^{T}X_{1,T}^{T})\\ \alpha(Q^{T}X_{1,T}^{T}-X_{1,T}Q)&X_{1,T}Q+Q^{T}X_{1,T}^{T}\end{bmatrix}\prec 0. \tag{26}$$

\square
Proof: Lemma 3 gives the condition (25) for placing the closed-loop poles of $A+B_{2}K$ in the desired conic region. Since the system dynamics are assumed unknown, we rely on the data-driven persistency of excitation condition recalled in Section 3. It follows from (24) that, under a persistently exciting input, the closed-loop dynamics can be represented in the parametrized form $A+B_{2}K=X_{1,T}G$, derived from the time-evolution data $X_{1,T}$. Substituting this into (25) gives the inequality

$$\begin{bmatrix}X_{1,T}GX_{D}+X_{D}(X_{1,T}G)^{T}&\alpha(X_{1,T}GX_{D}-X_{D}(X_{1,T}G)^{T})\\ \alpha(X_{D}(X_{1,T}G)^{T}-(X_{1,T}G)X_{D})&X_{1,T}GX_{D}+X_{D}(X_{1,T}G)^{T}\end{bmatrix}\prec 0. \tag{27}$$

To make this inequality an LMI, we set $GX_{D}=Q$, resulting in (26), with $G=QX_{D}^{-1}$. Recalling Section 3.2, $K$ can be computed as $K=U_{0,1,T}G=U_{0,1,T}QX_{D}^{-1}$. From (21) we have $X_{0,T}G=I$, which implies $X_{D}=X_{0,T}Q$. Therefore, the feedback gain is $K=U_{0,1,T}Q(X_{0,T}Q)^{-1}$. Note that this condition relies entirely on data collected from system trajectories and on the design parameter $\alpha$, which determines the LMI region (4). $\square$
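The gain recovery $K=U_{0,1,T}Q(X_{0,T}Q)^{-1}$ can be illustrated on a toy scalar case. In the sketch below, $Q$ is picked by hand rather than by an SDP solver, and the plant, samples, and helper names are hypothetical. For $n=1$ the off-diagonal blocks of (26) vanish, so (26) reduces to $X_{1,T}Q<0$, with $X_{0,T}Q>0$ acting as the Lyapunov variable.

```python
def solve_2x2(M, rhs):
    """Solve M q = rhs for a 2x2 matrix M by Cramer's rule."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("singular data matrix")
    return [(rhs[0] * d - b * rhs[1]) / det, (a * rhs[1] - rhs[0] * c) / det]


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


# Hypothetical unstable scalar plant x_dot = a*x + b*u, used only to generate data.
a_true, b_true = 1.0, 2.0
X0 = [1.0, 0.5]
U0 = [0.3, -1.0]
X1 = [a_true * x + b_true * u for x, u in zip(X0, U0)]

# Hand-picked feasible point: X0 Q = 2 (> 0) and X1 Q = -6 (< 0).
Q = solve_2x2([X0, X1], [2.0, -6.0])

K = dot(U0, Q) / dot(X0, Q)                 # K = U0 Q (X0 Q)^{-1}
pole_from_data = dot(X1, Q) / dot(X0, Q)    # X1 G with G = Q (X0 Q)^{-1}
pole_from_model = a_true + b_true * K       # A + B K, for verification only

print(K, pole_from_data, pole_from_model)
```

The closed-loop pole is real and negative, so it lies in the sector (4) for any $\alpha>0$.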
Remark 1: The above pole placement problem is a feasibility problem. Therefore, there can be many feasible $X_{D}$ in the convex set defined by (25), giving rise to different controller gains that all satisfy the $\mathcal{D}$-stability condition. Similar behavior is observed in the data-driven design using Theorem 1: different exploration trajectories can result in different control gains, all of which satisfy the desired closed-loop pole placement constraint (4).

Following Remark 1, we now consider system (10) along with the desired $H_{2}/H_{\infty}$ constraints. Note that we now include an extraneous input $w$; the transfer functions associated with the $H_{2}$ and $H_{\infty}$ problems represent the gains from $w$ to the regulated outputs $z_{2}$ and $z_{1}$, respectively. Since the state matrix $A$ and the input matrix $B_{2}$ are unknown, the extraneous disturbance needs to be pre-specified in a controlled environment during the design of the feedback gains. Recalling Section 3, the trajectory-based system dynamics become

$$X_{1,T}=AX_{0,T}+B_{1}W_{0,T}+B_{2}U_{0,1,T}, \tag{28}$$
$$X_{1,T}-B_{1}W_{0,T}=AX_{0,T}+B_{2}U_{0,1,T}. \tag{29}$$

Here $W_{0,T}$ is defined from the samples of $w$ using (12), in the same way as $X_{0,T}$. The closed-loop parametrization with the extraneous input becomes

$$A+B_{2}K=[B_{2}\;\;A]\begin{bmatrix}K\\ I\end{bmatrix}=(X_{1,T}-B_{1}W_{0,T})G=\tilde{X}_{1,T}G. \tag{30}$$

We now state the following theorem to solve problem P.

Theorem 2: For the system (10), to place the closed-loop poles in the desired LMI region (4) while meeting the mixed $H_{2}/H_{\infty}$ specifications, the following set of data-driven LMIs needs to be solved, where $C_{2}=\begin{bmatrix}Q_{x}^{\frac{1}{2}}\\ 0\end{bmatrix}$, $D_{22}=\begin{bmatrix}0\\ R^{\frac{1}{2}}\end{bmatrix}$, and $Q_{x}\succeq 0$, $R\succ 0$ are the designable state and input penalty matrices:

$$\min_{Q,S,\gamma}\ \operatorname{trace}\left(Q_{x}X_{0,T}Q\right)+\operatorname{trace}(S)+\gamma \tag{31}$$

subject to

$$\begin{bmatrix}\tilde{X}_{1,T}Q+Q^{T}\tilde{X}_{1,T}^{T}&B_{1}&X_{0,T}QC_{1}^{T}+Q^{T}U_{0,1,T}^{T}D_{12}^{T}\\ B_{1}^{T}&-\gamma I&D_{11}^{T}\\ C_{1}X_{0,T}Q+D_{12}U_{0,1,T}Q&D_{11}&-\gamma I\end{bmatrix}\prec 0, \tag{32}$$
$$\begin{bmatrix}X_{1,T}Q+Q^{T}X_{1,T}^{T}&\alpha(X_{1,T}Q-Q^{T}X_{1,T}^{T})\\ \alpha(Q^{T}X_{1,T}^{T}-X_{1,T}Q)&X_{1,T}Q+Q^{T}X_{1,T}^{T}\end{bmatrix}\prec 0, \tag{33}$$
$$\begin{bmatrix}S&R^{1/2}U_{0,1,T}Q\\ Q^{T}U_{0,1,T}^{T}R^{1/2}&X_{0,T}Q\end{bmatrix}\succeq 0. \tag{34}$$

Proof: We start with the $H_{2}$ performance objective, which in (10) is to minimize $\|T_{wz_{2}}\|_{2}$. Following [23, 24], the model-based optimization problem for the $H_{2}$ performance is given as follows:

$$\min_{K,X_{2}}\ \operatorname{trace}\left(Q_{x}X_{2}\right)+\operatorname{trace}\left(R^{\frac{1}{2}}KX_{2}K^{T}R^{\frac{1}{2}}\right) \tag{35}$$

subject to

$$\left\{\begin{array}{l}(A+B_{2}K)X_{2}+X_{2}(A+B_{2}K)^{T}+B_{1}B_{1}^{T}\prec 0,\\ X_{2}=X_{2}^{T}\succ 0.\end{array}\right.$$

We now consider the $H_{\infty}$ performance objective, which is to minimize $\|T_{wz_{1}}\|_{\infty}$. Using the KYP lemma (bounded real lemma) results in the following optimization problem [25, 26, 27]:

$$\min_{\gamma,X_{\infty}}\ \gamma \tag{36}$$

subject to

$$\begin{bmatrix}\tilde{A}X_{\infty}+X_{\infty}\tilde{A}^{T}&B_{1}&X_{\infty}(C_{1}+D_{12}K)^{T}\\ B_{1}^{T}&-\gamma I&D_{11}^{T}\\ (C_{1}+D_{12}K)X_{\infty}&D_{11}&-\gamma I\end{bmatrix}\prec 0,\quad\text{where } \tilde{A}=A+B_{2}K, \tag{37}$$
$$X_{\infty}\succ 0. \tag{38}$$

Recalling Lemma 3, the pole placement condition requires the following inequality to hold:

$$\begin{bmatrix}\tilde{A}X_{D}+X_{D}\tilde{A}^{T}&\alpha(\tilde{A}X_{D}-X_{D}\tilde{A}^{T})\\ \alpha(X_{D}\tilde{A}^{T}-\tilde{A}X_{D})&\tilde{A}X_{D}+X_{D}\tilde{A}^{T}\end{bmatrix}\prec 0,\quad\text{where } \tilde{A}=A+B_{2}K.$$

Next, we combine the above conditions into a single optimization problem by seeking a common solution $X=X_{2}=X_{\infty}=X_{D}\succ 0$. Using a single Lyapunov matrix $X$ to enforce multiple constraints has been studied in [22, 25]. Since we consider the mixed $H_{2}/H_{\infty}$ objective, the stabilization constraint provided by the $H_{\infty}$ problem serves as a conservative unifying condition for both the $H_{2}$ and $H_{\infty}$ problems. Therefore, the model-based solution of problem P can be written as

$$\min_{K,X,S,\gamma}\ \operatorname{trace}\left(Q_{x}X\right)+\operatorname{trace}(S)+\gamma \tag{39}$$

subject to

$$\begin{bmatrix}\tilde{A}X+X\tilde{A}^{T}&B_{1}&X(C_{1}+D_{12}K)^{T}\\ B_{1}^{T}&-\gamma I&D_{11}^{T}\\ (C_{1}+D_{12}K)X&D_{11}&-\gamma I\end{bmatrix}\prec 0,\quad\text{where } \tilde{A}=A+B_{2}K, \tag{40}$$
$$\begin{bmatrix}\tilde{A}X+X\tilde{A}^{T}&\alpha(\tilde{A}X-X\tilde{A}^{T})\\ \alpha(X\tilde{A}^{T}-\tilde{A}X)&\tilde{A}X+X\tilde{A}^{T}\end{bmatrix}\prec 0,\quad\text{where } \tilde{A}=A+B_{2}K, \tag{41}$$
$$X=X^{T}\succ 0, \tag{42}$$
$$S-R^{1/2}KXK^{T}R^{1/2}\succeq 0. \tag{43}$$

We represented the second term in the objective of (35) by the second term of (39) together with the inequality (43). We now convert these model-based expressions to their data-driven counterparts. Considering (40) and using (30), we have

$$\begin{bmatrix}(\tilde{X}_{1,T}G)X+X(\tilde{X}_{1,T}G)^{T}&B_{1}&XC_{1}^{T}+XK^{T}D_{12}^{T}\\ B_{1}^{T}&-\gamma I&D_{11}^{T}\\ C_{1}X+D_{12}KX&D_{11}&-\gamma I\end{bmatrix}\prec 0. \tag{44}$$

We now substitute $GX=Q$ and $K=U_{0,1,T}G$, which gives $XK^{T}D_{12}^{T}=Q^{T}U_{0,1,T}^{T}D_{12}^{T}$ and leads to (32). The substitution $K=U_{0,1,T}G$ also converts (43) into

$$S-R^{1/2}U_{0,1,T}GXG^{T}U_{0,1,T}^{T}R^{1/2}\succeq 0. \tag{45}$$

Next, we replace $GX=Q$ and $G^{T}=X^{-T}Q^{T}=(X_{0,T}Q)^{-T}Q^{T}$. From (21) we have $X_{0,T}G=I$, so pre-multiplying $GX=Q$ by $X_{0,T}$ gives $X=X_{0,T}Q$. We then have $S-R^{1/2}U_{0,1,T}Q(X_{0,T}Q)^{-T}Q^{T}U_{0,1,T}^{T}R^{1/2}\succeq 0$, and applying the Schur complement yields (34). The first term of the objective (39) becomes the first term of (31) through the relation $X=X_{0,T}Q$. Finally, we supplement these LMI conditions with the data-driven $\mathcal{D}$-stability condition of Theorem 1, which completes the proof of Theorem 2. $\square$
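The Schur complement step that takes (45) to (34) can be written out explicitly. The shorthand $M$ below is introduced only for this illustration, and the equivalence assumes $X=X_{0,T}Q$ is symmetric positive definite, as required by (42).

```latex
% Schur complement step from (45) to (34); M is illustrative shorthand.
M := R^{1/2}U_{0,1,T}Q, \qquad X = X_{0,T}Q = X^{T} \succ 0,
\qquad GXG^{T} = QX^{-1}Q^{T},
\\
S - M X^{-1} M^{T} \succeq 0
\;\Longleftrightarrow\;
\begin{bmatrix} S & M \\ M^{T} & X \end{bmatrix} \succeq 0 .
```

The block matrix on the right is exactly (34), which is linear in the decision variables $S$ and $Q$.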
Remark 2: Theorem 2 treats the $H_{\infty}$ performance margin $\gamma$ as an optimization variable. However, in many scenarios the designer may be interested in a pre-specified robustness performance level $\bar{\gamma}\geq\gamma_{\min}$, where $\gamma_{\min}$ is the solution of the problem in Theorem 2. To use a pre-specified $\bar{\gamma}$, the objective function in Theorem 2 becomes $\operatorname{trace}\left(Q_{x}X_{0,T}Q\right)+\operatorname{trace}(S)$, and $\gamma$ in (32) is replaced by $\bar{\gamma}$.

5 Numerical Example

In this section, we present an illustrative example of a pole placement problem with robust performance criteria using the data-driven method discussed in Section 4. We compare the obtained results using the data-driven approach with its model-based counterpart. We define the system given in (10) with the following state and input matrices, taken from [14]:

A=[0.51.40.40.90.31.51.110.4];B2=[0.10.30.10.70.71]A=\begin{bmatrix}-0.5&1.4&0.4\\ -0.9&0.3&-1.5\\ 1.1&1&-0.4\end{bmatrix};B_{2}=\begin{bmatrix}0.1&-0.3\\ -0.1&-0.7\\ 0.7&-1\end{bmatrix}

. Two of the eigenvalues of AA are located on the right side of the jωj\omega axis resulting in an unstable open-loop system. Note while designing the data-driven control, it is assumed that AA and B2B_{2} are unknown. We choose the system matrix B1B_{1} for extraneous inputs ww and other performance matrices C1,D11,D12,C2C_{1},D_{11},D_{12},C_{2} and D22D_{22} as follows:

B1=[100010001];C1=[100010001];D11=[111111111];D12=[111111]B_{1}=\begin{bmatrix}1&0&0\\ 0&1&0\\ 0&0&1\end{bmatrix};C_{1}=\begin{bmatrix}1&0&0\\ 0&1&0\\ 0&0&1\end{bmatrix};D_{11}=\begin{bmatrix}1&1&1\\ 1&1&1\\ 1&1&1\end{bmatrix};D_{12}=\begin{bmatrix}1&1&1\\ 1&1&1\\ \end{bmatrix}

,

C2=[Qx12𝟎],whereQx=[100010001];D22=[𝟎R12],whereR=[1001]C_{2}=\begin{bmatrix}Q_{x}^{\frac{1}{2}}\\ \mathbf{0}\end{bmatrix},\;\;\text{where}\;\;Q_{x}=\begin{bmatrix}1&0&0\\ 0&1&0\\ 0&0&1\end{bmatrix};D_{22}=\begin{bmatrix}\mathbf{0}\\ R^{\frac{1}{2}}\end{bmatrix},\;\;\text{where}\;\;R=\begin{bmatrix}1&0\\ 0&1\\ \end{bmatrix}

. We choose a random initial condition x0x_{0} and a random input sequence u2u\in\mathbb{R}^{2} with the magnitudes of both channels constrained at [0.5,0.5][-0.5,0.5]. Here, the interesting part is in deciding the length of the input sequence uu. For this example n=3n=3, m=2m=2, therefore to satisfy the rank condition given in (16) as a requirement of the persistently excitation, the length of the input sequence TT is selected as 1515, as we require T(m+1)n+mT\geq(m+1)n+m. Next, to generate the trajectory rollout as defined in (28), we choose ww randomly from a norm ball defined by ||w||20.05\lvert\lvert w\rvert\rvert_{2}\leq 0.05. Trapezoidal approximations are used to generate the rollouts using continuous time dynamics. Now, for a given α=2\alpha=2, the solution of problem P using Theorem 2 is as follows,

Kdata=[3.6279841.2572983.8037311.4335450.8374962.065323],γmin=4.832K_{\text{data}}=\begin{bmatrix}-3.627984&1.257298&-3.803731\\ 1.433545&0.837496&2.065323\end{bmatrix},\\ \gamma_{\text{min}}=4.832

The poles of the closed-loop system are -4.2545, -1.9539, and -0.6244. These poles (marked as red* in Fig. 1) are located on the negative real axis, which includes the conic region defined by α=2\alpha=2 (shaded area in Fig. 1). To verify our designed data-driven controller gain and the location of closed-loop poles, we run the same experiments with the model-based equations given in (39) to (43). The computed model-based gain is shown below:

Kmod=[3.6279941.2573023.8037371.4335400.8374972.065325],γmin=4.832K_{\text{mod}}=\begin{bmatrix}-3.627994&1.257302&-3.803737\\ 1.433540&0.837497&2.065325\end{bmatrix},\\ \gamma_{\text{min}}=4.832

The computed gains $K_{\text{data}}$ and $K_{\text{mod}}$ are nearly identical, with a difference $\lVert K_{\text{data}}-K_{\text{mod}}\rVert\approx 10^{-4}$, which validates the accuracy of our proposed data-driven design.
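As a quick sanity check, the gap between the two gains can be recomputed from the six-decimal entries as printed. The sketch below uses the Frobenius norm, which is an assumption since the text does not state which norm is meant. On the rounded values it evaluates to roughly $1.3\times10^{-5}$, which is within the $10^{-4}$ level quoted.

```python
import numpy as np

# Gains copied from the reported (rounded) values.
K_data = np.array([[-3.627984, 1.257298, -3.803731],
                   [1.433545, 0.837496, 2.065323]])
K_mod = np.array([[-3.627994, 1.257302, -3.803737],
                  [1.433540, 0.837497, 2.065325]])

# Frobenius norm of the difference (choice of norm is an assumption).
gap = np.linalg.norm(K_data - K_mod, ord="fro")
print(f"||K_data - K_mod||_F = {gap:.2e}")
```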

Figure 1: Data-Driven Pole Location with (in red) and without LMI Constraints (in blue)

We also perform an ablation study to further check the robustness of the data-driven method. We remove the pole placement constraints from the conditions given in Theorem 2, which reduces the problem to a mixed $H_{2}/H_{\infty}$ problem. Solving the LMIs (31) to (34), excluding (33), we obtain

\[
\bar{K}_{\text{data}}=\begin{bmatrix}-3.627984&1.257298&-3.803731\\ 1.433545&0.837496&2.065323\end{bmatrix},
\]
\[
\text{Poles: }\; -0.9062+j1.533,\;\; -0.9062-j1.533,\;\; -1.4182.
\]

Fig. 1 shows that these poles (marked with blue *) are not confined to the LMI region (shaded area). As in the previous experiment, the same results are obtained from the model-based optimization using (39) to (43) with (4) eliminated.
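The two sets of reported poles can be compared against a conic sector programmatically. The exact definition of the conic region is given earlier in the paper and is not reproduced here, so the sketch below assumes the sector $\{z : \alpha\,\lvert\operatorname{Im} z\rvert < -\operatorname{Re} z\}$, a hypothetical parametrization chosen only because it is consistent with the reported outcome. The function name `in_assumed_cone` is likewise illustrative.

```python
def in_assumed_cone(pole: complex, alpha: float) -> bool:
    """True if the pole lies in the ASSUMED sector alpha*|Im z| < -Re z."""
    return alpha * abs(pole.imag) < -pole.real

alpha = 2.0

# Closed-loop poles reported with the pole placement constraint active.
poles_with_lmi = [complex(-4.2545, 0.0),
                  complex(-1.9539, 0.0),
                  complex(-0.6244, 0.0)]

# Closed-loop poles reported in the ablation (constraint removed).
poles_without_lmi = [complex(-0.9062, 1.533),
                     complex(-0.9062, -1.533),
                     complex(-1.4182, 0.0)]

with_flags = [in_assumed_cone(p, alpha) for p in poles_with_lmi]
without_flags = [in_assumed_cone(p, alpha) for p in poles_without_lmi]

print("with constraint:   ", with_flags)
print("without constraint:", without_flags)
```

Under this assumed sector, all three constrained poles pass the test, while the complex-conjugate pair from the ablation fails it. The real pole at $-1.4182$ passes, since any sector of this form contains the negative real axis.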

6 Conclusion

In this paper, we have presented a data-driven methodology that satisfies multiple constraints, comprising $\mathcal{D}$-stability and mixed $H_{2}/H_{\infty}$ performance guarantees. We have shown that the data-based parametrized representation of the closed-loop dynamics, originating from behavioral system theory, provides a fundamental framework for solving such classic problems with unknown state and input matrices. The solutions of the proposed semi-definite programs match the classical model-based solutions closely, a correspondence that has been proven rigorously and validated numerically. Future work will consider unknown dynamic systems coupled with structured and parametric uncertainty, and will develop further data-driven robust control designs.

References

  • [1] R. Sutton and A. Barto, Reinforcement Learning: An Introduction.   MIT Press, Cambridge, 1998.
  • [2] C. Watkins, “Learning from delayed rewards,” PhD thesis, King’s College, Cambridge, 1989.
  • [3] D. P. Bertsekas, Dynamic Programming and Optimal Control: Approximate Dynamic Programming, 4th ed.   Athena Scientific, Belmont, MA, USA, 2012.
  • [4] W. Powell, Approximate dynamic programming.   Wiley, 2007.
  • [5] D. Vrabie, O. Pastravanu, M. Abu-Khalaf, and F. Lewis, “Adaptive optimal control for continuous-time linear systems based on policy iteration,” Automatica, vol. 45, pp. 477–484, 2009.
  • [6] Y. Jiang and Z.-P. Jiang, “Computational adaptive optimal control for continuous-time linear systems with completely unknown dynamics,” Automatica, vol. 48, pp. 2699–2704, 2012.
  • [7] K. Vamvoudakis, “Q-learning for continuous-time linear systems: A model-free infinite horizon optimal control approach,” Systems and Control Letters, vol. 100, pp. 14–20, 2017.
  • [8] B. Kiumarsi, K. Vamvoudakis, H. Modares, and F. Lewis, “Optimal and autonomous control using reinforcement learning: A survey,” IEEE Trans. on Neural Networks and Learning Systems, 2018.
  • [9] S. Mukherjee, H. Bai, and A. Chakrabortty, “Reduced-dimensional reinforcement learning control using singular perturbation approximations,” Automatica, vol. 126, p. 109451, 2021.
  • [10] S. Mukherjee and T. L. Vu, “On distributed model-free reinforcement learning control with stability guarantee,” IEEE Control Systems Letters, vol. 5, no. 5, pp. 1615–1620, 2020.
  • [11] J. C. Willems, P. Rapisarda, I. Markovsky, and B. L. De Moor, “A note on persistency of excitation,” Systems & Control Letters, vol. 54, no. 4, pp. 325–329, 2005.
  • [12] C. De Persis and P. Tesi, “Formulas for data-driven control: Stabilization, optimality, and robustness,” IEEE Transactions on Automatic Control, vol. 65, no. 3, pp. 909–924, 2019.
  • [13] H. J. Van Waarde, J. Eising, H. L. Trentelman, and M. K. Camlibel, “Data informativity: a new perspective on data-driven analysis and control,” IEEE Transactions on Automatic Control, vol. 65, no. 11, pp. 4753–4768, 2020.
  • [14] J. Berberich, A. Koch, C. W. Scherer, and F. Allgöwer, “Robust data-driven state-feedback design,” in 2020 American Control Conference (ACC).   IEEE, 2020, pp. 1532–1538.
  • [15] J. Morimoto and K. Doya, “Robust reinforcement learning,” in NIPS.   Citeseer, 2000, pp. 1061–1067.
  • [16] L. Pinto, J. Davidson, R. Sukthankar, and A. Gupta, “Robust adversarial reinforcement learning,” in International Conference on Machine Learning.   PMLR, 2017, pp. 2817–2826.
  • [17] Y. Jiang and Z.-P. Jiang, Robust Adaptive Dynamic Programming.   Wiley-IEEE press, 2017.
  • [18] S. Mukherjee, H. Bai, and A. Chakrabortty, “Block-decentralized model-free reinforcement learning control of two time-scale networks,” in American Control Conference 2019, Philadelphia, USA.
  • [19] P. P. Khargonekar and M. A. Rotea, “Mixed $H_{2}$/$H_{\infty}$ control: a convex optimization approach,” IEEE Transactions on Automatic Control, vol. 36, no. 7, pp. 824–837, 1991.
  • [20] J. Doyle, K. Glover, P. Khargonekar, and B. Francis, “State-space solutions to standard $H_{2}$ and $H_{\infty}$ control problems,” in 1988 American Control Conference.   IEEE, 1988, pp. 1691–1696.
  • [21] Y. Fujisaki and T. Yoshida, “A linear matrix inequality approach to mixed $H_{2}$/$H_{\infty}$ control,” IFAC Proceedings Volumes, vol. 29, no. 1, pp. 1339–1344, 1996.
  • [22] M. Chilali and P. Gahinet, “$H_{2}$/$H_{\infty}$ design with pole placement constraints: an LMI approach,” IEEE Transactions on Automatic Control, vol. 41, no. 3, pp. 358–367, 1996.
  • [23] E. Feron, V. Balakrishnan, S. Boyd, and L. El Ghaoui, “Numerical methods for $H_{2}$ related problems,” in 1992 American Control Conference.   IEEE, 1992, pp. 2921–2922.
  • [24] S. Boyd, L. El Ghaoui, E. Feron, and V. Balakrishnan, Linear matrix inequalities in system and control theory.   SIAM, 1994.
  • [25] C. Scherer and S. Weiland, “Linear matrix inequalities in control,” Lecture Notes, Dutch Institute for Systems and Control, Delft, The Netherlands, vol. 3, no. 2, 2000.
  • [26] P. Gahinet and P. Apkarian, “A linear matrix inequality approach to $H_{\infty}$ control,” International Journal of Robust and Nonlinear Control, vol. 4, no. 4, pp. 421–448, 1994.
  • [27] T. Iwasaki and R. E. Skelton, “All controllers for the general $H_{\infty}$ control problem: LMI existence conditions and state space formulas,” Automatica, vol. 30, no. 8, pp. 1307–1317, 1994.