Data-Driven Pole Placement in LMI Regions with Robustness Constraints

Sayak Mukherjee, Ramij R. Hossain

S. Mukherjee is with the Optimization and Control Group, Pacific Northwest National Laboratory (PNNL), Richland, WA, USA, and R. R. Hossain is with the Department of Electrical and Computer Engineering, Iowa State University, Ames, IA, USA. Authors have equal contributions. Emails: [email protected], [email protected].

Abstract

This paper proposes a robust learning methodology to place the closed-loop poles in desired convex regions of the complex plane. We consider the system state and input matrices to be unknown, so that only measurements of the system trajectories can be used. Closed-loop pole placement in linear matrix inequality (LMI) regions is a classic robust control problem; however, its classical solution requires knowledge of the state and input matrices of the linear system. We bring in ideas from behavioral system theory and the persistency-of-excitation-based fundamental lemma to develop a data-driven counterpart that satisfies multiple closed-loop robustness specifications, such as $\mathcal{D}$ -stability and mixed $H_{2}/H_{\infty}$ performance specifications. Our formulations lead to data-driven semi-definite programs (SDPs) that come with sufficient theoretical guarantees. We validate the theoretical results with numerical simulations on a third-order dynamic system.

Keywords: Robust pole placement, data-driven robust control, stability guarantee, mixed $H_{2}/H_{\infty}$ , LMI regions.

1 Introduction

Recent research in automatic control has focused increasingly on converting classic model-based formulations to their data-driven counterparts. The motivation for such designs comes from the increasing complexity and scale of practical dynamic systems, which makes the dynamic model and its parameters less accurately known and introduces several unmodeled non-idealities. Data-driven approaches take many forms, each with distinct characteristics. In the machine learning community, sequential decision-making problems using Markov decision processes (MDPs) have garnered a lot of interest under the umbrella of reinforcement learning (RL) [1, 2, 3, 4]. Many underlying concepts of RL have been translated to the dynamic system viewpoint using adaptive dynamic programming in approaches such as [5, 6, 7, 8, 9, 10], considering both partially and fully model-free designs. The latter class of methods intends to provide theoretical guarantees using dynamic systems theory. Continuing on the path of supplementing data-driven algorithms with strong mathematical backing, behavioral system theory has recently emerged as an effective alternative approach [11]. The underlying idea is to represent the space of input-output trajectories of an LTI system as the span of time-shifted versions of a single measured trajectory. Research works such as [12, 13, 14] follow this framework for data-driven optimal control designs.

Along with optimal control designs in a data-driven way, practical systems require sufficient robustness margins. The problem of performing robust control design with an unknown state model still has many open questions. Approaches such as [15, 16, 17, 18] try to add some robustness by infusing a few robust considerations into the data-driven RL or optimal control setting. However, a dedicated robust learning methodology is envisioned to provide much better stabilization and performance guarantees with a strong underlying framework. Behavioral system theory using Willems' fundamental lemma [11] can provide such foundations for robust learning control designs, and helps to provide a one-to-one conversion strategy from model-based to data-driven approaches. In this paper we build upon that framework and consider the pole placement problem in a desired convex region of the complex plane along with sufficient robustness specifications.

Classically, decades of research in the robust control domain have produced a plethora of methods that can provide sufficient system performance in the presence of noise, unmodeled dynamics, uncertainty, etc. [19, 20, 21]. The closed-loop pole placement problem in a desired convex region of the complex plane has given rise to linear matrix inequalities in works such as [22]. However, these classical methods require knowledge of the system state and input matrices. With this motivation, this paper deals with the robust pole placement problem in LMI regions under the assumption that the system state and input matrices are unknown and the designer only has access to trajectory measurements. The system is explored with persistently exciting inputs so that the fundamental requirement of behavioral system theory is not violated. The model-based formulations can thereafter be converted to data-driven formulations using the closed-loop data-driven parametrized representations. The robust control problem considers the LMI-based pole placement conditions along with robust performance constraints such as mixed $H_{2}/H_{\infty}$ performance requirements.

Contribution. The main contribution of the paper is a data-driven robust control methodology that achieves the desired closed-loop pole placement in convex regions of the complex plane, along with robust and optimal system performance enforced through mixed $H_{2}/H_{\infty}$ performance metrics. We use the fundamental-lemma-based data-driven parametrized representation to obtain convex formulations of the robust pole placement problem in LMI regions that come very close to the model-based design characteristics. We provide theoretical guarantees on the performance of the proposed algorithm along with validation on a third-order dynamic system example.

The rest of the paper is organized as follows. Section 2 describes the model and problem statement considered in the paper. We recall some fundamentals for data-driven designs in Section 3. The main results on data-driven LMI region pole placement with robustness specifications are shown in Section 4. A numerical example is given in Section 5, and we provide concluding remarks in Section 6.

Notations. $S\succ(\succeq)0$ denotes positive definite (semidefinite) matrix; $H_{2}$ norm : The $H_{2}$ norm of system $G$ in time domain is given by, $||G||_{2}=(\int_{0}^{\infty}[\mbox{trace}(h(t)^{T}h(t))]dt)^{\frac{1}{2}}$ where $h(t)$ is the impulse response; $H_{\infty}$ norm : The system $H_{\infty}$ norm is given by $||G||_{\infty}=\mbox{sup}_{w}\sigma_{\mbox{max}}(G(jw))$ , where $\sigma_{\mbox{max}}$ denotes maximum singular value, and $G(jw)$ is the system transfer matrix.
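The two norms defined above can be evaluated numerically for a known state-space model. The following is a minimal sketch, assuming a stable realization $G(s)=C(sI-A)^{-1}B+D$; the function names `h2_norm` and `hinf_norm_grid` and the first-order test system are illustrative choices, not part of the paper. The $H_{2}$ norm is computed from the controllability Gramian (with $D=0$), and the $H_{\infty}$ norm is only approximated from below on a frequency grid.

```python
import numpy as np

def h2_norm(A, B, C):
    """H2 norm of G(s) = C (sI - A)^{-1} B for a Hurwitz A (no feedthrough)."""
    n = A.shape[0]
    I = np.eye(n)
    # Controllability Gramian P solves A P + P A^T + B B^T = 0.
    # Column-major vectorisation: (I kron A + A kron I) vec(P) = -vec(B B^T).
    lhs = np.kron(I, A) + np.kron(A, I)
    rhs = -(B @ B.T).reshape(-1, order="F")
    P = np.linalg.solve(lhs, rhs).reshape(n, n, order="F")
    return float(np.sqrt(np.trace(C @ P @ C.T)))

def hinf_norm_grid(A, B, C, D, omegas):
    """Lower estimate of the H-infinity norm: largest singular value on a grid."""
    n = A.shape[0]
    peak = 0.0
    for w in omegas:
        G = C @ np.linalg.solve(1j * w * np.eye(n) - A, B) + D
        peak = max(peak, np.linalg.svd(G, compute_uv=False)[0])
    return float(peak)

# Illustrative first-order system G(s) = 1 / (s + 1).
A1 = np.array([[-1.0]])
B1 = np.array([[1.0]])
C1 = np.array([[1.0]])
D1 = np.array([[0.0]])
h2_demo = h2_norm(A1, B1, C1)
hinf_demo = hinf_norm_grid(A1, B1, C1, D1, np.logspace(-3, 3, 200))
```

For this test system the impulse response is $e^{-t}$, so the $H_{2}$ norm is $1/\sqrt{2}$, and the frequency response peaks at low frequency with gain approaching 1.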

2 Problem Statement

We consider a linear time-invariant (LTI) continuous-time dynamic system of the form:

\displaystyle\dot{x}=Ax+B_{1}w+B_{2}u,\;x(0)=x_{0},

(1)

where $x\in\mathbb{R}^{n},u\in\mathbb{R}^{m},w\in\mathbb{R}^{d}$ are the states, control inputs, and extraneous inputs, respectively. We make the following assumption.
Assumption 1: The dynamic state matrix $A$ and control input matrix $B_{2}$ are unknown. However, the values of $n,m$ and $d$ are known.

We also consider the following assumption about the availability of measurements.
Assumption 2: The measurements of states $x(t)$ and control inputs $u(t)$ are available to the designer, and the designer injects known extraneous disturbances $w(t)$ in a controlled environment to perform the control design tasks.
We are interested in placing the closed-loop poles in some desired regions of the complex plane. These regions are classically known as LMI regions defined as follows.

Definition 1: LMI pole placement regions - A subset $\mathcal{D}$ can be characterized as the desired pole placement regions if there exist a symmetric matrix $\alpha=[a_{ij}]\in\mathbb{R}^{n\times n}$ , and a matrix $\beta=[b_{ij}]\in\mathbb{R}^{n\times n}$ such that,

\displaystyle\mathcal{D}=\{z\in\mathbb{C}:\psi_{\mathcal{D}}(z)<0\},

(2)

where,

\displaystyle\psi_{\mathcal{D}}(z)=\alpha+z\beta+z^{*}\beta^{T}=[a_{ij}+b_{ij}z+b_{ji}z^{*}]_{1\leq i,j\leq n}.

(3)

The notation $M=[m_{ij}]_{1\leq i,j\leq n}$ denotes $M$ to be an $n\times n$ matrix (resp. block matrix) with generic entry (resp. block) $m_{ij}$ . An LMI region is a subset of the complex plane that is representable by an LMI in $z$ and $z^{*}$ , or equivalently, an LMI in $x=Re(z)$ and $y=Im(z)$ . As a result, LMI regions are convex. Moreover, LMI regions are symmetric with respect to the real axis. Many different types of LMI regions can be constructed by the designer. Throughout this paper, we consider the following as our working example.
Example 1: The region in the left half of the complex plane between the lines with slopes $-\frac{1}{\alpha}$ and $\frac{1}{\alpha},\alpha>0$ is given as,

\displaystyle\mathcal{D}=\{z\in\mathbb{C},\begin{bmatrix}z+z^{*}&\alpha(z^{*}-z)\\ \alpha(z-z^{*})&z+z^{*}\end{bmatrix}<0\}.

(4)

This can be shown directly using $z=x+jy,$ so that $z+z^{*}=2x$ and $z-z^{*}=j2y$ , giving,

\displaystyle\begin{bmatrix}z+z^{*}&\alpha(z^{*}-z)\\ \alpha(z-z^{*})&z+z^{*}\end{bmatrix}=\begin{bmatrix}2x&-j2\alpha y\\ j2\alpha y&2x\end{bmatrix}<0

(5)

Therefore using Schur complement,

	$\displaystyle x<0,$		(6)
	$\displaystyle 2x-(-j2\alpha y)\frac{1}{2x}(j2\alpha y)<0,$		(7)
	$\displaystyle\text{implying,}-\frac{1}{\alpha}<\frac{y}{x}<\frac{1}{\alpha}.$		(8)

If the sector in the left half of the complex plane is described using the inner angle $\theta$ , then the LMI region expression becomes,

\displaystyle\psi_{\mathcal{D}}(z;\theta)=\begin{bmatrix}\sin\theta(z+z^{*})&\cos\theta(z-z^{*})\\ \cos\theta(z^{*}-z)&\sin\theta(z+z^{*})\end{bmatrix}<0.

(9)

$\square$
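Membership of a point in the conic region of Example 1 can be tested numerically by forming the Hermitian matrix in (4) and checking that it is negative definite. The sketch below assumes nothing beyond (4); the helper names `psi_sector` and `in_sector` are illustrative.

```python
import numpy as np

def psi_sector(z, alpha):
    """Hermitian matrix of (4) evaluated at the complex point z, with alpha > 0."""
    zc = np.conj(z)
    return np.array([[z + zc, alpha * (zc - z)],
                     [alpha * (z - zc), z + zc]], dtype=complex)

def in_sector(z, alpha):
    """True if the matrix of (4) is negative definite, i.e. z lies in the region."""
    return bool(np.max(np.linalg.eigvalsh(psi_sector(z, alpha))) < 0.0)

# With alpha = 2 the region is x < 0 and |y| < |x| / 2.
inside = in_sector(-1.0 + 0.2j, 2.0)      # |y/x| = 0.2 < 0.5
outside = in_sector(-1.0 + 1.0j, 2.0)     # |y/x| = 1.0 > 0.5
right_half = in_sector(1.0 + 0.0j, 2.0)   # x > 0
```

The eigenvalues of the matrix are $2x\pm 2\alpha|y|$ , so the test reproduces conditions (6) to (8).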
As LMI regions are convex, we can construct more complicated LMI regions, such as convex polygons, by intersecting simpler LMI regions. The focus of this paper is to design controllers without knowledge of the state dynamics, using only state and input trajectory measurements. We also incorporate robust optimization objectives along with the LMI-based pole placement constraints. We consider the mixed $H_{2}/H_{\infty}$ optimization objective, with two controlled output variables along with the dynamics,

\Sigma:\left\{\begin{array}[]{l}\dot{x}(t)=Ax(t)+B_{1}w(t)+B_{2}u(t),\\ z_{1}(t)=C_{1}x(t)+D_{11}w(t)+D_{12}u(t),\\ z_{2}(t)=C_{2}x(t)+D_{22}u(t).\end{array}\right.

(10)

$T_{wz_{1}}$ (respectively $T_{wz_{2}}$ ) denotes the transfer function from $w(t)$ to the controlled output $z_{1}(t)$ (respectively to $z_{2}(t)$ ). We intend to learn the state-feedback control $u=Kx$ such that the poles of the underlying closed-loop dynamics $A+B_{2}K$ lie in the desired LMI region characterized by the prescribed $\psi_{\mathcal{D}}(z)<0,$ thereby maintaining $\mathcal{D}$ -stability, and such that mixed $H_{2}/H_{\infty}$ objectives on the regulated variables are also satisfied. The problem statement is given as follows:
P. Under Assumptions 1 and 2, learn the state-feedback control $u=Kx$ such that:

• The prescribed $\mathcal{D}$ -stability is maintained for a desired LMI region,
• A prescribed $H_{\infty}$ robustness criterion is met, i.e., $||T_{wz_{1}}||_{\infty}<\gamma$ , or the $H_{\infty}$ norm is minimized with the robustness margin treated as a variable,
• With the desired $\mathcal{D}$ -stability and $H_{\infty}$ robustness margin, the $H_{2}$ performance $||T_{wz_{2}}||_{2}$ is minimized.

3 Data-Driven Representation Fundamentals

3.1 Recalling fundamental lemma

Consider a signal $s:\mathbb{Z}\to\mathbb{R}^{p}$ , the Hankel matrix associated with it is given as,

\displaystyle S_{i,L,N}=\begin{bmatrix}s(i)&s(i+1)&\dots&s(i+N-1)\\ s(i+1)&s(i+2)&\dots&s(i+N)\\ \dots&\dots&\dots&\dots\\ s(i+L-1)&s(i+L)&\dots&s(i+N+L-2)\end{bmatrix}.

(11)

The Hankel matrix starts with the element $s(i)$ , and consists of $L$ rows and $N$ columns. With $L=1$ we denote,

\displaystyle S_{i,N}=\begin{bmatrix}s(i)&s(i+1)&\dots&s(i+N-1)\end{bmatrix}.

(12)

Definition 2 [11, 12]: The signal $s_{[0,T-1]}\in\mathbb{R}^{p}$ is persistently exciting of order $L$ if the corresponding Hankel matrix $S_{0,L,T-L+1}$ has full rank $pL$ . Therefore, the signal must be sufficiently long, i.e., $T\geq(p+1)L-1$ . We recall Willems et al.'s fundamental lemma [11] for the discrete-time dynamic system:

	$\displaystyle x(k+1)=Ax(k)+Bu(k),$		(13)
	$\displaystyle y(k)=Cx(k)+Du(k),$

where $x\in\mathbb{R}^{n},u\in\mathbb{R}^{m},$ and $y\in\mathbb{R}^{p}$ .
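The Hankel construction in (11) and the rank test of Definition 2 can be sketched as follows. The function names `hankel` and `is_persistently_exciting` are illustrative, and the signal is assumed to be stored with one sample per row.

```python
import numpy as np

def hankel(s, i, L, N):
    """Block Hankel matrix S_{i,L,N} of (11); s has shape (num_samples, p)."""
    s = np.asarray(s, dtype=float)
    if s.ndim == 1:
        s = s[:, None]
    # Block row r holds the samples s(i + r), ..., s(i + r + N - 1) as columns.
    rows = [np.column_stack([s[i + r + c] for c in range(N)]) for r in range(L)]
    return np.vstack(rows)

def is_persistently_exciting(s, L):
    """Definition 2: the Hankel matrix S_{0,L,T-L+1} must have full rank p*L."""
    s = np.asarray(s, dtype=float)
    if s.ndim == 1:
        s = s[:, None]
    T, p = s.shape
    if T < (p + 1) * L - 1:   # necessary length condition
        return False
    H = hankel(s, 0, L, T - L + 1)
    return bool(np.linalg.matrix_rank(H) == p * L)

# Illustrative signals: a random two-channel input of length 15, and a constant one.
rng = np.random.default_rng(0)
u_random = rng.uniform(-0.5, 0.5, size=(15, 2))
u_constant = np.ones((15, 2))
pe_random = is_persistently_exciting(u_random, 4)     # order n + 1 with n = 3
pe_constant = is_persistently_exciting(u_constant, 4)
```

A random input generically passes the test, while a constant input yields a rank-one Hankel matrix and fails it.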

Lemma 1 [11]: Considering the discrete-time system as given in (13), when the input $u_{[0,T-1]}$ is persistently exciting of order $n+t$ then one will have,

\displaystyle\mbox{rank}(\begin{bmatrix}U_{[0,t,T-t+1]}\\ X_{[0,T-t+1]}\end{bmatrix})=n+tm.

(14)

Lemma 2 [11]: For the system (13), if the input $u_{[0,T-1]}$ is persistently exciting of order $n+t$ , then one can express any $t$ -length input-output trajectory measurements of the system in the following form,

\displaystyle\begin{bmatrix}u_{[0,t-1]}\\ x_{[0,t-1]}\end{bmatrix}=\begin{bmatrix}U_{[0,t,T-t+1]}\\ X_{[0,t,T-t+1]}\end{bmatrix}g,

(15)

where $g\in\mathbb{R}^{T-t+1}$ . This shows that when $T$ is taken sufficiently large, the rank condition of Lemma 1 can be satisfied, and therefore, any input-output trajectory of the system can be represented as a linear combination of collected input/output data. This property enables us to replace a parametric description of the system with a data based counterpart. For a persistently exciting input sequence $u_{[0,T-1]}$ of order $n+1$ with $t=1$ , $T\geq(m+1)n+m$ is necessary for the persistence of excitation condition to hold. This results in,

\displaystyle\mbox{rank}(\begin{bmatrix}U_{[0,1,T-t+1]}\\ X_{[0,T-t+1]}\end{bmatrix})=n+m.

(16)

This idea can be extended for continuous-time systems as shown in [12]. For a sampling time $\Delta>0$ , input and state-sampled trajectories $U_{[0,1,T]}$ , and $X_{[0,T]}$ are stored, and the rank condition (16) needs to be checked. [12] constructed the continuous-time counterpart for the time-shifted states in the discrete-time using the derivative information with slight abuse of notation:

\displaystyle X_{1,T}=\begin{bmatrix}\dot{x}(0)&\dot{x}(\Delta)&\dots&\dot{x}((T-1)\Delta)\end{bmatrix}.

(17)

The state-dynamic data gathered over the $T$ -length window is represented as,

	$\displaystyle X_{1,T}=AX_{0,T}+BU_{0,1,T},$		(18)
	$\displaystyle=[B\;\;\;\;A]\begin{bmatrix}U_{0,1,T}\\ X_{0,T}\end{bmatrix}.$		(19)
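The data relation (18) and (19), together with the rank condition (16), can be illustrated on simulated data. This is a sketch under stated assumptions: the state derivatives are taken as exactly measurable (here they are computed from the model, which stands in for measurements), a forward-Euler step is used only to move the state between samples, and `collect_data` is a hypothetical helper. The matrices are those of the numerical example in Section 5.

```python
import numpy as np

def collect_data(A, B, T=15, dt=0.1, seed=0):
    """Return (X0, U0, X1): sampled states, inputs, and state derivatives."""
    rng = np.random.default_rng(seed)
    n, m = B.shape
    x = rng.standard_normal(n)
    X0, U0, X1 = [], [], []
    for _ in range(T):
        u = rng.uniform(-0.5, 0.5, size=m)   # random exploration input
        xdot = A @ x + B @ u                 # derivative sample (assumed measured)
        X0.append(x.copy())
        U0.append(u)
        X1.append(xdot)
        x = x + dt * xdot                    # forward-Euler step (assumption)
    return np.array(X0).T, np.array(U0).T, np.array(X1).T

A = np.array([[-0.5, 1.4, 0.4], [-0.9, 0.3, -1.5], [1.1, 1.0, -0.4]])
B = np.array([[0.1, -0.3], [-0.1, -0.7], [0.7, -1.0]])
X0, U0, X1 = collect_data(A, B)

W = np.vstack([U0, X0])                 # stacked data matrix of (16)
rank_W = np.linalg.matrix_rank(W)       # should equal n + m
BA = X1 @ np.linalg.pinv(W)             # [B  A] recovered from (19)
B_hat, A_hat = BA[:, :2], BA[:, 2:]
```

When the rank condition holds, the stacked data matrix has full row rank and (19) determines $[B\;\;A]$ uniquely, which is why the data can replace the model in the designs that follow.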

3.2 Data-driven Closed-loop Representation

Following [12], Lemma 2 can be exploited to derive a parametrization of the closed loop system with a state-feedback law $u=Kx$ . For the closed-loop system,

\displaystyle\dot{x}=Ax+Bu=(A+BK)x,

(20)

by the Rouché–Capelli theorem, there exists a matrix $G\in\mathbb{R}^{T\times n}$ such that

	$\displaystyle A+BK=[B\;\;A]\begin{bmatrix}K\\ I\end{bmatrix},$		(21)
	$\displaystyle\begin{bmatrix}K\\ I\end{bmatrix}=\begin{bmatrix}U_{0,1,T}\\ X_{0,T}\end{bmatrix}G.$

Therefore the data-driven representation becomes,

	$\displaystyle\dot{x}$	$\displaystyle=[B\;\;A]\begin{bmatrix}U_{0,1,T}\\ X_{0,T}\end{bmatrix}Gx,$		(22)
		$\displaystyle=X_{1,T}Gx,$		(23)

and the model-based closed-loop can now be made data-based as follows

\displaystyle A+BK=[B\;\;A]\begin{bmatrix}K\\ I\end{bmatrix}=X_{1,T}G,

(24)

The control now becomes $u=U_{0,1,T}Gx$ . Therefore, the designer needs to learn the matrix $G$ to implement the feedback control. We now provide the main results of the paper.
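The parametrization (21) to (24) can be checked numerically: given data with full row rank and any gain $K$ , one solution $G$ of (21) is obtained with a pseudo-inverse, and $X_{1,T}G$ then reproduces $A+BK$ . The sketch below uses hypothetical data drawn at random rather than from a trajectory, and the gain and function name are illustrative.

```python
import numpy as np

def closed_loop_from_data(X0, U0, X1, K):
    """Solve [K; I] = [U0; X0] G for G and return (G, X1 @ G)."""
    n = X0.shape[0]
    W = np.vstack([U0, X0])
    target = np.vstack([K, np.eye(n)])
    G = np.linalg.pinv(W) @ target     # one solution; exact when W has full row rank
    return G, X1 @ G

A = np.array([[-0.5, 1.4, 0.4], [-0.9, 0.3, -1.5], [1.1, 1.0, -0.4]])
B = np.array([[0.1, -0.3], [-0.1, -0.7], [0.7, -1.0]])
K = np.array([[1.0, 0.0, -1.0], [0.5, 2.0, 0.0]])   # arbitrary illustrative gain

rng = np.random.default_rng(3)
T = 15
X0 = rng.standard_normal((3, T))
U0 = rng.uniform(-0.5, 0.5, size=(2, T))
X1 = A @ X0 + B @ U0                                # derivative data as in (18)

G, A_cl_data = closed_loop_from_data(X0, U0, X1, K)
```

The matrix $G$ is not unique when $T>n+m$ ; the pseudo-inverse simply picks the minimum-norm solution.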

4 Data-Driven Robust Pole Placement Methodology

We first discuss the challenges in pole placement problems using the data-driven method without invoking any robust performance constraints. Then, we introduce the formulation of data-driven pole placement problem combining $H_{2}/H_{\infty}$ condition followed by a comprehensive discussion.

Although the pole placement requirement can be imposed on any convex region given by (2), we consider (4) as a working example throughout the methodology development in this section and the numerical example in the following section. Considering the system $\dot{x}=Ax+B_{2}u$ , in Section 2 we defined an example LMI region (4) for $\mathcal{D}$ -stability. Next, we state the LMI condition that needs to be satisfied to place the closed-loop poles in the prescribed region using a state-feedback control $u=Kx$ .

Lemma 3[22]: For the system $\dot{x}=Ax+B_{2}u$ , given an LMI region $\mathcal{D}$ defined by (4), the closed loop system $\tilde{A}=A+B_{2}K$ is said to be $\mathcal{D}$ -stable if there exists a real symmetric matrix $X_{D}\succ 0$ satisfying the following condition.

\displaystyle\begin{bmatrix}\tilde{A}X_{D}+X_{D}\tilde{A}^{T}&\alpha(\tilde{A}X_{D}-X_{D}\tilde{A}^{T})\\ \alpha(X_{D}\tilde{A}^{T}-\tilde{A}X_{D})&\tilde{A}X_{D}+X_{D}\tilde{A}^{T}\end{bmatrix}\prec 0,

(25)

$\square$
Proof [22]: The above condition is obtained by replacing $z$ with $\tilde{A}X_{D}$ and $z^{*}$ with $X_{D}\tilde{A}^{T}$ in (4), up to a sign convention on the off-diagonal blocks that does not change the region because it is symmetric with respect to the real axis. A detailed discussion of this condition can be found in [22]. This guarantees that the eigenvalues of $\tilde{A}=A+B_{2}K$ belong to the conic region $\psi_{\mathcal{D}}(z;\theta)<0$ in the left half of the complex plane. $\square$
Note that the condition given in (25) is not an LMI, because if we replace $\tilde{A}$ with $A+B_{2}K$ , we get a product term $KX_{D}$ of two unknown quantities $K$ and $X_{D}$ . However, with the simple change of variable $Y=KX_{D}$ , (25) can be converted into an LMI condition.

Next, we utilize the relation given in (24), and derive the data based condition for placing the poles of closed loop system in the region defined by the LMI condition (4).

Theorem 1: For the system $\dot{x}=Ax+B_{2}u$ with unknown state dynamics, let the input sequence $U_{0,1,T}$ be persistently exciting, i.e., the rank condition (16) holds. Then any matrix $Q$ satisfying (26) yields a feedback gain $K=U_{0,1,T}Q(X_{0,T}Q)^{-1}$ that makes the system $\mathcal{D}$ -stable for the conic region defined by (4).

\displaystyle\begin{bmatrix}X_{1,T}Q+Q^{T}X_{1,T}^{T}&\alpha(X_{1,T}Q-Q^{T}X_{1,T}^{T})\\ \alpha(Q^{T}X_{1,T}^{T}-X_{1,T}Q)&X_{1,T}Q+Q^{T}X_{1,T}^{T}\end{bmatrix}\prec 0.

(26)

$\square$
Proof: We recall Lemma 3, which gives the condition (25) to place the closed-loop poles of $A+B_{2}K$ in the desired conic region. However, since the system dynamics are assumed unknown, we rely on the data-driven persistency of excitation condition defined in Section 3. It can be seen from (24) that under a persistently exciting input, the closed-loop dynamics can be represented in a parameterized form derived from the time evolution of the system dynamics $X_{1,T}$ , i.e., $A+B_{2}K=X_{1,T}G$ . As such, we get the following inequality,

\displaystyle\begin{bmatrix}X_{1,T}GX_{D}+X_{D}(X_{1,T}G)^{T}&\alpha(X_{1,T}GX_{D}-X_{D}(X_{1,T}G)^{T})\\ \alpha(X_{D}(X_{1,T}G)^{T}-(X_{1,T}G)X_{D})&X_{1,T}GX_{D}+X_{D}(X_{1,T}G)^{T}\end{bmatrix}\prec 0

(27)

To make this inequality an LMI, we set $GX_{D}=Q$ , resulting in (26), so that $G=Q(X_{D})^{-1}$ . Recalling Section 3, $K$ can be computed as $K=U_{0,1,T}G=U_{0,1,T}Q(X_{D})^{-1}$ . Using (21) we have $X_{0,T}G=I$ , which implies $X_{D}=X_{0,T}Q$ . Since Lemma 3 requires $X_{D}\succ 0$ , the product $X_{0,T}Q$ must be symmetric positive definite, which also guarantees that the inverse in the gain formula exists. Therefore, the feedback gain is $K=U_{0,1,T}Q(X_{0,T}Q)^{-1}$ . Note that this condition relies entirely on data collected from system trajectories and on the design parameter $\alpha$ , which determines the LMI region (4). $\square$
Remark 1: It is essential to note that the above pole placement problem is a feasibility problem. Therefore, there can be many feasible $X_{D}$ in the convex set defined by (25). These give rise to different controller gains, all of which satisfy the $\mathcal{D}$ -stability condition. Similar characteristics are observed in the data-driven design using Theorem 1: different exploration trajectories can result in different control gains, but they all satisfy the desired closed-loop pole placement constraint (4).
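The change of variables in the proof can be checked numerically without an SDP solver. The sketch below is not a design procedure: it assumes the model and the gain reported in Section 5 are known, uses them to fabricate hypothetical data and a certificate $X_{D}=VV^{T}$ built from the (real) eigenvectors $V$ of the closed loop, and then verifies that $Q=GX_{D}$ satisfies (26) and returns the same gain. In an actual data-driven design, $Q$ would instead be found by a semidefinite programming solver from the data alone.

```python
import numpy as np

# Model and gain copied from Section 5; used here only to build a test certificate.
A = np.array([[-0.5, 1.4, 0.4], [-0.9, 0.3, -1.5], [1.1, 1.0, -0.4]])
B2 = np.array([[0.1, -0.3], [-0.1, -0.7], [0.7, -1.0]])
K = np.array([[-3.627984, 1.257298, -3.803731],
              [1.433545, 0.837496, 2.065323]])
alpha = 2.0

def dstab_lmi(X1, Q, alpha):
    """Left-hand side of (26) for derivative data X1 and decision matrix Q."""
    M = X1 @ Q
    return np.block([[M + M.T, alpha * (M - M.T)],
                     [alpha * (M.T - M), M + M.T]])

def max_eig(M):
    """Largest eigenvalue of the symmetric part of M."""
    return float(np.max(np.linalg.eigvalsh((M + M.T) / 2)))

# Hypothetical data: random state/input samples with exact derivatives.
rng = np.random.default_rng(1)
T = 15
X0 = rng.standard_normal((3, T))
U0 = rng.uniform(-0.5, 0.5, size=(2, T))
X1 = A @ X0 + B2 @ U0

# Certificate: the closed-loop eigenvalues are real here, so V is real and
# X_D = V V^T gives (A + B2 K) X_D = V diag(eigvals) V^T, a symmetric matrix.
eigvals, V = np.linalg.eig(A + B2 @ K)
V = np.real(V)
X_D = V @ V.T

G = np.linalg.pinv(np.vstack([U0, X0])) @ np.vstack([K, np.eye(3)])
Q = G @ X_D                                   # change of variables Q = G X_D

lmi_max_eig = max_eig(dstab_lmi(X1, Q, alpha))   # negative if (26) holds
K_rec = U0 @ Q @ np.linalg.inv(X0 @ Q)           # gain formula of Theorem 1
```

The recovered gain coincides with the one used to build the certificate, and $X_{0,T}Q$ equals the symmetric positive definite $X_{D}$ , as the proof requires.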

Having addressed the pole placement feasibility problem (see Remark 1), we now consider system (10) along with the desired $H_{2}/H_{\infty}$ constraints. Note that we now consider an extraneous input $w$ , and the transfer functions associated with the $H_{2}$ and $H_{\infty}$ problems represent the gain from $w$ to the regulated outputs $z_{2}$ and $z_{1}$ , respectively. As the state matrix $A$ and input matrix $B_{2}$ are unknown, the extraneous disturbance needs to be pre-specified in a controlled environment during the design of the feedback gains. Recalling Section 3, the trajectory-based system dynamics become

	$\displaystyle X_{1,T}=AX_{0,T}+B_{1}W_{0,T}+B_{2}U_{0,1,T},$		(28)
	$\displaystyle X_{1,T}-B_{1}W_{0,T}=AX_{0,T}+B_{2}U_{0,1,T}.$		(29)

Therefore, the closed loop parameterized representation modifies to (30). Note, like $X_{0,T}$ , $W_{0,T}$ can also be defined using (12). The closed loop parameterization with extraneous input becomes

\displaystyle A+B_{2}K=[B_{2}\;\;A]\begin{bmatrix}K\\ I\end{bmatrix}=(X_{1,T}-B_{1}W_{0,T})G=\tilde{X}_{1,T}G.

(30)
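The effect of the correction in (29) and (30) can be seen on synthetic data: with a known injected disturbance, only the corrected derivative matrix $\tilde{X}_{1,T}$ reproduces the closed loop. This is a sketch with hypothetical random data and an arbitrary illustrative gain; `remove_known_disturbance` is not a name from the paper.

```python
import numpy as np

def remove_known_disturbance(X1, B1, W0):
    """X_tilde_{1,T} = X_{1,T} - B_1 W_{0,T}, as in (29)."""
    return X1 - B1 @ W0

A = np.array([[-0.5, 1.4, 0.4], [-0.9, 0.3, -1.5], [1.1, 1.0, -0.4]])
B2 = np.array([[0.1, -0.3], [-0.1, -0.7], [0.7, -1.0]])
B1 = np.eye(3)
K = np.array([[1.0, 0.0, -1.0], [0.5, 2.0, 0.0]])   # arbitrary illustrative gain

rng = np.random.default_rng(2)
T = 15
X0 = rng.standard_normal((3, T))
U0 = rng.uniform(-0.5, 0.5, size=(2, T))
# Known disturbance samples, each column scaled into the ball ||w||_2 <= 0.05.
W0 = rng.standard_normal((3, T))
W0 *= 0.05 * rng.uniform(size=T) / np.linalg.norm(W0, axis=0)

X1 = A @ X0 + B1 @ W0 + B2 @ U0                     # data relation (28)
X1_tilde = remove_known_disturbance(X1, B1, W0)

G = np.linalg.pinv(np.vstack([U0, X0])) @ np.vstack([K, np.eye(3)])
bias = np.linalg.norm(X1 @ G - (A + B2 @ K))        # error without the correction
```

Using the uncorrected $X_{1,T}$ leaves the residual $B_{1}W_{0,T}G$ in the closed-loop representation, which is why $w$ must be known during data collection.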

We now state the following theorem to solve problem P.

Theorem 2: For the system (10), to place the closed-loop poles in the desired LMI region (4) along with sufficient mixed $H_{2}/H_{\infty}$ performance, the following set of data-driven LMIs needs to be solved, where $C_{2}=\begin{bmatrix}Q_{x}^{\frac{1}{2}}\\ 0\end{bmatrix}$ , $D_{22}=\begin{bmatrix}0\\ R^{\frac{1}{2}}\end{bmatrix}$ and $Q_{x}\succeq 0$ , $R\succ 0$ are the designable state and input penalty factors.

\min_{Q,S,\gamma}\operatorname{trace}\left(Q_{x}X_{0,T}Q\right)+\operatorname{trace}(S)+\gamma

(31)

subject to

\displaystyle\begin{bmatrix}\tilde{X}_{1,T}Q+Q^{T}\tilde{X}_{1,T}^{T}&B_{1}&X_{0,T}QC_{1}^{T}+Q^{T}U_{0,1,T}^{T}D_{12}^{T}\\ B_{1}^{T}&-\gamma I&D_{11}^{T}\\ C_{1}X_{0,T}Q+D_{12}U_{0,1,T}Q&D_{11}&-\gamma I\end{bmatrix}\prec 0,

(32)

\displaystyle\begin{bmatrix}\tilde{X}_{1,T}Q+Q^{T}\tilde{X}_{1,T}^{T}&\alpha(\tilde{X}_{1,T}Q-Q^{T}\tilde{X}_{1,T}^{T})\\ \alpha(Q^{T}\tilde{X}_{1,T}^{T}-\tilde{X}_{1,T}Q)&\tilde{X}_{1,T}Q+Q^{T}\tilde{X}_{1,T}^{T}\end{bmatrix}\prec 0,

(33)

\displaystyle\begin{bmatrix}S&R^{1/2}U_{0,1,T}Q\\ Q^{T}U_{0,1,T}^{T}R^{1/2}&X_{0,T}Q\end{bmatrix}\succeq 0.

(34)

Proof: We start with the $H_{2}$ performance objective. In (10), the $H_{2}$ performance objective is to minimize $||T_{wz_{2}}||_{2}$ . Following [23, 24], the model-based optimization problem for the $H_{2}$ performance is given as follows:

\min_{K,X_{2}}\operatorname{trace}\left(Q_{x}X_{2}\right)+\operatorname{trace}(R^{\frac{1}{2}}KX_{2}K^{T}R^{\frac{1}{2}})

(35)

subject to

\left\{\begin{array}[]{l}(A+B_{2}K)X_{2}+X_{2}(A+B_{2}K)^{T}+B_{1}B_{1}^{T}\prec 0,\\ X_{2}=X_{2}^{T}>0.\end{array}\right.

We now consider the $H_{\infty}$ performance objective, which intends to minimize $||T_{wz_{1}}||_{\infty}$ . The use of the KYP lemma (bounded real lemma) results in the following optimization problem [25, 26, 27]:

\min_{\gamma,X_{\infty}}\gamma

(36)

subject to

	$\displaystyle\begin{bmatrix}\tilde{A}X_{\infty}+X_{\infty}\tilde{A}^{T}&B_{1}&X_{\infty}(C_{1}+D_{12}K)^{T}\\ B_{1}^{T}&-\gamma I&D_{11}^{T}\\ (C_{1}+D_{12}K)X_{\infty}&D_{11}&-\gamma I\end{bmatrix}\prec 0,$		(37)
	$\displaystyle\text{where,}\;\tilde{A}=A+B_{2}K,\normalsize$

\displaystyle X_{\infty}\succ 0.

(38)

Recalling Lemma 3, the pole placement condition needs to satisfy the following inequality,

	$\displaystyle\begin{bmatrix}\tilde{A}X_{D}+X_{D}\tilde{A}^{T}&\alpha(\tilde{A}X_{D}-X_{D}\tilde{A}^{T})\\ \alpha(X_{D}\tilde{A}^{T}-\tilde{A}X_{D})&\tilde{A}X_{D}+X_{D}\tilde{A}^{T}\end{bmatrix}\prec 0,$
	$\displaystyle\text{where,}\;\tilde{A}=A+B_{2}K.$

Next, we combine the above conditions into a single optimization problem by seeking a common solution $X$ , where $X=X_{2}=X_{\infty}=X_{D}\succ 0$ . Using a single Lyapunov matrix $X$ to enforce multiple constraints has been studied in [22, 25]. As we are considering the mixed $H_{2}/H_{\infty}$ objective, the stabilization constraint provided by the $H_{\infty}$ problem serves as a conservative unifying condition for both the $H_{2}$ and $H_{\infty}$ problems. Therefore, the model-based solution of problem P can be written as,

\min_{K,X,S,\gamma}\operatorname{trace}\left(Q_{x}X\right)+\operatorname{trace}(S)+\gamma

(39)

subject to

	$\displaystyle\begin{bmatrix}\tilde{A}X+X\tilde{A}^{T}&B_{1}&X(C_{1}+D_{12}K)^{T}\\ B_{1}^{T}&-\gamma I&D_{11}^{T}\\ (C_{1}+D_{12}K)X&D_{11}&-\gamma I\end{bmatrix}\prec 0,$		(40)
	$\displaystyle\text{where,}\;\tilde{A}=A+B_{2}K,$

	$\displaystyle\begin{bmatrix}\tilde{A}X+X\tilde{A}^{T}&\alpha(\tilde{A}X-X\tilde{A}^{T})\\ \alpha(X\tilde{A}^{T}-\tilde{A}X)&\tilde{A}X+X\tilde{A}^{T}\end{bmatrix}\prec 0,$
	$\displaystyle\text{where,}\;\tilde{A}=A+B_{2}K,$		(41)

	$\displaystyle X=X^{T}>0,$		(42)
	$\displaystyle S-R^{1/2}KXK^{T}R^{1/2}\succeq 0.$		(43)

We represented the second term in the objective of (35) by the second term of (39) and the corresponding inequality (43). We now convert these model-based expressions to their data-driven counterparts. Considering (40) and using (30), we have,

\begin{bmatrix}(\tilde{X}_{1,T}G)X+X(\tilde{X}_{1,T}G)^{T}&B_{1}&XC_{1}^{T}+XK^{T}D_{12}^{T}\\ B_{1}^{T}&-\gamma I&D_{11}^{T}\\ C_{1}X+D_{12}KX&D_{11}&-\gamma I\end{bmatrix}\prec 0.

(44)

We now substitute, $GX=Q$ and $K=U_{0,1,T}G$ , this results in $XK^{T}D_{12}^{T}=Q^{T}U_{0,1,T}^{T}D_{12}^{T}$ , giving us (32). The substitution of $K=U_{0,1,T}G$ also converts (43) into (45).

\displaystyle S-R^{1/2}U_{0,1,T}GXG^{T}U_{0,1,T}^{T}R^{1/2}\succeq 0.

(45)

Next, replace $GX=Q$ and $G^{T}=X^{-T}Q^{T}=(X_{0,T}Q)^{-T}Q^{T}$ . Note that from (21) we can write $X_{0,T}G=I$ , therefore pre-multiplying $GX=Q$ with $X_{0,T}$ results in $X=X_{0,T}Q$ . Now, we have $S-R^{1/2}U_{0,1,T}Q(X_{0,T}Q)^{-T}Q^{T}U_{0,1,T}^{T}R^{1/2}\succeq 0$ , and after applying the Schur complement, we get (34). The first term of the objective (39) uses the relation $X=X_{0,T}Q$ and converts into the first term of (31). Finally, we supplement these LMI conditions with the data-driven $\mathcal{D}$ -stability condition provided in Theorem 1, and this completes the proof of Theorem 2. $\square$
Remark 2: Note that Theorem 2 considers the $H_{\infty}$ performance margin $\gamma$ as an optimization variable. However, in many scenarios the designer may be interested in a pre-specified robustness performance gain $\bar{\gamma}\geq\gamma_{min}$ , where $\gamma_{min}$ is the solution of the problem in Theorem 2. To use a pre-specified $\bar{\gamma}$ , the objective function in Theorem 2 is modified to $\operatorname{trace}\left(Q_{x}X_{0,T}Q\right)+\operatorname{trace}(S)$ , and $\gamma$ in (32) is replaced by $\bar{\gamma}$ .
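The Schur complement step that turns the nonlinear bound on $S$ into the block inequality (34) can be illustrated numerically. In the sketch below, `X` stands in for the symmetric positive definite matrix $X_{0,T}Q$ and `M` for $R^{1/2}U_{0,1,T}Q$ ; both are random placeholders, not quantities from the paper's example.

```python
import numpy as np

def min_eig(M):
    """Smallest eigenvalue of the symmetric part of M."""
    return float(np.min(np.linalg.eigvalsh((M + M.T) / 2)))

rng = np.random.default_rng(0)
n, m = 3, 2
L = rng.standard_normal((n, n))
X = L @ L.T + np.eye(n)              # placeholder for X_{0,T} Q, positive definite
M = rng.standard_normal((m, n))      # placeholder for R^{1/2} U_{0,1,T} Q

S_tight = M @ np.linalg.solve(X, M.T)    # S = M X^{-1} M^T, the boundary case

def block(S):
    """Block matrix of (34) for a given S."""
    return np.block([[S, M], [M.T, X]])

# S slightly above the bound: both the Schur complement and the block are positive.
S_ok = S_tight + 0.1 * np.eye(m)
ok_schur = min_eig(S_ok - S_tight)
ok_block = min_eig(block(S_ok))

# S slightly below the bound: the block matrix is no longer positive semidefinite.
S_bad = S_tight - 0.1 * np.eye(m)
bad_block = min_eig(block(S_bad))
```

Because the objective minimizes $\operatorname{trace}(S)$ , the optimizer pushes $S$ toward the boundary case, so $\operatorname{trace}(S)$ recovers the input penalty term of (35).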

5 Numerical Example

In this section, we present an illustrative example of a pole placement problem with robust performance criteria using the data-driven method discussed in Section 4. We compare the obtained results using the data-driven approach with its model-based counterpart. We define the system given in (10) with the following state and input matrices, taken from [14]:

A=\begin{bmatrix}-0.5&1.4&0.4\\ -0.9&0.3&-1.5\\ 1.1&1&-0.4\end{bmatrix};B_{2}=\begin{bmatrix}0.1&-0.3\\ -0.1&-0.7\\ 0.7&-1\end{bmatrix}

Two of the eigenvalues of $A$ are located on the right side of the $j\omega$ axis, resulting in an unstable open-loop system. Note that while designing the data-driven control, it is assumed that $A$ and $B_{2}$ are unknown. We choose the system matrix $B_{1}$ for extraneous inputs $w$ and other performance matrices $C_{1},D_{11},D_{12},C_{2}$ and $D_{22}$ as follows:

B_{1}=\begin{bmatrix}1&0&0\\ 0&1&0\\ 0&0&1\end{bmatrix};C_{1}=\begin{bmatrix}1&0&0\\ 0&1&0\\ 0&0&1\end{bmatrix};D_{11}=\begin{bmatrix}1&1&1\\ 1&1&1\\ 1&1&1\end{bmatrix};D_{12}=\begin{bmatrix}1&1&1\\ 1&1&1\\ \end{bmatrix}

C_{2}=\begin{bmatrix}Q_{x}^{\frac{1}{2}}\\ \mathbf{0}\end{bmatrix},\;\;\text{where}\;\;Q_{x}=\begin{bmatrix}1&0&0\\ 0&1&0\\ 0&0&1\end{bmatrix};D_{22}=\begin{bmatrix}\mathbf{0}\\ R^{\frac{1}{2}}\end{bmatrix},\;\;\text{where}\;\;R=\begin{bmatrix}1&0\\ 0&1\\ \end{bmatrix}

We choose a random initial condition $x_{0}$ and a random input sequence $u\in\mathbb{R}^{2}$ with the magnitudes of both channels constrained to $[-0.5,0.5]$ . The key choice here is the length of the input sequence $u$ . For this example $n=3$ and $m=2$ , so the rank condition (16) required for persistency of excitation needs $T\geq(m+1)n+m=11$ , and the length of the input sequence is selected as $T=15$ . Next, to generate the trajectory rollout as defined in (28), we choose $w$ randomly from a norm ball defined by $\lvert\lvert w\rvert\rvert_{2}\leq 0.05$ . Trapezoidal approximations are used to generate the rollouts using the continuous-time dynamics. Now, for a given $\alpha=2$ , the solution of problem P using Theorem 2 is as follows,

K_{\text{data}}=\begin{bmatrix}-3.627984&1.257298&-3.803731\\ 1.433545&0.837496&2.065323\end{bmatrix},\\ \gamma_{\text{min}}=4.832

The poles of the closed-loop system are -4.2545, -1.9539, and -0.6244. These poles (marked as red* in Fig. 1) are located on the negative real axis, which lies within the conic region defined by $\alpha=2$ (shaded area in Fig. 1). To verify our designed data-driven controller gain and the location of the closed-loop poles, we run the same experiments with the model-based equations given in (39) to (43). The computed model-based gain is shown below:

K_{\text{mod}}=\begin{bmatrix}-3.627994&1.257302&-3.803737\\ 1.433540&0.837497&2.065325\end{bmatrix},\\ \gamma_{\text{min}}=4.832

The computed gains $K_{\text{data}}$ and $K_{\text{mod}}$ are nearly identical, with a difference $\lvert\lvert K_{\text{data}}-K_{\text{mod}}\rvert\rvert\approx 10^{-4}$ , which validates the accuracy of our proposed data-driven design.
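The reported design can be checked directly from the printed matrices. The short sketch below computes the open-loop eigenvalues of $A$ , the closed-loop poles of $A+B_{2}K_{\text{data}}$ , and tests the sector condition of Example 1 for $\alpha=2$ ; the helper `poles_in_sector` is an illustrative name, and the numbers are copied from this section at the printed precision.

```python
import numpy as np

A = np.array([[-0.5, 1.4, 0.4], [-0.9, 0.3, -1.5], [1.1, 1.0, -0.4]])
B2 = np.array([[0.1, -0.3], [-0.1, -0.7], [0.7, -1.0]])
K_data = np.array([[-3.627984, 1.257298, -3.803731],
                   [1.433545, 0.837496, 2.065323]])
K_mod = np.array([[-3.627994, 1.257302, -3.803737],
                  [1.433540, 0.837497, 2.065325]])
alpha = 2.0

def poles_in_sector(poles, alpha):
    """Sector of Example 1: Re(z) < 0 and alpha * |Im(z)| < |Re(z)|."""
    return all(p.real < 0 and alpha * abs(p.imag) < -p.real for p in poles)

open_loop = np.linalg.eigvals(A)
n_unstable = int(np.sum(open_loop.real > 0))      # open-loop poles in the right half plane

closed_loop = np.linalg.eigvals(A + B2 @ K_data)
closed_loop_sorted = np.sort(closed_loop.real)
d_stable = poles_in_sector(closed_loop, alpha)

gain_gap = float(np.linalg.norm(K_data - K_mod))  # Frobenius norm of the difference
```

Because the gains are printed to six decimals, the recomputed poles agree with the reported ones only up to that rounding.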

Figure 1: Data-Driven Pole Location with (in red) and without LMI Constraints (in blue)

We perform an ablation study to further check the robustness of the data-driven method. We remove the pole placement constraint from the conditions given in Theorem 2, which converts the problem into a pure mixed $H_{2}/H_{\infty}$ problem. Solving (31) to (34), except (33), we obtain,

\bar{K}_{\text{data}}=\begin{bmatrix}-3.627984&1.257298&-3.803731\\ 1.433545&0.837496&2.065323\end{bmatrix},

\text{Poles:}-0.9062+j1.533,-0.9062-j1.533,-1.4182

Fig. 1 clearly indicates that these poles (marked as blue*) are located outside the LMI region (shaded area). As in the previous experiment, the same results are obtained with the model-based optimization (39) to (43) after removing the pole placement constraint (41).

6 Conclusion

In this paper, we have presented a comprehensive data-driven methodology that satisfies multiple constraints comprising $\mathcal{D}$ -stability and mixed $H_{2}/H_{\infty}$ performance guarantees. We have shown that the data-based parametrized representation of closed-loop dynamics originating from behavioral system theory can provide a fundamental framework to solve such classic problems with unknown state and input matrices. The solutions from the proposed semi-definite programs match closely with the classical model-based solutions, which has been proven rigorously and validated numerically. Future research will consider unknown dynamic systems coupled with structured and parametric uncertainty, and develop multiple data-driven robust control designs.

References

[1] R. Sutton and A. Barto, Reinforcement learning - An introduction. MIT Press, Cambridge, 1998.
[2] C. Watkins, “Learning from delayed rewards,” PhD thesis, King’s College, University of Cambridge, 1989.
[3] D. P. Bertsekas, Dynamic Programming and Optimal Control: Approximate Dynamic Programming, 4th ed. Athena Scientific, Belmont, MA, USA., 2012.
[4] W. Powell, Approximate dynamic programming. Wiley, 2007.
[5] D. Vrabie, O. Pastravanu, M. Abu-Khalaf, and F. Lewis, “Adaptive optimal control for continuous-time linear systems based on policy iteration,” Automatica, vol. 45, pp. 477–484, 2009.
[6] Y. Jiang and Z.-P. Jiang, “Computational adaptive optimal control for continuous-time linear systems with completely unknown dynamics,” Automatica, vol. 48, pp. 2699–2704, 2012.
[7] K. Vamvoudakis, “Q-learning for continuous-time linear systems: A model-free infinite horizon optimal control approach,” Systems and Control Letters, vol. 100, pp. 14–20, 2017.
[8] B. Kiumarsi, K. Vamvoudakis, H. Modares, and F. Lewis, “Optimal and autonomous control using reinforcement learning: A survey,” IEEE Trans. on Neural Networks and Learning Systems, 2018.
[9] S. Mukherjee, H. Bai, and A. Chakrabortty, “Reduced-dimensional reinforcement learning control using singular perturbation approximations,” Automatica, vol. 126, p. 109451, 2021.
[10] S. Mukherjee and T. L. Vu, “On distributed model-free reinforcement learning control with stability guarantee,” IEEE Control Systems Letters, vol. 5, no. 5, pp. 1615–1620, 2020.
[11] J. C. Willems, P. Rapisarda, I. Markovsky, and B. L. De Moor, “A note on persistency of excitation,” Systems & Control Letters, vol. 54, no. 4, pp. 325–329, 2005.
[12] C. De Persis and P. Tesi, “Formulas for data-driven control: Stabilization, optimality, and robustness,” IEEE Transactions on Automatic Control, vol. 65, no. 3, pp. 909–924, 2019.
[13] H. J. Van Waarde, J. Eising, H. L. Trentelman, and M. K. Camlibel, “Data informativity: a new perspective on data-driven analysis and control,” IEEE Transactions on Automatic Control, vol. 65, no. 11, pp. 4753–4768, 2020.
[14] J. Berberich, A. Koch, C. W. Scherer, and F. Allgöwer, “Robust data-driven state-feedback design,” in 2020 American Control Conference (ACC). IEEE, 2020, pp. 1532–1538.
[15] J. Morimoto and K. Doya, “Robust reinforcement learning,” in NIPS. Citeseer, 2000, pp. 1061–1067.
[16] L. Pinto, J. Davidson, R. Sukthankar, and A. Gupta, “Robust adversarial reinforcement learning,” in International Conference on Machine Learning. PMLR, 2017, pp. 2817–2826.
[17] Y. Jiang and Z.-P. Jiang, Robust Adaptive Dynamic Programming. Wiley-IEEE press, 2017.
[18] S. Mukherjee, H. Bai, and A. Chakrabortty, “Block-decentralized model-free reinforcement learning control of two time-scale networks,” in American Control Conference 2019, Philadelphia, USA.
[19] P. P. Khargonekar and M. A. Rotea, “Mixed ${H_{2}}$/${H_{\infty}}$ control: a convex optimization approach,” IEEE Transactions on Automatic Control, vol. 36, no. 7, pp. 824–837, 1991.
[20] J. Doyle, K. Glover, P. Khargonekar, and B. Francis, “State-space solutions to standard ${H_{2}}$ and ${H_{\infty}}$ control problems,” in 1988 American Control Conference. IEEE, 1988, pp. 1691–1696.
[21] Y. Fujisaki and T. Yoshida, “A linear matrix inequality approach to mixed ${H_{2}}$ / ${H_{\infty}}$ control,” IFAC Proceedings Volumes, vol. 29, no. 1, pp. 1339–1344, 1996.
[22] M. Chilali and P. Gahinet, “${H_{2}}$/${H_{\infty}}$ design with pole placement constraints: an LMI approach,” IEEE Transactions on Automatic Control, vol. 41, no. 3, pp. 358–367, 1996.
[23] E. Feron, V. Balakrishnan, S. Boyd, and L. El Ghaoui, “Numerical methods for ${H_{2}}$ related problems,” in 1992 American Control Conference. IEEE, 1992, pp. 2921–2922.
[24] S. Boyd, L. El Ghaoui, E. Feron, and V. Balakrishnan, Linear matrix inequalities in system and control theory. SIAM, 1994.
[25] C. Scherer and S. Weiland, “Linear matrix inequalities in control,” Lecture Notes, Dutch Institute for Systems and Control, Delft, The Netherlands, vol. 3, no. 2, 2000.
[26] P. Gahinet and P. Apkarian, “A linear matrix inequality approach to ${H_{\infty}}$ control,” International Journal of Robust and Nonlinear Control, vol. 4, no. 4, pp. 421–448, 1994.
[27] T. Iwasaki and R. E. Skelton, “All controllers for the general ${H_{\infty}}$ control problem: LMI existence conditions and state space formulas,” Automatica, vol. 30, no. 8, pp. 1307–1317, 1994.