
Zeroth-order optimisation on subsets of symmetric matrices with application to MPC tuning

Alejandro I. Maass Department of Electrical and Electronic Engineering, The University of Melbourne Chris Manzie Department of Electrical and Electronic Engineering, The University of Melbourne Iman Shames College of Engineering & Computer Science, The Australian National University Hayato Nakada Advanced Unit Management System Development Division, Toyota Motor Corporation, Japan
Abstract

This paper provides a zeroth-order optimisation framework for non-smooth and possibly non-convex cost functions with matrix parameters that are real and symmetric. We provide complexity bounds on the number of iterations required to ensure a given accuracy level for both the convex and non-convex cases. The derived complexity bounds for the convex case are less conservative than available bounds in the literature, since we exploit the symmetric structure of the underlying matrix space. Moreover, the non-convex complexity bounds are novel for the class of optimisation problems we consider. The utility of the framework is evident in the suite of applications that use symmetric matrices as tuning parameters. Of primary interest here is the challenge of tuning the gain matrices in model predictive controllers, a challenge known to inhibit industrial implementation of these architectures. To demonstrate the framework, we consider the problem of MIMO diesel air-path control, and implement the framework iteratively “in-the-loop” to reduce tracking error on the output channels. Both simulations and experimental results are included to illustrate the effectiveness of the proposed framework over different engine drive cycles.

1 Introduction

Optimisation with matrix-valued decision variables is a problem that appears in a wide variety of applications. In particular, it can be found in non-negative matrix factorization [19], low-rank matrix optimisation [45, 44], signal processing with independent component analysis [3], energy-efficient wireless systems [26], graph weight allocation [33, 6], and controller tuning, such as model predictive controllers [8], linear quadratic regulators [23], and PIDs for MIMO systems [9]. In most of the aforementioned applications, either explicit expressions of the cost function are not available, or derivatives are difficult or infeasible to obtain. Consequently, we restrict our attention to this class of problems, in which the cost function is treated as a black box. In this case, directional derivatives of the cost function can still be approximated from cost function values, for instance via finite differences. Of main interest in this paper is the challenge of tuning the gain matrices in model predictive controllers (MPC), which have the property of being symmetric and positive (semi) definite. Efficient calibration of MPCs is often a difficult task given the large number of tuning parameters and their non-intuitive correlation with the output response.

Existing literature regarding MPC tuning can be broadly divided into heuristic guidelines [12], analytical studies [36, 34, 5], and meta-heuristic algorithms [32, 43, 18, 38, 40, 8, 16, 30]. Both heuristic guidelines and analytical studies hold for specific scenarios and MPC formulations, thus trial-and-error approaches are unavoidably adopted in practice. Regarding meta-heuristic methods, the works [32, 38, 40, 16, 30] focus on set-point tuning based on step response characteristics (e.g. overshoot, settling and rise times, and steady-state errors). The authors in [43] deal with trajectory tuning of MPC in which the trajectories are generated by first-order transfer function models. General trajectory tuning is considered in [18] and [8] via particle swarm optimisation (PSO) and gradient-based algorithms, respectively. The number of particles in PSO algorithms depends on the specific problem and typically ranges between 20 and 50. This translates into running 20 to 50 closed-loop experiments per iteration, which is not practical for applications where the plant is in the loop. Gradient-based methods [8] and Bayesian optimisation (BO) methods [22, 37] can also be used as alternatives with fewer experiments per iteration, but existing results are presented only for vectors of parameters (i.e. diagonal tuning matrices). Lastly, inverse optimal control techniques such as [25] could be used for MPC tuning, see e.g. [28]. However, [28] does not provide any theoretical guarantees, and it relies on a good choice of initial weighting matrices and heuristic guidelines. We note that there is currently no general framework with convergence guarantees that can directly deal not only with diagonal matrices, but also with general subsets of symmetric matrices, in which the algorithm preserves the structure of the tuning matrix at every iteration.

In response to the above discussion, we propose a zeroth-order optimisation framework for non-smooth and possibly non-convex cost functions with matrix parameters that are real and symmetric. We adopt iterates that use zeroth-order matrix oracles to approximate the directional derivative in a randomly chosen direction. This random direction is taken from the Gaussian orthogonal ensemble (GOE), the ensemble of random symmetric matrices with independent normally distributed entries, see e.g. [11, 2, 39]. The algorithm we use is the natural extension of the zeroth-order random search algorithm from [27], tailored to deal with cost functions with matrix parameters. For this optimisation framework, we provide convergence guarantees in terms of complexity bounds for both convex and non-convex cost functions. In [27], the counterpart of our optimisation problem is studied, in which functions with vector parameters are used, and the zeroth-order oracle samples from the normal distribution. The authors in [27] present complexity bounds for both convex and non-convex functions. The bounds for the convex case can be applied to our matrix setup if one vectorises both the parameter and random direction matrices. However, vectorisation does not carry the matrix structure and, in fact, these bounds are significantly more conservative than the ones we propose. Moreover, the non-convex bounds proposed in [27] are only applicable to unconstrained problems and thus cannot be used in our constrained setting. Consequently, we develop novel bounds for the non-convex case, in which the optimisation parameters are constrained to subsets of symmetric matrices. The optimisation framework presented in this paper, when applied to the context of MPC tuning, offers a more general approach than the available literature, and it also comes with convergence guarantees. Particularly, it can deal with MPC tuning over trajectories in a general setting, as opposed to [32, 38, 40, 16, 43], and it provides MPC weighting matrices that satisfy the required constraints of being symmetric and positive (semi) definite, thus providing more degrees of freedom than the usual diagonal choices in [8, 18, 22, 37].

The applicability of the proposed approach is illustrated via both simulations and experiments on the problem of diesel air-path control. The MPC parameters are iteratively tuned within the closed-loop setting with the goal of improving the overall tracking performance over an engine drive cycle. For the simulation study, we compare the performance of our proposed framework with other gradient-free algorithms available in the literature, such as dividing rectangles (DIRECT), PSO, and BO. In the experimental testing, we tune MPC controllers on a diesel engine test bench over two segments of the new European driving cycle (NEDC) and one segment of the worldwide harmonised light-duty test cycle (WLTC).

A summary of our main contributions can be found below.

  1. We present theoretical guarantees for a general class of optimisation problems with matrix-valued decision variables on subsets of real symmetric matrices. The adopted iterates are tailored to this setting in the sense that, through projection, they produce matrices that belong to the appropriate space. When the cost function is convex, the derived complexity bounds are demonstrably less conservative than the ones in [27] (vector-valued decision variables), since we exploit the symmetric structure of the underlying matrix space. In the non-convex case, we derive new complexity bounds that were not previously available in the literature for this context.

  2. We illustrate the applicability of our framework by using it to tune the matrix parameters in MPC. This extends current literature such as [32, 38, 40, 16, 43, 30].

  3. We highlight the efficacy of the approach via a simulation study in which we compare it to other gradient-free algorithms available in the literature (DIRECT, PSO, and BO).

  4. We provide various experimental evaluations on a diesel engine test bench, showing significant improvement of MPC tracking performance with only a few iterations of the proposed optimisation approach.

Organisation: Section 2 introduces the optimisation framework. The complexity bounds are derived in Section 3. In Section 4, the proposed optimisation framework is applied to MPC tuning in the context of air-path control. In Section 5, we compare different gradient-free algorithms available in the literature to our framework. Experimental testing is presented in Section 6 for different engine drive cycles. Lastly, conclusions are drawn in Section 7.

Notation: Denote by \mathbb{R}^{m\times n} the set of real matrices of dimension m\times n, and by \mathbb{S}^{n} the set of real symmetric matrices of dimension n\times n. Let \mathbb{N}\triangleq\{1,2,\dots\} and \mathbb{N}_{0}\triangleq\mathbb{N}\cup\{0\}. Given a matrix M, M^{\top} denotes its transpose. We use \mbox{diag}\{M_{1},\dots,M_{n}\} to denote the standard block diagonal matrix. The identity and zero matrices of dimension m\times n are denoted by I_{m\times n} and 0_{m\times n}, respectively. \nabla f(X) denotes the matrix gradient of a scalar function f with matrix parameter X, see e.g. [4, 10]. For matrices M\in\mathbb{R}^{n\times n} and N\in\mathbb{R}^{n\times n}, the Frobenius inner product is defined as \left\langle M,N\right\rangle_{F}\triangleq\mbox{Tr}\{M^{\top}N\}, which induces the norm \left\|M\right\|_{F}\triangleq\sqrt{\left\langle M,M\right\rangle_{F}}. For a random variable x\in\mathbb{R}, we write x\sim\mathcal{N}(0,\sigma^{2}) to say that x is normally distributed with zero mean and variance \sigma^{2}.

2 Optimisation framework

We consider the following class of optimisation problems

\min_{X\in\mathbb{Q}^{n}}f(X),   (1)

where X is an n\times n matrix parameter, \mathbb{Q}^{n}\subset\mathbb{R}^{n\times n} is a closed convex set of admissible parameters, and f is a non-smooth and possibly non-convex cost function. We build upon the random search ideas of [27], but tailored to the matrix space \mathbb{Q}^{n}. That is, we are interested in solving (1) using iterates of the form

X_{k+1}=\pi_{\mathbb{Q}^{n}}\{X_{k}-h_{k}\mathcal{O}_{\mu}(X_{k},U_{k})\},   (2)

where \pi_{\mathbb{Q}^{n}} denotes the Euclidean projection onto the closed convex set \mathbb{Q}^{n}, h_{k} is a positive scalar known as the step size, and \mathcal{O}_{\mu} denotes the so-called zeroth-order random oracle, which is defined as

\mathcal{O}_{\mu}(X,U)\triangleq\frac{f(X+\mu U)-f(X)}{\mu}\,U,   (3)

where \mu>0 denotes the oracle's precision, and U is a random symmetric matrix that belongs to the Gaussian orthogonal ensemble (GOE) as per Definition 1 below.

Definition 1

The GOE, denoted by \mathbb{G}^{n}, is the set of random symmetric matrices U\in\mathbb{S}^{n} with independent entries such that [U]_{ii}\sim\mathcal{N}(0,1) and, for i<j, [U]_{ij}\sim\mathcal{N}(0,\frac{1}{2}) independent of [U]_{ii}, see e.g. [11, 2, 39].
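For concreteness, a draw from \mathbb{G}^{n} can be generated as in the following minimal sketch (ours, in NumPy; the function name sample_goe is illustrative):

```python
# A minimal sketch of sampling U from the GOE of Definition 1:
# diagonal entries ~ N(0,1), off-diagonal entries ~ N(0,1/2), U symmetric.
import numpy as np

def sample_goe(n, rng=np.random.default_rng()):
    U = np.triu(rng.normal(0.0, np.sqrt(0.5), size=(n, n)), k=1)  # strict upper triangle
    U = U + U.T                                                   # symmetrise: [U]_ij = [U]_ji
    U[np.diag_indices(n)] = rng.normal(0.0, 1.0, size=n)          # diagonal variance 1
    return U
```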

The overall method to solve (1) is summarised in the pseudo-code below, which we name the zeroth-order random matrix search (ZO-RMS) algorithm, where N denotes the number of iterations. Note that, per iteration, the oracle only requires the cost function value at two points instead of first-order or second-order derivative information, and it computes an estimate of the directional derivative in a randomly chosen direction.

Zeroth-Order Random Matrix Search (ZO-RMS)
  1: Choose X_{0}\in\mathbb{Q}^{n}, \mu>0 and \{h_{k}\}_{k\in\mathbb{N}_{0}}.
  2: for k=\{0,\dots,N\}, N\in\mathbb{N} do
  3:    Generate U_{k}\in\mathbb{G}^{n}
  4:    \mathcal{O}_{\mu}(X_{k},U_{k})\leftarrow\frac{1}{\mu}(f(X_{k}+\mu U_{k})-f(X_{k}))U_{k}
  5:    X_{k+1}\leftarrow\pi_{\mathbb{Q}^{n}}\{X_{k}-h_{k}\mathcal{O}_{\mu}(X_{k},U_{k})\}
  6: end for
  7: Return \hat{X}_{N}\triangleq\arg\min_{X}[f(X):X\in\{X_{0},\dots,X_{N}\}].
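A self-contained sketch of this pseudo-code in Python is given below; the cost f, the projection project onto \mathbb{Q}^{n}, and the step-size rule step are application-specific placeholders (our names, not part of the paper):

```python
# A minimal sketch of ZO-RMS. `f` maps a symmetric matrix to a scalar cost,
# `project` is the Euclidean projection onto Q^n, and `step(k)` returns h_k.
import numpy as np

def zo_rms(f, project, X0, mu, step, N, rng=np.random.default_rng()):
    n = X0.shape[0]
    X, fX = X0.copy(), f(X0)
    best_X, best_f = X.copy(), fX
    for k in range(N + 1):
        # Step 3: sample U_k from the GOE (diagonal var 1, off-diagonal var 1/2).
        U = np.triu(rng.normal(0.0, np.sqrt(0.5), (n, n)), 1)
        U = U + U.T
        U[np.diag_indices(n)] = rng.normal(0.0, 1.0, n)
        # Step 4: zeroth-order oracle (3), using one extra cost evaluation.
        O = (f(X + mu * U) - fX) / mu * U
        # Step 5: projected iterate (2).
        X = project(X - step(k) * O)
        fX = f(X)
        if fX < best_f:                 # Step 7: keep the best visited iterate
            best_X, best_f = X.copy(), fX
    return best_X, best_f
```

As in Section 5 further below, a diminishing step size such as step = lambda k: 5e-3 / np.sqrt(k + 1) can be used.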

We emphasise that the ZO-RMS algorithm is the natural extension of the random vector search algorithm proposed in [27], tailored to matrix decision variables. This extension is essential to our setting since there is currently no general framework, i.e. an adequate algorithm and convergence guarantees, for the constrained class of problems (1). In [27], the authors provide gradient-free algorithms to solve the optimisation problem \min_{x\in\bar{\mathbb{Q}}^{n}\subset\mathbb{R}^{n}}f(x) for both convex and non-convex cost functions, which is the counterpart of (1) for vector optimisation variables. A zeroth-order random oracle with a multivariate normally distributed direction u is used. In this paper, we extend the setting of [27] to optimisation problems over a matrix space using random search matrices U from the GOE, which are the natural counterpart of the normally distributed vector u. Note that the complexity bounds provided in [27] for the convex case are applicable to our setting if we consider x=\mbox{vec}\{X\} and u=\mbox{vec}\{U\}. However, the matrix structure of X and U is then immediately lost. Consequently, the first theoretical goal of this work is to obtain less conservative complexity bounds for the ZO-RMS algorithm when f in (1) is convex, by exploiting the structure of the underlying matrix space. With respect to the non-convex case, we note that the complexity bounds available in [27] are not applicable here since they consider an unconstrained non-convex problem in \mathbb{R}^{n}, as opposed to a constrained one such as (1). Therefore, our second theoretical goal is to develop new complexity bounds for the ZO-RMS algorithm when solving constrained non-convex optimisation problems in a matrix space, i.e. problem (1) with f non-convex.

As mentioned in the introduction, the optimisation framework presented here is general and fits many applications. Therefore, the theoretical results presented in the following section hold for any family of problems that fits the framework. Particularly, in this paper we illustrate how to apply this framework to tune MPC controllers, since it is an important application for which we can provide interesting contributions validated by simulations and experiments.

3 Complexity bounds

In this section, we study the performance of the ZO-RMS algorithm in terms of complexity bounds that guarantee a given level of accuracy for the algorithm. We provide complexity bounds for both the convex and non-convex cases.

Let us first introduce some important definitions. Definition 1 implies that the probability distribution \mathbb{P}(\mathrm{d}U) on the GOE \mathbb{G}^{n} is \mathbb{P}(\mathrm{d}U)=\frac{1}{\kappa}e^{-\frac{1}{2}\left\|U\right\|_{F}^{2}}\mathrm{d}U, where \mathrm{d}U is the Lebesgue measure on \mathbb{S}^{n}\cong\mathbb{R}^{n(n+1)/2}, i.e. \mathrm{d}U\triangleq\prod_{i=1}^{n}\mathrm{d}U_{ii}\prod_{i<j}\mathrm{d}U_{ij}, and \kappa is the normalising constant \kappa\triangleq\int_{\mathbb{G}^{n}}e^{-\frac{1}{2}\left\|U\right\|_{F}^{2}}\mathrm{d}U=2^{\frac{n}{2}}\pi^{\frac{n^{2}+n}{4}}. Therefore, we can define the expectation of a function q:\mathbb{G}^{n}\rightarrow\mathbb{R} as \mathbb{E}_{U}\{q(U)\}\triangleq\int_{\mathbb{G}^{n}}q(U)\mathbb{P}(\mathrm{d}U)=\frac{1}{\kappa}\int_{\mathbb{G}^{n}}q(U)e^{-\frac{1}{2}\left\|U\right\|_{F}^{2}}\mathrm{d}U, and the moments m_{p}\triangleq\mathbb{E}_{U}\{\left\|U\right\|_{F}^{p}\}=\frac{1}{\kappa}\int_{\mathbb{G}^{n}}\left\|U\right\|_{F}^{p}e^{-\frac{1}{2}\left\|U\right\|_{F}^{2}}\mathrm{d}U for p\in\mathbb{N}_{0}. We state the following lemma for some moments of interest.

Lemma 1

For every n\geq 1, we have that m_{1}\leq\sqrt{\frac{n^{2}+n}{2}}, m_{2}=\frac{n^{2}+n}{2}, and m_{4}=\frac{1}{4}(n^{4}+2n^{3}+5n^{2}+4n).

Proof: Define \psi(p)\triangleq\ln(m_{p}), and note that this function is convex in p. Let us write p=(1-\alpha)\times 0+\alpha\times 2 (i.e. \alpha=p/2). For p\in[0,2] we have \alpha\in[0,1]. Therefore, since \psi(p) is convex, we have that \psi(p)=\psi(\alpha\times 2+(1-\alpha)\times 0)\leq\alpha\psi(2)+(1-\alpha)\psi(0). Particularly, \psi(0)=\ln(m_{0})=0 and \psi(2)=\ln(m_{2}), then \psi(p)\leq\alpha\ln(m_{2})=\frac{p}{2}\ln(m_{2}), and thus m_{p}\leq m_{2}^{p/2}. Particularly, m_{1}\leq\sqrt{m_{2}}. On the other hand, from [14] we get that m_{2}=(n^{2}+n)/2 and m_{4}=\frac{1}{4}(n^{4}+2n^{3}+5n^{2}+4n), which concludes the proof. \blacksquare
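The closed-form moments in Lemma 1 can be sanity-checked numerically; the following Monte Carlo sketch (ours) compares empirical estimates of m_{2} and m_{4} with the stated expressions:

```python
# Monte Carlo check (ours) of the GOE Frobenius-norm moments in Lemma 1.
import numpy as np

def check_moments(n=4, samples=100_000, rng=np.random.default_rng(0)):
    fro2 = np.empty(samples)
    for s in range(samples):
        U = np.triu(rng.normal(0.0, np.sqrt(0.5), (n, n)), 1)
        U = U + U.T
        U[np.diag_indices(n)] = rng.normal(0.0, 1.0, n)
        fro2[s] = np.sum(U * U)                       # ||U||_F^2
    print("m2:", fro2.mean(), "vs", (n**2 + n) / 2)
    print("m4:", (fro2**2).mean(), "vs", (n**4 + 2*n**3 + 5*n**2 + 4*n) / 4)
```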

Our analysis relies on the following assumption.

Assumption 1
  (a) The set of admissible parameters is a subset of the set of real symmetric matrices of dimension n\times n, i.e. \mathbb{Q}^{n}\subset\mathbb{S}^{n}.

  (b) The function f:\mathbb{Q}^{n}\rightarrow\mathbb{R} is Lipschitz continuous, i.e. there exists L_{0}(f)>0 such that |f(X)-f(Y)|\leq L_{0}(f)\left\|X-Y\right\|_{F} holds for all X,Y\in\mathbb{Q}^{n}.

3.1 Convex case

Let X^{*}\in\mathbb{Q}^{n} be a stationary point of (1), and define f^{*}\triangleq f(X^{*}). In addition, define U_{0:k}\triangleq\mbox{diag}\{U_{0},\dots,U_{k}\}, a random matrix composed of the i.i.d. variables \{U_{k}\}_{k\in\mathbb{N}_{0}} associated with each iteration of the scheme.

We are now in a position to state the main result.

Theorem 1

Consider problem (1) with f convex. Suppose Assumption 1 holds, and let the sequence \{X_{k}\}_{k\in\{0,\dots,N\}} be generated by the ZO-RMS iterates (2). Then, for any N\geq 0,

\frac{1}{S_{N}}\sum_{k=0}^{N}h_{k}(\phi_{k}-f^{*}) \leq \mu L_{0}(f)\sqrt{\frac{n^{2}+n}{2}} + \frac{1}{S_{N}}\Big(\frac{1}{2}\left\|X_{0}-X^{*}\right\|_{F}^{2} + \frac{1}{8}(n^{4}+2n^{3}+5n^{2}+4n)L_{0}^{2}(f)\sum_{k=0}^{N}h_{k}^{2}\Big),   (4)

where S_{N}\triangleq\sum_{k=0}^{N}h_{k}, \phi_{k}\triangleq\mathbb{E}_{U_{0:k-1}}\{f(X_{k})\} for k\in\mathbb{N}, and \phi_{0}\triangleq f(X_{0}).

Proof: See Appendix B. \blacksquare

A corollary of Theorem 1 can be obtained, which provides expressions for \mu, h_{k}, and N that ensure a given level of accuracy for the ZO-RMS algorithm.

Corollary 1

Let \epsilon>0 be given. If \mu and h_{k} are chosen such that

\mu \leq \frac{\epsilon}{L_{0}(f)\sqrt{2(n^{2}+n)}},   (5a)
h_{k} = \frac{2\bar{r}}{L_{0}(f)\sqrt{n^{4}+2n^{3}+5n^{2}+4n}\sqrt{N+1}},   (5b)

for k\in\{0,\dots,N\}, then \mathbb{E}_{U_{0:N-1}}\{f(\hat{X}_{N})\}-f^{*}\leq\epsilon is guaranteed by the ZO-RMS algorithm in

N\geq\frac{L_{0}^{2}(f)\bar{r}^{2}}{\epsilon^{2}}(n^{4}+2n^{3}+5n^{2}+4n)   (6)

iterations, where \bar{r} is such that \left\|X_{0}-X^{*}\right\|_{F}\leq\bar{r}.

Proof: Note that

\mathbb{E}_{U_{0:N-1}}\{f(\hat{X}_{N})\}-f^{*} \leq \mathbb{E}_{U_{0:N-1}}\Big\{\tfrac{1}{S_{N}}\sum_{k=0}^{N}h_{k}(f(X_{k})-f^{*})\Big\} \stackrel{(4)}{\leq} \mu L_{0}(f)\sqrt{\tfrac{n^{2}+n}{2}}+\tfrac{1}{S_{N}}\Big(\tfrac{\bar{r}^{2}}{2}+\tfrac{1}{8}(n^{4}+2n^{3}+5n^{2}+4n)L_{0}^{2}(f)\sum_{k=0}^{N}h_{k}^{2}\Big).

Suppose the number of steps N is fixed. We can then choose \mu and h_{k} as in (5) and further obtain the bound

\mathbb{E}_{U_{0:N-1}}\{f(\hat{X}_{N})\}-f^{*}\leq\frac{\epsilon}{2}+\frac{\bar{r}^{2}}{S_{N}}=\frac{\epsilon}{2}+\frac{\bar{r}L_{0}(f)\sqrt{n^{4}+2n^{3}+5n^{2}+4n}}{2\sqrt{N+1}}.

Therefore, in order to satisfy \mathbb{E}_{U_{0:N-1}}\{f(\hat{X}_{N})\}-f^{*}\leq\epsilon, we need N\geq\frac{L_{0}^{2}(f)\bar{r}^{2}}{\epsilon^{2}}(n^{4}+2n^{3}+5n^{2}+4n)-1, concluding the proof. \blacksquare

The complexity bound (6) in Corollary 1 corresponds to the number of iterations for which the ZO-RMS algorithm guarantees a given accuracy \epsilon, provided the step size h_{k} and parameter \mu are chosen as per (5). Certainly, the expressions for h_{k}, \mu, and N depend on the Lipschitz constant L_{0}(f) and on \bar{r}, which may be hard to obtain explicitly depending on the application. However, these can be numerically bounded, as we explain in Section 6.1. Since the oracle (3) is random, the guarantees of the ZO-RMS algorithm hold in expectation. In essence, Corollary 1 provides sufficient conditions on the oracle's precision \mu and step size h_{k} such that we can get sufficiently close to the optimal cost f^{*} in N iterations, on average.

Figure 1: Comparison of the complexity bounds (6) and (7), see black and blue lines, respectively.

We now compare our complexity bound (6) to the bounds in [27], which are applicable to (1) (for convex f) after grouping the elements of the tuning matrix X into a single vector. Particularly, since X is symmetric, we group the lower triangular elements into a vector of size n(n+1)/2 (often called the half-vectorisation), so the results in [27] apply and the following complexity bound holds (cf. Theorem 6 in [27])

N_{[27]}\geq\frac{4L_{0}^{2}(f)\bar{r}^{2}}{\epsilon^{2}}(n(n+1)/2+4)^{2}.   (7)

Fig. 1 depicts (6) and (7) for different values of n. We can see that our bound (6) is less conservative than (7) for all n\in\mathbb{N}, and approaches (7) as n\rightarrow\infty. We emphasise that our optimisation framework is directly tailored to matrix parameters, and the reduction in conservatism with respect to a vector approach such as [27] follows from exploiting the special structure of the underlying matrix space, namely the symmetry, and from using iterates tailored to it. The reduction in conservatism is significant: for instance, for 3\times 3 matrices (i.e. n=3), N_{[27]}/N_{\text{Cor.1}}=2.08. This translates into a sufficient condition that guarantees the same level of accuracy in half the iterations.
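Since L_{0}(f), \bar{r}, and \epsilon enter (6) and (7) identically, the ratio between the two bounds depends only on n and can be reproduced with a few lines (ours):

```python
# Ratio between the vectorised bound (7) of [27] and the matrix bound (6);
# the common factor L_0^2(f) * r_bar^2 / eps^2 cancels.
def bound_ratio(n: int) -> float:
    ours = n**4 + 2*n**3 + 5*n**2 + 4*n          # polynomial factor in (6)
    vectorised = 4 * (n * (n + 1) / 2 + 4) ** 2  # polynomial factor in (7)
    return vectorised / ours

print(bound_ratio(3))    # ~2.08, as reported above for 3x3 matrices
print(bound_ratio(100))  # ~1.001, i.e. (6) approaches (7) as n grows
```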

3.2 Non-convex case

Compared to the convex optimisation problem considered in Section 3.1, the analysis for constrained non-convex problems is more challenging. Particularly, constrained convex problems typically focus on the optimality gap \mathbb{E}\{f(x)\}-f^{*} to measure the convergence rate (as we do in Corollary 1). On the other hand, Nesterov in [27] provided complexity bounds for unconstrained non-convex problems using zeroth-order iterates, where the optimisation variable is a vector in a subset of \mathbb{R}^{n}. In these unconstrained non-convex problems, the quantity \mathbb{E}\{\left\|\nabla f(x)\right\|^{2}\} is the typical measure of stationarity. We emphasise that the bounds for unconstrained non-convex problems in [27] are not applicable to our constrained problem (1), since we need to search over the matrix space \mathbb{Q}^{n} and use iterates of the form (2) that project onto \mathbb{Q}^{n}. For constrained non-convex problems like ours, a fitting alternative to \mathbb{E}\{\left\|\nabla f(x)\right\|^{2}\} is the so-called gradient mapping, see e.g. [21, 13], which is defined as follows

P_{\mathbb{Q}^{n}}(X,\mathcal{O}_{\mu},h)\triangleq\frac{1}{h}\Big[X-\pi_{\mathbb{Q}^{n}}\{X-h\mathcal{O}_{\mu}(X,U)\}\Big],

where \mathcal{O}_{\mu} is the random zeroth-order oracle in (3), and \pi_{\mathbb{Q}^{n}} denotes, as before, the Euclidean projection onto the closed convex set \mathbb{Q}^{n}. The natural interpretation of P_{\mathbb{Q}^{n}} is that it represents a projected gradient, which offers a feasible update from the previous point X. The main goal of this section is to provide complexity bounds for the ZO-RMS algorithm (2), in terms of bounding \mathbb{E}\{\left\|P_{\mathbb{Q}^{n}}(X,\mathcal{O}_{\mu},h)\right\|_{F}^{2}\}, when solving the constrained optimisation problem (1) with f non-convex.

Before stating our results, we impose an additional assumption.

Assumption 2

\mathbb{E}\{\left\|\mathcal{O}_{\mu}(X,U)-\nabla f_{\mu}(X)\right\|_{F}^{2}\}\leq\sigma^{2} for some \sigma>0, where f_{\mu} is the Gaussian approximation of f defined in (16).

Assumption 2 essentially bounds the variance of the random oracle \mathcal{O}_{\mu}. This assumption is often adopted in non-convex problems, see e.g. [13]. If constructing \sigma explicitly is of interest, it can be done as follows. We note that \mathcal{O}_{\mu} is an unbiased estimator of \nabla f_{\mu}(X) (cf. (18)), i.e. \mathbb{E}\{\mathcal{O}_{\mu}(X,U)\}=\nabla f_{\mu}(X). Then \mathbb{E}\{\left\|\mathcal{O}_{\mu}(X,U)-\nabla f_{\mu}(X)\right\|_{F}^{2}\}\leq\mathbb{E}\{\left\|\mathcal{O}_{\mu}(X,U)\right\|_{F}^{2}\}\leq(1/4)L_{0}^{2}(f)(n^{4}+2n^{3}+5n^{2}+4n)\triangleq\sigma^{2}, where the last inequality follows from Theorem 3.

The main result for the non-convex case is stated below. It can be seen as the non-convex counterpart of Theorem 1.

Theorem 2

Consider problem (1) with f non-convex. Suppose Assumptions 1 and 2 hold, and let the sequence \{X_{k}\}_{k\in\{0,\dots,N\}} be generated by the ZO-RMS iterates (2). Then, for any N\geq 0,

\frac{1}{S_{N}}\sum_{k=0}^{N}h_{k}\,\mathbb{E}_{U_{0:k}}\{\left\|P_{\mathbb{Q}^{n}}(X_{k},\mathcal{O}_{\mu}(X_{k},U_{k}),h_{k})\right\|_{F}^{2}\} \leq \frac{1}{S_{N}}\Big[f_{\mu}(X_{0})-f^{*}+C(\mu)\sum_{k=0}^{N}h_{k}^{2}\Big]+\sigma^{2},   (8)

where C(\mu)\triangleq\frac{L_{0}^{3}(f)}{4\mu}\sqrt{\frac{n^{2}+n}{2}}(n^{4}+2n^{3}+5n^{2}+4n).

Proof: See Appendix C. \blacksquare

The next corollary provides a choice of \mu, h_{k}, and N that ensures a given level of accuracy for the ZO-RMS algorithm up to an error of order \sigma^{2}.

Corollary 2

Let \delta>0 and \varepsilon>0 be given, and let h_{k}=h for all k\in\{0,\dots,N\}. If \mu and h are chosen as

\mu = \frac{\varepsilon}{L_{0}(f)[(n^{2}+n)/2]^{1/2}},   (9)
h = \left[\frac{8\varepsilon\bar{r}}{(N+1)L_{0}^{3}(f)(n^{2}+n)(n^{4}+2n^{3}+5n^{2}+4n)}\right]^{1/2},   (10)

then \mathbb{E}\{\left\|P_{\mathbb{Q}^{n}}(X_{D},\mathcal{O}_{\mu}(X_{D},U_{D}),h)\right\|^{2}\}\leq\delta+\sigma^{2} is guaranteed by the ZO-RMS algorithm in

N\geq\frac{L_{0}^{5}(f)\bar{r}(n^{2}+n)(n^{4}+2n^{3}+5n^{2}+4n)}{2\varepsilon\delta^{2}}   (11)

iterations, where

D\triangleq\arg\min_{k\in\{0,\dots,N\}}\left\|P_{\mathbb{Q}^{n}}(X_{k},\mathcal{O}_{\mu}(X_{k},U_{k}),h)\right\|_{F}.

Proof: Recall that f_{\mu} is a smooth approximation of f, and the gap of this approximation can be upper bounded as |f_{\mu}(X)-f(X)|\leq\mu L_{0}(f)\sqrt{\frac{n^{2}+n}{2}}, see Lemma 2-(III) in the appendix. Then, to bound this gap by \varepsilon we need to choose \mu as per (9). For this choice of \mu, which we denote \bar{\mu}, we have C(\bar{\mu})=\frac{L_{0}^{4}(f)}{4\varepsilon}\left(\frac{n^{2}+n}{2}\right)(n^{4}+2n^{3}+5n^{2}+4n). For a constant step size, the right-hand side of (8) becomes

\frac{f_{\bar{\mu}}(X_{0})-f^{*}}{(N+1)h}+C(\bar{\mu})h+\sigma^{2}\leq\frac{L_{0}(f)\bar{r}}{(N+1)h}+C(\bar{\mu})h+\sigma^{2}.

Let \rho(h)\triangleq\frac{L_{0}(f)\bar{r}}{(N+1)h}+C(\bar{\mu})h. Minimising \rho(h) over h leads to h^{*}=\sqrt{L_{0}(f)\bar{r}/[C(\bar{\mu})(N+1)]}, which corresponds to (10). For this choice of step size, we get \rho(h^{*})=\sqrt{4L_{0}(f)\bar{r}C(\bar{\mu})/(N+1)}. Note that \mathbb{E}\{\left\|P_{\mathbb{Q}^{n}}(X_{D},\mathcal{O}_{\mu}(X_{D},U_{D}),h)\right\|_{F}^{2}\}\leq\frac{1}{N+1}\sum_{k=0}^{N}\mathbb{E}_{U_{0:k}}\{\left\|P_{\mathbb{Q}^{n}}(X_{k},\mathcal{O}_{\mu}(X_{k},U_{k}),h)\right\|_{F}^{2}\}\leq\rho(h^{*})+\sigma^{2}. Then, we can guarantee that \rho(h^{*})\leq\delta in N\geq 4L_{0}(f)\bar{r}C(\bar{\mu})/\delta^{2} iterations, which corresponds to (11), completing the proof. \blacksquare

Our analysis shows that ZO-RMS for constrained non-convex problems can suffer an additional error of order \sigma^{2}, which does not appear in the convex case, see Corollary 1. This is consistent with the literature on constrained non-convex problems, where the same effect has been reported for different algorithms, see e.g. [21, 13, 20].

4 Application to MPC tuning: air-path control in diesel engines

In this section, we illustrate how to apply the proposed optimisation framework to tune MPC controllers in the context of diesel air-path control.

4.1 Diesel engine air-path model

A schematic representation of the diesel air-path is shown in Fig. 2. The dynamics of a diesel engine air-path are highly non-linear, see e.g. [41], and can be captured in the general form

x_{k+1} = \mathbf{f}(x_{k},u_{k},\theta),   (12a)
y_{k} = \mathbf{g}(x_{k},u_{k},\theta),   (12b)

where x_{k}\in\mathbb{R}^{n_{x}}, y_{k}\in\mathbb{R}^{n_{y}}, and u_{k}\in\mathbb{R}^{n_{u}} are the state, output, and input of the system, respectively, at time instant k\in\mathbb{N}_{0}. The engine operational space is typically parametrised by \theta\triangleq(\omega_{e},\dot{m}_{f})\in\Theta\subseteq\mathbb{R}^{2}, where \omega_{e} denotes the engine speed, \dot{m}_{f} denotes the fuel rate, and \Theta is the engine operating space defined as \Theta\triangleq\{\theta\in\mathbb{R}^{2}:\omega_{e,\min}\leq\omega_{e}\leq\omega_{e,\max},\ \dot{m}_{f,\min}\leq\dot{m}_{f}\leq\dot{m}_{f,\max}\} for some \omega_{e,\min},\omega_{e,\max},\dot{m}_{f,\min},\dot{m}_{f,\max}\in\mathbb{R}. Given these highly non-linear dynamics, a common approach to control-oriented modelling of the air-path is to generate linear models trimmed at various operating points \theta. Particularly, we follow [30] and use twelve models to approximate the operating space uniformly. That is, the engine operating space \Theta is divided into twelve regions as per Fig. 3, with a linear model representing the engine dynamics in each region. For commercial-in-confidence purposes, we only show normalised axes in every plot. We emphasise that these regions are chosen to provide adequate coverage over the range of operating points encountered along a drive cycle. The control-oriented model state consists of the intake manifold pressure p_{\text{im}} (also known as boost pressure), the exhaust manifold pressure p_{\text{em}}, the compressor flow W_{\text{comp}}, and the EGR rate y_{\text{EGR}}. The input consists of the throttle position u_{\text{thr}}, the EGR valve position u_{\text{EGR}}, and the VGT valve position u_{\text{VGT}}. Lastly, the measured output is (p_{\text{im}},y_{\text{EGR}}).

For a given operating point (\omega_{e}^{\sigma},\dot{m}_{f}^{\sigma}), \sigma\in\{1,\dots,12\}, the engine control unit (ECU) applies certain steady-state actuator values. We denote by \bar{u}^{\sigma}\in\mathbb{R}^{3}, \bar{x}^{\sigma}\in\mathbb{R}^{4}, and \bar{y}^{\sigma}\in\mathbb{R}^{2} the steady-state values of the input, state, and output, respectively, at each operating condition (\omega_{e}^{\sigma},\dot{m}_{f}^{\sigma}). Then, by following the system identification procedure detailed in [35], a linear representation of the non-linear diesel air-path (12) trimmed around the grid point (\omega_{e}^{\sigma},\dot{m}_{f}^{\sigma}) is given by

\tilde{x}_{k+1} = A^{\sigma}\tilde{x}_{k}+B^{\sigma}\tilde{u}_{k},   (13a)
\tilde{y}_{k} = C^{\sigma}\tilde{x}_{k}+D^{\sigma}\tilde{u}_{k},   (13b)

where \sigma\in\{1,\dots,12\}, \tilde{x}_{k}=x_{k}-\bar{x}^{\sigma}\in\mathbb{R}^{4} is the perturbed state around the corresponding steady state \bar{x}^{\sigma}, \tilde{u}_{k}=u_{k}-\bar{u}^{\sigma}\in\mathbb{R}^{3} is the perturbed input around the corresponding steady-state input \bar{u}^{\sigma}, and \tilde{y}_{k}=y_{k}-\bar{y}^{\sigma}\in\mathbb{R}^{2} is the perturbed output around the corresponding steady-state output \bar{y}^{\sigma}.

Figure 2: A diesel engine air-path schematic.
Figure 3: Engine operational space divisions and the corresponding linearisation points.

4.2 MPC formulation

For each operating point (\omega_{e}^{\sigma},\dot{m}_{f}^{\sigma}), \sigma\in\{1,\dots,12\}, and the corresponding model (13) associated with that region, we formulate an MPC with an augmented integrator (see e.g. [29, 42]). To this end, define the augmented state \mathbf{x}_{k}=(\tilde{x}_{k},e_{k}), where e_{k} is the integrator state with dynamics e_{k+1}=-C^{\sigma}\tilde{x}_{k}+e_{k}. Define the cost function as

J(\mathbf{x}_{k},\mathbf{u})\triangleq\mathbf{x}_{k+H}^{\top}P^{\sigma}\mathbf{x}_{k+H}+\sum_{i=0}^{H-1}\left(\mathbf{x}_{k+i}^{\top}Q^{\sigma}\mathbf{x}_{k+i}+\tilde{u}_{k+i}^{\top}R^{\sigma}\tilde{u}_{k+i}\right),

where H\in\mathbb{N} is the prediction horizon, \mathbf{u}\triangleq\{\tilde{u}_{k},\tilde{u}_{k+1},\dots,\tilde{u}_{k+H-1}\} is the sequence of control values applied over the horizon H, and P^{\sigma}\in\mathbb{S}^{6}, Q^{\sigma}\in\mathbb{S}^{6}, and R^{\sigma}\in\mathbb{S}^{3} are real symmetric matrices containing the tuning weights. Particularly, we further assume that R^{\sigma} is positive definite, and that P^{\sigma} and Q^{\sigma} are positive semidefinite.

The corresponding MPC problem is stated below,

minimise   J(\mathbf{x}_{k},\mathbf{u})

subject to   \mathbf{x}_{k+1}=\begin{bmatrix}A^{\sigma}&0\\ -C^{\sigma}&I\end{bmatrix}\mathbf{x}_{k}+\begin{bmatrix}B^{\sigma}\\ 0\end{bmatrix}\tilde{u}_{k},
             \tilde{x}(k)=x(k)-\bar{x}^{\sigma},
             S_{x}\mathbf{x}_{i}+S_{u}\tilde{u}_{i}\leq b_{i},\ \forall i\in\{0,\dots,H-1\},
             S_{H}\mathbf{x}_{H}\leq b_{H},

where S_{x}\in\mathbb{R}^{18\times 6}, S_{u}\in\mathbb{R}^{18\times 3}, b_{i}\in\mathbb{R}^{18}, S_{H}\in\mathbb{R}^{12\times 6}, and b_{H}\in\mathbb{R}^{12} are given by the relevant state and input constraints. The solution to the above optimisation problem yields the optimising control sequence \mathbf{u}^{*}=\{\tilde{u}^{*}_{k},\tilde{u}^{*}_{k+1},\dots,\tilde{u}^{*}_{k+H-1}\}. The first element of \mathbf{u}^{*} is applied to the system, and the whole process is repeated as k is incremented.
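For illustration, one receding-horizon step of the above problem can be sketched with CVXPY as follows; this is a sketch under the stated dimensions, all matrix arguments are placeholders to be supplied from (13) and the constraint data, and the per-stage bound b_{i} is taken constant for brevity:

```python
# A minimal sketch (ours) of one MPC solve for the augmented system above.
import numpy as np
import cvxpy as cp

def mpc_step(A_aug, B_aug, P, Q, R, x0, H, Sx, Su, b, SH, bH):
    nx, nu = B_aug.shape
    x = cp.Variable((nx, H + 1))
    u = cp.Variable((nu, H))
    cost = cp.quad_form(x[:, H], P)               # terminal cost (P assumed PSD)
    cons = [x[:, 0] == x0]
    for i in range(H):
        cost += cp.quad_form(x[:, i], Q) + cp.quad_form(u[:, i], R)
        cons += [x[:, i + 1] == A_aug @ x[:, i] + B_aug @ u[:, i],
                 Sx @ x[:, i] + Su @ u[:, i] <= b]
    cons += [SH @ x[:, H] <= bH]
    cp.Problem(cp.Minimize(cost), cons).solve()
    return u[:, 0].value                          # apply only the first control move
```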

For transient operations between operating points, we use a switching LTI-MPC architecture so that the controllers are switched based on the current operating condition. As in [31], the switching LTI-MPC strategy selects the MPC controller at the nearest operating point to the current operating condition.

Under the above scenario, our goal is to use the ZO-RMS algorithm to tune the MPC weighting matrices \{P^{\sigma},Q^{\sigma},R^{\sigma}\} in order to achieve a satisfactory tracking performance for a given prediction horizon H\in\mathbb{N}. Since there are twelve available controllers to tune, we could potentially use the ZO-RMS algorithm to tune all of them. However, since experiments are costly, we focus on tuning only the controllers that perform poorly under an initial choice of tuning parameters. That is, we utilise an initial calibration, denoted by \{P_{0}^{\sigma},Q_{0}^{\sigma},R_{0}^{\sigma}\} for every controller \sigma\in\{1,\dots,12\}, which we call the baseline controller. The choice of \{P_{0}^{\sigma},Q_{0}^{\sigma},R_{0}^{\sigma}\} may be based on model dynamics, experience, or ad-hoc guidelines as per [12]. Then, by inspecting a drive cycle response with the baseline controller, we detect the controllers with poor performance. Let us denote by \sigma^{*} a controller with unsatisfactory performance that we attempt to tune with the ZO-RMS algorithm. Therefore, in this context, we want to solve

\min_{P^{\sigma^{*}},\,Q^{\sigma^{*}}\in\mathbb{S}^{4}_{+},\,R^{\sigma^{*}}\in\mathbb{S}^{3}_{++}}\ g(y(P^{\sigma^{*}},Q^{\sigma^{*}},R^{\sigma^{*}})),   (14)

where g is the tracking performance of the MPC defined in (15) below, which depends on a closed-loop response of interest y, and \mathbb{S}^{4}_{+}\subset\mathbb{S}^{4} and \mathbb{S}^{3}_{++}\subset\mathbb{S}^{3} denote the sets of symmetric positive semi-definite and symmetric positive definite matrices, respectively.

As mentioned above, we use the tracking error as the measure of performance for the MPC controller, that is,

g(y(P^{\sigma^{*}},Q^{\sigma^{*}},R^{\sigma^{*}}))\triangleq\frac{1}{\sqrt{M}}\sqrt{\sum_{k=0}^{M-1}\left\|y_{k}(P^{\sigma^{*}},Q^{\sigma^{*}},R^{\sigma^{*}})-y_{k}^{\text{ref}}\right\|_{2}^{2}},   (15)

where M\in\mathbb{N} is the experiment length, y_{k}^{\text{ref}} is the vector containing the boost pressure and EGR rate references, and y_{k}(P^{\sigma^{*}},Q^{\sigma^{*}},R^{\sigma^{*}}) is the measured process output when the \sigma^{*}-th controller uses tuning parameters P^{\sigma^{*}},Q^{\sigma^{*}},R^{\sigma^{*}} and the remaining controllers use the baseline parameters \{P_{0},Q_{0},R_{0}\}. We emphasise that, for the simulations and experiments below, the signals in (15) are normalised so that they are weighted equally.
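A minimal sketch of (15) (ours; y and y_ref are (M, 2) arrays of normalised boost pressure and EGR rate samples) reads:

```python
# Tracking cost (15): RMS error over an M-sample closed-loop response.
import numpy as np

def tracking_cost(y: np.ndarray, y_ref: np.ndarray) -> float:
    M = y.shape[0]
    return float(np.sqrt(np.sum(np.linalg.norm(y - y_ref, axis=1) ** 2) / M))
```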

4.3 Implementing the ZO-RMS algorithm

We now show how to implement the ZO-RMS algorithm for the MPC tuning problem (14). First note that (14) fits the general optimisation problem (1) with X=\mbox{diag}\{P^{\sigma^{*}},Q^{\sigma^{*}},R^{\sigma^{*}}\}, f(X)\triangleq g(y(X)), and

\mathbb{Q}^{n}\triangleq\big\{X\in\mathbb{R}^{11\times 11}:X=\mbox{diag}\{P^{\sigma^{*}},Q^{\sigma^{*}},R^{\sigma^{*}}\},\ P^{\sigma^{*}},Q^{\sigma^{*}}\in\mathbb{S}^{4}_{+},\ R^{\sigma^{*}}\in\mathbb{S}^{3}_{++}\big\}.

Therefore, we can apply ZO-RMS to solve (14). The overall implementation of ZO-RMS in this context is depicted in Fig. 4. The ZO-RMS algorithm iteratively updates the MPC parameters \{P^{\sigma^{*}},Q^{\sigma^{*}},R^{\sigma^{*}}\} to minimise f(P^{\sigma^{*}},Q^{\sigma^{*}},R^{\sigma^{*}}), which is calculated from closed-loop response experiments carried out between the MPC and the diesel engine.

It remains to show how to compute the oracle and projection in (2). Particularly, the oracle in this context is computed as per (3) with U=\mbox{diag}\{U^{P},U^{Q},U^{R}\}, where U^{P},U^{Q}\in\mathbb{G}^{4} and U^{R}\in\mathbb{G}^{3} (see Definition 1). We emphasise that, at each iteration step, the oracle is computed once the entire closed-loop response is available. Particularly, two experiments are required to compute the oracle: one with parameters \{P_{k}^{\sigma^{*}}+\mu U_{k}^{P},Q_{k}^{\sigma^{*}}+\mu U_{k}^{Q},R_{k}^{\sigma^{*}}+\mu U_{k}^{R}\} and one with \{P_{k}^{\sigma^{*}},Q_{k}^{\sigma^{*}},R_{k}^{\sigma^{*}}\}. Once the two closed-loop responses are finished, ZO-RMS computes the oracle and the next update \{P_{k+1}^{\sigma^{*}},Q_{k+1}^{\sigma^{*}},R_{k+1}^{\sigma^{*}}\}. Two new closed-loop experiments are then performed with these new controller parameters, and the process continues iteratively in this fashion.

Since X is block diagonal, the projection onto \mathbb{Q}^{n} is computed as \pi_{\mathbb{Q}^{n}}\{X\}=\mbox{diag}\{\pi_{\mathbb{S}^{4}_{+}}\{P\},\pi_{\mathbb{S}^{4}_{+}}\{Q\},\pi_{\mathbb{S}^{3}_{++}}\{R\}\}, where \pi_{\mathbb{S}^{4}_{+}} and \pi_{\mathbb{S}^{3}_{++}} denote the Euclidean projections onto \mathbb{S}^{4}_{+} and \mathbb{S}^{3}_{++}, respectively. To compute them, we follow the well-known results in [15]. That is, let K=V\Lambda V^{\top} be the eigenvalue decomposition of a matrix K. Then, \pi_{\mathbb{S}_{+}}\{K\}\triangleq V\max\{0,\Lambda\}V^{\top} and \pi_{\mathbb{S}_{++}}\{K\}\triangleq V\max\{d,\Lambda\}V^{\top} for some small d>0, where the maximum is applied to each eigenvalue.
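These projections amount to clipping eigenvalues from below; a minimal sketch (ours) of both projections and of the block-diagonal composition reads:

```python
# Eigenvalue-clipping projections of [15]: pi_{S+} clips eigenvalues at 0,
# pi_{S++} clips at a small d > 0.
import numpy as np

def project_psd(K: np.ndarray, d: float = 0.0) -> np.ndarray:
    K = 0.5 * (K + K.T)                    # symmetrise against round-off
    lam, V = np.linalg.eigh(K)             # K = V diag(lam) V^T
    return (V * np.maximum(lam, d)) @ V.T  # V diag(max(lam, d)) V^T

def project_Qn(P, Q, R, d=1e-8):           # projection onto Q^n, blockwise
    return project_psd(P), project_psd(Q), project_psd(R, d)
```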

Figure 4: Overall MPC tuning scheme.
Figure 5: Block diagram of the experimental setting.

5 Simulation study

We now perform numerical simulations to show the advantage of our proposed MPC tuning framework with respect to other gradient-free algorithms available in the literature. Particularly, we compare our approach to the dividing rectangles (DIRECT) algorithm [17], particle swarm optimisation algorithms [18], and Bayesian optimisation algorithms [22, 37]. DIRECT is a sample-based global optimisation method for Lipschitz continuous functions defined over multidimensional domains. It partitions the space into hyperrectangles, each with a sampled point at its centre. The cost function is evaluated at these centre points, and the algorithm keeps sampling and dividing into rectangles until the iteration limit has been reached. PSO algorithms solve the problem by having a population of so-called particles that move along the search space according to update rules for their positions and velocities. The movement of the particles is guided by their own best known position as well as the entire swarm's best known position in the search space. At each iteration, the cost function is evaluated for every particle in the swarm. Lastly, BO seeks to identify the optimal tuning parameters by strategically exploring and exploiting the parameter space. Exploration aims to evaluate the objective at points in the decision space with the goal of improving the accuracy of a surrogate model of the objective function, while exploitation aims to use the surrogate model to identify decisions that reduce (or increase) the objective function [22].

Intuition suggests that methods such as DIRECT and PSO would perform many cost function evaluations in order to find the optimal value, since they rely on the number of rectangles/centre points (DIRECT) and particles (PSO). We demonstrate that this is indeed true for the MPC tuning problem described in Section 4.

We perform simulations using the tuning scheme from Fig. 4, in which the diesel engine block is simulated via a high-fidelity model, see e.g. [30]. This model is physics-based with the form (12), and its parameters have been obtained from system identification experiments conducted at Toyota's Higashi-Fuji Technical Centre. The specific equations and parameters are not included for commercial-in-confidence purposes. We focus on tuning controller \sigma^{*}=6 (see Fig. 3) over a segment of the NEDC called the urban drive cycle (UDC). We tune diagonal matrices \{P^{6},Q^{6},R^{6}\} with positive elements so that we can compare to vector-valued methods like DIRECT, PSO, and BO. We emphasise that another contribution of the proposed optimisation framework is that it allows the direct tuning of the weighting matrices so that they satisfy the required constraints of symmetry and positive (semi) definiteness. For these simulations, we do not include the augmented integrator states.

The cost function is the tracking error as per (15), and to assess the complexity of each algorithm we use the total number of cost function evaluations needed to achieve the optimal cost function value up to a certain tolerance. The initial calibration is P_{0}^{6}=Q_{0}^{6}=\mbox{diag}\{0.01,0,0.2,0.01\} and R_{0}^{6}=10^{-5}I_{3\times 3}, and the number of tuning parameters is eleven. The ZO-RMS algorithm uses step size h_{k}=5\times 10^{-3}/\sqrt{k+1} and oracle precision \mu=8\times 10^{-5} (L_{0}(f)=6, \epsilon=0.008, \bar{r}=2, n=11). For the PSO algorithm we picked 40 particles, as suggested by the literature [18]. For DIRECT, PSO, and BO, we use the decision variable range [0.01,5].

One realisation run of all four algorithms is summarised in Table 1, where f_opt denotes the optimal value of the cost function achieved by each algorithm, followed by the number of cost function evaluations and the total execution time of each algorithm. We can see that, in order to achieve a similar value of f_opt (up to a 0.0002 difference), the ZO-RMS algorithm performs 20 function evaluations, compared to 66 for DIRECT, 200 for PSO, and 28 for BO. We emphasise that each function evaluation in this context corresponds to a closed-loop experiment, and thus having to perform many of them is intractable in practice. We can see that BO requires only a few more cost function evaluations than ZO-RMS. It can also be observed that DIRECT, PSO, and BO take longer to execute than ZO-RMS.

Since PSO, BO, and ZO-RMS are stochastic, we also performed a Monte Carlo simulation of 100 realisations in order to compare the three of them more accurately. These results are listed in Table 2. We can see that all achieve a similar optimal value of f_opt on average, but it takes about 199 cost function evaluations on average for PSO to achieve this, versus only 22 for BO and 20 for ZO-RMS, which makes the latter two more efficient for applications in which the plant is in the loop. Overall, BO exhibits comparable performance to ZO-RMS in terms of cost function evaluations and execution time; however, as mentioned in the introduction, these methods only work for vector decision variables, and the PSD matrix structure of the MPC tuning matrices is not preserved by the algorithm. They also lack theoretical guarantees in this context.

Table 1: Comparison between available algorithms.
Algorithm | f_opt  | N° of cost fcn. evaluations | Execution time
DIRECT    | 0.0478 | 66                          | ~0.6 hours
PSO       | 0.0476 | 200                         | ~1.6 hours
BO        | 0.0477 | 28                          | ~0.5 hours
ZO-RMS    | 0.0476 | 20                          | ~0.3 hours
Table 2: Monte Carlo simulation (100 realisations).
Algorithm | E{f_opt} | Var{f_opt} | E{N° of cost fcn. evaluations}
PSO       | 0.0472   | 5×10^-8    | 199
BO        | 0.0477   | 7×10^-8    | 22
ZO-RMS    | 0.0496   | 9×10^-6    | 20

6 Experiments

The experimental testing of the proposed MPC tuning framework was performed at Toyota's Higashi-Fuji Technical Center in Susono, Japan. The test bench is equipped with a diesel engine, a transient dynamometer, and a dSPACE DS1006 processor board, which is in turn interfaced to the ECU. The block diagram in Fig. 5 depicts the overall experimental setting. The ECU logs sensor data from the engine and transmits the current state information to the controller. In addition, the ECU directly controls all engine sub-systems; however, the ECU commands for the three actuators (throttle, EGR valve, and VGT) can be overridden by the MPC commands through a virtual switch in the ControlDesk interface. The proposed tuning architecture is implemented iteratively in-the-loop as per Fig. 4, in which the diesel engine block is the real engine described above. Particularly, the switched MPC is implemented in real-time on the dSPACE board to generate the required closed-loop drive cycle responses, and the computations of the ZO-RMS algorithm, i.e. the oracle, projections, and next set of parameters, are performed in Matlab.

6.1 Choice of algorithm parameters

To run ZO-RMS we need to choose the parameters \mu and h_{k}. Corollary 1 provides a choice for these parameters such that a given level of accuracy is achieved by the algorithm in N iterations as per (6). However, this choice depends on the Lipschitz constant L_{0}(f) and the bound \bar{r}. These can be numerically bounded depending on the application. Since we apply the proposed optimisation framework to MPC tuning on a real engine, it is not possible to find an explicit expression for the cost function in terms of P,Q,R. Instead, a common approach is to adopt experimental optimisation methods such as the ones in [7, 1] to ensure that the Lipschitz condition is at least satisfied on the experimental data. Particularly, since the oracle \mathcal{O}_{\mu} uses values of the cost function at two different points, we can store these values and use a consistency-check algorithm to verify the Lipschitz condition (see e.g. [7, Section 4.3]). This could be done by performing high-fidelity simulations of the closed loop to compute the oracle at different points, and picking the largest constant that satisfies the Lipschitz condition as a first estimate. Later on, in the experiments, we can adjust this first estimate accordingly and verify whether the bound is satisfied for the collected experimental measurements. Similarly, \bar{r} can be estimated from the stored data of X_{0} and \hat{X}_{N} from multiple simulations so that \left\|X_{0}-X^{*}\right\|\leq\bar{r} holds over the experimental space. In this paper, we initially picked conservative estimates from simulations, and later refined them heuristically during experimental testing. Online estimation of the Lipschitz constant L_{0}(f) could improve the aforementioned methods and is considered a future direction.
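The consistency check described above can be sketched as follows (ours): given stored pairs (X_{i}, f(X_{i})), the largest observed difference quotient is a lower estimate that any valid L_{0}(f) must exceed:

```python
# Lipschitz consistency check on stored cost evaluations (a sketch, ours).
import numpy as np
from itertools import combinations

def lipschitz_lower_estimate(Xs, fs):
    """Xs: list of parameter matrices; fs: corresponding cost values."""
    L = 0.0
    for (Xi, fi), (Xj, fj) in combinations(zip(Xs, fs), 2):
        dX = np.linalg.norm(Xi - Xj, 'fro')
        if dX > 0:
            L = max(L, abs(fi - fj) / dX)
    return L   # any Lipschitz constant consistent with the data is >= L
```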

In what follows, we tune three of the twelve MPC controllers depicted in Fig. 3 over three different drive cycle segments. Particularly, we tune: controller \sigma^{*}=9 over the middle segment of the UDC (49–117 [s]), which we call UDC2; controller \sigma^{*}=5 over the first 100 seconds of the WLTC, which we call WLTC1; and controller \sigma^{*}=6 over the last segment of the UDC (117–195 [s]), which we call UDC3.

For all the experiments below we consider an initial calibration Q_{0}=\mbox{diag}\{\bar{Q}_{0},\bar{Q}_{e,0}\}, \bar{Q}_{0}=\mbox{diag}\{0.01,0.01,0.2,1\}, \bar{Q}_{e,0}=0.00001\times\mbox{diag}\{1,100\}, P_{0}=Q_{0}, R_{0}=I_{3\times 3} for all twelve controllers, where \bar{Q}_{0} and \bar{Q}_{e,0} relate to the engine state and the integrator state, respectively. In addition, all the drive cycle references are generated by the ECU based on a driver model, which is responsible for converting the drive cycle vehicle speed reference into reference operating points in terms of engine speed and load.

Figure 6: Engine response over the middle UDC segment using the initial parameters \{P_{0},Q_{0},R_{0}\} for all twelve controllers.
Figure 7: Engine response over the middle UDC segment using the tuned \{\hat{P}^{9},\hat{Q}^{9},\hat{R}^{9}\} for controller nine, and \{P_{0},Q_{0},R_{0}\} for the remaining eleven controllers.

6.2 Controller σ=9\sigma^{*}=9 over UDC2

The engine output response to the UDC2 segment with the baseline controller is given in Fig. 6. The grey area illustrates when controller \sigma^{*}=9 is active, and we consider its tracking performance unsatisfactory (we draw attention to this with the boxed area). We would like to improve the tracking performance of this controller by using ZO-RMS. The parameters used in this experiment are¹ \mu=0.003\times 10^{-5} and h_{k}=10^{-6}/\sqrt{k+1} (L_{0}(f)=2.37\times 10^{4}, \epsilon=0.01, \bar{r}=1.1, n=9).

¹We note that for this particular experiment we used a very conservative estimate of the Lipschitz constant, based on different experimental tests.

In this experiment, we consider Q^{9}=\mbox{diag}\{\bar{Q}^{9},\bar{Q}_{e}^{9}\} and use ZO-RMS to tune \{\bar{Q}^{9},\bar{Q}_{e}^{9},R^{9}\}, whilst P^{9} is constructed at each iteration by solving the discrete-time algebraic Riccati equation (DARE) [24], P^{9}=\texttt{dare}\Big(\begin{bmatrix}A^{9}&0\\ -C^{9}&I\end{bmatrix},\begin{bmatrix}B^{9}\\ 0\end{bmatrix},\begin{bmatrix}\bar{Q}^{9}&0\\ 0&\bar{Q}_{e}^{9}\end{bmatrix},R^{9}\Big). We emphasise that the proposed tuning framework provides enough flexibility to tune either a single matrix or all of the MPC matrices, depending on the particular application. For instance, for this drive cycle segment, tuning the integrator state matrix \bar{Q}_{e} helped significantly in improving tracking performance, but for the experiments further below it was not necessary to tune this matrix and it was thus fixed.
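Using SciPy's solve_discrete_are in place of Matlab's dare, this per-iteration construction can be sketched as (ours; matrix names are placeholders):

```python
# Per-iteration terminal weight: P solves the DARE for the augmented system.
import numpy as np
from scipy.linalg import solve_discrete_are, block_diag

def terminal_weight(A, B, C, Q_bar, Qe_bar, R):
    nx, ne = A.shape[0], C.shape[0]
    A_aug = np.block([[A, np.zeros((nx, ne))],
                      [-C, np.eye(ne)]])
    B_aug = np.vstack([B, np.zeros((ne, B.shape[1]))])
    return solve_discrete_are(A_aug, B_aug, block_diag(Q_bar, Qe_bar), R)
```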

The resulting optimal tuning matrices for N=11N=11 iterations are given by

\hat{\bar{Q}}^{9}=\begin{bmatrix}0.0611&0.0277&0.0742&0.1389\\ 0.0277&0.1393&-0.0554&0.0318\\ 0.0742&-0.0554&0.2177&0.0963\\ 0.1389&0.0318&0.0963&1.0701\end{bmatrix},
\hat{R}^{9}=\begin{bmatrix}1.8037&-0.0924&0.4081\\ -0.0924&1.5841&0.2252\\ 0.4081&0.2252&1.2879\end{bmatrix},
\hat{\bar{Q}}_{e}^{9}=\begin{bmatrix}0.0013&0.0014\\ 0.0014&0.0041\end{bmatrix}.

The engine output response over UDC2 using the above matrices is shown in Fig. 7. The tracking performance has significantly improved in the region where controller nine is acting. Particularly, the tuned matrices provide a performance improvement² of 16.2% with only eleven iterations.

²We compute the percentage of improvement with respect to the baseline controller as 100\times[f(P_{0},Q_{0},R_{0})-f(\hat{P}^{\sigma^{*}},\hat{Q}^{\sigma^{*}},\hat{R}^{\sigma^{*}})]/f(P_{0},Q_{0},R_{0}).

6.3 Controller $\sigma^{*}=5$ over WLTC1

The engine output response to the WLTC1 segment with the baseline controller is given in Fig. 8. The grey area illustrates when controller $\sigma^{*}=5$ is active, and we have boxed the areas with unsatisfactory performance that we would like to improve. The parameters used in this experiment are $\mu=2.5\times 10^{-6}$ and $h_{k}=10^{-6}/\sqrt{k+1}$ ($L_{0}(f)=3450$, $\epsilon=0.1$, $\bar{r}=0.1$, $n=7$). Similar to the previous experiment, the matrix $P^{5}$ is constructed by solving the corresponding DARE, and we consider $Q^{5}=\mbox{diag}\{\bar{Q}^{5},\bar{Q}_{e}^{5}\}$, but now the integrator state matrix $\bar{Q}_{e}^{5}$ is not tuned; it is kept equal to the initial value $\bar{Q}_{e,0}=0.00001\times\mbox{diag}\{1,100\}$.

Consequently, the tuning parameters are $\{\bar{Q}^{5},R^{5}\}$, and their optimal values for $N=11$ iterations are given by

$$\begin{aligned}
\hat{\bar{Q}}^{5}&=\begin{bmatrix}0.0101&-0.0078&-0.0213&0.0102\\ -0.0078&0.0273&0.0061&0.0170\\ -0.0213&0.0061&0.1760&0.0004\\ 0.0102&0.0170&0.0004&0.9967\end{bmatrix},\\
\hat{R}^{5}&=\begin{bmatrix}0.9992&-0.0066&-0.0017\\ -0.0066&0.9839&0.0016\\ -0.0017&0.0016&0.9806\end{bmatrix}.
\end{aligned}$$

The engine output response over WLTC1 using the above matrices is shown in Fig. 9. The tracking performance has improved in the region where controller five is acting. In particular, the tuned matrices provide a performance improvement of 22.67% with only eleven iterations, as illustrated by the boxed regions in Fig. 9. The tracking has clearly improved in the boxed areas, although in the grey area at 40–50 s in Figs. 8 and 9 the EGR tracking deteriorated slightly. However, we emphasise that the cost function being minimised is the overall tracking performance over every grey area where controller $\sigma^{*}=5$ is acting, and the overall tracking improvement of 22.67% is therefore considered satisfactory. Different cost functions or tuning approaches could potentially be used to tackle each region individually.

Figure 8: Engine response over the first WLTC segment using the initial parameters $\{P_{0},Q_{0},R_{0}\}$ for all twelve controllers.

Figure 9: Engine response over the first WLTC segment using the tuned $\{\hat{P}^{5},\hat{Q}^{5},\hat{R}^{5}\}$ for controller five, and $\{P_{0},Q_{0},R_{0}\}$ for the remaining eleven controllers.

6.4 Controller $\sigma^{*}=6$ over UDC3

The engine output response to the UDC3 segment with the baseline controller is given in Fig. 10. The grey area illustrates when controller $\sigma^{*}=6$ is active, and we have boxed the areas with unsatisfactory performance. The parameters used in this experiment are $\mu=2.5\times 10^{-6}$ and $h_{k}=10^{-5}/\sqrt{k+1}$ ($L_{0}(f)=2450$, $\epsilon=0.1$, $\bar{r}=1.2$, $n=11$). The integrator state matrix $\bar{Q}_{e}$ was not tuned but kept equal to the initial value $\bar{Q}_{e,0}=0.00001\times\mbox{diag}\{1,100\}$. In contrast to the previous two experiments, we do include $P^{6}$ in the tuning process for this experiment. Specifically, we construct it as $P^{6}=\mbox{diag}\{\bar{P}^{6},\bar{Q}_{e,0}\}$ and tune $\bar{P}^{6}$.

Consequently, the tuning parameters are $\{\bar{P}^{6},\bar{Q}^{6},R^{6}\}$, and their optimal values for $N=8$ iterations are given by

$$\begin{aligned}
\hat{\bar{P}}^{6}&=\begin{bmatrix}0.0502&-0.0117&-0.0398&0.0057\\ -0.0117&0.0102&0.0146&0.0118\\ -0.0398&0.0146&0.2003&-0.0426\\ 0.0057&0.0118&-0.0426&1.0022\end{bmatrix},\\
\hat{\bar{Q}}^{6}&=\begin{bmatrix}0.4699&0.0708&0.0192&0.0901\\ 0.0708&0.2329&-0.1597&-0.1684\\ 0.0192&-0.1597&0.3459&0.0016\\ 0.0901&-0.1684&0.0016&1.1911\end{bmatrix},\\
\hat{R}^{6}&=\begin{bmatrix}1.1227&-0.1419&0.0305\\ -0.1419&1.0621&0.0534\\ 0.0305&0.0534&1.2843\end{bmatrix}.
\end{aligned}$$

The engine output response over UDC3 using the above matrices is shown in Fig. 11. We can see that the tracking performance has improved in the region where controller six is acting. In particular, the tuned matrices provide a performance improvement of 7.73% with only eight iterations.

Figure 10: Engine response over the third UDC segment using the initial parameters $\{P_{0},Q_{0},R_{0}\}$ for all twelve controllers.

Figure 11: Engine response over the third UDC segment using the tuned $\{\hat{P}^{6},\hat{Q}^{6},\hat{R}^{6}\}$ for controller six, and $\{P_{0},Q_{0},R_{0}\}$ for the remaining eleven controllers.

The theoretical results in Section 3 are sufficient conditions on the step size $h_{k}$ and the oracle's precision $\mu$ that ensure a given level of accuracy within a fixed number of iterations. We note that the complexity bounds presented in this paper hold for any problem that fits the optimisation framework surrounding (1). Depending on the application, these bounds may be more or less conservative, as we observed in the air-path control case study. In fact, since these are sufficient conditions, the experimental performance of the algorithm showed tracking error improvement in only a few iterations. Nevertheless, the theoretical results serve as a design tool that guides the choice of algorithm parameters, as opposed to a heuristic choice.
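To make the roles of $h_{k}$ and $\mu$ concrete, the following is a minimal sketch of one possible implementation of the ZO-RMS iteration, following the update $X_{k+1}=\pi_{\mathbb{Q}^{n}}\{X_{k}-h_{k}\mathcal{O}_{\mu}(X_{k},U_{k})\}$ and the finite-difference oracle used in the appendices. The Gaussian sampler assumes the density proportional to $e^{-\frac{1}{2}\left\|U\right\|_{F}^{2}}$ on symmetric matrices, and `project` stands in for $\pi_{\mathbb{Q}^{n}}$ (e.g. the PSD projection sketched after (21) in Appendix B); in the case study, each cost evaluation corresponds to a closed-loop drive cycle run.

```python
import numpy as np

def sample_symmetric_gaussian(n, rng):
    # Symmetric U with density ~ exp(-||U||_F^2 / 2): averaging a square
    # Gaussian matrix with its transpose yields N(0,1) diagonal entries and
    # N(0,1/2) off-diagonal entries, matching
    # ||U||_F^2 = sum_i u_ii^2 + 2 sum_{i<j} u_ij^2.
    A = rng.standard_normal((n, n))
    return (A + A.T) / 2.0

def zo_rms(f, X0, step, mu, N, project, seed=0):
    """Sketch of ZO-RMS: a zeroth-order step followed by projection onto Q^n.

    f: black-box closed-loop cost; step(k): the schedule h_k, e.g.
    lambda k: 1e-6 / np.sqrt(k + 1); mu: oracle precision; project: pi_{Q^n}.
    """
    rng = np.random.default_rng(seed)
    X = X0.copy()
    for k in range(N + 1):
        U = sample_symmetric_gaussian(X.shape[0], rng)
        oracle = (f(X + mu * U) - f(X)) / mu * U   # O_mu(X_k, U_k)
        X = project(X - step(k) * oracle)          # X_{k+1}
    return X
```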

Remark 1

Overall, we observed that the proposed approach successfully improves tracking performance over three different drive cycle segments, i.e. different data sets, which sheds some light on the robustness of the method. A theoretical study of robustness guarantees is an interesting direction for future work.

7 Conclusion

This paper provided an optimisation framework with theoretical guarantees for the minimisation of non-smooth and possibly non-convex cost functions with matrix parameters. We applied the proposed algorithm to tune MPCs in the context of air-path control in diesel engines, and validated it through experimental testing. The algorithm improves performance within a few iterations, as demonstrated over different engine drive cycles. It therefore creates potential for the development of efficient tuning tools for advanced controllers (and potentially retuning online), even though the theoretical complexity bounds may be large depending on the application.

Future work includes exploring algorithms over other matrix manifolds with practical applications, testing the proposed algorithms on stochastic drive cycles, establishing robustness guarantees, and online estimation/adaptation of the Lipschitz constant $L_{0}(f)$.

Appendix A Technical results

Appendix A provides the preliminary results needed for Appendices B and C.

Consider a function $f:\mathbb{Q}^{n}\rightarrow\mathbb{R}$, and let us define the Gaussian approximation of $f$ as

$$f_{\mu}(X)\triangleq\frac{1}{\kappa}\int_{\mathbb{G}^{n}}f(X+\mu U)e^{-\frac{1}{2}\left\|U\right\|_{F}^{2}}\,\mathrm{d}U.\tag{16}$$
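Although the algorithm never evaluates (16) directly, the integral can be estimated by Monte Carlo, which is a useful way to build intuition for the smoothing. The sketch below assumes the Gaussian measure on symmetric matrices with density proportional to $e^{-\frac{1}{2}\left\|U\right\|_{F}^{2}}$ (unit-variance diagonal, variance-$1/2$ off-diagonal entries).

```python
import numpy as np

def f_mu_estimate(f, X, mu, samples=100_000, seed=0):
    """Monte Carlo estimate of the Gaussian approximation f_mu(X) in (16)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    total = 0.0
    for _ in range(samples):
        A = rng.standard_normal((n, n))
        U = (A + A.T) / 2.0      # symmetric Gaussian sample
        total += f(X + mu * U)
    return total / samples
```

For small $\mu$ the estimate should stay within $\mu L_{0}(f)\sqrt{(n^{2}+n)/2}$ of $f(X)$, consistent with (19) below.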

We first introduce the following definition.

Definition 2

Let $f$ be a convex function. A matrix $G$ is called a subgradient of the function $f$ at $X\in\mathbb{R}^{n\times n}$ if for any $Y\in\mathbb{R}^{n\times n}$ we have $f(Y)\geq f(X)+\langle G,Y-X\rangle_{F}$. The set of all subgradients of $f$ at $X\in\mathbb{R}^{n\times n}$ is called the subdifferential and is denoted by $\partial f(X)$.

The following lemma holds for $f_{\mu}$.

Lemma 2
  1. (I) If $f$ is convex and $G\in\partial f(X)$, then
     $$f_{\mu}(X)\geq\frac{1}{\kappa}\int_{\mathbb{G}^{n}}\big(f(X)+\mu\langle G,U\rangle_{F}\big)e^{-\frac{1}{2}\left\|U\right\|_{F}^{2}}\,\mathrm{d}U=f(X).\tag{17}$$
  2. (II) The gradient matrix $\nabla f_{\mu}$ satisfies
     $$\nabla f_{\mu}(X)=\frac{1}{\kappa}\int_{\mathbb{G}^{n}}\frac{f(X+\mu U)-f(X)}{\mu}e^{-\frac{1}{2}\left\|U\right\|_{F}^{2}}\,U\,\mathrm{d}U.\tag{18}$$
  3. (III) If $f$ is Lipschitz continuous as per Assumption 1, then
     $$|f_{\mu}(X)-f(X)|\leq\mu L_{0}(f)\sqrt{\tfrac{n^{2}+n}{2}},\quad X\in\mathbb{Q}^{n}.\tag{19}$$
  4. (IV) Under Assumption 1, $f_{\mu}$ has a Lipschitz continuous gradient, i.e. $|f_{\mu}(Y)-f_{\mu}(X)-\left\langle\nabla f_{\mu}(X),Y-X\right\rangle_{F}|\leq\frac{1}{2}L_{1}(f_{\mu})\left\|X-Y\right\|_{F}^{2}$ for all $X,Y\in\mathbb{Q}^{n}$, with $L_{1}(f_{\mu})=(2L_{0}(f)/\mu)\sqrt{(n^{2}+n)/2}$.

Proof:

  1. (I)

We prove this statement as follows:
$$\begin{aligned}
f_{\mu}(X)&\stackrel{(16)}{=}\frac{1}{\kappa}\int_{\mathbb{G}^{n}}f(X+\mu U)e^{-\frac{1}{2}\left\|U\right\|_{F}^{2}}\,\mathrm{d}U\\
&\stackrel{\text{Def.~2}}{\geq}\frac{1}{\kappa}\int_{\mathbb{G}^{n}}\big(f(X)+\langle G,\mu U\rangle_{F}\big)e^{-\frac{1}{2}\left\|U\right\|_{F}^{2}}\,\mathrm{d}U\\
&=f(X)+\frac{1}{\kappa}\int_{\mathbb{G}^{n}}\langle G,\mu U\rangle_{F}\,e^{-\frac{1}{2}\left\|U\right\|_{F}^{2}}\,\mathrm{d}U\\
&=f(X)+\mu\,\mbox{Tr}\left\{\frac{1}{\kappa}\int_{\mathbb{G}^{n}}G^{\top}U\,e^{-\frac{1}{2}\left\|U\right\|_{F}^{2}}\,\mathrm{d}U\right\}\\
&=f(X),
\end{aligned}$$
where the last equality follows from $\mathbb{E}_{U}\{U\}=0_{n\times n}$.

  2. (II)

Define $Y\triangleq X+\mu U$, and note that by the change-of-variable formula for multivariate integrals, $\mathrm{d}U=(1/\mu^{\bar{n}})\mathrm{d}Y$, where $\bar{n}=n(n+1)/2$. Consequently,
$$f_{\mu}(X)=\frac{1}{\mu^{\bar{n}}\kappa}\int_{\mathbb{G}^{n}}f(Y)e^{-\frac{1}{2\mu^{2}}\left\|Y-X\right\|_{F}^{2}}\,\mathrm{d}Y.$$

    Then,

$$\begin{aligned}
\nabla f_{\mu}(X)&=\frac{1}{\mu^{\bar{n}}\kappa}\int_{\mathbb{G}^{n}}f(Y)e^{-\frac{1}{2\mu^{2}}\left\|Y-X\right\|_{F}^{2}}\left(-\frac{1}{2\mu^{2}}\right)\frac{\partial\,\mbox{Tr}\left\{(Y-X)^{2}\right\}}{\partial X}\,\mathrm{d}Y\\
&=\frac{1}{\mu^{\bar{n}+2}\kappa}\int_{\mathbb{G}^{n}}f(Y)e^{-\frac{1}{2\mu^{2}}\left\|Y-X\right\|_{F}^{2}}(Y-X)\,\mathrm{d}Y\\
&=\frac{1}{\mu\kappa}\int_{\mathbb{G}^{n}}f(X+\mu U)e^{-\frac{1}{2}\left\|U\right\|_{F}^{2}}\,U\,\mathrm{d}U\\
&=\frac{1}{\kappa}\int_{\mathbb{G}^{n}}\frac{f(X+\mu U)-f(X)}{\mu}e^{-\frac{1}{2}\left\|U\right\|_{F}^{2}}\,U\,\mathrm{d}U,
\end{aligned}$$
where the last equality follows from $\mathbb{E}_{U}\{U\}=0_{n\times n}$.

  3. (III)

By definition of $f_{\mu}$ in (16) we have
$$f_{\mu}(X)-f(X)=\frac{1}{\kappa}\int_{\mathbb{G}^{n}}\big(f(X+\mu U)-f(X)\big)e^{-\frac{1}{2}\left\|U\right\|_{F}^{2}}\,\mathrm{d}U.$$
Then, by Lipschitz continuity of $f$ (cf. Assumption 1),
$$|f_{\mu}(X)-f(X)|\leq\mu L_{0}(f)\frac{1}{\kappa}\int_{\mathbb{G}^{n}}\left\|U\right\|_{F}e^{-\frac{1}{2}\left\|U\right\|_{F}^{2}}\,\mathrm{d}U\leq\mu L_{0}(f)\sqrt{\tfrac{n^{2}+n}{2}},$$

    where the last inequality follows from Lemma 1.

  4. (IV)

Indeed, for all $X,Y\in\mathbb{Q}^{n}$ we have from (18) and Assumption 1 that
$$\left\|\nabla f_{\mu}(X)-\nabla f_{\mu}(Y)\right\|_{F}\leq\frac{2L_{0}(f)}{\kappa\mu}\int_{\mathbb{G}^{n}}\left\|U\right\|_{F}e^{-\frac{1}{2}\left\|U\right\|_{F}^{2}}\,\mathrm{d}U,$$

    and the result follows from Lemma 1. \blacksquare

For the random oracle in (3), we have the following bound.

Theorem 3

If $f$ satisfies Assumption 1, then
$$\mathbb{E}_{U}\{\left\|\mathcal{O}_{\mu}(X,U)\right\|^{2}_{F}\}\leq\frac{1}{4}L_{0}^{2}(f)\big(n^{4}+2n^{3}+5n^{2}+4n\big).\tag{20}$$

Proof: Note that $\left\|\mathcal{O}_{\mu}(X,U)\right\|_{F}^{2}=\mbox{Tr}\left\{\mathcal{O}_{\mu}(X,U)^{\top}\mathcal{O}_{\mu}(X,U)\right\}=\frac{(f(X+\mu U)-f(X))^{2}}{\mu^{2}}\left\|U\right\|_{F}^{2}$. Then, $\mathbb{E}_{U}\{\left\|\mathcal{O}_{\mu}(X,U)\right\|^{2}_{F}\}\leq\frac{1}{\mu^{2}}\mathbb{E}_{U}\{L_{0}^{2}(f)\left\|\mu U\right\|_{F}^{2}\left\|U\right\|_{F}^{2}\}=L_{0}^{2}(f)\mathbb{E}_{U}\{\left\|U\right\|_{F}^{4}\}$. The proof thus follows from Lemma 1. $\blacksquare$
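The bound (20) can also be sanity-checked numerically. The sketch below estimates $\mathbb{E}_{U}\{\left\|\mathcal{O}_{\mu}(X,U)\right\|_{F}^{2}\}$ for the Lipschitz test cost $f(X)=\left\|X\right\|_{F}$ (for which $L_{0}(f)=1$) and compares it with the right-hand side of (20); this is an illustrative check under the same symmetric Gaussian sampling assumption as before, not part of the paper's experiments.

```python
import numpy as np

def oracle_second_moment(f, X, mu, samples=20_000, seed=0):
    """Monte Carlo estimate of E_U ||O_mu(X, U)||_F^2."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    acc = 0.0
    for _ in range(samples):
        A = rng.standard_normal((n, n))
        U = (A + A.T) / 2.0                      # symmetric Gaussian sample
        G = (f(X + mu * U) - f(X)) / mu * U      # O_mu(X, U)
        acc += np.sum(G * G)                     # ||O_mu(X, U)||_F^2
    return acc / samples

n, L0 = 3, 1.0
est = oracle_second_moment(lambda X: np.linalg.norm(X, "fro"), np.eye(n), mu=1e-3)
bound = 0.25 * L0**2 * (n**4 + 2 * n**3 + 5 * n**2 + 4 * n)
print(est <= bound)  # expected: True
```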

Appendix B Proof of Theorem 1

We first note that the projection $\pi_{\mathbb{Q}^{n}}$ in ZO-RMS satisfies
$$\left\|\pi_{\mathbb{Q}^{n}}\{X\}-Y\right\|_{F}\leq\left\|X-Y\right\|_{F},\tag{21}$$
for all $Y\in\mathbb{Q}^{n}$.
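In the MPC application, $\mathbb{Q}^{n}$ is a set of symmetric positive semidefinite matrices, for which the projection admits a closed form: the nearest symmetric PSD matrix in the Frobenius norm is obtained by symmetrising and clipping negative eigenvalues (cf. [15]). A minimal sketch, assuming $\mathbb{Q}^{n}$ is the PSD cone:

```python
import numpy as np

def project_psd(X):
    """Frobenius-norm projection onto the symmetric PSD cone (cf. [15])."""
    S = (X + X.T) / 2.0                      # symmetrise
    w, V = np.linalg.eigh(S)                 # eigendecomposition
    return (V * np.maximum(w, 0.0)) @ V.T    # clip negative eigenvalues
```

Since the PSD cone is closed and convex, this projection is non-expansive, which yields property (21).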

Now, let the point $X_{k}$ with $k\in\mathbb{N}$ be generated after $k$ iterations of ZO-RMS, and define $r_{k}\triangleq\left\|X_{k}-X^{*}\right\|_{F}$. Then,
$$\begin{aligned}
r_{k+1}^{2}&=\left\|\pi_{\mathbb{Q}^{n}}\{X_{k}-h_{k}\mathcal{O}_{\mu}(X_{k},U_{k})\}-X^{*}\right\|_{F}^{2}\\
&\stackrel{(21)}{\leq}\left\|X_{k}-h_{k}\mathcal{O}_{\mu}(X_{k},U_{k})-X^{*}\right\|_{F}^{2}\\
&=r_{k}^{2}-2h_{k}\left\langle X_{k}-X^{*},\mathcal{O}_{\mu}(X_{k},U_{k})\right\rangle_{F}+h_{k}^{2}\left\|\mathcal{O}_{\mu}(X_{k},U_{k})\right\|_{F}^{2}.
\end{aligned}$$

Then, taking the expectation with respect to $U_{k}$,

$$\begin{aligned}
\mathbb{E}_{U_{k}}\{r_{k+1}^{2}\}&\stackrel{(20)}{\leq}r_{k}^{2}-2h_{k}\mathbb{E}_{U_{k}}\{\left\langle X_{k}-X^{*},\mathcal{O}_{\mu}(X_{k},U_{k})\right\rangle_{F}\}+\tfrac{1}{4}h_{k}^{2}L_{0}^{2}(f)(n^{4}+2n^{3}+5n^{2}+4n)\\
&=r_{k}^{2}-2h_{k}\mbox{Tr}\left\{[X_{k}-X^{*}]^{\top}\mathbb{E}_{U_{k}}\{\mathcal{O}_{\mu}(X_{k},U_{k})\}\right\}+\tfrac{1}{4}h_{k}^{2}L_{0}^{2}(f)(n^{4}+2n^{3}+5n^{2}+4n).
\end{aligned}$$

From (18), we get $\mathbb{E}_{U_{k}}\{\mathcal{O}_{\mu}(X_{k},U_{k})\}=\nabla f_{\mu}(X_{k})$. Therefore,

$$\begin{aligned}
\mathbb{E}_{U_{k}}\{r_{k+1}^{2}\}&\leq r_{k}^{2}-2h_{k}\left\langle X_{k}-X^{*},\nabla f_{\mu}(X_{k})\right\rangle_{F}+\tfrac{1}{4}h_{k}^{2}L_{0}^{2}(f)(n^{4}+2n^{3}+5n^{2}+4n)\\
&\stackrel{(a)}{\leq}r_{k}^{2}-2h_{k}[f(X_{k})-f_{\mu}(X^{*})]+\tfrac{1}{4}h_{k}^{2}L_{0}^{2}(f)(n^{4}+2n^{3}+5n^{2}+4n),
\end{aligned}$$

where inequality $(a)$ is shown as follows:

$$f_{\mu}(X^{*})\stackrel{\text{Def.~2}}{\geq}f_{\mu}(X_{k})+\langle\nabla f_{\mu}(X_{k}),X^{*}-X_{k}\rangle_{F}\stackrel{(17)}{\geq}f(X_{k})-\langle X_{k}-X^{*},\nabla f_{\mu}(X_{k})\rangle_{F}.$$

Taking now the expectation in $U_{0:k-1}$, we obtain
$$\mathbb{E}_{U_{0:k}}\{r_{k+1}^{2}\}\leq\mathbb{E}_{U_{0:k-1}}\{r_{k}^{2}\}-2h_{k}(\phi_{k}-f_{\mu}(X^{*}))+\tfrac{1}{4}h_{k}^{2}L_{0}^{2}(f)(n^{4}+2n^{3}+5n^{2}+4n).$$

Moreover, note that $f_{\mu}(X^{*})\stackrel{(17)}{\geq}f(X^{*})$; thus (19) implies that $f_{\mu}(X^{*})\leq f^{*}+\mu L_{0}(f)\sqrt{(n^{2}+n)/2}$. Consequently,
$$\mathbb{E}_{U_{0:k}}\{r_{k+1}^{2}\}\leq\mathbb{E}_{U_{0:k-1}}\{r_{k}^{2}\}-2h_{k}(\phi_{k}-f^{*})+2h_{k}\mu L_{0}(f)\sqrt{\tfrac{n^{2}+n}{2}}+\tfrac{1}{4}h_{k}^{2}L_{0}^{2}(f)(n^{4}+2n^{3}+5n^{2}+4n).\tag{22}$$

We can now iterate (22) from $k=0$ to $k=N$ and get
$$\mathbb{E}_{U_{0:N}}\{r_{N+1}^{2}\}\leq r_{0}^{2}-2\sum_{k=0}^{N}h_{k}(\phi_{k}-f^{*})+2S_{N}\mu L_{0}(f)\sqrt{\tfrac{n^{2}+n}{2}}+\tfrac{1}{4}(n^{4}+2n^{3}+5n^{2}+4n)L_{0}^{2}(f)\sum_{k=0}^{N}h_{k}^{2}.\tag{23}$$
Since $\mathbb{E}_{U_{0:N}}\{r_{N+1}^{2}\}\geq 0$, from (23) we immediately get (4), completing the proof. $\blacksquare$

Appendix C Proof of Theorem 2

Define $g_{k}\triangleq P_{\mathbb{Q}^{n}}(X_{k},\mathcal{O}_{\mu}(X_{k},U_{k}),h_{k})$. From Lemma 2(IV), we have
$$f_{\mu}(X_{k+1})\leq f_{\mu}(X_{k})-h_{k}\left\langle\nabla f_{\mu}(X_{k}),g_{k}\right\rangle_{F}+\tfrac{h_{k}^{2}L_{1}(f_{\mu})}{2}\left\|g_{k}\right\|_{F}^{2}.$$

From the non-expansiveness of the projection, i.e. (21), and from the fact that $L_{1}(f_{\mu})=(2L_{0}(f)/\mu)\sqrt{(n^{2}+n)/2}$, we have
$$f_{\mu}(X_{k+1})\leq f_{\mu}(X_{k})-h_{k}\left\langle\nabla f_{\mu}(X_{k}),g_{k}\right\rangle_{F}+\tfrac{h_{k}^{2}L_{0}(f)}{\mu}\sqrt{\tfrac{n^{2}+n}{2}}\left\|\mathcal{O}_{\mu}(X_{k},U_{k})\right\|_{F}^{2}.$$

Define $\xi_{k}\triangleq\mathcal{O}_{\mu}(X_{k},U_{k})-\nabla f_{\mu}(X_{k})$; then $\left\langle\nabla f_{\mu}(X_{k}),g_{k}\right\rangle_{F}=\left\langle\mathcal{O}_{\mu}(X_{k},U_{k}),g_{k}\right\rangle_{F}-\left\langle\xi_{k},g_{k}\right\rangle_{F}$. Based on [13, Lemma 1] we have $\left\langle\mathcal{O}_{\mu}(X_{k},U_{k}),g_{k}\right\rangle_{F}\geq\left\|g_{k}\right\|_{F}^{2}+\frac{1}{h_{k}}[\textbf{h}(X_{k+1})-\textbf{h}(X_{k})]$, where $\textbf{h}(X)=0$ if $X\in\mathbb{Q}^{n}$ and $\infty$ otherwise. Since $X_{0}\in\mathbb{Q}^{n}$ and based on (2), we know that $\mathbf{h}(X_{k})=0$ for all $k\geq 0$. Therefore,
$$f_{\mu}(X_{k+1})\leq f_{\mu}(X_{k})+h_{k}\left\langle\xi_{k},g_{k}-P_{\mathbb{Q}^{n}}(X_{k},\nabla f_{\mu}(X_{k}),h_{k})\right\rangle_{F}+h_{k}\left\langle\xi_{k},P_{\mathbb{Q}^{n}}(X_{k},\nabla f_{\mu}(X_{k}),h_{k})\right\rangle_{F}-h_{k}\left\|g_{k}\right\|_{F}^{2}+\tfrac{h_{k}^{2}L_{0}(f)}{\mu}\sqrt{\tfrac{n^{2}+n}{2}}\left\|\mathcal{O}_{\mu}(X_{k},U_{k})\right\|_{F}^{2}.\tag{24}$$

From [13, Proposition 1], we can write $\left\langle\xi_{k},g_{k}-P_{\mathbb{Q}^{n}}(X_{k},\nabla f_{\mu}(X_{k}),h_{k})\right\rangle_{F}\leq\left\|\xi_{k}\right\|_{F}^{2}$. With this fact, we take the expectation in (24) and get
$$h_{k}\mathbb{E}_{U_{k}}\{\left\|g_{k}\right\|_{F}^{2}\}\leq f_{\mu}(X_{k})-\mathbb{E}_{U_{k}}\{f_{\mu}(X_{k+1})\}+h_{k}\sigma^{2}+h_{k}^{2}C(\mu),\tag{25}$$
where we also used $\mathbb{E}_{U_{k}}\{\xi_{k}\}=0$, Theorem 3 to compute $\mathbb{E}_{U_{k}}\{\left\|\mathcal{O}_{\mu}(X_{k},U_{k})\right\|_{F}^{2}\}$, and Assumption 2 to upper bound $\mathbb{E}_{U_{k}}\{\left\|\xi_{k}\right\|_{F}^{2}\}\leq\sigma^{2}$. Recall that $S_{N}\triangleq\sum_{k=0}^{N}h_{k}$ and that $f_{\mu}(X^{*})\geq f^{*}$. Then, taking the expectation in $U_{0:k}$ in (25) and summing, we obtain (8) as required. $\blacksquare$

Acknowledgment

The authors would like to thank the engineering staff at Toyota Higashi-Fuji Technical Centre, Susono, Japan, for their assistance in running the experiments included in this paper.

References

  • [1] Mohamed Osama Ahmed, Sharan Vaswani, and Mark Schmidt. Combining Bayesian optimization and Lipschitz optimization. Machine Learning, 109(1):79–102, 2020.
  • [2] Greg W Anderson, Alice Guionnet, and Ofer Zeitouni. An introduction to random matrices, volume 118. Cambridge university press, 2010.
  • [3] Matthew Anderson, Xi-Lin Li, Pedro Rodriguez, and Tülay Adali. An effective decoupling method for matrix optimization and its application to the ICA problem. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1885–1888. IEEE, 2012.
  • [4] M. Athans and F.C. Schweppe. Gradient matrices and matrix calculations. Technical report, M.I.T. Lincoln Lab., 1965.
  • [5] Peyman Bagheri and Ali Khaki-Sedigh. An analytical tuning approach to multivariable model predictive controllers. Journal of Process Control, 24(12):41–54, 2014.
  • [6] Alexander Bertrand and Marc Moonen. Topology-aware distributed adaptation of Laplacian weights for in-network averaging. In 21st European Signal Processing Conference (EUSIPCO 2013), pages 1–5. IEEE, 2013.
  • [7] Gene A Bunin and Grégory François. Lipschitz constants in experimental optimization. arXiv preprint arXiv:1603.07847, 2016.
  • [8] Gene A Bunin, Fernando Fraire Tirado, Grégory François, and Dominique Bonvin. Run-to-run MPC tuning via gradient descent. In Computer Aided Chemical Engineering, volume 30, pages 927–931. Elsevier, 2012.
  • [9] MM Chaikovskii and Igor' Borisovich Yadykin. Optimal tuning of PID controllers for MIMO bilinear plants. Automation and Remote Control, 70(1):118–132, 2009.
  • [10] Jon Dattorro. Convex optimization & Euclidean distance geometry. Meboo Publishing USA, 2011.
  • [11] Peter J Forrester. Log-gases and random matrices (LMS-34). Princeton University Press, 2010.
  • [12] Jorge L Garriga and Masoud Soroush. Model predictive control tuning methods: A review. Industrial & Engineering Chemistry Research, 49(8):3505–3515, 2010.
  • [13] Saeed Ghadimi, Guanghui Lan, and Hongchao Zhang. Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Mathematical Programming, 155(1-2):267–305, 2016.
  • [14] T. Hayakawa and Y. Kikuchi. The moments of a function of traces of a matrix with a multivariate symmetric normal distribution. South African Statistical Journal, 13(1):71–82, 1979.
  • [15] Nicholas J Higham. Computing a nearest symmetric positive semidefinite matrix. Linear algebra and its applications, 103:103–118, 1988.
  • [16] Alex S Ira, Chris Manzie, Iman Shames, Robert Chin, Dragan Nešić, Hayato Nakada, and Takeshi Sano. Tuning of multivariable model predictive controllers through expert bandit feedback. International Journal of Control, pages 1–9, 2020.
  • [17] Donald R Jones, Cary D Perttunen, and Bruce E Stuckman. Lipschitzian optimization without the Lipschitz constant. Journal of Optimization Theory and Applications, 79(1):157–181, 1993.
  • [18] Gesner A Nery Júnior, Márcio AF Martins, and Ricardo Kalid. A PSO-based optimal tuning strategy for constrained multivariable predictive controllers with model uncertainty. ISA transactions, 53(2):560–567, 2014.
  • [19] Chih-Jen Lin. Projected gradient methods for nonnegative matrix factorization. Neural computation, 19(10):2756–2779, 2007.
  • [20] Sijia Liu, Bhavya Kailkhura, Pin-Yu Chen, Paishun Ting, Shiyu Chang, and Lisa Amini. Zeroth-order stochastic variance reduction for nonconvex optimization. Advances in Neural Information Processing Systems, 31:3727–3737, 2018.
  • [21] Sijia Liu, Xingguo Li, Pin-Yu Chen, Jarvis Haupt, and Lisa Amini. Zeroth-order stochastic projected gradient descent for nonconvex optimization. In 2018 IEEE Global Conference on Signal and Information Processing (GlobalSIP), pages 1179–1183. IEEE, 2018.
  • [22] Qiugang Lu, Ranjeet Kumar, and Victor M Zavala. MPC controller tuning using Bayesian optimization techniques. arXiv preprint arXiv:2009.14175, 2020.
  • [23] Alonso Marco, Philipp Hennig, Jeannette Bohg, Stefan Schaal, and Sebastian Trimpe. Automatic LQR tuning based on Gaussian process global optimization. In 2016 IEEE International Conference on Robotics and Automation (ICRA), pages 270–277. IEEE, 2016.
  • [24] D.Q. Mayne, J.B. Rawlings, C.V. Rao, and P.O.M. Scokaert. Constrained model predictive control: Stability and optimality. Automatica, 36:789–814, 2000.
  • [25] Marcel Menner, Peter Worsnop, and Melanie N Zeilinger. Constrained inverse optimal control with application to a human manipulation task. IEEE Transactions on Control Systems Technology, 2019.
  • [26] Panayotis Mertikopoulos, E Veronica Belmega, Romain Negrel, and Luca Sanguinetti. Distributed stochastic optimization via matrix exponential learning. IEEE Transactions on Signal Processing, 65(9):2277–2290, 2017.
  • [27] Yurii Nesterov and Vladimir Spokoiny. Random gradient-free minimization of convex functions. Foundations of Computational Mathematics, 17(2):527–566, 2017.
  • [28] Ahmed Ramadan, Jongeun Choi, Clark J. Radcliffe, John M. Popovich, and N. Peter Reeves. Inferring control intent during seated balance using inverse model predictive control. IEEE Robotics and Automation Letters, 4(2):224–230, 2019.
  • [29] James Blake Rawlings and David Q Mayne. Model predictive control: Theory and design. Nob Hill Pub., 2009.
  • [30] Gokul S Sankar, Rohan C Shekhar, Chris Manzie, Takeshi Sano, and Hayato Nakada. Fast calibration of a robust model predictive controller for diesel engine airpath. IEEE Transactions on Control Systems Technology, 2019.
  • [32] Jose Eduardo W Santos, Jorge Otávio Trierweiler, and Marcelo Farenzena. Robust tuning for classical MPC through the multi-scenarios approach. Industrial & Engineering Chemistry Research, 58(8):3146–3158, 2019.
  • [33] S Yusef Shafi, Murat Arcak, and Laurent El Ghaoui. Graph weight allocation to meet Laplacian spectral constraints. IEEE Transactions on Automatic Control, 57(7):1872–1877, 2011.
  • [34] Gaurang Shah and Sebastian Engell. Tuning MPC for desired closed-loop performance for mimo systems. In Proceedings of the 2011 American Control Conference, pages 4404–4409. IEEE, 2011.
  • [35] Rohan C Shekhar, Gokul S Sankar, Chris Manzie, and Hayato Nakada. Efficient calibration of real-time model-based controllers for diesel engines–Part I: Approach and drive cycle results. In 2017 IEEE 56th Annual Conference on Decision and Control (CDC), pages 843–848. IEEE, 2017.
  • [36] Rahul Shridhar and Douglas J Cooper. A tuning strategy for unconstrained multivariable model predictive control. Industrial & engineering chemistry research, 37(10):4003–4016, 1998.
  • [37] Farshud Sorourifar, Georgios Makrygirgos, Ali Mesbah, and Joel A Paulson. A data-driven automatic tuning method for MPC under uncertainty using constrained Bayesian optimization. arXiv preprint arXiv:2011.11841, 2020.
  • [38] Ryohei Suzuki, Fukiko Kawai, Hideyuki Ito, Chikashi Nakazawa, Yoshikazu Fukuyama, and Eitaro Aiyoshi. Automatic tuning of model predictive control using particle swarm optimization. In 2007 IEEE Swarm Intelligence Symposium, pages 221–226. IEEE, 2007.
  • [39] C.A. Tracy and H. Widom. On orthogonal and symplectic matrix ensembles. Communications in Mathematical Physics, 177(3):727–754, 1996.
  • [40] JH Van der Lee, WY Svrcek, and BR Young. A tuning algorithm for model predictive controllers based on genetic algorithms and fuzzy decision making. ISA transactions, 47(1):53–59, 2008.
  • [41] Johan Wahlström and Lars Eriksson. Modelling diesel engines with a variable-geometry turbocharger and exhaust gas recirculation by optimization of model parameters for capturing non-linear system dynamics. Proceedings of the Institution of Mechanical Engineers, Part D: Journal of Automobile Engineering, 225(7):960–986, 2011.
  • [42] Liuping Wang. Model predictive control system design and implementation using MATLAB®. Springer Science & Business Media, 2009.
  • [43] André Shigueo Yamashita, Paulo Martin Alexandre, Antonio Carlos Zanin, and Darci Odloak. Reference trajectory tuning of model predictive control. Control Engineering Practice, 50:1–11, 2016.
  • [44] Tuo Zhao, Zhaoran Wang, and Han Liu. A nonconvex optimization framework for low rank matrix estimation. In Advances in Neural Information Processing Systems, pages 559–567, 2015.
  • [45] Zhihui Zhu, Qiuwei Li, Gongguo Tang, and Michael B Wakin. Global optimality in low-rank matrix optimization. IEEE Transactions on Signal Processing, 66(13):3614–3628, 2018.