
Exploring the use of deep learning in task-flexible ILC*

Anantha Sai Hariharan Vinjarapu, Yorick Broens, Hans Butler, and Roland Tóth

*This work has received funding from the ECSEL Joint Undertaking under grant agreement No 875999 and from the European Union within the framework of the National Laboratory for Autonomous Systems (RRF-2.3.1-21.2022-00002). A.S.H. Vinjarapu, Y. Broens, R. Tóth and H. Butler are with the Department of Electrical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands. R. Tóth is also affiliated with the Systems and Control Laboratory, Institute for Computer Science and Control, Hungary. H. Butler is also affiliated with ASML, Veldhoven, The Netherlands. Email: [email protected].
Abstract

Growing demands in today’s industry result in increasingly stringent performance and throughput specifications. For accurate positioning of high-precision motion systems, feedforward control plays a crucial role. Nonetheless, conventional model-based feedforward approaches are no longer sufficient to satisfy the challenging performance requirements. An attractive method for systems with repetitive motion tasks is iterative learning control (ILC) due to its superior performance. However, for systems with non-repetitive motion tasks, ILC is generally not applicable, despite some recent promising advances. In this paper, we aim to explore the use of deep learning to address the task flexibility constraint of ILC. For this purpose, a novel Task Analogy based Imitation Learning (TAIL)-ILC approach is developed. To benchmark the performance of the proposed approach, a simulation study is presented which compares TAIL-ILC to classical model-based feedforward strategies and existing learning-based approaches, such as neural network based feedforward learning.

I Introduction

High-precision positioning systems are essential components in modern manufacturing machines and scientific equipment, see [1, 2, 3, 4]. To ensure high-throughput and high-accuracy position tracking, a two-degree-of-freedom controller structure, consisting of a feedback controller and a feedforward controller, is commonly utilized, see [5, 6, 7]. The feedback controller maintains closed-loop stability and disturbance rejection, while the feedforward controller is primarily responsible for achieving optimal position tracking performance, see [8]. Nonetheless, with the increasingly stringent demands in contemporary industry, conventional model-based feedforward techniques, e.g. [9], are no longer adequate to meet the desired performance specifications, thus necessitating alternative feedforward approaches.

Iterative Learning Control (ILC), see [10], has emerged as a viable choice for feedforward control in motion systems that execute recurring tasks, enabling accurate position tracking. Despite its advantages, ILC exhibits significant limitations. Primarily, ILC is dependent on the assumption that the tracking error recurs from one iteration to the next, limiting its general applicability. Additionally, conventional ILC performance is constrained to a single task, see [11].

Several studies have attempted to address the task flexibility limitations of ILC by drawing on concepts from machine learning and system identification, as reported in the literature [12, 13, 14]. However, the findings from the related literature suggest that there exists a trade-off between the achievable position tracking performance and the degree of deviation from the core principle of ILC, i.e., direct iterative manipulation of signals. Instead of compromising local ILC performance to enhance task flexibility, the aim is to develop a learning-based feedforward strategy that can deliver superior position tracking performance regardless of the severity of the variation of the compensatory signal across tasks. Such an ILC variant can be imagined to make use of imitation learning in order to mimic the behaviour of conventional ILC policies generalized over multiple trajectories.

This paper introduces a novel approach to ILC, termed Task Analogy based Imitation Learning (TAIL)-ILC, from a data science perspective. By acquiring spatial feature analogies of the trajectories and their corresponding control signals, performance of conventional ILC policies can be replicated. To facilitate efficient network training, abstract lower-dimensional representations of signals are utilized. This approach offers numerous benefits in terms of training and prediction time efficiency, utilization of large datasets, and high sampling rate handling. The resulting feedforward controller comprises an encoding policy, a learning policy, and a decoding policy arranged in a cascade interconnection. Dual principal component analysis (DPCA), a standard linear dimensionality reduction technique, is utilized for the integration of the encoding and decoding policies, while a deep neural network is employed for the learning policy.

The main contributions of this paper are:

  • (C1)

    A novel TAIL-ILC approach that tackles the task extension problem of ILC via learning spatial feature analogies of trajectories and their compensation signals, enabling direct imitation of ILC policies.

  • (C2)

    An efficient implementation strategy for the learning-based feedforward controller, constructed as the cascade interconnection of an encoder, a deep neural network, and a decoder.

This paper is organized as follows. First, the problem formulation is presented in Section II. Next, Section III presents the proposed novel TAIL-ILC approach which aims at generalizing ILC performance across various tasks through imitation learning strategies. Section IV provides a simulation study of the proposed approach with respect to existing feedforward strategies using a high-fidelity model of a moving-magnet planar actuator. In Section V, a detailed comparison between the proposed TAIL-ILC approach and neural-network-based feedforward strategies is presented. Finally, conclusions on the proposed approach are presented in Section VI.

II Problem statement

II-A Background

Consider the conventional frequency domain ILC configuration illustrated by Figure 1, where $P\in\mathcal{R}^{n_{\mathrm{y}}\times n_{\mathrm{u}}}$ corresponds to the proper transfer matrix representation of a discrete-time (DT) linear time-invariant (LTI) multiple-input multiple-output (MIMO) plant, with $\mathcal{R}$ denoting the set of real rational functions in the complex variable $z\in\mathbb{C}$. Furthermore, the proper $K\in\mathcal{R}^{n_{\mathrm{u}}\times n_{\mathrm{y}}}$ represents an LTI stabilizing DT feedback controller, which is typically constructed using rigid-body decoupling strategies, see [15]. The aim of the conventional frequency domain ILC framework is to construct an optimal feedforward policy $f$, which minimizes the position tracking error $e$ in the presence of the motion trajectory $r$. Under the assumption that the reference trajectory is trial invariant, the error propagation per trial $k\in\mathbb{N}_{\geq 0}$ is given by:

$e_{k}=Sr-Jf_{k},$ (1)

where $S=(I+PK)^{-1}$ and $J=(I+PK)^{-1}P$. Generally, the update law for the feedforward policy is in accordance with the procedure outlined in [16]:

$f_{k+1}=Q\left(Le_{k}+f_{k}\right),$ (2)

where $L\in\mathcal{RL}_{\infty}^{n_{\mathrm{u}}\times n_{\mathrm{y}}}$ is a learning filter and $Q\in\mathcal{RL}_{\infty}^{n_{\mathrm{u}}\times n_{\mathrm{u}}}$ denotes a robustness filter, with $\mathcal{RL}_{\infty}$ corresponding to the set of real rational functions in $z$ that have bounded singular values on the unit circle $\mathbb{D}=\{e^{\mathrm{i}\omega}\mid\omega\in[0,2\pi]\}$, i.e., finite $\mathcal{L}_{\infty}(\mathbb{D})$ norm. Both $L$ and $Q$ are required to be designed for the ILC task at hand. Furthermore, by combining (1) and (2), the progression of the error and feedforward update is reformulated as:

$e_{k+1}=(I-JQJ^{-1})Sr+JQ(J^{-1}-L)e_{k},$ (3a)
$f_{k+1}=QLSr+Q(I-LJ)f_{k},$ (3b)

which can be reduced to:

$e_{k+1}=(I-Q)Sr+Q(I-JL)e_{k},$ (4a)
$f_{k+1}=QLSr+Q(I-LJ)f_{k},$ (4b)

under the assumption that $Q$ is diagonal and $J$ is approximately diagonal, which holds in case of rigid-body decoupled systems.

From (4), several observations can be made. First, it can be observed that the contribution of $r$ to the position tracking error is dependent on the robustness filter $Q$, which is optimally chosen as identity to negate the contribution of the reference trajectory towards the tracking error. Secondly, the learning filter $L$ aims to satisfy the convergence criterion $\|Q(I-JL)\|_{\infty}<1$, where $\|\cdot\|_{\infty}$ stands for the $\mathcal{H}_{\infty}$ norm, such that the tracking error is steered to zero, which is optimally achieved when $L=J^{-1}$. Note that these choices of $Q$ and $L$ yield the optimal feedforward update $f_{k+1}=P^{-1}r$, which results in perfect position tracking. Moreover, when the convergence criterion is satisfied, the limit policies, i.e., $e_{\infty}=\lim_{k\rightarrow\infty}e_{k}$ and $f_{\infty}=\lim_{k\rightarrow\infty}f_{k}$, correspond to:

$e_{\infty}=\bigl(I-J\bigl(I-Q(I-LJ)\bigr)^{-1}QL\bigr)Sr,$ (5a)
$f_{\infty}=\bigl(I-Q(I-LJ)\bigr)^{-1}QLSr.$ (5b)

In spite of its simplicity and efficacy, the conventional ILC is hindered by significant limitations, the most notable of which is its confinement to a single task. Consequently, its practical utility is restricted to particular types of machinery.

Figure 1: Control structure with the conventional ILC configuration.
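
To make the trial-domain recursion in (1)-(2) concrete, the following is a minimal numerical sketch using finite-horizon (lifted) convolution matrices for $S$ and $J$, the near-optimal choices $L=J^{-1}$ and $Q\approx I$, and purely illustrative impulse responses; none of the values correspond to the plant considered later in the paper.

```python
# Minimal sketch of the ILC iteration (1)-(2) over a finite trial, using
# lower-triangular (convolution) matrix representations of S and J.
# Impulse responses, trial length and filter choices are illustrative assumptions.
import numpy as np
from scipy.linalg import toeplitz

n_d = 200                                    # assumed trial length
t = np.arange(n_d)
r = np.sin(2 * np.pi * t / n_d)              # illustrative reference trajectory

h_S = (0.5 ** t) * (t < 20)                  # assumed impulse response of S
h_J = 0.02 * (0.8 ** t) * (t < 40)           # assumed impulse response of J
S = toeplitz(h_S, np.zeros(n_d))             # finite-horizon convolution matrices
J = toeplitz(h_J, np.zeros(n_d))

L = np.linalg.inv(J)                         # learning filter L = J^{-1} (optimal choice)
Q = 0.99 * np.eye(n_d)                       # robustness filter close to identity

f = np.zeros(n_d)
for k in range(10):                          # ILC trials
    e = S @ r - J @ f                        # error propagation (1)
    f = Q @ (L @ e + f)                      # feedforward update (2)
    print(f"trial {k}: ||e_k||_2 = {np.linalg.norm(e):.3e}")
```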

II-B Problem formulation

The aim of this paper is to address the challenge of augmenting the task-flexibility of the conventional ILC by utilizing an imitation learning based controller. This approach facilitates the generalization of the optimal feedforward policy, created by the conventional ILC, to a wider range of motion profiles. The primary objective of this paper is to devise a feedforward controller that employs a learning-based mechanism, which satisfies the following requirements:

  • (R1)

    The learning-based feedforward approach enables the generalization of the performance of the conventional ILC across multiple trajectories.

  • (R2)

    The scalability of the learning-based feedforward approach is imperative for its implementation in systems with a high sampling rate.

III TAIL-ILC

III-A Approach

For a given dynamic system with a proper discrete transfer function $G\in\mathcal{R}$ under a sampling time $T_{\mathrm{s}}\in\mathbb{R}_{+}$, a reference trajectory $r$ of duration $T=n_{\mathrm{d}}T_{\mathrm{s}}$ seconds can be defined as

$r=\begin{bmatrix}r(0)&\cdots&r(n_{\mathrm{d}})\end{bmatrix}^{\top},$ (6)

where $n_{\mathrm{d}}$ corresponds to the length of the signal in DT. This reference trajectory can, for example, correspond to an $n^{\mathrm{th}}$ order motion profile. A trajectory class $C\subset\mathbb{R}^{n_{\mathrm{d}}\times n_{\mathrm{t}}}$ is defined as a collection of reference trajectories such that each trajectory shares certain prominent spatial features (motion profile order, constant velocity interval length, etc.) with the others, where $n_{\mathrm{t}}$ is the number of trajectories:

$C=\{r_{1},r_{2},r_{3},\ldots,r_{n_{\mathrm{t}}}\}.$ (7)

Given a specific combination of the $L$ and $Q$ filters, consider that an ILC policy $\pi^{*}$ exists which maps a given reference trajectory $r$ to the optimal feedforward compensation signal $f^{*}$, see (5). This can be formally expressed as:

$\pi^{*}:r_{i}\rightarrow f^{*}_{i}.$ (8)

Henceforth, $\pi^{*}$ shall be denoted as the expert policy, which is equipped with learning and robustness filters established through a process model. Our objective is to formulate an optimal student policy $\pi_{\mathrm{s}}^{*}$ that approximates the performance of the optimal policy $\pi^{*}$ over a set of trajectories from the pertinent trajectory class. To this end, we endeavor to determine $\pi_{\mathrm{s}}^{*}$ as a solution to the optimization problem:

$\pi_{\mathrm{s}}^{*}=\underset{\pi_{\mathrm{s}}}{\arg\min}\ \eta(\pi^{*}(r_{i}),\pi_{\mathrm{s}}(r_{i})),\quad\forall i\in[1,n_{\mathrm{t}}],$ (9)
Figure 2: Offline TAIL-ILC with $r_{i},f_{i}\in\mathbb{R}^{n_{\mathrm{d}}\times 1}$.

where $r_{i}\sim C$, $\eta(\cdot,\cdot)$ is a performance quantification measure, and $\pi_{\mathrm{s}}$ are parameterized student policy candidates. The expert policy $\pi^{*}$ is a conventionally designed frequency domain ILC as described in Section II-A. In TAIL-ILC, the idea is to structure $\pi_{\mathrm{s}}$ as:

$\pi_{\mathrm{s}}=\pi_{\mathrm{D,Y}}\circ\pi_{\mathrm{C}}\circ\pi_{\mathrm{E,U}},$ (10)

which is visualised in Figure 2. The TAIL-ILC controller is capable of generating a feedforward control signal based on a given reference trajectory. This process is carried out through a series of three sub-policies outlined in equation (10). The first sub-policy, $\pi_{\mathrm{E,U}}$, projects the reference trajectory $r_{i}\in\mathbb{R}^{n_{\mathrm{d}}\times 1}$ into a lower-dimensional space referred to as the latent space. Next, the second sub-policy, $\pi_{\mathrm{C}}$, predicts a latent space representation of the feedforward signal, which is then fed into the third sub-policy, $\pi_{\mathrm{D,Y}}$, to project the latent space feedforward signal back into the higher-dimensional output space, resulting in $f_{i}\in\mathbb{R}^{n_{\mathrm{d}}\times 1}$. Notably, the successful application of TAIL-ILC requires that all reference trajectories share certain spatial features with each other. The prediction sub-policy, $\pi_{\mathrm{C}}$, is trained on a set of reference trajectories and their corresponding feedforward control signals obtained using $\pi^{*}$, which are projected into the latent space. The use of abstract representations enables the preservation of the most significant information of the signals while simultaneously reducing the amount of data used for making predictions, resulting in several advantages, such as increased training and prediction time efficiencies. The subsequent sub-section will delve into the development of each sub-policy in further detail.
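
As an illustration of (10), the student policy can be written as a thin cascade of the two DPCA maps from Section III-C and a latent regressor. In the sketch below, `T_E`, `T_D`, and `net` are assumed placeholders for the encoding matrix, the decoding matrix, and the trained $\pi_{\mathrm{C}}$ network; this is a sketch of the structure, not the authors' implementation.

```python
# Sketch of the student policy (10): pi_s = pi_D,Y o pi_C o pi_E,U.
# T_E (n_l x n_d), T_D (n_d x n_l) and net (latent regressor) are assumed inputs.
import numpy as np

def student_policy(r, T_E, T_D, net):
    r_l = T_E @ r            # pi_E,U: encode the reference into the latent space
    f_l_hat = net(r_l)       # pi_C  : predict the latent feedforward representation
    return T_D @ f_l_hat     # pi_D,Y: decode back to an n_d-dimensional signal

# Example call pattern with placeholder maps and an identity latent regressor:
# n_d, n_l = 1000, 20
# T_E, T_D = np.random.randn(n_l, n_d), np.random.randn(n_d, n_l)
# f_hat = student_policy(np.random.randn(n_d), T_E, T_D, net=lambda x: x)
```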

III-B Student policy $\pi_{\mathrm{s}}$

The student policy $\pi_{\mathrm{s}}:r\rightarrow f_{\pi_{\mathrm{s}}}$ can be decomposed into three distinct components:

$\pi_{\mathrm{E,U}}:r\rightarrow r_{l}$ (11a)
$\pi_{\mathrm{C}}:r_{l}\rightarrow f_{l}$ (11b)
$\pi_{\mathrm{D,Y}}:f_{l}\rightarrow f_{\pi_{\mathrm{s}}}$ (11c)

where $r,f_{\pi_{\mathrm{s}}}\in\mathbb{R}^{n_{\mathrm{d}}\times 1}$, $r_{l},f_{l}\in\mathbb{R}^{n_{l}\times 1}$, and $n_{l}$ is the latent space dimensionality such that $n_{l}\ll n_{\mathrm{d}}$. As mentioned in Section III-A, the training data for the sub-policy $\pi_{\mathrm{C}}$, namely the pairs $\{r_{i,l},f_{i,l}\}$, are in the latent space. This shows that the ideal outputs of $\pi_{\mathrm{s}}$ are of the form:

$f_{\pi_{\mathrm{s}}}=\pi_{\mathrm{D,Y}}(f_{l})=f^{\prime}\approx f^{*},$ (12)

where an approximation error may exist between $f^{\prime}$ and $f^{*}$. Additionally, we aim at:

$\pi_{\mathrm{C}}(r_{l})=\widehat{f_{l}}\approx f_{l},$ (13)

where, in case of using a deep neural network, $\widehat{f_{l}}$ is the output of the network and the prediction error $e_{\mathrm{pred}}$ is defined as:

$e_{\mathrm{pred}}=\|f_{l}-\widehat{f_{l}}\|_{2},$ (14)

where $\|\cdot\|_{2}$ denotes the $\ell_{2}$ norm. Moreover, this implies that (12) becomes:

$f_{\pi_{\mathrm{s}}}=\pi_{\mathrm{D,Y}}(\widehat{f_{l}})=\widehat{f^{\prime}}.$ (15)

In order to quantify the gap between the performance of $\pi^{*}$ and that of $\pi_{\mathrm{s}}$, a distance measure is used as the performance quantification measure $\eta$ in (9). This is expressed as:

$\eta(\pi^{*},\pi_{\mathrm{s}})=\frac{1}{n_{\mathrm{t}}}\sum_{i=1}^{n_{\mathrm{t}}}\|f_{i}-\widehat{f^{\prime}_{i}}\|_{2}.$ (16)
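
Read directly, (16) is the mean $\ell_{2}$ distance between the expert feedforward signals and the student predictions over the trajectory set; a minimal sketch, assuming both signal sets are stacked column-wise into arrays:

```python
# Sketch of the performance measure eta in (16); F and F_hat are assumed to be
# n_d x n_t arrays stacking the expert signals f_i and the student predictions.
import numpy as np

def eta(F, F_hat):
    return np.mean(np.linalg.norm(F - F_hat, axis=0))
```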

Assuming that $\mu$ represents the set of weights and biases of the deep neural network, improving the performance of $\pi_{\mathrm{s}}$ can be posed as the following optimization problem:

$\underset{n_{l},\mu}{\arg\min}\ \eta(\pi^{*},\pi_{\mathrm{s}}).$ (17)

The proposed approach involves propagating the measure $\eta$ through the three sub-policies, with the aim of iteratively optimizing both $n_{l}$ and $\mu$ via (17). However, given the significant computational burden associated with this approach, there is a need for a more straightforward alternative or a reformulation of the problem. With this goal in mind, we introduce the concepts of the Expert space and Student space to provide alternative perspectives for addressing the optimization problem at hand.

Definition 1

The expert space is defined as the space of all real policies, denoted by the superscript ${}^{\mathrm{e}}$, having the form

$\pi^{\mathrm{e}}:\mathbb{R}^{n_{\mathrm{x}}\times 1}\rightarrow\mathbb{R}^{n_{\mathrm{d}}\times 1}\quad\forall n_{\mathrm{x}}\in\mathbb{N}$

Example:

  1. Expert policy in expert space:

     $\pi_{\mathrm{e}}^{\mathrm{e}}:r\rightarrow f^{\prime}$ (18)

     where $r,f^{\prime}\in\mathbb{R}^{n_{\mathrm{d}}\times 1}$.

  2. Student policy in expert space:

     $\pi_{\mathrm{s}}^{\mathrm{e}}:r\rightarrow\widehat{f^{\prime}}$ (19)

     where $r,\widehat{f^{\prime}}\in\mathbb{R}^{n_{\mathrm{d}}\times 1}$.

Definition 2

The student space is defined as the space of all real policies, denoted by the superscript ${}^{\mathrm{s}}$, having the form

$\pi^{\mathrm{s}}:\mathbb{R}^{n_{\mathrm{x}}\times 1}\rightarrow\mathbb{R}^{n_{l}\times 1}\quad\forall n_{\mathrm{x}}\in\mathbb{N}$

Example:

  1. Expert policy in student space:

     $\pi_{\mathrm{e}}^{\mathrm{s}}:r_{l}\rightarrow f_{l}$ (20)

     where $r_{l},f_{l}\in\mathbb{R}^{n_{l}\times 1}$.

  2. Student policy in student space:

     $\pi_{\mathrm{s}}^{\mathrm{s}}:r_{l}\rightarrow\widehat{f_{l}}$ (21)

     where $r_{l},\widehat{f_{l}}\in\mathbb{R}^{n_{l}\times 1}$.

Table I summarizes these definitions.

Stated differently, the expert space comprises all the decoding policies, $\pi_{\mathrm{D}}$, which map signals into $n_{\mathrm{d}}$ dimensions, while the student space comprises all the encoding policies, $\pi_{\mathrm{E}}$, which map signals into $n_{l}$ dimensions.

TABLE I: Expert and student policies in expert and student spaces
                                    | Expert space (${}^{\mathrm{e}}$)                                 | Student space (${}^{\mathrm{s}}$)
Expert policy ($\pi_{\mathrm{e}}$)  | $\pi_{\mathrm{e}}^{\mathrm{e}}:r\rightarrow f^{\prime}$          | $\pi_{\mathrm{e}}^{\mathrm{s}}:r_{l}\rightarrow f_{l}$
Student policy ($\pi_{\mathrm{s}}$) | $\pi_{\mathrm{s}}^{\mathrm{e}}:r\rightarrow\widehat{f^{\prime}}$ | $\pi_{\mathrm{s}}^{\mathrm{s}}:r_{l}\rightarrow\widehat{f_{l}}$

Based on the preceding definitions, our primary objective is to determine the student policy in the expert space, $\pi_{\mathrm{s}}^{\mathrm{e}}$. In light of these definitions, the distance measure specified in (16) can be reformulated as:

$\eta(\pi^{*},\pi_{\mathrm{s}}^{\mathrm{e}})=\eta(\pi^{*},\pi_{\mathrm{e}}^{\mathrm{e}})+\eta(\pi_{\mathrm{e}}^{\mathrm{e}},\pi_{\mathrm{s}}^{\mathrm{e}})$ (22)
$\implies\eta(\pi^{*},\pi_{\mathrm{s}}^{\mathrm{e}})=\frac{1}{n_{\mathrm{t}}}\sum_{i=1}^{n_{\mathrm{t}}}\|f_{i}-f^{\prime}_{i}\|_{2}+\frac{1}{n_{\mathrm{t}}}\sum_{i=1}^{n_{\mathrm{t}}}\|f^{\prime}_{i}-\widehat{f^{\prime}_{i}}\|_{2},$

where $\eta(\pi^{*},\pi_{\mathrm{e}}^{\mathrm{e}})$ corresponds to the optimization of $n_{l}$ and $\eta(\pi_{\mathrm{e}}^{\mathrm{e}},\pi_{\mathrm{s}}^{\mathrm{e}})$ corresponds to the optimization of $\mu$. This separation of the distance measure (16) allows the optimization problem in (17) to be segmented as:

$\underset{n_{l},\mu}{\arg\min}\ \eta(\pi^{*},\pi_{\mathrm{s}}^{\mathrm{e}})=\underset{n_{l}}{\arg\min}\ \eta(\pi^{*},\pi_{\mathrm{e}}^{\mathrm{e}})+\underset{\mu}{\arg\min}\ \eta(\pi_{\mathrm{e}}^{\mathrm{e}},\pi_{\mathrm{s}}^{\mathrm{e}}).$

This segregation allows us to optimize $n_{l}$ independently of $\mu$, thus simplifying the optimization problem defined by (17).
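
One possible workflow that follows from this separation is to first sweep $n_{l}$ using only the DPCA reconstruction error and then train the network weights $\mu$ on the resulting latent pairs. In the sketch below, `dpca_maps` is a hypothetical helper returning $(T_{\mathrm{E}},T_{\mathrm{D}})$ for a given latent dimension (one possible realization is sketched in Section III-C), and `F` is an assumed matrix of expert feedforward signals:

```python
# Sketch of the separated optimization: pick n_l from eta(pi*, pi_e^e) alone,
# then optimize the network weights mu separately on the latent pairs.
# F is an assumed n_d x n_t array of expert feedforward signals f_i; dpca_maps
# is a hypothetical helper returning (T_E, T_D) for a given latent dimension.
import numpy as np

def choose_latent_dim(F, candidates, dpca_maps):
    errors = []
    for n_l in candidates:
        T_E, T_D = dpca_maps(F, n_l)                   # encode/decode maps for this n_l
        F_rec = T_D @ (T_E @ F)                        # decode(encode(f_i)) for all i
        errors.append(np.mean(np.linalg.norm(F - F_rec, axis=0)))
    return candidates[int(np.argmin(errors))]          # n_l minimizing eta(pi*, pi_e^e)

# The weights mu of pi_C are then trained on the latent pairs {r_{i,l}, f_{i,l}},
# e.g. with the architecture listed in Table II, independently of this choice.
```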

III-C Choice of encoding and decoding sub-policies

The encoding and decoding sub-policies in this work employ DPCA, a well-established linear dimensionality reduction technique, due to its computational simplicity. Other commonly-used linear and non-linear dimensionality reduction methods are also available and have been reviewed in [17]. DPCA involves the identification of a linear subspace with $n_{l}$ dimensions in an $n_{\mathrm{d}}$ dimensional space, where $n_{l}$ is significantly smaller than $n_{\mathrm{d}}$. This subspace is defined by a set of orthonormal bases that maximize the variance of the original data when projected onto this subspace. The orthonormal bases computed through this process are commonly referred to as principal components.

Definition 3

A data point in an arbitrary dataset $H\in\mathbb{R}^{n_{\mathrm{x}}\times n_{\mathrm{t}}}$ is defined as a column vector $r_{i}\in\mathbb{R}^{n_{\mathrm{x}}\times 1}$, $\forall i\in[1,n_{\mathrm{t}}]$.

The selection of the principal components for an $n_{l}$ dimensional latent space for the data points in $C$ involves choosing the right singular vectors that correspond to the first $n_{l}$ singular values of $H$. It should be emphasized that the projection of a data point onto the latent space can be computed as:

$r_{l}=T_{\mathrm{E}}r,$ (23)

where:

$T_{\mathrm{E}}=\widehat{\Sigma}^{-1}V^{\top}H^{\top}.$ (24)

In this context, $r_{l}\in\mathbb{R}^{n_{l}\times 1}$, $V\in\mathbb{R}^{n_{\mathrm{t}}\times n_{\mathrm{t}}}$ denotes the matrix of right singular vectors of $H$, and $\widehat{\Sigma}\in\mathbb{R}^{n_{l}\times n_{\mathrm{t}}}$ contains the first $n_{l}$ singular values of $H$ along its diagonal elements. It is worth noting that the value of $n_{l}$ is constrained by the number of data points in $H$. This feature of DPCA is particularly advantageous in situations where $n_{\mathrm{d}}\gg n_{\mathrm{t}}$. Given the latent space representation $r_{l}$, a reconstructed data point $r^{\prime}$ can be obtained as:

$r^{\prime}=T_{\mathrm{D}}r_{l},\qquad r^{\prime}\in\mathbb{R}^{n_{\mathrm{d}}\times 1},$ (25)

where:

$T_{\mathrm{D}}=HV\widehat{\Sigma}^{-1}.$ (26)
Remark 1

The computation of the transformations $T_{\mathrm{E}}$ and $T_{\mathrm{D}}$ depends on $n_{l}$. Additionally, considering that we have access to the dataset $H$, the matrices $\widehat{\Sigma}^{-1}V^{\top}H^{\top}$ and $HV\widehat{\Sigma}^{-1}$ on the right-hand sides of (23) and (25) become constant for a specific problem for a given choice of $n_{l}$.

In light of Remark 1, for a given dataset $H\in\mathbb{R}^{n_{\mathrm{d}}\times n_{\mathrm{t}}}$, the encoding ($\pi_{\mathrm{E,U}}$) and decoding ($\pi_{\mathrm{D,Y}}$) sub-policies for use in the student policy $\pi_{\mathrm{s}}$ can be defined as follows:

$\pi_{\mathrm{E,U}}(r)=T_{\mathrm{E}}r,$ (27a)
$\pi_{\mathrm{D,Y}}(\widehat{f_{l}})=T_{\mathrm{D}}\widehat{f_{l}}.$ (27b)
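
A compact sketch of (23)-(27) via the thin singular value decomposition $H=U\Sigma V^{\top}$ is given below; the synthetic trajectories and the chosen $n_{l}$ are illustrative assumptions and do not correspond to the datasets of Section IV.

```python
# DPCA sketch for (23)-(27): H stacks n_t trajectories of length n_d as columns.
# The synthetic trajectories and the choice of n_l are illustrative assumptions.
import numpy as np

n_d, n_t, n_l = 2000, 100, 20
time = np.linspace(0.0, 1.0, n_d)
H = np.stack([np.sin(2 * np.pi * (1 + 0.02 * i) * time) for i in range(n_t)], axis=1)

U, s, Vt = np.linalg.svd(H, full_matrices=False)      # thin SVD, H = U Sigma V^T
V_l = Vt[:n_l, :].T                                   # first n_l right singular vectors
Sigma_l_inv = np.diag(1.0 / s[:n_l])

T_E = Sigma_l_inv @ V_l.T @ H.T                       # encoding map (24), n_l x n_d
T_D = H @ V_l @ Sigma_l_inv                           # decoding map (26), n_d x n_l

r = H[:, 0]                                           # a data point from the dataset
r_l = T_E @ r                                         # latent representation (23)
r_rec = T_D @ r_l                                     # reconstruction (25)
print("relative reconstruction error:",
      np.linalg.norm(r - r_rec) / np.linalg.norm(r))
```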

IV Simulation study

This section presents a simulation study comparing the TAIL-ILC approach with classical ILC, an artificial neural network (ANN) based ILC referred to as NN-ILC, see [14], and conventional rigid-body feedforward, see [7, 18], which is obtained by multiplying the acceleration profile with the inverted rigid-body dynamics of the system:

$C_{\mathrm{FF}}=m\ddot{r}_{i}.$ (28)
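
For a sampled reference, (28) amounts to scaling a numerically differentiated acceleration profile by the moving mass; a minimal sketch, with an assumed value for the mass and the sample time implied by the 2.5 s / 20833-sample trajectories of this section:

```python
# Sketch of the rigid-body (mass) feedforward (28); m and the reference are
# illustrative assumptions, Ts follows from 2.5 s / 20833 samples (Section IV).
import numpy as np

Ts, m = 2.5 / 20833, 10.0                     # assumed sample time [s] and mass [kg]
t = np.arange(20833) * Ts
r = 0.1 * np.sin(2 * np.pi * t / t[-1])       # placeholder reference position [m]
r_ddot = np.gradient(np.gradient(r, Ts), Ts)  # finite-difference acceleration
c_ff = m * r_ddot                             # feedforward force C_FF = m * r_ddot
```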

To facilitate simulation, a high-fidelity model of a moving-magnet planar actuator (MMPA), depicted in Figure 3, is considered. A detailed description of an MMPA system is given in [19].

Figure 3: Schematic representation of a MMPA model.

Table II provides a concise overview of the network architecture and training specifics for the sub-policy $\pi_{\mathrm{C}}$ in TAIL-ILC and the policy $\pi_{\mathrm{NN}}$ in NN-ILC, respectively. For the sake of comparability, the training parameters are kept consistent between the two networks. The networks are designed and trained using the Deep Learning toolbox in MATLAB 2019b, employing the default random parameter initialization.

The training set consists of 618 trajectories, while the test set includes 42 trajectories, each of which is 2.5 seconds long with a total of 20833 time samples. Each trajectory corresponds to a fourth-order motion profile, designed based on the approach presented in [20], and is parameterized with five parameters in the spatial domain. Individual trajectories are then generated by sweeping over a grid of values for each of these parameters. The objective of this study is to evaluate and compare the performance of the previously mentioned feedforward approaches against the expert ILC policy $\pi^{*}$, which is the traditional ILC optimized for multiple trajectories of the same class. The primary aim of ILC in this context is to mitigate any unaccounted-for residual dynamics in the system and enhance classical model-based feedforward. Consequently, we also compare the combined performance of the student policies with classical feedforward controllers. We demonstrate the tracking ability of TAIL-ILC and NN-ILC on two reference trajectories, namely $r_{1}$ and $r_{2}$, which belong to the same class and are shown in Figure 4. $r_{1}$ is a randomly chosen trajectory from the training set, while $r_{2}$ is a previously unseen trajectory.

TABLE II: Architecture and training details of the NNs

Parameter                          | TAIL-ILC         | NN-ILC
No. of neurons in the input layer  | 618              | 4
No. of hidden layers               | 3                | 3
No. of neurons in hidden layers    | 800              | 6
Activation                         | ReLU             | ReLU
No. of neurons in the output layer | 618              | 1
Learning rate                      | $10^{-3}$        | $10^{-3}$
Epochs                             | 5000             | 5000
Optimizer                          | Adam             | Adam
Minibatch size                     | 128              | 128
Train set                          | 618 trajectories | 618 trajectories
Test set                           | 42 trajectories  | 42 trajectories
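
For reference, the TAIL-ILC column of Table II corresponds to a fully connected network along the lines of the sketch below; the PyTorch phrasing is an assumption (the networks in the paper are built with the MATLAB Deep Learning toolbox), and only the layer sizes and training hyperparameters are taken from the table.

```python
# Sketch of the pi_C architecture from Table II (TAIL-ILC column) in PyTorch.
import torch
import torch.nn as nn

pi_C = nn.Sequential(
    nn.Linear(618, 800), nn.ReLU(),   # input layer (618) -> hidden layer 1 (800)
    nn.Linear(800, 800), nn.ReLU(),   # hidden layer 2
    nn.Linear(800, 800), nn.ReLU(),   # hidden layer 3
    nn.Linear(800, 618),              # output layer (latent feedforward, 618)
)
optimizer = torch.optim.Adam(pi_C.parameters(), lr=1e-3)   # Adam, learning rate 1e-3
loss_fn = nn.MSELoss()   # squared l2 prediction error, cf. (14)
# Training would then run for 5000 epochs with minibatches of size 128.
```
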
Figure 4: $x$-direction of the references $r_{1}$ and $r_{2}$.

IV-A Time domain performance of TAIL-ILC and NN-ILC

A silicon wafer scanning application is considered where the scanning takes place during the constant velocity interval of the motion profile, see [1]. In this context, Figure 5 illustrates the position tracking error in the $x$-direction during the constant velocity interval of the reference trajectories $r_{1}$ and $r_{2}$, respectively. In addition to the performance of mass feedforward, TAIL-ILC and NN-ILC, the figure also indicates the performance of the expert ILC policy, to facilitate the comparison of the two deep learning based ILC variants with the baseline. As demonstrated in the left plot, i.e., the performance of the feedforward controllers on $r_{1}$, the expert ILC policy exhibits the highest overall performance. Nonetheless, it is noteworthy that the TAIL-ILC policy achieves a lower peak tracking error than the alternative feedforward approaches, whereas the NN-ILC policy demonstrates a superior performance in terms of the convergence time of the error. However, when analyzing the right plot, i.e., the performance of the feedforward approaches for the previously unseen trajectory $r_{2}$, the expert ILC policy needs to re-learn the relevant feedforward signal. Conversely, the TAIL-ILC and NN-ILC policies are capable of achieving performance similar to the re-learned expert ILC policy without any further training. Additionally, when combined with a classical mass feedforward controller, both the TAIL-ILC and NN-ILC policies are observed to yield superior performance in terms of peak error and settling time compared to the classical mass feedforward controller alone.

Figure 5: Tracking error for $r_{1}$ (left) and $r_{2}$ (right) during constant velocity.

IV-B TAIL-ILC vs NN-ILC

Table III provides a comparison of the training and prediction properties of the TAIL-ILC and NN-ILC student policies. Here, we compare the following parameters:

  1. $T_{\text{train}}$: Time to train the neural network.

  2. $T_{\text{predict}}$: Time to make predictions for 10 randomly selected test set trajectories.

  3. $e_{\text{train}}$: Control signal prediction error averaged over 10 randomly selected train set trajectories.

  4. $e_{\text{test}}$: Control signal prediction error averaged over 10 randomly selected test set trajectories.

  5. $e_{\text{peak tracking}}$: Peak tracking error achieved with the predicted control signals averaged over 10 randomly selected train set trajectories.

TABLE III: Performance comparison for the $1^{\mathrm{st}}$ degree of freedom

Criterion                          | NN-ILC               | TAIL-ILC
$T_{\text{train}}$                 | 2.5 hr               | 20 min
$T_{\text{predict}}$ (per sample)  | 0.005 sec            | 0.064 sec
$T_{\text{predict}}$ (full signal) | 86 sec               | 0.064 sec
$e_{\text{train}}$                 | 0.0055 N             | 0.0011 N
$e_{\text{test}}$                  | 0.0013 N             | 0.0064 N
$e_{\text{peak tracking}}$         | $1.3\cdot 10^{-7}$ m | $8.3\cdot 10^{-8}$ m

Here, the average control signal prediction errors of the train and the test set trajectories are calculated as the values of the performance measure $\eta$ in (22). As can be seen, although the original signals and trajectories are extremely high-dimensional, the projection of these signals into the latent space in the proposed TAIL-ILC approach results in a significant improvement in training and prediction time compared to the NN-ILC approach.

Moreover, as reflected in Table III and Figure 5, the average signal prediction error is lower for TAIL-ILC for previously seen trajectories, whereas NN-ILC shows better performance for previously unseen trajectories.

V TAIL-ILC vs NN-ILC PERSPECTIVES

In the previous section, we have compared the performance of the TAIL-ILC and NN-ILC controllers for a specific use case. However, it is more natural to view these controllers as individual instances of two fundamentally different perspectives on the problem. Hence, it is important to reflect upon the perspectives that these controllers convey and the consequences for various aspects of the resulting controllers. This is expected to provide a more general explanation for some of the differences observed in the performance of these two controllers.

V-A Time duration of trajectories

NN-ILC and TAIL-ILC are two ILC approaches that differ in their treatment of reference trajectories and feedforward signals. NN-ILC is capable of handling trajectories of different lengths, as it deals with them sample-wise. In contrast, TAIL-ILC processes trajectories and signals in their entirety, making it challenging to manage trajectories of varying durations due to the fixed input-output dimensionality of neural network learning models. Additionally, NN-ILC is better equipped to handle instantaneous changes in reference trajectories compared to TAIL-ILC. A possible solution to reconcile these perspectives is to use a different class of learning models, such as recurrent neural networks.

V-B Training and prediction time efficiencies

In NN-ILC, the training dataset used for $\pi_{\mathrm{NN}}$ encompasses all the samples from all the trajectories in the training set, along with their associated feedforward signals. Conversely, TAIL-ILC employs a training dataset for $\pi_{\mathrm{C}}$ that solely includes the latent space representations of the trajectories and feedforward signals, resulting in a significantly smaller dataset in comparison to the total number of samples. This characteristic leads to TAIL-ILC presenting shorter training and prediction times when compared to NN-ILC, as demonstrated by the results presented in Table III.

V-C Generalizability to previously unseen trajectories

Figure 5 demonstrates that NN-ILC outperforms TAIL-ILC in terms of generalizing performance to previously unobserved trajectories. The improved performance can be attributed to NN-ILC’s treatment of reference trajectories as points in an $n$-dimensional space corresponding to $n^{\mathrm{th}}$ order motion profiles, which allows it to learn a mapping to the corresponding feedforward signal time samples. As a result, the trained network can more accurately extrapolate performance to previously unobserved points in the space of possible reference trajectories. In contrast, TAIL-ILC relies primarily on analogies between individual tasks on a higher level, which may result in suboptimal performance when confronted with previously unobserved trajectories at the sample level.

VI CONCLUSION

In this work, we have primarily explored two different perspectives on the use of deep learning to address the task-flexibility constraint of conventional ILC. While each of the considered approaches has its own advantages and disadvantages, it has been observed that the use of deep learning techniques in general could be a useful direction for future research in designing task-flexible ILC variants.

References

  • [1] H. Butler, “Position control in lithographic equipment [applications of control],” IEEE Control Systems Magazine, vol. 31, no. 5, pp. 28–47, 2011.
  • [2] N. Tamer and M. Dahleh, “Feedback control of piezoelectric tube scanners,” in Proceedings of the 33rd IEEE Conference on Decision and Control, vol. 2, pp. 1826–1831, 1994.
  • [3] M. Heertjes, “Data-based motion control of wafer scanners,” IFAC-PapersOnLine, vol. 49, no. 13, pp. 1–12, 2016. 12th IFAC Workshop on ALCOSP 2016.
  • [4] X. Ye, Y. Zhang, and Y. Sun, “Robotic pick-place of nanowires for electromechanical characterization,” in Proc. of the 2012 IEEE International Conference on Robotics and Automation, pp. 2755–2760, 2012.
  • [5] M. Boerlage, M. Steinbuch, P. Lambrechts, and M. van de Wal, “Model-based feedforward for motion systems,” in Proceedings of the 2003 IEEE Conference on Control Applications (CCA), vol. 2, pp. 1158–1163, 2003.
  • [6] T. Oomen, “Advanced motion control for precision mechatronics: control, identification, and learning of complex systems,” IEEJ Journal of Industry Applications, vol. 7, pp. 127–140, Jan. 2018.
  • [7] M. Steinbuch, R. Merry, M. Boerlage, M. Ronde, and M. van de Molengraft, Advanced Motion Control Design, pp. 27–1/25. CRC Press, 2010.
  • [8] M. Heertjes, D. Hennekens, and M. Steinbuch, “Mimo feed-forward design in wafer scanners using a gradient approximation-based algorithm,” Control Engineering Practice, vol. 18, pp. 495–506, 05 2010.
  • [9] T. Oomen and M. Steinbuch, “Model-based control for high-tech mechatronic systems,” in Mechatronics and Robotics, pp. 51–80, CRC Press, 2020.
  • [10] H.-S. Ahn, Y. Chen, and K. L. Moore, “Iterative learning control: Brief survey and categorization,” IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 37, no. 6, pp. 1099–1121, 2007.
  • [11] L. Blanken, J. Willems, S. Koekebakker, and T. Oomen, “Design techniques for multivariable ilc: Application to an industrial flatbed printer,” IFAC-PapersOnLine, vol. 49, no. 21, pp. 213–221, 2016. 7th IFAC Symposium on Mechatronic Systems MECHATRONICS 2016.
  • [12] J. S. van Hulst, “Rational basis functions to attenuate vibrating flexible modes with compensation of input nonlinearity: Applied to semiconductor wire bonder,” 1 2022. MSc thesis.
  • [13] D. J. Hoelzle, A. G. Alleyne, and A. J. Wagoner Johnson, “Basis task approach to iterative learning control with applications to micro-robotic deposition,” IEEE Transactions on Control Systems Technology, vol. 19, no. 5, pp. 1138–1148, 2011.
  • [14] S. Bosma, “The generalization of feedforward control for a periodic motion system.,” 2019.
  • [15] M. Steinbuch, “Design and control of high tech systems,” in Proc. of the 2013 IEEE ICM, pp. 13–17, 2013.
  • [16] L. Blanken, J. van Zundert, R. de Rozario, N. Strijbosch, T. Oomen, C. Novara, and S. Formentin, “Multivariable iterative learning control: analysis and designs for engineering applications,” IET Chapter, pp. 109–143, 2019.
  • [17] I. K. Fodor, “A survey of dimension reduction techniques,” 5 2002.
  • [18] I. Proimadis, Nanometer-accurate motion control of moving-magnet planar motors. PhD thesis, Department of Electrical Engineering, 2020.
  • [19] I. Proimadis, C. H. H. M. Custers, R. Tóth, J. W. Jansen, H. Butler, E. Lomonova, and P. M. J. V. d. Hof, “Active deformation control for a magnetically levitated planar motor mover,” IEEE Transactions on Industry Applications, vol. 58, no. 1, pp. 242–249, 2022.
  • [20] P. Lambrechts, M. Boerlage, and M. Steinbuch, “Trajectory planning and feedforward design for electromechanical motion systems,” Control Engineering Practice, vol. 13, no. 2, pp. 145–157, 2005.