
Affine Iterations and Wrapping Effect: Various Approaches

Nathalie Revol, INRIA - LIP, ENS de Lyon, France. ORCID: https://orcid.org/0000-0002-2503-2274. Nathalie.Revol@inria.fr
Abstract

Affine iterations of the form $x_{n+1} = Ax_n + b$ converge, using real arithmetic, if the spectral radius of the matrix $A$ is less than 1. However, substituting interval arithmetic for real arithmetic may lead to divergence of these iterations, in particular if the spectral radius of the absolute value of $A$ is greater than 1. We will review different approaches to limit the overestimation of the iterates, when the components of the initial vector $x_0$ and $b$ are intervals. We will compare, both theoretically and experimentally, the widths of the iterates computed by these different methods: the naive iteration, methods based on the QR- and SVD-factorizations of $A$, and Lohner's QR-factorization method. The method based on the SVD-factorization is computationally less demanding and gives good results when the matrix is poorly scaled; it is superseded either by the naive iteration or by Lohner's method otherwise.

keywords:
interval analysis, affine iterations, matrix powers, Lohner’s QR algorithm, QR factorization, SVD factorization

1 Introduction

The problem we consider is the evaluation of the successive iterates of

\left\{\begin{array}{l} x_{n+1} = Ax_n + b,\\ x_0 \mbox{ given}, \end{array}\right.

where $A \in \mathbb{R}^{d\times d}$, $x_n \in \mathbb{R}^d$ for every $n \in \mathbb{N}$, and $b \in \mathbb{R}^d$. More specifically, the focus is on the use of interval arithmetic to evaluate these iterates.

In what follows, interval quantities will be denoted in boldface.

1.1 A Toy Example

This problem was brought to us through the following example of an IIR (Infinite Impulse Response) linear filter in state-space form:

x_n = 1.8 \, x_{n-1} - 0.9 \, x_{n-2} + 4.7 \cdot 10^{-2} \, (u_{n-2} + u_{n-1} + u_n)

for $x_0 = 0$ and $x_1 \in [1, 1.1]$. We assume that $u_n \in \mathbf{u} = [9.95, 10.05]$ for every $n$.

This iteration can also be written as a linear recurrence in $\mathbb{R}^2$:

\begin{pmatrix} x_{n-1}\\ x_n \end{pmatrix} = A \begin{pmatrix} x_{n-2}\\ x_{n-1} \end{pmatrix} + b_n, \quad \mbox{where } A = \begin{pmatrix} 0 & 1\\ -0.9 & 1.8 \end{pmatrix} \mbox{ and } b_n = \begin{pmatrix} 0\\ 4.7 \cdot 10^{-2} \, (u_{n-2} + u_{n-1} + u_n) \end{pmatrix}.

This toy example will be used to illustrate the various approaches mentioned in this paper. The first iterates, obtained using floating-point arithmetic, with random values for $x_1 \in [1, 1.1]$ and each $u_n \in [9.95, 10.05]$, are given in the two left columns of the table below.

 n    x_n        n    x_n        |   n    wid(x_n)     n    wid(x_n)
 0    0          20   9.1518     |   0    0            20   3.2293e+05
 1    1.0617     30   17.0186    |   1    0.1000       30   8.8808e+08
 2    3.3183     40   12.4414    |   2    0.1941       40   2.4423e+12
 3    6.4234     50   15.0305    |   3    0.4535       50   6.7164e+15
 4    9.9851     60   13.6130    |   4    1.0051       60   1.8470e+19
 5    13.6031    70   14.3858    |   5    2.2313       70   5.0794e+22
 6    16.9117    80   13.9680    |   6    4.9350       80   1.3969e+26
 7    19.6103    90   14.1510    |   7    10.905       90   3.8415e+29
 8    21.4884    100  14.0949    |   8    24.085       100  1.0564e+33
 9    22.4394    200  14.0870    |   9    53.182       200  2.6137e+67
 10   22.4595    300  14.1443    |   10   117.42       300  6.4663e+101
 12   20.1508    400  14.1282    |   12   572.31       400  1.5998e+136
 15   13.8931    500  14.0828    |   15   6158.0       500  3.9580e+170

The system stabilizes around 14, with variations due to the random values taken by the $u_n$. However, the following snippet of Octave code computes the successive iterates using interval arithmetic, with the interval $[1, 1.1]$ for $\mathbf{x}_1$ and $\mathbf{u} = [9.95, 10.05]$ for the $u_n$; that is, we replace $u_{n-2}+u_{n-1}+u_n$ by $3\mathbf{u}$.

pkg load interval   % Heimlich's interval package for GNU Octave
A = [0 1; -0.9 1.8];
xn = [infsup(0,0); infsup(1,1.1)];                      % (x_0, x_1)
b = 4.7e-2 * 3.0 * [infsup(0,0); infsup(9.95,10.05)];   % affine term with 3u
n = 500; for i = 1:n, i, xn = A*xn + b, wid(xn(1)), end

The two right columns of the table above give the widths of the successive iterates $\mathbf{x}_n$: the widths of the iterates diverge rapidly to infinity.

The explanation of this phenomenon is the following. The spectral radius of $A$ is strictly less than 1, $\rho(A) \simeq 0.9487 < 1$, and thus the exact (and, for that matter, floating-point) iterations converge. However, the recurrence satisfied by the widths of the iterates is $\mathrm{wid}(\mathbf{x}_n) = 1.8 \, \mathrm{wid}(\mathbf{x}_{n-1}) + 0.9 \, \mathrm{wid}(\mathbf{x}_{n-2}) + 4.7 \cdot 10^{-2} \cdot 3 \cdot \mathrm{wid}(\mathbf{u})$, which corresponds to the 2-dimensional iteration $w_n = |A| \, w_{n-1} + w_b$, with $w_n = \mathrm{wid}(\mathbf{x}_n)$, $|A|$ the matrix whose coefficients are the absolute values of the coefficients of $A$, and $w_b = 4.7 \cdot 10^{-2} \cdot 3 \cdot \mathrm{wid}(\mathbf{u})$. As the spectral radius of $|A|$ is larger than 1, indeed $\rho(|A|) \simeq 2.208 > 1$, the iterations diverge.
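
These spectral radii and the width recurrence are easy to check numerically. The following Octave sketch (plain floating point, no interval package needed) reproduces the widths given in the table above:

A = [0 1; -0.9 1.8];
rho_A = max(abs(eig(A)))          % ~0.9487 < 1: the exact iteration converges
rho_absA = max(abs(eig(abs(A))))  % ~2.208  > 1: the widths diverge
wb = [0; 4.7e-2 * 3 * 0.1];       % w_b, with wid(u) = 0.1
w = [0; 0.1];                     % (wid(x_0), wid(x_1))
for n = 2:10
  w = abs(A) * w + wb;            % w_n = |A| w_{n-1} + w_b
end
w(2)                              % wid(x_10) ~ 117.42, as in the table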

This phenomenon is a special case of the so-called wrapping effect. Its ubiquity in interval computations has been highlighted by Lohner in [6].

1.2 The Wrapping Effect

The wrapping effect is ubiquitous, as defined and developed in [6]. It can be described as the overestimation due to the enclosure of the sought set in a set of a given, simple structure. In our case, this simple structure corresponds to multidimensional intervals or boxes, that is, parallelepipeds with sides parallel to the axes of the coordinate system. When the computation is iterative, and when each iteration produces such an overestimating set that is used as the starting point of the next iteration, the size of the computed set may grow exponentially with the number of iterations, even when the exact solution set remains bounded and small.

Lohner also showed that the affine iteration we study in this paper, namely $x_{n+1} = Ax_n + b$, or more generally $x_{n+1} = A_nx_n + b_n$ with $x_{n+1}$, $x_n$ and $b_n$ vectors in $\mathbb{R}^d$ and $A_n \in \mathbb{R}^{d\times d}$ for every $n \in \mathbb{N}$, is archetypal. It occurs in many algorithms; the examples cited in [6] include

  • matrix-vector iterations such as the ones studied in this paper;

  • discrete dynamical systems: $\mathbf{x}_{n+1} = f(\mathbf{x}_n)$, $\mathbf{x}_0$ given and $f$ sufficiently smooth;

  • continuous dynamical systems (ODEs): $x'(t) = g(t, x(t))$, $x(0) = x_0$, which is studied through a numerical one-step (or multistep) method of the kind $\mathbf{x}_{n+1} = \mathbf{x}_n + h\Phi(\mathbf{x}_n, t_n) + \mathbf{z}_{n+1}$;

  • difference equations: $a_0\mathbf{z}_n + a_1\mathbf{z}_{n+1} + \ldots + a_m\mathbf{z}_{n+m} = b_n$ with $\mathbf{z}_1, \ldots, \mathbf{z}_m$ given;

  • linear systems with (banded) triangular matrix;

  • automatic differentiation.

In this paper, we concentrate on examples similar to the toy example presented above: for every initial value $x_0 \in \mathbb{R}^d$, the sequence of iterates $(x_n)_{n\in\mathbb{N}}$ converges to a finite value $x^* \in \mathbb{R}^d$, since $\rho(A) < 1$; however, the computations performed using interval arithmetic diverge, because their behaviour is dictated by $\rho(|A|)$, which is larger than 1. We are interested in the iterates computed using interval arithmetic: it is established that these iterates increase in width; however, different approaches can be applied to counteract the exponential growth of the width of the iterates. Several of them, some new as in Sections 2.3.2 and 2.3.3, and some already well established as in Section 2.4, will be tried and compared, in terms of the widths of the results and of the computational time.

2 Theoretical Results

2.1 Problem and Notations

Let $A$ be a $d \times d$ matrix in $\mathbb{R}^{d\times d}$, $\mathbf{x}_0 \in \mathrm{I}\mathbb{R}^d$ be an interval vector (boldface font is used for interval quantities and $\mathrm{I}\mathbb{R}$ stands for the set of real intervals), $x_0$ a vector in $\mathbb{R}^d$ with $x_0 \in \mathbf{x}_0$, $\mathbf{b} \in \mathrm{I}\mathbb{R}^d$ an interval vector, and $b$ a vector in $\mathbb{R}^d$ with $b \in \mathbf{b}$. In what follows, $n$ denotes the number of iterations.

It is assumed that $\rho(A) < 1$ and $\rho(|A|) > 1$.

A first goal is to determine the set of all fixed-points of the iteration

\left\{\begin{array}{l} x_0 \in \mathbf{x}_0, \quad b \in \mathbf{b},\\ x_{n+1} = Ax_n + b \end{array}\right.

for every $x_0 \in \mathbf{x}_0$ and every $b \in \mathbf{b}$.

It is known that xnx_{n} can be written as

x_n = A^n x_0 + \sum_{i=0}^{n-1} A^i b,

thus

\{x_n : x_0 \in \mathbf{x}_0\} \subset A^n \mathbf{x}_0 + \left(\sum_{i=0}^{n-1} A^i\right) \mathbf{b}.
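
As an illustration, the following Octave sketch (assuming Heimlich's interval package) evaluates this direct enclosure for the toy example of Section 1.1. For simplicity, the matrix powers are computed here in plain floating point; a fully rigorous version would enclose them with outward rounding.

pkg load interval
A = [0 1; -0.9 1.8];
x0 = [infsup(0,0); infsup(1,1.1)];
b = 4.7e-2 * 3.0 * [infsup(0,0); infsup(9.95,10.05)];
n = 50;
S = eye(2); Ai = eye(2);
for i = 1:n-1
  Ai = Ai * A;              % A^i
  S = S + Ai;               % S = sum_{i=0}^{n-1} A^i
end
xn = (Ai * A) * x0 + S * b  % A^n x0 + S b stays bounded, unlike the naive loop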

However, when the vectors $x_0$ and $b$ are replaced in the iterative formula by their interval enclosures $\mathbf{x}_0$ and $\mathbf{b}$, one obtains the new interval vector $\mathbf{x}_{n+1}$, which is computed as:

\left\{\begin{array}{l} \mathbf{x}_0 \mbox{ and } \mathbf{b} \mbox{ given},\\ \mathbf{x}_{n+1} = A\mathbf{x}_n + \mathbf{b}. \end{array}\right.

Another goal is to determine a tight enclosure for each iterate of this diverging sequence of interval vectors.

As mentioned above, the increase in the widths of the iterates can be attributed to the use of parallelepipeds with sides parallel to the axes of the coordinate system, and not to the geometry of the transformation. To cure this problem, changes of coordinates will be applied, using an invertible matrix $B$, with $x = By \Leftrightarrow y = B^{-1}x$ and its interval counterpart $\mathbf{x} = B\mathbf{y}$. This yields the iteration

\left\{\begin{array}{lcl} x_{n+1} &=& By_{n+1}\\ y_{n+1} &=& B^{-1}ABy_n + B^{-1}b \end{array}\right.

and its interval counterpart

\left\{\begin{array}{lcl} \mathbf{x}_{n+1} &=& B\mathbf{y}_{n+1}\\ \mathbf{y}_{n+1} &=& B^{-1}AB\mathbf{y}_n + B^{-1}\mathbf{b}. \end{array}\right.

In what follows, to establish bounds and their proofs, we assume $A$ diagonalizable (this will not necessarily be the case for the experiments): $A$ can be diagonalized as $A = P^{-1}\Lambda P$, where $\Lambda$ is a diagonal matrix with the eigenvalues $\lambda_1, \ldots, \lambda_d$ of $A$ on the diagonal, and the columns of $P^{-1}$ are the corresponding eigenvectors.

The iteration considered in this paper corresponds to $x_n = A^n x_0 + \sum_{i=0}^{n-1} A^i b$. The numerical instability of computing the matrix power $A^n$ and applying it to a vector is well known: $A^n x_0$ tends to be aligned with the eigenvector of $A$ associated with the largest (in modulus) eigenvalue, and the information corresponding to the contribution of the other eigenvectors is lost. To avoid this well-known problem of the power method, we will consider orthogonal changes of coordinates. The choice of the orthogonal matrices is related to $A$, the matrix of the iteration.

We will first consider the QR-factorization of $A$: $A = QR$ with $Q \in \mathbb{R}^{d\times d}$ orthogonal, that is, $QQ' = Q'Q = I$ the identity matrix, and $R \in \mathbb{R}^{d\times d}$ upper triangular.

The other factorization used in this paper is the SVD-factorization of $A$: $A = U\Sigma V'$ with $U$, $V$ and $\Sigma$ in $\mathbb{R}^{d\times d}$, where $U$ and $V$ are orthogonal and $\Sigma$ is a diagonal matrix with the singular values $\sigma_1, \ldots, \sigma_d$ of $A$ on the diagonal. We also assume that $\sigma_1 \geq \sigma_2 \geq \ldots \geq \sigma_d \geq 0$. This idea has been sketched, but not completely developed, by Beaumont in [2].

2.2 Known Results

Mayer and his co-authors have extensively studied the existence of a fixed-point for the iteration studied in this paper. In [7], Mayer and Warnke thoroughly established formulas for the fixed-point in the case $\rho(|A|) < 1$: this fixed-point is independent of the starting interval $\mathbf{x}_0$. In [1], Arndt and Mayer established a necessary and sufficient condition on $A$ for a fixed-point to exist when $\rho(|A|) = 1$. In this case, the fixed-point is an interval of nonzero width, that is, a non-degenerate interval. It is well known that the widths of the iterates diverge when $\rho(|A|) > 1$, and thus that no fixed-point exists in this case. Our goal is to study the speed of divergence of the iterates in this case.

2.3 Different Approaches along with Theoretical Bounds

The main idea is to use an orthogonal change of coordinates which is related to the matrix of the iteration. As the matrix $A$ is kept constant for all iterations (and this is not the case in the more general approach of Lohner, see Section 2.4), the change of coordinates is also kept constant and given by an orthogonal matrix $B$. The two orthogonal matrices considered in what follows are either $B = Q$ from the QR-factorization of $A$, or $B = U$, resp. $B = V$, from the SVD-factorization of $A$.

2.3.1 Orthogonal Change of Coordinates

Before diving into the specificities of these changes of coordinates, let us study the general change of coordinates using an orthogonal matrix $B$, that is, $B^{-1} = B'$, with $x = By \Leftrightarrow y = B^{-1}x$ and its interval counterpart $\mathbf{x} = B\mathbf{y}$. The interval iteration is

\left\{\begin{array}{lcl} \mathbf{x}_{n+1} &=& B\mathbf{y}_{n+1}\\ \mathbf{y}_{n+1} &=& B^{-1}AB\mathbf{y}_n + B^{-1}\mathbf{b}. \end{array}\right.
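
To fix ideas, here is a minimal Octave sketch of this generic scheme (assuming Heimlich's interval package), instantiated on the toy example with $B = Q$ from $A = QR$ as a placeholder choice of orthogonal matrix. The products $B'AB$ and $B'\mathbf{b}$ are computed in plain floating point; a rigorous implementation would enclose them with outward rounding.

pkg load interval
A = [0 1; -0.9 1.8];
b = 4.7e-2 * 3.0 * [infsup(0,0); infsup(9.95,10.05)];
x = [infsup(0,0); infsup(1,1.1)];
[B, R] = qr(A);     % any orthogonal B; here B = Q from A = QR
M = B' * A * B;     % iteration matrix in the new coordinates
c = B' * b;         % transformed affine term
y = B' * x;         % transformed initial box
for n = 1:50
  y = M * y + c;    % interval iteration in the B-coordinates
end
x = B * y           % back to the original coordinates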

Thus the iteration satisfied by the widths of the $\mathbf{y}_n$ is

\begin{array}{lcl} \mathrm{wid}(\mathbf{y}_{n+1}) &=& |B^{-1}AB| \, \mathrm{wid}(\mathbf{y}_n) + |B^{-1}| \, \mathrm{wid}(\mathbf{b})\\ &\leq& |B^{-1}|.|A|.|B| \, \mathrm{wid}(\mathbf{y}_n) + |B^{-1}| \, \mathrm{wid}(\mathbf{b}) \end{array}

where the inequalities are to be understood componentwise. By induction on $n$,

\mathrm{wid}(\mathbf{y}_n) \leq (|B^{-1}|.|A|.|B|)^n \, \mathrm{wid}(\mathbf{y}_0) + \sum_{i=0}^{n-1} (|B^{-1}|.|A|.|B|)^i.|B^{-1}| \, \mathrm{wid}(\mathbf{b}).

Taking norms, one gets

\begin{array}{lcl} \|\mathrm{wid}(\mathbf{y}_n)\| &\leq& \left(\| |B^{-1}| \|.\| |A| \|.\| |B| \|\right)^n \|\mathrm{wid}(\mathbf{y}_0)\| + \sum_{i=0}^{n-1} \left(\| |B^{-1}| \|.\| |A| \|.\| |B| \|\right)^i.\| |B^{-1}| \|.\|\mathrm{wid}(\mathbf{b})\|\\ &\leq& \left(\| |B^{-1}| \|.\| |A| \|.\| |B| \|\right)^n \|\mathrm{wid}(\mathbf{y}_0)\| + \frac{\left(\| |B^{-1}| \|.\| |A| \|.\| |B| \|\right)^n - 1}{\| |B^{-1}| \|.\| |A| \|.\| |B| \| - 1} \, \| |B^{-1}| \|.\|\mathrm{wid}(\mathbf{b})\|. \end{array}

Remark: if the considered norm is the matrix norm induced by the Euclidean vector norm, then $\| |B| \|_2 = \|B\|_2$ for any matrix $B$. Similarly, $\| |B| \|_\infty = \|B\|_\infty \leq \sqrt{d}$ for any $d \times d$ orthogonal matrix $B$. In such cases, the bound becomes

\|\mathrm{wid}(\mathbf{y}_n)\| \leq \left(\kappa(B).\| |A| \|\right)^n \|\mathrm{wid}(\mathbf{y}_0)\| + \frac{\left(\kappa(B).\| |A| \|\right)^n - 1}{\kappa(B).\| |A| \| - 1} \, \|B^{-1}\|.\|\mathrm{wid}(\mathbf{b})\|,

where $\kappa(B)$ denotes $\|B\|.\|B^{-1}\|$, the condition number of $B$ for the problem of solving a linear system.

Since $\|B\|_2 = \|B^{-1}\|_2 = \kappa_2(B) = 1$ for an orthogonal matrix $B$, this bound simplifies even further with the Euclidean norm:

\|\mathrm{wid}(\mathbf{y}_n)\| \leq \|A\|^n \, \|\mathrm{wid}(\mathbf{y}_0)\| + \frac{\|A\|^n - 1}{\|A\| - 1} \, \|\mathrm{wid}(\mathbf{b})\|.
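
For instance, for the toy matrix of Section 1.1, a direct computation gives $\|A\|_2 = \sigma_1 \simeq 2.21 > 1$, so this bound still grows like $2.21^n$, even though $\rho(A) \simeq 0.9487 < 1$.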

In other words, theoretically there is no difference in the bounds on the widths of the iterates, whether an orthogonal change of coordinates takes place or not.

In what follows, we assume again $A$ diagonalizable (this will not necessarily be the case for the experiments), with $A = P^{-1}\Lambda P$ where $\Lambda$ is diagonal. If we replace $A$ by $P^{-1}\Lambda P$ in the iteration, we get the mathematically equivalent formulation

\begin{array}{lcl} \mathbf{y}_{n+1} &=& B^{-1}P^{-1}\Lambda PB \, \mathbf{y}_n + B^{-1}\mathbf{b}\\ &=& (PB)^{-1}\Lambda(PB) \, \mathbf{y}_n + B^{-1}\mathbf{b}, \end{array}

thus

\mathrm{wid}(\mathbf{y}_{n+1}) \leq (|(PB)^{-1}|.|\Lambda|.|PB|) \, \mathrm{wid}(\mathbf{y}_n) + |B^{-1}| \, \mathrm{wid}(\mathbf{b}),

and by induction

\mathrm{wid}(\mathbf{y}_n) \leq (|(PB)^{-1}|.|\Lambda|.|PB|)^n \, \mathrm{wid}(\mathbf{y}_0) + \sum_{i=0}^{n-1} (|(PB)^{-1}|.|\Lambda|.|PB|)^i.|B^{-1}| \, \mathrm{wid}(\mathbf{b}).

Taking the Euclidean norm of vectors and the induced matrix norm, one gets

\|\mathrm{wid}(\mathbf{y}_n)\|_2 \leq \left(\kappa_2(PB)\|\Lambda\|_2\right)^n \|\mathrm{wid}(\mathbf{y}_0)\|_2 + \frac{\left(\kappa_2(PB)\|\Lambda\|_2\right)^n - 1}{\kappa_2(PB)\|\Lambda\|_2 - 1} \, \|\mathrm{wid}(\mathbf{b})\|_2.

Let us note that $\kappa_2(PB) = \kappa_2(P)$. Furthermore, as $\Lambda$ is diagonal, $\|\Lambda\|_2$ is the largest eigenvalue (in modulus) of $A$, that is, $\|\Lambda\|_2 = \rho(A) < 1$. This implies

\|\mathrm{wid}(\mathbf{y}_n)\|_2 \leq \left(\kappa_2(P)\rho(A)\right)^n \|\mathrm{wid}(\mathbf{y}_0)\|_2 + \frac{\left(\kappa_2(P)\rho(A)\right)^n - 1}{\kappa_2(P)\rho(A) - 1} \, \|\mathrm{wid}(\mathbf{b})\|_2.

This inequality highlights the influence of the condition number of $P$, the matrix of eigenvectors. For instance, in the ideal case where the eigenvectors form an orthonormal basis, $\kappa_2(P) = 1$ and no overestimation occurs.

2.3.2 Use of the QR Factorization

When the orthogonal change of coordinates involves $Q$ from the QR-factorization of $A$, the algorithm can be written as

\begin{array}{rcl} A &=& QR,\\ x_{n+1} &=& Qy_{n+1} \;\Leftrightarrow\; y_{n+1} = Q'x_{n+1},\\ y_{n+1} &=& Q'AQ \, y_n + Q'b. \end{array}

In exact arithmetic, one should get

y_{n+1} = RQ \, y_n + Q'b.

The interval counterpart is

\begin{array}{lcl} \mathbf{x}_{n+1} &=& Q\mathbf{y}_{n+1}\\ \mathbf{y}_{n+1} &=& Q'AQ \, \mathbf{y}_n + Q'\mathbf{b}. \end{array}

2.3.3 Use of the SVD Factorization

Our second and third proposals consist in using respectively $U$ and $V$ from the SVD-factorization of $A$: from $A = U\Sigma V'$, we use either $B = U$ or $B = V$.

With $B = U$, this yields

\begin{array}{rcl} x_{n+1} &=& Uy_{n+1} \;\Leftrightarrow\; y_{n+1} = U'x_{n+1},\\ y_{n+1} &=& U'AU \, y_n + U'b. \end{array}

In exact arithmetic, this corresponds to $y_{n+1} = \Sigma V'U \, y_n + U'b$. The interval counterpart is

\begin{array}{lcl} \mathbf{x}_{n+1} &=& U\mathbf{y}_{n+1}\\ \mathbf{y}_{n+1} &=& U'AU \, \mathbf{y}_n + U'\mathbf{b}. \end{array}

With $B = V$, this yields

\begin{array}{rcl} x_{n+1} &=& Vy_{n+1} \;\Leftrightarrow\; y_{n+1} = V'x_{n+1},\\ y_{n+1} &=& V'AV \, y_n + V'b. \end{array}

In exact arithmetic, this corresponds to $y_{n+1} = V'U\Sigma \, y_n + V'b$. The interval counterpart is

\begin{array}{lcl} \mathbf{x}_{n+1} &=& V\mathbf{y}_{n+1}\\ \mathbf{y}_{n+1} &=& V'AV \, \mathbf{y}_n + V'\mathbf{b}. \end{array}
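
A minimal Octave sketch of these two SVD-based variants for the toy example (assuming Heimlich's interval package; as before, the change-of-basis products are computed in plain floating point, so this is illustrative rather than fully rigorous):

pkg load interval
A = [0 1; -0.9 1.8];
b = 4.7e-2 * 3.0 * [infsup(0,0); infsup(9.95,10.05)];
[U, S, V] = svd(A);
yu = U' * [infsup(0,0); infsup(1,1.1)];   % B = U variant
yv = V' * [infsup(0,0); infsup(1,1.1)];   % B = V variant
for n = 1:50
  yu = (U' * A * U) * yu + U' * b;
  yv = (V' * A * V) * yv + V' * b;
end
xu = U * yu, xv = V * yv                   % back to the original coordinates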

Remark: $V'U$ is also an orthogonal matrix, as a product of orthogonal matrices.

2.4 Lohner’s QR Method

A well-known approach is given by Lohner, e.g. in [6], and studied in detail by Nedialkov and Jackson in [8]. It is usually presented for the iteration $x_{n+1} = A_nx_n + b_n$, that is, when the matrix and the affine term vary at each iteration.

Lohner’s QR method consists in performing the following iteration:

\left\{\begin{array}{lcl} \multicolumn{3}{l}{y_0 = x_0, \quad Q_0 = I, \quad [Q_1, R_1] = \mathrm{qr}(A) \mbox{ that is, } A = Q_1R_1}\\ {}[Q_{n+1}, R_{n+1}] &=& \mathrm{qr}(R_nQ_n)\\ y_{n+1} &=& Q_{n+1}'AQ_n \, y_n + Q_{n+1}'b\\ x_{n+1} &=& Q_{n+1}y_{n+1} \end{array}\right.

and its interval counterpart is

\left\{\begin{array}{lcl} \multicolumn{3}{l}{\mathbf{y}_0 = \mathbf{x}_0, \quad Q_0 = I, \quad [Q_1, R_1] = \mathrm{qr}(A) \mbox{ that is, } A = Q_1R_1}\\ {}[Q_{n+1}, R_{n+1}] &=& \mathrm{qr}(R_nQ_n)\\ \mathbf{y}_{n+1} &=& Q_{n+1}'AQ_n \, \mathbf{y}_n + Q_{n+1}'\mathbf{b}\\ \mathbf{x}_{n+1} &=& Q_{n+1}\mathbf{y}_{n+1}. \end{array}\right.
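
A minimal Octave sketch of this iteration for the toy example (assuming Heimlich's interval package; the orthogonal factors and the products involving them are computed in plain floating point, without the outward rounding a rigorous implementation would require):

pkg load interval
A = [0 1; -0.9 1.8];
b = 4.7e-2 * 3.0 * [infsup(0,0); infsup(9.95,10.05)];
yn = [infsup(0,0); infsup(1,1.1)];   % y_0 = x_0
Qn = eye(2);                         % Q_0 = I
[Qnext, Rn] = qr(A);                 % A = Q_1 R_1
for n = 1:50
  yn = (Qnext' * A * Qn) * yn + Qnext' * b;
  xn = Qnext * yn;                   % x_n in the original coordinates
  Qn = Qnext;
  [Qnext, Rn] = qr(Rn * Qn);         % [Q_{n+1}, R_{n+1}] = qr(R_n Q_n)
end
xn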

In the case of a matrix $A$ that is constant throughout the iterations, one can recognize Francis' and Kublanovskaya's QR-algorithm. Using the convergence of $(R_n)$ towards the matrix of eigenvalues of $A$ (or towards its Schur form), Nedialkov and Jackson established in [8] the following bound:

w(\mathbf{x}_n) \leq \mathrm{cond}(P)\rho(A)^n \, w(\mathbf{x}_0) + \frac{\mathrm{cond}(P)\rho(A)^{n-1} - 1}{\mathrm{cond}(P)\rho(A) - 1} \, w(\mathbf{b}) + \mathbf{b}

where we recall that $A$ is diagonalizable: $A = P^{-1}\Lambda P$.

2.5 Comparison

Two aspects of each method are compared: its computational complexity and its accuracy, that is, the bounds on the widths of its iterates.

Let us first examine the computational complexity. Let us recall that the QR-factorization, resp. SVD-factorization, of a $d \times d$ matrix has a computational complexity of $\mathcal{O}(d^3)$. In the algorithms of Sections 2.3.2 and 2.3.3, the factorization of a matrix is performed only once, and not at every iteration: for $n$ iterations, these algorithms thus have complexity $\mathcal{O}(d^3 + nd^2)$, the cost of the factorization being negligible when the number $n$ of iterations is large. In comparison, Lohner's QR method has complexity $\mathcal{O}(nd^3)$, which is significantly larger when $d$ is large.

Let us now compare the accuracy of these different methods, from a theoretical point of view. The bounds we get on the width of the iterate $\mathbf{x}_n$ are larger than the bounds obtained by Nedialkov and Jackson: the condition number of the matrix $P$ appears raised to the $n$-th power in the formulas for the QR- and SVD-algorithms, whereas it appears without this $n$-th power in the bound for Lohner's QR-algorithm. As a condition number is always larger than or equal to 1, this means that the bound for Lohner's QR-algorithm is tighter than the bounds for the QR- and SVD-algorithms.

3 Experiments

3.1 Experimental Setup

After the results on the widths of the iterates of the toy example given in Section 1.1, Section 3.2 presents the computation of each corner of the initial box, to illustrate that it is possible to get tight enclosures on such a small example.

All algorithms presented in this paper, namely the naive (or brute-force) application of the iteration, the QR-algorithm of Section 2.3.2, the two versions of the SVD-algorithm of Section 2.3.3, and Lohner's QR algorithm given in Section 2.4, have been implemented in Octave using Heimlich's interval package [3], then in Matlab using Rump's Intlab package [10]. Two other methods have been implemented and compared. The first technique [9] consists in determining $k$ such that $\rho(|A^k|) < 1$, then computing only one iterate every $k$ steps; in other words, it computes

x_{k(n+1)} = A^k x_{kn} + \sum_{i=0}^{k-1} A^i b.

This iteration converges even when interval arithmetic is employed; a sketch is given below.
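
A minimal Octave sketch of this $k$-step technique for the toy example (assuming Heimlich's interval package; the power $A^k$ and the accumulated affine term are computed in plain floating point):

pkg load interval
A = [0 1; -0.9 1.8];
b = 4.7e-2 * 3.0 * [infsup(0,0); infsup(9.95,10.05)];
k = 1; Ak = A;
while max(abs(eig(abs(Ak)))) >= 1    % smallest k with rho(|A^k|) < 1
  k = k + 1; Ak = Ak * A;
end
c = b; Ai = eye(2);
for i = 1:k-1                        % c = sum_{i=0}^{k-1} A^i b
  Ai = Ai * A; c = c + Ai * b;
end
xk = [infsup(0,0); infsup(1,1.1)];
for j = 1:floor(500/k)
  xk = Ak * xk + c;                  % one interval step simulates k original steps
end
xk                                   % remains bounded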

The other technique is the use of affine arithmetic, as advocated by Rump in a private communication. In the experimental results presented below, each technique is associated with a color:

algorithm            color
brute force          black
QR                   cyan
SVD U                red
SVD V                magenta
Lohner's QR          dark blue
every k-th iterate   green
affine arithmetic    yellow

The implementations use only the basic QR and SVD factorizations available in Matlab, and neither the pivoted QR recommended by Lohner in [6] nor the more elaborate versions presented by Higham in [4].

Sections 3.3 and 3.4 contain the evolution of the radii of the iterates computed by these different techniques, for two matrix dimensions: $10\times 10$ and $100\times 100$. The $y$-axis for the radii uses a logarithmic scale. For both dimensions, four kinds of matrices $A$ have been used for the experiments. On the one hand, matrices which are well-conditioned (with a condition number of order $10^2$) and ill-conditioned (with a condition number of order $10^{10}$) have been generated. On the other hand, the scaling of the matrices varies: matrices which are well-scaled and matrices which are ill-scaled, with the orders of magnitude of their coefficients varying between 1 and $10^{10}$. These are only orders of magnitude, as the matrices, originally generated by a call to Matlab's randsvd, were then added to a multiple of the identity matrix and multiplied by a constant, in order to satisfy both $\rho(A) < 1$ and $\rho(|A|) > 1$; a sketch of this generation process is given below. It can also be noted that degrading the scaling of a matrix also degrades its condition number; in other words, a "well-conditioned ill-scaled" matrix has a much worse condition number than a "well-conditioned well-scaled" matrix, even if the required condition numbers, in the call to randsvd, are initially the same.
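
The following Matlab/Octave sketch illustrates this generation process; the shift and scaling constants are placeholders, not the values used in the actual experiments.

d = 10;
A = gallery('randsvd', d, 1e2);      % random matrix with condition number ~1e2
A = A + 0.5 * eye(d);                % add a multiple of the identity
A = A / (1.05 * max(abs(eig(A))));   % rescale so that rho(A) < 1
max(abs(eig(A)))                     % ~0.95 < 1
max(abs(eig(abs(A))))                % should be > 1 for a relevant test case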

All experiments have been performed on a 2.7 GHz Quad-Core Intel Core i7 with 16GB RAM. Timings are averaged over 100 executions, except for affine arithmetic where at most 10 executions were performed.

3.2 Toy Example

First, the toy example presented in Section 1.1 is considered. As the iteration is affine, one can compute separately the images of the endpoints of the initial vector, to get the endpoints of the successive iterates. That is, we compute separately

x_n = 1.8 \, x_{n-1} - 0.9 \, x_{n-2} + 4.7 \cdot 10^{-2} \cdot 3 \cdot \mathbf{u}

for $x_0 = 0$ and $x_1 = 1$, and for $x_0 = 0$ and $x_1 = 1.1$. However, we use the interval $\mathbf{u} = [9.95, 10.05]$ in the iteration. The convex hulls of the first 10 iterates are represented on the left part of Figure 1. It is obvious that the widths of the successive iterates grow rapidly.

Then we compute separately

x_n = 1.8 \, x_{n-1} - 0.9 \, x_{n-2} + 4.7 \cdot 10^{-2} \cdot 3 \cdot u

for $x_0 = 0$ and $x_1 = 1$, and for $x_0 = 0$ and $x_1 = 1.1$, and for $u = 9.95$ and $u = 10.05$. The convex hulls of the first 10 iterates are represented on the right part of Figure 1. In this case, the widths of the successive iterates remain small, of the order of magnitude of 1% of the midpoint of the interval.

Figure 1: Left: the first 10 iterates of the toy example, where the endpoints of $\mathbf{x}_0$ are considered separately. Right: the first 10 iterates of the toy example, computed corner by corner.

In this toy example, the iterations had to be performed 4 times, that is, once for each corner of the initial values, in order to get a tight enclosure; a sketch of this corner-by-corner computation is given below. This clearly cannot be generalized to high dimensions, as the number of corners grows as $4^d$ with the dimension $d$ of the problem.
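
A minimal Octave sketch of this corner-by-corner computation (plain floating point; the iteration being affine, the extreme values are attained at the 4 corners of $(x_1, u)$):

N = 10; lo = inf(1, N+1); hi = -inf(1, N+1);
for x1 = [1, 1.1]
  for u = [9.95, 10.05]
    x = zeros(1, N+1); x(2) = x1;       % x(1) = x_0 = 0, x(2) = x_1
    for n = 3:N+1
      x(n) = 1.8*x(n-1) - 0.9*x(n-2) + 4.7e-2 * 3 * u;
    end
    lo = min(lo, x); hi = max(hi, x);   % running hull over the corners
  end
end
[lo; hi]                                % tight hull of x_0, ..., x_N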

3.3 Example of dimension 10

Figure 2 gives the radii (in logarithmic scale) of the successive iterates computed by the methods detailed above. When the number of iterations is large (visually, above 30 or 40 iterations), the iterates computed by all methods presented in Section 2 diverge rapidly, as can be seen on the plots on the left. When one concentrates on the first iterations, the behaviours differ. One can also note that degrading the scaling of the matrix $A$ speeds up the divergence, for all methods. On the contrary, the $k$-step method and the use of affine arithmetic preserve the convergence of the iterates.

Figure 2: The case of a $10\times 10$ matrix (right part: zoom of the left part): a) well-conditioned and well-scaled, b) ill-conditioned well-scaled, c) well-conditioned ill-scaled, d) ill-conditioned ill-scaled.

The timings in seconds are given below:

method         well-cond.    ill-cond.     well-cond.   ill-cond.
               well-scaled   well-scaled   ill-scaled   ill-scaled
naive          0.0088        0.0084        0.0093       0.0086
k-th step      0.0044        0.0015        0.0043       0.0045
QR             0.0161        0.0155        0.0160       0.0157
SVD U          0.0162        0.0156        0.0161       0.0166
SVD V          0.0147        0.0145        0.0324       0.0332
Lohner's QR    0.0170        0.0164        0.0165       0.0177
affine arith.  8.1859        8.7911        8.6595       8.2754

The methods presented in Sections 2.3.2, 2.3.3 and 2.4 all exhibit similar execution times. The naive method performs fewer operations and is thus faster. The $k$-th step method is fast as well; the variations in its execution time are due to the preprocessing, that is, to the determination of the power $k$ such that $\rho(|A^k|) < 1$: the execution time is larger when $k$ is larger. With this method, the convergence is good. The use of affine arithmetic significantly slows down the computations; however, the iterates converge.

3.4 Example of dimension 100

Figure 3 gives the radii (in logarithmic scale) of the successive iterates computed by the methods detailed in Section 3.1. When the number of iterations is large (visually, above 40 or 50 iterations), the iterates computed by all methods, except the $k$-step method, diverge rapidly, as can be seen on the plots on the left. Again, when one concentrates on the first iterations, the behaviours differ.

Figure 3: The case of a $100\times 100$ matrix (right part: zoom of the left part): a) well-conditioned and well-scaled, b) ill-conditioned well-scaled, c) well-conditioned ill-scaled, d) ill-conditioned ill-scaled.

The timings in seconds are given below:

method         well-cond.    ill-cond.     well-cond.   ill-cond.
               well-scaled   well-scaled   ill-scaled   ill-scaled
naive          0.0163        0.0145        0.0142       0.0142
k-th step      0.0046        0.0071        0.0048       0.0072
QR             0.1577        0.0363        0.0408       0.0409
SVD U          0.0427        0.0420        0.0437       0.0437
SVD V          0.0423        0.0390        0.0643       0.0638
Lohner's QR    0.0611        0.0758        0.0646       0.0804
affine arith.  70.8724       72.6441       76.6102      71.2518

The comments on the timings apply again, with the exception of the use of affine arithmetic, which is still much slower and no longer manages to preserve the convergence for very long.

3.5 Comments

One can note that the $k$-step method, that is, the method that resorts to a convergent interval iteration, performs very well at a moderate computational cost. Even the preprocessing time needed to determine the value of $k$ is negligible.

This method is a totally ad hoc approach for this problem and cannot be generalized. However, in the framework of filters and control theory, it has a physical meaning: the divergence of the iterations can be attributed to a sampling time which is too small to allow variations to be observed. Multiplying the sampling time by kk means sampling less frequently (by a factor kk) and thus being able to measure the evolution of the observed quantities.

The use of affine arithmetic, on the contrary, is a very general method and it exhibits a very good accuracy, even if it eventually diverges (see the experiments with the $100\times 100$ matrices in Section 3.4). The counterpart is the execution time, which is at least a thousand times larger than for the other methods. This is not an issue for the experiments presented here, as the time remains on the order of a minute.

The methods based on the QR or SVD factorizations of the matrix $A$ were developed with geometric principles in mind. For the QR-algorithm, the idea was to align the current box with the directions that are preserved by multiplication by $A$, with a tradeoff between aligning the box along the eigenvectors and preserving an orthonormal system of coordinates, hence the choice of $Q$. For the SVD-algorithm, the idea was to align the box along the direction which gets the maximal elongation, that is, along the singular vector corresponding to the largest singular value.

In both cases, the benefit of these geometric transformations is offset by the overestimation entailed by the extra computations: there is either no clear benefit, for the QR-based approach, or a delicate balance, for the SVD-based approach. The SVD-algorithm is interesting when the matrix is ill-scaled, particularly for the first iterations.

The methods of choice remain either the naive approach, when the matrix $A$ is well-conditioned and well-scaled, or Lohner's QR method when the matrix is ill-conditioned. Surprisingly, the overhead of Lohner's QR method, in terms of computational time, is not as large as the formula for its complexity suggests.

Our general recommendation is thus:

  • to preprocess the matrix $A$ in order to scale it;

  • then to execute in parallel the naive approach and Lohner's QR approach, in order to converge reasonably well for any condition number of $A$.

Affine arithmetic is a solution of choice when other solutions fail and when analysis and development time is a scarce resource.

4 Conclusion and Future Work

This study, both theoretical and experimental, has compared several approaches to counteract the wrapping effect in the computation of affine iterations. Geometric considerations have led to the proposed algorithms. The benefit of these approaches is not always clear, as a better configuration is obtained through extra computations and thus extra overestimation. To deepen this geometric approach, we will aim at simplifying the resulting formulas, at getting formulas that are closer to the mathematically equivalent, but simpler, ones given after each proposed transformation. The main difficulty is to perform products such as $Q\,Q'$ or $U'U$ without replacing them by the identity, but in a certified and tight way. As the SVD-based approach seems more promising, our future work will concentrate on the use of a certified SVD factorization, as proposed by van der Hoeven and Yakoubsohn in [11]. We also plan to consider an interval version of the matrix, using the results in [5] to keep guarantees on the singular quantities involved in the computations.

References

  • [1] Arndt, H.-R., and Mayer, G. On the semi-convergence of interval matrices. Linear Algebra and its Applications, 393:15-35, 2004. doi:10.1016/j.laa.2003.10.015.
  • [2] Beaumont, O. Solving Interval Linear Systems with Oblique Boxes. Technical Report 1315, Irisa, 2000. ftp://ftp.irisa.fr/techreports/2000/PI-1313.ps.gz.
  • [3] Heimlich, O. Interval arithmetic in GNU Octave. In SWIM 2016: Summer Workshop on Interval Methods, France, 2016. https://swim2016.sciencesconf.org/data/SWIM2016_book_of_abstracts.pdf#page=27.
  • [4] Higham, N.J. QR factorization with complete pivoting and accurate computation of the SVD. Linear Algebra and its Applications, 309:153-174, 2000. doi:10.1016/S0024-3795(99)00230-X.
  • [5] Hladík, M., Daney, D., and Tsigaridas, E. Bounds on Real Eigenvalues and Singular Values of Interval Matrices. SIAM J. Matrix Analysis and Applications, 31(4):2116-2129, 2010. doi:10.1137/090753991.
  • [6] Lohner, R. On the Ubiquity of the Wrapping Effect in the Computation of Error Bounds. In Kulisch, Lohner, Facius, editors, Perspectives on Enclosure Methods, pages 201-217. Springer, 2001.
  • [7] Mayer, G. and Warnke, I. On the fixed points of the interval function $[f]([x])=[A][x]+[b]$. Linear Algebra and its Applications, 363:202-216, 2003. doi:10.1016/S0024-3795(02)00254-9.
  • [8] Nedialkov, N. and Jackson, K. A New Perspective on the Wrapping Effect in Interval Methods for Initial Value Problems for Ordinary Differential Equations. In Kulisch, Lohner, Facius, editors, Perspectives on Enclosure Methods, pages 219-264. Springer, 2001.
  • [9] Revol, N. Convergent linear recurrences (with scalar coefficients) with divergent interval simulations. In SCAN: 11th GAMM-IMACS Int. Symp. on Scientific Computing, Computer Arithmetic, and Validated Numerics (Japan), 2004.
  • [10] Rump, S.M. INTLAB - INTerval LABoratory. In Csendes, Tibor, editor, Developments in Reliable Computing, pages 77-104. Kluwer Academic Publishers, Dordrecht, 1999. doi:10.1007/978-94-017-1247-7_7.
  • [11] van der Hoeven, J. and Yakoubsohn, J.-C. Certified Singular Value Decomposition. Technical report, 2018. https://hal.archives-ouvertes.fr/hal-01941987.