
Relaxed Fixed Point Iterations for Matrix Equations Arising in Markov Chains Modeling

Thanks: The authors are partially supported by INDAM/GNCS and by the project PRA_2020_61 of the University of Pisa.

Luca Gemignani
Dipartimento di Informatica
Università di Pisa
[email protected]

Beatrice Meini
Dipartimento di Matematica
Università di Pisa
[email protected]
Abstract

We present some accelerated variants of fixed point iterations for computing the minimal non-negative solution of the unilateral matrix equation associated with an M/G/1-type Markov chain. These variants derive from certain staircase regular splittings of the block Hessenberg M-matrix associated with the Markov chain. By exploiting the staircase profile we introduce a two-step fixed point iteration. The iteration can be further accelerated by computing a weighted average between the approximations obtained at two consecutive steps. The convergence of the basic two-step fixed point iteration and of its relaxed modification is proved. Our theoretical analysis, along with several numerical experiments, shows that the proposed variants generally outperform the classical iterations.

Keywords: M-matrix, Staircase Splitting, Nonlinear Matrix Equation, Hessenberg Matrix, Markov Chain.

1 Introduction

The transition probability matrix of an M/G/1-type Markov chain is a block Hessenberg matrix $P$ of the form

P=\left[\begin{array}{cccc}B_{0}&B_{1}&B_{2}&\ldots\\ A_{-1}&A_{0}&A_{1}&\ldots\\ &A_{-1}&A_{0}&\ddots\\ &&\ddots&\ddots\end{array}\right], \qquad (1)

with $A_{i},B_{i}\in\mathbb{R}^{n\times n}$ nonnegative and $\sum_{i=0}^{\infty}B_{i}$ and $\sum_{i=-1}^{\infty}A_{i}$ stochastic matrices.

In the sequel, given a real matrix $A=(a_{ij})\in\mathbb{R}^{m\times n}$, we write $A\geq 0$ ($A>0$) if $a_{ij}\geq 0$ ($a_{ij}>0$) for any $i,j$. A stochastic matrix is a matrix $A\geq 0$ such that $A\boldsymbol{e}=\boldsymbol{e}$, where $\boldsymbol{e}$ is the column vector having all entries equal to 1.

In the positive recurrent case, the computation of the steady state vector $\boldsymbol{\pi}$ of $P$, such that

\boldsymbol{\pi}^{T}P=\boldsymbol{\pi}^{T},\quad\boldsymbol{\pi}^{T}\boldsymbol{e}=1,\quad\boldsymbol{\pi}\geq\boldsymbol{0}, \qquad (2)

is related to the solution of the unilateral power series matrix equation

X=A_{-1}+A_{0}X+A_{1}X^{2}+A_{2}X^{3}+\ldots. \qquad (3)

Under some mild assumptions this equation has a componentwise minimal non-negative solution $G$ which determines, by means of Ramaswami's formula [16], the vector $\boldsymbol{\pi}$.

Among the easy-to-use, but still effective, tools for numerically solving (3) are fixed point iterations (see [3] and the references given therein for a general review of these methods). The intrinsic simplicity of such schemes makes them attractive in domains where high performance computing is crucial. But they come at a price: the convergence can become very slow, especially for problems which are close to singularity. The design of acceleration methods (also known as extrapolation methods) for fixed point iterations is a classical topic in numerical analysis [5]. Relaxation techniques are commonly used for the acceleration of classical stationary iterative solvers for large linear systems. In this paper we introduce some new coupled fixed point iterations for solving (3) which can be combined with relaxation techniques to speed up their convergence.

More specifically, we first observe that computing the solution of the matrix equation (3) is formally equivalent to solving a semi-infinite block Toeplitz block Hessenberg linear system. Customary block iterative algorithms applied for the solution of this system yield classical fixed point iterations. In particular, the traditional and the U-based fixed point iteration [3] originate from the block Jacobi and the block Gauss-Seidel method, respectively. Recently, in [7] the authors show that some iterative solvers based on a block staircase partitioning outperform the block Jacobi and the block Gauss-Seidel method for M-matrix linear systems in block Hessenberg form. The application of the staircase splitting to the block Toeplitz block Hessenberg linear system associated with (3) yields the new coupled fixed point iteration

\left\{\begin{array}{l}(I_{n}-A_{0})Y_{k}=A_{-1}+\sum_{i=1}^{\infty}A_{i}X_{k}^{i+1},\\ X_{k+1}=Y_{k}+(I_{n}-A_{0})^{-1}A_{1}(Y_{k}^{2}-X_{k}^{2}),\end{array}\right.\quad k\geq 0, \qquad (4)

starting from an initial approximation $X_{0}$. The contribution of this paper is aimed at highlighting the properties of (4).

We show that, if $X_{0}=0$, the sequence $\{X_{k}\}_{k}$ defined in (4) converges to $G$ faster than the traditional fixed point iteration. In the case where the starting matrix $X_{0}$ of (4) is any stochastic matrix and $G$ is also stochastic, we prove that the sequence $\{X_{k}\}_{k}$ still converges to $G$. Moreover, by comparison of the mean asymptotic rates of convergence, we conclude that (4) is asymptotically faster than the traditional fixed point iteration.

At each iteration the scheme (4) determines two approximations which can be combined by using the relaxation technique, that is, the approximation computed at the $k$-th step takes the form of a weighted average between $Y_{k}$ and $X_{k+1}$. The modified relaxed variant is defined by the sequence

\left\{\begin{array}{l}(I_{n}-A_{0})Y_{k}=A_{-1}+\sum_{i=1}^{\infty}A_{i}X_{k}^{i+1},\\ X_{k+1}=Y_{k}+\omega_{k+1}(I_{n}-A_{0})^{-1}A_{1}(Y_{k}^{2}-X_{k}^{2}),\end{array}\right.\quad k\geq 0, \qquad (5)

where $\omega_{k+1}$ is the relaxation parameter. The convergence results for (4) easily extend to the modified scheme in the case of under-relaxation, that is, when the parameter $\omega_{k}$ satisfies $0\leq\omega_{k}\leq 1$. Heuristically, it is argued that over-relaxation values ($\omega_{k}>1$) can improve the convergence. If $X_{0}=0$, under some assumptions a theoretical estimate of the asymptotic convergence rate of (5) is given which confirms this heuristic. Moreover, an adaptive strategy is devised which makes it possible to perform over-relaxed iterations of (5) while still ensuring the convergence of the overall iterative process. The results of extensive numerical experiments confirm the effectiveness of the proposed variants, which generally outperform the $U$-based fixed point iteration for nearly singular problems. In particular, the over-relaxed scheme (5) with $X_{0}=0$, combined with the adaptive strategy for parameter estimation, is capable of significantly accelerating the convergence without increasing the computational cost.

The paper is organized as follows. In Section 2 we set up the theoretical framework, briefly recalling some preliminary properties and assumptions. In Section 3 we revisit classical fixed point iterations for solving (3) by establishing the link with the iterative solution of an associated block Toeplitz block Hessenberg linear system. In Section 4 we introduce the new fixed point iteration (4) by proving some convergence results. The relaxed variant (5), as well as the generalizations of convergence results for this variant, are described in Section 5. Section 6 deals with a formal analysis of the asymptotic convergence rate of both (4) and (5). Adaptive strategies for the choice of the relaxation parameter are discussed in Section 7, together with their cost analysis under some simplified assumptions. Finally, the results of extensive numerical experiments are presented in Section 8 whereas conclusions and future work are the subjects of Section 9.

2 Preliminaries and assumptions

Throughout the paper we assume that $A_{i}$, $i\geq-1$, are $n\times n$ nonnegative matrices, such that their sum $A=\sum_{i=-1}^{\infty}A_{i}$ is irreducible and row stochastic, that is, $A\boldsymbol{e}=\boldsymbol{e}$, $\boldsymbol{e}=\left[1,\ldots,1\right]^{T}$. According to the results of [3, Chapter 4], this assumption implies that (3) has a unique componentwise minimal nonnegative solution $G$; moreover, $I_{n}-A_{0}$ is an invertible M-matrix and, hence, $(I-A_{0})^{-1}\geq 0$.

Furthermore, in view of the Perron-Frobenius theorem, there exists a unique vector $\boldsymbol{v}$ such that $\boldsymbol{v}^{T}A=\boldsymbol{v}^{T}$, $\boldsymbol{v}^{T}\boldsymbol{e}=1$ and $\boldsymbol{v}>0$. If the series $\sum_{i=-1}^{\infty}iA_{i}$ is convergent we may define the vector $\boldsymbol{w}=\sum_{i=-1}^{\infty}iA_{i}\boldsymbol{e}\in\mathbb{R}^{n}$. In the study of M/G/1-type Markov chains, the drift is the scalar $\eta=\boldsymbol{v}^{T}\boldsymbol{w}$ [14]. The sign of the drift determines the positive recurrence of the M/G/1-type Markov chain [3].
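As an illustration, the following minimal numpy sketch computes the drift $\eta$ for a finite family of blocks $A_{-1},\ldots,A_{q}$; the function name and the truncation of the series are our own assumptions, not part of the paper.

```python
import numpy as np

def drift(A_blocks):
    """Drift eta = v^T w for blocks [A_{-1}, A_0, ..., A_q] (finite truncation).

    v is the left Perron eigenvector of A = sum_i A_i, normalized so v^T e = 1,
    and w = sum_i i * A_i * e.
    """
    A = sum(A_blocks)                                   # A = sum_{i=-1}^{q} A_i
    vals, vecs = np.linalg.eig(A.T)
    v = np.real(vecs[:, np.argmax(np.real(vals))])      # eigenvector for eigenvalue 1
    v = v / v.sum()                                     # v > 0, v^T e = 1
    e = np.ones(A.shape[0])
    w = sum(i * Ai @ e for i, Ai in enumerate(A_blocks, start=-1))
    return v @ w
```

A negative value of the returned quantity corresponds to the positive recurrent case of assumption [A1].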

When explicitly stated, we will assume the following condition:

A1. The series $\sum_{i=-1}^{\infty}iA_{i}$ is convergent and $\eta<0$.

Under assumption [A1], the componentwise minimal nonnegative solution $G$ of equation (3) is stochastic, i.e., $G\boldsymbol{e}=\boldsymbol{e}$ [3]. Moreover $G$ is the only stochastic solution.

3 Nonlinear matrix equations and structured linear systems

In this section we reinterpret classical fixed point iterations for solving the matrix equation (3) in terms of iterative methods for solving a structured linear system.

Formally, the power series matrix equation (3) can be rewritten as the following block Toeplitz block Hessenberg linear system

\left[\begin{array}{cccc}I_{n}-A_{0}&-A_{1}&-A_{2}&\ldots\\ -A_{-1}&I_{n}-A_{0}&-A_{1}&\ldots\\ &-A_{-1}&I_{n}-A_{0}&\ddots\\ &&\ddots&\ddots\end{array}\right]\left[\begin{array}{c}X\\ X^{2}\\ X^{3}\\ \vdots\end{array}\right]=\left[\begin{array}{c}A_{-1}\\ 0\\ 0\\ \vdots\end{array}\right].

The above linear system can be expressed in compact form as

H\boldsymbol{\hat{X}}=\boldsymbol{E}A_{-1},\qquad H=I-\tilde{P}, \qquad (6)

where $\boldsymbol{\hat{X}}=\left[X^{T},(X^{T})^{2},\ldots\right]^{T}$, $\tilde{P}$ is the matrix obtained from $P$ in (1) by removing its first block row and block column, and $\boldsymbol{E}=\left[I_{n},0_{n},\ldots\right]^{T}$. Classical fixed point iterations for solving (3) can be interpreted as iterative methods for solving (6), based on suitable partitionings of the matrix $H$.

For instance, from the partitioning $H=M-N$, where $M=I$ and $N=\tilde{P}$, we find that the block vector $\boldsymbol{\hat{X}}$ is a solution of the fixed point problem

\boldsymbol{\hat{X}}=\tilde{P}\boldsymbol{\hat{X}}+\boldsymbol{E}A_{-1}. \qquad (7)

From this equation we may generate the sequence of block vectors

\boldsymbol{\hat{X}}_{k}=\begin{bmatrix}X_{k}\\ X_{k}^{2}\\ X_{k}^{3}\\ \vdots\end{bmatrix},\qquad\boldsymbol{Z}_{k+1}=\begin{bmatrix}X_{k+1}\\ X_{k+1}X_{k}\\ X_{k+1}X_{k}^{2}\\ \vdots\end{bmatrix},

such that

\boldsymbol{Z}_{k+1}=\tilde{P}\boldsymbol{\hat{X}}_{k}+\boldsymbol{E}A_{-1},\qquad k=0,1,\ldots.

We may easily verify that the sequence $\{X_{k}\}_{k}$ coincides with the sequence generated by the so-called natural fixed point iteration $X_{k+1}=\sum_{i=-1}^{\infty}A_{i}X_{k}^{i+1}$, $k=0,1,\ldots$, applied to (3).

Similarly, the Jacobi partitioning, where $M=I\otimes(I_{n}-A_{0})$ and $N=M-H$, leads to the sequence

M\boldsymbol{Z}_{k+1}=N\boldsymbol{\hat{X}}_{k}+\boldsymbol{E}A_{-1},\qquad k=0,1,\ldots,

which corresponds to the traditional fixed point iteration

(I_{n}-A_{0})X_{k+1}=A_{-1}+\sum_{i=1}^{\infty}A_{i}X_{k}^{i+1},\qquad k\geq 0. \qquad (8)

The anti-Gauss-Seidel partitioning, where $M$ is the block upper triangular part of $H$ and $N=M-H$, determines the $U$-based fixed point iteration

\left(I_{n}-\sum_{i=0}^{\infty}A_{i}X_{k}^{i}\right)X_{k+1}=A_{-1},\qquad k\geq 0. \qquad (9)

The convergence properties of these three fixed point iterations are analyzed in [3]. Among the three iterations (9) is the fastest and also the most expensive since it requires the solution of a new linear system (with multiple right hand sides) at each iteration. Moreover, it turns out that fixed-point iterations exhibit arbitrarily slow convergence for problems which are close to singularity. In particular, for positive recurrent Markov chains having a drift close to zero the convergence slows down and the number of iterations becomes arbitrarily large. In the next sections we present some new fixed point iterations which offer several advantages in terms of computational efficiency and convergence properties when compared with (9).
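To fix ideas, here is a minimal numpy sketch of one step of the natural, traditional and $U$-based iterations for a banded problem with blocks $A_{-1},\ldots,A_{q}$; the function names and the finite truncation are illustrative assumptions, not part of the paper.

```python
import numpy as np

def poly_sum(A_blocks, X):
    """sum_{i=-1}^{q} A_i X^{i+1} for A_blocks = [A_{-1}, A_0, ..., A_q]."""
    S, Xp = np.zeros_like(X), np.eye(X.shape[0])        # Xp runs through X^0, X^1, ...
    for Ai in A_blocks:                                 # i = -1, 0, 1, ..., q
        S, Xp = S + Ai @ Xp, Xp @ X
    return S

def natural_step(A_blocks, X):
    # X_{k+1} = A_{-1} + A_0 X_k + A_1 X_k^2 + ...
    return poly_sum(A_blocks, X)

def traditional_step(A_blocks, X):
    # (I - A_0) X_{k+1} = A_{-1} + sum_{i>=1} A_i X_k^{i+1}, cf. (8)
    A0 = A_blocks[1]
    rhs = poly_sum(A_blocks, X) - A0 @ X                # drop the A_0 X_k term
    return np.linalg.solve(np.eye(X.shape[0]) - A0, rhs)

def u_based_step(A_blocks, X):
    # (I - sum_{i>=0} A_i X_k^i) X_{k+1} = A_{-1}, cf. (9)
    U, Xp = np.zeros_like(X), np.eye(X.shape[0])
    for Ai in A_blocks[1:]:                             # i = 0, 1, ..., q
        U, Xp = U + Ai @ Xp, Xp @ X
    return np.linalg.solve(np.eye(X.shape[0]) - U, A_blocks[0])
```

Starting from $X_{0}=0$ and repeating any of these steps until the residual is small produces a monotonically increasing sequence converging to $G$ [3].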

4 A new fixed point iteration

Recently in [7] a comparative analysis has been performed for the asymptotic convergence rates of some regular splittings of a non-singular block upper Hessenberg M-matrix of finite size. The conclusion is that the staircase splitting is faster than the anti-Gauss-Seidel splitting, which in turn is faster than the Jacobi splitting. The second result is classical, while the first one is somewhat surprising since the matrix $M$ in the staircase splitting is much sparser than the corresponding matrix in the anti-Gauss-Seidel partitioning and the splittings are not comparable. Inspired by these convergence properties, we introduce a new fixed point iteration for solving (3), based on the staircase partitioning of $H$, namely,

M=\left[\begin{array}{ccccccc}I_{n}-A_{0}\\ -A_{-1}&I_{n}-A_{0}&-A_{1}\\ &&I_{n}-A_{0}\\ &&-A_{-1}&I_{n}-A_{0}&-A_{1}\\ &&&&I_{n}-A_{0}\\ &&&&\ddots&\ddots&\ddots\end{array}\right],\quad N=M-H. \qquad (10)

The splitting has attracted interest for applications in parallel computing environments [12, 10]. In principle the alternating structure of the matrix $M$ in (10) suggests several different iterative schemes.

On the one hand, the odd block entries of the system $M\boldsymbol{Z}_{k+1}=N\boldsymbol{\hat{X}}_{k}+\boldsymbol{E}A_{-1}$ yield the traditional fixed point iteration. On the other hand, the even block entries lead to the implicit scheme $-A_{-1}+(I_{n}-A_{0})X_{k+1}-A_{1}X_{k+1}^{2}=\sum_{i=2}^{\infty}A_{i}X_{k}^{i+1}$ recently introduced in [4]. Instead, by looking at the structure of the matrix $M$ as a whole, we introduce the following composite two-stage iteration:

\left\{\begin{array}{l}(I_{n}-A_{0})Y_{k}=A_{-1}+\sum_{i=1}^{\infty}A_{i}X_{k}^{i+1},\\ -A_{-1}+(I_{n}-A_{0})X_{k+1}-A_{1}Y_{k}^{2}=\sum_{i=2}^{\infty}A_{i}X_{k}^{i+1},\end{array}\right.\quad k\geq 0, \qquad (11)

or, equivalently,

\left\{\begin{array}{l}(I_{n}-A_{0})Y_{k}=A_{-1}+\sum_{i=1}^{\infty}A_{i}X_{k}^{i+1},\\ X_{k+1}=Y_{k}+(I_{n}-A_{0})^{-1}A_{1}(Y_{k}^{2}-X_{k}^{2}),\end{array}\right.\quad k\geq 0, \qquad (12)

starting from an initial approximation $X_{0}$. At each step $k$, this scheme consists of a traditional fixed point iteration that computes $Y_{k}$ from $X_{k}$, followed by a cheap correction step for computing the new approximation $X_{k+1}$.
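A minimal numpy sketch of one step of (12) may read as follows; the helper name and the finite truncation of the series to $q$ blocks are our own assumptions.

```python
import numpy as np

def staircase_step(A_blocks, X):
    """One step of the two-stage iteration (12): X_k -> (Y_k, X_{k+1}).

    A_blocks = [A_{-1}, A_0, A_1, ..., A_q] are the (truncated) coefficient blocks.
    """
    n = X.shape[0]
    A0, A1 = A_blocks[1], A_blocks[2]
    I = np.eye(n)
    # first stage: (I - A_0) Y_k = A_{-1} + sum_{i>=1} A_i X_k^{i+1}
    rhs, Xp = A_blocks[0].copy(), X @ X                 # Xp runs through X^2, X^3, ...
    for Ai in A_blocks[2:]:                             # i = 1, 2, ..., q
        rhs, Xp = rhs + Ai @ Xp, Xp @ X
    Y = np.linalg.solve(I - A0, rhs)
    # second stage: cheap correction X_{k+1} = Y_k + (I - A_0)^{-1} A_1 (Y_k^2 - X_k^2)
    X_new = Y + np.linalg.solve(I - A0, A1 @ (Y @ Y - X @ X))
    return Y, X_new
```

The dominant cost per step is the same matrix power series evaluation required by the traditional iteration (8), plus one extra matrix product and one extra system solve.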

As for classical fixed point iterations, the convergence is guaranteed when $X_{0}=0$.

Proposition 1.

Assume that $X_{0}=0$. Then the sequence $\{X_{k}\}_{k\in\mathbb{N}}$ generated by (12) converges monotonically to $G$.

Proof.

We show by induction on $k$ that $0\leq X_{k}\leq Y_{k}\leq X_{k+1}\leq G$ for any $k\geq 0$. For $k=0$ we verify easily that

X_{1}\geq Y_{0}=(I_{n}-A_{0})^{-1}A_{-1}\geq 0=X_{0},\quad(I_{n}-A_{0})X_{1}\leq A_{-1}+A_{1}G^{2}\leq(I_{n}-A_{0})G,

which gives $G\geq X_{1}$. Suppose now that $G\geq X_{k}\geq Y_{k-1}\geq X_{k-1}$, $k\geq 1$. We find that

(I_{n}-A_{0})Y_{k}=A_{-1}+\sum_{i=1}^{\infty}A_{i}X_{k}^{i+1}\geq A_{-1}+A_{1}Y_{k-1}^{2}+\sum_{i=2}^{\infty}A_{i}X_{k-1}^{i+1}=(I_{n}-A_{0})X_{k}

and

(I_{n}-A_{0})Y_{k}=A_{-1}+\sum_{i=1}^{\infty}A_{i}X_{k}^{i+1}\leq A_{-1}+\sum_{i=1}^{\infty}A_{i}G^{i+1}=(I_{n}-A_{0})G.

By multiplying both sides by the inverse of $I-A_{0}$ we obtain that $G\geq Y_{k}\geq X_{k}$. This also implies that $Y_{k}^{2}-X_{k}^{2}\geq 0$ and therefore $X_{k+1}\geq Y_{k}$. Since

(I_{n}-A_{0})X_{k+1}=A_{-1}+A_{1}Y_{k}^{2}+\sum_{i=2}^{\infty}A_{i}X_{k}^{i+1}\leq(I_{n}-A_{0})G,

we prove similarly that $G\geq X_{k+1}$. It follows that $\{X_{k}\}_{k\in\mathbb{N}}$ is convergent, the limit solves (3) by continuity and, hence, the limit coincides with the matrix $G$, since $G$ is the minimal nonnegative solution. ∎

A similar result also holds for the case where $X_{0}$ is a stochastic matrix, assuming that [A1] holds, so that $G$ is also stochastic.

Proposition 2.

Assume that condition [A1] is fulfilled and that $X_{0}$ is a stochastic matrix. Then the sequence $\{X_{k}\}_{k\in\mathbb{N}}$ generated by (12) converges to $G$.

Proof.

From (11) we obtain that

\left\{\begin{array}{l}(I_{n}-A_{0})Y_{k}=A_{-1}+\sum_{i=1}^{\infty}A_{i}X_{k}^{i+1},\\ (I_{n}-A_{0})X_{k+1}=A_{-1}+A_{1}Y_{k}^{2}+\sum_{i=2}^{\infty}A_{i}X_{k}^{i+1},\end{array}\right.\quad k\geq 0,

which gives $X_{k}\geq 0$ and $Y_{k}\geq 0$ for any $k\in\mathbb{N}$, since $X_{0}\geq 0$. By assuming that $X_{0}\boldsymbol{e}=\boldsymbol{e}$, we may easily show by induction that $Y_{k}\boldsymbol{e}=X_{k}\boldsymbol{e}=\boldsymbol{e}$ for any $k\geq 0$. Therefore, all the matrices $X_{k}$ and $Y_{k}$, $k\in\mathbb{N}$, are stochastic. Let $\{\hat{X}_{k}\}_{k\in\mathbb{N}}$ be the sequence generated by (12) with $\hat{X}_{0}=0$. We can easily show by induction that $X_{k}\geq\hat{X}_{k}$ for any $k\in\mathbb{N}$. Since $\lim_{k\to\infty}\hat{X}_{k}=G$, any convergent subsequence of $\{X_{k}\}_{k\in\mathbb{N}}$ converges to a stochastic matrix $S$ such that $S\geq G$. Since $G$ is also stochastic, it follows that $S=G$ and therefore, by compactness, we conclude that the sequence $\{X_{k}\}_{k\in\mathbb{N}}$ also converges to $G$. ∎

Propositions 1 and 2 are global convergence results. An estimate of the rate of convergence of (12) will be provided in Section 6, together with a comparison with other existing methods.

5 A relaxed variant

At each iteration, the scheme (12) determines two approximations which can be combined by using a relaxation technique, that is, the approximation computed at the $k$-th step takes the form of a weighted average between $Y_{k}$ and $X_{k+1}$,

X_{k+1}=\omega_{k+1}\bigl(Y_{k}+(I_{n}-A_{0})^{-1}A_{1}(Y_{k}^{2}-X_{k}^{2})\bigr)+(1-\omega_{k+1})Y_{k},\quad k\geq 0.

In matrix terms, the resulting relaxed variant of (12) can be written as

\left\{\begin{array}{l}(I_{n}-A_{0})Y_{k}=A_{-1}+\sum_{i=1}^{\infty}A_{i}X_{k}^{i+1},\\ X_{k+1}=Y_{k}+\omega_{k+1}(I_{n}-A_{0})^{-1}A_{1}(Y_{k}^{2}-X_{k}^{2}),\end{array}\right.\quad k\geq 0. \qquad (13)

If $\omega_{k}=0$ for $k\geq 1$, the relaxed scheme reduces to the traditional fixed point iteration (8). If $\omega_{k}=1$ for $k\geq 1$, the relaxed scheme coincides with (12). Values of $\omega_{k}$ greater than 1 can speed up the convergence of the iterative scheme.
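The step of (13) differs from (12) only in the weighting of the correction term, as in the following sketch; as before, the function name and the truncation to $q$ blocks are illustrative assumptions.

```python
import numpy as np

def relaxed_staircase_step(A_blocks, X, omega):
    """One step of the relaxed scheme (13) with relaxation parameter omega.

    omega = 0 recovers the traditional iteration (8); omega = 1 recovers (12).
    """
    n = X.shape[0]
    A0, A1 = A_blocks[1], A_blocks[2]
    I = np.eye(n)
    rhs, Xp = A_blocks[0].copy(), X @ X                 # A_{-1} + sum_{i>=1} A_i X^{i+1}
    for Ai in A_blocks[2:]:
        rhs, Xp = rhs + Ai @ Xp, Xp @ X
    Y = np.linalg.solve(I - A0, rhs)
    X_new = Y + omega * np.linalg.solve(I - A0, A1 @ (Y @ Y - X @ X))
    return Y, X_new
```

A stationary choice $\omega_{k}=\omega$ simply reuses the same value at every call, while the adaptive variant of Section 7 recomputes $\omega_{k+1}$ from the current iterates.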

Concerning convergence, the proof of Proposition 1 can immediately be generalized to show that the sequence $\{X_{k}\}_{k}$ defined by (13), with $X_{0}=0$, converges for any $\omega_{k}=\omega$, $k\geq 1$, such that $0\leq\omega\leq 1$. Moreover, let $\{X_{k}\}_{k\in\mathbb{N}}$ and $\{\hat{X}_{k}\}_{k\in\mathbb{N}}$ be the sequences generated by (13) for $\omega_{k}=\omega$ and $\omega_{k}=\hat{\omega}$ with $0\leq\omega\leq\hat{\omega}\leq 1$, respectively. It can easily be shown that $G\geq\hat{X}_{k}\geq X_{k}$ for any $k$ and, hence, that the iterative scheme (12) converges faster than (13) if $0\leq\omega_{k}=\omega<1$.

The convergence analysis of the modified scheme (13) for $\omega_{k}>1$ is much more involved since the choice of a relaxation parameter $\omega_{k}>1$ can destroy the monotonicity and the nonnegativity of the approximation sequence, which are at the core of the proofs of Proposition 1 and Proposition 2. In order to maintain the convergence properties of the modified scheme we introduce the following definition.

Definition 3.

The sequence $\{\omega_{k}\}_{k\geq 1}$ is eligible for the scheme (13) if $\omega_{k}\geq 0$, $k\geq 1$, and the following two conditions are satisfied:

\omega_{k+1}A_{1}(Y_{k}^{2}-X_{k}^{2})\leq A_{1}(X_{k+1}^{2}-X_{k}^{2})+\sum_{i=2}^{\infty}A_{i}(Y_{k}^{i+1}-X_{k}^{i+1}),\quad k\geq 0, \qquad (14)

and

X_{k+1}\boldsymbol{e}=Y_{k}\boldsymbol{e}+\omega_{k+1}(I_{n}-A_{0})^{-1}A_{1}(Y_{k}^{2}-X_{k}^{2})\boldsymbol{e}\leq\boldsymbol{e},\quad k\geq 0. \qquad (15)

It is worth noting that condition (14) is implicit since the construction of $X_{k+1}$ also depends on the value of $\omega_{k+1}$. By replacing $X_{k+1}$ in (14) with the expression on the right-hand side of (13) we obtain a quadratic inequality with matrix coefficients in the variable $\omega_{k+1}$. Obviously, the constant sequences $\omega_{k}=\omega$ with $0\leq\omega\leq 1$, $k\geq 1$, are eligible.

The following generalization of Proposition 1 holds.

Proposition 4.

Set $X_{0}=0$ and let condition [A1] be satisfied. If $\{\omega_{k}\}_{k\geq 1}$ is eligible then the sequence $\{X_{k}\}_{k\in\mathbb{N}}$ generated by (13) converges monotonically to $G$.

Proof.

We show by induction that $0\leq X_{k}\leq Y_{k}\leq X_{k+1}\leq G$. For $k=0$ we have

(I_{n}-A_{0})X_{1}\geq(I_{n}-A_{0})Y_{0}=A_{-1}\geq 0=X_{0},

which gives immediately $X_{1}\geq Y_{0}\geq 0$. Moreover, $X_{1}\boldsymbol{e}\leq\boldsymbol{e}$. Suppose now that $X_{k}\geq Y_{k-1}\geq X_{k-1}\geq 0$, $k\geq 1$. We find that

\begin{array}{l}(I_{n}-A_{0})X_{k}=A_{-1}+A_{1}\bigl(X_{k-1}^{2}+\omega_{k}(Y_{k-1}^{2}-X_{k-1}^{2})\bigr)+\sum_{i=2}^{\infty}A_{i}X_{k-1}^{i+1}\\ \leq A_{-1}+A_{1}X_{k-1}^{2}+A_{1}(X_{k}^{2}-X_{k-1}^{2})+\sum_{i=2}^{\infty}A_{i}(Y_{k-1}^{i+1}-X_{k-1}^{i+1})+\sum_{i=2}^{\infty}A_{i}X_{k-1}^{i+1}\\ \leq A_{-1}+A_{1}X_{k}^{2}+\sum_{i=2}^{\infty}A_{i}Y_{k-1}^{i+1}\leq A_{-1}+\sum_{i=1}^{\infty}A_{i}X_{k}^{i+1}=(I_{n}-A_{0})Y_{k},\end{array}

from which it follows that $Y_{k}\geq X_{k}\geq 0$. This also implies that $X_{k+1}\geq Y_{k}$. From (15) it follows that $X_{k}\boldsymbol{e}\leq\boldsymbol{e}$ for all $k\geq 0$ and therefore the sequence of approximations is bounded above and has a finite limit $H$. By continuity we find that $H$ solves the matrix equation (3) and $H\boldsymbol{e}\leq\boldsymbol{e}$. Since $G$ is the unique stochastic solution, then $H=G$. ∎

Remark 5.

As previously mentioned, condition (14) is implicit, since $X_{k+1}$ also depends on $\omega_{k+1}$. An explicit condition can be derived by noting that

A_{1}(X_{k+1}^{2}-X_{k}^{2})\geq A_{1}\bigl((Y_{k}^{2}-X_{k}^{2})+\omega_{k+1}(Y_{k}\Gamma_{k}+\Gamma_{k}Y_{k})\bigr),

with $\Gamma_{k}=(I_{n}-A_{0})^{-1}A_{1}(Y_{k}^{2}-X_{k}^{2})$. It follows that (14) is fulfilled whenever

\frac{\omega_{k+1}-1}{\omega_{k+1}}A_{1}(Y_{k}^{2}-X_{k}^{2})\leq A_{1}(Y_{k}\Gamma_{k}+\Gamma_{k}Y_{k})+\omega_{k+1}^{-1}\sum_{i=2}^{\infty}A_{i}(Y_{k}^{i+1}-X_{k}^{i+1}),

which can be reduced to a linear inequality in $\omega_{k+1}$ over a fixed search interval. Let $\omega\in[1,\hat{\omega}]$ be such that

\frac{\omega-1}{\omega}A_{1}(Y_{k}^{2}-X_{k}^{2})\leq A_{1}(Y_{k}\Gamma_{k}+\Gamma_{k}Y_{k})+{\hat{\omega}}^{-1}\sum_{i=2}^{\infty}A_{i}(Y_{k}^{i+1}-X_{k}^{i+1}). \qquad (16)

Then we can impose that

\omega_{k+1}=\max\{\omega\colon\omega\in[1,\hat{\omega}]\ \text{and (16) holds}\}. \qquad (17)

From a computational viewpoint the strategy based on (16) and (17) for the choice of the value of $\omega_{k+1}$ can be too expensive and some weakened criterion should be considered (compare with Section 7 below).

In the following section we perform a convergence analysis to estimate the convergence rate of (13) in the stationary case $\omega_{k}=\omega$, $k\geq 1$, as a function of the relaxation parameter.

6 Estimate of the convergence rate

Relaxation techniques are usually aimed at accelerating the convergence speed of frustratingly slow iterative solvers. Such inefficient behavior is typically exhibited when the solver is applied to a nearly singular problem. Incorporating a relaxation parameter into the iterative scheme can greatly improve its convergence rate. Preliminary insights on the effectiveness of relaxation techniques applied for the solution of the fixed point problem (7) come from the classical analysis for stationary iterative solvers and are developed in Section 6.1. A precise convergence analysis is presented in Section 6.2.

6.1 Finite dimensional convergence analysis

Suppose that $H$ in (6) is block tridiagonal of finite size $m=n\ell$, $\ell$ even. We are interested in comparing the iterative algorithm based on the splitting (10) with other classical iterative solvers for the solution of a linear system with coefficient matrix $H$. As usual, we can write $H=D-P_{1}-P_{2}$, where $D$ is block diagonal, while $P_{1}$ and $P_{2}$ are staircase matrices with zero block diagonal. The eigenvalues $\lambda_{i}$ of the Jacobi iteration matrix satisfy

0=\det(\lambda_{i}I-D^{-1}P_{1}-D^{-1}P_{2}).

Let us consider a relaxed scheme where the matrix $M$ is obtained from (10) by multiplying the off-diagonal blocks by $\omega$. The eigenvalues $\mu_{i}$ of the iteration matrix associated with the relaxed staircase regular splitting are such that

0=\det\bigl(\mu_{i}(D-\omega P_{1})-(P_{2}+(1-\omega)P_{1})\bigr)

and, equivalently,

0=\det\bigl(\mu_{i}I-(\mu_{i}\omega+(1-\omega))D^{-1}P_{1}-D^{-1}P_{2}\bigr).

By using a similarity transformation induced by the matrix $S=I_{\ell/2}\otimes\operatorname{diag}\left[I_{n},\alpha I_{n}\right]$ we find that

\det\bigl(\mu_{i}I-(\mu_{i}\omega+(1-\omega))D^{-1}P_{1}-D^{-1}P_{2}\bigr)=\det\Bigl(\mu_{i}I-\alpha(\mu_{i}\omega+(1-\omega))D^{-1}P_{1}-\frac{1}{\alpha}D^{-1}P_{2}\Bigr).

It follows that

\mu_{i}\alpha=\lambda_{i}

whenever $\alpha$ fulfills

\alpha(\mu_{i}\omega+(1-\omega))=\frac{1}{\alpha}.

Therefore, the eigenvalues of the Jacobi and relaxed staircase regular splittings are related by

\mu_{i}^{2}-\lambda_{i}^{2}\mu_{i}\omega+\lambda_{i}^{2}(\omega-1)=0.

For $\omega=0$ the staircase splitting reduces to the Jacobi partitioning. For $\omega=1$ we find that $\mu_{i}=\lambda_{i}^{2}$, which yields the classical relation between the spectral radii of the Jacobi and Gauss-Seidel methods. Observe that it is well known that the asymptotic convergence rates of Gauss-Seidel and the staircase iteration coincide when applied to a block tridiagonal matrix [1]. For $\omega>1$ the spectral radius of the relaxed staircase scheme can be significantly less than the spectral radius of the same scheme for $\omega=1$. In Figure 1 we illustrate the plot of the function

\rho_{S}(\omega)=\left|\frac{\lambda^{2}\omega+\sqrt{\lambda^{4}\omega^{2}-4\lambda^{2}(\omega-1)}}{2}\right|,\quad 1\leq\omega\leq 2,

for a fixed $\lambda=0.999$.

Figure 1: Plot of $\rho_{S}(\omega)$ for $\omega\in[1,2]$ and $\lambda=0.999$.

For the best choice of $\omega=\omega^{\star}=2\,\frac{1-\sqrt{1-\lambda^{2}}}{\lambda^{2}}=\frac{2}{1+\sqrt{1-\lambda^{2}}}$ we find $\rho_{S}(\omega^{\star})=1-\sqrt{1-\lambda^{2}}=\frac{\lambda^{2}}{1+\sqrt{1-\lambda^{2}}}$.
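The following short numpy sketch evaluates $\rho_{S}(\omega)$ on $[1,2]$ and compares the numerically observed minimizer with the closed-form $\omega^{\star}$ above (for $\lambda=0.999$ both give $\omega^{\star}\approx 1.914$ and $\rho_{S}(\omega^{\star})\approx 0.955$, to be compared with $\rho_{S}(1)=\lambda^{2}\approx 0.998$); the script is an illustration of the formula, not taken from the paper.

```python
import numpy as np

lam = 0.999
omega = np.linspace(1.0, 2.0, 20001)
disc = lam**4 * omega**2 - 4 * lam**2 * (omega - 1)       # may be negative (complex roots)
rho_S = np.abs((lam**2 * omega + np.sqrt(disc.astype(complex))) / 2)

omega_star = 2 * (1 - np.sqrt(1 - lam**2)) / lam**2       # closed-form optimal parameter
rho_star = 1 - np.sqrt(1 - lam**2)

print(omega[np.argmin(rho_S)], rho_S.min())               # ~ 1.914, ~ 0.955
print(omega_star, rho_star)                               # ~ 1.914, ~ 0.955
```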

6.2 Asymptotic Convergence Rate

A formal analysis of the asymptotic convergence rate of the relaxed variants (13) can be carried out by using the tools described in [11]. In this section we relate the approximation errors at two subsequent steps and we provide an estimate of the asymptotic rate of convergence, expressed as the spectral radius of a suitable matrix depending on $\omega$.

Hereafter it is assumed that assumption [A1] holds.

6.2.1 The case $X_{0}=0$

Let us introduce the error matrix $E_{k}=G-X_{k}$, where $\{X_{k}\}_{k\in\mathbb{N}}$ is generated by (13) with $X_{0}=0$. We also define $E_{k+1/2}=G-Y_{k}$, for $k=0,1,2,\ldots$. Suppose that

C0. $\{\omega_{k}\}_{k}$ is an eligible sequence according to Definition 3.

Under this assumption, from Proposition 4, the sequence $\{X_{k}\}_{k}$ converges monotonically to $G$ and $E_{k}\geq 0$, $E_{k+1/2}\geq 0$. Since $E_{k}\geq 0$ and $\|E_{k}\|_{\infty}=\|E_{k}\boldsymbol{e}\|_{\infty}$, we analyze the convergence of the vector $\boldsymbol{\epsilon}_{k}=E_{k}\boldsymbol{e}$, $k\geq 0$.

We have

(I_{n}-A_{0})E_{k+1/2}=\sum_{i=1}^{\infty}A_{i}(G^{i+1}-X_{k}^{i+1})=\sum_{i=1}^{\infty}A_{i}\sum_{j=0}^{i}G^{j}E_{k}X_{k}^{i-j}. \qquad (18)

Similarly, for the second equation of (13), we find that

(I_{n}-A_{0})E_{k+1}=(I_{n}-A_{0})E_{k+1/2}-\omega_{k+1}A_{1}\bigl((G^{2}-X_{k}^{2})-(G^{2}-Y_{k}^{2})\bigr),

which gives

(I_{n}-A_{0})E_{k+1}=(I_{n}-A_{0})E_{k+1/2}-\omega_{k+1}A_{1}(E_{k}G+X_{k}E_{k})+\omega_{k+1}A_{1}(GE_{k+1/2}+E_{k+1/2}Y_{k}). \qquad (19)

Denote by $R_{k}$ the matrix on the right-hand side of (18), i.e.,

R_{k}=\sum_{i=1}^{\infty}A_{i}\sum_{j=0}^{i}G^{j}E_{k}X_{k}^{i-j}.

Since $G\boldsymbol{e}=\boldsymbol{e}$, equation (19), together with the monotonicity, yields

(I_{n}-A_{0})E_{k+1}\boldsymbol{e}\leq R_{k}\boldsymbol{e}-\omega_{k+1}A_{1}(I_{n}+X_{k})E_{k}\boldsymbol{e}+\omega_{k+1}A_{1}(I_{n}+G)(I_{n}-A_{0})^{-1}R_{k}\boldsymbol{e}. \qquad (20)

Observe that $R_{k}\boldsymbol{e}\leq WE_{k}\boldsymbol{e}$, where

W=\sum_{i=1}^{\infty}A_{i}\sum_{j=0}^{i}G^{j},

hence

\boldsymbol{\epsilon}_{k+1}\leq P(\omega_{k+1})\boldsymbol{\epsilon}_{k},\quad k\geq 0, \qquad (21)

where

P(\omega)=(I_{n}-A_{0})^{-1}W-\omega(I_{n}-A_{0})^{-1}A_{1}(I_{n}+G)\bigl(I_{n}-(I_{n}-A_{0})^{-1}W\bigr). \qquad (22)

The matrix $P(\omega)$ can be written as $P(\omega)=M^{-1}N(\omega)$, where

M=I_{n}-A_{0},\qquad M-N(\omega)=A(\omega),

and

A(\omega)=\bigl(I_{n}+\omega A_{1}(I_{n}+G)(I_{n}-A_{0})^{-1}\bigr)(I_{n}-A_{0}-W).

Let us assume the following condition holds:

C1. The relaxation parameter $\omega$ satisfies $\omega\in[0,\hat{\omega}]$ with $\hat{\omega}\geq 1$ such that

\hat{\omega}A_{1}(I_{n}+G)(I_{n}-A_{0})^{-1}(I_{n}-A_{0}-W)\leq W.

Assumption [C1] ensures that $N(\omega)\geq 0$ and, therefore, $P(\omega)\geq 0$ and $M-N(\omega)=A(\omega)$ is a regular splitting of $A(\omega)$. If [C1] is satisfied at each iteration of (13), then from (21) we obtain that

\boldsymbol{\epsilon}_{k}\leq P(\omega_{k})P(\omega_{k-1})\cdots P(\omega_{1})\boldsymbol{\epsilon}_{0},\qquad k\geq 1.

Therefore, the asymptotic rate of convergence, defined as

\sigma=\limsup_{k\to\infty}\left(\frac{\|\boldsymbol{\epsilon}_{k}\|}{\|\boldsymbol{\epsilon}_{0}\|}\right)^{1/k},

where $\|\cdot\|$ is any vector norm, is such that

\sigma\leq\limsup_{k\to\infty}\|P(\omega_{k})P(\omega_{k-1})\cdots P(\omega_{1})\|_{\infty}^{1/k}.

The above properties can be summarized in the following result, that gives a convergence rate estimate for iteration (13).

Proposition 6.

Under Assumptions [A1], [C0] and [C1], for the fixed point iteration (13) applied with $\omega_{k}=\omega$ for all $k$, we have the following convergence rate estimate:

\sigma=\limsup_{k\to\infty}\left(\frac{\|\boldsymbol{\epsilon}_{k}\|}{\|\boldsymbol{\epsilon}_{0}\|}\right)^{1/k}\leq\rho_{\omega},

where $P(\omega)$ is defined in (22) and $\rho_{\omega}=\rho(P(\omega))$ is the spectral radius of $P(\omega)$.

When $\omega=0$, we find that $A(0)=I_{n}-A_{0}-W=I_{n}-V$, where $V=\sum_{i=0}^{\infty}A_{i}\sum_{j=0}^{i}G^{j}$. According to Theorem 4.14 in [3], $I-V$ is a nonsingular M-matrix and therefore, since $N(0)\geq 0$ and $M^{-1}\geq 0$, $A(0)=M-N(0)$ is a regular splitting. Hence, the spectral radius $\rho_{0}$ of $P(0)$ is less than 1. More generally, under Assumption [C1], since

I_{n}-V\leq A(\omega)\leq I_{n}-A,

from characterization F20 in [15] we find that $A(\omega)$ is a nonsingular M-matrix and $A(\omega)=M-N(\omega)$ is a regular splitting. Hence, we deduce that $\rho_{\omega}<1$. The following result gives an estimate of $\rho_{\omega}$ by showing its dependence on the relaxation parameter.

Proposition 7.

Let $\omega$ be such that $0\leq\omega\leq\hat{\omega}$ and condition [C1] holds. Assume that the Perron eigenvector $\boldsymbol{v}$ of $P(0)$ is positive. Then we have

\rho_{0}-\omega(1-\rho_{0})\sigma_{\max}\leq\rho_{\omega}\leq\rho_{0}-\omega(1-\rho_{0})\sigma_{\min}, \qquad (23)

where $\sigma_{\min}=\min_{i}\frac{u_{i}}{v_{i}}$ and $\sigma_{\max}=\max_{i}\frac{u_{i}}{v_{i}}$, with $\boldsymbol{u}=(I_{n}-A_{0})^{-1}A_{1}(I+G)\boldsymbol{v}$. Moreover, $0\leq\sigma_{\min},\sigma_{\max}\leq\rho_{0}$.

Proof.

In view of the classical Collatz-Wielandt formula (see [13], Chapter 8), if $P(\omega)\boldsymbol{v}=\boldsymbol{w}$, where $\boldsymbol{v}>0$ and $\boldsymbol{w}\geq 0$, then

\min_{i}\frac{w_{i}}{v_{i}}\leq\rho_{\omega}\leq\max_{i}\frac{w_{i}}{v_{i}}.

Observe that

\begin{array}{l}\boldsymbol{w}=P(\omega)\boldsymbol{v}=P(0)\boldsymbol{v}-\omega(I_{n}-A_{0})^{-1}A_{1}(I_{n}+G)(I-P(0))\boldsymbol{v}\\ =\rho_{0}\boldsymbol{v}-\omega(1-\rho_{0})(I_{n}-A_{0})^{-1}A_{1}(I_{n}+G)\boldsymbol{v}=\rho_{0}\boldsymbol{v}-\omega(1-\rho_{0})\boldsymbol{u},\end{array}

which leads to (23), since $\boldsymbol{u}\geq 0$. Moreover, since $A_{1}(I_{n}+G)\leq W$, then

\boldsymbol{u}=(I_{n}-A_{0})^{-1}A_{1}(I_{n}+G)\boldsymbol{v}\leq(I_{n}-A_{0})^{-1}W\boldsymbol{v}=\rho_{0}\boldsymbol{v},

hence $u_{i}/v_{i}\leq\rho_{0}$ for any $i$. ∎

Observe that, in the QBD case, where $A_{i}=0$ for $i\geq 2$, we have $A_{1}(I_{n}+G)=W$ and, from the proof above, $\boldsymbol{u}=\rho_{0}\boldsymbol{v}$. Therefore, we have $\sigma_{\min}=\sigma_{\max}=\rho_{0}$ and, hence, $\rho_{\omega}=\rho_{0}(1-\omega(1-\rho_{0}))$. In particular, $\rho_{\omega}$ decreases linearly with $\omega$, and $\rho_{1}=\rho_{0}^{2}$. In the general case, inequality (23) shows that the upper bound on $\rho_{\omega}$ decreases linearly as a function of $\omega$. Therefore the choice $\omega=\hat{\omega}$ gives the fastest convergence rate.

Remark 8.

For the sake of illustration we consider a quadratic equation associated with a block tridiagonal Markov chain taken from [9]. We set $A_{-1}=W+\delta I$, $A_{0}=A_{1}=W$, where $0<\delta<1$ and $W\in\mathbb{R}^{n\times n}$ has zero diagonal entries and all off-diagonal entries equal to a given value $\alpha$ determined so that $A_{-1}+A_{0}+A_{1}$ is stochastic. We find that for $\omega_{k}=\omega\in[0,6]$ we have $N(\omega)\geq 0$. In Figure 2 we plot the spectral radius of $P=P(\omega)$. The linear plot is in accordance with Proposition 7.
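A self-contained numpy sketch of this experiment is given below; it builds the blocks of the Remark, approximates $G$ by the traditional iteration and evaluates $\rho(P(\omega))$ for a few values of $\omega$. The specific values of $n$, $\delta$ and the iteration cap are our own choices for illustration.

```python
import numpy as np

n, delta = 20, 1e-2
alpha = (1 - delta) / (3 * (n - 1))            # makes A_{-1} + A_0 + A_1 stochastic
W0 = alpha * (np.ones((n, n)) - np.eye(n))     # zero diagonal, constant off-diagonal
A_m1, A0, A1 = W0 + delta * np.eye(n), W0.copy(), W0.copy()
I = np.eye(n)

# approximate the minimal solution G of X = A_{-1} + A_0 X + A_1 X^2 (traditional iteration)
G = np.zeros((n, n))
for _ in range(100000):
    G_new = np.linalg.solve(I - A0, A_m1 + A1 @ G @ G)
    if np.abs(G_new - G).max() < 1e-14:
        G = G_new
        break
    G = G_new

# iteration matrix P(omega) of (22); in the QBD case W = A_1 (I + G)
Wmat = A1 @ (I + G)
Minv = np.linalg.inv(I - A0)

def rho_P(omega):
    P = Minv @ Wmat - omega * Minv @ A1 @ (I + G) @ (I - Minv @ Wmat)
    return max(abs(np.linalg.eigvals(P)))

for omega in (0.0, 1.0, 2.0, 4.0, 6.0):
    print(omega, rho_P(omega))                 # decreases linearly with omega, cf. Figure 2
```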

Figure 2: Plot of $\rho(P(\omega))$ for $\omega\in[0,6]$.

6.2.2 The case $X_{0}$ stochastic

In this section we analyze the convergence of the iterative method (13) starting with a stochastic matrix $X_{0}$, that is, $X_{0}\geq 0$ and $X_{0}\boldsymbol{e}=\boldsymbol{e}$. Eligible sequences $\{\omega_{k}\}_{k}$ are such that $X_{k}\geq 0$ for any $k\geq 0$. This happens for $0\leq\omega_{k}\leq 1$, $k\geq 1$. Suppose that:

S0. The sequence $\{\omega_{k}\}_{k}$ in (13) is determined so that $\omega_{k}\geq 0$ and $X_{k}\geq 0$ for any $k\geq 1$.

Observe that the property $X_{k}\boldsymbol{e}=\boldsymbol{e}$, $k\geq 0$, is automatically satisfied. Hence, under assumption [S0], all the approximations generated by the iterative scheme (13) are stochastic matrices and, therefore, Proposition 2 can be extended in order to prove that the sequence $\{X_{k}\}_{k\in\mathbb{N}}$ converges to $G$.

The analysis of the speed of convergence follows from relation (19). Let us denote by $\operatorname{vec}(A)\in\mathbb{R}^{n^{2}}$ the vector obtained by stacking the columns of the matrix $A\in\mathbb{R}^{n\times n}$ on top of one another. Recall that $\operatorname{vec}(ABC)=(C^{T}\otimes A)\operatorname{vec}(B)$ for any $A,B,C\in\mathbb{R}^{n\times n}$. By using this property we can rewrite (19) as follows. We have:

\begin{array}{l}[I_{n}\otimes(I_{n}-A_{0})]\operatorname{vec}(E_{k+1})=\omega_{k+1}[I_{n}\otimes A_{1}G(I_{n}-A_{0})^{-1}A_{1}G]\operatorname{vec}(E_{k})+\\ \omega_{k+1}[X_{k}^{T}\otimes A_{1}G(I_{n}-A_{0})^{-1}A_{1}]\operatorname{vec}(E_{k})+\omega_{k+1}[Y_{k}^{T}\otimes A_{1}(I_{n}-A_{0})^{-1}A_{1}G]\operatorname{vec}(E_{k})+\\ \omega_{k+1}[Y_{k}^{T}X_{k}^{T}\otimes A_{1}(I_{n}-A_{0})^{-1}A_{1}]\operatorname{vec}(E_{k})+(1-\omega_{k+1})[I_{n}\otimes A_{1}G]\operatorname{vec}(E_{k})+\\ (1-\omega_{k+1})[X_{k}^{T}\otimes A_{1}]\operatorname{vec}(E_{k})+[\sum_{i=2}^{\infty}\sum_{j=0}^{i}({X_{k}^{T}}^{i-j}\otimes A_{i}G^{j})]\operatorname{vec}(E_{k})+\\ \omega_{k+1}[\sum_{i=2}^{\infty}\sum_{j=0}^{i}({X_{k}^{T}}^{i-j}\otimes A_{1}G(I_{n}-A_{0})^{-1}A_{i}G^{j})]\operatorname{vec}(E_{k})+\\ \omega_{k+1}[\sum_{i=2}^{\infty}\sum_{j=0}^{i}({Y_{k}^{T}}{X_{k}^{T}}^{i-j}\otimes A_{1}(I_{n}-A_{0})^{-1}A_{i}G^{j})]\operatorname{vec}(E_{k}),\end{array} \qquad (24)

for $k\geq 0$. The convergence of $\{\operatorname{vec}(E_{k})\}$ depends on the choice of $\omega_{k+1}$, $k\geq 0$. Suppose that $\omega_{k}=\omega$ for any $k\geq 0$ and that [S0] holds. Then (24) can be rewritten in compact form as

\operatorname{vec}(E_{k+1})=H_{k}\operatorname{vec}(E_{k}),\quad k\geq 0,

where $H_{k}=H_{\omega}(X_{k},Y_{k})$ and

\lim_{k\rightarrow+\infty}H_{k}=H_{\omega}(G,G)=H_{\omega}.

It can be shown that the asymptotic rate of convergence $\sigma$ satisfies

\sigma=\limsup_{k\to\infty}\left(\frac{\|\operatorname{vec}(E_{k+1})\|}{\|\operatorname{vec}(E_{0})\|}\right)^{1/k}\leq\rho(H_{\omega}).

In the sequel we compare the cases $\omega_{k}=0$, which corresponds to the traditional fixed point iteration (8), and $\omega_{k}=1$, which reduces to the staircase fixed point iteration (12).

For $\omega_{k+1}=0$, $k\geq 0$, we find that

\operatorname{vec}(E_{k+1})=\Bigl[\sum_{i=1}^{\infty}\sum_{j=0}^{i}\bigl({X_{k}^{T}}^{i-j}\otimes(I_{n}-A_{0})^{-1}A_{i}G^{j}\bigr)\Bigr]\operatorname{vec}(E_{k}),\quad k\geq 0,

which means that

H_{0}=\sum_{i=1}^{\infty}\sum_{j=0}^{i}\bigl({G^{T}}^{i-j}\otimes(I_{n}-A_{0})^{-1}A_{i}G^{j}\bigr).

Let $U^{H}G^{T}U=T$ be the Schur form of $G^{T}$ and set $W=(U^{H}\otimes I_{n})$. Then

W\sum_{i=1}^{\infty}\sum_{j=0}^{i}\bigl({G^{T}}^{i-j}\otimes(I_{n}-A_{0})^{-1}A_{i}G^{j}\bigr)W^{-1}=\sum_{i=1}^{\infty}\sum_{j=0}^{i}\bigl(T^{i-j}\otimes(I_{n}-A_{0})^{-1}A_{i}G^{j}\bigr),

which means that $H_{0}$ is similar to the matrix on the right-hand side. It follows that the eigenvalues of $H_{0}$ belong to the set

\bigcup_{\lambda}\Bigl\{\mu\colon\mu\ \text{is an eigenvalue of}\ \sum_{i=1}^{\infty}\sum_{j=0}^{i}\lambda^{i-j}(I_{n}-A_{0})^{-1}A_{i}G^{j}\Bigr\},

with $\lambda$ an eigenvalue of $G$. Since $G$ is stochastic we have $|\lambda|\leq 1$. Thus, from

\Bigl|\sum_{i=1}^{\infty}\sum_{j=0}^{i}\lambda^{i-j}(I_{n}-A_{0})^{-1}A_{i}G^{j}\Bigr|\leq\sum_{i=1}^{\infty}\sum_{j=0}^{i}(I_{n}-A_{0})^{-1}A_{i}G^{j}=P(0),

we conclude that $\rho(H_{0})\leq\rho(P(0))$ in view of the Wielandt theorem [13].

A similar analysis can be performed in the case $\omega_{k}=1$, $k\geq 0$. We find that

\begin{array}{l}H_{1}=[I_{n}\otimes(I_{n}-A_{0})^{-1}]\Bigl\{\sum_{i=2}^{\infty}\sum_{j=0}^{i}\bigl({G^{T}}^{i-j}\otimes A_{i}G^{j}\bigr)+\\ \sum_{i=1}^{\infty}\sum_{j=0}^{i}\bigl[\bigl({G^{T}}^{i-j}\otimes A_{1}G(I_{n}-A_{0})^{-1}A_{i}G^{j}\bigr)+\bigl({G^{T}}^{i-j+1}\otimes A_{1}(I_{n}-A_{0})^{-1}A_{i}G^{j}\bigr)\bigr]\Bigr\}.\end{array}

By the same arguments as above we find that the eigenvalues of $H_{1}$ belong to the set

\bigcup_{\lambda}\{\mu\colon\mu\ \text{is an eigenvalue of}\ (I_{n}-A_{0})^{-1}N(\lambda)\},

with $\lambda$ an eigenvalue of $G^{T}$, and

\begin{array}{l}N(\lambda)=\sum_{i=2}^{\infty}\sum_{j=0}^{i}\bigl(\lambda^{i-j}A_{i}G^{j}\bigr)+\sum_{i=1}^{\infty}\sum_{j=0}^{i}\bigl(\lambda^{i-j}A_{1}G(I_{n}-A_{0})^{-1}A_{i}G^{j}\bigr)+\\ \sum_{i=1}^{\infty}\sum_{j=0}^{i}\bigl(\lambda^{i-j+1}A_{1}(I_{n}-A_{0})^{-1}A_{i}G^{j}\bigr).\end{array}

Since

|(I_{n}-A_{0})^{-1}N(\lambda)|\leq P(1),

we conclude that

\rho(H_{1})\leq\rho(P(1)).

Therefore, in the application of (12) we expect a faster convergence when $X_{0}$ is a stochastic matrix rather than $X_{0}=0$. Indeed, the numerical results shown in Section 8 exhibit a very rapid convergence profile when $X_{0}$ is stochastic, even better than the one predicted by $\rho(H_{1})$. This might be explained by the dependence of the asymptotic convergence rate on the second eigenvalue of the corresponding iteration matrices, as reported in [11].

7 Adaptive Strategies and Efficiency Analysis

The efficiency of fixed point iterations depends on both speed of convergence and complexity properties. Markov chains are generally defined in terms of sparse matrices. To take this feature into account we assume that $\gamma n^{2}$, $\gamma=\gamma(n)$, multiplications/divisions are sufficient to perform the following tasks:

  1. to compute a matrix product of the form $A_{i}\cdot W$, where $A_{i},W\in\mathbb{R}^{n\times n}$;

  2. to solve a linear system of the form $(I-A_{0})Z=W$, where $A_{0},W\in\mathbb{R}^{n\times n}$.

We also suppose that the transition matrix $P$ in (1) is banded, hence $A_{i}=0$ for $i>q$. This is always the case in numerical computations, where the matrix power series $\sum_{i=-1}^{\infty}A_{i}X_{k}^{i+1}$ has to be approximated by some finite partial sum $\sum_{i=-1}^{q}A_{i}X_{k}^{i+1}$. Under these assumptions we obtain the following cost estimates per step:

  1. the traditional fixed point iteration (8) requires $qn^{3}+2\gamma n^{2}+O(n^{2})$ multiplicative operations;

  2. the $U$-based fixed point iteration (9) requires $(q+4/3)n^{3}+\gamma n^{2}+O(n^{2})$ multiplicative operations;

  3. the staircase-based (S-based) fixed point iteration (12) requires $(q+1)n^{3}+4\gamma n^{2}+O(n^{2})$ multiplicative operations.

Observe that the cost of the S-based fixed point iteration is comparable with the cost of the $U$-based iteration, which is the fastest among classical iterations [3]. Therefore, in the cases where the $U$/S-based fixed point iterative schemes require significantly fewer iterations to converge, these algorithms are more efficient than the traditional fixed point iteration.

Concerning the relaxed versions (13) of the S-based fixed point iteration, for a given fixed choice of $\omega_{k}=\omega$ we get the same complexity as the unmodified scheme (12) obtained with $\omega_{k}=\omega=1$. The adaptive selection of $\omega_{k+1}$ exploited in Proposition 4 and Remark 5 with $X_{0}=0$ requires more care.

The strategy (16) is computationally unfeasible since it needs the additional computation of $\sum_{i=2}^{q}A_{i}Y_{k}^{i+1}$. To approximate this quantity we recall that

A_{i}(Y_{k}^{i+1}-X_{k}^{i+1})=A_{i}\sum_{j=0}^{i}Y_{k}^{j}(Y_{k}-X_{k})X_{k}^{i-j}\geq A_{i}\sum_{j=0}^{i}X_{k}^{j}(Y_{k}-X_{k})X_{k-1}^{i-j}.

Let $\theta_{k+1}$ be such that

Y_{k}-X_{k}\geq\frac{X_{k}-X_{k-1}}{\theta_{k+1}}.

Then condition (16) can be replaced with

\frac{\omega_{k+1}-1}{\omega_{k+1}}A_{1}(Y_{k}^{2}-X_{k}^{2})\leq A_{1}(Y_{k}\Gamma_{k}+\Gamma_{k}Y_{k})+(\hat{\omega}\theta_{k+1})^{-1}\sum_{i=2}^{q}A_{i}(X_{k}^{i+1}-X_{k-1}^{i+1}). \qquad (25)

The iterative scheme (13), complemented with the strategy based on (25) for the selection of the parameter $\omega_{k+1}$, requires no more than $(q+3)n^{3}+4\gamma n^{2}+O(n^{2})$ multiplicative operations per step. The efficiency of this scheme will be investigated experimentally in the next section.
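A possible implementation of this adaptive choice is sketched below; it selects the largest $\omega_{k+1}\in[1,\hat{\omega}]$ compatible with the explicit criterion (25), and additionally enforces the row-sum condition (15). The entrywise reduction of (25) to a bound on $(\omega-1)/\omega$ and all function and variable names are our own assumptions, not part of the paper.

```python
import numpy as np

def choose_omega(A_blocks, X_prev, X, Y, omega_hat=10.0, tiny=1e-15):
    """Adaptive relaxation parameter along the lines of (25) and (15) (a sketch).

    A_blocks = [A_{-1}, A_0, A_1, ..., A_q]; X_prev = X_{k-1}, X = X_k, Y = Y_k.
    """
    n = X.shape[0]
    I, A0, A1 = np.eye(n), A_blocks[1], A_blocks[2]
    e = np.ones(n)

    Gam = np.linalg.solve(I - A0, A1 @ (Y @ Y - X @ X))        # Gamma_k
    L = A1 @ (Y @ Y - X @ X)                                   # left-hand side of (25)

    # theta_{k+1} such that Y_k - X_k >= (X_k - X_{k-1}) / theta_{k+1}
    theta = max(1.0, np.max((X - X_prev) / np.maximum(Y - X, tiny)))

    # right-hand side of (25)
    S = np.zeros_like(X)                                       # sum_{i>=2} A_i (X_k^{i+1} - X_{k-1}^{i+1})
    Xp, Pp = np.linalg.matrix_power(X, 3), np.linalg.matrix_power(X_prev, 3)
    for Ai in A_blocks[3:]:                                    # i = 2, 3, ..., q
        S += Ai @ (Xp - Pp)
        Xp, Pp = Xp @ X, Pp @ X_prev
    R = A1 @ (Y @ Gam + Gam @ Y) + S / (omega_hat * theta)

    # entrywise, (25) requires (omega-1)/omega <= R_ij / L_ij wherever L_ij > 0
    mask = L > tiny
    t = np.min(R[mask] / L[mask]) if mask.any() else np.inf
    omega = omega_hat if t >= 1 else min(omega_hat, 1.0 / (1.0 - t))

    # enforce (15): (Y_k + omega * Gamma_k) e <= e
    g, slack = Gam @ e, e - Y @ e
    if (g > tiny).any():
        omega = min(omega, np.min(slack[g > tiny] / g[g > tiny]))

    return max(1.0, omega)
```

A driver would call this routine between the two stages of (13), i.e., after $Y_{k}$ has been computed and before the correction producing $X_{k+1}$.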

8 Numerical Results

In this section we present the results of some numerical experiments which confirm the effectiveness of our proposed schemes. All the algorithms have been implemented in Matlab and tested on a PC with an i9-9900K CPU at 3.60GHz$\times$8. Our test suite includes:

1. Synthetic Examples:

   (a) The block tridiagonal Markov chain of Remark 8. Observe that the drift of the Markov chain is exactly equal to $-\delta$.

   (b) A numerical example given in [2] and considered in [8] for testing a suitable modification, named Algorithm 1, of the $U$-based fixed point iteration. The Markov chain of M/G/1 type is given by

   A_{-1}=\frac{4(1-p)}{3}\left[\begin{array}{ccccc}0.05&0.1&0.2&0.3&0.1\\ 0.2&0.05&0.1&0.1&0.3\\ 0.1&0.2&0.3&0.05&0.1\\ 0.1&0.05&0.2&0.1&0.3\\ 0.3&0.1&0.1&0.2&0.05\end{array}\right],\quad A_{i}=pA_{i-1},\quad i\geq 0.

   The Markov chain is positive recurrent, null recurrent or transient according as $0<p<0.5$, $p=0.5$ or $p>0.5$, respectively. In our computations we have chosen different values of $p$ in the range $0<p<0.6$, and the matrices $A_{i}$ are treated as zero for $i\geq 51$.

   (c) Synthetic examples of M/G/1-type Markov chains described in [4]. These examples are constructed in such a way that the drift of the associated Markov chain is close to a given nonnegative value. We do not describe the construction in detail, as it would take some space, but refer the reader to [4, Section 7.1].

2. Application Examples:

   (a) Some examples of PH/PH/1 queues collected in [4, Section 7.1] for testing purposes. The construction depends on a parameter $\rho$ with $0\leq\rho\leq 1$. In this case the drift is $\mu=1-\rho$.

   (b) The Markov chain of M/G/1 type associated with the infinitesimal generator matrix $Q$ from the queuing model described in [6]. This is a complex queuing model, a BMAP/PHF/1/N retrial system with a finite buffer of capacity $N$ and non-persistent customers. For the construction of the matrix $Q$ we refer the reader to [6, Sections 4.3 and 4.5].

8.1 Synthetic Examples

The first test concerns the validation of the analysis performed above regarding the convergence of fixed point iterations. In Table 1 we show the number of iterations required by different iterative schemes on Example 1.a with $n=100$. Specifically, we compare the traditional fixed point iteration, the $U$-based fixed point iteration, the S-based fixed point iteration (12) and the relaxed fixed point iterations (13). For the latter we consider the Sω-based iteration, where $\omega_{k+1}=\omega$ is fixed a priori, and the Sω(k)-based iteration, where the value of $\omega_{k+1}$ is dynamically adjusted at each step according to the strategy (25) complemented with condition (15). The relaxed stationary iteration is applied for $\omega=1.8,1.9,2$. The relaxed adaptive iteration is applied with $\hat{\omega}=10$. The iterations are stopped when the residual error $\|X_{k}-\sum_{i=-1}^{q}A_{i}X_{k}^{i+1}\|_{\infty}$ is smaller than $tol=10^{-13}$.

δ        trad.     U-based   S-based   Sω-based (ω=1.8 / 1.9 / 2)   Sω(k)-based
1.0e-2   1447      731       724       515 / 496 / 479              65
1.0e-4   84067     42046     42037     30023 / 28976 / 28027        11771
1.0e-6   2310370   1154998   1155352   825121 / 796693 / 770250     329843
Table 1: Number of iterations on Example 1.a for different values of δ. The relaxed iteration with ω fixed a priori is tested with ω ∈ {1.8, 1.9, 2}.

The first four columns of Table 1 confirm the theoretical comparison of asymptotic convergence rates of classical fixed point iterations applied to a block tridiagonal matrix. Specifically, the $U$-based and the S-based iterations are twice as fast as the traditional iteration. Also, the relaxed stationary variants greatly improve the convergence speed. An additional remarkable improvement is obtained by adjusting the value of the relaxation parameter dynamically. Also notice that, differently from the stationary Sω-based iteration, the Sω(k)-based iteration is guaranteed to converge.

The superiority of the adaptive implementation over the other fixed point iterations is confirmed by numerical results on Example 1.b. In Table 2, for different values of $p$, we show the number of iterations required by different iterative schemes, including Algorithm 1 as implemented in [8]. For comparison with the results in [8], here we set $tol=10^{-8}$ in the stopping criterion.

p      trad.   U-based   S-based   Algorithm 1   Sω(k)-based
0.3    14      11        10        12            9
0.48   122     84        91        85            72
0.5    7497    5000      5622      5001          4374
0.55   53      37        39        39            32
Table 2: Number of iterations on Example 1.b for different values of p.

Finally, we compare the convergence speed of the traditional, $U$-based, S-based and Sω(k)-based fixed point iterations applied to the synthetic examples of M/G/1-type Markov chains described in [4]. In Figure 3 we report the semilogarithmic plot of the residual error in the infinity norm generated by the four fixed point iterations for two different values of the drift.

Figure 3: Residual errors generated by the four fixed point iterations applied to the synthetic example with drift (a) $\mu=-0.1$ and (b) $\mu=-0.005$.

Observe that the adaptive relaxed iteration is about twice as fast as the traditional fixed point iteration. This observation is confirmed in Table 3, where we indicate the speed-up in terms of CPU-time with respect to the traditional fixed point iteration for different values of the drift $\mu$.

μ         U-based   S-based   Sω(k)-based
-0.1      1.6       1.5       2.1
-0.05     1.5       1.4       2.0
-0.01     1.6       1.5       2.1
-0.005    1.6       1.5       2.1
-0.001    1.6       1.4       2.2
-0.0005   1.6       1.5       2.2
-0.0001   1.7       1.6       2.4
Table 3: Speed-up in terms of CPU-time w.r.t. the traditional fixed point iteration for different values of the drift μ.

In Figure 4 we repeat the set of experiments of Figure 3 with a stochastic starting matrix $X_{0}=\boldsymbol{e}\boldsymbol{e}^{T}/n$. Here the adaptive strategy is basically the same as before: we select $\omega_{k+1}$ in the interval $[0,\omega_{k}]$ as the maximum value which maintains the nonnegativity of $X_{k+1}$.

Figure 4: Residual errors generated by the four fixed point iterations applied to the synthetic example with drift (a) $\mu=-0.1$ and (b) $\mu=-0.005$, starting from a stochastic matrix.

In this case the adaptive strategy seems to be quite effective in reducing the number of iterations. Other results are not so favorable, and we believe that in the stochastic case the design of a general efficient strategy for the choice of the relaxation parameter $\omega_{k}$ is still an open problem.

8.2 Application Examples

The first set 2.a of examples from applications includes several cases of PH/PH/1 queues collected in [4]. The construction of the Markov chain depends on a parameter $\rho$ with $0\leq\rho\leq 1$ and two integers $(i,j)$ which specify the PH distributions of the model. The Markov chain generated in this way is denoted as Example $(i,j)$. Its drift is $\mu=1-\rho$. In Tables 4, 5 and 6 we compare the number of iterations for different values of $\rho$. Here and hereafter the relaxed stationary Sω-based iteration is applied with $\omega=2$.

ρ        trad.    U-based   S-based   Sω-based   Sω(k)-based
0.99     6708     3666      3638      1061       1069
0.999    53900    30018     29271     8559       8609
0.9999   386332   215660    209975    61608      65209
Table 4: Number of iterations for different values of ρ on Example (2,6).

ρ        trad.    U-based   S-based   Sω-based   Sω(k)-based
0.99     3735     3241      2225      1582       1158
0.999    29162    25241     17460     12469      9331
0.9999   209275   181139    125689    89965      68967
Table 5: Number of iterations for different values of ρ on Example (8,3).

ρ        trad.    U-based   S-based   Sω-based   Sω(k)-based
0.99     11372    9679      9416      8031       8204
0.999    87566    74611     72595     61992      63346
0.9999   597330   509108    495942    424141     433710
Table 6: Number of iterations for different values of ρ on Example (8,9).

For comparison, in Table 7 we show the number of iterations on Example (8,3) of Table 5 starting with a stochastic matrix $X_{0}$. We compare the traditional, $U$-based and S-based iterations. We observe a rapid convergence profile and that the number of iterations is independent of the drift value.

ρ        trad.   U-based   S-based
0.99     52      45        31
0.999    51      45        30
0.9999   51      45        30
Table 7: Number of iterations for different values of ρ on Example (8,3) with $X_{0}$ a stochastic matrix.

For a more challenging example from applications, in 2.b we consider the generator matrix $Q$ from the queuing model described in [6]. In Table 8 we indicate the number of iterations for different values of the capacity $N$.

N    trad.   U-based   S-based   Sω-based   Sω(k)-based
20   4028    3567      3496      3089       2826
30   4028    3567      3496      3089       2826
40   4028    3567      3496      3089       2826
Table 8: Number of iterations for different values of N on the example described in [6].

9 Conclusion and Future Work

In this paper we have introduced a novel fixed point iteration for solving M/G/1-type Markov chains. It is shown that this iteration, complemented with suitable adaptive relaxation techniques, is generally more efficient than other classical iterations. Incorporating relaxation techniques into other inner-outer iterative schemes, such as the ones introduced in [4], is ongoing research.

References

  • [1] P. Amodio and F. Mazzia. A parallel Gauss-Seidel method for block tridiagonal linear systems. SIAM J. Sci. Comput., 16(6):1451–1461, 1995.
  • [2] Z.-Z. Bai. A class of iteration methods based on the Moser formula for nonlinear equations in Markov chains. Linear Algebra Appl., 266:219–241, 1997.
  • [3] D. A. Bini, G. Latouche, and B. Meini. Numerical methods for structured Markov chains. Numerical Mathematics and Scientific Computation. Oxford University Press, New York, 2005. Oxford Science Publications.
  • [4] D. A. Bini, G. Latouche, and B. Meini. A family of fast fixed point iterations for M/G/1-type Markov chains. IMA Journal of Numerical Analysis, 42(2):1454–1477, 02 2021.
  • [5] C. Brezinski and M. Redivo-Zaglia. Extrapolation methods, volume 2 of Studies in Computational Mathematics. North-Holland Publishing Co., Amsterdam, 1991. Theory and practice, With 1 IBM-PC floppy disk (5.25 inch).
  • [6] S. Dudin, A. Dudin, O. Kostyukova, and O. Dudina. Effective algorithm for computation of the stationary distribution of multi-dimensional level-dependent Markov chains with upper block-Hessenberg structure of the generator. J. Comput. Appl. Math., 366:112425, 17, 2020.
  • [7] L. Gemignani and F. Poloni. Comparison theorems for splittings of M-matrices in (block) Hessenberg form. BIT Numerical Mathematics, 2022.
  • [8] C. H. Guo. On the numerical solution of a nonlinear matrix equation in Markov chains. Linear Algebra Appl., 288(1-3):175–186, 1999.
  • [9] G. Latouche and V. Ramaswami. A logarithmic reduction algorithm for quasi-birth-death processes. J. Appl. Probab., 30(3):650–674, 1993.
  • [10] H. Lu. Stair matrices and their generalizations with applications to iterative methods. I. A generalization of the successive overrelaxation method. SIAM J. Numer. Anal., 37(1):1–17, 1999.
  • [11] B. Meini. New convergence results on functional iteration techniques for the numerical solution of M/G/1M/G/1 type Markov chains. Numer. Math., 78(1):39–58, 1997.
  • [12] G. Meurant. Domain decomposition preconditioners for the conjugate gradient method. Calcolo, 25(1-2):103–119 (1989), 1988.
  • [13] C. Meyer. Matrix analysis and applied linear algebra. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2000. With 1 CD-ROM (Windows, Macintosh and UNIX) and a solutions manual (iv+171 pp.).
  • [14] M. F. Neuts. Matrix-geometric solutions in stochastic models, volume 2 of Johns Hopkins Series in the Mathematical Sciences. Johns Hopkins University Press, Baltimore, Md., 1981. An algorithmic approach.
  • [15] R. J. Plemmons. MM-matrix characterizations. I. Nonsingular MM-matrices. Linear Algebra Appl., 18(2):175–188, 1977.
  • [16] V. Ramaswami. A stable recursion for the steady state vector in Markov chains of M/G/1M/G/1 type. Comm. Statist. Stochastic Models, 4(1):183–188, 1988.