
Traffic Rate Network Tomography with Higher-Order Cumulants

Hanoch Lev-Ari, Yariv Ephraim, and Brian L. Mark

This work was supported in part by the U.S. National Science Foundation under Grant CCF-1717033. H. Lev-Ari is with the Dept. of Electrical and Computer Engineering, Northeastern University, Boston, MA 02115 (e-mail: [email protected]). Y. Ephraim and B.L. Mark are with the Dept. of Electrical and Computer Engineering, George Mason University, Fairfax, VA 22030 (e-mail: [email protected], [email protected]).
Abstract

Network tomography aims at estimating source-destination traffic rates from link traffic measurements. This inverse problem was formulated by Vardi in 1996 for Poisson traffic over networks operating under deterministic as well as random routing regimes. In this paper we expand Vardi’s second-order moment matching rate estimation approach to higher-order cumulant matching with the goal of increasing the column rank of the mapping and consequently improving the rate estimation accuracy. We develop a systematic set of linear cumulant matching equations and express them compactly in terms of the Khatri-Rao product. Both least squares estimation and iterative minimum I-divergence estimation are considered. We develop an upper bound on the mean squared error (MSE) in least squares rate estimation from empirical cumulants. We demonstrate for the NSFnet that supplementing Vardi’s approach with the third-order empirical cumulants reduces its averaged normalized MSE relative to the theoretical minimum of the second-order moment matching approach by about 12\%-18\%. This minimum MSE is obtained when Vardi’s second-order moment matching approach is based on the theoretical rather than the empirical moments.

Index Terms:
Network traffic, network tomography, inverse problem, higher-order cumulants.

I Introduction

Network tomography was formulated by Y. Vardi in the seminal paper [1]. The goal is to estimate the rates of traffic flows over source-destination pairs from traffic flows over links of the network. Traffic flow rates are measured, for example, by the number of packets per second. We use XX to denote a vector of LL source-destination traffic flows, and YY to denote a vector of MM link traffic flows. Normally, M\ll L. All source-destination traffic flows are assumed independent Poisson random variables. Thus, XX comprises a vector of LL independent Poisson random variables. Link traffic flow measurements are assumed passive and require no probes. For networks operating under a deterministic routing regime, the traffic flow of each source-destination pair is routed over a fixed path. Traffic flows over each link may originate from multiple source-destination traffic flows. Thus, we define a binary variable aija_{ij} such that aij=1a_{ij}=1 when traffic over source-destination jj passes through link ii and aij=0a_{ij}=0 otherwise. We also define an M×LM\times L binary routing matrix A={aij,i=1,,M;j=1,,L}A=\{a_{ij},i=1,\ldots,M;j=1,\ldots,L\} which is assumed known throughout the paper. It follows that Y=AXY=AX. The expected value of the jjth Poisson source-destination traffic flow constitutes its traffic flow rate. Thus, the vector of source-destination traffic flow rates is the expected value of XX which we denote by λ=E{X}\lambda=E\{X\}.

The network tomography problem is that of estimating the rate vector λ\lambda from given realizations of YY. This is a rather challenging inverse problem since Y=AXY=AX is an underdetermined set of equations, and the estimate of λ\lambda must be non-negative. Vardi invoked the terminology “LININPOS” (LINear INverse POSitive problem). Maximum likelihood estimation of λ\lambda is not feasible since the components of YY are dependent Poisson random variables with no explicitly known distribution. The expectation-maximization (EM) algorithm is also not useful for this problem as it requires calculation of the conditional mean E{XY}E\{X\mid Y\} which in turn requires knowledge of the joint distribution of XX and YY. Vanderbei and Iannone [2] developed an EM algorithm in which E{XY}E\{X\mid Y\} is simulated. A common approach is to rely on the explicit form of E{XY}E\{X\mid Y\} for jointly Gaussian XX and YY, see, e.g., [1], [3], [4].

Vardi resorted to moment matching in which λ\lambda is estimated from a linear matrix equation relating the first two empirical moments of YY to the corresponding first two theoretical moments of AXAX. The approach is applicable to networks operating under deterministic as well as random routing regimes. In the latter case, there are multiple alternative paths for each source-destination pair which can be selected according to some probability law. Vardi invoked an iterative procedure for estimating λ\lambda which was previously developed for image deblurring [5]. This useful procedure will be discussed in Section II. Vardi’s work ignited extensive research in the areas of network inference and medical tomography.

In this paper we explore the benefits of supplementing Vardi’s second-order moment matching approach with higher-order empirical cumulants. Here too the cumulant matching equation is linear in λ\lambda, and the approach is applicable to networks operating under deterministic as well as random routing regimes. Higher-order cumulants introduce new useful information on the unavailable distribution of YY which can be leveraged in estimating the source-destination flow rates. By using a sufficient number of empirical cumulants, the linear mapping involved in the cumulant matching approach can achieve full column rank. In an ideal scenario where the cumulant matching equations are consistent, and the cumulants of the link traffic flows are accurately known, the rate vector can be recovered without error. As we demonstrate in Section V, this ideal situation is approachable when a sufficiently large number of realizations of YY are available.

A vast literature exists on network tomography. Several forms of network tomography have been studied in the literature depending on the type of measurement data and the network performance parameters (e.g., rate, delay) of interest: (i) source-destination path-level traffic rate estimation based on link-level traffic measurements [1, 6, 3, 4, 7, 8, 9]; (ii) link-level parameter estimation based on end-to-end path-level traffic measurements [2], [10, 11, 12, 13, 14]; (iii) network topology discovery from traffic measurements [15, 16, 17, 18]. Approaches to network tomography can also be characterized as active (e.g., [10, 11, 19, 20]), whereby explicit control traffic is injected into the network to extract measurements, or passive (e.g., [1, 3, 21, 14]), whereby the observation data is obtained from existing network traffic. A somewhat outdated survey circa 2003 can be found in [22]. A more recent survey [23] discusses network tomography in conjunction with network coding.

Closest to Vardi’s work is the contemporary work of Vanderbei and Iannone [2] which relies on a Poisson model for incoming traffic. The goal is to estimate the rate of traffic on each link connecting input and output nodes from traffic counts at the input and output nodes. Vanderbei and Iannone did not resort to moment matching but rather developed a simulation based EM algorithm for maximum likelihood estimation of the rates. A thorough Bayesian approach to Vardi’s problem was developed by Tebaldi and West [6] using Markov Chain Monte Carlo simulation. See also [9]. Another closely related work to Vardi’s problem appeared in [3], [4], [22] where maximum likelihood estimation of the source-destination rates from link data was implemented under a Gaussian rather than a Poisson traffic model. In [7], source-destination rates were estimated by utilizing spatiotemporal correlation of nominal traffic, and the fact that traffic anomalies are sparse. In [8], conditions for identifiability of higher order cumulants in estimation of source-destination traffic from link measurements were established. In [13] an algorithm was developed for choosing a set of linearly independent source-destination measurement paths from which additive link metrics are individually estimated.

The plan for this paper is as follows. In Section II we present the minimum I-divergence iterative procedure which plays an important role in this paper. In Section III we address rate estimation in networks with deterministic routing. We develop a new systematic set of cumulant matching equations for estimating the rate vector from up to the fourth-order empirical cumulant. We also develop an upper bound on the mean squared error (MSE) in least squares estimation of the rate vector from empirical cumulants. We conclude this section with a discussion on some implementation issues of the proposed cumulant matching approach. Complexity of the approach is discussed in Section III-D. In Section IV we discuss rate estimation in networks operating under a random routing regime. Numerical results for the NSFnet [24] are presented in Section V. Concluding remarks are given in Section VI. Technical details of the derivations are deferred to the Appendix.

II Minimum I-Divergence Iteration

Under the Poisson model for XX, the moment (and cumulant) matching approach results in a set of linear equations in λ\lambda. Generally speaking, the set of equations has the form of η^(Y)=Bλ\hat{\eta}(Y)=B\lambda where η^(Y)\hat{\eta}(Y) is a vector of MbM_{b} empirical moments (first and second-order in [1]) of YY, B={bij,i=1,,Mb;j=1,,L}B=\{b_{ij},i=1,\ldots,M_{b};j=1,\ldots,L\} is a constant zero-one matrix that depends on AA but not on λ\lambda, and BλB\lambda represents the corresponding theoretical moments.

To estimate λ\lambda that satisfies η^(Y)=Bλ\hat{\eta}(Y)=B\lambda, Vardi proposed to use an iterative estimation approach which originated in the study of another inverse problem concerning image deblurring [5], [25]. Let λjold\lambda_{j}^{\textrm{old}} denote a current estimate of the jjth component of λ\lambda, and let λjnew\lambda_{j}^{\textrm{new}} denote the new estimate of that component at the conclusion of the iteration. Let (Bλold)i(B\lambda^{\textrm{old}})_{i} denote the iith component of BλoldB\lambda^{\textrm{old}}. Similarly, let η^i(Y)\hat{\eta}_{i}(Y) denote the iith component of η^(Y)\hat{\eta}(Y). The iteration is given by

λjnew\displaystyle\lambda_{j}^{\textrm{new}} =λjoldi=1Mbb¯ijη^i(Y)(Bλold)i where b¯ij:=bijt=1Mbbtj,\displaystyle=\lambda_{j}^{\textrm{old}}\sum_{i=1}^{M_{b}}\bar{b}_{ij}\frac{\hat{\eta}_{i}(Y)}{(B\lambda^{\textrm{old}})_{i}}\textrm{ where }\bar{b}_{ij}:=\frac{b_{ij}}{\sum_{t=1}^{M_{b}}b_{tj}}, (1)

for j=1,,Lj=1,\ldots,L. This iteration is particularly suitable for solving positive inverse problems. It reaches a fixed point when the moment matching equation η^(Y)=Bλold\hat{\eta}(Y)=B\lambda^{\textrm{old}} is satisfied. The iteration was studied by Snyder, Schulz and O’Sullivan [25] in a similarly formulated application of image deblurring. It was shown to monotonically decrease Csiszár’s I-divergence [26] between the original image convolved with the kernel, and the observed blurred image. The procedure turned out to be an EM iteration in the positron emission tomography problem, which follows a similar formulation as network tomography, but with the crucially facilitating difference that the Poisson components of YY are now independent random variables [27], [28]. This iteration will play a central role in our cumulant matching approach.
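As an illustration, the iteration (1) can be implemented in a few lines. The following Python sketch is ours (the function and variable names are not from the paper); it assumes a strictly positive initial estimate and a matrix B with no null columns, as required for identifiability.

```python
import numpy as np

def i_divergence_iteration(B, eta_hat, lam0, num_iters=300):
    """Minimum I-divergence iteration (1) for the positive linear inverse
    problem eta_hat ~ B @ lam with lam >= 0.
    B: (Mb, L) nonnegative (here zero-one) matrix, eta_hat: (Mb,) empirical
    moments/cumulants, lam0: (L,) strictly positive initial estimate."""
    B = np.asarray(B, dtype=float)
    col_sums = B.sum(axis=0)              # sum_t b_tj, assumed nonzero
    B_bar = B / col_sums                  # column-normalized entries b_bar_ij
    lam = np.array(lam0, dtype=float)
    for _ in range(num_iters):
        ratio = eta_hat / (B @ lam)       # eta_hat_i / (B lam^old)_i
        lam = lam * (B_bar.T @ ratio)     # multiplicative update of Eq. (1)
    return lam
```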

III Cumulant Matching in Deterministic Routing

In this section we present our cumulant matching approach and its theoretical performance bound for networks operating under a deterministic routing regime. The goal is to estimate the source-destination rate vector λ\lambda from NN realizations of the link traffic flow YY where Y=AXY=AX and the routing matrix AA is known. Conditions for identifiability of λ\lambda were given in [1]. Specifically, the parameter λ\lambda is identifiable if all columns {a1,,aL}\{a_{1},\ldots,a_{L}\} of AA are distinct and none is null.

III-A Cumulant Matching

Let X¯:=XE{X}\bar{X}:=X-E\{X\} where XX is any real random vector with finite mean and possibly dependent components. The vectorized kkth order central moment of XX is given by

μk(X)=E{X¯X¯X¯k times }\displaystyle\mu_{k}(X)=E\{\underset{k\textrm{ times }}{\underbrace{\bar{X}\otimes\bar{X}\otimes\cdots\otimes\bar{X}}}\} (2)

where kk is any positive integer and \otimes denotes the Kronecker product. When the length of XX is LL, the length of the vector μk(X)\mu_{k}(X) is LkL^{k}. For Y=AXY=AX, it follows from the identity

(A1B1)(A2B2)=(A1A2)(B1B2),\displaystyle(A_{1}B_{1})\otimes(A_{2}B_{2})=(A_{1}\otimes A_{2})(B_{1}\otimes B_{2}), (3)

where A1,A2,B1,B2A_{1},A_{2},B_{1},B_{2} are matrices of suitable dimensions [29, p. 408], that

μk(Y)=(AAA)k times μk(X).\displaystyle\mu_{k}(Y)=\underset{k\textrm{ times }}{\underbrace{\left(A\otimes A\otimes\cdots\otimes A\right)}}\ \mu_{k}(X). (4)

Expressions for the cumulants of the observed process in a state-space model were developed by Swami and Mendel [30]. The process in this paper is a particular case for which simpler expressions hold. We provide an alternative derivation for our particular case in the Appendix. Let Kk(X)K_{k}(X) denote the vectorized kkth central cumulant of any real random vector XX with finite mean. For k=1,2,3k=1,2,3, Kk(X)=μk(X)K_{k}(X)=\mu_{k}(X). For k=4k=4 we have

K4(X)\displaystyle K_{4}(X) =μ4(X)μ2(X)μ2(X)\displaystyle=\mu_{4}(X)-\mu_{2}(X)\otimes\mu_{2}(X)
vec{RXXRXX+UL2(RXXRXX)}\displaystyle-\textrm{vec}\left\{R_{XX}\otimes R_{XX}+U_{L^{2}}\cdot(R_{XX}\otimes R_{XX})\right\} (5)

where RXXR_{XX} is the covariance matrix of XX and UL2U_{L^{2}} is an L2×L2L^{2}\times L^{2} permutation matrix defined in (A.70). We also have

K4(Y)=(AAAA)K4(X).\displaystyle K_{4}(Y)=(A\otimes A\otimes A\otimes A)K_{4}(X). (6)

When RXXR_{XX} is a diagonal matrix, as is the case of interest in this paper, it follows that UL2(RXXRXX)U_{L^{2}}\cdot(R_{XX}\otimes R_{XX}) is a symmetric matrix. We shall restrict our attention to cumulants of order k4k\leq 4 since estimation of higher order cumulants is not practical.

Assume next that the components {xj}\{x_{j}\} of XX are independent. For this special case of interest, the central moments and central cumulants of XX are conveniently and concisely expressed in terms of the Khatri-Rao product defined by

AA:=[a1a1,a2a2,,aLaL].\displaystyle A\odot A:=[a_{1}\otimes a_{1},a_{2}\otimes a_{2},\ldots,a_{L}\otimes a_{L}]. (7)
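As a concrete illustration, the column-wise Khatri-Rao product in (7) can be computed with NumPy as in the following sketch (our own helper, not code from the paper):

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Khatri-Rao product [a_1 (x) b_1, ..., a_L (x) b_L]
    of A (M x L) and B (P x L); the result is (M*P) x L, as in (7)."""
    M, L = A.shape
    P, L2 = B.shape
    assert L == L2, "A and B must have the same number of columns"
    # T[i, j, l] = A[i, l] * B[j, l]; flattening (i, j) gives a_l (x) b_l.
    return np.einsum('il,jl->ijl', A, B).reshape(M * P, L)

# A2 = khatri_rao(A, A)        # A (.) A
# A3 = khatri_rao(A2, A)       # A (.) A (.) A
```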

It is shown in the Appendix that

μ2(Y)\displaystyle\mu_{2}(Y) =(AA)col(E{x¯12},E{x¯22},,E{x¯L2})\displaystyle=(A\!\odot\!A)\textrm{col}\!\left(E\left\{\bar{x}_{1}^{2}\right\},E\left\{\bar{x}_{2}^{2}\right\},\ldots,E\left\{\bar{x}_{L}^{2}\right\}\right)
μ3(Y)\displaystyle\mu_{3}(Y) =(AAA)col(E{x¯13},E{x¯23},,E{x¯L3}),\displaystyle=(A\!\odot\!A\odot\!A)\textrm{col}\!\left(E\left\{\bar{x}_{1}^{3}\right\},E\left\{\bar{x}_{2}^{3}\right\},\ldots,E\left\{\bar{x}_{L}^{3}\right\}\right), (8)

and

K4(Y)\displaystyle K_{4}(Y) =(AAAA)(κ4(x¯1,x¯1,x¯1,x¯1)κ4(x¯2,x¯2,x¯2,x¯2)κ4(x¯L,x¯L,x¯L,x¯L)),\displaystyle=(A\!\odot\!A\!\odot\!A\!\odot\!A)\left(\begin{array}[]{c}\kappa_{4}(\bar{x}_{1},\bar{x}_{1},\bar{x}_{1},\bar{x}_{1})\\ \kappa_{4}(\bar{x}_{2},\bar{x}_{2},\bar{x}_{2},\bar{x}_{2})\\ \vdots\\ \kappa_{4}(\bar{x}_{L},\bar{x}_{L},\bar{x}_{L},\bar{x}_{L})\end{array}\right), (13)

where

κ4(x¯i,x¯j,x¯k,x¯l)={E{x¯i4}3(E{x¯i2})2,i=j=k=l0,otherwise.\displaystyle\kappa_{4}(\bar{x}_{i},\bar{x}_{j},\bar{x}_{k},\bar{x}_{l})=\left\{\begin{array}[]{cc}E\{\bar{x}_{i}^{4}\}-3(E\{\bar{x}_{i}^{2}\})^{2},&i\!=\!j\!=\!k\!=\!l\\ 0,&\textrm{otherwise.}\end{array}\right. (16)

When {xj}\{x_{j}\} are independent Poisson random variables, all cumulants of xjx_{j} equal its rate E{xj}=λjE\{x_{j}\}=\lambda_{j}, and we have from m(Y):=E{Y}=AE{X}m(Y):=E\{Y\}=AE\{X\} and (8)-(16) that,

(AAAAAAAAAA)λ=(m(Y)μ2(Y)μ3(Y)K4(Y)).\displaystyle\left(\begin{array}[]{c}A\\ A\odot A\\ A\odot A\odot A\\ A\odot A\odot A\odot A\end{array}\right)\lambda=\left(\begin{array}[]{c}m(Y)\\ \mu_{2}(Y)\\ \mu_{3}(Y)\\ K_{4}(Y)\end{array}\right). (25)

This is the key cumulant matching equation for r4r\leq 4. In (25), let 𝒜r\mathcal{A}_{r} denote the matrix of stacked Khatri-Rao products of AA, and let ηr(Y)\eta_{r}(Y) denote the vector of stacked cumulants. Then the rate vector λ\lambda satisfies

𝒜rλ=ηr(Y).\displaystyle\mathcal{A}_{r}\lambda=\eta_{r}(Y). (26)

To estimate λ\lambda, ηr(Y)\eta_{r}(Y) is replaced by a vector of empirical estimates η^r(Y)\hat{\eta}_{r}(Y) and λ\lambda is estimated from

𝒜rλ=η^r(Y).\displaystyle\boxed{\mathcal{A}_{r}\lambda=\hat{\eta}_{r}(Y).} (27)

In this paper we consider least squares estimation and the minimum I-divergence iterative procedure (1). The vector η^r(Y)\hat{\eta}_{r}(Y) comprises minimum variance unbiased cumulant estimates given by the KK-Statistics which we discuss in the next section [31].

Vardi’s second-order moment matching equation can be summarized as η^2(Y)=𝒜2λ\hat{\eta}_{2}(Y)=\mathcal{A}_{2}\lambda. With only two moments, the column rank of the matrix 𝒜2\mathcal{A}_{2} may be too low to provide an accurate solution. Note that if the set of equations (27) is consistent, the theoretical cumulants are known, and rr is sufficiently large so that 𝒜r\mathcal{A}_{r} has full column rank, then λ\lambda can be estimated error free as the unique solution of (27). We demonstrate in Section V for the NSFnet [24] that r=3r=3 is sufficient to achieve full column rank, and that the theoretical performance is approachable when the cumulants are estimated from a sufficiently large number of realizations of YY.
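A small sketch of how the stacked matrix of (25)-(27) may be assembled, and its column rank checked, is given below; it reuses the khatri_rao helper sketched earlier and is illustrative only.

```python
import numpy as np

def stacked_cumulant_matrix(A, r):
    """Stack A, A (.) A, ..., up to the r-fold Khatri-Rao product, as in (25)."""
    blocks, K = [], np.asarray(A, dtype=float)
    for _ in range(r):
        blocks.append(K)
        K = khatri_rao(K, A)      # next-order Khatri-Rao product
    return np.vstack(blocks)

# A_r = stacked_cumulant_matrix(A, 3)
# np.linalg.matrix_rank(A_r) == A.shape[1]   # full column rank enables exact
#                                            # recovery from exact cumulants
```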

III-B Error Analysis

In this section we assess the MSE in the least squares estimation of λ\lambda satisfying (26) from the cumulant matching equation (27). We assume for this analysis that rr is sufficiently large so that the augmented matrix 𝒜r\mathcal{A}_{r} has full column rank.

KK-Statistics of scalar processes were developed in [31, p. 281]. For vector processes and k=1,2,3k=1,2,3, let μ^k(Y)\hat{\mu}_{k}(Y) and μ^k(X)\hat{\mu}_{k}(X) denote the KK-Statistics of μk(Y)\mu_{k}(Y) and μk(X)\mu_{k}(X), respectively. Similarly, let K^4(Y)\hat{K}_{4}(Y) and K^4(X)\hat{K}_{4}(X) denote the KK-Statistics of K4(Y)K_{4}(Y) and K4(X)K_{4}(X), respectively. Let {Xn,n=1,2,,N}\{X_{n},n=1,2,\ldots,N\} denote a sequence of independent identically distributed source-destination traffic flows defined similarly to XX. Each XnX_{n} comprises LL independent Poisson random variables with rate vector E{Xn}=λE\{X_{n}\}=\lambda. Let Yn=AXnY_{n}=AX_{n}. Define

Y~n:=Ynm^(Y)\displaystyle\tilde{Y}_{n}:=Y_{n}-\hat{m}(Y) (28)

where

m^(Y):=1Nn=1NYn,\displaystyle\hat{m}(Y):=\frac{1}{N}\sum_{n=1}^{N}Y_{n}, (29)

and the empirical cumulant estimate for k>1k>1 as

μ~k(Y):=1Nn=1NY~nY~nY~nk times .\displaystyle\tilde{\mu}_{k}(Y):=\frac{1}{N}\sum_{n=1}^{N}\underset{k\textrm{ times }}{\underbrace{\tilde{Y}_{n}\otimes\tilde{Y}_{n}\otimes\cdots\otimes\tilde{Y}_{n}}}. (30)

Similar definitions follow for the XX process when YY in (28)-(30) is replaced by XX. The KK-Statistic for k=1k=1 is given by

m^(Y)=1Nn=1NAXn=Am^(X),\displaystyle\hat{m}(Y)=\frac{1}{N}\sum_{n=1}^{N}AX_{n}=A\hat{m}(X), (31)

for k=2k=2,

μ^2(Y)\displaystyle\hat{\mu}_{2}(Y) =NN1μ~2(Y)\displaystyle=\frac{N}{N-1}\tilde{\mu}_{2}(Y)
=(AA)μ^2(X),\displaystyle=(A\otimes A)\hat{\mu}_{2}(X), (32)

for k=3k=3,

μ^3(Y)\displaystyle\hat{\mu}_{3}(Y) =N2(N1)(N2)μ~3(Y)\displaystyle=\frac{N^{2}}{(N-1)(N-2)}\tilde{\mu}_{3}(Y)
=(AAA)μ^3(X)\displaystyle=(A\otimes A\otimes A)\hat{\mu}_{3}(X) (33)

and for k=4k=4,

K^4(Y)\displaystyle\hat{K}_{4}(Y) :=N2(N1)(N2)(N3)[(N+1)μ~4(Y)\displaystyle:=\frac{N^{2}}{(N-1)(N-2)(N-3)}\left[(N+1)\tilde{\mu}_{4}(Y)\right.
3(N1)μ~2(Y)μ~2(Y)]\displaystyle\hskip 64.58313pt\left.-3(N-1)\tilde{\mu}_{2}(Y)\otimes\tilde{\mu}_{2}(Y)\right]
=(AAAA)K^4(X),\displaystyle=(A\otimes A\otimes A\otimes A)\hat{K}_{4}(X), (34)

where we have used (3). Similar relations are expected to hold for higher order cumulants. For r3r\leq 3 and sufficiently large NN, the KK-Statistic μ^r(Y)\hat{\mu}_{r}(Y) and the empirical cumulant μ~r(Y)\tilde{\mu}_{r}(Y) are essentially the same.
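The K-statistics (31)-(34) can be estimated from the NN link measurement vectors along the following lines. This is a sketch under our own naming; for the large NN used later the bias-correcting factors are close to one.

```python
import numpy as np

def k_statistics(Y, r):
    """Stacked K-statistics of orders 1..r from samples Y of shape (N, M),
    following (28)-(34).  Returns the vector eta_hat_r(Y)."""
    N, M = Y.shape
    m_hat = Y.mean(axis=0)                                   # (29)
    Yc = Y - m_hat                                           # (28)
    def mu_tilde(k):                                         # (30)
        # average of the k-fold Kronecker products of the centered samples
        ops = ['ni,nj->ij', 'ni,nj,nk->ijk', 'ni,nj,nk,nl->ijkl'][k - 2]
        return np.einsum(ops, *([Yc] * k)).reshape(-1) / N
    stats = [m_hat]
    if r >= 2:
        stats.append(N / (N - 1) * mu_tilde(2))              # (32)
    if r >= 3:
        stats.append(N ** 2 / ((N - 1) * (N - 2)) * mu_tilde(3))   # (33)
    if r >= 4:
        c = N ** 2 / ((N - 1) * (N - 2) * (N - 3))
        k4 = c * ((N + 1) * mu_tilde(4)
                  - 3 * (N - 1) * np.kron(mu_tilde(2), mu_tilde(2)))  # (34)
        stats.append(k4)
    return np.concatenate(stats)
```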

Let η^r(X)\hat{\eta}_{r}(X) and η^r(Y)\hat{\eta}_{r}(Y) denote vectors of stacked KK-statistics of order smaller than or equal to rr corresponding to XX and YY, respectively. The least squares estimate of λ\lambda in (27) is given by

λ^r=(𝒜r𝒜r)1𝒜rη^r(Y).\displaystyle\hat{\lambda}_{r}=(\mathcal{A}_{r}^{*}\mathcal{A}_{r})^{-1}\mathcal{A}_{r}^{*}\ \hat{\eta}_{r}(Y). (35)

Let

ϵr(Y)=η^r(Y)ηr(Y)\displaystyle\epsilon_{r}(Y)=\hat{\eta}_{r}(Y)-\eta_{r}(Y) (36)

denote the error vector in estimating ηr(Y)\eta_{r}(Y). From (26),

η^r(Y)=𝒜rλ+ϵr(Y).\displaystyle\hat{\eta}_{r}(Y)=\mathcal{A}_{r}\lambda+\epsilon_{r}(Y). (37)

Substituting (37) into (35) yields,

λ^r\displaystyle\hat{\lambda}_{r} =(𝒜r𝒜r)1𝒜r[𝒜rλ+ϵr(Y)]\displaystyle=(\mathcal{A}_{r}^{*}\mathcal{A}_{r})^{-1}\mathcal{A}_{r}^{*}[\mathcal{A}_{r}\lambda+\epsilon_{r}(Y)]
=λ+(𝒜r𝒜r)1𝒜rϵr(Y).\displaystyle=\lambda+(\mathcal{A}_{r}^{*}\mathcal{A}_{r})^{-1}\mathcal{A}_{r}^{*}\epsilon_{r}(Y). (38)

Let Hr=(𝒜r𝒜r)1𝒜rH_{r}=(\mathcal{A}_{r}^{*}\mathcal{A}_{r})^{-1}\mathcal{A}_{r}^{*}. The rate estimation error is given by

λ^rλ=Hrϵr(Y),\displaystyle\hat{\lambda}_{r}-\lambda=H_{r}\epsilon_{r}(Y), (39)

and the MSE is given by

ϵr2¯(λ)\displaystyle\overline{\epsilon_{r}^{2}}(\lambda) =E{λ^rλ2}\displaystyle=E\left\{||\hat{\lambda}_{r}-\lambda||^{2}\right\}
=E{Hrϵr(Y)2}\displaystyle=E\left\{||H_{r}\epsilon_{r}(Y)||^{2}\right\}
ρmax(HrHr)E{ϵr(Y)2}\displaystyle\leq\rho_{\textrm{max}}(H_{r}^{*}H_{r})E\{||\epsilon_{r}(Y)||^{2}\} (40)

where the inequality follows from the Rayleigh quotient theorem [29, Theorem 8.1.4] (the Rayleigh quotient of a Hermitian matrix HH is defined as u^{\ast}Hu/u^{\ast}u where uu is a vector of suitable dimension), and ρmax(HrHr)\rho_{\textrm{max}}(H_{r}^{*}H_{r}) denotes the maximal eigenvalue of the Hermitian matrix HrHrH_{r}^{*}H_{r}. From the theory of singular value decomposition, we recall that for any matrix HrH_{r},

ρmax(HrHr)=ρmax(HrHr).\displaystyle\rho_{\textrm{max}}(H_{r}H_{r}^{*})=\rho_{\textrm{max}}(H_{r}^{*}H_{r}). (41)

Applying this to (40) gives

ϵr2¯(λ)\displaystyle\overline{\epsilon_{r}^{2}}(\lambda) ρmax(((𝒜r𝒜r)1𝒜r)((𝒜r𝒜r)1𝒜r))E{ϵr(Y)2}\displaystyle\leq\rho_{\textrm{max}}(((\mathcal{A}_{r}^{*}\mathcal{A}_{r})^{-1}\mathcal{A}_{r}^{*})((\mathcal{A}_{r}^{*}\mathcal{A}_{r})^{-1}\mathcal{A}_{r}^{*})^{*})E\{||\epsilon_{r}(Y)||^{2}\}
=ρmax((𝒜r𝒜r)1)E{ϵr(Y)2}\displaystyle=\rho_{\textrm{max}}((\mathcal{A}_{r}^{*}\mathcal{A}_{r})^{-1})E\{||\epsilon_{r}(Y)||^{2}\}
=1ρmin(𝒜r𝒜r)E{ϵr(Y)2}.\displaystyle=\frac{1}{\rho_{\textrm{min}}(\mathcal{A}_{r}^{*}\mathcal{A}_{r})}E\{||\epsilon_{r}(Y)||^{2}\}. (42)

Let r\mathcal{B}_{r} denote a block diagonal matrix with the kkth diagonal block being the kk-fold Kronecker product of AA with itself. There are rr diagonal blocks in total. Let

ϵr(X)=η^r(X)ηr(X).\displaystyle\epsilon_{r}(X)=\hat{\eta}_{r}(X)-\eta_{r}(X). (43)

From (36), (31)-(34), and (25),

ϵr(Y)=rϵr(X)\displaystyle\epsilon_{r}(Y)=\mathcal{B}_{r}\epsilon_{r}(X) (44)

It follows from the Rayleigh quotient [29, Theorem 8.1.4] that

ϵr(Y)2r2ϵr(X)2\displaystyle||\epsilon_{r}(Y)||^{2}\leq||\mathcal{B}_{r}||^{2}\cdot||\epsilon_{r}(X)||^{2} (45)

where r||\mathcal{B}_{r}|| equals the maximal singular value of r\mathcal{B}_{r}, and

r2=ρmax(rr).\displaystyle||\mathcal{B}_{r}||^{2}=\rho_{\textrm{max}}(\mathcal{B}_{r}^{*}\mathcal{B}_{r}). (46)

The kkth diagonal block of rr\mathcal{B}_{r}^{*}\mathcal{B}_{r} is given by

(AA\displaystyle(A\otimes A\otimes\cdots A)(AAA)\displaystyle\otimes A)^{*}(A\otimes A\otimes\cdots\otimes A)
=AAAAAA\displaystyle=A^{*}A\otimes A^{*}A\otimes\cdots\otimes A^{*}A
=:(AA)k.\displaystyle=:(A^{*}A)^{\otimes k}. (47)

Since the eigenvalues of the Kronecker product of two matrices equal the Kronecker product of the vectors of eigenvalues of each matrix [29, p. 412], we conclude that the eigenvalues of (AA)p(A^{*}A)^{\otimes p} for any positive integer pp are all possible products of length pp of the eigenvalues of AAA^{*}A. In particular, since AAA^{*}A is positive semi-definite,

ρmax((AA)p)=[ρmax(AA)]p.\displaystyle\rho_{\textrm{max}}((A^{*}A)^{\otimes p})=[\rho_{\textrm{max}}(A^{*}A)]^{p}. (48)

Furthermore, since rr\mathcal{B}_{r}^{*}\mathcal{B}_{r} is block diagonal with blocks of increasing size, its eigenvalues are given by the union of the sets of eigenvalues associated with each block. Thus,

ρmax(rr)=max1pr[ρmax(AA)]p.\displaystyle\rho_{\textrm{max}}(\mathcal{B}_{r}^{*}\mathcal{B}_{r})=\max_{1\leq p\leq r}[\rho_{\textrm{max}}(A^{*}A)]^{p}. (49)

When ρmax(AA)>1\rho_{\textrm{max}}(A^{*}A)>1, the maximum in (49) is achieved for p=rp=r. Otherwise, when ρmax(AA)<1\rho_{\textrm{max}}(A^{*}A)<1, the maximum in (49) is achieved for p=1p=1. Hence, from (45),

E{ϵr(Y)2}\displaystyle E\{||\epsilon_{r}(Y)||^{2}\} max{ρmax(AA),[ρmax(AA)]r}\displaystyle\leq\max\left\{\rho_{\textrm{max}}(A^{*}A),[\rho_{\textrm{max}}(A^{*}A)]^{r}\right\}
E{ϵr(X)2}.\displaystyle\hskip 86.11084pt\cdot E\left\{||\epsilon_{r}(X)||^{2}\right\}. (50)

From (42) and (50) we obtain,

ϵr2¯(λ)\displaystyle\overline{\epsilon_{r}^{2}}(\lambda) max{ρmax(AA),[ρmax(AA)]r}ρmin(𝒜r𝒜r)E{ϵr(X)2}.\displaystyle\leq\frac{\max\left\{\rho_{\textrm{max}}(A^{*}A),[\rho_{\textrm{max}}(A^{*}A)]^{r}\right\}}{\rho_{\textrm{min}}(\mathcal{A}_{r}^{*}\mathcal{A}_{r})}E\{||\epsilon_{r}(X)||^{2}\}. (51)

Evaluating the MSE E{ϵr(X)2}E\{||\epsilon_{r}(X)||^{2}\} is quite involved. For r=2,3r=2,3, and sufficiently large NN, the cross terms in μ^r(X)\hat{\mu}_{r}(X) will be approximately zero, and hence

E{ϵr(X)2}l=1LE{ϵr(xl)2}\displaystyle E\{||\epsilon_{r}(X)||^{2}\}\approx\sum_{l=1}^{L}E\{||\epsilon_{r}(x_{l})||^{2}\} (52)

where E{ϵr(xl)2}E\{||\epsilon_{r}(x_{l})||^{2}\} is the MSE associated with the KK-statistics of the llth component of XX. It is also the variance of μ^r(xl)\hat{\mu}_{r}(x_{l}). Since the {Xn}\{X_{n}\} vectors are assumed independent, we can infer E{ϵr(xl)2}E\{||\epsilon_{r}(x_{l})||^{2}\} from the variance of the KK-statistics of an IID sequence of scalar random variables given by [31, p. 291],

var (μ^2(xl))=1NK4(xl)+1N12μ22(xl)\displaystyle\left(\hat{\mu}_{2}(x_{l})\right)=\frac{1}{N}K_{4}(x_{l})+\frac{1}{N-1}2\mu_{2}^{2}(x_{l})
var (μ^3(xl))=1NK6(xl)+9N1K4(xl)μ2(xl)\displaystyle\left(\hat{\mu}_{3}(x_{l})\right)=\frac{1}{N}K_{6}(x_{l})+\frac{9}{N-1}K_{4}(x_{l})\mu_{2}(x_{l})
+9N1μ32(xl)+6N(N1)(N2)μ23(xl).\displaystyle\hskip 8.61108pt+\frac{9}{N-1}\mu_{3}^{2}(x_{l})+\frac{6N}{(N-1)(N-2)}\mu_{2}^{3}(x_{l}). (53)

When the IID random variables are Poisson with E{Xl}=λlE\{X_{l}\}=\lambda_{l}, all cumulants equal λl\lambda_{l}, and hence,

E{ϵ2(xl)2}\displaystyle E\{||\epsilon_{2}(x_{l})||^{2}\} =var(μ^2(xl))=1Nλl+1N12λl2\displaystyle=\textrm{var}\left(\hat{\mu}_{2}(x_{l})\right)=\frac{1}{N}\lambda_{l}+\frac{1}{N-1}2\lambda_{l}^{2}
1N(λl+2λl2).\displaystyle\approx\frac{1}{N}(\lambda_{l}+2\lambda_{l}^{2}). (54)
E\displaystyle E {ϵ3(xl)2}=var(μ^3(xl))=1Nλl+9N12λl2\displaystyle\{||\epsilon_{3}(x_{l})||^{2}\}=\textrm{var}\left(\hat{\mu}_{3}(x_{l})\right)=\frac{1}{N}\lambda_{l}+\frac{9}{N-1}2\lambda_{l}^{2}
+6N(N1)(N2)λl31N(λl+18λl2+6λl3).\displaystyle~{}~{}~{}+\frac{6N}{(N-1)(N-2)}\lambda_{l}^{3}\approx\frac{1}{N}(\lambda_{l}+18\lambda_{l}^{2}+6\lambda_{l}^{3}). (55)

The upper bound on ϵr2¯(λ)\overline{\epsilon_{r}^{2}}(\lambda) for r=2,3r=2,3, follows from (51), (52), (54) and (55).

The bound (51) is loose when the condition number of the matrix 𝒜r𝒜r\mathcal{A}_{r}^{*}\mathcal{A}_{r} is large. It does have qualitative value as it shows that estimation of moments becomes increasingly harder as the order increases. Since usually λl>1\lambda_{l}>1 in network tomography, the error in estimating μ3(xl)\mu_{3}(x_{l}) is much larger than that in estimating μ2(xl)\mu_{2}(x_{l}).
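The quantities entering the bound (51) are easy to evaluate numerically once the routing matrix and the reduced stacked matrix are in hand. The following sketch is our own (it assumes the stacked matrix has full column rank) and takes as input an estimate of E{||ϵr(X)||2}E\{||\epsilon_{r}(X)||^{2}\} obtained via (52)-(55).

```python
import numpy as np

def mse_upper_bound(A, A_r, r, eps_X_sq):
    """Right-hand side of (51).  A: routing matrix, A_r: reduced stacked
    matrix with full column rank, eps_X_sq: estimate of E{||eps_r(X)||^2}."""
    A = np.asarray(A, dtype=float)
    A_r = np.asarray(A_r, dtype=float)
    rho_max = np.linalg.eigvalsh(A.T @ A).max()
    rho_min = np.linalg.eigvalsh(A_r.T @ A_r).min()
    return max(rho_max, rho_max ** r) / rho_min * eps_X_sq

# For Poisson rates lam, (54)-(55) give per-component contributions of
# roughly (lam + 2*lam**2) / N at order 2 and
# (lam + 18*lam**2 + 6*lam**3) / N at order 3.
```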

III-C Implementation

In this section we address several aspects related to the implementation of the cumulant matching approach. The Khatri-Rao product can be expressed in terms of the rows of AA instead of its columns as in (7). Let αi\alpha_{i} denote the iith row of AA, i=1,,Mi=1,\ldots,M. Then A=stack{αi:i=1,,M}A=\mbox{stack}\{\alpha_{i}:i=1,\ldots,M\} where stack refers to the stacking of row vectors of equal length to form a matrix. We have

AA\displaystyle A\odot A =stack{αiαj;i,j=1,,M}\displaystyle=\mbox{stack}\left\{\alpha_{i}\circ\alpha_{j};\ i,j\!=\!1,\ldots,M\right\}
AAA\displaystyle A\odot A\!\odot\!A =stack{αiαjαk;i,j,k=1,,M},\displaystyle=\mbox{stack}\left\{\alpha_{i}\circ\alpha_{j}\circ\alpha_{k};\ i,j,k\!=\!1,\ldots,M\right\}, (56)

etc., where \circ denotes the Schur-Hadamard product and lexicographic ordering of the indices (i,j,k)(i,j,k) is assumed. Using this formulation, it is easy to see that 𝒜r\mathcal{A}_{r} contains duplicate rows. For example, αi\alpha_{i} in AA and αiαi\alpha_{i}\circ\alpha_{i} in AAA\odot A are duplicates since the elements of AA belong to {0,1}\{0,1\}. Additionally, 𝒜r\mathcal{A}_{r} may contain null rows as is easy to see from (7). Thus, equations in (25) corresponding to duplicate and null rows in 𝒜r\mathcal{A}_{r} can be removed. We shall always opt to remove duplicate rows that correspond to higher-order cumulants rather than to low-order cumulants since higher-order cumulants are harder to estimate. To simplify notation we shall henceforth assume that 𝒜r\mathcal{A}_{r} and the right hand side (RHS) vector ηr(Y)\eta_{r}(Y) of (25) are given in their reduced form. Similarly, we shall consider (26) and (27) as given in their reduced form.
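A minimal sketch of this reduction step (again with our own naming) removes null rows and keeps only the first, i.e., lowest-order, occurrence of every distinct row of the stacked matrix, dropping the matching entries of the right hand side:

```python
import numpy as np

def reduce_equations(A_r, rhs):
    """Drop null and duplicate rows of the stacked matrix A_r together with
    the corresponding entries of rhs.  Because A_r stacks the orders from
    low to high, keeping the first occurrence retains the low-order rows."""
    keep, seen = [], set()
    for i, row in enumerate(np.asarray(A_r)):
        key = tuple(row)
        if not row.any() or key in seen:     # null row or duplicate
            continue
        seen.add(key)
        keep.append(i)
    keep = np.array(keep)
    return A_r[keep], rhs[keep]
```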

It is interesting to compare the number of rows in the original and reduced 𝒜r\mathcal{A}_{r}. The number of rows in the original 𝒜r\mathcal{A}_{r} is given by

n¯r(M)=M+M2+M3++Mr=M(Mr1)M1.\displaystyle\bar{n}_{r}(M)=M+M^{2}+M^{3}+\ldots+M^{r}=\frac{M(M^{r}\!-\!1)}{M-1}. (57)

The maximum number of distinct rows in the reduced 𝒜r\mathcal{A}_{r} is counted as the sum of the maximum number of distinct rows contributed individually by AA, AAA\circ A, AAAA\circ A\circ A, etc., with the last contribution from the rr-fold Khatri-Rao product of AA with itself. The contribution from the iith term is given by (Mi){M\choose i}, i=1,,ri=1,\ldots,r, which represents the number of unordered combinations of ii rows chosen without replacement from the given MM rows. Thus, the maximum number of distinct rows in the reduced 𝒜r\mathcal{A}_{r} is given by

nr(M)=(M1)+(M2)+(M3)++(Mr).\displaystyle n_{r}(M)={M\choose 1}\!+\!{M\choose 2}\!+\!{M\choose 3}+\ldots+{M\choose r}. (58)

For example, consider a network with a 4×154\times 15 routing matrix AA whose columns comprise all lexicographically ordered non-zero binary 44-tuples. Here, M=4M=4, and n¯r(M)\bar{n}_{r}(M) and nr(M)n_{r}(M) are shown in Table I. Note that for this example, the number of reduced equations coincides with the row rank of 𝒜r\mathcal{A}_{r} for each rr. Furthermore, full column rank is achieved only when r=4r=4.

rr: 1 2 3 4
n¯r(M)\bar{n}_{r}(M): 4 20 64 320
nr(M)n_{r}(M): 4 10 14 15
row rank of 𝒜r\mathcal{A}_{r}: 4 10 14 15
TABLE I: Number of unreduced (n¯r(M)\bar{n}_{r}(M)) and reduced (nr(M)n_{r}(M)) cumulant matching equations, and row rank of 𝒜r\mathcal{A}_{r}, for the 4×154\times 15 routing matrix of all non-zero binary 44-tuples.
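The counts (57)-(58) are straightforward to reproduce; a short sketch for the example above:

```python
from math import comb

def n_bar(M, r):
    """Number of rows of the unreduced stacked matrix, Eq. (57)."""
    return sum(M ** k for k in range(1, r + 1))

def n_reduced(M, r):
    """Maximum number of distinct rows after reduction, Eq. (58)."""
    return sum(comb(M, k) for k in range(1, r + 1))

# n_reduced(4, r) for r = 1, ..., 4 gives 4, 10, 14, 15, as in Table I.
```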

We study two estimates of λ\lambda that approximate the cumulant matching equation (27). The first estimate follows from the iteration (1) which is initialized by a constant vector. The second estimate follows from least squares. Specifically, we use the unique Tikhonov regularized least squares solution for the inconsistent set of equations (27), when 𝒜r\mathcal{A}_{r} is not necessarily full column rank. This estimator is given by [32, p. 51]

λ^r=(𝒜r𝒜r+γI)1𝒜rη^r(Y)\displaystyle\hat{\lambda}_{r}=(\mathcal{A}_{r}^{*}\mathcal{A}_{r}+\gamma I)^{-1}\mathcal{A}_{r}^{*}\ \hat{\eta}_{r}(Y) (59)

for some γ>0\gamma>0. Note that the regularized estimator applies to a skinny as well as a fat matrix 𝒜r\mathcal{A}_{r}.

To mitigate the effects of the error introduced by the empirical cumulant estimates, while allowing the estimator of λ\lambda to benefit from the higher order cumulants, the relative weights of the third and fourth order cumulants compared to the first and second order moments can be reduced. This can be done by multiplying all equations in (59) with rows originating from 𝒜3\mathcal{A}_{3} by some 0<ϵ3<10<\epsilon_{3}<1, and all equations with rows originating from 𝒜4\mathcal{A}_{4} by some 0<ϵ4<10<\epsilon_{4}<1. This regularization approach was advocated by Vardi [1] in the context of his second-order moment matching approach. Following this approach, let the reduced and ϵ\epsilon-weighted matrix 𝒜r\mathcal{A}_{r} be denoted by 𝒜r,ϵ\mathcal{A}_{r,\epsilon}, and let the reduced and ϵ\epsilon-weighted vector η^r(Y)\hat{\eta}_{r}(Y) be denoted by η^r,ϵ(Y)\hat{\eta}_{r,\epsilon}(Y). Then, from (59), the rate vector λ\lambda is estimated from

λ^r,ϵ=(𝒜r,ϵ𝒜r,ϵ+γI)1𝒜r,ϵη^r,ϵ(Y).\displaystyle\hat{\lambda}_{r,\epsilon}=(\mathcal{A}_{r,\epsilon}^{*}\mathcal{A}_{r,\epsilon}+\gamma I)^{-1}\mathcal{A}_{r,\epsilon}^{*}\ \hat{\eta}_{r,\epsilon}(Y). (60)

Note that the estimator (60) is not guaranteed to be non-negative. A non-negative estimate of λ\lambda can be obtained by using non-negative least squares optimization [33]. In our numeric examples we have arbitrarily substituted negative estimates with the value of .005.005. This approach resulted in substantially lower MSE compared to using the constrained optimization algorithm of [33, p. 161]. The performance of the algorithm remained essentially the same when other small values (e.g., .1.1 or .2.2) were used since occurrences of negative estimates are infrequent at our working point. See Table V.
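The weighted and regularized least squares estimator (60), together with the substitution of the rare negative estimates by a small constant, can be sketched as follows. The function is ours and the default constants are illustrative (the weights and the regularization factor actually used are reported in Section V).

```python
import numpy as np

def weighted_regularized_ls(A_r, eta_hat, row_order,
                            eps3=0.01, eps4=0.01, gamma=5e-4, neg_value=0.005):
    """Estimate lambda from (60).  row_order[i] is the cumulant order of
    equation i; rows of order 3 and 4 are down-weighted by eps3 and eps4."""
    row_order = np.asarray(row_order)
    w = np.ones(len(row_order))
    w[row_order == 3] = eps3
    w[row_order == 4] = eps4
    Aw = w[:, None] * np.asarray(A_r, dtype=float)   # epsilon-weighted rows
    bw = w * eta_hat                                 # matching weighted cumulants
    L = Aw.shape[1]
    lam = np.linalg.solve(Aw.T @ Aw + gamma * np.eye(L), Aw.T @ bw)
    lam[lam < 0] = neg_value                         # replace negative estimates
    return lam
```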

III-D Complexity

The computational effort in the cumulant matching approach consists of the effort to construct and solve the set of equations (27). The number of equations in this set is nr(M)n_{r}(M). Construction of the left hand side of (27) requires (M2+M3++Mr)L(M^{2}+M^{3}+\ldots+M^{r})L operations. Construction of the right hand side of (27) requires (M2+M3++Mr)N(M^{2}+M^{3}+\ldots+M^{r})N operations where NN is the number of vectors used to estimate each cumulant. Solving the equations requires effort that depends only on nr(M)n_{r}(M), MM and the number of iterations in (1). The combined effort is dominated by (M2+M3++Mr)N(M^{2}+M^{3}+\ldots+M^{r})N since NN must be large to produce meaningful cumulant estimates. Thus the computational effort of the cumulant matching approach is approximately linear in NN when NN is large, which is always the case.

IV Random Routing

In a typical network, there are multiple paths connecting every source node with every destination node. When the network operates under a deterministic routing regime, a single fixed path is used for every source-destination pair. When the network operates under a random routing regime, a path from the source node to the destination node is selected according to some probability law. Vardi assumed Markovian routing for the traffic of each source-destination pair. The accessible nodes and links for the given pair are represented by the states and transition probabilities of the Markov chain, respectively. The transition probabilities of the Markov chain determine the probability of each path with the same source-destination address.

Tebaldi and West argued that random routing can be viewed as deterministic routing in a super-network in which all possible paths for each source-destination address are listed [6]. This approach results in an expanded zero-one routing matrix with each column in the original routing matrix replaced by multiple columns representing the feasible paths for the source-destination pair. The Poisson traffic flow on a source-destination pair is thinned into multiple Poisson traffic flows with the multinomial thinning probabilities being the probabilities of the paths with the same source-destination address. Thus, the super network and the original network operate under similar statistical (Poisson) models. The thinned Poisson rates can now be estimated as was originally done, e.g., using cumulant matching, and each source-destination rate estimate can be obtained from the sum of the thinned rate estimates in that source-destination pair.

V Numerical Example

In this section we demonstrate the performance of our approach and the gain realized by using higher-order empirical cumulants (our experiments were run on ARGO, a research computing cluster provided by the Office of Research Computing at George Mason University, VA; URL: http://orc.gmu.edu). We study the NSFnet [24] whose topology is shown in Fig. 1. The network consists of 14 nodes and 21 bidirectional links. Hence, it contains L=1413/2=91L=14\cdot 13/2=91 source-destination pairs.


Figure 1: NSFnet topology with link weights as in [24, Fig. 4].

This size network may represent a private network, a transportation network or a subset of interest of a large-scale network. The link weights in Fig. 1 are exclusively used to determine k1k\geq 1 shortest paths for each source-destination pair. Otherwise, they play no role in the traffic rate estimation problem. To determine the kk shortest paths between a given source-destination pair, we used the shortest simple paths function from the NetworkX Python library, which is based on the algorithm of Yen [34]. When k=1k=1, the number of source-destination paths equals the number of source-destination pairs and the routing matrix AA is a 21×9121\times 91 matrix. The augmented routing matrix 𝒜r\mathcal{A}_{r} achieves full column rank when r=2r=2.
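A sketch of how such a routing matrix can be built with NetworkX (using shortest_simple_paths, which implements Yen's algorithm) is given below; it is our own illustrative code, not the experiment scripts.

```python
import itertools
import numpy as np
import networkx as nx

def routing_matrix(G, k=1, weight="weight"):
    """0-1 routing matrix whose columns are the k shortest simple paths of
    every source-destination pair of the undirected weighted graph G."""
    links = list(G.edges())
    link_index = {frozenset(e): i for i, e in enumerate(links)}
    columns = []
    for s, t in itertools.combinations(G.nodes(), 2):
        k_paths = itertools.islice(
            nx.shortest_simple_paths(G, s, t, weight=weight), k)
        for path in k_paths:
            col = np.zeros(len(links))
            for u, v in zip(path[:-1], path[1:]):
                col[link_index[frozenset((u, v))]] = 1.0
            columns.append(col)
    return np.column_stack(columns)

# For the 14-node, 21-link NSFnet, k=1 gives a 21 x 91 matrix and
# k=2 a 21 x 182 matrix, as described above.
```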

With k2k\geq 2, we can assign multiple paths to each source-destination pair and treat them as distinguishable new source-destination pairs. The routing matrix AA thus becomes fatter and using higher-order empirical cumulants becomes beneficial. For example, when k=2k=2, we have L=182L=182 source-destination paths, the column rank of 𝒜2\mathcal{A}_{2} is 162162 and 𝒜3\mathcal{A}_{3} has full column rank. Thus, using this example we focus on third-order cumulant matching.

The network with k=2k=2 and a 21×18221\times 182 routing matrix AA could also be seen as a super-network in the Tebaldi-West sense [6] for a network with L=91L=91 source-destination pairs operating under a random routing regime with two possible paths per each source-destination pair. The accuracy of the rate estimation for the random routing network is determined by the accuracy of the rate estimation in the deterministic routing super-network. Thus, it suffices to focus on rate estimation in the deterministic routing network with the 21×18221\times 182 routing matrix AA.

The arrival rates {λj}\{\lambda_{j}\} in our experiment were generated randomly from the interval [0,4][0,4]. In each of T=500T=500 simulation runs, a rate vector λ\lambda was generated, and was subsequently used in generating NN statistically independent identically distributed Poisson vectors {Xn}\{X_{n}\} which were transformed into the vectors {Yn=AXn}\{Y_{n}=AX_{n}\} using the assumed known routing matrix AA. The NN statistically independent identically distributed Poisson vectors {Yn}\{Y_{n}\} were used to generate the empirical cumulants using (29) when r=1r=1 and (30) when r=2,3r=2,3. We experimented with NN in the range of 10 00010\,000 to 500 000500\,000. The MSE in estimating the cumulants for the NSFnet with M=21M=21 are given in Table II. Clearly, the MSE decreases monotonically with NN for r=1,2,3r=1,2,3.

The empirical cumulants were used to estimate the rate vector in the current run. For the cumulants regularization we have used ϵ2=1\epsilon_{2}=1 and ϵ3=.01\epsilon_{3}=.01. Let λt(i)\lambda_{t}(i) and λ^t(i)\hat{\lambda}_{t}(i) denote, respectively, the iith component of λ\lambda and its estimate at the ttth run where i=1,,Li=1,\ldots,L and t=1,,Tt=1,\ldots,T. For each estimate we evaluated the normalized MSE defined by

ξi2=1Tt=1T(λt(i)λ^t(i))21Tt=1T(λt(i))2\displaystyle\xi_{i}^{2}=\frac{\frac{1}{T}\sum_{t=1}^{T}(\lambda_{t}(i)-\hat{\lambda}_{t}(i))^{2}}{\frac{1}{T}\sum_{t=1}^{T}(\lambda_{t}(i))^{2}} (61)

and the averaged normalized MSE defined by

ξ2¯=1Li=1Lξi2.\displaystyle\overline{\xi^{2}}=\frac{1}{L}\sum_{i=1}^{L}\xi_{i}^{2}. (62)

The MSE in estimating λi\lambda_{i} is approximately ξi2E{λ2(i)}\xi_{i}^{2}\cdot E\{\lambda^{2}(i)\} when TT is sufficiently large.
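The per-component and averaged normalized MSE of (61)-(62) amount to the following computation (a small sketch with our own array layout):

```python
import numpy as np

def normalized_mse(lam_true, lam_hat):
    """Per-component normalized MSE (61) and its average (62).
    lam_true and lam_hat have shape (T, L): one row per simulation run."""
    xi2 = np.mean((lam_true - lam_hat) ** 2, axis=0) / np.mean(lam_true ** 2, axis=0)
    return xi2, xi2.mean()
```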

Two rate estimators were used, the iterative estimator (1) and the least squares estimator (60). The iteration was initialized uniformly with all rates set to .1.1. It was terminated after 300300 iterations. The least squares regularization factor was set to γ=.0005\gamma=.0005.

NN r=1r=1 r=2r=2 r=3r=3
10 000: 0.0048 0.2578 14.9097
20 000: 0.0025 0.1285 7.4569
50 000: 0.0010 0.0515 2.9978
100 000: 0.0005 0.0257 1.4989
200 000: 0.0002 0.0131 0.7429
500 000: 0.0001 0.0052 0.2993
TABLE II: MSE in cumulant estimation for the NSFnet with M=21M=21 links.
NN r=1r=1 r=2r=2 r=3r=3
10 000: 0.2292 0.0993 0.1008
20 000: 0.2292 0.0701 0.0694
50 000: 0.2292 0.0502 0.0467
100 000: 0.2292 0.0434 0.0381
200 000: 0.2292 0.0399 0.0335
500 000: 0.2292 0.0372 0.0298
Theoretical: 0.2292 0.0353 0.0015
Rank(𝒜r)(\mathcal{A}_{r}): 21 162 182
TABLE III: Averaged normalized MSE ξ2¯\overline{\xi^{2}} in L=182L=182 source-destination path rate estimation of the NSFnet using the Iteration (1).
NN r=1r=1 r=2r=2 r=3r=3
10 000: 0.3037 0.0951 0.0950
20 000: 0.3037 0.0637 0.0625
50 000: 0.3037 0.0430 0.0407
100 000: 0.3037 0.0358 0.0329
200 000: 0.3037 0.0321 0.0288
500 000: 0.3037 0.0293 0.0257
Theoretical: 0.3037 0.0275 0.0000
Rank(𝒜r)(\mathcal{A}_{r}): 21 162 182
TABLE IV: Averaged normalized MSE ξ2¯\overline{\xi^{2}} in L=182L=182 source-destination path rate estimation of the NSFnet using the Least Squares Estimator (60).
NN r=1r=1 r=2r=2 r=3r=3
10 000: 0.1890 4.1747 4.2275
20 000: 0.1868 2.9813 3.0626
50 000: 0.1890 1.9879 1.9901
100 000: 0.1890 1.6385 1.5780
200 000: 0.1890 1.3747 1.3242
500 000: 0.1890 1.1956 1.1165
Theoretical: 0.1890 0.9692 0.0022
TABLE V: Percent of negative rate estimates in Table IV.

The performance of the iterative estimator (1) is demonstrated in Table III. The table shows the averaged normalized MSE ξ2¯\overline{\xi^{2}} in estimating the path rates using r=1,2,3r=1,2,3 empirical cumulants estimated from NN vectors {Yn}\{Y_{n}\} with NN ranging from N=10 000N=10\,000 to N=500 000N=500\,000. The averaged normalized MSE ξ2¯\overline{\xi^{2}} in estimating the path rates using r=1,2,3r=1,2,3 theoretical cumulants is shown as well. As can be observed in the table, ξ2¯\overline{\xi^{2}} monotonically decreases with NN for r=2,3r=2,3. For a given NN, the performance using r=3r=3 cumulants is better than using r=2r=2 moments, except for N=10 000N=10\,000 where the difference is only .0005.0005. Similar behavior can be seen in Table IV when the least squares estimator (60) is used for the path rate estimation. The percentage of negative rate estimates observed with the least squares estimator is shown in Table V. This percentage is negligible for all values of rr and NN and does not exceed 5%5\%. Comparing the two estimators (1) and (60), the least squares estimator appears superior as it consistently provides lower ξ2¯\overline{\xi^{2}} at the expense of negligible percentage of negative rate estimates.

To assess the effectiveness of the third-order cumulant matching approach and compare it with Vardi’s second-order moment matching approach, we contrast the performance of the former approach with the best performance of the latter approach as obtained by using the theoretical rather than the empirical moments. Let ξ2¯(r,N)\overline{\xi^{2}}(r,N) denote the averaged normalized MSE from (62) obtained when λ\lambda is estimated using rr empirical cumulants that were estimated from NN realizations of the vector YY. Let ξ2¯(r,)\overline{\xi^{2}}(r,\infty) denote the averaged normalized MSE ξ2¯\overline{\xi^{2}} from (62) obtained by using rr theoretical cumulants, or effectively by using N=N=\infty. Table VI shows ξ2¯(2,N)/ξ2¯(2,)\overline{\xi^{2}}(2,N)/\overline{\xi^{2}}(2,\infty) and ξ2¯(3,N)/ξ2¯(2,)\overline{\xi^{2}}(3,N)/\overline{\xi^{2}}(2,\infty) for N=200 000N=200\,000 and the estimators (1) and (60) discussed in Tables IIIIV.

Estimator: λ^\hat{\lambda} [Eq. (1)], λ^\hat{\lambda} [Eq. (60)]
ξ2¯(2,N)/ξ2¯(2,){\overline{\xi^{2}}(2,N)}/{\overline{\xi^{2}}(2,\infty)}: 1.1303 1.1673
ξ2¯(3,N)/ξ2¯(2,){\overline{\xi^{2}}(3,N)}/{\overline{\xi^{2}}(2,\infty)}: 0.9490 1.0473
TABLE VI: Relative averaged normalized MSE ξ2¯\overline{\xi^{2}} in second and third order cumulant matching with N=200 000N=200\,000, to minimum second-order averaged normalized MSE.

Table VI shows the relative averaged normalized MSE in second and third-order cumulant matching compared to the minimum averaged normalized MSE in theoretical second-order moment matching as obtained by using the true second-order moments. The reduction in the relative averaged normalized MSE obtained when the estimator (1) was used was from 1.13031.1303 to 0.94900.9490, and from 1.16731.1673 to 1.04731.0473 when the estimator (60) was used. The third-order cumulants in this experiment were estimated using N=200 000N=200\,000. This represents a reduction of about 12%18%12\%-18\%.


Figure 2: Individual rate estimation performance as measured by the ratio ξi2(3,N)/ξi2(2,)\xi_{i}^{2}(3,N)/\xi_{i}^{2}(2,\infty) for i=1,,182i=1,\ldots,182. The rate vector λ\lambda was estimated using the Iteration (1) and N=200 000N=200\,000.

To show the accuracy in estimating individual path rates, let ξi2(r,N)\xi_{i}^{2}(r,N) denote the normalized MSE ξi2\xi_{i}^{2} from (61) when estimating the iith component of λ\lambda using rr empirical cumulants estimated from NN realizations of YY. Figure 2 shows ξi2(3,N)/ξi2(2,)\xi_{i}^{2}(3,N)/\xi_{i}^{2}(2,\infty) for i=1,,182i=1,\ldots,182 when estimation was done using the Iteration (1) and N=200 000N=200\,000. Excluding outliers, the figure shows that ξi2(3,N)\xi_{i}^{2}(3,N) equals about .81.1.8-1.1 of ξi2(2,)\xi_{i}^{2}(2,\infty). The outliers correspond to rate components for which ξi2(2,)\xi_{i}^{2}(2,\infty) is very small so that the ratio ξi2(3,N)/ξi2(2,)\xi_{i}^{2}(3,N)/\xi_{i}^{2}(2,\infty) is rather large.

VI Concluding Remarks

We have developed a higher-order cumulant matching framework for estimating the rates of source-destination Poisson traffic flows from link traffic flows. The approach is equally applicable to networks operating under deterministic as well as random routing strategies. Under independent Poisson source-destination traffic flows, the approach boils down to a set of linear equations relating empirical cumulants of the link measurements to the rate vector λ\lambda via a matrix involving Khatri-Rao products of the routing matrix. We studied an iterative minimum I-divergence approach and least squares estimation of the rate vector from the cumulant matching equations. We have established an upper bound on the MSE in least squares estimation of λ\lambda. The bound is useful for full column rank matrices 𝒜r\mathcal{A}_{r} with small condition number. We demonstrated the performance of the cumulant matching approach and compared it with Vardi’s best second-order moment matching on the NSFnet. We demonstrated that supplementing Vardi’s approach with the third-order empirical cumulant reduces its minimum averaged normalized MSE in rate estimation by about 12\%-18\%. The minimum averaged normalized MSE is obtained when the theoretical rather than the empirical moments are used in Vardi’s second-order approach. The computational complexity of the approach is linear in the number NN of link traffic flow realizations used to estimate the cumulants.

Appendix A Cumulants of Linear Maps

Let eie_{i} be an L×1L\times 1 vector with a 11 in the iith component and zeros elsewhere. The vector X¯\bar{X} can be written as X¯=i=1Lx¯iei\bar{X}=\sum_{i=1}^{L}\bar{x}_{i}e_{i}. Substituting in (2), and applying the distributive property of the Kronecker product, we obtain,

μ4(X)\displaystyle\mu_{4}(X) =i,j,k,lE{x¯ix¯jx¯kx¯l}(eiejekel).\displaystyle=\sum_{i,j,k,l}E\left\{\bar{x}_{i}\bar{x}_{j}\bar{x}_{k}\bar{x}_{l}\right\}(e_{i}\otimes e_{j}\otimes e_{k}\otimes e_{l}). (A.63)

Similarly, the fourth order cumulant of XX is given by

K4(X)\displaystyle K_{4}(X) =i,j,k,lκ4(x¯i,x¯j,x¯k,x¯l)(eiejekel)\displaystyle=\sum_{i,j,k,l}\kappa_{4}(\bar{x}_{i},\bar{x}_{j},\bar{x}_{k},\bar{x}_{l})(e_{i}\otimes e_{j}\otimes e_{k}\otimes e_{l}) (A.64)

where

κ4\displaystyle\kappa_{4} (x¯i,x¯j,x¯k,x¯l)=E{x¯ix¯jx¯kx¯l}E{x¯ix¯j}E{x¯kx¯l}\displaystyle(\bar{x}_{i},\bar{x}_{j},\bar{x}_{k},\bar{x}_{l})=E\{\bar{x}_{i}\bar{x}_{j}\bar{x}_{k}\bar{x}_{l}\}-E\{\bar{x}_{i}\bar{x}_{j}\}E\{\bar{x}_{k}\bar{x}_{l}\}
E{x¯ix¯k}E{x¯jx¯l}E{x¯ix¯l}E{x¯jx¯k}.\displaystyle-E\{\bar{x}_{i}\bar{x}_{k}\}E\{\bar{x}_{j}\bar{x}_{l}\}-E\{\bar{x}_{i}\bar{x}_{l}\}E\{\bar{x}_{j}\bar{x}_{k}\}. (A.65)

Substituting (A.65) into (A.64) we obtain

K4(X)\displaystyle K_{4}(X) =(i)i,j,k,lE{x¯ix¯jx¯kx¯l}(eiejekel)\displaystyle\overset{(i)}{=}\sum_{i,j,k,l}E\{\bar{x}_{i}\bar{x}_{j}\bar{x}_{k}\bar{x}_{l}\}(e_{i}\otimes e_{j}\otimes e_{k}\otimes e_{l})
(ii)i,j,k,lE{x¯ix¯j}E{x¯kx¯l}(eiejekel)\displaystyle\overset{(ii)}{-}\sum_{i,j,k,l}E\{\bar{x}_{i}\bar{x}_{j}\}E\{\bar{x}_{k}\bar{x}_{l}\}(e_{i}\otimes e_{j}\otimes e_{k}\otimes e_{l})
(iii)i,j,k,lE{x¯ix¯k}E{x¯jx¯l}(eiejekel)\displaystyle\overset{(iii)}{-}\sum_{i,j,k,l}E\{\bar{x}_{i}\bar{x}_{k}\}E\{\bar{x}_{j}\bar{x}_{l}\}(e_{i}\otimes e_{j}\otimes e_{k}\otimes e_{l})
(iv)i,j,k,lE{x¯ix¯l}E{x¯jx¯k}(eiejekel).\displaystyle\overset{(iv)}{-}\sum_{i,j,k,l}E\{\bar{x}_{i}\bar{x}_{l}\}E\{\bar{x}_{j}\bar{x}_{k}\}(e_{i}\otimes e_{j}\otimes e_{k}\otimes e_{l}). (A.66)

Row (i)(i) coincides with (A.63) and hence it equals μ4(X)\mu_{4}(X). The expression in row (ii)(ii) can be written as

i,jE{x¯ix¯j}(eiej)k,lE{x¯kx¯l}(ekel)\displaystyle\sum_{i,j}E\{\bar{x}_{i}\bar{x}_{j}\}(e_{i}\otimes e_{j})\otimes\sum_{k,l}E\{\bar{x}_{k}\bar{x}_{l}\}(e_{k}\otimes e_{l})
=μ2(X)μ2(X).\displaystyle\hskip 86.11084pt=\mu_{2}(X)\otimes\mu_{2}(X). (A.67)

For row (iii)(iii),

(eiejekel)\displaystyle(e_{i}\otimes e_{j}\otimes e_{k}\otimes e_{l}) =vec{(ekel)(eiej)}\displaystyle=\textrm{vec}\{(e_{k}\otimes e_{l})(e_{i}\otimes e_{j})^{*}\}
=vec{(ekel)(eiej)}\displaystyle=\textrm{vec}\{(e_{k}\otimes e_{l})(e_{i}^{*}\otimes e_{j}^{*})\}
=vec{(ekei)(elej)},\displaystyle=\textrm{vec}\{(e_{k}e_{i}^{*})\otimes(e_{l}e_{j}^{*})\}, (A.68)

and hence,

i,j,k,lE{x¯i\displaystyle\sum_{i,j,k,l}E\{\bar{x}_{i} x¯k}E{x¯jx¯l}(eiejekel)\displaystyle\bar{x}_{k}\}E\{\bar{x}_{j}\bar{x}_{l}\}(e_{i}\otimes e_{j}\otimes e_{k}\otimes e_{l})
=veci,j,k,lE{x¯ix¯k}E{x¯jx¯l}(ekeielej)\displaystyle=\textrm{vec}\sum_{i,j,k,l}E\{\bar{x}_{i}\bar{x}_{k}\}E\{\bar{x}_{j}\bar{x}_{l}\}(e_{k}e_{i}^{*}\otimes e_{l}e_{j}^{*})
=vec{RXXRXX}.\displaystyle=\textrm{vec}\left\{R_{XX}\otimes R_{XX}\right\}. (A.69)

For row (iv)(iv) we utilize the L2×L2L^{2}\times L^{2} permutation matrix

UL2=i=1Lj=1L(eiej)(ejei)\displaystyle U_{L^{2}}=\sum_{i=1}^{L}\sum_{j=1}^{L}(e_{i}e_{j}^{*})\otimes(e_{j}e_{i}^{*}) (A.70)

to obtain,

(eiejekel)\displaystyle(e_{i}\otimes e_{j}\otimes e_{k}\otimes e_{l}) =(eiej)(UL2(elek))\displaystyle=(e_{i}\otimes e_{j})\otimes\left(U_{L^{2}}\cdot(e_{l}\otimes e_{k})\right)
=vec{UL2(elek)(eiej)}\displaystyle=\textrm{vec}\left\{U_{L^{2}}\cdot(e_{l}\otimes e_{k})\cdot(e_{i}\otimes e_{j})^{*}\right\}
=vec{UL2(elek)(eiej)}\displaystyle=\textrm{vec}\left\{U_{L^{2}}\cdot(e_{l}\otimes e_{k})\cdot(e_{i}^{*}\otimes e_{j}^{*})\right\}
=vec{UL2(elei)(ekej)}.\displaystyle=\textrm{vec}\left\{U_{L^{2}}\cdot(e_{l}e_{i}^{*})\otimes(e_{k}e_{j}^{*})\right\}. (A.71)

Hence, row (iv)(iv) is given by

vec{UL2i,j,k,lE{x¯ix¯l}E{x¯jx¯k}(elei)(ekej)}\displaystyle\textrm{vec}\left\{U_{L^{2}}\sum_{i,j,k,l}E\{\bar{x}_{i}\bar{x}_{l}\}E\{\bar{x}_{j}\bar{x}_{k}\}(e_{l}e_{i}^{*})\otimes(e_{k}e_{j}^{*})\right\}
=vec{UL2(RXXRXX)}.\displaystyle\hskip 86.11084pt=\textrm{vec}\left\{U_{L^{2}}(R_{XX}\otimes R_{XX})\right\}. (A.72)

Substituting these results into (A.66) yields (5).

We next express K4(Y)K_{4}(Y) in terms of K4(X)K_{4}(X) for Y=AXY=AX. By substituting XX with YY in (5) we obtain,

K4(Y)\displaystyle K_{4}(Y) =μ4(Y)μ2(Y)μ2(Y)\displaystyle=\mu_{4}(Y)-\mu_{2}(Y)\otimes\mu_{2}(Y)
vec{RYYRYY+UM2(RYYRYY)}.\displaystyle-\textrm{vec}\left\{R_{YY}\otimes R_{YY}+U_{M^{2}}\cdot(R_{YY}\otimes R_{YY})\right\}. (A.73)

The first term in the RHS of (A.73) is given by (4). The second term can be expressed using (3) and (4) as,

μ2(Y)μ2(Y)\displaystyle\mu_{2}(Y)\otimes\mu_{2}(Y) =[(AA)μ2(X)][(AA)μ2(X)]\displaystyle=[(A\otimes A)\mu_{2}(X)]\otimes[(A\otimes A)\mu_{2}(X)]
=(AAAA)(μ2(X)μ2(X)).\displaystyle=(A\otimes A\otimes A\otimes A)(\mu_{2}(X)\otimes\mu_{2}(X)). (A.74)

For the third term in (A.73) we use (3) and the well known identity [29, p. 410]

vec{ARB}=(BA)vec{R}\displaystyle\textrm{vec}\left\{ARB^{*}\right\}=(B\otimes A)\textrm{vec}\{R\} (A.75)

to obtain

vec {RYYRYY}=vec{(ARXXA)(ARXXA)}\left\{R_{YY}\otimes R_{YY}\right\}=\textrm{vec}\left\{(AR_{XX}A^{*})\otimes(AR_{XX}A^{*})\right\}
=vec{(AA)(RXXRXX)(AA)}\displaystyle=\textrm{vec}\left\{(A\otimes A)(R_{XX}\otimes R_{XX})(A\otimes A)^{*}\right\}
=(AAAA)vec{RXXRXX}.\displaystyle=(A\otimes A\otimes A\otimes A)\textrm{vec}\{R_{XX}\otimes R_{XX}\}. (A.76)

For the last term in (A.73), we use (3) and the relation

UM2(AA)=(AA)UL2\displaystyle U_{M^{2}}(A\otimes A)=(A\otimes A)U_{L^{2}} (A.77)

to obtain

UM2(RYYRYY)\displaystyle U_{M^{2}}(R_{YY}\!\otimes\!R_{YY}) =UM2(AA)(RXXRXX)(AA)\displaystyle=U_{M^{2}}(A\!\otimes\!A)(R_{XX}\!\otimes\!R_{XX})(A\!\otimes\!A)^{*}
=(AA)UL2(RXXRXX)(AA).\displaystyle=(A\!\otimes\!A)U_{L^{2}}(R_{XX}\!\otimes\!R_{XX})(A\!\otimes\!A)^{*}. (A.78)

Hence, from (A.75),

vec {UM2(RYYRYY)}=\displaystyle\left\{U_{M^{2}}(R_{YY}\otimes R_{YY})\right\}=
(AAAA)vec{UL2(RXXRXX)}.\displaystyle(A\otimes A\otimes A\otimes A)\textrm{vec}\{U_{L^{2}}(R_{XX}\otimes R_{XX})\}. (A.79)

Combining these results we obtain (6).

Suppose now that the components of XX are independent random variables. Then,

E{x¯l1x¯l2x¯lk}={E{x¯lk},l1=l2==lk=l0,otherwise\displaystyle E\{\bar{x}_{l_{1}}\cdot\bar{x}_{l_{2}}\cdots\bar{x}_{l_{k}}\}=\left\{\begin{array}[]{cc}E\{\bar{x}_{l}^{k}\},&l_{1}\!=\!l_{2}\!=\!\cdots\!=\!l_{k}=l\\ 0,&\textrm{otherwise}\end{array}\right. (A.82)

and

μ3(X)\displaystyle\mu_{3}(X) =E{X¯X¯X¯}\displaystyle=E\left\{\bar{X}\otimes\bar{X}\otimes\bar{X}\right\}
=i,j,kE{x¯ix¯jx¯k}(eiejek)\displaystyle=\sum_{i,j,k}E\{\bar{x}_{i}\bar{x}_{j}\bar{x}_{k}\}(e_{i}\otimes e_{j}\otimes e_{k})
=iE{x¯i3}(eieiei).\displaystyle=\sum_{i}E\{\bar{x}_{i}^{3}\}(e_{i}\otimes e_{i}\otimes e_{i}). (A.83)

The central moment μ3(Y)\mu_{3}(Y) follows from (4). Expressing

A=i=1Laiei,\displaystyle A=\sum_{i=1}^{L}a_{i}e_{i}^{*}, (A.84)

we have

AAA\displaystyle A\otimes A\otimes A =i,j,k(aiei)(ajej)(akek)\displaystyle=\sum_{i,j,k}(a_{i}e_{i}^{*})\otimes(a_{j}e_{j}^{*})\otimes(a_{k}e_{k}^{*})
=i,j,k(aiajak)(eiejek).\displaystyle=\sum_{i,j,k}(a_{i}\otimes a_{j}\otimes a_{k})(e_{i}\otimes e_{j}\otimes e_{k})^{*}. (A.85)

Substituting (A.83) and (A.85) in (4), and using the orthonormality of the L3×1L^{3}\times 1 vectors {eiejek}\{e_{i}\otimes e_{j}\otimes e_{k}\}, we obtain,

μ3(Y)\displaystyle\mu_{3}(Y) =i=1LE{x¯i3}(aiaiai)\displaystyle=\sum_{i=1}^{L}E\{\bar{x}_{i}^{3}\}(a_{i}\otimes a_{i}\otimes a_{i})
=(AAA)col(E{x¯13},E{x¯23},,E{x¯L3}).\displaystyle=(A\odot A\odot A)\ \textrm{col}\left(E\left\{\bar{x}_{1}^{3}\right\},E\left\{\bar{x}_{2}^{3}\right\},\ldots,E\left\{\bar{x}_{L}^{3}\right\}\right). (A.86)

It can similarly be shown that

μ2(Y)\displaystyle\mu_{2}(Y) =(AA)col(E{x¯12},E{x¯22},,E{x¯L2}).\displaystyle=(A\odot A)\ \textrm{col}\left(E\left\{\bar{x}_{1}^{2}\right\},E\left\{\bar{x}_{2}^{2}\right\},\ldots,E\left\{\bar{x}_{L}^{2}\right\}\right). (A.87)

When the components of XX are independent random variables

κ4(x¯i,x¯j,x¯k,x¯l)=δijδjkδklκ4(x¯i,x¯i,x¯i,x¯i),\displaystyle\kappa_{4}(\bar{x}_{i},\bar{x}_{j},\bar{x}_{k},\bar{x}_{l})=\delta_{ij}\delta_{jk}\delta_{kl}\kappa_{4}(\bar{x}_{i},\bar{x}_{i},\bar{x}_{i},\bar{x}_{i}), (A.88)

and

κ4(x¯i,x¯j,x¯k,x¯l)={E{x¯i4}3(E{x¯i2})2,i=j=k=l0,otherwise.\displaystyle\kappa_{4}(\bar{x}_{i},\bar{x}_{j},\bar{x}_{k},\bar{x}_{l})=\left\{\begin{array}[]{cc}E\{\bar{x}_{i}^{4}\}-3(E\{\bar{x}_{i}^{2}\})^{2},&i\!=\!j\!=\!k\!=\!l\\ 0,&\textrm{otherwise.}\end{array}\right. (A.91)

Substituting (A.91) into the expansion of $K_{4}(X)$ over the basis vectors $\{e_{i}\otimes e_{j}\otimes e_{k}\otimes e_{l}\}$ yields,

\displaystyle K_{4}(X)=\sum_{i,j,k,l}\kappa_{4}(\bar{x}_{i},\bar{x}_{i},\bar{x}_{i},\bar{x}_{i})\delta_{ij}\delta_{jk}\delta_{kl}(e_{i}\otimes e_{j}\otimes e_{k}\otimes e_{l})
\displaystyle=\sum_{p}\kappa_{4}(\bar{x}_{p},\bar{x}_{p},\bar{x}_{p},\bar{x}_{p})(e_{p}\otimes e_{p}\otimes e_{p}\otimes e_{p}). (A.92)

To find $K_{4}(Y)$ we use (6) where

\displaystyle A\otimes A\otimes A\otimes A=\sum_{i,j,k,l}(a_{i}\otimes a_{j}\otimes a_{k}\otimes a_{l})(e_{i}\otimes e_{j}\otimes e_{k}\otimes e_{l})^{*} (A.93)

as follows from (A.84)-(A.85), and $K_{4}(X)$ is given in (A.92). Using the orthonormality of the $L^{4}\times 1$ vectors $\{e_{i}\otimes e_{j}\otimes e_{k}\otimes e_{l}\}$, as in the derivation of (A.86), yields,

\displaystyle K_{4}(Y)=\sum_{p=1}^{L}\kappa_{4}(\bar{x}_{p},\bar{x}_{p},\bar{x}_{p},\bar{x}_{p})(a_{p}\otimes a_{p}\otimes a_{p}\otimes a_{p})
\displaystyle=(A\odot A\odot A\odot A)\ \textrm{col}\left(\kappa_{4}(\bar{x}_{1},\bar{x}_{1},\bar{x}_{1},\bar{x}_{1}),\kappa_{4}(\bar{x}_{2},\bar{x}_{2},\bar{x}_{2},\bar{x}_{2}),\ldots,\kappa_{4}(\bar{x}_{L},\bar{x}_{L},\bar{x}_{L},\bar{x}_{L})\right), (A.98)

which is (13). We conjecture that the same relation holds for all higher order cumulants.
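
The fourth-order case admits the same kind of exact check as the third-order one above: when $K_{4}(X)$ has the diagonal form (A.92), applying $(A\otimes A\otimes A\otimes A)$ as in (6) reproduces the Khatri-Rao expression (A.98). In the sketch below the $\kappa_{4}$ values are arbitrary placeholders.

```python
# Exact check of the fourth-order Khatri-Rao collapse (A.98); the fourth
# cumulants are arbitrary placeholder values.
import numpy as np

rng = np.random.default_rng(4)
M, L = 3, 5
A = rng.integers(0, 2, size=(M, L)).astype(float)
I = np.eye(L)
k4 = rng.uniform(1.0, 4.0, size=L)     # stands in for kappa_4(xbar_p,...,xbar_p)

def khatri_rao(*mats):
    """Column-wise Kronecker product; all factors share the same column count."""
    ncols = mats[0].shape[1]
    out = mats[0]
    for B in mats[1:]:
        out = np.vstack([np.kron(out[:, j], B[:, j]) for j in range(ncols)]).T
    return out

# K4(X) with the diagonal structure of (A.92)
K4_X = sum(k4[p] * np.kron(np.kron(np.kron(I[:, p], I[:, p]), I[:, p]), I[:, p])
           for p in range(L))

# (A.98): (A⊗A⊗A⊗A) K4(X) = (A⊙A⊙A⊙A) col(kappa_4(xbar_1,...), ..., kappa_4(xbar_L,...))
K4_Y = np.kron(np.kron(A, A), np.kron(A, A)) @ K4_X
assert np.allclose(K4_Y, khatri_rao(A, A, A, A) @ k4)
print("(A.98) holds on this example")
```
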

When the components of $X$ are independent Poisson random variables with $E\{X\}=\lambda$,

\displaystyle E\{\bar{x}_{i}^{2}\}=E\{\bar{x}_{i}^{3}\}=\lambda_{i}
\displaystyle E\{\bar{x}_{i}^{4}\}=\lambda_{i}+3\lambda_{i}^{2}
\displaystyle\kappa_{4}(\bar{x}_{i},\bar{x}_{i},\bar{x}_{i},\bar{x}_{i})=(\lambda_{i}+3\lambda_{i}^{2})-3\lambda_{i}^{2}=\lambda_{i}, (A.99)

and (25) follows.
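
The Poisson relations in (A.99) can also be sanity-checked by simulation. The sketch below uses an illustrative rate and sample size; the empirical moments are only approximately equal to their theoretical values.

```python
# Monte Carlo sanity check of (A.99) for a single Poisson component with rate
# lambda: E{xbar^2} = E{xbar^3} = lambda, E{xbar^4} = lambda + 3*lambda^2,
# and kappa_4 = lambda. Rate and sample size are illustrative.
import numpy as np

rng = np.random.default_rng(5)
lam, N = 3.7, 5_000_000
x = rng.poisson(lam, size=N).astype(float)
xc = x - x.mean()                                   # centered samples

m2, m3, m4 = (np.mean(xc ** k) for k in (2, 3, 4))
print("E{xbar^2} approx", round(m2, 3), " vs lambda =", lam)
print("E{xbar^3} approx", round(m3, 3), " vs lambda =", lam)
print("E{xbar^4} approx", round(m4, 3), " vs lambda + 3*lambda^2 =", lam + 3 * lam**2)
print("kappa_4   approx", round(m4 - 3 * m2**2, 3), " vs lambda =", lam)
```
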
