
Traffic Rate Network Tomography with Higher-Order Cumulants

Hanoch Lev-Ari, Yariv Ephraim, and Brian L. Mark

This work was supported in part by the U.S. National Science Foundation under Grant CCF-1717033. H. Lev-Ari is with the Dept. of Electrical and Computer Engineering, Northeastern University, Boston, MA 02115 (e-mail: [email protected]). Y. Ephraim and B.L. Mark are with the Dept. of Electrical and Computer Engineering, George Mason University, Fairfax, VA 22030 (e-mail: [email protected], [email protected]).
Abstract

Network tomography aims at estimating source-destination traffic rates from link traffic measurements. This inverse problem was formulated by Vardi in 1996 for Poisson traffic over networks operating under deterministic as well as random routing regimes. In this paper we expand Vardi’s second-order moment matching rate estimation approach to higher-order cumulant matching with the goal of increasing the column rank of the mapping and consequently improving the rate estimation accuracy. We develop a systematic set of linear cumulant matching equations and express them compactly in terms of the Khatri-Rao product. Both least squares estimation and iterative minimum I-divergence estimation are considered. We develop an upper bound on the mean squared error (MSE) in least squares rate estimation from empirical cumulants. We demonstrate for the NSFnet that supplementing Vardi’s approach with the third-order empirical cumulants reduces its averaged normalized MSE relative to the theoretical minimum of the second-order moment matching approach by about 12\%-18\%. This minimum MSE is obtained when Vardi’s second-order moment matching approach is based on the theoretical rather than the empirical moments.

Index Terms:
Network traffic, network tomography, inverse problem, higher-order cumulants.

I Introduction

Network tomography was formulated by Y. Vardi in the seminal paper [1]. The goal is to estimate the rates of traffic flows over source-destination pairs from traffic flows over links of the network. Traffic flow rates are measured, for example, by the number of packets per second. We use XX to denote a vector of LL source-destination traffic flows, and YY to denote a vector of MM link traffic flows. Normally, M\ll L. All source-destination traffic flows are assumed independent Poisson random variables. Thus, XX comprises a vector of LL independent Poisson random variables. Link traffic flow measurements are assumed passive and require no probes. For networks operating under a deterministic routing regime, the traffic flow of each source-destination pair is routed over a fixed path. Traffic flows over each link may originate from multiple source-destination traffic flows. Thus, we define a binary variable aija_{ij} such that aij=1a_{ij}=1 when traffic over source-destination jj passes through link ii and aij=0a_{ij}=0 otherwise. We also define an M×LM\times L binary routing matrix A={aij,i=1,,M;j=1,,L}A=\{a_{ij},i=1,\ldots,M;j=1,\ldots,L\} which is assumed known throughout the paper. It follows that Y=AXY=AX. The expected value of the jjth Poisson source-destination traffic flow constitutes its traffic flow rate. Thus, the vector of source-destination traffic flow rates is the expected value of XX which we denote by λ=E{X}\lambda=E\{X\}.

The network tomography problem is that of estimating the rate vector λ\lambda from given realizations of YY. This is a rather challenging inverse problem since Y=AXY=AX is an underdetermined set of equations, and the estimate of λ\lambda must be non-negative. Vardi invoked the terminology “LININPOS” (LINear INverse POSitive problem). Maximum likelihood estimation of λ\lambda is not feasible since the components of YY are dependent Poisson random variables with no explicitly known distribution. The expectation-maximization (EM) algorithm is also not useful for this problem as it requires calculation of the conditional mean E{XY}E\{X\mid Y\} which in turn requires knowledge of the joint distribution of XX and YY. Vanderbei and Iannone [2] developed an EM algorithm in which E{XY}E\{X\mid Y\} is simulated. A common approach is to rely on the explicit form of E{XY}E\{X\mid Y\} for jointly Gaussian XX and YY, see, e.g., [1], [3], [4].

Vardi resorted to moment matching in which λ\lambda is estimated from a linear matrix equation relating the first two empirical moments of YY to the corresponding first two theoretical moments of AXAX. The approach is applicable to networks operating under deterministic as well as random routing regimes. In the latter case, there are multiple alternative paths for each source-destination pair which can be selected according to some probability law. Vardi invoked an iterative procedure for estimating λ\lambda which was previously developed for image deblurring [5]. This useful procedure will be discussed in Section II. Vardi’s work ignited extensive research in the areas of network inference and medical tomography.

In this paper we explore the benefits of supplementing Vardi’s second-order moment matching approach with higher-order empirical cumulants. Here too the cumulant matching equation is linear in λ\lambda, and the approach is applicable to networks operating under deterministic as well as random routing regimes. Higher-order cumulants introduce new useful information on the unavailable distribution of YY which can be leveraged in estimating the source-destination flow rates. By using a sufficient number of empirical cumulants, the linear mapping involved in the cumulant matching approach can achieve full column rank. In an ideal scenario where the cumulant matching equations are consistent, and the cumulants of the link traffic flows are accurately known, the rate vector can be recovered without error. As we demonstrate in Section V, this ideal situation is approachable when a sufficiently large number of realizations of YY are available.

A vast literature exists on network tomography. Several forms of network tomography have been studied in the literature depending on the type of measurement data and the network performance parameters (e.g., rate, delay) of interest: (i) source-destination path-level traffic rate estimation based on link-level traffic measurements [1, 6, 3, 4, 7, 8, 9]; (ii) link-level parameter estimation based on end-to-end path-level traffic measurements [2], [10, 11, 12, 13, 14]; (iii) network topology discovery from traffic measurements [15, 16, 17, 18]. Approaches to network tomography can also be characterized as active (e.g., [10, 11, 19, 20]), whereby explicit control traffic is injected into the network to extract measurements, or passive (e.g., [1, 3, 21, 14]), whereby the observation data is obtained from existing network traffic. A somewhat outdated survey circa 2003 can be found in [22]. A more recent survey [23] discusses network tomography in conjunction with network coding.

Closest to Vardi’s work is the contemporary work of Vanderbei and Iannone [2] which relies on a Poisson model for incoming traffic. The goal is to estimate the rate of traffic on each link connecting input and output nodes from traffic counts at the input and output nodes. Vanderbei and Iannone did not resort to moment matching but rather developed a simulation based EM algorithm for maximum likelihood estimation of the rates. A thorough Bayesian approach to Vardi’s problem was developed by Tebaldi and West [6] using Markov Chain Monte Carlo simulation. See also [9]. Another closely related work to Vardi’s problem appeared in [3], [4], [22] where maximum likelihood estimation of the source-destination rates from link data was implemented under a Gaussian rather than a Poisson traffic model. In [7], source-destination rates were estimated by utilizing spatiotemporal correlation of nominal traffic, and the fact that traffic anomalies are sparse. In [8], conditions for identifiability of higher order cumulants in estimation of source-destination traffic from link measurements were established. In [13] an algorithm was developed for choosing a set of linearly independent source-destination measurement paths from which additive link metrics are individually estimated.

The plan for this paper is as follows. In Section II we present the minimum I-divergence iterative procedure which plays an important role in this paper. In Section III we address rate estimation in networks with deterministic routing. We develop a new systematic set of cumulant matching equations for estimating the rate vector from up to the fourth-order empirical cumulant. We also develop an upper bound on the mean squared error (MSE) in least squares estimation of the rate vector from empirical cumulants. We conclude this section with a discussion on some implementation issues of the proposed cumulant matching approach. Complexity of the approach is discussed in Section III-D. In Section IV we discuss rate estimation in networks operating under a random routing regime. Numerical results for the NSFnet [24] are presented in Section V. Concluding remarks are given in Section VI. Technical details of the derivations are deferred to the Appendix.

II Minimum I-Divergence Iteration

Under the Poisson model for XX, the moment (and cumulant) matching approach results in a set of linear equations in λ\lambda. Generally speaking, the set of equations has the form of η^(Y)=Bλ\hat{\eta}(Y)=B\lambda where η^(Y)\hat{\eta}(Y) is a vector of MbM_{b} empirical moments (first and second-order in [1]) of YY, B={bij,i=1,,Mb;j=1,,L}B=\{b_{ij},i=1,\ldots,M_{b};j=1,\ldots,L\} is a constant zero-one matrix that depends on AA but not on λ\lambda, and BλB\lambda represents the corresponding theoretical moments.

To estimate λ\lambda that satisfies η^(Y)=Bλ\hat{\eta}(Y)=B\lambda, Vardi proposed to use an iterative estimation approach which originated in the study of another inverse problem concerning image deblurring [5], [25]. Let λjold\lambda_{j}^{\textrm{old}} denote a current estimate of the jjth component of λ\lambda, and let λjnew\lambda_{j}^{\textrm{new}} denote the new estimate of that component at the conclusion of the iteration. Let (Bλold)i(B\lambda^{\textrm{old}})_{i} denote the iith component of BλoldB\lambda^{\textrm{old}}. Similarly, let η^i(Y)\hat{\eta}_{i}(Y) denote the iith component of η^(Y)\hat{\eta}(Y). The iteration is given by

λjnew\displaystyle\lambda_{j}^{\textrm{new}} =λjoldi=1Mbb¯ijη^i(Y)(Bλold)i where b¯ij:=bijt=1Mbbtj,\displaystyle=\lambda_{j}^{\textrm{old}}\sum_{i=1}^{M_{b}}\bar{b}_{ij}\frac{\hat{\eta}_{i}(Y)}{(B\lambda^{\textrm{old}})_{i}}\textrm{ where }\bar{b}_{ij}:=\frac{b_{ij}}{\sum_{t=1}^{M_{b}}b_{tj}}, (1)

for j=1,,Lj=1,\ldots,L. This iteration is particularly suitable for solving positive inverse problems. It reaches a fixed point when the moment matching equation η^(Y)=Bλold\hat{\eta}(Y)=B\lambda^{\textrm{old}} is satisfied. The iteration was studied by Snyder, Schulz and O’Sullivan [25] in a similarly formulated application of image deblurring. It was shown to monotonically decrease Csiszár’s I-divergence [26] between the original image convolved with the kernel, and the observed blurred image. The procedure turned out to be an EM iteration in the positron emission tomography problem, which follows a similar formulation as network tomography, but with the crucially facilitating difference that the Poisson components of YY are now independent random variables [27], [28]. This iteration will play a central role in our cumulant matching approach.
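As an illustration, the iteration (1) can be implemented in a few lines. The following Python sketch is ours (the function and variable names are not from the paper); it assumes a strictly positive initial estimate and a matrix B with no null columns, as required for identifiability.

```python
import numpy as np

def i_divergence_iteration(B, eta_hat, lam0, num_iters=300):
    """Minimum I-divergence iteration (1) for the positive linear inverse
    problem eta_hat ~ B @ lam with lam >= 0.
    B: (Mb, L) nonnegative (here zero-one) matrix, eta_hat: (Mb,) empirical
    moments/cumulants, lam0: (L,) strictly positive initial estimate."""
    B = np.asarray(B, dtype=float)
    col_sums = B.sum(axis=0)              # sum_t b_tj, assumed nonzero
    B_bar = B / col_sums                  # column-normalized entries b_bar_ij
    lam = np.array(lam0, dtype=float)
    for _ in range(num_iters):
        ratio = eta_hat / (B @ lam)       # eta_hat_i / (B lam^old)_i
        lam = lam * (B_bar.T @ ratio)     # multiplicative update of Eq. (1)
    return lam
```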

III Cumulant Matching in Deterministic Routing

In this section we present our cumulant matching approach and its theoretical performance bound for networks operating under a deterministic routing regime. The goal is to estimate the source-destination rate vector λ\lambda from NN realizations of the link traffic flow YY where Y=AXY=AX and the routing matrix AA is known. Conditions for identifiability of λ\lambda were given in [1]. Specifically, the parameter λ\lambda is identifiable if all columns {a1,,aL}\{a_{1},\ldots,a_{L}\} of AA are distinct and none is null.

III-A Cumulant Matching

Let X¯:=XE{X}\bar{X}:=X-E\{X\} where XX is any real random vector with finite mean and possibly dependent components. The vectorized kkth order central moment of XX is given by

μk(X)=E{X¯X¯X¯k times }\displaystyle\mu_{k}(X)=E\{\underset{k\textrm{ times }}{\underbrace{\bar{X}\otimes\bar{X}\otimes\cdots\otimes\bar{X}}}\} (2)

where kk is any positive integer and \otimes denotes the Kronecker product. When the length of XX is LL, the length of the vector μk(X)\mu_{k}(X) is LkL^{k}. For Y=AXY=AX, it follows from the identity

(A1B1)(A2B2)=(A1A2)(B1B2),\displaystyle(A_{1}B_{1})\otimes(A_{2}B_{2})=(A_{1}\otimes A_{2})(B_{1}\otimes B_{2}), (3)

where A1,A2,B1,B2A_{1},A_{2},B_{1},B_{2} are matrices of suitable dimensions [29, p. 408], that

μk(Y)=(AAA)k times μk(X).\displaystyle\mu_{k}(Y)=\underset{k\textrm{ times }}{\underbrace{\left(A\otimes A\otimes\cdots\otimes A\right)}}\ \mu_{k}(X). (4)

Expressions for the cumulants of the observed process in a state-space model were developed by Swami and Mendel [30]. The process in this paper is a particular case for which simpler expressions hold. We provide an alternative derivation for our particular case in the Appendix. Let Kk(X)K_{k}(X) denote the vectorized kkth central cumulant of any real random vector XX with finite mean. For k=1,2,3k=1,2,3, Kk(X)=μk(X)K_{k}(X)=\mu_{k}(X). For k=4k=4 we have

K4(X)\displaystyle K_{4}(X) =μ4(X)μ2(X)μ2(X)\displaystyle=\mu_{4}(X)-\mu_{2}(X)\otimes\mu_{2}(X)
vec{RXXRXX+UL2(RXXRXX)}\displaystyle-\textrm{vec}\left\{R_{XX}\otimes R_{XX}+U_{L^{2}}\cdot(R_{XX}\otimes R_{XX})\right\} (5)

where RXXR_{XX} is the covariance matrix of XX and UL2U_{L^{2}} is an L2×L2L^{2}\times L^{2} permutation matrix defined in (A.70). We also have

K4(Y)=(AAAA)K4(X).\displaystyle K_{4}(Y)=(A\otimes A\otimes A\otimes A)K_{4}(X). (6)

When RXXR_{XX} is a diagonal matrix, as is the case of interest in this paper, it follows that UL2(RXXRXX)U_{L^{2}}\cdot(R_{XX}\otimes R_{XX}) is a symmetric matrix. We shall restrict our attention to cumulants of order k4k\leq 4 since estimation of higher order cumulants is not practical.

Assume next that the components {xj}\{x_{j}\} of XX are independent. For this special case of interest, the central moments and central cumulants of XX are conveniently and concisely expressed in terms of the Khatri-Rao product defined by

AA:=[a1a1,a2a2,,aLaL].\displaystyle A\odot A:=[a_{1}\otimes a_{1},a_{2}\otimes a_{2},\ldots,a_{L}\otimes a_{L}]. (7)
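As a concrete illustration, the column-wise Khatri-Rao product in (7) can be computed with NumPy as in the following sketch (our own helper, not code from the paper):

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Khatri-Rao product [a_1 (x) b_1, ..., a_L (x) b_L]
    of A (M x L) and B (P x L); the result is (M*P) x L, as in (7)."""
    M, L = A.shape
    P, L2 = B.shape
    assert L == L2, "A and B must have the same number of columns"
    # T[i, j, l] = A[i, l] * B[j, l]; flattening (i, j) gives a_l (x) b_l.
    return np.einsum('il,jl->ijl', A, B).reshape(M * P, L)

# A2 = khatri_rao(A, A)        # A (.) A
# A3 = khatri_rao(A2, A)       # A (.) A (.) A
```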

It is shown in the Appendix that

μ2(Y)\displaystyle\mu_{2}(Y) =(AA)col(E{x¯12},E{x¯22},,E{x¯L2})\displaystyle=(A\!\odot\!A)\textrm{col}\!\left(E\left\{\bar{x}_{1}^{2}\right\},E\left\{\bar{x}_{2}^{2}\right\},\ldots,E\left\{\bar{x}_{L}^{2}\right\}\right)
μ3(Y)\displaystyle\mu_{3}(Y) =(AAA)col(E{x¯13},E{x¯23},,E{x¯L3}),\displaystyle=(A\!\odot\!A\odot\!A)\textrm{col}\!\left(E\left\{\bar{x}_{1}^{3}\right\},E\left\{\bar{x}_{2}^{3}\right\},\ldots,E\left\{\bar{x}_{L}^{3}\right\}\right), (8)

and

K4(Y)\displaystyle K_{4}(Y) =(AAAA)(κ4(x¯1,x¯1,x¯1,x¯1)κ4(x¯2,x¯2,x¯2,x¯2)κ4(x¯L,x¯L,x¯L,x¯L)),\displaystyle=(A\!\odot\!A\!\odot\!A\!\odot\!A)\left(\begin{array}[]{c}\kappa_{4}(\bar{x}_{1},\bar{x}_{1},\bar{x}_{1},\bar{x}_{1})\\ \kappa_{4}(\bar{x}_{2},\bar{x}_{2},\bar{x}_{2},\bar{x}_{2})\\ \vdots\\ \kappa_{4}(\bar{x}_{L},\bar{x}_{L},\bar{x}_{L},\bar{x}_{L})\end{array}\right), (13)

where

κ4(x¯i,x¯j,x¯k,x¯l)={E{x¯i4}3(E{x¯i2})2,i=j=k=l0,otherwise.\displaystyle\kappa_{4}(\bar{x}_{i},\bar{x}_{j},\bar{x}_{k},\bar{x}_{l})=\left\{\begin{array}[]{cc}E\{\bar{x}_{i}^{4}\}-3(E\{\bar{x}_{i}^{2}\})^{2},&i\!=\!j\!=\!k\!=\!l\\ 0,&\textrm{otherwise.}\end{array}\right. (16)

When {xj}\{x_{j}\} are independent Poisson random variables, all cumulants of xjx_{j} equal its rate E{xj}=λjE\{x_{j}\}=\lambda_{j}, and we have from m(Y):=E{Y}=AE{X}m(Y):=E\{Y\}=AE\{X\} and (8)-(16) that,

(AAAAAAAAAA)λ=(m(Y)μ2(Y)μ3(Y)K4(Y)).\displaystyle\left(\begin{array}[]{c}A\\ A\odot A\\ A\odot A\odot A\\ A\odot A\odot A\odot A\end{array}\right)\lambda=\left(\begin{array}[]{c}m(Y)\\ \mu_{2}(Y)\\ \mu_{3}(Y)\\ K_{4}(Y)\end{array}\right). (25)

This is the key cumulant matching equation for r4r\leq 4. In (25), let 𝒜r\mathcal{A}_{r} denote the matrix of stacked Khatri-Rao products of AA, and let ηr(Y)\eta_{r}(Y) denote the vector of stacked cumulants. Then the rate vector λ\lambda satisfies

𝒜rλ=ηr(Y).\displaystyle\mathcal{A}_{r}\lambda=\eta_{r}(Y). (26)

To estimate λ\lambda, ηr(Y)\eta_{r}(Y) is replaced by a vector of empirical estimates η^r(Y)\hat{\eta}_{r}(Y) and λ\lambda is estimated from

𝒜rλ=η^r(Y).\displaystyle\boxed{\mathcal{A}_{r}\lambda=\hat{\eta}_{r}(Y).} (27)

In this paper we consider least squares estimation and the minimum I-divergence iterative procedure (1). The vector η^r(Y)\hat{\eta}_{r}(Y) comprises minimum variance unbiased cumulant estimates given by the KK-Statistics which we discuss in the next section [31].

Vardi’s second-order moment matching equation can be summarized as η^2(Y)=𝒜2λ\hat{\eta}_{2}(Y)=\mathcal{A}_{2}\lambda. With only two moments, the column rank of the matrix 𝒜2\mathcal{A}_{2} may be too low to provide an accurate solution. Note that if the set of equations (27) is consistent, the theoretical cumulants are known, and rr is sufficiently large so that 𝒜r\mathcal{A}_{r} has full column rank, then λ\lambda can be estimated error free as the unique solution of (27). We demonstrate in Section V for the NSFnet [24] that r=3r=3 is sufficient to achieve full column rank, and that the theoretical performance is approachable when the cumulants are estimated from a sufficiently large number of realizations of YY.
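A small sketch of how the stacked matrix of (25)-(27) may be assembled, and its column rank checked, is given below; it reuses the khatri_rao helper sketched earlier and is illustrative only.

```python
import numpy as np

def stacked_cumulant_matrix(A, r):
    """Stack A, A (.) A, ..., up to the r-fold Khatri-Rao product, as in (25)."""
    blocks, K = [], np.asarray(A, dtype=float)
    for _ in range(r):
        blocks.append(K)
        K = khatri_rao(K, A)      # next-order Khatri-Rao product
    return np.vstack(blocks)

# A_r = stacked_cumulant_matrix(A, 3)
# np.linalg.matrix_rank(A_r) == A.shape[1]   # full column rank enables exact
#                                            # recovery from exact cumulants
```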

III-B Error Analysis

In this section we assess the MSE in the least squares estimation of λ\lambda satisfying (26) from the cumulant matching equation (27). We assume for this analysis that rr is sufficiently large so that the augmented matrix 𝒜r\mathcal{A}_{r} has full column rank.

KK-Statistics of scalar processes were developed in [31, p. 281]. For vector processes and k=1,2,3k=1,2,3, let μ^k(Y)\hat{\mu}_{k}(Y) and μ^k(X)\hat{\mu}_{k}(X) denote the KK-Statistics of μk(Y)\mu_{k}(Y) and μk(X)\mu_{k}(X), respectively. Similarly, let K^4(Y)\hat{K}_{4}(Y) and K^4(X)\hat{K}_{4}(X) denote the KK-Statistics of K4(Y)K_{4}(Y) and K4(X)K_{4}(X), respectively. Let {Xn,n=1,2,,N}\{X_{n},n=1,2,\ldots,N\} denote a sequence of independent identically distributed source-destination traffic flows defined similarly to XX. Each XnX_{n} comprises LL independent Poisson random variables with rate vector E{Xn}=λE\{X_{n}\}=\lambda. Let Yn=AXnY_{n}=AX_{n}. Define

Y~n:=Ynm^(Y)\displaystyle\tilde{Y}_{n}:=Y_{n}-\hat{m}(Y) (28)

where

m^(Y):=1Nn=1NYn,\displaystyle\hat{m}(Y):=\frac{1}{N}\sum_{n=1}^{N}Y_{n}, (29)

and the empirical cumulant estimate for k>1k>1 as

μ~k(Y):=1Nn=1NY~nY~nY~nk times .\displaystyle\tilde{\mu}_{k}(Y):=\frac{1}{N}\sum_{n=1}^{N}\underset{k\textrm{ times }}{\underbrace{\tilde{Y}_{n}\otimes\tilde{Y}_{n}\otimes\cdots\otimes\tilde{Y}_{n}}}. (30)

Similar definitions follow for the XX process when YY in (28)-(30) is replaced by XX. The KK-Statistic for k=1k=1 is given by

m^(Y)=1Nn=1NAXn=Am^(X),\displaystyle\hat{m}(Y)=\frac{1}{N}\sum_{n=1}^{N}AX_{n}=A\hat{m}(X), (31)

for k=2k=2,

μ^2(Y)\displaystyle\hat{\mu}_{2}(Y) =NN1μ~2(Y)\displaystyle=\frac{N}{N-1}\tilde{\mu}_{2}(Y)
=(AA)μ^2(X),\displaystyle=(A\otimes A)\hat{\mu}_{2}(X), (32)

for k=3k=3,

μ^3(Y)\displaystyle\hat{\mu}_{3}(Y) =N2(N1)(N2)μ~3(Y)\displaystyle=\frac{N^{2}}{(N-1)(N-2)}\tilde{\mu}_{3}(Y)
=(AAA)μ^3(X)\displaystyle=(A\otimes A\otimes A)\hat{\mu}_{3}(X) (33)

and for k=4k=4,

K^4(Y)\displaystyle\hat{K}_{4}(Y) :=N2(N1)(N2)(N3)[(N+1)μ~4(Y)\displaystyle:=\frac{N^{2}}{(N-1)(N-2)(N-3)}\left[(N+1)\tilde{\mu}_{4}(Y)\right.
3(N1)μ~2(Y)μ~2(Y)]\displaystyle\hskip 64.58313pt\left.-3(N-1)\tilde{\mu}_{2}(Y)\otimes\tilde{\mu}_{2}(Y)\right]
=(AAAA)K^4(X),\displaystyle=(A\otimes A\otimes A\otimes A)\hat{K}_{4}(X), (34)

where we have used (3). Similar relations are expected to hold for higher order cumulants. For r3r\leq 3 and sufficiently large NN, the KK-Statistic μ^r(Y)\hat{\mu}_{r}(Y) and the empirical cumulant μ~r(Y)\tilde{\mu}_{r}(Y) are essentially the same.
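The K-statistics (31)-(34) can be estimated from the NN link measurement vectors along the following lines. This is a sketch under our own naming; for the large NN used later the bias-correcting factors are close to one.

```python
import numpy as np

def k_statistics(Y, r):
    """Stacked K-statistics of orders 1..r from samples Y of shape (N, M),
    following (28)-(34).  Returns the vector eta_hat_r(Y)."""
    N, M = Y.shape
    m_hat = Y.mean(axis=0)                                   # (29)
    Yc = Y - m_hat                                           # (28)
    def mu_tilde(k):                                         # (30)
        # average of the k-fold Kronecker products of the centered samples
        ops = ['ni,nj->ij', 'ni,nj,nk->ijk', 'ni,nj,nk,nl->ijkl'][k - 2]
        return np.einsum(ops, *([Yc] * k)).reshape(-1) / N
    stats = [m_hat]
    if r >= 2:
        stats.append(N / (N - 1) * mu_tilde(2))              # (32)
    if r >= 3:
        stats.append(N ** 2 / ((N - 1) * (N - 2)) * mu_tilde(3))   # (33)
    if r >= 4:
        c = N ** 2 / ((N - 1) * (N - 2) * (N - 3))
        k4 = c * ((N + 1) * mu_tilde(4)
                  - 3 * (N - 1) * np.kron(mu_tilde(2), mu_tilde(2)))  # (34)
        stats.append(k4)
    return np.concatenate(stats)
```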

Let η^r(X)\hat{\eta}_{r}(X) and η^r(Y)\hat{\eta}_{r}(Y) denote vectors of stacked KK-statistics of order smaller than or equal to rr corresponding to XX and YY, respectively. The least squares estimate of λ\lambda in (27) is given by

λ^r=(𝒜r𝒜r)1𝒜rη^r(Y).\displaystyle\hat{\lambda}_{r}=(\mathcal{A}_{r}^{*}\mathcal{A}_{r})^{-1}\mathcal{A}_{r}^{*}\ \hat{\eta}_{r}(Y). (35)

Let

ϵr(Y)=η^r(Y)ηr(Y)\displaystyle\epsilon_{r}(Y)=\hat{\eta}_{r}(Y)-\eta_{r}(Y) (36)

denote the error vector in estimating ηr(Y)\eta_{r}(Y). From (26),

η^r(Y)=𝒜rλ+ϵr(Y).\displaystyle\hat{\eta}_{r}(Y)=\mathcal{A}_{r}\lambda+\epsilon_{r}(Y). (37)

Substituting (37) into (35) yields,

λ^r\displaystyle\hat{\lambda}_{r} =(𝒜r𝒜r)1𝒜r[𝒜rλ+ϵr(Y)]\displaystyle=(\mathcal{A}_{r}^{*}\mathcal{A}_{r})^{-1}\mathcal{A}_{r}^{*}[\mathcal{A}_{r}\lambda+\epsilon_{r}(Y)]
=λ+(𝒜r𝒜r)1𝒜rϵr(Y).\displaystyle=\lambda+(\mathcal{A}_{r}^{*}\mathcal{A}_{r})^{-1}\mathcal{A}_{r}^{*}\epsilon_{r}(Y). (38)

Let Hr=(𝒜r𝒜r)1𝒜rH_{r}=(\mathcal{A}_{r}^{*}\mathcal{A}_{r})^{-1}\mathcal{A}_{r}^{*}. The rate estimation error is given by

λ^rλ=Hrϵr(Y),\displaystyle\hat{\lambda}_{r}-\lambda=H_{r}\epsilon_{r}(Y), (39)

and the MSE is given by

ϵr2¯(λ)\displaystyle\overline{\epsilon_{r}^{2}}(\lambda) =E{λ^rλ2}\displaystyle=E\left\{||\hat{\lambda}_{r}-\lambda||^{2}\right\}
=E{Hrϵr(Y)2}\displaystyle=E\left\{||H_{r}\epsilon_{r}(Y)||^{2}\right\}
ρmax(HrHr)E{ϵr(Y)2}\displaystyle\leq\rho_{\textrm{max}}(H_{r}^{*}H_{r})E\{||\epsilon_{r}(Y)||^{2}\} (40)

where the inequality follows from the Rayleigh quotient theorem [29, Theorem 8.1.4] (the Rayleigh quotient of a Hermitian matrix HH is defined as u^{\ast}Hu/u^{\ast}u where uu is a vector of suitable dimension), and ρmax(HrHr)\rho_{\textrm{max}}(H_{r}^{*}H_{r}) denotes the maximal eigenvalue of the Hermitian matrix HrHrH_{r}^{*}H_{r}. From the theory of singular value decomposition, we recall that for any matrix HrH_{r},

ρmax(HrHr)=ρmax(HrHr).\displaystyle\rho_{\textrm{max}}(H_{r}H_{r}^{*})=\rho_{\textrm{max}}(H_{r}^{*}H_{r}). (41)

Applying this to (40) gives

ϵr2¯(λ)\displaystyle\overline{\epsilon_{r}^{2}}(\lambda) ρmax(((𝒜r𝒜r)1𝒜r)((𝒜r𝒜r)1𝒜r))E{ϵr(Y)2}\displaystyle\leq\rho_{\textrm{max}}(((\mathcal{A}_{r}^{*}\mathcal{A}_{r})^{-1}\mathcal{A}_{r}^{*})((\mathcal{A}_{r}^{*}\mathcal{A}_{r})^{-1}\mathcal{A}_{r}^{*})^{*})E\{||\epsilon_{r}(Y)||^{2}\}
=ρmax((𝒜r𝒜r)1)E{ϵr(Y)2}\displaystyle=\rho_{\textrm{max}}((\mathcal{A}_{r}^{*}\mathcal{A}_{r})^{-1})E\{||\epsilon_{r}(Y)||^{2}\}
=1ρmin(𝒜r𝒜r)E{ϵr(Y)2}.\displaystyle=\frac{1}{\rho_{\textrm{min}}(\mathcal{A}_{r}^{*}\mathcal{A}_{r})}E\{||\epsilon_{r}(Y)||^{2}\}. (42)

Let r\mathcal{B}_{r} denote a block diagonal matrix with the kkth diagonal block being the kk-fold Kronecker product of AA with itself. There are rr diagonal blocks in total. Let

ϵr(X)=η^r(X)ηr(X).\displaystyle\epsilon_{r}(X)=\hat{\eta}_{r}(X)-\eta_{r}(X). (43)

From (36), (31)-(34), and (25),

ϵr(Y)=rϵr(X)\displaystyle\epsilon_{r}(Y)=\mathcal{B}_{r}\epsilon_{r}(X) (44)

It follows from the Rayleigh quotient [29, Theorem 8.1.4] that

ϵr(Y)2r2ϵr(X)2\displaystyle||\epsilon_{r}(Y)||^{2}\leq||\mathcal{B}_{r}||^{2}\cdot||\epsilon_{r}(X)||^{2} (45)

where r||\mathcal{B}_{r}|| equals the maximal singular value of r\mathcal{B}_{r}, and

r2=ρmax(rr).\displaystyle||\mathcal{B}_{r}||^{2}=\rho_{\textrm{max}}(\mathcal{B}_{r}^{*}\mathcal{B}_{r}). (46)

The kkth diagonal block of rr\mathcal{B}_{r}^{*}\mathcal{B}_{r} is given by

(AA\displaystyle(A\otimes A\otimes\cdots A)(AAA)\displaystyle\otimes A)^{*}(A\otimes A\otimes\cdots\otimes A)
=AAAAAA\displaystyle=A^{*}A\otimes A^{*}A\otimes\cdots\otimes A^{*}A
=:(AA)k.\displaystyle=:(A^{*}A)^{\otimes k}. (47)

Since the eigenvalues of the Kronecker product of two matrices equal the Kronecker product of the vectors of eigenvalues of each matrix [29, p. 412], we conclude that the eigenvalues of (AA)p(A^{*}A)^{\otimes p} for any positive integer pp are all possible products of length pp of the eigenvalues of AAA^{*}A. In particular, since AAA^{*}A is positive semi-definite,

ρmax((AA)p)=[ρmax(AA)]p.\displaystyle\rho_{\textrm{max}}((A^{*}A)^{\otimes p})=[\rho_{\textrm{max}}(A^{*}A)]^{p}. (48)

Furthermore, since rr\mathcal{B}_{r}^{*}\mathcal{B}_{r} is block diagonal with blocks of increasing size, its eigenvalues are given by the union of the sets of eigenvalues associated with each block. Thus,

ρmax(rr)=max1pr[ρmax(AA)]p.\displaystyle\rho_{\textrm{max}}(\mathcal{B}_{r}^{*}\mathcal{B}_{r})=\max_{1\leq p\leq r}[\rho_{\textrm{max}}(A^{*}A)]^{p}. (49)

When ρmax(AA)>1\rho_{\textrm{max}}(A^{*}A)>1, the maximum in (49) is achieved for p=rp=r. Otherwise, when ρmax(AA)<1\rho_{\textrm{max}}(A^{*}A)<1, the maximum in (49) is achieved for p=1p=1. Hence, from (45),

E{ϵr(Y)2}\displaystyle E\{||\epsilon_{r}(Y)||^{2}\} max{ρmax(AA),[ρmax(AA)]r}\displaystyle\leq\max\left\{\rho_{\textrm{max}}(A^{*}A),[\rho_{\textrm{max}}(A^{*}A)]^{r}\right\}
E{ϵr(X)2}.\displaystyle\hskip 86.11084pt\cdot E\left\{||\epsilon_{r}(X)||^{2}\right\}. (50)

From (42) and (50) we obtain,

ϵr2¯(λ)\displaystyle\overline{\epsilon_{r}^{2}}(\lambda) max{ρmax(AA),[ρmax(AA)]r}ρmin(𝒜r𝒜r)E{ϵr(X)2}.\displaystyle\leq\frac{\max\left\{\rho_{\textrm{max}}(A^{*}A),[\rho_{\textrm{max}}(A^{*}A)]^{r}\right\}}{\rho_{\textrm{min}}(\mathcal{A}_{r}^{*}\mathcal{A}_{r})}E\{||\epsilon_{r}(X)||^{2}\}. (51)

Evaluating the MSE E{ϵr(X)2}E\{||\epsilon_{r}(X)||^{2}\} is quite involved. For r=2,3r=2,3, and sufficiently large NN, the cross terms in μ^r(X)\hat{\mu}_{r}(X) will be approximately zero, and hence

E{ϵr(X)2}l=1LE{ϵr(xl)2}\displaystyle E\{||\epsilon_{r}(X)||^{2}\}\approx\sum_{l=1}^{L}E\{||\epsilon_{r}(x_{l})||^{2}\} (52)

where E{ϵr(xl)2}E\{||\epsilon_{r}(x_{l})||^{2}\} is the MSE associated with the KK-statistics of the llth component of XX. It is also the variance of μ^r(xl)\hat{\mu}_{r}(x_{l}). Since the {Xn}\{X_{n}\} vectors are assumed independent, we can infer E{ϵr(xl)2}E\{||\epsilon_{r}(x_{l})||^{2}\} from the variance of the KK-statistics of an IID sequence of scalar random variables given by [31, p. 291],

var (μ^2(xl))=1NK4(xl)+1N12μ22(xl)\displaystyle\left(\hat{\mu}_{2}(x_{l})\right)=\frac{1}{N}K_{4}(x_{l})+\frac{1}{N-1}2\mu_{2}^{2}(x_{l})
var (μ^3(xl))=1NK6(xl)+9N1K4(xl)μ2(xl)\displaystyle\left(\hat{\mu}_{3}(x_{l})\right)=\frac{1}{N}K_{6}(x_{l})+\frac{9}{N-1}K_{4}(x_{l})\mu_{2}(x_{l})
+9N1μ32(xl)+6N(N1)(N2)μ23(xl).\displaystyle\hskip 8.61108pt+\frac{9}{N-1}\mu_{3}^{2}(x_{l})+\frac{6N}{(N-1)(N-2)}\mu_{2}^{3}(x_{l}). (53)

When the IID random variables are Poisson with E{Xl}=λlE\{X_{l}\}=\lambda_{l}, all cumulants equal λl\lambda_{l}, and hence,

E{ϵ2(xl)2}\displaystyle E\{||\epsilon_{2}(x_{l})||^{2}\} =var(μ^2(xl))=1Nλl+1N12λl2\displaystyle=\textrm{var}\left(\hat{\mu}_{2}(x_{l})\right)=\frac{1}{N}\lambda_{l}+\frac{1}{N-1}2\lambda_{l}^{2}
1N(λl+2λl2).\displaystyle\approx\frac{1}{N}(\lambda_{l}+2\lambda_{l}^{2}). (54)
E\displaystyle E {ϵ3(xl)2}=var(μ^3(xl))=1Nλl+9N12λl2\displaystyle\{||\epsilon_{3}(x_{l})||^{2}\}=\textrm{var}\left(\hat{\mu}_{3}(x_{l})\right)=\frac{1}{N}\lambda_{l}+\frac{9}{N-1}2\lambda_{l}^{2}
+6N(N1)(N2)λl31N(λl+18λl2+6λl3).\displaystyle~{}~{}~{}+\frac{6N}{(N-1)(N-2)}\lambda_{l}^{3}\approx\frac{1}{N}(\lambda_{l}+18\lambda_{l}^{2}+6\lambda_{l}^{3}). (55)

The upper bound on ϵr2¯(λ)\overline{\epsilon_{r}^{2}}(\lambda) for r=2,3r=2,3, follows from (51), (52), (54) and (55).

The bound (51) is loose when the condition number of the matrix 𝒜r𝒜r\mathcal{A}_{r}^{*}\mathcal{A}_{r} is large. It does have qualitative value as it shows that estimation of moments becomes increasingly harder as the order increases. Since usually λl>1\lambda_{l}>1 in network tomography, the error in estimating μ3(xl)\mu_{3}(x_{l}) is much larger than that in estimating μ2(xl)\mu_{2}(x_{l}).
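The quantities entering the bound (51) are easy to evaluate numerically once the routing matrix and the reduced stacked matrix are in hand. The following sketch is our own (it assumes the stacked matrix has full column rank) and takes as input an estimate of E{||ϵr(X)||2}E\{||\epsilon_{r}(X)||^{2}\} obtained via (52)-(55).

```python
import numpy as np

def mse_upper_bound(A, A_r, r, eps_X_sq):
    """Right-hand side of (51).  A: routing matrix, A_r: reduced stacked
    matrix with full column rank, eps_X_sq: estimate of E{||eps_r(X)||^2}."""
    A = np.asarray(A, dtype=float)
    A_r = np.asarray(A_r, dtype=float)
    rho_max = np.linalg.eigvalsh(A.T @ A).max()
    rho_min = np.linalg.eigvalsh(A_r.T @ A_r).min()
    return max(rho_max, rho_max ** r) / rho_min * eps_X_sq

# For Poisson rates lam, (54)-(55) give per-component contributions of
# roughly (lam + 2*lam**2) / N at order 2 and
# (lam + 18*lam**2 + 6*lam**3) / N at order 3.
```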

III-C Implementation

In this section we address several aspects related to the implementation of the cumulant matching approach. The Khatri-Rao product can be expressed in terms of the rows of AA instead of its columns as in (7). Let αi\alpha_{i} denote the iith row of AA, i=1,,Mi=1,\ldots,M. Then A=stack{αi:i=1,,M}A=\mbox{stack}\{\alpha_{i}:i=1,\ldots,M\} where stack refers to the stacking of row vectors of equal length to form a matrix. We have

AA\displaystyle A\odot A =stack{αiαj;i,j=1,,M}\displaystyle=\mbox{stack}\left\{\alpha_{i}\circ\alpha_{j};\ i,j\!=\!1,\ldots,M\right\}
AAA\displaystyle A\odot A\!\odot\!A =stack{αiαjαk;i,j,k=1,,M},\displaystyle=\mbox{stack}\left\{\alpha_{i}\circ\alpha_{j}\circ\alpha_{k};\ i,j,k\!=\!1,\ldots,M\right\}, (56)

etc., where \circ denotes the Schur-Hadamard product and lexicographic ordering of the indices (i,j,k)(i,j,k) is assumed. Using this formulation, it is easy to see that 𝒜r\mathcal{A}_{r} contains duplicate rows. For example, αi\alpha_{i} in AA and αiαi\alpha_{i}\circ\alpha_{i} in AAA\odot A are duplicates since the elements of AA belong to {0,1}\{0,1\}. Additionally, 𝒜r\mathcal{A}_{r} may contain null rows as is easy to see from (7). Thus, equations in (25) corresponding to duplicate and null rows in 𝒜r\mathcal{A}_{r} can be removed. We shall always opt to remove duplicate rows that correspond to higher-order cumulants rather than to low-order cumulants since higher-order cumulants are harder to estimate. To simplify notation we shall henceforth assume that 𝒜r\mathcal{A}_{r} and the right hand side (RHS) vector ηr(Y)\eta_{r}(Y) of (25) are given in their reduced form. Similarly, we shall consider (26) and (27) as given in their reduced form.
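A minimal sketch of this reduction step (again with our own naming) removes null rows and keeps only the first, i.e., lowest-order, occurrence of every distinct row of the stacked matrix, dropping the matching entries of the right hand side:

```python
import numpy as np

def reduce_equations(A_r, rhs):
    """Drop null and duplicate rows of the stacked matrix A_r together with
    the corresponding entries of rhs.  Because A_r stacks the orders from
    low to high, keeping the first occurrence retains the low-order rows."""
    keep, seen = [], set()
    for i, row in enumerate(np.asarray(A_r)):
        key = tuple(row)
        if not row.any() or key in seen:     # null row or duplicate
            continue
        seen.add(key)
        keep.append(i)
    keep = np.array(keep)
    return A_r[keep], rhs[keep]
```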

It is interesting to compare the number of rows in the original and reduced 𝒜r\mathcal{A}_{r}. The number of rows in the original 𝒜r\mathcal{A}_{r} is given by

n¯r(M)=M+M2+M3++Mr=M(Mr1)M1.\displaystyle\bar{n}_{r}(M)=M+M^{2}+M^{3}+\ldots+M^{r}=\frac{M(M^{r}\!-\!1)}{M-1}. (57)

The maximum number of distinct rows in the reduced 𝒜r\mathcal{A}_{r} is counted as the sum of the maximum number of distinct rows contributed individually by AA, AAA\circ A, AAAA\circ A\circ A, etc., with the last contribution from the rr-fold Khatri-Rao product of AA with itself. The contribution from the iith term is given by (Mi){M\choose i}, i=1,,ri=1,\ldots,r, which represents the number of unordered combinations of ii rows chosen without replacement from the given MM rows. Thus, the maximum number of distinct rows in the reduced 𝒜r\mathcal{A}_{r} is given by

nr(M)=(M1)+(M2)+(M3)++(Mr).\displaystyle n_{r}(M)={M\choose 1}\!+\!{M\choose 2}\!+\!{M\choose 3}+\ldots+{M\choose r}. (58)

For example, consider a network with a 4×154\times 15 routing matrix AA whose columns comprise all lexicographically ordered non-zero binary 44-tuples. Here, M=4M=4, and n¯r(M)\bar{n}_{r}(M) and nr(M)n_{r}(M) are shown in Table I. Note that for this example, the number of reduced equations coincides with the row rank of 𝒜r\mathcal{A}_{r} for each rr. Furthermore, full column rank is achieved only when r=4r=4.

rr: 1 2 3 4
n¯r(M)\bar{n}_{r}(M): 4 20 64 320
nr(M)n_{r}(M): 4 10 14 15
row rank of 𝒜r\mathcal{A}_{r}: 4 10 14 15
TABLE I: Number of unreduced (n¯r(M)\bar{n}_{r}(M)) and reduced (nr(M)n_{r}(M)) cumulant matching equations, and row rank of 𝒜r\mathcal{A}_{r}, for the 4×154\times 15 routing matrix of all non-zero binary 44-tuples.
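The counts (57)-(58) are straightforward to reproduce; a short sketch for the example above:

```python
from math import comb

def n_bar(M, r):
    """Number of rows of the unreduced stacked matrix, Eq. (57)."""
    return sum(M ** k for k in range(1, r + 1))

def n_reduced(M, r):
    """Maximum number of distinct rows after reduction, Eq. (58)."""
    return sum(comb(M, k) for k in range(1, r + 1))

# n_reduced(4, r) for r = 1, ..., 4 gives 4, 10, 14, 15, as in Table I.
```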

We study two estimates of λ\lambda that approximate the cumulant matching equation (27). The first estimate follows from the iteration (1) which is initialized by a constant vector. The second estimate follows from least squares. Specifically, we use the unique Tikhonov regularized least squares solution for the inconsistent set of equations (27), when 𝒜r\mathcal{A}_{r} is not necessarily full column rank. This estimator is given by [32, p. 51]

λ^r=(𝒜r𝒜r+γI)1𝒜rη^r(Y)\displaystyle\hat{\lambda}_{r}=(\mathcal{A}_{r}^{*}\mathcal{A}_{r}+\gamma I)^{-1}\mathcal{A}_{r}^{*}\ \hat{\eta}_{r}(Y) (59)

for some γ>0\gamma>0. Note that the regularized estimator applies to a skinny as well as a fat matrix 𝒜r\mathcal{A}_{r}.

To mitigate the effects of the error introduced by the empirical cumulant estimates, while allowing the estimator of λ\lambda to benefit from the higher order cumulants, the relative weights of the third and fourth order cumulants compared to the first and second order moments can be reduced. This can be done by multiplying all equations in (59) with rows originating from 𝒜3\mathcal{A}_{3} by some 0<ϵ3<10<\epsilon_{3}<1, and all equations with rows originating from 𝒜4\mathcal{A}_{4} by some 0<ϵ4<10<\epsilon_{4}<1. This regularization approach was advocated by Vardi [1] in the context of his second-order moment matching approach. Following this approach, let the reduced and ϵ\epsilon-weighted matrix 𝒜r\mathcal{A}_{r} be denoted by 𝒜r,ϵ\mathcal{A}_{r,\epsilon}, and let the reduced and ϵ\epsilon-weighted vector η^r(Y)\hat{\eta}_{r}(Y) be denoted by η^r,ϵ(Y)\hat{\eta}_{r,\epsilon}(Y). Then, from (59), the rate vector λ\lambda is estimated from

λ^r,ϵ=(𝒜r,ϵ𝒜r,ϵ+γI)1𝒜r,ϵη^r,ϵ(Y).\displaystyle\hat{\lambda}_{r,\epsilon}=(\mathcal{A}_{r,\epsilon}^{*}\mathcal{A}_{r,\epsilon}+\gamma I)^{-1}\mathcal{A}_{r,\epsilon}^{*}\ \hat{\eta}_{r,\epsilon}(Y). (60)

Note that the estimator (60) is not guaranteed to be non-negative. A non-negative estimate of λ\lambda can be obtained by using non-negative least squares optimization [33]. In our numeric examples we have arbitrarily substituted negative estimates with the value of .005.005. This approach resulted in substantially lower MSE compared to using the constrained optimization algorithm of [33, p. 161]. The performance of the algorithm remained essentially the same when other small values (e.g., .1.1 or .2.2) were used since occurrences of negative estimates are infrequent at our working point. See Table V.
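The weighted and regularized least squares estimator (60), together with the substitution of the rare negative estimates by a small constant, can be sketched as follows. The function is ours and the default constants are illustrative (the weights and the regularization factor actually used are reported in Section V).

```python
import numpy as np

def weighted_regularized_ls(A_r, eta_hat, row_order,
                            eps3=0.01, eps4=0.01, gamma=5e-4, neg_value=0.005):
    """Estimate lambda from (60).  row_order[i] is the cumulant order of
    equation i; rows of order 3 and 4 are down-weighted by eps3 and eps4."""
    row_order = np.asarray(row_order)
    w = np.ones(len(row_order))
    w[row_order == 3] = eps3
    w[row_order == 4] = eps4
    Aw = w[:, None] * np.asarray(A_r, dtype=float)   # epsilon-weighted rows
    bw = w * eta_hat                                 # matching weighted cumulants
    L = Aw.shape[1]
    lam = np.linalg.solve(Aw.T @ Aw + gamma * np.eye(L), Aw.T @ bw)
    lam[lam < 0] = neg_value                         # replace negative estimates
    return lam
```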

III-D Complexity

The computational effort in the cumulant matching approach consists of the effort to construct and solve the set of equations (27). The number of equations in this set is nr(M)n_{r}(M). Construction of the left hand side of (27) requires (M2+M3++Mr)L(M^{2}+M^{3}+\ldots+M^{r})L operations. Construction of the right hand side of (27) requires (M2+M3++Mr)N(M^{2}+M^{3}+\ldots+M^{r})N operations where NN is the number of vectors used to estimate each cumulant. Solving the equations requires effort that depends only on nr(M)n_{r}(M), MM and the number of iterations in (1). The combined effort is dominated by (M2+M3++Mr)N(M^{2}+M^{3}+\ldots+M^{r})N since NN must be large to produce meaningful cumulant estimates. Thus the computational effort of the cumulant matching approach is approximately linear in NN when NN is large, which is always the case.

IV Random Routing

In a typical network, there are multiple paths connecting every source node with every destination node. When the network operates under a deterministic routing regime, a single fixed path is used for every source-destination pair. When the network operates under a random routing regime, a path from the source node to the destination node is selected according to some probability law. Vardi assumed Markovian routing for the traffic of each source-destination pair. The accessible nodes and links for the given pair are represented by the states and transition probabilities of the Markov chain, respectively. The transition probabilities of the Markov chain determine the probability of each path with the same source-destination address.

Tebaldi and West argued that random routing can be viewed as deterministic routing in a super-network in which all possible paths for each source-destination address are listed [6]. This approach results in an expanded zero-one routing matrix with each column in the original routing matrix replaced by multiple columns representing the feasible paths for the source-destination pair. The Poisson traffic flow on a source-destination pair is thinned into multiple Poisson traffic flows with the multinomial thinning probabilities being the probabilities of the paths with the same source-destination address. Thus, the super network and the original network operate under similar statistical (Poisson) models. The thinned Poisson rates can now be estimated as was originally done, e.g., using cumulant matching, and each source-destination rate estimate can be obtained from the sum of the thinned rate estimates in that source-destination pair.

V Numerical Example

In this section we demonstrate the performance of our approach and the gain realized by using higher-order empirical cumulants (our experiments were run on ARGO, a research computing cluster provided by the Office of Research Computing at George Mason University, VA; URL: http://orc.gmu.edu). We study the NSFnet [24] whose topology is shown in Fig. 1. The network consists of 14 nodes and 21 bidirectional links. Hence, it contains L=1413/2=91L=14\cdot 13/2=91 source-destination pairs.


Figure 1: NSFnet topology with link weights as in [24, Fig. 4].

This size network may represent a private network, a transportation network or a subset of interest of a large-scale network. The link weights in Fig. 1 are exclusively used to determine k1k\geq 1 shortest paths for each source-destination pair. Otherwise, they play no role in the traffic rate estimation problem. To determine the kk shortest paths between a given source-destination pair, we used the shortest simple paths function from the NetworkX Python library, which is based on the algorithm of Yen [34]. When k=1k=1, the number of source-destination paths equals the number of source-destination pairs and the routing matrix AA is a 21×9121\times 91 matrix. The augmented routing matrix 𝒜r\mathcal{A}_{r} achieves full column rank when r=2r=2.
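A sketch of how such a routing matrix can be built with NetworkX (using shortest_simple_paths, which implements Yen's algorithm) is given below; it is our own illustrative code, not the experiment scripts.

```python
import itertools
import numpy as np
import networkx as nx

def routing_matrix(G, k=1, weight="weight"):
    """0-1 routing matrix whose columns are the k shortest simple paths of
    every source-destination pair of the undirected weighted graph G."""
    links = list(G.edges())
    link_index = {frozenset(e): i for i, e in enumerate(links)}
    columns = []
    for s, t in itertools.combinations(G.nodes(), 2):
        k_paths = itertools.islice(
            nx.shortest_simple_paths(G, s, t, weight=weight), k)
        for path in k_paths:
            col = np.zeros(len(links))
            for u, v in zip(path[:-1], path[1:]):
                col[link_index[frozenset((u, v))]] = 1.0
            columns.append(col)
    return np.column_stack(columns)

# For the 14-node, 21-link NSFnet, k=1 gives a 21 x 91 matrix and
# k=2 a 21 x 182 matrix, as described above.
```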

With k2k\geq 2, we can assign multiple paths to each source-destination pair and treat them as distinguishable new source-destination pairs. The routing matrix AA thus becomes fatter and using higher-order empirical cumulants becomes beneficial. For example, when k=2k=2, we have L=182L=182 source-destination paths, the column rank of 𝒜2\mathcal{A}_{2} is 162162 and 𝒜3\mathcal{A}_{3} has full column rank. Thus, using this example we focus on third-order cumulant matching.

The network with k=2k=2 and a 21×18221\times 182 routing matrix AA could also be seen as a super-network in the Tebaldi-West sense [6] for a network with L=91L=91 source-destination pairs operating under a random routing regime with two possible paths per each source-destination pair. The accuracy of the rate estimation for the random routing network is determined by the accuracy of the rate estimation in the deterministic routing super-network. Thus, it suffices to focus on rate estimation in the deterministic routing network with the 21×18221\times 182 routing matrix AA.

The arrival rates {λj}\{\lambda_{j}\} in our experiment were generated randomly from the interval [0,4][0,4]. In each of T=500T=500 simulation runs, a rate vector λ\lambda was generated, and was subsequently used in generating NN statistically independent identically distributed Poisson vectors {Xn}\{X_{n}\} which were transformed into the vectors {Yn=AXn}\{Y_{n}=AX_{n}\} using the assumed known routing matrix AA. The NN statistically independent identically distributed Poisson vectors {Yn}\{Y_{n}\} were used to generate the empirical cumulants using (29) when r=1r=1 and (30) when r=2,3r=2,3. We experimented with NN in the range of 10 00010\,000 to 500 000500\,000. The MSE in estimating the cumulants for the NSFnet with M=21M=21 are given in Table II. Clearly, the MSE decreases monotonically with NN for r=1,2,3r=1,2,3.

The empirical cumulants were used to estimate the rate vector in the current run. For the cumulants regularization we have used ϵ2=1\epsilon_{2}=1 and ϵ3=.01\epsilon_{3}=.01. Let λt(i)\lambda_{t}(i) and λ^t(i)\hat{\lambda}_{t}(i) denote, respectively, the iith component of λ\lambda and its estimate at the ttth run where i=1,,Li=1,\ldots,L and t=1,,Tt=1,\ldots,T. For each estimate we evaluated the normalized MSE defined by

ξi2=1Tt=1T(λt(i)λ^t(i))21Tt=1T(λt(i))2\displaystyle\xi_{i}^{2}=\frac{\frac{1}{T}\sum_{t=1}^{T}(\lambda_{t}(i)-\hat{\lambda}_{t}(i))^{2}}{\frac{1}{T}\sum_{t=1}^{T}(\lambda_{t}(i))^{2}} (61)

and the averaged normalized MSE defined by

ξ2¯=1Li=1Lξi2.\displaystyle\overline{\xi^{2}}=\frac{1}{L}\sum_{i=1}^{L}\xi_{i}^{2}. (62)

The MSE in estimating λi\lambda_{i} is approximately ξi2E{λ2(i)}\xi_{i}^{2}\cdot E\{\lambda^{2}(i)\} when TT is sufficiently large.
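The per-component and averaged normalized MSE of (61)-(62) amount to the following computation (a small sketch with our own array layout):

```python
import numpy as np

def normalized_mse(lam_true, lam_hat):
    """Per-component normalized MSE (61) and its average (62).
    lam_true and lam_hat have shape (T, L): one row per simulation run."""
    xi2 = np.mean((lam_true - lam_hat) ** 2, axis=0) / np.mean(lam_true ** 2, axis=0)
    return xi2, xi2.mean()
```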

Two rate estimators were used, the iterative estimator (1) and the least squares estimator (60). The iteration was initialized uniformly with all rates set to .1.1. It was terminated after 300300 iterations. The least squares regularization factor was set to γ=.0005\gamma=.0005.

NN r=1r=1 r=2r=2 r=3r=3
10 000: 0.0048 0.2578 14.9097
20 000: 0.0025 0.1285 7.4569
50 000: 0.0010 0.0515 2.9978
100 000: 0.0005 0.0257 1.4989
200 000: 0.0002 0.0131 0.7429
500 000: 0.0001 0.0052 0.2993
TABLE II: MSE in cumulant estimation for the NSFnet with M=21M=21 links.
NN r=1r=1 r=2r=2 r=3r=3
10 000: 0.2292 0.0993 0.1008
20 000: 0.2292 0.0701 0.0694
50 000: 0.2292 0.0502 0.0467
100 000: 0.2292 0.0434 0.0381
200 000: 0.2292 0.0399 0.0335
500 000: 0.2292 0.0372 0.0298
Theoretical: 0.2292 0.0353 0.0015
Rank(𝒜r)(\mathcal{A}_{r}): 21 162 182
TABLE III: Averaged normalized MSE ξ2¯\overline{\xi^{2}} in L=182L=182 source-destination path rate estimation of the NSFnet using the Iteration (1).
NN r=1r=1 r=2r=2 r=3r=3
10 000: 0.3037 0.0951 0.0950
20 000: 0.3037 0.0637 0.0625
50 000: 0.3037 0.0430 0.0407
100 000: 0.3037 0.0358 0.0329
200 000: 0.3037 0.0321 0.0288
500 000: 0.3037 0.0293 0.0257
Theoretical: 0.3037 0.0275 0.0000
Rank(𝒜r)(\mathcal{A}_{r}): 21 162 182
TABLE IV: Averaged normalized MSE ξ2¯\overline{\xi^{2}} in L=182L=182 source-destination path rate estimation of the NSFnet using the Least Squares Estimator (60).
NN r=1r=1 r=2r=2 r=3r=3
10 000: 0.1890 4.1747 4.2275
20 000: 0.1868 2.9813 3.0626
50 000: 0.1890 1.9879 1.9901
100 000: 0.1890 1.6385 1.5780
200 000: 0.1890 1.3747 1.3242
500 000: 0.1890 1.1956 1.1165
Theoretical: 0.1890 0.9692 0.0022
TABLE V: Percent of negative rate estimates in Table IV.

The performance of the iterative estimator (1) is demonstrated in Table III. The table shows the averaged normalized MSE ξ2¯\overline{\xi^{2}} in estimating the path rates using r=1,2,3r=1,2,3 empirical cumulants estimated from NN vectors {Yn}\{Y_{n}\} with NN ranging from N=10 000N=10\,000 to N=500 000N=500\,000. The averaged normalized MSE ξ2¯\overline{\xi^{2}} in estimating the path rates using r=1,2,3r=1,2,3 theoretical cumulants is shown as well. As can be observed in the table, ξ2¯\overline{\xi^{2}} monotonically decreases with NN for r=2,3r=2,3. For a given NN, the performance using r=3r=3 cumulants is better than using r=2r=2 moments, except for N=10 000N=10\,000 where the difference is only .0005.0005. Similar behavior can be seen in Table IV when the least squares estimator (60) is used for the path rate estimation. The percentage of negative rate estimates observed with the least squares estimator is shown in Table V. This percentage is negligible for all values of rr and NN and does not exceed 5%5\%. Comparing the two estimators (1) and (60), the least squares estimator appears superior as it consistently provides lower ξ2¯\overline{\xi^{2}} at the expense of negligible percentage of negative rate estimates.

To assess the effectiveness of the third-order cumulant matching approach and compare it with Vardi’s second-order moment matching approach, we contrast the performance of the former approach with the best performance of the latter approach as obtained by using the theoretical rather than the empirical moments. Let ξ2¯(r,N)\overline{\xi^{2}}(r,N) denote the averaged normalized MSE from (62) obtained when λ\lambda is estimated using rr empirical cumulants that were estimated from NN realizations of the vector YY. Let ξ2¯(r,)\overline{\xi^{2}}(r,\infty) denote the averaged normalized MSE ξ2¯\overline{\xi^{2}} from (62) obtained by using rr theoretical cumulants, or effectively by using N=N=\infty. Table VI shows ξ2¯(2,N)/ξ2¯(2,)\overline{\xi^{2}}(2,N)/\overline{\xi^{2}}(2,\infty) and ξ2¯(3,N)/ξ2¯(2,)\overline{\xi^{2}}(3,N)/\overline{\xi^{2}}(2,\infty) for N=200 000N=200\,000 and the estimators (1) and (60) discussed in Tables IIIIV.

Estimator: λ^\hat{\lambda} [Eq. (1)], λ^\hat{\lambda} [Eq. (60)]
ξ2¯(2,N)/ξ2¯(2,){\overline{\xi^{2}}(2,N)}/{\overline{\xi^{2}}(2,\infty)}: 1.1303 1.1673
ξ2¯(3,N)/ξ2¯(2,){\overline{\xi^{2}}(3,N)}/{\overline{\xi^{2}}(2,\infty)}: 0.9490 1.0473
TABLE VI: Relative averaged normalized MSE ξ2¯\overline{\xi^{2}} in second and third order cumulant matching with N=200 000N=200\,000, to minimum second-order averaged normalized MSE.

Table VI shows the relative averaged normalized MSE in second and third-order cumulant matching compared to the minimum averaged normalized MSE in theoretical second-order moment matching as obtained by using the true second-order moments. The reduction in the relative averaged normalized MSE obtained when the estimator (1) was used was from 1.13031.1303 to 0.94900.9490, and from 1.16731.1673 to 1.04731.0473 when the estimator (60) was used. The third-order cumulants in this experiment were estimated using N=200 000N=200\,000. This represents a reduction of about 12%18%12\%-18\%.


Figure 2: Individual rate estimation performance as measured by the ratio ξi2(3,N)/ξi2(2,)\xi_{i}^{2}(3,N)/\xi_{i}^{2}(2,\infty) for i=1,,182i=1,\ldots,182. The rate vector λ\lambda was estimated using the Iteration (1) and N=200 000N=200\,000.

To show the accuracy in estimating individual path rates, let ξi2(r,N)\xi_{i}^{2}(r,N) denote the normalized MSE ξi2\xi_{i}^{2} from (61) when estimating the iith component of λ\lambda using rr empirical cumulants estimated from NN realizations of YY. Figure 2 shows ξi2(3,N)/ξi2(2,)\xi_{i}^{2}(3,N)/\xi_{i}^{2}(2,\infty) for i=1,,182i=1,\ldots,182 when estimation was done using the Iteration (1) and N=200 000N=200\,000. Excluding outliers, the figure shows that ξi2(3,N)\xi_{i}^{2}(3,N) equals about .81.1.8-1.1 of ξi2(2,)\xi_{i}^{2}(2,\infty). The outliers correspond to rate components for which ξi2(2,)\xi_{i}^{2}(2,\infty) is very small so that the ratio ξi2(3,N)/ξi2(2,)\xi_{i}^{2}(3,N)/\xi_{i}^{2}(2,\infty) is rather large.

VI Concluding Remarks

We have developed a higher-order cumulant matching framework for estimating the rates of source-destination Poisson traffic flows from link traffic flows. The approach is equally applicable to networks operating under deterministic as well as random routing strategies. Under independent Poisson source-destination traffic flows, the approach boils down to a set of linear equations relating empirical cumulants of the link measurements to the rate vector λ\lambda via a matrix involving Khatri-Rao products of the routing matrix. We studied an iterative minimum I-divergence approach and least squares estimation of the rate vector from the cumulant matching equations. We have established an upper bound on the MSE in least squares estimation of λ\lambda. The bound is useful for full column rank matrices 𝒜r\mathcal{A}_{r} with small condition number. We demonstrated the performance of the cumulant matching approach and compared it with Vardi’s best second-order moment matching on the NSFnet. We demonstrated that supplementing Vardi’s approach with the third-order empirical cumulant reduces its minimum averaged normalized MSE in rate estimation by about 12\%-18\%. The minimum averaged normalized MSE is obtained when the theoretical rather than the empirical moments are used in Vardi’s second-order approach. The computational complexity of the approach is linear in the number NN of link traffic flow realizations used to estimate the cumulants.

Appendix A Cumulants of Linear Maps

Let eie_{i} be an L×1L\times 1 vector with a 11 in the iith component and zeros elsewhere. The vector X¯\bar{X} can be written as X¯=i=1Lx¯iei\bar{X}=\sum_{i=1}^{L}\bar{x}_{i}e_{i}. Substituting in (2), and applying the distributive property of the Kronecker product, we obtain,

μ4(X)\displaystyle\mu_{4}(X) =i,j,k,lE{x¯ix¯jx¯kx¯l}(eiejekel).\displaystyle=\sum_{i,j,k,l}E\left\{\bar{x}_{i}\bar{x}_{j}\bar{x}_{k}\bar{x}_{l}\right\}(e_{i}\otimes e_{j}\otimes e_{k}\otimes e_{l}). (A.63)

Similarly, the fourth order cumulant of XX is given by

K4(X)\displaystyle K_{4}(X) =i,j,k,lκ4(x¯i,x¯j,x¯k,x¯l)(eiejekel)\displaystyle=\sum_{i,j,k,l}\kappa_{4}(\bar{x}_{i},\bar{x}_{j},\bar{x}_{k},\bar{x}_{l})(e_{i}\otimes e_{j}\otimes e_{k}\otimes e_{l}) (A.64)

where

κ4\displaystyle\kappa_{4} (x¯i,x¯j,x¯k,x¯l)=E{x¯ix¯jx¯kx¯l}E{x¯ix¯j}E{x¯kx¯l}\displaystyle(\bar{x}_{i},\bar{x}_{j},\bar{x}_{k},\bar{x}_{l})=E\{\bar{x}_{i}\bar{x}_{j}\bar{x}_{k}\bar{x}_{l}\}-E\{\bar{x}_{i}\bar{x}_{j}\}E\{\bar{x}_{k}\bar{x}_{l}\}
E{x¯ix¯k}E{x¯jx¯l}E{x¯ix¯l}E{x¯jx¯k}.\displaystyle-E\{\bar{x}_{i}\bar{x}_{k}\}E\{\bar{x}_{j}\bar{x}_{l}\}-E\{\bar{x}_{i}\bar{x}_{l}\}E\{\bar{x}_{j}\bar{x}_{k}\}. (A.65)

Substituting (A.65) into (A.64) we obtain

K4(X)\displaystyle K_{4}(X) =(i)i,j,k,lE{x¯ix¯jx¯kx¯l}(eiejekel)\displaystyle\overset{(i)}{=}\sum_{i,j,k,l}E\{\bar{x}_{i}\bar{x}_{j}\bar{x}_{k}\bar{x}_{l}\}(e_{i}\otimes e_{j}\otimes e_{k}\otimes e_{l})
(ii)i,j,k,lE{x¯ix¯j}E{x¯kx¯l}(eiejekel)\displaystyle\overset{(ii)}{-}\sum_{i,j,k,l}E\{\bar{x}_{i}\bar{x}_{j}\}E\{\bar{x}_{k}\bar{x}_{l}\}(e_{i}\otimes e_{j}\otimes e_{k}\otimes e_{l})
(iii)i,j,k,lE{x¯ix¯k}E{x¯jx¯l}(eiejekel)\displaystyle\overset{(iii)}{-}\sum_{i,j,k,l}E\{\bar{x}_{i}\bar{x}_{k}\}E\{\bar{x}_{j}\bar{x}_{l}\}(e_{i}\otimes e_{j}\otimes e_{k}\otimes e_{l})
(iv)i,j,k,lE{x¯ix¯l}E{x¯jx¯k}(eiejekel).\displaystyle\overset{(iv)}{-}\sum_{i,j,k,l}E\{\bar{x}_{i}\bar{x}_{l}\}E\{\bar{x}_{j}\bar{x}_{k}\}(e_{i}\otimes e_{j}\otimes e_{k}\otimes e_{l}). (A.66)

Row (i)(i) coincides with (A.63) and hence it equals μ4(X)\mu_{4}(X). The expression in row (ii)(ii) can be written as

i,jE{x¯ix¯j}(eiej)k,lE{x¯kx¯l}(ekel)\displaystyle\sum_{i,j}E\{\bar{x}_{i}\bar{x}_{j}\}(e_{i}\otimes e_{j})\otimes\sum_{k,l}E\{\bar{x}_{k}\bar{x}_{l}\}(e_{k}\otimes e_{l})
=μ2(X)μ2(X).\displaystyle\hskip 86.11084pt=\mu_{2}(X)\otimes\mu_{2}(X). (A.67)

For row (iii)(iii),

(eiejekel)\displaystyle(e_{i}\otimes e_{j}\otimes e_{k}\otimes e_{l}) =vec{(ekel)(eiej)}\displaystyle=\textrm{vec}\{(e_{k}\otimes e_{l})(e_{i}\otimes e_{j})^{*}\}
=vec{(ekel)(eiej)}\displaystyle=\textrm{vec}\{(e_{k}\otimes e_{l})(e_{i}^{*}\otimes e_{j}^{*})\}
=vec{(ekei)(elej)},\displaystyle=\textrm{vec}\{(e_{k}e_{i}^{*})\otimes(e_{l}e_{j}^{*})\}, (A.68)

and hence,

i,j,k,lE{x¯i\displaystyle\sum_{i,j,k,l}E\{\bar{x}_{i} x¯k}E{x¯jx¯l}(eiejekel)\displaystyle\bar{x}_{k}\}E\{\bar{x}_{j}\bar{x}_{l}\}(e_{i}\otimes e_{j}\otimes e_{k}\otimes e_{l})
=veci,j,k,lE{x¯ix¯k}E{x¯jx¯l}(ekeielej)\displaystyle=\textrm{vec}\sum_{i,j,k,l}E\{\bar{x}_{i}\bar{x}_{k}\}E\{\bar{x}_{j}\bar{x}_{l}\}(e_{k}e_{i}^{*}\otimes e_{l}e_{j}^{*})
=vec{RXXRXX}.\displaystyle=\textrm{vec}\left\{R_{XX}\otimes R_{XX}\right\}. (A.69)

For row (iv)(iv) we utilize the L2×L2L^{2}\times L^{2} permutation matrix

UL2=i=1Lj=1L(eiej)(ejei)\displaystyle U_{L^{2}}=\sum_{i=1}^{L}\sum_{j=1}^{L}(e_{i}e_{j}^{*})\otimes(e_{j}e_{i}^{*}) (A.70)

to obtain,

(eiejekel)\displaystyle(e_{i}\otimes e_{j}\otimes e_{k}\otimes e_{l}) =(eiej)(UL2(elek))\displaystyle=(e_{i}\otimes e_{j})\otimes\left(U_{L^{2}}\cdot(e_{l}\otimes e_{k})\right)
=vec{UL2(elek)(eiej)}\displaystyle=\textrm{vec}\left\{U_{L^{2}}\cdot(e_{l}\otimes e_{k})\cdot(e_{i}\otimes e_{j})^{*}\right\}
=vec{UL2(elek)(eiej)}\displaystyle=\textrm{vec}\left\{U_{L^{2}}\cdot(e_{l}\otimes e_{k})\cdot(e_{i}^{*}\otimes e_{j}^{*})\right\}
=vec{UL2(elei)(ekej)}.\displaystyle=\textrm{vec}\left\{U_{L^{2}}\cdot(e_{l}e_{i}^{*})\otimes(e_{k}e_{j}^{*})\right\}. (A.71)

Hence, row (iv)(iv) is given by

vec{UL2i,j,k,lE{x¯ix¯l}E{x¯jx¯k}(elei)(ekej)}\displaystyle\textrm{vec}\left\{U_{L^{2}}\sum_{i,j,k,l}E\{\bar{x}_{i}\bar{x}_{l}\}E\{\bar{x}_{j}\bar{x}_{k}\}(e_{l}e_{i}^{*})\otimes(e_{k}e_{j}^{*})\right\}
=vec{UL2(RXXRXX)}.\displaystyle\hskip 86.11084pt=\textrm{vec}\left\{U_{L^{2}}(R_{XX}\otimes R_{XX})\right\}. (A.72)

Substituting these results into (A.66) yields (5).

We next express K4(Y)K_{4}(Y) in terms of K4(X)K_{4}(X) for Y=AXY=AX. By substituting XX with YY in (5) we obtain,

K4(Y)\displaystyle K_{4}(Y) =μ4(Y)μ2(Y)μ2(Y)\displaystyle=\mu_{4}(Y)-\mu_{2}(Y)\otimes\mu_{2}(Y)
vec{RYYRYY+UM2(RYYRYY)}.\displaystyle-\textrm{vec}\left\{R_{YY}\otimes R_{YY}+U_{M^{2}}\cdot(R_{YY}\otimes R_{YY})\right\}. (A.73)

The first term in the RHS of (A.73) is given by (4). The second term can be expressed using (3) and (4) as,

μ2(Y)μ2(Y)\displaystyle\mu_{2}(Y)\otimes\mu_{2}(Y) =[(AA)μ2(X)][(AA)μ2(X)]\displaystyle=[(A\otimes A)\mu_{2}(X)]\otimes[(A\otimes A)\mu_{2}(X)]
=(AAAA)(μ2(X)μ2(X)).\displaystyle=(A\otimes A\otimes A\otimes A)(\mu_{2}(X)\otimes\mu_{2}(X)). (A.74)

For the third term in (A.73) we use (3) and the well known identity [29, p. 410]

vec{ARB}=(BA)vec{R}\displaystyle\textrm{vec}\left\{ARB^{*}\right\}=(B\otimes A)\textrm{vec}\{R\} (A.75)

to obtain

vec {RYYRYY}=vec{(ARXXA)(ARXXA)}\left\{R_{YY}\otimes R_{YY}\right\}=\textrm{vec}\left\{(AR_{XX}A^{*})\otimes(AR_{XX}A^{*})\right\}
=vec{(AA)(RXXRXX)(AA)}\displaystyle=\textrm{vec}\left\{(A\otimes A)(R_{XX}\otimes R_{XX})(A\otimes A)^{*}\right\}
=(AAAA)vec{RXXRXX}.\displaystyle=(A\otimes A\otimes A\otimes A)\textrm{vec}\{R_{XX}\otimes R_{XX}\}. (A.76)

For the last term in (A.73), we use (3) and the relation

UM2(AA)=(AA)UL2\displaystyle U_{M^{2}}(A\otimes A)=(A\otimes A)U_{L^{2}} (A.77)

to obtain

UM2(RYYRYY)\displaystyle U_{M^{2}}(R_{YY}\!\otimes\!R_{YY}) =UM2(AA)(RXXRXX)(AA)\displaystyle=U_{M^{2}}(A\!\otimes\!A)(R_{XX}\!\otimes\!R_{XX})(A\!\otimes\!A)^{*}
=(AA)UL2(RXXRXX)(AA).\displaystyle=(A\!\otimes\!A)U_{L^{2}}(R_{XX}\!\otimes\!R_{XX})(A\!\otimes\!A)^{*}. (A.78)

Hence, from (A.75),

vec {UM2(RYYRYY)}=\displaystyle\left\{U_{M^{2}}(R_{YY}\otimes R_{YY})\right\}=
(AAAA)vec{UL2(RXXRXX)}.\displaystyle(A\otimes A\otimes A\otimes A)\textrm{vec}\{U_{L^{2}}(R_{XX}\otimes R_{XX})\}. (A.79)

Combining these results we obtain (6).

Suppose now that the components of XX are independent random variables. Then,

E{x¯l1x¯l2x¯lk}={E{x¯lk},l1=l2==lk=l0,otherwise\displaystyle E\{\bar{x}_{l_{1}}\cdot\bar{x}_{l_{2}}\cdots\bar{x}_{l_{k}}\}=\left\{\begin{array}[]{cc}E\{\bar{x}_{l}^{k}\},&l_{1}\!=\!l_{2}\!=\!\cdots\!=\!l_{k}=l\\ 0,&\textrm{otherwise}\end{array}\right. (A.82)

and

μ3(X)\displaystyle\mu_{3}(X) =E{X¯X¯X¯}\displaystyle=E\left\{\bar{X}\otimes\bar{X}\otimes\bar{X}\right\}
=i,j,kE{x¯ix¯jx¯k}(eiejek)\displaystyle=\sum_{i,j,k}E\{\bar{x}_{i}\bar{x}_{j}\bar{x}_{k}\}(e_{i}\otimes e_{j}\otimes e_{k})
=iE{x¯i3}(eieiei).\displaystyle=\sum_{i}E\{\bar{x}_{i}^{3}\}(e_{i}\otimes e_{i}\otimes e_{i}). (A.83)

The central moment μ3(Y)\mu_{3}(Y) follows from (4). Expressing

A=i=1Laiei,\displaystyle A=\sum_{i=1}^{L}a_{i}e_{i}^{*}, (A.84)

we have

AAA\displaystyle A\otimes A\otimes A =i,j,k(aiei)(ajej)(akek)\displaystyle=\sum_{i,j,k}(a_{i}e_{i}^{*})\otimes(a_{j}e_{j}^{*})\otimes(a_{k}e_{k}^{*})
=i,j,k(aiajak)(eiejek).\displaystyle=\sum_{i,j,k}(a_{i}\otimes a_{j}\otimes a_{k})(e_{i}\otimes e_{j}\otimes e_{k})^{*}. (A.85)

Substituting (A.83) and (A.85) in (4), and using the orthonormality of the L3×1L^{3}\times 1 vectors {eiejek}\{e_{i}\otimes e_{j}\otimes e_{k}\}, we obtain,

μ3(Y)\displaystyle\mu_{3}(Y) =i=1LE{x¯i3}(aiaiai)\displaystyle=\sum_{i=1}^{L}E\{\bar{x}_{i}^{3}\}(a_{i}\otimes a_{i}\otimes a_{i})
=(AAA)col(E{x¯13},E{x¯23},,E{x¯L3}).\displaystyle=(A\odot A\odot A)\ \textrm{col}\left(E\left\{\bar{x}_{1}^{3}\right\},E\left\{\bar{x}_{2}^{3}\right\},\ldots,E\left\{\bar{x}_{L}^{3}\right\}\right). (A.86)

It can similarly be shown that

μ2(Y)\displaystyle\mu_{2}(Y) =(AA)col(E{x¯12},E{x¯22},,E{x¯L2}).\displaystyle=(A\odot A)\ \textrm{col}\left(E\left\{\bar{x}_{1}^{2}\right\},E\left\{\bar{x}_{2}^{2}\right\},\ldots,E\left\{\bar{x}_{L}^{2}\right\}\right). (A.87)

When the components of XX are independent random variables

κ4(x¯i,x¯j,x¯k,x¯l)=δijδjkδklκ4(x¯i,x¯i,x¯i,x¯i),\displaystyle\kappa_{4}(\bar{x}_{i},\bar{x}_{j},\bar{x}_{k},\bar{x}_{l})=\delta_{ij}\delta_{jk}\delta_{kl}\kappa_{4}(\bar{x}_{i},\bar{x}_{i},\bar{x}_{i},\bar{x}_{i}), (A.88)

and

κ4(x¯i,x¯j,x¯k,x¯l)={E{x¯i4}3(E{x¯i2})2,i=j=k=l0,otherwise.\displaystyle\kappa_{4}(\bar{x}_{i},\bar{x}_{j},\bar{x}_{k},\bar{x}_{l})=\left\{\begin{array}[]{cc}E\{\bar{x}_{i}^{4}\}-3(E\{\bar{x}_{i}^{2}\})^{2},&i\!=\!j\!=\!k\!=\!l\\ 0,&\textrm{otherwise.}\end{array}\right. (A.91)

Substituting (A.91) into the expansion of $K_{4}(X)$ over the basis vectors $\{e_{i}\otimes e_{j}\otimes e_{k}\otimes e_{l}\}$ yields,

\displaystyle K_{4}(X)=\sum_{i,j,k,l}\kappa_{4}(\bar{x}_{i},\bar{x}_{i},\bar{x}_{i},\bar{x}_{i})\delta_{ij}\delta_{jk}\delta_{kl}(e_{i}\otimes e_{j}\otimes e_{k}\otimes e_{l})
\displaystyle=\sum_{p}\kappa_{4}(\bar{x}_{p},\bar{x}_{p},\bar{x}_{p},\bar{x}_{p})(e_{p}\otimes e_{p}\otimes e_{p}\otimes e_{p}). (A.92)

To find $K_{4}(Y)$ we use (6) where

\displaystyle A\otimes A\otimes A\otimes A=\sum_{i,j,k,l}(a_{i}\otimes a_{j}\otimes a_{k}\otimes a_{l})(e_{i}\otimes e_{j}\otimes e_{k}\otimes e_{l})^{*} (A.93)

as follows from (A.84)-(A.85), and $K_{4}(X)$ is given in (A.92). Using the orthonormality of the $L^{4}\times 1$ vectors $\{e_{i}\otimes e_{j}\otimes e_{k}\otimes e_{l}\}$, as in the derivation of (A.86), yields,

\displaystyle K_{4}(Y)=\sum_{p=1}^{L}\kappa_{4}(\bar{x}_{p},\bar{x}_{p},\bar{x}_{p},\bar{x}_{p})(a_{p}\otimes a_{p}\otimes a_{p}\otimes a_{p})
\displaystyle=(A\odot A\odot A\odot A)\ \textrm{col}\left(\kappa_{4}(\bar{x}_{1},\bar{x}_{1},\bar{x}_{1},\bar{x}_{1}),\kappa_{4}(\bar{x}_{2},\bar{x}_{2},\bar{x}_{2},\bar{x}_{2}),\ldots,\kappa_{4}(\bar{x}_{L},\bar{x}_{L},\bar{x}_{L},\bar{x}_{L})\right), (A.98)

which is (13). We conjecture that the same relation holds for all higher order cumulants.
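
The fourth-order case admits the same kind of exact check as the third-order one above: when $K_{4}(X)$ has the diagonal form (A.92), applying $(A\otimes A\otimes A\otimes A)$ as in (6) reproduces the Khatri-Rao expression (A.98). In the sketch below the $\kappa_{4}$ values are arbitrary placeholders.

```python
# Exact check of the fourth-order Khatri-Rao collapse (A.98); the fourth
# cumulants are arbitrary placeholder values.
import numpy as np

rng = np.random.default_rng(4)
M, L = 3, 5
A = rng.integers(0, 2, size=(M, L)).astype(float)
I = np.eye(L)
k4 = rng.uniform(1.0, 4.0, size=L)     # stands in for kappa_4(xbar_p,...,xbar_p)

def khatri_rao(*mats):
    """Column-wise Kronecker product; all factors share the same column count."""
    ncols = mats[0].shape[1]
    out = mats[0]
    for B in mats[1:]:
        out = np.vstack([np.kron(out[:, j], B[:, j]) for j in range(ncols)]).T
    return out

# K4(X) with the diagonal structure of (A.92)
K4_X = sum(k4[p] * np.kron(np.kron(np.kron(I[:, p], I[:, p]), I[:, p]), I[:, p])
           for p in range(L))

# (A.98): (A⊗A⊗A⊗A) K4(X) = (A⊙A⊙A⊙A) col(kappa_4(xbar_1,...), ..., kappa_4(xbar_L,...))
K4_Y = np.kron(np.kron(A, A), np.kron(A, A)) @ K4_X
assert np.allclose(K4_Y, khatri_rao(A, A, A, A) @ k4)
print("(A.98) holds on this example")
```
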

When the components of $X$ are independent Poisson random variables with $E\{X\}=\lambda$,

\displaystyle E\{\bar{x}_{i}^{2}\}=E\{\bar{x}_{i}^{3}\}=\lambda_{i}
\displaystyle E\{\bar{x}_{i}^{4}\}=\lambda_{i}+3\lambda_{i}^{2}
\displaystyle\kappa_{4}(\bar{x}_{i},\bar{x}_{i},\bar{x}_{i},\bar{x}_{i})=(\lambda_{i}+3\lambda_{i}^{2})-3\lambda_{i}^{2}=\lambda_{i}, (A.99)

and (25) follows.
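
The Poisson relations in (A.99) can also be sanity-checked by simulation. The sketch below uses an illustrative rate and sample size; the empirical moments are only approximately equal to their theoretical values.

```python
# Monte Carlo sanity check of (A.99) for a single Poisson component with rate
# lambda: E{xbar^2} = E{xbar^3} = lambda, E{xbar^4} = lambda + 3*lambda^2,
# and kappa_4 = lambda. Rate and sample size are illustrative.
import numpy as np

rng = np.random.default_rng(5)
lam, N = 3.7, 5_000_000
x = rng.poisson(lam, size=N).astype(float)
xc = x - x.mean()                                   # centered samples

m2, m3, m4 = (np.mean(xc ** k) for k in (2, 3, 4))
print("E{xbar^2} approx", round(m2, 3), " vs lambda =", lam)
print("E{xbar^3} approx", round(m3, 3), " vs lambda =", lam)
print("E{xbar^4} approx", round(m4, 3), " vs lambda + 3*lambda^2 =", lam + 3 * lam**2)
print("kappa_4   approx", round(m4 - 3 * m2**2, 3), " vs lambda =", lam)
```
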
