

Deterministic and Probabilistic Error Bounds for Floating Point Summation Algorithms

This research was supported in part by the National Science Foundation through grants DMS-1745654 and DMS-1760374.

Eric Hallman, Department of Mathematics, North Carolina State University ([email protected])
Ilse C.F. Ipsen, Department of Mathematics, North Carolina State University ([email protected])
Abstract

We analyse the forward error in the floating point summation of real numbers, for algorithms that do not require recourse to higher precision or better hardware. We derive informative explicit expressions, and new deterministic and probabilistic bounds for errors in three classes of algorithms: general summation, shifted general summation, and compensated (sequential) summation. Our probabilistic bounds for general and shifted general summation hold to all orders. For compensated summation, we also present deterministic and probabilistic first and second order bounds, with a first order bound that differs from existing ones. Numerical experiments illustrate that the bounds are informative and that, among the three algorithm classes, compensated summation is generally the most accurate method.

keywords:
Rounding error analysis, floating-point arithmetic, random variables, martingales
{AMS}

65G99, 60G42, 60G50

1 Introduction

Given $n$ real numbers $x_1,\ldots,x_n$, we consider algorithms for computing the sum $s_n=x_1+\cdots+x_n$ in floating point arithmetic. The algorithms operate in one and only one precision, without any recourse to higher precision as in [1], or wider accumulators as in [6].

We analyze the forward error $e_n=\hat{s}_n-s_n$ in the computed sum $\hat{s}_n$ in terms of the unit roundoff $u$, for three classes of algorithms: general summation (Section 2), shifted general summation (Section 3), and compensated summation (Section 4). Numerical experiments (Section 5) illustrate that the bounds are informative, and that, among the three classes of algorithms, compensated summation is the most accurate method. Hybrid algorithms as in [1] will be analyzed in future work.

1.1 Contributions

We present informative bounds, based on clear and structured derivations, for three classes of summation algorithms.

General summation

We derive explicit expressions for the error (Lemmas 2.2 and 2.3), as well as deterministic bounds that hold to all orders (Theorem 2.4) and depend on the height $h$ of the computational tree associated with the particular summation order. We present two probabilistic bounds that treat the roundoffs as random variables, one for independent roundoffs (Theorem 2.8) and one for mean-independent roundoffs (Theorem 2.12). Both bounds hold to all orders, depend only on $\sqrt{h}$, and suggest that reducing the tree height $h$ increases the accuracy.

Shifted general summation

We present the extension of shifted sequential summation to shifted general summation (Algorithm 2). For the special case of shifted sequential summation we present an explicit expression for the error, and a simple probabilistic bound for independent roundoffs (Theorem 3.2). For shifted general summation, we deduce probabilistic bounds for mean-independent roundoffs (Theorem 3.4) and independent roundoffs (Theorem 3.6) as respective special cases of Theorems 2.12 and 2.8.

Compensated summation

We derive three different explicit expressions for the error (Theorems 4.3 and 4.5), and bounds that are valid to first and second order, both deterministic (Corollary 4.6, Theorem 4.9) and probabilistic (Theorem 4.11, Corollary 4.13). In particular (Remark 4.8), we note a discrepancy of $u$ between existing bounds and ours in Corollary 4.6,

$|\hat{s}_n-s_n|\leq 3u\sum_{k=1}^n|x_k|+\mathcal{O}(u^2).$

1.2 Models for roundoff error

Our analyses assume that the inputs $x_k$ are floating point numbers, i.e., they can be stored exactly without error, and that the summation produces no overflow or underflow.

Let $u$ denote the unit roundoff.

Standard model [9]

This holds for IEEE floating-point arithmetic and implies that, in the absence of underflow or overflow, the operations $\mathrm{op}\in\{+,-,\times,/,\surd\}$ when applied to floating point inputs $x$ and $y$ produce

(1) $\operatorname{fl}(x\text{ op }y)=(x\text{ op }y)(1+\delta_{xy}),\qquad|\delta_{xy}|\leq u.$

In order to derive probabilistic bounds, we treat the roundoffs as zero-mean random variables, according to two models that differ in the assumption of independence.

Model 1.1.

In the computation of interest, the quantities $\delta_{xy}$ in the model (1) associated with every pair of operands are independent random variables of mean zero.

Model 1.2.

Let the computation of interest generate rounding errors $\delta_1,\delta_2,\ldots$ in that order. The $\delta_k$ are random variables of mean zero such that

$\mathbb{E}(\delta_k\,|\,\delta_1,\ldots,\delta_{k-1})=\mathbb{E}(\delta_k)=0.$

Higham and Mary [11] note that the mean independence assumption of Model 1.2 is weaker than the independence assumption of Model 1.1, but stronger than the assumption that the errors are uncorrelated. Connolly et al. [3] observe that at least one mode of stochastic rounding produces errors satisfying Model 1.2, but with the weaker bound $|\delta_{xy}|\leq 2u$ in place of (1).

1.3 Probability theory

For the derivation of the probabilistic bounds, we need a martingale and a concentration inequality.

Definition 1.1 (Martingale [18]).

A sequence of random variables $Z_1,\ldots,Z_n$ is a martingale with respect to the sequence $X_1,\ldots,X_n$ if, for all $k\geq 1$,

  • $Z_k$ is a function of $X_1,\ldots,X_k$,

  • $\mathbb{E}[|Z_k|]<\infty$, and

  • $\mathbb{E}(Z_{k+1}\,|\,X_1,\ldots,X_k)=Z_k$.

Lemma 1.2 (Azuma-Hoeffding inequality [20]).

Let $Z_1,\ldots,Z_n$ be a martingale with respect to a sequence $X_1,\ldots,X_n$, and let $c_k$ be constants with

$|Z_k-Z_{k-1}|\leq c_k,\qquad 2\leq k\leq n.$

Then for any $\delta\in(0,1)$, with probability at least $1-\delta$,

(2) $|Z_n-Z_1|\leq\left(\sum_{k=2}^n c_k^2\right)^{1/2}\sqrt{2\ln(2/\delta)}.$

If the bounds on the differences $|Z_k-Z_{k-1}|$ are permitted to fail with a small probability, then a similar but weaker concentration inequality holds [2]. Specifically, if $|Z_k-Z_{k-1}|\leq c_k$ holds simultaneously for all $2\leq k\leq n$ with exceptional probability at most $\eta$, then (2) still holds with probability at least $1-(\delta+\eta)$.
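As a quick illustration of how Lemma 1.2 is used, the following Julia sketch estimates the failure rate of (2) by Monte Carlo for the simplest martingale, a sum of independent bounded zero-mean increments. The increment distribution and sample sizes are our own choices, not taken from the analysis.

using Random

# Monte Carlo check of (2): Z_n - Z_1 is a sum of independent increments
# uniform on [-c_k, c_k], so |Z_k - Z_{k-1}| <= c_k holds deterministically.
function azuma_check(; n=100, trials=10^5, dlt=0.01, rng=MersenneTwister(1))
    c = rand(rng, n - 1)                        # difference bounds c_2,...,c_n
    bound = sqrt(sum(abs2, c)) * sqrt(2 * log(2 / dlt))
    fails = count(_ -> abs(sum(c .* (2 .* rand(rng, n - 1) .- 1))) > bound, 1:trials)
    return fails / trials                       # observed rate, typically far below dlt
end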

2 General summation

We analyse the error for general summation, presented in Algorithm 1. After representing the summation algorithm as a computational tree, we present error expressions and a deterministic error bound (Section 2.1), deterministic and probabilistic bounds valid to first order (Section 2.2), followed by two probabilistic bounds, one for Model 1.1 (Section 2.3) and one for Model 1.2 (Section 2.4).

Algorithm 1 General Summation [9, Algorithm 4.1]
Input: A set of floating point numbers $\mathcal{S}=\{x_1,\ldots,x_n\}$
Output: $s_n=\sum_{k=1}^n x_k$
1:  for $k=2:n$ do
2:     Remove two elements $x$ and $y$ from $\mathcal{S}$
3:     $s_k=x+y$
4:     Add $s_k$ to $\mathcal{S}$
5:  end for

Denote by $s_k$ the exact partial sum, by $\hat{s}_k$ the computed sum, and by $e_k=\hat{s}_k-s_k$ the forward error, $2\leq k\leq n$.
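For concreteness, here is a minimal Julia sketch of two instances of Algorithm 1 (the function names are ours): both perform the same $n-1$ additions and differ only in the ordering, that is, in the computational tree described next.

# Sequential (recursive) summation: tree of height n-1.
function sequential_sum(x)
    s = x[1]
    for k in 2:length(x)
        s += x[k]              # s_k = s_{k-1} + x_k
    end
    return s
end

# Pairwise summation: tree of height ceil(log2(n)).
function pairwise_sum(x)
    n = length(x)
    n == 1 && return x[1]
    m = div(n, 2)
    return pairwise_sum(view(x, 1:m)) + pairwise_sum(view(x, m+1:n))
end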

Computational tree for Algorithm 1

We represent the specific ordering of summations in Algorithm 1 by a binary tree with $2n-1$ vertices and the following properties:

  • Each vertex represents a partial sum $s_k$ or an input $x_k$.

  • Each operation $s_k=x+y$ gives rise to two edges $(x,s_k)$ and $(y,s_k)$.

  • The root is the output $s_n$, and the leaves are the inputs $x_1,\ldots,x_n$.

Figure 1 shows two examples, one for sequential (recursive) summation and one for pairwise summation.

Figure 1: Computational trees for two different summation orderings in Algorithm 1 when $n=4$. Left: sequential (recursive) summation. Right: pairwise summation.
Definition 2.1.

Consider the computational tree associated with Algorithm 1 applied to $n$ inputs.

  • The tree defines a partial ordering: if $s_j$ is a descendant of $s_k$, we write $j\prec k$, and $j\preceq k$ if $s_j=s_k$ is possible.

  • The height of a node is the length of the longest downward path from that node to a leaf. Leaves have a height of zero.

  • The height of the tree is the height of its root. Sequential (recursive) summation yields a tree of height $n-1$, while pairwise summation yields a tree of height $\lceil\log_2 n\rceil$, see Figure 1.

2.1 Explicit expressions and deterministic bounds for the error

We present two expressions for the error in Algorithm 1 (Lemmas 2.2 and 2.3), each geared towards a different model for the subsequent probabilistic bounds, and two deterministic bounds (Theorem 2.4).

If the computed children are $\hat{x}=x+e_x$ and $\hat{y}=y+e_y$, then the computed parent sum, in line 3 of Algorithm 1, is

$\hat{s}_k=(\hat{x}+\hat{y})(1+\delta_k),\qquad 2\leq k\leq n.$

Highlight the error in the children,

$s_k+e_k=\hat{s}_k=\big((x+e_x)+(y+e_y)\big)(1+\delta_k)=(s_k+e_x+e_y)(1+\delta_k),$

to obtain the error in the parent

(3) $e_k=(s_k+e_x+e_y)(1+\delta_k)-s_k=(e_x+e_y)(1+\delta_k)+s_k\delta_k,\qquad 2\leq k\leq n.$

This means the error at node $s_k$ equals the sum of the children errors, perturbed slightly, plus a local error that is proportional to the exact partial sum $s_k$.

The unravelled recursions in Lemma 2.2 generalize the explicit expression for sequential (recursive) summation in [8, Lemma 3.1],

(4) $e_n=\hat{s}_n-s_n=\sum_{k=2}^n s_k\delta_k\prod_{\ell=k+1}^n(1+\delta_\ell),$

to general summation algorithms.

We present two expressions that differ in their usefulness under particular probabilistic models. The expression in Lemma 2.2 works well with Model 1.1 because the sum is a martingale if summed in reverse order; see Section 2.3. Unfortunately, this approach does not work under the weaker Model 1.2. This is the reason for Lemma 2.3, where the sum is a martingale under the original ordering; see Section 2.4.

Lemma 2.2 (First explicit expression).

The error in Algorithm 1 equals

(5) $e_n=\hat{s}_n-s_n=\sum_{k=2}^n s_k\delta_k\prod_{k\prec j\preceq n}(1+\delta_j).$

This means the forward error is the sum of the local errors at each node, each perturbed by subsequent rounding errors. A similar expression is derived in [9, (4.2)],

(6) $e_n=\sum_{k=2}^n\hat{s}_k\tilde{\delta}_k,$

but in terms of the computed partial sums $\hat{s}_k$. By contrast, our (5) depends only on the exact partial sums $s_k$, which makes it more amenable to a probabilistic analysis.
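As a concrete check of (5), consider the pairwise tree in Figure 1 (right) with $n=4$: the internal nodes are $s_2=x_1+x_2$, $s_3=x_3+x_4$, and $s_4=s_2+s_3$, so that $2\prec 4$ and $3\prec 4$, and (5) reads

$e_4=s_2\delta_2(1+\delta_4)+s_3\delta_3(1+\delta_4)+s_4\delta_4,$

while for the sequential tree (left) it reduces to (4).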

Write (3) in terms of the sum $f_k$ of the children errors at node $s_k$,

$e_k=f_k+(s_k+f_k)\delta_k,\qquad f_k\equiv e_x+e_y,\qquad 2\leq k\leq n.$

Unraveling the recurrence gives the second explicit expression for the errors.

Lemma 2.3 (Second explicit expression).

The errors in Algorithm 1 equal

(7) $e_k=\hat{s}_k-s_k=\sum_{j\preceq k}(s_j+f_j)\delta_j,\qquad 2\leq k\leq n,$

and similarly

(8) $f_k=\sum_{j\prec k}(s_j+f_j)\delta_j,\qquad 2\leq k\leq n.$

Theorem 2.4.

If $h$ is the height of the computational tree for Algorithm 1, the error is bounded by

$|e_n|\leq\sum_{k=2}^n|s_k||\delta_k|\left|\prod_{k\prec j\preceq n}(1+\delta_j)\right|\leq u(1+u)^h\sum_{k=2}^n|s_k|\leq\frac{hu}{1-hu}\sum_{k=1}^n|x_k|,\qquad\mathrm{provided}~hu<1.$

Proof 2.5.

The first bound is a direct consequence of Lemma 2.2. The second one follows from $\sum_{k=2}^n|s_k|\leq h\sum_{k=1}^n|x_k|$ and $(1+u)^h\leq 1+hu/(1-hu)=1/(1-hu)$ [9, Lemma 3.1].

A similar bound, derived from (6) in [9, (4.3)],

$|e_n|\leq u\sum_{k=2}^n|\hat{s}_k|,$

is accompanied by the following observation:

In designing or choosing a summation method to achieve high accuracy, the aim should be to minimize the absolute values of the intermediate sums $s_k$.

Our probabilistic bounds in Sections 2.3 and 2.4 corroborate this observation, the main difference being a dependence on $\big(\sum_k s_k^2\big)^{1/2}$ rather than $\sum_k|s_k|$.

2.2 Deterministic and probabilistic error bounds valid to first order

We present a deterministic bound (9), and a probabilistic bound (Theorem 2.6) under Model 1.2, both valid to first order.

Truncating in Theorem 2.4 yields the first order bound

(9) $e_n=\sum_{k=2}^n s_k\delta_k+\mathcal{O}(h^2u^2),$

which is also derived in [11, Lemma 2.1] for the special case of sequential (recursive) summation.

Theorem 2.6.

Let $\delta_2,\ldots,\delta_n$ be mean-independent, zero-mean random variables. Then for any $\delta\in(0,1)$, with probability at least $1-\delta$, the error in Algorithm 1 is bounded by

$|e_n|\leq u\left(\sum_{k=2}^n s_k^2\right)^{1/2}\sqrt{2\ln(2/\delta)}+\mathcal{O}(u^2).$

Proof 2.7.

With $\delta_1=0$, the sequence $Z_1=0$, $Z_i=\sum_{k=2}^i s_k\delta_k$, $2\leq i\leq n$, is a martingale with respect to $\delta_1,\ldots,\delta_n$, satisfying

$|Z_i-Z_{i-1}|\leq u|s_i|,\qquad 2\leq i\leq n.$

Now apply Lemma 1.2.

A closely related bound for sequential (recursive) summation is derived in [11, Theorem 2.8] under a model where the inputs $x_k$ are themselves random variables. Under this assumption, the probabilistic error bounds become significantly tighter if $\mathbb{E}[x_k]=0$, leading to the conclusion that centering the data prior to summation can reduce the error; see Section 3.

A disadvantage of Theorem 2.6 is its uncertainty about the point where the second-order term $\mathcal{O}(u^2)$ starts to become relevant. For input dimensions $n>u^{-1}$ or $h>u^{-1}$ in particular, it is desirable to derive bounds that hold to all orders.
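As a sanity check of Theorem 2.6, one can sum random data in a low precision and compare the observed error with the first-order bound. A Julia sketch under our own test setup, with Float32 as the working precision and Float64 as the reference:

# Sequential summation error in Float32 vs the bound of Theorem 2.6,
# u * sqrt(sum(s_k^2)) * sqrt(2*log(2/delta)) with delta = 0.01.
function check_theorem26(n=10^5)
    x = rand(n) .- 0.5                    # zero-mean test data (our choice)
    s = cumsum(x)                         # reference partial sums s_1,...,s_n
    u = eps(Float32) / 2                  # unit roundoff of the working precision
    acc = 0.0f0
    for xk in x
        acc += Float32(xk)                # sequential summation in Float32
    end
    err = abs(Float64(acc) - s[end])
    bound = u * sqrt(sum(abs2, s[2:end])) * sqrt(2 * log(2 / 0.01))
    return err, bound                     # err <= bound in roughly 99% of runs
end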

2.3 General probabilistic bound under Model 1.1

We derive a bound under the assumption of independent roundoffs $\delta_2,\ldots,\delta_n$.

Theorem 2.8.

Let $\delta_2,\ldots,\delta_n$ be independent zero-mean random variables, $h$ the height of the computational tree for Algorithm 1, and $\lambda\equiv\sqrt{2\ln(2n/\eta)}$. Then for any $\delta\in(0,1)$ and $\eta\in(0,1)$, with probability at least $1-(\delta+\eta)$, the error in Algorithm 1 is bounded by

$|\hat{s}_n-s_n|\leq u\exp(\lambda\sqrt{h}u)\left(\sum_{k=2}^n s_k^2\right)^{1/2}\sqrt{2\ln(2/\delta)}.$

Proof 2.9.

We start by bounding the products in the error expression (5). Since

$\prod_{k\prec j\preceq n}(1+\delta_j)\leq\exp\Big(\sum_{k\prec j\preceq n}\delta_j\Big),\qquad 2\leq k\leq n-1,$

we can apply Lemma 1.2 to the sum on the right to conclude that for each $k$, $\eta\in(0,1)$, and $\lambda=\sqrt{2\ln(2/\eta)}$, with failure probability at most $\eta$, the bound

$\prod_{k\prec j\preceq n}(1+\delta_j)\leq\exp(\lambda\sqrt{h}u)$

holds. Since the product is nonnegative, we can replace the expression on the left with its absolute value and the statement still holds. A union bound over at most $n$ such inequalities shows that with failure probability at most $n\eta$ the bound

$\max_{2\leq k\leq n}\left|\prod_{k\prec j\preceq n}(1+\delta_j)\right|\leq\exp(\lambda\sqrt{h}u)$

holds. Then substitute $\eta\mapsto\eta/n$ and $\lambda=\sqrt{2\ln(2n/\eta)}$.

Next, with $\delta_1=0$, the sequence

$Z_1=0,\qquad Z_i=\sum_{k=n-i+2}^n s_k\delta_k\prod_{k\prec j\preceq n}(1+\delta_j),\qquad 2\leq i\leq n$

is a martingale with respect to $\delta_n,\delta_{n-1},\ldots,\delta_2$ that satisfies

$|Z_i-Z_{i-1}|\leq u\exp(\lambda\sqrt{h}u)|s_{n-i+2}|,\qquad 2\leq i\leq n$

with failure probability at most $\eta$. At last apply Lemma 1.2 with additional failure probability $\delta$.

The bound $\big(\sum_{k=2}^n s_k^2\big)^{1/2}\leq\sqrt{h}\sum_{i=1}^n|x_i|$ illustrates that, with high probability, the error in Theorem 2.8 is bounded by a quantity proportional to the square root of the height of the computational tree in Algorithm 1. This suggests that minimizing the height of the computational tree should reduce the error.

As long as $\lambda\sqrt{h}u\ll 1$, the probability $\eta$ can be made small without much effect on the overall bound in Theorem 2.8.
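The following sketch (reusing the Julia functions from the beginning of this section; the data are our own test case) illustrates the effect of the tree height: for the same Float32 data, pairwise summation with $h=\lceil\log_2 n\rceil$ is typically far more accurate than sequential summation with $h=n-1$.

x   = rand(Float32, 10^6)
ref = sum(Float64.(x))                 # higher-precision reference sum
abs(sequential_sum(x) - ref)           # tall tree: h = n-1
abs(pairwise_sum(x) - ref)             # shallow tree: h = ceil(log2(n))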

2.4 General probabilistic bound under Model 1.2

We derive a bound under the assumption of mean-independent roundoffs $\delta_2,\ldots,\delta_n$. To this end we rely on the recurrence in (8) and induction.

Lemma 2.10.

Let $\delta_2,\ldots,\delta_n$ be mean-independent mean-zero random variables, $h$ the height of the computational tree for Algorithm 1, and $\lambda\equiv\sqrt{2\ln(2n/\eta)}$. For any $\eta>0$ with $\lambda\sqrt{h}u<1$, the bounds

(10) $|f_k|\leq\frac{\lambda u}{1-\lambda\sqrt{h}u}\Big(\sum_{j\prec k}s_j^2\Big)^{1/2},\qquad 2\leq k\leq n$

all hold simultaneously with probability at least $1-\eta$.

Proof 2.11.

The proof is by induction, keeping track of the failure probability $\eta$.

Induction basis $k=2$

The children of $s_2$ are both leaves and floating point numbers, hence $f_2=0$ and (10) holds deterministically.

Induction hypothesis

Assume that (10) holds simultaneously for all $2\leq j\leq k-1$ with failure probability at most $(k-1)\eta/n$.

Induction step

Write (8) as

$f_k=\sum_{j=2}^{k-1}(s_j+f_j)\delta_j\mathds{1}_{j\prec k}.$

With $\delta_1=0$, the sequence $Z_1=0$, $Z_i=\sum_{j=2}^i(s_j+f_j)\delta_j\mathds{1}_{j\prec k}$ is a martingale with respect to $\delta_1,\ldots,\delta_{k-1}$, and by the induction hypothesis satisfies, for $2\leq i\leq k-1$,

$|Z_i-Z_{i-1}|\leq\begin{cases}u|s_i|+\dfrac{\lambda u^2}{1-\lambda\sqrt{h}u}\Big(\sum_{j\prec i}s_j^2\Big)^{1/2}&i\prec k,\\ 0&i\nprec k,\end{cases}$

with failure probability at most $(k-1)\eta/n$. Lemma 1.2 then implies that with additional failure probability $\eta/n$,

$|f_k|\leq\lambda u\left(\sum_{j\prec k}\Big(s_j+\frac{\lambda u}{1-\lambda\sqrt{h}u}\big(\sum_{\ell\prec j}s_\ell^2\big)^{1/2}\Big)^2\right)^{1/2}\leq\lambda u\Big(\sum_{j\prec k}s_j^2\Big)^{1/2}+\lambda u\left(\sum_{j\prec k}\Big(\frac{\lambda u}{1-\lambda\sqrt{h}u}\big(\sum_{\ell\prec j}s_\ell^2\big)^{1/2}\Big)^2\right)^{1/2},$

where the second inequality follows from the triangle inequality for the two-norm. Exploit the fact that $s_j$ can appear at most $h$ times, once for each of its ancestors, to bound the second summand by

$\lambda u\left(\sum_{j\prec k}\Big(\frac{\lambda u}{1-\lambda\sqrt{h}u}\big(\sum_{\ell\prec j}s_\ell^2\big)^{1/2}\Big)^2\right)^{1/2}=\frac{\lambda^2u^2}{1-\lambda\sqrt{h}u}\Big(\sum_{j\prec k}\sum_{\ell\prec j}s_\ell^2\Big)^{1/2}\leq\frac{\lambda^2u^2}{1-\lambda\sqrt{h}u}\Big(h\sum_{j\prec k}s_j^2\Big)^{1/2}.$

Combine everything

$|f_k|\leq\lambda u\Big(\sum_{j\prec k}s_j^2\Big)^{1/2}+\frac{\lambda^2u^2}{1-\lambda\sqrt{h}u}\Big(h\sum_{j\prec k}s_j^2\Big)^{1/2}=\lambda u\Big(1+\frac{\lambda\sqrt{h}u}{1-\lambda\sqrt{h}u}\Big)\Big(\sum_{j\prec k}s_j^2\Big)^{1/2}=\frac{\lambda u}{1-\lambda\sqrt{h}u}\Big(\sum_{j\prec k}s_j^2\Big)^{1/2}.$

Thus (10) holds simultaneously for all $2\leq j\leq k$ with failure probability at most $k\eta/n$. By induction, (10) holds for all $2\leq k\leq n$ with failure probability at most $\eta$.

Below is the final probabilistic bound under Model 1.2.

Theorem 2.12.

Let $\delta_2,\ldots,\delta_n$ be mean-independent mean-zero random variables, $h$ the height of the computational tree for Algorithm 1, and $\lambda\equiv\sqrt{2\ln(2n/\eta)}$. For any $\eta>0$ with $\lambda\sqrt{h}u<1$ and $\delta\in(0,1)$, with probability at least $1-(\delta+\eta)$, the error in Algorithm 1 is bounded by

(11) $|\hat{s}_n-s_n|\leq\frac{u}{1-\lambda\sqrt{h}u}\Big(\sum_{k=2}^n s_k^2\Big)^{1/2}\sqrt{2\ln(2/\delta)}.$

Proof 2.13.

Apply Lemma 1.2 to (7) with failure probability $\delta$, and bound $|f_k|$ with Lemma 2.10.

The bound in Theorem 2.12 is nearly identical to the one in Theorem 2.8, the difference becoming significant only when $\lambda\sqrt{h}u$ is close to 1.

3 Shifted General Summation

After discussing the motivation for shifted summation, we present Algorithm 2 for shifted general summation, and derive error expressions and deterministic and probabilistic bounds for shifted sequential (recursive) summation (Section 3.1) and shifted general summation (Section 3.2).

Shifted summation is motivated by work in computer architecture [5, 4] and formal methods for program verification [17], where not only the roundoffs but also the inputs are interpreted as random variables sampled from some distribution. Then one can compute statistics for the total roundoff error and estimate the probability that it is bounded by $tu$ for a given $t$.

Adopting instead the approach for probabilistic roundoff error analysis in [10, 13], probabilistic bounds for random data are derived in [11], with improvements in [8]. The bound below suggests that sequential summation is accurate if the summands $x_k$ are tightly clustered around zero.

Lemma 3.1 (Theorem 5.2 in [8]).

Let the roundoffs $\delta_k$ be independent mean-zero random variables, and the summands $x_k$ be independent random variables with mean $\mu$ and 'variance' $\max_{1\leq k\leq n}|x_k-\mu|\leq\sigma$. Then for any $0<\delta<1$, with probability at least $1-\delta$, the error in sequential (recursive) summation is bounded by

$|\hat{s}_n-s_n|\leq(1+\gamma)\big(\lambda n^{3/2}|\mu|+\lambda^2 n\sigma\big)u,$

where $\lambda=\sqrt{2\ln(6/\delta)}$ and $\gamma=\exp\big(\lambda(\sqrt{n}u+nu^2)/(1-u)\big)-1$.

This observation has been applied to non-random data to reduce the backward error [11, Section 4]. The shifted algorithm for sequential (recursive) summation [11, Algorithm 4.1] is extended to general summation in Algorithm 2.

Algorithm 2 Shifted General Summation
Input: Summands $x_k$, $1\leq k\leq n$; shift $c$
Output: $s_n=\sum_{k=1}^n x_k$
1:  for $k=1:n$ do
2:     $y_k=x_k-c$
3:  end for
4:  $t_n$ = output of Algorithm 1 on $\{y_1,\ldots,y_n\}$
5:  return $s_n=t_n+nc$

Note that the pseudo-code in Algorithm 2 is geared towards exposition. In practice, one shifts the $x_k$ immediately prior to the summation, to avoid allocating additional storage for the $y_k$. The ideal but impractical choice of shift is the empirical mean $c=s_n/n$; a more practical approximation is $c=(\min_k x_k+\max_k x_k)/2$.
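A minimal Julia sketch of the sequential instance of Algorithm 2 with this practical shift (the function name is ours), shifting on the fly instead of storing the $y_k$:

# Shifted sequential summation (Algorithm 2 with a sequential inner sum).
function shifted_sum(x)
    c = (minimum(x) + maximum(x)) / 2   # practical shift
    t = zero(eltype(x))
    for xk in x
        t += xk - c                     # y_k = x_k - c, summed immediately
    end
    return t + length(x) * c            # uncenter: s_n = t_n + n*c
end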

3.1 Shifted sequential (recursive) summation

We present an expression for the error (12), a deterministic bound (13), and a probabilistic bound under Model 1.1 (Theorem 3.2).

We incorporate the final uncentering operation as the $(n+1)$st summation, and describe shifted sequential summation in exact arithmetic as

$t_k=\sum_{j=1}^k y_j=t_{k-1}+y_k,\quad 1\leq k\leq n+1,\qquad s_n=t_{n+1}=t_n+nc,$

where $y_k=x_k-c$, $1\leq k\leq n$, and $y_{n+1}=nc$. In floating point arithmetic, the $\epsilon_j$ denote the (un)centering errors, and the $\delta_j$ the summation errors:

$\hat{t}_1=y_1(1+\epsilon_1)(1+\delta_1),\qquad\delta_1=0,$
$\hat{t}_k=\big(\hat{t}_{k-1}+y_k(1+\epsilon_k)\big)(1+\delta_k),\qquad 2\leq k\leq n+1.$

The idea behind shifted summation for non-random data is that the centered sum

$|\hat{t}_n-t_n|\leq nu\sum_{k=1}^n|y_k|+\mathcal{O}(u^2)$

can be more accurate if centering reduces the magnitude of the partial sums and the summands; see also the discussions after Theorems 2.4 and 3.2.

The error expression corresponding to (4) and Lemma 2.2 is

(12) $e_n=\hat{s}_n-s_n=\underbrace{\sum_{k=2}^{n+1}t_k\delta_k\prod_{\ell=k+1}^{n+1}(1+\delta_\ell)}_{\text{summation}}+\underbrace{\sum_{k=1}^{n+1}y_k\epsilon_k\prod_{\ell=k}^{n+1}(1+\delta_\ell)}_{\text{centering}},$

with the deterministic bound

(13) $|e_n|\leq u(1+u)^n\left(\sum_{k=2}^n|s_k-kc|+\sum_{k=1}^n|x_k-c|+|s_n|+|nc|\right).$

If all $\epsilon_k=0$ and $\delta_{n+1}=0$, then $t_k=s_k$ and the error (12) reduces to that for plain sequential summation (4).

The probabilistic bound below can be considered to be a special case of the general bound in Theorem 3.6, but it requires only a single probability.

Theorem 3.2 (Probabilistic bound for shifted sequential (recursive) summation).

Let the $\delta_j$ and $\epsilon_j$ be independent mean-zero random variables, and $\gamma_k\equiv(1+u)^k-1$. Then for any $0<\delta<1$, with probability at least $1-\delta$,

$|\hat{s}_n-s_n|\leq\max_{1\leq k\leq n+1}\big(|s_k-kc|+|x_k-c|\big)\sqrt{\frac{u\,\gamma_{2(n+2)}}{2}}\,\sqrt{2\ln(2/\delta)}.$

For $n\ll 1/u$ we have

$\sqrt{\frac{u\gamma_{2(n+2)}}{2}}\leq\sqrt{\frac{(n+2)u^2}{1-2(n+2)u}}\approx\sqrt{n+2}\,u.$

Proof 3.3.

We construct a martingale by extracting the accumulated roundoffs in a last-in first-out manner. Set $Z_0\equiv 0$,

$Z_1=t_{n+1}\delta_{n+1}+y_{n+1}\epsilon_{n+1}(1+\delta_{n+1})=s_n\delta_{n+1}+nc\,\epsilon_{n+1}(1+\delta_{n+1}),$
$Z_k=\sum_{j=n-k+2}^{n+1}t_j\delta_j\prod_{\ell=j+1}^{n+1}(1+\delta_\ell)+\sum_{j=n-k+2}^{n+1}y_j\epsilon_j\prod_{\ell=j}^{n+1}(1+\delta_\ell),\qquad 1\leq k\leq n,$
$Z_{n+1}=\sum_{j=2}^{n+1}t_j\delta_j\prod_{\ell=j+1}^{n+1}(1+\delta_\ell)+\sum_{j=1}^{n+1}y_j\epsilon_j\prod_{\ell=j}^{n+1}(1+\delta_\ell)=\hat{s}_n-s_n.$

By construction, $Z_1$ is a function of $\xi_{n+1}\equiv(\delta_{n+1},\epsilon_{n+1})$, while $Z_k$ is a function of the $\xi_j\equiv(\delta_j,\epsilon_j)$ for $n-k+2\leq j\leq n+1$, and finally $Z_{n+1}$ is a function of all $\xi_j$, $1\leq j\leq n+1$.

By assumption, the $\xi_j\equiv(\delta_j,\epsilon_j)$ are tuples of $2(n+1)$ independent random variables with $\mathbb{E}[\delta_j]=0=\mathbb{E}[\epsilon_j]$ and $|\delta_j|,|\epsilon_j|\leq u$, $1\leq j\leq n+1$. Then

$\mathbb{E}[Z_{k+1}\,|\,\xi_{n-k+2},\ldots,\xi_{n+1}]=Z_k,\qquad 0\leq k\leq n.$

Combining all of the above shows that $Z_0,\ldots,Z_{n+1}$ is a martingale in $\xi_{n+1},\ldots,\xi_1$.

The differences are, for $2\leq k\leq n$,

$Z_1-Z_0=t_{n+1}\delta_{n+1}+y_{n+1}\epsilon_{n+1}(1+\delta_{n+1}),$
$Z_k-Z_{k-1}=t_{n-k+2}\delta_{n-k+2}\prod_{\ell=n-k+3}^{n+1}(1+\delta_\ell)+y_{n-k+2}\epsilon_{n-k+2}\prod_{\ell=n-k+2}^{n+1}(1+\delta_\ell),$
$Z_{n+1}-Z_n=y_1\epsilon_1\prod_{\ell=1}^{n+1}(1+\delta_\ell).$

They are bounded by

$|Z_1-Z_0|\leq c_1\equiv u(1+u)\big(|t_{n+1}|+|y_{n+1}|\big)=u(1+u)\big(|s_n|+|nc|\big),$
$|Z_k-Z_{k-1}|\leq c_k\equiv u(1+u)^{k-1}\big(|t_{n-k+2}|+|y_{n-k+2}|\big),\qquad 2\leq k\leq n,$
$|Z_{n+1}-Z_n|\leq c_{n+1}\equiv u(1+u)^n|y_1|.$

Lemma 1.2 implies that for any $0<\delta<1$, with probability at least $1-\delta$,

$|Z_{n+1}-Z_0|\leq\sqrt{\sum_{k=1}^{n+1}c_k^2}\sqrt{2\ln(2/\delta)}.$

Here $|Z_{n+1}-Z_0|=|\hat{s}_n-s_n|$. With $v_1=|y_1|$ and $v_k=|t_k|+|y_k|$, $2\leq k\leq n+1$, the sum under the square root is bounded by

$\sum_{k=1}^{n+1}c_k^2\leq\sum_{k=1}^{n+1}u^2(1+u)^{2k}|v_{n+2-k}|^2\leq\max_{1\leq j\leq n+1}|v_j|^2\,u^2\sum_{k=1}^{n+1}(1+u)^{2k}.$

The trailing term is a geometric sum,

$\sum_{k=1}^{n+1}(1+u)^{2k}\leq\frac{(1+u)^{2(n+2)}-1}{(1+u)^2-1}=\frac{\gamma_{2(n+2)}}{u^2+2u}.$

Combining everything gives $\sqrt{\sum_{k=1}^{n+1}c_k^2}\leq\max_{1\leq j\leq n+1}|v_j|\sqrt{u\gamma_{2(n+2)}/2}$. The inequality for $\sqrt{u\gamma_{2(n+2)}/2}$ follows from [9, Lemma 3.1].

Theorem 3.2 suggests that the error has a bound proportional to $\sqrt{n+2}$, and that shifted sequential summation improves the accuracy over plain sequential summation if

$\max_k\big(|s_k-kc|+|x_k-c|\big)\ll\max_k\big(|s_k|+|x_k|\big),$

that is, if shifting reduces the magnitude of the partial sums. This is confirmed by numerical experiments. However, the experiments also illustrate that shifted summation can decrease the accuracy in the summation of naturally centered data, such as when the $x_k$ are normal(0,1) random variables.
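To put the $\sqrt{n+2}$ factor in perspective: in double precision with $u=2^{-53}\approx 1.1\cdot 10^{-16}$, $n=10^6$ summands, and $\delta=10^{-2}$, the factor in Theorem 3.2 is approximately $\sqrt{n+2}\,u\sqrt{2\ln(2/\delta)}\approx 10^3\cdot 1.1\cdot 10^{-16}\cdot 3.26\approx 3.6\cdot 10^{-13}$, whereas the corresponding factor in the deterministic bound (13) grows like $nu\approx 1.1\cdot 10^{-10}$.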

3.2 Shifted general summation

We present a probabilistic bound for shifted general summation under Model 1.2, based on the following computational tree. If Algorithm 1 operates with a tree of height $h$, then the analogous version in Algorithm 2 operates with a tree of height $h+2$ on the $2n+1$ input data

$x_1,\,-c,\,x_2,\,-c,\,\ldots,\,x_n,\,-c,\,nc.$

The leading $2n$ inputs are centered in pairs $y_k=x_k-c$ at the beginning, the intermediate sums are $t_k$, and the final uncentering is $s_n=t_n+nc$.

Theorem 3.4 (Probabilistic bound for shifted general summation).

Let the $\delta_k$ be mean-independent mean-zero random variables, $h$ the height of the computational tree in the inner call to Algorithm 1, and $\lambda\equiv\sqrt{2\ln(2(2n+1)/\eta)}$. For any $\eta>0$ with $\lambda\sqrt{h+2}\,u<1$ and $\delta\in(0,1)$, with probability at least $1-(\delta+\eta)$, the error in Algorithm 2 is bounded by

$|\hat{s}_n-s_n|\leq\frac{u}{1-\lambda\sqrt{h+2}\,u}\Big(s_n^2+\sum_{k=2}^n t_k^2+\sum_{k=1}^n y_k^2\Big)^{1/2}\sqrt{2\ln(2/\delta)}.$

Proof 3.5.

This is a direct consequence of Theorem 2.12.

A similar bound under Model 1.1 follows from Theorem 2.8.

Theorem 3.6 (Probabilistic bound for shifted general summation).

Let the $\delta_k$ be independent mean-zero random variables, $h$ the height of the computational tree in the inner call to Algorithm 1, and $\lambda\equiv\sqrt{2\ln(2(2n+1)/\eta)}$. For any $\eta>0$ with $\lambda\sqrt{h+2}\,u<1$ and $\delta\in(0,1)$, with probability at least $1-(\delta+\eta)$, the error in Algorithm 2 is bounded by

$|\hat{s}_n-s_n|\leq u\exp(\lambda\sqrt{h+2}\,u)\Big(s_n^2+\sum_{k=2}^n t_k^2+\sum_{k=1}^n y_k^2\Big)^{1/2}\sqrt{2\ln(2/\delta)}.$

4 Compensated sequential summation

We analyse the forward error for compensated sequential summation.

After presenting compensated summation (Algorithm 3) and the roundoff error model (14), we derive three different expressions for the compensated summation error (Section 4.1), followed by bounds that are exact to first order (Section 4.2) and to second order (Section 4.3).

Algorithm 3 is the formulation [7, Theorem 8] of the ‘Kahan Summation Formula’ [14]. A version with opposite signs is presented in [9, Algorithm 4.2].

Algorithm 3 Compensated Summation [7, Theorem 8] [15, page 9-4]
Input: Summands $x_k$, $1\leq k\leq n$
Output: $s_n=\sum_{k=1}^n x_k$
1:  $s_1=x_1$, $c_1=0$
2:  for $k=2:n$ do
3:     $y_k=x_k-c_{k-1}$
4:     $s_k=s_{k-1}+y_k$
5:     $c_k=(s_k-s_{k-1})-y_k$
6:  end for
7:  return $s_n$
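Algorithm 3 transcribes directly into Julia. Note that the parentheses in line 5 must be respected: a compiler that 'simplifies' $(s_k-s_{k-1})-y_k$ to zero in exact arithmetic destroys the correction [7].

# Direct Julia transcription of Algorithm 3 (compensated summation).
function compensated_sum(x)
    s = x[1]
    c = zero(eltype(x))                 # running correction c_k
    for k in 2:length(x)
        y = x[k] - c                    # line 3: correct the next summand
        t = s + y                       # line 4: s_k = s_{k-1} + y_k
        c = (t - s) - y                 # line 5: rounding error of the addition
        s = t
    end
    return s
end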

Following [15, page 9-5], our finite precision model of Algorithm 3 is

(14) $\hat{s}_1=s_1=x_1,\qquad\hat{c}_1=0,$
$\hat{y}_k=(x_k-\hat{c}_{k-1})(1+\eta_k),\qquad 2\leq k\leq n,\qquad\eta_2=0,$
$\hat{s}_k=(\hat{s}_{k-1}+\hat{y}_k)(1+\sigma_k),$
$\hat{c}_k=\big((\hat{s}_k-\hat{s}_{k-1})(1+\delta_k)-\hat{y}_k\big)(1+\beta_k).$

4.1 Error expressions

We express the accumulated roundoff in Algorithm 3 as a recursive matrix-vector product (Lemma 4.1), then unravel the recursions into an explicit form (Theorem 4.3), followed by two other exact expressions derived in a different way (Theorem 4.5).

Lemma 4.1.

With assumptions (14), and

$\Gamma_k\equiv(1+\sigma_k)(1+\delta_k),\qquad\Psi_k\equiv(1+\delta_k)(1+\beta_k),\qquad 2\leq k\leq n,$

define

$P_k\equiv\begin{bmatrix}1+\sigma_k&-(1+\eta_k)(1+\sigma_k)\\ \sigma_k\Psi_k&(1+\eta_k)(1-\Gamma_k)(1+\beta_k)\end{bmatrix},\qquad 2\leq k\leq n.$

Then

$\begin{bmatrix}e_k\\ \hat{c}_k\end{bmatrix}=P_k\begin{bmatrix}e_{k-1}\\ \hat{c}_{k-1}\end{bmatrix}+P_k\begin{bmatrix}s_{k-1}\\ -x_k\end{bmatrix}+\begin{bmatrix}-s_k\\ 0\end{bmatrix},\qquad 2\leq k\leq n.$

Proof 4.2.

Since $e_1=\hat{c}_1=0$,

$\begin{bmatrix}e_2\\ \hat{c}_2\end{bmatrix}=P_2\begin{bmatrix}x_1\\ -x_2\end{bmatrix}+\begin{bmatrix}-s_2\\ 0\end{bmatrix}.$

For the subsequent recursions abbreviate

$p_{11}=1+\sigma_k,\quad p_{12}=-(1+\eta_k)(1+\sigma_k),\quad p_{21}=\sigma_k\Psi_k=\sigma_k(1+\delta_k)(1+\beta_k),$
$p_{22}=(1+\eta_k)(1-\Gamma_k)(1+\beta_k)=(1+\eta_k)\big(1-(1+\sigma_k)(1+\delta_k)\big)(1+\beta_k).$

To derive the recurrence for $e_k$, write

(15) $\hat{s}_k=(\hat{s}_{k-1}+\hat{y}_k)(1+\sigma_k)=(e_{k-1}+s_{k-1}+\hat{y}_k)(1+\sigma_k).$

Thus

$e_k=\hat{s}_k-s_k=(e_{k-1}+s_{k-1})(1+\sigma_k)-(\hat{c}_{k-1}-x_k)(1+\eta_k)(1+\sigma_k)-s_k=p_{11}(e_{k-1}+s_{k-1})+p_{12}(\hat{c}_{k-1}-x_k)-s_k.$

To derive the recurrence for $\hat{c}_k$, start with (15):

$\hat{s}_k-\hat{s}_{k-1}=\hat{s}_{k-1}\sigma_k+\hat{y}_k(1+\sigma_k)=(e_{k-1}+s_{k-1})\sigma_k+\hat{y}_k(1+\sigma_k).$

Multiply by $1+\delta_k$ and subtract $\hat{y}_k=(x_k-\hat{c}_{k-1})(1+\eta_k)$:

$(\hat{s}_k-\hat{s}_{k-1})(1+\delta_k)-\hat{y}_k=(e_{k-1}+s_{k-1})\sigma_k(1+\delta_k)+\hat{y}_k\big(\underbrace{(1+\sigma_k)(1+\delta_k)}_{\Gamma_k}-1\big)=(e_{k-1}+s_{k-1})\sigma_k(1+\delta_k)+(\hat{c}_{k-1}-x_k)(1+\eta_k)(1-\Gamma_k).$

Multiply by $1+\beta_k$ to obtain $\hat{c}_k=p_{21}(e_{k-1}+s_{k-1})+p_{22}(\hat{c}_{k-1}-x_k)$.

Now we unravel this recursion, so that each matrix product ends with a matrix $\widetilde{P}_j$ all of whose elements are $\mathcal{O}(u)$. Note the correspondence between (16) and the error expression in (4).

Theorem 4.3.

With assumptions (14), the recursions in Lemma 4.1 have the explicit form

(16) $\begin{bmatrix}e_k\\ \hat{c}_k\end{bmatrix}=\sum_{j=2}^{k-1}(P_k\cdots P_{j+1})\widetilde{P}_j\begin{bmatrix}s_j\\ x_j\end{bmatrix}+\widetilde{P}_k\begin{bmatrix}s_k\\ x_k\end{bmatrix},\qquad 2\leq k\leq n,$

where

$\widetilde{P}_j\equiv\begin{bmatrix}\sigma_j&\eta_j(1+\sigma_j)\\ \sigma_j\Psi_j&(1+\beta_j)(\delta_j+\eta_j(\Gamma_j-1))\end{bmatrix}=\mathcal{O}(u),\qquad 2\leq j\leq n.$

Proof 4.4.

Write the last two summands in the recursion in Lemma 4.1 as

$P_k\begin{bmatrix}s_{k-1}\\ -x_k\end{bmatrix}+\begin{bmatrix}-s_k\\ 0\end{bmatrix}=P_k\begin{bmatrix}1&-1\\ 0&-1\end{bmatrix}\begin{bmatrix}s_k\\ x_k\end{bmatrix}-\begin{bmatrix}1&0\\ 0&0\end{bmatrix}\begin{bmatrix}s_k\\ x_k\end{bmatrix}=\widetilde{P}_k\begin{bmatrix}s_k\\ x_k\end{bmatrix},$

where

$\widetilde{P}_k=P_k\begin{bmatrix}1&-1\\ 0&-1\end{bmatrix}-\begin{bmatrix}1&0\\ 0&0\end{bmatrix}=\begin{bmatrix}\sigma_k&\eta_k(1+\sigma_k)\\ \sigma_k\Psi_k&(1+\beta_k)(\delta_k+\eta_k(\Gamma_k-1))\end{bmatrix}.$

Then the recursion in Lemma 4.1 is identical to

$\begin{bmatrix}e_k\\ \hat{c}_k\end{bmatrix}=P_k\begin{bmatrix}e_{k-1}\\ \hat{c}_{k-1}\end{bmatrix}+\widetilde{P}_k\begin{bmatrix}s_k\\ x_k\end{bmatrix},\qquad 2\leq k\leq n.$

Now unravel the recursion and terminate with $e_1=\hat{c}_1=0$.

To strengthen the justification for the first order bound in Section 4.2, we present two additional exact expressions for the error that are independent of Theorem 4.3.

Theorem 4.5.

With assumptions (14), the error in Algorithm 3 equals

$e_n=s_n\sigma_n+\sum_{j=4}^n x_j\eta_j\prod_{k=j}^n(1+\sigma_k)+\sum_{j=2}^{n-1}\big(s_j\sigma_j-\hat{c}_j(1+\eta_{j+1})\big)\prod_{k=j+1}^n(1+\sigma_k)$

and

$e_n=s_n\sigma_n+(X_n+E_{n-1})(1+\sigma_n),$

where

$X_n\equiv x_n\eta_n(1+\beta_n)+\sum_{j=2}^{n-1}x_j(\eta_j-\delta_j)\prod_{\ell=j}^{n-1}(1+\beta_\ell)(1+\eta_{\ell+1}),$
$\Theta_k\equiv 1-(1+\delta_k)(1+\beta_k)(1+\eta_{k+1}),\qquad 2\leq k\leq n-1,$
$E_{n-1}\equiv e_{n-1}\Theta_{n-1}+\sum_{j=2}^{n-2}e_j(\Theta_j+\delta_{j+1})\prod_{\ell=j+1}^{n-1}(1+\beta_\ell)(1+\eta_{\ell+1}).$

4.2 First order bound

We present a first-order expression and bound for the error in Algorithm 3, and note the discrepancy with existing bounds (Remark 4.8).

Corollary 4.6.

With assumptions (14), Algorithm 3 has the first order error

$e_n=\hat{s}_n-s_n=s_n\sigma_n+x_n\eta_n+\sum_{j=2}^{n-1}x_j(\eta_j-\delta_j)+\mathcal{O}(u^2),$

with the bound

$|\hat{s}_n-s_n|\leq 3u\sum_{j=1}^n|x_j|+\mathcal{O}(u^2).$

Proof 4.7.

This follows from the three expressions in Theorems 4.3 and 4.5, but is most easily seen from the second expression in Theorem 4.5. There the first order error is extracted from the summands $s_n\sigma_n+X_n(1+\sigma_n)$, where

$X_n=x_n\eta_n(1+\beta_n)+\sum_{j=2}^{n-1}x_j(\eta_j-\delta_j)\prod_{\ell=j}^{n-1}(1+\beta_\ell)(1+\eta_{\ell+1})=x_n\eta_n+\sum_{j=2}^{n-1}x_j(\eta_j-\delta_j)+\mathcal{O}(u^2).$

Corollary 4.6 suggests that the errors in the 'correction' steps 3 and 5 of Algorithm 3 dominate the first order error.

Remark 4.8.

The existing bounds in [7, Section 4.3, Appendix], [9, (4.9)], [15, page 9-5], and [16, Section 4.2.2] are not consistent with Corollary 4.6, which implies that the computed sum equals

$\hat{s}_n=s_n(1+\sigma_n)+\sum_{j=2}^{n-1}x_j(\eta_j-\delta_j)+x_n\eta_n+\mathcal{O}(u^2).$

In contrast, the expressions in [7, Theorem 8], [15, page 9-5], and [16, Section 4.2.2] are equal to

$\hat{s}_n=\sum_j x_j(1+\phi_j)+\mathcal{O}(nu^2)\sum_k|x_k|,\qquad\text{where}\quad|\phi_j|\leq 2u,$

implying the bound $|e_n|\leq 2u\sum_k|x_k|+\mathcal{O}(u^2)$.

The reason for the discrepancy with Corollary 4.6 may be the focus of [7, Section 4.3, Appendix] on the coefficients of $x_1$, which is subjected to one less error than the other $x_k$.

4.3 Second order deterministic and probabilistic bounds

For the second order error in Algorithm 3, we present an explicit expression and deterministic bound (Theorem 4.9) and a probabilistic bound (Theorem 4.11).

Theorem 4.9 specifies the second order coefficient in the existing bounds [7, Theorem 8], [9, (4.9)], [16, Section 4.2.2], and improves the coefficient $n^2u^2$ in [15, page 9-5] and [19, Section 4].

Theorem 4.9 (Explicit expression and deterministic bound).

With assumptions (14), define $\mu_k\equiv\eta_k-\delta_k$, $2\leq k\leq n-1$, and $\mu_n\equiv\eta_n$. Then Algorithm 3 has the second order error

$e_n=s_n\sigma_n+(1+\sigma_n)\sum_{k=2}^n x_k\mu_k-\sum_{k=2}^{n-1}s_k\sigma_k(\mu_{k+1}+\delta_k+\beta_k)-\sum_{k=2}^{n-1}x_k\delta_k(\mu_{k+1}+\beta_k+\eta_k)+\mathcal{O}(u^3),$

with the bound

$|\hat{s}_n-s_n|\leq(3u+4nu^2)\sum_{k=1}^n|x_k|+\mathcal{O}(u^3).$

Proof 4.10.

For the second order error in Theorem 4.3, it suffices to compute the products $P_k\cdots P_{j+1}$ to first order only. The matrices in Lemma 4.1 equal, to first order,

$P_k=\begin{bmatrix}1+\sigma_k&-(1+\sigma_k+\eta_k)\\ \sigma_k&-(\sigma_k+\delta_k)\end{bmatrix}+\mathcal{O}(u^2).$

Induction shows that $P_k\cdots P_{j+1}=Q_{k:j+1}+\mathcal{O}(u^2)$ for $k>j+1$, where

(17) $Q_{k:j+1}\equiv\begin{bmatrix}1+\sigma_k&\mu_{j+1}-(1+\sigma_k)\\ \sigma_k&-\sigma_k\end{bmatrix},\qquad k>j+1.$

The matrix in Theorem 4.3 is, to second order,

$\widetilde{P}_k=\begin{bmatrix}\sigma_k&\eta_k(1+\sigma_k)\\ \sigma_k(1+\delta_k+\beta_k)&(1+\beta_k)\delta_k+\eta_k(\delta_k+\sigma_k)\end{bmatrix}+\mathcal{O}(u^3),\qquad 2\leq k\leq n.$

This implies for the matrix products with $n>j+1$ in Theorem 4.3

$Q_{n:j+1}\widetilde{P}_j=\begin{bmatrix}-\sigma_j(\mu_{j+1}+\delta_j+\beta_j)&(1+\sigma_n)\mu_j-\delta_j(\mu_{j+1}+\beta_j+\eta_j)\\ 0&\sigma_n\mu_j\end{bmatrix}+\mathcal{O}(u^3).$

For $j=n-1$ we have

$P_n\widetilde{P}_{n-1}=\begin{bmatrix}-\sigma_{n-1}(\eta_n+\delta_{n-1}+\beta_{n-1})&(1+\sigma_n)\mu_{n-1}-\delta_{n-1}(\eta_n+\beta_{n-1}+\eta_{n-1})\\ -\sigma_{n-1}\delta_n&\sigma_n\mu_{n-1}-\delta_n\delta_{n-1}\end{bmatrix}.$

Substitute the above products into the recurrence for the second order error

$\begin{bmatrix}e_n\\ \hat{c}_n\end{bmatrix}=\widetilde{P}_n\begin{bmatrix}s_n\\ x_n\end{bmatrix}+P_n\widetilde{P}_{n-1}\begin{bmatrix}s_{n-1}\\ x_{n-1}\end{bmatrix}+\sum_{j=2}^{n-2}Q_{n:j+1}\widetilde{P}_j\begin{bmatrix}s_j\\ x_j\end{bmatrix}+\mathcal{O}(u^3).$

The error is bounded up to second order by

$|e_n|\leq(3u+6u^2)\sum_{k=1}^n|x_k|+4u^2\sum_{k=2}^{n-1}|s_k|\leq\big(3u+6u^2+4(n-2)u^2\big)\sum_{k=1}^n|x_k|\leq(3u+4nu^2)\sum_{k=1}^n|x_k|+\mathcal{O}(u^3).$

The probabilistic bound below is derived from the explicit expression in Theorem 4.9.

Theorem 4.11 (Probabilistic second order bound).

Let the roundoffs $\sigma_j$, $\eta_j$, $\delta_j$, $\beta_j$ in (14) be independent zero-mean random variables. Then for any $0<\delta<1$, with probability at least $1-\delta$, the error in Algorithm 3 is bounded by

$|\hat{s}_n-s_n|\leq u\left(2(1+3u)\Big(\sum_{k=1}^n|x_k|^2\Big)^{1/2}+\Big(|s_n|^2+16u^2\sum_{k=1}^{n-1}|s_k|^2\Big)^{1/2}\right)\sqrt{2\ln(2/\delta)}+\mathcal{O}(u^3)$
$\leq u\left(2(1+3u)\|\mathbf{x}\|_2+\sqrt{1+16(n-2)u^2}\,\|\mathbf{x}\|_1\right)\sqrt{2\ln(2/\delta)}+\mathcal{O}(u^3),$

where $\mathbf{x}=\begin{bmatrix}x_1&\cdots&x_n\end{bmatrix}^T$.

Proof 4.12.

Analogously to the proof of Theorem 3.2, we construct a martingale by unravelling the second order error in Theorem 4.9 in a last-in first-out manner. Let $\mu_k=\eta_k-\delta_k$, $2\leq k\leq n-1$, $\mu_n=\eta_n$, $Z_0=0$, and

$Z_1=s_n\sigma_n+x_n\mu_n(1+\sigma_n),$
$Z_k=s_n\sigma_n+x_n\mu_n(1+\sigma_n)-\sum_{j=n-k+1}^{n-1}s_j\sigma_j(\mu_{j+1}+\delta_j+\beta_j)+\sum_{j=n-k+1}^{n-1}x_j\big(\mu_j(1+\sigma_n)-\delta_j(\mu_{j+1}+\beta_j+\eta_j)\big),\qquad 2\leq k\leq n-1.$

Setting $\delta_n=\beta_n=\eta_2=0$ shows that $Z_1$ is a function of $\xi_n\equiv(\sigma_n,\eta_n,\delta_n,\beta_n)$, while $Z_k$ is a function of the $\xi_j\equiv(\sigma_j,\eta_j,\delta_j,\beta_j)$ for $n-k+1\leq j\leq n-1$.

By assumption, the $\xi_j\equiv(\sigma_j,\eta_j,\delta_j,\beta_j)$ are tuples of $4(n-1)$ independent random variables that have zero mean and are bounded, that is,

$\mathbb{E}[\sigma_j]=\mathbb{E}[\eta_j]=\mathbb{E}[\delta_j]=\mathbb{E}[\beta_j]=0\quad\text{and}\quad|\sigma_j|,|\eta_j|,|\delta_j|,|\beta_j|\leq u,\qquad 2\leq j\leq n.$

Thus $\mathbb{E}[Z_{k+1}\,|\,\xi_{n-k+1},\ldots,\xi_n]=Z_k$, $0\leq k\leq n-2$. Combine everything to conclude that $Z_0,\ldots,Z_{n-1}$ is a martingale in $\xi_n,\ldots,\xi_2$.

The differences are, for $1\leq k\leq n-2$,

$Z_1-Z_0=s_n\sigma_n+x_n\mu_n(1+\sigma_n),$
$Z_{k+1}-Z_k=-s_{n-k}\sigma_{n-k}(\mu_{n-k+1}+\delta_{n-k}+\beta_{n-k})+x_{n-k}\big(\mu_{n-k}(1+\sigma_n)-\delta_{n-k}(\mu_{n-k+1}+\beta_{n-k}+\eta_{n-k})\big).$

Since $|\mu_k|\leq 2u$, $2\leq k\leq n-1$, the differences are bounded by

$|Z_1-Z_0|\leq c_1\equiv u(v_1+w_1),\qquad v_1\equiv|s_n|,\qquad w_1\equiv|x_n|(1+u),$
$|Z_{k+1}-Z_k|\leq c_{k+1}\equiv u(v_{k+1}+w_{k+1}),\qquad 1\leq k\leq n-2,$

where $v_{k+1}\equiv 4|s_{n-k}|u$ and $w_{k+1}\equiv 2|x_{n-k}|(1+3u)$.

Lemma 1.2 implies that for any $0<\delta<1$, with probability at least $1-\delta$,

$|\hat{s}_n-s_n|=|Z_{n-1}-Z_0|\leq\sqrt{\sum_{k=1}^{n-1}c_k^2}\sqrt{2\ln(2/\delta)}.$

The triangle inequality implies the first desired inequality via

$\sqrt{\sum_{k=1}^{n-1}c_k^2}\leq u\left(\sqrt{\sum_{k=1}^{n-1}v_k^2}+\sqrt{\sum_{k=1}^{n-1}w_k^2}\right),$

while the second one follows from $|s_k|\leq\|\mathbf{x}\|_1$, $2\leq k\leq n$.

The above immediately implies a first order bound.

Corollary 4.13.

Let the roundoffs $\sigma_j$, $\eta_j$, $\delta_j$, $\beta_j$ in (14) be independent zero-mean random variables. Then for any $0<\delta<1$, with probability at least $1-\delta$, the error in Algorithm 3 is bounded by

$|\hat{s}_n-s_n|\leq u\big(2\|\mathbf{x}\|_2+|s_n|\big)\sqrt{2\ln(2/\delta)}+\mathcal{O}(u^2),$

where $\mathbf{x}=\begin{bmatrix}x_1&\cdots&x_n\end{bmatrix}^T$.

5 Numerical experiments

After describing the setup, we present numerical experiments for shifted summation (Section 5.1), a comparison of shifted and compensated summation (Section 5.2), and compensated summation (Section 5.3).

The unit roundoffs in the experiments are [12]

  • Double precision $u=2^{-53}\approx 1.11\cdot 10^{-16}$. Here double precision is Julia Float64, while 'exact' computations are performed in Julia Float256.

  • Half precision $u=2^{-11}\approx 4.88\cdot 10^{-4}$. Here half precision is Julia Float16, while 'exact' computations are performed in Julia Float64.

We plot relative errors $|\hat{s}_n-s_n|/|s_n|$ rather than absolute errors to allow for meaningful calibration: relative errors $\leq u$ indicate full accuracy, while relative errors $\geq 0.5$ indicate zero accuracy.

For shifted summation we use the empirical mean of the two extreme summands,

$c=(\min_k x_k+\max_k x_k)/2.$

For probabilistic bounds, the failure probability is $\delta=10^{-2}$, hence $\sqrt{2\ln(2/\delta)}\approx 3.26$.

Two classes of summands will be considered:

  • $x_k=m+y_k$, where the $y_k$ are independent uniform$[0,1]$ random variables with $0\leq y_k\leq 1$; and $m=10^4$ in Sections 5.1 and 5.2, and $m=0$ in Section 5.3.
    Since all summands have the same sign, this is a well-conditioned problem with $|s_n|=\sum_{k=1}^n|x_k|$. The case $m=10^4$ is designed to increase the accuracy of shifted summation, since the $x_k$ are tightly clustered at $10^4$.

  • $x_k$ are normal$(0,1)$ random variables with mean 0 and variance 1. Summands have different signs, which may lead to cancellation. The $x_k$ are naturally clustered around zero.

The plots show relative errors $|\hat{s}_n-s_n|/|s_n|$ versus $n$, where $10\leq n\leq n_{\max}$. To speed up the computations, we compute $x_1,\ldots,x_{n_{\max}}$ once, and then pick the leading $n$ elements as $x_1,\ldots,x_n$.
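A sketch of this setup in Julia, reusing the summation routines sketched in earlier sections and using BigFloat for the reference sum (the paper's experiments use a wider fixed format for the 'exact' value):

using Random

# Relative error of a summation method against a high-precision reference.
function relerr(method, x)
    sn = Float64(sum(BigFloat.(x)))     # 'exact' sum in extended precision
    return abs(method(x) - sn) / abs(sn)
end

Random.seed!(0)
x = 1e4 .+ rand(10^6)                   # x_k = m + uniform[0,1] with m = 10^4
relerr(sequential_sum, x)               # plain sequential summation
relerr(shifted_sum, x)                  # shifted summation
relerr(compensated_sum, x)              # compensated summation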

5.1 Errors and bounds for sequential summation

We compare the errors from sequential versions of Algorithms 1 and 2, that is, sequential summation without and with shifting, against a simple relative version of the probabilistic bound in Theorem 3.2,

(18) $u\sqrt{n+2}\,\max_{1\leq k\leq n+1}\frac{|s_k-kc|+|x_k-c|}{|s_n|}\,\sqrt{2\ln(2/\delta)}.$
Figure 2: Relative errors in plain summation (green +) and summation with shift (blue x), together with the bound (18) (red triangle). Left panel: $x_k=10^4+\mathrm{uniform}[0,1]$. Right panel: $x_k=\mathrm{normal}(0,1)$.

In Figure 2, the range for the number of summands is $n=10:10^4:10^6$, and the working precision is double with $u\approx 10^{-16}$. Figure 2 illustrates that, compared to plain summation, shifted summation can increase the accuracy when summands are artificially and tightly clustered at a large mean; however, it can hurt the accuracy when summands are naturally clustered around zero. For the normal(0,1) summands in the right panel, the range of shifts is $0.003\leq c\leq 1.2$. The expression (18) represents an upper bound accurate to within a factor of 10-100.

5.2 Comparison of shifted and compensated summation

We compare the errors from sequential summation with shifting (sequential version of Algorithm 2) and compensated summation (Algorithm 3).

Figure 3: Relative errors in plain summation (green +), summation with shift (blue x), and compensated summation (red triangle). Left panel: $x_k=10^4+\mathrm{uniform}[0,1]$. Right panel: $x_k=\mathrm{normal}(0,1)$.

In Figure 3, the range for the number of summands is $n=10:10^4:10^6$, and the working precision is double with $u\approx 10^{-16}$. Figure 3 illustrates that compensated summation is at least as accurate as, and can be much more accurate than, shifted summation. In both panels, compensated summation is accurate to machine precision.

5.3 Errors and bounds for compensated summation

We compare the errors from sequential summation (sequential version of Algorithm 1) and compensated summation (Algorithm 3) with the relative version of the first order probabilistic bound in Corollary 4.13,

(19) $u\,\frac{2\|\mathbf{x}\|_2+|s_n|}{|s_n|}\sqrt{2\ln(2/\delta)},\qquad\mathbf{x}=\begin{bmatrix}x_1&\cdots&x_n\end{bmatrix}^T.$
Figure 4: Relative errors in plain summation (green +), compensated summation (blue x), and the bound (19) (red triangle). Left panel: $x_k=\mathrm{uniform}[0,1]$. Right panel: $x_k=\mathrm{normal}(0,1)$.

In Figure 4, the working precision is half with $u\approx 5\cdot 10^{-4}$. In the left panel the number of uniform(0,1) summands is $n=10:10^3:6\cdot 10^4$, to avoid overflow since the largest value in binary16 is about 65,504. In the right panel, in contrast, the number of normal(0,1) summands is $n=10:10^4:10^6$, as summands of different signs are less likely to cause overflow.

Figure 4 illustrates that, in contrast to plain summation, compensated summation is accurate to machine precision, even for $n>u^{-1}$ in the right panel, and that (19) represents an upper bound accurate to within a factor of 10.
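A sketch of the half precision comparison against (19), for a reduced problem size of our choosing:

# Compensated summation in Float16 vs the relative bound (19), delta = 0.01.
function check_bound19(n=10^4)
    x   = rand(n)                                  # uniform[0,1] summands
    sn  = Float64(sum(BigFloat.(x)))               # 'exact' reference sum
    u   = eps(Float16) / 2                         # u = 2^-11
    err = abs(Float64(compensated_sum(Float16.(x))) - sn) / abs(sn)
    bnd = u * (2 * sqrt(sum(abs2, x)) + abs(sn)) / abs(sn) * sqrt(2 * log(2 / 0.01))
    return err, bnd                                # typically err << bnd
end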

Acknowledgement

We thank Johnathan Rhyne for helpful discussions.

References

  • [1] P. Blanchard, N. J. Higham, and T. Mary, A class of fast and accurate summation algorithms, SIAM J. Sci. Comput., 42 (2020), pp. A1541–A1557.
  • [2] F. Chung and L. Lu, Concentration inequalities and martingale inequalities: a survey, Internet Math., 3 (2006), pp. 79–127.
  • [3] M. P. Connolly, N. J. Higham, and T. Mary, Stochastic rounding and its probabilistic backward error analysis, SIAM J. Sci. Comput., 43 (2021), pp. A566–A585.
  • [4] G. Constantinides, F. Dahlqvist, Z. Rakamaric, and R. Salvia, Rigorous roundoff error analysis of probabilistic floating-point computations, 2021, https://arxiv.org/abs/2105.13217.
  • [5] F. Dahlqvist, R. Salvia, and G. A. Constantinides, A probabilistic approach to floating-point arithmetic, 2019, https://arxiv.org/abs/1912.00867.
  • [6] J. Demmel and Y. Hida, Accurate and efficient floating point summation, SIAM J. Sci. Comput., 25 (2003/04), pp. 1214–1248.
  • [7] D. Goldberg, What every computer scientist should know about floating-point arithmetic, ACM Comput. Surv., 23 (1991), pp. 5–48.
  • [8] E. Hallman, A refined probabilistic error bound for sums, 2021, https://arxiv.org/abs/2104.06531.
  • [9] N. J. Higham, Accuracy and stability of numerical algorithms, SIAM, Philadelphia, second ed., 2002.
  • [10] N. J. Higham and T. Mary, A new approach to probabilistic rounding error analysis, SIAM J. Sci. Comput., 41 (2019), pp. A2815–A2835.
  • [11] N. J. Higham and T. Mary, Sharper probabilistic backward error analysis for basic linear algebra kernels with random data, SIAM J. Sci. Comput., 42 (2020), pp. A3427–A3446.
  • [12] IEEE Computer Society, IEEE Standard for Floating-Point Arithmetic, IEEE Standard 754–2008, 2019. http://ieeexplore.ieee.org/document/4610935.
  • [13] I. C. F. Ipsen and H. Zhou, Probabilistic error analysis for inner products, SIAM J. Matrix Anal. Appl., 41 (2020), pp. 1726–1741.
  • [14] W. Kahan, Further remarks on reducing truncation errors, Comm. ACM, 8 (1965), p. 40.
  • [15] W. Kahan, Implementation of algorithms (lecture notes by W. S. Haugeland and D. Hough), Tech. Report 20, Department of Computer Science, University of California, Berkeley, CA 94720, 1973.
  • [16] D. Knuth, The Art of Computer Programming, vol. II, Addison-Wesley, Reading, MA, third ed., 1998.
  • [17] D. Lohar, M. Prokop, and E. Darulova, Sound probabilistic numerical error analysis, in Intern. Conf. Integrated Formal Methods, Springer, 2019, pp. 322–340.
  • [18] M. Mitzenmacher and E. Upfal, Probability and computing: randomization and probabilistic techniques in algorithms and data analysis, Cambridge University Press, 2005.
  • [19] A. Neumaier, Rundungsfehleranalyse einiger Verfahren zur Summation endlicher Summen, Z. Angew. Math. Mech., 54 (1974), pp. 39–51.
  • [20] S. Roch, Modern discrete probability: An essential toolkit, University Lecture, (2015).