Deterministic and Probabilistic Error Bounds for Floating Point Summation Algorithms

Eric Hallman and Ilse C.F. Ipsen

This research was supported in part by the National Science Foundation through grants DMS-1745654 and DMS-1760374.
Abstract
We analyse the forward error in the floating point summation of real numbers, for algorithms that do not require recourse to higher precision or better hardware. We derive informative explicit expressions, and new deterministic and probabilistic bounds for errors in three classes of algorithms: general summation, shifted general summation, and compensated (sequential) summation. Our probabilistic bounds for general and shifted general summation hold to all orders. For compensated summation, we also present deterministic and probabilistic first and second order bounds, with a first order bound that differs from existing ones. Numerical experiments illustrate that the bounds are informative and that, among the three algorithm classes, compensated summation is generally the most accurate method.
keywords:
Rounding error analysis, floating-point arithmetic, random variables, martingales

AMS subject classifications: 65G99, 60G42, 60G50
1 Introduction
Given real numbers $x_1,\ldots,x_n$, we consider algorithms for computing the sum $s_n=\sum_{j=1}^n x_j$ in floating point arithmetic. The algorithms operate in one and only one precision, without any recourse to higher precision as in [1], or wider accumulators as in [6].
We analyse the forward error in the computed sum in terms of the unit roundoff $u$, for three classes of algorithms: general summation (Section 2), shifted general summation (Section 3), and compensated summation (Section 4). Numerical experiments (Section 5) illustrate that the bounds are informative, and that, among the three classes of algorithms, compensated summation is the most accurate method. Hybrid algorithms as in [1] will be analysed in future work.
1.1 Contributions
We present informative bounds, based on clear and structured derivations, for three classes of summation algorithms.
General summation
We derive explicit expressions for the error (Lemmas 2.2 and 2.3), as well as deterministic bounds that hold to all orders (Theorem 2.4) and depend on the height of the computational tree associated with the particular summation order. We present two probabilistic bounds that treat the roundoffs as random variables, one for independent roundoffs (Theorem 2.8) and one for mean-independent roundoffs (Theorem 2.12). Both bounds hold to all orders, grow only with the square root of the height of the computational tree, and suggest that reducing the tree height increases the accuracy.
Shifted general summation
We present the extension of shifted sequential summation to shifted general summation (Algorithm 2). For the special case of shifted sequential summation we present an explicit expression for the error, and a simple probabilistic bound for independent roundoffs (Theorem 3.2). For shifted general summation, we deduce probabilistic bounds for mean-independent roundoffs (Theorem 3.4) and independent roundoffs (Theorem 3.6) as respective special cases of Theorems 2.12 and 2.8.
Compensated summation
We derive three different explicit expressions for the error (Theorems 4.3 and 4.5), and bounds that are valid to first and second order, both deterministic (Corollary 4.6, Theorem 4.9) and probabilistic (Theorem 4.11, Corollary 4.13). In particular, we note a discrepancy by a constant factor between existing bounds and ours in Corollary 4.6 (Remark 4.8).
1.2 Models for roundoff error
Our analyses assume that the inputs are floating point numbers, i.e., they can be stored exactly without error, and that the summation produces no overflow or underflow. Let $u$ denote the unit roundoff.
Standard model [9]
This holds for IEEE floating-point arithmetic and implies that, in the absence of underflow or overflow, the operations $\mathrm{op}\in\{+,-,\times,/\}$, when applied to floating point inputs $x$ and $y$, produce
\[
  \mathrm{fl}(x \,\mathrm{op}\, y) = (x \,\mathrm{op}\, y)(1+\delta), \qquad |\delta| \le u.
\tag{1}\]
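As a quick illustration (ours, not part of the analysis), the sketch below recovers the rounding error $\delta$ of a single Float64 addition by comparing against a BigFloat reference; under round-to-nearest, $|\delta|$ never exceeds $u = 2^{-53}$.

```julia
# Minimal sketch: observe delta in fl(x + y) = (x + y)(1 + delta) for Float64.
x, y = 0.1, 0.2
fl_sum = x + y                                   # one rounded floating point addition
exact  = big(x) + big(y)                         # inputs are exact; BigFloat reference
delta  = Float64((big(fl_sum) - exact) / exact)  # relative error of the single flop
u = 2.0^-53                                      # unit roundoff of Float64
println("delta = ", delta, "   |delta| <= u: ", abs(delta) <= u)
```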
In order to derive probabilistic bounds, we treat the roundoffs $\delta$ as zero-mean random variables, according to two models that differ in their assumption of independence.
Model 1.1.
In the computation of interest, the quantities $\delta$ in the model (1) associated with every pair of operands are independent random variables of mean zero.
Model 1.2.
Let the computation of interest generate rounding errors $\delta_1,\delta_2,\ldots$ in that order. The $\delta_k$ are random variables of mean zero such that
\[
  \mathbb{E}[\delta_k \mid \delta_1,\ldots,\delta_{k-1}] = 0.
\]
Higham and Mary [11] note that the mean independence assumption of Model 1.2 is weaker than the independence assumption of Model 1.1, but stronger than the assumption that the errors are uncorrelated. Connolly et al. [3] observe that at least one mode of stochastic rounding produces errors satisfying Model 1.2, but with the weaker bound $|\delta|\le 2u$ in place of (1).
1.3 Probability theory
For the derivation of the probabilistic bounds, we need a martingale and a concentration inequality.
Definition 1.1 (Martingale [18]).
A sequence of random variables $Z_1,\ldots,Z_n$ is a martingale with respect to the sequence $X_1,\ldots,X_n$ if, for all $1\le k\le n$,

- $Z_k$ is a function of $X_1,\ldots,X_k$,
- $\mathbb{E}[|Z_k|]<\infty$, and
- $\mathbb{E}[Z_{k+1}\mid X_1,\ldots,X_k]=Z_k$ for $1\le k\le n-1$.
Lemma 1.2 (Azuma-Hoeffding inequality [20]).
Let $Z_1,\ldots,Z_n$ be a martingale with respect to a sequence $X_1,\ldots,X_n$, and let $c_1,\ldots,c_{n-1}$ be constants with
\[
  |Z_{k+1}-Z_k| \le c_k, \qquad 1\le k\le n-1.
\]
Then for any $0<\delta<1$, with probability at least $1-\delta$,
\[
  |Z_n - Z_1| \le \sqrt{2\ln(2/\delta)}\,\Big(\sum_{k=1}^{n-1} c_k^2\Big)^{1/2}.
\tag{2}\]
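A numerical sanity check (ours): a random walk with steps uniform on $[-1,1]$ is a martingale with increments bounded by $c_k = 1$, so (2) should fail with probability at most $\delta$.

```julia
# Sketch: empirical failure rate of the Azuma-Hoeffding bound (2) for a random walk.
using Random

function azuma_check(n, trials, delta)
    bound = sqrt(2 * log(2 / delta)) * sqrt(n - 1)  # all increment bounds c_k equal 1
    failures = 0
    for _ in 1:trials
        Z = cumsum(2 .* rand(n) .- 1)   # martingale: partial sums of mean-zero steps
        failures += abs(Z[end] - Z[1]) > bound
    end
    return failures / trials
end

Random.seed!(0)
println("observed failure rate: ", azuma_check(1_000, 10_000, 0.01), "  (allowed: 0.01)")
```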
2 General summation
We analyse the error for general summation, presented in Algorithm 1. After representing the summation algorithm as a computational tree, we present error expressions and a deterministic error bound (Section 2.1), deterministic and probabilistic bounds valid to first order (Section 2.2), followed by two probabilistic bounds, one for Model 1.1 (Section 2.3) and one for Model 1.2 (Section 2.4).
Denote by $s_k$ the exact partial sum at node $k$, by $\hat{s}_k$ the corresponding computed sum, and by $e_k \equiv \hat{s}_k - s_k$ the forward error.
Computational tree for Algorithm 1
We represent the specific ordering of summations in Algorithm 1 by a binary tree with $2n-1$ vertices and the following properties:

- Each vertex represents a partial sum $s_k$ or an input $x_j$.
- Each operation $s_k = s_{k_1} + s_{k_2}$ gives rise to two edges, from $s_k$ to each of its children $s_{k_1}$ and $s_{k_2}$.
- The root is the output, the full sum, and the leaves are the inputs $x_1,\ldots,x_n$.
Figure 1 shows two examples, one for sequential (recursive) summation and one for pairwise summation.
Definition 2.1.
Consider the computational tree associated with Algorithm 1 applied to $n$ inputs.

- The tree defines a partial ordering: if node $j$ is a descendant of node $k$, we write $j \prec k$, and $j \preceq k$ if $j = k$ is also possible.
- The height of a node is the length of the longest downward path from that node to a leaf. Leaves have a height of zero.
- The height of the tree is the height of its root. Sequential (recursive) summation yields a tree of height $n-1$, while pairwise summation yields a tree of height $\lceil\log_2 n\rceil$; see Figure 1 and the sketch below.
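For concreteness, here is a minimal sketch (ours; the function names are hypothetical) of the two extreme orderings. The midpoint split in `pairwise_sum` gives recursion depth $\lceil\log_2 n\rceil$.

```julia
# Sequential (recursive) summation: computational tree of height n - 1.
function sequential_sum(x::AbstractVector{T}) where T<:AbstractFloat
    s = zero(T)
    for xi in x
        s += xi                     # one rounding error per addition
    end
    return s
end

# Pairwise summation: computational tree of height ceil(log2(n)).
function pairwise_sum(x::AbstractVector{T}) where T<:AbstractFloat
    n = length(x)
    n == 0 && return zero(T)
    n == 1 && return x[1]
    m = n >> 1                      # split at the midpoint
    return pairwise_sum(view(x, 1:m)) + pairwise_sum(view(x, m+1:n))
end
```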
2.1 Explicit expressions and deterministic bounds for the error
We present two expressions for the error in Algorithm 1 (Lemmas 2.2 and 2.3), each geared towards a different model for the subsequent probabilistic bounds; and two deterministic bounds (Theorem 2.4).

If the computed children are $\hat{s}_{k_1}$ and $\hat{s}_{k_2}$, then the computed parent sum in line 3 of Algorithm 1 is
\[
  \hat{s}_k = (\hat{s}_{k_1} + \hat{s}_{k_2})(1+\delta_k), \qquad |\delta_k|\le u.
\]
Highlight the errors in the children,
\[
  \hat{s}_{k_1} = s_{k_1} + e_{k_1}, \qquad \hat{s}_{k_2} = s_{k_2} + e_{k_2},
\]
to obtain the error in the parent
\[
  e_k = (e_{k_1} + e_{k_2})(1+\delta_k) + s_k\,\delta_k.
\tag{3}\]
This means the error at node $k$ equals the sum of the children errors, perturbed slightly, plus a local error $s_k\delta_k$ that is proportional to the exact partial sum $s_k$.
The unravelled recursions in Lemma 2.2 generalize the explicit expression for sequential (recursive) summation in [8, Lemma 3.1],
\[
  \hat{s}_n - s_n = \sum_{k=2}^{n} s_k\,\delta_k \prod_{j=k+1}^{n}(1+\delta_j),
\tag{4}\]
to general summation algorithms.
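The expression (4) can be verified numerically; the sketch below (ours) extracts each rounding error $\delta_k$ in BigFloat arithmetic and checks that the right-hand side of (4) reproduces the actual forward error.

```julia
# Sketch: verify the sequential error expression (4) for random Float64 summands.
function check_expression4(x::Vector{Float64})
    n = length(x)
    shat = similar(x); shat[1] = x[1]         # computed partial sums
    delta = zeros(BigFloat, n)                # extracted rounding errors
    for k in 2:n
        shat[k] = shat[k-1] + x[k]
        delta[k] = big(shat[k]) / (big(shat[k-1]) + big(x[k])) - 1  # assumes nonzero sums
    end
    s = [sum(big.(x[1:k])) for k in 1:n]      # exact partial sums s_k
    rhs = big(0.0)
    for k in 2:n
        p = big(1.0)
        for j in k+1:n
            p *= 1 + delta[j]                 # product of (1 + delta_j) over later errors
        end
        rhs += s[k] * delta[k] * p
    end
    return Float64(big(shat[n]) - s[n] - rhs) # difference should be ~0
end

println(check_expression4(rand(100)))
```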
We present two expressions that differ in their usefulness under particular probabilistic models. The expression in Lemma 2.2 works well with Model 1.1 because its sum is a martingale if summed in reverse order; see Section 2.3. Unfortunately, this approach does not work under the weaker Model 1.2. This is the reason for Lemma 2.3, where the sum is a martingale under the original ordering; see Section 2.4.
Lemma 2.2 (First explicit expression).

The error in Algorithm 1 equals
\[
  e_{\mathrm{root}} = \sum_{k} s_k\,\delta_k \prod_{j \succ k}(1+\delta_j),
\tag{5}\]
where the sum ranges over the internal (non-leaf) nodes $k$ of the computational tree, and the product over all ancestors $j$ of $k$.

This means the forward error is the sum of the local errors at each node, each perturbed by subsequent rounding errors. A similar expression is derived in [9, (4.2)],
\[
  \hat{s}_{\mathrm{root}} - s_{\mathrm{root}} = \sum_{k} \frac{\delta_k}{1+\delta_k}\,\hat{s}_k,
\tag{6}\]
but in terms of the computed partial sums $\hat{s}_k$. By contrast, our (5) depends only on the exact partial sums $s_k$, which makes it more amenable to a probabilistic analysis.
Write (3) in terms of the sum $f_k \equiv e_{k_1} + e_{k_2}$ of the children errors at node $k$. Unraveling the recurrence gives the second explicit expression for the errors.
Lemma 2.3 (Second explicit expression).
Theorem 2.4.
Proof 2.5.
A similar bound is derived from (6) in [9, (4.3)],
\[
  |\hat{s}_{\mathrm{root}} - s_{\mathrm{root}}| \le \frac{u}{1-u} \sum_{k} |\hat{s}_k|,
\]
and is accompanied by the following observation:

In designing or choosing a summation method to achieve high accuracy, the aim should be to minimize the absolute values of the intermediate sums.
Our probabilistic bounds in Sections 2.3 and 2.4 corroborate this observation, the main difference being a dependence on the exact partial sums $s_k$ rather than the computed sums $\hat{s}_k$.
2.2 Deterministic and probabilistic error bounds valid to first order
We present a deterministic bound (9), and a probabilistic bound (Theorem 2.6) under Model 1.2, both valid to first order.
Truncating the bound in Theorem 2.4 at first order yields
\[
  |\hat{s}_{\mathrm{root}} - s_{\mathrm{root}}| \le u \sum_{k} |s_k| + \mathcal{O}(u^2),
\tag{9}\]
which is also derived in [11, Lemma 2.1] for the special case of sequential (recursive) summation.
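The first order bound (9) is easy to monitor at run time. A sketch (ours), for sequential summation in Float32 with $u = 2^{-24}$ and a Float64 reference standing in for exact arithmetic:

```julia
# Sketch: compare the actual error of sequential Float32 summation
# with the first order bound u * sum_k |s_k| from (9).
function bound9_check(x::Vector{Float32})
    u = 2.0^-24                       # unit roundoff of Float32
    shat = zero(Float32)              # computed sum
    s = 0.0                           # (near-)exact partial sums in Float64
    sumabs = 0.0                      # accumulates |s_k| over the partial sums
    for (k, xi) in enumerate(x)
        shat += xi
        s += Float64(xi)
        k >= 2 && (sumabs += abs(s))  # partial sums start with the second addition
    end
    return abs(Float64(shat) - s), u * sumabs
end

err, bnd = bound9_check(rand(Float32, 10_000))
println("error = ", err, "   first order bound (9) = ", bnd)
```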
Theorem 2.6.
Let the roundoffs $\delta_k$ be mean-independent, zero-mean random variables. Then for any $0<\delta<1$, with probability at least $1-\delta$, the error in Algorithm 1 is bounded by
\[
  |\hat{s}_{\mathrm{root}} - s_{\mathrm{root}}| \le u\,\sqrt{2\ln(2/\delta)}\,\Big(\sum_{k} s_k^2\Big)^{1/2} + \mathcal{O}(u^2).
\]
Proof 2.7.
A closely related bound for sequential (recursive) summation is derived in [11, Theorem 2.8] under a model where the inputs are themselves random variables. Under this assumption, the probabilistic error bounds become significantly tighter if the inputs have zero mean, leading to the conclusion that centering the data prior to summation can reduce the error; see Section 3.
A disadvantage of Theorem 2.6 is its uncertainty about the point where the second-order term starts to become relevant. For large input dimensions $n$ in particular, it is desirable to derive bounds that hold to all orders.
2.3 General probabilistic bound under Model 1.1
We derive a bound under the assumption of independent roundoffs $\delta_k$.
Theorem 2.8.
Proof 2.9.
We start by bounding the products in the error expression (5). Since
we can apply Lemma 1.2 to the sum on the right to conclude that for each , , and with failure probability at most , the bound
holds. Since the product is nonnegative, we can replace the expression on the left with its absolute value and the statement still holds. A union bound over at most such inequalities shows that with failure probability at most the bound
holds. Then substitute and .
Next, with , the sequence
is a martingale with respect to , that satisfies
with failure probability at most . Finally, apply Lemma 1.2 with additional failure probability .
The bound illustrates that, with high probability, the error in Theorem 2.8 is bounded by a quantity proportional to the square root of the height of the computational tree in Algorithm 1. This suggests that minimizing the height of the computational tree should reduce the error.
The failure probability can be made small without much effect on the overall bound in Theorem 2.8, since it enters only logarithmically through $\sqrt{2\ln(2/\delta)}$.
2.4 General probabilistic bound under Model 1.2
We derive a bound under the assumption of mean-independent roundoffs $\delta_k$. To this end we rely on the recurrence in (8) and induction.
Lemma 2.10.
Let be mean-independent mean-zero random variables, the height of the computational tree for Algorithm 1, and . For any with , the bounds
(10) |
all hold simultaneously with probability at least .
Proof 2.11.
The proof is by induction, keeping track of the failure probability at each step.
Induction basis
The children of a node of height one are both leaves and hence floating point numbers, so (10) holds deterministically.
Induction hypothesis
Assume that (10) holds simultaneously for all with failure probability at most .
Induction step
Write (8) as
With , the sequence , is a martingale with respect to , and by the induction hypothesis satisfies
with failure probability at most . Lemma 1.2 then implies that with additional failure probability ,
where the second inequality follows from the two-norm triangle inequality. Exploit the fact that can appear at most times, once for itself and once for each of its ancestors, to bound the second summand by
Combine everything
Thus (10) holds simultaneously for all with failure probability at most . By induction (10) holds for all with failure probability at most .
Below is the final probabilistic bound under Model 1.2.
Theorem 2.12.
3 Shifted General Summation
After discussing the motivation for shifted summation, we present Algorithm 2 for shifted general summation; and derive error expressions, deterministic and probabilistic bounds for shifted sequential (recursive) summation (Section 3.1) and shifted general summation (Section 3.2).
Shifted summation is motivated by work in computer architecture [5, 4] and formal methods for program verification [17], where not only the roundoffs but also the inputs are interpreted as random variables sampled from some distribution. Then one can compute statistics for the total roundoff error and estimate the probability that it is bounded by a given tolerance.
Adopting instead the approach for probabilistic roundoff error analysis in [10, 13], probabilistic bounds for random data are derived in [11], with improvements in [8]. The bound below suggests that sequential summation is accurate if the summands are tightly clustered around zero.
Lemma 3.1 (Theorem 5.2 in [8]).
Let the roundoffs be independent mean-zero random variables, and the summands be independent random variables with mean $\mu$ and ‘variance’ $\sigma^2$. Then for any $0<\delta<1$, with probability at least $1-\delta$, the error in sequential (recursive) summation is bounded by
where and
This observation has been applied to non-random data to reduce the backward error [11, Section 4]. The shifted algorithm for sequential (recursive) summation [11, Algorithm 4.1] is extended to general summation in Algorithm 2.
Note that the pseudo-code in Algorithm 2 is geared towards exposition. In practice, one shifts the summands immediately prior to the summation, to avoid allocating additional storage for the centered values. The ideal but impractical choice for the shift is the empirical mean $c = \frac{1}{n}\sum_{j=1}^{n} x_j$. A more practical approximation is the mean of the two extreme summands, $c = (\min_j x_j + \max_j x_j)/2$.
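A minimal sketch (ours) of shifted sequential summation with this practical shift; the last line undoes the centering by adding $nc$ back:

```julia
# Sketch: shifted sequential summation with shift c = (min + max) / 2.
function shifted_sequential_sum(x::AbstractVector{T}) where T<:AbstractFloat
    c = (minimum(x) + maximum(x)) / 2   # practical shift; the empirical mean is ideal
    s = zero(T)
    for xi in x
        s += xi - c                     # sum the centered summands
    end
    return s + length(x) * c            # final uncentering
end
```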
3.1 Shifted sequential (recursive) summation
We present an expression for the error (12), a deterministic bound (13), and a probabilistic bound under Model 1.1 (Theorem 3.2).
We incorporate the final uncentering operation as the $(n+1)$st summation, and describe shifted sequential summation in exact arithmetic as a sequence of partial sums of the centered summands $x_j - c$. In floating point arithmetic, rounding errors arise both from the (un)centering operations and from the summations.
The idea behind shifted summation for non-random data is that the centered sum can be more accurate if centering reduces the magnitude of the partial sums and the summands; see also the discussions after Theorems 2.4 and 3.2.
The probabilistic bound below can be considered to be a special case of the general bound in Theorem 3.6, but it requires only a single probability.
Theorem 3.2 (Probabilistic bound for shifted sequential (recursive) summation).
Let and be independent mean-zero random variables, and . Then for any , with probability at least ,
For we have
Proof 3.3.
We construct a martingale by extracting the accumulated roundoffs in a last-in first-out manner.
By construction, is a function of , while is a function of for , and finally is a function of all for .
By assumption are tuples of independent random variables with and , . Then
Combining all of the above shows that the sequence is a martingale.
Theorem 3.2 suggests that shifted sequential summation improves the accuracy over plain sequential summation if shifting reduces the magnitude of the partial sums. This is confirmed by numerical experiments. However, the experiments also illustrate that shifted summation can decrease the accuracy in the summation of naturally centered data, such as normal(0,1) random variables.
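A quick illustration of this point (ours), reusing `shifted_sequential_sum` from the earlier sketch on Float32 data tightly clustered around the large mean $10^6$:

```julia
# Data clustered at a large mean: shifting helps; for normal(0,1) data it would not.
plain_sum(x) = foldl(+, x)                    # plain sequential summation
x = 1.0f6 .+ rand(Float32, 100_000)           # summands clustered near 10^6
ref = sum(Float64.(x))                        # double precision reference
println("plain:   ", abs(Float64(plain_sum(x)) - ref) / ref)
println("shifted: ", abs(Float64(shifted_sequential_sum(x)) - ref) / ref)
```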
3.2 Shifted general summation
We present a probabilistic bound for shifted general summation under Model 1.2, based on the following computational tree. If Algorithm 1 operates with a tree of height $h$, then the analogous version in Algorithm 2 operates with a slightly taller tree on the shifted input data: the leading inputs are centered in pairs at the beginning, followed by the intermediate sums, with the final uncentering at the root.
Theorem 3.4 (Probabilistic bound for shifted general summation).
Proof 3.5.
This is a direct consequence of Theorem 2.12.
4 Compensated sequential summation
We analyse the forward error for compensated sequential summation.
After presenting compensated summation (Algorithm 3) and the roundoff error model (14), we derive three different expressions for the compensated summation error (Section 4.1), followed by bounds that are exact to first order (Section 4.2) and to second order (Section 4.3).
Algorithm 3 is the formulation [7, Theorem 8] of the ‘Kahan Summation Formula’ [14]. A version with opposite signs is presented in [9, Algorithm 4.2].
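For reference, a sketch (ours) of this formulation of compensated summation; the update pattern follows [7, Theorem 8], with our variable names:

```julia
# Sketch of compensated (Kahan) summation in the formulation of [7, Theorem 8].
function compensated_sum(x::AbstractVector{T}) where T<:AbstractFloat
    s = x[1]                 # running sum; assumes x is nonempty
    c = zero(T)              # running compensation for lost low-order bits
    for k in 2:length(x)
        y = x[k] - c         # corrected summand
        t = s + y            # new running sum, loses low-order bits of y
        c = (t - s) - y      # recover (minus) the lost part
        s = t
    end
    return s
end
```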
4.1 Error expressions
We express the accumulated roundoff in Algorithm 3 as a recursive matrix vector product (Lemma 4.1), then unravel the recursions into an explicit form (Theorem 4.3), followed by two other exact expressions derived in a different way (Theorem 4.5).
Lemma 4.1.
Proof 4.2.
Since ,
For the subsequent recursions abbreviate
To derive the recurrence for , write
(15)
Thus
To derive the recurrence for , start with (15)
Multiply by and subtract
Multiply by to obtain .
Now we unravel this recursion, so that the matrix product ends with a matrix all of whose elements are . Note the correspondence between (16) and the error expression in (4).
Proof 4.4.
4.2 First order bound
We present a first-order expression and bound for the error in Algorithm 3, and note the discrepancy with existing bounds (Remark 4.8).
Proof 4.7.
Corollary 4.6 suggests that the errors in the ’correction’ steps 3 and 5 of Algorithm 3 dominate the first order error.
Remark 4.8.
The existing bounds in [7, Section 4.3, Appendix], [9, (4.9)], [15, page 9-5], and [16, Section 4.2.2] are not consistent with Corollary 4.6, which implies that the computed sum equals
In contrast, the expressions in [7, Theorem 8], [15, page 9-5], and [16, Section 4.2.2] are equal to
implying the bound .
4.3 Second order deterministic and probabilistic bounds
For the second order in Algorithm 3, we present an explicit expression and deterministic bound (Theorem 4.9) and a probabilistic bound (Theorem 4.11).
Theorem 4.9 specifies the second order coefficient in the existing bounds [7, Theorem 8], [9, (4.9)], [16, Section 4.2.2] and improves the coefficient in [15, page 9-5] and [19, Section 4].
Theorem 4.9 (Explicit expression and deterministic bound).
Proof 4.10.
For the second order error in Theorem 4.3, it suffices to compute the products to first order only. The matrices in Lemma 4.1 equal to first order
Induction shows that for , where
(17)
The matrix in Theorem 4.3 is to second order
This implies for the matrix products with in Theorem 4.3
For we have
Substitute the above products into the recurrence for the second order error
The error is bounded up to second order by
The probabilistic bound below is derived from the explicit expression in Theorem 4.9.
Theorem 4.11 (Probabilistic second order bound).
Let the roundoffs be independent zero-mean random variables. Then for any $0<\delta<1$, with probability at least $1-\delta$, the error in Algorithm 3 is bounded by
where .
Proof 4.12.
Analogously to the proof of Theorem 3.2, we construct a martingale by unravelling the second order error in Theorem 4.9 in a last-in first-out manner. Let , , , , and
Setting show that is a function of while is a function of for .
By assumption are tuples of independent random variables that have zero mean and are bounded, that is,
Combine everything to conclude that the sequence is a martingale.
The differences are for
Since , , the differences are bounded by
where and .
Lemma 1.2 implies that for any with probability at least
The triangle inequality implies the first desired inequality
while the second one follows from , .
The above immediately implies a first order bound.
Corollary 4.13.
Let the roundoffs be independent zero-mean random variables. Then for any $0<\delta<1$, with probability at least $1-\delta$, the error in Algorithm 3 is bounded by
where .
5 Numerical experiments
After describing the setup, we present numerical experiments for shifted summation (Section 5.1), compensated summation (Section 5.3), and a comparison of the two (Section 5.2).
The unit roundoffs in the experiments are [12]:

- Double precision, $u = 2^{-53} \approx 1.1\times 10^{-16}$. Here double precision is Julia Float64, while ‘exact’ computations are performed in Julia Float256.
- Half precision, $u = 2^{-11} \approx 4.9\times 10^{-4}$. Here half precision is Julia Float16, while ‘exact’ computations are performed in Julia Float64.
We plot relative errors rather than absolute errors to allow for meaningful calibration: relative errors on the order of $u$ indicate full accuracy, while relative errors on the order of 1 indicate zero accuracy.
For shifted summation we use the empirical mean of the two extreme summands, $c = (\min_j x_j + \max_j x_j)/2$.
For probabilistic bounds, the failure probability is , hence .
Two classes of summands will be considered:

- The $x_j$ are uniform(0,1) random variables. All summands are positive, so there is no cancellation.
- The $x_j$ are normal(0,1) random variables with mean 0 and variance 1. Summands have different signs, which may lead to cancellation. The $x_j$ are naturally clustered around zero.
The plots show relative errors versus the number of summands $n$. To speed up the computations, we generate the summands once for the largest dimension, and then pick the leading $n$ elements as the summands for smaller dimensions; a sketch follows.
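A sketch of this setup (ours), in half precision with a Float64 reference standing in for exact arithmetic:

```julia
# Sketch: relative error of sequential Float16 summation versus n,
# with the summands generated once and truncated to their leading n elements.
function relative_errors(nmax)
    x = rand(Float16, nmax)                # uniform(0,1) summands, generated once
    for n in (10, 100, 1_000, nmax)
        xn = view(x, 1:n)                  # leading n elements
        shat = zero(Float16)
        for xi in xn                       # sequential summation in Float16
            shat += xi
        end
        s = sum(Float64.(xn))              # 'exact' reference in Float64
        println(n, "   relative error = ", abs(Float64(shat) - s) / abs(s))
    end
end

relative_errors(2_000)
```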
5.1 Errors and bounds for sequential summation
We compare the errors from the sequential versions of Algorithms 1 and 2, that is, sequential summation without and with shifting, with an easy relative version of the probabilistic bound in Theorem 3.2:

(18)
[Figure 2: relative errors of sequential summation without and with shifting, and the bound (18); left panel: summands clustered at a large mean, right panel: normal(0,1) summands.]
In Figure 2, the working precision is double with $u = 2^{-53}$. The figure illustrates that, compared to plain summation, shifted summation can increase the accuracy when the summands are artificially and tightly clustered at a large mean; however, it can hurt the accuracy when the summands are naturally clustered around zero, as for the normal(0,1) summands in the right panel. The expression (18) represents an upper bound accurate within a factor of 10 to 100.
5.2 Comparison of shifted and compensated summation
We compare the errors from shifted sequential summation (the sequential version of Algorithm 2) and compensated summation (Algorithm 3).
[Figure 3: relative errors of shifted sequential summation and compensated summation.]
5.3 Errors and bounds for compensated summation
We compare the errors from sequential summation (the sequential version of Algorithm 1) and compensated summation (Algorithm 3) with the relative version of the first order probabilistic bound in Corollary 4.13:

(19)
[Figure 4: relative errors of sequential and compensated summation, and the bound (19); left panel: uniform(0,1) summands, right panel: normal(0,1) summands.]
In Figure 4, the working precision is half with $u = 2^{-11}$. In the left panel, the number of uniform(0,1) summands is limited to avoid overflow, since the largest value in binary16 is about 65,504. In the right panel, in contrast, the number of normal(0,1) summands is larger, as summands of different signs are less likely to cause overflow.
Acknowledgement
We thank Johnathan Rhyne for helpful discussions.
References
- [1] P. Blanchard, N. J. Higham, and T. Mary, A class of fast and accurate summation algorithms, SIAM J. Sci. Comput., 42 (2020), pp. A1541–A1557.
- [2] F. Chung and L. Lu, Concentration inequalities and martingale inequalities: a survey, Internet Math., 3 (2006), pp. 79–127.
- [3] M. P. Connolly, N. J. Higham, and T. Mary, Stochastic rounding and its probabilistic backward error analysis, SIAM J. Sci. Comput., 43 (2021), pp. A566–A585.
- [4] G. Constantinides, F. Dahlqvist, Z. Rakamaric, and R. Salvia, Rigorous roundoff error analysis of probabilistic floating-point computations, 2021, https://arxiv.org/abs/2105.13217.
- [5] F. Dahlqvist, R. Salvia, and G. A. Constantinides, A probabilistic approach to floating-point arithmetic, 2019, https://arxiv.org/abs/1912.00867.
- [6] J. Demmel and Y. Hida, Accurate and efficient floating point summation, SIAM J. Sci. Comput., 25 (2003/04), pp. 1214–1248.
- [7] D. Goldberg, What every computer scientist should know about floating-point arithmetic, ACM Comput. Surv., 23 (1991), pp. 5–48.
- [8] E. Hallman, A refined probabilistic error bound for sums, 2021, https://arxiv.org/abs/2104.06531.
- [9] N. J. Higham, Accuracy and stability of numerical algorithms, SIAM, Philadelphia, second ed., 2002.
- [10] N. J. Higham and T. Mary, A new approach to probabilistic rounding error analysis, SIAM J. Sci. Comput., 41 (2019), pp. A2815–A2835.
- [11] N. J. Higham and T. Mary, Sharper probabilistic backward error analysis for basic linear algebra kernels with random data, SIAM J. Sci. Comput., 42 (2020), pp. A3427–A3446.
- [12] IEEE Computer Society, IEEE Standard for Floating-Point Arithmetic, IEEE Standard 754–2008, 2019. http://ieeexplore.ieee.org/document/4610935.
- [13] I. C. F. Ipsen and H. Zhou, Probabilistic error analysis for inner products, SIAM J. Matrix Anal. Appl., 41 (2020), pp. 1726–1741.
- [14] W. Kahan, Further remarks on reducing truncation errors, Comm. ACM, 8 (1965), p. 40.
- [15] W. Kahan, Implementation of algorithms (lecture notes by W. S. Haugeland and D. Hough), Tech. Report 20, Department of Computer Science, University of California, Berkeley, CA 94720, 1973.
- [16] D. Knuth, The Art of Computer Programming, vol. II, Addison-Wesley, Reading, MA, third ed., 1998.
- [17] D. Lohar, M. Prokop, and E. Darulova, Sound probabilistic numerical error analysis, in Intern. Conf. Integrated Formal Methods, Springer, 2019, pp. 322–340.
- [18] M. Mitzenmacher and E. Upfal, Probability and computing: randomization and probabilistic techniques in algorithms and data analysis, Cambridge University Press, 2005.
- [19] A. Neumaier, Rundungsfehleranalyse einiger Verfahren zur Summation endlicher Summen, Z. Angew. Math. Mech., 54 (1974), pp. 39–51.
- [20] S. Roch, Modern discrete probability: An essential toolkit, University Lecture, (2015).