Approximating monomials using Chebyshev polynomials
Abstract
This paper considers the approximation of a monomial $x^n$ over the interval $[-1,1]$ by a lower-degree polynomial. This polynomial approximation can be easily computed analytically and is obtained by truncating the analytical Chebyshev series expansion of $x^n$. The error in the polynomial approximation in the supremum norm has an exact expression with an interesting probabilistic interpretation. We use this interpretation along with concentration inequalities to develop a useful upper bound for the error.
Keywords— Chebyshev polynomials, Polynomial Approximation, Binomial Coefficients, Concentration inequalities
1 Motivation and Introduction
We are interested in approximating the monomial $x^n$ by a polynomial of degree $m < n$ over the interval $[-1,1]$. The monomials $1, x, \ldots, x^n$ form a basis for $\mathcal{P}_n$, the space of polynomials of degree at most $n$, so it seems unlikely that we can represent a monomial in terms of lower-degree polynomials. In Figure 1, we plot a few functions from the monomial basis; the basis functions look increasingly alike as we take higher and higher powers, i.e., they appear to “lose independence.” Numerical analysts often avoid the monomial basis in polynomial interpolation since it results in ill-conditioned Vandermonde matrices, leading to poor numerical performance in finite precision arithmetic. This loss of independence means that it is reasonable to approximate the monomial as a linear combination of lower-order monomials, i.e., by a lower-order polynomial approximation. The natural question to ask, therefore, is: how small can $m$ be so that a well-chosen polynomial of degree $m$ can accurately approximate $x^n$?

The surprising answer to this question is that we can approximate the monomial over $[-1,1]$ by a polynomial of small degree, which we will make precise. Let $\|\cdot\|_\infty$ denote the supremum norm on $[-1,1]$ and let $p_m^*$ be the best polynomial approximation to $x^n$ in this norm; that is,

$$\|x^n - p_m^*\|_\infty = \min_{p \in \mathcal{P}_m} \|x^n - p\|_\infty,$$

where $\mathcal{P}_m$ is a vector space of polynomials with real coefficients of degree at most $m$. The minimizer $p_m^*$ exists and is unique [9, Chapter 10], but does not have a closed-form expression. Newman and Rivlin [7, Theorem 2] showed that

$$\frac{1}{2}\,\Phi(m,n) \le \|x^n - p_m^*\|_\infty \le \Phi(m,n), \qquad (1)$$

where the term $\Phi(m,n)$ is given by the formula

$$\Phi(m,n) = 2^{1-n} \sum_{(n+m)/2 < j \le n} \binom{n}{j}.$$

(We briefly mention that the notation in our manuscript differs from [7] in that we reverse the roles of $m$ and $n$.) Since $\Phi(m,n)$ involves a sum of binomial coefficients, it has a probabilistic interpretation, which we explore in Section 4.
To see why a small $m$ is sufficient, consider the upper bound $\|x^n - p_m^*\|_\infty \le \Phi(m,n)$. In Section 4 we use the probabilistic interpretation to obtain the bound $\Phi(m,n) \le 2e^{-m^2/(2n)}$. Suppose we are given a user-defined tolerance $\epsilon > 0$. To ensure

$$\|x^n - p_m^*\|_\infty \le 2e^{-m^2/(2n)} \le \epsilon,$$

we need to choose $m \ge \sqrt{2n\log(2/\epsilon)}$. The accuracy of the polynomial approximation is visualized in Figure 2, where in the left panel we plot the monomial $x^n$ and the best polynomial approximation $p_m^*$. The polynomial is computed using the Remez algorithm, implemented in chebfun [3]. We see that the polynomial approximation looks very accurate. In the right panel, we display $\Phi(m,n)$, the upper bound in (1) on the error of the best polynomial approximation, as well as the upper bound $2e^{-m^2/(2n)}$. We see that $\Phi(m,n)$ and its upper bound both decay sharply with increasing $m$. Numerical evidence in [6] further confirms this analysis; the error behaves approximately like $\operatorname{erfc}(m/\sqrt{2n})$, where $\operatorname{erfc}$ is the complementary error function.
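The choice of $m$ described above is easy to check numerically. The following is a minimal sketch in Python; the helper name `phi` and the parameter values $n = 100$, $\epsilon = 10^{-3}$ are our own choices for illustration:

```python
import math

def phi(m, n):
    """Phi(m, n) = 2^(1-n) * sum of C(n, j) over (n + m)/2 < j <= n."""
    lo = math.floor((n + m) / 2) + 1
    return 2.0 ** (1 - n) * sum(math.comb(n, j) for j in range(lo, n + 1))

n, eps = 100, 1e-3
# Choosing m >= sqrt(2 n log(2 / eps)) guarantees 2 exp(-m^2 / (2 n)) <= eps.
m = math.ceil(math.sqrt(2 * n * math.log(2 / eps)))
assert phi(m, n) <= 2 * math.exp(-m ** 2 / (2 * n)) <= eps
```

For $n = 100$ and $\epsilon = 10^{-3}$, this selects $m = 39$: a degree-39 polynomial suffices to approximate $x^{100}$ to three digits.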

Polynomial and rational approximations to the monomial $x^n$ have received considerable attention in the approximation theory community, and surveys of various results can be found in [8, 6]. Polynomial approximations to high-order monomials have many applications in numerical analysis. This key insight was exploited by Cornelius Lanczos [5] in his “$\tau$-method” for the numerical solution of differential equations. For a stimulating discussion on this topic, please see [6]. In numerical linear algebra, this has been exploited to efficiently compute matrix powers and the Schatten $p$-norm of a matrix [1, 4].
In this short note, we show how to construct a polynomial approximation using a truncated Chebyshev polynomial expansion. The error in the truncated representation equals the sum of the discarded coefficients and is precisely $\Phi(m,n)$. The polynomial and the resulting error can both be computed analytically and are, therefore, of great practical use. We briefly review Chebyshev polynomials in Section 2 and state and prove the main result in Section 3. In Section 4, we explore probabilistic interpretations of $\Phi(m,n)$ and obtain bounds for partial sums of binomial coefficients.
2 Chebyshev polynomials
The Chebyshev polynomials of the first kind $T_k$ for $x \in [-1,1]$ can be represented as

$$T_k(x) = \cos(k \arccos x), \qquad k = 0, 1, 2, \ldots.$$

Starting with $T_0(x) = 1$, $T_1(x) = x$, the Chebyshev polynomials satisfy a recurrence relationship of the form $T_{k+1}(x) = 2x\,T_k(x) - T_{k-1}(x)$ for $k \ge 1$. The Chebyshev polynomials are orthogonal with respect to the weighted inner product

$$\langle f, g \rangle = \int_{-1}^{1} f(x)\, g(x)\, w(x)\, \mathrm{d}x,$$

where the weight function takes the form $w(x) = (1 - x^2)^{-1/2}$. Any function $f$ that is Lipschitz continuous on $[-1,1]$ can be represented in terms of a Chebyshev polynomial expansion of the form

$$f(x) = \sum_{j=0}^{\infty} a_j T_j(x),$$

where the coefficients are obtained as $a_j = \frac{2}{\pi}\langle f, T_j\rangle$ for $j \ge 1$ (with $a_0 = \frac{1}{\pi}\langle f, T_0\rangle$) and the series is uniformly convergent. The monomial $x^n$ is rather special since it has the following exact representation in terms of the Chebyshev polynomials [2, Section 4]

$$x^n = 2^{1-n} \,{\sum_{j=0}^{\lfloor n/2 \rfloor}}' \binom{n}{j}\, T_{n-2j}(x), \qquad (2)$$

where the prime means the summand corresponding to $T_0$ is halved (if it appears) and the coefficients $a_{n-2j}$ for $0 \le j \le \lfloor n/2 \rfloor$ are

$$a_{n-2j} = 2^{1-n} \binom{n}{j}. \qquad (3)$$
Equation (2) takes a more familiar form when we consider the trigonometric perspective of Chebyshev polynomials. For example, the well-known trigonometric identity $\cos 2\theta = 2\cos^2\theta - 1$ can be rearranged as

$$\cos^2\theta = \frac{1 + \cos 2\theta}{2}.$$

With $x = \cos\theta$, we get $x^2 = \frac{1}{2}\left(T_0(x) + T_2(x)\right)$. For completeness, we provide a derivation of (2) in Appendix A. It is important to note here that the series in (2) is finite, but can be truncated to obtain an accurate approximation; see Section 3.
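The coefficients in (2)–(3) can be checked numerically by converting the Chebyshev series back to the power basis, which should recover $x^n$ exactly. A small sketch using NumPy's Chebyshev utilities (the helper name `monomial_cheb_coeffs` is ours):

```python
import math
import numpy as np
from numpy.polynomial import chebyshev as C

def monomial_cheb_coeffs(n):
    """Chebyshev coefficients of x^n from (2)-(3): a_{n-2j} = 2^(1-n) C(n, j),
    with the coefficient of T_0 halved when it appears (n even)."""
    a = np.zeros(n + 1)
    for j in range(n // 2 + 1):
        a[n - 2 * j] = 2.0 ** (1 - n) * math.comb(n, j) * (0.5 if n == 2 * j else 1.0)
    return a

for n in range(1, 9):
    # Converting the Chebyshev series to the power basis should recover x^n.
    expected = np.zeros(n + 1)
    expected[n] = 1.0
    assert np.allclose(C.cheb2poly(monomial_cheb_coeffs(n)), expected)
```

For example, $n = 3$ gives the coefficients $(a_1, a_3) = (3/4, 1/4)$, matching $x^3 = \tfrac34 T_1(x) + \tfrac14 T_3(x)$.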
Chebyshev polynomials have many applications in approximation theory and numerical analysis [9], but we limit ourselves to two such examples here. First, if the function is differentiable $\nu$ times or analytic, the Chebyshev coefficients exhibit decay (algebraic or geometric, respectively). Therefore, the Chebyshev series can be truncated to obtain a polynomial approximation of the function, and the accuracy of the approximation depends on the rate of decay of the coefficients. Another application of Chebyshev polynomials is in the theory and practice of polynomial interpolation. The monic polynomial $2^{1-n}T_n$ solves the minimax problem

$$2^{1-n}T_n = \operatorname*{argmin}_{p \in \mathcal{P}_n,\ p \text{ monic}} \|p\|_\infty. \qquad (4)$$

Based on the minimax characterization, to interpolate a function over $[-1,1]$ by a polynomial of degree $n$, the function to be interpolated should be evaluated at the roots of the Chebyshev polynomial $T_{n+1}$, given by the points $x_j = \cos\left(\frac{(2j+1)\pi}{2(n+1)}\right)$ for $j = 0, 1, \ldots, n$.
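To illustrate interpolation at these points, the following sketch fits a degree-$n$ polynomial to $e^x$ at the roots of $T_{n+1}$ and measures the uniform error on a fine grid (the choice of function, degree, and tolerance are ours):

```python
import numpy as np
from numpy.polynomial import chebyshev as C

# Interpolate f(x) = exp(x) at the n + 1 roots of T_{n+1}, i.e. the points
# x_j = cos((2 j + 1) pi / (2 (n + 1))), and measure the uniform error.
n = 10
j = np.arange(n + 1)
nodes = np.cos((2 * j + 1) * np.pi / (2 * (n + 1)))
series = C.chebfit(nodes, np.exp(nodes), deg=n)
grid = np.linspace(-1.0, 1.0, 2001)
err = np.max(np.abs(C.chebval(grid, series) - np.exp(grid)))
assert err < 1e-9  # exp is analytic, so the coefficients decay geometrically
```

With $n + 1$ nodes and degree $n$, the least-squares fit is an exact interpolant; the tiny uniform error reflects the geometric decay of the Chebyshev coefficients of an analytic function.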
3 Main result
We construct the polynomial approximation $p_m$ by truncating the Chebyshev polynomial expansion in (2) beyond the term $T_m$; that is, we retain only the terms of degree at most $m$:

$$p_m(x) = 2^{1-n} {\sum_{\substack{0 \le j \le \lfloor n/2 \rfloor \\ n - 2j \le m}}}' \binom{n}{j}\, T_{n-2j}(x).$$
Our main result is the following theorem, which quantifies the error in the polynomial approximation. The proof of this theorem is based on the expression in (2). We believe this result is new.
Theorem 1.

The error in the polynomial approximation satisfies

$$\|x^n - p_m\|_\infty = \Phi(m,n).$$
Proof.

From (2),

$$x^n - p_m(x) = 2^{1-n} \sum_{0 \le j < (n-m)/2} \binom{n}{j}\, T_{n-2j}(x).$$

Using the triangle inequality, we find that $\|x^n - p_m\|_\infty \le \sum_{0 \le j < (n-m)/2} a_{n-2j}$, since the coefficients are nonnegative and the Chebyshev polynomials are bounded as $\|T_k\|_\infty \le 1$. Substituting the coefficients from (3), we get

$$\|x^n - p_m\|_\infty \le 2^{1-n} \sum_{0 \le j < (n-m)/2} \binom{n}{j}. \qquad (5)$$

Using the symmetry of the binomial coefficients, $\binom{n}{j} = \binom{n}{n-j}$, the summation simplifies as

$$\sum_{0 \le j < (n-m)/2} \binom{n}{j} = \sum_{(n+m)/2 < j \le n} \binom{n}{j}.$$

Plug this identity into (5) to get $\|x^n - p_m\|_\infty \le \Phi(m,n)$. The bound is clearly achieved at $x = 1$, where all the Chebyshev polynomials take the value $T_k(1) = 1$. ∎
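Theorem 1 is easy to verify numerically: truncate the series (2) after degree $m$ and compare the observed sup-norm error on a fine grid against $\Phi(m,n)$. A sketch (the parameter values $n = 12$, $m = 6$ are arbitrary):

```python
import math
import numpy as np
from numpy.polynomial import chebyshev as C

def phi(m, n):
    """Phi(m, n) = 2^(1-n) * sum of C(n, j) over (n + m)/2 < j <= n."""
    lo = math.floor((n + m) / 2) + 1
    return 2.0 ** (1 - n) * sum(math.comb(n, j) for j in range(lo, n + 1))

n, m = 12, 6
# Chebyshev series of x^n from (2), with terms of degree > m zeroed out.
coeffs = np.zeros(n + 1)
for j in range(n // 2 + 1):
    coeffs[n - 2 * j] = 2.0 ** (1 - n) * math.comb(n, j) * (0.5 if n == 2 * j else 1.0)
coeffs[m + 1:] = 0.0  # truncation: this is p_m
grid = np.linspace(-1.0, 1.0, 100001)
err = np.abs(grid ** n - C.chebval(grid, coeffs))
assert np.isclose(err.max(), phi(m, n))  # sup-norm error equals Phi(m, n)
assert np.isclose(err[-1], err.max())    # ...and it is attained at x = 1
```

For this example the discarded coefficients sum to $2^{-11}\left(\binom{12}{0} + \binom{12}{1} + \binom{12}{2}\right) = 79/2048$, which is exactly the observed maximum error.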
This theorem shows that the polynomial approximation $p_m$ is nearly optimal, and the error due to this approximation is $\Phi(m,n)$. Moreover, it is the optimal polynomial for the special case $m = n - 1$. It is easy to see that $p_{n-1} = x^n - 2^{1-n}T_n$, so $\|x^n - p_{n-1}\|_\infty = 2^{1-n}$ and $p_{n-1}$ is the same as the best polynomial approximation characterized by (4). For $m < n - 1$, from (1) and Theorem 1,

$$\|x^n - p_m\|_\infty = \Phi(m,n) \le 2\,\|x^n - p_m^*\|_\infty,$$

so that the error in the Chebyshev polynomial approximation is suboptimal by at most the factor 2. Therefore, by using $p_m$ we lose only one significant digit of accuracy compared to $p_m^*$.
4 A probabilistic digression
In Section 1, we saw that the error in the monomial approximation depends on $\Phi(m,n)$. Since $\Phi(m,n)$ is (up to scaling) a sum of binomial coefficients, it has a probabilistic interpretation. Newman and Rivlin [7] observed that if a fair coin is tossed $n$ times, $\Phi(m,n)$ is the probability that the magnitude of the difference between the number of heads and the number of tails exceeds $m$. They used this insight along with the de Moivre-Laplace theorem [10, Section 1.3] (which is a special case of the Central Limit Theorem) to obtain the approximation $\Phi(m,n) \approx \operatorname{erfc}\!\left(m/\sqrt{2n}\right)$.
To convert this into a rigorous inequality for $\Phi(m,n)$ we use a different tool from probability, namely, concentration inequalities. These inequalities quantify how much a random variable deviates from its mean. We start with the following alternative interpretation of $\Phi(m,n)$: it is twice the probability that more than $(n+m)/2$ of the coin tosses result in heads (or, equivalently, tails). We associate each coin toss with an independent Bernoulli random variable $X_i$ with parameter $1/2$, since the coin is fair. The random variable $X = \sum_{i=1}^{n} X_i$ has the Binomial distribution with parameters $n$ and $1/2$. Then,

$$\Phi(m,n) = 2\,\mathbb{P}\!\left( X > \frac{n+m}{2} \right).$$
Since $X$ has the Binomial distribution, we can once again use the de Moivre-Laplace theorem to say that, as $n \to \infty$,

$$\frac{X - n/2}{\sqrt{n/4}} \;\longrightarrow\; \mathcal{N}(0,1) \quad \text{in distribution}.$$

Roughly speaking, this theorem says that $X$ behaves as a normal random variable with mean $n/2$ and variance $n/4$. Since the tails of normal distributions decay exponentially, we expect that $X$ lies in the range $n/2 \pm \mathcal{O}(\sqrt{n})$ with nearly full probability; alternatively, the probability that it is outside this range is very small. To make this more precise, we apply Hoeffding's concentration inequality [10, Theorem 2.2.6] to obtain

$$\mathbb{P}\!\left( X > \frac{n+m}{2} \right) = \mathbb{P}\!\left( X - \frac{n}{2} > \frac{m}{2} \right) \le e^{-m^2/(2n)}.$$

This gives our desired bound $\Phi(m,n) \le 2e^{-m^2/(2n)}$.
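The Hoeffding bound can be compared directly against the exact binomial tail; a small sketch (the helper `phi` computes $\Phi(m,n)$ from its definition, and the $(m, n)$ grid is our own choice):

```python
import math

def phi(m, n):
    """Probability that |#heads - #tails| exceeds m in n fair coin tosses."""
    lo = math.floor((n + m) / 2) + 1
    return 2.0 ** (1 - n) * sum(math.comb(n, j) for j in range(lo, n + 1))

# Hoeffding's inequality gives the rigorous bound Phi(m, n) <= 2 exp(-m^2/(2n));
# the de Moivre-Laplace approximation suggests Phi(m, n) ~ erfc(m / sqrt(2 n)).
for n in (50, 200, 1000):
    for m in (5, 10, 20, 40):
        assert phi(m, n) <= 2.0 * math.exp(-m ** 2 / (2 * n))
```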
We can use a similar technique to prove the following result, which may be of independent interest: if $0 \le k < n/2$, then

$$\sum_{j=0}^{k} \binom{n}{j} \le 2^{n}\, e^{-(n-2k)^2/(2n)}.$$
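A bound of this type, obtained by applying Hoeffding's inequality to $\mathbb{P}(X \le k)$, is easy to test exhaustively for modest $n$; a sketch (the values of $n$ are our own choices):

```python
import math

# Exhaustive check of the partial-sum bound
#   sum_{j=0}^{k} C(n, j) <= 2^n * exp(-(n - 2k)^2 / (2n))   for 0 <= k < n/2,
# which follows from Hoeffding's inequality applied to P(X <= k).
for n in (10, 37, 120):
    for k in range(math.ceil(n / 2)):
        lhs = sum(math.comb(n, j) for j in range(k + 1))
        rhs = 2.0 ** n * math.exp(-((n - 2 * k) ** 2) / (2 * n))
        assert lhs <= rhs
```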
Other concentration inequalities, such as the Chernoff and Bernstein inequalities (see [10, Chapter 2]), also give equally interesting bounds. We invite the reader to explore such results.
5 Acknowledgements
The author would like to thank Alen Alexanderian, Ethan Dudley, Ivy Huang, Ilse Ipsen, and Nathan Reading for comments and feedback. The work was supported by the National Science Foundation through the grants DMS-1745654 and DMS-1845406.
6 Declaration of Interest
The author has no relevant financial or non-financial competing interests to report.
Appendix A Derivation of the monomial expansion
In this appendix, we provide a short derivation of (2). We take $x = \cos\theta$ and write

$$x^n = \cos^n\theta = \left( \frac{e^{i\theta} + e^{-i\theta}}{2} \right)^{n} = 2^{-n} \sum_{j=0}^{n} \binom{n}{j} e^{i(n-2j)\theta}.$$

We have used Euler's formula and the binomial theorem. At this point, the derivation splits into two different paths:

- 1. $n$ is odd: pairing the summands $j$ and $n - j$, which have equal binomial coefficients and opposite exponents, gives
$$x^n = 2^{1-n} \sum_{j=0}^{(n-1)/2} \binom{n}{j} \cos\left((n-2j)\theta\right).$$
- 2. $n$ is even: the same pairing applies for $j < n/2$, while the middle term $j = n/2$ contributes a constant, so
$$x^n = 2^{-n} \binom{n}{n/2} + 2^{1-n} \sum_{j=0}^{n/2-1} \binom{n}{j} \cos\left((n-2j)\theta\right).$$

In either case, substitute $\cos\left((n-2j)\theta\right) = T_{n-2j}(x)$ and $\theta = \arccos x$ to complete the derivation.
References
- [1] H. Avron and S. Toledo. Randomized algorithms for estimating the trace of an implicit symmetric positive semi-definite matrix. Journal of the ACM (JACM), 58(2):8, 2011.
- [2] W. J. Cody. A survey of practical rational and polynomial approximation of functions. SIAM Review, 12(3):400–423, 1970.
- [3] T. A. Driscoll, N. Hale, and L. N. Trefethen. Chebfun Guide. Pafnuty Publications, 2014.
- [4] E. Dudley, A. K. Saibaba, and A. Alexanderian. Monte Carlo estimators for the Schatten p-norm of symmetric positive semidefinite matrices. arXiv preprint arXiv:2005.10174, 2020.
- [5] C. Lanczos. Applied analysis. Dover Books on Advanced Mathematics. Dover Publications, Inc., New York, 1988. Reprint of the 1956 original.
- [6] Y. Nakatsukasa and L. Trefethen. Rational approximation of $x^n$. Proceedings of the American Mathematical Society, 146(12):5219–5224, 2018.
- [7] D. J. Newman and T. J. Rivlin. Approximation of monomials by lower degree polynomials. Aequationes Mathematicae, 14(3):451–455, 1976.
- [8] A. R. Reddy. Approximations to $x^n$ and $|x|$—a survey. J. Approx. Theory, 51(2):127–137, 1987.
- [9] L. N. Trefethen. Approximation theory and approximation practice. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2013.
- [10] R. Vershynin. High-dimensional probability, volume 47 of Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge, 2018.