
An iterative algorithm for approximating roots of integers

William Gerst
(January 8, 2021)
Abstract

We explore an algorithm for approximating roots of integers, discuss its motivation and derivation, and analyze its convergence rates with varying parameters and inputs. We also perform comparisons with established methods for approximating square roots and other rational powers.

1 Introduction

Fixed-point iteration is a technique commonly used for approximating irrational constants, such as calculating the golden ratio $\phi\approx 1.618$ or finding square roots via the Babylonian method. More generally, the Newton-Raphson method is an iterative process used for estimating the roots of functions, which can be applied to find the roots of certain integers. For example, searching for the real root of $f(x)=x^{3}-5$ will yield $x=\sqrt[3]{5}$.

In this paper, we will introduce an algorithm for estimating rational powers of integers that employs fixed-point iteration, provide examples of its use, and compare it to the existing methods mentioned above.

It is important to note the definition of a “fixed point” in the context of functions and iterative processes, as well as what it means for a fixed point to be “stable.”

Definition 1.1.

A fixed point $x_{0}$ of a function $f$ is a value in the domain of $f$ such that $f(x_{0})=x_{0}$.

In other words, a fixed point of a function occurs wherever the input and output values are equal.

Example 1.2.

The values $\phi=\frac{1+\sqrt{5}}{2}=1.618\ldots$ and $\Phi=-\frac{1}{\phi}\approx-0.618$ are the two fixed points of the function $f(x)=1+\frac{1}{x}$ because $f(\phi)=\phi$ and $f(\Phi)=\Phi$.

Not all fixed points are of equal importance, however. One important distinction to be made for a given fixed point is whether it can be classified as “stable” or “unstable,” which describes how the function acts in the neighborhood of this point.

Definition 1.3.

A fixed point $c$ of a function $f$ is stable if and only if the sequence $x_{n+1}=f(x_{n})$ converges to $c$ with $x_{1}=c+\varepsilon$ for arbitrarily small $\varepsilon$. Conversely, $c$ is unstable if and only if the sequence $x_{n+1}=f(x_{n})$ diverges from $c$ with small initial offset $\varepsilon$.

The stability of fixed points of a function $f(x)$ can be identified by the magnitude of the derivative at those points [1].

unstable: $\left\lvert f^{\prime}(x)\right\rvert>1$
neutral: $\left\lvert f^{\prime}(x)\right\rvert=1$
stable: $\left\lvert f^{\prime}(x)\right\rvert<1$
superstable: $\left\lvert f^{\prime}(x)\right\rvert=0$

When a fixed point is “superstable,” the iteration is said to “converge (at least) quadratically” because the error in the approximation $x_{n+1}$ must be some quadratic function of the error in $x_{n}$.
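To make Definition 1.3 concrete, here is a minimal Python sketch (our own illustration, not from the paper; the helper name iterate and the step count are arbitrary choices) that iterates $f(x)=1+1/x$ from Example 1.2 and watches the sequence settle onto the stable fixed point $\phi$:

# Fixed-point iteration on f(x) = 1 + 1/x from Example 1.2.
# Since |f'(phi)| = 1/phi^2 < 1, iterates are drawn toward phi.
def iterate(f, x, steps):
    for _ in range(steps):
        x = f(x)
    return x

print(iterate(lambda x: 1 + 1 / x, 1.0, 50))  # 1.618033988749895
print((1 + 5 ** 0.5) / 2)                     # phi, for comparison

Starting near the unstable fixed point $\Phi\approx-0.618$ instead pushes the iterates away from it, illustrating the distinction above.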

2 Motivation

In the Newton-Raphson method, we see an example of how fixed-point iteration can be used to narrow in on an estimated value for the root of a given integer. Suppose that we begin the process in a slightly different way.

For clarity of this example, suppose we are trying to find an estimated value of $\sqrt{7}$. Rather than defining the function $f(x)=x^{2}-7$ and trying to find the positive root, we simply consider the equation $r^{2}=7$ for some $r$. Ideally, we could divide each side of the equation by $r$ to get $r=7/r$ and iterate as $r_{n+1}=7/r_{n}$ with arbitrary $r_{1}$ to get closer and closer to $r_{n}=\sqrt{7}$ as $n\to\infty$. There is, however, a fatal flaw with this approach: the fixed point of $r=\sqrt{7}$ has only neutral stability, so the value does not converge as we might hope. Instead, it can be easily seen that any nonzero starting value of $r_{1}$ will simply yield values alternating between $r_{1}$ and $7/r_{1}$.

This flaw can be remedied if we offset $r$ by some constant value. Suppose that we rewrite $r=2-c$. (As for where the 2 and the negative sign come from, that will be explained shortly.) Then, we have the initial equation $(2-c)^{2}=4-4c+c^{2}=4+c(c-4)=7\iff c=\frac{7-4}{c-4}$ and can turn this into the fixed-point iteration with

c_{n+1}=\frac{3}{c_{n}-4}.

Taking the derivative of the right-hand side with respect to $c$ at the fixed point $c=2-\sqrt{7}$ gives

\frac{\mathrm{d}}{\mathrm{d}c}\left(\frac{3}{c-4}\right)=\frac{-3}{(c-4)^{2}}=\frac{-3}{(-2-\sqrt{7})^{2}}=\frac{-3}{4+4\sqrt{7}+7}\approx-0.139.

The absolute value of the derivative at the fixed point is less than 1, so it is stable! This means that repeatedly iterating on the value of $c$ will cause it to tend towards $2-\sqrt{7}$, at which point we may compute $2-c$ to recover our approximation for $\sqrt{7}$ as desired.
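As a quick numerical check, the iteration can be run in a few lines of Python (a minimal sketch of our own; the starting value and step count are arbitrary):

# Iterate c_{n+1} = 3 / (c_n - 4), which is drawn toward the stable
# fixed point c = 2 - sqrt(7) = -0.6457...
c = 0.0
for _ in range(30):
    c = 3 / (c - 4)
print(2 - c)     # ~2.6457513110645907
print(7 ** 0.5)  # sqrt(7), for comparison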

This is where the earlier rewriting of $r=2-c$ is important. Had we chosen a different constant or sign, it is entirely possible that the fixed point would not have been stable, ruining the results. In the following section, we generalize our results and make clear exactly how to choose this constant to make the algorithm work (and optimize its rate of convergence).

3 Derivation of general algorithm

First, we want to generalize to arbitrary input $x\in\mathbb{R}^{+}$, not just the number 7 used in the previous example. (Note that each of the examples given in this paper uses integer $x$, but there is no limitation that prevents calculating $\sqrt{\pi}$, for example. The main drawback is simply that it is more difficult to calculate by hand and of less interest to the general reader.) Start with the quadratic equation

\left(\frac{b}{2}-c\right)^{2}=x (1)

where $b/2$ is our “offset” value and we can solve to get $\frac{b}{2}-c=\pm\sqrt{x}$. Next, rewrite equation (1) in expanded form as

c^{2}-bc+\frac{b^{2}}{4}=x

and perform algebraic manipulation to obtain

c=\frac{x-(b^{2}/4)}{c-b}

which can be easily transformed into the fixed-point iteration of

c_{n+1}=\frac{x-(b^{2}/4)}{c_{n}-b}. (2)

From here, it is easily verified that the two fixed points for $c$ are given by $c_{-}=\frac{b}{2}-\sqrt{x}$ and $c_{+}=\frac{b}{2}+\sqrt{x}$. Determining which of these is stable is fundamental in knowing how to proceed.

We define $f(c)=\frac{x-(b^{2}/4)}{c-b}$ and take the absolute value of the derivative to get

|f^{\prime}(c)|=\left|\frac{x-(b^{2}/4)}{(c-b)^{2}}\right|.

We first check the fixed point $c_{+}=\frac{b}{2}+\sqrt{x}$. Substituting yields

|f^{\prime}(c_{+})|=\left|\frac{x-(b^{2}/4)}{(\sqrt{x}-b/2)^{2}}\right|=\left|\frac{x-(b^{2}/4)}{x-b\sqrt{x}+(b^{2}/4)}\right|.

Consider the case in which we have picked $b$ such that $0<b<2\sqrt{x}$. We can rewrite $\sqrt{x}=b/2+\varepsilon$ for some $\varepsilon>0$. Then, we can simplify to

|f^{\prime}(c_{+})|=\left|\frac{b\varepsilon+\varepsilon^{2}}{\varepsilon^{2}}\right|>1.

In this case, then, the fixed point $c_{+}$ is unstable. Consider instead the case of $0<2\sqrt{x}<b$. We rewrite $\sqrt{x}=b/2-\varepsilon$ for some $0<\varepsilon<b/2$. Then, we can simplify to

|f^{\prime}(c_{+})|=\left|\frac{-b\varepsilon+\varepsilon^{2}}{\varepsilon^{2}}\right|=\left|1-\frac{b}{\varepsilon}\right|>1,

where the final inequality holds true because of the bounds on $\varepsilon$: since $\varepsilon<b/2$, we have $b/\varepsilon>2$. This is unstable once more.

Next, let us consider the case of $-2\sqrt{x}<b<0$. Rewriting $\sqrt{x}=-b/2+\varepsilon$ for some value $\varepsilon>0$ gives

|f^{\prime}(c_{+})|=\left|\frac{-b\varepsilon+\varepsilon^{2}}{(-b+\varepsilon)^{2}}\right|=\left|\frac{\varepsilon}{\varepsilon-b}\right|<1.

Unlike the first two cases, this shows that the fixed point $c_{+}$ is stable for such a selection of $b$.

Fourth, we consider the case of $b<-2\sqrt{x}<0$. Rewriting $\sqrt{x}=-b/2-\varepsilon$ for some value $0<\varepsilon<-b/2$ gives

|f^{\prime}(c_{+})|=\left|\frac{b\varepsilon+\varepsilon^{2}}{(-b-\varepsilon)^{2}}\right|=\left|\frac{\varepsilon}{\varepsilon+b}\right|<1.

Once again, the fixed point is stable.

The last inequality here may not be immediately clear, but observe that the denominator $\varepsilon+b$ lies within the open interval from $b$ (which is greater in magnitude than any possible value of $\varepsilon$) to $b/2$ (at which endpoint the numerator and denominator would be equal in magnitude), so the ratio remains below 1.

Now, we address the few remaining special cases. When we have the exact equality $b=2\sqrt{x}$, we see $|f^{\prime}(c_{+})|\to\infty$, making it unstable. When $b=0$, we have that $|f^{\prime}(c_{+})|=1$, indicating neutral stability (the same situation described as a “flaw” at the beginning of section 2). Finally, when we have $b=-2\sqrt{x}$, we see $|f^{\prime}(c_{+})|=0$, making the point superstable.

Nearly identical calculations reveal that whenever the point $c_{+}$ is stable, the point $c_{-}$ is unstable, and vice versa. A summary table of all the stability results above for the different possible values of $b$ is provided below:

\begin{array}{c|c|c}
 & c_{-}=b/2-\sqrt{x} & c_{+}=b/2+\sqrt{x} \\ \hline
b<-2\sqrt{x} & \text{unstable} & \text{stable} \\
b=-2\sqrt{x} & \text{unstable} & \text{superstable} \\
-2\sqrt{x}<b<0 & \text{unstable} & \text{stable} \\
b=0 & \text{neutral} & \text{neutral} \\
0<b<2\sqrt{x} & \text{stable} & \text{unstable} \\
b=2\sqrt{x} & \text{superstable} & \text{unstable} \\
2\sqrt{x}<b & \text{stable} & \text{unstable} \\
\end{array}

Essentially, the key takeaway is that there is a kind of symmetry about $b=0$ with respect to the stability of these fixed points. To make things simpler, we can choose to consider strictly positive $b$ and deal only with $c_{-}$, or we may choose to consider only negative $b$ and the corresponding $c_{+}$; either is equally valid. In this analysis, we opt for the former.

3.1 How to put the square root algorithm into practice

A quick glance at the summary table provides instructions on how to select the constant $b$: choose a positive value as close to $2\sqrt{x}$ as possible to maximize the convergence rate. (Estimating $\sqrt{x}$ is what we seek to do with this algorithm, so coming up with a good initial guess is not necessarily an easy task. It can be noted, however, that for an input integer $x$ with $n$ digits, the square root of $x$ will have approximately $n/2$ digits, which can be used to rapidly generate a ballpark figure.)

Once we have chosen an appropriate value of $b$, we want to choose some initial value $c_{1}$ for our iteration. Fortunately, the convergence does not depend on this value, except that the denominator of (2) must never be zero. To avoid this, set $c_{1}$ to some number not equal to $b$, and if by chance a division by zero does occur, simply restart the process with a new value of $c_{1}$. (Ideally, $c_{1}\approx b/2-\sqrt{x}$ will provide the best accuracy in the fewest iterations. We can use the same trick as before to estimate $\sqrt{x}$ here if desired, or simply note that this estimate gives $b/2\approx\sqrt{x}$ and set $c_{1}=0$.)

After values of $b$ and $c_{1}$ are both chosen, iterate with equation (2) until achieving the desired level of accuracy (in terms of unchanging decimal places, etc.).

To finish the process, calculate $b/2-c_{n}$ to get an estimate for $\sqrt{x}$.
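This recipe is straightforward to automate. Below is a minimal Python sketch (our own illustration; the function name, tolerance, and restart policy are arbitrary choices, and the digit-count heuristic above supplies $b$ when none is given) implementing equation (2):

# Approximate sqrt(x) via the fixed-point iteration (2):
#   c_{n+1} = (x - b^2/4) / (c_n - b),  estimate = b/2 - c.
def approx_sqrt(x, b=None, c=0.0, tol=1e-12, max_iter=10000):
    if b is None:
        # An n-digit integer has a square root with roughly n/2 digits,
        # so 2 * 10^(n/2) serves as a ballpark guess for 2*sqrt(x).
        b = 2 * 10 ** (len(str(int(x))) / 2)
    for _ in range(max_iter):
        if c == b:
            c += 1.0  # restart with a new c_1, as described above
            continue
        c_next = (x - b * b / 4) / (c - b)
        if abs(c_next - c) < tol:
            return b / 2 - c_next
        c = c_next
    return b / 2 - c

print(approx_sqrt(7))             # ~2.6457513...
print(approx_sqrt(5, b=4, c=10))  # ~2.2360679..., the example below

Since every positive $b$ leaves $c_{-}$ stable (per the summary table), the quality of the heuristic guess affects only the convergence rate, never whether the method converges.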

3.2 Example of square-root application

Suppose we wish to calculate an approximation of $\sqrt{5}$. Since $4\approx 5$, it stands to reason that $\sqrt{4}=2\approx\sqrt{5}$, so we will choose the value $b=4$ as our guess. Since we have a fairly good estimate for $b$, it would make sense to choose $c_{1}=0$. For the purpose of demonstrating the convergence speed even in non-optimal cases, however, we will choose $c_{1}=10$ here instead. Now, we use the recursive formula

c_{n+1}=\frac{x-4}{c_{n}-4}

to calculate $c_{n}$ up to our desired level of precision. Afterwards, we will evaluate $2-c_{n}$ to obtain an approximation of $\sqrt{5}$. The results of these calculations are provided below:

\begin{array}{c|c}
n & c_{n} \\ \hline
1 & 10 \\
2 & 0.16666666\ldots \\
3 & -0.26086956\ldots \\
4 & -0.23469387\ldots \\
5 & -0.23614457\ldots \\
6 & -0.23606370\ldots \\
7 & -0.23606821\ldots \\
8 & -0.23606796\ldots \\
9 & -0.23606797\ldots \\
10 & -0.23606797\ldots \\
\end{array}

Finally, we conclude by computing $2-c_{10}\approx 2.23606797745$, to be compared against the desired value of $\sqrt{5}\approx 2.23606797750$.

Interestingly, this specific process produces the continued fraction

-[0;\bar{4}]=2-\sqrt{5}=\cfrac{1}{-4+\cfrac{1}{-4+\cfrac{1}{-4+\cfrac{1}{\cdots}}}}

that clearly demonstrates why $2-c\approx 2-(2-\sqrt{5})=\sqrt{5}$.

3.3 Attempting to expand to nth roots

It is clear how to extend this reasoning to more than simply square roots. To estimate a power $\sqrt[d]{x}$, we start with the analogous form of equation (1):

\left(\frac{b}{2}-c\right)^{d}=x\iff\frac{b}{2}-c=\sqrt[d]{x}. (3)

To make analysis easier, we can swap the sign of $c$ to give

\left(\frac{b}{2}+c\right)^{d}=x\iff\frac{b}{2}+c=\sqrt[d]{x}.

Binomial expansion of the left equality yields

x=\left(\frac{b}{2}\right)^{d}+{d\choose 1}\left(\frac{b}{2}\right)^{d-1}c+{d\choose 2}\left(\frac{b}{2}\right)^{d-2}c^{2}+\cdots+{d\choose d-1}\left(\frac{b}{2}\right)c^{d-1}+c^{d}.

All terms in the binomial expansion, except for the first, have at least one factor of $c$. Then, using the same algebraic manipulations as before, it is trivial to find that

x-(b/2)^{d}=c\cdot\left(c^{d-1}+\cdots\right)=c\cdot\left(\frac{(b/2+c)^{d}-(b/2)^{d}}{c}\right).

Cross-multiplying, it follows that the generalized form of (2) is given by

c_{n+1}=\frac{c_{n}\cdot\left(x-\left(\frac{b}{2}\right)^{d}\right)}{\left(\frac{b}{2}+c_{n}\right)^{d}-\left(\frac{b}{2}\right)^{d}}, (4)

where the fixed point for $c$ satisfies

\sqrt[d]{x}=\frac{b}{2}+c.

The equations (1) and (2) can be readily recovered by setting $d=2$, replacing $c$ with $-c$ in formulas (3) and (4), and simplifying.

Setting $d=3$, on the other hand, results in the equation

c_{n+1}=\frac{x-(b^{3}/8)}{c_{n}^{2}+3bc_{n}/2+3b^{2}/4}.

Note that we can choose to avoid doing this simplification and just use equation (4) directly, which is especially helpful for large $d$.

However, trying this equation with test values may reveal a disappointing truth: the equation is not as versatile as the square root special case. As an exercise, attempt to calculate $\sqrt[3]{1250}$ with the values $(b,c_{1})=(7,1)$ or $(b,c_{1})=(7,10)$. They don’t converge to the correct value! On the other hand, testing with $(b,c_{1})=(70,1)$ or $(b,c_{1})=(70,10)$ does produce the correct value of about $10.772$. To understand why this occurs, we must return to the idea of fixed point stability.
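This exercise is easy to reproduce with a short Python sketch of equation (4) (our own illustrative code; the function name, tolerance, and iteration cap are arbitrary):

# Approximate x^(1/d) via the generalized iteration (4):
#   c_{n+1} = c_n * (x - (b/2)^d) / ((b/2 + c_n)^d - (b/2)^d),
# recovering the estimate as b/2 + c.
def approx_dth_root(x, d, b, c=1.0, tol=1e-12, max_iter=100000):
    half = b / 2
    for _ in range(max_iter):
        denom = (half + c) ** d - half ** d
        if denom == 0:
            c += 1.0  # dodge a zero denominator and continue
            continue
        c_next = c * (x - half ** d) / denom
        if abs(c_next - c) < tol:
            return half + c_next
        c = c_next
    return half + c  # may be meaningless if b lies in a bad interval

print(approx_dth_root(1250, 3, b=70))  # ~10.772, as claimed
print(approx_dth_root(1250, 3, b=7))   # fails to settle on the cube root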

In the quadratic special case, we found the fixed points $c$ of the algorithm as dictated by the $b$ and $x$ inputs. These were of the form $c_{-}=b/2-\sqrt{x}$ and $c_{+}=b/2+\sqrt{x}$. Now, in the higher-degree case, we have performed a sign swap, so these fixed points are of the form $c=\sqrt[d]{x}-b/2$ (with a $\pm$ on the radical term in the case of even $d$). Then, to analyze the fixed point stability, we differentiate the right-hand side of (4) with respect to $c$ at the fixed point(s). Finding where the magnitude of the derivative crosses one, we will see for which values of $b$ the fixed point(s) switch from being stable to unstable. In the example with $\sqrt[3]{1250}$, this yields values of $b\approx 7.88578$ and $b\approx-29.43013$, where the algorithm fails to converge to the correct value within the interval but succeeds outside the interval. (Solving the resulting quadratic equation in $b$ analytically gives $b=\sqrt[3]{x}\cdot(\pm\sqrt{3}-1)$, which supports the aforementioned empirical values.)

The cases of even higher degree work even less frequently. In computer-based testing, it seems that the algorithm works, but only for specific values of $b$ inside tight intervals. This is a stark contrast with the quadratic case, which required only that $b$ have a particular sign.

In the seventh-degree case, for example, the least positive value of $b$ for which calculating $\sqrt[7]{1250}$ seems to work is approximately $4.1755$, while the lowest possible positive value of $b$ for estimating $\sqrt[7]{12500}$ is approximately $5.8019$. Guessing that the seventh powers of these bounds are related to their corresponding values of $x$ (analogous to the cubic example), we see that the ratio $b_{\mathrm{lower}}^{7}/x\approx 17.704$ in each case. Extrapolating to $x=125000$ predicts $b_{\mathrm{lower}}\approx 8.0617$, which can be verified via computer. Evidently, this trend is real, although the exact mechanism by which the number 17.704 can be generated remains elusive.

Unfortunately, there does not seem to be a clear path to a general analytical solution to this problem of determining the interval(s) of validity. It should be noted that whereas there was no upper bound on the valid range of $b$ values for the cubic case, the same does not seem to hold true for higher-degree examples. Fortunately, computer testing suggests that even at high degrees, there are at least some values of $b$ for which this process works, so a solution could recover some of the usefulness of this technique.
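The computer testing mentioned here can be sketched as a brute-force scan (entirely our own illustrative code, with an arbitrary grid, iteration count, and tolerance; it only roughly locates interval endpoints):

# Scan positive b on a grid and record which values make iteration (4)
# settle on the d-th root of x; contiguous runs suggest intervals of validity.
def converges(x, d, b, c=1.0, iters=2000, tol=1e-6):
    half = b / 2
    for _ in range(iters):
        denom = (half + c) ** d - half ** d
        if denom == 0:
            return False  # count a degenerate orbit as a failure
        c = c * (x - half ** d) / denom
    return abs((half + c) - x ** (1 / d)) < tol

x, d = 1250, 7
good = [i / 100 for i in range(1, 2001) if converges(x, d, i / 100)]
print(min(good), max(good))  # ~4.18 and the largest working b in this range

A scan of this kind is how empirical bounds such as the 4.1755 quoted above can be recovered.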

4 Comparison to existing methods

We will draw two connections: one to continued fractions, and one to the Newton-Raphson method mentioned before.

We recall that the special case $d=2$ in equation (4), after the sign swap noted in section 3.3, results in (2):

c_{n+1}=\frac{x-(b^{2}/4)}{c_{n}-b}

In this case, we see that the recurrence relation allows for an infinite generalized continued fraction to be constructed [2].

c=\frac{b}{2}-\sqrt{x}=\cfrac{x-(b^{2}/4)}{-b+\cfrac{x-(b^{2}/4)}{-b+\cfrac{x-(b^{2}/4)}{\ddots}}}

In the example of section 3.2, the numerators were each 1, making it a “simple” continued fraction.
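Numerically, a truncation of this continued fraction can be evaluated from the innermost term outward (a small illustrative sketch under our own naming):

# Evaluate the generalized continued fraction for c = b/2 - sqrt(x),
# truncated at `depth` levels, by folding from the bottom up.
def cf_sqrt(x, b, depth=40):
    num = x - b * b / 4  # every partial numerator is x - b^2/4
    c = 0.0
    for _ in range(depth):
        c = num / (-b + c)
    return b / 2 - c

print(cf_sqrt(5, 4))  # ~2.2360679..., i.e. sqrt(5)

Folding the truncated fraction bottom-up is, of course, just iteration (2) started from $c_{1}=0$.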

Next, the Newton-Raphson method is another technique used for approximating roots of positive real numbers, and it also relies on fixed-point iteration in its calculations.

Suppose we want to find an approximation for $\sqrt{7}$. We know that we can achieve this by searching for the positive root of $f(x)=x^{2}-7$. By setting $x_{1}=1$ and performing successive linear approximations with

x_{n+1}=x_{n}-\frac{f(x_{n})}{f^{\prime}(x_{n})}=x_{n}-\frac{x_{n}^{2}-7}{2x_{n}}, (5)

we will have $x_{n}$ converge to the desired value. (This is equivalent to the Babylonian method, after algebraic simplification.)
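In Python, iteration (5) looks like the following (an illustrative sketch; the starting value and step count are arbitrary):

# Newton-Raphson for sqrt(7): x_{n+1} = x_n - (x_n^2 - 7) / (2 x_n).
x = 1.0
for _ in range(8):
    x = x - (x * x - 7) / (2 * x)
print(x)  # 2.6457513110645907 after only a handful of steps

Note how few iterations are needed here compared with the linearly convergent sketches above.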

In fact, we see that at the fixed point $x_{n}=\sqrt{7}$, the derivative of the right-hand side with respect to $x_{n}$ is zero. This indicates that the Newton-Raphson method converges at least quadratically (the number of accurate decimal places roughly doubles with each iteration), which is better than the method explored in this paper, which only converges linearly (since the derivative is generally nonzero, as discussed in the section on stability). This can be explained by the theory of Taylor expansion [3].

Thus, the Newton-Raphson method and higher-order analogues like Halley’s method experience a much faster rate of convergence, with the trade-off being that elementary differentiation of the polynomial $f(x)$ is necessary for their computation. Since this polynomial is always of the form $f(x_{n})=x_{n}^{d}-k$ with degree $d$ and constant $k$, we know that $f^{\prime}(x_{n})$ in the right-hand side of (5) will always be equal to $dx_{n}^{d-1}$ by the power rule. For Halley’s method and higher-order root-finding algorithms, the higher-order derivatives can be similarly computed without much effort.

References