Power Series Composition in Near-Linear Time
Abstract
We present an algebraic algorithm that computes the composition of two power series in softly linear time. The previous best algorithms are a non-algebraic algorithm by Kedlaya and Umans (FOCS 2008) and an algebraic algorithm by Neiger, Salvy, Schost and Villard (JACM 2023).
Our algorithm builds upon the recent Graeffe iteration approach for manipulating rational power series, introduced by Bostan and Mori (SOSA 2021).
1 Introduction
Let $R$ be a commutative ring and let $f, g \in R[x]$ be polynomials of degree less than $n$. The problem of power series composition is to compute $f(g) \bmod x^n$, i.e., the first $n$ coefficients of $f(g(x))$. The terminology stems from the idea that $g$ can be seen as a truncated formal power series. This is a fundamental problem in computer algebra [Knu98, Section 4.7], and has applications in various areas such as combinatorics [PSS12] and cryptography [Bos+08].
A textbook algorithm for power series composition is due to Brent and Kung [BK78], which computes the composition in $O(\sqrt{n \log n}\, \mathsf{M}(n))$ time. Here $\mathsf{M}(n)$ denotes the complexity of computing the product of two polynomials of degree less than $n$.
The current best known algorithm for power series composition is due to Kedlaya and Umans [KU08, KU11], which computes the composition in $(n \log q)^{1+o(1)}$ bit operations, where $K$ is the finite field $\mathbb{F}_q$. Van der Hoeven and Lecerf [HL20] gave a detailed analysis of the subpolynomial term and sharpened the resulting complexity bound.
In this paper, we present a simple algorithm that reduces the complexity of power series composition to near-linear time. Furthermore, our algorithm works over arbitrary commutative rings.
Theorem 1.
Given polynomials $f, g \in R[x]$ of degree less than $n$, the power series composition $f(g) \bmod x^n$ can be computed in $O(\mathsf{M}(n) \log n)$ arithmetic operations.
We remark that our algorithm also has consequences for other computation models measured by bit complexity, such as boolean circuits and multitape Turing machines.
Technical Overview.
One of our key ingredients is Graeffe's root squaring method, which was first introduced in a purely algebraic setting by Schönhage [Sch00] to compute reciprocals of power series. Recently, Bostan and Mori [BM21] applied the Graeffe iteration to compute the coefficient $[x^N]$ of the series expansion of a rational power series $P(x)/Q(x)$, achieving time complexity $O(\mathsf{M}(d) \log N)$, where $d$ is the degree bound of $P$ and $Q$.
In a nutshell, their algorithm works as follows. Multiplying both the numerator and the denominator by $Q(-x)$, one obtains $[x^N]\, P(x)/Q(x) = [x^N]\, P(x)Q(-x) / (Q(x)Q(-x))$. (We use $[x^N]$ to denote coefficient extraction, i.e., for a power series $f = \sum_i f_i x^i$, $[x^N]\, f$ is the coefficient $f_N$.) By the symmetry of $Q(x)Q(-x)$, one can rewrite the expression as $[x^N]\, U(x)/V(x^2)$, where $U(x) = P(x)Q(-x)$ and $V(x^2) = Q(x)Q(-x)$. Furthermore, one can split $U$ into even and odd parts as $U(x) = U_e(x^2) + x\, U_o(x^2)$, and reduce the problem to $[x^{N/2}]\, U_e(x)/V(x)$ or $[x^{(N-1)/2}]\, U_o(x)/V(x)$, depending on the parity of $N$. Since each iteration halves $N$, the algorithm has time complexity $O(\mathsf{M}(d) \log N)$ by using polynomial multiplication.
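To make the iteration concrete, here is a minimal Python sketch of the univariate Bostan–Mori iteration (our own illustration, not the paper's pseudocode; the function name `kth_coeff` is ours, and schoolbook multiplication stands in for a fast $\mathsf{M}(d)$-time product):

```python
def kth_coeff(P, Q, N):
    """Return [x^N] P(x)/Q(x), where P, Q are coefficient lists and Q[0] == 1."""
    def mul(a, b):  # schoolbook product; replace by an FFT-based routine in practice
        res = [0] * (len(a) + len(b) - 1)
        for i, c in enumerate(a):
            for j, d in enumerate(b):
                res[i + j] += c * d
        return res

    while N > 0:
        Qneg = [-c if i % 2 else c for i, c in enumerate(Q)]  # Q(-x)
        U = mul(P, Qneg)          # numerator P(x) Q(-x)
        P = U[N % 2::2]           # keep the part whose parity matches N
        if not P:
            return 0
        Q = mul(Q, Qneg)[::2]     # V(x), where V(x^2) = Q(x) Q(-x)
        N //= 2
    return P[0]                   # [x^0] P/Q = P(0)/Q(0), and Q(0) == 1
```

For instance, with $P = x$ and $Q = 1 - x - x^2$, the coefficient $[x^N]\,P/Q$ is the $N$-th Fibonacci number.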
Although their algorithm has the same time complexity as the classical algorithm due to Fiduccia [Fid85], it is based on a totally different idea, and it is worth taking a closer look at it when $N = \Theta(d)$. In that case, the terms of degree greater than $N$ are irrelevant, hence the size of the problem can be reduced to roughly $N/2$ after one iteration, compared with $d$. Since the size of the problem is halved in each iteration, the total time complexity is $O(\mathsf{M}(d))$.
We extend the Graeffe iteration approach to a special kind of bivariate rational power series $P(x,y)/Q(x,y)$, where $Q(x,y) = 1 - y\, g(x)$. The goal is still to compute the coefficient $[x^{n-1}]$ of the series expansion of $P/Q$, but in this case, the output is a polynomial in $y$ instead of a single $x$-coefficient.
At the beginning of our algorithm, both $P$ and $Q$ have degree less than $n$ with respect to $x$, and degree at most $1$ with respect to $y$. In each iteration, we halve the degree with respect to $x$ and double the degree with respect to $y$, therefore the size of the problem stays at $O(n)$. Since there are $O(\log n)$ iterations, the total time complexity is $O(\mathsf{M}(n) \log n)$.
We observe that the above algorithm actually already solves the power projection problem, which is the transposed problem of power series composition. By the transposition principle, a general method for transforming algebraic algorithms into algorithms for the transposed problem with nearly identical complexity, we obtain an algorithm for power series composition. For simplicity, we unpack the transposition principle and give an independent derivation of the composition algorithm in our presentation.
1.1 Related Work
Special Cases.
Many previous works have focused on special cases where $f$ and/or $g$ has a specific form. Kung [Kun74] applied Newton iteration to compute the reciprocal of a power series, i.e., $g^{-1} \bmod x^n$, in $O(\mathsf{M}(n))$ operations; Brent [Bre76] further extended the algorithm to the case where $f$ is an elementary function, in $O(\mathsf{M}(n) \log n)$ operations. Extensive work has been done to further optimize the constant factor of the algorithms computing reciprocals [SGV94, Ber04, Sch00, Har11, Ser10], square roots [Ber04, Har11, Ser10], exponentials [Ber04, Hoe10, BS09, Har09, Ser10, Ser12], and other elementary functions.
Gerhard [Ger00] gave an algorithm for a special case of composition that can be interpreted as the conversion between the falling factorial basis and the monomial basis of polynomials. Brent and Kung [BK78] showed that when $g$ is a polynomial of constant degree, the composition can be computed in $O(\mathsf{M}(n) \log n)$ time. This result was generalized by van der Hoeven [Hoe02] to the case where $g$ is an algebraic power series. Bostan, Salvy and Schost [BSS08] isolated a large class of $g$ such that the composition problem can be computed in $O(\mathsf{M}(n))$ or $O(\mathsf{M}(n) \log n)$ operations, by composing several basic subroutines and using the transposition principle [BLS03].
Bernstein [Ber98] studied the composition of power series in the case where $R$ has a small characteristic. When $R$ has positive characteristic $p$, his algorithm computes the composition in essentially linear time in $n$, with a polynomial dependence on $p$. Ritzmann [Rit86] studied numerical algorithms for power series composition, where $R$ is the field of complex numbers.
Modular Composition.
Besides Brent and Kung's $O(\sqrt{n \log n}\, \mathsf{M}(n))$-time algorithm, all known approaches to general power series composition actually solve a generalized problem called the modular composition problem. This problem is, given three polynomials $f, g, h \in R[x]$ where $h$ is monic, to compute $f(g) \bmod h$, where the operation $\bmod$ takes the remainder of the Euclidean division. When restricted to $h = x^n$, the modular composition problem becomes the power series composition problem.
In Brent and Kung's paper [BK78] on power series composition, they also presented an algorithm that computes modular composition in $O(n^{(\omega+1)/2})$ operations, where $\omega$ is a feasible exponent of matrix multiplication. Huang and Pan [HP98] improved the bound to $O(n^{\omega(1,1,2)/2})$ by using fast rectangular matrix multiplication, where $\omega(1,1,2)$ is a feasible exponent of rectangular matrix multiplication of size $n \times n \times n^2$. Even assuming one can attain the obvious lower bounds $\omega = 2$ and $\omega(1,1,2) = 3$, their algorithms cannot yield a bound better than $O(n^{1.5})$.
Based on this, Bürgisser, Clausen and Shokrollahi [BCS10, open problem 2.4] and von zur Gathen and Gerhard [GG13, research problem 12.19] asked the following question:
Question 1.
Is there an algorithm that computes modular composition faster than Brent and Kung's approach, or even faster than $O(n^{1.5})$?
Their question has been partially answered by Kedlaya and Umans, and by Neiger, Salvy, Schost and Villard.
Kedlaya and Umans [KU08, KU11] presented two algorithms for computing modular composition over the finite field $K = \mathbb{F}_q$. The first algorithm performs $(n \log q)^{1+o(1)}$ bit operations, while the second one performs $n^{1+o(1)}$ algebraic operations provided that the characteristic $p$ of the field is small. However, the first algorithm involves non-algebraic operations, such as lifting from the prime field to the integers, and the second algorithm is efficient only for fields with small characteristic. As a result, their approach is unlikely to extend to the case of general rings.
A recent breakthrough by Neiger, Salvy, Schost and Villard [Nei+23] presented a Las Vegas algorithm that computes modular composition in $\tilde O(n^{\kappa})$ arithmetic operations, assuming $K$ is a field and under some mild technical assumptions. (We use $\tilde O(\cdot)$ to suppress logarithmic factors, i.e., $\tilde O(T(n))$ denotes $O(T(n) \log^{c} T(n))$ for some constant $c$.) Here the exponent $\kappa < 1.5$ is determined by feasible exponents of square and rectangular matrix multiplication.
By the best known upper bounds on $\omega$ and $\omega(1,1,2)$ given by Williams, Xu, Xu and Zhou [Wil+24], it follows that $\kappa \le 1.43$, which breaks the $O(n^{1.5})$ barrier. However, by the obvious lower bounds $\omega \ge 2$ and $\omega(1,1,2) \ge 3$, their algorithm cannot yield a near-linear time bound.
Our results can be seen as another partial answer to Question 1.
2 Preliminaries
2.1 Notation
In this paper, $R$ denotes an effective ring, i.e., a commutative ring with unity in which one can perform the basic ring operations at unit cost. The polynomial ring and the formal power series ring over $R$ are denoted by $R[x]$ and $R[[x]]$, respectively. We use $R[x]_{<n}$ to denote the set of polynomials of degree less than $n$. For any polynomial or power series $f$, $[x^i]\, f$ denotes the coefficient of $x^i$, and $f \bmod x^n$ denotes the truncation $\sum_{i < n} ([x^i] f)\, x^i$.
For a bivariate polynomial $f \in R[x,y]$, $\deg_x f$ and $\deg_y f$ denote its degree with respect to $x$ and $y$, respectively. The bidegree of $f$ is the pair $(\deg_x f, \deg_y f)$. Inequalities between bidegrees are component-wise. For example, $(a, b) \le (c, d)$ means $a \le c$ and $b \le d$.
2.2 Computation Model
Informally, we measure the complexity of an algorithm by counting the number of arithmetic operations in the ring $R$. The underlying computation model is the straight-line program. Readers are referred to the textbook of Bürgisser, Clausen and Shokrollahi [BCS10, Sec. 4.1] for a detailed explanation.
2.3 Polynomial Multiplication
Our algorithm calls polynomial multiplication as a crucial subroutine, so its time complexity may vary over different rings according to the available polynomial multiplication algorithms.
For an effective ring $R$, we call a function $\mathsf{M} \colon \mathbb{N} \to \mathbb{R}_{>0}$ a multiplication time for $R$ if polynomials in $R[x]$ of degree less than $n$ can be multiplied in $\mathsf{M}(n)$ operations. When the ring $R$ is clear from the context, we simply write $\mathsf{M}(n)$.
Throughout this paper, we assume a regularity condition on $\mathsf{M}$: for some constants $c_1, c_2 > 1$, the function satisfies $c_1\, \mathsf{M}(n) \le \mathsf{M}(2n) \le c_2\, \mathsf{M}(n)$ for all sufficiently large $n$. All the examples of $\mathsf{M}$ presented in this paper satisfy this condition.
The fast Fourier transform (FFT) [CT65] gives the bound $\mathsf{M}(n) = O(n \log n)$ when $R$ is a field possessing roots of unity of sufficiently high order; for example, the field of complex numbers satisfies $\mathsf{M}(n) = O(n \log n)$. For any $\mathbb{F}_p$-algebra, where $p$ is a prime, Harvey and van der Hoeven [HH19] proved that $\mathsf{M}(n) = O(n \log n\, 4^{\log^* n})$, where $\log^*$ is the iterated logarithm. For general rings, Cantor and Kaltofen [CK91] showed a uniform bound $\mathsf{M}(n) = O(n \log n \log\log n)$. This implies that our algorithm has a uniform bound of $O(n \log^2 n \log\log n)$.
We may also obtain complexity bounds for various computation models, depending on the fast arithmetic of $R$ and the polynomial multiplication algorithm in the corresponding model. For example, for the bit complexity of boolean circuits and multitape Turing machines, since polynomial multiplication over $\mathbb{F}_q$ can be done in $(n \log q)^{1+o(1)}$ bit operations [HH19], substituting such a bound into our algorithm yields a bound of $(n \log q)^{1+o(1)}$ for the bit complexity of the composition problem.
2.4 Operations on Power Series and Polynomials
Our algorithm relies on the following operations on power series and polynomials.
Theorem 2 ([Kun74]).
Let $R$ be an effective ring and let $f \in R[[x]]$ satisfy $[x^0] f = 1$. The truncation of the reciprocal, $f^{-1} \bmod x^n$, can be computed in $O(\mathsf{M}(n))$ operations.
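As an illustration of Theorem 2, here is a minimal Python sketch of Newton iteration for reciprocals (our own, not the paper's pseudocode; it assumes the constant term is $1$ and uses schoolbook truncated multiplication where a fast routine would be used in practice):

```python
def reciprocal(f, n):
    """Return f^{-1} mod x^n as a coefficient list, assuming f[0] == 1."""
    def mul_trunc(a, b, k):  # a*b mod x^k, schoolbook for clarity
        res = [0] * k
        for i, c in enumerate(a[:k]):
            for j, d in enumerate(b):
                if i + j < k:
                    res[i + j] += c * d
        return res

    h = [1]                      # inverse modulo x^1
    k = 1
    while k < n:
        k = min(2 * k, n)        # double the precision each round
        t = [-c for c in mul_trunc(f, h, k)]
        t[0] += 2                # t = 2 - f*h mod x^k
        h = mul_trunc(h, t, k)   # Newton step: h <- h*(2 - f*h) mod x^k
    return h
```

Since the precision doubles each round, the cost is dominated by the last multiplication, giving the $O(\mathsf{M}(n))$ bound of the theorem.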
Using Kronecker's substitution, one can reduce the multiplication of bivariate polynomials to the univariate case.
Theorem 3 ([GG13, Corollary 8.28]).
Let $f, g \in R[x,y]$ with bidegree bound $(d_x, d_y)$. Their product can be computed in $O(\mathsf{M}(d_x d_y))$ operations.
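Theorem 3 can be illustrated by a short Python sketch (our own): pack $y$ as a large power of $x$ so that the $x$-slots of the product never collide, multiply once in one variable, and unpack.

```python
def kronecker_mul(a, b, dx):
    """Multiply bivariate polynomials given as dicts {(i, j): coeff} for x^i y^j,
    assuming both factors have x-degree < dx, via the substitution y -> x^K."""
    K = 2 * dx - 1  # the product has x-degree <= 2*dx - 2 < K, so slots cannot collide

    def pack(p):
        u = {}
        for (i, j), c in p.items():
            u[i + K * j] = u.get(i + K * j, 0) + c
        return u

    ua, ub = pack(a), pack(b)
    prod = {}
    for e, c in ua.items():        # univariate product (schoolbook here; in practice
        for f, d in ub.items():    # this is a single M(d_x d_y)-time multiplication)
            prod[e + f] = prod.get(e + f, 0) + c * d
    # unpack: each univariate exponent e encodes (e mod K, e // K)
    return {(e % K, e // K): c for e, c in prod.items() if c}
```

The entire cost is one univariate multiplication of degree $O(d_x d_y)$, which is exactly the statement of the theorem.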
We also require the following lemma, which bounds the total time complexity of polynomial multiplications with exponentially decreasing degrees.
Lemma 4 ([BK78, Lemma 1.1]).
$\sum_{i=0}^{\infty} \mathsf{M}(\lceil n/2^i \rceil) = O(\mathsf{M}(n))$.
2.5 Transposition Principle
The transposition principle, a.k.a. Tellegen's principle, is an algorithmic theorem that allows one to produce a new algorithm for the transposed problem of a given linear problem.
Theorem 5 ([BCS10, Thm. 13.20]).
Let $A$ be an $m \times n$ matrix with no zero rows and no zero columns. Suppose that there exists a linear straight-line program of length $\ell$ that computes the matrix-vector product $v \mapsto Av$. Then the transposed problem $w \mapsto A^{\mathsf T} w$ can be solved by a linear straight-line program of length $\ell - n + m$.
Roughly speaking, the transposition principle allows one to solve the transposed problem with essentially the same time complexity as the original problem, provided that the algorithm is $R$-linear, in the sense that only operations linear in the coefficients of the input are performed.
The transposed problem of power series composition is the power projection problem, which is defined as follows.
Definition 1 (Power Projection).
Given a positive integer $n$, a polynomial $g \in R[x]$ of degree less than $n$, and a linear form $w \colon R[x]_{<n} \to R$, the power projection problem is to compute $\langle w, g^k \bmod x^n \rangle$ for all $0 \le k < n$. The linear form $w$ is identified with its values on the monomials $1, x, \dots, x^{n-1}$, i.e., $w_i = \langle w, x^i \rangle$.
For a given polynomial $g$, consider the matrix $A_g \in R^{n \times n}$ given by $(A_g)_{i,k} = [x^i](g^k \bmod x^n)$. The power series composition problem is equivalent to computing $A_g \mathbf{f}$, where $\mathbf{f}$ is the column vector of coefficients of $f$, and the power projection problem is equivalent to computing $A_g^{\mathsf T} \mathbf{w}$, where $\mathbf{w}$ is the column vector of values of the linear form $w$. It is important to note that the power projection and composition problems are linear only when the polynomial $g$ is fixed, as the output of these problems is nonlinear with respect to the coefficients of $g$.
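The matrix view can be made concrete in a few lines of Python (our own illustration): build the matrix whose $k$-th column holds the coefficients of $g^k \bmod x^n$; multiplying a vector by it performs composition, and multiplying by its transpose performs power projection.

```python
def composition_matrix(g, n):
    """Return A as nested lists, with A[i][k] = [x^i](g^k mod x^n)."""
    cols, cur = [], [1] + [0] * (n - 1)   # cur holds g^k mod x^n, starting at k = 0
    for _ in range(n):
        cols.append(cur[:])
        nxt = [0] * n                     # cur * g mod x^n (schoolbook)
        for i, c in enumerate(cur):
            for j, d in enumerate(g):
                if i + j < n:
                    nxt[i + j] += c * d
        cur = nxt
    return [[cols[k][i] for k in range(n)] for i in range(n)]
```

With `A = composition_matrix(g, n)`, the product $A \mathbf{f}$ gives the coefficients of $f(g) \bmod x^n$, while $A^{\mathsf T} \mathbf{w}$ gives the projections $\langle w, g^k \bmod x^n \rangle$.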
3 Main Results
In Section 3.1, we present an efficient -linear algorithm for power projection. By applying the transposition principle to this algorithm, we obtain an algorithm for power series composition. In Section 3.2, we describe the derivation of the power series composition algorithm without relying on the transposition principle, thereby providing a direct proof of Theorem 1.
3.1 Algorithm for Power Projection
Our objective is to compute $\langle w, g^k \bmod x^n \rangle$ for all $0 \le k < n$. We define a polynomial $W(x) = \sum_{i=0}^{n-1} w_i x^{n-1-i}$. Let $P(x,y) = W(x)$ and $Q(x,y) = 1 - y\, g(x)$. Then, the generating function is rearranged as:
$$\sum_{k \ge 0} \langle w, g^k \bmod x^n \rangle\, y^k = [x^{n-1}]\, \frac{P(x,y)}{Q(x,y)}.$$
If $n = 1$, then $[x^0]\, P/Q = P(0,y)/Q(0,y)$, which can be computed by a single reciprocal of a power series in $y$. If $n > 1$, we multiply $Q(-x,y)$ by the numerator and the denominator, resulting in
$$[x^{n-1}]\, \frac{P(x,y)}{Q(x,y)} = [x^{n-1}]\, \frac{P(x,y)\, Q(-x,y)}{Q(x,y)\, Q(-x,y)}.$$
Let $S(x,y) = Q(x,y)\, Q(-x,y)$. $S$ satisfies the equation $S(x,y) = S(-x,y)$, which implies $[x^j]\, S = 0$ for all odd $j$; thus, there exists a unique polynomial $V$ such that $S(x,y) = V(x^2, y)$. Now, we have
$$[x^{n-1}]\, \frac{P(x,y)}{Q(x,y)} = [x^{n-1}]\, \frac{U(x,y)}{V(x^2,y)}$$
for $U(x,y) = P(x,y)\, Q(-x,y)$. Since $1/V(x^2,y)$ is an even formal power series with respect to $x$, i.e., its coefficient of $x^j$ vanishes for all odd $j$, we can ignore the odd (or even) part of $U$ if $n-1$ is even (or odd). Let $U_e$ and $U_o$ be the unique polynomials such that $U(x,y) = U_e(x^2,y) + x\, U_o(x^2,y)$. Then, we have
$$[x^{n-1}]\, \frac{U(x,y)}{V(x^2,y)} = \begin{cases} [x^{(n-1)/2}]\, \dfrac{U_e(x,y)}{V(x,y)} & \text{if } n-1 \text{ is even}, \\[1ex] [x^{(n-2)/2}]\, \dfrac{U_o(x,y)}{V(x,y)} & \text{if } n-1 \text{ is odd}. \end{cases}$$
Now the problem is reduced to a case where $n$ is halved. We repeat this reduction until $n$ becomes $1$. The resulting algorithm is summarized in Algorithm 1.
Input: a positive integer $n$, a polynomial $g \in R[x]_{<n}$, and a vector $w = (w_0, \dots, w_{n-1}) \in R^n$
Output: $\langle w, g^k \bmod x^n \rangle$ for $k = 0, 1, \dots, n-1$
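The whole reduction fits in a short Python sketch (our own rendering of the iteration, not the paper's pseudocode; dictionary-based bivariate arithmetic with schoolbook multiplication stands in for the $\mathsf{M}$-time subroutines of Theorem 3):

```python
def power_projection(w, g, n):
    """Return [<w, g^k mod x^n> for k = 0..n-1] via the bivariate Graeffe iteration.
    Bivariate polynomials in x, y are dicts {(i, j): coeff} for x^i y^j."""
    def mul(a, b, tx):  # product truncated to deg_x < tx and deg_y < n
        res = {}
        for (i, j), c in a.items():
            for (k, l), d in b.items():
                if i + k < tx and j + l < n:
                    res[(i + k, j + l)] = res.get((i + k, j + l), 0) + c * d
        return res

    P = {(n - 1 - i, 0): c for i, c in enumerate(w) if c}   # P = sum_i w_i x^{n-1-i}
    Q = {(0, 0): 1}                                         # Q = 1 - y g(x)
    for i, c in enumerate(g):
        if c:
            Q[(i, 1)] = -c
    N = n - 1                                               # we want [x^N] P/Q
    while N > 0:
        Qneg = {(i, j): -c if i % 2 else c for (i, j), c in Q.items()}  # Q(-x, y)
        U = mul(P, Qneg, N + 1)         # numerator: only x^0..x^N can matter
        P = {(i // 2, j): c for (i, j), c in U.items() if i % 2 == N % 2}
        Q = {(i // 2, j): c             # V(x, y), where V(x^2, y) = Q(x,y) Q(-x,y)
             for (i, j), c in mul(Q, Qneg, N + 1).items() if i % 2 == 0}
        N //= 2
    # N == 0: the answer is P(0, y)/Q(0, y) mod y^n, a power series division in y
    p, q = [0] * n, [0] * n
    for (i, j), c in P.items():
        if i == 0 and j < n:
            p[j] += c
    for (i, j), c in Q.items():
        if i == 0 and j < n:
            q[j] += c                   # q[0] == 1 throughout the iteration
    s = [0] * n
    for k in range(n):
        s[k] = p[k] - sum(q[t] * s[k - t] for t in range(1, k + 1))
    return s
```

Each pass halves the $x$-degrees while doubling the $y$-degrees (truncated modulo $y^n$), so every bivariate product stays of size $O(n)$, matching the analysis below.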
Theorem 6.
Given a positive integer $n$, a polynomial $g \in R[x]$ with degree less than $n$, and a linear form $w$, one can compute $\langle w, g^k \bmod x^n \rangle$ for all $0 \le k < n$ in $O(\mathsf{M}(n) \log n)$ operations.
Proof.
We analyze the time complexity of Algorithm 1. Let $T_i$ represent the time complexity required for the polynomial multiplications in the $i$-th iteration out of a total of $\lceil \log_2 n \rceil$ iterations in Algorithm 1. Here, we consider the first iteration as the $0$-th iteration. By induction, we observe that both $\deg_x P$ and $\deg_x Q$ are at most $\lceil n/2^i \rceil$ in the $i$-th iteration. At the same time, we also have both $\deg_y P$ and $\deg_y Q$ at most $2^i$, and all computations may be truncated modulo $y^n$, which implies $T_i = O(\mathsf{M}(n))$ by Theorem 3. By combining these bounds with Lemma 4, the total time complexity of all iterations is bounded as $\sum_i T_i = O(\mathsf{M}(n) \log n)$.
The computation in line 13 can be performed in $O(\mathsf{M}(n))$ operations by Theorem 2. The other parts of the algorithm have a negligible contribution to the overall time complexity. Consequently, the total time complexity is $O(\mathsf{M}(n) \log n)$. ∎
3.2 Algorithm for Composition
The algorithm presented in this section is essentially the transposition of the algorithm presented in Section 3.1. However, to enhance comprehension, we present a derivation that does not rely on the transposition principle. To this end, we rewrite the objective $f(g) \bmod x^n$.
Let $F(y) = \sum_{j=0}^{n-1} f_j\, y^{n-1-j}$ and $Q(x,y) = 1 - y\, g(x)$. Then, we observe that
$$f(g) \bmod x^n = [y^{n-1}]\, \frac{F(y)}{Q(x,y)} \bmod x^n,$$
since $1/Q(x,y) = \sum_{k \ge 0} g(x)^k y^k$ and the coefficient of $y^{n-1}$ in $F(y) \sum_k g^k y^k$ is exactly $\sum_j f_j\, g^j$.
Let us define an operator extracting this coefficient of the rational function; our objective is then to compute it for the numerator $F$ and the denominator $Q$. For algorithmic convenience, we generalize our aim to computing the same quantity for a given bivariate numerator. If $n = 1$, the problem reduces to the reciprocal of a power series in $y$. If $n > 1$, similar to the transformation described in Section 3.1, we can derive a reduction through the identity $Q(x,y)\, Q(-x,y) = V(x^2,y)$, in which the numerator is multiplied by $Q(-x,y)$. Moreover, we can discard some terms of the intermediate results with respect to $x$, because they have no contribution to the terms of the required order when multiplied by $Q(-x,y)$. Therefore, the objective can be rewritten as a problem of the same shape with smaller parameters.
Now the problem is reduced to a case where $n$ is halved. We repeat this reduction until $n$ becomes $1$. The resulting algorithm is summarized in Algorithm 2.
Input: a positive integer $n$, the polynomials $f, g \in R[x]_{<n}$, and the recursion parameters of Algorithm 2
Output: the composition $f(g) \bmod x^n$
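Implementations of Algorithm 2 are conveniently checked against the straightforward baseline: Horner's rule on truncated polynomials, which uses one truncated multiplication per coefficient of $f$, i.e. $\Theta(n\, \mathsf{M}(n))$ operations in total. A minimal Python sketch (ours, for testing purposes only):

```python
def compose_naive(f, g, n):
    """Return f(g) mod x^n by Horner's rule: res <- res*g + f_j, from the top down."""
    res = [0] * n
    for c in reversed(f):
        nxt = [0] * n                  # res * g mod x^n (schoolbook)
        for i, a in enumerate(res):
            if a:
                for j, b in enumerate(g):
                    if i + j < n:
                        nxt[i + j] += a * b
        nxt[0] += c
        res = nxt
    return res
```

This is the baseline that the near-linear algorithm of Theorem 1 improves upon by a factor of roughly $n / \log n$.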
Proof of Theorem 1.
We analyze the time complexity of Algorithm 2. Let $T_i$ represent the time complexity required for line 6 and line 10 in the $i$-th invocation out of a total of $O(\log n)$ recursive invocations of Algorithm 2. Here, we consider the first invocation as the $0$-th invocation. By induction, we observe that the polynomials handled in the $i$-th invocation have $x$-degree $O(n/2^i)$ and $y$-degree $O(2^i)$, which implies $T_i = O(\mathsf{M}(n))$ by Theorem 3. By combining these bounds with Lemma 4, the total time complexity of all invocations is bounded as $\sum_i T_i = O(\mathsf{M}(n) \log n)$.
The computation in line 3 can be performed in $O(\mathsf{M}(n))$ operations by Theorem 2. The other parts of the algorithm have a negligible contribution to the overall time complexity. Consequently, the total time complexity is $O(\mathsf{M}(n) \log n)$. ∎
Acknowledgements
We extend our gratitude to Josh Alman and Hanna Sumita for their review and revisions of our paper. We also appreciate the helpful suggestions provided by the anonymous reviewers.
References
- [Ber98] Daniel J. Bernstein “Composing power series over a finite ring in essentially linear time” In Journal of Symbolic Computation 26.3, 1998, pp. 339–341 DOI: 10.1006/jsco.1998.0216
- [Ber04] Daniel J. Bernstein “Removing redundancy in high-precision Newton iteration” Available from http://cr.yp.to/papers.html#fastnewton, 2004
- [BLS03] Alin Bostan, Grégoire Lecerf and Éric Schost “Tellegen’s principle into practice” In Proceedings of the 2003 international symposium on Symbolic and algebraic computation, 2003, pp. 37–44
- [Bos+08] Alin Bostan, François Morain, Bruno Salvy and Éric Schost “Fast algorithms for computing isogenies between elliptic curves” In Mathematics of Computation 77.263, 2008, pp. 1755–1778 DOI: 10.1090/S0025-5718-08-02066-8
- [BM21] Alin Bostan and Ryuhei Mori “A simple and fast algorithm for computing the $N$-th term of a linearly recurrent sequence” In Symposium on Simplicity in Algorithms (SOSA), 2021, pp. 118–132 SIAM
- [BSS08] Alin Bostan, Bruno Salvy and Éric Schost “Power series composition and change of basis” In Proceedings of the twenty-first international symposium on Symbolic and algebraic computation, 2008, pp. 269–276
- [BS09] Alin Bostan and Éric Schost “A simple and fast algorithm for computing exponentials of power series” In Information Processing Letters 109.13, 2009, pp. 754–756 DOI: 10.1016/j.ipl.2009.03.012
- [Bre76] Richard P Brent “Multiple-precision zero-finding methods and the complexity of elementary function evaluation” In Analytic computational complexity Elsevier, 1976, pp. 151–176
- [BK78] Richard P. Brent and Hsiang T. Kung “Fast algorithms for manipulating formal power series” In Journal of the ACM 25.4, 1978, pp. 581–595
- [BCS10] Peter Bürgisser, Michael Clausen and Mohammad A. Shokrollahi “Algebraic Complexity Theory” Springer, 2010
- [CK91] David G. Cantor and Erich Kaltofen “On fast multiplication of polynomials over arbitrary algebras” In Acta Informatica 28.7, 1991, pp. 693–701 DOI: 10.1007/BF01178683
- [CT65] James W. Cooley and John W. Tukey “An algorithm for the machine calculation of complex Fourier series” In Mathematics of Computation 19.90, 1965, pp. 297–301 URL: http://www.jstor.org/stable/2003354
- [Fid85] Charles M. Fiduccia “An efficient formula for linear recurrences” In SIAM Journal on Computing 14, 1985, pp. 106–112 DOI: 10.1137/0214007
- [GG13] Joachim von zur Gathen and Jürgen Gerhard “Modern Computer Algebra” Cambridge University Press, 2013 DOI: 10.1017/CBO9781139856065
- [Ger00] Jürgen Gerhard “Modular algorithms for polynomial basis conversion and greatest factorial factorization” In Proceedings of the Seventh Rhine Workshop on Computer Algebra (RWCA’00), 2000, pp. 125–141
- [Har09] David Harvey “Faster exponentials of power series” In arXiv preprint arXiv:0911.3110, 2009
- [Har11] David Harvey “Faster algorithms for the square root and reciprocal of power series” In Mathematics of Computation 80.273, 2011, pp. 387–394 DOI: 10.1090/S0025-5718-2010-02392-0
- [HH19] David Harvey and Joris van der Hoeven “Faster polynomial multiplication over finite fields using cyclotomic coefficient rings” Id/No 101404 In Journal of Complexity 54, 2019, pp. 18 DOI: 10.1016/j.jco.2019.03.004
- [Hoe02] Joris van der Hoeven “Relax, but don’t be too lazy” In Journal of Symbolic Computation 34.6, 2002, pp. 479–542 DOI: 10.1006/jsco.2002.0562
- [Hoe10] Joris van der Hoeven “Newton’s method and FFT trading” In Journal of Symbolic Computation 45.8, 2010, pp. 857–878 DOI: 10.1016/j.jsc.2010.03.005
- [HL20] Joris van der Hoeven and Grégoire Lecerf “Fast multivariate multi-point evaluation revisited” Id/No 101405 In Journal of Complexity 56, 2020, pp. 38 DOI: 10.1016/j.jco.2019.04.001
- [HP98] Xiaohan Huang and Victor Y. Pan “Fast rectangular matrix multiplication and applications” In Journal of Complexity 14.2, 1998, pp. 257–299 DOI: 10.1006/jcom.1998.0476
- [KU08] Kiran S. Kedlaya and Christopher Umans “Fast modular composition in any characteristic” In 2008 49th Annual IEEE Symposium on Foundations of Computer Science, 2008, pp. 146–155 DOI: 10.1109/FOCS.2008.13
- [KU11] Kiran S. Kedlaya and Christopher Umans “Fast polynomial factorization and modular composition” In SIAM Journal on Computing 40.6, 2011, pp. 1767–1802 DOI: 10.1137/08073408X
- [Knu98] Donald E. Knuth “The Art of Computer Programming. Vol. 2: Seminumerical Algorithms.” Addison-Wesley, 1998
- [Kun74] Hsiang T. Kung “On computing reciprocals of power series” In Numerische Mathematik 22.5, 1974, pp. 341–348
- [Nei+23] Vincent Neiger, Bruno Salvy, Éric Schost and Gilles Villard “Faster Modular Composition” In Journal of the ACM, 2023 DOI: 10.1145/3638349
- [PSS12] Carine Pivoteau, Bruno Salvy and Michèle Soria “Algorithms for combinatorial structures: well-founded systems and Newton iterations” In Journal of Combinatorial Theory. Series A 119.8, 2012, pp. 1711–1773 DOI: 10.1016/j.jcta.2012.05.007
- [Rit86] Peter Ritzmann “A fast numerical algorithm for the composition of power series with complex coefficients” In Theoretical computer science 44, 1986, pp. 1–16
- [Sch00] Arnold Schönhage “Variations on computing reciprocals of power series” In Information Processing Letters 74.1-2, 2000, pp. 41–46 DOI: 10.1016/S0020-0190(00)00044-2
- [SGV94] Arnold Schönhage, Andreas Grotefeld and Ekkehart Vetter “Fast algorithms. A multitape Turing machine implementation” B.I. Wissenschaftsverlag, 1994
- [Ser10] Igor S. Sergeev “Fast algorithms for elementary operations on complex power series” In Discrete Mathematics and Applications 20.1, 2010, pp. 25–60 DOI: 10.1515/DMA.2010.002
- [Ser12] Igor S. Sergeev “A note on the fast power series’ exponential” In arXiv preprint arXiv:1203.3883, 2012
- [Wil+24] Virginia V. Williams, Yinzhan Xu, Zixuan Xu and Renfei Zhou “New bounds for matrix multiplication: from alpha to omega” In Proceedings of the 2024 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 2024, pp. 3792–3835 DOI: 10.1137/1.9781611977912.134