

Power Series Composition in Near-Linear Time

Yasunori Kinoshita Tokyo Institute of Technology, Japan.    Baitian Li Institute for Interdisciplinary Information Sciences, Tsinghua University. Email: [email protected].
Abstract

We present an algebraic algorithm that computes the composition of two power series in softly linear time complexity. The previous best algorithms are an \mathop{\mathrm{O}}(n^{1+o(1)}) non-algebraic algorithm by Kedlaya and Umans (FOCS 2008) and an \mathop{\mathrm{O}}(n^{1.43}) algebraic algorithm by Neiger, Salvy, Schost and Villard (JACM 2023).

Our algorithm builds upon the recent Graeffe iteration approach to manipulate rational power series introduced by Bostan and Mori (SOSA 2021).

1 Introduction

Let 𝔸\mathbb{A} be a commutative ring and let f(x),g(x)f(x),g(x) be polynomials in 𝔸[x]\mathbb{A}[x] of degrees less than mm and nn, respectively. The problem of power series composition is to compute the coefficients of f(g(x))modxnf(g(x))\bmod x^{n}. The terminology stems from the idea that g(x)g(x) can be seen as a truncated formal power series. This is a fundamental problem in computer algebra [Knu98, Section  4.7], and has applications in various areas such as combinatorics [PSS12] and cryptography [Bos+08].

A textbook algorithm for power series composition is due to Brent and Kung [BK78], which computes the composition in O(𝖬(n)(nlogn)1/2)\mathop{\mathrm{O}}(\mathop{\mathsf{M}}(n)(n\log n)^{1/2}) time complexity. Here 𝖬(n)\mathop{\mathsf{M}}(n) denotes the complexity of computing the product of two polynomials of degree less than nn.

The current best known algorithm for power series composition is due to Kedlaya and Umans [KU08, KU11], which computes the composition in (n\log q)^{1+o(1)} bit operations when \mathbb{A} is the finite field \mathbb{F}_{q}. Van der Hoeven and Lecerf [HL20] gave a detailed analysis of the subpolynomial term, and showed that the algorithm has a time complexity of \tilde{\mathop{\mathrm{O}}}(n2^{\mathop{\mathrm{O}}(\sqrt{\log n\log\log n})}\log q).

In this paper, we present a simple algorithm that reduces the complexity of power series composition to near-linear time. Furthermore, our algorithm works over arbitrary commutative rings.

Theorem 1.

Given polynomials f(x),g(x)𝔸[x]f(x),g(x)\in\mathbb{A}[x] with degree less than mm and nn, respectively, the power series composition f(g(x))modxnf(g(x))\bmod x^{n} can be computed in O(𝖬(n)logm+𝖬(m))\mathop{\mathrm{O}}(\mathop{\mathsf{M}}(n)\log m+\mathop{\mathsf{M}}(m)) arithmetic operations.

We remark that our algorithm also has consequences for other computation models measured by bit complexity, such as boolean circuits and multitape Turing machines.

Technical Overview.

One of our key ingredients is Graeffe’s root squaring method, which was introduced in a purely algebraic setting by Schönhage [Sch00] to compute reciprocals of power series. Recently, Bostan and Mori [BM21] applied the Graeffe iteration to compute the coefficient of x^{N} in the series expansion of a rational power series P(x)/Q(x), achieving time complexity \mathop{\mathrm{O}}(\mathop{\mathsf{M}}(n)\log N), where n is the degree bound of P(x) and Q(x).

In a nutshell, their algorithm works as follows. Multiplying both the numerator and the denominator by Q(-x), one obtains [x^{N}]\frac{P(x)}{Q(x)}=[x^{N}]\frac{P(x)Q(-x)}{Q(x)Q(-x)}. (We use [x^{N}] to denote coefficient extraction: for a power series F(x), [x^{N}]F(x) is the coefficient of x^{N} in F(x).) By the symmetry of Q(x)Q(-x), one can rewrite the expression as [x^{N}]\frac{U(x)}{V(x^{2})}, where U(x)=P(x)Q(-x) and V(x^{2})=Q(x)Q(-x). Furthermore, one can split U(x) into even and odd parts as U(x)=U_{e}(x^{2})+xU_{o}(x^{2}), and reduce the problem to [x^{N/2}]\frac{U_{e}(x)}{V(x)} or [x^{(N-1)/2}]\frac{U_{o}(x)}{V(x)}, depending on the parity of N. Since each iteration halves N, the algorithm has time complexity \mathop{\mathrm{O}}(\mathop{\mathsf{M}}(n)\log N) using polynomial multiplication.
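
To make the iteration concrete, the following is a small sketch of this procedure over a prime field. It is our own illustration rather than code from [BM21]: naive quadratic multiplication stands in for FFT-based multiplication, and the modulus and the Fibonacci example are arbitrary choices.

```python
# A minimal sketch of the Bostan-Mori iteration for [x^N] P(x)/Q(x) over Z/pZ.
MOD = 998244353  # an arbitrary prime modulus chosen for this example

def poly_mul(a, b):
    res = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            res[i + j] = (res[i + j] + ai * bj) % MOD
    return res

def coeff_of_rational(P, Q, N):
    """Return [x^N] P(x)/Q(x), assuming Q(0) is invertible mod MOD."""
    while N > 0:
        Qneg = [(-1) ** i * q % MOD for i, q in enumerate(Q)]  # Q(-x)
        U = poly_mul(P, Qneg)            # numerator P(x)Q(-x)
        A = poly_mul(Q, Qneg)            # denominator Q(x)Q(-x) = V(x^2)
        P = U[N % 2::2]                  # keep the part matching N's parity
        Q = A[::2]                       # even part: V(x)
        N //= 2
    # now the answer is [x^0] P(x)/Q(x) = P(0)/Q(0)
    return P[0] * pow(Q[0], MOD - 2, MOD) % MOD

# Example: the Fibonacci numbers have generating function x / (1 - x - x^2).
assert coeff_of_rational([0, 1], [1, MOD - 1, MOD - 1], 10) == 55
```
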

Although their algorithm has the same time complexity as the classical algorithm due to Fiduccia [Fid85], it is based on a totally different idea, and it is worth taking a closer look at it when N\leq n. In that case, the terms of degree greater than N are irrelevant, hence the size of the problem can be reduced from n to N. Since the size of the problem is halved in each iteration, the total time complexity is \mathop{\mathrm{O}}(\mathop{\mathsf{M}}(N)).

We extend the Graeffe iteration approach to a special kind of bivariate rational power series P(x)/Q(x,y), where Q(x,y)=1-yg(x). The goal is still to compute a coefficient of x in the series expansion of P(x)/Q(x,y), but in this case the output is a polynomial in y instead of a single coefficient in \mathbb{A}.

At the beginning of our algorithm, both P and Q have degree less than n with respect to x and degree at most 1 with respect to y. In each iteration, the degree with respect to x is halved and the degree with respect to y doubles, so the size of the problem stays at n. Since there are \log n iterations, the total time complexity is \mathop{\mathrm{O}}(\mathop{\mathsf{M}}(n)\log n).

We observe that the above algorithm actually already solves the power projection problem, which is the transposed problem of power series composition. By the transposition principle, a general method for transforming algebraic algorithms into algorithms for the transposed problem with nearly identical complexity, we obtain an algorithm for power series composition. For simplicity, our presentation unpacks the transposition principle and gives an independent derivation of the composition algorithm.

We first discuss the details of the Graeffe iteration algorithm for power projection in Section 3.1, as well as the analysis of correctness and running time. Then we explicitly present the algorithm for the power series composition in Section 3.2.

1.1 Related Work

Special Cases.

Many previous works have focused on special cases where f and/or g has a specific form. Kung [Kun74] applied Newton iteration to compute the reciprocal of a power series, i.e., f=(1+x)^{-1}, in \mathop{\mathrm{O}}(\mathop{\mathsf{M}}(n)) operations, and Brent [Bre76] further extended the approach to the case where f is an elementary function. Extensive work has been done to further optimize the constant factors of the algorithms computing reciprocals [SGV94, Ber04, Sch00, Har11, Ser10], square roots [Ber04, Har11, Ser10], exponentials [Ber04, Hoe10, BS09, Har09, Ser10, Ser12], and other elementary functions.

Gerhard [Ger00] gave an \mathop{\mathrm{O}}(\mathop{\mathsf{M}}(n)\log n) algorithm for the cases g(x)=e^{x}-1 and g(x)=\log(1+x), which can be interpreted as conversions between the falling factorial basis and the monomial basis of polynomials. Brent and Kung [BK78] showed that when g(x) is a polynomial of constant degree, the composition can be computed in \mathop{\mathrm{O}}(\mathop{\mathsf{M}}(n)\log n) time. This result was generalized by van der Hoeven [Hoe02] to the case where g(x) is an algebraic power series. Bostan, Salvy and Schost [BSS08] isolated a large class of g for which the composition can be computed in \mathop{\mathrm{O}}(\mathop{\mathsf{M}}(n)\log n) or \mathop{\mathrm{O}}(\mathop{\mathsf{M}}(n)) operations, by composing several basic subroutines and using the transposition principle [BLS03].

Bernstein [Ber98] studied power series composition in the case where \mathbb{A} has small characteristic. When \mathbb{A} has positive characteristic q, his algorithm computes the composition in \mathop{\mathrm{O}}((q/\log q)\mathop{\mathsf{M}}(n)\log n) operations. Ritzmann [Rit86] studied numerical algorithms for power series composition, where \mathbb{A} is the field of complex numbers.

Modular Composition.

Besides Brent and Kung’s \mathop{\mathrm{O}}(\mathop{\mathsf{M}}(n)(n\log n)^{1/2})-time algorithm, all known approaches to general power series composition actually solve a generalized problem called modular composition. This problem is, given three polynomials f(x),g(x),h(x)\in\mathbb{A}[x] where h is monic, to compute f(g)\bmod h, where the \bmod operation takes the remainder of Euclidean division. When restricted to h(x)=x^{n}, modular composition becomes power series composition.

In their paper on power series composition [BK78], Brent and Kung also presented an algorithm that computes modular composition in \mathop{\mathrm{O}}(n^{(\omega+1)/2}+\mathop{\mathsf{M}}(n)(n\log n)^{1/2}) operations, where \omega is a feasible exponent of matrix multiplication. Huang and Pan [HP98] improved the bound to \mathop{\mathrm{O}}(n^{\omega_{2}/2}+\mathop{\mathsf{M}}(n)(n\log n)^{1/2}) by using fast rectangular matrix multiplication, where \omega_{2}\leq\omega+1 is a feasible exponent for rectangular matrix multiplication of size n\times n^{2}\times n. Even assuming one could attain the obvious lower bounds \omega\geq 2 and \omega_{2}\geq 3, these algorithms cannot yield a bound better than \mathop{\mathrm{O}}(n^{1.5}).

Based on this, Bürgisser, Clausen and Shokrollahi [BCS10, open problem 2.4] and von zur Gathen and Gerhard [GG13, research problem 12.19] asked the following question:

Question 1.

Is there an algorithm that computes modular composition better than Brent and Kung’s approach, or even better than O(n1.5)\mathop{\mathrm{O}}(n^{1.5})?

Their question is partially answered by Kedlaya and Umans, and Neiger, Salvy, Schost and Villard.

Kedlaya and Umans [KU08, KU11] presented two algorithms for computing modular composition when \mathbb{A} is the finite field \mathbb{F}_{q}. The first algorithm performs (n\log q)^{1+o(1)} bit operations, while the second performs p^{1+o(1)}n^{1+o(1)} algebraic operations, where p is the characteristic of the field. However, the first algorithm involves non-algebraic operations, such as lifting from the prime field to the integers, and the second is efficient only for fields of small characteristic. As a result, their approach is unlikely to extend to the case of general rings.

A recent breakthrough by Neiger, Salvy, Schost and Villard [Nei+23] presented a Las Vegas algorithm that computes modular composition in \tilde{\mathop{\mathrm{O}}}(n^{\kappa}) arithmetic operations, assuming that \mathbb{A} is a field and under some mild technical assumptions. (We use \tilde{\mathop{\mathrm{O}}} to suppress logarithmic factors: T(n)=\tilde{\mathop{\mathrm{O}}}(f(n)) means T(n)=\mathop{\mathrm{O}}(f(n)\log^{k}f(n)) for some constant k.) Here \kappa is determined by

\kappa=1+\frac{1}{\frac{1}{\omega-1}+\frac{2}{\omega_{2}-2}}.

By the best known upper bounds \omega\leq 2.371552 and \omega_{2}\leq 3.250385 given by Williams, Xu, Xu and Zhou [Wil+24], it follows that \kappa\leq 1.43, which breaks the n^{1.5} barrier. However, by the obvious lower bounds \omega\geq 2 and \omega_{2}\geq 3, their algorithm cannot yield a bound better than \mathop{\mathrm{O}}(n^{4/3}).
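
As a quick sanity check of these exponents, the following few lines (a small illustrative computation, not part of the cited works) evaluate the formula for \kappa at the quoted upper bounds and at the lower bounds \omega=2, \omega_{2}=3.

```python
# Evaluate kappa = 1 + 1/(1/(omega-1) + 2/(omega2-2)) at the quoted exponents.
def kappa(omega, omega2):
    return 1 + 1 / (1 / (omega - 1) + 2 / (omega2 - 2))

print(kappa(2.371552, 3.250385))  # about 1.4294, hence kappa <= 1.43
print(kappa(2, 3))                # exactly 4/3, the barrier when omega = 2 and omega2 = 3
```
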

Our results can be seen as another partial answer to Question 1.

2 Preliminaries

2.1 Notation

In this paper, 𝔸\mathbb{A} denotes an effective ring, i.e., a commutative ring with unity, and one can perform the basic ring operations (+,,×)(+,-,\times) with unit cost in 𝔸\mathbb{A}. The polynomial ring and the formal power series ring over 𝔸\mathbb{A} are denoted by 𝔸[x]\mathbb{A}[x] and 𝔸[[x]]\mathbb{A}[[x]], respectively. We use 𝔸[x]<n\mathbb{A}[x]_{<n} to denote the set of polynomials of degree less than nn. For any polynomial or power series f(x)=ifixif(x)=\sum_{i}f_{i}x^{i}, [xi]f(x)[x^{i}]f(x) denotes the coefficient fif_{i} of xix^{i}, and f(x)modxnf(x)\bmod x^{n} denotes the truncation i=0n1fixi\sum_{i=0}^{n-1}f_{i}x^{i}.

For a bivariate polynomial f(x,y)𝔸[x,y]f(x,y)\in\mathbb{A}[x,y], degxf(x,y)\deg_{x}f(x,y) and degyf(x,y)\deg_{y}f(x,y) denote its degree with respect to xx and yy, respectively. The bidegree of f(x,y)f(x,y) is the pair bidegf(x,y)=(degxf(x,y),degyf(x,y))\mathop{\mathrm{bideg}}f(x,y)=(\deg_{x}f(x,y),\deg_{y}f(x,y)). Inequalities between bidegrees are component-wise. For example, bidegf(x,y)(n,m)\mathop{\mathrm{bideg}}f(x,y)\leq(n,m) means degxf(x,y)n\deg_{x}f(x,y)\leq n and degyf(x,y)m\deg_{y}f(x,y)\leq m.

2.2 Computation Model

Informally, we measure the complexity of an algorithm by counting the number of arithmetic operations in the ring 𝔸\mathbb{A}. The underlying computation model is the straight-line program. Readers are referred to the textbook of Bürgisser, Clausen and Shokrollahi [BCS10, Sec. 4.1] for a detailed explanation.

2.3 Polynomial Multiplication

Our algorithm calls polynomial multiplication as a crucial subroutine. Over different rings, the time complexity of our algorithm may vary due to the different polynomial multiplication algorithms.

For an effective ring 𝔸\mathbb{A}, we call a function 𝖬𝔸(n)\mathop{\mathsf{M}}_{\mathbb{A}}(n) a multiplication time for 𝔸[x]\mathbb{A}[x] if polynomials in 𝔸[x]\mathbb{A}[x] of degree less than nn can be multiplied in O(𝖬𝔸(n))\mathop{\mathrm{O}}(\mathop{\mathsf{M}}_{\mathbb{A}}(n)) operations. When the ring 𝔸\mathbb{A} is clear from the context, we simply write 𝖬(n)\mathop{\mathsf{M}}(n).

Throughout this paper, we assume a regularity condition on 𝖬(n)\mathop{\mathsf{M}}(n), that for some constants α,β(0,1)\alpha,\beta\in(0,1), the function 𝖬\mathop{\mathsf{M}} satisfies the condition 𝖬(αn)β𝖬(n)\mathop{\mathsf{M}}(\lceil\alpha n\rceil)\leq\beta\mathop{\mathsf{M}}(n) for all sufficiently large nn. All the examples of 𝖬\mathop{\mathsf{M}} presented in this paper satisfy this condition.

The fast Fourier transform (FFT) [CT65] gives the bound \mathop{\mathsf{M}}_{\mathbb{A}}(n)=\mathop{\mathrm{O}}(n\log n) when \mathbb{A} is a field possessing roots of unity of sufficiently high order; for example, the complex numbers \mathbb{C} satisfy \mathop{\mathsf{M}}_{\mathbb{C}}(n)=\mathop{\mathrm{O}}(n\log n). For any \mathbb{F}_{p}-algebra \mathbb{A}, where p is a prime, Harvey and van der Hoeven [HH19] proved that \mathop{\mathsf{M}}_{\mathbb{A}}(n)=\mathop{\mathrm{O}}(n\log n\,4^{\log^{*}n}), where \log^{*} denotes the iterated logarithm. For general rings, Cantor and Kaltofen [CK91] showed a uniform bound \mathop{\mathsf{M}}_{\mathbb{A}}(n)=\mathop{\mathrm{O}}(n\log n\log\log n). This implies that our algorithm has a uniform bound of \mathop{\mathrm{O}}(n\log n\log m\log\log n+m\log m\log\log m).

We may also obtain complexity bounds for various computation models, depending on the fast arithmetic of 𝔸\mathbb{A} and the polynomial multiplication algorithm on the corresponding model. For example, for the bit complexity of boolean circuits and multitape Turing machines, since the polynomial multiplication over 𝔽p\mathbb{F}_{p} can be done in 𝖬𝔽p(n)=O((nlogp)log(nlogp)4max(logp,logn))\mathop{\mathsf{M}}_{\mathbb{F}_{p}}(n)=\mathop{\mathrm{O}}((n\log p)\log(n\log p)4^{\max(\log^{*}p,\log^{*}n)}) bit operations [HH19], substituting such a bound into our algorithm yields a bound for the bit complexity of the composition problem.

2.4 Operations on Power Series and Polynomials

We need the following operations on power series and polynomials that are used in our algorithm.

Theorem 2 ([Kun74]).

Let \mathbb{A} be an effective ring and let f(x)\in\mathbb{A}[x]_{<n} satisfy f(0)=1. Then the truncated reciprocal f(x)^{-1}\bmod x^{n} can be computed in \mathop{\mathrm{O}}(\mathop{\mathsf{M}}_{\mathbb{A}}(n)) operations.
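
As an illustration of Theorem 2, the following sketch implements the classical Newton iteration h \mapsto h(2-fh) over a prime field, with naive truncated multiplication standing in for the fast routines cited above; the modulus is an arbitrary choice for the example.

```python
# Newton iteration for the reciprocal of a power series: each round doubles
# the number of correct coefficients of h = f^{-1} mod x^k.
MOD = 998244353

def poly_mul_trunc(a, b, n):
    res = [0] * min(n, len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < n:
                res[i + j] = (res[i + j] + ai * bj) % MOD
    return res

def series_inverse(f, n):
    """Return f(x)^{-1} mod x^n, assuming f[0] == 1."""
    h = [1]                      # correct modulo x^1
    k = 1
    while k < n:
        k = min(2 * k, n)
        fh = poly_mul_trunc(f[:k], h, k)
        two_minus = [(-c) % MOD for c in fh]   # 2 - f*h mod x^k
        two_minus[0] = (two_minus[0] + 2) % MOD
        h = poly_mul_trunc(h, two_minus, k)
    return h

# Example: 1/(1 - x) = 1 + x + x^2 + ... truncated mod x^5
assert series_inverse([1, MOD - 1], 5) == [1, 1, 1, 1, 1]
```
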

Using Kronecker’s substitution, one can reduce the multiplication of bivariate polynomials to the univariate case.

Theorem 3 ([GG13, Corollary 8.28]).

Let f(x,y),g(x,y)\in\mathbb{A}[x,y] have bidegree at most (n,m). Their product f(x,y)g(x,y) can be computed in \mathop{\mathrm{O}}(\mathop{\mathsf{M}}_{\mathbb{A}}(nm)) operations.
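
The following small sketch illustrates the idea behind Theorem 3: substituting y=x^{2n_x+1} packs a bivariate product into a single univariate product whose coefficients can be unpacked without collisions. The dictionary representation and the naive univariate product are our own simplifications for illustration.

```python
# Kronecker substitution for a bivariate product over Z/pZ.
MOD = 998244353

def poly_mul(a, b):
    res = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            res[i + j] = (res[i + j] + ai * bj) % MOD
    return res

def bivar_mul(F, G, nx):
    """F, G: dicts {(i, j): coeff} with x-degree at most nx in each factor."""
    stride = 2 * nx + 1                    # x-degrees of the product stay below this
    def pack(H):
        out = [0] * (stride * (max(j for _, j in H) + 1))
        for (i, j), c in H.items():
            out[i + stride * j] = c        # evaluate y at x^stride
        return out
    prod = poly_mul(pack(F), pack(G))
    return {(k % stride, k // stride): c for k, c in enumerate(prod) if c}

# (1 + x*y) * (1 - x*y) = 1 - x^2 y^2
F = {(0, 0): 1, (1, 1): 1}
G = {(0, 0): 1, (1, 1): MOD - 1}
assert bivar_mul(F, G, 1) == {(0, 0): 1, (2, 2): MOD - 1}
```
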

We also require the following lemma, which bounds the total time complexity of polynomial multiplications with exponentially decreasing degrees.

Lemma 4 ([BK78, Lemma 1.1]).

\sum_{i=0}^{\lfloor\log_{2}n\rfloor}\mathop{\mathsf{M}}(\lceil n/2^{i}\rceil)=\mathop{\mathrm{O}}(\mathop{\mathsf{M}}(n)).

2.5 Transposition Principle

The transposition principle, a.k.a. Tellegen’s principle, is an algorithmic theorem that allows one to derive, from an algorithm for a given linear problem, an algorithm for the transposed problem.

Theorem 5 ([BCS10, Thm. 13.20]).

Let MM be an r×sr\times s matrix. Suppose that there exists a linear straight-line program of length LL that computes the matrix-vector product bMbb\mapsto Mb, and MM has z0z_{0} zero rows and z1z_{1} zero columns. Then the transposed problem cM𝖳cc\mapsto M^{\mathsf{T}}c can be solved by a linear straight-line program of length Ls+rz1+z0L-s+r-z_{1}+z_{0}.

Roughly speaking, the transposition principle allows one to compute the transposed problem with the same time complexity as the original problem, provided that the algorithm is 𝔸\mathbb{A}-linear, in the sense that only linear operations in the coefficients of bb are performed.

The transposed problem of power series composition is the power projection problem, which is defined as follows.

Definition 1 (Power Projection).

Given a positive integer mm, a polynomial g(x)g(x) of degree less than nn, and a linear form w:𝔸[x]<n𝔸w\colon\mathbb{A}[x]_{<n}\to\mathbb{A}, the power projection problem is to compute fi=w(g(x)imodxn)f_{i}=w(g(x)^{i}\bmod x^{n}) for all i=0,,m1i=0,\dots,m-1. The linear form ww is identified by its values on the monomials xix^{i}, i.e., w(i=0n1aixi)=i=0n1wiaiw(\sum_{i=0}^{n-1}a_{i}x^{i})=\sum_{i=0}^{n-1}w_{i}a_{i}.

For a given polynomial g(x)g(x), consider the n×mn\times m matrix MM given by Mij=[xi]g(x)jM_{ij}=[x^{i}]g(x)^{j}. The power series composition problem is equivalent to computing MfMf, where ff is the column vector of coefficients of f(x)f(x), and the power projection problem is equivalent to computing M𝖳bM^{\mathsf{T}}b, where bb is the column vector of coefficients of the linear form ww. It is important to note that the power projection and composition problems are linear only when the polynomial gg is fixed, as the output of these problems is nonlinear with respect to the coefficients of gg.
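
The following toy example makes this matrix viewpoint concrete: it builds M_{ij}=[x^{i}]g(x)^{j} for a small g and checks that composition is the product Mf while power projection is M^{\mathsf{T}}w. It is a naive illustration of the definitions, not an efficient algorithm.

```python
# Naive construction of the matrix M with M[i][j] = [x^i] g(x)^j over Z/pZ.
MOD = 998244353

def poly_mul_trunc(a, b, n):
    res = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < n:
                res[i + j] = (res[i + j] + ai * bj) % MOD
    return res

def build_M(g, n, m):
    M = [[0] * m for _ in range(n)]
    p = [1] + [0] * (n - 1)                  # g^0 mod x^n
    for j in range(m):
        for i in range(n):
            M[i][j] = p[i]
        p = poly_mul_trunc(p, g, n)          # g^(j+1) mod x^n
    return M

g, n, m = [0, 2, 1], 3, 3                    # g(x) = 2x + x^2
M = build_M(g, n, m)
f = [1, 1, 1]                                # f(x) = 1 + x + x^2
# composition f(g(x)) mod x^n as the matrix-vector product M f
comp = [sum(M[i][j] * f[j] for j in range(m)) % MOD for i in range(n)]
assert comp == [1, 2, 5]                     # 1 + g + g^2 = 1 + 2x + 5x^2 mod x^3
# power projection w(g^j mod x^n) as the product M^T w
w = [0, 0, 1]                                # w extracts the coefficient of x^2
proj = [sum(M[i][j] * w[i] for i in range(n)) % MOD for j in range(m)]
assert proj == [0, 1, 4]                     # [x^2] of g^0, g^1, g^2
```
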

3 Main Results

In Section 3.1, we present an efficient 𝔸\mathbb{A}-linear algorithm for power projection. By applying the transposition principle to this algorithm, we obtain an algorithm for power series composition. In Section 3.2, we describe the derivation of the power series composition algorithm without relying on the transposition principle, thereby providing a direct proof of Theorem 1.

3.1 Algorithm for Power Projection

Our objective is to compute fi=w(g(x)imodxn)f_{i}=w(g(x)^{i}\bmod x^{n}) for all i=0,,m1i=0,\dots,m-1. We define a polynomial wRw^{\mathrm{R}} as wR(x)=j=0n1wn1jxjw^{\mathrm{R}}(x)=\sum_{j=0}^{n-1}w_{n-1-j}x^{j}. Let P(x,y)wR(x)P(x,y)\coloneqq w^{\mathrm{R}}(x) and Q(x,y)1yg(x)Q(x,y)\coloneqq 1-yg(x). Then, the generating function f(y)i=0m1fiyif(y)\coloneqq\sum_{i=0}^{m-1}f_{i}y^{i} is rearranged as:

\begin{split}f(y)&=\sum_{i=0}^{m-1}y^{i}\sum_{j=0}^{n-1}w_{j}[x^{j}]g(x)^{i}\\ &=\sum_{i=0}^{m-1}y^{i}[x^{n-1}](w^{\mathrm{R}}(x)g(x)^{i})\\ &=\left(\sum_{i=0}^{\infty}y^{i}[x^{n-1}](w^{\mathrm{R}}(x)g(x)^{i})\right)\bmod y^{m}\\ &=[x^{n-1}]\left(w^{\mathrm{R}}(x)\sum_{i=0}^{\infty}y^{i}g(x)^{i}\right)\bmod y^{m}\\ &=[x^{n-1}]\frac{w^{\mathrm{R}}(x)}{1-yg(x)}\bmod y^{m}\\ &=[x^{n-1}]\frac{P(x,y)}{Q(x,y)}\bmod y^{m}\eqqcolon u.\end{split}

If n-1=0, then u=P(0,y)/Q(0,y). If n-1\geq 1, we multiply both the numerator and the denominator by Q(-x,y), resulting in

u=[x^{n-1}]{\frac{P(x,y)Q(-x,y)\bmod x^{n}\bmod y^{m}}{Q(x,y)Q(-x,y)\bmod x^{n}\bmod y^{m}}}\bmod y^{m}.

Let A(x,y)Q(x,y)Q(x,y)modxnmodymA(x,y)\coloneqq Q(x,y)Q(-x,y)\bmod x^{n}\bmod y^{m}. A(x,y)A(x,y) satisfies the equation A(x,y)=A(x,y)A(x,y)=A(-x,y), which implies [xi]A(x,y)=0[x^{i}]A(x,y)=0 for all odd ii; thus, there exists a unique polynomial V(x,y)V(x,y) such that V(x2,y)=A(x,y)V(x^{2},y)=A(x,y). Now, we have

u=[x^{n-1}]\frac{U(x,y)}{V(x^{2},y)}\bmod y^{m}

for U(x,y)P(x,y)Q(x,y)modxnmodymU(x,y)\coloneqq P(x,y)Q(-x,y)\bmod x^{n}\bmod y^{m}. Since 1/V(x2,y)1/V(x^{2},y) is an even formal power series with respect to xx, i.e., [xi](1/V(x,y))=0[x^{i}](1/V(x,y))=0 for all odd ii, we can ignore the odd (or even) part of U(x,y)U(x,y) if n1n-1 is even (or odd). Let Ue(x,y)U_{e}(x,y) and Uo(x,y)U_{o}(x,y) be the unique polynomials such that U(x,y)=Ue(x2,y)+xUo(x2,y)U(x,y)=U_{e}(x^{2},y)+xU_{o}(x^{2},y). Then, we have

\begin{split}u&=\begin{cases}\displaystyle[x^{n-1}]\frac{U_{e}(x^{2},y)}{V(x^{2},y)}\quad&(n-1\text{ is even})\\ \displaystyle[x^{n-1}]\frac{xU_{o}(x^{2},y)}{V(x^{2},y)}\quad&(n-1\text{ is odd})\end{cases}\\ &=\begin{cases}\displaystyle[x^{\lceil n/2\rceil-1}]\frac{U_{e}(x,y)}{V(x,y)}\quad&(n-1\text{ is even})\\ \displaystyle[x^{\lceil n/2\rceil-1}]\frac{U_{o}(x,y)}{V(x,y)}\quad&(n-1\text{ is odd}).\end{cases}\end{split}

Now the problem is reduced to a case where nn is halved. We repeat this reduction until n1n-1 becomes 0. The resulting algorithm is summarized in Algorithm 1.

Algorithm 1 (𝖯𝗈𝗐𝖯𝗋𝗈𝗃\mathop{\mathsf{PowProj}})
Input: nn, mm, P(x,y),Q(x,y)𝔸[x,y]P(x,y),Q(x,y)\in\mathbb{A}[x,y]
Output: [xn1](P(x,y)/Q(x,y))modym[x^{n-1}](P(x,y)/Q(x,y))\bmod y^{m}
1:assert [x^{0}y^{0}]Q(x,y)=1
2:while n>1n>1 do
3:     U(x,y)P(x,y)Q(x,y)modxnmodymU(x,y)\leftarrow P(x,y)Q(-x,y)\bmod x^{n}\bmod y^{m}
4:     if n1n-1 is even then
5:         P(x,y)i=0n/21xi[s2i]U(s,y)P(x,y)\leftarrow\sum_{i=0}^{\lceil n/2\rceil-1}x^{i}[s^{2i}]U(s,y)
6:     else
7:         P(x,y)i=0n/21xi[s2i+1]U(s,y)P(x,y)\leftarrow\sum_{i=0}^{\lceil n/2\rceil-1}x^{i}[s^{2i+1}]U(s,y)
8:     end if
9:     A(x,y)Q(x,y)Q(x,y)modxnmodymA(x,y)\leftarrow Q(x,y)Q(-x,y)\bmod x^{n}\bmod y^{m}
10:     Q(x,y)i=0n/21xi[s2i]A(s,y)Q(x,y)\leftarrow\sum_{i=0}^{\lceil n/2\rceil-1}x^{i}[s^{2i}]A(s,y)
11:     nn/2n\leftarrow\lceil n/2\rceil
12:end while
13:return (P(0,y)/Q(0,y))modym(P(0,y)/Q(0,y))\bmod y^{m}
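
For concreteness, the following is a direct, unoptimized transcription of Algorithm 1 over a prime field. Bivariate polynomials are stored as nested lists, naive products replace the fast routines of Theorems 2 and 3, and the modulus and the small test are illustrative choices only; the sketch therefore matches the algorithm's structure but not its near-linear running time.

```python
MOD = 998244353  # an arbitrary prime modulus for this illustration

def bimul(P, Q, n, m):
    """P(x,y) * Q(x,y) mod x^n mod y^m; P[i][j] is the coefficient of x^i y^j."""
    R = [[0] * m for _ in range(n)]
    for i, rowP in enumerate(P):
        if i >= n:
            break
        for j, c in enumerate(rowP):
            if j >= m or c == 0:
                continue
            for k, rowQ in enumerate(Q):
                if i + k >= n:
                    break
                for l, d in enumerate(rowQ):
                    if j + l < m:
                        R[i + k][j + l] = (R[i + k][j + l] + c * d) % MOD
    return R

def negate_x(P):
    """Return P(-x, y)."""
    return [[c if i % 2 == 0 else (-c) % MOD for c in row] for i, row in enumerate(P)]

def series_inverse(f, m):
    """f(y)^{-1} mod y^m by the naive recurrence, assuming f[0] = 1."""
    inv = [0] * m
    inv[0] = 1
    for k in range(1, m):
        s = sum(f[j] * inv[k - j] for j in range(1, min(k, len(f) - 1) + 1)) % MOD
        inv[k] = (-s) % MOD
    return inv

def pow_proj(n, m, P, Q):
    """[x^{n-1}] (P(x,y)/Q(x,y)) mod y^m, assuming [x^0 y^0] Q = 1 (Algorithm 1)."""
    while n > 1:
        Qneg = negate_x(Q)
        U = bimul(P, Qneg, n, m)           # line 3
        A = bimul(Q, Qneg, n, m)           # line 9
        P = U[(n - 1) % 2::2]              # lines 4-8: keep the part matching the parity of n-1
        Q = A[0::2]                        # line 10: V(x, y)
        n = (n + 1) // 2                   # line 11
    num, den = P[0], Q[0]                  # line 13: P(0,y)/Q(0,y) mod y^m
    invden = series_inverse(den, m)
    return [sum(num[j] * invden[k - j] for j in range(min(k + 1, len(num)))) % MOD
            for k in range(m)]

# Example: w = "coefficient of x^2", g(x) = x + x^2, so w^R(x) = 1 and the
# expected output is [x^2] g(x)^i mod x^3 for i = 0, 1, 2.
wR = [[1]]
Q0 = [[1], [0, MOD - 1], [0, MOD - 1]]     # 1 - y*g(x)
assert pow_proj(3, 3, wR, Q0) == [0, 1, 1]
```
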
Theorem 6.

Given a positive integer mm, a polynomial g(x)𝔸[x]g(x)\in\mathbb{A}[x] with degree less than nn, and a linear form w:𝔸[x]<n𝔸w\colon\mathbb{A}[x]_{<n}\to\mathbb{A}, one can compute w(g(x)imodxn)=j=0n1wj[xj]g(x)iw(g(x)^{i}\bmod x^{n})=\sum_{j=0}^{n-1}w_{j}[x^{j}]g(x)^{i} for all i=0,,m1i=0,\dots,m-1 in O(𝖬(n)logm+𝖬(m))\mathop{\mathrm{O}}(\mathop{\mathsf{M}}(n)\log m+\mathop{\mathsf{M}}(m)) operations.

Proof.

We analyze the time complexity of 𝖯𝗈𝗐𝖯𝗋𝗈𝗃(n,m,j=0n1wn1jxj,1yg(x))\mathop{\mathsf{PowProj}}(n,m,\sum_{j=0}^{n-1}w_{n-1-j}x^{j},1-yg(x)). Let TkT_{k} represent the time complexity required for polynomial multiplications in the kk-th iteration out of a total of log2n\lceil\log_{2}n\rceil iterations in Algorithm 1. Here, we consider the first iteration as the 0-th iteration. By induction, we observe that both bidegP\mathop{\mathrm{bideg}}P and bidegQ\mathop{\mathrm{bideg}}Q are at most (n/2k1,2k)(\lceil n/2^{k}\rceil-1,2^{k}) in the kk-th iteration, which implies Tk=O(𝖬(n))T_{k}=\mathop{\mathrm{O}}(\mathop{\mathsf{M}}(n)). At the same time, we also have both bidegP\mathop{\mathrm{bideg}}P and bidegQ\mathop{\mathrm{bideg}}Q are at most (n/2k1,m1)(\lceil n/2^{k}\rceil-1,m-1), which implies Tk=O(𝖬(nm/2k))T_{k}=\mathop{\mathrm{O}}(\mathop{\mathsf{M}}(\lceil nm/2^{k}\rceil)). By combining these bounds with Lemma 4, the total time complexity of all iterations is bounded as

\begin{split}\sum_{k=0}^{\lceil\log_{2}n\rceil-1}T_{k}&=\mathop{\mathrm{O}}\left(\sum_{k=0}^{\lceil\log_{2}m\rceil-1}\mathop{\mathsf{M}}(n)\right)+\mathop{\mathrm{O}}\left(\sum_{k=\lceil\log_{2}m\rceil}^{\lceil\log_{2}n\rceil-1}\mathop{\mathsf{M}}(\lceil nm/2^{k}\rceil)\right)\\ &=\mathop{\mathrm{O}}(\mathop{\mathsf{M}}(n)\log m)+\mathop{\mathrm{O}}(\mathop{\mathsf{M}}(n))\\ &=\mathop{\mathrm{O}}(\mathop{\mathsf{M}}(n)\log m).\end{split}

The computation in line 13 can be performed in O(𝖬(m))\mathop{\mathrm{O}}(\mathop{\mathsf{M}}(m)) operations by Theorem 2. Other parts of the algorithm have a negligible contribution to the overall time complexity. Consequently, the total time complexity is O(𝖬(n)logm+𝖬(m))\mathop{\mathrm{O}}(\mathop{\mathsf{M}}(n)\log m+\mathop{\mathsf{M}}(m)). ∎

3.2 Algorithm for Composition

The algorithm presented in this section is essentially the transposition of the algorithm presented in Section 3.1. However, to enhance comprehension, we present a derivation that does not rely on the transposition principle. To this end, we rewrite the objective f(g(x))modxnf(g(x))\bmod x^{n}.

Let P(y)ym1f(y1),Q(x,y)1yg(x)P(y)\coloneqq y^{m-1}f(y^{-1}),Q(x,y)\coloneqq 1-yg(x). Then, we observe that

\begin{split}f(g(x))\bmod x^{n}&=\sum_{i=0}^{m-1}([y^{i}]f(y))g(x)^{i}\bmod x^{n}\\ &=[y^{m-1}]\left(y^{m-1}f(y^{-1})\sum_{i=0}^{\infty}y^{i}g(x)^{i}\right)\bmod x^{n}\\ &=[y^{m-1}]\frac{y^{m-1}f(y^{-1})}{1-yg(x)}\bmod x^{n}\\ &=[y^{m-1}]\frac{P(y)}{Q(x,y)}\bmod x^{n}.\end{split}

Let us define an operator \mathop{\mathcal{F}}\nolimits by

\mathop{\mathcal{F}}\nolimits_{l,r}\left(\sum_{i\geq 0}a_{i}(x)y^{i}\right)\coloneqq\sum_{i=l}^{r-1}a_{i}(x)y^{i-l}.

Then our objective is to compute m1,m(P(y)/Q(x,y))modxn\mathop{\mathcal{F}}\nolimits_{m-1,m}(P(y)/Q(x,y))\bmod x^{n}. For algorithmic convenience, we generalize our aim to computing d,m(P(y)/Q(x,y))modxn\mathop{\mathcal{F}}\nolimits_{d,m}(P(y)/Q(x,y))\bmod x^{n} for a given dd. If n=1n=1, then d,m(P(y)/Q(x,y))modxn=d,m(P(y)/Q(0,y))\mathop{\mathcal{F}}\nolimits_{d,m}(P(y)/Q(x,y))\bmod x^{n}=\mathop{\mathcal{F}}\nolimits_{d,m}(P(y)/Q(0,y)). If n>1n>1, similar to the transformation described in Section 3.1, we can derive that

\begin{split}\mathop{\mathcal{F}}\nolimits_{d,m}\left(\frac{P(y)}{Q(x,y)}\right)\bmod x^{n}&=\mathop{\mathcal{F}}\nolimits_{d,m}\left(\frac{P(y)Q(-x,y)}{Q(x,y)Q(-x,y)\bmod x^{n}\bmod y^{m}}\right)\bmod x^{n}\\ &=\mathop{\mathcal{F}}\nolimits_{d,m}\left(\frac{P(y)}{V(x^{2},y)}Q(-x,y)\right)\bmod x^{n}\eqqcolon u,\end{split}

where V(x^{2},y)=Q(x,y)Q(-x,y)\bmod x^{n}\bmod y^{m}. Let e\coloneqq\max\{0,d-\deg_{y}Q(x,y)\}. Then, to compute u, we can discard the terms of order less than e with respect to y in P(y)/V(x^{2},y), because they have no contribution to terms of order d or higher with respect to y when multiplied by Q(-x,y). Therefore, u can be rewritten as

\begin{split}u&=\mathop{\mathcal{F}}\nolimits_{d,m}\left(\left(\mathop{\mathcal{F}}\nolimits_{e,m}\left(\frac{P(y)}{V(x^{2},y)}\right)y^{e}\right)Q(-x,y)\right)\bmod x^{n}\\ &=\mathop{\mathcal{F}}\nolimits_{d-e,m-e}\left(\mathop{\mathcal{F}}\nolimits_{e,m}\left(\frac{P(y)}{V(x^{2},y)}\right)Q(-x,y)\right)\bmod x^{n}\\ &=\mathop{\mathcal{F}}\nolimits_{d-e,m-e}\left(\left(\mathop{\mathcal{F}}\nolimits_{e,m}\left(\frac{P(y)}{V(x^{2},y)}\right)\bmod x^{n}\right)Q(-x,y)\right)\bmod x^{n}\\ &=\mathop{\mathcal{F}}\nolimits_{d-e,m-e}\left(\left.{\left(\mathop{\mathcal{F}}\nolimits_{e,m}\left(\frac{P(y)}{V(z,y)}\right)\bmod z^{\lceil n/2\rceil}\right)}\right\rvert_{z=x^{2}}Q(-x,y)\right)\bmod x^{n}.\end{split}

Now the problem is reduced to a case where nn is halved. We repeat this reduction until nn becomes 11. The resulting algorithm is summarized in Algorithm 2.

Algorithm 2 (𝖢𝗈𝗆𝗉\mathop{\mathsf{Comp}})
Input: nn, dd, mm, P(y)𝔸[y]P(y)\in\mathbb{A}[y], Q(x,y)𝔸[x,y]Q(x,y)\in\mathbb{A}[x,y]
Output: d,m(P(y)/Q(x,y))modxn\mathop{\mathcal{F}}\nolimits_{d,m}(P(y)/Q(x,y))\bmod x^{n}
1:assert [x^{0}y^{0}]Q(x,y)=1
2:if n=1n=1 then
3:     C(y)(P(y)/Q(0,y))modymC(y)\leftarrow(P(y)/Q(0,y))\bmod y^{m}
4:     return i=dm1yid[ti]C(t)\sum_{i=d}^{m-1}y^{i-d}[t^{i}]C(t)
5:else
6:     A(x,y)Q(x,y)Q(x,y)modxnmodymA(x,y)\leftarrow Q(x,y)Q(-x,y)\bmod x^{n}\bmod y^{m}
7:     V(x,y)i=0n/21xi[s2i]A(s,y)V(x,y)\leftarrow\sum_{i=0}^{\lceil n/2\rceil-1}x^{i}[s^{2i}]A(s,y)
8:     emax{0,ddegyQ(x,y)}e\leftarrow\max\{0,d-\deg_{y}Q(x,y)\}
9:     W(x,y)𝖢𝗈𝗆𝗉(n/2,e,m,P(y),V(x,y))W(x,y)\leftarrow\mathop{\mathsf{Comp}}(\lceil n/2\rceil,e,m,P(y),V(x,y))
10:     B(x,y)W(x2,y)Q(x,y)B(x,y)\leftarrow W(x^{2},y)Q(-x,y)
11:     return i=deme1yi(de)[ti]B(x,t)modxn\sum_{i=d-e}^{m-e-1}y^{i-(d-e)}[t^{i}]B(x,t)\bmod x^{n}
12:end if
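
Analogously, here is a direct, unoptimized transcription of Algorithm 2 over a prime field, with the same caveats as the sketch after Algorithm 1: nested-list coefficients, naive products in place of Theorems 2 and 3, and a small illustrative test of our own choosing.

```python
MOD = 998244353  # an arbitrary prime modulus for this illustration

def bimul(P, Q, n, m):
    """P(x,y) * Q(x,y) mod x^n mod y^m; P[i][j] is the coefficient of x^i y^j."""
    R = [[0] * m for _ in range(n)]
    for i, rowP in enumerate(P):
        if i >= n:
            break
        for j, c in enumerate(rowP):
            if j >= m or c == 0:
                continue
            for k, rowQ in enumerate(Q):
                if i + k >= n:
                    break
                for l, d in enumerate(rowQ):
                    if j + l < m:
                        R[i + k][j + l] = (R[i + k][j + l] + c * d) % MOD
    return R

def negate_x(P):
    return [[c if i % 2 == 0 else (-c) % MOD for c in row] for i, row in enumerate(P)]

def series_inverse(f, m):
    """f(y)^{-1} mod y^m by the naive recurrence, assuming f[0] = 1."""
    inv = [0] * m
    inv[0] = 1
    for k in range(1, m):
        s = sum(f[j] * inv[k - j] for j in range(1, min(k, len(f) - 1) + 1)) % MOD
        inv[k] = (-s) % MOD
    return inv

def deg_y(Q):
    return max((j for row in Q for j, c in enumerate(row) if c), default=0)

def comp(n, d, m, P, Q):
    """F_{d,m}(P(y)/Q(x,y)) mod x^n, assuming [x^0 y^0] Q = 1 (Algorithm 2).
    The result is returned as rows: result[i][j] is the coefficient of x^i y^j."""
    if n == 1:                                 # lines 2-4
        inv = series_inverse(Q[0], m)
        C = [sum(P[j] * inv[k - j] for j in range(min(k + 1, len(P)))) % MOD
             for k in range(m)]
        return [C[d:]]
    Qneg = negate_x(Q)
    A = bimul(Q, Qneg, n, m)                   # line 6
    V = A[0::2]                                # line 7
    e = max(0, d - deg_y(Q))                   # line 8
    W = comp((n + 1) // 2, e, m, P, V)         # line 9
    Wx2 = []
    for row in W:                              # line 10: W(x^2, y)
        Wx2.extend([row, [0]])
    B = bimul(Wx2, Qneg, n, m - e)             # line 10, truncated to what line 11 keeps
    return [row[d - e:] for row in B]          # line 11

# Example: f(x) = 1 + x + x^2 composed with g(x) = 2x + x^2, truncated mod x^3.
f, g, n = [1, 1, 1], [0, 2, 1], 3
m = len(f)
P = f[::-1]                                    # y^{m-1} f(1/y) reverses the coefficient list
Q = [[1, (-g[0]) % MOD]] + [[0, (-c) % MOD] for c in g[1:]]   # 1 - y*g(x)
result = comp(n, m - 1, m, P, Q)
assert [row[0] for row in result] == [1, 2, 5]  # f(g(x)) = 1 + 2x + 5x^2 + O(x^3)
```
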
Proof of Theorem 1.

We analyze the time complexity of 𝖢𝗈𝗆𝗉(n,m1,m,ym1f(y1),1yg(x))\mathop{\mathsf{Comp}}(n,m-1,m,\ y^{m-1}f(y^{-1}),1-yg(x)). Let TkT_{k} represent the time complexity required for line 6 and line 10 in the kk-th invocation out of a total of log2n\lceil\log_{2}n\rceil recursive invocations of Algorithm 2. Here, we consider the first invocation as the 0-th invocation. By induction, we observe that bidegQ(n/2k1,2k)\mathop{\mathrm{bideg}}Q\leq(\lceil n/2^{k}\rceil-1,2^{k}) and bidegW(n/2k+11,2k+1)\mathop{\mathrm{bideg}}W\leq(\lceil n/2^{k+1}\rceil-1,2^{k+1}) in the kk-th invocation, which implies Tk=O(𝖬(n))T_{k}=\mathop{\mathrm{O}}(\mathop{\mathsf{M}}(n)). At the same time, we also have bidegQ(n/2k1,m1)\mathop{\mathrm{bideg}}Q\leq(\lceil n/2^{k}\rceil-1,m-1) and bidegW(n/2k+11,m1)\mathop{\mathrm{bideg}}W\leq(\lceil n/2^{k+1}\rceil-1,m-1), which implies Tk=O(𝖬(nm/2k))T_{k}=\mathop{\mathrm{O}}(\mathop{\mathsf{M}}(\lceil nm/2^{k}\rceil)). By combining these bounds with Lemma 4, the total time complexity of all invocations is bounded as

\begin{split}\sum_{k=0}^{\lceil\log_{2}n\rceil-1}T_{k}&=\mathop{\mathrm{O}}\left(\sum_{k=0}^{\lceil\log_{2}m\rceil-1}\mathop{\mathsf{M}}(n)\right)+\mathop{\mathrm{O}}\left(\sum_{k=\lceil\log_{2}m\rceil}^{\lceil\log_{2}n\rceil-1}\mathop{\mathsf{M}}(\lceil nm/2^{k}\rceil)\right)\\ &=\mathop{\mathrm{O}}(\mathop{\mathsf{M}}(n)\log m)+\mathop{\mathrm{O}}(\mathop{\mathsf{M}}(n))\\ &=\mathop{\mathrm{O}}(\mathop{\mathsf{M}}(n)\log m).\end{split}

The computation in line 3 can be performed in O(𝖬(m))\mathop{\mathrm{O}}(\mathop{\mathsf{M}}(m)) operations by Theorem 2. Other parts of the algorithm have a negligible contribution to the overall time complexity. Consequently, the total time complexity is O(𝖬(n)logm+𝖬(m))\mathop{\mathrm{O}}(\mathop{\mathsf{M}}(n)\log m+\mathop{\mathsf{M}}(m)). ∎

Acknowledgements

We extend our gratitude to Josh Alman and Hanna Sumita for their review and revisions of our paper. We also appreciate the helpful suggestions provided by the anonymous reviewers.

References

  • [Ber98] Daniel J. Bernstein “Composing power series over a finite ring in essentially linear time” In Journal of Symbolic Computation 26.3, 1998, pp. 339–341 DOI: 10.1006/jsco.1998.0216
  • [Ber04] Daniel J. Bernstein “Removing redundancy in high-precision Newton iteration” Available from http://cr.yp.to/papers.html#fastnewton, 2004
  • [BLS03] Alin Bostan, Grégoire Lecerf and Éric Schost “Tellegen’s principle into practice” In Proceedings of the 2003 international symposium on Symbolic and algebraic computation, 2003, pp. 37–44
  • [Bos+08] Alin Bostan, François Morain, Bruno Salvy and Éric Schost “Fast algorithms for computing isogenies between elliptic curves” In Mathematics of Computation 77.263, 2008, pp. 1755–1778 DOI: 10.1090/S0025-5718-08-02066-8
  • [BM21] Alin Bostan and Ryuhei Mori “A simple and fast algorithm for computing the NN-th term of a linearly recurrent sequence” In Symposium on Simplicity in Algorithms (SOSA), 2021, pp. 118–132 SIAM
  • [BSS08] Alin Bostan, Bruno Salvy and Éric Schost “Power series composition and change of basis” In Proceedings of the twenty-first international symposium on Symbolic and algebraic computation, 2008, pp. 269–276
  • [BS09] Alin Bostan and Éric Schost “A simple and fast algorithm for computing exponentials of power series” In Information Processing Letters 109.13, 2009, pp. 754–756 DOI: 10.1016/j.ipl.2009.03.012
  • [Bre76] Richard P Brent “Multiple-precision zero-finding methods and the complexity of elementary function evaluation” In Analytic computational complexity Elsevier, 1976, pp. 151–176
  • [BK78] Richard P. Brent and Hsiang T. Kung “Fast algorithms for manipulating formal power series” In Journal of the ACM 25.4, 1978, pp. 581–595
  • [BCS10] Peter Bürgisser, Michael Clausen and Mohammad A. Shokrollahi “Algebraic Complexity Theory” Springer, 2010
  • [CK91] David G. Cantor and Erich Kaltofen “On fast multiplication of polynomials over arbitrary algebras” In Acta Informatica 28.7, 1991, pp. 693–701 DOI: 10.1007/BF01178683
  • [CT65] James W. Cooley and John W. Tukey “An algorithm for the machine calculation of complex Fourier series” In Mathematics of Computation 19.90, 1965, pp. 297–301 URL: http://www.jstor.org/stable/2003354
  • [Fid85] Charles M. Fiduccia “An efficient formula for linear recurrences” In SIAM Journal on Computing 14, 1985, pp. 106–112 DOI: 10.1137/0214007
  • [GG13] Joachim von zur Gathen and Jürgen Gerhard “Modern Computer Algebra” Cambridge University Press, 2013 DOI: 10.1017/CBO9781139856065
  • [Ger00] Jürgen Gerhard “Modular algorithms for polynomial basis conversion and greatest factorial factorization” In Proceedings of the Seventh Rhine Workshop on Computer Algebra (RWCA’00), 2000, pp. 125–141
  • [Har09] David Harvey “Faster exponentials of power series” In arXiv preprint arXiv:0911.3110, 2009
  • [Har11] David Harvey “Faster algorithms for the square root and reciprocal of power series” In Mathematics of Computation 80.273, 2011, pp. 387–394 DOI: 10.1090/S0025-5718-2010-02392-0
  • [HH19] David Harvey and Joris van der Hoeven “Faster polynomial multiplication over finite fields using cyclotomic coefficient rings” Id/No 101404 In Journal of Complexity 54, 2019, pp. 18 DOI: 10.1016/j.jco.2019.03.004
  • [Hoe02] Joris van der Hoeven “Relax, but don’t be too lazy” In Journal of Symbolic Computation 34.6, 2002, pp. 479–542 DOI: 10.1006/jsco.2002.0562
  • [Hoe10] Joris van der Hoeven “Newton’s method and FFT trading” In Journal of Symbolic Computation 45.8, 2010, pp. 857–878 DOI: 10.1016/j.jsc.2010.03.005
  • [HL20] Joris van der Hoeven and Grégoire Lecerf “Fast multivariate multi-point evaluation revisited” Id/No 101405 In Journal of Complexity 56, 2020, pp. 38 DOI: 10.1016/j.jco.2019.04.001
  • [HP98] Xiaohan Huang and Victor Y. Pan “Fast rectangular matrix multiplication and applications” In Journal of Complexity 14.2, 1998, pp. 257–299 DOI: https://doi.org/10.1006/jcom.1998.0476
  • [KU08] Kiran S. Kedlaya and Christopher Umans “Fast modular composition in any characteristic” In 2008 49th Annual IEEE Symposium on Foundations of Computer Science, 2008, pp. 146–155 DOI: 10.1109/FOCS.2008.13
  • [KU11] Kiran S. Kedlaya and Christopher Umans “Fast polynomial factorization and modular composition” In SIAM Journal on Computing 40.6, 2011, pp. 1767–1802 DOI: 10.1137/08073408X
  • [Knu98] Donald E. Knuth “The Art of Computer Programming. Vol. 2: Seminumerical Algorithms.” Addison-Wesley, 1998
  • [Kun74] Hsiang T. Kung “On computing reciprocals of power series” In Numerische Mathematik 22.5, 1974, pp. 341–348
  • [Nei+23] Vincent Neiger, Bruno Salvy, Éric Schost and Gilles Villard “Faster Modular Composition” In Journal of the ACM, 2023 DOI: 10.1145/3638349
  • [PSS12] Carine Pivoteau, Bruno Salvy and Michèle Soria “Algorithms for combinatorial structures: well-founded systems and Newton iterations” In Journal of Combinatorial Theory. Series A 119.8, 2012, pp. 1711–1773 DOI: 10.1016/j.jcta.2012.05.007
  • [Rit86] Peter Ritzmann “A fast numerical algorithm for the composition of power series with complex coefficients” In Theoretical computer science 44, 1986, pp. 1–16
  • [Sch00] Arnold Schönhage “Variations on computing reciprocals of power series” In Information Processing Letters 74.1-2, 2000, pp. 41–46 DOI: 10.1016/S0020-0190(00)00044-2
  • [SGV94] Arnold Schönhage, Andreas Grotefeld and Ekkehart Vetter “Fast algorithms. A multitape Turing machine implementation” B.I. Wissenschaftsverlag, 1994
  • [Ser10] Igor S. Sergeev “Fast algorithms for elementary operations on complex power series” In Discrete Mathematics and Applications 20.1, 2010, pp. 25–60 DOI: 10.1515/DMA.2010.002
  • [Ser12] Igor S. Sergeev “A note on the fast power series’ exponential” In arXiv preprint arXiv:1203.3883, 2012
  • [Wil+24] Virginia V. Williams, Yinzhan Xu, Zixuan Xu and Renfei Zhou “New bounds for matrix multiplication: from alpha to omega” In Proceedings of the 2024 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 2024, pp. 3792–3835 DOI: 10.1137/1.9781611977912.134