A log-log speedup for exponent one-fifth deterministic integer factorisation
Abstract.
Building on techniques recently introduced by the second author, and further developed by the first author, we show that a positive integer may be rigorously and deterministically factored into primes in at most
bit operations. This improves on the previous best known result by a factor of .
2020 Mathematics Subject Classification:
Primary 11Y051. Introduction
The aim of this paper is to further refine a method for integer factorisation that was recently introduced by the second author [Hit20], and subsequently improved and simplified by the first author [Har20]. We insist that all algorithms under discussion be deterministic and that all complexity bounds be proved rigorously. Faster factoring algorithms are known if one allows randomisation, heuristic arguments or quantum computers; see [CP05, Len00, Wag13, Rie94].
We write for the time required to compute the prime factorisation of an integer , where “time” means “bit operations”, i.e., the number of steps performed by a deterministic Turing machine with a fixed, finite number of linear tapes [Pap94]. All integers are assumed to be encoded in the usual binary representation.
Prior to [Hit20], the best known asymptotic bounds for were of the form . These were achieved by methods going back to Pollard, Strassen and Coppersmith; for further references see [Har20]. The algorithm presented in [Hit20] was the first to break the barrier, achieving . Shortly afterwards the first author made further progress on the exponent [Har20], showing that . The main result of the present paper is the following incremental improvement.
Theorem 1.1.
There is a deterministic integer factorisation algorithm achieving
Note that if has three or more prime factors (counting repetitions), then at least one factor is bounded above by , and it is well known that such a factor may be found in time (see for example [Har20, Prop. 2.5]). Using this observation one easily reduces the proof of Theorem 1.1 to the case that is either prime or semiprime, i.e., a product of two distinct primes. For details of the reduction, see the proof of [Har20, Thm. 1.1]. For the rest of the paper, we assume that is either prime or semiprime, say where .
The strategy of [Har20] may be outlined as follows. Fix some integer coprime to . Since , Fermat’s little theorem implies that for any . If we can recover , for known coefficients , then it is easy to deduce and , since we know the product (see Lemma 2.4). Now, it was observed by Lehman [Leh74] (following ideas going back to Lawrence [Law95]) that if are are small integers, chosen so that is a good rational approximation to the unknown , then will be especially close to . This suggests rewriting the congruence as
When , the left hand side has the form where is “small”. Consequently, to solve for , it suffices to find a collision modulo between the “giantsteps” , where ranges over some sufficiently dense set of rational numbers, and the “babysteps” , where ranges over small integers. This collision-finding problem may be attacked using tools from fast polynomial arithmetic, and [Har20] shows that this strategy (after filling in many details left out in this rough sketch) leads to the bound .
The new algorithm in this paper follows the same basic plan described above, but utilises the additional information that and cannot themselves be divisible by small primes. We modify the algorithm so that it restricts attention to candidates for that are coprime to , i.e., is the product of the first primes for suitable . The number of possible values of modulo is (the Euler totient function), and for suitable choice of , Mertens’ theorem (see Lemma 2.6) implies that the ratio decays like . This is the source of the log-log savings over [Har20]. Actually, for technical reasons we reorganise the algorithm considerably: instead of defining the giantsteps by taking all pairs in some range, as is done in [Har20], we use algorithms for finding short lattice vectors to compute suitable values for and for each giantstep.
2. Preliminaries
For a positive integer, we define . Observe that for all , that for , and that may be computed in time .
We recall several facts about integer arithmetic. All results stated here without specific references may be found in textbooks such as [vzGG13] or [BZ11].
Let , and assume that we are given integers and such that . We may compute and in time . We write for the cost of computing the product ; it was shown recently that [HvdH21]. If , we may compute the quotients and , and therefore also the residue of modulo in the interval , in time . More generally, for a fixed positive rational number , and assuming that , we may compute and in time . We may compute , and if desired find such that (i.e., solve the extended GCD problem), in time .
Next we consider modular arithmetic. Let . We write for the ring of integers modulo . Elements of will always be represented by their residues in the interval , so occupy bits of storage. We write for the group of units of , i.e., the subset of those for which . Given , we may compute in time , and in time . Given and an integer , we may compute in time , using the “repeated squaring” algorithm. We may test whether , and if so compute , in time , by solving the extended GCD problem for and . If the gcd is not or , and if is semiprime, then we immediately recover and in time .
Lemma 2.1.
Let with . Given as input , we may compute the polynomial in time .
Proof.
This is exactly [Har20, Lem. 2.3]; the proof depends on a standard application of a product tree. By “compute a polynomial” we mean compute a list of its coefficients. ∎
Remark 2.2.
In the above lemma, the hypothesis “” means that for every constant , the lemma holds for all and in the region , where the implied big- constant in the target bound may depend on . For the rest of the paper, a similar remark applies whenever we use the big- notation in this way.
Lemma 2.3.
Given as input an element , positive integers , and of degree , we may compute in time .
Proof.
Lemma 2.4.
Given as input an integer such that is either a prime or semiprime , and integers with at most bits, we may test if is of the form , and if so recover and , in time .
Proof.
This is a trivial modification of [Har20, Lem. 3.1]. The proof depends on observing that and must be roots of the polynomial . ∎
Lemma 2.5.
There is an algorithm with the following properties. It takes as input integers and such that . It returns either some with , or a nontrivial factor of , or “ is prime”. Its runtime is bounded by
Proof.
This is exactly [Hit18, Thm. 6.3]. Briefly, the idea is to attempt to compute orders of various via the standard babystep-giantstep procedure. If we fail to compute , then the order must be large and we are done. If we succeed in computing , then we try to find a factor of via for each prime divisor of . If this fails, then we know that for every prime divisor of . Repeating this process for various , we either find an element of large order, or obtain enough information about the factorisation of for to make it feasible to factor directly. For details, see [Hit18]. ∎
We will need the following well-known results on the distribution of primes.
Lemma 2.6.
For we have
Proof.
See Theorem 6.9 and Theorem 2.7(e) in [MV07]. The first statement is a form of the Prime Number Theorem, and the second statement is one of Mertens’ theorems. ∎
Finally we recall that a list of elements of bit size may be sorted using the “merge sort” algorithm in time [Knu98]. Here we assume that the elements belong to some totally ordered set, and that we may compare a pair of elements in time .
3. A variant of Lehman’s result
The crux of the algorithms of [Hit20] and [Har20] is the observation, going back to Lehman [Leh74], that if is semiprime, then there exist “small” integers and such that is “close” to (see [Har20, Lem. 3.3], or the main theorem of [Leh74]). In this section we prove a variant of this result in which we impose an additional constraint on and of the form , where is a positive integer. (In Section 4 we will specialise to the case that is a product of small primes.) Moreover, instead of just an existence statement, we actually want to compute the desired and .
We recall some basic facts about lattices in . By a lattice we mean a subgroup of generated by two linearly independent vectors , i.e., the subgroup . The determinant of a lattice is the area of the fundamental parallelogram associated to any basis for . In particular for we have
For a vector we write for the usual Euclidean norm.
Lemma 3.1.
Let , and suppose we are given as input linearly independent vectors defining a lattice , such that . Then in time we may find a nonzero vector such that .
Proof.
We may find a nonzero vector of minimal norm by applying the classical Lagrange–Gauss reduction algorithm to the input basis (see [Gal12, Chap. 17]). According to [Gal12, Thm. 17.1.10], this runs in time . (The reduction algorithm is essentially a special case of LLL: the basic idea is to repeatedly subtract a suitable integer multiple of the shorter vector from the longer vector, until no further progress can be made in reducing the norms.) This vector satisfies where is the Hermite constant for rank- lattices (see [Gal12, Defn. 16.2.7]). Since we are done. ∎
Remark 3.2.
Lemma 3.1 can be improved slightly. Minkowski’s convex body theorem [Lan94, §V.3, Thm. 3] implies that there exists a nonzero vector such that , and such a vector can be found efficiently by a more involved algorithm; see for example [KS96]. This leads to better constants later in the paper, but does not affect our main asymptotic results.
Proposition 3.3.
There exists an algorithm with the following properties.
It takes as input positive integers , and , a positive integer coprime to , and an integer coprime to such that .
Its output is a pair of integers , such that if is a semiprime satisfying
(3.1) |
and
(3.2) |
then the linear combination satisfies
(3.3) |
and
(3.4) |
Assuming that , the algorithm runs in bit operations, and moreover the output satisfies .
Proof.
Assume that the input parameters , , , and are as described above. Assume also that is a semiprime satisfying (3.1) and (3.2).
Let so that . Then , so
for some . For any pair of integers , it then follows that
(3.5) |
Similarly, let , so that . Then is a rational number that is -integral (i.e., its denominator is coprime to ), and . Thus
and for any pair of integers we obtain
This last congruence implies that (3.4) holds for any that satisfies
(3.6) |
If additionally satisfies the inequalities
(3.7) |
then (3.5) implies that (3.3) holds. We are left with showing how to compute a pair satisfying both (3.6) and (3.7). (The point is that both of these conditions are independent of .)
The congruence (3.6) is equivalent to , where is the unique integer in congruent to modulo . Thus the solutions to (3.6) form a lattice , where and . Our goal is to find a nonzero vector that lies in the parallelogram defined by (3.7).
To achieve this, it is convenient to introduce the linear change of variables
In the -coordinates the inequalities (3.7) become simply
(3.8) |
i.e., the parallelogram becomes a square. Moreover, in the -coordinates the lattice gets mapped to the lattice where
The determinant of is . We may therefore apply Lemma 3.1 to the basis to find a nonzero pair satisfying (3.8).
4. The main algorithm
For the convenience of the reader, we recall the following algorithm from [Har20], which forms a key subroutine of the main search algorithm presented afterwards.
Algorithm 4.1 (Finding collisions modulo or ).
Input:
-
–
A positive integer , either prime or semiprime.
-
–
A positive integer , and an element such that .
-
–
Elements for some positive integer , such that for all and .
Output:
-
–
If is semiprime, and if there exists such that or for some , returns and .
-
–
Otherwise returns “no factors found”.
Proposition 4.2.
Algorithm 4.1 is correct. Assuming that , its running time is .
Proof.
This is exactly Proposition 4.1 in [Har20]. ∎
We now present the main search algorithm.
Algorithm 4.3 (The main search).
Input:
-
–
A positive integer , either prime or semiprime.
-
–
Positive integers and such that .
-
–
An element such that where
Output: If is semiprime, returns and . Otherwise returns “ is prime”.
(4.1) |
(4.2) |
Proposition 4.4.
Algorithm 4.3 is correct. If , then it runs in time
(4.3) |
Proof.
We first prove correctness. Suppose that is semiprime, with . We must show that in Steps 1–13 the algorithm succeeds in finding and .
Consider the loop in Steps 1–3. Since we have assumed that , in this loop we either find a factor of , or prove that both and . For the remainder of the proof, we will assume the latter.
We next consider the giantsteps. The block in Steps 8–10 is executed for various pairs . We claim that (3.2) holds for exactly one such , and that (3.1) holds for exactly one such . For (3.2) this is clear, as is coprime to by hypothesis, so there is exactly one visited by the outer loop such that and . For the inner loop, observe that strictly increases on each iteration (in Step 11), so the loop certainly terminates. For any visited in the loop, write for the value of at the next iteration. Since , on some iteration we must have ; in fact, this occurs for precisely one value of because the successive intervals are disjoint. Moreover, since is an integer, the condition is equivalent to , i.e., to (3.1). Therefore (3.1) holds for exactly one as claimed.
Let be the pair for which (3.1) and (3.2) hold, and consider the corresponding coefficients and computed in Step 8. According to Proposition 3.3, the linear combination satisfies (3.3) and (3.4) for . Let and be as defined in Step 9. From (3.4) we find that
(4.4) |
and (3.3) yields
so
This implies that , i.e., that for some . Inserting this into (4.4), we find that
where is as defined in Step 9. Now, the congruence implies that , so Fermat’s little theorem yields
We conclude that there must be a collision modulo between the babysteps computed in Step 2 and the giantsteps computed in Step 10, namely
(4.5) |
In Step 12, we attempt to find the solution to (4.5), by finding all solutions to the stronger congruence (4.1), which is a congruence modulo rather than modulo . Let be one of the solutions found in Step 12. If it is equal to , then the corresponding defined in (4.2) will be equal to , and the factors and will be found via Lemma 2.4. Now suppose instead that . We claim then that . Indeed, if , then we would have
which implies that due to the fact that and . This establishes the claim, and consequently the giantstep remains among the candidates in the list constructed in Step 13.
Finally we consider Step 13. The procedure in Step 12 ensures that all preconditions for Algorithm 4.1 are met. In addition, we have seen in the last paragraph that one of is equal to . Hence, Algorithm 4.1 will succeed in finding a nontrivial factor of .
If is prime, then Steps 1–13 will fail to find a nontrivial factor, and the algorithm will correctly return “ is prime” in Step 14.
We now analyse the runtime complexity. To prepare for the loop in Steps 1–3 we first compute ; the cost of this step is
The loop itself computes at most products modulo and GCDs of integers bounded by , whose total cost is
In Step 5 we compute GCDs of integers bounded by , which costs
Now consider the inner loop over . The ratio between values of on successive iterations is at least , so for each the number of iterations of the inner loop over is at most
The number of considered is , so the block in Steps 8–11 executes altogether
times. Let us estimate the cost of each iteration of this block. The cost of the invocation of Proposition 3.3 in Step 8 is . By hypothesis we have , certainly , and Proposition 3.3 ensures that , so all quantities appearing in Step 9 have bits. The cost of computing is therefore , and all other arithmetic operations required to compute , and in Step 9 cost at most bit operations. Similarly, the exponent () in Step 10 and the updated value of in Step 11 are computed in time . Finally, the modular exponentiation in Step 10 requires multiplications modulo , plus possibly one inversion modulo if the exponent is negative, so its cost is . We conclude that the block in Steps 8–11 runs in time , and that the total over all iterations is
In Step 12, we construct a list of pairs of length , and a list of tuples of length . From the bounds already mentioned, each item in these lists occupies bits. We then use merge-sort to sort the lists by the first component of each tuple, which requires
bit operations. Each giantstep is equal to at most one babystep , because our assumption implies that the babysteps are all distinct. Matching the two sorted lists, we may hence find all matches in time
Since there are at most such matches, the total cost for applying Lemma 2.4 in Step 12 is bounded by
Again, we note that the assumptions of Lemma 2.4 on the size of the candidates for , and are always satisfied. Finally, in Step 13 we apply Algorithm 4.1, whose complexity is
Combining the contributions from all steps, we obtain the overall bound
The last term becomes
The final term is dominated by , completing the proof. ∎
Remark 4.5.
Finally we present the main factoring routine. In this algorithm, is a constant that is chosen large enough to ensure that the proof of correctness works for all . In principle one could work out explicitly, but we will not do so.
Algorithm 4.6 (Factoring (semi)primes).
Input: A positive integer , either prime or semiprime.
Output: If is semiprime, returns and . Otherwise returns “ is prime”.
Proposition 4.7.
Algorithm 4.6 is correct (for suitable ), and it runs in time
Proof.
We start by discussing the choice of in Step 1. By the Prime Number Theorem (see Lemma 2.6) we have . Since
we find that for large ,
(4.6) |
We may compute by simply enumerating the primes up to and multiplying them together; this may be done in time . The rest of Step 1 (computing , and ) may also be carried out in time .
Step 2 runs in time
which is negligible. Assume that we do not find factors of or prove prime. We then obtain such that .
For the invocation of Algorithm 4.3 in Step 3, we must check the precondition , where we recall that . On one hand, (4.6) yields
On the other hand, the definition of , together with (4.6), implies that
(4.7) |
so
Therefore certainly for large . We conclude that Step 3 succeeds in factoring or proving prime.
The hypotheses of Proposition 4.4 are certainly satisfied, so the cost of Step 3 is given by (4.3). The first term of (4.3) is negligible thanks to (4.6). To estimate the second term, we note that Mertens’ theorem (Lemma 2.6) implies that
Using this estimate together with (4.7), it is easy to check that the second and third terms in (4.3) simplify to the desired bound , and hence our choice for the size of balances the asymptotic contribution from these two terms. Taking into account the discussion in Section 1, this also completes the proof of Theorem 1.1. ∎
References
- [BGS07] A. Bostan, P. Gaudry, and É. Schost, Linear recurrences with polynomial coefficients and application to integer factorization and Cartier-Manin operator, SIAM J. Comput. 36 (2007), no. 6, 1777–1806. MR 2299425 (2008a:11156)
- [Blu70] L. I. Bluestein, A linear filtering approach to the computation of discrete Fourier transform, IEEE Transactions on Audio and Electroacoustics 18 (1970), no. 4, 451–455.
- [BZ11] R. P. Brent and P. Zimmermann, Modern Computer Arithmetic, Cambridge Monographs on Applied and Computational Mathematics, vol. 18, Cambridge University Press, Cambridge, 2011. MR 2760886
- [CH14] E. Costa and D. Harvey, Faster deterministic integer factorization, Math. Comp. 83 (2014), no. 285, 339–345. MR 3120593
- [CP05] R. Crandall and C. Pomerance, Prime Numbers: a Computational Perspective, second ed., Springer, New York, 2005. MR 2156291 (2006a:11005)
- [Gal12] S. D. Galbraith, Mathematics of Public Key Cryptography, Cambridge University Press, Cambridge, 2012. MR 2931758
- [Har20] D. Harvey, An exponent one-fifth algorithm for deterministic integer factorisation, to appear in Mathematics of Computation, arXiv preprint https://arxiv.org/abs/2010.05450, 2020.
- [Hit18] M. Hittmeir, A babystep-giantstep method for faster deterministic integer factorization, Math. Comp. 87 (2018), no. 314, 2915–2935. MR 3834692
- [Hit20] by same author, A time-space tradeoff for Lehman’s deterministic integer factorization method, to appear in Mathematics of Computation, arXiv preprint http://arxiv.org/abs/2006.16729v1, 2020.
- [HvdH21] D. Harvey and J. van der Hoeven, Integer multiplication in time , Ann. of Math. (2) 193 (2021), no. 2, 563–617. MR 4224716
- [Knu98] D. E. Knuth, The Art of Computer Programming. Vol. 3, Addison-Wesley, Reading, MA, 1998, Sorting and searching, Second edition [of MR0445948]. MR 3077154
- [KS96] M. Kaib and C. P. Schnorr, The generalized Gauss reduction algorithm, J. Algorithms 21 (1996), no. 3, 565–578. MR 1417664
- [Lan94] S. Lang, Algebraic Number Theory, second ed., Graduate Texts in Mathematics, vol. 110, Springer-Verlag, New York, 1994. MR 1282723
- [Law95] F. W. Lawrence, Factorisation of numbers, Messenger of Math. 24 (1895), 100–109.
- [Leh74] R. S. Lehman, Factoring large integers, Math. Comp. 28 (1974), 637–646. MR 340163
- [Len00] A. K. Lenstra, Integer factoring, Des. Codes Cryptogr. 19 (2000), no. 2-3, 101–128, Towards a quarter-century of public key cryptography. MR 1758972
- [MV07] H. L. Montgomery and R. C. Vaughan, Multiplicative Number Theory. I. Classical Theory, Cambridge Studies in Advanced Mathematics, vol. 97, Cambridge University Press, Cambridge, 2007. MR 2378655
- [NSV11] A. Novocin, D. Stehlé, and G. Villard, An LLL-reduction algorithm with quasi-linear time complexity [extended abstract], STOC’11—Proceedings of the 43rd ACM Symposium on Theory of Computing, ACM, New York, 2011, pp. 403–412. MR 2931990
- [Pap94] C. H. Papadimitriou, Computational Complexity, Addison-Wesley Publishing Company, Reading, MA, 1994. MR 1251285 (95f:68082)
- [Rie94] H. Riesel, Prime Numbers and Computer Methods for Factorization, second ed., Progress in Mathematics, vol. 126, Birkhäuser Boston, Inc., Boston, MA, 1994. MR 1292250
- [vzGG13] J. von zur Gathen and J. Gerhard, Modern Computer Algebra, 3rd ed., Cambridge University Press, Cambridge, 2013. MR 3087522
- [Wag13] S. S. Wagstaff, Jr., The Joy of Factoring, Student Mathematical Library, vol. 68, American Mathematical Society, Providence, RI, 2013. MR 3135977