This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

On Product Sets of Arithmetic Progressions

Max Wenqiang Xu Supported by the Cuthbert C. Hurd Graduate Fellowship in Mathematics, Stanford    Yunkun Zhou Supported by NSF GRFP Grant DGE-1656518
Abstract

We prove that the size of the product set of any finite arithmetic progression 𝒜\mathcal{A}\subset\mathbb{Z} satisfies

|𝒜𝒜||𝒜|2(log|𝒜|)2θ+o(1),|\mathcal{A}\cdot\mathcal{A}|\geq\frac{|\mathcal{A}|^{2}}{(\log|\mathcal{A}|)^{2\theta+o(1)}},

where 2θ=1(1+loglog2)/(log2)2\theta=1-(1+\log\log 2)/(\log 2) is the constant appearing in the celebrated Erdős multiplication table problem. This confirms a conjecture of Elekes and Ruzsa from about two decades ago.

If instead 𝒜\mathcal{A} is relaxed to be a subset of a finite arithmetic progression in integers with positive constant density, we prove that

|𝒜𝒜||𝒜|2(log|𝒜|)2log21+o(1).|\mathcal{A}\cdot\mathcal{A}|\geq\frac{|\mathcal{A}|^{2}}{(\log|\mathcal{A}|)^{2\log 2-1+o(1)}}.

This solves the typical case of another conjecture of Elekes and Ruzsa on the size of the product set of a set 𝒜\mathcal{A} whose sumset is of size O(|𝒜|)O(|\mathcal{A}|).

Our bounds are sharp up to the o(1)o(1) term in the exponents. We further prove asymmetric extensions of the above results.

\dajAUTHORdetails

title = On Product Sets of Arithmetic Progressions, author = Max Wenqiang Xu and Yunkun Zhou, plaintextauthor = Max Wenqiang Xu, Yunkun Zhou, \dajEDITORdetailsyear=2023, number=10, received=4 January 2022, published=25 July 2023, doi=10.19086/da.84267, [classification=text]

1 Introduction

The celebrated Erdős multiplication table problem asks to estimate how many numbers can be represented as the product of two positive integers which are at most NN. It is a classical result [10] that there are o(N2)o(N^{2}) such numbers, which can be proved by simply considering the typical number of prime factors of an integer nNn\leq N. Erdős [8] first determined the answer up to a (logN)o(1)(\log N)^{o(1)} factor:

|[N][N]|=N2(logN)2θ+o(1),\Big{|}[N]\cdot[N]\Big{|}=\frac{N^{2}}{(\log N)^{2\theta+o(1)}}, (1.1)

where [N]={1,,N}[N]=\{1,\ldots,N\}, θ=11+loglog4log4\theta=1-\frac{1+\log\log 4}{\log 4} (which implies that 2θ=11+loglog2log22\theta=1-\frac{1+\log\log 2}{\log 2}), and 𝒜𝒜\mathcal{A}\cdot\mathcal{A} is defined to be {aa:a,a𝒜}\{aa^{\prime}:a,a^{\prime}\in\mathcal{A}\} for any set 𝒜\mathcal{A}. The finer-order term was improved later by Tenenbaum [32]. Remarkably, Ford [12] has determined the quantity up to a constant factor:

|[N][N]|N2(logN)2θ(loglogN)3/2.\Big{|}[N]\cdot[N]\Big{|}\asymp\frac{N^{2}}{(\log N)^{2\theta}(\log\log N)^{3/2}}. (1.2)

There is also work on generalizations of the Erdős multiplication table problem to higher dimensions: see [16] for references.

In this paper, we are interested the same problem but with [N][N] replaced by an arbitrary arithmetic progression 𝒜\mathcal{A} of NN integers. In some cases the size of the product set can be as large as the trivial upper bound (N+12)\binom{N+1}{2}, which is of order N2N^{2}. One example that achieves this bound is {1+kd:k[N]}\{1+kd:k\in[N]\} where d>2Nd>2N. It is interesting to know whether the size of the product set of any arithmetic progression with given length NN can be substantially smaller than the bound in (1.1). Elekes and Ruzsa [7] conjectured that it cannot. In this paper, we prove their conjecture.

Theorem 1.1.

Let 𝒜\mathcal{A}\subseteq\mathbb{Z} be a finite arithmetic progression and θ=11+loglog4log4\theta=1-\frac{1+\log\log 4}{\log 4}. Then

|𝒜𝒜||𝒜|2(log|𝒜|)2θo(1).|\mathcal{A}\cdot\mathcal{A}|\geq|\mathcal{A}|^{2}(\log|\mathcal{A}|)^{-2\theta-o(1)}.

The strongest lower bound we can prove is |𝒜|2(log|𝒜|)2θ(loglog|𝒜|)7o(1)|\mathcal{A}|^{2}(\log|\mathcal{A}|)^{-2\theta}(\log\log|\mathcal{A}|)^{-7-o(1)} (Theorem 7.1), which is sharp up to a power of loglog|A|\log\log|A|.

We also prove an asymmetric version of the conjecture, which strengthens Theorem 1.1.

Theorem 1.2.

Let 𝒜,\mathcal{A},\mathcal{B}\subseteq\mathbb{Z} be two finite arithmetic progressions with lengths 2|||𝒜|2\leq|\mathcal{B}|\leq|\mathcal{A}| and θ=11+loglog4log4\theta=1-\frac{1+\log\log 4}{\log 4}. Then

|𝒜||𝒜|||(log|𝒜|)θo(1)(log||)θ,|\mathcal{A}\cdot\mathcal{B}|\geq{|\mathcal{A}||\mathcal{B}|}{(\log|\mathcal{A}|)^{-\theta-o(1)}(\log|\mathcal{B}|)^{-\theta}},

where the o(1)o(1) terms go to zero as |𝒜||\mathcal{A}| tends to infinity.

Remark.

The two theorems above are sharp up to a (log|𝒜|)o(1)(\log|\mathcal{A}|)^{o(1)} factor. The proof we present gives the error term in the form exp(O(loglog|𝒜|logloglog|𝒜|))\exp(O(\sqrt{\log\log|\mathcal{A}|}\log\log\log|\mathcal{A}|)). In fact, a stronger bound up to a (loglog|𝒜|)O(1)(\log\log|\mathcal{A}|)^{O(1)} factor can be obtained. We leave details of the proof to the stronger error term to Appendix A.

A particular structural property of arithmetic progressions 𝒜\mathcal{A} is that it has small sumset. The celebrated sum-product conjecture [9] states that for any 𝒜\mathcal{A}\subseteq\mathbb{Z},

max{|𝒜+𝒜|,|𝒜𝒜|}|𝒜|2o(1),\max\{|\mathcal{A}+\mathcal{A}|,|\mathcal{A}\cdot\mathcal{A}|\}\gg|\mathcal{A}|^{2-o(1)}, (1.3)

where 𝒜+𝒜={a+a:a,a𝒜}\mathcal{A}+\mathcal{A}=\{a+a^{\prime}:a,a^{\prime}\in\mathcal{A}\} and \ll is to denote up to a multiplicative constant. There has been much progress towards this conjecture (see [23] for the current record), though it is still wide open. One extremal case of the sum-product conjecture is the case where one of the sumset and product set is very small, in which, the conjecture is known to be true. Indeed, Chang [4] proved that

|𝒜𝒜||𝒜||𝒜+𝒜||𝒜|2,|\mathcal{A}\cdot\mathcal{A}|\ll|\mathcal{A}|\implies|\mathcal{A}+\mathcal{A}|\gg|\mathcal{A}|^{2},

which is sharp up to a multiplicative constant factor. See [19, 22] for discussions in other settings.

We are interested in the other extremal case where |𝒜+𝒜|/|𝒜||\mathcal{A}+\mathcal{A}|/|\mathcal{A}| is bounded, which seems to be much more challenging. In contrast to Chang’s result [4], even if the doubling number (i.e. |𝒜+𝒜|/|𝒜||\mathcal{A}+\mathcal{A}|/|\mathcal{A}|) is at most 22, the product set can still be smaller than |𝒜|2|\mathcal{A}|^{2} by a polylogarithmic factor (see (1.2)). Nathanson and Tenenbaum proved [20] that if |𝒜+𝒜|<3|𝒜|4|\mathcal{A}+\mathcal{A}|<3|\mathcal{A}|-4, then |𝒜𝒜||𝒜|2/(log|𝒜|)2|\mathcal{A}\cdot\mathcal{A}|\gg|\mathcal{A}|^{2}/(\log|\mathcal{A}|)^{2}. This result was later generalized by Chang [3] to hh-fold product sets. Finally, Elekes and Ruzsa [7] proposed the following conjecture.

Conjecture 1.3 (Elekes and Ruzsa [7]).

Let 𝒜\mathcal{A}\subseteq\mathbb{Z} be a finite set. Then the following holds.

|𝒜+𝒜||𝒜||𝒜𝒜||𝒜|2(log|𝒜|)12log2o(1).|\mathcal{A}+\mathcal{A}|\ll|\mathcal{A}|\implies|\mathcal{A}\cdot\mathcal{A}|\geq|\mathcal{A}|^{2}(\log|\mathcal{A}|)^{1-2\log 2-o(1)}.

Elekes and Ruzsa [7] showed that if |𝒜+𝒜||𝒜||\mathcal{A}+\mathcal{A}|\ll|\mathcal{A}|, then |𝒜𝒜||𝒜|2(log|𝒜|)1|\mathcal{A}\cdot\mathcal{A}|\gg|\mathcal{A}|^{2}(\log|\mathcal{A}|)^{-1} when 𝒜\mathcal{A} is a finite set of integers. As a corollary of Solymosi’s remarkable result [28], the same implication holds when 𝒜\mathcal{A} is a finite set of reals. We remark that their proofs did not use specific properties of the integers, and we improve their bounds by taking the arithmetic information of integers into account. Indeed, the 2log212\log 2-1 exponent has a natural arithmetic interpretation. By the Sathe-Selberg formula [25], the number of nNn\leq N with Ω(n)=(2+o(1))loglogN\Omega(n)=(2+o(1))\log\log N is about N/(logN)2log21+o(1)N/(\log N)^{2\log 2-1+o(1)}, where Ω(n)\Omega(n) is the number of prime factors of nn. Further we claim that the bound in Conjecture 1.3 is optimal up to o(1)o(1) factor in the exponent. One simple example is to take 𝒜[N]\mathcal{A}\subset[N] containing set of all numbers nn with Ω(n)=(1+o(1))loglogN\Omega(n)=(1+o(1))\log\log N. By the Hardy-Ramanujan theorem one has |𝒜|=(1+o(1))N|\mathcal{A}|=(1+o(1))N. Since AAA\cdot A is contained in the set of integers with Ω(n)=(2+o(1))loglogN\Omega(n)=(2+o(1))\log\log N, combining this with the previously mentioned application of Sathe-Selberg’s formula, the claim follows.

We believe that to get the conjectured exponent 2log212\log 2-1, one has to take the arithmetic information into account. We also remark that for similar arithmetic reasons, the constant 2log212\log 2-1 appears in recent work [5, 18, 29].

By the Freiman-Ruzsa theorem [15, 24], every set with constant doubling number is a dense subset of a generalized arithmetic progression of constant dimension. A natural case is that 𝒜\mathcal{A} is a dense subset of an arithmetic progression. In fact, in some sense this is also the typical case [1, 2]. In this paper, we prove that the Elekes-Ruzsa conjecture is true in this case.

Theorem 1.4.

Let δ(0,1]\delta\in(0,1] and 𝒜\mathcal{A} be a subset of an arithmetic progression 𝒫\mathcal{P}\subseteq\mathbb{Z} with |𝒜|δ|𝒫||\mathcal{A}|\geq\delta|\mathcal{P}|, then

|𝒜𝒜|δ|𝒜|2(log|𝒜|)12log2o(1).|\mathcal{A}\cdot\mathcal{A}|\gg_{\delta}|\mathcal{A}|^{2}(\log|\mathcal{A}|)^{1-2\log 2-o(1)}.

We remark that Pomerance and Sárközy [21] proved a special case of Theorem 1.4 (𝒫=[N])(\mathcal{P}=[N]). We also prove a more general asymmetric version.

Theorem 1.5.

Let δ(0,1]\delta\in(0,1] and 𝒜,\mathcal{A},\mathcal{B} be two subsets of two finite arithmetic progressions 𝒫1,𝒫2\mathcal{P}_{1},\mathcal{P}_{2}\subseteq\mathbb{Z}, and |𝒜|δ|𝒫1||\mathcal{A}|\geq\delta|\mathcal{P}_{1}|, ||δ|𝒫2||\mathcal{B}|\geq\delta|\mathcal{P}_{2}|, then

|𝒜|δ|𝒜|||(log|𝒜|)12log2o(1)(log||)12log2o(1).|\mathcal{A}\cdot\mathcal{B}|\gg_{\delta}|\mathcal{A}||\mathcal{B}|(\log|\mathcal{A}|)^{\frac{1}{2}-\log 2-o(1)}(\log|\mathcal{B}|)^{\frac{1}{2}-\log 2-o(1)}.

Although Theorem 1.4 does not solve Conjecture 1.3 completely, it can be used to prove the case for doubling number up to 44 (in terms of difference set). In this case, Eberhard, Green and Manners [6, Theorem 6.4] proved that 𝒜\mathcal{A} must be a subset of an arithmetic progression with positive density, and thus the following corollary can be deduced from Theorem 1.4 immediately.

Corollary 1.6.

Let ε>0\varepsilon>0. Let 𝒜\mathcal{A}\subseteq\mathbb{Z} be a finite set with |AA|(4ε)|A||A-A|\leq(4-\varepsilon)|A|, then

|𝒜𝒜|ε|𝒜|2(log|𝒜|)12log2o(1).|\mathcal{A}\cdot\mathcal{A}|\gg_{\varepsilon}|\mathcal{A}|^{2}(\log|\mathcal{A}|)^{1-2\log 2-o(1)}.

In Theorem 1.5, one can relax the condition that 𝒜,\mathcal{A},\mathcal{B} are dense subsets of arithmetic progressions 𝒫1,𝒫2\mathcal{P}_{1},\mathcal{P}_{2}, to the condition that 𝒜\mathcal{A} intersects an arithmetic progression 𝒫1\mathcal{P}_{1} (with sizes |𝒫1|=Θ(|𝒜|)|\mathcal{P}_{1}|=\Theta(|\mathcal{A}|)) with positive density, and a similar condition holds for \mathcal{B} and 𝒫2\mathcal{P}_{2}. Such 𝒜\mathcal{A} and \mathcal{B} may have large sumsets.

Our Theorem 1.5 also easily implies the following observation. Consider the restricted sumsets

𝒜+G:={a+b:(a,b)G)},|G|(1ε)|𝒜|||,\mathcal{A}+_{G}\mathcal{B}:=\{a+b:(a,b)\in G)\},\quad|G|\geq(1-\varepsilon)|\mathcal{A}||\mathcal{B}|,

where ε\varepsilon is a small absolute constant. If the restricted sumset is small, then one can get similar lower bounds on |𝒜||\mathcal{A}\cdot\mathcal{B}| as in Theorem 1.5. The proof simply consists of using the structural results in [17, 26] to conclude that such set 𝒜,\mathcal{A},\mathcal{B} must overlap with some arithmetic progressions 𝒫1,𝒫2\mathcal{P}_{1},\mathcal{P}_{2} heavily, and then applying Theorem 1.5 to conclude the proof. We leave the details for the interested reader.

We prove Theorems 1.2 and 1.5 by bounding the multiplicative energy from above.

Definition 1.7 (Multiplicative energy).

Let 𝒜,\mathcal{A},\mathcal{B} be two finite subsets of integers. The multiplicative energy between 𝒜,\mathcal{A},\mathcal{B} is defined as

E×(𝒜,):=|{(a1,a2,b1,b2)𝒜×𝒜××:a1b1=a2b2}|.E_{\times}(\mathcal{A},\mathcal{B}):=\left|\{(a_{1},a_{2},b_{1},b_{2})\in\mathcal{A}\times\mathcal{A}\times\mathcal{B}\times\mathcal{B}:a_{1}b_{1}=a_{2}b_{2}\}\right|.

When 𝒜=\mathcal{A}=\mathcal{B}, we write E×(𝒜):=E×(𝒜,𝒜)E_{\times}(\mathcal{A}):=E_{\times}(\mathcal{A},\mathcal{A}), which is called the multiplicative energy of 𝒜\mathcal{A}.

The first main theorem is about the multiplicative energy of any finite arithmetic progression.

Theorem 1.8.

Let 𝒜\mathcal{A}\subseteq\mathbb{Z} be a finite arithmetic progression. Then there exists a subset 𝒜𝒜\mathcal{A}^{\prime}\subseteq\mathcal{A} with size |𝒜||𝒜|(log|𝒜|)θo(1)|\mathcal{A}^{\prime}|\geq|\mathcal{A}|(\log|\mathcal{A}|)^{-\theta-o(1)} such that E×(𝒜)|𝒜|2.E_{\times}(\mathcal{A}^{\prime})\ll|\mathcal{A}^{\prime}|^{2}.

A stronger version in terms of the o(1)o(1) term is achieved in Appendix A. The second main theorem is about the multiplicative energy of a dense subset of any finite arithmetic progression.

Theorem 1.9.

Let δ(0,1]\delta\in(0,1] and let 𝒜\mathcal{A} be a subset of a finite arithmetic progression 𝒫\mathcal{P}\subseteq\mathbb{Z} such that |𝒜|δ|𝒫||\mathcal{A}|\geq\delta|\mathcal{P}|. Then there exists a subset 𝒜𝒜\mathcal{A}^{\prime}\subseteq\mathcal{A} with size |𝒜|δ|𝒜||\mathcal{A}^{\prime}|\gg_{\delta}|\mathcal{A}| such that

E×(𝒜)δ|𝒜|2(log|𝒜|)2log21+o(1).E_{\times}(\mathcal{A}^{\prime})\ll_{\delta}|\mathcal{A}|^{2}(\log|\mathcal{A}|)^{2\log 2-1+o(1)}.

1.1 Proof ideas

Our main focus will be on Theorem 1.8 and Theorem 1.9, and all the rest of our results will follow from them fairly straightforwardly (See Section 2).

Theorem 1.8 and Theorem 1.9 are well understood when the arithmetic progression is just the first NN integers (See [31, 8, 13] for the special case of Theorem 1.8 and [21] for the special case of Theorem 1.9.) The classical proofs of both special cases involve estimates on πk(x)\pi_{k}(x) (the number of integers up to xx with kk distinct prime factors) and tools like Shiu’s Theorem on bounding mean values of multiplicative functions. The original arguments cannot be directly generalized because these tools are not applicable to general arithmetic progressions. Our main novelty is the deduction of the general case in the case where the parameters of the arithmetic progression lie in a certain range such that all classical tools are applicable. The deduction is completed by proving the desired upper bounds for multiplicative energies in the other cases via elementary counting arguments. After the deduction steps, we carefully employ and adapt the method developed in [13, 8, 12, 14] to conclude the proof of both Theorem 1.8 and Theorem 1.9.

1.2 Organization

We organize the paper as follows. In Section 2 we prove Theorems 1.1, 1.2, 1.4 and 1.5 assuming Theorems 1.8 and 1.9. This is done by applying Cauchy-Schwarz type arguments (see Lemma 2.1). The remaining sections are devoted to proving the main Theorems 1.8 and 1.9. To prove Theorem 1.9, we first do a sequence of reduction steps in Section 3 as a preparation to proving the essential case of Theorem 1.9 in Section 4. We study the number of elements with a fixed number of prime factors in arithmetic progressions in Section 5 and crucially use it to complete the proof of Theorem 1.8 in Section 6. Finally in Section 7 we make conjectures about both problems.

1.3 Notations

For two functions f,g:f,g:\mathbb{R}\to\mathbb{R}, we write fg,gf,g=Ω(f)f\ll g,g\gg f,g=\Omega(f) or f=O(g)f=O(g) if there exists a positive constant CC such that fCgf\leq Cg, and we write fgf\asymp g or f=Θ(g)f=\Theta(g) if fgf\ll g and gfg\gg f. We write f=o(g)f=o(g) if f(x)ϵg(x)f(x)\leq\epsilon g(x) for any ϵ>0\epsilon>0 when g(x)g(x) is sufficiently large. We write δ\ll_{\delta} or δ\gg_{\delta} if the implicit constant depends on δ\delta. Throughout the paper, θ\theta always denotes the constant 11+loglog4log40.0431-\frac{1+\log\log 4}{\log 4}\approx 0.043.... For two sets 𝒜\mathcal{A} and \mathcal{B}, we write 𝒜=𝒜={ab:a𝒜,b}\mathcal{A}\mathcal{B}=\mathcal{A}\cdot\mathcal{B}=\{ab:a\in\mathcal{A},b\in\mathcal{B}\}. When 𝒜={a}\mathcal{A}=\{a\} is a singleton, we write a={a}a\mathcal{B}=\{a\}\cdot\mathcal{B}. For positive integer nn, ω(n)\omega(n) denotes the number of distinct prime factors of nn and ϕ(n)\phi(n) is the Euler’s totient function. All logarithms are base ee.

2 Reduction to multiplicative energy estimates

In this section, we show that Theorems 1.8 and 1.9 imply all the other main results. The implications follow from the Cauchy-Schwarz inequality (see Lemma 2.1). The proof of Lemma 2.1 is standard (e.g., see [30] Corollary 2.10 for an additive version): we include it here for completeness.

Lemma 2.1.

Let 𝒜,\mathcal{A},\mathcal{B} be nonempty sets of nonzero integers. Then we have

  1. 1.

    |𝒜||𝒜|2||2E×(𝒜,).|\mathcal{A}\cdot\mathcal{B}|\geq\frac{|\mathcal{A}|^{2}|\mathcal{B}|^{2}}{E_{\times}(\mathcal{A},\mathcal{B})}.

  2. 2.

    E×(𝒜,)E×(𝒜)E×()E_{\times}(\mathcal{A},\mathcal{B})\leq\sqrt{E_{\times}(\mathcal{A})E_{\times}(\mathcal{B})}.

Proof.

For any two sets of integers 𝒳,𝒴\{0}\mathcal{X},\mathcal{Y}\subset\mathbb{Z}\backslash\{0\}, we define

r𝒳𝒴(m):=|(x,y)𝒳×𝒴:xy=m|,r_{\mathcal{X}\cdot\mathcal{Y}}(m):=|(x,y)\in\mathcal{X}\times\mathcal{Y}:xy=m|,

and similarly define define ratio representation functions r𝒳/𝒴(m)r_{\mathcal{X}/\mathcal{Y}}(m) for mm\in\mathbb{Q}. By Definition 1.7 and the Cauchy-Schwarz inequality,

|𝒜|E×(𝒜,)=m𝒜1m𝒜r𝒜(m)2(m𝒜r𝒜(m))2=|𝒜|2||2.|\mathcal{A}\cdot\mathcal{B}|\cdot E_{\times}(\mathcal{A},\mathcal{B})=\sum_{m\in\mathcal{A}\cdot\mathcal{B}}1\cdot\sum_{m\in\mathcal{A}\cdot\mathcal{B}}r_{\mathcal{A}\cdot\mathcal{B}}(m)^{2}\geq\left(\sum_{m\in\mathcal{A}\cdot\mathcal{B}}r_{\mathcal{A}\cdot\mathcal{B}}(m)\right)^{2}=|\mathcal{A}|^{2}|\mathcal{B}|^{2}.

This gives (1). By Definition 1.7, and noting that a1a2=a1a2a_{1}a_{2}=a_{1}^{\prime}a_{2}^{\prime} is equivalent to a1a1=a2a2\frac{a_{1}}{a_{1}^{\prime}}=\frac{a_{2}^{\prime}}{a_{2}}, we have

E×(𝒜)=|{(a1,a2;a1,a2)𝒜4:a1a2=a1a2}|=mr𝒜/𝒜(m)2.E_{\times}(\mathcal{A})=\left|\{(a_{1},a_{2};a_{1}^{\prime},a_{2}^{\prime})\in\mathcal{A}^{4}:a_{1}a_{2}=a_{1}^{\prime}a_{2}^{\prime}\}\right|=\sum_{m\in\mathbb{Q}}r_{\mathcal{A}/\mathcal{A}}(m)^{2}.

We have a symmetric equation for \mathcal{B}. By the Cauchy-Schwarz inequality, we have

E×(𝒜)E×()=(mr𝒜/𝒜(m)2)(mr/(m)2)(mr𝒜/𝒜(m)r/(m))2.E_{\times}(\mathcal{A})E_{\times}(\mathcal{B})=\left(\sum_{m\in\mathbb{Q}}r_{\mathcal{A}/\mathcal{A}}(m)^{2}\right)\cdot\left(\sum_{m\in\mathbb{Q}}r_{\mathcal{B}/\mathcal{B}}(m)^{2}\right)\geq\left(\sum_{m\in\mathbb{Q}}r_{\mathcal{A}/\mathcal{A}}(m)r_{\mathcal{B}/\mathcal{B}}(m)\right)^{2}.

The right hand side is the square of the following counting

|{(a1,a2;b1,b2)𝒜×𝒜××:a1/a2=b1/b2}|=E×(𝒜,).\left|\{(a_{1},a_{2};b_{1},b_{2})\in\mathcal{A}\times\mathcal{A}\times\mathcal{B}\times\mathcal{B}:a_{1}/a_{2}=b_{1}/b_{2}\}\right|=E_{\times}(\mathcal{A},\mathcal{B}).

The equality is because a1/a2=b1/b2a_{1}/a_{2}=b_{1}/b_{2} is equivalent to a1b2=a2b1a_{1}b_{2}=a_{2}b_{1}. Hence we have (2). ∎

Assuming Theorems 1.8 and 1.9, we prove all the theorems regarding product sets (Theorems 1.1, 1.2, 1.4 and 1.5) via Lemma 2.1. Before we proceed to the proof, we remark that the condition in Lemma 2.1 that 𝒜,\mathcal{A},\mathcal{B} do not contain 0 is a mild technical restriction, as deleting one single element changes the size of product sets by at most O(|𝒜|+||)O(|\mathcal{A}|+|\mathcal{B}|), and changes multiplicative energy by at most O(|𝒜|||)O(|\mathcal{A}||\mathcal{B}|), which are negligible for our purpose.

Proof of Theorems 1.1 and 1.2 assuming Theorem 1.8.

By Theorem 1.8, there exist subsets 𝒜𝒜\mathcal{A}^{\prime}\subseteq\mathcal{A} and \mathcal{B}^{\prime}\subseteq\mathcal{B} with sizes |𝒜||𝒜|(log|𝒜|)θo(1),||||(log||)θo(1),|\mathcal{A}^{\prime}|\gg|\mathcal{A}|(\log|\mathcal{A}|)^{-\theta-o(1)},|\mathcal{B}^{\prime}|\gg|\mathcal{B}|(\log|\mathcal{B}|)^{-\theta-o(1)}, such that E×(𝒜)|𝒜|2,E×()||2.E_{\times}(\mathcal{A}^{\prime})\ll|\mathcal{A}^{\prime}|^{2},E_{\times}(\mathcal{B}^{\prime})\ll|\mathcal{B}^{\prime}|^{2}. These energy bounds give

E×(𝒜,)E×(𝒜)E×()|𝒜|||,E_{\times}(\mathcal{A}^{\prime},\mathcal{B}^{\prime})\leq\sqrt{E_{\times}(\mathcal{A}^{\prime})E_{\times}(\mathcal{B}^{\prime})}\ll|\mathcal{A}^{\prime}||\mathcal{B}^{\prime}|,

where the first inequality is due to (2) of Lemma 2.1. By (1) of Lemma 2.1 we conclude that

|𝒜||𝒜||𝒜|2||2E×(𝒜,)|𝒜|||(log|𝒜|)θ+o(1)(log||)θ+o(1).|\mathcal{A}\cdot\mathcal{B}|\geq|\mathcal{A}^{\prime}\cdot\mathcal{B}^{\prime}|\geq\frac{|\mathcal{A}^{\prime}|^{2}|\mathcal{B}^{\prime}|^{2}}{E_{\times}(\mathcal{A}^{\prime},\mathcal{B}^{\prime})}\geq\frac{|\mathcal{A}||\mathcal{B}|}{(\log|\mathcal{A}|)^{\theta+o(1)}(\log|\mathcal{B}|)^{\theta+o(1)}}.

We thereby prove Theorem 1.2. Taking =𝒜\mathcal{B}=\mathcal{A} in Theorem 1.2 gives Theorem 1.1. ∎

Proof of Theorems 1.4 and 1.5 assuming Theorem 1.9.

By applying Theorem 1.9 to 𝒜\mathcal{A} and \mathcal{B}, we get subsets 𝒜𝒜\mathcal{A}^{\prime}\subseteq\mathcal{A} and \mathcal{B}^{\prime}\subseteq\mathcal{B} with sizes |𝒜|δ|𝒜|,||δ||,|\mathcal{A}^{\prime}|\gg_{\delta}|\mathcal{A}|,|\mathcal{B}^{\prime}|\gg_{\delta}|\mathcal{B}|, and energy bounds

E×(𝒜)δ|𝒜|2(log|𝒜|)2log21+o(1),E×()δ||2(log||)2log21+o(1).E_{\times}(\mathcal{A}^{\prime})\ll_{\delta}|\mathcal{A}|^{2}(\log|\mathcal{A}|)^{2\log 2-1+o(1)},\quad E_{\times}(\mathcal{B}^{\prime})\ll_{\delta}|\mathcal{B}|^{2}(\log|\mathcal{B}|)^{2\log 2-1+o(1)}.

By applying (2) of Lemma 2.1 we get

E×(𝒜,)δ|𝒜|||(log|𝒜|)log212+o(1)(log||)log212+o(1).E_{\times}(\mathcal{A},\mathcal{B})\ll_{\delta}|\mathcal{A}||\mathcal{B}|(\log|\mathcal{A}|)^{\log 2-\frac{1}{2}+o(1)}(\log|\mathcal{B}|)^{\log 2-\frac{1}{2}+o(1)}.

Now Theorem 1.5 follows by (1) of Lemma 2.1. Taking =𝒜\mathcal{B}=\mathcal{A} in Theorem 1.5 gives Theorem 1.4. ∎

3 Proof of Theorem 1.9: reduction to the essential case

In this section, we want to reduce Theorem 1.9 to the following case, which is proved in the Section 4.

Theorem 3.1.

Let δ(0,1]\delta\in(0,1]. Let 𝒫={a+id:0i<L}\mathcal{P}=\{a+id:0\leq i<L\} be an arithmetic progression with common difference dd and length LL, with L,d>0L,d>0, LlogL>adLL\log L>a\geq dL, and gcd(a,d)=1\operatorname{gcd}(a,d)=1. If 𝒜𝒫\mathcal{A}\subseteq\mathcal{P} is a subset of size at least δL\delta L containing only square-free elements and LL is sufficiently large with respect to δ\delta, then there exists a subset 𝒜𝒜\mathcal{A}^{\prime}\subseteq\mathcal{A} with |𝒜||𝒜||\mathcal{A}^{\prime}|\gg|\mathcal{A}| such that

E×(𝒜)L2(logL)2log21+o(1).E_{\times}(\mathcal{A}^{\prime})\ll L^{2}(\log L)^{2\log 2-1+o(1)}. (3.1)

We start by giving an upper bound on the multiplicative energy in some special cases. When gcd(a,d)=1\operatorname{gcd}(a,d)=1, we have the following observation on the multiplicative energy.

Lemma 3.2.

Let 𝒫={a+id:0i<L}\mathcal{P}=\{a+id:0\leq i<L\} be an arithmetic progression with gcd(a,d)=1\operatorname{gcd}(a,d)=1 and a>0,d>0a>0,d>0. For any 𝒜𝒫\mathcal{A}\subseteq\mathcal{P}, we have E×(𝒜)2|𝒜|2+4L3a(1+logL).E_{\times}(\mathcal{A})\leq 2|\mathcal{A}|^{2}+4\frac{L^{3}}{a}(1+\log L).

Proof.

The number of solutions (a1,a2;a3,a4)𝒜4(a_{1},a_{2};a_{3},a_{4})\in\mathcal{A}^{4} to a1a2=a3a4a_{1}a_{2}=a_{3}a_{4} with {a1,a2}={a3,a4}\{a_{1},a_{2}\}=\{a_{3},a_{4}\} is at most 2|𝒜|22|\mathcal{A}|^{2}. We next estimate the number of off diagonal solutions.

We begin by parameterizing solutions (a1,a2;a3,a4)(a_{1},a_{2};a_{3},a_{4}). Write x1=gcd(a1,a3)x_{1}=\operatorname{gcd}(a_{1},a_{3}) and a1=x1y1a_{1}=x_{1}y_{1}, a3=x1y2a_{3}=x_{1}y_{2}. As a result, a2/y2=a4/y1a_{2}/y_{2}=a_{4}/y_{1}, which we denote as x2x_{2}. Therefore we derive a tuple (x1,x2;y1,y2)(x_{1},x_{2};y_{1},y_{2}) such that xiyj𝒜x_{i}y_{j}\in\mathcal{A} for all i,j{1,2}i,j\in\{1,2\}. As {a1,a2}{a3,a4}\{a_{1},a_{2}\}\neq\{a_{3},a_{4}\}, one has x1x2x_{1}\neq x_{2} and y1y2y_{1}\neq y_{2}. By symmetry we may assume that x1<x2x_{1}<x_{2} and y1<y2y_{1}<y_{2} and thereby decrease the number of such tuples by a factor of 44, i.e. E×(𝒜)2|𝒜|2E_{\times}(\mathcal{A})-2|\mathcal{A}|^{2} is at most 4|X|4|X| where

X:={(x1,x2;y1,y2):xiyj𝒜i,j{1,2},0<x1<x2,0<y1<y2}.X:=\{(x_{1},x_{2};y_{1},y_{2}):x_{i}y_{j}\in\mathcal{A}\;\leavevmode\nobreak\ \leavevmode\nobreak\ \forall\;i,j\in\{1,2\},0<x_{1}<x_{2},0<y_{1}<y_{2}\}.

Fix a choice of x1x_{1}. Since elements in 𝒜\mathcal{A} are included in the interval [a,a+dL)[a,a+dL), one has that x2x1=x2y1x1y1<1+dLa\frac{x_{2}}{x_{1}}=\frac{x_{2}y_{1}}{x_{1}y_{1}}<1+\frac{dL}{a}. Together with x2>x1x_{2}>x_{1}, it implies x2(x1,x1+x1dL/a)x_{2}\in(x_{1},x_{1}+x_{1}dL/a). Since all elements in 𝒜\mathcal{A} are coprime to dd, xix_{i} and yjy_{j} are also coprime to dd. As a result, x1y1ax2y1(modd)x_{1}y_{1}\equiv a\equiv x_{2}y_{1}\pmod{d} implies that x1x_{1} and x2x_{2} are both congruent to ay11ay_{1}^{-1} mod dd, and thus the number of choices for x2x_{2} is at most x1La\lfloor\frac{x_{1}L}{a}\rfloor.

Similarly, yi[ax1,ax1+dLx1)y_{i}\in[\frac{a}{x_{1}},\frac{a}{x_{1}}+\frac{dL}{x_{1}}) as x1yi[a,a+dL)x_{1}y_{i}\in[a,a+dL). By the fact y1<y2y_{1}<y_{2} and y1y2moddy_{1}\equiv y_{2}\mod d, there is no valid choice for (y1,y2)(y_{1},y_{2}) if Lx11\frac{L}{x_{1}}\leq 1. If Lx1>1\frac{L}{x_{1}}>1, then the number of choices for (y1,y2)(y_{1},y_{2}) is at most

12Lx1(Lx11)L2x12.\frac{1}{2}\left\lceil\frac{L}{x_{1}}\right\rceil\cdot\left(\left\lceil\frac{L}{x_{1}}\right\rceil-1\right)\leq\frac{L^{2}}{x_{1}^{2}}.

In summary, any such valid tuple satisfies aLx1<L\frac{a}{L}\leq x_{1}<L. Once x1x_{1} is fixed, the number of such tuples is at most L3ax1\frac{L^{3}}{ax_{1}}. By summing over all choices of x1x_{1}, we conclude that

E×(𝒜)2|𝒜|2+4x1<LL3ax12|𝒜|2+4L3a(1+logL).E_{\times}(\mathcal{A})\leq 2|\mathcal{A}|^{2}+4\sum_{x_{1}<L}\frac{L^{3}}{ax_{1}}\leq 2|\mathcal{A}|^{2}+\frac{4L^{3}}{a}(1+\log L).

The above lemma gives the following corollary immediately.

Corollary 3.3.

Let 𝒫={a+id:0i<L}\mathcal{P}=\{a+id:0\leq i<L\} be an arithmetic progression with gcd(a,d)=1\operatorname{gcd}(a,d)=1 and a>0,d>0a>0,d>0. Suppose 𝒜𝒫\mathcal{A}\subseteq\mathcal{P}. If a=Ω(LlogL)a=\Omega(L\log L), then E×(𝒜)=O(L2)E_{\times}(\mathcal{A})=O(L^{2}).

Our next step is to reduce the problem to the case that the elements are all square free.

Lemma 3.4 (Square-free reduction).

Let 𝒫={a+id:0i<L}\mathcal{P}=\{a+id:0\leq i<L\} be an arithmetic progression with gcd(a,d)=1\operatorname{gcd}(a,d)=1 and a,d>0a,d>0. Suppose a+dL<δ29L2a+dL<\frac{\delta^{2}}{9}L^{2} and dLd\leq L. If 𝒜\mathcal{A} is an arithmetic progression with density δ\delta, then we can find a subset 0𝒜\mathcal{B}_{0}\subseteq\mathcal{A} such that |0|δ|𝒜||\mathcal{B}_{0}|\gg_{\delta}|\mathcal{A}|, and all elements in 0\mathcal{B}_{0} have the same largest square factor. In particular, after dividing all elements in 0\mathcal{B}_{0} by their common largest square factor, the resulting set, denoted by \mathcal{B}, contains only square-free elements.

To prove this, we use the Pigeonhole Principle. We first show that most elements in 𝒫\mathcal{P} with relatively small aa and dd are not divisible by large perfect squares.

Lemma 3.5.

Let 𝒫={a+id:0i<L}\mathcal{P}=\{a+id:0\leq i<L\} be an arithmetic progression with gcd(a,d)=1\operatorname{gcd}(a,d)=1 and a,d>0a,d>0. Let T>0T>0 be a positive integer. The number of elements in 𝒫\mathcal{P} divisible by a square greater than T2T^{2} is at most a+dL+LT\sqrt{a+dL}+\frac{L}{T}.

Proof.

Let n=a+dLn=\lfloor\sqrt{a+dL}\rfloor. Clearly if an element in 𝒫\mathcal{P} is divisible by t2t^{2}, then tnt\leq n (or otherwise t2t^{2} is larger than any element in 𝒫\mathcal{P}). We claim that for each tt, the number of elements in 𝒫\mathcal{P} divisible by t2t^{2} is at most Lt2\lceil\frac{L}{t^{2}}\rceil. Since gcd(a,d)=1\operatorname{gcd}(a,d)=1, if t2t^{2} divides a+i1da+i_{1}d and a+i2da+i_{2}d, then gcd(t,d)=1\operatorname{gcd}(t,d)=1. This implies that t2|(i1i2)t^{2}|(i_{1}-i_{2}), so t2t^{2} divides at most one element in every t2t^{2} consecutive elements in 𝒫\mathcal{P}. We conclude that the number of elements in 𝒫\mathcal{P} divisible by a square greater than T2T^{2} is at most

t=T+1nLt2t=T+1n(1+Lt2)n+Lt=T+1n1t2n+LT.\sum_{t=T+1}^{n}\left\lceil\frac{L}{t^{2}}\right\rceil\leq\sum_{t=T+1}^{n}\left(1+\frac{L}{t^{2}}\right)\leq n+L\sum_{t=T+1}^{n}\frac{1}{t^{2}}\leq n+\frac{L}{T}.

We conclude with the desired bound by noting that na+dLn\leq\sqrt{a+dL}. ∎

Proof of Lemma 3.4.

As a+dL<δ29L2a+dL<\frac{\delta^{2}}{9}L^{2}, by taking T=3δT=\lceil\frac{3}{\delta}\rceil in Lemma 3.5, there are at least δ3L\frac{\delta}{3}L elements left in 𝒜\mathcal{A} that are not divisible by squares greater than T2T^{2}. By the Pigeonhole Principle, there exists some 1tT1\leq t\leq T such that there are at least δL3Tδ218L\frac{\delta L}{3T}\geq\frac{\delta^{2}}{18}L elements in 𝒜\mathcal{A} that are divisible by t2t^{2} and are square-free after divided by t2t^{2}. We shall let 0\mathcal{B}_{0} be the set of such elements. ∎

Proof of Theorem 1.9 assuming Theorem 3.1.

First, we may assume that L|𝒜|L\geq|\mathcal{A}| is sufficiently large with respect to δ\delta, or otherwise we may choose 𝒜=𝒜\mathcal{A}^{\prime}=\mathcal{A} and we have E×(𝒜)|𝒜|4δ1E_{\times}(\mathcal{A})\ll|\mathcal{A}|^{4}\ll_{\delta}1.

We finish proof by combining all results from above. Our setup is the following. Let 𝒫={a+id:0i<L}\mathcal{P}=\{a+id:0\leq i<L\} be an arithmetic progression with common difference dd and length LL. Let 𝒜𝒫\mathcal{A}\subseteq\mathcal{P} with density |𝒜||𝒫|δ\frac{|\mathcal{A}|}{|\mathcal{P}|}\geq\delta for some constant δ(0,1)\delta\in(0,1). We begin by noticing the following simple observation.

Observation

Let ϕ\phi be a non-trivial dilation operation (multiply by a nonzero constant) on the set 𝒫\mathcal{P}, then the operation ϕ\phi preserves the multiplicative energy of any subset of 𝒫\mathcal{P}. If Theorem 1.9 holds for (𝒜0,𝒫0)(\mathcal{A}_{0},\mathcal{P}_{0}) where ϕ(𝒜0)𝒜\phi(\mathcal{A}_{0})\subseteq\mathcal{A} and arithmetic progression ϕ(𝒫0)𝒫\phi(\mathcal{P}_{0})\subseteq\mathcal{P} with |𝒜0|δ|𝒜||\mathcal{A}_{0}|\gg_{\delta}|\mathcal{A}|, then the conclusion must also hold for (𝒜,𝒫)(\mathcal{A},\mathcal{P}).

Based on the above observation, we do a set of reduction steps to reduce to (𝒜,𝒫)(\mathcal{A}^{\prime},\mathcal{P}^{\prime}). At the same time, we trace the change of density from δ\delta to δ\delta^{\prime} in each step and LL^{\prime} to LL in each step.

Step 1: Assume all elements are positive. Without loss of generality, we may assume that at least 13\frac{1}{3} of the elements in 𝒜\mathcal{A} are positive, or we shall replace 𝒜\mathcal{A} by 𝒜-\mathcal{A}. Let 𝒜+=𝒜>0\mathcal{A}_{+}=\mathcal{A}\cap\mathbb{Z}_{>0} be the set of positive elements. Then it is sufficient to prove the statement for 𝒜+\mathcal{A}_{+}, i.e., one can find a subset of 𝒜+\mathcal{A}_{+} with small multiplicative energy L2(logL)12log2+o(1)\ll L^{2}(\log L)^{1-2\log 2+o(1)}.

In this step, we set 𝒜1=𝒜+\mathcal{A}_{1}=\mathcal{A}_{+}. It is clear that by doing this, the density δ1>δ/3\delta_{1}>\delta/3. We also make the enclosing arithmetic progression contain only positive elements. This does not decrease the density δ1\delta_{1}, while the length of the enclosing arithmetic progression becomes L1:=|𝒫>0||𝒜1|=Ω(δL)L_{1}:=|\mathcal{P}\cap\mathbb{Z}_{>0}|\geq|\mathcal{A}_{1}|=\Omega(\delta L). In summary, we can find an arithmetic progression 𝒫1={a1+id1:0i<L1}\mathcal{P}_{1}=\{a_{1}+id_{1}:0\leq i<L_{1}\} and a subset 𝒜1𝒫1\mathcal{A}_{1}\subseteq\mathcal{P}_{1} such that E×(𝒜)E×(𝒜1)E_{\times}(\mathcal{A})\geq E_{\times}(\mathcal{A}_{1}), a1,d1>0a_{1},d_{1}>0, and L1=Ω(δL)L_{1}=\Omega(\delta L), δ1=|𝒜1||𝒫1|δ/3\delta_{1}=\frac{|\mathcal{A}_{1}|}{|\mathcal{P}_{1}|}\geq\delta/3.

Step 2: Divide all elements by GCD. We may divide all elements in 𝒜1\mathcal{A}_{1} by gcd(a1,d1)\operatorname{gcd}(a_{1},d_{1}), giving set A2A_{2}. The density and the length do not change δ2=δ1>δ/3\delta_{2}=\delta_{1}>\delta/3 and L2=L1=Ω(δL)L_{2}=L_{1}=\Omega(\delta L). The new common difference is d2=d1gcd(a1,d1)d_{2}=\frac{d_{1}}{\operatorname{gcd}(a_{1},d_{1})} and the initial term becomes a2=a1gcd(a1,d1)a_{2}=\frac{a_{1}}{\operatorname{gcd}(a_{1},d_{1})}. Moreover this map (×gcd(a1,d1))(\times\operatorname{gcd}(a_{1},d_{1})) from subsets of 𝒜2\mathcal{A}_{2} to subsets of 𝒜1\mathcal{A}_{1} preserves multiplicative energy.

Step 3: Assume all elements lie in dyadic interval \texorpdfstring[x,2x)[x,2x)[x, 2x). Let It:={a2+id2:L2/2t+1i<L2/2t}I_{t}:=\{a_{2}+id_{2}:L_{2}/2^{t+1}\leq i<L_{2}/2^{t}\} for t0t\geq 0 and consider the dyadic partition (ignore the single element {a}\{a\})

{a2+id2:0<i<L2}=t0It.\{a_{2}+id_{2}:0<i<L_{2}\}=\bigcup_{t\geq 0}I_{t}.

We may set t0=O(1+log1δ2)t_{0}=O(1+\log\frac{1}{\delta_{2}}) such that tt0|It|δ2(L21)/2\sum_{t\geq t_{0}}|I_{t}|\leq\delta_{2}(L_{2}-1)/2. It implies that

|𝒜2t<t0It|δ2(L21)/2.\left|\mathcal{A}_{2}\cap\bigcup_{t<t_{0}}I_{t}\right|\geq\delta_{2}(L_{2}-1)/2.

This means that 𝒜2\mathcal{A}_{2} contains at least δ2/2\delta_{2}/2 fraction of elements in t<t0It\bigcup_{t<t_{0}}I_{t}. In particular there exists some 0t1<t00\leq t_{1}<t_{0} such that

|𝒜It1|δ22|It1|.|\mathcal{A}\cap I_{t_{1}}|\geq\frac{\delta_{2}}{2}|I_{t_{1}}|.

Note that It1={a2+id2:L2/2t1+1i<L2/2t1}I_{t_{1}}=\{a_{2}+id_{2}:L_{2}/2^{t_{1}+1}\leq i<L_{2}/2^{t_{1}}\} can be written as an arithmetic progression 𝒫3={a3+id3:0i<L3}\mathcal{P}_{3}=\{a_{3}+id_{3}:0\leq i<L_{3}\} with a3a2+d2L2/2t1+1a_{3}\geq a_{2}+d_{2}L_{2}/2^{t_{1}+1}, d3=d2d_{3}=d_{2}, and L3L2/2t1+1L_{3}\leq L_{2}/2^{t_{1}+1}. We conclude that we can find sets 𝒜3,𝒫3\mathcal{A}_{3},\mathcal{P}_{3} such that 𝒫3={a3+id3:0i<L3}\mathcal{P}_{3}=\{a_{3}+id_{3}:0\leq i<L_{3}\} with a3>d3L3a_{3}>d_{3}L_{3}, L3=Ω(L2/2t0)=δ2O(1)L2L_{3}=\Omega(L_{2}/2^{t_{0}})=\delta_{2}^{O(1)}L_{2} and 𝒜3=𝒜2𝒫3\mathcal{A}_{3}=\mathcal{A}_{2}\cap\mathcal{P}_{3} with |𝒜3|δ22L3=:δ3L3|\mathcal{A}_{3}|\geq\frac{\delta_{2}}{2}L_{3}=:\delta_{3}L_{3}. Thus, by the observation made at beginning, it suffices to study (𝒜3,𝒫3)(\mathcal{A}_{3},\mathcal{P}_{3}).

Step 4: Eliminate extremal cases. After step 33, we may assume that d3L3<a3d_{3}L_{3}<a_{3}. Applying Lemma 3.2, we may assume that d3L3<a3<L3logL3d_{3}L_{3}<a_{3}<L_{3}\log L_{3}, otherwise we directly have E×(𝒜3)δ|𝒜3|2E_{\times}(\mathcal{A}_{3})\ll_{\delta}|\mathcal{A}_{3}|^{2}.

Step 5: Assume all elements are square-free. After reduction in step 2, the condition in Lemma 3.4 is satisfied and by applying Lemma 3.4 we get a set 𝒜5\mathcal{A}_{5} with size at least δ3218L3\frac{\delta_{3}^{2}}{18}L_{3} containing only square-free elements. Moreover, there is some constant T=O(1/δ)T=O(1/\delta) such that T2𝒜5𝒜3T^{2}\mathcal{A}_{5}\subseteq\mathcal{A}_{3} and gcd(T,d3)=1\operatorname{gcd}(T,d_{3})=1. Clearly 𝒜5\mathcal{A}_{5} is also contained in an arithmetic progression of length L5=Θ(L3/T2)=Ω(δ2L3)L_{5}=\Theta(L_{3}/T^{2})=\Omega(\delta^{2}L_{3}). Also note that L5L3L_{5}\leq L_{3}, so we have δ5δ3218\delta_{5}\geq\frac{\delta_{3}^{2}}{18}. By applying the observation made at the beginning again, it is valid to make the deduction.

Summary. Combing the five reduction steps above, we conclude that either we find a subset 𝒜𝒜\mathcal{A}^{\prime}\subseteq\mathcal{A} with |𝒜|δ|𝒜||\mathcal{A}^{\prime}|\gg_{\delta}|\mathcal{A}| and E×(𝒜)δ|𝒜|2E_{\times}(\mathcal{A}^{\prime})\ll_{\delta}|\mathcal{A}|^{2} (which is more than what we need), or we get a set =𝒜5\mathcal{B}=\mathcal{A}_{5} containing only square-free integers, with density δ5=Ω(δ2)\delta_{5}=\Omega(\delta^{2}) in an arithmetic progression 𝒫={a+id:0i<L}\mathcal{P}^{\prime}=\{a^{\prime}+id^{\prime}:0\leq i<L^{\prime}\} satisfying that LlogL>a>dL>0L^{\prime}\log L^{\prime}>a^{\prime}>d^{\prime}L^{\prime}>0 where L=L5=Ω(δO(1)L)L^{\prime}=L_{5}=\Omega(\delta^{O(1)}L). Applying Theorem 3.1 on \mathcal{B}, there exists a subset \mathcal{B}^{\prime}\subseteq\mathcal{B} with size |||||\mathcal{B}^{\prime}|\gg|\mathcal{B}| such that

E×()||2(log||)2log21+o(1).E_{\times}(\mathcal{B}^{\prime})\ll|\mathcal{B}|^{2}(\log|\mathcal{B}|)^{2\log 2-1+o(1)}.

Moreover, it is guaranteed that there is a nonzero integer mm such that m:={m}𝒜m\mathcal{B}:=\{m\}\cdot\mathcal{B}\subseteq\mathcal{A}. Then 𝒜=m\mathcal{A}^{\prime}=m\mathcal{B}^{\prime} is a subset of 𝒜\mathcal{A} of size at least δ5Lδ|𝒜|\delta_{5}L^{\prime}\gg_{\delta}|\mathcal{A}|, and

E×(𝒜)=E×()||2(log||)2log21+o(1)δ|𝒜|2(log|𝒜|)2log21+o(1),E_{\times}(\mathcal{A}^{\prime})=E_{\times}(\mathcal{B}^{\prime})\ll|\mathcal{B}|^{2}(\log|\mathcal{B}|)^{2\log 2-1+o(1)}\ll_{\delta}|\mathcal{A}|^{2}(\log|\mathcal{A}|)^{2\log 2-1+o(1)},

as desired.∎

4 Proof of Theorem 3.1: the essential case

In this section we prove Theorem 3.1. Let us restate the theorem for convenience. See 3.1

In fact we prove the following more explicit and stronger statement.

Theorem 4.1.

For any δ,ϵ(0,1]\delta,\epsilon\in(0,1], there exists L0=L0(δ,ϵ)L_{0}=L_{0}(\delta,\epsilon) such that the following holds. Let 𝒫={a+id:i0i<L}\mathcal{P}=\{a+id:i\in 0\leq i<L\} be an arithmetic progression with common difference dd and length LL, with L>L0,d>0L>L_{0},d>0, LlogL>adLL\log L>a\geq dL, and gcd(a,d)=1\operatorname{gcd}(a,d)=1. If 𝒜𝒫\mathcal{A}\subseteq\mathcal{P} is a subset of size at least δL\delta L containing only square-free elements, then there exists a subset 𝒜𝒜\mathcal{A}^{\prime}\subseteq\mathcal{A} with |𝒜||𝒜|/2|\mathcal{A}^{\prime}|\geq|\mathcal{A}|/2 such that

E×(𝒜)L2(1+La(logL)2log21+ϵ).E_{\times}(\mathcal{A}^{\prime})\ll L^{2}\left(1+\frac{L}{a}(\log L)^{2\log 2-1+\epsilon}\right). (4.1)

As adLLa\geq dL\geq L, (4.1) implies (3.1). Thus, Theorem 3.1 follows from Theorem 4.1. To prove Theorem 4.1, we need the following result of Shiu [27].

Theorem 4.2 (Theorem 1 in [27]).

Let f(n)f(n) be a non-negative multiplicative function such that f(p)A1f(p^{\ell})\leq A_{1}^{\ell} for some positive constant A1A_{1} and for any ε>0\varepsilon>0, f(n)A2nεf(n)\leq A_{2}n^{\varepsilon} for some A2=A2(ε)A_{2}=A_{2}(\varepsilon). Let α,β(0,1/2]\alpha,\beta\in(0,1/2], integer aa satisfying gcd(a,k)=1\operatorname{gcd}(a,k)=1. Then as xx\to\infty we have

xyn<xna(modk)f(n)yϕ(k)1logxexp(px,pkf(p)p),\sum_{\begin{subarray}{c}x-y\leq n<x\\ n\equiv a\;(\mathrm{mod}\;k)\end{subarray}}f(n)\ll\frac{y}{\phi(k)}\frac{1}{\log x}\exp\left(\sum_{p\leq x,p\nmid k}\frac{f(p)}{p}\right),

provided that k<y1αk<y^{1-\alpha} and xβ<y<xx^{\beta}<y<x, where the implicit constant depends only on A1,A2,α,βA_{1},A_{2},\alpha,\beta and the summation on the right hand side is taken over prime pp.

Let ω(n)\omega(n) be the number of distinct prime factors of nn. Consider the family of nonnegative multiplicative functions M={zω(n):z(0,2]}M=\{z^{\omega(n)}:z\in(0,2]\}. Note that Theorem 4.2 is applicable to any fMf\in M with a universal choice of A1A_{1} and A2A_{2}, by observing that f(p)2f(p^{\ell})\leq 2, and f(n)2ω(n)=nO(1/loglogn)f(n)\leq 2^{\omega(n)}=n^{O(1/\log\log n)}. To evaluate the summation inside the exp on the right hand side, we need the following estimate.

Theorem 4.3 (Merten’s estimate, e.g. see [32, Theorem 1.10]).

For x2x\geq 2, px1p=loglogx+O(1).\sum_{p\leq x}\frac{1}{p}=\log\log x+O(1).

Using these, we estimate the number of elements in 𝒫\mathcal{P} with a large number of distinct prime factors.

Lemma 4.4.

Let 𝒫={a+id:0i<L}\mathcal{P}=\{a+id:0\leq i<L\} be an arithmetic progression satisfying L2>a+dL>a>0L^{2}>a+dL>a>0 and gcd(a,d)=1\operatorname{gcd}(a,d)=1. As a+dLa+dL\to\infty, if t>0t>0 satisfies t=o(loglog(a+dL))t=o(\log\log(a+dL)), then

|{n𝒫:ω(n)loglog(a+dL)+t}|dLϕ(d)exp((12+o(1))t2loglog(a+dL)).|\{n\in\mathcal{P}:\omega(n)\geq\log\log(a+dL)+t\}|\ll\frac{dL}{\phi(d)}\exp\left(-\left(\frac{1}{2}+o(1)\right)\frac{t^{2}}{\log\log(a+dL)}\right).
Proof.

Let =loglog(a+dL)+t\ell=\log\log(a+dL)+t and z=loglog(a+dL)=1+o(1)(1,2]z=\frac{\ell}{\log\log(a+dL)}=1+o(1)\in(1,2]. As z>1z>1, we use bounds zω(n)1z^{\omega(n)-\ell}\geq 1 when ω(n)\omega(n)\geq\ell and zω(n)>0z^{\omega(n)-\ell}>0 otherwise, to get that

|{n𝒫:ω(n)}|n𝒫zω(n)=zn𝒫zω(n).|\{n\in\mathcal{P}:\omega(n)\geq\ell\}|\leq\sum_{n\in\mathcal{P}}z^{\omega(n)-\ell}=z^{-\ell}\sum_{n\in\mathcal{P}}z^{\omega(n)}.

As =zloglog(a+dL)\ell=z\log\log(a+dL), we have z=ezlogzloglog(a+dL)=(log(a+dL))zlogzz^{-\ell}=e^{-z\log z\log\log(a+dL)}=(\log(a+dL))^{-z\log z}. Note that we can write 𝒫={an<a+dL:na(modd)}\mathcal{P}=\{a\leq n<a+dL:n\equiv a\pmod{d}\}. By L2>a+dL>a>0L^{2}>a+dL>a>0, we may take x=a+dLx=a+dL, y=dLy=dL, and k=dk=d, and they satisfy the conditions for Theorem 4.2 with α=β=1/2\alpha=\beta=1/2. It implies

zn𝒫zω(n)dLϕ(d)(log(a+dL))z1zlogz.z^{-\ell}\sum_{n\in\mathcal{P}}z^{\omega(n)}\ll\frac{dL}{\phi(d)}(\log(a+dL))^{z-1-z\log z}.

The implicit constant here is uniform for all z(1,2]z\in(1,2]. If we write λ:=z1=o(1)\lambda:=z-1=o(1), then we have

1zlogz+z=1(1+λ)log(1+λ)+(1+λ)=λ22+O(λ3)=(12+o(1))λ2.-1-z\log z+z=-1-(1+\lambda)\log(1+\lambda)+(1+\lambda)=-\frac{\lambda^{2}}{2}+O(\lambda^{3})=-\left(\frac{1}{2}+o(1)\right)\lambda^{2}.

By definition, we have λ=tloglog(a+dL)\lambda=\frac{t}{\log\log(a+dL)}, and the desired inequality follows. ∎

Note that dϕ(d)=O(loglogd)=O(loglog(a+dL))\frac{d}{\phi(d)}=O(\log\log d)=O(\log\log(a+dL)). We can choose t=(loglog(a+dL))2/3t=(\log\log(a+dL))^{2/3}, such that o(L)o(L) elements have more than T=loglog(a+dL)+(loglog(a+dL))2/3T=\log\log(a+dL)+(\log\log(a+dL))^{2/3} distinct prime factors. Therefore we have the following lemma.

Lemma 4.5.

For any ε(0,1]\varepsilon\in(0,1], there exists a0=a0(ε)a_{0}=a_{0}(\varepsilon) such that the following holds. For any a>a0a>a_{0}, if 𝒫={a+id:0i<L}\mathcal{P}=\{a+id:0\leq i<L\} is an arithmetic progression with common difference dd and length LL satisfying L2>a+dL>a>0L^{2}>a+dL>a>0 and gcd(a,d)=1\operatorname{gcd}(a,d)=1, then for T=loglog(a+dL)+(loglog(a+dL))23T=\log\log(a+dL)+(\log\log(a+dL))^{\frac{2}{3}},

|{n𝒫:ω(n)T}|(1ε)L.|\{n\in\mathcal{P}:\omega(n)\leq T\}|\geq(1-\varepsilon)L. (4.2)

Let 𝒜\mathcal{A}^{\prime} be the subset of elements of 𝒜\mathcal{A} with at most TT distinct prime factors, i.e.

𝒜:={a𝒜:ω(n)T=loglog(a+dL)+(loglog(a+dL))23}.\mathcal{A}^{\prime}:=\{a\in\mathcal{A}:\omega(n)\leq T=\log\log(a+dL)+(\log\log(a+dL))^{\frac{2}{3}}\}. (4.3)

By Lemma 4.5, we have |𝒜||𝒜|/2|\mathcal{A}^{\prime}|\geq|\mathcal{A}|/2 when a>a0(δ/2)a>a_{0}(\delta/2).

Proof of Theorem 4.1.

By choosing a sufficiently large constant in (3.1), we may assume that a>dLL>|𝒜|a>dL\geq L>|\mathcal{A}| is sufficiently large. In particular d<aL<logL<L1/3d<\frac{a}{L}<\log L<L^{1/3}. By Lemma 4.5, we may assume that aa0(δ/2)a\geq a_{0}(\delta/2). Now at most εL=δ2L\varepsilon L=\frac{\delta}{2}L elements in 𝒫\mathcal{P} have more than T=loglog(a+dL)+(loglog(a+dL))23T=\log\log(a+dL)+(\log\log(a+dL))^{\frac{2}{3}} distinct prime factors. Similar to the proof of Lemma 3.2, we estimate the size of

X:={(x1,x2;y1,y2):xiyj𝒜i,j{1,2},0<x1<x2,0<y1<y2,x1y1}.X:=\{(x_{1},x_{2};y_{1},y_{2}):x_{i}y_{j}\in\mathcal{A}^{\prime}\;\leavevmode\nobreak\ \leavevmode\nobreak\ \forall\;i,j\in\{1,2\},0<x_{1}<x_{2},0<y_{1}<y_{2},x_{1}\leq y_{1}\}.

By the same argument as in the proof of Lemma 3.2 (here we use an extra symmetry between xx and yy), we have that E×(𝒜)2|𝒜|2+8|X|E_{\times}(\mathcal{A}^{\prime})\leq 2|\mathcal{A}^{\prime}|^{2}+8|X|.

Fix x1x_{1}. As all elements in 𝒜\mathcal{A}^{\prime} are congruent to aa mod dd, they are all coprime to dd. Therefore xix_{i} and yjy_{j} are also coprime to dd. As a result, ax1y1x2y1x1y2(modd)a\equiv x_{1}y_{1}\equiv x_{2}y_{1}\equiv x_{1}y_{2}\pmod{d} implies that x2x_{2} is congruent to x1x_{1} mod dd and y1y_{1} and y2y_{2} are both congruent to ax11ax_{1}^{-1} mod dd. Since all elements in 𝒜\mathcal{A}^{\prime} are included in the interval [a,a+dL)[a,a+dL), one has x2x1=x2y1x1y1<1+dLa\frac{x_{2}}{x_{1}}=\frac{x_{2}y_{1}}{x_{1}y_{1}}<1+\frac{dL}{a}. Since x2>x1x_{2}>x_{1}, we know that x2(x1,x1+x1dL/a)x_{2}\in(x_{1},x_{1}+x_{1}dL/a). As x2x1(modd)x_{2}\equiv x_{1}\pmod{d}, the number of choices for x2x_{2} is at most x1La\lfloor\frac{x_{1}L}{a}\rfloor. In particular, if x1La<1\frac{x_{1}L}{a}<1 then there is no valid tuple in XX.

Similarly, as x1yi[a,a+dL)x_{1}y_{i}\in[a,a+dL), we know that yi[ax1,ax1+dLx1)y_{i}\in[\frac{a}{x_{1}},\frac{a}{x_{1}}+\frac{dL}{x_{1}}). As y1<y2y_{1}<y_{2} and they are congruent mod dd, we know that there is no valid choice for (y1,y2)(y_{1},y_{2}) if Lx1<1\frac{L}{x_{1}}<1. Otherwise if Lx11\frac{L}{x_{1}}\geq 1, then the number of choices is at most

12Lx1(Lx11)L2x12.\frac{1}{2}\left\lceil\frac{L}{x_{1}}\right\rceil\cdot\left(\left\lceil\frac{L}{x_{1}}\right\rceil-1\right)\leq\frac{L^{2}}{x_{1}^{2}}.

Therefore, for each x1x_{1}, the number of choices for (x2;y1,y2)(x_{2};y_{1},y_{2}) is at most L3ax1\frac{L^{3}}{ax_{1}}.

Let X1X_{1} be the set of such tuples with x1ea2L2x_{1}\leq\frac{ea^{2}}{L^{2}}. We first estimate |X1||X_{1}|. As we sum L3ax1\frac{L^{3}}{ax_{1}} over 1x1ea2L21\leq x_{1}\leq\frac{ea^{2}}{L^{2}}, we conclude that |X1|L3a(logea2L2+1)L2(2La+2LalogaL)4L2.|X_{1}|\leq\frac{L^{3}}{a}\left(\log\frac{ea^{2}}{L^{2}}+1\right)\leq L^{2}\left(2\frac{L}{a}+2\frac{L}{a}\log\frac{a}{L}\right)\leq 4L^{2}.

It remains to estimate the size of X2=XX1X_{2}=X\setminus X_{1}. It consists of tuples with x1>ea2L2x_{1}>\frac{ea^{2}}{L^{2}}. Note that x12x1y1<a+dLx_{1}^{2}\leq x_{1}y_{1}<a+dL, we know that ea2L2<x1<a+dL\frac{ea^{2}}{L^{2}}<x_{1}<\sqrt{a+dL}. For each fixed x1=x(ea2L2,a+dL)x_{1}=x\in(\frac{ea^{2}}{L^{2}},\sqrt{a+dL}), we consider tuples (x1,x2;y1,y2)X2(x_{1},x_{2};y_{1},y_{2})\in X_{2}. As argued above, we know that x2x_{2} is an element in (x1,x1dL/a)(x_{1},x_{1}dL/a) with x2x1(modd)x_{2}\equiv x_{1}\pmod{d}, and y1,y2y_{1},y_{2} are elements in [ax1,ax1+dLx1)[\frac{a}{x_{1}},\frac{a}{x_{1}}+\frac{dL}{x_{1}}) with y1y2ax11(modd)y_{1}\equiv y_{2}\equiv ax_{1}^{-1}\pmod{d}.

Next we use the fact that all elements in 𝒜\mathcal{A}^{\prime} having at most TT distinct prime factors and are square-free, which implies that ω(xi)+ω(yj)=ω(xiyj)T\omega(x_{i})+\omega(y_{j})=\omega(x_{i}y_{j})\leq T for i,j{1,2}i,j\in\{1,2\}. Therefore for each (x1,x2;y1,y2)X2(x_{1},x_{2};y_{1},y_{2})\in X_{2}, we have

(12)ω(x1y1)+ω(x1y2)+ω(x2y1)+ω(x2y2)4T1.\left(\frac{1}{\sqrt{2}}\right)^{\omega(x_{1}y_{1})+\omega(x_{1}y_{2})+\omega(x_{2}y_{1})+\omega(x_{2}y_{2})-4T}\geq 1.

Using the above bound and that ω(xiyj)=ω(xi)+ω(yj)\omega(x_{i}y_{j})=\omega(x_{i})+\omega(y_{j}) since xiyj𝒜x_{i}y_{j}\in\mathcal{A}^{\prime} is square-free, we have

|X2|(x1,x2;y1,y2)X22ω(x1)ω(x2)ω(y1)ω(y2)+2T22Tea2L2<x1<a+dL2ω(x1)x1<x2<x1a+dLax2x1(modd)2ω(x2)(ax1y<a+dLx1yax11(modd)2ω(y))2.\begin{split}|X_{2}|&\leq\sum_{(x_{1},x_{2};y_{1},y_{2})\in X_{2}}2^{-\omega(x_{1})-\omega(x_{2})-\omega(y_{1})-\omega(y_{2})+2T}\\ &\leq 2^{2T}\sum_{\frac{ea^{2}}{L^{2}}<x_{1}<\sqrt{a+dL}}2^{-\omega(x_{1})}\sum_{\begin{subarray}{c}x_{1}<x_{2}<x_{1}\frac{a+dL}{a}\\ x_{2}\equiv x_{1}\;(\mathrm{mod}\;d)\end{subarray}}2^{-\omega(x_{2})}\left(\sum_{\begin{subarray}{c}\frac{a}{x_{1}}\leq y<\frac{a+dL}{x_{1}}\\ y\equiv ax_{1}^{-1}\;(\mathrm{mod}\;d)\end{subarray}}2^{-\omega(y)}\right)^{2}.\end{split} (4.4)

We now apply Theorem 4.2 to two factors on the right hand side of (4.4). Note that for x=x1a+dLax^{\prime}=x_{1}\frac{a+dL}{a} and y=x1dL/ay^{\prime}=x_{1}dL/a, we have y>ea2L2dLa=eaLded2>d2y^{\prime}>\frac{ea^{2}}{L^{2}}\frac{dL}{a}=\frac{ea}{L}d\geq ed^{2}>d^{2} (where we use the assumption that adLa\geq dL) and x>yx^{\prime}>y^{\prime}. Finally we have (y)2=x12d2L2/a2>ex1d2ex1>x(y^{\prime})^{2}=x_{1}^{2}d^{2}L^{2}/a^{2}>ex_{1}d^{2}\geq ex_{1}>x^{\prime}. Therefore parameters (x,y,d,f)=(x,y,k,2ω())(x,y,d,f)=(x^{\prime},y^{\prime},k,2^{-\omega(\cdot)}) satisfy the conditions of Theorem 4.2 with α=β=1/2\alpha=\beta=1/2 and A1=A2=1A_{1}=A_{2}=1. By Theorem 4.3, we have px,pdf(p)/ploglogx+C2\sum_{p\leq x^{\prime},p\nmid d}f(p)/p\leq\frac{\log\log x^{\prime}+C}{2} for f(x)=2ω(x)f(x)=2^{-\omega(x)}, and further

x1<x2<x1a+dLax2x1(modd)2ω(x2)x1dLaϕ(d)(logx1(a+dL)a)12x1dLaϕ(d)(logx1)12.\sum_{\begin{subarray}{c}x_{1}<x_{2}<x_{1}\frac{a+dL}{a}\\ x_{2}\equiv x_{1}\;(\mathrm{mod}\;d)\end{subarray}}2^{-\omega(x_{2})}\ll\frac{x_{1}\frac{dL}{a}}{\phi(d)}\left(\log\frac{x_{1}(a+dL)}{a}\right)^{-\frac{1}{2}}\leq\frac{x_{1}dL}{a\phi(d)}(\log x_{1})^{-\frac{1}{2}}. (4.5)

Similarly as x1<a+dL2LlogL<L23x_{1}<\sqrt{a+dL}\leq\sqrt{2L\log L}<L^{\frac{2}{3}} for LL sufficiently large, one can verify that for (x,y)=(a+dLx1,dLx1)(x^{\prime},y^{\prime})=(\frac{a+dL}{x_{1}},\frac{dL}{x_{1}}), we have (y)2=d2L2x12>d2L23>d4(y^{\prime})^{2}=d^{2}\frac{L^{2}}{x_{1}^{2}}>d^{2}L^{\frac{2}{3}}>d^{4}, x>yx^{\prime}>y^{\prime}, and (y)2x=d2L2x1(a+dL)>d2L2(a+dL)3/2>1\frac{(y^{\prime})^{2}}{x^{\prime}}=\frac{d^{2}L^{2}}{x_{1}(a+dL)}>\frac{d^{2}L^{2}}{(a+dL)^{3/2}}>1. Hence by Theorem 4.2 with the same (α,β,A1,A2)(\alpha,\beta,A_{1},A_{2}) and by the same estimate due to Theorem 4.3, we have

ax1y<a+dLx1yax11(modd)2ω(y)dLx1ϕ(d)(loga+dLx1)122dLx1ϕ(d)(log(a+dL))12.\sum_{\begin{subarray}{c}\frac{a}{x_{1}}\leq y<\frac{a+dL}{x_{1}}\\ y\equiv ax_{1}^{-1}\;(\mathrm{mod}\;d)\end{subarray}}2^{-\omega(y)}\ll\frac{\frac{dL}{x_{1}}}{\phi(d)}\left(\log\frac{a+dL}{x_{1}}\right)^{-\frac{1}{2}}\leq\sqrt{2}\frac{dL}{x_{1}\phi(d)}(\log(a+dL))^{-\frac{1}{2}}. (4.6)

Here in the second inequality we use a+dLx1a+dL\frac{a+dL}{x_{1}}\geq\sqrt{a+dL}. Put (4.5) and (4.6) back to (4.4). We get

|X2|22Tea2L2<x1<a+dL2ω(x1)d3L3x1aϕ(d)3(logx1)12(log(a+dL))1=d3L3aϕ(d)322T(log(a+dL))1ea2L2<x1<a+dL2ω(x1)(logx1)12x1.\begin{split}|X_{2}|&\ll 2^{2T}\sum_{\frac{ea^{2}}{L^{2}}<x_{1}<\sqrt{a+dL}}2^{-\omega(x_{1})}\cdot\frac{d^{3}L^{3}}{x_{1}a\phi(d)^{3}}(\log x_{1})^{-\frac{1}{2}}(\log(a+dL))^{-1}\\ &=\frac{d^{3}L^{3}}{a\phi(d)^{3}}2^{2T}(\log(a+dL))^{-1}\sum_{\frac{ea^{2}}{L^{2}}<x_{1}<\sqrt{a+dL}}2^{-\omega(x_{1})}\frac{(\log x_{1})^{-\frac{1}{2}}}{x_{1}}.\end{split} (4.7)

Here the implicit constant is absolute. To estimate the second line of (4.7), we partition the interval dyadically over x1(ek,ek+1)x_{1}\in(e^{k},e^{k+1}) for 1logea2L2kloga+dL1\leq\lfloor\log\frac{ea^{2}}{L^{2}}\rfloor\leq k\leq\lfloor\log\sqrt{a+dL}\rfloor. In each interval we have

ek<x1<ek+12ω(x1)(logx1)12x1ek<x1<ek+12ω(x1)k12ekek(k+1)12k12ekk1,\sum_{e^{k}<x_{1}<e^{k+1}}2^{-\omega(x_{1})}\frac{(\log x_{1})^{-\frac{1}{2}}}{x_{1}}\leq\sum_{e^{k}<x_{1}<e^{k+1}}2^{-\omega(x_{1})}\frac{k^{-\frac{1}{2}}}{e^{k}}\ll e^{k}(k+1)^{-\frac{1}{2}}\frac{k^{-\frac{1}{2}}}{e^{k}}\ll k^{-1},

where in the second inequality we use Theorem 4.2 with (x,y,d,a)=(ek+1,ek+1ek,1,1)(x,y,d,a)=(e^{k+1},e^{k+1}-e^{k},1,1) and parameters α=β=1/2\alpha=\beta=1/2 and A1=A2=1A_{1}=A_{2}=1. Summing over all values of kk, we have for large enough LL,

ea2L2<x1<a+dL2ω(x1)(logx1)12x1k=logea2L2loga+dLk12loglog(a+dL).\sum_{\frac{ea^{2}}{L^{2}}<x_{1}<\sqrt{a+dL}}2^{-\omega(x_{1})}\frac{(\log x_{1})^{-\frac{1}{2}}}{x_{1}}\ll\sum_{k=\lfloor\log\frac{ea^{2}}{L^{2}}\rfloor}^{\lfloor\log\sqrt{a+dL}\rfloor}k^{-1}\leq 2\log\log(a+dL). (4.8)

Noting that T=loglog(a+dL)+(loglog(a+dL))23T=\log\log(a+dL)+(\log\log(a+dL))^{\frac{2}{3}} and by choosing L0L_{0} sufficiently large, we have

22T=(log(a+dL))(2log2)(1+(loglog(a+dL))13)<(log(a+dL))2log2+ϵ/3.2^{2T}=(\log(a+dL))^{(2\log 2)\left(1+(\log\log(a+dL))^{-\frac{1}{3}}\right)}<(\log(a+dL))^{2\log 2+\epsilon/3}. (4.9)

Put (4.8) and (4.9) back to (4.7). Noting that loglog(a+dL)<(log(a+dL))ϵ/3\log\log(a+dL)<(\log(a+dL))^{\epsilon/3}, dϕ(d)=O(loglogd)(log(a+dL))ϵ/9\frac{d}{\phi(d)}=O(\log\log d)\leq(\log(a+dL))^{\epsilon/9}, and a+dL2LlogLL2a+dL\leq 2L\log L\leq L^{2} as LL0L\geq L_{0} is sufficiently large, we conclude that

|X2|L3a(dϕ(d))3(log(a+dL))2log21+ϵ/3(log(a+dL))ϵ/3L3a(logL)2log21+ϵ.|X_{2}|\ll\frac{L^{3}}{a}\left(\frac{d}{\phi(d)}\right)^{3}(\log(a+dL))^{2\log 2-1+\epsilon/3}(\log(a+dL))^{\epsilon/3}\ll\frac{L^{3}}{a}(\log L)^{2\log 2-1+\epsilon}.

Here the implicit constant factor is absolute. It follows that

E×(𝒜)2|𝒜|2+8(|X1|+|X2|)L2(1+La(logL)2log21+ϵ).E_{\times}(\mathcal{A}^{\prime})\leq 2|\mathcal{A}^{\prime}|^{2}+8(|X_{1}|+|X_{2}|)\ll L^{2}\left(1+\frac{L}{a}(\log L)^{2\log 2-1+\epsilon}\right).

Recall that |𝒜||𝒜|/2|\mathcal{A}^{\prime}|\geq|\mathcal{A}|/2. We have the desired subset 𝒜\mathcal{A}^{\prime}. ∎

5 Elements with a fixed number of primes factors in arithmetic progressions

In this section, we make some preparations for the proof of Theorem 1.8 in Section 6. Let 𝒜\mathcal{A} be an arithmetic progression satisfying 0<dLaLlogL0<dL\leq a\leq L\sqrt{\log L}. For any positive integer nn, we denote the sequence of distinct prime factors of nn by p1(n)<p2(n)<<pω(n)(n)p_{1}(n)<p_{2}(n)<\cdots<p_{\omega(n)}(n). For parameters α,β\alpha,\beta\in\mathbb{R} and kk\in\mathbb{N}, we define

𝒩k(𝒜;α,β):={n𝒜:n square-free,ω(n)=k,loglogpj(n)αjβ 1jk}\mathcal{N}_{k}(\mathcal{A};\alpha,\beta):=\{n\in\mathcal{A}:n\textrm{ square-free},\;\omega(n)=k,\;\log\log p_{j}(n)\geq\alpha j-\beta\;\leavevmode\nobreak\ \leavevmode\nobreak\ \forall\;1\leq j\leq k\} (5.1)

and let Nk(𝒜;α,β)=|𝒩k(𝒜;α,β)|N_{k}(\mathcal{A};\alpha,\beta)=|\mathcal{N}_{k}(\mathcal{A};\alpha,\beta)|. Our goal is to give a lower bound on Nk(𝒜;α,β)N_{k}(\mathcal{A};\alpha,\beta). This is a natural generalization of [8, 11, 31] where the object studied is the special arithmetic progression 𝒜=[N]\mathcal{A}=[N]. Our proof uses ideas from [8, 31] and in Appendix A we give a stronger estimate following [14].

First we estimate the number of primes in an arithmetic progression using the following theorem of Siegel and Walfisz. For a finite 𝒜\mathcal{A}\subseteq\mathbb{N}, let π(𝒜)\pi(\mathcal{A}) be the number of primes in 𝒜\mathcal{A}.

Theorem 5.1 (Siegel-Walfisz, see Section II.8. in [32]).

Let π(x;q,a)\pi(x;q,a) be the number of primes up to xx with pa(modq)p\equiv a\pmod{q}. Then for any A>0A>0, uniformly for (a,q)=1(a,q)=1 and 1q(logx)A1\leq q\leq(\log x)^{A}, we have

π(x;q,a)=xϕ(q)logx+O(xϕ(q)(logx)2).\pi(x;q,a)=\frac{x}{\phi(q)\log x}+O\left(\frac{x}{\phi(q)(\log x)^{2}}\right). (5.2)
Lemma 5.2.

If 𝒜={a+id:0k<L}\mathcal{A}=\{a+id:0\leq k<L\} satisfies that LL is sufficiently large, and 0<dL<a<10LlogL0<dL<a<10L\sqrt{\log L} and gcd(a,d)=1\operatorname{gcd}(a,d)=1, then π(𝒜)dL2ϕ(d)logL\pi(\mathcal{A})\geq\frac{dL}{2\phi(d)\log L}.

Proof.

First note that we can write π(𝒜)=π(a+dL1;d,a)π(a1;d,a)\pi(\mathcal{A})=\pi(a+dL-1;d,a)-\pi(a-1;d,a). From our assumption we know that d<10logLd<10\sqrt{\log L} and a+dL>a>La+dL>a>L, so we may take A=1A=1 in Theorem 5.1 and L>e100L>e^{100} such that 1d<log(a1)<log(a+dL1)1\leq d<\log(a-1)<\log(a+dL-1). Hence we apply (5.2) to have

π(𝒜)=a+dL1ϕ(d)log(a+dL1)a1ϕ(d)log(a1)+O(a1ϕ(d)(log(a1))2+a+dL1ϕ(d)(log(a+dL1))2).\pi(\mathcal{A})=\frac{a+dL-1}{\phi(d)\log(a+dL-1)}-\frac{a-1}{\phi(d)\log(a-1)}+O\left(\frac{a-1}{\phi(d)(\log(a-1))^{2}}+\frac{a+dL-1}{\phi(d)(\log(a+dL-1))^{2}}\right).

To estimate the main term, noticing that a1a+dL12a2a-1\leq a+dL-1\leq 2a-2, we have

a+dL1ϕ(d)log(a+dL1)=a+dL1ϕ(d)log(a1)+O(a+dL1ϕ(d)(log(a1))2).\frac{a+dL-1}{\phi(d)\log(a+dL-1)}=\frac{a+dL-1}{\phi(d)\log(a-1)}+O\left(\frac{a+dL-1}{\phi(d)(\log(a-1))^{2}}\right).

This shows that

π(𝒜)=dLϕ(d)log(a1)+O(a1ϕ(q)(log(a1))2+a+dL1ϕ(d)(log(a+dL1))2+a+dL1ϕ(d)(log(a1))2).\pi(\mathcal{A})=\frac{dL}{\phi(d)\log(a-1)}+O\left(\frac{a-1}{\phi(q)(\log(a-1))^{2}}+\frac{a+dL-1}{\phi(d)(\log(a+dL-1))^{2}}+\frac{a+dL-1}{\phi(d)(\log(a-1))^{2}}\right).

Clearly a+dL1ϕ(d)(log(a1))2\frac{a+dL-1}{\phi(d)(\log(a-1))^{2}} is the largest among the three summands in the error term. As LdL<a10LlogLL\leq dL<a\leq 10L\sqrt{\log L}, we can bound the error terms by

a+dL1ϕ(d)(log(a1))220dLlogLϕ(d)(logL)2=O(dLϕ(d)(logL)32).\frac{a+dL-1}{\phi(d)(\log(a-1))^{2}}\leq\frac{20dL\sqrt{\log L}}{\phi(d)(\log L)^{2}}=O\left(\frac{dL}{\phi(d)(\log L)^{\frac{3}{2}}}\right).

As LL is sufficiently large, we may assume that the error term is at most 16dLϕ(d)logL\frac{1}{6}\frac{dL}{\phi(d)\log L}. Also note that a110LlogLL32a-1\leq 10L\sqrt{\log L}\leq L^{\frac{3}{2}} when LL is sufficiently large. This gives the desired estimate

π(𝒜)23dLϕ(d)logL16dLϕ(d)logL=dL2ϕ(d)logL.\pi(\mathcal{A})\geq\frac{2}{3}\frac{dL}{\phi(d)\log L}-\frac{1}{6}\frac{dL}{\phi(d)\log L}=\frac{dL}{2\phi(d)\log L}.

Using this, we can estimate Nk(𝒜;α,β)N_{k}(\mathcal{A};\alpha,\beta) with a specific choice of the parameters motivated by [8].

Proposition 5.3.

If 𝒜={a+id:0i<L}\mathcal{A}=\{a+id:0\leq i<L\} is an arithmetic progression satisfying that LL is sufficiently large, 0<dLaLlogL0<dL\leq a\leq L\sqrt{\log L}, and gcd(a,d)=1\operatorname{gcd}(a,d)=1, then for k=loglogLlog45loglogL4k=\left\lfloor\frac{\log\log L}{\log 4}-5\sqrt{\log\log L}\right\rfloor-4,

Nk(𝒜;log4,1)L(logL)θo(1).N_{k}(\mathcal{A};\log 4,1)\gg{L}{(\log L)^{-\theta-o(1)}}.
Proof.

To obtain a lower bound, we count elements n𝒩k(𝒜;log4,1)n\in\mathcal{N}_{k}(\mathcal{A};\log 4,1) with p1(n)p2(n)pk1(n)<ap_{1}(n)p_{2}(n)\cdots p_{k-1}(n)<\sqrt{a}. Once we fixed such a choice of (p1,p2,,pk1)(p_{1},p_{2},\dots,p_{k-1}) with loglogpjjlog41\log\log p_{j}\geq j\log 4-1 for all 1jk11\leq j\leq k-1, we count the number of pkp_{k} with pk>pk1p_{k}>p_{k-1} and p1p2pk𝒜p_{1}p_{2}\cdots p_{k}\in\mathcal{A}. Let q=p1p2pk1q=p_{1}p_{2}\cdots p_{k-1}. Clearly if qp𝒜qp\in\mathcal{A} for some prime pp, then paq>a>q>pk1p\geq\frac{a}{q}>\sqrt{a}>q>p_{k-1}. Moreover we have

loglogp>loglogL=loglogLlog2>klog41,\log\log p>\log\log\sqrt{L}=\log\log L-\log 2>k\log 4-1,

so any such choice of prime pp would give a valid choice for pkp_{k} as defined in (5.1).

For any such fixed choice of qq, if gcd(q,d)>1\operatorname{gcd}(q,d)>1, then any element in 𝒜\mathcal{A} is not a multiple of qq, so there is no such choice of pkp_{k}. If gcd(q,d)=1\operatorname{gcd}(q,d)=1, then q|a+idq|a+id is equivalent to iad1(modq)i\equiv-ad^{-1}\pmod{q} and there is an unique such i=i0i=i_{0} with 0i0<q0\leq i_{0}<q. Then the multiples of qq in 𝒜\mathcal{A} are of the form

{a+(i0+kq)d:0k<Li0q}{a+(i0+kq)d:0k<L/q},\left\{a+(i_{0}+k^{\prime}q)d:0\leq k^{\prime}<\frac{L-i_{0}}{q}\right\}\supseteq\{a+(i_{0}+k^{\prime}q)d:0\leq k^{\prime}<\lfloor L/q\rfloor\},

and the inclusion follows from Li0q>L/q1\frac{L-i_{0}}{q}>\lfloor L/q\rfloor-1. Hence any element p𝒜q:={a+kd:0k<L}p\in\mathcal{A}_{q}:=\{a^{\prime}+k^{\prime}d:0\leq k^{\prime}<L^{\prime}\} with a=a+i0dqa^{\prime}=\frac{a+i_{0}d}{q} and L=L/qL^{\prime}=\lfloor L/q\rfloor would satisfy pq𝒜pq\in\mathcal{A}. It is sufficient to estimate the number of primes from 𝒜q\mathcal{A}_{q}, which itself is an arithmetic progression. As gcd(a,d)gcd(a+i0d,d)=gcd(a,d)=1\operatorname{gcd}(a^{\prime},d)\leq\operatorname{gcd}(a+i_{0}d,d)=\operatorname{gcd}(a,d)=1, we have gcd(a,d)=1\operatorname{gcd}(a^{\prime},d)=1. Moreover, we know that dLdLq<aqadL^{\prime}\leq\frac{dL}{q}<\frac{a}{q}\leq a^{\prime} and a<10LlogLa^{\prime}<10L^{\prime}\sqrt{\log L^{\prime}} when LL is sufficiently large. Note that LLq1>L12(logL)141L^{\prime}\geq\frac{L}{q}-1>\frac{L^{\frac{1}{2}}}{(\log L)^{\frac{1}{4}}}-1 is also sufficiently large. By invoking Lemma 5.2, the number of choices for pkp_{k} is at least

π(𝒜q)dL2ϕ(d)logLdL/q4ϕ(d)logL.\pi(\mathcal{A}_{q})\geq\frac{dL^{\prime}}{2\phi(d)\log L^{\prime}}\geq\frac{dL/q}{4\phi(d)\log L}. (5.3)

Summing (5.3) up over all possible (p1,,pk1)(p_{1},\dots,p_{k-1}), we have

Nk(𝒜;log4,1)dL4ϕ(d)logLp1<<pk1,p1pk1<apjd,loglogpjjlog41 1jk11p1pk1.N_{k}(\mathcal{A};\log 4,1)\geq\frac{dL}{4\phi(d)\log L}\cdot\sum_{\begin{subarray}{c}p_{1}<\cdots<p_{k-1},p_{1}\cdots p_{k-1}<\sqrt{a}\\ p_{j}\nmid d,\log\log p_{j}\geq j\log 4-1\;\forall\;1\leq j\leq k-1\end{subarray}}\frac{1}{p_{1}\cdots p_{k-1}}. (5.4)

Since LaL2L\leq a\leq L^{2}, the condition in Lemma 5.4 is satisfied with our choice of kk, and it implies that

Nk(𝒜;log4,1)dL4ϕ(d)logL(elog4)k1(loga)o(1)L(logL)θ+o(1),\begin{split}N_{k}(\mathcal{A};\log 4,1)&\geq\frac{dL}{4\phi(d)\log L}(e\log 4)^{k-1}(\log\sqrt{a})^{-o(1)}\geq\frac{L}{(\log L)^{\theta+o(1)}},\end{split}

where we use that dϕ(d)1\frac{d}{\phi(d)}\geq 1, k=(1+o(1))loglogLlog4k=(1+o(1))\frac{\log\log L}{\log 4}, and the constant 14\frac{1}{4} is absorbed in the o(1)o(1) term. ∎

Lemma 5.4.

For any xx large enough, kloglogxlog45loglogxk\leq\frac{\log\log x}{\log 4}-5\sqrt{\log\log x}, d=O(logx)d=O(\log x), we have

p1<<pk,p1pk<xpjd,loglogpjjlog4 1jk1p1pk(elog4)k(logx)o(1).\sum_{\begin{subarray}{c}p_{1}<\cdots<p_{k},p_{1}\cdots p_{k}<x\\ p_{j}\nmid d,\log\log p_{j}\geq j\log 4\;\forall\;1\leq j\leq k\end{subarray}}\frac{1}{p_{1}\cdots p_{k}}\geq(e\log 4)^{k}(\log x)^{-o(1)}. (5.5)
Proof.

Let h=loglogxlog4h=\left\lfloor\frac{\sqrt{\log\log x}}{\log 4}\right\rfloor, and J=k/h=loglogxO(1)J=\lfloor k/h\rfloor=\sqrt{\log\log x}-O(1). For each j1j\geq 1, we define

Ij:=[eejloglogx,ee(j+1)loglogx).I_{j}:=\left[e^{e^{j\sqrt{\log\log x}}},e^{e^{(j+1)\sqrt{\log\log x}}}\right).

By Theorem 4.3, we know that pIj1/p=loglogx+O(1)\sum_{p\in I_{j}}1/p=\sqrt{\log\log x}+O(1). Therefore pPj1/p=loglogxO(logloglogx)\sum_{p\in P_{j}}1/p=\sqrt{\log\log x}-O(\log\log\log x) where PjP_{j} is the set of primes in IjI_{j} which are not factors of dd, and the O(logloglogx)O(\log\log\log x) term bounds the contribution from those primes which are factors of dd. In the summation in (5.5), we consider only the tuples (p1,,pk)(p_{1},\dots,p_{k}) with hh primes in PjP_{j} for each 1jJ1\leq j\leq J and (kJh)(k-Jh) primes in PJ+1P_{J+1}. Noting that kJhheheloglogxk-Jh\leq h\leq e^{h}\leq e^{\sqrt{\log\log x}}, a straightforward computation shows that

p1pkexp(j=1Jhe(j+1)loglogx+(kJh)e(J+2)loglogx)<ee(J+4)loglogx<xp_{1}\cdots p_{k}\leq\exp\left(\sum_{j=1}^{J}he^{(j+1)\sqrt{\log\log x}}+(k-Jh)e^{(J+2)\sqrt{\log\log x}}\right)<e^{e^{(J+4)\sqrt{\log\log x}}}<x

where in the last step we use that J+4<loglogxJ+4<\sqrt{\log\log x} for xx large. Hence, any such tuple would satisfy p1pk<xp_{1}\cdots p_{k}<x. Moreover, we know that loglogpjj/hloglogx>jlog4\log\log p_{j}\geq\lceil j/h\rceil\sqrt{\log\log x}>j\log 4. Therefore any such tuple contributes to a summand on the left hand side of (5.5). Then the left hand side of (5.5) is at least

j=1J(p1<<phPj1p1ph)(p1<<pkJhPJ+11p1pkJh).\prod_{j=1}^{J}\left(\sum_{p_{1}<\cdots<p_{h}\in P_{j}}\frac{1}{p_{1}\cdots p_{h}}\right)\cdot\left(\sum_{p_{1}<\cdots<p_{k-Jh}\in P_{J+1}}\frac{1}{p_{1}\cdots p_{k-Jh}}\right). (5.6)

Factors in (5.6) are of the same form. We estimate each of them as follows. For 1th1\leq t\leq h and j1j\geq 1,

p1<<ptPj1p1pt=1t!p1Pj1p1p2Pjp2p11p2ptPjptp1,,pt11pt1t!(pPj1pteejloglogx)t.\sum_{p_{1}<\cdots<p_{t}\in P_{j}}\frac{1}{p_{1}\cdots p_{t}}=\frac{1}{t!}\sum_{p_{1}\in P_{j}}\frac{1}{p_{1}}\sum_{\begin{subarray}{c}p_{2}\in P_{j}\\ p_{2}\neq p_{1}\end{subarray}}\frac{1}{p_{2}}\cdots\sum_{\begin{subarray}{c}p_{t}\in P_{j}\\ p_{t}\neq p_{1},\dots,p_{t-1}\end{subarray}}\frac{1}{p_{t}}\geq\frac{1}{t!}\left(\sum_{p\in P_{j}}\frac{1}{p}-\frac{t}{e^{e^{j\sqrt{\log\log x}}}}\right)^{t}. (5.7)

Note that thloglogx/log4eejloglogxt\leq h\leq\sqrt{\log\log x}/\log 4\leq e^{e^{j\sqrt{\log\log x}}}, so the term inside the parenthesis on the right hand side of (5.7) is loglogxO(logloglogx)\sqrt{\log\log x}-O(\log\log\log x). We shall apply Stirling’s formula and obtain that, when xx is sufficiently large, the right hand side of (5.7) is at least

RHS of (5.7)(loglogxO(logloglogx)1t/e)t(loglogx)1/4o(1)(elog4O(logloglogx/loglogx))t(loglogx)1/4o(1)(elog4)t(loglogx)C\begin{split}\textrm{RHS of }\eqref{eqn:sum-of-re-each}&\geq\left(\frac{\sqrt{\log\log x}-O(\log\log\log x)-1}{t/e}\right)^{t}\cdot(\log\log x)^{-1/4-o(1)}\\ &\geq\left(e\log 4-O\left(\log\log\log x/\sqrt{\log\log x}\right)\right)^{t}\cdot(\log\log x)^{-1/4-o(1)}\\ &\geq(e\log 4)^{t}\cdot(\log\log x)^{-C}\end{split}

for some absolute constant C>0C>0. Applying this bound to (5.7) with t=ht=h and t=kJht=k-Jh, we have

LHS of (5.5)(5.6)(elog4)Jh(loglogx)CJ(elog4)kJt(loglogx)C=(elog4)k(loglogx)O(loglogx)\begin{split}\textrm{LHS of }\eqref{eqn:sum-of-reciprocal}\geq\eqref{eqn:sum-of-re-1}&\geq(e\log 4)^{Jh}(\log\log x)^{-CJ}\cdot(e\log 4)^{k-Jt}(\log\log x)^{-C}\\ &=(e\log 4)^{k}\cdot(\log\log x)^{-O(\sqrt{\log\log x})}\end{split}

Thus, noting that the (loglogx)O(loglogx)=(logx)o(1)(\log\log x)^{-O(\sqrt{\log\log x})}=(\log x)^{-o(1)}, we finish the proof. ∎

6 Proof of Theorem 1.8

Following the argument in Section 3, we may only focus on the case where our arithmetic progression 𝒜={a+id:0i<L}\mathcal{A}=\{a+id:0\leq i<L\} satisfies 0<dL<a<LlogL0<dL<a<L\log L. Moreover by Theorem 4.1 with ϵ=322log2\epsilon=\frac{3}{2}-2\log 2, we may further assume that a<12LlogLa<\frac{1}{2}L\sqrt{\log L}. In particular, we have the following reduction.

Proposition 6.1.

If 𝒜={a+id:0i<L}\mathcal{A}=\{a+id:0\leq i<L\} is an arithmetic progression with length LL and common difference dd with a>0a>0, LL sufficiently large, and LlogLa+dLL\sqrt{\log L}\leq a+dL, then there exists a subset 𝒜𝒜\mathcal{A}^{\prime}\subseteq\mathcal{A} satisfying |𝒜||𝒜||\mathcal{A}^{\prime}|\gg|\mathcal{A}| and E×(𝒜)|𝒜|2E_{\times}(\mathcal{A}^{\prime})\ll|\mathcal{A}|^{2}.

Therefore we may reduce to the case where 𝒜={a+id:0i<L}\mathcal{A}=\{a+id:0\leq i<L\} with 2dLdL+aLlogL2dL\leq dL+a\leq L\sqrt{\log L}. Inspired by [14], we set A=𝒩k(𝒜;log4,β)A^{\prime}=\mathcal{N}_{k}(\mathcal{A};\log 4,\beta) where parameters are as in Proposition 5.3. This gives the desired lower bound on |𝒜||\mathcal{A}^{\prime}|. We show that E×(𝒩k(𝒜;log4,β))E_{\times}(\mathcal{N}_{k}(\mathcal{A};\log 4,\beta)) is small.

Proposition 6.2.

Let 𝒜={a+id:0i<L}\mathcal{A}=\{a+id:0\leq i<L\} be an arithmetic progression with LlogL>a>dL>0L\sqrt{\log L}>a>dL>0 and 𝒜=𝒩k(𝒜;log4,β)\mathcal{A}^{\prime}=\mathcal{N}_{k}(\mathcal{A};\log 4,\beta) be defined as in (5.1) with β\beta\in\mathbb{R}. Then for some absolute constant CC we have

E×(𝒜)CL2(logL)2+1log2loglogL(log4)2k22β.E_{\times}(\mathcal{A}^{\prime})\leq CL^{2}(\log L)^{-2+\frac{1}{\log 2}}\cdot\log\log L\cdot(\log 4)^{2k}2^{2\beta}.
Lemma 6.3.

For any finite 𝒜\mathcal{A}\subseteq\mathbb{R}, there exists 𝒜𝒜\mathcal{A}^{\prime}\subseteq\mathcal{A} with |𝒜||𝒜|32E×(𝒜)|\mathcal{A}^{\prime}|\geq\frac{|\mathcal{A}|^{3}}{2E_{\times}(\mathcal{A})} and E×(𝒜)4|𝒜|2E_{\times}(\mathcal{A}^{\prime})\leq 4|\mathcal{A}^{\prime}|^{2}.

Proof of Theorem 1.8 assuming Proposition 6.2 and Lemma 6.3.

We may assume that LL is sufficiently large and choose 𝒜′′=𝒩k(𝒜;log4,1)\mathcal{A}^{\prime\prime}=\mathcal{N}_{k}(\mathcal{A};\log 4,1) where k=loglogLlog45loglogL4=(1+o(1))loglogLlog4k=\left\lfloor\frac{\log\log L}{\log 4}-5\sqrt{\log\log L}\right\rfloor-4=(1+o(1))\frac{\log\log L}{\log 4}. By Proposition 5.3, we know that |𝒜′′|L(logL)θo(1).|\mathcal{A}^{\prime\prime}|\geq{L}{(\log L)^{-\theta-o(1)}}. By Proposition 6.2, noting that (log4)2k=(2log2)(1+o(1))loglogLlog2=(logL)1+loglog2log2+o(1)(\log 4)^{2k}=(2\log 2)^{(1+o(1))\frac{\log\log L}{\log 2}}=(\log L)^{1+\frac{\log\log 2}{\log 2}+o(1)}, we have

E×(𝒜′′)L2(logL)2+1log2+1+loglog2log2+o(1)=L2(logL)2θ+o(1).E_{\times}(\mathcal{A}^{\prime\prime})\leq L^{2}(\log L)^{-2+\frac{1}{\log 2}+1+\frac{\log\log 2}{\log 2}+o(1)}=L^{2}(\log L)^{-2\theta+o(1)}.

We apply Lemma 6.3 to find 𝒜𝒜′′\mathcal{A}^{\prime}\subseteq\mathcal{A}^{\prime\prime} of size at least |𝒜′′|32E×(𝒜′′)L(logL)θo(1)\frac{|\mathcal{A}^{\prime\prime}|^{3}}{2E_{\times}(\mathcal{A}^{\prime\prime})}\geq L(\log L)^{-\theta-o(1)}, as desired. ∎

The proof of Lemma 6.3 uses the probabilistic method. A similar application appears in [14].

Proof of Lemma 6.3.

Fix p=|𝒜|2E×(𝒜)(0,1)p=\frac{|\mathcal{A}|^{2}}{E_{\times}(\mathcal{A})}\in(0,1). We choose a random subset 𝒜𝒜\mathcal{A}^{\prime}\subseteq\mathcal{A} by keeping each element in 𝒜\mathcal{A} independently with probability pp. Then we have 𝔼|𝒜|2p2|𝒜|2\mathbb{E}|\mathcal{A}^{\prime}|^{2}\geq p^{2}|\mathcal{A}|^{2}. Note that for tuples (a1,a2;a3,a4)(𝒜)4(a_{1},a_{2};a_{3},a_{4})\in(\mathcal{A}^{\prime})^{4} with a1a2=a3a4a_{1}a_{2}=a_{3}a_{4}, there are at most 2|𝒜|22|\mathcal{A}^{\prime}|^{2} of them with {a1,a2,a3,a4}\{a_{1},a_{2},a_{3},a_{4}\} containing at most 22 elements. For all other ones, they come from a tuple (a1,a2;a3,a4)𝒜4(a_{1},a_{2};a_{3},a_{4})\in\mathcal{A}^{4} with at least 33 distinct elements, and they are included in (𝒜)4(\mathcal{A}^{\prime})^{4} with probability at most p3p^{3}. Therefore we have

𝔼E×(𝒜)𝔼2|𝒜|2+p3E×(𝒜)=2p2|𝒜|2+p3E×(𝒜)=3p2|𝒜|2.\mathbb{E}E_{\times}(\mathcal{A}^{\prime})\leq\mathbb{E}2|\mathcal{A}^{\prime}|^{2}+p^{3}E_{\times}(\mathcal{A})=2p^{2}|\mathcal{A}|^{2}+p^{3}E_{\times}(\mathcal{A})=3p^{2}|\mathcal{A}|^{2}.

This shows that there exists 𝒜\mathcal{A}^{\prime} such that 4|𝒜|2E×(𝒜)𝔼[4|𝒜|2E×(𝒜)]=p2|𝒜|24|\mathcal{A}^{\prime}|^{2}-E_{\times}(\mathcal{A}^{\prime})\geq\mathbb{E}[4|\mathcal{A}^{\prime}|^{2}-E_{\times}(\mathcal{A}^{\prime})]=p^{2}|\mathcal{A}|^{2}. In particular we know that |𝒜|2p2|𝒜|24=|𝒜|64(E×(𝒜))2|\mathcal{A}^{\prime}|^{2}\geq\frac{p^{2}|\mathcal{A}|^{2}}{4}=\frac{|\mathcal{A}|^{6}}{4(E_{\times}(\mathcal{A}))^{2}} and E×(𝒜)4|𝒜|2E_{\times}(\mathcal{A}^{\prime})\leq 4|\mathcal{A}^{\prime}|^{2}. ∎

The proof of Proposition 6.2 is similar to that of Theorem 4.1 but with two undetermined parameters that need to be optimized.

Proof of Proposition 6.2.

By letting the constant CC large, we may assume that LL is sufficiently large. Similar to the proof of Lemma 3.2, we upper bound the size of

X:={(x1,x2;y1,y2):xiyj𝒜i,j{1,2},0<x1<x2,0<y1<y2,x1y1}.X:=\{(x_{1},x_{2};y_{1},y_{2}):x_{i}y_{j}\in\mathcal{A}^{\prime}\;\leavevmode\nobreak\ \leavevmode\nobreak\ \forall\;i,j\in\{1,2\},0<x_{1}<x_{2},0<y_{1}<y_{2},x_{1}\leq y_{1}\}.

By the same argument as in Lemma 3.2 (here we use an extra symmetry between xx and yy), we have that E×(𝒜)2|𝒜|2+8|X|E_{\times}(\mathcal{A}^{\prime})\leq 2|\mathcal{A}^{\prime}|^{2}+8|X|. It remains to estimate |X||X|.

Fix x1x_{1}. As all elements in 𝒜\mathcal{A}^{\prime} are congruent to aa mod dd, they are all coprime to dd. Therefore xix_{i} and yjy_{j} are also coprime to dd. As a result, x1y1ax1y2(modd)x_{1}y_{1}\equiv a\equiv x_{1}y_{2}\pmod{d} implies that x1x_{1} and x2x_{2} are congruent mod dd. Moreover, y1y_{1} and y2y_{2} are both congruent to ax11ax_{1}^{-1} mod dd. Also note that elements in 𝒜\mathcal{A}^{\prime} are included in the interval [a,a+dL)[a,a+dL), we know that x2x1=x2y1x1y1<1+dLa\frac{x_{2}}{x_{1}}=\frac{x_{2}y_{1}}{x_{1}y_{1}}<1+\frac{dL}{a}. Hence also note that x2>x1x_{2}>x_{1}, we know that x2(x1,x1+x1dL/a)x_{2}\in(x_{1},x_{1}+x_{1}dL/a). Since x2x1(modd)x_{2}\equiv x_{1}\pmod{d}, there is no valid tuple in XX for x1La1\frac{x_{1}L}{a}\leq 1. Thus any valid tuple satisfies x1>a/Lx_{1}>a/L. Similarly, as x1yi[a,a+dL)x_{1}y_{i}\in[a,a+dL), we know that yi[ax1,ax1+dLx1)y_{i}\in[\frac{a}{x_{1}},\frac{a}{x_{1}}+\frac{dL}{x_{1}}), and y1,y2y_{1},y_{2} are congruent mod dd. By the extra symmetry, we have x1x1y1<a+dLx_{1}\leq\sqrt{x_{1}y_{1}}<\sqrt{a+dL}.

The goal is to apply Theorem 4.2 to bound the number of choices for xix_{i} and yjy_{j}. For each fixed x1=x(L/a,a+dL)x_{1}=x\in(L/a,\sqrt{a+dL}), we would like to compute the number of tuples (x1,x2;y1,y2)X(x_{1},x_{2};y_{1},y_{2})\in X. Let us denote the set of such tuples by X(x1)X(x_{1}).

By definition, any m𝒜m\in\mathcal{A}^{\prime} satisfies that ω(m)=k\omega(m)=k and ω(m,t)f(t):=loglogtlog4+β\omega(m,t)\leq f(t):=\frac{\log\log t}{\log 4}+\beta for all tt\in\mathbb{N}. Here ω(m,t):=#{p|m:pt}.\omega(m,t):=\#\{p|m:p\leq t\}. Note that we have xiyj𝒜x_{i}y_{j}\in\mathcal{A}^{\prime}. As 𝒜\mathcal{A}^{\prime} consists of square-free elements having kk distinct prime factors, we know that for any (i,j)(i,j), k=ω(xiyj)=ω(xi)+ω(yj)k=\omega(x_{i}y_{j})=\omega(x_{i})+\omega(y_{j}), and ω(xi,t)+ω(yj,t)=ω(xiyj,t)f(t)\omega(x_{i},t)+\omega(y_{j},t)=\omega(x_{i}y_{j},t)\leq f(t). Let λ,ξ(0,1)\lambda,\xi\in(0,1) to be determined. Then we have for each (x1,x2;y1,y2)X2(x_{1},x_{2};y_{1},y_{2})\in X_{2},

(λ2)ω(x1)(λ2)ω(x2)(λ2)ω(y1)(λ2)ω(y2)λ4k=λω(x1y1)+ω(x1y2)+ω(x2y1)+ω(x2y2)4k=1,(\lambda^{2})^{\omega(x_{1})}(\lambda^{2})^{\omega(x_{2})}(\lambda^{2})^{\omega(y_{1})}(\lambda^{2})^{\omega(y_{2})}\lambda^{-4k}=\lambda^{\omega(x_{1}y_{1})+\omega(x_{1}y_{2})+\omega(x_{2}y_{1})+\omega(x_{2}y_{2})-4k}=1,

and for any tt\in\mathbb{N},

(ξ2)ω(x1,t)(ξ2)ω(x2,t)(ξ2)ω(y1,t)(ξ2)ω(y2,t)ξ4f(t)=ξω(x1y1,t)+ω(x1y2,t)+ω(x2y1,t)+ω(x2y2,t)4f(t)1.(\xi^{2})^{\omega(x_{1},t)}(\xi^{2})^{\omega(x_{2},t)}(\xi^{2})^{\omega(y_{1},t)}(\xi^{2})^{\omega(y_{2},t)}\xi^{-4f(t)}=\xi^{\omega(x_{1}y_{1},t)+\omega(x_{1}y_{2},t)+\omega(x_{2}y_{1},t)+\omega(x_{2}y_{2},t)-4f(t)}\geq 1.

For each x1x_{1}, as x2<a+dLax1<2x1x_{2}<\frac{a+dL}{a}x_{1}<2x_{1}, if we choose t=2x1t=2x_{1}, then ω(xi,t)=ω(xi)\omega(x_{i},t)=\omega(x_{i}) for i=1,2i=1,2. Hence

|X(x1)|λ4k+2ω(x1)ξ4f(2x1)+2ω(x1)x1<x2<x1a+dLax2x1(modd)(λ2ξ2)ω(x2)(ax1y<a+dLx1yax11(modd)(λ2)ω(y)(ξ2)ω(y,2x1))2,\begin{split}|X(x_{1})|\leq\lambda^{-4k+2\omega(x_{1})}\xi^{-4f(2x_{1})+2\omega(x_{1})}\sum_{\begin{subarray}{c}x_{1}<x_{2}<x_{1}\frac{a+dL}{a}\\ x_{2}\equiv x_{1}\;(\mathrm{mod}\;d)\end{subarray}}\left(\lambda^{2}\xi^{2}\right)^{\omega(x_{2})}\left(\sum_{\begin{subarray}{c}\frac{a}{x_{1}}\leq y<\frac{a+dL}{x_{1}}\\ y\equiv ax_{1}^{-1}\;(\mathrm{mod}\;d)\end{subarray}}\left(\lambda^{2}\right)^{\omega(y)}\left(\xi^{2}\right)^{\omega(y,2x_{1})}\right)^{2},\end{split}

First we try to estimate the inner summation by Theorems 4.2 and 4.3. To see that Theorem 4.2 is applicable with α=β=1/2\alpha=\beta=1/2 and A1=A2=1A_{1}=A_{2}=1, note that we have x=a+dLx1x^{\prime}=\frac{a+dL}{x_{1}} and y=dLx1y^{\prime}=\frac{dL}{x_{1}}, and clearly y>xy^{\prime}>x^{\prime}. Since x1a+dL2LlogLx_{1}\leq\sqrt{a+dL}\leq\sqrt{2L\log L}, we have (y)2x=d2L2x1(a+dL)d2L2(2LlogL)3/2>1\frac{(y^{\prime})^{2}}{x^{\prime}}=\frac{d^{2}L^{2}}{x_{1}(a+dL)}\leq\frac{d^{2}L^{2}}{(2L\log L)^{3/2}}>1 for LL sufficiently large. Meanwhile we have yd2=dLd2x1dLd22LlogL>1\frac{y^{\prime}}{d^{2}}=\frac{dL}{d^{2}x_{1}}\leq\frac{dL}{d^{2}\sqrt{2L\log L}}>1 as d<logLd<\sqrt{\log L} when LL sufficiently large. Note that g(y)=(λ2)ω(y)(ξ2)ω(y,2x1)g(y)=\left(\lambda^{2}\right)^{\omega(y)}\left(\xi^{2}\right)^{\omega(y,2x_{1})} takes value g(p)=λ2g(p)=\lambda^{2} for 2x1<p<a+dLx12x_{1}<p<\frac{a+dL}{x_{1}} and g(p)=λ2ξ2g(p)=\lambda^{2}\xi^{2} for p<2x1p<2x_{1}. Therefore by Theorem 4.2,

ax1y<a+dLx1yax11(modd)(λ2)ω(y)(ξ2)ω(y,2x1)dLx1ϕ(d)1loga+dLx1exp(p2x1λ2ξ2λ2p+p<a+dLx1λ2p)dLx1ϕ(d)(log(a+dL))1+λ2(logx1)λ2ξ2λ2.\begin{split}\sum_{\begin{subarray}{c}\frac{a}{x_{1}}\leq y<\frac{a+dL}{x_{1}}\\ y\equiv ax_{1}^{-1}\;(\mathrm{mod}\;d)\end{subarray}}\left(\lambda^{2}\right)^{\omega(y)}\left(\xi^{2}\right)^{\omega(y,2x_{1})}&\ll\frac{\frac{dL}{x_{1}}}{\phi(d)}\frac{1}{\log\frac{a+dL}{x_{1}}}\exp\left(\sum_{p\leq 2x_{1}}\frac{\lambda^{2}\xi^{2}-\lambda^{2}}{p}+\sum_{p<\frac{a+dL}{x_{1}}}\frac{\lambda^{2}}{p}\right)\\ &\ll\frac{dL}{x_{1}\phi(d)}(\log(a+dL))^{-1+\lambda^{2}}(\log x_{1})^{\lambda^{2}\xi^{2}-\lambda^{2}}.\end{split}

Here in the second line we use Theorem 4.3 to get p<a+dLx1λ2pλ2logloga+dLx1+O(1)\sum_{p<\frac{a+dL}{x_{1}}}\frac{\lambda^{2}}{p}\leq\lambda^{2}\log\log\frac{a+dL}{x_{1}}+O(1), and simiarly p<2x1λ2ξ2λ2p(λ2ξ2λ2)loglog2x1+O(1)=(λ2ξ2λ2)loglogx1+O(1)\sum_{p<2x_{1}}\frac{\lambda^{2}\xi^{2}-\lambda^{2}}{p}\leq(\lambda^{2}\xi^{2}-\lambda^{2})\log\log 2x_{1}+O(1)=(\lambda^{2}\xi^{2}-\lambda^{2})\log\log x_{1}+O(1).

For the outer summation, we argue that Theorem 4.2 is applicable when x1>ea2L2x_{1}>\frac{ea^{2}}{L^{2}}. In this case we have x=x1a+dLa>y=x1dLax^{\prime}=x_{1}\frac{a+dL}{a}>y^{\prime}=x_{1}\frac{dL}{a}. Moreover (y)2x=x1d2L2(a+dL)a>x1d2L22a2>1\frac{(y^{\prime})^{2}}{x^{\prime}}=\frac{x_{1}d^{2}L^{2}}{(a+dL)a}>\frac{x_{1}d^{2}L^{2}}{2a^{2}}>1 when x1=2ea2L2x_{1}=\frac{2ea^{2}}{L^{2}}. At the same time, we have yd2=x1Lad>eadL>1\frac{y^{\prime}}{d^{2}}=\frac{x_{1}L}{ad}>\frac{ea}{dL}>1. Hence Theorem 4.2 is applicable with α=β=1/2\alpha=\beta=1/2, so

x1<x2<x1a+dLax2x1(modd)(λ2ξ2)ω(x2)x1dLaϕ(d)(logx1(a+dL)a)1+λ2ξ2x1dLaϕ(d)(logx1)1+λ2ξ2.\sum_{\begin{subarray}{c}x_{1}<x_{2}<x_{1}\frac{a+dL}{a}\\ x_{2}\equiv x_{1}\;(\mathrm{mod}\;d)\end{subarray}}\left(\lambda^{2}\xi^{2}\right)^{\omega(x_{2})}\ll\frac{x_{1}\frac{dL}{a}}{\phi(d)}\left(\log\frac{x_{1}(a+dL)}{a}\right)^{-1+\lambda^{2}\xi^{2}}\leq\frac{x_{1}dL}{a\phi(d)}(\log x_{1})^{-1+\lambda^{2}\xi^{2}}. (6.1)

Here in the second inequality we use Theorem 4.3 to get p<x1a+dLaλ2ξ2pλ2ξ2loglogx1(a+dL)a+O(1)\sum_{p<x_{1}\frac{a+dL}{a}}\frac{\lambda^{2}\xi^{2}}{p}\leq\lambda^{2}\xi^{2}\log\log\frac{x_{1}(a+dL)}{a}+O(1). Let X1X_{1} be the collection of tuples in XX with x1<ea2L2x_{1}<\frac{ea^{2}}{L^{2}}, and X2=XX1X_{2}=X\setminus X_{1}. Summing over x1>ea2/L2x_{1}>ea^{2}/L^{2}, noting that logL<log(a+dL)log(2L2)3logL\log L<\log(a+dL)\leq\log(2L^{2})\leq 3\log L and that ξ4f(2x1)=ξ4βξ4loglog2x1log4ξ4β(logx1)2log1/ξlog2\xi^{-4f(2x_{1})}=\xi^{-4\beta}\xi^{-4\frac{\log\log 2x_{1}}{\log 4}}\ll\xi^{-4\beta}(\log x_{1})^{\frac{2\log 1/\xi}{\log 2}}, we have

|X2|λ4kξ4βea2L2<x1<a+dLd3L3ax1ϕ(d)3(logx1)2log1/ξlog21+3λ2ξ22λ2(logL)2+2λ2(λ2ξ2)ω(x1).|X_{2}|\ll\lambda^{-4k}\xi^{-4\beta}\sum_{\frac{ea^{2}}{L^{2}}<x_{1}<\sqrt{a+dL}}\frac{d^{3}L^{3}}{ax_{1}\phi(d)^{3}}(\log x_{1})^{\frac{2\log 1/\xi}{\log 2}-1+3\lambda^{2}\xi^{2}-2\lambda^{2}}(\log L)^{-2+2\lambda^{2}}(\lambda^{2}\xi^{2})^{\omega(x_{1})}.

We partition integers x1x_{1} in the interval (ea2L2,a+dL)(\frac{ea^{2}}{L^{2}},\sqrt{a+dL}) into x1(et,et+1)x_{1}\in(e^{t},e^{t+1}) for logea2L2tloga+dL\lfloor\log\frac{ea^{2}}{L^{2}}\rfloor\leq t\leq\lfloor\log\sqrt{a+dL}\rfloor. Note that by our choice tlogea2L21t\geq\lfloor\log\frac{ea^{2}}{L^{2}}\rfloor\geq 1. In each interval we have

et<x1<et+1d3L3ax1ϕ(d)3(logx1)2log1/ξlog21+3λ2ξ2(logL)2+2λ22λ2ξ2(λ2ξ2)ω(x1)[et<x1<et+1]d3L3aetϕ(d)3t2log1/ξlog21+3λ2ξ22λ2(logL)2+2λ2et<x1<et+1(λ2ξ2)ω(x1)[by Theorem4.2]d3L3aetϕ(d)3t2log1/ξlog21+3λ2ξ22λ2(logL)2+2λ2et(e1)(t+1)1+λ2ξ2d3L3aϕ(d)3t2log1/ξlog22+4λ2ξ22λ2(logL)2+2λ2.\begin{split}&\sum_{e^{t}<x_{1}<e^{t+1}}\frac{d^{3}L^{3}}{ax_{1}\phi(d)^{3}}(\log x_{1})^{\frac{2\log 1/\xi}{\log 2}-1+3\lambda^{2}\xi^{2}}(\log L)^{-2+2\lambda^{2}-2\lambda^{2}\xi^{2}}(\lambda^{2}\xi^{2})^{\omega(x_{1})}\\ [e^{t}<x_{1}<e^{t+1}]\ll&\frac{d^{3}L^{3}}{ae^{t}\phi(d)^{3}}t^{\frac{2\log 1/\xi}{\log 2}-1+3\lambda^{2}\xi^{2}-2\lambda^{2}}(\log L)^{-2+2\lambda^{2}}\sum_{e^{t}<x_{1}<e^{t+1}}(\lambda^{2}\xi^{2})^{\omega(x_{1})}\\ [\mbox{by Theorem}\leavevmode\nobreak\ \ref{lem:Shiu}]\ll&\frac{d^{3}L^{3}}{ae^{t}\phi(d)^{3}}t^{\frac{2\log 1/\xi}{\log 2}-1+3\lambda^{2}\xi^{2}-2\lambda^{2}}(\log L)^{-2+2\lambda^{2}}\cdot e^{t}(e-1)(t+1)^{-1+\lambda^{2}\xi^{2}}\\ \ll&\frac{d^{3}L^{3}}{a\phi(d)^{3}}t^{\frac{2\log 1/\xi}{\log 2}-2+4\lambda^{2}\xi^{2}-2\lambda^{2}}(\log L)^{-2+2\lambda^{2}}.\end{split}

Optimizing the parameters by choosing λ=1log4\lambda=\frac{1}{\sqrt{\log 4}} and ξ=12\xi=\frac{1}{\sqrt{2}}, we have

|X2|λ4kξ4βt=logea2L2loga+dLd3L3aϕ(d)3t2log1/ξlog22+4λ2ξ22λ2(logL)2+2λ2=(log4)2k22βd3L3aϕ(d)3(logL)2+1log2t=logea2L2loga+dLt1L2(logL)2+1log2loglogL(log4)2k22β,\begin{split}|X_{2}|&\ll\lambda^{-4k}\xi^{-4\beta}\sum_{t=\lfloor\log\frac{ea^{2}}{L^{2}}\rfloor}^{\lfloor\log\sqrt{a+dL}\rfloor}\frac{d^{3}L^{3}}{a\phi(d)^{3}}t^{\frac{2\log 1/\xi}{\log 2}-2+4\lambda^{2}\xi^{2}-2\lambda^{2}}(\log L)^{-2+2\lambda^{2}}\\ &=(\log 4)^{2k}2^{2\beta}\frac{d^{3}L^{3}}{a\phi(d)^{3}}(\log L)^{-2+\frac{1}{\log 2}}\sum_{t=\lfloor\log\frac{ea^{2}}{L^{2}}\rfloor}^{\lfloor\log\sqrt{a+dL}\rfloor}t^{-1}\\ &\ll L^{2}(\log L)^{-2+\frac{1}{\log 2}}\cdot\log\log L\cdot(\log 4)^{2k}2^{2\beta},\end{split}

where in the last step we use that d3L3aϕ(d)3d2ϕ(d)3L2L2\frac{d^{3}L^{3}}{a\phi(d)^{3}}\leq\frac{d^{2}}{\phi(d)^{3}}L^{2}\ll L^{2}.

It remains to estimate |X1||X_{1}|. Note that (6.1) no longer holds, so we replace it by the trivial bound

x1<x2<x1a+dLax2x1(modd)(λ2ξ2)ω(x2)x1<x2<x1a+dLax2x1(modd)1x1La.\sum_{\begin{subarray}{c}x_{1}<x_{2}<x_{1}\frac{a+dL}{a}\\ x_{2}\equiv x_{1}\;(\mathrm{mod}\;d)\end{subarray}}\left(\lambda^{2}\xi^{2}\right)^{\omega(x_{2})}\leq\sum_{\begin{subarray}{c}x_{1}<x_{2}<x_{1}\frac{a+dL}{a}\\ x_{2}\equiv x_{1}\;(\mathrm{mod}\;d)\end{subarray}}1\leq\frac{x_{1}L}{a}.

Also by the choice of λ\lambda we have λ4k+2ω(x1)λ4k=(log4)2k\lambda^{-4k+2\omega(x_{1})}\leq\lambda^{-4k}=(\log 4)^{2k} and ξ4f(2x1)=ξ4βξ4loglog2x1log4ξ4β(logx1)2log1/ξlog2=22βlogx1\xi^{-4f(2x_{1})}=\xi^{-4\beta}\xi^{-4\frac{\log\log 2x_{1}}{\log 4}}\ll\xi^{-4\beta}(\log x_{1})^{2\frac{\log 1/\xi}{\log 2}}=2^{2\beta}\log x_{1}. As a consequence, we have (again noting that log(a+dL)=Θ(logL)\log(a+dL)=\Theta(\log L))

|X1|(log4)2k22βaL<x1<ea2L2d2L3aϕ(d)2x1(logL)2+2λ2(logx1)1+2λ2ξ22λ2=(log4)2k22βL2(logL)2+1log2d2ϕ(d)2LaaL<x1<ea2L2(logx1)1+1log4x1.\begin{split}|X_{1}|&\ll(\log 4)^{2k}2^{2\beta}\sum_{\frac{a}{L}<x_{1}<\frac{ea^{2}}{L^{2}}}\frac{d^{2}L^{3}}{a\phi(d)^{2}x_{1}}(\log L)^{-2+2\lambda^{2}}(\log x_{1})^{1+2\lambda^{2}\xi^{2}-2\lambda^{2}}\\ &=(\log 4)^{2k}2^{2\beta}L^{2}(\log L)^{-2+\frac{1}{\log 2}}\frac{d^{2}}{\phi(d)^{2}}\frac{L}{a}\sum_{\frac{a}{L}<x_{1}<\frac{ea^{2}}{L^{2}}}\frac{(\log x_{1})^{1+\frac{1}{\log 4}}}{x_{1}}.\end{split}

Note that we have

aL<x1<ea2L2(logx1)1+1log4x1(logea2L2)2+1log4a/L.\sum_{\frac{a}{L}<x_{1}<\frac{ea^{2}}{L^{2}}}\frac{(\log x_{1})^{1+\frac{1}{\log 4}}}{x_{1}}\ll\left(\log\frac{ea^{2}}{L^{2}}\right)^{2+\frac{1}{\log 4}}\ll\sqrt{a/L}.

Moreover we have d2ϕ(d)2Lad32ϕ(d)21\frac{d^{2}}{\phi(d)^{2}}\sqrt{\frac{L}{a}}\leq\frac{d^{\frac{3}{2}}}{\phi(d)^{2}}\ll 1. We conclude that |X1|L2(logL)2+1log2(log4)2k22β|X_{1}|\ll L^{2}(\log L)^{-2+\frac{1}{\log 2}}(\log 4)^{2k}2^{2\beta}. Combining with the previous estimate on |X2||X_{2}|, we have the desired upper bound |X|=|X1|+|X2|L2(logL)2+1log2loglogL(log4)2k22β|X|=|X_{1}|+|X_{2}|\ll L^{2}(\log L)^{-2+\frac{1}{\log 2}}\cdot\log\log L\cdot(\log 4)^{2k}2^{2\beta}. ∎

7 Concluding remarks

As we mentioned in the introduction, the best bound we can get in Theorem 1.1 is the following.

Theorem 7.1.

Let 𝒜\mathcal{A}\subseteq\mathbb{Z} be a finite arithmetic progression and θ=11+loglog4log4\theta=1-\frac{1+\log\log 4}{\log 4}. Then

|𝒜𝒜||𝒜|2(log|𝒜|)2θ(loglog|𝒜|)7+o(1).|\mathcal{A}\cdot\mathcal{A}|\geq\frac{{|\mathcal{A}|^{2}}}{{(\log|\mathcal{A}|)^{2\theta}(\log\log|\mathcal{A}|)^{7+o(1)}}}.

The proof of Theorem 7.1 is identical to the proof of Theorem 1.1 but we need a stronger estimate on the set 𝒩k(𝒜;log4,β)\mathcal{N}_{k}(\mathcal{A};\log 4,\beta) for kk with slightly larger size than the choice in Proposition 5.3. The stronger estimate requires ideas from the generalized Smirnov statistics [11] which is deferred to Appendix A.

We make the following conjecture on the sharp lower bound in Theorem 1.1.

Conjecture 7.2.

Let 𝒜\mathcal{A} be a finite arithmetic progression in integers. Then for |𝒜||\mathcal{A}| sufficiently large,

|𝒜𝒜||𝒜|2(log|𝒜|)2θ(loglog|𝒜|)3/2.|\mathcal{A}\cdot\mathcal{A}|\gg\frac{|\mathcal{A}|^{2}}{(\log|\mathcal{A}|)^{2\theta}(\log\log|\mathcal{A}|)^{3/2}}.

This conjecture, if true, would be tight up to a constant factor by considering 𝒜=[N]\mathcal{A}=[N]. We believe that this cannot be done by only estimating the multiplicative energy of a subset.

We also extend Conjecture 1.3 to the hh-fold product case.

Conjecture 7.3.

Let 𝒜\mathcal{A} be a set of integers and 𝒜h:={a1ah:ai𝒜, 1ih}\mathcal{A}^{h}:=\{a_{1}\cdots a_{h}:a_{i}\in\mathcal{A},\;\forall\;1\leq i\leq h\}. If |𝒜+𝒜||𝒜||\mathcal{A}+\mathcal{A}|\ll|\mathcal{A}|. Then

|𝒜h||𝒜|h(log|𝒜|)hlogh+h1o(1).|\mathcal{A}^{h}|\geq|\mathcal{A}|^{h}(\log|\mathcal{A}|)^{-h\log h+h-1-o(1)}.

One way to achieve this lower bound is by choosing 𝒜={1nN:ω(n)=(1+o(1))loglogN}\mathcal{A}=\{1\leq n\leq N:\omega(n)=(1+o(1))\log\log N\}.

Acknowledgments

We would like to thank Yifan Jing and Cosmin Pohoata for helpful discussions about Conjecture 1.3. We are indebted to Kevin Ford for his helpful comments and corrections to earlier versions of the paper. We are grateful to Kannan Soundararajan for discussions about Shiu’s Theorem [27] and work of Tenenbaum [31] and Ford [12]. We thank Fernando Xuancheng Shao for pointing out reference [6] after reading an earlier draft of the paper. We appreciate Huy Tuan Pham for his interest in and discussions on the problem. We thank Dimitris Koukoulopoulos for pointing out a mistake in the reference and Marc Munsch for pointing out a missing reference. We also thank the anonymous referee for valuable comments and feedback. Finally, we are indebted to Jacob Fox for encouraging us to write this paper and for carefully reading through earlier drafts of the paper and for useful suggestions.

References

  • [1] M. Campos, On the number of sets with a given doubling constant, Israel J. Math. 236 (2020), no. 2, 711–726. MR 4093901
  • [2] M. Campos, M. Coulson, O. Serra, and M. Wötzel, The typical approximate structure of sets with bounded sumset, 2021.
  • [3] M.-C. Chang, Product sets of arithmetic progressions, Unpublished manuscript.
  • [4] M.-C. Chang, The Erdős-Szeméredi problem on sum set and product set, Ann. of Math. (2) 157 (2003), no. 3, 939–957. MR 1983786
  • [5] R. de la Bretèche, M. Munsch, and G. Tenenbaum, Small Gál sums and applications, arXiv e-prints (2019), arXiv:1906.12203.
  • [6] S. Eberhard, B. Green, and F. Manners, Sets of integers with no large sum-free subset, Ann. of Math. (2) 180 (2014), no. 2, 621–652. MR 3224720
  • [7] Gy. Elekes and I. Z. Ruzsa, Few sums, many products, Studia Sci. Math. Hungar. 40 (2003), no. 3, 301–308. MR 2036961
  • [8] P. Erdős, An asymptotic inequality in the theory of numbers, Vestnik Leningrad. Univ. 15 (1960), no. 13, 41–49. MR 0126424
  • [9] P. Erdős and E. Szemerédi, On sums and products of integers, Studies in pure mathematics, Birkhäuser, Basel, 1983, pp. 213–218. MR 820223
  • [10] P. Erdős, Some remarks on number theory, Riveon Lematematika 9 (1955), 45–48. MR 73619
  • [11] K. Ford, Generalized Smirnov statistics and the distribution of prime factors, Funct. Approx. Comment. Math. 37 (2007), no. part 1, 119–129. MR 2357313
  • [12]  , The distribution of integers with a divisor in a given interval, Ann. of Math. (2) 168 (2008), no. 2, 367–433. MR 2434882
  • [13]  , Sharp probability estimates for generalized Smirnov statistics, Monatshefte fur Mathematik 153 (2008), no. 3, 205–216.
  • [14]  , Extremal properties of product sets, Proc. Steklov Inst. Math. 303 (2018), no. 1, 220–226, Published in Russian in Tr. Mat. Inst. Steklova 303 (2018), 239–245. MR 3920222
  • [15] G. A. Freĭman, Foundations of a structural theory of set addition, Translations of Mathematical Monographs, Vol 37, American Mathematical Society, Providence, R.I., 1973, Translated from the Russian. MR 0360496
  • [16] D. Koukoulopoulos, On the number of integers in a generalized multiplication table, J. Reine Angew. Math. 689 (2014), 33–99. MR 3187928
  • [17] V. F. Lev, Restricted set addition in groups. III. Integer sumsets with generic restrictions, Period. Math. Hungar. 42 (2001), no. 1-2, 89–98. MR 1832697
  • [18] D. Mastrostefano, On maximal product sets of random sets, J. Number Theory 224 (2021), 13–40. MR 4221525
  • [19] B. Murphy, M. Rudnev, I. Shkredov, and Y. Shteinikov, On the few products, many sums problem, J. Théor. Nombres Bordeaux 31 (2019), no. 3, 573–602. MR 4102615
  • [20] M. B. Nathanson and G. Tenenbaum, Inverse theorems and the number of sums and products, Astérisque 258 (1999), xiii, 195–204, Structure theory of set addition. MR 1701198
  • [21] C. Pomerance and A. Sárközy, On products of sequences of integers, Number theory, Vol. I (Budapest, 1987), Colloq. Math. Soc. János Bolyai, vol. 51, North-Holland, Amsterdam, 1990, pp. 447–463. MR 1058228
  • [22] M. Rudnev, G. Shakan, and I. D. Shkredov, Stronger sum-product inequalities for small sets, Proc. Amer. Math. Soc. 148 (2020), no. 4, 1467–1479. MR 4069186
  • [23] M. Rudnev and S. Stevens, An update on the sum-product problem, 2021.
  • [24] I. Z. Ruzsa, Generalized arithmetical progressions and sumsets, Acta Math. Hungar. 65 (1994), no. 4, 379–388. MR 1281447
  • [25] A. Selberg, Note on a paper by L. G. Sathe, J. Indian Math. Soc. (N.S.) 18 (1954), 83–87. MR 67143
  • [26] X. Shao and W. Xu, A robust version of Freiman’s 3k43k-4 theorem and applications, Math. Proc. Cambridge Philos. Soc. 166 (2019), no. 3, 567–581. MR 3933910
  • [27] P. Shiu, A Brun-Titchmarsh theorem for multiplicative functions, J. Reine Angew. Math. 313 (1980), 161–170. MR 552470
  • [28] J. Solymosi, Bounding multiplicative energy by the sumset, Adv. Math. 222 (2009), no. 2, 402–408. MR 2538014
  • [29] K. Soundararajan and M. W. Xu, Central limit theorems for random multiplicative functions, arXiv e-prints (2022), arXiv:2212.06098.
  • [30] T. Tao and V. Vu, Additive combinatorics, Cambridge Studies in Advanced Mathematics, vol. 105, Cambridge University Press, Cambridge, 2006. MR 2289012
  • [31] G. Tenenbaum, Sur la probabilité qu’un entier possède un diviseur dans un intervalle donné, Compositio Math. 51 (1984), no. 2, 243–263. MR 739737
  • [32]  , Introduction to analytic and probabilistic number theory, third ed., Graduate Studies in Mathematics, vol. 163, American Mathematical Society, Providence, RI, 2015, Translated from the 2008 French edition by Patrick D. F. Ion. MR 3363366

Appendix A Number of elements with a fixed number of primes factors in arithmetic progressions via generalized Smirnov statistics

In this section, we give a stronger estimate on size of 𝒩k(𝒜;α,β)\mathcal{N}_{k}(\mathcal{A};\alpha,\beta), than the bound in Proposition  5.3 (Section 5). This is the key ingredient in proving Theorem 7.1. Notably, our choice of kk here (see Proposition A.1) is slightly larger than the one we used in Proposition 5.3. The rest proof strategy of Theorem 7.1 is identical to the proof of Theorem 1.1. By applying Proposition 6.2 and Lemma 6.3 to give corresponding upper bounds on multiplicative energy, an application of Cauchy-Schwartz inequality gives lower bounds on product sets.

Recall that the set we are interested in Section 5 is the following:

𝒩k(𝒜;α,β):={n𝒜:n square-free,ω(n)=k,loglogpj(n)αjβ 1jk}.\mathcal{N}_{k}(\mathcal{A};\alpha,\beta):=\{n\in\mathcal{A}:n\textrm{ square-free},\;\omega(n)=k,\;\log\log p_{j}(n)\geq\alpha j-\beta\;\leavevmode\nobreak\ \leavevmode\nobreak\ \forall\;1\leq j\leq k\}. (A.1)

Let Nk(𝒜;α,β)=|𝒩k(𝒜;α,β)|N_{k}(\mathcal{A};\alpha,\beta)=|\mathcal{N}_{k}(\mathcal{A};\alpha,\beta)|. Our goal is to give a better lower bound on Nk(𝒜;α,β)N_{k}(\mathcal{A};\alpha,\beta), comparing to Proposition 5.3. The proof uses ideas in [11].

Proposition A.1.

There exists an absolute constant γ\gamma such that the following holds. If 𝒜={a+id:0i<L}\mathcal{A}=\{a+id:0\leq i<L\} is an arithmetic progression satisfying that LL is sufficiently large and 0<dLaLlogL0<dL\leq a\leq L\sqrt{\log L}, and gcd(a,d)=1\operatorname{gcd}(a,d)=1, then

Nk(𝒜;log4,γloglogloglogL)L(logL)θ(loglogL)3/2(logloglogL)C0N_{k}(\mathcal{A};\log 4,\gamma\log\log\log\log L)\gg\frac{L}{(\log L)^{\theta}}(\log\log L)^{-3/2}(\log\log\log L)^{-C_{0}}

for k=loglogLlog4k=\left\lfloor\frac{\log\log L}{\log 4}\right\rfloor and some fixed C0>0C_{0}>0.

Proof of Proposition A.1 assuming Lemma A.4.

Let β=γloglogloglogL\beta=\gamma\log\log\log\log L, and we may write
𝒩k=𝒩k(𝒜;log4,β)\mathcal{N}_{k}=\mathcal{N}_{k}(\mathcal{A};\log 4,\beta) as the inputs are clear from the context. To obtain a lower bound, we consider elements n𝒩kn\in\mathcal{N}_{k} with p1(n)p2(n)pk1(n)<ap_{1}(n)p_{2}(n)\cdots p_{k-1}(n)<\sqrt{a}.

Once we fix such a choice of (p1,p2,,pk1)(p_{1},p_{2},\dots,p_{k-1}) with p1pk1<ap_{1}\cdots p_{k-1}<\sqrt{a} and loglogpjjlog4β\log\log p_{j}\geq j\log 4-\beta for all 1jk11\leq j\leq k-1, we count the number of pkp_{k} with pk>pk1p_{k}>p_{k-1} and p1p2pk𝒜p_{1}p_{2}\cdots p_{k}\in\mathcal{A}. Let q=p1p2pk1q=p_{1}p_{2}\cdots p_{k-1}. Clearly if qp𝒜qp\in\mathcal{A} for some prime pp, then paq>a>q>pk1p\geq\frac{a}{q}>\sqrt{a}>q>p_{k-1}. Moreover we have loglogp>loglogL=loglogLlog2>αkβ\log\log p>\log\log\sqrt{L}=\log\log L-\log 2>\alpha k-\beta when β\beta is sufficiently large, so any such choice of prime pp would give a valid choice for pkp_{k} as defined in (A.1).

For any such fixed choice of qq, if gcd(q,d)>1\operatorname{gcd}(q,d)>1, then any element in 𝒜\mathcal{A} is not a multiple of qq, so there is no such choice of pkp_{k}.

If gcd(q,d)=1\operatorname{gcd}(q,d)=1, we know that q|a+idq|a+id is equivalent to iad1(modq)i\equiv-ad^{-1}\pmod{q}. There is a unique such i0i_{0} with 0i0<q0\leq i_{0}<q. Then the multiples of qq in 𝒜\mathcal{A} are of the form

{a+(i0+kq)d:0k<Li0q}{a+(i0+kq)d:0k<L/q}.\left\{a+(i_{0}+k^{\prime}q)d:0\leq k^{\prime}<\frac{L-i_{0}}{q}\right\}\supseteq\{a+(i_{0}+k^{\prime}q)d:0\leq k^{\prime}<\lfloor L/q\rfloor\}.

The inclusion is because Li0q>L/q1\frac{L-i_{0}}{q}>\lfloor L/q\rfloor-1. Hence any element p𝒜q:={a+kd:0k<L}p\in\mathcal{A}_{q}:=\{a^{\prime}+k^{\prime}d:0\leq k^{\prime}<L^{\prime}\} where a=a+i0dqa^{\prime}=\frac{a+i_{0}d}{q} and L=L/qL^{\prime}=\lfloor L/q\rfloor would satisfy pq𝒜pq\in\mathcal{A}. It is sufficient to estimate the number of primes from 𝒜q\mathcal{A}_{q}, which itself is an arithmetic progression. As gcd(a,d)gcd(a+i0d,d)=gcd(a,d)=1\operatorname{gcd}(a^{\prime},d)\leq\operatorname{gcd}(a+i_{0}d,d)=\operatorname{gcd}(a,d)=1, we have gcd(a,d)=1\operatorname{gcd}(a^{\prime},d)=1. Moreover, we know that dLdLq<aqadL^{\prime}\leq\frac{dL}{q}<\frac{a}{q}\leq a^{\prime} and a<10LlogLa^{\prime}<10L^{\prime}\sqrt{\log L^{\prime}} when LL is sufficiently large. We also have that LLq1>L12(logL)141L^{\prime}\geq\frac{L}{q}-1>\frac{L^{\frac{1}{2}}}{(\log L)^{\frac{1}{4}}}-1 is also sufficiently large. Hence by Lemma 5.2, the number of choices for pkp_{k} is at least

π(𝒜q)dL2ϕ(d)logLdL/q4ϕ(d)logL.\pi(\mathcal{A}_{q})\geq\frac{dL^{\prime}}{2\phi(d)\log L^{\prime}}\geq\frac{dL/q}{4\phi(d)\log L}. (A.2)

Hence by summing (A.2) up, we have

Nk(𝒜;log4,β)dL4ϕ(d)logLp1<<pk1,p1pk1<apjd,loglogpjjlog4β 1jk11p1pk1.N_{k}(\mathcal{A};\log 4,\beta)\geq\frac{dL}{4\phi(d)\log L}\cdot\sum_{\begin{subarray}{c}p_{1}<\cdots<p_{k-1},p_{1}\cdots p_{k-1}<\sqrt{a}\\ p_{j}\nmid d,\log\log p_{j}\geq j\log 4-\beta\;\forall\;1\leq j\leq k-1\end{subarray}}\frac{1}{p_{1}\cdots p_{k-1}}. (A.3)

To estimate the sum on the right hand side of (A.3), we use Lemma A.4. Let γ,C\gamma^{\prime},C^{\prime} be the constants in Lemma A.4, and let (x,k,d,β)=(a,k1,d,β)(x^{\prime},k^{\prime},d^{\prime},\beta^{\prime})=(\sqrt{a},k-1,d,\beta). We verify that the assumptions are met. First, as xaLx^{\prime}\geq\sqrt{a}\geq\sqrt{L}, we may assume that xx^{\prime} is sufficiently large by setting LL large enough. As LL is sufficiently large, we have L2aLL^{2}\geq a\geq L, so LxLL\geq x^{\prime}\geq\sqrt{L}. Hence loglogx=loglogL+O(1)\log\log x^{\prime}=\log\log L+O(1), and

k=k1=loglogLlog4+O(1)=loglogxlog4+O(1).k^{\prime}=k-1=\frac{\log\log L}{\log 4}+O(1)=\frac{\log\log x^{\prime}}{\log 4}+O(1).

Also we have d10logL10log(x)2=O(logx)d\leq 10\sqrt{\log L}\leq 10\sqrt{\log(x^{\prime})^{2}}=O(\log x^{\prime}). By choosing γ=γ\gamma=\gamma^{\prime}, we have β=β=γloglogloglogLγloglogloglogx\beta^{\prime}=\beta=\gamma\log\log\log\log L\geq\gamma\log\log\log\log x^{\prime}. Therefore we may apply Lemma A.4 with aa being polynomial in LL to get that there exists a constant C0>0C_{0}>0 such that, by recalling the definition of θ\theta,

Nk(𝒜;log4,β)dLϕ(d)logL(logL)1+loglog4log4(loglogL)3/2(logloglogL)C0L(logL)θ(loglogL)3/2(logloglogL)C0,\begin{split}N_{k}(\mathcal{A};\log 4,\beta)&\gg\frac{dL}{\phi(d)\log L}\cdot(\log L)^{\frac{1+\log\log 4}{\log 4}}(\log\log L)^{-3/2}(\log\log\log L)^{-C_{0}}\\ &\gg\frac{L}{(\log L)^{\theta}}(\log\log L)^{-3/2}(\log\log\log L)^{-C_{0}},\end{split}

where in the last inequality we use that dϕ(d)1\frac{d}{\phi(d)}\geq 1. ∎

Theorem A.2 (Theorem 1 in [13]).

Let {Ui}i=1n\{U_{i}\}_{i=1}^{n} be independent uniform random variables on [0,1][0,1] and 𝟏E\mathbf{1}_{E} be the indicator function for event EE. Uniformly for nn\in\mathbb{N} and u,w>0u,w>0, we have

Pr(i=1n𝟏Uit(n+wu)t+ut[0,1])=1e2uwn+O(u+wn).\Pr\left(\sum_{i=1}^{n}\mathbf{1}_{U_{i}\leq t}\leq(n+w-u)t+u\;\forall\;t\in[0,1]\right)=1-e^{-\frac{2uw}{n}}+O\left(\frac{u+w}{n}\right). (A.4)
Corollary A.3.

There exists a sufficiently large constant CC such that the following holds. Uniformly for βCα>0\beta\geq C\alpha>0 and CNαn+βαC\leq\frac{N-\alpha n+\beta}{\alpha}, we have

Vol{(x1,,xn):0x1xnN,xjαjβ 1jn}3Nnn!β(Nαn+β)nα2.\operatorname{Vol}\left\{(x_{1},\dots,x_{n}):\begin{matrix}0\leq x_{1}\leq\cdots\leq x_{n}\leq N,\\ x_{j}\geq\alpha j-\beta\;\forall\;1\leq j\leq n\end{matrix}\right\}\leq 3\frac{N^{n}}{n!}\frac{\beta(N-\alpha n+\beta)}{n\alpha^{2}}.

Moreover, if we further have that β(Nαn+β)nα21\frac{\beta(N-\alpha n+\beta)}{n\alpha^{2}}\leq 1, then

Vol{(x1,,xn):0x1xnN,xjαjβ 1jn}14Nnn!β(Nαn+β)nα2.\operatorname{Vol}\left\{(x_{1},\dots,x_{n}):\begin{matrix}0\leq x_{1}\leq\cdots\leq x_{n}\leq N,\\ x_{j}\geq\alpha j-\beta\;\forall\;1\leq j\leq n\end{matrix}\right\}\geq\frac{1}{4}\frac{N^{n}}{n!}\frac{\beta(N-\alpha n+\beta)}{n\alpha^{2}}.
Proof.

We may shrink the region by NN along each dimension. Now (N,α,β)=(1,αN,βN)(N,\alpha,\beta)=(1,\frac{\alpha}{N},\frac{\beta}{N}). As the volume of {xi=xj}\{x_{i}=x_{j}\} is zero, we may assume that they are all distinct without affecting the volume. Then we know that the volume of the desired region is 1n!\frac{1}{n!} of the volume of the following region:

R:={x1,,xn[0,1]n:the j-th smallest element is at least (αjβ)/N for all 1jn}.R^{\prime}:=\{x_{1},\dots,x_{n}\in[0,1]^{n}:\textrm{the $j$-th smallest element is at least $(\alpha j-\beta)/N$ for all $1\leq j\leq n$}\}.

As Vol{xj=(αjβ)/N}=0\operatorname{Vol}\{x_{j}=(\alpha j-\beta)/N\}=0 for each jj, we can rewrite RR^{\prime} as

R′′:={x1,,xn[0,1]n:the number of elements t is at most Nt+βα for all t[0,1]}R^{\prime\prime}:=\left\{x_{1},\dots,x_{n}\in[0,1]^{n}:\textrm{the number of elements $\leq t$ is at most $\frac{Nt+\beta}{\alpha}$ for all $t\in[0,1]$}\right\}

without affecting the volume, i.e. Vol(R′′)=Vol(R)\operatorname{Vol}(R^{\prime\prime})=\operatorname{Vol}(R^{\prime}). Compare it with Theorem A.2. Choose u=βαCu=\frac{\beta}{\alpha}\geq C and w=Nαn+βα[C,n/u]w=\frac{N-\alpha n+\beta}{\alpha}\in[C,n/u]. We denote the left hand side of (A.4) by Qn(u,w)Q_{n}(u,w). We have Vol(R)=Vol(R′′)=Qn(u,w)\operatorname{Vol}(R^{\prime})=\operatorname{Vol}(R^{\prime\prime})=Q_{n}(u,w). By Theorem A.3, there exists an absolute constant C0C_{0} such that

|Qn(u,w)(1e2uwn)|C0u+wn.\left|Q_{n}(u,w)-\left(1-e^{-\frac{2uw}{n}}\right)\right|\leq C_{0}\frac{u+w}{n}.

Pick C>8C0C>8C_{0}. Then C0u+wn14uwnC_{0}\frac{u+w}{n}\leq\frac{1}{4}\frac{uw}{n}. Since 1xex1-x\leq e^{-x}, we conclude that

Vol(R)=Qn(u,w)1e2uwn+C0u+wn2uwn+14uwn<3uwn=3β(Nαn+β)nα2.\operatorname{Vol}(R^{\prime})=Q_{n}(u,w)\leq 1-e^{-\frac{2uw}{n}}+C_{0}\frac{u+w}{n}\leq\frac{2uw}{n}+\frac{1}{4}\frac{uw}{n}<3\frac{uw}{n}=3\frac{\beta(N-\alpha n+\beta)}{n\alpha^{2}}.

With the extra condition that β(Nαn+β)nα2\frac{\beta(N-\alpha n+\beta)}{n\alpha^{2}}\leq, we have uwn1\frac{uw}{n}\leq 1. Also note that for x[0,2]x\in[0,2], ex1x4e^{-x}\leq 1-\frac{x}{4}. Therefore we conclude that, as 2uwn[0,2]\frac{2uw}{n}\in[0,2],

Vol(R)=Qn(u,w)uw2nC0u+wnuw4n=14β(Nαn+β)nα2.\operatorname{Vol}(R^{\prime})=Q_{n}(u,w)\geq\frac{uw}{2n}-C_{0}\frac{u+w}{n}\geq\frac{uw}{4n}=\frac{1}{4}\frac{\beta(N-\alpha n+\beta)}{n\alpha^{2}}.

Putting back the factor Nnn!\frac{N^{n}}{n!}, we get the desired statements. ∎

Lemma A.4.

There exist constants γ\gamma and C0>0C_{0}>0 so that for any xx large enough, k=loglogxlog4+O(1)k=\frac{\log\log x}{\log 4}+O(1), d=O(logx)d=O(\log x), and βγloglogloglogx\beta\geq\gamma\log\log\log\log x, we have

p1<<pk,p1pk<xpjd,loglogpjjlog4β 1jk1p1pk(logx)1+loglog4log4(loglogx)3/2(logloglogx)C0\sum_{\begin{subarray}{c}p_{1}<\cdots<p_{k},p_{1}\cdots p_{k}<x\\ p_{j}\nmid d,\log\log p_{j}\geq j\log 4-\beta\;\forall\;1\leq j\leq k\end{subarray}}\frac{1}{p_{1}\cdots p_{k}}\geq(\log x)^{\frac{1+\log\log 4}{\log 4}}(\log\log x)^{-3/2}(\log\log\log x)^{-C_{0}} (A.5)

First let us explain the intuition behind the right hand side of (A.5). If we replace the conditions by that pj<xp_{j}<x and pjdp_{j}\nmid d for all 1jk1\leq j\leq k, then the sum would be

(p<x,pd1p)k=(loglogxO(logloglogd))k=(loglogx)k(loglogd)O(1)=(loglogx)ko(1).\left(\sum_{p<x,p\nmid d}\frac{1}{p}\right)^{k}=(\log\log x-O(\log\log\log d))^{k}=(\log\log x)^{k}(\log\log d)^{-O(1)}=(\log\log x)^{k-o(1)}.

Here we repeat each term on the left hand side of (A.5) by exactly k!k! times. Thus we should expect a formula of the form (loglogx)ko(1)k!\frac{(\log\log x)^{k-o(1)}}{k!}. Now we aim to show that, with these extra conditions on the choices of (p1,,pk)(p_{1},\dots,p_{k}), the sum stays roughly the same.

Proof.

Motivated by [11], we partition primes in 𝒫={p:pd}\mathcal{P}=\{p:p\nmid d\} into consecutive parts. Fix λ0=1.9\lambda_{0}=1.9, and for j1j\geq 1 let λj\lambda_{j} to be the largest prime such that

pλj:pd1pj.\sum_{p\leq\lambda_{j}:p\nmid d}\frac{1}{p}\leq j.

Let 𝒫j={p:λj1<pλj,pd}\mathcal{P}_{j}=\{p:\lambda_{j-1}<p\leq\lambda_{j},p\nmid d\}. First we can estimate the size of λj\lambda_{j}. By maximality of λj\lambda_{j} we have

j1<j1λj<pλj:pd1ppλj1pj-1<j-\frac{1}{\lambda_{j}}<\sum_{p\leq\lambda_{j}:p\nmid d}\frac{1}{p}\leq\sum_{p\leq\lambda_{j}}\frac{1}{p} (A.6)

Meanwhile note that p|d1p=O(loglogω(d))=O(loglogloglogx)\sum_{p|d}\frac{1}{p}=O(\log\log\omega(d))=O(\log\log\log\log x), so we have

pλj1ppλj:pd1p+p|d1pj+O(loglogloglogx).\sum_{p\leq\lambda_{j}}\frac{1}{p}\leq\sum_{p\leq\lambda_{j}:p\nmid d}\frac{1}{p}+\sum_{p|d}\frac{1}{p}\leq j+O(\log\log\log\log x).

By Merten’s estimate (e.g. see Theorem 1.10 in [32]), we have

pλj1p=loglogλj+O(1).\sum_{p\leq\lambda_{j}}\frac{1}{p}=\log\log\lambda_{j}+O(1).

Thus we conclude that for some absolute constant C>0C>0,

jC<loglogλj<j+Cloglogloglogx.j-C<\log\log\lambda_{j}<j+C\log\log\log\log x.

Moreover, we know that by (A.6) we have

p𝒫j1p=pλj:pd1ppλj1:pd1pj1λj(j1)=11λj.\sum_{p\in\mathcal{P}_{j}}\frac{1}{p}=\sum_{p\leq\lambda_{j}:p\nmid d}\frac{1}{p}-\sum_{p\leq\lambda_{j-1}:p\nmid d}\frac{1}{p}\geq j-\frac{1}{\lambda_{j}}-(j-1)=1-\frac{1}{\lambda_{j}}. (A.7)

Fix positive integers t1t2tkt_{1}\leq t_{2}\leq\dots\leq t_{k}.

𝒫(t1,,tk):={(p1,,pk)𝒫k:p1<<pk,pj𝒫tj,1jk}.\mathcal{P}(t_{1},\dots,t_{k}):=\{(p_{1},\dots,p_{k})\in\mathcal{P}^{k}:p_{1}<\cdots<p_{k},\quad p_{j}\in\mathcal{P}_{t_{j}},\quad\forall 1\leq j\leq k\}.

We would like to reduce the summation in (A.5) over (p1,,pk)(p_{1},\dots,p_{k}) into summation over (t1,,tk)(t_{1},\dots,t_{k}). Thus we would like to impose some restrictions on (t1,,tk)(t_{1},\dots,t_{k}) so that the conditions p1pk<xp_{1}\cdots p_{k}<x, pjdp_{j}\nmid d, and loglogpjjlog4β\log\log p_{j}\geq j\log 4-\beta are all satisfied. Clearly pjdp_{j}\nmid d is satisfied for all jj. Note that we have

loglogpjloglogλtj1tj1C,\log\log p_{j}\geq\log\log\lambda_{t_{j}-1}\geq t_{j}-1-C,

so the condition loglogpjjlog4β\log\log p_{j}\geq j\log 4-\beta is satisfied if we require that tjjlog4(β1C)t_{j}\geq j\log 4-(\beta-1-C). Meanwhile, we also need p1pk<xp_{1}\cdots p_{k}<x. Note that we have

j=1klogpjj=1klogλtjj=1ketj+Cloglogloglogx.\sum_{j=1}^{k}\log p_{j}\leq\sum_{j=1}^{k}\log\lambda_{t_{j}}\leq\sum_{j=1}^{k}e^{t_{j}+C\log\log\log\log x}.

Hence if we define M=loglogxCloglogloglogxM=\log\log x-C\log\log\log\log x, then p1pk<xp_{1}\cdots p_{k}<x if

j=1ketjeM.\sum_{j=1}^{k}e^{t_{j}}\leq e^{M}. (A.8)

Now we impose the restriction that tjt_{j}’s are not concentrated at small values or large values: if many tjt_{j}’s are small, then it is hard to take distinct pjp_{j}’s from the same 𝒫t\mathcal{P}_{t}; if many of them are large then (A.8) is disobeyed. Let TT be a sufficiently large constant to be determined. We choose our tjt_{j} so that they only take values in [T+1,MT][T+1,M-T], and tmT+1T+m+1t_{mT+1}\geq T+m+1 and tkmTMTmt_{k-mT}\leq M-T-m for all positive integers m<kTm<\frac{k}{T}. Let btb_{t} be the number of 1jk1\leq j\leq k with tj=T+tt_{j}=T+t for 1tM2T1\leq t\leq M-2T. As t1tkt_{1}\leq\cdots\leq t_{k}, the numbers t1,,tkt_{1},\dots,t_{k} are uniquely determined by b1,,bM2Tb_{1},\dots,b_{M-2T}. Moreover, by our previous restriction, we know that btmin(Tt,T(M2Tt+1),k)b_{t}\leq\min(Tt,T(M-2T-t+1),k) for all 1tM2T1\leq t\leq M-2T. First we argue that by choosing TT sufficiently large, (A.8) is satisfied. Indeed,

j=1ketj=t=1M2TbteT+tt=1M2TT(M2Tt+1)eT+t=s=1M2TTseMT+1s<TeMT+1\sum_{j=1}^{k}e^{t_{j}}=\sum_{t=1}^{M-2T}b_{t}e^{T+t}\leq\sum_{t=1}^{M-2T}T(M-2T-t+1)e^{T+t}=\sum_{s=1}^{M-2T}Tse^{M-T+1-s}<Te^{M-T+1} (A.9)

which is less than eMe^{M} if we take T>2T>2. Now we have

(p1,,pk)𝒫(t1,,tk)1p1pk=t=1M2T1bt!(p1𝒫t+T1p1p2𝒫t+Tp2p11p2pbt𝒫t+Tpbtp1,,pbt11pbt)t=1M2T1bt!(p𝒫t+T1pbt1λt+T1)bt[by (A.7)]t=1M2T1bt!(1btλt+T1)bt[by bttT]t=1M2T1bt!(1tTexpexp(t+T1C))tT12t=1M2T1bt!\begin{split}\sum_{(p_{1},\dots,p_{k})\in\mathcal{P}(t_{1},\dots,t_{k})}\frac{1}{p_{1}\cdots p_{k}}&=\prod_{t=1}^{M-2T}\frac{1}{b_{t}!}\left(\sum_{p_{1}\in\mathcal{P}_{t+T}}\frac{1}{p_{1}}\sum_{\begin{subarray}{c}p_{2}\in\mathcal{P}_{t+T}\\ p_{2}\neq p_{1}\end{subarray}}\frac{1}{p_{2}}\cdots\sum_{\begin{subarray}{c}p_{b_{t}}\in\mathcal{P}_{t+T}\\ p_{b_{t}}\neq p_{1},\dots,p_{b_{t}-1}\end{subarray}}\frac{1}{p_{b_{t}}}\right)\\ &\geq\prod_{t=1}^{M-2T}\frac{1}{b_{t}!}\left(\sum_{p\in\mathcal{P}_{t+T}}\frac{1}{p}-\frac{b_{t}-1}{\lambda_{t+T-1}}\right)^{b_{t}}\\ [\mbox{by }\eqref{eqn:P_j-lower}]\quad&\geq\prod_{t=1}^{M-2T}\frac{1}{b_{t}!}\left(1-\frac{b_{t}}{\lambda_{t+T-1}}\right)^{b_{t}}\\ [\mbox{by }b_{t}\leq tT]\quad&\geq\prod_{t=1}^{M-2T}\frac{1}{b_{t}!}\left(1-\frac{tT}{\exp\exp(t+T-1-C)}\right)^{tT}\\ &\geq\frac{1}{2}\prod_{t=1}^{M-2T}\frac{1}{b_{t}!}\end{split} (A.10)

where the last inequality is derived by picking TT sufficiently large. Omitting the coefficient 1/21/2, (A.10) is the volume of the region

R(t1,,tk):={(x1,,xk)k:0x1x2xkM2T,tjT1xjtjT 1jk}.R(t_{1},\dots,t_{k}):=\left\{(x_{1},\dots,x_{k})\in\mathbb{R}^{k}:\begin{matrix}0\leq x_{1}\leq x_{2}\leq\cdots\leq x_{k}\leq M-2T,\\ t_{j}-T-1\leq x_{j}\leq t_{j}-T\;\forall\;1\leq j\leq k\end{matrix}\right\}.

Recall that tjt_{j} satisfies the following conditions: tjjlog4(β1C)t_{j}\geq j\log 4-(\beta-1-C) for any 1jk1\leq j\leq k, and tkmTT+m+1t_{k-mT}\geq T+m+1 and tkmTMTmt_{k-mT}\leq M-T-m for all positive integers m<kTm<\frac{k}{T}. Therefore we conclude that the union of R(t1,,tk)R(t_{1},\dots,t_{k}) over these choices of (t1,,tk)(t_{1},\dots,t_{k}) contains the following region

R:={(x1,,xk)k:0x1x2xkM2T,xjjlog4(β1C) 1jk,xmT+1T+m,xkmTMTm 1m<k/T}.R:=\left\{(x_{1},\dots,x_{k})\in\mathbb{R}^{k}:\begin{matrix}0\leq x_{1}\leq x_{2}\leq\cdots\leq x_{k}\leq M-2T,\\ x_{j}\geq j\log 4-(\beta-1-C)\;\forall\;1\leq j\leq k,\\ x_{mT+1}\geq T+m,x_{k-mT}\leq M-T-m\;\forall\;1\leq m<k/T\end{matrix}\right\}.

For simplicity let β=β1C\beta^{\prime}=\beta-1-C. Let us define

R0:={(x1,,xk)k:0x1x2xkM2T,xjjlog4β 1jk},R_{0}:=\left\{(x_{1},\dots,x_{k})\in\mathbb{R}^{k}:0\leq x_{1}\leq x_{2}\leq\cdots\leq x_{k}\leq M-2T,x_{j}\geq j\log 4-\beta^{\prime}\;\forall\;1\leq j\leq k\right\},

and for 1m<kT1\leq m<\frac{k}{T},

V1(m)=Vol(R0{xmT+1<m});V2(m)=Vol(R0{xkmT>M2Tm}).V_{1}(m)=\operatorname{Vol}(R_{0}\cap\{x_{mT+1}<m\});\quad V_{2}(m)=\operatorname{Vol}(R_{0}\cap\{x_{k-mT}>M-2T-m\}).

Then we have R=R0(1m<k/TV1(m)V2(m))R=R_{0}\setminus\left(\bigcup_{1\leq m<k/T}V_{1}(m)\cup V_{2}(m)\right), so

Vol(R)Vol(R0)1m<k/TV1(m)1m<k/TV2(m).\operatorname{Vol}(R)\geq\operatorname{Vol}(R_{0})-\sum_{1\leq m<k/T}V_{1}(m)-\sum_{1\leq m<k/T}V_{2}(m). (A.11)

First we estimate Vol(R0)\operatorname{Vol}(R_{0}). By Corollary A.3, we may set (α,β,N,n)=(log4,β,M2T,k)(\alpha,\beta,N,n)=(\log 4,\beta^{\prime},M-2T,k). When γ>C\gamma>C and xx is sufficiently large, we have β>log4>0\beta^{\prime}>\log 4>0 and

(M2T)klog4+β=(γC)loglogloglogx+O(1).(M-2T)-k\log 4+\beta^{\prime}=(\gamma-C)\log\log\log\log x+O(1). (A.12)

Hence the assumptions of Corollary A.3 are met, so

Vol(R0)(M2T)kk!(loglogloglogx)2k.\operatorname{Vol}(R_{0})\gg\frac{(M-2T)^{k}}{k!}\frac{(\log\log\log\log x)^{2}}{k}.

For each V1(m)V_{1}(m), we notice that the first mT+1mT+1 coordinates contribute volume at most mmT+1(mT+1)!\frac{m^{mT+1}}{(mT+1)!}, so

V1(m)mmT+1(mT+1)!Vol{0xmT+2xkM2T:xjαjβmT+2jk}.V_{1}(m)\leq\frac{m^{mT+1}}{(mT+1)!}\cdot\operatorname{Vol}\{0\leq x_{mT+2}\leq\cdots\leq x_{k}\leq M-2T:x_{j}\geq\alpha j-\beta^{\prime}\leavevmode\nobreak\ \leavevmode\nobreak\ \forall\leavevmode\nobreak\ mT+2\leq j\leq k\}.

By changing the variables, we shall rewrite as V1(m)mmT+1(mT+1)!S1(m)V_{1}(m)\leq\frac{m^{mT+1}}{(mT+1)!}S_{1}(m) where

S1(m):=Vol{0y1y2ykmT1M2T:yj(j+mT+1)log4β 1jkmT1}.S_{1}(m):=\operatorname{Vol}\left\{\begin{matrix}0\leq y_{1}\leq y_{2}\leq\cdots\leq y_{k-mT-1}\leq M-2T:\\ y_{j}\geq(j+mT+1)\log 4-\beta^{\prime}\leavevmode\nobreak\ \leavevmode\nobreak\ \forall\leavevmode\nobreak\ 1\leq j\leq k-mT-1\end{matrix}\right\}.

Similarly for V2(m)V_{2}(m) we have V2(m)mmT+1(mT+1)!S2(m)V_{2}(m)\leq\frac{m^{mT+1}}{(mT+1)!}S_{2}(m) where

S2(m):=Vol{0y1y2ykmT1M2T:yjjlog4β 1jkmT1}.S_{2}(m):=\operatorname{Vol}\left\{\begin{matrix}0\leq y_{1}\leq y_{2}\leq\cdots\leq y_{k-mT-1}\leq M-2T:y_{j}\geq j\log 4-\beta^{\prime}\leavevmode\nobreak\ \leavevmode\nobreak\ \forall\leavevmode\nobreak\ 1\leq j\leq k-mT-1\end{matrix}\right\}.

By comparing the definition of S1(m)S_{1}(m) and S2(m)S_{2}(m), we see that (j+mT+1)log4β>jlog4β(j+mT+1)\log 4-\beta^{\prime}>j\log 4-\beta^{\prime}, so the region of S1(m)S_{1}(m) is a subset of that of S2(m)S_{2}(m). Therefore S1(m)S2(m)S_{1}(m)\leq S_{2}(m). We next give an upper bound on S2(m)S_{2}(m). Since βγloglogloglogx\beta\geq\gamma\log\log\log\log x, the conditions in Corollary A.3 is satisfied once xx is sufficiently large (which is true as long as LL is large enough). By applying Corollary A.3 we have that

S2(m)(M2T)k1mT(k1mT)!β(M2Tlog4(k1mT)+β)(k1mT)α2.S_{2}(m)\leq\frac{(M-2T)^{k-1-mT}}{(k-1-mT)!}\cdot\frac{\beta^{\prime}(M-2T-\log 4(k-1-mT)+\beta^{\prime})}{(k-1-mT)\alpha^{2}}.

By using (A.12), the upper bound can be simplified as

(M2T)k1mT(k1mT)!β(γC)loglogloglogx+β(mT+O(1))k1mT.\leq\frac{(M-2T)^{k-1-mT}}{(k-1-mT)!}\cdot\frac{\beta^{\prime}(\gamma-C)\log\log\log\log x+\beta^{\prime}(mT+O(1))}{k-1-mT}.

We next use the binomial identity (kmT+1)=k!(mT+1)!(k1mT)!\binom{k}{mT+1}=\frac{k!}{(mT+1)!(k-1-mT)!} to get upper bound

(M2T)kk!(mT+1)!(M2T)mT+1(kmT+1)(loglogloglogx+mT)loglogloglogxk1mT,\frac{(M-2T)^{k}}{k!}\cdot\frac{(mT+1)!}{(M-2T)^{mT+1}}\cdot\binom{k}{mT+1}\cdot\frac{(\log\log\log\log x+mT)\log\log\log\log x}{k-1-mT},

and now we proceed by estimating binomial coefficients to get

S2(m)(M2T)kk!(mT+1)!(ke(M2T)mT)mT+1(loglogloglogx+mT)loglogloglogx.S_{2}(m)\leq\frac{(M-2T)^{k}}{k!}\cdot(mT+1)!\left(\frac{k}{e(M-2T)mT}\right)^{mT+1}\cdot(\log\log\log\log x+mT)\log\log\log\log x.

For i=1,2i=1,2, as Vi(m)mmT+1(mT+1)!Si(m)mmT+1(mT+1)!S2(m)V_{i}(m)\leq\frac{m^{mT+1}}{(mT+1)!}S_{i}(m)\leq\frac{m^{mT+1}}{(mT+1)!}S_{2}(m), we have

Vi(m)(M2T)kk!(ke(M2T)T)mT+1(loglogloglogx+mT)loglogloglogx(M2T)kk!1TmT+1(loglogloglogx)2(mT)2.\begin{split}V_{i}(m)&\leq\frac{(M-2T)^{k}}{k!}\cdot\left(\frac{k}{e(M-2T)T}\right)^{mT+1}\cdot(\log\log\log\log x+mT)\log\log\log\log x\\ &\leq\frac{(M-2T)^{k}}{k!}\cdot\frac{1}{T^{mT+1}}\cdot(\log\log\log\log x)^{2}(mT)^{2}.\end{split}

We next sum up all Vi(m)V_{i}(m) over m<k/Tm<k/T, and compare the sum with Vol(R0)\operatorname{Vol}(R_{0}). One has

1Vol(R0)1m<k/TVi(m)m1(mT)2TmT+112T\frac{1}{\operatorname{Vol}(R_{0})}\sum_{1\leq m<k/T}V_{i}(m)\leq\sum_{m\geq 1}\frac{(mT)^{2}}{T^{mT+1}}\leq\frac{1}{2^{T}}

for any T8T\geq 8. Thus, by choosing T=8T=8, we conclude that for i=1,2i=1,2,

mk/TVi(m)<14Vol(R0).\sum_{m\leq k/T}V_{i}(m)<\frac{1}{4}\operatorname{Vol}(R_{0}).

Plugging these two bounds in (A.11), we get

Vol(R)12Vol(R0)(M16)kk!(loglogloglogx)2k.\operatorname{Vol}(R)\geq\frac{1}{2}\operatorname{Vol}(R_{0})\gg\frac{(M-16)^{k}}{k!}\frac{(\log\log\log\log x)^{2}}{k}.

Recall that M=loglogxCloglogloglogxM=\log\log x-C\log\log\log\log x. By using Stirling formula, there exists some positive constants C1,C2>0C_{1},C_{2}>0 such that

Vol(R)(elog4loglogxC1loglogloglogxloglogx)loglogxlog4(loglogloglogx)2(loglogx)3/2(logx)1+loglog4log4(logloglogx)C2(loglogloglogx)2(loglogx)3/2,\begin{split}\operatorname{Vol}(R)&\gg\left(\frac{e\log 4\log\log x-C_{1}\log\log\log\log x}{\log\log x}\right)^{\frac{\log\log x}{\log 4}}\frac{(\log\log\log\log x)^{2}}{(\log\log x)^{3/2}}\\ &\gg(\log x)^{\frac{1+\log\log 4}{\log 4}}\cdot\frac{(\log\log\log x)^{C_{2}}(\log\log\log\log x)^{2}}{(\log\log x)^{3/2}},\end{split}

which completes the proof. ∎

{dajauthors}
{authorinfo}

[maxxu] Max Wenqiang Xu
Department of Mathematics,
Stanford University, Stanford, CA 94305, USA
maxxu\imageatstanford\imagedotedu

{authorinfo}

[yunkunzhou] Yunkun Zhou
Department of Mathematics,
Stanford University, Stanford, CA 94305, USA
yunkunzhou\imageatstanford\imagedotedu