On Lattices, Learning with Errors,
Random Linear Codes, and Cryptography
Abstract
Our main result is a reduction from worst-case lattice problems such as GapSVP and SIVP to a certain learning problem. This learning problem is a natural extension of the ‘learning from parity with error’ problem to higher moduli. It can also be viewed as the problem of decoding from a random linear code. This, we believe, gives a strong indication that these problems are hard. Our reduction, however, is quantum. Hence, an efficient solution to the learning problem implies a quantum algorithm for GapSVP and SIVP. A main open question is whether this reduction can be made classical (i.e., non-quantum).
We also present a (classical) public-key cryptosystem whose security is based on the hardness of the learning problem. By the main result, its security is also based on the worst-case quantum hardness of GapSVP and SIVP. The new cryptosystem is much more efficient than previous lattice-based cryptosystems: the public key is of size $\tilde O(n^2)$ and encrypting a message increases its size by a factor of $\tilde O(n)$ (in previous cryptosystems these values are $\tilde O(n^4)$ and $\tilde O(n^2)$, respectively). In fact, under the assumption that all parties share a random bit string of length $\tilde O(n^2)$, the size of the public key can be reduced to $\tilde O(n)$.
1 Introduction
Main theorem.
For an integer $n \ge 1$ and a real number $\varepsilon \ge 0$, consider the 'learning from parity with error' problem, defined as follows: the goal is to find an unknown $s \in \mathbb{Z}_2^n$ given a list of 'equations with errors'

$$\langle s, a_1 \rangle \approx_\varepsilon b_1 \pmod 2$$
$$\langle s, a_2 \rangle \approx_\varepsilon b_2 \pmod 2$$
$$\vdots$$

where the $a_i$'s are chosen independently from the uniform distribution on $\mathbb{Z}_2^n$, $\langle s, a_i \rangle = \sum_j s_j (a_i)_j$ is the inner product modulo $2$ of $s$ and $a_i$, and each equation is correct independently with probability $1-\varepsilon$. More precisely, the input to the problem consists of pairs $(a_i, b_i)$ where each $a_i$ is chosen independently and uniformly from $\mathbb{Z}_2^n$ and each $b_i$ is independently chosen to be equal to $\langle s, a_i\rangle$ with probability $1-\varepsilon$. The goal is to find $s$. Notice that the case $\varepsilon = 0$ can be solved efficiently by, say, Gaussian elimination. This requires $O(n)$ equations and $\mathrm{poly}(n)$ time.
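To make the noiseless case concrete, the following sketch (ours, not from the paper; all helper names are illustrative) generates error-free parity equations and recovers $s$ by Gaussian elimination over $\mathbb{Z}_2$:

```python
import random

def gen_parity_samples(s, m):
    """m noiseless equations: pairs (a, <a, s> mod 2) for a secret s in {0,1}^n."""
    n = len(s)
    out = []
    for _ in range(m):
        a = [random.randrange(2) for _ in range(n)]
        out.append((a, sum(x * y for x, y in zip(a, s)) % 2))
    return out

def solve_parity_noiseless(samples, n):
    """Gaussian elimination over GF(2); returns s, or None if the system is degenerate."""
    rows = [(a[:], b) for a, b in samples]
    for col in range(n):
        piv = next((i for i in range(col, len(rows)) if rows[i][0][col]), None)
        if piv is None:
            return None
        rows[col], rows[piv] = rows[piv], rows[col]
        for i in range(len(rows)):
            if i != col and rows[i][0][col]:
                rows[i] = ([x ^ y for x, y in zip(rows[i][0], rows[col][0])],
                           rows[i][1] ^ rows[col][1])
    return [rows[i][1] for i in range(n)]

random.seed(1)
secret = [random.randrange(2) for _ in range(20)]
assert solve_parity_noiseless(gen_parity_samples(secret, 60), 20) == secret
```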
The problem seems to become significantly harder when we take any positive $\varepsilon > 0$. For example, let us consider again the Gaussian elimination process and assume that we are interested in recovering only the first bit of $s$. Using Gaussian elimination, we can find a set $S$ of $O(n)$ equations such that $\sum_S a_i$ is $(1, 0, \ldots, 0)$. Summing the corresponding values $b_i$ gives us a guess for the first bit of $s$. However, a standard calculation shows that this guess is correct with probability $\frac12 + 2^{-\Theta(n)}$. Hence, in order to obtain the first bit with good confidence, we have to repeat the whole procedure $2^{\Theta(n)}$ times. This yields an algorithm that uses $2^{O(n)}$ equations and $2^{O(n)}$ time. In fact, it can be shown that given only $O(n)$ equations, the $s' \in \mathbb{Z}_2^n$ that maximizes the number of satisfied equations is with high probability $s$. This yields a simple maximum likelihood algorithm that requires only $O(n)$ equations and runs in time $2^{O(n)}$.
Blum, Kalai, and Wasserman [11] provided the first subexponential algorithm for this problem. Their algorithm requires only $2^{O(n/\log n)}$ equations/time and is currently the best known algorithm for the problem. It is based on a clever idea that allows to find a small set $S$ of equations (say, $O(\sqrt n)$ of them) among $2^{O(n/\log n)}$ equations, such that $\sum_S a_i$ is, say, $(1, 0, \ldots, 0)$. This gives us a guess for the first bit of $s$ that is correct with probability $\frac12 + 2^{-\Theta(\sqrt n)}$. We can obtain the correct value with high probability by repeating the whole procedure only $2^{O(\sqrt n)}$ times. Their idea was later shown to have other important applications, such as the first $2^{O(n)}$-time algorithm for solving the shortest vector problem [23, 5].
An important open question is to explain the apparent difficulty in finding efficient algorithms for this learning problem. Our main theorem explains this difficulty for a natural extension of this problem to higher moduli, defined next.
Let $p = p(n) \le \mathrm{poly}(n)$ be some prime integer and consider a list of 'equations with error'

$$\langle s, a_1 \rangle \approx_\chi b_1 \pmod p$$
$$\langle s, a_2 \rangle \approx_\chi b_2 \pmod p$$
$$\vdots$$

where this time $s \in \mathbb{Z}_p^n$, the $a_i$'s are chosen independently and uniformly from $\mathbb{Z}_p^n$, and $b_i \in \mathbb{Z}_p$. The error in the equations is now specified by a probability distribution $\chi$ on $\mathbb{Z}_p$. Namely, for each equation $i$, $b_i = \langle s, a_i\rangle + e_i$ where each $e_i \in \mathbb{Z}_p$ is chosen independently according to $\chi$. We denote the problem of recovering $s$ from such equations by $\mathrm{LWE}_{p,\chi}$ (learning with error). For example, the learning from parity problem with error $\varepsilon$ is the special case where $p = 2$, $\chi(0) = 1-\varepsilon$, and $\chi(1) = \varepsilon$. Under a reasonable assumption on $\chi$ (namely, that $\chi(0) > 1/p + 1/\mathrm{poly}(n)$), the maximum likelihood algorithm described above solves $\mathrm{LWE}_{p,\chi}$ using $\mathrm{poly}(n)$ equations and $2^{O(n\log n)}$ time. Under a similar assumption, an algorithm resembling the one by Blum et al. [11] requires only $2^{O(n)}$ equations/time. This is the best known algorithm for the LWE problem.
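For concreteness, here is a small generator of LWE instances (our illustrative sketch, not the paper's code; the error sampler is one natural reading of the discretized Gaussian error distribution, whose exact definition $\bar\Psi_\alpha$ appears in Section 2):

```python
import math, random

def sample_error(p, alpha):
    """Rounded-Gaussian error on Z_p: draw from a normal of standard deviation
    alpha/sqrt(2*pi), reduce mod 1, scale by p, and round."""
    x = random.gauss(0, alpha / math.sqrt(2 * math.pi)) % 1.0
    return round(p * x) % p

def lwe_samples(s, p, alpha, m):
    """m pairs (a_i, b_i) with b_i = <a_i, s> + e_i (mod p), a_i uniform over Z_p^n."""
    n = len(s)
    out = []
    for _ in range(m):
        a = [random.randrange(p) for _ in range(n)]
        b = (sum(x * y for x, y in zip(a, s)) + sample_error(p, alpha)) % p
        out.append((a, b))
    return out
```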
Our main theorem shows that for certain choices of $p$ and $\chi$, a solution to $\mathrm{LWE}_{p,\chi}$ implies a quantum solution to worst-case lattice problems.
Theorem 1.1 (Informal)
Let $n, p$ be integers and $\alpha \in (0,1)$ be such that $\alpha p > 2\sqrt n$. If there exists an efficient algorithm that solves $\mathrm{LWE}_{p,\bar\Psi_\alpha}$ then there exists an efficient quantum algorithm that approximates the decision version of the shortest vector problem (GapSVP) and the shortest independent vectors problem (SIVP) to within $\tilde O(n/\alpha)$ in the worst case.
The exact definition of $\bar\Psi_\alpha$ will be given later. For now, it is enough to know that it is a distribution on $\mathbb{Z}_p$ that has the shape of a discrete Gaussian centered around $0$ with standard deviation $\alpha p$, as in Figure 1. Also, the probability of $0$ (i.e., no error) is roughly $1/(\alpha p)$. A possible setting for the parameters is $p = O(n^2)$ and $\alpha = 1/(\sqrt n \log^2 n)$ (in fact, these are the parameters that we use in our cryptographic application).
GapSVP and SIVP are two of the main computational problems on lattices. In GapSVP, for instance, the input is a lattice, and the goal is to approximate the length of the shortest nonzero lattice vector. The best known polynomial time algorithms for them yield only mildly subexponential approximation factors [24, 38, 5]. It is conjectured that there is no classical (i.e., non-quantum) polynomial time algorithm that approximates them to within any polynomial factor. Lattice-based constructions of one-way functions, such as the one by Ajtai [2], are based on this conjecture.
One might even conjecture that there is no quantum polynomial time algorithm that approximates GapSVP (or SIVP) to within any polynomial factor. One can then interpret the main theorem as saying that based on this conjecture, the LWE problem is hard. The only evidence supporting this conjecture is that there are no known quantum algorithms for lattice problems that outperform classical algorithms, even though this is probably one of the most important open questions in the field of quantum computing. (If forced to make a guess, the author would say that the conjecture is true.)
In fact, one could also interpret our main theorem as a way to disprove this conjecture: if one finds an efficient algorithm for LWE, then one also obtains a quantum algorithm for approximating worst-case lattice problems. Such a result would be of tremendous importance on its own. Finally, we note that it is possible that our main theorem will one day be made classical. This would make all our results stronger and the above discussion unnecessary.
The LWE problem can be equivalently presented as the problem of decoding random linear codes. More specifically, let $p = p(n)$ be arbitrary and let $s \in \mathbb{Z}_p^n$ be some vector. Then, consider the following problem: given a random matrix $Q \in \mathbb{Z}_p^{m \times n}$ and the vector $t = Qs + e \in \mathbb{Z}_p^m$ where each coordinate of the error vector $e \in \mathbb{Z}_p^m$ is chosen independently from $\bar\Psi_\alpha$, recover $s$. The Hamming weight of $e$ is roughly $m(1 - 1/(\alpha p))$ (since a value chosen from $\bar\Psi_\alpha$ is $0$ with probability roughly $1/(\alpha p)$). Hence, the Hamming distance of $t$ from $Qs$ is roughly $m(1 - 1/(\alpha p))$. Moreover, it can be seen that for large enough $m$, for any other word $s'$, the Hamming distance of $t$ from $Qs'$ is roughly $m(1-1/p)$. Hence, we obtain that approximating the nearest codeword problem to within factors smaller than $(1-1/p)/(1-1/(\alpha p))$ on random codes is as hard as quantumly approximating worst-case lattice problems. This gives a partial answer to the important open question of understanding the hardness of decoding from random linear codes.
It turns out that certain problems, which are seemingly easier than the LWE problem, are in fact equivalent to the LWE problem. We establish these equivalences in Section 4 using elementary reductions. For example, being able to distinguish a set of equations as above from a set of equations in which the $b_i$'s are chosen uniformly from $\mathbb{Z}_p$ is equivalent to solving LWE. Moreover, it is enough to correctly distinguish these two distributions for some non-negligible fraction of all $s$. The latter formulation is the one we use in our cryptographic applications.
Cryptosystem.
In Section 5 we present a public key cryptosystem and prove that it is secure based on the hardness of the LWE problem. We use the standard security notion of semantic, or IND-CPA, security (see, e.g., [20, Chapter 10]). The cryptosystem and its security proof are entirely classical. In fact, the cryptosystem itself is quite simple; the reader is encouraged to glance at the beginning of Section 5. Essentially, the idea is to provide a list of equations as above as the public key; encryption is performed by summing some of the equations (forming another equation with error) and modifying the right hand side depending on the message to be transmitted. Security follows from the fact that a list of equations with error is computationally indistinguishable from a list of equations in which the $b_i$'s are chosen uniformly.
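To make the scheme concrete, here is a toy implementation of the cryptosystem as just described (our sketch; the parameter values are illustrative and far too small to offer any security):

```python
import random

def keygen(n, m, p, alpha):
    """Secret s in Z_p^n; public key: m samples (a_i, <a_i, s> + e_i mod p)."""
    s = [random.randrange(p) for _ in range(n)]
    pub = []
    for _ in range(m):
        a = [random.randrange(p) for _ in range(n)]
        e = round(random.gauss(0, alpha * p)) % p      # toy rounded-Gaussian error
        pub.append((a, (sum(x * y for x, y in zip(a, s)) + e) % p))
    return s, pub

def encrypt(pub, p, bit):
    """Sum a random subset of the public equations; shift by p/2 to encode a 1."""
    n = len(pub[0][0])
    subset = [pair for pair in pub if random.randrange(2)]
    a = [sum(pair[0][j] for pair in subset) % p for j in range(n)]
    b = (bit * (p // 2) + sum(pair[1] for pair in subset)) % p
    return (a, b)

def decrypt(s, p, ct):
    """Decide whether b - <a, s> is closer to 0 or to p/2 modulo p."""
    a, b = ct
    d = (b - sum(x * y for x, y in zip(a, s))) % p
    return 0 if min(d, p - d) < abs(d - p // 2) else 1

random.seed(0)
n, m, p, alpha = 16, 200, 2003, 0.005   # toy parameters, not secure
s, pub = keygen(n, m, p, alpha)
assert all(decrypt(s, p, encrypt(pub, p, bit)) == bit for bit in (0, 1))
```

With these toy numbers the accumulated error in a ciphertext has standard deviation on the order of a few dozen, far below the $p/4$ threshold separating the two plaintexts, so decryption succeeds with overwhelming probability.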
By using our main theorem, we obtain that the security of the system is based also on the worst-case quantum hardness of approximating SIVP and GapSVP to within $\tilde O(n^{1.5})$. In other words, breaking our cryptosystem implies an efficient quantum algorithm for approximating SIVP and GapSVP to within $\tilde O(n^{1.5})$. Previous cryptosystems, such as the Ajtai-Dwork cryptosystem [4] and the one by Regev [36], are based on the worst-case (classical) hardness of the unique-SVP problem, which can be related to GapSVP (but not SIVP) through the recent result of Lyubashevsky and Micciancio [26].
Another important feature of our cryptosystem is its improved efficiency. In previous cryptosystems, the public key size is $\tilde O(n^4)$ and the encryption increases the size of messages by a factor of $\tilde O(n^2)$. In our cryptosystem, the public key size is only $\tilde O(n^2)$ and encryption increases the size of messages by a factor of only $\tilde O(n)$. This possibly makes our cryptosystem practical. Moreover, using an idea of Ajtai [3], we can reduce the size of the public key to $\tilde O(n)$. This requires all users of the cryptosystem to share some (trusted) random bit string of length $\tilde O(n^2)$. This can be achieved by, say, distributing such a bit string as part of the encryption and decryption software.
We mention that learning problems similar to ours were already suggested as possible sources of cryptographic hardness in, e.g., [10, 7], although this was done without establishing any connection to lattice problems. In another related work [3], Ajtai suggested a cryptosystem that has several properties in common with ours (including its efficiency), although its security is not based on worst-case lattice problems.
Why quantum?
This paper is almost entirely classical. In fact, quantum is needed only in one step in the proof of the main theorem. Making this step classical would make the entire reduction classical. To demonstrate the difficulty, consider the following situation. Let $L$ be some lattice and let $d = \lambda_1(L)/n^{10}$ where $\lambda_1(L)$ is the length of the shortest nonzero vector in $L$. We are given an oracle that for any point $x \in \mathbb{R}^n$ within distance $d$ of $L$ finds the closest lattice vector to $x$. If $x$ is not within distance $d$ of $L$, the output of the oracle is undefined. Intuitively, such an oracle seems quite powerful; the best known algorithms for performing such a task require exponential time. Nevertheless, we do not see any way to use this oracle classically. Indeed, it seems to us that the only way to generate inputs to the oracle is the following: somehow choose a lattice point $y \in L$ and let $x = y + z$ for some perturbation vector $z$ of length at most $d$. Clearly, on input $x$ the oracle outputs $y$. But this is useless since we already know $y$!
It turns out that quantumly, such an oracle is quite useful. Indeed, being able to compute $y$ from $x$ allows us to uncompute $y$. More precisely, it allows us to transform the quantum state $|x, y\rangle$ to the state $|x, 0\rangle$ in a reversible (i.e., unitary) way. This ability to erase the contents of a memory cell in a reversible way seems useful only in the quantum setting.
Techniques.
Unlike previous constructions of lattice-based public-key cryptosystems, the proof of our main theorem uses an ‘iterative construction’. Essentially, this means that instead of ‘immediately’ finding very short vectors in a lattice, the reduction proceeds in steps where in each step shorter lattice vectors are found. So far, such iterative techniques have been used only in the construction of lattice-based one-way functions [2, 12, 27, 29]. Another novel aspect of our main theorem is its crucial use of quantum computation. Our cryptosystem is the first classical cryptosystem whose security is based on a quantum hardness assumption (see [30] for a somewhat related recent work).
Our proof is based on the Fourier transform of Gaussian measures, a technique that was developed in previous papers [36, 29, 1]. More specifically, we use a parameter known as the smoothing parameter, as introduced in [29]. We also use the discrete Gaussian distribution and approximations to its Fourier transform, ideas that were developed in [1].
Open questions.
The main open question raised by this work is whether Theorem 1.1 can be dequantized: can the hardness of LWE be established based on the classical hardness of SIVP and GapSVP? We see no reason why this should be impossible. However, despite our efforts over the last few years, we were not able to show this. As mentioned above, the difficulty is that there seems to be no classical way to use an oracle that solves the closest vector problem within small distances. Quantumly, however, such an oracle turns out to be quite useful.
Another important open question is to determine the hardness of the learning from parity with errors problem (i.e., the case $p = 2$). Our theorem only works for $p > 2\sqrt n$. It seems that in order to prove similar results for smaller values of $p$, substantially new ideas are required. Alternatively, one can interpret our inability to prove hardness for small $p$ as an indication that the problem might be easier than believed.
Followup work.
We now describe some of the followup work that has appeared since the original publication of our results in 2005 [37].
One line of work focussed on improvements to our cryptosystem. First, Kawachi, Tanaka, and Xagawa [21] proposed a modification to our cryptosystem that slightly improves the encryption blowup to $O(n)$, essentially getting rid of a logarithmic factor. A much more significant improvement is described by Peikert, Vaikuntanathan, and Waters in [34]. By a relatively simple modification to the cryptosystem, they managed to bring the encryption blowup down to only a constant, in addition to several equally significant improvements in running time. Finally, Akavia, Goldwasser, and Vaikuntanathan [6] show that our cryptosystem remains secure even if almost the entire secret key is leaked.
Another line of work focussed on the design of other cryptographic protocols whose security is based on the hardness of the LWE problem. First, Peikert and Waters [35] constructed, among other things, CCA-secure cryptosystems (see also [33] for a simpler construction). These are cryptosystems that are secure even if the adversary is allowed access to a decryption oracle (see, e.g., [20, Chapter 10]). All previous lattice-based cryptosystems (including the one in this paper) are not CCA-secure. Second, Peikert, Vaikuntanathan, and Waters [34] showed how to construct oblivious transfer protocols, which are useful, e.g., for performing secure multiparty computation. Third, Gentry, Peikert, and Vaikuntanathan [16] constructed an identity-based encryption (IBE) scheme. This is a public-key encryption scheme in which the public key can be any unique identifier of the user; very few constructions of such schemes are known. Finally, Cash, Peikert, and Sahai [13] constructed a public-key cryptosystem that remains secure even when the encrypted messages may depend upon the secret key. The security of all the above constructions is based on the LWE problem and hence, by our main theorem, also on the worst-case quantum hardness of lattice problems.
The LWE problem has also been used by Klivans and Sherstov to show hardness results related to learning halfspaces [22]. As before, due to our main theorem, this implies hardness of learning halfspaces based on the worst-case quantum hardness of lattice problems.
Finally, we mention two results giving further evidence for the hardness of the LWE problem. In the first, Peikert [32] somewhat strengthens our main theorem by replacing our worst-case lattice problems with their analogues for the $\ell_p$ norm, where $2 \le p \le \infty$ is arbitrary. Our main theorem only deals with the standard $\ell_2$ versions.
In another recent result, Peikert [33] shows that the quantum part of our proof can be removed, leading to a classical reduction from GapSVP to the LWE problem. As a result, Peikert is able to show that public-key cryptosystems (including many of the above LWE-based schemes) can be based on the classical hardness of GapSVP, resolving a long-standing open question (see also [26]). Roughly speaking, the way Peikert circumvents the difficulty we described earlier is by noticing that the existence of an oracle that is able to recover $y$ from $y + z$, where $y$ is a random lattice point and $z$ is a random perturbation of length at most $d$, is by itself a useful piece of information as it provides a lower bound on the length of the shortest nonzero vector. By trying to construct such oracles for several different values of $d$ and checking which ones work, Peikert is able to obtain a good approximation of the length of the shortest nonzero vector.
Removing the quantum part, however, comes at a cost: the construction can no longer be iterative, the hardness can no longer be based on SIVP, and even for hardness based on GapSVP, the modulus in the LWE problem must be exponentially big unless we assume the hardness of a non-standard variant of GapSVP. Because of this, we believe that dequantizing our main theorem remains an important open problem.
1.1 Overview
In this subsection, we give a brief informal overview of the proof of our main theorem, Theorem 1.1. The complete proof appears in Section 3. We do not discuss here the reductions in Section 4 and the cryptosystem in Section 5 as these parts of the paper are more similar to previous work.
In addition to some very basic definitions related to lattices, we will make heavy use here of the discrete Gaussian distribution on a lattice $L$ of width $r$, denoted $D_{L,r}$. This is the distribution whose support is $L$ (which is typically a lattice), and in which the probability of each $x \in L$ is proportional to $\exp(-\pi\|x\|^2/r^2)$ (see Eq. (6) and Figure 2). We also mention here the smoothing parameter $\eta_\varepsilon(L)$. This is a real positive number associated with any lattice $L$ ($\varepsilon$ is an accuracy parameter which we can safely ignore here). Roughly speaking, it gives the smallest $r$ starting from which $D_{L,r}$ 'behaves like' a continuous Gaussian distribution. For instance, for $r \ge \eta_\varepsilon(L)$, vectors chosen from $D_{L,r}$ have norm roughly $r\sqrt n$ with high probability. In contrast, for sufficiently small $r$, $D_{L,r}$ gives almost all its mass to the origin $0$. Although not required for this section, a complete list of definitions can be found in Section 2.
Let $\alpha = \alpha(n)$ and $p = p(n)$ be such that $\alpha p > 2\sqrt n$, as required in Theorem 1.1, and assume we have an oracle that solves $\mathrm{LWE}_{p,\bar\Psi_\alpha}$. For concreteness, we can think of $p \approx n^2$ and $\alpha \approx 1/n$. Our goal is to show how to solve the two lattice problems mentioned in Theorem 1.1. As we prove in Subsection 3.3 using standard reductions, it suffices to solve the following discrete Gaussian sampling problem (DGS): given an $n$-dimensional lattice $L$ and a number $r \ge \sqrt{2n}\,\eta_\varepsilon(L)/\alpha$, output a sample from $D_{L,r}$. Intuitively, the connection to GapSVP and SIVP comes from the fact that by taking $r$ close to its lower limit $\sqrt{2n}\,\eta_\varepsilon(L)/\alpha$, we can obtain short lattice vectors (of length roughly $\sqrt n\, r$). In the rest of this subsection we describe our algorithm for sampling from $D_{L,r}$. We note that the exact lower bound on $r$ is not that important for purposes of this overview, as it only affects the approximation factor we obtain for GapSVP and SIVP. It suffices to keep in mind that our goal is to sample from $D_{L,r}$ for $r$ that is rather small, say within a polynomial factor of $\lambda_n(L)$.
The core of the algorithm is the following procedure, which we call the 'iterative step'. Its input consists of a number $r$ (which is guaranteed to be not too small, namely, greater than $\sqrt 2\, p\,\eta_\varepsilon(L)$), and $n^c$ samples from $D_{L,r}$ where $c$ is some constant. Its output is a sample from the distribution $D_{L,r'}$ for $r' = r\sqrt n/(\alpha p)$. Notice that since $\alpha p > 2\sqrt n$, $r' < r/2$. In order to perform this 'magic' of converting vectors of norm $\sqrt n\, r$ into shorter vectors of norm $\sqrt n\, r'$, the procedure of course needs to use the LWE oracle.
Given the iterative step, the algorithm for solving DGS works as follows. Let $r_i$ denote $r \cdot (\alpha p/\sqrt n)^i$. The algorithm starts by producing $n^c$ samples from $D_{L,r_{3n}}$. Because $r_{3n}$ is so large, such samples can be computed efficiently by a simple procedure described in Lemma 3.2. Next comes the core of the algorithm: for $i = 3n, 3n-1, \ldots, 1$ the algorithm uses its $n^c$ samples from $D_{L,r_i}$ to produce $n^c$ samples from $D_{L,r_{i-1}}$ by calling the iterative step $n^c$ times. Eventually, we end up with $n^c$ samples from $D_{L,r_0} = D_{L,r}$ and we complete the algorithm by simply outputting the first of those. Note the following crucial fact: using $n^c$ samples from $D_{L,r_i}$, we are able to generate the same number of samples from $D_{L,r_{i-1}}$ (in fact, we could even generate more than $n^c$ samples). The algorithm would not work if we could only generate, say, $n^c/2$ samples, as this would require us to start with an exponential number of samples.
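The following sketch records only this bookkeeping (the width schedule and the pool-to-pool regeneration); the two subroutines are passed in as stand-ins for the bootstrapping procedure of Lemma 3.2 and for the iterative step, which we do not implement here:

```python
import math

def dgs_sampler(lattice, r_target, n, nc, alpha, p, bootstrap_sample, iterative_step):
    """Width schedule of the DGS algorithm: widths r_i = r_target * (alpha*p/sqrt(n))**i,
    starting from the huge r_{3n} and descending back down to r_target."""
    shrink = alpha * p / math.sqrt(n)     # > 2 by the assumption alpha*p > 2*sqrt(n)
    r = r_target * shrink ** (3 * n)
    samples = [bootstrap_sample(lattice, r) for _ in range(nc)]
    for _ in range(3 * n):
        # one call per output sample: each call consumes the whole pool of
        # width-r samples and returns one fresh sample of width r/shrink
        samples = [iterative_step(lattice, r, samples) for _ in range(nc)]
        r /= shrink
    return samples[0]

# exercising the schedule with trivial stand-ins for the two subroutines
print(dgs_sampler("L", 1.0, 4, 8, 1.0, 5.0,
                  lambda lat, r: 0.0,
                  lambda lat, r, pool: 0.0))
```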
We now finally get to describe the iterative step. Recall that as input we have $n^c$ samples from $D_{L,r}$ and we are supposed to generate a sample from $D_{L,r'}$ where $r' = r\sqrt n/(\alpha p) < r/2$. Moreover, $r$ is known and guaranteed to be at least $\sqrt 2\, p\,\eta_\varepsilon(L)$, which can be shown to imply that $\alpha p/(\sqrt 2 r) < \lambda_1(L^*)/2$. As mentioned above, the exact lower bound on $r$ does not matter much for this overview; it suffices to keep in mind that $r$ is sufficiently larger than the smoothing parameter of $L$, and that $\alpha p/r$ is sufficiently smaller than $\lambda_1(L^*)$.
The iterative step is obtained by combining two parts (see Figure 3). In the first part, we construct a classical algorithm that uses the given samples and the LWE oracle to solve the following closest vector problem, which we denote by $\mathrm{CVP}_{L^*,\alpha p/(\sqrt 2 r)}$: given any point $x$ within distance $\alpha p/(\sqrt 2 r)$ of the dual lattice $L^*$, output the closest vector in $L^*$ to $x$. (In fact, we only solve a slightly weaker variant of this problem, but for simplicity we ignore this factor here.) By our assumption on $r$, the distance between any two points in $L^*$ is greater than $2\,\alpha p/(\sqrt 2 r)$ and hence the closest vector is unique. In the second part, we use this algorithm to generate samples from $D_{L,r'}$. This part is quantum (and in fact, the only quantum part of our proof). The idea here is to use the CVP algorithm to generate a certain quantum superposition which, after applying the quantum Fourier transform and performing a measurement, provides us with a sample from $D_{L,r\sqrt n/(\alpha p)}$. In the following, we describe each of the two parts in more detail.
Part 1: From samples to CVP.
We start by recalling the main idea in [1]. Consider some probability distribution $D$ on some lattice $L$ and consider its Fourier transform $f : \mathbb{R}^n \to \mathbb{C}$, defined as

$$f(x) = \sum_{y \in L} D(y)\exp(2\pi i \langle x, y\rangle) = \operatorname*{E}_{y \sim D}\bigl[\exp(2\pi i \langle x, y\rangle)\bigr]$$

where in the second equality we simply rewrite the sum as an expectation. By definition, $f$ is $L^*$-periodic, i.e., $f(x) = f(x+y)$ for any $x \in \mathbb{R}^n$ and $y \in L^*$. In [1] it was shown that given a polynomial number of samples from $D$, one can compute an approximation of $f$ to within $\pm 1/\mathrm{poly}(n)$. To see this, note that by the Chernoff-Hoeffding bound, if $y_1, \ldots, y_N$ are $N = \mathrm{poly}(n)$ independent samples from $D$, then

$$f(x) \approx \frac{1}{N}\sum_{j=1}^{N}\exp(2\pi i\langle x, y_j\rangle),$$

where the approximation is to within $\pm 1/\mathrm{poly}(n)$ and holds with probability exponentially close to $1$, assuming that $N$ is a large enough polynomial.
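In code, this estimator is a few lines (our sketch; the list `ws` plays the role of the samples $y_j$):

```python
import cmath, random

def fourier_estimate(x, ws):
    """(1/N) * sum_j exp(2*pi*i*<x, w_j>): an additive approximation of f(x)."""
    return sum(cmath.exp(2j * cmath.pi * sum(a * b for a, b in zip(x, w)))
               for w in ws) / len(ws)

# demo on the lattice Z (n = 1): samples from a discrete Gaussian of width 2
random.seed(0)
ws = [[round(random.gauss(0, 2))] for _ in range(10000)]
print(abs(fourier_estimate([0.0], ws)))   # exactly 1 at a point of the dual lattice Z
print(abs(fourier_estimate([0.5], ws)))   # nearly 0 far away from the dual lattice
```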
By applying this idea to the samples from $D_{L,r}$ given to us as input, we obtain a good approximation of the Fourier transform of $D_{L,r}$, which we denote by $f_{1/r}$. It can be shown that since $r > \eta_\varepsilon(L)$ one has the approximation

$$f_{1/r}(x) \approx \exp\bigl(-\pi (r \cdot \operatorname{dist}(L^*, x))^2\bigr) \qquad (1)$$

(see Figure 4). Hence, $f_{1/r}(x) = 1$ for any $x \in L^*$ (in fact an equality holds) and as $x$ gets away from $L^*$, its value decreases. For points within distance, say, $1/(10r)$ from the lattice, its value is still some positive constant (roughly $\exp(-\pi/100)$). As the distance from $L^*$ increases, the value of the function soon becomes negligible. Since the distance between any two vectors in $L^*$ is at least $\lambda_1(L^*) \gg 1/r$, the Gaussians around each point of $L^*$ are well-separated.
Although not needed in this paper, let us briefly outline how one can solve $\mathrm{CVP}_{L^*,1/(10r)}$ using samples from $D_{L,r}$. Assume that we are given some point $x$ within distance $1/(10r)$ of $L^*$. Intuitively, $x$ is located on one of the Gaussians of $f_{1/r}$. By repeatedly computing an approximation of $f_{1/r}$ using the samples from $D_{L,r}$ as described above, we 'walk uphill' on $f_{1/r}$ in an attempt to find its 'peak'. This peak corresponds to the closest lattice point to $x$. Actually, the procedure as described here does not quite work: due to the error in our approximation of $f_{1/r}$, we cannot find the closest lattice point exactly. It is possible to overcome this difficulty; see [25] for the details. The same procedure actually works for slightly longer distances, namely $O(\sqrt{\log n}/r)$, but beyond that distance the value of $f_{1/r}$ becomes negligible and no useful information can be extracted from our approximation of it.
Unfortunately, solving $\mathrm{CVP}_{L^*,1/(10r)}$ is not useful for the iterative step, as it would lead to samples from a distribution that is wider rather than narrower than the one we started with. This is not surprising, since our solution to $\mathrm{CVP}_{L^*,1/(10r)}$ did not use the LWE oracle. Using the LWE oracle, we will now show that we can gain a factor of roughly $\alpha p$ in the radius, and obtain the desired $\mathrm{CVP}_{L^*,\alpha p/(\sqrt 2 r)}$ algorithm.
Notice that if we could somehow obtain samples from $D_{L,r/p}$ we would be done: using the procedure described above, we could solve $\mathrm{CVP}_{L^*,p/(10r)}$, which is better than what we need. Unfortunately, it is not clear how to obtain such samples, even with the help of the LWE oracle. Nevertheless, here is an obvious way to obtain something similar to samples from $D_{L,r/p}$: just take the given samples from $D_{L,r}$ and divide them by $p$. This provides us with samples from $D_{L/p,r/p}$ where $L/p$ is the lattice $L$ scaled down by a factor of $p$. In the following we will show how to use these samples to solve $\mathrm{CVP}_{L^*,\alpha p/(\sqrt 2 r)}$.
Let us first try to understand what the distribution $D_{L/p,r/p}$ looks like. Notice that the lattice $L/p$ consists of $p^n$ translates of the original lattice $L$. Namely, if $B$ is a basis of $L$, then for each $a \in \mathbb{Z}_p^n$, consider the set

$$L_a = \frac{1}{p}Ba + L.$$

Then $\{L_a\}_{a \in \mathbb{Z}_p^n}$ forms a partition of $L/p$. Moreover, it can be shown that since $r/p$ is larger than the smoothing parameter $\eta_\varepsilon(L)$, the probability given to each $L_a$ under $D_{L/p,r/p}$ is essentially the same, that is, $p^{-n}$. Intuitively, beyond the smoothing parameter, the Gaussian measure no longer 'sees' the discrete structure of $L$, so in particular it is not affected by translations (this will be shown in Claim 3.8).
This leads us to consider the following distribution, call it $\tilde D$. A sample from $\tilde D$ is a pair $(a, y)$ where $y$ is sampled from $D_{L/p,r/p}$, and $a \in \mathbb{Z}_p^n$ is such that $y \in L_a$. Notice that we can easily obtain samples from $\tilde D$ using the given samples from $D_{L,r}$. From the above discussion we have that the marginal distribution of $a$ is essentially uniform. Moreover, by definition we have that the distribution of $y$ conditioned on any $a$ is $D_{L_a,r/p}$. Hence, $\tilde D$ is essentially identical to the distribution on pairs $(a, y)$ in which $a \in \mathbb{Z}_p^n$ is chosen uniformly at random, and then $y$ is sampled from $D_{L_a,r/p}$. From now on, we think of $\tilde D$ as being this distribution.
We now examine the Fourier transform of $D_{L_a,r/p}$ (see Figure 5). When $a$ is zero, we already know that the Fourier transform is $f_{p/r}$. For general $a$, a standard calculation shows that the Fourier transform of $D_{L_a,r/p}$ is given by

$$\exp(2\pi i\langle a, \tau(x)\rangle/p)\, f_{p/r}(x) \qquad (2)$$

where $\tau(x)$ is defined as

$$\tau(x) = (B^*)^{-1}\kappa_{L^*}(x) \bmod p$$

and $\kappa_{L^*}(x)$ denotes the (unique) closest vector in $L^*$ to $x$. In other words, $\tau(x)$ is the vector of coefficients of the vector in $L^*$ closest to $x$ when represented in the dual basis $B^*$ of $L^*$, reduced modulo $p$. So we see that the Fourier transform is essentially $f_{p/r}$, except that each 'hill' gets its own phase depending on the vector of coefficients of the lattice point in its center. The appearance of these phases is a result of a well-known property of the Fourier transform, saying that translation is transformed to multiplication by phase.
Equipped with this understanding of the Fourier transform of $D_{L_a,r/p}$, we can get back to our task of solving $\mathrm{CVP}_{L^*,\alpha p/(\sqrt 2 r)}$. By the definition of the Fourier transform, we know that the average of $\exp(2\pi i\langle x, y\rangle)$ over $y \sim D_{L_a,r/p}$ is given by (2). Assume for simplicity that $x \in L^*$ (even though in this case finding the closest vector is trivial; it is simply $x$ itself). In this case, (2) is equal to $\exp(2\pi i\langle a, \tau(x)\rangle/p)$. Since the absolute value of this expression is $1$, we see that for such $x$, the random variable $\langle x, y\rangle \bmod 1$ (where $(a, y)$ is sampled from $\tilde D$) must be deterministically equal to $\langle a, \tau(x)\rangle/p \bmod 1$ (this fact can also be seen directly). In other words, when $x \in L^*$, each sample $(a, y)$ from $\tilde D$ provides us with a linear equation

$$\langle a, \tau(x)\rangle \equiv p\,\langle x, y\rangle \pmod p,$$

with $a$ distributed essentially uniformly in $\mathbb{Z}_p^n$. After collecting about $n$ such equations, we can use Gaussian elimination to recover $\tau(x)$. And as we shall show in Lemma 3.5 using a simple reduction, the ability to compute $\tau(x)$ easily leads to the ability to compute the closest vector to $x$.
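The elimination step over $\mathbb{Z}_p$ is standard; here is a sketch (ours, assuming for simplicity that $p$ is prime so that nonzero pivots are invertible; recall the paper needs primality only in Lemma 4.2):

```python
def solve_mod_p(eqs, n, p):
    """Solve <a_i, t> = b_i (mod p) by Gaussian elimination, for prime p."""
    rows = [([x % p for x in a], b % p) for a, b in eqs]
    for col in range(n):
        piv = next((i for i in range(col, len(rows)) if rows[i][0][col]), None)
        if piv is None:
            return None                       # degenerate system
        rows[col], rows[piv] = rows[piv], rows[col]
        inv = pow(rows[col][0][col], -1, p)   # modular inverse (Python 3.8+)
        a_c = [x * inv % p for x in rows[col][0]]
        b_c = rows[col][1] * inv % p
        rows[col] = (a_c, b_c)
        for i in range(len(rows)):
            if i != col and rows[i][0][col]:
                f = rows[i][0][col]
                rows[i] = ([(x - f * y) % p for x, y in zip(rows[i][0], a_c)],
                           (rows[i][1] - f * b_c) % p)
    return [rows[i][1] for i in range(n)]

# sanity check with p = 7, n = 3
assert solve_mod_p([([1, 2, 3], 1), ([4, 5, 6], 4), ([1, 1, 2], 0)], 3, 7) is not None
```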
We now turn to the more interesting case in which $x$ is not in $L^*$, but only within distance $\alpha p/(\sqrt 2 r)$ of $L^*$. In this case, the phase of (2) is still equal to $\langle a, \tau(x)\rangle/p$. Its absolute value, however, is no longer $1$, but still quite close to $1$ (depending on the distance of $x$ from $L^*$). Therefore, the random variable $\langle x, y\rangle \bmod 1$, where $(a, y)$ is sampled from $\tilde D$, must be typically quite close to $\langle a, \tau(x)\rangle/p \bmod 1$ (since, as before, the average of $\exp(2\pi i\langle x, y\rangle)$ is given by (2)). Hence, each sample $(a, y)$ from $\tilde D$ provides us with a linear equation with error,

$$\langle a, \tau(x)\rangle \approx \lfloor p\,\langle x, y\rangle\rceil \pmod p.$$

Notice that $p\langle x, y\rangle$ is typically not an integer and hence we round it to the nearest integer. After collecting a polynomial number of such equations, we call the LWE oracle in order to recover $\tau(x)$. Notice that $a$ is distributed essentially uniformly, as required by the LWE oracle. Finally, as mentioned above, once we are able to compute $\tau(x)$, computing $\kappa_{L^*}(x)$ is easy (this will be shown in Lemma 3.5).
The above outline ignores one important detail: what is the error distribution in the equations we produce? Recall that the LWE oracle is only guaranteed to work with error distribution $\bar\Psi_\alpha$. Luckily, as we will show in Claim 3.9 and Corollary 3.10 (using a rather technical proof), if $x$ is at distance $\beta p/(\sqrt 2 r)$ from $L^*$ for some $\beta \le \alpha$, then the error distribution in the equations is essentially $\bar\Psi_\beta$. (In fact, in order to get this error distribution, we will have to modify the procedure a bit and add a small amount of normal error to each equation.) We then complete the proof by noting (in Lemma 3.7) that an oracle for solving $\mathrm{LWE}_{p,\bar\Psi_\alpha}$ can be used to solve $\mathrm{LWE}_{p,\bar\Psi_\beta}$ for any $\beta \le \alpha$ (even if $\beta$ is unknown).
Part 2: From CVP to samples.
In this part, we describe a quantum algorithm that, using a $\mathrm{CVP}_{L^*,\alpha p/(\sqrt 2 r)}$ oracle, generates one sample from $D_{L,r\sqrt n/(\alpha p)}$. Equivalently, we show how to produce a sample from $D_{L,\sqrt n/(\sqrt 2 d)}$ given a $\mathrm{CVP}_{L^*,d}$ oracle. The procedure is essentially the following: first, by using the CVP oracle, create a quantum state corresponding to $f$, i.e., to Gaussians of width roughly $d/\sqrt n$ centered around the points of $L^*$. Then, apply the quantum Fourier transform and obtain a quantum state corresponding to $D_{L,\sqrt n/(\sqrt 2 d)}$. By measuring this state we obtain a sample from $D_{L,\sqrt n/(\sqrt 2 d)}$.
In the following, we describe this procedure in more detail. Our first goal is to create a quantum state corresponding to Gaussians around the points of $L^*$. Informally, this can be written as

$$\sum_{y \in L^*}\sum_{z} \rho_{d/\sqrt n}(z)\,|y + z\rangle. \qquad (3)$$

This state is clearly not well-defined. In the actual procedure, $\mathbb{R}^n$ is replaced with some finite set (namely, all points inside the basic parallelepiped of $L^*$ that belong to some fine grid). This introduces several technical complications and makes the computations rather tedious. Therefore, in the present discussion, we opt to continue with informal expressions as in (3).
Let us now continue our description of the procedure. In order to prepare the state in (3), we first create the uniform superposition on $L^*$,

$$\sum_{y \in L^*}|y\rangle.$$

(This step is actually unnecessary in the real procedure, since there we work in the basic parallelepiped of $L^*$; but for the present discussion, it is helpful to imagine that we start with this state.) On a separate register, we create a 'Gaussian state' of width $d/\sqrt n$,

$$\sum_{z}\rho_{d/\sqrt n}(z)\,|z\rangle.$$

This can be done using known techniques. The combined state of the system can be written as

$$\sum_{y \in L^*,\, z}\rho_{d/\sqrt n}(z)\,|y\rangle|z\rangle.$$

We now add the first register to the second (a reversible operation), and obtain

$$\sum_{y \in L^*,\, z}\rho_{d/\sqrt n}(z)\,|y\rangle|y+z\rangle.$$

Finally, we would like to erase, or uncompute, the first register to obtain

$$\sum_{y \in L^*,\, z}\rho_{d/\sqrt n}(z)\,|y+z\rangle.$$

However, 'erasing' a register is in general not a reversible operation. In order for it to be reversible, we need to be able to compute $y$ from the remaining register $|y+z\rangle$. This is precisely why we need the $\mathrm{CVP}_{L^*,d}$ oracle. It can be shown that almost all the mass of $\rho_{d/\sqrt n}$ is on points $z$ with $\|z\| \le d$. Hence, $y+z$ is within distance $d$ of the lattice $L^*$ and the oracle finds the closest lattice point, namely, $y$. This allows us to erase the first register in a reversible way.

In the final part of the procedure, we apply the quantum Fourier transform. This yields the quantum state corresponding to $D_{L,\sqrt n/(\sqrt 2 d)}$, namely,

$$\sum_{x \in L}\rho_{\sqrt n/d}(x)\,|x\rangle.$$

By measuring this state, we obtain a sample from the distribution $D_{L,\sqrt n/(\sqrt 2 d)}$ (or in fact from a distribution exponentially close to it, but this is a minor issue).
2 Preliminaries
In this section we include some notation that will be used throughout the paper. Most of the notation is standard. Some of the less standard notation is: the Gaussian function $\rho$ (Eq. (4)), the Gaussian distribution $\nu$ (Eq. (5)), the periodic normal distribution $\Psi$ (Eq. (7)), the discretization $\bar\phi$ of a distribution $\phi$ on $\mathbb{T}$ (Eq. (8)), the discrete Gaussian distribution $D_{A,s}$ (Eq. (6)), the unique closest lattice vector $\kappa_L$ (above Lemma 2.3), and the smoothing parameter $\eta$ (Definition 2.10).
General.
For two real numbers $x$ and $y > 0$ we define $x \bmod y$ as $x - \lfloor x/y\rfloor y$. For $x \in \mathbb{R}$ we define $\lfloor x\rceil$ as the integer closest to $x$ or, in case two such integers exist, the smaller of the two. For any integer $p \ge 2$, we write $\mathbb{Z}_p$ for the cyclic group $\{0, 1, \ldots, p-1\}$ with addition modulo $p$. We also write $\mathbb{T}$ for $\mathbb{R}/\mathbb{Z}$, i.e., the segment $[0, 1)$ with addition modulo $1$.
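These conventions translate directly into code (a trivial sketch with hypothetical helper names):

```python
import math

def mod_real(x, y):
    """x mod y for real x and y > 0, always in [0, y)."""
    return x - math.floor(x / y) * y

def round_half_down(x):
    """The integer closest to x, taking the smaller one in case of a tie."""
    f = math.floor(x)
    return f if x - f <= 0.5 else f + 1

assert mod_real(-1.25, 1.0) == 0.75
assert round_half_down(2.5) == 2 and round_half_down(2.51) == 3
```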
We define a negligible amount in $n$ as an amount that is asymptotically smaller than $n^{-c}$ for any constant $c > 0$. More precisely, $f(n)$ is a negligible function in $n$ if $\lim_{n\to\infty} n^c f(n) = 0$ for any $c > 0$. Similarly, a non-negligible amount is one which is at least $n^{-c}$ for some $c > 0$. Also, when we say that an expression is exponentially small in $n$ we mean that it is at most $2^{-\Omega(n)}$. Finally, when we say that an expression (most often, some probability) is exponentially close to $1$, we mean that it is $1 - 2^{-\Omega(n)}$.
We say that an algorithm with oracle access is a distinguisher between two distributions if its acceptance probability when the oracle outputs samples of the first distribution and its acceptance probability when the oracle outputs samples of the second distribution differ by a non-negligible amount.
Essentially all algorithms and reductions in this paper have an exponentially small error probability, and we sometimes do not state this explicitly.
For clarity, we present some of our reductions in a model that allows operations on real numbers. It is possible to modify them in a straightforward way so that they operate in a model that approximates real numbers up to an error of $2^{-n^c}$ for an arbitrary large constant $c$ in time polynomial in $n$.
Given two probability density functions $\phi_1, \phi_2$ on $\mathbb{R}^n$, we define the statistical distance between them as

$$\Delta(\phi_1, \phi_2) := \int_{\mathbb{R}^n} |\phi_1(x) - \phi_2(x)|\, dx$$

(notice that with this definition, the statistical distance ranges in $[0, 2]$). A similar definition can be given for discrete random variables. The statistical distance satisfies the triangle inequality, i.e., for any $\phi_1, \phi_2, \phi_3$,

$$\Delta(\phi_1, \phi_3) \le \Delta(\phi_1, \phi_2) + \Delta(\phi_2, \phi_3).$$

Another important fact which we often use is that the statistical distance cannot increase by applying a (possibly randomized) function $f$, i.e.,

$$\Delta(f(X), f(Y)) \le \Delta(X, Y);$$

see, e.g., [28]. In particular, this implies that the acceptance probability of any algorithm on inputs from $X$ differs from its acceptance probability on inputs from $Y$ by at most $\Delta(X, Y)/2$ (the factor half coming from the choice of normalization in our definition of $\Delta$).
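For discrete distributions given explicitly, this quantity is immediate to compute (our sketch, following the paper's normalization without the factor $\frac12$):

```python
def statistical_distance(P, Q):
    """Delta(P, Q) = sum over outcomes of |P(x) - Q(x)|; with the paper's
    normalization this ranges in [0, 2], and acceptance probabilities of any
    algorithm differ by at most Delta/2."""
    keys = set(P) | set(Q)
    return sum(abs(P.get(k, 0.0) - Q.get(k, 0.0)) for k in keys)

assert statistical_distance({0: 1.0}, {1: 1.0}) == 2.0  # disjoint supports
```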
Gaussians and other distributions.
Recall that the normal distribution with mean $0$ and variance $\sigma^2$ is the distribution on $\mathbb{R}$ given by the density function $\frac{1}{\sigma\sqrt{2\pi}}\exp(-\frac12(x/\sigma)^2)$, where $\exp(y)$ denotes $e^y$. Also recall that the sum of two independent normal variables with mean $0$ and variances $\sigma_1^2$ and $\sigma_2^2$ is a normal variable with mean $0$ and variance $\sigma_1^2+\sigma_2^2$. For a vector $x \in \mathbb{R}^n$ and any $s > 0$, let

$$\rho_s(x) := \exp(-\pi\|x/s\|^2) \qquad (4)$$

be a Gaussian function scaled by a factor of $s$. We denote $\rho_1$ by $\rho$. Note that $\int_{x\in\mathbb{R}^n}\rho_s(x)\,dx = s^n$. Hence,

$$\nu_s := \rho_s/s^n \qquad (5)$$

is an $n$-dimensional probability density function and as before, we use $\nu$ to denote $\nu_1$. The dimension $n$ is implicit. Notice that a sample from the Gaussian distribution $\nu_s$ can be obtained by taking $n$ independent samples from the $1$-dimensional Gaussian distribution. Hence, sampling from $\nu_s$ to within arbitrarily good accuracy can be performed efficiently by using standard techniques. For simplicity, in this paper we assume that we can sample from $\nu_s$ exactly. (In practice, when only finite precision is available, $\nu_s$ can be approximated by picking a fine grid, and picking points from the grid with probability approximately proportional to $\nu_s$. All our arguments can be made rigorous by selecting a sufficiently fine grid.) Functions are extended to sets in the usual way; i.e., $\rho_s(A) = \sum_{x\in A}\rho_s(x)$ for any countable set $A$. For any vector $c \in \mathbb{R}^n$, we define $\rho_{s,c}(x) := \rho_s(x-c)$ to be a shifted version of $\rho_s$. The following simple claim bounds the amount by which $\rho_s$ can shrink by a small change in $x$.
Claim 2.1

For all $s > 0$ and all $x, y \in \mathbb{R}^n$,

$$\rho_s(x+y) \ge \rho_s(x)\, e^{-\pi(2\|x\|\|y\| + \|y\|^2)/s^2}.$$

Proof:

Using the inequality $\|x+y\|^2 \le \|x\|^2 + 2\|x\|\|y\| + \|y\|^2$,

$$\rho_s(x+y) = e^{-\pi\|x+y\|^2/s^2} \ge e^{-\pi(\|x\|^2 + 2\|x\|\|y\| + \|y\|^2)/s^2} = \rho_s(x)\, e^{-\pi(2\|x\|\|y\|+\|y\|^2)/s^2}.$$
For any countable set $A$ and a parameter $s > 0$, we define the discrete Gaussian probability distribution $D_{A,s}$ as

$$D_{A,s}(x) := \frac{\rho_s(x)}{\rho_s(A)} \quad \text{for all } x \in A. \qquad (6)$$
See Figure 2 for an illustration.
For $\beta > 0$, the distribution $\Psi_\beta$ is the distribution on $\mathbb{T}$ obtained by sampling from a normal variable with mean $0$ and standard deviation $\beta/\sqrt{2\pi}$ and reducing the result modulo $1$ (i.e., a periodization of the normal distribution),

$$\Psi_\beta(r) := \sum_{k=-\infty}^{\infty}\frac{1}{\beta}\exp\left(-\pi\left(\frac{r-k}{\beta}\right)^2\right), \quad r \in [0,1). \qquad (7)$$

Clearly, one can efficiently sample from $\Psi_\beta$. The following technical claim shows that a small change in the parameter $\beta$ does not change the distribution $\Psi_\beta$ by much.
Claim 2.2

For any $0 < \beta_1 \le \beta_2 \le 2\beta_1$,

$$\Delta(\Psi_{\beta_1}, \Psi_{\beta_2}) \le 9\left(\frac{\beta_2}{\beta_1}-1\right).$$

Proof:

We will show that the statistical distance between a normal variable with standard deviation $\beta_1/\sqrt{2\pi}$ and one with standard deviation $\beta_2/\sqrt{2\pi}$ is at most $9(\beta_2/\beta_1-1)$. This implies the claim since applying a function (modulo $1$ in this case) cannot increase the statistical distance. By scaling, we can assume without loss of generality that $\beta_1 = 1$ and $\beta_2 = 1+\varepsilon$ for some $0 < \varepsilon \le 1$. Then the statistical distance that we wish to bound is given by

$$\int_{-\infty}^{\infty}\left|\frac{1}{1+\varepsilon}e^{-\pi\left(\frac{x}{1+\varepsilon}\right)^2} - e^{-\pi x^2}\right|dx.$$

Now, since $e^{-\pi(x/(1+\varepsilon))^2} \ge e^{-\pi x^2}$ for all $x$,

$$\left|\frac{1}{1+\varepsilon}e^{-\pi\left(\frac{x}{1+\varepsilon}\right)^2} - e^{-\pi x^2}\right| \le \left(1-\frac{1}{1+\varepsilon}\right)e^{-\pi\left(\frac{x}{1+\varepsilon}\right)^2} + \left(e^{-\pi\left(\frac{x}{1+\varepsilon}\right)^2} - e^{-\pi x^2}\right).$$

Hence we can bound the statistical distance above by

$$\left(1-\frac{1}{1+\varepsilon}\right)(1+\varepsilon) + \bigl((1+\varepsilon)-1\bigr) = 2\varepsilon \le 9\varepsilon.$$
For an arbitrary probability distribution with density function $\phi : \mathbb{T} \to \mathbb{R}^+$ and some integer $p$, we define its discretization $\bar\phi$ as the discrete probability distribution on $\mathbb{Z}_p$ obtained by sampling from $\phi$, multiplying by $p$, and rounding to the closest integer modulo $p$. More formally,

$$\bar\phi(i) := \int_{(i-1/2)/p}^{(i+1/2)/p}\phi(x)\,dx. \qquad (8)$$

As an example, $\bar\Psi_\alpha$ is shown in Figure 1.
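The sampling view of Eqs. (7) and (8) can be written out directly (our sketch; `sample_psi` and `discretize` are hypothetical helper names):

```python
import math, random

def sample_psi(beta):
    """Psi_beta: a normal of mean 0 and standard deviation beta/sqrt(2*pi),
    reduced modulo 1 (Eq. (7))."""
    return random.gauss(0, beta / math.sqrt(2 * math.pi)) % 1.0

def discretize(sample_phi, p):
    """The discretization of Eq. (8): sample, multiply by p, round mod p."""
    return round(p * sample_phi()) % p

# Psi-bar_alpha on Z_p is then obtained as:
e = discretize(lambda: sample_psi(0.01), 2003)
```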
Let $p \ge 2$ be some integer, and let $\chi$ be some probability distribution on $\mathbb{Z}_p$. Let $n$ be an integer and let $s \in \mathbb{Z}_p^n$ be a vector. We define $A_{s,\chi}$ as the distribution on $\mathbb{Z}_p^n \times \mathbb{Z}_p$ obtained by choosing a vector $a \in \mathbb{Z}_p^n$ uniformly at random, choosing $e \in \mathbb{Z}_p$ according to $\chi$, and outputting $(a, \langle a, s\rangle + e)$, where additions are performed in $\mathbb{Z}_p$, i.e., modulo $p$. We also define $U$ as the uniform distribution on $\mathbb{Z}_p^n \times \mathbb{Z}_p$.
For a probability density function $\phi$ on $\mathbb{T}$, we define $A_{s,\phi}$ as the distribution on $\mathbb{Z}_p^n \times \mathbb{T}$ obtained by choosing a vector $a \in \mathbb{Z}_p^n$ uniformly at random, choosing $e \in \mathbb{T}$ according to $\phi$, and outputting $(a, \langle a, s\rangle/p + e)$, where the addition is performed in $\mathbb{T}$, i.e., modulo $1$.
Learning with errors.
For an integer $p = p(n)$ and a distribution $\chi$ on $\mathbb{Z}_p$, we say that an algorithm solves $\mathrm{LWE}_{p,\chi}$ if, for any $s \in \mathbb{Z}_p^n$, given samples from $A_{s,\chi}$ it outputs $s$ with probability exponentially close to $1$. Similarly, for a probability density function $\phi$ on $\mathbb{T}$, we say that an algorithm solves $\mathrm{LWE}_{p,\phi}$ if, for any $s \in \mathbb{Z}_p^n$, given samples from $A_{s,\phi}$ it outputs $s$ with probability exponentially close to $1$. In both cases, we say that the algorithm is efficient if it runs in polynomial time in $n$. Finally, we note that $p$ is assumed to be prime only in Lemma 4.2; in the rest of the paper, including the main theorem, $p$ can be an arbitrary integer.
Lattices.
We briefly review some basic definitions; for a good introduction to lattices, see [28]. A lattice in $\mathbb{R}^n$ is defined as the set of all integer combinations of $n$ linearly independent vectors. This set of vectors is known as a basis of the lattice and is not unique. Given a basis $(v_1, \ldots, v_n)$ of a lattice $L$, the fundamental parallelepiped generated by this basis is defined as

$$\mathcal{P}(v_1, \ldots, v_n) = \left\{\sum_{i=1}^n x_i v_i \;\middle|\; 0 \le x_i < 1\right\}.$$

When the choice of basis is clear, we write $\mathcal{P}(L)$ instead of $\mathcal{P}(v_1, \ldots, v_n)$. For a point $x \in \mathbb{R}^n$ we define $x \bmod \mathcal{P}(L)$ as the unique point $y \in \mathcal{P}(L)$ such that $x - y \in L$. We denote by $\det(L)$ the volume of the fundamental parallelepiped of $L$ or equivalently, the absolute value of the determinant of the matrix whose columns are the basis vectors of the lattice ($\det(L)$ is a lattice invariant, i.e., it is independent of the choice of basis). The dual of a lattice $L$ in $\mathbb{R}^n$, denoted $L^*$, is the lattice given by the set of all vectors $y \in \mathbb{R}^n$ such that $\langle x, y\rangle \in \mathbb{Z}$ for all vectors $x \in L$. Similarly, given a basis $(v_1, \ldots, v_n)$ of a lattice, we define the dual basis as the set of vectors $(v_1^*, \ldots, v_n^*)$ such that $\langle v_i, v_j^*\rangle = \delta_{ij}$ for all $i, j$, where $\delta_{ij}$ denotes the Kronecker delta, i.e., $\delta_{ij} = 1$ if $i = j$ and $0$ otherwise. With a slight abuse of notation, we sometimes write $L$ for the matrix whose columns are $v_1, \ldots, v_n$. With this notation, we notice that $L^* = (L^T)^{-1}$. From this it follows that $\det(L^*) = 1/\det(L)$. As another example of this notation, for a point $x \in L$ we write $L^{-1}x$ to indicate the integer coefficient vector of $x$.
Let $\lambda_1(L)$ denote the length of the shortest nonzero vector in the lattice $L$. We denote by $\lambda_n(L)$ the minimum length of a set of $n$ linearly independent vectors from $L$, where the length of a set is defined as the length of the longest vector in it. For a lattice $L$ and a point $x$ whose distance from $L$ is less than $\lambda_1(L)/2$ we define $\kappa_L(x)$ as the (unique) closest point to $x$ in $L$. The following useful fact, due to Banaszczyk, is known as a 'transference theorem'. We remark that the lower bound is easy to prove.
Lemma 2.3 ([9], Theorem 2.1)

For any $n$-dimensional lattice $L$, $1 \le \lambda_1(L)\cdot\lambda_n(L^*) \le n$.
Two other useful facts by Banaszczyk are the following. The first bounds the amount by which the Gaussian measure of a lattice changes by scaling; the second shows that for any lattice $L$, the mass given by the discrete Gaussian measure $D_{L,r}$ to points of norm greater than $\sqrt n\, r$ is at most exponentially small (the analogous statement for the continuous Gaussian is easy to establish).
Lemma 2.4 ([9], Lemma 1.4(i))

For any lattice $L$ and $q \ge 1$, $\rho(qL) \le q^{-n}\,\rho(L)$.
Lemma 2.5 ([9], Lemma 1.5(i))

Let $B_n$ denote the Euclidean unit ball. Then, for any lattice $L$ and any $r > 0$, $\rho_r(L \setminus \sqrt n\, r B_n) < 2^{-2n}\cdot\rho_r(L)$, where $L \setminus \sqrt n\, r B_n$ is the set of lattice points of norm greater than $\sqrt n\, r$.
In this paper we consider the following lattice problems. The first two, the decision version of the shortest vector problem (GapSVP) and the shortest independent vectors problem (SIVP), are among the most well-known lattice problems and are concerned with $\lambda_1$ and $\lambda_n$, respectively. In the definitions below, $\gamma = \gamma(n) \ge 1$ is the approximation factor, and the input lattice is given in the form of some arbitrary basis.
Definition 2.6
An instance of $\mathrm{GapSVP}_\gamma$ is given by an $n$-dimensional lattice $L$ and a number $d > 0$. In YES instances, $\lambda_1(L) \le d$, whereas in NO instances $\lambda_1(L) > \gamma(n)\cdot d$.
Definition 2.7
An instance of $\mathrm{SIVP}_\gamma$ is given by an $n$-dimensional lattice $L$. The goal is to output a set of $n$ linearly independent lattice vectors of length at most $\gamma(n)\cdot\lambda_n(L)$.
A useful generalization of SIVP is the following somewhat less standard lattice problem, known as the generalized independent vectors problem (GIVP). Here, $\varphi$ denotes an arbitrary real-valued function on lattices. Choosing $\varphi = \lambda_n$ results in SIVP.
Definition 2.8
An instance of $\mathrm{GIVP}^\varphi_\gamma$ is given by an $n$-dimensional lattice $L$. The goal is to output a set of $n$ linearly independent lattice vectors of length at most $\gamma(n)\cdot\varphi(L)$.
Another useful (and even less standard) lattice problem is the following. We call it the discrete Gaussian sampling problem (DGS). As before, $\varphi$ denotes an arbitrary real-valued function on lattices.
Definition 2.9
An instance of $\mathrm{DGS}_\varphi$ is given by an $n$-dimensional lattice $L$ and a number $r \ge \varphi(L)$. The goal is to output a sample from $D_{L,r}$.
We also consider a variant of the closest vector problem (which is essentially what is known as the bounded distance decoding problem [25]): for an $n$-dimensional lattice $L$ and some $d > 0$, we say that an algorithm solves $\mathrm{CVP}_{L,d}$ if, given a point $x$ whose distance to $L$ is at most $d$, the algorithm finds the closest lattice point to $x$. In this paper $d$ will always be smaller than $\lambda_1(L)/2$ and hence the closest vector is unique.
The smoothing parameter.
We make heavy use of a lattice parameter known as the smoothing parameter [29]. Intuitively, this parameter provides the width beyond which the discrete Gaussian measure on a lattice behaves like a continuous one. The precise definition is the following.
Definition 2.10
For an $n$-dimensional lattice $L$ and positive real $\varepsilon > 0$, we define the smoothing parameter $\eta_\varepsilon(L)$ to be the smallest $s$ such that $\rho_{1/s}(L^* \setminus \{0\}) \le \varepsilon$.
In other words, $\eta_\varepsilon(L)$ is the smallest $s$ such that a Gaussian measure scaled by $1/s$ on the dual lattice $L^*$ gives all but $\varepsilon$ of its weight to the origin. We usually take $\varepsilon$ to be some negligible function of the lattice dimension $n$. Notice that $\rho_{1/s}(L^*\setminus\{0\})$ is a continuous and strictly decreasing function of $s$. Moreover, it can be shown that $\lim_{s\to 0}\rho_{1/s}(L^*\setminus\{0\}) = \infty$ and $\lim_{s\to\infty}\rho_{1/s}(L^*\setminus\{0\}) = 0$. So, the parameter $\eta_\varepsilon(L)$ is well defined for any $\varepsilon > 0$, and $\varepsilon \mapsto \eta_\varepsilon(L)$ is the inverse function of $s \mapsto \rho_{1/s}(L^*\setminus\{0\})$. In particular, $\eta_\varepsilon(L)$ is also a continuous and strictly decreasing function of $\varepsilon$.
The motivation for this definition (and the name 'smoothing parameter') comes from the following result, shown in [29] (and included here as Claim 3.8). Informally, it says that if we choose a 'random' lattice point from an $n$-dimensional lattice $L$ and add continuous Gaussian noise $\nu_s$ for some $s \ge \eta_\varepsilon(L)$ then the resulting distribution is within statistical distance $2\varepsilon$ of the 'uniform distribution on $\mathbb{R}^n$'. In this paper, we show (in Claim 3.9) another important property of this parameter: for appropriate $r, s$ above the smoothing parameter, if we sample a point from $D_{L,r}$ and add Gaussian noise $\nu_s$, we obtain a distribution whose statistical distance to the continuous Gaussian $\nu_{\sqrt{r^2+s^2}}$ is at most $4\varepsilon$. Notice that $\nu_{\sqrt{r^2+s^2}}$ is the distribution one obtains when summing two independent samples from $\nu_r$ and $\nu_s$. Hence, intuitively, the noise $\nu_s$ is enough to hide the discrete structure of $D_{L,r}$.
The following two upper bounds on the smoothing parameter appear in [29].
Lemma 2.11

For any $n$-dimensional lattice $L$, $\eta_\varepsilon(L) \le \sqrt n/\lambda_1(L^*)$, where $\varepsilon = 2^{-n}$.
Lemma 2.12

For any $n$-dimensional lattice $L$ and $\varepsilon > 0$,

$$\eta_\varepsilon(L) \le \sqrt{\frac{\ln(2n(1+1/\varepsilon))}{\pi}}\cdot\lambda_n(L).$$

In particular, for any superlogarithmic function $\omega(\log n)$, $\eta_{\varepsilon(n)}(L) \le \sqrt{\omega(\log n)}\cdot\lambda_n(L)$ for some negligible function $\varepsilon(n)$.
We also need the following simple lower bound on the smoothing parameter.
Claim 2.13

For any lattice $L$ and any $\varepsilon > 0$,

$$\eta_\varepsilon(L) \ge \sqrt{\frac{\ln 1/\varepsilon}{\pi}}\cdot\frac{1}{\lambda_1(L^*)} \ge \sqrt{\frac{\ln 1/\varepsilon}{\pi}}\cdot\frac{\lambda_n(L)}{n}.$$

In particular, for any negligible function $\varepsilon(n)$ and any constant $c > 0$, $\eta_{\varepsilon(n)}(L) \ge c\,\lambda_n(L)/n$ for large enough $n$.

Proof:

Let $v \in L^*$ be a vector of length $\lambda_1(L^*)$ and let $s = \eta_\varepsilon(L)$. Then,

$$\varepsilon \ge \rho_{1/s}(L^*\setminus\{0\}) \ge \rho_{1/s}(v) = e^{-\pi(s\,\lambda_1(L^*))^2}.$$

The first inequality in the claim follows by solving for $s$. The second inequality is by Lemma 2.3.
The Fourier transform.
We briefly review some of the important properties of the Fourier transform. In the following, we omit certain technical conditions as these will always be satisfied in our applications. For a more precise and in-depth treatment, see, e.g., [14]. The Fourier transform of a function $h : \mathbb{R}^n \to \mathbb{C}$ is defined to be

$$\hat h(w) = \int_{\mathbb{R}^n} h(x)\,e^{-2\pi i\langle x, w\rangle}\,dx.$$

From the definition we can obtain two useful formulas; first, if $h$ is defined by $h(x) = g(x+v)$ for some function $g$ and vector $v$ then

$$\hat h(w) = e^{2\pi i\langle v, w\rangle}\,\hat g(w). \qquad (9)$$

Similarly, if $h$ is defined by $h(x) = e^{2\pi i\langle x, v\rangle}g(x)$ for some function $g$ and vector $v$ then

$$\hat h(w) = \hat g(w - v). \qquad (10)$$

Another important fact is that the Gaussian is its own Fourier transform, i.e., $\hat\rho = \rho$. More generally, for any $s > 0$ it holds that $\widehat{\rho_s} = s^n\rho_{1/s}$. Finally, we will use the following formulation of the Poisson summation formula.
Lemma 2.14 (Poisson summation formula)

For any lattice $L$ and any function $f$, $f(L) = \det(L^*)\,\hat f(L^*)$.
3 Main Theorem
Our main theorem is the following. The connection to the standard lattice problems GapSVP and SIVP will be established in Subsection 3.3 by polynomial time reductions to DGS.
Theorem 3.1 (Main theorem)
Let $\varepsilon = \varepsilon(n)$ be some negligible function of $n$. Also, let $p = p(n)$ be some integer and $\alpha = \alpha(n) \in (0,1)$ be such that $\alpha p > 2\sqrt n$. Assume that we have access to an oracle $W$ that solves $\mathrm{LWE}_{p,\bar\Psi_\alpha}$ given a polynomial number of samples. Then there exists an efficient quantum algorithm for $\mathrm{DGS}_{\sqrt{2n}\,\eta_\varepsilon(L)/\alpha}$.
Proof:

The input to our algorithm is an $n$-dimensional lattice $L$ and a number $r \ge \sqrt{2n}\,\eta_\varepsilon(L)/\alpha$. Our goal is to output a sample from $D_{L,r}$. Let $r_i$ denote $r\cdot(\alpha p/\sqrt n)^i$. The algorithm starts by producing $n^c$ samples from $D_{L,r_{3n}}$ where $c$ is the constant from the iterative step lemma, Lemma 3.3. By Claim 2.13, $r_{3n} > 2^{2n}\lambda_n(L)$, and hence we can produce these samples efficiently by the procedure described in the bootstrapping lemma, Lemma 3.2. Next, for $i = 3n, 3n-1, \ldots, 1$ we use our $n^c$ samples from $D_{L,r_i}$ to produce $n^c$ samples from $D_{L,r_{i-1}}$. The procedure that does this, called the iterative step, is the core of the algorithm and is described in Lemma 3.3. Notice that the condition in Lemma 3.3 is satisfied since for all $i \ge 1$, $r_i \ge r\,\alpha p/\sqrt n \ge \sqrt 2\,p\,\eta_\varepsilon(L)$. At the end of the loop, we end up with $n^c$ samples from $D_{L,r_0} = D_{L,r}$ and we complete the algorithm by simply outputting the first of those.
3.1 Bootstrapping
Lemma 3.2 (Bootstrapping)
There exists an efficient algorithm that, given any $n$-dimensional lattice $L$ and $r > 2^{2n}\lambda_n(L)$, outputs a sample from a distribution that is within statistical distance $2^{-\Omega(n)}$ of $D_{L,r}$.
Proof:

By using the LLL basis reduction algorithm [24], we obtain a basis for $L$ of length at most $2^n\lambda_n(L)$, and let $\mathcal{P}(L)$ be the parallelepiped generated by this basis. The sampling procedure samples a vector $y$ from $\nu_r$ and then outputs $y - (y \bmod \mathcal{P}(L)) \in L$. Notice that $\|y \bmod \mathcal{P}(L)\| \le n\,2^n\lambda_n(L) < n\,2^{-n}r$.

Our goal is to show that the resulting distribution is exponentially close to $D_{L,r}$. By Lemma 2.5, all but an exponentially small part of $D_{L,r}$ is concentrated on points of norm at most $\sqrt n\, r$. So consider any $x \in L$ with $\|x\| \le \sqrt n\, r$. By definition, the probability given to it by $D_{L,r}$ is $\rho_r(x)/\rho_r(L)$. By Lemma 2.14, the denominator satisfies $\rho_r(L) = r^n\det(L^*)\,\rho_{1/r}(L^*) \ge r^n\det(L^*)$ and hence the probability is at most $\rho_r(x)/(r^n\det(L^*))$. On the other hand, by Claim 2.1, the probability given to $x$ by our procedure is

$$\int_{\mathcal{P}(L)}\nu_r(x+u)\,du \ge (1-2^{-\Omega(n)})\,\det(L)\,\nu_r(x) = (1-2^{-\Omega(n)})\,\frac{\rho_r(x)}{r^n\det(L^*)}.$$

Together, these facts imply that our output distribution is within statistical distance $2^{-\Omega(n)}$ of $D_{L,r}$.
3.2 The iterative step
Lemma 3.3 (The iterative step)
Let $\varepsilon = \varepsilon(n)$ be a negligible function, $\alpha = \alpha(n) \in (0,1)$ be a real number, and $p = p(n) \ge 2$ be an integer. Assume that we have access to an oracle $W$ that solves $\mathrm{LWE}_{p,\bar\Psi_\alpha}$ given a polynomial number of samples. Then, there exists a constant $c > 0$ and an efficient quantum algorithm that, given any $n$-dimensional lattice $L$, a number $r > \sqrt 2\,p\,\eta_\varepsilon(L)$, and $n^c$ samples from $D_{L,r}$, produces a sample from $D_{L,r\sqrt n/(\alpha p)}$.
Note that the output distribution $D_{L,r\sqrt n/(\alpha p)}$ is taken with respect to the randomness (and quantum measurements) used in the algorithm, and not with respect to the input samples. In particular, this means that from the same set of $n^c$ samples from $D_{L,r}$ we can produce any polynomial number of samples from $D_{L,r\sqrt n/(\alpha p)}$.
Proof:

The algorithm consists of two main parts. The first part is shown in Lemma 3.4. There, we describe a (classical) algorithm that, using $W$ and the samples from $D_{L,r}$, solves $\mathrm{CVP}_{L^*,\alpha p/(\sqrt 2 r)}$. The second part is shown in Lemma 3.14. There, we describe a quantum algorithm that, given an oracle that solves $\mathrm{CVP}_{L^*,\alpha p/(\sqrt 2 r)}$, outputs a sample from $D_{L,r\sqrt n/(\alpha p)}$. This is the only quantum component in this paper. We note that the condition in Lemma 3.14 is satisfied since by Claim 2.13, $\alpha p/(\sqrt 2 r) < \alpha/(2\eta_\varepsilon(L)) < \lambda_1(L^*)/2$.
3.2.1 From samples to CVP
Our goal in this subsection is to prove the following.
Lemma 3.4 (First part of iterative step)
Let $\varepsilon = \varepsilon(n)$ be a negligible function, $p = p(n) \ge 2$ be an integer, and $\alpha = \alpha(n) \in (0,1)$ be a real number. Assume that we have access to an oracle $W$ that solves $\mathrm{LWE}_{p,\bar\Psi_\alpha}$ given a polynomial number of samples. Then, there exist a constant $c > 0$ and an efficient algorithm that, given any $n$-dimensional lattice $L$, a number $r > \sqrt 2\,p\,\eta_\varepsilon(L)$, and $n^c$ samples from $D_{L,r}$, solves $\mathrm{CVP}_{L^*,\alpha p/(\sqrt 2 r)}$.
For an $n$-dimensional lattice $L$, some $d < \lambda_1(L)/2$, and an integer $p \ge 2$, we say that an algorithm solves $\mathrm{CVP}^{(p)}_{L,d}$ if, given any point $x$ within distance $d$ of $L$, it outputs $L^{-1}\kappa_L(x) \bmod p$, the coefficient vector of the closest vector to $x$ reduced modulo $p$. We start with the following lemma, which shows a reduction from $\mathrm{CVP}_{L,d}$ to $\mathrm{CVP}^{(p)}_{L,d}$.
Lemma 3.5 (Finding coefficients modulo $p$ is sufficient)
There exists an efficient algorithm that, given a lattice $L$, a number $d < \lambda_1(L)/2$, and an integer $p \ge 2$, solves $\mathrm{CVP}_{L,d}$ given access to an oracle for $\mathrm{CVP}^{(p)}_{L,d}$.
Proof:

Our input is a point $x$ within distance $d$ of $L$. We define a sequence of points $x_1, x_2, \ldots$ as follows. Let $x_1 = x$, and let $a_i \in \mathbb{Z}_p^n$ be the coefficient vector of the closest lattice point to $x_i$, reduced modulo $p$. We define

$$x_{i+1} = \frac{x_i - L a_i}{p}.$$

Notice that the closest lattice point to $x_{i+1}$ is $(\kappa_L(x_i) - La_i)/p \in L$ and hence the distance of $x_{i+1}$ from $L$ is at most $d/p^i \le d/2^i$. Also note that this sequence can be computed by using the $\mathrm{CVP}^{(p)}_{L,d}$ oracle.

After $n$ steps, we have a point $x_{n+1}$ whose distance to the lattice is at most $d/2^n$. We now apply an algorithm for approximately solving the closest vector problem, such as Babai's nearest plane algorithm [8]. This yields a lattice point within distance $2^{n/2}\cdot d/2^n < \lambda_1(L)/2$ of $x_{n+1}$. Hence, it is the lattice point closest to $x_{n+1}$ and we managed to recover $\kappa_L(x_{n+1})$. Knowing $\kappa_L(x_{n+1})$ and $a_n$ (by using the oracle), we can now recover $\kappa_L(x_n) = p\,\kappa_L(x_{n+1}) + La_n$. Continuing this process, we can recover $\kappa_L(x_{n-1}), \ldots, \kappa_L(x_1)$. This completes the algorithm since $\kappa_L(x_1)$ is the closest point to $x$.
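For reference, here is a minimal sketch of Babai's nearest plane algorithm [8] invoked above (ours, not the paper's; its approximation guarantee assumes an LLL-reduced basis, which we do not recompute here):

```python
import numpy as np

def babai_nearest_plane(B, t):
    """Given a basis matrix B (columns are basis vectors) and a target t,
    return a nearby lattice vector."""
    n = B.shape[1]
    Bf = B.astype(float)
    # Gram-Schmidt orthogonalization of the basis columns
    Bstar = Bf.copy()
    for i in range(n):
        for j in range(i):
            mu = (Bf[:, i] @ Bstar[:, j]) / (Bstar[:, j] @ Bstar[:, j])
            Bstar[:, i] -= mu * Bstar[:, j]
    # walk down the nested sequence of hyperplanes, rounding one coefficient at a time
    v = np.array(t, dtype=float)
    coeffs = np.zeros(n)
    for i in reversed(range(n)):
        c = round((v @ Bstar[:, i]) / (Bstar[:, i] @ Bstar[:, i]))
        coeffs[i] = c
        v -= c * Bf[:, i]
    return Bf @ coeffs

B = np.array([[2, 1], [0, 2]])
print(babai_nearest_plane(B, [3.1, 0.9]))   # a lattice point near the target
```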
As we noted in the proof of Lemma 3.3, for our choice of $r$, $\alpha p/(\sqrt 2 r) < \lambda_1(L^*)/2$. Hence, in order to prove Lemma 3.4, it suffices to present an efficient algorithm for $\mathrm{CVP}^{(p)}_{L^*,\alpha p/(\sqrt 2 r)}$. We do this by combining two lemmas. The first, Lemma 3.7, shows an algorithm $W'$ that, given samples from $A_{s,\Psi_\beta}$ for some (unknown) $\beta \le \alpha$, outputs $s$ with probability exponentially close to $1$ by using $W$ as an oracle. Its proof is based on Lemma 3.6. The second, Lemma 3.11, is the main lemma of this subsection, and shows how to use $W'$ and the given samples from $D_{L,r}$ in order to solve $\mathrm{CVP}^{(p)}_{L^*,\alpha p/(\sqrt 2 r)}$.
Lemma 3.6 (Verifying solutions of LWE)
Let $p = p(n) \ge 2$ be some integer. There exists an efficient algorithm that, given $s' \in \mathbb{Z}_p^n$ and samples from $A_{s,\Psi_\beta}$ for some (unknown) $s \in \mathbb{Z}_p^n$ and $\beta \le 1$, outputs whether $s' = s$, and is correct with probability exponentially close to $1$.
We remark that the lemma holds also for larger values of $\beta$ with essentially the same proof.
Proof:

The idea is to perform a statistical test on samples from $A_{s,\Psi_\beta}$ that checks whether $s' = s$. Let $\xi$ be the distribution on $\mathbb{T}$ obtained by sampling $(a, b)$ from $A_{s,\Psi_\beta}$ and outputting $b - \langle a, s'\rangle/p \bmod 1$. The algorithm takes $N = \mathrm{poly}(n)$ samples $x_1, \ldots, x_N$ from $\xi$. It then computes $z := \frac1N\sum_{j=1}^N\cos(2\pi x_j)$. If $z > e^{-\pi}/2$, it decides that $s' = s$; otherwise, it decides that $s' \ne s$.

We now analyze this algorithm. Consider the distribution $\xi$. Notice that it can be obtained by sampling $e$ from $\Psi_\beta$, sampling $a$ uniformly from $\mathbb{Z}_p^n$, and outputting $\langle a, s - s'\rangle/p + e \bmod 1$. From this it easily follows that if $s' = s$, $\xi$ is exactly $\Psi_\beta$. Otherwise, if $s' \ne s$, we claim that $\xi$ has a period of $1/k$ for some integer $k \ge 2$. Indeed, let $j$ be an index on which $s_j \ne s'_j$. Then the distribution of $a_j(s_j - s'_j)/p \bmod 1$ is uniform on the multiples of $g/p$, where $g = \gcd(s_j - s'_j, p) < p$; in particular, it is periodic with period $1/k$ for some integer $k \ge 2$. Since a sample from $\xi$ can be obtained by adding a sample from this distribution and an independent sample from some other distribution, we obtain that $\xi$ also has the same period of $1/k$.

Consider the expectation

$$z^* := \operatorname*{E}_{x\sim\xi}[\cos(2\pi x)].$$

(We remark that this expectation is essentially the Fourier series of $\xi$ at point $1$ and that the following arguments can be explained in terms of properties of the Fourier series.) First, a routine calculation shows that for $s' = s$, $z^* = e^{-\pi\beta^2}$, which is at least $e^{-\pi}$ for $\beta \le 1$. Moreover, if $\xi$ has a period of $1/k$ for an integer $k \ge 2$, then

$$z^* = \operatorname*{E}_{x\sim\xi}[\cos(2\pi x)] = 0,$$

which implies that if $s' \ne s$ then $z^* = 0$. We complete the proof by noting that by the Chernoff bound, $|z - z^*| < e^{-\pi}/2$ with probability exponentially close to $1$.
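The test itself is a few lines (our sketch; the threshold is the illustrative constant from the proof above, not a tuned value):

```python
import math

def looks_like_secret(samples, s_cand, p, threshold=math.exp(-math.pi) / 2):
    """Average cos(2*pi*(b - <a, s'>)/p) over the samples: the average
    concentrates near exp(-pi*beta^2) when s' = s, and near 0 otherwise."""
    z = sum(math.cos(2 * math.pi *
                     (((b - sum(x * y for x, y in zip(a, s_cand))) % p) / p))
            for a, b in samples) / len(samples)
    return z > threshold
```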
Lemma 3.7 (Handling error $\Psi_\beta$ for unknown $\beta \le \alpha$)
Let $p = p(n) \ge 2$ be some integer and $\alpha = \alpha(n) \in (0,1)$. Assume that we have access to an oracle $W$ that solves $\mathrm{LWE}_{p,\bar\Psi_\alpha}$ by using a polynomial number of samples. Then, there exists an efficient algorithm $W'$ that, given samples from $A_{s,\Psi_\beta}$ for some (unknown) $\beta \le \alpha$, outputs $s$ with probability exponentially close to $1$.
Proof:

The proof is based on the following idea: by adding the right amount of noise, we can transform samples from $A_{s,\Psi_\beta}$ to samples from $A_{s,\Psi_\alpha}$ (or something sufficiently close to it). Assume that the number of samples required by $W$ is at most $n^{c_1}$ for some constant $c_1$. Let $Z$ be the set of all integer multiples of $\alpha^2/n^{2c_1}$ between $0$ and $\alpha^2$. For each $\gamma \in Z$, algorithm $W'$ does the following $n$ times. It takes $n^{c_1}$ samples from $A_{s,\Psi_\beta}$ and adds to the second element of each sample a noise sampled independently from $\Psi_{\sqrt\gamma}$. This creates $n^{c_1}$ samples taken from the distribution $A_{s,\Psi_{\sqrt{\beta^2+\gamma}}}$. It then applies $W$ and obtains some candidate $s''$. Using Lemma 3.6, it checks whether $s'' = s$. If the answer is yes, it outputs $s''$; otherwise, it continues.

We now show that $W'$ finds $s$ with probability exponentially close to $1$. By Lemma 3.6, if $W'$ outputs some value, then this value is correct with probability exponentially close to $1$. Hence, it is enough to show that in one of the iterations, $W$ outputs $s$. Consider the smallest $\gamma \in Z$ such that $\beta^2 + \gamma \ge \alpha^2$. Clearly, $\gamma \le \alpha^2$. Define $\beta' = \sqrt{\beta^2+\gamma}$. Then,

$$\alpha \le \beta' \le \sqrt{\alpha^2 + \alpha^2/n^{2c_1}} \le \alpha\left(1 + \frac{1}{n^{2c_1}}\right).$$

By Claim 2.2, the statistical distance between $\Psi_{\beta'}$ and $\Psi_\alpha$ is at most $9/n^{2c_1}$. Hence, the statistical distance between $n^{c_1}$ samples from $A_{s,\Psi_{\beta'}}$ and $n^{c_1}$ samples from $A_{s,\Psi_\alpha}$ is at most $9/n^{c_1}$. Therefore, for this choice of $\gamma$, $W$ outputs $s$ with probability at least $\frac12$. The probability that $s$ is not found in any of the $n$ calls to $W$ is at most $2^{-\Omega(n)}$.
For the analysis of our main procedure in Lemma 3.11, we will need the following claims regarding Gaussian measures on lattices. On first reading, the reader can just read the statements of Claim 3.8 and Corollary 3.10 and skip directly to Lemma 3.11. All claims show that in some sense, when working above the smoothing parameter, the discrete Gaussian measure behaves like the continuous Gaussian measure. We start with the following claim, showing that above the smoothing parameter, the discrete Gaussian measure is essentially invariant under shifts.
Claim 3.8
For any lattice $L$, $\varepsilon > 0$, $r \ge \eta_\varepsilon(L)$, and $c \in \mathbb{R}^n$,

$$\rho_r(L + c) \in \left[\frac{1-\varepsilon}{1+\varepsilon},\, 1\right]\cdot\rho_r(L).$$
Proof:

By the Poisson summation formula (Lemma 2.14) together with Eq. (9), $\rho_r(L+c) = r^n\det(L^*)\sum_{w\in L^*}e^{2\pi i\langle c, w\rangle}\rho_{1/r}(w)$. The term $w = 0$ contributes $r^n\det(L^*)$, and the absolute value of the sum of the remaining terms is at most $\rho_{1/r}(L^*\setminus\{0\}) \le \varepsilon$ times that; since moreover each phase has real part at most $1$, we get $\rho_r(L+c) \le \rho_r(L)$, and the claim follows.
The following claim (which is only used to establish the corollary following it) says that when adding a continuous Gaussian of width $s$ to a discrete Gaussian of width $r$, with both $r$ and $s$ sufficiently greater than the smoothing parameter, the resulting distribution is very close to a continuous Gaussian of the width we would expect, namely $\sqrt{r^2+s^2}$. To get some intuition on why we need to assume that both Gaussians are sufficiently wide, notice for instance that if the discrete Gaussian is very narrow, then it is concentrated on the origin, making the sum have width $s$ instead of $\sqrt{r^2+s^2}$. Also, if the continuous Gaussian is too narrow, then the discrete structure is still visible in the sum.
Claim 3.9

Let $L$ be a lattice, let $u \in \mathbb{R}^n$ be any vector, let $r, s > 0$ be two reals, and let $t$ denote $rs/\sqrt{r^2+s^2}$. Assume that $t \ge \eta_\varepsilon(L)$ for some $\varepsilon < \frac12$. Consider the continuous distribution on $\mathbb{R}^n$ obtained by sampling from $D_{L+u,r}$ and then adding a noise vector taken from $\nu_s$. Then, the statistical distance between this distribution and $\nu_{\sqrt{r^2+s^2}}$ is at most $4\varepsilon$.

Proof:
Corollary 3.10
Let $L$ be a lattice, let $z, u \in \mathbb{R}^n$ be vectors, and let $r, \alpha > 0$ be two reals. Assume that $\left(1/r^2 + (\|z\|/\alpha)^2\right)^{-1/2} \ge \eta_\varepsilon(L)$ for some $\varepsilon < \frac12$. Then the distribution of $\langle z, v\rangle + e$, where $v$ is distributed according to $D_{L+u,r}$ and $e$ is a normal variable with mean $0$ and standard deviation $\alpha/\sqrt{2\pi}$, is within statistical distance $4\varepsilon$ of a normal variable with mean $0$ and standard deviation $\sqrt{(r\|z\|)^2+\alpha^2}/\sqrt{2\pi}$. In particular, since statistical distance cannot increase by applying a function, the distribution of $\langle z, v\rangle + e \bmod 1$ is within statistical distance $4\varepsilon$ of $\Psi_{\sqrt{(r\|z\|)^2+\alpha^2}}$.
Proof:

We first observe that the distribution of $\langle z, v\rangle + e$ is exactly the same as that of $\langle z, v + w\rangle$ where $w$ is distributed as the continuous Gaussian $\nu_{\alpha/\|z\|}$. Next, by Claim 3.9 (applied with $s = \alpha/\|z\|$, noting that $rs/\sqrt{r^2+s^2} = (1/r^2 + (\|z\|/\alpha)^2)^{-1/2} \ge \eta_\varepsilon(L)$), we know that the distribution of $v + w$ is within statistical distance $4\varepsilon$ of the continuous Gaussian $\nu_{\sqrt{r^2+(\alpha/\|z\|)^2}}$. Taking the inner product of this continuous Gaussian with $z$ leads to a normal distribution with mean $0$ and standard deviation $\sqrt{(r\|z\|)^2+\alpha^2}/\sqrt{2\pi}$, and we complete the proof by using the fact that statistical distance cannot increase by applying a function (inner product with $z$ in this case).
Lemma 3.11 (Main procedure of the first part)
Let $\varepsilon = \varepsilon(n)$ be a negligible function, $p = p(n) \geq 2$ be an integer, and $\alpha = \alpha(n) \in (0,1)$ be a real number. Assume that we have access to an oracle $W$ that for all $s$, finds $s$ given a polynomial number of samples from $A_{s,\Psi_\alpha}$ (without knowing $s$). Then, there exists an efficient algorithm that given an $n$-dimensional lattice $L$, a number $r > \sqrt{2}\, p\, \eta_\varepsilon(L)$, and a polynomial number of samples from $D_{L,r}$, solves $\mathrm{CVP}_{L^*,\, \alpha p/(\sqrt{2} r)}$.
-
Proof:
We describe a procedure that given within distance of , outputs samples from the distribution for some where . By running this procedure a polynomial number of times and then using , we can find .
The procedure works as follows. We sample a vector from , and let . We then output
(12) where is chosen according to a normal distribution with standard deviation . We claim that the distribution given by this procedure is within negligible statistical distance of for some .
We first notice that the distribution of is very close to uniform. Indeed, the probability of obtaining each is proportional to . Using and Claim 3.8, the latter is , which implies that the statistical distance between the distribution of and the uniform distribution is negligible.
Next, we condition on any fixed value of and consider the distribution of the second element in (12). Define and note that . Then,
Now,
since . In words, this says that the inner product between and (and in fact, between any vector in and any vector in ) is the same as the inner product between the corresponding coefficient vectors. Since the coefficient vectors are integer,
from which it follows that is exactly .
We complete the proof by applying Corollary 3.10, which shows that the distribution of the remaining part is within negligible statistical distance of for , as required. Here we used that the distribution of is (since we are conditioning on ), the distribution of is normal with mean and standard deviation , and that
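Structurally, the sample-generating procedure above is short; the following heavily hedged numpy sketch records only its shape. Here sample_DLr, the basis B of the lattice, the target x near the dual lattice, and noise_width are stand-ins, and the exact width of the added noise is the one fixed by the analysis above, not a value this sketch commits to.

import numpy as np

def lwe_sample(B, x, p, noise_width, sample_DLr):
    """Shape of one iteration: draw a discrete-Gaussian lattice point,
    reduce its coefficient vector mod p (the 'a' part), and pair it with
    a perturbed inner product on the torus (the 'b' part)."""
    v = sample_DLr()                          # lattice point, v = B @ z
    z = np.rint(np.linalg.solve(B, v)).astype(int)
    a = z % p                                 # close to uniform over Z_p^n
    e = np.random.normal(0.0, noise_width / np.sqrt(2 * np.pi))
    b = (np.dot(x, v) / p + e) % 1.0          # torus element in [0, 1)
    return a, b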
3.2.2 From CVP to samples
In this subsection we describe a quantum procedure that uses a CVP oracle in order to create samples from the discrete Gaussian distribution. We assume familiarity with some basic notions of quantum computation, such as (pure) states, measurements, and the quantum Fourier transform. See, e.g., [31] for a good introduction. For clarity, we often omit the normalization factors from quantum states.
The following lemma shows that we can efficiently create a ‘discrete quantum Gaussian state’ of width as long as is large enough compared with . It can be seen as the quantum analogue of Lemma 3.2. The assumption that is essentially without loss of generality since a lattice with rational coordinates can always be rescaled so that .
Lemma 3.12
There exists an efficient quantum algorithm that, given an -dimensional lattice and , outputs a state that is within distance of the normalized state corresponding to
(13) $\sum_{x \in L} \rho_r(x)\, |x\rangle$
-
Proof:
We start by creating the ‘one-dimensional Gaussian state’
(14) This state can be created efficiently using a technique of Grover and Rudolph [18], who show that in order to create such a state, it suffices to be able to compute for any the sum to within good precision. This can be done using the same standard techniques used in sampling from the normal distribution.
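The only classical ingredient the Grover–Rudolph construction needs here is the ability to approximate partial sums of the Gaussian mass over integer intervals. A minimal sketch of that subroutine, with a truncated direct sum standing in for whatever precision management an actual implementation would use:

import numpy as np

def gaussian_interval_mass(lo, hi, r):
    """Approximate the sum over integers x in [lo, hi] of exp(-pi * (x/r)^2),
    the quantity the Grover-Rudolph recursion conditions on at each level."""
    xs = np.arange(lo, hi + 1)
    return float(np.exp(-np.pi * (xs / r) ** 2).sum())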
Repeating the procedure described above times creates a system whose state is the -fold tensor product of the state in Eq. (14), which can be written as
Since , Lemma 2.5 implies that this state is within distance of
(15) and hence for our purposes we can assume that we have generated the state in Eq. (15).
Next, using the LLL basis reduction algorithm [24], we obtain a basis for of length at most and let be the parallelepiped generated by this basis. We now compute in a new register and measure it. Let denote the result and note that . The state we obtain after the measurement is
Finally, we subtract from our register, and obtain
Our goal is to show that this state is within distance of the one in Eq. (13). First, by Lemma 2.5, all but an exponentially small part of the norm of the state in Eq. (13) is concentrated on points of norm at most . So consider any with . The amplitude squared given to it in Eq. (13) is . By Lemma 2.14, the denominator is and hence the amplitude squared is at most .
On the other hand, the amplitude squared given to by our procedure is . By Lemma 2.14, the denominator is
To obtain this inequality, first note that by the easy part of Lemma 2.3, , and then apply Lemma 2.5. Moreover, by Claim 2.1, the numerator is at least . Hence, the amplitude squared given to is at least , as required.
For a lattice and a positive integer , we denote by the lattice obtained by scaling down by a factor of . The following technical claim follows from the fact that almost all the mass of is on points of norm at most .⁶

⁶ The proof originally provided in the published journal version of this paper (J. ACM, 56(6), 34, 2009) contained a bug. The bug was found and fixed by the author in summer 2023, and also independently by Yilei Chen, Zihan Hu, Qipeng Liu, Han Luo, and Yaxin Tu in late 2023. The current statement of the claim is identical to the original one; however, the condition is no longer used in the proof.
Claim 3.13
Let be an integer and be an -dimensional lattice satisfying . Let be some basic parallelepiped of . Then, the distance between the normalized quantum states corresponding to
is .
-
Proof:
Let be the norm of . In the following we show that the distance between and is at most . This is enough to establish that the distance between the normalized quantum states corresponding to and is exponentially small.
Define the -dimensional lattice
where the union is of disjoint sets. It follows that .
Next, notice that
Therefore, where is the subset of containing all pairs such that both and . Since , we can apply Lemma 2.5 (with ) and obtain that , as desired.
We now prove the main lemma of this subsection.
Lemma 3.14 (Second part of iterative step)
There exists an efficient quantum algorithm that, given any $n$-dimensional lattice $L$, a number $d < \lambda_1(L^*)/2$, and an oracle that solves $\mathrm{CVP}_{L^*,d}$, outputs a sample from $D_{L,\sqrt{n}/(\sqrt{2}d)}$.
-
Proof:
By scaling, we can assume without loss of generality that . Let be a large enough integer. We can assume that is polynomial in the input size (since such an can be computed in polynomial time given the lattice ). Our first step is to create a state exponentially close to
(16) This is a state on qubits, a number that is polynomial in the input size. To do so, we first use Lemma 3.12 with and the lattice to create the state
By Lemma 2.5, this is exponentially close to
Next, we compute in a new register and obtain
Using the CVP oracle, we can recover from . This allows us to uncompute the first register and obtain
Using Claim 3.13, this state is exponentially close to the required state (16).
In the second step, we apply the quantum Fourier transform. First, using the natural mapping between and , we can rewrite (16) as
We now apply the quantum Fourier transform on . We obtain a state in which the amplitude of for is proportional to
where the last equality follows from Lemma 2.14 and Eq. (10). Hence, the resulting state can be equivalently written as
Notice that . Hence, we can apply Claim 3.13 to the lattice , and obtain that this state is exponentially close to
We measure this state and obtain for some vector with . Since is within of the lattice , and , we can recover by using, say, Babai’s nearest plane algorithm [8]. The output of the algorithm is .
We claim that the distribution of is exponentially close to . Indeed, the probability of obtaining any is proportional to . It remains to notice that by Lemma 2.5, all but an exponentially small fraction of the probability distribution is on points of norm less than .
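The final decoding step invokes Babai's nearest plane algorithm [8]. For concreteness, here is a bare-bones numpy sketch of that algorithm; it assumes an LLL-reduced basis for its distance guarantee and is an illustration rather than the exact routine used above.

import numpy as np

def babai_nearest_plane(B, t):
    """Nearest plane: peel off one basis direction at a time using the
    Gram-Schmidt orthogonalization (obtained here via QR). Columns of B
    form the basis; returns a lattice point close to the target t."""
    B = np.asarray(B, dtype=float)
    t = np.asarray(t, dtype=float)
    Q, R = np.linalg.qr(B)                  # Q columns: Gram-Schmidt directions
    b = t.copy()
    for j in range(B.shape[1] - 1, -1, -1):
        c = np.rint(np.dot(Q[:, j], b) / R[j, j])
        b -= c * B[:, j]                    # remove c copies of basis vector j
    return t - b                            # the lattice point found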
3.3 Standard lattice problems
We now complete the proof of the main theorem by reducing the standard lattice problems GapSVP and SIVP to DGS. We start with SIVP. The basic idea of the reduction is simple: we call the DGS oracle enough times. We show that with high probability, there are short linearly independent vectors among the returned vectors. We prove this by using the following lemma, which appeared in the preliminary version of [29]. We include the proof since only a proof sketch was given there.
Lemma 3.15
Let be an -dimensional lattice and let be such that where . Then for any subspace of dimension at most the probability that where is chosen from is at least .
- Proof:
Corollary 3.16
Let be an -dimensional lattice and let be such that where . Then, the probability that a set of vectors chosen independently from contains no linearly independent vectors is exponentially small.
-
Proof:
Let be vectors chosen independently from . For , let be the event that
Clearly, if none of the ’s happens, then . Hence, in order to complete the proof it suffices to show that for all , . So fix some , and let us condition on some fixed choice of such that . By Lemma 3.15, the probability that
is at most . This implies that , as required.
In the following lemma we give the reduction from SIVP (in fact, GIVP) to DGS. It shows that under the assumptions of Theorem 3.1, there exists an efficient quantum algorithm for . By Lemma 2.12, this algorithm also solves .
Lemma 3.17
For any and any , there is a polynomial time reduction from to .
-
Proof:
As mentioned above, the idea of the reduction is to simply call the DGS oracle in an attempt to find short linearly independent vectors. One technical complication is that the function is not necessarily efficiently computable, and hence we do not know which parameter to give the DGS oracle. The solution is easy: we just try many values of and take the shortest set of linearly independent vectors found.
We now present the reduction in detail. The input to the reduction is a lattice . We first apply the LLL algorithm [24] to obtain linearly independent vectors of length at most . Let denote the resulting set, and let be the length of the longest vector in . By construction we have . For each , call the DGS oracle times with the pair where , and let be the resulting set of vectors. At the end, look for a set of linearly independent vectors in each of , and output the shortest set found.
We now prove correctness. If then is already shorter than and so we are done. Otherwise, let be such that . Such an must exist by Claim 2.13. By Corollary 3.16, contains linearly independent vectors with probability exponentially close to . Moreover, by Lemma 2.5, all vectors in are of length at most with probability exponentially close to . Hence, our reduction outputs a set of linearly independent vectors of length at most , as required.
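Schematically, the reduction is just this search loop. In the numpy sketch below, dgs_oracle(r) is a stand-in for the DGS oracle, the initial basis plays the role of the LLL-reduced set, and the number of widths tried is left as a parameter:

import numpy as np

def max_norm(vectors):
    return max(np.linalg.norm(v) for v in vectors)

def extract_independent(samples, n):
    """Greedily collect n linearly independent vectors, or return None."""
    chosen = []
    for v in samples:
        if np.linalg.matrix_rank(np.column_stack(chosen + [v])) == len(chosen) + 1:
            chosen.append(v)
            if len(chosen) == n:
                return chosen
    return None

def sivp_via_dgs(reduced_basis, dgs_oracle, num_widths, samples_per_width):
    """Sweep geometrically spaced candidate widths, query the oracle at
    each, and keep the shortest full-rank set of returned vectors."""
    n = reduced_basis.shape[1]
    best = [reduced_basis[:, j].astype(float) for j in range(n)]
    lam = max_norm(best)
    for i in range(num_widths):
        S = [dgs_oracle(lam * 2.0 ** (-i)) for _ in range(samples_per_width)]
        cand = extract_independent(S, n)
        if cand is not None and max_norm(cand) < max_norm(best):
            best = cand
    return best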
We now present the reduction from GapSVP to DGS. We first define the decision version of the closest vector problem (GapCVP) and a slight variant of it.
Definition 3.18
An instance of $\mathrm{GapCVP}_\gamma$ is given by an $n$-dimensional lattice $L$, a vector $t \in \mathbb{R}^n$, and a number $d > 0$. In YES instances, $\mathrm{dist}(t, L) \leq d$, whereas in NO instances, $\mathrm{dist}(t, L) > \gamma d$.
Definition 3.19
An instance of $\mathrm{GapCVP}'_\gamma$ is given by an $n$-dimensional lattice $L$, a vector $t \in \mathbb{R}^n$, and a number $d > 0$. In YES instances, $\mathrm{dist}(t, L) \leq d$. In NO instances, $\lambda_1(L) > \gamma d$ and $\mathrm{dist}(t, L) > \gamma d$.
In [17] it is shown that for any $\gamma \geq 1$, there is a polynomial time reduction from $\mathrm{GapSVP}_\gamma$ to $\mathrm{GapCVP}'_\gamma$ (see also Lemma 5.22 in [29]). Hence, it suffices to show a reduction from $\mathrm{GapCVP}'$ to DGS. This reduction is given in the following lemma. By using Lemma 2.11, we obtain that under the assumptions of Theorem 3.1 there exists an efficient quantum algorithm for $\mathrm{GapCVP}'$ (and hence also for GapSVP).
Lemma 3.20
For any , there is a polynomial time reduction from to .
-
Proof:
The main component in our reduction is the NP verifier for coGapCVP shown in [1]. In more detail, the authors of [1] present an efficient algorithm, call it , whose input consists of an -dimensional lattice , a vector , a number , and a sequence of vectors in for some . When , the algorithm is guaranteed to reject. When and are chosen from the distribution , the algorithm accepts with probability exponentially close to .
The input to the reduction is an -dimensional lattice , a vector , and a number . We call the DGS oracle times with the lattice and the value to obtain vectors . We then apply with , , , and the vectors . We accept if and only if rejects.
To prove correctness, notice first that in the case of a instance, , and hence must reject (irrespective of the ’s). In the case of a instance we have that , and hence are guaranteed to be valid samples from . Moreover, , and hence accepts with probability exponentially close to .
4 Variants of the LWE problem
In this section, we consider several variants of the LWE problem. Through a sequence of elementary reductions, we prove that all problems are as hard as LWE. The results of this section are summarized in Lemma 4.4.
Lemma 4.1 (Average-case to Worst-case)
Let $n, p \geq 1$ be some integers and $\chi$ be some distribution on $\mathbb{Z}_p$. Assume that we have access to a distinguisher $W$ that distinguishes $A_{s,\chi}$ from $U$ for a non-negligible fraction of all possible $s$. Then there exists an efficient algorithm $W'$ that for all $s$ accepts with probability exponentially close to 1 on inputs from $A_{s,\chi}$ and rejects with probability exponentially close to 1 on inputs from $U$.
-
Proof:
The proof is based on the following transformation. For any $t \in \mathbb{Z}_p^n$ consider the function $f_t : \mathbb{Z}_p^n \times \mathbb{Z}_p \to \mathbb{Z}_p^n \times \mathbb{Z}_p$ defined by
$$f_t(a, b) = (a,\, b + \langle a, t\rangle).$$
It is easy to see that this function transforms the distribution $A_{s,\chi}$ into $A_{s+t,\chi}$. Moreover, it transforms the uniform distribution $U$ into itself.
Assume that for of all possible , the acceptance probabilities of on inputs from and on inputs from differ by at least . We construct as follows. Let denote the unknown input distribution. Repeat the following times. Choose a vector uniformly at random. Then estimate the acceptance probability of on and on by calling times on each of the input distributions. By the Chernoff bound, this allows us to obtain estimates that, with probability exponentially close to , are within of the true acceptance probabilities. If the two estimates differ by more than then we stop and decide to accept. Otherwise we continue. If the procedure ends without accepting, we reject.
We now prove that distinguishes from for all . First, we claim that when is , the acceptance probability of is exponentially close to . Indeed, in this case , and therefore the two estimates that performs are of the same distribution. The probability that the estimates differ by more than is exponentially small. Next, consider the case that is for some . In each of the iterations, we are considering the distribution for some uniformly chosen . Notice that the distribution of is uniform on . Hence, with probability exponentially close to , in one of the iterations, is such that the acceptance probabilities of on inputs from and on inputs from differ by at least . Since our estimates are within , accepts with probability exponentially close to .
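A compact numpy sketch of this worst-case to average-case wrapper follows. The names are illustrative: draw() yields one sample from the unknown input distribution, W is the assumed average-case distinguisher (a predicate on a single pair here, for simplicity), and the thresholds are placeholders for the estimates in the proof.

import numpy as np

def rerandomize(sample, t, p):
    """The transformation from the proof: (a, b) -> (a, b + <a, t>) maps
    A_{s,chi} to A_{s+t,chi} and fixes the uniform distribution."""
    a, b = sample
    return a, (b + a @ t) % p

def worst_to_average(draw, W, n, p, iters, calls, gap):
    """Shift the secret by random t's and compare W's empirical acceptance
    rate on the shifted input against fresh uniform pairs."""
    uniform = lambda: (np.random.randint(0, p, size=n), np.random.randint(0, p))
    for _ in range(iters):
        t = np.random.randint(0, p, size=n)
        acc_shift = np.mean([W(rerandomize(draw(), t, p)) for _ in range(calls)])
        acc_unif = np.mean([W(uniform()) for _ in range(calls)])
        if abs(acc_shift - acc_unif) > gap:
            return True     # consistent with an LWE input
    return False            # consistent with a uniform input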
Lemma 4.2 (Decision to Search)
Let $n \geq 1$ be some integer, $p \leq \mathrm{poly}(n)$ be a prime, and $\chi$ be some distribution on $\mathbb{Z}_p$. Assume that we have access to a procedure $W$ that for all $s$ accepts with probability exponentially close to 1 on inputs from $A_{s,\chi}$ and rejects with probability exponentially close to 1 on inputs from $U$. Then, there exists an efficient algorithm $W'$ that, given samples from $A_{s,\chi}$ for some $s$, outputs $s$ with probability exponentially close to 1.
-
Proof:
Let us show how $W'$ finds $s_1 \in \mathbb{Z}_p$, the first coordinate of $s$. Finding the other coordinates is similar. For any $k \in \mathbb{Z}_p$, consider the following transformation. Given a pair $(a, b)$ we output the pair $(a + (l, 0, \ldots, 0),\, b + l \cdot k)$ where $l \in \mathbb{Z}_p$ is chosen uniformly at random. It is easy to see that this transformation takes the uniform distribution into itself. Moreover, if $k = s_1$ then this transformation also takes $A_{s,\chi}$ to itself. Finally, if $k \neq s_1$ then it takes $A_{s,\chi}$ to the uniform distribution (note that this requires $p$ to be prime). Hence, using $W$, we can test whether $k = s_1$. Since there are only $p$ possibilities for $s_1$ we can try all of them.
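A small numpy sketch of this search-from-decision step, with draw() supplying LWE samples and W_decide the assumed decision procedure, both illustrative:

import numpy as np

def guess_first_coordinate(draw, W_decide, p, calls):
    """For each guess k of s_1, apply (a, b) -> (a + (l,0,...,0), b + l*k)
    with random l. If k = s_1 the LWE distribution is preserved; otherwise
    (p prime) the result is uniform, which W_decide detects."""
    for k in range(p):
        def transformed():
            a, b = draw()
            l = np.random.randint(0, p)
            a2 = a.copy()
            a2[0] = (a2[0] + l) % p
            return a2, (b + l * k) % p
        if all(W_decide(transformed()) for _ in range(calls)):
            return k        # distribution stayed LWE-like, so k = s_1
    return None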
Lemma 4.3 (Discrete to Continuous)
Let $n, p \geq 1$ be some integers, let $\phi$ be some probability density function on $\mathbb{T}$, and let $\bar\phi$ be its discretization to $\mathbb{Z}_p$. Assume that we have access to an algorithm $W$ that solves $\mathrm{LWE}_{p,\bar\phi}$. Then, there exists an efficient algorithm $W'$ that solves $\mathrm{LWE}_{p,\phi}$.
-
Proof:
Algorithm $W'$ simply takes samples from $A_{s,\phi}$ and discretizes the second element to obtain samples from $A_{s,\bar\phi}$. It then applies $W$ with these samples in order to find $s$.
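The discretization step, which is assumed here to round a torus element to the nearest multiple of $1/p$ (matching the convention recalled in the proof of Claim 5.2 below), is one line:

import numpy as np

def discretize(sample, p):
    """Round the torus component b in [0, 1) to the nearest multiple of 1/p,
    producing a sample whose second component lies in Z_p."""
    a, b = sample
    return a, int(np.rint(b * p)) % p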
By combining the three lemmas above, we obtain
Lemma 4.4
Let $n \geq 1$ be an integer and $p \leq \mathrm{poly}(n)$ be a prime. Let $\phi$ be some probability density function on $\mathbb{T}$ and let $\bar\phi$ be its discretization to $\mathbb{Z}_p$. Assume that we have access to a distinguisher $W$ that distinguishes $A_{s,\bar\phi}$ from $U$ for a non-negligible fraction of all possible $s$. Then, there exists an efficient algorithm $W'$ that solves $\mathrm{LWE}_{p,\phi}$.
5 Public Key Cryptosystem
We let $n$ be the security parameter of the cryptosystem. Our cryptosystem is parameterized by two integers $m, p$ and a probability distribution $\chi$ on $\mathbb{Z}_p$. A setting of these parameters that guarantees both security and correctness is the following. Choose $p$ to be some prime number between $n^2$ and $2n^2$ and let $m = (1+\varepsilon)(n+1)\log p$ for some arbitrary constant $\varepsilon > 0$. The probability distribution $\chi$ is taken to be $\bar\Psi_{\alpha(n)}$ for $\alpha(n) = o(1/(\sqrt{n}\log n))$, i.e., $\alpha$ is such that $\lim_{n\to\infty} \alpha(n)\cdot\sqrt{n}\log n = 0$. For example, we can choose $\alpha(n) = 1/(\sqrt{n}\log^2 n)$. In the following description, all additions are performed in $\mathbb{Z}_p$, i.e., modulo $p$.
-
•
Private key: Choose $s \in \mathbb{Z}_p^n$ uniformly at random. The private key is $s$.
-
•
Public Key: For $i = 1, \ldots, m$, choose vectors $a_i \in \mathbb{Z}_p^n$ independently from the uniform distribution. Also choose elements $e_i \in \mathbb{Z}_p$ independently according to $\chi$. The public key is given by $(a_i, b_i)_{i=1}^m$ where $b_i = \langle a_i, s\rangle + e_i$.
-
•
Encryption: In order to encrypt a bit we choose a random set $S$ uniformly among all $2^m$ subsets of $[m]$. The encryption is $\left(\sum_{i \in S} a_i,\, \sum_{i \in S} b_i\right)$ if the bit is 0 and $\left(\sum_{i \in S} a_i,\, \lfloor p/2\rfloor + \sum_{i \in S} b_i\right)$ if the bit is 1.
-
•
Decryption: The decryption of a pair $(a, b)$ is 0 if $b - \langle a, s\rangle$ is closer to 0 than to $\lfloor p/2\rfloor$ modulo $p$. Otherwise, the decryption is 1.
Notice that with our choice of parameters, the public key size is $O(mn\log p) = \tilde O(n^2)$ and the encryption process increases the size of a message by a factor of $O(n\log p) = \tilde O(n)$. In fact, it is possible to reduce the size of the public key to $O(m\log p) = \tilde O(n)$ by the following idea of Ajtai [3]. Assume all users of the cryptosystem share some fixed (and trusted) random choice of $a_1, \ldots, a_m$. This can be achieved by, say, distributing these vectors as part of the encryption and decryption software. Then the public key need only consist of $b_1, \ldots, b_m$. This modification does not affect the security of the cryptosystem.
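For concreteness, here is a minimal numpy sketch of the scheme just described. It is an illustration under stated assumptions, not a faithful implementation: the error is drawn as a rounded Gaussian as a stand-in for the distribution $\chi$, and no care is taken with constant-time arithmetic or cryptographic randomness.

import numpy as np

rng = np.random.default_rng()

def keygen(n, m, p, alpha):
    s = rng.integers(0, p, size=n)                    # private key
    A = rng.integers(0, p, size=(m, n))               # public vectors a_1..a_m
    # rounded Gaussian standing in for chi; width alpha means std alpha/sqrt(2*pi)
    e = np.rint(rng.normal(0, alpha * p / np.sqrt(2 * np.pi), size=m)).astype(int)
    b = (A @ s + e) % p                               # b_i = <a_i, s> + e_i
    return s, (A, b)

def encrypt(pub, bit, p):
    A, b = pub
    S = rng.integers(0, 2, size=A.shape[0]).astype(bool)  # random subset of [m]
    return A[S].sum(axis=0) % p, (int(b[S].sum()) + bit * (p // 2)) % p

def decrypt(s, ct, p):
    a, c = ct
    d = int((c - a @ s) % p)
    dist0 = min(d, p - d)                              # distance to 0 mod p
    dist1 = min(abs(d - p // 2), p - abs(d - p // 2))  # distance to floor(p/2)
    return 0 if dist0 < dist1 else 1

For instance, with $n = 256$, the prime $p = 65537$ (which lies between $n^2$ and $2n^2$), $m = \lceil 1.1\,(n+1)\log_2 p\rceil$, and $\alpha = 1/(\sqrt{n}\log_2^2 n)$, round-tripping a bit through keygen, encrypt, and decrypt should succeed with overwhelming probability.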
We next prove that under a certain condition on $\chi$, $m$, and $p$, the probability of decryption error is small. We later show that our choice of parameters satisfies this condition. For the following two lemmas we need to introduce some additional notation. For a distribution $\chi$ on $\mathbb{Z}_p$ and an integer $k \geq 0$, we define $\chi^{\star k}$ as the distribution obtained by summing together $k$ independent samples from $\chi$, where addition is performed in $\mathbb{Z}_p$ (for $k = 0$ we define $\chi^{\star 0}$ as the distribution that is constantly 0). For a probability distribution $\phi$ on $\mathbb{T}$ we define $\phi^{\star k}$ similarly. For an element $x \in \mathbb{Z}_p$ we define $|x|$ as the integer $x$ if $x \in \{0, 1, \ldots, \lfloor p/2\rfloor\}$ and as the integer $p - x$ otherwise. In other words, $|x|$ represents the distance of $x$ from 0. Similarly, for $x \in \mathbb{T}$, we define $|x|$ as $x$ for $x \in [0, \tfrac{1}{2}]$ and as $1 - x$ otherwise.
Lemma 5.1 (Correctness)
Let $\delta > 0$. Assume that for any $k \in \{0, 1, \ldots, m\}$, $\chi^{\star k}$ satisfies that
$$\Pr_{e \leftarrow \chi^{\star k}}\left[\,|e| < \lfloor p/2\rfloor/2\,\right] > 1 - \delta.$$
Then, the probability of decryption error is at most $\delta$. That is, for any bit $c \in \{0,1\}$, if we use the protocol above to choose private and public keys, encrypt $c$, and then decrypt the result, then the outcome is $c$ with probability at least $1 - \delta$.
-
Proof:
Consider first an encryption of 0. It is given by $(a, b) = \left(\sum_{i \in S} a_i, \sum_{i \in S} b_i\right)$ for some $S \subseteq [m]$, and
$$b - \langle a, s\rangle = \sum_{i \in S} b_i - \Big\langle \sum_{i \in S} a_i,\, s\Big\rangle = \sum_{i \in S} e_i.$$
Hence, $b - \langle a, s\rangle$ is exactly $\sum_{i \in S} e_i$. The distribution of the latter is $\chi^{\star|S|}$. According to our assumption, $\left|\sum_{i \in S} e_i\right|$ is less than $\lfloor p/2\rfloor/2$ with probability at least $1 - \delta$. In this case, it is closer to 0 than to $\lfloor p/2\rfloor$ and therefore the decryption is correct. The proof for an encryption of 1 is similar.
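A quick self-contained simulation of the correctness condition under the parameters suggested in this section (with a rounded Gaussian again standing in for $\chi$, so the numbers are indicative only):

import numpy as np

n = 256
p = 65537                      # a prime between n^2 and 2n^2
alpha = 1.0 / (np.sqrt(n) * np.log2(n) ** 2)
m = int(1.1 * (n + 1) * np.log2(p))
trials = 1000
# sum m rounded-Gaussian errors and see how often the total escapes the
# correctness window (-floor(p/2)/2, floor(p/2)/2)
e = np.rint(np.random.normal(0, alpha * p / np.sqrt(2 * np.pi), size=(trials, m)))
sums = e.sum(axis=1)
print("empirical decryption-error rate:", np.mean(np.abs(sums) >= (p // 2) / 2))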
Claim 5.2
For our choice of parameters it holds that for any $k \leq m$,
$$\Pr_{e \leftarrow \bar\Psi_\alpha^{\star k}}\left[\,|e| < \lfloor p/2\rfloor/2\,\right] > 1 - \delta(n)$$
for some negligible function $\delta(n)$.
-
Proof:
A sample from can be obtained by sampling from and outputting . Notice that this value is at most away from . Hence, it is enough to show that with high probability. This condition is equivalent to the condition that . Since is distributed as , and , the probability that is for some negligible function .
In order to prove the security of the system, we need the following special case of the leftover hash lemma that appears in [19]. We include a proof for completeness.
Claim 5.3
Let $G$ be some finite Abelian group and let $l$ be some integer. For any $l$ elements $g_1, \ldots, g_l \in G$, consider the statistical distance between the uniform distribution on $G$ and the distribution given by the sum of a random subset of $g_1, \ldots, g_l$. Then the expectation of this statistical distance over a uniform choice of $g_1, \ldots, g_l \in G$ is at most $\sqrt{|G|/2^l}$. In particular, the probability that this statistical distance is more than $(|G|/2^l)^{1/4}$ is at most $(|G|/2^l)^{1/4}$.
-
Proof:
For a choice $g = (g_1, \ldots, g_l)$ of $l$ elements from $G$, let $P_g$ be the distribution of the sum of a random subset of $g_1, \ldots, g_l$, i.e.,
$$P_g(x) = \frac{1}{2^l}\,\Big|\Big\{ S \subseteq [l] \;:\; \sum_{i \in S} g_i = x \Big\}\Big|.$$
In order to show that this distribution is close to uniform, we compute its $\ell_2$ norm and note that it is very close to $\sqrt{1/|G|}$. From this it will follow that the distribution must be close to the uniform distribution. The squared $\ell_2$ norm of $P_g$ is given by
$$\|P_g\|_2^2 = \sum_{x \in G} P_g(x)^2 = \Pr_{S, S'}\Big[\sum_{i \in S} g_i = \sum_{i \in S'} g_i\Big],$$
where $S, S'$ are two independent uniformly random subsets of $[l]$.
Taking expectation over , and using the fact that for any , , we obtain that
Finally, the expected distance from the uniform distribution is
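A tiny numeric illustration of Claim 5.3 for the group $G = \mathbb{Z}_q$ (exhaustively enumerating subsets, so only feasible for small $l$):

import numpy as np
from itertools import product

q, l, trials = 17, 10, 200
dists = []
for _ in range(trials):
    g = np.random.randint(0, q, size=l)
    counts = np.zeros(q)
    for mask in product((0, 1), repeat=l):        # all 2^l subsets
        counts[int(np.dot(mask, g)) % q] += 1
    P = counts / 2 ** l
    dists.append(0.5 * np.abs(P - 1.0 / q).sum())  # statistical distance
print("mean distance:", np.mean(dists), " bound sqrt(|G|/2^l):", np.sqrt(q / 2 ** l))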
We now prove that our cryptosystem is semantically secure, i.e., that it is hard to distinguish between encryptions of 0 and encryptions of 1. More precisely, we show that if such a distinguisher exists, then there exists a distinguisher that distinguishes between $A_{s,\chi}$ and $U$ for a non-negligible fraction of all possible $s$. If $p \leq \mathrm{poly}(n)$ and $p$ is a prime, then by Lemma 4.4, this also implies an efficient (classical) algorithm that solves $\mathrm{LWE}_{p,\Psi_\alpha}$. This in turn implies, by Theorem 3.1, an efficient quantum algorithm for DGS. Finally, by Lemma 3.17 we also obtain an efficient quantum algorithm for SIVP, and by Lemma 3.20 we obtain an efficient quantum algorithm for GapSVP.
Lemma 5.4 (Security)
For any $m$, $p$, and $\chi$, if there exists a polynomial time algorithm $W$ that distinguishes between encryptions of 0 and 1, then there exists a distinguisher $Z$ that distinguishes between $A_{s,\chi}$ and $U$ for a non-negligible fraction of all possible $s$.
-
Proof:
Let be the acceptance probability of on input where is an encryption of with the public key and the probability is taken over the randomness in the choice of the private and public keys and over the randomness in the encryption algorithm. We define similarly for encryptions of and let be the acceptance probability of on inputs where are again chosen according to the private and public keys distribution but is chosen uniformly from . With this notation, our hypothesis says that for some .
We now construct a for which . By our hypothesis, either or . In the former case we take to be the same as . In the latter case, we construct as follows. On input , calls with . Notice that this maps the distribution on encryptions of to the distribution on encryptions of and the uniform distribution to itself. Therefore, is the required distinguisher.
For , let be the probability that accepts on input where are chosen from , and is an encryption of with the public key . Similarly, define to be the acceptance probability of where are chosen from , and is now chosen uniformly at random from . Our assumption on says that . Define
By an averaging argument we get that a fraction of at least of the are in . Hence, it is enough to show a distinguisher that distinguishes between and for any .
In the following we describe the distinguisher . We are given a distribution that is either or for some . We take samples from . Let be the probability that accepts on input where the probability is taken over the choice of as an encryption of the bit 0 with the public key . Similarly, let be the probability that accepts on input where the probability is taken over the choice of as a uniform element of . By applying a polynomial number of times, the distinguisher estimates both and up to an additive error of . If the two estimates differ by more than , accepts. Otherwise, rejects.
We first claim that when is the uniform distribution, rejects with high probability. In this case, are chosen uniformly from . Using Claim 5.3 with the group , we obtain that with probability exponentially close to , the distribution on obtained by encryptions of 0 is exponentially close to the uniform distribution on . Therefore, except with exponentially small probability,
Hence, our two estimates differ by at most , and rejects.
Next, we show that if is for then accepts with probability . Notice that (respectively, ) is the average of (respectively, ) taken over the choice of from . From we obtain by an averaging argument that
with probability at least over the choice of from . Hence, with probability at least , chooses such a and since our estimates are accurate to within , the difference between them is more than and accepts.
Acknowledgments
I would like to thank Michael Langberg, Vadim Lyubashevsky, Daniele Micciancio, Chris Peikert, Miklos Santha, Madhu Sudan, and an anonymous referee for useful comments.
References
- [1] D. Aharonov and O. Regev. Lattice problems in NP intersect coNP. Journal of the ACM, 52(5):749–765, 2005. Preliminary version in FOCS’04.
- [2] M. Ajtai. Generating hard instances of lattice problems. In Complexity of computations and proofs, volume 13 of Quad. Mat., pages 1–32. Dept. Math., Seconda Univ. Napoli, Caserta, 2004. Preliminary version in STOC 1996.
- [3] M. Ajtai. Representing hard lattices with O(n log n) bits. In Proc. 37th Annual ACM Symp. on Theory of Computing (STOC), pages 94–103, 2005.
- [4] M. Ajtai and C. Dwork. A public-key cryptosystem with worst-case/average-case equivalence. In Proc. 29th Annual ACM Symp. on Theory of Computing (STOC), pages 284–293, 1997.
- [5] M. Ajtai, R. Kumar, and D. Sivakumar. A sieve algorithm for the shortest lattice vector problem. In Proc. 33rd ACM Symp. on Theory of Computing, pages 601–610, 2001.
- [6] A. Akavia, S. Goldwasser, and V. Vaikuntanathan. Simultaneous hardcore bits and cryptography against memory attacks. In Proc. of 6th IACR Theory of Cryptography Conference (TCC), pages 474–495, 2009.
- [7] M. Alekhnovich. More on average case vs approximation complexity. In Proc. 44th Annual IEEE Symp. on Foundations of Computer Science (FOCS), pages 298–307, 2003.
- [8] L. Babai. On Lovasz’ lattice reduction and the nearest lattice point problem. Combinatorica, 6(1):1–13, 1986.
- [9] W. Banaszczyk. New bounds in some transference theorems in the geometry of numbers. Mathematische Annalen, 296(4):625–635, 1993.
- [10] A. Blum, M. Furst, M. Kearns, and R. J. Lipton. Cryptographic primitives based on hard learning problems. In Advances in cryptology—CRYPTO ’93, volume 773 of Lecture Notes in Comput. Sci., pages 278–291. Springer, Berlin, 1994.
- [11] A. Blum, A. Kalai, and H. Wasserman. Noise-tolerant learning, the parity problem, and the statistical query model. Journal of the ACM, 50(4):506–519, 2003.
- [12] J.-Y. Cai and A. Nerurkar. An improved worst-case to average-case connection for lattice problems. In Proc. 38th Annual IEEE Symp. on Foundations of Computer Science (FOCS), pages 468–477, 1997.
- [13] D. Cash, C. Peikert, and A. Sahai. Efficient circular-secure encryption from hard learning problems, 2009. Submitted.
- [14] W. Ebeling. Lattices and codes. Advanced Lectures in Mathematics. Friedr. Vieweg & Sohn, Braunschweig, revised edition, 2002. A course partially based on lectures by F. Hirzebruch.
- [15] U. Feige. Relations between average case complexity and approximation complexity. In Proc. 34th Annual ACM Symp. on Theory of Computing (STOC), pages 534–543, 2002.
- [16] C. Gentry, C. Peikert, and V. Vaikuntanathan. Trapdoors for hard lattices and new cryptographic constructions. In Proc. 40th ACM Symp. on Theory of Computing (STOC), pages 197–206, 2008.
- [17] O. Goldreich, D. Micciancio, S. Safra, and J.-P. Seifert. Approximating shortest lattice vectors is not harder than approximating closest lattice vectors. Information Processing Letters, 71(2):55–61, 1999.
- [18] L. Grover and T. Rudolph. Creating superpositions that correspond to efficiently integrable probability distributions. arXiv:quant-ph/0208112, 2002.
- [19] R. Impagliazzo and D. Zuckerman. How to recycle random bits. In Proc. 30th Annual IEEE Symp. on Foundations of Computer Science (FOCS), pages 248–253, 1989.
- [20] J. Katz and Y. Lindell. Introduction to modern cryptography. Chapman & Hall/CRC Cryptography and Network Security. Chapman & Hall/CRC, Boca Raton, FL, 2008.
- [21] A. Kawachi, K. Tanaka, and K. Xagawa. Multi-bit cryptosystems based on lattice problems. In Public Key Cryptography – PKC 2007, volume 4450 of Lecture Notes in Comput. Sci., pages 315–329, Berlin, 2007. Springer.
- [22] A. R. Klivans and A. A. Sherstov. Cryptographic hardness for learning intersections of halfspaces. J. Comput. System Sci., 75(1):2–12, 2009. Preliminary version in FOCS’06.
- [23] R. Kumar and D. Sivakumar. On polynomial approximation to the shortest lattice vector length. In Proc. 12th Annual ACM-SIAM Symp. on Discrete Algorithms, pages 126–127, 2001.
- [24] A. K. Lenstra, H. W. Lenstra, Jr., and L. Lovász. Factoring polynomials with rational coefficients. Math. Ann., 261(4):515–534, 1982.
- [25] Y.-K. Liu, V. Lyubashevsky, and D. Micciancio. On bounded distance decoding for general lattices. In International Workshop on Randomization and Computation - Proceedings of RANDOM 2006, volume 4110 of Lecture Notes in Comput. Sci., pages 450–461, Barcelona, Spain, Aug. 2006. Springer.
- [26] V. Lyubashevsky and D. Micciancio. On bounded distance decoding, unique shortest vectors, and the minimum distance problem, 2009. Manuscript.
- [27] D. Micciancio. Almost perfect lattices, the covering radius problem, and applications to Ajtai’s connection factor. SIAM Journal on Computing, 34(1):118–169, 2004. Preliminary version in STOC 2002.
- [28] D. Micciancio and S. Goldwasser. Complexity of Lattice Problems: A Cryptographic Perspective, volume 671 of The Kluwer International Series in Engineering and Computer Science. Kluwer Academic Publishers, Boston, Massachusetts, Mar. 2002.
- [29] D. Micciancio and O. Regev. Worst-case to average-case reductions based on Gaussian measures. SIAM Journal on Computing, 37(1):267–302, 2007.
- [30] C. Moore, A. Russell, and U. Vazirani. A classical one-way function to confound quantum adversaries. arXiv:quant-ph/0701115, 2007.
- [31] M. A. Nielsen and I. L. Chuang. Quantum Computation and Quantum Information. Cambridge University Press, 2000.
- [32] C. Peikert. Limits on the hardness of lattice problems in ℓ_p norms. Comput. Complexity, 17(2):300–351, 2008. Preliminary version in CCC’07.
- [33] C. Peikert. Public-key cryptosystems from the worst-case shortest vector problem. In Proc. 41st ACM Symp. on Theory of Computing (STOC), 2009.
- [34] C. Peikert, V. Vaikuntanathan, and B. Waters. A framework for efficient and composable oblivious transfer. In Advances in cryptology—CRYPTO ’08, volume 5157 of Lecture Notes in Comput. Sci., pages 554–571. Springer, Berlin, 2008.
- [35] C. Peikert and B. Waters. Lossy trapdoor functions and their applications. In Proc. 40th ACM Symp. on Theory of Computing (STOC), pages 187–196, 2008.
- [36] O. Regev. New lattice based cryptographic constructions. Journal of the ACM, 51(6):899–942, 2004. Preliminary version in STOC’03.
- [37] O. Regev. On lattices, learning with errors, random linear codes, and cryptography. In Proc. 37th ACM Symp. on Theory of Computing (STOC), pages 84–93, 2005.
- [38] C.-P. Schnorr. A hierarchy of polynomial time lattice basis reduction algorithms. Theoret. Comput. Sci., 53(2-3):201–224, 1987.