Minimax-optimal estimation for sparse multi-reference alignment with collision-free signals
Abstract
The Multi-Reference Alignment (MRA) problem aims at the recovery of an unknown signal from repeated observations under the latent action of a group of cyclic isometries, in the presence of additive noise of high intensity . It is a more tractable version of the celebrated cryo-EM model. In the crucial high noise regime, it is known that its sample complexity scales as . Recent investigations have shown that for the practically significant setting of sparse signals, the sample complexity of the maximum likelihood estimator asymptotically scales with the noise level as . In this work, we investigate minimax optimality for signal estimation under the MRA model for so-called collision-free signals. In particular, this signal class covers the setting of generic signals of dilute sparsity (wherein the support size , where is the ambient dimension).
We demonstrate that the minimax optimal rate of estimation in for the sparse MRA problem in this setting is , where is the sample size. In particular, this considerably generalizes the sample complexity asymptotics for the restricted MLE in this setting, establishing it as the statistically optimal estimator. Finally, we establish a concentration inequality for the deviation of the restricted MLE from the ground truth.
1 Introduction
1.1 Introduction
Multi-reference alignment (MRA) is the problem of estimating an unknown signal from noisy samples that are subject to latent rotational transformations. In recent years, the MRA model has found practical applications in many scientific fields, such as structural biology StructuralBiology1 StructuralBiology2 StructuralBiology3 StructuralBiology4 StructuralBiology5 StructuralBiology6 , image recognition ImageRecognition1 ImageRecognition2 ImageRecognition3 ImageRecognition4 , robotics Robotics and signal processing SignalProcessing1 SignalProcessing2 , and has garnered significant research interest as a result. Most notably, the MRA model serves as a simplified model for the three-dimensional reconstruction problem in cryo-electron microscopy (cryo-EM) CryoEM1 Generic CryoEM2 .
In this paper, we study the estimation problem in the MRA model MRA1 Generic MRA2 for collision-free signals. Identify the vector space with the space of functions . Let
denote the cyclic group of coordinate shifts, where is the linear operator given by . In the general MRA estimation problem, the objective is to recover an unknown parameter , known in the literature as a signal, from independent noisy observations given by
(1)
where the ’s are drawn from independently and uniformly at random and each is i.i.d standard Gaussian noise that is independent of the ’s. We remark that while the canonical distribution on is uniform, other distributions have also been considered OtherDistribution .
In this model, any vector is statistically indistinguishable from its cyclic shifts . Consequently, the parameter is only identifiable up to the action of . Hence we study the estimation problem under the rotation-invariant distance
Here, refers to the usual Euclidean norm.
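Although the paper itself contains no code, the following short Python sketch may help fix ideas: it simulates observations from the model (1) and evaluates the rotation-invariant distance by minimising over all cyclic shifts. The helper names (generate_mra_samples, orbit_distance) and the parameter choices are ours and are purely illustrative.

```python
# Illustrative sketch of the MRA observation model (1) and the rotation-invariant
# (orbit) distance. The helpers below are our own, not notation from the paper.
import numpy as np

def generate_mra_samples(theta, sigma, n, rng=None):
    """Return n observations y_i = R_{g_i} theta + sigma * xi_i with uniform shifts g_i."""
    rng = np.random.default_rng(rng)
    p = theta.shape[0]
    shifts = rng.integers(0, p, size=n)                    # latent group elements g_i
    noise = sigma * rng.standard_normal(size=(n, p))       # i.i.d. standard Gaussian noise
    return np.stack([np.roll(theta, s) for s in shifts]) + noise

def orbit_distance(theta, phi):
    """rho(theta, phi) = min over cyclic shifts g of || theta - R_g phi ||_2."""
    p = theta.shape[0]
    return min(np.linalg.norm(theta - np.roll(phi, s)) for s in range(p))

# Example: a shifted copy of a signal is at orbit distance zero from the signal itself.
theta = np.array([1.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0])
samples = generate_mra_samples(theta, sigma=0.5, n=10, rng=0)
print(samples.shape)
assert orbit_distance(theta, np.roll(theta, 3)) < 1e-12
```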
Both the dimension and the noise level are assumed to be known. In many practical applications such as cryo-EM, the noise level is very high but can be lowered over time by technological improvements Improvement . As such, understanding the relationship between the difficulty of the estimation problem and the quantity , known as the signal-to-noise ratio, is of fundamental importance. With that in mind, we focus our work on understanding the asymptotic sampling complexity in the MRA model as .
The MRA problem, particularly in the low-noise regime, has been mostly attacked using the synchronization approach MRA1 . However, it was recently recognised as a Gaussian mixture model BRW , which enabled the use of various methods such as the method of moments MethodOfMoments Generic and expectation-maximization ExpectationMaximization . For a more detailed discussion on the likelihood landscape of such models, we refer the reader to LikelihoodLandscape1 LikelihoodLandscape2 LikelihoodLandscape3 LikelihoodLandscape4 .
1.2 Estimation rates for MRA and sparsity
In the two seminal papers by Bandeira, Rigollet and Weed BRW Generic in 2017, it was shown that the general MRA model in the high noise regime has a sampling complexity of in the worst case, and in the case of signals having full Fourier support. This has motivated investigations into variations of the MRA model where a better sampling complexity may be achieved. One such variation is the MRA estimation problem for certain classes of sparse signals, where the restricted maximum likelihood estimator (abbrv. MLE) has been shown to exhibit an improved sampling complexity of Sparse . We refer the reader to Sparse1 and Sparse2 for an investigation into algorithmic aspects of sparse MRA and the role of sparsity in the related problem of cryo-EM.
While it is of interest to understand the sample complexity asymptotics for the restricted MLE, this only sheds light on the performance of one particular type of estimator. In order to have a complete understanding of the problem, one needs to understand minimax optimal rates of estimation for the model. In other words, the central question is: what is the best possible behaviour, over all candidate estimators, of the worst-case error over a sufficiently rich class of signals? Minimax optimality in statistical problems has a long and rich history; for a detailed account we refer the interested reader to Tsybakov . In the context of the MRA problem, the minimax optimal rate of estimation for the general MRA problem is known to be (cf. BRW ; Zhou-Zhou2022 ). However, minimax optimality for the sparse MRA problem is not understood, and in view of the different sample complexity asymptotics, is not covered by the minimax theory on general signal spaces. This is the broad direction that we propose to explore in this work.
1.3 The space of collision free signals
In this work, we will focus our attention on a particular class of sparse signals called collision-free signals. For a vector , let denote the multiset of differences
We say that is collision-free if each element in appears with multiplicity exactly .
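To make the definition concrete, the following illustrative Python check forms the multiset of pairwise differences of a support set, taken modulo the ambient dimension (our reading of the cyclic convention), and verifies that every difference occurs exactly once. The helper is our own and is not code from the paper.

```python
# Hedged illustration of the collision-free condition: form the multiset of pairwise
# differences (i - j) mod p of the support and ask that every difference occur exactly once.
from collections import Counter
from itertools import permutations

def is_collision_free(support, p):
    """Return True if all pairwise differences (i - j) mod p, i != j, are distinct."""
    diffs = Counter((i - j) % p for i, j in permutations(support, 2))
    return all(count == 1 for count in diffs.values())

# Example: {0, 1, 3} is a classical collision-free (Sidon-type) set modulo 7,
# while {0, 1, 2, 3} is not, since the difference 1 occurs three times.
assert is_collision_free({0, 1, 3}, p=7)
assert not is_collision_free({0, 1, 2, 3}, p=7)
```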
The notion of collision-freeness is motivated by the beltway problem in combinatorial optimization. The beltway problem consists of recovering a set of integers from their pairwise differences , up to the trivial symmetry of translating all the numbers in the set by the same amount. In 1939, Piccard Piccard conjectured that if two sets of integers and have the same set of pairwise differences , and the pairwise differences are known to be unique (i.e. and are collision-free), then the sets and must be translates of each other.
A counterexample to Piccard’s conjecture was found by Bloom Bloom in 1975 for . In 2007, a complete resolution was brought about by Bekir and Golomb Golomb , who showed that Piccard’s conjecture holds for . In our current context, this means that the support of a collision-free signal can be recovered from up to translation as long as . This fact will play a crucial role in allowing us to obtain improved bounds on the sampling complexity.
With the above discussion in mind, we will assume that the unknown parameter satisfies:
- (i) is collision free;
- (ii) ;
- (iii) There exist positive constants such that for all and .
Let denote the set of signals satisfying the above three conditions. In our analysis, we will only keep track of the dependence on and . The other quantities and are fixed throughout the paper.
Collision free signals arise as a natural concept in the setting of sparsity, especially in the context of sparse MRA. It is straightforward to envisage that the sparser a signal, the more likely it is for its support to be a collision free subset of . Indeed, a randomly chosen subset of (with expected size ) will be collision free with high probability as long as (cf. Sparse , Appendix D). The latter corresponds to the regime of so-called dilute sparsity, per the terminology used in the investigations in Sparse , and has a natural place in the study of sparse MRA.
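As a quick empirical illustration of this heuristic (with parameter choices of our own that are not meant to reproduce the precise threshold referenced above), one can sample random supports and record how often they turn out to be collision free.

```python
# Empirical illustration: for support size s small relative to p, a uniformly random
# support is collision free most of the time. The choices p = 100000, s = 8 are ours
# and only illustrate the dilute regime; they are not the paper's threshold.
from collections import Counter
from itertools import permutations
import random

def is_collision_free(support, p):
    diffs = Counter((i - j) % p for i, j in permutations(support, 2))
    return all(count == 1 for count in diffs.values())

p, s, trials = 100000, 8, 200
rng = random.Random(0)
hits = sum(is_collision_free(rng.sample(range(p), s), p) for _ in range(trials))
print(f"fraction of collision-free random supports: {hits / trials:.2f}")
```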
1.4 Main Results
For a signal , let denote the distribution of a random variable satisfying
where is drawn from uniformly and . Let denote the density function of . Explicitly,
Given i.i.d samples as in (1), we analyse the performance of the restricted maximum likelihood estimator (MLE) given by
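In computational terms, the density of a single observation is a uniform mixture, over the cyclic shifts of the signal, of Gaussian densities, so the per-sample negative log-likelihood can be evaluated by a log-sum-exp over the group. The sketch below is a hedged illustration of this objective; the restricted MLE additionally constrains the minimisation to the signal class, which is not implemented here.

```python
# Hedged sketch of the MRA negative log-likelihood: each observation is a uniform
# mixture over cyclic shifts of Gaussian densities N(R_g theta, sigma^2 I). The
# restricted MLE minimises this objective over the constraint set of admissible
# signals; that constrained search is not implemented in this illustration.
import numpy as np
from scipy.special import logsumexp

def negative_log_likelihood(theta, samples, sigma):
    """Average negative log-density of the samples under the shift-mixture model."""
    p = theta.shape[0]
    shifted = np.stack([np.roll(theta, g) for g in range(p)])         # all p cyclic shifts
    # Squared distances between each sample and each shifted template: shape (n, p).
    sq_dists = ((samples[:, None, :] - shifted[None, :, :]) ** 2).sum(axis=2)
    # log density = logsumexp over shifts of the Gaussian log-densities, minus log p.
    log_dens = (logsumexp(-sq_dists / (2 * sigma ** 2), axis=1)
                - np.log(p) - 0.5 * p * np.log(2 * np.pi * sigma ** 2))
    return -log_dens.mean()

# With many samples, the true signal typically scores better than a mismatched one.
rng = np.random.default_rng(1)
theta = np.zeros(11); theta[[0, 1, 4]] = [1.0, 2.0, 1.5]
samples = (np.stack([np.roll(theta, g) for g in rng.integers(0, 11, size=2000)])
           + 0.3 * rng.standard_normal((2000, 11)))
print(negative_log_likelihood(theta, samples, 0.3),
      negative_log_likelihood(np.zeros(11), samples, 0.3))
```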
Our main results are the following two theorems, which characterise the rate of estimation in the collision-free MRA model.
Theorem 1.4.1.
We have that
(2)
where the infimum is taken over all estimators based on samples .
Theorem 1.4.2.
Fix a constant . There exist constants depending on such that for all and for all , the restricted MLE satisfies
Theorem 1.4.1 is a direct consequence of the following two results, which we state separately as they are of independent interest.
The first result is a uniform upper bound for the restricted MLE of , which is a marked improvement over the pointwise upper bound in (Sparse, Theorem 4).
Theorem 1.4.3.
There exist positive constants and , where depends on , such that the restricted MLE satisfies
(3)
uniformly over all choices for all sufficiently large.
The second result is a lower bound on the minimax rate of estimation in the sparse MRA model. In particular, it shows that the restricted MLE achieves the optimal sampling complexity of .
Theorem 1.4.4.
Fix a sparsity . Let denote the set of vectors satisfying . For any , there exists a universal constant such that
where the infimum is taken over all estimators on samples from .
2 Preliminaries
In this section, we introduce and prove some properties of the Kullback-Leibler divergence and the moment tensors. Both are key tools that play a central role in our analysis.
2.1 Kullback-Leibler Divergence and Moment Tensors
In the MRA model, the Kullback-Leibler (KL) divergence admits an explicit formula:
An important piece of machinery to analyse the KL divergence is the family of moment tensors. For any positive integer , the mth moment tensor of a vector is the quantity If is another vector, the mth moment difference tensor between and is defined to be
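For illustration, the sketch below computes moment tensors under the convention (assumed here, since the displayed definition is not reproduced in this extract) that the m-th moment tensor is the average over the cyclic group of the m-fold tensor power of the shifted signal, and that the difference tensor is the difference of two such averages.

```python
# Hedged sketch of the m-th moment tensor: we take it to be the average, over the cyclic
# group, of the m-fold tensor power of the shifted signal. This convention is our assumption.
import numpy as np
from functools import reduce

def moment_tensor(theta, m):
    """T_m(theta) = (1/p) * sum_g (R_g theta)^{tensor m}, an order-m tensor of shape (p,)*m."""
    p = theta.shape[0]
    total = np.zeros((p,) * m)
    for g in range(p):
        shifted = np.roll(theta, g)
        total += reduce(np.multiply.outer, [shifted] * m)
    return total / p

def moment_difference_tensor(theta, phi, m):
    """The m-th moment difference tensor between theta and phi."""
    return moment_tensor(theta, m) - moment_tensor(phi, m)

# Moment tensors are shift-invariant, so signals in the same orbit have zero difference tensor.
theta = np.array([1.0, 0.0, 2.0, 0.0, 0.0])
assert np.allclose(moment_difference_tensor(theta, np.roll(theta, 2), 3), 0.0)
```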
The relationship between the KL divergence and the moment tensors is given by the following results.
Theorem 2.1.1.
(BRW, Lemma 8) Let . Let and . Then
Theorem 2.1.2.
(BRW, Theorem 9) Let satisfy and . For any positive integer , there exist universal constants and such that
A few remarks are in order. Firstly, the assumption that means that the first moment tensor vanishes. Thus the summations in the above theorem start from . Secondly, the result still holds if the hypothesis is replaced by for any fixed positive constant . In our current setting, we let . A full proof of this statement can be found in Appendix C.
2.2 Collision-Free Signals
As a consequence of the resolution of Piccard’s conjecture, collision-free signals are very well-behaved locally. This notion can be formalised in terms of the following lower bounds on the KL divergence and moment tensors.
Lemma 2.2.1.
(Sparse, Lemma 5) For any pair of collision free signals , we have
Proposition 2.2.2.
(Sparse, Proposition 17) There exists a positive constant such that for all collision free signals satisfying and for all sufficiently large,
Combining the above two results, we obtain the following local lower bound for the KL divergence of collision free signals.
Proposition 2.2.3.
There exist positive constants such that for any pair of collision free signals with and and for any sufficiently large,
The threshold also satisfies the following property: for any satisfying and , we have .
2.3 Global Lower Bound for the KL Divergence
In this section, we will complement the sharp local bound for the KL divergence in Proposition 2.2.3 with the following global bound.
Proposition 2.3.1.
Let be two collision free signals, with . There exists a constant , depending on , such that
for all sufficiently large.
As it turns out, the third moment tensor for collision free signals has a very simple form. Thus our strategy for proving Proposition 2.3.1 will be to show that the third moment difference tensor is nonvanishing for two distinct collision free signals and then invoke Theorem 2.1.2.
Lemma 2.3.2.
Let be two collision free signals lying in distinct orbits. Then .
Proof.
We will show that the orbit of any collision free signal can be recovered purely from its third moment tensor. The collision free property of implies that for any with , we have that
In particular, the set , and hence the support of , can be recovered from its third moment tensor up to -orbit. Next, for each , to recover , let be such that . We have that
Thus
can be obtained from the third moment tensor. The conclusion follows. ∎
Lemma 2.3.3.
Let . We denote and . Let and . Then
where Sym is the symmetrization operator which acts on order- tensors by averaging over all permutations of the indices:
Proof.
Let and . Observe that
Since the symmetrization operator is linear, we obtain
as desired. ∎
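The symmetrization operator used above can be described very concretely: it averages an order-m tensor over all m! permutations of its index axes. The following self-contained sketch (our own illustration, not code from the paper) implements this and checks the order-2 case.

```python
# Illustration of the symmetrization operator Sym: average an order-m tensor over all
# m! permutations of its axes. This helper is our own sketch.
import math
from itertools import permutations
import numpy as np

def sym(T):
    """Symmetrize the tensor T by averaging over all permutations of its axes."""
    m = T.ndim
    return sum(np.transpose(T, perm) for perm in permutations(range(m))) / math.factorial(m)

# Example: Sym(a x b) = (a x b + b x a) / 2 for order-2 tensors.
a, b = np.array([1.0, 2.0, 0.0]), np.array([0.0, -1.0, 3.0])
assert np.allclose(sym(np.multiply.outer(a, b)),
                   0.5 * (np.multiply.outer(a, b) + np.multiply.outer(b, a)))
```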
We are now ready to prove Proposition 2.3.1.
Proof of Proposition 2.3.1.
Since both the KL divergence and the orbit distance are invariant under the action of the group , we assume without loss of generality that . By Proposition 2.2.3, it suffices to prove the above statement for the case of . We will divide our argument into two cases.
Case 1. .
Observe that is bounded above as is (recall that we require for all ). Thus it suffices to show that
for some small constant depending only on . In what follows, we denote and . Firstly, by Lemma 2.3.2, we have that . For each , define
Then set
Note that is a strictly positive constant since both and are compact sets. Furthermore, is independent of both and (but may depend on ). Let be a small constant, whose value may decrease from line to line, satisfying
(4)
for all such that and . By Theorem 2.1.1,
Thus if , we have that
Now suppose that . If , by (BRW, Theorem 9), there exists a universal positive constant such that
Finally, suppose that we have both and . Using (4) and Lemma 2.3.3, we obtain
Again by Theorem 2.1.2, we obtain
The desired conclusion follows by choosing
Case 2.
By Lemma A.1 , we in fact have a stronger bound . ∎
3 Uniform Rates of Convergence for the Maximum Likelihood Estimator
The goal of this section is to prove Theorem 1.4.3. Throughout this section, we fix a collision free signal , which plays the role of the unknown parameter that we are trying to estimate. Let and define the subspace
In this section, all derivatives will be taken with respect to the subspace . In other words, for a (twice continuously differentiable) function and a point , we define the gradient and the Hessian to be
Let denote the map
Let denote the Hessian of the above map at the point (on the subspace ). Since is positive semidefinite, it induces a seminorm via
For i.i.d samples drawn according to , the restricted MLE is a minimizer of
on the set
Our proof builds upon the approach in (BRW, Theorem 4) and invokes the following results.
Lemma 3.0.1.
(BRW, Lemma B.6) Let and let . Let denote the Hessian of . There exists a constant such that
Lemma 3.0.2.
(BRW, Lemma B.7) Let be any vector subspace of . Suppose that there exists a positive integer and positive constants and such that for all satisfying and , we have that
Then the restricted MLE satisfies
for some positive constant .
Lemma 3.0.3.
(BRW, Lemma B.15) There exists a positive constant depending only on the dimension such that for any with ,
Proof of Theorem 1.4.3.
The key idea behind the proof is that for , the dominant term in (3) is the first term, which is controlled by the local nature of the signals. Since collision free signals are very well-behaved locally (Proposition 2.2.3), this allows us to obtain sharper uniform bounds on the rate of convergence of the restricted MLE.
We assume without loss of generality that . Furthermore, since is bounded, we also assume and by replacing and by and respectively.
Let be a fixed positive constant and let be a positive constant whose value may change from line to line. Define the event , where is another positive constant whose exact value will be determined later. We decompose
(5)
and bound each term separately. Note that the first term captures the local behaviour of the restricted MLE while the second term captures its global behaviour.
Local bound: For the first term, let and denote the constants as in Lemma 3.0.3 and Proposition 2.2.3 respectively. We first choose sufficiently small such that
Let be sufficiently large such that and choose . By Lemma 3.0.3 and Proposition 2.2.3, we have that
Hence
Together with Proposition 2.2.3, we get
(6)
The function satisfies and attains its minimum value on at . As such, we must have . Furthermore, since , we also have that . Expanding as a Taylor series about , we obtain
where is a vector on the line segment between and . We employ the dual norm to bound the first term and the operator norm to bound the second term. This gives
where . Using (6) and dividing both sides by ,
Multiplying both sides by and applying Young’s inequality to the second term on the right-hand side yields
Taking expectation on both sides,
(7)
This concludes the local analysis of the restricted MLE.
Global bound: We first address the second term in (5). Since on , we obtain
which can be combined with the third term in (7). This gives
(8)
We will establish an upper bound for each of the three terms in the above equation separately.
For the first term, Bartlett’s identities state that for each ,
Since and the ’s are independent, we get
In particular, the nullspace of is contained in the nullspace of almost surely since any vector lying in the nullspace of satisfies
Since the row space of a symmetric matrix is the orthogonal complement of its nullspace, this means that the row space of (and hence the row space of ) is contained in the row space of almost surely. As a result,
(9)
where denotes the Moore-Penrose inverse of . For the second term, we invoke Lemma 3.0.1 to get
(10)
Finally, for the third term, we invoke Lemma 3.0.2 with the weaker global bound in Proposition 2.3.1 (i.e. ). We obtain
(11)
for some positive constant depending on . Putting (9), (10) and (11) back into (8), we have
The conclusion follows. ∎
4 Lower Bound for Collision Free Signals
In this section, we prove Theorem 1.4.4.
Proof.
For convenience of notation, we assume that is even. The proof for the case in which is odd is conceptually similar but notationally more complicated. By Le Cam’s two-point argument, for any , we have that
where the infimum is taken over all estimators on samples. Thus it suffices to construct two signals such that
for some universal constant but . Let be such that each element in the multiset
appears with multiplicity exactly . Define
where for some universal constant to be specified.
Note that and for . Since both and are mean zero signals, with , we can invoke Theorem 2.1.2 (with ) to obtain
by choosing . ∎
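For the reader's convenience, we record one standard form of Le Cam's two-point bound (obtained via the Bretagnolle-Huber inequality); the constants below are generic and need not coincide with those in the displayed inequality of the proof above.

```latex
% One standard two-point (Le Cam) bound for n i.i.d. samples: for any pair of
% parameters \theta_0, \theta_1 and any estimator \widehat{\theta},
\inf_{\widehat{\theta}}\;\max_{\theta\in\{\theta_0,\theta_1\}}
   \mathbb{E}_{\theta}\,\rho\bigl(\widehat{\theta},\theta\bigr)
   \;\ge\; \frac{\rho(\theta_0,\theta_1)}{4}\,
   \exp\!\bigl(-\,n\,D_{\mathrm{KL}}(P_{\theta_0}\,\|\,P_{\theta_1})\bigr).
% Hence a pair of signals with small KL divergence but non-negligible orbit distance
% forces a minimax lower bound of the corresponding order.
```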
5 Concentration of the Maximum Likelihood Estimator
In this section, we fix a collision free signal that plays the role of the true signal that we wish to approximate. The goal of this section is to prove Theorem 1.4.2.
Recall that for i.i.d samples drawn according to , the restricted MLE is the minimizer of the negative log-likelihood
on . Since is equal in distribution to , we obtain
(12)
In particular, does not depend on the randomness of the group action. Thus we let denote its mean with respect to the Gaussian noise .
Lemma 5.0.1.
Let be fixed constants. There exists a -net (with respect to the Euclidean norm on ) for the subset
satisfying .
Proof.
For each tuple of indices with , define the set
By viewing as a subset of the -dimensional linear subspace
we see that there exists a -net for satisfying . Define to be the union of the ’s. Since there are possible choices for , we have that
∎
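For context, the cardinality count in the proof above follows the standard volumetric covering argument. In the display below, R (a bound on the norm of the vectors in the set), δ (the net scale) and s (the sparsity) are generic placeholders and not the paper's exact constants.

```latex
% Standard volumetric bound: a Euclidean ball of radius R in an s-dimensional subspace
% admits a \delta-net of cardinality at most (1 + 2R/\delta)^s; taking a union over the
% \binom{p}{s} possible supports then gives
|\mathcal{N}| \;\le\; \binom{p}{s}\,\Bigl(1 + \frac{2R}{\delta}\Bigr)^{s}.
```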
Lemma 5.0.2.
There exists a universal constant and a constant depending on such that for any ,
Proof.
In what follows, let and be constants whose value may change from line to line. Using (12), for any , we may write
where
I
and is a term not depending on the randomness of the ’s. We analyse the concentration of I and separately. Define the event . First observe that for each ,
The first term is deterministic while the distributions of the second and third terms are given by
By Gaussian tail bounds, we have that
For the chi-square random variable, we will invoke the following well-known chi-square tail bound.
Theorem 5.0.3.
(Chi-square, Lemma 1). Let be a chi-squared distributed random variable with degrees of freedom. Then for any ,
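The displayed inequality of the cited lemma is not reproduced in this extract; the snippet below takes the Laurent-Massart form P(X >= d + 2*sqrt(d*t) + 2*t) <= exp(-t) as our working assumption of that bound and checks it numerically.

```python
# Numerical sanity check of the (assumed) Laurent-Massart form of the chi-square tail bound:
#   P(X >= d + 2*sqrt(d*t) + 2*t) <= exp(-t)  for X ~ chi^2_d and t > 0.
# The exact constants of the paper's statement are elided, so this form is an assumption.
import numpy as np
from scipy.stats import chi2

for d in [5, 50, 500]:
    for t in [0.5, 2.0, 10.0]:
        threshold = d + 2 * np.sqrt(d * t) + 2 * t
        tail = chi2.sf(threshold, df=d)          # exact upper-tail probability
        assert tail <= np.exp(-t) + 1e-12        # the bound should hold (up to rounding)
        print(f"d={d:4d}, t={t:5.1f}: P(X >= thr) = {tail:.3e} <= e^-t = {np.exp(-t):.3e}")
```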
Observe that
Let be such that . If , then and so
On the other hand, if , then and so . Thus
Next, we have that
To summarise, we have the following bound
This implies that
(13)
where we have used the fact that for the last inequality.
On the event , we have as well as
(14)
We now control the concentration of . Differentiating with respect to , we obtain
Thus is -Lipschitz in . By the following Gaussian concentration result:
Theorem 5.0.4.
(Concentration, Theorem 5.5) Let be a standard Gaussian vector. Let and let be an -Lipschitz function. Then for all ,
for a fixed , the random variable is -subgaussian. Hoeffding’s inequality then yields
Now set and let be a -net of having cardinality
Next, the gradient of when differentiating with respect to is given by
This can be upper bounded by
On the other hand, on the event , for a fixed , the gradient of is given by
This can be upper bounded by
where the last inequality follows from . This means that on the event , the function (for each fixed ) is Lipschitz in , with Lipschitz constant at most
It then follows that
Combining this with (13) gives the desired result. ∎
Proof of Theorem 1.4.2.
In what follows, let and be constants depending on whose value may change from line to line. We employ a slicing argument. For the given value of and each integer , define
Then for all , we have that
by Proposition 2.3.1. Let . By shrinking the constant , we have that
Note that since . Invoking Lemma 5.0.2 with and , we have that
Note that . Thus, on the event where both and hold, since , we see that is not the restricted MLE (recall that the restricted MLE minimises on ). Thus
where the last inequality uses the fact that is bounded below by a constant. Summing over gives
Now for the infinite summation is bounded by the geometric series
which in turn is dominated by the first term. Thus
as desired. ∎
6 Acknowledgements
SG was supported in part by the MOE grants R-146-000-250-133, R-146-000-312-114 and MOE-T2EP20121-0013.
SSM was partially supported by an INSPIRE research grant (DST/INSPIRE/04/2018/002193) from the Department of Science and Technology, Government of India and a Start-Up Grant from Indian Statistical Institute, Kolkata.
References
- [1] Emmanuel Abbe, Tamir Bendory, William Leeb, João M Pereira, Nir Sharon, and Amit Singer. Multireference alignment is easier with an aperiodic translation distribution. IEEE Transactions on Information Theory, 65(6):3565–3584, 2018.
- [2] Afonso S Bandeira, Moses Charikar, Amit Singer, and Andy Zhu. Multireference alignment using semidefinite programming. In Proceedings of the 5th conference on Innovations in theoretical computer science, pages 459–470, 2014.
- [3] Afonso S Bandeira, Jonathan Niles-Weed, and Philippe Rigollet. Optimal rates of estimation for multi-reference alignment. Mathematical Statistics and Learning, 2(1):25–75, 2020.
- [4] Alberto Bartesaghi, Alan Merk, Soojay Banerjee, Doreen Matthies, Xiongwu Wu, Jacqueline LS Milne, and Sriram Subramaniam. 2.2 Å resolution cryo-EM structure of -galactosidase in complex with a cell-permeant inhibitor. Science, 348(6239):1147–1151, 2015.
- [5] Ahmad Bekir and Solomon Golomb. There are no further counterexamples to S. Piccard’s theorem. IEEE Transactions on Information Theory, 53:2864–2867, 2007.
- [6] Tamir Bendory, Alberto Bartesaghi, and Amit Singer. Single-particle cryo-electron microscopy: Mathematical theory, computational challenges, and opportunities. IEEE signal processing magazine, 37(2):58–76, 2020.
- [7] Tamir Bendory, Nicolas Boumal, Chao Ma, Zhizhen Zhao, and Amit Singer. Bispectrum inversion with application to multireference alignment. IEEE Transactions on signal processing, 66(4):1037–1050, 2017.
- [8] Tamir Bendory, Yuehaw Khoo, Joe Kileel, Oscar Mickelin, and Amit Singer. Autocorrelation analysis for cryo-EM with sparsity constraints: Improved sample complexity and projection-based algorithms. Proceedings of the National Academy of Sciences, 120(18):e2216507120, 2023.
- [9] Tamir Bendory, Oscar Mickelin, and Amit Singer. Sparse multi-reference alignment: Sample complexity and computational hardness. In ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 8977–8981. IEEE, 2022.
- [10] Gary S Bloom. A counterexample to a theorem of S. Piccard. Journal of Combinatorial Theory, Series A, 22(3):378–379, 1977.
- [11] S. Boucheron, G. Lugosi, and P. Massart. Concentration Inequalities: A Nonasymptotic Theory of Independence. OUP Oxford, 2013.
- [12] Nicolas Boumal, Tamir Bendory, Roy R Lederman, and Amit Singer. Heterogeneous multireference alignment: A single pass approach. In 2018 52nd Annual Conference on Information Sciences and Systems (CISS), pages 1–6. IEEE, 2018.
- [13] Lisa Gottesfeld Brown. A survey of image registration techniques. ACM computing surveys (CSUR), 24(4):325–376, 1992.
- [14] Victor-Emmanuel Brunel. Learning rates for Gaussian mixtures under group action. In Conference on Learning Theory, pages 471–491. PMLR, 2019.
- [15] Robert Diamond. On the multiple simultaneous superposition of molecular structures by rigid body transformations. Protein Science, 1(10):1279–1287, 1992.
- [16] Zehao Dou, Zhou Fan, and Harrison Zhou. Rates of estimation for high-dimensional multi-reference alignment. arXiv preprint arXiv:2205.01847, 2022.
- [17] Ian L Dryden and Kanti V Mardia. Statistical shape analysis: with applications in R, volume 995. John Wiley & Sons, 2016.
- [18] Zhou Fan, Roy R Lederman, Yi Sun, Tianhao Wang, and Sheng Xu. Maximum likelihood for high-noise group orbit estimation and single-particle cryo-em. arXiv preprint arXiv:2107.01305, 2021.
- [19] Zhou Fan, Yi Sun, Tianhao Wang, and Yihong Wu. Likelihood landscape and maximum likelihood estimation for the discrete orbit recovery model. Communications on Pure and Applied Mathematics, 76(6):1208–1302, 2023.
- [20] Hassan Foroosh, Josiane B Zerubia, and Marc Berthod. Extension of phase correlation to subpixel registration. IEEE transactions on image processing, 11(3):188–200, 2002.
- [21] Subhroshekhar Ghosh and Philippe Rigollet. Sparse multi-reference alignment: Phase retrieval, uniform uncertainty principles and the beltway problem. Foundations of Computational Mathematics, pages 1–48, 2022.
- [22] Roberto Gil-Pita, Manuel Rosa-Zurera, P Jarabo-Amores, and Francisco López-Ferreras. Using multilayer perceptrons to align high range resolution radar signals. In Artificial Neural Networks: Formal Models and Their Applications–ICANN 2005: 15th International Conference, Warsaw, Poland, September 11-15, 2005. Proceedings, Part II 15, pages 911–916. Springer, 2005.
- [23] Anya Katsevich and Afonso S Bandeira. Likelihood maximization and moment matching in low SNR Gaussian mixture models. Communications on Pure and Applied Mathematics, 76(4):788–842, 2023.
- [24] Beatrice Laurent and Pascal Massart. Adaptive estimation of a quadratic functional by model selection. Annals of statistics, pages 1302–1338, 2000.
- [25] Ryan O’Donnell. Analysis of Boolean functions. Cambridge University Press, 2014.
- [26] Wooram Park and Gregory S Chirikjian. An assembly automation approach to alignment of noncircular projections in electron microscopy. IEEE Transactions on Automation Science and Engineering, 11(3):668–679, 2014.
- [27] Wooram Park, Charles R Midgett, Dean R Madden, and Gregory S Chirikjian. A stochastic kinematic model of class averaging in single-particle electron microscopy. The International journal of robotics research, 30(6):730–754, 2011.
- [28] Amelia Perry, Jonathan Weed, Afonso S Bandeira, Philippe Rigollet, and Amit Singer. The sample complexity of multireference alignment. SIAM Journal on Mathematics of Data Science, 1(3):497–517, 2019.
- [29] Sophie Piccard. Sur les ensembles de distances des ensembles de points d’un espace euclidien. 1939.
- [30] Ya’Acov Ritov. Estimating a signal with noisy nuisance parameters. Biometrika, 76(1):31–37, 1989.
- [31] Dirk Robinson, Sina Farsiu, and Peyman Milanfar. Optimal registration of aliased images using variable projection with applications to super-resolution. The Computer Journal, 52(1):31–42, 2009.
- [32] David M Rosen, Luca Carlone, Afonso S Bandeira, and John J Leonard. A certifiably correct algorithm for synchronization over the special euclidean group. In Algorithmic Foundations of Robotics XII: Proceedings of the Twelfth Workshop on the Algorithmic Foundations of Robotics, pages 64–79. Springer, 2020.
- [33] Brian M Sadler and Georgios B Giannakis. Shift-and rotation-invariant object reconstruction using the bispectrum. JOSA A, 9(1):57–69, 1992.
- [34] Sjors HW Scheres, Mikel Valle, Rafael Nunez, Carlos OS Sorzano, Roberto Marabini, Gabor T Herman, and Jose-Maria Carazo. Maximum-likelihood multi-reference refinement for electron microscopy images. Journal of molecular biology, 348(1):139–149, 2005.
- [35] Devika Sirohi, Zhenguo Chen, Lei Sun, Thomas Klose, Theodore C Pierson, Michael G Rossmann, and Richard J Kuhn. The 3.8 Å resolution cryo-EM structure of Zika virus. Science, 352(6284):467–470, 2016.
- [36] Douglas L Theobald and Phillip A Steindel. Optimal simultaneous superpositioning of multiple structures with missing data. Bioinformatics, 28(15):1972–1979, 2012.
- [37] Alexandre B. Tsybakov. Introduction to Nonparametric Estimation. Springer series in statistics. Springer, 2009.
- [38] J Portegies Zwart, René van der Heiden, Sjoerd Gelsema, and Frans Groen. Fast translation invariant classification of HRR range profiles in a zero phase representation. IEE Proceedings-Radar, Sonar and Navigation, 150(6):411–418, 2003.
A Appendix A
Lemma A.1.
Let be two vectors and suppose that . Then there exists a constant , depending on , such that
Proof.
By replacing and with and respectively, we may assume that and . We further assume without loss of generality that . Now observe that
where the left-hand side follows from two different applications of the triangle inequality. Thus , so it suffices to show that there exists a positive constant such that whenever . We will directly use the Donsker-Varadhan formula
with , where is a constant to be specified. For the first term, we have
For the second term, note that for , conditioned on , the random variable has a non-central -distribution with degrees of freedom. We obtain
We get
Choose . Then so the quantity on the right-hand side is at least
where we have used the fact that for . Since , we have that
as desired. ∎
B Appendix B
Proposition B.1.
Let be two collision free signals such that . If , then either lies in the orbit of or lies in the orbit of .
Proof.
We have that for any with ,
Thus the support of is completely determined by the second moment tensor due to the resolution of the beltway problem. As , we must have for some . Without loss of generality assume . In other words, we have . Let denote their (common) support. Then the condition
is equivalent to the condition that the pairwise product of the ratios
must all evaluate to . This implies that these ratios must either all be positive or all be negative. After reordering if necessary, we assume without loss of generality that
If the ratios are all positive, then since
we must have . If the above inequality is strict, then
a contradiction. Hence
In the latter case, a similar argument yields that for all and so . ∎
C Appendix C
We give a full proof of the modified Theorem 2.1.2. We first prove a modified version of Lemma B.12. Fix a positive number
Lemma C.1.
For any and satisfying and , we have
Proof.
Assume without loss of generality that . By Jensen’s inequality,
Expanding the norm yields
where is such that by Cauchy-Schwarz.
By the binomial theorem, for all such that , there exists an such that
with . By assumption, we have and so
∎
We now prove the upper bound of the modified Theorem 2.1.2.
Proof.
If , then since , we must have as well, so the statement trivially holds. Otherwise, first note that each term in (2.1.2) remains unchanged if the quantities , and are replaced by and respectively. The same is true when is replaced by another vector in the same -orbit. As such, we henceforth assume , and
Instead of establishing an upper bound on the KL divergence directly, we work with the -divergence
and then pass to the KL divergence via the upper bound . Since Jensen’s inequality implies that
Hence
To obtain our desired bound on the -divergence, we will integrate both sides with respect to . Expanding out the square on the right-hand side yields three terms, which we will evaluate separately. The first term is
where denotes an independent and identically distributed copy of . To simplify the expression, we seek to rewrite it as the integral of the density of a Gaussian by completing the square. We obtain
Via similar computations, the second and third terms evaluate to
respectively. As and , we have that and so
A power series expansion yields
With the lemma, and using the fact that , we conclude the proof with
∎
The remainder of the section is dedicated to proving the lower bound of the modified Theorem 2.1.2. To that end, we first recall the key ingredients used in the original proof of Theorem 2.1.2. At each step, the modifications that are needed to suit our current context are highlighted.
Definition C.2.
The (probabilist’s) Hermite polynomials are a family of polynomials defined by
Fact C.3.
The Hermite polynomials satisfy the following basic properties:
- (i) The polynomial has degree ;
- (ii) The family is an orthogonal basis for , where denotes the standard Gaussian measure on ;
- (iii) We have ;
- (iv) For any , we have
Definition C.4.
For each multi-index , define the multivariate polynomial by
The family is called the multivariate Hermite polynomials.
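As a small numerical illustration of the univariate building blocks (the multivariate polynomials above being, under our reading, coordinatewise products of these), the sketch below evaluates the probabilists' Hermite polynomials with numpy and verifies the orthogonality relations E[He_m(Z)He_n(Z)] = n! δ_{mn} under the standard Gaussian measure.

```python
# Numerical illustration of the probabilists' Hermite polynomials He_n and their
# orthogonality under the standard Gaussian measure: E[He_m(Z) He_n(Z)] = n! * delta_{mn}.
# The multivariate polynomials of Definition C.4 are, under our reading, products of these.
import math
import numpy as np
from numpy.polynomial import hermite_e as He   # probabilists' Hermite polynomials

deg = 40
nodes, weights = He.hermegauss(deg)            # Gauss quadrature for weight exp(-x^2/2)
weights = weights / math.sqrt(2 * math.pi)     # normalise to the N(0,1) density

def He_n(n, x):
    """Evaluate He_n at x via a HermiteE series with a single unit coefficient."""
    coeffs = np.zeros(n + 1)
    coeffs[n] = 1.0
    return He.hermeval(x, coeffs)

for m in range(5):
    for n in range(5):
        inner = np.sum(weights * He_n(m, nodes) * He_n(n, nodes))
        expected = math.factorial(n) if m == n else 0.0
        assert abs(inner - expected) < 1e-8
print("orthogonality relations verified numerically")
```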
The multivariate Hermite polynomials form an orthogonal basis for the product space . By properties (ii), (iii) and (iv) of Fact C.3, the family of rescaled Hermite polynomials defined by
(15)
satisfies the following identities
(16)
(17)
Next, for each positive integer , we define the function in the following way. Given , set to be the order- symmetric tensor whose th-entry is given by , where is the multi-index associated to :
Note that for each -tuple .
The motivation behind the above definition will gradually become apparent once we write the quantities in terms of the family , where and are arbitrary vectors in . For a positive integer , consider the degree polynomial
If , then (15) implies that
Hence if , we get
To proceed, we will use the following lemma to relate the KL divergence between and to the quantity . (Remark: This lemma remains unchanged from Lemma B.11 of the BRW paper).
Lemma C.5.
Let and be any two probability distributions on a measure space . Suppose that there exists a measurable function such that and . Then
(18)
Proof.
By replacing by for a suitably chosen constant , we may assume that and . Let and denote the corresponding probability distributions of when is distributed according to and respectively. By the data processing inequality, it suffices to prove the claimed bound for . We further assume that is absolutely continuous with respect to (otherwise the bound is trivial).
As the quantities involved in (18) arise from taking the expectation of random variables, our approach to establish the inequality will be to pass to the convex function defined by
and then apply Jensen’s inequality. This yields
Let be a dominating measure (i.e. and ) and let and denote the densities of and with respect to . The previous calculation implies that
By the Cauchy-Schwarz inequality,
Hence
as desired. ∎
With the above lemma, our strategy for establishing lower bounds for the KL divergence will be to lower bound the quantity and upper bound the variances of both and . To control , we will apply Lemma C.1. On the other hand, to control the variances, we will use its Hermite decomposition as a gateway to bring in heavy machinery from harmonic analysis. The following lemma remains unchanged from Lemma B.13 of the BRW paper.
Lemma C.6.
Fix a positive integer . Let and suppose that . Then for any symmetric tensors , where , we have
Proof.
Let . To upper bound the variance, it suffices to upper bound the second moment . Before we are able to bring in results from the theory of Gaussian spaces, we first need to replace with the centered multivariate normal distribution . To that end, we apply the Cauchy-Schwarz inequality to obtain
(19)
We first address the second term. This is done by proceeding in a similar fashion to the proof of the upper bound of the modified Theorem 2.1.2. Observe that
By applying Jensen’s inequality and then completing the square, we obtain
(20)
We now come to the crux of the matter, which is to establish an upper bound on the first term . To accomplish that goal, we bring in some standard results about the Ornstein-Uhlenbeck semigroup, which is a family of operators defined by
Here, we highlight that our definition of differs from the standard definition in the literature in the sense that expectation is taken with respect to a multivariate normal distribution with covariance matrix as opposed to to compensate for the scaling in
The set is an eigenbasis for the family , with [25, Proposition 11.37]. By viewing as a polynomial in , we get
where we have used the fact that . Thus if we define the degree polynomial
then . Next, we will invoke the Gaussian hypercontractivity theorem:
Theorem C.7.
[25, Theorem 11.23] Let and let , where is the standard Gaussian measure on . Then for .
In our current context, the Gaussian hypercontractivity theorem implies that
It remains to compute . Due to the orthogonality relations in (17), when we expand the square, most of the terms will vanish. Since and are both symmetric tensors, for any tuple , the quantities and depend only on the multi-set . Thus for each such that , if we define
where is any -tuple satisfying , we have that
Applying the orthogonality relations in (17), we obtain
(21)
Plugging (20) and (21) back into (19) gives the desired conclusion. ∎
We now conclude our discussion in this section by putting together everything that was introduced. Note that while the constants in our proof differ from those in the original proof given in [3, Theorem 9], the spirit of the proof remains unchanged.