
Robust Non-adaptive Group Testing under Errors in Group Membership Specifications

Shuvayan Banerjee, Radhendushka Srivastava, James Saunderson and Ajit Rajwade. Shuvayan Banerjee is with IITB-Monash Research Academy. Radhendushka Srivastava is with the Department of Mathematics, IIT Bombay. James Saunderson is with the Department of Electrical and Computer Systems Engineering, Monash University. Ajit Rajwade is with the Department of Computer Science and Engineering, IIT Bombay.
Abstract

Given $p$ samples, each of which may or may not be defective, group testing aims to determine their defect status indirectly by performing tests on $n<p$ ‘groups’ (also called ‘pools’), where a group is formed by mixing a subset of the $p$ samples. Under the assumption that the number of defective samples is very small compared to $p$, group testing algorithms have provided excellent recovery of the status of all $p$ samples with even a small number of groups. Most existing methods, however, assume that the group memberships are accurately specified. This assumption may not hold in all applications, due to various resource constraints. For example, such errors could occur when a technician, preparing the groups in a laboratory, unknowingly mixes together an incorrect subset of samples as compared to what was specified. We develop a new group testing method, the Debiased Robust Lasso Test Method (Drlt), that handles such group membership specification errors. The proposed Drlt method is based on an approach to debias, or reduce the inherent bias in, estimates produced by Lasso, a popular and effective sparse regression technique. We also provide theoretical upper bounds on the reconstruction error produced by our estimator. Our approach is then combined with two carefully designed hypothesis tests, respectively for (i) the identification of defective samples in the presence of errors in group membership specifications, and (ii) the identification of groups with erroneous membership specifications. The Drlt approach extends the literature on bias mitigation of statistical estimators such as the Lasso to handle the important case where some of the measurements contain outliers, due to factors such as group membership specification errors. We present several numerical results which show that our approach outperforms several intuitive baselines and robust regression techniques for identification of defective samples as well as erroneously specified groups.

Index Terms:
Group testing, debiasing, Lasso, non-adaptive group testing, hypothesis testing, compressed sensing

I Introduction

Group testing is a well-studied area of data science, information theory and signal processing, dating back to the classical work of Dorfman in [14]. Consider $p$ samples, one per subject, where each sample is either defective or non-defective. In the case of defective samples, additional quantitative information regarding the extent or severity of the defect in the sample may be available. Group testing typically replaces individual testing of these $p$ samples by testing of $n<p$ ‘groups’ of samples, thereby saving on the number of tests. Each group (also called a ‘pool’) consists of a mixture of small, equal portions taken from a subset of the $p$ samples. Let the (perhaps noisy) test results on the $n$ groups be arranged in an $n$-dimensional vector $\boldsymbol{z}$. Let the true status of each of the $p$ samples be arranged in a $p$-dimensional vector $\boldsymbol{\beta^{*}}$. The aim of group testing is to infer $\boldsymbol{\beta^{*}}$ from $\boldsymbol{z}$ given accurate knowledge of the group memberships. We encode group memberships in an $n\times p$ binary matrix $\boldsymbol{B}$ (called the ‘pooling matrix’), where $B_{ij}=1$ if the $j^{\text{th}}$ sample is a member of the $i^{\text{th}}$ group, and $B_{ij}=0$ otherwise. If the overall status of a group is the sum of the status values of each of the samples that participated in the group, we have:

$$\boldsymbol{z}=\boldsymbol{B\beta^{*}}+\boldsymbol{\tilde{\eta}}, \qquad (1)$$

where $\boldsymbol{\tilde{\eta}}\in\mathbb{R}^{n}$ is a noise vector. In a large body of the literature on group testing (e.g., [5, 10, 14]), $\boldsymbol{z}$ and $\boldsymbol{\beta^{*}}$ are modeled as binary vectors, leading to the forward model $\boldsymbol{z}=\mathfrak{N}(\boldsymbol{B\beta^{*}})$, where the matrix-vector ‘multiplication’ $\boldsymbol{B\beta^{*}}$ involves binary OR and AND operations instead of sums and products, and $\mathfrak{N}$ is a noise operator that could flip some of the bits in $\boldsymbol{z}$ at random. In this work, however, we consider $\boldsymbol{z}$ and $\boldsymbol{\beta^{*}}$ to be vectors in $\mathbb{R}^{n}$ and $\mathbb{R}^{p}$ respectively, as also done in [23, 19, 47], and adopt the linear model (1). This enables encoding of quantitative information in $\boldsymbol{z}$ and $\boldsymbol{\beta^{*}}$, and $\boldsymbol{B\beta^{*}}$ now involves the usual matrix-vector multiplication.

In commonly considered situations, the number of non-zero (i.e., defective) samples, $s\triangleq\|\boldsymbol{\beta^{*}}\|_{0}$, is much less than $p$, and $\beta^{*}_{j}=0$ indicates that the $j^{\text{th}}$ sample is non-defective, where $1\leq j\leq p$. In such cases, group testing algorithms have shown excellent results for the recovery of $\boldsymbol{\beta^{*}}$ from $\boldsymbol{z}$ and $\boldsymbol{B}$. These algorithms are surveyed in detail in [3] and can be classified into two broad categories: adaptive and non-adaptive. Adaptive algorithms [14, 23, 25] process the measurements (i.e., the results of pooled tests available in $\boldsymbol{z}$) in two or more stages of testing, where the output of each stage determines the choice of pools in the subsequent testing stage. Non-adaptive algorithms [47, 19, 20, 6], on the other hand, process the measurements with only a single stage of testing. Non-adaptive algorithms are known to be more efficient in terms of time as well as the number of tests required, at the cost of somewhat higher recovery errors, as compared to adaptive algorithms [30, 19]. In this work, we focus on non-adaptive algorithms.
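The following minimal numpy sketch (with illustrative parameter values, not the paper's experimental settings) simulates the linear model (1): a sparse real-valued $\boldsymbol{\beta^{*}}$, a Bernoulli(0.5) pooling matrix $\boldsymbol{B}$, and noisy pooled measurements $\boldsymbol{z}$.

```python
# Minimal sketch of the linear group-testing model (1); sizes and noise level are assumptions.
import numpy as np

rng = np.random.default_rng(0)
p, n, s = 500, 200, 10                     # samples, pools, number of defectives
beta = np.zeros(p)
beta[rng.choice(p, s, replace=False)] = rng.uniform(50, 100, s)   # defective "viral loads"

B = rng.binomial(1, 0.5, size=(n, p))      # sample j is in pool i iff B[i, j] == 1
sigma_tilde = 0.01 * np.abs(B @ beta).mean()
z = B @ beta + sigma_tilde * rng.standard_normal(n)               # noisy pooled measurements

print("non-zero entries of beta*:", np.count_nonzero(beta))
print("first five pooled measurements:", np.round(z[:5], 2))
```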

Problem Motivation: In the recent COVID-19 pandemic, RT-PCR (reverse transcription polymerase chain reaction) has been the primary method of testing a person for this disease. Due to widespread shortage of various resources for testing, group testing algorithms were widely employed in many countries [1]. Many of these approaches used Dorfman testing [14] (an adaptive algorithm), but non-adaptive algorithms have also been recommended or used for this task [19, 47, 6]. In this application, the vectors $\boldsymbol{\beta^{*}}$ and $\boldsymbol{z}$ refer to the real-valued viral loads in the individual samples and the pools respectively, and $\boldsymbol{B}$ is again a binary pooling matrix. In a pandemic situation, there is heavy demand on testing labs. This leads to practical challenges for the technicians implementing pooling, due to factors such as (i) a heavy workload, (ii) differences in pooling protocols across different labs, and (iii) the fact that pooling is inherently more complicated than individual sample testing [50], [15, ‘Results’]. Due to this, there is the possibility of a small number of inadvertent errors in creating the pools. This causes a difference between a few entries of the pre-specified matrix $\boldsymbol{B}$ and the actual matrix $\boldsymbol{\hat{B}}$ used for pooling. Note that $\boldsymbol{B}$ is known whereas $\boldsymbol{\hat{B}}$ is unknown in practice. The sparsity of the difference between $\boldsymbol{B}$ and $\boldsymbol{\hat{B}}$ is a reasonable assumption if the technicians are competent; hence only a small number of group membership specifications contain errors. This issue of errors during pool creation is well documented in several independent sources such as [50], [15, ‘Results’], [13, Page 2], [44], [51, Sec. 3.1], [12, ‘Discussion’], [21, ‘Specific consideration related to SARS-CoV-2’] and [2, ‘Laboratory infrastructure’]. However, the vast majority of group testing algorithms — adaptive as well as non-adaptive — do not account for these errors. To the best of our knowledge, this is the first piece of work on the problem of a mismatched pooling matrix (i.e., a pooling matrix that contains errors in group membership specifications) for non-adaptive group testing with real-valued $\boldsymbol{\beta^{*}}$ and (possibly) noisy $\boldsymbol{z}$. We emphasize that besides pooled RT-PCR testing, faulty specification of pooling matrices may also naturally occur in group testing in many other scenarios, for example when applied to verification of electronic circuits [29]. Another scenario is in epidemiology [11], for identifying infected individuals who come into contact with agents who are sent to mix with individuals in the population. The health status of various individuals is inferred from the health status of the agents. However, sometimes an agent may remain uninfected even upon coming into contact with an infected individual, which can be interpreted as an error in the pooling matrix.

Related Work: We now comment on two related pieces of work which deal with group testing with errors in pooling matrices via non-adaptive techniques. The work in [11] considers probabilistic and structured errors in the pooling matrix, where an entry $b_{ij}$ with a value of 1 could flip to 0 with a probability of 0.5, but not vice versa, i.e., a genuinely zero-valued $b_{ij}$ never flips to 1. The work in [35] considers a small number of ‘pretenders’ in the unknown binary vector $\boldsymbol{\beta^{*}}$, i.e., there exist elements in $\boldsymbol{\beta^{*}}$ which flip from 1 to 0 with probability 0.5, but not vice versa. Both these techniques consider binary-valued vectors $\boldsymbol{z}$ and $\boldsymbol{\beta^{*}}$, unlike the real-valued vectors considered in this work. They also do not consider noise in $\boldsymbol{z}$ in addition to the errors in $\boldsymbol{B}$. Furthermore, we also present a method to identify the errors in $\boldsymbol{B}$, unlike the techniques in [11, 35]. Due to these differences between our work and [11, 35], a direct numerical comparison between our results and theirs would not be meaningful.

Sensing Matrix Perturbation in Compressed Sensing: There exists a close relationship between the group testing problem and the problem of signal reconstruction in compressed sensing (CS), as explored in [19, 20]. Likewise, there is literature in the CS community which deals with perturbations in sensing matrices [39, 40, 53, 27, 4, 24, 18]. However, these works either consider dense random perturbations (i.e., perturbations in every entry) [53, 27, 40, 4, 24, 18] or perturbations in specifications of Fourier frequencies [39, 26]. These perturbation models are vastly different from the sparse set of errors in binary matrices considered in this work. Furthermore, apart from [39, 26], these techniques only perform robust signal estimation, without any attempt to identify the rows of the sensing matrix which contained those errors.

Overview of contributions: In this paper, we present a robust approach for recovering $\boldsymbol{\beta^{*}}\in\mathbb{R}^{p}$ from noisy $\boldsymbol{z}\in\mathbb{R}^{n}$ when $n<p$, given a (known) pre-specified pooling matrix $\boldsymbol{B}$, but where the measurements in $\boldsymbol{z}$ have been generated with another, unknown pooling matrix $\boldsymbol{\hat{B}}$, i.e., $\boldsymbol{z}=\boldsymbol{\hat{B}\beta^{*}}+\boldsymbol{\tilde{\eta}}$. The approach, which we call the Debiased Robust Lasso Test Method or Drlt, extends existing work on ‘debiasing’ the well-known Lasso estimator in statistics [28] to also handle errors in $\boldsymbol{B}$. In this approach, we present a principled method to identify which measurements in $\boldsymbol{z}$ correspond to rows with errors in $\boldsymbol{B}$, using hypothesis testing. We also present an algorithm for direct estimation of $\boldsymbol{\beta^{*}}$ and a hypothesis test for identification of the defective samples in $\boldsymbol{\beta^{*}}$, given errors in $\boldsymbol{B}$. We establish desirable properties of these statistical tests, such as consistency. Though our approach was initially motivated by pooling errors during preparation of pools of COVID-19 samples, it is broadly applicable to any group-testing problem where the pool membership specifications contain errors.

Notations: Throughout this paper, $\boldsymbol{I_{n}}$ denotes the identity matrix of size $n\times n$. We use the notation $[n]\triangleq\{1,2,\cdots,n\}$ for $n\in\mathbb{Z}_{+}$. Given a matrix $\boldsymbol{A}$, its $i^{\text{th}}$ row is denoted by $\boldsymbol{a_{i.}}$, its $j^{\text{th}}$ column is denoted by $\boldsymbol{a_{.j}}$, and its $(i,j)^{\text{th}}$ element is denoted by $a_{ij}$. The $i^{\text{th}}$ column of the identity matrix is denoted by $\boldsymbol{e_{i}}$. For any vector $\boldsymbol{z}\in\mathbb{R}^{n}$ and index set $S\subseteq[n]$, we define $\boldsymbol{z}_{S}\in\mathbb{R}^{n}$ such that $(z_{S})_{i}=z_{i}$ for all $i\in S$ and $(z_{S})_{i}=0$ for all $i\notin S$. $S^{c}$ denotes the complement of the set $S$. We define the entrywise $\ell_{\infty}$ norm of a matrix $\boldsymbol{A}$ as $|\boldsymbol{A}|_{\infty}\triangleq\max_{i,j}|a_{ij}|$. Consider two real-valued random sequences $x_{n}$ and $r_{n}$. We say that $x_{n}$ is $o_{P}(r_{n})$ if $x_{n}/r_{n}\rightarrow 0$ in probability, i.e., $\lim_{n\rightarrow\infty}P(|x_{n}/r_{n}|\geq\epsilon)=0$ for any $\epsilon>0$. Also, we say that $x_{n}$ is $O_{P}(r_{n})$ if $x_{n}/r_{n}$ is bounded in probability, i.e., for any $\epsilon>0$ there exist $m,n_{0}>0$ such that $P(|x_{n}/r_{n}|<m)\geq 1-\epsilon$ for all $n>n_{0}$.

Organization of the paper: The noise model involving measurement noise and errors in the pooling matrix is presented in Sec. II. Our core technique, Drlt, is presented in Sec. III, with essential background literature summarized in Sec. III-B, and our key innovations presented in Sec. III-C as well as Sec. IV. Detailed experimental results are given in Sec. V. We conclude in Sec. VI. Proofs of all theorems and lemmas are provided in Appendices A, B and C.

II Problem formulation

II-A Basic Noise Model

We now formally describe the model setup used in this paper. Suppose $\boldsymbol{\beta^{*}}\in\mathbb{R}^{p}$ and that the elements of the $n\times p$ pooling matrix $\boldsymbol{B}$ in (1) are independently drawn from $\text{Bernoulli}(0.5)$. Additionally, let $\boldsymbol{\beta^{*}}$ be sparse with at most $s\ll p$ non-zero elements. Assume that the elements of the noise vector $\boldsymbol{\tilde{\eta}}\in\mathbb{R}^{n}$ in (1) are independent and identically distributed (i.i.d.) Gaussian random variables with mean 0 and variance $\tilde{\sigma}^{2}$. Note that, throughout this work, we assume $\tilde{\sigma}^{2}$ to be known. The Lasso estimator $\boldsymbol{\hat{\beta}}$, used to estimate $\boldsymbol{\beta^{*}}$, is defined as

$$\boldsymbol{\hat{\beta}}=\arg\min_{\boldsymbol{\beta}}\frac{1}{2n}\|\boldsymbol{z}-\boldsymbol{B\beta}\|^{2}_{2}+\lambda\|\boldsymbol{\beta}\|_{1}. \qquad (2)$$

Given a sufficient number of measurements, the Lasso is known to be consistent for sparse $\boldsymbol{\beta^{*}}$ [22, Chapter 11] if the penalty parameter $\lambda>0$ is chosen appropriately and if $\boldsymbol{B}$ satisfies the Restricted Eigenvalue Condition (REC).¹ Certain deterministic binary pooling matrices can also be used, as in [19, 47], to obtain a consistent estimator of $\boldsymbol{\beta^{*}}$. However, we focus on the chosen random pooling matrix in this paper.

¹Restricted Eigenvalue Condition: For some constant $\eta^{c}\geq 1$ and $S\subseteq\{1,2,\ldots,n\}$, let $C_{\eta^{c}}(S)\triangleq\{\boldsymbol{\Delta}\in\mathbb{R}^{n}:\|\boldsymbol{\Delta_{S^{c}}}\|_{1}\leq\eta^{c}\|\boldsymbol{\Delta_{S}}\|_{1}\}$. We say that an $m\times n$ matrix $\boldsymbol{B}$ satisfies the REC if, for a constant $\gamma>0$ (the RE constant), we have $\frac{1}{p}\|\boldsymbol{B\Delta}\|_{2}^{2}\geq\gamma\|\boldsymbol{\Delta}\|_{2}^{2}$ for every $\boldsymbol{\Delta}\in C_{\eta^{c}}(S)$.

It is more convenient for analysis via the REC, and more closely related to the theory in [28], if the elements of the pooling matrix have mean 0. Since the elements of $\boldsymbol{B}$ are drawn independently from $\text{Bernoulli}(0.5)$, $\boldsymbol{B}$ does not obey the mean-zero property. Hence, we transform the random binary matrix $\boldsymbol{B}$ to a random Rademacher matrix $\boldsymbol{A}\triangleq 2\boldsymbol{B}-\boldsymbol{1_{n\times p}}$, which is a simple one-to-one transformation similar to that adopted in [42] for Poisson compressive measurements. (Note that $\boldsymbol{1_{n\times p}}$ refers to a matrix of size $n\times p$ containing all ones.) We also transform the measurements in $\boldsymbol{z}$ to equivalent measurements $\boldsymbol{y}$ associated with the Rademacher matrix $\boldsymbol{A}$.

The expression for each measurement in $\boldsymbol{y}$ is now given by:

$$\forall i\in[n],\ y_{i}=\boldsymbol{a_{i.}}\boldsymbol{\beta^{*}}+\eta_{i}\ \implies\ \boldsymbol{y}=\boldsymbol{A\beta^{*}}+\boldsymbol{\eta}, \qquad (3)$$

where $\eta_{i}\sim\mathcal{N}(0,\sigma^{2})$ with $\sigma^{2}\triangleq 4\tilde{\sigma}^{2}$. We will henceforth work with $\boldsymbol{y}$ and $\boldsymbol{A}$ for the Lasso estimates in the following manner: the Lasso estimator $\boldsymbol{\hat{\beta}}$, used to estimate $\boldsymbol{\beta^{*}}$, is now defined as

$$\boldsymbol{\hat{\beta}}=\arg\min_{\boldsymbol{\beta}}\frac{1}{2n}\|\boldsymbol{y}-\boldsymbol{A\beta}\|^{2}_{2}+\lambda\|\boldsymbol{\beta}\|_{1}. \qquad (4)$$

II-B Model Mismatch Errors

Consider the measurement model defined in (3). We now examine the effect of mis-specification of samples in a pool. That is, we consider the case where, due to errors in mixing of the samples, the pools are generated using an unknown matrix $\boldsymbol{\hat{A}}$ instead of the pre-specified matrix $\boldsymbol{A}$. Note that $\boldsymbol{A}$ and $\boldsymbol{\hat{A}}$ are respectively obtained from $\boldsymbol{B}$ and $\boldsymbol{\hat{B}}$. The elements of the matrices $\boldsymbol{\hat{A}}$ and $\boldsymbol{A}$ are equal everywhere except at the misspecified samples in each pool. We refer to these errors in group membership specifications as ‘bit-flips’. For example, suppose that the $i^{\text{th}}$ pool is specified to consist of samples $j_{1},j_{2},j_{3}\in[p]$, but due to errors during pool creation, the $i^{\text{th}}$ pool is generated using samples $j_{1},j_{2},j_{5}$. In this specific instance, $a_{i,j_{3}}\neq\hat{a}_{i,j_{3}}$ and $a_{i,j_{5}}\neq\hat{a}_{i,j_{5}}$.

In practice, note that $\boldsymbol{A}$ is known whereas $\boldsymbol{\hat{A}}$ is unknown. Moreover, the locations of the bit-flips are unknown. Hence they induce signal-dependent and possibly large ‘model mismatch errors’ $\delta_{i}^{*}\triangleq(\boldsymbol{\hat{a}_{i.}}-\boldsymbol{a_{i.}})\boldsymbol{\beta^{*}}$ in the $i^{\text{th}}$ measurement. In the presence of bit-flips, the model in (3) can be expressed as:

$$y_{i}=\boldsymbol{a_{i.}\beta^{*}}+\delta_{i}^{*}+\eta_{i},\ \mbox{for}\ i\in[n],\ \implies\ \boldsymbol{y}=\boldsymbol{A\beta^{*}}+\boldsymbol{\delta^{*}}+\boldsymbol{\eta}=(\boldsymbol{A}|\boldsymbol{I_{n}})\begin{pmatrix}\boldsymbol{\beta^{*}}\\ \boldsymbol{\delta^{*}}\end{pmatrix}+\boldsymbol{\eta}. \qquad (5)$$

We assume $\boldsymbol{\delta^{*}}$, which we call the ‘model mismatch error’ (MME) vector in $\mathbb{R}^{n}$, to be sparse, with $r\triangleq\|\boldsymbol{\delta^{*}}\|_{0}\ll n$. This implies that at least $r$ rows of $\boldsymbol{\hat{A}}$ have bit-flips. The sparsity assumption is reasonable in many applications (e.g., given a competent technician performing the pooling).

Suppose that for a fixed $i\in[n]$, $\boldsymbol{\hat{a}_{i.}}$ contains a bit-flip at index $j$. If $\beta_{j}^{*}=0$, then $\delta^{*}_{i}$ remains 0 despite the presence of the bit-flip in $\boldsymbol{\hat{a}_{i.}}$; such a bit-flip has no effect on the measurements and is not identifiable from them. However, if $\beta_{j}^{*}$ is non-zero, then $\delta_{i}^{*}$ is also non-zero. Such a bit-flip adversely affects the measurement, and we henceforth refer to it as an effective bit-flip or effective MME.
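The following small numpy sketch (with assumed problem sizes) illustrates how a bit-flip at an index in the support of $\boldsymbol{\beta^{*}}$ produces a non-zero entry of the MME vector $\boldsymbol{\delta^{*}}=(\boldsymbol{\hat{A}}-\boldsymbol{A})\boldsymbol{\beta^{*}}$ appearing in (5).

```python
# Illustrative sketch: bit-flips in the pooling matrix induce a sparse MME vector delta*.
import numpy as np

rng = np.random.default_rng(1)
p, n, s, r = 500, 200, 10, 5
beta = np.zeros(p)
support = rng.choice(p, s, replace=False)
beta[support] = rng.uniform(50, 100, s)

A = 2 * rng.binomial(1, 0.5, size=(n, p)) - 1        # pre-specified Rademacher matrix
A_hat = A.copy()
flipped_rows = rng.choice(n, r, replace=False)
for i in flipped_rows:                               # one "effective" bit-flip per chosen row,
    j = rng.choice(support)                          # placed where beta*_j != 0
    A_hat[i, j] = -A_hat[i, j]

delta = (A_hat - A) @ beta                           # MME vector; non-zero only on flipped rows
print("rows with effective MMEs:", sorted(flipped_rows.tolist()))
print("non-zero entries of delta*:", np.flatnonzero(delta))
```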

III Debiased Robust Lasso Test (Drlt) Method

We now present our proposed approach, named the ‘Debiased Robust Lasso Test Method’ (Drlt), for recovering the signal $\boldsymbol{\beta^{*}}$ given measurements $\boldsymbol{y}$ obtained from the erroneous, unknown matrix $\boldsymbol{\hat{A}}$, which is different from the pre-specified, known sensing matrix $\boldsymbol{A}$. The main objectives of this work are:

  1. Aim (i):

    Estimation of $\boldsymbol{\beta^{*}}$ under model mismatch, and development of a statistical test to determine whether or not the $j^{\text{th}}$ sample, $j\in[p]$, is defective/diseased.

  2. Aim (ii):

    Development of a statistical test to determine whether or not $\boldsymbol{\hat{a}_{i.}}$, $i\in[n]$, contains effective MMEs.

A measurement containing an effective MME will appear to be an outlier in comparison to the other measurements, due to the non-zero values in $\boldsymbol{\delta^{*}}$. Therefore, identification of measurements containing MMEs is equivalent to determining the non-zero entries of $\boldsymbol{\delta^{*}}$. This idea is inspired by the concept of ‘studentized residuals’, which is widely used in the statistics literature to identify outliers in full-rank regression models [36]. Since our model operates in a compressive regime where $n<p$, the distributional properties of studentized residuals may not hold. Therefore, we develop our Drlt method, which is tailored to the compressive regime.

Our basic estimator for $\boldsymbol{\beta^{*}}$ and $\boldsymbol{\delta^{*}}$ from $\boldsymbol{y}$ and $\boldsymbol{A}$ is given as

$$\begin{pmatrix}\boldsymbol{\hat{\beta}_{\lambda_{1}}}\\ \boldsymbol{\hat{\delta}_{\lambda_{2}}}\end{pmatrix}=\arg\min_{\boldsymbol{\beta},\boldsymbol{\delta}}\frac{1}{2n}\left\|\boldsymbol{y}-\boldsymbol{A\beta}-\boldsymbol{\delta}\right\|^{2}_{2}+\lambda_{1}\|\boldsymbol{\beta}\|_{1}+\lambda_{2}\|\boldsymbol{\delta}\|_{1}, \qquad (6)$$

where $\lambda_{1},\lambda_{2}$ are appropriately chosen regularization parameters. This estimator is a robust version of Lasso regression [38]. The robust Lasso, just like the Lasso, incurs a bias due to the $\ell_{1}$ penalty terms.
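As a rough illustration, the robust Lasso in (6) can be solved with a generic convex solver. The following Python/cvxpy sketch is only indicative (the paper itself used CVX in MATLAB, and the data-generation step here is an assumption, not the paper's protocol); it uses the regularization parameters $\lambda_{1},\lambda_{2}$ prescribed in Theorem 1.

```python
# Hedged sketch of the robust Lasso (6) with cvxpy; data generation is illustrative only.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(2)
p, n, s, r = 300, 150, 8, 4
beta_true = np.zeros(p)
beta_true[rng.choice(p, s, replace=False)] = rng.uniform(50, 100, s)
A = 2 * rng.binomial(1, 0.5, size=(n, p)) - 1
delta_true = np.zeros(n)
delta_true[rng.choice(n, r, replace=False)] = rng.uniform(-100, 100, r)
sigma = 0.01 * np.abs(A @ beta_true).mean()
y = A @ beta_true + delta_true + sigma * rng.standard_normal(n)

lam1 = 4 * sigma * np.sqrt(np.log(p) / n)       # lambda_1 as in Theorem 1
lam2 = 4 * sigma * np.sqrt(np.log(n)) / n       # lambda_2 as in Theorem 1
beta, delta = cp.Variable(p), cp.Variable(n)
obj = (cp.sum_squares(y - A @ beta - delta) / (2 * n)
       + lam1 * cp.norm1(beta) + lam2 * cp.norm1(delta))
cp.Problem(cp.Minimize(obj)).solve()
print("||beta_hat - beta*||_1 =", np.abs(beta.value - beta_true).sum())
print("||delta_hat - delta*||_1 =", np.abs(delta.value - delta_true).sum())
```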

The work in [28] provides a method to mitigate the bias in the Lasso estimate and produces a ‘debiased’ signal estimate whose distribution turns out to be approximately Gaussian with specific observable parameters in the compressive regime (for details, see [28] and Sec. III-B below). However, the work in [28] does not take into account errors in sensing matrix specification. We non-trivially adapt the techniques of [28] to our specific application which considers MMEs in the pooling matrix, and we also develop novel procedures to realize Aims (i) and (ii) mentioned above.

We first review important concepts which are used to develop our method for the specified aims, and then develop the method in the rest of this section. Before that, however, we present error bounds on the estimates $\boldsymbol{\hat{\beta}_{\lambda_{1}}}$ and $\boldsymbol{\hat{\delta}_{\lambda_{2}}}$ from (6), which are non-trivial extensions of results in [38]. These bounds are essential in developing hypothesis tests to achieve Aims (i) and (ii).

III-A Bounds on the Robust Lasso Estimate

When the sensing matrix $\boldsymbol{A}$ is i.i.d. Gaussian, upper bounds on $\|\boldsymbol{\beta^{*}}-\boldsymbol{\hat{\beta}_{\lambda_{1}}}\|_{2}$ and $\|\boldsymbol{\delta^{*}}-\boldsymbol{\hat{\delta}_{\lambda_{2}}}\|_{2}$ have been presented in [38]. In our case, $\boldsymbol{A}$ is i.i.d. Rademacher, and hence some modifications to the results of [38] are required. We now state a theorem giving upper bounds on the reconstruction errors of both $\boldsymbol{\hat{\beta}_{\lambda_{1}}}$ and $\boldsymbol{\hat{\delta}_{\lambda_{2}}}$ for a random Rademacher pooling matrix $\boldsymbol{A}$. We further use the so-called ‘cone constraint’ to derive separate bounds on the estimates of both $\boldsymbol{\beta^{*}}$ and $\boldsymbol{\delta^{*}}$. These bounds will be very useful in deriving theoretical results for debiasing.

Theorem 1

Let $\boldsymbol{\hat{\beta}_{\lambda_{1}}},\boldsymbol{\hat{\delta}_{\lambda_{2}}}$ be as in (6) and set $\lambda_{1}\triangleq\frac{4\sigma\sqrt{\log p}}{\sqrt{n}}$, $\lambda_{2}\triangleq\frac{4\sigma\sqrt{\log n}}{n}$. Let $n<p$, $\mathcal{S}\triangleq\{j:\beta^{*}_{j}\neq 0\}$, $\mathcal{R}\triangleq\{i:\delta^{*}_{i}\neq 0\}$, $s\triangleq|\mathcal{S}|$ and $r\triangleq|\mathcal{R}|$. If $|\boldsymbol{A}|_{\infty}\leq 1$ and $\boldsymbol{A}$ satisfies the Extended Restricted Eigenvalue Condition (EREC) from Definition 1 with $\kappa>0$ and with respect to the cone $\mathcal{C}(\mathcal{S},\mathcal{R},(\sqrt{n}\lambda_{2})/\lambda_{1})$, then we have the following:

  1. (1)
     $$P\left(\left\|\boldsymbol{\hat{\beta}_{\lambda_{1}}}-\boldsymbol{\beta^{*}}\right\|_{1}\leq 48\kappa^{-2}(s+r)\sigma\sqrt{\frac{\log p}{n}}\right)\geq 1-\left(\frac{1}{p}+\frac{1}{n}\right). \qquad (7)$$
  2. (2)

     Additionally, if $n\log n\geq(48\kappa^{-2})^{2}(s+r)^{2}\log p$, then

     $$P\left(\left\|\boldsymbol{\hat{\delta}_{\lambda_{2}}}-\boldsymbol{\delta^{*}}\right\|_{1}\leq\frac{24\sigma r\sqrt{\log n}}{n}\right)\geq 1-\left(\frac{1}{p}+\frac{2}{n}\right). \qquad (8)$$

\blacksquare

In Lemma 1 of Sec. A, we show that the chosen random Rademacher pooling matrix $\boldsymbol{A}$ satisfies the EREC with $\kappa=1/16$ if $\lambda_{1}$ and $\lambda_{2}$ are chosen as in Theorem 1. Furthermore, $|\boldsymbol{A}|_{\infty}=1$.
Remarks on Theorem 1:

  1. 1.

    From part (1), we see that $\left\|\boldsymbol{\hat{\beta}_{\lambda_{1}}}-\boldsymbol{\beta^{*}}\right\|_{1}=O_{P}\left((s+r)\sqrt{\frac{\log p}{n}}\right)$.

  2. 2.

    From part (2), we see that $\left\|\boldsymbol{\hat{\delta}_{\lambda_{2}}}-\boldsymbol{\delta^{*}}\right\|_{1}=O_{P}\left(\frac{r\sqrt{\log n}}{n}\right)$.

  3. 3.

    The upper bounds on the errors given in Theorem 1 increase with $\sigma$, as well as with $s$ and $r$, which is quite intuitive. They also decrease with $n$.

III-B A note on the Debiased Lasso

Let us consider the measurement vector from (5), momentarily setting $\boldsymbol{\delta^{*}}\triangleq\boldsymbol{0}$, i.e., we have $\boldsymbol{y}=\boldsymbol{A\beta^{*}}+\boldsymbol{\eta}$. Let $\boldsymbol{\hat{\beta}_{\lambda}}$ be the minimizer of the following Lasso problem

$$\min_{\boldsymbol{\beta}}\frac{1}{2n}\|\boldsymbol{y}-\boldsymbol{A\beta}\|^{2}_{2}+\lambda\|\boldsymbol{\beta}\|_{1}, \qquad (9)$$

for a given value of $\lambda$. Though the Lasso provides excellent theoretical guarantees [22, Chapter 11], it is well known that it produces biased estimates, i.e., $E(\boldsymbol{\hat{\beta}_{\lambda}})\neq\boldsymbol{\beta^{*}}$, where the expectation is over different instances of $\boldsymbol{\eta}$. The work in [28] replaces $\boldsymbol{\hat{\beta}_{\lambda}}$ by a ‘debiased’ estimate $\boldsymbol{\hat{\beta}_{d}}$ given by:

$$\boldsymbol{\hat{\beta}_{d}}=\boldsymbol{\hat{\beta}_{\lambda}}+\dfrac{1}{n}\boldsymbol{M}\boldsymbol{A}^{\top}(\boldsymbol{y}-\boldsymbol{A\hat{\beta}_{\lambda}}), \qquad (10)$$

where $\boldsymbol{M}$ is an approximate inverse (defined as in Alg. 1) of $\boldsymbol{\hat{\Sigma}}\triangleq\boldsymbol{A}^{\top}\boldsymbol{A}/n$. Substituting $\boldsymbol{y}=\boldsymbol{A\beta^{*}}+\boldsymbol{\eta}$ into (10) and treating $\frac{1}{n}\boldsymbol{MA}^{\top}\boldsymbol{A}$ as approximately equal to the identity matrix yields:

$$\boldsymbol{\hat{\beta}_{d}}=\boldsymbol{\hat{\beta}_{\lambda}}+\dfrac{1}{n}\boldsymbol{M}\boldsymbol{A}^{\top}(\boldsymbol{A\beta^{*}}+\boldsymbol{\eta}-\boldsymbol{A\hat{\beta}_{\lambda}})\approx\boldsymbol{\beta^{*}}+\dfrac{1}{n}\boldsymbol{M}\boldsymbol{A}^{\top}\boldsymbol{\eta}, \qquad (11)$$

which is referred to as a debiased estimate, since $E(\boldsymbol{\hat{\beta}_{d}})\approx\boldsymbol{\beta^{*}}$. Note that $\boldsymbol{\hat{\Sigma}}$ is not an invertible matrix since $n<p$; hence, the approximate inverse is obtained by solving a convex optimization problem as given in Alg. 1, where the minimization of the diagonal elements of $\boldsymbol{M\hat{\Sigma}M}^{\top}$ is motivated by minimizing the variance of $\boldsymbol{\hat{\beta}_{d}}$, as proved in [28, Sec. 2.1]. Furthermore, as proved in [28, Theorem 7], the convex problem in Alg. 1 is feasible with high probability if $\boldsymbol{\Sigma}\triangleq E[\boldsymbol{a_{i.}}(\boldsymbol{a_{i.}})^{\top}]$ (where the expectation is taken over the rows of $\boldsymbol{A}$) obeys some specific statistical properties (see later in this section).

Algorithm 1 Construction of $\boldsymbol{M}$ (Alg. 1 of [28])
Input: Measurement vector $\boldsymbol{y}$, design matrix $\boldsymbol{A}$, $\mu$
Output: $\boldsymbol{M}$
1:  Set $\boldsymbol{\hat{\Sigma}}=\boldsymbol{A}^{\top}\boldsymbol{A}/n$.
2:  Let $\boldsymbol{m_{i}}\in\mathbb{R}^{p}$ for $i=1,2,\ldots,p$ be a solution of:
    minimize $\quad\boldsymbol{m_{i}}^{\top}\boldsymbol{\hat{\Sigma}}\boldsymbol{m_{i}}$
    subject to $\quad\|\boldsymbol{\hat{\Sigma}m_{i}}-\boldsymbol{e_{i}}\|_{\infty}\leq\mu, \qquad (12)$
    where $\boldsymbol{e_{i}}\in\mathbb{R}^{p}$ is the $i^{\text{th}}$ column of the identity matrix $\boldsymbol{I}$ and $\mu=O\left(\sqrt{\frac{\log p}{n}}\right)$.
3:  Set $\boldsymbol{M}=(\boldsymbol{m_{1}}|\ldots|\boldsymbol{m_{p}})^{\top}$. If any of the above problems is not feasible, then set $\boldsymbol{M}=\boldsymbol{I_{p}}$.
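A minimal Python/cvxpy sketch of Alg. 1 is given below; it is an illustration under assumed problem sizes, with the quadratic objective $\boldsymbol{m_{i}}^{\top}\boldsymbol{\hat{\Sigma}}\boldsymbol{m_{i}}$ rewritten as $\|\boldsymbol{Am_{i}}\|_{2}^{2}/n$ for the solver, and a fallback to the identity row mirroring step 3.

```python
# Hedged sketch of Alg. 1 (construction of M) with cvxpy; sizes kept small for speed.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(3)
n, p = 40, 60
A = 2 * rng.binomial(1, 0.5, size=(n, p)) - 1
Sigma_hat = A.T @ A / n                        # step 1
mu = 2 * np.sqrt(np.log(p) / n)                # mu = O(sqrt(log p / n)); constant is an assumption

rows = []
for i in range(p):
    e_i = np.zeros(p); e_i[i] = 1.0
    m = cp.Variable(p)
    # m^T Sigma_hat m == ||A m||_2^2 / n, an equivalent form that is easy for the solver
    prob = cp.Problem(cp.Minimize(cp.sum_squares(A @ m) / n),
                      [cp.norm(Sigma_hat @ m - e_i, "inf") <= mu])
    prob.solve()
    rows.append(m.value if m.value is not None else e_i)   # step 3 fallback: identity row
M = np.vstack(rows)                                        # M = (m_1 | ... | m_p)^T
print("max |M Sigma_hat - I| =", np.abs(M @ Sigma_hat - np.eye(p)).max())
```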

The debiased estimate $\boldsymbol{\hat{\beta}_{d}}$ in (10), obtained via an approximate inverse $\boldsymbol{M}$ of $\boldsymbol{\hat{\Sigma}}$ using $\mu=O\left(\sqrt{(\log p)/n}\right)$, has the following statistical property [28, Theorem 8]:

$$\sqrt{n}(\boldsymbol{\hat{\beta}_{d}}-\boldsymbol{\beta^{*}})=\boldsymbol{MA}^{\top}\boldsymbol{\eta}/\sqrt{n}+\sqrt{n}(\boldsymbol{M\hat{\Sigma}}-\boldsymbol{I_{p}})(\boldsymbol{\beta^{*}}-\boldsymbol{\hat{\beta}_{\lambda}}). \qquad (13)$$

Here the second term on the RHS is referred to as the bias vector. Moreover, it is proved in [28, Theorem 8] that for sufficiently large $n$, an appropriate choice of $\lambda$ in (9), and appropriate statistical assumptions on $\boldsymbol{\Sigma}$, the maximum absolute value of the bias vector is $O_{P}\left(\frac{\sigma s\log p}{\sqrt{n}}\right)$. Thus, if $n>O((s\log p)^{2})$, the largest absolute value of the bias vector will be negligible, and the debiasing effect is achieved since $E(\boldsymbol{\hat{\beta}_{d}})\approx\boldsymbol{\beta^{*}}$.

Our debiasing approach is motivated along similar lines as in Alg. 1, but with MMEs in the sensing matrix which the earlier method cannot handle. Moreover, we demonstrate via simulations that ignoring MMEs may lead to larger estimation errors – see Table I of Sec. V-A.

III-C Debiasing in the Presence of MMEs

In the presence of MMEs, the design matrix $(\boldsymbol{A}|\boldsymbol{I_{n}})$ from (5) plays the role of $\boldsymbol{A}$ in Alg. 1. However, $(\boldsymbol{A}|\boldsymbol{I_{n}})$ is partly random and partly deterministic, whereas the theory in [28] applies to either purely random (Theorem 8 of [28]) or purely deterministic (Theorem 6 of [28]) matrices, but not a combination of both. Hence, the theoretical results of [28] do not apply to the approximate inverse of $\frac{1}{n}(\boldsymbol{A}|\boldsymbol{I_{n}})^{\top}(\boldsymbol{A}|\boldsymbol{I_{n}})$ obtained using Alg. 1. Numerical results showing the poor performance of ‘debiasing’ (as in (10)) with such an approximate inverse are presented in Sec. V-A.

To produce a debiased estimate of $\boldsymbol{\beta^{*}}$ in the presence of MMEs in the pooling matrix, we adopt a different approach from the one in [28]. We form a linear combination, via a carefully chosen set of weights, of the residual error vector obtained by running the robust Lasso estimator from (6), in order to debias the robust Lasso estimates $\boldsymbol{\hat{\beta}_{\lambda_{1}}},\boldsymbol{\hat{\delta}_{\lambda_{2}}}$. The weights of the linear combination are represented in the form of an appropriately designed matrix $\boldsymbol{W}\in\mathbb{R}^{n\times p}$ for debiasing $\boldsymbol{\hat{\beta}_{\lambda_{1}}}$, and a derived weight matrix $\left(\boldsymbol{I_{n}}-\frac{1}{n}\boldsymbol{WA}^{\top}\right)$ for debiasing $\boldsymbol{\hat{\delta}_{\lambda_{2}}}$. We later provide a procedure to design an optimal $\boldsymbol{W}$, as given in Alg. 2.

Given a weight matrix $\boldsymbol{W}$, we define a set of modified debiased Lasso estimates for $\boldsymbol{\beta^{*}}$ and $\boldsymbol{\delta^{*}}$ as follows:

$$\boldsymbol{\hat{\beta}_{W}}\triangleq\boldsymbol{\hat{\beta}_{\lambda_{1}}}+\frac{1}{n}\boldsymbol{W}^{\top}(\boldsymbol{y}-\boldsymbol{A\hat{\beta}_{\lambda_{1}}}-\boldsymbol{\hat{\delta}_{\lambda_{2}}}), \qquad (14)$$
$$\boldsymbol{\hat{\delta}_{W}}\triangleq\boldsymbol{y}-\boldsymbol{A}\boldsymbol{\hat{\beta}_{W}}=\boldsymbol{\hat{\delta}_{\lambda_{2}}}+\left(\boldsymbol{I_{n}}-\frac{1}{n}\boldsymbol{WA}^{\top}\right)(\boldsymbol{y}-\boldsymbol{A\hat{\beta}_{\lambda_{1}}}-\boldsymbol{\hat{\delta}_{\lambda_{2}}}). \qquad (15)$$

In our work, the matrix $\boldsymbol{W}$ does not play the role of $\boldsymbol{M}$ from Alg. 1, but instead plays the role of $\boldsymbol{AM}^{\top}$ (compare (14) with (10)). In Theorem 2 below, we show that these estimates are debiased in nature for the choice $\boldsymbol{W}\triangleq\boldsymbol{A}$. Thereafter, in Sec. IV and Theorem 5, using a different choice of $\boldsymbol{W}$ obtained via Alg. 2, we show that the resulting tests are superior to those with $\boldsymbol{W}=\boldsymbol{A}$.
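For concreteness, the debiased estimates (14) and (15) amount to a few lines of numpy, as in the following sketch (here beta_hat and delta_hat denote the robust Lasso estimates from (6), obtained e.g. via the cvxpy sketch given earlier):

```python
# Sketch of the debiased estimates (14)-(15); the helper name is illustrative.
import numpy as np

def debiased_estimates(y, A, W, beta_hat, delta_hat):
    """Return (beta_W, delta_W) as in (14) and (15)."""
    n = A.shape[0]
    resid = y - A @ beta_hat - delta_hat       # residual of the robust Lasso fit (6)
    beta_W = beta_hat + (W.T @ resid) / n      # (14)
    delta_W = y - A @ beta_W                   # (15); equals delta_hat + (I - W A^T/n) resid
    return beta_W, delta_W

# For the simple choice analyzed in Theorem 2, pass W = A:
# beta_W, delta_W = debiased_estimates(y, A, A, beta_hat, delta_hat)
```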

Theorem 2

Let $\boldsymbol{\hat{\beta}_{\lambda_{1}}},\boldsymbol{\hat{\delta}_{\lambda_{2}}}$ be as in (6), let $\boldsymbol{\hat{\beta}_{W}}$, $\boldsymbol{\hat{\delta}_{W}}$ be as in (14), (15) respectively, and set $\lambda_{1}\triangleq\frac{4\sigma\sqrt{\log p}}{\sqrt{n}}$, $\lambda_{2}\triangleq\frac{4\sigma\sqrt{\log n}}{n}$. If $n$ is $\omega[((s+r)\log p)^{2}]$, then given that $\boldsymbol{A}$ is a random Rademacher matrix and using $\boldsymbol{W}\triangleq\boldsymbol{A}$, we have the following:

  1. (1)

     Additionally, if $n<p$, then for any $j\in[p]$,

     $$\sqrt{n}(\hat{\beta}_{Wj}-\beta^{*}_{j})\xrightarrow{\mathcal{L}}N\left(0,\sigma^{2}\right)\mbox{ as }p,n\to\infty. \qquad (16)$$
  2. (2)

     Additionally, if $n\log n$ is $o(p)$, then for any $i\in[n]$,

     $$\frac{\hat{\delta}_{Wi}-\delta^{*}_{i}}{\sqrt{\Sigma_{A_{ii}}}}\xrightarrow{\mathcal{L}}N\left(0,\sigma^{2}\right)\mbox{ as }p,n\to\infty, \qquad (17)$$

     where $\boldsymbol{\Sigma_{A}}$ is defined as follows:

     $$\boldsymbol{\Sigma_{A}}\triangleq\left(\boldsymbol{I_{n}}-\frac{1}{n}\boldsymbol{AA}^{\top}\right)\left(\boldsymbol{I_{n}}-\frac{1}{n}\boldsymbol{AA}^{\top}\right)^{\top}. \qquad (18)$$

Here \xrightarrow{\mathcal{L}} denotes the convergence in law/distribution. \blacksquare

Remarks on Theorem 2

  1. 1.

    The asymptotic distributions of the LHS terms in (16) and (17) do not depend on $\boldsymbol{A}$. These distributions are asymptotically Gaussian because the noise vector $\boldsymbol{\eta}$ is normally distributed.

  2. 2.

    Theorem 2 provides the key result to develop a testing procedure corresponding to Aims (i) and (ii).

  3. 3.

    If $n$ is $\omega[((s+r)\log p)^{2}]$, then Lemma 1 implies that the Rademacher matrix $\boldsymbol{A}$ satisfies the EREC.

  4. 4.

    The condition $n<p$ in Result (1) emerges from (142) and (143), which are based on probabilistic bounds on the singular values of random Rademacher matrices [37]. For the special case where $n=p$ (which is no longer a compressive regime), these bounds are no longer applicable, and instead results such as [45, Thm. 1.2] can be used.

Drlt for $\boldsymbol{\beta^{*}}$: In Aim (i), we intended to develop a statistical test to determine whether or not a sample is defective. Given the significance level $\alpha\in[0,1]$, for each $j\in[p]$, we reject the null hypothesis $\mathsf{G_{0,j}}:\beta^{*}_{j}=0$ in favor of $\mathsf{G_{1,j}}:\beta^{*}_{j}\neq 0$ when

$$\sqrt{n}|\hat{\beta}_{Wj}|/\sigma>z_{\alpha/2}, \qquad (19)$$

where $z_{\alpha/2}$ is the upper $(\alpha/2)^{\text{th}}$ quantile of a standard normal random variable.
Drlt for $\boldsymbol{\delta^{*}}$: In Aim (ii), we intended to develop a statistical test to determine whether or not a pooled measurement is affected by MMEs. Given the significance level $\alpha\in[0,1]$, for each $i\in[n]$, we reject the null hypothesis $\mathsf{H_{0,i}}:\delta^{*}_{i}=0$ in favor of $\mathsf{H_{1,i}}:\delta^{*}_{i}\neq 0$ when

$$|\hat{\delta}_{Wi}|/\left(\sigma\sqrt{\Sigma_{A_{ii}}}\right)>z_{\alpha/2}. \qquad (20)$$

A desirable property of a statistical test is that the probability of rejecting the null hypothesis when the alternative is true converges to 1 (such a test is referred to as consistent). Theorem 2 ensures that the proposed Drlts are consistent. In other words, the sensitivity and specificity (as described in Sec. V) of both these tests approach 1 as $n,p\rightarrow\infty$.
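A short numpy/scipy sketch of the two decision rules (19) and (20) is given below; it assumes the debiased estimates $\boldsymbol{\hat{\beta}_{W}},\boldsymbol{\hat{\delta}_{W}}$ (with $\boldsymbol{W}=\boldsymbol{A}$) have already been computed as in (14)-(15), and the function name is illustrative.

```python
# Sketch of the Drlt decision rules (19) and (20) for the choice W = A.
import numpy as np
from scipy.stats import norm

def drlt_tests(beta_W, delta_W, A, sigma, alpha=0.01):
    n = A.shape[0]
    z_crit = norm.ppf(1 - alpha / 2)                     # z_{alpha/2}
    # (19): declare sample j defective if sqrt(n)|beta_W_j| / sigma exceeds z_{alpha/2}
    defective = np.sqrt(n) * np.abs(beta_W) / sigma > z_crit
    # (20): declare measurement i MME-affected, standardized by the diagonal of Sigma_A in (18)
    R = np.eye(n) - A @ A.T / n
    Sigma_A = R @ R.T
    mme_affected = np.abs(delta_W) / (sigma * np.sqrt(np.diag(Sigma_A))) > z_crit
    return defective, mme_affected
```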

IV Optimal Debiased Lasso Test

It is well known that a statistical test based on a statistic with smaller variance is generally more powerful than one based on a statistic with higher variance [9]. Therefore, we wish to design a weight matrix $\boldsymbol{W}$ that reduces the variance of the debiased robust Lasso estimate, in order to construct an ‘optimal’ debiased robust Lasso test. Hence we choose $\boldsymbol{W}$ so as to minimize the sum of the asymptotic variances of the debiased estimates $\{\hat{\beta}_{Wj}\}_{j=1}^{p}$. The procedure for the design of $\boldsymbol{W}$ is presented in Alg. 2.

Rearranging (14) and (15) followed by some algebra, we obtain the following expressions:

$$\boldsymbol{\hat{\beta}_{W}}-\boldsymbol{\beta^{*}}=\frac{1}{n}\boldsymbol{W}^{\top}\boldsymbol{\eta}+\left(\boldsymbol{I_{p}}-\frac{1}{n}\boldsymbol{W}^{\top}\boldsymbol{A}\right)(\boldsymbol{\beta^{*}}-\boldsymbol{\hat{\beta}_{\lambda_{1}}})+\frac{1}{n}\boldsymbol{W}^{\top}\left(\boldsymbol{\delta^{*}}-\boldsymbol{\hat{\delta}_{\lambda_{2}}}\right), \qquad (21)$$
$$\boldsymbol{\hat{\delta}_{W}}-\boldsymbol{\delta^{*}}=\left(\boldsymbol{I_{n}}-\frac{1}{n}\boldsymbol{W}\boldsymbol{A}^{\top}\right)\boldsymbol{\eta}+\left(\boldsymbol{I_{n}}-\frac{1}{n}\boldsymbol{W}\boldsymbol{A}^{\top}\right)\boldsymbol{A}(\boldsymbol{\beta^{*}}-\boldsymbol{\hat{\beta}_{\lambda_{1}}})-\frac{1}{n}\boldsymbol{WA}^{\top}\left(\boldsymbol{\delta^{*}}-\boldsymbol{\hat{\delta}_{\lambda_{2}}}\right). \qquad (22)$$

Note that the first term on the RHS of both (21) and (22) is zero-mean Gaussian. The remaining two terms in both equations are bias terms. In order to develop an optimal hypothesis test for the debiased robust Lasso, we show that (i) the variances of the first terms on the RHS of (21) and (22) are bounded with appropriate scaling as $n,p\to\infty$; and (ii) the two bias terms in (21) and (22) go to 0 in probability as $n,p\to\infty$. In such a situation, the sum of the asymptotic variances of the elements of $\boldsymbol{\hat{\beta}_{W}}$ will be $\frac{\sigma^{2}}{n^{2}}\sum_{j=1}^{p}\boldsymbol{w_{.j}}^{\top}\boldsymbol{w_{.j}}$.

Algorithm 2 Design of $\boldsymbol{W}$
Input: $\boldsymbol{A}$, $\mu_{1}$, $\mu_{2}$ and $\mu_{3}$
Output: $\boldsymbol{W}$
1:  We solve the following optimisation problem:
    minimize over $\boldsymbol{W}$: $\quad\sum_{j=1}^{p}\boldsymbol{w_{.j}}^{\top}\boldsymbol{w_{.j}}$
    subject to
    $\mathsf{C0}:\ \boldsymbol{w_{.j}}^{\top}\boldsymbol{w_{.j}}/n\leq 1\ \forall\ j\in[p]$,
    $\mathsf{C1}:\ \left|\boldsymbol{I_{p}}-\frac{1}{n}\boldsymbol{W^{\top}A}\right|_{\infty}\leq\mu_{1}$,
    $\mathsf{C2}:\ \left|\frac{1}{p}\left(\boldsymbol{I_{n}}-\frac{1}{n}\boldsymbol{W}\boldsymbol{A}^{\top}\right)\boldsymbol{A}\right|_{\infty}\leq\mu_{2}$,
    $\mathsf{C3}:\ \frac{n}{p\sqrt{1-\frac{n}{p}}}\left|\frac{\boldsymbol{WA^{\top}}}{n}-\frac{p}{n}\boldsymbol{I_{n}}\right|_{\infty}\leq\mu_{3}$,
    where $\mu_{1}\triangleq 2\sqrt{\frac{2\log p}{n}}$, $\mu_{2}\triangleq 2\sqrt{\frac{\log(2np)}{np}}+\frac{1}{n}$ and $\mu_{3}\triangleq\frac{2}{\sqrt{1-n/p}}\sqrt{\frac{2\log n}{p}}$.
2:  If the above problem is not feasible, then set $\boldsymbol{W}=\boldsymbol{A}$.
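A Python/cvxpy sketch of Alg. 2 is shown below (the paper used CVX in MATLAB); the problem sizes are deliberately small, entrywise $\ell_{\infty}$ norms are expressed via elementwise absolute values, and the fallback to $\boldsymbol{W}=\boldsymbol{A}$ mirrors step 2. It is an illustration, not the paper's exact implementation.

```python
# Hedged cvxpy sketch of Alg. 2 (design of W); sizes are assumptions chosen for speed.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(4)
n, p = 30, 60
A = 2 * rng.binomial(1, 0.5, size=(n, p)) - 1
mu1 = 2 * np.sqrt(2 * np.log(p) / n)
mu2 = 2 * np.sqrt(np.log(2 * n * p) / (n * p)) + 1.0 / n
mu3 = (2 / np.sqrt(1 - n / p)) * np.sqrt(2 * np.log(n) / p)

W = cp.Variable((n, p))
constraints = [
    cp.max(cp.sum(cp.square(W), axis=0)) / n <= 1,                              # C0
    cp.max(cp.abs(np.eye(p) - W.T @ A / n)) <= mu1,                             # C1
    cp.max(cp.abs((np.eye(n) - W @ A.T / n) @ A)) / p <= mu2,                   # C2
    (n / (p * np.sqrt(1 - n / p)))
    * cp.max(cp.abs(W @ A.T / n - (p / n) * np.eye(n))) <= mu3,                 # C3
]
prob = cp.Problem(cp.Minimize(cp.sum_squares(W)), constraints)
prob.solve()
W_opt = W.value if prob.status in ("optimal", "optimal_inaccurate") else A      # step 2 fallback
```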

Theorem 3 (given below) establishes that the second and third terms on the RHS of both (21) and (22) go to 0 in probability. We design $\boldsymbol{W}$ to minimize the expression $\sum_{j=1}^{p}\boldsymbol{w_{.j}^{\top}w_{.j}}$ subject to constraints $\mathsf{C0},\mathsf{C1},\mathsf{C2},\mathsf{C3}$ on $\boldsymbol{W}$, which are summarized in Alg. 2. The values of $\mu_{1},\mu_{2},\mu_{3}$ are selected in such a way that each of the constraints $\mathsf{C1},\mathsf{C2},\mathsf{C3}$ in Alg. 2 holds with high probability for the choice $\boldsymbol{W}\triangleq\boldsymbol{A}$, as will be formally established in Lemma 5. These constraints are derived from Theorem 3 and ensure that the bias terms go to 0. In particular, the constraint $\mathsf{C1}$ (via $\mu_{1}$) controls the rate of convergence of the bias terms on the RHS of (21), whereas the constraint $\mathsf{C2}$ (via $\mu_{2}$) controls the rate of convergence of the bias terms on the RHS of (22). Furthermore, the constraint $\mathsf{C3}$ ensures that the asymptotic variance of the first term on the RHS of (22) converges. Essentially, the choice $\boldsymbol{W}\triangleq\boldsymbol{A}$ helps us establish that the set of all possible $\boldsymbol{W}$ matrices satisfying the constraints in Alg. 2 is non-empty with high probability. Finally, Theorem 4 establishes that the variances of the first terms on the RHS of (21) and (22) converge. These theorems play a vital role in deriving Theorem 5, which leads to the optimal debiased robust Lasso tests.

Theorem 3

Let $\boldsymbol{\hat{\beta}_{\lambda_{1}}},\boldsymbol{\hat{\delta}_{\lambda_{2}}}$ be as in (6), let $\boldsymbol{\hat{\beta}_{W}}$, $\boldsymbol{\hat{\delta}_{W}}$ be as in (14), (15) respectively, and set $\lambda_{1}\triangleq\frac{4\sigma\sqrt{\log p}}{\sqrt{n}}$, $\lambda_{2}\triangleq\frac{4\sigma\sqrt{\log n}}{n}$. Let $\boldsymbol{A}$ be a random Rademacher matrix and let $\boldsymbol{W}$ be obtained from Alg. 2. Then, if $n$ is $o(p)$ and $n$ is $\omega[((s+r)\log p)^{2}]$, as $p,n\to\infty$ we have:

  1. 1.
     $$\left\|\sqrt{n}\left(\boldsymbol{I_{p}}-\frac{1}{n}\boldsymbol{W^{\top}}\boldsymbol{A}\right)(\boldsymbol{\beta^{*}}-\boldsymbol{\hat{\beta}_{\lambda_{1}}})\right\|_{\infty}=o_{P}(1). \qquad (23)$$
  2. 2.
     $$\left\|\frac{1}{\sqrt{n}}\boldsymbol{W}^{\top}\left(\boldsymbol{\delta^{*}}-\boldsymbol{\hat{\delta}_{\lambda_{2}}}\right)\right\|_{\infty}=o_{P}(1). \qquad (24)$$
  3. 3.
     $$\left\|\frac{n}{p\sqrt{1-n/p}}\left(\boldsymbol{I_{n}}-\frac{1}{n}\boldsymbol{W}\boldsymbol{A}^{\top}\right)\boldsymbol{A}(\boldsymbol{\beta^{*}}-\boldsymbol{\hat{\beta}_{\lambda_{1}}})\right\|_{\infty}=o_{P}(1). \qquad (25)$$
  4. 4.
     $$\left\|\frac{n}{p\sqrt{1-n/p}}\cdot\frac{1}{n}\boldsymbol{WA}^{\top}\left(\boldsymbol{\delta^{*}}-\boldsymbol{\hat{\delta}_{\lambda_{2}}}\right)\right\|_{\infty}=o_{P}(1). \qquad (26)$$

\blacksquare

Now, note that the variance-covariance matrices of the first terms on the RHS of (21) and (22), respectively, are

$$\boldsymbol{\Sigma_{\beta}}\triangleq\mathrm{Var}\left(\frac{1}{n}\boldsymbol{W}^{\top}\boldsymbol{\eta}\right)=\sigma^{2}\frac{1}{n}\boldsymbol{W^{\top}W}, \qquad (27)$$
$$\boldsymbol{\Sigma_{\delta}}\triangleq\mathrm{Var}\left(\left(\boldsymbol{I_{n}}-\frac{1}{n}\boldsymbol{W}\boldsymbol{A}^{\top}\right)\boldsymbol{\eta}\right)=\sigma^{2}\left(\boldsymbol{I_{n}}-\frac{1}{n}\boldsymbol{WA}^{\top}\right)\left(\boldsymbol{I_{n}}-\frac{1}{n}\boldsymbol{WA}^{\top}\right)^{\top}. \qquad (28)$$

Theorem 4 shows that when $\boldsymbol{W}$ is chosen as per Alg. 2, the element-wise variances of the first term on the RHS of (21) (the diagonal elements of $\boldsymbol{\Sigma_{\beta}}$) approach $\sigma^{2}$ in probability. The constraints $\mathsf{C0}$ and $\mathsf{C1}$ of Alg. 2 are mainly used to establish this theorem. Further, for the optimal choice of $\boldsymbol{W}$ as in Alg. 2, we show that the element-wise variances of the first term on the RHS of (22) (the diagonal elements of $\boldsymbol{\Sigma_{\delta}}$, appropriately scaled) also approach $\sigma^{2}$ in probability. To establish this, we use the constraint $\mathsf{C3}$ of Alg. 2.
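For reference, the diagonal entries of $\boldsymbol{\Sigma_{\beta}}$ in (27) and $\boldsymbol{\Sigma_{\delta}}$ in (28), which standardize the test statistics below, can be computed as in the following small numpy helper (illustrative; the function name is not from the paper):

```python
# Diagonal entries of Sigma_beta (27) and Sigma_delta (28), used to standardize the tests.
import numpy as np

def debiasing_variances(W, A, sigma):
    n = A.shape[0]
    Sigma_beta_diag = sigma**2 * np.sum(W * W, axis=0) / n      # diag of sigma^2 W^T W / n
    R = np.eye(n) - W @ A.T / n
    Sigma_delta_diag = sigma**2 * np.sum(R * R, axis=1)         # diag of sigma^2 R R^T
    return Sigma_beta_diag, Sigma_delta_diag
```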

Theorem 4

Let $\boldsymbol{A}$ be a Rademacher matrix. Suppose $\boldsymbol{W}$ is obtained from Alg. 2 and $\boldsymbol{\Sigma_{\beta}}$ and $\boldsymbol{\Sigma_{\delta}}$ are defined as in (27) and (28), respectively. If $n\log n$ is $o(p)$ and $n$ is $\omega[((s+r)\log p)^{2}]$, then as $n,p\to\infty$ we have the following:

  1. (1)

     For $j\in[p]$,

     $$\Sigma_{\beta_{jj}}\overset{P}{\to}\sigma^{2}. \qquad (29)$$
  2. (2)

     For $i\in[n]$,

     $$\frac{n^{2}}{p^{2}(1-n/p)}\Sigma_{\delta_{ii}}\overset{P}{\to}\sigma^{2}. \qquad (30)$$

    \blacksquare

When we choose an optimal $\boldsymbol{W}$ as per Alg. 2, equations (21) and (22), along with Theorems 3 and 4, can be used to derive the asymptotic distributions of $\boldsymbol{\hat{\beta}_{W}}$ and $\boldsymbol{\hat{\delta}_{W}}$. This is accomplished in Theorem 5, which can be viewed as a non-trivial extension of Theorem 2 to such an optimal choice of $\boldsymbol{W}$.

Theorem 5

Let $\boldsymbol{\hat{\beta}_{\lambda_{1}}},\boldsymbol{\hat{\delta}_{\lambda_{2}}}$ be as in (6), let $\boldsymbol{\hat{\beta}_{W}}$, $\boldsymbol{\hat{\delta}_{W}}$ be as in (14), (15) respectively, and set $\lambda_{1}\triangleq\frac{4\sigma\sqrt{\log p}}{\sqrt{n}}$, $\lambda_{2}\triangleq\frac{4\sigma\sqrt{\log n}}{n}$. Let $\boldsymbol{A}$ be a random Rademacher matrix and let $\boldsymbol{W}$ be the debiasing matrix obtained from Alg. 2. If $n$ is $\omega[((s+r)\log p)^{2}]$ and $n\log n$ is $o(p)$, then we have:

  1. (1)

     For fixed $j\in[p]$,

     $$\frac{\sqrt{n}(\hat{\beta}_{Wj}-\beta^{*}_{j})}{\sqrt{\Sigma_{\beta_{jj}}}}\xrightarrow{\mathcal{L}}N\left(0,1\right)\mbox{ as }p,n\to\infty. \qquad (31)$$
  2. (2)

     For fixed $i\in[n]$,

     $$\frac{\hat{\delta}_{Wi}-\delta^{*}_{i}}{\sqrt{\Sigma_{\delta_{ii}}}}\xrightarrow{\mathcal{L}}N\left(0,1\right)\mbox{ as }p,n\to\infty, \qquad (32)$$

     where $\Sigma_{\beta_{jj}}$ and $\Sigma_{\delta_{ii}}$ are the $j^{\text{th}}$ and $i^{\text{th}}$ diagonal elements of the matrices $\boldsymbol{\Sigma_{\beta}}$ (as in (27)) and $\boldsymbol{\Sigma_{\delta}}$ (as in (28)), respectively.

\blacksquare

Theorem 5 paves the way to develop an optimal Drlt for Aims (i) and (ii) of this work, along similar lines to the Drlt.
Optimal Drlt for $\boldsymbol{\beta^{*}}$: As in the Drlt for $\boldsymbol{\beta^{*}}$, we now present a hypothesis testing procedure, for an optimally designed $\boldsymbol{W}$, to determine defective samples based on Theorem 5. As before, given $\alpha>0$, we reject the null hypothesis $\mathsf{G_{0,j}}:\beta^{*}_{j}=0$ in favor of $\mathsf{G_{1,j}}:\beta^{*}_{j}\neq 0$, for each $j\in[p]$, when

$$\sqrt{n}|\hat{\beta}_{Wj}|\big{/}\sqrt{\Sigma_{\beta_{jj}}}>z_{\alpha/2}, \qquad (33)$$

where $z_{\alpha/2}$ is as in (19).

Optimal Drlt for $\boldsymbol{\delta^{*}}$: As in the Drlt for $\boldsymbol{\delta^{*}}$, we develop a hypothesis testing procedure, corresponding to the optimal $\boldsymbol{W}$, to determine whether or not a measurement in $\boldsymbol{y}$ is affected by effective MMEs, based on Theorem 5. As before, given $\alpha>0$, for $i\in[n]$, we reject the null hypothesis $\mathsf{H_{0,i}}:\delta^{*}_{i}=0$ in favor of $\mathsf{H_{1,i}}:\delta^{*}_{i}\neq 0$ when

$$|\hat{\delta}_{Wi}|\big{/}\sqrt{\Sigma_{\delta_{ii}}}>z_{\alpha/2}. \qquad (34)$$

Similar to the Drlts in (19) and (20), Theorem 5 ensures that the proposed Optimal Drlts are consistent size-$\alpha$ tests. This implies that the sensitivity and specificity of both these tests approach 1 as $n,p\rightarrow\infty$. In Section V, we show that the Optimal Drlt is superior to the Drlt in finite-sample scenarios.

V Experimental Results

Data Generation: We now describe the method of data generation for our simulation study. We synthetically generated signals (i.e., $\boldsymbol{\beta^{*}}$) with $p=500$ elements each. Of the non-zero values of $\boldsymbol{\beta^{*}}$, 40% were drawn i.i.d. from $U(50,100)$ and the remaining 60% were drawn i.i.d. from $U(500,10^{3})$, and these were placed at randomly chosen indices. The elements of the pooling matrix $\boldsymbol{B}$ were drawn i.i.d. from $\textrm{Bernoulli}(0.5)$, thereby producing $\boldsymbol{A}\triangleq 2\boldsymbol{B}-\boldsymbol{1_{n}}\boldsymbol{1_{p}}^{\top}$, where $\boldsymbol{A}$ is Rademacher distributed. In order to generate effective bit-flips, sign changes were induced in an adversarial manner in randomly chosen rows of $\boldsymbol{A}$, at column indices corresponding to the non-zero locations of $\boldsymbol{\beta^{*}}$. This yielded the perturbed matrix $\boldsymbol{\hat{A}}$, produced via an adversarial form of the model mismatch error (MME) for bit-flips, which is described in the following paragraph. Define the fractions $f_{sp}\triangleq s/p$ and $f_{adv}\triangleq r/n$. We chose the noise standard deviation $\sigma$ to be a fraction of the mean absolute value of the noiseless measurements, i.e., we set $\sigma\triangleq f_{\sigma}\sum_{i=1}^{n}|\boldsymbol{a_{i.}\beta^{*}}|/n$ where $0<f_{\sigma}<1$. For different simulation scenarios, different values of $s=\|\boldsymbol{\beta^{*}}\|_{0}$ (via $f_{sp}$), $r=\|\boldsymbol{\delta^{*}}\|_{0}$ (via $f_{adv}$), noise standard deviation $\sigma$ (via $f_{\sigma}$) and number of measurements $n$ were chosen, as described in the following paragraphs.
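A hedged numpy sketch of this data-generation protocol is given below; the specific values of $f_{sp},f_{adv},f_{\sigma}$ and $n$ used here are merely illustrative, not the paper's reported settings.

```python
# Illustrative sketch of the data generation described above (parameter values are assumptions).
import numpy as np

rng = np.random.default_rng(5)
p, n = 500, 300
f_sp, f_adv, f_sigma = 0.02, 0.02, 0.01
s, r = int(f_sp * p), int(f_adv * n)

beta = np.zeros(p)
support = rng.choice(p, s, replace=False)
n_low = int(0.4 * s)                                   # 40% small-magnitude, 60% large-magnitude defectives
beta[support[:n_low]] = rng.uniform(50, 100, n_low)
beta[support[n_low:]] = rng.uniform(500, 1000, s - n_low)

B = rng.binomial(1, 0.5, size=(n, p))
A = 2 * B - 1
A_hat = A.copy()
for i in rng.choice(n, r, replace=False):              # adversarial bit-flips, only at support indices
    j = rng.choice(support)
    A_hat[i, j] = -A_hat[i, j]

sigma = f_sigma * np.abs(A @ beta).mean()
y = A_hat @ beta + sigma * rng.standard_normal(n)      # measurements come from the *unknown* A_hat
```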

Choice of Model Mismatch Error: In our work, all MMEs were generated in the following manner. A bit-flipped pool (measurement, as described in (5)) contains exactly one bit-flip at a randomly chosen index. Suppose that the $i^{\text{th}}$ pool (measurement) contains a bit-flip. Then exactly one of the following two events can happen: (1) some $j^{\text{th}}$ sample that was intended to be in the pool (as defined in $\boldsymbol{A}$) is excluded, or (2) some $j^{\text{th}}$ sample that was not intended to be part of the pool (as defined in $\boldsymbol{A}$) is included. These two cases lead to the following changes in the $i^{\text{th}}$ row of $\boldsymbol{\hat{A}}$ (as compared to the $i^{\text{th}}$ row of $\boldsymbol{A}$), and in both cases the choice of $j\in[p]$ is uniformly random: Case 1: $\hat{a}_{ij}=-1$ but $a_{ij}=1$; Case 2: $\hat{a}_{ij}=1$ but $a_{ij}=-1$. Note that under this scheme, the generated bit-flips may not be effective. Hence MMEs need to be applied in an adversarial setting by inducing bit-flips only at those entries in any row of $\boldsymbol{\hat{A}}$ corresponding to column indices with non-zero values of $\boldsymbol{\beta^{*}}$.

Choice of Regularization Parameters: The regularization parameters $\lambda_{1}$ and $\lambda_{2}$ were chosen such that $\log(\lambda_{1})$ and $\log(\lambda_{2})$ were from the range $[1:0.25:7]$, in the following manner. We first identified values of $\lambda_{1}$ and $\lambda_{2}$ such that the Lilliefors test [32] confirmed the Gaussian distribution of both $\sqrt{n}\hat{\beta}_{Wj}/\sqrt{\Sigma_{\beta_{jj}}}$ and $\hat{\delta}_{Wi}/\sqrt{\Sigma_{\delta_{ii}}}$ (see the Odrlt tests in (33) and (34)) at the 1% significance level, for at least 70% of $j\in[p]$ (coordinates of $\boldsymbol{\beta^{*}}$) and $i\in[n]$ (coordinates of $\boldsymbol{\delta^{*}}$). Out of these chosen values, we determined the values of $\lambda_{1},\lambda_{2}$ that minimized the average cross-validation error over 10 folds. In each fold, 90% of the $n$ measurements (denoted by a sub-vector $\boldsymbol{y_{r}}$ corresponding to the sub-matrix $\boldsymbol{A_{r}}$) were used to obtain ($\boldsymbol{\hat{\beta}_{\lambda_{1}}},\boldsymbol{\hat{\delta}_{\lambda_{2}}}$) via the robust Lasso, and the remaining 10% of the measurements (denoted by a sub-vector $\boldsymbol{y_{cv}}$ corresponding to measurements generated by the sub-matrix $\boldsymbol{A_{cv}}$) were used to estimate the cross-validation error $\|\boldsymbol{y_{cv}}-\boldsymbol{A_{cv}}\boldsymbol{\hat{\beta}_{\lambda_{1}}}-\boldsymbol{I_{cv}}\boldsymbol{\hat{\delta}_{\lambda_{2}}}\|^{2}_{2}$. Note that $\boldsymbol{I_{cv}}$ is a sub-matrix of the identity matrix which samples only some elements of $\boldsymbol{y}$, and hence of $\boldsymbol{\hat{\delta}_{\lambda_{2}}}$. The cross-validation error is known to be an estimable quantity for the mean-squared error [52], which justifies its choice as a method for parameter selection.
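A simplified sketch of the cross-validation step is given below; the Lilliefors screening of candidate $(\lambda_{1},\lambda_{2})$ values is omitted for brevity, robust_lasso is an assumed helper that solves (6) (e.g., the cvxpy sketch given after (6)), and the held-out mismatch term $\boldsymbol{I_{cv}}\boldsymbol{\hat{\delta}_{\lambda_{2}}}$ is dropped in this simplification.

```python
# Simplified 10-fold cross-validation for one (lambda_1, lambda_2) pair; robust_lasso is assumed.
import numpy as np

def cv_error(y, A, lam1, lam2, robust_lasso, n_folds=10, seed=0):
    n = len(y)
    folds = np.array_split(np.random.default_rng(seed).permutation(n), n_folds)
    err = 0.0
    for held_out in folds:
        train = np.setdiff1d(np.arange(n), held_out)
        beta_hat, _ = robust_lasso(y[train], A[train], lam1, lam2)   # fit (6) on ~90% of rows
        # The paper's error additionally subtracts the held-out mismatch estimates I_cv * delta_hat;
        # that term is omitted in this simplified sketch.
        err += np.sum((y[held_out] - A[held_out] @ beta_hat) ** 2)
    return err / n_folds
```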

Evaluation Measures of Hypothesis Tests: Many different variants of the Lasso estimator were compared empirically against each other, as described in subsequent subsections. Each of them was modeled and solved using the CVX package in MATLAB. Results for the hypothesis tests (given in (19), (20), (33) and (34)) are reported in terms of sensitivity and specificity (defined below). The significance level of these tests was chosen to be 1%. Consider a binary signal $\boldsymbol{\hat{b}_{\beta}}$ with $p$ elements. In our simulations, a sample at index $j$ in $\boldsymbol{\hat{\beta}_{W}}$ is declared to be defective if the hypothesis $\mathsf{G_{0,j}}$ is rejected, in which case we set $\hat{b}_{\beta,j}=1$. In all other cases, we set $\hat{b}_{\beta,j}=0$. We declare an element to be a true defective if $\beta^{*}_{j}\neq 0$ and $\hat{b}_{\beta,j}\neq 0$, and a false defective if $\beta^{*}_{j}=0$ but $\hat{b}_{\beta,j}\neq 0$. We declare it to be a false non-defective if $\beta^{*}_{j}\neq 0$ but $\hat{b}_{\beta,j}=0$, and a true non-defective if $\beta^{*}_{j}=0$ and $\hat{b}_{\beta,j}=0$. The sensitivity for $\boldsymbol{\beta^{*}}$ is defined as (# true defectives)/(# true defectives + # false non-defectives), and the specificity for $\boldsymbol{\beta^{*}}$ is defined as (# true non-defectives)/(# true non-defectives + # false defectives). We report the results of testing for the debiased tests using: (i) $\boldsymbol{W}\triangleq\boldsymbol{A}$, corresponding to Drlt (see (19) and (20)), and (ii) the optimal $\boldsymbol{W}$ from Alg. 2, corresponding to Odrlt (see (33) and (34)).
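The sensitivity and specificity defined above can be computed as in the following small numpy helper (illustrative; the function name is not from the paper):

```python
# Sensitivity and specificity of a recovered defect-status vector against the ground truth.
import numpy as np

def sens_spec(beta_true, b_hat):
    defective = beta_true != 0
    declared = b_hat != 0
    tp = np.sum(defective & declared)          # true defectives
    fn = np.sum(defective & ~declared)         # false non-defectives
    tn = np.sum(~defective & ~declared)        # true non-defectives
    fp = np.sum(~defective & declared)         # false defectives
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity
```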

V-A Results with Baseline Debiasing Techniques in the Presence of MMEs

We now describe the results of an experiment to show the impact of Odrlt (33) in the presence of MMEs induced in 𝑨\boldsymbol{A}. We compare Odrlt with the baseline hypothesis test for 𝜷\boldsymbol{\beta^{*}} defined in [28], which is equivalent to ignoring MMEs (i.e., setting 𝜹=0\boldsymbol{\delta^{*}}=0 in (5)). To account for the presence of MMEs, we further compare Odrlt with the baseline test of [28] applied with the approximate inverse of the augmented sensing matrix (𝑨|𝑰𝒏)(\boldsymbol{A}|\boldsymbol{I_{n}}) as obtained from Alg. 1 with μ=2log(n+p)n\mu=2\sqrt{\frac{\log(n+p)}{n}}. Note, moreover, that the theoretical results established in [28] hold for completely random or purely deterministic sensing matrices, whereas the sensing matrix corresponding to the MME model, i.e., (𝑨|𝑰𝒏)(\boldsymbol{A}|\boldsymbol{I_{n}}), is partly random and partly deterministic. We now describe these two chosen baseline hypothesis tests for 𝜷\boldsymbol{\beta^{*}} in more detail.

1. Baseline ignoring MMEs (Baseline-1): This approach computes the following ‘debiased’ estimate of 𝜷\boldsymbol{\beta^{*}}, as given in Equation (5) of [28]:

    𝜷^𝒃𝜷^𝝀,𝒃+1n𝑴𝑨(𝒚𝑨𝜷^𝝀,𝒃),\boldsymbol{\hat{\beta}_{b}}\triangleq\boldsymbol{\hat{\beta}_{\lambda,b}}+\frac{1}{n}\boldsymbol{MA}^{\top}(\boldsymbol{y}-\boldsymbol{A\hat{\beta}_{\lambda,b}}), (35)

    where 𝜷^𝝀,𝒃argmin𝜷𝒚𝑨𝜷22+λ𝜷1\boldsymbol{\hat{\beta}_{\lambda,b}}\triangleq\text{argmin}_{\boldsymbol{\beta}}\|\boldsymbol{y}-\boldsymbol{A\beta}\|^{2}_{2}+\lambda\|\boldsymbol{\beta}\|_{1}, and 𝑴\boldsymbol{M} is the approximate inverse of 𝑨\boldsymbol{A} obtained from Alg. 1. In this baseline approach, we reject the null hypothesis 𝖦𝟢,𝗃:βj=0\mathsf{G_{0,j}}:\beta^{*}_{j}=0 in favor of 𝖦𝟣,𝗃:βj0\mathsf{G_{1,j}}:\beta^{*}_{j}\neq 0, for each j[p]j\in[p] when n𝜷^𝒃𝒋/σ2[𝑴𝑨𝑨𝑴]jj>zα/2\sqrt{n}\boldsymbol{\hat{\beta}_{bj}}/\sqrt{\sigma^{2}[\boldsymbol{MA^{\top}AM^{\top}}]_{jj}}>z_{\alpha/2}.

2. Baseline considering MMEs (Baseline-2): In this approach, we account for MMEs by treating (𝑨|𝑰𝒏)(\boldsymbol{A}|\boldsymbol{I_{n}}) as the sensing matrix and 𝒙=(𝜷,𝜹)\boldsymbol{x^{*}}=(\boldsymbol{\beta^{*}}^{\top},\boldsymbol{\delta^{*}}^{\top})^{\top} as the signal vector. The ‘debiased’ estimate of 𝒙\boldsymbol{x^{*}} in this approach is given as:

    𝒙~𝒃𝒙~𝝀+1n𝑴~(𝑨|𝑰𝒏)(𝒚(𝑨|𝑰𝒏)𝒙~𝝀),\boldsymbol{\tilde{x}_{b}}\triangleq\boldsymbol{\tilde{x}_{\lambda}}+\frac{1}{n}\boldsymbol{\tilde{M}}(\boldsymbol{A}|\boldsymbol{I_{n}})^{\top}(\boldsymbol{y}-(\boldsymbol{A}|\boldsymbol{I_{n}})\boldsymbol{\tilde{x}_{\lambda}}), (36)

where 𝒙~𝝀argmin𝒙𝒚(𝑨|𝑰𝒏)𝒙22+λ𝒙1\boldsymbol{\tilde{x}_{\lambda}}\triangleq\text{argmin}_{\boldsymbol{x}}\|\boldsymbol{y}-(\boldsymbol{A}|\boldsymbol{I_{n}})\boldsymbol{x}\|^{2}_{2}+\lambda\|\boldsymbol{x}\|_{1} and 𝑴~\boldsymbol{\tilde{M}} is the approximate inverse of (𝑨|𝑰𝒏)(\boldsymbol{A}|\boldsymbol{I_{n}}) obtained from Alg. 1. Then 𝜷~𝒃\boldsymbol{\tilde{\beta}_{b}} is obtained by extracting the first pp elements of 𝒙~𝒃\boldsymbol{\tilde{x}_{b}}. In this approach, we reject the null hypothesis 𝖦𝟢,𝗃:βj=0\mathsf{G_{0,j}}:\beta^{*}_{j}=0 in favor of 𝖦𝟣,𝗃:βj0\mathsf{G_{1,j}}:\beta^{*}_{j}\neq 0, for each j[p]j\in[p] when n𝜷~𝒃𝒋/σ2[𝑴~(𝑨|𝑰𝒏)(𝑨|𝑰𝒏)𝑴~]jj>zα/2\sqrt{n}\boldsymbol{\tilde{\beta}_{bj}}/\sqrt{\sigma^{2}[\boldsymbol{\tilde{M}(A|I_{n})^{\top}(A|I_{n})\tilde{M}^{\top}}]_{jj}}>z_{\alpha/2}.
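Both baselines (and, with the appropriate covariance matrices, the Drlt/Odrlt tests in (19), (20), (33) and (34)) reduce to comparing a studentized debiased coordinate against a standard normal quantile. The sketch below is our own illustration of this rejection rule, not code from the paper; we use the absolute value of the statistic, i.e., the two-sided rule matching the zα/2z_{\alpha/2} threshold, and the inputs are assumed to be available from the debiasing step.

```python
import numpy as np
from scipy.stats import norm

def debiased_coordinate_test(beta_debiased, var_diag, n, alpha=0.01):
    """Reject G_{0,j}: beta_j = 0 when sqrt(n)*|beta_debiased_j| / sqrt(var_diag_j) > z_{alpha/2}.

    var_diag : per-coordinate variance proxy, e.g. sigma^2 * diag(M A^T A M^T) for Baseline-1.
    """
    stat = np.sqrt(n) * np.abs(beta_debiased) / np.sqrt(var_diag)
    return stat > norm.ppf(1.0 - alpha / 2.0)   # boolean vector of rejections (declared defectives)
```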

For these baseline approaches, the regularization parameter λ\lambda was chosen using cross validation. We chose the λ\lambda value which minimized the validation error with 90%90\% of the measurements used for reconstruction and the remaining 10%10\% used for cross-validation.

 n  | Sensitivity Baseline-1 | Sensitivity Baseline-2 | Sensitivity Odrlt | Specificity Baseline-1 | Specificity Baseline-2 | Specificity Odrlt
100 | 0.522 | 0.602 | 0.647 | 0.678 | 0.702 | 0.771
200 | 0.597 | 0.682 | 0.704 | 0.832 | 0.895 | 0.931
300 | 0.698 | 0.802 | 0.878 | 0.884 | 0.915 | 0.963
400 | 0.791 | 0.834 | 0.951 | 0.902 | 0.927 | 0.999
500 | 0.858 | 0.894 | 0.984 | 0.923 | 0.956 | 1

TABLE I: Comparison of average Sensitivity and Specificity (based on 100 independent noise runs) for the tests Baseline-1, Baseline-2 and Odrlt for determining defectives in 𝜷\boldsymbol{\beta^{*}} from their respective debiased estimates in the presence of MMEs induced in 𝑨\boldsymbol{{A}} (See Sec. V-A for detailed definitions).

In Table I, we compare the average values (over 100100 instances of measurement noise) of Sensitivity and Specificity of Baseline-1, Baseline-2 and Odrlt for different values of nn varying in {100,200,300,400,500}\{100,200,300,400,500\} and p=500p=500. It is clear from Table I that, for all values of nn, the Sensitivity and Specificity values of Odrlt are higher than those of Baseline-1 and Baseline-2. The performance of Baseline-2 dominates that of Baseline-1, which indicates that ignoring MMEs may lead to misleading inferences in small-sample scenarios. Furthermore, the Sensitivity and Specificity of Odrlt approach 1 as nn increases. This highlights the superiority of our proposed technique and its associated hypothesis tests over two carefully chosen baselines. Note that there is no prior literature on debiasing in the presence of MMEs, and hence these two baselines are the only possible competitors for our technique.

V-B Empirical verification of asymptotic results of Theorem 5

In this subsection, we compare the empirical distributions of TG,jn(β^Wjβj)/[𝚺𝜷]jjT_{G,j}\triangleq\sqrt{n}(\hat{\beta}_{Wj}-\beta^{*}_{j})/\sqrt{[\boldsymbol{\Sigma_{\beta}}]_{jj}} and TH,i(δ^Wiδi)/[𝚺𝜹]iiT_{H,i}\triangleq(\hat{\delta}_{Wi}-\delta^{*}_{i})/\sqrt{[\boldsymbol{\Sigma_{\delta}}]_{{ii}}}, for the optimal weight matrix 𝑾\boldsymbol{W}, with their asymptotic distributions as derived in Theorem 5. We chose p=500,n=400p=500,n=400, fadv=0.01f_{adv}=0.01, fsp=0.01f_{sp}=0.01 and fσ=0.01f_{\sigma}=0.01. The measurement vector 𝒚\boldsymbol{y} was generated with a perturbed matrix 𝑨^\boldsymbol{\hat{A}} containing effective bit-flips generated using MMEs as described above. Here, TG,jT_{G,j} and TH,iT_{H,i} were computed over 100100 runs across different noise instances in 𝜼\boldsymbol{\eta}.

The left sub-figure of Fig. 1 shows plots of the quantiles of a standard normal random variable versus the quantiles of TG,jT_{G,j} based on 100100 runs for each j[p]j\in[p] in different colors. A 45o45^{o} straight line passing through the origin is also plotted (black solid line) as a reference. These pp different quantile-quantile (QQ) plots corresponding to j[p]j\in[p], all super-imposed on one another, indicate that the quantiles of the TG,jT_{G,j} are close to those of the standard normal distribution in the range [2,2][-2,2] (thus covering about 95%95\% of the area under the standard bell curve) for defective as well as non-defective samples. This confirms that the distributions of the TG,jT_{G,j} are each approximately N(0,1)N(0,1) even in this chosen finite-sample scenario. Similarly, the right sub-figure of Fig. 1 shows the QQ-plot corresponding to TH,iT_{H,i} for each i[n]i\in[n] in different colors. As before, these nn different QQ-plots, one for each i[n]i\in[n], all super-imposed on one another, indicate that the TH,iT_{H,i} are also each approximately standard normal, with or without MMEs.
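A minimal sketch (our own helper, not the paper's plotting code) of how such QQ points can be computed for a vector of statistics is:

```python
import numpy as np
from scipy.stats import norm

def qq_points(t_samples):
    """Return (standard-normal quantiles, empirical quantiles) for the statistics in
    t_samples (e.g. T_{G,j} over the 100 noise runs), ready to be plotted as in Fig. 1."""
    t_sorted = np.sort(np.asarray(t_samples))
    probs = (np.arange(1, t_sorted.size + 1) - 0.5) / t_sorted.size   # plotting positions
    return norm.ppf(probs), t_sorted
```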

Figure 1: Left: Quantile-Quantile plots of 𝒩(0,1)\mathcal{N}(0,1) vs. TG,jT_{G,j} (defined at the beginning of Sec. V-B) using 100 independent noise runs for all j[p]j\in[p] (one plot per index jj with different colors). Right: Quantile-Quantile plots of 𝒩(0,1)\mathcal{N}(0,1) vs. TH,iT_{H,i} (defined at the beginning of Sec. V-B) using 100 independent noise runs for all i[n]i\in[n] (one plot per index ii with different colors). For both plots, the pooling matrix contained MMEs.

V-C Sensitivity and Specificity of Odrlt and Drlt for 𝛅\boldsymbol{\delta^{*}}

We performed experiments to study the sensitivity and specificity of Drlt and Odrlt for 𝜹\boldsymbol{\delta^{*}}. In experimental setup E1, we varied fadv{0.01,,0.1}f_{adv}\in\{0.01,...,0.1\} with fixed values n=400,fsp=0.01,fσ=0.1n=400,f_{sp}=0.01,f_{\sigma}=0.1. In E2, we varied nn from 200 to 500 in steps of 50 with fadv=0.01,fsp=0.01,fσ=0.1f_{adv}=0.01,f_{sp}=0.01,f_{\sigma}=0.1. In E3, we varied fσ{0,0.05,,0.5}f_{\sigma}\in\{0,0.05,...,0.5\} with n=400,fadv=0.01,fsp=0.01n=400,f_{adv}=0.01,f_{sp}=0.01. In E4, we varied fsp{0.01,,0.1}f_{sp}\in\{0.01,...,0.1\} with n=400,fadv=0.01,fσ=0.1n=400,f_{adv}=0.01,f_{\sigma}=0.1. The experiments were run 100 times across different noise instances in 𝜼\boldsymbol{\eta}, for the same signal 𝜷\boldsymbol{\beta^{*}} (in E1, E2 and E3) and sensing matrix 𝑨\boldsymbol{A} (in E1, E3 and E4). In E4, the sparsity of the signal varies; therefore, the signal vector 𝜷\boldsymbol{\beta^{*}} also varies. Similarly, in E2, as nn varies, the sensing matrix 𝑨\boldsymbol{A} also varies.

The empirical sensitivity and specificity of a test were computed as follows. The estimate 𝜹^𝑾\boldsymbol{\hat{\delta}_{W}} was binarized to create a vector 𝒃^𝑾,𝜹\boldsymbol{\hat{b}_{W,\delta}} such that for all i[n]i\in[n], the value of b^W,δ(i)\hat{b}_{W,\delta}(i) was set to 1 if Drlt or Odrlt rejected the hypothesis 𝖧𝟢,𝗂\mathsf{H_{0,i}}, and b^W,δ(i)\hat{b}_{W,\delta}(i) was set to 0 otherwise. Likewise, a ground truth binary vector 𝒃𝜹\boldsymbol{b_{\delta}^{*}} was created which satisfied bδ(i)=1b^{*}_{\delta}(i)=1 at all locations ii where δi0\delta^{*}_{i}\neq 0 and bδ(i)=0b^{*}_{\delta}(i)=0 otherwise. Sensitivity and specificity values were computed by comparing corresponding entries of 𝒃𝜹\boldsymbol{b^{*}_{\delta}} and 𝒃^𝑾,𝜹\boldsymbol{\hat{b}_{W,\delta}}. The sensitivity of the Drlt and Odrlt tests for 𝜹\boldsymbol{\delta^{*}}, averaged over 100 runs of different 𝜼\boldsymbol{\eta} instances, is reported in Fig. 2 for the different experimental settings E1, E2, E3, E4. Under setup E2, the sensitivity plot indicates that the sensitivity of Drlt and Odrlt increases as nn increases. Under setups E1, E3, and E4, the sensitivity of both Drlt and Odrlt is reasonable even with larger values of fadvf_{adv}, fσf_{\sigma}, and fspf_{sp} (which are difficult regimes). In Fig. 2, we compare the sensitivity of Drlt and Odrlt to that of the Robust Lasso from (6) without any debiasing step, which is abbreviated as Rl. To determine defectives and non-defectives for the Rl method, we adopted a thresholding strategy where an estimated element was considered defective (resp. non-defective) if its value was greater than or equal to (resp. less than) a threshold τss\tau_{ss}. The optimal value of τss\tau_{ss} was chosen clairvoyantly (i.e., assuming knowledge of the ground truth) using Youden’s index, i.e., the value that maximises Sensitivity+Specificity1\text{Sensitivity}+\text{Specificity}-1. Furthermore, Fig. 2 indicates that the sensitivity of Odrlt is superior to that of Rl and Drlt, with Drlt also slightly better than Rl. Note that, in practice, a choice of the threshold τss\tau_{ss} for Rl would be challenging and require a representative training set, whereas Drlt and Odrlt do not require any training set for the choice of such a threshold.
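A minimal sketch of this clairvoyant threshold selection (our own helper; the paper only specifies the Youden criterion) is given below. Here we threshold the magnitudes of the Rl estimate, and truth_support is the ground-truth support of the quantity being detected (here, the non-zero entries of 𝜹\boldsymbol{\delta^{*}}).

```python
import numpy as np

def youden_threshold(estimate, truth_support):
    """Scan candidate thresholds on |estimate| and return the one maximizing
    Youden's index, i.e. Sensitivity + Specificity - 1."""
    best_tau, best_index = 0.0, -np.inf
    for tau in np.unique(np.abs(estimate)):
        declared = np.abs(estimate) >= tau
        tp = np.sum(truth_support & declared)
        fn = np.sum(truth_support & ~declared)
        tn = np.sum(~truth_support & ~declared)
        fp = np.sum(~truth_support & declared)
        sens = tp / max(tp + fn, 1)
        spec = tn / max(tn + fp, 1)
        if sens + spec - 1 > best_index:
            best_index, best_tau = sens + spec - 1, tau
    return best_tau
```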

Figure 2: Average Sensitivity and Specificity plots (over 100 independent noise runs) for detecting measurements containing MMEs (i.e. detecting non-zero values of 𝜹\boldsymbol{\delta^{*}}) using Drlt, Odrlt and Robust Lasso (Rl). The experimental parameters are p=500,fσ=0.1,fadv=0.01,fsp=0.1,n=400p=500,f_{\sigma}=0.1,f_{adv}=0.01,f_{sp}=0.1,n=400. Left to right, top to bottom: results for experiments E1, E2, E3, E4 (see Sec. V-C for details).

V-D Identification of Defective Samples in 𝛃\boldsymbol{\beta^{*}}

In the next set of experimental results, we first examined the effectiveness of Drlt and Odrlt to detect defective samples in 𝜷\boldsymbol{\beta^{*}} in the presence of bit-flips in 𝑨\boldsymbol{A} induced as per adversarial MMEs. We compared the performance of Drlt and Odrlt to two other closely related algorithms to enable performance calibration: (1) Robust Lasso (Rl) from (6) without debiasing; (2) A hypothesis testing mechanism on a pooling matrix without model mismatch, which we refer to as Baseline-3. In Baseline-3, we generated measurements with the correct pooling matrix 𝑨\boldsymbol{A} (i.e., 𝜹=0\boldsymbol{\delta^{*}}=0) and obtained a debiased Lasso estimate as given by (11). (Note that Baseline-3 is very different from Baseline-1 and Baseline-2 from Sec. V-A as in this approach 𝜹=0\boldsymbol{\delta^{*}}=0.) Using this debiased estimate, we obtained a hypothesis test similar to Odrlt. In the case of Rl, the decision regarding whether a sample is defective or not was taken based on a threshold τss\tau_{ss} that was chosen as the Youden’s index on a training set of signals from the same distribution. The regularization parameters λ1,λ2\lambda_{1},\lambda_{2} were chosen separately for every choice of parameters fadv,fσ,fspf_{adv},f_{\sigma},f_{sp} and nn.

We examined the variation in sensitivity and specificity with regard to change in the following parameters, keeping all other parameters fixed: (EA) the number of bit-flips in the matrix 𝑨\boldsymbol{A} expressed as a fraction fadv[0,1]f_{adv}\in[0,1] of nn; (EB) the number of pools nn; (EC) the noise standard deviation σ\sigma expressed as a fraction fσ[0,1]f_{\sigma}\in[0,1] of the quantity y¯\bar{y} defined in Sec. V-B; (ED) the sparsity (0\ell_{0} norm) ss of vector 𝜷\boldsymbol{\beta^{*}} expressed as a fraction fsp[0,1]f_{sp}\in[0,1] of the signal dimension pp. For the bit-flips experiment, i.e., (EA), fadvf_{adv} was varied in {0.01,0.02,,0.1}\{0.01,0.02,\ldots,0.1\} with n=400,fsp=0.01,fσ=0.1n=400,f_{sp}=0.01,f_{\sigma}=0.1. For the measurements experiment, i.e., (EB), nn was varied over {200,250,,500}\{200,250,\ldots,500\} with fsp=0.01,fadv=0.01,fσ=0.1f_{sp}=0.01,f_{adv}=0.01,f_{\sigma}=0.1. For the noise experiment, i.e., (EC), we varied fσf_{\sigma} in {0,0.05,,0.5}\{0,0.05,\ldots,0.5\} with n=400,fsp=0.01,fadv=0.01n=400,f_{sp}=0.01,f_{adv}=0.01. For the sparsity experiment, i.e., (ED), fspf_{sp} was varied in {0.01,0.02,,0.1}\{0.01,0.02,\ldots,0.1\} with n=400,fadv=0.01,fσ=0.1n=400,f_{adv}=0.01,f_{\sigma}=0.1.

The Sensitivity and Specificity values, averaged over 100 noise instances, for all four experiments are plotted in Fig. 3. The plots demonstrate the superior performance of Odrlt over Rl and Drlt. Furthermore, the performance of Drlt is also superior to that of Rl. In all regimes, Baseline-3 performs best as it uses an error-free sensing matrix. We also see that for higher nn, lower fσf_{\sigma} and lower fspf_{sp}, the sensitivity and specificity of Odrlt come very close to those of Baseline-3.

Figure 3: Average Sensitivity and Specificity plots (over 100 independent noise runs) for detecting defective samples (i.e., non-zero values of 𝜷\boldsymbol{\beta^{*}}) using Drlt, Odrlt, Robust Lasso and Baseline-3. Left to right, top to bottom: results for experiments (EA), (EB), (EC), (ED). The experimental parameters are p=500,fσ=0.1,fadv=0.01,fsp=0.1,n=400p=500,f_{\sigma}=0.1,f_{adv}=0.01,f_{sp}=0.1,n=400. See Sec. V-D for more details.

V-E RRMSE Comparison of Debiased Robust Lasso Techniques to Baseline Algorithms

Figure 4: Average RRMSE comparison (over 100 independent noise runs) using Odrl, Drl, L1 (Lasso with 1\ell_{1} data fidelity), L2 (standard Lasso), Rl1 (L1 with Ransac), Rl2 (L2 with Ransac), and robust Lasso (Rl) w.r.t. variation in the following parameters, keeping others fixed: bit-flip proportion fadvf_{adv} (top left), number of measurements nn (top right), noise level fσf_{\sigma} (bottom left) and sparsity fspf_{sp} (bottom right). The fixed parameters are p=500,fσ=0.1,fadv=0.01,fsp=0.01,n=400p=500,f_{\sigma}=0.1,f_{adv}=0.01,f_{sp}=0.01,n=400. See Sec. V-E for more details.

We computed estimates of 𝜷\boldsymbol{\beta^{*}} using the debiased robust Lasso technique in two ways: (i) with the weights matrix 𝑾𝑨\boldsymbol{W}\triangleq\boldsymbol{A}, and (ii) the optimal 𝑾\boldsymbol{W} as obtained using Alg. 2. We henceforth refer to these estimators as Debiased Robust Lasso (Drl) and Optimal Debiased Robust Lasso (Odrl) respectively.

We computed the relative root mean squared error (RRMSE) for Drl and Odrl as follows: First, the pooled measurements with MMEs were identified as described in Sec. V-C and then discarded. From the remaining measurements, an estimate of 𝜷\boldsymbol{\beta^{*}} was obtained using robust Lasso with the optimal λ1,λ2\lambda_{1},\lambda_{2} chosen by cross-validation. Given the resultant estimate 𝜷^\boldsymbol{\hat{\beta}}, the RRMSE was computed as 𝜷𝜷^2/𝜷2\|\boldsymbol{\beta^{*}}-\boldsymbol{\hat{\beta}}\|_{2}/\|\boldsymbol{\beta^{*}}\|_{2}.
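A minimal sketch of this RRMSE pipeline (our own helper; solve_robust_lasso can be any solver of (6), e.g. the cvxpy sketch given earlier with cross-validated λ1,λ2\lambda_{1},\lambda_{2}) is:

```python
import numpy as np

def rrmse_after_discarding(y, A, flagged, beta_true, solve_robust_lasso):
    """Discard measurements flagged as MME-affected (Sec. V-C), re-estimate beta from
    the remaining measurements, and return ||beta* - beta_hat||_2 / ||beta*||_2."""
    keep = ~flagged                                   # boolean mask over the n measurements
    beta_hat, _ = solve_robust_lasso(y[keep], A[keep])
    return np.linalg.norm(beta_true - beta_hat) / np.linalg.norm(beta_true)
```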

We compared the RRMSE of Drl and Odrl to that of the following algorithms:

1. Robust Lasso or Rl from (6).

2. Lasso (referred to as L2) based on minimizing 𝒚𝑨𝜷22+λ𝜷1\|\boldsymbol{y}-\boldsymbol{A\beta}\|^{2}_{2}+\lambda\|\boldsymbol{\beta}\|_{1} with respect to 𝜷\boldsymbol{\beta}. Note that this ignores MMEs.

3. An inherently outlier-resistant version of Lasso which uses the 1\ell_{1} data fidelity (referred to as L1), based on minimizing 𝒚𝑨𝜷1+λ𝜷1\|\boldsymbol{y}-\boldsymbol{A\beta}\|_{1}+\lambda\|\boldsymbol{\beta}\|_{1} with respect to 𝜷\boldsymbol{\beta}.

4. Variants of L1 and L2 combined with the well-known Ransac (Random Sample Consensus) framework [16] (described below in more detail). The combined estimators are referred to as Rl1 and Rl2 respectively.

Ransac is a popular randomized robust regression algorithm, widely used in computer vision [17, Chap. 10]. We apply it here to the signal reconstruction problem considered in this paper. In Ransac, multiple small subsets of measurements from 𝒚\boldsymbol{y} are randomly chosen. Let the total number of subsets be NSN_{S}. Let the set of the chosen subsets be denoted by {𝒵i}i=1NS\{\mathcal{Z}_{i}\}_{i=1}^{N_{S}}. From each subset 𝒵i\mathcal{Z}_{i}, the vector 𝜷^(i)\boldsymbol{\hat{\beta}}^{(i)} is estimated, using either L2 or L1. Every measurement is made to ‘cast a vote’ for one of the models from the set {𝜷^(i)}i=1NS\{\boldsymbol{\hat{\beta}}^{(i)}\}_{i=1}^{N_{S}}. We say that measurement yly_{l} (where l[n]l\in[n]) casts a vote for model 𝜷^(j)\boldsymbol{\hat{\beta}}^{(j)} (where j[NS]j\in[N_{S}]) if |yl𝒂𝒍.𝜷^(j)||yl𝒂𝒍.𝜷^(k)||y_{l}-\boldsymbol{a_{l.}}\boldsymbol{\hat{\beta}}^{(j)}|\leq|y_{l}-\boldsymbol{a_{l.}}\boldsymbol{\hat{\beta}}^{(k)}| for all k[NS],kjk\in[N_{S}],k\neq j. Let the model which garners the largest number of votes be denoted by 𝜷^j\boldsymbol{\hat{\beta}}^{j_{*}}, where j[NS]j_{*}\in[N_{S}]. The set of measurements which voted for this model is called the consensus set. Ransac when combined with L2 and L1 is respectively called Rl2 and Rl1. In Rl2, the estimator L2 is used to determine 𝜷\boldsymbol{\beta^{*}} using measurements only from the consensus set. Likewise, in Rl1, the estimator L1 is used to determine 𝜷\boldsymbol{\beta^{*}} using measurements only from the consensus set.
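Below is a minimal sketch of the Rl2 variant (Ransac wrapped around the ℓ2-data-fidelity Lasso); the Rl1 variant would replace the inner solver with an ℓ1-fidelity fit. The helper name, the scikit-learn solver and the regularization level alpha are our own illustrative assumptions; the values NS=500 subsets of size 0.9n follow the setting reported below.

```python
import numpy as np
from sklearn.linear_model import Lasso

def ransac_lasso(y, A, n_subsets=500, subset_frac=0.9, alpha=0.1, seed=0):
    """Ransac around an l2-fidelity Lasso (the Rl2 variant sketched in the text):
    fit a model on each random measurement subset, let every measurement vote for
    the model with the smallest residual on it, and re-fit on the consensus set."""
    rng = np.random.default_rng(seed)
    n = len(y)
    m = int(subset_frac * n)
    models = []
    for _ in range(n_subsets):
        idx = rng.choice(n, size=m, replace=False)
        est = Lasso(alpha=alpha, fit_intercept=False).fit(A[idx], y[idx])
        models.append(est.coef_)
    residuals = np.abs(y[:, None] - A @ np.array(models).T)       # shape (n, n_subsets)
    votes_of = residuals.argmin(axis=1)                            # model each measurement votes for
    winner = np.bincount(votes_of, minlength=n_subsets).argmax()   # model with the most votes
    consensus = votes_of == winner                                 # its consensus set
    return Lasso(alpha=alpha, fit_intercept=False).fit(A[consensus], y[consensus]).coef_
```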

Our experiments in this section were performed for signal and sensing matrix settings identical to those described in Sec. V-D. The performance in all experiments was measured using RRMSE, averaged over reconstructions from 100 independent noise runs. For all techniques, the regularization parameters were chosen using cross-validation following the procedure in [52]. The maximum number of subsets for finding the consensus set in Ransac was set to NS=500N_{S}=500 with 0.9n0.9n measurements in each subset. RRMSE plots for all competing algorithms are presented in Fig. 4, where we see that Odrl and Drl outperformed all other algorithms for all parameter ranges considered here. We also observe that Odrl produces lower RRMSE than Drl, particularly in the regime involving higher fadvf_{adv}.

VI Conclusion

We have presented a technique for determining the sparse vector 𝜷\boldsymbol{\beta^{*}} of health status values from noisy pooled measurements in 𝒚\boldsymbol{y}, with the additional feature that our technique is designed to handle bit-flip errors in the pooling matrix. These bit-flip errors can occur at a small number of unknown locations, due to which the pre-specified matrix 𝑨\boldsymbol{A} (known) and the actual pooling matrix 𝑨^\boldsymbol{\hat{A}} (unknown) via which pooled measurements are acquired, differ from each other. We use the theory of Lasso debiasing as our basic scaffolding to identify the defective samples in 𝜷\boldsymbol{\beta^{*}}, but with extensive and non-trivial theoretical and algorithmic innovations to (i) make the debiasing robust to model mismatch errors (MMEs), and also to (ii) enable identification of the pooled measurements that were affected by the MMEs. Our approach is also validated by an extensive set of simulation results, where the proposed method outperforms intuitive baseline techniques. To our best knowledge, there is no prior literature on using Lasso debiasing to identify measurements with MMEs.

There are several interesting avenues for future work:

1.

    Currently the optimal weights matrix 𝑾\boldsymbol{W} is designed to minimize the variance of the debiased estimates of 𝜷\boldsymbol{\beta^{*}} and not necessarily those of 𝜹\boldsymbol{\delta^{*}}. Our technique could in principle be extended to derive another weights matrix to minimize the variance of the debiased estimates of 𝜹\boldsymbol{\delta^{*}}.

2.

    A specific form of MMEs consists of unknown permutations in the pooling matrix (also called ‘permutation noise’ [51]) where the pooled results are swapped with one another. The techniques in this paper can be extended to identify pooled measurements that suffer from permutation noise, and potentially correct them. These results, which will be reported elsewhere, are an interesting contribution to the growing sub-area of ‘unlabelled sensing’ [41].

3. Our technique is based on work in [28], which requires that 𝜷0<n/logp\|\boldsymbol{\beta^{*}}\|_{0}<\sqrt{n}/\log p. This is a stronger condition than 𝜷0<n/logp\|\boldsymbol{\beta^{*}}\|_{0}<n/\log p, which is common in sparse regression. However, in the literature on Lasso debiasing, there exist techniques such as [7] which relax the condition on 𝜷0\|\boldsymbol{\beta^{*}}\|_{0} to allow for n/logp<𝜷0n/logp\sqrt{n}/\log p<\|\boldsymbol{\beta^{*}}\|_{0}\leq n/\log p, with the caveat that a priori knowledge of 𝜷0\|\boldsymbol{\beta^{*}}\|_{0} is (provably) essential. Incorporating these results within the current framework is another avenue for future research. It is also of interest to combine our results with those on in situ estimation of 𝜷0\|\boldsymbol{\beta^{*}}\|_{0} from pooled or compressed measurements as in [43, 33].

Appendix A Proofs of Theorems and Lemmas on Robust Lasso

We extend Theorem 1 and Lemma 1 of [38] to prove our Theorem 1. We re-parameterize model (5) and the robust Lasso optimisation problem (6) to match those in [38], i.e.,

𝒚=(𝑨|n𝑰𝒏)(𝜷𝜹/n)+𝜼=𝑨𝜷+n𝒆+𝜼,\boldsymbol{y}=\left(\boldsymbol{A}\ |\ \sqrt{n}\boldsymbol{I_{n}}\right)\begin{pmatrix}\boldsymbol{\beta^{*}}\\ \boldsymbol{\delta^{*}}/\sqrt{n}\end{pmatrix}+\boldsymbol{{\eta}}=\boldsymbol{A\beta^{*}}+\sqrt{n}\boldsymbol{e^{*}}+\boldsymbol{\eta}, (37)

where 𝒆𝜹/n\boldsymbol{e^{*}}\triangleq\boldsymbol{\delta^{*}}/\sqrt{n}. Note that the optimization problem (6) is

(𝜷^𝝀𝟏𝜹^𝝀~𝟐)\displaystyle\begin{pmatrix}\boldsymbol{\hat{\beta}_{\lambda_{1}}}\\ \boldsymbol{\hat{\delta}_{\tilde{\lambda}_{2}}}\end{pmatrix} =\displaystyle= argmin𝜷,𝜹12n𝒚(𝑨|n𝑰𝒏)(𝜷𝜹/n)22+λ1𝜷1+λ~2𝜹n1,\displaystyle\arg\min_{\boldsymbol{\beta},\boldsymbol{\delta}}{\frac{1}{2n}}\left\|\boldsymbol{y}-(\boldsymbol{A}|\sqrt{n}\boldsymbol{I_{n}})\begin{pmatrix}\boldsymbol{\beta}\\ \boldsymbol{\delta}/\sqrt{n}\end{pmatrix}\right\|^{2}_{2}+\lambda_{1}\|\boldsymbol{\beta}\|_{1}+{\tilde{\lambda}_{2}}\left\|\frac{\boldsymbol{\delta}}{\sqrt{n}}\right\|_{1},

where λ~2=nλ2\tilde{\lambda}_{2}=\sqrt{n}\lambda_{2}. The equivalent robust Lasso optimization problem for the model (37) is given by:

(𝜷^𝝀𝟏𝒆^𝝀~𝟐)\displaystyle\begin{pmatrix}\boldsymbol{\hat{\beta}_{\lambda_{1}}}\\ \boldsymbol{\hat{e}_{\tilde{\lambda}_{2}}}\end{pmatrix} =\displaystyle= argmin𝜷,𝒆12n𝒚(𝑨|n𝑰𝒏)(𝜷𝒆)22+λ1𝜷1+λ~2𝒆1,\displaystyle\arg\min_{\boldsymbol{\beta},\boldsymbol{e}}{\frac{1}{2n}}\left\|\boldsymbol{y}-(\boldsymbol{A}|\sqrt{n}\boldsymbol{I_{n}})\begin{pmatrix}\boldsymbol{\beta}\\ \boldsymbol{e}\end{pmatrix}\right\|^{2}_{2}+\lambda_{1}\|\boldsymbol{\beta}\|_{1}+{\tilde{\lambda}_{2}}\left\|\boldsymbol{e}\right\|_{1}, (38)

where 𝒆^𝝀𝟐=𝜹^𝝀𝟐/n\boldsymbol{\hat{e}_{\lambda_{2}}}=\boldsymbol{\hat{\delta}_{\lambda_{2}}}/\sqrt{n}. In order to prove Theorem 1, we first recall the Extended Restricted Eigenvalue Condition (EREC) for a sensing matrix from [38]. Given 𝜷\boldsymbol{\beta}^{*} and 𝜹\boldsymbol{\delta}^{*}, let us define sets

𝒮{j:βj0},{i:δi0}.\mathcal{S}\triangleq\{j:\beta^{*}_{j}\neq 0\}\ ,\ \mathcal{R}\triangleq\{i:\delta^{*}_{i}\neq 0\}. (39)

Note that s|𝒮|,r||s\triangleq|\mathcal{S}|,r\triangleq|\mathcal{R}|.

Definition 1

Extended Restricted Eigenvalue Condition (EREC) [38]: Given 𝒮,\mathcal{S},\mathcal{R} as defined in (39), and λ1,λ~2>0\lambda_{1},\tilde{\lambda}_{2}>0, an n×pn\times p matrix 𝐀\boldsymbol{A} is said to satisfy the EREC if there exists a κ>0\kappa>0 such that

1n𝑨𝒉𝜷+n𝒉𝜹2κ(𝒉𝜷2+𝒉𝜹2),\frac{1}{\sqrt{n}}\|\boldsymbol{Ah_{\beta}}+\sqrt{n}\boldsymbol{h_{\delta}}\|_{2}\geq\kappa(\|\boldsymbol{h_{\beta}}\|_{2}+\|\boldsymbol{h_{\delta}}\|_{2}), (40)

for all (𝐡𝛃,𝐡𝛅)𝒞(𝒮,,λ)(\boldsymbol{h_{\beta}},\boldsymbol{h_{\delta}})\in\mathcal{C}(\mathcal{S},\mathcal{R},\lambda) with λ:=λ~2/λ1\lambda:=\tilde{\lambda}_{2}/\lambda_{1} and where 𝒞\mathcal{C} is defined as follows:

𝒞(𝒮,,λ)\displaystyle\mathcal{C}(\mathcal{S},\mathcal{R},\lambda) \displaystyle\triangleq {(𝒉𝜷,𝒉𝜹)p×n:(𝒉𝜷)𝒮c1+λ(𝒉𝜹)c13((𝒉𝜷)𝒮1+λ(𝒉𝜹)1)}.\displaystyle\{(\boldsymbol{h_{\beta}},\boldsymbol{h_{\delta}})\in\mathbb{R}^{p}\times\mathbb{R}^{n}:\|(\boldsymbol{h_{\beta}})_{\mathcal{S}^{c}}\|_{1}+\lambda\|(\boldsymbol{h_{\delta}})_{\mathcal{R}^{c}}\|_{1}\leq 3(\|(\boldsymbol{h_{\beta}})_{\mathcal{S}}\|_{1}+\lambda\|(\boldsymbol{h_{\delta}})_{\mathcal{R}}\|_{1})\}. (41)

Here, (𝐡𝛃)𝒮(\boldsymbol{h_{\beta}})_{\mathcal{S}} and (𝐡𝛅)(\boldsymbol{h_{\delta}})_{\mathcal{R}} are ss and rr dimensional vectors extracted from 𝐡𝛃\boldsymbol{h_{\beta}} and 𝐡𝛅\boldsymbol{h_{\delta}} respectively, restricted to the set 𝒮\mathcal{S} and \mathcal{R} as defined in (39). \blacksquare

In Lemma 1, we extend Lemma 1 from [38] to random Rademacher matrices. In this lemma we show that a random Rademacher matrix 𝑨\boldsymbol{A} satisfies the EREC with high probability for κ=1/16\kappa=1/16.

Lemma 1

Let 𝐀\boldsymbol{A} be an n×pn\times p matrix with i.i.d. Rademacher entries. There exist positive constants C1,C2,c3,c4C_{1},C_{2},c_{3},c_{4} such that if nC1slogpn\geq C_{1}s\log p and rmin{C2nlogn,slogplogn}r\leq\textrm{min}\{C_{2}\frac{n}{\log n},\frac{s\log p}{\log n}\} then

P((𝒉𝜷,𝒉𝜹)𝒞(𝒮,,λ),1n𝑨𝒉𝜷+n𝒉𝜹2116(𝒉𝜷2+𝒉𝜹2))1c3exp{c4n},P\left(\forall\ (\boldsymbol{h_{\beta}},\boldsymbol{h_{\delta}})\in\mathcal{C}\left(\mathcal{S},\mathcal{R},\lambda\right),\ \frac{1}{\sqrt{n}}\|\boldsymbol{Ah_{\beta}}+\sqrt{n}\boldsymbol{h_{\delta}}\|_{2}\geq\frac{1}{16}(\|\boldsymbol{h_{\beta}}\|_{2}+\|\boldsymbol{h_{\delta}}\|_{2})\right)\geq 1-c_{3}\exp{\{-c_{4}n\}},

where λ=lognlogp\lambda=\sqrt{\frac{\log n}{\log p}} and 𝒞\mathcal{C} as in (41). \blacksquare

Proof of Lemma 1: Using a similar line of argument as in the proof of Lemma 1 of [38], it is enough to show the following two properties of the sensing matrix 𝑨\boldsymbol{A} to complete the proof.

1.

    Lower bound on 1nAhβ22+hδ22\frac{1}{n}\|\boldsymbol{Ah_{\beta}}\|_{2}^{2}+\|\boldsymbol{h_{\delta}}\|_{2}^{2}. For some κ1>0\kappa_{1}>0 with high probability,

    1n𝑨𝒉𝜷22+𝒉𝜹22κ1(𝒉𝜷2+𝒉𝜹2)2(𝒉𝜷,𝒉𝜹)𝒞(𝒮,,lognlogp).\frac{1}{n}\|\boldsymbol{Ah_{\beta}}\|_{2}^{2}+\|\boldsymbol{h_{\delta}}\|_{2}^{2}\geq\kappa_{1}\left(\|\boldsymbol{h_{\beta}}\|_{2}+\|\boldsymbol{h_{\delta}}\|_{2}\right)^{2}\ \qquad\forall\ (\boldsymbol{h_{\beta}},\boldsymbol{h_{\delta}})\in\mathcal{C}\left(\mathcal{S},\mathcal{R},\sqrt{\frac{\log n}{\log p}}\right). (42)
2.

    Mutual Incoherence: The column space of the matrix 𝑨\boldsymbol{A} is incoherent with the column space of the identity matrix. For some κ2>0\kappa_{2}>0 with high probability,

    1n|𝑨𝒉𝜷,𝒉𝜹|κ2(𝒉𝜷2+𝒉𝜹2)2(𝒉𝜷,𝒉𝜹)𝒞(𝒮,,lognlogp).\frac{1}{\sqrt{n}}|\langle\boldsymbol{Ah_{\beta}},\boldsymbol{h_{\delta}}\rangle|\leq\kappa_{2}(\|\boldsymbol{h_{\beta}}\|_{2}+\|\boldsymbol{h_{\delta}}\|_{2})^{2}\ \qquad\forall\ (\boldsymbol{h_{\beta}},\boldsymbol{h_{\delta}})\in\mathcal{C}\left(\mathcal{S},\mathcal{R},\sqrt{\frac{\log n}{\log p}}\right). (43)

By using (42) and (43), we have, with high probability,

1n𝑨𝒉𝜷+𝒏𝒉𝜹22\displaystyle\frac{1}{n}\|\boldsymbol{Ah_{\beta}+\sqrt{n}h_{\delta}}\|_{2}^{2} =\displaystyle= 1n𝑨𝒉𝜷22+𝒉𝜹22+2n𝑨𝒉𝜷,𝒉𝜹κ1(𝒉𝜷2+𝒉𝜹2)22κ2(𝒉𝜷2+𝒉𝜹2)2\displaystyle\frac{1}{n}\left\|\boldsymbol{Ah_{\beta}}\right\|_{2}^{2}+\|\boldsymbol{h_{\delta}}\|_{2}^{2}+\frac{2}{\sqrt{n}}\langle\boldsymbol{Ah_{\beta}},\boldsymbol{h_{\delta}}\rangle\geq\kappa_{1}\left(\|\boldsymbol{h_{\beta}}\|_{2}+\|\boldsymbol{h_{\delta}}\|_{2}\right)^{2}-2\kappa_{2}(\|\boldsymbol{h_{\beta}}\|_{2}+\|\boldsymbol{h_{\delta}}\|_{2})^{2}
=\displaystyle= (κ12κ2)(𝒉𝜷2+𝒉𝜹2)2\displaystyle(\kappa_{1}-2\kappa_{2})(\|\boldsymbol{h_{\beta}}\|_{2}+\|\boldsymbol{h_{\delta}}\|_{2})^{2}

The proof is completed if κ1>2κ2\kappa_{1}>2\kappa_{2}. We now show that (42) and (43) hold together with κ1>2κ2\kappa_{1}>2\kappa_{2} for a Rademacher sensing matrix 𝑨\boldsymbol{A}.
We first state two important facts on the Rademacher matrix 𝑨\boldsymbol{A}, which will be used in proving (42) and (43) respectively.

(1)

    We use a result following Lemma 1 [31] (see the equation immediately following Lemma 1 in [31], and set D¯\bar{D} in that equation to the identity matrix, since we are concerned with signals that are sparse in the canonical basis). Using this result, there exist positive constants c2,c3,c4c_{2},c^{\prime}_{3},c^{\prime}_{4}, such that with probability at least 1c3exp{c4n}1-c^{\prime}_{3}\exp{\{-c^{\prime}_{4}n\}}:

    1n𝑨𝒉𝜷2𝒉𝜷24c2logpn𝒉𝜷1𝒉𝜷p.\frac{1}{\sqrt{n}}\|\boldsymbol{Ah_{\beta}}\|_{2}\geq\frac{\|\boldsymbol{h_{\beta}}\|_{2}}{4}-c_{2}\sqrt{\frac{\log p}{n}}\|\boldsymbol{h_{\beta}}\|_{1}\ \forall\ \boldsymbol{h_{\beta}}\in\mathbb{R}^{p}. (44)
(2)

    From Theorem 4.4.5 of [49], for a s×rs\times r^{\prime} dimensional Rademacher matrix 𝑨𝓡𝒊𝓢𝒋\boldsymbol{A_{\mathcal{R}_{i}\mathcal{S}_{j}}}, there exists a constant c1>0c_{1}>0 such that, for any τ>0\tau^{\prime}>0, with probability at least 12exp{nτ2}1-2\exp{\{-n\tau^{\prime 2}\}} we have

    1n𝑨𝓡𝒊𝓢𝒋2=1nσmax(𝑨𝓡𝒊𝓢𝒋)c1(sn+rn+τ).\frac{1}{\sqrt{n}}\|\boldsymbol{A_{\mathcal{R}_{i}\mathcal{S}_{j}}}\|_{2}=\frac{1}{\sqrt{n}}\sigma_{max}({\boldsymbol{A_{\mathcal{R}_{i}\mathcal{S}_{j}}}})\leq c_{1}\left(\sqrt{\frac{s}{n}}+\sqrt{\frac{r^{\prime}}{n}}+\tau^{\prime}\right). (45)

Throughout this proof, we take the constants C1max{322c22,4(51200c1)2}C_{1}\triangleq\max\{32^{2}c_{2}^{2},4(51200c_{1})^{2}\} and C272(24c2)2C_{2}\triangleq\frac{7^{2}}{(24c_{2})^{2}}, where c2,c1c_{2},c_{1} are as defined in (44) and (45) respectively.

Proof of (42): We first obtain a lower bound on 1n𝑨𝒉𝜷22\frac{1}{n}\|\boldsymbol{Ah_{\beta}}\|_{2}^{2} using (44). For every (𝒉𝜷,𝒉𝜹)𝒞(𝒮,,lognlogp)(\boldsymbol{h_{\beta}},\boldsymbol{h_{\delta}})\in\mathcal{C}\left(\mathcal{S},\mathcal{R},\sqrt{\frac{\log n}{\log p}}\right), we have:

𝒉𝜷14(𝒉𝜷)𝓢1+3lognlogp(𝒉𝜹)𝓡14s𝒉𝜷2+3lognlogpr𝒉𝜹2.\|\boldsymbol{h_{\beta}}\|_{1}\leq 4\|\boldsymbol{(h_{\beta})_{\mathcal{S}}}\|_{1}+3\sqrt{\frac{\log n}{\log p}}\|\boldsymbol{(h_{\delta})_{\mathcal{R}}}\|_{1}\leq 4\sqrt{s}\|\boldsymbol{h_{\beta}}\|_{2}+3\sqrt{\frac{\log n}{\log p}}\sqrt{r}\|\boldsymbol{h_{\delta}}\|_{2}. (46)

Substituting (46) in (44), we obtain that, with probability at least 1c3exp{c4n}1-c^{\prime}_{3}\exp{\{-c^{\prime}_{4}n\}}, for every (𝒉𝜷,𝒉𝜹)𝒞(𝒮,,lognlogp)(\boldsymbol{h_{\beta}},\boldsymbol{h_{\delta}})\in\mathcal{C}\left(\mathcal{S},\mathcal{R},\sqrt{\frac{\log n}{\log p}}\right):

1n𝑨𝒉𝜷2\displaystyle\frac{1}{\sqrt{n}}\|\boldsymbol{Ah_{\beta}}\|_{2} \displaystyle\geq (144c2slogpn)𝒉𝜷23c2lognlogprlogpn𝒉𝜹2,\displaystyle\left(\frac{1}{4}-4c_{2}\sqrt{\frac{s\log p}{n}}\right)\|\boldsymbol{h_{\beta}}\|_{2}-3c_{2}\sqrt{\frac{\log n}{\log p}}\sqrt{\frac{r\log p}{n}}\|\boldsymbol{h_{\delta}}\|_{2},
1n𝑨𝒉𝜷2+𝒉𝜹2\displaystyle\therefore\frac{1}{\sqrt{n}}\|\boldsymbol{Ah_{\beta}}\|_{2}+\|\boldsymbol{h_{\delta}}\|_{2} \displaystyle\geq (144c2slogpn)𝒉𝜷2+(13c2lognlogprlogpn)𝒉𝜹2.\displaystyle\left(\frac{1}{4}-4c_{2}\sqrt{\frac{s\log p}{n}}\right)\|\boldsymbol{h_{\beta}}\|_{2}+\left(1-3c_{2}\sqrt{\frac{\log n}{\log p}}\sqrt{\frac{r\log p}{n}}\right)\|\boldsymbol{h_{\delta}}\|_{2}. (47)

Under the assumption nC1slogpn\geq C_{1}s\log p, the first term in the brackets of (47) is greater than 18\frac{1}{8}. Again, under the assumption rC2nlognr\leq C_{2}\frac{n}{\log n}, the second term is greater than 18\frac{1}{8}. Thus we have, 1n𝑨𝒉𝜷2+𝒉𝜹218(𝒉𝜷2+𝒉𝜹2)\frac{1}{\sqrt{n}}\|\boldsymbol{Ah_{\beta}}\|_{2}+\|\boldsymbol{h_{\delta}}\|_{2}\geq\frac{1}{8}(\|\boldsymbol{h_{\beta}}\|_{2}+\|\boldsymbol{h_{\delta}}\|_{2}). Squaring both sides, we have, 1n𝑨𝒉𝜷22+𝒉𝜹22+2n𝑨𝒉𝜷2𝒉𝜹2164(𝒉𝜷2+𝒉𝜹2)2\frac{1}{n}\|\boldsymbol{Ah_{\beta}}\|_{2}^{2}+\|\boldsymbol{h_{\delta}}\|_{2}^{2}+\frac{2}{\sqrt{n}}\|\boldsymbol{Ah_{\beta}}\|_{2}\|\boldsymbol{h_{\delta}}\|_{2}\geq\frac{1}{64}\left(\|\boldsymbol{h_{\beta}}\|_{2}+\|\boldsymbol{h_{\delta}}\|_{2}\right)^{2}. Using the fact that 𝒂22+𝒃222𝒂2𝒃2\|\boldsymbol{a}\|^{2}_{2}+\|\boldsymbol{b}\|^{2}_{2}\geq 2\|\boldsymbol{a}\|_{2}\|\boldsymbol{b}\|_{2} for any vectors 𝒂,𝒃\boldsymbol{a},\boldsymbol{b}, we have, 2(1n𝑨𝒉𝜷22+𝒉𝜹22)1n𝑨𝒉𝜷22+𝒉𝜹22+2n𝑨𝒉𝜷2𝒉𝜹2164(𝒉𝜷2+𝒉𝜹2)22\left(\frac{1}{n}\|\boldsymbol{Ah_{\beta}}\|_{2}^{2}+\|\boldsymbol{h_{\delta}}\|_{2}^{2}\right)\geq\frac{1}{n}\|\boldsymbol{Ah_{\beta}}\|_{2}^{2}+\|\boldsymbol{h_{\delta}}\|_{2}^{2}+\frac{2}{\sqrt{n}}\|\boldsymbol{Ah_{\beta}}\|_{2}\|\boldsymbol{h_{\delta}}\|_{2}\geq\frac{1}{64}\left(\|\boldsymbol{h_{\beta}}\|_{2}+\|\boldsymbol{h_{\delta}}\|_{2}\right)^{2}. Hence we have with probability at least 1c3exp{c4n}1-c^{\prime}_{3}\exp{\{-c^{\prime}_{4}n\}}, for every (𝒉𝜷,𝒉𝜹)𝒞(𝒮,,lognlogp)(\boldsymbol{h_{\beta}},\boldsymbol{h_{\delta}})\in\mathcal{C}\left(\mathcal{S},\mathcal{R},\sqrt{\frac{\log n}{\log p}}\right)

1n𝑨𝒉𝜷22+𝒉𝜹221128(𝒉𝜷2+𝒉𝜹2)2.\frac{1}{n}\|\boldsymbol{Ah_{\beta}}\|_{2}^{2}+\|\boldsymbol{h_{\delta}}\|_{2}^{2}\geq\frac{1}{128}(\|\boldsymbol{h_{\beta}}\|_{2}+\|\boldsymbol{h_{\delta}}\|_{2})^{2}. (48)

Therefore, we have κ1=1/128\kappa_{1}=1/128 completing the proof of (42).

Proof of (43): This part of the proof directly follows the proof of Lemma 2 in [38], with a few minor differences in constant factors. Nevertheless, we are including it here to make the paper self-contained.

Divide the set {1,2,,p}\{1,2,\ldots,p\} into subsets 𝒮1,𝒮2,,𝒮q\mathcal{S}_{1},\mathcal{S}_{2},\ldots,\mathcal{S}_{q} of size ss each, such that the first set 𝒮1\mathcal{S}_{1} contains the ss largest absolute value entries of 𝒉𝜷\boldsymbol{h_{\beta}} indexed by 𝒮\mathcal{S}, the set 𝒮2\mathcal{S}_{2} contains the ss largest absolute value entries of the vector (𝒉𝜷)𝓢𝒄\boldsymbol{({h_{\beta}})_{\mathcal{S}^{c}}}, 𝒮3\mathcal{S}_{3} contains the second largest ss absolute value entries of (𝒉𝜷)𝓢𝒄\boldsymbol{({h_{\beta}})_{\mathcal{S}^{c}}}, and so on. By the same strategy, we also divide the set {1,2,,n}\{1,2,\ldots,n\} into subsets 1,2,,k\mathcal{R}_{1},\mathcal{R}_{2},\ldots,\mathcal{R}_{k} such that the first set 1\mathcal{R}_{1} contains the rr entries of 𝒉𝜹\boldsymbol{h_{\delta}} indexed by \mathcal{R} and the sets 2,3,\mathcal{R}_{2},\mathcal{R}_{3},\ldots are of size rrr^{\prime}\geq r. We have for every (𝒉𝜷,𝒉𝜹)𝒞(𝒮,,lognlogp)(\boldsymbol{h_{\beta}},\boldsymbol{h_{\delta}})\in\mathcal{C}\left(\mathcal{S},\mathcal{R},\sqrt{\frac{\log n}{\log p}}\right),

1n|𝑨𝒉𝜷,𝒉𝜹|i,j1n|𝑨𝓡𝒊𝓢𝒋(𝒉𝜷)𝒮j,(𝒉𝜹)i|\displaystyle\frac{1}{\sqrt{n}}|\langle\boldsymbol{Ah_{\beta}},\boldsymbol{h_{\delta}}\rangle|\leq\sum_{i,j}\frac{1}{\sqrt{n}}|\langle\boldsymbol{A_{\mathcal{R}_{i}\mathcal{S}_{j}}}(\boldsymbol{h_{\beta}})_{\mathcal{S}_{j}},(\boldsymbol{h_{\delta}})_{\mathcal{R}_{i}}\rangle| maxi,j1n𝑨𝓡𝒊𝓢𝒋2i,j(𝒉𝜷)𝒮j2(𝒉𝜹)i2\displaystyle\leq\max_{i,j}\frac{1}{\sqrt{n}}\|\boldsymbol{A_{\mathcal{R}_{i}\mathcal{S}_{j}}}\|_{2}\sum_{i^{\prime},j^{\prime}}\|(\boldsymbol{h_{\beta}})_{\mathcal{S}_{j^{\prime}}}\|_{2}\|(\boldsymbol{h_{\delta}})_{\mathcal{R}_{i^{\prime}}}\|_{2} (49)
=maxi,j1n𝑨𝓡𝒊𝓢𝒋2(j(𝒉𝜷)𝒮j2)(i(𝒉𝜹)i2).\displaystyle=\max_{i,j}\frac{1}{\sqrt{n}}\|\boldsymbol{A_{\mathcal{R}_{i}\mathcal{S}_{j}}}\|_{2}\left(\sum_{j^{\prime}}\|(\boldsymbol{h_{\beta}})_{\mathcal{S}_{j^{\prime}}}\|_{2}\right)\left(\sum_{i^{\prime}}\|(\boldsymbol{h_{\delta}})_{\mathcal{R}_{i^{\prime}}}\|_{2}\right). (50)

Note that 𝑨𝓡𝒊𝓢𝒋\boldsymbol{A_{\mathcal{R}_{i}\mathcal{S}_{j}}} (a submatrix of 𝑨\boldsymbol{A} containing rows belonging to i\mathcal{R}_{i} and columns belonging to 𝒮j\mathcal{S}_{j}) is itself a Rademacher matrix with i.i.d. entries. Taking the union bound over all possible values of 𝒮j\mathcal{S}_{j} and i\mathcal{R}_{i}, we have that the inequality in (45) holds with probability at least 12(nr)(ps)exp(nτ2)1-2\binom{n}{r^{\prime}}\binom{p}{s}\exp{(-n\tau^{\prime 2})}. If n4τ2slog(p)n\geq 4{\tau^{\prime}}^{-2}s\log(p) we obtain, (ps)psexp(τ2n/4)\binom{p}{s}\leq p^{s}\leq\exp({\tau^{\prime}}^{2}n/4). Furthermore, if we assume, n4τ2rlog(n)n\geq 4{\tau^{\prime}}^{-2}r^{\prime}\log(n), we have (nr)nrexp(τ2n/4)\binom{n}{r^{\prime}}\leq n^{r^{\prime}}\leq\exp({\tau^{\prime}}^{2}n/4). Later we will give a choice of τ\tau^{\prime} which ensures that these conditions are satisfied. Therefore, we obtain with probability at least 12exp{nτ2/2}1-2\exp{\{-n\tau^{\prime 2}/2\}},

maxi,j1n𝑨𝓡𝒊𝓢𝒋2c1(sn+rn+τ).\textrm{max}_{i,j}\frac{1}{\sqrt{n}}\|\boldsymbol{A_{\mathcal{R}_{i}\mathcal{S}_{j}}}\|_{2}\leq c_{1}\left(\sqrt{\frac{s}{n}}+\sqrt{\frac{r^{\prime}}{n}}+\tau^{\prime}\right). (51)

Using the first inequality in the last equation of Section 2.1 of [8] we obtain i=3q(𝒉𝜷)𝓢𝒊21s(𝒉𝜷)𝓢𝒄1\sum_{i=3}^{q}\|\boldsymbol{(h_{\beta})_{\mathcal{S}_{i}}}\|_{2}\leq\frac{1}{\sqrt{s}}\|\boldsymbol{(h_{\beta})_{\mathcal{S}^{c}}}\|_{1}. Furthermore, for every (𝒉𝜷,𝒉𝜹)𝒞(𝒮,,lognlogp)(\boldsymbol{h_{\beta}},\boldsymbol{h_{\delta}})\in\mathcal{C}\left(\mathcal{S},\mathcal{R},\sqrt{\frac{\log n}{\log p}}\right), we have (𝒉𝜷)𝓢𝒄13s𝒉𝜷2+3lognlogpr𝒉𝜹2\|\boldsymbol{(h_{\beta})_{\mathcal{S}^{c}}}\|_{1}\leq 3\sqrt{s}\|\boldsymbol{h_{\beta}}\|_{2}+3\sqrt{\frac{\log n}{\log p}}\sqrt{r}\|\boldsymbol{h_{\delta}}\|_{2}. Hence,

i=1q(𝒉𝜷)𝓢𝒊2=(𝒉𝜷)𝓢𝟏2+(𝒉𝜷)𝓢𝟐2+i=3q(𝒉𝜷)𝓢𝒊22𝒉𝜷2+i=3q(𝒉𝜷)𝓢𝒊25𝒉𝜷2+3lognlogprs𝒉𝜹2.\sum_{i=1}^{q}\|\boldsymbol{(h_{\beta})_{\mathcal{S}_{i}}}\|_{2}=\|\boldsymbol{(h_{\beta})_{\mathcal{S}_{1}}}\|_{2}+\|\boldsymbol{(h_{\beta})_{\mathcal{S}_{2}}}\|_{2}+\sum_{i=3}^{q}\|\boldsymbol{(h_{\beta})_{\mathcal{S}_{i}}}\|_{2}\leq 2\|\boldsymbol{h_{\beta}}\|_{2}+\sum_{i=3}^{q}\|\boldsymbol{(h_{\beta})_{\mathcal{S}_{i}}}\|_{2}\leq 5\|\boldsymbol{h_{\beta}}\|_{2}+3\sqrt{\frac{\log n}{\log p}}\sqrt{\frac{r}{s}}\|\boldsymbol{h_{\delta}}\|_{2}.

Following a similar process we obtain i=3k(𝒉𝜹)𝓡𝒊21r(𝒉𝜹)𝓡𝒄1\sum_{i=3}^{k}\|\boldsymbol{(h_{\delta})_{\mathcal{R}_{i}}}\|_{2}\leq\frac{1}{\sqrt{r^{\prime}}}\|\boldsymbol{(h_{\delta})_{\mathcal{R}^{c}}}\|_{1}. Furthermore, for every (𝒉𝜷,𝒉𝜹)𝒞(𝒮,,lognlogp)(\boldsymbol{h_{\beta}},\boldsymbol{h_{\delta}})\in\mathcal{C}\left(\mathcal{S},\mathcal{R},\sqrt{\frac{\log n}{\log p}}\right), we have 1r(𝒉𝜹)𝓡𝒄13sr1lognlogp𝒉𝜷2+3rr𝒉𝜹2\frac{1}{\sqrt{r^{\prime}}}\|\boldsymbol{(h_{\delta})_{\mathcal{R}^{c}}}\|_{1}\leq 3\sqrt{\frac{s}{r^{\prime}}}\frac{1}{\sqrt{\frac{\log n}{\log p}}}\|\boldsymbol{h_{\beta}}\|_{2}+3\sqrt{\frac{r}{r^{\prime}}}\|\boldsymbol{h_{\delta}}\|_{2}. Since rrr^{\prime}\geq r,

i=1k(𝒉𝜹)𝓡𝒊2=(𝒉𝜹)𝓡𝟏2+(𝒉𝜹)𝓡𝟐2+i=3k(𝒉𝜹)𝓡𝒊22𝒉𝜹2+i=3k(𝒉𝜹)𝓡𝒊25𝒉𝜹2+3lognlogpsr𝒉𝜷2.\sum_{i=1}^{k}\|\boldsymbol{(h_{\delta})_{\mathcal{R}_{i}}}\|_{2}=\|\boldsymbol{(h_{\delta})_{\mathcal{R}_{1}}}\|_{2}+\|\boldsymbol{(h_{\delta})_{\mathcal{R}_{2}}}\|_{2}+\sum_{i=3}^{k}\|\boldsymbol{(h_{\delta})_{\mathcal{R}_{i}}}\|_{2}\leq 2\|\boldsymbol{h_{\delta}}\|_{2}+\sum_{i=3}^{k}\|\boldsymbol{(h_{\delta})_{\mathcal{R}_{i}}}\|_{2}\leq 5\|\boldsymbol{h_{\delta}}\|_{2}+\frac{3}{\sqrt{\frac{\log n}{\log p}}}\sqrt{\frac{s}{r^{\prime}}}\|\boldsymbol{h_{\beta}}\|_{2}.

Hence, substituting (51) and the two bounds above into (49), we obtain, with probability at least 12exp{nτ2/2}1-2\exp{\{-n\tau^{\prime 2}/2\}}, for every (𝒉𝜷,𝒉𝜹)𝒞(𝒮,,lognlogp)(\boldsymbol{h_{\beta}},\boldsymbol{h_{\delta}})\in\mathcal{C}\left(\mathcal{S},\mathcal{R},\sqrt{\frac{\log n}{\log p}}\right),

1n|𝑨𝒉𝜷,𝒉𝜹|c1(sn+rn+τ)×(5𝒉𝜷2+3lognlogprs𝒉𝜹2)×(5𝒉𝜹2+3lognlogpsr𝒉𝜷2).\displaystyle\frac{1}{\sqrt{n}}|\langle\boldsymbol{Ah_{\beta}},\boldsymbol{h_{\delta}}\rangle|\leq c_{1}\left(\sqrt{\frac{s}{n}}+\sqrt{\frac{r^{\prime}}{n}}+\tau^{\prime}\right)\times\left(5\|\boldsymbol{h_{\beta}}\|_{2}+3\sqrt{\frac{\log n}{\log p}}\sqrt{\frac{r}{s}}\|\boldsymbol{h_{\delta}}\|_{2}\right)\times\left(5\|\boldsymbol{h_{\delta}}\|_{2}+\frac{3}{\sqrt{\frac{\log n}{\log p}}}\sqrt{\frac{s}{r^{\prime}}}\|\boldsymbol{h_{\beta}}\|_{2}\right). (52)

Recall that rslogplognr\leq\frac{s\log p}{\log n}, by assumption. Taking r=slogplognr^{\prime}=\frac{s\log p}{\log n} leads to lognlogprslognlogprs=1\sqrt{\frac{\log n}{\log p}}\sqrt{\frac{r}{s}}\leq\sqrt{\frac{\log n}{\log p}}\sqrt{\frac{r^{\prime}}{s}}=1 and 1lognlogpsr=1\frac{1}{\sqrt{\frac{\log n}{\log p}}}\sqrt{\frac{s}{r^{\prime}}}=1. Thus, we obtain with probability at least 12exp(nτ2/2)1-2\exp({-n{\tau^{\prime 2}}/2}) for every (𝒉𝜷,𝒉𝜹)𝒞(𝒮,,lognlogp)(\boldsymbol{h_{\beta}},\boldsymbol{h_{\delta}})\in\mathcal{C}\left(\mathcal{S},\mathcal{R},\sqrt{\frac{\log n}{\log p}}\right),

1n|𝑨𝒉𝜷,𝒉𝜹|25c1(sn+rn+τ)×(𝒉𝜷2+𝒉𝜹2)2\frac{1}{\sqrt{n}}|\langle\boldsymbol{Ah_{\beta}},\boldsymbol{h_{\delta}}\rangle|\leq 25c_{1}\left(\sqrt{\frac{s}{n}}+\sqrt{\frac{r^{\prime}}{n}}+\tau^{\prime}\right)\times(\|\boldsymbol{h_{\beta}}\|_{2}+\|\boldsymbol{h_{\delta}}\|_{2})^{2} (53)

Let τ1/(51200c1)\tau^{\prime}\triangleq 1/(51200c_{1}). Recall that, C1max{322c22,4(51200c1)2}C_{1}\triangleq\max\{32^{2}c_{2}^{2},4(51200c_{1})^{2}\}. Then nC1slogpn\geq C_{1}s\log p implies n4τ2slogp=4τ2rlognn\geq 4{\tau^{\prime}}^{-2}s\log p=4{\tau^{\prime}}^{-2}r^{\prime}\log n. Furthermore,

snrn=slogpnlognτ/2.\sqrt{\frac{s}{n}}\leq\sqrt{\frac{r^{\prime}}{n}}=\sqrt{\frac{s\log p}{n\log n}}\leq\tau^{\prime}/2. (54)

Therefore, we have with probability at least 12exp(nτ2/2)1-2\exp({-n{\tau^{\prime 2}}/2}), for every (𝒉𝜷,𝒉𝜹)𝒞(𝒮,,lognlogp)(\boldsymbol{h_{\beta}},\boldsymbol{h_{\delta}})\in\mathcal{C}\left(\mathcal{S},\mathcal{R},\sqrt{\frac{\log n}{\log p}}\right)

1n|𝑨𝒉𝜷,𝒉𝜹|25c1×2τ(𝒉𝜷2+𝒉𝜹2)21512(𝒉𝜷2+𝒉𝜹2)2\frac{1}{\sqrt{n}}|\langle\boldsymbol{Ah_{\beta}},\boldsymbol{h_{\delta}}\rangle|\leq 25c_{1}\times 2\tau^{\prime}(\|\boldsymbol{h_{\beta}}\|_{2}+\|\boldsymbol{h_{\delta}}\|_{2})^{2}\leq\frac{1}{512}(\|\boldsymbol{h_{\beta}}\|_{2}+\|\boldsymbol{h_{\delta}}\|_{2})^{2} (55)

This completes the proof of (43).

Now, from (48) and (55), using a union bound, we obtain with probability at least 1(c3exp(c4n)+2exp(nτ2/2))1-(c^{\prime}_{3}\exp(-c^{\prime}_{4}n)+2\exp(-n{\tau^{\prime 2}}/2)),

1n𝑨𝒉𝜷+𝒏𝒉𝜹22(κ12κ2)(𝒉𝜷2+𝒉𝜹2)2=κ2(𝒉𝜷2+𝒉𝜹2)2\frac{1}{n}\|\boldsymbol{Ah_{\beta}+\sqrt{n}h_{\delta}}\|_{2}^{2}\geq(\kappa_{1}-2\kappa_{2})(\|\boldsymbol{h_{\beta}}\|_{2}+\|\boldsymbol{h_{\delta}}\|_{2})^{2}=\kappa^{2}(\|\boldsymbol{h_{\beta}}\|_{2}+\|\boldsymbol{h_{\delta}}\|_{2})^{2} (56)

Taking c3=c3+2c_{3}=c^{\prime}_{3}+2 and c4=min{c4,τ2/2}c_{4}=\min\{c^{\prime}_{4},{\tau^{\prime 2}}/2\}, we have 1(c3exp(c4n)+2exp(τ2n/2))1c3exp(c4n)1-(c^{\prime}_{3}\exp(-c^{\prime}_{4}n)+2\exp(-{\tau^{\prime 2}}n/2))\geq 1-c_{3}\exp(-c_{4}n).

Note that κ=κ12κ2=1/16\kappa=\sqrt{\kappa_{1}-2\kappa_{2}}=1/16. Taking the square root of both sides of (56), we obtain, with probability at least 1c3exp(c4n)1-c_{3}\exp(-c_{4}n),

1n𝑨𝒉𝜷+𝒏𝒉𝜹2116(𝒉𝜷2+𝒉𝜹2)(𝒉𝜷,𝒉𝜹)𝒞(𝒮,,lognlogp).\frac{1}{\sqrt{n}}\|\boldsymbol{Ah_{\beta}+\sqrt{n}h_{\delta}}\|_{2}\geq\frac{1}{16}(\|\boldsymbol{h_{\beta}}\|_{2}+\|\boldsymbol{h_{\delta}}\|_{2})\ \forall\ (\boldsymbol{h_{\beta}},\boldsymbol{h_{\delta}})\in\mathcal{C}\left(\mathcal{S},\mathcal{R},\sqrt{\frac{\log n}{\log p}}\right).

This completes the proof of the lemma. \blacksquare

A-A Proof of Theorem 1

Proof of (7): We now derive a bound on the 1\ell_{1} norm of the error 𝜷^𝝀𝟏𝜷\boldsymbol{\hat{\beta}_{\lambda_{1}}}-\boldsymbol{\beta^{*}} of the robust Lasso estimate given by the optimisation problem (6). Recall that λ1=4σlogpn\lambda_{1}=\frac{4\sigma\sqrt{\log p}}{\sqrt{n}} and λ2=4σlognn\lambda_{2}=\frac{4\sigma\sqrt{\log n}}{{n}}. We choose λ~2nλ2=4σlognn\tilde{\lambda}_{2}\triangleq\sqrt{n}\lambda_{2}=\frac{4\sigma\sqrt{\log n}}{\sqrt{n}}. We use λ~2\tilde{\lambda}_{2} to define the cone constraint in (41). Note that, in the proof of Theorem 1 of [38], it is shown that 𝒉𝜷𝜷^𝝀𝟏𝜷\boldsymbol{h_{\beta}}\triangleq\boldsymbol{\hat{\beta}_{\lambda_{1}}-\beta^{*}} and 𝒉𝜹1n(𝜹^𝝀𝟐𝜹)\boldsymbol{h_{\delta}}\triangleq\frac{1}{\sqrt{n}}(\boldsymbol{\hat{\delta}_{\lambda_{2}}-\delta^{*}}) satisfy the cone constraint given in (41). Therefore, we have

(𝒉𝜷)𝒮c1+λ~2λ1(𝒉𝜹)c13((𝒉𝜷)𝒮1+λ~2λ1(𝒉𝜹)1).\|(\boldsymbol{h_{\beta}})_{\mathcal{S}^{c}}\|_{1}+\frac{\tilde{\lambda}_{2}}{\lambda_{1}}\|(\boldsymbol{h_{\delta}})_{\mathcal{R}^{c}}\|_{1}\leq 3(\|(\boldsymbol{h_{\beta}})_{\mathcal{S}}\|_{1}+\frac{\tilde{\lambda}_{2}}{\lambda_{1}}\|(\boldsymbol{h_{\delta}})_{\mathcal{R}}\|_{1}). (57)

Now by using Eqn. (57), we have

𝒉𝜷1\displaystyle\|\boldsymbol{h_{\beta}}\|_{1} =\displaystyle= (𝒉𝜷)𝓢1+(𝒉𝜷)𝓢𝒄14(𝒉𝜷)𝓢1+3λ~2λ1(𝒉𝜹)𝓡14s𝒉𝜷2+3rλ~2λ1𝒉𝜹2.\displaystyle\|\boldsymbol{(h_{\beta})_{\mathcal{S}}}\|_{1}+\|\boldsymbol{(h_{\beta})_{\mathcal{S}^{c}}}\|_{1}\leq 4\|\boldsymbol{(h_{\beta})_{\mathcal{S}}}\|_{1}+3\frac{\tilde{\lambda}_{2}}{\lambda_{1}}\|\boldsymbol{(h_{\delta})_{\mathcal{R}}}\|_{1}\leq 4\sqrt{s}\|\boldsymbol{h_{\beta}}\|_{2}+3\sqrt{r}\frac{\tilde{\lambda}_{2}}{\lambda_{1}}\|\boldsymbol{h_{\delta}}\|_{2}. (58)

Here, the last inequality of Eqn.(58) holds since (𝒉𝜷)𝓢1s𝒉𝜷2\left\|\boldsymbol{(h_{\beta})_{\mathcal{S}}}\right\|_{1}\leq\sqrt{s}\|\boldsymbol{h_{\beta}}\|_{2} and (𝒉𝜹)𝓡1r𝒉𝜹2\left\|\boldsymbol{(h_{\delta})_{\mathcal{R}}}\right\|_{1}\leq\sqrt{r}\|\boldsymbol{h_{\delta}}\|_{2}. Note that, max{s,r}s+r\max\{\sqrt{s},\sqrt{r}\}\leq\sqrt{s+r}. Based on the values of λ1,λ~2\lambda_{1},\tilde{\lambda}_{2}, we have λ~2<λ1\tilde{\lambda}_{2}<\lambda_{1} since n<pn<p. Hence, by using Eqn.(58), we have

𝒉𝜷14s𝒉𝜷2+3r𝒉𝜹24s+r(𝒉𝜷2+𝒉𝜹2).\displaystyle\|\boldsymbol{h_{\beta}}\|_{1}\leq 4\sqrt{s}\|\boldsymbol{h_{\beta}}\|_{2}+3\sqrt{r}\|\boldsymbol{h_{\delta}}\|_{2}\leq 4\sqrt{s+r}\left(\|\boldsymbol{h_{\beta}}\|_{2}+\|\boldsymbol{h_{\delta}}\|_{2}\right). (59)

Recall that, 𝒆=𝜹n\boldsymbol{e^{*}}=\frac{\boldsymbol{\delta^{*}}}{\sqrt{n}} and 𝒆^=𝜹^𝝀~𝟐n\boldsymbol{\hat{e}}=\frac{\boldsymbol{\hat{\delta}_{\tilde{\lambda}_{2}}}}{\sqrt{n}} in Theorem 1 of [38]. Therefore, by the equivalence of the model given in (37) and the optimisation problem in (6) with that of [38], we have

𝜷^𝝀𝟏𝜷2+1n(𝜹^𝝀~𝟐𝜹)23κ2max{λ1s,λ~2r},\|\boldsymbol{\hat{\beta}_{\lambda_{1}}-\beta^{*}}\|_{2}+\left\|\frac{1}{\sqrt{n}}(\boldsymbol{\hat{\delta}_{\tilde{\lambda}_{2}}-\delta^{*}})\right\|_{2}\leq 3\kappa^{-2}\textrm{max}\{\lambda_{1}\sqrt{s},\tilde{\lambda}_{2}\sqrt{r}\}, (60)

as long as

2𝑨𝜼nλ1,and2𝜼nλ~2.\dfrac{2\|\boldsymbol{A}^{\top}\boldsymbol{\eta}\|_{\infty}}{n}\leq\lambda_{1},\ \text{and}\ \dfrac{2\|\boldsymbol{\eta}\|_{\infty}}{\sqrt{n}}\leq\tilde{\lambda}_{2}. (61)

Therefore when (61) holds, then by using (59) (recall 𝒉𝜷=𝜷^𝝀𝟏𝜷\boldsymbol{h_{\beta}}=\boldsymbol{\hat{\beta}_{\lambda_{1}}-\beta^{*}}) and (60), we have

𝜷^𝝀𝟏𝜷14s+r(3κ2max{λ1s,λ~2r})12κ2(s+r)max{λ1,λ~2}12κ2(s+r)λ1.\displaystyle\|\boldsymbol{\hat{\beta}_{\lambda_{1}}-\beta^{*}}\|_{1}\leq 4\sqrt{s+r}\left(3\kappa^{-2}\textrm{max}\{\lambda_{1}\sqrt{s},\tilde{\lambda}_{2}\sqrt{r}\}\right)\leq 12\kappa^{-2}(s+r)\textrm{max}\{\lambda_{1},\tilde{\lambda}_{2}\}\leq 12\kappa^{-2}(s+r)\lambda_{1}. (62)

We will now bound the probability that 2𝑨𝜼nλ1\dfrac{2\|\boldsymbol{A}^{\top}\boldsymbol{\eta}\|_{\infty}}{n}\leq\lambda_{1} and 2𝜼nλ~2\dfrac{2\|\boldsymbol{\eta}\|_{\infty}}{\sqrt{n}}\leq\tilde{\lambda}_{2} using the fact that 𝜼\boldsymbol{\eta} is Gaussian. By using Lemma 7 in Appendix C which describes the tail bounds of the Gaussian vector, we have

P(2𝜼/n4σlog(n)/n)11n,\displaystyle P(2\|\boldsymbol{\eta}\|_{\infty}/\sqrt{n}\leq 4\sigma\sqrt{\log(n)/n})\geq 1-\frac{1}{n}, (63)
P(21n𝑨𝜼/n4σlog(p)/n)11p.\displaystyle P\left(2\left\|\frac{1}{\sqrt{n}}\boldsymbol{A}^{\top}\boldsymbol{\eta}\right\|_{\infty}/\sqrt{n}\leq 4\sigma\sqrt{\log(p)/n}\right)\geq 1-\frac{1}{p}. (64)

Using (63),(64) with Bonferroni’s inequality in (62), we have:

P(𝜷^𝝀𝟏𝜷148κ2(s+r)σlog(p)n)11n1p.\displaystyle P\left(\|\boldsymbol{\hat{\beta}_{\lambda_{1}}}-\boldsymbol{\beta^{*}}\|_{1}\leq 48\kappa^{-2}(s+r)\sigma\sqrt{\frac{{\log(p)}}{{n}}}\right)\geq 1-\frac{1}{n}-\frac{1}{p}. (65)

This completes the proof of (7).
Proof of (8): We now derive an upper bound on 𝜹^𝝀𝟐𝜹1\|\boldsymbol{\hat{\delta}_{\lambda_{2}}}-\boldsymbol{\delta^{*}}\|_{1}. We approach this by showing that, given the optimal estimate 𝜷^λ1\boldsymbol{\hat{\beta}}_{\lambda_{1}}, we can obtain a unique estimate 𝜹^λ2\boldsymbol{\hat{\delta}}_{\lambda_{2}} using (67). We then derive the upper bound on 𝜹^𝝀𝟐𝜹1\|\boldsymbol{\hat{\delta}_{\lambda_{2}}}-\boldsymbol{\delta^{*}}\|_{1} using the Lasso bounds given in [22]. Expanding the terms in the objective of (6), we obtain:

min𝜷,𝜹12n𝒚𝑨𝜷𝜹22+λ1𝜷1+λ2𝜹1=min𝜷{λ1𝜷1+i=1nminδi{12n((yi𝒂𝒊.𝜷)δi)2+λ2|δi|}}.\displaystyle{\min}_{\boldsymbol{\beta},\boldsymbol{\delta}}{\frac{1}{2n}}\left\|\boldsymbol{y}-\boldsymbol{A\beta}-\boldsymbol{\delta}\right\|^{2}_{2}+\lambda_{1}\|\boldsymbol{\beta}\|_{1}+\lambda_{2}\left\|\boldsymbol{\delta}\right\|_{1}={\min}_{\boldsymbol{\beta}}\left\{\lambda_{1}\|\boldsymbol{\beta}\|_{1}+\sum_{i=1}^{n}{\min}_{\delta_{i}}\left\{\frac{1}{2n}((y_{i}-\boldsymbol{a_{i.}\beta})-\delta_{i})^{2}+{\lambda_{2}}|\delta_{i}|\right\}\right\}. (66)

Given the optimal solutions 𝜷^𝝀𝟏\boldsymbol{\hat{\beta}_{\lambda_{1}}} and 𝜹^𝝀𝟐\boldsymbol{\hat{\delta}_{\lambda_{2}}} of (6), 𝜹^𝝀𝟐\boldsymbol{\hat{\delta}_{\lambda_{2}}} can also be viewed as

𝜹^𝝀𝟐=argmin𝜹12ni=1n{(yi𝒂𝒊.𝜷^𝝀𝟏δi)2}+λ2𝜹1\displaystyle\boldsymbol{\hat{\delta}_{\lambda_{2}}}=\textrm{argmin}_{\boldsymbol{\delta}}\frac{1}{2n}\sum_{i=1}^{n}\{(y_{i}-\boldsymbol{a_{i.}}\boldsymbol{\hat{\beta}_{\lambda_{1}}}-\delta_{i})^{2}\}+\lambda_{2}\|\boldsymbol{\delta}\|_{1} (67)
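Since the minimization in (67) separates across coordinates i[n]i\in[n], each coordinate admits the standard Lasso soft-thresholding closed form, stated here only to make the separability explicit (it is not needed in the argument below):

\hat{\delta}_{\lambda_{2},i}=\operatorname{sign}(z_{i})\,\max\left(|z_{i}|-n\lambda_{2},\,0\right),\qquad\text{where }z_{i}\triangleq y_{i}-\boldsymbol{a_{i.}}\boldsymbol{\hat{\beta}_{\lambda_{1}}}.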

Thus (67) can also be viewed as the Lasso estimator for the model 𝒛=𝑰𝒏𝜹+ϱ\boldsymbol{z}=\boldsymbol{I_{n}\delta^{*}}+\boldsymbol{\varrho}, where 𝒛𝒚𝑨𝜷^𝝀𝟏\boldsymbol{z}\triangleq\boldsymbol{y}-\boldsymbol{A}\boldsymbol{\hat{\beta}_{\lambda_{1}}} and ϱ𝑨(𝜷𝜷^𝝀𝟏)+𝜼\boldsymbol{\varrho}\triangleq\boldsymbol{A}(\boldsymbol{\beta^{*}}-\boldsymbol{\hat{\beta}_{\lambda_{1}})+\boldsymbol{\eta}} with 𝜹\boldsymbol{\delta^{*}} being rr-sparse. By using Theorem 11.1(b) of [22], we have that if λ22ϱn\lambda_{2}\geq 2\frac{\|\boldsymbol{\varrho}\|_{\infty}}{n},

𝜹^𝝀𝟐𝜹23rλ2γr,\left\|\boldsymbol{\hat{\delta}_{\lambda_{2}}}-\boldsymbol{\delta^{*}}\right\|_{2}\leq\dfrac{3\sqrt{r}\lambda_{2}}{\gamma_{r}}, (68)

where γr\gamma_{r} is the Restricted eigenvalue constant of order rr which equals one for 𝑰𝒏\boldsymbol{I_{n}}. Now using the result in Lemma 11.1 of [22], when λ22ϱn\lambda_{2}\geq 2\frac{\|\boldsymbol{\varrho}\|_{\infty}}{n}, then

(𝜹^𝝀𝟐𝜹)c13(𝜹^𝝀𝟐𝜹)1.\left\|(\boldsymbol{\hat{\delta}_{\lambda_{2}}}-\boldsymbol{\delta^{*}})_{\mathcal{R}^{c}}\right\|_{1}\leq 3\left\|(\boldsymbol{\hat{\delta}_{\lambda_{2}}}-\boldsymbol{\delta^{*}})_{\mathcal{R}}\right\|_{1}.

Therefore by using (68) when λ22ϱn\lambda_{2}\geq 2\frac{\|\boldsymbol{\varrho}\|_{\infty}}{n}, we have

𝜹^𝝀𝟐𝜹1\displaystyle\|\boldsymbol{\hat{\delta}_{\lambda_{2}}}-\boldsymbol{\delta^{*}}\|_{1} =\displaystyle= (𝜹^𝝀𝟐𝜹)c1+(𝜹^𝝀𝟐𝜹)14(𝜹^𝝀𝟐𝜹)14r(𝜹^𝝀𝟐𝜹)24r𝜹^𝝀𝟐𝜹2\displaystyle\left\|(\boldsymbol{\hat{\delta}_{\lambda_{2}}}-\boldsymbol{\delta^{*}})_{\mathcal{R}^{c}}\right\|_{1}+\left\|(\boldsymbol{\hat{\delta}_{\lambda_{2}}}-\boldsymbol{\delta^{*}})_{\mathcal{R}}\right\|_{1}\leq 4\left\|(\boldsymbol{\hat{\delta}_{\lambda_{2}}}-\boldsymbol{\delta^{*}})_{\mathcal{R}}\right\|_{1}\leq 4\sqrt{r}\left\|(\boldsymbol{\hat{\delta}_{\lambda_{2}}}-\boldsymbol{\delta^{*}})_{\mathcal{R}}\right\|_{2}\leq 4\sqrt{r}\|\boldsymbol{\hat{\delta}_{\lambda_{2}}}-\boldsymbol{\delta^{*}}\|_{2} (69)
\displaystyle\leq 12rλ2.\displaystyle 12r\lambda_{2}.

Therefore we now show that λ2(=4σlognn)2ϱn\lambda_{2}\left(=4\sigma\frac{\sqrt{\log n}}{n}\right)\geq 2\frac{\|\boldsymbol{\varrho}\|_{\infty}}{n} (i.e., ϱ2σlogn\|\boldsymbol{\varrho}\|_{\infty}\leq 2\sigma\sqrt{\log n}) holds with high probability. Now, by Lemma 6 and the triangle inequality, we have

ϱ=𝑨(𝜷𝜷^𝝀𝟏)+𝜼|𝑨|𝜷𝜷^𝝀𝟏1+𝜼.\|\boldsymbol{\varrho}\|_{\infty}=\|\boldsymbol{A}(\boldsymbol{\beta^{*}}-\boldsymbol{\hat{\beta}_{\lambda_{1}}})+\boldsymbol{\eta}\|_{\infty}\leq|\boldsymbol{A}|_{\infty}\|\boldsymbol{\beta^{*}}-\boldsymbol{\hat{\beta}_{\lambda_{1}}}\|_{1}+\|\boldsymbol{\eta}\|_{\infty}.

By using Lemma 7 in Appendix C, we have the following:

P(𝜼σlog(n))1n.P(\|\boldsymbol{\eta}\|_{\infty}\geq\sigma\sqrt{\log(n)})\leq\frac{1}{n}. (70)

Since |𝑨|1|\boldsymbol{A}|_{\infty}\leq 1, by using (65), we have

P(ϱ48κ2(s+r)σlog(p)n+σlog(n))1(2n+1p).\displaystyle P\left(\|\boldsymbol{\varrho}\|_{\infty}\leq 48\kappa^{-2}(s+r)\sigma\sqrt{\frac{\log(p)}{n}}+\sigma\sqrt{\log(n)}\right)\geq 1-\left(\frac{2}{n}+\frac{1}{p}\right). (71)

Since nlogn(48κ2)2(s+r)2logpn\log n\geq(48\kappa^{-2})^{2}(s+r)^{2}\log p, we have 48κ2(s+r)σlog(p)n<σlog(n)48\kappa^{-2}(s+r)\sigma\sqrt{\frac{\log(p)}{n}}<\sigma\sqrt{\log(n)}. Thus

P(ϱ2σlogn)1(2n+1p).P(\|\boldsymbol{\varrho}\|_{\infty}\leq 2\sigma\sqrt{\log n})\geq 1-\left(\frac{2}{n}+\frac{1}{p}\right). (72)

We now put (72) in (69) to obtain:

P(𝜹^𝝀𝟐𝜹124σrlognn)1(2n+1p).\displaystyle P\left(\|\boldsymbol{\hat{\delta}_{\lambda_{2}}}-\boldsymbol{\delta^{*}}\|_{1}\leq\frac{24\sigma r\sqrt{\log n}}{n}\right)\geq 1-\left(\frac{2}{n}+\frac{1}{p}\right). (73)

This completes the proof. \blacksquare

Appendix B Proofs of Theorems and Lemmas on Debiased Lasso

B-A Proof of Theorem 2

Note that we have chosen 𝑾=𝑨\boldsymbol{W}=\boldsymbol{A}. Now, recalling the expression for 𝜷^𝑾\boldsymbol{\hat{\beta}_{W}} from (14) and the model given in (5), we have

𝜷^𝑾𝜷\displaystyle\boldsymbol{\hat{\beta}_{W}}-\boldsymbol{\beta^{*}} =\displaystyle= 1n𝑨𝜼+(𝑰𝒑1n𝑨𝑨)(𝜷^𝝀𝟏𝜷)+1n𝑨(𝜹𝜹^𝝀𝟐).\displaystyle\frac{1}{n}\boldsymbol{A}^{\top}\boldsymbol{{\eta}}+\left(\boldsymbol{I_{p}}-\frac{1}{n}\boldsymbol{A^{\top}}\boldsymbol{A}\right)(\boldsymbol{\hat{\beta}_{\lambda_{1}}}-\boldsymbol{\beta^{*}})+\frac{1}{n}\boldsymbol{A}^{\top}\left(\boldsymbol{\delta^{*}}-\boldsymbol{\hat{\delta}_{\lambda_{2}}}\right). (74)

Results (78) and (79) of Lemma 2 show that the second and third terms on the RHS of (74) are negligible in probability as nn and pp increase. Therefore, in view of Lemma 2, we have

n(β^Wjβj)\displaystyle\sqrt{n}(\hat{\beta}_{Wj}-\beta^{*}_{j}) =\displaystyle= 1n𝒂.j𝜼+oP(1),\displaystyle\frac{1}{\sqrt{n}}\boldsymbol{a}_{.j}^{\top}\boldsymbol{{\eta}}+o_{P}(1), (75)

where 𝒂.j\boldsymbol{a}_{.j} denotes the jthj^{\text{th}} column of matrix 𝑨\boldsymbol{A}. Given 𝒂.𝒋\boldsymbol{a_{.j}}, by using the Gaussianity of 𝜼\boldsymbol{\eta}, the first term on the RHS of (75) is a Gaussian random variable with mean 0 and variance σ2𝒂.j𝒂.jn\sigma^{2}\frac{\boldsymbol{a}_{.j}^{\top}\boldsymbol{a}_{.j}}{n}. Since 𝒂.j𝒂.jn=1\frac{\boldsymbol{a}_{.j}^{\top}\boldsymbol{a}_{.j}}{n}=1, the first term on the RHS is N(0,σ2)N(0,\sigma^{2}). This completes the proof of result (1) of the theorem.

We now turn to result (2) of the theorem. By using a similar decomposition argument as in the case of 𝜷^𝑾\boldsymbol{\hat{\beta}_{W}} in (74) and using the expression of 𝜹^𝑾\boldsymbol{\hat{\delta}_{W}} in (15), we have

𝜹^𝑾𝜹=(𝑰𝒏1n𝑨𝑨)𝜼+(𝑰𝒏1n𝑨𝑨)𝑨(𝜷𝜷^𝝀𝟏)1n𝑨𝑨(𝜹𝜹^𝝀𝟐).\displaystyle\boldsymbol{\hat{\delta}_{W}}-\boldsymbol{\delta^{*}}=\left(\boldsymbol{I_{n}}-\frac{1}{n}\boldsymbol{A}\boldsymbol{A}^{\top}\right)\boldsymbol{{\eta}}+\left(\boldsymbol{I_{n}}-\frac{1}{n}\boldsymbol{A}\boldsymbol{A}^{\top}\right)\boldsymbol{A}(\boldsymbol{\beta^{*}-\hat{\beta}_{\lambda_{1}}})-\frac{1}{n}\boldsymbol{AA}^{\top}\big{(}\boldsymbol{\delta^{*}}-\boldsymbol{\hat{\delta}_{\lambda_{2}}}\big{)}. (76)

We have 𝚺𝑨=(𝑰𝒏1n𝑨𝑨)(𝑰𝒏1n𝑨𝑨)\boldsymbol{\Sigma_{A}}=\left(\boldsymbol{I_{n}}-\frac{1}{n}\boldsymbol{AA}^{\top}\right)\left(\boldsymbol{I_{n}}-\frac{1}{n}\boldsymbol{AA}^{\top}\right)^{\top}. From (80) and (81) of Lemma 2, the second and third terms on the RHS of (76) are both oP(p1n/pn)o_{P}\left(\frac{p\sqrt{1-n/p}}{n}\right). Therefore, using Lemma 2, we have for any i[n]i\in[n]

(δ^Wiδi)ΣAii=(𝑰𝒏1n𝑨𝑨)i.𝜼ΣAii+oP(1ΣAiip1n/pn).\displaystyle\frac{\left({\hat{\delta}_{Wi}}-{\delta^{*}_{i}}\right)}{\sqrt{\Sigma_{A_{ii}}}}=\frac{\left(\boldsymbol{I_{n}}-\frac{1}{n}\boldsymbol{AA}^{\top}\right)^{\top}_{i.}\boldsymbol{{\eta}}}{\sqrt{\Sigma_{A_{ii}}}}+o_{P}\left(\frac{1}{\sqrt{\Sigma_{A_{ii}}}}\frac{p\sqrt{1-n/p}}{n}\right). (77)

As \boldsymbol{\eta} is Gaussian, the first term on the RHS of (77) is a Gaussian random variable with mean 0 and variance \sigma^{2}. In Lemma 3, we show that \frac{n^{2}}{p(p-n)}\Sigma_{A_{ii}} converges to 1 in probability if \boldsymbol{A} is a Rademacher matrix; consequently \frac{1}{\sqrt{\Sigma_{A_{ii}}}}\frac{p\sqrt{1-n/p}}{n} converges to 1 in probability, and the second term on the RHS of (77) is o_{P}(1). This completes the proof of result (2). \blacksquare

Lemma 2

Let \boldsymbol{\hat{\beta}_{\lambda_{1}}},\boldsymbol{\hat{\delta}_{\lambda_{2}}} be as in (6) and set \lambda_{1}\triangleq\frac{4\sigma\sqrt{\log p}}{\sqrt{n}}, \lambda_{2}\triangleq\frac{4\sigma\sqrt{\log n}}{n}. Given that \boldsymbol{A} is a Rademacher matrix, if n is o(p) and n is \omega[((s+r)\log p)^{2}], then as n,p\to\infty we have the following:

n(𝑰𝒑1n𝑨𝑨)(𝜷𝜷^𝝀𝟏)\displaystyle\left\|\sqrt{n}\left(\boldsymbol{I_{p}}-\frac{1}{n}\boldsymbol{A^{\top}}\boldsymbol{A}\right)(\boldsymbol{\beta^{*}-\hat{\beta}_{\lambda_{1}}})\right\|_{\infty} =\displaystyle= oP(1)\displaystyle o_{P}(1) (78)
1n𝑨(𝜹𝜹^𝝀𝟐)\displaystyle\left\|\frac{1}{\sqrt{n}}\boldsymbol{A}^{\top}\left(\boldsymbol{\delta^{*}}-\boldsymbol{\hat{\delta}_{\lambda_{2}}}\right)\right\|_{\infty} =\displaystyle= oP(1)\displaystyle o_{P}(1) (79)
np1n/p(𝑰𝒏1n𝑨𝑨)𝑨(𝜷𝜷^𝝀𝟏)\displaystyle\left\|\frac{n}{p\sqrt{1-n/p}}\left(\boldsymbol{I_{n}}-\frac{1}{n}\boldsymbol{A}\boldsymbol{A}^{\top}\right)\boldsymbol{A}(\boldsymbol{\beta^{*}-\hat{\beta}_{\lambda_{1}}})\right\|_{\infty} =\displaystyle= oP(1)\displaystyle o_{P}(1) (80)
np1n/p1n𝑨𝑨(𝜹𝜹^𝝀𝟐)\displaystyle\left\|\frac{n}{p\sqrt{1-n/p}}\frac{1}{n}\boldsymbol{AA}^{\top}\big{(}\boldsymbol{\delta^{*}}-\boldsymbol{\hat{\delta}_{\lambda_{2}}}\big{)}\right\|_{\infty} =\displaystyle= oP(1)\displaystyle o_{P}(1) (81)

\blacksquare

Proof of Lemma 2:
When nn is ω[((s+r)logp)2]\omega[((s+r)\log p)^{2}], the assumptions of Lemma 1 are satisfied. Hence, the Rademacher matrix 𝑨\boldsymbol{A} satisfies the assumptions of Theorem 1 with probability that goes to 11 as n,pn,p\rightarrow\infty. Therefore, to prove the results, it suffices to condition on the event that the conclusion of Theorem 1 holds.

Proof of (78): Using result (4) of Lemma 6, we have:

n(𝑰𝒑1n𝑨𝑨)(𝜷𝜷^𝝀𝟏)n|𝑨𝑨/n𝑰𝒑|𝜷𝜷^𝝀𝟏1.\displaystyle\left\|\sqrt{n}\left(\boldsymbol{I_{p}}-\frac{1}{n}\boldsymbol{A^{\top}}\boldsymbol{A}\right)(\boldsymbol{\beta^{*}-\hat{\beta}_{\lambda_{1}}})\right\|_{\infty}\leq\sqrt{n}|\boldsymbol{A^{\top}A}/n-\boldsymbol{I_{p}}|_{\infty}\|\boldsymbol{\beta^{*}}-\boldsymbol{\hat{\beta}_{\lambda_{1}}}\|_{1}. (82)

From result (1) of Lemma 5, result (1) of Theorem 1, and result (5) of Lemma 6, we have

n(𝑰𝒑1n𝑨𝑨)(𝜷𝜷^𝝀𝟏)=OP((s+r)logpn).\left\|\sqrt{n}\left(\boldsymbol{I_{p}}-\frac{1}{n}\boldsymbol{A^{\top}}\boldsymbol{A}\right)(\boldsymbol{\beta^{*}-\hat{\beta}_{\lambda_{1}}})\right\|_{\infty}=O_{P}\left((s+r)\frac{\log p}{\sqrt{n}}\right). (83)

Under the assumption nn is ω[((s+r)logp)2]\omega[((s+r)\log p)^{2}], we have:

n(𝑰𝒑1n𝑨𝑨)(𝜷𝜷^𝝀𝟏)=oP(1).\left\|\sqrt{n}\left(\boldsymbol{I_{p}}-\frac{1}{n}\boldsymbol{A^{\top}}\boldsymbol{A}\right)(\boldsymbol{\beta^{*}-\hat{\beta}_{\lambda_{1}}})\right\|_{\infty}=o_{P}(1). (84)

Proof of (79): Again by using result (4) of Lemma 6, we have

1n𝑨(𝜹𝜹^𝝀𝟐)|1n𝑨|𝜹𝜹^𝝀𝟐1.\left\|\frac{1}{\sqrt{n}}\boldsymbol{A^{\top}}(\boldsymbol{\delta^{*}}-\boldsymbol{\hat{\delta}_{\lambda_{2}}})\right\|_{\infty}\leq\left|\frac{1}{\sqrt{n}}\boldsymbol{A^{\top}}\right|_{\infty}\|\boldsymbol{\delta^{*}}-\boldsymbol{\hat{\delta}_{\lambda_{2}}}\|_{1}.

Since 𝑨\boldsymbol{A} is a Rademacher matrix, we have, |1n𝑨|=1n\left|\frac{1}{\sqrt{n}}\boldsymbol{A^{\top}}\right|_{\infty}=\frac{1}{\sqrt{n}}. From result (2) of Theorem 1 and result (5) of Lemma 6, we have

1n𝑨(𝜹𝜹^𝝀𝟐)=OP(rlognn3/2).\left\|\frac{1}{\sqrt{n}}\boldsymbol{A^{\top}}(\boldsymbol{\delta^{*}}-\boldsymbol{\hat{\delta}_{\lambda_{2}}})\right\|_{\infty}=O_{P}\left(\frac{r\sqrt{\log n}}{n^{3/2}}\right). (85)

As n,pn,p\to\infty, we have

1n𝑨(𝜹𝜹^𝝀𝟐)=oP(1).\left\|\frac{1}{\sqrt{n}}\boldsymbol{A^{\top}}(\boldsymbol{\delta^{*}}-\boldsymbol{\hat{\delta}_{\lambda_{2}}})\right\|_{\infty}=o_{P}(1). (86)

Proof of (80): Again using result (4) of Lemma 6, we have,

np1n/p(𝑰𝒏1n𝑨𝑨)𝑨(𝜷𝜷^𝝀𝟏)n1n/p×|1p(𝑰𝒏1n𝑨𝑨)𝑨|𝜷𝜷^𝝀𝟏1.\displaystyle\left\|\frac{n}{p\sqrt{1-n/p}}\left(\boldsymbol{I_{n}}-\frac{1}{n}\boldsymbol{A}\boldsymbol{A}^{\top}\right)\boldsymbol{A}(\boldsymbol{\beta^{*}-\hat{\beta}_{\lambda_{1}}})\right\|_{\infty}\leq\frac{n}{\sqrt{1-n/p}}\times\left|\frac{1}{p}\left(\boldsymbol{I_{n}}-\frac{1}{n}\boldsymbol{A}\boldsymbol{A}^{\top}\right)\boldsymbol{A}\right|_{\infty}\|\boldsymbol{\beta^{*}}-\boldsymbol{\hat{\beta}_{\lambda_{1}}}\|_{1}. (87)

By using result (5) of Lemma 6, result (1) of Theorem 1 and result (2) of Lemma 5, we have

np1n/p(𝑰𝒏1n𝑨𝑨)𝑨(𝜷𝜷^𝝀𝟏)\displaystyle\left\|\frac{n}{p\sqrt{1-n/p}}\left(\boldsymbol{I_{n}}-\frac{1}{n}\boldsymbol{A}\boldsymbol{A}^{\top}\right)\boldsymbol{A}(\boldsymbol{\beta^{*}-\hat{\beta}_{\lambda_{1}}})\right\|_{\infty} \displaystyle\leq OP((s+r)n1n/p(log(pn)pn+1n)logpn)\displaystyle O_{P}\left((s+r)\frac{n}{\sqrt{1-n/p}}\left(\sqrt{\frac{\log(pn)}{pn}}+\frac{1}{n}\right)\sqrt{\frac{\log p}{n}}\right) (88)
=\displaystyle= OP((s+r)n1n/plog(pn)pnlogpn+(s+r)n1n/p1nlogpn)\displaystyle O_{P}\left(\frac{(s+r)n}{\sqrt{1-n/p}}\sqrt{\frac{\log(pn)}{pn}}\sqrt{\frac{\log p}{n}}+\frac{(s+r)n}{\sqrt{1-n/p}}\frac{1}{n}\sqrt{\frac{\log p}{n}}\right)
=\displaystyle= OP((s+r)1n/plog(np)log(p)p+(s+r)1n/plogpn).\displaystyle O_{P}\left(\frac{(s+r)}{\sqrt{1-n/p}}\sqrt{\frac{\log(np)\log(p)}{p}}+\frac{(s+r)}{\sqrt{1-n/p}}\sqrt{\frac{\log p}{n}}\right).

Since n is o(p) and n is \omega[((s+r)\log p)^{2}], the bound in (88) vanishes as n,p\to\infty, so that

np1n/p(𝑰𝒏1n𝑨𝑨)𝑨(𝜷𝜷^𝝀𝟏)=oP(1).\left\|\frac{n}{p\sqrt{1-n/p}}\left(\boldsymbol{I_{n}}-\frac{1}{n}\boldsymbol{A}\boldsymbol{A}^{\top}\right)\boldsymbol{A}(\boldsymbol{\beta^{*}-\hat{\beta}_{\lambda_{1}}})\right\|_{\infty}=o_{P}(1). (89)

Proof of (81): Using result (4) of Lemma 6, we have,

np1n/p1n𝑨𝑨(𝜹𝜹^𝝀𝟐)11n/p×|1p𝑨𝑨|(𝜹𝜹^𝝀𝟐)1.\displaystyle\left\|\frac{n}{p\sqrt{1-n/p}}\frac{1}{n}\boldsymbol{AA}^{\top}\big{(}\boldsymbol{\delta^{*}}-\boldsymbol{\hat{\delta}_{\lambda_{2}}}\big{)}\right\|_{\infty}\leq\frac{1}{\sqrt{1-n/p}}\times\left|\frac{1}{p}\boldsymbol{AA}^{\top}\right|_{\infty}\left\|\big{(}\boldsymbol{\delta^{*}}-\boldsymbol{\hat{\delta}_{\lambda_{2}}}\big{)}\right\|_{1}. (90)

Since \boldsymbol{A} is a Rademacher matrix, the entries of \frac{1}{p}\boldsymbol{AA^{\top}} lie in [-1,1], and its diagonal entries equal 1. Therefore, \left|\frac{1}{p}\boldsymbol{AA^{\top}}\right|_{\infty}=1. By using part (5) of Lemma 6 and result (2) of Theorem 1, we have

np1n/p1n𝑨𝑨(𝜹𝜹^𝝀𝟐)=OP(r1n/plognn).\displaystyle\left\|\frac{n}{p\sqrt{1-n/p}}\frac{1}{n}\boldsymbol{AA}^{\top}\big{(}\boldsymbol{\delta^{*}}-\boldsymbol{\hat{\delta}_{\lambda_{2}}}\big{)}\right\|_{\infty}=O_{P}\left(\frac{r}{\sqrt{1-n/p}}\frac{\sqrt{\log n}}{\sqrt{n}}\right). (91)

Since nn is o(p)o(p), we have

np1n/p1n𝑨𝑨(𝜹𝜹^𝝀𝟐)=oP(1).\left\|\frac{n}{p\sqrt{1-n/p}}\frac{1}{n}\boldsymbol{AA}^{\top}\big{(}\boldsymbol{\delta^{*}}-\boldsymbol{\hat{\delta}_{\lambda_{2}}}\big{)}\right\|_{\infty}=o_{P}(1). (92)

This completes the proof. \blacksquare

Lemma 3

Let 𝐀\boldsymbol{A} be a Rademacher matrix and 𝚺𝐀\boldsymbol{\Sigma_{A}} be as defined in (18). If nlognn\log n is o(p)o(p), we have as n,pn,p\to\infty for any i[n]i\in[n],

n2p2(1n/p)ΣAii𝑃1.\frac{n^{2}}{p^{2}(1-n/p)}\Sigma_{A_{ii}}\overset{P}{\to}1. (93)

Proof of Lemma 3: Recall that from (18), 𝚺𝑨=(𝑰𝒏1n𝑨𝑨)(𝑰𝒏1n𝑨𝑨)\boldsymbol{\Sigma_{A}}=\left(\boldsymbol{I_{n}}-\frac{1}{n}\boldsymbol{AA}^{\top}\right)\left(\boldsymbol{I_{n}}-\frac{1}{n}\boldsymbol{AA}^{\top}\right)^{\top}. Note that for i[n]i\in[n], we have

n2p2(1n/p)ΣAii\displaystyle\frac{n^{2}}{p^{2}(1-n/p)}\Sigma_{A_{ii}} =\displaystyle= n2p2(1np)(12n𝒂𝒊.𝒂𝒊.+1n2𝒂𝒊.𝑨𝑨𝒂𝒊.)\displaystyle\frac{n^{2}}{p^{2}(1-\frac{n}{p})}\left(1-\frac{2}{n}\boldsymbol{a_{i.}}\boldsymbol{a_{i.}}^{\top}+\frac{1}{n^{2}}\boldsymbol{a_{i.}A^{\top}A}\boldsymbol{a_{i.}}^{\top}\right) (94)
=\displaystyle= (np(1np)(1𝒂𝒊.𝒂𝒊.n))2+k=1nki(np(1np)(𝒂𝒊.𝒂𝒌.n))2\displaystyle\left(\frac{n}{p\sqrt{({1-\frac{n}{p}})}}\left(1-\frac{\boldsymbol{a_{i.}a_{i.}^{\top}}}{n}\right)\right)^{2}+\underset{k\neq i}{\sum_{k=1}^{n}}\left(\frac{n}{p\sqrt{({1-\frac{n}{p}})}}\left(\frac{\boldsymbol{a_{i.}a_{k.}^{\top}}}{n}\right)\right)^{2}
=\displaystyle= 1np+k=1nki(np(1np)(𝒂𝒊.𝒂𝒌.n))2.\displaystyle{1-\frac{n}{p}}+\underset{k\neq i}{\sum_{k=1}^{n}}\left(\frac{n}{p\sqrt{({1-\frac{n}{p}})}}\left(\frac{\boldsymbol{a_{i.}a_{k.}^{\top}}}{n}\right)\right)^{2}.

In (94), since the second term is nonnegative, we have

n2p2(1n/p)ΣAii1np.\frac{n^{2}}{p^{2}(1-n/p)}\Sigma_{A_{ii}}\geq 1-\frac{n}{p}. (95)

We have from Result (3) of Lemma 5, for k[n],kik\in[n],k\neq i,

P(11n/p|𝒂𝒊.𝒂𝒌./p|21n/p2log(n)p)12n2.P\left(\frac{1}{\sqrt{1-n/p}}\left|\boldsymbol{a_{i.}a_{k.}^{\top}}/p\right|\leq\frac{2}{\sqrt{1-n/p}}\sqrt{\frac{2\log(n)}{p}}\right)\geq 1-\frac{2}{n^{2}}. (96)

By using (96), we have

P(k=1nki(np(1np)(𝒂𝒊.𝒂𝒌.n))24(n1)1n/p2log(n)p)\displaystyle P\left(\underset{k\neq i}{\sum_{k=1}^{n}}\left(\frac{n}{p\sqrt{({1-\frac{n}{p}})}}\left(\frac{\boldsymbol{a_{i.}a_{k.}^{\top}}}{n}\right)\right)^{2}\leq\frac{4(n-1)}{{1-n/p}}{\frac{2\log(n)}{p}}\right) \displaystyle\geq P(k=1nki{(np(1np)(𝒂𝒊.𝒂𝒌.n))241n/p2log(n)p})\displaystyle P\left(\underset{k\neq i}{\cap_{k=1}^{n}}\left\{\left(\frac{n}{p\sqrt{({1-\frac{n}{p}})}}\left(\frac{\boldsymbol{a_{i.}a_{k.}^{\top}}}{n}\right)\right)^{2}\leq\frac{4}{{1-n/p}}{\frac{2\log(n)}{p}}\right\}\right) (97)
=\displaystyle= P(k=1nki{np(1np)|𝒂𝒊.𝒂𝒌.n|21n/p2log(n)p})\displaystyle P\left(\underset{k\neq i}{\cap_{k=1}^{n}}\left\{\frac{n}{p\sqrt{({1-\frac{n}{p}})}}\left|\frac{\boldsymbol{a_{i.}a_{k.}^{\top}}}{n}\right|\leq\frac{2}{\sqrt{1-n/p}}{\sqrt{\frac{2\log(n)}{p}}}\right\}\right)
\displaystyle\geq 12(n1)n2.\displaystyle 1-\frac{2(n-1)}{n^{2}}.

The last inequality follows from Bonferroni's inequality, which states that P(\cap_{i=1}^{n}U_{i})\geq 1-\sum_{i=1}^{n}P(U_{i}^{c}) for any events U_{1},U_{2},\ldots,U_{n}. Therefore, by using (94) and (97), we have

P(n2p2(1n/p)ΣAii1np+4(n1)1n/p2log(n)p)12nP\left(\frac{n^{2}}{p^{2}(1-n/p)}\Sigma_{A_{ii}}\leq 1-\frac{n}{p}+\frac{4(n-1)}{{1-n/p}}{\frac{2\log(n)}{p}}\right)\geq 1-\frac{2}{n} (98)

By using (95) and (98), we have

P(1npn2p2(1n/p)ΣAii1np+4(n1)1n/p2log(n)p)12nP\left(1-\frac{n}{p}\leq\frac{n^{2}}{p^{2}(1-n/p)}\Sigma_{A_{ii}}\leq 1-\frac{n}{p}+\frac{4(n-1)}{{1-n/p}}{\frac{2\log(n)}{p}}\right)\geq 1-\frac{2}{n} (99)

Since nlognn\log n is o(p)o(p), as n,pn,p\to\infty, 1np11-\frac{n}{p}\to 1 and 4(n1)1n/p2log(n)p0\frac{4(n-1)}{{1-n/p}}{\frac{2\log(n)}{p}}\to 0. This completes the proof. \blacksquare
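The convergence claimed in Lemma 3 is easy to check empirically. The following minimal sketch (Python/NumPy) uses illustrative sizes with n\log n much smaller than p and reports the range of the scaled diagonal entries of \boldsymbol{\Sigma_{A}}, which should be close to 1.

```python
import numpy as np

# Sanity check of Lemma 3: for a Rademacher A with n*log(n) much smaller than p,
# the scaled diagonal entries n^2 / (p^2 (1 - n/p)) * Sigma_A[i, i] should be close to 1.
# The sizes below are illustrative only.
rng = np.random.default_rng(1)
n, p = 50, 20000
A = rng.choice([-1.0, 1.0], size=(n, p))
M = np.eye(n) - A @ A.T / n
Sigma_A = M @ M.T
scaled = n**2 / (p**2 * (1 - n / p)) * np.diag(Sigma_A)
print(scaled.min(), scaled.max())   # both values are close to 1 when n log n << p
```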


Now we proceed to the results involving debiasing using the optimal weights matrix \boldsymbol{W} obtained from Alg. 2. The proofs of these results largely follow the same approach as in the case \boldsymbol{W}=\boldsymbol{A} (i.e., Theorem 2). However, there is one major point of departure: the weights matrix \boldsymbol{W} designed by Alg. 2 has somewhat different properties (given in Lemma 4) from those of \boldsymbol{W}=\boldsymbol{A} (given in Lemma 5).

B-B Proof of Theorem 3

Proof of (23): Using Result (4) of Lemma 6, we have

\left\|\sqrt{n}\left(\boldsymbol{I_{p}}-\frac{1}{n}\boldsymbol{W^{\top}}\boldsymbol{A}\right)(\boldsymbol{\beta^{*}-\hat{\beta}_{\lambda_{1}}})\right\|_{\infty}\leq\sqrt{n}\left|\boldsymbol{W^{\top}A}/n-\boldsymbol{I_{p}}\right|_{\infty}\|\boldsymbol{\beta^{*}}-\boldsymbol{\hat{\beta}_{\lambda_{1}}}\|_{1}. (100)

Using Result (2) of Lemma 4, Result (1) of Theorem 1 and Result (5) of Lemma 6, we have

n(𝑰𝒑1n𝑾𝑨)(𝜷𝜷^𝝀𝟏)=OP((s+r)logpn).\left\|\sqrt{n}\left(\boldsymbol{I_{p}}-\frac{1}{n}\boldsymbol{W^{\top}}\boldsymbol{A}\right)(\boldsymbol{\beta^{*}-\hat{\beta}_{\lambda_{1}}})\right\|_{\infty}=O_{P}\left((s+r)\frac{\log p}{\sqrt{n}}\right). (101)

If nn is ω[((s+r)logp)2]\omega[((s+r)\log p)^{2}], we have:

n(𝑰𝒑1n𝑾𝑨)(𝜷𝜷^𝝀𝟏)=oP(1).\left\|\sqrt{n}\left(\boldsymbol{I_{p}}-\frac{1}{n}\boldsymbol{W^{\top}}\boldsymbol{A}\right)(\boldsymbol{\beta^{*}-\hat{\beta}_{\lambda_{1}}})\right\|_{\infty}=o_{P}(1). (102)

Proof of (24): Using Result (4) of Lemma 6, we have

1n𝑾(𝜹𝜹^𝝀𝟐)|1n𝑾|𝜹𝜹^𝝀𝟐1.\left\|\frac{1}{\sqrt{n}}\boldsymbol{W^{\top}}(\boldsymbol{\delta^{*}}-\boldsymbol{\hat{\delta}_{\lambda_{2}}})\right\|_{\infty}\leq\left|\frac{1}{\sqrt{n}}\boldsymbol{W^{\top}}\right|_{\infty}\|\boldsymbol{\delta^{*}}-\boldsymbol{\hat{\delta}_{\lambda_{2}}}\|_{1}.

Using Result (3) of Lemma 4, Result (2) of Theorem 1 and Result (5) of Lemma 6, we have

1n𝑾(𝜹𝜹^𝝀𝟐)=OP(rlognn3/2)=oP(1).\left\|\frac{1}{\sqrt{n}}\boldsymbol{W^{\top}}(\boldsymbol{\delta^{*}}-\boldsymbol{\hat{\delta}_{\lambda_{2}}})\right\|_{\infty}=O_{P}\left(\frac{r\sqrt{\log n}}{n^{3/2}}\right)=o_{P}(1). (103)

Proof of (25): Using Result (4) of Lemma 6, we have

np1n/p(𝑰𝒏1n𝑾𝑨)𝑨(𝜷𝜷^𝝀𝟏)n1n/p×|1p(𝑰𝒏1n𝑾𝑨)𝑨|𝜷𝜷^𝝀𝟏1.\displaystyle\left\|\frac{n}{p\sqrt{1-n/p}}\left(\boldsymbol{I_{n}}-\frac{1}{n}\boldsymbol{W}\boldsymbol{A}^{\top}\right)\boldsymbol{A}(\boldsymbol{\beta^{*}-\hat{\beta}_{\lambda_{1}}})\right\|_{\infty}\leq\frac{n}{\sqrt{1-n/p}}\times\left|\frac{1}{p}\left(\boldsymbol{I_{n}}-\frac{1}{n}\boldsymbol{W}\boldsymbol{A}^{\top}\right)\boldsymbol{A}\right|_{\infty}\|\boldsymbol{\beta^{*}}-\boldsymbol{\hat{\beta}_{\lambda_{1}}}\|_{1}. (104)

Using Result (4) of Lemma 4, Result (1) of Theorem 1 and Result (5) of Lemma 6, we have

np1n/p(𝑰𝒏1n𝑾𝑨)𝑨(𝜷𝜷^𝝀𝟏)\displaystyle\left\|\frac{n}{p\sqrt{1-n/p}}\left(\boldsymbol{I_{n}}-\frac{1}{n}\boldsymbol{W}\boldsymbol{A}^{\top}\right)\boldsymbol{A}(\boldsymbol{\beta^{*}-\hat{\beta}_{\lambda_{1}}})\right\|_{\infty} \displaystyle\leq OP((s+r)n1n/p(log(pn)pn+1n)logpn)\displaystyle O_{P}\left(\frac{(s+r)n}{\sqrt{1-n/p}}\left(\sqrt{\frac{\log(pn)}{pn}}+\frac{1}{n}\right)\sqrt{\frac{\log p}{n}}\right) (105)
=\displaystyle= OP((s+r)1n/plog(np)log(p)p+(s+r)1n/plogpn).\displaystyle O_{P}\left(\frac{(s+r)}{\sqrt{1-n/p}}\sqrt{\frac{\log(np)\log(p)}{p}}+\frac{(s+r)}{\sqrt{1-n/p}}\sqrt{\frac{\log p}{n}}\right).

Since nn is o(p)o(p) and nn is ω[((s+r)logp)2]\omega[((s+r)\log p)^{2}], we have

np1n/p(𝑰𝒏1n𝑾𝑨)𝑨(𝜷𝜷^𝝀𝟏)=oP(1).\left\|\frac{n}{p\sqrt{1-n/p}}\left(\boldsymbol{I_{n}}-\frac{1}{n}\boldsymbol{W}\boldsymbol{A}^{\top}\right)\boldsymbol{A}(\boldsymbol{\beta^{*}-\hat{\beta}_{\lambda_{1}}})\right\|_{\infty}=o_{P}(1). (106)

Proof of (26): Using Result (4) of Lemma 6, we have

np1n/p1n𝑾𝑨(𝜹𝜹^𝝀𝟐)11n/p×|1p𝑾𝑨|(𝜹𝜹^𝝀𝟐)1\displaystyle\left\|\frac{n}{p\sqrt{1-n/p}}\frac{1}{n}\boldsymbol{WA}^{\top}\big{(}\boldsymbol{\delta^{*}}-\boldsymbol{\hat{\delta}_{\lambda_{2}}}\big{)}\right\|_{\infty}\leq\frac{1}{\sqrt{1-n/p}}\times\left|\frac{1}{p}\boldsymbol{WA}^{\top}\right|_{\infty}\left\|\big{(}\boldsymbol{\delta^{*}}-\boldsymbol{\hat{\delta}_{\lambda_{2}}}\big{)}\right\|_{1} (107)

Using Result (5) of Lemma 4, Result (2) of Theorem 1 and Result (5) of Lemma 6, we have

np1n/p1n𝑾𝑨(𝜹𝜹^𝝀𝟐)\displaystyle\left\|\frac{n}{p\sqrt{1-n/p}}\frac{1}{n}\boldsymbol{WA}^{\top}\big{(}\boldsymbol{\delta^{*}}-\boldsymbol{\hat{\delta}_{\lambda_{2}}}\big{)}\right\|_{\infty} =\displaystyle= OP(r1n/p(nlog(np)p+1)lognn3/2)\displaystyle O_{P}\left(\frac{r}{\sqrt{1-n/p}}\left(\sqrt{\frac{n\log(np)}{p}}+1\right)\frac{\sqrt{\log n}}{{n^{3/2}}}\right) (108)
=\displaystyle= OP(r1n/plog(np)plognn+r1n/plognn3/2).\displaystyle O_{P}\left(\frac{r}{\sqrt{1-n/p}}\sqrt{\frac{\log(np)}{p}}\sqrt{\frac{{\log n}}{{n}}}+\frac{r}{\sqrt{1-n/p}}\frac{\sqrt{\log n}}{{n^{3/2}}}\right).

Since nn is o(p)o(p), we have

np1n/p1n𝑾𝑨(𝜹𝜹^𝝀𝟐)=oP(1)\left\|\frac{n}{p\sqrt{1-n/p}}\frac{1}{n}\boldsymbol{WA}^{\top}\big{(}\boldsymbol{\delta^{*}}-\boldsymbol{\hat{\delta}_{\lambda_{2}}}\big{)}\right\|_{\infty}=o_{P}(1) (109)

This completes the proof. \blacksquare

B-C Proof of Theorem 4

Result (1): Recall that \boldsymbol{W} is the output of Alg. 2, \boldsymbol{\Sigma_{\beta}}=\frac{\sigma^{2}}{n}\boldsymbol{W^{\top}W}, \boldsymbol{\Sigma_{\delta}}=\sigma^{2}\left(\boldsymbol{I_{n}}-\frac{1}{n}\boldsymbol{WA}^{\top}\right)\left(\boldsymbol{I_{n}}-\frac{1}{n}\boldsymbol{WA}^{\top}\right)^{\top} and \mu_{1}=2\sqrt{\frac{2\log p}{n}}. We derive the lower bound on \Sigma_{\beta_{jj}} for all j\in[p] following the same idea as in the proof of Lemma 12 of [28]. Note that \Sigma_{\beta_{jj}}=\frac{\sigma^{2}}{n}\boldsymbol{w_{.j}^{\top}w_{.j}}. For all j\in[p], from (134) of result (2) of Lemma 4, any feasible \boldsymbol{W} satisfies the following with probability at least 1-\left(\frac{2}{p^{2}}+\frac{2}{n^{2}}+\frac{1}{2np}\right):

1-\frac{1}{n}\boldsymbol{a_{.j}^{\top}w_{.j}}\leq\mu_{1}\ \implies\ 1-\mu_{1}\leq\frac{1}{n}\boldsymbol{a_{.j}^{\top}w_{.j}}.

For any feasible 𝑾\boldsymbol{W} of Alg.2, we have for any c>0c>0,

\frac{1}{n}\boldsymbol{w_{.j}^{\top}w_{.j}} \geq \frac{1}{n}\boldsymbol{w_{.j}^{\top}w_{.j}}+c(1-\mu_{1})-c\left(\frac{1}{n}\boldsymbol{a_{.j}^{\top}w_{.j}}\right)\geq\underset{\boldsymbol{w_{.j}}\in\mathbb{R}^{n}}{\min}\left\{\frac{1}{n}\boldsymbol{w_{.j}^{\top}w_{.j}}+c(1-\mu_{1})-c\left(\frac{1}{n}\boldsymbol{a_{.j}^{\top}w_{.j}}\right)\right\}
=\underset{\boldsymbol{w_{.j}}\in\mathbb{R}^{n}}{\min}\left\{\frac{1}{n}\left(\boldsymbol{w_{.j}}-c\boldsymbol{a_{.j}}/2\right)^{\top}\left(\boldsymbol{w_{.j}}-c\boldsymbol{a_{.j}}/2\right)\right\}+c(1-\mu_{1})-\frac{c^{2}}{4}\frac{\boldsymbol{a_{.j}^{\top}a_{.j}}}{n}\geq c(1-\mu_{1})-\frac{c^{2}}{4}\frac{\boldsymbol{a_{.j}}^{\top}\boldsymbol{a_{.j}}}{n}
=c(1-\mu_{1})-\dfrac{c^{2}}{4}.

We obtain the last inequality by dropping the nonnegative squared term, whose minimum value 0 is attained at \boldsymbol{w_{.j}}=c\boldsymbol{a_{.j}}/2. The rightmost equality is because \boldsymbol{a_{.j}}^{\top}\boldsymbol{a_{.j}}=n. The lower bound on the RHS is maximized at c=2(1-\mu_{1}), since \frac{d}{dc}\left[c(1-\mu_{1})-\frac{c^{2}}{4}\right]=(1-\mu_{1})-\frac{c}{2} vanishes there. Plugging in this value of c, we obtain the following with probability at least 1-\left(\frac{2}{p^{2}}+\frac{2}{n^{2}}+\frac{1}{2np}\right):

1n𝒘.𝒋𝒘.𝒋(1μ1)2.\frac{1}{n}\boldsymbol{w_{.j}^{\top}w_{.j}}\geq(1-\mu_{1})^{2}.

Hence, from the above equation and (129), we obtain the lower bound on Σβjj\Sigma_{\beta_{jj}} for any j[p]j\in[p] as follows:

P(Σβjjσ2(1μ1)2)1(2p2+2n2+12np).P\left(\Sigma_{\beta_{jj}}\geq\sigma^{2}(1-\mu_{1})^{2}\right)\geq 1-\left(\frac{2}{p^{2}}+\frac{2}{n^{2}}+\frac{1}{2np}\right). (110)

Furthermore, from Result (1) of Lemma 4, we have

P(𝒘.𝒋𝒘.𝒋/n1j[p])=1.P\left(\boldsymbol{w_{.j}^{\top}w_{.j}}/n\leq 1\ \forall j\in[p]\right)=1. (111)

We use (111) to get, for any j[p]j\in[p], P(𝒘.𝒋𝒘.𝒋/n1)=1P(\boldsymbol{w_{.j}^{\top}w_{.j}}/n\leq 1)=1. As Σβjj=σ2𝒘.𝒋𝒘.𝒋n\Sigma_{\beta_{jj}}=\sigma^{2}\frac{\boldsymbol{w_{.j}^{\top}w_{.j}}}{n}, we have for any j[p]j\in[p]:

P(Σβjjσ2)=1.P\left(\Sigma_{\beta_{jj}}\leq\sigma^{2}\right)=1. (112)

Using (112) with (110), we obtain for any j[p]j\in[p],

P(σ2(1μ1)2Σβjjσ2)1(2p2+2n2+12np).P\left(\sigma^{2}(1-\mu_{1})^{2}\leq\Sigma_{\beta_{jj}}\leq\sigma^{2}\right)\geq 1-\left(\frac{2}{p^{2}}+\frac{2}{n^{2}}+\frac{1}{2np}\right).

Now under the assumption nn is ω[((s+r)logp)2]\omega[((s+r)\log p)^{2}], μ10\mu_{1}\to 0. Hence, we have, Σβjj𝑃σ2\Sigma_{\beta_{jj}}\overset{P}{\to}\sigma^{2}. This completes the proof of Result (1).
Result (2): Recall that \mu_{3}=\frac{2}{\sqrt{1-n/p}}\sqrt{\frac{2\log(n)}{p}}. In order to obtain upper and lower bounds on \Sigma_{\delta_{ii}} for any i\in[n], we use Result (6) of Lemma 4. We have,

P(np1np|(𝑾𝑨npn𝑰𝒏)|μ3)1(2p2+2n2+12np).P\left(\frac{n}{p\sqrt{1-\frac{n}{p}}}\left|\left(\frac{\boldsymbol{WA^{\top}}}{n}-\frac{p}{n}\boldsymbol{I_{n}}\right)\right|_{\infty}\leq\mu_{3}\right)\geq 1-\left(\frac{2}{p^{2}}+\frac{2}{n^{2}}+\frac{1}{2np}\right). (113)

We have for i[n]i\in[n],

n2p2(1np)Σδii\displaystyle\frac{n^{2}}{p^{2}(1-\frac{n}{p})}\Sigma_{\delta_{ii}} =\displaystyle= σ2n2p2(1np)(12n𝒘𝒊.𝒂𝒊.+1n2𝒘𝒊.𝑨𝑨𝒘𝒊.)\displaystyle\sigma^{2}\frac{n^{2}}{p^{2}(1-\frac{n}{p})}\left(1-\frac{2}{n}\boldsymbol{w_{i.}}\boldsymbol{a_{i.}}^{\top}+\frac{1}{n^{2}}\boldsymbol{w_{i.}A^{\top}A}\boldsymbol{w_{i.}}^{\top}\right) (114)
=\displaystyle= σ2(np(1np)(1𝒘𝒊.𝒂𝒊.n))2+σ2k=1nki(np(1np)(𝒘𝒊.𝒂𝒌.n))2.\displaystyle\sigma^{2}\left(\frac{n}{p\sqrt{({1-\frac{n}{p}})}}\left(1-\frac{\boldsymbol{w_{i.}a_{i.}^{\top}}}{n}\right)\right)^{2}+\sigma^{2}\underset{k\neq i}{\sum_{k=1}^{n}}\left(\frac{n}{p\sqrt{({1-\frac{n}{p}})}}\left(\frac{\boldsymbol{w_{i.}a_{k.}^{\top}}}{n}\right)\right)^{2}.

Let 𝑽=(𝑾𝑨npn𝑰𝒏)\boldsymbol{V}=\left(\frac{\boldsymbol{WA^{\top}}}{n}-\frac{p}{n}\boldsymbol{I_{n}}\right). Note that, for i[n]i\in[n],

vii=𝒘𝒊.𝒂𝒊.npn=(𝒘𝒊.𝒂𝒊.n1)+(1pn).v_{ii}=\frac{\boldsymbol{w_{i.}a_{i.}}^{\top}}{n}-\frac{p}{n}=\left(\frac{\boldsymbol{w_{i.}a_{i.}}^{\top}}{n}-1\right)+\left(1-\frac{p}{n}\right). (115)

Hence from (115), we have

np1n/pvii\displaystyle\frac{n}{p\sqrt{1-n/p}}v_{ii} =\displaystyle= np1n/p(𝒘𝒊.𝒂𝒊.n1)+11n/p(np1)\displaystyle\frac{n}{p\sqrt{1-n/p}}\left(\frac{\boldsymbol{w_{i.}a_{i.}}^{\top}}{n}-1\right)+\frac{1}{\sqrt{1-n/p}}\left(\frac{n}{p}-1\right) (116)
=\displaystyle= np1n/p(𝒘𝒊.𝒂𝒊.n1)11n/p(1np)\displaystyle\frac{n}{p\sqrt{1-n/p}}\left(\frac{\boldsymbol{w_{i.}a_{i.}}^{\top}}{n}-1\right)-\frac{1}{\sqrt{1-n/p}}\left(1-\frac{n}{p}\right)
=\displaystyle= np1n/p(𝒘𝒊.𝒂𝒊.n1)1np.\displaystyle\frac{n}{p\sqrt{1-n/p}}\left(\frac{\boldsymbol{w_{i.}a_{i.}}^{\top}}{n}-1\right)-\sqrt{1-\frac{n}{p}}.

We have, from (113),

P(np1n/pviiμ3)1(2p2+2n2+12np).P\left(\frac{n}{p\sqrt{1-n/p}}v_{ii}\geq-\mu_{3}\right)\geq 1-\left(\frac{2}{p^{2}}+\frac{2}{n^{2}}+\frac{1}{2np}\right).

Therefore using (116), we have

P(np1n/p(𝒘𝒊.𝒂𝒊.n1)1npμ3)\displaystyle P\left(\frac{n}{p\sqrt{1-n/p}}\left(\frac{\boldsymbol{w_{i.}a_{i.}}^{\top}}{n}-1\right)-\sqrt{1-\frac{n}{p}}\geq-\mu_{3}\right) \displaystyle\geq 1(2p2+2n2+12np)\displaystyle 1-\left(\frac{2}{p^{2}}+\frac{2}{n^{2}}+\frac{1}{2np}\right)
P(np1n/p(𝒘𝒊.𝒂𝒊.n1)1npμ3)\displaystyle\implies P\left(\frac{n}{p\sqrt{1-n/p}}\left(\frac{\boldsymbol{w_{i.}a_{i.}}^{\top}}{n}-1\right)\geq\sqrt{1-\frac{n}{p}}-\mu_{3}\right) \displaystyle\geq 1(2p2+2n2+12np).\displaystyle 1-\left(\frac{2}{p^{2}}+\frac{2}{n^{2}}+\frac{1}{2np}\right). (117)

Using (117) in (114) yields the following inequality with probability at least 1(2p2+2n2+12np)1-\left(\frac{2}{p^{2}}+\frac{2}{n^{2}}+\frac{1}{2np}\right),

n2p2(1np)Σδii\displaystyle\frac{n^{2}}{p^{2}(1-\frac{n}{p})}\Sigma_{\delta_{ii}} =\displaystyle= σ2(np(1np)(𝒘𝒊.𝒂𝒊.n1))2+σ2k=1,kin(np1n/p𝒘𝒊.𝒂𝒌.n)2\displaystyle\sigma^{2}\left(\frac{n}{p\sqrt{({1-\frac{n}{p}})}}\left(\frac{\boldsymbol{w_{i.}a_{i.}^{\top}}}{n}-1\right)\right)^{2}+\sigma^{2}\sum_{k=1,k\neq i}^{n}\left(\frac{n}{p\sqrt{1-n/p}}\frac{\boldsymbol{w_{i.}a_{k.}^{\top}}}{n}\right)^{2}
\displaystyle\geq σ2(np(1np)(𝒘𝒊.𝒂𝒊.n1))2σ2(1npμ3)2.\displaystyle\sigma^{2}\left(\frac{n}{p\sqrt{({1-\frac{n}{p}})}}\left(\frac{\boldsymbol{w_{i.}a_{i.}^{\top}}}{n}-1\right)\right)^{2}\geq\sigma^{2}\left(\sqrt{1-\frac{n}{p}}-\mu_{3}\right)^{2}.

Therefore the lower bound on Σδii\Sigma_{\delta_{ii}} is as follows:

P(n2p2(1np)Σδiiσ2(1npμ3)2)1(2p2+2n2+12np).P\left(\frac{n^{2}}{p^{2}(1-\frac{n}{p})}\Sigma_{\delta_{ii}}\geq\sigma^{2}\left(\sqrt{1-\frac{n}{p}}-\mu_{3}\right)^{2}\right)\geq 1-\left(\frac{2}{p^{2}}+\frac{2}{n^{2}}+\frac{1}{2np}\right). (118)

We now derive an upper bound on \Sigma_{\delta_{ii}}. By the same argument as before, we have from (113)

P(np1n/pviiμ3)\displaystyle P\left(\frac{n}{p\sqrt{1-n/p}}v_{ii}\leq\mu_{3}\right) \displaystyle\geq 1(2p2+2n2+12np),\displaystyle 1-\left(\frac{2}{p^{2}}+\frac{2}{n^{2}}+\frac{1}{2np}\right),
P(np1n/p(𝒘𝒊.𝒂𝒊.n1)1npμ3)\displaystyle\implies P\left(\frac{n}{p\sqrt{1-n/p}}\left(\frac{\boldsymbol{w_{i.}a_{i.}}^{\top}}{n}-1\right)-\sqrt{1-\frac{n}{p}}\leq\mu_{3}\right) \displaystyle\geq 1(2p2+2n2+12np),\displaystyle 1-\left(\frac{2}{p^{2}}+\frac{2}{n^{2}}+\frac{1}{2np}\right),
P(np1n/p(𝒘𝒊.𝒂𝒊.n1)1np+μ3)\displaystyle\implies P\left(\frac{n}{p\sqrt{1-n/p}}\left(\frac{\boldsymbol{w_{i.}a_{i.}}^{\top}}{n}-1\right)\leq\sqrt{1-\frac{n}{p}}+\mu_{3}\right) \displaystyle\geq 1(2p2+2n2+12np).\displaystyle 1-\left(\frac{2}{p^{2}}+\frac{2}{n^{2}}+\frac{1}{2np}\right).

Again, for i\in[n],k\in[n],k\neq i, v_{ik}=\frac{\boldsymbol{w_{i.}a_{k.}^{\top}}}{n}. We have from (113),

P(np1n/p|𝒘𝒊.𝒂𝒌.n|μ3)1(2p2+2n2+12np).P\left(\frac{n}{p\sqrt{1-n/p}}\left|\frac{\boldsymbol{w_{i.}a_{k.}^{\top}}}{n}\right|\leq\mu_{3}\right)\geq 1-\left(\frac{2}{p^{2}}+\frac{2}{n^{2}}+\frac{1}{2np}\right).

Now from (114), we have

n2p2(1np)Σδii\displaystyle\frac{n^{2}}{p^{2}(1-\frac{n}{p})}\Sigma_{\delta_{ii}} =\displaystyle= σ2(np(1np)(𝒘𝒊.𝒂𝒊.n1))2+σ2k=1,kin(np1n/p𝒘𝒊.𝒂𝒌.n)2\displaystyle\sigma^{2}\left(\frac{n}{p\sqrt{({1-\frac{n}{p}})}}\left(\frac{\boldsymbol{w_{i.}a_{i.}^{\top}}}{n}-1\right)\right)^{2}+\sigma^{2}\sum_{k=1,k\neq i}^{n}\left(\frac{n}{p\sqrt{1-n/p}}\frac{\boldsymbol{w_{i.}a_{k.}^{\top}}}{n}\right)^{2}
\displaystyle\leq σ2(1np+μ3)2+σ2(n1)μ32.\displaystyle\sigma^{2}\left(\sqrt{1-\frac{n}{p}}+\mu_{3}\right)^{2}+\sigma^{2}(n-1)\mu_{3}^{2}.

The last inequality holds with probability at least 12(2p2+2n2+12np)1-2\left(\frac{2}{p^{2}}+\frac{2}{n^{2}}+\frac{1}{2np}\right). Hence, we have

P(n2p2(1np)Σδiiσ2(1np+μ3)2+σ2(n1)μ32)12(2p2+2n2+12np).P\left(\frac{n^{2}}{p^{2}(1-\frac{n}{p})}\Sigma_{\delta_{ii}}\leq\sigma^{2}\left(\sqrt{1-\frac{n}{p}}+\mu_{3}\right)^{2}+\sigma^{2}(n-1)\mu_{3}^{2}\right)\geq 1-2\left(\frac{2}{p^{2}}+\frac{2}{n^{2}}+\frac{1}{2np}\right). (119)

Using (119) with (118), we obtain the following using the union bound, for all i[n]i\in[n],

P(σ2(1npμ3)2n2p2(1np)Σδiiσ2(1np+μ3)2+σ2(n1)μ32)13(2p2+2n2+12np).P\left(\sigma^{2}\left(\sqrt{1-\frac{n}{p}}-\mu_{3}\right)^{2}\leq\frac{n^{2}}{p^{2}(1-\frac{n}{p})}\Sigma_{\delta_{ii}}\leq\sigma^{2}\left(\sqrt{1-\frac{n}{p}}+\mu_{3}\right)^{2}+\sigma^{2}(n-1)\mu_{3}^{2}\right)\geq 1-3\left(\frac{2}{p^{2}}+\frac{2}{n^{2}}+\frac{1}{2np}\right).

Therefore, under the assumption that n\log n is o(p), we have (n-1)\mu_{3}^{2}=\frac{8(n-1)\log n}{p(1-n/p)}\to 0 and \left(\sqrt{1-\frac{n}{p}}+\mu_{3}\right)^{2}\to 1. Hence, we have \frac{n^{2}}{p^{2}(1-\frac{n}{p})}\Sigma_{\delta_{ii}}\overset{P}{\to}\sigma^{2}. This completes the proof. \blacksquare

B-D Proof of Theorem 5

Let 𝑾\boldsymbol{W} be the output of Alg. 2. Using the definition of 𝜷^𝑾\boldsymbol{\hat{\beta}_{W}} from (14) and the measurement model from (5), we have

𝜷^𝑾𝜷\displaystyle\boldsymbol{\hat{\beta}_{W}}-\boldsymbol{\beta^{*}} =\displaystyle= 1n𝑾𝜼+(𝑰𝒑1n𝑾𝑨)(𝜷^𝝀𝟏𝜷)+1n𝑾(𝜹𝜹^𝝀𝟐).\displaystyle\frac{1}{n}\boldsymbol{W}^{\top}\boldsymbol{{\eta}}+\left(\boldsymbol{I_{p}}-\frac{1}{n}\boldsymbol{W^{\top}}\boldsymbol{A}\right)(\boldsymbol{\hat{\beta}_{\lambda_{1}}}-\boldsymbol{\beta^{*}})+\frac{1}{n}\boldsymbol{W}^{\top}\left(\boldsymbol{\delta^{*}}-\boldsymbol{\hat{\delta}_{\lambda_{2}}}\right). (120)

Using Results (1) and (2) of Theorem 3, the second and third terms on the RHS of (120) are o_{P}(1/\sqrt{n}). Recall that \boldsymbol{\Sigma_{\beta}}=\frac{\sigma^{2}}{n}\boldsymbol{W^{\top}W}. Therefore, we have

n(β^Wjβj)Σβjj\displaystyle\frac{\sqrt{n}(\hat{\beta}_{Wj}-\beta^{*}_{j})}{\sqrt{\Sigma_{\beta_{jj}}}} =\displaystyle= 1n𝒘.j𝜼Σβjj+oP(1/Σβjj),\displaystyle\frac{\frac{1}{\sqrt{n}}\boldsymbol{w}_{.j}^{\top}\boldsymbol{{\eta}}}{\sqrt{\Sigma_{\beta_{jj}}}}+o_{P}\left(1/\sqrt{\Sigma_{\beta_{jj}}}\right), (121)

where 𝒘.j\boldsymbol{w}_{.j} denotes the jthj^{\text{th}} column of matrix 𝑾\boldsymbol{W}. As 𝜼\boldsymbol{\eta} is Gaussian, the first term on the RHS of (121) is a Gaussian random variable with mean 0 and variance 11. Using Result (1) of Theorem 4, Σβjj\Sigma_{\beta_{jj}} converges to σ2\sigma^{2} in probability. This completes the proof of Result (1).

Using (15) and the measurement model (5), we have

𝜹^𝑾𝜹=(𝑰𝒏1n𝑾𝑨)𝜼+(𝑰𝒏1n𝑾𝑨)𝑨(𝜷𝜷^𝝀𝟏)1n𝑾𝑨(𝜹𝜹^𝝀𝟐).\displaystyle\boldsymbol{\hat{\delta}_{W}}-\boldsymbol{\delta^{*}}=\left(\boldsymbol{I_{n}}-\frac{1}{n}\boldsymbol{W}\boldsymbol{A}^{\top}\right)\boldsymbol{{\eta}}+\left(\boldsymbol{I_{n}}-\frac{1}{n}\boldsymbol{W}\boldsymbol{A}^{\top}\right)\boldsymbol{A}(\boldsymbol{\beta^{*}-\hat{\beta}_{\lambda_{1}}})-\frac{1}{n}\boldsymbol{WA}^{\top}\big{(}\boldsymbol{\delta^{*}}-\boldsymbol{\hat{\delta}_{\lambda_{2}}}\big{)}. (122)

Using Results (3) and (4) of Theorem 3, the second and third terms on the RHS of (122) are both o_{P}\left(\frac{p\sqrt{1-n/p}}{n}\right). Recall from (28) that \boldsymbol{\Sigma_{\delta}}=\sigma^{2}\left(\boldsymbol{I_{n}}-\frac{1}{n}\boldsymbol{W}\boldsymbol{A}^{\top}\right)\left(\boldsymbol{I_{n}}-\frac{1}{n}\boldsymbol{W}\boldsymbol{A}^{\top}\right)^{\top}. Therefore, we have

(δ^Wiδi)Σδii=(𝑰𝒏1n𝑾𝑨)i.𝜼Σδii+oP(1Σδiip1n/pn).\displaystyle\frac{\left({\hat{\delta}_{Wi}}-{\delta^{*}_{i}}\right)}{\sqrt{\Sigma_{\delta_{ii}}}}=\frac{\left(\boldsymbol{I_{n}}-\frac{1}{n}\boldsymbol{WA}^{\top}\right)^{\top}_{i.}\boldsymbol{{\eta}}}{\sqrt{\Sigma_{\delta_{ii}}}}+o_{P}\left(\frac{1}{\sqrt{\Sigma_{\delta_{ii}}}}\frac{p\sqrt{1-n/p}}{n}\right). (123)

As \boldsymbol{\eta} is Gaussian, the first term on the RHS of (123) is a Gaussian random variable with mean 0 and variance 1. Using Result (2) of Theorem 4, \frac{n^{2}}{p^{2}(1-n/p)}\Sigma_{\delta_{ii}} converges to \sigma^{2} in probability, so that \frac{1}{\sqrt{\Sigma_{\delta_{ii}}}}\frac{p\sqrt{1-n/p}}{n} converges to \frac{1}{\sigma} in probability and the second term on the RHS of (123) is o_{P}(1). This completes the proof of result (2). \blacksquare

Lemma 4

Let \boldsymbol{A} be an n\times p Rademacher matrix and \boldsymbol{W} be the corresponding output of Alg. 2. If n is o(p), we have the following results:

(1) P\left(\boldsymbol{w_{.j}^{\top}w_{.j}}/n\leq 1\ \forall j\in[p]\right)=1.

(2) \left|\frac{1}{n}\boldsymbol{W^{\top}}\boldsymbol{A}-\boldsymbol{I_{p}}\right|_{\infty}=O_{P}\left(\sqrt{\frac{\log(p)}{n}}\right).

(3) \left|\frac{1}{\sqrt{n}}\boldsymbol{W^{\top}}\right|_{\infty}=O\left(1\right).

(4) \left|\frac{1}{p}\left(\frac{1}{n}\boldsymbol{W}\boldsymbol{A}^{\top}-\boldsymbol{I_{n}}\right)\boldsymbol{A}\right|_{\infty}=O_{P}\left(\sqrt{\frac{\log(pn)}{pn}}+\frac{1}{n}\right).

(5) \left|\frac{1}{p}\boldsymbol{WA}^{\top}\right|_{\infty}=O_{P}\left(\sqrt{\frac{n\log(np)}{p}}+1\right).

(6) \frac{n}{p\sqrt{1-\frac{n}{p}}}\left|\left(\frac{\boldsymbol{WA^{\top}}}{n}-\frac{p}{n}\boldsymbol{I_{n}}\right)\right|_{\infty}=O_{P}\left(\frac{1}{\sqrt{1-n/p}}\sqrt{\frac{\log(n)}{p}}\right).

Proof of Lemma. 4:
In order to prove these results, we first show that, with probability at least 1-\left(\frac{2}{p^{2}}+\frac{2}{n^{2}}+\frac{1}{2np}\right), the matrix \boldsymbol{W}=\boldsymbol{A} satisfies all 4 constraints of Alg. 2, so that the feasible region of the optimisation problem in Alg. 2 is non-empty. Let us first define the following sets:
G_{1}(n,p)=\left\{\boldsymbol{A}\in\mathbb{R}^{n\times p}:|\boldsymbol{A^{\top}A}/n-\boldsymbol{I_{p}}|_{\infty}\leq\mu_{1}\right\}, G_{2}(n,p)=\left\{\boldsymbol{A}\in\mathbb{R}^{n\times p}:\left|\frac{1}{p}(\boldsymbol{I_{n}}-\boldsymbol{AA^{\top}}/n)\boldsymbol{A}\right|_{\infty}\leq\mu_{2}\right\},
G_{3}(n,p)=\left\{\boldsymbol{A}\in\mathbb{R}^{n\times p}:\frac{n}{p\sqrt{1-\frac{n}{p}}}\left|\frac{\boldsymbol{AA^{\top}}}{n}-\frac{p}{n}\boldsymbol{I_{n}}\right|_{\infty}\leq\mu_{3}\right\}, G_{4}(n,p)=\left\{\boldsymbol{A}\in\mathbb{R}^{n\times p}:\boldsymbol{a_{.j}^{\top}a_{.j}}/n\leq 1\ \forall\ j\in[p]\right\}, where \mu_{1}=2\sqrt{\frac{2\log(p)}{n}}, \mu_{2}=2\sqrt{\frac{\log(2np)}{np}}+\frac{1}{n} and \mu_{3}=\frac{2}{\sqrt{1-n/p}}\sqrt{\frac{2\log(n)}{p}}. Here \boldsymbol{A} is an n\times p Rademacher matrix. We now state the probabilities of the aforementioned sets. From (149) of Lemma 5, we have

P(𝑨G1(n,p))12p2P\left(\boldsymbol{A}\in G_{1}(n,p)\right)\geq 1-\frac{2}{p^{2}} (124)

Again from (155) of Lemma 5, we have

P(𝑨G2(n,p))112npP\left(\boldsymbol{A}\in G_{2}(n,p)\right)\geq 1-\frac{1}{2np} (125)

Similarly from (157) of Lemma 5, we have

P(𝑨G3(n,p))12n2P\left(\boldsymbol{A}\in G_{3}(n,p)\right)\geq 1-\frac{2}{n^{2}} (126)

Lastly, since \boldsymbol{A} is a Rademacher matrix, each column satisfies \boldsymbol{a_{.j}^{\top}a_{.j}}/n=1, so that P(\boldsymbol{a_{.j}^{\top}a_{.j}}/n\leq 1\ \forall\ j\in[p])=1. Therefore, we have,

P(𝑨G4(n,p))=1P\left(\boldsymbol{A}\in G_{4}(n,p)\right)=1 (127)

Note that, P(𝑨{k=14Gk(n,p)}c)=P(𝑨{k=14(Gk(n,p))c})k=14P(𝑨(Gk(n,p))c)(2p2+2n2+12np)P\left(\boldsymbol{A}\in\{\cap_{k=1}^{4}G_{k}(n,p)\}^{c}\right)=P\left(\boldsymbol{A}\in\{\cup_{k=1}^{4}(G_{k}(n,p))^{c}\}\right)\leq\sum_{k=1}^{4}P\left(\boldsymbol{A}\in(G_{k}(n,p))^{c}\right)\leq\left(\frac{2}{p^{2}}+\frac{2}{n^{2}}+\frac{1}{2np}\right). Therefore, we obtain, P(𝑨{k=14Gk(n,p)}c)=1P(𝑨{k=14Gk(n,p)})(2p2+2n2+12np)P\left(\boldsymbol{A}\in\{\cap_{k=1}^{4}G_{k}(n,p)\}^{c}\right)=1-P\left(\boldsymbol{A}\in\{\cap_{k=1}^{4}G_{k}(n,p)\}\right)\leq\left(\frac{2}{p^{2}}+\frac{2}{n^{2}}+\frac{1}{2np}\right) . Hence,

P(𝑨{k=14Gk(n,p)})1(2p2+2n2+12np).P\left(\boldsymbol{A}\in\{\cap_{k=1}^{4}G_{k}(n,p)\}\right)\geq 1-\left(\frac{2}{p^{2}}+\frac{2}{n^{2}}+\frac{1}{2np}\right). (128)

Therefore, \boldsymbol{A} satisfies the constraints of Alg. 2 with high probability. This implies that there exists \boldsymbol{W^{*}} that satisfies the constraints. Let

E(n,p)=\Bigg\{\boldsymbol{A}:\exists\boldsymbol{W^{*}}\ \text{s.t.}\ |\boldsymbol{{W^{*}}^{\top}A}/n-\boldsymbol{I_{p}}|_{\infty}\leq\mu_{1},\ \left|\frac{1}{p}(\boldsymbol{I_{n}}-\boldsymbol{W^{*}A^{\top}}/n)\boldsymbol{A}\right|_{\infty}\leq\mu_{2},\ \frac{n}{p\sqrt{1-\frac{n}{p}}}\left|\frac{\boldsymbol{W^{*}A^{\top}}}{n}-\frac{p}{n}\boldsymbol{I_{n}}\right|_{\infty}\leq\mu_{3},\ \boldsymbol{{w^{*}}_{.j}^{\top}{w^{*}}_{.j}}/n\leq 1\ \forall\ j\in[p]\Bigg\}.

Hence, we have

P(𝑨E(n,p))P(𝑨k=14Gk(n,p))1(2p2+2n2+12np)P(\boldsymbol{A}\in E(n,p))\geq P(\boldsymbol{A}\in\cap_{k=1}^{4}G_{k}(n,p))\geq 1-\left(\frac{2}{p^{2}}+\frac{2}{n^{2}}+\frac{1}{2np}\right) (129)

Given that the set of feasible solutions is non-empty, the optimal solution of Alg. 2, denoted by \boldsymbol{W}, satisfies the constraints of Alg. 2 with probability 1.
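Before turning to the individual results, the feasibility argument above can be sanity-checked numerically. The sketch below (Python/NumPy) draws a Rademacher \boldsymbol{A} and verifies that \boldsymbol{W}=\boldsymbol{A} satisfies the four constraints defining E(n,p), with \mu_{1},\mu_{2},\mu_{3} as defined above; the values of n and p are illustrative only.

```python
import numpy as np

# Sanity check of the feasibility argument: W = A satisfies the four constraints
# defining E(n, p) above, with mu_1, mu_2, mu_3 as in this proof.
# The sizes n and p below are illustrative only.
rng = np.random.default_rng(2)
n, p = 100, 2000
A = rng.choice([-1.0, 1.0], size=(n, p))
mu1 = 2 * np.sqrt(2 * np.log(p) / n)
mu2 = 2 * np.sqrt(np.log(2 * n * p) / (n * p)) + 1 / n
mu3 = 2 / np.sqrt(1 - n / p) * np.sqrt(2 * np.log(n) / p)

c0 = np.allclose((A * A).sum(axis=0) / n, 1.0)     # column norms: a_.j^T a_.j / n = 1
c1 = np.abs(A.T @ A / n - np.eye(p)).max() <= mu1
c2 = np.abs((np.eye(n) - A @ A.T / n) @ A / p).max() <= mu2
c3 = (n / (p * np.sqrt(1 - n / p))) * np.abs(A @ A.T / n - (p / n) * np.eye(n)).max() <= mu3
print(c0, c1, c2, c3)   # typically all True, in line with (128)
```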
Result (1): Recall that the event that there exists a point satisfying constraints 𝖢𝟢\mathsf{C0}𝖢𝟥\mathsf{C3} is E(n,p)E(n,p). We have

P(𝒘.𝒋𝒘.𝒋/n1j[p])\displaystyle P\left(\boldsymbol{w_{.j}^{\top}w_{.j}}/n\leq 1\ \forall j\in[p]\right) =\displaystyle= P(𝒘.𝒋𝒘.𝒋/n1j[p]|𝑨E(n,p))P(𝑨E(n,p))\displaystyle P\left(\boldsymbol{w_{.j}^{\top}w_{.j}}/n\leq 1\ \forall j\in[p]\bigg{|}\boldsymbol{A}\in E(n,p)\right)P\left(\boldsymbol{A}\in E(n,p)\right) (130)
+\displaystyle+ P(𝒘.𝒋𝒘.𝒋/n1j[p]|𝑨E(n,p)c)P(𝑨E(n,p)c)\displaystyle P\left(\boldsymbol{w_{.j}^{\top}w_{.j}}/n\leq 1\ \forall j\in[p]\bigg{|}\boldsymbol{A}\in E(n,p)^{c}\right)P\left(\boldsymbol{A}\in E(n,p)^{c}\right)

If there exists a feasible solution to 𝖢𝟢\mathsf{C0}𝖢𝟥\mathsf{C3} then 𝒘.j𝒘.j/n1\boldsymbol{w}_{.j}^{\top}\boldsymbol{w}_{.j}/n\leq 1. Therefore, we have

P(𝒘.𝒋𝒘.𝒋/n1j[p]|𝑨E(n,p))=1.P\left(\boldsymbol{w_{.j}^{\top}w_{.j}}/n\leq 1\ \forall j\in[p]\bigg{|}\boldsymbol{A}\in E(n,p)\right)=1. (131)

Now, from Alg. 2, if the constraints of the optimisation problem cannot be satisfied, then we choose \boldsymbol{W}=\boldsymbol{A} as the output. This event is given by \boldsymbol{A}\in E(n,p)^{c}. We know that for a Rademacher matrix \boldsymbol{A}, \boldsymbol{a_{.j}^{\top}a_{.j}}/n=1 with probability 1. Therefore, we have

P(𝒘.𝒋𝒘.𝒋/n1j[p]|𝑨E(n,p)c)=1.P\left(\boldsymbol{w_{.j}^{\top}w_{.j}}/n\leq 1\ \forall j\in[p]\bigg{|}\boldsymbol{A}\in E(n,p)^{c}\right)=1. (132)

Therefore, we have from (131),(132) and (130),

P(𝒘.𝒋𝒘.𝒋/n1j[p])=P(𝑨E(n,p))+P(𝑨E(n,p)c)=1.P\left(\boldsymbol{w_{.j}^{\top}w_{.j}}/n\leq 1\ \forall j\in[p]\right)=P\left(\boldsymbol{A}\in E(n,p)\right)+P\left(\boldsymbol{A}\in E(n,p)^{c}\right)=1. (133)

Result (2): Recall that μ1=22log(p)n\mu_{1}=2\sqrt{\frac{2\log(p)}{n}}. Note that we have for any two events F1,F2F_{1},F_{2}, P(F1)=P(F1F2)+P(F1F2c)P(F1F2)+P(F2c)P(F_{1})=P(F_{1}\cap F_{2})+P(F_{1}\cap F_{2}^{c})\leq P(F_{1}\cap F_{2})+P(F_{2}^{c}). Therefore, we have,

P(|𝑾𝑨/n𝑰𝒑|μ1)\displaystyle P\left(|\boldsymbol{W^{\top}A}/n-\boldsymbol{I_{p}}|_{\infty}\geq\mu_{1}\right) \displaystyle\leq P({|𝑾𝑨/n𝑰𝒑|μ1}E(n,p))+P({E(n,p)}c)\displaystyle P\left(\{|\boldsymbol{W^{\top}A}/n-\boldsymbol{I_{p}}|_{\infty}\geq\mu_{1}\}\cap E(n,p)\right)+P\left(\{E(n,p)\}^{c}\right)
\displaystyle\leq P({|𝑾𝑨/n𝑰𝒑|μ1}E(n,p))+(2p2+2n2+12np)\displaystyle P\left(\{|\boldsymbol{W^{\top}A}/n-\boldsymbol{I_{p}}|_{\infty}\geq\mu_{1}\}\cap E(n,p)\right)+\left(\frac{2}{p^{2}}+\frac{2}{n^{2}}+\frac{1}{2np}\right)

The last inequality comes from (129). Since 𝑾\boldsymbol{W} is a feasible solution, given 𝑨E(n,p)\boldsymbol{A}\in E(n,p), it will satisfy the second constraint of Alg. 2 with probability 1. This means that

P({|𝑾𝑨/n𝑰𝒑|μ1}E(n,p))=0.P\left(\{|\boldsymbol{W^{\top}A}/n-\boldsymbol{I_{p}}|_{\infty}\geq\mu_{1}\}\cap E(n,p)\right)=0.

Therefore we have,

P(|𝑾𝑨/n𝑰𝒑|μ1)1(2p2+2n2+12np).P\left(|\boldsymbol{W^{\top}A}/n-\boldsymbol{I_{p}}|_{\infty}\leq\mu_{1}\right)\geq 1-\left(\frac{2}{p^{2}}+\frac{2}{n^{2}}+\frac{1}{2np}\right). (134)

Since μ1=22log(p)n\mu_{1}=2\sqrt{\frac{2\log(p)}{n}}, we have, |𝑾𝑨/n𝑰𝒑|=OP(log(p)/n)|\boldsymbol{W^{\top}A}/n-\boldsymbol{I_{p}}|_{\infty}=O_{P}(\sqrt{\log(p)/n}).
Result (3): From (133), we have that, for each j\in[p], \|\boldsymbol{w_{.j}}\|_{2}\leq\sqrt{n} with probability 1. Since \|\boldsymbol{x}\|_{\infty}\leq\|\boldsymbol{x}\|_{2} for any vector \boldsymbol{x}, we have \|\boldsymbol{w_{.j}}\|_{\infty}\leq\sqrt{n} with probability 1 for every j\in[p]. As |\boldsymbol{W}^{\top}|_{\infty}\leq\underset{j\in[p]}{\max}\|\boldsymbol{w_{.j}}\|_{\infty}\leq\sqrt{n} with probability 1, we have

|1n𝑾|=O(1).\left|\frac{1}{\sqrt{n}}\boldsymbol{W}^{\top}\right|_{\infty}=O(1). (135)

Result (4): Recall that μ2=1n+2log(2np)np\mu_{2}=\frac{1}{n}+2\sqrt{\frac{\log({2np})}{np}}. Therefore, we have

P(|1p(𝑰𝒏1n𝑾𝑨)𝑨|μ2)\displaystyle P\left(\left|\frac{1}{p}\left(\boldsymbol{I_{n}}-\frac{1}{n}\boldsymbol{W}\boldsymbol{A}^{\top}\right)\boldsymbol{A}\right|_{\infty}\geq\mu_{2}\right) \displaystyle\leq P({|1p(𝑰𝒏1n𝑾𝑨)𝑨|μ2}E(n,p))+P({E(n,p)}c)\displaystyle P\left(\left\{\left|\frac{1}{p}\left(\boldsymbol{I_{n}}-\frac{1}{n}\boldsymbol{W}\boldsymbol{A}^{\top}\right)\boldsymbol{A}\right|_{\infty}\geq\mu_{2}\right\}\cap E(n,p)\right)+P\left(\{E(n,p)\}^{c}\right)
\displaystyle\leq P({|1p(𝑰𝒏1n𝑾𝑨)𝑨|μ2}E(n,p))+(2p2+2n2+12np)\displaystyle P\left(\left\{\left|\frac{1}{p}\left(\boldsymbol{I_{n}}-\frac{1}{n}\boldsymbol{W}\boldsymbol{A}^{\top}\right)\boldsymbol{A}\right|_{\infty}\geq\mu_{2}\right\}\cap E(n,p)\right)+\left(\frac{2}{p^{2}}+\frac{2}{n^{2}}+\frac{1}{2np}\right)

The last inequality comes from (129). Note that, since 𝑾\boldsymbol{W} is a feasible solution, given 𝑨E(n,p)\boldsymbol{A}\in E(n,p), it will satisfy the third constraint of Alg. 2 with probability 1. This implies

P({|1p(𝑰𝒏1n𝑾𝑨)𝑨|μ2}E(n,p))=0.P\left(\left\{\left|\frac{1}{p}\left(\boldsymbol{I_{n}}-\frac{1}{n}\boldsymbol{W}\boldsymbol{A}^{\top}\right)\boldsymbol{A}\right|_{\infty}\geq\mu_{2}\right\}\cap E(n,p)\right)=0.

Therefore, we have,

P(|1p(𝑰𝒏1n𝑾𝑨)𝑨|μ2)1(2p2+2n2+12np)P\left(\left|\frac{1}{p}\left(\boldsymbol{I_{n}}-\frac{1}{n}\boldsymbol{W}\boldsymbol{A}^{\top}\right)\boldsymbol{A}\right|_{\infty}\leq\mu_{2}\right)\geq 1-\left(\frac{2}{p^{2}}+\frac{2}{n^{2}}+\frac{1}{2np}\right) (136)

Hence, we have, |1p(1n𝑾𝑨𝑰𝒏)𝑨|\left|\frac{1}{p}\left(\frac{1}{n}\boldsymbol{W}\boldsymbol{A}^{\top}-\boldsymbol{I_{n}}\right)\boldsymbol{A}\right|_{\infty}= OP(log(pn)pn+1/n)O_{P}\left(\sqrt{\frac{\log(pn)}{pn}}+1/n\right).
Result (5): From (136), we have with probability at least 1-\left(\frac{2}{p^{2}}+\frac{2}{n^{2}}+\frac{1}{2np}\right),

|1p(𝑾𝑨/n𝑰𝒏)𝑨|μ2.\left|\frac{1}{p}(\boldsymbol{WA^{\top}}/n-\boldsymbol{I_{n}})\boldsymbol{A}\right|_{\infty}\leq\mu_{2}.

Applying the triangle inequality, we have with probability at least 1-\left(\frac{2}{p^{2}}+\frac{2}{n^{2}}+\frac{1}{2np}\right),

|1np𝑾𝑨𝑨||1p(𝑾𝑨/n𝑰𝒏)𝑨|+1p|𝑨|μ2+1/p.\left|\frac{1}{np}\boldsymbol{WA}^{\top}\boldsymbol{A}\right|_{\infty}\leq\left|\frac{1}{p}(\boldsymbol{WA^{\top}}/n-\boldsymbol{I_{n}})\boldsymbol{A}\right|_{\infty}+\frac{1}{p}|\boldsymbol{A}|_{\infty}\leq\mu_{2}+1/p. (137)

We now present some useful results about the norms used in this proof. Let \boldsymbol{X} be an n\times p matrix and \boldsymbol{Y} be a p\times n matrix. Recall the following definitions from [48],

𝒀max𝒙n{𝟎}𝒀𝒙𝒙and𝒀22max𝒙n{𝟎}𝒀𝒙2𝒙2=σmax(𝒀).\displaystyle\|\boldsymbol{Y}\|_{\infty\rightarrow\infty}\triangleq\max_{\boldsymbol{x}\in\mathbb{R}^{n}\setminus\{\boldsymbol{0}\}}\frac{\|\boldsymbol{Yx}\|_{\infty}}{\|\boldsymbol{x}\|_{\infty}}\quad\textup{and}\ \|\boldsymbol{Y}\|_{2\rightarrow 2}\triangleq\max_{\boldsymbol{x}\in\mathbb{R}^{n}\setminus\{\boldsymbol{0}\}}\frac{\|\boldsymbol{Yx}\|_{2}}{\|\boldsymbol{x}\|_{2}}=\sigma_{max}(\boldsymbol{Y}).

Note that |\boldsymbol{XY}|_{\infty}\leq\|\boldsymbol{X}\|_{\infty\rightarrow\infty}|\boldsymbol{Y}|_{\infty}; this is because |\boldsymbol{XY}|_{\infty}=\max_{i}\|\boldsymbol{X}\boldsymbol{Y_{.i}}\|_{\infty}\leq\max_{i}\|\boldsymbol{X}\|_{\infty\rightarrow\infty}\|\boldsymbol{Y_{.i}}\|_{\infty} (by the definition of the induced norm) =\|\boldsymbol{X}\|_{\infty\rightarrow\infty}\max_{i}\|\boldsymbol{Y_{.i}}\|_{\infty}=\|\boldsymbol{X}\|_{\infty\rightarrow\infty}|\boldsymbol{Y}|_{\infty}. Since \frac{1}{\sqrt{p}}\|\boldsymbol{x}\|_{2}\leq\|\boldsymbol{x}\|_{\infty} and \|\boldsymbol{Y}^{\top}\boldsymbol{x}\|_{\infty}\leq\|\boldsymbol{Y}^{\top}\boldsymbol{x}\|_{2} for all \boldsymbol{x}\in\mathbb{R}^{p}, we have

𝒀p𝒀22.\|\boldsymbol{Y^{\top}}\|_{\infty\rightarrow\infty}\leq\sqrt{p}\|\boldsymbol{Y}^{\top}\|_{2\rightarrow 2}. (138)

Then by using (138), we have

|𝑿𝒀|=|𝒀𝑿|\displaystyle|\boldsymbol{XY}|_{\infty}=|\boldsymbol{Y}^{\top}\boldsymbol{X}^{\top}|_{\infty} 𝒀|𝑿|p𝒀22|𝑿|=pσmax(𝒀)|𝑿|.\displaystyle\leq\|\boldsymbol{Y}^{\top}\|_{\infty\rightarrow\infty}|\boldsymbol{X}^{\top}|_{\infty}\leq\sqrt{p}\|\boldsymbol{Y}^{\top}\|_{2\rightarrow 2}|\boldsymbol{X}^{\top}|_{\infty}=\sqrt{p}\sigma_{max}(\boldsymbol{Y})|\boldsymbol{X}|_{\infty}. (139)

Substituting 𝑿=1np𝑾𝑨𝑨\boldsymbol{X}=\frac{1}{np}\boldsymbol{W}\boldsymbol{A}^{\top}\boldsymbol{A} and 𝒀=𝑨𝑨(𝑨𝑨)1\boldsymbol{Y=A^{\dagger}}\triangleq\boldsymbol{A}^{\top}(\boldsymbol{AA}^{\top})^{-1}, the Moore-Penrose pseudo-inverse of 𝑨\boldsymbol{A}, in (139), we obtain:

|1np𝑾𝑨|=|1np𝑾𝑨𝑨𝑨|pnp|𝑾𝑨𝑨|𝑨22.\left|\frac{1}{np}\boldsymbol{WA^{\top}}\right|_{\infty}=\left|\frac{1}{np}\boldsymbol{WA}^{\top}\boldsymbol{AA}^{\dagger}\right|_{\infty}\leq\frac{\sqrt{p}}{np}|\boldsymbol{WA}^{\top}\boldsymbol{A}|_{\infty}\|\boldsymbol{A^{\dagger}}\|_{2\rightarrow 2}. (140)

We now derive the upper bound for the second factor of (140). We have,

𝑨22=σmax(𝑨)=1σmin(𝑨)=1σmin(𝑨).\|\boldsymbol{A^{\dagger}}\|_{2\rightarrow 2}=\sigma_{max}(\boldsymbol{A^{\dagger}})=\frac{1}{\sigma_{min}(\boldsymbol{A})}=\frac{1}{\sigma_{min}(\boldsymbol{A}^{\top})}. (141)

Note that, for an arbitrary ϵ1>0\epsilon_{1}>0, using Theorem 1.1. from [37] for the mean-zero sub-Gaussian random matrix 𝑨\boldsymbol{A}, we have the following

P(σmin(𝑨)ϵ1(pn1))(c6ϵ1)pn+1+(c5)p.P(\sigma_{min}(\boldsymbol{A}^{\top})\leq\epsilon_{1}(\sqrt{p}-\sqrt{n-1}))\leq(c_{6}\epsilon_{1})^{p-n+1}+(c_{5})^{p}. (142)

where c_{6}>0 and c_{5}\in(0,1) are constants dependent on the sub-Gaussian norm of the entries of \boldsymbol{A}^{\top}. For some small constant \psi\in(0,1), set \epsilon_{1}\triangleq\psi/c_{6}. Then we have

P(σmax(𝑨)c6ψ(pn1))1((ψ)pn+1+(c5)p).P\left(\sigma_{max}(\boldsymbol{A^{\dagger}})\leq\frac{c_{6}}{\psi(\sqrt{p}-\sqrt{n-1})}\right)\geq 1-((\psi)^{p-n+1}+(c_{5})^{p}). (143)

Equivalently, since \sqrt{p}-\sqrt{n-1}=\sqrt{p}\left(1-\frac{\sqrt{n-1}}{\sqrt{p}}\right), we have

P(σmax(𝑨)c6ψp(1n1p))1((ψ)pn+1+(c5)p).\displaystyle P\left(\sigma_{max}(\boldsymbol{A^{\dagger}})\leq\frac{c_{6}}{\psi\sqrt{p}\left(1-\frac{\sqrt{n-1}}{\sqrt{p}}\right)}\right)\geq 1-((\psi)^{p-n+1}+(c_{5})^{p}). (144)

Using Eqns. (144) and (137), we have

P\left[\sqrt{p}\frac{1}{np}|\boldsymbol{WA}^{\top}\boldsymbol{A}|_{\infty}\|\boldsymbol{A^{\dagger}}\|_{2\rightarrow 2}\leq\left(\mu_{2}+\frac{1}{p}\right)\frac{c_{6}}{\psi(1-\sqrt{n/p})}\right]\geq 1-\left(\left(\frac{2}{p^{2}}+\frac{2}{n^{2}}+\frac{1}{2np}\right)+(\psi)^{p-n+1}+(c_{5})^{p}\right).

Therefore we have by Bonferroni’s inequality,

P(|1np𝑾𝑨|(μ2+1p)c6ψ(1n/p))1((2p2+2n2+12np)+(ψ)pn+1+(c5)p).P\left(\left|\frac{1}{np}\boldsymbol{WA}^{\top}\right|_{\infty}\leq\left(\mu_{2}+\frac{1}{p}\right)\frac{c_{6}}{\psi(1-\sqrt{n/p})}\right)\geq 1-\left(\left(\frac{2}{p^{2}}+\frac{2}{n^{2}}+\frac{1}{2np}\right)+(\psi)^{p-n+1}+(c_{5})^{p}\right).

Under the condition n=o(p)n=o(p) as n,pn,p\to\infty, the probability 1(2p2+2n2+12np+(ψ)pn+1+(c5)p)11-\left(\frac{2}{p^{2}}+\frac{2}{n^{2}}+\frac{1}{2np}+(\psi)^{p-n+1}+(c_{5})^{p}\right)\to 1. Therefore, we have,

|1p𝑾𝑨|=OP(n(log(np)np+1n+1p))=OP(nlog(np)p+1+np)=OP(nlog(np)p+1).\left|\frac{1}{p}\boldsymbol{WA}^{\top}\right|_{\infty}=O_{P}\left(n\left(\sqrt{\frac{\log(np)}{np}}+\frac{1}{n}+\frac{1}{p}\right)\right)=O_{P}\left(\sqrt{\frac{n\log(np)}{p}}+1+\frac{n}{p}\right)=O_{P}\left(\sqrt{\frac{n\log(np)}{p}}+1\right). (145)

Result (6): Recall that μ3=21n/p2log(n)p\mu_{3}=\frac{2}{\sqrt{1-n/p}}\sqrt{\frac{2\log(n)}{p}}. We have,

P(np1np|(𝑾𝑨npn𝑰𝒏)|μ3)\displaystyle P\left(\frac{n}{p\sqrt{1-\frac{n}{p}}}\left|\left(\frac{\boldsymbol{WA^{\top}}}{n}-\frac{p}{n}\boldsymbol{I_{n}}\right)\right|_{\infty}\geq\mu_{3}\right)
\displaystyle\leq P({np1np|(𝑾𝑨npn𝑰𝒏)|μ3}E(n,p))+P({E(n,p)}c)\displaystyle P\left(\left\{\frac{n}{p\sqrt{1-\frac{n}{p}}}\left|\left(\frac{\boldsymbol{WA^{\top}}}{n}-\frac{p}{n}\boldsymbol{I_{n}}\right)\right|_{\infty}\geq\mu_{3}\right\}\cap E(n,p)\right)+P\left(\{E(n,p)\}^{c}\right)
\displaystyle\leq P({np1np|(𝑾𝑨npn𝑰𝒏)|μ3}E(n,p))+(2p2+2n2+12np)\displaystyle P\left(\left\{\frac{n}{p\sqrt{1-\frac{n}{p}}}\left|\left(\frac{\boldsymbol{WA^{\top}}}{n}-\frac{p}{n}\boldsymbol{I_{n}}\right)\right|_{\infty}\geq\mu_{3}\right\}\cap E(n,p)\right)+\left(\frac{2}{p^{2}}+\frac{2}{n^{2}}+\frac{1}{2np}\right)

The last inequality comes from (129). Note that, since \boldsymbol{W} is a feasible solution, given \boldsymbol{A}\in E(n,p), it will satisfy the fourth constraint of Alg. 2 with probability 1. This implies that:

P({np1np|(𝑾𝑨npn𝑰𝒏)|μ3}E(n,p))=0,P\left(\left\{\frac{n}{p\sqrt{1-\frac{n}{p}}}\left|\left(\frac{\boldsymbol{WA^{\top}}}{n}-\frac{p}{n}\boldsymbol{I_{n}}\right)\right|_{\infty}\geq\mu_{3}\right\}\cap E(n,p)\right)=0,

which yields:

P(np1np|(𝑾𝑨npn𝑰𝒏)|μ3)1(2p2+2n2+12np).P\left(\frac{n}{p\sqrt{1-\frac{n}{p}}}\left|\left(\frac{\boldsymbol{WA^{\top}}}{n}-\frac{p}{n}\boldsymbol{I_{n}}\right)\right|_{\infty}\leq\mu_{3}\right)\geq 1-\left(\frac{2}{p^{2}}+\frac{2}{n^{2}}+\frac{1}{2np}\right). (146)

Since 1-\left(\frac{2}{p^{2}}+\frac{2}{n^{2}}+\frac{1}{2np}\right) goes to 1 as n,p\to\infty, the proof is complete. \blacksquare

Appendix C Lemma on properties of 𝑨\boldsymbol{A}

Lemma 5

Let \boldsymbol{A} be an n\times p random Rademacher matrix. Then the following holds:

1. \left|\frac{1}{n}\boldsymbol{A^{\top}}\boldsymbol{A}-\boldsymbol{I_{p}}\right|_{\infty}=O_{P}\left(\sqrt{\frac{\log(p)}{n}}\right).

2. \left|\frac{1}{p}\left(\frac{1}{n}\boldsymbol{A}\boldsymbol{A}^{\top}-\boldsymbol{I_{n}}\right)\boldsymbol{A}\right|_{\infty}=O_{P}\left(\sqrt{\frac{\log(pn)}{pn}}+\frac{1}{n}\right).

3. If n<p, then \frac{n}{p\sqrt{1-\frac{n}{p}}}\left|\left(\frac{\boldsymbol{AA^{\top}}}{n}-\frac{p}{n}\boldsymbol{I_{n}}\right)\right|_{\infty}=O_{P}\left(\frac{1}{\sqrt{1-n/p}}\sqrt{\frac{\log(n)}{p}}\right).

Proof of Lemma 5, [Result (1)]: Let \boldsymbol{V}\triangleq\boldsymbol{A^{\top}A}/n-\boldsymbol{I_{p}}. Note that the elements of \boldsymbol{V} satisfy the following:

vlj={k=1n(akl)2n1=0if j=l,l[p]k=1naklakjnif jl,j,l[p]v_{lj}=\begin{cases}\sum_{k=1}^{n}\frac{(a_{kl})^{2}}{n}-1=0&\mbox{if }j=l,\ l\in[p]\\ \sum_{k=1}^{n}\frac{a_{kl}a_{kj}}{n}&\mbox{if }j\neq l,\ j,l\in[p]\end{cases} (147)

Therefore, we now consider the off-diagonal elements of \boldsymbol{V} (i.e., l\neq j) for the bound. Each summand of v_{lj} is uniformly bounded, -\frac{1}{n}\leq\frac{a_{kl}a_{kj}}{n}\leq\frac{1}{n}, since the elements of \boldsymbol{A} are \pm 1. Note that E\left[\frac{a_{kl}a_{kj}}{n}\right]=0 for all k\in[n],\ l\neq j\in[p]. Furthermore, for l\neq j\in[p], the summands of v_{lj} are mutually independent since the elements of \boldsymbol{A} are independent. Therefore, using Hoeffding's inequality (see Lemma 1 of [34]), for t>0,

P(|vlj|t)=P(|k=1naklakjn|t)2ent22,lj[p].\displaystyle P(|v_{lj}|\geq t)=P\left(\left|\sum_{k=1}^{n}\frac{a_{kl}a_{kj}}{n}\right|\geq t\right)\leq 2e^{-\frac{nt^{2}}{2}},\ l\neq j\in[p].

Therefore we have

P\left(\max_{l\neq j\in[p]}|v_{lj}|\geq t\right)=P\left(\cup_{l\neq j\in[p]}\{|v_{lj}|\geq t\}\right)\leq\sum_{l\neq j\in[p]}P(|v_{lj}|\geq t)\leq 2p(p-1)e^{-\frac{nt^{2}}{2}}<2p^{2}e^{-\frac{nt^{2}}{2}}. (148)

Putting t=22logpnt=2\sqrt{\frac{2\log p}{n}} in (148), we obtain:

P\left(\max_{l\neq j\in[p]}|v_{lj}|\geq 2\sqrt{\frac{2\log p}{n}}\right)\leq 2p^{2}e^{-4\log(p)}=2p^{-2}. (149)

Thus, we have:

|𝑽|=OP(logpn).|\boldsymbol{V}|_{\infty}=O_{P}\left(\sqrt{\frac{\log p}{n}}\right). (150)

This completes the proof of Result (1).

Result (2): Note that,

1p(1n𝑨𝑨𝑰𝒏)𝑨=1p(1n𝑨𝑨𝑨𝑨)𝑽.\frac{1}{p}\left(\frac{1}{n}\boldsymbol{A}\boldsymbol{A}^{\top}-\boldsymbol{I_{n}}\right)\boldsymbol{A}=\frac{1}{p}\left(\frac{1}{n}\boldsymbol{A}\boldsymbol{A}^{\top}\boldsymbol{A}-\boldsymbol{A}\right)\triangleq\boldsymbol{V}. (151)

Fix i[n]i\in[n] and j[p]j\in[p], and consider

vij=aijp+1npl=1pk=1nailaklakjv_{ij}=-\frac{a_{ij}}{p}+\frac{1}{np}\sum_{l=1}^{p}\sum_{k=1}^{n}a_{il}a_{kl}a_{kj}

By splitting the sum over ll into the terms where ljl\neq j and the term where l=jl=j, and simplifying by using the fact that akj2=1a_{kj}^{2}=1 for all k,jk,j, we obtain

vij\displaystyle v_{ij} =aijp+1np(l=1ljpk=1nailaklakj)+1np(k=1naijakj2)\displaystyle=-\frac{a_{ij}}{p}+\frac{1}{np}\left(\sum_{\begin{subarray}{c}l=1\\ l\neq j\end{subarray}}^{p}\sum_{k=1}^{n}a_{il}a_{kl}a_{kj}\right)+\frac{1}{np}\left(\sum_{k=1}^{n}a_{ij}a_{kj}^{2}\right)
=aijp+1np(l=1ljpk=1nailaklakj)+aijp1n(k=1n1)\displaystyle=-\frac{a_{ij}}{p}+\frac{1}{np}\left(\sum_{\begin{subarray}{c}l=1\\ l\neq j\end{subarray}}^{p}\sum_{k=1}^{n}a_{il}a_{kl}a_{kj}\right)+\frac{a_{ij}}{p}\frac{1}{n}\left(\sum_{k=1}^{n}1\right)
=1np(l=1ljpk=1nailaklakj).\displaystyle=\frac{1}{np}\left(\sum_{\begin{subarray}{c}l=1\\ l\neq j\end{subarray}}^{p}\sum_{k=1}^{n}a_{il}a_{kl}a_{kj}\right).

Next we split the sum over kk into the terms where kik\neq i and the term where k=ik=i to obtain

vij=1np(l=1ljpk=1nailaklakj)=1np(l=1ljpk=1kinailaklakj)+1npl=1ljpail2aij=1np(l=1ljpk=1kinailaklakj)+p1npaij.v_{ij}=\frac{1}{np}\left(\sum_{\begin{subarray}{c}l=1\\ l\neq j\end{subarray}}^{p}\sum_{k=1}^{n}a_{il}a_{kl}a_{kj}\right)=\frac{1}{np}\left(\sum_{\begin{subarray}{c}l=1\\ l\neq j\end{subarray}}^{p}\sum_{\begin{subarray}{c}k=1\\ k\neq i\end{subarray}}^{n}a_{il}a_{kl}a_{kj}\right)+\frac{1}{np}\sum_{\begin{subarray}{c}l=1\\ l\neq j\end{subarray}}^{p}a_{il}^{2}a_{ij}=\frac{1}{np}\left(\sum_{\begin{subarray}{c}l=1\\ l\neq j\end{subarray}}^{p}\sum_{\begin{subarray}{c}k=1\\ k\neq i\end{subarray}}^{n}a_{il}a_{kl}a_{kj}\right)+\frac{p-1}{np}a_{ij}. (152)

If we condition on 𝒂i.\boldsymbol{a}_{i.} and 𝒂.j\boldsymbol{a}_{.j}, the (n1)(p1)(n-1)(p-1) random variables 1npailaklakj\frac{1}{np}a_{il}a_{kl}a_{kj} for k[n]{i}k\in[n]\setminus\{i\} and l[p]{j}l\in[p]\setminus\{j\} are independent, have mean zero, and are bounded between 1np-\frac{1}{np} and 1np\frac{1}{np}. Therefore, by Hoeffding’s inequality, for t>0t>0, we have

P(|1npl=1ljpk=1kinailaklakj|t|𝒂i.,𝒂.j)2en2p2t22(n1)(p1)2enpt2/2.P\left(\left.\left|\frac{1}{np}\sum_{\begin{subarray}{c}l=1\\ l\neq j\end{subarray}}^{p}\sum_{\begin{subarray}{c}k=1\\ k\neq i\end{subarray}}^{n}a_{il}a_{kl}a_{kj}\right|\geq t\;\right|\;\boldsymbol{a}_{i.},\boldsymbol{a}_{.j}\right)\leq 2e^{-\frac{n^{2}p^{2}t^{2}}{2(n-1)(p-1)}}\leq 2e^{-npt^{2}/2}. (153)

Since the RHS of (153) does not depend on \boldsymbol{a}_{i.} and \boldsymbol{a}_{.j}, the bound also holds for the unconditional probability, i.e., we have

P(|1npl=1ljpk=1kinailaklakj|t)2en2p2t22(n1)(p1)2enpt2/2.P\left(\left|\frac{1}{np}\sum_{\begin{subarray}{c}l=1\\ l\neq j\end{subarray}}^{p}\sum_{\begin{subarray}{c}k=1\\ k\neq i\end{subarray}}^{n}a_{il}a_{kl}a_{kj}\right|\geq t\right)\leq 2e^{-\frac{n^{2}p^{2}t^{2}}{2(n-1)(p-1)}}\leq 2e^{-npt^{2}/2}. (154)

Since aija_{ij} is Rademacher, p1np|aij|<1n\frac{p-1}{np}|a_{ij}|<\frac{1}{n} with probability 11. Choosing t=2log(2np)npt=2\sqrt{\frac{\log(2np)}{np}} and using the triangle inequality, we have from (154),

P(|vij|1n+2log(2np)np)\displaystyle P\left(|v_{ij}|\geq\frac{1}{n}+2\sqrt{\frac{\log(2np)}{np}}\right) P(|1npl=1ljpk=1kinailaklakj|+p1np|aij|2log(2np)np+1n)\displaystyle\leq P\left(\left|\frac{1}{np}\sum_{\begin{subarray}{c}l=1\\ l\neq j\end{subarray}}^{p}\sum_{\begin{subarray}{c}k=1\\ k\neq i\end{subarray}}^{n}a_{il}a_{kl}a_{kj}\right|+\frac{p-1}{np}|a_{ij}|\geq 2\sqrt{\frac{\log(2np)}{np}}+\frac{1}{n}\right)
P(|1npl=1ljpk=1kinailaklakj|2log(2np)np)12n2p2.\displaystyle\leq P\left(\left|\frac{1}{np}\sum_{\begin{subarray}{c}l=1\\ l\neq j\end{subarray}}^{p}\sum_{\begin{subarray}{c}k=1\\ k\neq i\end{subarray}}^{n}a_{il}a_{kl}a_{kj}\right|\geq 2\sqrt{\frac{\log(2np)}{np}}\right)\leq\frac{1}{2n^{2}p^{2}}.

Then, by a union bound over npnp events,

P(maxi,j|vij|1n+2log(2np)np)np12n2p2=12np.P\left(\max_{i,j}|v_{ij}|\geq\frac{1}{n}+2\sqrt{\frac{\log(2np)}{np}}\right)\leq np\,\frac{1}{2n^{2}p^{2}}=\frac{1}{2np}. (155)

This completes the proof of Result (2).
Result (3): Reversing the roles of n and p in Result (1) of this lemma (see (147) and (149)), we have

P(|𝑨𝑨p𝑰𝒏|22log(n)p)12n2.P\left(\left|\frac{\boldsymbol{AA^{\top}}}{p}-\boldsymbol{I_{n}}\right|_{\infty}\leq 2\sqrt{\frac{2\log(n)}{p}}\right)\geq 1-\frac{2}{n^{2}}. (156)

Now, noting that \frac{n}{p}\left|\frac{\boldsymbol{AA^{\top}}}{n}-\frac{p}{n}\boldsymbol{I_{n}}\right|_{\infty}=\left|\frac{\boldsymbol{AA^{\top}}}{p}-\boldsymbol{I_{n}}\right|_{\infty} and multiplying by \frac{1}{\sqrt{1-n/p}}, we get

P(np1np|(𝑨𝑨npn𝑰𝒏)|21n/p2log(n)p)12n2.P\left(\frac{n}{p\sqrt{1-\frac{n}{p}}}\left|\left(\frac{\boldsymbol{AA^{\top}}}{n}-\frac{p}{n}\boldsymbol{I_{n}}\right)\right|_{\infty}\leq\frac{2}{\sqrt{1-n/p}}\sqrt{\frac{2\log(n)}{p}}\right)\geq 1-\frac{2}{n^{2}}. (157)

This completes the proof. \blacksquare
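The three rates in Lemma 5 can also be illustrated empirically. In the following minimal sketch (Python/NumPy), each sup-norm quantity is divided by its claimed rate; the resulting ratios stay of order one as n and p grow. The sizes are illustrative only, and the diagonal entries are zeroed directly because for a Rademacher matrix they equal their population values exactly.

```python
import numpy as np

# Empirical illustration of the rates in Lemma 5 for Rademacher matrices.
# Each quantity is divided by its claimed rate; the ratios stay of order one.
rng = np.random.default_rng(3)
for n, p in [(50, 1000), (100, 2000), (150, 3000)]:
    A = rng.choice([-1.0, 1.0], size=(n, p))

    G = A.T @ A / n                       # p x p Gram matrix; its diagonal equals 1 exactly
    np.fill_diagonal(G, 0.0)              # so A^T A / n - I_p has zero diagonal
    r1 = np.abs(G).max() / np.sqrt(np.log(p) / n)

    M = (A @ A.T / n - np.eye(n)) @ A / p
    r2 = np.abs(M).max() / (np.sqrt(np.log(p * n) / (p * n)) + 1 / n)

    H = A @ A.T / n                       # n x n; its diagonal equals p/n exactly
    np.fill_diagonal(H, 0.0)              # so A A^T / n - (p/n) I_n has zero diagonal
    r3 = (n / p) * np.abs(H).max() / np.sqrt(np.log(n) / p)   # the 1/sqrt(1-n/p) factors cancel

    print(n, p, round(r1, 2), round(r2, 2), round(r3, 2))
```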

Appendix D Some useful lemmas

Lemma 6

Let 𝐔\boldsymbol{U} and 𝐕\boldsymbol{V} be two n×pn\times p random matrices. Let ϑ\vartheta\in\mathbb{R} and 𝐰n\boldsymbol{w}\in\mathbb{R}^{n}. Then,

  1. $|\vartheta\boldsymbol{U}|_{\infty}=|\vartheta|\,|\boldsymbol{U}|_{\infty}$.

  2. $|\boldsymbol{U}+\boldsymbol{V}|_{\infty}\leq|\boldsymbol{U}|_{\infty}+|\boldsymbol{V}|_{\infty}$.

  3. If $|\boldsymbol{U}|_{\infty}=O_{P}(h_{1}(n,p))$ and $|\boldsymbol{V}|_{\infty}=O_{P}(h_{2}(n,p))$, then $|\boldsymbol{U}+\boldsymbol{V}|_{\infty}=O_{P}(\max\{h_{1}(n,p),h_{2}(n,p)\})$.

  4. $\|\boldsymbol{w^{\top}V}\|_{\infty}\leq|\boldsymbol{V}|_{\infty}\|\boldsymbol{w}\|_{1}$.

  5. If $|\boldsymbol{V}|_{\infty}=O_{P}(h_{1}(n,p))$ and $\|\boldsymbol{w}\|_{1}=O_{P}(h_{\boldsymbol{w}}(n,p))$, then $\|\boldsymbol{w^{\top}V}\|_{\infty}=O_{P}(h_{1}(n,p)\,h_{\boldsymbol{w}}(n,p))$. $\blacksquare$

Proof:
Part (1): We have, |ϑ𝑼|=maxi[n],j[p]|ϑuij|=maxi[n],j[p]|ϑ||uij|=|ϑ||𝑼||\vartheta\boldsymbol{U}|_{\infty}=\underset{i\in[n],j\in[p]}{\max}|\vartheta u_{ij}|=\underset{i\in[n],j\in[p]}{\max}|\vartheta||u_{ij}|=|\vartheta||\boldsymbol{U}|_{\infty}.

Part (2): We have, |𝑼+𝑽|=maxi[n],j[p]|uij+vij|maxi[n],j[p]{|uij|+|vij|}maxi[n],j[p]|uij|+maxi[n],j[p]|vij|=|𝑼|+|𝑽||\boldsymbol{U}+\boldsymbol{V}|_{\infty}=\underset{i\in[n],j\in[p]}{\max}|u_{ij}+v_{ij}|\leq\underset{i\in[n],j\in[p]}{\max}\{|u_{ij}|+|v_{ij}|\}\leq\underset{i\in[n],j\in[p]}{\max}|u_{ij}|+\underset{i\in[n],j\in[p]}{\max}|v_{ij}|=|\boldsymbol{U}|_{\infty}+|\boldsymbol{V}|_{\infty}.

Part (3): Suppose $|\boldsymbol{U}|_{\infty}=O_{P}(h_{1}(n,p))$ and $|\boldsymbol{V}|_{\infty}=O_{P}(h_{2}(n,p))$. Then, from Part (2),

$|\boldsymbol{U}+\boldsymbol{V}|_{\infty}\leq|\boldsymbol{U}|_{\infty}+|\boldsymbol{V}|_{\infty}=O_{P}(h_{1}(n,p))+O_{P}(h_{2}(n,p))=O_{P}(h_{1}(n,p)+h_{2}(n,p))=O_{P}(\max\{h_{1}(n,p),h_{2}(n,p)\})$, since $h_{1}(n,p)+h_{2}(n,p)\leq 2\max\{h_{1}(n,p),h_{2}(n,p)\}$.

Part (4): For any j[p]j\in[p], we have

|(𝒘𝑽)j|=|i=1nvijwi|i=1n|vij||wi|i=1nmaxj[p]|vij||wi|maxi[n],j[p]|vij|i=1n|wi|=|𝑽|𝒘1.|(\boldsymbol{w^{\top}V})_{j}|=|\sum_{i=1}^{n}{v}_{ij}w_{i}|\leq\sum_{i=1}^{n}|{v}_{ij}||w_{i}|\leq\sum_{i=1}^{n}\max_{j\in[p]}|{v}_{ij}||w_{i}|\leq\max_{i\in[n],j\in[p]}|{v}_{ij}|\sum_{i=1}^{n}|w_{i}|=|\boldsymbol{V}|_{\infty}\|\boldsymbol{w}\|_{1}.

Part (5): From Part (4), $\|\boldsymbol{w^{\top}V}\|_{\infty}\leq|\boldsymbol{V}|_{\infty}\|\boldsymbol{w}\|_{1}=O_{P}(h_{1}(n,p))\,O_{P}(h_{\boldsymbol{w}}(n,p))=O_{P}(h_{1}(n,p)\,h_{\boldsymbol{w}}(n,p))$. $\blacksquare$
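The inequalities in Lemma 6 are elementary but used repeatedly; the following small, self-contained Python snippet (with arbitrary dimensions and random draws, chosen only for illustration) checks Parts (2) and (4) numerically.

```python
# Illustrative check of Lemma 6, Parts (2) and (4); dimensions are arbitrary.
import numpy as np

rng = np.random.default_rng(2)
n, p = 30, 80
U = rng.standard_normal((n, p))
V = rng.standard_normal((n, p))
w = rng.standard_normal(n)

def max_norm(M):
    # Entrywise maximum absolute value, written |.|_inf in the paper.
    return np.max(np.abs(M))

# Part (2): triangle inequality for the entrywise max norm.
assert max_norm(U + V) <= max_norm(U) + max_norm(V)

# Part (4): ||w^T V||_inf <= |V|_inf * ||w||_1.
assert np.max(np.abs(w @ V)) <= max_norm(V) * np.sum(np.abs(w))

print("Lemma 6, Parts (2) and (4) hold on this random draw.")
```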

Lemma 7

Let XiX_{i}, for i=1,2,,ki=1,2,\ldots,k, be Gaussian random variables with mean 0 and variance σ2\sigma^{2}. Then, we have

P[maxi[k]|Xi|2σlog(k)]1/k.P\left[\max_{i\in[k]}|X_{i}|\geq 2\sigma\sqrt{\log(k)}\right]\leq 1/k. (158)

Note that Lemma 7 does not require independence. For a proof see, e.g., [46].
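Since Lemma 7 does not require independence, a quick way to build intuition is to simulate strongly correlated Gaussians and compare the empirical tail probability of the maximum with the $1/k$ bound. The sketch below uses an equicorrelated covariance purely for illustration; the correlation level, $k$, $\sigma$, and the number of trials are arbitrary choices.

```python
# Illustrative check of Lemma 7 with correlated zero-mean Gaussians.
import numpy as np

rng = np.random.default_rng(3)
k, sigma, trials = 200, 1.5, 20000                   # arbitrary illustrative values
# Equicorrelated covariance: variance sigma^2, pairwise correlation 0.5.
cov = sigma**2 * (0.5 * np.ones((k, k)) + 0.5 * np.eye(k))
X = rng.multivariate_normal(np.zeros(k), cov, size=trials)

t = 2 * sigma * np.sqrt(np.log(k))                   # threshold from (158)
rate = np.mean(np.max(np.abs(X), axis=1) >= t)
print(f"empirical P(max_i |X_i| >= 2*sigma*sqrt(log k)) = {rate:.4f}, bound 1/k = {1 / k:.4f}")
```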

References

  • [1] List of countries implementing pool testing strategy against COVID-19. https://en.wikipedia.org/wiki/List_of_countries_implementing_pool_testing_strategy_against_COVID-19. Last retrieved October 2021.
  • [2] M. Aldridge and D. Ellis. Pooled Testing and Its Applications in the COVID-19 Pandemic, pages 217–249. 2022.
  • [3] M. Aldridge, O. Johnson, and J. Scarlett. Group testing: An information theory perspective. Found. Trends Commun. Inf. Theory, 15:196–392, 2019.
  • [4] A. Aldroubi, X. Chen, and A.M. Powell. Perturbations of measurement matrices and dictionaries in compressed sensing. Appl. Comput. Harmon. Anal., 33(2), 2012.
  • [5] G. Atia and V. Saligrama. Boolean compressed sensing and noisy group testing. IEEE Trans. Inf. Theory, 58(3), 2012.
  • [6] S. H. Bharadwaja and C. R. Murthy. Recovery algorithms for pooled RT-qPCR based COVID-19 screening. IEEE Trans. Signal Process., 70:4353–4368, 2022.
  • [7] T.T. Cai and Z. Guo. Confidence intervals for high-dimensional linear regression: Minimax rates and adaptivity. Ann. Stat., 45(2), 2017.
  • [8] Emmanuel J. Candès, Justin K. Romberg, and Terence Tao. Stable signal recovery from incomplete and inaccurate measurements. Comm. Pure Appl. Math., 59(8):1207–1223, 2006.
  • [9] G. Casella and R. Berger. Statistical Inference. Thomson Learning, 2002.
  • [10] C. L. Chan, P. H. Che, S. Jaggi, and V. Saligrama. Non-adaptive probabilistic group testing with noisy measurements: Near-optimal bounds with efficient algorithms. In ACCC, pages 1832–1839, 2011.
  • [11] M. Cheraghchi, A. Hormati, A. Karbasi, and M. Vetterli. Group testing with probabilistic tests: Theory, design and application. IEEE Transactions on Information Theory, 57(10), 2011.
  • [12] A. Christoff et al. Swab pooling: A new method for large-scale RT-qPCR screening of SARS-CoV-2 avoiding sample dilution. PLOS ONE, 16(2):1–12, 02 2021.
  • [13] S. Comess, H. Wang, S. Holmes, and C. Donnat. Statistical modeling for practical pooled testing during the COVID-19 pandemic. Statistical Science, 37(2):229–250, 2022.
  • [14] R. Dorfman. The detection of defective members of large populations. Ann. Math. Stat., 14(4):436–440, 1943.
  • [15] E. Fenichel, R. Koch, A. Gilbert, G. Gonsalves, and A. Wyllie. Understanding the barriers to pooled SARS-CoV-2 testing in the United States. Microbiology Spectrum, 2021.
  • [16] M. Fischler and R. Bolles. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6), 1981.
  • [17] D. Forsyth and J. Ponce. Computer Vision: A Modern Approach. Pearson, 2012.
  • [18] S. Fosson, V. Cerone, and D. Regruto. Sparse linear regression from perturbed data. Automatica, 122, 2020.
  • [19] Sabyasachi Ghosh, Rishi Agarwal, Mohammad Ali Rehan, Shreya Pathak, Pratyush Agarwal, Yash Gupta, Sarthak Consul, Nimay Gupta, Ritesh Goenka, Ajit Rajwade, and Manoj Gopalkrishnan. A compressed sensing approach to pooled RT-PCR testing for COVID-19 detection. IEEE Open Journal of Signal Processing, 2:248–264, 2021.
  • [20] A. C. Gilbert, M. A. Iwen, and M. J. Strauss. Group testing and sparse signal recovery. In 42nd Asilomar Conf. Signals, Syst. and Comput., pages 1059–1063, 2008.
  • [21] N. Grobe, A. Cherif, X. Wang, Z. Dong, and P. Kotanko. Sample pooling: burden or solution? Clin. Microbiol. Infect., 27(9):1212–1220, 2021.
  • [22] T. Hastie, R. Tibshirani, and M. Wainwright. Statistical Learning with Sparsity: The LASSO and Generalizations. CRC Press, 2015.
  • [23] A. Heidarzadeh and K. Narayanan. Two-stage adaptive pooling with RT-qPCR for COVID-19 screening. In ICASSP, 2021.
  • [24] M.A. Herman and T. Strohmer. General deviants: an analysis of perturbations in compressed sensing. IEEE Journal on Sel. Topics Signal Process., 4(2), 2010.
  • [25] F. Hwang. A method for detecting all defective members in a population by group testing. J Am Stat Assoc, 67(339):605–608, 1972.
  • [26] J.D. Ianni and W.A. Grissom. Trajectory auto-corrected image reconstruction. Magnetic Resonance in Medicine, 76(3), 2016.
  • [27] T. Ince and A. Nacaroglu. On the perturbation of measurement matrix in non-convex compressed sensing. Signal Process., 98:143–149, 2014.
  • [28] A. Javanmard and A. Montanari. Confidence intervals and hypothesis testing for high-dimensional regression. J Mach Learn Res, 2014.
  • [29] A. Kahng and S. Reda. New and improved BIST diagnosis methods from combinatorial group testing theory. IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, 25(3), 2006.
  • [30] D. B. Larremore, B. Wilder, E. Lester, S. Shehata, J. M. Burke, J. A. Hay, M. Tambe, M. Mina, and R. Parker. Test sensitivity is secondary to frequency and turnaround time for COVID-19 screening. Science Advances, 7(1), 2021.
  • [31] Y. Li and G. Raskutti. Minimax optimal convex methods for Poisson inverse problems under q\ell_{q} -ball sparsity. IEEE Trans. Inf. Theory, 64(8):5498–5512, 2018.
  • [32] Hubert W. Lilliefors. On the Kolmogorov–Smirnov test for normality with mean and variance unknown. Journal of the American Statistical Association, 62(318):399–402, 1967.
  • [33] Miles E Lopes. Unknown sparsity in compressed sensing: Denoising and inference. IEEE Transactions on Information Theory, 62(9):5145–5166, 2016.
  • [34] G. Lugosi. Concentration-of-measure inequalities, 2009. https://www.econ.upf.edu/~lugosi/anu.pdf.
  • [35] A. Mazumdar and S. Mohajer. Group testing with unreliable elements. In ACCC, 2014.
  • [36] D. C. Montgomery, E. Peck, and G. Vining. Introduction to Linear Regression Analysis. Wiley, 2021.
  • [37] M. Rudelson and R. Vershynin. Smallest singular value of a random rectangular matrix. Comm. Pure Appl. Math., 2009.
  • [38] Nam H. Nguyen and Trac D. Tran. Robust LASSO with missing and grossly corrupted observations. IEEE Trans. Inf. Theory, 59(4):2036–2058, 2013.
  • [39] H. Pandotra, E. Malhotra, A. Rajwade, and K. S. Gurumoorthy. Dealing with frequency perturbations in compressive reconstructions with Fourier sensing matrices. Signal Process., 165:57–71, 2019.
  • [40] J. Parker, V. Cevher, and P. Schniter. Compressive sensing under matrix uncertainties: An approximate message passing approach. In Asilomar Conference on Signals, Systems and Computers, pages 804–808, 2011.
  • [41] Liangzu Peng, Boshi Wang, and Manolis Tsakiris. Homomorphic sensing: Sparsity and noise. In International Conference on Machine Learning, pages 8464–8475. PMLR, 2021.
  • [42] M. Raginsky, R. Willett, Z. Harmany, and R. Marcia. Compressed sensing performance bounds under Poisson noise. IEEE Trans. Signal Process., 58(8):3990–4002, 2010.
  • [43] Chiara Ravazzi, Sophie Fosson, Tiziano Bianchi, and Enrico Magli. Sparsity estimation from compressive projections via sparse random matrices. EURASIP Journal on Advances in Signal Processing, 2018:1–18, 2018.
  • [44] R. Rohde. COVID-19 pool testing: Is it time to jump in? https://asm.org/Articles/2020/July/COVID-19-Pool-Testing-Is-It-Time-to-Jump-In, 2020.
  • [45] Mark Rudelson and Roman Vershynin. The Littlewood–Offord problem and invertibility of random matrices. Advances in Mathematics, 218(2):600–633, 2008.
  • [46] S. Boucheron, G. Lugosi, and P. Massart. Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press, 2013.
  • [47] N. Shental et al. Efficient high throughput SARS-CoV-2 testing to detect asymptomatic carriers. Sci. Adv., 6(37), September 2020.
  • [48] J. Todd. Induced Norms, pages 19–28. Birkhäuser Basel, Basel, 1977.
  • [49] R. Vershynin. High-Dimensional Probability: An Introduction with Applications in Data Science. Cambridge University Press, 2018.
  • [50] Katherine J. Wu. Why pooled testing for the coronavirus isn’t working in America. https://www.nytimes.com/2020/08/18/health/coronavirus-pool-testing.html. Last retrieved October 2021.
  • [51] Hooman Zabeti, Nick Dexter, Ivan Lau, Leonhardt Unruh, Ben Adcock, and Leonid Chindelevitch. Group testing large populations for SARS-CoV-2. medRxiv, pages 2021–06, 2021.
  • [52] J. Zhang, L. Chen, P. Boufounos, and Y. Gu. On the theoretical analysis of cross validation in compressive sensing. In ICASSP, 2014.
  • [53] H. Zhu, G. Leus, and G. Giannakis. Sparsity-cognizant total least-squares for perturbed compressive sampling. IEEE Trans. Signal Process., 59(11), 2011.