
Image detection using combinatorial auction

Simon Anuk, Tamir Bendory, and Amichai Painsky
Abstract

This paper studies the optimal solution of the classical problem of detecting the location of multiple image occurrences in a two-dimensional, noisy measurement. Assuming the image occurrences do not overlap, we formulate this task as a constrained maximum likelihood optimization problem. We show that the maximum likelihood estimator is equivalent to an instance of the winner determination problem from the field of combinatorial auctions, and that the solution can be obtained by searching over a binary tree. We then design a pruning mechanism that significantly accelerates the runtime of the search. We demonstrate on simulations and electron microscopy data sets that the proposed algorithm provides accurate detection in challenging regimes of high noise levels and densely packed image occurrences.

1 Introduction

This paper studies the problem of accurately detecting multiple occurrences of a template image $s\in\mathbb{R}^{W\times W}$ located in a noisy measurement $y$ (the observed, acquired data). Specifically, let $y\in\mathbb{R}^{N\times M}$ be a measurement of the form:

y[n,m]=\sum_{k=1}^{K}s\left[n-c^{(k)}_{n},\,m-c^{(k)}_{m}\right]+w[n,m], \qquad (1)

where $\mathbf{c}^{(k)}=(c^{(k)}_{n},c^{(k)}_{m})$ is the upper-left corner coordinate of the $k$-th image occurrence. We model the noise term $w[n,m]$ as i.i.d. Gaussian noise with zero mean and variance $\sigma^{2}$. Our goal is to estimate the unknown image locations $\mathbf{c}^{(1)},\ldots,\mathbf{c}^{(K)}$ from the observation $y$. While the number of image occurrences $K$ is an unknown parameter, it is instructive to first assume that $K$ is known; we later remove this assumption.

The problem of detecting multiple image occurrences in noisy data appears in various image processing applications, including fluorescence microscopy [15, 8], astronomy [4], anomaly detection [9], neuroimaging [38, 16, 36], and electron microscopy [10, 19, 3]. A detailed discussion of these applications is provided in [5]. However, although many algorithms have been developed over the years (see, e.g., [9]), in most cases there is no known statistically optimal method for detecting the images.

A particularly important motivation for this paper is single-particle cryo-electron microscopy (cryo-EM), a leading technology for determining the three-dimensional structure of biological molecules [1, 35, 12]. In a cryo-EM experiment, multiple samples of biological molecules are frozen in a thin layer of ice. The samples might be densely packed, but they do not overlap. One of the first stages in the computational pipeline of cryo-EM involves locating images within a large and highly noisy measurement [10, 19, 3]. Motivated by cryo-EM, we assume that the image occurrences do not overlap, but may otherwise be arbitrarily spread in the measurement. Namely, each image occurrence is separated by at least $W$ pixels from any other image occurrence, in at least one dimension:

\max\left(\left|c^{(k)}_{n}-c^{(\ell)}_{n}\right|,\left|c^{(k)}_{m}-c^{(\ell)}_{m}\right|\right)\geq W,\quad\forall k\neq\ell. \qquad (2)

We refer to (2) as the separation condition. Figure 1 illustrates an example of a measurement that contains six image occurrences. The three image occurrences on the right side are densely packed (but satisfy the separation condition (2)), while the image occurrences on the left are separated by more than $2W$ pixels (twice the separation condition (2)); the distinction between these two scenarios plays a key role in this paper.

Figure 1: The left panel displays a noiseless measurement with six image occurrences: the images on the right side are densely packed, while the images on the left side are separated by more than $2W$ pixels. The right panel shows a noisy version of the same measurement, with $\text{SNR}=-30$ [dB]. Our goal is to accurately estimate the location of the six images from the noisy measurement.

Assuming the template image $s$ and the number of image occurrences $K$ are known, the maximum likelihood estimator (MLE) of the sought locations is given by

\operatorname*{arg\,min}_{\hat{\mathbf{c}}^{(1)},\ldots,\hat{\mathbf{c}}^{(K)}}\left\|y-\sum_{k=1}^{K}s\left[n-\hat{c}^{(k)}_{n},m-\hat{c}^{(k)}_{m}\right]\right\|^{2}_{2}, \qquad (3)

where we denote by $\hat{\mathbf{c}}^{(1)},\ldots,\hat{\mathbf{c}}^{(K)}$ the estimates of $\mathbf{c}^{(1)},\ldots,\mathbf{c}^{(K)}$, respectively. Taking the separation condition (2) into account, the MLE results in a constrained optimization problem:

\operatorname*{arg\,max}_{\hat{\mathbf{c}}^{(1)},\ldots,\hat{\mathbf{c}}^{(K)}}\sum_{k=1}^{K}\sum_{n=1}^{N}\sum_{m=1}^{M}y[n,m]\,s\left[n-\hat{c}^{(k)}_{n},m-\hat{c}^{(k)}_{m}\right] \qquad (4)
\text{s.t.}\quad\max\left(\left|\hat{c}^{(k)}_{n}-\hat{c}^{(\ell)}_{n}\right|,\left|\hat{c}^{(k)}_{m}-\hat{c}^{(\ell)}_{m}\right|\right)\geq W,\quad\forall k\neq\ell.

Efficiently computing the optimal solution of this optimization problem is the primary goal of this work.

A popular and natural heuristic for solving (4) is as follows [19, 10, 28, 14, 31]. First, we correlate the template image $s$ with the measurement $y$, and set $\hat{\mathbf{c}}^{(1)}$ to be the location that maximizes the correlation. The next estimate, $\hat{\mathbf{c}}^{(2)}$, is chosen as the next maximum of the correlation that satisfies the separation condition (2) with respect to $\hat{\mathbf{c}}^{(1)}$. The same approach is applied sequentially to estimate the remaining locations $\hat{\mathbf{c}}^{(3)},\ldots,\hat{\mathbf{c}}^{(K)}$. We refer to this algorithm as the greedy algorithm; it can be thought of as a variation of the well-known template matching algorithm that takes the separation condition (2) into account. This algorithm is highly efficient since the correlations can be calculated using the FFT [29].
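For concreteness, the following is a minimal Python sketch of this greedy procedure, assuming a square template and NumPy/SciPy; the function name and interface are ours, for illustration only.

```python
import numpy as np
from scipy.signal import correlate

def greedy_detect(y, s, K):
    """Greedy template matching under the separation condition (2).

    y: (N, M) measurement; s: (W, W) template; K: number of occurrences.
    Returns K estimated upper-left corners, in decreasing correlation order.
    """
    W = s.shape[0]
    # correlation with the template for every valid placement, via the FFT
    corr = correlate(y, s, mode="valid", method="fft")
    estimates = []
    for _ in range(K):
        valid = np.ones(corr.shape, dtype=bool)
        for (i, j) in estimates:  # mask candidates conflicting with prior picks
            valid[max(0, i - W + 1):i + W, max(0, j - W + 1):j + W] = False
        i, j = np.unravel_index(np.argmax(np.where(valid, corr, -np.inf)), corr.shape)
        estimates.append((int(i), int(j)))
    return estimates
```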

Figure 2 illustrates the correlation between the noisy measurement $y$ of Figure 1 and the template image $s$, together with the output of the greedy algorithm. Evidently, the greedy algorithm successfully detects the locations of the well-separated image occurrences on the left end of the measurement (where the images are separated by at least $2W$ pixels, twice the separation condition (2)), but completely fails when the image occurrences are densely packed (right end). The reason is that correlation doubles the support of the image; thus, the greedy algorithm is likely to treat two close image occurrences as one, especially in the presence of high noise levels. In this paper, we design an algorithm that detects the image occurrences accurately for any measurement that satisfies the separation condition (2), even when the image occurrences are highly dense, by computing the constrained MLE (4).

While many papers have designed object detection algorithms, as far as we know, none of them is guaranteed to achieve the MLE [9], as the algorithm proposed in this paper does. For example, a CNN-based technique was designed in [30]; this method works well, but it has no theoretical guarantees and requires a large data set for training. Other papers focused on accelerating the template matching algorithm [24]. The MLE for the one-dimensional version of (1) was considered in [31], based on dynamic programming. A multiple hypothesis approach was suggested in [34, 5, 11], and the statistical properties of the problem were analyzed in [6].

Figure 2: On the left, the correlation between the noisy measurement $y$, as depicted in Figure 1, and the template image $s$. On the right, the dashed lines display the noiseless measurement, as depicted in Figure 1. The red images are the output of the greedy algorithm, which fails to detect the location of the images on the right when the image occurrences are densely packed.

In Section 2, we reformulate the constrained MLE problem (4) as a Winner Determination Problem (WDP)—an optimization problem from the field of combinatorial auction. Section 3 shows how the problem can be solved exactly by binary tree search. While scanning the entire tree is intractable, we design an efficient pruning method to significantly accelerate the search, while guaranteeing to find the optimal solution. This section outlines the technical details of the proposed algorithm. Section 4 presents results on simulated data, and Section 5 shows results on experimental electron microscopy datasets. Section 6 concludes the paper.

2 Image detection as a combinatorial auction problem

This section begins by introducing the classical combinatorial auction problem and the WDP. We then show how the constrained optimization problem (4) can be reformulated as an optimization problem that closely resembles the classical WDP. This formulation leads to an efficient algorithm, whose details are presented in Section 3.

2.1 Classical combinatorial auction

Consider an auction of a variety of different goods. Unlike a classical auction, where goods are sold individually, in a combinatorial auction a bidder can place a bid on a bundle of goods. The auction manager receives multiple price offers for different bundles of goods. The goal of the auction manager is to maximize the revenue by allocating the goods to the highest submitted bids. Naturally, the same good may not be allocated to multiple bidders. This means that if two bidders place bids over bundles with overlapping goods, only one of the bids can win.

Let $\mathcal{G}=\{\gamma_{1},\ldots,\gamma_{A}\}$ be a set of given goods, and let $\mathcal{B}=\{\beta_{1},\ldots,\beta_{B}\}$ be a set of bids. Each bid $\beta_{i}$ includes a bundle of goods $g(\beta_{i})\subseteq\mathcal{G}$, which the bidder seeks to acquire. For this bundle, the bidder of $\beta_{i}$ proposes a price $p(\beta_{i})$. Let $x_{i}$ be an indicator variable that corresponds to the winning bids: we set $x_{i}=1$ if $\beta_{i}$ is a winning bid and $x_{i}=0$ otherwise. An allocation $\bm{\pi}$ is a set of winning bids that do not overlap. Specifically, $\bm{\pi}\subseteq\mathcal{B}$ is a subset of the bids such that $g(\beta_{1})\cap g(\beta_{2})=\varnothing$ for all $\beta_{1}\neq\beta_{2}\in\bm{\pi}$. Therefore, an allocation cannot include two bids that contain the same good.

Using this notation, the combinatorial auction problem is given by:

\max_{x_{1},\ldots,x_{B}}\ \sum_{i=1}^{B}x_{i}\,p(\beta_{i}) \qquad (5)
\text{s.t.}\quad\sum_{i\,|\,\gamma\in g(\beta_{i})}x_{i}\leq 1,\quad\forall\gamma\in\mathcal{G},
\qquad\ \ x_{i}\in\{0,1\},\quad\forall i.

The first constraint ensures that each good can be allocated at most once, while the second constraint guarantees that each bid is either included in or excluded from the chosen allocation. We define the optimal allocation as the allocation that maximizes the revenue of the auction manager. The problem of determining the optimal allocation is known as the WDP [23]. The WDP is NP-hard and cannot be approximated within any constant factor by a polynomial-time algorithm [32, 33].
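To make the problem concrete, here is a toy instance of (5) solved by exhaustive enumeration in Python; the instance and all names are ours, for illustration. The exponential loop over all subsets of bids is precisely what makes the WDP hard at realistic sizes.

```python
from itertools import combinations

# A toy instance: three goods {0, 1, 2} and four (bundle, price) bids.
bids = [({0, 1}, 5.0), ({1, 2}, 4.0), ({2}, 3.0), ({0}, 2.0)]

def brute_force_wdp(bids):
    """Enumerate every subset of bids; keep the best conflict-free one."""
    best_rev, best_alloc = 0.0, ()
    for r in range(1, len(bids) + 1):
        for alloc in combinations(range(len(bids)), r):
            goods = [g for i in alloc for g in bids[i][0]]
            if len(goods) == len(set(goods)):  # no good allocated twice
                rev = sum(bids[i][1] for i in alloc)
                if rev > best_rev:
                    best_rev, best_alloc = rev, alloc
    return best_rev, best_alloc

print(brute_force_wdp(bids))  # (8.0, (0, 2)): bundles {0, 1} and {2}
```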

2.2 Image detection as a winner determination problem

We now cast the constrained optimization problem (4) as a variation of the WDP. We associate every pixel in the measurement with a single good, and let $\mathcal{B}$ be a set of bids. Accepting $\beta_{ij}\in\mathcal{B}$ implies that our estimator includes an image occurrence whose upper-left corner is located at the $(i,j)$-th pixel. The goods requested by the bid, $g(\beta_{ij})$, are a $W\times W$ set of pixels, corresponding to a possible image occurrence in the measurement. The offered price of the bid, $p(\beta_{ij})$, is the correlation between the measurement and the template image, namely,

p(\beta_{ij})=\sum_{n=1}^{N}\sum_{m=1}^{M}y[n,m]\,s[n-i,m-j]. \qquad (6)

The revenue of the auctioneer is the sum of the prices offered by all accepted bids. Since we are estimating the upper-left corner coordinates of images in the measurement, we require that the estimated coordinates satisfy $\hat{c}^{(k)}_{n}\leq N-W+1$ and $\hat{c}^{(k)}_{m}\leq M-W+1$, so that we do not exceed the measurement's dimensions. We denote $\tilde{N}=N-W+1$ and $\tilde{M}=M-W+1$.
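In code, the full set of bids can be constructed directly from (6); the sketch below (our notation, assuming NumPy/SciPy) returns the price map of shape $\tilde{N}\times\tilde{M}$ together with the pixel bundle of each bid.

```python
import numpy as np
from scipy.signal import correlate

def make_bids(y, s):
    """Prices p(beta_ij) per (6) and the goods g(beta_ij) of every bid."""
    W = s.shape[0]
    # 'valid' mode keeps only placements fully inside the measurement,
    # so prices has shape (N - W + 1, M - W + 1)
    prices = correlate(y, s, mode="valid", method="fft")
    bundles = {(i, j): frozenset((i + di, j + dj)
                                 for di in range(W) for dj in range(W))
               for i in range(prices.shape[0]) for j in range(prices.shape[1])}
    return prices, bundles
```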

Let us define $\mathbf{X}\in\{0,1\}^{\tilde{N}\times\tilde{M}}$ as a matrix of indicators, in which $x_{ij}$ equals $1$ if the bid $\beta_{ij}$ is included in the allocation and $0$ otherwise. To enforce the separation constraint (2), we require that

\mathbbm{1}_{W}^{T}\tilde{\mathbf{X}}_{ij}\mathbbm{1}_{W}\in\{0,1\},\quad\forall i,j, \qquad (7)

where $\mathbbm{1}_{W}$ is an all-ones $W\times 1$ column vector and $\tilde{\mathbf{X}}_{ij}$ is the $W\times W$ sub-matrix of $\mathbf{X}$ whose upper-left entry is the $(i,j)$ entry of $\mathbf{X}$. Using this notation, and assuming $K$ is known, the constrained likelihood problem (4) can be written as:

\max_{x_{11},\ldots,x_{\tilde{N}\tilde{M}}}\ \sum_{i=1}^{\tilde{N}}\sum_{j=1}^{\tilde{M}}x_{ij}\,p(\beta_{ij}) \qquad (8)
\text{s.t.}\quad\mathbbm{1}_{\tilde{N}}^{T}\mathbf{X}\mathbbm{1}_{\tilde{M}}=K,
\qquad\ \ \mathbbm{1}_{W}^{T}\tilde{\mathbf{X}}_{ij}\mathbbm{1}_{W}\leq 1,\quad\forall i,j,
\qquad\ \ x_{ij}\in\{0,1\},\quad\forall i,j,

where the last two conditions are, together, equivalent to (7). Therefore, solving (8) is equivalent to solving (4), where the non-zero entries of the optimal $\mathbf{X}$ correspond to the MLE of the locations in (4).

We underscore that the optimization problem (8) is slightly different from the classical combinatorial auction (5). In the classical setup, the auctioneer aims to include as many bids as possible in the optimal allocation to maximize the revenue, whereas in our model the optimal allocation must include exactly $K$ bids. Furthermore, in our model, $p(\beta_{ij})$ can be negative. Thus, we cannot use existing algorithms that solve the classical WDP (5), such as the one proposed in [23], and we design a new algorithm tailored to our constrained setup. Importantly, the designed algorithm must be computationally efficient: solving (8) in a brute-force manner requires searching over an exponential number of allocations, rendering this approach intractable. For example, if $N=M=30$ and $K=5$, then the number of possible allocations is $\sim 10^{12}$. The next section introduces a pruning technique to solve (8), which dramatically reduces the number of explored allocations.

3 WDP for image detection

To delineate the proposed algorithm, we introduce further notation. Two bids $\beta_{i},\beta_{j}\in\mathcal{B}$ are said to be conflicting if $g(\beta_{i})\cap g(\beta_{j})\neq\varnothing$. We extend the functions $g(\cdot)$ and $p(\cdot)$ to allocations: $g(\bm{\pi})=\bigcup_{\beta\in\bm{\pi}}g(\beta)$ and $p(\bm{\pi})=\sum_{\beta\in\bm{\pi}}p(\beta)$. A partial allocation $\bm{\pi}_{k}$ is an allocation in which the number of bids is $k<K$. A bid $\beta_{i}$ conflicts with a partial allocation $\bm{\pi}_{k}$ if $g(\beta_{i})\cap g(\bm{\pi}_{k})\neq\varnothing$. We denote the set of bids that conflict with a partial allocation $\bm{\pi}_{k}$ by $C_{\bm{\pi}_{k}}$. Hence, given a partial allocation $\bm{\pi}_{k}$, the set of allowed bids, from which a bid can be added to $\bm{\pi}_{k}$, is $\overline{C}_{\bm{\pi}_{k}}=\mathcal{B}\setminus C_{\bm{\pi}_{k}}$. In addition, we define $\tilde{\bm{\pi}}_{\text{opt}}$ as the best allocation found by our algorithm so far.
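Since every bundle is a $W\times W$ block of pixels, the conflict relation reduces to a simple coordinate test, which is exactly the complement of the separation condition (2); a minimal sketch:

```python
def conflicting(c1, c2, W):
    """Bids with upper-left corners c1 and c2 conflict iff their W x W
    pixel bundles intersect, i.e., both coordinate offsets are below W."""
    return max(abs(c1[0] - c2[0]), abs(c1[1] - c2[1])) < W
```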

3.1 Binary tree construction

The algorithm begins by forming a binary tree data structure. Binary trees, widely used in search algorithms in computer science [2], are suitable for examining every possible allocation through systematic navigation in order to determine the optimal allocation.

Given a set of bids, each node in the tree corresponds to a simple binary question: whether to include or exclude a bid in the allocation. This results in a full binary tree where, except for the leaf nodes, each node has two children, both corresponding to the same bid. We introduce a simple example to illustrate the construction of the binary tree. Let us consider $y\in\mathbb{R}^{4}$, a template image of unit size, $s\in\mathbb{R}$, and $K=2$. As explained in Section 2, we cast the image detection problem as a variation of the WDP. In this example, every bid includes a bundle of a single good (since the template image is of unit size). We set $\beta_{1}$ as the tree's root, while $\beta_{4}$ forms the tree's leaves. To explore an allocation that includes the current node, we navigate to the node's left child and add the current node to our partial allocation $\bm{\pi}_{k}$. Conversely, navigating to the right child represents the alternative choice of excluding the current node from $\bm{\pi}_{k}$. Figure 3 demonstrates the binary tree obtained in this simple example.

Figure 3: An example of a binary tree, corresponding to a measurement with four elements.

As previously mentioned, the binary tree enables us to conduct an exhaustive search across all possible allocations. Therefore, using this search mechanism, the optimal allocation will inevitably be encountered.
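The following recursion (ours, for illustration) walks exactly this tree: the left branch includes the current bid, the right branch excludes it, and every size-$K$ allocation is eventually reached. Pruning is deliberately omitted here; it is added in Section 3.2.

```python
def enumerate_allocations(bids, K, i=0, partial=()):
    """Yield every size-K allocation by walking the include/exclude tree."""
    if len(partial) == K:
        yield partial
        return
    if i == len(bids):
        return
    yield from enumerate_allocations(bids, K, i + 1, partial + (bids[i],))  # left: include
    yield from enumerate_allocations(bids, K, i + 1, partial)               # right: exclude

# The four-bid, K = 2 example of Figure 3 yields all 6 candidate allocations.
print(list(enumerate_allocations(["b1", "b2", "b3", "b4"], 2)))
```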

3.2 Tree pruning

Exhaustively searching the tree necessarily leads to the optimal allocation, but it is computationally infeasible even for problems of small dimensions. To overcome this issue, we use the branch-and-bound algorithmic technique and present a pruning method that avoids exploring a large portion of the allocations in advance, by eliminating sub-trees that cannot yield the optimal allocation.

We illustrate this method using the example of Figure 3. We refer to a smaller binary tree, spanned by a node other than the root, as a sub-tree. Let us assume that the optimal allocation is $\bm{\pi}_{\text{opt}}=\{\beta_{1},\beta_{3}\}$ and that we have already encountered it in our search. At this point, we are unaware that this is the optimal allocation, so we continue exploring the remaining possible allocations. Assume we have chosen to exclude $\beta_{1}$ from the empty partial allocation, $\bm{\pi}_{0}$. Following that, two bids are left to be found in the sub-tree spanned by $\beta_{2}$. Figure 4 demonstrates this sub-tree. The maximal revenue achievable by exploring this sub-tree is bounded by the sum of the prices of the two highest-revenue bids within it. Since this revenue is less than that achieved by the optimal allocation (by the assumption that we have already encountered the optimal allocation), we can avoid exploring the allocations in this sub-tree. Thus, in this example, we reduce the computational burden by half, as we prune half of the binary tree.

Figure 4: A sub-tree that corresponds to excluding $\beta_{1}$ from the partial allocation $\bm{\pi}_{k}$. If the maximal revenue that can be achieved by exploring this sub-tree is less than the revenue of our current optimal allocation, $\tilde{\bm{\pi}}_{\text{opt}}$, we prune it.

We now formulate this example as a general pruning mechanism. Suppose we have reached a partial allocation $\bm{\pi}_{k}$. We sort the bids in $\overline{C}_{\bm{\pi}_{k}}$ (the set of allowed bids) in descending order of their revenues, and denote the $K-k$ maximal-revenue bids in $\overline{C}_{\bm{\pi}_{k}}$ by $\overline{C}^{(K-k)}_{\bm{\pi}_{k}}$. Summing the revenues of the bids in $\overline{C}^{(K-k)}_{\bm{\pi}_{k}}$ bounds the remaining revenue that can be collected in this sub-tree:

h(\bm{\pi}_{k}):=\sum_{\beta\in\overline{C}^{(K-k)}_{\bm{\pi}_{k}}}p(\beta). \qquad (9)

This function enables us to determine whether the optimal allocation cannot be achieved within the current sub-tree, in which case we prune it. Specifically, we prune the sub-tree if the sum of $p(\bm{\pi}_{k})$ and $h(\bm{\pi}_{k})$ is less than the revenue of our current optimal allocation, $p(\tilde{\bm{\pi}}_{\text{opt}})$, i.e.,

p(\bm{\pi}_{k})+h(\bm{\pi}_{k})<p(\tilde{\bm{\pi}}_{\text{opt}}). \qquad (10)

Since we only prune sub-trees that cannot lead to the optimal allocation, our proposed algorithm remains optimal. We note, however, that $h(\bm{\pi}_{k})$ is not necessarily a tight upper bound, as it ignores the constraint that the remaining $K-k$ bids must not conflict with each other.
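A direct transcription of the bound (9) and the test (10) reads, in our notation:

```python
def can_prune(pi_k_price, allowed_prices_desc, K, k, best_price):
    """Return True if the sub-tree below the partial allocation pi_k can be
    pruned, per (10): p(pi_k) + h(pi_k) < p(current optimum). Here h(pi_k)
    is the sum of the K - k largest allowed bid prices, a loose bound since
    conflicts among the remaining bids are ignored."""
    top = allowed_prices_desc[:K - k]  # prices already sorted in descending order
    if len(top) < K - k:               # not enough bids left to reach size K
        return True
    return pi_k_price + sum(top) < best_price
```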

3.3 Sorting the bids

Our algorithm examines each potential optimal allocation, and thus the optimality of the algorithm is unaffected by rearranging the order of the bids. We aim to reorder the bids in a way that maximizes the efficiency of the pruning mechanism. Note that the earlier we secure a high-revenue allocation $p(\tilde{\bm{\pi}}_{\text{opt}})$, the more often the stopping criterion (10) is met, resulting in fewer allocations to be examined. Hence, we begin the algorithm by sorting the bids by their revenue in descending order. The bid with the highest revenue is fixed as the tree's root, while the bid with the lowest revenue forms the tree's leaves. This strategy increases the frequency with which the pruning condition is met, thereby accelerating the algorithm's runtime. For example, in the experiments in Section 4, we observed that sorting the bids can accelerate the search by a factor of around 10.

3.4 Algorithm description

We are now ready to describe our algorithm for detecting image occurrences in a noisy measurement. We will use the example above to explain the guiding principles. We start by sorting the bids in descending order by their revenues, as described in Section 3.3. An example of a sorted tree is presented in Figure 5.

Figure 5: A sorted binary tree, corresponding to the tree that was presented in Figure 3.

Our algorithm is designed such that when we encounter a node, it is added to our partial allocation only if it does not conflict with the partial allocation and the partial allocation does not satisfy the pruning condition (10). We begin our search at the tree's root, $\beta_{1}$. As this is the first node we encounter, we add it to our partial allocation and move to the left child, $\beta_{3}$. At this point, $\beta_{3}$ is added to the partial allocation, since it does not conflict with our partial allocation, nor does the partial allocation satisfy the pruning condition (10). Since this is the first allocation we attain, it is the current optimal allocation. We note that the first allocation we explore is the output of the greedy algorithm. This means that the greedy algorithm is not necessarily optimal, since the optimal allocation may be found at a later stage. In our specific simple example, the greedy algorithm does attain the optimal solution (since the first two bids do not conflict), but in order to explain the full mechanism of the algorithm, we describe the remaining steps.

We then remove the last bid we added, $\beta_{3}$, and continue examining allocations that exclude it. To do so, we move to the right child of $\beta_{3}$. Upon reaching $\beta_{2}$, we check whether our partial allocation $\bm{\pi}_{1}$ satisfies the pruning condition (10). Since the optimal allocation is $\bm{\pi}_{\text{opt}}=\{\beta_{1},\beta_{3}\}$, the pruning condition is met; hence we prune the sub-tree spanned by $\beta_{2}$ and return to the parent, $\beta_{3}$. Having examined the sub-tree spanned by $\beta_{3}$, we next return to the parent, $\beta_{1}$, and remove $\beta_{1}$ from our partial allocation. In this way, we have examined all allocations that include $\beta_{1}$, and we move to the node's right child. This process continues until the algorithm explores the entire tree. Upon completion, it returns the optimal allocation. The general algorithmic pipeline is described in Algorithm 1.

Algorithm 1 A WDP algorithm to solve (8)
  Input: set of bids $\mathcal{B}$ and number of image occurrences $K$
  Output: optimal allocation $\bm{\pi}_{\text{opt}}$
  Initialize: curr $\leftarrow$ root, $\bm{\pi}_{k}=\{\}$, $\tilde{\bm{\pi}}_{\text{opt}}=\{\}$
  Sort $\mathcal{B}$ by revenue and construct the binary tree
  while curr $\neq$ NULL do
     if size($\bm{\pi}_{k}$) $=K$ then
        if $p(\bm{\pi}_{k})>p(\tilde{\bm{\pi}}_{\text{opt}})$ then
           $\tilde{\bm{\pi}}_{\text{opt}}=\bm{\pi}_{k}$
        end if
        $\bm{\pi}_{k}\leftarrow\bm{\pi}_{k}\setminus$ curr.parent
        curr $\leftarrow$ curr.parent
     else
        if left sub-tree has not been explored then
           if $p(\bm{\pi}_{k})+h(\bm{\pi}_{k})<p(\tilde{\bm{\pi}}_{\text{opt}})$ then
              prune left and right sub-trees
           else if curr $\in\overline{C}_{\bm{\pi}_{k}}$ then
              $\bm{\pi}_{k}\leftarrow\bm{\pi}_{k}\cup$ curr
              curr $\leftarrow$ curr.left
           end if
        else if right sub-tree exists then
           curr $\leftarrow$ curr.right
        else
           if curr $\in\bm{\pi}_{k}$ then
              $\bm{\pi}_{k}\leftarrow\bm{\pi}_{k}\setminus$ curr
           end if
           curr $\leftarrow$ curr.parent
        end if
     end if
  end while
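For readers who prefer code, the following is a compact recursive Python sketch of Algorithm 1; it uses recursion in place of the explicit parent pointers of the pseudocode, and represents each bid as a (price, bundle) pair, e.g., as produced by the make_bids sketch of Section 2.2. It is illustrative rather than optimized.

```python
import numpy as np

def wdp_detect(bids, K):
    """Branch-and-bound search for the optimal allocation of (8).

    bids: list of (price, bundle) pairs, where bundle is a frozenset of
    pixels. Sorting by descending price (Section 3.3) secures a
    high-revenue incumbent early, so the pruning test (10) fires often.
    """
    bids = sorted(bids, key=lambda b: -b[0])
    prices = [p for p, _ in bids]
    best = {"rev": -np.inf, "alloc": None}

    def search(i, alloc, used, rev):
        if len(alloc) == K:                # a complete allocation was reached
            if rev > best["rev"]:
                best["rev"], best["alloc"] = rev, list(alloc)
            return
        # allowed bids: those not conflicting with the partial allocation
        allowed = [prices[j] for j in range(i, len(bids)) if not (bids[j][1] & used)]
        need = K - len(alloc)
        # prune: even the K - k best allowed bids cannot beat the incumbent
        if len(allowed) < need or rev + sum(allowed[:need]) < best["rev"]:
            return
        if not (bids[i][1] & used):        # left child: include bid i
            search(i + 1, alloc + [i], used | bids[i][1], rev + prices[i])
        search(i + 1, alloc, used, rev)    # right child: exclude bid i

    search(0, [], frozenset(), 0.0)
    if best["alloc"] is None:              # fewer than K non-conflicting bids
        return -np.inf, []
    return best["rev"], [bids[i] for i in best["alloc"]]
```

Note that the recursion depth equals the number of bids, so for large measurements the iterative formulation of Algorithm 1 (or a raised recursion limit) is preferable.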

3.5 Estimating the number of image occurrences using the gap statistics principle

In many real-world scenarios, such as electron microscopy [1, 11], the exact number of image occurrences $K$ is unknown. This imposes a major challenge, given that both the greedy algorithm and Algorithm 1 require a fixed $K$. A common practice is to solve (4) for various values of $K$, choosing the one that induces the steepest change in the objective value. This heuristic is called the “knee” method and has been widely adopted across diverse domains, such as clustering [37] and regularization [18].

Here, we propose employing the gap statistics principle, which can be thought of as a statistical technique to identify the “knee” [37]. This principle is widely applied in a variety of domains, as discussed, for example, in [26, 27]. The underlying idea of gap statistics is to adjust the objective value curve (as a function of the possible values of $K$) by comparing it with its expectation under a null reference. Let us define the gap statistic as:

\text{gap}(K)=p(\bm{\pi}_{\text{opt}}(K))-\mathbb{E}^{*}p(\bm{\pi}_{\text{opt}}(K)), \qquad (11)

where $p(\bm{\pi}_{\text{opt}}(K))$ represents the revenue of the optimal allocation for $K$ image occurrences (computed by Algorithm 1), and $\mathbb{E}^{*}p(\bm{\pi}_{\text{opt}}(K))$ is its expectation for a measurement of dimensions $N\times M$ drawn from a “null” reference distribution. To form the “null” distribution, we randomly rearrange the pixels of the measurement, resulting in an unstructured measurement with no template image occurrences. Specifically, we estimate the expectation under the null by $\mathbb{E}^{*}p(\bm{\pi}_{\text{opt}}(K))\approx\frac{1}{R}\sum_{r=1}^{R}p(\bm{\pi}_{\text{opt}}^{r}(K))$, where $p(\bm{\pi}_{\text{opt}}^{r}(K))$ is the revenue achieved by the optimal allocation on the $r$-th rearrangement; in the experiments below, we set $R=50$. The estimate of $K$, the number of image occurrences, is the value of $K$ that maximizes $\text{gap}(K)$, and is denoted by $\hat{K}$.
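A sketch of this procedure, reusing the hypothetical make_bids and wdp_detect helpers from the earlier sketches (and therefore only illustrative, as repeatedly solving (8) on permuted measurements is expensive):

```python
import numpy as np

def estimate_K(y, s, K_max, R=50, seed=0):
    """Estimate K by maximizing the gap statistic (11)."""
    rng = np.random.default_rng(seed)

    def best_revenue(z, K):
        prices, bundles = make_bids(z, s)
        bids = [(prices[i, j], b) for (i, j), b in bundles.items()]
        return wdp_detect(bids, K)[0]

    # "null" measurements: randomly rearrange the pixels of y
    nulls = [rng.permutation(y.ravel()).reshape(y.shape) for _ in range(R)]
    gaps = []
    for K in range(1, K_max + 1):
        null_mean = np.mean([best_revenue(z, K) for z in nulls])
        gaps.append(best_revenue(y, K) - null_mean)
    return 1 + int(np.argmax(gaps))  # the K maximizing gap(K)
```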

4 Numerical experiments

In this section, we conduct numerical experiments comparing Algorithm 1 with the alternative greedy algorithm. We use the $F_{1}$ score to evaluate the performance of both methods [17, 20, 13, 25]. The $F_{1}$ score is defined as:

F_{1}=2\times\frac{\text{Precision}\times\text{TPR}}{\text{Precision}+\text{TPR}}, \qquad (12)

where Precision is the number of correct detections divided by the total number of detections, and TPR, the true positive rate, is the number of correct detections divided by the total number of image occurrences. We note that when the number of image occurrences $K$ is known, Precision equals TPR.

We define a detection as correct if:

\max\left(\left|\hat{c}^{(k)}_{n}-c^{(k)}_{n}\right|,\left|\hat{c}^{(k)}_{m}-c^{(k)}_{m}\right|\right)\leq\frac{W}{2}. \qquad (13)

That is, a detection is classified as correct if the estimated image location is within $\frac{W}{2}$ pixels of the true location. This convention is common in the image detection literature [34, 5]. We define the SNR of the measurement as

\text{SNR}=10\log\frac{KW^{2}}{\sigma^{2}NM}, \qquad (14)

where $\sigma^{2}$ is the variance of the noise. To compare our algorithm with the greedy algorithm, which works well when the supports of the correlated image occurrences do not overlap, we define another separation condition:

\max\left(\left|c^{(k)}_{n}-c^{(\ell)}_{n}\right|,\left|c^{(k)}_{m}-c^{(\ell)}_{m}\right|\right)\geq 2W,\quad\forall k\neq\ell. \qquad (15)

We refer to this condition as the well-separated condition.
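To fix conventions, a possible scoring routine implementing (12) and (13) is sketched below; the greedy one-to-one matching between estimated and true locations is our own choice, as the matching rule is not material here.

```python
def f1_score(true_locs, est_locs, W):
    """F1 per (12); a detection is correct per (13) if it lands within W/2
    pixels of a still-unmatched true location."""
    if not est_locs:
        return 0.0
    unmatched, correct = list(true_locs), 0
    for (i, j) in est_locs:
        for t in unmatched:
            if max(abs(i - t[0]), abs(j - t[1])) <= W / 2:
                unmatched.remove(t)
                correct += 1
                break
    if correct == 0:
        return 0.0
    precision = correct / len(est_locs)   # correct detections / detections
    tpr = correct / len(true_locs)        # correct detections / occurrences
    return 2 * precision * tpr / (precision + tpr)
```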

For a given set of parameters $M,N,W,K$, we generate a measurement $y$ as follows. We begin by placing the first image occurrence at a random location (drawn from a uniform distribution) within the measurement. Then, we draw another location and place an image occurrence there if it meets either (2) or (15). This process is repeated until all $K$ images have been placed in the measurement. Finally, we add i.i.d. white Gaussian noise with zero mean and variance $\sigma^{2}$ to the measurement. The code to reproduce the numerical experiments is publicly available at https://github.com/saimonanuk/Optimal-detection-of-non-overlapping-images-via-combinatorial-auction.
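A sketch of this generation procedure (rejection sampling of corners; NumPy assumed). For brevity it omits the extra requirement, used in Section 4.1, that at least one pair of occurrences be separated by exactly $W$ pixels.

```python
import numpy as np

def generate_measurement(N, M, W, K, sigma, min_sep, seed=None):
    """Place K non-overlapping W x W templates uniformly at random with a
    minimum separation (min_sep = W for (2), 2W for (15)), then add noise."""
    rng = np.random.default_rng(seed)
    s = np.ones((W, W))                   # the all-ones template of Section 4.1
    locs = []
    while len(locs) < K:                  # rejection sampling of corners
        c = (int(rng.integers(N - W + 1)), int(rng.integers(M - W + 1)))
        if all(max(abs(c[0] - t[0]), abs(c[1] - t[1])) >= min_sep for t in locs):
            locs.append(c)
    y = rng.normal(0.0, sigma, size=(N, M))
    for (i, j) in locs:
        y[i:i + W, j:j + W] += s
    return y, locs
```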

4.1 A known number of image occurrences

In the first experiment, we assume that the number of image occurrences $K$ is known, and both algorithms receive the true $K$ as input. We set $N=M=40$ and $K=4$, and choose an all-ones template image of size $W=3$. We generate the measurement such that the separation condition (2) is met and at least one pair of image occurrences is separated by precisely $W$ pixels. For each SNR level, we conduct 1000 trials, each with a fresh measurement. Figure 6 shows the average $F_{1}$ score of Algorithm 1 and the greedy algorithm as a function of the SNR. Evidently, Algorithm 1 outperforms the greedy algorithm in all SNR regimes. As expected, in high SNR regimes, Algorithm 1 achieves $F_{1}=1$. In contrast, the greedy algorithm fails even in the high SNR regime, achieving $F_{1}=0.88$, due to the proximity of the image occurrences in the measurement. The average runtime per trial of Algorithm 1 was approximately 80 seconds, whereas the average runtime of the greedy algorithm was 0.1 seconds.

Figure 6: The average $F_{1}$ score, as a function of the SNR, of Algorithm 1 and the greedy algorithm, assuming the number of image occurrences is known.

We repeated the same experiment, but now the image occurrences are well-separated according to (15). The results of this experiment are illustrated in Figure 7. As expected, in this case, both algorithms achieve the same results, and the running time remains the same.

Figure 7: The average $F_{1}$ score, as a function of the SNR, of Algorithm 1 and the greedy algorithm, where the image occurrences satisfy the well-separated condition (15), assuming the number of image occurrences is known.

4.2 An unknown number of image occurrences

We repeated the first experiment of the previous section (where the image occurrences satisfy (2) but not (15)), but now the number of image occurrences $K$ is unknown. To estimate $K$, we used the gap statistics principle. This estimate is then used as an input to both algorithms to determine the image locations. The results are presented in Figure 8.

Figure 8: The average $F_{1}$ score, as a function of the SNR, of Algorithm 1 and the greedy algorithm, assuming the number of image occurrences is unknown.

Similar to the previous case, in the unknown-$K$ scenario Algorithm 1 outperforms the greedy algorithm in all SNR regimes. In addition, Algorithm 1 provides better estimates of the number of image occurrences $K$. For example, for SNR $=-12.5$, Algorithm 1 estimates $K$ exactly in $87.2\%$ of the trials, while the greedy algorithm does so in only $80\%$. This gap narrows as the SNR decreases, until the two algorithms eventually converge to the same estimates when the SNR is low enough.

5 Cryo-electron microscopy examples

We applied Algorithm 1 and the greedy algorithm to two cryo-EM data sets, available at the EMPIAR repository [21]. We assume a disk-like shape for the images. Following standard procedures in the field, we performed a sequence of preprocessing steps before applying both algorithms. First, we whitened the measurement by manually selecting “noise-only” areas (i.e., areas without images), which were used to estimate the empirical covariance matrix of the noise; the data were then normalized by the inverse of this covariance matrix. Next, the data were downsampled by a factor of 12, simultaneously reducing the computational load and amplifying the SNR. Finally, we took the whitened, downsampled measurement and identified patches in which the image occurrences are not well-separated, namely, do not satisfy (15). We focus on small, dense patches since this is the scenario in which Algorithm 1 shines; in well-separated (15) areas, it is recommended to use the computationally efficient greedy algorithm. Figure 9 displays two measurements after the preprocessing steps.
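As a rough illustration of these steps, the sketch below whitens under a simplifying stationarity assumption, using a crude power-spectrum estimate from a single noise-only patch, and downsamples by local averaging; the actual pipeline estimates the empirical noise covariance from several hand-picked areas, so this is a stand-in rather than the method used.

```python
import numpy as np

def whiten_and_downsample(y, noise_patch, factor=12):
    """Spectral whitening from a noise-only patch, then downsampling."""
    # crude noise power-spectrum estimate: periodogram of the zero-padded patch
    psd = np.abs(np.fft.fft2(noise_patch, s=y.shape)) ** 2 / noise_patch.size
    y_white = np.real(np.fft.ifft2(np.fft.fft2(y) / np.sqrt(psd + 1e-12)))
    # downsample by block averaging, which also suppresses the noise
    N = (y_white.shape[0] // factor) * factor
    M = (y_white.shape[1] // factor) * factor
    blocks = y_white[:N, :M].reshape(N // factor, factor, M // factor, factor)
    return blocks.mean(axis=(1, 3))
```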

Figure 9: Two cryo-EM measurements after preprocessing steps. The left panel presents a measurement from the EMPIAR-10081 data set and the right panel shows a measurement from the EMPIAR-10217 data set. The orange squares are the dense areas used as the input for both algorithms.

5.1 EMPIAR 10081

Our first experiment is conducted on the EMPIAR-10081 data set of the Human HCN1 Hyperpolarization-Activated Channel [22]. The measurement is presented in Figure 9. We set the radius of the disc used as a template image to 3 pixels. The number of image occurrences is estimated using gap statistics. Figure 10 shows the output of both algorithms. Algorithm 1 identifies two separate images, whereas the greedy algorithm identifies only a single occurrence. For comparison, we also show the output of TOPAZ [3], a popular deep-learning-based detection algorithm (particle picker) for cryo-EM data sets (see also [7]).

Figure 10: From left to right: the output of the greedy algorithm, the original patch taken from the data set EMPIAR 10081 as illustrated in Figure 9, the output of Algorithm 1, and the output of the cryo-EM particle picker TOPAZ [3].

5.2 EMPIAR 10217

Our second experiment is conducted on the EMPIAR-10217 data set of the bovine liver glutamate dehydrogenase [21]. We set the radius of the disc used as a template image to 6 pixels. The measurement is presented in Figure 9 and the results of both algorithms are displayed in Figure 11. Similar to the previous example, Algorithm 1 identifies two different particle images, while the greedy algorithm finds only one particle image in the measurement.

Figure 11: From left to right: the output of the greedy algorithm, the original patch taken from the data set EMPIAR 10217 as illustrated in Figure 9, the output of Algorithm 1, and the output of the cryo-EM particle picker TOPAZ [3].

6 Conclusion

This paper studies the problem of detecting multiple image occurrences in a two-dimensional, noisy measurement. We approach this task by formulating it as a constrained maximum likelihood optimization problem. By showing the equivalence between this problem and a version of the winner determination problem, we design an efficient search algorithm over a binary tree. Our algorithm can find the optimal solution, even in scenarios characterized by high noise levels and densely packed image occurrences.

Our methodology can be extended in a couple of directions that we leave for future work. First, we assume white Gaussian noise, whereas in many applications, such as cryo-EM, the noise is not white [1]. Using the formulation of the detection problem as a combinatorial auction (8), changing the noise statistics will only change the revenue function $p$, while the rest of the optimization problem remains the same. Similarly, one may wish to introduce a prior on the sought locations (in the Bayesian sense); in this case, the goal would be to maximize the posterior distribution under the separation condition. This, again, does not change the structure of the algorithm, only the definition of the revenue. Second, our pruning mechanism is quite conservative since it ignores the separation condition. Important future work includes designing a more effective pruning mechanism that will accelerate the search over the tree (and thus the algorithm's runtime), perhaps at the cost of a small, bounded deviation from the optimal solution.

Acknowledgments

T.B. is supported in part by the BSF grant no. 2020159, the NSF-BSF grant no. 2019752, and the ISF grant no. 1924/21. A.P. is supported in part by ISF grant no. 963/21. The authors are grateful to Amnon Balanov for his assistance with TOPAZ.

References

  • [1] Tamir Bendory, Alberto Bartesaghi, and Amit Singer. Single-particle cryo-electron microscopy: Mathematical theory, computational challenges, and opportunities. IEEE signal processing magazine, 37(2):58–76, 2020.
  • [2] Jon Louis Bentley. Multidimensional binary search trees used for associative searching. Communications of the ACM, 18(9):509–517, 1975.
  • [3] Tristan Bepler, Andrew Morin, Micah Rapp, Julia Brasch, Lawrence Shapiro, Alex J Noble, and Bonnie Berger. Positive-unlabeled convolutional neural networks for particle picking in cryo-electron micrographs. Nature methods, 16(11):1153–1160, 2019.
  • [4] Pierpaolo Brutti, Christopher Genovese, Christopher J Miller, Robert C Nichol, and Larry Wasserman. Spike hunting in galaxy spectra. 2005.
  • [5] Dan Cheng and Armin Schwartzman. Multiple testing of local maxima for detection of peaks in random fields. Annals of statistics, 45(2):529, 2017.
  • [6] Marom Dadon, Wasim Huleihel, and Tamir Bendory. Detection and recovery of hidden submatrices. IEEE Transactions on Signal and Information Processing over Networks, 2024.
  • [7] Ashwin Dhakal, Rajan Gyawali, Liguo Wang, and Jianlin Cheng. CryoTransformer: a transformer model for picking protein particles from Cryo-EM micrographs. Bioinformatics, 40(3):btae109, 2024.
  • [8] Alexander Egner, Claudia Geisler, Claas Von Middendorff, Hannes Bock, Dirk Wenzel, Rebecca Medda, Martin Andresen, Andre C Stiel, Stefan Jakobs, Christian Eggeling, et al. Fluorescence nanoscopy in whole cells by asynchronous localization of photoswitching emitters. Biophysical journal, 93(9):3285–3290, 2007.
  • [9] Thibaud Ehret, Axel Davy, Jean-Michel Morel, and Mauricio Delbracio. Image anomalies: A review and synthesis of detection methods. Journal of Mathematical Imaging and Vision, 61:710–743, 2019.
  • [10] Amitay Eldar, Boris Landa, and Yoel Shkolnisky. KLT picker: Particle picking using data-driven optimal templates. Journal of structural biology, 210(2):107473, 2020.
  • [11] Amitay Eldar, Keren Mor Waknin, Samuel Davenport, Tamir Bendory, Armin Schwartzman, and Yoel Shkolnisky. Object detection under the linear subspace model with application to cryo-EM images. arXiv preprint arXiv:2405.00364, 2024.
  • [12] Joachim Frank. Three-dimensional electron microscopy of macromolecular assemblies: visualization of biological molecules in their native state. Oxford university press, 2006.
  • [13] Akinori Fujino, Hideki Isozaki, and Jun Suzuki. Multi-label text categorization with model combination based on f1-score maximization. In Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-II, 2008.
  • [14] Munenori Fukunishi, Daniel Mcduff, and Norimichi Tsumura. Improvements in remote video based estimation of heart rate variability using the Welch FFT method. Artificial Life and Robotics, 23:15–22, 2018.
  • [15] Claudia Geisler, Andreas Schönle, C Von Middendorff, Hannes Bock, Christian Eggeling, Alexander Egner, and SW Hell. Resolution of λ\lambda/10 in fluorescence microscopy using fast single molecule photo-switching. Applied Physics A, 88:223–226, 2007.
  • [16] Christopher R Genovese, Nicole A Lazar, and Thomas Nichols. Thresholding of statistical maps in functional neuroimaging using the false discovery rate. Neuroimage, 15(4):870–878, 2002.
  • [17] Abbas Ghaddar and Philippe Langlais. Robust lexical features for improved neural network named-entity recognition. arXiv preprint arXiv:1806.03489, 2018.
  • [18] Per Christian Hansen. Analysis of discrete ill-posed problems by means of the L-curve. SIAM review, 34(4):561–580, 1992.
  • [19] Ayelet Heimowitz, Joakim Andén, and Amit Singer. APPLE picker: Automatic particle picking, a low-effort cryo-EM framework. Journal of structural biology, 204(2):215–227, 2018.
  • [20] Hao Huang, Haihua Xu, Xianhui Wang, and Wushour Silamu. Maximum F1-score discriminative training criterion for automatic mispronunciation detection. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 23(4):787–797, 2015.
  • [21] Andrii Iudin, Paul K Korir, José Salavert-Torres, Gerard J Kleywegt, and Ardan Patwardhan. EMPIAR: a public archive for raw electron microscopy image data. Nature methods, 13(5):387–388, 2016.
  • [22] Chia-Hsueh Lee and Roderick MacKinnon. Structures of the human HCN1 hyperpolarization-activated channel. Cell, 168(1):111–120, 2017.
  • [23] Kevin Eric Leyton-Brown. Resource allocation in competitive multiagent systems. PhD thesis, Stanford University, 2003.
  • [24] Arif Mahmood and Sohaib Khan. Correlation-coefficient-based fast template matching through partial elimination. IEEE Transactions on image processing, 21(4):2099–2108, 2011.
  • [25] Amichai Painsky. Quality assessment and evaluation criteria in supervised learning. Machine Learning for Data Science Handbook: Data Mining and Knowledge Discovery Handbook, pages 171–195, 2023.
  • [26] Amichai Painsky and Saharon Rosset. Exclusive row biclustering for gene expression using a combinatorial auction approach. In 2012 IEEE 12th International Conference on Data Mining, pages 1056–1061. IEEE, 2012.
  • [27] Amichai Painsky and Saharon Rosset. Optimal set cover formulation for exclusive row biclustering of gene expression. Journal of Computer Science and Technology, 29(3):423–435, 2014.
  • [28] BV P Prasad and Velusamy Parthasarathy. Detection and classification of cardiovascular abnormalities using FFT based multi-objective genetic algorithm. Biotechnology & Biotechnological Equipment, 32(1):183–193, 2018.
  • [29] Sergio Rapuano and Fred J Harris. An introduction to FFT and time domain windows. IEEE instrumentation & measurement magazine, 10(6):32–44, 2007.
  • [30] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems, 28, 2015.
  • [31] Mordechai Roth, Amichai Painsky, and Tamir Bendory. Detecting non-overlapping signals with dynamic programming. Entropy, 25(2):250, 2023.
  • [32] Michael H Rothkopf, Aleksandar Pekeč, and Ronald M Harstad. Computationally manageable combinational auctions. Management science, 44(8):1131–1147, 1998.
  • [33] Tuomas Sandholm. Algorithm for optimal winner determination in combinatorial auctions. Artificial intelligence, 135(1-2):1–54, 2002.
  • [34] Armin Schwartzman, Yulia Gavrilov, and Robert J Adler. Multiple testing of local maxima for detection of peaks in 1D. Annals of statistics, 39(6):3290, 2011.
  • [35] Amit Singer and Fred J Sigworth. Computational methods for single-particle electron cryomicroscopy. Annual review of biomedical data science, 3:163–190, 2020.
  • [36] Jonathan E Taylor and Keith J Worsley. Detecting sparse signals in random fields, with an application to brain mapping. Journal of the American Statistical Association, 102(479):913–928, 2007.
  • [37] Robert Tibshirani, Guenther Walther, and Trevor Hastie. Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 63(2):411–423, 2001.
  • [38] Keith J Worsley, Sean Marrett, Peter Neelin, Alain C Vandal, Karl J Friston, and Alan C Evans. A unified statistical approach for determining significant signals in images of cerebral activation. Human brain mapping, 4(1):58–73, 1996.