
A Baseline Statistical Method For Robust User-Assisted Multiple Segmentation

Hüseyin Afşer (submission date: 03/01/2022)
       Hüseyin Afşer is with the Department of Electrical and Electronics Engineering, Adana Alparslan Türkeş Science and Technology University, 01250, Adana, Turkey (e-mail: [email protected]).
Abstract

Recently, several image segmentation methods that accept and leverage different types of user assistance have been developed. In these methods, the user input can be provided by drawing bounding boxes over image objects, by drawing scribbles or planting seeds that help to differentiate image regions, or by interactively refining missegmented regions. Due to the variety in the types and amounts of these inputs, the relative assessment of different segmentation methods becomes difficult. As a possible solution, we propose a simple yet effective statistical segmentation method that can handle and utilize different input types and amounts. The proposed method is based on robust hypothesis testing, specifically the DGL test, and can be implemented with time complexity that is linear in the number of pixels and quadratic in the number of image regions. Therefore, it is suitable as a baseline method for quickly benchmarking and assessing the relative performance improvements of different types of user-assisted segmentation algorithms. We provide a mathematical analysis of the operation of the proposed method, discuss its capabilities and limitations, provide design guidelines, and present simulations that validate its operation.

Index Terms:
Image Segmentation, User-Assisted Segmentation, Interactive Segmentation, Multiple Instance Segmentation, Robust Hypothesis Testing, DGL Test.

I Introduction

In image segmentation the aim is to assign a label to every image pixel such that pixels with the same label exhibit similar or distinctive properties. These properties can be low-level features, such as color and texture, or high-level semantic descriptions.

Image segmentation is, in general, a challenging problem. Therefore, depending on the target application and specifications, there exist several approaches to segmentation that differ in their capabilities and limitations [1]. To ease this non-trivial task, numerous methods that accept and utilize different types of user assistance have recently been developed. For example, in Graph Cuts [2], Random Walk [3] and their many variants [4, 5, 6, 7, 8, 9, 10, 11, 12], the user inputs are seed points or scribbles. In GrabCut [13] and its variants [14, 15, 16, 17], the user inputs are bounding boxes, and these methods may also incorporate additional user inputs to refine the segmentation process. A detailed survey on user-assisted segmentation methods can be found in [18].

The variant algorithms cited above are often developed with the intention of improving the performance of the original method [4, 5, 6, 7, 9, 8, 16, 11], decreasing or easing the human interaction [16, 10, 7, 15, 8], or providing robustness to the user input [5, 14, 15, 12]. However, a great majority of these methods are limited to binary, i.e., foreground/background, segmentation. Moreover, due to their construction and inherent operating mechanisms, they work best under a specific input type, and their performance improves with the amount or precision of the user inputs. Therefore, it becomes quite difficult to assess the relative performance improvements of these methods within a unified framework. This difficulty only increases if their varying complexities are also taken into account.

With the intention of providing a common ground to different user-assisted segmentation methods, we present a novel, minimalistic, multiple segmentation algorithm based on robust hypothesis testing [19, 20], specifically the DGL test [21, 22, 23]. Being algorithmically simple, robust to different input types and having a complexity that is linear in the number of pixels, the proposed method is suitable as a baseline for quick benchmarking purposes. The main contributions of this paper are:

1) We show that one can take advantage of the inherent robustness of the DGL test in user-assisted image segmentation. Some user input types that we consider are:

i) a fraction of labeled pixels within ground truth masks or bounding boxes

ii) randomly selected seed points within ground truth masks

iii) randomly perturbed bounding boxes

2) The proposed method is algorithmically simple and can be coded in around 30 lines of Matlab. It can be efficiently implemented with a time complexity that is linear in the number of pixels and quadratic in the number of image labels.

3) We provide a mathematical analysis of the operation of the proposed method.

4) We present simulations on Berkeley’s BSDS 500 database [24] that validate the robust operation of the proposed method.

5) Beyond the initial user input, the accuracy gains of the proposed method per additional user input can easily be benchmarked, and the accuracy can be increased arbitrarily with further input.

II Preliminaries

II-A Problem Statement

Consider segmenting an image $I:\Omega\rightarrow\mathbb{R}^{d}$ with $N$ pixels, $|\Omega|=N$, into $M$ disjoint and collectively exhaustive regions $\Omega_{1},\Omega_{2},\ldots,\Omega_{M}$, where a region may consist of several topologically separated components. Let $X$ denote an arbitrary pixel in $\Omega$ and $I(X)$ its intensity. We adopt the probabilistic formulation in [26] and assume that pixel intensities are independent and identically distributed (i.i.d.), with a distinct distribution in each $\Omega_{i}$, $i=1,2,\ldots,M$. More formally,

$\{I(X)\,|\,X\in\Omega_{i}\}\sim P_{i},\qquad i=1,2,\ldots,M,$ (1)

where, in general, $P_{i}:\mathbb{R}^{d}\rightarrow[0,1]$. The assumed model is depicted in Figure 1. For grayscale images $d=1$, and for RGB, HSV, etc., images $d=3$. We will also consider digital images where pixel intensities are quantized along $d$ dimensions to form a discrete product alphabet ${\cal X}={\cal X}_{1}\times{\cal X}_{2}\times\cdots\times{\cal X}_{d}$ so that $I:\Omega\rightarrow{\cal X}$ and $P_{i}:{\cal X}\rightarrow[0,1]$.

Figure 1: Illustration of the assumed image model for the case $M=4$. The image consists of 4 regions $\Omega_{1},\Omega_{2},\Omega_{3},\Omega_{4}$, where the distributions of pixel intensities in each region are distinct.

The aim in segmentation is to devise a labeling rule $L:\Omega\rightarrow\{1,2,\ldots,M\}$ such that $L(X)=i$ is chosen if $X\in\Omega_{i}$.

II-B DGL Test

The DGL test [21] is a robust $M$-ary hypothesis testing procedure for i.i.d. sequences. It can be used when the true distributions $P_{i}$, $i=1,2,\ldots,M$, that generate an observed sequence are not known, but nominal distributions $Q_{i}:\mathbb{R}^{d}\rightarrow[0,1]$ that are close to the true ones are available. The test is robust provided that there exists a positive $\Delta$ such that

$V(P_{i},Q_{i})\leq\left(\min_{i\neq j}V(Q_{i},Q_{j})-\Delta\right)/2$ (2)

where $V(P,Q)$ denotes the total variation distance between $P$ and $Q$. If the condition in (2) is satisfied, the resulting test has a probability of error that decreases exponentially in the length of the test sequence, uniformly over all $P_{i}$.
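For concreteness, the separation condition in (2) is easy to check numerically. The following minimal Matlab sketch (our illustration, not part of the original test definition; the cell array Q and the helper tv are hypothetical names) computes the total variation distance between discrete distributions stored as histograms and the margin $\min_{i\neq j}V(Q_{i},Q_{j})$:

tv = @(P, Q) 0.5 * sum(abs(P(:) - Q(:)));   % V(P,Q) for discrete distributions
M = numel(Q);                               % Q: cell array of nominal histograms
minTV = inf;
for i = 1:M-1
    for j = i+1:M
        minTV = min(minTV, tv(Q{i}, Q{j})); % min over all pairs i ~= j
    end
end
% Robustness per (2): any true P_i with V(P_i,Q_i) <= (minTV - Delta)/2 is tolerated.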

Recently, we investigated the DGL test in the discrete setting for statistical classification with labeled training sequences (see [23]). Our findings indicate that the achievable error exponent of the DGL test is sub-optimal; however, it provides consistent classification for any length of training sequences, provided that the unknown sources are separated in total variation. We will use the following corollary [23, Corollaries 1, 2] to analyze the performance of the proposed method.

Corollary 1.

Let ${\bf t}^{n_{t}}_{i}=[t_{i1},t_{i2},\ldots,t_{in_{t}}]$, $i=1,2,\ldots,M$, be a set of training sequences such that ${\bf t}^{n_{t}}_{i}$ is generated from distribution $P_{i}$. In the discrete setting where $P_{i}:{\cal X}\rightarrow[0,1]$, and when the empirical distribution of ${\bf t}^{n_{t}}_{i}$ is taken as the nominal distribution $Q_{i}$ in the DGL test, the error probability for testing a sequence of length $n$ is upper bounded as

$\Pr[e]\leq 2Me^{-n\left(\frac{\alpha\min_{i\neq j}V(P_{i},P_{j})^{2}}{2(2+\sqrt{\alpha})^{2}}-\max\left\{\frac{2\ln(M-1)}{n},\frac{|{\cal X}|\ln 2}{n}\right\}\right)}$ (3)

and

$\Pr[e]\leq 2Me^{-n\left(\frac{2\alpha\min_{i\neq j}V(P_{i},P_{j})^{2}}{(3|{\cal X}|+2\sqrt{\alpha})^{2}}-\max\left\{\frac{2\ln(M-1)}{n},\frac{\ln|{\cal X}|}{n}\right\}\right)}$ (4)

where $\alpha=n_{t}/n$.

The above bounds are not tight in general; however, their non-asymptotic nature helps in characterizing the error exponent for finite $n$ and $\alpha$.

III DGL Test-Based Segmentation

III-A Implementation

Figure 2: a) Dashed lines represent superpixel boundaries. b) Performing image segmentation at the superpixel level.

Let ${\cal T}_{1},{\cal T}_{2},\ldots,{\cal T}_{M}$ be $M$ sets of user-labeled image pixels. These pixels may be obtained from the ground truth masks, the bounding boxes, or from the vicinity of image seeds/scribbles, as we explain in detail in the next section. Let $\hat{Q}_{i}:{\cal X}\rightarrow[0,1]$, ${\cal X}={\cal X}_{1}\times{\cal X}_{2}\times\cdots\times{\cal X}_{d}$, $i=1,2,\ldots,M$, denote the possibly quantized intensity histograms of ${\cal T}_{i}$. More formally,

$\hat{Q}_{i}(q)=\frac{1}{|{\cal T}_{i}|}\sum_{X\in{\cal T}_{i}}\mathbb{I}_{\bar{I}(X)=q},\qquad q\in{\cal X},$ (5)

where $\mathbb{I}$ is an indicator function and $\bar{I}(X)$ denotes the rounded intensity of pixel $X$, with rounding towards the nearest edge of the histogram bin descriptor.
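As an illustration, the quantized histograms in (5) can be computed as follows for a two-dimensional alphabet. This is a minimal sketch under our own assumptions (intensities normalized to $[0,1]$, $B$ equidistant bins per dimension; the function name empiricalHistogram is hypothetical):

function Q = empiricalHistogram(T, B)
% T: |T|-by-2 matrix of user-labeled pixel intensities in [0,1]
% B: number of equidistant bins per dimension, so |X| = B^2
    edges = linspace(0, 1, B + 1);                  % bin edges
    b1 = discretize(T(:,1), edges);                 % bin index along X_1
    b2 = discretize(T(:,2), edges);                 % bin index along X_2
    Q = accumarray([b1 b2], 1, [B B]) / size(T, 1); % normalized counts, cf. (5)
end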

We perform image labeling at the superpixel level by first partitioning $\Omega$ into superpixels, $\Omega\rightarrow\{{\cal S}_{1},{\cal S}_{2},\ldots,{\cal S}_{K}\}$, $K\leq N$, and then applying the DGL test to each superpixel. The proposed approach is illustrated in Figure 2. Although the premise of superpixels is their boundary adherence, in practice this may not always hold, and assigning image labels at the superpixel level may result in some residual error, as can be seen in Figure 2b. However, their adoption simplifies the proposed approach, as they group adjacent pixels into perceptually similar cells that are easy to differentiate with the DGL test.
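In Matlab, the partitioning step can be realized with the built-in SLIC-based superpixels function of the Image Processing Toolbox; the sketch below (the file name is hypothetical) produces the label map and the pixel list of each cell ${\cal S}_{k}$:

img = imread('example.jpg');                 % hypothetical input image
K = 500;                                     % desired number of superpixels
[labelMap, Kactual] = superpixels(img, K);   % SLIC-based oversegmentation [25]
cells = label2idx(labelMap);                 % cells{k}: linear indices of pixels in S_k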

In order to apply the DGL test, one needs to calculate the $M(M-1)/2$ Borel sets $A_{i,j}\in{\cal A}$ that are of the form

$A_{i,j}=\left\{q:\hat{Q}_{i}(q)\geq\hat{Q}_{j}(q)\right\},\qquad 1\leq i<j\leq M,$ (6)

for all $q\in{\cal X}$. Then, letting $\{X_{1},X_{2},\ldots,X_{n}\}$ be the set of pixels in ${\cal S}_{k}$, the DGL test decides $\hat{L}({\cal S}_{k})=i$ if

$\max_{A\in{\cal A}}\left|\int_{A}\hat{Q}_{i}-\mu_{n}(A)\right|=\min_{j=1,\ldots,M}\max_{A\in{\cal A}}\left|\int_{A}\hat{Q}_{j}-\mu_{n}(A)\right|$ (7)

where $\mu_{n}(A)=\frac{1}{n}\sum_{i=1}^{n}\mathbb{I}_{\bar{I}(X_{i})\in A}$ and the integrals in (7) can be calculated numerically over ${\cal X}$.
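Before turning to the full pseudo-code, the decision rule (7) for a single superpixel can be sketched in a few lines of Matlab. In this illustrative fragment (variable names are ours), Q is a cell array of B-by-B histograms $\hat{Q}_{m}$ on a two-dimensional alphabet, and binIdx is an n-by-2 matrix holding the bin indices of the superpixel's pixels:

muFun = @(A) mean(A(sub2ind(size(A), binIdx(:,1), binIdx(:,2)))); % mu_n(A)
M = numel(Q);
t = zeros(M, 1);                               % DGL statistic per label
for i = 1:M-1
    for j = i+1:M
        A = Q{i} >= Q{j};                      % Borel set A_{i,j} of (6)
        muA = muFun(A);                        % empirical mass of S_k on A
        for m = 1:M
            QA = sum(Q{m}(A));                 % Q_m(A), integral over A
            t(m) = max(t(m), abs(QA - muA));   % running max over A in calA
        end
    end
end
[~, label] = min(t);                           % decide L(S_k) per (7)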

The pseudo-code for implementing the proposed segmentation algorithm in Matlab for the case $d=3$ is presented in Algorithm 1. The algorithm is minimalistic in the sense that it is built from 4 simple functions and can be coded in around 30 lines.

III-B Complexity

The superpixels ${\cal S}_{1},{\cal S}_{2},\ldots,{\cal S}_{K}$ can be calculated with complexity $O(N)$, e.g., by using the SLIC algorithm [25] in Matlab. The calculation of $\hat{Q}_{1},\hat{Q}_{2},\ldots,\hat{Q}_{M}$ can be done in $O(N)$ as well. Investigating Algorithm 1, we observe that the main loops in DGLSegment can be computed with time complexity $O(M^{2}N+M^{2}\log M)$ and require a space complexity of $O(M^{3}N)$. The initialization functions $Calc_{\cal A}$ and $Prob_{\cal A}$ can be computed with time complexity $O(M^{2}|{\cal X}|)$, ${\cal X}={\cal X}_{1}\times{\cal X}_{2}\times\cdots\times{\cal X}_{d}$, and require space complexities $O(M^{2}|{\cal X}|)$ and $O(M^{3}|{\cal X}|)$, respectively. Therefore, when $M$ is not exponential in $N$, the proposed method can be implemented with time complexity $O(M^{2}\max\{|{\cal X}|,N\})$ and space complexity $O(M^{3}\max\{|{\cal X}|,N\})$. When $|{\cal X}|>N$, the complexities are not linear in the number of pixels, which may be undesirable. In such cases, one can apply dimensionality reduction and implement the algorithm on ${\cal X}^{\prime}={\cal X}_{1}^{\prime}\times{\cal X}_{2}^{\prime}\times\cdots\times{\cal X}_{d^{\prime}}^{\prime}$, $d^{\prime}\leq d$, by forcing $|{\cal X}^{\prime}|\leq N$. This yields an algorithm with time complexity $O(M^{2}N)$ and space complexity $O(M^{3}N)$. Imposing dimensionality reduction may cause some performance degradation; however, as we show via simulations in the next section, this degradation may be negligible.
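A possible realization of this dimensionality reduction (a sketch under our assumptions; the file name is hypothetical) converts the image to HSV, discards the V channel, and quantizes H and S so that the reduced alphabet obeys $|{\cal X}^{\prime}|\leq N$:

img = im2double(imread('example.jpg'));      % hypothetical input image
hsv = rgb2hsv(img);                          % decouples chroma from shading [27, 1]
N = size(img,1) * size(img,2);               % number of pixels
B = floor(sqrt(N));                          % bins per channel, so B^2 <= N
edges = linspace(0, 1, B + 1);
h = discretize(hsv(:,:,1), edges);           % quantized H component
s = discretize(hsv(:,:,2), edges);           % quantized S component
% (h, s) now index the reduced alphabet X' = X'_1 x X'_2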

Function DGLSegment
    input: S_1, S_2, ..., S_K and Q̂_1, Q̂_2, ..., Q̂_M
    output: L̂(S_1), L̂(S_2), ..., L̂(S_K)
    A = Calc_A(Q̂_1, ..., Q̂_M);  P_A = Prob_A(Q̂_1, ..., Q̂_M, A)
    for k = 1:K do
        X = S_k;  t(1:M) = 0
        for i = 1:M-1 do
            for j = i+1:M do
                mu_A = Mean_n(X, A(:,:,:,i,j))          % mu_n(A_{i,j})
                for m = 1:M do
                    d(m,i,j) = abs(P_A(m,i,j) - mu_A)
                    t(m) = max(t(m), d(m,i,j))          % running max over A in calA
        L̂(S_k) = argmin(t)                              % decision rule (7)
    return
Function Calc_A
    input: Q̂_1, ..., Q̂_M
    output: A                                            % Borel sets of (6)
    for i = 1:M-1 do
        for j = i+1:M do
            [in_1, in_2, in_3] = find(Q̂_i(:,:,:) >= Q̂_j(:,:,:))
            for t = 1:length(in_1) do
                A(in_1(t), in_2(t), in_3(t), i, j) = 1
    return
Function Prob_A
    input: Q̂_1, ..., Q̂_M, A
    output: P_A with P_A(m,i,j) = Q̂_m(A_{i,j}), 1 <= i < j <= M, m = 1, ..., M
    for m = 1:M do
        for i = 1:M-1 do
            for j = i+1:M do
                P_A(m,i,j) = trapz(A(:,:,:,i,j) .* Q̂_m(:,:,:))
                % trapezoidal numerical integration over X_1 x X_2 x X_3
    return
Function Mean_n
    input: A in calA, X
    output: mu_n(A)
    n = length(X);  I = 0
    for i = 1:n do
        if A(X(i,1), X(i,2), X(i,3)) == 1 then
            I = I + 1
    mu_n = I / n
    return
Algorithm 1 Robust User-Assisted Segmentation

III-C Performance

In addition to the probabilistic model adopted in Section II-A, let us also assume that the superpixels adhere perfectly to the region boundaries, i.e., ${\cal S}_{k}\subset\Omega_{i}$ for some $i=1,2,\ldots,M$, $k=1,2,\ldots,K$. Then, the following result is a consequence of Corollary 1.

Corollary 2.

Let $n_{min}=\min\{|{\cal T}_{1}|,|{\cal T}_{2}|,\ldots,|{\cal T}_{M}|\}$, assuming the pixels are selected from ground truth masks. Also, let ${\cal X}={\cal X}_{1}\times{\cal X}_{2}\times\cdots\times{\cal X}_{d}$ be the product alphabet for the image intensities, i.e., $I:\Omega\rightarrow{\cal X}$. Then, the probability of mislabeling ${\cal S}_{k}=\{X_{1},X_{2},\ldots,X_{n}\}$ with the proposed method obeys the upper bounds in (3) and (4) with $\alpha=n_{min}/n$.

The bounds in (3) and (4) imply that the error probability decreases exponentially in the size $n$ of the superpixel. Therefore, the image should be decomposed into superpixels whose sizes are as large as possible. However, due to the imperfect boundary adherence of superpixels and the desired granularity in the image, the number of superpixels should be adjusted so that there exist a few of them in the smallest $\Omega_{i}$. Next, we observe that the exponent term in (3) is positive provided that

$n\geq\frac{2(2+\sqrt{\alpha})^{2}|{\cal X}|\ln 2}{\alpha\min_{i\neq j}V(P_{i},P_{j})^{2}}$ (8)

which implies that the performance depends on the relative sizes of the alphabet and the superpixels as well. If $|{\cal X}|$ scales faster than $n$, the bound in (4) can be used, and it ensures a positive error exponent provided that $\sqrt{\alpha}$ scales faster than $|{\cal X}|$, i.e., $n_{min}\geq|{\cal X}|^{2}n$. Therefore, by increasing the amount of user-provided labeled pixels from each $\Omega_{i}$, one can ensure the correct operation of the proposed test for any $|{\cal X}|$ and $n$.
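For a rough sense of scale (the numbers here are our own illustrative assumptions, not taken from the simulations), suppose $\alpha=10$, $\min_{i\neq j}V(P_{i},P_{j})=0.5$ and $|{\cal X}|=100$. Then (8) requires $n\geq 2(2+\sqrt{10})^{2}\cdot 100\cdot\ln 2/(10\cdot 0.25)\approx 1478$ pixels per superpixel, which shows why aggressive quantization (small $|{\cal X}|$) and generous user labeling (large $\alpha$) are both favorable.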

When the user inputs are labeled pixels from bounding boxes, $\hat{Q}_{i}$ is no longer an empirical density of $P_{i}$ but a mixture. In this case, the proposed method is still robust with a positive error exponent provided that $V(P_{i},\hat{Q}_{i})\leq(\min_{i\neq j}V(\hat{Q}_{i},\hat{Q}_{j})-\Delta)/2$ for some positive $\Delta$. However, as the bounding boxes get looser, the mixing of distributions increases and thus $\min_{i\neq j}V(\hat{Q}_{i},\hat{Q}_{j})$ decreases. This, in turn, degrades the performance of the proposed method.

Figure 3: The performance of the proposed method on two relatively difficult images from the BSDS 500 database. In each row: a) the original image, b) the ground truth, c) to g) the outputs of DGL$_{GT}^{100\%}$, DGL$_{GT}^{50\%}$, DGL$_{GT}^{15\,pts}$, DGL$_{BB}^{50\%}$ and DGL$_{BB}^{10\%pt}$, respectively.

IV Simulations

The proposed segmentation method is evaluated on the test images of Berkeley's BSDS 500 database [24]. This set consists of 200 natural RGB images with multiple ground truth annotations. The images have size $321\times481$, and each color intensity is represented with 8 bits, i.e., $|{\cal X}|=256^{3}$. As an accuracy measure we consider the intersection over union (IoU), the ratio of correctly labeled pixels to all labeled pixels (we consider 90% of the labeled pixels due to oversegmentation in the images). We estimate the IoU by averaging it over multiple images and multiple ground truths.

Let GT$_{i}$ and BB$_{i}$, $i=1,2,\ldots,M$, denote the ground truth masks and (tight) bounding boxes of $\Omega_{i}$. First, we form ${\cal T}_{i}$ from a percentage $f\%$, $f\in\{25,50,75,100\}$, of randomly selected labeled pixels from GT$_{i}$ and BB$_{i}$. We denote these methods by DGL$_{GT}^{f\%}$ and DGL$_{BB}^{f\%}$, respectively. Next, we consider the case where $t$, $t\in\{10,15,20\}$, random pixel seeds in $\Omega_{i}$ are selected, and the case where the bounding boxes are perturbed by $p\%$, $p\in\{5,10,15\}$. These methods are denoted by DGL$_{GT}^{t\,pts}$ and DGL$_{BB}^{p\%pt}$, respectively. In the former, ${\cal T}_{i}$ is constructed from the pixels within a square whose center is the seed point in $\Omega_{i}$ and whose side length is $l$ (we used $l=50$ for BSDS images). In the latter, the two corner points $(r_{1},c_{1})$ and $(r_{2},c_{2})$ of a bounding box that touch its diagonal are uniformly perturbed by $p\%$ to obtain $(\hat{r}_{1},\hat{c}_{1})$ and $(\hat{r}_{2},\hat{c}_{2})$, respectively. Here, $\hat{r}_{1}$ is selected uniformly over $[r_{1}-\frac{(r_{2}-r_{1})p}{200},\,r_{1}+\frac{(r_{2}-r_{1})p}{200}]$, and we follow the same approach to obtain $\hat{c}_{1},\hat{r}_{2}$ and $\hat{c}_{2}$.
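The bounding box perturbation just described can be sketched in a few lines of Matlab; this is our illustrative implementation of the stated procedure (the function name perturbBox is hypothetical):

function [r1h, c1h, r2h, c2h] = perturbBox(r1, c1, r2, c2, p)
% Uniformly perturb the diagonal corner points of a box by p percent.
    dr = (r2 - r1) * p / 200;        % half-range of the row perturbation
    dc = (c2 - c1) * p / 200;        % half-range of the column perturbation
    r1h = r1 - dr + 2 * dr * rand;   % uniform over [r1 - dr, r1 + dr]
    c1h = c1 - dc + 2 * dc * rand;
    r2h = r2 - dr + 2 * dr * rand;
    c2h = c2 - dc + 2 * dc * rand;
end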

The implementation of the proposed algorithm in the RGB color space requires memory proportional to $M^{3}\times256^{3}$, which becomes impractical for images with $M>10$. Thus, we use dimensionality reduction by implementing the proposed algorithm in the HSV color space and considering only the H and S components of the image. This approach has the advantage of decoupling the chromatic information from the shading effect [27, 1], and the resulting algorithm showed negligible performance degradation when $|{\cal X}_{1}|=|{\cal X}_{2}|=1024$, where ${\cal X}_{1}$ and ${\cal X}_{2}$ correspond to equidistant bins for the H and S components, respectively. The performance of this approach is demonstrated in Figure 3 for two relatively difficult images, where we used the SLIC framework [25] with $K=500$ superpixels. Inspecting the images, we observe that mislabeling occurs when i) regions with similar intensity distributions are annotated differently due to the presence of a contour, e.g., the sky in the top image and the tree branches in the bottom image, and ii) the intensity distributions mix while constructing ${\cal T}_{i}$ in DGL$_{GT}^{t\,pts}$, DGL$_{BB}^{f\%}$ and DGL$_{BB}^{p\%pt}$, e.g., the mislabeled superpixels in the body and the wing of the mother bird.

For benchmarking, we reduced the alphabet size and used the algorithm with $|{\cal X}_{1}|=|{\cal X}_{2}|=\lfloor\sqrt{321\times481}\rfloor=392$ so that it has linear complexity in the number of pixels. This reduction resulted in some degradation in accuracy, around 3%-5%. We also investigated the effect of additional user inputs by assuming that a genie-aided user optimally relabels the mislabeled (or partially mislabeled) superpixels, which can be performed with at most two point clicks per superpixel. With this approach, the increase in accuracy per additional user input (click) can easily be calculated during benchmarking without actually performing the corrections.
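The genie-aided accounting can be sketched as follows (our illustration; gtLabels, predLabel and cells are hypothetical variables holding the per-pixel ground truth labels, the per-superpixel decisions and the superpixel pixel lists): each mislabeled superpixel is charged at most two clicks.

clicks = 0;
for k = 1:K
    gt = mode(gtLabels(cells{k}));   % majority ground-truth label of S_k
    if predLabel(k) ~= gt
        clicks = clicks + 2;         % upper bound: two point clicks per cell
    end
end
% The accuracy gain per click can then be tabulated without performing the edits.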

The simulation results are presented in Figure 4. Recall that ${\cal T}_{i}$ consists of labeled pixels from $\Omega_{i}$ in DGL$_{GT}^{f\%}$; therefore, DGL$_{GT}^{100\%}$ exhibits the ultimate performance of the proposed method. Inspecting Figure 4, we observe that the performances of DGL$_{GT}^{100\%}$, DGL$_{GT}^{75\%}$ and DGL$_{GT}^{50\%}$ are almost the same, and one can reach an initial segmentation accuracy around 0.9 when only half of the labeled pixels in the ground truth masks are provided by the user. This performance is promising given the simplicity of the proposed method. As expected from the discussion in Section III-C, the performance deteriorates for DGL$_{BB}^{f\%}$, DGL$_{GT}^{t\,pts}$ and DGL$_{BB}^{p\%pt}$. However, this deterioration is consistent within each of the considered user input types, which indicates the robustness of the proposed method.

Figure 4: Benchmarking results on the BSDS 500 database.

V Conclusion

We have presented a novel, user-assisted, multiple segmentation method based on the DGL testing procedure. The proposed method is shown to be robust under different user input types, such as labeled pixels and seeds from ground truth masks, or bounding boxes that may be perturbed. Having linear time complexity in the number of pixels and requiring minimal coding, around 30 lines, the proposed method is suitable for fast benchmarking purposes. In this work, we have only utilized the empirical distributions, i.e., histograms, of the user input regions. Incorporating the spatial information of these regions, e.g., as in [28], into the DGL test may improve the performance of the proposed method and is the topic of our future work.

References

  • [1] R. C. Gonzalez, R. E. Woods and S. L. Eddins “Digital Image Processing Using Matlab”, Prentice Hall, 2003.
  • [2] Y. Y. Boykov and M. P. Jolly, “Interactive graph cuts for optimal boundary & region segmentation of objects in N-D images,” Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001, 2001, pp. 105-112, vol.1.
  • [3] L. Grady, “Random Walks for Image Segmentation,” in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 11, pp. 1768-1783, Nov. 2006.
  • [4] T. Wang, Z. Ji, Q. Sun, Q. Chen and S. Han, “Image segmentation based on weighting boundary information via graph cut,” Journal of Visual Communication and Image Representation, vol. 33, pp. 10-19, 2015.
  • [5] B. L. Price, B. Morse and S. Cohen, “Geodesic graph cut for interactive image segmentation,” 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 3161-3168, 2010.
  • [6] S. Vicente, V. Kolmogorov and C. Rother, “Graph cut based image segmentation with connectivity priors,” 2008 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1-8, 2008.
  • [7] Y. Li, J. Sun, C. K. Tang, H. Y. Shum, “Lazy snapping”, ACM Transactions on Graphics, vol. 23, no. 3, pp. 303–308, 2004.
  • [8] B. Peng, L. Zhang, D. Zhang, J. Yang, “Image segmentation by iterated region merging with localized graph cuts,” Pattern Recognition, vol: 44, issue: 10–11, pp. 2527-2538, 2011.
  • [9] W. Yang, J. Cai, J. Zheng and J. Luo, “User-Friendly Interactive Image Segmentation Through Unified Combinatorial User Inputs,” in IEEE Transactions on Image Processing, vol. 19, no. 9, pp. 2470-2479, Sept. 2010.
  • [10] C. G. Bampis, P. Maragos and A. C. Bovik, “Graph-Driven Diffusion and Random Walk Schemes for Image Segmentation,” in IEEE Transactions on Image Processing, vol. 26, no. 1, pp. 35-50, Jan. 2017.
  • [11] X. Dong, J. Shen, L. Shao and L. Van Gool, “Sub-Markov Random Walk for Image Segmentation,” in IEEE Transactions on Image Processing, vol. 25, no. 2, pp. 516-527, Feb. 2016.
  • [12] C. G. Bampis and P. Maragos, “Unifying the random walker algorithm and the SIR model for graph clustering and image segmentation,” 2015 IEEE International Conference on Image Processing (ICIP), pp. 2265-2269, 2015.
  • [13] C. Rother, V. Kolmogorov and A. Blake, “Grabcut: Interactive foreground extraction using iterated graph cuts,” ACM Transactions on Graphics, vol. 23, no. 3, pp. 309–314, 2004.
  • [14] H. Yu, Y. Zhou, H. Qian, M. Xian and S. Wang, “Loosecut: Interactive image segmentation with loosely bounded boxes,” 2017 IEEE International Conference on Image Processing (ICIP), pp. 3335-3339, 2017.
  • [15] S. Wu, M. Nakao and T. Matsuda, “SuperCut: Superpixel Based Foreground Extraction With Loose Bounding Boxes in One Cutting,” in IEEE Signal Processing Letters, vol. 24, no. 12, pp. 1803-1807, Dec. 2017.
  • [16] M. Tang, L. Gorelick, O. Veksler and Y. Boykov, “GrabCut in One Cut,” 2013 IEEE International Conference on Computer Vision, pp. 1769-1776, 2013.
  • [17] M. Rajchl et al, “DeepCut: Object Segmentation From Bounding Box Annotations Using Convolutional Neural Networks,” in IEEE Transactions on Medical Imaging, vol. 36, no. 2, pp. 674-683, Feb. 2017.
  • [18] H. Ramadan, C. Lachqar and H. Tairi, “A survey of recent interactive image segmentation methods,” Computational Visual Media, vol. 6, pp. 355-384, 2020.
  • [19] P. J. Huber, “Robust Statistics”, New York J. Wiley, 1981.
  • [20] G. Gül, “Robust and Distributed Hypothesis Testing”, Springer, 2018.
  • [21] L. Devroye, L. Gyorfi and G. Lugosi, “A note on robust hypothesis testing,” IEEE Trans. Inform. Theory, vol. 48, no. 7, pp. 2111-2114, 2002.
  • [22] E. Biglieri and L. Gyorfi, “Some remarks on robust binary hypothesis testing”, IEEE Inter. Symp. on Inform. Theory, pp. 566-570, 2014.
  • [23] H. Afşer, “Statistical Classification via Robust Hypothesis Testing: Non-Asymptotic and Simple Bounds,” in IEEE Signal Processing Letters, vol. 28, pp. 2112-2116, 2021.
  • [24] P. Arbeláez, M. Maire, C. Fowlkes and J. Malik, “Contour Detection and Hierarchical Image Segmentation,” in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 5, pp. 898-916, May 2011.
  • [25] R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua and S. Süsstrunk, “SLIC Superpixels Compared to State-of-the-Art Superpixel Methods,” in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 11, pp. 2274-2282, Nov. 2012.
  • [26] J. Kim, J. W. Fisher, A. Yezzi, M. Cetin and A. S. Willsky, “A nonparametric statistical method for image segmentation using information theory and curve evolution,” in IEEE Transactions on Image Processing, vol. 14, no. 10, pp. 1486-1502, Oct. 2005.
  • [27] M. Mignotte, “Segmentation by Fusion of Histogram-Based KK-Means Clusters in Different Color Spaces,” in IEEE Transactions on Image Processing, vol. 17, no. 5, pp. 780-787, May 2008.
  • [28] C. Nieuwenhuis and D. Cremers, “Spatially Varying Color Distributions for Interactive Multilabel Segmentation,” in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 5, pp. 1234-1247, May 2013.