
Spectrum Congruency of Multiscale Local Patches for Edge Detection

Fang Yang1, Xin Su2*, Li Chai1
1 Engineering Research Center of Metallurgical Automation and Measurement Technology, Wuhan University of Science and Technology, 430081, Wuhan, China
2 School of Remote Sensing and Information Engineering, Wuhan University, 430079, Wuhan, China
Abstract

This paper proposes a novel feature called spectrum congruency for describing edges in images. The spectrum congruency is a generalization of the phase congruency, which depicts how much the Fourier components of an image are congruent in phase. Instead of using the fixed bases of phase congruency, the spectrum congruency measures the congruency of the energy distribution of multiscale patches in a data-driven transform domain, which is more adaptable to the input images. Multiscale image patches are used to acquire different frequency components for modeling the local energy and amplitude. The spectrum congruency coincides nicely with how human vision perceives features and provides a more reliable way of detecting edges. Unlike most existing differential-based multiscale edge detectors, which simply combine multiscale information, our method focuses on exploiting the correlation of the multiscale patches based on their local energy. We test the proposed method on synthetic and real images, and the results demonstrate that our approach is practical and highly robust to noise.

Index Terms:
Spectrum Congruency; Edge Detection; Multiscale; Local Energy Model; Transform Domain;

I Introduction

Edge detection is one of the most fundamental and essential tasks in the field of computer vision. Edges play an essential role and act as the basis of a number of tasks such as image segmentation and object recognition [1, 2, 3, 4, 5, 6]. From the viewpoint of image processing, an edge is a place where there is a sharp change or discontinuity in intensity or brightness. To find the discontinuity, most edge detection methods focus on the maxima of the absolute value of the first-order derivative or the zero crossings of the second-order derivative. In addition, methods based on signal decomposition or wavelets are also popular in edge detection [7, 8].

The last decades have witnessed a huge growth in the development of edge detection methods. Existing edge detectors can be roughly divided into two classes: single-scale and multi-scale methods [9]. Chronologically, early single-scale edge detection approaches aim at identifying points where the image brightness changes sharply, such as the Roberts operator [10], the Sobel operator [11] and the Prewitt operator [12]. These detectors use simple gradient operators to detect edges; they are very sensitive to noise and cannot provide precise edge locations. Marr and Hildreth then proposed to search for the zero crossings of the second-order derivative of an image, which helps to improve the performance [13]. However, the Marr-Hildreth algorithm does not behave well at places where the intensity varies, and it generates localization errors. The Canny detector [14] has been the most widely used edge detector since it was proposed. It aims at an optimal edge detection algorithm, meaning that the algorithm 1) should mark as many real edges as possible in the image; 2) should localize the marked edges accurately on the true edges; 3) should mark each edge no more than once and should not mark noise as edges. The Canny detector preprocesses the image with a Gaussian kernel, which acts as a scale selection. With a small-scale Gaussian kernel, the Canny detector can provide detailed intensity variation information, but it is prone to noise. A large-scale Gaussian kernel makes the Canny detector robust to noise, but the localization precision is degraded [15, 16]. Single-scale edge detectors are actually designed to detect step edges. However, in natural images one generally finds spatially scaled edges whose intensity discontinuities vary over different widths [17]. Hence, single-scale edge detectors do not conform to actual situations and are not sufficient for edge detection. Multiscale techniques have therefore been employed to describe and synthesize the varieties of edges [18, 19].

Multiscale approaches can be implemented either in the spatial domain or in the transform domain. In the spatial domain, the edge strength maps are usually obtained by differential-based methods, as in the single-scale case. For example, Bao et al. proposed a scale multiplication function to fuse the responses of the Canny detector at different scales [20]. Shui et al. proposed an edge detection method based on the multiplication of isotropic and anisotropic Gaussian directional derivatives [21]. Wang et al. detected edges using the first-order derivative of the anisotropic Gaussian kernel, which improves the robustness to noise for small-scale kernels [9]. In [7], the authors proposed a multiscale edge detection method in the Gaussian scale space based on Gaussian smoothing and put forward a coarse-to-fine tracking algorithm to combine the results at each scale. However, differential-based edge detection methods are always sensitive to variations in image illumination, and it is hard to determine their threshold values. Besides, only a few works explore the fusion of multiscale information [7]. Apart from scale multiplication, morphological dilation and multiplication were also proposed to fuse the multiscale information [22]. These methods simply combine the multiscale information but do not provide an inner, deeper explanation. In this paper, to better exploit the correlation between multiscale information, we fuse the multiscale information in the transform domain according to the local energy model.

In [23, 24], the authors found that, biologically and physically, edges in images can be defined as places where the Fourier components are maximally congruent in phase. This phenomenon is called phase congruency. It is shown in [25] that phase congruency is consistent with the human visual system, i.e., it explains a number of psychophysical effects in how humans perceive features in images [26, 27]. Therefore, it is invariant to image brightness and illumination. Basically, the phase congruency is computed in the following three stages: 1) use several pairs of quadrature filters with different scales to obtain the low-, mid- and high-frequency filter templates; 2) convolve the templates with the input image, which yields the information in each frequency band; 3) compute the phase congruency according to the pixel local energy, the amplitude, the estimated noise energy and a weighting function.

Phase congruency can be understood as salient features that have similar values of the local phase when observed at different scales [28]. It can also be regarded as an edge strength map. As a low-level image feature, phase congruency is favored and widely applied in many kinds of image vision tasks because of its invariance to illumination and consistency with the human visual system, such as image quality assessment [29], multi-modal image analysis [30, 31], pattern recognition and object detection [32, 33, 34, 35], and image registration [36, 37]. However, existing measurements of phase congruency are sensitive to noise, even though they take noise compensation into account. Moreover, they generate glitch artifacts when the input image is noisy because they integrate the response values from multiple orientations, which may lead to spurious edges [27, 38]. Besides, the phase congruency model extracts information of different frequency bands by convolving the image with Log-Gabor filters of different scales. However, the Log-Gabor filter is characterized by Gaussian kernels, which attenuates some high-frequency components. Therefore, a weighting function is needed to devalue phase congruency at locations where the spread of filter responses is narrow [27], which may reduce the precision of feature localization.

In this paper, we are motivated to find a proper data-driven transform domain in which to represent the input image and compute the local energy. The new measurement is called spectrum congruency, and it is a generalization of the phase congruency. With data-driven bases, we do not need to integrate the filter response values from multiple orientations, thus avoiding the glitch artifact. In addition, the spectrum congruency extracts different frequency information by using multiscale patches and retains all frequency information; hence, the weighting function is unnecessary. The spectrum congruency is more adaptable to the input image, which makes the detected features more reliable. To the best of our knowledge, there are no results in the literature on applying a data-driven method to measure the local energy and phase congruency.

In recent years, with the rapid development of big data, learning-based [2, 39, 40, 41] edge detection methods, especially deep learning-based approaches [42, 43, 44, 45, 46], have achieved great success. Despite their superior performance, it is generally acknowledged that such techniques require a large amount of training data. Moreover, they suffer from a lack of interpretability, and their computation is complicated and time-consuming. Therefore, it is worth developing model-based edge detection methods that quickly obtain accurate results without supervision or training data.

The remainder of this paper is organized as follows. Section II introduces the local energy model and how it is applied to compute the phase congruency feature. Section III presents the proposed spectrum congruency based on the data-driven local energy model in detail. Section IV analyzes the effect of patch sizes and demonstrates the effectiveness of our method on real image data. Section V concludes the paper.

II Local Energy Model and Phase Congruency

The original way to calculate the phase congruency of a signal is to compute the phase difference of the Fourier components, which is not efficient. It is indicated in [47] that points of maximum phase congruency are located at the peaks of local energy. Therefore, to simplify the calculation, Venkatesh and Owens proposed to measure phase congruency according to the local energy E(x):

E(x)=\sqrt{F^{2}(x)+H^{2}(x)}, (1)

where F(x) is the signal f(x) without its DC component, and H(x) is the Hilbert transform of F(x). The phase congruency of the signal is equivalent to its local energy scaled by the reciprocal of the sum of the Fourier amplitudes [47]:

PC(x)=\frac{E(x)}{\sum_{n}A_{n}(x)+\epsilon}, (2)

where A_{n}(x) represents the amplitude of the n^{th} Fourier component and \epsilon is a small positive value that keeps the denominator from being zero.
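For illustration, a minimal NumPy sketch of Eqs.(1)-(2) for a 1D signal is given below; the function name and the amplitude normalization (per-component amplitudes from the real FFT) are our own assumptions, not reference code from [47]:

import numpy as np
from scipy.signal import hilbert

def phase_congruency_1d(f, eps=1e-8):
    # Hedged sketch of Eqs. (1)-(2); a hypothetical helper, not the authors' code.
    F = f - f.mean()                             # signal without its DC component
    H = np.imag(hilbert(F))                      # Hilbert transform of F
    E = np.sqrt(F**2 + H**2)                     # local energy, Eq. (1)
    A = 2.0 * np.abs(np.fft.rfft(F)) / len(F)    # per-component Fourier amplitudes (assumed scaling)
    return E / (A.sum() + eps)                   # Eq. (2)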

However, the Fourier transform is insufficient to determine local frequency information [26]. This is because the Fourier transform does not take the spread of frequencies into account, which makes it invalid for some special signals, e.g., signals with only one frequency component. Another important reason is that windowing is a necessary operation to control the scale, which makes the method complicated. Afterwards, the wavelet transform was used to calculate the phase congruency [26]. The log-Gabor filter is applied because it can be constructed with arbitrary bandwidth and can capture local frequency information appropriately. The local energy based on the log-Gabor filter is computed as:

E(x)=\sqrt{\left(\sum_{n}e_{n}(x)\right)^{2}+\left(\sum_{n}o_{n}(x)\right)^{2}}, (3)

where e_{n}(x) and o_{n}(x) are the even and odd filter responses at the n^{th} scale. The corresponding amplitude of the signal at each frequency scale is:

A_{n}(x)=\sqrt{e_{n}(x)^{2}+o_{n}(x)^{2}}. (4)

A 2D Gabor filter is actually a Gaussian kernel function modulated by a sinusoidal plane wave. Hence, when applying the log-Gabor filter to the input image, some high-frequency information is removed, which leads to spurious responses. Thus, a weighting function W(\cdot) was constructed by applying a sigmoid function to the filter response values and multiplied with the local energy. Additionally, noise compensation was taken into account by subtracting the estimated noise energy T from the local energy. The phase congruency computed by log-Gabor filters then becomes:

PC(x)=\frac{W(x)\lfloor E(x)-T\rfloor}{\sum_{n}A_{n}(x)+\epsilon}, (5)

where \lfloor\cdot\rfloor denotes that the enclosed value is itself if positive, and zero otherwise. It is postulated that the magnitude of the energy vector follows a Rayleigh distribution; hence, T is the noise energy estimated from the mean and variance of the Rayleigh distribution at the smallest scale.
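For concreteness, a hedged sketch of Eqs.(3)-(5) is shown below, assuming the even and odd log-Gabor responses, the weighting map W and the noise estimate T have already been computed; the helper name and the array layout are hypothetical:

import numpy as np

def pc_from_quadrature(e, o, W, T, eps=1e-8):
    # e, o: arrays of shape (n_scales, H, W) with even/odd filter responses.
    E = np.sqrt(e.sum(axis=0)**2 + o.sum(axis=0)**2)   # local energy, Eq. (3)
    A = np.sqrt(e**2 + o**2).sum(axis=0)               # sum of amplitudes over scales, Eq. (4)
    return W * np.maximum(E - T, 0.0) / (A + eps)      # Eq. (5); the floor as max(., 0)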

The classical phase congruency model computes the energy and amplitude in a transform domain with fixed bases, such as the Fourier transform [47], the log-Gabor transform [27] and the monogenic signal [48], which results in spurious edges and glitch artifacts in noisy images. Therefore, we would like to develop a data-driven method that is adaptable to the input images to compute the edge feature map.

III Spectrum Congruency and Edge Detection

III-A Spectrum Congruency via Multiscale Local Energy

The local energy is mostly computed via the integration of the response values of pairs of quadrature filters, which makes it sensitive to noise and may lead to glitch artifacts. This is because the quadrature filters use fixed bases. Although fixed bases are suitable for most types of signals, they are not adaptable to the input signals. Hence, in this paper, we search for an appropriate transform domain with data-driven bases to represent the signal and compute the corresponding local energy.

The transform domain is embedded in a Hilbert space \mathbf{H} and composed of a set of complete orthogonal bases \{\mathbf{v}_{n}\}_{n=1}^{N}, \mathbf{v}_{n}\in\mathbb{R}^{N}, where N is the dimension of \mathbf{H}. Note that the wavelet transform lends itself naturally to multiscale analysis because the wavelet basis is scalable; however, our method is data-driven, and scaling the bases would destroy their completeness and orthogonality. Therefore, for a 2D image f(x,y), to acquire its information at different frequencies, we first choose image patches around each pixel with different sizes S_{1},S_{2},...,S_{K}, sorted in ascending order. We denote these patches by P_{1},P_{2},...,P_{K}, where P_{k}\in\mathbb{R}^{\sqrt{S_{k}}\times\sqrt{S_{k}}}, S_{k}\in[S_{1},S_{K}]. Then, at each target pixel, all of the patches are re-sampled to the size of S_{1}:

P^{\prime}_{k}=P_{k}(x,y)\downarrow_{\frac{S_{k}}{S_{1}}}, (6)

where \downarrow denotes the downsampling operation, \frac{S_{k}}{S_{1}} is the downsampling ratio and P^{\prime}_{k} is the k^{th} new patch after downsampling. In this way, P^{\prime}_{1} (identical to P_{1}) contains the high-frequency information because it describes the local information within a small range. As k increases, P^{\prime}_{k} contains lower and lower frequency information because the downsampling gradually removes the high-frequency content.
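A minimal sketch of the patch extraction and re-sampling of Eq.(6) could look as follows; the bilinear zoom, the default patch sides (3, 5, 7, as used later in Sect. IV) and the assumption that (x,y) lies away from the image border are our own simplifications:

import numpy as np
from scipy.ndimage import zoom

def multiscale_patches(img, x, y, sizes=(3, 5, 7)):
    # Extract K patches centred at (x, y) and resample each to the
    # smallest size S1 (Eq. (6)); larger patches lose high frequencies.
    s1 = sizes[0]
    patches = []
    for s in sizes:
        r = s // 2
        P = img[x - r:x + r + 1, y - r:y + r + 1]   # S_k x S_k patch
        patches.append(zoom(P, s1 / s, order=1))    # downsample to S1 x S1
    return patches                                  # [P'_1, ..., P'_K]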

The re-sampling process is similar to the calculation of phase congruency using wavelets. The wavelet-based methods pick up the low-, mid- and high-frequency information with Log-Gabor filters in the frequency domain, while we extract the different frequency components directly from the signal in the spatial domain.

Suppose the ideal transform domain is composed of a set of complete orthogonal bases \{\mathbf{v}_{n}\}_{n=1}^{S_{1}}, \mathbf{v}_{n}\in\mathbb{R}^{S_{1}}; then the edge detection process based on the local energy is computed as follows.

First, remove the DC component of each patch around the target pixel by subtracting its mean value:

X_{k}=P^{\prime}_{k}-\bar{P}^{\prime}_{k}. (7)

Next, vectorize these patches \{X_{k}\} as \{\mathbf{x}_{k}\}_{k=1}^{K}, \mathbf{x}_{k}\in\mathbb{R}^{S_{1}}, and project all the vectors, which carry different frequency information, onto \{\mathbf{v}_{n}\}:

\mathbf{y}_{k}=[\mathbf{x}_{k}^{T}\mathbf{v}_{1},\mathbf{x}_{k}^{T}\mathbf{v}_{2},...,\mathbf{x}_{k}^{T}\mathbf{v}_{S_{1}}]^{T}, (8)

where \mathbf{y}_{k}=[y_{k}^{1},y_{k}^{2},...,y_{k}^{S_{1}}]^{T} denotes the projection of the vector from the k^{th} scale, and y_{k}^{s}=\mathbf{x}_{k}^{T}\mathbf{v}_{s} is the s^{th} element of \mathbf{y}_{k}.

Then the local energy of the summed projections can be expressed as follows:

E(x,y)=\sqrt{\left(\sum_{k}y_{k}^{1}\right)^{2}+\left(\sum_{k}y_{k}^{2}\right)^{2}+...+\left(\sum_{k}y_{k}^{S_{1}}\right)^{2}}. (9)

The corresponding local amplitude can be computed as:

\sum_{k}A_{k}(x,y)=\sum_{k}\sqrt{(y_{k}^{1})^{2}+(y_{k}^{2})^{2}+...+(y_{k}^{S_{1}})^{2}}. (10)

Definition: The spectrum congruency based on the multiscale patches is defined as follows:

SC(x,y)=\frac{\lfloor E(x,y)-T\rfloor}{\sum_{k}A_{k}}. (11)

The term T in Eq.(11) is a constant that compensates for the influence of noise. It can be determined either from the mean value of the energy response or by a fixed threshold given by the user. In this paper, we use T=\alpha\mu to estimate the noise energy, where \mu stands for the mean value of the energy response and \alpha is a given constant.
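Putting Eqs.(7)-(11) together at a single pixel, a sketch might read as below; it already uses the identity basis justified later in Sect. III-C, so the projection of Eq.(8) is simply the vectorized patch, and the small eps guarding the denominator is our own addition:

import numpy as np

def spectrum_congruency_pixel(patches, T, eps=1e-8):
    # patches: list of K resampled S1 x S1 patches P'_k at one pixel.
    Y = np.stack([(P - P.mean()).ravel() for P in patches])  # Eqs. (7)-(8), identity basis
    E = np.linalg.norm(Y.sum(axis=0))                        # local energy, Eq. (9)
    A = np.linalg.norm(Y, axis=1).sum()                      # sum of amplitudes, Eq. (10)
    return max(E - T, 0.0) / (A + eps)                       # Eq. (11)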

III-B Spectrum Congruency for Edge Detection

The spectrum congruency can be regarded as an edge strength map that reflects the probability of a pixel being an edge. However, the pixels near the edges are also assigned considerable energy, which means they have a high probability of being marked as edge pixels, degrading the accuracy of edge localization. Therefore, we apply the non-maximum suppression (NMS) algorithm [14] to thin the edges. The edge map after thinning is denoted ETM. Algorithm 1 describes how to obtain SC and ETM in detail.
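A minimal stand-in for the thinning step is sketched below; it quantizes the gradient direction of the SC map into four orientations and keeps only local maxima along that direction. This is a generic NMS, not necessarily the exact variant of [14]:

import numpy as np

def nms_4dir(sc):
    # Keep a pixel only if it is a maximum along the quantized
    # direction of steepest change of the edge strength map.
    gy, gx = np.gradient(sc)
    angle = (np.rad2deg(np.arctan2(gy, gx)) + 180.0) % 180.0
    offs = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}
    out = np.zeros_like(sc)
    for i in range(1, sc.shape[0] - 1):
        for j in range(1, sc.shape[1] - 1):
            a = angle[i, j]
            d = min(offs, key=lambda t: min(abs(a - t), 180 - abs(a - t)))
            di, dj = offs[d]
            if sc[i, j] >= sc[i + di, j + dj] and sc[i, j] >= sc[i - di, j - dj]:
                out[i, j] = sc[i, j]
    return out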

Input: image f(x,y), PatchSize = {S_{1},S_{2},S_{3},...,S_{K}};
Output: spectrum congruency SC and edge thinning map ETM;
 
foreach pixel (x,y) in f(x,y) do
  1. Extract K patches centered at (x,y): P_{1}(x,y), P_{2}(x,y), P_{3}(x,y), ..., P_{K}(x,y)
  2. Remove the DC components of each patch
  foreach S_{k} in PatchSize do
    3. Downsample P_{k}(x,y) to form a new patch P_{S_{k}new}(x,y) with the same size as S_{1}
    4. Project each patch P_{S_{k}new}(x,y) onto a set of complete and orthogonal bases \{\mathbf{v}_{n}\};
    5. Compute E_{k} and A_{k} according to Eq.(9) and Eq.(10) respectively
  endfor
  6. Compute the spectrum congruency SC(x,y) via Eq.(11)
endfor
7. Apply the NMS algorithm to SC to obtain the edge thinning map ETM
return SC, ETM
Algorithm 1 Spectrum Congruency of Multiscale Local Patches for Edge Detection
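A compact driver following the loop structure of Algorithm 1, reusing the hypothetical helpers sketched above, might look like this:

import numpy as np

def spectrum_congruency_map(img, sizes=(3, 5, 7), alpha=0.5):
    # Steps 1-7 of Algorithm 1 with T = alpha * mean(E) (Sect. III-A);
    # border pixels are skipped for simplicity.
    H, W = img.shape
    r = sizes[-1] // 2
    E = np.zeros((H, W)); A = np.zeros((H, W))
    for i in range(r, H - r):
        for j in range(r, W - r):
            P = multiscale_patches(img, i, j, sizes)           # steps 1, 3
            Y = np.stack([(p - p.mean()).ravel() for p in P])  # steps 2, 4
            E[i, j] = np.linalg.norm(Y.sum(axis=0))            # Eq. (9)
            A[i, j] = np.linalg.norm(Y, axis=1).sum()          # Eq. (10)
    T = alpha * E.mean()                                       # noise estimate
    SC = np.maximum(E - T, 0.0) / (A + 1e-8)                   # step 6, Eq. (11)
    return SC, nms_4dir(SC)                                    # step 7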

Figure 1 depicts the whole process of the proposed method. We can see that the edges in the edge strength map obtained by Eq.(11) are very clear, and nearly all edges are detected. However, the edge features are too wide to provide precise localization. The non-maximum suppression algorithm is applied as the final step to produce a refined edge thinning map.

Figure 1: Flowchart of the proposed method; three patches of different sizes are selected at a target point

Compared with phase congruency, the proposed spectrum congruency is based on the local energy of multiscale patches and does not need to consider the influence of the filter orientations and the frequency spread as phase congruency does. This avoids integrating values from multiple orientations, thus eliminating the glitch artifact and providing a much simpler way to measure the edge strength. Besides, there is no need for a weighting function to reduce spurious responses, and the proposed method is highly robust to noise. This is because the combination of low-, mid- and high-frequency information around each pixel helps to retain the most important information while removing the noise.

III-C Bases Selection

Theorem: Let \{\mathbf{v}_{n}\}_{n=1}^{N}, \{\mathbf{u}_{n}\}_{n=1}^{N} be two arbitrary sets of complete orthogonal bases of a domain \Omega\subset\mathbb{R}^{N}. For a set of vectors \{\mathbf{x}_{p}\}_{p=1}^{P}, \mathbf{x}_{p}\in\Omega, the energy and amplitude of its projections on \{\mathbf{v}_{n}\} and \{\mathbf{u}_{n}\} are the same:

\left\{\begin{matrix}E_{\{\mathbf{v}_{n}\}}=E_{\{\mathbf{u}_{n}\}}\\ A_{p}(x)_{\{\mathbf{v}_{n}\}}=A_{p}(x)_{\{\mathbf{u}_{n}\}}\end{matrix}\right.

Proof: The projections of \{\mathbf{x}_{p}\} on \{\mathbf{v}_{n}\} and \{\mathbf{u}_{n}\} are \{\mathbf{y}_{p}\} and \{\mathbf{z}_{p}\} respectively. At each scale p, the projected vectors are:

\left\{\begin{matrix}\mathbf{y}_{p}=[\mathbf{x}_{p}^{T}\mathbf{v}_{1},\mathbf{x}_{p}^{T}\mathbf{v}_{2},...,\mathbf{x}_{p}^{T}\mathbf{v}_{N}]^{T}\\ \mathbf{z}_{p}=[\mathbf{x}_{p}^{T}\mathbf{u}_{1},\mathbf{x}_{p}^{T}\mathbf{u}_{2},...,\mathbf{x}_{p}^{T}\mathbf{u}_{N}]^{T}.\end{matrix}\right.

The energy of the projected vectors is computed according to Eq.(9); the square of the energy is as follows:

E^{2}_{\{\mathbf{v}_{n}\}}=(\mathbf{x}_{1}^{T}\mathbf{v}_{1}+\mathbf{x}_{2}^{T}\mathbf{v}_{1}+...+\mathbf{x}_{P}^{T}\mathbf{v}_{1})^{2}+(\mathbf{x}_{1}^{T}\mathbf{v}_{2}+\mathbf{x}_{2}^{T}\mathbf{v}_{2}+...+\mathbf{x}_{P}^{T}\mathbf{v}_{2})^{2}+...+(\mathbf{x}_{1}^{T}\mathbf{v}_{N}+\mathbf{x}_{2}^{T}\mathbf{v}_{N}+...+\mathbf{x}_{P}^{T}\mathbf{v}_{N})^{2}. (12)

Let \mathbf{x}=\mathbf{x}_{1}+\mathbf{x}_{2}+...+\mathbf{x}_{P}; then Eq.(12) can be rewritten as:

E^{2}_{\{\mathbf{v}_{n}\}}=(\mathbf{x}^{T}\mathbf{v}_{1})^{2}+(\mathbf{x}^{T}\mathbf{v}_{2})^{2}+...+(\mathbf{x}^{T}\mathbf{v}_{N})^{2}. (13)

As illustrated in the sketch map in Fig.2, the projection of \mathbf{x} on each basis vector of \{\mathbf{v}_{n}\} is \mathbf{x}^{T}\mathbf{v}_{n}=y^{n}; the r.h.s. of Eq.(13) is then in fact the squared magnitude of the vector \mathbf{x}:

(y^{1})^{2}+(y^{2})^{2}+...+(y^{N})^{2}=\|\mathbf{x}\|^{2}.

Figure 2: Sketch map of the projection of \{\mathbf{x}_{p}\}_{p=1}^{3} on two complete orthogonal bases \{\mathbf{v}_{n}\}_{n=1}^{2} and \{\mathbf{u}_{n}\}_{n=1}^{2}

Similarly, when using the bases \{\mathbf{u}_{n}\}, the energy is E^{2}_{\{\mathbf{u}_{n}\}}=(z^{1})^{2}+(z^{2})^{2}+...+(z^{N})^{2}=\|\mathbf{x}\|^{2}. Therefore, the energy of the projected vectors is the same regardless of the bases:

E_{\{\mathbf{v}_{n}\}}=E_{\{\mathbf{u}_{n}\}}.

The amplitude of the projected vector \mathbf{y}_{p} is computed according to Eq.(10):

A_{p}(x)_{\{\mathbf{v}_{n}\}}=\sqrt{(\mathbf{x}_{p}^{T}\mathbf{v}_{1})^{2}+(\mathbf{x}_{p}^{T}\mathbf{v}_{2})^{2}+...+(\mathbf{x}_{p}^{T}\mathbf{v}_{N})^{2}}.

Since \{\mathbf{v}_{n}\} is complete and orthogonal, A_{p}(x)_{\{\mathbf{v}_{n}\}} is in fact the \mathfrak{L}_{2} norm of \mathbf{x}_{p}, i.e., A_{p}(x)_{\{\mathbf{v}_{n}\}}=\|\mathbf{x}_{p}\|; hence we have:

A_{p}(x)_{\{\mathbf{v}_{n}\}}=A_{p}(x)_{\{\mathbf{u}_{n}\}}.

\hfill\blacksquare
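
The theorem can also be checked numerically; the following sketch draws two random orthonormal bases via QR decomposition and verifies that the energy and the per-scale amplitudes coincide:

import numpy as np

rng = np.random.default_rng(0)
N, P = 9, 3
X = rng.standard_normal((P, N))                    # vectors x_p as rows
V, _ = np.linalg.qr(rng.standard_normal((N, N)))   # orthonormal basis {v_n}
U, _ = np.linalg.qr(rng.standard_normal((N, N)))   # orthonormal basis {u_n}
Y, Z = X @ V, X @ U                                # projections y_p, z_p
assert np.isclose(np.linalg.norm(Y.sum(axis=0)),
                  np.linalg.norm(Z.sum(axis=0)))   # E_{v} = E_{u}
assert np.allclose(np.linalg.norm(Y, axis=1),
                   np.linalg.norm(Z, axis=1))      # A_p equal for all p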

According to this theorem, the spectrum congruency is unaffected by the choice of bases as long as the transform domain is embedded in a Hilbert space \mathbf{H} where the bases are complete and orthogonal. In other words, the spectrum congruency is the same for all sets of complete orthogonal bases \{\mathbf{v}_{n}\}_{n=1}^{N}. Hence, in our case, to facilitate the computation, we use the column vectors of the identity matrix I_{d}\in\mathbb{R}^{S_{1}\times S_{1}} as the bases. Thus, the energy and amplitude of the whole image can be calculated in linear time.
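Concretely, with \mathbf{v}_{n}=\mathbf{e}_{n} (the columns of I_{d}), Eq.(8) gives \mathbf{y}_{k}=\mathbf{x}_{k}, so the quantities in Eqs.(9)-(10) reduce to

E(x,y)=\Big\|\sum_{k=1}^{K}\mathbf{x}_{k}\Big\|_{2},\qquad \sum_{k}A_{k}(x,y)=\sum_{k=1}^{K}\|\mathbf{x}_{k}\|_{2},

i.e., one vector sum and K vector norms per pixel, with no explicit projection step.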

IV Experiment and Analysis

IV-A Experiment Data and Settings

To verify the effectiveness of our proposed method, we test our approach on both noise-free and noisy images. The data used in this section are from public datasets, including the BSDS500 dataset [2].

In terms of the estimated noise energy T=\alpha\mu in Eq.(11), \alpha is tunable in our experiments. Empirically, we set \alpha=0.5 for noise-free images and 1\leq\alpha\leq 2.5 for noisy images according to the noise deviation. Three scales of patches are used in this paper, i.e., K=3. The choice of patch size is discussed in Sect.IV-B.

In Sect.IV-C, we test the noise immunity of the proposed method on noisy cases. Noisy images are generated by adding additive white Gaussian noise to the original noise-free images. We compare the spectrum congruency with the Log-Gabor-based and monogenic signal-based phase congruency. In addition, we compare the edge-thinning maps of the spectrum congruency with three state-of-the-art methods: the Canny edge detector (CED) [14], the scale multiplication Canny edge detector (SMED) [20] and the detector based on isotropic and anisotropic Gaussian kernels (IAGK) [21]. The figure of merit (FOM) is used as an objective measure to evaluate the performance of the different methods in noisy cases:

FOM=\dfrac{1}{\max\{N_{gt},N_{det}\}}\sum_{k=1}^{N_{det}}\dfrac{1}{1+\beta d^{2}(k)}, (14)

where N_{gt} represents the number of ground-truth edge pixels, N_{det} stands for the number of edge pixels detected by the edge detectors, d(k) is the distance between the k-th detected edge pixel and the real edge, and \beta is a constant, normally \beta=1/9.
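A hedged sketch of Eq.(14) follows; it reads d(k) as the Euclidean distance from each detected pixel to the nearest ground-truth edge, a common interpretation of Pratt's figure of merit:

import numpy as np
from scipy.ndimage import distance_transform_edt

def figure_of_merit(gt, det, beta=1.0 / 9.0):
    # gt, det: boolean edge maps of the same shape.
    d = distance_transform_edt(~gt)                  # distance to nearest GT edge
    score = (1.0 / (1.0 + beta * d[det] ** 2)).sum()
    return score / max(gt.sum(), det.sum())          # Eq. (14)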

Finally, we compare our method with the aforementioned state-of-the-art edge detectors on a standard benchmark dataset in Sect.IV-D. To evaluate the performance of these methods, we compute the precision, recall and F-measure:

\left\{\begin{array}{l}precision=\dfrac{N_{corr}}{N_{det}}\\ recall=\dfrac{N_{corr}}{N_{gt}}\\ F=\dfrac{2\cdot precision\cdot recall}{precision+recall}\end{array}\right. (15)

where N_{corr} denotes the number of correctly detected edge pixels. There are two common ways of determining the optimal threshold for the F-measure. The first applies a fixed threshold to all the edge strength maps, which is called the optimal dataset scale (ODS) threshold. The second applies a threshold to each edge strength map individually, which is named the optimal image scale (OIS). Details are discussed and analyzed as follows.
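A small sketch of Eq.(15) from pixel counts is given below; how N_corr is matched against the ground truth (e.g., within a distance tolerance) is left to the evaluation protocol and is assumed precomputed:

def precision_recall_f(n_corr, n_gt, n_det, eps=1e-8):
    # n_corr: correctly detected edge pixels; n_gt: ground-truth pixels;
    # n_det: detected pixels.
    precision = n_corr / (n_det + eps)
    recall = n_corr / (n_gt + eps)
    f = 2.0 * precision * recall / (precision + recall + eps)
    return precision, recall, f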

IV-B Analysis of Patch Size

The patch size is an essential factor in our method. Theoretically, the patch size can range from a single pixel (1\times 1) to the whole image (M\times N). However, the patch size cannot be too large in practice, or the localization of edge features will be inaccurate. To test and analyze the impact of patch sizes, we study them in two aspects: 1) the selection of the scales, i.e., the gap size between two consecutive scales from fine to coarse; 2) with a fixed gap size, the size of the patches from small to large. Figure 3 displays the three original test images used in this section.

Figure 3: Images for analyzing the patch size

IV-B1 The selection of the scales

The patch sizes reflect the bandwidths in the transform domain. We choose different patch scales from fine to coarse with increasing gap size:

Scale Case 1: S_{1}=3\times 3, S_{2}=5\times 5, S_{3}=7\times 7;

Scale Case 2: S_{1}=3\times 3, S_{2}=7\times 7, S_{3}=11\times 11;

Scale Case 3: S_{1}=3\times 3, S_{2}=11\times 11, S_{3}=19\times 19.

Figure 4 shows the spectrum congruency and the thinning results of the mountain image. The results from left to right are obtained using patch sizes from Scale Case 1 to Scale Case 3. Clearly, as the patch size and the gap size increase, the extracted edge features become broader and broader from (a) to (e), which results in severe glitch artifacts after thinning; see (f) as an example. The reason is that, with an oversized patch, pixels far away from the edges are also included in the large patch. This means that the target/center pixel of the large patch is also assigned some energy from the edge pixels. Hence, the patch size should be kept within an appropriate range. In our following experiments, the largest patch size is 11\times 11, and the gap size is no more than 4.

Figure 4: From left to right, every two columns show the edge strength maps and the corresponding thinning results, obtained using patch sizes from fine to coarse (Scale Case 1 to Scale Case 3) respectively.

IV-B2 The selection of patch size

Next, we study how, within a suitable gap size, the fine and coarse patch sizes affect the spectrum congruency and the detected edge features. Three cases are considered here:

Patchsize Case 1: S_{1}=3\times 3, S_{2}=5\times 5, S_{3}=7\times 7;

Patchsize Case 2: S_{1}=5\times 5, S_{2}=7\times 7, S_{3}=9\times 9;

Patchsize Case 3: S_{1}=7\times 7, S_{2}=9\times 9, S_{3}=11\times 11.

The first to third columns of Fig.5 show the results obtained with the patch sizes of Patchsize Case 1 to Patchsize Case 3 respectively. It can be seen clearly that as the patch size increases, details are gradually neglected, and the contours of salient objects become more obvious and complete. For example, in the goose image, the fur on the neck of the goose and the water waves are detected using the smallest patch size, while in Column 3, only a little of the fur on the neck is preserved and the water waves are barely visible. Additionally, in the parade image, fewer and fewer stones on the street are detected as the patch size increases. Moreover, in the mountain image, the grass on the ground and the cracks in the stone are mostly captured with small patch sizes, while with large patch sizes, far fewer details are detected.

From the experimental results, we can conclude that small patch sizes are suited to extracting the edge features of details in images, while large patch sizes are more suitable for detecting the contours of the salient objects in images.

Figure 5: From left to right, every two columns show the edge strength maps and the corresponding thinning results, obtained using patch sizes from fine to coarse (Patchsize Case 1 to Patchsize Case 3) respectively.

IV-C Experiments on Noisy Cases

IV-C1 Spectrum Congruency vs Phase Congruency

The spectrum congruency is obtained according to the local energy model, which is analogous to phase congruency; hence, we compare our results with the Log-Gabor-based and monogenic signal-based phase congruency on real images. In the noisy cases, noise of three standard deviations, \sigma=10,20,30, is added to the original images. The patch sizes are S_{1}=3\times 3, S_{2}=5\times 5, S_{3}=7\times 7 for the noise-free images and \sigma=10; when \sigma=20 and 30, we choose S_{1}=5\times 5, S_{2}=7\times 7, S_{3}=9\times 9. The experimental results are displayed in Fig.6, Fig.7, Fig.8 and Fig.9.

Figure 6 shows the performance on the noise-free images. From the results, we can see that our method provides many more useful details than the other two methods. For example, in the opera house image, the lines and corners of the roof are well detected by our method, while the other two methods generate discontinuous and inconspicuous edges. In addition, even weak edges can be detected by our method, such as the background of the parrot image. Moreover, with the thinning step using non-maximum suppression, our method provides more precise locations of the edges.

Figure 7, Fig.8 and Fig.9 demonstrate the effectiveness of the spectrum congruency on noisy images, with standard deviations equal to 10, 20 and 30 respectively. As the noise increases, the phase congruency value obtained by the Log-Gabor-based method decreases dramatically. When the noise deviation reaches 30, the features are nearly invisible.

For the house image in Fig.7, many features disappear with the Log-Gabor filter and glitch artifacts are generated by the monogenic signal, but our method provides results as stable as those in Fig.6. The buildings in the cameraman image in Fig.7 are barely visible with the Log-Gabor and monogenic signal methods, while in our case they are still well detected. The complete contours of the parrot, the plane and the opera house are preserved by our method, while the other two methods miss many segments.

With stronger noise (\sigma=20) in Fig.8, our method still yields a strong edge strength, while the features detected by the other two methods are weak. Even though our result is slightly affected by the noise, it still provides as many essential and complete features as possible. For example, in the butterfly image, the other two methods are unable to detect the flowers in the background, while our approach still succeeds in providing these details. Additionally, it can be seen clearly that as the noise increases, the phase congruency generated by both the Log-Gabor filter and the monogenic signal decreases considerably; even in places where strong edges exist, the phase congruency values remain small. In our case, the feature values stay stable regardless of the noise.

When higher-level noise (\sigma=30) is added, Fig.9 demonstrates that the proposed method still achieves good performance. In the plane image, the contour of the plane is completely detected by our proposed method (d), while in (b) and (c) nearly nothing can be observed. In the opera house image, the buildings in the background are detected and preserved by our method.

The experimental results show that the proposed spectrum congruency is superior to phase congruency in detecting features in both noise-free and noisy cases.

Figure 6: From left to right, every four columns show the original noise-free images and the edge strength maps based on the Log-Gabor transform, the monogenic signal and the proposed method respectively.

Figure 7: From left to right, every four columns show the noisy images (noise standard deviation \sigma=10) and the edge strength maps based on the Log-Gabor transform, the monogenic signal and the proposed method respectively.

Figure 8: From left to right, every four columns show the noisy images (noise standard deviation \sigma=20) and the edge strength maps based on the Log-Gabor transform, the monogenic signal and the proposed method respectively.

Figure 9: From left to right, every four columns show the noisy images (noise standard deviation \sigma=30) and the edge strength maps based on the Log-Gabor transform, the monogenic signal and the proposed method respectively.

IV-C2 Edge-Thinning Maps

Edge-thinning maps are the output of logical operations, and they indicate the exact locations of the detected edges. Hence, the FOM measure is used here to evaluate the localization precision of the different edge-thinning maps. The test is implemented on a synthetic image because synthetic images come with an ideal and uncontroversial groundtruth. We add noise of different deviations, \sigma=30,50,100,150, to the test image. Three state-of-the-art methods are used for comparison: the Canny edge detector (CED) [14], the scale multiplication Canny edge detector (SMED) [20] and the detector based on isotropic and anisotropic Gaussian kernels (IAGK) [21]. The results of the three comparison methods are obtained by selecting the optimal parameters that yield the highest FOM. Since the noise is strong in this experiment, we preprocess the image with a Gaussian filter before computing the spectrum congruency. The window size of the Gaussian kernel is 7\times 7, and its standard deviation is s=3.5.

Figure 10 and Table I show the experimental results and the FOM indices respectively. Table I demonstrates that the proposed method is competitive with the state-of-the-art edge detection methods, especially when the noise deviation is high. In Fig.10, when the noise deviation is low, all the methods provide precise locations and complete edges. When there is a high level of noise, the differential-based methods need large scales to counter the noise, but large scales blur the details, leading to some missed detections. For example, when \sigma=100, the top line of the rectangle is detected by our method, while it is missed by the other three differential-based methods. When \sigma=150, compared with the differential-based methods, our method is less affected by the noise and can still provide relatively complete contours of the shapes in the image, while the other methods detect false segments and incomplete contours. The experimental results demonstrate that the spectrum congruency is highly robust to noise.

Figure 10: From left to right, top to bottom: (a) the noisy images, with noise standard deviation \sigma=30,50,100,150; (b)-(e) the corresponding edge thinning maps obtained by CED, SMED, IAGK and the proposed method respectively.
TABLE I: FOM of CED, SMED, IAGK and the proposed method under different noise standard deviations in Fig.10

            CED     SMED    IAGK    Proposed
\sigma=30   0.8195  0.8149  0.8141  0.8137
\sigma=50   0.7710  0.7863  0.7767  0.7778
\sigma=100  0.6978  0.7005  0.6782  0.7081
\sigma=150  0.6039  0.5995  0.5991  0.6138

IV-D Experiment on BSDS500 Dataset

The BSDS500 dataset is widely used for image contour detection and segmentation. Each image is manually labeled by 4 to 9 annotators. It contains 200 training images, 200 test images and 100 validation images. Here we use the 100 gray-scale validation images for the experiments. Each method is configured as follows:

  • CED: The scale is \sqrt{2}, which is widely adopted in the literature.

  • SMED: The two scales are 2 and 8 respectively.

  • IAGK: The scale is specified as 4, the anisotropic factor is 2\sqrt{2} and the number of kernel orientations is 16.

  • SC: The patch sizes are 3, 5 and 7 respectively, and \alpha=0.05.

The PR curves of the four edge detectors are displayed in Fig.11, from which we can see that our proposed edge detector performs better than the CED, SMED and IAGK approaches.

Meanwhile, Table II indicates that the proposed method achieves better performance. The F_{ODS}, F_{OIS} and R50 indices demonstrate that our proposed method is competitive with the other three methods.

Figure 11: PR curves of the four methods on the BSDS500 dataset

TABLE II: F-measure of CED, SMED, IAGK and the proposed method

          CED    IAGK   SMED   Proposed
F_{ODS}   0.566  0.549  0.601  0.607
F_{OIS}   0.608  0.543  0.610  0.626
R50       0.649  0.589  0.741  0.756

V Conclusions and Future Work

This paper proposes a novel image feature called spectrum congruency for edge detection. The spectrum congruency is computed based on the local energy of multiscale patches. Similar to phase congruency, the spectrum congruency is consistent with the way the human visual system perceives features of interest in images. Unlike phase congruency, which measures the local energy via transforms with fixed bases such as the Fourier transform, the Log-Gabor transform and the monogenic signal, the proposed spectrum congruency computes the local energy in a data-driven way, which is more adaptable to the input images. Patches of different sizes around each pixel are selected and re-sampled to acquire information from low to high frequencies. The spectrum congruency indicates the edge strength map and is computed by integrating information over different frequency bands.

Compared with previous measurements of phase congruency, our method is more robust to noise, with fewer glitch artifacts and smoother, more continuous edge features. Besides, compared with state-of-the-art edge detectors, our method provides more reliable edges under high levels of noise. In the future, we are going to apply our edge detection method to higher-level computer vision tasks such as image registration, image segmentation and object recognition.

References

  • [1] A. Koschan and M. A. Abidi, “Detection and classification of edges in color images,” IEEE Signal Processing Magazine, vol. 22, no. 1, pp. 64–73, 2005.
  • [2] P. Arbelaez, M. Maire, C. Fowlkes, and J. Malik, “Contour detection and hierarchical image segmentation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, pp. 898–916, May 2011.
  • [3] J. Chen, J. Li, D. Pan, Q. Zhu, and Z. Mao, “Edge-guided multiscale segmentation of satellite multispectral imagery,” IEEE Transactions on Geoscience and Remote Sensing, vol. 50, no. 11, pp. 4513–4520, 2012.
  • [4] S. Zhang, Q. Tian, K. Lu, Q. Huang, and W. Gao, “Edge-SIFT: Discriminative binary descriptor for scalable partial-duplicate mobile search,” IEEE Transactions on Image Processing, vol. 22, no. 7, pp. 2889–2902, 2013.
  • [5] Y. Li, S. Wang, Q. Tian, and X. Ding, “A survey of recent advances in visual feature detection,” Neurocomputing, vol. 149, Part B, pp. 736–751, 2015.
  • [6] F. P. Wang and P. L. Shui, “Noise-robust color edge detector using gradient matrix and anisotropic gaussian directional derivative matrix,” Pattern Recognition, vol. 52, pp. 346–357, 2016.
  • [7] C. Lopez-Molina, B. De Baets, H. Bustince, J. A. Sanz, and E. Barrenechea, “Multiscale edge detection based on gaussian smoothing and edge tracking,” Knowledge Based Systems, vol. 44, pp. 101–111, 2013.
  • [8] N. Imamoglu, W. Lin, and Y. Fang, “A saliency detection model using low-level features based on wavelet transform,” IEEE Transactions on Multimedia, vol. 15, no. 1, pp. 96–105, 2013.
  • [9] G. Wang, C. Lopez-Molina, and B. Debaets, “Multiscale edge detection using first-order derivative of anisotropic gaussian kernels,” Journal of Mathematical Imaging and Vision, vol. 61, no. 5, 2019.
  • [10] L. G. Roberts, Machine perception of three-dimensional solids. PhD thesis, Massachusetts Institute of Technology, 1963.
  • [11] I. Sobel, Camera Models and Machine Perception. PhD thesis, Stanford University, 1970.
  • [12] J. M. Prewitt, “Object enhancement and extraction,” Picture processing and Psychopictorics, vol. 10, no. 1, pp. 15–19, 1970.
  • [13] D. Marr and E. Hildreth, “Theory of edge detection,” Proceedings of the Royal Society of London B: Biological Sciences, vol. 207, no. 1167, pp. 187–217, 1980.
  • [14] J. Canny, “A computational approach to edge detection,” IEEE Transactions on Pattern Analysis and Machine Intelligence, no. 6, pp. 679–698, 1986.
  • [15] R. Deriche, “Using canny’s criteria to derive a recursively implemented optimal edge detector,” International Journal of Computer Vision, vol. 1, no. 2, pp. 167–187, 1987.
  • [16] L. Ding and A. Goshtasby, “On the canny edge detector,” Pattern Recognition, vol. 34, no. 3, pp. 721–725, 2001.
  • [17] J. H. Elder and S. W. Zucker, “Local scale control for edge detection and blur estimation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 7, pp. 699–716, 1998.
  • [18] V. Torre and T. A. Poggio, “On edge detection,” IEEE Transactions on Pattern Analysis Machine Intelligence, vol. 8, no. 2, pp. 147–163, 1986.
  • [19] J. C. Bezdek, R. Chandrasekhar, and Y. Attikouzel, “A geometric approach to edge detection,” IEEE Transactions on Fuzzy Systems, vol. 6, no. 1, pp. 52–75, 2002.
  • [20] P. Bao, L. Zhang, and X. Wu, “Canny edge detection enhancement by scale multiplication,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 9, p. 1485, 2005.
  • [21] P. L. Shui and W. C. Zhang, “Noise-robust edge detector combining isotropic and anisotropic gaussian kernels,” Pattern Recognition, vol. 45, no. 2, pp. 806–820, 2012.
  • [22] G. Papari, P. Campisi, N. Petkov, and A. Neri, “A biologically motivated multiresolution approach to contour detection,” EURASIP Journal on Advances in Signal Processing, vol. 2007, no. 1, pp. 119–119, 2007.
  • [23] M. C. Morrone, J. Ross, D. C. Burr, and R. Owens, “Mach bands are phase dependent,” Nature, vol. 324, no. 6094, pp. 250–253, 1986.
  • [24] M. C. Morrone and R. A. Owens, “Feature detection from local energy,” Pattern Recognition Letters, vol. 6, no. 5, pp. 303–313, 1987.
  • [25] M. C. Morrone and D. C. Burr, “Feature detection in human vision: A phase-dependent energy model,” Proceedings of the Royal Society of London. Series B. Biological Sciences, vol. 235, no. 1280, pp. 221–245, 1988.
  • [26] P. Kovesi, “Image features from phase congruency,” Videre: Journal of Computer Vision Research, vol. 1, no. 3, pp. 1–26, 1999.
  • [27] P. Kovesi, “Phase congruency: A low-level image invariant,” Psychological Research, vol. 64, no. 2, pp. 136–148, 2000.
  • [28] B. Obara, M. Fricker, and V. Grau, “Coherence enhancing diffusion filtering based on the phase congruency tensor,” pp. 202–205, 2012.
  • [29] L. Zhang, L. Zhang, X. Mou, and D. Zhang, “FSIM: A feature similarity index for image quality assessment,” IEEE Transactions on Image Processing, vol. 20, no. 8, pp. 2378–2386, 2011.
  • [30] G. Bhatnagar, Q. M. J. Wu, and Z. Liu, “Directive contrast based multimodal medical image fusion in NSCT domain,” IEEE Transactions on Multimedia, vol. 15, no. 5, pp. 1014–1024, 2013.
  • [31] J. Li, Q. Hu, and M. Ai, “RIFT: Multi-modal image matching based on radiation-variation insensitive feature transform,” IEEE Transactions on Image Processing, vol. 29, pp. 3296–3310, 2020.
  • [32] S. Gundimada and V. K. Asari, “Facial recognition using multisensor images based on localized kernel eigen spaces,” IEEE Transactions on Image Processing, vol. 18, no. 6, pp. 1314–1325, 2009.
  • [33] A. Verikas, A. Gelzinis, M. Bacauskiene, I. Olenina, S. Olenin, and E. Vaiciukynas, “Phase congruency-based detection of circular objects applied to analysis of phytoplankton images,” Pattern Recognition, vol. 45, no. 4, pp. 1659–1670, 2012.
  • [34] S. Shojaeilangari, W.-Y. Yau, and E.-K. Teoh, “A novel phase congruency based descriptor for dynamic facial expression analysis,” Pattern Recognition Letters, vol. 49, pp. 55–61, 2014.
  • [35] T. Mouats, N. Aouf, and M. A. Richardson, “A novel image representation via local frequency analysis for illumination invariant stereo matching,” IEEE Transactions on Image Processing, vol. 24, no. 9, pp. 2685–2700, 2015.
  • [36] Y. Ye, J. Shan, L. Bruzzone, and L. Shen, “Robust registration of multimodal remote sensing images based on structural similarity,” IEEE Transactions on Geoscience and Remote Sensing, vol. 55, no. 5, pp. 2941–2958, 2017.
  • [37] J. Fan, Y. Wu, M. Li, W. Liang, and Y. Cao, “SAR and optical image registration using nonlinear diffusion and phase congruency structural descriptor,” IEEE Transactions on Geoscience and Remote Sensing, vol. 56, no. 9, pp. 5368–5379, 2018.
  • [38] P. Kovesi, “Phase congruency detects corners and edges,” in The australian pattern recognition society conference: DICTA, vol. 2003, 2003.
  • [39] J. J. Lim, C. L. Zitnick, and P. Dollar, “Sketch tokens: A learned mid-level representation for contour and object detection,” in 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013.
  • [40] P. Dollár and C. L. Zitnick, “Structured forests for fast edge detection,” in 2013 IEEE International Conference on Computer Vision, pp. 1841–1848, 2013.
  • [41] Z. Zhang, F. Xing, X. Shi, and L. Yang, “Semicontour: A semi-supervised learning approach for contour detection,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 251–259, 2016.
  • [42] Y. Ganin and V. Lempitsky, “N^4-fields: Neural network nearest neighbor fields for image transforms,” in Asian Conference on Computer Vision, pp. 536–551, Springer, 2014.
  • [43] G. Bertasius, J. Shi, and L. Torresani, “Deepedge: A multi-scale bifurcated deep network for top-down contour detection,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4380–4389, 2015.
  • [44] W. Shen, X. Wang, Y. Wang, X. Bai, and Z. Zhang, “Deepcontour: A deep convolutional feature learned by positive-sharing loss for contour detection,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3982–3991, 2015.
  • [45] S. Xie and Z. Tu, “Holistically-nested edge detection,” in Proceedings of the IEEE International Conference on Computer Vision, pp. 1395–1403, 2015.
  • [46] J. He, S. Zhang, M. Yang, Y. Shan, and T. Huang, “Bi-directional cascade network for perceptual edge detection,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3828–3837, 2019.
  • [47] S. Venkatesh and R. Owens, “On the classification of image features,” Pattern Recognition Letters, vol. 11, no. 5, pp. 339–349, 1990.
  • [48] M. Felsberg and G. Sommer, “The monogenic signal,” IEEE Transactions on Signal Processing, vol. 49, no. 12, pp. 3136–3144, 2001.