
Co-occurrence Background Model with Superpixels for Robust Background Initialization

Wenjun Zhou1, Yuheng Deng1, Bo Peng1, Dong Liang2 and Shun’ichi Kaneko3 1School of Computer Science, Southwest Petroleum University, Chengdu, China 610500
Email: [email protected], [email protected]
2Nanjing University of Aeronautics and Astronautics, China 3Hokkaido University, Japan
Abstract

Background initialization is an important step in many high-level video processing applications, ranging from video surveillance to video inpainting. However, this process is often affected by practical challenges such as illumination changes, background motion, camera jitter and intermittent movement. In this paper, we develop a co-occurrence background model with superpixel segmentation for robust background initialization. We first introduce a novel co-occurrence background modeling method called Co-occurrence Pixel-Block Pairs (CPB) to generate a reliable initial background model, and superpixel segmentation is utilized to further acquire the spatial texture information of foreground and background. Then, the initial background can be determined by combining the foreground extraction results with the superpixel segmentation information. Experimental results on the challenging SBMnet benchmark validate its performance under various challenges.

I Introduction

As a widely used technique in computer vision and video processing applications[1, 2], scene background initialization plays an active role in object detection[3], video segmentation[4], video coding[5, 6] and video inpainting[7, 8]. Scene background initialization describes the scene without any foreground objects and generates a clean background to facilitate more efficient follow-up processing in computer vision and video processing applications. Bouwmans et al. surveyed the many traditional and recent approaches that have been proposed for scene background initialization[2], and previous works[9, 10] have analyzed its challenges. However, background initialization still faces several severe practical challenges[11], which include:

  • Illumination changes: for example, light intensity typically varies during the day.

  • Background motion: some movements in a scene should be classified as background, e.g., swaying trees, rippling water, or ever-changing advertising boards.

  • Camera jitter: in video surveillance, camera jitter is a severe issue that must be handled during background initialization.

  • Intermittent movement: a scene with abandoned objects that stop for a short while and then move away. Under this condition, differentiating between the foreground and the abandoned objects is difficult.

Fig. 1 shows typical examples of these challenges.

Figure 1: Typical examples of these challenges: (a) illumination changes, (b) background motion, (c) camera jitter, (d) intermittent movement.

To handle the above challenges, we propose a robust background initialization approach based on the co-occurrence background model (Co-occurrence Pixel-Block Pairs: CPB) with superpixels. CPB has been described in our previous work[12, 13]. As an intuitive and robust background model, CPB was originally designed for foreground detection under dramatic background changes, such as illumination changes and background motion. Here, CPB is utilized as the background model for scene background initialization. Then, in order to further obtain the spatial texture information of foreground and background for efficient background generation, the superpixel algorithm simple linear iterative clustering (SLIC) [14] is introduced to exploit the spatial correlations and temporal motion differences between foreground and background for motion detection. The main contributions of this work are as follows:

  1. The proposed approach effectively acquires the spatial-temporal information of foreground and background and sensitively distinguishes the difference between them, so it is highly efficient for motion detection in a scene under complex challenges, especially strong background changes (e.g., illumination changes and background motion) or intermittent motion.

  2. The proposed approach provides a low-complexity and efficient strategy for robust background initialization. In particular, compared with neural network (NN) based approaches[15, 16], it has low cost because it can be trained without any teacher signals.

The rest of this paper is organized as follows. The proposed approach is described in Section II. Section III analyzes the experimental results on the SBMnet dataset[17]. Conclusions are drawn in Section IV.

II Methodology

In this section, the proposed approach is described in detail. It consists of three steps: (1) CPB background modeling; (2) motion detection; (3) background generation, as shown in Fig. 2.

Figure 2: Overview of background initialization by the proposed approach.

II-A Co-occurrence Background Model

The working diagram of CPB background modeling is illustrated in Fig. 3 and consists of a training process and a detecting process. In this work, the target pixel $p$ is compared with blocks $Q^B$, and we define $\{Q_k^B\}_{k=1,2,\dots,K}=\{Q_1^B, Q_2^B, \dots, Q_K^B\}$ to denote the set of supporting blocks for the target pixel $p$. Each frame is divided into blocks $Q^B$ of size $m \times n$ pixels:

$Q^{B}=\begin{Bmatrix} Q_{11} & Q_{12} & \dots & Q_{1n} \\ Q_{21} & Q_{22} & \dots & Q_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ Q_{m1} & Q_{m2} & \dots & Q_{mn} \end{Bmatrix}.$ (1)
Figure 3: Working diagram of the CPB background model, using the PETS2001 dataset as a demonstration.
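As an illustration, the following is a minimal sketch of this block division, assuming grayscale frames stored as NumPy arrays; the $8 \times 8$ block size follows the experimental setting in Section III, and the function name is ours.

```python
import numpy as np

def block_means(frame: np.ndarray, m: int = 8, n: int = 8) -> np.ndarray:
    """Divide a frame into m x n blocks and return each block's mean intensity."""
    H, W = frame.shape
    # Crop so the frame divides evenly into blocks; boundary handling is
    # an assumption, as the paper does not specify it.
    frame = frame[:H - H % m, :W - W % n]
    blocks = frame.reshape(H // m, m, W // n, n)
    return blocks.mean(axis=(1, 3))  # one mean intensity per block Q^B

if __name__ == "__main__":
    frame = np.random.randint(0, 256, (240, 320)).astype(np.float64)
    print(block_means(frame).shape)  # (30, 40): a 320x240 frame yields 1200 blocks
```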

Background changes in a scene can affect the current intensity of the target pixel $p$ in foreground detection. Hence, it is quite natural that a block $Q^B$ that is strongly correlated with the target pixel $p$ can be used to determine the state of the latter. Block $Q^B$ can be introduced as a reference to estimate the current intensity of the target pixel $p$; that is, there exists a correlation between pixel $p$ and block $Q^B$: $I_p = \bar{I}_Q + \Delta_k$, where $\bar{I}_Q$ is the average intensity of block $Q^B$ in the current frame. In order to reduce the risk of individual errors and obtain a robust background model, it is necessary to select a sufficient number of blocks $Q^B$ with high correlation as supporting blocks, defined as follows:

$\{Q_{k}^{B}\}_{k=1,2,\dots,K} = \{Q^{B} \mid \gamma(p, Q^{B}) \text{ is among the } K \text{ highest}\},$ (2)

where

$\gamma(p, Q_{k}^{B}) = \dfrac{C_{p,\bar{Q}_{k}}}{\sigma_{p}\,\sigma_{\bar{Q}_{k}}},$ (3)

where $\gamma$ is Pearson's product-moment correlation coefficient, $C_{p,\bar{Q}_k}$ is the covariance between the intensity of pixel $p$ and the mean intensity of block $Q_k^B$ over the training frames, and $\sigma_p$, $\sigma_{\bar{Q}_k}$ are the corresponding standard deviations. Then, a Gaussian model is used to construct the co-occurrence model for each pixel-block pair:

$\Delta_{k} \sim N(b_{k}, \sigma_{k}^{2}), \quad \Delta_{k} = I_{p} - \bar{I}_{Q_{k}},$ (4)

where $I_p$ is the intensity of pixel $p$ at frame $t$ and $\bar{I}_{Q_k}$ is the average intensity of block $Q_k^B$ at frame $t$. The background model is built as a list $[I^{P}, u_{k}, v_{k}, b_{k}, \sigma_{k}]$, where $I^{P}$ is the average intensity of the target pixel $p$ over the $T$ training frames ($T$ is the number of training frames) and $(u_{k}, v_{k})$ are the coordinates of the supporting blocks.
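To make the training step concrete, here is a hedged sketch, for a single target pixel, of selecting the supporting blocks by Pearson correlation (Eqs. (2)-(3)) and fitting the Gaussian difference model (Eq. (4)); the variable and function names are ours.

```python
import numpy as np

def train_cpb_pixel(pixel_series: np.ndarray, block_series: np.ndarray, K: int = 20):
    """pixel_series: (T,) intensities of target pixel p over T training frames.
    block_series: (T, B) mean intensities of the B candidate blocks.
    Returns supporting-block indices and the fitted (b_k, sigma_k) per pair."""
    T, B = block_series.shape
    # Pearson correlation gamma(p, Q^B) between the pixel's intensity series
    # and each block-mean series (Eq. (3)).
    gamma = np.array([np.corrcoef(pixel_series, block_series[:, j])[0, 1]
                      for j in range(B)])
    support = np.argsort(gamma)[-K:]  # the K highest correlations (Eq. (2))
    # Delta_k = I_p - mean(I_{Q_k}) per training frame, one column per pair.
    deltas = pixel_series[:, None] - block_series[:, support]
    b_k = deltas.mean(axis=0)      # Gaussian mean of Eq. (4)
    sigma_k = deltas.std(axis=0)   # Gaussian standard deviation of Eq. (4)
    return support, b_k, sigma_k
```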

In the detecting process, we use a correlation dependent decision to identify the state of the target pixel $p$, as shown in Fig. 3; more details are described in[13].
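As one plausible reading of that decision (the exact rule is given in [13]), each supporting block can vote on whether the observed difference $\Delta_k$ still fits its trained Gaussian model, with $\eta$ and $\lambda$ taken from Table II; this simplified sketch is our assumption, not the exact implementation.

```python
import numpy as np

def is_foreground(I_p: float, block_means_t: np.ndarray, support: np.ndarray,
                  b_k: np.ndarray, sigma_k: np.ndarray,
                  eta: float = 2.5, lam: float = 0.5) -> bool:
    """Classify target pixel p at the current frame t (simplified vote)."""
    deltas = I_p - block_means_t[support]  # observed Delta_k at frame t
    # A supporting block votes "background" when Delta_k lies within eta
    # standard deviations of its trained Gaussian model.
    votes = np.abs(deltas - b_k) <= eta * sigma_k
    return votes.mean() < lam  # foreground when too few pairs agree
```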

Figure 4: Representative results on different challenging sequences.
TABLE I: Results on different challenges from the SBMnet dataset (the best result per metric within each challenge is marked with *)

Challenge              Method                      AGE       pEPs      pCEPs     MS-SSIM   PSNR      CQM
Basic                  LaBGen-OF                   1.8388    0.0026    0.0017    0.9899    34.6563   35.4184
                       MSCL                        2.3728    0.0027    0.0016    0.9866    34.0810   34.7595
                       FSBE                        3.0236    0.0055    0.0035    0.9821    33.6317   34.2344
                       LaBGen-P-Semantic (MP+U)    1.9743    0.0024    0.0015    0.9899    34.8111   35.5647
                       SPMD                        2.1919    0.0004    0.0000*   0.9935    38.6807   38.9381
                       Our approach                1.4275*   0.0002*   0.0000*   0.9983*   42.3151*  42.2216*
Illumination changes   LaBGen-OF                   19.6355   0.4062    0.2597    0.9346    19.4204   20.9417
                       MSCL                        2.8098*   0.0043*   0.0000*   0.9913*   34.9208*  35.5259*
                       FSBE                        6.6733    0.0177    0.0002    0.9817    29.2464   30.2773
                       LaBGen-P-Semantic (MP+U)    17.6197   0.2733    0.1829    0.8641    18.4939   20.0830
                       SPMD                        6.0889    0.0540    0.0129    0.9755    26.9955   28.1438
                       Our approach                15.2618   0.1657    0.0130    0.9451    21.3651   22.3365
Background motion      LaBGen-OF                   1.7604    0.0022    0.0005    0.9893    38.6184   39.0805
                       MSCL                        2.1299    0.0016    0.0005    0.9962    36.6006   36.8315
                       FSBE                        1.8453    0.0029    0.0003    0.9814    37.9984   37.9817
                       LaBGen-P-Semantic (MP+U)    1.5156*   0.0000*   0.0000*   0.9970*   41.4472*  41.4719*
                       SPMD                        2.2313    0.0035    0.0002    0.9823    36.8531   36.1390
                       Our approach                1.7742    0.0000*   0.0000*   0.9965    39.7339   39.9130
Camera jitter          LaBGen-OF                   11.9868   0.1590    0.0267    0.8719    20.2275   21.7778
                       MSCL                        5.8660    0.0471    0.0067    0.9699    26.0077   27.1642
                       FSBE                        10.1060   0.1413    0.0283    0.9003    22.5280   23.8107
                       LaBGen-P-Semantic (MP+U)    11.1637   0.1466    0.0281    0.8619    20.4535   21.8627
                       SPMD                        1.3573*   0.0001*   0.0000*   0.9979*   42.1226*  42.1988*
                       Our approach                9.4038    0.1205    0.0133    0.9235    22.6436   24.0308
Intermittent movement  LaBGen-OF                   2.3248    0.0043    0.0021    0.9948    36.5121   36.8640
                       MSCL                        1.8481    0.0026    0.0011    0.9943    37.9796   38.1597
                       FSBE                        3.8068    0.0263    0.0173    0.9432    27.9022   28.9156
                       LaBGen-P-Semantic (MP+U)    2.1082    0.0031    0.0016    0.9945    37.5222   37.7290
                       SPMD                        2.1629    0.0032    0.0017    0.9940    37.2778   37.5754
                       Our approach                1.6250*   0.0012*   0.0000*   0.9957*   38.4293*  38.7184*

II-B Motion Detection Combined with Superpixels

Superpixel segmentation has attracted interest in many computer vision applications, as it provides an effective strategy to estimate image features and reduce the complexity of subsequent image processing tasks[18]. Superpixels have been applied in various fields, including object recognition[19, 20], image segmentation[21] and object tracking[22].

Because most optical flow techniques assume a smooth motion field [23], the estimated motion near the boundary between foreground and background tends to be over-smoothed and blurred. Motion boundaries are the most important regions, and incorrect motion estimates near these areas often lead to incorrect motion estimation results. For effective motion estimation in a scene, we introduce a superpixel segmentation algorithm into the proposed approach to further acquire and differentiate the spatial texture information of foreground and background[24, 11]. Here, the SLIC algorithm[14] is utilized on account of its low computational complexity and high memory efficiency.

The steps of motion detection are as follows:

  1. Record the pixels $\{p(x_i, y_j)\}$ of the foreground detected by CPB;

  2. Estimate the set $V$ of superpixel regions $S$ that cover these pixels $\{p(x_i, y_j)\}$;

  3. Detect the motion and acquire the motion mask $M$, where, for every pixel $p(x, y)$ in the current frame,

     $m(x, y) = \begin{cases} 1 & \text{if } p(x, y) \in V \\ 0 & \text{otherwise} \end{cases}.$ (5)

The motion mask is $M = \{m(x, y)\}$. With the help of superpixel segmentation, the proposed approach can further acquire the spatial information of each pixel and distinguish the different motion information of foreground and background. On this basis, the proposed approach reinforces the original CPB in extracting motion and avoids errors when extracting information from individual pixels.
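A minimal sketch of this mask construction follows, using scikit-image's SLIC; reading $V$ as the set of superpixels that sufficiently overlap the CPB foreground is our assumption (the overlap ratio of 0.5 is illustrative, as is the function name).

```python
import numpy as np
from skimage.segmentation import slic

def motion_mask(frame_rgb: np.ndarray, cpb_foreground: np.ndarray,
                n_segments: int = 400, overlap: float = 0.5) -> np.ndarray:
    """frame_rgb: (H, W, 3) image; cpb_foreground: (H, W) boolean CPB result."""
    labels = slic(frame_rgb, n_segments=n_segments, compactness=10)
    mask = np.zeros(cpb_foreground.shape, dtype=bool)
    for s in np.unique(labels):
        region = labels == s
        # Mark the whole superpixel as motion when enough of its pixels
        # were detected as foreground by CPB (Eq. (5): p(x, y) in V).
        if cpb_foreground[region].mean() >= overlap:
            mask |= region
    return mask  # M = {m(x, y)}
```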

II-C Final Background Generation

Finally, we replace the regions covered by the motion mask with the initial CPB background model to generate the final background, as shown in Fig. 3.
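In code, this replacement step can be as simple as the sketch below, assuming the initial CPB background model is available as an image of the same size (names are illustrative).

```python
import numpy as np

def generate_background(current_frame: np.ndarray, cpb_background: np.ndarray,
                        mask: np.ndarray) -> np.ndarray:
    """Fill motion-mask regions from the initial CPB background model."""
    out = current_frame.copy()
    out[mask] = cpb_background[mask]  # replace masked regions only
    return out
```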

III Experiments

III-A Experiment Setup

In order to evaluate the proposed approach fairly and without loss of generality, we consider several challenges faced by background initialization algorithms[17]. The following sequences are selected from SBMnet for evaluation:

  • Basic: PETS2006, a mixture of mild challenges such as shadows and intermittent movement.

  • Illumination changes: Dataset3Camera2, with illumination changes over the course of the day.

  • Background motion: advertisementBoard contains an ever-changing advertising board in the scene.

  • Camera jitter: boulevard, containing videos captured by an unstable outdoor camera.

  • Intermittent movement: the sofa sequence, in which abandoned objects move, stop for a short while, and then move again.

III-B Evaluation Measurement

Six metrics commonly used to evaluate background initialization algorithms [17, 11] are adopted for performance evaluation in this paper. They are explained as follows:

  • AGE (Average Gray-level Error): average of the absolute difference between GT and BI.

  • pEPs (Percentage of Error Pixels): percentage of pixels in BI whose value differs from the value of the corresponding pixel in GT by more than a threshold $\tau$, which is set to 20 in[17].

  • pCEPs (Percentage of Clustered Error Pixels): percentage of CEPs (number of pixels whose 4-connected neighbors are also error pixels) with respect to the total number of pixels in the image.

  • PSNR (Peak Signal-to-Noise Ratio): widely used to measure the quality of BI compared with GT, defined as $PSNR = 10 \cdot \log_{10}\left(\dfrac{255^{2}}{MSE}\right)$.

  • MS-SSIM (Multi-scale Structural Similarity Index): estimate of the perceived visual distortion defined in [25].

  • CQM (Color image Quality Measure): defined in [26]. It takes values in dB, and the higher the CQM value, the better the background estimate.

Here, GT denotes the ground-truth background image and BI denotes the background image generated by a background initialization approach.
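For reference, a minimal sketch of four of these metrics is given below, assuming 8-bit grayscale GT and BI arrays and the threshold $\tau = 20$ from [17]; the boundary handling in pCEPs is our simplification.

```python
import numpy as np

def evaluate(gt: np.ndarray, bi: np.ndarray, tau: float = 20.0):
    diff = np.abs(gt.astype(np.float64) - bi.astype(np.float64))
    age = diff.mean()                  # AGE: mean absolute gray-level error
    eps = diff > tau                   # error pixels
    peps = eps.mean()                  # pEPs: percentage of error pixels
    # Clustered error pixels: error pixels whose four 4-connected
    # neighbours are also error pixels (image border ignored here).
    ceps = (eps[1:-1, 1:-1] & eps[:-2, 1:-1] & eps[2:, 1:-1]
            & eps[1:-1, :-2] & eps[1:-1, 2:])
    pceps = ceps.sum() / eps.size      # pCEPs over the total pixel count
    psnr = 10 * np.log10(255.0 ** 2 / (diff ** 2).mean())  # PSNR in dB
    return age, peps, pceps, psnr
```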

III-C Result Evaluation

In this section, the proposed approach is compared with five state-of-the-art techniques selected from the SBMnet benchmark: LaBGen-OF[27], MSCL[28], FSBE[29], LaBGen-P-Semantic (MP+U) [30] and SPMD[11]. Four of them are leading techniques for background initialization on the SBMnet benchmark, especially MSCL[28], which is currently the top-ranked technique. All results of these five techniques come from the SBMnet benchmark.

In the experiments, each block is set to $8 \times 8$ pixels with an input frame size of $320 \times 240$ for CPB. All parameters are listed in Table II, and a detailed discussion of the parameters can be found in[12]. Experimental results of background initialization are presented in Fig. 4, and Table I lists the overall evaluation of these approaches on the different challenges. As shown in Fig. 4 and Table I, our approach outperforms the other techniques on the Basic and Intermittent Movement challenges, and on Background Motion it performs close to LaBGen-P-Semantic (MP+U), the best technique for that challenge. On the other two challenges, our approach achieves intermediate performance compared with the other techniques, which is acceptable. The comparison shows that our approach is robust and effective for background initialization under different challenges.

The processing time for background initialization is close to 0.15 seconds for a frame size of $320 \times 240$ on the MATLAB platform (Intel i7 at 2.40 GHz, 16 GB RAM).

TABLE II: Parameter settings in CPB

Parameter                                                   Value
Number of supporting blocks $K$                             20
Threshold of the Gaussian model $\eta$                      2.5
Threshold of the correlation dependent decision $\lambda$   0.5

IV Conclusions

In this paper, we propose a new approach for robust background initialization in complex scenes based on the co-occurrence background model (CPB) with superpixel segmentation. It is designed to handle the severe challenges in background initialization, such as illumination changes, background motion, camera jitter and intermittent movement. Video sequences contain temporal context information, which the CPB model learns from the training data to resist interference in the scene. Furthermore, superpixel segmentation helps acquire additional spatial texture information to facilitate the motion differentiation between foreground and background. The experimental results under different challenges validate the comprehensive performance of the proposed approach. More details, including the source code, are released at: https://github.com/zwj1archer/CPB-superpixel.git.

Acknowledgment

This work is supported by the scientific research starting project of SWPU (No. 2019QHZ017).

References

  • [1] T. Bouwmans, “Traditional and recent approaches in background modeling for foreground detection: An overview,” Computer Science Review, vol. 11, pp. 31–66, 2014.
  • [2] T. Bouwmans, L. Maddalena, and A. Petrosino, “Scene background initialization: A taxonomy,” Pattern Recognition Letters, vol. 96, pp. 3–11, 2017.
  • [3] X. Zhang, C. Zhu, S. Wang, Y. Liu, and M. Ye, “A Bayesian approach to camouflaged moving object detection,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 27, no. 9, pp. 2001–2013, 2017.
  • [4] C. Chiu, M. Ku, and L. Liang, “A robust object segmentation system using a probability-based background extraction algorithm,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 20, no. 4, pp. 518–528, 2010.
  • [5] M. Paul, “Efficient video coding using optimal compression plane and background modelling,” IET Image Processing, vol. 6, no. 9, pp. 1311–1318, 2012.
  • [6] X. Li et al., “Background-foreground information based bit allocation algorithm for surveillance video on high efficiency video coding (HEVC),” in 2016 Visual Communications and Image Processing (VCIP).   IEEE, 2016, pp. 1–4.
  • [7] A. Colombari, M. Cristani, V. Murino, and A. Fusiello, “Exemplar-based background model initialization,” in Proceedings of the third ACM international workshop on Video surveillance & sensor networks, 2005, pp. 29–36.
  • [8] X. Chen, Y. Shen, and Y. H. Yang, “Background estimation using graph cuts and inpainting,” in Proceedings of Graphics Interface 2010, 2010, pp. 97–103.
  • [9] L. Maddalena and A. Petrosino, “Towards benchmarking scene background initialization,” in International conference on image analysis and processing.   Springer, 2015, pp. 469–476.
  • [10] P.-M. Jodoin, L. Maddalena, A. Petrosino, and Y. Wang, “Extensive benchmark and survey of modeling methods for scene background initialization,” IEEE Transactions on Image Processing, vol. 26, no. 11, pp. 5244–5256, 2017.
  • [11] Z. Xu, B. Min, and R. C. Cheung, “A robust background initialization algorithm with superpixel motion detection,” Signal Processing: Image Communication, vol. 71, pp. 1–12, 2019.
  • [12] W. Zhou, S. Kaneko, D. Liang, M. Hashimoto, and Y. Satoh, “Background subtraction based on co-occurrence pixel-block pairs for robust object detection in dynamic scenes,” IIEEJ Transactions on Image Electronics and Visual Computing, vol. 5, no. 2, pp. 146–159, 2017.
  • [13] W. Zhou, S. Kaneko, M. Hashimoto, Y. Satoh, and D. Liang, “Foreground detection based on co-occurrence background model with hypothesis on degradation modification in dynamic scenes,” Signal Processing, vol. 160, pp. 66–79, 2019.
  • [14] R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Süsstrunk, “SLIC superpixels compared to state-of-the-art superpixel methods,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 11, pp. 2274–2282, 2012.
  • [15] P. Xu, M. Ye, Q. Liu, X. Li, L. Pei, and J. Ding, “Motion detection via a couple of auto-encoder networks,” in 2014 IEEE International Conference on Multimedia and Expo (ICME).   IEEE, 2014, pp. 1–6.
  • [16] I. Halfaoui, F. Bouzaraa, and O. Urfalioglu, “CNN-based initial background estimation,” in 2016 23rd International Conference on Pattern Recognition (ICPR).   IEEE, 2016, pp. 101–106.
  • [17] “A dataset for testing background estimation algorithms,” http://scenebackgroundmodeling.net/.
  • [18] M. Wang, X. Liu, Y. Gao, X. Ma, and N. Q. Soomro, “Superpixel segmentation: A benchmark,” Signal Processing-image Communication, vol. 56, pp. 28–39, 2017.
  • [19] H. Lu, X. Feng, X. Li, and L. Zhang, “Superpixel level object recognition under local learning framework,” Neurocomputing, vol. 120, pp. 203–213, 2013.
  • [20] D. Giordano, F. Murabito, S. Palazzo, and C. Spampinato, “Superpixel-based video object segmentation using perceptual organization and location prior,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 4814–4822.
  • [21] T. Lei, X. Jia, Y. Zhang, S. Liu, H. Meng, and A. K. Nandi, “Superpixel-based fast fuzzy c-means clustering for color image segmentation,” IEEE Transactions on Fuzzy Systems, vol. 27, no. 9, pp. 1753–1766, 2018.
  • [22] F. Yang, H. Lu, and M.-H. Yang, “Robust superpixel tracking,” IEEE Transactions on Image Processing, vol. 23, no. 4, pp. 1639–1651, 2014.
  • [23] S. Baker, D. Scharstein, J. Lewis, S. Roth, M. J. Black, and R. Szeliski, “A database and evaluation methodology for optical flow,” International journal of computer vision, vol. 92, no. 1, pp. 1–31, 2011.
  • [24] J. Lim and B. Han, “Generalized background subtraction using superpixels with label integrated motion estimation,” in European Conference on Computer Vision.   Springer, 2014, pp. 173–187.
  • [25] Z. Wang, E. P. Simoncelli, and A. C. Bovik, “Multiscale structural similarity for image quality assessment,” in The Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, 2003, vol. 2.   IEEE, 2003, pp. 1398–1402.
  • [26] Y. Yalman and İ. Ertürk, “A new color image quality measure based on YUV transformation and PSNR for human vision system,” Turkish Journal of Electrical Engineering & Computer Sciences, vol. 21, no. 2, pp. 603–612, 2013.
  • [27] B. Laugraud and M. Van Droogenbroeck, “Is a memoryless motion detection truly relevant for background generation with LaBGen?” in International Conference on Advanced Concepts for Intelligent Vision Systems.   Springer, 2017, pp. 443–454.
  • [28] S. Javed, A. Mahmood, T. Bouwmans, and S. K. Jung, “Background–foreground modeling based on spatiotemporal sparse subspace clustering,” IEEE Transactions on Image Processing, vol. 26, no. 12, pp. 5840–5854, 2017.
  • [29] A. Djerida, Z. Zhao, and J. Zhao, “Robust background generation based on an effective frames selection method and an efficient background estimation procedure (fsbe),” Signal Processing: Image Communication, vol. 78, pp. 21–31, 2019.
  • [30] B. Laugraud, S. Piérard, and M. Van Droogenbroeck, “LaBGen-P-Semantic: A first step for leveraging semantic segmentation in background generation,” Journal of Imaging, vol. 4, no. 7, p. 86, 2018.