Co-occurrence Background Model with Superpixels for Robust Background Initialization
Abstract
Background initialization is an important step in many high-level video processing applications, ranging from video surveillance to video inpainting. However, this process is often affected by practical challenges such as illumination changes, background motion, camera jitter, and intermittent movement. In this paper, we develop a co-occurrence background model with superpixel segmentation for robust background initialization. We first introduce a novel co-occurrence background modeling method called Co-occurrence Pixel-Block Pairs (CPB) to generate a reliable initial background model, and superpixel segmentation is then used to acquire the spatial texture information of foreground and background. The initial background is then determined by combining the foreground extraction results with the superpixel segmentation information. Experimental results on the challenging SBMnet benchmark dataset validate its performance under various challenges.
I Introduction
As a widely used approach in various computer vision and video processing applications[1, 2], scene background initialization plays an active role in object detection[3], video segmentation[4], video coding[5, 6], and video inpainting[7, 8]. Scene background initialization describes the scene without any foreground objects and generates a clear background to facilitate more efficient follow-up processing in computer vision or video processing applications. Bouwmans et al. reviewed and summarized many traditional and recent approaches that have been proposed for scene background initialization[2], and previous works[9, 10] have already analyzed the challenges of background initialization. However, background initialization still faces several severe practical challenges[11], which include:
- Illumination changes: for example, light intensity typically varies during the day.
- Background motion: some movements in a scene should be regarded as background, e.g., swaying trees, waving water, or ever-changing advertising boards.
- Camera jitter: in video surveillance, camera jitter is a severe issue that needs to be solved for background initialization.
- Intermittent movement: scenes in which abandoned objects stop for a short while and then move away. Under this condition, it is difficult to differentiate between foreground and abandoned objects.
Fig. 1 shows the typical examples of these challenges.
To handle the above challenges, we propose a robust background initialization approach based on the co-occurrence background model (Co-occurrence Pixel-Block Pairs: CPB) with superpixels. CPB has already been described in our previous work[12, 13]. As an intuitive and robust background model, CPB was originally designed for foreground detection under dramatic background changes, such as illumination changes and background motion. Here, CPB is used as the background model for scene background initialization. Then, in order to further obtain the spatial texture information of foreground and background for efficient background generation, the superpixel algorithm called simple linear iterative clustering (SLIC) [14] is introduced to classify the spatial correlations and temporal difference motion between foreground and background for motion detection. The main contributions of this work are as follows:
1. The proposed approach can effectively acquire the spatial-temporal information of foreground and background and sensitively distinguish the difference between them, so it is highly efficient for motion detection in a scene under complex challenges, especially strong background changes (e.g., illumination changes and background motion) or intermittent motion.
2.
II Methodology
In this section, the proposed approach is described in detail. As shown in Fig. 2, it comprises three steps: (1) CPB background modeling; (2) motion detection; (3) background generation.
II-A Co-occurrence Background Model
The working diagram of CPB background modeling is illustrated in Fig. 3 and comprises a training process and a detecting process. In this work, each target pixel is compared with blocks rather than with individual pixels, and a set of supporting blocks is defined for each target pixel. Each frame is divided into blocks of equal size:
(1) |
Background changes in the scene can affect the current intensity of the target pixel during foreground detection. Hence, it is quite natural that a block that is strongly correlated with the target pixel can be used to determine the state of that pixel. Such a block serves as a reference for estimating the current intensity of the target pixel; that is, there exists a correlation between the intensity of the pixel and the average intensity of the block in the current detecting frame. In order to reduce the risk of individual errors and build a robust background model, a sufficient number of highly correlated blocks must be selected as supporting blocks, defined as follows:
(2) |
where
(3) |
where the correlation between the target pixel and a block is measured by Pearson’s product-moment correlation coefficient. Then, a Gaussian model is used to construct the co-occurrence model for each pixel-block pair:
(4) |
where the model relates the intensity of the target pixel at a given frame to the average intensity of the supporting block at the same frame. The background model is built as a list whose entries include the average intensity of the target pixel over the training frames and the coordinates of the supporting blocks.
In the detecting process, we use the correlation-dependent decision to identify the state of the target pixel, as shown in Fig. 3; more details are given in [13].
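The training and detecting processes described above can be sketched in plain Python as follows. This is only an illustrative sketch under stated assumptions, not the released implementation: the function names (`block_means`, `pearson`, `train_pixel`, `is_foreground`), the dictionary layout of the model, and the form of the decision rule (a correlation-weighted vote with hypothetical parameters `eta` and `lam`) are all introduced here for illustration; the actual correlation-dependent decision is the one described in [13].

```python
import math

def block_means(frame, bh, bw):
    """Average intensity of each non-overlapping bh x bw block of a 2-D frame."""
    rows, cols = len(frame) // bh, len(frame[0]) // bw
    return [[sum(frame[r * bh + i][c * bw + j]
                 for i in range(bh) for j in range(bw)) / (bh * bw)
             for c in range(cols)] for r in range(rows)]

def pearson(x, y):
    """Pearson product-moment correlation; 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def train_pixel(frames, pixel, bh, bw, k):
    """Pick the k blocks most correlated with `pixel` over the training frames
    and fit a Gaussian (mu, sigma) to the pixel-minus-block-mean difference."""
    py, px = pixel
    p_series = [f[py][px] for f in frames]
    means = [block_means(f, bh, bw) for f in frames]
    scored = []
    for r in range(len(means[0])):
        for c in range(len(means[0][0])):
            q_series = [m[r][c] for m in means]
            scored.append((pearson(p_series, q_series), r, c, q_series))
    scored.sort(key=lambda s: s[0], reverse=True)  # highest correlation first
    model = []
    for gamma, r, c, q_series in scored[:k]:
        diffs = [p - q for p, q in zip(p_series, q_series)]
        mu = sum(diffs) / len(diffs)
        sigma = math.sqrt(sum((d - mu) ** 2 for d in diffs) / len(diffs))
        model.append({"block": (r, c), "gamma": gamma, "mu": mu, "sigma": sigma})
    return model

def is_foreground(model, p_value, means, eta=2.5, lam=0.5):
    """ASSUMED decision rule: a pair is 'broken' when the current difference
    deviates from mu by more than eta * sigma; the pixel is foreground when the
    correlation-weighted share of broken pairs exceeds lam."""
    total = sum(m["gamma"] for m in model)
    broken = sum(
        m["gamma"] for m in model
        if abs(p_value - means[m["block"][0]][m["block"][1]] - m["mu"])
        > eta * m["sigma"]
    )
    return total > 0 and broken > lam * total
```

In this sketch a global brightness change shifts the pixel and its supporting blocks together, so the pixel-block differences stay near `mu` and the pixel remains background, whereas an object covering only the pixel breaks most pairs at once.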
Table I: Overall evaluation of the compared approaches under different challenges.

Challenge | Method | AGE | pEPs | pCEPs | MS-SSIM | PSNR | CQM
---|---|---|---|---|---|---|---
Basic | LaBGen-OF | 1.8388 | 0.0026 | 0.0017 | 0.9899 | 34.6563 | 35.4184
Basic | MSCL | 2.3728 | 0.0027 | 0.0016 | 0.9866 | 34.081 | 34.7595
Basic | FSBE | 3.0236 | 0.0055 | 0.0035 | 0.9821 | 33.6317 | 34.2344
Basic | LaBGen-P-Semantic(MP+U) | 1.9743 | 0.0024 | 0.0015 | 0.9899 | 34.8111 | 35.5647
Basic | SPMD | 2.1919 | 0.0004 | **0.0000** | 0.9935 | 38.6807 | 38.9381
Basic | Our approach | **1.4275** | **0.0002** | **0.0000** | **0.9983** | **42.3151** | **42.2216**
Illumination Changes | LaBGen-OF | 19.6355 | 0.4062 | 0.2597 | 0.9346 | 19.4204 | 20.9417
Illumination Changes | MSCL | **2.8098** | **0.0043** | **0.0000** | **0.9913** | **34.9208** | **35.5259**
Illumination Changes | FSBE | 6.6733 | 0.0177 | 0.0002 | 0.9817 | 29.2464 | 30.2773
Illumination Changes | LaBGen-P-Semantic(MP+U) | 17.6197 | 0.2733 | 0.1829 | 0.8641 | 18.4939 | 20.083
Illumination Changes | SPMD | 6.0889 | 0.0540 | 0.0129 | 0.9755 | 26.9955 | 28.1438
Illumination Changes | Our approach | 15.2618 | 0.1657 | 0.0130 | 0.9451 | 21.3651 | 22.3365
Background Motion | LaBGen-OF | 1.7604 | 0.0022 | 0.0005 | 0.9893 | 38.6184 | 39.0805
Background Motion | MSCL | 2.1299 | 0.0016 | 0.0005 | 0.9962 | 36.6006 | 36.8315
Background Motion | FSBE | 1.8453 | 0.0029 | 0.0003 | 0.9814 | 37.9984 | 37.9817
Background Motion | LaBGen-P-Semantic(MP+U) | **1.5156** | **0.0000** | **0.0000** | **0.9970** | **41.4472** | **41.4719**
Background Motion | SPMD | 2.2313 | 0.0035 | 0.0002 | 0.9823 | 36.8531 | 36.1390
Background Motion | Our approach | 1.7742 | **0.0000** | **0.0000** | 0.9965 | 39.7339 | 39.9130
Camera Jitter | LaBGen-OF | 11.9868 | 0.1590 | 0.0267 | 0.8719 | 20.2275 | 21.7778
Camera Jitter | MSCL | 5.8660 | 0.0471 | 0.0067 | 0.9699 | 26.0077 | 27.1642
Camera Jitter | FSBE | 10.1060 | 0.1413 | 0.0283 | 0.9003 | 22.5280 | 23.8107
Camera Jitter | LaBGen-P-Semantic(MP+U) | 11.1637 | 0.1466 | 0.0281 | 0.8619 | 20.4535 | 21.8627
Camera Jitter | SPMD | **1.3573** | **0.0001** | **0.0000** | **0.9979** | **42.1226** | **42.1988**
Camera Jitter | Our approach | 9.4038 | 0.1205 | 0.0133 | 0.9235 | 22.6436 | 24.0308
Intermittent Movement | LaBGen-OF | 2.3248 | 0.0043 | 0.0021 | 0.9948 | 36.5121 | 36.8640
Intermittent Movement | MSCL | 1.8481 | 0.0026 | 0.0011 | 0.9943 | 37.9796 | 38.1597
Intermittent Movement | FSBE | 3.8068 | 0.0263 | 0.0173 | 0.9432 | 27.9022 | 28.9156
Intermittent Movement | LaBGen-P-Semantic(MP+U) | 2.1082 | 0.0031 | 0.0016 | 0.9945 | 37.5222 | 37.7290
Intermittent Movement | SPMD | 2.1629 | 0.0032 | 0.0017 | 0.9940 | 37.2778 | 37.5754
Intermittent Movement | Our approach | **1.6250** | **0.0012** | **0.0000** | **0.9957** | **38.4293** | **38.7184**

* Note that bold entries indicate the best value of each metric within a challenge.
II-B Motion Detection Combined with Superpixels
Superpixel segmentation has attracted the interest of many computer vision applications as it provides an effective strategy to estimate image features and reduce the complexity of subsequent image processing tasks[18]. Superpixels have been applied in various fields including object recognition[19, 20], image segmentation[21] and object tracking[22].
As observed for most optical flow techniques [23], the motion field near the motion boundary between foreground and background tends to be over-smoothed and blurred. Motion boundaries are the most important regions, and incorrect motions near them often lead to incorrect results in motion estimation. For effective motion estimation in a scene, we introduce a superpixel segmentation algorithm into the proposed approach to further acquire and differentiate the spatial texture information of foreground and background[24, 11]. Here, the SLIC algorithm[14] is used on account of its low computational complexity and high memory efficiency.
The steps of motion detection are as follows:
1. Record the foreground pixels detected by CPB;
2. Estimate the value of the superpixel regions covering these pixels;
3. Detect the motion and acquire the motion mask, which is defined as:
(5)
The motion mask . With the help of superpixel segmentation, the proposed approach can further acquire the spatial information of each pixel and distinguish the different motion information between foreground and background. Based on this, the proposed approach can reinforce the original CPB for extracting motion and avoid errors in information extraction from pixels.
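The three steps above can be sketched as follows. This is a hedged illustration only: the exact rule of Eq. (5) is not reproduced here, so the sketch assumes a simple ratio test in which a whole superpixel is marked as moving when the share of its pixels flagged as foreground by CPB reaches a hypothetical threshold `min_ratio`. The function name `motion_mask` and the input layout (a binary foreground mask plus an integer label map, such as one produced by any SLIC implementation) are likewise assumptions.

```python
from collections import defaultdict

def motion_mask(fg_mask, labels, min_ratio=0.5):
    """Expand a pixel-level foreground mask to superpixel regions.

    fg_mask: 2-D list of 0/1 values (1 = pixel detected as foreground by CPB).
    labels:  2-D list of superpixel ids with the same shape as fg_mask.
    Returns a 2-D list of 0/1 values in which every pixel of a superpixel is 1
    when at least `min_ratio` of that superpixel is foreground (ASSUMED rule).
    """
    total = defaultdict(int)  # number of pixels per superpixel
    fg = defaultdict(int)     # number of foreground pixels per superpixel
    for mask_row, label_row in zip(fg_mask, labels):
        for is_fg, lab in zip(mask_row, label_row):
            total[lab] += 1
            if is_fg:
                fg[lab] += 1
    moving = {lab for lab in total if fg[lab] / total[lab] >= min_ratio}
    return [[1 if lab in moving else 0 for lab in row] for row in labels]
```

Working at the superpixel level fills small holes inside a detected object and discards isolated false detections, because the decision follows region boundaries instead of individual pixels.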
II-C Final Background Generation
Then, we replace the region covered by the motion mask with the initial CPB background model to generate the background, as shown in Fig. 3.
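A minimal sketch of this replacement step is given below, assuming grayscale images stored as 2-D lists of equal shape. The function name `compose_background` and the argument names are hypothetical; the sketch only shows the masking logic, not the authors' implementation.

```python
def compose_background(frame, mask, model_bg):
    """Build a background image from one frame.

    Pixels inside the motion mask (mask == 1) take the value of the initial
    background model; all other pixels are copied from the frame unchanged.
    """
    return [
        [bg if m else px for px, m, bg in zip(frame_row, mask_row, bg_row)]
        for frame_row, mask_row, bg_row in zip(frame, mask, model_bg)
    ]
```

For example, with `frame = [[10, 200], [12, 11]]`, `mask = [[0, 1], [0, 0]]` and `model_bg = [[9, 13], [12, 10]]`, only the masked pixel is replaced, giving `[[10, 13], [12, 11]]`.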
III Experiments
III-A Experiment Setup
In order to evaluate the proposed approach fairly and without loss of generality, we consider several of the challenges defined for background initialization algorithms[17]. The following challenges are selected from SBMnet for evaluation:
- Basic: PETS2006 represents a mixture of mild challenges typical of shadows and intermittent movement.
- Illumination changes: Dataset3Camera2 contains illumination changes during the day.
- Background motion: advertisementBoard contains an ever-changing advertising board in the scene.
- Camera jitter: boulevard contains videos captured by unstable outdoor cameras.
- Intermittent movement: sofa contains abandoned objects that move, stop for a short while, and then move again.
III-B Evaluation Measurement
Six metrics, which are the common measurements for background initialization algorithms [17, 11], are used for performance evaluation in this paper. They are explained as follows:
- AGE (Average Gray-level Error): average of the absolute difference between GT and BI.
- pEPs (Percentage of Error Pixels): percentage of EPs (number of pixels in BI whose value differs from the value of the corresponding pixel in GT by more than a threshold, which is set to 20 in[17]) with respect to the total number of pixels in the image.
- pCEPs (Percentage of Clustered Error Pixels): percentage of CEPs (number of pixels whose 4-connected neighbors are also error pixels) with respect to the total number of pixels in the image.
- PSNR (Peak Signal to Noise Ratio): widely used to measure the quality of BI compared with GT, defined as $\mathrm{PSNR} = 10 \log_{10}\left(\mathrm{MAX}^2 / \mathrm{MSE}\right)$, where $\mathrm{MAX}$ is the maximum possible gray level (255 for 8-bit images) and $\mathrm{MSE}$ is the mean squared error between GT and BI.
- MS-SSIM (Multi-scale Structural Similarity Index): estimate of the perceived visual distortion defined in [25].
- CQM (Color image Quality Measure): defined in [26]. It takes values in dB, and the higher the CQM value, the better the background estimate.
Here, GT denotes the ground truth of the background image and BI denotes the background image generated by a background initialization approach.
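The four pixel-wise metrics above (AGE, pEPs, pCEPs, and PSNR) can be sketched for grayscale images as follows. This is an illustrative sketch, not the SBMnet evaluation code: the function name `evaluate`, the use of 2-D lists, the reporting of pEPs and pCEPs as fractions in [0, 1], and the treatment of out-of-image neighbors as non-error pixels are assumptions. MS-SSIM and CQM are omitted because they are defined in [25] and [26].

```python
import math

def evaluate(gt, bi, tau=20, max_level=255):
    """Compute AGE, pEPs, pCEPs and PSNR between ground truth `gt` and
    generated background `bi` (2-D lists of gray levels, same shape)."""
    h, w = len(gt), len(gt[0])
    n = h * w
    diff = [[abs(g - b) for g, b in zip(gr, br)] for gr, br in zip(gt, bi)]

    # AGE: mean absolute gray-level difference.
    age = sum(map(sum, diff)) / n

    # Error pixels: absolute difference larger than the threshold tau.
    err = [[d > tau for d in row] for row in diff]
    peps = sum(map(sum, err)) / n

    def clustered(y, x):
        # Error pixel whose four 4-connected neighbors are all error pixels.
        # ASSUMPTION: neighbors outside the image count as non-error pixels.
        if not err[y][x]:
            return False
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            if not (0 <= ny < h and 0 <= nx < w and err[ny][nx]):
                return False
        return True

    pceps = sum(clustered(y, x) for y in range(h) for x in range(w)) / n

    # PSNR from the mean squared error; infinite for identical images.
    mse = sum(d * d for row in diff for d in row) / n
    psnr = math.inf if mse == 0 else 10 * math.log10(max_level ** 2 / mse)

    return {"AGE": age, "pEPs": peps, "pCEPs": pceps, "PSNR": psnr}
```

Lower values are better for AGE, pEPs, and pCEPs, while higher values are better for PSNR.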
III-C Result Evaluation
In this section, the proposed approach is compared with five state-of-the-art techniques selected from the SBMnet benchmark: LaBGen-OF[27], MSCL[28], FSBE[29], LaBGen-P-Semantic(MP+U) [30], and SPMD[11]. Four of them are leading techniques for background initialization in the SBMnet benchmark, especially MSCL[28], which is the top-ranked technique at present. All results of the five techniques are taken from the SBMnet benchmark.
In experiments, we set each block as pixels with input frame size of for CPB. All parameters used are listed in Table II, and a detailed discussion of the parameters can be found in[12]. Experimental results of the background initialization are presented in Fig. 4, and Table I lists the overall evaluation of these approaches under the different challenges. It can be seen from Fig. 4 and Table I that our approach outperforms the other techniques in the Basic and Intermittent Movement challenges, and for Background Motion its performance is close to that of LaBGen-P-Semantic(MP+U), which is the best in this challenge. For the other two challenges, our approach ranks at an intermediate level among the compared techniques, and its performance is acceptable. The comparison shows that our approach is robust and effective for background initialization under different challenges.
The processing time for background initialization is close to 0.15 seconds with frame size of on the MATLAB platform (Intel i7 2.40 GHz and 16 GB of memory).
Table II: Parameters used in the experiments.

Parameter | Value
---|---
Number of supporting blocks | 20
Threshold of Gaussian model | 2.5
Threshold of correlation dependent decision | 0.5
IV Conclusions
In this paper, we propose a new approach for robust background initialization of a complex scene based on the co-occurrence background model (CPB) with superpixel segmentation. It is designed to handle the severe challenges in background initialization, such as illumination changes, background motion, camera jitter, and intermittent movement. Video sequences contain temporal context information, which the CPB model learns from the training data to resist interference in the scene. Furthermore, superpixel segmentation helps acquire more spatial texture information to facilitate the motion differentiation between foreground and background. The experimental results under different challenges validate the comprehensive performance of the proposed approach. More details, including source code, are available at: https://github.com/zwj1archer/CPB-superpixel.git.
Acknowledgment
This work is supported by the Scientific Research Starting Project of SWPU (No.2019QHZ017).
References
- [1] T. Bouwmans, “Traditional and recent approaches in background modeling for foreground detection: An overview,” Computer Science Review, vol. 11, pp. 31–66, 2014.
- [2] T. Bouwmans, L. Maddalena, and A. Petrosino, “Scene background initialization: A taxonomy,” Pattern Recognition Letters, vol. 96, pp. 3–11, 2017.
- [3] X. Zhang, C. Zhu, S. Wang, Y. Liu, and M. Ye, “A Bayesian approach to camouflaged moving object detection,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 27, no. 9, pp. 2001–2013, 2017.
- [4] C. Chiu, M. Ku, and L. Liang, “A robust object segmentation system using a probability-based background extraction algorithm,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 20, no. 4, pp. 518–528, 2010.
- [5] M. Paul, “Efficient video coding using optimal compression plane and background modelling,” IET Image Processing, vol. 6, no. 9, pp. 1311–1318, 2012.
- [6] X. Li et al., “Background-foreground information based bit allocation algorithm for surveillance video on high efficiency video coding (HEVC),” in 2016 Visual Communications and Image Processing (VCIP). IEEE, 2016, pp. 1–4.
- [7] A. Colombari, M. Cristani, V. Murino, and A. Fusiello, “Exemplar-based background model initialization,” in Proceedings of the third ACM international workshop on Video surveillance & sensor networks, 2005, pp. 29–36.
- [8] X. Chen, Y. Shen, and Y. H. Yang, “Background estimation using graph cuts and inpainting,” in Proceedings of Graphics Interface 2010, 2010, pp. 97–103.
- [9] L. Maddalena and A. Petrosino, “Towards benchmarking scene background initialization,” in International conference on image analysis and processing. Springer, 2015, pp. 469–476.
- [10] P.-M. Jodoin, L. Maddalena, A. Petrosino, and Y. Wang, “Extensive benchmark and survey of modeling methods for scene background initialization,” IEEE Transactions on Image Processing, vol. 26, no. 11, pp. 5244–5256, 2017.
- [11] Z. Xu, B. Min, and R. C. Cheung, “A robust background initialization algorithm with superpixel motion detection,” Signal Processing: Image Communication, vol. 71, pp. 1–12, 2019.
- [12] W. Zhou, S. Kaneko, D. Liang, M. Hashimoto, and Y. Satoh, “Background subtraction based on co-occurrence pixel-block pairs for robust object detection in dynamic scenes,” IIEEJ Transactions on Image Electronics and Visual Computing, vol. 5, no. 2, pp. 146–159, 2017.
- [13] W. Zhou, S. Kaneko, M. Hashimoto, Y. Satoh, and D. Liang, “Foreground detection based on co-occurrence background model with hypothesis on degradation modification in dynamic scenes,” Signal Processing, vol. 160, pp. 66–79, 2019.
- [14] R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Süsstrunk, “SLIC superpixels compared to state-of-the-art superpixel methods,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 11, pp. 2274–2282, 2012.
- [15] P. Xu, M. Ye, Q. Liu, X. Li, L. Pei, and J. Ding, “Motion detection via a couple of auto-encoder networks,” in 2014 IEEE International Conference on Multimedia and Expo (ICME). IEEE, 2014, pp. 1–6.
- [16] I. Halfaoui, F. Bouzaraa, and O. Urfalioglu, “CNN-based initial background estimation,” in 2016 23rd International Conference on Pattern Recognition (ICPR). IEEE, 2016, pp. 101–106.
- [17] “A dataset for testing background estimation algorithms,” http://scenebackgroundmodeling.net/.
- [18] M. Wang, X. Liu, Y. Gao, X. Ma, and N. Q. Soomro, “Superpixel segmentation: A benchmark,” Signal Processing: Image Communication, vol. 56, pp. 28–39, 2017.
- [19] H. Lu, X. Feng, X. Li, and L. Zhang, “Superpixel level object recognition under local learning framework,” Neurocomputing, vol. 120, pp. 203–213, 2013.
- [20] D. Giordano, F. Murabito, S. Palazzo, and C. Spampinato, “Superpixel-based video object segmentation using perceptual organization and location prior,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 4814–4822.
- [21] T. Lei, X. Jia, Y. Zhang, S. Liu, H. Meng, and A. K. Nandi, “Superpixel-based fast fuzzy c-means clustering for color image segmentation,” IEEE Transactions on Fuzzy Systems, vol. 27, no. 9, pp. 1753–1766, 2018.
- [22] F. Yang, H. Lu, and M.-H. Yang, “Robust superpixel tracking,” IEEE Transactions on Image Processing, vol. 23, no. 4, pp. 1639–1651, 2014.
- [23] S. Baker, D. Scharstein, J. Lewis, S. Roth, M. J. Black, and R. Szeliski, “A database and evaluation methodology for optical flow,” International Journal of Computer Vision, vol. 92, no. 1, pp. 1–31, 2011.
- [24] J. Lim and B. Han, “Generalized background subtraction using superpixels with label integrated motion estimation,” in European Conference on Computer Vision. Springer, 2014, pp. 173–187.
- [25] Z. Wang, E. P. Simoncelli, and A. C. Bovik, “Multiscale structural similarity for image quality assessment,” in The Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, 2003, vol. 2. IEEE, 2003, pp. 1398–1402.
- [26] Y. Yalman and İ. Ertürk, “A new color image quality measure based on YUV transformation and PSNR for human vision system,” Turkish Journal of Electrical Engineering & Computer Sciences, vol. 21, no. 2, pp. 603–612, 2013.
- [27] B. Laugraud and M. Van Droogenbroeck, “Is a memoryless motion detection truly relevant for background generation with LaBGen?” in International Conference on Advanced Concepts for Intelligent Vision Systems. Springer, 2017, pp. 443–454.
- [28] S. Javed, A. Mahmood, T. Bouwmans, and S. K. Jung, “Background–foreground modeling based on spatiotemporal sparse subspace clustering,” IEEE Transactions on Image Processing, vol. 26, no. 12, pp. 5840–5854, 2017.
- [29] A. Djerida, Z. Zhao, and J. Zhao, “Robust background generation based on an effective frames selection method and an efficient background estimation procedure (fsbe),” Signal Processing: Image Communication, vol. 78, pp. 21–31, 2019.
- [30] B. Laugraud, S. Piérard, and M. Van Droogenbroeck, “LaBGen-P-Semantic: A first step for leveraging semantic segmentation in background generation,” Journal of Imaging, vol. 4, no. 7, p. 86, 2018.