¹¹footnotetext: Hai-Long Tu and Yang Liu contributed equally to this work.²²footnotetext: Corresponding author. [email protected]

Networks with pixel embedding: a method to improve noise resistance in image classification

Yang Liu Hai-Long Tu Chi-Chun Zhou Yi Liu and Fu-Lin Zhang

Abstract

In the task of image classification, usually, the network is sensitive to noises. For example, an image of cat with noises might be misclassified as an ostrich. Conventionally, to overcome the problem of noises, one uses the technique of data augmentation, that is, to teach the network to distinguish noises by adding more images with noises in the training dataset. In this work, we provide a noise-resistance network in images classification by introducing a technique of pixel embedding. We test the network with pixel embedding, which is abbreviated as the network with PE, on the mnist database of handwritten digits. It shows that the network with PE outperforms the conventional network on images with noises. The technique of pixel embedding can be used in many tasks of image classification to improve noise resistance.

1 Introduction

In the task of image classification, information of labels is embedded in pixels of the image. For example, in an image of dogs, a group of pixels with different values will give the typical pattern of the dog, such as the ear and the nose. If the network can remember those typical patterns constructed by a group of pixels, then it can predict the label of an image with high accuracy. For example, neural networks based on convolution and pooling are very good at extracting typical patterns from pixels of the image [1, 2] and even outperform humans in some tasks of image classification [3].

However, one finds that the conventional network is sensitive to noises [4, 5]. For example, an image of cat with noises might be misclassified as an ostrich. The low noise resistance of the conventional network will lead to problems such as the safety problem of face recognition. Therefore, how to increase the noise resistance of the network in image classification becomes a problem.

Conventionally, to overcome the problem of noises, one uses the technique of data augmentation [6, 7, 8, 9], that is, to teach the network to distinguish noises by adding more image samples with noises in the training dataset. Or, one develops techniques of images noise reduction [10, 11, 12, 13].

In the task of image classification, instead of understanding the meaning of pixels, the network just need to remember the typical patterns constructed by a group of pixels. On the one hand, noises are random values added to pixels of the image. On the other hand, the conventional network is designed to remember typical patterns constructed by a group of pixels. Thus, noises will greatly interfere in those typical patterns that has been learned by the network. As a result, networks will give wrong labels because pixels with noises now become highly confusing.

In this paper, we provide a noise-resistance network in image classification by introducing a technique of pixel embedding. The network with pixel embedding abbreviated as the network with PE is designed not only to remember the typical pattern but also to "understand" the meaning of the pixels. The method in the present paper is enlightened by the classical technique of words embedding [14, 15, 16] in the natural language processing (NLP). The technique of words embedding helps the network to "understand" the meaning of the words and learns knowledges such as analogy and context. We test the network with PE on the mnist database of handwritten digits. By comparing the behavior of networks with and without PE on the test dataset of noised images of the mnist dataset, we show that the network with PE outperforms the conventional network on images with noises and thus is with high noise resistance. The technique of pixel embedding can be used in many tasks of image classification to improve noise resistance. The source code is released in github. ³³footnotetext: $https://github.com/zhouchichun/pixels\_embedding$

The work is organized as follows: in Sec. 2, we introduce the technique of pixel embedding. In Sec. 3, we test the network with PE in the classification task on the mnist database of handwritten digits. Conclusions and outlooks are given in Sec. 4.

2 pixel embedding

In this section, we introduce the technique of pixel embedding. The main ideal of the technique of pixel embedding is to replace the single-valued pixel with a vector which has the shape $1\times M$ with $M$ the embedding size. Values of the vector are parameters that need to be trained. By using the technique of pixel embedding, the network is able to "understand" the meaning of the pixels and learn to distinguish noises automatically.

The main procedure of pixel embedding is as follow: for an image with $H\times L$ pixels, firstly, we normalize the pixels by subtracting the average and dividing the standard deviation. Secondly, we convert the normalized pixels into an integer by multiplying pixels by an integer, say $1000$ , and taking the integer portion. Thirdly, we introduce an embedding dictionary which maps an integer to a unique random vector with embedding size, say $64$ . The procedure is shown in Fig. (1).

Refer to caption — Figure 1: The procedure of pixel embedding.

The network used in the experiment. The technique of pixel embedding can be used in various kinds of conventional networks. Without loss of generality, the network we used here is based on the VGG16, a classical network for image classification. The structure of the network is shown in Fig. (2). The detail of the structure of the neural network is given in Table. (1)

Table 1: ConvNet configurations.

Layer	Name	Output Shape
$0$	Input	(28,28, $E_{s}$ )
$1$	Convolution	(28,28,64)
$2$	Max pooling	(14,14,64)
$3$	Convolution	(14,14,64)
$4$	Max pooling	(7,7,64)
$5$	Folding	(1,3136)
$6$	Full connection	(1,10)

Training parameters are show in Table. (2)

Table 2: Training parameters of the network used in the present paper.

Length of the embedding dictionary	$2000$
Optimization algorithm	Adam
Integer to multiply	$1000$
Embedding size	$64$
Learning rate	$0.0001$
Batch size	$64$

3 The classification task on the mnist database

In this section, we test the network with PE on the mnist database. The mnist database of handwritten digits [17, 18] is an famous open dataset for image classification task. We train networks with PE and without PE on $60,000$ training images respectively and test them on $10,000$ test images with different noises. The behaviors of networks with and without PE on the test dataset of noised images of the mnist dataset are as follows.

For the test dataset with gauss noise, the result is given in Table. (3).

Table 3: The accuracy on test dataset with gauss noise of networks with and without PE.

Number	Index	Accuracy (without PE)	Accuracy (with PE)
case1	Standard test dataset	$98.7$	$97.83$
case2	$\mu=1.0,\sigma=1.0$	$85.70$	$95.94$
case3	$\mu=5.0,\sigma=1.0$	$88.66$	$88.10$
case4	$\mu=10,\sigma=2$	$78.53$	$80.18$
case5	$\mu=25,\sigma=5$	$53.24$	$67.92$
case6	$\mu=50,\sigma=10$	$28.71$	$59.89$
case7	$\mu=75,\sigma=15$	$18.61$	$40.22$
case8	$\mu=100,\sigma=10$	$17.81$	$33.52$
case9	$\mu=150,\sigma=10$	$12.22$	$18.84$

In Table. (3), " $\mu=1.0,\sigma=1.0$ " means that the average value of noises is $1.0$ and the standard deviation is $1.0$ and so on. Because the pixels are integers between $0$ and $255$ , the random noise are transformed to integers between $0$ and $255$ . As shown in Fig. (3).

For the test dataset with salt-and-pepper noise, the result is given in Table. (4).

Table 4: The accuracy on test dataset with salt-and-pepper noise of networks with and without PE.

Number	Index	Accuracy (without PE)	Accuracy (with PE)
case1	Standard test dataset	$98.17$	$97.83$
case2	$p=0.1,v=25$	$94.72$	$96.96$
case3	$p=0.1,v=50$	$90.10$	$97.07$
case4	$p=0.1,v=100$	$84.24$	$96.03$
case5	$p=0.2,v=50$	$86.93$	$95.28$
case6	$p=0.3,v=50$	$82.36$	$91.14$
case7	$p=0.4,v=50$	$77.31$	$89.59$
case8	$p=0.5,v=50$	$69.53$	$78.51$
case9	$p=0.5,v=100$	$49.47$	$77.75$
case10	$p=0.6,v=50$	$60.00$	$81.55$
case11	$p=0.6,v=100$	$35.38$	$64.12$
case12	$p=0.7,v=50$	$48.40$	$71.72$
case13	$p=0.7,v=100$	$22.20$	$59.09$
case14	$p=0.8,v=100$	$14.93$	$47.62$

In Table. (4), " $p=0.1,v=25$ " means that the value of the noise pixels is $25$ and the probability of a pixel to have a noise is $0.1$ and so on. The random noise are also transformed to integers between $0$ and $255$ . As shown in Figs. (4)-(5).

4 Conclusions and outlooks

In this paper, beyond the conventional method of data augmentation, where more sample datas are added to the training dataset, we propose a new method that lead to a noise-resistance network where no more training sample is needed. In this approach, a technique of pixel embedding is introduced. This method is enlightened by the classical techniques of word embedding in the NLP. The technique of pixel embedding extends the meaning of pixels, and thus, the network can learn the knowledge such as the noise, the context, and the meaning of the pixel. We test the network with PE on the mnist database of handwritten digits. By comparing the behavior of networks with and without PE on the test dataset of noised images of the mnist dataset, we show that the network with PE outperforms the conventional network on images with noises and thus is with high noise resistance. The technique of pixel embedding can be used in various kinds of tasks of image classification to improve noise resistance.

To remember or to learn. In the task of image classification, instead of understanding the meaning of pixels, the network just need to remember the typical patterns constructed by a group of pixels. To remember is much easier than to "understand", thus, the methodology of deep neural network achieve success in the classification tasks of images first. However, at cases, such as image classification with noises, understanding a sentence in the NLP, discovering hidden laws from datas in financial, mathematics, and physics, the network has to learn more knowledges from the raw data other than just remember typical patterns constructed by combinations of raw features. To do this, we have to introduce techniques that can make the network more "intelligent".

The technique of pixel embedding is an attempt to make the network to "understand" the meaning of pixels other than to remember typical patterns constructed by groups of pixels.

5 Authors’ contributions

Hai-Long Tu performed the experiment. Yang Liu checked the algorithm, reconstructed the code, and repeated the experiment. Hai-Long Tu and Yang Liu contributed equally to this work

6 Acknowledgments

We are very indebted to Profs. Wu-Sheng Dai, Guan-Wen Fang, and Yong Xie for their encouragements. This work is supported by "Artificial intelligence + domestic waste classification" of the basic research youth project of Yunnan Provincial Science and Technology Department.

7 Appendix

It shows that the pixel embedding increases the noise resisting ability of the algorithm. However, in the experiment, the pixel embedding is applied after normalization preprocess. Here, in order to show that the pixel embedding contributes mainly to the noise resisting ability other than the normalization preprocess, we give the result of network without pixel embedding but with normalization, as shown in Table. (5)

Table 5: The accuracy on test dataset with gauss noise of networks with only normalization preprocess.

Number	Index	Accuracy
case1	Standard test dataset	$97.57$
case2	$\mu=1.0,\sigma=1.0$	$70.25$
case3	$\mu=5.0,\sigma=1.0$	$72.26$
case4	$\mu=10,\sigma=2$	$54.14$
case5	$\mu=25,\sigma=5$	$24.85$
case6	$\mu=50,\sigma=10$	$11.56$
case7	$\mu=75,\sigma=15$	$10.11$
case8	$\mu=100,\sigma=10$	$9.94$
case9	$\mu=150,\sigma=10$	$9.93$

References

[1] A. Krizhevsky, I. Sutskever, and G. E. Hinton, Imagenet classification with deep convolutional neural networks, in Advances in neural information processing systems, pp. 1097–1105, 2012.
[2] J. Gu, Z. Wang, J. Kuen, L. Ma, A. Shahroudy, B. Shuai, T. Liu, X. Wang, G. Wang, J. Cai, et al., Recent advances in convolutional neural networks, Pattern Recognition 77 (2018) 354–377.
[3] M. Z. Alom, T. M. Taha, C. Yakopcic, S. Westberg, P. Sidike, M. S. Nasrin, B. C. Van Esesn, A. A. S. Awwal, and V. K. Asari, The history began from alexnet: A comprehensive survey on deep learning approaches, arXiv preprint arXiv:1803.01164 (2018).
[4] R. Geirhos, D. H. Janssen, H. H. Schütt, J. Rauber, M. Bethge, and F. A. Wichmann, Comparing deep neural networks against humans: object recognition when the signal gets weaker, arXiv preprint arXiv:1706.06969 (2017).
[5] G. B. P. da Costa, W. A. Contato, T. S. Nazare, J. E. Neto, and M. Ponti, An empirical study on the effects of different types of noise in image classification tasks, arXiv preprint arXiv:1609.02781 (2016).
[6] B. R. Frieden, Image augmentation and restoration, in Picture processing and digital filtering, pp. 177–248. Springer, 1975.
[7] F. Russo, An image augmentation technique combining sharpening and noise reduction, IEEE Transactions on Instrumentation and Measurement 51 (2002), no. 4 824–828.
[8] D. Ray, D. D. Majumder, and A. Das, Noise reduction and image augmentation of mri using adaptive multiscale data condensation, in 2012 1st International Conference on Recent Advances in Information Technology (RAIT), pp. 107–113, IEEE, 2012.
[9] Q. Ye, H. Huang, and C. Zhang, Image augmentation using stochastic resonance [sonar image processing applications], in 2004 International Conference on Image Processing, 2004. ICIP’04., vol. 1, pp. 263–266, IEEE, 2004.
[10] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol, Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion, Journal of machine learning research 11 (2010), no. Dec 3371–3408.
[11] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang, Beyond a gaussian denoiser: Residual learning of deep cnn for image denoising, IEEE Transactions on Image Processing 26 (2017), no. 7 3142–3155.
[12] S. Lefkimmiatis, Non-local color image denoising with convolutional neural networks, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3587–3596, 2017.
[13] T. Brooks, B. Mildenhall, T. Xue, J. Chen, D. Sharlet, and J. T. Barron, Unprocessing images for learned raw denoising, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 11036–11045, 2019.
[14] O. Levy and Y. Goldberg, Neural word embedding as implicit matrix factorization, in Advances in neural information processing systems, pp. 2177–2185, 2014.
[15] D. Tang, F. Wei, N. Yang, M. Zhou, T. Liu, and B. Qin, Learning sentiment-specific word embedding for twitter sentiment classification, in Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1555–1565, 2014.
[16] Y. Goldberg and O. Levy, word2vec explained: deriving mikolov et al.’s negative-sampling word-embedding method, arXiv preprint arXiv:1402.3722 (2014).
[17] Y. LeCun, The mnist database of handwritten digits, http://yann. lecun. com/exdb/mnist/ (1998).
[18] L. Deng, The mnist database of handwritten digit images for machine learning research [best of the web], IEEE Signal Processing Magazine 29 (2012), no. 6 141–142.