Supervised Anomaly Detection Method Combining Generative Adversarial Networks and Three-Dimensional Data in Vehicle Inspections

Yohei Baba, JR East Information Systems Company, Shinjuku, Tokyo, 169-0072, Japan, [email protected]    Takuro Hoshi, JR East Information Systems Company, Shinjuku, Tokyo, 169-0072, Japan, [email protected]    Ryosuke Mori, East Japan Railway Company, Shibuya, Tokyo, 151-0053, Japan, [email protected]    Gaurang Gavai, PARC, A Xerox Company, 3333 Coyote Hill Road, Palo Alto, CA 94304, USA, [email protected]
Abstract

External visual inspections of rolling stock underfloor equipment are currently performed by human inspectors. In this study, we attempt to partially automate these inspections by investigating anomaly detection algorithms that use image processing technology. Because railroad maintenance studies typically have little anomaly data, unsupervised learning methods are usually preferred for anomaly detection; however, training cost and accuracy remain challenges. Previous work has created anomalous images from normal images by adding noise, but the anomaly targeted in this study, the rotation of piping cocks, is difficult to create using noise. Therefore, we propose a new method that applies style conversion via generative adversarial networks (GANs) to three-dimensional computer graphics, imitating anomaly images so that anomaly detection based on supervised learning can be applied. A geometry-consistent style conversion model was used to convert the images; as a result, the color and texture of the converted images successfully imitated the real images while the anomalous shape was maintained. Using the generated anomaly images as supervised data, the anomaly detection model can be trained easily, without complex adjustments, and successfully detects anomalies.

Index Terms:
Computer vision; machine learning; railway engineering; generative adversarial networks; three-dimensional computer graphics.

I Introduction

The railroad is one of Japan’s most widely used means of transportation, and it requires high-quality maintenance to ensure daily stability. However, because inspection requires a considerable amount of work and carries a risk of oversight, methods for automatic railroad inspection are being developed[1]. The JR East Japan Group, Japan’s largest railroad company, is working on smart maintenance that adapts to the state of equipment and vehicles[2]; however, because maintenance is performed before anomalies occur owing to meticulous quality management, little anomaly data is available. The visual inspection of the underfloor equipment of rolling stock, the subject of this research, is performed by human eyes, but anomaly detection has been studied to automate part of this inspection[3]. That study successfully detected anomalies by comparing the luminance of normal and anomalous images. However, it is difficult to apply anomaly detection via supervised learning, which requires a large number of anomaly images as supervised data. In machine learning, securing sufficient data is essential. There are numerous data augmentation methods that create machine learning data by amplifying a small amount of data, as summarized in a previous study[4]. As reported in Ref.[5], images of liver lesions were generated using a deep convolutional GAN (DCGAN)[6] to increase the amount of data for classifying lesion images. Moreover, CycleGAN was used to amplify a small set of facial image data[7, 8]. The amplification methods described above, however, use GANs trained on anomaly data. Because supervised learning requires separate anomaly data for training and testing, these methods cannot be used when only one set of anomaly data is available. Alternatively, AnoGAN is an unsupervised learning method that does not use anomaly data[9].
AnoGAN can generate anomaly detection models without anomaly data, but it has limitations, such as a high computational cost. Therefore, f-AnoGAN was subsequently developed[10], which greatly reduced the computational cost. In generative anomaly detection, a model trained only on normal images cannot reproduce anomaly regions, so errors occur during reconstruction. Recently, extensive research has been done to better capture the range of reconstruction of normal images. Using multiple hypotheses, the range of normal images was captured more precisely[11]. In previous studies, the range of normal images was defined as a Gaussian distribution[12], a model was built using a dual-autoencoder GAN[13], and a rough reconstruction model was combined with a detailed reconstruction model[14]. Model accuracy has also been improved by reducing the influence of the background[15, 16]. Another study increased accuracy by training the model with a small number of anomalous images[17]. Furthermore, accuracy was improved by generating abnormal images from normal images and using them for training[18, 19]. Anomaly detection using only normal images, as described above, is very important in the railroad field. Because a substantial amount of anomaly imagery is not available in railroad maintenance, many railroad corporations and organizations are exploring it. To address cases where much data is available but only a small percentage contains anomalies, accuracy was improved by detecting anomaly data via semisupervised learning and adding it to the training data[20]. Meanwhile, a model was developed for detecting anomalies in overhead wires using a DCGAN discriminator mechanism, an unsupervised-learning model for railroad equipment that does not use anomaly data[21].
Moreover, models that include encoders converting image data into vector spaces were used to create anomaly detection models for electric equipment[22, 23]. These models do not use anomaly images, but there was still room for improvement in terms of accuracy. A method was proposed for detecting anomalies by overlaying rail images of the same area and detecting the differences[24, 25]. This method can be used when there is only one sample image per shot or area, but it required machine learning with anomaly images to prevent the false positives that would otherwise occur. Alternatively, in industrial and robot detection, a lack of anomaly data has been compensated for by creating anomaly images with three-dimensional (3D) computer graphics (CG) software and using them as supervised images. This method can generate anomaly images that do not exist in reality, but differences in color and texture between CG images and actual photos remain an issue, and a model trained only on CG images may fail to recognize actual images. CG simulation images were used to train an object detection model for robotic control; by varying the rendering settings and generating supervised images with various brightnesses and colors, the model could recognize actual images as “one of many variations”[26]. Further, images were generated by adjusting the texture settings of the target objects to approximate the actual texture, while images from the actual site were used for the background portions[27]. These studies show that “object detection” using CG as supervised data was successful, but they do not address “anomaly detection,” which must detect small changes in images. Anomaly detection is a more difficult task than object detection because its precision changes with small differences in texture and luminance.
In this study, we used a GAN on CG images to generate anomaly images as close as possible to the real images captured by an external inspection device and then used them as supervised data to detect anomalies.

II Data

II-A Vehicle side imagery and piping cocks

The images of vehicle sides captured by external vehicle inspection devices are the focus of this study. Fig.1 depicts an image taken with an external vehicle inspection device. Because the resolution of this image is very high (20,501 × 2,048 pixels) and it covers a large area (the entire vehicle), it would be difficult to apply machine-learning anomaly detection directly. We therefore investigated whether anomalies in piping cocks could be detected. A trimmed image of 200 × 200 pixels containing a piping cock and its surrounding area is shown in Fig.2. The piping cock in Fig.2 is in its normal condition; if the piping cock angle differs from that in Fig.2, it is determined to be an anomaly.
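As a rough sketch of this preprocessing step, the patch extraction amounts to a simple crop, assuming the full image is loaded as a NumPy array and the piping-cock coordinates are known in advance (the function name and coordinates below are hypothetical, for illustration only):

```python
import numpy as np

def crop_patch(image, top, left, size=200):
    """Extract a size x size patch from a full-vehicle image.

    image: H x W (x C) array, e.g. 2,048 x 20,501 pixels from the
    external vehicle inspection device.
    top, left: upper-left corner of the region containing the piping cock.
    """
    h, w = image.shape[:2]
    # clamp so the crop never runs off the edge of the vehicle image
    top = max(0, min(top, h - size))
    left = max(0, min(left, w - size))
    return image[top:top + size, left:left + size]

# example: cut a 200 x 200 patch out of a dummy full-vehicle image
full = np.zeros((2048, 20501), dtype=np.uint8)
patch = crop_patch(full, top=1500, left=12000)
```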

Refer to caption
Figure 1: Image captured with an external vehicle inspection device
Refer to caption
Figure 2: Extracted image of piping cock area

Anomaly images

The anomaly images required for the verification of the anomaly detection model were provided by the East Japan Railway Company. The anomaly images are shown in Fig.3.

Refer to caption
Figure 3: Anomaly image

II-B Data flow

Fig.4 depicts the data flow of this study. For normal images, we used images of the actual device for both training and testing. Meanwhile, the anomaly images used for training and testing differ: for training, we used data created in 3D CG and converted with the image conversion model, whereas the test data consisted of anomaly images provided by the East Japan Railway Company. The details of the training image generation method and the number of images used are described in III-B.

Refer to caption
Figure 4: Data flow: (a) flow of training anomaly detection model (b) flow of testing

II-C CG imagery

Based on photos of the anomaly simulator, we created the CG to use as supervised data. Fig.5 depicts the anomaly simulator. We generated 3D geometric data that replicated the geometry of the photos as closely as possible. To create anomaly CG, we used 3D rendering software on the 3D geometric data and exported images rotated at various angles. Fig.6 depicts an example of the CG anomaly. Comparing Fig.2 and Fig.6, the surface of the CG image is more even and lacks the color shading present in the actual image.

Refer to caption
Figure 5: Actual object of the anomaly simulator
Refer to caption
Figure 6: Sample of computer graphics anomaly in piping cock

III Proposed Method

III-A Style conversion via GAN

To make the CG images closer to the actual images, we performed style conversion using machine learning. In this study, we used the geometry-consistency GAN (GCGAN)[28], judging that it was necessary to convert the color and texture while maintaining the geometry. GCGAN is trained so that “a converted image” and “an image that is geometrically transformed, converted, and then transformed back” match. This maintains precise geometry while converting the color and pattern. We used the GCGAN model published on GitHub[29]. The architecture is shown in Fig.7.
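The geometry-consistency idea can be sketched numerically: for a geometric transform f (here a vertical flip) and a generator G, the loss penalizes the difference between f(G(x)) and G(f(x)). The minimal NumPy sketch below (function names are ours, not from the GCGAN code) shows why pointwise color changes incur no penalty while geometry-altering changes do:

```python
import numpy as np

def vflip(img):
    # the geometric transform f: vertical flip (GCGAN also supports 90° rotation)
    return img[::-1, :]

def gc_loss(G, x):
    # geometry-consistency loss: mean | f(G(x)) - G(f(x)) | over pixels
    return float(np.mean(np.abs(vflip(G(x)) - G(vflip(x)))))

x = np.arange(16.0).reshape(4, 4)

# a pointwise "style" change (recoloring every pixel the same way) commutes
# with the flip, so it is not penalized
style = lambda img: 0.5 * img + 3.0
assert gc_loss(style, x) == 0.0

# a geometry-altering change (shifting content downward) does not commute
shift = lambda img: np.roll(img, 1, axis=0)
assert gc_loss(shift, x) > 0.0
```

This is why the conversion can change color and texture freely while keeping the rotated cock in the same position.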

Refer to caption
Figure 7: Network architecture of geometry-consistency generative adversarial networks

As training data, we used 600 actual images and 600 CG images. The dropout layer was disabled, the input size was 200 × 200, and the batch size was 12. The geometric transformation can be either a vertical flip or a 90° rotation; we tried both and adopted the vertical flip. The identity mapping loss parameter, which determines how much of the original image’s color and structure is retained, was set to 0.5. We trained the model for 400 epochs at a constant learning rate, then for 200 epochs while linearly decreasing the learning rate to 0, and selected the best-quality model from those final 200 epochs. We then input the CG anomaly data into the trained model and generated 1,000 anomaly images. Fig.8 shows an example of a generated image. The rotated piping cock and plate are reproduced in the same positions, and the color shading and texture are visibly changed.
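The learning-rate schedule described above (constant for 400 epochs, then linearly decayed to 0 over 200 epochs) can be sketched as a small helper; the base rate of 2 × 10⁻⁴ below is the default of the published GCGAN code and is our assumption, as the paper does not state it:

```python
def learning_rate(epoch, base_lr=2e-4, n_const=400, n_decay=200):
    """Constant base_lr for the first n_const epochs, then linear decay to 0."""
    if epoch < n_const:
        return base_lr
    # fraction of the decay phase already completed
    done = (epoch - n_const) / n_decay
    return base_lr * max(0.0, 1.0 - done)

# epochs 0-399 train at the base rate; epoch 500 is halfway through the decay
assert learning_rate(0) == 2e-4
assert learning_rate(500) == 1e-4
assert learning_rate(600) == 0.0
```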

Refer to caption
Figure 8: Results of converting computer graphics with geometry-consistency generative adversarial networks

III-B Data set

To compare the results of anomaly detection precision trained using CG and anomaly images that had style conversion with GCGAN, we created an anomaly detection model from two patterns of data sets. The details of the data are shown in TableI.

                         CG Training Model      GCGAN Training Model
                         Training    Test       Training    Test
Actual Image - Normal       600       600          600       600
CG Anomaly                  600         0            0         0
GCGAN Anomaly                 0         0          600         0
Actual Image - Anomaly        0         1            0         1

TABLE I: Data set of anomaly detection model

III-C Anomaly detection model using ResNet

To test whether anomaly detection is possible with anomaly images generated using GCGAN as supervised data, we implemented transfer learning based on ResNet[30]. The ResNet50 model from TensorFlow was used; its configuration is shown in Fig.9. We used a learning rate of 1.0 × 10⁻⁵, 50 training epochs, a momentum of 0.9, a batch size of 32, and an image size scaled to 224 × 224. Two patterns of anomaly images, CG and GCGAN, were used for training, and both were trained with the same parameters.
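A minimal sketch of this transfer-learning setup in TensorFlow/Keras follows. The pooling head, the two-class softmax output, and the choice of SGD are our assumptions for illustration; the paper specifies only ResNet50, the learning rate, momentum, batch size, epochs, and image size:

```python
import tensorflow as tf

def build_detector(weights="imagenet"):
    # ResNet50 backbone; pass weights=None to skip the pretrained download
    base = tf.keras.applications.ResNet50(
        include_top=False, weights=weights, input_shape=(224, 224, 3))
    model = tf.keras.Sequential([
        base,
        tf.keras.layers.GlobalAveragePooling2D(),
        # 2-way head: 0 = normal, 1 = anomaly
        tf.keras.layers.Dense(2, activation="softmax"),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.SGD(learning_rate=1.0e-5, momentum=0.9),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"])
    return model

# training would then be:
# model.fit(train_images, train_labels, batch_size=32, epochs=50)
```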

Refer to caption
Figure 9: Structure of anomaly detection model

IV Results and Discussion

IV-A Results

Fig.10 depicts the results of anomaly detection using only CG as supervised data. The anomaly images could not be distinguished at all, indicating that improving precision with CG anomaly images alone is difficult. Fig.11 displays the detection results of the GCGAN training model, which was effective in detecting anomalous images and increased the accuracy.

Refer to caption
Figure 10: Detection precision of computer graphics training model (0: normal, 1: anomaly)
Refer to caption
Figure 11: Detection precision of GCGAN training model (0: normal, 1: anomaly)

IV-B Discussion

There was a clear difference in the detectability of anomaly images between training with raw CG and training with CG converted by GCGAN. However, whether the models actually focus on the position of the cock was unclear. We therefore used Grad-CAM to visualize the areas that the anomaly detection models consider when making decisions[31]. The Grad-CAM results for an anomaly image are shown in Fig.12. The reaction of the CG-trained model is spread across the entire image, whereas the model trained with GCGAN images focuses on the handle position. This indicates that GCGAN’s texture conversion shifted the detection model’s focus to the image’s structure.
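Grad-CAM's core computation is small: the gradients of the class score with respect to the last convolutional feature maps are global-average-pooled into per-channel weights, and the heatmap is the ReLU of the weighted channel sum. A NumPy sketch, with synthetic arrays standing in for the detector's real activations and gradients:

```python
import numpy as np

def grad_cam(activations, gradients):
    """activations: (H, W, K) feature maps from the last conv layer.
    gradients: (H, W, K) d(class score)/d(activation), same shape."""
    weights = gradients.mean(axis=(0, 1))                       # GAP -> (K,)
    cam = np.tensordot(activations, weights, axes=([2], [0]))   # weighted sum
    cam = np.maximum(cam, 0.0)                                  # ReLU: keep positive evidence
    if cam.max() > 0:
        cam = cam / cam.max()                                   # normalize to [0, 1]
    return cam
```

Upsampling the resulting H × W map to the input resolution and overlaying it on the image gives reaction regions like those in Fig.12.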

Refer to caption
Figure 12: Result of Grad-CAM (a)Reaction region of CG trained model (b)Reaction region of GCGAN trained model

V Conclusions

In this research, we verified that anomaly detection is possible using fictional anomaly images, generated by combining CG and GCGAN, as supervised data. A common issue in railroad maintenance is the scarcity of anomaly data. When a certain number of anomaly patterns can be obtained from a small amount of data (for example, several dozen to several hundred anomaly samples), existing research allows the data to be increased through augmentation. Furthermore, although models can be built via unsupervised learning using only normal images, such models produce more false positives owing to environmental changes, causing precision issues. In this research, we were able to specify the anomaly patterns by generating fictional anomaly images and succeeded in detecting anomalies without increasing the number of false positives. Creating the 3D CG models specific to this research required manual effort, but we believe that, in the future, learning data could be collected with a simulator, as has been done in related research on robot recognition. Going forward, we will expand on this research, address the issue of scarce anomaly data shared across the railroad industry, and aspire to implement automatic anomaly detection in society.

Acknowledgement

We wish to thank the East Japan Railway Company for supplying sample data. We would also like to thank our colleagues at the Technical Research Center of JR East Information Systems for useful discussions.

References

  • [1] M. Chenariyan Nakhaee, D. Hiemstra, M. Stoelinga, and M. Noort, “The recent applications of machine learning in rail track maintenance: A survey,” pp. 91–105, 01 2019.
  • [2] E. J. R. Company, “Revolution2027,” 2018, https://www.jreast.co.jp/press/2018/20180702.pdf.
  • [3] H. Nakajima, A. Uno, and T. Fujii, “Development of outside inspection system for rolling stock,” 2019. [Online]. Available: https://www.jreast.co.jp/development/tech/pdf_62/tech-62-11-14.pdf
  • [4] C. Shorten and T. M. Khoshgoftaar, “A survey on image data augmentation for deep learning,” Journal of Big Data, vol. 6, no. 1, pp. 1–48, 2019.
  • [5] M. Frid-Adar, I. Diamant, E. Klang, M. Amitai, J. Goldberger, and H. Greenspan, “Gan-based synthetic medical image augmentation for increased cnn performance in liver lesion classification,” Neurocomputing, vol. 321, p. 321–331, Dec 2018. [Online]. Available: http://dx.doi.org/10.1016/j.neucom.2018.09.013
  • [6] A. Radford, L. Metz, and S. Chintala, “Unsupervised representation learning with deep convolutional generative adversarial networks,” 2016. [Online]. Available: https://arxiv.org/abs/1511.06434
  • [7] X. Zhu, Y. Liu, Z. Qin, and J. Li, “Data augmentation in emotion classification using generative adversarial networks,” 2017. [Online]. Available: https://arxiv.org/abs/1711.00648
  • [8] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired image-to-image translation using cycle-consistent adversarial networks,” 2020. [Online]. Available: https://arxiv.org/abs/1703.10593
  • [9] T. Schlegl, P. Seeböck, S. M. Waldstein, U. Schmidt-Erfurth, and G. Langs, “Unsupervised anomaly detection with generative adversarial networks to guide marker discovery,” 2017. [Online]. Available: https://arxiv.org/abs/1703.05921
  • [10] T. Schlegl, P. Seeböck, S. M. Waldstein, G. Langs, and U. Schmidt-Erfurth, “f-anogan: Fast unsupervised anomaly detection with generative adversarial networks,” Medical Image Analysis, vol. 54, pp. 30–44, 2019. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1361841518302640
  • [11] D. T. Nguyen, Z. Lou, M. Klar, and T. Brox, “Anomaly detection with multiple-hypotheses predictions,” 2019. [Online]. Available: https://arxiv.org/abs/1810.13292
  • [12] X. Wang, Y. Du, S. Lin, P. Cui, Y. Shen, and Y. Yang, “advae: A self-adversarial variational autoencoder with gaussian anomaly prior knowledge for anomaly detection,” Knowledge-Based Systems, vol. 190, p. 105187, Feb 2020. [Online]. Available: http://dx.doi.org/10.1016/j.knosys.2019.105187
  • [13] T.-W. Tang, W.-H. Kuo, J.-H. Lan, C.-F. Ding, H. Hsu, and H.-T. Young, “Anomaly detection neural network with dual auto-encoders gan and its industrial inspection applications,” Sensors, vol. 20, no. 12, 2020. [Online]. Available: https://www.mdpi.com/1424-8220/20/12/3336
  • [14] Y. Liu, C. Zhuang, and F. Lu, “Unsupervised two-stage anomaly detection,” 2021. [Online]. Available: https://arxiv.org/abs/2103.11671
  • [15] D. Kimura, S. Chaudhury, M. Narita, A. Munawar, and R. Tachibana, “Adversarial discriminative attention for robust anomaly detection,” in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), March 2020.
  • [16] J. W. Song, K. Kong, Y. I. Park, and S.-J. Kang, “Attention map-guided two-stage anomaly detection using hard augmentation,” 2021. [Online]. Available: https://arxiv.org/abs/2103.16851
  • [17] J.-H. Lee, M. Astrid, M. Z. Zaheer, and S.-I. Lee, “Deep visual anomaly detection with negative learning,” 2021. [Online]. Available: https://arxiv.org/abs/2105.11058
  • [18] C. Wang, Y.-M. Zhang, and C.-L. Liu, “Anomaly detection via minimum likelihood generative adversarial networks,” 2018. [Online]. Available: https://arxiv.org/abs/1808.00200
  • [19] V. Zavrtanik, M. Kristan, and D. Skočaj, “Draem - a discriminatively trained reconstruction embedding for surface anomaly detection,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), October 2021, pp. 8330–8339.
  • [20] S. Hajizadeh, A. Núñez, and D. Tax, “Semi-supervised rail defect detection from imbalanced image data,” IFAC-PapersOnLine, vol. 49, pp. 78–83, 12 2016.
  • [21] Y. Lyu, Z. Han, J. Zhong, C. Li, and Z. Liu, “A gan-based anomaly detection method for isoelectric line in high-speed railway,” in 2019 IEEE International Instrumentation and Measurement Technology Conference (I2MTC), 2019, pp. 1–6.
  • [22] X. Lu, Y. Peng, W. Quan, N. Zhou, D. Zou, and J. X. Chen, “An anomaly detection method for outdoors insulator in high-speed railway traction substation,” in 2020 2nd International Conference on Advances in Computer Technology, Information Science and Communications (CTISC), 2020, pp. 161–165.
  • [23] B. Zhao, R. Xue, and Q. Zhang, “Anomaly detection in railway transportation based on self-representation and generative adversarial networks,” 2021. [Online]. Available: http://aium2021.felk.cvut.cz/papers/AI4UM_paper_6.pdf
  • [24] L. Gao, Y. Jiu, X. Wei, Z. Wang, and W. Xing, “Anomaly detection of trackside equipment based on gps and image matching,” IEEE Access, vol. 8, pp. 17 346–17 355, 2020.
  • [25] X. Guo, X. Wei, M. Guo, X. Wei, L. Gao, and W. Xing, “Anomaly detection of trackside equipment based on semi-supervised and multi-domain learning,” in 2020 15th IEEE International Conference on Signal Processing (ICSP), vol. 1, 2020, pp. 268–273.
  • [26] J. Tobin, R. Fong, A. Ray, J. Schneider, W. Zaremba, and P. Abbeel, “Domain randomization for transferring deep neural networks from simulation to the real world,” 2017. [Online]. Available: https://arxiv.org/abs/1703.06907
  • [27] T. Kudo and R. Takimoto, “Cg utilization for creation of regression model training data in deep learning,” Procedia Computer Science, vol. 159, pp. 832–841, 2019, knowledge-Based and Intelligent Information & Engineering Systems: Proceedings of the 23rd International Conference KES2019. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1877050919314292
  • [28] H. Fu, M. Gong, C. Wang, K. Batmanghelich, K. Zhang, and D. Tao, “Geometry-consistent generative adversarial networks for one-sided unsupervised domain mapping,” 2018. [Online]. Available: https://arxiv.org/abs/1809.05852
  • [29] ——, “Gcgan: Geometry-consistent generative adversarial networks for one-sided unsupervised domain mapping,” 2019, https://github.com/hufu6371/GcGAN.
  • [30] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” 2015. [Online]. Available: https://arxiv.org/abs/1512.03385
  • [31] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, “Grad-cam: Visual explanations from deep networks via gradient-based localization,” International Journal of Computer Vision, vol. 128, no. 2, p. 336–359, Oct 2019. [Online]. Available: http://dx.doi.org/10.1007/s11263-019-01228-7