Email: {qinrongzxxlxy,huangsheng}@cqu.edu.cn, {lhuangfu,dhood1037}@sdsu.edu, [email protected]
indicates corresponding author.
Kernel Inversed Pyramidal Resizing Network for Efficient Pavement Distress Recognition
Abstract
Pavement Distress Recognition (PDR) is an important step in pavement inspection and can be powered by image-based automation to expedite the process and reduce labor costs. Pavement images are often high-resolution, with a low ratio of distressed to non-distressed areas. Advanced approaches leverage these properties by dividing images into patches and exploring discriminative features in the scale space. However, these approaches usually suffer from information loss during image resizing and from low efficiency due to complex learning frameworks. In this paper, we propose a novel and efficient method for PDR. A light network named the Kernel Inversed Pyramidal Resizing Network (KIPRN) is introduced for image resizing, and can be flexibly plugged into an image classification network as a pre-network to exploit resolution and scale information. In KIPRN, pyramidal convolution and kernel inversed convolution are specifically designed to mine discriminative information across different feature granularities and scales. The mined information is passed along to the resized images to yield an informative image pyramid that assists the image classification network in PDR. We applied our method to three well-known Convolutional Neural Networks (CNNs), and conducted an evaluation on a large-scale pavement image dataset named CQU-BPDD. Extensive results demonstrate that KIPRN can generally improve the pavement distress recognition of these CNN models, and show that the simple combination of KIPRN and EfficientNet-B3 significantly outperforms the state-of-the-art patch-based method in both performance and efficiency.
Keywords:
Pavement Distress Recognition, Image Classification, Resizing Network.
1 Introduction
Pavement distress is one of the largest threats to modern road networks and, as such, Pavement Distress Recognition (PDR) is an important aspect in maintaining logistics infrastructure. Traditionally, pavement distress recognition is done manually by professionals, and this requires a large overhead of labor and extensive domain knowledge [1]. Given the complex and vast network of roadways, it is almost impossible to accomplish the pavement distress inspection manually. Therefore, automating the pavement distress recognition task is essential.
In recent decades, many methods have been proposed to address this issue through the use of computer vision. Conventional approaches often utilize rudimentary image analysis, hand-crafted features, and traditional classifiers [20, 27, 5, 15, 18]. For example, Salman et al. [20] proposed a crack detection method based on the Gabor filter, and Li et al. [15] developed a neighboring difference histogram method to detect conventional, human visual, pavement disease. The main problem of these approaches is that the optimizations of feature extraction and image classification steps are separated or even omitted from any learning process. Inspired by the remarkable success of deep learning, numerous researchers have employed such models to solve this problem [11, 10, 17, 14, 26, 2]. For example, Gopalakrishnan et al. [11] employed a Deep Convolutional Neural Network (DCNN) trained on the ImageNet database, and transferred that learning to automatically detect cracks in Hot-Mix Asphalt (HMA) and Portland Cement Concrete (PCC) surfaced pavement images. Fan et al. [10] proposed a novel road crack detection algorithm that is based on deep learning and adaptive image segmentation. However, these approaches often overlook many key characteristics of pavement images, such as the high image resolution and the low ratio of distressed to non-distressed areas.

Huang et al. [13] presented a Weakly Supervised Patch Label Inference Network with Image Pyramid (WSPLIN-IP) to solve those problems. WSPLIN-IP exploits the high-resolution information and the scale space of images by dividing an image pyramid into patches for label inference via weakly supervised learning, and achieves promising performance in comparison with other state-of-the-art approaches. However, its patch collection strategy and complex patch label inference process can lead to low efficiency in practical applications. Moreover, CNN-based methods, which form the mainstream PDR approach, often need to resize images to a uniform size, and traditional resizing algorithms such as bilinear interpolation are employed for this purpose. As a result, image resizing is completely independent of model optimization, which inevitably leads to the loss of some discriminative information. The few existing related studies, such as [19], are often difficult to apply to pavement images due to the high sensitivity of pavement diseases to deformation and the need for multi-scale input.
Inspired by the idea of the Resizing Network [24], we design a light image resizing network named the Kernel Inversed Pyramidal Resizing Network (KIPRN) to address these issues. The KIPRN can be integrated into any deep learning-based model as a self-contained supplemental module and optimized jointly with that model. It learns to retain information and to compensate for the information loss caused by image resizing based on bilinear interpolation. As shown in Figure 1, KIPRN employs pyramidal convolutions [9] to extract information from the original pavement images at different granularities, and then produces a three-layer image pyramid for each input image with our designed kernel inversed ResBlocks. Pyramidal convolutions enable the mining of more resolution information with convolutions of different sizes, while the kernel inversed ResBlocks can better mine the scale information by enlarging the differences between the relative receptive fields in different resizing branches. Finally, the produced image pyramid is input into the subsequent deep learning-based PDR model to exploit the scale space again without greatly increasing the computational burden.
We evaluated the KIPRN on a large-scale pavement image dataset named CQU-BPDD [26]. Extensive results show that our method generally boosts many deep learning-based PDR models. Moreover, our enhanced EfficientNet-B3 not only achieved state-of-the-art performances, but also obtained prominent advantages in efficiency compared with the WSPLIN-IP, which is a recent state-of-the-art PDR approach that also considers the resolution and scale information and utilizes the EfficientNet-B3 as its backbone network. The main contributions of our work can be summarized as follows:
- We propose a novel resizing network named KIPRN that can boost any deep learning-based PDR approach by exploiting the resolution and scale information of images. Moreover, the KIPRN does not incur a significant computational cost. To the best of our knowledge, our work is the first attempt to use a deep learning model to study image resizing in pavement distress analysis.
- We propose kernel inversed ResBlocks, which apply the smaller convolutional kernels to the larger feature maps and the larger convolutional kernels to the smaller feature maps, in a size-inverted way. This implicitly enlarges the scale space and thereby enhances the exploitation of scale information.
- Extensive results demonstrate that the KIPRN can generally improve deep learning-based PDR approaches and achieve state-of-the-art performance without greatly increasing the computational burden.
2 Methodology
2.1 Problem Formulation and Overview
Pavement distress recognition is an image classification task that classifies the images of damaged pavements into different distress categories. Let $X = \{x_i\}_{i=1}^{n}$ and $Y = \{y_i\}_{i=1}^{n}$ be the pavement images and their corresponding labels respectively. $y_i$ is a $C$-dimensional one-hot vector, where $C$ is the number of pavement distress categories. $y_{ij}$ represents the $j$-th element of $y_i$. If the $j$-th element is the only non-zero element, it indicates that the corresponding pavement image has the $j$-th type of pavement distress. The goal of pavement distress recognition is to train a PDR model to recognize pavement distress in a given pavement image.
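As a small illustration of the label encoding described above, the following sketch builds and decodes a one-hot label. The category names are the seven distress types of CQU-BPDD, but their ordering, and the helper names `one_hot` and `decode`, are assumptions made for this example only.

```python
# Seven distress types of CQU-BPDD; the index order is an assumption for illustration.
CATEGORIES = [
    "alligator crack", "crack pouring", "longitudinal crack",
    "massive crack", "transverse crack", "raveling", "repair",
]


def one_hot(index, num_classes):
    """Return a one-hot list whose only non-zero element sits at `index`."""
    if not 0 <= index < num_classes:
        raise ValueError("index out of range")
    return [1 if j == index else 0 for j in range(num_classes)]


def decode(label):
    """Return the position of the only non-zero element of a one-hot label."""
    nonzero = [j for j, value in enumerate(label) if value != 0]
    if len(nonzero) != 1:
        raise ValueError("label is not one-hot")
    return nonzero[0]


y = one_hot(2, len(CATEGORIES))
print(y, "->", CATEGORIES[decode(y)])
```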
In deep learning-based PDR, the high-resolution images are often resized into a fixed size to meet the input or efficiency requirement of these models. Moreover, some studies also show that exploiting the scale information of pavement images can benefit pavement distress recognition. Thus, image resizing is an inevitable process in pavement distress recognition based on deep learning. However, the conventional linear interpolation-based image resizing process is independent of the optimization of the pavement distress model, and, thereby, often causes the loss of discriminative information.
To address this issue, we propose an end-to-end network named KIPRN that learns to resize the pavement images and thereby aids the PDR model. KIPRN can be plugged into any deep learning-based image classification model as a pre-network and optimized jointly with that model. The pavement distress recognition process based on the KIPRN can be represented as follows,
$$\hat{y}_i = P_\phi\big(\Gamma_\theta(x_i)\big), \tag{1}$$
where $\Gamma_\theta(\cdot)$ and $P_\phi(\cdot)$ are the mapping functions of the KIPRN and the PDR model respectively, while $\theta$ and $\phi$ are their corresponding parameters. As shown in Figure 1, the KIPRN consists of three modules, namely Image Resizing (IR), Pyramidal Convolution (PC) and Kernel Inversed Convolution (KIC). On the one hand, the original pavement images are resized into different sizes through IR. On the other hand, PC extracts the features of the original pavement images at different granularities, and then KIC mines the scale information of the extracted features in different branches via inversed kernels. The mined information is compensated into the resized images to yield a three-layer image pyramid for each pavement image. Finally, the produced image pyramid is input into the subsequent deep learning-based PDR model to accomplish the label inference.
2.2 Image Resizing
The Image Resizing (IR) module is used to resize the original pavement image into $m$ different sizes. Here, we set $m = 3$. This process can be represented as follows,
$$R_i = \{r_i^k\}_{k=1}^{m} = B(x_i), \tag{2}$$
where $B(\cdot)$ is any chosen traditional image resizing algorithm, and $R_i$ is the collection of resized images. $r_i^k$ is the $k$-th resized pavement image. We followed [24] and adopted bilinear interpolation as the image resizing algorithm. The KIPRN compensates the subsequently learned discriminative and scale information into $R_i$ to generate a pavement image pyramid that is more conducive for a deep learning-based PDR model to correctly recognize pavement distress.
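The resizing step can be sketched in plain Python as follows. This is a minimal illustration of bilinear interpolation producing a three-level pyramid, not the authors' implementation: the corner-aligned sampling convention, the function names, and the three target sizes are all assumptions chosen for the example and are not the resolutions used in the experiments.

```python
def bilinear_resize(img, out_h, out_w):
    """Resize a 2-D list of floats with bilinear interpolation (corner-aligned)."""
    in_h, in_w = len(img), len(img[0])
    out = []
    for r in range(out_h):
        # Map the output row back to a fractional source row.
        y = r * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        wy = y - y0
        row = []
        for c in range(out_w):
            x = c * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            wx = x - x0
            # Interpolate along columns first, then along rows.
            top = img[y0][x0] * (1 - wx) + img[y0][x1] * wx
            bottom = img[y1][x0] * (1 - wx) + img[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


def image_pyramid(img, sizes):
    """Return one bilinearly resized copy of `img` per (height, width) in `sizes`."""
    return [bilinear_resize(img, h, w) for h, w in sizes]


# An 8x8 ramp image stands in for a pavement image.
image = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
# Hypothetical target sizes, chosen only to keep the example small.
pyramid = image_pyramid(image, [(3, 3), (4, 4), (5, 5)])
for level in pyramid:
    print(len(level), "x", len(level[0]))
```

Because the interpolation weights are fixed, nothing in this step can adapt to the recognition task, which is the limitation the learned compensation is meant to address.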
2.3 Pyramidal Convolution
The next step of the KIPRN is to leverage a Pyramidal Convolution (PC) module to mine and preserve the relevant information of the original images at different feature granularities. The golden dashed rectangle in Figure 1 shows the details of the PC module, which consists of two layers of pyramidal convolution [9]. The first layer adopts three convolution kernels of different sizes, while the second layer adopts two convolution kernels of different sizes. Let $\Lambda_\eta(\cdot)$ be the mapping function of the PC module, where $\eta$ is its corresponding parameter. The pyramidal feature map $f_i$ can be generated as follows,
$$f_i = \Lambda_\eta(x_i), \tag{3}$$
which sufficiently encodes the detailed features of pavement images under different granularities.
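The idea of one pyramidal convolution layer, several kernels of different sizes applied in parallel to the same input, can be sketched as below. This is an illustrative toy rather than the PC module itself: the kernel sizes (3, 5, 7) are assumptions, fixed box kernels stand in for learned weights, and the helper names are hypothetical.

```python
def conv2d_same(img, kernel):
    """Correlate a 2-D list with a square kernel, zero-padded to keep the size."""
    k = len(kernel)
    pad = k // 2
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    rr, cc = r + i - pad, c + j - pad
                    if 0 <= rr < h and 0 <= cc < w:
                        acc += img[rr][cc] * kernel[i][j]
            out[r][c] = acc
    return out


def box_kernel(k):
    """A k x k averaging kernel, used here in place of learned weights."""
    return [[1.0 / (k * k)] * k for _ in range(k)]


def pyramidal_conv(img, kernel_sizes):
    """Apply one kernel per size in parallel; each output is one feature channel."""
    return [conv2d_same(img, box_kernel(k)) for k in kernel_sizes]


# A single bright pixel makes the footprint of each kernel visible.
image = [[0.0] * 9 for _ in range(9)]
image[4][4] = 1.0
features = pyramidal_conv(image, (3, 5, 7))  # assumed kernel sizes
footprints = [sum(1 for row in f for v in row if v != 0.0) for f in features]
print(footprints)
```

Each branch responds to the same pixel over a different neighbourhood, which is the sense in which the parallel kernels capture different feature granularities.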
2.4 Kernel Inversed Convolution
Once we obtain the feature map $f_i$, a Kernel Inversed Convolution (KIC) module is designed to better exploit the scale information of pavement images by enlarging the differences in the receptive fields of different convolution branches, as shown in the red dashed rectangle in Figure 1. In KIC, the feature map $f_i$ is resized, with the same resizing algorithm $B(\cdot)$, into $m$ different sizes that are identical to those of the resized images in the image resizing module,
$$F_i = \{f_i^k\}_{k=1}^{m} = B(f_i), \tag{4}$$
where $F_i$ is the collection of the resized feature maps, and $f_i^k$ is the resized feature map corresponding to the resized image $r_i^k$. Thereafter, these resized feature maps are fed into different convolution branches to produce the information compensations for the different resized images, and yield the final image pyramid. The whole image pyramid generation process can be represented as follows,
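$$\mathcal{I}_i = R_i + H_\zeta(F_i), \tag{5}$$
where $\mathcal{I}_i$ is the generated image pyramid, $H_\zeta(\cdot)$ is the mapping function of the Kernel Inversed Convolution (KIC) module, and $\zeta$ is its learned parameters. The addition is applied level by level, so that each resized image receives the compensation produced by its own branch.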
To better retain the discriminative information of pavement images across different scales, we adopted an idea from [4] and designed a series of kernel inversed ResBlocks for feature learning. In these ResBlocks, as shown in the orange dashed rectangles in Figure 1, the smaller convolution kernel is applied to the larger feature map, which enables the mining of the local detailed information of images, while the larger convolution kernel is applied to the smaller feature map, which enables the capturing of the global structural features of images. In other words, the kernel inversed convolution enlarges the perception range in the scale space, and thereby preserves more information that is conducive to the subsequent task. In this manner, the whole KIPRN process is a composition of these modules, $\Gamma_\theta(x_i) = B(x_i) + H_\zeta\big(B(\Lambda_\eta(x_i))\big)$, where $\theta = \{\eta, \zeta\}$, $B(\cdot)$ is the resizing algorithm of the IR module, $\Lambda_\eta(\cdot)$ is the mapping function of the PC module, and $H_\zeta(\cdot)$ is the mapping function of the KIC module.
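The effect of the size-inverted assignment can be checked with simple arithmetic. The sketch below takes the relative receptive field of a branch to be the kernel size divided by the feature map side length, and compares an inversed assignment with a forward one. The map sizes (300, 400, 500) are hypothetical, the kernel sizes (3, 5, 7) mirror the ResBlock settings listed in the ablation table, and reading "forward" as "larger kernel on larger map" is an assumption.

```python
def assign_kernels(map_sizes, kernel_sizes, inversed=True):
    """Pair feature map sizes with kernel sizes.

    inversed=True gives the largest kernel to the smallest map;
    inversed=False gives the largest kernel to the largest map.
    """
    maps = sorted(map_sizes)
    kernels = sorted(kernel_sizes, reverse=inversed)
    return dict(zip(maps, kernels))


def relative_receptive_fields(assignment):
    """Kernel size as a fraction of the feature map side length, per branch."""
    return {size: k / size for size, k in assignment.items()}


def spread(fields):
    """Ratio between the largest and smallest relative receptive field."""
    values = list(fields.values())
    return max(values) / min(values)


MAP_SIZES = (300, 400, 500)   # hypothetical feature map side lengths
KERNEL_SIZES = (3, 5, 7)      # kernel sizes taken from the ablation settings

inversed = assign_kernels(MAP_SIZES, KERNEL_SIZES, inversed=True)
forward = assign_kernels(MAP_SIZES, KERNEL_SIZES, inversed=False)
inversed_spread = spread(relative_receptive_fields(inversed))
forward_spread = spread(relative_receptive_fields(forward))
print(inversed, round(inversed_spread, 3))
print(forward, round(forward_spread, 3))
```

Under these assumed sizes the inversed pairing separates the branches by a factor of about 3.9 in relative receptive field, against 1.4 for the forward pairing, which matches the intuition that inversion widens the range of scales the branches see.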
2.5 Multi-Scale Pavement Distress Recognition
In the final step, the generated image pyramid $\mathcal{I}_i = \Gamma_\theta(x_i)$ is input into an image classification network $P_\phi(\cdot)$ to predict the distress label of a given pavement image as follows,
$$\hat{y}_i = P_\phi(\mathcal{I}_i). \tag{6}$$
The KIPRN is optimized together with the image classification network in an end-to-end manner. Let $L(\cdot, \cdot)$ be the loss function of the image classification network. The optimal parameters of the KIPRN, $\hat{\theta}$, and of the image classification network, $\hat{\phi}$, can be obtained by solving the following programming problem,
$$(\hat{\theta}, \hat{\phi}) = \arg\min_{\theta, \phi} \sum_{i=1}^{n} L\big(P_\phi(\Gamma_\theta(x_i)), y_i\big). \tag{7}$$
In this study, we chose three well-known Convolutional Neural Networks (CNNs), namely EfficientNet-B3, ResNet-50 and Inception-v1, as the image classification networks to validate the effectiveness of the KIPRN.
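To make the joint optimization concrete, here is a deliberately tiny sketch in which one scalar parameter plays the role of the resizing pre-network and another plays the role of the classifier, and both are updated from the same loss. Everything in it is an assumption for illustration: the scalar model, the squared-error loss (the loss used in the paper is not specified here), the toy data, and the learning rate.

```python
def forward(x, g, theta, phi):
    """Toy pipeline: compensation by simple addition, then a scalar 'classifier'."""
    z = x + theta * g   # resized input plus learned compensation
    return phi * z, z


def mean_loss(data, theta, phi):
    """Mean squared error over the data set (an assumed stand-in loss)."""
    return sum((forward(x, g, theta, phi)[0] - y) ** 2 for x, g, y in data) / len(data)


def train(data, theta=0.0, phi=1.0, lr=0.02, steps=500):
    """Gradient descent on both parameter sets using one shared loss."""
    n = len(data)
    for _ in range(steps):
        grad_theta = 0.0
        grad_phi = 0.0
        for x, g, y in data:
            y_hat, z = forward(x, g, theta, phi)
            err = y_hat - y
            grad_phi += 2 * err * z / n          # gradient w.r.t. the classifier
            grad_theta += 2 * err * phi * g / n  # gradient flows back to the pre-network
        theta -= lr * grad_theta
        phi -= lr * grad_phi
    return theta, phi


# (resized input, extracted feature, target) triples; purely synthetic.
DATA = [(1.0, 0.5, 2.0), (2.0, -1.0, 1.0), (0.5, 1.0, 2.5)]
initial = mean_loss(DATA, 0.0, 1.0)
theta, phi = train(DATA)
final = mean_loss(DATA, theta, phi)
print(initial, final)
```

The point of the sketch is only that the gradient of the classification loss reaches the pre-network parameter through the classifier, so the resizing stage is tuned for the recognition objective rather than fixed in advance.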
3 Experiment
3.1 Dataset and Setup
A large-scale pavement dataset named CQU-BPDD [26] was employed for evaluation. It consists of 43,861 typical pavement images and 16,795 diseased pavement images and includes seven different types of distresses, namely alligator crack, crack pouring, longitudinal crack, massive crack, transverse crack, raveling, and repair. Since pavement distress recognition is a follow-up task of pavement distress detection, we only used the diseased pavement images for training and testing. For a fair comparison, we followed the data split strategy in [13], so that 5,140 images were selected for training while the remaining ones were used for testing.
As shown in Table 1, we chose three traditional shallow learning methods, five well-known CNNs, two classical vision transformers and WSPLIN-IP as baselines. For all deep learning models, we adopted AdamW [16] as the optimizer and set the learning rate to 0.0001. The input images were resized by the KIPRN into three resolutions to yield the image pyramid. Following [13], recognition accuracy is used as the performance metric.
Method | Type | Accuracy |
---|---|---|
RGB + RF [3] | Single-scale | 0.305 |
HOG [7] + SVM [6] | Single-scale | 0.318 |
VGG-16 [21] | Single-scale | 0.562 |
Inception-v3 [23] | Single-scale | 0.716 |
ViT-S/16 [8] | Single-scale | 0.750 |
ViT-B/16 [8] | Single-scale | 0.753 |
WSPLIN-IP0 [13] | Multi-scale | 0.837 |
WSPLIN-IP [13] | Multi-scale | 0.850 |
ResNet-50 [12] | Single-scale | 0.712 |
ResNet-50 | Multi-scale | 0.786 |
Ours + ResNet-50 | Multi-scale | 0.827 |
Inception-v1 [22] | Single-scale | 0.726 |
Inception-v1 | Multi-scale | 0.781 |
Ours + Inception-v1 | Multi-scale | 0.803 |
EfficientNet-B3 [25] | Single-scale | 0.786 |
EfficientNet-B3 | Multi-scale | 0.830 |
Ours + EfficientNet-B3 | Multi-scale | 0.861 |
Figure 2: Class Activation Mapping (CAM) visualizations. Panels: (a) EfficientNet-B3, (b) EfficientNet-B3 + KIPRN, (c) EfficientNet-B3, (d) EfficientNet-B3 + KIPRN.

Module | Setting/Design | Accuracy |
---|---|---|
ResBlocks | 3×3 | 0.844 |
ResBlocks | 5×5 | 0.852 |
ResBlocks | 7×7 | 0.847 |
ResBlocks | Forward | 0.857 |
ResBlocks | Inversed | 0.861 |
Pyconv | All | 0.828 |
Pyconv | None | 0.855 |
Pyconv | Resblock | 0.845 |
Pyconv | Last | 0.830 |
Pyconv | First | 0.861 |
3.2 Pavement Disease Recognition
Table 1 tabulates the pavement distress recognition accuracies of different methods on the CQU-BPDD dataset. The results show that the KIPRN significantly boosts all three chosen CNN-based image classification models. Compared with their multi-scale counterparts that rely on bilinear interpolation, the KIPRN raises the accuracy of EfficientNet-B3, ResNet-50 and Inception-v1 from 0.830 to 0.861, from 0.786 to 0.827, and from 0.781 to 0.803, respectively. Moreover, EfficientNet-B3 enhanced by the KIPRN achieved the best performance among all the methods. As shown in Figure 2, the Class Activation Mapping (CAM) of the EfficientNet-B3 enhanced by the KIPRN is better than that of the original EfficientNet-B3. Another interesting observation is that the multi-scale approaches often perform better than their single-scale versions, even though they simply use bilinear interpolation to resize images. The second-best method, WSPLIN-IP, is another multi-scale approach that adopts EfficientNet-B3 as its backbone network. However, EfficientNet-B3 enhanced by the KIPRN achieved a 2.4% accuracy gain over WSPLIN-IP under the same experimental settings. Overall, the multi-scale approaches based on the KIPRN consistently outperformed the versions based on bilinear interpolation. These results clearly reveal two facts: (1) exploiting the scale space of images can improve pavement distress recognition, and (2) the KIPRN enables learning information that is conducive to distress recognition during image resizing.
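The per-backbone gains can be recomputed directly from the accuracies in Table 1. The short script below does this arithmetic in percentage points; the dictionary layout and key names are choices made for this example, while the numbers are copied from the table.

```python
# Accuracies copied from Table 1: single-scale, multi-scale (bilinear), and with KIPRN.
TABLE1 = {
    "EfficientNet-B3": {"single": 0.786, "multi": 0.830, "kiprn": 0.861},
    "ResNet-50": {"single": 0.712, "multi": 0.786, "kiprn": 0.827},
    "Inception-v1": {"single": 0.726, "multi": 0.781, "kiprn": 0.803},
}


def gains(row):
    """Accuracy gains of the KIPRN variant, in percentage points."""
    return {
        "over_single": round((row["kiprn"] - row["single"]) * 100, 1),
        "over_multi": round((row["kiprn"] - row["multi"]) * 100, 1),
    }


GAINS = {name: gains(row) for name, row in TABLE1.items()}
for name, gain in GAINS.items():
    print(f"{name}: +{gain['over_single']} vs single-scale, "
          f"+{gain['over_multi']} vs multi-scale bilinear")
```

Read this way, the table separates two effects: the gain from using several scales at all, and the additional gain from learning the resizing instead of fixing it to bilinear interpolation.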
3.3 Ablation Study
Table 2 tabulates the impacts of different convolution settings on the performance of the KIPRN deployed on EfficientNet-B3. The results show that the kernel inversed strategy outperforms all the other ResBlock settings. This verifies that our strategy can capture more information in the scale space by enlarging the differences in relative receptive fields. Another interesting observation is that putting pyramidal convolution on the first two layers led to the best performance. We attribute this to the fact that pyramidal convolutions are more capable of capturing low-level visual details than of learning the abstract semantic features that the last layers are designed for.
Figure 3 reports the per-epoch training times of different models. EfficientNet-B3 enhanced by the KIPRN (ours) has a training efficiency similar to that of the single-scale EfficientNet-B3, while training 10× faster than WSPLIN-IP, which is also a multi-scale approach.
4 Conclusion
In this study, we proposed an end-to-end resizing network named the KIPRN, which can boost any deep learning-based PDR approach by helping it to better exploit the resolution and scale information of images. The KIPRN consists of IR, PC and KIC modules. It can be integrated into any deep learning-based PDR model in a plug-and-play way and is optimized together with that model in an end-to-end manner. Extensive results show that our method generally boosts many deep learning-based PDR models. In future work, we will attempt to compensate the features to the original pavement image in a better way than simple addition.
5 Acknowledgments
This study was supported by the San Diego State University 2021 Emergency Spring Research, Scholarship, and Creative Activities (RSCA) Funding distributed by the Division of Research and Innovation, as well as XSEDE EMPOWER Program under National Science Foundation grant number ACI-1548562.
References
- [1] Benedetto, A., Tosti, F., Pajewski, L., D’Amico, F., Kusayanagi, W.: FDTD simulation of the GPR signal for effective inspection of pavement damages. In: Proceedings of the 15th International Conference on Ground Penetrating Radar. pp. 513–518. IEEE (2014)
- [2] Bhagvati, C., Skolnick, M.M., Grivas, D.A.: Gaussian normalisation of morphological size distributions for increasing sensitivity to texture variations and its applications to pavement distress classification. In: CVPR (1994)
- [3] Breiman, L.: Random forests. Machine learning 45(1), 5–32 (2001)
- [4] Chen, C.F., Fan, Q., Mallinar, N., Sercu, T., Feris, R.: Big-little net: An efficient multi-scale feature representation for visual and speech recognition. arXiv preprint arXiv:1807.03848 (2018)
- [5] Chou, J., O’Neill, W.A., Cheng, H.: Pavement distress classification using neural networks. In: Proceedings of IEEE International Conference on Systems, Man and Cybernetics. vol. 1, pp. 397–401. IEEE (1994)
- [6] Cortes, C., Vapnik, V.: Support-vector networks. Machine learning 20(3), 273–297 (1995)
- [7] Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05). vol. 1, pp. 886–893. IEEE (2005)
- [8] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)
- [9] Duta, I.C., Liu, L., Zhu, F., Shao, L.: Pyramidal convolution: Rethinking convolutional neural networks for visual recognition. arXiv preprint arXiv:2006.11538 (2020)
- [10] Fan, R., Bocus, M.J., Zhu, Y., Jiao, J., Wang, L., Ma, F., Cheng, S., Liu, M.: Road crack detection using deep convolutional neural network and adaptive thresholding. In: 2019 IEEE Intelligent Vehicles Symposium (IV). pp. 474–479. IEEE (2019)
- [11] Gopalakrishnan, K., Khaitan, S.K., Choudhary, A., Agrawal, A.: Deep convolutional neural networks with transfer learning for computer vision-based data-driven pavement distress detection. Construction and building materials 157, 322–330 (2017)
- [12] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 770–778 (2016)
- [13] Huang, G., Huang, S., Huangfu, L., Yang, D.: Weakly supervised patch label inference network with image pyramid for pavement diseases recognition in the wild. In: ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). pp. 7978–7982. IEEE (2021)
- [14] Li, B., Wang, K.C., Zhang, A., Yang, E., Wang, G.: Automatic classification of pavement crack using deep convolutional neural network. International Journal of Pavement Engineering 21(4), 457–463 (2020)
- [15] Li, Q., Liu, X.: Novel approach to pavement image segmentation based on neighboring difference histogram method. In: 2008 Congress on Image and Signal Processing. vol. 2, pp. 792–796. IEEE (2008)
- [16] Loshchilov, I., Hutter, F.: Fixing weight decay regularization in Adam (2018)
- [17] Naddaf-Sh, S., Naddaf-Sh, M.M., Kashani, A.R., Zargarzadeh, H.: An efficient and scalable deep learning approach for road damage detection. In: 2020 IEEE International Conference on Big Data (Big Data). pp. 5602–5608. IEEE (2020)
- [18] Nejad, F.M., Zakeri, H.: An expert system based on wavelet transform and radon neural network for pavement distress classification. Expert Systems with Applications 38(6), 7088–7101 (2011)
- [19] Recasens, A., Kellnhofer, P., Stent, S., Matusik, W., Torralba, A.: Learning to zoom: a saliency-based sampling layer for neural networks. In: Proceedings of the European Conference on Computer Vision (ECCV). pp. 51–66 (2018)
- [20] Salman, M., Mathavan, S., Kamal, K., Rahman, M.: Pavement crack detection using the Gabor filter. In: 16th International IEEE Conference on Intelligent Transportation Systems (ITSC 2013). pp. 2039–2044. IEEE (2013)
- [21] Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
- [22] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 1–9 (2015)
- [23] Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 2818–2826 (2016)
- [24] Talebi, H., Milanfar, P.: Learning to resize images for computer vision tasks. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 497–506 (2021)
- [25] Tan, M., Le, Q.: EfficientNet: Rethinking model scaling for convolutional neural networks. In: International conference on machine learning. pp. 6105–6114. PMLR (2019)
- [26] Tang, W., Huang, S., Zhao, Q., Li, R., Huangfu, L.: Iteratively optimized patch label inference network for automatic pavement disease detection. arXiv preprint arXiv:2005.13298 (2020)
- [27] Wang, C., Sha, A., Sun, Z.: Pavement crack classification based on chain code. In: 2010 Seventh international conference on fuzzy systems and knowledge discovery. vol. 2, pp. 593–597. IEEE (2010)