Autonomous Removal of Perspective Distortion of Elevator Button
Images based on Corner Detection
Abstract
Elevator button recognition is a critical function for realizing autonomous elevator operation. However, challenging image conditions and various image distortions make it difficult to recognize buttons accurately. To address this problem, we propose a novel deep learning-based approach that autonomously corrects perspective distortions of elevator button images based on button corner detection results. First, we leverage a novel image segmentation model and the Hough Transform method to obtain button segmentation and button corner detection results. Then, the pixel coordinates of standard button corners are used as reference features to estimate the camera motions that correct the perspective distortions. Fifteen elevator button images captured from different viewing angles serve as the dataset. The experimental results demonstrate that the proposed approach estimates camera motions and removes perspective distortions of elevator button images with high accuracy.
I Introduction
Autonomous elevator operation is a promising solution for mobile robot navigation in office buildings. The system consists of three parts: button recognition, motion planning, and robot control. Among them, button recognition is the most basic but challenging part; its performance directly determines the success rate and robustness of the entire autonomous elevator operating system. Traditional button recognition algorithms tend to place markers on the elevator button panel in advance, so that the position of each button relative to the markers can be acquired from their geometric relationship. Unfortunately, these hand-engineered algorithms are inconvenient and fail if the elevator button panel cannot be marked beforehand. To overcome this limitation, researchers have in recent years proposed numerous deep learning-based button recognition algorithms [1, 2, 3], which output button recognition results directly from raw elevator button images. However, the recognition accuracy of these algorithms is still unsatisfactory under varied image conditions and distortions: button shapes, button sizes, elevator panel designs, and lighting conditions vary widely, while perspective distortions and unexpected blur make accurate recognition even more challenging. In this article, we propose a novel deep learning-based approach to autonomously correct perspective distortions of elevator button images based on button corner detection results.
The proposed approach consists of two parts. The first part is a button corner detection algorithm: we train an image segmentation model to extract features from raw elevator button images and obtain button segmentation results, and then identify the four boundary lines of every button using the Hough Transform method [4]. The pixel coordinates and the order of all button corners can then be obtained, since the corners are the intersections of the identified lines. The second part is a pose estimation algorithm: it takes hypothetical button corners with standard pixel coordinates as reference points and computes the camera motions that align the corners of the raw elevator button images with these reference points. Applying the inverse transformation then generates new elevator button images free of perspective distortion.
The contributions of this work are summarized as follows:
• We derive detection results of button corners by utilizing an image segmentation model and the Hough Transform method.
• We propose a novel algorithm that can autonomously remove perspective distortions of elevator button images based on the detection results of button corners.
The remainder of this article is organized as follows. The previous work on elevator button recognition and existing distortion removal methods are reviewed in Sec. II. Sec. III and Sec. IV outline the proposed autonomous perspective distortion removal approach, while the experimental results are presented and discussed in Sec. V. Finally, we draw some conclusions and discuss the future work in Sec. VI.
II Related Work
II-A Elevator button recognition
Before deep learning techniques were widely used in object recognition for robotics, researchers tended to develop hand-engineered approaches based on traditional image processing techniques to recognize elevator buttons. For instance, Klingbeil et al. [5] designed a pipeline for button detection and character recognition, using a grid fitting method to regress button locations based on a sliding window-based object detector. This method achieved an accuracy of 86.2% on a test set of 50 images; however, the test images were assumed to be well lit and free of perspective distortion, so the pipeline of [5] cannot be used in natural scenes. Zakaria et al. [6] developed a framework for vision-based external elevator button recognition and localization based on the Sobel edge detector and the Wiener filter. In [7], template matching combined with a homography-based transform was used for vision-based button recognition by a robot arm operating an elevator. However, the approaches in [6, 7] were not robust to noise or environmental variability due to the limited capacity of traditional image processing methods.
With the revolution of computational technologies, various deep learning-based methods have recently been applied to elevator button recognition, whose accuracy can be significantly improved by the discrimination capabilities of deep neural networks. For instance, in [8], the recognition task was formalized as a classification problem and a hybrid button classification system was proposed, combining the histogram of oriented gradients (HOG), bag-of-words (BoW), and artificial neural networks (ANN); the experimental results showed that the ANN substantially improved button classification performance. Dong et al. [2] proposed a button recognition system based on convolutional neural networks (CNN) that achieves high recognition accuracy for known elevator button panels. In [3], elevator button recognition was treated as a multi-object detection problem, with a single-shot multi-box detector (SSD) as the detection network. Zhu et al. [1] proposed OCR-RCNN, a novel algorithm for elevator button recognition that integrates a character recognition branch into Faster-RCNN, turning the multi-object detection problem into a binary button detection task and a character recognition task. Inspired by [9], in this article we design a more advanced semantic segmentation model based on the Deeplabv3+ model [10] and use the Hough Transform method to obtain button segmentation and button corner detection results.
II-B Removal of perspective distortions
In contrast to the vast literature on elevator button recognition algorithms based on traditional image processing methods or deep learning models, only a handful of publications have studied the removal of perspective distortions from elevator button images. Researchers have proposed perspective distortion removal algorithms for document images [11, 12, 13], electroluminescence images [14], lithographic watermarked authentication images [15], and so on. Zhu et al. [16] proposed a perspective distortion removal algorithm that leverages the Gaussian Mixture Model (GMM) and the EM framework; it takes the outputs of a button center recognizer as input and generates the corrected images. However, the algorithm in [16] can only handle internal panel images containing many buttons and may easily fail on external elevator button images with few button samples. In this article, we further use button corners as feature points to realize autonomous perspective distortion removal for robotic elevator button images. The experimental results demonstrate that the proposed approach handles external elevator panels well.
III Button Corner Detection
III-A Button Segmentation
In this article, we design an image segmentation model based on the Deeplabv3+ model to segment the pixels belonging to elevator buttons. Deeplabv3+ is one of the state-of-the-art deep learning models for semantic segmentation; it combines an encoder-decoder structure, which helps extract sharp object boundaries, with a spatial pyramid pooling module, which helps capture rich contextual information. The details of the proposed image segmentation model are shown in Fig. 2. The input is a raw elevator button image, and the output is a gray-scale image containing the button segmentation results.
In the encoding stage, different from the original Deeplabv3+ model, we utilize MobileNetv2 [17], a depthwise separable backbone, to extract low-level and high-level features. Several atrous convolutions [18] with different rates are then applied to capture rich semantic information from the high-level features. In the decoding stage, the low-level features are first concatenated with the output of the encoder; a convolution module then further fuses the extracted features, and finally bilinear interpolation produces segmentation predictions of the same size as the input image.
The value of every pixel indicates the category it belongs to. For instance, when this button segmentation model is applied to the distorted image in Fig. 1(a), there are four categories: ‘up’, ‘down’, ‘keyhole’, and ‘non-button’.
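As a concrete illustration, the sketch below shows how such a segmentation model can be instantiated and applied in PyTorch. It is only an approximation of our architecture: torchvision does not ship a Deeplabv3+ decoder with a MobileNetv2 backbone, so the off-the-shelf DeepLabv3/MobileNetv3 model stands in for it, and the input tensor is a random placeholder.

```python
# Minimal sketch of the button-segmentation stage, assuming a PyTorch setup.
# torchvision's DeepLabv3 head with a MobileNetV3-Large backbone stands in for
# the Deeplabv3+ / MobileNetv2 combination described in the text.
import torch
from torchvision.models.segmentation import deeplabv3_mobilenet_v3_large

NUM_CLASSES = 4  # 'up', 'down', 'keyhole', 'non-button'
model = deeplabv3_mobilenet_v3_large(weights=None, num_classes=NUM_CLASSES)
model.eval()

# A raw elevator button image, normalized to [0, 1], shape (1, 3, H, W).
image = torch.rand(1, 3, 480, 640)  # placeholder input

with torch.no_grad():
    logits = model(image)["out"]           # (1, NUM_CLASSES, H, W)
    mask = logits.argmax(dim=1).squeeze()  # (H, W) gray-scale label map
```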
III-B Corner coordinates detection
After obtaining the button segmentation results of raw elevator button images, we first apply dilation and erosion to reduce image noise and smooth the button edges, which improves the performance of the subsequent line detection. The process of dilation followed by erosion is called a closing operation; it connects neighboring objects and smooths their boundaries without significantly changing their area.
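A minimal OpenCV sketch of this preprocessing step is given below; the mask file name and the 5×5 kernel size are illustrative choices, not values prescribed by our method.

```python
# Morphological closing on the segmentation mask, assuming an OpenCV setup.
import cv2
import numpy as np

mask = cv2.imread("segmentation_mask.png", cv2.IMREAD_GRAYSCALE)  # placeholder
kernel = np.ones((5, 5), np.uint8)  # illustrative kernel size

# Closing = dilation followed by erosion: connects neighboring blobs and
# smooths their boundaries without significantly changing their area.
closed = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
```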
The Hough Transform method is then applied to detect the four boundary lines of each button. The Hough transform is one of the primary methods for detecting geometric shapes in computer vision, image analysis, and digital image processing: each edge point in the image is mapped to a curve in a parameter space, and points lying on the same line produce curves that accumulate into a peak, which identifies that line. Finally, after the four lines of a button are detected, the pixel coordinates of the button corners can be derived as the intersections of the detected lines. The order of the corners of every button is defined in advance to facilitate the perspective distortion removal algorithm; this order is shown in Fig. 3.
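The following sketch continues the OpenCV example above: it detects lines on the cleaned mask with the standard Hough transform and derives corners as pairwise intersections of non-parallel lines. The Canny thresholds and the Hough accumulator threshold are illustrative, and the snippet assumes exactly the four boundary lines of one button are returned.

```python
# Line detection and corner extraction for one button blob (OpenCV/NumPy).
import cv2
import numpy as np

edges = cv2.Canny(closed, 50, 150)  # thresholds are illustrative
lines = cv2.HoughLines(edges, rho=1, theta=np.pi / 180, threshold=60)

def intersection(l1, l2):
    """Intersection of two lines in (rho, theta) form, or None if parallel."""
    (r1, t1), (r2, t2) = l1[0], l2[0]
    a = np.array([[np.cos(t1), np.sin(t1)], [np.cos(t2), np.sin(t2)]])
    b = np.array([r1, r2])
    if abs(np.linalg.det(a)) < 1e-6:  # near-parallel lines have no intersection
        return None
    x, y = np.linalg.solve(a, b)
    return (x, y)

# Button corners are the pairwise intersections of non-parallel detected lines.
corners = [p for i in range(len(lines)) for j in range(i + 1, len(lines))
           if (p := intersection(lines[i], lines[j])) is not None]
```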
IV Perspective Distortion Removal
To begin with, we define the notation that will be used frequently in this paper. Throughout this work, matrices are written as boldface uppercase letters and vectors as boldface lowercase letters. The following notation is used:

• $\mathbf{C}$ — the detected button corners in the image plane;
• $\bar{\mathbf{C}}$ — the detected button corners in the normalized image plane;
• $\mathbf{U}$ — the presupposed standard button corners without distortion in the image plane;
• $\bar{\mathbf{U}}$ — the presupposed standard button corners without distortion in the normalized image plane;
• $\mathbf{V}$ — the rectified button corners in the image plane;
• $\bar{\mathbf{V}}$ — the rectified button corners in the normalized image plane;
• $\mathbf{K}$ — the intrinsic parameter matrix of the camera;
• $f$ — the focal length in meters, for a fixed-focal-length, non-zoomed camera;
• $(u_0, v_0)$ — the image center in pixels;
• $(d_x, d_y)$ — the pixel width and height in meters;
• $\mathbf{D}$ — the spatial coordinates of the detected button corners;
• $\mathbf{E}$ — the spatial coordinates of the standard button corners;
• $\mathbf{P}$ — the new spatial coordinates of the detected button corners after the rotation operation;
• $\bar{\mathbf{P}}$ — the new spatial coordinates of the detected button corners with depth equal to 1 after the rotation and translation operations;
• $\mathbf{R}(\theta)$ — the matrix representation of the angle-axis parameterized rotation $\theta$;
• $\mathbf{T}$ — the matrix representation of the translation between the detected button corners and the standard button corners;
• $n$ — the number of buttons in the image;
• $Slope_h$ — the slopes of the horizontal line of every button in space coordinates;
• $Slope_v$ — the slopes of the vertical line of every button in space coordinates;
• $Cos$ — the cosine values of the angles between the horizontal and vertical lines of every button in space coordinates.
The details of the proposed perspective distortion removal algorithm are given in Alg. 1.
The first step is to establish a presupposed elevator button image in which the pixel coordinates of the button corners $\mathbf{U}$ are standard, i.e., free of perspective distortion. Two types of presupposed elevator button images are shown in Fig. 4.
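A hypothetical sketch of this first step is shown below: it lays out the standard corners $\mathbf{U}$ of a fronto-parallel panel, four corners per button in the order of Fig. 3 (assumed here to be top-left, top-right, bottom-right, bottom-left). The panel geometry (button size, spacing, origin) is illustrative only.

```python
# Lay out presupposed standard button corners U on a fronto-parallel panel.
import numpy as np

def standard_corners(n_buttons, top_left=(100, 100), size=(80, 80), gap=40):
    """Corners ordered 1-4 per button; sizes/spacing are illustrative."""
    w, h = size
    x0, y0 = top_left
    corners = []
    for i in range(n_buttons):
        y = y0 + i * (h + gap)  # buttons stacked vertically
        corners += [(x0, y), (x0 + w, y), (x0 + w, y + h), (x0, y + h)]
    return np.array(corners, dtype=float).T  # 2 x 4n pixel coordinates

U = standard_corners(n_buttons=3)
```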

The second step is back projection. $\tilde{\mathbf{C}}$ and $\tilde{\mathbf{U}}$ are obtained by appending a third row of ones to $\mathbf{C}$ and $\mathbf{U}$. The inverse of the intrinsic camera matrix is then used to obtain the spatial coordinates of the button corners:

(1) $\mathbf{D} = \mathbf{K}^{-1}\tilde{\mathbf{C}}, \quad \mathbf{E} = \mathbf{K}^{-1}\tilde{\mathbf{U}}$
In this algorithm, we assume that for the standard button corners without perspective distortion, the slopes of the horizontal lines equal zero, the slopes of the vertical lines equal infinity, and the cosine values of the angles between the horizontal and vertical lines equal zero. Thus for $\mathbf{E}$ we have:

(2) $Slope_h(\mathbf{E}) = 0, \quad 1/Slope_v(\mathbf{E}) = 0, \quad Cos(\mathbf{E}) = 0$
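In code, the back projection of Eq. (1) is a single matrix product; the sketch below assumes $\mathbf{K}$ is a known 3×3 intrinsic matrix and that C and U are 2×4n arrays of detected and standard corner coordinates.

```python
# Back projection (Eq. (1)): lift 2D pixel coordinates to the z = 1 plane.
import numpy as np

def back_project(corners_2d, K):
    """corners_2d: 2 x m pixel coordinates; returns 3 x m spatial coordinates."""
    ones = np.ones((1, corners_2d.shape[1]))
    homogeneous = np.vstack([corners_2d, ones])  # append a third row of ones
    return np.linalg.inv(K) @ homogeneous

D = back_project(C, K)  # spatial coordinates of detected corners
E = back_project(U, K)  # spatial coordinates of standard corners
```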
The third step is to compute the rotation and translation matrices that produce the new spatial coordinates of the detected button corners. Three-dimensional rotation matrices are used to rotate the spatial coordinates of the corners about the x-axis, the y-axis, and the z-axis, respectively:

(3) $\mathbf{R}_x(\alpha) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\alpha & -\sin\alpha \\ 0 & \sin\alpha & \cos\alpha \end{bmatrix}$

(4) $\mathbf{R}_y(\beta) = \begin{bmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{bmatrix}$

(5) $\mathbf{R}_z(\gamma) = \begin{bmatrix} \cos\gamma & -\sin\gamma & 0 \\ \sin\gamma & \cos\gamma & 0 \\ 0 & 0 & 1 \end{bmatrix}$

where $\alpha$, $\beta$, $\gamma$ are radian values, and the relation between an angle value $\theta_{deg}$ in degrees and its radian value $\theta_{rad}$ is:

(6) $\theta_{rad} = \theta_{deg} \cdot \pi / 180$
The rotation matrix is formed as $\mathbf{R} = \mathbf{R}_z(\gamma)\,\mathbf{R}_y(\beta)\,\mathbf{R}_x(\alpha)$. The new spatial coordinates of the detected button corners are then computed as:

(7) $\mathbf{P} = \mathbf{R}\,\mathbf{D} + \mathbf{T}, \quad \bar{\mathbf{P}} = \mathbf{P} / \mathbf{P}[3]$

where $\mathbf{P}[3]$ denotes the third row of the new spatial coordinates, and the translation matrix $\mathbf{T}$ is defined as the difference between the spatial coordinates of the first corner in the presupposed elevator button image and in the distorted elevator button image.
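The sketch below implements Eqs. (3)-(7) in NumPy, under the assumption (not stated explicitly above) that the combined rotation is composed as $\mathbf{R}_z\mathbf{R}_y\mathbf{R}_x$.

```python
# Build R from sampled angles (in degrees) and transform the detected corners.
import numpy as np

def rotation_matrix(alpha_deg, beta_deg, gamma_deg):
    a, b, g = np.radians([alpha_deg, beta_deg, gamma_deg])  # Eq. (6)
    Rx = np.array([[1, 0, 0], [0, np.cos(a), -np.sin(a)], [0, np.sin(a), np.cos(a)]])
    Ry = np.array([[np.cos(b), 0, np.sin(b)], [0, 1, 0], [-np.sin(b), 0, np.cos(b)]])
    Rz = np.array([[np.cos(g), -np.sin(g), 0], [np.sin(g), np.cos(g), 0], [0, 0, 1]])
    return Rz @ Ry @ Rx  # composition order is an assumption of this sketch

def transform(D, E, R):
    rotated = R @ D
    T = E[:, :1] - rotated[:, :1]  # align the first corner, per the text
    P = rotated + T                # Eq. (7)
    return P / P[2:3, :]           # divide by the third row: depth = 1
```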
The fourth step is to estimate the camera motions. The goal is to find the optimal rotation matrix and translation matrix such that the lines formed by the new spatial coordinates of the distorted button corners are parallel to the lines formed by the spatial coordinates of the presupposed standard button corners. In this algorithm, we set the sampling step to 0.5°, which means that rotations about each axis are sampled every 0.5 degree to form candidate rotation matrices $\mathbf{R}$; the sampling range is from −40 degrees to 40 degrees. Three criteria are used to evaluate which candidate rotation matrix is optimal:
The first criterion is $Slope_h$, representing the slopes of the horizontal lines of the buttons in space coordinates. For the $i$-th button, $Slope_h^{(i)}$ is defined as:

(8) $Slope_h^{(i)} = \dfrac{y_2^{(i)} - y_1^{(i)}}{x_2^{(i)} - x_1^{(i)}}$

where $x_j^{(i)}$ and $y_j^{(i)}$ denote the first and second values of the $j$-th corner of the $i$-th button, respectively. We then obtain the two-norm of $Slope_h$:

(9) $\|Slope_h\|_2 = \sqrt{\sum_{i=1}^{n} \big(Slope_h^{(i)}\big)^2}$
The second criterion is $Slope_v$, representing the slopes of the vertical lines of the buttons in space coordinates. For the $i$-th button, $Slope_v^{(i)}$ is defined as:

(10) $Slope_v^{(i)} = \dfrac{y_4^{(i)} - y_1^{(i)}}{x_4^{(i)} - x_1^{(i)}}$

We then obtain the two-norm of the reciprocals of $Slope_v$:

(11) $\|1/Slope_v\|_2 = \sqrt{\sum_{i=1}^{n} \big(1/Slope_v^{(i)}\big)^2}$
The third criterion is $Cos$, representing the cosine values of the angles between the horizontal and vertical lines of the buttons in space coordinates. The horizontal line vector, the vertical line vector, and $Cos^{(i)}$ of the $i$-th button are:

(12) $\mathbf{h}^{(i)} = \mathbf{p}_2^{(i)} - \mathbf{p}_1^{(i)}, \quad \mathbf{v}^{(i)} = \mathbf{p}_4^{(i)} - \mathbf{p}_1^{(i)}, \quad Cos^{(i)} = \dfrac{\mathbf{h}^{(i)} \cdot \mathbf{v}^{(i)}}{\|\mathbf{h}^{(i)}\|\,\|\mathbf{v}^{(i)}\|}$

where $\mathbf{p}_1^{(i)}$, $\mathbf{p}_2^{(i)}$, and $\mathbf{p}_4^{(i)}$ denote the spatial coordinates of the first, second, and fourth corners of the $i$-th button, respectively. We then obtain the two-norm of $Cos$:

(13) $\|Cos\|_2 = \sqrt{\sum_{i=1}^{n} \big(Cos^{(i)}\big)^2}$
The smaller $\|Cos\|_2$ is, the closer the horizontal and vertical lines of the buttons are to being perpendicular.
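The three criteria of Eqs. (8)-(13) can be computed per candidate pose as sketched below, reusing the corner order assumed earlier (corners 1-2 form the horizontal edge, corners 1-4 the vertical edge).

```python
# Evaluate the three criteria for n buttons whose corners are stored
# 4 per button in the assumed order (TL, TR, BR, BL).
import numpy as np

def criteria(P, n_buttons):
    slopes_h, inv_slopes_v, cosines = [], [], []
    for i in range(n_buttons):
        p1, p2, _, p4 = P[:, 4 * i: 4 * i + 4].T
        slopes_h.append((p2[1] - p1[1]) / (p2[0] - p1[0]))      # Eq. (8)
        inv_slopes_v.append((p4[0] - p1[0]) / (p4[1] - p1[1]))  # reciprocal of Eq. (10)
        h, v = p2 - p1, p4 - p1
        cosines.append(h @ v / (np.linalg.norm(h) * np.linalg.norm(v)))  # Eq. (12)
    # Two-norms over all buttons: Eqs. (9), (11), (13).
    return (np.linalg.norm(slopes_h), np.linalg.norm(inv_slopes_v),
            np.linalg.norm(cosines))
```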
To combine the three criteria, each is normalized by its maximum value over all sampled rotations:

(14) $\hat{S}_h = \|Slope_h\|_2 \,/\, \max \|Slope_h\|_2$

(15) $\hat{S}_v = \|1/Slope_v\|_2 \,/\, \max \|1/Slope_v\|_2$

(16) $\hat{C} = \|Cos\|_2 \,/\, \max \|Cos\|_2$

The final criterion is then:

(17) $F = \hat{S}_h + \hat{S}_v + \hat{C}$

The rotation matrix and translation matrix that minimize $F$ are taken as optimal.
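Putting the pieces together, the exhaustive search over sampled rotations can be sketched as follows; it reuses the helper functions from the previous sketches and assumes D, E, and n_buttons are already defined. The full 161×161×161 grid mirrors the text but is computationally heavy, so a practical implementation would coarsen or vectorize the search.

```python
# Exhaustive search over sampled rotations, scored by the combined criterion.
import itertools
import numpy as np

angles = np.arange(-40.0, 40.0 + 0.5, 0.5)  # every 0.5 degree per axis
# NOTE: 161^3 ~ 4.2M candidates; shown only to mirror the text.
candidates = list(itertools.product(angles, angles, angles))

scores = np.array([criteria(transform(D, E, rotation_matrix(*c)), n_buttons)
                   for c in candidates])
normalized = scores / scores.max(axis=0)                 # Eqs. (14)-(16)
best = candidates[int(normalized.sum(axis=1).argmin())]  # minimize F, Eq. (17)
R_opt = rotation_matrix(*best)
```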
The fifth step is to form the new rectified image. After obtaining the optimal pose $(\mathbf{R}^*, \mathbf{T}^*)$, each pixel of the distorted elevator button image can be transformed to new spatial coordinates through Eq. (7). The intrinsic camera matrix is then used to project these coordinates back to pixel coordinates:

(18) $\tilde{\mathbf{V}} = \mathbf{K}\,\bar{\mathbf{P}}$

Taking the first and second rows of $\tilde{\mathbf{V}}$ gives the pixel coordinates of the rectified button corners in the image plane. Finally, a new rectified elevator button image is generated by applying an inverse image warping operation.
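Because the whole chain (back projection, rotation, translation, depth normalization, re-projection) acts on points of the z = 1 plane, it collapses into a single 3×3 homography, which makes the final warping a one-liner in OpenCV. The sketch below assumes the optimal rotation R_opt from the search above and a loaded distorted image.

```python
# Rectify the image: the rotate-translate-normalize-project chain on z = 1
# points equals the homography H = K (R + T e3^T) K^{-1}.
import cv2
import numpy as np

e3 = np.array([[0.0, 0.0, 1.0]])
T = E[:, :1] - R_opt @ D[:, :1]              # translation from the first corner
H = K @ (R_opt + T @ e3) @ np.linalg.inv(K)  # forward map: distorted -> rectified

h, w = distorted.shape[:2]
# warpPerspective inverts H internally, i.e. it performs the inverse warping.
rectified = cv2.warpPerspective(distorted, H, (w, h))
```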
V Experiments
To verify the effectiveness of the proposed approach, we collect a dataset of 15 images from 3 different elevators, captured from different viewing angles and exhibiting severe perspective distortions. The intrinsic camera parameter takes the standard pinhole form:

(19) $\mathbf{K} = \begin{bmatrix} f/d_x & 0 & u_0 \\ 0 & f/d_y & v_0 \\ 0 & 0 & 1 \end{bmatrix}$
The value of Eq. (13) is used to measure the accuracy of the proposed perspective distortion removal algorithm; it is the two-norm of the cosine values of the angles between the horizontal and vertical lines of all buttons in space coordinates, so a smaller value of Eq. (13) indicates better rectification performance. The experimental results for the 15 elevator button images are shown in Table I, and some of the corresponding original and rectified images are presented in Fig. 5. From Fig. 5 and Table I, we can see that the proposed approach removes perspective distortions of elevator button images autonomously and with high accuracy.
TABLE I: Values of Eq. (13) for the 15 test images from three elevators.

| No. | I-10 | I-20 | I-30 | I-40 | I-50 | Average |
|---|---|---|---|---|---|---|
| 1 | 0.036 | 0.042 | 0.050 | 0.007 | 0.024 | 0.032 |

| No. | I-160 | I-170 | I-180 | I-190 | I-200 | Average |
|---|---|---|---|---|---|---|
| 2 | 0.003 | 0.026 | 0.003 | 0.004 | 0.016 | 0.010 |

| No. | I-850 | I-860 | I-870 | I-880 | I-890 | Average |
|---|---|---|---|---|---|---|
| 3 | 0.048 | 0.070 | 0.097 | 0.100 | 0.074 | 0.078 |
VI Conclusion
This article proposes a novel deep learning-based approach that autonomously removes perspective distortions from elevator button images. We utilize an image segmentation model and the Hough Transform method to obtain the detection results of button corners, and design a novel algorithm to correct the perspective distortions of the original elevator button images. Currently, the presented algorithm can only handle elevator button images containing rectangular buttons; for images containing circular buttons it fails, since a circle has no line slopes to evaluate. In future work, we will use the Hough transform to estimate the center coordinates and radii of circular buttons and develop a novel algorithm that autonomously removes perspective distortions from images containing circular elevator buttons.
References
- [1] D. Zhu, Y. Fang, Z. Min, D. Ho, and M. Q.-H. Meng, “OCR-RCNN: An accurate and efficient framework for elevator button recognition,” IEEE Transactions on Industrial Electronics, 2021.
- [2] Z. Dong, D. Zhu, and M. Q.-H. Meng, “An autonomous elevator button recognition system based on convolutional neural networks,” in 2017 IEEE International Conference on Robotics and Biomimetics (ROBIO). IEEE, 2017, pp. 2533–2539.
- [3] J. Liu and Y. Tian, “Recognizing elevator buttons and labels for blind navigation,” in 2017 IEEE 7th Annual International Conference on CYBER Technology in Automation, Control, and Intelligent Systems (CYBER). IEEE, 2017, pp. 1236–1240.
- [4] R. O. Duda and P. E. Hart, “Use of the hough transformation to detect lines and curves in pictures,” Communications of the ACM, vol. 15, no. 1, pp. 11–15, 1972.
- [5] E. Klingbeil, B. Carpenter, O. Russakovsky, and A. Y. Ng, “Autonomous operation of novel elevators for robot navigation,” in 2010 IEEE International Conference on Robotics and Automation. IEEE, 2010, pp. 751–758.
- [6] W. N. F. W. Zakaria, M. R. Daud, S. Razali, and M. F. Abas, “Elevator’s external button recognition and detection for vision-based system,” Proceeding of the Electrical Engineering Computer Science and Informatics, vol. 1, no. 1, pp. 265–269, 2014.
- [7] H.-H. Kim, D.-J. Kim, and K.-H. Park, “Robust elevator button recognition in the presence of partial occlusion and clutter by specular reflections,” IEEE Transactions on Industrial Electronics, vol. 59, no. 3, pp. 1597–1611, 2011.
- [8] K. T. Islam, G. Mujtaba, R. G. Raj, and H. F. Nweke, “Elevator button and floor number recognition through hybrid image classification approach for navigation of service robot in buildings,” in 2017 International Conference on Engineering Technology and Technopreneurship (ICE2T). IEEE, 2017, pp. 1–4.
- [9] J. Liu, Y. Fang, D. Zhu, N. Ma, J. Pan, and M. Q.-H. Meng, “A large-scale dataset for benchmarking elevator button segmentation and character recognition,” arXiv preprint arXiv:2103.09030, 2021.
- [10] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam, “Encoder-decoder with atrous separable convolution for semantic image segmentation,” in Proceedings of the European conference on computer vision (ECCV), 2018, pp. 801–818.
- [11] Y. Takezawa, M. Hasegawa, and S. Tabbone, “Camera-captured document image perspective distortion correction using vanishing point detection based on radon transform,” in 2016 23rd International Conference on Pattern Recognition (ICPR). IEEE, 2016, pp. 3968–3974.
- [12] C. Liu, Y. Zhang, B. Wang, and X. Ding, “Restoring camera-captured distorted document images,” International Journal on Document Analysis and Recognition (IJDAR), vol. 18, no. 2, pp. 111–124, 2015.
- [13] M. Shafii and M. Sid-Ahmed, “Skew detection and correction based on an axes-parallel bounding box,” International Journal on Document Analysis and Recognition (IJDAR), vol. 18, no. 1, pp. 59–71, 2015.
- [14] C. Mantel, S. Spataru, H. Parikh, D. Sera, G. A. dos Reis Benatto, N. Riedel, S. Thorsteinsson, P. B. Poulsen, and S. Forchhammer, “Correcting for perspective distortion in electroluminescence images of photovoltaic panels,” in 2018 IEEE 7th World Conference on Photovoltaic Energy Conversion (WCPEC)(A Joint Conference of 45th IEEE PVSC, 28th PVSEC & 34th EU PVSEC). IEEE, 2018, pp. 0433–0437.
- [15] Y. Xie, J. Li, J.-j. Wang, and C.-y. Liu, “A geometric distortion correction method for lithographic watermarked authentication images,” in Fifth International Conference on Graphic and Image Processing (ICGIP 2013), vol. 9069. International Society for Optics and Photonics, 2014, p. 906908.
- [16] D. Zhu, J. Liu, N. Ma, Z. Min, and M. Q.-H. Meng, “Autonomous removal of perspective distortion for robotic elevator button recognition,” in 2019 IEEE International Conference on Robotics and Biomimetics (ROBIO). IEEE, 2019, pp. 913–917.
- [17] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, “Mobilenetv2: Inverted residuals and linear bottlenecks,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 4510–4520.
- [18] L.-C. Chen, G. Papandreou, F. Schroff, and H. Adam, “Rethinking atrous convolution for semantic image segmentation,” arXiv preprint arXiv:1706.05587, 2017.