CANet: Curved Guide Line Network with Adaptive Decoder for Lane Detection
Abstract
Lane detection is challenging due to complicated on-road scenarios and line deformation under different camera perspectives. Many solutions have been proposed, but they cannot handle “corner lanes” well. To address this problem, this paper proposes CANet, a new top-down deep-learning lane detection approach. A lane instance is first detected as a heatmap response on the U-shaped “curved guide line” at the global semantic level, so that the features of each lane are aggregated at its response point. CANet then obtains the heatmap response of the entire lane through conditional convolution, and finally decodes it into a point set describing the lane via an adaptive decoder. The prototype is implemented in PyTorch and evaluated extensively on three well-known datasets. The experimental results show that CANet reaches the state of the art on multiple metrics. Our code will be released soon.
Index Terms— Lane detection, guide line, adaptive decoder
1 Introduction
Benefiting from ADAS (Advanced Driver Assistance Systems), automated driving will eventually be capable of controlling all aspects of driving without human intervention. Fast, accurate lane detection, generally regarded as a computer vision task, is among the top challenges for ADAS. Recently, deep-learning-based lane detection approaches have emerged and perform much better than traditional methods.
By leveraging progress in instance segmentation [1, 2, 3], CondLaneNet [4] devises a two-stage solution. First, it uses high-level semantic features to extract the origin of each lane, which represents the instance; second, the instance origin guides the low-level visual features to describe the shape of the instance accurately. In the first stage, the image boundary (a rectangle) serves as the guide line for origin finding, which leaves many corner lanes with small grazing angles that are hard for deep-learning models to recall. Figure 1a plots the F1-score, recall, and precision for different grazing angles, clearly showing that corner lanes with small grazing angles perform worse. In the second stage, row-wise classification is employed, but it can hardly deal with lanes that are nearly horizontal. Besides, ordinal classification [5] needs additional classifiers to indicate the lane range, and the range indicators and the row-wise head often make inconsistent predictions at the end of the lane, leading to anomalies such as tail flicks.


To address these problems, we propose CANet (Curved guide line with Adaptive decoder Network). The proposed curved guide line, specifically an inscribed U-shaped line, increases the grazing angles of corner lanes for better recall. A Gaussian mask is then used to supervise lane generation, enabling the network to choose row- or column-wise classification adaptively, according to the shape of the mask, as a post-processing decoder during inference. The lane range is therefore determined by the heatmap response range instead of by additional classifiers, avoiding the inconsistency described above.

2 Related Work
Many approaches have been proposed for lane detection. This section discusses recent deep-learning techniques.
Top-down solutions. “Curve fitting” approaches [6, 7] model a lane as a parametric curve. However, it is hard to recognize all lanes this way because some instances are not mathematically ideal. “Anchor-based” methods such as LaneATT [8] resemble dense prediction in object detection, so they cannot deal well with common cases such as double solid lines. “Row/column-wise classification” [5, 4] is widely adopted for line-shape recognition, but it cannot capture nearly horizontal lanes.
Bottom-up solutions. “Key point detection” [9, 10] identifies critical points to describe the lane. The lane may be unsmooth or incomplete when key points are missed, which is very likely for points in invisible parts that lack global semantic information. “Segmentation-based” methods [11, 12] focus too much on pixel-level boundaries rather than capturing the overall lane shape. Additionally, it is difficult for segmentation to distinguish close instances, because adjacent pixels usually share similar characteristics.
3 Methods
3.1 Network Architecture
As an instance detection application, CANet’s architecture draws on many existing techniques, as shown in Figure 2. It uses ResNet [13] as the backbone to extract image features, and PAFPN [14] with a transformer [15] as the neck to obtain multi-scale information. In the instance extraction branch, CANet uses the proposed curved guide line to obtain the key points (i.e., origins) identifying instances, then acquires instance-level features through the RIM (Recurrent Instance Module) [4]. Guided by these features, CANet computes a Gaussian mask and an offset heatmap through conditional convolution, and finally infers lane coordinates with the adaptive decoder.
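To make the conditional-convolution step concrete, the following minimal PyTorch sketch shows the general mechanism: per-instance kernel weights are generated from instance features and applied to a shared feature map in a single convolution call. The class name `CondConvHead`, the 1×1 kernel shape, and all tensor sizes are illustrative assumptions, not CANet’s actual head (which also produces offset maps and consumes RIM features).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CondConvHead(nn.Module):
    """Minimal conditional-convolution head: a 1x1 kernel (and bias) is
    generated per instance and applied to a feature map shared by all
    instances, yielding one response heatmap per instance."""

    def __init__(self, feat_channels: int):
        super().__init__()
        # Predict the weights and bias of a dynamic 1x1, 1-channel conv.
        self.param_gen = nn.Linear(feat_channels, feat_channels + 1)

    def forward(self, shared_feats: torch.Tensor, instance_feats: torch.Tensor):
        # shared_feats:   (C, H, W), shared by all instances
        # instance_feats: (N, C), one vector per detected lane origin
        C, H, W = shared_feats.shape
        params = self.param_gen(instance_feats)        # (N, C + 1)
        weight = params[:, :C].reshape(-1, C, 1, 1)    # N dynamic kernels
        bias = params[:, C]                            # (N,)
        # A single conv call applies every instance's kernel at once.
        out = F.conv2d(shared_feats.unsqueeze(0), weight, bias=bias)
        return out.squeeze(0)                          # (N, H, W)

head = CondConvHead(feat_channels=64)
feats = torch.randn(64, 40, 100)     # e.g. a downsampled feature map
origins = torch.randn(3, 64)         # features of 3 lane origins
print(head(feats, origins).shape)    # torch.Size([3, 40, 100])
```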
3.2 Guide Line
To the best of our knowledge, although guide lines have been implicitly used in prior work [4] to mark the origins of lanes, we are the first to name this boundary. If guide lines are not enforced, the origins are randomly distributed, which is inefficient for learning. It is straightforward to take the image border (a rectangle) as the guide line to restrict the freedom of lane origins. However, in some cases the response is observed to be very scattered, and the response peak is often indistinguishable from other non-response points.
Figure 3 helps to understand the relation between the response range and the grazing angle. The black dotted line denotes the detected vectorized lane (hereinafter referred to as the “vector line”), and the green dotted lines alongside indicate the range of pixels that generate a response on the heatmap; the greener the background, the stronger the positive response. $r$ is the radius of this range. The tangents of two guide lines are illustrated in orange and magenta, with grazing angles $\theta_1$ and $\theta_2$, respectively. The relation between the response range $d$ on the guide line and the grazing angle $\theta$ is described in Equation (1). The bigger the angle (close to $90^\circ$), the tighter the response range; as the grazing angle gets smaller, the range scatters rapidly. This reflects the results in Figure 1a. Therefore, the guide line principle is to “reduce the number of lanes with small grazing angles”.
$$d = \frac{2r}{\sin\theta} \tag{1}$$
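A quick numeric check of Equation (1) (a sketch; the radius value is an arbitrary example) shows how fast the response range grows as the grazing angle shrinks:

```python
import math

r = 2.0  # response radius in pixels (arbitrary example value)
for theta_deg in (90, 60, 30, 10, 5):
    d = 2 * r / math.sin(math.radians(theta_deg))
    print(f"grazing angle {theta_deg:2d}°: response range ≈ {d:.1f} px")
# 90° -> 4.0 px (tightest); 5° -> ~45.9 px (badly scattered)
```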
3.2.1 Curved Guide Line
According to the guide line principle, CANet proposes a U-shaped curve whose lower half is a partial ellipse and whose upper half consists of the image side borders. The ellipse is defined in Equation (2), where $W$ and $H$ are the width and height of the feature map, respectively, and $a$ and $b$ are hyper-parameters that adjust the ellipse center. Usually $a$ is set to 0.5, so that the ellipse is horizontally centered in the image, while the optimal $b$ is obtained by parameter experiments.
$$\frac{(x - aW)^2}{(aW)^2} + \frac{(y - bH)^2}{\big((1-b)H\big)^2} = 1, \qquad y \ge bH \tag{2}$$
Due to its incurvate shape in the corners, the curved guide line yields bigger grazing angles for corner lanes. Figure 1b shows that few small angles remain with the curved guide line, which leads to better model measurements than with the rectangle guide line.
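The following NumPy sketch rasterizes such a U-shaped guide line under the reconstruction of Equation (2) above; the function name and the default $b = 0.4$ are illustrative assumptions (the paper tunes $b$ experimentally):

```python
import numpy as np

def curved_guide_line(W: int, H: int, a: float = 0.5, b: float = 0.4):
    """U-shaped guide line: the lower half of the ellipse in Equation (2)
    joined to the two image side borders, as an ordered polyline from the
    top-left corner, around the U, to the top-right corner."""
    # Left border, from the image top down to the ellipse's vertical center.
    ys = np.linspace(0.0, b * H, 64)
    left = np.stack([np.zeros_like(ys), ys], axis=1)
    # Lower half of the ellipse centered at (a*W, b*H), traced left to right.
    t = np.linspace(np.pi, 0.0, 256)
    ex = a * W * (1.0 + np.cos(t))            # x: 0 -> W when a = 0.5
    ey = b * H + (1.0 - b) * H * np.sin(t)    # sin(t) >= 0, so y >= b*H
    ellipse = np.stack([ex, ey], axis=1)
    # Right border, from the ellipse's vertical center back up to the top.
    right = np.stack([np.full_like(ys, float(W)), ys[::-1]], axis=1)
    return np.concatenate([left, ellipse, right], axis=0)

line = curved_guide_line(W=100, H=40)   # e.g. on a 100x40 feature map
```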

3.2.2 Key Points Capturing
CANet follows CornerNet [16] and CondLaneNet [4] in predicting a heatmap to find key points. To alleviate the position inaccuracy caused by downsampling, CANet uses a normalized Gaussian kernel with an offset as supervision, as in Equation (3),
$$G_{xy} = \exp\left(-\frac{(x - p_x)^2 + (y - p_y)^2}{2\sigma^2}\right) \tag{3}$$
where $p_x$ and $p_y$ denote the exact (sub-pixel) coordinates of each key point. The loss is formulated as a focal loss, following CondLaneNet [4], CornerNet [16], and CenterNet [17].
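A minimal sketch of how the supervision target of Equation (3) could be rendered, in the spirit of CornerNet/CenterNet-style targets; the σ value and the function name are illustrative assumptions:

```python
import numpy as np

def keypoint_target(H: int, W: int, px: float, py: float, sigma: float = 2.0):
    """Gaussian heatmap centered at the exact (sub-pixel) key point (px, py),
    plus the offset between the exact point and its integer grid cell."""
    ys, xs = np.mgrid[0:H, 0:W]
    heatmap = np.exp(-((xs - px) ** 2 + (ys - py) ** 2) / (2 * sigma ** 2))
    ix, iy = int(px), int(py)
    offset = (px - ix, py - iy)   # regressed to recover sub-pixel accuracy
    return heatmap, offset

hm, off = keypoint_target(40, 100, px=31.6, py=27.2)
print(hm.max(), off)   # peak near (32, 27); offset (0.6, 0.2)
```
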
Table 1: F1-score (%) of different methods on the CULane test set. For the Cross category, the number of false positives is reported (lower is better).

Method | Total | Normal | Crowded | Dazzle | Shadow | No line | Arrow | Curve | Cross | Night
---|---|---|---|---|---|---|---|---|---|---
SCNN [11] | 71.60 | 90.60 | 69.70 | 58.50 | 66.90 | 43.40 | 84.10 | 64.40 | 1990 | 66.10 |
CurveLanes-L [18] | 74.80 | 90.70 | 72.30 | 67.70 | 70.10 | 49.40 | 85.80 | 68.40 | 1746 | 68.90 |
LaneATT-L [8] | 77.02 | 91.74 | 76.16 | 69.47 | 76.31 | 50.46 | 86.29 | 64.05 | 1264 | 70.81 |
UFLDv2-M [5] | 75.90 | 92.50 | 74.90 | 65.70 | 75.30 | 49.00 | 88.50 | 70.20 | 1864 | 70.60 |
CondLaneNet-S [4] | 78.14 | 92.87 | 75.79 | 70.72 | 80.01 | 52.39 | 89.37 | 72.40 | 1364 | 73.23 |
CondLaneNet-M [4] | 78.74 | 93.38 | 77.14 | 71.17 | 79.93 | 51.85 | 89.89 | 73.88 | 1387 | 73.92 |
CondLaneNet-L [4] | 79.48 | 93.47 | 77.44 | 70.93 | 80.91 | 54.13 | 90.16 | 75.21 | 1201 | 74.80 |
CANet-S | 78.46 | 93.07 | 76.59 | 70.51 | 77.82 | 52.24 | 89.39 | 72.48 | 1213 | 72.91 |
CANet-M | 79.16 | 93.58 | 77.88 | 73.11 | 75.06 | 51.68 | 90.09 | 75.54 | 1176 | 73.92 |
CANet-L | 79.86 | 93.60 | 78.74 | 70.07 | 79.35 | 52.88 | 90.18 | 76.69 | 1196 | 74.91 |
Table 2: Comparison on the CurveLanes validation set (F1, precision, and recall in %; GFlops for computational cost).

Method | F1 | Precision | Recall | GFlops
---|---|---|---|---
SCNN [11] | 65.02 | 76.13 | 56.74 | 328.4 |
CurveLanes-L [18] | 82.29 | 91.11 | 75.03 | 20.7 |
CondLaneNet-S [4] | 85.09 | 87.75 | 82.58 | 10.3 |
CondLaneNet-M [4] | 85.92 | 88.29 | 83.68 | 19.7 |
CondLaneNet-L [4] | 86.10 | 88.98 | 83.41 | 44.9 |
CANet-S | 86.57 | 91.37 | 82.25 | 13.1 |
CANet-M | 87.19 | 91.53 | 83.25 | 22.6 |
CANet-L | 87.87 | 91.69 | 84.36 | 45.7 |
3.3 Gaussian Mask Supervision & Adaptive Decoder
Figure 4 shows a typical lane image, sharing the notation of Figure 3. The bold blue dashed line represents a row of pixels, the red point is the intersection of the vector line with the current row, the yellow point lies on the green dashed line, and the gray point is a pixel at some horizontal distance from the red one.
In traditional row-wise classification, all negative samples (yellow and gray pixels) receive the same weight. UFLD [5] uses an L1 distance loss to constrain the expectation, making a negative sample’s weight proportional to its distance from the positive sample in the same row, but it does not consider the influence of lane inclination on the loss. Instead, CANet uses heatmap-style supervision that constructs a Gaussian distribution centered on the vector line, modeling that positive-sample confidence decreases with distance.
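As a sketch of this idea (an assumed form, not necessarily the paper’s exact construction), a Gaussian mask around the vector line can be built from a densely sampled polyline, so that a pixel’s weight decays with its distance to the line regardless of the line’s inclination:

```python
import numpy as np

def lane_gaussian_mask(H: int, W: int, points, sigma: float = 3.0):
    """Gaussian mask around a vector line given as a dense polyline of
    (x, y) points: each pixel is weighted by its distance to the nearest
    sampled point, so weights decay smoothly in all directions."""
    ys, xs = np.mgrid[0:H, 0:W]
    mask = np.zeros((H, W), dtype=np.float64)
    for px, py in points:
        g = np.exp(-((xs - px) ** 2 + (ys - py) ** 2) / (2 * sigma ** 2))
        np.maximum(mask, g, out=mask)   # keep the nearest point's response
    return mask

# a near-horizontal lane sampled densely across a 40x100 feature map
pts = [(x, 20 + 0.1 * x) for x in range(0, 100, 2)]
mask = lane_gaussian_mask(40, 100, pts)
```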

Since lanes might be nearly horizontal or vertical, it is crucial to choose row- or column-wise classification according to the lane shape. With traditional line classification, the network cannot dynamically adjust the anchor during inference, because pixels from different rows are not comparable. CANet addresses this with the aforementioned heatmap supervision, which enables global pixel comparison and thus turns the row-wise classification used in training into a dynamic post-processing operation. CANet can then choose between row- and column-wise classification, i.e., two kinds of anchors, dynamically according to the activation shape of the instance heatmap. Moreover, the heatmap response range is itself the lane range, so there is no need for additional range classifiers, which would introduce range-to-localization inconsistencies.
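A minimal sketch of this adaptive decoding step under the assumptions above: measure the activated region of an instance heatmap, decode row-wise if it is taller than wide and column-wise otherwise, and read the lane range directly off the response range. The threshold and names are illustrative:

```python
import numpy as np

def adaptive_decode(heatmap: np.ndarray, thresh: float = 0.3):
    """Decode one instance heatmap into lane points, choosing row- or
    column-wise anchors from the shape of the activated region."""
    active = heatmap > thresh
    rows = np.flatnonzero(active.any(axis=1))   # rows with any response
    cols = np.flatnonzero(active.any(axis=0))   # columns with any response
    if rows.size == 0:
        return []                               # no lane detected
    if rows.size >= cols.size:
        # Activation taller than wide -> row-wise: one x per active row.
        return [(int(np.argmax(heatmap[y])), int(y)) for y in rows]
    # Wider than tall (near-horizontal lane) -> column-wise decoding.
    return [(int(x), int(np.argmax(heatmap[:, x]))) for x in cols]
```

Note that the lane range falls out of `rows`/`cols`, i.e., directly out of the heatmap response range, so no separate range classifier is required.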
4 Experiments
4.1 Results
Extensive experiments were conducted on three widely used lane detection datasets: CULane [11], CurveLanes [18], and TuSimple [19]. For direct comparison, the official metrics are used.
Table 1 presents the results of CANet and various other models on the subsets of CULane. CANet achieves the overall state of the art with a 79.86 F1-score (the previous best result is 79.63). In particular, CANet also delivers outstanding performance on the Crowded, Dazzle, Cross, and Night subsets, which are very difficult.
CurveLanes contains more difficult scenarios such as curves. Table 2 reports the results of all models on CurveLanes. CANet-L achieves the state of the art with an 87.87 F1-score, 1.77 percentage points higher than the previous best. In fact, the small version CANet-S already surpasses the previous best F1 at only about 29% of the computational cost (13.1 vs. 44.9 GFlops). In terms of precision, CANet is slightly below the best result (91.69 vs. 93.58); however, CANet has the best recall, about 12.77 percentage points higher. This indicates that CANet strikes a better trade-off between recall and precision.
Compared with the other datasets, TuSimple is simpler. Table 4 shows that all models perform very well, and CANet provides marginal improvements on most indicators.
4.2 Effectiveness of Curved Guide Line
For easy comparison, we divide the lanes in CurveLanes into three groups according to their grazing angles against the rectangle guide line. Table 3 shows that the curved guide line improves recall for all groups, especially the smallest-angle group. This is because the curved guide line reduces the number of lanes with small grazing angles, as illustrated in Figure 1b.
Table 3: Recall (%) on CurveLanes, grouped by grazing angle against the rectangle guide line, with the curved guide line disabled vs. enabled.

Curved guide line | 0°–30° | 30°–60° | 60°–90°
---|---|---|---
Disabled | 76.16 | 81.71 | 81.75 |
Enabled | 82.51 (+6.35) | 84.58 (+2.87) | 83.70 (+1.95) |
4.3 Ablation Study
Table 5 shows how the two policies, the curved guide line and the adaptive decoder, interact on the CurveLanes dataset. The baseline (Line 2) is derived from CondLaneNet-L with two updates: the downsampling factor for key points is reduced from 16 to 8, and the rectangle guide line is enforced. These updates make the baseline about 0.125 percentage points better than the original CondLaneNet-L. In Line 3, we change the rectangle guide line to the curved guide line, and the F1-score increases by 1.036 points over the baseline. In Line 4, Gaussian mask supervision and the adaptive decoder are applied alone, improving F1 by 1.221 points. Finally, Line 5 combines both policies as CANet and reaches the best performance.
Table 4: Comparison on the TuSimple test set. F1 and accuracy in %; FP and FN are the false-positive and false-negative rates (%).

Method | F1 | Accuracy | FP | FN
---|---|---|---|---
SCNN [11] | 95.97 | 96.53 | 6.17 | 1.80 |
LaneATT-L [8] | 96.06 | 96.10 | 5.64 | 2.17 |
UFLDv2-M [5] | 96.22 | 95.56 | 3.18 | 4.37 |
CondLaneNet-S [4] | 97.01 | 95.48 | 2.18 | 3.80 |
CondLaneNet-M [4] | 96.98 | 95.37 | 2.20 | 3.82 |
CondLaneNet-L [4] | 97.24 | 96.54 | 2.01 | 3.50 |
CANet-S | 97.51 | 96.56 | 2.29 | 2.68 |
CANet-M | 97.44 | 96.66 | 2.32 | 2.79 |
CANet-L | 97.77 | 96.76 | 1.92 | 2.53 |
Table 5: Ablation study on the CurveLanes dataset (%).

Line | Model | F1-score | Precision | Recall
---|---|---|---|---
1 | CondLaneNet | 86.10 | 88.98 | 83.41 |
2 | baseline | 86.23 | 92.46 | 80.78 |
3 | +curved guide line | 87.26 | 91.98 | 83.01 |
4 | +adaptive decoder | 87.45 | 92.52 | 82.90 |
5 | CANet | 87.87 | 91.69 | 84.36 |
5 Conclusion and Discussion
As a crucial and challenging task for automated driving, lane detection has been widely explored from different perspectives, especially in the deep-learning era. However, state-of-the-art approaches struggle to recognize corner lanes effectively. This paper first formalizes the “guide line” that constrains lane origins, and proposes a U-shaped curved guide line that enlarges grazing angles for stable learning. By using a Gaussian mask at the supervision stage, the adaptive decoder mechanism can choose between row- and column-wise classification intelligently, and the predictions of location and range become more consistent.
References
- [1] Daniel Bolya, Chong Zhou, Fanyi Xiao, and Yong Jae Lee, “Yolact: Real-time instance segmentation,” in Proceedings of the IEEE/CVF international conference on computer vision, 2019, pp. 9157–9166.
- [2] Hao Chen, Kunyang Sun, Zhi Tian, Chunhua Shen, Yongming Huang, and Youliang Yan, “Blendmask: Top-down meets bottom-up for instance segmentation,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 8573–8581.
- [3] Zhi Tian, Chunhua Shen, and Hao Chen, “Conditional convolutions for instance segmentation,” in European conference on computer vision. Springer, 2020, pp. 282–298.
- [4] Lizhe Liu, Xiaohao Chen, Siyu Zhu, and Ping Tan, “Condlanenet: a top-to-down lane detection framework based on conditional convolution,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 3773–3782.
- [5] Zequn Qin, Huanyu Wang, and Xi Li, “Ultra fast structure-aware deep lane detection,” in European Conference on Computer Vision. Springer, 2020, pp. 276–291.
- [6] Ruijin Liu, Zejian Yuan, Tie Liu, and Zhiliang Xiong, “End-to-end lane shape prediction with transformers,” in Proceedings of the IEEE/CVF winter conference on applications of computer vision, 2021, pp. 3694–3702.
- [7] Zhengyang Feng, Shaohua Guo, Xin Tan, Ke Xu, Min Wang, and Lizhuang Ma, “Rethinking efficient lane detection via curve modeling,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 17062–17070.
- [8] Lucas Tabelini, Rodrigo Berriel, Thiago M Paixao, Claudine Badue, Alberto F De Souza, and Thiago Oliveira-Santos, “Keep your eyes on the lane: Real-time attention-guided lane detection,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2021, pp. 294–302.
- [9] Zhan Qu, Huan Jin, Yang Zhou, Zhen Yang, and Wei Zhang, “Focus on local: Detecting lane marker from bottom up via key point,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 14122–14130.
- [10] Jinsheng Wang, Yinchao Ma, Shaofei Huang, Tianrui Hui, Fei Wang, Chen Qian, and Tianzhu Zhang, “A keypoint-based global association network for lane detection,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 1392–1401.
- [11] Xingang Pan, Jianping Shi, Ping Luo, Xiaogang Wang, and Xiaoou Tang, “Spatial as deep: Spatial cnn for traffic scene understanding,” in Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence and Thirtieth Innovative Applications of Artificial Intelligence Conference and Eighth AAAI Symposium on Educational Advances in Artificial Intelligence. 2018, AAAI’18/IAAI’18/EAAI’18, AAAI Press.
- [12] Hala Abualsaud, Sean Liu, David B Lu, Kenny Situ, Akshay Rangesh, and Mohan M Trivedi, “Laneaf: Robust multi-lane detection with affinity fields,” IEEE Robotics and Automation Letters, vol. 6, no. 4, pp. 7477–7484, 2021.
- [13] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778.
- [14] Shu Liu, Lu Qi, Haifang Qin, Jianping Shi, and Jiaya Jia, “Path aggregation network for instance segmentation,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 8759–8768.
- [15] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin, “Attention is all you need,” Advances in neural information processing systems, vol. 30, 2017.
- [16] Hei Law and Jia Deng, “Cornernet: Detecting objects as paired keypoints,” in Proceedings of the European conference on computer vision (ECCV), 2018, pp. 734–750.
- [17] Kaiwen Duan, Song Bai, Lingxi Xie, Honggang Qi, Qingming Huang, and Qi Tian, “Centernet: Keypoint triplets for object detection,” in Proceedings of the IEEE/CVF international conference on computer vision, 2019, pp. 6569–6578.
- [18] Hang Xu, Shaoju Wang, Xinyue Cai, Wei Zhang, Xiaodan Liang, and Zhenguo Li, “Curvelane-nas: Unifying lane-sensitive architecture search and adaptive point blending,” in European Conference on Computer Vision. Springer, 2020, pp. 689–704.
- [19] TuSimple, “TuSimple lane detection benchmark,” https://github.com/TuSimple/tusimple-benchmark, 2017.