
Learning to Calibrate for Reliable Visual Fire Detection

Ziqi Zhang1,2, Xiuzhuang Zhou1, and Xiangyang Gong2
1 Authors are with the School of Intelligent Engineering and Automation, Beijing University of Posts and Telecommunications, Beijing 100876, China (e-mail: [email protected]).
2 Authors are with the Information and Digital Management Department, China Petrochemical Corporation, Beijing 100728, China.
Abstract

Fire is characterized by its sudden onset and destructive power, making early fire detection crucial for ensuring human safety and protecting property. With the advancement of deep learning, computer vision has substantially improved fire detection. However, deep learning models often exhibit a tendency toward overconfidence, and most existing works focus primarily on enhancing classification performance, paying limited attention to uncertainty modeling. To address this issue, we propose transforming the Expected Calibration Error (ECE), a metric for measuring uncertainty, into a differentiable ECE loss function. This loss is combined with the cross-entropy loss to guide the training of multi-class fire detection models. Additionally, to strike a good balance between classification accuracy and decision reliability, we introduce a curriculum learning-based approach that dynamically adjusts the weight of the ECE loss during training. Extensive experiments on two widely used multi-class fire detection datasets, DFAN and EdgeFireSmoke, validate the effectiveness of our uncertainty modeling method.

I Introduction

Fire detection involves the identification and confirmation of fire events in a monitored environment using various technical methods, enabling timely interventions to control the fire and minimize associated damage. This technology is widely employed in fire-prone settings, such as chemical plants, construction material companies, and residential areas, and is typically implemented through sensor-based detection, human monitoring, or algorithmic analysis.

Due to the inherent limitations of sensor-based detection technologies and the low efficiency of manual patrols, these methods often fail to meet the real-time requirements of fire detection. As a result, most efforts have shifted toward computer vision, which enables continuous, intelligent fire detection and alarm systems based on video or image data. In [1], a multi-feature fusion detection method is proposed that leverages distinct visual characteristics [2] associated with fire to identify image regions matching these features, thereby determining the presence of a fire. Deep learning techniques are further employed to automatically learn richer, high-level features from video or image data. Disturbance removal methods [3] and the design of more discriminative fire features [4] further reduce the impact of complex environmental factors on recognition accuracy. Visual fire detection offers the advantage of directly identifying fire locations, making it particularly effective in high-risk environments, such as areas containing hazardous gases, where traditional detection methods may fall short. In recent years, advances in deep learning have brought substantial improvements to visual fire detection, significantly enhancing recognition accuracy.

Fire detection systems are generally based on classification models that judge whether a fire has occurred. In this process, the correctness of the decision largely depends on the accuracy of the predicted probabilities. However, fire detection scenarios are usually quite complex, with many interfering objects in the environment that are highly similar to flames or smoke in color, fluidity, and other features, as shown in Figure 1. Additionally, flames and smoke have irregular shapes and varying shades [5], and the visual characteristics of different objects when burning may differ markedly. For example, materials such as sulfur and magnesium may exhibit rare colors such as blue-purple when burning, sometimes accompanied by other visual characteristics such as emitting white light. Fire images can thus resemble non-fire images, while the features of different fire images can vary greatly, leading to higher predictive uncertainty.

Predictive uncertainty can significantly impact the accuracy of a model's decisions, leading to false positives or false negatives that may have severe, irreparable consequences. Modeling uncertainty is therefore crucial for achieving a more comprehensive understanding of fire detection models and mitigating the risk of over-reliance on inaccurate predictions. Analyzing the sources of uncertainty can also enable targeted improvements to the model, such as adjustments to the model architecture or enhancements in data processing, ultimately leading to more reliable decisions. However, effectively modeling predictive uncertainty requires substantial computational resources and additional validation/recalibration data [6, 7], which poses a considerable challenge to practical implementation. Moreover, in classification tasks, achieving an appropriate balance between classification accuracy and decision reliability remains a critical research challenge.

This paper proposes a new method for modeling uncertainty in visual fire detection by introducing a new loss and calibrating the fire detection model online based on curriculum learning. Our work makes the following contributions:

1) A differentiable ECE Loss is introduced for training multi-class fire detection models, modeling the predictive uncertainty in visual fire detection.

2) A method for dynamically adjusting the weight of ECE Loss is designed, inspired by the principles of curriculum learning. This enables the model to progressively transition from simpler to more complex objectives, thereby effectively balancing classification accuracy and decision reliability.

3) Extensive experimental evaluations are conducted on the publicly available datasets DFAN and EdgeFireSmoke for fire detection, demonstrating that the proposed method achieves improved calibration performance without sacrificing the classification accuracy.

Figure 1: Non-fire images with interfering objects: (a) red maple trees, (b) flaming clouds, (c) headlights, (d) water vapor.

II Related Work

Fire detection is designed to identify and confirm the occurrence of a fire in a monitored scene. Currently, fire detection relies mainly on computer vision technology, with works focusing on areas such as distinguishing interfering objects, recognizing complex fire images, and enhancing the efficiency of detection systems.

Fire detection environments are diverse and complex, with many scenarios prone to misidentification. Tao et al. [3] proposed a triple disturbance removal network for smoke detection, which learned discriminative representations to effectively reduce the false alarm rate caused by disturbances at spatial, temporal, and semantic levels. He et al. [4] introduced a lightweight feature-level and decision-level fusion module, incorporating spatial and channel attention mechanisms to detect small smoke patterns and recognize smoke-like objects. Tao et al. [8] developed a forest smoke recognition network with pixel-level supervision, featuring a detail difference perception module, an attention feature separation module, and a multi-connection aggregation method, which effectively mitigates the low detection rate and high false alarm rate in complex scenarios. Park et al. [9] proposed a method for generating virtual wildfire images using a Generative Adversarial Network (GAN), annotating them with a weakly supervised image localization module, and performing wildfire detection based on an enhanced YOLOv5s model, significantly reducing false alarms during the detection process.

The visual characteristics of fire vary significantly across different detection scenarios, and the irregular, dynamic shapes of flames and smoke further complicate detection. Li et al. [10] proposed an anchor-free fire recognition algorithm that integrated a multi-scale feature fusion network with a channel attention mechanism, combining loss functions including classification loss, regression loss, and center point loss. This approach enhanced the model’s ability to detect irregularly shaped flames and smoke with blurred boundaries. Yuan et al. [11] introduced a method that combined a 3D cross-convolutional attention module with count prior embedding, addressing the challenges posed by the semi-transparency and blurred edges of smoke, which often led to reduced detection accuracy. Liang et al. [12] proposed an anchor-free, structure-based fire detection algorithm, designing the feature extraction network’s residual module as a multi-branch structure to capture more expressive flame features. By strengthening feature representation through an improved feature fusion network, this method enhanced the model’s ability to detect multi-scale flames, making it suitable for many fire detection scenarios.

Considering the rapid spread of fire, it is crucial not only to improve the accuracy of fire detection but also to enhance the inference speed and deployment efficiency of models. Siddique et al. [13] proposed an Internet of Things (IoT)-based federated learning framework for forest fire classification, which distributed computational tasks across multiple nodes. This approach enhanced detection efficiency while safeguarding user privacy and data security. Li et al. [14] introduced a lightweight fire detection model and developed an edge computing system that connected feedback from the edge model to edge gateways and smart devices. This solution addresses the limitations of traditional fire detection systems, which are often too large to be deployed on edge devices. Tian et al. [15] proposed a fire detection algorithm that strengthened spatial feature extraction and multi-scale feature fusion, incorporating local convolution modules to reduce the size of the backbone network and detection head. This approach achieved high detection accuracy while ensuring real-time performance. Zhang et al. [16] presented a flame and smoke detection algorithm that integrated a YOLOv5-ResNet cascade network. By enhancing the YOLOv5 detection network and combining continuous multi-frame detection results with changes in smoke area, the algorithm improved the detection performance of small flame and smoke targets. This approach also effectively eliminated non-flame and non-smoke objects, achieving high accuracy, rapid detection, and low false alarm rate, making it suitable for large-scale industrial applications.

In recent years, the rapid development of deep learning has significantly enhanced the recognition accuracy of visual fire detection. However, models still tend to be overconfident in their predictions [17]: for certain samples, a model may produce incorrect classification results while maintaining high confidence in these erroneous predictions. Furthermore, most current work in visual fire detection focuses on improving detection accuracy, with little attention given to uncertainty modeling.

Uncertainty modeling methods have broad applications in the field of computer vision. Ji et al. [18] were among the first to incorporate uncertainty into the task of image tampering detection. They proposed an uncertainty estimation network that dynamically supervised uncertainty from both the data and the model, using the generated uncertainty map to refine tampering detection outcomes. This approach led to more accurate and reliable detection. In the context of salient object detection, Tian et al. [19] explored distribution uncertainty, investigating the effectiveness of long-tail learning, single-model uncertainty modeling, and test-time strategies to address the distributional differences between training and testing samples. Yelleni et al. [7] focused on uncertainty in object detection, introducing a method called MC-DropBlock. This approach leveraged the DropBlock technique to model epistemic uncertainty during model training and inference, while using a Gaussian likelihood function to capture aleatoric uncertainty in the data. Their method significantly enhanced the generalization ability of object detection models.

This paper introduces a new uncertainty modeling method for visual fire detection by integrating an uncertainty-aware loss with the cross-entropy loss and training the model based on curriculum learning. The proposed method demonstrates improved calibration performance in multi-class fire detection tasks, enhancing the reliability of the model's decisions.

III Method

Modeling uncertainty in visual fire detection involves two key aspects: uncertainty calibration and uncertainty measurement. Calibration refers to the process of adjusting a model so that its predicted confidence aligns with its actual classification accuracy, thereby reducing predictive uncertainty. Uncertainty measurement, in contrast, employs visual or quantitative methods to assess and represent the model's uncertainty, providing a clearer understanding of the confidence associated with its predictions.

III-A Uncertainty Calibration

Calibration can be classified into post-calibration and online calibration. Post-calibration refers to the process of re-mapping the predictions of a pre-trained model to yield more accurate probabilities. Common post-calibration techniques include Temperature Scaling, Vector Scaling, and others [17]. In contrast, online calibration involves constraining predictive uncertainty during the model’s training process, allowing the model to generate credible predictions directly.
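For concreteness, a minimal Python sketch of temperature scaling, a common post-calibration technique [17], is given below: a single scalar $T$ is fitted on held-out validation logits by minimizing the negative log-likelihood, and test-time logits are divided by $T$ before the softmax. The helper names and the use of SciPy for the one-dimensional fit are illustrative assumptions, not part of the cited works.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import logsumexp

def nll_at_temperature(T, logits, labels):
    """Negative log-likelihood of integer labels under logits / T."""
    z = logits / T
    log_probs = z - logsumexp(z, axis=1, keepdims=True)
    return -log_probs[np.arange(len(labels)), labels].mean()

def fit_temperature(logits, labels):
    """Fit T > 0 on a held-out validation set (bounds are illustrative)."""
    res = minimize_scalar(nll_at_temperature, bounds=(0.05, 10.0),
                          args=(logits, labels), method="bounded")
    return res.x
# At test time, calibrated probabilities are softmax(logits / T).
```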

We focus on multi-class tasks in visual fire detection, where the model takes a single image $X$ as input, and the corresponding label is denoted as $Y\in\{1,\ldots,K\}$, with $K$ representing the number of classes. The model processes the input image and, after passing through the final softmax activation function, outputs the predicted probabilities for each class $\{\hat{p}_{1},\hat{p}_{2},\ldots,\hat{p}_{K}\}$. The largest predicted probability is denoted as $\hat{P}$, and the class corresponding to $\hat{P}$ is the model's predicted class, denoted as $\hat{Y}$. Therefore, $\hat{P}$ represents the model's confidence that the sample belongs to class $\hat{Y}$.

The ideal calibration result for a multi-class fire detection model is given by

P(\hat{Y}=Y\mid\hat{P}=p)=p\quad\forall p\in[0,1]  (1)

The equation implies that, for all samples, the model's confidence in its predictions should be numerically consistent with its true classification accuracy. Specifically, if $m$ samples all have a confidence level $p$, then the model should correctly classify $m\times p$ of these samples. The calibration error measures the discrepancy between the model's predicted confidence and the actual observed accuracy across different confidence levels. However, since $\hat{P}$ in equation 1 is a continuous random variable, it is challenging to verify the equation's validity from a finite number of samples. Therefore, binning or other approximation methods are typically employed to address this issue.

III-B Uncertainty Measurement

Uncertainty is an abstract concept that must be assessed through either visual or quantitative methods. A reliability diagram is a visual method that reflects the uncertainty of a model by plotting the true classification accuracy as a function of confidence. The implementation steps are as follows: First, test samples are fed into the trained model to obtain the corresponding predicted probabilities and classification predictions. Next, the interval $[0,1]$ is divided into $M$ sub-intervals, and each predicted probability $\hat{P}$ is assigned to one of the $M$ intervals. Let $B_{m}$ denote the set of samples whose predicted probabilities fall within the interval $I_{m}=\left(\frac{m-1}{M},\frac{m}{M}\right]$. The predicted accuracy for all samples in the $m$-th interval can be expressed as

acc(B_{m})=\dfrac{\sum_{i\in B_{m}}\mathbf{1}(\hat{y}_{i}=y_{i})}{|B_{m}|}  (2)

where $\hat{y}_{i}$ represents the predicted category of sample $i$, $y_{i}$ represents the true category of sample $i$, $B_{m}$ represents the set of samples falling in the $m$-th interval, and $acc(B_{m})$ can be seen as an unbiased estimate of $P(\hat{Y}=Y\mid\hat{P}\in I_{m})$. The average confidence of all samples in the $m$-th sub-interval can be expressed as

conf(B_{m})=\dfrac{\sum_{i\in B_{m}}\hat{p}_{i}}{|B_{m}|}  (3)

where $\hat{p}_{i}$ represents the predicted probability, also known as the confidence, of sample $i$, and $conf(B_{m})$ can be considered an approximation of the value of $p$ on the right side of equation 1.

Therefore, equation 1 can be approximated as $acc(B_{m})=conf(B_{m})$, meaning that, under ideal calibration, the reliability diagram should display the identity function. Taking $M=10$, the reliability diagram corresponding to ideal calibration is shown in Figure 2. The closer the curve (in red) is to the diagonal line (in black), the better the calibration performance. In the diagram, the height of the pink bars represents the average confidence of samples in each sub-interval, while the height of the purple bars reflects the classification accuracy of samples in the corresponding sub-interval. Ideally, the two coincide completely.

Figure 2: Reliability diagram under perfect calibration.
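To make the binning procedure concrete, the following NumPy sketch computes the per-bin accuracy and confidence of equations 2 and 3; the array names are illustrative, with `confidences` holding the maximum softmax probability of each test sample.

```python
import numpy as np

def bin_accuracy_confidence(confidences, predictions, labels, M=10):
    """Per-bin accuracy acc(B_m) (Eq. 2), confidence conf(B_m) (Eq. 3), counts."""
    edges = np.linspace(0.0, 1.0, M + 1)
    acc = np.zeros(M)
    conf = np.zeros(M)
    counts = np.zeros(M, dtype=int)
    for m in range(M):
        # B_m: samples whose confidence falls in ((m-1)/M, m/M]
        in_bin = (confidences > edges[m]) & (confidences <= edges[m + 1])
        counts[m] = in_bin.sum()
        if counts[m]:
            acc[m] = (predictions[in_bin] == labels[in_bin]).mean()
            conf[m] = confidences[in_bin].mean()
    return acc, conf, counts
```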

Reliability diagrams provide an intuitive means of reflecting a model’s uncertainty. However, when the differences between two reliability diagrams are subtle, as illustrated in Figure 3, it is hard to assess which model can provide more reliable decisions.

Figure 3: Two indistinguishable reliability diagrams.

In such cases, representing the model's uncertainty with a scalar value proves more practical. A widely adopted quantification is

\sum_{p}\left|P(\hat{Y}=Y\mid\hat{P}=p)-p\right|  (4)

This method of uncertainty quantification is derived from equation 1, assessing predictive uncertainty by evaluating the gap between the model's confidence in its predictions and the true classification accuracy. This gap is typically approximated using the ECE [20] metric. The calculation of ECE follows a process similar to the construction of reliability diagrams: the interval $[0,1]$ is divided into $M$ sub-intervals, and within each sub-interval the difference between the average accuracy and confidence is computed; these differences are then combined in a weighted average, as outlined in equation 5, where $n$ denotes the total number of samples. A smaller ECE indicates better calibration performance of the model.

ECE=\sum_{m=1}^{M}\dfrac{|B_{m}|}{n}\left|acc(B_{m})-conf(B_{m})\right|  (5)
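A minimal sketch of the ECE computation in equation 5, reusing the `bin_accuracy_confidence` helper sketched above:

```python
import numpy as np

def expected_calibration_error(confidences, predictions, labels, M=10):
    """Equation 5: bin-count-weighted gap between accuracy and confidence."""
    acc, conf, counts = bin_accuracy_confidence(
        confidences, predictions, labels, M)
    return np.sum(counts / counts.sum() * np.abs(acc - conf))
```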

III-C ECE Loss

We consider the approach of online calibration in visual fire detection, which constrains the model's credibility during training. One possible method is to use ECE as a loss function. However, the calculation of the accuracy $acc(B_{m})$ in the ECE metric involves the 0-1 indicator function, as described in equation 2. ECE is therefore non-differentiable and cannot be used directly as a loss function during optimization.

The sigmoid function, with range $(0,1)$, maps any real number to this interval, is monotonically increasing, and transitions smoothly between 0 and 1. Consequently, we propose approximating the indicator function with the sigmoid: since $\hat{p}_{i}\in(0,1)$, it is first mapped to the real line via $\tan(\pi\hat{p}_{i}-\frac{\pi}{2})$ and then passed through the sigmoid. With this adjustment, the accuracy calculation is modified as presented in equation 6, converting the ECE metric into a differentiable ECE Loss without altering the underlying calculation logic.

acc(B_{m})=\dfrac{\sum_{i\in B_{m}}S\left(\tan\left(\pi\hat{p}_{i}-\dfrac{\pi}{2}\right)\right)}{|B_{m}|}  (6)

where the sigmoid function S(x) is given in equation 7, and its corresponding curve is depicted in Figure 4.

S(x)=\dfrac{1}{1+e^{-x}}  (7)

Figure 4: Curve of the sigmoid function.

When $\hat{p}_{i}$ varies from 0 to 1, the curve of $acc(B_{m})$ exhibits a smooth and differentiable profile, as depicted in Figure 5.

Figure 5: Curve of accuracy with the predicted probability changing from zero to one.
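A possible TensorFlow sketch of the resulting differentiable ECE Loss is shown below. Following equation 6, the soft accuracy depends only on the predicted confidence; bin assignment itself is kept hard (non-differentiable), which is one plausible reading of the method, and the clipping of $\hat{p}_{i}$ is an added numerical safeguard rather than part of the original formulation.

```python
import math
import tensorflow as tf

def ece_loss(probs, M=10):
    """Differentiable ECE Loss: equation 5 with the soft accuracy of equation 6."""
    p_hat = tf.reduce_max(probs, axis=-1)               # per-sample confidence
    p_hat = tf.clip_by_value(p_hat, 1e-6, 1.0 - 1e-6)   # keep tan well-behaved
    # Soft surrogate for the 0-1 indicator: map (0, 1) to R via tan,
    # then squash back through the sigmoid (equation 6).
    soft_acc = tf.sigmoid(tf.tan(math.pi * p_hat - math.pi / 2.0))
    n = tf.cast(tf.size(p_hat), tf.float32)
    loss = 0.0
    for m in range(M):
        lo, hi = m / M, (m + 1) / M
        in_bin = tf.cast((p_hat > lo) & (p_hat <= hi), tf.float32)
        count = tf.reduce_sum(in_bin)
        denom = tf.maximum(count, 1.0)                  # empty-bin guard
        acc_m = tf.reduce_sum(soft_acc * in_bin) / denom
        conf_m = tf.reduce_sum(p_hat * in_bin) / denom
        loss += (count / n) * tf.abs(acc_m - conf_m)
    return loss
```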

III-D Online Calibration

In this study, we combine ECE Loss with the cross-entropy loss (NLL Loss) to jointly supervise the training of a multi-class fire detection model. We first observe the relative magnitudes of NLL Loss and ECE Loss during training and, based on their proportions, set the expected weight $\gamma_{E}$ for ECE Loss, fixing the weight coefficient of NLL Loss at 1.0. At the early stage of training, the model's classification performance is not yet stable, so the constraining effect of ECE Loss on model uncertainty is relatively weak. To balance the model's classification accuracy and decision reliability, we draw on curriculum learning [21]. Specifically, we gradually increase the weight of ECE Loss as training progresses, until it reaches the predefined value $\gamma_{E}$. This strategy ensures that, as the model becomes more confident and its accuracy improves, ECE Loss contributes progressively more to reducing uncertainty.

For the DFAN model, the ratio of NLL Loss to ECE Loss is approximately 1:20, which leads to the choice of $\gamma_{E}=0.05$; for the EdgeFireSmoke model, the ratio is approximately 5:1, so $\gamma_{E}=5$. The overall loss function used during model training can be expressed by

L=L_{n}+\dfrac{c_{e}-s_{e}}{N-s_{e}}\times\gamma_{E}\times L_{e}  (8)

where $L_{n}$ represents the NLL Loss and $L_{e}$ the ECE Loss. When calculating ECE Loss, the number of sub-intervals $M$ is set to 10. The variable $c_{e}$ denotes the current training epoch, $s_{e}$ the epoch at which ECE Loss is first incorporated into the loss function, and $N$ the total number of training epochs. As observed experimentally, the model's classification accuracy gradually stabilizes as training progresses, while the weight of ECE Loss grows toward its expected value. Consequently, the constraining effect of ECE Loss on the model's uncertainty becomes more pronounced, encouraging the model to refine its uncertainty estimation and improve its overall calibration.
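A compact sketch of the overall objective in equation 8 follows, reusing `ece_loss` from the sketch above; all names are illustrative, and the defaults mirror the DFAN setting ($\gamma_{E}=0.05$, $N=50$, $s_{e}=0$).

```python
import tensorflow as tf

def combined_loss(labels, probs, current_epoch, start_epoch=0,
                  total_epochs=50, gamma_e=0.05, M=10):
    """Equation 8: NLL plus a linearly ramped ECE Loss (curriculum weighting)."""
    nll = tf.reduce_mean(
        tf.keras.losses.sparse_categorical_crossentropy(labels, probs))
    # Weight grows linearly from 0 at epoch s_e toward gamma_e at epoch N.
    ramp = max(current_epoch - start_epoch, 0) / (total_epochs - start_epoch)
    return nll + ramp * gamma_e * ece_loss(probs, M)
```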

IV Experiments and Analysis

To evaluate the effectiveness of the proposed uncertainty modeling method, we conduct experiments using two publicly available multi-class datasets in the field of visual fire detection: DFAN and EdgeFireSmoke. The experimental procedure involves three main steps: First, the models are trained using only NLL Loss to establish a baseline. Next, the models are retrained by incorporating both NLL Loss and ECE Loss for comparison. Finally, we compare the classification performance and calibration performance of the models under the two settings to assess the improvements brought by ECE Loss.

IV-A Dataset

The DFAN dataset [22] is sourced from videos on platforms such as YouTube, Facebook, and disaster emergency management agencies, comprising a total of 3,803 images spread across twelve categories. The dataset is split into training, validation, and testing sets in a ratio of 7:2:1. The distribution of images across the different categories is presented in Table 1, and some example images from each category are shown in Figure 6.

TABLE 1: Class distribution of the DFAN dataset.
Category Number of Pictures
Boat Fire 338
Building Fire 305
Bus Fire 400
Car Fire 579
Cargo Fire 207
Electric Fire 300
Forest Fire 480
Pickup Fire 257
SUV Fire 240
Van Fire 300
Train Fire 300
Non Fire 97
Figure 6: Example images from the DFAN dataset: (a) Boat Fire, (b) Electric Fire, (c) Forest Fire, (d) Non Fire.

The EdgeFireSmoke dataset [23] consists of wildfire images captured by drones and is organized into four categories: Burned-area images, which typically feature blackened ground or withered tree trunks; Fire-smoke images, where the smoke is depicted as black or white; Fog-area images, characterized by blurred visuals and difficulty in distinguishing objects within the environment; and Green-area images, representing normal environments without the presence of the conditions mentioned above. The dataset contains a total of 49,452 images, with the data split into training, validation, and testing sets in a 2:3:5 ratio. The distribution of images across the different categories is shown in Table 2, and some example images from each category are illustrated in Figure 7.

TABLE 2: Class distribution of the EdgeFireSmoke dataset.
Category Number of Pictures
Burned-area 9348
Fire-smoke 15579
Fog-area 9762
Green-area 14763
Figure 7: Example images from the EdgeFireSmoke dataset: (a) Burned-area, (b) Fire-smoke, (c) Fog-area, (d) Green-area.

IV-B Evaluation Metrics

We employ five evaluation metrics to assess the performance of the multi-class fire detection models: precision, recall, F1 score, accuracy, and the ECE metric. The first four evaluate the model's classification performance, while the ECE metric measures decision reliability.

In the multi-class task of this paper, precision, recall, F1 score, and accuracy are first calculated for each individual class, and the averages across all classes are then taken as the overall metrics. In addition, the ECE metric is employed to measure the model's predictive uncertainty, with per-bin accuracy computed as in equation 2. When plotting the reliability diagram and calculating ECE on the test data, the number of bins $M$ is set to 15 to provide a detailed evaluation of the model's calibration across different confidence levels.
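For reference, a minimal sketch of the macro-averaged classification metrics using scikit-learn; the choice of library and the array contents are illustrative assumptions.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Illustrative true and predicted class labels.
y_true = np.array([0, 1, 2, 2, 1])
y_pred = np.array([0, 1, 2, 1, 1])

# Per-class precision, recall, and F1, averaged across classes ("macro").
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0)
accuracy = accuracy_score(y_true, y_pred)
```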

IV-C Experimental Setup

The DFAN and EdgeFireSmoke models are implemented in the TensorFlow framework, and the training procedures largely adhere to the parameter settings outlined in the original papers. The DFAN model is trained for 50 epochs; to ensure effective calibration by ECE Loss, and considering the relatively small size of the DFAN dataset, the batch size is increased to 32, and input images are resized to 299×299 pixels. The model is optimized with SGD at a learning rate of 0.001. The EdgeFireSmoke model is trained for 30 epochs; to fully exploit the calibration effect of ECE Loss, the batch size is set to 128 and images are resized to 224×224 pixels, with the Adam optimizer at a learning rate of 0.001.
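These settings can be summarized in a short configuration sketch (TensorFlow, as stated above; variable names are illustrative).

```python
import tensorflow as tf

# DFAN: 50 epochs, batch size 32, 299x299 inputs, SGD with lr 0.001.
dfan_optimizer = tf.keras.optimizers.SGD(learning_rate=0.001)

# EdgeFireSmoke: 30 epochs, batch size 128, 224x224 inputs, Adam with lr 0.001.
edge_optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
```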

IV-D Experimental Results

The model trained with NLL Loss alone is referred to as the vanilla model, while the model trained with the combination of NLL Loss and ECE Loss is referred to as the calibrated model. The vanilla DFAN model is denoted $\textit{DFAN}_{\textit{nll}}$, and the calibrated DFAN model $\textit{DFAN}_{\textit{cali}}$. Experimental results show that the best calibration is achieved when $s_{e}=0$, i.e., when ECE Loss is incorporated from the start of training. The comparison between the vanilla and calibrated DFAN models is shown in Table 3. Compared to $\textit{DFAN}_{\textit{nll}}$, the $\textit{DFAN}_{\textit{cali}}$ model exhibits a significant reduction in uncertainty, with only a minor decrease of less than 0.7% in classification accuracy. The reliability diagrams of the two models are shown in Figure 8. The left diagram corresponds to $\textit{DFAN}_{\textit{nll}}$: in the confidence intervals [0.2, 0.4] and [0.6, 0.8], a noticeable gap exists between the model's predicted confidence and its actual prediction accuracy, and the function curve (red) deviates significantly from the perfect prediction (black dashed line), indicating a higher level of model uncertainty. The right diagram shows $\textit{DFAN}_{\textit{cali}}$: in the interval [0.6, 0.8], the model's confidence and prediction accuracy are more closely aligned, with the curve closely following the diagonal, and predictions in the interval [0.2, 0.4] have also improved, indicating better calibration. The calibrated model thus provides more reliable predictions, with better alignment between predicted confidence and classification accuracy.

TABLE 3: Performance comparison of DFAN model before and after calibration.
Model P(%) R(%) F1(%) ACC(%) ECE
$\textit{DFAN}_{\textit{nll}}$ 87.94 87.54 87.31 87.54 0.05436
$\textit{DFAN}_{\textit{cali}}$ 87.72 86.89 86.75 86.89 0.04013
Figure 8: Reliability diagrams of the DFAN model before and after calibration: (a) $\textit{DFAN}_{\textit{nll}}$, (b) $\textit{DFAN}_{\textit{cali}}$.

The comparison between the vanilla and calibrated EdgeFireSmoke models is shown in Table 4. The vanilla EdgeFireSmoke model is denoted $\textit{Edge}_{\textit{nll}}$. In the online calibration experiments for this model, two schemes demonstrate effective calibration. The first sets $s_{e}=0$, and the resulting model is denoted $\textit{Edge}_{\textit{cali1}}$ (second row of Table 4); the second sets $s_{e}=10$, yielding $\textit{Edge}_{\textit{cali2}}$ (third row). Although the vanilla model already demonstrates low uncertainty, the introduction of ECE Loss still reduces model uncertainty, and the loss in accuracy is kept within 0.5%. This indicates that ECE Loss successfully mitigates uncertainty without significantly sacrificing classification accuracy. The reliability diagrams of the vanilla and calibrated EdgeFireSmoke models are shown in Figure 9. In the interval [0.6, 1.0], the predicted accuracy and average confidence of both $\textit{Edge}_{\textit{cali1}}$ and $\textit{Edge}_{\textit{cali2}}$ are much closer, highlighting the effectiveness of ECE Loss in reducing the model's tendency toward overconfidence. These results demonstrate that ECE Loss improves decision reliability by aligning the model's confidence more closely with its prediction accuracy.

TABLE 4: Performance comparison of EdgeFireSmoke model before and after calibration.
Model P(%) R(%) F1(%) ACC(%) ECE
$\textit{Edge}_{\textit{nll}}$ 98.15 97.97 98.02 98.03 0.01208
$\textit{Edge}_{\textit{cali1}}$ 97.78 97.82 97.79 97.79 0.00596
$\textit{Edge}_{\textit{cali2}}$ 97.72 97.95 97.80 97.80 0.00707
Figure 9: Reliability diagrams of the EdgeFireSmoke model before and after calibration: (a) $\textit{Edge}_{\textit{nll}}$, (b) $\textit{Edge}_{\textit{cali1}}$, (c) $\textit{Edge}_{\textit{cali2}}$.

Additionally, to balance the model's classification accuracy and decision reliability, we propose a dynamic loss design in which the weights of the loss terms change over time. This approach is inspired by curriculum learning, allowing the model to progressively take on more complex objectives as it stabilizes on simpler ones. To verify its effectiveness, we compare against incorporating ECE Loss from the beginning of training with both loss weights fixed at their expected values throughout, which can be represented by

L=L_{n}+\gamma_{E}\times L_{e}  (9)

Under this setting, the weight coefficient $\gamma_{E}$ for ECE Loss remains constant, so both loss terms exert a fixed relative constraint on training from the outset. The resulting DFAN model is denoted $\textit{DFAN}_{\textit{w/o cl}}$, and the EdgeFireSmoke model $\textit{Edge}_{\textit{w/o cl}}$. The comparative results are shown in Table 5 and Figure 10; a sketch of this fixed-weight objective is given below. The results demonstrate that, in terms of both classification accuracy and decision reliability, the models trained with the curriculum learning approach outperform those with a constant ECE Loss weight. This suggests that introducing ECE Loss at full strength during the early stage of training, when the model's classification ability is still developing, can hinder the learning of effective classification patterns. With curriculum learning, the model first focuses on classification accuracy and later on decision reliability; this gradual transition leads to more reliable and accurate predictions.
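For reference, a minimal sketch of the fixed-weight objective of equation 9, reusing `ece_loss` from Section III-C and mirroring the `combined_loss` sketch above:

```python
import tensorflow as tf

def combined_loss_fixed(labels, probs, gamma_e=0.05, M=10):
    """Equation 9: constant ECE Loss weight (the w/o-curriculum setting)."""
    nll = tf.reduce_mean(
        tf.keras.losses.sparse_categorical_crossentropy(labels, probs))
    return nll + gamma_e * ece_loss(probs, M)
```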

TABLE 5: Performance comparison of models under different training settings.
Model P(%) R(%) F1(%) ACC(%) ECE
$\textit{Edge}_{\textit{cali1}}$ 97.78 97.82 97.79 97.79 0.00596
$\textit{Edge}_{\textit{cali2}}$ 97.72 97.95 97.80 97.80 0.00707
$\textit{Edge}_{\textit{w/o cl}}$ 97.33 97.15 97.19 97.20 0.11289
$\textit{DFAN}_{\textit{cali}}$ 87.72 86.89 86.75 86.89 0.04013
$\textit{DFAN}_{\textit{w/o cl}}$ 86.61 85.90 85.55 85.90 0.05254
Figure 10: Reliability diagrams of models trained with and without curriculum learning: (a) $\textit{Edge}_{\textit{cali1}}$, (b) $\textit{Edge}_{\textit{w/o cl}}$, (c) $\textit{DFAN}_{\textit{cali}}$, (d) $\textit{DFAN}_{\textit{w/o cl}}$.

V Conclusion

To address the high predictive uncertainty in fire detection, we propose a new method to model uncertainty in visual fire detection by introducing a differentiable ECE Loss and calibrating the fire detection model online. Inspired by curriculum learning, we adjust the weight of ECE Loss over time, balancing the model's classification accuracy and decision reliability. Experiments conducted on two multi-class datasets, DFAN and EdgeFireSmoke, indicate that even when the predictive uncertainty is relatively low, incorporating ECE Loss mitigates the model's tendency toward overconfidence, effectively improving decision reliability while keeping the sacrifice in classification accuracy within 0.7%.

Given the limited availability of multi-class datasets in fire detection, we validate the effectiveness of our method on two commonly used datasets. The development of more classification datasets will further promote the adoption of uncertainty modeling techniques in visual fire detection. Future work will focus on improving calibration effectiveness, exploring methods to enhance classification accuracy while maintaining decision reliability, and ultimately achieving better performance in real applications.

References

  • [1] J. Liang, “Research on fire recognition method of inspection robot based on video image,” Ph.D. dissertation, Hebei University of Technology, 2023.
  • [2] C. Jin, T. Wang, N. Alhusaini, S. Zhao, H. Liu, K. Xu, and J. Zhang, “Video fire detection methods based on deep learning: Datasets, methods, and future directions,” Fire, vol. 6, no. 8, p. 315, 2023.
  • [3] H. Tao, “A triple interference removal network based on temporal and spatial attention interaction for forest smoke recognition in videos,” Computers and Electronics in Agriculture, vol. 218, p. 108756, 2024.
  • [4] L. He, X. Gong, S. Zhang, L. Wang, and F. Li, “Efficient attention based deep fusion cnn for smoke detection in fog environment,” Neurocomputing, vol. 434, pp. 224–238, 2021.
  • [5] D. Li, J. Zhou, and Q. Liu, “Fire and smoke detection algorithm based on improved yolov8,” pp. 1–9, 2024.
  • [6] M. Arlović, M. Patel, J. Balen, and F. Hržić, “F2m: Ensemble-based uncertainty estimation model for fire detection in indoor environments,” Engineering applications of artificial intelligence, vol. 133, p. 108428, 2024.
  • [7] S. H. Yelleni, D. Kumari, P. Srijith et al., “Monte carlo dropblock for modeling uncertainty in object detection,” Pattern Recognition, vol. 146, p. 110003, 2024.
  • [8] H. Tao, Q. Duan, M. Lu, and Z. Hu, “Learning discriminative feature representation with pixel-level supervision for forest smoke recognition,” Pattern Recognition, vol. 143, p. 109761, 2023.
  • [9] M. Park, J. Bak, S. Park et al., “Advanced wildfire detection using generative adversarial network-based augmented datasets and weakly supervised object localization,” International Journal of Applied Earth Observation and Geoinformation, vol. 114, p. 103052, 2022.
  • [10] G. Li, P. Chen, C. Xu, C. Sun, and Y. Ma, “Anchor-free smoke and flame recognition algorithm with multi-loss,” Fire, vol. 6, no. 6, p. 225, 2023.
  • [11] F. Yuan, Z. Dong, L. Zhang, X. Xia, and J. Shi, “Cubic-cross convolutional attention and count prior embedding for smoke segmentation,” Pattern Recognition, vol. 131, p. 108902, 2022.
  • [12] Y. Liang, T. Chen, and W. Zhang, “Multi-scale fire detection algorithm with adaptive attention,” Transactions of Beijing Institute of Technology, vol. 44, no. 01, pp. 91–101, 2024.
  • [13] A. A. Siddique, N. Alasbali, M. Driss, W. Boulila, M. S. Alshehri, and J. Ahmad, “Sustainable collaboration: Federated learning for environmentally conscious forest fire classification in green internet of things (iot),” Internet of Things, vol. 25, p. 101013, 2024.
  • [14] C. Li, G. Li, Y. Song, Q. He, Z. Tian, H. Xu, and X. Liu, “Fast forest fire detection and segmentation application for uav-assisted mobile edge computing system,” IEEE Internet of Things Journal, 2023.
  • [15] J. Tian, G. Qin, and W. Zhang, “Fire-and-smoke detection based on convolutional attention and feature fusion,” pp. 1–12, 2024.
  • [16] Q. Zhang, W. Zhang, and X. Yang, “Smoke and flame detection method with yolov5-resnet cascade network,” vol. 23, no. 02, pp. 397–405, 2023.
  • [17] C. Guo, G. Pleiss, Y. Sun, and K. Q. Weinberger, “On calibration of modern neural networks,” in International conference on machine learning.   PMLR, 2017, pp. 1321–1330.
  • [18] K. Ji, F. Chen, X. Guo, Y. Xu, J. Wang, and J. Chen, “Uncertainty-guided learning for improving image manipulation detection,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 22 456–22 465.
  • [19] X. Tian, J. Zhang, M. Xiang, and Y. Dai, “Modeling the distributional uncertainty for salient object detection models,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 19 660–19 670.
  • [20] M. P. Naeini, G. Cooper, and M. Hauskrecht, “Obtaining well calibrated probabilities using bayesian binning,” in Proceedings of the AAAI conference on artificial intelligence, vol. 29, no. 1, 2015.
  • [21] Y. Bengio, J. Louradour, R. Collobert, and J. Weston, “Curriculum learning,” in Proceedings of the 26th annual international conference on machine learning, 2009, pp. 41–48.
  • [22] H. Yar, T. Hussain, M. Agarwal, Z. A. Khan, S. K. Gupta, and S. W. Baik, “Optimized dual fire attention network and medium-scale fire classification benchmark,” IEEE Transactions on Image Processing, vol. 31, pp. 6331–6343, 2022.
  • [23] J. S. Almeida, C. Huang, F. G. Nogueira, S. Bhatia, and V. H. C. de Albuquerque, “Edgefiresmoke: A novel lightweight cnn model for real-time video fire–smoke detection,” IEEE Transactions on Industrial Informatics, vol. 18, no. 11, pp. 7889–7898, 2022.