Location-aware Adaptive Normalization: A Deep Learning Approach For Wildfire Danger Forecasting
Abstract
Climate change is expected to intensify extreme events in the weather cycle and increase their frequency. Since this has a significant impact on various sectors of our lives, recent works are concerned with identifying and predicting such extreme events from Earth observations. With respect to wildfire danger forecasting, previous deep learning approaches duplicate static variables along the time dimension and neglect the intrinsic differences between static and dynamic variables. Furthermore, most existing multi-branch architectures lose the interconnections between the branches during the feature learning stage. To address these issues, this paper proposes a 2D/3D two-branch convolutional neural network (CNN) with a Location-aware Adaptive Normalization layer (LOAN). Using LOAN as a building block, we can modulate the dynamic features conditional on their geographical locations. Our approach thus treats the two types of variables according to their properties within a unified yet compound 2D/3D model. Besides, we propose using a sinusoidal-based encoding of the day of the year to provide the model with explicit temporal information about the target day within the year. Our experimental results show a better performance of our approach than other baselines on the challenging FireCube dataset. The results show that location-aware adaptive feature normalization is a promising technique to learn the relation between dynamic variables and their geographic locations, which is highly relevant for areas where remote sensing data forms the basis for analysis. The source code is available at https://github.com/HakamShams/LOAN.
Index Terms:
Machine learning, remote sensing, climate science, wildfire, convolutional neural network, adaptive normalization, time encoding.
I Introduction
There is a general expectation that weather and climate extremes will change their patterns and frequencies in the future [1, 2, 3, 4]. This is particularly the case for the Mediterranean region, which has been identified as a hot spot for climatic changes [5, 6, 7]. Because extreme weather events can impose short- and long-term risks on our Earth system, predicting risks such as droughts, windstorms, and wildfires has recently become more relevant. In particular, wildfire forecasting constitutes one of the open challenges for risk assessment and emergency response [8, 9, 10]. Wildfire forecasting refers to the task of fire-susceptibility mapping using key remote sensing, meteorological, and anthropogenic variables [11]. Building an integrated modeling system of the Earth should also consider wildfire events to better comprehend the origin of past patterns and predict future ones [12]. Unlike typical prediction tasks, understanding when weather conditions have a high tendency to cause fire events involves additional complexities; among these are the stochastic nature of fire events [13] and fire drivers, which are time-dependent and inter-correlated across variables [14]. Moreover, the prediction model should handle difficulties like a high false positive error rate, uncertainty, and class imbalance.
In recent years, many works leveraged classical machine learning approaches to solve the task [11]. More recently, deep learning methods [15] have become popular since they can handle large multivariate datasets more efficiently and are able to learn highly complex relationships between observations and the predicted outcome. In the context of wildfire danger forecasting, Prapas et al. [16] and Kondylatos et al. [13] proposed to use recurrent neural networks in combination with 2D convolutions to exploit both temporal and spatial context. These approaches, however, do not distinguish between the different input variables. Static variables like elevation, which barely change over time, are simply copied and concatenated with dynamic variables like surface temperature. This results not only in a highly redundant input to the network, but it also neglects strong causal effects between static and dynamic variables. For instance, the surface temperature strongly depends on the geographical location, which is described by static variables.
In this work, we thus propose a convolutional neural network for wildfire danger forecasting that handles static and dynamic variables differently. Since the static variables do not change over time, they are processed by a branch consisting of 2D convolutions, while the dynamic variables are processed by a second branch with 3D convolutions, as illustrated in Fig. 1. To address the causal effect of static variables on dynamic variables, we introduce a feature modulation for the dynamic variables where the modulation parameters are generated dynamically, conditioned on the geographical location. We thus name this method Location-aware Adaptive Normalization (LOAN). In addition, we encode the forecast date within the year by an absolute time encoding based on the sinusoidal encoding [17]. Both LOAN and the time encoding can be implemented as plugin layers in different deep learning architectures. We view our model as a generic architecture that can be used for other time-dependent forecasting tasks with static and dynamic variables. We conduct extensive experiments on the FireCube dataset [18], where our approach outperforms previous works in precision, F1-score, AUROC, and OA on the test set.
The rest of this paper is organized as follows. Section II reviews the related literature. Section III provides information about the used dataset. The proposed method is described in detail in Section IV. The experimental results and ablation study are provided in Section V and Section VI, respectively. Finally, conclusions and outlook are given in Section VII.
II Related Works
II-A Wildfire Danger Forecasting
Wildfire forecasting or wildfire-susceptibility mapping from remote sensing and Earth observation data is a very important topic for wildfire management [11]. We briefly review some prior related works in this direction. Iban et al. [19], Pham et al. [20], and Gholami et al. [21] relied on traditional machine learning approaches to generate susceptibility maps. Shang et al. [22] and Mitsopoulos and Mallinis [23] utilized Random Forest (RF) classifiers. In their works, they studied the importance of biotic and abiotic predictors for wildfire forecasting. Jiang et al. [24] proposed a deep learning approach based on a Multi-Layer Perceptron (MLP) and included a comparison with traditional machine learning algorithms. In Le et al. [25], a similar MLP-based approach was presented to generate a forest fire danger map. Zhang et al. [26] used a convolutional neural network (CNN) and later extended their work to predict fire susceptibility at the global level [27]. Other works with CNNs were conducted by Bjånes et al. [28] and Bergado et al. [29]. Furthermore, Huot et al. [30] approached the problem as a scene classification task using U-Net models to predict wildfire spreading. Their approach operates directly on the whole scene. A similar approach based on a U-Net++ model for global wildfire forecasting was proposed in Prapas et al. [31]. Yoon and Voulgaris [32] presented an approach that relies on a recurrent network with Gated Recurrent Units (GRU) to model past observations and on a CNN to predict wildfire probability maps for multiple time steps. More recently, Prapas et al. [16] and Kondylatos et al. [13] proposed to use Long Short-Term Memory (LSTM)-based approaches. They exploited both temporal and spatio-temporal context by applying recurrent LSTM and ConvLSTM models. They did not consider the whole scene at once but rather classified one pixel at a time (pixel level).
Unlike these works, we do not treat all observation variables in the same way, but we propose a deep learning model that handles different types of variables in separated 2D and 3D CNN branches. In contrast to a ConvLSTM, which models spatial and temporal relations separately, a 3D CNN models spatio-temporal relations jointly. We assume that the dataset contains static and dynamic variables, which we argue is the case for most datasets. Similar to Prapas et al. [16] and Kondylatos et al. [13], we also formulate the problem as pixel-level classification taking into account the spatio-temporal local context around the target pixel.
II-B Multi-Branch Neural Networks
When deep learning is applied to potentially multi-source remote sensing-based Earth observation data, multi-branch neural networks are a commonly used framework. This is mainly because such networks enrich representation learning and provide discriminative learning perspectives of the input variables [33]. In addition, an important aspect of the multi-branch design is the capability to adapt some parts of the model to a specific type of input. The general framework generates features from each branch and fuses these features in the network to obtain a unified feature vector. This fused representation is used as input to the subsequent layers. In Gaetano et al. [34], a two-branch 2D CNN was proposed to handle panchromatic information along with multi-spectral information for image classification. Tan et al. [35] reduced the depth of a semantic segmentation classifier by applying consecutive blocks, each containing three CNN branches. A similar objective can be found in Zhao et al. [33], where the network complexity was reduced via weight sharing and self-distillation (SD) embedding. In this way, only the main branch is used during inference, which inherits the knowledge of the trained subbranches and performs close to an ensemble model. For hyperspectral image classification, Xu et al. [36] introduced a model called Spectral–Spatial Unified Network (SSUN). In their model, spectral features are learned by a grouping-based LSTM, and spatial features are learned by a 2D CNN. Shen et al. [37] used separate spectral and spatial convolutional branches for hyperspectral input (CDELM). They based their framework on the extreme learning machine (ELM). Unlike common backpropagation algorithms, they used a single hidden layer feed-forward model. A multi-branch architecture was also explored for image fusion. Liu et al. [38] proposed a two-stream CNN called StfNet. They investigated the task of spatio-temporal image fusion. Their network takes a coarse image as input along with its neighboring images to predict the reconstructed fine image. Some works adapted a multi-branch architecture to construct a multi-scale feature vector. In Gan et al. [39], a dual-branch CNN with different filter kernel sizes was used as an autoencoder. Thus, the input image could be processed at different scales. Tang et al. [40] proposed a multi-scale Gaussian pyramid to handle hyperspectral input. They used the Gaussian pyramid to obtain multi-scale images, which are then processed by ResNet modules [41]. In this way, spatial features can be learned at different scales. They further used a second branch, which performs a discrete wavelet transform on the spectral input followed by an LSTM module [42]. The spatial and spectral features are then fused and processed by an MLP to obtain the final classification result. For short-term multi-temporal image classification, Zheng et al. [43] addressed the task through a Multi-temporal Deep Fusion Network (MDFN). In their framework, an LSTM-based branch is used to learn temporal-spectral features. At the same time, temporal-spatial and spectral-spatial information is learned via a joint 3D-2D CNN with two branches. Furthermore, some works employed attention mechanisms with multi-branch architectures [44, 45, 46]. When attention is applied, it drives the model to focus more on regions of interest. The methods described above [34, 35, 36, 37, 38, 39, 40, 43, 44], except for Zhao et al. [33], Zhu et al. [45], and Deng et al. [46], did not consider linking information between branches during the feature learning stage. This limits the gradient flow and disentangles the correlations between the learned features.
This paper proposes an architecture composed of two CNN branches for a forecasting task. A 2D branch is used to learn spatial features from static variables. At the same time, a 3D branch is used to learn spatio-temporal features from dynamic variables, which vary along the input temporal dimension. The branches are further linked via adaptive modulation layers to model the causal effects of static variables on dynamic variables.

II-C Conditional Normalization
Since the introduction of normalization techniques in deep learning, they have become a basic building block in many state-of-the-art models. Common normalization methods include batch normalization [47], group normalization [48], instance normalization [49], and layer normalization [50]. It has been shown empirically that normalization layers help with model optimization and regularization. Through normalization layers, the activation maps inside the model are normalized to have zero mean and unit variance. After that, the normalized activation maps are modulated or denormalized by learnable affine transformation parameters. These parameters vary across channels and are learned together with the model parameters based on the statistics observed during training. Since they do not depend on external input, such normalization methods are called unconditional. In contrast to these unconditional normalizations, conditional normalization techniques learn the affine parameters conditioned on external input. In the field of computer vision, conditional normalization is often used for image synthesis and style transformation [51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64]. More recently, Marín and Escalera [65] adapted the conditional normalization from Wang et al. [52] and Park et al. [51] to generate high-resolution satellite images. We use conditional normalization in a very different way than [65, 52, 51]. While these works focus on synthesizing an image using a segmentation map as conditional input, we aim to modulate dynamic features conditioned on static features.
II-D Temporal Positional Encoding
A plethora of studies exist on temporal modeling in remote sensing [66, 67, 68, 69]. Recently, the self-attention model, also known as the Transformer and first presented by Vaswani et al. [17] for natural language processing, has become a natural choice for handling sequential data; it encodes the order of the sequence with a positional encoding. In the field of remote sensing, many works showed the benefits of adapting positional encoding for time-dependent image classification (Garnot et al. [70], Garnot and Landrieu [71]), panoptic segmentation (Garnot and Landrieu [72]), and image generation (Dress et al. [73]). In these works, each image was given an encoded time vector according to its position with respect to a reference point, i.e., the first acquisition time step. Nyborg et al. [74] used the calendar time (day of the year) to provide positional information within the sequence. They also proposed to learn or estimate time shifts between geographically distant regions to further enhance generalization. In another work, Nyborg et al. [75] used the thermal time, which is obtained by accumulating daily average temperatures over the growing season, for crop classification. In general, transformer-based approaches require a positional encoding since the temporal information is otherwise lost within a transformer model. While 3D CNNs consider the temporal order of the input such that a positional encoding is not necessary, we show in this work that adding an absolute temporal encoding is also useful for time-dependent forecasting.
III Dataset
There are only very few publicly available datasets for wildfire forecasting, and they differ significantly in the observational variables, the spatial and temporal resolutions, and the task that needs to be addressed. A related dataset is the Next Day Wildfire Spread dataset [30], where the task is to predict wildfire spread. It is formulated as a scene classification task and not as a pixel-wise wildfire forecasting task as it is proposed in the FireCube dataset [18] and addressed in this work. The FireCube dataset was first published in Prapas et al. [16] and extended later in Kondylatos et al. [13]. It includes multivariate spatio-temporal data streams with 90 variables at a daily temporal resolution for the years 2009 to 2021, covering parts of the Eastern Mediterranean. The observational variables include meteorological data [76], satellite-derived products [77, 78], topographic features [79], human-related activities [80], and historical fire records [81, 82]. In addition, Copernicus Corine Land Cover (CLC) [83] and the Fire Weather Index (FWI) [84] are provided. The target is to predict for each pixel whether a wildfire will ignite and grow large on the next day. The task is equivalent to binary classification, where the positive class represents a wildfire event.
Since wildfire forecasting is essentially an imbalanced classification task, the authors of [13] extracted samples as follows: For a target day $t$, the static variables form a patch centered around the target pixel. In contrast, the dynamic variables consist of time series of observations from the last 10 days, i.e., from day $t-10$ until day $t-1$. For each positive sample, a few negative samples from different locations are sampled. Although the negatives are from different locations, they are sampled from regions that have a similar land cover distribution as the positive samples to make the task more difficult.
Overall, the dataset includes training samples for the years 2009 to 2018, validation samples for the year 2019, and test samples for the years 2020 and 2021, with positive and negative samples in each split. The year 2021 in the test set contains an extreme wildfire season in Greece [85, 13]. The extracted samples are available in [86].
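To make the sampling scheme concrete, the following is a minimal sketch of how a single sample could be assembled from the data cube. The function name, the patch size, and the array layout are illustrative assumptions and not part of the dataset API; only the 10-day observation window follows the text.

```python
import numpy as np

def extract_sample(static_cube, dynamic_cube, row, col, day, patch=25, T=10):
    """static_cube: (V_s, H, W); dynamic_cube: (V_d, T_total, H, W)."""
    half = patch // 2
    # Static variables: a spatial patch centered on the target pixel.
    static_patch = static_cube[:, row - half:row + half + 1,
                                  col - half:col + half + 1]
    # Dynamic variables: observations of the T days preceding the target day.
    dynamic_series = dynamic_cube[:, day - T:day,
                                     row - half:row + half + 1,
                                     col - half:col + half + 1]
    return static_patch, dynamic_series
```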
In this paper, we use from the described dataset the same variables as in Kondylatos et al. [13]. This includes the following:
- static variables: topographic variables (digital elevation model (DEM) and slope), anthropogenic variables (distance to roads, distance to waterways, and population density), and land cover;
- dynamic variables: the time-varying meteorological and satellite-derived observations used in [13], e.g., surface temperature, relative humidity, soil moisture, and NDVI.
IV Methodology
Problem formulation. Given a multivariate spatio-temporal data cube $\mathcal{X} \in \mathbb{R}^{V \times T \times H \times W}$, where $H$ and $W$ are the spatial extensions of the cube, $T$ is the temporal extension of the time series into the past, and $V$ is the number of variables (static and dynamic), our aim is to learn a mapping function $f$, approximated by a neural network, that predicts the probability $p_t$ of a wildfire event starting at the center of $\mathcal{X}$ on the target day $t$:

$$p_t = f(\mathcal{X}). \quad (1)$$
To achieve this, we propose a spatio-temporal 2D/3D CNN with two branches as illustrated in Fig. 1. First, the network design is introduced in Section IV-A. Then, the Location-aware Adaptive Normalization layer (LOAN), which is the core of our work, is explained in detail in Section IV-B. Finally, Section IV-C describes the integration of the absolute temporal encoding (TE) into the model.
IV-A 2D/3D Two-Branch CNN
As shown in Fig. 1, our network consists of two branches that process dynamic and static variables, respectively. We denote the data cube with dynamic variables by $\mathcal{X}_{dyn}$ and the data cube with static variables by $\mathcal{X}_{sta}$. As in previous works, we normalize the input channel-wise to the range $[0, 1]$. Since the static variables do not have a time component, we use 2D convolutions for the static branch and 3D convolutions for the dynamic branch. In more detail:
Dynamic branch. The dynamic branch takes the variables which vary over time as input. It consists of three blocks; each block has a 3D convolution followed by a ReLU activation function and a 3D max pooling layer. To reduce overfitting, we use global average pooling (GAP) [88] at the end of the last block. We denote the feature vector learned from this branch as $z_{dyn}$.
Static branch. In parallel to the dynamic branch, the static branch has a similar architecture. However, 2D convolutions are used instead of 3D ones. We denote the feature vector learned from this branch as $z_{sta}$. Note that the dimensionality of the static feature vector is lower than the dimensionality of the dynamic feature vector since the input data cube is smaller.
In a nutshell, the dynamic and static branches are two functions $f_{dyn}$ and $f_{sta}$, respectively:

$$z_{dyn} = f_{dyn}(\mathcal{X}_{dyn}), \quad (2)$$
$$z_{sta} = f_{sta}(\mathcal{X}_{sta}). \quad (3)$$
To the dynamic feature vector we add an absolute temporal encoding $e_t$, which will be described in Section IV-C. The two feature vectors are then concatenated and fed into 4 classification layers with 1D convolutions of kernel size 1. The layers reduce the dimensionality from 384 to 256, 128, 32, and 2. To reduce overfitting, we use dropout for the 1D convolutional layers except the last two layers. Finally, a softmax activation is used after the last classification layer to predict the probability of a wildfire. In addition, we use batch normalization layers [47] in the blocks of each branch. More implementation details are given in Section V.
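The following is a minimal PyTorch sketch of this two-branch architecture. The per-block channel widths, kernel sizes, and the dropout probability are illustrative assumptions; only the number of blocks, the 384 → 256 → 128 → 32 → 2 head, and the placement of dropout follow the text. LOAN and the temporal encoding are omitted here and sketched separately below.

```python
import torch
import torch.nn as nn

class TwoBranchCNN(nn.Module):
    """2D/3D two-branch CNN; LOAN and the temporal encoding are omitted."""

    def __init__(self, v_dyn, v_sta):
        super().__init__()
        # Dynamic branch: three 3D conv blocks over (time, height, width);
        # the channel widths (64/128/256) are assumptions.
        self.dyn = nn.Sequential(
            nn.Conv3d(v_dyn, 64, 3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
            nn.Conv3d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
            nn.Conv3d(128, 256, 3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
            nn.AdaptiveAvgPool3d(1),   # global average pooling
        )
        # Static branch: same structure with 2D convolutions and a
        # lower-dimensional output feature vector.
        self.sta = nn.Sequential(
            nn.Conv2d(v_sta, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),
        )
        # Classification head: 1D convolutions with kernel size 1, reducing
        # 384 -> 256 -> 128 -> 32 -> 2; dropout only on the first two layers.
        self.head = nn.Sequential(
            nn.Conv1d(384, 256, 1), nn.ReLU(), nn.Dropout(0.5),
            nn.Conv1d(256, 128, 1), nn.ReLU(), nn.Dropout(0.5),
            nn.Conv1d(128, 32, 1), nn.ReLU(),
            nn.Conv1d(32, 2, 1),     # softmax is applied in the loss
        )

    def forward(self, x_dyn, x_sta):
        z_dyn = self.dyn(x_dyn).flatten(1)              # (B, 256)
        z_sta = self.sta(x_sta).flatten(1)              # (B, 128)
        z = torch.cat([z_dyn, z_sta], 1).unsqueeze(-1)  # (B, 384, 1)
        return self.head(z).squeeze(-1)                 # (B, 2) logits
```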
For training, we use the binary cross entropy as loss function:

$$\mathcal{L} = -\frac{1}{B} \sum_{i=1}^{B} \left[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \right], \quad (4)$$

where $B$ denotes the batch size, $y_i$ is the true label for sample $i$, and $p_i$ is the predicted wildfire probability.
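Since the prediction consists of two softmax classes, the loss (4) can be realized with a standard cross-entropy objective. Below is a sketch of a single optimization step for the model sketched above; the variable counts, dummy batch contents, and optimizer hyper-parameters (left at PyTorch defaults) are placeholders, not the values used in Section V.

```python
import torch
import torch.nn as nn

model = TwoBranchCNN(v_dyn=24, v_sta=10)   # variable counts are placeholders
optim = torch.optim.Adam(model.parameters(), weight_decay=1e-4)
criterion = nn.CrossEntropyLoss()          # softmax + Eq. (4) for two classes

x_dyn = torch.randn(8, 24, 10, 25, 25)     # (B, V_dyn, T, H, W) dummy batch
x_sta = torch.randn(8, 10, 25, 25)         # (B, V_sta, H, W)
y = torch.randint(0, 2, (8,))              # binary labels

loss = criterion(model(x_dyn, x_sta), y)
optim.zero_grad()
loss.backward()
optim.step()
```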
In the following, we discuss the Location-Aware Adaptive Normalization (LOAN) that modulates the dynamic features based on the static features and the already mentioned absolute temporal encoding.
IV-B Location-Aware Adaptive Normalization (LOAN)
In general, dynamic variables are correlated with the geographic location: temperature and pressure change with elevation, soil moisture and NDVI vary with land cover, and humidity is correlated with static variables like the distance to waterways. Since the dynamic variables depend on the static variables and not vice versa, we aim to exploit this knowledge in our approach. This is done by learning to normalize the dynamic features based on the location-dependent static features. To this end, we propose a conditional normalization technique for remote sensing data called Location-aware Adaptive Normalization (LOAN).
We first describe batch normalization [47], where an activation map is normalized before it is modulated by a scale $\gamma$ and a bias $\beta$. Let $A \in \mathbb{R}^{N \times C \times D \times H \times W}$ be an activation map in the $i$-th block of the dynamic branch and $S \in \mathbb{R}^{N \times C \times H \times W}$ be an activation map of the corresponding block in the static branch, where $N$ denotes the batch size, $C$ the number of channels, and $D$, $H$, and $W$ the depth, height, and width of the activation map $A$, respectively. Using the indices $n$, $c$, $d$, $h$, and $w$, the normalization of the dynamic branch is performed by the following equation:

$$\hat{a}_{n,c,d,h,w} = \gamma_c \, \frac{a_{n,c,d,h,w} - \mu_c}{\sigma_c} + \beta_c, \quad (5)$$

where $a_{n,c,d,h,w}$ is the activation before the normalization, $\mu_c$ and $\sigma_c$ are the computed mean and standard deviation of channel $c$, i.e., computed over the dimensions $D$, $H$, $W$ and all samples in the batch, and $\gamma_c$ and $\beta_c$ are the learnable modulation parameters.
In our case, we aim to learn a modulation of the dynamic features at the $i$-th block where the modulation parameters $\gamma$ and $\beta$ depend on the corresponding static features $S$:

$$\hat{a}_{n,c,d,h,w} = \gamma_{n,c,h,w}(S) \, \frac{a_{n,c,d,h,w} - \mu_c}{\sigma_c} + \beta_{n,c,h,w}(S). \quad (6)$$

In contrast to (5), the modulation parameters $\gamma_{n,c,h,w}$ and $\beta_{n,c,h,w}$ vary with respect to the sample $n$ in the batch, the location $(h, w)$, and the channel $c$, but they are constant over time $d$. Furthermore, they are conditioned on the static features $S$. We thus call $S$ the conditional map for the modulation.

The way $\gamma$ and $\beta$ are computed is illustrated in Fig. 2. First, the conditional map $S$ is normalized channel-wise as follows:

$$\bar{s}_{n,c,h,w} = \frac{s_{n,c,h,w} - \mu_c^S}{\sigma_c^S}, \quad (7)$$

where

$$\mu_c^S = \frac{1}{N H W} \sum_{n,h,w} s_{n,c,h,w}, \quad (8)$$

$$\sigma_c^S = \sqrt{\frac{1}{N H W} \sum_{n,h,w} \left(s_{n,c,h,w} - \mu_c^S\right)^2}. \quad (9)$$
Afterward, the normalized conditional map $\bar{S}$ is projected by two convolutional layers to compute $\gamma$ and $\beta$. In our implementation, these modulation parameters are then duplicated along the temporal dimension to match the depth of $A$ such that (6) can be computed.
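The following is a minimal PyTorch sketch of a LOAN layer following (5)-(9), assuming the conditional map already has the same spatial size and channel count as the dynamic activation map. The kernel size of the two projection layers and the epsilon are assumptions.

```python
import torch
import torch.nn as nn

class LOAN(nn.Module):
    def __init__(self, channels, eps=1e-5):
        super().__init__()
        # Parameter-free normalization of the dynamic activations per channel
        # over (batch, time, height, width), cf. Eq. (5) without gamma/beta.
        self.bn = nn.BatchNorm3d(channels, affine=False)
        self.eps = eps
        # Two convolutional layers projecting the normalized conditional map
        # to the location-dependent modulation parameters of Eq. (6).
        self.to_gamma = nn.Conv2d(channels, channels, kernel_size=1)
        self.to_beta = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, a_dyn, cond):
        # a_dyn: (B, C, T, H, W) dynamic activations; cond: (B, C, H, W).
        a_norm = self.bn(a_dyn)
        # Channel-wise normalization of the conditional map, Eqs. (7)-(9).
        mu = cond.mean(dim=(0, 2, 3), keepdim=True)
        sigma = cond.std(dim=(0, 2, 3), keepdim=True)
        cond = (cond - mu) / (sigma + self.eps)
        # gamma and beta vary over batch, channel, and location, but are
        # broadcast (duplicated) along the temporal dimension.
        gamma = self.to_gamma(cond).unsqueeze(2)   # (B, C, 1, H, W)
        beta = self.to_beta(cond).unsqueeze(2)
        return gamma * a_norm + beta
```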
We add the Location-aware Adaptive Normalization layer (LOAN) in the first two blocks as shown in Fig. 1. The activation maps of the dynamic branch are normalized only in the first block and modulated in both the first and the second block. The impact of the blocks where the LOAN layer is added is evaluated in Table IV.

In the experimental section, we also evaluate a variant of LOAN that is not conditioned on the intermediate features of the static branch as shown in Figs. 1 and 2, but on the static variables directly as shown in Fig. 3. In this case, the conditional map has different spatial dimensions and a different number of channels compared to the features in the dynamic branch, i.e., the conditional maps consist of the raw static variables at the input resolution. In this respect, the conditional map is first resized to match the spatial dimensions of the activation map from the dynamic branch. We use the nearest-neighbor method for the down-sampling. The resized conditional map is then fed into a convolutional layer to double the number of channels. Finally, as in the previous version of LOAN, the conditional map is normalized, projected by two convolutional layers, and duplicated along the temporal dimension to compute $\gamma$ and $\beta$. The impact of different conditional maps using the variants of LOAN is evaluated in Table II.
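Building on the LOAN sketch above (and reusing its imports), this variant could be realized as follows; the additional layer that matches the channel count to the dynamic branch and all kernel sizes are assumptions.

```python
import torch.nn.functional as F

class LOANFromVariables(LOAN):
    def __init__(self, channels, v_sta):
        super().__init__(channels)
        # Double the channels of the raw static variables, then match them
        # to the channel count of the dynamic branch (assumed extra step).
        self.expand = nn.Conv2d(v_sta, 2 * v_sta, kernel_size=1)
        self.match = nn.Conv2d(2 * v_sta, channels, kernel_size=1)

    def forward(self, a_dyn, static_vars):
        # Resize the raw static variables (B, V_s, H_in, W_in) with
        # nearest-neighbor down-sampling to the activation's spatial size.
        h, w = a_dyn.shape[-2:]
        cond = F.interpolate(static_vars, size=(h, w), mode="nearest")
        cond = self.match(self.expand(cond))
        return super().forward(a_dyn, cond)
```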
IV-C Absolute Temporal Encoding (TE)
Some extreme events in the climate system are time-dependent [89, 14]. This is also the case for the FireCube dataset [18], where wildfire events vary from month to month and occur more frequently in summer, as shown in Fig. 5. So far, the network does not consider an absolute time like the day of the year. Instead, for any forecast day $t$, the last 10 days are used as observations, but the network has no information about which day of the year $t$ is.
As shown in Fig. 1, we add this information to the dynamic branch before we concatenate the static and dynamic features. To encode the day of the year, we use the standard fixed sinusoidal-based encoding by Vaswani et al. [17]. We pre-compute the absolute temporal encoding vector $e_t$ for each day of the year, where the day is extracted from the target day $t$ (we consider February 29 for the encoding):
$$e_t[2i] = \sin\left(t \, / \, 10000^{2i/D}\right), \quad (10)$$

$$e_t[2i+1] = \cos\left(t \, / \, 10000^{2i/D}\right), \quad (11)$$

where $D$ is the embedding dimension. Each even dimension results from a sine function, while each odd dimension results from a cosine function. This allows a smooth and yet unique encoding for every time step, i.e., each day of a year. Note that the vector $e_t$ has the same size as the dynamic feature vector $z_{dyn}$.
In order to add the absolute temporal encoding vector $e_t$ to the dynamic feature vector $z_{dyn}$, we weight each element of the vector by a learnable weight vector $w$:

$$\tilde{z}_{dyn} = z_{dyn} + w \odot e_t, \quad (12)$$

where $\odot$ denotes the Hadamard product, i.e., element-wise multiplication. Fig. 4 illustrates how the temporal embedding is added to the model.
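A minimal sketch of the temporal encoding as a PyTorch module, implementing (10)-(12); the module name and the assumption of an even embedding dimension are ours.

```python
import math
import torch
import torch.nn as nn

class TemporalEncoding(nn.Module):
    def __init__(self, dim, max_days=366):
        super().__init__()
        # Pre-compute the fixed sinusoidal table for every day of the year,
        # Eqs. (10)-(11); dim is assumed to be even.
        t = torch.arange(max_days, dtype=torch.float32).unsqueeze(1)
        div = torch.exp(torch.arange(0, dim, 2, dtype=torch.float32)
                        * (-math.log(10000.0) / dim))
        table = torch.zeros(max_days, dim)
        table[:, 0::2] = torch.sin(t * div)   # even dimensions
        table[:, 1::2] = torch.cos(t * div)   # odd dimensions
        self.register_buffer("table", table)
        # Learnable element-wise weights w of Eq. (12).
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, z_dyn, day_of_year):
        # z_dyn: (B, dim) dynamic features; day_of_year: (B,) 0-based indices.
        return z_dyn + self.weight * self.table[day_of_year]
```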


V Experimental Results and Analysis
Implementation Details. The network is trained with the binary cross-entropy loss (4) using the PyTorch framework [90] and the Adam optimizer [91] with weight decay. All models were trained on a single NVIDIA GeForce RTX 3090 GPU.
Performance Metrics. As described in Section III, we use the FireCube dataset [18]. We follow the same protocol for quantitative comparison as in Prapas et al. [16] and Kondylatos et al. [13]. The evaluation metrics are precision, recall, and F1-score, calculated for the positive class that represents a wildfire event. In addition, we report true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). Moreover, we provide the overall accuracy (OA) and the area under the receiver operating characteristic curve (AUROC) as evaluation metrics. OA is the accuracy obtained on all negative and positive samples in the test set. The AUROC summarizes the true positive rate (TPR) against the false positive rate (FPR) over multiple thresholds in a single value.
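As a reference, these metrics can be computed with scikit-learn as in the following sketch; the toy labels and probabilities are placeholders.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)

y_true = np.array([0, 0, 1, 1, 0, 1])               # toy ground-truth labels
y_prob = np.array([0.1, 0.4, 0.8, 0.6, 0.3, 0.2])   # toy wildfire probabilities
y_pred = (y_prob >= 0.5).astype(int)                # decision threshold 0.5

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("TP/FP/TN/FN:", tp, fp, tn, fn)
print("Precision:", precision_score(y_true, y_pred))  # positive (fire) class
print("Recall:   ", recall_score(y_true, y_pred))
print("F1-score: ", f1_score(y_true, y_pred))
print("OA:       ", accuracy_score(y_true, y_pred))
print("AUROC:    ", roc_auc_score(y_true, y_prob))
```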
V-A Comparison with Baselines
We compare our approach to the approaches that have been evaluated on the described dataset in Kondylatos et al. [13]. This includes two deep learning models, namely LSTM [42] and ConvLSTM [66], and two classical machine learning classifiers, namely Random Forests (RF) [94] and XGBoost [95]. For further details regarding the architectures and hyper-parameters of the models, we refer to the work of Kondylatos et al. [13]. In order to demonstrate the benefit of treating static and dynamic variables differently, we also compare with a one-branch 3D CNN where we duplicate the static variables along the temporal dimension and concatenate them together with the dynamic variables to form a single data cube. In addition, we compare to the recent transformer models TimeSformer [92] and Video SwinTransformer [93], which use space-time attention. As mentioned in Subsection II-D, vision transformers are based on a self-attention mechanism to model spatio-temporal dependencies. Compared to CNNs, transformers have less inductive bias and need much more computational resources for training. To the best of our knowledge, no prior work has done a systematic study about the performance of video vision transformers for wildfire forecasting.
To ensure a fair comparison, all baselines were re-implemented and trained on the same samples with a fixed random seed. We do not use any augmentation technique. The quantitative results of our experiments are provided in Table LABEL:tab:table1. The results of the proposed 2D/3D CNN are shown with and without absolute temporal encoding (TE).
We can observe that the proposed 2D/3D CNN outperforms the other methods for most metrics, particularly FP, TN, Precision, F1-score, and OA, on the validation and testing sets. SwinTransformer and ConvLSTM achieve a slightly higher AUROC for 2019 and 2020, respectively. LSTM and SwinTransformer achieve a higher recall, but at the cost of a very low precision. In comparison with other deep learning methods, LSTM and SwinTransformer even have the highest number of false positives for all years. The main weakness of LSTM lies in the fact that it does not consider the spatial context, while large models like SwinTransformer are prone to overfitting, which results in a relatively poor performance for 2020. RF and XGBoost have the same disadvantage as LSTM, but an even weaker temporal model, and thus perform worse than LSTM. Most interesting is the comparison to the 3D CNN since it uses the same 3D CNN structure but only one branch, i.e., it treats static variables like dynamic variables. The results show that the proposed approach with two branches outperforms the single-branch architecture for all metrics and all years. This demonstrates the importance of treating static variables differently than dynamic variables. Adding the absolute temporal encoding (TE) to the model substantially reduces FP at the cost of decreasing TP. This is also reflected in the precision and recall.
Table I: Results for different combinations of static variables as input (2D/3D CNN with LOAN, without TE). Val: year 2019; test: years 2020–2021. All values in %.

| Static Variables | Precision (val) | Recall (val) | F1-score (val) | AUROC (val) | OA (val) | Precision (test) | Recall (test) | F1-score (test) | AUROC (test) | OA (test) |
|---|---|---|---|---|---|---|---|---|---|---|
| DEM + slope | 75.61 | 73.69 | 74.64 | 93.82 | 89.88 | 80.88 | 81.67 | 81.27 | 97.06 | 95.05 |
| Distance to roads + distance to waterway + population density | 77.27 | 76.08 | 76.67 | 94.16 | 90.63 | 81.91 | 74.48 | 78.02 | 96.11 | 94.48 |
| Land cover | 77.27 | 76.62 | 76.94 | 94.62 | 90.72 | 81.85 | 79.93 | 80.88 | 96.92 | 95.03 |
| DEM + slope + distance to roads + distance to waterway + population density | 75.09 | 77.00 | 76.03 | 94.09 | 90.19 | 79.64 | 76.89 | 78.24 | 96.69 | 94.37 |
| DEM + slope + land cover | 74.61 | 73.46 | 74.03 | 94.24 | 89.58 | 80.05 | 81.44 | 80.74 | 97.20 | 94.89 |
| Land cover + distance to roads + distance to waterway + population density | 77.52 | 73.23 | 75.32 | 94.52 | 90.30 | 82.63 | 81.12 | 81.87 | 96.98 | 95.27 |
| All static variables | 79.64 | 74.62 | 77.05 | 94.52 | 91.01 | 83.71 | 83.60 | 83.65 | 97.41 | 95.70 |
Table II: Comparison of different conditional maps for the modulation (2D/3D CNN without TE). Val: year 2019; test: years 2020–2021. All values in %.

| Conditional Map | Precision (val) | Recall (val) | F1-score (val) | AUROC (val) | OA (val) | Precision (test) | Recall (test) | F1-score (test) | AUROC (test) | OA (test) |
|---|---|---|---|---|---|---|---|---|---|---|
| None (no LOAN) | 75.68 | 62.23 | 68.30 | 93.38 | 88.32 | 83.35 | 79.98 | 81.63 | 97.11 | 95.26 |
| DEM + slope | 70.84 | 71.00 | 70.92 | 92.77 | 88.23 | 76.90 | 79.45 | 78.15 | 96.16 | 94.15 |
| Distance to roads + distance to waterway + population density | 73.83 | 76.15 | 74.97 | 93.74 | 89.72 | 77.98 | 76.63 | 77.30 | 96.22 | 94.08 |
| Land cover | 79.05 | 72.00 | 75.36 | 94.58 | 90.48 | 81.53 | 75.19 | 78.23 | 96.80 | 94.49 |
| All static variables | 77.12 | 74.92 | 76.00 | 94.95 | 90.44 | 81.05 | 84.08 | 82.54 | 97.55 | 95.32 |
| Activation maps (static branch) | 79.64 | 74.62 | 77.05 | 94.52 | 91.01 | 83.71 | 83.60 | 83.65 | 97.41 | 95.70 |
In Table LABEL:tab:table2, we present additional experimental results alongside the memory footprint in terms of the number of parameters (# Params), the estimated multiply-accumulate operations (MMACs), and the inference speed in samples per millisecond (# SPmS). The performance metrics are calculated on both testing years 2020–2021 as one set. Since the LOAN layers increase the number of parameters of the 2D/3D CNN, we report the results of the one-branch 3D CNN for two model sizes. The smaller 3D CNN has about the same number of parameters as the 2D/3D CNN without LOAN, whereas the larger 3D CNN has more parameters than the proposed model. Due to the lack of spatial modeling, LSTM has the fewest parameters and is the fastest, but its precision is very low. ConvLSTM, the small one-branch 3D CNN, and the 2D/3D CNN without LOAN and TE, which consider the spatial context, perform similarly, but the 2D/3D CNN is the fastest and ConvLSTM the slowest approach among them. Compared to the other approaches, transformer models have considerably more parameters and require more operations, which makes them computationally expensive. Nevertheless, our approach outperforms both transformer models while requiring far fewer parameters and computational operations.
Normalizing the dynamic features conditioned on the static features (LOAN) increases all metrics. It also outperforms the large 3D CNN in all metrics and inference time. Adding TE increases all metrics when LOAN is not used, while increasing the computational cost only marginally. When LOAN is used, adding TE decreases the recall but increases all other metrics. Since LOAN and TE change the dynamic features, we observe a different trade-off between recall and precision if both are used. This change is consistent over the years as shown in Table LABEL:tab:table1. Nevertheless, the F1-score, AUROC, and OA are highest if both are used.
V-B Variable Importance
To assess the importance of different static variables, we present the results obtained with different combinations of static variables in Table I. For this experiment, we use the 2D/3D CNN with LOAN but without TE. All dynamic variables are used in this experiment, and the results are reported for the validation year 2019 and for the test years 2020–2021 as one set. The static variables are grouped into three main categories: topographic variables consisting of the digital elevation model (DEM) and slope, anthropogenic-related variables consisting of the distance to roads or waterways and the population density, and land cover variables.
From the results in Table I, we can conclude that among the three categories, topographic variables give the best results for the test years 2020–2021 when they are used without other variables, while land cover and anthropogenic-related variables provide the best results for the validation year 2019. Overall, all static variables are relevant, and the best results are obtained when all static variables are used (last row). This is also better than using the static variables of 2 out of the 3 categories, which is reported in rows 4-6 of Table I.
V-C Comparing Different Conditional Maps
While Table I shows the importance of different static variables as input to the 2D/3D CNN, we also analyze the impact of different ways to modulate the dynamic features in Table II. For the experiments, we use the 2D/3D CNN without TE and all dynamic and static variables as input. While we use all variables, the different settings differ in the input that is fed to the LOAN layer, i.e., the static features that the modulation of the dynamic features is conditioned on.
The results of the proposed conditioning, where we use the features from the corresponding block of the static branch, are shown in the last row. In the first row, we show the results if we do not use LOAN at all, i.e., we do not use any modulation of the dynamic features. For the other rows of Table II, we modify LOAN such that it is not conditioned on the intermediate features of the static branch as shown in Figs. 1 and 2. Instead, we condition LOAN directly on static variables. Note that we need to slightly adapt LOAN as shown in Fig. 3 since the number of static input variables differs from the number of feature channels at the block where LOAN is added.
As seen from Table II, the best result is obtained when we use the activation maps, i.e., the intermediate features, from the static branch for conditioning the modulation. However, comparable results are obtained when all static variables are directly used by the variant of LOAN that is shown in Fig. 3. While this variant achieves a slightly higher AUROC, the proposed variant shown in Fig. 2 achieves a higher F1-score and OA. Using the static variables directly, we can analyze how the three categories of static variables impact the modulation of the dynamic features and thus the results. We can conclude that the modulation of the dynamic features is very sensitive to its condition. For the validation year 2019, all three categories (rows 2-4) improve the results compared to the setup without feature modulation (first row). For the test years 2020–2021, this is not the case and only the combination of all static variables (row 5) leads to an improvement. The reason is presumably the mismatch between the number of variables in the conditional map and the number of feature channels in the dynamic branch. Compared to the other categories, land cover has the highest number of variables per pixel and shows the best performance. This indicates that conditioning the modulation of the dynamic features on the intermediate features of the static branch is a more practical approach than conditioning the modulation on the static variables directly, which seems to be sensitive to the number of variables.
V-D Qualitative Results
Predicted wildfire danger-susceptibility maps are depicted in Figs. 6, 7, and 8. We predict for the 2nd and 3rd days of three consecutive summer months (June, July, and August), using the preceding days as input. This results in a large number of pixels (samples) per day. The output of the deep learning models (LSTM, ConvLSTM, Ours) is a wildfire probability. In addition, we visualize the predicted maps produced by FWI at its provided, coarser spatial resolution. The output of FWI is clipped following [13]. The ignition points of large wildfires on those days are represented as black circles on the map. The first observation is that, regardless of the coarse resolution of FWI, the predicted maps produced by the deep learning models are more reliable. While FWI relies on meteorological observations and models functional relationships, the results show that the modeled functional relationships are insufficient and do not reflect the complexity of wildfire forecasting. We also find that our proposed model with TE discards spots that result in a high false positive error rate while it keeps extreme ones (cf. the results for July in 2020 and 2021). Another important observation is that the LSTM model, which does not account for the spatial context, tends to produce heterogeneous predictions where neighboring pixels often have very different wildfire danger probabilities. Consequently, it generates many false positives. In contrast, our proposed model and ConvLSTM produce more homogeneous and clustered predictions. The respective performance metrics for Figs. 6 and 7 are provided in Table III.



Table III: Accuracy (%) on positive (P) and negative (N) samples for the days shown in Figs. 6 and 7. A dash indicates that no positive samples occurred on that day.

Year 2020 (# positive samples per day: 0, 0, 2, 0, 71, 25):

| Algorithm | 02/06 P | 02/06 N | 03/06 P | 03/06 N | 02/07 P | 02/07 N | 03/07 P | 03/07 N | 02/08 P | 02/08 N | 03/08 P | 03/08 N |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LSTM [13] | - | 99.98 | - | 99.96 | 0.00 | 81.41 | - | 82.37 | 87.32 | 48.93 | 88.00 | 56.23 |
| ConvLSTM [13] | - | 100.00 | - | 100.00 | 0.00 | 89.72 | - | 92.46 | 92.96 | 65.09 | 80.00 | 72.52 |
| 2D/3D CNN | - | 100.00 | - | 99.99 | 100.00 | 89.51 | - | 90.33 | 73.24 | 66.74 | 80.00 | 76.17 |
| 2D/3D CNN w/ TE | - | 100.00 | - | 100.00 | 0.00 | 96.81 | - | 97.92 | 78.87 | 65.68 | 80.00 | 74.32 |

Year 2021 (# positive samples per day: 0, 0, 36, 22, 679, 1417):

| Algorithm | 02/06 P | 02/06 N | 03/06 P | 03/06 N | 02/07 P | 02/07 N | 03/07 P | 03/07 N | 02/08 P | 02/08 N | 03/08 P | 03/08 N |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LSTM [13] | - | 98.23 | - | 96.44 | 100.00 | 70.58 | 68.18 | 83.14 | 82.03 | 32.54 | 93.30 | 23.88 |
| ConvLSTM [13] | - | 99.11 | - | 98.25 | 100.00 | 75.53 | 100.00 | 87.42 | 71.87 | 35.66 | 87.01 | 28.28 |
| 2D/3D CNN | - | 99.38 | - | 98.08 | 100.00 | 84.36 | 100.00 | 88.97 | 83.80 | 39.17 | 99.29 | 33.75 |
| 2D/3D CNN w/ TE | - | 99.91 | - | 98.95 | 100.00 | 94.77 | 100.00 | 98.61 | 85.13 | 36.67 | 99.36 | 31.67 |
VI Ablation Study
We finally evaluate two additional aspects. In Section VI-A, we analyze at which blocks of the proposed 2D/3D CNN LOAN is best added. In Section VI-B, we analyze the impact of the absolute temporal encoding with respect to the number of negative samples.
VI-A LOAN Position in the Model
Table IV: Impact of the blocks at which LOAN is added (2D/3D CNN without TE). All values in %.

Year 2019 (val):

| Blocks with LOAN | Precision | Recall | F1-score | AUROC |
|---|---|---|---|---|
| None | 75.68 | 62.23 | 68.30 | 93.38 |
| 1 | 76.90 | 75.54 | 76.21 | 94.40 |
| 1, 2 | 79.64 | 74.62 | 77.05 | 94.52 |
| 1, 2, 3 | 73.92 | 70.85 | 72.35 | 93.68 |

Years 2020–2021 (test):

| Blocks with LOAN | Precision | Recall | F1-score | AUROC |
|---|---|---|---|---|
| None | 83.35 | 79.98 | 81.63 | 97.11 |
| 1 | 80.90 | 78.72 | 79.80 | 97.22 |
| 1, 2 | 83.71 | 83.60 | 83.65 | 97.41 |
| 1, 2, 3 | 79.77 | 81.30 | 80.52 | 96.92 |
As shown in Fig. 1, the proposed network has three blocks, and we add LOAN to the first and second block. We evaluate in Table IV different configurations where we add LOAN only to the first block or to all three blocks. The results are reported without TE. If we add LOAN only to the first block, the performance increases for the year 2019 but not for the years 2020–2021 compared to our model without LOAN (first row). When adding LOAN to the first two blocks, we observe a consistent improvement for all years. For the year 2019, Precision, Recall, F1-score, and AUROC are improved by 3.96, 12.39, 8.75, and 1.14 percentage points, respectively, and by 0.36, 3.62, 2.02, and 0.30 percentage points for the years 2020–2021. Adding LOAN to all three blocks performs worse than adding LOAN only to the first two blocks. This is due to the decrease of the spatial resolution after each block by the pooling layers. In the third block, the spatial resolution is too coarse for a location-specific modulation of the dynamic features.
VI-B Absolute Temporal Encoding


As we have seen in Tables LABEL:tab:table1 and LABEL:tab:table2, the absolute temporal encoding (TE) increases precision at the cost of a lower recall. Depending on the use case of wildfire forecasting, recall or precision may be more important. The precision also depends on the number of negative samples. In order to show that TE consistently gives a higher precision, we varied the number of negative samples. In this experiment, we test on all positive samples in the test set (years 2020–2021) and gradually increase the number of negative ones. As shown in Fig. 9, we start with a setup where the number of negative samples is equal to the number of positive samples, i.e., 5635 positive and 5635 negative samples. We then increase the number of negative samples. Since the number of negative samples increases, the precision decreases for all methods. Note that the recall does not change since the number of positive samples remains the same. As already observed in Table LABEL:tab:table2, LSTM has a very low precision. The 2D/3D CNN with LOAN has in all settings a higher precision than ConvLSTM and a much higher recall as shown in Table LABEL:tab:table2. Transformer models, on the other hand, have a lower precision than ConvLSTM but provide an overall better recall as shown in Table LABEL:tab:table2. While adding TE decreases the recall, it improves the precision substantially, and the improvement grows as the number of negative samples increases. While other metrics like the F1-score or AUROC combine precision and recall in a single measure, depending on the application a higher recall or a higher precision might be more important. If precision is more important, TE is very useful. If recall is more important, TE should not be used. We also point out that TE encodes only the day of the year since the dataset has a relatively small spatial extent. For larger datasets at a continental scale, a consideration of the spatial location in the encoding would also become relevant, since different biogeographical regions [74], which are characterized by different climate variabilities and anthropogenic drivers over time, would be covered. Finally, Fig. 10 shows the curve of the loss (4) during training.
VII Conclusion
In this work, we proposed a new deep learning approach for wildfire danger forecasting. In contrast to previous works, we handle spatial (static) and spatio-temporal (dynamic) variables differently. Our model processes the spatial and spatio-temporal variables in two separate 2D/3D CNN branches to learn static and dynamic feature vectors. Moreover, we have introduced the Location-aware Adaptive Normalization layer, which modulates the activation maps in the dynamic branch conditioned on their respective static features to address the causal effect of static features on dynamic features. We furthermore integrated an absolute time encoding into the model. By encoding the calendar time, we make the model explicitly aware of the forecasting day. While the time encoding reduces the recall, it substantially increases the precision. We conducted our experiments on the FireCube dataset and demonstrated the effectiveness of our approach compared to several baselines in terms of Precision, F1-score, AUROC, and OA. Although our approach demonstrated a substantial improvement compared to previous works for wildfire forecasting, it still has some limitations. Despite the fact that our framework includes domain knowledge through the normalization layer and the absolute time encoding, it does not incorporate physical knowledge about the Earth system. Furthermore, the FireCube dataset covers only parts of the Eastern Mediterranean and the years 2009-2021. There is a need for more standardized datasets for wildfire forecasting at a continental scale and over longer time periods. Finally, there may be hidden events that are correlated with climate variability and extreme weather conditions. It is an open question how these impact the forecast quality and whether additional input variables will be needed to improve the forecast accuracy.
We believe that the proposed approach of dealing with spatial and spatio-temporal variables is also highly relevant for other remote sensing applications.
Acknowledgments
We would like to thank Ioannis Prapas and Spyros Kondylatos for providing the FireCube dataset.
References
- [1] L. Ren, P. Arkin, T. M. Smith, and S. S. Shen, “Global precipitation trends in 1900–2005 from a reconstruction and coupled model simulations,” Journal of Geophysical Research: Atmospheres, vol. 118, no. 4, pp. 1679–1689, 2013. [Online]. Available: https://agupubs.onlinelibrary.wiley.com/doi/abs/10.1002/jgrd.50212
- [2] A. M. Lausier and S. Jain, “Overlooked trends in observed global annual precipitation reveal underestimated risks,” Scientific reports, vol. 8, no. 1, pp. 1–7, 2018.
- [3] S. Perkins-Kirkpatrick and S. Lewis, “Increasing trends in regional heatwaves,” Nature communications, vol. 11, no. 1, pp. 1–8, 2020.
- [4] R. Samuels, A. Hochman, A. Baharad, A. Givati, Y. Levi, Y. Yosef, H. Saaroni, B. Ziv, T. Harpaz, and P. Alpert, “Evaluation and projection of extreme precipitation indices in the eastern mediterranean based on cmip5 multi-model ensemble,” International Journal of Climatology, vol. 38, no. 5, pp. 2280–2297, 2018.
- [5] G. Zittis, P. Hadjinicolaou, M. Klangidou, Y. Proestos, and J. Lelieveld, “A multi-model, multi-scenario, and multi-domain analysis of regional climate projections for the mediterranean,” Regional Environmental Change, vol. 19, no. 8, pp. 2621–2635, 2019.
- [6] M. J. Barcikowska, S. B. Kapnick, L. Krishnamurty, S. Russo, A. Cherchi, and C. K. Folland, “Changes in the future summer mediterranean climate: contribution of teleconnections and local factors,” Earth System Dynamics, vol. 11, no. 1, pp. 161–181, 2020. [Online]. Available: https://esd.copernicus.org/articles/11/161/2020/
- [7] A. Hochman, F. Marra, G. Messori, J. G. Pinto, S. Raveh-Rubin, Y. Yosef, and G. Zittis, “Extreme weather and societal impacts in the eastern mediterranean,” Earth System Dynamics, vol. 13, no. 2, pp. 749–777, 2022. [Online]. Available: https://esd.copernicus.org/articles/13/749/2022/
- [8] M. P. Thompson, Y. Wei, D. E. Calkin, C. D. O’Connor, C. J. Dunn, N. M. Anderson, and J. S. Hogland, “Risk management and analytics in wildfire response,” Current Forestry Reports, vol. 5, no. 4, pp. 226–239, 2019.
- [9] S. C. P. Coogan, F. Robinne, P. Jain, and M. D. Flannigan, “Scientists’ warning on wildfire — a canadian perspective,” Canadian Journal of Forest Research, 2019.
- [10] F. Moreira, D. Ascoli, H. Safford, M. A. Adams, J. M. Moreno, J. M. C. Pereira, F. X. Catry, J. Armesto, W. Bond, M. E. González, T. Curt, N. Koutsias, L. McCaw, O. Price, J. G. Pausas, E. Rigolot, S. Stephens, C. Tavsanoglu, V. R. Vallejo, B. W. V. Wilgen, G. Xanthopoulos, and P. M. Fernandes, “Wildfire management in mediterranean-type regions: paradigm change needed,” Environmental Research Letters, vol. 15, no. 1, p. 011001, jan 2020. [Online]. Available: https://dx.doi.org/10.1088/1748-9326/ab541e
- [11] P. Jain, S. C. Coogan, S. G. Subramanian, M. Crowley, S. Taylor, and M. D. Flannigan, “A review of machine learning applications in wildfire science and management,” Environmental Reviews, vol. 28, no. 4, pp. 478–505, 2020.
- [12] D. Fornacca, G. Ren, and W. Xiao, “Performance of three modis fire products (mcd45a1, mcd64a1, mcd14ml), and esa fire_cci in a mountainous area of northwest yunnan, china, characterized by frequent small fires,” Remote Sensing, vol. 9, no. 11, p. 1131, 2017.
- [13] S. Kondylatos, I. Prapas, M. Ronco, I. Papoutsis, G. Camps-Valls, M. Piles, M.-Á. Fernández-Torres, and N. Carvalhais, “Wildfire danger prediction and understanding with deep learning,” Geophysical Research Letters, vol. 49, no. 17, p. e2022GL099368, 2022. [Online]. Available: https://agupubs.onlinelibrary.wiley.com/doi/abs/10.1029/2022GL099368
- [14] S. Hantson, A. Arneth, S. P. Harrison, D. I. Kelley, I. C. Prentice, S. S. Rabin, S. Archibald, F. Mouillot, S. R. Arnold, P. Artaxo et al., “The status and challenge of global fire modelling,” Biogeosciences, vol. 13, no. 11, pp. 3359–3375, 2016.
- [15] M. Reichstein, G. Camps-Valls, B. Stevens, M. Jung, J. Denzler, N. Carvalhais et al., “Deep learning and process understanding for data-driven earth system science,” Nature, vol. 566, no. 7743, pp. 195–204, 2019.
- [16] I. Prapas, S. Kondylatos, I. Papoutsis, G. Camps-Valls, M. Ronco, M.-Á. Fernández-Torres, M. P. Guillem, and N. Carvalhais, “Deep learning methods for daily wildfire danger forecasting,” 2021. [Online]. Available: https://arxiv.org/abs/2111.02736
- [17] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” Advances in neural information processing systems, vol. 30, 2017.
- [18] I. Prapas, S. Kondylatos, and I. Papoutsis, “FireCube: A Daily Datacube for the Modeling and Analysis of Wildfires in Greece,” May 2022. [Online]. Available: https://doi.org/10.5281/zenodo.6475592
- [19] M. C. Iban and A. Sekertekin, “Machine learning based wildfire susceptibility mapping using remotely sensed fire data and gis: A case study of adana and mersin provinces, turkey,” Ecological Informatics, vol. 69, p. 101647, 2022. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1574954122000966
- [20] B. T. Pham, A. Jaafari, M. Avand, N. Al-Ansari, T. Dinh Du, H. P. H. Yen, T. V. Phong, D. H. Nguyen, H. V. Le, D. Mafi-Gholami, I. Prakash, H. Thi Thuy, and T. T. Tuyen, “Performance evaluation of machine learning methods for forest fire modeling and prediction,” Symmetry, vol. 12, no. 6, 2020. [Online]. Available: https://www.mdpi.com/2073-8994/12/6/1022
- [21] S. Gholami, N. Kodandapani, J. Wang, and J. M. L. Ferres, “Where there’s smoke, there’s fire: Wildfire risk predictive modeling via historical climate data,” in AAAI Conference on Artificial Intelligence, 2020.
- [22] C. Shang, M. A. Wulder, N. C. Coops, J. C. White, and T. Hermosilla, “Spatially-explicit prediction of wildfire burn probability using remotely-sensed and ancillary data,” Canadian Journal of Remote Sensing, vol. 46, pp. 313 – 329, 2020.
- [23] I. Mitsopoulos and G. Mallinis, “A data-driven approach to assess large fire size generation in greece,” Natural Hazards, vol. 88, pp. 1591–1607, 2017.
- [24] T. Jiang, S. K. Bendre, H. Lyu, and J. Luo, “From static to dynamic prediction: Wildfire risk assessment based on multiple environmental factors,” in 2021 IEEE International Conference on Big Data (Big Data), 2021, pp. 4877–4886.
- [25] H. V. Le, D. A. Hoang, C. T. Tran, P. Q. Nguyen, V. H. T. Tran, N. D. Hoang, M. Amiri, T. P. T. Ngo, H. V. Nhu, T. V. Hoang, and D. Tien Bui, “A new approach of deep neural computing for spatial prediction of wildfire danger at tropical climate areas,” Ecological Informatics, vol. 63, p. 101300, 2021. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1574954121000911
- [26] G. Zhang, M. Wang, and K. Liu, “Forest fire susceptibility modeling using a convolutional neural network for yunnan province of china,” International Journal of Disaster Risk Science, vol. 10, no. 3, pp. 386–403, 2019.
- [27] ——, “Deep neural networks for global wildfire susceptibility modelling,” Ecological Indicators, vol. 127, p. 107735, 2021. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1470160X21004003
- [28] A. Bjånes, R. De La Fuente, and P. Mena, “A deep learning ensemble model for wildfire susceptibility mapping,” Ecological Informatics, vol. 65, p. 101397, 2021. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1574954121001886
- [29] J. R. Bergado, C. Persello, K. Reinke, and A. Stein, “Predicting wildfire burns from big geodata using deep learning,” Safety science, vol. 140, p. 105276, 2021.
- [30] F. Huot, R. L. Hu, N. Goyal, T. Sankar, M. Ihme, and Y.-F. Chen, “Next day wildfire spread: A machine learning dataset to predict wildfire spreading from remote-sensing data,” IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1–13, 2022.
- [31] I. Prapas, A. Ahuja, S. Kondylatos, I. Karasante, L. Alonso, E. Panagiotou, C. Davalas, D. Michail, N. Carvalhais, and I. Papoutsis, “Deep learning for global wildfire forecasting,” in NeurIPS 2022 Workshop on Tackling Climate Change with Machine Learning, 2022. [Online]. Available: https://www.climatechange.ai/papers/neurips2022/52
- [32] H.-J. Yoon and P. G. Voulgaris, “Multi-time predictions of wildfire grid map using remote sensing local data,” 2022 IEEE International Conference on Knowledge Graph (ICKG), pp. 365–372, 2022.
- [33] Q. Zhao, Y. Ma, S. Lyu, and L. Chen, “Embedded self-distillation in compact multibranch ensemble network for remote sensing scene classification,” IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1–15, 2022.
- [34] R. Gaetano, D. Ienco, K. Ose, and R. Cresson, “A two-branch cnn architecture for land cover classification of pan and ms imagery,” Remote Sensing, vol. 10, no. 11, 2018. [Online]. Available: https://www.mdpi.com/2072-4292/10/11/1746
- [35] Y. Tan, S. Xiong, and P. Yan, “Multi-branch convolutional neural network for built-up area extraction from remote sensing image,” Neurocomputing, vol. 396, pp. 358–374, 2020. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0925231219309208
- [36] Y. Xu, L. Zhang, B. Du, and F. Zhang, “Spectral–spatial unified networks for hyperspectral image classification,” IEEE Transactions on Geoscience and Remote Sensing, vol. 56, no. 10, pp. 5893–5909, 2018.
- [37] Y. Shen, L. Xiao, J. Chen, and D. Pan, “A spectral-spatial domain-specific convolutional deep extreme learning machine for supervised hyperspectral image classification,” IEEE Access, vol. 7, pp. 132 240–132 252, 2019.
- [38] X. Liu, C. Deng, J. Chanussot, D. Hong, and B. Zhao, “Stfnet: A two-stream convolutional neural network for spatiotemporal image fusion,” IEEE Transactions on Geoscience and Remote Sensing, vol. 57, no. 9, pp. 6552–6564, 2019.
- [39] C. Gan, X. Yan, Y. Wu, and Z. Zhang, “A two-branch convolution residual network for image compressive sensing,” IEEE Access, vol. 8, pp. 1705–1714, 2020.
- [40] Y. Tang, X. Xie, and Y. Yu, “Hyperspectral classification of two-branch joint networks based on Gaussian pyramid multiscale and wavelet transform,” IEEE Access, vol. 10, pp. 56 876–56 887, 2022.
- [41] Z. Zhong, J. Li, L. Ma, H. Jiang, and H. Zhao, “Deep residual networks for hyperspectral image classification,” in 2017 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 2017, pp. 1824–1827.
- [42] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
- [43] Y. Zheng, S. Liu, Q. Du, H. Zhao, X. Tong, and M. Dalponte, “A novel multitemporal deep fusion network (MDFN) for short-term multitemporal HR images classification,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 14, pp. 10 691–10 704, 2021.
- [44] R. Li, S. Zheng, C. Duan, Y. Yang, and X. Wang, “Classification of hyperspectral image based on double-branch dual-attention mechanism network,” Remote Sensing, vol. 12, no. 3, 2020. [Online]. Available: https://www.mdpi.com/2072-4292/12/3/582
- [45] Z. Zhu, Y. Tao, and X. Luo, “HCNNet: A hybrid convolutional neural network for spatiotemporal image fusion,” IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1–16, 2022.
- [46] Z. Deng, Y. Wang, B. Zhang, L. Li, J. Wang, L. Bian, and C. Yang, “A triple-path spectral–spatial network with interleave-attention for hyperspectral image classification,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 15, pp. 5906–5923, 2022.
- [47] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in International Conference on Machine Learning. PMLR, 2015, pp. 448–456.
- [48] Y. Wu and K. He, “Group normalization,” in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 3–19.
- [49] D. Ulyanov, A. Vedaldi, and V. S. Lempitsky, “Instance normalization: The missing ingredient for fast stylization,” arXiv preprint arXiv:1607.08022, 2016.
- [50] J. Ba, J. R. Kiros, and G. E. Hinton, “Layer normalization,” arXiv preprint arXiv:1607.06450, 2016.
- [51] T. Park, M.-Y. Liu, T.-C. Wang, and J.-Y. Zhu, “Semantic image synthesis with spatially-adaptive normalization,” in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 2332–2341.
- [52] X. Wang, K. Yu, C. Dong, and C. Change Loy, “Recovering realistic texture in image super-resolution by deep spatial feature transform,” in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, pp. 606–615.
- [53] X. Huang and S. Belongie, “Arbitrary style transfer in real-time with adaptive instance normalization,” in Proceedings of the IEEE international conference on computer vision, 2017, pp. 1501–1510.
- [54] J. Ling, H. Xue, L. Song, R. Xie, and X. Gu, “Region-aware adaptive instance normalization for image harmonization,” in 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 9357–9366.
- [55] N. van Noord and E. Postma, “A learned representation of artist-specific colourisation,” in Proceedings of the IEEE International Conference on Computer Vision (ICCV) Workshops, Oct 2017.
- [56] P. Zhu, R. Abdal, Y. Qin, and P. Wonka, “SEAN: Image synthesis with semantic region-adaptive normalization,” in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 5103–5112.
- [57] Z. Tan, D. Chen, Q. Chu, M. Chai, J. Liao, M. He, L. Yuan, G. Hua, and N. Yu, “Efficient semantic image synthesis via class-adaptive normalization,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 9, pp. 4852–4866, 2022.
- [58] Z. Lv, X. Li, X. Li, F. Li, T. Lin, D. He, and W. Zuo, “Learning semantic person image generation by region-adaptive normalization,” in 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 10 801–10 810.
- [59] Y. Lyu, P. Chen, J. Sun, X. Wang, J. Dong, and T. Tan, “Detailed region-adaptive normalization for heavy makeup transfer,” arXiv preprint arXiv:2109.14525, 2021.
- [60] K. Jakoel, L. Efraim, and T. R. Shaham, “GANs spatial control via inference-time adaptive normalization,” in 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2022, pp. 31–40.
- [61] T. Chen, M. Lucic, N. Houlsby, and S. Gelly, “On self modulation for generative adversarial networks,” in International Conference on Learning Representations (ICLR), 2019. [Online]. Available: https://openreview.net/forum?id=Hkl5aoR5tm
- [62] X. Huang and S. J. Belongie, “Arbitrary style transfer in real-time with adaptive instance normalization,” in 2017 IEEE International Conference on Computer Vision (ICCV), 2017, pp. 1510–1519.
- [63] V. Sushko, E. Schönfeld, D. Zhang, J. Gall, B. Schiele, and A. Khoreva, “OASIS: Only adversarial supervision for semantic image synthesis,” International Journal of Computer Vision, 2022.
- [64] S. Li, M.-M. Cheng, and J. Gall, “Dual pyramid generative adversarial networks for semantic image synthesis,” in British Machine Vision Conference, 2022.
- [65] J. Marín and S. Escalera, “SSSGAN: Satellite style and structure generative adversarial networks,” Remote Sensing, vol. 13, no. 19, 2021. [Online]. Available: https://www.mdpi.com/2072-4292/13/19/3984
- [66] X. Shi, Z. Chen, H. Wang, D.-Y. Yeung, W.-K. Wong, and W.-c. Woo, “Convolutional LSTM network: A machine learning approach for precipitation nowcasting,” Advances in Neural Information Processing Systems, vol. 28, 2015.
- [67] M. Rußwurm and M. Körner, “Multi-temporal land cover classification with sequential recurrent encoders,” ISPRS International Journal of Geo-Information, vol. 7, no. 4, 2018. [Online]. Available: https://www.mdpi.com/2220-9964/7/4/129
- [68] C. Pelletier, G. I. Webb, and F. Petitjean, “Temporal convolutional neural network for the classification of satellite image time series,” Remote Sensing, vol. 11, no. 5, p. 523, 2019. [Online]. Available: https://www.mdpi.com/2072-4292/11/5/523
- [69] W. R. Moskolaï, W. Abdou, A. Dipanda, and Kolyang, “Application of deep learning architectures for satellite image time series prediction: A review,” Remote Sensing, vol. 13, no. 23, 2021. [Online]. Available: https://www.mdpi.com/2072-4292/13/23/4822
- [70] V. S. F. Garnot, L. Landrieu, S. Giordano, and N. Chehata, “Satellite image time series classification with pixel-set encoders and temporal self-attention,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 12 325–12 334.
- [71] V. S. F. Garnot and L. Landrieu, “Lightweight temporal self-attention for classifying satellite images time series,” in Advanced Analytics and Learning on Temporal Data, V. Lemaire, S. Malinowski, A. Bagnall, T. Guyet, R. Tavenard, and G. Ifrim, Eds. Cham: Springer International Publishing, 2020, pp. 171–181.
- [72] V. S. Fare Garnot and L. Landrieu, “Panoptic segmentation of satellite image time series with convolutional temporal attention networks,” in 2021 IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 4852–4861.
- [73] L. Drees, I. Weber, M. Rußwurm, and R. Roscher, “Time dependent image generation of plants from incomplete sequences with CNN-transformer,” in DAGM German Conference on Pattern Recognition. Springer, 2022, pp. 495–510.
- [74] J. Nyborg, C. Pelletier, S. Lefèvre, and I. Assent, “TimeMatch: Unsupervised cross-region adaptation by temporal shift estimation,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 188, pp. 301–313, 2022. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0924271622001216
- [75] J. Nyborg, C. Pelletier, and I. Assent, “Generalized classification of satellite image time series with thermal positional encoding,” in 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2022, pp. 1391–1401.
- [76] J. Muñoz Sabater, E. Dutra, A. Agustí-Panareda, C. Albergel, G. Arduini, G. Balsamo, S. Boussetta, M. Choulga, S. Harrigan, H. Hersbach, B. Martens, D. G. Miralles, M. Piles, N. J. Rodríguez-Fernández, E. Zsoter, C. Buontempo, and J.-N. Thépaut, “ERA5-Land: A state-of-the-art global reanalysis dataset for land applications,” Earth System Science Data, vol. 13, no. 9, pp. 4349–4383, 2021. [Online]. Available: https://essd.copernicus.org/articles/13/4349/2021/
- [77] K. Didan, “MOD13A2 MODIS/Terra vegetation indices 16-day L3 global 1km SIN grid V006 [Data set], NASA EOSDIS Land Processes DAAC,” 2015. [Online]. Available: https://doi.org/10.5067/MODIS/MOD13A2.006
- [78] Z. Wan, S. Hook, and G. Hulley, “MOD11A2 MODIS/Terra land surface temperature/emissivity 8-day L3 global 1km SIN grid V006,” 2015. [Online]. Available: https://doi.org/10.5067/MODIS/MOD11A2.006
- [79] A. Bashfield and A. Keim, “Continent-wide DEM creation for the European Union,” in 34th International Symposium on Remote Sensing of Environment. The GEOSS Era: Towards Operational Environmental Monitoring. Sydney, Australia, 2011, pp. 10–15.
- [80] A. J. Tatem, “WorldPop, open data for spatial demography,” Scientific Data, vol. 4, no. 1, pp. 1–4, 2017.
- [81] J. San-Miguel-Ayanz, E. Schulte, G. Schmuck, and A. Camia, “The European Forest Fire Information System in the context of environmental policies of the European Union,” Forest Policy and Economics, vol. 29, pp. 19–25, 2013, the FIRE PARADOX project: Setting the basis for a shift in the forest fire policies in Europe. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S138993411200127X
- [82] L. Giglio, W. Schroeder, and C. O. Justice, “The Collection 6 MODIS active fire detection algorithm and fire products,” Remote Sensing of Environment, vol. 178, pp. 31–41, 2016. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0034425716300827
- [83] G. Büttner, CORINE Land Cover and Land Cover Change Products. Dordrecht: Springer Netherlands, 2014, pp. 55–74. [Online]. Available: https://doi.org/10.1007/978-94-007-7969-3_5
- [84] D. Steinfeld, Calculation of indices for forest fire risk assessment in weather and climate data, 2022. [Online]. Available: https://github.com/steidani/FireDanger
- [85] T. M. Giannaros, G. Papavasileiou, K. Lagouvardos, V. Kotroni, S. Dafis, A. Karagiannidis, and E. Dragozi, “Meteorological analysis of the 2021 extreme wildfires in Greece: Lessons learned and implications for early warning of the potential for pyroconvection,” Atmosphere, vol. 13, no. 3, 2022. [Online]. Available: https://www.mdpi.com/2073-4433/13/3/475
- [86] I. Prapas, S. Kondylatos, and I. Papoutsis, “Training data for submitted paper ‘Wildfire Danger Prediction and Understanding with Deep Learning’,” May 2022. [Online]. Available: https://doi.org/10.5281/zenodo.6528394
- [87] C. Cammalleri, J. V. Vogt, B. Bisselink, and A. de Roo, “Comparing soil moisture anomalies from multiple independent sources over different regions across the globe,” Hydrology and Earth System Sciences, vol. 21, no. 12, pp. 6329–6343, 2017.
- [88] M. Lin, Q. Chen, and S. Yan, “Network in network,” arXiv preprint arXiv:1312.4400, 2013.
- [89] Y. Martín, M. Zúñiga-Antón, and M. R. Mimbrero, “Modelling temporal variation of fire-occurrence towards the dynamic prediction of human wildfire ignition danger in northeast Spain,” Geomatics, Natural Hazards and Risk, vol. 10, no. 1, pp. 385–411, 2019. [Online]. Available: https://doi.org/10.1080/19475705.2018.1526219
- [90] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala, “PyTorch: An imperative style, high-performance deep learning library,” in Advances in Neural Information Processing Systems 32. Curran Associates, Inc., 2019, pp. 8024–8035. [Online]. Available: http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf
- [91] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in International Conference on Learning Representations (ICLR), 2015.
- [92] G. Bertasius, H. Wang, and L. Torresani, “Is space-time attention all you need for video understanding?” in Proceedings of the International Conference on Machine Learning (ICML), July 2021.
- [93] Z. Liu, J. Ning, Y. Cao, Y. Wei, Z. Zhang, S. Lin, and H. Hu, “Video swin transformer,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2022, pp. 3202–3211.
- [94] L. Breiman, “Random forests,” Machine Learning, vol. 45, no. 1, pp. 5–32, 2001. [Online]. Available: https://link.springer.com/article/10.1023/A:1010933404324
- [95] T. Chen and C. Guestrin, “XGBoost: A scalable tree boosting system,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD ’16. New York, NY, USA: Association for Computing Machinery, 2016, pp. 785–794. [Online]. Available: https://doi.org/10.1145/2939672.2939785
Mohamad Hakam Shams Eddin is a PhD student at the University of Bonn, Germany. He received his Dipl.-Ing. degree in topographic engineering from the University of Aleppo, Syria, in 2015 and his M.Sc. degree in geomatics engineering from the University of Stuttgart, Germany, in 2019. His research interests include deep learning, remote sensing, and anomaly detection.
Ribana Roscher (Member, IEEE) received the Dipl.-Ing. and Ph.D. degrees in geodesy from the University of Bonn, Bonn, Germany, in 2008 and 2012, respectively. Until 2022, she was a Junior Professor of Remote Sensing with the University of Bonn. Before that, she was a Postdoctoral Researcher with the University of Bonn, the Julius-Kuehn Institute, Siebeldingen, Germany, Freie Universitaet Berlin, Berlin, Germany, and Humboldt Innovation, Berlin. In 2015, she was a Visiting Researcher with the Fields Institute, Toronto, ON, Canada. Since 2022, she has been a Professor of Data Science for Crop Systems at the University of Bonn. She currently leads the Data Science research area at IBG-2, Forschungszentrum Jülich.
Juergen Gall (Member, IEEE) received his B.Sc. and M.Sc. degrees in mathematics from the University of Wales Swansea, UK, and the University of Mannheim, Germany, in 2004 and 2005, respectively. In 2009, he obtained a Ph.D. in computer science from Saarland University and the Max Planck Institute for Informatics. He was a postdoctoral researcher at the Computer Vision Laboratory, ETH Zurich, Switzerland, from 2009 until 2012 and a senior research scientist at the Max Planck Institute for Intelligent Systems in Tuebingen from 2012 until 2013. Since 2013, he has been a professor at the University of Bonn and head of the Computer Vision Group. He is furthermore a member of the Lamarr Institute for Machine Learning and Artificial Intelligence.