
Fusing VHR Post-disaster Aerial Imagery and LiDAR Data for
Roof Classification in the Caribbean

Isabelle Tingzon, Nuala Margaret Cowan, Pierre Chrzanowski
The World Bank Group, GFDRR
{tisabelle, ncowan, pchrzanowski}@worldbank.org
Abstract

Accurate and up-to-date information on building characteristics is essential for vulnerability assessment; however, the high costs and long timeframes associated with conducting traditional field surveys can be an obstacle to obtaining critical exposure datasets needed for disaster risk management. In this work, we leverage deep learning techniques for the automated classification of roof characteristics from very high-resolution orthophotos and airborne LiDAR data obtained in Dominica following Hurricane Maria in 2017. We demonstrate that the fusion of multimodal earth observation data performs better than using any single data source alone. Using our proposed methods, we achieve F1 scores of 0.93 and 0.92 for roof type and roof material classification, respectively. This work is intended to help governments produce more timely building information to improve resilience and disaster response in the Caribbean.

1 Introduction

Natural hazards such as hurricanes, floods, landslides, and volcanic activity have been increasing in frequency and intensity in the Caribbean over recent years, raising great concern regarding the disaster risk of vulnerable communities and exposed populations [16]. When Hurricane Maria ravaged Dominica in 2017, over 28,000 homes (90% of the housing stock) were either damaged or destroyed, accumulating damages and losses estimated at USD 380M in the housing sector alone [19]. The destruction of structural assets as a result of extreme hazard events poses significant challenges to many small island developing states in the Caribbean. Extensive damage to buildings can lead to precarious housing conditions, building collapse, and loss of lives. Furthermore, the cost of repairing and rebuilding homes and public infrastructure can lead to severe budget reductions, increased debt, and ultimately, weaker economic growth prospects in the region [19].

Disaster risk management and mitigation planning are thus paramount to minimizing the adverse effects of natural hazards and preventing the loss of human lives [24]. Disaster risk mitigation begins with analyzing the risk profile of buildings to different hazards and requires comprehensive information on the spatial distribution of buildings and their various characteristics. Building information modeling, as defined in [29], involves the acquisition and management of the physical representation and characterization of built objects which form the basis for decision-making. Information on building characteristics is typically accessible via official government databases; however, due to the high costs and long timeframes associated with collecting and maintaining such datasets, up-to-date and granular building information is often lacking, inaccessible, or completely unavailable in many developing countries.

To address the challenge of data scarcity, researchers have turned to deep learning (DL) and earth observation (EO) for the automatic extraction of rich, fine-grained building attribute information. Recent studies have sought to classify buildings based on rooftop characteristics using convolutional neural networks (CNNs) in combination with different types of remote sensing data, such as very high-resolution (VHR) aerial imagery and light detection and ranging (LiDAR) data [10, 3, 22, 21, 6]. While the use of multimodal EO data presents new opportunities to extract more granular dimensions of building attribution, determining the appropriate data fusion strategy for integrating heterogeneous image modalities remains a challenge [7].

This study investigates the fusion of multimodal EO data for extracting rooftop characteristics in the Caribbean using DL techniques. Specifically, we explore two data fusion strategies, (1) feature-level data fusion and (2) decision-level integration, using CNNs to classify roof type and roof material from RGB orthophotos and LiDAR data collected in Dominica in the aftermath of Hurricane Maria in 2017. We show that models trained using a combination of RGB and LiDAR images consistently outperform models trained using any one data source alone. Lastly, we evaluate the best models on an independent dataset of drone images with the goal of generating more frequently updated exposure datasets for disaster risk mitigation in the Caribbean.

2 Application Context

This work was developed in the context of the Digital Earth Project for Resilient Housing and Infrastructure in the Caribbean, a World Bank project funded by the Global Facility for Disaster Reduction and Recovery (GFDRR), which builds upon the work of the Global Program for Resilient Housing (GPRH) [2, 5]. The project aims to enhance local capacity in the Caribbean to leverage EO-based solutions in support of resilient infrastructure and housing operations. This includes developing local skills and capabilities to produce and update critical building information needed for governments to improve resilience in the region.

The project also aims to support government initiatives such as the Resilient Housing Scheme by the Government of the Commonwealth of Dominica (GoCD), which strives to make 90% of housing stock resilient by 2030, based on the Dominica Climate Resilience and Recovery Plan 2020-2030 [20]. For such programs to be successful, accurate and up-to-date maps of building characteristics are requisite for planning and monitoring the relocation of vulnerable citizens, the retrofitting of damaged structures, and the construction of new resilient homes [20].

In line with these goals, the Digital Earth for a Resilient Caribbean Project comprises three components, described in further detail in Appendix A: (1) capacity building, (2) generation and integration of baseline exposure datasets, and (3) knowledge exchange and dissemination. This paper primarily aims to advance the goals of the second component, which include leveraging DL and EO to fill baseline exposure data gaps in Caribbean countries.

3 Related Work

Recent years have seen a growing interest in leveraging DL and EO for vulnerability assessment of structural assets to better inform decision-making for disaster risk management [13, 23]. Previous studies demonstrate the successful applications of DL models, specifically CNNs, for characterizing buildings based on roof geometry (e.g. flat, hip, gable) from high-resolution satellite images and LiDAR data [3, 22, 21, 10, 6]. Recent studies have explored the use of CNNs for deep feature extraction in combination with ML algorithms for downstream roof type classification with promising results [6, 22].

Likewise, CNNs have also been used in the context of roof material classification and damage assessment [25, 12, 17, 30, 15, 11, 28]. Several studies have examined the applications of CNNs for classifying buildings into different roof material categories (concrete, metal, etc.) [12, 25, 18]. Other studies classify buildings based on the level of damage sustained, with recent works focusing on detecting the presence of blue tarpaulins in post-disaster satellite images [30, 15]. Our work seeks to expand this body of literature by evaluating CNN algorithms for rooftop classification using multimodal remote sensing data in the context of disaster risk management and post-disaster damage assessment in the Caribbean.

4 Data

To generate our ground truth dataset, we used the following three data sources for Dominica and Saint Lucia, as detailed in Table 1: (1) RGB aerial imagery, (2) LiDAR data, and (3) building footprints in the form of georeferenced vector polygons. RGB orthophotos and LiDAR data were obtained from the Government of Saint Lucia (GoSL) and GoCD; meanwhile, nationwide building footprints were delineated from aerial images by the World Bank Group. LiDAR-derived data products include the Digital Surface Model (DSM) and the Digital Terrain Model (DTM), from which we generate the normalized DSM (nDSM). Further details on data pre-processing can be found in Appendix B.

To create a diverse and representative dataset, we selected 80 randomly sampled 500 m x 500 m tiles across Dominica and Saint Lucia and labeled each building within the selected tiles via visual interpretation of the RGB orthophotos and LiDAR images. We annotated a total of 8,345 buildings according to two attributes: (1) roof type and (2) roof material, the classes and distributions of which are listed in Table 2, based on the most common roof characteristics observed in the Caribbean [1]. To generate the RGB and LiDAR image patches for each building, we crop the minimum bounding rectangle of the building footprint, scaled by a factor of 1.5, from each raster data source. Figures 1 and 2 illustrate examples of the RGB orthophotos and LiDAR-derived image patches.
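
A minimal sketch of this cropping step (not the project's released code; the file names and helper function are hypothetical, and we assume the footprints share the raster's CRS):

import geopandas as gpd
import rasterio
from rasterio.mask import mask
from shapely.affinity import scale

def crop_building_patch(raster_path, footprint, factor=1.5):
    """Crop the scaled bounding rectangle of a footprint from a raster."""
    # `envelope` is the axis-aligned bounding box; `minimum_rotated_rectangle`
    # would give the rotated minimum bounding rectangle instead.
    box = scale(footprint.envelope, xfact=factor, yfact=factor, origin="center")
    with rasterio.open(raster_path) as src:
        patch, _ = mask(src, [box], crop=True)
    return patch  # array of shape (bands, height, width)

buildings = gpd.read_file("building_footprints.geojson")  # placeholder path
patches = [crop_building_patch("orthophoto.tif", geom) for geom in buildings.geometry]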

Table 1: Details of the RGB orthophotos, LiDAR data, and building footprints.

              RGB Orthophotos           LiDAR                 Buildings
              Resolution   Year         Spacing   Year        Count
Dominica      0.20 m/px    2018-2019    0.50 m    2018-2019   50,347
Saint Lucia   0.10 m/px    2022         0.50 m    2022        69,275
Table 2: Class distribution of roof type and roof material across countries and across the train/test splits.

                                Dominica   Saint Lucia   Train   Test    Total
Roof Type      Gable               2,055         1,063   2,339     779   3,118
               Hip                 1,334           710   1,534     510   2,044
               Flat                1,636           274   1,433     477   1,910
               No Roof             1,145           128     955     318   1,273
Roof Material  Healthy metal       1,420         1,343   2,073     690   2,763
               Irregular metal     1,206           507   1,285     428   1,713
               Concrete/cement     1,200           171   1,029     342   1,371
               Blue tarpaulin      1,153             0     865     288   1,153
               Incomplete          1,191           154   1,009     336   1,345
Total                              6,170         2,175   6,261   2,084   8,345
Figure 1: Examples of RGB orthophotos (top) and LiDAR images (bottom) for different roof types: (a) Gable, (b) Hip, (c) Flat, (d) No roof.
Figure 2: Examples of RGB orthophotos for different roof material categories: (a) Healthy metal, (b) Irregular metal, (c) Concrete/cement, (d) Blue tarpaulin, (e) Incomplete.

5 Methods

5.1 Baseline CNN Models

We select ResNet50 [8], Inceptionv3 [26], and EfficientNet-B0 [27] as our CNN base architectures for experimentation [6]. Each CNN is trained separately on RGB orthophotos and LiDAR-derived nDSM raster images; models trained using LiDAR are modified to accept single-channel image inputs. Further details on CNN model modifications and configurations, input image pre-processing, and data augmentation can be found in Appendix C.

5.2 Data Fusion

We implement two fusion strategies for combining RGB orthophotos and LiDAR images (see Appendix D) [31, 7]:

  • Feature-level data fusion. Dense feature embeddings are extracted from the global average pooling layers of the best CNN model for each data source. The two feature vectors are then concatenated and used as input into a downstream ML classifier (see Figure 4).

  • Decision-level integration. The results of the best-performing CNN per data source are combined using two different decision-level fusion strategies: (1) by computing the mean of the softmax vectors of the models, and (2) by concatenating the softmax layers from each CNN into a single vector and feeding this into a downstream ML classifier [9].

We experiment with the following ML classifiers for the downstream classification task: Logistic Regression (LR), Random Forests (RF), and Support Vector Machines (SVM). A detailed description of the ML model hyperparameter tuning can be found in Appendix E.
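
As a rough sketch of the two decision-level variants, assuming `probs_rgb` and `probs_lidar` are precomputed (n_samples, n_classes) softmax outputs from the best CNN per modality (variable names are ours, not from the paper):

import numpy as np
from sklearn.linear_model import LogisticRegression

def fuse_mean(probs_rgb, probs_lidar):
    """Strategy (1): average the softmax vectors and take the argmax."""
    return np.argmax((probs_rgb + probs_lidar) / 2.0, axis=1)

def fuse_stacked(probs_rgb, probs_lidar, labels):
    """Strategy (2): concatenate softmax vectors, fit a downstream classifier."""
    stacked = np.concatenate([probs_rgb, probs_lidar], axis=1)
    return LogisticRegression(max_iter=1000).fit(stacked, labels)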

6 Results and Discussion

6.1 Model Evaluation

For model training and evaluation, we split the dataset into 75% training data and 25% test data using stratified random sampling to preserve the percentage of samples for each class, as shown in Table 2. Note that the test data consists entirely of samples from Dominica, our primary region of interest in this study. To evaluate model performance, we report the precision, recall, F1 score, and accuracy (see Appendix F for more details).
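
One way the described split could be reproduced with scikit-learn, assuming a hypothetical DataFrame `df` of labeled buildings with `country` and `label` columns (since all test samples come from Dominica, the stratified draw is applied to the Dominica subset only):

import pandas as pd
from sklearn.model_selection import train_test_split

dominica = df[df["country"] == "Dominica"]
n_test = int(round(0.25 * len(df)))  # 25% of the full dataset, all from Dominica

dominica_train, test_df = train_test_split(
    dominica, test_size=n_test, stratify=dominica["label"], random_state=42
)
train_df = pd.concat([dominica_train, df[df["country"] == "Saint Lucia"]])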

Table 3 shows the performance of CNNs trained individually on RGB orthophotos and LiDAR data for each classification task. Results indicate that for roof type classification, the LiDAR-only models perform slightly better than the RGB-only models; for roof material classification, however, the opposite holds: RGB-only models outperform LiDAR-only models by more than 25 percentage points in F1 score.

We experiment with several data fusion techniques, as shown in Table 4, and find that models trained using a fusion of RGB and LiDAR data yield a 1-3% boost over models trained using any one data source alone. Furthermore, we find that feature-level data fusion consistently outperforms decision-level integration for both classification tasks. Our best model for roof type classification is an LR model that fuses the deep feature embeddings of a ResNet50 model trained on RGB and an Inceptionv3 model trained on LiDAR, achieving an F1 score of 93%. Meanwhile, the best model for roof material classification is an RF model that fuses the feature embeddings of an EfficientNet-B0 model trained on RGB and an Inceptionv3 model trained on LiDAR data, achieving a 92% F1 score.

Table 3: Comparison of CNN test set results (%) for (a) roof type classification and (b) roof material classification when trained individually on RGB and LiDAR image inputs.

(a) Roof type classification
                         F1 score   Precision   Recall   Accuracy
RGB    ResNet50             89.04       88.56    89.61      88.72
       Inceptionv3          89.02       89.01    89.04      88.53
       EfficientNet-B0      87.34       87.12    87.63      87.04
LiDAR  ResNet50             89.71       89.55    89.88      89.97
       Inceptionv3          90.21       90.09    90.37      90.60
       EfficientNet-B0      87.77       87.81    87.84      87.81

(b) Roof material classification
                         F1 score   Precision   Recall   Accuracy
RGB    ResNet50             90.35       90.09    90.64      90.12
       Inceptionv3          90.34       90.15    90.70      90.07
       EfficientNet-B0      90.69       90.80    90.63      90.40
LiDAR  ResNet50             62.22       61.32    63.40      63.29
       Inceptionv3          64.01       63.32    65.05      63.77
       EfficientNet-B0      61.28       60.49    62.55      61.13
Table 4: Test set results (%) using (a) feature-level data fusion and (b) decision-level integration. For simplicity, we fuse the best-performing models for each task (see Table 3).

(a) Feature-level data fusion
                          F1 score   Precision   Recall   Accuracy
Roof Type      LR^a          93.05       93.00    93.11      92.90
               RF^a          92.84       92.88    92.82      92.75
               SVM^a         92.75       92.72    92.79      92.61
Roof Material  LR^b          91.31       91.05    91.60      90.83
               RF^b          91.66       91.48    91.85      91.17
               SVM^b         91.36       91.10    91.64      90.88

(b) Decision-level integration
                          F1 score   Precision   Recall   Accuracy
Roof Type      Mean^a        92.92       92.76    93.12      92.95
               LR^a          92.25       91.92    92.63      92.13
               RF^a          92.76       92.47    93.09      92.71
               SVM^a         92.02       91.73    92.34      91.89
Roof Material  Mean^b        91.05       91.04    91.09      90.55
               LR^b          91.31       91.15    91.49      90.79
               RF^b          91.22       90.98    91.51      90.55
               SVM^b         91.36       91.23    91.49      90.83

^a Fuses ResNet50 trained on RGB and Inceptionv3 trained on LiDAR.
^b Fuses EfficientNet-B0 trained on RGB and Inceptionv3 trained on LiDAR.

6.2 Drone Imagery

One major component of the Digital Earth for a Resilient Caribbean Project involves training local communities to operate drones to enable frequent collection of VHR aerial images. To assess whether our models can generalize to VHR RGB drone imagery (3 cm/px), we manually labeled 373 buildings from pre-disaster drone images taken in Colihaut, Dominica in 2017. We evaluated our best RGB models using these annotations as ground truth and achieved F1 scores of 86% for roof type classification and 90% for roof material classification. Figure 3 illustrates the pre- and post-disaster roof material classification maps in Colihaut.

Figure 3: Pre- and post-disaster roof material classification maps of Colihaut, Dominica: (a) pre-disaster drone image (2017), (b) pre-disaster roof material classification map, (c) post-disaster orthophoto (2018-2019), (d) post-disaster roof material classification map.

7 Conclusion

This study investigates different strategies for fusing RGB orthophotos and LiDAR images for roof type and roof material classification in the Caribbean. Our findings indicate that feature-level data fusion yields a 1-3% boost over using any single data source alone. Our best models achieve F1 scores above 90% for both classification tasks. Future work will explore the use of additional data sources (e.g. drones, street-view images) for extracting building characteristics for climate resilience in the Caribbean.

Acknowledgments

The Digital Earth Project for Resilient Infrastructure and Housing in the Caribbean is funded by the Global Facility for Disaster Reduction and Recovery (GFDRR). We are grateful for the support of the Government of Saint Lucia (GoSL) and the Department of Physical Development and Urban Renewal (DPDUR) in providing the datasets for Saint Lucia. We also thank the Government of the Commonwealth of Dominica (GoCD) for providing the data for Dominica and implementing component 2 of the Disaster Vulnerability Reduction Project (DVRP), funded by the World Bank Group and the Climate Investment Funds (CIF) under the Pilot Program for Climate Resilience (PPCR). We are grateful for the insightful discussions with Sarah Antos as well as the initial work of the Global Program for Resilient Housing (GPRH), which provided the essential foundations for the study. We thank Michael Fedak and Christopher Williams for their assistance in providing access to the datasets as well as the insightful discussions on the data landscape in the Caribbean.

References

  • [1] Open AI Caribbean Challenge: Mapping Disaster Risk from Aerial Imagery. https://www.drivendata.org/competitions/58/disaster-response-roof-type/page/143/#roof_type, 2019.
  • [2] Global Program for Resilient Housing (GPRH). https://www.worldbank.org/en/topic/disasterriskmanagement/brief/global-program-for-resilient-housing, January 2022.
  • [3] M Buyukdemircioglu, R Can, and S Kocaman. Deep learning based roof type classification using very high resolution aerial imagery. The International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences, 43:55–60, 2021.
  • [4] Mehmet Buyukdemircioglu, Recep Can, Sultan Kocaman, and Martin Kada. Deep learning based building footprint extraction from very high resolution true orthophotos and nDSM. ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, pages 211–218, 2022.
  • [5] Capturing Housing Data in Small Island Developing States. World Bank, Washington, DC, 2022. Creative Commons Attribution CC BY 4.0.
  • [6] Jeremy Castagno and Ella Atkins. Roof shape classification from LiDAR and satellite image data fusion using supervised learning. Sensors, 18(11):3960, 2018.
  • [7] Mauro Dalla Mura, Saurabh Prasad, Fabio Pacifici, Paulo Gamba, Jocelyn Chanussot, and Jón Atli Benediktsson. Challenges and opportunities of multimodality and data fusion in remote sensing. Proceedings of the IEEE, 103(9):1585–1601, 2015.
  • [8] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
  • [9] Eike Jens Hoffmann, Yuanyuan Wang, Martin Werner, Jian Kang, and Xiao Xiang Zhu. Model fusion for building type classification from aerial and street view images. Remote Sensing, 11(11):1259, 2019.
  • [10] Xingliang Huang, Libo Ren, Chenglong Liu, Yixuan Wang, Hongfeng Yu, Michael Schmitt, Ronny Hänsch, Xian Sun, Hai Huang, and Helmut Mayer. Urban building classification (UBC) - a dataset for individual building detection and classification from satellite imagery. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1413–1421, 2022.
  • [11] Navjot Kaur, Cheng-Chun Lee, Ali Mostafavi, and Ali Mahdavi-Amiri. Large-scale building damage assessment using a novel hierarchical transformer architecture on satellite images. Computer-Aided Civil and Infrastructure Engineering, 2023.
  • [12] Jonguk Kim, Hyansu Bae, Hyunwoo Kang, and Suk Gyu Lee. CNN algorithm for roof detection and material classification in satellite images. Electronics, 10(13):1592, 2021.
  • [13] Vasileios Linardos, Maria Drakaki, Panagiotis Tzionas, and Yannis L Karnavas. Machine learning in disaster management: Recent developments in methods and applications. Machine Learning and Knowledge Extraction, 4(2):446–473, 2022.
  • [14] Aydan Menderes, Arzu Erener, and Gülcan Sarp. Automatic detection of damaged buildings after earthquake hazard by using remote sensing and information technologies. Procedia Earth and Planetary Science, 15:257–262, 2015.
  • [15] Hiroyuki Miura, Tomohiro Aridome, and Masashi Matsuoka. Deep learning-based identification of collapsed, non-collapsed and blue tarp-covered buildings from post-disaster aerial images. Remote Sensing, 12(12):1924, 2020.
  • [16] Sònia Muñoz and İnci Ötker. Building resilience to natural disasters in the Caribbean requires greater preparedness. https://www.imf.org/en/News/Articles/2018/12/07/NA120718-Building-Resilience-to-Natural-Disasters-in-Caribbean-Requires-Greater-Preparedness, December 2018.
  • [17] Shohei Naito, Hiromitsu Tomozawa, Yuji Mori, Takeshi Nagata, Naokazu Monma, Hiromitsu Nakamura, Hiroyuki Fujiwara, and Gaku Shoji. Building-damage detection method based on machine learning utilizing aerial photographs of the Kumamoto earthquake. Earthquake Spectra, 36(3):1166–1187, 2020.
  • [18] Masayu Norman, Helmi Zulhaidi Mohd Shafri, Shattri Mansor, Badronnisa Yusuf, and Nurul Ain Wahida Mohd Radzali. Fusion of multispectral imagery and LiDAR data for roofing materials and roofing surface conditions assessment. International Journal of Remote Sensing, 41(18):7090–7111, 2020.
  • [19] Government of the Commonwealth of Dominica. Post-Disaster Needs Assessment Hurricane Maria September 18, 2017. https://www.gfdrr.org/en/dominica-hurricane-maria-post-disaster-assessment-and-support-recovery-planning, September 2017.
  • [20] Government of the Commonwealth of Dominica. Dominica climate resilience and recovery plan 2020–2030. https://odm.gov.dm/wp-content/uploads/2022/02/CRRP-Final-042020.pdf, 2020.
  • [21] Naim Ölçer, Didem Ölçer, and Emre Sümer. Roof type classification with innovative machine learning approaches. PeerJ Computer Science, 2023.
  • [22] Tahmineh Partovi, Friedrich Fraundorfer, Seyedmajid Azimi, Dimitrios Marmanis, and Peter Reinartz. Roof type selection based on patch-based classification using deep learning for high resolution satellite imagery. International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences-ISPRS Archives, 42(W1):653–657, 2017.
  • [23] Vitor Silva, Svetlana Brzev, Charles Scawthorn, Catalina Yepes, Jamal Dabbeek, and Helen Crowley. A building classification system for multi-hazard risk assessment. International Journal of Disaster Risk Science, 13(2):161–177, 2022.
  • [24] Robert Soden, Dennis Wagenaar, and Annegien Tijssen. Responsible artificial intelligence for disaster risk management. https://opendri.org/wp-content/uploads/2021/06/ResponsibleAI4DRM.pdf.
  • [25] Roman A Solovyev. Roof material classification from aerial imagery. Optical Memory and Neural Networks, 29:198–208, 2020.
  • [26] Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2818–2826, 2016.
  • [27] Mingxing Tan and Quoc Le. Efficientnet: Rethinking model scaling for convolutional neural networks. In International conference on machine learning, pages 6105–6114. PMLR, 2019.
  • [28] Tinka Valentijn, Jacopo Margutti, Marc van den Homberg, and Jorma Laaksonen. Multi-hazard and spatial transferability of a CNN for automated building damage assessment. Remote Sensing, 12(17):2839, 2020.
  • [29] Chaofeng Wang, Qian Yu, Kincho H Law, Frank McKenna, X Yu Stella, Ertugrul Taciroglu, Adam Zsarnóczay, Wael Elhaddad, and Barbaros Cetiner. Machine learning-based regional scale intelligent modeling of building information for natural hazard risk management. Automation in Construction, 122:103474, 2021.
  • [30] Jinglin Xu, Feng Zeng, Wen Liu, and Toru Takahashi. Damage detection and level classification of roof damage after Typhoon Faxai based on aerial photos and deep learning. Applied Sciences, 12(10):4912, 2022.
  • [31] Jiadi Yin, Ping Fu, Nicholas AS Hamm, Zhichao Li, Nanshan You, Yingli He, Ali Cheshmehzangi, and Jinwei Dong. Decision-level and feature-level integration of remote sensing and geospatial big data for urban land use mapping. Remote Sensing, 13(8):1579, 2021.

Appendix

A Digital Earth for a Resilient Caribbean

The Digital Earth Project for Resilient Housing and Infrastructure in the Caribbean aims to support the development, scaling, and standardization of EO solutions for disaster risk management and comprises three main components:

  1. Capacity building. The project supports the training and upskilling of government officials, local community members, researchers, and other key stakeholders in geospatial data analytics, ML modeling, community mapping, and geospatial data management.

  2. Generation of critical exposure datasets. The project strives to provide operational support for the generation, integration, and maintenance of new baseline exposure data layers through a combination of advanced technologies (e.g. AI and EO-based solutions) and local interventions (e.g. field surveys, participatory mapping).

  3. Knowledge exchange and dissemination. The project aims to promote the methodologies and results to other Caribbean regions through information exchange and knowledge-sharing sessions.

B LiDAR Data Pre-processing

The LiDAR-derived data products are provided in the form of the digital surface model (DSM) and digital terrain model (DTM). DSMs are elevation models that represent the topography of the earth's surface, including man-made and natural features such as treetops, power lines, and buildings above bare ground. DTMs represent the bare-earth model, absent of any natural or human-made features. The normalized DSM (nDSM) is calculated as the difference between the DSM and the DTM and represents the relative height of features above the ground surface. nDSMs have been used in previous studies to improve the detection of buildings and building damage [4, 14]. In our study, we use the single-channel nDSMs as input to CNNs for roof type and roof material classification.
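
A minimal sketch of this computation, assuming co-registered single-band DSM and DTM rasters on the same grid (file names are placeholders):

import numpy as np
import rasterio

with rasterio.open("dsm.tif") as dsm_src, rasterio.open("dtm.tif") as dtm_src:
    dsm, dtm = dsm_src.read(1), dtm_src.read(1)
    profile = dsm_src.profile  # reuse CRS, transform, and dtype for the output

# Height above ground; clamping small negative residuals to zero is our
# assumption, not a step stated in the paper.
ndsm = np.clip(dsm - dtm, a_min=0, a_max=None)

with rasterio.open("ndsm.tif", "w", **profile) as dst:
    dst.write(ndsm, 1)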

C CNN Model Configuration

All models were pre-trained on the ImageNet dataset and fine-tuned using cross-entropy loss. For models trained using LiDAR-derived nDSM images, the initial convolutional layer was modified to accept single-channel input images, and its parameters were initialized with the mean of the corresponding weights from the pretrained model. Input images are zero-padded to a square whose side equals the maximum of the image width and height, then resized to 224 x 224 px for ResNet50 and EfficientNet-B0 and 299 x 299 px for Inceptionv3. We implement data augmentation in the form of random vertical and horizontal image flips and random rotations between -90° and 90°. For model configuration, we used the Adam optimizer and set the batch size to 32, with an initial learning rate of 1e-5. For the ResNet50 model, a dropout layer that sets input units to zero with a probability of 0.5 was added before the fully connected (FC) layer as a regularization mechanism. All models were trained for a maximum of 30 epochs, with a learning rate scheduler that reduces the learning rate by a factor of 0.1 after 7 epochs of no improvement.
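
The single-channel adaptation and training setup can be sketched in PyTorch as follows; this is an illustrative reconstruction rather than the exact training code, shown here for ResNet50 on the four roof-type classes:

import torch
import torch.nn as nn
from torchvision import models

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)

# Replace the 3-channel stem with a 1-channel version whose weights are the
# mean of the pretrained RGB filters.
old_conv = model.conv1
new_conv = nn.Conv2d(1, old_conv.out_channels, kernel_size=old_conv.kernel_size,
                     stride=old_conv.stride, padding=old_conv.padding, bias=False)
with torch.no_grad():
    new_conv.weight.copy_(old_conv.weight.mean(dim=1, keepdim=True))
model.conv1 = new_conv

# Dropout (p=0.5) before the FC head, as described above; 4 roof-type classes.
model.fc = nn.Sequential(nn.Dropout(p=0.5), nn.Linear(model.fc.in_features, 4))

optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.1, patience=7)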

D Data Fusion

Figure 4: Illustrations of (a) feature-level data fusion, (b) decision-level data fusion (mean of softmax layers), and (c) decision-level data fusion (concatenated softmax layers).

Based on the data fusion techniques described in Section 5.2, we extract two types of vector layers from the CNN models trained individually on each data source: (1) for feature-level data fusion, deep feature embeddings of size 2048 × 1 for ResNet50 and Inceptionv3 and 1280 × 1 for EfficientNet-B0 are extracted from the global average pooling layer of each CNN model and concatenated to form a single vector, and (2) for decision-level integration, a vector of probabilities is extracted from the softmax layer of each CNN and combined into a single vector. The combined vectors are then fed as input into an ML classifier for the downstream tasks of roof type and roof material classification.
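
A hedged sketch of the embedding extraction, assuming trained PyTorch models and data loaders (`rgb_model`, `rgb_loader`, etc. are illustrative names of ours):

import numpy as np
import torch
import torch.nn as nn

def gap_embeddings(model, loader, device="cpu"):
    # Replace the classification head with an identity so the forward pass
    # returns the global-average-pooled features (note: mutates `model`).
    # ResNet50 and Inceptionv3 expose the head as `.fc`; EfficientNet-B0
    # uses `.classifier` instead.
    model.fc = nn.Identity()
    model.eval().to(device)
    feats = []
    with torch.no_grad():
        for images, _ in loader:
            feats.append(model(images.to(device)).cpu().numpy())
    return np.concatenate(feats)

fused = np.concatenate(
    [gap_embeddings(rgb_model, rgb_loader), gap_embeddings(lidar_model, lidar_loader)],
    axis=1,
)  # 2048 + 2048 columns for the ResNet50 (RGB) + Inceptionv3 (LiDAR) pairing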

For simplicity, we fuse the vectors (i.e. feature embeddings or softmax probabilities) extracted from the RGB and LiDAR models that achieve the best performance in terms of the F1 score for each of the classification tasks. Specifically, for roof type classification, we fuse vectors extracted from the ResNet50 model trained on RGB images (F1 score: 89.04%) and the Inceptionv3 model trained on LiDAR images (F1 score: 90.21%). Meanwhile, for roof material classification, we fuse the vectors of the EfficientNet-B0 model trained on RGB images (F1 score: 90.69%) and the Inceptionv3 model trained on LiDAR images (F1 score: 64.01%).

E Downstream ML Classifiers

For the downstream classification task, we experiment with different ML methods, including LR, RF, and linear SVMs. We implemented hyperparameter tuning on the training set using a 5-fold cross-validation (CV) strategy, specifically (1) grid search CV for LR and SVM and (2) randomized search CV for RF, with 30 parameter settings sampled for each run. For LR and SVM, our search space includes the norm of the penalty (L1 and L2) and the regularization parameter C (values include 0.001, 0.01, 0.1, and 1). For LR, we also experimented with two different types of solvers (LBFGS and LIBLINEAR). For RF, our search space includes the number of trees (values between 100 and 1000 in increments of 50), the maximum depth of trees (values between 3 and 10), and the criterion measuring the quality of the split (gini or entropy). For all models, we tried different scaling techniques, including min-max scaling, standard scaling, and robust scaling as implemented in scikit-learn.
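
An illustrative scikit-learn reconstruction of the RF randomized search under these settings (the exact pipelines are not released; `X_train` and `y_train` denote the fused embeddings and labels):

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

pipe = Pipeline([("scale", MinMaxScaler()), ("rf", RandomForestClassifier())])
param_dist = {
    "rf__n_estimators": list(range(100, 1001, 50)),
    "rf__max_depth": list(range(3, 11)),
    "rf__criterion": ["gini", "entropy"],
}
search = RandomizedSearchCV(pipe, param_dist, n_iter=30, cv=5,
                            scoring="f1_macro", random_state=42)
search.fit(X_train, y_train)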

For roof type classification, the best F1 score of 93% is achieved by an LR model with an L2 penalty term and a regularization parameter C of 0.1. The model fuses the concatenated feature embeddings (min-max scaled) of two models: a ResNet50 model trained on RGB and an Inceptionv3 model trained on LiDAR. For roof material classification, the best F1 score of 92% is achieved by an RF model with 450 estimators, a maximum depth of 9, and an entropy criterion. The RF model is trained on the combined feature embeddings (min-max scaled) of an EfficientNet-B0 model trained on RGB and an Inceptionv3 model trained on LiDAR data.

F Performance Metrics

We compute precision, recall, F1 score, and accuracy using standard definitions as follows:

\[ \text{Precision} = \frac{\text{TP}}{\text{TP}+\text{FP}} \quad (1) \]
\[ \text{Recall} = \frac{\text{TP}}{\text{TP}+\text{FN}} \quad (2) \]
\[ \text{F1 score} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision}+\text{Recall}} \quad (3) \]
\[ \text{Accuracy} = \frac{\text{TP}+\text{TN}}{\text{TP}+\text{TN}+\text{FP}+\text{FN}} \quad (4) \]

where TP is the number of true positives, TN is the number of true negatives, FP the number of false positives, and FN the number of false negatives. The precision, recall, and F1 score metrics are computed as the unweighted mean of the values calculated per class (i.e. macro-averaged).
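
With macro averaging, these metrics correspond directly to scikit-learn's implementations, e.g.:

from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# `y_true` and `y_pred` are the ground truth and predicted class labels.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro"
)
accuracy = accuracy_score(y_true, y_pred)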