\backgroundsetup

position=current page.east, angle=-90, nodeanchor=east, vshift=-5mm, hshift = 92mm, opacity=1, scale=1.35, contents=Non-peer reviewed. This work has been submitted to IEEE Transactions on Geoscience and Remote Sensing for possible publication.

This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.

Improving the Thermal Infrared Monitoring of Volcanoes: A Deep Learning Approach for Intermittent Image Series

Jeremy Diaz, Guido Cervone, and Christelle Wauthier J. Diaz is currently with the United States Geological Survey Water Mission Area. At the time of this work, he was with the Department of Geography and the Institute of Computational and Data Sciences, The Pennsylvania State University, University Park, PA, 16801 USA (e-mail: [email protected])G. Cervone is with the Department of Geography and the Institute of Computational and Data Sciences, The Pennsylvania State University, University Park, PA, 16801 USAC. Wauthier is with the Department of Geosciences and the Institute of Computational and Data Sciences, The Pennsylvania State University, University Park, PA, 16801 USAManuscript received April 19, 2005; revised August 26, 2015.

Abstract

Active volcanoes are globally distributed and pose societal risks at multiple geographic scales, ranging from local hazards to regional/international disruptions. Many volcanoes do not have continuous ground monitoring networks; meaning that satellite observations provide the only record of volcanic behavior and unrest. Among these remote sensing observations, thermal imagery is inspected daily by volcanic observatories for examining the early signs, onset, and evolution of eruptive activity. However, thermal scenes are often obstructed by clouds, meaning that forecasts must be made off image sequences whose scenes are only usable intermittently through time. Here, we explore forecasting this thermal data stream from a deep learning perspective using existing architectures that model sequences with varying spatiotemporal considerations. Additionally, we propose and evaluate new architectures that explicitly model intermittent image sequences. Using ASTER Kinetic Surface Temperature data for $9$ volcanoes between $1999$ and $2020$ , we found that a proposed architecture (ConvLSTM + Time-LSTM + U-Net) forecasts volcanic temperature imagery with the lowest RMSE ( $4.164^{\circ}$ C, other methods: $4.217-5.291^{\circ}$ C). Additionally, we examined performance on multiple time series derived from the thermal imagery and the effect of training with data from singular volcanoes. Ultimately, we found that models with the lowest RMSE on forecasting imagery did not possess the lowest RMSE on recreating time series derived from that imagery and that training with individual volcanoes generally worsened performance relative to a multi-volcano data set. This work highlights the potential of data-driven deep learning models for volcanic unrest forecasting while revealing the need for carefully constructed optimization targets.

Index Terms:

Global volcano monitoring, eruption forecasting, deep learning (DL), remote sensing, thermal infrared, ASTER, neural networks, LSTM, convolutional neural network (CNN), data-driven architecture, irregular time series.

I Introduction

Forecasting volcanic unrest has been a major scientific goal for at least 100 years [1]. This is important because over 800 million people live within 100 kilometers of a Holocene volcano [2]. Beyond direct volcanic hazards, volcanoes can perturb agriculture, infrastructure, the environment, as well as (inter)national business and travel [3] during the course of an eruption or during inter-eruptive periods. Short-term (e.g., days to months) volcanic unrest forecasts are made by identifying patterns in monitoring observational time series (e.g. seismic, deformation, thermal, emissions). Improving these forecasts has been identified as a grand challenge within the field of volcanology [4]. Ultimately, this task is very challenging because there are various methods, expertise, and/or instrumentation to monitor volcanoes on the ground, as well as remotely from the air and space (see [4] section 1.4 for a light overview, see [5] for a more detailed remote sensing perspective). As a result, reliable and timely monitoring and, thus, the ability to forecast unrest accurately is costly and absent from many active volcanoes [3].

Short-term forecasting of volcanic unrest is conducted through a range of domain and statistical approaches, both quantitative and qualitative. At the highest level, these forecasts are generated by expert committees at observatories [6] and described through event trees which communicate all possible outcomes with the associated probabilities at each stage of the outcome [7]. Likewise, the very few examples of quantitative methods which directly output broadly prescriptive forecasts (i.e., many metrics or probabilities) rely upon event trees as input data [8, 9]. These event tree approaches are very promising but also very demanding, requiring a plethora of expert knowledge, monitoring streams, and volcano-specific knowledge. On the other hand, quantitative methods are often more limited in scope, instead being applied to individual time series of monitoring data to aid in the identification of early warnings of change (e.g., values above background, changes in pattern), and this exercise is often considered forecasting, see [10, 11]. Examples of this approach that are heavily influenced by domain knowledge include modeling seismic signals from a material failure perspective [12, 13] and monitoring gas ratios [14] and lava discharge rates [15] from an equilibrium/disequilibrium perspective. On the other hand, more purely quantitative approaches include the use of simple thresholds [1], examination of current events among their broader distributions [16], and various modeling projects that have made use of Poisson point processes [17, 18], variograms [19, 20], Markov chain Monte Carlo methods [21], and ARIMA models [22, 23].

With global volcanic monitoring in mind, there are many difficulties and opportunities with this existing toolkit of forecasting options. Particularly, individual forecasting efforts generally differ in the variety and historical reach of their available data. Committee-based and event tree approaches are particularly ambitious, but their reliance on many experts makes them inaccessible due to global volcanic monitoring already being financially restrictive. Likewise, quantitative methods that employ domain knowledge often make use of specialized ground equipment (e.g., seismometers, infrared spectrometers) that is lacking at many active volcanoes. Thus, financially accessible solutions which are globally homogeneous are highly appealing. Because of this, satellite remote sensing has been thoroughly studied [24] and widely adopted to both establish and supplement monitoring efforts [25, 26, 27, 28]. Currently, multiple online tools exist to aid global volcanic monitoring via remote sensing; examples of these include MODVOLC [28, 29], MIROVA (Middle InfraRed Observation of Volcanic Activity) [30], and MOUNTS (Monitoring Unrest From Space) [31]. Thermal monitoring has seen particular operational success due to the relative ease of access, processing, and interpretation. This imagery is often provided for viewing while accompanied by time series of higher-level information, such as the maximum temperature above the background (i.e., “excess temperature”) [28], the number of hotspots [31], the maximum distance of a hotspot from the summit [30], and/or measures of radiance [30].

Thermal remote sensing imagery is therefore a promising data stream for global volcanic monitoring and forecasting. However, for a predictive image task, the routine statistical models mentioned earlier have multiple weaknesses: notably, they make rigid assumptions about the functional relationships of data and the distributions underlying those data, do not necessarily consider spatially correlated data, and, perhaps most importantly, are relatively simple—exploring a parameter space much smaller than current computational technology allows. Meanwhile, since 2012 [32], neural networks have seen an explosive growth in popularity (including within the field of remote sensing [33]) due to their incredible performance in making image-based predictions by utilizing millions of optimized parameters and lots of data. Likewise, machine/deep learning has been applied in the field of volcanology for various detection and classification tasks (particularly with seismic signals and Interferometric Synthetic Aperture - InSAR - geodetic data) [34, 35, 36, 37, 31, 38], but the task of forecasting remains poorly explored [33, 1]. However, in the existing examples of short-term volcanic forecasting with machine learning [39, 40, 41], neural networks have more commonly excelled [39, 40] and among neural networks, more sophisticated architectures have proven beneficial [40]. Lastly, while it is undesirable that neural networks are infamous for having uninterpretable mappings of input data to output targets, they are additionally favorable for this cloud-susceptible task because they have a documented history for performing well with noisy and missing data [42].

In summary, improving short-term volcanic forecasts are valuable for society worldwide. There are many ways to produce these forecasts, but most cannot be implemented globally due to the costs of experts and specialized ground equipment. There are, however, many globally accessible remote sensing data streams; particularly, satellite thermal imagery has been widely adopted. Forecasting remotely sensed imagery—especially imagery that can be inhibited by atmospheric conditions—is not well explored. Here, we draw on existing and proposed models (with an emphasis on neural networks) to explore and evaluate the task of forecasting volcanic temperature imagery. Specific experiments include:

•

Which model (among naive methods, autoregression, and numerous neural networks) can best forecast volcanic AST_08 image sequences?
•

How do derived time series from these models compare to real observations?
•

How does performance on individual volcanoes change when the training data set is limited to individual volcanoes instead of all 9 in the data set?

II Data and Methods

II-A Data Set

Nine active volcanoes were targeted in this project: Mount Erebus (Antarctica), Erta Ale (Ethiopia), Mount Etna (Italy), Kīlauea (Hawaii), Masaya (Nicaragua), Nyamulagira (Democratic Republic of Congo), Nyiragongo (Democratic Republic of Congo), Pacaya (Guatemala), and the distant Pu’u ’Ō’ō cone of Kīlauea (Hawai’i, US). These volcanoes were chosen for their active nature and/or because they have displayed lava lakes (a superficial and persistent thermal phenomenon) for at least some percentage of the data record. Among these, the Hawaiian volcanoes and Mount Etna are among the most studied and monitored in the world; Erebus, Masaya, and Pacaya have some degree of ground monitoring; Nyamulagira and Nyiragongo have decommissioned ground monitoring efforts due to civil unrest in the region; and, Erta Ale does not have regular ground monitoring [2].

Among thermal monitoring options, nighttime imagery is preferred in the field of volcanology because it is generally less cloudy and decreases the effects of topography and solar heating, “allowing better delineation of low-temperature geothermal anomalies” [43, p. 18]. In general, many platforms do not make nighttime thermal imagery easily accessible (e.g., Landsat has special request targets [44]); however, the $>20$ -year NASA Terra platform which hosts the MODIS and ASTER instruments are an exception to this. MODIS is very popular because it provides a very high temporal sampling rate (four images per day [45]) which can aid in the rapid detection of changes [29]. ASTER, on the other hand, provides a 10-fold increase in spatial resolution which allows the less frequent detection of subtler changes months in advance [46, 47, 5]. Nighttime thermal data from ASTER (L1T) is available for bulk download in the form of 5 bands of radiance values (8-12 $\mu$ m) from the online DAAC2Disk Utility, courtesy of the NASA Land Processes Distributed Active Archive Center (LP DAAC), USGS Earth Resources Observation and Science (EROS) Center, Sioux Falls, South Dakota, https://lpdaac.usgs.gov/tools/daac2diskscripts/. The level-2 Surface Kinetic Temperature product (AST_08) [48] which is more thoroughly corrected for the atmosphere and with radiance converted to temperature is available for order from the NASA LP DAAC via NASA’s Earthdata Search (https://search.earthdata.nasa.gov/search?q=ast_08).

First, all L1T data for each volcano between the dates December 18, 1999, and May 1, 2020, were downloaded. Initial inspection revealed a poor correlation between viable scenes and cloud cover metadata, so all scenes were manually inspected for viability in the data set. Upon manual inspection, each scene was denoted viable, uncertain, or nonviable (Figure 1). Due to the large number of nonviable scenes, uncertain scenes were ultimately included in our data set; these were scenes with partial cloud cover or scenes with large extents of cloud cover, through which, thermal features were still observable. The final set of viable and uncertain scenes were then ordered as level-2 AST_08 data for the remainder of the project. The distribution of time between these scenes is presented in Figure 2.

Refer to caption — Figure 1: Manual scene inspection results by volcano. Scenes are from the beginning of ASTER record to May 1, 2020.

II-B Data Processing

The AST_08 data contained missing values representing areas outside the sensor’s field of view as well as pixels amid very volcanically active areas (see Figure 3, top row) – with sources referencing the latter as recovery pixels [49]. In the level-2 data, the latter appears as counter-intuitive dark spots within a bright hotspot, while the level-1 data does not contain these missing pixel observations. Since locations outside the field of view will have valid observations in at most two cardinal directions, pixels with valid values in at least three cardinal directions were filled with nearest-neighbor interpolation; qualitatively, this better resembled level-1 data and did not assign improbable ”cold” pixels islands in the middle of large lava lakes/flows.

To narrow the focus of the image forecasting task, a $96\times 96$ pixel subset of the scene was used in the data set; with a 90-meter spatial resolution, this translates to an $8.64\times 8.64$ kilometer subset. This precise size is somewhat arbitrary, but, qualitatively, it was large enough to observe significant portions of large lava flows while not being too large to neglect small lava lakes. The subset was centered on the latitude and longitude of the volcano summit. Converting the scenes’ coordinate reference systems to latitude and longitude for summit locating and subsetting introduced geometric distortions that were visually apparent for Mount Erebus (located in Antarctica); however, these relics were not problematic for thermal interpretation.

In volcanology studies, temperature measurements are often examined relative to the environmental background (i.e., the excessive temperature of the volcanic feature relative to the surrounding environment); as such, the AST_08 scenes were processed this way (see Figure 3 for steps up to this point). Volcano studies often manually identify a cloud-free, representative part of the environment to determine the environmental background temperature. To automate this process in a robust way, the following process was used to compute the average environmental background temperature: first, a $10\times 10$ pixel subset was extracted from each corner of the reduced image; these corners were then considered appropriate if less than 10% of their pixel values were outside the sensor’s field of view (i.e., missing values). The mean was then calculated for each subset, and the mean of those means was used as the environmental background temperature. Mount Erebus’s location along the orbital path led to unique viewing geometries where this method would fail to identify any valid corners; in those cases, the $10\times 10$ pixel subsets were rotated around the perimeter of the image to identify suitable areas (see Figure 4). Once the environmental background temperature was computed, it was subtracted from every pixel value. Notably, in addition to being consistent with volcanology studies, this removes the task of forecasting the seasonality of the background environment.

After the above, there remained missing values from the $96\times 96$ scene subsets that represented areas outside the sensor’s field of view. These pixels were then filled with the values from the previous scene and the date of those previous scenes was noted. When this occurred for the first scene acquired for a volcano (i.e., no previous scenes), the missing values were filled with nearest-neighbor interpolation (Figure 5). Lastly, prior to training the data was scaled between 0 and 1 (i.e., $x_{new}=(x_{original}-x_{min})/(x_{max}-x_{min})$ ) to facilitate training convergence.

II-C Forecasting Configuration

The forecasting task here uses the previous $n$ scenes to forecast the next scene. For most models, this is done by treating each pixel as an independent observation, but for convolution-based neural networks, this is done at the image-level. The exact choice of $n$ is not straightforward, but realistically, it must be large enough to make use of past information and small enough to not magnify computational and storage burdens. A popular method in volcanology studies is to examine the maximum excess temperature through time, so these time series were extracted from the L1T data (prior to AST_08 retrieval) and their autocorrelation structure was examined for each volcano. The average of the largest significant autocorrelation lag was $6.43$ , so $n$ was set to $6$ .

In the case of volcano-specific training sets (see Results and Discussion, section C), the largest significant autocorrelation lag for each volcano was used with a minimum accepted value of $3$ and a maximum accepted value of $10$ previous scenes.

II-D Models

II-D1 Baseline predictive methods

ARIMA models are common time series models (see [50] for a comprehensive coverage) with published applications in volcanic forecasting [22, 23]. The ARIMA acronym stands for “autoregressive integrated moving average” because it combines the simpler autoregressive (AR) model that predicts $x_{t}$ based on the linear combination of its previous $\{x_{t-1},\dots,x_{t-p}\}$ values and the moving average (MA) model that defines $x_{t}$ based on the linear combination of the past $q$ random perturbations to the process mean. The “I/integrated” part of the name refers to the process of $d$ -order differencing, a method for approximating constant mean and, to a lesser extent, autocovariance, through time; the simplest and most common example of this is first-order differencing which is concerned with modeling the change in $x_{t}$ ( $\Delta x_{t}=x_{t}-x_{t-1}$ ) instead of $x_{t}$ directly. Thus, an ARIMA model is defined by 3 parameters: $p$ , $d$ , and $q$ .

This work examined two baseline models related to the commonly used ARIMA model—more precisely, with both being variants of the AR model. All neural network architectures examined in this work utilize $6$ previous scenes to forecast the next scene (unless stated otherwise), so the AR( $p=6$ ) model is an intuitive baseline analog. This can be expressed mathematically as:

x_{t}=\sum_{i=1}^{6}\phi_{i}x_{t-i}

(1)

Likewise, scientifically, the move naive forecast is to assume no change from the most recent scenes, so this will also be examined by deploying an AR( $p=1$ ) model with $\phi_{1}=1$ .

Lastly, a final baseline model is inspired from the domain’s data processing—predicting all $0$ values as the forecast. In light of the data processing, this forecast can be interpreted as predicting a scene of homogeneous temperature with no pixels exceeding the background average.

II-D2 Existing neural network architectures

Among neural network architectures, the long short-term memory (LSTM) model is very popular. Like its preceding recurrent neural network (RNN), this model is designed for sequential (rather than independent) data, and it expanded upon its predecessor by introducing gate units that are trained to learn the optimal flow of new data and retention of prior information. The equations of this model are given below:

i_{t}=\sigma(W_{xi}x_{t}+W_{hi}h_{t-1}+b_{i})

(2)

f_{t}=\sigma(W_{xf}x_{t}+W_{hf}h_{t-1}+b_{f})

(3)

c_{t}=f_{t}\odot c_{t-1}+i_{t}\odot tanh(W_{xc}x_{t}+W_{hc}h_{t-1}+b_{c})

(4)

o_{t}=\sigma(W_{xo}x_{t}+W_{ho}h_{t-1}+b_{o})

(5)

h_{t}=o_{t}\odot tanh(c_{t})

(6)

Here, $\circ$ denotes the Hadamard product for element-wise multiplication of two matrices, $\sigma$ represents the sigmoid activation function, lower case variables indicate scalars or vectors, and upper case variables indicate matrices. Likewise, these notation rules are adapted throughout the section, modifying original equations as needed.

The traditional LSTM was modified by Shi et al. [51] to perform precipitation nowcasting on radar maps by changing both the inputs and the hidden states from scalars/vectors to matrices and applying the learned weights via the convolution ( $\ast$ ) or “sliding window” operator in Equations 2-5:

I_{t}=\sigma(W_{xi}\ast X_{t}+W_{hi}\ast H_{t-1}+b_{I})

(7)

F_{t}=\sigma(W_{xf}\ast X_{t}+W_{hf}\ast H_{t-1}+b_{F})

(8)

C_{t}=F_{t}\odot C_{t-1}+I_{t}\odot tanh(W_{xc}\ast X_{t}+W_{hc}\ast H_{t-1}+b_{C})

(9)

O_{t}=\sigma(W_{xo}\ast X_{t}+W_{ho}\ast H_{t-1}+b_{O})

(10)

H_{t}=O_{t}\odot tanh(C_{t})

(11)

The result is the now-popular convolutional LSTM with proven results on scientific raster time series, and the architecture has since been used to predict future temperature [52], precipitation [53], and land cover [54].

Separately, Zhu et al. [55] incorporated two time gates to allow the “notion of interval” or time between observations to improve content recommendation services (e.g., music and scientific literature). These time gates serve as further learned modulators to the flow of short-term and long-term memory, yielding the Time-LSTM:

t1_{t}=\sigma(W_{x1}x_{t}+\sigma(W_{t1}\Delta t_{t})+b_{1})

(12)

\mbox{such that }W_{t1}\leq 0

t2_{t}=\sigma(W_{x2}x_{t}+\sigma(W_{t2}\Delta t_{t})+b_{2})

(13)

\widetilde{c_{t}}=(1-i_{t}\odot t1_{t})\odot c_{t-1}+i_{t}\odot t1_{t}\odot\sigma(W_{xc}x_{t}+W_{hc}h_{t-1}+b_{c})

(14)

c_{t}=(1-i_{t})\odot c_{t-1}+i_{t}\odot t2_{t}\odot\sigma(W_{xc}x_{t}+W_{hc}h_{t-1}+b_{c})

(15)

h_{t}=o_{t}\odot tanh(\widetilde{c_{t}})

(16)

Similarly, Baytas et al. [56] developed the Time-Aware LSTM that analyzed irregularly timed patient records to predict patient groups with similar disease progression pathways. Instead of using further nonlinear and learned gates, however, the Time-Aware LSTM simply reallocates the memory contributions of long-elapsed inputs from short-term to long-term memory proportional to the time elapsed via a monotonically non-increasing function $g$ :

c^{S}_{t-1}=tanh(W_{d}c_{t-1}+b_{d})

(17)

\hat{c}^{S}_{t-1}=c^{S}_{t-1}\odot g(\Delta_{t})

(18)

c^{T}_{t-1}=c_{t-1}-c^{S}_{t-1}

(19)

c^{\ast}_{t-1}=c^{T}_{t-1}+\hat{c}^{S}_{t-1}

(20)

\widetilde{c}=tanh(W_{xc}x_{t}+W_{hc}h_{t-1}+b_{c})

(21)

c_{t}=f_{t}\odot c^{\ast}_{t-1}+i_{t}\odot\widetilde{c}

(22)

Thus, the Time-LSTM and Time-Aware LSTM largely modify and expand Equation (4) of the original LSTM ¹¹1It is worth noting that the Time-LSTM defines $\Delta_{t}$ as the time following an observation ( $t_{i+1}-t_{i}$ ), while the Time-Aware LSTM defines $\Delta_{t}$ as the time preceding an observation ( $t_{i}-t_{i-1}$ )..

II-D3 Proposed neural network architectures

There does, however, remain the opportunity to modify these equations to include the composite spatial handling of the convolutional LSTM and the irregular temporal handling of the Time-LSTM or Time-Aware LSTM, and this is somewhat straightforward due to their shared origin and minimal modifications of the traditional LSTM. I express these modifications below for (1) Time-LSTM + ConvLSTM:

I_{t}=\sigma(W_{XI}*X_{t}+W_{HI}*H_{t-1}+b_{I})

(23)

T1_{t}=\sigma(W_{XT1}*X_{t}+\sigma(W_{TT1}*\Delta T_{t})+b_{T1})

(24)

\mbox{such that }W_{TT1}\leq 0

T2_{t}=\sigma(W_{XT2}*X_{t}+\sigma(W_{TT2}*\Delta T_{t})+b_{T2})

(25)

\begin{split}\widetilde{C_{t}}=(1-I_{t}\circ T1_{t})\circ C_{t-1}+I_{t}\circ T1_{t}\circ\\ \sigma(W_{XC}*X_{t}+W_{HC}*H_{t-1}+b_{C})\end{split}

(26)

\begin{split}C_{t}=(1-I_{t})\circ C_{t-1}+I_{t}\circ T2_{t}\circ\\ \sigma(W_{XC}*X_{t}+W_{HC}*H_{t-1}+b_{C})\end{split}

(27)

O_{t}=\sigma(W_{XO}*X_{t}+W_{TO}*\Delta T_{t}+W_{HO}*H_{t-1}+b_{O})

(28)

H_{t}=O_{t}\circ\sigma(\widetilde{C_{t}})

(29)

and (2) Time-Aware LSTM $+$ ConvLSTM:

I_{t}=\sigma(W_{XI}*X_{t}+W_{HI}*H_{t-1}+b_{I})

(30)

F_{t}=\sigma(W_{XF}*X_{t}+W_{HF}*H_{t-1}+b_{F})

(31)

O_{t}=\sigma(W_{XO}*X_{t}+W_{HO}*H_{t-1}+b_{O})

(32)

C^{S}_{t-1}=tanh(W_{D}*C_{t-1}+b_{D})

(33)

\hat{C}^{S}_{t-1}=C^{S}_{t-1}\circ g(\Delta T_{t})

(34)

C^{T}_{t-1}=C_{t-1}-C^{S}_{t-1}

(35)

C^{*}_{t-1}=C^{T}_{t-1}+\hat{C}^{S}_{t-1}

(36)

\widetilde{C}=tanh(W_{XC}*X_{t}+W_{HC}*H_{t-1}+b_{C})

(37)

C_{t}=F_{t}\circ C^{*}_{t-1}+I_{t}\circ\widetilde{C}

(38)

H_{t}=O_{t}\circ tanh(C_{t})

(39)

The above equations are minimally modified from their original forms, but, notably, the use of peephole connections in the original Time-LSTM were not included because (i) the exclusion yields a simpler model, and (ii) a seminal paper [57] found that removing peephole connections from LSTMs did not significantly reduce performance. In both cases, the convolutional LSTM’s C gate has been modified to incorporate the potentially nonlinear effect of the time interval ( $\Delta T_{t}$ ) between the current observation ( $X_{t}$ ) and the previous state ( $H_{t-1}$ ) on the current state ( $H_{t}$ ). Here, the time interval ( $\Delta T_{t}$ ) is also a spatial structure that allows temporal specification per pixel; this enables the use of partially obstructed images which may be caused by different orbital paths and/or incomplete cloud obstruction. The current state ( $H_{t}$ ) is naively a spatial structure, but it may be further processed to some metric of interest (e.g. maximum pixel value, the number of anomalous pixels) or used as inputs to further neural network layers.

II-D4 U-Net

Manual inspection of early results for the convolution-based models suggested that these models tended to produce overly smoothed predictions. This can be seen as a problem with precise localization, which the U-Net architecture [58] provides a solution for. The U-Net architecture was designed for image segmentation with no intention for sequential tasks. In the original U-Net implementation, a deep convolutional neural network is improved by creating symmetrical hidden dimensions which start relatively low then increase and return to the original specifications (e.g., $64$ , $128$ , $64$ ); as this happens, the spatial size shrinks then expands while propagating the outputs of earlier layers into the later but symmetrical layers (i.e., $64_{1}$ ’s output propagates to $64_{2}$ ’s input) to promote localization and improved resolution. Here, we implement the symmetrically sized hidden dimensions and the propagation of earlier symmetrical layers’ output to the convolution-based model with the lowest validation set RMSE.

II-E Optimization Details

The data set was separated with a $70/15/15$ split chronologically by each volcano into a training, validation, and test set. The training set was used to fit model parameters, the validation set was used to compare different models as well as compare the same model fit with different regularization strengths, and the test set was aside for evaluation of the final, best model. Every model was fit with $5$ different regularization strengths ranging from $0.0001$ to $1$ by a factor of $10$ ; higher values of this parameter help reduce overfitting to the training set by penalizing large model parameters.

Choosing the exact number of hidden layers as well as the number of units/neurons in each layer is not straightforward, with the only general rules being that more parameters increase performance, computational burden, and the need to combat overfitting. The exact hidden layer choices used in this project were taken from the published best results of the existing neural networks used here [51, 55, 56]. All models were trained for 100 epochs with the Adam optimizer [59] to minimize mean squared error. Optimization occurred in varying mini-batch sizes that were determined to maximize GPU utilization.

All models were implemented in the Python programming language via the PyTorch machine learning framework. The hardware used for this project was an HPC cluster with 10 NVIDIA Tesla V100 GPUs with 32 gigabytes of RAM each; all neural network computations used this hardware in parallel.

III Results and Discussion

III-A Experiment 1: Best Model for Forecasting Images

Table 1 presents the validation set RMSE along with important training details for all the models. Among the naive methods, using the most recent scene as the forecast for the next scene led to an RMSE of $5.291^{\circ}$ C while, alternatively, assuming a homogenous temperature landscape (“All Zeros”) led to an RMSE of $4.738^{\circ}$ C. Representing the only learned statistical model, the AR(6) returned an RMSE of $4.322^{\circ}$ C. Among the non-convolutional neural networks, the standard LSTM models returned RMSEs ranging from $4.217$ to $4.265^{\circ}$ C, the Time-LSTM models returned RMSEs ranging from $4.694$ to $4.870^{\circ}$ C, and the Time-Aware LSTM models returned RMSEs ranging from $4.660$ to $4.726^{\circ}$ C. The latter two converged very closely to the All Zeros model, returning low magnitude predictions in the best results. Convolution-based neural networks returned RMSEs between $4.164$ and $4.428^{\circ}$ C with less variance than the non-convolutional neural networks. The lowest RMSE was achieved by the proposed ConvLSTM + Time-LSTM + U-Net architecture. It was approximately $10$ times faster than the lowest RMSE LSTM and required approximately $40$ times fewer parameters.

TABLE I: Validation set performance and training details for all models

Model Type	Model	Hidden Dim.s	RMSE (^∘C)	Number of Parameters	Training Time (hours)
Naive	Last Scene	–	5.291	–	–
	All Zeros	–	4.738	–	–
Statistical	AR(6)	–	4.322	6	0.10
Non-spatial NN	LSTM	128	4.265	67080	0.09
		1028, 512	4.229	7393384	2.05
		2000, 2000	4.217	48032008	4.91
	Time-LSTM	128	4.870	51213	0.29
		1028, 512	4.706	5555825	4.00
		2000, 2000	4.694	36052013	10.10
	Time-Aware LSTM	128	4.660	83594	0.25
		1028, 512	4.719	8713854	4.15
		2000, 2000	4.726	56036010	11.37
Spatial NN	ConvLSTM	64, 64	4.230	447528	0.26
		128, 64, 64	4.428	1335080	0.41
	ConvLSTM + Time-LSTM	64, 64	4.255	341967	0.31
		128, 64, 64	4.250	1013903	0.46
	ConvLSTM + Time-Aware LSTM	64, 64	4.256	521394	0.31
		128, 64, 64	4.394	1556530	0.45
	ConvLSTM + Time-LSTM + U-Net	64, 128, 64	4.164	1233359	0.53

For the best model of each type (i.e., statistical, non-convolutional, and convolutional), the models were separately trained for $1000$ epochs to determine if further training resulted in better performance, but, instead, it was found that validation set RMSE increased for all models with this extended training. For the AR(6), validation set RMSE increased from $4.322$ to $4.357^{\circ}$ C; for the LSTM, $4.217$ to $4.293^{\circ}$ C; and, for the ConvLSTM + Time-LSTM + U-Net, $4.164$ to $4.182^{\circ}$ C. The models originally trained for $100$ epochs are thus used for subsequent discussion, and the three mentioned in this paragraph will be referred to as the “selected models”.

Predictions for the lowest RMSE model are presented in Figure 6 and Figure 7. Both of these demonstrations compare observations with predictions during large effusive events. In Figure 6 the predictions are well aligned with the observations. It can be noted that the predicted sequence failed to predict new spatial features until after they emerged in observations. Interestingly, as the lava flow continues, the model emphasized particular hotspot areas that were present in prior scenes while broadening the width of more linear features. While Figure 7 is also a large lava flow, its data are much less ideal—note the persistent presence of partial cloud cover as well as missing edge values which required filling with previous scenes. Despite these obstructing conditions, the model continued to predict the presence of a large lava flow in the correct general direction and shape.

The distributions of predicted pixel values for the selected models as well as a poor-performing model (Time-LSTM) were examined and are presented in Figure 8. Barring the poor-performing model, it was found that the models with lower RMSE tended to predict lower-valued extremes than higher RMSE models and all models predicted lower-valued extremes than the true observations.

All in all, the explicit modeling/optimization task was to predict image sequences that are irregularly spaced in time with minimal RMSE relative to the true observations. Each model had different characteristics which influenced its ability to accomplish this task. For example, the AR(6) had very few parameters while some of the neural networks had millions of parameters; the Time-LSTM considered the time between scenes while the ConvLSTM considered a pixel’s spatial neighbors. Ultimately, the proposed model, specifically the ConvLSTM + Time-LSTM + U-net, considered more characteristics of the data (e.g., spatial, irregular in time, and high resolution) and accomplished this explicit goal the best. It is worthwhile to note that while all models have low error relative to the full range of the data that these error differences are spread across $2211840$ pixels— $240$ images in the withheld validation set, each of which contains $9216$ pixels.

III-B Experiment 2: Derived Time Series

With infrared-based imagery, there are many derived time series of interest in volcanic monitoring. Specifically, with temperature data, radiance-based measurements cannot be examined, so here we focus on (1) maximum temperature above the background, (2) the number of hotspots, and (3) distance of furthest hotspot from the summit. All derived time series are concerned with anomalous or extreme temperatures, so it was expected that the models’ inability to predict the true scale of extreme temperatures would inhibit derived time series. To alleviate this, predicted time series were derived both before and after histogram matching the validation set predictions to the training set observations. A demonstration of this can be seen in Figure 9.

Table 2 contains the error of the selected models as well as the poor-performing Time-LSTM on the derived time series. See Figure 10 for a demonstration of these time series using the validation set, histogram matching, and the ConvLSTM + Time-LSTM + U-Net model.

TABLE II: Derived Time Series Metrics for Selected Models

Model	Histogram	Maximum	Number of	Maximum Distance of a Hotspot (m)
	Matching	Temperature (^∘C)	Hotspots	from the Summit
AR(6)	No	33.304	20.553	1190.338
AR(6)	Yes	32.255	18.180	1067.456
LSTM	No	40.604	25.249	1421.756
LSTM	Yes	32.120	18.303	1105.383
Time-LSTM	No	57.930	25.347	1513.082
Time-LSTM	Yes	32.361	17.468	825.800
ConvLSTM + Time-LSTM + U-net	No	46.262	25.347	1513.082
ConvLSTM + Time-LSTM + U-net	Yes	36.642	17.884	1090.002

Before histogram matching, the AR(6) achieves the lowest RMSE on all three derived time series. This is attributable to the fact that it predicts the largest maximum values of the available models. After histogram matching, which accounts for the apparent tendency of neural networks to produce low-scale extremes, the AR(6) model is no longer the best at any derived time series. Likewise, the proposed model (ConvLSTM + Time-LSTM + U-net), which achieved the lowest RMSE on predicting the image sequences was unable to reproduce any derived time series with the lowest RMSE. Instead, after histogram matching, the traditional LSTM achieved the lowest RMSE recreating the maximum temperature above the background ( $32.120^{\circ}$ C), while the Time-LSTM achieved the lowest RMSE on the number of hotspots ( $17.468$ pixels) and the maximum distance of a hotspot from the summit ( $825.800$ meters).

This discrepancy is interesting because the Time-LSTM was included in further comparisons since its best optimization resulted in the worst validation set image sequences (RMSE = $4.694^{\circ}$ C). Further examination revealed that this is because it tended to converge to near-zero image predictions (minimum = $-0.907^{\circ}$ C, maximum = $0.958^{\circ}$ C). Together, this suggests that despite struggling immensely with scale, the Time-LSTM learned superior pattern recognition within a very narrow comfort zone for this task that was indirectly optimized for.

Scientifically, this suggests an important implication—when forecasting multipurpose data with many potential uses, directly minimizing error on the data does not necessarily minimize error on its potential uses or derivations. Moving forward, it would be best to explicitly account for these other metrics directly in the loss function (i.e., optimization problem). Compared to highly engineered, domain-based algorithms, deep learning models are relatively simple and rely on heavy parameterization along with large amounts of data to improve performance. With immense interest in the value of deep learning for scientific tasks, it will likely be a very promising pursuit to engineer loss functions with the multiple scientific priorities and constraints in mind—an example of this is can be seen when forecasting water temperature constrained by physical laws [60].

III-C Experiment 3: Training with Individual Volcanoes

Broadly, RMSE decreased when the training set was limited to individual volcanoes (Table 3). In general, volcanoes with fewer data produced worse results, but performance did not increase exclusively with training set quantity. For example, training exclusively with Erta Ale or Kīlauea data led to lower RMSE than training exclusively with Erebus data ( $4.356^{\circ}$ C, n = 178 and $4.390^{\circ}$ C, n = 150 versus $4.684^{\circ}$ C, n = 336), and training exclusively with Nyiragongo data led to higher RMSE than training exclusively with Masaya data ( $5.365^{\circ}$ C, n = 74 versus $5.343^{\circ}$ C, n = 44).

TABLE III: Validation set performance for different training sets: all volcanoes or specific volcanoes

		Validation Set Performance (^∘C)
Training Volcano	Number of Sequences	All	Erebus	Erta Ale	Etna	Kīlauea	Masaya	Nyamulagira	Nyiragongo	Pacaya	Pu’u ’Ō’ō
All	1127	4.164	3.311	5.875	2.631	2.583	3.458	5.019	7.423	2.611	3.890
Erebus	336	4.684	3.959	6.160	4.629	2.880	5.532	5.369	7.100	3.052	3.884
Erta Ale	178	4.356	3.897	5.534	4.451	2.742	3.141	4.868	6.691	3.085	3.890
Etna	132	5.696	5.629	7.341	3.667	4.533	6.369	5.898	7.652	3.347	5.206
Kīlauea	150	4.390	3.863	5.486	4.737	2.729	3.229	4.955	6.745	3.350	3.882
Masaya	44	5.343	4.772	6.671	5.362	3.739	6.073	5.966	7.527	3.992	4.585
Nyamulagira	53	5.345	4.846	6.727	5.122	3.785	6.074	5.920	7.508	3.838	4.620
Nyiragongo	74	5.365	4.788	6.678	5.422	3.765	6.103	6.001	7.552	3.970	4.611
Pacaya	50	5.453	5.103	6.919	4.717	4.011	6.166	5.915	7.540	3.736	4.802
Pu’u ’Ō’ō	120	4.946	4.340	6.356	4.967	3.119	5.636	5.469	7.241	3.543	4.109

It is commonly noted that every volcano is different and, as such, patterns learned from one volcano may not generalize to another. It could have been possible that a volcano’s validation set performance was minimized by training only with data from that volcano, but that was not found to be the case; in general, training with a specific volcano did not reduce RMSE on that volcano itself. The only exception to this was Erta Ale (all volcanoes RMSE = $5.875^{\circ}$ C, Erta Ale only RMSE = $5.534^{\circ}$ C). The reasons for this are not known; it may be that patterns from the other volcanoes do not apply well to Erta Ale, and/or the fact that, unlike most volcanoes in this data set, Erta Ale has relatively flat, cloud-free terrain.

In $8/9$ cases, a training set consisting of one volcano did not benefit validation set performance on that volcano itself, but in $4/9$ cases, this did reduce validation set RMSE on at least one other volcano. This was the case when the training set consisted of only Erebus, Erta Ale, Kīlauea, and Pu’u ’Ō’ō resulting in validation set RMSE reductions in $2/8$ , $3/8$ , $5/8$ , and $1/8$ other volcanoes, respectively. Among these, the model fit using only Kīlauea data improved validation set performance on at least the same volcanoes that the others did (i.e., training sets of Erebus, Erta Ale, and Pu’u ’Ō’ō). Kīlauea is one of the most well-studied volcanoes in the world and, fortunately, this appears to suggest that thermal patterns learned from it may generalize better than other well-known basaltic volcanoes.

III-D Opportunities

A forefront weakness in this forecasting project is that high-resolution thermal imagery is the only data considered. Even within thermal data sources, there are additional data sources that could be integrated—most obviously MODIS. Additionally, regional weather satellites such as GOES could prove valuable and would still provide nighttime imagery. Beyond nighttime imagery, there are also the options of Landsat and Sentinel-2. With additional expertise it would also be beneficial to include other forms of remote sensing, such as InSAR for ground deformation and IR/UV for gas emissions, and, when available, performance would likely benefit from the inclusion of ground monitoring efforts. Ultimately, volcanoes display a variety of behavior that can be measured on the ground and from space; here, we focused on thermal satellite imagery, one of the most globally available and widely adopted.

Moving forward, the task of integrating all of this data will likely not be straightforward. Experts in the field have speculated that ”few developments promise to revolutionize eruption forecasting — along with so many other branches of science — as does machine learning.” [1, p. 23]. We argue here that important developments moving forward will likely require the development of machine/deep learning methods with scientific goals and data characteristics in mind. Geoscience fields, for example, are generally data-sparse but rich with theory-based models [61], and it has been found that integrating scientific theory with machine learning helps alleviate the burden of low data quantity [60]. Additionally, scientific data may not have the same characteristics as the data that machine learning methods were developed for. For example, Earth data are often irregular in time and space, and imagery may be very different (e.g., gravity-based data, interferograms, hyperspectral) from the typical optical situations that computer vision is concerned with. In summary, there are many broad directions and considerations for machine learning in science, and scientific discovery and application will likely benefit from models that are creatively specified to not only the data but also the governing scientific goals and principles.

IV Conclusion

Here we present one of the first projects forecasting volcanic activity with machine learning. Specifically, we forecasted ASTER nighttime temperature imagery for 9 volcanoes between $1999$ and $2020$ . To better match the remote sensing data characteristics (i.e., temporally irregular image sequences), we proposed three new neural network architectures, and among them, one achieved lower RMSE than all considered models (e.g., autoregression, LSTM, ConvLSTM). All models struggled with scale, but it was found that histogram matching alleviated this, producing derived time series (e.g., maximum temperature above background, the number of hotspots, and distance of furthest hotspot from the summit) that generally approximated reality. It was discovered, however, that the lowest RMSE on forecasting future scenes did not ensure the lowest RMSE on derived tasks. This suggests a need for science-driven optimization targets, especially when forecasting multipurpose monitoring streams. Lastly, it appears that patterns learned from multiple volcanoes generally increase forecasting performance relative to single-volcano training sets. The pattern of improved validation set performance was not exclusively tied to data set size, suggesting that certain volcanoes provide generalizable patterns despite small data sets. Interestingly, the models fit to Kīlauea, one of the most well-studied volcanoes in the world, were found to generalize better than any other volcano considered here—further reducing RMSE relative to the entire data set in $5/9$ volcanoes.

This work demonstrates the value of implementing data-driven changes to popular neural networks for remote sensing forecasting, and we expect that it should generalize well to other Earth observation tasks with data that are irregularly available through time. This also demonstrated a weakness in using standard optimization targets (e.g., base data RMSE) for multipurpose scientific data; performing best at predicting data does not necessarily indicate the best performance on the final uses of that data. Machine learning has the potential for transformative results on scientific tasks, but it is paramount that these efforts make data-driven adjustments, consider the full problem to be explicitly optimized, and integrate existing scientific theory/principles.

Appendix A Test Set Performance

Many important findings in this paper revolved around training conditions (specifically, comparing different types of models and models that utilized different subsets of the available data). As such, the test set—sat aside for final performance evaluation of the best model—could not play a role in these narratives. Here, we present some test set findings when using the proposed model (ConvLSTM + Time-LSTM + U-net) that achieved the lowest validation set RMSE on the image forecasting task.

First, this model was fit on a combination of the training and validation sets (i.e., the chronologically first 85% of scenes per volcano) to more fully utilize the available data. The resulting model was then evaluated on the test set. Overall, this model achieved a test set RMSE of $4.100^{\circ}$ C—lower than all previously found validation set RMSEs (see Table 1).

In line with the discussions found in Section 3, Subsections A and B, this model refused to produce very anomalous temperatures—the maximum temperature it predicted for the entire test set was $10.268^{\circ}$ C above the background, while the maximum observation for these scenes was $104.789^{\circ}$ C above the background. This was problematic for the derived series where it achieved RMSEs of $46.718^{\circ}$ C, $635.387$ pixels, and $3420.979$ meters for forecasting the maximum temperature above the background, the number of hotspots, and the maximum distance of a hotspot from the summit, respectively. Histogram matching to the training and validation set alleviated performance at forecasting the maximum temperature above the background, reducing the RMSE to $30.202^{\circ}$ C; however, performance remained poor at forecasting the number of hotspots ( $644.899$ pixels) and the maximum distance of a hotspot from the summit ( $3819.775$ meters). In these tasks, stretching the small range of predicted values resulted in far too many hot pixels whereas before there were too few. For reference these values can be compared to Table 2, but it should be noted that those metrics were computed for different temporal ranges of the data (i.e., mutually exclusive dates between $2015$ and $2019$ versus dates between $2017$ and $2020$ , depending on the volcano).

One of the events of particular interest in the test set is the 2018 eruptive activity experienced at Kīlauea and its East Rift Zone [62]—represented in part by the Pu’u ’Ō’ō scenes in our data set. Unfortunately, no available scenes collected between May and August were deemed viable or uncertain, but in relation to our data set, this data would have been at the Pu’u ’Ō’ō and Kīlauea subsets, not the highly active lower East Rift Zone. There is, however, relatively dense data availability prior to May 2018.

Leading up the May 2018 lower East Rift Zone eruptions, the model does produce two interesting patterns in its forecasts (that have been histogram matched to the training and validation set to account for known issues in maximum values). The first is a large increase then sharp decrease in maximum temperature above the background at the Kīlauea summit (Figures 11 and 12); while this did not greatly mimic observations at the time, it did shortly proceed the May 2018 lava lake withdrawal that had consequences in local seismic activity and was indicative of broader changes in the plumbing system. It may be argued that this reduced temperature is an artifact of the preceding cloud cover, but Figure 7 demonstrates that cloud cover alone is not sufficient to prohibit active forecasts. The second interesting forecast pattern is that the model predicted an increasing amount of hot spots at the Pu’u ’Ō’ō subset (Figures 11 and 13) leading up the lower East Rift Zone activity that stood in contrast to the fluctuating—if not declining—pattern seen in observations. In reality, the Pu’u ’Ō’ō vent ultimately collapsed, thus displaying no effusive hotspots; however, the ultimate result increased eruptive activity in the lower East Rift Zone. In summary, derived time series from the model forecasted sharply decreasing anomalous temperatures at Kīlauea’s summit shortly before the lava lake withdrawal and increasingly wide eastward lava flows from Pu’u ’Ō’ō prior to Pu’u ’Ō’ō’s vent collapse and the large lava flows in the lower East Rift Zone.

Appendix B Varying the Time between Scenes with the Best Proposed Model

The existing neural networks used in this paper assume one of three data scenarios: sequential data whose observations are evenly distributed in time, sequential images whose observations are evenly distributed in time, or sequential data whose observations are unevenly distributed in time. The merit of the proposed neural networks is that they represent a combination of the latter two scenarios—modeling images whose observations are unevenly distributed in time. These proposed models achieved some of the lowest RMSEs compared to existing models and did so efficiently with respect to computation time and number of parameters (see Table 1). Here, we inspect how image forecasts change when the time between scenes is varied for the proposed model that achieved the lowest validation set RMSE on the image forecasting task (ConvLSTM + Time-LSTM + U-net).

We consider six simple modifications to the time differences between scenes to evaluate how these time differences affect the image forecast. First, we consider multiplying and dividing the first time difference in the six-scene input sequence by a factor of $10$ ; furthermore, we consider the same procedure applied to the last time difference in the six-scene input sequence; and, lastly, we consider adjusting every time difference in this manner. To compare these effects, we consider the mean value of the difference between the originally forecasted scene (with unaltered time differences) and the newly forecasted scene. Additionally, we examine the RMSE of the altered forecast relative to the original forecast. In summary, these two metrics will provide a general direction and magnitude of change.

TABLE IV: Effect on forecast of changing time differences

	Mean of Differenced	RMSE of Altered
Forecast	Forecast	Forecast Relative to
Adjustment	(Adjusted - Original)	Original Forecast
First $\Delta$ T * 0.1	0.002	0.003
Last $\Delta$ T * 0.1	0.005	0.010
All $\Delta$ T * 0.1	0.006	0.017
First $\Delta$ T * 10	-0.019	0.034
Last $\Delta$ T * 10	-0.053	0.098
All $\Delta$ T * 10	-0.063	0.169

The results presented in Table 4 suggest that reducing the time differences by a factor of $10$ has very little effect on image forecasts. The average time difference in the data set is $37$ days, so this likely indicates that the model expects similar changes on the scale of approximately one week to one month. Alternatively, the metrics comparing a change in the forecast are generally an order of magnitude higher when time differences were adjusted in the same manner but by a factor of $10$ (instead of one-tenth). Additionally, predicted pixel values (on average) decreased when time differences were increased. The greatest changes in forecasts were caused by increasing all time differences by a factor of $10$ ; qualitative inspection of some of the most-changed forecasts showed that there was a tendency for background areas to be forecasted as cooler while volcanically active areas often remained unchanged or were forecasted as hotter. Figure 14 displays how some forecasts were different when all of the time differences were increased by a factor of $10$ .

As is common with neural networks, these patterns are difficult to interpret. The tendency to increase average predicted temperatures at short time scales and decrease them at long time scales may be related to the effects of the atmosphere. For example, the thermal signal from clouds does get considered by the forecasts, and sometimes it is passed forward as image features in the forecast. When the time between scenes was increased, the average temperature in the scene decreased, reducing the effects of weather (e.g., relatively warm nighttime clouds) on the long-term forecast. Additionally, one criterion when considering scene viability was whether a volcanic phenomenon was observed through partial cloud cover; for example, large lava flows (particularly at Pu’u ’Ō’ō) were included with partially obstructing conditions. Perhaps the model learned that shorter time between scenes was indicative of more volcanic activity coupled with more nighttime clouds. The more pronounced patterns of forecasting greatly increased or decreased patterns in volcanically active areas are much harder to interpret. It is notable, however, that these patterns are highly spatial, and the existence of differing forecast patterns that emerge when changing the temporal structure is suggestive that the temporal information is valuable and that machine learning algorithms are inclined to find patterns within it.

Acknowledgment

This material is based upon work supported by the National Science Foundation Graduate Research Fellowship under Grant No. DGE1255832. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation. Additionally, we thank Penn State’s Institute of Computational and Data Sciences for both funding and computational resources. Lastly, Helen Greatrex and Manzhu Yu for serving on the advising committee which led to this paper.

References

[1] M. P. Poland and K. R. Anderson, “Partly cloudy with a chance of lava flows: forecasting volcanic eruptions in the twenty-first century,” Journal of Geophysical Research: Solid Earth, vol. 125, no. 1, pp. 1–32, 2020.
[2] S. K. Brown, R. S. J. Sparks, K. Mee, C. Vye-Brown, E. Ilyinskaya, S. F. Jenkins, and S. C. Loughlin, “Country and regional profiles of volcanic hazard and risk.” Global Volcanic Hazards and Risk, p. Appendix B, 2015.
[3] S. K. Brown, S. C. Loughlin, R. S. J. Sparks, C. Vye-Brown, J. Barclay, E. Calder, E. Cottrell, G. Jolly, J. C. Komorowski, C. Mandeville, C. Newhall, J. Palma, S. Potter, and G. Valentine, “Global volcanic hazard and risk,” in Global Volcanic Hazards and Risk, 2015, pp. 81–172.
[4] National Academies of Sciences, Engineering, and Medicine, Volcanic eruptions and their repose, unrest, precursors, and timing. Washington, DC: The National Academies Press, 2017.
[5] K. Reath, M. Pritchard, M. Poland, F. Delgado, S. Carn, D. Coppola, B. Andrews, S. K. Ebmeier, E. Rumpf, S. Henderson, S. Baker, P. Lundgren, R. Wright, J. Biggs, T. Lopez, C. Wauthier, S. Moruzzi, A. Alcott, R. Wessels, J. Griswold, S. Ogburn, S. Loughlin, F. Meyer, G. Vaughan, and M. Bagnardi, “Thermal, deformation, and degassing remote sensing time series (CE 2000–2017) at the 47 most active volcanoes in Latin America: implications for volcanic systems,” Journal of Geophysical Research: Solid Earth, vol. 124, no. 1, pp. 195–218, 2019.
[6] W. Aspinall, “A route to more tractable expert advice,” Nature, vol. 463, no. 7279, pp. 294–295, 2010.
[7] C. Newhall and R. Hoblitt, “Constructing event trees for volcanic crises,” Bulletin of Volcanology, vol. 64, no. 1, pp. 3–20, 2002.
[8] W. Marzocchi, L. Sandri, and J. Selva, “BET_EF: A probabilistic tool for long- and short-term eruption forecasting,” Bulletin of Volcanology, vol. 70, no. 5, pp. 623–632, 2008.
[9] W. N. Junek, W. L. Jones, and M. T. Woods, “Use of logistic regression for forecasting short-term volcanic activity,” Algorithms, vol. 5, no. 3, pp. 330–363, 2012.
[10] D. A. Swanson, T. J. Casadevall, D. Dzurisin, R. T. Holcomb, C. G. Newhall, S. D. Malone, and C. S. Weaver, “Forecasts and predictions of eruptive activity at Mount St. Helens, USA: 1975-1984,” Journal of Geodynamics, vol. 3, pp. 397–423, 1985.
[11] M. P. Poland, T. Lopez, R. Wright, and M. J. Pavolonis, “Forecasting, detecting, and tracking volcanic eruptions from space,” Remote Sensing in Earth Systems Sciences, 2020.
[12] S. De la Cruz-Reyna and G. A. Reyes-Dávila, “A model to describe precursory material-failure phenomena: Applications to short-term forecasting at Colima volcano, Mexico,” Bulletin of Volcanology, vol. 63, no. 5, pp. 297–308, 2001.
[13] Y. Lavallée, P. G. Meredith, D. B. Dingwell, K. U. Hess, J. Wassermann, B. Cordonnier, A. Gerik, and J. H. Kruhl, “Seismogenic lavas and explosive eruption forecasting,” Nature, vol. 453, no. 7194, pp. 507–510, 2008.
[14] A. Aiuppa, R. Moretti, C. Federico, G. Giudice, S. Gurrieri, M. Liuzzo, P. Papale, H. Shinohara, and M. Valenza, “Forecasting Etna eruptions by real-time observation of volcanic gas composition,” Geology, vol. 35, no. 12, pp. 1115–1118, 2007.
[15] M. Ripepe, M. Pistolesi, D. Coppola, D. Delle Donne, R. Genco, G. Lacanna, M. Laiolo, E. Marchetti, G. Ulivieri, and S. Valade, “Forecasting Effusive Dynamics and Decompression Rates by Magmastatic Model at Open-vent Volcanoes,” Scientific Reports, vol. 7, no. 3885, pp. 1–9, 2017.
[16] W. Marzocchi and M. S. Bebbington, “Probabilistic eruption forecasting at short and long time scales,” Bulletin of Volcanology, vol. 74, no. 8, pp. 1777–1805, 2012.
[17] S. De la Cruz-Reyna, “Poisson-distributed patterns of explosive eruptive activity,” Bulletin of Volcanology, vol. 54, no. 1, pp. 57–67, 1991.
[18] M. S. Bebbington, “Assessing spatio-temporal eruption forecasts in a monogenetic volcanic field,” Journal of Volcanology and Geothermal Research, vol. 252, pp. 14–28, 2013. [Online]. Available: http://dx.doi.org/10.1016/j.jvolgeores.2012.11.010
[19] O. Jaquet, R. Carniel, S. Sparks, G. Thompson, R. Namar, and M. Di Cecca, “DEVIN: A forecasting approach using stochastic methods applied to the Soufrière Hills Volcano,” Journal of Volcanology and Geothermal Research, vol. 153, no. 1-2, pp. 97–111, 2006.
[20] R. Carniel, O. Jaquet, and M. Tárraga, “Chapter 14: perspectives on the application of the geostatistical approach to volcano forecasting at different time scales,” Developments in Volcanology, vol. 10, pp. 471–487, 2008.
[21] P. Segall, “Volcano deformation and eruption forecasting,” Geological Society Special Publication, vol. 380, no. 1, pp. 85–106, 2013.
[22] C. H. Ho, “Empirical recurrence rate time series for volcanism: Application to Avachinsky volcano, Russia,” Journal of Volcanology and Geothermal Research, vol. 173, no. 1-2, pp. 15–25, 2008.
[23] T. Carlà, E. Intrieri, F. Di Traglia, and N. Casagli, “A statistical-based approach for determining the intensity of unrest phases at Stromboli volcano (Southern Italy) using one-step-ahead forecasts of displacement time series,” Natural Hazards, vol. 84, no. 1, pp. 669–683, 2016.
[24] M. A. Furtney, M. E. Pritchard, J. Biggs, S. A. Carn, S. K. Ebmeier, J. A. Jay, B. T. McCormick Kilbride, and K. A. Reath, “Synthesizing multi-sensor, multi-satellite, multi-decadal datasets for global volcano monitoring,” Journal of Volcanology and Geothermal Research, vol. 365, pp. 38–56, 2018. [Online]. Available: https://doi.org/10.1016/j.jvolgeores.2018.10.002
[25] K. Reath, M. E. Pritchard, S. Moruzzi, A. Alcott, D. Coppola, and D. Pieri, “The AVTOD (ASTER Volcanic Thermal Output Database) Latin America archive,” Journal of Volcanology and Geothermal Research, vol. 376, pp. 62–74, 2019. [Online]. Available: https://doi.org/10.1016/j.jvolgeores.2019.03.019
[26] M. S. Ramsey, “Synergistic use of satellite thermal detection and science: A decadal perspective using ASTER,” Geological Society, London, Special Publication, vol. 426, no. 1, pp. 115–136, 2016.
[27] M. R. Patrick, J. Kauahikaua, T. Orr, A. Davies, and M. Ramsey, “Operational thermal remote sensing and lava flow monitoring at the Hawaiian Volcano Observatory,” Geological Society, London, Special Publication, vol. 426, no. 1, pp. 489–503, 2016.
[28] R. Wright, L. Flynn, H. Garbeil, A. Harris, and E. Pilger, “Automated volcanic eruption detection using MODIS,” Remote Sensing of Environment, vol. 82, no. 1, pp. 135–155, 2002.
[29] R. Wright, “MODVOLC: 14 years of autonomous observations of effusive volcanism from space,” Geological Society Special Publications, vol. 426, no. 1, 2016.
[30] D. Coppola, M. Laiolo, C. Cigolini, Donne, D. D., and M. Ripepe, “Enhanced volcanic hot-spot detection using MODIS IR data: results from the MIROVA system,” Geological Society London Special Publications, vol. 426, no. 1, pp. 181–205, 2016.
[31] S. Valade, A. Ley, F. Massimetti, O. D’Hondt, M. Laiolo, D. Coppola, D. Loibl, O. Hellwich, and T. R. Walter, “Towards global volcano monitoring using multisensor sentinel missions and artificial intelligence: The MOUNTS monitoring system,” Remote Sensing, vol. 11, no. 13, pp. 1–31, 2019.
[32] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” Advances in Neural Information Processing Systems, pp. 1097–1105, 2012.
[33] L. Ma, Y. Liu, X. Zhang, Y. Ye, G. Yin, and B. A. Johnson, “Deep learning in remote sensing applications: A meta-analysis and review,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 152, no. November 2018, pp. 166–177, 2019. [Online]. Available: https://doi.org/10.1016/j.isprsjprs.2019.04.015
[34] M. Ibs-von Seht, “Detection and identification of seismic signals recorded at Krakatau volcano (Indonesia) using artificial neural networks,” Journal of Volcanology and Geothermal Research, vol. 176, no. 4, pp. 448–456, 2008.
[35] G. Curilem, J. Vergara, G. Fuentealba, G. Acuña, and M. Chacón, “Classification of seismic signals at Villarrica volcano (Chile) using neural networks and genetic algorithms,” Journal of Volcanology and Geothermal Research, vol. 180, no. 1, pp. 1–8, 2009. [Online]. Available: http://dx.doi.org/10.1016/j.jvolgeores.2008.12.002
[36] A. Piscini and V. Lombardo, “Volcanic hot spot detection from optical multispectral remote sensing data using artificial neural networks,” Geophysical Journal International, vol. 196, no. 3, pp. 1525–1535, 2014.
[37] N. Anantrasirichai, J. Biggs, F. Albino, P. Hill, and D. Bull, “Application of machine learning to classification of volcanic deformation in routinely generated InSAR data,” Journal of Geophysical Research: Solid Earth, vol. 123, no. 8, pp. 6592–6606, 2018.
[38] J. Sun, C. Wauthier, K. Stephens, M. Gervais, G. Cervone, P. La Femina, and M. Higgins, “Automatic Detection of Volcanic Surface Deformation Using Deep Learning,” Journal of Geophysical Research: Solid Earth, vol. 125, no. 9, 2020.
[39] A. Brancato, P. M. Buscema, G. Massini, and S. Gresta, “Pattern Recognition for Flank Eruption Forecasting : An Application at Mount Etna Volcano ( Sicily , Italy ),” Open Journal of Geology, vol. 6, pp. 583–597, 2016.
[40] Y. Nomura, M. Nemoto, N. Hayashi, S. Hanaoka, M. Murata, T. Yoshikawa, Y. Masutani, E. Maeda, O. Abe, and H. K. Tanaka, “Pilot study of eruption forecasting with muography using convolutional neural network,” Scientific Reports, vol. 10, no. 5272, pp. 1–11, 2020.
[41] D. E. Dempsey, S. J. Cronin, S. Mei, and A. W. Kempa-Liehr, “Automatic precursor recognition and real-time forecasting of sudden explosive volcanic eruptions at Whakaari, New Zealand,” Nature Communications, vol. 11, no. 3562, pp. 1–8, 2020. [Online]. Available: http://dx.doi.org/10.1038/s41467-020-17375-2
[42] S. Openshaw and R. J. Abrahart, GeoComputation, 1st ed., S. Openshaw and R. J. Abrahart, Eds. Taylor & Francis, 2000.
[43] D. Pieri and M. Abrams, “ASTER watches the world’s volcanoes: A new paradigm for volcanological observations from orbit,” Journal of Volcanology and Geothermal Research, vol. 135, no. 1-2, pp. 13–28, 2004.
[44] T. Arvidson, J. Gasch, and S. N. Goward, “Landsat 7’s long-term acquisition plan - An innovative approach to building a global imagery archive,” Remote Sensing of Environment, vol. 78, no. 1-2, pp. 13–26, 2001.
[45] D. Coppola, M. Laiolo, C. Cigolini, F. Massimetti, D. Delle Donne, M. Ripepe, H. Arias, S. Barsotti, C. B. Parra, R. G. Centeno, S. Cevuard, G. Chigna, C. Chun, E. Garaebiti, D. Gonzales, J. Griswold, J. Juarez, L. E. Lara, C. M. López, O. Macedo, C. Mahinda, S. Ogburn, O. Prambada, P. Ramon, D. Ramos, A. Peltier, S. Saunders, E. de Zeeuw-van Dalfsen, N. Varley, and R. William, “Thermal remote sensing for global volcano monitoring: experiences from the MIROVA system,” Frontiers in Earth Science, vol. 7, no. January, pp. 1–21, 2020.
[46] D. Pieri and M. Abrams, “ASTER observations of thermal anomalies preceding the April 2003 eruption of Chikurachki volcano, Kurile Islands, Russia,” Remote Sensing of Environment, vol. 99, no. 1-2, pp. 84–94, 2005.
[47] K. A. Reath, M. S. Ramsey, J. Dehn, and P. W. Webley, “Predicting eruptions from precursory activity using remote sensing data hybridization,” Journal of Volcanology and Geothermal Research, vol. 321, pp. 18–30, 2016. [Online]. Available: http://dx.doi.org/10.1016/j.jvolgeores.2016.04.027
[48] U.S./Japan ASTER Science Team, “ASTER Level 2 Surface Temperature Product,” NASA LP DAAC, 2001.
[49] S. Rose and M. Ramsey, “The 2005 eruption of Kliuchevskoi volcano: Chronology and processes derived from ASTER spaceborne and field-based data,” Journal of Volcanology and Geothermal Research, vol. 184, no. 3-4, pp. 367–380, 2009. [Online]. Available: http://dx.doi.org/10.1016/j.jvolgeores.2009.05.001
[50] J. D. Cryer and K.-S. Chan, Time Series Analysis with Applications in R, 2nd ed. Springer-Verlag New York, 2008.
[51] X. Shi, Z. Chen, H. Wang, D.-Y. Yeung, W.-k. Wong, and W.-c. Woo, “Convolutional LSTM network: a machine learning approach for precipitation nowcasting,” Advances in Neural Information Processing Systems, pp. 802–810, 2015. [Online]. Available: http://papers.nips.cc/paper/5955-convolutional-lstm-network-a-machine-learning-approach-for-precipitation-nowcasting
[52] C. Xiao, N. Chen, C. Hu, K. Wang, Z. Xu, Y. Cai, L. Xu, Z. Chen, and J. Gong, “A spatiotemporal deep learning model for sea surface temperature field prediction using time-series satellite data,” Environmental Modelling and Software, vol. 120, no. October 2018, p. 104502, 2019. [Online]. Available: https://doi.org/10.1016/j.envsoft.2019.104502
[53] S. Kim, S. Hong, M. Joh, and S. K. Song, “DeepRain: ConvLSTM network for precipitation prediction using multichannel radar data,” Proceedings of the 7th International Workshop on Climate Informatics: CI 2017, pp. 89–92, 2017.
[54] A. Stoian, V. Poulain, J. Inglada, V. Poughon, and D. Derksen, “Land cover maps production with high resolution satellite image time series and convolutional neural networks: Adaptations and limits for operational systems,” Remote Sensing, vol. 11, no. 1986, 2019.
[55] Y. Zhu, H. Li, Y. Liao, B. Wang, Z. Guan, H. Liu, and D. Cai, “What to do next: modeling user behaviors by Time-LSTM,” International Joint Conference on Artificial Intelligence, pp. 3602–3608, 2017.
[56] I. M. Baytas, C. Xiao, X. Zhang, F. Wang, A. K. Jain, and J. Zhou, “Patient subtyping via time-aware LSTM networks,” Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 65–74, 2017.
[57] K. Greff, R. K. Srivastava, J. Koutnik, B. R. Steunebrink, and J. Schmidhuber, “LSTM: A search space odyssey,” IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 10, pp. 2222–2232, 2017.
[58] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, vol. 9351, pp. 234–241, 2015.
[59] D. P. Kingma and J. L. Ba, “Adam: A method for stochastic optimization,” 3rd International Conference on Learning Representations, ICLR 2015 - Conference Track Proceedings, pp. 1–15, 2015.
[60] J. S. Read, X. Jia, J. Willard, A. P. Appling, J. A. Zwart, S. K. Oliver, A. Karpatne, G. J. Hansen, P. C. Hanson, W. Watkins, M. Steinbach, and V. Kumar, “Process-Guided Deep Learning Predictions of Lake Water Temperature,” Water Resources Research, vol. 55, no. 11, pp. 9173–9190, 2019.
[61] A. Karpatne, I. Ebert-Uphoff, S. Ravela, H. A. Babaie, and V. Kumar, “Machine learning for the geosciences: challenges and opportunities,” IEEE Transactions on Knowledge and Data Engineering, vol. 31, no. 8, pp. 1544–1554, 2019.
[62] C. A. Neal, S. R. Brantley, L. Antolik, J. L. Babb, and Etc, “The 2018 rift eruption and summit collapse of Kīlauea Volcano,” Science, vol. 363, no. January, pp. 367–374, 2019.