Anna Vaughan (1), Will Tebbutt (1), J. Scott Hosking (2,3), Richard E. Turner (1)
(1) University of Cambridge, Cambridge, UK; (2) British Antarctic Survey, Cambridge, UK; (3) The Alan Turing Institute, UK
Correspondence: Anna Vaughan ([email protected])
Convolutional conditional neural processes for local climate downscaling
Abstract
A new model is presented for multisite statistical downscaling of temperature and precipitation using convolutional conditional neural processes (convCNPs). ConvCNPs are a recently developed class of models that allow deep learning techniques to be applied to off-the-grid spatio-temporal data. This model has a substantial advantage over existing downscaling methods in that the trained model can be used to generate multisite predictions at an arbitrary set of locations, regardless of the availability of training data. The convCNP model is shown to outperform an ensemble of existing downscaling techniques from the VALUE intercomparison project over Europe for both temperature and precipitation. The model also outperforms an approach that uses Gaussian processes to interpolate single-site downscaling models at unseen locations. Importantly, substantial improvement is seen in the representation of extreme precipitation events. These results indicate that the convCNP is a robust downscaling model suitable for generating localised projections for use in climate impact studies, and motivate further research into applications of deep learning techniques in statistical downscaling.
Statistical downscaling methods are vital tools in translating global and regional climate model output into actionable guidance for climate impact studies. General circulation models (GCMs) and regional climate models (RCMs) are used to provide projections of future climate scenarios; however, coarse resolution and systematic biases result in unrealistic behaviour, particularly for extreme events (allen2016climate; maraun2017towards). In recognition of these limitations, downscaling is routinely performed to correct raw GCM and RCM outputs. This is achieved either by dynamical downscaling, in which a nested high-resolution simulation is run, or by statistical methods. Comparisons of statistical and dynamical downscaling suggest that neither group of methods is clearly superior (ayar2016intercomparison; casanueva2016towards); however, in practice the computationally cheaper statistical methods are widely used.
Major classes of statistical downscaling methods are model output statistics (MOS) and perfect prognosis (PP; Maraun et al., 2010). MOS methods explicitly adjust the simulated distribution of a given variable to the observed distribution, using variations of quantile mapping (teutschbein2012bias; piani2010statistical; cannon2020bias). Though these methods are widely applied in impact studies, they struggle to downscale extreme values and artificially alter trends (maraun2013bias; maraun2017towards). In contrast, in PP downscaling, the aim is to learn a transfer function f such that
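The MOS approach can be illustrated with a minimal empirical quantile-mapping sketch. This is a toy example on synthetic data; the function name and quantile resolution are illustrative and not taken from the methods cited above:

```python
import numpy as np

def quantile_map(model_hist, obs_hist, model_future, n_quantiles=99):
    """Empirical quantile mapping: adjust simulated values so that their
    distribution matches the observed distribution over a calibration period."""
    quantiles = np.linspace(0.01, 0.99, n_quantiles)
    model_q = np.quantile(model_hist, quantiles)
    obs_q = np.quantile(obs_hist, quantiles)
    # Map each simulated value through the model CDF, then through the
    # inverse observed CDF (linear interpolation between quantiles).
    probs = np.interp(model_future, model_q, quantiles)
    return np.interp(probs, quantiles, obs_q)

# Example: remove a synthetic +2 degree warm bias.
rng = np.random.default_rng(1)
obs = rng.normal(10.0, 3.0, size=5000)   # pseudo-observations
sim = obs + 2.0                          # biased model output
corrected = quantile_map(sim, obs, sim)
```

After mapping, the mean of `corrected` is close to the observed mean, whereas `sim` retains the +2 bias. Real applications use variants of this idea (e.g. parametric or trend-preserving quantile mapping), which is precisely where the trend-alteration issues noted above arise.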
(1) ŷ(x) = f(x, Z),

where ŷ(x) is the downscaled prediction of a given climate variable whose true value is y(x) at location x, and Z is a set of predictors from the climate model (maraun2018statistical). This is based on the assumption that while sub-grid-scale and parameterised processes are poorly represented in GCMs, the large-scale flow is generally better resolved.
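As a toy illustration of the PP setup, the sketch below fits a linear transfer function f by least squares on synthetic predictor/observation pairs. All names and data here are hypothetical and stand in for real large-scale predictors and station observations:

```python
import numpy as np

# Toy perfect-prognosis setup: learn f mapping large-scale predictors Z
# (e.g. geopotential or humidity at grid points near a station) to the
# local observation y. All data are synthetic.
rng = np.random.default_rng(42)
Z = rng.normal(size=(1000, 8))                 # 1000 days, 8 coarse predictors
true_w = rng.normal(size=8)
y = Z @ true_w + 0.1 * rng.normal(size=1000)   # noisy station observations

# Least-squares fit of a linear transfer function f(Z) = Z w.
w, *_ = np.linalg.lstsq(Z, y, rcond=None)
y_hat = Z @ w
```

The same fitting problem underlies the more flexible parameterisations of f discussed next; only the function class changes.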
Multiple different models have been trialled for parameterising f. Traditional statistical methods used for this purpose include multiple linear regression (gutierrez2013reassessing; hertig2013novel), generalised linear models (san2017reassessing) and analog techniques (hatfield2015temperature; ayar2016intercomparison). More recently, there has been considerable interest in applying advances in machine learning to this problem, including relevance vector machines (ghosh2008statistical), artificial neural networks (sachindra2018statistical), autoencoders (vandal2019intercomparison), recurrent neural networks (bhardwaj2018downscaling) and convolutional neural networks (bano2020configuration; hohlein2020comparative). There has been debate as to whether deep learning methods provide an improvement over traditional statistical techniques such as multiple linear regression. vandal2019intercomparison found that machine learning approaches offered little improvement over traditional methods in downscaling precipitation. In contrast, bano2020configuration compared the downscaling performance of convolutional neural networks (CNNs) and simple linear models, finding that CNNs improved predictions of precipitation but did not result in improvements for temperature.
Limitations remain in these models. In many climate applications it is desirable to make projections that are both (i) consistent over multiple locations and (ii) specific to an arbitrary locality. The problem of multi-site downscaling has been widely studied, with two classes of approaches emerging. Traditional methods take analogues or principal components of the coarse-resolution field as predictors. The spatial dependence is then explicitly modelled for a given set of sites, using observations at those locations to train the model (maraun2018statistical; cannon2008probabilistic; bevacqua2017multivariate; mehrotra2005nonparametric). More recent work has sought to leverage advances in machine learning, for example CNNs, for feature extraction (bhardwaj2018downscaling; bano2020configuration; hohlein2020comparative). Each of these classes of methods has its drawbacks. Traditional methods have limited feature selection but are able to train directly on true observations. In contrast, deep learning techniques such as CNNs allow for sophisticated feature extraction but can only be applied to gridded datasets. Gridding of observations naturally introduces error, especially in areas with complex topography and for highly stochastic variables such as precipitation (king2013efficacy). The second limitation common to existing downscaling models is that predictions can only be made at sites for which training data are available. Creating a projection at an arbitrary location is achieved either through interpolation of model predictions or by taking the closest location.
In this study, we address these challenges by developing a new statistical model for downscaling temperature and precipitation at an arbitrary set of sites. This is achieved using convolutional conditional neural processes, state-of-the-art probabilistic machine learning methods combining ideas from Gaussian processes (GPs) and deep neural networks. The model combines the advantages of the existing classes of multi-site approaches, with feature extraction using a convolutional neural network together with training on off-the-grid data. In contrast to existing methods, the trained model can be used to make coherent local projections at any site, regardless of the availability of training observations.
The specific aims of this study are as follows:
1. Develop a new statistical model for downscaling GCM output capable of training on off-grid data, making predictions at unseen locations and utilising sub-grid-scale topographic information to refine local predictions.
2. Compare the performance of the statistical model to existing strong baselines.
3. Compare the performance of the statistical model at unseen locations to existing interpolation methods.
4. Quantify the impact of including sub-grid-scale topography on model predictions.
Section 2 outlines the development of the downscaling model, and presents the experimental setup used to address aims 2-4. Section 3 compares the performance of the statistical model to an ensemble of baselines. Sections 4 and 5 explore model performance at unseen locations and the impact of including local topographic data. Finally, Section 6 presents a discussion of these results and suggestions for further applications.
2 Datasets and methodology
We first outline the development of the statistical downscaling model, followed by a description of three validation experiments.
2.1 The downscaling model
Our aim is to approximate the function f in Equation (1) to predict the value of a downscaled climate variable at locations x given a set of coarse-scale predictors Z. In order to take the local topography into account, we assume that this function also depends on the local topography at each target point, denoted e, i.e.

ŷ(x) = f(x, Z, e).

We take a probabilistic approach to specifying f in which we include a noise model, so that

y(x) ∼ p(y(x) | θ(x)), with θ(x) = f(x, Z, e).

Deterministic predictions are made from this by using, for example, the predictive mean

ŷ(x) = E[y(x) | x, Z, e].

In this model f is parameterised as

θ(x) = MLP(e, ψ(x)), where ψ(x) is the output of CNN(Z) evaluated at the target location x.

Here θ is a vector of parameters of an output distribution for the climate variable at prediction locations x. This is assumed to be Gaussian for maximum temperature and a Gamma-Bernoulli mixture for precipitation. e is a vector of sub-grid-scale topographic information at each of the prediction locations, MLP is a multi-layer perceptron, ψ is a function parameterised as a neural network and CNN is a convolutional neural network. Each component of this is described below, with a schematic of the model shown in Figure 1.
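The forward pass just described can be sketched as follows. This is a toy illustration rather than the trained model: the CNN feature map is a random placeholder, the grid coordinates and layer sizes are invented, and the feature map is simply read off at the nearest grid cell rather than with a learned, smoothed evaluation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder for the CNN output: a grid of feature vectors derived from
# the coarse predictors Z (random here; learned in the real model).
H, W, C = 16, 16, 8
grid_feats = rng.normal(size=(H, W, C))
lats = np.linspace(40.0, 55.0, H)    # hypothetical grid coordinates
lons = np.linspace(-10.0, 20.0, W)

def psi(x_lat, x_lon):
    """psi(x): evaluate the gridded CNN features at an arbitrary
    (off-grid) target location, here via nearest grid cell."""
    i = np.abs(lats - x_lat).argmin()
    j = np.abs(lons - x_lon).argmin()
    return grid_feats[i, j]

# Toy MLP head mapping (elevation, psi(x)) -> Gaussian parameters theta.
W1 = 0.1 * rng.normal(size=(C + 1, 16))
W2 = 0.1 * rng.normal(size=(16, 2))

def predict(x_lat, x_lon, elevation):
    h = np.concatenate([[elevation / 1000.0], psi(x_lat, x_lon)])
    h = np.tanh(h @ W1)
    mu, raw_sigma = h @ W2
    return mu, np.log1p(np.exp(raw_sigma))  # softplus keeps sigma > 0

mu, sigma = predict(47.3, 8.5, elevation=550.0)
```

Because the target coordinates and elevation enter only at evaluation time, the same trained weights can produce predictions at any location, which is the property exploited in the experiments below.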
