Plant Species Richness Prediction from DESIS Hyperspectral Data: A Comparison Study on Feature Extraction Procedures and Regression Models

Yiqing Guo, Karel Mokany, Cindy Ong, Peyman Moghadam, Simon Ferrier, Shaun R. Levick
CSIRO Land and Water, Acton, ACT 2601, Australia
CSIRO Energy, Kensington, WA 6151, Australia
CSIRO Data61, Pullenvale, QLD 4069, Australia
CSIRO Land and Water, Winnellie, NT 0822, Australia

Abstract

The diversity of terrestrial vascular plants plays a key role in maintaining the stability and productivity of ecosystems. Monitoring species compositional diversity across large spatial scales is challenging and time consuming. Airborne hyperspectral imaging has shown promise for measuring plant diversity remotely, but to operationalise these efforts over large regions we need to advance satellite-based alternatives. The advanced spectral and spatial specification of the recently launched DESIS (the DLR Earth Sensing Imaging Spectrometer) instrument provides a unique opportunity to test the potential for monitoring plant species diversity with spaceborne hyperspectral data. This study provides a quantitative assessment of the ability of DESIS hyperspectral data to predict plant species richness in two different habitat types in southeast Australia. Spectral features were first extracted from the DESIS spectra, then regressed against on-ground estimates of plant species richness, with a two-fold cross validation scheme to assess the predictive performance. We tested and compared the effectiveness of Principal Component Analysis (PCA), Canonical Correlation Analysis (CCA), and Partial Least Squares analysis (PLS) for feature extraction, and Kernel Ridge Regression (KRR), Gaussian Process Regression (GPR), and Random Forest Regression (RFR) for species richness prediction. The best prediction results were r = 0.76 and RMSE = 5.89 for the Southern Tablelands region, and r = 0.68 and RMSE = 5.95 for the Snowy Mountains region. Relative importance analysis for the DESIS spectral bands showed that the red-edge, red, and blue spectral regions were more important for predicting plant species richness than the green bands and the near-infrared bands beyond red-edge. We also found that the DESIS hyperspectral data performed better than Sentinel-2 multispectral data in the prediction of plant species richness. Our results provide a quantitative reference for future studies exploring the potential of spaceborne hyperspectral data for plant biodiversity mapping.

keywords: hyperspectral, remote sensing, vascular plant, biodiversity, species richness, DESIS (the DLR Earth Sensing Imaging Spectrometer)
journal: ISPRS Journal of Photogrammetry and Remote Sensing

1 Introduction

Plant biodiversity is of critical importance to the stability of terrestrial ecosystems [Frankel et al., 1995]. Anthropogenic activities, such as inappropriate cropping, deforestation, overgrazing, and construction, in conjunction with climate change, have been leading to substantial degradation and loss of natural habitats, posing imminent threats to vulnerable plant species [Ceballos et al., 2015, Tollefson, 2019]. Consequently, species extinctions are occurring much faster than the natural background rate [Ceballos et al., 2015]. Conservation activities have been undertaken in many places around the world aiming to reduce the current rate of extinction [Leclère et al., 2020, Mokany et al., 2020].

Spatial mapping of plant biodiversity helps with a better understanding of the distribution and temporal trends of plant species richness, facilitating effective policy making in environmental conservation and restoration [Stevenson et al., 2021, Myers et al., 2021, De Palma et al., 2021]. Considerable effort has been made to collect samples of plant species richness through in-situ surveys [Kattge et al., 2020]. Despite the ever increasing amount of data, it has been recognised that the completeness and representativeness of biodiversity samples still remain as a major challenge for the compilation of up-to-date biodiversity maps with fine resolution and wide coverage [König et al., 2019, Kattge et al., 2020]. This data gap is understandable as field expeditions are labour and time consuming, and sometimes infeasible if the location is remote or hard to access [Wang and Gamon, 2019, Guo et al., 2018]. Moreover, inconsistencies among collection campaigns in their sampling strategies and ground plot sizes, confounded by human subjectivity and bias, further hampered the use of ground sampling data in downstream scientific research [Wang and Gamon, 2019].

Spaceborne remote sensing has long been deemed as a promising and cost-effective tool for mapping plant biodiversity, mainly due to its ability to capture data over large areas and in a timely manner [Skidmore et al., 2021, Wang and Gamon, 2019, Bush et al., 2017, Pettorelli et al., 2016]. Among different types of remote sensing data, hyperspectral data is of particular interest for the biodiversity community, as it contains rich features in the spectral domain that can be utilised to explore the underlying relationship with plant biodiversity on the ground [Ghiyamat and Shafri, 2010, Carlson et al., 2007, Guo et al., 2022]. Previous studies have shown that plant biodiversity is linked to remotely sensed spectral measurements because of a well-founded interrelationship between plant species richness and primary productivity [Wang et al., 2016, Grace et al., 2016]. It is hypothesised that a high diversity of plant species enhances primary production of the community, as a result of complementary functions provided by the diversified species composition. These interrelationships have the potential to enable researchers to use hyperspectral measurements, and their derived spectral indices, as remotely sensed measures of vegetation productivity and estimates of species richness.

Most research into the relationships between biodiversity and hyperspectral data has made use of hand-held or airborne hyperspectral sensors, which are more readily available than spaceborne hyperspectral data-streams (e.g., Peng et al. [2018], Asner [2008], Féret and Asner [2014], and Hacker et al. [2020]). Spaceborne hyperspectral imaging is relatively rare, with no active satellites in orbit since the Hyperion mission (which was active from 2000 to 2017). However, in preparation for the upcoming launch of EnMAP [Guanter et al., 2015], the DLR launched an exploratory system to the International Space Station in 2018, where it was embedded into the Multi-User-System for Earth Sensing (MUSES) platform. The DLR Earth Sensing Imaging Spectrometer (DESIS) [Eckardt et al., 2015, Mafanya et al., 2022] provides a unique opportunity to test the potential for monitoring plant species diversity with spaceborne hyperspectral data. It delivers hyperspectral images with 235 spectral bands over the visible and near-infrared regions of 400–1000 nm, with a spectral resolution of 2.55 nm and a spatial resolution of 30 m [Eckardt et al., 2015, Alonso et al., 2019, Krutz et al., 2019]. The high resolutions in both spectral and spatial domains make DESIS a promising data source for estimating biodiversity from space. However, there is so far a lack of quantitative studies assessing the ability of DESIS hyperspectral data to predict plant species richness values on the ground. It is worth noting the added challenge that spaceborne hyperspectral imagery such as DESIS has a larger pixel size (30 m) than typical airborne imagery (ranging from a few centimeters to several meters). As the signal of a large pixel tends to comprise a mix of multiple species, many techniques previously used to quantify richness with hyperspectral data, which detect individual species and then sum them up to get richness (e.g., Asner [2008], Féret and Asner [2014]), cannot be applied.

The original bands in hyperspectral imagery are not orthogonal but rather highly collinear with each other, presenting a high degree of redundancy of information. Such redundancy can be removed to some extent by transforming the original spectral bands into an orthogonal space of a lower dimensionality. Among popular algorithms for dimensionality reduction are Principal Component Analysis (PCA) [Xu et al., 2019, Jia and Richards, 1999], Canonical Correlation Analysis (CCA) [Zhao et al., 2014b, Richards and Jia, 2006], and Partial Least Squares analysis (PLS) [Feilhauer et al., 2015, Hacker et al., 2020, Wang et al., 2019]. These dimensionality reduction techniques reduce information redundancy by removing multicollinearity among spectral bands, enabling the extraction of spectral features from original spectral data of hundreds of bands. In contrast to feature selection, which chooses a subset of the original bands (or spectral indices computed from a subset of the original bands), feature extraction techniques such as PCA, CCA, and PLS are able to make use of information in all bands by transforming them into compact yet informative features. Compared with pre-defined vegetation indices such as Ratio Vegetation Index (RVI) and Normalised Difference Vegetation Index (NDVI), spectral features generated with feature extraction have shown better performance in extracting useful information from hyperspectral measurements [Zhao et al., 2014b]. Following the extraction of spectral features, regression analysis can then be conducted to explore potential relationships between the extracted features and target biological variables. Commonly applied regression algorithms include the Kernel Ridge Regression (KRR), Gaussian Process Regression (GPR), and Random Forest Regression (RFR). Statistical regression based on extracted spectral features has been shown to be effective in addressing biological problems with hyperspectral remote sensing data. For example, in a study to detect unintended herbicide damage in crops with hyperspectral measurements, dimensionality reduction was conducted with CCA in order to extract useful information to discriminate between healthy and damaged crops [Zhao et al., 2014b]. In another study aiming to retrieve foliar traits from hyperspectral data, PLS was used to reduce the spectral dimensionality, with adequate retrieval accuracies being achieved for 10 out of the 11 functional traits [Hacker et al., 2020]. These studies demonstrate that feature extraction and statistical regression can be effective tools in addressing biological problems with hyperspectral measurements.

In this study, we aimed to assess the potential for DESIS hyperspectral data to predict on-ground plant species richness in two regions of southeast New South Wales, Australia—the Southern Tablelands and Snowy Mountains. Our approach focused on spectral feature extraction from the DESIS spectra, and subsequent regression against field-measured plant species richness. We tested the combination of different feature extraction procedures (PCA, CCA, and PLS) and regression models (GPR, KRR, and RFR) for the predictive performance of species richness with DESIS data. Through quantitative analyses, we sought to address primarily the following important questions: (1) How much variation in plant species richness can be explained with DESIS data? (2) Which parts of the spectrum had the most explanatory power? (3) Could similar results be achieved with more readily available multi-spectral imagery such as Sentinel-2?

2 Materials and Methods

While DESIS hyperspectral data contain rich spectral information, modelling is needed to link such information to field-based measurements of plant species richness. Here we followed a two-step approach whereby spectral features were first extracted from DESIS spectra and then correlated to species richness through regression. In this section, we start with describing the DESIS spectra and in-situ richness samples, followed by introducing the methods for feature extraction, regression, and accuracy assessment.

2.1 Study site and field data

This study focused on two different habitat types in southeast New South Wales, Australia, namely the Southern Tablelands (34°12’26”–34°39’07”S, 150°05’57”–150°40’51”E) and Snowy Mountains (35°43’58”–36°16’30”S, 148°23’16”–148°39’02”E), as shown in Fig. 1. Both regions are located within the climate zone of Cfb (oceanic climates), according to the Köppen–Geiger climate classification system.

Figure 1: Locations of in-situ plant species richness samples collected in field experiments.

The Southern Tablelands region is located to the southwest of Sydney (Fig. 1). It is characterised by high-altitude plains with a rich biodiversity. There are more than 1200 plant species within Southern Tablelands, of which 30 are listed as threatened [Fallding, 2002]. A large part of the landscape has been transformed into suburbs for residential developments and pastures for grazing purposes. Considering the high degree of human interference and habitat alteration, conservation efforts have been undertaken in order to preserve endangered plant species by improving habitat connectivity and condition.

The Snowy Mountains region is located to the southwest of Canberra (Fig. 1). It encompasses the highest mountain ranges of the Australian Alps, serving as an important habitat for alpine-exclusive species. There are 212 species of vascular plants, of which 21 are endemic [Pickering et al., 2008]. Due to its unique status in Australia’s ecosystem, the plant biodiversity in Snowy Mountains has drawn consistent interest from the research community (e.g. Körner [1995], Pickering et al. [2008], and Pickering and Green [2009]).

For on-ground measures of vascular plant species richness, we obtained plant community survey data from the NSW BioNet Vegetation Information System database [Government, 2019]. Field surveys were conducted to collect species richness samples in 2016 and 2017 for the two regions, with a sampling plot area of 400 m² (20 m × 20 m) at each surveying location. A significant bushfire event occurred during the 2019–2020 summer, with some of the sampling points situated within the affected areas. These bushfire-affected samples were excluded from the data set, based on the National Indicative Aggregated Fire Extent Datasets (NIAFED) provided by the Australian Government Department of Agriculture, Water and the Environment. After the exclusion, a total of 44 and 29 samples were used in this study for analysis for the Southern Tablelands and Snowy Mountains regions, respectively. The locations of these samples are shown in Fig. 1, and their associated information is summarised in Table 1. For each sampling plot, the number of native vascular plant species was calculated and used as the response variable in our analyses.

Table 1: Information summary of the plant species richness samples collected in field observations.

| | Southern Tablelands | Snowy Mountains |
|---|---|---|
| Number of Samples | 44 | 29 |
| Sampling Time | Feb 19, 2017 – Dec 07, 2017 | Feb 24, 2016 – Dec 13, 2017 |
| Plot Area | 400 m² (20 m × 20 m) | 400 m² (20 m × 20 m) |
| Geo-extent | 34°12’26”–34°39’07”S, 150°05’57”–150°40’51”E | 35°43’58”–36°16’30”S, 148°23’16”–148°39’02”E |

The histograms of species richness distribution for sampling plots in the two regions are shown in Fig. 2. Generally, the sampling plots in Southern Tablelands show a higher richness of species than Snowy Mountains, with a mean richness value of 44.4 for the former and 23.5 for the latter. The difference in species richness can be mainly attributed to the fact that the Southern Tablelands is located at lower altitudes with a relatively warmer climate than the mountainous region of Snowy Mountains.

Figure 2: Histograms of species richness distribution for sampling plots in the (a) Southern Tablelands, and (b) Snowy Mountains regions.

2.2 Satellite Data

The DESIS spectrometer [Krutz et al., 2019] is embedded in the MUSES platform onboard the International Space Station at an altitude of approximately 400 km. It operates in a push-broom imaging mode featuring state-of-the-art radiometric and spectral specifications. It delivers hyperspectral images with 235 spectral bands over the visible and near-infrared regions of 400–1000 nm, with a spectral resolution of 2.55 nm and a spatial resolution of 30 m. The radiometric resolution for each band is 12 bit with 1 bit gain. The signal-to-noise ratio is 195 at the wavelength of 550 nm.

In our study, the DESIS Level-2A product was used. It consisted of surface reflectance images to which atmospheric correction had been applied. The correction was conducted with DLR’s PACO (Python Atmospheric COrrection) software [De los Reyes et al., 2018], in which the MODTRAN® radiative transfer model [Berk, 2016] served as the module for simulating atmospheric effects. As inputs for atmospheric simulation, aerosol optical thickness and water vapour content were retrieved per pixel using reflectance in the red and NIR bands, and in bands around the water absorption feature at 820 nm, respectively. DESIS spectra intersecting with locations of the species richness samples were queried within CSIRO’s Earth Analytics and Science Innovation (EASI) platform. We selected spectra captured in January 2020 for our analysis. In order to moderate the random noise present in the original spectra, the spectra were down-sampled from a spectral resolution of 2.55 nm into 10.2 nm bins, assuming a Gaussian-shaped spectral response function. The atmospherically affected bands of 759, 769, 933.4, 943.4, and 953.2 nm, and the low quality bands of 402.8, 410.3, and 999.5 nm at the left and right ends of the spectrum were removed. A total of 52 bands were retained after the removal. Pixels flagged as cloud by the DLR Level-1 processing were masked out.
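The down-sampling step can be illustrated with a minimal sketch, assuming each output bin is a Gaussian-weighted average of the input bands with a full width at half maximum equal to the bin width. The wavelength grids, the flat test spectrum, and the function name `gaussian_resample` are hypothetical and are not taken from the DESIS processing chain.

```python
import numpy as np

def gaussian_resample(wl_in, spectrum, wl_out, fwhm):
    """Resample a spectrum onto new band centres using Gaussian response functions."""
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))  # convert FWHM to standard deviation
    # weights[b, i]: response of output band b to input wavelength i
    weights = np.exp(-0.5 * ((wl_in[None, :] - wl_out[:, None]) / sigma) ** 2)
    weights /= weights.sum(axis=1, keepdims=True)      # normalise each output band
    return weights @ spectrum

# Hypothetical grids: 235 input bands at 2.55 nm spacing, output centres every 10.2 nm
wl_in = 400.0 + 2.55 * np.arange(235)
wl_out = 405.0 + 10.2 * np.arange(58)
spectrum = np.full(wl_in.size, 0.3)                    # flat synthetic reflectance
binned = gaussian_resample(wl_in, spectrum, wl_out, fwhm=10.2)
```

Because each row of weights sums to one, a flat spectrum is returned unchanged, which is a convenient sanity check before applying the function to real spectra.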

The Bidirectional Reflectance Distribution Function (BRDF) effect in DESIS data is non-negligible, given the 4.1° field of view in conjunction with the ±15° along-track pointing capability of the DESIS sensor, and the ±25° along-track and −45° to +5° cross-track tilting capability of the MUSES platform. Following the approach adopted in Green and Craig [1985] and Ong and Cudahy [2014] for correcting the BRDF effect in hyperspectral data, in our study each DESIS spectrum was mean-normalised, with each spectral band being divided by the mean value over all bands.
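A minimal sketch of the mean normalisation is given below. It uses a synthetic five-band spectrum and assumes, purely for illustration, that the viewing-geometry effect acts as a wavelength-independent brightness factor; the function name `mean_normalise` is hypothetical.

```python
import numpy as np

def mean_normalise(spectra):
    """Divide each spectrum (row) by its mean reflectance over all bands."""
    spectra = np.asarray(spectra, dtype=float)
    return spectra / spectra.mean(axis=1, keepdims=True)

base = np.array([0.05, 0.08, 0.04, 0.30, 0.45])  # synthetic vegetation-like spectrum
spectra = np.vstack([base, 1.3 * base])          # same shape, different brightness
normalised = mean_normalise(spectra)
```

Under this simplifying assumption the two rows of `normalised` are identical, since a multiplicative brightness factor cancels in the division.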

For comparison purposes, cloud-free Sentinel-2 multispectral data observed closest to the sensing time of the DESIS spectra were also downloaded. These Sentinel-2 spectra were downloaded as Level-2A surface reflectance with atmospheric effects corrected. Each spectrum consisted of 12 bands covering the visible, near-infrared, and short-wave-infrared (SWIR) spectral regions. The Sentinel-2 data were re-sampled to a spatial resolution of 30 m to be consistent with that of the DESIS data. A comparison between the DESIS and Sentinel-2 spectra at one of the ground sampling plots is shown in Fig. 3. Both spectra cover the visible and near-infrared regions, including the red-edge region that is critical for vegetation mapping. The Sentinel-2 and DESIS spectra show a consistent shape in these spectral regions, with DESIS having a much denser band coverage. The SWIR region is covered only by Sentinel-2, with two bands. Though information in SWIR is not contained in the DESIS data, it will be provided by the upcoming DLR mission of EnMAP (for which DESIS has served as a preparation mission), as shown in Fig. 3.

Figure 3: Comparison of the DESIS (before and after pre-processing) and Sentinel-2 spectra at one of the ground sampling plots. An EnMAP spectrum simulated for a random location is also shown for reference.

2.3 Dimensionality Reduction for DESIS Hyperspectral Data

The DESIS hyperspectral data provide rich information in the abundant and spectrally continuous bands, but these bands are highly collinear. We performed dimensionality reduction to address this problem, testing three different approaches, namely the Principal Component Analysis (PCA), Canonical Correlation Analysis (CCA), and Partial Least Squares analysis (PLS). These methods aim to find a linear transformation that projects the DESIS spectra from the original space of $m$ spectral bands to a new space of reduced dimensionality defined by $k$ uncorrelated components, with $k$ being smaller than $m$.

Mathematically, the transformation for dimensionality reduction is written as:

$$\mathbf{T}=\mathbf{X}\mathbf{W}, \tag{1}$$

where the input data for the transformation is $\mathbf{X}$, an $n\times m$ matrix consisting of the original spectra, with $n$ being the number of observed spectra and $m$ being the number of spectral bands; the output of the transformation is $\mathbf{T}=[\bm{t}_{1},\bm{t}_{2},\cdots,\bm{t}_{k}]$, an $n\times k$ matrix consisting of $k$ components of $\mathbf{X}$; and $\mathbf{W}=[\bm{w}_{1},\bm{w}_{2},\cdots,\bm{w}_{k}]$ is an $m\times k$ matrix transforming $\mathbf{X}$ from the original $m$-dimensional space into a new space of $k$ components. The weights in $\bm{w}_{i}$ are a measure of the relative contributions of the original bands to the transformed component $\bm{t}_{i}=\mathbf{X}\bm{w}_{i}$. The number of components, $k$, needs to be preset. In this study, $k$ is selected as the value that achieves the highest cross-validation accuracy.

The PCA is an unsupervised algorithm that finds orthogonal components as the ones that explain the maximum variance in the spectral data, disregarding the target values of species richness. In contrast, CCA and PLS are supervised with both spectral data and species richness values being taken into account in the computation of components. The difference between CCA and PLS is that CCA seeks to maximise the correlation between computed components and species richness values, while PLS aims to maximise the covariance between the two.
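The contrast between the unsupervised and supervised transformations can be sketched in a few lines of NumPy. The example below is illustrative only: it uses synthetic collinear "bands", computes the PCA weight matrix from a singular value decomposition, and computes only the first PLS weight vector (the unit vector maximising covariance with the response). CCA is omitted for brevity, and all data and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 40, 52, 3                        # spectra, bands, components (synthetic)
X = rng.normal(size=(n, 5)) @ rng.normal(size=(5, m))   # highly collinear bands
X += 0.01 * rng.normal(size=(n, m))
y = X[:, 0] + 0.1 * rng.normal(size=n)     # synthetic stand-in for species richness

Xc = X - X.mean(axis=0)                    # centre each band
yc = y - y.mean()

# PCA: columns of W are the leading right singular vectors of the centred data
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
W_pca = Vt[:k].T                           # m x k weight matrix
T_pca = Xc @ W_pca                         # n x k components, as in Eq. (1)

# PLS, first component only: unit weight vector maximising cov(Xc w, y)
w_pls = Xc.T @ yc
w_pls /= np.linalg.norm(w_pls)
t_pls = Xc @ w_pls
```

The PCA components are mutually uncorrelated by construction, while the first PLS component has at least as large a covariance with the response as any PCA component, because the response enters its computation.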

2.4 Estimation of Species Richness with Regression Models

After reducing the dimensionality of spectral data from the original $m$ bands to $k$ components, regression is conducted to predict species richness from the components, such that the mismatch between model-predicted and ground-truth species richness is minimised. The Kernel Ridge Regression (KRR), Gaussian Process Regression (GPR), and Random Forest Regression (RFR) algorithms are employed and compared, covering respectively the deterministic, Bayesian, and ensemble approaches to statistical regression. Here we focus on formulating our task of estimating species richness within the frameworks of these regression approaches.

The DESIS spectra $\left\{\bm{x}_{i}\right\}_{i=1}^{n}$ can be represented by their components $\left\{\bm{t}_{i}\right\}_{i=1}^{n}$ using one of the dimensionality reduction methods described in Subsection 2.3. The KRR transforms $\left\{\bm{t}_{i}\right\}_{i=1}^{n}$ into a feature space of high dimensionality (potentially infinite dimensionality) via a function $\varphi(\bm{t}_{i})$. A linear model in the high-dimensional feature space is non-linear when it is projected back to the original space, which makes it possible to capture non-linear relationships in the data. The number of parameters for a linear model increases with space dimensionality, posing a risk of over-fitting. In order to constrain the model complexity, a regularisation term is adopted in KRR to penalise the norm of the coefficient vector $\bm{b}$. The optimisation problem is:

$$\arg\min_{\bm{b}}\quad\sum_{i=1}^{n}(\bm{t}_{i}^{\text{T}}\bm{b}-y_{i})^{2}+\lambda\left\|\bm{b}\right\|_{2}^{2}, \tag{2}$$

where $\lambda$ is the regularisation parameter that controls the relative importance of the regularisation term $\left\|\bm{b}\right\|_{2}^{2}$.

The input data for Eq. 2 are the transformed components $\left\{\bm{t}_{i}\right\}_{i=1}^{n}$. Solving Eq. 2 does not involve calculating $\varphi(\bm{t}_{i})$. Instead, it only requires computation of the inner product $k(\bm{t}_{i},\bm{t}_{j})=\varphi(\bm{t}_{i})^{\text{T}}\varphi(\bm{t}_{j})$, where $\bm{t}_{i}$ and $\bm{t}_{j}$ are pairs from the input data $\left\{\bm{t}_{i}\right\}_{i=1}^{n}$. In this study, the kernel function $k(\bm{t}_{i},\bm{t}_{j})$ is specified as a combination of a dot-product kernel $k_{d}(\bm{t}_{i},\bm{t}_{j})$, a radial-basis function kernel $k_{r}(\bm{t}_{i},\bm{t}_{j})$, and a white kernel $k_{w}(\bm{t}_{i},\bm{t}_{j})$:

$$\begin{split}k(\bm{t}_{i},\bm{t}_{j})&=k_{d}(\bm{t}_{i},\bm{t}_{j})+k_{r}(\bm{t}_{i},\bm{t}_{j})+k_{w}(\bm{t}_{i},\bm{t}_{j}),\\ k_{d}(\bm{t}_{i},\bm{t}_{j})&=\bm{t}_{i}\cdot\bm{t}_{j}+\sigma^{2},\\ k_{r}(\bm{t}_{i},\bm{t}_{j})&=\exp\left(-\frac{\left\|\bm{t}_{i}-\bm{t}_{j}\right\|_{2}^{2}}{2l^{2}}\right),\\ k_{w}(\bm{t}_{i},\bm{t}_{j})&=\delta\quad{\rm if}\;\;\bm{t}_{i}=\bm{t}_{j}\;\;{\rm else}\;\;0,\end{split} \tag{3}$$

where $\sigma$, $l$, and $\delta$ are hyperparameters that need to be selected with grid search. The dot-product and radial-basis function kernels account for the linearity and non-linearity of the data, respectively, while the white kernel explains the noise in the data. For a new spectrum $\bm{x}$ with its extracted components $\bm{t}$, the predicted species richness is $\hat{y}=\sum_{i=1}^{n}\alpha_{i}k(\bm{t}_{i},\bm{t})$, where $\alpha_{i}$ is the $i$th element of $\bm{\alpha}=(\mathbf{K}+\lambda\mathbf{I})^{-1}\bm{y}$, with $\bm{y}=[y_{1},\cdots,y_{n}]^{\text{T}}$ and $\mathbf{K}_{i,j}=k(\bm{t}_{i},\bm{t}_{j})$.
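The closed-form KRR solution with the composite kernel of Eq. 3 can be sketched as follows. This is a minimal NumPy illustration on synthetic components, not the implementation used in the study; the function names, the hyperparameter values, and the data are hypothetical.

```python
import numpy as np

def composite_kernel(A, B, sigma, length, delta):
    """Dot-product + RBF + white kernel of Eq. (3) between rows of A and rows of B."""
    dot = A @ B.T + sigma ** 2
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    rbf = np.exp(-sq_dist / (2.0 * length ** 2))
    white = delta * (sq_dist == 0.0)       # delta only where the two inputs coincide
    return dot + rbf + white

def krr_fit(T, y, lam, **kern):
    """Solve (K + lam * I) alpha = y for the dual coefficients alpha."""
    K = composite_kernel(T, T, **kern)
    return np.linalg.solve(K + lam * np.eye(len(y)), y)

def krr_predict(T_train, alpha, T_new, **kern):
    """Prediction y_hat = sum_i alpha_i * k(t_i, t) for each row t of T_new."""
    return composite_kernel(T_new, T_train, **kern) @ alpha

rng = np.random.default_rng(0)
T = rng.normal(size=(30, 3))               # synthetic components
y = 20.0 + 5.0 * T[:, 0] + 2.0 * np.sin(T[:, 1]) + rng.normal(scale=0.5, size=30)
kern = dict(sigma=1.0, length=1.0, delta=0.1)
alpha = krr_fit(T, y, lam=1e-3, **kern)
y_new = krr_predict(T, alpha, rng.normal(size=(5, 3)), **kern)
```

Only kernel evaluations between pairs of component vectors are needed; the feature map $\varphi$ is never computed explicitly.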

In contrast to KRR that is formulated in a deterministic form, GPR is a Bayesian approach for regression. The underlying function correlating DESIS components and species richness is assumed to be distributed probabilistically as a Gaussian Process (GP):

$$f(\bm{t})\sim\mathcal{GP}(m(\bm{t}),k(\bm{t},\bm{t}^{\prime})) \tag{4}$$

where $m(\bm{t})$ is the mean function, which is often set to 0, and $k(\bm{t},\bm{t}^{\prime})$ is the covariance function, which can be specified as a kernel function. The input data for Eq. 4 are the transformed components $\left\{\bm{t}_{i}\right\}_{i=1}^{n}$. The covariance function $k(\bm{t},\bm{t}^{\prime})$ for GPR is set in the same way as that for KRR (Eq. 3), i.e., $k(\bm{t},\bm{t}^{\prime})=k(\bm{t}_{i},\bm{t}_{j})$ where $\bm{t}_{i}$ and $\bm{t}_{j}$ are pairs from the input data $\left\{\bm{t}_{i}\right\}_{i=1}^{n}$. In contrast to KRR, in which the optimal kernel parameters are found by grid search, the parameters of the covariance function in GPR can be automatically determined by gradient ascent on the marginal likelihood function. After the posterior is determined for the GP, the predicted species richness for a new spectrum is $\hat{y}=\sum_{i=1}^{n}\alpha_{i}k(\bm{t}_{i},\bm{t})$, where $\alpha_{i}$ is the $i$th element of $\bm{\alpha}=(\mathbf{K}+\epsilon^{2}\mathbf{I})^{-1}\bm{y}$, with $\mathbf{K}_{i,j}$ being $k(\bm{t}_{i},\bm{t}_{j})$ and $\epsilon$ being a pre-set parameter explaining the noise in the data.
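One possible way to set up a GPR with this kernel structure is sketched below using scikit-learn, whose `DotProduct`, `RBF`, and `WhiteKernel` classes correspond in form to the three terms of Eq. 3. The text does not state which software was used, so the library choice, the initial hyperparameter values, the use of `normalize_y=True` (to suit a zero mean function), and the synthetic data are all assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import DotProduct, RBF, WhiteKernel

rng = np.random.default_rng(1)
T = rng.normal(size=(40, 3))               # synthetic components
y = 30.0 + 4.0 * T[:, 0] + 2.0 * np.sin(T[:, 1]) + rng.normal(scale=0.5, size=40)

# Sum of dot-product, RBF, and white kernels; the values below are only starting points
kernel = DotProduct(sigma_0=1.0) + RBF(length_scale=1.0) + WhiteKernel(noise_level=1.0)

# Kernel hyperparameters are tuned by maximising the log marginal likelihood during fit
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True, random_state=0)
gpr.fit(T, y)

y_mean, y_std = gpr.predict(T[:5], return_std=True)   # predictive mean and std. dev.
```

After fitting, `gpr.kernel_` holds the optimised kernel, which can be inspected to see how much of the signal is attributed to the linear, non-linear, and noise terms.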

In addition to KRR and GPR, which respectively belong to the deterministic and Bayesian regression categories, we also tested the ensemble method of RFR. A random forest combines a number of decision tree regressors and takes the average regression result over all trees as the final estimate. Due to the ensemble structure, RFR tends to produce robust regression results with high resistance to overfitting and data noise. The training of a random forest regressor minimises the following objective function:

$$\arg\min_{\bm{\theta}_{j}}\quad\dfrac{1}{d}\sum_{j=1}^{d}\sum_{i=1}^{n}\left[h(\bm{t}_{i};\bm{\theta}_{j})-y_{i}\right]^{2}, \tag{5}$$

where $h(\cdot)$ is the decision tree regressor with $\bm{\theta}_{j}$ being the parameters of the $j$th tree; $d$ is the number of decision trees, which was set to 100 in this study.
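The averaging behaviour of the ensemble can be illustrated with scikit-learn's `RandomForestRegressor`, configured with 100 trees as stated above. The library choice, the remaining default settings, and the synthetic data are assumptions made for this sketch.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
T = rng.normal(size=(40, 3))               # synthetic components
y = 30.0 + 4.0 * T[:, 0] + rng.normal(scale=1.0, size=40)
T_new = rng.normal(size=(5, 3))

rfr = RandomForestRegressor(n_estimators=100, random_state=0)  # d = 100 trees
rfr.fit(T, y)
y_hat = rfr.predict(T_new)

# The forest prediction is the mean of the individual tree predictions
per_tree = np.stack([tree.predict(T_new) for tree in rfr.estimators_])
```

Comparing `per_tree.mean(axis=0)` with `y_hat` confirms that the final estimate is the average over all trees, and the spread of `per_tree` gives a rough sense of the disagreement among them.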

2.5 Determination of Hyperparameters

Hyperparameters that needed to be pre-set included the number of components $k$ in the dimensionality reduction methods PCA, CCA, and PLS, and the kernel parameters $\sigma$, $l$, and $\delta$ in the non-linear regression method KRR. The number of components $k$ determines how much information to retain after dimensionality reduction. An optimal selection of $k$ reduces data redundancy without excessively discarding useful information in the original DESIS spectra. In this study, we tested different $k$ values ranging from 1 to 10. The optimal $k$ value was selected based on the accuracy of species richness prediction, and the amount of variance in the spectral data that the retained components could explain. The tunable hyperparameters $\sigma$, $l$, and $\delta$ for the kernel function define the non-linearity structure of the regression model KRR. These kernel parameters were selected based on grid search. A grid of values was tested with each parameter varying from $10^{-5}$ to $10^{5}$ on a logarithmic scale. The combination of $\sigma$, $l$, and $\delta$ that produced the best performance in species richness prediction was selected.
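A grid search of this kind can be sketched as below. The sketch assumes 11 grid points per parameter (one per decade), a fixed regularisation parameter, and a single hold-out split scored by RMSE instead of the full cross-validation scheme; these choices, the function names, and the synthetic data are illustrative assumptions.

```python
import itertools
import numpy as np

def kernel(A, B, sigma, length, delta):
    """Composite kernel: dot-product + RBF + white terms."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return A @ B.T + sigma ** 2 + np.exp(-sq / (2.0 * length ** 2)) + delta * (sq == 0.0)

def holdout_rmse(T_tr, y_tr, T_va, y_va, params, lam=1.0):
    """Fit KRR on the training half and return RMSE on the validation half."""
    K = kernel(T_tr, T_tr, *params)
    alpha = np.linalg.solve(K + lam * np.eye(len(y_tr)), y_tr)
    y_hat = kernel(T_va, T_tr, *params) @ alpha
    return np.sqrt(np.mean((y_hat - y_va) ** 2))

rng = np.random.default_rng(3)
T = rng.normal(size=(40, 3))               # synthetic components
y = 30.0 + 4.0 * T[:, 0] + 2.0 * np.sin(T[:, 1]) + rng.normal(scale=0.5, size=40)
T_tr, y_tr, T_va, y_va = T[:20], y[:20], T[20:], y[20:]

grid = np.logspace(-5, 5, 11)              # 1e-5 ... 1e5 on a logarithmic scale
scores = {p: holdout_rmse(T_tr, y_tr, T_va, y_va, p)
          for p in itertools.product(grid, repeat=3)}   # (sigma, length, delta)
best_sigma, best_length, best_delta = min(scores, key=scores.get)
```

With three parameters the grid grows cubically (1331 combinations here), which is manageable for data sets of a few dozen samples but would call for a coarser grid or a random search on larger problems.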

2.6 Accuracy Assessment and Analyses

A two-fold validation scheme was used in this study for assessing the modelling accuracy. For each of the Southern Tablelands and Snowy Mountains regions, the whole data set was randomly partitioned into two subsets (Subsets I and II), with Subset I dedicated to training and Subset II for validation (Round I), followed by Subset II for training and Subset I for validation (Round II). This procedure was repeated 100 times with the data set being partitioned differently each time. Correlation diagrams were plotted between the ground-truth species richness and the predicted values from the DESIS spectra. The coefficient of correlation ($r$) and Root-Mean-Square Error (RMSE) were calculated to evaluate the performance of the models. Results with different feature extraction procedures (PCA, CCA, and PLS) and regression models (KRR, GPR, and RFR) were computed and compared.
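The repeated two-fold scheme can be sketched as follows. An ordinary least-squares model stands in for the regression models of Section 2.4, and the way the two metrics are aggregated (pooling both rounds within a repeat, then averaging over repeats) is an assumption, as the text does not specify it. The data and function names are hypothetical.

```python
import numpy as np

def fit_predict(T_tr, y_tr, T_va):
    """Stand-in model: ordinary least squares with an intercept."""
    A = np.column_stack([np.ones(len(T_tr)), T_tr])
    coef, *_ = np.linalg.lstsq(A, y_tr, rcond=None)
    return np.column_stack([np.ones(len(T_va)), T_va]) @ coef

def repeated_two_fold(T, y, n_repeats=100, seed=0):
    """Return the mean correlation coefficient and mean RMSE over repeated two-fold splits."""
    rng = np.random.default_rng(seed)
    r_vals, rmse_vals = [], []
    for _ in range(n_repeats):
        idx = rng.permutation(len(y))      # a different random partition each repeat
        half = len(y) // 2
        folds = (idx[:half], idx[half:])   # Subsets I and II
        y_hat = np.empty(len(y), dtype=float)
        for tr, va in (folds, folds[::-1]):   # Round I, then Round II
            y_hat[va] = fit_predict(T[tr], y[tr], T[va])
        r_vals.append(np.corrcoef(y, y_hat)[0, 1])
        rmse_vals.append(np.sqrt(np.mean((y_hat - y) ** 2)))
    return float(np.mean(r_vals)), float(np.mean(rmse_vals))

rng = np.random.default_rng(4)
T = rng.normal(size=(44, 3))               # synthetic components, 44 plots
y = 44.0 + 8.0 * T[:, 0] + rng.normal(scale=3.0, size=44)
r, rmse = repeated_two_fold(T, y)
```

Every sample is predicted exactly once per repeat by a model that never saw it during training, so the reported metrics reflect out-of-sample performance.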

2.7 Band Importance Analysis

The DESIS data consist of spectral measurements over the visible and near-infrared bands from 400 nm to 1000 nm. Importance analysis was conducted in order to analyse which spectral bands provide more explanatory power than others in predicting plant species richness. We used the vector length of contribution values of each band to all components used in the regression weighted by the partial correlation coefficients as the importance index of the band:

$$I_{i}=\sqrt{\sum_{j=1}^{k}(w_{i,j}\cdot p_{j})^{2}}, \tag{6}$$

where $I_{i}$ is the importance index for the $i$th band; $w_{i,j}$ is the $(i,j)$th element of the weight matrix $\mathbf{W}$ in Eq. 1, representing the contribution of the $i$th band to the $j$th component; $p_{j}$ is the partial correlation coefficient of the $j$th component with the target variable of species richness; $k$ is the number of components used in regression. In our study, the importance indices $I_{i}$ are normalised to relative values $\tilde{I}_{i}$ whose sum over all bands equals one:

$$\tilde{I}_{i}=\frac{I_{i}}{\sum_{i=1}^{m}I_{i}}, \tag{7}$$

where $m$ is the number of spectral bands.
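Eqs. 6 and 7 can be sketched in NumPy as follows. The partial correlation of each component is computed here by correlating the residuals of the component and of the response after regressing both on the remaining components, which is the standard definition but is an assumption about how it was computed in the study. The PCA weights and the data are synthetic.

```python
import numpy as np

def partial_correlations(T, y):
    """Partial correlation of each component with y, controlling for the other components."""
    n, k = T.shape
    p = np.empty(k)
    for j in range(k):
        others = np.column_stack([np.ones(n), np.delete(T, j, axis=1)])
        res_t = T[:, j] - others @ np.linalg.lstsq(others, T[:, j], rcond=None)[0]
        res_y = y - others @ np.linalg.lstsq(others, y, rcond=None)[0]
        p[j] = np.corrcoef(res_t, res_y)[0, 1]
    return p

def band_importance(W, p):
    """Relative band importance from an m x k weight matrix W and k partial correlations p."""
    I = np.sqrt(((W * p[None, :]) ** 2).sum(axis=1))   # Eq. (6)
    return I / I.sum()                                  # Eq. (7)

rng = np.random.default_rng(5)
n, m, k = 40, 20, 3
X = rng.normal(size=(n, m))
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
W = Vt[:k].T                               # synthetic PCA weight matrix
T = Xc @ W
y = 2.0 * T[:, 0] + rng.normal(size=n)     # synthetic response

importance = band_importance(W, partial_correlations(T, y))
```

A band receives a high index when it contributes strongly to components that are themselves strongly related to the response.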

2.8 Comparison with Multispectral Data

Spaceborne multispectral imagery such as Sentinel-2 is more readily available than hyperspectral imagery. Though Sentinel-2 images have fewer bands than the DESIS hyperspectral data, they are delivered on a more stable and systematic basis. In this study, an analysis was conducted to see if Sentinel-2 multispectral data are able to achieve comparable results. Through this analysis, we hoped to examine how well Sentinel-2 could serve as a substitute for plant biodiversity mapping in instances where hyperspectral data are unavailable.

The Sentinel-2 data set described in Section 2.2 was used for the comparison. In order to conduct a fair comparison between hyperspectral and multispectral data with minimised differences in instrumental specifications and acquisition conditions, a Sentinel-2-like synthetic data set was simulated from the DESIS data. The simulation involved resampling the DESIS spectra using the spectral response functions of Sentinel-2’s visible and near-infrared bands. Both the real and synthetic Sentinel-2 sets were used for species richness prediction, with results being compared to those achieved with DESIS data.
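The resampling step can be illustrated with a response-weighted average of the narrow-band reflectance. The sketch below is an assumption-laden illustration: it uses Gaussian response curves with made-up band centres and widths (`red_like`, `nir_like`) and a toy step-shaped spectrum on a hypothetical regular wavelength grid. A real simulation would use the published Sentinel-2 spectral response functions and the actual DESIS band centres.

```python
import math


def gaussian_srf(wavelengths, centre, fwhm):
    """Gaussian approximation of a spectral response function (an assumption)."""
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return [math.exp(-0.5 * ((w - centre) / sigma) ** 2) for w in wavelengths]


def resample(wavelengths, reflectance, bands):
    """Response-weighted mean reflectance for each broad band."""
    simulated = {}
    for name, (centre, fwhm) in bands.items():
        srf = gaussian_srf(wavelengths, centre, fwhm)
        simulated[name] = sum(s * r for s, r in zip(srf, reflectance)) / sum(srf)
    return simulated


# Hypothetical narrow-band grid from 400 nm to 1000 nm.
wavelengths = [400.0 + 2.5 * i for i in range(241)]
# Toy vegetation-like spectrum: dark in the visible, bright beyond 700 nm.
reflectance = [0.05 if w < 700.0 else 0.45 for w in wavelengths]
# Illustrative (centre, FWHM) pairs in nm; not official instrument values.
bands = {"red_like": (665.0, 30.0), "nir_like": (842.0, 115.0)}
synthetic = resample(wavelengths, reflectance, bands)
```

Dividing by the sum of the response weights keeps each simulated band on the same reflectance scale as the input spectrum, regardless of the width of the response curve.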

3 Results

3.1 Spectral Reflectance Differences Between Species Richness Classes

The blue, green, and red curves in Fig. 4 show the DESIS spectra averaged over ground sampling plots with species richness falling into the low, intermediate, and high tertiles, respectively. Results for the Southern Tablelands and Snowy Mountains regions are displayed in Figs 4a and 4b, respectively. For each region, plots of higher richness showed a lower reflectance in the visible range of $400\sim 680$ nm. Considering that major absorption features of chlorophyll are located within the visible region [Zhao et al., 2014b], the lower reflectance in this spectral portion might indicate a higher concentration of chlorophyll. Plots of higher richness also showed a higher reflectance in the near-infrared plateau of $780\sim 1000$ nm and a steeper red edge between $680\sim 780$ nm, which might suggest a larger Leaf Area Index (LAI) [Delegido et al., 2013] and a greater vegetation vigour [Boochs et al., 1990] for those plots. On the basis of these observations, the spectral shape of high richness plots, as compared with spectra of intermediate and low richness plots, may imply generally denser and more vigorous vegetation. This is consistent with findings reported in the literature that high species richness enhances primary productivity [Wang et al., 2016, Grace et al., 2016] and biomass [Malhi et al., 2020, Tilman et al., 1997].
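The tertile averaging behind Fig. 4 can be sketched as below. The rank-based split into three equal-sized groups is an assumption (the exact handling of ties and group boundaries is not stated here), and the six two-band spectra are invented for illustration.

```python
def tertile_mean_spectra(richness, spectra):
    """Average spectra within low, intermediate, and high richness tertiles.

    Plots are ranked by richness and split into three groups of (nearly)
    equal size; at least three plots are required.
    """
    order = sorted(range(len(richness)), key=lambda i: richness[i])
    n = len(order)
    cuts = [0, n // 3, (2 * n) // 3, n]
    labels = ["low", "intermediate", "high"]
    means = {}
    for label, start, stop in zip(labels, cuts[:-1], cuts[1:]):
        group = [spectra[i] for i in order[start:stop]]
        # zip(*group) iterates band by band across the plots in the group.
        means[label] = [sum(band) / len(group) for band in zip(*group)]
    return means


# Hypothetical plots: richness counts and two-band (visible, NIR) reflectance.
richness = [5, 12, 30, 8, 22, 41]
spectra = [
    [0.10, 0.30],
    [0.08, 0.35],
    [0.04, 0.50],
    [0.09, 0.32],
    [0.06, 0.42],
    [0.03, 0.55],
]
means = tertile_mean_spectra(richness, spectra)
```

With these toy values the high-richness group has the lowest visible and highest near-infrared mean reflectance, mirroring the qualitative pattern described in the text.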

Figure 4: Average DESIS reflectance spectra calculated from field sample plots with low, intermediate, and high species richness for the (a) Southern Tablelands and (b) Snowy Mountains regions.

3.2 Hyperparameter Selection Results

Fig. 5 shows the $r$ and RMSE values achieved with different numbers of components (ranging from 1 to 10) selected as features. The results were averaged over the two study regions. Two components achieved the best performance for all three feature extraction methods: PCA (Fig. 5a), CCA (Fig. 5b), and PLS (Fig. 5c). When only one component was used, more of the information in the original spectral data was discarded, and lower $r$ values and higher RMSE values were observed than with two components. When more than two components were retained, the additional components introduced a higher degree of data redundancy, and weaker results were also observed. The eigenvalue, percentage of explained variance, and cumulative percentage of explained variance for each component are shown in Table 2. With two components, PCA, CCA, and PLS explained 93.81%, 89.34%, and 87.38% of the variance in the DESIS data, respectively. Based on these results, the number of components, $k$, was set to two in our experiments for dimensionality reduction with PCA, CCA, and PLS. The first and second components are plotted in Fig. 6. The first component depicted the general shape of the spectral brightness, while the second component highlighted local spectral features of the spectrum.
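The explained-variance columns of Table 2 follow from simple arithmetic on the eigenvalues. The sketch below reproduces the PCA columns as a check. One assumption is labelled in the code: the total variance is not stated in the text, so it is inferred here from the first PCA row (44.35 corresponding to 84.48%).

```python
def explained_variance(eigenvalues, total_variance):
    """Return per-component and cumulative percentages of explained variance."""
    percentages = [100.0 * ev / total_variance for ev in eigenvalues]
    cumulative = []
    running = 0.0
    for pct in percentages:
        running += pct
        cumulative.append(running)
    return percentages, cumulative


# PCA eigenvalues of the first ten components, as listed in Table 2.
pca_eigenvalues = [44.35, 4.90, 2.22, 0.44, 0.22, 0.10, 0.04, 0.04, 0.03, 0.02]
# Assumption: total variance inferred from the first row (44.35 <-> 84.48%).
total_variance = 44.35 / 0.8448
percentages, cumulative = explained_variance(pca_eigenvalues, total_variance)
```

Up to rounding, `cumulative[1]` matches the 93.81% reported for two PCA components. Note that the number of components in the study was chosen from predictive accuracy (Fig. 5), not from a variance threshold.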

Figure 5: Impact of number of components on the estimation accuracy of plant species richness with (a) Principal Component Analysis (PCA), (b) Canonical Correlation Analysis (CCA), and (c) Partial Least Squares analysis (PLS).

Table 2: The eigenvalue, percentage of explained variance, and cumulative percentage of explained variance for components produced with Principal Component Analysis (PCA), Canonical Correlation Analysis (CCA), and Partial Least Squares analysis (PLS).
| Component | PCA Eigenvalue | PCA % of Variance | PCA Cumulative % of Variance | CCA Eigenvalue | CCA % of Variance | CCA Cumulative % of Variance | PLS Eigenvalue | PLS % of Variance | PLS Cumulative % of Variance |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 44.35 | 84.48 | 84.48 | 40.43 | 74.61 | 74.61 | 40.96 | 75.58 | 75.58 |
| 2 | 4.90 | 9.34 | 93.81 | 7.98 | 14.73 | 89.34 | 6.39 | 11.80 | 87.38 |
| 3 | 2.22 | 4.23 | 98.04 | 4.16 | 7.67 | 97.01 | 4.94 | 9.12 | 96.50 |
| 4 | 0.44 | 0.84 | 98.87 | 0.61 | 1.13 | 98.14 | 0.70 | 1.29 | 97.79 |
| 5 | 0.22 | 0.41 | 99.29 | 0.48 | 0.89 | 99.03 | 0.41 | 0.75 | 98.55 |
| 6 | 0.10 | 0.18 | 99.47 | 0.12 | 0.23 | 99.26 | 0.23 | 0.42 | 98.97 |
| 7 | 0.04 | 0.08 | 99.55 | 0.10 | 0.19 | 99.45 | 0.14 | 0.26 | 99.22 |
| 8 | 0.04 | 0.08 | 99.63 | 0.08 | 0.14 | 99.59 | 0.11 | 0.19 | 99.42 |
| 9 | 0.03 | 0.06 | 99.69 | 0.06 | 0.11 | 99.69 | 0.08 | 0.14 | 99.56 |
| 10 | 0.02 | 0.04 | 99.73 | 0.04 | 0.08 | 99.77 | 0.06 | 0.11 | 99.67 |

Figure 6: The first and second components for the transformed DESIS spectra. Left column: Result for the Southern Tablelands with (a) Principal Component Analysis (PCA), (b) Canonical Correlation Analysis (CCA), and (c) Partial Least Squares analysis (PLS); Right column: Result for the Snowy Mountains region with (d) PCA, (e) CCA, and (f) PLS.

The grid search result for the optimal KRR hyperparameters $\sigma$, $l$, and $\delta$ is shown in Fig. 7. The best combination of kernel parameters was $\sigma=10^{3}$, $l=10^{3}$, and $\delta=10$. This combination of kernel parameter values was used as the default for KRR in our experiments.

Figure 7: Selection of kernel parameters $\sigma$, $l$, and $\delta$ based on grid search for Kernel Ridge Regression (KRR). The best performing combinations of kernel parameters are circled in black.

3.3 Plant Species Richness Prediction

The assessment of different dimensionality reduction and regression methods (Table 3) showed that, for the Southern Tablelands region, the best result was achieved with a combination of PLS for dimensionality reduction and GPR for regression, with $r$ being 0.76 and RMSE being 5.89. Models using PLS for dimensionality reduction performed better than those using PCA and CCA. The PLS-based models achieved $0.75\sim 0.76$ for $r$ and $5.89\sim 5.92$ for RMSE, better than the PCA-based models ($0.70\sim 0.71$ for $r$ and $5.99\sim 6.02$ for RMSE) and the CCA-based models ($0.69\sim 0.73$ for $r$ and $5.98\sim 6.02$ for RMSE).

Table 3: The coefficient of correlation ($r$) and Root-Mean-Square Error (RMSE) between model predicted and ground truth plant species richness for the Southern Tablelands region.
| Dimensionality Reduction | Regression | $r$ | RMSE |
|---|---|---|---|
| PCA | KRR | 0.70 | 6.01 |
| PCA | GPR | 0.71 | 5.99 |
| PCA | RFR | 0.70 | 6.02 |
| CCA | KRR | 0.69 | 6.02 |
| CCA | GPR | 0.72 | 5.98 |
| CCA | RFR | 0.73 | 6.00 |
| PLS | KRR | 0.75 | 5.92 |
| PLS | GPR | 0.76 | 5.89 |
| PLS | RFR | 0.75 | 5.91 |

The assessment results for the Snowy Mountains region are shown in Table 4. The Snowy Mountains region is located at high altitudes with less human interference and a generally lower species richness than the Southern Tablelands (Fig. 2). Compared with the results for the Southern Tablelands in Table 3, the $r$ values were generally lower and the RMSE values generally higher for the Snowy Mountains region (Table 4). The best $r$ and RMSE were 0.68 and 5.95, respectively, achieved with a combination of PLS for dimensionality reduction and RFR for regression.

Table 4: The coefficient of correlation ($r$) and Root-Mean-Square Error (RMSE) between model predicted and ground truth plant species richness for the Snowy Mountains region.
| Dimensionality Reduction | Regression | $r$ | RMSE |
|---|---|---|---|
| PCA | KRR | 0.51 | 6.17 |
| PCA | GPR | 0.54 | 6.03 |
| PCA | RFR | 0.56 | 6.01 |
| CCA | KRR | 0.52 | 6.14 |
| CCA | GPR | 0.53 | 6.08 |
| CCA | RFR | 0.54 | 5.99 |
| PLS | KRR | 0.64 | 6.08 |
| PLS | GPR | 0.66 | 5.97 |
| PLS | RFR | 0.68 | 5.95 |

The correlation diagrams in Fig. 8 show the relationship between the ground-truth species richness and the predicted values from the DESIS spectra, with data samples from the validation set. These results were produced with the best performing models in Tables 3 and 4.

Figure 8: Correlation diagrams between the ground-truth species richness and the predicted values from the DESIS spectra for the (a) Southern Tablelands and (b) Snowy Mountains regions. The $r$ and RMSE stand for the coefficient of correlation and root-mean-square error.

3.4 Generalised Modelling Results

Figs 8a and 8b show results with modelling conducted for the Southern Tablelands and Snowy Mountains regions separately. When we modelled the pooled data from both regions, we found that accuracy decreased (Fig. 9). The prediction result was $r=0.61$ and $\text{RMSE}=10.1$, which was less accurate than modelling with data from only one region (Figs 8a and 8b). This indicates that location-specific modelling performs better than using one model to describe multiple regions.

The difficulty of modelling relationships between hyperspectral data and plant species richness for multiple regions, compared with modelling each region separately, is worth further investigation. Different regions may differ in their assemblages of plant species, and in their compositional and structural properties, resulting in extra variation in hyperspectral data in addition to that induced by species richness. This additional variation may make it harder to extract the information in hyperspectral data that is useful for predicting plant species richness. The location-specific relationship between hyperspectral data and plant species richness calls for location-dependent modelling, or for encoding location information into the input spectra, in future studies that attempt to map plant species richness at continental or global scales.

Figure 9: Correlation diagram between the ground-truth species richness and values predicted from DESIS spectra, with data pooled from both the Southern Tablelands and Snowy Mountains regions. The $r$ and RMSE stand for the coefficient of correlation and root-mean-square error.

3.5 The Relative Importance of Spectral Bands

The relative importance of DESIS bands in predicting plant species richness is shown in Fig. 10, with the aim of analysing which parts of the spectrum had the most explanatory power. Subplots 10a, 10b, and 10c display the results with PCA, CCA, and PLS used as the feature extraction procedure, respectively. In these subplots, bands in the red-edge spectral region of approximately $700\sim 720$ nm showed the highest importance. This may suggest that the slope of the red-edge is an important spectral feature for plant species richness prediction, considering that the low, intermediate, and high species richness plots showed different red-edge slopes in Fig. 4. This observation is consistent with literature reporting that the red-edge is a critical spectral region for vegetation mapping, as it is closely related to biological variables such as leaf area index [Delegido et al., 2013], plant vigour [Boochs et al., 1990], and biochemical contents [Mutanga and Skidmore, 2007]. After the red-edge, the visible range of approximately $400\sim 700$ nm also showed high importance. The importance of visible bands in plant species richness prediction might be explained by the fact that this portion of the spectrum, especially the red and blue bands, is the major leaf pigment absorption range. It is sensitive mainly to chlorophyll a and b contents, according to the sensitivity analysis results reported in Zhao et al. [2014b, a]. Though less important than the red-edge and visible regions, the near-infrared region with wavelengths longer than 720 nm also showed some explanatory power. This indicated that the near-infrared region also contributed information for predicting plant species richness, as this portion of the spectrum, often characterised by high reflectance for vegetation, is sensitive to leaf thickness [Zhao et al., 2014a] and the amount, arrangement, and inclination of leaves in the canopy [Knipling, 1970].

Figure 10: Relative importance analysis for the DESIS bands in predicting plant species richness with spectral features being extracted by (a) Principal Component Analysis (PCA), (b) Canonical Correlation Analysis (CCA), and (c) Partial Least Squares analysis (PLS).

3.6 Comparison with Multispectral Data

Figs 11a and 11b show correlation diagrams between the ground-truth species richness and values predicted from the real Sentinel-2 multispectral data for the Southern Tablelands and Snowy Mountains regions, respectively. A prediction result of $r=0.66$ and $\text{RMSE}=6.27$ was achieved for the Southern Tablelands region (Fig. 11a), and $r=0.57$ and $\text{RMSE}=6.31$ for the Snowy Mountains region (Fig. 11b). The prediction results with the Sentinel-2-like synthetic data set are shown in Fig. 12. For the Southern Tablelands, the $r$ and RMSE values are 0.65 and 6.19, while for the Snowy Mountains, the $r$ and RMSE values are 0.57 and 6.39.

Figure 11: Correlation diagrams between the ground-truth species richness and values predicted from the real Sentinel-2 multispectral data set for the (a) Southern Tablelands and (b) Snowy Mountains regions. The $r$ and RMSE stand for the coefficient of correlation and root-mean-square error.

Figure 12: Correlation diagrams between the ground-truth species richness and values predicted from the Sentinel-2-like synthetic multispectral data set for the (a) Southern Tablelands and (b) Snowy Mountains regions. The $r$ and RMSE stand for the coefficient of correlation and root-mean-square error.

These multispectral results were slightly worse than the hyperspectral results shown in Fig. 8. This could be explained by the fact that hyperspectral data contain richer information in the spectral domain. However, when taking data availability and stability into consideration, spaceborne multispectral imagery could serve as a reliable alternative for plant biodiversity mapping. In addition, data such as Sentinel-2 are sampled much more consistently and regularly over time, and harnessing the added information in temporal changes in the multispectral signal could provide further explanatory power for biodiversity analyses.

4 Discussions

4.1 Comparison to Previous Studies

Predicting plant biodiversity from remotely sensed measurements helps with making evidence-informed environmental policies and conducting effective conservation activities. Hyperspectral imagery delivered by the recently launched DESIS instrument opens the potential for plant biodiversity monitoring at a finer spatial scale and with a higher accuracy. In this study, we take the Southern Tablelands and Snowy Mountains regions in southeast Australia as experimental sites to test the relationship between DESIS data and plant species richness. A two-step approach is proposed, where feature extraction techniques are first used to reduce the dimensionality of hyperspectral data, followed by regression models with kernel functions to account for the linearity, non-linearity, and noise in the data.

Obtaining informative features from hyperspectral data is important to the success of subsequent inference of plant species richness. In previous studies, features have been primarily selected as a subset of the original bands, or as spectral indices computed from a subset of the original bands (e.g., Malhi et al. [2020], Peng et al. [2018], and Wang et al. [2016]). The bands or indices are often selected based on a priori knowledge of the spectral properties, such as absorption features of biochemical contents. Though selected features usually offer good explainability, selecting them means discarding the information in unselected bands. Considering the large number of bands in hyperspectral data, many of them would be discarded, as the high collinearity of hyperspectral data often requires a considerable reduction of dimensionality. In contrast, the feature extraction approach adopted in this work makes use of the information in all original bands by transforming them into a new feature space of lower and non-collinear dimensionality.

Linear models have been primarily employed in previous studies to relate features to species richness. For example, multiple linear regression has been adopted by Wang et al. [2016] and Malhi et al. [2020], and stepwise linear regression by Peng et al. [2018]. However, the relationship between features and richness might not necessarily follow a simple linear pattern. In our study, a novel kernel function is proposed (Eq. 3), with the dot-product and radial-basis function kernels accounting for the linearity and non-linearity of the data, respectively, and the white kernel explaining the noise in the data. The ability of our model to explore both linear and non-linear patterns distinguishes our work from the aforementioned studies.
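A kernel that sums dot-product, radial-basis function, and white-noise terms can be sketched as below. Eq. 3 is not reproduced in this section, so the functional form here is an assumption: an offset dot-product term $\sigma^{2}+\mathbf{x}\cdot\mathbf{x}'$, a Gaussian term with length scale $l$, and a noise term $\delta$ added only on the diagonal. The function names and sample feature vectors are hypothetical.

```python
import math


def composite_kernel(x, y, sigma, length, delta, same_sample=False):
    """Sum of dot-product, RBF, and white-noise terms (assumed form, see lead-in)."""
    # Linear part: inhomogeneous dot product with offset sigma^2.
    dot = sigma ** 2 + sum(a * b for a, b in zip(x, y))
    # Non-linear part: Gaussian radial-basis function with length scale `length`.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    rbf = math.exp(-sq_dist / (2.0 * length ** 2))
    # Noise part: contributes only when both arguments are the same sample.
    white = delta if same_sample else 0.0
    return dot + rbf + white


def gram_matrix(samples, sigma, length, delta):
    """Kernel (Gram) matrix over a list of feature vectors."""
    n = len(samples)
    return [
        [
            composite_kernel(
                samples[i], samples[j], sigma, length, delta, same_sample=(i == j)
            )
            for j in range(n)
        ]
        for i in range(n)
    ]


# Hypothetical two-component feature vectors for three plots.
samples = [[0.2, 1.0], [0.4, 0.5], [1.0, 0.1]]
K = gram_matrix(samples, sigma=1.0, length=1.0, delta=0.1)
```

Because each term is a valid kernel, their sum is too, so the resulting matrix is symmetric and can be passed to any kernel regression solver.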

4.2 Mechanism of Plant Species Richness Prediction

Though detailed field surveys have been deemed the most accurate and reliable way of assessing plant biodiversity, remote sensing data can serve as a proxy for large-scale and cost-effective biodiversity mapping. The mechanism justifying the use of remote sensing has been widely discussed in the literature. It is worth noting that the exact mechanism is dependent on which method is used and on the sensor specifications (e.g., pixel size and spectral range). The richness of species can be estimated either directly from raw spectra, or via the extracted plant functional traits or types [Wang and Gamon, 2019]. Methods based on the spectral variation hypothesis are also intriguing, whereby the spectral variation across spatially adjacent pixels is employed as the proxy [Palmer et al., 2002, Fassnacht et al., 2022]. Here we focus on the mechanism that underpins our study, where DESIS spectra with a pixel size of 30 m and a spectral range of 400–1000 nm are linked to on-ground species richness via feature extraction and statistical regression.

It has been reported in the literature that plant communities of high diversity tend to have an enhanced primary productivity [Wang et al., 2016, Grace et al., 2016] and a higher above-ground biomass [Malhi et al., 2020, Tilman et al., 1997]. Though the explanation for the richness–productivity/biomass relationship is a subject of debate, a common theory is that the complementary roles played by different species lead to lower nutrient losses and more sustainable soils [Tilman et al., 1996]. Species complementarity allows plants to capture resources in ways that are complementary in space and/or time, leading to increased biomass production [Cardinale et al., 2007]. For example, a high number of species allows plants to occupy different niche partitions, resulting in a denser occupation of space and a more efficient use of water, nutrients, and sunlight.

Based on this relationship, many studies have successfully estimated plant species richness from hyperspectral measurements (e.g., Wang et al. [2016], Malhi et al. [2020], and Peng et al. [2018]), given that remotely sensed hyperspectral data is a good proxy of vegetative biomass and primary productivity. A positive and dynamic productivity–diversity relationship is observed in a prairie grassland experiment at Cedar Creek, Minnesota, USA, with NDVI being employed as a proxy of vegetation productivity to estimate species richness [Wang et al., 2016]. The correlation between spectral indices and plant species diversity is also reported in Peng et al. [2018] for a semi-arid sandland ecosystem in Inner Mongolia, China.

It is worth noting that previous studies have primarily focused on ground-measured (e.g., Peng et al. [2018] and Wang et al. [2016]) or airborne (e.g., Asner [2008]) hyperspectral data, with a limited spatial range. The recently launched DESIS and PRISMA [Pignatti et al., 2013, Verrelst et al., 2021], and the upcoming EnMAP (Environmental Mapping and Analysis Program) [Guanter et al., 2015] missions, enable us to test the potential of spaceborne hyperspectral measurements in plant species richness mapping. Our study shows a promising correlation between spaceborne hyperspectral data and plant species richness, and finds that the correlation is location-dependent.

4.3 Limitations

Over the past decades, in-situ samples of plant species richness have been collected via various survey campaigns. In total, more than 188 thousand samples had been gathered in Australia as of 2018 [Gellie et al., 2018]. However, most of the samples cannot be matched with a DESIS observation that is temporally close enough to them, as DESIS was not in operation until 2018. In this study, in order to avoid large temporal discrepancies, we have limited our on-ground samples to those that are less than three years apart from their associated DESIS spectra. The limited spatial coverage of DESIS images, and bushfires during the 2019–2020 summer, have further reduced the number of available samples. As a result, analyses in this study are conducted on a relatively small number of samples. Nevertheless, the ever increasing amount of both satellite images and on-ground samples will enable a more comprehensive assessment of the potential of spaceborne hyperspectral remote sensing for plant biodiversity mapping. Moreover, though the data set used in this work does not have information on which strata the observed species come from, it is worth recording richness separately for different strata in future field surveys. With strata information on record, we might be able to model tree richness and understory richness separately, and then combine the predictions to obtain total richness. The results reported in this study may serve as a basis for future studies.

It is important to note that the DESIS pixel size ($30\times 30$ m) does not match exactly the plot area of the ground species richness samples ($400\;\text{m}^{2}=20\times 20\;\text{m}$). Though richness measurements could be scaled to a larger or smaller plot area using an assumed power relationship ($S=cA^{z}$) between species richness ($S$) and plot area ($A$) [Rosenzweig, 1995], the location-specific power parameter ($z$) is often hard to determine accurately without adequate knowledge of the experiment site. In future field campaigns, it would be worthwhile to gather information on the richness-area relationship, in order to facilitate accurate up- and down-scaling of plot areas. Potential inaccuracies in geo-registration of DESIS images and in geo-positioning measurements during field surveys may also result in geo-mismatch between pixels and ground plots. Better geo-registration and geo-positioning accuracies would help reduce uncertainties in predicting species richness in future works.
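Under the power relationship $S=cA^{z}$, the constant $c$ cancels when richness is rescaled between two areas, giving $S_{2}=S_{1}(A_{2}/A_{1})^{z}$. The sketch below applies this to the plot and pixel areas mentioned above. The exponent `z_assumed` and the richness value of 30 are purely hypothetical placeholders, since the site-specific $z$ is not known.

```python
def rescale_richness(richness, area_from, area_to, z):
    """Rescale species richness between areas using S = c * A**z (c cancels out)."""
    return richness * (area_to / area_from) ** z


plot_area = 20.0 * 20.0    # ground plot, m^2
pixel_area = 30.0 * 30.0   # DESIS pixel, m^2
z_assumed = 0.25           # hypothetical exponent, not estimated in the study

scaled = rescale_richness(30.0, plot_area, pixel_area, z_assumed)

# Sensitivity of the rescaled value to the choice of z (all values hypothetical).
sensitivity = {z: rescale_richness(30.0, plot_area, pixel_area, z) for z in (0.1, 0.25, 0.4)}
```

The spread of values in `sensitivity` illustrates why an inaccurate $z$ can introduce noticeable error when plot-level richness is scaled to the pixel area.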

The DESIS sensor covers the visible and near-infrared (VNIR) portions of the spectrum. As shown in this study, this spectral range is informative in plant biodiversity mapping. However, it is also worthwhile to leverage the potential of the short-wave-infrared (SWIR) bands. The upcoming EnMAP imaging spectroscopy mission [Guanter et al., 2015] is scheduled to be launched in 2022. The EnMAP dual-spectrometer instrument, covering both VNIR and SWIR from 420 nm to 2450 nm (as shown in Fig. 3), will provide an opportunity to integrate information from SWIR for plant biodiversity mapping. In addition to hyperspectral optical imagery, the combination of data from other sensor types, such as synthetic-aperture radar (SAR) or LiDAR data, could also be explored to improve our ability in remote mapping of plant biodiversity.

5 Conclusion

Spaceborne hyperspectral remote sensing is a promising and cost-effective data source to enable plant biodiversity mapping. Thanks to its advanced spectral and spatial specifications, the recently launched hyperspectral instrument DESIS (the DLR Earth Sensing Imaging Spectrometer) opens up an opportunity to monitor plant biodiversity at a finer spatial scale and with a higher accuracy. In this study, we assessed the ability of DESIS hyperspectral data in predicting plant species richness in the Southern Tablelands and Snowy Mountains regions in southeast New South Wales, Australia. The spectral features were firstly extracted, and then correlated to plant species richness via statistical regression. We evaluated the performance of several combinations of feature extraction procedures (PCA, CCA, and PLS) and regression models (KRR, GPR, and RFR). The main findings of this study are summarised as follows:

(1) Plant species richness values were predicted from DESIS data in the two study regions. Prediction accuracies fell within a comparable range for different combinations of feature extraction techniques and regression models (Tables 3 and 4). The best prediction results were $r=0.76$ and $\text{RMSE}=5.89$ for the Southern Tablelands region, and $r=0.68$ and $\text{RMSE}=5.95$ for the Snowy Mountains region.

(2) The correlation between DESIS hyperspectral data and plant species richness was region-specific. Modelling the correlation separately for each region produced better results than building a single model for all regions.

(3) The relative importance analysis conducted among DESIS bands showed that the red-edge, red, and blue spectral regions are more important in predicting plant species richness than the green bands and the near-infrared bands beyond red-edge (noting that the SWIR region is not sampled by DESIS).

(4) The DESIS hyperspectral data performed better than multispectral data in predicting plant species richness, indicating that the provision of richer information in the spectral domain is important for diversity mapping.

Results shown in this study provided a quantitative reference on the potential for spaceborne hyperspectral data to be used in the mapping of on-ground plant species richness. Future studies should focus on extending the current approach to larger areas, investigating the potential of upcoming hyperspectral missions that extend into the SWIR region, and exploring the combination of data from other sensor types.

6 Acknowledgement

The authors are grateful to the anonymous reviewers for their important and insightful comments for improving this manuscript.

References

  • Alonso et al. [2019] Alonso, K., Bachmann, M., Burch, K., Carmona, E., Cerra, D., De los Reyes, R., Dietrich, D., Heiden, U., Hölderlin, A., Ickes, J., et al., 2019. Data products, quality and validation of the DLR Earth sensing imaging spectrometer (DESIS). Sensors 19, 4471.
  • Asner [2008] Asner, G.P., 2008. Hyperspectral remote sensing of canopy chemistry, physiology, and biodiversity in tropical rainforests. Hyperspectral remote sensing of tropical and sub-tropical forests, 261–296.
  • Berk [2016] Berk, A., 2016. MODTRAN 5.4.0 User’s Manual. Spectral Sciences Inc., Burlington, MA.
  • Boochs et al. [1990] Boochs, F., Kupfer, G., Dockter, K., Kühbauch, W., 1990. Shape of the red edge as vitality indicator for plants. International Journal of Remote Sensing 11, 1741–1753.
  • Bush et al. [2017] Bush, A., Sollmann, R., Wilting, A., Bohmann, K., Cole, B., Balzter, H., Martius, C., Zlinszky, A., Calvignac-Spencer, S., Cobbold, C.A., et al., 2017. Connecting Earth observation to high-throughput biodiversity data. Nature Ecology & Evolution 1, 1–9.
  • Cardinale et al. [2007] Cardinale, B.J., Wright, J.P., Cadotte, M.W., Carroll, I.T., Hector, A., Srivastava, D.S., Loreau, M., Weis, J.J., 2007. Impacts of plant diversity on biomass production increase through time because of species complementarity. Proceedings of the National Academy of Sciences 104, 18123–18128.
  • Carlson et al. [2007] Carlson, K.M., Asner, G.P., Hughes, R.F., Ostertag, R., Martin, R.E., 2007. Hyperspectral remote sensing of canopy biodiversity in Hawaiian lowland rainforests. Ecosystems 10, 536–549.
  • Ceballos et al. [2015] Ceballos, G., Ehrlich, P.R., Barnosky, A.D., García, A., Pringle, R.M., Palmer, T.M., 2015. Accelerated modern human–induced species losses: Entering the sixth mass extinction. Science Advances 1, e1400253.
  • De Palma et al. [2021] De Palma, A., Hoskins, A., Gonzalez, R.E., Börger, L., Newbold, T., Sanchez-Ortiz, K., Ferrier, S., Purvis, A., 2021. Annual changes in the Biodiversity Intactness Index in tropical and subtropical forest biomes, 2001–2012. Scientific Reports 11, 1–13.
  • Delegido et al. [2013] Delegido, J., Verrelst, J., Meza, C., Rivera, J., Alonso, L., Moreno, J., 2013. A red-edge spectral index for remote sensing estimation of green LAI over agroecosystems. European Journal of Agronomy 46, 42–52.
  • Eckardt et al. [2015] Eckardt, A., Horack, J., Lehmann, F., Krutz, D., Drescher, J., Whorton, M., Soutullo, M., 2015. DESIS (DLR Earth sensing imaging spectrometer for the ISS-MUSES platform), in: 2015 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), IEEE. pp. 1457–1459.
  • Fallding [2002] Fallding, M., 2002. A planning framework for natural ecosystems of the ACT and NSW Southern Tablelands. Natural Heritage Trust, NSW National Parks and Wildlife Service.
  • Fassnacht et al. [2022] Fassnacht, F.E., Müllerova, J., Conti, L., Malavasi, M., Schmidtlein, S., 2022. About the link between biodiversity and spectral variation. Applied Vegetation Science, e12643.
  • Feilhauer et al. [2015] Feilhauer, H., Asner, G.P., Martin, R.E., 2015. Multi-method ensemble selection of spectral bands related to leaf biochemistry. Remote Sensing of Environment 164, 57–65.
  • Frankel et al. [1995] Frankel, O.H., Brown, A.H., Burdon, J.J., 1995. The Conservation of Plant Biodiversity. Cambridge University Press.
  • Féret and Asner [2014] Féret, J.B., Asner, G.P., 2014. Mapping tropical forest canopy diversity using high-fidelity imaging spectroscopy. Ecological Applications 24, 1289–1296. doi:10.1890/13-1824.1.
  • Gellie et al. [2018] Gellie, N.J., Hunter, J.T., Benson, J.S., Kirkpatrick, J.B., Cheal, D.C., McCreery, K., Brocklehurst, P., 2018. Overview of plot-based vegetation classification approaches within Australia. Phytocoenologia 48, 251–272.
  • Ghiyamat and Shafri [2010] Ghiyamat, A., Shafri, H.Z., 2010. A review on hyperspectral remote sensing for homogeneous and heterogeneous forest biodiversity assessment. International Journal of Remote Sensing 31, 1837–1856.
  • Government [2019] Government, N., 2019. BioNet Systematic Flora Survey. https://www.environment.nsw.gov.au/research/VISplot.htm. [Online; accessed 11-February-2022].
  • Grace et al. [2016] Grace, J.B., Anderson, T.M., Seabloom, E.W., et al., 2016. Integrative modelling reveals mechanisms linking productivity and plant species richness. Nature 529, 390–393.
  • Green and Craig [1985] Green, A., Craig, M., 1985. Analysis of aircraft spectrometer data with logarithmic residuals, in: JPL Proc. of the Airborne Imaging Spectrometer Data Anal. Workshop, pp. 111–119.
  • Guanter et al. [2015] Guanter, L., Kaufmann, H., Segl, K., Foerster, S., Rogass, C., Chabrillat, S., Kuester, T., Hollstein, A., Rossner, G., Chlebek, C., et al., 2015. The EnMAP spaceborne imaging spectroscopy mission for Earth observation. Remote Sensing 7, 8830–8857.
  • Guo et al. [2018] Guo, Y., Jia, X., Paull, D., 2018. Effective sequential classifier training for svm-based multitemporal remote sensing image classification. IEEE Transactions on Image Processing 27, 3036–3048.
  • Guo et al. [2022] Guo, Y., Mokany, K., Ong, C., Moghadam, P., Ferrier, S., Levick, S., 2022. Quantitative assessment of DESIS hyperspectral data for plant biodiversity estimation in Australia, in: 2022 IEEE International Geoscience and Remote Sensing Symposium, IEEE. pp. 1744–1747.
  • Hacker et al. [2020] Hacker, P.W., Coops, N.C., Townsend, P.A., Wang, Z., 2020. Retrieving foliar traits of quercus garryana var. garryana across a modified landscape using leaf spectroscopy and LiDAR. Remote Sensing 12, 26.
  • Jia and Richards [1999] Jia, X., Richards, J.A., 1999. Segmented principal components transformation for efficient hyperspectral remote-sensing image display and classification. IEEE Transactions on Geoscience and Remote Sensing 37, 538--542.
  • Kattge et al. [2020] Kattge, J., Bönisch, G., Díaz, S., Lavorel, S., Prentice, I.C., Leadley, P., Tautenhahn, S., Werner, G.D., Aakala, T., Abedi, M., et al., 2020. TRY plant trait database--enhanced coverage and open access. Global Change Biology 26, 119--188.
  • Knipling [1970] Knipling, E.B., 1970. Physical and physiological basis for the reflectance of visible and near-infrared radiation from vegetation. Remote Sensing of Environment 1, 155--159.
  • König et al. [2019] König, C., Weigelt, P., Schrader, J., Taylor, A., Kattge, J., Kreft, H., 2019. Biodiversity data integration—the significance of data resolution and domain. PLoS Biology 17, e3000183.
  • Körner [1995] Körner, C., 1995. Alpine plant diversity: a global survey and functional interpretations, in: Arctic and alpine biodiversity: Patterns, causes and ecosystem consequences. Springer, pp. 45--62.
  • Krutz et al. [2019] Krutz, D., Müller, R., Knodt, U., Günther, B., Walter, I., Sebastian, I., Säuberlich, T., Reulke, R., Carmona, E., Eckardt, A., et al., 2019. The instrument design of the DLR Earth sensing imaging spectrometer (DESIS). Sensors 19, 1622.
  • Leclère et al. [2020] Leclère, D., Obersteiner, M., Barrett, M., Butchart, S.H., Chaudhary, A., De Palma, A., DeClerck, F.A., Di Marco, M., Doelman, J.C., Dürauer, M., et al., 2020. Bending the curve of terrestrial biodiversity needs an integrated strategy. Nature 585, 551--556.
  • Mafanya et al. [2022] Mafanya, M., Tsele, P., Zengeya, T., Ramoelo, A., 2022. An assessment of image classifiers for generating machine-learning training samples for mapping the invasive Campuloclinium macrocephalum (Less.) DC (pompom weed) using DESIS hyperspectral imagery. ISPRS Journal of Photogrammetry and Remote Sensing 185, 188--200.
  • Malhi et al. [2020] Malhi, R.K.M., Anand, A., Mudaliar, A.N., Pandey, P.C., Srivastava, P.K., Sandhya Kiran, G., 2020. Synergetic use of in situ and hyperspectral data for mapping species diversity and above ground biomass in Shoolpaneshwar Wildlife Sanctuary, Gujarat. Tropical Ecology 61, 106--115.
  • Mokany et al. [2020] Mokany, K., Ferrier, S., Harwood, T.D., Ware, C., Di Marco, M., Grantham, H.S., Venter, O., Hoskins, A.J., Watson, J.E., 2020. Reconciling global priorities for conserving biodiversity habitat. Proceedings of the National Academy of Sciences 117, 9906--9911.
  • Mutanga and Skidmore [2007] Mutanga, O., Skidmore, A.K., 2007. Red edge shift and biochemical content in grass canopies. ISPRS Journal of Photogrammetry and Remote Sensing 62, 34--42.
  • Myers et al. [2021] Myers, B.J., Weiskopf, S.R., Shiklomanov, A.N., Ferrier, S., Weng, E., Casey, K.A., Harfoot, M., Jackson, S.T., Leidner, A.K., Lenton, T.M., et al., 2021. A new approach to evaluate and reduce uncertainty of model-based biodiversity projections for conservation policy formulation. BioScience 71, 1261--1273.
  • Ong and Cudahy [2014] Ong, C., Cudahy, T., 2014. Mapping contaminated soils: using remotely-sensed hyperspectral data to predict pH. European Journal of Soil Science 65, 897--906.
  • Palmer et al. [2002] Palmer, M.W., Earls, P.G., Hoagland, B.W., White, P.S., Wohlgemuth, T., 2002. Quantitative tools for perfecting species lists. Environmetrics 13, 121--137.
  • Peng et al. [2018] Peng, Y., Fan, M., Song, J., Cui, T., Li, R., 2018. Assessment of plant species diversity based on hyperspectral indices at a fine scale. Scientific Reports 8, 1--11.
  • Pettorelli et al. [2016] Pettorelli, N., Wegmann, M., Skidmore, A., Mücher, S., Dawson, T.P., Fernandez, M., Lucas, R., Schaepman, M.E., Wang, T., O’Connor, B., et al., 2016. Framing the concept of satellite remote sensing essential biodiversity variables: challenges and future directions. Remote Sensing in Ecology and Conservation 2, 122--131.
  • Pickering and Green [2009] Pickering, C., Green, K., 2009. Vascular plant distribution in relation to topography, soils and micro-climate at five GLORIA sites in the Snowy Mountains, Australia. Australian Journal of Botany 57, 189--199.
  • Pickering et al. [2008] Pickering, C., Hill, W., Green, K., 2008. Vascular plant diversity and climate change in the alpine zone of the Snowy Mountains, Australia. Biodiversity and Conservation 17, 1627--1644.
  • Pignatti et al. [2013] Pignatti, S., Palombo, A., Pascucci, S., Romano, F., Santini, F., Simoniello, T., Umberto, A., Vincenzo, C., Acito, N., Diani, M., et al., 2013. The PRISMA hyperspectral mission: Science activities and opportunities for agriculture and land monitoring, in: 2013 IEEE International Geoscience and Remote Sensing Symposium-IGARSS, IEEE. pp. 4558--4561.
  • De los Reyes et al. [2018] De los Reyes, R., Richter, R., Langheinrich, M., Pflug, B., Schwind, P., 2018. Validation of a new atmospheric correction software using AERONET reference data PACO: Python-based Atmospheric COrrection, in: Workshop on Land Product Validation and Evolution (LPVE2018), pp. 1--1.
  • Richards and Jia [2006] Richards, J.A., Jia, X., 2006. Remote Sensing Digital Image Analysis [4th Edition]. Springer.
  • Rosenzweig [1995] Rosenzweig, M.L., 1995. Species diversity in space and time. Cambridge University Press.
  • Skidmore et al. [2021] Skidmore, A.K., Coops, N.C., Neinavaz, E., Ali, A., Schaepman, M.E., Paganini, M., Kissling, W.D., Vihervaara, P., Darvishzadeh, R., Feilhauer, H., et al., 2021. Priority list of biodiversity metrics to observe from space. Nature Ecology & Evolution 5, 896--906.
  • Stevenson et al. [2021] Stevenson, S.L., Watermeyer, K., Caggiano, G., Fulton, E.A., Ferrier, S., Nicholson, E., 2021. Matching biodiversity indicators to policy needs. Conservation Biology 35, 522--532.
  • Tilman et al. [1997] Tilman, D., Knops, J., Wedin, D., Reich, P., Ritchie, M., Siemann, E., 1997. The influence of functional diversity and composition on ecosystem processes. Science 277, 1300--1302.
  • Tilman et al. [1996] Tilman, D., Wedin, D., Knops, J., 1996. Productivity and sustainability influenced by biodiversity in grassland ecosystems. Nature 379, 718--720.
  • Tollefson [2019] Tollefson, J., 2019. Humans are driving one million species to extinction. Nature 569, 171--172.
  • Verrelst et al. [2021] Verrelst, J., Rivera-Caicedo, J.P., Reyes-Muñoz, P., Morata, M., Amin, E., Tagliabue, G., Panigada, C., Hank, T., Berger, K., 2021. Mapping landscape canopy nitrogen content from space using PRISMA data. ISPRS Journal of Photogrammetry and Remote Sensing 178, 382--395.
  • Wang and Gamon [2019] Wang, R., Gamon, J.A., 2019. Remote sensing of terrestrial plant biodiversity. Remote Sensing of Environment 231, 111218.
  • Wang et al. [2016] Wang, R., Gamon, J.A., Montgomery, R.A., et al., 2016. Seasonal variation in the NDVI--species richness relationship in a prairie grassland experiment (Cedar Creek). Remote Sensing 8, 128.
  • Wang et al. [2019] Wang, Z., Townsend, P.A., Schweiger, A.K., Couture, J.J., Singh, A., Hobbie, S.E., Cavender-Bares, J., 2019. Mapping foliar functional traits and their uncertainties across three years in a grassland experiment. Remote Sensing of Environment 221, 405--416.
  • Xu et al. [2019] Xu, M., Jia, X., Pickering, M., Jia, S., 2019. Thin cloud removal from optical remote sensing images using the noise-adjusted principal components transform. ISPRS Journal of Photogrammetry and Remote Sensing 149, 215--225.
  • Zhao et al. [2014a] Zhao, F., Guo, Y., Huang, Y., Reddy, K.N., Lee, M.A., Fletcher, R.S., Thomson, S.J., 2014a. Early detection of crop injury from herbicide glyphosate by leaf biochemical parameter inversion. International Journal of Applied Earth Observation and Geoinformation 31, 78--85.
  • Zhao et al. [2014b] Zhao, F., Huang, Y., Guo, Y., Reddy, K.N., Lee, M.A., Fletcher, R.S., Thomson, S.J., 2014b. Early detection of crop injury from glyphosate on soybean and cotton using plant leaf hyperspectral data. Remote Sensing 6, 1538--1563.