Challenges and Opportunities in Location Modeling for Large-Scale Geospatial Prediction Problems
Abstract.
Location Encoding…
1. Introduction and Motivation
The rapid development of novel deep learning and representation learning techniques and the increasing availability of diverse, large-scale geospatial data have fueled substantial progress in geospatial artificial intelligence (GeoAI) research (smith1984artificial; couclelis1986artificial; openshaw1997artificial; janowicz2020geoai). This includes progress on a wide spectrum of challenging tasks such as terrain feature detection and extraction (li2020automated), land use classification (zhong2019deep), navigation in the urban environment (mirowski2018learning), image geolocalization (weyand2016planet; izbicki2019exploiting), toponym recognition and disambiguation (delozier2015gazetteer; wang2020neurotpr), geographic knowledge graph completion and summarization (qiu2019knowledge; yan2019spatially), traffic forecasting (li2017diffusion), to name a few.
Despite the fact that these models are very different in design, they share a common characteristic: they all need to represent (or encode) different types of spatial data, such as points (e.g., points of interest (POIs)), polylines (e.g., trajectories), polygons (e.g., administrative regions), graphs/networks (e.g., transportation networks), or rasters (e.g., satellite images), in a hidden embedding space so that they can be utilized by machine learning models such as deep neural nets (NN). For raster data, this encoding process is straightforward since regular grid structures can be directly handled by existing deep learning models such as convolutional neural networks (CNN) (krizhevsky2012imagenet). The representation problem is more complicated for vector data such as point sets, polylines, polygons, and networks, which have more irregular spatial organization, because concepts such as location, distance, and direction do not have straightforward counterparts in existing NNs, and it is not trivial to design NN operations (e.g., convolution) for irregularly structured data (valsesia2018learning).
Early efforts performed data transformation operations to convert the underlying spatial data into a format that can be handled by existing NN modules (wang2019dynamic). However, this conversion process often leads to information loss. For example, much early research on point cloud classification and segmentation first converted 3D point clouds into volumetric representations (e.g., voxelized shapes) (maturana2015voxnet; qi2016volumetric) or rendered them into 2D images (su2015multi; qi2016volumetric), and then applied 3D or 2D CNNs on these converted representations for the classification or segmentation tasks. These practices have a major limitation: choosing an appropriate spatial resolution for a volumetric representation is challenging (qi2017pointnet). A finer spatial resolution leads to data sparsity and higher computation cost, while a coarser spatial resolution yields poor prediction results.
The reason for performing such data conversions is the lack of means to directly handle vector data in deep neural nets. An alternative approach is to encode these spatial data models directly. The first step towards such a goal is to encode a point location into an embedding space such that the resulting location embeddings can be easily used by downstream NN modules. This is the idea of location encoding.
Location encoding (mac2019presence; mai2020multi; zhong2020reconstructing; mai2020se; gao2018learning; xu2018encoding; chu2019geo) refers to an NN-based encoding process which represents a point/location as a high-dimensional vector/embedding such that this embedding can preserve different spatial information (e.g., distance and direction) and, at the same time, be learning-friendly for downstream machine learning (ML) models such as neural nets and support vector machines (SVMs). By learning-friendly we mean that the downstream model does not need to be very complex and does not require lots of training data to prevent model overfitting. The encoding results are called location embeddings, and the corresponding NN architecture is called a location encoder, which is a general-purpose model that can be incorporated into different GeoAI models for different downstream tasks.
Figure 1 is an illustration of location encoding. Here, we use location-based species classification as an example downstream task, which aims at predicting the species $y$ observed at a given location $\mathbf{x}$. The training objective is to learn the conditional distribution $P(y \mid \mathbf{x})$, i.e., the probability of observing species $y$ given location $\mathbf{x}$, which is highly non-linear. The idea of location encoding can be understood as a feature decomposition process which decomposes a location $\mathbf{x}$ (e.g., a two-dimensional vector of latitude and longitude) into a learning-friendly high-dimensional vector $LE(\mathbf{x})$ (e.g., a vector with 100 dimensions), such that the highly non-linear distribution $P(y \mid \mathbf{x})$ can be learned with a relatively simple learner such as a linear SVM or a shallow NN model. The key benefits of such an architecture are that it requires less training data with simpler learners, and that unsupervised training can be leveraged to better learn the location representations.
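As a concrete sketch of this decomposition, the snippet below implements a minimal sinusoidal location encoder. The function name, embedding dimensionality, and wavelength range are our own illustrative assumptions rather than any specific published encoder; the point is only that a 2D coordinate is mapped to a high-dimensional vector in which nearby locations receive similar embeddings.

```python
import numpy as np

def location_encoder(loc, dim=100):
    """Minimal sinusoidal location encoder sketch (hypothetical parameters).

    Maps a 2D location (lon, lat, in degrees) to a `dim`-dimensional
    embedding of sine/cosine features at geometrically spaced wavelengths,
    so that nearby locations get similar embeddings while distant
    locations decorrelate.
    """
    loc = np.asarray(loc, dtype=float)                 # shape (2,)
    n_freqs = dim // 4                                 # 2 coords x (sin, cos) per frequency
    wavelengths = np.geomspace(0.01, 360.0, n_freqs)   # assumed min/max scales in degrees
    freqs = 2.0 * np.pi / wavelengths
    angles = np.outer(freqs, loc).ravel()              # shape (2 * n_freqs,)
    return np.concatenate([np.sin(angles), np.cos(angles)])

emb = location_encoder([-122.3, 47.6], dim=100)        # a 100-dim location embedding
```

A simple linear model trained on such embeddings can then approximate a highly non-linear $P(y \mid \mathbf{x})$.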

Recently, the effectiveness of location encoding has been demonstrated in multiple GeoAI tasks including geo-aware image classification (yin2019gps2vec; chu2019geo; mac2019presence; mai2020multi), POI classification (mai2020multi), place annotation (yin2019gps2vec), trajectory prediction (xu2018encoding; yin2019gps2vec), location privacy protection (rao2020lstm), geographic question answering (mai2020se), 3D protein distribution reconstruction (zhong2020reconstructing), point cloud classification and segmentation (qi2017pointnet; qi2017pointnet++; li2018pointcnn), and so on. Despite these successful stories, there is still a lack of a systematic review on such a topic. This paper fills this gap by providing a comparative survey on different location encoding models. We give a general conceptual formulation framework which unifies almost all existing location encoding methods.
It is worth mentioning that the location encoding discussed in this work is different from traditional location encoding systems (i.e., geocoding systems; see https://gogeomatics.ca/location-encoding-systems-could-geographic-coordinates-be-replaced-and-at-what-cost/), which convert geographic coordinates into codes using an encoding scheme such as Geohash, or into codes for partition tiles such as Open Location Code and what3words. These traditional encoding systems are designed to support navigation and spatial indexing, while the neural location encoders we present here are used to support downstream ML models.
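To make the contrast concrete, the snippet below sketches a simplified quadtree-style geocoding scheme. It is a hypothetical illustration in the spirit of Geohash, not the actual Geohash or Open Location Code algorithm: each level halves the latitude/longitude bounding box, so nearby locations share long key prefixes. The output is a discrete code suited to spatial indexing, not the dense, learning-friendly embedding that a neural location encoder produces.

```python
def quadkey(lat, lon, depth=12):
    """Simplified quadtree geocode sketch (not a real standard).

    Recursively splits the world into four cells; the returned string of
    digits 0-3 identifies the cell containing (lat, lon) at `depth` levels.
    Nearby points share long prefixes, which supports indexing.
    """
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    key = ""
    for _ in range(depth):
        lat_mid = (lat_lo + lat_hi) / 2.0
        lon_mid = (lon_lo + lon_hi) / 2.0
        cell = 0
        if lat >= lat_mid:       # upper half -> bit 2
            cell += 2
            lat_lo = lat_mid
        else:
            lat_hi = lat_mid
        if lon >= lon_mid:       # right half -> bit 1
            cell += 1
            lon_lo = lon_mid
        else:
            lon_hi = lon_mid
        key += str(cell)
    return key
```

Such codes index space well, but feeding them to an ML model as raw strings or integers discards metric structure, which is exactly what neural location encoders aim to preserve.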
The contributions of our work are as follows:
(1) Although there are multiple existing works on location encoding, the necessity of designing such a model has not been made clear. In this work, we formally define the location encoding problem and discuss its necessity from a machine learning perspective.
(2) We conduct a systematic review of existing location encoding research. A detailed classification system for location encoders is provided and all models are reformulated under a unified framework. This allows us to identify the commonalities and differences among different location encoding models. As far as we know, this is the first review on this topic.
(3) We extend the idea of location encoding to the broader topic of encoding different types of spatial data (e.g., polylines, polygons, graphs, and rasters), and discuss possible solutions and challenges.
(4) To emphasize the general applicability of location encoding, we discuss its potential applications in different geoscience domains. We hope these discussions can open up new areas of research.
The rest of this paper is structured as follows. We first introduce a formal definition of location encoding in Section LABEL:sec:def. Then, in Section 2, we discuss the necessity of location encoding. Next, we provide a general framework for understanding the current landscape of location encoding research and survey a collection of representative work in Section LABEL:sec:review. In Section LABEL:sec:complex-geo, we discuss how to apply location encoding for different types of spatial data. Finally, we conclude our work and discuss future research directions in Section LABEL:sec:future.
2. The necessity of location encoding for GeoAI
In this section we motivate the need to embed a location $\mathbf{x}$ into a high-dimensional vector $LE(\mathbf{x})$, which may seem counter-intuitive at first. We mainly address this issue from a machine learning perspective.
A key concept in statistical machine learning is the bias-variance trade-off (hastie2009). On the one hand, when a learning system is required to pick one hypothesis out of a large hypothesis space (e.g., deciding the parameters of a large 24-layer neural network), it is flexible enough to approximate almost any non-linear distribution (low bias). However, it needs a lot of training data to prevent overfitting. This is the low bias, high variance situation. On the other hand, when the hypothesis space is restricted (e.g., linear regression or single-layer neural nets), the system has little chance to overfit, but might be ill-suited to model the underlying distribution, resulting in low performance on both training and test sets (high bias). This situation is called low variance, high bias. For many applications the data distribution is complex and highly non-linear, and we may not have enough domain knowledge to design models with low variance (the effective model complexity) and low bias (the model-data mismatch) at the same time. Moreover, we might want to avoid building too much domain knowledge into the model design, which would make the resulting model task-specific. For example, the distribution of plant species (such as in Figure 1) may be highly irregular, influenced by several geospatial factors and interactions among species (mac2019presence). Kernel (smoothing) methods (e.g., those based on Radial Basis Functions (RBF)) and neural networks (e.g., feed-forward nets) are two of the most successful model families that require very little domain knowledge for model design, and both have well-established ways of controlling the effective model complexity. Kernel methods are well suited to low-dimensional input data, modeling highly non-linear distributions with little model complexity. However, they need to store the kernels during inference time, which is not memory efficient. Neural networks have more representation power, meaning a deep network can approximate very complex functions with little bias, while requiring more domain knowledge in model design to achieve lower variance and bias.
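The memory cost of kernel methods can be seen in a few lines. The following Nadaraya-Watson-style RBF smoother is our own minimal sketch (the function name and `gamma` value are arbitrary choices): predictions are kernel-weighted averages of the training labels, so the training points themselves must be kept around at inference time.

```python
import numpy as np

def rbf_predict(x_query, x_train, y_train, gamma=10.0):
    """RBF kernel smoothing sketch: the prediction at x_query is a
    Gaussian-kernel-weighted average of all training labels, so x_train
    and y_train must both be stored for inference."""
    d2 = ((x_train - x_query) ** 2).sum(axis=-1)   # squared distances to all training points
    w = np.exp(-gamma * d2)                        # Gaussian kernel weights
    return (w * y_train).sum() / w.sum()           # weighted label average

# Toy data: label 0 near the origin, label 1 near (1, 1).
x_train = np.array([[0.0, 0.0], [1.0, 1.0]])
y_train = np.array([0.0, 1.0])
pred = rbf_predict(np.array([0.05, 0.05]), x_train, y_train)  # close to 0
```

A neural network, by contrast, compresses the training data into a fixed set of weights, trading memory at inference for the design and training effort discussed above.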
From a statistical machine learning perspective, the main purpose of location encoding is to produce learning-friendly representations of geographic locations for downstream models such as SVMs and neural networks. By learning-friendly we mean that the downstream model does not need to be very complex nor require large training samples. For example, the location encoding process may perform a feature decomposition ($\mathbf{x} \mapsto LE(\mathbf{x}) \in \mathbb{R}^d$, where $d \gg 2$) so that the distribution we want to model, such as the one in Figure 1, becomes linear in the decomposed feature space, and a simple linear model can be applied. Figure 2 illustrates this idea using a simple binary classification task. If we use the original geographic coordinates as the input features to train the binary classifier, the resulting classifier will be a complex, nonlinear function which is prone to overfitting, as shown on the left of Figure 2. After the location encoding process, the geographic coordinate features are decomposed so that a simple linear model can be used as the binary classifier (in practice the dimensionality of the location embedding space will be larger, e.g., 32 or 128; we use 3D here for ease of illustration).
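This kind of lifting can be sketched concretely. The decomposition below, $\phi(x, y) = (x, y, x^2 + y^2)$, is our own toy choice rather than one of the surveyed encoders: points inside versus outside a circle are not linearly separable in raw coordinates, but become perfectly separable by a single linear threshold in the lifted 3D space.

```python
import numpy as np

# Toy version of Figure 2's idea: points inside the unit circle are one
# class, points outside are the other. No linear classifier works on raw
# (x, y), but the decomposition phi(x, y) = (x, y, x^2 + y^2) lifts the
# data into 3D where one linear decision boundary separates them exactly.

rng = np.random.default_rng(0)
pts = rng.uniform(-2.0, 2.0, size=(500, 2))
labels = pts[:, 0] ** 2 + pts[:, 1] ** 2 < 1.0          # True = inside circle

phi = np.column_stack([pts, pts[:, 0] ** 2 + pts[:, 1] ** 2])  # 3D lifted features

# A fixed linear classifier in the lifted space: w = (0, 0, -1), bias = 1,
# i.e., predict "inside" exactly when x^2 + y^2 < 1.
pred = phi @ np.array([0.0, 0.0, -1.0]) + 1.0 > 0.0
accuracy = np.mean(pred == labels)
```

Here we chose the lifting by hand; a learned location encoder plays the same role without requiring us to know the right decomposition in advance.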

3. Different application domains for location encoding
To further show the flexibility and generalizability of location encoding, this section will discuss its possible applications in different research domains.
3.1. Point cloud based mapping and recognition
Point cloud based mapping and recognition is among the most important usages of location encoding and is essential for many real-world applications such as autonomous navigation (geiger2012we), housekeeping robots (oh2002development), augmented/virtual reality (park2008multiple), automated indoor mapping (huitl2012tumindoor), face detection (e.g., iPhone's Face ID feature), and so on. Many models we discussed in Section LABEL:sec:review such as PointNet, VoxelNet, and PointCNN were originally proposed for point cloud based tasks including point cloud classification and segmentation (qi2017pointnet; qi2017pointnet++), point cloud based 3D object recognition (zhou2018voxelnet), and point cloud generation (achlioptas2018learning; valsesia2018learning).
Despite these success stories, there are multiple challenges to be solved in this domain. Typical point clouds obtained from LiDAR contain 100k+ points, which results in high computation and memory requirements (zhou2018voxelnet). How to scale up current models to the size of real-world point clouds while preserving the ability to capture fine-grained localized features remains challenging. Moreover, current point-based networks usually resort to expensive neighbor search mechanisms such as KNN search, which significantly impacts model efficiency. How to design a more efficient neighborhood search is worth further investigation (guo2020deep).
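To illustrate why neighbor search dominates, here is a brute-force KNN sketch (a hypothetical illustration, not taken from any specific model): computing all pairwise distances is O(N²) in both time and memory, which becomes prohibitive for LiDAR-scale clouds with 100k+ points and motivates grid- or tree-based alternatives.

```python
import numpy as np

def knn_bruteforce(points, k):
    """Return, for each point, the indices of its k nearest neighbors.

    Builds the full (N, N) squared-distance matrix via broadcasting,
    so cost is O(N^2) in time and memory -- fine for small clouds,
    prohibitive at LiDAR scale.
    """
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)  # (N, N)
    np.fill_diagonal(d2, np.inf)           # exclude each point itself
    return np.argsort(d2, axis=1)[:, :k]   # (N, k) neighbor indices

rng = np.random.default_rng(0)
cloud = rng.standard_normal((200, 3))      # a small synthetic 3D point cloud
neighbors = knn_bruteforce(cloud, k=8)     # shape (200, 8)
```

Doubling N quadruples this cost, which is why approximate or spatially indexed neighbor search is an active research direction for point-based networks.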
3.2. Human mobility and urban studies
Human mobility is another important application area given the increasing availability of large-scale mobility data generated by location-based services (dodge2020progress). Section LABEL:sec:line has discussed how to encode trajectories as polylines into an embedding space for several applications such as trajectory prediction (xu2018encoding) and trajectory synthesis (rao2020lstm). Moreover, location encoding is useful for understanding urban spatial structure and urban dynamics. Possible applications are place representation learning (yan2017itdl; liu2019place), urban zone representation learning (zhai2019beyond; fu2019efficient), place characteristic prediction (zhu2020understanding), and urban traffic forecasting (li2017diffusion; cai2020traffic).
3.3. Biodiversity and species spatiotemporal distribution modeling
Traditionally, the study of species spatiotemporal distribution modeling (SDM) (zuo2008geosvm) has been limited to small spatial scales due to the lack of species occurrence data, which has also prohibited the use of deep learning models on this topic. Recently, multiple large-scale species occurrence datasets have been constructed, such as the iNaturalist dataset (https://www.inaturalist.org/projects/city-nature-challenge-2020) (van2018inaturalist), the Global Biodiversity Information Facility (GBIF) dataset (https://www.gbif.org/), the Pl@ntNet dataset (https://plantnet.org/en/), and the GeoLifeCLEF dataset (https://www.imageclef.org/GeoLifeCLEF2020) (cole2020geolifeclef). These datasets accelerate the development of location encoders for geo-aware fine-grained species recognition and SDM (chu2019geo; mac2019presence; mai2020multi). This will be an interesting application area for location encoding models.
3.4. Geospatial semantics
Geospatial semantics (kuhn2005geospatial; janowicz2012geospatial) is about “understanding GIS contents, and capturing this understanding in formal theories”. hu2017geospatial further divided this definition into two parts: understanding and formalization. The first part is mainly about human cognition of geographic concepts and natural language understanding of geospatial text contents. Important applications include toponym recognition and disambiguation (delozier2015gazetteer; ju2016things; wallgrun2018geocorpora; wang2020neurotpr), text geolocalization (wing2011simple; izbicki2019geolocating), spatial relation extraction (ramalho2018encoding), geographic information retrieval (purves2011geographic; janowicz2011semantics; hu2015metadata; jiang2018towards; mai2020semantically), and text based geographic question answering (chen2014parameterized). The second part focuses on capturing this understanding through formal theories such as ontologies. Important applications include geo-ontology engineering (janowicz2012observation; calegari2016supporting), geographic knowledge graph construction (regalia2019computing) and entity alignment (trisedya2019entity), and geo-ontology matching (delgado2013evaluation; zhu2016spatial). Many applications rely on both understanding and formalization, such as geo-analytical QA (scheider2020geo; xu2020extracting), thematic signature learning for geographic feature types (adams2015thematic), GeoKG based geographic QA (mai2019relaxing; mai2020se), and geographic knowledge graph summarization (yan2019spatially).
Location encoding can be utilized in many tasks discussed above. For example, it can be directly used in multiple GeoKG based tasks (e.g., summarization, alignment, and geo-ontology matching) to encode the spatial footprint of each geographic entity, following the idea we discussed in Section LABEL:sec:graph. For place name disambiguation, location encoding can be used to capture the correspondence between a place's location and its semantic context (e.g., place description). It can learn a spatial distribution of thematic topics over the world, which serves as prior knowledge for this task (ju2016things).
As for other tasks such as text geolocalization, locations are used as the model output rather than the input. For these tasks, how location encoding can be incorporated into existing model designs remains to be examined in future research.
3.5. Climate science
Since understanding the underlying mechanisms behind important weather phenomena is the major objective of climate science, deep learning technologies are not yet widely used in this domain given their limited interpretability. Nevertheless, we see an increasing number of studies that utilize deep learning models for different climate prediction tasks such as precipitation forecasting (agrawal2019machine), El Niño-Southern Oscillation (ENSO) forecasting (ham2019deep), weather model downscaling (sachindra2018statistical), and drought prediction (agana2017deep; kaur2020deep). Location encoding can be useful in these tasks when the underlying datasets use vector data models, such as SST data collected from buoys (https://www.ndbc.noaa.gov/) and weather condition data collected from weather balloons.