
Length Learning for Planar Euclidean Curves

Barak Or and Liam Hazan. B. Or is with the Department of Marine Technologies, Charney School of Marine Sciences, University of Haifa, Israel (e-mail: [email protected]). L. Hazan is with the Industrial Engineering School at the Technion, Israel (e-mail: [email protected]).
Abstract

In this work, we use deep neural networks (DNNs) to solve a fundamental problem in differential geometry. The literature offers many closed-form expressions for calculating curvature, length, and other geometric properties. Since these concepts are well understood, we are motivated to reconstruct them with deep neural networks and, in this framework, to learn geometric properties from examples. The simplest geometric object is a curve, so this work focuses on learning the length of planar sampled curves drawn from a dataset of sine waves. To this end, the fundamental length axioms were reconstructed using a supervised learning approach. Following these axioms, a simplified DNN model, which we call ArcLengthNet, was established. Its robustness to additive noise and discretization errors was tested.

Index Terms:
Differential geometry, length learning, planar Euclidean curves, supervised learning.

I Introduction

The calculation of curve length is a major component in many classical and modern problems involving numerical differential geometry [1, 2]. For example, verifying a handwritten signature involves computing the length along the curve [3]. Several numerical constraints affect the quality of the length calculation: additive noise, discretization error, and partial information. A robust approach to handle them is required.
In recent years, Machine Learning (ML) has become highly popular. It has achieved great success in solving many classification, regression, and anomaly detection tasks [4]. A sub-field of ML is Deep Neural Networks (DNNs), which outperform many classical methods by employing deep architectures [4]. An efficient DNN architecture finds intrinsic properties by using convolutional operators (and some more sophisticated operators) and generalizes them. Their success is related to the enormous amount of available data and to the high computational resources available to optimize over it. In this work, we address a fundamental question in the field of differential geometry [5], and we aim to reconstruct a basic property using a DNN. The simplest geometric object is a curve. To characterize a curve, one can define metrics such as length and curvature and distinguish one curve from another. There are many closed-form expressions for calculating the length, curvature, and other geometric properties in the classical literature [6]. However, since DNNs are powerful function approximators, we are highly motivated to reconstruct the fundamental length property of curves, the arclength, by designing a DNN. We focus on the two-dimensional Euclidean domain. The task was formulated in a supervised learning setting, where a data-dependent, learning-based approach was applied by feeding one example at a time through our DNN model and minimizing a unique loss function that satisfies the length axioms. For simplicity, we focused on sine wave curves and created a dataset by tuning the wave amplitude, phase, translation, and rotation to cover a wide range of geometric representations. The resulting trained DNN is called ArcLengthNet. It receives as input a two-dimensional vector representing the sampled planar Euclidean curve and outputs its length.
Related papers in the literature mainly address higher-level geometric information with deep learning approaches [7, 8]. That said, a fundamental property was reconstructed by a DNN in [9], where a curvature-based invariant signature was learned using a Siamese network configuration [10]. The authors presented the advantages of using DNNs to reconstruct the curvature signature, chief among them robustness to noise and sampling errors.
The main contribution of this work is the reconstruction of the length property. For that, two architectures were designed: one based on Convolutional Neural Networks (CNNs) and the other based on Recurrent Neural Networks (RNNs). We show that the CNN-based architecture outperforms the RNN-based one, thereby establishing ArcLengthNet as a CNN-based architecture. The advantages of ArcLengthNet are presented.
The remainder of the paper is organized as follows: Section 2 summarizes the geometric background of the length properties. Section 3 provides a detailed description of the learning approach, where the two architectures are presented. Section 4 presents the results, followed by a discussion. Section 5 gives the conclusions.

II Geometric Background of Arclength

In this section, the length properties are presented and the discretization error is reviewed.

II-A Length properties

Consider a planar parametric differentiable curve in the Euclidean space, C\left(p\right)=\left\{{x\left(p\right),y\left(p\right)}\right\}\in{{\cal R}^{2}}, where x and y are the curve coordinates parameterized by p\in[0,N], and N is a partition parameter. The Euclidean length of the curve is given by

l\left(p\right)=\int_{0}^{p}{\left|{{C_{\tilde{p}}}\left({\tilde{p}}\right)}\right|d\tilde{p}}=\int_{0}^{p}{\sqrt{x_{\tilde{p}}^{2}+y_{\tilde{p}}^{2}}d\tilde{p}}, (1)

where x_{p}=\frac{dx}{dp},\ y_{p}=\frac{dy}{dp}. Summing all the increments results in the total length of C, given by

{\cal L}=\int_{0}^{N}{\left|{{C_{\tilde{p}}}\left({\tilde{p}}\right)}\right|d\tilde{p}}. (2)
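As a quick numerical check of Eq. (2), the sketch below (not part of the paper) evaluates the length integral for the illustrative curve x(p)=p, y(p)=\sin(p) on [0, 2\pi] with standard quadrature; the curve choice and interval are our assumptions.

import numpy as np
from scipy.integrate import quad

# Integrand of Eq. (2): sqrt(x_p^2 + y_p^2) with x(p) = p, y(p) = sin(p)
integrand = lambda p: np.sqrt(1.0 + np.cos(p) ** 2)
total_length, _ = quad(integrand, 0.0, 2.0 * np.pi)
print(total_length)  # approximately 7.64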

Following the length definition, the main length axioms are provided.
Additivity: The length is additive with respect to concatenation, where for any C_{1} and C_{2} the following holds

{\cal L}\left({{C_{1}}}\right)+{\cal L}\left({{C_{2}}}\right)={\cal L}\left({{C_{1}}\cup{C_{2}}}\right). (3)

Invariance: The length is invariant with respect to rotation ({\bf R}) and translation ({\bf T}),

{\cal L}\left({\left({{\bf{T}}+{\bf{R}}}\right)C}\right)={\cal L}\left(C\right). (4)

Monotonicity: The length is monotone, where for any C_{1} and C_{2} the following holds

{\cal L}\left({{C_{1}}}\right)\leq{\cal L}\left({{C_{2}}}\right),\quad{C_{1}}\subset{C_{2}}. (5)

Non-negativity: The length of any curve is non-negative,

{\cal L}\left(C\right)\geq 0. (6)

In order to reconstruct the length property with a DNN, a discretization of the curve must be applied. As a consequence, the computation is prone to errors.

II-B Discretization error

The curve C lies on a closed interval [\alpha,\beta]. In order to find the length by a discretized process, a partition of the interval is applied, where

{\cal P}=\left\{{\alpha={p_{0}}<{p_{1}}<{p_{2}}<\cdots<{p_{N}}=\beta}\right\}. (7)

For every partition {\cal P}, the curve length can be approximated by the sum

s\left({{\cal P}}\right)=\sum\limits_{n=1}^{N}{\left|{C\left({{p_{n}}}\right)-C\left({{p_{n-1}}}\right)}\right|}. (8)

The discretization error is given by,

{e_{d}}={\cal L}-s\left({\cal P}\right)=\int_{0}^{N}{\left|{{C_{p}}\left(p\right)}\right|dp}-\sum\limits_{n=1}^{N}{\left|{C\left({{p_{n}}}\right)-C\left({{p_{n-1}}}\right)}\right|}, (9)

where, obviously, e_{d}\to 0 as N\to\infty (for further reading, the reader is referred to [11]). Fig. 1 illustrates a general curve with its discretized representation.
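For illustration, the following sketch (a minimal numerical check, not the paper's code) computes the polygonal sum of Eq. (8) for a unit circle, whose true length is 2\pi, and shows the discretization error of Eq. (9) vanishing as N grows.

import numpy as np

def polygonal_length(x, y):
    # Sum of chord lengths |C(p_n) - C(p_{n-1})|, Eq. (8)
    return np.sum(np.sqrt(np.diff(x) ** 2 + np.diff(y) ** 2))

# Unit circle sampled with N segments; the true length is 2*pi.
for N in (10, 100, 1000):
    p = np.linspace(0.0, 2.0 * np.pi, N + 1)
    e_d = 2.0 * np.pi - polygonal_length(np.cos(p), np.sin(p))
    print(f"N={N:5d}  e_d={e_d:.6f}")  # e_d -> 0 as N -> infinity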

Figure 1: Discretization

III Learning approach

III-A Motivation

The motivation for using a DNN for this task lies in the core challenge of implementing equations (1) and (2) in real-life scenarios. These equations involve nonlinearity and derivatives. Poor sampling and additive noise might lead to numerical errors [12]. The differential and integral operators can be obtained by using convolution filters [9]. The differential invariants can be interpreted as high-pass filters and the integration as a low-pass filter. Hence, it is convenient to use a Convolutional Neural Network (CNN) for our task. Another approach to this task involves the Recurrent Neural Network (RNN), where the curve is considered a time series [13, 14]. A modified version of the RNN is the Long Short-Term Memory (LSTM), in which a weighting of past time steps is employed [15]. We implemented two architectures. The first is based on a simplified CNN, and the second on a simplified LSTM. We found that the CNN-based architecture outperforms the LSTM-based architecture. The rest of this section presents the details of our process to obtain both.

III-B Data generation

The reconstruction of the length properties was done in a supervised learning approach, where many curve examples with their lengths as labels were considered. Each curve is represented by a 2\times N vector for the x and y coordinates, with a fixed number of points N.
We created a dataset with 20,000 examples to fully enable DNN training. This large number of examples is intended to cover curve transformations and different patterns. The curves were created by taking a sine wave with random sampling of the amplitude a, phase \phi, translation {\bf T}, and rotation {\bf R}. The general curve in our data is given by

C\left(p\right)={\bf{R}}\tilde{C}\left(p\right)+{\bf{T}}, (10)

where

\tilde{C}\left(p\right)=a\sin\left({p+\phi}\right). (11)

A cutting point was randomly selected to divide each curve into two new curves, in order to reconstruct the additivity property. Fig. 2 shows some of the curves we created. The dataset was split into train and test sets, and an additional holdout set was created. During data creation, we enforced non-negativity of the labels and generated many curves of different lengths, with rotated and translated examples, to cover the various axioms (3)-(6).
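A minimal sketch of this data generation step, assuming the curve is sampled as x(p)=p, y(p)=a\sin(p+\phi) before rotation and translation; the sampling ranges and uniform distributions below are illustrative assumptions, not the paper's exact values.

import numpy as np

def make_curve(N=200, rng=np.random.default_rng(0)):
    a, phi = rng.uniform(0.5, 2.0), rng.uniform(0.0, 2.0 * np.pi)
    theta = rng.uniform(0.0, 2.0 * np.pi)
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])            # rotation
    T = rng.uniform(-1.0, 1.0, size=(2, 1))                    # translation
    p = np.linspace(0.0, 2.0 * np.pi, N)
    C_tilde = np.vstack([p, a * np.sin(p + phi)])               # Eq. (11)
    C = R @ C_tilde + T                                         # Eq. (10)
    label = np.sum(np.linalg.norm(np.diff(C, axis=1), axis=0))  # length label
    return C, label

def split_curve(C, rng=np.random.default_rng(0)):
    # Random cutting point producing the additive pair (s2, s3)
    cut = int(rng.integers(10, C.shape[1] - 10))
    return C[:, :cut + 1], C[:, cut:]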

III-C Loss function

In order to impose the additivity property, we designed a unique loss function for each example, as follows

{J_{k}}={\left({{\cal L}\left({{s_{1}}}\right)-O\left({{s_{2}}}\right)-O\left({{s_{3}}}\right)}\right)^{2}_{k}}+\lambda\left\|{{\Theta_{ij}}}\right\|_{2}^{2}, (12)

where {s_{1}}, {s_{2}}, and {s_{3}} are input curves satisfying {\cal L}\left({{s_{1}}}\right)={\cal L}\left({{s_{2}}}\right)+{\cal L}\left({{s_{3}}}\right), O is the DNN output, k is the example index, \lambda is a regularization parameter, \Theta_{ij} are the DNN weights, and \left\|\cdot\right\|_{2}^{2} is the squared Euclidean norm. By minimizing J_{k} while passing the examples through the model, the weights were tuned. The optimized model is characterized by the optimal weights, given by

\Theta_{ij}^{*}=\mathop{\arg\min}\limits_{{\Theta_{ij}}}\sum\limits_{k}{{J_{k}}}. (13)
Figure 2: Curve examples
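The following PyTorch sketch implements the per-example loss of Eq. (12), minimized as in Eq. (13); the function and variable names are ours, and the explicit L2 term is kept here even though weight decay can equivalently be handled by the optimizer.

import torch

def additive_loss(model, s2, s3, true_len_s1, lam=0.01):
    # Eq. (12): (L(s1) - O(s2) - O(s3))^2 + lambda * ||Theta||_2^2,
    # where s1 is the concatenation of s2 and s3 with known length L(s1)
    residual = true_len_s1 - model(s2) - model(s3)
    reg = sum(torch.sum(w ** 2) for w in model.parameters())
    return residual ** 2 + lam * reg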

III-D ArcLengthNet Architecture

In this subsection, ArcLengthNet (the CNN-based architecture) is presented. The model architecture is very simple, comprising a convolutional layer and two fully connected layers with only one activation function. Each curve is represented by N=200 points. This representation is fed into a convolutional layer with a small kernel of size 3, then processed by a fully connected layer that outputs only 10 values, passed through a Rectified Linear Unit (ReLU) activation function to another fully connected layer that finally outputs the length. The architecture is shown in Fig. 3.
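A minimal PyTorch sketch consistent with the description above (a single 1D convolution with kernel size 3, a 10-unit fully connected layer, one ReLU, and a scalar output); the number of convolution channels is an assumption, since it is not stated in the text.

import torch
import torch.nn as nn

class ArcLengthNet(nn.Module):
    def __init__(self, n_points=200, channels=4):   # channels: assumed value
        super().__init__()
        self.conv = nn.Conv1d(2, channels, kernel_size=3)
        self.fc1 = nn.Linear(channels * (n_points - 2), 10)
        self.fc2 = nn.Linear(10, 1)

    def forward(self, curve):                        # curve: (batch, 2, 200)
        h = self.conv(curve)                         # (batch, channels, 198)
        h = torch.relu(self.fc1(h.flatten(1)))       # 10 hidden units, ReLU
        return self.fc2(h).squeeze(-1)               # predicted length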

Figure 3: ArcLengthNet architecture
Figure 4: ArcLengthNet training

III-E ArcLengthNet Training

The DNN was trained by passing many examples in small batches with a back-propagation method. The training process was carried out in batches of 200 examples for 100 epochs. The optimizer we used is Stochastic Gradient Descent (SGD) with momentum and weight decay [16]. The various parameters are provided in Table I. Fig. 4 shows a graph of the train and test losses as a function of the number of epochs.
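A training-loop sketch with the hyperparameters of Table I, reusing the ArcLengthNet and additive_loss sketches above; the random batch below is only a placeholder for the generated dataset, and representing each half-curve with N=200 samples is our assumption.

import torch

model = ArcLengthNet()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001,
                            momentum=0.9, weight_decay=0.0005)

for epoch in range(100):
    # Placeholder batch of 200 examples: curve halves s2, s3 and labels L(s1)
    s2, s3 = torch.randn(200, 2, 200), torch.randn(200, 2, 200)
    true_len = torch.rand(200)
    loss = additive_loss(model, s2, s3, true_len).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()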

III-F LSTM-based Architecture

The task of length learning can be interpreted as time-series learning. Each point of the curve can be interpreted as a time step with x and y coordinates. The DNN aims to generalize the local properties into a global length. The typical architecture for time-series data is the Recurrent Neural Network (RNN). One modification of the RNN is the Long Short-Term Memory (LSTM) architecture, which has feedback connections (in contrast to the classical RNN). The LSTM has been used in handwriting recognition tasks with great success [17]. Motivated by this success, we designed an LSTM architecture, presented in Fig. 5. Similar to ArcLengthNet, the LSTM-based architecture is simple, consisting of 200 blocks, each producing a one-dimensional output. These outputs are concatenated and fed into two fully connected layers (10 units each) joined by a ReLU activation function, which output the length. Fig. 6 shows a graph of the train and test losses as a function of the number of epochs.
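A PyTorch sketch of the LSTM-based baseline under our reading of the description (hidden size 1 per time step, the 200 per-step outputs concatenated and passed through small fully connected layers); the exact wiring of the fully connected layers is an assumption.

import torch
import torch.nn as nn

class ArcLengthLSTM(nn.Module):
    def __init__(self, n_points=200):
        super().__init__()
        self.lstm = nn.LSTM(input_size=2, hidden_size=1, batch_first=True)
        self.fc1 = nn.Linear(n_points, 10)
        self.fc2 = nn.Linear(10, 1)

    def forward(self, curve):                  # curve: (batch, 2, 200)
        seq = curve.transpose(1, 2)            # each point becomes a time step
        h, _ = self.lstm(seq)                  # (batch, 200, 1) per-step outputs
        h = torch.relu(self.fc1(h.flatten(1)))
        return self.fc2(h).squeeze(-1)         # predicted length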

Figure 5: LSTM-based architecture
Figure 6: LSTM-based architecture training
TABLE I: Learning Parameters
Description Symbol Value
Number of examples K 20,000
Train/test ratio - 80/20
Regularization parameter \lambda 0.01
Partition parameter N 200
Batch size - 200
Learning rate \eta 0.001
Momentum - 0.9
Weight decay - 0.0005
Epochs - 100

IV Results and discussion

Both architectures were trained with the same approach on the same database. As shown in Fig. 4 and Fig. 6, these architectures were well trained after 100 epochs. A holdout set was defined to test the performance of the architectures on unseen data. This set contains 5,000 examples that were not used in the train or test sets. ArcLengthNet obtained a minimum MSE of 0.17, whereas the LSTM-based model obtained a minimum MSE of 0.24. Hence, we conclude that the ArcLengthNet architecture outperforms the LSTM-based architecture. The monotonicity property was tested for ArcLengthNet on this holdout set, where a linear relation was established between the true length and the ArcLengthNet output (Fig. 7).

Figure 7: Monotonic property on holdout set (ArcLengthNet)

IV-A Performance measure

In order to verify our results, we used the Mean Squared Error (MSE) criterion, the Root MSE (RMSE), and a dedicated criterion, the RMSE-over-Mean-Length Ratio (RMLR), given by

{\bf{RMLR}}=\frac{{\bf{RMSE}}}{{\cal E}\left[{\cal L}\right]}, (14)

where {\cal E} is the expected value operator. This measure provides an error normalized with respect to the curve length; since the dataset contains various curves of different lengths, we must weigh the errors appropriately.
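For reference, a small sketch of these measures; pred and true stand for predicted and ground-truth lengths on the holdout set and are placeholders, not the paper's data.

import numpy as np

def mse(pred, true):
    return np.mean((pred - true) ** 2)

def rmlr(pred, true):
    # RMSE over mean true length, Eq. (14)
    return np.sqrt(mse(pred, true)) / np.mean(true)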

V Conclusion and further work

A learning-based approach to reconstruct the length of curves was presented. The power of deep neural networks to reconstruct the fundamental length axioms was demonstrated. The results can be further used to improve handwritten signature verification and to reconstruct additional differential geometry properties and theorems.

References

  • [1] B. Guenter and R. Parent, “Computing the arc length of parametric curves,” IEEE Computer Graphics and Applications, vol. 10, no. 3, pp. 72–78, 1990.
  • [2] H.-B. Hellweg and M. Crisfield, “A new arc-length method for handling sharp snap-backs,” Computers & Structures, vol. 66, no. 5, pp. 704–709, 1998.
  • [3] S. Y. Ooi, A. B. J. Teoh, Y. H. Pang, and B. Y. Hiew, “Image-based handwritten signature verification using hybrid methods of discrete radon transform, principal component analysis and probabilistic neural network,” Applied Soft Computing, vol. 40, pp. 274–282, 2016.
  • [4] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” nature, vol. 521, no. 7553, pp. 436–444, 2015.
  • [5] S. Sternberg, Lectures on differential geometry.   American Mathematical Soc., 1999, vol. 316.
  • [6] R. Kimmel, Numerical geometry of images: Theory, algorithms, and applications.   Springer Science & Business Media, 2003.
  • [7] M. M. Bronstein, J. Bruna, Y. LeCun, A. Szlam, and P. Vandergheynst, “Geometric deep learning: going beyond euclidean data,” IEEE Signal Processing Magazine, vol. 34, no. 4, pp. 18–42, 2017.
  • [8] J. Berg and K. Nyström, “A unified deep artificial neural network approach to partial differential equations in complex geometries,” Neurocomputing, vol. 317, pp. 28–41, 2018.
  • [9] G. Pai, A. Wetzler, and R. Kimmel, “Learning invariant representations of planar curves,” arXiv preprint arXiv:1611.07807, 2016.
  • [10] D. Chicco, “Siamese neural networks: An overview,” Artificial Neural Networks, pp. 73–94, 2020.
  • [11] M. P. Do Carmo, Differential geometry of curves and surfaces: revised and updated second edition.   Courier Dover Publications, 2016.
  • [12] S. Qian and J. Weiss, “Wavelets and the numerical solution of partial differential equations,” Journal of Computational Physics, vol. 106, no. 1, pp. 155–175, 1993.
  • [13] A. C. Tsoi and A. Back, “Discrete time recurrent neural network architectures: A unifying review,” Neurocomputing, vol. 15, no. 3-4, pp. 183–223, 1997.
  • [14] K. Smagulova and A. P. James, “A survey on lstm memristive neural network architectures and applications,” The European Physical Journal Special Topics, vol. 228, no. 10, pp. 2313–2324, 2019.
  • [15] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, vol. 9, no. 8, pp. 1735–1780, 1997.
  • [16] S. Ruder, “An overview of gradient descent optimization algorithms,” arXiv preprint arXiv:1609.04747, 2016.
  • [17] A. Graves, M. Liwicki, S. Fernández, R. Bertolami, H. Bunke, and J. Schmidhuber, “A novel connectionist system for unconstrained handwriting recognition,” IEEE transactions on pattern analysis and machine intelligence, vol. 31, no. 5, pp. 855–868, 2008.