
Stochastic Optimal Control of a Sailboat

Cole Miles¹,², Alexander Vladimirsky³

¹Department of Physics, Cornell University. ²[email protected]. ³Department of Mathematics, Cornell University.

*Supported in part by the NSF DMS (awards 1738010 and 2111522), and the DOE CSGF (award DE-SC0020347).
Abstract

In match race sailing, competitors must steer their boats upwind in the presence of unpredictably evolving weather. Combined with the tacking motion necessary to make upwind progress, this makes it natural to model their path-planning as a hybrid stochastic optimal control problem. Dynamic programming provides the tools for solving these, but the computational cost can be significant. We greatly accelerate a semi-Lagrangian iterative approach of Ferretti and Festa [1] by reducing the state space dimension and designing an adaptive timestep discretization that is very nearly causal. We also provide a more accurate tack-switching operator by integrating over potential wind states after the switch. The method is illustrated through a series of simulations with varying stochastic wind conditions.

I INTRODUCTION

In recent years, advances in numerical and mathematical methods have greatly impacted the sport of sailing. Yacht path planning algorithms are now heavily employed both as a part of match simulation software, and as a tool to provide live guidance on optimal decision-making during a race. Here, we focus on the problem of finding the optimal feedback control to perform live path planning in the presence of a stochastically-evolving weather system.

Weather conditions greatly impact the performance capabilities of a sailboat in a race. While it is possible to produce meteorological forecasts, there is an inevitable stochastic component to how weather conditions evolve during the course of a race which is impossible to predict exactly. It is critical that algorithms used for decision making correctly factor this randomness into their control policy. In [2], this randomness is modeled as a discrete-time Markov chain process which updates weather conditions at a fixed tick rate. In [3] this idea is extended to a continuous-time Markov chain, formulating the system as a free boundary problem whose solution is a switching curve giving the optimal control policy. Here, we will take the approach of [1] and describe optimality through dynamic programming, yielding a system of two coupled quasi-variational inequalities of Hamilton-Jacobi-Bellman (HJB) type with degenerate ellipticity.

For full physical accuracy, a model would have to include both the position and velocity of the sailboat in the state space, and model the forces on the sailboat in different state configurations, leading to at best 5 coupled first-order equations in 5 state variables. An approach along these lines can be found in [2]. But in long course matches, inertial effects associated with accelerating up to the equilibrium speed are (in general) secondary, allowing one to reduce the system to 3 coupled first-order equations by assuming that the velocity vector can be changed instantaneously.

However, this assumption cannot be made in modeling the process of tacking, which is necessary when attempting to travel upwind. To make progress, sailors must travel at angles to the wind, repeatedly switching directions to steer in a zig-zag pattern on either side of the wind direction. Each time the bow is swung to the other side of the wind (a single tack), the boat is significantly slowed down for the period of time in which the wind pushes against the boat. In previous works ([1],[4]), this is incorporated by modeling the system as a hybrid control problem, with a continuous control corresponding to the angle of the sails and a discrete control corresponding to which tack we should be on. The lost inertia has been modeled as a fixed switching delay, reflecting the time lost due to the reduced speed from performing the tack.

In this work, we extend and improve upon the methods of Ferretti and Festa [1], who solve the problem using a semi-Lagrangian discretization. In Sec. II, we give the original problem statement and recast it to a reduced coordinate system in 2D, which alone greatly reduces the time needed to solve the problem numerically. In Sec. III, we set up the optimal control problem following the structure of Ref. [1], and provide an improvement to their switching operator to incorporate the wind evolution during the process of a tack. In Sec. IV, we describe our semi-Lagrangian numerical methods, including a discretization in which the system is nearly-causal and can be solved in a handful of iterations. We present the results of numerical simulations in Sec. V, producing optimal switching maps and highlighting the effect of our improved switching operator. We conclude by listing directions for future work in Sec. VI.

II System Dynamics

Figure 1: Boat dynamics relative to the wind and relative to a target. (a) The polar speed plot s(u) used in this work. Note the symmetry about u = 0°, the non-convexities near 0° and 180°, and that the speed vanishes at u = 0°. These properties are common to all polar plots. (b) A diagram of the system setup in reduced coordinates, showing the target 𝒯, the radial distance r, the wind angle θ, the steering angle u, and the boat velocity s⃗ on each tack q = 1, 2.

We consider a problem introduced in [1]: time-optimal sailboat navigation to a closed target set 𝒯 when the wind direction evolves stochastically. For simplicity, we focus on their simplified model where the wind speed is fixed, and the angle ϕ of the upwind direction (measured counterclockwise from the y-axis) evolves under a drift-diffusion process:

dϕ = a dt + σ dW, (1)

where a is the drift rate, σ is the strength of the Brownian motion, and dW is the differential of a Wiener process.
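For intuition, the wind process (1) can be sampled directly with an Euler-Maruyama step; a minimal sketch in Python (parameter values here are illustrative, not taken from our experiments):

```python
import math, random

def simulate_wind(phi0, a, sigma, dt, n_steps, rng=random.Random(0)):
    """Euler-Maruyama sampling of the wind angle: d(phi) = a dt + sigma dW."""
    phi = phi0
    path = [phi]
    for _ in range(n_steps):
        # dW over one step is Normal(0, dt), i.e. standard deviation sqrt(dt)
        phi += a * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append(phi)
    return path

# With sigma = 0 the wind rotates deterministically at rate a.
path = simulate_wind(phi0=0.0, a=0.1, sigma=0.0, dt=0.01, n_steps=100)
```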

We control the angle of the boat’s velocity relative to the upwind direction. But at any given time we are also restricted to an interval of angles determined by the sailboat’s current tack, or configuration of sails. We represent the tack as a controlled discrete variable q ∈ {1, 2}; q = 1 represents the starboard tack, which we model as having access to the angles [0, π], while the port tack q = 2 has access to [−π, 0]. For notational convenience, we will use (−1)^q u to encode the velocity angle relative to the upwind direction, with the steering control variable u ∈ [0, π]. So, the steering angle u is measured counterclockwise for q = 1 and clockwise for q = 2, with u = 0 corresponding to motion directly against the wind. Due to our inertia-less approximation, whenever u changes, the boat is assumed to instantly start travelling in the new direction with the angle-dependent speed s(u). The speed profile (termed the polar plot) capturing this dependence is determined by the exact geometry of a specific boat. A typical example [5] is provided in Fig. 1(a).

Assuming that the boat’s location is (x(t), y(t)), the system dynamics in absolute coordinates are then

dx = −s(u) sin(ϕ − (−1)^q u) dt
dy = s(u) cos(ϕ − (−1)^q u) dt (2)
dϕ = a dt + σ dW

We now simplify by focusing on radially symmetric problems: if we assume that the target 𝒯 is circular, we no longer care about the absolute angle of the boat’s position relative to the target. Shifting to a coordinate system centered at 𝒯, we can reduce the (x, y, ϕ) state space to just two coordinates (r, θ), where r = √(x² + y²) is the radial distance of the boat from the center of 𝒯 and θ = ϕ + atan2(−x, −y) is the angle of the wind measured counterclockwise relative to the straight line between the boat and the target; see Fig. 1(b). In this reduced representation, the dynamics are governed by

dr = −s(u) cos(θ − (−1)^q u) dt
dθ = [ (s(u)/r) sin(θ − (−1)^q u) + a ] dt + σ dW (3)

which describe the projections of the velocity vector onto the radial and angular components, combined with the drift-diffusion evolution of the wind. We refer to the coefficients on dt in each equation as ṙ and θ̇, respectively. New from the original system is the θ̇ ∝ 1/r relationship, which will force us to utilize an adaptive timestepping scheme that chooses smaller timesteps near the target to counteract the rapidly increasing angular speed. We note that θ̇ still remains bounded since 𝒯 has positive radius.
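The drift coefficients ṙ and θ̇ of (3) are cheap to evaluate pointwise. A minimal sketch follows, with a smooth stand-in for the polar plot s(u) (the real speed data would be interpolated from a boat-specific table, so polar_speed here is purely illustrative):

```python
import math

def polar_speed(u):
    # Illustrative stand-in for a tabulated polar plot: vanishes upwind
    # (u = 0) and is symmetric about the wind axis, like Fig. 1(a).
    return 0.05 * math.sin(u) ** 2

def reduced_drift(r, theta, q, u, a):
    """Drift terms (r_dot, theta_dot) of the reduced dynamics (3)."""
    s = polar_speed(u)
    ang = theta - (-1) ** q * u
    r_dot = -s * math.cos(ang)
    theta_dot = s / r * math.sin(ang) + a
    return r_dot, theta_dot
```

Note that ṙ < 0 (progress toward the target) exactly when cos(θ − (−1)^q u) > 0, i.e. when the boat's heading has a positive projection onto the boat-target line, and that with a = 0 the two tacks are mirror images: same ṙ, opposite θ̇.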

III Stochastic Optimal Control

Assuming we start from a position 𝒓 = (r, θ) and tack q, the total time-to-target T_μ(𝒓, q) = min{t ∣ 𝒓′(t) ∈ 𝒯, 𝒓′(0) = 𝒓, q′(0) = q} is a random variable whose expectation we are trying to minimize by choosing a feedback steering/tack-switching policy μ.¹ This T_μ(𝒓, q) can be split into two pieces: the time spent continuously controlling the sails plus the time spent performing tack-switches. Each tack-switch incurs a switching delay C due to the speed loss from the wind pushing directly against the boat’s intended direction during the maneuver.² We model the boat’s propelled speed as falling to zero (i.e., s(u) = 0) during a tack-switch.

¹ Strictly speaking, the objective in [1] is different since they minimize 𝔼[∫₀^T e^{−λt} dt] = 𝔼[(1 − exp(−λT))/λ], where λ > 0 is a time-discounting coefficient chosen to improve the convergence of value iterations. Our objective is obtained by taking the limit λ → 0, and numerical tests indicate that value iterations based on our discretization converge even in this “undiscounted” case.

² For simplicity of modeling, we will treat this delay as constant, though a state-dependent C would not present additional computational challenges.

We thus define the value function to encode the minimal expected time to 𝒯 from each starting configuration:

v(𝒓, q) := inf_μ 𝔼[T_μ(𝒓, q)]. (4)

By Bellman’s Optimality Principle, this v must satisfy

v(𝒓, q) = min( τ + inf_μ Σ_{μ,τ} v(𝒓, q),  C + 𝒩v(𝒓, q) ) + o(τ), (5)

where the first argument to the min is to be interpreted as the best expected time-to-𝒯 if we stay on the current tack at least for a small time τ, while the second is the best expected time if we switch tacks immediately. More formally, we define the operators Σ_{μ,τ} and 𝒩 as:

Σ_{μ,τ} v(𝒓, q) = 𝔼_{𝒓′; μ, τ}[v(𝒓′, q) ∣ 𝒓], (6)
𝒩v(𝒓, q) = 𝔼_{𝒓′; ∅, C}[v(𝒓′, 3 − q) ∣ 𝒓], (7)

where we employ the compact notation 𝔼_{𝒓′; μ, τ}[· ∣ 𝒓] to indicate the expectation of a function over evolved positions 𝒓′, starting from position 𝒓 and using control policy μ for a time τ. In (7), the symbol ∅ in place of the control policy indicates that during the tacking time C, no steering control is taking place, as s(u) = 0 for the duration. Importantly, the version of operator (7) used in [1] does not treat θ as evolving during the tack, instead setting 𝒩v(𝒓, q) = v(𝒓, 3 − q). As we show in Section V, correctly incorporating these switch-duration dynamics is critical in high-drift problems.

A standard argument based on an Itô-Taylor expansion of (5) yields the quasi-variational inequality that must be satisfied by a smooth value function:

max( v − 𝒩v − C,  H(𝒓, q, ∇_𝒓 v) − (σ²/2) v_θθ ) = 0 (8)

with the Hamiltonian

H(𝒓, q, 𝒑) = max_u ( −ṙ(𝒓, q, u) p₁ − θ̇(𝒓, q, u) p₂ ) − 1. (9)

If v is not smooth, one can still interpret it as a weak solution (in particular, the unique viscosity solution [6]) of (8).

As θ is the only randomly evolving variable and obeys drift-diffusive dynamics, both expectations in (6) and (7) take a similar form: averaging v over Gaussian-distributed final θ′. For small τ, an approximation to (6) taking u = μ(𝒓, q) and ṙ, θ̇ constant for a time τ yields:

Σ_{u,τ} v(r, θ, q) = (1/(σ√(2πτ))) ∫_{−∞}^{∞} dθ′ exp{ −(θ′ − θ − θ̇τ)² / (2τσ²) } v(r + ṙτ, θ′, q), (10)

where v is interpreted as 2π-periodic in θ′, and ṙ, θ̇ both implicitly depend on (r, θ, q, u). Eq. (7) is the same formula under the substitutions q → 3 − q, τ → C, ṙ = 0, θ̇ = a.

IV Numerical Implementation

IV-A Semi-Lagrangian Discretization

For a fixed small τ, an approximation of the value function v can be obtained as a fixed point of the mapping defined by ignoring the o(τ) term in (5). It can be shown that if the schemes approximating both arguments in the min are monotone, stable, and consistent with (8), then iterating this mapping is guaranteed to converge [7]. As in [1], to build a monotone scheme we discretize the state space on a rectangular grid and perform value iterations of (5) starting from an initial guess v⁰ to produce a sequence of approximate value functions vⁿ converging to the solution of the discretized version of (5). Using monotone schemes, choosing v⁰ such that the first iteration produces v¹ ≤ v⁰ everywhere is sufficient to ensure convergence [8].

To approximate the continuous-control operator Σ_{μ,τ}, we change the variable of integration in (10) to ξ = (θ′ − θ − θ̇τ)/√(2τσ²) and use Gauss-Hermite quadrature [9]:

Σ_{u,τ} vⁿ(𝒓, q) = (1/√π) ∫_{−∞}^{∞} dξ e^{−ξ²} vⁿ(r + ṙτ, √(2τ)σξ + θ + θ̇τ, q) ≈ (1/√π) Σ_{i=1}^{M} w_i vⁿ(r + ṙτ, √(2τ)σξ_i + θ + θ̇τ, q), (11)

where M is the number of quadrature points used, ξ_i are the roots of the Mth Hermite polynomial H_M(ξ), and the weights w_i are

w_i = 2^{M−1} M! √π / ( M² [H_{M−1}(ξ_i)]² ). (12)

The sum in (11) requires evaluating vⁿ off-grid, which we accomplish through interpolation, as is common for semi-Lagrangian schemes [10]. The latter are guaranteed to be monotone as long as the interpolation used is zeroth or first order [11]. In our implementation, we define the numerical operator S_{u,τ} by the sum of (11) evaluated using bilinear interpolation between the 4 closest gridpoints. We use M = 2 quadrature points, which reduces S_{u,τ} to the form in [1], averaging between the two points 𝒓± = (r + ṙτ, θ + θ̇τ ± σ√τ).
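For the small M used here, the Gauss-Hermite nodes can be hard-coded from the known roots of H₂ and H₃, with the weights given by (12); a sketch (the function name and structure are ours):

```python
import math

def hermite_rule(M):
    """Gauss-Hermite nodes and weights for small M via formula (12).

    H_2(x) = 4x^2 - 2 has roots +/- 1/sqrt(2);
    H_3(x) = 8x^3 - 12x has roots 0, +/- sqrt(3/2).
    """
    if M == 2:
        xs = [-1.0 / math.sqrt(2.0), 1.0 / math.sqrt(2.0)]
        H_prev = lambda x: 2.0 * x            # H_1
    elif M == 3:
        xs = [-math.sqrt(1.5), 0.0, math.sqrt(1.5)]
        H_prev = lambda x: 4.0 * x * x - 2.0  # H_2
    else:
        raise NotImplementedError("only M = 2, 3 are tabulated here")
    ws = [2.0 ** (M - 1) * math.factorial(M) * math.sqrt(math.pi)
          / (M ** 2 * H_prev(x) ** 2) for x in xs]
    return xs, ws
```

The weights of each rule sum to √π, so after the 1/√π normalization in (11) the quadrature is a convex average of vⁿ values, preserving the monotonicity of the scheme; for M = 2 the weights are both √π/2, which yields exactly the two-point average over 𝒓± described above.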

Because θ̇ grows like 1/r near the target, we adaptively choose our time discretization as:

τ(𝒓, q, u) ∝ min( Δr/ṙ(𝒓, q, u), Δθ/θ̇(𝒓, q, u) ). (13)

We choose the constant of proportionality in (13) to be 1.5, to step into the middle of the first grid cell along the direction of motion. We then define the scheme S as the optimization of this operator over control angles u:

S vⁿ(𝒓, q) = min_u ( τ(𝒓, q, u) + S_{u, τ(𝒓, q, u)} vⁿ(𝒓, q) ). (14)

The optimal control policy is then precisely the argmin of (14). In practice, since our polar plot datasets contain a discrete set of s(u) values, we perform this minimization by a simple grid search over the tabulated angles. The accuracy of this procedure can also be improved by expanding the available set of control angles through interpolation.

The same reparameterization and Gauss-Hermite quadrature approach is used to produce a numerical operator N which approximates the switching operator of (7) as

N vⁿ(𝒓, q) = (1/√π) Σ_{i=1}^{M} w_i vⁿ( r, √(2C)σξ_i + θ + aC, 3 − q ). (15)

To improve the accuracy, our implementation uses M = 3 for this switching quadrature, since C ≫ τ in our experiments.

Our full method is summarized below in Algorithm 1.

1: Initialize: set v⁰ = 0 on 𝒯 and v⁰ = ∞ elsewhere.
2: Initialize: set maxdiff = ∞ and n = 0.
3: while maxdiff > ϵ do
4:   foreach r ∈ ℛ do
5:     foreach θ ∈ Θ do
6:       foreach q ∈ {1, 2} do
7:         foreach u_i ∈ 𝒰 do
8:           τ = 1.5 min( Δr/ṙ(q, u_i), Δθ/θ̇(r, q, u_i) )
9:           S^Δ_i = S_{u_i, τ} vⁿ(𝒓, q)
10:        S^Δ = min_i S^Δ_i
11:        N^Δ = C + N vⁿ(𝒓, q)
12:        v^{n+1}(𝒓, q) = min(S^Δ, N^Δ)
13:   maxdiff = max_{𝒓,q} |v^{n+1}(𝒓, q) − vⁿ(𝒓, q)|
14:   n = n + 1

Algorithm 1: Semi-Lagrangian value iterations to a convergence criterion ϵ. Here, ℛ, Θ, 𝒰 refer to the discrete sets of gridpoints sampled for the state and control spaces.
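The inner body of Algorithm 1 (a single value update at one gridpoint, with the M = 2 two-point quadrature and bilinear interpolation) can be sketched in Python. The grid layout, the interpolation helper, and all names are our own illustrative choices; for simplicity the sketch also uses the two-point rule for the switching operator, whereas our implementation uses M = 3 there:

```python
import math

def update_point(v, grids, params, i, j, q):
    """One Algorithm-1-style update of v[q][i][j]: min(S_Delta, N_Delta).

    v is indexed as v[q][i][j] with q in {1, 2} (v[0] is unused)."""
    rs, thetas = grids                  # uniform grids; theta is 2*pi-periodic
    dr, dth = rs[1] - rs[0], thetas[1] - thetas[0]
    a, sigma, C, controls, speed = params
    r, th = rs[i], thetas[j]

    def interp(vq, rr, tt):
        # Bilinear interpolation: clamped in r, periodic in theta.
        x = min(max((rr - rs[0]) / dr, 0.0), len(rs) - 1.001)
        y = ((tt - thetas[0]) / dth) % len(thetas)
        i0, fx = int(x), x - int(x)
        j0, fy = int(y), y - int(y)
        i1, j1 = min(i0 + 1, len(rs) - 1), (j0 + 1) % len(thetas)
        return ((1 - fx) * (1 - fy) * vq[i0][j0] + fx * (1 - fy) * vq[i1][j0]
                + (1 - fx) * fy * vq[i0][j1] + fx * fy * vq[i1][j1])

    S_best = math.inf
    for u in controls:
        s = speed(u)
        ang = th - (-1) ** q * u
        r_dot = -s * math.cos(ang)
        th_dot = s / r * math.sin(ang) + a
        # Adaptive timestep (13) with proportionality constant 1.5.
        tau = 1.5 * min(dr / max(abs(r_dot), 1e-12),
                        dth / max(abs(th_dot), 1e-12))
        # M = 2 quadrature: average of theta-drifted points +/- sigma*sqrt(tau).
        spread = sigma * math.sqrt(tau)
        val = tau + 0.5 * (
            interp(v[q], r + r_dot * tau, th + th_dot * tau + spread)
            + interp(v[q], r + r_dot * tau, th + th_dot * tau - spread))
        S_best = min(S_best, val)

    # Switching value: stall for time C on the opposite tack while the
    # wind drifts by a*C and diffuses.
    spread = sigma * math.sqrt(C)
    N_val = C + 0.5 * (interp(v[3 - q], r, th + a * C + spread)
                       + interp(v[3 - q], r, th + a * C - spread))
    return min(S_best, N_val)
```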

IV-B Gauss-Seidel Iterations

Gauss-Seidel relaxation can greatly reduce the number of iterations needed to reach convergence. We heuristically find that time-parameterized stochastic-optimal paths (r(t), θ(t)) almost always have monotonically decreasing r(t). Wherever this holds, the value function at any point will only depend on the values of gridpoints with smaller r. This can be exploited by updating the value function estimates in place, sweeping in the direction of increasing r. As a result, gridpoints later in the r sweep will see already-updated values at smaller r rather than the old values from the previous iteration.
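The distinction between the two relaxation variants is only whether updates are written in place; schematically (with a hypothetical local_update standing in for the full Algorithm-1 update at one gridpoint):

```python
def sweep_gauss_jacobi(v, local_update):
    """All updates read the old iterate; returns a new value grid."""
    return [[[local_update(v, i, j, q) for j in range(len(v[q][i]))]
             for i in range(len(v[q]))]
            for q in range(len(v))]

def sweep_gauss_seidel(v, local_update):
    """In-place sweep in order of increasing r (index i): gridpoints later
    in the sweep see already-updated values at smaller r."""
    for q in range(len(v)):
        for i in range(len(v[q])):        # increasing r
            for j in range(len(v[q][i])):
                v[q][i][j] = local_update(v, i, j, q)
    return v
```

In a single sweep, a Jacobi update propagates information by only one grid cell in r, while the in-place sweep can carry it across the whole row, which is why the iteration counts differ so sharply below.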

An additional change can be made to our discretization to further take advantage of this causal dependence. To ensure the currently updated point depends only on those at smaller r, we choose the adaptive timestep so that we always step at least a full grid length along the r axis:

τ(𝒓, q, u) = 1.5 Δr/ṙ(𝒓, q, u). (16)

We refer to this second discretization as “row-wise Gauss-Seidel” (rwGS) relaxation. Our experiments show that it requires a nearly constant number of iterations, which only slowly begins to increase on very fine grids. Meanwhile, both the Gauss-Jacobi (GJ) and the standard Gauss-Seidel (GS) iterations require a number of iterations roughly linear in N_r (the number of grid points along the r direction). As shown in Fig. 2, on the finest tested grid, GJ requires a factor of ≈100 more iterations than GS, which itself requires a factor of ≈100 more iterations than rwGS. However, this massive speedup in the latter is not completely free. Comparing (13) and (16), we see that the row-wise iteration uses larger timesteps than the standard discretization in areas of the domain where Δθ/θ̇ < Δr/ṙ, in particular close to the target. Additional numerical tests (summarized in the Supplementary Materials) show that, despite slightly larger discretization errors in rwGS, these errors decrease at the same rate as in GS under grid refinement.

Figure 2: The number of iterations required to reach convergence with a fixed ϵ = 10⁻⁸ and an increasing number of grid points N_r along each state dimension, for Gauss-Jacobi, Gauss-Seidel, and row-wise Gauss-Seidel. In this plot, we have fixed σ = 0.05, a = 0.1. At each point, N_θ is scaled proportionally with N_r.

IV-C Live Simulations / Control Synthesis

Using the grid approximation of the value function obtained above, we can perform live sailboat navigation using two derived data structures. Our “switchgrid” indicates the gridpoints for which it was optimal to switch to the opposite tack (i.e., wherever N^Δ < S^Δ in Algorithm 1). We can similarly keep track of a “direction grid,” recording the controls u_i that minimize S^Δ at each gridpoint.

In our simulations, we evolve the system (3) using a simple Euler-Maruyama method, with the stochastic component of the θ evolution sampled from 𝒩(0, σ²) in each timestep. We switch to the opposite tack whenever at least 3 of the surrounding 4 gridpoints have N^Δ < S^Δ; otherwise, we stay on the current tack and use the optimal control value of the nearest gridpoint in the direction grid.
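A rollout of this control synthesis can be sketched as follows; the policy lookup (the switch vote over surrounding gridpoints plus the nearest-gridpoint control) is assumed to be supplied by the solver above, and all names are illustrative:

```python
import math, random

def simulate_boat(r, theta, q, policy, dt, a, sigma, C, speed, R_target,
                  rng=random.Random(1), max_steps=100000):
    """Euler-Maruyama rollout of (3) under a gridded feedback policy.

    policy(r, theta, q) is assumed to return (switch, u): whether the
    surrounding gridpoints vote to switch tacks, and the steering angle
    taken from the direction grid."""
    t = 0.0
    for _ in range(max_steps):
        if r <= R_target:
            return t                      # reached the target set
        switch, u = policy(r, theta, q)
        if switch:
            q = 3 - q
            # During the switch delay C the boat stalls (s = 0),
            # but the wind keeps drifting and diffusing.
            theta += a * C + sigma * math.sqrt(C) * rng.gauss(0.0, 1.0)
            t += C
            continue
        s = speed(u)
        ang = theta - (-1) ** q * u
        r += -s * math.cos(ang) * dt
        theta += ((s / r * math.sin(ang) + a) * dt
                  + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0))
        t += dt
    return math.inf                       # did not reach the target in time
```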

V Numerical Experiments

We now apply our algorithm to solve the optimal control problem (3)-(4) for a variety of system parameters, and sample stochastic optimal trajectories for each scenario.³ For all results shown, the target radius is R = 0.1 and the domain is (r, θ) ∈ [R, 2.0] × [0, 2π]. Target boundary conditions are set as v(r, θ, q) = 0 for all r < R, and exit boundary conditions are v(r, θ, q) = 10⁶ for all r > 2.0.

³ To ensure computational reproducibility, we provide the full source code, movies illustrating these simulations, and additional convergence data at https://eikonal-equation.github.io/Stochastic_Sailing/

We first focus on problems with zero drift (a = 0) but varying diffusivity coefficient σ. In Fig. 3, we plot switchgrid solutions for a collection of σ values, noting that these grids are defined and plotted in (r, θ) state space, not the absolute position space. Points are marked red if switchgrid(r, θ, q = 1) = 1, blue if switchgrid(r, θ, q = 2) = 1, and white if both are 0. The darker shaded regions correspond to the switchgrids obtained if we use the zero-evolution switching operator of [1] instead of our (7). The black circle at the center marks the target set 𝒯. In general, stochastic optimal trajectories “bounce” between these red and blue switching fronts as they approach 𝒯.

With σ = 0, the switching fronts lie at constant θ, corresponding to the angles that locally maximize the projection of the polar speed plot (Fig. 1(a)) onto the axis of the wind. Once a sailboat hits this front, it can switch tack and stay at this constant angle that maximizes its speed towards the target (see Fig. 4(a)). As σ increases, these switching fronts contract more tightly around θ = 0°, 180°. As the future wind state grows more uncertain with larger σ, the optimal policy becomes more conservative and constrains the boat to smaller excursions in θ before switching to the opposite tack. We observe that in these zero-drift problems, our improved switching operator makes the switchgrids modestly more conservative, switching earlier by a few degrees in Fig. 3(b,c).

In Fig. 4, we plot samples of stochastic optimal trajectories corresponding to these control policies, with blue markers indicating where tack switches occur. Similarly to [1], we initialize six boats at positions (r, θ) ∈ {(1.8, 0.0), (1.93, ±0.39)} and tacks q ∈ {1, 2}. We then evolve and control each boat’s state until it reaches the target, as described in Sec. IV-C. We can see that, as foreshadowed by the switchgrids, at higher values of σ the trajectories tend to stay within tighter cones from the target.

Figure 3: Switchgrids for stochastic zero-drift (a = 0) problems in (r, θ) space, for (a) σ = 0.0, (b) σ = 0.05, (c) σ = 0.10. Red points indicate that switchgrid(r, θ, q = 1) = 1, blue points indicate that switchgrid(r, θ, q = 2) = 1, and white points indicate that both are 0. Darkened regions correspond to the switchgrids obtained when the zero-evolution switching operator 𝒩v(𝒓, q) = v(𝒓, 3 − q) from [1] is used instead of our (15).

Figure 4: Six sample trajectories for stochastic zero-drift (a = 0) problems, for (a) σ = 0.0, (b) σ = 0.05, (c) σ = 0.10: starting from 3 different (r, θ) positions and 2 different starting tacks (q = 1, 2). Within each panel, all trajectories use the same sampled wind evolution. Blue points indicate the tack switch events.

Figure 5: Switchgrids for stochastic (σ = 0.05) nonzero-drift problems, for (a) a = 0.05, (b) a = 0.10, (c) a = 0.15. Colorings are as in Fig. 3.

Figure 6: Six sample trajectories for stochastic (σ = 0.05) nonzero-drift problems, for (a) a = 0.05, (b) a = 0.10, (c) a = 0.15: starting from 3 different (r, θ) positions and 2 different starting tacks. Within each panel, all trajectories use the same sampled wind evolution. Blue points indicate the tack switch events.

Next, in Figs. 5 and 6 we show switchgrids and sampled trajectories for problems in which the drift parameter a is nonzero, with a fixed σ = 0.05. As the drift increases, we observe the switchgrids and the trajectories becoming skewed and asymmetric towards negative θ. This reflects that it is optimal to “lead” the wind in anticipation of future wind evolution, allowing the boat to stay in the portion of state space that allows it to steer at the optimal 40°-45° angle while the wind evolves.

Additionally, some “wave”-like features start to emerge in the switchgrids for large a. Empirically, we observe that the number of “waves” in the switchgrid corresponds to the number of 180° turns the wind tends to make by the time an average boat at r = R reaches the target. Each “wave” can thus be interpreted as corresponding to the optimal control during one (average) half-cycle of the wind. We notice that accurately modeling the switching operator is critical in these high-drift scenarios, as the two options produce dramatically different wave-like structures in the switchgrids.

VI Conclusions

We have introduced an accelerated semi-Lagrangian approach to solve a hybrid stochastic optimal control problem arising in match race sailing. By working in a reduced coordinate system and designing a nearly-causal discretization, we were able to produce control policies on high-resolution grids which provide insight into tack-switching strategies under unpredictable weather.

While the conversion to reduced coordinates greatly accelerates our solution, it also introduces several limitations, since we had to assume that the problem is radially symmetric. This restricts the target shape to a circle, and does not allow the inclusion of state constraints such as obstacles or a nearby coastline. However, it does still allow for modeling match races between multiple agents. Our approach would require keeping track of (r, θ) coordinates for each agent, yielding an overall state space with one less dimension than in [4]. For this reason, we hope that our approach will make it possible to rapidly test the implications of various weather models or match strategies in “ideal” conditions far away from any barriers.

This paper is focused on the risk-neutral task of finding a feedback policy μ that minimizes the expected time to target 𝔼[T_μ]. However, practitioners often prefer robust control policies, which are more suitable when bad outcomes are costly. Two popular such approaches are “risk-sensitive” control [12, 13] and H^∞ control [14]. But neither of these provides any direct bounds on the likelihood of bad scenarios, i.e., w_μ(c) = ℙ(T_μ ≥ c). An efficient method for minimizing w_μ(c) for all threshold values c simultaneously was recently introduced for piecewise-deterministic Markov processes in [15]. We hope that it can be similarly extended to the current context. It would also be interesting to extend the methods of multiobjective optimal control (e.g., [16, 17]) to find Pareto-optimal policies reflecting the rational tradeoffs between conflicting optimization criteria (e.g., 𝔼[T_μ] vs. w_μ(c), or even w_μ(c₁) vs. w_μ(c₂)).

Acknowledgements: The authors are grateful to Roberto Ferretti for sparking their interest in this class of problems.

References

  • [1] R. Ferretti and A. Festa, “Optimal Route Planning for Sailing Boats: A Hybrid Formulation,” Journal of Optimization Theory and Applications, vol. 181, no. 3, pp. 1015–1032, 2019.
  • [2] A. B. Philpott, S. G. Henderson, and D. Teirney, “A Simulation Model for Predicting Yacht Match Race Outcomes,” Operations Research, vol. 52, no. 1, pp. 1–16, Feb. 2004.
  • [3] L. Vinckenbosch, “Stochastic Control and Free Boundary Problems for Sailboat Trajectory Optimization,” Ph.D. thesis, EPFL, Lausanne, 2012.
  • [4] S. Cacace, R. Ferretti, and A. Festa, “Stochastic hybrid differential games and match race problems,” Applied Mathematics and Computation, vol. 372, p. 124966, 2020.
  • [5] “ORC sailboat data,” Tech. Rep. [Online]. Available: https://jieter.github.io/orc-data/site/
  • [6] A. Bensoussan and J. Menaldi, “Stochastic hybrid control,” J. Math. Anal. Appl., vol. 249, no. 1, pp. 261–288, 2000.
  • [7] G. Barles and P. Souganidis, “Convergence of approximation schemes for fully nonlinear second order equations,” Asymptotic Analysis, vol. 4, no. 3, pp. 271–283, 1991.
  • [8] R. Ferretti and H. Zidani, “Monotone Numerical Schemes and Feedback Construction for Hybrid Control Systems,” Journal of Optimization Theory and Applications, vol. 165, no. 2, pp. 507–531, 2015.
  • [9] M. Abramowitz and I. A. Stegun, Eds., Handbook of mathematical functions: with formulas, graphs, and mathematical tables, 9th ed., ser. Dover books on mathematics.   Dover Publ, 2013.
  • [10] M. Falcone and R. Ferretti, Semi-Lagrangian approximation schemes for linear and Hamilton-Jacobi equations.   SIAM, 2014, vol. 133.
  • [11] R. Ferretti, “A Technique for High-order Treatment of Diffusion Terms in Semi-Lagrangian Schemes,” Communications in Computational Physics, vol. 8, no. 2, pp. 445–470, June 2010.
  • [12] R. A. Howard and J. E. Matheson, “Risk-Sensitive Markov Decision Processes,” Management Science, vol. 18, no. 7, pp. 356–369, 1972.
  • [13] W. H. Fleming and W. M. McEneaney, “Risk-sensitive control on an infinite time horizon,” SIAM Journal on Control and Optimization, vol. 33, no. 6, pp. 1881–1915, 1995.
  • [14] T. Başar and P. Bernhard, H^∞-Optimal Control and Related Minimax Design Problems.   Birkhäuser, 1995.
  • [15] E. Cartee, A. Farah, A. Nellis, J. Van Hook, and A. Vladimirsky, “Quantifying and managing uncertainty in piecewise-deterministic Markov processes,” preprint: https://arxiv.org/abs/2008.00555, 2020.
  • [16] A. Kumar and A. Vladimirsky, “An efficient method for multiobjective optimal control and optimal control subject to integral constraints,” Journal of Computational Mathematics, vol. 28, pp. 517–551, 2010.
  • [17] A. Desilles and H. Zidani, “Pareto front characterization for multi-objective optimal control problems using Hamilton-Jacobi approach,” SIAM J. Control Optim., vol. 57, no. 6, pp. 3884–3910, 2018.