
The dually flat structure for singular models

Naomichi Nakajima M2, Graduate School of Information Science and Technology, Hokkaido University, Sapporo 060-0814, Japan [email protected]  and  Toru Ohmoto Department of Mathematics, Faculty of Science, Global Station of Bigdata and cybersecurity (GiCORE-GSB), Hokkaido University, Sapporo 060-0810, Japan [email protected]
Abstract.

The dually flat structure introduced by Amari-Nagaoka is highlighted in information geometry and related fields. In practical applications, however, the underlying pseudo-Riemannian metric may often be degenerate, and such an excellent geometric structure is rarely defined on the entire space. To remedy this, in the present paper we propose a novel generalization of the dually flat structure for a certain class of singular models from the viewpoint of Lagrange and Legendre singularity theory: we introduce a quasi-Hessian manifold endowed with a possibly degenerate metric and a particular symmetric cubic tensor, which goes beyond the concept of statistical manifolds and is adapted to the theory of (weak) contrast functions. In particular, we establish Amari-Nagaoka’s extended Pythagorean theorem and projection theorem in this general setup, and consequently, most applications of these theorems are suitably justified even for such singular cases. This work is motivated by various interests with different backgrounds, from Frobenius structure in mathematical physics to Deep Learning in data science.

Key words and phrases:
Dually flat structure, canonical divergence, Hessian geometry, Legendre duality, wavefronts, caustics, singularity theory.

1. Introduction

The dually flat structure is highlighted in information geometry: it brings a unified geometric insight to various fields such as statistical science, convex optimization, (quantum) information theory, and so on (Amari-Nagaoka [1], Amari [2], Chentsov [8]). It is also essentially the same as the Hessian structure in affine differential geometry (Shima [25]). On a $C^{\infty}$-manifold $M$, a dually flat structure is a triplet $(h,\nabla,\nabla^{*})$ where $h$ is a pseudo-Riemannian metric (i.e., a non-degenerate symmetric $(0,2)$-tensor) and $\nabla$ and $\nabla^{*}$ are flat affine connections on $M$ satisfying certain properties; the most distinctive feature is that the metric is locally given by the Hessian matrix of some potential function in $\nabla$-affine coordinates. In practical applications, however, the Hessian matrix may often be degenerate along some locus $\Sigma$ of points in $M$, and then, strictly speaking, the differential geometric method cannot be directly applied. Roughly speaking, we call such a space a singular model. In the present paper, we propose a novel generalization of the dually flat structure for a certain class of singular models from the viewpoint of contact geometry and singularity theory. This provides a new framework for general hierarchical structures: we introduce a quasi-Hessian manifold $M$ endowed with a possibly degenerate quadratic tensor $h$ and a particular symmetric cubic tensor $C$, which goes beyond the concept of statistical manifolds and fits well with the theory of contrast functions due to Eguchi [13]. In fact, such an $M$ naturally possesses a canonical divergence $\mathcal{D}:M\times M\to\mathbb{R}$, which is a weak contrast function compatible with $h$ and $C$ (Theorem 4.10). The key is the Legendre duality, which does exist even in the presence of the degeneracy locus $\Sigma$ of $h$.
Although neither a metric $h$ nor a connection $\nabla$ is available (!), we generalize in a natural way the Amari-Chentsov cubic tensor $\nabla h$ to a symmetric tensor $C$ defined on the entire space $M$ (this is possible even in the case that $M=\Sigma$), and in particular we establish Amari-Nagaoka’s extended Pythagorean theorem and projection theorem in this setup (Theorems 4.7, 4.8). Consequently, in principle, most applications of these theorems are suitably justified even for such degenerate cases.

As the first observation, we see that if the Hessian of a potential is degenerate, the graph of the dual potential (i.e., the Legendre transform of the potential) is no longer a submanifold but a wavefront having singularities branched along its caustics. More generally, our quasi-Hessian manifold $M$ is locally accompanied by two kinds of wavefronts, later called the $e/m$-wavefronts, as an alternative to the pair of a convex potential and its dual. To grasp the point, it would be helpful to refer to Fig.1 and Fig.2 in §3.1 in advance. Those two wavefronts are mutually tied by the Legendre duality in a strict sense, and they also have ‘height functions’ (i.e., projections to the $z$- and $z^{\prime}$-axes, respectively), by means of which we generalize the Bregman divergence to our divergence $\mathcal{D}$. Further, we associate the pair of coherent tangent bundles

\[ (E,\nabla^{E},\Phi:TM\to E)\;\;\mbox{and}\;\;(E^{\prime},\nabla^{E^{\prime}},\Phi^{\prime}:TM\to E^{\prime}), \]

where each of $E$ and $E^{\prime}$ is a vector bundle on $M$ that is an alternative to the tangent bundle $TM$, equipped with a flat connection and a bundle map from $TM$ measuring the degeneration of $h$, and $E$ and $E^{\prime}$ are dual to each other. Intuitively, the coherent tangent bundles come from the splitting of the standard contact structure into the directions of the $e/m$-wavefront projections (see §2.1, §3.1 and §3.2). The role of $\nabla$ and $\nabla^{*}$ on a dually flat space is taken over by the connections $\nabla^{E}$ and $\nabla^{E^{\prime}}$ of those vector bundles on our singular model $M$, and then the new cubic tensor $C$ of $TM$ is defined by using these connections through $\Phi$ and $\Phi^{\prime}$ (§3.4). Originally, the notion of coherent tangent bundles was introduced for studying Riemannian geometry of singular wavefronts by Saji-Umehara-Yamada [24], and here we borrow an affine geometry version. In the case that $h$ is non-degenerate ($E=TM$, $E^{\prime}=T^{*}M$), the dually flat structure in the original form is naturally recovered.

Singularities of caustics and wavefronts have been thoroughly investigated in Lagrange and Legendre singularity theory (initiated by Arnol’d, Zakalyukin and Hörmander) in relation to a broad range of subjects such as classical mechanics, thermodynamics, geometric optics, Fourier integral operators, control theory, catastrophe theory and so on [3, 4, 5, 6, 17, 18, 21]. We bring several techniques and concepts from this theory into information geometry, which may suggest new directions in both theory and application. In fact, the present paper is motivated by various interests from different backgrounds:

  • A typical example of quasi-Hessian manifolds is a general affine hypersurface $M$ in $\mathbb{R}^{n+1}$. It possesses a mixed geometry with changing metric type; this goes back to Darboux and others dealing with the rich geometry of the parabolic curve $\Sigma$ (the curve of inflection points) separating elliptic and hyperbolic domains on a surface in $\mathbb{R}^{3}$ [6, 5, 22]. It is also related to nonconvex optimization and variational problems [12].

  • Any Lagrange submanifold of $\mathbb{R}^{2n}$ is a quasi-Hessian manifold. If it is flat, then the cubic tensor $C$ satisfies the so-called WDVV equation, which mainly appears in topological field theory, and this yields a version of Frobenius manifold-like structure [23, 27, 15]. It is also related to the geometry of Poisson manifolds and paraKähler structures [7, 9].

  • In statistical inference, any curved exponential family produces a quasi-Hessian manifold, which represents the $\nabla^{*}$-extrinsic geometry in the ambient family. For instance, it is useful for studying catastrophe phenomena in root selections of the maximum likelihood equation [26]. Almost all statistical learning machines, including deep neural networks, allow degeneration of Fisher-Rao matrices [2, 14, 28], for which we are seeking a new approach.

In the present paper, we focus only on basic ideas and present them plainly and in as self-contained a manner as possible: most arguments are elementary and checked by direct computations, and we will NOT enter into any detail of singularity theory here. Therefore, this paper should be readable for readers with various backgrounds. Nevertheless, we believe that this paper contains some new observations in this field. The rest of this paper is organized as follows. In §2 we give a brief summary of some basics of contact geometry and the dually flat structure. In §3, after reviewing the definition of Lagrange and Legendre singularities, we introduce quasi-Hessian manifolds. In §4, the associated canonical divergence is discussed; the extended Pythagorean theorem and projection theorem are presented in our setting, and we also give a relation with contrast functions. In §5, we pick up some possible applications and open questions.

Throughout, bold letters denote column vectors, e.g., $\boldsymbol{x}=(x_{1},\cdots,x_{n})^{T}$, and a prime as in $\boldsymbol{x}^{\prime}$ merely distinguishes it from $\boldsymbol{x}$ (it does not mean any operation such as differentiation or transposition). We also let $\frac{\partial f}{\partial\boldsymbol{x}}$ denote $(\frac{\partial f}{\partial x_{1}},\cdots,\frac{\partial f}{\partial x_{n}})^{T}$ for short, as usual. For simplicity, we assume that manifolds and maps are of class $C^{\infty}$.

The authors are partly supported by GiCORE-GSB, Hokkaido University, and JSPS KAKENHI Grant Numbers JP17H06128 and JP18K18714.

2. Dually Flat Structure

2.1. Contact geometry and Legendre duality

To begin with, we summarize a minimal set of basic facts from contact geometry which will be used throughout this paper. As good references, we recommend Chap.18-22 of Arnol’d et al. [4], the Appendix of Arnol’d [3] and Izumiya-Ishikawa [17].

A contact manifold is a $(2n+1)$-dimensional manifold $N$ endowed with a maximally non-integrable hyperplane field $\xi_{p}\subset T_{p}N\;(p\in N)$, i.e., $\xi$ is locally expressed as the kernel of a $1$-form $\theta$ satisfying the non-degeneracy condition $\theta\wedge(d\theta)^{n}\neq 0$. This field $\xi$ is called a contact structure on $N$. The most important example is the standard contact space $\mathbb{R}^{2n+1}$; it is the $1$-jet space of functions on the affine $n$-space

\[ \mathbb{R}^{2n+1}=J^{1}(\mathbb{R}^{n},\mathbb{R})=T^{*}\mathbb{R}^{n}\times\mathbb{R}, \]

where the $1$-jet of a function $f$ at a point $\boldsymbol{a}$ means the Taylor coefficients at $\boldsymbol{a}$ of order $\leq 1$, i.e., $(df(\boldsymbol{a}),f(\boldsymbol{a}))\in T_{\boldsymbol{a}}^{*}\mathbb{R}^{n}\times\mathbb{R}$. The contact structure is given by the $1$-form

\[ \theta=dz-\boldsymbol{p}^{T}d\boldsymbol{x}=dz-\sum_{i=1}^{n}p_{i}\,dx_{i}, \]

called the standard contact form, where $z$ is the last coordinate, and $\boldsymbol{x}$ and $\boldsymbol{p}$ denote, respectively, the base and the fiber coordinates of the cotangent bundle $T^{*}\mathbb{R}^{n}$ (we always write coordinates in this order). We often write $\mathbb{R}^{n}_{\boldsymbol{x}}$ and $\mathbb{R}^{n}_{\boldsymbol{p}}$ in order to distinguish them. Note that the standard contact structure relies on the affine structure of the base space, not on the choice of coordinates $\boldsymbol{x}$.

The famous Darboux theorem states that the contact structure is locally unique; namely, for any contact manifold $N$, we can always find a system of local coordinates around any point $p\in N$ in which the contact structure is represented by the standard one.

A Legendre submanifold $L$ of a $(2n+1)$-dimensional contact manifold $(N,\xi)$ is an $n$-dimensional integral manifold of the field $\xi$, i.e., $T_{p}L\subset\xi_{p}$ for every $p\in L$. It is easy to see that in the standard $\mathbb{R}^{2n+1}$, the graph of $(df,f)$ of a function $z=f(\boldsymbol{x})$

\[ L=\left\{\,(\boldsymbol{x},\boldsymbol{p},z)\in\mathbb{R}^{2n+1}\,\bigg|\,\boldsymbol{p}=\frac{\partial f}{\partial\boldsymbol{x}},\;z=f(\boldsymbol{x})\,\right\} \tag{1} \]

is a Legendre submanifold, and conversely, every Legendre submanifold which is diffeomorphically projected to $\mathbb{R}^{n}_{\boldsymbol{x}}$ is expressed in this form (1).

The standard symplectic structure of $T^{*}\mathbb{R}^{n}$ is defined by the non-degenerate closed $2$-form

\[ \omega=\sum_{i=1}^{n}dx_{i}\wedge dp_{i}. \]

A Lagrange submanifold is an $n$-dimensional submanifold over which $\omega$ vanishes. A typical example is the graph of $df$, i.e., the image of $L$ of the form (1) via projection along the $z$-axis. Any Lagrange submanifold of $T^{*}\mathbb{R}^{n}$ is always locally liftable to a Legendre submanifold of $T^{*}\mathbb{R}^{n}\times\mathbb{R}$, uniquely up to a translation parallel to the $z$-axis. If it is entirely liftable, then we call it an exact Lagrange submanifold.

A Legendre fibration $\pi:N\to B$ is a fiber bundle whose total space is a $(2n+1)$-dimensional contact manifold $N$, whose base is an $(n+1)$-dimensional manifold $B$, and every fiber $\pi^{-1}(x)$ of which is Legendrian. The most typical example is the projection from the standard space

\[ \pi:\mathbb{R}^{2n+1}\to\mathbb{R}^{n}_{\boldsymbol{x}}\times\mathbb{R},\;\;(\boldsymbol{x},\boldsymbol{p},z)\mapsto(\boldsymbol{x},z). \]

Every Legendre fibration is locally described in this typical form with suitable local coordinates.

The Legendre duality is described as follows. Consider the transformation $\mathcal{L}:\mathbb{R}^{2n+1}\to\mathbb{R}^{2n+1}$ given by

\[ (\boldsymbol{x}^{\prime},\boldsymbol{p}^{\prime},z^{\prime})=\mathcal{L}(\boldsymbol{x},\boldsymbol{p},z):=(\boldsymbol{p},\boldsymbol{x},\boldsymbol{p}^{T}\boldsymbol{x}-z). \]

It is a contactomorphism, i.e., $\mathcal{L}$ preserves the contact hyperplane field $\xi$; indeed, $\mathcal{L}^{*}\theta=\mathcal{L}^{*}(dz^{\prime}-\boldsymbol{p}^{\prime T}d\boldsymbol{x}^{\prime})=-\theta$. Put

\[ \pi^{\prime}:=\pi\circ\mathcal{L}:\mathbb{R}^{2n+1}\to\mathbb{R}^{n}_{\boldsymbol{p}}\times\mathbb{R},\;(\boldsymbol{x},\boldsymbol{p},z)\mapsto(\boldsymbol{p},\boldsymbol{p}^{T}\boldsymbol{x}-z), \]

which is also a Legendre fibration. Then, the double fibration structure of the standard contact space is defined as the following diagram:

\[ \mathbb{R}^{n}_{\boldsymbol{x}}\times\mathbb{R}\;\xleftarrow{\;\;\pi\;\;}\;\mathbb{R}^{2n+1}\;\xrightarrow{\;\;\pi^{\prime}\;\;}\;\mathbb{R}^{n}_{\boldsymbol{p}}\times\mathbb{R} \tag{dL} \]
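The two properties of $\mathcal{L}$ used above can be checked numerically. The following is a minimal sketch, not part of the original argument: the function names (`legendre`, `d_legendre`, `theta`) and the sample point and tangent vector are illustrative assumptions. It verifies at one point that $\mathcal{L}$ is an involution and that $\theta$ evaluated on $d\mathcal{L}(v)$ equals $-\theta(v)$.

```python
# Sketch: the Legendre transformation L(x, p, z) = (p, x, p.x - z) on R^{2n+1}.
# Points are (x, p, z) with x, p lists of length n; tangent vectors likewise.
# All names and sample values below are illustrative assumptions.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def legendre(x, p, z):
    """The map L: (x, p, z) -> (x', p', z') = (p, x, p.x - z)."""
    return list(p), list(x), dot(p, x) - z

def d_legendre(x, p, vx, vp, vz):
    """Differential of L at (x, p, z) applied to the tangent vector (vx, vp, vz)."""
    return list(vp), list(vx), dot(vp, x) + dot(p, vx) - vz

def theta(p, vx, vz):
    """Standard contact form theta = dz - p.dx evaluated on a tangent vector."""
    return vz - dot(p, vx)

x, p, z = [0.5, -1.0], [2.0, 3.0], 0.25      # a sample point (n = 2)
vx, vp, vz = [1.0, 2.0], [-0.5, 4.0], 3.0    # a sample tangent vector

# L is an involution: applying it twice returns the original point.
x2, p2, z2 = legendre(*legendre(x, p, z))

# Pull-back of theta: theta at L(point), evaluated on dL(v), versus -theta(v).
xq, pq, zq = legendre(x, p, z)
wx, wp, wz = d_legendre(x, p, vx, vp, vz)
lhs = theta(pq, wx, wz)
rhs = -theta(p, vx, vz)
```

Since the sign of $\theta$ is flipped but its kernel is unchanged, the hyperplane field $\xi$ is preserved, which is all that the contactomorphism property requires.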

Let $\Pi:\mathbb{R}^{n}\times\mathbb{R}\to\mathbb{R}^{n}$ be the projection to the first factor and put

\[ \pi_{1}=\Pi\circ\pi:\mathbb{R}^{2n+1}\to\mathbb{R}^{n}_{\boldsymbol{x}},\quad\pi^{\prime}_{1}=\Pi\circ\pi^{\prime}:\mathbb{R}^{2n+1}\to\mathbb{R}^{n}_{\boldsymbol{p}}. \]

Let $L$ be a Legendre submanifold of $\mathbb{R}^{2n+1}$. In this paper, we call $L$ a regular model if $L$ is diffeomorphic to some open subsets $U\subset\mathbb{R}^{n}_{\boldsymbol{x}}$ and $V\subset\mathbb{R}^{n}_{\boldsymbol{p}}$ via the projections $\pi_{1}$ and $\pi^{\prime}_{1}$, respectively. Equivalently, there exist functions $z=f(\boldsymbol{x})$ on $U$ and $z^{\prime}=\varphi(\boldsymbol{p})$ on $V$ such that

  • $L\subset\mathbb{R}^{2n+1}=J^{1}(\mathbb{R}^{n}_{\boldsymbol{x}},\mathbb{R})$ is the graph of $(df,f)$;

  • $\mathcal{L}(L)\subset\mathbb{R}^{2n+1}=J^{1}(\mathbb{R}^{n}_{\boldsymbol{p}},\mathbb{R})$ is the graph of $(d\varphi,\varphi)$.

Then we have

\[ \boldsymbol{p}=\frac{\partial f}{\partial\boldsymbol{x}},\;\;\boldsymbol{x}=\frac{\partial\varphi}{\partial\boldsymbol{p}},\quad f(\boldsymbol{x})+\varphi(\boldsymbol{p})-\boldsymbol{p}^{T}\boldsymbol{x}=0. \]

The coordinate change is the gradient map $U\to V$, $\boldsymbol{x}\mapsto\boldsymbol{p}=\frac{\partial f}{\partial\boldsymbol{x}}$. It is a diffeomorphism, thus the Hessian matrix of $f(\boldsymbol{x})$ is non-degenerate at every $\boldsymbol{x}\in U$. Here, the inverse map $V\to U$ is given by $\boldsymbol{p}\mapsto\boldsymbol{x}=\frac{\partial\varphi}{\partial\boldsymbol{p}}$, and the Hessian matrix of $\varphi(\boldsymbol{p})$ is the inverse of that of $f(\boldsymbol{x})$. We say that $z^{\prime}=\varphi(\boldsymbol{p})$ is the Legendre transform of $z=f(\boldsymbol{x})$ and vice versa. We call $f$ a potential function and $\varphi$ its dual potential. This correspondence is the Legendre duality. It is very common in, e.g., convex analysis: if $z=f(\boldsymbol{x})$ is strictly convex, then so is $z^{\prime}=\varphi(\boldsymbol{p})$ (see Remark 2.2).
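As a concrete illustration of a regular model, one may take $n=1$ and the potential $f(x)=e^{x}$, whose Legendre transform is $\varphi(p)=p\log p-p$ on $p>0$. This choice of $f$ is an assumption made only for the example. The sketch below checks the three relations $x=\varphi^{\prime}(p)$, $f(x)+\varphi(p)-px=0$ and $f^{\prime\prime}(x)\,\varphi^{\prime\prime}(p)=1$ at a few sample points.

```python
import math

# Illustrative potential (an assumption for this example): f(x) = exp(x).
def f(x):
    return math.exp(x)

def grad_f(x):        # p = f'(x)
    return math.exp(x)

def hess_f(x):        # f''(x)
    return math.exp(x)

# Its dual potential on p > 0: phi(p) = p*log(p) - p.
def phi(p):
    return p * math.log(p) - p

def grad_phi(p):      # x = phi'(p) = log(p)
    return math.log(p)

def hess_phi(p):      # phi''(p) = 1/p
    return 1.0 / p

residuals = []
for x in [-1.0, 0.0, 0.7, 2.0]:
    p = grad_f(x)
    residuals.append((
        abs(grad_phi(p) - x),               # the gradient maps are mutually inverse
        abs(f(x) + phi(p) - p * x),         # f(x) + phi(p) - p x = 0
        abs(hess_f(x) * hess_phi(p) - 1.0)  # the Hessians are mutually inverse
    ))

max_residual = max(max(r) for r in residuals)
```

Here $f$ is strictly convex, so the gradient map $x\mapsto p=e^{x}$ is a diffeomorphism onto $V=(0,\infty)$ and no singularity appears.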

An affine Legendre equivalence, a new terminology introduced in this paper, is defined by an affine transformation $\mathcal{L}_{F}:\mathbb{R}^{2n+1}\to\mathbb{R}^{2n+1}$ of the form

\[ \mathcal{L}_{F}(\boldsymbol{x},\boldsymbol{p},z)=(A\boldsymbol{x}+\boldsymbol{b},\;A^{\prime}\boldsymbol{p}+\boldsymbol{b}^{\prime},\;z+\boldsymbol{c}^{T}\boldsymbol{x}+d) \]

together with affine transformations

\[ F(\boldsymbol{x},z)=(A\boldsymbol{x}+\boldsymbol{b},\;z+\boldsymbol{c}^{T}\boldsymbol{x}+d), \]
\[ F^{*}(\boldsymbol{p},z^{\prime})=(A^{\prime}\boldsymbol{p}+\boldsymbol{b}^{\prime},\;z^{\prime}+\boldsymbol{c}^{\prime T}\boldsymbol{p}+d^{\prime}), \]

where $A$ is invertible and

\[ A^{\prime}=(A^{T})^{-1},\;\;\boldsymbol{b}^{\prime}=A^{\prime}\boldsymbol{c},\;\;\boldsymbol{b}=A\boldsymbol{c}^{\prime},\;\;d^{\prime}=\boldsymbol{b}^{\prime T}\boldsymbol{b}-d. \]

Note that $F$ (or $F^{*}$) determines $\mathcal{L}_{F}$. It is easy to see that $\mathcal{L}_{F}$ preserves the contact form and the double fibrations (dL), i.e., $\mathcal{L}_{F}^{*}\theta=\theta$ and the following diagram commutes:

\[
\begin{array}{ccccc}
\mathbb{R}^{n}_{\boldsymbol{x}}\times\mathbb{R}_{z} & \xleftarrow{\;\;\pi\;\;} & \mathbb{R}^{2n+1} & \xrightarrow{\;\;\pi^{\prime}\;\;} & \mathbb{R}^{n}_{\boldsymbol{p}}\times\mathbb{R}_{z^{\prime}} \\
{\scriptstyle F}\big\downarrow{\scriptstyle\simeq} & & {\scriptstyle\mathcal{L}_{F}}\big\downarrow{\scriptstyle\simeq} & & {\scriptstyle F^{*}}\big\downarrow{\scriptstyle\simeq} \\
\mathbb{R}^{n}_{\boldsymbol{x}}\times\mathbb{R}_{z} & \xleftarrow{\;\;\pi\;\;} & \mathbb{R}^{2n+1} & \xrightarrow{\;\;\pi^{\prime}\;\;} & \mathbb{R}^{n}_{\boldsymbol{p}}\times\mathbb{R}_{z^{\prime}}
\end{array}
\]
Definition 2.1.

We say that two Legendre submanifolds $L_{1},L_{2}$ of $\mathbb{R}^{2n+1}$ are affine Legendre equivalent if there exists some $\mathcal{L}_{F}$ which identifies $L_{1}$ with $L_{2}$.
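The compatibility of $\mathcal{L}_{F}$ with the two Legendre fibrations can be checked by hand; the sketch below does the same numerically for $n=1$, where $A$, $\boldsymbol{b}$, $\boldsymbol{c}$ reduce to scalars. The helper name `make_equivalence` and the sample values are illustrative assumptions, not notation from the paper.

```python
def make_equivalence(a, b, c, d):
    """Build (L_F, F, F*) for n = 1 from scalar data (A, b, c, d) = (a, b, c, d), a != 0."""
    a_dual = 1.0 / a          # A' = (A^T)^{-1}
    b_dual = a_dual * c       # b' = A' c
    c_dual = b / a            # c' determined by b = A c'
    d_dual = b_dual * b - d   # d' = b'^T b - d

    def L_F(x, p, z):
        return a * x + b, a_dual * p + b_dual, z + c * x + d

    def F(x, z):
        return a * x + b, z + c * x + d

    def F_star(p, zp):
        return a_dual * p + b_dual, zp + c_dual * p + d_dual

    return L_F, F, F_star

def pi(x, p, z):
    """Legendre fibration (x, p, z) -> (x, z)."""
    return x, z

def pi_prime(x, p, z):
    """Dual Legendre fibration (x, p, z) -> (p, p x - z)."""
    return p, p * x - z

L_F, F, F_star = make_equivalence(2.0, 3.0, -1.0, 0.5)
point = (1.0, 4.0, -2.0)

# Left square of the diagram: pi o L_F = F o pi.
left_a, left_b = pi(*L_F(*point)), F(*pi(*point))
# Right square of the diagram: pi' o L_F = F* o pi'.
right_a, right_b = pi_prime(*L_F(*point)), F_star(*pi_prime(*point))
```

The relation $d^{\prime}=\boldsymbol{b}^{\prime T}\boldsymbol{b}-d$ is exactly what makes the right square close up; with a different $d^{\prime}$ the two sides would differ by a constant in the $z^{\prime}$-coordinate.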

Remark 2.2.

(Projective duality) The Legendre duality is an affine expression of the projective duality. We denote by $\mathbb{P}^{n+1}\,(:=\mathbb{R}P^{n+1})$ the real projective space of dimension $n+1$ and by $\mathbb{P}^{n+1*}\,(:=\mathbb{R}P^{n+1*})$ the dual projective space. Let $N$ denote the incidence submanifold of $\mathbb{P}^{n+1}\times\mathbb{P}^{n+1*}$ which consists of pairs $(p,\lambda)$ with $p\in\lambda$, i.e., $N$ is a codimension one submanifold ($\dim N=2n+1$) defined by

\[ p_{0}x_{0}+p_{1}x_{1}+\cdots+p_{n+1}x_{n+1}=0 \]

for $p=[x_{0}:\cdots:x_{n+1}]\in\mathbb{P}^{n+1}$ and $\lambda=[p_{0}:\cdots:p_{n+1}]\in\mathbb{P}^{n+1*}$. Note that $N$ is naturally identified with the projective cotangent bundle $PT^{*}\mathbb{P}^{n+1}\,(=PT^{*}\mathbb{P}^{n+1*})$, and thus $N$ becomes a contact manifold [4, §20.1]. Consider the open subset $O_{N}$ of $N$ defined by $x_{n+1}\neq 0$ and $p_{0}\neq 0$. We may set $x_{n+1}=p_{0}=-1$ and put $z=x_{0}$ and $z^{\prime}=p_{n+1}$; then, with $\boldsymbol{x}=(x_{1},\cdots,x_{n})^{T}$ and $\boldsymbol{p}=(p_{1},\cdots,p_{n})^{T}$, the above equation is rewritten as

\[ z+z^{\prime}-\boldsymbol{p}^{T}\boldsymbol{x}=0. \]

Clearly, $O_{N}$ has two systems of coordinates, $(\boldsymbol{x},\boldsymbol{p},z)$ and $(\boldsymbol{p},\boldsymbol{x},z^{\prime})$, and the coordinate change between them is just the above $\mathcal{L}:\mathbb{R}^{2n+1}\to\mathbb{R}^{2n+1}$ preserving the contact structure of $N$. In projective geometry, the double Legendre fibration

\[ \mathbb{P}^{n+1}\;\xleftarrow{\;\;\pi\;\;}\;N\;\xrightarrow{\;\;\pi^{\prime}\;\;}\;\mathbb{P}^{n+1*} \]

expresses the duality principle on points and hyperplanes, where $\pi$ and $\pi^{\prime}$ are the projections of the projective cotangent bundles. Restricting this diagram to $O_{N}$ and identifying $O_{N}$ with $\mathbb{R}^{2n+1}$ using the coordinates $(\boldsymbol{x},\boldsymbol{p},z)$, we get the diagram (dL). For instance, in the case $n=1$, consider a parameterized plane curve

\[ C:(x,z):=(x,f(x))\in\mathbb{R}^{2}\subset\mathbb{P}^{2}. \]

Then its projective dual is the following curve consisting of the tangent lines:

\[ C^{*}:(p,z^{\prime})=\left(\frac{df}{dx}(x),\;x\frac{df}{dx}(x)-f(x)\right)\in\mathbb{R}^{2*}\subset\mathbb{P}^{2*}. \]

If $C$ is convex, then so is $C^{*}$. If $C$ has an inflection point, e.g., $f(x)=\frac{1}{3}x^{3}+\cdots$, then $C^{*}$ has a cusp at the corresponding point, $(p,z^{\prime})=(x^{2}+\cdots,\frac{2}{3}x^{3}+\cdots)$, and therefore $C^{*}$ is locally the graph of a bi-valued function, $z^{\prime}=\pm\frac{2}{3}p^{3/2}+\cdots\,(p\geq 0)$.
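The cusp of the dual curve can be made explicit in the simplest case. The following sketch assumes the exact cubic $f(x)=\frac{1}{3}x^{3}$ (dropping the higher-order terms, which is an assumption made for the example) and checks that the points of $C^{*}$ lie on the two branches $z^{\prime}=\pm\frac{2}{3}p^{3/2}$, with the branch determined by the sign of $x$.

```python
def dual_point(x):
    """Point of C* for f(x) = x^3/3: (p, z') = (f'(x), x*f'(x) - f(x))."""
    f = x ** 3 / 3.0
    df = x ** 2
    return df, x * df - f

def branch(p, sign):
    """One of the two branches z' = +/-(2/3) p^(3/2), defined for p >= 0."""
    return sign * (2.0 / 3.0) * p ** 1.5

errors = []
for x in [-1.5, -0.5, 0.0, 0.5, 1.5]:
    p, zp = dual_point(x)
    sign = 1.0 if x >= 0 else -1.0   # x >= 0 gives the upper branch, x < 0 the lower one
    errors.append(abs(zp - branch(p, sign)))

max_error = max(errors)
cusp = dual_point(0.0)   # the inflection point x = 0 maps to the cusp point
```

Both branches meet at $(p,z^{\prime})=(0,0)$ with a common tangent, which is the semicubical cusp; note that $p=x^{2}\geq 0$, so the dual curve only occupies the half-plane $p\geq 0$.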

2.2. Dually flat structure

Let $L$ be a Legendre submanifold which is a regular model with potential function $z=f(\boldsymbol{x})$. The Hessian matrix

\[ H(p)=\left[\frac{\partial^{2}f}{\partial x_{i}\partial x_{j}}(\pi_{1}(p))\right]\quad(p\in L) \]

is invertible, thus it defines a pseudo-Riemannian metric $h$ on $L$, called the Hessian metric associated to $f$. Additionally, through the projections $\pi_{1}$ and $\pi^{\prime}_{1}$, the fixed affine structures of $\mathbb{R}^{n}_{\boldsymbol{x}}$ and $\mathbb{R}^{n}_{\boldsymbol{p}}$ induce two different flat affine connections $\nabla,\nabla^{*}$ on $L$, respectively.

Definition 2.3.

([1]) The triplet $(h,\nabla,\nabla^{*})$ is called the dually flat structure on a regular model $L$.

Note that $(h,\nabla,\nabla^{*})$ is preserved under affine Legendre equivalence; indeed, $\mathcal{L}_{F}$ induces affine transformations of $\mathbb{R}^{n}_{\boldsymbol{x}}$ and $\mathbb{R}^{n}_{\boldsymbol{p}}$ and simply adds a linear function $\boldsymbol{c}^{T}\boldsymbol{x}+d$ to the potential $z=f(\boldsymbol{x})$.

The dually flat structure is traditionally introduced in terms of differential geometry in an intrinsic way. We briefly summarize it below; see [1, 2, 20, 25] for details.

A statistical manifold is a pseudo-Riemannian manifold $(M,h)$ equipped with a torsion-free affine connection $\nabla$ compatible with $h$, i.e., the cubic tensor $T:=\nabla h$ is totally symmetric:

\[ (\nabla_{X}h)(Y,Z)=(\nabla_{Y}h)(X,Z) \]

for vector fields $X,Y$ and $Z$. Equivalently [20, p.306], a statistical manifold may also be defined as a manifold endowed with a pseudo-Riemannian metric $h$ and a totally symmetric $(0,3)$-tensor $T$ (due to Lauritzen), which is also described within the theory of contrast functions in [13]. The dual connection $\nabla^{*}$ (with respect to $h$) is defined by

\[ Xh(Y,Z)=h(\nabla_{X}Y,Z)+h(Y,\nabla_{X}^{*}Z), \]

and then $\nabla^{*}$ is torsion-free and $\nabla^{*}h$ is also symmetric. Furthermore, if $\nabla$ is flat (i.e., torsion-free and curvature-free), then so is $\nabla^{*}$. Such a statistical manifold with flat connections is called a dually flat manifold [1, 2] or a Hessian manifold [25, 20]. The most notable characteristic of a dually flat manifold is that locally it holds that

\[ h=\nabla df \]

for some local potential $f$. In other words, the metric $h$ is expressed by the non-degenerate Hessian matrix of $z=f(\boldsymbol{x})$ in $\nabla$-affine local coordinates $\boldsymbol{x}$. Moreover, the $\nabla^{*}$-affine coordinates $\boldsymbol{p}$ are then given by $\boldsymbol{p}=\frac{\partial f}{\partial\boldsymbol{x}}(\boldsymbol{x})$.

This local expression of a dually flat manifold $M$ exactly provides a regular model $L$ in $\mathbb{R}^{2n+1}$, the graph of the $1$-jet of a local potential, equipped with the dually flat structure in the sense of Definition 2.3. Such a regular model $L$ is uniquely determined up to affine Legendre equivalence. To see this precisely, suppose that the metric $h$ is locally expressed by the Hessian matrices $H_{\alpha}$ and $H_{\beta}$ of two potential functions $f_{\alpha}(\boldsymbol{x}^{\alpha})$ and $f_{\beta}(\boldsymbol{x}^{\beta})$ in different $\nabla$-affine local coordinates, respectively. Here, let $(U_{\alpha},\boldsymbol{x}^{\alpha}=(x^{\alpha}_{1},\cdots,x^{\alpha}_{n})^{T})$ and $(U_{\beta},\boldsymbol{x}^{\beta}=(x^{\beta}_{1},\cdots,x^{\beta}_{n})^{T})$ denote the charts with $U_{\alpha},U_{\beta}\subset M$, $U_{\alpha}\cap U_{\beta}\neq\emptyset$. By definition, there is an affine transformation

\[ \boldsymbol{x}^{\beta}=\psi(\boldsymbol{x}^{\alpha})=A\boldsymbol{x}^{\alpha}+\boldsymbol{b}. \]

By the assumption, it holds that $A^{T}H_{\beta}(p)A=H_{\alpha}(p)$ for every $p\in U_{\alpha}\cap U_{\beta}$, thus all second partial derivatives of the composite function $f_{\beta}\circ\psi(\boldsymbol{x}^{\alpha})$ coincide with those of $f_{\alpha}(\boldsymbol{x}^{\alpha})$. Namely, these two functions are the same up to some linear term:

\[ f_{\beta}\circ\psi(\boldsymbol{x}^{\alpha})=f_{\alpha}(\boldsymbol{x}^{\alpha})+\boldsymbol{c}^{T}\boldsymbol{x}^{\alpha}+d. \]

Then the affine transformation

\[ F(\boldsymbol{x}^{\alpha},z):=(A\boldsymbol{x}^{\alpha}+\boldsymbol{b},\;z+\boldsymbol{c}^{T}\boldsymbol{x}^{\alpha}+d) \]

sends the graph of $z=f_{\alpha}(\boldsymbol{x}^{\alpha})$ to the graph of $z=f_{\beta}(\boldsymbol{x}^{\beta})$. Hence, the corresponding affine Legendre equivalence $\mathcal{L}_{F}$ identifies two regular models, $L_{\alpha}\subset\mathbb{R}^{2n+1}$ defined by $(df_{\alpha},f_{\alpha})$ and $L_{\beta}\subset\mathbb{R}^{2n+1}$ defined by $(df_{\beta},f_{\beta})$, on the overlap.

Although it is less noticed, this simple observation says that any dually flat manifold is an affine manifold having an atlas $\{(U_{\alpha},\boldsymbol{x}^{\alpha})\}_{\alpha\in\Lambda}$ with affine coordinate changes $\psi_{\alpha}^{\beta}\,(\alpha,\beta\in\Lambda)$, additionally equipped with local potentials $\{f_{\alpha}\}_{\alpha\in\Lambda}$ whose graphs are glued by affine transformations $F_{\alpha}^{\beta}$ of the above form. The affine structure gives the flat connection $\nabla$, and the local potentials restore the Hessian metric $h$ by gluing $\{H_{\alpha}\}_{\alpha\in\Lambda}$. At the level of Legendre submanifolds given by $1$-jets of local potentials, notice again that $\mathcal{L}_{F}$ preserves $(h,\nabla,\nabla^{*})$. Therefore, we may rephrase the above statement in the following way:

Proposition 2.4.

Any dually flat or Hessian manifold is an affine manifold made up by gluing several regular models $L_{\alpha}$ in $\mathbb{R}^{2n+1}$ via affine Legendre equivalence. The metric $h$ and the pair of affine connections $\nabla$ and $\nabla^{*}$ are reconstructed from the dually flat structures of $L_{\alpha}$ in the sense of Definition 2.3.

This gluing construction will be generalized later to introduce our quasi-Hessian manifolds (Definition 3.19 in §3.3).

Remark 2.5.

Since each gluing map acts also on a neighborhood of a regular model in $\mathbb{R}^{2n+1}$, the gluing construction yields a dually flat manifold as a Legendre submanifold of some ambient contact manifold (it also produces a Lagrange submanifold of some symplectic manifold). Let $(M,h,\nabla,\nabla^{*})$ be a dually flat manifold, and suppose that there exists a global potential $f:M\to\mathbb{R}$ with $h=\nabla df$. Take local charts $U_{\alpha}$ of $\nabla$-flat coordinates; then the local potentials $f|_{U_{\alpha}}$ define regular models $L_{\alpha}$ in $J^{1}(U_{\alpha},\mathbb{R})=T^{*}U_{\alpha}\times\mathbb{R}$, and they are glued together by affine Legendre equivalences of the form $\mathcal{L}_{F}$ with $\boldsymbol{c}=0$ and $d=0$. Conversely, gluing local models by this special kind of affine Legendre equivalence yields a dually flat manifold with a global potential. As a weaker situation, suppose that there exists a closed $1$-form $\sigma$ with $h=\nabla\sigma$; then $M$ is said to be of Koszul type [25]. This case corresponds to gluing regular models by affine Legendre equivalences of the form $\mathcal{L}_{F}$ with $\boldsymbol{c}=0$ but possibly $d\neq 0$.

Example 2.6.

(Amari [1, 2]). An exponential family $M$ is a family of probability density functions of the form

\[ p(\boldsymbol{u}|\theta)=\exp(\boldsymbol{u}^{T}\theta-\psi(\theta)) \]

where $\boldsymbol{u}=(u_{1},\cdots,u_{n})\in\mathbb{R}^{n}$ is a random variable (with its measure $d\mu$) and $\theta=(\theta_{1},\cdots,\theta_{n})^{T}\in U\subset\mathbb{R}^{n}$ are parameters ($U$ is an open set). The normalization factor $\psi(\theta)=\log\int\exp(\boldsymbol{u}^{T}\theta)\,d\mu$ is called the potential of this family. Fix the affine structure of $U$, and put $\partial_{i}=\frac{\partial}{\partial\theta_{i}}$. We see that the expectation is the corresponding dual coordinate

\[ \eta_{i}:=\mathbf{E}[u_{i}|\theta]=\partial_{i}\psi(\theta) \]

and the (co)variances are given by

\[ h_{ij}:=\mathbf{V}[\boldsymbol{u}|\theta]_{ij}=\partial_{i}\partial_{j}\psi(\theta)=\mathbf{E}\left[(\partial_{i}\log p)(\partial_{j}\log p)\right], \]

where the last expression is the Fisher-Rao information. If $h=[h_{ij}]$ is positive definite and one regards $\theta,\eta$ as the $\nabla,\nabla^{*}$-affine coordinates, respectively, then $(M,h,\nabla,\nabla^{*})$ becomes a dually flat manifold. Normal distributions and finite discrete distributions are typical examples.
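The simplest finite discrete case makes these identities concrete. The sketch below assumes the Bernoulli family: $u\in\{0,1\}$ with counting measure and $\psi(\theta)=\log(1+e^{\theta})$, a choice made only for illustration. It compares the mean, variance and Fisher information, computed by direct summation, with $\psi^{\prime}$ and $\psi^{\prime\prime}$; the function names are hypothetical.

```python
import math

def psi(theta):
    """Potential of the Bernoulli family: log of the sum of exp(u*theta) over u in {0, 1}."""
    return math.log(1.0 + math.exp(theta))

def dpsi(theta):
    """psi'(theta), the logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-theta))

def d2psi(theta):
    """psi''(theta) = s(1 - s) with s = psi'(theta)."""
    s = dpsi(theta)
    return s * (1.0 - s)

def density(u, theta):
    """p(u | theta) = exp(u*theta - psi(theta))."""
    return math.exp(u * theta - psi(theta))

def moments(theta):
    """Total mass, mean, variance and Fisher information by summing over u in {0, 1}."""
    total = sum(density(u, theta) for u in (0, 1))
    mean = sum(u * density(u, theta) for u in (0, 1))
    var = sum((u - mean) ** 2 * density(u, theta) for u in (0, 1))
    # The score is d/dtheta log p(u | theta) = u - psi'(theta).
    fisher = sum((u - dpsi(theta)) ** 2 * density(u, theta) for u in (0, 1))
    return total, mean, var, fisher

checks = []
for theta in [-2.0, 0.0, 1.3]:
    total, mean, var, fisher = moments(theta)
    checks.append(max(
        abs(total - 1.0),           # the density is normalized
        abs(mean - dpsi(theta)),    # eta = psi'
        abs(var - d2psi(theta)),    # h = psi''
        abs(fisher - d2psi(theta))  # Fisher-Rao information equals psi''
    ))

max_check = max(checks)
```

Here $h=\psi^{\prime\prime}>0$ for every $\theta$, so this family is a regular model; degeneracy of $h$ requires restricting to a curved subfamily or passing to other models.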

3. Quasi-Hessian structure

Our main idea is to consider not only regular models but also general Legendre submanifolds of $\mathbb{R}^{2n+1}$. Then the Lagrange-Legendre singularity theory naturally comes into the picture (Arnol’d et al. [4], Izumiya-Ishikawa [17]). Nevertheless, in this paper, we only use very basic notions and properties of the theory, which are prepared in §3.1. As another new ingredient, we introduce in §3.2 an affine geometric version of the coherent tangent bundle of Saji-Umehara-Yamada [24]. In §3.3 and §3.4 we define a quasi-Hessian manifold endowed with a particular cubic tensor.

3.1. e/me/m-wavefronts and e/me/m-caustics

A Legendre map is the composition

πι:LNB\pi\circ\iota:L\to N\to B

of the inclusion ι\iota of a Legendre submanifold LL and the projection of a Legendre fibration π:NB\pi:N\to B. The image is usually called a wavefront; we denote it by W(L)W(L) in this paper. The Legendre map πι:LB\pi\circ\iota:L\to B may have singular points, i.e., points pp on LL at which the rank of the differential is not maximum (equivalently, TpLT_{p}L is tangent to the fiber of π\pi), called Legendre singularities [4, 17]. Then the wavefront is no longer a submanifold.

From now on, we consider the diagram (dL)(dL) of double Legendre fibrations on 2n+1\mathbb{R}^{2n+1} and an arbitrary Legendre submanifold L2n+1L\subset\mathbb{R}^{2n+1}. So we have two Legendre maps

πe:=πι:L𝒙n×z,πm:=πι:L𝒑n×z\pi^{e}:=\pi\circ\iota:L\to\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$x$}$}}\times\mathbb{R}_{z},\quad\pi^{m}:=\pi^{\prime}\circ\iota:L\to\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$p$}$}}\times\mathbb{R}_{z^{\prime}}

and call them the $e$- and $m$-Legendre maps, respectively, following the traditional notation in information geometry (“$e$-” and “$m$-” come from terms in statistics, namely exponential and mixture) [1].

Definition 3.1.

(e/me/m-wavefronts) We set

We(L):=πe(L)𝒙n×z,Wm(L):=πm(L)𝒑n×z,W_{e}(L):=\pi^{e}(L)\subset\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$x$}$}}\times\mathbb{R}_{z},\;\;\;W_{m}(L):=\pi^{m}(L)\subset\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$p$}$}}\times\mathbb{R}_{z^{\prime}},

and call them the e/me/m-wavefronts associated to LL, respectively.

The e/me/m-wavefronts are Legendre dual to each other in point-hyperplane duality principle (Remark 2.2).

Usually, the projection of a Lagrange submanifold of TnT^{*}\mathbb{R}^{n} to the base is called a Lagrange map [4, 17]. So we have the e/me/m-Lagrange maps

π1e=Ππe:L𝒙n,π1m=Ππm:L𝒑n.\pi^{e}_{1}=\Pi\circ\pi^{e}:L\to\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$x$}$}},\quad\pi^{m}_{1}=\Pi\circ\pi^{m}:L\to\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$p$}$}}.

It is easy to see that the following two conditions on points pLp\in L are equivalent:

  • pp is a singular point of the Legendre map πe:L𝒙n×z\pi^{e}:L\to\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$x$}$}}\times\mathbb{R}_{z};

  • pp is a singular point of the Lagrange map π1e:L𝒙n\pi^{e}_{1}:L\to\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$x$}$}}.

Indeed, any vTpLv\in T_{p}L enjoys dzp(v)𝒑(p)Td𝒙p(v)=0dz_{p}(v)-\mbox{\boldmath$p$}(p)^{T}d\mbox{\boldmath$x$}_{p}(v)=0, thus, if d𝒙p(v)=0d\mbox{\boldmath$x$}_{p}(v)=0, then dzp(v)=0dz_{p}(v)=0.

Definition 3.2.

(e/me/m-caustics) The ee-critical set C(π1e)LC(\pi^{e}_{1})\subset L consists of all singular points of the ee-Lagrange map π1e:L𝒙n\pi^{e}_{1}:L\to\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$x$}$}}, and we call its image π1eC(π1e)𝒙n\pi^{e}_{1}C(\pi^{e}_{1})\subset\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$x$}$}} the ee-caustics associated to LL. The mm-version is defined in entirely the same way.

Definition 3.3.

We say that LL is locally a regular model around pLp\in L if there is an open neighborhood of pp in LL which is a regular model of 2n+1\mathbb{R}^{2n+1}, i.e., pp is neither ee-critical nor mm-critical.

Consider the case that $p\in L$ is not $e$-critical but $m$-critical (see the toy examples in Examples 3.4 and 3.5 below). Then $W_e(L)$ is the graph of some local potential function $z=f(\boldsymbol{x})$ defined near $\pi^e_1(p)$. Take $\boldsymbol{x}$ as local coordinates of $L$ around $p$. Then the $e$-Lagrange map is written as the identity map of $\boldsymbol{x}$ and the $e$-caustics is empty, while the $m$-Lagrange map $\pi^m_1:L\to\mathbb{R}^n_{\boldsymbol{p}}$ is written as the gradient map $\boldsymbol{p}=\nabla f(\boldsymbol{x})$. Now it is critical at $p$, so $W_m(L)$ is singular. In this case we call $L$ a model with degenerate potential. In particular, if $f$ admits inflection points in the strict sense, $W_m(L)$ is the graph of a multi-valued function $z'=\varphi(\boldsymbol{p})$ branched along the $m$-caustics in $\mathbb{R}^n_{\boldsymbol{p}}$.

Example 3.4.

(A2A_{2}-singularity). Let

f(𝒙)=x133+x222.f(\mbox{\boldmath$x$})=\frac{x_{1}^{3}}{3}+\frac{x_{2}^{2}}{2}.

Then 𝒑=(p1,p2)=(x12,x2)\mbox{\boldmath$p$}=(p_{1},p_{2})=(x_{1}^{2},x_{2}) and the degeneracy locus Σ\Sigma of the Hessian h=2fh=\nabla^{2}f is defined by x1=0x_{1}=0. See the pictures on the left in Fig. 1.

  • The $e$-wavefront $W_e(L)$ is smooth and has parabolic points along $\Sigma$. There is no $e$-caustic.

  • The $m$-wavefront $W_m(L)$ is a singular surface with cuspidal edge; it is the graph of the bi-valued dual potential

    $$z'=\boldsymbol{p}^T\boldsymbol{x}-z=\frac{2}{3}x_1^3+\frac{1}{2}x_2^2=\pm\frac{2}{3}p_1^{3/2}+\frac{1}{2}p_2^2$$

    defined on $p_1\geq 0$ and branched along the $m$-caustics $p_1=0$.

This singularity does not appear if the Hessian is non-negative. Note that for every statistical model, the Fisher-Rao metric is non-negative.
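The bi-valued nature of the dual potential in Example 3.4 can be checked directly. The sketch below (function names are illustrative assumptions) evaluates $z'=\boldsymbol{p}^T\boldsymbol{x}-z$ along $L$ and compares it with the two branches $\pm\frac{2}{3}p_1^{3/2}+\frac{1}{2}p_2^2$: the points $(x_1,x_2)$ and $(-x_1,x_2)$ have the same image under the gradient map but lie on different sheets of $W_m(L)$.

```python
def f(x1, x2):
    """Potential of Example 3.4."""
    return x1 ** 3 / 3 + x2 ** 2 / 2

def grad_f(x1, x2):
    """m-Lagrange map p = grad f(x) = (x1^2, x2)."""
    return (x1 ** 2, x2)

def hessian_det(x1, x2):
    """det of the Hessian diag(2*x1, 1); it vanishes on Sigma = {x1 = 0}."""
    return 2 * x1

def dual_potential_on_L(x1, x2):
    """z' = p^T x - z evaluated at the point of L over x."""
    p1, p2 = grad_f(x1, x2)
    return p1 * x1 + p2 * x2 - f(x1, x2)

def dual_branches(p1, p2):
    """The two values of z' over (p1, p2) with p1 >= 0 (upper, lower)."""
    r = (2.0 / 3.0) * p1 ** 1.5
    return (r + p2 ** 2 / 2, -r + p2 ** 2 / 2)

if __name__ == "__main__":
    for x1 in (0.7, -0.7):
        p = grad_f(x1, 0.3)
        print(x1, p, dual_potential_on_L(x1, 0.3), dual_branches(*p))
```

Over a point with $p_1>0$ the two branches differ by $\frac{4}{3}p_1^{3/2}$, and they meet along the $m$-caustics $p_1=0$.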

Example 3.5.

(A3A_{3}-singularity). Let

f(𝒙)=x144+x222.f(\mbox{\boldmath$x$})=\frac{x_{1}^{4}}{4}+\frac{x_{2}^{2}}{2}.

Then 𝒑=(p1,p2)=(x13,x2)\mbox{\boldmath$p$}=(p_{1},p_{2})=(x_{1}^{3},x_{2}) and the degeneracy locus Σ\Sigma of the Hessian h=2fh=\nabla^{2}f is defined by x1=0x_{1}=0. See the pictures on the right in Fig. 1.

  • The $e$-wavefront $W_e(L)$ is smooth and convex. There is no $e$-caustic.

  • The $m$-wavefront $W_m(L)$ is a singular surface; it is the graph of the dual potential

    $$z'=\boldsymbol{p}^T\boldsymbol{x}-z=\frac{3}{4}x_1^4+\frac{1}{2}x_2^2=\frac{3}{4}p_1^{4/3}+\frac{1}{2}p_2^2,$$

    which is defined on the entire space but singular along the $m$-caustics $p_1=0$.

This is a typical degenerate minimum of functions and also a typical type of singularity with $\mathbb{Z}_2$-symmetry (cf. [4]).
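To see in what sense the dual potential of Example 3.5 is "singular along $p_1=0$", one can look at its derivatives in $p_1$. In the sketch below, $p_1^{4/3}$ is read as $|p_1|^{4/3}$ for $p_1<0$ (an interpretation made here so that the formula makes sense on the whole line), and all function names are assumptions. The first derivative recovers $x_1$, as expected from Legendre duality, while the second derivative equals $1/f''(x_1)=1/(3x_1^2)$ and blows up as $p_1\to 0$.

```python
import math

def x1_of_p1(p1):
    """Inverse of the gradient map p1 = x1^3 (real cube root)."""
    return math.copysign(abs(p1) ** (1.0 / 3.0), p1)

def phi(p1, p2=0.0):
    """Dual potential (3/4)|p1|^(4/3) + (1/2) p2^2 of Example 3.5."""
    return 0.75 * abs(p1) ** (4.0 / 3.0) + 0.5 * p2 ** 2

def d_phi_dp1(p1, eps=1e-6):
    """Central finite difference for the p1-derivative of phi."""
    return (phi(p1 + eps) - phi(p1 - eps)) / (2 * eps)

def phi_11(p1):
    """Second p1-derivative of phi, valid for p1 != 0."""
    return 1.0 / (3.0 * abs(p1) ** (2.0 / 3.0))

if __name__ == "__main__":
    for p1 in (1.0, 1e-3, 1e-6, 1e-9):
        x1 = x1_of_p1(p1)
        # phi_11 * f''(x1) stays equal to 1 while phi_11 itself diverges
        print(p1, phi_11(p1), phi_11(p1) * 3 * x1 ** 2)
```

So $\varphi$ is single-valued and $C^1$, but not $C^2$, across the $m$-caustics; this is the counterpart of the Hessian of $f$ vanishing on $\Sigma$.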

Figure 1. The e/me/m-wavefronts and the e/me/m-caustics (Examples 3.4 and 3.5).

Furthermore, it can happen that pLp\in L is ee-critical and mm-critical simultaneously. Then both wavefronts We(L)W_{e}(L) and Wm(L)W_{m}(L) become singular at πe(p)\pi^{e}(p) and πm(p)\pi^{m}(p), respectively. In general, by Implicit Function Theorem, there is a partition IJ={1,,n}I\sqcup J=\{1,\cdots,n\} (IJ=I\cap J=\emptyset) such that LL is locally parametrized around pp by coordinates 𝒙I=(xi)T\mbox{\boldmath$x$}_{I}=(x_{i})^{T} and 𝒑J=(pj)T\mbox{\boldmath$p$}_{J}=(p_{j})^{T} (iIi\in I, jJj\in J). In fact, we can find a function g(𝒙I,𝒑J)g(\mbox{\boldmath$x$}_{I},\mbox{\boldmath$p$}_{J}) such that LL near pp is expressed by

𝒑I=g𝒙I,𝒙J=g𝒑J,z=𝒑JT𝒙J+g(𝒙I,𝒑J),\mbox{\boldmath$p$}_{I}=\frac{\partial g}{\partial\mbox{\boldmath$x$}_{I}},\;\;\mbox{\boldmath$x$}_{J}=-\frac{\partial g}{\partial\mbox{\boldmath$p$}_{J}},\;\;z=\mbox{\boldmath$p$}_{J}^{T}\mbox{\boldmath$x$}_{J}+g(\mbox{\boldmath$x$}_{I},\mbox{\boldmath$p$}_{J}),

where we write g𝒙I=(gxi1,)T(I=(i1,i2,))\frac{\partial g}{\partial\mbox{\tiny$\mbox{\boldmath$x$}$}_{I}}=(\frac{\partial g}{\partial x_{i_{1}}},\cdots)^{T}\;(I=(i_{1},i_{2},\cdots)). This follows from the form (1) in §2.1 and the canonical transformation

2n+12n+1(𝒙,𝒑,z)(𝒙I,𝒑J,𝒑I,𝒙J,𝒑JT𝒙J+z)\mathbb{R}^{2n+1}\to\mathbb{R}^{2n+1}\;\;(\mbox{\boldmath$x$},\mbox{\boldmath$p$},z)\mapsto(\mbox{\boldmath$x$}_{I},\mbox{\boldmath$p$}_{J},\mbox{\boldmath$p$}_{I},-\mbox{\boldmath$x$}_{J},-\mbox{\boldmath$p$}_{J}^{T}\mbox{\boldmath$x$}_{J}+z)

which preserves the contact structure. Usually, g(𝒙I,𝒑J)g(\mbox{\boldmath$x$}_{I},\mbox{\boldmath$p$}_{J}) is called a generating function of LL around pp [4, §20]. In particular, in case that J=J=\emptyset (resp. I=I=\emptyset), a generating function is a potential z=f(𝒙)z=f(\mbox{\boldmath$x$}) (resp. dual potential z=φ(𝒑)z^{\prime}=\varphi(\mbox{\boldmath$p$})). The e/me/m-Legendre maps are locally expressed as follows.

$$\begin{array}{ll}\pi^e:(L,p)\to\mathbb{R}^n_{\boldsymbol{x}}\times\mathbb{R}_z, & \displaystyle(\boldsymbol{x}_I,\boldsymbol{p}_J)\mapsto\left(\boldsymbol{x}_I,\;-\frac{\partial g}{\partial\boldsymbol{p}_J},\;-\boldsymbol{p}_J^T\frac{\partial g}{\partial\boldsymbol{p}_J}+g(\boldsymbol{x}_I,\boldsymbol{p}_J)\right),\\[2mm] \pi^m:(L,p)\to\mathbb{R}^n_{\boldsymbol{p}}\times\mathbb{R}_{z'}, & \displaystyle(\boldsymbol{x}_I,\boldsymbol{p}_J)\mapsto\left(\frac{\partial g}{\partial\boldsymbol{x}_I},\;\boldsymbol{p}_J,\;\boldsymbol{x}_I^T\frac{\partial g}{\partial\boldsymbol{x}_I}-g(\boldsymbol{x}_I,\boldsymbol{p}_J)\right).\end{array}$$

Also the e/me/m-Lagrange maps π1e,π1m\pi^{e}_{1},\pi^{m}_{1} are obtained by ignoring the last zz and zz^{\prime}-coordinate, respectively.

Example 3.6.

Let

g(x1,p2)=x133+p244g(x_{1},p_{2})=\frac{x_{1}^{3}}{3}+\frac{p_{2}^{4}}{4}

be a generating function. The e/me/m-Legendre maps πe\pi^{e} and πm\pi^{m} send (x1,p2)(x_{1},p_{2}) to

(x1,x2,z)=(x1,p23,x1333p244),(p1,p2,z)=(x12,p2,2x133p244),(x_{1},x_{2},z)=\left(x_{1},-p_{2}^{3},\frac{x_{1}^{3}}{3}-\frac{3p_{2}^{4}}{4}\right),\quad(p_{1},p_{2},z^{\prime})=\left(x_{1}^{2},p_{2},\frac{2x_{1}^{3}}{3}-\frac{p_{2}^{4}}{4}\right),

respectively, so the images $W_e(L)$ and $W_m(L)$ are singular surfaces, each with its own geometric nature, and the $e/m$-caustics are defined by $x_2=0$ on $\mathbb{R}^2_{\boldsymbol{x}}$ and $p_1=0$ on $\mathbb{R}^2_{\boldsymbol{p}}$; see Fig. 2.
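The general recipe $\boldsymbol{p}_I=\partial g/\partial\boldsymbol{x}_I$, $\boldsymbol{x}_J=-\partial g/\partial\boldsymbol{p}_J$, $z=\boldsymbol{p}_J^T\boldsymbol{x}_J+g$ can be traced through Example 3.6 numerically. The following sketch (names and the finite-difference step are assumptions) builds the point of $L$ over $(x_1,p_2)$, reads off the $e/m$-Legendre maps, and evaluates the contact form $dz-\boldsymbol{p}^Td\boldsymbol{x}$ on tangent vectors of $L$, which should vanish because $L$ is Legendrian.

```python
def g(x1, p2):
    """Generating function of Example 3.6, with I = {1}, J = {2}."""
    return x1 ** 3 / 3 + p2 ** 4 / 4

def point_on_L(x1, p2):
    """Return (x1, x2, p1, p2, z) using p_I = dg/dx_I, x_J = -dg/dp_J, z = p_J x_J + g."""
    p1 = x1 ** 2          # dg/dx1
    x2 = -p2 ** 3         # -dg/dp2
    z = p2 * x2 + g(x1, p2)
    return (x1, x2, p1, p2, z)

def e_legendre(x1, p2):
    """e-Legendre map: (x1, p2) -> (x1, x2, z)."""
    a, x2, _, _, z = point_on_L(x1, p2)
    return (a, x2, z)

def m_legendre(x1, p2):
    """m-Legendre map: (x1, p2) -> (p1, p2, z') with z' = p^T x - z."""
    a, x2, p1, b, z = point_on_L(x1, p2)
    return (p1, b, p1 * a + b * x2 - z)

def contact_form_on_tangent(x1, p2, direction, eps=1e-6):
    """Evaluate dz - p1 dx1 - p2 dx2 on the tangent vector of L in a parameter direction."""
    da, db = direction
    plus = point_on_L(x1 + eps * da, p2 + eps * db)
    minus = point_on_L(x1 - eps * da, p2 - eps * db)
    dx1, dx2, _, _, dz = [(u - v) / (2 * eps) for u, v in zip(plus, minus)]
    _, _, p1, q2, _ = point_on_L(x1, p2)
    return dz - p1 * dx1 - q2 * dx2

if __name__ == "__main__":
    print(e_legendre(0.5, 0.2), m_legendre(0.5, 0.2))
    print(contact_form_on_tangent(0.5, 0.2, (1.0, -2.0)))
```

The outputs of `e_legendre` and `m_legendre` can be compared term by term with the closed formulas displayed in the example.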

Figure 2. Both e/me/m-wavefronts are singular (Example 3.6).
Remark 3.7.

(Hierarchical structure) For a dually flat manifold $L$ with a potential $z=f(\boldsymbol{x})$ (i.e., a regular model), there are two systems of coordinates, $\boldsymbol{x}$ and $\boldsymbol{p}\,(=\frac{\partial f}{\partial\boldsymbol{x}})$, which are $\nabla$-flat and $\nabla^*$-flat, respectively. This produces a hierarchical structure: we may take a new system of coordinates $(\boldsymbol{x}_I,\boldsymbol{p}_J)$, called mixed coordinates in Amari [2], which yields two foliations of complementary dimensions on $L$ defined by $\boldsymbol{x}_I=const.$ and $\boldsymbol{p}_J=const.$; their leaves are $\nabla^*$-flat and $\nabla$-flat, respectively, and mutually orthogonal with respect to the Hessian metric associated to $f$. This structure is useful in applications, see [2]. For an arbitrary Legendre submanifold $L$, a potential may not exist globally, but as seen above, for any $p\in L$, we can always find a generating function $g(\boldsymbol{x}_I,\boldsymbol{p}_J)$ on a neighborhood $U$ of $p$ (the possible choices of the partition $I\sqcup J$ depend on $p$). This locally defines mixed coordinates $(\boldsymbol{x}_I,\boldsymbol{p}_J)$ and two orthogonal foliations on $U$ (see Remark 3.15 below). Usually these coordinates cannot be extended to the entire space $L$, because of the presence of $e/m$-caustics (i.e., $h$ is degenerate). Nevertheless, this new structure is well organized globally, as we will formulate properly in the following subsections.

3.2. Coherent tangent bundles

Let $L$ be a Legendre submanifold of $\mathbb{R}^{2n+1}$. As seen above, the $e$-wavefront $W_e(L)$ is not a manifold in general, but there is an alternative to its ‘tangent bundle’. Every point $p\in L$ defines a hyperplane $E_p$ in $T_{\pi^e(p)}(\mathbb{R}^n_{\boldsymbol{x}}\times\mathbb{R}_z)=\mathbb{R}^n_{\boldsymbol{x}}\times\mathbb{R}_z$, and the family of such hyperplanes forms a vector bundle on $L$ of rank $n$:

E(=EL):={(p,w)L×(𝒙n×z)dzp(w)𝒑(p)Td𝒙p(w)=0}.E(=E_{L}):=\{\;(p,w)\in L\times(\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$x$}$}}\times\mathbb{R}_{z})\;\mid\;dz_{p}(w)-\mbox{\boldmath$p$}(p)^{T}d\mbox{\boldmath$x$}_{p}(w)=0\;\}.

Since LL is Legendrian, we see

TpLξp=kerθp=(dπp)1(Ep),T_{p}L\subset\xi_{p}=\ker\theta_{p}=(d\pi_{p})^{-1}(E_{p}),

thus dπe(TpL)Epd\pi^{e}(T_{p}L)\subset E_{p}. We then associate a vector bundle map (a smooth fiber-preserving map which is linear on each fiber)

Φ:TLE,vpdπpe(vp).\Phi:TL\to E,\quad v_{p}\mapsto d\pi^{e}_{p}(v_{p}).

Note that Φ\Phi is isomorphic if and only if πe:L𝒙n×\pi^{e}:L\to\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$x$}$}}\times\mathbb{R} is an immersion.

Remark 3.8.

We remark that EE is the “limiting” tangent bundle of the ee-wavefront We(L)W_{e}(L). Note that the kernel of dπp:Tp2n+1Tπ(p)(𝒙n×)d\pi_{p}:T_{p}\mathbb{R}^{2n+1}\to T_{\pi(p)}(\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$x$}$}}\times\mathbb{R}) is spanned by pi\frac{\partial}{\partial p_{i}}’s. If pLp\in L is a regular point of the ee-Legendre map πe:L𝒙n×\pi^{e}:L\to\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$x$}$}}\times\mathbb{R}, then TpLkerdπp={0}T_{p}L\cap\ker d\pi_{p}=\{0\} and

Ep=Imdπpe(TpL)=Tπe(p)We(L).E_{p}={\rm Im}\,d\pi^{e}_{p}(T_{p}L)=T_{\pi^{e}(p)}W_{e}(L).

In fact, in this case, $\pi^e$ is an immersion around $p$, so $W_e(L)$ is a submanifold around $\pi^e(p)$. If a sequence of regular points $p_n\in L$ of $\pi^e$ converges to a critical point $p$, then the image of $T_{p_n}L$ converges to $E_p$ (in the Grassmannian of $n$-planes in $\mathbb{R}^{n+1}=\mathbb{R}^n_{\boldsymbol{x}}\times\mathbb{R}$) because of the continuity of the bundle $E$. In this case, $W_e(L)$ is singular at $\pi^e(p)$, thus the tangent space at that point is not defined, but it has the limiting tangent space $E_p$ as an alternative. Another characterization of $E_p$ is

Ep=ker[dπp:Tp2n+1Tπm(p)(𝒑n×z)]E_{p}=\ker\left[d\pi^{\prime}_{p}:T_{p}\mathbb{R}^{2n+1}\to T_{\pi^{m}(p)}(\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$p$}$}}\times\mathbb{R}_{z^{\prime}})\,\right]

through the inclusion Ep𝒙n{0}z2n+1=Tp2n+1E_{p}\subset\mathbb{R}_{\mbox{\tiny$\mbox{\boldmath$x$}$}}^{n}\oplus\{0\}\oplus\mathbb{R}_{z}\subset\mathbb{R}^{2n+1}=T_{p}\mathbb{R}^{2n+1}. In fact, the Jacobian matrix of π\pi^{\prime} at pp is

$$\left[\begin{array}{ccc}O&E&0\\ \boldsymbol{p}(p)^T&\boldsymbol{x}(p)^T&-1\end{array}\right]$$

so its kernel is given by dzp𝒑(p)Td𝒙p=0dz_{p}-\mbox{\boldmath$p$}(p)^{T}d\mbox{\boldmath$x$}_{p}=0 and d𝒑p=0d\mbox{\boldmath$p$}_{p}=0. Note that the contact hyperplane splits as

ξp=kerdπpkerdπp.\xi_{p}=\ker d\pi^{\prime}_{p}\oplus\ker d\pi_{p}.

Let ~\widetilde{\nabla} be the flat connection on the affine space 𝒙n×\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$x$}$}}\times\mathbb{R}, and for any pLp\in L, let ψp:𝒙n×Ep\psi_{p}:\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$x$}$}}\times\mathbb{R}\to E_{p} denote the linear projection along the zz-axis. Then a connection E\nabla^{E} of the vector bundle EE over LL is naturally defined by

XEξ(p):=ψp~Xξ(p)\nabla^{E}_{X}\xi(p):=\psi_{p}\circ\widetilde{\nabla}_{X}\xi(p)

where ξ\xi is any section of EE and XX is any vector field on LL around pp.

Lemma 3.9.

The resulting connection E\nabla^{E} is flat and ‘relatively torsion-free’, i.e., for any vector fields X,YX,Y on LL, it holds that

XE(Φ(Y))YE(Φ(X))=Φ([X,Y]).\nabla^{E}_{X}(\Phi(Y))-\nabla^{E}_{Y}(\Phi(X))=\Phi([X,Y]).

Proof :  Put si(p):=xi+pizEps_{i}(p):=\frac{\partial}{\partial x_{i}}+p_{i}\frac{\partial}{\partial z}\in E_{p} (1in1\leq i\leq n), then they form a frame of flat global sections of EE:

XEsi=ψp(~Xsi)=ψp(X(pi)z)=0.\nabla^{E}_{X}s_{i}=\psi_{p}(\tilde{\nabla}_{X}s_{i})=\psi_{p}(X(p_{i})\frac{\partial}{\partial z})=0.

Thus E\nabla^{E} is flat. Next, a key point is that Φ\Phi is represented by the Jacobi matrix of the ee-Lagrange map π1e=(f1,,fn):L𝒙n\pi^{e}_{1}=(f_{1},\cdots,f_{n}):L\to\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$x$}$}}, i.e., Φ(j)=i=1n(jfi)si\Phi(\partial_{j})=\sum_{i=1}^{n}(\partial_{j}f_{i})s_{i} in local coordinates (t1,,tn)(t_{1},\cdots,t_{n}) of LL with j=tj\partial_{j}=\frac{\partial}{\partial t_{j}}. Let X=k=1nakkX=\sum_{k=1}^{n}a_{k}\partial_{k} and Y=j=1nbjjY=\sum_{j=1}^{n}b_{j}\partial_{j}, then

XEΦ(Y)=i,j,k((kjfi)akbj+(jfi)ak(kbj))si.\nabla^{E}_{X}\Phi(Y)=\sum_{i,j,k}((\partial_{k}\partial_{j}f_{i})a_{k}b_{j}+(\partial_{j}f_{i})a_{k}(\partial_{k}b_{j}))s_{i}.

The rest is shown by a direct computation. \Box
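For the reader's convenience, the omitted "direct computation" can be sketched as follows, using only the expression for $\nabla^E_X\Phi(Y)$ displayed above and the standard coordinate formula for the Lie bracket; the component notation $[X,Y]_j$ is introduced here only for this sketch.

```latex
% Subtract the same expression with the roles of X and Y exchanged.
\begin{align*}
\nabla^E_X\Phi(Y)-\nabla^E_Y\Phi(X)
 &=\sum_{i,j,k}(\partial_k\partial_j f_i)\,(a_k b_j-b_k a_j)\,s_i
  +\sum_{i,j,k}(\partial_j f_i)\,\bigl(a_k\,\partial_k b_j-b_k\,\partial_k a_j\bigr)\,s_i\\
 &=\sum_{i,j}(\partial_j f_i)\,[X,Y]_j\,s_i
  \;=\;\Phi([X,Y]).
\end{align*}
% The first sum vanishes: \partial_k\partial_j f_i is symmetric in (j,k),
% while a_k b_j - b_k a_j is antisymmetric in (j,k).
% In the second sum, [X,Y]_j := \sum_k (a_k \partial_k b_j - b_k \partial_k a_j)
% is the j-th component of the bracket [X,Y] = \sum_j [X,Y]_j \partial_j.
```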

Definition 3.10.

We call (E,Φ,E)(E,\Phi,\nabla^{E}) the coherent tangent bundle associated to the ee-wavefront We(L)W_{e}(L).

Remark 3.11.

The definition of coherent tangent bundles is originally due to Saji-Umehara-Yamada [24, §6] from the viewpoint of Riemannian geometry. They have studied several kinds of curvatures associated to wavefronts. In our case, we use the fixed affine structure of the ambient space of the wavefront, instead of a metric. The affine differential geometry of wavefronts should also be rich.

In entirely the same way, for the mm-wavefront Wm(L)W_{m}(L), the coherent tangent bundle EE^{\prime} with Φ:=dπm:TLE\Phi^{\prime}:=d\pi^{m}:TL\to E^{\prime} and E\nabla^{E^{\prime}} is defined:

E(=EL):={(p,w)L×(𝒑n×z)dz(w)𝒙(p)Td𝒑(w)=0}.E^{\prime}(=E^{\prime}_{L}):=\{\;(p,w)\in L\times(\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$p$}$}}\times\mathbb{R}_{z^{\prime}})\;\mid\;dz^{\prime}(w)-\mbox{\boldmath$x$}(p)^{T}d\mbox{\boldmath$p$}(w)=0\;\}.

In fact, the double Legendre fibration $(dL)$ can be viewed as the pair of maps $(\pi\circ\mathcal{L}^{-1},\pi)$ using different coordinates $(\boldsymbol{p},\boldsymbol{x},z')$ of $\mathbb{R}^{2n+1}$, and then the above construction yields $(E',\Phi',\nabla^{E'})$ on this dual side. In particular, $E'_p$ is identified with $\ker d\pi_p$ (see Remark 3.8).

We have defined EE and EE^{\prime} as vector bundles on LL, although they are actually defined on the ambient space 2n+1\mathbb{R}^{2n+1}. The contact hyperplane ξ\xi has the direct sum decomposition:

ξp=kerdπpkerdπpEpEp𝒙n𝒑n.\xi_{p}=\ker d\pi^{\prime}_{p}\oplus\ker d\pi_{p}\simeq E_{p}\oplus E^{\prime}_{p}\simeq\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$x$}$}}\oplus\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$p$}$}}.

Here we have canonical frames of flat sections for both EE and EE^{\prime},

si(p)=xi+pizEp,si(p)=pi+xizEp,s_{i}(p)=\frac{\partial}{\partial x_{i}}+p_{i}\frac{\partial}{\partial z}\in E_{p},\quad s_{i}^{*}(p)=\frac{\partial}{\partial p_{i}}+x_{i}\frac{\partial}{\partial z^{\prime}}\in E^{\prime}_{p},

by which $E'$ is identified with the dual of $E$ and vice versa, and there are natural correspondences $s_l\leftrightarrow\frac{\partial}{\partial x_l}$ and $s_l^*\leftrightarrow\frac{\partial}{\partial p_l}$ via projections along the $z$- and $z'$-axes. The vector bundle $\xi$ carries not only the symplectic form $\omega=\sum_{i=1}^n dx_i\wedge dp_i$ but also a pseudo-Riemannian metric of type $(n,n)$ induced from

τ:=i=1ndxidpi=12i=1n(dxidpi+dpidxi).\tau:=\sum_{i=1}^{n}dx_{i}dp_{i}=\frac{1}{2}\sum_{i=1}^{n}(dx_{i}\otimes dp_{i}+dp_{i}\otimes dx_{i}).

Using frames sis_{i} and sjs_{j}^{*}, we may write vectors of EpE_{p} and EpE^{\prime}_{p} as column vectors 𝒖u and 𝒖\mbox{\boldmath$u$}^{\prime}, respectively, and then

$$\tau(\boldsymbol{u}\oplus\boldsymbol{u}',\boldsymbol{v}\oplus\boldsymbol{v}')=\frac{1}{2}\left[\,\boldsymbol{u}^T\;\boldsymbol{u}'^T\right]\left[\begin{array}{cc}O&E\\ E&O\end{array}\right]\left[\begin{array}{c}\boldsymbol{v}\\ \boldsymbol{v}'\end{array}\right]=\frac{1}{2}(\boldsymbol{u}^T\boldsymbol{v}'+\boldsymbol{u}'^T\boldsymbol{v})$$

and also ω(𝒖𝒖,𝒗𝒗)=12(𝒖T𝒗𝒖T𝒗)\omega(\mbox{\boldmath$u$}\oplus\mbox{\boldmath$u$}^{\prime},\mbox{\boldmath$v$}\oplus\mbox{\boldmath$v$}^{\prime})=\frac{1}{2}(\mbox{\boldmath$u$}^{T}\mbox{\boldmath$v$}^{\prime}-\mbox{\boldmath$u$}^{\prime T}\mbox{\boldmath$v$}).

Any affine Legendre equivalence F\mathcal{L}_{F} preserves ω\omega and τ\tau on ξ\xi, because it sends 𝒖𝒖\mbox{\boldmath$u$}\oplus\mbox{\boldmath$u$}^{\prime} to A𝒖A𝒖A\mbox{\boldmath$u$}\oplus A^{\prime}\mbox{\boldmath$u$}^{\prime} with A=(AT)1A^{\prime}=(A^{T})^{-1}.
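The invariance of $\tau$ and $\omega$ under $\boldsymbol{u}\oplus\boldsymbol{u}'\mapsto A\boldsymbol{u}\oplus A'\boldsymbol{u}'$ with $A'=(A^T)^{-1}$ comes down to $(A\boldsymbol{u})^T(A^T)^{-1}\boldsymbol{v}'=\boldsymbol{u}^T\boldsymbol{v}'$. A minimal numerical check for $n=2$ is sketched below; the helper names and the sample matrix are illustrative assumptions.

```python
def dot(u, v):
    """Euclidean pairing u^T v."""
    return sum(a * b for a, b in zip(u, v))

def mat_vec(A, v):
    """Matrix-vector product."""
    return [dot(row, v) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def inverse_2x2(A):
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def tau(u, u_dual, v, v_dual):
    """tau(u + u', v + v') = (u^T v' + u'^T v) / 2 in the frames s_i, s_i^*."""
    return 0.5 * (dot(u, v_dual) + dot(u_dual, v))

def omega(u, u_dual, v, v_dual):
    """omega(u + u', v + v') = (u^T v' - u'^T v) / 2 in the same frames."""
    return 0.5 * (dot(u, v_dual) - dot(u_dual, v))

def act(A, u, u_dual):
    """Linear part of an affine Legendre equivalence: u -> A u, u' -> (A^T)^{-1} u'."""
    A_dual = inverse_2x2(transpose(A))
    return mat_vec(A, u), mat_vec(A_dual, u_dual)

if __name__ == "__main__":
    A = [[2.0, 1.0], [0.0, 1.0]]
    u, ud = [1.0, 2.0], [3.0, -1.0]
    v, vd = [0.5, 4.0], [-2.0, 1.0]
    Au, Aud = act(A, u, ud)
    Av, Avd = act(A, v, vd)
    print(tau(u, ud, v, vd), tau(Au, Aud, Av, Avd))
    print(omega(u, ud, v, vd), omega(Au, Aud, Av, Avd))
```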

Definition 3.12.

We define the quasi-Hessian metric of LL by the pullback of τ\tau:

h(Y,Z):=τ(ιY,ιZ)for Y,ZTLh(Y,Z):=\tau(\iota_{*}Y,\iota_{*}Z)\;\;\;\mbox{for $Y,Z\in TL$}

where ι=ΦΦ:TLξ=EE\iota_{*}=\Phi\oplus\Phi^{\prime}:TL\hookrightarrow\xi=E\oplus E^{\prime} is the inclusion (it is a Lagrange subbundle).

Note that hh is a possibly degenerate symmetric (0,2)(0,2)-tensor, although we abuse the word ‘metric’. If Φ\Phi is isomorphic, then hh exactly coincides with the Hessian metric associated to a potential z=f(𝒙)z=f(\mbox{\boldmath$x$}); any vector of TpLT_{p}L is written by 𝒖H𝒖ξp\mbox{\boldmath$u$}\oplus H\mbox{\boldmath$u$}\in\xi_{p} where H=[hij]H=[h_{ij}] is the Hessian matrix, thus

h(𝒖,𝒗)=τ(𝒖H𝒖,𝒗H𝒗)=𝒖TH𝒗.h(\mbox{\boldmath$u$},\mbox{\boldmath$v$})=\tau(\mbox{\boldmath$u$}\oplus H\mbox{\boldmath$u$},\mbox{\boldmath$v$}\oplus H\mbox{\boldmath$v$})=\mbox{\boldmath$u$}^{T}H\mbox{\boldmath$v$}.

In general, a local expression of hh is given as follows.

Lemma 3.13.

Let g(𝐱I,𝐩J)g(\mbox{\boldmath$x$}_{I},\mbox{\boldmath$p$}_{J}) be a generating function. Then,

h=i,kI2gxixkdxidxkj,lJ2gpjpldpjdpl.h=\sum_{i,k\in I}\frac{\partial^{2}g}{\partial x_{i}\partial x_{k}}\,dx_{i}dx_{k}-\sum_{j,l\in J}\frac{\partial^{2}g}{\partial p_{j}\partial p_{l}}\,dp_{j}dp_{l}.

Proof :  A direct computation shows

$$\begin{aligned}\tau&=d\boldsymbol{x}_I^Td\boldsymbol{p}_I+d\boldsymbol{x}_J^Td\boldsymbol{p}_J\\ &=d\boldsymbol{x}_I^Td(\partial_Ig)-d(\partial_Jg)^Td\boldsymbol{p}_J\\ &=d\boldsymbol{x}_I^Tg_{II}d\boldsymbol{x}_I+d\boldsymbol{x}_I^Tg_{IJ}d\boldsymbol{p}_J-(g_{JI}d\boldsymbol{x}_I)^Td\boldsymbol{p}_J-(g_{JJ}d\boldsymbol{p}_J)^Td\boldsymbol{p}_J\\ &=d\boldsymbol{x}_I^Tg_{II}d\boldsymbol{x}_I-d\boldsymbol{p}_J^Tg_{JJ}d\boldsymbol{p}_J.\end{aligned}$$

Here we use the notation of symmetric products of 11-forms and (gJI)T=gIJ(g_{JI})^{T}=g_{IJ}. \Box
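Lemma 3.13 can be tested on Example 3.6, where $g=\frac{x_1^3}{3}+\frac{p_2^4}{4}$ and the lemma predicts $h=2x_1\,dx_1^2-3p_2^2\,dp_2^2$. The sketch below pulls back $\tau=\sum_i dx_i\,dp_i$ along the embedding $(x_1,p_2)\mapsto(x_1,-p_2^3,x_1^2,p_2)$ by finite differences and compares the result with that formula. Function names and the step size are assumptions for illustration.

```python
def embedding(x1, p2):
    """(x1, x2, p1, p2) on L for g = x1^3/3 + p2^4/4 (the z-coordinate is not needed for tau)."""
    return (x1, -p2 ** 3, x1 ** 2, p2)

def tangent(x1, p2, direction, eps=1e-6):
    """Push forward a parameter direction (d x1, d p2) by central differences."""
    da, db = direction
    plus = embedding(x1 + eps * da, p2 + eps * db)
    minus = embedding(x1 - eps * da, p2 - eps * db)
    return [(u - v) / (2 * eps) for u, v in zip(plus, minus)]

def h_pullback(x1, p2, dir_a, dir_b):
    """h(a, b) = tau(i_* a, i_* b) with tau = (1/2) sum_i (dx_i (x) dp_i + dp_i (x) dx_i)."""
    ax1, ax2, ap1, ap2 = tangent(x1, p2, dir_a)
    bx1, bx2, bp1, bp2 = tangent(x1, p2, dir_b)
    return 0.5 * (ax1 * bp1 + ax2 * bp2 + bx1 * ap1 + bx2 * ap2)

def h_formula(x1, p2, dir_a, dir_b):
    """Lemma 3.13: h = g_{x1 x1} dx1^2 - g_{p2 p2} dp2^2."""
    g_xx = 2 * x1
    g_pp = 3 * p2 ** 2
    return g_xx * dir_a[0] * dir_b[0] - g_pp * dir_a[1] * dir_b[1]

if __name__ == "__main__":
    point = (0.4, -0.7)
    for a, b in (((1, 0), (1, 0)), ((0, 1), (0, 1)), ((1, 0), (0, 1))):
        print(a, b, h_pullback(*point, a, b), h_formula(*point, a, b))
```

The mixed term $h(\partial/\partial x_1,\partial/\partial p_2)$ comes out as zero, in line with the cancellation of the $g_{IJ}$ terms in the proof.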

Lemma 3.14.

Let pLp\in L. The following properties are equivalent:

  (1) $h$ is non-degenerate at $p$;

  (2) $p$ is neither $e$-critical nor $m$-critical;

  (3) $L$ is locally a regular model around $p$;

  (4) $h$ is the Hessian metric associated to a local potential $z=f(\boldsymbol{x})$ near $p$;

  (5) both $\Phi$ and $\Phi'$ are isomorphisms at $p$.

Proof :  By Lemma 3.13, (1) means that both gIIg_{II} and gJJg_{JJ} are non-degenerate. Then, using normal forms of the e/me/m-Lagrange maps π1e\pi^{e}_{1} and π1m\pi^{m}_{1} written in the end of §2.3, those maps are locally diffeomorphic by Inverse Mapping Theorem, so it is just (2) and (3). That means that we can take a local potential z=f(𝒙)z=f(\mbox{\boldmath$x$}) as generating function, that is equivalent to (4). Since Φ\Phi and Φ\Phi^{\prime} are expressed by the Jacobi matrices of the e/me/m-Lagrange maps, (2) and (5) are the same. \Box

Remark 3.15.

As noted in Remark 3.7, locally we always find mixed coordinates (𝒙I,𝒑J)(\mbox{\boldmath$x$}_{I},\mbox{\boldmath$p$}_{J}). By Lemma 3.13, even if hh is degenerate, leaves 𝒑J=const.\mbox{\boldmath$p$}_{J}=const. and 𝒙I=const.\mbox{\boldmath$x$}_{I}=const. are orthogonal: h(xi,pj)=0h(\frac{\partial}{\partial x_{i}},\frac{\partial}{\partial p_{j}})=0 (iIi\in I, jJj\in J).

Definition 3.16.

Let Σ(=ΣL,h)\Sigma\,(=\Sigma_{L,h}) denote the set of pLp\in L at which hh is degenerate, equivalently, the locus where either Φ\Phi or Φ\Phi^{\prime} is not isomorphic:

Σ=C(πe)C(πm).\Sigma=C(\pi^{e})\cup C(\pi^{m}).

We call Σ\Sigma the degeneracy locus of the quasi-Hessian metric hh.

Since LL is Legendrian, TpLT_{p}L is a Lagrange subspace of the symplectic vector space ξp=EpEp\xi_{p}=E_{p}\oplus E^{\prime}_{p}. Note that Φ\Phi (resp. Φ\Phi^{\prime}) is the linear projection of TpLT_{p}L to the factor EpE_{p} (resp. EpE_{p}^{\prime}), and especially, dimTpLEp1\dim T_{p}L\cap E_{p}\geq 1 (resp. dimTpLEp1\dim T_{p}L\cap E^{\prime}_{p}\geq 1) if and only if pp is mm-critical (resp. ee-critical). In particular, the null space of hh splits:

nullhp=kerΦpkerΦp=(TpLEp)(TpLEp).{\rm null}\,h_{p}=\ker\Phi^{\prime}_{p}\oplus\ker\Phi_{p}=(T_{p}L\cap E_{p})\oplus(T_{p}L\cap E^{\prime}_{p}).
Definition 3.17.

For an arbitrary Legendre submanifold L2n+1L\subset\mathbb{R}^{2n+1}, we call the triplet (h,(E,E,Φ),(E,E,Φ))(h,(E,\nabla^{E},\Phi),(E^{\prime},\nabla^{E^{\prime}},\Phi^{\prime})) the dually flat structure of LL.

$$E_L\;\xleftarrow{\;\Phi\;}\;TL\;\xrightarrow{\;\Phi'\;}\;E'_L$$
Remark 3.18.

Given a regular model $L$, we have the triple $(h,E,E')$, where both $\Phi$ and $\Phi'$ are isomorphic. This recovers the dually flat structure in its original form (Definition 2.3); indeed, $\nabla$ and $\nabla^*$ on $TL$ are uniquely determined by

Φ(XY)=XEΦ(Y),Φ(XY)=XEΦ(Y)\Phi(\nabla_{X}Y)=\nabla^{E}_{X}\Phi(Y),\quad\Phi^{\prime}(\nabla^{*}_{X}Y)=\nabla^{E^{\prime}}_{X}\Phi^{\prime}(Y)

where X,YX,Y are arbitrary vector fields on LL. On the other hand, a singular model LL with degenerate potential z=f(𝒙)z=f(\mbox{\boldmath$x$}) is the case that Φ\Phi is isomorphic and Φ\Phi^{\prime} is not. Then the connection \nabla of TLTL is obtained from E\nabla^{E} via Φ\Phi in the same way as above, while \nabla^{*} does not exist. If both Φ\Phi and Φ\Phi^{\prime} are not isomorphic, there is no connection on TLTL.

3.3. Quasi-Hessian manifolds

Our generalized dually flat structure presented in Definition 3.17 is compatible with affine Legendre equivalence. That means that if an affine Legendre equivalence F\mathcal{L}_{F} identifies Legendre submanifolds L1L_{1} and L2L_{2}, then the quasi-Hessian metrics are preserved, Fh2=h1\mathcal{L}_{F}^{*}h_{2}=h_{1}, and F\mathcal{L}_{F} naturally induces vector bundle isomorphisms between coherent tangent bundles, EL1EL2E_{L_{1}}\simeq E_{L_{2}} and EL1EL2E^{\prime}_{L_{1}}\simeq E^{\prime}_{L_{2}}, such that the isomorphisms identify equipped affine flat connections and we have the following commutative diagram

$$\begin{array}{ccccc}E_{L_1}&\xleftarrow{\;\Phi_1\;}&TL_1&\xrightarrow{\;\Phi'_1\;}&E'_{L_1}\\[1mm] \downarrow{\scriptstyle\simeq}&&\downarrow{\scriptstyle\simeq,\;\mathcal{L}_F}&&\downarrow{\scriptstyle\simeq}\\[1mm] E_{L_2}&\xleftarrow{\;\Phi_2\;}&TL_2&\xrightarrow{\;\Phi'_2\;}&E'_{L_2}\end{array}$$

Thus the ordinary gluing construction works. To be precise, suppose that we are given a collection {Lα}αΛ\{L_{\alpha}\}_{\alpha\in\Lambda}, where Λ\Lambda is a countable set, such that it satisfies the following properties:

  (i) for every $\alpha\in\Lambda$, $L_\alpha$ itself is an open manifold and it is embedded in $\mathbb{R}^{2n+1}$ as a Legendre submanifold, called a local model;

  (ii) for every $\alpha,\beta\in\Lambda$, there is an open subset $L_{\alpha\beta}\subset L_\alpha$ (also $L_{\beta\alpha}\subset L_\beta$) and a diffeomorphism $\mathcal{L}_\alpha^\beta:L_{\alpha\beta}\to L_{\beta\alpha}$ such that over each connected component of $L_{\alpha\beta}$, $\mathcal{L}_\alpha^\beta$ is given by an affine Legendre equivalence of the ambient space $\mathbb{R}^{2n+1}$;

  (iii) for $\alpha,\beta,\gamma\in\Lambda$, it holds that $\mathcal{L}_\alpha^\gamma=\mathcal{L}_\beta^\gamma\circ\mathcal{L}_\alpha^\beta$ on $L_{\alpha\beta}\cap L_{\alpha\gamma}$.

Let $M$ be the topological space resulting from these gluing data $\mathcal{U}=\{L_\alpha,\mathcal{L}_\alpha^\beta\}$. Assume that $M$ is Hausdorff; then $M$ itself becomes an $n$-dimensional manifold in the ordinary sense. One can naturally associate a possibly degenerate $(0,2)$-tensor $h$ on $M$ and a pair of globally defined dual coherent tangent bundles $E$ and $E'$ on $M$ with bundle maps $\Phi:TM\to E$ and $\Phi':TM\to E'$ equipped with affine flat connections. The bundles $E$ and $E'$ are dual to each other.

Definition 3.19.

We call (M,𝒰)(M,\mathcal{U}) equipped with (h,(E,E,Φ),(E,E,Φ))(h,(E,\nabla^{E},\Phi),(E^{\prime},\nabla^{E^{\prime}},\Phi^{\prime})) a quasi-Hessian manifold. We define the degeneracy locus Σ\Sigma to be the locus of points of MM at which hh is degenerate.

Since the gluing maps αβ\mathcal{L}_{\alpha}^{\beta} also act on a neighborhood of LαβL_{\alpha\beta} in 2n+1\mathbb{R}^{2n+1} and preserve the contact structure, MM is realized as a Legendre submanifold in some ambient contact manifold.

The above construction immediately yields the following.

Proposition 3.20.

Let MM be a quasi-Hessian manifold. Then hh is non-degenerate everywhere if and only if MM is a Hessian manifold.

Proof :  From the equivalence of (1) and (3) in Lemma 3.14, we see that hh is non-degenerate everywhere if and only if any local models LαL_{\alpha} are regular models, that means MM is a Hessian manifold (Proposition 2.4). \Box

Remark 3.21.

More generally, we may allow a local model $L_\alpha$ to be not a manifold but a singular Legendre variety; it is a closed subset with a partition (stratification) into integral submanifolds of the contact structure (the projection to the cotangent bundle is called a singular Lagrange variety), see, e.g., Ishikawa [16]. This results in a quasi-Hessian manifold with singularities.

An intrinsic definition of quasi-Hessian manifolds is also available. Roughly speaking, it is an nn-dimensional manifold MM equipped with a pair of flat coherent tangent bundles (E,E,Φ)(E,\nabla^{E},\Phi) and (E,E,Φ)(E^{\prime},\nabla^{E^{\prime}},\Phi^{\prime}) of rank nn; we impose two conditions:

  (a) the vector bundle $E\oplus E'$ of rank $2n$ is endowed with a symplectic structure $\omega$ and a pseudo-Riemannian metric $\tau$ of type $(n,n)$ satisfying $\omega(u,v)=\tau(u,v)=0$ ($u,v\in E_p$ or $E'_p$) and $\omega(u,v)=\tau(u,v)$ ($u\in E_p$, $v\in E'_p$); this condition defines the duality between $E$ and $E'$;

  (b) the bundle map

    $$\Phi\oplus\Phi':TM\to E\oplus E'$$

    is injective and its image is a Lagrange subbundle satisfying a suitable integrability condition, which ensures that a local model as in Definition 3.19 can be found around each point of $M$. We omit the details here.

This also suggests a degenerate version of the so-called Codazzi structure (cf. [25]).

3.4. Cubic tensor and α\alpha-family

In the theory of dually flat manifolds [1], not only the Hessian metric $h$ but also the Amari-Chentsov tensor $T:=\nabla h$ plays an essential role; it satisfies

\[T(X,Y,Z)=h(\nabla^{*}_{X}Y,Z)-h(\nabla_{X}Y,Z)=h(Y,\nabla^{*}_{X}Z)-h(Y,\nabla_{X}Z)\]

for vector fields $X,Y,Z$.

Note that whenever $\nabla$ exists, the tensor $T$ is defined everywhere, regardless of whether $h$ is non-degenerate. This is the easy case. We generalize the Amari-Chentsov tensor to an arbitrary quasi-Hessian manifold

\[(M,h,(E,\nabla^{E},\Phi),(E^{\prime},\nabla^{E^{\prime}},\Phi^{\prime})),\]

but the way to do so is not obvious at all, because there is no connection on $TM$. In the end we will see that the resulting tensor is a very natural one (Proposition 3.24 below).

Lemma 3.22.

For any vector field $X$ on $M$, and for any sections $\eta$ of $E$ and $\zeta^{\prime}$ of $E^{\prime}$, it holds that

\[X\tau(\eta,\zeta^{\prime})=\tau(\nabla^{E}_{X}\eta,\zeta^{\prime})+\tau(\eta,\nabla^{E^{\prime}}_{X}\zeta^{\prime}),\]

where we put $\tau(\eta,\zeta^{\prime}):=\tau(\eta\oplus 0,0\oplus\zeta^{\prime})$ for short.

Proof :  Take local frames of flat sections $s_{i}$ of $E$ and $s_{j}^{*}$ of $E^{\prime}$ with $\tau(s_{i},s_{j}^{*})=\frac{1}{2}\delta_{ij}$ ($1\leq i,j\leq n$) on an open set $U\subset M$. Put $\eta=\sum a_{i}s_{i}$ and $\zeta^{\prime}=\sum b_{j}s_{j}^{*}$, where $a_{i},b_{j}$ are functions on $U$. Then

\[\nabla^{E}_{X}\eta=\sum X(a_{i})s_{i},\;\;\nabla^{E^{\prime}}_{X}\zeta^{\prime}=\sum X(b_{j})s_{j}^{*},\;\;\tau(\eta,\zeta^{\prime})=\frac{1}{2}\sum a_{i}b_{i}.\]

By the Leibniz rule, $X\tau(\eta,\zeta^{\prime})=\frac{1}{2}\sum\left(X(a_{i})b_{i}+a_{i}X(b_{i})\right)$, which is exactly the right-hand side of the claimed equality. $\Box$

For $Y,Z\in TM$, put

\[\eta=\Phi(Y),\;\;\zeta=\Phi(Z)\;\in E,\;\;\eta^{\prime}=\Phi^{\prime}(Y),\;\;\zeta^{\prime}=\Phi^{\prime}(Z)\;\in E^{\prime}.\]

Then

\[h(Y,Z)=\tau(\eta\oplus\eta^{\prime},\zeta\oplus\zeta^{\prime})=\tau(\eta,\zeta^{\prime})+\tau(\zeta,\eta^{\prime}).\]

Using Lemma 3.22, for vector fields $X,Y,Z$ on $M$,

\begin{align*}
Xh(Y,Z) &= X(\tau(\eta,\zeta^{\prime}))+X(\tau(\zeta,\eta^{\prime}))\\
&= \tau(\nabla^{E}_{X}\eta,\zeta^{\prime})+\tau(\eta,\nabla^{E^{\prime}}_{X}\zeta^{\prime})+\tau(\nabla^{E}_{X}\zeta,\eta^{\prime})+\tau(\zeta,\nabla^{E^{\prime}}_{X}\eta^{\prime}).
\end{align*}

We tentatively call the sum of the first and third terms the $\nabla^{E}$-part and the rest the $\nabla^{E^{\prime}}$-part. We are concerned with their difference.

Definition 3.23.

For a quasi-Hessian manifold $M$, we define the canonical cubic tensor $C$ to be the following $(0,3)$-tensor on $M$:

\[C(X,Y,Z):=\tau(\eta,\nabla^{E^{\prime}}_{X}\zeta^{\prime})+\tau(\zeta,\nabla^{E^{\prime}}_{X}\eta^{\prime})-\tau(\nabla^{E}_{X}\eta,\zeta^{\prime})-\tau(\nabla^{E}_{X}\zeta,\eta^{\prime}).\]

In particular, if $h$ is non-degenerate, then $\Phi(\nabla_{X}Y)=\nabla^{E}_{X}\Phi(Y)$ (Remark 3.18) and we have

\[\tau(\nabla^{E}_{X}\eta,\zeta^{\prime})=\tau(\Phi(\nabla_{X}Y),\Phi^{\prime}(Z))=\frac{1}{2}h(\nabla_{X}Y,Z)\]

and so on. It follows that the $\nabla^{E}$-part and the $\nabla^{E^{\prime}}$-part are equal to, respectively,

\[\frac{1}{2}(h(\nabla_{X}Y,Z)+h(Y,\nabla_{X}Z)),\quad\frac{1}{2}(h(\nabla^{*}_{X}Y,Z)+h(Y,\nabla^{*}_{X}Z)).\]

Hence, we see that $C$ coincides with the Amari-Chentsov tensor $T$:

\begin{align*}
C(X,Y,Z) &= \textstyle\frac{1}{2}(h(\nabla^{*}_{X}Y,Z)-h(\nabla_{X}Y,Z))+\frac{1}{2}(h(Y,\nabla^{*}_{X}Z)-h(Y,\nabla_{X}Z))\\
&= T(X,Y,Z).
\end{align*}

Using local coordinates, we write down the tensor $C$ explicitly as follows. Take a local model $L\subset\mathbb{R}^{2n+1}$ and $p\in L$. As mentioned before, locally around $p$, $L$ is parameterized by some local coordinates $\mbox{\boldmath$x$}_{I},\mbox{\boldmath$p$}_{J}$ with a generating function $g(\mbox{\boldmath$x$}_{I},\mbox{\boldmath$p$}_{J})$. For simplicity, for each $1\leq k\leq n$, we set

\[\partial_{k}:=\frac{\partial}{\partial x_{k}}\;(k\in I)\;\;\;\mbox{or}\;\;\;\frac{\partial}{\partial p_{k}}\;(k\in J).\]

Proposition 3.24.

The canonical cubic tensor $C$ is locally the third partial derivative of a generating function: for any $k,l,m$,

\[C(\partial_{k},\partial_{l},\partial_{m})=\partial_{k}\partial_{l}\partial_{m}g.\]

In particular, $C$ is symmetric.

Proof :  This is shown by direct computation. The generating function yields a Lagrange embedding $L\to T^{*}\mathbb{R}^{n}$ given by

\[\iota:(\mbox{\boldmath$x$}_{I},\mbox{\boldmath$p$}_{J})\mapsto(\mbox{\boldmath$x$}_{I},\mbox{\boldmath$x$}_{J},\mbox{\boldmath$p$}_{I},\mbox{\boldmath$p$}_{J}):=(\mbox{\boldmath$x$}_{I},-\partial_{J}g,\partial_{I}g,\mbox{\boldmath$p$}_{J}),\]

thus the differential $\iota_{*}:T_{p}L\to T_{p}(T^{*}\mathbb{R}^{n})=\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$x$}$}}\oplus\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$p$}$}}$ is written as

\[\iota_{*}(\partial_{k})=\partial_{k}-\sum_{j\in J}(\partial_{k}\partial_{j}g)\frac{\partial}{\partial x_{j}}+\sum_{i\in I}(\partial_{k}\partial_{i}g)\frac{\partial}{\partial p_{i}}.\]

Let $s_{i},s_{i}^{*}\;(1\leq i\leq n)$ be flat sections of $E$ and $E^{\prime}$ as before, so that $\tau(s_{i},s_{j}^{*})=\frac{1}{2}\delta_{ij}$. Then for $k\in I$,

\[\Phi(\iota_{*}\partial_{k})=s_{k}-\sum_{j\in J}(\partial_{k}\partial_{j}g)s_{j},\quad\Phi^{\prime}(\iota_{*}\partial_{k})=\sum_{i\in I}(\partial_{k}\partial_{i}g)s_{i}^{*},\]

and for $k\in J$,

\[\Phi(\iota_{*}\partial_{k})=-\sum_{j\in J}(\partial_{k}\partial_{j}g)s_{j},\quad\Phi^{\prime}(\iota_{*}\partial_{k})=s_{k}^{*}+\sum_{i\in I}(\partial_{k}\partial_{i}g)s_{i}^{*}.\]

Put $\eta=\Phi(\iota_{*}\partial_{l})$, $\eta^{\prime}=\Phi^{\prime}(\iota_{*}\partial_{l})$, $\zeta=\Phi(\iota_{*}\partial_{m})$, $\zeta^{\prime}=\Phi^{\prime}(\iota_{*}\partial_{m})$, and $X=\partial_{k}$.

For $l\in I$, $m\in J$ and any $k$, we have

\begin{align*}
\tau(\eta,\nabla^{E^{\prime}}_{X}\zeta^{\prime}) &= \textstyle\tau(s_{l}-\sum_{J}(\partial_{l}\partial_{j}g)s_{j},\sum_{I}(\partial_{k}\partial_{m}\partial_{i}g)s_{i}^{*})=\frac{1}{2}\partial_{k}\partial_{l}\partial_{m}g,\\
\tau(\zeta,\nabla^{E^{\prime}}_{X}\eta^{\prime}) &= \textstyle\tau(-\sum_{J}(\partial_{m}\partial_{j}g)s_{j},\sum_{I}(\partial_{k}\partial_{l}\partial_{i}g)s_{i}^{*})=0,\\
\tau(\nabla^{E}_{X}\eta,\zeta^{\prime}) &= \textstyle\tau(-\sum_{J}(\partial_{k}\partial_{l}\partial_{j}g)s_{j},s_{m}^{*}+\sum_{I}(\partial_{m}\partial_{i}g)s_{i}^{*})=-\frac{1}{2}\partial_{k}\partial_{l}\partial_{m}g,\\
\tau(\nabla^{E}_{X}\zeta,\eta^{\prime}) &= \textstyle\tau(-\sum_{J}(\partial_{k}\partial_{m}\partial_{j}g)s_{j},\sum_{I}(\partial_{l}\partial_{i}g)s_{i}^{*})=0.
\end{align*}

Thus, the $\nabla^{E^{\prime}}$-part minus the $\nabla^{E}$-part gives $C(\partial_{k},\partial_{l},\partial_{m})=\partial_{k}\partial_{l}\partial_{m}g$.

For $l,m\in I$ and any $k$, we have

\begin{align*}
\tau(\eta,\nabla^{E^{\prime}}_{X}\zeta^{\prime}) &= \textstyle\tau(s_{l}-\sum_{J}(\partial_{l}\partial_{j}g)s_{j},\sum_{I}(\partial_{k}\partial_{m}\partial_{i}g)s_{i}^{*})=\frac{1}{2}\partial_{k}\partial_{l}\partial_{m}g,\\
\tau(\zeta,\nabla^{E^{\prime}}_{X}\eta^{\prime}) &= \textstyle\tau(s_{m}-\sum_{J}(\partial_{m}\partial_{j}g)s_{j},\sum_{I}(\partial_{k}\partial_{l}\partial_{i}g)s_{i}^{*})=\frac{1}{2}\partial_{k}\partial_{l}\partial_{m}g,\\
\tau(\nabla^{E}_{X}\eta,\zeta^{\prime}) &= \textstyle\tau(-\sum_{J}(\partial_{k}\partial_{l}\partial_{j}g)s_{j},\sum_{I}(\partial_{m}\partial_{i}g)s_{i}^{*})=0,\\
\tau(\nabla^{E}_{X}\zeta,\eta^{\prime}) &= \textstyle\tau(-\sum_{J}(\partial_{k}\partial_{m}\partial_{j}g)s_{j},\sum_{I}(\partial_{l}\partial_{i}g)s_{i}^{*})=0.
\end{align*}

Thus, $C(\partial_{k},\partial_{l},\partial_{m})=\partial_{k}\partial_{l}\partial_{m}g$. The same is true for the case of $l,m\in J$. $\Box$
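As a quick sanity check of Proposition 3.24, one may look at the toy model of Example 3.4, whose potential $f$ is recalled in Example 4.9. The following worked computation is only a sketch; it assumes that we take $I=\{1,2\}$, $J=\emptyset$ and the generating function $g=f$.

```latex
\[
g(x_1,x_2)=\frac{x_1^{3}}{3}+\frac{x_2^{2}}{2},
\qquad
(h_{ij})=(\partial_i\partial_j g)=
\begin{pmatrix} 2x_1 & 0\\ 0 & 1\end{pmatrix},
\]
\[
C_{111}=\partial_1^{3}g=2,
\qquad
C_{ijk}=0 \quad \mbox{for all other } (i,j,k).
\]
```

Under these assumptions $h$ degenerates exactly on the $x_{2}$-axis $\{x_{1}=0\}$, while $C$ is a constant tensor that remains well defined and non-zero there.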

Remark 3.25.

For a dually flat manifold with potential function $f$, the above proposition corresponds to the well-known property

\[T(\partial_{i},\partial_{j},\partial_{k})=\partial_{i}\partial_{j}\partial_{k}f\]

with respect to $\nabla$-affine coordinates. In fact, a quasi-Hessian manifold is well characterized by $h$ and $C$; this will be discussed within the theory of (weak) contrast functions (see §4.3).

As is well known, for a dually flat manifold $M$, the family of $\alpha$-connections is defined by

\[\nabla^{(\alpha)}=\frac{1+\alpha}{2}\nabla+\frac{1-\alpha}{2}\nabla^{*}\qquad(\alpha\in\mathbb{R}).\]

Namely, it deforms the Levi-Civita connection linearly using $T$. When $\alpha=\pm 1$, $\nabla$ and $\nabla^{*}$ are recovered. The connections $\nabla^{(\alpha)}$ and $\nabla^{(-\alpha)}$ are mutually dual and they form the so-called $\alpha$-geometry [1, 20]. For a quasi-Hessian manifold $M$, we have connections of $E$ and $E^{\prime}$, but none of $TM$, thus there is no direct analogy to $\alpha$-geometry. Nevertheless, as an attempt, we define a new $(0,3)$-tensor

\begin{align*}
N^{(\alpha)}(X,Y,Z) &:= \frac{1+\alpha}{2}\left[\mbox{$\nabla^{E}$-part}\right]+\frac{1-\alpha}{2}\left[\mbox{$\nabla^{E^{\prime}}$-part}\right]\\
&= \frac{1}{2}Xh(Y,Z)-\frac{\alpha}{2}C(X,Y,Z).
\end{align*}

Obviously, $N^{(-1)}(X,Y,Z)$ is the $\nabla^{E^{\prime}}$-part, $N^{(1)}(X,Y,Z)$ is the $\nabla^{E}$-part, and a sort of duality holds:

\[Xh(Y,Z)=N^{(\alpha)}(X,Y,Z)+N^{(-\alpha)}(X,Y,Z).\]

In general, $N^{(\alpha)}$ is not totally symmetric, for $Xh(Y,Z)$ is not so. If either $\Phi_{p}$ or $\Phi^{\prime}_{p}$ is an isomorphism, then we may take a possibly degenerate local (dual) potential around $p$ (i.e., $I=\emptyset$ or $J=\emptyset$) as the generating function $g$; thus $h$ is written as the Hessian of the potential, and hence $Xh(Y,Z)$ is symmetric, and so is $N^{(\alpha)}$. Furthermore, if $h$ is non-degenerate, i.e., $M$ is a dually flat manifold, we completely recover the $\alpha$-geometry.
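The second expression for $N^{(\alpha)}$ and the duality relation follow from a short computation. In the sketch below, the symbols $A$ and $B$ are shorthand introduced here (they do not appear elsewhere in the paper) for the $\nabla^{E}$-part and the $\nabla^{E^{\prime}}$-part of $Xh(Y,Z)$.

```latex
\begin{align*}
A+B &= Xh(Y,Z), \qquad B-A = C(X,Y,Z),\\
N^{(\alpha)} &= \frac{1+\alpha}{2}A+\frac{1-\alpha}{2}B
 = \frac{1}{2}(A+B)-\frac{\alpha}{2}(B-A)
 = \frac{1}{2}Xh(Y,Z)-\frac{\alpha}{2}C(X,Y,Z),\\
N^{(\alpha)}+N^{(-\alpha)} &= A+B = Xh(Y,Z).
\end{align*}
```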

4. Divergence

Let $(M,h,(E,\nabla^{E}),(E^{\prime},\nabla^{E^{\prime}}))$ be a quasi-Hessian manifold throughout this section.

4.1. Geodesic-like curves

Let $c:I\to M$ be a curve, where $I\subset\mathbb{R}$ is a non-empty open interval, and let $\dot{c}(t):=\frac{d}{dt}c(t)\in T_{c(t)}M$ denote the velocity vector ($t\in I$).

Definition 4.1.

A curve $c:I\to M$ is called an $m$-curve if it is an immersion ($\dot{c}(t)\not=0$) and, at every $t\in I$, the vectors of $E^{\prime}_{c(t)}$

\[\Phi^{\prime}\circ\dot{c}(t),\;\;\nabla^{E^{\prime}}_{\dot{c}}(\Phi^{\prime}\circ\dot{c})(t),\;\;(\nabla^{E^{\prime}}_{\dot{c}})^{2}(\Phi^{\prime}\circ\dot{c})(t),\;\cdots\]

are not simultaneously zero and any two of them are linearly dependent. An $e$-curve is defined in the same way by replacing $\Phi^{\prime}$ and $E^{\prime}$ by $\Phi$ and $E$, respectively.

Suppose that the curve is given in a local model, $c:I\to L_{\alpha}$. We denote by

\[\mbox{\boldmath$p$}(t):=\pi^{m}_{1}\circ c(t)\in\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$p$}$}}\]

the image via the $m$-Lagrange map $\pi^{m}_{1}:L_{\alpha}\to\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$p$}$}}$. Note that $E^{\prime}_{p}$ is canonically isomorphic to $\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$p$}$}}$ by linear projection along the $z^{\prime}$-axis. As long as $\Phi^{\prime}\circ\dot{c}(t)$ is non-zero, the velocity vector $\dot{\mbox{\boldmath$p$}}(t)$ does not vanish and the acceleration vector $\ddot{\mbox{\boldmath$p$}}(t)$ is parallel to the velocity (it can be $0$) by the condition in Definition 4.1. Hence $\mbox{\boldmath$p$}(t)$ moves on a straight line in $\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$p$}$}}$, i.e., $c(t)$ is a re-parametrization of an $m$-geodesic (geodesic with respect to $\nabla^{*}=\nabla^{E^{\prime}}$). A difficulty arises when $\Phi^{\prime}\circ\dot{c}(t_{0})=0$ at some $t_{0}$. Then $\dot{\mbox{\boldmath$p$}}(t_{0})=0$, but by the condition for an $m$-curve, some higher derivative is non-zero, say $\frac{d^{k}}{dt^{k}}\mbox{\boldmath$p$}(t_{0})\not=0$, and the vector $\frac{d^{k+1}}{dt^{k+1}}\mbox{\boldmath$p$}(t)$ is parallel to $\frac{d^{k}}{dt^{k}}\mbox{\boldmath$p$}(t)$. So we see again that $\mbox{\boldmath$p$}(t)$ moves on a straight line, but it meets the $m$-caustics at $t=t_{0}$; it stops once and then turns back or goes forward along the same line, according to whether $k$ is even or odd, see Fig. 3 (cf. Examples 3.4 and 3.5). We choose a direction vector $\mbox{\boldmath$m$}_{c}$ of the straight line. For an $e$-curve $c(t)$, everything is the same, and we denote by $\mbox{\boldmath$e$}_{c}$ a direction vector of the corresponding line in $\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$x$}$}}$.
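The turning-back behaviour can be illustrated on the toy model of Example 3.4, whose $m$-Lagrange map $(x_{1},x_{2})\mapsto(x_{1}^{2},x_{2})$ is recalled in Example 4.9. The following Python sketch is only an illustration: the sample curve $c(t)=(t,t^{2}/2)$ and all function names are choices made here, not taken from the paper.

```python
from fractions import Fraction

def m_lagrange_map(x1, x2):
    """m-Lagrange map of the toy model f = x1^3/3 + x2^2/2, i.e. p = grad f(x)."""
    return (x1 * x1, x2)

def curve(t):
    """Hypothetical sample curve c(t) = (t, t^2/2) in the affine coordinates x."""
    return (t, t * t / 2)

ts = [Fraction(k, 4) for k in range(-8, 9)]      # t = -2, -1.75, ..., 2
image = [m_lagrange_map(*curve(t)) for t in ts]  # p(t) = (t^2, t^2/2)

# Every image point lies on the straight line p2 = p1/2 (direction (2, 1)).
on_line = all(p2 == p1 / 2 for p1, p2 in image)

# p(-t) == p(t): the image reaches the caustic point (0, 0) at t = 0 and
# then retraces the same half-line, i.e. it turns back.
turns_back = all(
    m_lagrange_map(*curve(-t)) == m_lagrange_map(*curve(t)) for t in ts
)

print(on_line, turns_back)
```

Here the first non-vanishing derivative of $\mbox{\boldmath$p$}(t)$ at $t=0$ is the second one, which corresponds to the even case in which the image point turns back.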

Figure 3. The image of $m$-curves via the $m$-Lagrange map.

Remark 4.2.

(1) Not every pair of points of $M$ can be joined by an $m$-curve, only by a piecewise $m$-curve. In fact, in Example 3.4, one can easily find two such points on the $m$-wavefront (the left picture in Fig. 3). The same holds for $e$-curves. (2) Take coordinates $(\mbox{\boldmath$x$}_{I},\mbox{\boldmath$p$}_{J})$ for a local model $L_{\alpha}$. It is easy to see that any $e/m$-curve satisfies a certain partial differential equation (like the geodesic equation) written in terms of $h=(h_{ij})$ and $C=(C_{ijk})$. In §3.4, we have introduced the $\alpha$-family of cubic tensors $N^{(\alpha)}$. Thus we may consider an $\alpha$-analogue of $e/m$-curves; indeed, over $M-\Sigma$, they are the same as geodesics with respect to $\nabla^{(\alpha)}$ and $\nabla^{(-\alpha)}$.

The following definition does not depend on the choices of $L_{\alpha}$ and direction vectors.

Definition 4.3.

Let $c_{e}$, $c_{m}$ and $\mbox{\boldmath$e$}$, $\mbox{\boldmath$m$}$ be as above. Let $S$ be a submanifold of $M$ and suppose that $c_{m}$ meets $S$ at $q\in S$. We say that $c_{m}$ and $S$ are orthogonal at $q$ if

\[\mbox{\boldmath$m$}^{T}d\mbox{\boldmath$x$}(u)=0\;\;\;\mbox{\rm for any $u\in T_{q}S$},\]

where $(\mbox{\boldmath$x$},\mbox{\boldmath$p$},z)$ are the coordinates of $\mathbb{R}^{2n+1}$ for a local model $L_{\alpha}$ containing $q$. Similarly, $c_{e}$ and $S$ are orthogonal at $q$ if $\mbox{\boldmath$e$}^{T}d\mbox{\boldmath$p$}(u)=0$ for any $u\in T_{q}S$. Furthermore, we say that $c_{e}$ and $c_{m}$ are strictly orthogonal at $q$ if $\mbox{\boldmath$e$}^{T}\mbox{\boldmath$m$}=0$.

If $q\not\in\Sigma$, the above definition of the orthogonality of $c_{m}$ and $S$ is the same as the orthogonality with respect to the metric $h$. In fact, taking a regular model around $q$, the Hessian $H(q)$ is non-degenerate, and thus

\[h(\dot{c}(t),u)=\dot{\mbox{\boldmath$x$}}(t)^{T}Hd\mbox{\boldmath$x$}(u)=(H\dot{\mbox{\boldmath$x$}}(t))^{T}d\mbox{\boldmath$x$}(u)=\dot{\mbox{\boldmath$p$}}(t)^{T}d\mbox{\boldmath$x$}(u)=k\mbox{\boldmath$m$}^{T}d\mbox{\boldmath$x$}(u)\]

for some $k\not=0$. However, if $q\in\Sigma$, the meaning is different in general, for it can happen that $\dot{\mbox{\boldmath$p$}}(t_{0})=0$ but $\mbox{\boldmath$m$}\not=0$ (in this case, $\mbox{\boldmath$m$}$ is determined by some higher derivative of $\mbox{\boldmath$p$}(t)$). The reason why we define strict orthogonality is the same: $\mbox{\boldmath$e$}$ and $\mbox{\boldmath$m$}$ may not be determined by velocity vectors.

4.2. Canonical divergence

Let $L$ be a Legendre submanifold of $\mathbb{R}^{2n+1}$. We denote coordinates by

\[p=(\mbox{\boldmath$x$}(p),\mbox{\boldmath$p$}(p),z(p))\in\mathbb{R}^{2n+1},\quad z^{\prime}(p)=\mbox{\boldmath$p$}(p)^{T}\mbox{\boldmath$x$}(p)-z(p)\in\mathbb{R}.\]

Definition 4.4.

The canonical divergence $\mathcal{D}=\mathcal{D}_{L}:L\times L\to\mathbb{R}$ is defined by

\[\mathcal{D}(p,q)=z(p)+z^{\prime}(q)-\mbox{\boldmath$x$}(p)^{T}\mbox{\boldmath$p$}(q).\]

Note that $\mathcal{D}(p,p)=0$ and that it is asymmetric in general, i.e., $\mathcal{D}(p,q)\not=\mathcal{D}(q,p)$. In particular, if $L$ is a regular model with positive definite Hessian metric, this is nothing but the Bregman divergence for some convex potential $z=f(\mbox{\boldmath$x$})$,

\[\mathcal{D}(p,q)=f(\mbox{\boldmath$x$}(p))+\varphi(\mbox{\boldmath$p$}(q))-\mbox{\boldmath$x$}(p)^{T}\mbox{\boldmath$p$}(q),\]

where $z^{\prime}=\varphi(\mbox{\boldmath$p$})$ is the Legendre transform of the potential [1].
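The identification with the Bregman divergence can be checked numerically on a regular model. The following Python sketch assumes, purely for illustration, the convex potential $f(\mbox{\boldmath$x$})=\sum_{i}e^{x_{i}}$, whose Legendre transform is $\varphi(\mbox{\boldmath$p$})=\sum_{i}(p_{i}\log p_{i}-p_{i})$; the potential and all function names are hypothetical choices, not taken from the paper.

```python
import math

def f(x):
    """Convex potential z = f(x) = sum exp(x_i) (illustrative choice)."""
    return sum(math.exp(xi) for xi in x)

def grad_f(x):
    """Dual coordinates p = grad f(x)."""
    return [math.exp(xi) for xi in x]

def phi(p):
    """Legendre transform z' = phi(p) = sum (p_i log p_i - p_i)."""
    return sum(pi * math.log(pi) - pi for pi in p)

def canonical_divergence(x_p, x_q):
    """D(p, q) = z(p) + z'(q) - x(p)^T p(q); points are given by x-coordinates."""
    p_q = grad_f(x_q)
    return f(x_p) + phi(p_q) - sum(a * b for a, b in zip(x_p, p_q))

def bregman(x, y):
    """Usual Bregman form f(x) - f(y) - grad f(y) . (x - y)."""
    return f(x) - f(y) - sum(g * (a - b) for g, a, b in zip(grad_f(y), x, y))

x_p, x_q = [0.3, -1.2], [1.0, 0.5]
d_pq = canonical_divergence(x_p, x_q)
d_qp = canonical_divergence(x_q, x_p)
print(d_pq, bregman(x_p, x_q), d_qp)
```

In this sketch the first two printed values agree up to rounding, while the third differs, reflecting the asymmetry of $\mathcal{D}$.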

Lemma 4.5.

The canonical divergence $\mathcal{D}_{L}$ is invariant under affine Legendre equivalence, i.e., if Legendre submanifolds $L_{1}$ and $L_{2}$ of $\mathbb{R}^{2n+1}$ are affine Legendre equivalent via $\mathcal{L}_{F}$, then it holds that

\[\mathcal{D}_{L_{1}}=\mathcal{D}_{L_{2}}\circ(\mathcal{L}_{F}\times\mathcal{L}_{F})\quad\mbox{on $L_{1}\times L_{1}$.}\]

Proof :  Suppose that $\mathcal{L}_{F}:\mathbb{R}^{2n+1}\to\mathbb{R}^{2n+1}$ is given by

\[(\mbox{\boldmath$u$},\mbox{\boldmath$v$},w)=\mathcal{L}_{F}(\mbox{\boldmath$x$},\mbox{\boldmath$p$},z)=(A\mbox{\boldmath$x$}+\mbox{\boldmath$b$},A^{\prime}\mbox{\boldmath$p$}+\mbox{\boldmath$b$}^{\prime},z+\mbox{\boldmath$c$}^{T}\mbox{\boldmath$x$}+d)\]

with $A^{\prime}=(A^{T})^{-1}$, $\mbox{\boldmath$b$}^{\prime}=A^{\prime}\mbox{\boldmath$c$}$ and $w^{\prime}=\mbox{\boldmath$v$}^{T}\mbox{\boldmath$u$}-w$, and that $\mathcal{L}_{F}(L_{1})=L_{2}$. Then

\begin{align*}
\mathcal{D}_{L_{2}}(\mathcal{L}_{F}(p),\mathcal{L}_{F}(q))
&= w(p)+w^{\prime}(q)-\mbox{\boldmath$u$}(p)^{T}\mbox{\boldmath$v$}(q)\\
&= w(p)-w(q)+\mbox{\boldmath$v$}(q)^{T}(\mbox{\boldmath$u$}(q)-\mbox{\boldmath$u$}(p))\\
&= z(p)-z(q)+\mbox{\boldmath$c$}^{T}(\mbox{\boldmath$x$}(p)-\mbox{\boldmath$x$}(q))+(A^{\prime}(\mbox{\boldmath$p$}(q)+\mbox{\boldmath$c$}))^{T}(A(\mbox{\boldmath$x$}(q)-\mbox{\boldmath$x$}(p)))\\
&= z(p)-z(q)+\mbox{\boldmath$p$}(q)^{T}(\mbox{\boldmath$x$}(q)-\mbox{\boldmath$x$}(p))\\
&= z(p)+z^{\prime}(q)-\mbox{\boldmath$x$}(p)^{T}\mbox{\boldmath$p$}(q)\\
&= \mathcal{D}_{L_{1}}(p,q).
\end{align*}

This completes the proof. $\Box$
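The invariance can also be tested with exact rational arithmetic. The sketch below lifts two points of the toy model $z=x_{1}^{3}/3+x_{2}^{2}/2$ (Example 3.4, as recalled in Example 4.9) to $\mathbb{R}^{5}$, applies a map of the form used in the proof, and compares the divergences. The particular data $A$, $\mbox{\boldmath$b$}$, $\mbox{\boldmath$c$}$, $d$ and all function names are arbitrary choices made here for illustration.

```python
from fractions import Fraction as Fr

def lift(x1, x2):
    """Point (x, p, z) of the toy Legendre submanifold z = x1^3/3 + x2^2/2."""
    x = (Fr(x1), Fr(x2))
    p = (x[0] ** 2, x[1])                      # p = grad f(x)
    z = x[0] ** 3 / 3 + x[1] ** 2 / 2
    return x, p, z

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def divergence(P, Q):
    """D(P, Q) = z(P) + z'(Q) - x(P)^T p(Q), with z' = p^T x - z."""
    (xP, _, zP), (xQ, pQ, zQ) = P, Q
    return zP + (dot(pQ, xQ) - zQ) - dot(xP, pQ)

def matvec(A, v):
    return tuple(dot(row, v) for row in A)

def inverse_transpose(A):
    """(A^T)^{-1} for a 2x2 matrix A."""
    (a, b), (c, d) = A
    det = Fr(a * d - b * c)
    return ((d / det, -c / det), (-b / det, a / det))

def legendre_map(A, b, c, d):
    """(x, p, z) -> (A x + b, A' p + b', z + c^T x + d), A' = (A^T)^{-1}, b' = A' c."""
    A_dual = inverse_transpose(A)
    b_dual = matvec(A_dual, c)
    def apply(point):
        x, p, z = point
        u = tuple(s + t for s, t in zip(matvec(A, x), b))
        v = tuple(s + t for s, t in zip(matvec(A_dual, p), b_dual))
        return u, v, z + dot(c, x) + d
    return apply

F = legendre_map(A=((2, 1), (3, 2)), b=(1, -4), c=(5, 7), d=3)
P, Q = lift(1, 2), lift(-3, Fr(1, 2))
before = divergence(P, Q)
after = divergence(F(P), F(Q))
print(before == after)
```

Since the arithmetic is exact, equality here is a genuine identity for the chosen data rather than a floating-point coincidence.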

Let $(M,\mathcal{U}=\{L_{\alpha}\})$ be a quasi-Hessian manifold obtained by gluing local models and put $\varDelta_{M}=\{(p,p)\in M\times M\}$. Let $U(\varDelta_{M})$ denote the subset of $M\times M$ consisting of points $(p,q)$ such that there is some local model $L_{\alpha}$ containing both $p$ and $q$. Since $M$ is endowed with the quotient topology, $U(\varDelta_{M})$ is an open neighborhood of the diagonal $\varDelta_{M}$.

Definition 4.6.

We set $\mathcal{D}_{M}(p,q):=\mathcal{D}_{L_{\alpha}}(p,q)$ for $p,q\in L_{\alpha}$ with some $\alpha$; then $\mathcal{D}_{M}:U(\varDelta_{M})\to\mathbb{R}$ is well-defined by Lemma 4.5. We call it the canonical divergence of $M$.

If $M$ is connected and simply connected, then the canonical divergence of $M$ can be extended to the entire space, so we obtain $\mathcal{D}_{M}:M\times M\to\mathbb{R}$.

In Amari-Nagaoka's theory of dually flat structure [1, 2], there are two important theorems called the extended Pythagorean theorem and the projection theorem. They play a central role in applications to statistical inference, the em-algorithm, machine learning and so on. These are immediately generalized to our singular setup. In the following two theorems, assume that $M$ is a local model (i.e. $M=L\subset\mathbb{R}^{2n+1}$) or a connected and simply connected quasi-Hessian manifold. In either case, the canonical divergence $\mathcal{D}\,(=\mathcal{D}_{M})$ is defined on $M\times M$.

We say that two points $p,q$ are joined by a curve $c:I\to M$ if there are $t_{0},t_{1}\in I$ with $c(t_{0})=p$ and $c(t_{1})=q$.

Theorem 4.7.

(Extended Pythagorean Theorem) Let $p,q,r\in M$ be three distinct points such that $p$ and $q$ are joined by an $e$-curve $c_{e}$, and $q$ and $r$ are joined by an $m$-curve $c_{m}$, and furthermore, $c_{e}$ and $c_{m}$ are strictly orthogonal at $q$. Then it holds that

\[\mathcal{D}(p,r)=\mathcal{D}(p,q)+\mathcal{D}(q,r).\]

Proof :  Since $\mathcal{D}(q,q)=z(q)+z^{\prime}(q)-\mbox{\boldmath$x$}(q)^{T}\mbox{\boldmath$p$}(q)=0$, we see that

\[\mathcal{D}(p,r)-\mathcal{D}(p,q)-\mathcal{D}(q,r)=-(\mbox{\boldmath$x$}(p)-\mbox{\boldmath$x$}(q))^{T}(\mbox{\boldmath$p$}(r)-\mbox{\boldmath$p$}(q)).\]

The images of the maps $\pi^{e}_{1}\circ c_{e}$ and $\pi^{m}_{1}\circ c_{m}$ lie on lines with direction vectors, say $\mbox{\boldmath$e$},\mbox{\boldmath$m$}$, respectively. Then

\[\mbox{\boldmath$x$}(p)-\mbox{\boldmath$x$}(q)=k_{0}\mbox{\boldmath$e$},\quad\mbox{\boldmath$p$}(r)-\mbox{\boldmath$p$}(q)=k_{1}\mbox{\boldmath$m$}\]

for some $k_{0},k_{1}\in\mathbb{R}$. The assumption is $\mbox{\boldmath$e$}^{T}\mbox{\boldmath$m$}=0$, thus the equality follows. $\Box$
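For completeness, the first identity in the proof can be expanded term by term directly from Definition 4.4; the only extra ingredient is $z(q)+z^{\prime}(q)=\mbox{\boldmath$x$}(q)^{T}\mbox{\boldmath$p$}(q)$.

```latex
\begin{align*}
&\mathcal{D}(p,r)-\mathcal{D}(p,q)-\mathcal{D}(q,r)\\
&\quad= \bigl(z(p)+z'(r)-x(p)^{T}p(r)\bigr)-\bigl(z(p)+z'(q)-x(p)^{T}p(q)\bigr)
      -\bigl(z(q)+z'(r)-x(q)^{T}p(r)\bigr)\\
&\quad= -\bigl(z(q)+z'(q)\bigr)-x(p)^{T}p(r)+x(p)^{T}p(q)+x(q)^{T}p(r)\\
&\quad= -x(q)^{T}p(q)-x(p)^{T}p(r)+x(p)^{T}p(q)+x(q)^{T}p(r)\\
&\quad= -\bigl(x(p)-x(q)\bigr)^{T}\bigl(p(r)-p(q)\bigr).
\end{align*}
```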

Theorem 4.8.

(Projection Theorem) Let $S$ be a submanifold of $M$ and $c_{m}:[0,1]\to L$ an $m$-curve with $q=c_{m}(1)\in S$. Put $p=c_{m}(0)\in L$. Then, $c_{m}$ and $S$ are orthogonal at $q$ if and only if $q$ is a critical point of the function $F=\mathcal{D}(-,p):S\to\mathbb{R}$. The same holds for an $e$-curve $c_{e}$ and $F=\mathcal{D}(p,-)$.

Proof :  Take a generating function $g(\mbox{\boldmath$x$}_{I},\mbox{\boldmath$p$}_{J})$ around $q$. Recall $\mbox{\boldmath$p$}_{I}=\frac{\partial g}{\partial\mbox{\boldmath$x$}_{I}}$, $\mbox{\boldmath$x$}_{J}=-\frac{\partial g}{\partial\mbox{\boldmath$p$}_{J}}$ and $z=\mbox{\boldmath$p$}_{J}^{T}\mbox{\boldmath$x$}_{J}+g(\mbox{\boldmath$x$}_{I},\mbox{\boldmath$p$}_{J})$. Let $\gamma=\gamma(s)$ be an immersed curve on $S$ with $\gamma(0)=q$. On this curve, we have $\frac{d}{ds}g(\mbox{\boldmath$x$}_{I},\mbox{\boldmath$p$}_{J})=\mbox{\boldmath$p$}_{I}^{T}(\frac{d}{ds}\mbox{\boldmath$x$}_{I})-\mbox{\boldmath$x$}_{J}^{T}(\frac{d}{ds}\mbox{\boldmath$p$}_{J})$ and $\frac{dz}{ds}=(\frac{d}{ds}\mbox{\boldmath$p$}_{J})^{T}\mbox{\boldmath$x$}_{J}+\mbox{\boldmath$p$}_{J}^{T}(\frac{d}{ds}\mbox{\boldmath$x$}_{J})+\frac{d}{ds}g=\mbox{\boldmath$p$}^{T}\frac{d}{ds}\mbox{\boldmath$x$}$. Therefore, we see

\begin{align*}
\frac{d(F\circ\gamma)}{ds}(s) &= \frac{d}{ds}\left(z(\gamma(s))+z^{\prime}(p)-\mbox{\boldmath$p$}(p)^{T}\mbox{\boldmath$x$}(\gamma(s))\right)\\
&= (\mbox{\boldmath$p$}(\gamma(s))-\mbox{\boldmath$p$}(p))^{T}\frac{d(\mbox{\boldmath$x$}\circ\gamma)}{ds}(s).
\end{align*}

At $s=0$, the vector $\mbox{\boldmath$p$}(q)-\mbox{\boldmath$p$}(p)$ is a scalar multiple of the direction vector $\mbox{\boldmath$v$}$ of the line in $\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$p$}$}}$ to which the $m$-curve $c_{m}$ is projected, and $\frac{d}{ds}\mbox{\boldmath$x$}=d\pi^{e}_{1}(\frac{d\gamma}{ds}(0))\in\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$x$}$}}$. Hence, the orthogonality of $S$ and $c_{m}$ at $q$ is equivalent to $\frac{d}{ds}F\circ\gamma(0)=0$ for arbitrary $\gamma$, which means that $F$ is critical at $q$. $\Box$
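The criterion can be tried out on the toy model of Example 3.4, using the explicit divergence given in Example 4.9. In the Python sketch below, $S$ is taken to be a straight line through $q$ in the $\mbox{\boldmath$x$}$-coordinates, and the derivative of $F=\mathcal{D}(-,p)$ along $S$ is computed exactly. The chosen points, directions and function names are assumptions made for illustration only.

```python
from fractions import Fraction as Fr

def D(P, Q):
    """Canonical divergence of the toy model in affine coordinates P=(a1,a2), Q=(b1,b2)."""
    a1, a2 = P
    b1, b2 = Q
    return (a1**3 / 3 + a2**2 / 2 + 2 * b1**3 / 3 + b2**2 / 2
            - a1 * b1**2 - a2 * b2)

def derivative_at_q(q, direction, p, h=Fr(1)):
    """Exact d/ds D(q + s*direction, p) at s = 0.

    D is a cubic polynomial in its first argument, so Richardson
    extrapolation of two central differences has no truncation error.
    """
    def F(s):
        return D((q[0] + s * direction[0], q[1] + s * direction[1]), p)
    def central(step):
        return (F(step) - F(-step)) / (2 * step)
    return (4 * central(h / 2) - central(h)) / 3

u, t = Fr(1), Fr(3)
q = (u, u**2 / 2)   # foot point q on S
p = (t, t**2 / 2)   # other end of the m-curve x2 = x1^2/2, direction m = (2, 1)

orthogonal = derivative_at_q(q, (1, -2), p)  # S directed by (1, -2): m^T dx = 0
oblique = derivative_at_q(q, (1, 0), p)      # S directed by (1, 0):  m^T dx != 0
print(orthogonal, oblique)
```

With these choices the derivative vanishes for the orthogonal direction and equals $u^{2}-t^{2}=-8$ for the oblique one, in agreement with the formula for $\frac{d(F\circ\gamma)}{ds}$ in the proof.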

Example 4.9.

We check the Pythagorean theorem for the toy example in Example 3.4. Let

\[f(\mbox{\boldmath$x$})=\frac{x_{1}^{3}}{3}+\frac{x_{2}^{2}}{2}\]

and use affine local coordinates $\mbox{\boldmath$x$}=(x_{1},x_{2})$ for $L$. The $m$-Lagrange map is $(x_{1},x_{2})\mapsto(p_{1},p_{2})=(x_{1}^{2},x_{2})$, and $\Sigma$ is the $x_{2}$-axis. We have already computed the dual potential $z^{\prime}$, thus for $P:=\mbox{\boldmath$x$}(p)=(a_{1},a_{2})$ and $Q:=\mbox{\boldmath$x$}(q)=(b_{1},b_{2})$, we have

\[\mathcal{D}(P,Q)=\frac{a_{1}^{3}}{3}+\frac{a_{2}^{2}}{2}+\frac{2b_{1}^{3}}{3}+\frac{b_{2}^{2}}{2}-a_{1}b_{1}^{2}-a_{2}b_{2}.\]

A straight line $p_{2}=ap_{1}+b$ in $\mathbb{R}^{2}_{\mbox{\tiny$\mbox{\boldmath$p$}$}}$ corresponds to a parabola $x_{2}=ax_{1}^{2}+b$ in $\mathbb{R}^{2}_{\mbox{\tiny$\mbox{\boldmath$x$}$}}$ (i.e. an $m$-curve). Now, for example, take the $m$-curve $c_{m}$: $x_{2}=\frac{1}{2}x_{1}^{2}$ ($\mbox{\boldmath$m$}=(2,1)^{T}$), and two points $Q:=(u,\frac{u^{2}}{2})$ and $R:=(t,\frac{t^{2}}{2})$ lying on it. Take a point $P:=(s,-2(s-u)+\frac{u^{2}}{2})$ on the straight line in $\mathbb{R}^{2}_{\mbox{\tiny$\mbox{\boldmath$x$}$}}$ (i.e. an $e$-curve) passing through $Q$ and directed by $\mbox{\boldmath$e$}=(1,-2)^{T}$, so that $\mbox{\boldmath$e$}^{T}\mbox{\boldmath$m$}=0$. Then $\triangle PQR$ satisfies the condition, and we see $\mathcal{D}(P,Q)+\mathcal{D}(Q,R)=\mathcal{D}(P,R)$. It does not matter whether the point $Q$ lies on $\Sigma$ or not.
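The identity in this example can be confirmed with exact rational arithmetic. The following Python sketch evaluates the defect $\mathcal{D}(P,R)-\mathcal{D}(P,Q)-\mathcal{D}(Q,R)$ for several parameter values $(s,u,t)$, including $u=0$, where $Q$ lies on $\Sigma$. The function names and the sample values are choices made here for illustration.

```python
from fractions import Fraction as Fr

def D(P, Q):
    """Canonical divergence of Example 4.9 for P = (a1, a2), Q = (b1, b2)."""
    a1, a2 = P
    b1, b2 = Q
    return (a1**3 / 3 + a2**2 / 2 + 2 * b1**3 / 3 + b2**2 / 2
            - a1 * b1**2 - a2 * b2)

def triangle(s, u, t):
    """Q, R on the m-curve x2 = x1^2/2; P on the e-line through Q directed by (1, -2)."""
    s, u, t = Fr(s), Fr(u), Fr(t)
    P = (s, -2 * (s - u) + u**2 / 2)
    Q = (u, u**2 / 2)
    R = (t, t**2 / 2)
    return P, Q, R

def pythagorean_defect(s, u, t):
    """D(P, R) - D(P, Q) - D(Q, R); zero when the Pythagorean relation holds."""
    P, Q, R = triangle(s, u, t)
    return D(P, R) - D(P, Q) - D(Q, R)

defects = [pythagorean_defect(s, u, t)
           for s in (-2, 1, 3)
           for u in (-1, 0, 2)           # u = 0 puts Q on the degeneracy locus
           for t in (-3, Fr(1, 2), 4)]
print(all(d == 0 for d in defects))
```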

Figure 4. Two projections of the triangle $\triangle pqr$ lying on $L$ of Example 3.4. We see a folded triangle on $\mathbb{R}^{2}_{\mbox{\tiny$\mbox{\boldmath$p$}$}}$.

4.3. Weak contrast functions

First we recall the definition of contrast functions (Eguchi [13]). Let $M$ be a manifold and $\rho:U\to\mathbb{R}$ a function defined on an open neighborhood $U$ of the diagonal $\mathit{\Delta}_{M}\subset M\times M$. Given vector fields $X_{i}\,(1\leq i\leq k)$, $Y_{j}\,(1\leq j\leq l)$ on $M$, we define a function

\[\rho[X_{1}\cdots X_{k}|Y_{1}\cdots Y_{l}]:M\to\mathbb{R}\]

by assigning to $p\in M$ the value

\[(X_{1})_{p}\cdots(X_{k})_{p}(Y_{1})_{q}\cdots(Y_{l})_{q}\left(\rho(p,q)\right)|_{p=q}.\]

We also write $\rho[X|-](r)=X_{p}\rho(p,q)|_{p=q=r}$ and so on. We call $\rho:U\to\mathbb{R}$ a contrast function of $M$ if it satisfies the following:

(i) $\rho[-|-]=\rho(p,p)=0$;

(ii) $\rho[X|-]=\rho[-|X]=0$;

(iii) $h(X,Y):=-\rho[X|Y]$ is a pseudo-Riemannian metric on $M$.

We call $\rho$ a weak contrast function if it satisfies only (i) and (ii).

Given a contrast function $\rho$, affine connections are defined by

\[h(\nabla_{X}Y,Z):=-\rho[XY|Z],\quad h(Y,\nabla^{*}_{X}Z):=-\rho[Y|XZ].\]

These connections are torsion-free and mutually dual with respect to $h$, and $\nabla h$ is symmetric; therefore, $(M,h,\nabla)$ becomes a statistical manifold [13, 20]. Conversely, given a statistical manifold $M$, one can find a contrast function which reproduces the metric and connections [19]. It is actually shown in [19] that for a symmetric $(0,2)$-tensor $h$ (i.e., a possibly degenerate metric) and a symmetric $(0,3)$-tensor $c$, one can find a weak contrast function $\rho:U\to\mathbb{R}$ which satisfies

\begin{align*}
h(X,Y) &= -\rho[X|Y]\;(=\rho[XY|-]=\rho[-|XY]),\\
c(X,Y,Z) &= -\rho[Z|XY]+\rho[XY|Z].
\end{align*}

Among statistical manifolds, Hessian manifolds admit a notable property: the Bregman divergence is a contrast function, and it reproduces the dually flat structure. This extends to our quasi-Hessian manifold and its canonical divergence.

Theorem 4.10.

For a quasi-Hessian manifold MM, the canonical divergence 𝒟M\mathcal{D}_{M} is a weak contrast function, and reproduces the quasi-Hessian metric and the canonical cubic tensor by

h(X,Y)\displaystyle h(X,Y) =\displaystyle= 𝒟M[X|Y],\displaystyle-\mathcal{D}_{M}[X|Y],
C(X,Y,Z)\displaystyle C(X,Y,Z) =\displaystyle= 𝒟M[XY|Z]+𝒟M[Z|XY].\displaystyle-\mathcal{D}_{M}[XY|Z]+\mathcal{D}_{M}[Z|XY].

Proof :  Since this is a local property, take a local model Lα2n+1L_{\alpha}\subset\mathbb{R}^{2n+1}. Suppose that g(𝒙I,𝒑J)g(\mbox{\boldmath$x$}_{I},\mbox{\boldmath$p$}_{J}) is a generating function for LαL_{\alpha} around pLαp\in L_{\alpha}. Then (𝒙I,𝒑J)(\mbox{\boldmath$x$}_{I},\mbox{\boldmath$p$}_{J}) is a system of local coordinates for LαL_{\alpha} around pp, and it holds that xj(q)=gpj(q)x_{j}(q)=-\frac{\partial g}{\partial p_{j}}(q), pi(q)=gxi(q)p_{i}(q)=\frac{\partial g}{\partial x_{i}}(q), z(q)=𝒑J(q)T𝒙J(q)+g(q)z(q)=\mbox{\boldmath$p$}_{J}(q)^{T}\mbox{\boldmath$x$}_{J}(q)+g(q) for qLαq\in L_{\alpha} close to pp. Hence,

𝒟M(p,q)\displaystyle\mathcal{D}_{M}(p,q) =\displaystyle= z(p)z(q)+𝒑(q)T(𝒙(q)𝒙(p))\displaystyle z(p)-z(q)+\mbox{\boldmath$p$}(q)^{T}(\mbox{\boldmath$x$}(q)-\mbox{\boldmath$x$}(p))
=\displaystyle= g(p)g(q)+𝒙J(p)T(𝒑J(p)𝒑J(q))+𝒑I(q)T(𝒙I(q)𝒙I(p)).\displaystyle g(p)-g(q)+\mbox{\boldmath$x$}_{J}(p)^{T}(\mbox{\boldmath$p$}_{J}(p)-\mbox{\boldmath$p$}_{J}(q))+\mbox{\boldmath$p$}_{I}(q)^{T}(\mbox{\boldmath$x$}_{I}(q)-\mbox{\boldmath$x$}_{I}(p)).

Let k\partial_{k} denote xk\frac{\partial}{\partial x_{k}} if kIk\in I and pk\frac{\partial}{\partial p_{k}} if kJk\in J, for short. Then

(k)p𝒟M(p,q)\displaystyle(\partial_{k})_{p}\mathcal{D}_{M}(p,q) =\displaystyle= ϵ(k)(kg(p)kg(q))+k𝒙J(p)T(𝒑J(p)𝒑J(q)),\displaystyle\epsilon(k)(\partial_{k}g(p)-\partial_{k}g(q))+\partial_{k}\mbox{\boldmath$x$}_{J}(p)^{T}(\mbox{\boldmath$p$}_{J}(p)-\mbox{\boldmath$p$}_{J}(q)),
(k)q𝒟M(p,q)\displaystyle(\partial_{k})_{q}\mathcal{D}_{M}(p,q) =\displaystyle= (1ϵ(k))(kg(p)kg(q))+k𝒑I(q)T(𝒙I(q)𝒙I(p)),\displaystyle(1-\epsilon(k))(\partial_{k}g(p)-\partial_{k}g(q))+\partial_{k}\mbox{\boldmath$p$}_{I}(q)^{T}(\mbox{\boldmath$x$}_{I}(q)-\mbox{\boldmath$x$}_{I}(p)),

where ϵ(k)=1\epsilon(k)=1 if kIk\in I and 0 if kJk\in J. It immediately follows that

𝒟M[|]=0,𝒟M[k|]=𝒟M[|k]=0,\mathcal{D}_{M}[-|-]=0,\quad\mathcal{D}_{M}[\partial_{k}|-]=\mathcal{D}_{M}[-|\partial_{k}]=0,

so the divergence is a weak contrast function. Put

ϵ(k,l)={1(k,lI)1(k,lJ)0(otherwise).\epsilon(k,l)=\left\{\begin{array}[]{rl}1&(k,l\in I)\\ -1&(k,l\in J)\\ 0&\mbox{(otherwise)}.\end{array}\right.

Then a simple computation shows that

(l)p(k)p𝒟M(p,q)\displaystyle(\partial_{l})_{p}(\partial_{k})_{p}\mathcal{D}_{M}(p,q) =\displaystyle= ϵ(k,l)lkg(p)+lk𝒙J(p)T(𝒑J(p)𝒑J(q)),\displaystyle\epsilon(k,l)\partial_{l}\partial_{k}g(p)+\partial_{l}\partial_{k}\mbox{\boldmath$x$}_{J}(p)^{T}(\mbox{\boldmath$p$}_{J}(p)-\mbox{\boldmath$p$}_{J}(q)),
(l)q(k)q𝒟M(p,q)\displaystyle(\partial_{l})_{q}(\partial_{k})_{q}\mathcal{D}_{M}(p,q) =\displaystyle= ϵ(k,l)lkg(q)+lk𝒑I(q)T(𝒙I(q)𝒙I(p)).\displaystyle\epsilon(k,l)\partial_{l}\partial_{k}g(q)+\partial_{l}\partial_{k}\mbox{\boldmath$p$}_{I}(q)^{T}(\mbox{\boldmath$x$}_{I}(q)-\mbox{\boldmath$x$}_{I}(p)).\qquad\quad

Hence $\mathcal{D}_{M}[\partial_{k}\partial_{l}|-]=h(\partial_{k},\partial_{l})$ and

\[ \mathcal{D}_{M}[\partial_{k}\partial_{l}|\partial_{m}]-\mathcal{D}_{M}[\partial_{m}|\partial_{k}\partial_{l}]=-\partial_{k}\partial_{l}\partial_{m}g \]

for any $k,l,m$. By Proposition 3.24, this coincides with the cubic tensor $C$ up to sign. $\Box$
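As a numerical sanity check of the first part of this computation, the following minimal sketch treats the simplest case. Everything specific in it is an assumption made for illustration: one variable with $I=\{1\}$ and $J=\emptyset$, the hypothetical nonconvex potential $g(x)=x^{3}/3$ (so that $\boldsymbol{p}_{I}=g'(x)$ and $h=g''(x)=2x$ degenerates at $x=0$), and function names of our own choosing. It checks by central finite differences that the divergence and its first derivatives vanish on the diagonal and that the second derivative in the first slot recovers $g''$, including at the degenerate point.

```python
def g(x):
    # Hypothetical potential (assumption): nonconvex, g''(x) = 2x vanishes at x = 0.
    return x ** 3 / 3.0

def dg(x):
    # Dual coordinate p_I = g'(x) in the case I = {1}, J = empty.
    return x ** 2

def divergence(xp, xq):
    # D_M(p, q) = g(p) - g(q) + p_I(q) * (x_I(q) - x_I(p)); the J-term is absent here.
    return g(xp) - g(xq) + dg(xq) * (xq - xp)

def d_first_slot(xp, xq, step=1e-3):
    # Central difference approximating the derivative in the first argument.
    return (divergence(xp + step, xq) - divergence(xp - step, xq)) / (2 * step)

def d_second_slot(xp, xq, step=1e-3):
    # Central difference approximating the derivative in the second argument.
    return (divergence(xp, xq + step) - divergence(xp, xq - step)) / (2 * step)

def d2_first_slot(xp, xq, step=1e-3):
    # Central second difference in the first argument.
    return (divergence(xp + step, xq) - 2 * divergence(xp, xq)
            + divergence(xp - step, xq)) / step ** 2

for x in (-1.0, 0.0, 0.5, 2.0):
    # On the diagonal: value, both first derivatives, and the second derivative.
    print(x, divergence(x, x), d_first_slot(x, x), d_second_slot(x, x), d2_first_slot(x, x))
```

In this toy case the last printed column should track $2x$ up to discretization error, so it is negative for $x<0$ and zero at $x=0$; this is the sense in which the divergence is only a weak contrast function.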

5. Discussions

We briefly discuss some possible directions and proposals for further research.

5.1. Pre-Frobenius structure

In mathematical physics such as string theory, there often arise manifolds endowed with a commutative and associative multiplication on tangent spaces satisfying certain properties, called (several variations of) Frobenius manifolds [10]. Now, let $(M,h,C)$ be a flat Hessian manifold, i.e., the metric connection with respect to $h$ is flat. Then $M$ naturally carries a (weak) version of Frobenius structure [27, §2]. Put $C_{ijk}=C(\partial_{i},\partial_{j},\partial_{k})$ using $\nabla$-affine coordinates; we may take them as structure constants to define a multiplication on $T_{p}M$:

\[ \partial_{i}\circ\partial_{j}:=\sum_{k,l}C_{ijk}h^{kl}\partial_{l}. \]

Since $C$ is symmetric, the multiplication is commutative. The associativity condition, $(\partial_{i}\circ\partial_{j})\circ\partial_{k}=\partial_{i}\circ(\partial_{j}\circ\partial_{k})$, is written as

\[ \sum_{a,b}(C_{ijb}C_{kla}-C_{ila}C_{jkb})=0\qquad(\forall\,i,j,k,l), \]

and, a bit surprisingly, the left hand side coincides with the curvature tensor for the Levi-Civita connection of $h$ [11, 23]; the equation is actually known as the WDVV equation in string theory. Moreover, it is easy to see that the multiplication is compatible with the metric: $h(\partial_{i}\circ\partial_{j},\partial_{k})=h(\partial_{i},\partial_{j}\circ\partial_{k})$. Then the tuple $(M,h,\circ)$ becomes a weak pre-Frobenius manifold (cf. [10, 15]). For a quasi-Hessian manifold $M$, the symmetric cubic tensor $C=(C_{ijk})$ is defined everywhere, but $h^{kl}$ is not; even so, the WDVV equation makes sense. Then, at least for every $p\in\Sigma$ (pointwise), the quotient $T_{p}M/{\rm null}(h_{p})$ carries a Frobenius algebra structure.
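To make the multiplication concrete, here is a minimal two-dimensional sketch. It is an illustration under assumptions that are ours, not the paper's: the hypothetical potential $\phi(x,y)=e^{x}+e^{y}$, the metric $h$ taken as its Hessian, and $C$ taken as its third derivatives (the sign convention for $C$ does not affect the properties checked). All function names are hypothetical. The code builds the structure constants $\sum_{k}C_{ijk}h^{kl}$ and tests commutativity, associativity and compatibility with $h$ at one point; it raises an error where $h$ is degenerate, which is the situation on $\Sigma$.

```python
import math

def inverse_2x2(h):
    (a, b), (c, d) = h
    det = a * d - b * c
    if det == 0:
        # On the degeneracy locus the inverse h^{kl} does not exist.
        raise ValueError("h is degenerate; the product is not defined at this point")
    return [[d / det, -b / det], [-c / det, a / det]]

def structure_constants(C, h):
    # m[i][j][l] is the coefficient of d_l in d_i o d_j = sum_{k,l} C_ijk h^{kl} d_l.
    h_inv = inverse_2x2(h)
    r = range(2)
    return [[[sum(C[i][j][k] * h_inv[k][l] for k in r) for l in r] for j in r] for i in r]

def is_commutative(m, tol=1e-9):
    r = range(2)
    return all(abs(m[i][j][l] - m[j][i][l]) < tol for i in r for j in r for l in r)

def is_associative(m, tol=1e-9):
    # Compare (d_i o d_j) o d_k with d_i o (d_j o d_k), coefficient by coefficient.
    r = range(2)
    return all(
        abs(sum(m[i][j][a] * m[a][k][l] for a in r)
            - sum(m[j][k][a] * m[i][a][l] for a in r)) < tol
        for i in r for j in r for k in r for l in r)

def is_metric_compatible(m, h, tol=1e-9):
    # Compare h(d_i o d_j, d_k) with h(d_i, d_j o d_k).
    r = range(2)
    return all(
        abs(sum(m[i][j][l] * h[l][k] for l in r)
            - sum(m[j][k][l] * h[i][l] for l in r)) < tol
        for i in r for j in r for k in r)

# Hypothetical potential phi(x, y) = exp(x) + exp(y), evaluated at (0.3, -0.2).
x, y = 0.3, -0.2
h = [[math.exp(x), 0.0], [0.0, math.exp(y)]]
C = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
C[0][0][0] = math.exp(x)   # third derivative in x
C[1][1][1] = math.exp(y)   # third derivative in y

m = structure_constants(C, h)
print(is_commutative(m), is_associative(m), is_metric_compatible(m, h))
```

For this separable potential the product reduces to $\partial_{1}\circ\partial_{1}=\partial_{1}$, $\partial_{2}\circ\partial_{2}=\partial_{2}$ and $\partial_{1}\circ\partial_{2}=0$, so all three checks are expected to hold; for a general potential only commutativity and metric compatibility are automatic.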

A new pre-Frobenius structure on a certain space of probability distributions has recently been found in [9], using the Hessian geometry on convex cones and paracomplex structures. Also, in the context of Poisson and para-Kähler geometry, the notion of contravariant pseudo-Hessian manifolds has been introduced in [7]; it is actually very close to our quasi-Hessian manifolds with degenerate potentials. These notions should be mutually related.

A different and more interesting question concerns the local geometry of a quasi-Hessian manifold $M$ in relation to the Saito-Givental theory: under a certain condition, the germ of $M$ at a point should be a real-geometry counterpart to the analytic spectrum of a massive F-manifold (cf. [15, §3]; the analytic spectrum is a certain holomorphic Legendre submanifold of $\mathbb{C}^{2n+1}$ defined by a versal deformation of a complex isolated hypersurface singularity as its generating family). Perhaps this was essentially posed by Arnol’d [5, §4].

5.2. Statistical inference and machine learning

Suppose that our statistical model $S$ is a curved exponential family, i.e., a submanifold of an exponential family $M$ (see Example 2.6). Let $\mathcal{D}:M\times M\to\mathbb{R}$ be the associated Bregman divergence, which is known to coincide with the Kullback-Leibler divergence

\[ \mathcal{D}_{KL}(q,p)=\int q(u)\log\frac{q(u)}{p(u)}\,du \]

measuring an ‘asymmetrical distance’ from a distribution (density function) $q=q(u)$ to another $p=p(u)$. A given data set $\{{\bf u}_{i}\}$ yields an observed point $\hat{p}\in M$; the task of statistical inference is then to find $q_{0}\in S$ which best approximates the point $\hat{p}$. Information geometry [1, 2] provides a clear geometric understanding of the maximum likelihood estimate (MLE): the MLE assigns to $\hat{p}$ the point $q_{0}\in S$ which attains the minimum of $\mathcal{D}(\cdot,\hat{p}):S\to\mathbb{R}$, and in particular, $\hat{p}$ is projected to $q_{0}$ along an $m$-geodesic ($m$-curve) orthogonal to $S$ at $q_{0}$. We have shown that this assertion is valid even when $M$ admits a locus $\Sigma$ where the Fisher-Rao metric is degenerate (Theorem 4.8); see Fig. 5.

Figure 5. A conceptual figure for statistical inference.
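The coincidence of the Bregman and Kullback-Leibler divergences can be checked numerically in the smallest exponential family. The sketch below is only an illustration under our own assumptions: it uses the Bernoulli family in its natural parameter $\theta$ with log-partition function $\psi(\theta)=\log(1+e^{\theta})$, and the function names are hypothetical. With these choices the Bregman divergence of $\psi$ based at $\theta_{q}$ and evaluated at $\theta_{p}$ should equal $\mathcal{D}_{KL}(q,p)$; the argument order used by the paper's $\mathcal{D}$ may differ from the one chosen here.

```python
import math

def log_partition(theta):
    # psi(theta) = log(1 + e^theta) for the Bernoulli family (natural parameter theta).
    return math.log1p(math.exp(theta))

def mean_parameter(theta):
    # eta = psi'(theta) = P(u = 1).
    return 1.0 / (1.0 + math.exp(-theta))

def bregman(theta_p, theta_q):
    # Bregman divergence of psi with base point theta_q, evaluated at theta_p.
    return (log_partition(theta_p) - log_partition(theta_q)
            - mean_parameter(theta_q) * (theta_p - theta_q))

def kl(theta_q, theta_p):
    # D_KL(q, p) = sum over u in {0, 1} of q(u) log(q(u) / p(u)).
    q1, p1 = mean_parameter(theta_q), mean_parameter(theta_p)
    return q1 * math.log(q1 / p1) + (1.0 - q1) * math.log((1.0 - q1) / (1.0 - p1))

theta_p, theta_q = 0.7, -1.3
print(bregman(theta_p, theta_q), kl(theta_q, theta_p))
```

The two printed numbers should agree up to rounding, and swapping the roles of $p$ and $q$ gives a different value, which reflects the asymmetry of the divergence.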

If $\hat{p}$ is sufficiently close to $S$ and far from $\Sigma$, then the asymptotic theory of estimation applies. In practice, however, we may not be able to tell whether this is the case for $\hat{p}$. For instance, it often happens that the MLE has multiple local minima, i.e., the maximum likelihood equation may have multiple roots. Then, as $\hat{p}$ varies when the data are renewed, catastrophe phenomena (the birth and death of minimum/maximum points) can happen. The ambiguity of root selection in MLE has been studied by practical and numerical approaches (cf. [26, §4]), while there seem to be fewer theoretical approaches so far. Our framework offers a natural approach from information geometry. Define

\[ F:S\times M\to\mathbb{R},\qquad F(q,p):=\mathcal{D}(\iota(q),p), \]

and we may consider $F$ as a global generating family [4, p.323], i.e., it defines a Legendre submanifold of $T^{*}M\times\mathbb{R}$ by

\[ L_{S}:=\left\{\;(p,\eta,z)\;\middle|\;\exists\,q\in S,\;\frac{\partial F}{\partial q}(q,p)=0,\;\eta=\frac{\partial F}{\partial p}(q,p),\;z=F(q,p)\right\} \]

where we roughly denote by $\frac{\partial F}{\partial q}$ the differential with respect to $S$, and so on. This gives a typical example of a quasi-Hessian manifold. The critical value set of the Lagrange map $\pi:L_{S}\to M$, $(p,\eta,z)\mapsto p$, is nothing but the envelope of the family of all $m$-curves on $M$ which are orthogonal to $S$; we call it the $m$-caustics determined by $S$. If $S$ is not $\nabla$-flat, the $m$-caustics usually appear (this reflects the $\nabla^{*}$-extrinsic geometry of $S$ in $M$). It turns out that the catastrophe phenomenon mentioned above arises when the data manifold $D$ intersects the $m$-caustics determined by $S$. Conversely, for a given data manifold $D$, we may consider the restriction of $\mathcal{D}$ to $M\times D$ and define the $e$-caustics determined by $D$ similarly. The interaction between these two $e/m$-caustics can be involved and may affect the performance of the EM-algorithm (cf. Amari [2, Chap. 8]). Note that, in principle, the above strategy may be adapted to any divergence and any statistical model. The details will be discussed elsewhere.
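The birth and death of critical points across a caustic can be seen in a toy model. The following sketch is not the paper's construction; it assumes, for illustration only, the self-dual flat case $M=\mathbb{R}^{2}$ with $\mathcal{D}(x,p)=\frac{1}{2}|x-p|^{2}$ and the hypothetical model $S=\{(t,t^{2})\}$, so that $m$-curves are straight lines and the caustic is the classical evolute $27a^{2}=2(2b-1)^{3}$ of the parabola. For an observed point $p=(a,b)$, the critical points of $F(t,p)$ are the real roots of $2t^{3}+(1-2b)t-a=0$, and the code counts them via the discriminant of this cubic.

```python
def critical_point_count(a, b):
    """Count distinct real critical points t of F(t) = ((t - a)**2 + (t**2 - b)**2) / 2."""
    # dF/dt = 2 t^3 + (1 - 2 b) t - a; dividing by 2 gives the form t^3 + P t + Q.
    P = (1.0 - 2.0 * b) / 2.0
    Q = -a / 2.0
    disc = -4.0 * P ** 3 - 27.0 * Q ** 2
    if disc > 0:
        return 3          # inside the caustic: two local minima and one local maximum
    if disc < 0:
        return 1          # outside the caustic: a unique critical point
    # On the caustic itself: a double root plus a simple one, or a triple root at the cusp.
    return 2 if P != 0 else 1

# Move the observed point upward along the vertical line a = 0.5 (values chosen for illustration).
for b in (1.0, 1.25, 2.0):
    print(b, critical_point_count(0.5, b))
```

Along the line $a=0.5$ the count should jump as $b$ crosses $1.25$, which is where this line meets the evolute in the toy model: on one side the projection onto $S$ is unique, on the other there are competing local minima, which is the root-selection ambiguity discussed above.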

As described in Amari [2, Chap. 11], a class of learning machines is also based on the Bregman divergence $\mathcal{D}_{\phi}$ of convex functions $\phi$. Now, as an attempt, suppose that $\phi$ is a nonconvex function (possibly with inflection points), and read $\mathcal{D}_{\phi}$ as the corresponding canonical divergence in our sense (see §4.2). We would like to point out that the same proofs as in the convex case often work and yield slightly weaker results for such general $\phi$. An easy example is Theorem 11.1 of [2], which now reads: “the $k$-mean $\eta_{C}:=\frac{1}{k}\sum x_{i}$ of a cluster $C=\{x_{i}\}_{i=1}^{k}$ in $\mathbb{R}^{n}$ is always a critical point of $\mathcal{D}_{\phi}(C,-):=\frac{1}{k}\sum\mathcal{D}_{\phi}(x_{i},-)$, and all other critical points are obtained from $\eta_{C}$ and $\ker\nabla^{2}\phi$”. We expect similar results for some other optimization algorithms. On the other hand, almost all statistical learning machines allow Fisher-Rao matrices to be degenerate [14, 28]. In particular, as in [2, Chap. 12], most deep learning machines use Gaussian noise with a fixed (co)variance for regression; then the parameter space $M$ becomes a self-dual Riemannian manifold $(h,\nabla=\nabla^{*})$ off the degeneracy locus $\Sigma$ of $h$, which has many components. We seek another scheme for measuring errors which is compatible with our singular model.
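The statement about the $k$-mean can be illustrated in one variable. The sketch below rests on assumptions of ours: the hypothetical nonconvex function $\phi(y)=y^{3}$, the Bregman formula $\mathcal{D}_{\phi}(x,y)=\phi(x)-\phi(y)-\phi'(y)(x-y)$ used verbatim as the divergence, and made-up cluster values. Differentiating in $y$ gives $-\phi''(y)(\eta_{C}-y)$ for the averaged objective, so critical points are expected at the mean $\eta_{C}$ and on the zero set of $\phi''$, here $y=0$.

```python
def phi_second(y):
    # Hypothetical nonconvex phi(y) = y**3 (assumption): phi''(y) = 6 y vanishes at y = 0.
    return 6.0 * y

def bregman(x, y):
    # D_phi(x, y) = phi(x) - phi(y) - phi'(y) * (x - y) with phi(y) = y**3.
    return x ** 3 - y ** 3 - 3.0 * y ** 2 * (x - y)

def cluster_objective(cluster, y):
    # Average divergence from the cluster points to a candidate centre y.
    return sum(bregman(x, y) for x in cluster) / len(cluster)

def objective_gradient(cluster, y):
    # Closed form of d/dy cluster_objective: -phi''(y) * (mean - y).
    mean = sum(cluster) / len(cluster)
    return -phi_second(y) * (mean - y)

cluster = [1.0, 2.0, 6.0]   # illustrative data with mean 3.0
for y in (0.0, 1.0, 3.0):
    print(y, objective_gradient(cluster, y))
```

In this example the gradient should vanish at $y=3$ (the mean) and at $y=0$ (where $\phi''$ degenerates) but not at $y=1$, matching the weaker form of the theorem quoted above.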

5.3. Conclusion

In the present paper, we have proposed an information geometry for singular models from the viewpoint of contact geometry and singularity theory. We have introduced quasi-Hessian manifolds, which extend the notion of dually flat manifolds of Amari-Nagaoka so that the Hessian metric can be degenerate, but the canonical cubic tensor is consistently defined on the entire space. Most notable is that the extended Pythagorean theorem and projection theorem are valid even in this singular setup.

There are several further directions, as mentioned above. We end by adding a few more comments. First, there is an ongoing project of the first author on the local classification of singularities of $em$-wavefronts in flat affine coordinates, which extends an old work of Ekeland [12] in nonconvex optimization and leads to the affine differential geometry of wavefronts (cf. [24]). Secondly, since a quasi-Hessian manifold is embedded in some contact manifold, we may think of the Hamilton-Jacobi method for the time evolution of quasi-Hessian manifolds (wavefront propagation) and semi-classical quantization (WKB analysis) in our framework (cf. [3]). Finally, it would be valuable to find connections with preceding excellent works on singular statistical models [2, 14, 28]; in particular, we hope that the theory of singular Legendre varieties and Legendre currents will build a bridge between the differential geometric method [1, 2] and the algebro-geometric method [28].

References

  • [1] S. Amari, H. Nagaoka, Methods of Information Geometry, A.M.S., Oxford Univ. Press (2000).
  • [2] S. Amari, Information Geometry and Its Applications, Applied Math. Sci., 194, Springer (2016).
  • [3] V.I. Arnol’d, Mathematical Methods of Classical Mechanics, 2nd Edition, Grad. Texts Math. 60, Springer-Verlag (1989).
  • [4] V.I. Arnol’d, S.M. Gusein-Zade, A.N. Varchenko, Singularities of Differentiable Maps I, Monographs in Math. 82, Birkhäuser (1986).
  • [5] V.I. Arnol’d, Singularities of Caustics and Wave Fronts, Kluwer Acad. Publ. (1990).
  • [6] V.I. Arnol’d, Catastrophe Theory, 3rd Edition, Springer-Verlag (1992).
  • [7] S. Benayadi and M. Boucetta, On para-Kähler Lie algebroids and contravariant pseudo-Hessian structures, Math. Nachrichten 292 (2019), 1418–1443.
  • [8] N. Chentsov, Statistical decision rules and optimal inference, Translation of Math. Monograph 53, AMS, Providence (1982).
  • [9] N. Combe and Y.I. Manin, F-manifolds and geometry of information, preprint (2020), ArXiv:2004.08808.
  • [10] B. Dubrovin, Geometry of 2D topological field theories, Springer Lect. Note Series 1620 (1996), 120–348.
  • [11] J. Duistermaat, On Hessian Riemannian Structures, Asian J. Math. 5 (2001), 79–91.
  • [12] I. Ekeland, Legendre duality in nonconvex optimization and calculus of variations, SIAM J. Control and Optimization, 15 (6) (1977), 905–934.
  • [13] S. Eguchi, Geometry of minimum contrast, Hiroshima Math. J., 22 (1992), 631–647.
  • [14] K. Fukumizu and S. Kuriki, Statistics of Singular Models, Frontier Stat. Sci. 7, Iwanami Publ. (2004) (in Japanese).
  • [15] C. Hertling, Frobenius Manifolds and Moduli Spaces for Singularities, Cambridge Univ. Press (2002).
  • [16] G. Ishikawa, Parametrization of a singular Lagrangian variety, Trans. Amer. Math. Soc. 331 (1992), 787-798.
  • [17] S. Izumiya and G. Ishikawa, Applied Singularity Theory (in Japanese), Kyoritsu Shuppan, Co. Ltd. (1998)
  • [18] S. Izumiya, M.C. Romero-Fuster, M.A.S. Ruas, F. Tari, Differential Geometry from a Singularity Theory Viewpoint, World Scientific (2015).
  • [19] T. Matsumoto, Any statistical manifold has a contrast function – On the $C^{3}$-functions taking the minimum at the diagonal of the product manifold, Hiroshima Math. J. 23 (1993), 327–332.
  • [20] H. Matsuzoe, Statistical manifolds and affine differential geometry, Advanced Stud. Pure Math. 57 (2010), 303–321.
  • [21] T. Poston, I. Stewart, Catastrophe Theory and Its Applications, Pitman Publ. Ltd. (1978).
  • [22] H. Sano, Y. Kabata, J.L. Deolindo-Silva, T. Ohmoto, Classification of jets of surfaces in projective 3-space via central projection, Bull. Brazilian Math. Soc., New Series 48 (2017), 623–639.
  • [23] H. Kito, On Hessian structures on the Euclidean space and the hyperbolic space, Osaka J. Math. 36 (1999), 51–62.
  • [24] K. Saji, M. Umehara, K. Yamada, The geometry of fronts, Annals of Math., 169 (2009), 491–529.
  • [25] H. Shima, The geometry of Hessian Structures, World Scientific (2007).
  • [26] C. Small and W. Jinfang, Numerical Methods for Nonlinear Estimating Equations, Oxford Univ. Press (2003).
  • [27] B. Totaro, The curvature of a Hessian metric, Internat. J. Math. 15 (2004), 369–391.
  • [28] S. Watanabe, Algebraic Geometry and Statistical Learning Theory, Cambridge Univ. Press (2008).