The dually flat structure for singular models

Naomichi Nakajima M2, Graduate School of Information Science and Technology, Hokkaido University, Sapporo 060-0814, Japan [email protected] and Toru Ohmoto Department of Mathematics, Faculty of Science, Global Station of Bigdata and cybersecurity (GiCORE-GSB), Hokkaido University, Sapporo 060-0810, Japan [email protected]

Abstract.

The dually flat structure introduced by Amari-Nagaoka plays a central role in information geometry and related fields. In practical applications, however, the underlying pseudo-Riemannian metric is often degenerate, so that this geometric structure is rarely defined on the entire space. To remedy this, in the present paper we propose a novel generalization of the dually flat structure for a certain class of singular models from the viewpoint of Lagrange and Legendre singularity theory: we introduce a quasi-Hessian manifold endowed with a possibly degenerate metric and a particular symmetric cubic tensor, which goes beyond the concept of statistical manifolds and is adapted to the theory of (weak) contrast functions. In particular, we establish Amari-Nagaoka’s extended Pythagorean theorem and projection theorem in this general setup, and consequently, most applications of these theorems are suitably justified even for such singular cases. This work is motivated by various interests with different backgrounds, from Frobenius structures in mathematical physics to deep learning in data science.

Key words and phrases:

Dually flat structure, canonical divergence, Hessian geometry, Legendre duality, wavefronts, caustics, singularity theory.

1. Introduction

The dually flat structure plays a central role in information geometry: it brings a unified geometric insight to various fields such as statistical science, convex optimization, (quantum) information theory, and so on (Amari-Nagaoka [1], Amari [2], Chentsov [8]). It is also essentially the same as the Hessian structure in affine differential geometry (Shima [25]). On a $C^{\infty}$ -manifold $M$ , a dually flat structure is a triplet $(h,\nabla,\nabla^{*})$ where $h$ is a pseudo-Riemannian metric (i.e., non-degenerate symmetric $(0,2)$ -tensor) and $\nabla$ and $\nabla^{*}$ are flat affine connections on $M$ satisfying certain properties; the most distinctive feature is that the metric is locally given by the Hessian matrix of some potential function in $\nabla$ -affine coordinates. In practical applications, however, the Hessian matrix may often be degenerate along some locus $\Sigma$ of points in $M$ , and then, strictly speaking, the differential geometric method cannot be directly applied. Roughly speaking, we call such a space a singular model. In the present paper, we propose a novel generalization of the dually flat structure for a certain class of singular models from the viewpoint of contact geometry and singularity theory. This provides a new framework for general hierarchical structures: we introduce a quasi-Hessian manifold $M$ endowed with a possibly degenerate quadratic tensor $h$ and a particular symmetric cubic tensor $C$ , which goes beyond the concept of statistical manifolds and fits well with the theory of contrast functions due to Eguchi [13]. In fact, such $M$ naturally possesses a canonical divergence $\mathcal{D}:M\times M\to\mathbb{R}$ , which is a weak contrast function compatible with $h$ and $C$ (Theorem 4.10). The key is the Legendre duality, which does exist even in the presence of the degeneracy locus $\Sigma$ of $h$ . 
Even though neither a metric $h$ nor a connection $\nabla$ is available (!), we naturally generalize the Amari-Chentsov cubic tensor $\nabla h$ to a symmetric tensor $C$ defined on the entire space $M$ (this is possible even in the case that $M=\Sigma$ ), and in particular we establish Amari-Nagaoka’s extended Pythagorean theorem and projection theorem in this setup (Theorems 4.7, 4.8). Consequently, in principle, most applications of these theorems are suitably justified even for such degenerate cases.

As the first observation, we see that if the Hessian of a potential is degenerate, the graph of the dual potential (i.e., the Legendre transform of the potential) is no longer a submanifold but a wavefront having singularities branched along its caustics. More generally, our quasi-Hessian manifold $M$ is locally accompanied by two kinds of wavefronts, later called the $e/m$ -wavefronts, as an alternative to the pair of a convex potential and its dual. To grasp the point, it would be helpful to refer to Fig.1 and Fig.2 in §3.1 in advance. Those two wavefronts are mutually tied by the Legendre duality in a strict sense, and they also have ‘height functions’ (i.e., projections to the $z$ and $z^{\prime}$ -axes, respectively), with which we generalize the Bregman divergence to our divergence $\mathcal{D}$ . Further, we associate the pair of coherent tangent bundles

(E,\nabla^{E},\Phi:TM\to E)\;\;\mbox{and}\;\;(E^{\prime},\nabla^{E^{\prime}},\Phi^{\prime}:TM\to E^{\prime}),

where each of $E$ and $E^{\prime}$ is a vector bundle on $M$ that is an alternative to the tangent bundle $TM$ and equipped with a flat connection and a bundle map from $TM$ measuring the degeneration of $h$ , and $E$ and $E^{\prime}$ are dual to each other. Intuitively, the coherent tangent bundles come from the splitting of the standard contact structure into the directions of the $e/m$ -wavefront projections (see §2.1, §3.1 and §3.2). The role of $\nabla$ and $\nabla^{*}$ on a dually flat space is undertaken by $\nabla^{E}$ and $\nabla^{E^{\prime}}$ of those vector bundles on our singular model $M$ , and then the new cubic tensor $C$ of $TM$ is defined by using these connections through $\Phi$ and $\Phi^{\prime}$ (§3.4). Originally, the notion of coherent tangent bundles has been introduced for studying Riemannian geometry of singular wavefronts by Saji-Umehara-Yamada [24], and here we borrow an affine geometry version. In the case that $h$ is non-degenerate ( $E=TM$ , $E^{\prime}=T^{*}M$ ), the dually flat structure in the original form is naturally recovered.

Singularities of caustics and wavefronts have been thoroughly investigated in Lagrange and Legendre singularity theory (initiated by Arnol’d, Zakalyukin and Hörmander) in relation to a broad range of subjects such as classical mechanics, thermodynamics, geometric optics, Fourier integral operators, control theory, catastrophe theory and so on [3, 4, 5, 6, 17, 18, 21]. We bring several techniques and concepts from this theory into information geometry, which may suggest new directions in both theory and application. In fact, the present paper is motivated by various interests from different backgrounds:

-

A typical example of quasi-Hessian manifolds is a general affine hypersurface $M$ in $\mathbb{R}^{n+1}$ . It possesses a mixed geometry with changing metric type, which goes back to Darboux and others dealing with the rich geometry of the parabolic curve $\Sigma$ (the curve of inflection points) separating elliptic and hyperbolic domains on a surface in $\mathbb{R}^{3}$ [6, 5, 22]. It is also related to nonconvex optimization and variational problems [12].
-

Any Lagrange submanifold of $\mathbb{R}^{2n}$ is a quasi-Hessian manifold. If it is flat, then the cubic tensor $C$ satisfies the so-called WDVV equation, which mainly appears in topological field theory, that yields a version of Frobenius manifold-like structure [23, 27, 15]. That is also related to geometry of Poisson manifolds and paraKähler structure [7, 9].
-

In statistical inference, any curved exponential family produces a quasi-Hessian manifold, which represents the $\nabla^{*}$ -extrinsic geometry in the ambient family. For instance, it is useful for studying catastrophe phenomena in root selections of the maximum likelihood equation [26]. Almost all statistical learning machines, including deep neural networks, allow degeneration of Fisher-Rao matrices [2, 14, 28], for which we are seeking a new approach.

In the present paper, we focus only on basic ideas and write them plainly, in as self-contained a manner as possible: most arguments are elementary and checked by direct computations, and we will NOT enter into any detail of singularity theory here. Therefore, this paper should be readable for readers from various backgrounds. Nevertheless, we believe that this paper contains some new observations in this field. The rest of this paper is organized as follows. In §2 we give a brief summary of some basics in contact geometry and the dually flat structure. In §3, after reviewing the definition of Lagrange and Legendre singularities, we introduce quasi-Hessian manifolds. In §4, the associated canonical divergence is discussed; the extended Pythagorean theorem and projection theorem are presented in our setting, and we also give a relation with contrast functions. In §5, we pick up some possible applications and open questions.

Throughout, bold letters denote column vectors, e.g., $\mbox{\boldmath$x$}=(x_{1},\cdots,x_{n})^{T}$ , and the prime in $\mbox{\boldmath$x$}^{\prime}$ simply distinguishes it from $\mbox{\boldmath$x$}$ (it does not denote any operation such as differentiation or transposition). Also we let $\frac{\partial f}{\partial\mbox{\boldmath$x$}}$ denote $(\frac{\partial f}{\partial x_{1}},\cdots,\frac{\partial f}{\partial x_{n}})^{T}$ for short, as usual. We assume that manifolds and maps are of class $C^{\infty}$ , for simplicity.

The authors are partly supported by GiCORE-GSB, Hokkaido University, and JSPS KAKENHI Grant Numbers JP17H06128 and JP18K18714.

2. Dually Flat Structure

2.1. Contact geometry and Legendre duality

To begin with, we summarize a minimal set of basic knowledge in contact geometry which will be used throughout this paper. As best references, we recommend Chap.18-22 of Arnol’d et al [4], Appendix of Arnol’d [3] and Izumiya-Ishikawa [17].

A contact manifold is a $(2n+1)$ -dimensional manifold $N$ endowed with a maximally non-integrable hyperplane field $\xi_{p}\subset T_{p}N\;(p\in N)$ , i.e., $\xi$ is locally expressed by the kernel of a $1$ -form $\theta$ satisfying the non-degeneracy condition $\theta\wedge(d\theta)^{n}\not=0$ . This field $\xi$ is called a contact structure on $N$ . The most important example is the standard contact space $\mathbb{R}^{2n+1}$ ; it is the $1$ -jet space of functions on the affine $n$ -space

\mathbb{R}^{2n+1}=J^{1}(\mathbb{R}^{n},\mathbb{R})=T^{*}\mathbb{R}^{n}\times\mathbb{R},

where the $1$ -jet of a function $f$ at a point $a$ means the Taylor coefficients at $a$ of order $\leq 1$ , i.e., $(df(\mbox{\boldmath$a$}),f(\mbox{\boldmath$a$}))\in T_{\mbox{\tiny$\mbox{\boldmath$a$}$}}^{*}\mathbb{R}^{n}\times\mathbb{R}$ . The contact structure is given by the $1$ -form

\theta=dz-\mbox{\boldmath$p$}^{T}d\mbox{\boldmath$x$}=dz-\sum_{i=1}^{n}p_{i}dx_{i},

called the standard contact form, where $z$ is the last coordinate, and $x$ and $p$ denote, respectively, the base and the fiber coordinates of the cotangent bundle $T^{*}\mathbb{R}^{n}$ (we always write coordinates in this order). We often write $\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$x$}$}}$ and $\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$p$}$}}$ in order to distinguish them. Note that the standard contact structure relies on the affine structure of the base space, not on the choice of coordinates $x$ .

The famous Darboux theorem states that the contact structure is locally unique; namely, for any contact manifold $N$ , we can always find a system of local coordinates around any point $p\in N$ in which the contact structure is represented by the standard one.

A Legendre submanifold $L$ of a $(2n+1)$ -dimensional contact manifold $(N,\xi)$ is an $n$ -dimensional integral manifold of the field $\xi$ , i.e., $T_{p}L\subset\xi_{p}$ for every $p\in L$ . It is easy to see that in the standard $\mathbb{R}^{2n+1}$ , the graph of $(df,f)$ of a function $z=f(\mbox{\boldmath$x$})$

(1)

L=\left\{\,(\mbox{\boldmath$x$},\mbox{\boldmath$p$},z)\in\mathbb{R}^{2n+1}\,\bigg{|}\,\mbox{\boldmath$p$}=\frac{\partial f}{\partial\mbox{\boldmath$x$}},\;z=f(\mbox{\boldmath$x$})\,\right\}

is a Legendre submanifold, and conversely, every Legendre submanifold which is diffeomorphically projected to $\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$x$}$}}$ is expressed in this form (1).
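The first claim can be checked in one line. The following worked computation is only a sketch: it restricts the standard contact form $\theta$ to the graph (1), using $x$ as coordinates on $L$ .

```latex
\theta|_{L}
  = d\bigl(f(\mbox{\boldmath$x$})\bigr)
    -\sum_{i=1}^{n}\frac{\partial f}{\partial x_{i}}\,dx_{i}
  = \sum_{i=1}^{n}\frac{\partial f}{\partial x_{i}}\,dx_{i}
    -\sum_{i=1}^{n}\frac{\partial f}{\partial x_{i}}\,dx_{i}
  = 0 .
```

Hence every tangent vector of $L$ lies in $\xi=\ker\theta$ , and $\dim L=n$ because $L$ is parametrized by $x$ .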

The standard symplectic structure of $T^{*}\mathbb{R}^{n}$ is defined by the non-degenerate closed $2$ -form

\omega=\sum_{i=1}^{n}dx_{i}\wedge dp_{i}.

A Lagrange submanifold is an $n$ -dimensional submanifold over which $\omega$ vanishes. A typical example is the graph of $df$ , i.e., the image of $L$ of the form (1) via projection along the $z$ -axis. Any Lagrange submanifold of $T^{*}\mathbb{R}^{n}$ is always locally liftable to a Legendre submanifold of $T^{*}\mathbb{R}^{n}\times\mathbb{R}$ uniquely up to a transition parallel to the $z$ -axis. If it is entirely liftable, then we call it an exact Lagrange submanifold.
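That the graph of $df$ is Lagrangian amounts to the symmetry of the Hessian matrix of $f$ . A short worked computation, given here as an illustrative sketch with $x$ as coordinates on the graph:

```latex
\omega|_{\mathrm{graph}(df)}
  = \sum_{i=1}^{n} dx_{i}\wedge d\Bigl(\frac{\partial f}{\partial x_{i}}\Bigr)
  = \sum_{i,j=1}^{n}\frac{\partial^{2}f}{\partial x_{i}\partial x_{j}}\,
    dx_{i}\wedge dx_{j}
  = 0 ,
```

since the coefficients are symmetric in $(i,j)$ while $dx_{i}\wedge dx_{j}$ is antisymmetric.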

A Legendre fibration $\pi:N\to B$ is a fiber bundle whose total space is a $(2n+1)$ -dimensional contact manifold $N$ , the base is an $(n+1)$ -dimensional manifold $B$ and every fiber $\pi^{-1}(x)$ is Legendrian. The most typical example is the projection from the standard space

\pi:\mathbb{R}^{2n+1}\to\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$x$}$}}\times\mathbb{R},\;\;(\mbox{\boldmath$x$},\mbox{\boldmath$p$},z)\mapsto(\mbox{\boldmath$x$},z).

Every Legendre fibration is locally described in this typical form with suitable local coordinates.

The Legendre duality is described as follows. Consider the transformation $\mathcal{L}:\mathbb{R}^{2n+1}\to\mathbb{R}^{2n+1}$ given by

(\mbox{\boldmath$x$}^{\prime},\mbox{\boldmath$p$}^{\prime},z^{\prime})=\mathcal{L}(\mbox{\boldmath$x$},\mbox{\boldmath$p$},z):=(\mbox{\boldmath$p$},\mbox{\boldmath$x$},\mbox{\boldmath$p$}^{T}\mbox{\boldmath$x$}-z).

It is a contactomorphism, i.e., $\mathcal{L}$ preserves the contact hyperplane fields $\xi$ ; indeed, $\mathcal{L}^{*}\theta=\mathcal{L}^{*}(dz^{\prime}-\mbox{\boldmath$p$}^{\prime T}d\mbox{\boldmath$x$}^{\prime})=-\theta$ . Put

\pi^{\prime}:=\pi\circ\mathcal{L}:\mathbb{R}^{2n+1}\to\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$p$}$}}\times\mathbb{R},\;(\mbox{\boldmath$x$},\mbox{\boldmath$p$},z)\mapsto(\mbox{\boldmath$p$},\mbox{\boldmath$p$}^{T}\mbox{\boldmath$x$}-z),

which is also a Legendre fibration. Then, the double fibration structure of the standard contact space is defined as the following diagram:

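(dL)

\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$x$}$}}\times\mathbb{R}\;\xleftarrow{\;\;\pi\;\;}\;\mathbb{R}^{2n+1}\;\xrightarrow{\;\;\pi^{\prime}\;\;}\;\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$p$}$}}\times\mathbb{R}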
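For the reader's convenience, the identity $\mathcal{L}^{*}\theta=-\theta$ used above can be verified directly from the definition of $\mathcal{L}$ ; the following is a worked sketch of that computation.

```latex
\mathcal{L}^{*}\theta
  = d(\mbox{\boldmath$p$}^{T}\mbox{\boldmath$x$}-z)
    -\mbox{\boldmath$x$}^{T}d\mbox{\boldmath$p$}
  = \mbox{\boldmath$p$}^{T}d\mbox{\boldmath$x$}
    +\mbox{\boldmath$x$}^{T}d\mbox{\boldmath$p$}
    -dz-\mbox{\boldmath$x$}^{T}d\mbox{\boldmath$p$}
  = -(dz-\mbox{\boldmath$p$}^{T}d\mbox{\boldmath$x$})
  = -\theta .
```

Since $\ker(-\theta)=\ker\theta$ , the hyperplane field $\xi$ is preserved. Note also that $\mathcal{L}$ is an involution: $\mathcal{L}(\mbox{\boldmath$p$},\mbox{\boldmath$x$},\mbox{\boldmath$p$}^{T}\mbox{\boldmath$x$}-z)=(\mbox{\boldmath$x$},\mbox{\boldmath$p$},z)$ .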

Let $\Pi:\mathbb{R}^{n}\times\mathbb{R}\to\mathbb{R}^{n}$ be the projection to the first factor and put

\pi_{1}=\Pi\circ\pi:\mathbb{R}^{2n+1}\to\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$x$}$}},\quad\pi^{\prime}_{1}=\Pi\circ\pi^{\prime}:\mathbb{R}^{2n+1}\to\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$p$}$}}.

Let $L$ be a Legendre submanifold of $\mathbb{R}^{2n+1}$ . In this paper, we call $L$ a regular model if $L$ is diffeomorphic to some open subsets $U\subset\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$x$}$}}$ and $V\subset\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$p$}$}}$ via projections $\pi_{1}$ and $\pi^{\prime}_{1}$ , respectively. Equivalently, there exist functions $z=f(\mbox{\boldmath$x$})$ on $U$ and $z^{\prime}=\varphi(\mbox{\boldmath$p$})$ on $V$ such that

-

$L\subset\mathbb{R}^{2n+1}=J^{1}(\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$x$}$}},\mathbb{R})$ is the graph of $(df,f)$ ;
-

$\mathcal{L}(L)\subset\mathbb{R}^{2n+1}=J^{1}(\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$p$}$}},\mathbb{R})$ is the graph of $(d\varphi,\varphi)$ .

Then we have

\mbox{\boldmath$p$}=\frac{\partial f}{\partial\mbox{\boldmath$x$}},\;\;\mbox{\boldmath$x$}=\frac{\partial\varphi}{\partial\mbox{\boldmath$p$}},\quad f(\mbox{\boldmath$x$})+\varphi(\mbox{\boldmath$p$})-\mbox{\boldmath$p$}^{T}\mbox{\boldmath$x$}=0.

The coordinate change is the gradient map $U\to V$ , $\mbox{\boldmath$x$}\mapsto\mbox{\boldmath$p$}=\frac{\partial f}{\partial\mbox{\tiny$\mbox{\boldmath$x$}$}}$ . It is a diffeomorphism, thus the Hessian matrix of $f(\mbox{\boldmath$x$})$ is non-degenerate at every $\mbox{\boldmath$x$}\in U$ . Here, the inverse map $V\to U$ is given by $\mbox{\boldmath$p$}\mapsto\mbox{\boldmath$x$}=\frac{\partial\varphi}{\partial\mbox{\tiny$\mbox{\boldmath$p$}$}}$ , and the Hessian matrix of $\varphi(\mbox{\boldmath$p$})$ is the inverse of that of $f(\mbox{\boldmath$x$})$ at corresponding points. We say that $z^{\prime}=\varphi(\mbox{\boldmath$p$})$ is the Legendre transform of $z=f(\mbox{\boldmath$x$})$ and vice-versa. We call $f$ a potential function and $\varphi$ its dual potential. This correspondence is the Legendre duality. It is very common in, e.g., convex analysis: if $z=f(\mbox{\boldmath$x$})$ is strictly convex, then $z^{\prime}=\varphi(\mbox{\boldmath$p$})$ is also strictly convex (see Remark 2.2).
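As a concrete check of the relations above, here is a minimal sketch for $n=1$ with the potential $f(x)=e^{x}$ , chosen purely for illustration (it does not appear in the paper); its Legendre transform is $\varphi(p)=p\log p-p$ on $p>0$ . All function names are hypothetical.

```python
import math


def f(x):
    """Illustrative potential f(x) = exp(x), strictly convex on R."""
    return math.exp(x)


def grad_f(x):
    """Gradient map x -> p = f'(x)."""
    return math.exp(x)


def phi(p):
    """Dual potential phi(p) = p*log(p) - p, defined for p > 0."""
    return p * math.log(p) - p


def grad_phi(p):
    """Inverse gradient map p -> x = phi'(p) = log(p)."""
    return math.log(p)


def legendre_defect(x):
    """f(x) + phi(p) - p*x at p = f'(x); the duality says this vanishes."""
    p = grad_f(x)
    return f(x) + phi(p) - p * x


def hessian_product(x):
    """f''(x) * phi''(p) at p = f'(x); the Hessians are mutually inverse."""
    p = grad_f(x)
    f_xx = math.exp(x)   # second derivative of exp
    phi_pp = 1.0 / p     # second derivative of p*log(p) - p
    return f_xx * phi_pp


for x in (-1.0, 0.0, 0.5, 2.0):
    print(x, legendre_defect(x), grad_phi(grad_f(x)), hessian_product(x))
```

Up to floating-point rounding, the second column should vanish, the third should reproduce $x$ , and the last should equal $1$ .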

An affine Legendre equivalence, a new terminology introduced in this paper, is defined by an affine transformation $\mathcal{L}_{F}:\mathbb{R}^{2n+1}\to\mathbb{R}^{2n+1}$ of the form

\mathcal{L}_{F}(\mbox{\boldmath$x$},\mbox{\boldmath$p$},z)=(A\mbox{\boldmath$x$}+\mbox{\boldmath$b$},\;A^{\prime}\mbox{\boldmath$p$}+\mbox{\boldmath$b$}^{\prime},\;z+\mbox{\boldmath$c$}^{T}\mbox{\boldmath$x$}+d)

together with affine transformations

F(\mbox{\boldmath$x$},z)=(A\mbox{\boldmath$x$}+\mbox{\boldmath$b$},z+\mbox{\boldmath$c$}^{T}\mbox{\boldmath$x$}+d),

F^{*}(\mbox{\boldmath$p$},z^{\prime})=(A^{\prime}\mbox{\boldmath$p$}+\mbox{\boldmath$b$}^{\prime},z^{\prime}+\mbox{\boldmath$c$}^{\prime T}\mbox{\boldmath$p$}+d^{\prime}),

where $A$ is invertible and

A^{\prime}=(A^{T})^{-1},\;\;\mbox{\boldmath$b$}^{\prime}=A^{\prime}\mbox{\boldmath$c$},\;\;\mbox{\boldmath$b$}=A\mbox{\boldmath$c$}^{\prime},\;\;d^{\prime}=\mbox{\boldmath$b$}^{\prime T}\mbox{\boldmath$b$}-d.

Note that $F$ (or $F^{*}$ ) determines $\mathcal{L}_{F}$ . It is easy to see that $\mathcal{L}_{F}$ preserves the contact form and the double fibrations $(dL)$ , i.e., $\mathcal{L}_{F}^{*}\theta=\theta$ and the corresponding diagram commutes: $\pi\circ\mathcal{L}_{F}=F\circ\pi$ and $\pi^{\prime}\circ\mathcal{L}_{F}=F^{*}\circ\pi^{\prime}$ .

Definition 2.1.

We say that two Legendre submanifolds $L_{1},L_{2}$ of $\mathbb{R}^{2n+1}$ are affine Legendre equivalent if there exists some $\mathcal{L}_{F}$ which identifies $L_{1}$ with $L_{2}$ .
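The compatibility relations among $A,\mbox{\boldmath$b$},\mbox{\boldmath$c$},d$ and their primed counterparts can be sanity-checked numerically. The following is a minimal sketch for $n=2$ in exact rational arithmetic; the particular values of `A`, `b`, `c`, `d` and all helper names are hypothetical choices made for illustration only.

```python
from fractions import Fraction as Fr


def matvec(M, v):
    """Matrix-vector product for nested lists of Fractions."""
    return [sum(m * t for m, t in zip(row, v)) for row in M]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def add(u, v):
    return [a + b for a, b in zip(u, v)]


def transpose(M):
    return [list(col) for col in zip(*M)]


def inverse_2x2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


# Free data (hypothetical values): invertible A, translation b,
# and the affine term c^T x + d added to the height z.
A = [[Fr(2), Fr(1)], [Fr(1), Fr(1)]]
b = [Fr(1), Fr(-2)]
c = [Fr(3), Fr(5)]
d = Fr(7)

# Dependent data, fixed by the relations stated in the text.
A_dual = inverse_2x2(transpose(A))   # A' = (A^T)^{-1}
b_dual = matvec(A_dual, c)           # b' = A' c
c_dual = matvec(inverse_2x2(A), b)   # c' solves b = A c'
d_dual = dot(b_dual, b) - d          # d' = b'^T b - d


def L_F(x, p, z):
    return (add(matvec(A, x), b), add(matvec(A_dual, p), b_dual),
            z + dot(c, x) + d)


def F(x, z):
    return add(matvec(A, x), b), z + dot(c, x) + d


def F_star(p, z_dual):
    return add(matvec(A_dual, p), b_dual), z_dual + dot(c_dual, p) + d_dual


def pi(x, p, z):
    return x, z


def pi_prime(x, p, z):
    return p, dot(p, x) - z


def diagram_commutes(x, p, z):
    """Check pi o L_F = F o pi and pi' o L_F = F* o pi' at one point."""
    left = pi(*L_F(x, p, z)) == F(*pi(x, p, z))
    right = pi_prime(*L_F(x, p, z)) == F_star(*pi_prime(x, p, z))
    return left and right


point = ([Fr(1), Fr(2)], [Fr(3), Fr(-1)], Fr(4))
print(diagram_commutes(*point))
```

Exact fractions are used so that the equalities are tested without rounding error; a check at sample points is of course an illustration, not a proof.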

Remark 2.2.

(Projective duality) The Legendre duality is an affine expression of the projective duality. We denote by $\mathbb{P}^{n+1}\,(:=\mathbb{R}P^{n+1})$ the real projective space of dimension $n+1$ and by $\mathbb{P}^{n+1*}\,(:=\mathbb{R}P^{n+1*})$ the dual projective space. Let $N$ denote the incidence submanifold of $\mathbb{P}^{n+1}\times\mathbb{P}^{n+1*}$ which consists of pairs $(p,\lambda)$ with $p\in\lambda$ , i.e., $N$ is a codimension one submanifold ( $\dim N=2n+1$ ) defined by

p_{0}x_{0}+p_{1}x_{1}+\cdots+p_{n+1}x_{n+1}=0

for $p=[x_{0}:\cdots:x_{n+1}]\in\mathbb{P}^{n+1}$ and $\lambda=[p_{0}:\cdots:p_{n+1}]\in\mathbb{P}^{n+1*}$ . Note that $N$ is naturally identified with the projective cotangent bundle $PT^{*}\mathbb{P}^{n+1}\,(=PT^{*}\mathbb{P}^{n+1*})$ , and thus $N$ becomes a contact manifold [4, §20.1]. Consider the open subset $O_{N}$ of $N$ defined by $x_{n+1}\not=0$ and $p_{0}\not=0$ . We may set $x_{n+1}=p_{0}=-1$ , and put $z=x_{0}$ and $z^{\prime}=p_{n+1}$ , then the above equation is rewritten as

z+z^{\prime}-\mbox{\boldmath$p$}^{T}\mbox{\boldmath$x$}=0.

Clearly, $O_{N}$ has two systems of coordinates, $(\mbox{\boldmath$x$},\mbox{\boldmath$p$},z)$ and $(\mbox{\boldmath$p$},\mbox{\boldmath$x$},z^{\prime})$ , and the coordinate change between them is just the above $\mathcal{L}:\mathbb{R}^{2n+1}\to\mathbb{R}^{2n+1}$ preserving the contact structure of $N$ . In projective geometry, the double Legendre fibration

\mathbb{P}^{n+1}\;\xleftarrow{\;\;\pi\;\;}\;N\;\xrightarrow{\;\;\pi^{\prime}\;\;}\;\mathbb{P}^{n+1*}

expresses the duality principle on points and hyperplanes, where $\pi$ and $\pi^{\prime}$ are projections of the projective cotangent bundles. Restricting this diagram to $O_{N}$ and identifying $O_{N}$ with $\mathbb{R}^{2n+1}$ using coordinates $(\mbox{\boldmath$x$},\mbox{\boldmath$p$},z)$ , we get the diagram $(dL)$ . For instance, in the case $n=1$ , consider a parameterized plane curve

C:(x,z):=(x,f(x))\in\mathbb{R}^{2}\subset\mathbb{P}^{2}.

Then its projective dual is the following curve consisting of the tangent lines:

\textstyle C^{*}:(p,z^{\prime})=(\frac{df}{dx}(x),x\frac{df}{dx}(x)-f(x))\in\mathbb{R}^{2*}\subset\mathbb{P}^{2*}.

If $C$ is convex, then $C^{*}$ is also convex. If $C$ has an inflection point, e.g., $f(x)=\frac{1}{3}x^{3}+\cdots$ , then $C^{*}$ has a cusp at the corresponding point, $(p,z^{\prime})=(x^{2}+\cdots,\frac{2}{3}x^{3}+\cdots)$ , and therefore, $C^{*}$ is locally the graph of a bi-valued function, $z^{\prime}=\pm\frac{2}{3}p^{3/2}+\cdots\,(p\geq 0)$ .
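The cusp can be seen explicitly in a minimal sketch that keeps only the leading term, i.e., assuming $f(x)=\frac{1}{3}x^{3}$ exactly (the higher-order terms are dropped here for illustration). The function names are hypothetical; exact rational arithmetic is used.

```python
from fractions import Fraction as Fr


def dual_point(x):
    """Return (p, z') on the dual curve C* of z = x**3 / 3 at parameter x."""
    f = x**3 / 3
    df = x**2
    return df, x * df - f


def on_semicubical_parabola(p, z_dual):
    """Check 9 z'^2 = 4 p^3, the implicit form of z' = ±(2/3) p^(3/2)."""
    return 9 * z_dual**2 == 4 * p**3


samples = [Fr(k, 4) for k in range(-4, 5)]
all_on_cusp = all(on_semicubical_parabola(*dual_point(x)) for x in samples)

# x and -x give the same p but opposite heights z': the two branches
# of the bi-valued function meeting at the cusp point (p, z') = (0, 0).
branches = [(dual_point(x), dual_point(-x)) for x in samples if x > 0]

print(all_on_cusp)
for upper, lower in branches:
    print(upper, lower)
```

The parameters $x$ and $-x$ land on the two branches over the same $p=x^{2}$ , which merge at the cusp as $x\to 0$ .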

2.2. Dually flat structure

Let $L$ be a regular model with potential function $z=f(\mbox{\boldmath$x$})$ . The Hessian matrix

H(p)=\left[\frac{\partial^{2}f}{\partial x_{i}\partial x_{j}}(\pi_{1}(p))\right]\quad(p\in L)

is invertible, thus it defines a pseudo-Riemannian metric $h$ on $L$ , called the Hessian metric associated to $f$ . Additionally, through the projections $\pi_{1}$ and $\pi^{\prime}_{1}$ , the fixed affine structures of $\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$x$}$}}$ and $\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$p$}$}}$ induce two different flat affine connections $\nabla,\nabla^{*}$ on $L$ , respectively.

Definition 2.3.

([1]) The triplet $(h,\nabla,\nabla^{*})$ is called the dually flat structure on a regular model $L$ .

Note that $(h,\nabla,\nabla^{*})$ is preserved under affine Legendre equivalence; indeed, $\mathcal{L}_{F}$ induces affine transformations of $\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$x$}$}}$ and $\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$p$}$}}$ and simply adds a linear function $\mbox{\boldmath$c$}^{T}\mbox{\boldmath$x$}+d$ to the potential $z=f(\mbox{\boldmath$x$})$ .

The dually flat structure is traditionally introduced in terms of differential geometry in an intrinsic way. We briefly summarize it below; see [1, 2, 20, 25] for details.

A statistical manifold is a pseudo-Riemannian manifold $(M,h)$ equipped with a torsion-free affine connection $\nabla$ being compatible with $h$ , i.e., the cubic tensor $T:=\nabla h$ is totally symmetric:

(\nabla_{X}h)(Y,Z)=(\nabla_{Y}h)(X,Z)

for vector fields $X,Y$ and $Z$ . Equivalently [20, p.306], a statistical manifold may also be defined as a manifold endowed with a pseudo-Riemannian metric $h$ and a totally symmetric $(0,3)$ -tensor $T$ (due to Lauritzen), which is also described within the theory of contrast functions in [13]. The dual connection $\nabla^{*}$ (with respect to $h$ ) is defined by

Xh(Y,Z)=h(\nabla_{X}Y,Z)+h(Y,\nabla_{X}^{*}Z),

and then $\nabla^{*}$ is torsion-free and $\nabla^{*}h$ is also symmetric. Furthermore, if $\nabla$ is flat (i.e., torsion-free and curvature-free), then $\nabla^{*}$ is also flat. Such a statistical manifold with flat connections is called a dually flat manifold [1, 2] or a Hessian manifold [25, 20]. The most notable characteristic of a dually flat manifold is that locally it holds that

h=\nabla df

for some local potential $f$ . In other words, the metric $h$ is expressed by the non-degenerate Hessian matrix of $z=f(\mbox{\boldmath$x$})$ in $\nabla$ -affine local coordinates $x$ . Moreover, the $\nabla^{*}$ -affine coordinates $p$ are then given by $\mbox{\boldmath$p$}=\frac{\partial f}{\partial\mbox{\tiny$\mbox{\boldmath$x$}$}}(\mbox{\boldmath$x$})$ .
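In these coordinates the cubic tensor and the dual connection take a very simple form. The following worked computation is a sketch using only the definitions above, with $\partial_{i}=\frac{\partial}{\partial x_{i}}$ and $\nabla_{\partial_{i}}\partial_{j}=0$ in $\nabla$ -affine coordinates.

```latex
T_{ijk}
  := (\nabla_{\partial_{i}}h)(\partial_{j},\partial_{k})
  = \partial_{i}\,h(\partial_{j},\partial_{k})
  = \frac{\partial^{3}f}{\partial x_{i}\,\partial x_{j}\,\partial x_{k}},
\qquad
h(\partial_{j},\nabla^{*}_{\partial_{i}}\partial_{k})
  = \partial_{i}\,h(\partial_{j},\partial_{k})
    -h(\nabla_{\partial_{i}}\partial_{j},\partial_{k})
  = \frac{\partial^{3}f}{\partial x_{i}\,\partial x_{j}\,\partial x_{k}} .
```

In particular, $T$ is totally symmetric because partial derivatives commute, and the coefficients of $\nabla^{*}$ in the coordinates $x$ are read off from the third derivatives of the potential.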

This local expression of a dually flat manifold $M$ exactly provides a regular model $L$ in $\mathbb{R}^{2n+1}$ , the graph of $1$ -jet of a local potential, equipped with the dually flat structure in the sense of Definition 2.3. Such a regular model $L$ is uniquely determined up to affine Legendre equivalence. To see this precisely, suppose that the metric $h$ is locally expressed by the Hessian matrices $H_{\alpha}$ and $H_{\beta}$ of two potential functions $f_{\alpha}(\mbox{\boldmath$x$}^{\alpha})$ and $f_{\beta}(\mbox{\boldmath$x$}^{\beta})$ in different $\nabla$ -affine local coordinates, respectively. Here, let $(U_{\alpha},\mbox{\boldmath$x$}^{\alpha}=(x^{\alpha}_{1},\cdots,x^{\alpha}_{n})^{T})$ and $(U_{\beta},\mbox{\boldmath$x$}^{\beta}=(x^{\beta}_{1},\cdots,x^{\beta}_{n})^{T})$ denote the charts with $U_{\alpha},U_{\beta}\subset M$ , $U_{\alpha}\cap U_{\beta}\not=\emptyset$ . By definition, there is an affine transformation

\mbox{\boldmath$x$}^{\beta}=\psi(\mbox{\boldmath$x$}^{\alpha})=A\mbox{\boldmath$x$}^{\alpha}+\mbox{\boldmath$b$}.

By the assumption, it holds that $A^{T}H_{\beta}(p)A=H_{\alpha}(p)$ for every $p\in U_{\alpha}\cap U_{\beta}$ , thus any second partial derivatives of the composite function $f_{\beta}\circ\psi(\mbox{\boldmath$x$}^{\alpha})$ coincide with those of $f_{\alpha}(\mbox{\boldmath$x$}^{\alpha})$ . Namely, these two functions are the same up to some linear term:

f_{\beta}\circ\psi(\mbox{\boldmath$x$}^{\alpha})=f_{\alpha}(\mbox{\boldmath$x$}^{\alpha})+\mbox{\boldmath$c$}^{T}\mbox{\boldmath$x$}^{\alpha}+d.

Then the affine transformation

F(\mbox{\boldmath$x$}^{\alpha},z):=(A\mbox{\boldmath$x$}^{\alpha}+\mbox{\boldmath$b$},z+\mbox{\boldmath$c$}^{T}\mbox{\boldmath$x$}^{\alpha}+d)

sends the graph of $z=f_{\alpha}(\mbox{\boldmath$x$}^{\alpha})$ to the graph of $z=f_{\beta}(\mbox{\boldmath$x$}^{\beta})$ . Hence, the corresponding affine Legendre equivalence $\mathcal{L}_{F}$ identifies two regular models, $L_{\alpha}\subset\mathbb{R}^{2n+1}$ defined by $(df_{\alpha},f_{\alpha})$ and $L_{\beta}\subset\mathbb{R}^{2n+1}$ defined by $(df_{\beta},f_{\beta})$ , on the overlap.

Although it is less often noticed, this simple observation says that any dually flat manifold is an affine manifold having an atlas $\{(U_{\alpha},\mbox{\boldmath$x$}^{\alpha})\}_{\alpha\in\Lambda}$ with affine coordinate changes $\psi_{\alpha}^{\beta}\,(\alpha,\beta\in\Lambda)$ so that it is additionally equipped with local potentials $\{f_{\alpha}\}_{\alpha\in\Lambda}$ whose graphs are glued by affine transformations $F_{\alpha}^{\beta}$ of the above form. The affine structure gives the flat connection $\nabla$ , and local potentials restore the Hessian metric $h$ by gluing $\{H_{\alpha}\}_{\alpha\in\Lambda}$ . At the level of Legendre submanifolds given by $1$ -jets of local potentials, notice again that $\mathcal{L}_{F}$ preserves $(h,\nabla,\nabla^{*})$ . Therefore, we may rephrase the above statement in the following way:

Proposition 2.4.

Any dually flat or Hessian manifold is an affine manifold made up by gluing several regular models $L_{\alpha}$ in $\mathbb{R}^{2n+1}$ via affine Legendre equivalence. The metric $h$ and the pair of affine connections $\nabla$ and $\nabla^{*}$ are reconstructed by the dually flat structures of $L_{\alpha}$ in the sense of Definition 2.3.

This gluing construction will be generalized later to introduce our quasi-Hessian manifolds (Definition 3.19 in §3.3).

Remark 2.5.

Since each gluing map acts also on a neighborhood of a regular model in $\mathbb{R}^{2n+1}$ , the gluing construction yields a dually flat manifold as a Legendre submanifold of some ambient contact manifold (also it produces a Lagrange submanifold of some symplectic manifold). Let $(M,h,\nabla,\nabla^{*})$ be a dually flat manifold, and suppose that there exists a global potential $f:M\to\mathbb{R}$ with $h=\nabla df$ . Take local charts $U_{\alpha}$ of $\nabla$ -flat coordinates, then local potentials $f|_{U_{\alpha}}$ define regular models $L_{\alpha}$ in $J^{1}(U_{\alpha},\mathbb{R})=T^{*}U_{\alpha}\times\mathbb{R}$ , and they are glued together by affine Legendre equivalence of the form $\mathcal{L}_{F}$ with $\mbox{\boldmath$c$}=0$ and $d=0$ . Conversely, gluing local models by this special kind of affine Legendre equivalences yields a dually flat manifold with a global potential. As a weaker situation, suppose that there exists a closed $1$ -form $\sigma$ with $h=\nabla\sigma$ ; then $M$ is said to be of Koszul type [25]. This case corresponds to gluing regular models by affine Legendre equivalence of the form $\mathcal{L}_{F}$ with $\mbox{\boldmath$c$}=0$ but possibly $d\not=0$ .

Example 2.6.

(Amari [1, 2]). An exponential family $M$ is a family of probability density functions of the form

p(\mbox{\boldmath$u$}|\theta)=\exp(\mbox{\boldmath$u$}^{T}\theta-\psi(\theta))

where $\mbox{\boldmath$u$}=(u_{1},\cdots,u_{n})\in\mathbb{R}^{n}$ is a random variable (with its measure $d\mu$ ) and $\theta=(\theta_{1},\cdots,\theta_{n})^{T}\in U\subset\mathbb{R}^{n}$ are parameters ( $U$ is an open set). The normalization factor $\psi(\theta)=\log\int\exp(\mbox{\boldmath$u$}^{T}\theta)\,d\mu$ is called the potential of this family. Fix the affine structure of $U$ , and put $\partial_{i}=\frac{\partial}{\partial\theta_{i}}$ . We see that the expectation is the corresponding dual coordinate

\eta_{i}:={\bf E}[u_{i}|\theta]=\partial_{i}\psi(\theta)

and the (co)variance is written as

h_{ij}:={\bf V}[\mbox{\boldmath$u$}|\theta]_{ij}=\partial_{i}\partial_{j}\psi(\theta)={\bf E}\left[(\partial_{i}\log p)(\partial_{j}\log p)\right],

where the last expression is the Fisher-Rao information. If $h=[h_{ij}]$ is positive definite and one regards $\theta,\eta$ as the $\nabla,\nabla^{*}$ -affine coordinates, respectively, then $(M,h,\nabla,\nabla^{*})$ becomes a dually flat manifold. Normal distributions and finite discrete distributions are typical examples.
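To make the formulas of this example concrete, here is a minimal sketch for the simplest finite discrete case: the Bernoulli family with $u\in\{0,1\}$ and counting measure, for which $\psi(\theta)=\log(1+e^{\theta})$ . The choice of family, the finite-difference step sizes and all function names are assumptions made for illustration.

```python
import math


def psi(theta):
    """Potential of the Bernoulli family: log of the normalizing sum."""
    return math.log(1.0 + math.exp(theta))


def density(u, theta):
    """p(u | theta) = exp(u * theta - psi(theta)) for u in {0, 1}."""
    return math.exp(u * theta - psi(theta))


def expectation(theta):
    """Dual coordinate eta = E[u | theta], computed as a finite sum."""
    return sum(u * density(u, theta) for u in (0, 1))


def variance(theta):
    """h = V[u | theta], computed as a finite sum."""
    eta = expectation(theta)
    return sum((u - eta) ** 2 * density(u, theta) for u in (0, 1))


def d_psi(theta, step=1e-5):
    """Central finite-difference approximation of the first derivative."""
    return (psi(theta + step) - psi(theta - step)) / (2 * step)


def d2_psi(theta, step=1e-4):
    """Central finite-difference approximation of the second derivative."""
    return (psi(theta + step) - 2 * psi(theta) + psi(theta - step)) / step**2


theta = 0.5
print(expectation(theta), d_psi(theta))   # eta versus d(psi)/d(theta)
print(variance(theta), d2_psi(theta))     # h versus second derivative of psi
```

The two numbers on each printed line should agree up to the finite-difference error. Since $h=\eta(1-\eta)>0$ for every $\theta$ , this family is a regular model in the sense of §2.1.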

3. Quasi-Hessian structure

Our main idea is to consider not only regular models but also general Legendre submanifolds of $\mathbb{R}^{2n+1}$ . Then the Lagrange-Legendre singularity theory naturally comes into the picture (Arnol’d et al [4], Izumiya-Ishikawa [17]). Nevertheless, in this paper, we only use very basic notions and properties of the theory, which are prepared in §3.1. As another new ingredient, we introduce in §3.2 an affine geometric version of the coherent tangent bundle in Saji-Umehara-Yamada [24]. In §3.3 and §3.4 we define a quasi-Hessian manifold endowed with a particular cubic tensor.

3.1. $e/m$ -wavefronts and $e/m$ -caustics

A Legendre map is the composition

\pi\circ\iota:L\to N\to B

of the inclusion $\iota$ of a Legendre submanifold $L$ and the projection of a Legendre fibration $\pi:N\to B$ . The image is usually called a wavefront; we denote it by $W(L)$ in this paper. The Legendre map $\pi\circ\iota:L\to B$ may have singular points, i.e., points $p$ on $L$ at which the rank of the differential is not maximal (equivalently, $T_{p}L$ is tangent to the fiber of $\pi$ ), called Legendre singularities [4, 17]. In that case the wavefront is no longer a submanifold.

From now on, we consider the diagram $(dL)$ of double Legendre fibrations on $\mathbb{R}^{2n+1}$ and an arbitrary Legendre submanifold $L\subset\mathbb{R}^{2n+1}$ . So we have two Legendre maps

\pi^{e}:=\pi\circ\iota:L\to\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$x$}$}}\times\mathbb{R}_{z},\quad\pi^{m}:=\pi^{\prime}\circ\iota:L\to\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$p$}$}}\times\mathbb{R}_{z^{\prime}}

and call them the $e$ - and $m$ -Legendre maps, respectively, following a traditional notation in information geometry (“ $e$ -” and “ $m$ -” come from words in statistics, i.e., exponential and mixture) [1].

Definition 3.1.

( $e/m$ -wavefronts) We set

W_{e}(L):=\pi^{e}(L)\subset\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$x$}$}}\times\mathbb{R}_{z},\;\;\;W_{m}(L):=\pi^{m}(L)\subset\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$p$}$}}\times\mathbb{R}_{z^{\prime}},

and call them the $e/m$ -wavefronts associated to $L$ , respectively.

The $e/m$ -wavefronts are Legendre dual to each other in the sense of the point-hyperplane duality principle (Remark 2.2).

Usually, the projection of a Lagrange submanifold of $T^{*}\mathbb{R}^{n}$ to the base is called a Lagrange map [4, 17]. So we have the $e/m$ -Lagrange maps

\pi^{e}_{1}=\Pi\circ\pi^{e}:L\to\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$x$}$}},\quad\pi^{m}_{1}=\Pi\circ\pi^{m}:L\to\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$p$}$}}.

It is easy to see that the following two conditions on points $p\in L$ are equivalent:

•

$p$ is a singular point of the Legendre map $\pi^{e}:L\to\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$x$}$}}\times\mathbb{R}_{z}$ ;
•

$p$ is a singular point of the Lagrange map $\pi^{e}_{1}:L\to\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$x$}$}}$ .

Indeed, any $v\in T_{p}L$ satisfies $dz_{p}(v)-\mbox{\boldmath$p$}(p)^{T}d\mbox{\boldmath$x$}_{p}(v)=0$ ; thus, if $d\mbox{\boldmath$x$}_{p}(v)=0$ , then $dz_{p}(v)=0$ , so the kernels of the differentials of $\pi^{e}$ and $\pi^{e}_{1}$ at $p$ coincide.

Definition 3.2.

( $e/m$ -caustics) The $e$ -critical set $C(\pi^{e}_{1})\subset L$ consists of all singular points of the $e$ -Lagrange map $\pi^{e}_{1}:L\to\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$x$}$}}$ , and we call its image $\pi^{e}_{1}C(\pi^{e}_{1})\subset\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$x$}$}}$ the $e$ -caustics associated to $L$ . The $m$ -version is defined in entirely the same way.

Definition 3.3.

We say that $L$ is locally a regular model around $p\in L$ if there is an open neighborhood of $p$ in $L$ which is a regular model of $\mathbb{R}^{2n+1}$ , i.e., $p$ is neither $e$ -critical nor $m$ -critical.

Consider the case where $p\in L$ is not $e$ -critical but is $m$ -critical (see the toy examples in Examples 3.4 and 3.5 below). Then $W_{e}(L)$ is the graph of some local potential function $z=f(\mbox{\boldmath$x$})$ defined near $\pi^{e}_{1}(p)$ . Take $\mbox{\boldmath$x$}$ as local coordinates of $L$ around $p$ . Then the $e$ -Lagrange map is written as the identity map of $\mbox{\boldmath$x$}$ and the $e$ -caustics is empty, while the $m$ -Lagrange map $\pi^{m}_{1}:L\to\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$p$}$}}$ is written as the gradient map $\mbox{\boldmath$p$}=\nabla f(\mbox{\boldmath$x$})$ . This map is critical at $p$ , so $W_{m}(L)$ is singular. In this case we call $L$ a model with degenerate potential. In particular, if $f$ admits inflection points in the strict sense, $W_{m}(L)$ is the graph of a multi-valued function $z^{\prime}=\varphi(\mbox{\boldmath$p$})$ branched along the $m$ -caustics in $\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$p$}$}}$ .

Example 3.4.

( $A_{2}$ -singularity). Let

f(\mbox{\boldmath$x$})=\frac{x_{1}^{3}}{3}+\frac{x_{2}^{2}}{2}.

Then $\mbox{\boldmath$p$}=(p_{1},p_{2})=(x_{1}^{2},x_{2})$ and the degeneracy locus $\Sigma$ of the Hessian $h=\nabla^{2}f$ is defined by $x_{1}=0$ . See the pictures on the left in Fig. 1.

-

The $e$ -wavefront $W_{e}(L)$ is smooth and has parabolic points along $\Sigma$ . There is no $e$ -caustic.

The $m$ -wavefront $W_{m}(L)$ is a singular surface with cuspidal edge; it is the graph of the bi-valued dual potential

z^{\prime}=\mbox{\boldmath$p$}^{T}\mbox{\boldmath$x$}-z=\frac{2}{3}x_{1}^{3}+\frac{1}{2}x_{2}^{2}=\pm\frac{2}{3}p_{1}^{3/2}+\frac{1}{2}p_{2}^{2}

defined on $p_{1}\geq 0$ and branched along the $m$ -caustics $p_{1}=0$ .

This singularity does not appear if the Hessian is non-negative. Note that for every statistical model, the Fisher-Rao metric is non-negative.
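The formulas of Example 3.4 can be checked by hand. The following is a sketch of that computation, using only the potential $f$ and the Legendre transform $z^{\prime}=\mbox{\boldmath$p$}^{T}\mbox{\boldmath$x$}-z$ stated above; the symbol $\varphi_{\pm}$ for the two branches is our own shorthand and does not appear elsewhere in the text.

```latex
\begin{aligned}
\boldsymbol{p} &= \nabla f(\boldsymbol{x}) = (x_1^2,\; x_2), \qquad
\nabla^2 f = \begin{pmatrix} 2x_1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad
\det \nabla^2 f = 2x_1, \\
z' &= \boldsymbol{p}^T\boldsymbol{x} - f(\boldsymbol{x})
    = x_1^3 + x_2^2 - \tfrac{1}{3}x_1^3 - \tfrac{1}{2}x_2^2
    = \tfrac{2}{3}x_1^3 + \tfrac{1}{2}x_2^2, \\
x_1 &= \pm\sqrt{p_1},\;\; x_2 = p_2
\;\;\Longrightarrow\;\;
\varphi_{\pm}(\boldsymbol{p}) = \pm\tfrac{2}{3}\,p_1^{3/2} + \tfrac{1}{2}\,p_2^2
\qquad (p_1 \ge 0).
\end{aligned}
```

The determinant $2x_{1}$ changes sign across $\Sigma=\{x_{1}=0\}$ , so the Hessian is indefinite for $x_{1}<0$ ; the two branches $\varphi_{+}$ and $\varphi_{-}$ correspond to $x_{1}>0$ and $x_{1}<0$ and meet along $p_{1}=0$ .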

Example 3.5.

( $A_{3}$ -singularity). Let

f(\mbox{\boldmath$x$})=\frac{x_{1}^{4}}{4}+\frac{x_{2}^{2}}{2}.

Then $\mbox{\boldmath$p$}=(p_{1},p_{2})=(x_{1}^{3},x_{2})$ and the degeneracy locus $\Sigma$ of the Hessian $h=\nabla^{2}f$ is defined by $x_{1}=0$ . See the pictures on the right in Fig. 1.

-

The $e$ -wavefront $W_{e}(L)$ is smooth and convex. There is no $e$ -caustic.

The $m$ -wavefront $W_{m}(L)$ is a singular surface; it is the graph of the dual potential

z^{\prime}=\mbox{\boldmath$p$}^{T}\mbox{\boldmath$x$}-z=\frac{3}{4}x_{1}^{4}+\frac{1}{2}x_{2}^{2}=\frac{3}{4}p_{1}^{4/3}+\frac{1}{2}p_{2}^{2},

which is defined on the entire space but singular along the $m$ -caustics $p_{1}=0$ .

This is a typical degenerate minimum of a function and also a typical singularity with $\mathbb{Z}_{2}$ -symmetry (cf. [4]).
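As with the previous example, the data of Example 3.5 can be verified directly. The sketch below assumes that $p_{1}^{1/3}$ denotes the real cube root, so that $p_{1}^{4/3}=(p_{1}^{1/3})^{4}$ is defined for all real $p_{1}$ ; the last line is an extra observation of ours that makes the singularity along $p_{1}=0$ explicit.

```latex
\begin{aligned}
\boldsymbol{p} &= \nabla f(\boldsymbol{x}) = (x_1^3,\; x_2), \qquad
\nabla^2 f = \begin{pmatrix} 3x_1^2 & 0 \\ 0 & 1 \end{pmatrix}, \qquad
\det \nabla^2 f = 3x_1^2 \;\ge\; 0, \\
z' &= \boldsymbol{p}^T\boldsymbol{x} - f(\boldsymbol{x})
    = x_1^4 + x_2^2 - \tfrac{1}{4}x_1^4 - \tfrac{1}{2}x_2^2
    = \tfrac{3}{4}x_1^4 + \tfrac{1}{2}x_2^2, \\
x_1 &= p_1^{1/3},\;\; x_2 = p_2
\;\;\Longrightarrow\;\;
\varphi(\boldsymbol{p}) = \tfrac{3}{4}\,p_1^{4/3} + \tfrac{1}{2}\,p_2^2, \\
\frac{\partial \varphi}{\partial p_1} &= p_1^{1/3}, \qquad
\frac{\partial^2 \varphi}{\partial p_1^2} = \tfrac{1}{3}\,p_1^{-2/3}
\;\longrightarrow\; \infty \quad (p_1 \to 0).
\end{aligned}
```

Here the gradient map is a homeomorphism, so the dual potential is single-valued and of class $C^{1}$ , but its second derivative blows up along the $m$ -caustics.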

Figure 1. The $e/m$ -wavefronts and the $e/m$ -caustics (Examples 3.4 and 3.5).

Furthermore, it can happen that $p\in L$ is $e$ -critical and $m$ -critical simultaneously. Then both wavefronts $W_{e}(L)$ and $W_{m}(L)$ become singular at $\pi^{e}(p)$ and $\pi^{m}(p)$ , respectively. In general, by the Implicit Function Theorem, there is a partition $I\sqcup J=\{1,\cdots,n\}$ ( $I\cap J=\emptyset$ ) such that $L$ is locally parametrized around $p$ by coordinates $\mbox{\boldmath$x$}_{I}=(x_{i})^{T}$ and $\mbox{\boldmath$p$}_{J}=(p_{j})^{T}$ ( $i\in I$ , $j\in J$ ). In fact, we can find a function $g(\mbox{\boldmath$x$}_{I},\mbox{\boldmath$p$}_{J})$ such that $L$ near $p$ is expressed by

\mbox{\boldmath$p$}_{I}=\frac{\partial g}{\partial\mbox{\boldmath$x$}_{I}},\;\;\mbox{\boldmath$x$}_{J}=-\frac{\partial g}{\partial\mbox{\boldmath$p$}_{J}},\;\;z=\mbox{\boldmath$p$}_{J}^{T}\mbox{\boldmath$x$}_{J}+g(\mbox{\boldmath$x$}_{I},\mbox{\boldmath$p$}_{J}),

where we write $\frac{\partial g}{\partial\mbox{\tiny$\mbox{\boldmath$x$}$}_{I}}=(\frac{\partial g}{\partial x_{i_{1}}},\cdots)^{T}\;(I=(i_{1},i_{2},\cdots))$ . This follows from the form (1) in §2.1 and the canonical transformation

\mathbb{R}^{2n+1}\to\mathbb{R}^{2n+1}\;\;(\mbox{\boldmath$x$},\mbox{\boldmath$p$},z)\mapsto(\mbox{\boldmath$x$}_{I},\mbox{\boldmath$p$}_{J},\mbox{\boldmath$p$}_{I},-\mbox{\boldmath$x$}_{J},-\mbox{\boldmath$p$}_{J}^{T}\mbox{\boldmath$x$}_{J}+z)

which preserves the contact structure. Usually, $g(\mbox{\boldmath$x$}_{I},\mbox{\boldmath$p$}_{J})$ is called a generating function of $L$ around $p$ [4, §20]. In particular, in case that $J=\emptyset$ (resp. $I=\emptyset$ ), a generating function is a potential $z=f(\mbox{\boldmath$x$})$ (resp. dual potential $z^{\prime}=\varphi(\mbox{\boldmath$p$})$ ). The $e/m$ -Legendre maps are locally expressed as follows.

\begin{array}[]{ll}\pi^{e}:(L,p)\to\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$x$}$}}\times\mathbb{R}_{z},&\displaystyle(\mbox{\boldmath$x$}_{I},\mbox{\boldmath$p$}_{J})\mapsto\left(\mbox{\boldmath$x$}_{I},\;-\frac{\partial g}{\partial\mbox{\boldmath$p$}_{J}},\;-\mbox{\boldmath$p$}_{J}^{T}\frac{\partial g}{\partial\mbox{\boldmath$p$}_{J}}+g(\mbox{\boldmath$x$}_{I},\mbox{\boldmath$p$}_{J})\right),\\ \pi^{m}:(L,p)\to\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$p$}$}}\times\mathbb{R}_{z^{\prime}},&\displaystyle(\mbox{\boldmath$x$}_{I},\mbox{\boldmath$p$}_{J})\mapsto\left(\frac{\partial g}{\partial\mbox{\boldmath$x$}_{I}},\;\mbox{\boldmath$p$}_{J},\;\mbox{\boldmath$x$}_{I}^{T}\frac{\partial g}{\partial\mbox{\boldmath$x$}_{I}}-g(\mbox{\boldmath$x$}_{I},\mbox{\boldmath$p$}_{J})\right).\end{array}

Also, the $e/m$ -Lagrange maps $\pi^{e}_{1},\pi^{m}_{1}$ are obtained by dropping the last coordinate, $z$ and $z^{\prime}$ , respectively.
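For the reader's convenience, here is a short check that the parametrization by a generating function $g(\mbox{\boldmath$x$}_{I},\mbox{\boldmath$p$}_{J})$ does define a Legendre submanifold, i.e., that the contact form $dz-\mbox{\boldmath$p$}^{T}d\mbox{\boldmath$x$}$ vanishes on it, and that the last component of $\pi^{m}$ is as stated. It is only a sketch using the three defining relations above.

```latex
\begin{aligned}
dz &= d\big(\boldsymbol{p}_J^T\boldsymbol{x}_J\big) + dg
    = \boldsymbol{x}_J^T\,d\boldsymbol{p}_J + \boldsymbol{p}_J^T\,d\boldsymbol{x}_J
      + \Big(\frac{\partial g}{\partial \boldsymbol{x}_I}\Big)^{T} d\boldsymbol{x}_I
      + \Big(\frac{\partial g}{\partial \boldsymbol{p}_J}\Big)^{T} d\boldsymbol{p}_J \\
   &= \boldsymbol{p}_J^T\,d\boldsymbol{x}_J + \boldsymbol{p}_I^T\,d\boldsymbol{x}_I
      + \Big(\boldsymbol{x}_J + \frac{\partial g}{\partial \boldsymbol{p}_J}\Big)^{T} d\boldsymbol{p}_J
    = \boldsymbol{p}^T d\boldsymbol{x}, \\
z' &= \boldsymbol{p}^T\boldsymbol{x} - z
    = \boldsymbol{p}_I^T\boldsymbol{x}_I + \boldsymbol{p}_J^T\boldsymbol{x}_J
      - \boldsymbol{p}_J^T\boldsymbol{x}_J - g
    = \boldsymbol{x}_I^T\,\frac{\partial g}{\partial \boldsymbol{x}_I} - g .
\end{aligned}
```

The second line uses $\mbox{\boldmath$p$}_{I}=\partial g/\partial\mbox{\boldmath$x$}_{I}$ and $\mbox{\boldmath$x$}_{J}=-\partial g/\partial\mbox{\boldmath$p$}_{J}$ , which makes the coefficient of $d\mbox{\boldmath$p$}_{J}$ vanish.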

Example 3.6.

Let

g(x_{1},p_{2})=\frac{x_{1}^{3}}{3}+\frac{p_{2}^{4}}{4}

be a generating function. The $e/m$ -Legendre maps $\pi^{e}$ and $\pi^{m}$ send $(x_{1},p_{2})$ to

(x_{1},x_{2},z)=\left(x_{1},-p_{2}^{3},\frac{x_{1}^{3}}{3}-\frac{3p_{2}^{4}}{4}\right),\quad(p_{1},p_{2},z^{\prime})=\left(x_{1}^{2},p_{2},\frac{2x_{1}^{3}}{3}-\frac{p_{2}^{4}}{4}\right),

respectively, so the images $W_{e}(L)$ and $W_{m}(L)$ are singular surfaces, each with its own geometric nature, and the $e/m$ -caustics are defined by $x_{2}=0$ on $\mathbb{R}^{2}_{\mbox{\tiny$\mbox{\boldmath$x$}$}}$ and $p_{1}=0$ on $\mathbb{R}^{2}_{\mbox{\tiny$\mbox{\boldmath$p$}$}}$ ; see Fig. 2.
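The caustics in Example 3.6 can be read off from the Jacobians of the two Lagrange maps. The following sketch carries this out in the coordinates $(x_{1},p_{2})$ of $L$ ; the notation $J(\cdot)$ for the Jacobian matrix is ours.

```latex
\begin{aligned}
\pi^{e}_{1}(x_1,p_2) &= (x_1,\; -p_2^{3}), &
J(\pi^{e}_{1}) &= \begin{pmatrix} 1 & 0 \\ 0 & -3p_2^{2} \end{pmatrix}, &
C(\pi^{e}_{1}) &= \{\,p_2 = 0\,\}, &
\pi^{e}_{1}\,C(\pi^{e}_{1}) &= \{\,x_2 = 0\,\}, \\
\pi^{m}_{1}(x_1,p_2) &= (x_1^{2},\; p_2), &
J(\pi^{m}_{1}) &= \begin{pmatrix} 2x_1 & 0 \\ 0 & 1 \end{pmatrix}, &
C(\pi^{m}_{1}) &= \{\,x_1 = 0\,\}, &
\pi^{m}_{1}\,C(\pi^{m}_{1}) &= \{\,p_1 = 0\,\}.
\end{aligned}
```

In particular the origin of $L$ is both $e$ -critical and $m$ -critical, so neither a potential $z=f(\mbox{\boldmath$x$})$ nor a dual potential $z^{\prime}=\varphi(\mbox{\boldmath$p$})$ exists near that point.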

Remark 3.7.

(Hierarchical structure) For a dually flat manifold $L$ with a potential $z=f(\mbox{\boldmath$x$})$ (i.e., a regular model), there are two systems of coordinates, $\mbox{\boldmath$x$}$ and $\mbox{\boldmath$p$}\,(=\frac{\partial f}{\partial\mbox{\tiny$\mbox{\boldmath$x$}$}})$ , which are $\nabla$ -flat and $\nabla^{*}$ -flat, respectively. That produces a hierarchical structure: we may take a new system of coordinates $(\mbox{\boldmath$x$}_{I},\mbox{\boldmath$p$}_{J})$ , called mixed coordinates in Amari [2], which yields two foliations of complementary dimensions on $L$ defined by $\mbox{\boldmath$x$}_{I}=const.$ and $\mbox{\boldmath$p$}_{J}=const.$ ; their leaves are $\nabla^{*}$ -flat and $\nabla$ -flat, respectively, and mutually orthogonal with respect to the Hessian metric associated to $f$ . This structure is useful in applications, see [2]. For an arbitrary Legendre submanifold $L$ , a potential may not exist globally, but as seen above, for any $p\in L$ , we can always find a generating function $g(\mbox{\boldmath$x$}_{I},\mbox{\boldmath$p$}_{J})$ on a neighborhood $U$ of $p$ (the possible choices of the partition $I\sqcup J$ depend on $p$ ). That locally defines mixed coordinates $(\mbox{\boldmath$x$}_{I},\mbox{\boldmath$p$}_{J})$ and two orthogonal foliations on $U$ (see Remark 3.15 below). Usually these coordinates cannot be extended to the entire space $L$ , because of the presence of $e/m$ -caustics (i.e., $h$ is degenerate). Nevertheless, this new structure is well organized globally, as we will formulate properly in the following subsections.

3.2. Coherent tangent bundles

Let $L$ be a Legendre submanifold of $\mathbb{R}^{2n+1}$ . As seen above, the $e$ -wavefront $W_{e}(L)$ is not a manifold in general, but there is an alternative to its ‘tangent bundle’. Every point $p\in L$ defines a hyperplane $E_{p}$ in $T_{\pi^{e}(p)}(\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$x$}$}}\times\mathbb{R}_{z})=\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$x$}$}}\times\mathbb{R}_{z}$ , and the family of such hyperplanes forms a vector bundle on $L$ of rank $n$ :

E(=E_{L}):=\{\;(p,w)\in L\times(\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$x$}$}}\times\mathbb{R}_{z})\;\mid\;dz_{p}(w)-\mbox{\boldmath$p$}(p)^{T}d\mbox{\boldmath$x$}_{p}(w)=0\;\}.

Since $L$ is Legendrian, we see

T_{p}L\subset\xi_{p}=\ker\theta_{p}=(d\pi_{p})^{-1}(E_{p}),

thus $d\pi^{e}(T_{p}L)\subset E_{p}$ . We then associate a vector bundle map (a smooth fiber-preserving map which is linear on each fiber)

\Phi:TL\to E,\quad v_{p}\mapsto d\pi^{e}_{p}(v_{p}).

Note that $\Phi$ is an isomorphism if and only if $\pi^{e}:L\to\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$x$}$}}\times\mathbb{R}$ is an immersion.

Remark 3.8.

We remark that $E$ is the “limiting” tangent bundle of the $e$ -wavefront $W_{e}(L)$ . Note that the kernel of $d\pi_{p}:T_{p}\mathbb{R}^{2n+1}\to T_{\pi(p)}(\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$x$}$}}\times\mathbb{R})$ is spanned by $\frac{\partial}{\partial p_{i}}$ ’s. If $p\in L$ is a regular point of the $e$ -Legendre map $\pi^{e}:L\to\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$x$}$}}\times\mathbb{R}$ , then $T_{p}L\cap\ker d\pi_{p}=\{0\}$ and

E_{p}={\rm Im}\,d\pi^{e}_{p}(T_{p}L)=T_{\pi^{e}(p)}W_{e}(L).

In fact, in this case, $\pi^{e}$ is an immersion around $p$ , so $W_{e}(L)$ is a submanifold around $\pi^{e}(p)$ . If a sequence of regular points $p_{n}\in L$ of $\pi^{e}$ converges to a critical point $p$ , then the image of $T_{p_{n}}L$ converges to $E_{p}$ (in the Grassmannian of $n$ -planes in $\mathbb{R}^{n+1}=\mathbb{R}_{\mbox{\tiny$\mbox{\boldmath$x$}$}}^{n}\times\mathbb{R}$ ) because of the continuity of the bundle $E$ . In this case, $W_{e}(L)$ is singular at $\pi^{e}(p)$ , thus the tangent space at that point is not defined, but it has the limiting tangent space $E_{p}$ as a substitute. Another characterization of $E_{p}$ is

E_{p}=\ker\left[d\pi^{\prime}_{p}:T_{p}\mathbb{R}^{2n+1}\to T_{\pi^{m}(p)}(\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$p$}$}}\times\mathbb{R}_{z^{\prime}})\,\right]

through the inclusion $E_{p}\subset\mathbb{R}_{\mbox{\tiny$\mbox{\boldmath$x$}$}}^{n}\oplus\{0\}\oplus\mathbb{R}_{z}\subset\mathbb{R}^{2n+1}=T_{p}\mathbb{R}^{2n+1}$ . In fact, the Jacobian matrix of $\pi^{\prime}$ at $p$ is

\left[\begin{array}[]{ccc}O&E&0\\ \mbox{\boldmath$p$}(p)^{T}&\mbox{\boldmath$x$}(p)^{T}&-1\end{array}\right]

so its kernel is given by $dz_{p}-\mbox{\boldmath$p$}(p)^{T}d\mbox{\boldmath$x$}_{p}=0$ and $d\mbox{\boldmath$p$}_{p}=0$ . Note that the contact hyperplane splits as

\xi_{p}=\ker d\pi^{\prime}_{p}\oplus\ker d\pi_{p}.

Let $\widetilde{\nabla}$ be the flat connection on the affine space $\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$x$}$}}\times\mathbb{R}$ , and for any $p\in L$ , let $\psi_{p}:\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$x$}$}}\times\mathbb{R}\to E_{p}$ denote the linear projection along the $z$ -axis. Then a connection $\nabla^{E}$ of the vector bundle $E$ over $L$ is naturally defined by

\nabla^{E}_{X}\xi(p):=\psi_{p}\circ\widetilde{\nabla}_{X}\xi(p)

where $\xi$ is any section of $E$ and $X$ is any vector field on $L$ around $p$ .

Lemma 3.9.

The resulting connection $\nabla^{E}$ is flat and ‘relatively torsion-free’, i.e., for any vector fields $X,Y$ on $L$ , it holds that

\nabla^{E}_{X}(\Phi(Y))-\nabla^{E}_{Y}(\Phi(X))=\Phi([X,Y]).

Proof : Put $s_{i}(p):=\frac{\partial}{\partial x_{i}}+p_{i}\frac{\partial}{\partial z}\in E_{p}$ ( $1\leq i\leq n$ ), then they form a frame of flat global sections of $E$ :

\nabla^{E}_{X}s_{i}=\psi_{p}(\tilde{\nabla}_{X}s_{i})=\psi_{p}(X(p_{i})\frac{\partial}{\partial z})=0.

Thus $\nabla^{E}$ is flat. Next, a key point is that $\Phi$ is represented by the Jacobi matrix of the $e$ -Lagrange map $\pi^{e}_{1}=(f_{1},\cdots,f_{n}):L\to\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$x$}$}}$ , i.e., $\Phi(\partial_{j})=\sum_{i=1}^{n}(\partial_{j}f_{i})s_{i}$ in local coordinates $(t_{1},\cdots,t_{n})$ of $L$ with $\partial_{j}=\frac{\partial}{\partial t_{j}}$ . Let $X=\sum_{k=1}^{n}a_{k}\partial_{k}$ and $Y=\sum_{j=1}^{n}b_{j}\partial_{j}$ , then

\nabla^{E}_{X}\Phi(Y)=\sum_{i,j,k}((\partial_{k}\partial_{j}f_{i})a_{k}b_{j}+(\partial_{j}f_{i})a_{k}(\partial_{k}b_{j}))s_{i}.

The rest is shown by a direct computation. $\Box$
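The direct computation left to the reader at the end of the proof of Lemma 3.9 can be sketched as follows. It uses only the displayed formula for $\nabla^{E}_{X}\Phi(Y)$ , the flatness of the frame $s_{i}$ , and the standard coordinate expression of the Lie bracket.

```latex
\begin{aligned}
\nabla^{E}_{X}\Phi(Y) - \nabla^{E}_{Y}\Phi(X)
 &= \sum_{i,j,k} (\partial_k\partial_j f_i)\,(a_k b_j - b_k a_j)\, s_i
  + \sum_{i,j,k} (\partial_j f_i)\,\big(a_k\,\partial_k b_j - b_k\,\partial_k a_j\big)\, s_i \\
 &= \sum_{i,j} (\partial_j f_i)\,\Big(\sum_{k} \big(a_k\,\partial_k b_j - b_k\,\partial_k a_j\big)\Big)\, s_i
  \;=\; \Phi([X,Y]),
\end{aligned}
\qquad
[X,Y] = \sum_{j,k}\big(a_k\,\partial_k b_j - b_k\,\partial_k a_j\big)\,\partial_j .
```

The first sum vanishes because $\partial_{k}\partial_{j}f_{i}$ is symmetric in $(j,k)$ while $a_{k}b_{j}-b_{k}a_{j}$ is antisymmetric.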

Definition 3.10.

We call $(E,\Phi,\nabla^{E})$ the coherent tangent bundle associated to the $e$ -wavefront $W_{e}(L)$ .

Remark 3.11.

The definition of coherent tangent bundles is originally due to Saji-Umehara-Yamada [24, §6] from the viewpoint of Riemannian geometry. They have studied several kinds of curvatures associated to wavefronts. In our case, we use the fixed affine structure of the ambient space of the wavefront, instead of a metric. The affine differential geometry of wavefronts should also be rich.

In entirely the same way, for the $m$ -wavefront $W_{m}(L)$ , the coherent tangent bundle $E^{\prime}$ with $\Phi^{\prime}:=d\pi^{m}:TL\to E^{\prime}$ and $\nabla^{E^{\prime}}$ is defined:

E^{\prime}(=E^{\prime}_{L}):=\{\;(p,w)\in L\times(\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$p$}$}}\times\mathbb{R}_{z^{\prime}})\;\mid\;dz^{\prime}(w)-\mbox{\boldmath$x$}(p)^{T}d\mbox{\boldmath$p$}(w)=0\;\}.

In fact, the double Legendre fibration $(dL)$ can be viewed as the pair of maps $(\pi\circ\mathcal{L}^{-1},\pi)$ using different coordinates $(\mbox{\boldmath$p$},\mbox{\boldmath$x$},z^{\prime})$ of $\mathbb{R}^{2n+1}$ , and then the above construction yields $(E^{\prime},\Phi^{\prime},\nabla^{E^{\prime}})$ on this dual side. In particular, $E^{\prime}_{p}$ is identified with $\ker d\pi_{p}$ (see Remark 3.8).

We have defined $E$ and $E^{\prime}$ as vector bundles on $L$ , although they are actually defined on the ambient space $\mathbb{R}^{2n+1}$ . The contact hyperplane $\xi$ has the direct sum decomposition:

\xi_{p}=\ker d\pi^{\prime}_{p}\oplus\ker d\pi_{p}\simeq E_{p}\oplus E^{\prime}_{p}\simeq\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$x$}$}}\oplus\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$p$}$}}.

Here we have canonical frames of flat sections for both $E$ and $E^{\prime}$ ,

s_{i}(p)=\frac{\partial}{\partial x_{i}}+p_{i}\frac{\partial}{\partial z}\in E_{p},\quad s_{i}^{*}(p)=\frac{\partial}{\partial p_{i}}+x_{i}\frac{\partial}{\partial z^{\prime}}\in E^{\prime}_{p},

by which $E^{\prime}$ is identified with the dual to $E$ and vice versa, and there are natural correspondences $s_{l}\leftrightarrow\frac{\partial}{\partial x_{l}}$ and $s_{l}^{*}\leftrightarrow\frac{\partial}{\partial p_{l}}$ via projections along the $z$ and $z^{\prime}$ -axes. The vector bundle $\xi$ carries not only the symplectic form $\omega=\sum_{i=1}^{n}dx_{i}\wedge dp_{i}$ but also a pseudo-Riemannian metric of type $(n,n)$ induced from

\tau:=\sum_{i=1}^{n}dx_{i}dp_{i}=\frac{1}{2}\sum_{i=1}^{n}(dx_{i}\otimes dp_{i}+dp_{i}\otimes dx_{i}).

Using frames $s_{i}$ and $s_{j}^{*}$ , we may write vectors of $E_{p}$ and $E^{\prime}_{p}$ as column vectors $\mbox{\boldmath$u$}$ and $\mbox{\boldmath$u$}^{\prime}$ , respectively, and then

\tau(\mbox{\boldmath$u$}\oplus\mbox{\boldmath$u$}^{\prime},\mbox{\boldmath$v$}\oplus\mbox{\boldmath$v$}^{\prime})=\frac{1}{2}\left[\,\mbox{\boldmath$u$}^{T}\;\mbox{\boldmath$u$}^{\prime T}\right]\left[\begin{array}[]{cc}O&E\\ E&O\end{array}\right]\left[\begin{array}[]{cc}\mbox{\boldmath$v$}\\ \mbox{\boldmath$v$}^{\prime}\end{array}\right]=\frac{1}{2}(\mbox{\boldmath$u$}^{T}\mbox{\boldmath$v$}^{\prime}+\mbox{\boldmath$u$}^{\prime T}\mbox{\boldmath$v$})

and also $\omega(\mbox{\boldmath$u$}\oplus\mbox{\boldmath$u$}^{\prime},\mbox{\boldmath$v$}\oplus\mbox{\boldmath$v$}^{\prime})=\frac{1}{2}(\mbox{\boldmath$u$}^{T}\mbox{\boldmath$v$}^{\prime}-\mbox{\boldmath$u$}^{\prime T}\mbox{\boldmath$v$})$ .

Any affine Legendre equivalence $\mathcal{L}_{F}$ preserves $\omega$ and $\tau$ on $\xi$ , because it sends $\mbox{\boldmath$u$}\oplus\mbox{\boldmath$u$}^{\prime}$ to $A\mbox{\boldmath$u$}\oplus A^{\prime}\mbox{\boldmath$u$}^{\prime}$ with $A^{\prime}=(A^{T})^{-1}$ .

Definition 3.12.

We define the quasi-Hessian metric of $L$ by the pullback of $\tau$ :

h(Y,Z):=\tau(\iota_{*}Y,\iota_{*}Z)\;\;\;\mbox{for $Y,Z\in TL$}

where $\iota_{*}=\Phi\oplus\Phi^{\prime}:TL\hookrightarrow\xi=E\oplus E^{\prime}$ is the inclusion (it is a Lagrange subbundle).

Note that $h$ is a possibly degenerate symmetric $(0,2)$ -tensor, although we abuse the word ‘metric’. If $\Phi$ is an isomorphism, then $h$ exactly coincides with the Hessian metric associated to a potential $z=f(\mbox{\boldmath$x$})$ ; any vector of $T_{p}L$ is written as $\mbox{\boldmath$u$}\oplus H\mbox{\boldmath$u$}\in\xi_{p}$ where $H=[h_{ij}]$ is the Hessian matrix, thus

h(\mbox{\boldmath$u$},\mbox{\boldmath$v$})=\tau(\mbox{\boldmath$u$}\oplus H\mbox{\boldmath$u$},\mbox{\boldmath$v$}\oplus H\mbox{\boldmath$v$})=\mbox{\boldmath$u$}^{T}H\mbox{\boldmath$v$}.

In general, a local expression of $h$ is given as follows.

Lemma 3.13.

Let $g(\mbox{\boldmath$x$}_{I},\mbox{\boldmath$p$}_{J})$ be a generating function. Then,

h=\sum_{i,k\in I}\frac{\partial^{2}g}{\partial x_{i}\partial x_{k}}\,dx_{i}dx_{k}-\sum_{j,l\in J}\frac{\partial^{2}g}{\partial p_{j}\partial p_{l}}\,dp_{j}dp_{l}.

Proof : A direct computation shows

\begin{aligned}
\tau&=d\mbox{\boldmath$x$}_{I}^{T}d\mbox{\boldmath$p$}_{I}+d\mbox{\boldmath$x$}_{J}^{T}d\mbox{\boldmath$p$}_{J}\\
&=d\mbox{\boldmath$x$}_{I}^{T}d(\partial_{I}g)-d(\partial_{J}g)^{T}d\mbox{\boldmath$p$}_{J}\\
&=d\mbox{\boldmath$x$}_{I}^{T}g_{II}d\mbox{\boldmath$x$}_{I}+d\mbox{\boldmath$x$}_{I}^{T}g_{IJ}d\mbox{\boldmath$p$}_{J}-(g_{JI}d\mbox{\boldmath$x$}_{I})^{T}d\mbox{\boldmath$p$}_{J}-(g_{JJ}d\mbox{\boldmath$p$}_{J})^{T}d\mbox{\boldmath$p$}_{J}\\
&=d\mbox{\boldmath$x$}_{I}^{T}g_{II}d\mbox{\boldmath$x$}_{I}-d\mbox{\boldmath$p$}_{J}^{T}g_{JJ}d\mbox{\boldmath$p$}_{J}.
\end{aligned}

Here we use the notation of symmetric products of $1$ -forms and $(g_{JI})^{T}=g_{IJ}$ . $\Box$
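As an illustration of Lemma 3.13, the quasi-Hessian metric can be written out for the generating function of Example 3.6 and for the potential of Example 3.4 (the case $J=\emptyset$ ). This is a sketch obtained by substituting second derivatives into the formula of the lemma.

```latex
\begin{aligned}
g(x_1,p_2) &= \tfrac{1}{3}x_1^{3} + \tfrac{1}{4}p_2^{4}
  \quad (I=\{1\},\; J=\{2\}): &
h &= 2x_1\,dx_1^{2} \;-\; 3p_2^{2}\,dp_2^{2}, \\
f(x_1,x_2) &= \tfrac{1}{3}x_1^{3} + \tfrac{1}{2}x_2^{2}
  \quad (I=\{1,2\},\; J=\emptyset): &
h &= 2x_1\,dx_1^{2} \;+\; dx_2^{2}.
\end{aligned}
```

In the first case $h$ is degenerate exactly on $\{x_{1}=0\}\cup\{p_{2}=0\}$ , the union of the $m$ -critical and $e$ -critical sets of Example 3.6; in the second case it is degenerate exactly on $x_{1}=0$ .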

Lemma 3.14.

Let $p\in L$ . The following properties are equivalent:

(1)

$h$ is non-degenerate at $p$ ;
(2)

$p$ is neither $e$ -critical nor $m$ -critical;
(3)

$L$ is locally a regular model around $p$ ;
(4)

$h$ is the Hessian metric associated to a local potential $z=f(\mbox{\boldmath$x$})$ near $p$ ;
(5)

both $\Phi$ and $\Phi^{\prime}$ are isomorphisms at $p$ .

Proof : By Lemma 3.13, (1) means that both $g_{II}$ and $g_{JJ}$ are non-degenerate. Then, using the normal forms of the $e/m$ -Lagrange maps $\pi^{e}_{1}$ and $\pi^{m}_{1}$ written at the end of §2.3, those maps are local diffeomorphisms by the Inverse Mapping Theorem, which is precisely (2) and (3). That means that we can take a local potential $z=f(\mbox{\boldmath$x$})$ as a generating function, which is equivalent to (4). Since $\Phi$ and $\Phi^{\prime}$ are expressed by the Jacobi matrices of the $e/m$ -Lagrange maps, (2) and (5) are the same. $\Box$

Remark 3.15.

As noted in Remark 3.7, locally we always find mixed coordinates $(\mbox{\boldmath$x$}_{I},\mbox{\boldmath$p$}_{J})$ . By Lemma 3.13, even if $h$ is degenerate, leaves $\mbox{\boldmath$p$}_{J}=const.$ and $\mbox{\boldmath$x$}_{I}=const.$ are orthogonal: $h(\frac{\partial}{\partial x_{i}},\frac{\partial}{\partial p_{j}})=0$ ( $i\in I$ , $j\in J$ ).

Definition 3.16.

Let $\Sigma\,(=\Sigma_{L,h})$ denote the set of $p\in L$ at which $h$ is degenerate, equivalently, the locus where either $\Phi$ or $\Phi^{\prime}$ is not isomorphic:

\Sigma=C(\pi^{e})\cup C(\pi^{m}).

We call $\Sigma$ the degeneracy locus of the quasi-Hessian metric $h$ .

Since $L$ is Legendrian, $T_{p}L$ is a Lagrange subspace of the symplectic vector space $\xi_{p}=E_{p}\oplus E^{\prime}_{p}$ . Note that $\Phi$ (resp. $\Phi^{\prime}$ ) is the linear projection of $T_{p}L$ to the factor $E_{p}$ (resp. $E_{p}^{\prime}$ ), and especially, $\dim T_{p}L\cap E_{p}\geq 1$ (resp. $\dim T_{p}L\cap E^{\prime}_{p}\geq 1$ ) if and only if $p$ is $m$ -critical (resp. $e$ -critical). In particular, the null space of $h$ splits:

{\rm null}\,h_{p}=\ker\Phi^{\prime}_{p}\oplus\ker\Phi_{p}=(T_{p}L\cap E_{p})\oplus(T_{p}L\cap E^{\prime}_{p}).
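The splitting of the null space can be seen concretely in Example 3.6. The sketch below works in the coordinates $(x_{1},p_{2})$ of $L$ , where $\Phi$ and $\Phi^{\prime}$ are represented by the Jacobi matrices of $\pi^{e}_{1}(x_{1},p_{2})=(x_{1},-p_{2}^{3})$ and $\pi^{m}_{1}(x_{1},p_{2})=(x_{1}^{2},p_{2})$ , and $h=2x_{1}\,dx_{1}^{2}-3p_{2}^{2}\,dp_{2}^{2}$ by Lemma 3.13; the two sample points are our own choice.

```latex
\begin{aligned}
&\text{at } (x_1,p_2)=(0,0): &
\ker\Phi &= \mathrm{span}\Big\{\frac{\partial}{\partial p_2}\Big\}, &
\ker\Phi' &= \mathrm{span}\Big\{\frac{\partial}{\partial x_1}\Big\}, &
h &= 0, &
\mathrm{null}\,h &= T_{p}L = \ker\Phi'\oplus\ker\Phi, \\
&\text{at } (x_1,p_2)=(a,0),\; a\neq 0: &
\ker\Phi &= \mathrm{span}\Big\{\frac{\partial}{\partial p_2}\Big\}, &
\ker\Phi' &= \{0\}, &
h &= 2a\,dx_1^{2}, &
\mathrm{null}\,h &= \mathrm{span}\Big\{\frac{\partial}{\partial p_2}\Big\}.
\end{aligned}
```

In both cases the null space of $h$ is the direct sum of the two kernels, in agreement with the displayed formula.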

Definition 3.17.

For an arbitrary Legendre submanifold $L\subset\mathbb{R}^{2n+1}$ , we call the triplet $(h,(E,\nabla^{E},\Phi),(E^{\prime},\nabla^{E^{\prime}},\Phi^{\prime}))$ the dually flat structure of $L$ .

Remark 3.18.

Given a regular model $L$ , we have the triple $(h,E,E^{\prime})$ , where both $\Phi$ and $\Phi^{\prime}$ are isomorphisms. That restores the dually flat structure in the original form (Definition 2.3); indeed, $\nabla$ and $\nabla^{*}$ on $TL$ are uniquely determined by

\Phi(\nabla_{X}Y)=\nabla^{E}_{X}\Phi(Y),\quad\Phi^{\prime}(\nabla^{*}_{X}Y)=\nabla^{E^{\prime}}_{X}\Phi^{\prime}(Y)

where $X,Y$ are arbitrary vector fields on $L$ . On the other hand, a singular model $L$ with degenerate potential $z=f(\mbox{\boldmath$x$})$ is the case where $\Phi$ is an isomorphism and $\Phi^{\prime}$ is not. Then the connection $\nabla$ on $TL$ is obtained from $\nabla^{E}$ via $\Phi$ in the same way as above, while $\nabla^{*}$ does not exist. If neither $\Phi$ nor $\Phi^{\prime}$ is an isomorphism, then neither connection is induced on $TL$ in this way.

3.3. Quasi-Hessian manifolds

Our generalized dually flat structure presented in Definition 3.17 is compatible with affine Legendre equivalence. That means that if an affine Legendre equivalence $\mathcal{L}_{F}$ identifies Legendre submanifolds $L_{1}$ and $L_{2}$ , then the quasi-Hessian metrics are preserved, $\mathcal{L}_{F}^{*}h_{2}=h_{1}$ , and $\mathcal{L}_{F}$ naturally induces vector bundle isomorphisms between coherent tangent bundles, $E_{L_{1}}\simeq E_{L_{2}}$ and $E^{\prime}_{L_{1}}\simeq E^{\prime}_{L_{2}}$ , such that the isomorphisms identify the equipped flat affine connections and commute with the bundle maps $\Phi$ and $\Phi^{\prime}$ .

Thus the ordinary gluing construction works. To be precise, suppose that we are given a collection $\{L_{\alpha}\}_{\alpha\in\Lambda}$ , where $\Lambda$ is a countable set, such that it satisfies the following properties:

(i)

for every $\alpha\in\Lambda$ , $L_{\alpha}$ itself is an open manifold and it is embedded in $\mathbb{R}^{2n+1}$ as a Legendre submanifold, called a local model;
(ii)

for every $\alpha,\beta\in\Lambda$ , there is an open subset $L_{\alpha\beta}\subset L_{\alpha}$ (also $L_{\beta\alpha}\subset L_{\beta}$ ) and a diffeomorphism $\mathcal{L}_{\alpha}^{\beta}:L_{\alpha\beta}\to L_{\beta\alpha}$ such that over each connected component of $L_{\alpha\beta}$ , $\mathcal{L}_{\alpha}^{\beta}$ is given by an affine Legendre equivalence of the ambient space $\mathbb{R}^{2n+1}$ ;
(iii)

for $\alpha,\beta,\gamma\in\Lambda$ , it holds that $\mathcal{L}_{\alpha}^{\gamma}=\mathcal{L}_{\beta}^{\gamma}\circ\mathcal{L}_{\alpha}^{\beta}$ on $L_{\alpha\beta}\cap L_{\alpha\gamma}$ .

Let $M$ be the topological space resulting from these gluing data $\mathcal{U}=\{L_{\alpha},\mathcal{L}_{\alpha}^{\beta}\}$ . Assume that $M$ is Hausdorff; then $M$ itself becomes an $n$ -dimensional manifold in the ordinary sense. One can naturally associate a possibly degenerate $(0,2)$ -tensor $h$ on $M$ and a pair of globally defined dual coherent tangent bundles $E$ and $E^{\prime}$ on $M$ with bundle maps $\Phi:TM\to E$ and $\Phi^{\prime}:TM\to E^{\prime}$ equipped with flat affine connections. The bundles $E$ and $E^{\prime}$ are dual to each other.

Definition 3.19.

We call $(M,\mathcal{U})$ equipped with $(h,(E,\nabla^{E},\Phi),(E^{\prime},\nabla^{E^{\prime}},\Phi^{\prime}))$ a quasi-Hessian manifold. We define the degeneracy locus $\Sigma$ to be the locus of points of $M$ at which $h$ is degenerate.

Since the gluing maps $\mathcal{L}_{\alpha}^{\beta}$ also act on a neighborhood of $L_{\alpha\beta}$ in $\mathbb{R}^{2n+1}$ and preserve the contact structure, $M$ is realized as a Legendre submanifold in some ambient contact manifold.

By the above construction, the following is immediate.

Proposition 3.20.

Let $M$ be a quasi-Hessian manifold. Then $h$ is non-degenerate everywhere if and only if $M$ is a Hessian manifold.

Proof : From the equivalence of (1) and (3) in Lemma 3.14, we see that $h$ is non-degenerate everywhere if and only if every local model $L_{\alpha}$ is a regular model, which means that $M$ is a Hessian manifold (Proposition 2.4). $\Box$

Remark 3.21.

More generally, we may allow a local model $L_{\alpha}$ not to be a manifold but a singular Legendre variety; it is a closed subset with a partition (stratification) into integral submanifolds of the contact structure (the projection to the cotangent bundle is called a singular Lagrange variety), see, e.g., Ishikawa [16]. That results in a quasi-Hessian manifold with singularities.

An intrinsic definition of quasi-Hessian manifolds is also available. Roughly speaking, it is an $n$ -dimensional manifold $M$ equipped with a pair of flat coherent tangent bundles $(E,\nabla^{E},\Phi)$ and $(E^{\prime},\nabla^{E^{\prime}},\Phi^{\prime})$ of rank $n$ ; we impose two conditions:

(a)

the vector bundle $E\oplus E^{\prime}$ of rank $2n$ is endowed with a symplectic structure $\omega$ and a pseudo-Riemannian metric $\tau$ of type $(n,n)$ satisfying $\omega(u,v)=\tau(u,v)=0$ ( $u,v\in E_{p}$ or $u,v\in E^{\prime}_{p}$ ) and $\omega(u,v)=\tau(u,v)$ ( $u\in E_{p}$ , $v\in E^{\prime}_{p}$ ); this condition defines the duality between $E$ and $E^{\prime}$ ;
(b)

the bundle map

$\Phi\oplus\Phi^{\prime}:TM\to E\oplus E^{\prime}$

is injective and its image is a Lagrange subbundle satisfying a suitable integrability condition, which ensures that a local model can be found around each point of $M$ as in Definition 3.19. We omit the details here.

This also suggests a degenerate version of the so-called Codazzi structure (cf. [25]).

3.4. Cubic tensor and $\alpha$ -family

In the theory of dually flat manifolds [1], not only the Hessian metric $h$ but also the Amari-Chentsov tensor $T:=\nabla h$ plays an essential role; it satisfies

T(X,Y,Z)=h(\nabla^{*}_{X}Y,Z)-h(\nabla_{X}Y,Z)=h(Y,\nabla^{*}_{X}Z)-h(Y,\nabla_{X}Z)

for vector fields $X,Y,Z$ .

Note that whenever $\nabla$ exists, the tensor $T$ is defined everywhere, regardless of whether $h$ is non-degenerate. This is the easy case. We generalize the Amari-Chentsov tensor to an arbitrary quasi-Hessian manifold

(M,h,(E,\nabla^{E},\Phi),(E^{\prime},\nabla^{E^{\prime}},\Phi^{\prime}))

but how to do so is not obvious at all, because in general there is no connection on $TM$ . In the end we will see that the resulting tensor is a very natural one (Proposition 3.24 below).

Lemma 3.22.

For any vector field $X$ on $M$ , and for any sections $\eta$ of $E$ and $\zeta^{\prime}$ of $E^{\prime}$ , it holds that

X\tau(\eta,\zeta^{\prime})=\tau(\nabla^{E}_{X}\eta,\zeta^{\prime})+\tau(\eta,\nabla^{E^{\prime}}_{X}\zeta^{\prime})

where we put $\tau(\eta,\zeta^{\prime}):=\tau(\eta\oplus 0,0\oplus\zeta^{\prime})$ for short.

Proof : Take local frames of flat sections $s_{i}$ of $E$ and $s_{j}^{*}$ of $E^{\prime}$ with $\tau(s_{i},s_{j}^{*})=\frac{1}{2}\delta_{ij}$ ( $1\leq i,j\leq n$ ) on an open set $U\subset M$ . Put $\eta=\sum a_{i}s_{i}$ and $\zeta^{\prime}=\sum b_{j}s_{j}^{*}$ where $a_{i},b_{j}$ are functions on $U$ , then

\nabla^{E}_{X}\eta=\sum X(a_{i})s_{i},\;\;\nabla^{E^{\prime}}_{X}\zeta^{\prime}=\sum X(b_{j})s_{j}^{*},\;\;\tau(\eta,\zeta^{\prime})=\frac{1}{2}\sum a_{i}b_{i}.

This leads to the equality. $\Box$
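The last step of the proof of Lemma 3.22 is the Leibniz rule applied to the coefficient functions. A sketch, using only the three formulas displayed in the proof:

```latex
\begin{aligned}
X\,\tau(\eta,\zeta')
 &= \tfrac{1}{2}\sum_{i} X(a_i b_i)
  = \tfrac{1}{2}\sum_{i} X(a_i)\,b_i + \tfrac{1}{2}\sum_{i} a_i\,X(b_i) \\
 &= \tau\Big(\sum_{i} X(a_i)\,s_i,\; \zeta'\Big) + \tau\Big(\eta,\; \sum_{j} X(b_j)\,s_j^{*}\Big)
  = \tau(\nabla^{E}_{X}\eta,\,\zeta') + \tau(\eta,\,\nabla^{E'}_{X}\zeta').
\end{aligned}
```

The second line uses $\tau(s_{i},s_{j}^{*})=\frac{1}{2}\delta_{ij}$ to rewrite each sum as a pairing.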

For $Y,Z\in TM$ , put

\eta=\Phi(Y),\;\;\zeta=\Phi(Z)\;\in E,\;\;\eta^{\prime}=\Phi^{\prime}(Y),\;\;\zeta^{\prime}=\Phi^{\prime}(Z)\;\in E^{\prime}.

Then

h(Y,Z)=\tau(\eta\oplus\eta^{\prime},\zeta\oplus\zeta^{\prime})=\tau(\eta,\zeta^{\prime})+\tau(\zeta,\eta^{\prime}).

Using Lemma 3.22, for vector fields $X,Y,Z$ on $M$ ,

\begin{aligned}
Xh(Y,Z)&=X(\tau(\eta,\zeta^{\prime}))+X(\tau(\zeta,\eta^{\prime}))\\
&=\tau(\nabla^{E}_{X}\eta,\zeta^{\prime})+\tau(\eta,\nabla^{E^{\prime}}_{X}\zeta^{\prime})+\tau(\nabla^{E}_{X}\zeta,\eta^{\prime})+\tau(\zeta,\nabla^{E^{\prime}}_{X}\eta^{\prime}).
\end{aligned}

We call the sum of first and third terms the $\nabla^{E}$ -part, the rest the $\nabla^{E^{\prime}}$ -part, tentatively. We are concerned with their difference.

Definition 3.23.

For a quasi-Hessian manifold $M$ , we define the canonical cubic tensor $C$ by the following $(0,3)$ -tensor on $M$ :

\displaystyle C(X,Y,Z):=\tau(\eta,\nabla^{E^{\prime}}_{X}\zeta^{\prime})+\tau(\zeta,\nabla^{E^{\prime}}_{X}\eta^{\prime})-\tau(\nabla^{E}_{X}\eta,\zeta^{\prime})-\tau(\nabla^{E}_{X}\zeta,\eta^{\prime}).

In particular, if $h$ is non-degenerate, then $\Phi(\nabla_{X}Y)=\nabla^{E}_{X}\Phi(Y)$ (Remark 3.18) and we have

\tau(\nabla^{E}_{X}\eta,\zeta^{\prime})=\tau(\Phi(\nabla_{X}Y),\Phi^{\prime}(Z))=\frac{1}{2}h(\nabla_{X}Y,Z)

and so on, thus it follows that the $\nabla^{E}$ -part and the $\nabla^{E^{\prime}}$ -part are equal to, respectively,

\frac{1}{2}(h(\nabla_{X}Y,Z)+h(Y,\nabla_{X}Z)),\quad\frac{1}{2}(h(\nabla^{*}_{X}Y,Z)+h(Y,\nabla^{*}_{X}Z)).

Hence, we see that $C$ coincides with the Amari-Chentsov tensor $T$ :

\begin{aligned}
C(X,Y,Z)&=\textstyle\frac{1}{2}(h(\nabla^{*}_{X}Y,Z)-h(\nabla_{X}Y,Z))+\frac{1}{2}(h(Y,\nabla^{*}_{X}Z)-h(Y,\nabla_{X}Z))\\
&=T(X,Y,Z).
\end{aligned}

Using local coordinates, we write down the tensor $C$ explicitly as follows. Take a local model $L\subset\mathbb{R}^{2n+1}$ and $p\in L$ . As mentioned before, locally around $p$ , $L$ is parameterized by some local coordinates $\mbox{\boldmath$x$}_{I},\mbox{\boldmath$p$}_{J}$ with a generating function $g(\mbox{\boldmath$x$}_{I},\mbox{\boldmath$p$}_{J})$ . For simplicity, for each $1\leq k\leq n$ , we set

\partial_{k}:=\frac{\partial}{\partial x_{k}}\;(k\in I)\;\;\;\mbox{or}\;\;\;\frac{\partial}{\partial p_{k}}\;(k\in J).

Proposition 3.24.

The canonical cubic tensor $C$ is locally the third partial derivative of a generating function: for any $k,l,m$ ,

C(\partial_{k},\partial_{l},\partial_{m})=\partial_{k}\partial_{l}\partial_{m}g.

In particular, $C$ is symmetric.
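To illustrate the statement of Proposition 3.24, consider again the generating function of Example 3.6, with $\partial_{1}=\frac{\partial}{\partial x_{1}}$ and $\partial_{2}=\frac{\partial}{\partial p_{2}}$ . The sketch below simply evaluates third partial derivatives; writing $C$ as a symmetric cubic form in the last line is our own shorthand.

```latex
\begin{aligned}
g(x_1,p_2) &= \tfrac{1}{3}x_1^{3} + \tfrac{1}{4}p_2^{4}, \\
C(\partial_1,\partial_1,\partial_1) &= \partial_1^{3} g = 2, \qquad
C(\partial_2,\partial_2,\partial_2) = \partial_2^{3} g = 6p_2, \\
C(\partial_1,\partial_1,\partial_2) &= C(\partial_1,\partial_2,\partial_2) = 0, \\
C &= 2\,dx_1^{3} + 6p_2\,dp_2^{3}.
\end{aligned}
```

Note that $C$ is smooth and well defined on all of $L$ , including the degeneracy locus $\Sigma=\{x_{1}=0\}\cup\{p_{2}=0\}$ of $h$ , where no connection on $TL$ is available.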

Proof : This is shown by direct computation. The generating function yields a Lagrange embedding $L\to T^{*}\mathbb{R}^{n}$ given by

\iota:(\mbox{\boldmath$x$}_{I},\mbox{\boldmath$p$}_{J})\mapsto(\mbox{\boldmath$x$}_{I},\mbox{\boldmath$x$}_{J},\mbox{\boldmath$p$}_{I},\mbox{\boldmath$p$}_{J}):=(\mbox{\boldmath$x$}_{I},-\partial_{J}g,\partial_{I}g,\mbox{\boldmath$p$}_{J}),

thus the differential $\iota_{*}:T_{p}L\to T_{p}(T^{*}\mathbb{R}^{n})=\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$x$}$}}\oplus\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$p$}$}}$ is written as

\iota_{*}(\partial_{k})=\partial_{k}-\sum_{j\in J}(\partial_{k}\partial_{j}g)\frac{\partial}{\partial x_{j}}+\sum_{i\in I}(\partial_{k}\partial_{i}g)\frac{\partial}{\partial p_{i}}.

Let $s_{i},s_{i}^{*}\;(1\leq i\leq n)$ be flat sections of $E$ and $E^{\prime}$ as before; $\tau(s_{i},s_{j}^{*})=\frac{1}{2}\delta_{ij}$ . Then for $k\in I$ ,

\Phi(\iota_{*}\partial_{k})=s_{k}-\sum_{j\in J}(\partial_{k}\partial_{j}g)s_{j},\quad\Phi^{\prime}(\iota_{*}\partial_{k})=\sum_{i\in I}(\partial_{k}\partial_{i}g)s_{i}^{*},

and for $k\in J$ ,

\Phi(\iota_{*}\partial_{k})=-\sum_{j\in J}(\partial_{k}\partial_{j}g)s_{j},\quad\Phi^{\prime}(\iota_{*}\partial_{k})=s_{k}^{*}+\sum_{i\in I}(\partial_{k}\partial_{i}g)s_{i}^{*}.

Put $\eta=\Phi(\iota_{*}\partial_{l})$ , $\eta^{\prime}=\Phi^{\prime}(\iota_{*}\partial_{l})$ , $\zeta=\Phi(\iota_{*}\partial_{m})$ , $\zeta^{\prime}=\Phi^{\prime}(\iota_{*}\partial_{m})$ , and $X=\partial_{k}$ .

For $l\in I$ , $m\in J$ and any $k$ , we have

	$\displaystyle\textstyle\tau(\eta,\nabla^{E^{\prime}}_{X}\zeta^{\prime})=\tau(s_{l}-\sum_{J}(\partial_{l}\partial_{j}g)s_{j},\sum_{I}(\partial_{k}\partial_{m}\partial_{i}g)s_{i}^{*})=\frac{1}{2}\partial_{k}\partial_{l}\partial_{m}g,$
	$\displaystyle\textstyle\tau(\zeta,\nabla^{E^{\prime}}_{X}\eta^{\prime})=\tau(-\sum_{J}(\partial_{m}\partial_{j}g)s_{j},\sum_{I}(\partial_{k}\partial_{l}\partial_{i}g)s_{i}^{*})=0,$
	$\displaystyle\textstyle\tau(\nabla^{E}_{X}\eta,\zeta^{\prime})=\tau(-\sum_{J}(\partial_{k}\partial_{l}\partial_{j}g)s_{j},s_{m}^{*}+\sum_{I}(\partial_{m}\partial_{i}g)s_{i}^{*})=-\frac{1}{2}\partial_{k}\partial_{l}\partial_{m}g,$
	$\displaystyle\textstyle\tau(\nabla^{E}_{X}\zeta,\eta^{\prime})=\tau(-\sum_{J}(\partial_{k}\partial_{m}\partial_{j}g)s_{j},\sum_{I}(\partial_{l}\partial_{i}g)s_{i}^{*})=0.$

Thus, the $\nabla^{E^{\prime}}$ -part minus the $\nabla^{E}$ -part gives $C(\partial_{k},\partial_{l},\partial_{m})=\partial_{k}\partial_{l}\partial_{m}g$ .

For $l,m\in I$ and any $k$ , we have

	$\displaystyle\textstyle\tau(\eta,\nabla^{E^{\prime}}_{X}\zeta^{\prime})=\tau(s_{l}-\sum_{J}(\partial_{l}\partial_{j}g)s_{j},\sum_{I}(\partial_{k}\partial_{m}\partial_{i}g)s_{i}^{*})=\frac{1}{2}\partial_{k}\partial_{l}\partial_{m}g,$
	$\displaystyle\textstyle\tau(\zeta,\nabla^{E^{\prime}}_{X}\eta^{\prime})=\tau(s_{m}-\sum_{J}(\partial_{m}\partial_{j}g)s_{j},\sum_{I}(\partial_{k}\partial_{l}\partial_{i}g)s_{i}^{*})=\frac{1}{2}\partial_{k}\partial_{l}\partial_{m}g,$
	$\displaystyle\textstyle\tau(\nabla^{E}_{X}\eta,\zeta^{\prime})=\tau(-\sum_{J}(\partial_{k}\partial_{l}\partial_{j}g)s_{j},\sum_{I}(\partial_{m}\partial_{i}g)s_{i}^{*})=0,$
	$\displaystyle\textstyle\tau(\nabla^{E}_{X}\zeta,\eta^{\prime})=\tau(-\sum_{J}(\partial_{k}\partial_{m}\partial_{j}g)s_{j},\sum_{I}(\partial_{l}\partial_{i}g)s_{i}^{*})=0.$

Thus, $C(\partial_{k},\partial_{l},\partial_{m})=\partial_{k}\partial_{l}\partial_{m}g$ . The same is true for the case of $l,m\in J$ . $\Box$

Remark 3.25.

For a dually flat manifold with potential function $f$ , the above proposition corresponds to the well-known property

T(\partial_{i},\partial_{j},\partial_{k})=\partial_{i}\partial_{j}\partial_{k}f,

with respect to $\nabla$ -affine coordinates. In fact, a quasi-Hessian manifold is well characterized by $h$ and $C$ ; this will be discussed within the theory of (weak) contrast functions (see §4.3).

As is well known, for a dually flat manifold $M$ , the family of $\alpha$ -connections is defined by

\nabla^{(\alpha)}=\frac{1+\alpha}{2}\nabla+\frac{1-\alpha}{2}\nabla^{*}

$(\alpha\in\mathbb{R}$ ). Namely, it deforms the Levi-Civita connection linearly using $T$ . When $\alpha=\pm 1$ , $\nabla,\nabla^{*}$ are recovered. The connections $\nabla^{(\alpha)}$ and $\nabla^{(-\alpha)}$ are mutually dual, and they form the so-called $\alpha$ -geometry [1, 20]. For a quasi-Hessian manifold $M$ , we have connections on $E$ and $E^{\prime}$ , but none on $TM$ , thus there is no direct analogue of $\alpha$ -geometry. Nevertheless, as an attempt, we define a new $(0,3)$ -tensor

	$\displaystyle N^{(\alpha)}(X,Y,Z)$	$\displaystyle:=$	$\displaystyle\frac{1+\alpha}{2}\left[\mbox{$\nabla^{E}$-part}\right]+\frac{1-\alpha}{2}\left[\mbox{$\nabla^{E^{\prime}}$-part}\right]$
		$\displaystyle=$	$\displaystyle\frac{1}{2}Xh(Y,Z)-\frac{\alpha}{2}C(X,Y,Z).$

Obviously, $N^{(-1)}(X,Y,Z)$ is the $\nabla^{E^{\prime}}$ -part, $N^{(1)}(X,Y,Z)$ is the $\nabla^{E}$ -part multiplied by $(-1)$ , and a sort of duality holds:

Xh(Y,Z)=N^{(\alpha)}(X,Y,Z)+N^{(-\alpha)}(X,Y,Z).

In general, $N^{(\alpha)}$ is not totally symmetric, because $Xh(Y,Z)$ is not. If either $\Phi_{p}$ or $\Phi^{\prime}_{p}$ is an isomorphism, then we may take a possibly degenerate local (dual) potential around $p$ (i.e., $I$ or $J=\emptyset$ ) as generating function $g$ ; thus $h$ is given by the Hessian of the potential, and hence $Xh(Y,Z)$ is symmetric, and so is $N^{(\alpha)}$ . Furthermore, if $h$ is non-degenerate, i.e., $M$ is a dually flat manifold, we completely recover $\alpha$ -geometry.

4. Divergence

Let $(M,h,(E,\nabla^{E}),(E^{\prime},\nabla^{E^{\prime}}))$ be a quasi-Hessian manifold throughout this section.

4.1. Geodesic-like curves

Let $c:I\to M$ be a curve, where $I\;(\not=\emptyset)\subset\mathbb{R}$ is an open interval, and set $\dot{c}(t):=\frac{d}{dt}c(t)\in T_{c(t)}M$ , the velocity vector ( $t\in I$ ).

Definition 4.1.

A curve $c:I\to M$ is called an $m$ -curve if it is an immersion ( $\dot{c}(t)\not=0$ ) and satisfies that at every $t\in I$ , vectors of $E^{\prime}_{c(t)}$

\Phi^{\prime}\circ\dot{c}(t),\;\;\nabla^{E^{\prime}}_{\dot{c}}(\Phi^{\prime}\circ\dot{c})(t),\;\;(\nabla^{E^{\prime}}_{\dot{c}})^{2}(\Phi^{\prime}\circ\dot{c})(t),\;\cdots

are not simultaneously zero and any two are linearly dependent. Also an $e$ -curve is defined in the same way by replacing $\Phi^{\prime}$ and $E^{\prime}$ by $\Phi$ and $E$ , respectively.

Suppose that the curve is given in a local model, $c:I\to L_{\alpha}$ . We denote by

\mbox{\boldmath$p$}(t):=\pi^{m}_{1}\circ c(t)\in\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$p$}$}}

the image via the $m$ -Lagrange map $\pi^{m}_{1}:L_{\alpha}\to\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$p$}$}}$ . Note that $E^{\prime}_{p}$ is canonically isomorphic to $\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$p$}$}}$ by linear projection along the $z^{\prime}$ -axis. As long as $\Phi^{\prime}\circ\dot{c}(t)$ does not vanish, the velocity vector $\dot{\mbox{\boldmath$p$}}(t)$ does not vanish and the acceleration vector $\ddot{\mbox{\boldmath$p$}}(t)$ is parallel to the velocity (it can be $0$ ) by the condition in Definition 4.1. Hence $\mbox{\boldmath$p$}(t)$ moves on a straight line in $\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$p$}$}}$ , i.e., $c(t)$ is a re-parametrization of an $m$ -geodesic (a geodesic with respect to $\nabla^{*}=\nabla^{E^{\prime}}$ ). Trouble occurs when $\Phi^{\prime}\circ\dot{c}(t_{0})=0$ at some $t_{0}$ . Then $\dot{\mbox{\boldmath$p$}}(t_{0})=0$ , but by the condition for an $m$ -curve, some higher derivative is non-zero; let $k$ be the smallest order with $\frac{d^{k}}{dt^{k}}\mbox{\boldmath$p$}(t_{0})\not=0$ . The vector $\frac{d^{k+1}}{dt^{k+1}}\mbox{\boldmath$p$}(t)$ is parallel to $\frac{d^{k}}{dt^{k}}\mbox{\boldmath$p$}(t)$ , so we see again that $\mbox{\boldmath$p$}(t)$ moves on a straight line, but it meets the $m$ -caustics at $t=t_{0}$ : there it stops once and then turns back or goes forward along the same line, according to whether $k$ is even or odd, see Fig. 3 (cf. Examples 3.4 and 3.5). We choose a direction vector $\mbox{\boldmath$m$}_{c}$ of the straight line. For an $e$ -curve $c(t)$ , everything is the same, and we denote by $\mbox{\boldmath$e$}_{c}$ a direction vector of the corresponding line in $\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$x$}$}}$ .
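The stop-and-turn-back behaviour can be watched in the toy model used later in Example 4.9, $f=x_{1}^{3}/3+x_{2}^{2}/2$ . The following is a minimal sketch, not part of the original text: the function names are hypothetical, and the curve is the parabola $x_{2}=x_{1}^{2}/2$ , whose image under the $m$ -Lagrange map lies on the line $p_{2}=p_{1}/2$ . Here the first non-vanishing derivative of $\mbox{\boldmath$p$}(t)$ at $t_{0}=0$ has order $k=2$ , so the image point should turn back.

```python
from fractions import Fraction

def m_lagrange(x1, x2):
    """m-Lagrange map x -> p = grad f(x) for the toy potential f = x1^3/3 + x2^2/2."""
    return (x1 * x1, x2)

def curve(t):
    """The parabola x2 = x1^2 / 2, parametrised by x1 = t."""
    return (t, t * t / 2)

def parallel(v, w):
    """True if the plane vectors v and w are linearly dependent."""
    return v[0] * w[1] - v[1] * w[0] == 0

m = (2, 1)  # direction vector of the image line p2 = p1 / 2
ts = [Fraction(k, 2) for k in range(-4, 5)]
images = [m_lagrange(*curve(t)) for t in ts]

# Every image point lies on the line through the origin spanned by m ...
stays_on_line = all(parallel(p, m) for p in images)
# ... and the images at -t and +t coincide: p(t) reaches the caustic
# at t = 0 and then runs back along the same line.
turns_back = all(m_lagrange(*curve(-t)) == m_lagrange(*curve(t)) for t in ts)

print(stays_on_line, turns_back)
```

Exact rational arithmetic is used so that the collinearity test is an equality rather than a tolerance check.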

Remark 4.2.

(1) Two arbitrary points of $M$ cannot always be joined by a single $m$ -curve, but only by a piecewise $m$ -curve. In fact, in Example 3.4, one can easily find two such points on the $m$ -wavefront (Fig. 3, left). The same holds for $e$ -curves. (2) Take coordinates $(\mbox{\boldmath$x$}_{I},\mbox{\boldmath$p$}_{J})$ for a local model $L_{\alpha}$ . It is easy to see that any $e$ - or $m$ -curve satisfies a certain differential equation (like the geodesic equation) written in terms of $h=(h_{ij})$ and $C=(C_{ijk})$ . In §3.4, we have introduced the $\alpha$ -family of cubic tensors $N^{(\alpha)}$ . Thus we may consider an $\alpha$ -analogue of $e/m$ -curves; indeed, over $M-\Sigma$ , they are the same as geodesics with respect to $\nabla^{(\alpha)}$ and $\nabla^{(-\alpha)}$ .

The following definition does not depend on the choices of $L_{\alpha}$ and direction vectors.

Definition 4.3.

Let $c_{e}$ , $c_{m}$ be an $e$ -curve and an $m$ -curve with direction vectors $\mbox{\boldmath$e$}$ , $\mbox{\boldmath$m$}$ as above. Let $S$ be a submanifold of $M$ and suppose that $c_{m}$ meets $S$ at $q\in S$ . We say that $c_{m}$ and $S$ are orthogonal at $q$ if

\displaystyle\mbox{\boldmath$m$}^{T}d\mbox{\boldmath$x$}(u)=0\;\;\;\mbox{\rm for any $u\in T_{q}S$}

where $(\mbox{\boldmath$x$},\mbox{\boldmath$p$},z)$ is the coordinates of $\mathbb{R}^{2n+1}$ for a local model $L_{\alpha}$ containing $q$ . Similarly, $c_{e}$ and $S$ are orthogonal at $q$ if $\mbox{\boldmath$e$}^{T}d\mbox{\boldmath$p$}(u)=0$ for any $u$ . Furthermore, we say that $c_{e}$ and $c_{m}$ are strictly orthogonal at $q$ if $\mbox{\boldmath$e$}^{T}\mbox{\boldmath$m$}=0$ .

If $q\not\in\Sigma$ , the above definition of the orthogonality of $c_{m}$ and $S$ is the same as the orthogonality with respect to the metric $h$ . In fact, taking a regular model around $q$ , the Hessian $H(q)$ is non-degenerate, and thus

h(\dot{c}(t),u)=\dot{\mbox{\boldmath$x$}}(t)^{T}Hd\mbox{\boldmath$x$}(u)=(H\dot{\mbox{\boldmath$x$}}(t))^{T}d\mbox{\boldmath$x$}(u)=\dot{\mbox{\boldmath$p$}}(t)^{T}d\mbox{\boldmath$x$}(u)=k\mbox{\boldmath$m$}^{T}d\mbox{\boldmath$x$}(u)

for some $k\not=0$ . However, if $q\in\Sigma$ , the meaning is different in general, for it can happen that $\dot{\mbox{\boldmath$p$}}(t_{0})=0$ but $\mbox{\boldmath$m$}\not=0$ (in this case, $\mbox{\boldmath$m$}$ is determined by some higher derivative of $\mbox{\boldmath$p$}(t)$ ). The reason why we define strict orthogonality is the same: $\mbox{\boldmath$e$}$ and $\mbox{\boldmath$m$}$ may not be determined by velocity vectors.

4.2. Canonical divergence

Let $L$ be a Legendre submanifold of $\mathbb{R}^{2n+1}$ . We denote coordinates by

p=(\mbox{\boldmath$x$}(p),\mbox{\boldmath$p$}(p),z(p))\in\mathbb{R}^{2n+1},\quad z^{\prime}(p)=\mbox{\boldmath$p$}(p)^{T}\mbox{\boldmath$x$}(p)-z(p)\in\mathbb{R}.

Definition 4.4.

The canonical divergence $\mathcal{D}=\mathcal{D}_{L}:L\times L\to\mathbb{R}$ is defined by

\displaystyle\mathcal{D}(p,q)=z(p)+z^{\prime}(q)-\mbox{\boldmath$x$}(p)^{T}\mbox{\boldmath$p$}(q).

Note that $\mathcal{D}(p,p)=0$ and that $\mathcal{D}$ is asymmetric, $\mathcal{D}(p,q)\not=\mathcal{D}(q,p)$ , in general. In particular, if $L$ is a regular model with positive definite Hessian metric, this is nothing but the Bregman divergence for some convex potential $z=f(\mbox{\boldmath$x$})$ ,

\mathcal{D}(p,q)=f(\mbox{\boldmath$x$}(p))+\varphi(\mbox{\boldmath$p$}(q))-\mbox{\boldmath$x$}(p)^{T}\mbox{\boldmath$p$}(q),

where $z^{\prime}=\varphi(\mbox{\boldmath$p$})$ is the Legendre transform of the potential [1].
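As a small illustration of Definition 4.4, the sketch below evaluates the canonical divergence on the toy model $f=x_{1}^{3}/3+x_{2}^{2}/2$ of Example 4.9, whose Hessian is not positive definite. The representation of a point as a triple $(\mbox{\boldmath$x$},\mbox{\boldmath$p$},z)$ and the helper names are assumptions made for this example only.

```python
from fractions import Fraction

def toy_point(x1, x2):
    """Point (x, p, z) of the Legendre submanifold z = f(x), p = grad f(x),
    for the toy potential f = x1^3/3 + x2^2/2."""
    x1, x2 = Fraction(x1), Fraction(x2)
    return ((x1, x2), (x1 ** 2, x2), x1 ** 3 / 3 + x2 ** 2 / 2)

def dot(u, v):
    """Euclidean pairing of two coordinate tuples."""
    return sum(a * b for a, b in zip(u, v))

def canonical_divergence(P, Q):
    """D(p, q) = z(p) + z'(q) - x(p)^T p(q), with z' = p^T x - z."""
    (xp, _, zp), (xq, pq, zq) = P, Q
    z_dual_q = dot(pq, xq) - zq
    return zp + z_dual_q - dot(xp, pq)

P, Q = toy_point(1, 0), toy_point(2, 1)
print(canonical_divergence(P, P))   # vanishes on the diagonal
print(canonical_divergence(P, Q), canonical_divergence(Q, P))  # asymmetric
print(canonical_divergence(toy_point(-3, 0), toy_point(1, 0)))
```

In this nonconvex example the last value is negative, which cannot happen for the Bregman divergence of a convex potential.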

Lemma 4.5.

The canonical divergence $\mathcal{D}_{L}$ is invariant under affine Legendre equivalence, i.e., if Legendre submanifolds $L_{1}$ and $L_{2}$ of $\mathbb{R}^{2n+1}$ are affine Legendre equivalent via $\mathcal{L}_{F}$ , then it holds that

\mathcal{D}_{L_{1}}=\mathcal{D}_{L_{2}}\circ(\mathcal{L}_{F}\times\mathcal{L}_{F})\quad\mbox{on $L_{1}\times L_{1}$.}

Proof : Suppose that $\mathcal{L}_{F}:\mathbb{R}^{2n+1}\to\mathbb{R}^{2n+1}$ is given by

(\mbox{\boldmath$u$},\mbox{\boldmath$v$},w)=\mathcal{L}_{F}(\mbox{\boldmath$x$},\mbox{\boldmath$p$},z)=(A\mbox{\boldmath$x$}+\mbox{\boldmath$b$},A^{\prime}\mbox{\boldmath$p$}+\mbox{\boldmath$b$}^{\prime},z+\mbox{\boldmath$c$}^{T}\mbox{\boldmath$x$}+d)

with $A^{\prime}=(A^{T})^{-1}$ , $\mbox{\boldmath$b$}^{\prime}=A^{\prime}\mbox{\boldmath$c$}$ and $w^{\prime}=\mbox{\boldmath$v$}^{T}\mbox{\boldmath$u$}-w$ , and $\mathcal{L}_{F}(L_{1})=L_{2}$ . Then

	$\displaystyle\mathcal{D}_{L_{2}}(\mathcal{L}_{F}(p),\mathcal{L}_{F}(q))$
	$\displaystyle\quad=w(p)+w^{\prime}(q)-\mbox{\boldmath$u$}(p)^{T}\mbox{\boldmath$v$}(q)$
	$\displaystyle\quad=w(p)-w(q)+\mbox{\boldmath$v$}(q)^{T}(\mbox{\boldmath$u$}(q)-\mbox{\boldmath$u$}(p))$
	$\displaystyle\quad=z(p)-z(q)+\mbox{\boldmath$c$}^{T}(\mbox{\boldmath$x$}(p)-\mbox{\boldmath$x$}(q))+(A^{\prime}(\mbox{\boldmath$p$}(q)+\mbox{\boldmath$c$}))^{T}(A(\mbox{\boldmath$x$}(q)-\mbox{\boldmath$x$}(p)))$
	$\displaystyle\quad=z(p)-z(q)+\mbox{\boldmath$p$}(q)^{T}(\mbox{\boldmath$x$}(q)-\mbox{\boldmath$x$}(p))$
	$\displaystyle\quad=z(p)+z^{\prime}(q)-\mbox{\boldmath$x$}(p)^{T}\mbox{\boldmath$p$}(q)$
	$\displaystyle\quad=\mathcal{D}_{L_{1}}(p,q).$

This completes the proof. $\Box$
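The computation in the proof can be replayed numerically. The sketch below applies a map of the form $\mathcal{L}_{F}$ with $A^{\prime}=(A^{T})^{-1}$ and $\mbox{\boldmath$b$}^{\prime}=A^{\prime}\mbox{\boldmath$c$}$ to a few points of the toy model $f=x_{1}^{3}/3+x_{2}^{2}/2$ and compares the divergences before and after. The particular values of $A,\mbox{\boldmath$b$},\mbox{\boldmath$c$},d$ and all function names are arbitrary choices for illustration.

```python
from fractions import Fraction

def toy_point(x1, x2):
    """Point (x, p, z) with p = grad f(x), z = f(x), f = x1^3/3 + x2^2/2."""
    x1, x2 = Fraction(x1), Fraction(x2)
    return ((x1, x2), (x1 ** 2, x2), x1 ** 3 / 3 + x2 ** 2 / 2)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def divergence(P, Q):
    """z(p) + z'(q) - x(p)^T p(q) in whatever coordinates the triples carry."""
    (xp, _, zp), (xq, pq, zq) = P, Q
    return zp + (dot(pq, xq) - zq) - dot(xp, pq)

def mat_vec(A, v):
    return tuple(dot(row, v) for row in A)

def inverse_transpose(A):
    """(A^T)^{-1} for an invertible 2x2 matrix A."""
    (a, b), (c, d) = A
    det = Fraction(a * d - b * c)
    return ((d / det, -c / det), (-b / det, a / det))

def affine_legendre_map(A, b, c, d):
    """(x, p, z) -> (A x + b, A' p + b', z + c^T x + d) with A' = (A^T)^{-1}, b' = A' c."""
    A_dual = inverse_transpose(A)
    b_dual = mat_vec(A_dual, c)
    def apply(point):
        x, p, z = point
        u = tuple(s + t for s, t in zip(mat_vec(A, x), b))
        v = tuple(s + t for s, t in zip(mat_vec(A_dual, p), b_dual))
        return (u, v, z + dot(c, x) + d)
    return apply

L_F = affine_legendre_map(A=((2, 1), (1, 1)), b=(3, -1), c=(1, 4), d=5)
points = [toy_point(0, 0), toy_point(1, -2), toy_point(-3, 2), toy_point(2, 1)]
invariant = all(divergence(L_F(P), L_F(Q)) == divergence(P, Q)
                for P in points for Q in points)
print(invariant)
```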

Let $(M,\mathcal{U}=\{L_{\alpha}\})$ be a quasi-Hessian manifold obtained by gluing local models and put $\varDelta_{M}=\{(p,p)\in M\times M\}$ . Let $U(\varDelta_{M})$ denote the subset of $M\times M$ consisting of points $(p,q)$ such that there is some local model $L_{\alpha}$ containing $p,q$ . Since $M$ is endowed with the quotient topology, $U(\varDelta_{M})$ is an open neighborhood of the diagonal $\varDelta_{M}$ .

Definition 4.6.

For $p,q\in L_{\alpha}$ we set $\mathcal{D}_{M}(p,q):=\mathcal{D}_{L_{\alpha}}(p,q)$ . By Lemma 4.5 this does not depend on the choice of $\alpha$ , so $\mathcal{D}_{M}:U(\varDelta_{M})\to\mathbb{R}$ is well-defined. We call it the canonical divergence of $M$ .

If $M$ is connected and simply connected, then the canonical divergence of $M$ can be extended to the entire space, so we obtain $\mathcal{D}_{M}:M\times M\to\mathbb{R}$ .

In Amari-Nagaoka’s theory of dually flat structure [1, 2], there are two important theorems, called the extended Pythagorean theorem and the projection theorem. They play a central role in applications to statistical inference, the em-algorithm, machine learning and so on. These are immediately generalized to our singular setup. In the following two theorems, assume that $M$ is a local model (i.e. $M=L\subset\mathbb{R}^{2n+1}$ ) or a connected and simply-connected quasi-Hessian manifold. In either case, the canonical divergence $\mathcal{D}\,(=\mathcal{D}_{M})$ is defined on $M\times M$ .

We say that two points $p,q$ are joined by a curve $c:I\to M$ if there are $t_{0},t_{1}\in I$ with $c(t_{0})=p$ and $c(t_{1})=q$ .

Theorem 4.7.

(Extended Pythagorean Theorem) Let $p,q,r\in M$ be three distinct points such that $p$ and $q$ are joined by an $e$ -curve $c_{e}$ , $q$ and $r$ are joined by an $m$ -curve $c_{m}$ , and furthermore, $c_{e}$ and $c_{m}$ are strictly orthogonal at $q$ . Then it holds that

\displaystyle\mathcal{D}(p,r)=\mathcal{D}(p,q)+\mathcal{D}(q,r).

Proof : Since $\mathcal{D}(q,q)=z(q)+z^{\prime}(q)-\mbox{\boldmath$x$}(q)^{T}\mbox{\boldmath$p$}(q)=0$ , we see that

\mathcal{D}(p,r)-\mathcal{D}(p,q)-\mathcal{D}(q,r)=-(\mbox{\boldmath$x$}(p)-\mbox{\boldmath$x$}(q))^{T}(\mbox{\boldmath$p$}(r)-\mbox{\boldmath$p$}(q)).

The images of the maps $\pi^{e}_{1}\circ c_{e}$ and $\pi^{m}_{1}\circ c_{m}$ lie on lines with direction vectors, say $\mbox{\boldmath$e$},\mbox{\boldmath$m$}$ , respectively. Then

\mbox{\boldmath$x$}(p)-\mbox{\boldmath$x$}(q)=k_{0}\mbox{\boldmath$e$},\quad\mbox{\boldmath$p$}(r)-\mbox{\boldmath$p$}(q)=k_{1}\mbox{\boldmath$m$}

for some $k_{0},k_{1}\in\mathbb{R}$ . The assumption is $\mbox{\boldmath$e$}^{T}\mbox{\boldmath$m$}=0$ , thus the equality follows. $\Box$

Theorem 4.8.

(Projection Theorem) Let $S$ be a submanifold of $M$ and $c_{m}:[0,1]\to M$ an $m$ -curve with $q=c_{m}(1)\in S$ . Put $p=c_{m}(0)\in M$ . Then, $c_{m}$ and $S$ are orthogonal at $q$ if and only if $q$ is a critical point of the function $F=\mathcal{D}(-,p):S\to\mathbb{R}$ . The same holds for an $e$ -curve $c_{e}$ and $F=\mathcal{D}(p,-)$ .

Proof : Take a generating function $g(\mbox{\boldmath$x$}_{I},\mbox{\boldmath$p$}_{J})$ around $q$ . Recall $\mbox{\boldmath$p$}_{I}=\frac{\partial g}{\partial\mbox{\boldmath$x$}_{I}}$ , $\mbox{\boldmath$x$}_{J}=-\frac{\partial g}{\partial\mbox{\boldmath$p$}_{J}}$ and $z=\mbox{\boldmath$p$}_{J}^{T}\mbox{\boldmath$x$}_{J}+g(\mbox{\boldmath$x$}_{I},\mbox{\boldmath$p$}_{J})$ . Let $\gamma=\gamma(s)$ be an immersed curve on $S$ with $\gamma(0)=q$ . On this curve, we have $\frac{d}{ds}g(\mbox{\boldmath$x$}_{I},\mbox{\boldmath$p$}_{J})=\mbox{\boldmath$p$}_{I}^{T}(\frac{d}{ds}\mbox{\boldmath$x$}_{I})-\mbox{\boldmath$x$}_{J}^{T}(\frac{d}{ds}\mbox{\boldmath$p$}_{J})$ and $\frac{dz}{ds}=(\frac{d}{ds}\mbox{\boldmath$p$}_{J})^{T}\mbox{\boldmath$x$}_{J}+\mbox{\boldmath$p$}_{J}^{T}(\frac{d}{ds}\mbox{\boldmath$x$}_{J})+\frac{d}{ds}g=\mbox{\boldmath$p$}^{T}\frac{d}{ds}\mbox{\boldmath$x$}$ . Therefore, we see

	$\displaystyle\frac{d(F\circ\gamma)}{ds}(s)$	$\displaystyle=$	$\displaystyle\frac{d}{ds}(z(\gamma(s))+z^{\prime}(p)-\mbox{\boldmath$p$}(p)^{T}\mbox{\boldmath$x$}(\gamma(s)))$
		$\displaystyle=$	$\displaystyle(\mbox{\boldmath$p$}(\gamma(s))-\mbox{\boldmath$p$}(p))^{T}\frac{d(\mbox{\boldmath$x$}\circ\gamma)}{ds}(s).$

At $s=0$ , the vector $\mbox{\boldmath$p$}(q)-\mbox{\boldmath$p$}(p)$ is a scalar multiple of the direction vector $\mbox{\boldmath$m$}$ of the line in $\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$p$}$}}$ onto which the $m$ -curve $c_{m}$ is projected, and $\frac{d}{ds}\mbox{\boldmath$x$}=d\pi^{e}_{1}(\frac{d\gamma}{ds}(0))\in\mathbb{R}^{n}_{\mbox{\tiny$\mbox{\boldmath$x$}$}}$ . Hence, the orthogonality of $S$ and $c_{m}$ at $q$ is equivalent to $\frac{d}{ds}(F\circ\gamma)(0)=0$ for arbitrary $\gamma$ , which means that $F$ is critical at $q$ . $\Box$
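The criterion can be tested on the toy model $f=x_{1}^{3}/3+x_{2}^{2}/2$ . In the sketch below (an illustration under assumptions, with hypothetical names), $p$ and $q$ have $\mbox{\boldmath$x$}$ -coordinates $(2,\frac{5}{2})$ and $(1,1)$ and are joined by the $m$ -curve $x_{2}=\frac{1}{2}x_{1}^{2}+\frac{1}{2}$ with direction $\mbox{\boldmath$m$}=(2,1)^{T}$ . Two curves through $q$ play the role of $S$ : one with tangent $(1,-2)$ , orthogonal to $c_{m}$ , and one with tangent $(1,1)$ . Since $F$ is a polynomial of degree at most $4$ along both curves, a five-point difference quotient gives the derivative exactly.

```python
from fractions import Fraction

def toy_point(x1, x2):
    """Point (x, p, z) with p = grad f(x), z = f(x), f = x1^3/3 + x2^2/2."""
    x1, x2 = Fraction(x1), Fraction(x2)
    return ((x1, x2), (x1 ** 2, x2), x1 ** 3 / 3 + x2 ** 2 / 2)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def divergence(P, Q):
    """Canonical divergence z(p) + z'(q) - x(p)^T p(q)."""
    (xp, _, zp), (xq, pq, zq) = P, Q
    return zp + (dot(pq, xq) - zq) - dot(xp, pq)

def derivative_at_zero(func, step=Fraction(1)):
    """Five-point central difference; exact for polynomials of degree <= 4."""
    h = step
    return (func(-2 * h) - 8 * func(-h) + 8 * func(h) - func(2 * h)) / (12 * h)

p = toy_point(2, Fraction(5, 2))   # start of the m-curve
q = toy_point(1, 1)                # end of the m-curve, lying on S

def S_orthogonal(s):
    """A curve through q with tangent (1, -2), orthogonal to m = (2, 1)."""
    return toy_point(1 + s, 1 - 2 * s + s * s)

def S_oblique(s):
    """A curve through q with tangent (1, 1), not orthogonal to m."""
    return toy_point(1 + s, 1 + s)

# F = D(-, p) restricted to each curve, differentiated at q (s = 0).
dF_orthogonal = derivative_at_zero(lambda s: divergence(S_orthogonal(s), p))
dF_oblique = derivative_at_zero(lambda s: divergence(S_oblique(s), p))
print(dF_orthogonal, dF_oblique)
```

The first derivative vanishes and the second does not, in agreement with the formula $(\mbox{\boldmath$p$}(q)-\mbox{\boldmath$p$}(p))^{T}\frac{d}{ds}\mbox{\boldmath$x$}$ derived in the proof.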

Example 4.9.

We check the Pythagorean theorem for the toy model of Example 3.4. Let

f(\mbox{\boldmath$x$})=\frac{x_{1}^{3}}{3}+\frac{x_{2}^{2}}{2}

and use affine local coordinates $\mbox{\boldmath$x$}=(x_{1},x_{2})$ for $L$ . The $m$ -Lagrange map is $(x_{1},x_{2})\mapsto(p_{1},p_{2})=(x_{1}^{2},x_{2})$ , and $\Sigma$ is the $x_{2}$ -axis. We have already computed the dual potential $z^{\prime}$ , thus for $P:=\mbox{\boldmath$x$}(p)=(a_{1},a_{2})$ and $Q:=\mbox{\boldmath$x$}(q)=(b_{1},b_{2})$ , we have

\mathcal{D}(P,Q)=\frac{a_{1}^{3}}{3}+\frac{a_{2}^{2}}{2}+\frac{2b_{1}^{3}}{3}+\frac{b_{2}^{2}}{2}-a_{1}b_{1}^{2}-a_{2}b_{2}.

A straight line $p_{2}=ap_{1}+b$ on $\mathbb{R}^{2}_{\mbox{\tiny$\mbox{\boldmath$p$}$}}$ corresponds to a parabola $x_{2}=ax_{1}^{2}+b$ on $\mathbb{R}^{2}_{\mbox{\tiny$\mbox{\boldmath$x$}$}}$ (i.e. an $m$ -curve). Now, for example, take an $m$ -curve $c_{m}$ : $x_{2}=\frac{1}{2}x_{1}^{2}$ ( $\mbox{\boldmath$m$}=(2,1)^{T}$ ), and two points $Q:=(u,\frac{u^{2}}{2})$ and $R:=(t,\frac{t^{2}}{2})$ lying on it. Take a point $P:=(s,-2(s-u)+\frac{u^{2}}{2})$ on the straight line on $\mathbb{R}^{2}_{\mbox{\tiny$\mbox{\boldmath$x$}$}}$ (i.e. an $e$ -curve) passing through $Q$ directed by $\mbox{\boldmath$e$}=(1,-2)^{T}$ with $\mbox{\boldmath$e$}^{T}\mbox{\boldmath$m$}=0$ . Then $\triangle PQR$ satisfies the condition, and we see $\mathcal{D}(P,Q)+\mathcal{D}(Q,R)=\mathcal{D}(P,R)$ . It does not matter whether the point $Q$ lies on $\Sigma$ or not.
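The identity can also be confirmed by exact arithmetic. The sketch below uses the closed form of $\mathcal{D}(P,Q)$ given above and the triangle $P,Q,R$ of this example; the function names and the sample values of $(s,u,t)$ are hypothetical choices, and two of the samples take $u=0$ so that $Q$ lies on $\Sigma$ .

```python
from fractions import Fraction

def divergence(a, b):
    """D(P, Q) of Example 4.9 in the coordinates a = x(P), b = x(Q)."""
    a1, a2 = a
    b1, b2 = b
    return (a1 ** 3 / 3 + a2 ** 2 / 2 + 2 * b1 ** 3 / 3 + b2 ** 2 / 2
            - a1 * b1 ** 2 - a2 * b2)

def triangle(s, u, t):
    """Q, R on the m-curve x2 = x1^2/2; P on the line through Q directed by e = (1, -2)."""
    s, u, t = Fraction(s), Fraction(u), Fraction(t)
    Q = (u, u * u / 2)
    R = (t, t * t / 2)
    P = (s, -2 * (s - u) + u * u / 2)
    return P, Q, R

def pythagorean_defect(s, u, t):
    """D(P, R) - D(P, Q) - D(Q, R); zero when the Pythagorean relation holds."""
    P, Q, R = triangle(s, u, t)
    return divergence(P, R) - divergence(P, Q) - divergence(Q, R)

samples = [(1, 0, 1), (3, 0, -2), (-1, 2, 5), (Fraction(1, 2), -3, 4)]
defects = [pythagorean_defect(*sample) for sample in samples]
print(all(d == 0 for d in defects))
```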

4.3. Weak contrast functions

First we recall the definition of contrast functions (Eguchi [13]). Let $M$ be a manifold and $\rho:U\to\mathbb{R}$ a function defined on an open neighborhood $U$ of the diagonal $\mathit{\Delta}_{M}\subset M\times M$ . Given vector fields $X_{i}\,(1\leq i\leq k)$ , $Y_{j}\,(1\leq j\leq l)$ on $M$ , we set a function

\rho[X_{1}\cdots X_{k}|Y_{1}\cdots Y_{l}]:M\to\mathbb{R}

by assigning to $p\in M$ the value

(X_{1})_{p}\cdots(X_{k})_{p}(Y_{1})_{q}\cdots(Y_{l})_{q}\left(\rho(p,q)\right)|_{p=q}.

We also write $\rho[X|-](r)=X_{p}\rho(p,q)|_{p=q=r}$ and so on. We call $\rho:U\to\mathbb{R}$ a contrast function of $M$ if it satisfies that

(i)\;\;\rho[-|-]=\rho(p,p)=0\qquad(ii)\;\;\rho[X|-]=\rho[-|X]=0,

(iii) $h(X,Y):=-\rho[X|Y]$ is a pseudo-Riemannian metric on $M$ .

We call $\rho$ a weak contrast function if it satisfies only (i) and (ii).

Given a contrast function $\rho$ , affine connections are defined by

h(\nabla_{X}Y,Z):=-\rho[XY|Z],\quad h(Y,\nabla^{*}_{X}Z):=-\rho[Y|XZ].

Those connections are torsion-free, mutually dual with respect to $h$ , and $\nabla h$ is symmetric, and therefore, $(M,h,\nabla)$ becomes a statistical manifold [13, 20]. Conversely, given a statistical manifold $M$ , one can find a contrast function which reproduces the metric and connections [19] – it is actually shown in [19] that for a symmetric $(0,2)$ -tensor $h$ (i.e., a possibly degenerate metric) and a symmetric $(0,3)$ -tensor $c$ , one can find a weak contrast function $\rho:U\to\mathbb{R}$ which satisfies that

	$\displaystyle h(X,Y)$	$\displaystyle=$	$\displaystyle-\rho[X|Y]\;(=\rho[XY|-]=\rho[-|XY]),$
	$\displaystyle c(X,Y,Z)$	$\displaystyle=$	$\displaystyle-\rho[Z|XY]+\rho[XY|Z].$

Among statistical manifolds, Hessian manifolds admit a notable property: the Bregman divergence is a contrast function, and it reproduces the dually flat structure. That is extended to our quasi-Hessian manifold and its canonical divergence.

Theorem 4.10.

For a quasi-Hessian manifold $M$ , the canonical divergence $\mathcal{D}_{M}$ is a weak contrast function, and reproduces the quasi-Hessian metric and the canonical cubic tensor by

	$\displaystyle h(X,Y)$	$\displaystyle=$	$\displaystyle-\mathcal{D}_{M}[X|Y],$
	$\displaystyle C(X,Y,Z)$	$\displaystyle=$	$\displaystyle\mathcal{D}_{M}[XY|Z]-\mathcal{D}_{M}[Z|XY].$

Proof : Since this is a local property, take a local model $L_{\alpha}\subset\mathbb{R}^{2n+1}$ . Suppose that $g(\mbox{\boldmath$x$}_{I},\mbox{\boldmath$p$}_{J})$ is a generating function for $L_{\alpha}$ around $p\in L_{\alpha}$ . Then $(\mbox{\boldmath$x$}_{I},\mbox{\boldmath$p$}_{J})$ is a system of local coordinates for $L_{\alpha}$ around $p$ , and it holds that $x_{j}(q)=-\frac{\partial g}{\partial p_{j}}(q)$ $(j\in J)$ , $p_{i}(q)=\frac{\partial g}{\partial x_{i}}(q)$ $(i\in I)$ , $z(q)=\mbox{\boldmath$p$}_{J}(q)^{T}\mbox{\boldmath$x$}_{J}(q)+g(q)$ for $q\in L_{\alpha}$ close to $p$ . Hence,

	$\displaystyle\mathcal{D}_{M}(p,q)$	$\displaystyle=$	$\displaystyle z(p)-z(q)+\mbox{\boldmath$p$}(q)^{T}(\mbox{\boldmath$x$}(q)-\mbox{\boldmath$x$}(p))$
		$\displaystyle=$	$\displaystyle g(p)-g(q)+\mbox{\boldmath$x$}_{J}(p)^{T}(\mbox{\boldmath$p$}_{J}(p)-\mbox{\boldmath$p$}_{J}(q))+\mbox{\boldmath$p$}_{I}(q)^{T}(\mbox{\boldmath$x$}_{I}(q)-\mbox{\boldmath$x$}_{I}(p)).$

Let $\partial_{k}$ denote $\frac{\partial}{\partial x_{k}}$ if $k\in I$ and $\frac{\partial}{\partial p_{k}}$ if $k\in J$ , for short. Then

	$\displaystyle(\partial_{k})_{p}\mathcal{D}_{M}(p,q)$	$\displaystyle=$	$\displaystyle\epsilon(k)(\partial_{k}g(p)-\partial_{k}g(q))+\partial_{k}\mbox{\boldmath$x$}_{J}(p)^{T}(\mbox{\boldmath$p$}_{J}(p)-\mbox{\boldmath$p$}_{J}(q)),$
	$\displaystyle(\partial_{k})_{q}\mathcal{D}_{M}(p,q)$	$\displaystyle=$	$\displaystyle(1-\epsilon(k))(\partial_{k}g(p)-\partial_{k}g(q))+\partial_{k}\mbox{\boldmath$p$}_{I}(q)^{T}(\mbox{\boldmath$x$}_{I}(q)-\mbox{\boldmath$x$}_{I}(p)),$

where $\epsilon(k)=1$ if $k\in I$ and $0$ if $k\in J$ . It immediately follows that

\mathcal{D}_{M}[-|-]=0,\quad\mathcal{D}_{M}[\partial_{k}|-]=\mathcal{D}_{M}[-|\partial_{k}]=0,

so the divergence is a weak contrast function. Put

\epsilon(k,l)=\left\{\begin{array}[]{rl}1&(k,l\in I)\\ -1&(k,l\in J)\\ 0&\mbox{(otherwise)}.\end{array}\right.

Then a simple computation shows that

	$\displaystyle(\partial_{l})_{p}(\partial_{k})_{p}\mathcal{D}_{M}(p,q)$	$\displaystyle=$	$\displaystyle\epsilon(k,l)\partial_{l}\partial_{k}g(p)+\partial_{l}\partial_{k}\mbox{\boldmath$x$}_{J}(p)^{T}(\mbox{\boldmath$p$}_{J}(p)-\mbox{\boldmath$p$}_{J}(q)),$
	$\displaystyle(\partial_{l})_{q}(\partial_{k})_{q}\mathcal{D}_{M}(p,q)$	$\displaystyle=$	$\displaystyle\epsilon(k,l)\partial_{l}\partial_{k}g(q)+\partial_{l}\partial_{k}\mbox{\boldmath$p$}_{I}(q)^{T}(\mbox{\boldmath$x$}_{I}(q)-\mbox{\boldmath$x$}_{I}(p)).$

Hence $\mathcal{D}_{M}[\partial_{k}\partial_{l}|-]=h(\partial_{k},\partial_{l})$ . Differentiating the identity $\mathcal{D}_{M}[\partial_{k}|-]=0$ along $\partial_{l}$ gives $\mathcal{D}_{M}[\partial_{k}\partial_{l}|-]+\mathcal{D}_{M}[\partial_{k}|\partial_{l}]=0$ , so that $h(\partial_{k},\partial_{l})=-\mathcal{D}_{M}[\partial_{k}|\partial_{l}]$ . Next, differentiate the first of the two second derivatives above by $(\partial_{m})_{q}$ and the second by $(\partial_{m})_{p}$ , and use $x_{j}=-\partial_{j}g$ $(j\in J)$ and $p_{i}=\partial_{i}g$ $(i\in I)$ : we find that $\mathcal{D}_{M}[\partial_{k}\partial_{l}|\partial_{m}]$ equals $\partial_{k}\partial_{l}\partial_{m}g$ if $m\in J$ and $0$ if $m\in I$ , while $\mathcal{D}_{M}[\partial_{m}|\partial_{k}\partial_{l}]$ equals $-\partial_{k}\partial_{l}\partial_{m}g$ if $m\in I$ and $0$ if $m\in J$ . Therefore

\mathcal{D}_{M}[\partial_{k}\partial_{l}|\partial_{m}]-\mathcal{D}_{M}[\partial_{m}|\partial_{k}\partial_{l}]=\partial_{k}\partial_{l}\partial_{m}g

for any $k,l,m$ . This coincides with the cubic tensor $C$ by Proposition 3.24. $\Box$
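The weak contrast conditions and the recovery of the metric can be checked on the toy model of Example 4.9, where $h$ is the Hessian of $f=x_{1}^{3}/3+x_{2}^{2}/2$ , i.e. ${\rm diag}(2x_{1},1)$ . The sketch below is only an illustration with hypothetical helper names: it evaluates $\mathcal{D}[\partial_{k}|-]$ , $\mathcal{D}[-|\partial_{k}]$ and $\mathcal{D}[\partial_{k}|\partial_{l}]$ by five-point difference quotients, which are exact here because the divergence is a polynomial of degree $3$ .

```python
from fractions import Fraction

def divergence(a, b):
    """D(P, Q) of Example 4.9 in the coordinates a = x(P), b = x(Q)."""
    a1, a2 = a
    b1, b2 = b
    return (a1 ** 3 / 3 + a2 ** 2 / 2 + 2 * b1 ** 3 / 3 + b2 ** 2 / 2
            - a1 * b1 ** 2 - a2 * b2)

def ddt(func):
    """Five-point central difference at 0 with unit step (exact for degree <= 4)."""
    return (func(-2) - 8 * func(-1) + 8 * func(1) - func(2)) / Fraction(12)

def shift(r, k, s):
    """The point r moved by s along the k-th coordinate axis."""
    return tuple(c + s if i == k else c for i, c in enumerate(r))

def D_left(r, k):
    """D[d_k | -](r): derivative in the first argument on the diagonal."""
    return ddt(lambda s: divergence(shift(r, k, s), r))

def D_right(r, k):
    """D[- | d_k](r): derivative in the second argument on the diagonal."""
    return ddt(lambda s: divergence(r, shift(r, k, s)))

def D_mixed(r, k, l):
    """D[d_k | d_l](r): one derivative in each argument."""
    return ddt(lambda s: ddt(lambda t: divergence(shift(r, k, s), shift(r, l, t))))

def metric(r):
    """h(d_k, d_l) = -D[d_k | d_l] as a 2x2 nested list."""
    return [[-D_mixed(r, k, l) for l in range(2)] for k in range(2)]

for r in [(Fraction(0), Fraction(1)), (Fraction(2), Fraction(-1))]:
    first_order = [D_left(r, k) for k in range(2)] + [D_right(r, k) for k in range(2)]
    print(divergence(r, r), first_order, metric(r))
```

At the first sample point, which lies on $\Sigma$ , the recovered matrix is degenerate, while the first-order conditions still hold.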

5. Discussions

We briefly discuss possible directions and proposals for further research.

5.1. Pre-Frobenius structure

In mathematical physics such as string theory, there often arise manifolds endowed with commutative and associative multiplication on tangent spaces satisfying certain properties, called (several variations of) Frobenius manifolds [10]. Now, let $(M,h,C)$ be a flat Hessian manifold, i.e., the metric connection with respect to $h$ is flat. Then $M$ naturally carries a (weak) version of Frobenius structure [27, §2]. Put $C_{ijk}=C(\partial_{i},\partial_{j},\partial_{k})$ using $\nabla$ -affine coordinates, and we may take them as structure constants to define a multiplication on $T_{p}M$ :

\partial_{i}\circ\partial_{j}:=\sum_{k,l}C_{ijk}h^{kl}\partial_{l}.

Since $C$ is symmetric, the multiplication is commutative. The associativity, $(\partial_{i}\circ\partial_{j})\circ\partial_{k}=\partial_{i}\circ(\partial_{j}\circ\partial_{k})$ , is written as

\sum_{a,b}(C_{ijb}C_{kla}-C_{ila}C_{jkb})=0\qquad(\forall\,i,j,k,l),

and, somewhat surprisingly, the left-hand side coincides with the curvature tensor of the Levi-Civita connection of $h$ [11, 23]; the equation is known as the WDVV equation in string theory. Moreover, it is easy to see that the multiplication is compatible with the metric: $h(\partial_{i}\circ\partial_{j},\partial_{k})=h(\partial_{i},\partial_{j}\circ\partial_{k})$ . Then the tuple $(M,h,\circ)$ becomes a weak pre-Frobenius manifold (cf. [10, 15]). For a quasi-Hessian manifold $M$ , the symmetric cubic tensor $C=(C_{ijk})$ is defined everywhere, but $h^{kl}$ is not; even so, the WDVV equation makes sense. Then, at least for every $p\in\Sigma$ (pointwise), the quotient $T_{p}M/{\rm null}(h_{p})$ carries a Frobenius algebra structure.
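The multiplication $\partial_{i}\circ\partial_{j}=\sum_{k,l}C_{ijk}h^{kl}\partial_{l}$ can be written out at a point off $\Sigma$ . The sketch below does this for the toy potential $f=x_{1}^{3}/3+x_{2}^{2}/2$ of Example 4.9, for which $h={\rm diag}(2x_{1},1)$ and the only non-zero component of $C$ is $C_{111}=2$ . This deliberately simple case, the function names and the sample value $x_{1}=3$ are assumptions made for illustration; the code requires $x_{1}\not=0$ .

```python
from fractions import Fraction

N = 2  # dimension of the toy model

def cubic_tensor():
    """C_ijk = third derivatives of f = x1^3/3 + x2^2/2 (only C_111 = 2 is non-zero)."""
    C = [[[Fraction(0)] * N for _ in range(N)] for _ in range(N)]
    C[0][0][0] = Fraction(2)
    return C

def inverse_metric(x1):
    """Inverse of h = diag(2*x1, 1); only defined for x1 != 0, i.e. off Sigma."""
    return [[1 / (2 * Fraction(x1)), Fraction(0)], [Fraction(0), Fraction(1)]]

def structure_constants(x1):
    """a[i][j][l] = sum_k C_ijk h^{kl}, so that d_i o d_j = sum_l a[i][j][l] d_l."""
    C, h_inv = cubic_tensor(), inverse_metric(x1)
    return [[[sum(C[i][j][k] * h_inv[k][l] for k in range(N)) for l in range(N)]
             for j in range(N)] for i in range(N)]

def product(a, v, w):
    """Bilinear extension of the multiplication to tangent vectors v and w."""
    return [sum(v[i] * w[j] * a[i][j][l] for i in range(N) for j in range(N))
            for l in range(N)]

a = structure_constants(3)
basis = [[1, 0], [0, 1]]
commutative = all(product(a, v, w) == product(a, w, v) for v in basis for w in basis)
associative = all(product(a, product(a, u, v), w) == product(a, u, product(a, v, w))
                  for u in basis for v in basis for w in basis)
print(commutative, associative)
```

As $x_{1}\to 0$ the structure constant $a_{11}^{1}=1/x_{1}$ blows up, which is the reason the multiplication on $T_{p}M$ itself is not available on $\Sigma$ .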

A new pre-Frobenius structure on a certain space of probability distributions has recently been found using the Hessian geometry on convex cones and paracomplex structure in [9]. Also from the context of Poisson and paraKähler geometry, the notion of contravariant pseudo-Hessian manifolds has been introduced in [7], which is actually very close to our quasi-Hessian manifolds with degenerate potentials. Those should be mutually related.

As a different question from the above, more interesting is the local geometry of a quasi-Hessian manifold $M$ in relation to the Saito-Givental theory: under a certain condition, the germ of $M$ at a point should be a real-geometric counterpart of the analytic spectrum of a massive F-manifold (cf. [15, §3]; the analytic spectrum is a certain holomorphic Legendre submanifold of $\mathbb{C}^{2n+1}$ defined by a versal deformation of a complex isolated hypersurface singularity as its generating family). Perhaps, this was essentially posed by Arnol’d [5, §4].

5.2. Statistical inference and machine learning

Suppose that our statistical model $S$ is a curved exponential family, i.e., a submanifold of an exponential family $M$ (see Example 2.6). Let $\mathcal{D}:M\times M\to\mathbb{R}$ be the associated Bregman divergence, which is known to coincide with the Kullback-Leibler divergence

\mathcal{D}_{KL}(q,p)=\int q(u)\log\frac{q(u)}{p(u)}du

measuring an ‘asymmetrical distance’ from a distribution (density function) $q=q(u)$ to another $p=p(u)$ . A given data set $\{{\bf u}_{i}\}$ yields an observed point $\hat{p}\in M$ , and the task of statistical inference is to find $q_{0}\in S$ which best approximates the point $\hat{p}$ . Information geometry [1, 2] provides a clear geometric understanding of the maximum likelihood estimate (MLE): the MLE assigns to $\hat{p}$ the point $q_{0}\in S$ which attains the minimum of $\mathcal{D}(\cdot,\hat{p}):S\to\mathbb{R}$ , and in particular, $\hat{p}$ is projected to $q_{0}$ along an $m$ -geodesic ( $m$ -curve) orthogonal to $S$ at $q_{0}$ . We have shown that this assertion remains valid even when $M$ admits a locus $\Sigma$ where the Fisher-Rao metric is degenerate (Theorem 4.8), see Fig. 5.
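The coincidence of the Bregman divergence with the Kullback-Leibler divergence can be checked in the simplest exponential family. The sketch below uses the Bernoulli family, which is our choice for illustration and not taken from the text. It assumes that the natural parameter $\theta$ plays the role of the coordinate $\mbox{\boldmath$x$}$ , that the potential is the log-partition function $\psi(\theta)=\log(1+e^{\theta})$ , and that the dual coordinate is the mean $\eta=\psi^{\prime}(\theta)$ . With the argument order of Definition 4.4, the canonical divergence $\mathcal{D}(a,b)$ then equals $\mathcal{D}_{KL}$ from the distribution at $b$ to the distribution at $a$ .

```python
import math

def log_partition(theta):
    """Potential psi(theta) = log(1 + exp(theta)) of the Bernoulli family."""
    return math.log1p(math.exp(theta))

def mean(theta):
    """Dual coordinate eta = psi'(theta), the success probability."""
    return 1.0 / (1.0 + math.exp(-theta))

def bregman(theta_a, theta_b):
    """D(a, b) = psi(theta_a) - psi(theta_b) - eta_b * (theta_a - theta_b)."""
    return (log_partition(theta_a) - log_partition(theta_b)
            - mean(theta_b) * (theta_a - theta_b))

def kl(q, p):
    """sum_u q(u) log(q(u)/p(u)) for Bernoulli laws with success probabilities q, p."""
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))

theta_a, theta_b = 0.3, -1.2
lhs = bregman(theta_a, theta_b)
rhs = kl(mean(theta_b), mean(theta_a))
print(math.isclose(lhs, rhs, rel_tol=1e-9))
```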

If $\hat{p}$ is sufficiently close to $S$ and far from $\Sigma$ , then the asymptotic theory of estimation applies. In practice, however, we may not know whether this is the case. For instance, it often happens that $\mathcal{D}(\cdot,\hat{p})$ has multiple local minima, i.e., the maximum likelihood equation may have multiple roots. Then, as $\hat{p}$ varies when the data are renewed, catastrophe phenomena (the birth and death of minimum and maximum points) can happen. The ambiguity of root selection in MLE has been studied by practical and numerical approaches (cf. [26, §4]), while there seem to be fewer theoretical approaches so far. Our framework provides a natural way to treat it from the viewpoint of information geometry. Define

F:S\times M\to\mathbb{R}\qquad F(q,p):=\mathcal{D}(\iota(q),p)

and we may consider $F$ as a global generating family [4, p.323], i.e., it defines a Legendre submanifold of $T^{*}M\times\mathbb{R}$ by

L_{S}:=\left\{\;(p,\eta,z)\;\middle|\;\exists\,q\in S,\;\frac{\partial F}{\partial q}(q,p)=0,\;\eta=\frac{\partial F}{\partial p}(q,p),\;z=F(q,p)\right\}

where we roughly denote by $\frac{\partial F}{\partial q}$ the differential with respect to $S$ and so on. This gives a typical example of a quasi-Hessian manifold. The critical value set of the Lagrange map $\pi:L_{S}\to M$ , $\pi(p,\eta,z)=p$ , is nothing but the envelope of the family of all $m$ -curves on $M$ which are orthogonal to $S$ ; we call it the $m$ -caustics determined by $S$ . If $S$ is not $\nabla$ -flat, the $m$ -caustics usually appear (this reflects the $\nabla^{*}$ -extrinsic geometry of $S$ in $M$ ). It turns out that the catastrophe phenomenon mentioned above arises when the data manifold $D$ intersects the $m$ -caustics determined by $S$ . Conversely, for a given data manifold $D$ , we may consider the restriction of $\mathcal{D}$ to $M\times D$ and define the $e$ -caustics determined by $D$ similarly. The interaction between these $e$ - and $m$ -caustics can be involved and may affect the performance of the EM-algorithm (cf. Amari [2, Chap. 8]). Note that, in principle, the above strategy may be adapted to any divergence and any statistical model. The details will be discussed elsewhere.

As described in Amari [2, Chap.11], a class of learning machines is also based on the Bregman divergence $\mathcal{D}_{\phi}$ of convex functions $\phi$ . Now, as an attempt, suppose that $\phi$ is a nonconvex function (possibly with inflection points), and interpret $\mathcal{D}_{\phi}$ as the corresponding canonical divergence in our sense (see §4.2). Here we would like to point out that the proofs for the convex case often still work and yield slightly weaker results for such general $\phi$ . An easy example is Theorem 11.1 of [2], which now reads “the $k$ -mean $\eta_{C}:=\frac{1}{k}\sum x_{i}$ of a cluster $C=\{x_{i}\}_{i=1}^{k}$ in $\mathbb{R}^{n}$ is always a critical point of $\mathcal{D}_{\phi}(C,-):=\frac{1}{k}\sum\mathcal{D}_{\phi}(x_{i},-)$ , and all other critical points are obtained from $\eta_{C}$ and $\ker\nabla^{2}\phi$ ”. We expect similar results for some other optimization algorithms. On the other hand, almost all statistical learning machines allow Fisher-Rao matrices to be degenerate [14, 28]. In particular, as in [2, Chap.12], most deep learning machines use Gaussian noise with a fixed (co)variance for regression; then the parameter space $M$ becomes a self-dual Riemannian manifold $(h,\nabla=\nabla^{*})$ off the degeneracy locus $\Sigma$ of $h$ , which has many components. We seek another scheme for measuring errors which is compatible with our singular model.
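The statement about the cluster mean can be tried out in one variable. The sketch below takes the nonconvex function $\phi(x)=x^{3}/3$ , so that $\ker\nabla^{2}\phi$ is non-trivial exactly at $y=0$ , and a cluster $\{1,2,6\}$ ; both are hypothetical choices for illustration. The derivative of $\mathcal{D}_{\phi}(C,-)$ is computed by a five-point difference quotient, which is exact for this cubic polynomial.

```python
from fractions import Fraction

def phi(x):
    """A nonconvex potential; its second derivative 2x vanishes at x = 0."""
    return x ** 3 / 3

def dphi(x):
    """First derivative of phi."""
    return x * x

def divergence(x, y):
    """D_phi(x, y) = phi(x) - phi(y) - phi'(y) (x - y)."""
    return phi(x) - phi(y) - dphi(y) * (x - y)

def cluster_divergence(cluster, y):
    """D_phi(C, y) = (1/k) * sum_i D_phi(x_i, y)."""
    return sum(divergence(x, y) for x in cluster) / len(cluster)

def derivative(func, y0):
    """Five-point central difference at y0 with unit step (exact for degree <= 4)."""
    return (func(y0 - 2) - 8 * func(y0 - 1) + 8 * func(y0 + 1) - func(y0 + 2)) / 12

cluster = [Fraction(1), Fraction(2), Fraction(6)]
centre = sum(cluster) / len(cluster)

def slope(y0):
    """Derivative of y -> D_phi(C, y) at y0."""
    return derivative(lambda y: cluster_divergence(cluster, y), Fraction(y0))

print(centre, slope(centre), slope(0), slope(1))
```

The slope vanishes at the mean and at the degenerate point $y=0$ , but not at a generic point such as $y=1$ .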

5.3. Conclusion

In the present paper, we have proposed an information geometry for singular models from the viewpoint of contact geometry and singularity theory. We have introduced quasi-Hessian manifolds, which extend the notion of dually flat manifolds of Amari-Nagaoka so that the Hessian metric can be degenerate, but the canonical cubic tensor is consistently defined on the entire space. Most notable is that the extended Pythagorean theorem and projection theorem are valid even in this singular setup.

There are several further directions, as mentioned above. We end by adding a few more comments. First, there is an ongoing project of the first author on the local classification of singularities of $em$-wavefronts in flat affine coordinates, which extends an old work of Ekeland [12] in nonconvex optimization and leads to the affine differential geometry of wavefronts (cf. [24]). Second, since a quasi-Hessian manifold is embedded in some contact manifold, we may think of the Hamilton-Jacobi method for the time evolution of quasi-Hessian manifolds (wavefront propagation) and semi-classical quantization (WKB analysis) in our framework (cf. [3]). Finally, it would be valuable to find connections with preceding works on singular statistical models [2, 14, 28]; in particular, we hope that the theory of singular Legendre varieties and Legendre currents will build a bridge between the differential geometric method [1, 2] and the algebro-geometric method [28].

References

[1] S. Amari, H. Nagaoka, Methods of Information Geometry, A.M.S., Oxford Univ. Press (2000).
[2] S. Amari, Information Geometry and Its Applications, Applied Math. Sci., 194, Springer (2016).
[3] V.I. Arnol’d, Mathematical Methods of Classical Mechanics, 2nd Edition, Grad. Texts Math. 60, Springer-Verlag (1989).
[4] V.I. Arnol’d, S.M. Gusein-Zade, A.N. Varchenko, Singularities of Differentiable Maps I, Monographs in Math. 82, Birkhäuser (1986).
[5] V.I. Arnol’d, Singularities of Caustics and Wave Fronts, Kluwer Acad. Publ. (1990).
[6] V.I. Arnol’d, Catastrophe Theory, 3rd Edition, Springer-Verlag (1992).
[7] S. Benayadi and M. Boucetta, On para-Kähler Lie algebroids and contravariant pseudo-Hessian structures, Math. Nachrichten 292 (2019), 1418–1443.
[8] N. Chentsov, Statistical decision rules and optimal inference, Translation of Math. Monograph 53, AMS, Providence (1982).
[9] N. Combe and Y.I. Manin, F-manifolds and geometry of information, preprint (2020), ArXiv:2004.08808.
[10] B. Dubrovin, Geometry of 2D topological field theories, Springer Lect. Note Series 1620 (1996), 120–348.
[11] J. Duistermaat, On Hessian Riemannian Structures, Asian J. Math. 5 (2001), 79–91.
[12] I. Ekeland, Legendre duality in nonconvex optimization and calculus of variations, SIAM J. Control and Optimization, 15 (6) (1977), 905–934.
[13] S. Eguchi, Geometry of minimum contrast, Hiroshima Math. J., 22 (1992), 631–647.
[14] K. Fukumizu and S. Kuriki, Statistics of Singular Models, Frontier Stat. Sci. 7, Iwanami Publ. (2004) (in Japanese).
[15] C. Hertling, Frobenius Manifolds and Moduli Spaces for Singularities, Cambridge Univ. Press (2002).
[16] G. Ishikawa, Parametrization of a singular Lagrangian variety, Trans. Amer. Math. Soc. 331 (1992), 787–798.
[17] S. Izumiya and G. Ishikawa, Applied Singularity Theory (in Japanese), Kyoritsu Shuppan, Co. Ltd. (1998).
[18] S. Izumiya, M.C. Romero-Fuster, M.A.S. Ruas, F. Tari, Differential Geometry from a Singularity Theory Viewpoint, World Scientific (2015).
[19] T. Matsumoto, Any statistical manifold has a contrast function – On the $C^{3}$ -functions taking the minimum at the diagonal of the product manifold, Hiroshima Math. J. 23 (1993), 327–332.
[20] H. Matsuzoe, Statistical manifolds and affine differential geometry, Advanced Stud. Pure Math. 57 (2010), 303–321.
[21] T. Poston, I. Stewart, Catastrophe Theory and Its Applications, Pitman Publ. Ltd. (1978).
[22] H. Sano, Y. Kabata, J.L. Deolindo-Silva, T. Ohmoto, Classification of jets of surfaces in projective 3-space via central projection, Bull. Brazilian Math. Soc., New Series 48 (2017), 623–639.
[23] H. Kito, On Hessian structures on the Euclidean space and the hyperbolic space, Osaka J. Math. 36 (1999), 51–62.
[24] K. Saji, M. Umehara, K. Yamada, The geometry of fronts, Annals of Math., 169 (2009), 491–529.
[25] H. Shima, The geometry of Hessian Structures, World Scientific (2007).
[26] C. Small and W. Jinfang, Numerical Methods for Nonlinear Estimating Equations, Oxford Univ. Press (2003).
[27] B. Totaro, The curvature of a Hessian metric, Internat. J. Math. 15 (2004), 369–391.
[28] S. Watanabe, Algebraic Geometry and Statistical Learning Theory, Cambridge Univ. Press (2008).
