The dually flat structure for singular models
Abstract.
The dually flat structure introduced by Amari-Nagaoka is highlighted in information geometry and related fields. In practical applications, however, the underlying pseudo-Riemannian metric may often be degenerate, and such an excellent geometric structure is rarely defined on the entire space. To remedy this, in the present paper, we propose a novel generalization of the dually flat structure for a certain class of singular models from the viewpoint of Lagrange and Legendre singularity theory – we introduce a quasi-Hessian manifold endowed with a possibly degenerate metric and a particular symmetric cubic tensor, which goes beyond the concept of statistical manifolds and is adapted to the theory of (weak) contrast functions. In particular, we establish Amari-Nagaoka’s extended Pythagorean theorem and projection theorem in this general setup, and consequently, most applications of these theorems are suitably justified even for such singular cases. This work is motivated by various interests with different backgrounds, from Frobenius structures in mathematical physics to Deep Learning in data science.
Key words and phrases:
Dually flat structure, canonical divergence, Hessian geometry, Legendre duality, wavefronts, caustics, singularity theory.
1. Introduction
The dually flat structure is highlighted in information geometry – it brings a unified geometric insight to various fields such as statistical science, convex optimization, (quantum) information theory, and so on (Amari-Nagaoka [1], Amari [2], Chentsov [8]). It is also essentially the same as the Hessian structure in affine differential geometry (Shima [25]). On a $C^\infty$-manifold $M$, a dually flat structure is a triplet $(h, \nabla, \nabla^*)$, where $h$ is a pseudo-Riemannian metric (i.e., a non-degenerate symmetric $(0,2)$-tensor) and $\nabla$ and $\nabla^*$ are flat affine connections on $M$ satisfying certain properties; the most particular feature is that the metric $h$ is locally given by the Hessian matrix of some potential function in $\nabla$-affine coordinates. In practical applications, however, the Hessian matrix may often be degenerate along some locus of points $\Sigma$ in $M$, and then, strictly speaking, the differential geometric method cannot be directly applied. Roughly, we call such a space a singular model. In the present paper, we propose a novel generalization of the dually flat structure for a certain class of singular models from the viewpoint of contact geometry and singularity theory. This provides a new framework for general hierarchical structures – we introduce a quasi-Hessian manifold $M$ endowed with a possibly degenerate quadratic tensor $h$ and a particular symmetric cubic tensor $C$, which goes beyond the concept of statistical manifolds and fits well with the theory of contrast functions due to Eguchi [13]. In fact, such $M$ naturally possesses a canonical divergence $\mathcal{D}$, which is a weak contrast function compatible with $h$ and $C$ (Theorem 4.10). The key is the Legendre duality, which does exist even in the presence of the degeneracy locus $\Sigma$ of $h$.
Although neither a metric nor a connection is available (!), we generalize in a natural way the Amari-Chentsov cubic tensor to a symmetric tensor $C$ defined on the entire space $M$ (this is possible even in the case that $\Sigma = M$), and in particular we establish Amari-Nagaoka’s extended Pythagorean theorem and projection theorem in this setup (Theorems 4.7, 4.8). Consequently, in principle, most applications of these theorems are suitably justified even for such degenerate cases.
As the first observation, we see that if the Hessian of a potential is degenerate, the graph of the dual potential (i.e., the Legendre transform of the potential) is no longer a submanifold but a wavefront having singularities branched along its caustics. More generally, our quasi-Hessian manifold $M$ is locally accompanied by two kinds of wavefronts, later called the $e/m$-wavefronts, as an alternate to the pair of a convex potential and its dual. To grasp the point, it would be helpful to refer to Fig.1 and Fig.2 in §3.1 in advance. Those two wavefronts are mutually tied by the Legendre duality in a strict sense, and they also have ‘height functions’ (i.e., projections to the $z$ and $z'$-axes, respectively), by using which we generalize the Bregman divergence to our divergence $\mathcal{D}$. Further, we associate the pair of coherent tangent bundles
$$(E, \nabla^E, \Phi), \qquad (E', \nabla^{E'}, \Phi'),$$
where each of $E$ and $E'$ is a vector bundle on $M$ that is an alternative to the tangent bundle $TM$ and is equipped with a flat connection and a bundle map from $TM$ measuring the degeneration of $h$, and $E$ and $E'$ are dual to each other. Intuitively, the coherent tangent bundles come from the splitting of the standard contact structure into the directions of the $e/m$-wavefront projections (see §2.1, §3.1 and §3.2). The role of $\nabla$ and $\nabla^*$ on a dually flat space is undertaken by $\nabla^E$ and $\nabla^{E'}$ of those vector bundles on our singular model $M$, and then the new cubic tensor $C$ of $M$ is defined by using these connections through $\Phi$ and $\Phi'$ (§3.4). Originally, the notion of coherent tangent bundles was introduced for studying Riemannian geometry of singular wavefronts by Saji-Umehara-Yamada [24], and here we borrow an affine geometry version. In the case that $h$ is non-degenerate (so that $\Phi$ and $\Phi'$ are isomorphisms), the dually flat structure in the original form is naturally recovered.
Singularities of caustics and wavefronts have been thoroughly investigated in Lagrange and Legendre singularity theory (initiated by Arnol’d, Zakalyukin and Hörmander) in relation with a broad range of subjects such as classical mechanics, thermodynamics, geometric optics, Fourier integral operators, control theory, catastrophe theory and so on [3, 4, 5, 6, 17, 18, 21]. We bring several techniques and concepts of this theory into information geometry, which may suggest new directions in both theory and application. In fact, the present paper is motivated by various interests from different backgrounds:
- A typical example of quasi-Hessian manifolds is a general affine hypersurface in $\mathbb{R}^{n+1}$. It possesses a mixed geometry with changing metric type – this goes back to Darboux and others, who dealt with the rich geometry of the parabolic curve (the curve of inflection points) separating elliptic and hyperbolic domains on a surface in $\mathbb{R}^3$ [6, 5, 22]. It is also related to nonconvex optimization and variational problems [12].
- Any Lagrange submanifold of $T^*\mathbb{R}^n$ is a quasi-Hessian manifold. If it is flat, then the cubic tensor $C$ satisfies the so-called WDVV equation, which mainly appears in topological field theory, and this yields a version of Frobenius manifold-like structure [23, 27, 15]. It is also related to the geometry of Poisson manifolds and paraKähler structures [7, 9].
- In statistical inference, any curved exponential family produces a quasi-Hessian manifold, which represents the $e/m$-extrinsic geometry in the ambient family. For instance, it is useful for studying catastrophe phenomena in root selections of the maximum likelihood equation [26]. Almost all statistical learning machines, including deep neural networks, allow degeneration of Fisher-Rao matrices [2, 14, 28], for which we are seeking a new approach.
In the present paper, we will focus only on basic ideas and write them plainly in a self-contained manner as much as possible – most arguments are elementary and checked by direct computations, and we will NOT enter into any detail of singularity theory here. Therefore, this paper should be readable for readers from various backgrounds. Nevertheless, we believe that this paper contains some new observations in this field. The rest of this paper is organized as follows. In §2 we give a brief summary of some basics in contact geometry and the dually flat structure. In §3, after reviewing the definition of Lagrange and Legendre singularities, we introduce quasi-Hessian manifolds. In §4, the associated canonical divergence is discussed; the extended Pythagorean theorem and projection theorem are presented in our setting, and we also give a relation with contrast functions. In §5, we pick up some possible applications and open questions.
Throughout, bold letters denote column vectors, e.g., $\boldsymbol{x} = (x_1, \cdots, x_n)^T$, and the notation with prime $\boldsymbol{x}'$ simply means to distinguish it from $\boldsymbol{x}$ (it does not mean any operation like differential or transpose). Also we let denote for short as usual. We assume that manifolds and maps are of class $C^\infty$, for simplicity.
The authors are partly supported by GiCORE-GSB, Hokkaido University, and JSPS KAKENHI Grant Numbers JP17H06128 and JP18K18714.
2. Dually Flat Structure
2.1. Contact geometry and Legendre duality
To begin with, we summarize a minimal set of basic knowledge in contact geometry which will be used throughout this paper. As standard references, we recommend Chap. 18-22 of Arnol’d et al. [4], the Appendix of Arnol’d [3] and Izumiya-Ishikawa [17].
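A contact manifold is a $(2n+1)$-dimensional manifold $N$ endowed with a maximally non-integrable hyperplane field $\xi \subset TN$, i.e., $\xi$ is locally expressed by the kernel of a $1$-form $\theta$ satisfying the non-degeneracy condition $\theta \wedge (d\theta)^n \neq 0$. This field $\xi$ is called a contact structure on $N$. The most important example is the standard contact space $\mathbb{R}^{2n+1}$; it is the $1$-jet space of functions on the affine $n$-space
$$\mathbb{R}^{2n+1} = J^1(\mathbb{R}^n, \mathbb{R}) = T^*\mathbb{R}^n \times \mathbb{R},$$
where the $1$-jet of a function $f$ at a point $\boldsymbol{a}$ means the Taylor coefficients at $\boldsymbol{a}$ of order $\le 1$, i.e., $(\boldsymbol{a}, \frac{\partial f}{\partial \boldsymbol{x}}(\boldsymbol{a}), f(\boldsymbol{a}))$. The contact structure is given by the $1$-form
$$\theta = dz - \boldsymbol{p}^T d\boldsymbol{x} = dz - \sum_{i=1}^n p_i\, dx_i,$$
called the standard contact form, where $z$ is the last coordinate, and $\boldsymbol{x}$ and $\boldsymbol{p}$ denote, respectively, the base and the fiber coordinates of the cotangent bundle $T^*\mathbb{R}^n$ (we always write coordinates $(\boldsymbol{x}, \boldsymbol{p}, z)$ in this order). We often write $\mathbb{R}^n_{\boldsymbol{x}}$ and $\mathbb{R}^n_{\boldsymbol{p}}$ in order to distinguish them. Note that the standard contact structure relies on the affine structure of the base space, not on the choice of coordinates $\boldsymbol{x}$.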
The famous Darboux theorem states that the contact structure is locally unique; namely, for any contact manifold $(N, \xi)$, we can always find a system of local coordinates $(\boldsymbol{x}, \boldsymbol{p}, z)$ around any point $p \in N$, in which the contact structure is presented by the standard one.
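A Legendre submanifold $L$ of a $(2n+1)$-dimensional contact manifold $(N, \xi)$ is an $n$-dimensional integral manifold of the field $\xi$, i.e., $T_pL \subset \xi_p$ for every $p \in L$. It is easy to see that in the standard $\mathbb{R}^{2n+1}$, the graph of the $1$-jet of a function $z = f(\boldsymbol{x})$,
$$L = \left\{ (\boldsymbol{x}, \boldsymbol{p}, z) \in \mathbb{R}^{2n+1} \;:\; \boldsymbol{p} = \frac{\partial f}{\partial \boldsymbol{x}}(\boldsymbol{x}),\ z = f(\boldsymbol{x}) \right\}, \qquad (1)$$
is a Legendre submanifold, and conversely, every Legendre submanifold which is diffeomorphically projected to $\mathbb{R}^n_{\boldsymbol{x}}$ is expressed in this form (1).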
The standard symplectic structure of $T^*\mathbb{R}^n$ is defined by the non-degenerate closed $2$-form
$$\omega = \sum_{i=1}^n dp_i \wedge dx_i \quad (= -d\theta).$$
A Lagrange submanifold is an $n$-dimensional submanifold over which $\omega$ vanishes. A typical example is the graph of $df$, i.e., the image of $L$ of the form (1) via the projection along the $z$-axis. Any Lagrange submanifold of $T^*\mathbb{R}^n$ is always locally liftable to a Legendre submanifold of $\mathbb{R}^{2n+1}$, uniquely up to a translation parallel to the $z$-axis. If it is entirely liftable, then we call it an exact Lagrange submanifold.
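A Legendre fibration $\pi: N \to B$ is a fiber bundle whose total space is a $(2n+1)$-dimensional contact manifold $N$, whose base $B$ is an $(n+1)$-dimensional manifold, and every fiber of which is Legendrian. The most typical example is the projection from the standard space
$$\pi: \mathbb{R}^{2n+1} \to \mathbb{R}^n_{\boldsymbol{x}} \times \mathbb{R}_z, \qquad (\boldsymbol{x}, \boldsymbol{p}, z) \mapsto (\boldsymbol{x}, z).$$
Every Legendre fibration is locally described in this typical form with suitable local coordinates.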
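The Legendre duality is described as follows. Consider the transformation $\mathcal{L}: \mathbb{R}^{2n+1} \to \mathbb{R}^{2n+1}$ given by
$$\mathcal{L}(\boldsymbol{x}, \boldsymbol{p}, z) = (\boldsymbol{x}', \boldsymbol{p}', z') := (\boldsymbol{p},\ \boldsymbol{x},\ \boldsymbol{p}^T\boldsymbol{x} - z).$$
It is a contactomorphism, i.e., it preserves the contact hyperplane field $\xi$; indeed, $\mathcal{L}^*\theta = -\theta$. Put
$$\pi' := \pi \circ \mathcal{L}: \mathbb{R}^{2n+1} \to \mathbb{R}^n_{\boldsymbol{x}'} \times \mathbb{R}_{z'}, \qquad (\boldsymbol{x}, \boldsymbol{p}, z) \mapsto (\boldsymbol{x}', z'),$$
which is also a Legendre fibration. Then, the double fibration structure of the standard contact space is defined as the following diagram:
$$\mathbb{R}^n_{\boldsymbol{x}} \times \mathbb{R}_z \ \xleftarrow{\ \pi\ }\ \mathbb{R}^{2n+1}\ \xrightarrow{\ \pi'\ }\ \mathbb{R}^n_{\boldsymbol{x}'} \times \mathbb{R}_{z'}.$$
Let $\Pi: \mathbb{R}^n \times \mathbb{R} \to \mathbb{R}^n$ be the projection to the first factor and put
$$\pi_1 := \Pi \circ \pi: \mathbb{R}^{2n+1} \to \mathbb{R}^n_{\boldsymbol{x}}, \qquad \pi_1' := \Pi \circ \pi': \mathbb{R}^{2n+1} \to \mathbb{R}^n_{\boldsymbol{x}'}.$$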
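Let $L$ be a Legendre submanifold of $\mathbb{R}^{2n+1}$. In this paper, we call $L$ a regular model if $L$ is diffeomorphic to some open subsets $U \subset \mathbb{R}^n_{\boldsymbol{x}}$ and $V \subset \mathbb{R}^n_{\boldsymbol{x}'}$ via the projections to $\mathbb{R}^n_{\boldsymbol{x}}$ and $\mathbb{R}^n_{\boldsymbol{x}'}$, respectively. Equivalently, there exist functions $z = f(\boldsymbol{x})$ on $U$ and $z' = \varphi(\boldsymbol{x}')$ on $V$ such that
- $L$ is the graph of the $1$-jet of $f$ in the coordinates $(\boldsymbol{x}, \boldsymbol{p}, z)$;
- $L$ is the graph of the $1$-jet of $\varphi$ in the coordinates $(\boldsymbol{x}', \boldsymbol{p}', z')$.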
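Then we have
$$f(\boldsymbol{x}) + \varphi(\boldsymbol{p}) = \boldsymbol{p}^T\boldsymbol{x}, \qquad \boldsymbol{p} = \frac{\partial f}{\partial \boldsymbol{x}}(\boldsymbol{x}), \qquad \boldsymbol{x} = \frac{\partial \varphi}{\partial \boldsymbol{p}}(\boldsymbol{p}).$$
The coordinate change $U \to V$ is the gradient map $\boldsymbol{x} \mapsto \boldsymbol{p}$, $\boldsymbol{p} = \frac{\partial f}{\partial \boldsymbol{x}}(\boldsymbol{x})$. It is a diffeomorphism, thus the Hessian matrix of $f$ is non-degenerate at every $\boldsymbol{x} \in U$. Here, the inverse map is given by $\boldsymbol{x} = \frac{\partial \varphi}{\partial \boldsymbol{p}}(\boldsymbol{p})$, and its Hessian matrix is the inverse of that of $f$. We say that $\varphi$ is the Legendre transform of $f$ and vice versa. We call $f$ a potential function and $\varphi$ its dual potential. This correspondence is the Legendre duality. It is very common in, e.g., convex analysis: if $f$ is strictly convex, then $\varphi$ is also (see Remark 2.2).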
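The Legendre duality of a regular model can be checked numerically on a one-dimensional toy case. The sketch below is only an illustration: the potential exp(x), the closed-form dual p log p - p, and all function names are choices made here, not taken from the text. It verifies that potential and dual potential add up to p·x on the Legendre submanifold, that the two gradient maps are mutually inverse, and that the two Hessians are inverse to each other.

```python
import math

def f(x):
    """Illustrative potential f(x) = exp(x); strictly convex on the real line."""
    return math.exp(x)

def grad_f(x):
    """Gradient map x -> p = f'(x) = exp(x)."""
    return math.exp(x)

def phi(p):
    """Dual potential in closed form: phi(p) = p*log(p) - p, for p > 0."""
    return p * math.log(p) - p

def grad_phi(p):
    """Inverse gradient map p -> x = phi'(p) = log(p)."""
    return math.log(p)

def legendre_numeric(p, lo=-20.0, hi=20.0, iters=200):
    """Evaluate p*x - f(x) at the root of f'(x) = p, located by bisection."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if grad_f(mid) < p:
            lo = mid
        else:
            hi = mid
    x = 0.5 * (lo + hi)
    return p * x - f(x)

x0 = 0.7
p0 = grad_f(x0)
# On the Legendre submanifold: f(x) + phi(p) - p*x vanishes.
gap = f(x0) + phi(p0) - p0 * x0
# Hessians f''(x) = exp(x) and phi''(p) = 1/p are mutually inverse.
hess_product = math.exp(x0) * (1.0 / p0)
```

Here the bisection routine plays the role of inverting the gradient map; it works only because the gradient map is strictly monotone, which is exactly the non-degeneracy that fails for singular models.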
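An affine Legendre equivalence, a new terminology introduced in this paper, is defined by an affine transformation of the form
$$\mathcal{L}_F: \mathbb{R}^{2n+1} \to \mathbb{R}^{2n+1}, \qquad \mathcal{L}_F(\boldsymbol{x}, \boldsymbol{p}, z) = (A\boldsymbol{x} + \boldsymbol{b},\ A'\boldsymbol{p} + \boldsymbol{b}',\ z + \boldsymbol{c}^T\boldsymbol{x} + r)$$
together with affine transformations
$$F(\boldsymbol{x}, z) = (A\boldsymbol{x} + \boldsymbol{b},\ z + \boldsymbol{c}^T\boldsymbol{x} + r), \qquad F'(\boldsymbol{x}', z') = (A'\boldsymbol{x}' + \boldsymbol{b}',\ z' + \boldsymbol{c}'^T\boldsymbol{x}' + r'),$$
where $A$ is invertible and
$$A' = (A^T)^{-1}, \quad \boldsymbol{b}' = A'\boldsymbol{c}, \quad \boldsymbol{c}' = A^{-1}\boldsymbol{b}, \quad r' = \boldsymbol{c}^T A^{-1}\boldsymbol{b} - r.$$
Note that $F$ (or $F'$) determines $\mathcal{L}_F$. It is easy to see that $\mathcal{L}_F$ preserves the contact form and the double fibrations, i.e., $\mathcal{L}_F^*\theta = \theta$ and the following diagram commutes:
$$\begin{array}{ccccc}
\mathbb{R}^n_{\boldsymbol{x}} \times \mathbb{R}_z & \xleftarrow{\ \pi\ } & \mathbb{R}^{2n+1} & \xrightarrow{\ \pi'\ } & \mathbb{R}^n_{\boldsymbol{x}'} \times \mathbb{R}_{z'} \\
{\scriptstyle F}\downarrow & & \downarrow{\scriptstyle \mathcal{L}_F} & & \downarrow{\scriptstyle F'} \\
\mathbb{R}^n_{\boldsymbol{x}} \times \mathbb{R}_z & \xleftarrow{\ \pi\ } & \mathbb{R}^{2n+1} & \xrightarrow{\ \pi'\ } & \mathbb{R}^n_{\boldsymbol{x}'} \times \mathbb{R}_{z'}
\end{array}$$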
Definition 2.1.
We say that two Legendre submanifolds $L_1$ and $L_2$ of $\mathbb{R}^{2n+1}$ are affine Legendre equivalent if there exists some $\mathcal{L}_F$ which identifies $L_1$ with $L_2$.
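The claim that an affine Legendre equivalence preserves the contact form can be verified by a short computation. The following is a sketch under an assumed parametrization: the symbols $A$, $\boldsymbol{b}$, $\boldsymbol{c}$, $r$ and the precise normalization of the maps are chosen here for illustration and may differ from the one intended in the definition above.

```latex
% Assumed form: F acts on (x, z), and L_F is its lift to (x, p, z).
\begin{align*}
F(\boldsymbol{x}, z) &= (A\boldsymbol{x} + \boldsymbol{b},\; z + \boldsymbol{c}^T\boldsymbol{x} + r),
  \qquad A \in GL(n, \mathbb{R}),\\
\mathcal{L}_F(\boldsymbol{x}, \boldsymbol{p}, z)
  &= \big(A\boldsymbol{x} + \boldsymbol{b},\; (A^T)^{-1}(\boldsymbol{p} + \boldsymbol{c}),\;
      z + \boldsymbol{c}^T\boldsymbol{x} + r\big).
\end{align*}
% Pull back the standard contact form \theta = dz - p^T dx:
\begin{align*}
\mathcal{L}_F^*\theta
  &= d\big(z + \boldsymbol{c}^T\boldsymbol{x} + r\big)
     - \big((A^T)^{-1}(\boldsymbol{p} + \boldsymbol{c})\big)^T A\, d\boldsymbol{x}\\
  &= dz + \boldsymbol{c}^T d\boldsymbol{x} - (\boldsymbol{p} + \boldsymbol{c})^T d\boldsymbol{x}
   = dz - \boldsymbol{p}^T d\boldsymbol{x} = \theta .
\end{align*}
% Induced map on the dual side, using z' = p^T x - z:
\begin{align*}
\boldsymbol{x}' \mapsto (A^T)^{-1}\boldsymbol{x}' + (A^T)^{-1}\boldsymbol{c}, \qquad
z' \mapsto z' + (A^{-1}\boldsymbol{b})^T\boldsymbol{x}' + \boldsymbol{c}^T A^{-1}\boldsymbol{b} - r .
\end{align*}
```

In this parametrization the map on the dual side is again affine, which is consistent with the statement that either one of the two base transformations determines the equivalence.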
Remark 2.2.
(Projective duality) The Legendre duality is an affine expression of the projective duality. We denote by the real projective space of dimension and by the dual projective space. Let denote the incidence submanifold of which consists of pairs with , i.e., is a codimension one submanifold () defined by
for and . Note that is naturally identified with the projective cotangent bundle , and thus becomes a contact manifold [4, §20.1]. Consider the open subset of defined by and . We may set , and put and , then the above equation is rewritten as
Clearly, has two systems of coordinates, and , and the coordinate change between them is just the above preserving the contact structure of . In projective geometry, the double Legendre fibration
expresses the duality principle on points and hyperplanes, where and are projections of the projective cotangent bundles. Restrict this diagram to and identify with using coordinates , we get the diagram . For instance, in case of , consider a parameterized plane curve
Then its projective dual is the following curve consisting of the tangent lines:
If is convex, then is also. If has an inflection point, e.g., , then has a cusp at the corresponding point, , and therefore, is locally the graph of a bi-valued function, .
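The cusp of the dual curve at an inflection point can be seen concretely. The sketch below assumes the cubic y = x^3 as the curve with an inflection point (an illustrative choice made here, as are all function names) and parametrizes its dual by the slope p and the height z' = p·x - y of each tangent line.

```python
import math

def dual_point(t):
    """Tangent line to y = x**3 at x = t, written as y = p*x - z_dual."""
    p = 3.0 * t * t          # slope of the tangent line
    z_dual = p * t - t ** 3  # Legendre-type height z' = p*x - y, equal to 2*t**3
    return p, z_dual

def dual_velocity(t):
    """Derivative of the dual parametrization t -> (3*t**2, 2*t**3)."""
    return 6.0 * t, 6.0 * t * t

def branches(p):
    """The two heights over a slope p >= 0: z' = -/+ 2*(p/3)**1.5."""
    r = 2.0 * (p / 3.0) ** 1.5
    return -r, r

# Both t and -t give the same slope but opposite heights: two branches.
p_plus, z_plus = dual_point(0.5)
p_minus, z_minus = dual_point(-0.5)
```

The velocity of the dual curve vanishes at t = 0, the parameter of the inflection point, and the dual curve satisfies 27·z'^2 = 4·p^3 there, i.e., it is a semicubical cusp and locally the graph of a bi-valued function of p.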
2.2. Dually flat structure
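Let $L$ be a Legendre submanifold of $\mathbb{R}^{2n+1}$ which is a regular model with potential function $z = f(\boldsymbol{x})$. The Hessian matrix
$$H(\boldsymbol{x}) = \left[ \frac{\partial^2 f}{\partial x_i \partial x_j}(\boldsymbol{x}) \right]$$
is invertible, thus it defines a pseudo-Riemannian metric $h$ on $L$, called the Hessian metric associated to $f$. Additionally, through the projections to $\mathbb{R}^n_{\boldsymbol{x}}$ and $\mathbb{R}^n_{\boldsymbol{x}'}$, the fixed affine structures of $\mathbb{R}^n_{\boldsymbol{x}}$ and $\mathbb{R}^n_{\boldsymbol{x}'}$ induce two different flat affine connections $\nabla$ and $\nabla^*$ on $L$, respectively.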
Definition 2.3.
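([1]) The triplet $(h, \nabla, \nabla^*)$ is called the dually flat structure on a regular model $L$.
Note that $(h, \nabla, \nabla^*)$ is preserved under affine Legendre equivalence; indeed, $\mathcal{L}_F$ induces affine transformations of $\mathbb{R}^n_{\boldsymbol{x}}$ and $\mathbb{R}^n_{\boldsymbol{x}'}$ and simply adds a linear function to the potential $f$.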
The dually flat structure is traditionally introduced in terms of differential geometry in an intrinsic way. We briefly summarize it below, see [1, 2, 20, 25] for the detail.
A statistical manifold $(M, h, \nabla)$ is a pseudo-Riemannian manifold $(M, h)$ equipped with a torsion-free affine connection $\nabla$ compatible with $h$, i.e., the cubic tensor $T := \nabla h$ is totally symmetric:
$$(\nabla_X h)(Y, Z) = (\nabla_Y h)(X, Z)$$
for vector fields $X, Y$ and $Z$. Equivalently [20, p.306], a statistical manifold may also be defined as a manifold endowed with a pseudo-Riemannian metric $h$ and a totally symmetric $(0,3)$-tensor $T$ (due to Lauritzen), which is also described within the theory of contrast functions in [13]. The dual connection $\nabla^*$ (with respect to $h$) is defined by
$$X h(Y, Z) = h(\nabla_X Y, Z) + h(Y, \nabla^*_X Z),$$
and then $\nabla^*$ is torsion-free and $\nabla^* h$ is also symmetric. Furthermore, if $\nabla$ is flat (i.e., torsion-free and curvature-free), then $\nabla^*$ is also. Such a statistical manifold with flat connections is called a dually flat manifold [1, 2] or a Hessian manifold [25, 20]. The most notable characteristic of a dually flat manifold is that locally it holds that
$$h = \nabla df, \qquad \text{i.e.,} \quad h_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j},$$
for some local potential $f$. In other words, the metric $h$ is expressed by the non-degenerate Hessian matrix of $f$ in $\nabla$-affine local coordinates $\boldsymbol{x}$. Moreover, the $\nabla^*$-affine coordinates are then given by $\boldsymbol{p} = \frac{\partial f}{\partial \boldsymbol{x}}$.
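This local expression of a dually flat manifold exactly provides a regular model in $\mathbb{R}^{2n+1}$, the graph of the $1$-jet of a local potential, equipped with the dually flat structure in the sense of Definition 2.3. Such a regular model is uniquely determined up to affine Legendre equivalence. To see this precisely, suppose that the metric $h$ is locally expressed by the Hessian matrices of two potential functions $f_1$ and $f_2$ in different $\nabla$-affine local coordinates $\boldsymbol{x}$ and $\tilde{\boldsymbol{x}}$, respectively, on a common open set. By definition, there is an affine transformation
$$\tilde{\boldsymbol{x}} = A\boldsymbol{x} + \boldsymbol{b} \qquad (A \text{ invertible}).$$
By the assumption, both Hessian matrices represent the same tensor $h$ at every point, thus all second partial derivatives of the composite function $f_2(A\boldsymbol{x} + \boldsymbol{b})$ coincide with those of $f_1(\boldsymbol{x})$. Namely, these two functions are the same up to some affine term:
$$f_2(A\boldsymbol{x} + \boldsymbol{b}) = f_1(\boldsymbol{x}) + \boldsymbol{c}^T\boldsymbol{x} + r.$$
Then the affine transformation
$$F(\boldsymbol{x}, z) = (A\boldsymbol{x} + \boldsymbol{b},\ z + \boldsymbol{c}^T\boldsymbol{x} + r)$$
sends the graph of $f_1$ to the graph of $f_2$. Hence, the corresponding affine Legendre equivalence $\mathcal{L}_F$ identifies the two regular models, defined by $f_1$ and by $f_2$, on the overlap.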
Although it is less noticed, this simple observation says that any dually flat manifold is an affine manifold having an atlas with affine coordinate changes, additionally equipped with local potentials whose graphs are glued by affine transformations of the above form. The affine structure gives the flat connection $\nabla$, and local potentials restore the Hessian metric $h$ by gluing the local Hessian metrics. At the level of Legendre submanifolds given by $1$-jets of local potentials, notice again that $\mathcal{L}_F$ preserves $(h, \nabla, \nabla^*)$. Therefore, we may rephrase the above statement in the following way:
Proposition 2.4.
Any dually flat or Hessian manifold $M$ is an affine manifold made up by gluing several regular models in $\mathbb{R}^{2n+1}$ via affine Legendre equivalence. The metric $h$ and the pair of affine connections $\nabla$ and $\nabla^*$ are reconstructed by the dually flat structures of the regular models in the sense of Definition 2.3.
This gluing construction will be generalized later to introduce our quasi-Hessian manifolds (Definition 3.19 in §3.3).
Remark 2.5.
Since each gluing map acts also on a neighborhood of a regular model in $\mathbb{R}^{2n+1}$, the gluing construction yields a dually flat manifold as a Legendre submanifold of some ambient contact manifold (it also produces a Lagrange submanifold of some symplectic manifold). Let $M$ be a dually flat manifold, and suppose that there exists a global potential $f: M \to \mathbb{R}$ with $h = \nabla df$. Take local charts of $\nabla$-flat coordinates; then the local potentials define regular models in $\mathbb{R}^{2n+1}$, and they are glued together by affine Legendre equivalences that add neither a linear term nor a constant to the potential. Conversely, gluing local models by this special kind of affine Legendre equivalences yields a dually flat manifold with a global potential. As a weaker situation, suppose that there exists a closed $1$-form $\omega$ with $h = \nabla\omega$; then $M$ is said to be of Koszul type [25]. This case corresponds to gluing regular models by affine Legendre equivalences that add no linear term but possibly a non-zero constant to the potential.
Example 2.6.
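(Amari [1, 2]). An exponential family is a family of probability density functions of the form
$$p(\boldsymbol{u} \mid \boldsymbol{\theta}) = \exp\big( \boldsymbol{u}^T\boldsymbol{\theta} - \psi(\boldsymbol{\theta}) \big),$$
where $\boldsymbol{u}$ is a random variable (with its measure $d\mu$) and $\boldsymbol{\theta} \in U$ are parameters ($U \subset \mathbb{R}^n$ is an open set). The normalization factor $\psi(\boldsymbol{\theta})$ is called the potential of this family. Fix the affine structure of $U$, and put $\partial_i = \partial/\partial\theta_i$. We see that the expectation is the corresponding dual coordinate
$$\eta_i := E[u_i] = \partial_i \psi$$
and the (co)variances are written as
$$\mathrm{Cov}[u_i, u_j] = \partial_i \partial_j \psi = E[\partial_i \log p \cdot \partial_j \log p],$$
where the last one means the Fisher-Rao information. If the Hessian of $\psi$ is positive and one regards $\boldsymbol{\theta}$ and $\boldsymbol{\eta}$ as the $\nabla$- and $\nabla^*$-affine coordinates, respectively, then $U$ becomes a dually flat manifold. Normal distributions and finite discrete distributions are typical examples.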
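The identities of this example can be checked on the simplest finite discrete case. The sketch below assumes the Bernoulli family on {0, 1} with natural parameter theta and potential log(1 + e^theta); this family, the function names, and the finite-difference step sizes are illustrative choices made here. It compares the expectation and variance, computed directly from the density, with the first and second derivatives of the potential.

```python
import math

def psi(theta):
    """Potential (log-normalizer) of the Bernoulli family: log(1 + e^theta)."""
    return math.log1p(math.exp(theta))

def density(x, theta):
    """Exponential-family density exp(x*theta - psi(theta)) on x in {0, 1}."""
    return math.exp(x * theta - psi(theta))

def expectation(theta):
    """Dual coordinate eta = E[x], summed over the two outcomes."""
    return sum(x * density(x, theta) for x in (0, 1))

def variance(theta):
    """Variance of x, which is the Fisher information of this family."""
    eta = expectation(theta)
    return sum((x - eta) ** 2 * density(x, theta) for x in (0, 1))

def d_psi(theta, h=1e-5):
    """Central finite difference for the first derivative of psi."""
    return (psi(theta + h) - psi(theta - h)) / (2.0 * h)

def d2_psi(theta, h=1e-4):
    """Central finite difference for the second derivative of psi."""
    return (psi(theta + h) - 2.0 * psi(theta) + psi(theta - h)) / (h * h)

theta0 = 0.3
total_mass = density(0, theta0) + density(1, theta0)
```

For this family the Hessian of the potential is eta(1 - eta), which is positive for every theta, so the whole parameter line is a regular model; degenerate Fisher-Rao matrices require richer (e.g., curved or hierarchical) families.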
3. Quasi-Hessian structure
Our main idea is to consider not only regular models but also general Legendre submanifolds of $\mathbb{R}^{2n+1}$. Then the Lagrange-Legendre singularity theory naturally comes into the picture (Arnol’d et al. [4], Izumiya-Ishikawa [17]). Nevertheless, in this paper, we only use very basic notions and properties of the theory, which are prepared in §3.1. As another new ingredient, we introduce in §3.2 an affine geometric version of the coherent tangent bundle in Saji-Umehara-Yamada [24]. In §3.3 and §3.4 we define a quasi-Hessian manifold endowed with a particular cubic tensor.
3.1. $e/m$-wavefronts and $e/m$-caustics
A Legendre map is the composition
$$\pi \circ \iota: L \hookrightarrow N \to B$$
of the inclusion $\iota$ of a Legendre submanifold $L$ and the projection of a Legendre fibration $\pi: N \to B$. The image is usually called a wavefront; we denote it by $W(L)$ in this paper. The Legendre map may have singular points, i.e., points on $L$ at which the rank of the differential $d(\pi \circ \iota)$ is not maximal (equivalently, $L$ is tangent to the fiber of $\pi$), called Legendre singularities [4, 17]. Then the wavefront is no longer a submanifold.
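From now on, we consider the diagram of double Legendre fibrations on $\mathbb{R}^{2n+1}$ and an arbitrary Legendre submanifold $L \subset \mathbb{R}^{2n+1}$. So we have two Legendre maps
$$\pi^e := \pi \circ \iota: L \to \mathbb{R}^n_{\boldsymbol{x}} \times \mathbb{R}_z, \qquad \pi^m := \pi' \circ \iota: L \to \mathbb{R}^n_{\boldsymbol{x}'} \times \mathbb{R}_{z'},$$
and call them the $e$- and $m$-Legendre maps, respectively, following a traditional notation in information geometry (“$e$-” and “$m$-” come from words in statistics, i.e., exponential and mixture) [1].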
Definition 3.1.
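($e/m$-wavefronts) We set
$$W_e(L) := \pi^e(L) \subset \mathbb{R}^n_{\boldsymbol{x}} \times \mathbb{R}_z, \qquad W_m(L) := \pi^m(L) \subset \mathbb{R}^n_{\boldsymbol{x}'} \times \mathbb{R}_{z'},$$
and call them the $e/m$-wavefronts associated to $L$, respectively.
The $e/m$-wavefronts are Legendre dual to each other in the point-hyperplane duality principle (Remark 2.2).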
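Usually, the projection of a Lagrange submanifold of $T^*\mathbb{R}^n$ to the base is called a Lagrange map [4, 17]. So we have the $e/m$-Lagrange maps
$$\pi_1^e: L \to \mathbb{R}^n_{\boldsymbol{x}},\ (\boldsymbol{x}, \boldsymbol{p}, z) \mapsto \boldsymbol{x}, \qquad \pi_1^m: L \to \mathbb{R}^n_{\boldsymbol{x}'},\ (\boldsymbol{x}, \boldsymbol{p}, z) \mapsto \boldsymbol{x}' = \boldsymbol{p}.$$
It is easy to see that the following two conditions on points $p \in L$ are equivalent:
• $p$ is a singular point of the Legendre map $\pi^e$;
• $p$ is a singular point of the Lagrange map $\pi_1^e$.
Indeed, any $v \in T_pL$ satisfies $dz(v) = \boldsymbol{p}^T d\boldsymbol{x}(v)$, thus, if $d\boldsymbol{x}(v) = 0$, then $dz(v) = 0$.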
Definition 3.2.
($e/m$-caustics) The $e$-critical set consists of all singular points of the $e$-Lagrange map $\pi_1^e$, and we call its image the $e$-caustics associated to $L$. The $m$-version is defined in entirely the same way.
Definition 3.3.
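We say that $L$ is locally a regular model around $p \in L$ if there is an open neighborhood of $p$ in $L$ which is a regular model of $\mathbb{R}^{2n+1}$, i.e., $p$ is neither $e$-critical nor $m$-critical.
Consider the case that $p \in L$ is not $e$-critical but $m$-critical (see toy examples in Examples 3.4, 3.5 below). Then $L$ is the graph of the $1$-jet of some local potential function $z = f(\boldsymbol{x})$ defined near the image of $p$ in $\mathbb{R}^n_{\boldsymbol{x}}$. Take $\boldsymbol{x}$ as local coordinates of $L$ around $p$. Then the $e$-Lagrange map is written as the identity map of $\boldsymbol{x}$ and the $e$-caustics is empty, while the $m$-Lagrange map is written as the gradient map $\boldsymbol{x} \mapsto \frac{\partial f}{\partial \boldsymbol{x}}$. Now it is critical at $p$, so the Hessian matrix of $f$ is singular there. In this case we call $L$ a model with degenerate potential. In particular, if $f$ admits inflection points in the strict sense, $W_m(L)$ is the graph of a multi-valued function branched along the $m$-caustics in $\mathbb{R}^n_{\boldsymbol{x}'}$.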
Example 3.4.
($A_2$-singularity). Let
Then and the degeneracy locus of the Hessian is defined by . See the pictures on the left in Fig. 1.
- The $e$-wavefront $W_e(L)$ is smooth and has parabolic points along the degeneracy locus of the Hessian. There is no $e$-caustic.
- The $m$-wavefront $W_m(L)$ is a singular surface with cuspidal edge; it is the graph of the bi-valued dual potential, defined on one side of the $m$-caustics and branched along it.
This singularity does not appear if the Hessian is non-negative. Note that for every statistical model, the Fisher-Rao metric is non-negative.
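A cuspidal-edge example of this kind can be reproduced with a concrete potential. The sketch below assumes f(x, y) = x^3/3 + y^2/2, which is an illustrative choice made here and not necessarily the function used in the example above; all function names are likewise hypothetical. Its Hessian diag(2x, 1) changes sign across x = 0, and the dual potential becomes bi-valued over the half-plane p >= 0.

```python
import math

def f(x, y):
    """Assumed potential with a fold of the gradient map along x = 0."""
    return x ** 3 / 3.0 + y ** 2 / 2.0

def gradient(x, y):
    """The m-Lagrange map (x, y) -> (p, q) = (x**2, y)."""
    return x * x, y

def hessian_det(x, y):
    """Determinant of the Hessian diag(2x, 1); it vanishes on x = 0."""
    return 2.0 * x * 1.0

def dual_height(x, y):
    """Height z' = p*x + q*y - f(x, y) of the m-wavefront over (p, q)."""
    p, q = gradient(x, y)
    return p * x + q * y - f(x, y)

def dual_branches(p, q):
    """Two values of the dual potential over (p, q) with p >= 0."""
    r = (2.0 / 3.0) * p ** 1.5
    return -r + q * q / 2.0, r + q * q / 2.0

# The points (x, y) and (-x, y) have the same image (p, q) but different heights.
upper = dual_height(1.5, 0.4)
lower = dual_height(-1.5, 0.4)
```

The two branches meet along p = 0, which is the image of the degeneracy locus x = 0; since the Hessian is indefinite for x < 0, this toy potential cannot arise from a Fisher-Rao metric, in line with the remark above.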
Example 3.5.
($A_3$-singularity). Let
Then and the degeneracy locus of the Hessian is defined by . See the pictures on the right in Fig. 1.
- The $e$-wavefront $W_e(L)$ is smooth and convex. There is no $e$-caustic.
- The $m$-wavefront $W_m(L)$ is a singular surface; it is the graph of the dual potential, which is defined on the entire space but singular along the $m$-caustics.
This is a typical degenerate minimum of functions and also a typical type of singularity with $\mathbb{Z}_2$-symmetry (cf. [4]).
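A degenerate minimum of this type can be worked out by hand. The following computation assumes the potential $f(x, y) = x^4/4 + y^2/2$, chosen here for illustration (it need not coincide with the function of the example above); it is invariant under $x \mapsto -x$, which is the $\mathbb{Z}_2$-symmetry in question.

```latex
% Assumed potential (for illustration only): f(x, y) = x^4/4 + y^2/2.
\begin{align*}
(p, q) &= (f_x, f_y) = (x^3,\; y), \qquad
\det \mathrm{Hess}\, f = 3x^2 \ge 0,\\
\varphi(p, q) &= px + qy - f(x, y) = \tfrac{3}{4}x^4 + \tfrac{1}{2}y^2
  = \tfrac{3}{4}\,|p|^{4/3} + \tfrac{1}{2}q^2 .
\end{align*}
% The dual potential is single-valued and C^1 on the whole (p, q)-plane,
% with \varphi_p = \mathrm{sgn}(p)\,|p|^{1/3} = x, but its second derivative
% \varphi_{pp} = \tfrac{1}{3}\,|p|^{-2/3} = 1/(3x^2)
% blows up along the line p = 0, the image of the degeneracy locus x = 0.
```

Away from $p = 0$ the Hessian of $\varphi$ is the inverse of the Hessian of $f$, as in the regular case; the singularity of the dual potential is concentrated on the $m$-caustic.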

Furthermore, it can happen that $p \in L$ is $e$-critical and $m$-critical simultaneously. Then both wavefronts $W_e(L)$ and $W_m(L)$ become singular at the images of $p$, respectively. In general, by the Implicit Function Theorem, there is a partition $I \sqcup J = \{1, \cdots, n\}$ ($I \cap J = \emptyset$) such that $L$ is locally parametrized around $p$ by coordinates $\boldsymbol{x}_I = (x_i)_{i \in I}$ and $\boldsymbol{p}_J = (p_j)_{j \in J}$. In fact, we can find a function $g(\boldsymbol{x}_I, \boldsymbol{p}_J)$ such that $L$ near $p$ is expressed by
where we write . This follows from the form (1) in §2.1 and the canonical transformation
which preserves the contact structure. Usually, $g$ is called a generating function of $L$ around $p$ [4, §20]. In particular, in the case that $J = \emptyset$ (resp. $I = \emptyset$), a generating function is a potential (resp. dual potential). The $e/m$-Legendre maps are locally expressed as follows.
Also the $e/m$-Lagrange maps are obtained by ignoring the last $z$ and $z'$-coordinate, respectively.
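To make the role of a generating function concrete, the following sketch writes out one standard convention (of the type used in Arnol’d's treatment) and checks the Legendrian condition. The sign conventions are an assumption made here and may differ from those intended in the text; in particular, with this convention the case $I = \emptyset$ gives the dual potential only up to sign.

```latex
% Assumed convention; signs may differ from the text's.
\begin{align*}
L:\quad \boldsymbol{p}_I = \frac{\partial g}{\partial \boldsymbol{x}_I},\qquad
\boldsymbol{x}_J = -\frac{\partial g}{\partial \boldsymbol{p}_J},\qquad
z = g(\boldsymbol{x}_I, \boldsymbol{p}_J) + \boldsymbol{p}_J^T \boldsymbol{x}_J .
\end{align*}
% Contact condition on L (theta = dz - p^T dx vanishes):
\begin{align*}
dz &= \Big(\frac{\partial g}{\partial \boldsymbol{x}_I}\Big)^T d\boldsymbol{x}_I
    + \Big(\frac{\partial g}{\partial \boldsymbol{p}_J}\Big)^T d\boldsymbol{p}_J
    + \boldsymbol{x}_J^T d\boldsymbol{p}_J + \boldsymbol{p}_J^T d\boldsymbol{x}_J \\
   &= \boldsymbol{p}_I^T d\boldsymbol{x}_I + \boldsymbol{p}_J^T d\boldsymbol{x}_J
    = \boldsymbol{p}^T d\boldsymbol{x} .
\end{align*}
% Special cases: J empty gives z = g(x), i.e. g = f;
% I empty gives z' = p^T x - z = -g(p).
% Toy instance (n = 1, I empty): g(p) = -p^4/4 gives
%   x = p^3,  z = g + p x = (3/4) p^4,  dz = 3 p^3 dp = p dx,
% a Legendre curve that is e-critical at p = 0 but not m-critical.
```

The toy instance shows why mixed or dual parameters are needed: near $p = 0$ the curve is not the $1$-jet graph of any smooth function of $x$, yet it is a smooth Legendre submanifold parametrized by $p$.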
Example 3.6.
Let
be a generating function. The -Legendre maps and send to
respectively, so those images and are singular surfaces having some own geometric nature, and the -caustics are defined by on and on , see Fig. 2.

Remark 3.7.
(Hierarchical structure) For a dually flat manifold with a potential $f(\boldsymbol{x})$ (i.e., a regular model), there are two systems of coordinates, $\boldsymbol{x}$ and $\boldsymbol{p}$, which are $\nabla$-flat and $\nabla^*$-flat, respectively. That produces a hierarchical structure – we may take a new system of coordinates $(\boldsymbol{x}_I, \boldsymbol{p}_J)$, called mixed coordinates in Amari [2], which yields two foliations of complementary dimensions on $L$ defined by $\boldsymbol{x}_I = \text{const.}$ and $\boldsymbol{p}_J = \text{const.}$; their leaves are $\nabla$-flat and $\nabla^*$-flat, respectively, and mutually orthogonal with respect to the Hessian metric associated to $f$. This structure is useful for applications, see [2]. For an arbitrary Legendre submanifold $L$, a potential may not exist globally, but as seen above, for any $p \in L$, we can always find a generating function $g(\boldsymbol{x}_I, \boldsymbol{p}_J)$ on a neighborhood of $p$ (possible choices of the partition depend on $p$). That locally defines mixed coordinates and two orthogonal foliations on $L$ (see Remark 3.15 below). Usually these coordinates cannot be extended to the entire space $L$, because of the presence of $e/m$-caustics (i.e., $h$ is degenerate). Nevertheless, this new structure is well organized globally, as we will formulate properly in the following subsections.
3.2. Coherent tangent bundles
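Let $L$ be a Legendre submanifold of $\mathbb{R}^{2n+1}$. As seen above, the $e$-wavefront $W_e(L)$ is not a manifold in general, but there is an alternate to its ‘tangent bundle’. Every point $p = (\boldsymbol{x}, \boldsymbol{p}, z) \in L$ defines a hyperplane $E_p$ in $T_{\pi(p)}(\mathbb{R}^n_{\boldsymbol{x}} \times \mathbb{R}_z)$, and the family of such hyperplanes forms a vector bundle on $L$ of rank $n$:
$$E := \left\{ (p, w) \;:\; p \in L,\ w \in T_{\pi(p)}(\mathbb{R}^n_{\boldsymbol{x}} \times \mathbb{R}_z),\ dz(w) - \boldsymbol{p}^T d\boldsymbol{x}(w) = 0 \right\}.$$
Since $L$ is Legendrian, we see
$$\theta(v) = 0 \quad \text{for every } v \in T_pL,$$
thus $d\pi(T_pL) \subset E_p$. We then associate a vector bundle map (a smooth fiber-preserving map which is linear on each fiber)
$$\Phi := d\pi|_{TL}: TL \to E.$$
Note that $\Phi$ is an isomorphism if and only if the $e$-Legendre map is an immersion.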
Remark 3.8.
We remark that is the “limiting” tangent bundle of the -wavefront . Note that the kernel of is spanned by ’s. If is a regular point of the -Legendre map , then and
In fact, in this case, is an immersion around , so is a submanifold around . If a sequence of regular points of converges to a critical point , then the image of converges to (in the Grassmannian of -planes in ) because of the continuity of the bundle . In this case, is singular at , thus the tangent space at that point is not defined, but it has the limiting tangent space as an alternate. Another characterization of is
through the inclusion . In fact, the Jacobian matrix of at is
so its kernel is given by and . Note that the contact hyperplane splits as
Let be the flat connection on the affine space , and for any , let denote the linear projection along the -axis. Then a connection of the vector bundle over is naturally defined by
where is any section of and is any vector field on around .
Lemma 3.9.
The resulting connection is flat and ‘relatively torsion-free’, i.e., for any vector fields on , it holds that
Proof : Put (), then they form a frame of flat global sections of :
Thus is flat. Next, a key point is that is represented by the Jacobi matrix of the -Lagrange map , i.e., in local coordinates of with . Let and , then
The rest is shown by a direct computation.
Definition 3.10.
We call $(E, \nabla^E, \Phi)$ the coherent tangent bundle associated to the $e$-wavefront $W_e(L)$.
Remark 3.11.
The definition of coherent tangent bundles is originally due to Saji-Umehara-Yamada [24, §6] from the viewpoint of Riemannian geometry. They have studied several kinds of curvatures associated to wavefronts. In our case, we use the fixed affine structure of the ambient space of the wavefront, instead of a metric. The affine differential geometry of wavefronts should also be rich.
In entirely the same way, for the -wavefront , the coherent tangent bundle with and is defined:
In fact, the double Legendre fibration can be viewed as the pair of maps using different coordinates of , and then the above construction yields in this dual side. In particular, is identified with (see Remark 3.8).
We have defined and as vector bundles on , although they are actually defined on the ambient space . The contact hyperplane has the direct sum decomposition:
Here we have canonical frames of flat sections for both and ,
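by which $E'$ is identified with the dual to $E$ and vice versa, and there are natural correspondences $E_p \simeq T\mathbb{R}^n_{\boldsymbol{x}}$ and $E'_p \simeq T\mathbb{R}^n_{\boldsymbol{x}'}$ via projections along the $z$ and $z'$-axes. The vector bundle $E \oplus E'$ carries not only the symplectic form $\omega$ but also a pseudo-Riemannian metric of type $(n, n)$ induced from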
Using frames and , we may write vectors of and as column vectors and , respectively, and then
and also .
Any affine Legendre equivalence preserves and on , because it sends to with .
Definition 3.12.
We define the quasi-Hessian metric of by the pullback of :
where is the inclusion (it is a Lagrange subbundle).
Note that $h$ is a possibly degenerate symmetric $(0,2)$-tensor, although we abuse the word ‘metric’. If $\Phi$ is an isomorphism, then $h$ exactly coincides with the Hessian metric associated to a potential $f$; any vector of $TL$ is written as $\boldsymbol{u} \oplus H\boldsymbol{u}$, where $H$ is the Hessian matrix, thus
In general, a local expression of is given as follows.
Lemma 3.13.
Let be a generating function. Then,
Proof : A direct computation shows
Here we use the notation of symmetric products of -forms and .
Lemma 3.14.
Let $p \in L$. The following properties are equivalent:
(1) $h$ is non-degenerate at $p$;
(2) $p$ is neither $e$-critical nor $m$-critical;
(3) $L$ is locally a regular model around $p$;
(4) $h$ is the Hessian metric associated to a local potential near $p$;
(5) both $\Phi$ and $\Phi'$ are isomorphisms at $p$.
Proof : By Lemma 3.13, (1) means that both Hessian blocks of a generating function $g(\boldsymbol{x}_I, \boldsymbol{p}_J)$, the one with respect to $\boldsymbol{x}_I$ and the one with respect to $\boldsymbol{p}_J$, are non-degenerate. Then, using the normal forms of the $e/m$-Lagrange maps written at the end of §3.1, those maps are locally diffeomorphic by the Inverse Mapping Theorem, so it is just (2) and (3). That means that we can take a local potential as a generating function, which is equivalent to (4). Since $\Phi$ and $\Phi'$ are expressed by the Jacobi matrices of the $e/m$-Lagrange maps, (2) and (5) are the same.
Remark 3.15.
Definition 3.16.
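Let $\Sigma$ denote the set of $p \in L$ at which $h$ is degenerate, equivalently, the locus where either $\Phi$ or $\Phi'$ is not an isomorphism:
$$\Sigma := \{ p \in L \;:\; \ker \Phi_p \neq 0 \ \text{ or } \ \ker \Phi'_p \neq 0 \}.$$
We call $\Sigma$ the degeneracy locus of the quasi-Hessian metric $h$.
Since $L$ is Legendrian, $T_pL$ is a Lagrange subspace of the symplectic vector space $E_p \oplus E'_p$. Note that $\Phi_p$ (resp. $\Phi'_p$) is the linear projection of $T_pL$ to the factor $E_p$ (resp. $E'_p$), and especially, $\ker \Phi_p \neq 0$ (resp. $\ker \Phi'_p \neq 0$) if and only if $p$ is $e$-critical (resp. $m$-critical). In particular, the null space of $h$ splits:
$$\ker h_p = \ker \Phi_p \oplus \ker \Phi'_p.$$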
Definition 3.17.
For an arbitrary Legendre submanifold , we call the triplet the dually flat structure of .
Remark 3.18.
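Given a regular model $L$, we have the triple $(h, E, E')$, where both $\Phi$ and $\Phi'$ are isomorphisms. That restores the dually flat structure $(h, \nabla, \nabla^*)$ in the original form (Definition 2.3); indeed, $\nabla$ and $\nabla^*$ on $L$ are uniquely determined by
$$\Phi(\nabla_X Y) = \nabla^E_X \Phi(Y), \qquad \Phi'(\nabla^*_X Y) = \nabla^{E'}_X \Phi'(Y),$$
where $X, Y$ are arbitrary vector fields on $L$. On the other hand, a singular model with degenerate potential is the case that $\Phi$ is an isomorphism and $\Phi'$ is not. Then the connection $\nabla$ of $TL$ is obtained from $\nabla^E$ via $\Phi$ in the same way as above, while $\nabla^*$ does not exist. If neither $\Phi$ nor $\Phi'$ is an isomorphism, there is no connection on $TL$ obtained in this way.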
3.3. Quasi-Hessian manifolds
Our generalized dually flat structure presented in Definition 3.17 is compatible with affine Legendre equivalence. That means that if an affine Legendre equivalence identifies Legendre submanifolds and , then the quasi-Hessian metrics are preserved, , and naturally induces vector bundle isomorphisms between coherent tangent bundles, and , such that the isomorphisms identify equipped affine flat connections and we have the following commutative diagram
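Thus the ordinary gluing construction works. To be precise, suppose that we are given a collection $\{ L_\alpha, \phi_{\alpha\beta} \}_{\alpha, \beta \in \Lambda}$, where $\Lambda$ is a countable set, such that it satisfies the following properties:
(i) for every $\alpha \in \Lambda$, $L_\alpha$ itself is an open manifold and it is embedded in $\mathbb{R}^{2n+1}$ as a Legendre submanifold, called a local model;
(ii) for every $\alpha, \beta \in \Lambda$, there is an open subset $L_{\alpha\beta} \subset L_\alpha$ (also $L_{\beta\alpha} \subset L_\beta$) and a diffeomorphism $\phi_{\alpha\beta}: L_{\alpha\beta} \to L_{\beta\alpha}$ such that over each connected component of $L_{\alpha\beta}$, $\phi_{\alpha\beta}$ is given by an affine Legendre equivalence of the ambient space $\mathbb{R}^{2n+1}$;
(iii) for $\alpha, \beta, \gamma \in \Lambda$, it holds that $\phi_{\alpha\gamma} = \phi_{\beta\gamma} \circ \phi_{\alpha\beta}$ on $L_{\alpha\beta} \cap L_{\alpha\gamma}$.
Let $M$ be the topological space resulting from these gluing data. Assume that $M$ is Hausdorff; then $M$ itself becomes an $n$-dimensional manifold in the ordinary sense. One can naturally associate a possibly degenerate $(0,2)$-tensor $h$ on $M$ and a pair of globally defined dual coherent tangent bundles $E$ and $E'$ on $M$ with bundle maps $\Phi: TM \to E$ and $\Phi': TM \to E'$, equipped with affine flat connections. The bundles $E$ and $E'$ are dual to each other.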
Definition 3.19.
We call equipped with a quasi-Hessian manifold. We define the degeneracy locus to be the locus of points of at which is degenerate.
Since the gluing maps also act on a neighborhood of in and preserve the contact structure, is realized as a Legendre submanifold in some ambient contact manifold.
The following is immediate from the above construction.
Proposition 3.20.
Let be a quasi-Hessian manifold. Then is non-degenerate everywhere if and only if is a Hessian manifold.
Proof : From the equivalence of (1) and (3) in Lemma 3.14, we see that is non-degenerate everywhere if and only if all local models are regular models, which means that is a Hessian manifold (Proposition 2.4).
Remark 3.21.
More generally, we may allow a local model not to be a manifold but a singular Legendre variety; it is a closed subset with a partition (stratification) into integral submanifolds of the contact structure (the projection to the cotangent bundle is called a singular Lagrange variety), see, e.g., Ishikawa [16]. That results in a quasi-Hessian manifold with singularities.
An intrinsic definition of quasi-Hessian manifolds is also available. Roughly speaking, it is an -dimensional manifold equipped with a pair of flat coherent tangent bundles and of rank ; we impose two conditions:
-
(a)
the vector bundle of rank is endowed with a symplectic structure and a pseudo-Riemannian metric of type satisfying ( or ) and (, ); this condition defines the duality between and ;
-
(b)
the bundle map
is injective and the image is a Lagrange subbundle which is integrable, so that one can find a local model around each point of as in Definition 3.19. We omit the details here.
This also suggests a degenerate version of the so-called Codazzi structure (cf. [25]).
3.4. Cubic tensor and -family
In the theory of dually flat manifolds [1], not only the Hessian metric but also the Amari-Chentsov tensor plays an essential role; it satisfies
for vector fields .
Note that whenever exists, the tensor is defined everywhere, regardless of whether or not is non-degenerate. This is an easy case. We generalize the Amari-Chentsov tensor for an arbitrary quasi-Hessian manifold
but the way is not obvious at all, because there is no connection of . Finally we will see that the obtained tensor is a very natural one (Proposition 3.24 below).
Lemma 3.22.
For any vector field on , and for any sections of and of , it holds that
where we put for short.
Proof : Take local frames of flat sections of and of with () on an open set . Put and where are functions on , then
This leads to the equality.
For , put
Then
Using Lemma 3.22, for vector fields on ,
We call the sum of first and third terms the -part, the rest the -part, tentatively. We are concerned with their difference.
Definition 3.23.
For a quasi-Hessian manifold , we define the canonical cubic tensor by the following -tensor on :
In particular, if is non-degenerate, then (Remark 3.18) and we have
and so on, thus it follows that the -part and the -part are equal to, respectively,
Hence, we see that coincides with the Amari-Chentsov tensor :
Using local coordinates, we write down the tensor explicitly as follows. Take a local model and . As mentioned before, locally around , is parameterized by some local coordinates with a generating function . For simplicity, for each , we set
Proposition 3.24.
The canonical cubic tensor is locally the third partial derivative of a generating function: for any ,
In particular, is symmetric.
Proof : This is shown by direct computation. The generating function yields a Lagrange embedding given by
thus the differential is written as
Let be flat sections of and as before; . Then for ,
and for ,
Put , , , , and .
For , and any , we have
Thus, the -part minus the -part gives .
For and any , we have
Thus, . The same is true for the case of .
Remark 3.25.
For a dually flat manifold with potential function , the above proposition corresponds to a well-known property
with respect to -affine coordinates. In fact, a quasi-Hessian manifold is well characterized by using and , that will be discussed within the theory of (weak) contrast functions (see §4.3).
As is well known, for a dually flat manifold , the family of -connections is defined by
Namely, it deforms the Levi-Civita connection using linearly. When , are recovered. Both and are mutually dual and they form the so-called -geometry [1, 20]. For a quasi-Hessian manifold , we have connections of and , but none of , thus there is no direct analogy to -geometry. Nevertheless, as an attempt, we define a new -tensor
Obviously, is the -part, is the -part multiplied by , and a sort of duality holds:
In general, is not totally symmetric, for is not so. If either or is isomorphic, then we may take a possibly degenerate local (dual) potential around (i.e., or ) as generating function ; thus is written by the Hessian of the potential, and hence is symmetric, and is also. Furthermore, if is non-degenerate, i.e., is a dually flat manifold, we completely restore -geometry.
4. Divergence
Let be a quasi-Hessian manifold throughout this section.
4.1. Geodesic-like curves
Let be a curve, where is an open interval, and set , the velocity vector ().
Definition 4.1.
A curve is called an -curve if it is an immersion () and satisfies that at every , vectors of
are not simultaneously zero and any two are linearly dependent. Also an -curve is defined in the same way by replacing and by and , respectively.
Suppose that the curve is given in a local model, . We denote by
the image via the -Lagrange map . Note that is canonically isomorphic to by linear projection along the -axis. Unless becomes zero, the velocity vector does not vanish and its acceleration vector is parallel to the velocity (it can be ) by the condition in Definition 4.1. Hence moves on a straight line in , i.e., is a re-parametrization of an -geodesic (geodesic with respect to ). Trouble occurs when at some . Then, , but by the condition for an -curve, some higher derivative is non-zero, say , and then the vector is parallel to , so we see again that moves on a straight line, but it meets the -caustics at ; it stops once and then turns back or goes forward along the same line, according to whether is even or odd; see Fig. 3 (cf. Examples 3.4 and 3.5). We choose a direction vector of the straight line. For an -curve , everything is the same, and we denote by a direction of the corresponding line on .

Remark 4.2.
(1) Two arbitrary points on are not always connected by an -curve, but they are connected by a piecewise -curve. In fact, in Example 3.4, one can easily find two such points on the -wavefront (the left picture in Fig. 3). The same holds for -curves. (2) Take coordinates for a local model . It is easy to see that any -curve satisfies a certain partial differential equation (like the geodesic equation) using and . In §3.4, we have introduced the -family of cubic tensors . Thus we may consider an -analogue of -curves; indeed, over , it is the same as geodesics with respect to and .
The following definition does not depend on the choices of and direction vectors.
Definition 4.3.
Let , and , as above. Let be a submanifold of which meets at . We say that and are orthogonal at if it holds that
where denotes the coordinates of for a local model containing . Similarly, and are orthogonal at if for any . Furthermore, we say that and are strictly orthogonal at if .
If , the above definition of the orthogonality of and is the same as the orthogonality with respect to the metric . In fact, taking a regular model around , the Hessian is non-degenerate, and thus
for some . However, if , the meaning is different in general, for it can happen that but (in this case, is determined by some higher derivative of ). The reason why we define strict orthogonality is the same; and may not be determined by velocity vectors.
4.2. Canonical divergence
Let be a Legendre submanifold of . We denote coordinates by
Definition 4.4.
The canonical divergence is defined by
Note that and it is asymmetric, , in general. In particular, if is a regular model with positive definite Hessian metric, this is nothing but the Bregman divergence for some convex potential ,
where is the Legendre transform of the potential [1].
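As a hedged illustration of the regular case, the following sketch checks numerically that the Bregman divergence of a strictly convex potential $\psi$ agrees with the mixed-coordinate expression $\psi(x_p) + \varphi(y_q) - x_p\, y_q$, where $y = \psi'(x)$ and $\varphi$ is the Legendre transform of $\psi$. The potential $\psi(x) = e^x$, the order of the two arguments, and all function names are assumptions made for this example and are not taken from the text.

```python
import math


def psi(x):
    """Strictly convex potential (illustrative choice): psi(x) = exp(x)."""
    return math.exp(x)


def dpsi(x):
    """Dual coordinate y = psi'(x)."""
    return math.exp(x)


def phi(y):
    """Legendre transform of psi: x*y - psi(x) at y = psi'(x), i.e. y*log(y) - y."""
    return y * math.log(y) - y


def bregman(xp, xq):
    """Bregman divergence psi(xp) - psi(xq) - psi'(xq) * (xp - xq)."""
    return psi(xp) - psi(xq) - dpsi(xq) * (xp - xq)


def mixed_form(xp, xq):
    """The same quantity written as psi(x_p) + phi(y_q) - x_p * y_q."""
    return psi(xp) + phi(dpsi(xq)) - xp * dpsi(xq)


if __name__ == "__main__":
    # The two numbers printed on each line should agree up to rounding.
    for xp, xq in [(0.3, -1.2), (1.0, 0.0), (-2.0, 0.5)]:
        print(xp, xq, bregman(xp, xq), mixed_form(xp, xq))
```

The mixed form uses only the values of the two potentials and the pairing of the two coordinate systems, which is why an expression of this kind can still make sense when one of the projections fails to be a local diffeomorphism.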
Lemma 4.5.
The canonical divergence is invariant under affine Legendre equivalence, i.e., if Legendre submanifolds and of are affine Legendre equivalent via , then it holds that
Proof : Suppose that is given by
with , and , and . Then
This completes the proof.
Let be a quasi-Hessian manifold obtained by gluing local models and put . Let denote the subset of consisting of points such that there is some local model containing . Since is endowed with the quotient topology, is an open neighborhood of the diagonal .
Definition 4.6.
We set at for some , then is well-defined by Lemma 4.5. We call it the canonical divergence of .
If is connected and simply connected, then the canonical divergence of can be extended to the entire space, so we obtain .
In Amari-Nagaoka’s theory of dually flat structure [1, 2], there are two important theorems called the extended Pythagorean theorem and the projection theorem. They play a central role in applications to statistical inference, the em-algorithm, machine learning and so on. These are immediately generalized to our singular setup. In the following two theorems, assume that is a local model (i.e. ) or a connected and simply-connected quasi-Hessian manifold. In either case, the canonical divergence is defined on .
We say that two points are joined by a curve if there are with and .
Theorem 4.7.
(Extended Pythagorean Theorem) Let be three distinct points such that and are joined by an -curve , and and are joined by an -curve , and furthermore, and are strictly orthogonal at . Then it holds that
Proof : Since , we see that
The images of the maps and lie on lines with direction vectors, say , respectively. Then
for some . The assumption is , thus the equality follows.
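The algebra behind this proof can be tried out on a small singular example. The sketch below is an illustration under stated assumptions, not the paper's own example: it takes the hypothetical generating function $f(x) = x_1^3/3 + x_2^2/2$, whose Hessian degenerates along $x_1 = 0$, and assumes the divergence has the form $D(P,Q) = z(P) + z'(Q) - x(P)\cdot p(Q)$ with $z' = p\cdot x - z$. With this form the defect $D(P,Q) + D(Q,R) - D(P,R)$ equals $(x(Q)-x(P))\cdot(p(Q)-p(R))$ for any three points, so the Pythagorean relation holds exactly when this pairing vanishes. All names are hypothetical.

```python
def point(x1, x2):
    """Point of the toy Legendre submanifold generated by f(x) = x1**3/3 + x2**2/2.

    Returns (x, p, z, z_dual) with p = grad f(x), z = f(x), z_dual = p.x - z.
    The Hessian diag(2*x1, 1) is degenerate along x1 = 0.
    """
    x = (x1, x2)
    p = (x1 ** 2, x2)
    z = x1 ** 3 / 3 + x2 ** 2 / 2
    z_dual = p[0] * x[0] + p[1] * x[1] - z
    return x, p, z, z_dual


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]


def divergence(P, Q):
    """Assumed form D(P, Q) = z(P) + z'(Q) - x(P).p(Q)."""
    return P[2] + Q[3] - dot(P[0], Q[1])


def pythagorean_defect(P, Q, R):
    """D(P, Q) + D(Q, R) - D(P, R)."""
    return divergence(P, Q) + divergence(Q, R) - divergence(P, R)


def pairing(P, Q, R):
    """(x(Q) - x(P)) . (p(Q) - p(R)), the quantity the defect reduces to."""
    dx = (Q[0][0] - P[0][0], Q[0][1] - P[0][1])
    dp = (Q[1][0] - R[1][0], Q[1][1] - R[1][1])
    return dot(dx, dp)


if __name__ == "__main__":
    Q = point(-1.0, 1.0)   # p(Q) = (1, 1)
    P = point(2.0, 3.0)    # joined to Q by a segment that is straight in x
    R = point(-2.0, -3.5)  # p(R) = (4, -3.5) = p(Q) + 1.5 * (2, -3)
    # The directions (3, 2) in x and (2, -3) in p pair to zero.
    print(pairing(P, Q, R), pythagorean_defect(P, Q, R))
```

Note that the point with $x = (2, -3.5)$ has the same $p$-coordinates as `R`, since the projection to the $p$-coordinates folds along $x_1 = 0$; the identity between defect and pairing does not distinguish the two preimages.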
Theorem 4.8.
(Projection Theorem) Let be a submanifold of and an -curve with . Put . Then, and are orthogonal at if and only if is a critical point of the function . The same holds for an -curve and .
Proof : Take a generating function around . Recall , and . Let be an immersed curve on with . On this curve, we have and . Therefore, we see
At , the vector is a scalar multiple of the direction vector of the line in to which the -curve is projected, and . Hence, the orthogonality of and at is equivalent to that for arbitrary , that means that is critical at .
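The computation in this proof can be mimicked numerically. The following sketch is a hedged illustration on a hypothetical toy model (generating function $f(x) = x_1^3/3 + x_2^2/2$, divergence assumed to be $D(P,Q) = z(P) + z'(Q) - x(P)\cdot p(Q)$, and a curve $s \mapsto (s, 1+s)$ in the $x$-coordinates playing the role of the submanifold). It compares a finite-difference derivative of $s \mapsto D(P, \gamma(s))$ with the pairing $(x(\gamma(s)) - x(P))\cdot \frac{d}{ds}p(\gamma(s))$, so that critical points of the divergence are exactly the zeros of the pairing. Which slot of the divergence is varied, and all names, are assumptions of this example.

```python
def point(x1, x2):
    """Point (x, p, z, z_dual) of the toy model generated by f = x1**3/3 + x2**2/2."""
    x = (x1, x2)
    p = (x1 ** 2, x2)
    z = x1 ** 3 / 3 + x2 ** 2 / 2
    z_dual = p[0] * x[0] + p[1] * x[1] - z
    return x, p, z, z_dual


def divergence(P, Q):
    """Assumed form D(P, Q) = z(P) + z'(Q) - x(P).p(Q)."""
    return P[2] + Q[3] - (P[0][0] * Q[1][0] + P[0][1] * Q[1][1])


def curve(s):
    """Hypothetical one-dimensional submanifold: x(s) = (s, 1 + s)."""
    return point(s, 1.0 + s)


def slope(P, s, eps=1e-5):
    """Central-difference derivative of s -> D(P, curve(s))."""
    return (divergence(P, curve(s + eps)) - divergence(P, curve(s - eps))) / (2 * eps)


def pairing(P, s):
    """(x(curve(s)) - x(P)) . dp/ds, where p(curve(s)) = (s**2, 1 + s)."""
    S = curve(s)
    dp = (2.0 * s, 1.0)
    return (S[0][0] - P[0][0]) * dp[0] + (S[0][1] - P[0][1]) * dp[1]


if __name__ == "__main__":
    P = point(0.5, 3.0)
    for s in (-2.0, -1.0, 0.0, 1.0, 2.5):
        print(s, slope(P, s), pairing(P, s))
```

For this choice of `P` the pairing equals $2s^2 - 2$, so there are two critical points, $s = 1$ and $s = -1$, on opposite sides of the degeneracy locus $x_1 = 0$; the criterion detects critical points, not minima.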
Example 4.9.
We check the Pythagorean theorem for a toy example in Example 3.4. Let
and use affine local coordinates for . The -Lagrange map is , and is the -axis. We have already computed the dual potential , thus for and , we have
A straight line on corresponds to a parabola on (i.e. an -curve). Now, for example, take an -curve : (), and two points and lying on it. Take a point on the straight line on (i.e. an -curve) passing through directed by with . Then satisfies the condition, and we see . It does not matter whether the point lies on or not.

4.3. Weak contrast functions
First we recall the definition of contrast functions (Eguchi [13]). Let be a manifold and a function defined on an open neighborhood of the diagonal . Given vector fields , on , we set a function
by assigning to the value
We also write and so on. We call a contrast function of if it satisfies that
(iii) is a pseudo-Riemannian metric on .
We call a weak contrast function if it satisfies only (i) and (ii).
Given a contrast function , affine connections are defined by
Those connections are torsion-free, mutually dual with respect to , and is symmetric, and therefore, becomes a statistical manifold [13, 20]. Conversely, given a statistical manifold , one can find a contrast function which reproduces the metric and connections [19] – it is actually shown in [19] that for a symmetric -tensor (i.e., a possibly degenerate metric) and a symmetric -tensor , one can find a weak contrast function which satisfies that
Among statistical manifolds, Hessian manifolds admit a notable property: the Bregman divergence is a contrast function, and it reproduces the dually flat structure. That is extended to our quasi-Hessian manifold and its canonical divergence.
Theorem 4.10.
For a quasi-Hessian manifold , the canonical divergence is a weak contrast function, and reproduces the quasi-Hessian metric and the canonical cubic tensor by
Proof : Since this is a local property, take a local model . Suppose that is a generating function for around . Then is a system of local coordinates for around , and it holds that , , for close to . Hence,
Let denote if and if , for short. Then
where if and if . It immediately follows that
so the divergence is a weak contrast function. Put
Then a simple computation shows that
Hence and
for any . This coincides with the cubic tensor by Proposition 3.24 up to the sign.
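A one-dimensional numerical check may help to see what the theorem asserts. The sketch below is an illustration under assumptions: it uses the hypothetical generating function $f(x) = x^3/3$ (so $f'' = 2x$ vanishes at $x = 0$ and $f''' = 2$) and assumes the divergence $D(x,y) = f(x) + (y f'(y) - f(y)) - x f'(y)$. Nested central differences then show that $D$ and its first derivatives vanish on the diagonal, that $-\partial_x\partial_y D$ restricts to $f''$ there, and that $\partial_x\partial_y^2 D - \partial_x^2\partial_y D$ equals $-f'''$, in line with the "up to the sign" statement. The function names and the step size are choices made for this example.

```python
def f(x):
    """Hypothetical generating function with an inflection point at 0."""
    return x ** 3 / 3


def df(x):
    return x ** 2


def D(x, y):
    """Assumed divergence: f(x) + (y*f'(y) - f(y)) - x*f'(y)."""
    return f(x) + (y * df(y) - f(y)) - x * df(y)


def partial(F, nx, ny, x, y, h=1e-3):
    """Nested central-difference estimate of d^nx/dx^nx d^ny/dy^ny F at (x, y)."""
    if nx > 0:
        return (partial(F, nx - 1, ny, x + h, y, h)
                - partial(F, nx - 1, ny, x - h, y, h)) / (2 * h)
    if ny > 0:
        return (partial(F, 0, ny - 1, x, y + h, h)
                - partial(F, 0, ny - 1, x, y - h, h)) / (2 * h)
    return F(x, y)


if __name__ == "__main__":
    for a in (0.7, 0.0, -1.3):
        metric = -partial(D, 1, 1, a, a)                        # compare with f''(a) = 2a
        cubic = partial(D, 1, 2, a, a) - partial(D, 2, 1, a, a)  # compare with -f''' = -2
        print(a, D(a, a), partial(D, 1, 0, a, a), partial(D, 0, 1, a, a), metric, cubic)
```

At `a = 0.0` the recovered metric vanishes while the cubic term does not, which is the situation of a weak contrast function that is not a contrast function.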
5. Discussions
We briefly discuss possible directions and proposals for further research.
5.1. Pre-Frobenius structure
In mathematical physics such as string theory, there often arise manifolds endowed with commutative and associative multiplication on tangent spaces satisfying certain properties, called (several variations of) Frobenius manifolds [10]. Now, let be a flat Hessian manifold, i.e., the metric connection with respect to is flat. Then naturally carries a (weak) version of Frobenius structure [27, §2]. Put using -affine coordinates, and we may take them as structure constants to define a multiplication on :
Since is symmetric, it is commutative. The associativity, , is written as
and a bit surprisingly, the left hand side coincides with the curvature tensor for the Levi-Civita connection of [11, 23]; the equation is actually known as the WDVV equation in string theory. Moreover, it is easy to see that the multiplication is compatible with the metric: . Then the tuple becomes a weak pre-Frobenius manifold (cf. [10, 15]). For a quasi-Hessian manifold , the symmetric cubic tensor is defined everywhere, but is not; even so, the WDVV equation makes sense. Then, at least for every (pointwise), the quotient carries a Frobenius algebra structure.
A new pre-Frobenius structure on a certain space of probability distributions has recently been found using the Hessian geometry on convex cones and paracomplex structure in [9]. Also from the context of Poisson and paraKähler geometry, the notion of contravariant pseudo-Hessian manifolds has been introduced in [7], which is actually very close to our quasi-Hessian manifolds with degenerate potentials. Those should be mutually related.
As a different question from the above, more interesting is the local geometry of quasi-Hessian in relation with the Saito-Givental theory: under a certain condition, the germ of at a point should be a real geometry counterpart to the analytic spectrum of a massive F-manifold (cf. [15, §3]; the analytic spectrum is a certain holomorphic Legendre submanifold of defined by a versal deformation of a complex isolated hypersurface singularity as its generating family). Perhaps, this was essentially posed by Arnol’d [5, §4].
5.2. Statistical inference and machine learning
Suppose that our statistical model is a curved exponential family, i.e., a submanifold of an exponential family (see Example 2.6). Let be the associated Bregman divergence, which is known to coincide with the Kullback-Leibler divergence
measuring an ‘asymmetrical distance’ from a distribution (density function) to another . A given data set yields an observed point , then the task of statistical inference is to find which best approximates the point . Information geometry [1, 2] provides a clear geometric understanding of the maximum likelihood estimate (MLE), that is, the MLE assigns to the point which attains the minimum of , and especially, is projected to along an -geodesic (-curve) being orthogonal to at . We have shown that this assertion is valid even in the case where admits the locus where the Fisher-Rao metric is degenerate (Theorem 4.8), see Fig. 5.
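The coincidence of the Bregman and Kullback-Leibler divergences can be checked by hand on the simplest exponential family. The sketch below is an illustration for the Bernoulli family only, with natural parameter $\theta = \log\frac{p}{1-p}$ and log-partition function $\psi(\theta) = \log(1 + e^\theta)$; it verifies that $\mathrm{KL}(p\,\|\,q) = \psi(\theta_q) - \psi(\theta_p) - (\theta_q - \theta_p)\,\psi'(\theta_p)$. The choice of family, the order of the arguments and the function names are assumptions made for this example.

```python
import math


def logit(p):
    """Natural parameter of the Bernoulli distribution with success probability p."""
    return math.log(p / (1.0 - p))


def log_partition(theta):
    """psi(theta) = log(1 + exp(theta))."""
    return math.log1p(math.exp(theta))


def mean(theta):
    """Expectation parameter psi'(theta), i.e. the success probability."""
    return 1.0 / (1.0 + math.exp(-theta))


def kl_bernoulli(p, q):
    """Kullback-Leibler divergence from Bernoulli(p) to Bernoulli(q)."""
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def bregman_log_partition(theta_q, theta_p):
    """Bregman divergence of psi: psi(theta_q) - psi(theta_p) - (theta_q - theta_p) * psi'(theta_p)."""
    return (log_partition(theta_q) - log_partition(theta_p)
            - (theta_q - theta_p) * mean(theta_p))


if __name__ == "__main__":
    # The two numbers printed on each line should agree up to rounding.
    for p, q in [(0.3, 0.8), (0.5, 0.1), (0.9, 0.6)]:
        print(p, q, kl_bernoulli(p, q), bregman_log_partition(logit(q), logit(p)))
```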

If is sufficiently close to and far from , then the asymptotic theory of estimation is discussed. However, in practice, we may not be able to know if is the case. For instance, it often happens that the MLE has multiple local minima, i.e., the maximum likelihood equation may have multiple roots. Then, as varies by renewing the data, catastrophe phenomena (the birth and death of minimum and maximum points) can happen. Actually, the ambiguity of root selection in MLE has been studied by practical and numerical approaches (cf. [26, §4]), while there seem to be fewer theoretical approaches so far. Our framework provides a natural approach from information geometry. Define
and we may consider as a global generating family [4, p.323], i.e., it defines a Legendre submanifold of by
where we roughly denote by the differential with respect to and so on. This gives a typical example of a quasi-Hessian manifold. The critical value set of the Lagrange map () is nothing but the envelope of the family of all -curves on which are orthogonal to ; we call it the -caustics determined by . If is not -flat, the -caustics usually appear (that reflects the -extrinsic geometry of in ). It turns out that the catastrophe phenomenon mentioned above arises when the data manifold intersects the -caustics determined by . Conversely, for a given data manifold , we may consider the restriction of to and define the -caustics determined by similarly. Interaction between these two -caustics can be involved and affect the performance of the EM-algorithm (cf. Amari [2, Chap. 8]). Note that in principle, the above strategy may be adapted to any divergence and any statistical model. The details will be discussed elsewhere.
As described in Amari [2, Chap.11], a class of learning machines is also based on the Bregman divergence of convex functions . Now, as an attempt, suppose that is a nonconvex function (possibly with inflection points). Read to be the corresponding canonical divergence in our sense (see §4.2). Here we would like to note that the same proofs as in the convex case often work to obtain slightly weaker results for such general ; an easy example is Theorem 11.1 of [2], which reads as “the -mean of a cluster in is always a critical point of , and all other critical points are obtained from and ”. We expect similar results for some other optimization algorithms. On the other hand, almost all statistical learning machines allow Fisher-Rao matrices to be degenerate [14, 28]. In particular, as in [2, Chap.12], most deep learning machines use Gaussian noise with a fixed (co)variance for regression; then the parameter space becomes a self-dual Riemannian manifold off the degeneracy locus of having many components. We seek another scheme for measuring errors which is compatible with our singular model.
5.3. Conclusion
In the present paper, we have proposed an information geometry for singular models from the viewpoint of contact geometry and singularity theory. We have introduced quasi-Hessian manifolds, which extend the notion of dually flat manifolds of Amari-Nagaoka so that the Hessian metric can be degenerate, but the canonical cubic tensor is consistently defined on the entire space. Most notable is that the extended Pythagorean theorem and projection theorem are valid even in this singular setup.
There are several further directions as mentioned above. We end by adding a few more comments. There is an on-going project of the first author on local classification of singularities of -wavefronts in flat affine coordinates, which extends an old work of Ekeland [12] in nonconvex optimization and leads to affine differential geometry of wavefronts (cf. [24]). Secondly, since a quasi-Hessian manifold is embedded in some contact manifold, we may think of the Hamilton-Jacobi method for time evolution of quasi-Hessian manifolds (wavefront propagation) and semi-classical quantization (WKB analysis) in our framework (cf. [3]). Finally, it would be valuable to find some connections with preceding excellent works on singular statistical models [2, 14, 28]; in particular, we hope that the theory of singular Legendre varieties and Legendre currents would make a bridge between the differential geometric method [1, 2] and the algebro-geometric method [28].
References
- [1] S. Amari, H. Nagaoka, Methods of Information Geometry, A.M.S., Oxford Univ. Press (2000).
- [2] S. Amari, Information Geometry and Its Applications, Applied Math. Sci., 194, Springer (2016).
- [3] V.I. Arnol’d, Mathematical Methods of Classical Mechanics, 2nd Edition, Grad. Texts Math. 60, Springer-Verlag (1989).
- [4] V.I. Arnol’d, S.M. Gusein-Zade, A.N. Varchenko, Singularities of Differentiable Maps I, Monographs in Math. 82, Birkhäuser (1986).
- [5] V.I. Arnol’d, Singularities of Caustics and Wave Fronts, Kluwer Acad. Publ. (1990).
- [6] V.I. Arnol’d, Catastrophe Theory, 3rd Edition, Springer-Verlag (1992).
- [7] S. Benayadi and M. Boucetta, On para-Kähler Lie algebroids and contravariant pseudo-Hessian structures, Math. Nachrichten 292 (2019), 1418–1443.
- [8] N. Chentsov, Statistical decision rules and optimal inference, Translation of Math. Monograph 53, AMS, Providence (1982).
- [9] N. Combe and Y.I. Manin, F-manifolds and geometry of information, preprint (2020), ArXiv:2004.08808.
- [10] B. Dubrovin, Geometry of 2D topological field theories, Springer Lect. Note Series 1620 (1996), 120–348.
- [11] J. Duistermaat, On Hessian Riemannian Structures, Asian J. Math. 5 (2001), 79–91.
- [12] I. Ekeland, Legendre duality in nonconvex optimization and calculus of variations, SIAM J. Control and Optimization, 15 (6) (1977), 905–934.
- [13] S. Eguchi, Geometry of minimum contrast, Hiroshima Math. J., 22 (1992), 631–647.
- [14] K. Fukumizu and S. Kuriki, Statistics of Singular Models, Frontier Stat. Sci. 7, Iwanami Publ. (2004) (in Japanese).
- [15] C. Hertling, Frobenius Manifolds and Moduli Spaces for Singularities, Cambridge Univ. Press (2002).
- [16] G. Ishikawa, Parametrization of a singular Lagrangian variety, Trans. Amer. Math. Soc. 331 (1992), 787–798.
- [17] S. Izumiya and G. Ishikawa, Applied Singularity Theory (in Japanese), Kyoritsu Shuppan, Co. Ltd. (1998).
- [18] S. Izumiya, M.C. Romero-Fuster, M.A.S. Ruas, F. Tari, Differential Geometry from a Singularity Theory Viewpoint, World Scientific (2015).
- [19] T. Matsumoto, Any statistical manifold has a contrast function – On the -functions taking the minimum at the diagonal of the product manifold, Hiroshima Math. J. 23 (1993), 327–332.
- [20] H. Matsuzoe, Statistical manifolds and affine differential geometry, Advanced Stud. Pure Math. 57 (2010), 303–321.
- [21] T. Poston, I. Stewart, Catastrophe Theory and Its Applications, Pitman Publ. Ltd. (1978).
- [22] H. Sano, Y. Kabata, J.L. Deolindo-Silva, T. Ohmoto, Classification of jets of surfaces in projective 3-space via central projection, Bull. Brazilian Math. Soc., New Series 48 (2017), 623–639.
- [23] H. Kito, On Hessian structures on the Euclidean space and the hyperbolic space, Osaka J. Math. 36 (1999), 51–62.
- [24] K. Saji, M. Umehara, K. Yamada, The geometry of fronts, Annals of Math., 169 (2009), 491–529.
- [25] H. Shima, The geometry of Hessian Structures, World Scientific (2007).
- [26] C. Small and W. Jinfang, Numerical Methods for Nonlinear Estimating Equations, Oxford Univ. Press (2003).
- [27] B. Totaro, The curvature of a Hessian metric, Internat. J. Math. 15 (2004), 369–391.
- [28] S. Watanabe, Algebraic Geometry and Statistical Learning Theory, Cambridge Univ. Press (2008).