
Universal Approximation on the Hypersphere

Tin Lok James Ng    Kwok-Kun Kwong
Abstract

It is well known that any continuous probability density function on $\mathbb{R}^{m}$ can be approximated arbitrarily well by a finite mixture of normal distributions, provided that the number of mixture components is sufficiently large. The von Mises-Fisher distribution, defined on the unit hypersphere $S^{m}$ in $\mathbb{R}^{m+1}$, has properties that are analogous to those of the multivariate normal on $\mathbb{R}^{m+1}$. We prove that any continuous probability density function on $S^{m}$ can be approximated to an arbitrary degree of accuracy by a finite mixture of von Mises-Fisher distributions.

1 Introduction

Finite mixtures of distributions (McLachlan and Peel, 2000) are widely used in many fields to model random phenomena. In a finite mixture model, the distribution of the observations is modelled as a mixture of a finite number of component distributions with varying proportions. The finite mixture of normal distributions (Fraley and Raftery, 2002) is one of the most frequently used finite mixture models for continuous data taking values in Euclidean space, because of its flexibility in representing arbitrary distributions. Indeed, given a sufficient number of mixture components, a finite mixture of normals can approximate any continuous probability density function to any desired level of accuracy (Bacharoglou, 2010; Nguyen and McLachlan, 2019).

Despite the success and popularity of finite mixtures of normal distributions in a wide range of applications, data frequently possess additional structure, and representing them as unconstrained Euclidean vectors may be inappropriate. An important case is data normalized to have unit norm, which are naturally represented as points on the unit hypersphere $S^{m}:=\{x\in\mathbb{R}^{m+1}:\|x\|_{2}=1\}$. For example, the direction of flight of a bird or the orientation of an animal can be represented as a point on the circle $S^{1}$ or the sphere $S^{2}$. Consequently, standard methods for analyzing univariate or multivariate data cannot be applied directly, and distributions that take the directional nature of the data into account are required.

The von Mises-Fisher distribution (Fisher et al., 1993) is one of the most commonly used distributions for directional data on $S^{m}$, and has properties analogous to those of the multivariate normal on $\mathbb{R}^{m+1}$. A unit vector $x\in S^{m}$ has a von Mises-Fisher distribution if it has density

$$f_{m+1}(x;\mu,\kappa)=c_{m+1}(\kappa)\exp(\kappa\langle x,\mu\rangle),\quad x\in S^{m},$$

where $\kappa>0$ is the concentration parameter and $\mu\in S^{m}$ (so that $\|\mu\|=1$) is the mean direction. As $\kappa$ increases, the distribution becomes increasingly concentrated at $\mu$. The normalizing constant $c_{m+1}(\kappa)$, taken with respect to the surface measure on $S^{m}$, is given by

$$c_{m+1}(\kappa)=\frac{\kappa^{\frac{m+1}{2}-1}}{(2\pi)^{\frac{m+1}{2}}I_{\frac{m+1}{2}-1}(\kappa)},$$

where $I_{v}$ is the modified Bessel function of the first kind of order $v$.
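To make the normalizing constant concrete, here is a minimal Python sketch, assuming NumPy and SciPy are available; the function names `log_vmf_const` and `vmf_density` are illustrative and not from the paper. It evaluates $\log c_{m+1}(\kappa)$ through SciPy's exponentially scaled Bessel function `ive`, because $I_v(\kappa)$ itself overflows for large $\kappa$. The argument `p` stands for the ambient dimension $m+1$.

```python
import numpy as np
from scipy.special import ive, iv


def log_vmf_const(kappa, p):
    """Return log c_p(kappa) for the von Mises-Fisher density on S^{p-1} in R^p.

    c_p(kappa) = kappa^(p/2 - 1) / ((2 pi)^(p/2) * I_{p/2-1}(kappa)), with p = m + 1.
    ive(v, k) = iv(v, k) * exp(-k) for k > 0, so log I_v(k) = log(ive(v, k)) + k.
    """
    v = p / 2.0 - 1.0
    log_bessel = np.log(ive(v, kappa)) + kappa
    return v * np.log(kappa) - (p / 2.0) * np.log(2.0 * np.pi) - log_bessel


def vmf_density(x, mu, kappa):
    """f_p(x; mu, kappa) = c_p(kappa) exp(kappa <x, mu>) for unit vectors x (..., p) and mu (p,)."""
    p = mu.shape[-1]
    return np.exp(log_vmf_const(kappa, p) + kappa * (x @ mu))


# Sanity checks against closed forms on S^2 (p = 3) and S^1 (p = 2).
kappa = 2.0
print(np.exp(log_vmf_const(kappa, 3)), kappa / (4 * np.pi * np.sinh(kappa)))
print(np.exp(log_vmf_const(kappa, 2)), 1 / (2 * np.pi * iv(0, kappa)))
```

For $m=2$ the formula reduces to $c_3(\kappa)=\kappa/(4\pi\sinh\kappa)$, and for $m=1$ to $c_2(\kappa)=1/(2\pi I_0(\kappa))$, which gives two quick checks of the implementation.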

A finite mixture of von Mises-Fisher distributions on $S^{m}$ with $H$ components has density

$$f_{m+1}(x;\{\pi_{h},\mu_{h},\kappa_{h}\}_{h=1}^{H})=\sum_{h=1}^{H}\pi_{h}f_{m+1}(x;\mu_{h},\kappa_{h}).\qquad(1)$$

The mixing proportions $\{\pi_{h}\}_{h=1}^{H}$ are non-negative and sum to 1 (i.e. $0\leq\pi_{h}\leq 1$, $\sum_{h}\pi_{h}=1$), and $\{\mu_{h},\kappa_{h}\}_{h=1}^{H}$ are the parameters of the $H$ mixture components.
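A minimal sketch of evaluating the mixture density (1) on $S^2$, assuming NumPy and SciPy; `vmf_mixture_density` and the example weights, mean directions and concentrations are hypothetical. It also checks numerically that (1) integrates to 1, using the parametrization $u=\cos\theta$ (polar angle $\theta$), under which $d\omega_2=du\,d\phi$.

```python
import numpy as np
from scipy.special import ive


def log_vmf_const(kappa, p):
    # log c_p(kappa) from the formula for c_{m+1} in the text (p = m + 1), via ive
    v = p / 2.0 - 1.0
    return v * np.log(kappa) - (p / 2.0) * np.log(2 * np.pi) - (np.log(ive(v, kappa)) + kappa)


def vmf_mixture_density(x, weights, mus, kappas):
    """Mixture density (1): sum_h pi_h f_p(x; mu_h, kappa_h).

    x: (n, p) unit vectors; weights: (H,); mus: (H, p) unit vectors; kappas: (H,).
    """
    weights = np.asarray(weights, dtype=float)
    kappas = np.asarray(kappas, dtype=float)
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("mixing proportions must be non-negative and sum to 1")
    p = mus.shape[1]
    log_c = np.array([log_vmf_const(k, p) for k in kappas])        # (H,)
    log_comp = log_c[None, :] + (x @ mus.T) * kappas[None, :]      # (n, H)
    return np.exp(log_comp) @ weights                              # (n,)


# Numerical check on S^2: with u = cos(polar angle), d omega_2 = du dphi.
n_u, n_phi = 400, 400
u = -1 + (np.arange(n_u) + 0.5) * (2.0 / n_u)        # midpoint rule in u
phi = np.arange(n_phi) * (2 * np.pi / n_phi)         # periodic rule in phi
U, PHI = np.meshgrid(u, phi, indexing="ij")
s = np.sqrt(1 - U**2)
pts = np.stack([s * np.cos(PHI), s * np.sin(PHI), U], axis=-1).reshape(-1, 3)
cell_area = (2.0 / n_u) * (2 * np.pi / n_phi)

mus = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, -0.6, 0.8]])
total = vmf_mixture_density(pts, [0.5, 0.3, 0.2], mus, [5.0, 2.0, 8.0]).sum() * cell_area
print(total)  # should be close to 1
```

Working with log-components keeps each term finite even for large concentrations, and the validation of the mixing proportions mirrors the constraint stated after (1).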

Finite mixtures of von Mises-Fisher distributions have found numerous applications, including clustering of high-dimensional text and gene-expression data (Banerjee et al., 2005) and clustering of online user behavior (Qin et al., 2016). A natural question is whether finite mixtures of von Mises-Fisher distributions can approximate any continuous probability distribution on the hypersphere to any desired level of accuracy.

In this paper, we provide an affirmative answer to this question. We prove that any continuous probability density function on $S^{m}$ can be approximated in the sup norm by a finite mixture of von Mises-Fisher distributions, provided that there are enough mixture components and each component is sufficiently concentrated around its mean direction. Our proof uses the theory of approximation by spherical convolution (Menegatto, 1997).

The paper is structured as follows. Section 2 provides the background needed for the proof of the main result. The main result is stated in Section 3 and proved in Section 4.

2 Background

This section provides the definitions of kernels, spherical convolution and the eigenfunction expansion needed for the proof of the main result. We refer the interested reader to Menegatto (1997) for a detailed exposition of the theory.

We denote the space of all continuous functions on the hypersphere $S^{m}$ by $C(S^{m})$. Let $d\omega_{m}$ be the surface measure on $S^{m}$, and define $\omega_{m}:=\int_{S^{m}}d\omega_{m}$. The uniform and the $\mathcal{L}^{p}$ norms on $S^{m}$ are defined as

$$\|f\|_{m,\infty}:=\sup_{x\in S^{m}}|f(x)|$$

and

$$\|f\|_{m,p}:=\bigg(\frac{1}{\omega_{m}}\int_{S^{m}}|f(x)|^{p}\,d\omega_{m}(x)\bigg)^{1/p},$$

respectively. The space $\mathcal{L}^{p}(S^{m})$, $1\leq p<\infty$, consists of all measurable functions $f$ on $S^{m}$ with $\|f\|_{m,p}<\infty$, that is, those for which $|f|^{p}$ is integrable with respect to $d\omega_{m}$. When no confusion arises, we let $V_{m}$ denote any of the spaces $C(S^{m})$ or $\mathcal{L}^{p}(S^{m})$, $1\leq p<\infty$, with corresponding norm $\|\cdot\|_{m}$ (i.e. $\|\cdot\|_{m}=\|\cdot\|_{m,p}$ for $1\leq p<\infty$, or $\|\cdot\|_{m}=\|\cdot\|_{m,\infty}$ on $C(S^{m})$).

We define the space $\mathcal{L}^{1,m}$, which consists of all measurable functions $K$ on $[-1,1]$ with

$$\|K\|_{1,m}:=\frac{\omega_{m-1}}{\omega_{m}}\int_{-1}^{1}|K(t)|(1-t^{2})^{(m-2)/2}\,dt<\infty.$$

Functions in $\mathcal{L}^{1,m}$ are called kernels. Let $\langle\cdot,\cdot\rangle$ be the inner product in $\mathbb{R}^{m+1}$. By the change of variables $t=\langle x,y\rangle$, it is straightforward to show that for all $x\in S^{m}$,

$$\|K\|_{1,m}=\frac{1}{\omega_{m}}\int_{S^{m}}|K(\langle x,y\rangle)|\,d\omega_{m}(y).$$
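The two expressions for $\|K\|_{1,m}$ can be checked numerically. The sketch below, assuming NumPy and SciPy, uses the standard formula $\omega_m=2\pi^{(m+1)/2}/\Gamma((m+1)/2)$; the helper names `sphere_area` and `kernel_norm` are hypothetical, and the sketch is restricted to $m\geq 2$ so that the weight $(1-t^2)^{(m-2)/2}$ has no endpoint singularity. A constant kernel equal to 1 should have norm 1, while a von Mises-Fisher density, which integrates to 1 against $d\omega_m$, should have norm $1/\omega_m$.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma


def sphere_area(m):
    """omega_m: surface area of S^m in R^{m+1}, 2 pi^{(m+1)/2} / Gamma((m+1)/2)."""
    return 2 * np.pi ** ((m + 1) / 2) / gamma((m + 1) / 2)


def kernel_norm(K, m):
    """||K||_{1,m} = (omega_{m-1}/omega_m) int_{-1}^{1} |K(t)| (1-t^2)^{(m-2)/2} dt."""
    if m < 2:
        raise ValueError("this sketch assumes m >= 2")
    integrand = lambda t: abs(K(t)) * (1 - t * t) ** ((m - 2) / 2)
    val, _ = quad(integrand, -1.0, 1.0)
    return sphere_area(m - 1) / sphere_area(m) * val


# Constant kernel: (1/omega_m) * int_{S^m} 1 d omega_m = 1.
print([kernel_norm(lambda t: 1.0, m) for m in (2, 3, 5)])

# von Mises-Fisher kernel on S^2 with c_3(kappa) = kappa / (4 pi sinh kappa):
# it integrates to 1 against d omega_2, so its norm is 1 / omega_2 = 1 / (4 pi).
kappa = 3.0
c3 = kappa / (4 * np.pi * np.sinh(kappa))
print(kernel_norm(lambda t: c3 * np.exp(kappa * t), 2), 1 / (4 * np.pi))
```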

The spherical convolution $K*f$ of a kernel $K\in\mathcal{L}^{1,m}$ with a function $f\in V_{m}$ is defined by

$$(K*f)(x):=\frac{1}{\omega_{m}}\int_{S^{m}}K(\langle x,y\rangle)f(y)\,d\omega_{m}(y),\quad x\in S^{m}.$$
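On the circle ($m=1$) the definition is easy to evaluate: with $x=(\cos\theta,\sin\theta)$ and $y=(\cos\phi,\sin\phi)$ we have $\langle x,y\rangle=\cos(\theta-\phi)$, $d\omega_1=d\phi$ and $\omega_1=2\pi$. The following is a minimal sketch, assuming NumPy and SciPy, that discretizes the convolution with the periodic trapezoidal rule; `spherical_convolution_s1` is a hypothetical helper, not from the paper.

```python
import numpy as np
from scipy.special import iv


def spherical_convolution_s1(K, f_vals):
    """(K*f)(x_i) on S^1 at the grid x_i = (cos theta_i, sin theta_i), theta_i = 2 pi i / n.

    (K*f)(theta) = (1/(2 pi)) int_0^{2 pi} K(cos(theta - phi)) f(phi) dphi,
    approximated by the periodic trapezoidal rule on the same grid.
    """
    n = len(f_vals)
    theta = 2 * np.pi * np.arange(n) / n
    gram = np.cos(theta[:, None] - theta[None, :])     # matrix of <x_i, y_j>
    return K(gram) @ f_vals / n                        # (1/(2 pi)) * sum * (2 pi / n)


# Example: K(t) = exp(2 t) and f = 1 give (K*1)(x) = I_0(2) for every x.
out = spherical_convolution_s1(lambda t: np.exp(2.0 * t), np.ones(128))
print(out[:3], iv(0, 2.0))
```

The trapezoidal rule is a natural choice here because the integrand is smooth and periodic in $\phi$, so it converges very quickly as the grid is refined.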

For a fixed kernel $K$, the mapping $f\mapsto K*f$, $f\in V_{m}$, defined by the spherical convolution has range in $V_{m}$.

A useful property of the spherical convolution is the Funk-Hecke formula (Xu, 2000), which gives the eigenfunction expansion of any kernel $K\in\mathcal{L}^{1,m}$. Let $\mathcal{H}_{k}^{m}$ be the space of all spherical harmonics of degree $k$ in $m+1$ variables and let $N^{m}_{k}$ be its dimension (Reimer, 2012, Chapter 3). Let $Q_{k}^{(m-1)/2}$ be the Gegenbauer polynomial of degree $k$ normalized by $Q_{k}^{(m-1)/2}(1)=N_{k}^{m}$. The Gegenbauer polynomials are special cases of the Jacobi polynomials and are conveniently defined through generating functions (Reimer, 2012, Chapter 2).

The Funk-Hecke formula states that for a kernel $K\in\mathcal{L}^{1,m}$ the following holds:

$$K*Y_{k}^{m}=a_{k}^{m}(K)\,Y_{k}^{m},\quad Y_{k}^{m}\in\mathcal{H}_{k}^{m},\ k=0,1,\ldots$$

That is, the spherical harmonics $Y_{k}^{m}$, $k=0,1,\ldots$, are eigenfunctions of the convolution operator $f\mapsto K*f$, and the corresponding eigenvalues can be expressed in terms of Gegenbauer polynomials:

$$a_{k}^{m}(K)=\frac{\omega_{m-1}}{\omega_{m}}\int_{-1}^{1}K(t)\frac{Q_{k}^{(m-1)/2}(t)}{Q_{k}^{(m-1)/2}(1)}(1-t^{2})^{(m-2)/2}\,dt,\quad k=0,1,\ldots$$

In particular, we have

$$a_{0}^{m}(K)=\frac{\omega_{m-1}}{\omega_{m}}\int_{-1}^{1}K(t)(1-t^{2})^{(m-2)/2}\,dt.$$
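The Funk-Hecke formula can be verified numerically on the circle. For $m=1$ we have $\omega_0=2$ and $\omega_1=2\pi$, the normalized polynomial $Q_k^{0}(t)/Q_k^{0}(1)$ reduces to the Chebyshev polynomial $T_k(t)$, and the substitution $t=\cos\psi$ turns the eigenvalue formula into $a_k^1(K)=\frac{1}{\pi}\int_0^\pi K(\cos\psi)\cos(k\psi)\,d\psi$. For $K(t)=e^{\kappa t}$ this equals $I_k(\kappa)$ by a standard integral representation of the modified Bessel function. The sketch below, assuming NumPy and SciPy and with arbitrarily chosen $\kappa$, $k$ and grid size, convolves this kernel with the degree-$k$ harmonic $\cos(k\theta)$ and compares the result with $a_k\cos(k\theta)$.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import iv

n_grid, kappa, k = 256, 2.0, 3
theta = 2 * np.pi * np.arange(n_grid) / n_grid
gram = np.cos(theta[:, None] - theta[None, :])      # <x_i, y_j> on S^1
Y = np.cos(k * theta)                                # a degree-k spherical harmonic on S^1
conv = np.exp(kappa * gram) @ Y / n_grid             # (K*Y)(x_i), trapezoidal rule

# Eigenvalue from the 1-D formula with m = 1: a_k = (1/pi) int_0^pi K(cos psi) cos(k psi) dpsi
a_k = quad(lambda psi: np.exp(kappa * np.cos(psi)) * np.cos(k * psi), 0.0, np.pi)[0] / np.pi

print(np.max(np.abs(conv - a_k * Y)))   # Funk-Hecke: K*Y = a_k Y, up to rounding
print(a_k, iv(k, kappa))                # for K(t) = exp(kappa t), a_k = I_k(kappa)
```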

Menegatto (1997) has investigated necessary and sufficient conditions under which a sequence of kernels $\{K_{n}\}_{n}$ in $\mathcal{L}^{1,m}$ has the property

$$\|K_{n}*f-f\|_{m}\rightarrow 0,\quad\forall f\in V_{m}$$

as $n\rightarrow\infty$. For non-negative kernels $\{K_{n}\}_{n}$, Theorem 3.4 of Menegatto (1997) provides sufficient conditions for the convergence $K_{n}*f\rightarrow f$, and is stated below.

Lemma 1.

Let $\{K_{n}\}$ be a sequence of non-negative kernels in $\mathcal{L}^{1,m}$. Suppose that

  1. $a_{0}^{m}(K_{n})\rightarrow 1$ as $n\rightarrow\infty$;

  2. $(\omega_{m-1}/\omega_{m})\int_{-1}^{\rho}|K_{n}(t)|(1-t^{2})^{(m-2)/2}\,dt\rightarrow 0$ as $n\rightarrow\infty$, for all $\rho\in(-1,1)$.

Then $\|K_{n}*f-f\|_{m}\rightarrow 0$ as $n\rightarrow\infty$ for all $f\in V_{m}$.

3 Main Result

We now state the main result concerning the approximation properties of finite mixtures of von Mises-Fisher distributions of the form (1). Recall that the probability density function of the von Mises-Fisher distribution for a random $(m+1)$-dimensional unit vector $x$ is

$$f_{m+1}(x;\mu,\kappa)=c_{m+1}(\kappa)\exp(\kappa\langle x,\mu\rangle),$$

where $\mu\in S^{m}$ is the mean direction and $\kappa>0$ is the concentration parameter. We define a sequence of kernels $\{K_{n}\}_{n}$ in $\mathcal{L}^{1,m}$ by

$$K_{n}(t)=c_{m+1}(n)\exp(nt),\quad t\in[-1,1].\qquad(2)$$

In particular, for any fixed $y\in S^{m}$,

$$K_{n}(\langle x,y\rangle)=c_{m+1}(n)\exp(n\langle x,y\rangle),\quad x\in S^{m},$$

is the density function of the von Mises-Fisher distribution with mean direction $y$ and concentration parameter $n$. For a fixed $y\in S^{m}$, $K_{n}(\langle\cdot,y\rangle)$ plays the role of a “bump function” that becomes increasingly concentrated at $y$ as $n$ increases.
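The bump-function behaviour can be seen numerically on the circle ($m=1$), where $c_2(n)=1/(2\pi I_0(n))$. The sketch below, assuming NumPy and SciPy, smooths the hypothetical test function $f(\theta)=1+0.5\cos 2\theta$ with $K_n$, that is, it computes $\int_{S^1}K_n(\langle x,y\rangle)f(y)\,d\omega_1(y)$, and reports the sup-norm distance to $f$ for increasing $n$. By the Funk-Hecke formula the smoothed function is $1+0.5\,(I_2(n)/I_0(n))\cos 2\theta$, so the error has a closed form. The helper name `smooth_with_vmf_s1` and the grid size are assumptions of this sketch.

```python
import numpy as np
from scipy.special import ive


def smooth_with_vmf_s1(f, n, n_grid=1024):
    """Approximate int_{S^1} K_n(<x,y>) f(y) d omega_1(y) at theta_i = 2 pi i / n_grid.

    K_n(t) = c_2(n) exp(n t) with c_2(n) = 1 / (2 pi I_0(n)) (the m = 1 case of (2)),
    written as exp(n (t - 1)) / (2 pi ive(0, n)) so that large n does not overflow.
    """
    theta = 2 * np.pi * np.arange(n_grid) / n_grid
    gram = np.cos(theta[:, None] - theta[None, :])
    Kn = np.exp(n * (gram - 1.0)) / (2 * np.pi * ive(0, n))
    return theta, Kn @ f(theta) * (2 * np.pi / n_grid)


f = lambda th: 1.0 + 0.5 * np.cos(2 * th)
for n in (1, 10, 100, 1000):
    theta, g = smooth_with_vmf_s1(f, n)
    # For this f the exact sup error is 0.5 * (1 - I_2(n) / I_0(n)), which decreases in n.
    print(n, np.max(np.abs(g - f(theta))))
```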

We show that for any continuous probability density function $f$ on $S^{m}$, we can construct a mixture of von Mises-Fisher distributions, each component of which has the form $K_{n}(\langle\cdot,y_{k}\rangle)$, that approximates $f$ to any desired level of accuracy in the uniform norm.

Theorem 1.

Let $f$ be a continuous probability density function on $S^{m}$. Then, given $\delta>0$, there exist integers $n$ and $N$, points $y_{1},y_{2},\ldots,y_{N}$ in $S^{m}$, and weights $c_{1},\ldots,c_{N}$ in $\mathbb{R}$ with $c_{k}>0$ and $\sum_{k=1}^{N}c_{k}=1$ such that

$$\max_{x\in S^{m}}\bigg|f(x)-\sum_{k=1}^{N}c_{k}K_{n}(\langle x,y_{k}\rangle)\bigg|<\delta.$$

4 Proof of Theorem 1

In this section we first state and prove a few lemmas needed for the proof of Theorem 1. Recall that $V_{m}$ denotes $C(S^{m})$ with the uniform norm or $\mathcal{L}^{p}(S^{m})$, $1\leq p<\infty$, with the norm $\|\cdot\|_{m,p}$, and that $\|\cdot\|_{m}$ is the corresponding norm. Since $K_{n}(\langle x,\cdot\rangle)$ is a probability density with respect to $d\omega_{m}$, whereas the spherical convolution carries the factor $1/\omega_{m}$, we apply Lemma 1 to the rescaled kernels $\omega_{m}K_{n}$, for which

$$(\omega_{m}K_{n}*f)(x)=\int_{S^{m}}K_{n}(\langle x,y\rangle)f(y)\,d\omega_{m}(y).$$

We first show that, for any $f\in V_{m}$, this convolution converges to $f$ in the $\|\cdot\|_{m}$ norm.

Lemma 2.

$\|\omega_{m}K_{n}*f-f\|_{m}\rightarrow 0$ as $n\rightarrow\infty$ for all $f\in V_{m}$.

Proof.

It suffices to verify conditions 1 and 2 of Lemma 1 for the non-negative kernels $\omega_{m}K_{n}$. For condition 1, note that for any non-negative kernel $K$ and any fixed $x\in S^{m}$,

$$a_{0}^{m}(K)=\frac{\omega_{m-1}}{\omega_{m}}\int_{-1}^{1}K(t)(1-t^{2})^{(m-2)/2}\,dt=\frac{1}{\omega_{m}}\int_{S^{m}}K(\langle x,y\rangle)\,d\omega_{m}(y).$$

Since $K_{n}(\langle x,\cdot\rangle)$ is the von Mises-Fisher density with mean direction $x$, we have $\int_{S^{m}}K_{n}(\langle x,y\rangle)\,d\omega_{m}(y)=1$, and hence $a_{0}^{m}(\omega_{m}K_{n})=1$ for every $n$.

For condition 2, fix $\rho\in(-1,1)$ and $x\in S^{m}$. Up to a constant factor that does not depend on $n$, the quantity in condition 2 is

$$\omega_{m-1}\int_{-1}^{\rho}K_{n}(t)(1-t^{2})^{(m-2)/2}\,dt=\frac{\int_{-1}^{\rho}e^{nt}(1-t^{2})^{(m-2)/2}\,dt}{\int_{-1}^{1}e^{nt}(1-t^{2})^{(m-2)/2}\,dt}=\frac{\int_{\{y:\langle x,y\rangle\leq\rho\}}e^{n\langle x,y\rangle}\,d\omega_{m}(y)}{\int_{S^{m}}e^{n\langle x,y\rangle}\,d\omega_{m}(y)},$$

where the first equality uses $c_{m+1}(n)^{-1}=\int_{S^{m}}e^{n\langle x,y\rangle}\,d\omega_{m}(y)=\omega_{m-1}\int_{-1}^{1}e^{nt}(1-t^{2})^{(m-2)/2}\,dt$, and both this identity and the second equality follow from the change of variables $t=\langle x,y\rangle$. Since $e^{n\langle x,y\rangle}\leq e^{n\rho}$ whenever $\langle x,y\rangle\leq\rho$, the numerator is bounded above by

$$\int_{\{y:\langle x,y\rangle\leq\rho\}}e^{n\langle x,y\rangle}\,d\omega_{m}(y)\leq\omega_{m}(\{y:\langle x,y\rangle\leq\rho\})\,e^{n\rho}.\qquad(3)$$

To bound the denominator from below, fix $\eta\in(0,1-\rho)$, so that $1-\eta>\rho$, and define the spherical cap $B_{\eta}(x):=\{y\in S^{m}:\langle x,y\rangle\geq 1-\eta\}$. Then

$$\int_{S^{m}}e^{n\langle x,y\rangle}\,d\omega_{m}(y)\geq\int_{B_{\eta}(x)}e^{n\langle x,y\rangle}\,d\omega_{m}(y)\qquad(4)$$
$$\qquad\geq e^{n(1-\eta)}\,\omega_{m}(B_{\eta}(x)).\qquad(5)$$

Therefore, combining (3) and (5), we have

$$\omega_{m-1}\int_{-1}^{\rho}K_{n}(t)(1-t^{2})^{(m-2)/2}\,dt\leq\frac{\omega_{m}(\{y:\langle x,y\rangle\leq\rho\})}{\omega_{m}(B_{\eta}(x))}\,e^{-n(1-\eta-\rho)}.$$

The ratio of measures is positive, finite and independent of $n$ (and, by rotation invariance, of $x$), while $1-\eta-\rho>0$, so the right-hand side tends to 0 as $n\rightarrow\infty$. ∎
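The tail quantity in condition 2 can be computed directly by one-dimensional quadrature. The sketch below, assuming NumPy and SciPy, evaluates the ratio $\int_{-1}^{\rho}e^{nt}w(t)\,dt\big/\int_{-1}^{1}e^{nt}w(t)\,dt$ with $w(t)=(1-t^2)^{(m-2)/2}$; `tail_ratio` and the values of $\rho$ and $\eta$ are hypothetical choices. On $S^2$ the ratio has the closed form $(e^{n\rho}-e^{-n})/(e^{n}-e^{-n})$, and the caps have measures $\omega_2(\{t\leq\rho\})=2\pi(1+\rho)$ and $\omega_2(B_\eta)=2\pi\eta$, so the bound obtained from (3) and (5) becomes $\frac{1+\rho}{\eta}e^{-n(1-\eta-\rho)}$.

```python
import numpy as np
from scipy.integrate import quad


def tail_ratio(n, rho, m):
    """omega_{m-1} int_{-1}^{rho} K_n(t) (1-t^2)^{(m-2)/2} dt for the vMF kernels (2).

    Equals int_{-1}^{rho} e^{nt} w(t) dt / int_{-1}^{1} e^{nt} w(t) dt with
    w(t) = (1-t^2)^{(m-2)/2}; the common factor e^{-n} is pulled out to avoid overflow.
    Assumes m >= 2.
    """
    w = lambda t: np.exp(n * (t - 1.0)) * (1 - t * t) ** ((m - 2) / 2)
    opts = dict(epsabs=0.0, epsrel=1e-10, limit=200)   # relative accuracy for tiny tails
    num = quad(w, -1.0, rho, **opts)[0]
    den = quad(w, -1.0, 1.0, **opts)[0]
    return num / den


rho, eta = 0.3, 0.2   # any rho in (-1, 1) and eta in (0, 1 - rho)
for n in (1, 5, 20, 100):
    # Bound from (3) and (5) on S^2: (1 + rho) / eta * exp(-n (1 - eta - rho)).
    bound = (1 + rho) / eta * np.exp(-n * (1 - eta - rho))
    print(n, tail_ratio(n, rho, 2), bound, tail_ratio(n, rho, 3))
```

Setting `epsabs=0.0` makes `quad` control the relative error, which matters here because the numerator becomes extremely small as $n$ grows.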

The following lemma concerning uniform approximation on SmS^{m} by Riemann sums is useful.

Lemma 3.

Let $g:S^{m}\times S^{m}\to\mathbb{R}$ be a continuous function. Then for any $\delta>0$, there is a partition $\{U_{1},\ldots,U_{N}\}$ of $S^{m}$ into connected sets such that the integral $\int_{S^{m}}g(x,y)\,d\omega_{m}(y)$ can be uniformly approximated on $S^{m}$ by Riemann sums:

$$\max_{x\in S^{m}}\left|\int_{S^{m}}g(x,y)\,d\omega_{m}(y)-\sum_{k=1}^{N}g(x,y_{k})\,\omega_{m}(U_{k})\right|<\delta$$

for any choice of $y_{k}\in U_{k}$, $k=1,\ldots,N$.

Proof.

Since $g$ is uniformly continuous on the compact set $S^{m}\times S^{m}$, for each $x^{\prime}\in S^{m}$ there exists an open neighborhood $\mathcal{U}_{x^{\prime}}$ of $x^{\prime}$ such that for all $x\in\mathcal{U}_{x^{\prime}}$,

$$\max_{y\in S^{m}}|g(x,y)-g(x^{\prime},y)|<\frac{\delta}{3\omega_{m}}.$$

Thus, for any $x\in\mathcal{U}_{x^{\prime}}$, we have

$$\begin{aligned}\left|\int_{S^{m}}g(x,y)\,d\omega_{m}(y)-\int_{S^{m}}g(x^{\prime},y)\,d\omega_{m}(y)\right|&\leq\int_{S^{m}}\left|g(x,y)-g(x^{\prime},y)\right|d\omega_{m}(y)\\&\leq\max_{y\in S^{m}}\left|g(x,y)-g(x^{\prime},y)\right|\int_{S^{m}}d\omega_{m}(y)\\&<\frac{\delta}{3}.\end{aligned}$$

Since $g(x^{\prime},\cdot)$ is uniformly continuous, there exists a partition $\{U_{1},\ldots,U_{N^{\prime}}\}$ of $S^{m}$ into blocks in standard spherical coordinates (each of which is connected) that is fine enough that the oscillation of $g(x^{\prime},\cdot)$ on every $U_{k}$ is less than $\delta/(3\omega_{m})$. Consequently, $\int_{S^{m}}g(x^{\prime},y)\,d\omega_{m}(y)$ is approximated by its Riemann sums:

$$\left|\int_{S^{m}}g(x^{\prime},y)\,d\omega_{m}(y)-\sum_{k=1}^{N^{\prime}}g(x^{\prime},y_{k})\,\omega_{m}(U_{k})\right|<\frac{\delta}{3}$$

for any $y_{k}\in U_{k}$. Now, for $x\in\mathcal{U}_{x^{\prime}}$, the triangle inequality gives

$$\begin{aligned}\left|\int_{S^{m}}g(x,y)\,d\omega_{m}(y)-\sum_{k=1}^{N^{\prime}}g(x,y_{k})\,\omega_{m}(U_{k})\right|&\leq\left|\int_{S^{m}}g(x,y)\,d\omega_{m}(y)-\int_{S^{m}}g(x^{\prime},y)\,d\omega_{m}(y)\right|\\&\quad+\left|\int_{S^{m}}g(x^{\prime},y)\,d\omega_{m}(y)-\sum_{k=1}^{N^{\prime}}g(x^{\prime},y_{k})\,\omega_{m}(U_{k})\right|\\&\quad+\left|\sum_{k=1}^{N^{\prime}}g(x^{\prime},y_{k})\,\omega_{m}(U_{k})-\sum_{k=1}^{N^{\prime}}g(x,y_{k})\,\omega_{m}(U_{k})\right|\\&<\delta,\end{aligned}$$

since each of the three terms is less than $\delta/3$; for the third term we use $|g(x^{\prime},y_{k})-g(x,y_{k})|<\delta/(3\omega_{m})$ and $\sum_{k}\omega_{m}(U_{k})=\omega_{m}$.

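Since $\{\mathcal{U}_{x^{\prime}}\}_{x^{\prime}\in S^{m}}$ is an open cover of the compact set $S^{m}$, there exists a finite subcover $\{\mathcal{U}_{x_{1}},\ldots,\mathcal{U}_{x_{J}}\}$. Take a common refinement, again by blocks in standard spherical coordinates, of the partitions used in the Riemann sums for $\int_{S^{m}}g(x_{i},y)\,d\omega_{m}(y)$, $i=1,\ldots,J$. Choosing each of these partitions so that the oscillation of $g(x_{i},\cdot)$ on every block is below $\delta/(3\omega_{m})$, a property preserved under refinement, the Riemann-sum bound remains valid for every $i$ and every choice of points in the refined blocks. Every $x\in S^{m}$ lies in some $\mathcal{U}_{x_{i}}$, so the claimed result follows. ∎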
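Lemma 3 can be illustrated on $S^2$ with blocks in the coordinates $(u,\phi)$, $u=\cos\theta$, where $d\omega_2=du\,d\phi$ so every block has the same area. The sketch below, assuming NumPy, uses the hypothetical continuous function $g(x,y)=e^{\kappa\langle x,y\rangle}$, whose integral $4\pi\sinh\kappa/\kappa$ is the same for every $x$, takes block midpoints as the points $y_k$, and measures the maximum error over 50 random test points as a stand-in for the maximum over all of $S^2$. The helper name `riemann_sum_s2` is an assumption of this sketch.

```python
import numpy as np


def riemann_sum_s2(g, x, n_u, n_phi):
    """sum_k g(x, y_k) omega_2(U_k) over spherical-coordinate blocks U_k of S^2.

    U_k = [u_i, u_{i+1}] x [phi_j, phi_{j+1}] in (u = cos(polar angle), phi); since
    d omega_2 = du dphi, every block has area (2/n_u)(2 pi/n_phi). Each y_k is the block
    midpoint in (u, phi); Lemma 3 allows any y_k in U_k. x: (n_x, 3) unit vectors.
    """
    u = -1 + (np.arange(n_u) + 0.5) * (2.0 / n_u)
    phi = (np.arange(n_phi) + 0.5) * (2 * np.pi / n_phi)
    U, PHI = np.meshgrid(u, phi, indexing="ij")
    s = np.sqrt(1 - U**2)
    y = np.stack([s * np.cos(PHI), s * np.sin(PHI), U], axis=-1).reshape(-1, 3)
    area = (2.0 / n_u) * (2 * np.pi / n_phi)
    return g(x, y).sum(axis=1) * area


kappa = 4.0
g = lambda x, y: np.exp(kappa * (x @ y.T))       # g(x, y) = exp(kappa <x, y>), continuous
exact = 4 * np.pi * np.sinh(kappa) / kappa       # int_{S^2} g(x, y) d omega_2(y) for every x
rng = np.random.default_rng(0)
x = rng.normal(size=(50, 3))
x /= np.linalg.norm(x, axis=1, keepdims=True)    # 50 test points on S^2
for n_blocks in (10, 40, 160):
    err = np.max(np.abs(riemann_sum_s2(g, x, n_blocks, n_blocks) - exact))
    print(n_blocks, err)                         # max error over the test points
```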

The following result shows that any continuous function on $S^{m}$ can be uniformly approximated by linear combinations of $\{K_{n}(\langle\cdot,y_{k}\rangle)\}_{k}$ with $y_{1},y_{2},\ldots$ in $S^{m}$.

Lemma 4.

Let $f$ be a non-zero continuous function on $S^{m}$. Then, given $\delta>0$, there exist integers $n$ and $N$, points $y_{1},y_{2},\ldots,y_{N}$ in $S^{m}$ and coefficients $c_{1},\ldots,c_{N}$ in $\mathbb{R}$ such that

$$\max_{x\in S^{m}}\left|f(x)-\sum_{k=1}^{N}c_{k}K_{n}(\langle x,y_{k}\rangle)\right|<\delta.$$
Proof.

By Lemma 2, applied in $C(S^{m})$ with the uniform norm, there exists an integer $n$ such that

$$\max_{x\in S^{m}}\left|f(x)-\int_{S^{m}}K_{n}(\langle x,y\rangle)f(y)\,d\omega_{m}(y)\right|<\frac{\delta}{2}.\qquad(6)$$

On the other hand, applying Lemma 3 to the continuous function $g(x,y)=K_{n}(\langle x,y\rangle)f(y)$, there exists a partition $\{U_{1},\ldots,U_{N}\}$ of $S^{m}$ into connected sets such that for any $x\in S^{m}$ and any $y_{k}\in U_{k}$,

$$\left|\int_{S^{m}}K_{n}(\langle x,y\rangle)f(y)\,d\omega_{m}(y)-\sum_{k=1}^{N}K_{n}(\langle x,y_{k}\rangle)f(y_{k})\,\omega_{m}(U_{k})\right|<\frac{\delta}{2}.\qquad(7)$$

The result follows by combining (6) and (7) with the triangle inequality and letting $c_{k}=f(y_{k})\,\omega_{m}(U_{k})$ for $k=1,\ldots,N$. ∎
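A minimal sketch of the construction in the proof of Lemma 4 on the circle $S^1$ ($m=1$), assuming NumPy and SciPy. The helper names, the signed test function $f(\theta)=\cos 2\theta+0.3\sin\theta$ and the pairs $(n,N)$ are hypothetical. The arcs $U_k$ have equal length, the $y_k$ are their midpoints (Lemma 3 allows any $y_k\in U_k$), and $c_k=f(y_k)\,\omega_1(U_k)$. Because this $f$ takes negative values, some $c_k$ are negative, so the result is a linear combination of kernels rather than a mixture.

```python
import numpy as np
from scipy.special import ive


def vmf_kernel_s1(t, n):
    """K_n(t) = exp(n t) / (2 pi I_0(n)) on S^1, evaluated without overflow."""
    return np.exp(n * (t - 1.0)) / (2 * np.pi * ive(0, n))


def lemma4_approximant(f, n, n_arcs, theta_eval):
    """sum_k c_k K_n(<x, y_k>) with c_k = f(y_k) omega_1(U_k), U_k equal arcs, y_k midpoints."""
    width = 2 * np.pi / n_arcs
    y = (np.arange(n_arcs) + 0.5) * width        # midpoints y_k of the arcs U_k
    c = f(y) * width                             # c_k = f(y_k) * omega_1(U_k)
    gram = np.cos(theta_eval[:, None] - y[None, :])
    return vmf_kernel_s1(gram, n) @ c


f = lambda th: np.cos(2 * th) + 0.3 * np.sin(th)   # a signed continuous function
theta = np.linspace(0, 2 * np.pi, 1000, endpoint=False)
for n, n_arcs in ((10, 50), (100, 200), (1000, 800)):
    err = np.max(np.abs(lemma4_approximant(f, n, n_arcs, theta) - f(theta)))
    print(n, n_arcs, err)
```

The number of arcs grows with $n$ so that several arcs fall within the width (roughly $1/\sqrt{n}$) of each kernel; this keeps the Riemann-sum error of (7) small while the smoothing error of (6) shrinks.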

Proof of Theorem 1.

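Apply Lemma 4 to $f$. Since (7) holds for any choice of $y_{k}\in U_{k}$, it remains to pick these points so that the coefficients $c_{k}=f(y_{k})\,\omega_{m}(U_{k})$ are non-negative and satisfy $\sum_{k=1}^{N}c_{k}=1$. Because $f$ is continuous and each $U_{k}$ is connected, the integral mean value theorem applied to each of the integrals

$$\int_{U_{k}}f(y)\,d\omega_{m}(y),\quad k=1,\ldots,N\qquad(8)$$

yields points $y_{k}\in U_{k}$ with $\int_{U_{k}}f(y)\,d\omega_{m}(y)=f(y_{k})\,\omega_{m}(U_{k})=c_{k}$. Each $c_{k}\geq 0$ because $f\geq 0$, and since $f$ is a probability density,

$$\sum_{k=1}^{N}c_{k}=\sum_{k=1}^{N}\int_{U_{k}}f(y)\,d\omega_{m}(y)=\int_{S^{m}}f(y)\,d\omega_{m}(y)=1.$$

Discarding the terms with $c_{k}=0$, which does not change the sum, gives $c_{k}>0$ for all remaining $k$. ∎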
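The mean-value choice of $y_k$ can be carried out explicitly in a simple case. The sketch below, assuming NumPy and SciPy, uses the hypothetical density $f(\theta)=(1+0.5\cos\theta)/(2\pi)$ on $S^1$ and $N$ equal arcs $[a_k,b_k]$. For this $f$ the mean-value condition $f(y_k)\,\omega_1(U_k)=\int_{U_k}f\,d\omega_1$ reduces to $\cos y_k=(\sin b_k-\sin a_k)/(b_k-a_k)$, which can be solved in closed form because $\cos$ is monotone on each arc when $N$ is even. The resulting weights are positive and sum to 1, so the approximant is a genuine von Mises-Fisher mixture; the helper names and the values $n=400$, $N=1000$ are assumptions of this sketch.

```python
import numpy as np
from scipy.special import ive


def vmf_kernel_s1(t, n):
    """K_n(t) = exp(n t) / (2 pi I_0(n)) on S^1, evaluated without overflow."""
    return np.exp(n * (t - 1.0)) / (2 * np.pi * ive(0, n))


def theorem1_weights_s1(n_arcs):
    """Weights c_k and mean-value points y_k for f(theta) = (1 + 0.5 cos theta) / (2 pi).

    The arcs U_k = [a_k, b_k] have equal width; n_arcs must be even so that pi is an
    arc endpoint and cos is monotone on every arc. c_k = int_{U_k} f d omega_1 exactly,
    and y_k in U_k solves f(y_k) |U_k| = c_k, i.e. cos(y_k) = (sin b_k - sin a_k)/(b_k - a_k).
    """
    if n_arcs % 2:
        raise ValueError("n_arcs must be even")
    edges = 2 * np.pi * np.arange(n_arcs + 1) / n_arcs
    a, b = edges[:-1], edges[1:]
    c = ((b - a) + 0.5 * (np.sin(b) - np.sin(a))) / (2 * np.pi)
    v = np.clip((np.sin(b) - np.sin(a)) / (b - a), -1.0, 1.0)
    upper_half = np.arange(n_arcs) < n_arcs // 2        # arcs inside [0, pi]
    y = np.where(upper_half, np.arccos(v), 2 * np.pi - np.arccos(v))
    return c, y


f = lambda th: (1 + 0.5 * np.cos(th)) / (2 * np.pi)
n = 400
c, y = theorem1_weights_s1(1000)
theta = np.linspace(0, 2 * np.pi, 1000, endpoint=False)
mixture = vmf_kernel_s1(np.cos(theta[:, None] - y[None, :]), n) @ c
print(c.min() > 0, c.sum(), np.max(np.abs(mixture - f(theta))))
```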

References

  • Bacharoglou (2010) Bacharoglou, A. G. (2010), “Approximation of probability distributions by convex mixtures of Gaussian measures,” Proc. Amer. Math. Soc., 138, 2619–2628.
  • Banerjee et al. (2005) Banerjee, A., Dhillon, I. S., Ghosh, J., and Sra, S. (2005), “Clustering on the Unit Hypersphere Using von Mises-Fisher Distributions,” J. Mach. Learn. Res., 6, 1345–1382.
  • Fisher et al. (1993) Fisher, N. I., Lewis, T., and Embleton, B. J. J. (1993), Statistical analysis of spherical data, Cambridge University Press, Cambridge, revised reprint of the 1987 original.
  • Fraley and Raftery (2002) Fraley, C. and Raftery, A. E. (2002), “Model-based clustering, discriminant analysis, and density estimation,” J. Amer. Statist. Assoc., 97, 611–631.
  • McLachlan and Peel (2000) McLachlan, G. J. and Peel, D. (2000), Finite mixture models, vol. 299 of Probability and Statistics – Applied Probability and Statistics Section, New York: Wiley.
  • Menegatto (1997) Menegatto, V. A. (1997), “Approximation by spherical convolution,” Numer. Funct. Anal. Optim., 18, 995–1012.
  • Nguyen and McLachlan (2019) Nguyen, H. D. and McLachlan, G. (2019), “On approximations via convolution-defined mixture models,” Comm. Statist. Theory Methods, 48, 3945–3955.
  • Qin et al. (2016) Qin, X., Cunningham, P., and Salter-Townshend, M. (2016), “Online Trans-Dimensional von Mises-Fisher Mixture Models for User Profiles,” J. Mach. Learn. Res., 17, 7021–7071.
  • Reimer (2012) Reimer, M. (2012), Multivariate polynomial approximation, vol. 144, Birkhäuser.
  • Xu (2000) Xu, Y. (2000), “Funk-Hecke formula for orthogonal polynomials on spheres and on balls,” Bull. London Math. Soc., 32, 447–457.