

Asymptotic properties of generalized closed-form maximum likelihood estimators

Pedro L. Ramos1, Eduardo Ramos2, Francisco A. Rodrigues2, and Francisco Louzada2
1 Facultad de Matemáticas, Pontificia Universidad Católica de Chile, Santiago, Chile
2 Institute of Mathematics and Computer Science, University of São Paulo, São Carlos, Brazil
Summary

The maximum likelihood estimator (MLE) is pivotal in statistical inference, yet its application is often hindered by the absence of closed-form solutions for many models. This poses challenges in real-time computation scenarios, particularly within embedded systems technology, where numerical methods are impractical. This study introduces a generalized form of the MLE that yields closed-form estimators under certain conditions. We derive the asymptotic properties of the proposed estimator and demonstrate that our approach retains key properties such as invariance under one-to-one transformations, strong consistency, and an asymptotic normal distribution. The effectiveness of the generalized MLE is exemplified through its application to the Gamma, Nakagami, and Beta distributions, showcasing improvements over the traditional MLE. Additionally, we extend this methodology to a bivariate gamma distribution, successfully deriving closed-form estimators. This advancement presents significant implications for real-time statistical analysis across various applications.

Keywords: Closed-form estimators; maximum likelihood estimators; generalized maximum likelihood estimator; generalized estimators.

1 Introduction

Introduced by Ronald Fisher [1], the maximum likelihood method is one of the most well-known and widely used inferential procedures to estimate the unknown parameters of a given distribution. Alternative methods to the maximum likelihood estimator (MLE) have been proposed in the literature, such as those based on statistical moments [13], percentiles [14, 15], the product of spacings [7], or goodness-of-fit measures, to list a few. Although alternative inferential methods are popular nowadays, MLEs are the most widely used due to their flexibility in incorporating additional complexity (such as random effects, covariates, and censoring, among others) and their properties: asymptotic efficiency, consistency, and invariance under one-to-one transformations. These properties are achieved when the MLEs satisfy some regularity conditions [5, 16, 20].

It is now well established from various studies that the MLEs do not return closed-form expressions for many common distributions. In these cases, numerical methods, such as Newton-Raphson or its variants, are usually employed to find the values that maximize the likelihood function. Important variants of the maximum likelihood estimator, such as profile [18], pseudo [11], conditional [2], penalized [3, 10] and marginal likelihoods [8], have been presented to eliminate nuisance parameters and decrease the computational cost. Another important procedure to obtain the MLEs is the expectation-maximization (EM) algorithm [9], which involves unobserved latent variables jointly with unknown parameters. The expectation and maximization steps also involve, in most cases, numerical methods that may have a high computational cost. However, in many situations closed-form estimators are needed to estimate the unknown parameters. For instance, in embedded technology, small components need to compute estimates without resorting to maximization procedures, and real-time applications require an immediate answer.

In this study, we present a generalized approach to the maximum likelihood method, enabling the derivation of closed-form expressions for estimating distribution parameters in numerous scenarios. Our primary objective is to establish the asymptotic normality and strong consistency of our proposed estimator. Furthermore, we demonstrate that these conditions are significantly simplified within a broad family of generalized maximum likelihood equations. The practical implications of our findings are substantial, offering efficient and rapid computational methods for obtaining estimates. Most importantly, our results facilitate the construction of confidence intervals and hypothesis tests, thus broadening their applicability in various fields.

The proposed method is illustrated with the Gamma, Beta, and Nakagami distributions and a bivariate gamma model. In these cases, the standard MLEs do not have closed-form expressions, and numerical methods or approximations are necessary to obtain the estimates. Our approach, in contrast, does not require iterative numerical methods, and, computationally, the work required by our estimators is simpler than that required for the ML estimators. The remainder of this paper is organized as follows. Section 2 presents the new generalized maximum likelihood estimator and its properties. Section 3 considers the application to the Gamma, Nakagami, and Beta distributions. Finally, Section 4 summarizes the study.

2 Generalized Maximum Likelihood Estimator

The method we propose here can be applied to obtain closed-form expressions for distributions with a given density $f(x\,;\,\boldsymbol{\theta})$. In order to formulate the method, let $\Omega$ represent the sample space, let $\mathcal{X}$ represent the space of the data $x$, where $\mathcal{X}$ is equipped with a measure $\mu$, which can be either discrete or continuous, let $\Theta\subset\mathbb{R}^{s}$ be an open set containing the true parameter $\boldsymbol{\theta}_{0}$ to be estimated, and for each $x\in\mathcal{X}$ let $\mathcal{A}_{x}\subset\mathbb{R}^{r}$, $0\leq r\leq s$, be an open set, possibly depending on $x$, containing a fixed parameter $\boldsymbol{\alpha}_{0}$, which represents additional parameters that will be used during the procedure to obtain the estimators.

Now, suppose $X_{1}, X_{2}, \cdots, X_{n}$ are independent and identically distributed (iid) random variables, which can be either discrete or continuous, with a strictly positive density function $f(x\,;\,\boldsymbol{\theta}_{0})$. Then, given a function $g(x\,;\,\boldsymbol{\theta},\boldsymbol{\alpha})$ defined for $x\in\mathcal{X}$, $\boldsymbol{\theta}\in\Theta$ and $\boldsymbol{\alpha}\in\mathcal{A}_{x}$, we define the generalized maximum likelihood equations for $\boldsymbol{\theta}$ over the coordinates $(\theta_{1},\cdots,\theta_{s-r},\boldsymbol{\alpha})$ at $\boldsymbol{\alpha}=\boldsymbol{\alpha}_{0}$ to be the set of equations

\frac{1}{n}\sum_{i=1}^{n}\frac{\partial}{\partial\theta_{j}}\log\,g(X_{i}\,;\,\boldsymbol{\theta},\boldsymbol{\alpha}_{0})=\operatorname{E}_{\boldsymbol{\theta}}\left[\frac{\partial}{\partial\theta_{j}}\log\,g(X_{1}\,;\,\boldsymbol{\theta},\boldsymbol{\alpha}_{0})\right],\quad 1\leq j\leq s-r, (2)
\frac{1}{n}\sum_{i=1}^{n}\frac{\partial}{\partial\alpha_{j}}\log\,g(X_{i}\,;\,\boldsymbol{\theta},\boldsymbol{\alpha}_{0})=\operatorname{E}_{\boldsymbol{\theta}}\left[\frac{\partial}{\partial\alpha_{j}}\log\,g(X_{1}\,;\,\boldsymbol{\theta},\boldsymbol{\alpha}_{0})\right],\quad 1\leq j\leq r,

as long as these partial derivatives exist and the expected values above are finite.

To see how the generalized likelihood equations generalize the maximum likelihood equations, note that, in case the equation $\int_{\mathcal{X}}f(x\,;\,\boldsymbol{\theta})\,d\mu=1$ can be differentiated under the integral sign, we obtain $\int_{\mathcal{X}}\frac{\partial}{\partial\theta_{j}}f(x\,;\,\boldsymbol{\theta})\,d\mu=0$ for all $j$, in which case, letting $g(x\,;\,\boldsymbol{\theta})=f(x\,;\,\boldsymbol{\theta})$, it follows that the generalized maximum likelihood equations for $\boldsymbol{\theta}$ over the coordinates $(\theta_{1},\cdots,\theta_{s})$ are given by the equations

\frac{1}{n}\sum_{i=1}^{n}\frac{\partial}{\partial\theta_{j}}\log\,f(X_{i}\,;\,\boldsymbol{\theta})=0,\quad 1\leq j\leq s,

which coincide with the maximum likelihood equations. This differentiation under the integral sign condition is in fact a natural condition to impose, since it is universally used in order to prove the consistency and asymptotic normality of the maximum likelihood estimator.

From now on, our goal shall be that of giving conditions guaranteeing the existence of solutions of the generalized maximum likelihood equations, as well as conditions under which an obtained solution $\boldsymbol{\hat{\theta}}_{n}(X)$ of the generalized maximum likelihood equations is a consistent estimator for the true parameter $\boldsymbol{\theta}_{0}$ and is asymptotically normal. In order to formulate the result, given a fixed $\boldsymbol{\alpha}_{0}\in\Theta$ we denote

h_{i}(x\,;\,\boldsymbol{\theta})=\frac{\partial}{\partial\beta_{i}}\log\,g\left(x\,;\,\boldsymbol{\theta},\boldsymbol{\alpha}_{0}\right)-\operatorname{E}_{\boldsymbol{\theta}}\left[\frac{\partial}{\partial\beta_{i}}\log\,g\left(X_{1}\,;\,\boldsymbol{\theta},\boldsymbol{\alpha}_{0}\right)\right] (3)

for all $x$, $\boldsymbol{\theta}$ and $i$, where $(\beta_{1},\cdots,\beta_{s})=(\theta_{1},\cdots,\theta_{s-r},\alpha_{1},\cdots,\alpha_{r})$. Moreover, we let $J(\boldsymbol{\theta})=\left(J_{i,j}(\boldsymbol{\theta})\right)\in M_{s}(\mathbb{R})$ and $K(\boldsymbol{\theta})=\left(K_{i,j}(\boldsymbol{\theta})\right)\in M_{s}(\mathbb{R})$ be defined by

J_{i,j}(\boldsymbol{\theta})=\operatorname{E}_{\boldsymbol{\theta}}\left[-\frac{\partial}{\partial\theta_{j}}\,h_{i}(X_{1}\,;\,\boldsymbol{\theta})\right]\mbox{ and} (4)
K_{i,j}(\boldsymbol{\theta})=\operatorname{cov}_{\boldsymbol{\theta}}\left[h_{i}(X_{1}\,;\,\boldsymbol{\theta}),\,h_{j}(X_{1}\,;\,\boldsymbol{\theta})\right],

for $1\leq i\leq s$ and $1\leq j\leq s$. These matrices shall play the role that the Fisher information matrix $I$ plays in the classical maximum likelihood method.

In the following, we say an estimator $\hat{\theta}_{n}(\boldsymbol{X})$ satisfies the modified likelihood equations (2) with probability converging to one strongly if, letting $A_{n}$ denote the subset of $\Omega$ in which $\hat{\theta}_{n}(\boldsymbol{X})$ satisfies (2), we have

\lim_{n\to\infty}P(\cap_{m\geq n}A_{m})=1. (5)

More generally, we say a sequence of events $A_{1}$, $A_{2}$, $\cdots$ happens with probability converging to one strongly if (5) is valid.

In the following we prove a result regarding the existence and strong consistency of solutions of the modified likelihood equations (2) for an arbitrary probability density function $f$.

Theorem 2.1.

Denote $\boldsymbol{X}=\left(X_{1},X_{2},\cdots,X_{n}\right)$, where $X_{1},\cdots,X_{n}$ are iid with density $f(x\,;\,\boldsymbol{\theta}_{0})$, and suppose:

  • (A)

    $J(\boldsymbol{\theta}_{0})$ and $K(\boldsymbol{\theta}_{0})$, as defined in (4), exist and $J(\boldsymbol{\theta}_{0})$ is invertible.

  • (B)

    $h_{j}(x\,;\,\boldsymbol{\theta},\boldsymbol{\alpha}_{0})$ is measurable in $x$, and $\frac{\partial}{\partial\theta_{i}}h_{j}(x\,;\,\boldsymbol{\theta},\boldsymbol{\alpha}_{0})$ exists and is continuous in $\boldsymbol{\theta}$, for all $i$, $j$ and $\boldsymbol{\theta}\in\Theta$, where $h_{j}$ is given in (3).

  • (C)

    There exist measurable functions $M_{ij}(x)$ and an open set $\Theta_{0}$ containing the true parameter $\boldsymbol{\theta}_{0}$ such that $\overline{\Theta}_{0}\subset\Theta$ and, for all $\boldsymbol{\theta}\in\overline{\Theta}_{0}$ and $x\in\mathcal{X}$, we have

    \left|\frac{\partial}{\partial\theta_{i}}h_{j}(x\,;\,\boldsymbol{\theta},\boldsymbol{\alpha}_{0})\right|\leq M_{ij}(x)\mbox{ and }E_{\boldsymbol{\theta}_{0}}\left[M_{ij}(X_{1})\right]<\infty,

    for all $1\leq i\leq s$ and $1\leq j\leq s$.

Then, with probability converging to one, the generalized maximum likelihood equations have a solution. Specifically, there exists $\boldsymbol{\hat{\theta}}_{n}(\boldsymbol{x})=(\hat{\theta}_{1n}(\boldsymbol{x}),\cdots,\hat{\theta}_{sn}(\boldsymbol{x}))$, measurable in $\boldsymbol{x}\in\mathcal{X}^{n}$, such that:

  • I)

    $\boldsymbol{\hat{\theta}}_{n}(\boldsymbol{X})$ satisfies the modified likelihood equations (2) with probability converging to one strongly.

  • II)

    $\boldsymbol{\hat{\theta}}_{n}(\boldsymbol{X})$ is a strongly consistent estimator for $\boldsymbol{\theta}_{0}$.

  • III)

    \sqrt{n}(\boldsymbol{\hat{\theta}}_{n}(\boldsymbol{X})-\boldsymbol{\theta}_{0})^{T}\overset{D}{\to}N_{s}\left(0,(J(\boldsymbol{\theta}_{0})^{-1})^{T}K(\boldsymbol{\theta}_{0})J(\boldsymbol{\theta}_{0})^{-1}\right).

Proof.

The proof is available in the Appendix. ∎

Note that if $r=0$ in the above result, then condition (A) corresponds to requiring the Fisher information matrix $I(\boldsymbol{\theta}_{0})$ to be invertible, since in such case $J(\boldsymbol{\theta}_{0})=I(\boldsymbol{\theta}_{0})$.

As a corollary of the above result, we have in particular the following theorem, which simplifies conditions (A) to (C) above when $g$ is contained in a certain family of measurable functions.

Theorem 2.2.

Denote $\boldsymbol{X}=\left(X_{1},X_{2},\cdots,X_{n}\right)$, where $X_{1},\cdots,X_{n}$ are iid with density $f(x\,;\,\boldsymbol{\theta}_{0})$, and let $g(x\,;\,\boldsymbol{\theta},\boldsymbol{\alpha})$ be defined as

g(x\,;\,\boldsymbol{\theta},\boldsymbol{\alpha})=V(x)\exp\left(\sum_{i=1}^{s}\eta_{i}(\boldsymbol{\theta})T_{i}(x,\boldsymbol{\alpha})+L(\boldsymbol{\theta},\boldsymbol{\alpha})\right)\mbox{ for all }x\in\mathcal{X},\ \boldsymbol{\theta}\in\Theta,\ \boldsymbol{\alpha}\in\mathcal{A}_{x},

where $\eta_{i}$ and $L$ are $C^{3}$ for all $i$, $V$ is measurable and positive, $T_{i}(x,\boldsymbol{\alpha}_{0})$ is measurable in $x$, the partial derivatives $\frac{\partial}{\partial\alpha_{j}}T_{i}(x,\boldsymbol{\alpha}_{0})$ exist for all $i$ and $j$, and suppose:

  • (A)

    $J(\boldsymbol{\theta}_{0})$ and $K(\boldsymbol{\theta}_{0})$, as defined in (4), exist and $J(\boldsymbol{\theta}_{0})$ is invertible.

  • (B)

    $\operatorname{E}_{\boldsymbol{\theta}}\left[T_{i}(X_{1},\boldsymbol{\alpha}_{0})\right]$ and $\operatorname{E}_{\boldsymbol{\theta}}\left[\frac{\partial}{\partial\alpha_{j}}T_{i}(X_{1},\boldsymbol{\alpha}_{0})\right]$ are finite, for all $i$, $j$ and $\boldsymbol{\theta}\in\Theta$.

Then, with probability converging to one as $n\to\infty$, the generalized maximum likelihood equations have a solution. Specifically, there exists $\boldsymbol{\hat{\theta}}_{n}(\boldsymbol{x})=(\hat{\theta}_{1n}(\boldsymbol{x}),\cdots,\hat{\theta}_{sn}(\boldsymbol{x}))$, measurable in $\boldsymbol{x}\in\mathcal{X}^{n}$, such that:

  • I)

    $\boldsymbol{\hat{\theta}}_{n}(\boldsymbol{X})$ satisfies the modified likelihood equations (2) with probability converging to one strongly.

  • II)

    $\boldsymbol{\hat{\theta}}_{n}(\boldsymbol{X})$ is a strongly consistent estimator for $\boldsymbol{\theta}_{0}$.

  • III)

    \sqrt{n}(\boldsymbol{\hat{\theta}}_{n}(\boldsymbol{X})-\boldsymbol{\theta}_{0})^{T}\overset{D}{\to}N_{s}\left(0,(J(\boldsymbol{\theta}_{0})^{-1})^{T}K(\boldsymbol{\theta}_{0})J(\boldsymbol{\theta}_{0})^{-1}\right).

Proof.

The proof is available in the Appendix. ∎

Note that the family of measurable functions imposed above for $g$ is more general than the exponential family of distributions and, besides, $g$ is not even required to be a probability density function. Additionally, note that no restrictions are imposed on $f$ besides being a probability density function. Thus, we consider this result to be important, since it provides an infinite number of possible estimators, due to the infinitely many possible choices for $g$, and, besides, provides easy-to-verify conditions for the obtained estimators to be strongly consistent and asymptotically normal.

Now, in order to define the generalized likelihood equations under a change of variables, given a diffeomorphism $\pi:\Theta\to\Lambda$ and letting $g^{*}(x\,;\,\boldsymbol{\lambda},\boldsymbol{\alpha})=g(x\,;\,\pi^{-1}(\boldsymbol{\lambda}),\boldsymbol{\alpha})$ for all $x$, $\boldsymbol{\lambda}\in\Lambda$ and $\boldsymbol{\alpha}\in\mathcal{A}_{x}$, we let the generalized maximum likelihood equations for $\boldsymbol{\lambda}=\pi(\boldsymbol{\theta})$ at $\boldsymbol{\alpha}=\boldsymbol{\alpha}_{0}$ be defined by the set of equations

\frac{1}{n}\sum_{i=1}^{n}\frac{\partial}{\partial\lambda_{j}}\log\,g^{*}(X_{i}\,;\,\boldsymbol{\lambda},\boldsymbol{\alpha}_{0})=\operatorname{E}_{\pi^{-1}(\boldsymbol{\lambda})}\left[\frac{\partial}{\partial\lambda_{j}}\log\,g^{*}(X_{1}\,;\,\boldsymbol{\lambda},\boldsymbol{\alpha}_{0})\right],\quad 1\leq j\leq s-r, (6)
\frac{1}{n}\sum_{i=1}^{n}\frac{\partial}{\partial\alpha_{j}}\log\,g^{*}(X_{i}\,;\,\boldsymbol{\lambda},\boldsymbol{\alpha}_{0})=\operatorname{E}_{\pi^{-1}(\boldsymbol{\lambda})}\left[\frac{\partial}{\partial\alpha_{j}}\log\,g^{*}(X_{1}\,;\,\boldsymbol{\lambda},\boldsymbol{\alpha}_{0})\right],\quad 1\leq j\leq r,

as long as these partial derivatives exist and the expected values are finite.

Proposition 2.3 (One-to-one invariance).

Suppose $\Theta=\Theta_{1}\times\Theta_{2}$ and $\Lambda=\Lambda_{1}\times\Lambda_{2}$, where $\Theta_{1},\Lambda_{1}\subset\mathbb{R}^{s-r}$ and $\Theta_{2},\Lambda_{2}\subset\mathbb{R}^{r}$ are open sets, and suppose $\pi:\Theta\to\Lambda$ can be written as

\pi(\boldsymbol{\theta})=(\pi_{1}(\boldsymbol{\theta}_{1}),\pi_{2}(\boldsymbol{\theta}_{2})),\mbox{ for all }\boldsymbol{\theta}=(\boldsymbol{\theta}_{1},\boldsymbol{\theta}_{2})\in\Theta_{1}\times\Theta_{2},

where $\pi_{1}:\Theta_{1}\to\Lambda_{1}$ and $\pi_{2}:\Theta_{2}\to\Lambda_{2}$ are diffeomorphisms, and suppose that, for some $n\in\mathbb{N}$, with probability one on $\Omega$, $\boldsymbol{\hat{\theta}}_{n}(\boldsymbol{X})$ is a solution of the generalized maximum likelihood equations for $\boldsymbol{\theta}$ at $\boldsymbol{\alpha}=\boldsymbol{\alpha}_{0}$. Then, with probability one on $\Omega$, $\pi(\boldsymbol{\hat{\theta}}_{n}(\boldsymbol{X}))$ is a solution of the generalized likelihood equations for $\boldsymbol{\lambda}=\pi(\boldsymbol{\theta})$ at $\boldsymbol{\alpha}=\boldsymbol{\alpha}_{0}$.

Proof.

Since $\pi_{1}$ does not depend on $\boldsymbol{\theta}_{2}$ and $\pi_{2}$ does not depend on $\boldsymbol{\theta}_{1}$, it follows that $\pi^{-1}(\boldsymbol{\lambda})=\left(\pi_{1}^{-1}(\boldsymbol{\lambda}_{1}),\pi_{2}^{-1}(\boldsymbol{\lambda}_{2})\right)$ for all $\boldsymbol{\lambda}=(\boldsymbol{\lambda}_{1},\boldsymbol{\lambda}_{2})\in\Lambda_{1}\times\Lambda_{2}$ and thus, letting $\pi_{1}^{-1}(\boldsymbol{\lambda}_{1})=\left(\pi^{*}_{11}(\boldsymbol{\lambda}_{1}),\cdots,\pi^{*}_{1(s-r)}(\boldsymbol{\lambda}_{1})\right)$ for all $\boldsymbol{\lambda}_{1}\in\Lambda_{1}$, and letting $g^{*}(x\,;\,\boldsymbol{\lambda},\boldsymbol{\alpha})=g(x\,;\,\pi^{-1}(\boldsymbol{\lambda}),\boldsymbol{\alpha})$ for all $x$, from the chain rule it follows that

\frac{\partial}{\partial\lambda_{j}}\log g^{*}(X_{i}\,;\,\boldsymbol{\lambda},\boldsymbol{\alpha}_{0})=\sum_{k=1}^{s-r}\frac{\partial}{\partial\theta_{k}}\log g(X_{i}\,;\,\pi^{-1}\left(\boldsymbol{\lambda}\right),\boldsymbol{\alpha}_{0})\frac{\partial}{\partial\lambda_{j}}\pi^{*}_{1k}(\boldsymbol{\lambda}_{1}), (7)

for all $i$ and $1\leq j\leq s-r$. Moreover, by hypothesis, with probability one on $\Omega$, $\boldsymbol{\hat{\theta}}_{n}(\boldsymbol{X})$ satisfies

\frac{1}{n}\sum_{i=1}^{n}\frac{\partial}{\partial\theta_{j}}\log\,g(X_{i}\,;\,\boldsymbol{\hat{\theta}}_{n}(\boldsymbol{X}),\boldsymbol{\alpha}_{0})=\int_{\mathcal{X}}\left(\frac{\partial}{\partial\theta_{j}}\log\,g(X_{1}\,;\,\boldsymbol{\hat{\theta}}_{n}(\boldsymbol{X}),\boldsymbol{\alpha}_{0})\right)f(X_{1}\,;\,\boldsymbol{\hat{\theta}}_{n}(\boldsymbol{X}),\boldsymbol{\alpha}_{0})\;d\mu, (8)

for all $i$ and $1\leq j\leq s-r$. Thus, denoting $\boldsymbol{\hat{\lambda}}_{n}(\boldsymbol{X})=\pi(\boldsymbol{\hat{\theta}}_{n}(\boldsymbol{X}))$, it follows, combining (7) and (8), that

\frac{1}{n}\sum_{i=1}^{n}\frac{\partial}{\partial\lambda_{j}}\log\,g^{*}(X_{i}\,;\,\boldsymbol{\hat{\lambda}}_{n}(\boldsymbol{X}),\boldsymbol{\alpha}_{0})=\sum_{k=1}^{s-r}\left(\frac{1}{n}\sum_{i=1}^{n}\frac{\partial}{\partial\theta_{k}}\log g(X_{i}\,;\,\boldsymbol{\hat{\theta}}_{n}(\boldsymbol{X}),\boldsymbol{\alpha}_{0})\right)\frac{\partial}{\partial\lambda_{j}}\pi^{*}_{1k}(\boldsymbol{\lambda}_{1}),
=\int_{\mathcal{X}}\left(\sum_{k=1}^{s-r}\left(\frac{\partial}{\partial\theta_{k}}\log g(X_{1}\,;\,\boldsymbol{\hat{\theta}}_{n}(\boldsymbol{X}),\boldsymbol{\alpha}_{0})\right)\frac{\partial}{\partial\lambda_{j}}\pi^{*}_{1k}(\boldsymbol{\lambda}_{1})\right)f(X_{1}\,;\,\boldsymbol{\hat{\theta}}_{n}(\boldsymbol{X}),\boldsymbol{\alpha}_{0})\;d\mu
=\int_{\mathcal{X}}\left(\frac{\partial}{\partial\lambda_{j}}\log\,g^{*}(X_{1}\,;\,\boldsymbol{\hat{\lambda}}_{n}(\boldsymbol{X}),\boldsymbol{\alpha}_{0})\right)f(X_{1}\,;\,\pi^{-1}(\boldsymbol{\hat{\lambda}}_{n}(\boldsymbol{X})),\boldsymbol{\alpha}_{0})\;d\mu
=\operatorname{E}_{\pi^{-1}(\boldsymbol{\hat{\lambda}}_{n}(\boldsymbol{X}))}\left[\frac{\partial}{\partial\lambda_{j}}\log\,g^{*}(X_{1}\,;\,\boldsymbol{\hat{\lambda}}_{n}(\boldsymbol{X}),\boldsymbol{\alpha}_{0})\right]

with probability one on $\Omega$. That is, $\boldsymbol{\hat{\lambda}}_{n}(\boldsymbol{X})$ satisfies, with probability one on $\Omega$, the first set of equations in (6). Additionally, since by hypothesis $\pi$ does not depend on the variable $\alpha_{j}$ for any given $1\leq j\leq r$, it follows that

\frac{\partial}{\partial\alpha_{j}}\,\log g^{*}(X_{i}\,;\,\boldsymbol{\hat{\lambda}}_{n}(\boldsymbol{X}),\boldsymbol{\alpha}_{0})=\frac{\partial}{\partial\alpha_{j}}\,\log g(X_{i}\,;\,\boldsymbol{\hat{\theta}}_{n}(\boldsymbol{X}),\boldsymbol{\alpha}_{0}),

for all $i$ and $1\leq j\leq r$, from which it follows, using the hypothesis just as before, that

\frac{1}{n}\sum_{i=1}^{n}\frac{\partial}{\partial\alpha_{j}}\log\,g^{*}(X_{i}\,;\,\boldsymbol{\hat{\lambda}}_{n}(\boldsymbol{X}),\boldsymbol{\alpha}_{0})=\operatorname{E}_{\pi^{-1}(\boldsymbol{\hat{\lambda}}_{n}(\boldsymbol{X}))}\left[\frac{\partial}{\partial\alpha_{j}}\log\,g^{*}(X_{1}\,;\,\boldsymbol{\hat{\lambda}}_{n}(\boldsymbol{X}),\boldsymbol{\alpha}_{0})\right]

for $1\leq j\leq r$, with probability one on $\Omega$, which concludes the proof. ∎

In general, MML estimators will not necessarily be functions of sufficient statistics. Additionally, in our applications, we shall use $g$ as a generalized version of the distribution $f$ in order to obtain the generalized maximum likelihood estimators. As we shall see, due to the high number of new distributions introduced in the past decades, it is not difficult to find such functions $g$ generalizing $f$. In the next section, we present applications of the proposed method.

3 Examples

We illustrate the proposed method by applying it to the Gamma, Nakagami-m, and Beta distributions. The examples involve well-known distributions, so we shall not present their backgrounds. The standard MLEs for the cited distributions are widely discussed in statistical textbooks, where it is shown that no closed-form expressions can be achieved using the ML method.

The Gamma and the Nakagami-m distributions are particular cases of the generalized Gamma distribution, while the Beta distribution is a special case of the generalized Beta distribution. Therefore, we will consider these generalized distributions to obtain the generalized maximum likelihood equations used to derive the closed-form estimators.

As we shall see, in all examples presented here we shall have

\operatorname{E}_{\boldsymbol{\theta}}\left[\frac{\partial}{\partial\theta_{j}}\log\,g(X_{1}\,;\,\boldsymbol{\theta},\boldsymbol{\alpha}_{0})\right]=0\mbox{ and }\operatorname{E}_{\boldsymbol{\theta}}\left[\frac{\partial}{\partial\alpha_{j}}\log\,g(X_{1}\,;\,\boldsymbol{\theta},\boldsymbol{\alpha}_{0})\right]=0 (9)

for all $j$. This should be expected in these examples due to differentiation under the integral sign of the equation $\int_{\mathcal{X}}g(x\,;\,\boldsymbol{\theta},\boldsymbol{\alpha})\;d\mu=1$, since in these examples $f$ is a special case of $g$, and $g$ is a probability distribution. In particular, in these examples the generalized maximum likelihood equations shall be given by

\frac{1}{n}\sum_{i=1}^{n}\frac{\partial}{\partial\theta_{j}}\log\,g(X_{i}\,;\,\boldsymbol{\theta},\boldsymbol{\alpha}_{0})=0,\quad 1\leq j\leq s-r, (10)
\frac{1}{n}\sum_{i=1}^{n}\frac{\partial}{\partial\alpha_{j}}\log\,g(X_{i}\,;\,\boldsymbol{\theta},\boldsymbol{\alpha}_{0})=0,\quad 1\leq j\leq r,

and moreover $J(\boldsymbol{\theta})$ and $K(\boldsymbol{\theta})$ shall be given by

J_{i,j}(\boldsymbol{\theta})=\operatorname{E}_{\boldsymbol{\theta}}\left[-\frac{\partial^{2}}{\partial\theta_{j}\partial\beta_{i}}\,\log g(X_{1}\,;\,\boldsymbol{\theta},\boldsymbol{\alpha}_{0})\right]\mbox{ and} (11)
K_{i,j}(\boldsymbol{\theta})=\operatorname{cov}_{\boldsymbol{\theta}}\left[\frac{\partial}{\partial\beta_{i}}\log g(X_{1}\,;\,\boldsymbol{\theta},\boldsymbol{\alpha}_{0}),\,\frac{\partial}{\partial\beta_{j}}\log g(X_{1}\,;\,\boldsymbol{\theta},\boldsymbol{\alpha}_{0})\right],

for all $i$ and $j$, where $\boldsymbol{\beta}$ is as in (3). Additionally, since we shall use only well-known distributions $g(x\,;\,\boldsymbol{\theta},\boldsymbol{\alpha})$, whose Fisher information matrix $I(\boldsymbol{\theta},\boldsymbol{\alpha})$ can be computed either by

I_{i,j}(\boldsymbol{\theta},\boldsymbol{\alpha})=\operatorname{E}_{\boldsymbol{\theta}}\left[-\frac{\partial^{2}}{\partial\beta^{*}_{i}\partial\beta^{*}_{j}}\,\log g(X_{1}\,;\,\boldsymbol{\theta},\boldsymbol{\alpha})\right]\mbox{ or} (12)
I_{i,j}(\boldsymbol{\theta},\boldsymbol{\alpha})=\operatorname{cov}_{\boldsymbol{\theta}}\left[\frac{\partial}{\partial\beta^{*}_{i}}\log g(X_{1}\,;\,\boldsymbol{\theta},\boldsymbol{\alpha}),\,\frac{\partial}{\partial\beta^{*}_{j}}\log g(X_{1}\,;\,\boldsymbol{\theta},\boldsymbol{\alpha})\right],

where $\boldsymbol{\beta}^{*}=(\boldsymbol{\alpha},\boldsymbol{\theta})$, it follows that, in these examples, $J(\boldsymbol{\theta})$ and $K(\boldsymbol{\theta})$ are submatrices of $I(\boldsymbol{\theta},\boldsymbol{\alpha}_{0})$.

Example 1: Let us consider that $X_{1}$, $X_{2}$, $\ldots$, $X_{n}$ are iid random variables (RV) following a gamma distribution with probability density function (PDF) given by:

f(x\,;\,\lambda,\phi)=\frac{1}{\Gamma(\phi)}\left(\frac{\phi}{\lambda}\right)^{\phi}x^{\phi-1}\exp\left(-\frac{\phi}{\lambda}x\right)\mbox{ for all }x>0, (13)

where $\phi>0$ is the shape parameter, $\lambda>0$ is the scale parameter and $\Gamma(\alpha)=\int_{0}^{\infty}{e^{-x}x^{\alpha-1}dx}$ is the gamma function.

We can apply the generalized maximum likelihood approach to this distribution by considering the density function $g(x\,;\,\lambda,\phi,\alpha)$ representing the generalized gamma distribution, where $\lambda>0$, $\phi>0$ and $\alpha>0$, given by

g(x\,;\,\lambda,\phi,\alpha)=\frac{\alpha}{\Gamma(\phi)}\left(\frac{\phi}{\lambda}\right)^{\phi}x^{\alpha\phi-1}\exp\left(-\frac{\phi}{\lambda}x^{\alpha}\right)\mbox{ for all }x>0. (14)

In order to formulate the generalized maximum likelihood equations for this distribution we first note that

\operatorname{E}_{\lambda,\phi}\left[X_{1}\right]=\lambda,\ \operatorname{E}_{\lambda,\phi}\left[\log(X_{1})\right]=\psi(\phi)-\log\left(\frac{\phi}{\lambda}\right)\mbox{ and } (15)
\operatorname{E}_{\lambda,\phi}\left[X_{1}\log(X_{1})\right]=\lambda\left(\psi(\phi)+\frac{1}{\phi}-\log\left(\frac{\phi}{\lambda}\right)\right),

which, combined with $\operatorname{E}_{\lambda,\phi}[1]=1$, implies that

\operatorname{E}_{\lambda,\phi}\left[\frac{\partial}{\partial\lambda}\log g(X_{1}\,;\,\lambda,\phi,1)\right]=\frac{\phi}{\lambda}-\frac{\phi}{\lambda^{2}}\operatorname{E}_{\lambda,\phi}\left[X_{1}\right]=0\mbox{ and}
\operatorname{E}_{\lambda,\phi}\left[\frac{\partial}{\partial\alpha}\log g(X_{1}\,;\,\lambda,\phi,1)\right]=1+\phi\operatorname{E}_{\lambda,\phi}\left[\log\left(X_{1}\right)\right]-\frac{\phi}{\lambda}\operatorname{E}_{\lambda,\phi}\left[X_{1}\log\left(X_{1}\right)\right]=0,

that is, (9) is satisfied. Thus, the generalized likelihood equations for $\boldsymbol{\theta}=(\lambda,\phi)$ over the coordinates $(\lambda,\alpha)$ at $\alpha=1$ are given by

\sum_{i=1}^{n}\frac{\partial}{\partial\lambda}\log g(X_{i}\,;\,\lambda,\phi,1)=\frac{n\phi}{\lambda}-\frac{\phi}{\lambda^{2}}\sum_{i=1}^{n}{X_{i}}=0\mbox{ and}
\sum_{i=1}^{n}\frac{\partial}{\partial\alpha}\log g(X_{i}\,;\,\lambda,\phi,1)=n+\phi\left(\sum_{i=1}^{n}\log\left(X_{i}\right)-\frac{1}{\lambda}\sum_{i=1}^{n}X_{i}\log\left(X_{i}\right)\right)=0.

Following [17], as long as the equality $X_{1}=\cdots=X_{n}$ does not hold, we have $\sum_{i=1}^{n}X_{i}\log\left(X_{i}\right)-\frac{1}{n}\sum_{i=1}^{n}X_{i}\sum_{i=1}^{n}\log\left(X_{i}\right)\neq 0$, in which case a direct computation shows that the generalized likelihood equations above have as their only solution

\hat{\lambda}_{n}=\frac{1}{n}\sum_{i=1}^{n}{X_{i}}\mbox{ and }\hat{\phi}_{n}=\frac{\sum_{i=1}^{n}X_{i}}{\sum_{i=1}^{n}X_{i}\log\left(X_{i}\right)-\frac{1}{n}\sum_{i=1}^{n}X_{i}\sum_{i=1}^{n}\log\left(X_{i}\right)}. (16)
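
For illustration, a minimal Python/NumPy sketch of the closed-form estimators in (16) could read as follows (the function name and the use of NumPy are our choices, not part of the method itself):

import numpy as np

def gamma_generalized_mle(x):
    # Closed-form estimators (16): lambda_hat is the sample mean and
    # phi_hat = sum(X_i) / (sum(X_i log X_i) - (1/n) sum(X_i) sum(log X_i)).
    x = np.asarray(x, dtype=float)
    n = x.size
    s = x.sum()
    log_x = np.log(x)
    lam_hat = s / n
    phi_hat = s / (np.sum(x * log_x) - s * log_x.sum() / n)
    return lam_hat, phi_hat

Both estimates are obtained in a single pass over the data, with no iterative optimization.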

On the other hand, the MLEs for $\phi$ and $\lambda$ would be obtained by solving the non-linear system of equations

\lambda=\frac{1}{n}\sum_{i=1}^{n}X_{i}\mbox{ and }\log(\phi)-\psi(\phi)=\log(\lambda)-\frac{1}{n}\sum_{i=1}^{n}{\log\left(X_{i}\right)}, (17)

where $\psi(k)=\frac{\partial}{\partial k}\log\Gamma(k)=\frac{\Gamma^{\prime}(k)}{\Gamma(k)}$ is the digamma function.
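
For comparison, the classical MLE of $\phi$ in (17) requires a numerical root finder; a hedged sketch using a Newton iteration is given below (scipy.special supplies the digamma and trigamma functions; the starting value, based on $\log\phi-\psi(\phi)\approx 1/(2\phi)$, and the tolerance are our choices, not prescribed by the paper):

import numpy as np
from scipy.special import digamma, polygamma

def gamma_mle_phi(x, tol=1e-10, max_iter=100):
    # Newton iteration for log(phi) - psi(phi) = log(mean(x)) - mean(log(x)),
    # i.e. the second equation in (17).
    x = np.asarray(x, dtype=float)
    c = np.log(x.mean()) - np.mean(np.log(x))
    phi = 0.5 / c                      # rough start from log(phi) - psi(phi) ~ 1/(2 phi)
    for _ in range(max_iter):
        step = (np.log(phi) - digamma(phi) - c) / (1.0 / phi - polygamma(1, phi))
        phi -= step
        if abs(step) < tol:
            break
    return phi

This highlights the contrast with (16), which needs no iteration at all.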

We now apply Theorem 2.2 to prove that the obtained estimators are consistent and asymptotically normal.

Proposition 3.1.

$\hat{\phi}_{n}$ and $\hat{\lambda}_{n}$ are strongly consistent estimators for the true parameters $\phi$ and $\lambda$, and asymptotically normal with $\sqrt{n}\left(\hat{\lambda}_{n}-\lambda\right)\overset{D}{\to}N\left(0,\lambda^{2}/\phi\right)$ and $\sqrt{n}\left(\hat{\phi}_{n}-\phi\right)\overset{D}{\to}N\left(0,\phi^{3}\psi^{\prime}(\phi+1)+\phi^{2}\right)$.

Proof.

In order to apply Theorem 2.2 we note that $g(x\,;\,\lambda,\phi,\alpha)$ can be rewritten as

g(x\,;\,\lambda,\phi,\alpha)=\frac{\alpha}{\Gamma(\phi)}\left(\frac{\phi}{\lambda}\right)^{\phi}x^{\alpha\phi-1}\exp\left(-\frac{\phi}{\lambda}x^{\alpha}\right)=
V(x)\exp\left(\eta_{1}(\lambda,\phi)T_{1}(x,\alpha)+\eta_{2}(\lambda,\phi)T_{2}(x,\alpha)+L(\lambda,\phi,\alpha)\right)

for all $x>0$, $\lambda>0$, $\phi>0$ and $\alpha>0$, where

V(x)=\frac{1}{x},\ \eta_{1}(\lambda,\phi)=\phi,\ \eta_{2}(\lambda,\phi)=-\frac{\phi}{\lambda},
T_{1}(x,\alpha)=\alpha\log(x),\ T_{2}(x,\alpha)=x^{\alpha}\mbox{ and }
L(\lambda,\phi,\alpha)=\log\left(\alpha\right)-\log\left(\Gamma(\phi)\right)+\phi\log\left(\frac{\phi}{\lambda}\right).

To check condition (A) of Theorem 2.2, note that, for $\alpha=1$ and using a reparametrization of the Fisher information matrix of the generalized gamma (GG) distribution available in [12], it follows that the Fisher information matrix under our parametrization satisfies

I(\lambda,\phi,\alpha_{0})=\begin{bmatrix}\dfrac{\phi}{\lambda^{2}}&0&\frac{\phi\log\left(\frac{\phi}{\lambda}\right)-\phi\psi(\phi)-1}{\lambda}\\ 0&\frac{\phi\psi^{\prime}(\phi)-1}{\phi}&\frac{1}{\phi}\\ \frac{\phi\log\left(\frac{\phi}{\lambda}\right)-\phi\psi(\phi)-1}{\lambda}&\frac{1}{\phi}&I_{3,3}(\lambda,\phi)\end{bmatrix}, (18)

for $\alpha_{0}=1$, where

I_{3,3}(\lambda,\phi)=\log\left(\frac{\phi}{\lambda}\right)\left(\phi\log\left(\frac{\phi}{\lambda}\right)-2\phi\psi(\phi)-2\right)+\phi\psi^{\prime}(\phi)+2\psi(\phi)+\phi\psi(\phi)^{2}+1.

Therefore, since, as discussed earlier, $J(\lambda,\phi)$ and $K(\lambda,\phi)$ can be computed as submatrices of $I(\lambda,\phi,\alpha_{0})$, we have

J(\lambda,\phi)=\begin{bmatrix}\dfrac{\phi}{\lambda^{2}}&\frac{\phi\log\left(\frac{\phi}{\lambda}\right)-\phi\psi(\phi)-1}{\lambda}\\ 0&\frac{1}{\phi}\end{bmatrix}\mbox{ and }K(\lambda,\phi)=\begin{bmatrix}\dfrac{\phi}{\lambda^{2}}&\frac{\phi\log\left(\frac{\phi}{\lambda}\right)-\phi\psi(\phi)-1}{\lambda}\\ \frac{\phi\log\left(\frac{\phi}{\lambda}\right)-\phi\psi(\phi)-1}{\lambda}&I_{3,3}(\lambda,\phi)\end{bmatrix}, (19)

and thus, since $\det(J(\lambda,\phi))=\frac{1}{\lambda^{2}}\neq 0$, it follows that $J(\lambda,\phi)$ is invertible for all $\phi>0$ and $\lambda>0$ with

J(\lambda,\phi)^{-1}=\begin{bmatrix}\frac{\lambda^{2}}{\phi}&-\lambda\left(\phi\log\left(\frac{\phi}{\lambda}\right)-\phi\psi(\phi)-1\right)\\ 0&\phi\end{bmatrix},

that is, condition (A) is verified. Additionally, after some algebraic computations, one can verify that

(J(\lambda,\phi)^{-1})^{T}K(\lambda,\phi)J(\lambda,\phi)^{-1}=\begin{bmatrix}\frac{\lambda^{2}}{\phi}&0\\ 0&\phi^{3}\psi^{\prime}(\phi+1)+\phi^{2}\end{bmatrix}. (20)

Item (B) of Theorem 2.2 is straightforward to check from the moments in (15). Thus conditions (A) and (B) of Theorem 2.2 are valid and therefore, from Theorem 2.2, we conclude there exists $\boldsymbol{\hat{\theta}}_{n}(\boldsymbol{x})=(\hat{\theta}_{1n}(\boldsymbol{x}),\hat{\theta}_{2n}(\boldsymbol{x}))$ measurable in $\boldsymbol{x}\in\mathcal{X}^{n}$ satisfying items I) to III) of Theorem 2.2.

Now, since the event $X_{1}=\cdots=X_{n}$ has probability zero of occurring for $n\geq 2$, it follows that $(\hat{\lambda}_{n},\hat{\phi}_{n})$ as given in (16) is, with probability one, the only solution of the generalized maximum likelihood equations for $n\geq 2$. This fact, combined with item I) of Theorem 2.2, implies that $\boldsymbol{\hat{\theta}}_{n}(\boldsymbol{X})-(\hat{\lambda}_{n},\hat{\phi}_{n})\overset{a.s.}{\to}0$. Thus the proposition follows from items II) and III) of Theorem 2.2 combined with (20). ∎
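
As a practical consequence of Proposition 3.1, Wald-type confidence intervals for $\phi$ follow directly from the closed-form estimate by plugging $\hat{\phi}_{n}$ into the asymptotic variance $\phi^{3}\psi^{\prime}(\phi+1)+\phi^{2}$. A minimal sketch, assuming SciPy is available (the 95% default level and the function name are our choices):

import numpy as np
from scipy.special import polygamma
from scipy.stats import norm

def phi_confidence_interval(x, level=0.95):
    # Estimator (16) for phi and the asymptotic variance from Proposition 3.1.
    x = np.asarray(x, dtype=float)
    n = x.size
    s = x.sum()
    log_x = np.log(x)
    phi_hat = s / (np.sum(x * log_x) - s * log_x.sum() / n)
    var = phi_hat**3 * polygamma(1, phi_hat + 1) + phi_hat**2
    half_width = norm.ppf(0.5 + level / 2) * np.sqrt(var / n)
    return phi_hat - half_width, phi_hat + half_width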

Note that the MLE of $\phi$ differs from the one obtained using our approach, which leads to a closed-form expression. Figure 1 presents the bias and the root mean square error (RMSE) obtained from 100,000 replications assuming $n=10,15,\ldots,100$, $\phi=2$ and $\lambda=1.5$. We present only the results related to $\phi$, since the estimator of $\lambda$ is the same under both approaches. It can be seen from the obtained results that both estimators yield similar (although not identical) results.

Figure 1: Bias and RMSE for $\phi$ for sample sizes of $10,15,\ldots,100$ elements, considering $\phi=2$ and $\lambda=1.5$.

Example 2: Let $X_{1}$, $X_{2}$, $\ldots$, $X_{n}$ be iid random variables following a Nakagami-m distribution with PDF given by

f(x\,;\,\lambda,\phi)=\frac{2}{\Gamma(\phi)}\left(\frac{\phi}{\lambda}\right)^{\phi}x^{2\phi-1}\exp\left(-\frac{\phi}{\lambda}x^{2}\right),

for all $x>0$, where $\phi>0.5$ and $\lambda>0$.

Once again letting $g$ be as in (14), just as in Example 1, following [19], as long as $X_{1}=\cdots=X_{n}$ does not hold, it follows that $\sum_{i=1}^{n}X_{i}^{2}\log\left(X_{i}^{2}\right)-\frac{1}{n}\sum_{i=1}^{n}X_{i}^{2}\sum_{i=1}^{n}\log\left(X_{i}^{2}\right)\neq 0$, in which case the generalized maximum likelihood equations for $(\lambda,\phi)$ over the coordinates $(\lambda,\alpha)$ at $\alpha=2$ have as their only solution

\hat{\lambda}_{n}=\frac{1}{n}\sum_{i=1}^{n}{X_{i}^{2}}\ \ \mbox{ and }\ \ \hat{\phi}_{n}=\cfrac{\sum_{i=1}^{n}X_{i}^{2}}{\sum_{i=1}^{n}X_{i}^{2}\log\left(X_{i}^{2}\right)-\frac{1}{n}\sum_{i=1}^{n}X_{i}^{2}\sum_{i=1}^{n}\log\left(X_{i}^{2}\right)}.
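
In code, these estimators amount to applying the gamma-type formulas to the squared observations; a minimal NumPy sketch (the function name is ours):

import numpy as np

def nakagami_generalized_mle(x):
    # Closed-form estimators above: work with x^2 and reuse the gamma-type formulas.
    x2 = np.asarray(x, dtype=float) ** 2
    n = x2.size
    s = x2.sum()
    log_x2 = np.log(x2)
    lam_hat = s / n
    phi_hat = s / (np.sum(x2 * log_x2) - s * log_x2.sum() / n)
    return lam_hat, phi_hat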

These estimators have expressions similar to those of the MMLEs of the Gamma distribution. Once again, we note these estimators are strongly consistent and asymptotically normal:

Proposition 3.2.

$\hat{\phi}_{n}$ and $\hat{\lambda}_{n}$ are strongly consistent estimators for the true parameters $\phi$ and $\lambda$, and asymptotically normal with $\sqrt{n}\left(\hat{\lambda}_{n}-\lambda\right)\overset{D}{\to}N\left(0,\lambda^{2}/\phi\right)$ and $\sqrt{n}\left(\hat{\phi}_{n}-\phi\right)\overset{D}{\to}N\left(0,\phi^{3}\psi^{\prime}(\phi+1)+\phi^{2}\right)$.

Proof.

The arguments and computations involved are completely analogous to those of Proposition 3.1. ∎

Here, we also compare the proposed estimators with the standard MLE. In Figure 2 we present the bias and RMSE obtained from 100,000 replications assuming $n=10,15,\ldots,100$, $\phi=4$ and $\lambda=10$. Again, we present only the results related to $\phi$. It can be seen from the obtained results that both estimators returned very close estimates.

Figure 2: Bias and RMSE for $\phi$ for sample sizes of $10,15,\ldots,100$ elements, considering $\phi=4$ and $\lambda=10$.

Note that the approach given above can be used for other particular cases. For instance, the Wilson-Hilferty distribution is obtained when $\alpha=3$; hence, we can obtain closed-form estimators for the cited distribution as well. It is essential to mention that, in the above examples, we do not claim that the GG distribution is the unique distribution that can be used to obtain closed-form estimators for the Gamma and Nakagami distributions. Different choices for $g$ may lead to different closed-form estimators.

We now apply the proposed approach to a generalized version of the beta distribution, which returns closed-form estimators for both parameters.

Example 3: Let us assume that the chosen beta distribution has the PDF given by

f(x\,;\,\alpha,\beta)=\frac{x^{\alpha-1}(1-x)^{\beta-1}}{\operatorname{B}(\alpha,\beta)},\quad 0<x<1, (21)

where $\operatorname{B}(\alpha,\beta)=\frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}$ is the beta function, $\alpha>0$ and $\beta>0$.

We can apply the generalized maximum likelihood approach to this distribution by considering the function $g(x\,;\,\alpha,\beta,a,c)$ representing the generalized beta distribution, where $\alpha>2$ and $\beta>2$, given by:

g(x\,;\,\alpha,\beta,a,c)=\frac{\left(x-a\right)^{\alpha-1}\left(c-x\right)^{\beta-1}}{(c-a)^{\alpha+\beta-1}\operatorname{B}(\alpha,\beta)}\mbox{ for all }x\in(0,1)\mbox{ and }0\leq a<x<c\leq 1.

Once again, in order to formulate the generalized maximum likelihood equations for $\boldsymbol{\theta}=(\alpha,\beta)$ over the coordinates $(a,c)$ at $(a,c)=(0,1)$, we note that

\operatorname{E}_{\alpha,\beta}\left[\frac{1}{X_{1}}\right]=\frac{\alpha+\beta-1}{\alpha-1}\mbox{ and }\operatorname{E}_{\alpha,\beta}\left[\frac{1}{1-X_{1}}\right]=\frac{\alpha+\beta-1}{\beta-1}, (22)

from which it follows that

\operatorname{E}_{\alpha,\beta}\left[\frac{\partial}{\partial a}\log g(X_{1}\,;\,\alpha,\beta,0,1)\right]=-(\alpha-1)\operatorname{E}_{\alpha,\beta}\left[\frac{1}{X_{1}}\right]+(\alpha+\beta-1)\operatorname{E}_{\alpha,\beta}\left[1\right]=0\mbox{ and}
\operatorname{E}_{\alpha,\beta}\left[\frac{\partial}{\partial c}\log g(X_{1}\,;\,\alpha,\beta,0,1)\right]=(\beta-1)\operatorname{E}_{\alpha,\beta}\left[\frac{1}{1-X_{1}}\right]-(\alpha+\beta-1)\operatorname{E}_{\alpha,\beta}\left[1\right]=0,

that is, (9) is satisfied. Thus the generalized likelihood equations for $\boldsymbol{\theta}=(\alpha,\beta)$ over the coordinates $(a,c)$ at $(a,c)=(0,1)$ are given by

\sum_{i=1}^{n}\frac{\partial}{\partial a}\log g(X_{i}\,;\,\alpha,\beta,0,1)=-(\alpha-1)\sum_{i=1}^{n}{\frac{1}{X_{i}}}\,+n(\alpha+\beta-1)=0,\mbox{ and }
\sum_{i=1}^{n}\frac{\partial}{\partial c}\log g(X_{i}\,;\,\alpha,\beta,0,1)=(\beta-1)\sum_{i=1}^{n}{\frac{1}{1-X_{i}}}\,-n(\alpha+\beta-1)=0.

Note that, from the harmonic mean-arithmetic mean inequality, as long as the equality $X_{1}=\ldots=X_{n}$ does not hold, we have $\sum_{i=1}^{n}\frac{1-X_{i}}{X_{i}}-\frac{n^{2}}{\sum_{i=1}^{n}\frac{X_{i}}{1-X_{i}}}>0$ and $\sum_{i=1}^{n}\frac{X_{i}}{1-X_{i}}-\frac{n^{2}}{\sum_{i=1}^{n}\frac{1-X_{i}}{X_{i}}}>0$, in which case, after some algebraic manipulations, it is seen that the only solution of the above system of linear equations is given by

\hat{\alpha}_{n}=\left(\sum_{i=1}^{n}\frac{1}{X_{i}}\right)\left(\sum_{i=1}^{n}\frac{1-X_{i}}{X_{i}}-\frac{n^{2}}{\sum_{i=1}^{n}\frac{X_{i}}{1-X_{i}}}\right)^{-1}, (23)
\hat{\beta}_{n}=\left(\sum_{i=1}^{n}\frac{1}{1-X_{i}}\right)\left(\sum_{i=1}^{n}\frac{X_{i}}{1-X_{i}}-\frac{n^{2}}{\sum_{i=1}^{n}\frac{1-X_{i}}{X_{i}}}\right)^{-1}. (24)
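
A minimal NumPy sketch evaluating (23) and (24) directly (the function name and structure are ours):

import numpy as np

def beta_generalized_mle(x):
    # Closed-form estimators (23)-(24); assumes 0 < x_i < 1 for every observation.
    x = np.asarray(x, dtype=float)
    n = x.size
    s1 = np.sum((1.0 - x) / x)        # sum of (1 - X_i)/X_i
    s2 = np.sum(x / (1.0 - x))        # sum of X_i/(1 - X_i)
    alpha_hat = np.sum(1.0 / x) / (s1 - n**2 / s2)
    beta_hat = np.sum(1.0 / (1.0 - x)) / (s2 - n**2 / s1)
    return alpha_hat, beta_hat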

In the following, we apply Theorem 2.2 to prove that these estimators are consistent and asymptotically normal.

Proposition 3.3.

$\hat{\alpha}_{n}$ and $\hat{\beta}_{n}$ are strongly consistent estimators for the true parameters $\alpha$ and $\beta$, and asymptotically normal with $\sqrt{n}\left(\hat{\alpha}_{n}-\alpha\right)\overset{D}{\to}N\left(0,Q(\alpha,\beta)\right)$ and $\sqrt{n}\left(\hat{\beta}_{n}-\beta\right)\overset{D}{\to}N\left(0,Q(\beta,\alpha)\right)$, where

Q(y,z)=\frac{y(y-1)^{2}(4yz^{2}-6z^{2}-10yz+5y+16z-10)}{(y-2)(z-2)(y+z-1)}\mbox{ for all }y>2\mbox{ and }z>2.
Proof.

In order to apply Theorem 2.2 we note that $g(x\,;\,\alpha,\beta,a,c)$ can be written as

g(x\,;\,\alpha,\beta,a,c)=V(x)\exp\left[\eta_{1}(\alpha,\beta)\log(x-a)+\eta_{2}(\alpha,\beta)\log(c-x)+L(\alpha,\beta,a,c)\right]

for all $x\in(0,1)$, $\alpha>2$, $\beta>2$ and $(a,c)\in\mathcal{A}_{x}$, with $\mathcal{A}_{x}$ representing the restriction $0\leq a<x<c\leq 1$, where

V(x)=1,\ \eta_{1}(\alpha,\beta)=\alpha-1,\ \eta_{2}(\alpha,\beta)=\beta-1\mbox{ and }
L(\alpha,\beta,a,c)=-(\alpha+\beta-1)\log(c-a)-\log(\operatorname{B}(\alpha,\beta)).

In order to check condition (A) of Theorem 2.2, note that for $a=0$ and $c=1$, following the computation of the Fisher information matrix for $g$ given in [4], we have

J(\alpha,\beta)=\begin{bmatrix}\frac{\beta}{(\alpha-1)}&-1\\ 1&-\frac{\alpha}{(\beta-1)}\end{bmatrix}\mbox{ and }K(\alpha,\beta)=\begin{bmatrix}\frac{\beta(\alpha+\beta-1)}{(\alpha-2)}&\alpha+\beta-1\\ \alpha+\beta-1&\frac{\alpha(\alpha+\beta-1)}{(\beta-2)}\end{bmatrix}. (25)

Thus, since $\alpha+\beta-1>0$, it is easy to see that $J(\alpha,\beta)$ is invertible with

(J(\alpha,\beta))^{-1}=\begin{bmatrix}\frac{\alpha(\alpha-1)}{\alpha+\beta-1}&-\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-1}\\ \frac{(\alpha-1)(\beta-1)}{\alpha+\beta-1}&-\frac{\beta(\alpha-1)}{\alpha+\beta-1}\end{bmatrix}.

Therefore we conclude condition (A) is satisfied, and after some algebraic computations one may find that

(J(\alpha,\beta)^{-1})^{T}K(\alpha,\beta)J(\alpha,\beta)^{-1}=\begin{bmatrix}Q(\alpha,\beta)&Q_{1}(\alpha,\beta)\\ Q_{1}(\alpha,\beta)&Q(\beta,\alpha)\end{bmatrix}, (26)

where $Q(y,z)$ is as in the proposition and $Q_{1}(y,z)$ is a rational function in $y$ and $z$.

Once again, item (B) is straightforward to check from (22). Thus, conditions (A) and (B) of Theorem 2.2 are valid, and therefore, following the same arguments as in the proof of Proposition 3.1, the proposition follows from the conclusion of Theorem 2.2 combined with (26). ∎

Figure 3 provides the bias and RMSE obtained from 100,000 replications assuming $n=10,15,\ldots,100$, $\alpha=3$ and $\beta=2.5$. Here we considered the proposed estimators and compared them with the standard MLEs, which do not have closed-form expressions.

Figure 3: Bias and RMSE for $\alpha$ and $\beta$ for sample sizes of $10,15,\ldots,100$ elements, considering $\alpha=3$ and $\beta=2.5$.

Unlike the Gamma and Nakagami cases, we observed that the closed-form estimators have an additional bias. Although they are obtained from a different distribution, for many parameter values they returned similar results. A major drawback of the estimators (23) and (24) is that the properties that ensure consistency and asymptotic normality do not hold when the values of $\alpha$ and $\beta$ are smaller than $2$.

Example 4: Let us consider that $(X_{1},Y_{1})$, $(X_{2},Y_{2})$, $\ldots$, $(X_{n},Y_{n})$ are iid random vectors following a bivariate gamma distribution with probability density function (PDF) given by:

f(x,y\,;\,\beta,\alpha_{1},\alpha_{2})=\frac{1}{\beta^{\alpha^{*}_{2}}\Gamma(\alpha_{1})\Gamma(\alpha_{2})}x^{\alpha_{1}-1}(y-x)^{\alpha_{2}-1}e^{-\frac{y}{\beta}},\mbox{ where }0<x<y<\infty, (27)

where $\alpha_{1}$, $\alpha_{2}$ and $\beta$ are positive, $\alpha_{2}\neq 1$ and $\alpha_{2}^{*}=\alpha_{1}+\alpha_{2}$.

We can apply the generalized maximum likelihood approach to this distribution by considering the density function $g(x,y\,;\,\beta,\alpha_{1},\alpha_{2},\gamma_{1},\gamma_{2})$ representing a generalized version of this bivariate gamma distribution, given by

g(x,y\,;\,\beta,\alpha_{1},\alpha_{2},\gamma_{1},\gamma_{2})=\frac{\gamma_{1}\gamma_{2}}{\beta^{\alpha^{*}_{2}}\Gamma(\alpha_{1})\Gamma(\alpha_{2})}x^{\alpha_{1}\gamma_{1}-1}\left(y^{\gamma_{2}}-x^{\gamma_{1}}\right)^{\alpha_{2}-1}e^{-\frac{y^{\gamma_{2}}}{\beta}}y^{\gamma_{2}-1},

where $\beta$, $\alpha_{1}$, $\alpha_{2}$ are positive, $0<x<y$, and $(\gamma_{1},\gamma_{2})\in\mathcal{A}_{x,y}$, where $\mathcal{A}_{x,y}\subset(0,\infty)^{2}$ represents the restriction $x^{\gamma_{1}}<y^{\gamma_{2}}$.

In order to formulate the generalized maximum likelihood equations for this distribution at $(\gamma_{1},\gamma_{2})=(1,1)$, let $\bar{Z}_{1}$, $\bar{Z}_{2}$, $\bar{Z}_{3}$, $\bar{Z}_{4}$, $\bar{Z}_{5}$ and $\bar{Z}_{6}$ be the sample means of $Y$, $\log X$, $\log Y$, $Y\log Y$, $\frac{X\log X}{Y-X}$ and $\frac{Y\log Y}{Y-X}$, respectively, where $Y=(Y_{1},\cdots,Y_{n})$ and $X=(X_{1},\cdots,X_{n})$, and define $\bar{z}_{i}$ analogously from the observed data for $1\leq i\leq 6$. From [21] we have

\operatorname{E}_{\boldsymbol{\theta}}\left[\bar{Z}_{1}\right]=\alpha_{2}^{*}\beta,\ \operatorname{E}_{\boldsymbol{\theta}}\left[\bar{Z}_{2}\right]=\psi(\alpha_{1})+\log\beta,\ \operatorname{E}_{\boldsymbol{\theta}}\left[\bar{Z}_{3}\right]=\psi(\alpha_{2}^{*})+\log\beta, (28)
\operatorname{E}_{\boldsymbol{\theta}}\left[\bar{Z}_{4}\right]=\alpha_{2}^{*}\beta\left[\psi(\alpha_{2}^{*})+\log\beta+\frac{1}{\alpha_{2}^{*}}\right],
\operatorname{E}_{\boldsymbol{\theta}}\left[\bar{Z}_{5}\right]=\frac{\alpha_{1}}{\alpha_{2}-1}\left[\psi(\alpha_{1})+\log\beta+\frac{1}{\alpha_{1}}\right],\mbox{ and}
\operatorname{E}_{\boldsymbol{\theta}}\left[\bar{Z}_{6}\right]=\frac{\alpha_{2}^{*}-1}{\alpha_{2}-1}\left[\psi(\alpha_{2}^{*})+\log\beta\right],

from which it follows that

\operatorname{E}_{\boldsymbol{\theta}}\left[\frac{\partial}{\partial\beta}\log g(X_{1},Y_{1}\,;\,\boldsymbol{\theta},1,1)\right]=-\frac{(\alpha_{1}+\alpha_{2})}{\beta}+\frac{1}{\beta^{2}}\operatorname{E}_{\boldsymbol{\theta}}[\bar{Z}_{1}]=0,
\operatorname{E}_{\boldsymbol{\theta}}\left[\frac{\partial}{\partial\gamma_{1}}\log g(X_{1},Y_{1}\,;\,\boldsymbol{\theta},1,1)\right]=\alpha_{1}\operatorname{E}_{\boldsymbol{\theta}}[\bar{Z}_{2}]-(\alpha_{2}-1)\operatorname{E}_{\boldsymbol{\theta}}\left[\bar{Z}_{5}\right]+1=0,
\operatorname{E}_{\boldsymbol{\theta}}\left[\frac{\partial}{\partial\gamma_{2}}\log g(X_{1},Y_{1}\,;\,\boldsymbol{\theta},1,1)\right]=(\alpha_{2}-1)\operatorname{E}_{\boldsymbol{\theta}}\left[\bar{Z}_{6}\right]-\frac{1}{\beta}\operatorname{E}_{\boldsymbol{\theta}}\left[\bar{Z}_{4}\right]+1+\operatorname{E}_{\boldsymbol{\theta}}\left[\bar{Z}_{3}\right]=0,

that is, (9) is satisfied. Thus, the generalized likelihood equations for $\boldsymbol{\theta}=(\beta,\alpha_{1},\alpha_{2})$ over the coordinates $(\beta,\gamma_{1},\gamma_{2})$ at $(\gamma_{1},\gamma_{2})=(1,1)$ are given by

\frac{1}{n}\sum_{i=1}^{n}\frac{\partial}{\partial\beta}\log g(x_{i},y_{i}\,;\,\boldsymbol{\theta},1,1)=-\frac{\alpha_{1}+\alpha_{2}}{\beta}+\frac{1}{\beta^{2}}\bar{z}_{1}=0,
\frac{1}{n}\sum_{i=1}^{n}\frac{\partial}{\partial\gamma_{1}}\log g(x_{i},y_{i}\,;\,\boldsymbol{\theta},1,1)=\alpha_{1}\bar{z}_{2}-(\alpha_{2}-1)\bar{z}_{5}+1=0,
\frac{1}{n}\sum_{i=1}^{n}\frac{\partial}{\partial\gamma_{2}}\log g(x_{i},y_{i}\,;\,\boldsymbol{\theta},1,1)=(\alpha_{2}-1)\bar{z}_{6}-\frac{1}{\beta}\bar{z}_{4}+1+\bar{z}_{3}=0.

Multiplying the first equation above by $\beta$, we obtain a linear system of equations in $\alpha_{1}$, $\alpha_{2}$ and $\frac{1}{\beta}$, from which, using Cramer's rule, it follows that it has the unique solution

\hat{\alpha}_{2}=-\frac{B(\boldsymbol{\bar{z}})}{A(\boldsymbol{\bar{z}})},\ \hat{\alpha}_{1}=\frac{1}{\bar{z}_{2}}\left[(\hat{\alpha}_{2}-1)\bar{z}_{5}-1\right],\ \hat{\beta}=\frac{\bar{z}_{1}}{(\hat{\alpha}_{1}+\hat{\alpha}_{2})},

as long as $A(\boldsymbol{\bar{z}})\neq 0$, where

A(\boldsymbol{\bar{z}})=\bar{z}_{6}\bar{z}_{1}\bar{z}_{2}-\bar{z}_{5}\bar{z}_{4}-\bar{z}_{2}\bar{z}_{4}\mbox{ and }B(\boldsymbol{\bar{z}})=-(\bar{z}_{5}+1)\bar{z}_{4}+(1+\bar{z}_{3}+\bar{z}_{6})\bar{z}_{1}\bar{z}_{2}

for $\boldsymbol{\bar{z}}=\left(\bar{z}_{1},\cdots,\bar{z}_{6}\right)$.
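
A minimal NumPy sketch that evaluates these closed-form estimators exactly as stated above, with the sample means $\bar{z}_{1},\cdots,\bar{z}_{6}$ computed from the paired data (the function name is ours):

import numpy as np

def bivariate_gamma_generalized_mle(x, y):
    # Sample means z1,...,z6 of Y, log X, log Y, Y log Y, X log X/(Y-X), Y log Y/(Y-X).
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    z1 = np.mean(y)
    z2 = np.mean(np.log(x))
    z3 = np.mean(np.log(y))
    z4 = np.mean(y * np.log(y))
    z5 = np.mean(x * np.log(x) / (y - x))
    z6 = np.mean(y * np.log(y) / (y - x))
    # A and B as defined in the text; the estimators follow the stated formulas.
    A = z6 * z1 * z2 - z5 * z4 - z2 * z4
    B = -(z5 + 1.0) * z4 + (1.0 + z3 + z6) * z1 * z2
    alpha2_hat = -B / A
    alpha1_hat = ((alpha2_hat - 1.0) * z5 - 1.0) / z2
    beta_hat = z1 / (alpha1_hat + alpha2_hat)
    return alpha1_hat, alpha2_hat, beta_hat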

We now apply Theorem 2.2 to prove that the obtained estimators are strongly consistent and asymptotically normal.

Proposition 3.4.

$\hat{\alpha}_{1}$, $\hat{\alpha}_{2}$ and $\hat{\beta}$ are strongly consistent and asymptotically normal estimators for the true parameters, as long as $\alpha_{2}\neq 1$ and

(\alpha_{2}^{*}-1)\psi(\alpha_{1})+\alpha_{2}^{*}\psi(\alpha_{2}^{*})+(2\alpha_{2}^{*}-1)\log(\beta)+1\neq 0.
Proof.

In order to apply Theorem 2.2 we note that $g(x,y\,;\,\beta,\alpha_{1},\alpha_{2},\gamma_{1},\gamma_{2})$ can be rewritten as

g(x,y\,;\,\beta,\alpha_{1},\alpha_{2},\gamma_{1},\gamma_{2})=\frac{\gamma_{1}\gamma_{2}}{\beta^{\alpha^{*}_{2}}\Gamma(\alpha_{1})\Gamma(\alpha_{2})}x^{\alpha_{1}\gamma_{1}-1}\left(y^{\gamma_{2}}-x^{\gamma_{1}}\right)^{\alpha_{2}-1}e^{-\frac{y^{\gamma_{2}}}{\beta}}y^{\gamma_{2}-1}=
V(x)\exp\left(\eta_{1}(\boldsymbol{\theta})T_{1}(x,\gamma_{1},\gamma_{2})+\eta_{2}(\boldsymbol{\theta})T_{2}(x,\gamma_{1},\gamma_{2})+\eta_{3}(\boldsymbol{\theta})T_{3}(x,\gamma_{1},\gamma_{2})+L(\boldsymbol{\theta},\gamma_{1},\gamma_{2})\right)

for all $0<x<y$, positive $\alpha_{1}$, $\alpha_{2}$, $\beta$, and $(\gamma_{1},\gamma_{2})\in\mathcal{A}_{x,y}$, where $\eta_{i}(\boldsymbol{\theta})$ and $T_{i}(x,\gamma_{1},\gamma_{2})$ satisfy the conditions of Theorem 2.2.

To check condition (A) of Theorem 2.2, note that, for $(\gamma_{1},\gamma_{2})=(1,1)$ and using the relations (28), we have

J(\alpha_{1},\alpha_{2},\beta)=\begin{bmatrix}\frac{1}{\beta}&\frac{1}{\beta}&\frac{1}{\beta^{3}}\operatorname{E}_{\boldsymbol{\theta}}\left[\bar{Z}_{1}\right]\\ -\operatorname{E}_{\boldsymbol{\theta}}\left[\bar{Z}_{2}\right]&\operatorname{E}_{\boldsymbol{\theta}}\left[\bar{Z}_{5}\right]&0\\ 0&-\operatorname{E}_{\boldsymbol{\theta}}\left[\bar{Z}_{6}\right]&-\frac{1}{\beta^{2}}\operatorname{E}_{\boldsymbol{\theta}}\left[\bar{Z}_{4}\right]\end{bmatrix}. (29)

Thus it follows that

\operatorname{det}J(\alpha_{1},\alpha_{2},\beta)=\frac{1}{\beta^{3}}\left(\operatorname{E}_{\boldsymbol{\theta}}\left[\bar{Z}_{6}\right]\operatorname{E}_{\boldsymbol{\theta}}\left[\bar{Z}_{1}\right]\operatorname{E}_{\boldsymbol{\theta}}\left[\bar{Z}_{2}\right]-\operatorname{E}_{\boldsymbol{\theta}}\left[\bar{Z}_{5}\right]\operatorname{E}_{\boldsymbol{\theta}}\left[\bar{Z}_{4}\right]-\operatorname{E}_{\boldsymbol{\theta}}\left[\bar{Z}_{2}\right]\operatorname{E}_{\boldsymbol{\theta}}\left[\bar{Z}_{4}\right]\right)
=\frac{1}{\beta^{2}}\alpha_{2}^{*}(\psi(\alpha_{1})+\log\beta)\left(\frac{\alpha_{2}^{*}-1}{\alpha_{2}-1}\right)(\psi(\alpha_{2}^{*})+\log\beta)
-\frac{1}{\beta^{2}}\left(\frac{\alpha_{2}^{*}-1}{\alpha_{2}-1}(\psi(\alpha_{1})+\log\beta)+\frac{1}{\alpha_{2}-1}\right)\alpha_{2}^{*}\left(\psi(\alpha_{2}^{*})+\log\beta+\frac{1}{\alpha_{2}^{*}}\right)
=-\frac{1}{\beta^{2}}\left(\frac{\alpha_{2}^{*}-1}{\alpha_{2}-1}(\psi(\alpha_{1})+\log\beta)+\frac{\alpha_{2}^{*}}{\alpha_{2}-1}\left(\psi(\alpha_{2}^{*})+\log\beta\right)+\frac{1}{\alpha_{2}-1}\right)
=-\frac{1}{\beta^{2}(\alpha_{2}-1)}\left((\alpha_{2}^{*}-1)\psi(\alpha_{1})+\alpha_{2}^{*}\psi(\alpha_{2}^{*})+(2\alpha_{2}^{*}-1)\log(\beta)+1\right)\neq 0,

that is, condition (A) is verified.

Item (B) of Theorem 2.2 is straightforward to check from the relations (28). Thus conditions (A) and (B) of Theorem 2.2 are valid and therefore, from Theorem 2.2, we conclude that there exists \boldsymbol{\hat{\theta}}_{n}(\boldsymbol{x})=(\hat{\theta}_{1n}(\boldsymbol{x}),\hat{\theta}_{2n}(\boldsymbol{x}),\hat{\theta}_{3n}(\boldsymbol{x})), measurable in \boldsymbol{x}\in\mathcal{X}^{n}, satisfying items I) to III) of Theorem 2.2.

Now, from the strong law of large numbers, as n\to\infty we have

\left(\bar{Z}_{1},\bar{Z}_{2},\cdots,\bar{Z}_{6}\right)\overset{a.s.}{\to}\left(\operatorname{E}_{\boldsymbol{\theta}}\left[\bar{Z}_{1}\right],\operatorname{E}_{\boldsymbol{\theta}}\left[\bar{Z}_{2}\right],\cdots,\operatorname{E}_{\boldsymbol{\theta}}\left[\bar{Z}_{6}\right]\right)

and thus it follows from the continuous mapping theorem that

A(\boldsymbol{\bar{Z}})\overset{a.s.}{\to}\operatorname{E}_{\boldsymbol{\theta}}\left[\bar{Z}_{6}\right]\operatorname{E}_{\boldsymbol{\theta}}\left[\bar{Z}_{1}\right]\operatorname{E}_{\boldsymbol{\theta}}\left[\bar{Z}_{2}\right]-\operatorname{E}_{\boldsymbol{\theta}}\left[\bar{Z}_{5}\right]\operatorname{E}_{\boldsymbol{\theta}}\left[\bar{Z}_{4}\right]-\operatorname{E}_{\boldsymbol{\theta}}\left[\bar{Z}_{2}\right]\operatorname{E}_{\boldsymbol{\theta}}\left[\bar{Z}_{4}\right]
=-\frac{\beta}{(\alpha_{2}-1)}\left((\alpha_{2}^{*}-1)\psi(\alpha_{1})+\alpha_{2}^{*}\psi(\alpha_{2}^{*})+(2\alpha_{2}^{*}-1)\log(\beta)+1\right)\neq 0.

In particular, by the alternative characterization of strong convergence, with probability converging to one we have A(\boldsymbol{\bar{Z}})\neq 0, in which case the modified likelihood equations have (\hat{\alpha}_{1},\hat{\alpha}_{2},\hat{\beta}) as their unique solution. This fact, combined with item I) of Theorem 2.2, implies that \boldsymbol{\hat{\theta}}_{n}(\boldsymbol{X})-(\hat{\alpha}_{1},\hat{\alpha}_{2},\hat{\beta})\overset{a.s.}{\to}0. Thus the proposition follows, once again, from items II) and III) of Theorem 2.2. ∎
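As a practical aside, the non-degeneracy condition of Proposition 3.4 is straightforward to check numerically for given parameter values. The sketch below evaluates its left-hand side with the digamma function; \alpha_{2}^{*} is treated as a separate input, since its definition is given earlier in the text, and the parameter values shown are illustrative only.

```python
import numpy as np
from scipy.special import digamma

def condition_lhs(alpha1, alpha2_star, beta):
    """Left-hand side of the non-degeneracy condition in Proposition 3.4;
    the proposition requires this quantity to be nonzero (and alpha2 != 1)."""
    return ((alpha2_star - 1.0) * digamma(alpha1)
            + alpha2_star * digamma(alpha2_star)
            + (2.0 * alpha2_star - 1.0) * np.log(beta)
            + 1.0)

# Illustrative parameter values only.
print(condition_lhs(alpha1=2.0, alpha2_star=3.5, beta=1.5))
```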

4 Final Remarks

We have shown that the proposed generalized version of the maximum likelihood estimator provides a valuable alternative for obtaining closed-form expressions when the standard MLE approach fails. The proposed approach can also be used with discrete distributions: the results remain valid, and the obtained estimators are still strongly consistent, invariant, and asymptotically normally distributed. Due to the flexibility of the likelihood function, additional complexity, such as censoring, long-term survival, covariates, and random effects, can be incorporated into the distribution and the inferential procedure.

The method introduced in this study benefits particularly from the use of generalized versions of the baseline distribution. This not only gives renewed impetus to the many generalized distributions proposed over recent decades but also underscores their practical relevance. Moreover, since different generalized distributions lead to different estimators, comparing the resulting estimators is natural; such comparisons help identify the most effective estimator under a given performance metric. On a different note, our findings show that the generalized form need not itself be a distribution, which broadens the investigation beyond generalized density functions and allows a more expansive exploration of potential solutions.

As shown in Examples 1 and 2, the estimators' behavior in terms of bias and RMSE is similar to that of the MLE for the Gamma and Nakagami distributions. Therefore, corrective bias approaches can also be used to remove the bias of the generalized estimators. For the Beta distribution, the comparison revealed different behavior for the proposed estimators: we observed that, for some small parameter values, the results may not be consistent. This example illustrates what happens when, for certain parameter values, the Fisher information of the generalized distribution has singularity problems. Finally, we discussed an approach to obtain closed-form estimators for a bivariate model, which provides insights that can be applied to other multivariate models.

This observation lays the groundwork for further exploration, especially in the realm of real-time statistical estimation. It underscores the need for new estimators for distributions with intricate parameter spaces, tailoring them for rapid computation. This aspect is particularly vital for integration with machine learning methodologies, such as tree-based algorithms, where swift and efficient computational techniques are essential. Our study adds a new dimension to the ongoing discourse in statistical estimation, pivoting towards solutions that are not only theoretically sound but also practically viable in dealing with complex data sets. In an era where data complexity and volume are escalating, our approach heralds a promising direction for developing more agile and adaptable statistical tools, crucial for real-time analysis and decision-making in dynamic environments.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Acknowledgements

Eduardo Ramos acknowledges financial support from São Paulo State Research Foundation (FAPESP Proc. 2019/27636-9). Francisco Rodrigues acknowledges financial support from CNPq (grant number 309266/2019-0). Francisco Louzada is supported by the Brazilian agencies CNPq (grant number 301976/2017-1) and FAPESP (grant number 2013/07375-0).

Appendix

In order to prove Theorem 2.2 we shall need the technical lemma that follows.

In the following, given \boldsymbol{x}\in\mathbb{R}^{m} we let \left\|\boldsymbol{x}\right\|_{2}=\sqrt{\sum_{i=1}^{m}x_{i}^{2}}, and given a matrix A\in M_{m}(\mathbb{R}) we let \left\|A\right\|_{2} denote the usual spectral norm, defined by \left\|A\right\|_{2}=\sup_{\left\|x\right\|_{2}=1}\left\|Ax^{T}\right\|_{2}. Moreover, given a differentiable function F:\Theta\to\mathbb{R}^{m}, for \Theta\subset\mathbb{R}^{m} open, we denote \frac{\partial}{\partial\theta_{j}}F(\boldsymbol{\theta})=\left(\frac{\partial}{\partial\theta_{j}}F_{1}(\boldsymbol{\theta}),\cdots,\frac{\partial}{\partial\theta_{j}}F_{m}(\boldsymbol{\theta})\right), and we denote by \frac{\partial}{\partial\boldsymbol{\theta}}F(\boldsymbol{\theta}) the Jacobian of F at \boldsymbol{\theta}\in\Theta, that is, \frac{\partial}{\partial\boldsymbol{\theta}}F(\boldsymbol{\theta})=\left(\frac{\partial}{\partial\theta_{j}}F_{i}(\boldsymbol{\theta})\right)\in M_{m}(\mathbb{R}) for all \boldsymbol{\theta}\in\Theta.

Lemma .1.

Let \Theta\subset\mathbb{R}^{m} be open, let \boldsymbol{\theta}_{0}\in\Theta, let F:\Theta\to\mathbb{R}^{m} be C^{1}, let J\in M_{m}(\mathbb{R}) be invertible, denote \lambda=\frac{1}{2}\left\|J^{-1}\right\|_{2}^{-1} and r=\frac{1}{\lambda}\left\|F(\boldsymbol{\theta}_{0})\right\|_{2}, and suppose that:

\bar{B}(\boldsymbol{\theta}_{0},r)\subset\Theta\mbox{ and }\left\|\frac{\partial}{\partial\boldsymbol{\theta}}F(\boldsymbol{\theta})-J\right\|_{2}\leq\lambda\mbox{ for all }\boldsymbol{\theta}\in\bar{B}(\boldsymbol{\theta}_{0},r).

Then there exists \boldsymbol{\theta}^{*}\in\bar{B}(\boldsymbol{\theta}_{0},r) such that F(\boldsymbol{\theta}^{*})=0.

Proof.

The proof follows from a simple application of the Brouwer fixed point theorem.

Letting L:\bar{B}(\boldsymbol{\theta}_{0},r)\to\mathbb{R}^{m} be defined by L(\boldsymbol{\theta})=\boldsymbol{\theta}-J^{-1}F(\boldsymbol{\theta}) for all \boldsymbol{\theta}\in\bar{B}(\boldsymbol{\theta}_{0},r), we shall prove that L(\bar{B}(\boldsymbol{\theta}_{0},r))\subset\bar{B}(\boldsymbol{\theta}_{0},r). Indeed, from the chain rule it follows that L is differentiable in \bar{B}(\boldsymbol{\theta}_{0},r) with

L^{\prime}(\boldsymbol{\theta})=I-J^{-1}\frac{\partial}{\partial\boldsymbol{\theta}}F(\boldsymbol{\theta})=J^{-1}\left(J-\frac{\partial}{\partial\boldsymbol{\theta}}F(\boldsymbol{\theta})\right)\mbox{ for all }\boldsymbol{\theta}\in\bar{B}(\boldsymbol{\theta}_{0},r).

Thus, for all \boldsymbol{\theta}\in\bar{B}(\boldsymbol{\theta}_{0},r) we have

\left\|\frac{\partial}{\partial\boldsymbol{\theta}}L(\boldsymbol{\theta})\right\|_{2}=\left\|J^{-1}\left(J-\frac{\partial}{\partial\boldsymbol{\theta}}F(\boldsymbol{\theta})\right)\right\|_{2}\leq\left\|J^{-1}\right\|_{2}\left\|J-\frac{\partial}{\partial\boldsymbol{\theta}}F(\boldsymbol{\theta})\right\|_{2}\leq\frac{1}{2},

and thus from the mean value inequality we have

\left\|L(\boldsymbol{\theta})-L(\boldsymbol{\theta}_{0})\right\|_{2}\leq\frac{1}{2}\left\|\boldsymbol{\theta}-\boldsymbol{\theta}_{0}\right\|_{2}\leq\frac{r}{2}\mbox{ for all }\boldsymbol{\theta}\in\bar{B}(\boldsymbol{\theta}_{0},r). (30)

Moreover, note that

\left\|L(\boldsymbol{\theta}_{0})-\boldsymbol{\theta}_{0}\right\|_{2}=\left\|J^{-1}F(\boldsymbol{\theta}_{0})\right\|_{2}\leq\left\|J^{-1}\right\|_{2}\left\|F(\boldsymbol{\theta}_{0})\right\|_{2}=\frac{1}{2\lambda}\left\|F(\boldsymbol{\theta}_{0})\right\|_{2}=\frac{r}{2}. (31)

Thus, given \boldsymbol{\theta}\in\bar{B}(\boldsymbol{\theta}_{0},r), from inequalities (30) and (31) and the triangle inequality we have

\left\|L(\boldsymbol{\theta})-\boldsymbol{\theta}_{0}\right\|_{2}\leq\left\|L(\boldsymbol{\theta})-L(\boldsymbol{\theta}_{0})\right\|_{2}+\left\|L(\boldsymbol{\theta}_{0})-\boldsymbol{\theta}_{0}\right\|_{2}\leq\frac{r}{2}+\frac{r}{2}=r,

that is, L(\boldsymbol{\theta})\in\bar{B}(\boldsymbol{\theta}_{0},r) for all \boldsymbol{\theta}\in\bar{B}(\boldsymbol{\theta}_{0},r), which proves that L(\bar{B}(\boldsymbol{\theta}_{0},r))\subset\bar{B}(\boldsymbol{\theta}_{0},r). Thus, since L:\bar{B}(\boldsymbol{\theta}_{0},r)\to\bar{B}(\boldsymbol{\theta}_{0},r) is continuous, from the Brouwer fixed point theorem we conclude that L has at least one fixed point \boldsymbol{\theta}^{*} in \bar{B}(\boldsymbol{\theta}_{0},r), and thus

L(\boldsymbol{\theta}^{*})=\boldsymbol{\theta}^{*}\Rightarrow J^{-1}F(\boldsymbol{\theta}^{*})=0\Rightarrow F(\boldsymbol{\theta}^{*})=0,

which concludes the proof. ∎
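To build intuition for how the lemma is used, the sketch below runs the map L(\boldsymbol{\theta})=\boldsymbol{\theta}-J^{-1}F(\boldsymbol{\theta}) on a toy two-dimensional system chosen so that the hypotheses of Lemma .1 hold with J equal to the identity (the Jacobian of F stays within distance 0.1 of the identity everywhere). The system is an illustrative assumption, not taken from the paper.

```python
import numpy as np

# Toy system F(theta) = 0 whose Jacobian is everywhere within 0.1 of the identity,
# so the hypotheses of Lemma .1 hold with J = I and lambda = 1/2.
def F(theta):
    x, y = theta
    return np.array([x + 0.1 * np.sin(y) - 1.0, y + 0.1 * np.sin(x) - 2.0])

theta0 = np.array([1.0, 2.0])
J = np.eye(2)                              # fixed matrix approximating the Jacobian
Jinv = np.linalg.inv(J)
lam = 0.5 / np.linalg.norm(Jinv, 2)        # lambda = (1/2) ||J^{-1}||_2^{-1}
r = np.linalg.norm(F(theta0), 2) / lam     # radius of the ball in Lemma .1

# Iterating L(theta) = theta - J^{-1} F(theta) locates a fixed point of L,
# i.e. a zero of F; the lemma guarantees one exists inside the closed ball.
theta = theta0.copy()
for _ in range(100):
    theta = theta - Jinv @ F(theta)

print("r =", r)
print("zero found:", theta)
print("inside the ball:", np.linalg.norm(theta - theta0, 2) <= r)
print("F at the zero:", F(theta))
```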

Additionally we shall need the following lemma, regarding elementary properties of the spectral norm:

Lemma .2.

Given A\in M_{n}(\mathbb{R}), the following items hold:

  • i)

    \left\|A\right\|_{2}\leq\sum_{i=1}^{n}\left\|b_{i}\right\|_{2}, where b_{i}=(a_{i1},\cdots,a_{in}) for 1\leq i\leq n.

  • ii)

    If B\in M_{n}(\mathbb{R}) is invertible and \left\|A-B\right\|_{2}<\left\|B^{-1}\right\|_{2}^{-1}, then A is invertible as well.

Proof.

To prove item i), applying the Cauchy–Schwarz inequality we have, for all x\in\mathbb{R}^{n}, that

\left\|Ax^{T}\right\|_{2}=\sqrt{\sum_{i=1}^{n}\langle b_{i},x\rangle^{2}}\leq\sqrt{\sum_{i=1}^{n}\left\|b_{i}\right\|_{2}^{2}\left\|x\right\|_{2}^{2}}=\left(\sqrt{\sum_{i=1}^{n}\left\|b_{i}\right\|_{2}^{2}}\right)\left\|x\right\|_{2},

which proves that \left\|A\right\|_{2}\leq\sqrt{\sum_{i=1}^{n}\left\|b_{i}\right\|_{2}^{2}} by the definition of the spectral norm, and thus the result follows directly from the inequality \sqrt{\sum_{i=1}^{n}\left\|b_{i}\right\|_{2}^{2}}\leq\sum_{i=1}^{n}\left\|b_{i}\right\|_{2}.

To prove item ii), note that, under the hypothesis, letting C=B^{-1}A it follows that

\left\|C-I\right\|_{2}=\left\|B^{-1}(A-B)\right\|_{2}\leq\left\|B^{-1}\right\|_{2}\left\|A-B\right\|_{2}<1,

which implies that C is invertible, and thus A=BC must be invertible as well, since it is a product of invertible square matrices. ∎
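Both items of Lemma .2 are easy to sanity-check numerically; the snippet below does so for arbitrary matrices (the matrices and random seed are illustrative only).

```python
import numpy as np

rng = np.random.default_rng(0)

# Item i): the spectral norm is bounded by the sum of the Euclidean norms of the rows.
A = rng.normal(size=(4, 4))
row_norm_sum = np.linalg.norm(A, axis=1).sum()
print(np.linalg.norm(A, 2) <= row_norm_sum)          # expected: True

# Item ii): if ||A - B||_2 < ||B^{-1}||_2^{-1}, then A is invertible.
B = np.diag([2.0, 3.0, 4.0, 5.0])
margin = 1.0 / np.linalg.norm(np.linalg.inv(B), 2)   # equals 2.0 for this B
E = rng.normal(size=(4, 4))
E *= 0.5 * margin / np.linalg.norm(E, 2)             # rescale so that ||E||_2 < margin
A_perturbed = B + E
print(np.linalg.norm(A_perturbed - B, 2) < margin)   # True by construction
print(abs(np.linalg.det(A_perturbed)) > 0.0)         # A_perturbed is invertible
```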

Using the above results we are now ready to prove Theorem 2.1.

Proof.

Existence of solutions:

Letting h_{j} be as in (3), that is,

h_{j}(x\,;\,\boldsymbol{\theta})=\frac{\partial}{\partial\beta_{j}}\log\,g\left(x\,;\,\boldsymbol{\theta},\boldsymbol{\alpha}_{0}\right)-\operatorname{E}_{\boldsymbol{\theta}}\left[\frac{\partial}{\partial\beta_{j}}\log\,g\left(x\,;\,\boldsymbol{\theta},\boldsymbol{\alpha}_{0}\right)\right]

for all x\in\mathcal{X}, where (\beta_{1},\cdots,\beta_{s})=(\theta_{1},\cdots,\theta_{s-r},\alpha_{1},\cdots,\alpha_{r}), and letting F_{n}:\Theta\times\mathcal{X}^{n}\to\mathbb{R}^{s} be defined by F_{n}=\left(F_{n1},\cdots,F_{ns}\right), where

F_{nj}(\boldsymbol{\theta},\boldsymbol{x})=-\frac{1}{n}\sum_{i=1}^{n}h_{j}\left(x_{i}\,;\,\boldsymbol{\theta}\right),

for all \boldsymbol{\theta}\in\Theta, \boldsymbol{x}=(x_{1},\cdots,x_{n})\in\mathcal{X}^{n}, and 1\leq j\leq s. Note, due to the strong law of large numbers and from \operatorname{E}_{\boldsymbol{\theta}_{0}}\left[h_{j}(X_{i}(w)\,;\,\boldsymbol{\theta}_{0})\right]=0 for all i\in\mathbb{N}, that

F_{m}(\boldsymbol{\theta}_{0},\boldsymbol{X}(w))\overset{a.s.}{\to}0

and thus, from the alternative definition of strong convergence it follows that

\lim_{n\to\infty}\operatorname{Pr}\left\{\cup_{m=n}^{\infty}\left\{\left\|F_{m}(\boldsymbol{\theta}_{0},\boldsymbol{X}(w))\right\|_{2}>\epsilon\right\}\right\}=0 (32)

for all \epsilon>0. Now, let J:\Theta\to M_{s}(\mathbb{R}) be defined by J(\boldsymbol{\theta})=\left(J_{i,j}(\boldsymbol{\theta})\right)\in M_{s}(\mathbb{R}), where

J_{i,j}(\boldsymbol{\theta})=\operatorname{E}_{\boldsymbol{\theta}_{0}}\left[-\frac{\partial}{\partial\theta_{j}}h_{i}(X_{1}\,;\,\boldsymbol{\theta})\right].

Condition (C) states that

\left|\frac{\partial}{\partial\theta_{i}}h_{j}(X_{1}\,;\,\boldsymbol{\theta})\right|\leq M_{ij}(X_{1})\mbox{ and }E_{\boldsymbol{\theta}_{0}}\left[M_{ij}(X_{1})\right]<\infty, (33)

for all i and j. In particular, from (33) and the dominated convergence theorem, it follows that J(\boldsymbol{\theta}) is continuous at \boldsymbol{\theta}_{0}. Moreover, denoting J_{i}(\boldsymbol{\theta})=\left(J_{i,1}(\boldsymbol{\theta}),\cdots,J_{i,s}(\boldsymbol{\theta})\right)\in\mathbb{R}^{s} for all i, from (33) and the uniform strong law of large numbers it follows that

\sup_{\boldsymbol{\theta}\in\overline{\Theta}_{0}}\left\|\frac{\partial}{\partial\theta_{i}}F_{n}(\boldsymbol{\theta},\boldsymbol{X}(w))-J_{i}(\boldsymbol{\theta})\right\|_{2}\overset{a.s.}{\to}0

for all i, and thus, once again due to the alternative definition of strong convergence, we have

\lim_{n\to\infty}\operatorname{Pr}\left\{\cup_{m=n}^{\infty}\left\{\sup_{\boldsymbol{\theta}\in\overline{\Theta}_{0}}\left\|\frac{\partial}{\partial\theta_{i}}F_{m}(\boldsymbol{\theta},\boldsymbol{X}(w))-J_{i}(\boldsymbol{\theta})\right\|_{2}>\epsilon\right\}\right\}=0 (34)

for all \epsilon>0 and i. Now, given m>0 such that \bar{B}\left(\boldsymbol{\theta}_{0},\frac{1}{m}\right)\subset\bar{\Theta}_{0} and \frac{1}{m}<\frac{\lambda}{2}, where \lambda=\frac{1}{2}\left\|J(\boldsymbol{\theta}_{0})^{-1}\right\|_{2}^{-1}, combining (32) and (34) it follows that there exist N_{m}>0 and a set \Omega_{m} of probability at least 1-\frac{1}{m} such that

\left\|F_{n}(\boldsymbol{\theta}_{0},\boldsymbol{X}(w))\right\|_{2}<\frac{1}{m}\mbox{ and }\sup_{\boldsymbol{\theta}\in\overline{\Theta}_{0}}\left\|\frac{\partial}{\partial\theta_{i}}F_{n}(\boldsymbol{\theta},\boldsymbol{X}(w))-J_{i}(\boldsymbol{\theta})\right\|_{2}<\frac{1}{sm} (35)

for all n\geq N_{m}, i and w\in\Omega_{m}. Combining the second inequality of (35) with item i) of Lemma .2, it follows that

\sup_{\boldsymbol{\theta}\in\overline{\Theta}_{0}}\left\|\frac{\partial}{\partial\boldsymbol{\theta}}F_{n}(\boldsymbol{\theta},\boldsymbol{X}(w))-J(\boldsymbol{\theta})\right\|_{2}<\frac{1}{m}.

Now, since J is continuous at \boldsymbol{\theta}_{0}, there exists an open set \Theta_{m}\subset\bar{B}(\boldsymbol{\theta}_{0},\frac{1}{m})\subset\bar{\Theta}_{0} containing \boldsymbol{\theta}_{0} such that

\left\|J(\boldsymbol{\theta})-J(\boldsymbol{\theta}_{0})\right\|_{2}<\frac{1}{m}\mbox{ for all }\boldsymbol{\theta}\in\Theta_{m}. (36)

Combining the above inequalities with the triangle inequality we conclude that

\left\|F_{n}(\boldsymbol{\theta}_{0},\boldsymbol{X}(w))\right\|_{2}<\frac{1}{m}\mbox{ and }\sup_{\boldsymbol{\theta}\in\Theta_{m}}\left\|\frac{\partial}{\partial\boldsymbol{\theta}}F_{n}(\boldsymbol{\theta},\boldsymbol{X}(w))-J(\boldsymbol{\theta}_{0})\right\|_{2}<\frac{2}{m}<\lambda (37)

for all n\geq N_{m} and w\in\Omega_{m}. Thus, from Lemma .1 it follows that for each w\in\Omega_{m} there exists \boldsymbol{\bar{\theta}}(w)\in\Theta_{m}\subset\bar{B}\left(\boldsymbol{\theta}_{0},\frac{1}{m}\right) such that

F_{n}(\boldsymbol{\bar{\theta}}(w),\boldsymbol{X}(w))=0\mbox{ for all }w\in\Omega_{m}\mbox{ and }n\geq N_{m}, (38)

which, in particular, proves that the generalized maximum likelihood equations have at least one solution with probability converging to one as n\to\infty.

Construction of a measurable estimator:

We shall now construct the estimator \boldsymbol{\hat{\theta}}_{n}. Note that if (37) and (38) are valid for some N_{m}>0, then they remain valid for any N^{*}_{m}\geq N_{m} as well. Thus, without loss of generality we can suppose N_{1}<N_{2}<N_{3}<\cdots. Now, given n<N_{1} we define

\boldsymbol{\hat{\theta}}_{n}(x)=0\mbox{ for all }x\in\mathcal{X}^{n}.

On the other hand, to define \boldsymbol{\hat{\theta}}_{n}(x) for n\geq N_{1}, let m_{n} be the unique integer for which N_{m_{n}}\leq n<N_{m_{n}+1} is satisfied. Since N_{1}<N_{2}<N_{3}<\cdots, it follows that m_{n} is well defined and m_{n}\to\infty as n\to\infty. Now, note that F_{n}(\boldsymbol{\theta},\boldsymbol{x}) is continuous in \boldsymbol{\theta} for all \boldsymbol{x}\in\mathcal{X}^{n} and measurable in \boldsymbol{x} for all \boldsymbol{\theta}\in\Theta. Thus, F_{n} is a Carathéodory function for all n\geq N_{1}. Therefore, letting \phi:\mathcal{X}^{n}\to\bar{B}\left(\boldsymbol{\theta}_{0},\frac{1}{m_{n}}\right) be the multivalued map defined by

\boldsymbol{\theta}\in\phi(\boldsymbol{x})\mbox{ if and only if }F_{n}(\boldsymbol{\theta},\boldsymbol{x})=0\mbox{ and }\left\|\boldsymbol{\theta}-\boldsymbol{\theta}_{0}\right\|_{2}\leq\frac{1}{m_{n}}, (39)

since F_{n} is Carathéodory and \bar{B}\left(\boldsymbol{\theta}_{0},\frac{1}{m_{n}}\right) is compact, it follows from the theory of measurable maps that \phi is a measurable map (see [6], Corollary 18.8, p. 596). Now construct a second multivalued map \phi^{*}:\mathcal{X}^{n}\to\bar{B}\left(\boldsymbol{\theta}_{0},\frac{1}{m_{n}}\right) defined by

\phi^{*}(x)=\phi(x)\mbox{ if }\phi(x)\neq\emptyset\mbox{ and }\phi^{*}(x)=\{\boldsymbol{\theta}_{0}\}\mbox{ otherwise}.

From the measurability of \phi it is clear that \phi^{*} is measurable as well, and since \phi^{*}(x) is always non-empty, we can apply the measurable selection theorem (see [6], Theorem 18.7, p. 603) to obtain a measurable function \boldsymbol{\hat{\theta}}_{n}(x) satisfying

\boldsymbol{\hat{\theta}}_{n}(x)\in\phi^{*}(x)\mbox{ for all }x\in\mathcal{X}^{n},

which concludes the construction of our estimator \boldsymbol{\hat{\theta}}_{n}(x).

By construction, \boldsymbol{\hat{\theta}}_{n}(x) satisfies F_{n}(\boldsymbol{\hat{\theta}}_{n}(\boldsymbol{x}),\boldsymbol{x})=0 at every point \boldsymbol{x}\in\mathcal{X}^{n} at which the equation F_{n}(\boldsymbol{\theta},\boldsymbol{x})=0 has at least one solution \boldsymbol{\theta} in \bar{B}\left(\boldsymbol{\theta}_{0},\frac{1}{m_{n}}\right). Thus, \boldsymbol{\hat{\theta}}_{n}(\boldsymbol{X}) satisfies F_{n}(\boldsymbol{\hat{\theta}}_{n}(\boldsymbol{X}(w)),\boldsymbol{X}(w))=0 at every point w\in\Omega at which F_{n}(\boldsymbol{\theta},\boldsymbol{X}(w))=0 has at least one solution \boldsymbol{\theta} in \bar{B}\left(\boldsymbol{\theta}_{0},\frac{1}{m_{n}}\right), and since n\geq N_{m_{n}}, from what we proved earlier it follows that this happens with probability greater than or equal to 1-\frac{1}{m_{n}}.

Thus, since 1-\frac{1}{m_{n}}\to 1 as n\to\infty, it follows that, with probability converging to one as n\to\infty, \hat{\boldsymbol{\theta}}_{n}(\boldsymbol{X}) satisfies F_{n}\left(\boldsymbol{\hat{\theta}}_{n}(\boldsymbol{X}),\boldsymbol{X}(w)\right)=0 for all n\geq N_{m_{n}}, which proves item I).

Now, by construction \boldsymbol{\hat{\theta}}_{n}(\boldsymbol{x})\in\bar{B}\left(\boldsymbol{\theta}_{0},\frac{1}{m_{n}}\right) for all n\geq N_{1} and \boldsymbol{x}\in\mathcal{X}^{n}, and since \frac{1}{m_{n}}\to 0 as n\to\infty, it follows that \boldsymbol{\hat{\theta}}_{n}(\boldsymbol{X}(w))\to\boldsymbol{\theta}_{0} as n\to\infty for all w\in\Omega, which, in particular, proves item II).

Asymptotic normality:

From the mean value theorem, for each fixed 1\leq j\leq s and w\in\Omega there must exist \boldsymbol{y}_{jn}(w)\in\mathbb{R}^{s} contained in the segment connecting \boldsymbol{\theta}_{0} to \boldsymbol{\hat{\theta}}_{n}(\boldsymbol{X}(w)) such that

F_{nj}(\boldsymbol{\hat{\theta}}_{n}(\boldsymbol{X}(w)),\boldsymbol{X}(w))=F_{nj}(\boldsymbol{\theta}_{0},\boldsymbol{X}(w))+\sum_{i=1}^{s}\frac{\partial}{\partial\theta_{i}}F_{nj}(\boldsymbol{y}_{jn}(w),\boldsymbol{X}(w))(\hat{\theta}_{ni}(\boldsymbol{X}(w))-\theta_{0i}). (40)

On the other hand, letting

H_{n}(\boldsymbol{y},w)=F_{nj}(\boldsymbol{\hat{\theta}}_{n}(\boldsymbol{X}(w)),\boldsymbol{X}(w))-F_{nj}(\boldsymbol{\theta}_{0},\boldsymbol{X}(w))-\sum_{i=1}^{s}\frac{\partial}{\partial\theta_{i}}F_{nj}(\boldsymbol{y},\boldsymbol{X}(w))(\hat{\theta}_{ni}(\boldsymbol{X}(w))-\theta_{0i})

for all \boldsymbol{y}\in\Theta_{0}, it follows from hypothesis (B) that H_{n}(\boldsymbol{y},w) is continuous in \boldsymbol{y} and measurable in w, and thus is a Carathéodory function, from which it follows, once again due to the theory of measurable maps and the measurable selection theorem, that such \boldsymbol{y}_{jn}(w) can be chosen to be measurable in w.

Now, letting A_{n}(w)\in M_{s}(\mathbb{R}) be defined by A_{n}(w)=(a_{ij,n}(w)), where a_{ij,n}(w)=\frac{\partial}{\partial\theta_{i}}F_{nj}(\boldsymbol{y}_{jn}(w),\boldsymbol{X}(w)) for all 1\leq i\leq s, 1\leq j\leq s and w\in\Omega, since by construction \hat{\boldsymbol{\theta}}_{n}(\boldsymbol{X})\in\Theta_{m_{n}} for all n\geq N_{1} and since \boldsymbol{y}_{jn}(w) is contained in the segment connecting \boldsymbol{\theta}_{0} to \boldsymbol{\hat{\theta}}_{n}(\boldsymbol{X}), it follows that \boldsymbol{y}_{jn}(w)\in\Theta_{m_{n}} for all n\geq N_{1} as well. Thus, once again combining the second inequality from (37) with item i) of Lemma .2, it follows that

\left\|A_{n}(w)-J(\boldsymbol{\theta}_{0})\right\|_{2}<\frac{2}{m_{n}}<\lambda<\left\|J(\boldsymbol{\theta}_{0})^{-1}\right\|_{2}^{-1}\mbox{ for all }n\geq N_{1}. (41)

Thus, in particular, from Lemma .2 it follows that A_{n}(w) is invertible for all n\geq N_{1} and w\in\Omega.

On the other hand, since by construction F_{n}(\boldsymbol{\hat{\theta}}_{n}(\boldsymbol{X}(w)),\boldsymbol{X}(w))=0 for all w\in\Omega_{n}, it follows from (40) that

A_{n}(w)^{T}(\hat{\boldsymbol{\theta}}_{n}(\boldsymbol{X}(w))-\boldsymbol{\theta}_{0})^{T}=-F_{n}(\boldsymbol{\theta}_{0},\boldsymbol{X}(w))\mbox{ for all }w\in\Omega_{n}. (42)

Now, let \boldsymbol{\theta}^{*}_{n}(w) be defined as

\boldsymbol{\theta}_{n}^{*}(w)^{T}=\boldsymbol{\theta}_{0}^{T}-(A_{n}(w)^{-1})^{T}F_{n}(\boldsymbol{\theta}_{0},\boldsymbol{X}(w))\mbox{ for all }w\in\Omega\mbox{ and }n\geq N_{1}. (43)

From (42) it follows that \hat{\boldsymbol{\theta}}_{n}(\boldsymbol{X}(w))=\boldsymbol{\theta}_{n}^{*}(w) for all w\in\Omega_{n}, and thus \hat{\boldsymbol{\theta}}_{n}(\boldsymbol{X})-\boldsymbol{\theta}_{n}^{*}\overset{a.s.}{\to}0.

Since \frac{2}{m_{n}}\to 0 as n\to\infty, it follows from (41) that A_{n}(w)\overset{a.s.}{\to}J(\boldsymbol{\theta}_{0}), and thus, from the invertibility of the matrices involved, it follows for n\geq N_{1} that

(A_{n}(w))^{-1}\overset{a.s.}{\to}J(\boldsymbol{\theta}_{0})^{-1} (44)

as well. Additionally, from the central limit theorem we know that

\sqrt{n}F_{n}(\boldsymbol{\theta}_{0},\boldsymbol{X})\overset{D}{\to}N_{s}(0,K(\boldsymbol{\theta}_{0})), (45)

which, combined with (44), (43) and Slutsky's theorem, implies that

\sqrt{n}(\boldsymbol{\theta}_{n}^{*}(w)-\boldsymbol{\theta}_{0})^{T}\overset{D}{\to}(J(\boldsymbol{\theta}_{0})^{-1})^{T}N_{s}(0,K(\boldsymbol{\theta}_{0}))=N_{s}\left(0,(J(\boldsymbol{\theta}_{0})^{-1})^{T}K(\boldsymbol{\theta}_{0})J(\boldsymbol{\theta}_{0})^{-1}\right),

which concludes the proof, since we have already proved that \hat{\boldsymbol{\theta}}_{n}(\boldsymbol{X})-\boldsymbol{\theta}_{n}^{*}\overset{a.s.}{\to}0. ∎
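In applications, the limiting covariance (J(\boldsymbol{\theta}_{0})^{-1})^{T}K(\boldsymbol{\theta}_{0})J(\boldsymbol{\theta}_{0})^{-1} would typically be estimated by plugging in consistent estimates of J and K. A hedged sketch of this plug-in computation is given below; \hat{J} and \hat{K} are supplied as inputs, since how they are obtained depends on the model at hand, and the matrices shown are placeholders only.

```python
import numpy as np

def sandwich_covariance(J_hat, K_hat, n):
    """Plug-in version of the limiting covariance derived above:
    sqrt(n)(theta_hat_n - theta_0) -> N(0, (J^{-1})^T K J^{-1}).
    J_hat and K_hat are assumed to be consistent estimates of J(theta_0) and
    K(theta_0); dividing by n approximates the covariance of theta_hat_n itself."""
    J_inv = np.linalg.inv(J_hat)
    return (J_inv.T @ K_hat @ J_inv) / n

# Placeholder matrices for illustration only.
J_hat = np.array([[2.0, 0.3], [0.1, 1.5]])
K_hat = np.array([[1.0, 0.2], [0.2, 0.8]])
print(sandwich_covariance(J_hat, K_hat, n=500))
```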

As a corollary of Theorem 2.1, we now prove Theorem 2.2.

Proof.

Item (A) of Theorem 2.1 is the same as that of Theorem 2.2, and thus this item is satisfied.

Now, from hypothesis we see that

h_{j}(x\,;\,\boldsymbol{\theta})=\sum_{k=1}^{s}\frac{\partial}{\partial\theta_{j}}\eta_{k}(\boldsymbol{\theta})\left(T_{k}(x,\boldsymbol{\alpha}_{0})-\operatorname{E}_{\boldsymbol{\theta}}\left[T_{k}(X_{1},\boldsymbol{\alpha}_{0})\right]\right)+\frac{\partial}{\partial\theta_{j}}\log L(\boldsymbol{\theta},\boldsymbol{\alpha}_{0}),\mbox{ for }1\leq j\leq s-r,
h_{s-r+j}(x\,;\,\boldsymbol{\theta})=\sum_{k=1}^{s}\eta_{k}(\boldsymbol{\theta})\left(\frac{\partial}{\partial\alpha_{j}}T_{k}(x,\boldsymbol{\alpha}_{0})-\operatorname{E}_{\boldsymbol{\theta}}\left[\frac{\partial}{\partial\alpha_{j}}T_{k}(X_{1},\boldsymbol{\alpha}_{0})\right]\right)+\frac{\partial}{\partial\alpha_{j}}\log L(\boldsymbol{\theta},\boldsymbol{\alpha}_{0}),\mbox{ for }1\leq j\leq r.

From these relations and the hypothesis, it is easy to see that h_{j}(x\,;\,\boldsymbol{\theta},\boldsymbol{\alpha}_{0}) is measurable in x and \frac{\partial}{\partial\theta_{i}}h_{j}(x\,;\,\boldsymbol{\theta},\boldsymbol{\alpha}_{0}) is well defined and continuous in \boldsymbol{\theta}, for all j and \boldsymbol{\theta}\in\Theta, that is, item (B) of Theorem 2.1 is also satisfied.

Finally, letting

M_{i,j}(x)=\sum_{k=1}^{s}\sup_{\boldsymbol{\theta}\in\overline{\Theta}_{0}}\left|\frac{\partial^{2}}{\partial\theta_{i}\partial\theta_{j}}\eta_{k}(\boldsymbol{\theta})\right|\left(\left|T_{k}(x,\boldsymbol{\alpha}_{0})\right|+\left|\operatorname{E}_{\boldsymbol{\theta}}\left[T_{k}(X_{1},\boldsymbol{\alpha}_{0})\right]\right|\right)+\sup_{\boldsymbol{\theta}\in\overline{\Theta}_{0}}\left|\frac{\partial^{2}}{\partial\theta_{i}\partial\theta_{j}}\log L(\boldsymbol{\theta},\boldsymbol{\alpha}_{0})\right|

for 1\leq i\leq s and 1\leq j\leq s-r, and letting

M_{i,s-r+j}(x)=\sum_{k=1}^{s}\sup_{\boldsymbol{\theta}\in\overline{\Theta}_{0}}\left|\frac{\partial}{\partial\theta_{i}}\eta_{k}(\boldsymbol{\theta})\right|\left(\left|\frac{\partial}{\partial\alpha_{j}}T_{k}(x,\boldsymbol{\alpha}_{0})\right|+\left|\operatorname{E}_{\boldsymbol{\theta}}\left[\frac{\partial}{\partial\alpha_{j}}T_{k}(X_{1},\boldsymbol{\alpha}_{0})\right]\right|\right)+\sup_{\boldsymbol{\theta}\in\overline{\Theta}_{0}}\left|\frac{\partial^{2}}{\partial\theta_{i}\partial\alpha_{j}}\log L(\boldsymbol{\theta},\boldsymbol{\alpha}_{0})\right|

for 1\leq i\leq s and 1\leq j\leq r, one can check directly that

\left|\frac{\partial}{\partial\theta_{i}}h_{j}(x\,;\,\boldsymbol{\theta},\boldsymbol{\alpha}_{0})\right|\leq M_{ij}(x)\mbox{ for }1\leq i\leq s\mbox{ and }1\leq j\leq s.

Additionally, since \operatorname{E}_{\boldsymbol{\theta}}\left[T_{i}(X_{1},\boldsymbol{\alpha}_{0})\right] and \operatorname{E}_{\boldsymbol{\theta}}\left[\frac{\partial}{\partial\alpha_{j}}T_{i}(X_{1},\boldsymbol{\alpha}_{0})\right] are finite for all \boldsymbol{\theta}\in\Theta, it follows that

E_{\boldsymbol{\theta}_{0}}\left[M_{ij}(X_{1})\right]<\infty\mbox{ for all }1\leq i\leq s\mbox{ and }1\leq j\leq s,

which proves that item (C) of Theorem 2.1 is also satisfied. Thus the conclusions of Theorem 2.1 apply, which concludes the proof. ∎

References

  • Aldrich et al. [1997] Aldrich, J. et al. (1997). R. A. Fisher and the making of maximum likelihood 1912–1922. Statistical Science 12(3), 162–176.
  • Andersen [1970] Andersen, E. B. (1970). Asymptotic properties of conditional maximum-likelihood estimators. Journal of the Royal Statistical Society: Series B (Methodological) 32(2), 283–301.
  • Anderson and Blair [1982] Anderson, J. and V. Blair (1982). Penalized maximum likelihood estimation in logistic regression and discrimination. Biometrika 69(1), 123–136.
  • Aryal and Nadarajah [2004] Aryal, G. and S. Nadarajah (2004). Information matrix for beta distributions. Serdica Mathematical Journal 30(4), 513–526.
  • Bierens [2004] Bierens, H. J. (2004). Introduction to the mathematical and statistical foundations of econometrics. Cambridge University Press.
  • Aliprantis and Border [2006] Aliprantis, C. D. and K. C. Border (2006). Infinite Dimensional Analysis: A Hitchhiker’s Guide (3rd ed.). Springer.
  • Cheng and Amin [1983] Cheng, R. and N. Amin (1983). Estimating parameters in continuous univariate distributions with a shifted origin. Journal of the Royal Statistical Society. Series B (Methodological), 394–403.
  • Cox [1975] Cox, D. R. (1975). Partial likelihood. Biometrika 62(2), 269–276.
  • Dempster et al. [1977] Dempster, A. P., N. M. Laird, and D. B. Rubin (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B (Methodological) 39(1), 1–22.
  • Firth [1993] Firth, D. (1993). Bias reduction of maximum likelihood estimates. Biometrika 80(1), 27–38.
  • Gourieroux et al. [1984] Gourieroux, C., A. Monfort, and A. Trognon (1984). Pseudo maximum likelihood methods: Theory. Econometrica: journal of the Econometric Society, 681–700.
  • Hager and Bain [1970] Hager, H. W. and L. J. Bain (1970). Inferential procedures for the generalized gamma distribution. Journal of the American Statistical Association 65(332), 1601–1609.
  • Hosking [1990] Hosking, J. R. (1990). L-moments: Analysis and estimation of distributions using linear combinations of order statistics. Journal of the Royal Statistical Society: Series B (Methodological) 52(1), 105–124.
  • Kao [1958] Kao, J. H. (1958). Computer methods for estimating Weibull parameters in reliability studies. IRE Transactions on Reliability and Quality Control, 15–22.
  • Kao [1959] Kao, J. H. (1959). A graphical estimation of mixed Weibull parameters in life-testing of electron tubes. Technometrics 1(4), 389–407.
  • Lehmann and Casella [2006] Lehmann, E. L. and G. Casella (2006). Theory of point estimation. Springer Science & Business Media.
  • Louzada et al. [2019] Louzada, F., P. L. Ramos, and E. Ramos (2019). A note on bias of closed-form estimators for the gamma distribution derived from likelihood equations. The American Statistician 73(2), 195–199.
  • Murphy and Van der Vaart [2000] Murphy, S. A. and A. W. Van der Vaart (2000). On profile likelihood. Journal of the American Statistical Association 95(450), 449–465.
  • Ramos et al. [2020] Ramos, P. L., F. Louzada, and E. Ramos (2020). Bias reduction in the closed-form maximum likelihood estimator for the Nakagami-m fading parameter. IEEE Wireless Communications Letters 9(10), 1692–1695.
  • Redner et al. [1981] Redner, R. et al. (1981). Note on the consistency of the maximum likelihood estimate for nonidentifiable distributions. Annals of Statistics 9(1), 225–228.
  • Zhao et al. [2022] Zhao, J., Y.-H. Jang, and H.-M. Kim (2022). Closed-form and bias-corrected estimators for the bivariate gamma distribution. Journal of Multivariate Analysis 191, 105009.