
matrixdist: An R Package for Statistical Analysis of Matrix Distributions

Martin Bladt Department of Mathematical Sciences, University of Copenhagen, DK-2100 Copenhagen, Denmark [email protected] Alaric Mueller Faculty of Business and Economics, University of Lausanne, 1015 Lausanne, Switzerland [email protected]  and  Jorge Yslas Institute for Financial and Actuarial Mathematics, University of Liverpool, L69 7ZL Liverpool, UK [email protected]
Abstract.

The matrixdist R package provides a comprehensive suite of tools for the statistical analysis of matrix distributions, including phase-type, inhomogeneous phase-type, discrete phase-type, and related multivariate distributions. This paper introduces the package and its key features, including the estimation of these distributions and their extensions through expectation-maximisation algorithms, as well as the implementation of regression through the proportional intensities and mixture-of-experts models. Additionally, the paper provides an overview of the theoretical background, discusses the algorithms and methods implemented in the package, and offers practical examples to illustrate the application of matrixdist in real-world scenarios. The matrixdist R package aims to provide researchers and practitioners with a wide set of tools for analysing and modelling complex data using matrix distributions.

1. Introduction

In recent years, matrix distributions such as phase-type (PH), inhomogeneous phase-type (IPH), discrete phase-type (DPH), and related multivariate distributions have gained prominence in various fields due to their ability to seamlessly model complex or heterogeneous data. Matrix distributions can be employed to capture diverse behaviours and dependencies in real-world scenarios, making them particularly useful in areas such as survival analysis, actuarial science, queuing theory, and finance.

PH distributions (cf. Bladt and Nielsen, 2017) are a natural extension of the exponential distribution and are defined as the time to reach an absorbing state in an otherwise transient time-homogeneous pure-jump Markov process. These distributions possess closed-form formulas in terms of matrices for various functionals such as the density, cumulative distribution function, Laplace transform, and moments, and they can approximate any distribution on the positive real line arbitrarily well. Since only the time to absorption is observed and not the actual trajectory of the stochastic process, models based on PH distributions are hidden Markov models. Hence, they are typically estimated using the expectation-maximisation (EM) algorithm (cf. Asmussen et al., 1996). However, their tail behaviour remains exponential, which can be a limiting factor in applications that require a more accurate tail description. This shortcoming has led to the exploration of classes extending PH distributions, offering different tail behaviours while retaining PH-like properties. One such class comprises IPH distributions (Albrecher and Bladt, 2019), which can be defined equivalently as the law of a transformed PH distributed random variable or as the absorption time of a time-inhomogeneous pure-jump Markov process. The tail behaviour of the resulting IPH distribution can be as heavy or light as desired, depending on the chosen transformation. Moreover, for parametric transformations, an estimation procedure is outlined in Albrecher et al. (2022b). Several extensions of PH distributions to the multivariate setting have been introduced in the literature. The most general construction, the MPH* class, can be found in Kulkarni (1989); its definition considers marginals that depend on a single underlying Markov jump process but with different rewards in each state. Regarding multivariate extensions of the IPH class, Albrecher et al. (2022b) presented some alternatives that allow for estimation. Other extensions were considered in Albrecher et al. (2022b) and Bladt (2023).

For modelling count data, the DPH class (cf. Bladt and Nielsen, 2017) provides a tractable and flexible set of distributions, which also allows for estimation via an EM algorithm. Formally, DPH distributions are defined as the time to reach absorption in a discrete Markov chain with one absorbing state and all remaining states transient. However, they can also be viewed as an extension of the geometric distribution. The more general class of multivariate DPH distributions was introduced in Campillo-Navarro (2018), employing a construction principle similar to that of the MPH* class via rewards. More recently, Bladt and Yslas (2023c) presented other, more tractable classes of DPH distributions that offer similar versatility in modelling count data.

Matrix distributions have proven useful in various applications, motivating the development of regression models tailored to them. Two notable regression models for matrix distributions are the proportional intensities (PI) model and the mixture-of-experts (MoE) specification. The PI model, proposed by Albrecher et al. (2022a), extends the classical Cox proportional hazards model to the IPH setting. This model enables the incorporation of covariate information directly into the intensity matrix of the underlying Markov process. On the other hand, the MoE specification, originally introduced in Bladt and Yslas (2023b), incorporates the covariate information into the vector of initial probabilities and can easily be adapted to the multivariate case (cf. Albrecher et al., 2023).

This paper presents the matrixdist package (Bladt and Yslas, 2023a), designed for working with matrix distributions and implemented in R (R Core Team, 2023). The matrixdist package leverages the S4 class system in R to ensure a well-structured and organised implementation of the various distribution classes and their associated methods. By utilising S4 classes, the package offers a user-friendly interface for interacting with PH, IPH, DPH, and related multivariate and regression distributions. Users can create distribution objects with their respective parameters, and the package automatically manages the underlying data structures and computations. This design promotes code reusability, maintainability, and extensibility, making it easier to add new distribution classes and methods in the future while ensuring a consistent and coherent user experience. The object-oriented design is particularly convenient when dealing with large numbers of parameters during statistical estimation. Furthermore, most computationally intensive functions are implemented in C++ to enhance performance; these functions are made accessible in R via the Rcpp package, ensuring both speed and ease of use.

It is worth mentioning that matrixdist covers a substantially broader range of matrix-distribution functionality than existing packages. To provide context, we briefly summarise some alternative software tools for matrix distributions:

  • The R package actuar (Dutang et al., 2008) contains implementations for the density, cumulative distribution function, moments, and moment-generating function of univariate PH distributions.

  • The R package mapfit (Okamura and Dohi, 2015) offers some estimation methods for PH distributions and for Markovian Arrival Processes (MAP). The latter can loosely be seen as dependent concatenations of PH variables (arrivals).

  • The recent PhaseTypeR package (Rivas-González et al., 2023) includes implementations of reward transformations and functionals for some of the multivariate extensions of homogeneous PH and DPH distributions. However, no statistical estimation is considered.

The matrixdist package stands out by providing a comprehensive and efficient suite of tools covering univariate, multivariate, continuous, discrete, and regression matrix models, together with their statistical estimation. Right-censoring is also supported. The package's versatility, completeness, and speed make it well suited for researchers and practitioners working with matrix distributions.

The remainder of the paper is organised as follows. Section 2 provides the mathematical formulation of univariate DPH, PH and IPH distributions, their basic properties, and their regression extensions. Similar information for multivariate matrix distributions is given in Section 3, where the multivariate extensions of the IPH and DPH classes are introduced. A discussion of estimation methods for the introduced models via EM algorithms is provided in Section 4, and detailed illustrations of the use of the package are provided in Section 5. Finally, Section 6 gives a condensed summary of the methods available in matrixdist and shows how further information can be accessed within the package's documentation.

2. Univariate matrix distributions

2.1. Discrete phase-type distributions

Let $(Z_{n})_{n\in\mathbb{N}_{0}}$ be a discrete Markov chain (MC) on a finite state space $\{1,\dots,p,p+1\}$, where the first $p$ states are transient and state $p+1$ is absorbing. Then, the MC has transition matrix $\mathbf{P}$ given by

\mathbf{P}=\left(\begin{array}{cc}\mathbf{S}&\mathbf{s}\\ \bm{0}&1\end{array}\right)\,,

where $\mathbf{S}$ is a $p\times p$ matrix, known as the sub-transition matrix, and $\mathbf{s}$ is a column vector of dimension $p$. Since the rows of $\mathbf{P}$ sum to $1$, the relation $\mathbf{s}=\mathbf{e}-\mathbf{S}\mathbf{e}$ holds, with $\mathbf{e}=(1,\dots,1)^{\top}$ the $p$-dimensional column vector of ones. Furthermore, we assume that $(Z_{n})_{n\in\mathbb{N}_{0}}$ may start in any transient state with probability $\alpha_{k}=\mathbb{P}(Z_{0}=k)$, $k=1,\dots,p$, and let $\bm{\alpha}=(\alpha_{1},\dots,\alpha_{p})$. Then, the time until absorption $N=\inf\{n\geq 1\mid Z_{n}=p+1\}$ is said to be discrete phase-type (DPH) distributed with representation $(\bm{\alpha},\mathbf{S})$, and we write $N\sim\mbox{DPH}(\bm{\alpha},\mathbf{S})$ or $N\sim\mbox{DPH}_{p}(\bm{\alpha},\mathbf{S})$. Such a random variable has probability mass function $q$ and distribution function $Q$ given by

q(n)=\bm{\alpha}\mathbf{S}^{n-1}\mathbf{s}\,,\quad n\geq 1\,,
Q(n)=1-\bm{\alpha}\mathbf{S}^{n}\mathbf{e}\,,\quad n\geq 1\,.

The absorption time $N$ has $\kappa$-th factorial moments, $\kappa\in\mathbb{N}$, given by

\mathbb{E}(N(N-1)\cdots(N-\kappa+1))=\kappa!\,\bm{\alpha}\mathbf{S}^{\kappa-1}\left(\mathbf{I}-\mathbf{S}\right)^{-\kappa}\mathbf{e}\,.

In particular, $\mathbb{E}(N)=\bm{\alpha}\left(\mathbf{I}-\mathbf{S}\right)^{-1}\mathbf{e}$. Furthermore, its probability-generating function $P_{N}$ is given by

P_{N}(t)=\mathbb{E}(t^{N})=\bm{\alpha}\left(t^{-1}\mathbf{I}-\mathbf{S}\right)^{-1}\mathbf{s}\,,\quad|t|\leq 1\,,

where $\mathbf{I}$ is the identity matrix of appropriate dimension.
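
To make these formulas concrete, the following minimal sketch (with arbitrary illustrative parameters) evaluates the probability mass function and the mean both directly in base R and through the dph class of matrixdist.

library(matrixdist)
# Arbitrary DPH parameters: initial probabilities and sub-transition matrix
alpha <- c(0.5, 0.3, 0.2)
S <- matrix(c(0.1, 0.5, 0.2,
              0.0, 0.2, 0.6,
              0.3, 0.0, 0.4), nrow = 3, byrow = TRUE)
s <- rep(1, 3) - S %*% rep(1, 3)       # exit probabilities s = e - S e
N <- dph(alpha = alpha, S = S)
# q(n) = alpha S^(n-1) s, evaluated at n = 3
n <- 3
S_pow <- diag(3)
for (k in seq_len(n - 1)) S_pow <- S_pow %*% S
c(by_hand = as.numeric(alpha %*% S_pow %*% s), package = dens(N, n))
# E(N) = alpha (I - S)^(-1) e
c(by_hand = as.numeric(alpha %*% solve(diag(3) - S) %*% rep(1, 3)), package = mean(N))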

Let $N_{1}\sim\mbox{DPH}_{p_{1}}(\bm{\alpha}_{1},\mathbf{S}_{1})$ and $N_{2}\sim\mbox{DPH}_{p_{2}}(\bm{\alpha}_{2},\mathbf{S}_{2})$ be two independent DPH distributed random variables. Then,

N_{1}+N_{2}\sim\mbox{DPH}_{p_{1}+p_{2}}\left(\left(\bm{\alpha}_{1},\bm{0}\right),\left(\begin{array}{cc}\mathbf{S}_{1}&\mathbf{s}_{1}\bm{\alpha}_{2}\\ \bm{0}&\mathbf{S}_{2}\end{array}\right)\right)\,,
\min\left(N_{1},N_{2}\right)\sim\mbox{DPH}_{p_{1}p_{2}}\left(\bm{\alpha}_{1}\otimes\bm{\alpha}_{2},\mathbf{S}_{1}\otimes\mathbf{S}_{2}\right)\,,

and

\max\left(N_{1},N_{2}\right)\sim\mbox{DPH}_{p_{1}p_{2}+p_{1}+p_{2}}\left(\left(\bm{\alpha}_{1}\otimes\bm{\alpha}_{2},\bm{0},\bm{0}\right),\left(\begin{array}{ccc}\mathbf{S}_{1}\otimes\mathbf{S}_{2}&\mathbf{S}_{1}\otimes\mathbf{s}_{2}&\mathbf{s}_{1}\otimes\mathbf{S}_{2}\\ \bm{0}&\mathbf{S}_{1}&\bm{0}\\ \bm{0}&\bm{0}&\mathbf{S}_{2}\end{array}\right)\right)\,,

where $\otimes$ and $\oplus$ denote the Kronecker product and sum, respectively. Furthermore, if $U\sim\mbox{Bernoulli}(\nu)$, $\nu\in[0,1]$, then

UN_{1}+(1-U)N_{2}\sim\mbox{DPH}_{p_{1}+p_{2}}\left(\left(\nu\bm{\alpha}_{1}\,,(1-\nu)\bm{\alpha}_{2}\right),\left(\begin{array}{cc}\mathbf{S}_{1}&\bm{0}\\ \bm{0}&\mathbf{S}_{2}\end{array}\right)\right)\,.

Thus, the DPH class is closed under addition, minima, maxima, and finite mixtures. We refer to Bladt and Nielsen (2017) for a detailed description of DPH distributions.
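
As a quick sanity check of the closure under minima, the following base-R sketch (again with arbitrary illustrative parameters) builds the representation of $\min(N_{1},N_{2})$ via Kronecker products by hand and verifies that its survival function equals the product of the marginal survival functions; the package exposes analogous constructions directly through the methods +, minimum(), maximum(), and mixture() listed in Section 6.

library(matrixdist)
a1 <- c(0.7, 0.3); S1 <- matrix(c(0.4, 0.3, 0.1, 0.5), 2, 2, byrow = TRUE)
a2 <- c(0.2, 0.8); S2 <- matrix(c(0.6, 0.1, 0.2, 0.3), 2, 2, byrow = TRUE)
N1 <- dph(alpha = a1, S = S1)
N2 <- dph(alpha = a2, S = S2)
# Representation of min(N1, N2) built via Kronecker products
Nmin <- dph(alpha = as.vector(kronecker(a1, a2)), S = kronecker(S1, S2))
n <- 4
(1 - cdf(N1, n)) * (1 - cdf(N2, n))   # P(N1 > n) P(N2 > n)
1 - cdf(Nmin, n)                      # should coincide with the line above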

Regression

Recently, a regression model for frequency modelling based on DPH distributions was introduced in Bladt and Yslas (2023c) as a generalisation of mixture-of-experts (MoE) models. The main idea is to incorporate the covariate information into the vector of initial probabilities, which plays the role of the expert weights. The resulting distributional model is flexible, as it is dense in a broad class of regression models satisfying mild technical conditions. Next, we present the fundamental concepts of this regression model. Define the mapping

\bm{\alpha}:D\subset\mathbb{R}^{h}\rightarrow\Delta^{p-1},

where $\Delta^{p-1}=\{(\alpha_{1},\dots,\alpha_{p})\in\mathbb{R}^{p}\mid\sum_{k}\alpha_{k}=1,\;\alpha_{k}\geq 0\;\forall k\}$ is the standard $(p-1)$-simplex. Then, for any given vector of covariate information $\mathbf{X}=(X_{1},\dots,X_{h})\in\mathbb{R}^{h}$, the initial probabilities of the underlying MC of a DPH distribution are taken to be $\mathbb{P}(Z_{0}=k)=\alpha_{k}(\mathbf{X}):=(\bm{\alpha}(\mathbf{X}))_{k}$, $k=1,\dots,p$. Thus, we say that $N\mid\mathbf{X}\sim\mbox{DPH}(\bm{\alpha}(\mathbf{X}),\mathbf{S})$ follows the DPH-MoE specification. For $D=\mathbb{R}^{h}$, a convenient parametrisation of the initial probabilities is the softmax parametrisation given by

\alpha_{k}(\mathbf{X};\bm{\gamma})=\frac{\exp\left(\mathbf{X}\bm{\gamma}_{k}\right)}{\sum_{j=1}^{p}\exp\left(\mathbf{X}\bm{\gamma}_{j}\right)}\,,\quad k=1,\dots,p\,,

where $\bm{\gamma}_{k}\in\bar{\mathbb{R}}^{h}$ and $\bm{\gamma}=(\bm{\gamma}_{1}^{\top},\dots,\bm{\gamma}_{p}^{\top})^{\top}\in\bar{\mathbb{R}}^{p\times h}$. More details about this model, as well as precise denseness considerations, can be found in Bladt and Yslas (2023c). We also refer to Bladt and Yslas (2023b) for an extension of this regression model to continuous phase-type distributions.
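
The softmax parametrisation is straightforward to reproduce in base R; the short sketch below (with arbitrary coefficients gamma and covariate vector x, whose first entry plays the role of an intercept) returns a valid vector of initial probabilities.

# Softmax mapping from covariates to initial probabilities (base R sketch)
softmax_alpha <- function(x, gamma) {
  # gamma: p x h matrix of regression coefficients; x: covariate vector of length h
  eta <- as.numeric(gamma %*% x)
  exp(eta) / sum(exp(eta))
}
gamma <- matrix(c(0.5, -1.0,
                  0.0,  0.3,
                  1.2,  0.1), nrow = 3, byrow = TRUE)   # p = 3 phases, h = 2 covariates
x <- c(1, 0.7)                                          # first entry acts as an intercept
softmax_alpha(x, gamma)                                 # sums to one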

2.2. Continuous phase-type distributions

Let $(J_{t})_{t\geq 0}$ be a time-homogeneous Markov jump process on a finite state space $\{1,\dots,p,p+1\}$, where states $1,\dots,p$ are transient and state $p+1$ is absorbing. Then, the intensity matrix $\bm{\Lambda}$ of $(J_{t})_{t\geq 0}$ is of the form

\bm{\Lambda}=\left(\begin{array}{cc}\mathbf{S}&\mathbf{s}\\ \bm{0}&0\end{array}\right)\,,

where $\mathbf{S}$ is a $p\times p$ matrix, called a sub-intensity matrix, and $\mathbf{s}$ is a $p$-dimensional column vector. Since every row of $\bm{\Lambda}$ sums to zero, it follows that $\mathbf{s}=-\mathbf{S}\,\mathbf{e}$. Assume that the process starts in the transient space with probabilities $\alpha_{k}=\mathbb{P}(J_{0}=k)$, $k=1,\dots,p$, and let $\bm{\alpha}=(\alpha_{1},\dots,\alpha_{p})$. Here it is assumed that the probability of starting in the absorbing state $p+1$ is zero, i.e., $\mathbb{P}(J_{0}=p+1)=0$. Then, we say that the time until absorption $Y$ has a phase-type (PH) distribution with representation $(\bm{\alpha},\mathbf{S})$, and we write $Y\sim\mbox{PH}(\bm{\alpha},\mathbf{S})$ or $Y\sim\mbox{PH}_{p}(\bm{\alpha},\mathbf{S})$. The density $f$ and distribution function $F$ of $Y\sim\mbox{PH}(\bm{\alpha},\mathbf{S})$ are given by

f(y)=\bm{\alpha}\exp\left(\mathbf{S}y\right)\mathbf{s}\,,\quad y>0\,,
F(y)=1-\bm{\alpha}\exp\left(\mathbf{S}y\right)\mathbf{e}\,,\quad y>0\,,

where the exponential of a matrix $\mathbf{A}$ is defined by

\exp\left(\mathbf{A}\right)=\sum_{n=0}^{\infty}\frac{\mathbf{A}^{n}}{n!}\,.

The efficient computation of the matrix exponential is not a straightforward task in high dimensions, and we refer to Moler and Van Loan (1978) for a survey of different methods. The moments of $Y\sim\mbox{PH}(\bm{\alpha},\mathbf{S})$ are given by

\mathbb{E}\left(Y^{\theta}\right)=\Gamma\left(1+\theta\right)\bm{\alpha}\left(-\mathbf{S}\right)^{-\theta}\mathbf{e}\,,\quad\theta>0\,.

Moreover, its Laplace transform $L_{Y}$ is given by

L_{Y}(u)=\mathbb{E}\left(\exp(-uY)\right)=\bm{\alpha}\left(u\mathbf{I}-\mathbf{S}\right)^{-1}\mathbf{s}\,,\quad u\geq 0\,.
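
As a small numerical illustration of these formulas, the hedged sketch below evaluates the PH density by truncating the series defining the matrix exponential (adequate only for small, well-behaved matrices) and compares the result with the dens() method; it assumes that ph() accepts alpha and S arguments in the same way as its discrete counterpart dph().

library(matrixdist)
alpha <- c(0.4, 0.6)
S <- matrix(c(-2,  1,
               0, -3), 2, 2, byrow = TRUE)
s <- -S %*% rep(1, 2)                  # exit rates s = -S e
# Truncated Taylor series for exp(A); only meant for small, well-conditioned matrices
expm_taylor <- function(A, terms = 50) {
  out <- diag(nrow(A)); term <- diag(nrow(A))
  for (n in 1:terms) {
    term <- term %*% A / n
    out <- out + term
  }
  out
}
y <- 1.5
c(by_hand = as.numeric(alpha %*% expm_taylor(S * y) %*% s),
  package = dens(ph(alpha = alpha, S = S), y))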

Let $Y_{1}\sim\mbox{PH}_{p_{1}}(\bm{\alpha}_{1},\mathbf{S}_{1})$ and $Y_{2}\sim\mbox{PH}_{p_{2}}(\bm{\alpha}_{2},\mathbf{S}_{2})$ be independent PH distributed random variables. Then

Y_{1}+Y_{2}\sim\mbox{PH}_{p_{1}+p_{2}}\left(\left(\bm{\alpha}_{1},\bm{0}\right),\left(\begin{array}{cc}\mathbf{S}_{1}&\mathbf{s}_{1}\bm{\alpha}_{2}\\ \bm{0}&\mathbf{S}_{2}\end{array}\right)\right)\,,
\min\left(Y_{1},Y_{2}\right)\sim\mbox{PH}_{p_{1}p_{2}}\left(\bm{\alpha}_{1}\otimes\bm{\alpha}_{2},\mathbf{S}_{1}\oplus\mathbf{S}_{2}\right)\,,

and

\max\left(Y_{1},Y_{2}\right)\sim\mbox{PH}_{p_{1}p_{2}+p_{1}+p_{2}}\left(\left(\bm{\alpha}_{1}\otimes\bm{\alpha}_{2},\bm{0},\bm{0}\right),\left(\begin{array}{ccc}\mathbf{S}_{1}\oplus\mathbf{S}_{2}&\mathbf{I}\otimes\mathbf{s}_{2}&\mathbf{s}_{1}\otimes\mathbf{I}\\ \bm{0}&\mathbf{S}_{1}&\bm{0}\\ \bm{0}&\bm{0}&\mathbf{S}_{2}\end{array}\right)\right)\,.

This means that the PH class is closed under addition, minima, and maxima. In fact, it is closed under any order statistic (cf. Bladt and Nielsen, 2017, for the exact and more involved expression). Moreover, the class is closed under finite mixtures. Concretely, if $U\sim\mbox{Bernoulli}(\nu)$, $\nu\in[0,1]$, then

UY_{1}+(1-U)Y_{2}\sim\mbox{PH}_{p_{1}+p_{2}}\left(\left(\nu\bm{\alpha}_{1},(1-\nu)\bm{\alpha}_{2}\right),\left(\begin{array}{cc}\mathbf{S}_{1}&\bm{0}\\ \bm{0}&\mathbf{S}_{2}\end{array}\right)\right)\,.

Furthermore, when the mixing probabilities are given by a DPH distribution, the resulting distribution is again PH. More specifically, let $N\sim\mbox{DPH}_{q}(\bm{\pi},\mathbf{T})$ and let $Y_{1},Y_{2},\dots$ be iid, independent of $N$, with common distribution $\mbox{PH}_{p}(\bm{\alpha},\mathbf{S})$. Then

\sum_{n=1}^{N}Y_{n}\sim\mbox{PH}_{qp}(\bm{\pi}\otimes\bm{\alpha},\mathbf{I}\otimes\mathbf{S}+\mathbf{T}\otimes\mathbf{s}\bm{\alpha})\,.

We refer to Bladt and Nielsen (2017) for a comprehensive account of PH distributions.
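
The compound representation above can be checked numerically through Wald's identity: the mean computed from the representation $(\bm{\pi}\otimes\bm{\alpha},\mathbf{I}\otimes\mathbf{S}+\mathbf{T}\otimes\mathbf{s}\bm{\alpha})$ must equal $\mathbb{E}(N)\mathbb{E}(Y)$. The base-R sketch below does so for arbitrary illustrative parameters.

pi_vec <- c(0.6, 0.4)                                       # DPH initial probabilities
T_mat  <- matrix(c(0.3, 0.4, 0.1, 0.5), 2, 2, byrow = TRUE) # DPH sub-transition matrix
a_vec  <- c(0.5, 0.5)                                       # PH initial probabilities
S_mat  <- matrix(c(-3, 1, 0, -2), 2, 2, byrow = TRUE)       # PH sub-intensity matrix
s_vec  <- -S_mat %*% rep(1, 2)
# Sub-intensity matrix and initial vector of the compound sum
S_comp <- kronecker(diag(2), S_mat) + kronecker(T_mat, s_vec %*% t(a_vec))
alpha_comp <- as.numeric(kronecker(pi_vec, a_vec))
mean_compound <- as.numeric(alpha_comp %*% solve(-S_comp) %*% rep(1, 4))
mean_N <- as.numeric(pi_vec %*% solve(diag(2) - T_mat) %*% rep(1, 2))
mean_Y <- as.numeric(a_vec %*% solve(-S_mat) %*% rep(1, 2))
c(compound = mean_compound, wald = mean_N * mean_Y)          # the two values should agree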

By decomposing the matrix $\mathbf{S}$ into its Jordan normal form, one sees that the right tail of a PH distribution is asymptotically exponential. This limitation motivates extensions of the PH class that can accommodate diverse tail behaviours. One such extension is the class of inhomogeneous phase-type distributions, presented in the following section.

2.3. Inhomogeneous phase-type distributions

In Albrecher and Bladt (2019), the class of inhomogeneous phase-type distributions was introduced by allowing the Markov jump process in the construction of PH distributions to be time-inhomogeneous. In this way, $(J_{t})_{t\geq 0}$ has an intensity matrix of the form

\bm{\Lambda}(t)=\left(\begin{array}{cc}\mathbf{S}(t)&\mathbf{s}(t)\\ \bm{0}&0\end{array}\right)\,,

where $\mathbf{s}(t)=-\mathbf{S}(t)\,\mathbf{e}$. As before, the vector of initial probabilities is $\bm{\alpha}=(\alpha_{1},\dots,\alpha_{p})$. Then, we say that the time until absorption $Y$ has an inhomogeneous phase-type (IPH) distribution with representation $(\bm{\alpha},\mathbf{S}(t))$, and we write $Y\sim\mbox{IPH}(\bm{\alpha},\mathbf{S}(t))$. However, this setting is too general for applications, since its functionals are given in terms of product integrals. Thus, we consider the subclass of IPH distributions with $\mathbf{S}(t)$ of the form $\mathbf{S}(t)=\lambda(t)\mathbf{S}$, where $\lambda$ is a known non-negative real function and $\mathbf{S}$ is a sub-intensity matrix. In this case, we write $Y\sim\mbox{IPH}(\bm{\alpha},\mathbf{S},\lambda)$. Note that with $\lambda\equiv 1$, one recovers the conventional PH distributions. This subclass is particularly suitable for applications due to the following property. If $Y\sim\mbox{IPH}(\bm{\alpha},\mathbf{S},\lambda)$, then there exists a function $g$ such that

Y\sim g(\tau)\,,

where $\tau\sim\mbox{PH}(\bm{\alpha},\mathbf{S})$. The relationship between $g$ and $\lambda$ is given by

g^{-1}(y)=\int_{0}^{y}\lambda(t)\,dt\,.

The density $f$ and distribution function $F$ of $Y\sim\mbox{IPH}(\bm{\alpha},\mathbf{S},\lambda)$ can again be expressed using matrix exponentials and are given by

f(y)=\lambda(y)\,\bm{\alpha}\exp\left(\int_{0}^{y}\lambda(t)\,dt\ \mathbf{S}\right)\mathbf{s}\,,\quad y>0\,,
F(y)=1-\bm{\alpha}\exp\left(\int_{0}^{y}\lambda(t)\,dt\ \mathbf{S}\right)\mathbf{e}\,,\quad y>0\,.

Despite being defined in terms of matrix exponentials, the class of IPH distributions is no longer confined to exponential tails. Instead, the function $\lambda$ determines the exact asymptotic behaviour (see Albrecher et al., 2022a, for details).

It turns out that a number of IPH distributions can be expressed as classical distributions with matrix-valued parameters, properly defined using functional calculus. Table 2.1 lists the IPH transformations implemented in the matrixdist package (see Albrecher and Bladt, 2019; Albrecher et al., 2022b; Albrecher et al., 2022a, for further information and the rationale behind the different parametrisations).

                       $\lambda(t)$                                          $g(y)$                             Parameter domain
Matrix-Pareto          $(t+\beta)^{-1}$                                      $\beta\left(\exp(y)-1\right)$      $\beta>0$
Matrix-Weibull         $\beta t^{\beta-1}$                                   $y^{1/\beta}$                      $\beta>0$
Matrix-Lognormal       $\gamma(\log(t+1))^{\gamma-1}/(t+1)$                  $\exp(y^{1/\gamma})-1$             $\gamma>1$
Matrix-Loglogistic     $\theta t^{\theta-1}/(t^{\theta}+\gamma^{\theta})$    $\gamma(\exp(y)-1)^{1/\theta}$     $\gamma,\theta>0$
Matrix-Gompertz        $\exp(\beta t)$                                       $\log(\beta y+1)/\beta$            $\beta>0$
Matrix-GEV             --                                                    $\mu+\sigma(y^{-\xi}-1)/\xi$       $\mu\in\mathbb{R}$, $\sigma>0$, $\xi\in\mathbb{R}$
Table 2.1. Transformations implemented in matrixdist.
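
To illustrate the relation $Y\sim g(\tau)$ for the matrix-Pareto row of Table 2.1, the hedged sketch below (illustrative parameters, and assuming the parametrisation of the table so that $g^{-1}(y)=\log(y/\beta+1)$) checks that the distribution function of an iph object with gfun = "pareto" agrees with that of the underlying PH variable evaluated at $g^{-1}(y)$; it also assumes that ph() accepts alpha and S arguments analogously to iph().

library(matrixdist)
alpha <- c(0.5, 0.5)
S <- matrix(c(-1.0, 0.5,
               0.2, -2.0), 2, 2, byrow = TRUE)
beta <- 2
tau <- ph(alpha = alpha, S = S)                                   # underlying PH variable
Y <- iph(alpha = alpha, S = S, gfun = "pareto", gfun_pars = beta) # matrix-Pareto
y <- 3
cdf(Y, y)                      # F_Y(y)
cdf(tau, log(y / beta + 1))    # F_tau(g^{-1}(y)); should coincide with the line above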

Regression

At least two regression models based on IPH distributions exist in the literature. The first, known as the proportional intensities (PI) model, was introduced in Albrecher et al. (2022a). The authors, inspired by the proportional hazards model (cf. Cox, 1972), formulated a way of regressing the intensity function $\lambda$ on covariate information. More specifically, the main idea of the PI model is that the inhomogeneous intensity $\lambda(\cdot;\bm{\theta})$, assumed to be fully determined by a parameter vector $\bm{\theta}$, is affected proportionally by the covariate information $\mathbf{X}\in\mathbb{R}^{h}$ through a positive, measurable function $m(\cdot)$. In other words, the model assumes that

\lambda(t;\mathbf{X},\bm{\theta},\bm{\gamma})=\lambda(t;\bm{\theta})\,m(\mathbf{X}\bm{\gamma})\,,\quad t\geq 0\,,

where $\bm{\gamma}$ is an $h$-dimensional column vector of regression coefficients. With this assumption, $Y\mid\mathbf{X}\sim\mbox{IPH}(\bm{\alpha},\mathbf{S},\lambda(\cdot;\mathbf{X},\bm{\theta},\bm{\gamma}))$ with corresponding density and distribution functions

f(y)=m(\mathbf{X}\bm{\gamma})\,\lambda(y;\bm{\theta})\,\bm{\alpha}\exp\left(m(\mathbf{X}\bm{\gamma})\int_{0}^{y}\lambda(t;\bm{\theta})\,dt\ \mathbf{S}\right)\mathbf{s}\,,

and

F(y)=1-\bm{\alpha}\exp\left(m(\mathbf{X}\bm{\gamma})\int_{0}^{y}\lambda(t;\bm{\theta})\,dt\ \mathbf{S}\right)\mathbf{e}\,.

Typically, $m(\cdot)$ is taken to be $m(x)=\exp(x)$, which is the default in matrixdist. In this framework, when $p=1$, one recovers the proportional hazards model. For $p>1$, the implied hazard functions of different subgroups can deviate from proportionality in the body of the distribution but are asymptotically proportional in the tail. For more details on the PI model, we refer the reader to Albrecher et al. (2022a) and Bladt (2022).
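
A compact way to see the PI mechanism at work is to note that, under the default link $m(x)=\exp(x)$, the covariates simply rescale the sub-intensity matrix. The hedged sketch below (arbitrary illustrative parameters and a matrix-Pareto baseline) builds the conditional distribution of $Y\mid\mathbf{X}$ in this way; the reg() method illustrated in Section 5 estimates such models from data.

library(matrixdist)
alpha <- c(0.5, 0.5)
S <- matrix(c(-1.0, 0.4,
               0.3, -2.0), 2, 2, byrow = TRUE)
beta  <- 1.5                      # matrix-Pareto inhomogeneity parameter
gamma <- c(0.8, -0.5)             # regression coefficients
X <- c(1, 0.3)                    # covariates of one individual
m <- exp(sum(X * gamma))          # proportionality factor m(X gamma)
Y_given_X <- iph(alpha = alpha, S = S * m, gfun = "pareto", gfun_pars = beta)
cdf(Y_given_X, 2, lower.tail = FALSE)   # conditional survival probability at y = 2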

The second regression model, called the phase-type mixture-of-experts (PH-MoE) model, was introduced in Bladt and Yslas (2023b) as a generalisation of mixture-of-experts models to the PH framework; it follows the same rationale as the DPH-MoE specification described above (cf. Bladt and Yslas, 2023b, for further details).

3. Multivariate matrix distributions

In this section, we present the mathematical foundations of the different classes of multivariate matrix distributions implemented in matrixdist.

3.1. Multivariate discrete phase-type distributions

MDPH* class

Let $N\sim\mbox{DPH}_{p}(\bm{\alpha},\mathbf{S})$ be a DPH distributed random variable with underlying Markov chain $(Z_{n})_{n\in\mathbb{N}_{0}}$. Let $\mathbf{r}_{j}=(r_{j}(1),\dots,r_{j}(p))^{\top}$, $j=1,\dots,d$, be column vectors of dimension $p$ taking values in $\mathbb{N}_{0}^{p}$, and let $\mathbf{R}=(\mathbf{r}_{1},\dots,\mathbf{r}_{d})$ be a $p\times d$ matrix, called the reward matrix. We define

N^{(j)}=\sum_{n=0}^{N}r_{j}(Z_{n})\,,

for all $j=1,\dots,d$. Then, we say that the random vector $\mathbf{N}=(N^{(1)},\dots,N^{(d)})$ has a multivariate discrete phase-type distribution of the MDPH* type, and we write $\mathbf{N}\sim\mbox{MDPH*}(\bm{\alpha},\mathbf{S},\mathbf{R})$. This class of distributions was introduced in Campillo-Navarro (2018), and the construction principle can be seen as an extension of the so-called transformation via rewards for univariate DPH distributions. In that work, the author showed that elements of this class possess explicit expressions for the joint probability-generating function, joint moment-generating function, and joint moments. Moreover, the class is dense in the class of distributions with support in $\mathbb{N}^{d}$, and its margins are DPH distributed. However, general closed-form expressions for the joint density and distribution functions are not available, and the estimation of this general class remains an open question, which limits its use in practice. Nonetheless, there are subclasses of MDPH* distributions with explicit expressions for these two functionals that preserve the denseness property of the MDPH* class. The matrixdist package provides implementations for two of these subclasses, which we describe next.

fMDPH class

Let $\mathbf{N}=(N^{(1)},N^{(2)})\sim\mbox{MDPH*}(\bm{\alpha},\mathbf{S},\mathbf{R})$ with

\bm{\alpha}=\left(\bm{\pi},\bm{0}\right)\,,\quad\mathbf{S}=\left(\begin{array}{cc}\mathbf{S}_{11}&\mathbf{S}_{12}\\ \bm{0}&\mathbf{S}_{22}\end{array}\right)\,,\quad\text{and}\quad\mathbf{R}=\left(\begin{array}{cc}\mathbf{e}&\bm{0}\\ \bm{0}&\mathbf{e}\end{array}\right)\,.

Here, $\mathbf{S}_{11}$ and $\mathbf{S}_{22}$ are sub-transition matrices of dimensions $p_{1}$ and $p_{2}$ ($p_{1}+p_{2}=p$), respectively, $\mathbf{S}_{12}$ is a $p_{1}\times p_{2}$ matrix satisfying $\mathbf{S}_{11}\mathbf{e}+\mathbf{S}_{12}\mathbf{e}=\mathbf{e}$, and $\bm{\pi}$ is a $p_{1}$-dimensional vector of initial probabilities. Then, one can show that the joint density of this random vector is given by

f_{\mathbf{N}}(n^{(1)},n^{(2)})=\bm{\pi}\mathbf{S}_{11}^{n^{(1)}-1}\mathbf{S}_{12}\mathbf{S}_{22}^{n^{(2)}-1}\mathbf{s}_{2}\,,

where $\mathbf{s}_{2}=\mathbf{e}-\mathbf{S}_{22}\mathbf{e}$. In this case, we say that $\mathbf{N}$ is bivariate discrete phase-type distributed of the feed-forward type, and we write $\mathbf{N}\sim\mbox{fMDPH}(\bm{\pi},\mathbf{S}_{11},\mathbf{S}_{12},\mathbf{S}_{22})$. In particular, the marginals are $N^{(1)}\sim\mbox{DPH}(\bm{\pi},\mathbf{S}_{11})$ and $N^{(2)}\sim\mbox{DPH}(\bm{\pi}\left(\mathbf{I}-\mathbf{S}_{11}\right)^{-1}\mathbf{S}_{12},\mathbf{S}_{22})$. For more details on this class of bivariate DPH distributions, we refer the reader to Bladt and Yslas (2023c). Although the above construction principle can be extended to higher dimensions, the implementation of the associated methods becomes more challenging. Fortunately, another subclass of MDPH* distributions introduced in Bladt and Yslas (2023c) can easily be handled and implemented in higher dimensions, as we describe next.
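
The joint density above is straightforward to evaluate directly; the base-R sketch below does so for arbitrary illustrative parameters chosen so that $\mathbf{S}_{11}\mathbf{e}+\mathbf{S}_{12}\mathbf{e}=\mathbf{e}$ (the bivdph class of the package provides this functionality natively).

# Base-R evaluation of the fMDPH joint density f(n1, n2) = pi S11^(n1-1) S12 S22^(n2-1) s2
fmdph_dens <- function(n1, n2, pi0, S11, S12, S22) {
  matpow <- function(M, k) {            # M^k with M^0 = I
    out <- diag(nrow(M))
    if (k > 0) for (i in 1:k) out <- out %*% M
    out
  }
  s2 <- rep(1, nrow(S22)) - S22 %*% rep(1, nrow(S22))
  as.numeric(pi0 %*% matpow(S11, n1 - 1) %*% S12 %*% matpow(S22, n2 - 1) %*% s2)
}
pi0 <- c(0.6, 0.4)
S11 <- matrix(c(0.2, 0.3, 0.1, 0.4), 2, 2, byrow = TRUE)
S12 <- matrix(c(0.25, 0.25, 0.3, 0.2), 2, 2, byrow = TRUE)   # rows of (S11, S12) sum to one
S22 <- matrix(c(0.3, 0.2, 0.1, 0.5), 2, 2, byrow = TRUE)
fmdph_dens(2, 3, pi0, S11, S12, S22)
# Sanity check: the total mass over a large grid should be close to one
sum(outer(1:60, 1:60, Vectorize(function(a, b) fmdph_dens(a, b, pi0, S11, S12, S22))))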

mDPH class

Let $(Z_{n}^{(j)})_{n\in\mathbb{N}_{0}}$, $j=1,\dots,d$, be Markov chains on a common state space with $p$ transient states. All chains are assumed to start in the same state and then evolve independently until their respective absorptions. Formally,

Z_{0}^{(j)}=Z_{0}^{(l)}\,,\quad(Z_{n}^{(j)})_{n\in\mathbb{N}_{0}}\perp\!\!\!\perp_{Z_{0}^{(1)}}(Z_{n}^{(l)})_{n\in\mathbb{N}_{0}}\,,\quad l\neq j\,,\quad\forall j,l\in\{1,\dots,d\}\,.

If $N^{(j)}$, $j=1,\dots,d$, are the univariate DPH distributed absorption times of $(Z_{n}^{(j)})_{n\in\mathbb{N}_{0}}$, we say that the vector $\mathbf{N}=(N^{(1)},\dots,N^{(d)})$ has a multivariate discrete phase-type distribution of the mDPH type, and we write

\mathbf{N}\sim\mbox{mDPH}(\bm{\alpha},\widetilde{S})\,,

where $\bm{\alpha}$ is the common vector of initial probabilities and $\widetilde{S}=\{\mathbf{S}_{1},\dots,\mathbf{S}_{d}\}$, with $\mathbf{S}_{j}$ the sub-transition matrix associated with $(Z_{n}^{(j)})_{n\in\mathbb{N}_{0}}$, $j=1,\dots,d$. Explicit expressions for several functionals of interest can easily be obtained by conditioning on the common starting state of the Markov chains. For instance, the joint density function of $\mathbf{N}\sim\mbox{mDPH}(\bm{\alpha},\widetilde{S})$ is given by

f_{\mathbf{N}}(\mathbf{n})=\sum_{k=1}^{p}\alpha_{k}\prod_{j=1}^{d}\mathbf{e}_{k}^{\top}\mathbf{S}_{j}^{n^{(j)}-1}\mathbf{s}_{j}\,,\quad\mathbf{n}\in\mathbb{N}^{d}\,.
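
A direct base-R implementation of this joint density, conditioning on the common starting state, could look as follows (illustrative parameters only; the mdph class of the package provides the same functionality).

# Base-R evaluation of the mDPH joint density by conditioning on the starting state
mdph_dens <- function(n, alpha, S_list) {
  matpow <- function(M, k) {
    out <- diag(nrow(M))
    if (k > 0) for (i in 1:k) out <- out %*% M
    out
  }
  p <- length(alpha)
  total <- 0
  for (k in 1:p) {                         # condition on Z_0 = k
    term <- alpha[k]
    for (j in seq_along(S_list)) {
      Sj <- S_list[[j]]
      sj <- rep(1, p) - Sj %*% rep(1, p)
      ek <- replace(numeric(p), k, 1)      # k-th unit vector
      term <- term * as.numeric(ek %*% matpow(Sj, n[j] - 1) %*% sj)
    }
    total <- total + term
  }
  total
}
alpha <- c(0.5, 0.5)
S1 <- matrix(c(0.3, 0.2, 0.1, 0.6), 2, 2, byrow = TRUE)
S2 <- matrix(c(0.4, 0.1, 0.2, 0.5), 2, 2, byrow = TRUE)
mdph_dens(c(2, 3), alpha, list(S1, S2))    # P(N^(1) = 2, N^(2) = 3)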

3.2. Multivariate continuous phase-type distributions

We now present the MPH*, fMPH, and mPH classes of distributions, which follow construction principles similar to those of their discrete counterparts.

Let $\tau\sim\mbox{PH}_{p}(\bm{\alpha},\mathbf{S})$ be a PH distributed random variable, let $\mathbf{r}_{j}=(r_{j}(1),\dots,r_{j}(p))^{\top}$, $j=1,\dots,d$, be nonnegative column vectors of dimension $p$, and let $\mathbf{R}=(\mathbf{r}_{1},\dots,\mathbf{r}_{d})$ be a $p\times d$ reward matrix. The vectors $\mathbf{r}_{j}$ represent the rates at which a reward is accumulated while $(J_{t})_{t\geq 0}$ is in a given state. Thus, we denote by

Y^{(j)}=\int_{0}^{\tau}r_{j}(J_{t})\,dt\,,

the total reward earned prior to absorption by each component $j$, $j=1,\dots,d$. Then, we say that the random vector $\mathbf{Y}=(Y^{(1)},\dots,Y^{(d)})$ has a multivariate phase-type distribution of the MPH* type, and we write $\mathbf{Y}\sim\mbox{MPH*}(\bm{\alpha},\mathbf{S},\mathbf{R})$ (see Kulkarni, 1989; Bladt and Nielsen, 2017; Albrecher et al., 2022b, for more details). One attractive characteristic of this class of multivariate distributions is that, for a given nonnegative vector $\mathbf{w}=(w_{1},\dots,w_{d})$, the linear combination $\langle\mathbf{Y},\mathbf{w}\rangle=\sum_{j=1}^{d}w_{j}Y^{(j)}$ is PH distributed, with representation $(\bm{\alpha}_{\mathbf{w}},\mathbf{S}_{\mathbf{w}})$ say. To obtain the exact parametrisation, we split the state space $\{1,\dots,p\}=E_{+}\cup E_{0}$ such that $k\in E_{+}$ if $(\mathbf{R}\mathbf{w}^{\top})_{k}>0$ and $k\in E_{0}$ if $(\mathbf{R}\mathbf{w}^{\top})_{k}=0$. Then, by reordering the states, we may assume that $\bm{\alpha}$ and $\mathbf{S}$ can be written as

\bm{\alpha}=\left(\begin{array}{cc}\bm{\alpha}^{+}\,,&\bm{\alpha}^{0}\end{array}\right)\quad\text{and}\quad\mathbf{S}=\left(\begin{array}{cc}\mathbf{S}^{++}&\mathbf{S}^{+0}\\ \mathbf{S}^{0+}&\mathbf{S}^{00}\end{array}\right)\,,

where the $+$ terms correspond to states in $E_{+}$, and the $0$ terms to states in $E_{0}$. For instance, $\mathbf{S}^{0+}$ collects the transition intensities by which $(J_{t})_{t\geq 0}$ jumps from states in $E_{0}$ to states in $E_{+}$. After this arrangement, the distribution of the continuous part of $\langle\mathbf{Y},\mathbf{w}\rangle$ has a PH representation of the form

\bm{\alpha}_{\mathbf{w}}=\bm{\alpha}^{+}+\bm{\alpha}^{0}\left(-\mathbf{S}^{00}\right)^{-1}\mathbf{S}^{0+}\quad\text{and}\quad\mathbf{S}_{\mathbf{w}}=\bm{\Delta}((\mathbf{R}\mathbf{w}^{\top})_{+})^{-1}\left(\mathbf{S}^{++}+\mathbf{S}^{+0}\left(-\mathbf{S}^{00}\right)^{-1}\mathbf{S}^{0+}\right)\,.

Here, $\bm{\Delta}((\mathbf{R}\mathbf{w}^{\top})_{+})$ is a diagonal matrix whose diagonal is the vector $(\mathbf{R}\mathbf{w}^{\top})_{+}$, formed by the entries satisfying $(\mathbf{R}\mathbf{w}^{\top})_{k}>0$ in the appropriate order. Note that an atom at zero of size $\bm{\alpha}^{0}\left(\mathbf{I}-\left(-\mathbf{S}^{00}\right)^{-1}\mathbf{S}^{0+}\right)\mathbf{e}$ may appear, since $(J_{t})_{t\geq 0}$ may start in a state in $E_{0}$ and get absorbed before reaching a state in $E_{+}$. In particular, the above result implies that if $\mathbf{Y}\sim\mbox{MPH*}(\bm{\alpha},\mathbf{S},\mathbf{R})$, then its marginals $Y^{(j)}$ are PH distributed with parameters easily computed from the above formulas. Moreover, the so-called transformation via rewards for univariate PH distributions is retrieved when $d=1$.
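
The reward transformation is easy to prototype in base R. The hedged sketch below computes $(\bm{\alpha}_{\mathbf{w}},\mathbf{S}_{\mathbf{w}})$ for an illustrative MPH* parametrisation and weight vector; subsetting by $E_{+}$ and $E_{0}$ implicitly performs the reordering of states, and the example assumes both sets are non-empty. The linCom() method of the package offers this computation directly.

# Base-R sketch of the reward transformation: PH representation of <Y, w>
mphstar_lincomb <- function(alpha, S, R, w) {
  rw <- as.numeric(R %*% w)
  plus <- which(rw > 0); zero <- which(rw == 0)     # E_+ and E_0 (both assumed non-empty here)
  Spp <- S[plus, plus, drop = FALSE]; Sp0 <- S[plus, zero, drop = FALSE]
  S0p <- S[zero, plus, drop = FALSE]; S00 <- S[zero, zero, drop = FALSE]
  alpha_w <- alpha[plus] + alpha[zero] %*% solve(-S00) %*% S0p
  S_w <- diag(1 / rw[plus], nrow = length(plus)) %*% (Spp + Sp0 %*% solve(-S00) %*% S0p)
  list(alpha = as.numeric(alpha_w), S = S_w)
}
alpha <- c(0.5, 0.3, 0.2)
S <- matrix(c(-3,  1,  1,
               0, -2,  1,
               1,  0, -4), 3, 3, byrow = TRUE)
R <- matrix(c(1, 0,
              0, 1,
              2, 0), 3, 2, byrow = TRUE)
w <- c(0, 1)                               # extracts the second marginal Y^(2)
res <- mphstar_lincomb(alpha, S, R, w)
res
1 - sum(res$alpha)                         # size of the atom at zero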

Although members of the MPH* class possess other desirable characteristics, such as explicit expressions for the joint Laplace transform and joint moments (see Section 8.1.1 of Bladt and Nielsen, 2017), general closed-form expressions for the joint density and joint distribution functions do not exist. As for its discrete counterpart, the MDPH* class, this complicates their use in practice and, more importantly, their estimation. This motivates the introduction of subclasses of MPH* distributions for which explicit expressions can be achieved. The matrixdist package provides implementations for the fMPH and mPH subclasses, which are derived similarly to their discrete equivalents.

The fMPH class is obtained by considering MPH* parametrisations of the form $\mbox{MPH*}(\bm{\alpha},\mathbf{S},\mathbf{R})$ with

\bm{\alpha}=\left(\bm{\pi},\bm{0}\right)\,,\quad\mathbf{S}=\left(\begin{array}{cc}\mathbf{S}_{11}&\mathbf{S}_{12}\\ \bm{0}&\mathbf{S}_{22}\end{array}\right)\,,\quad\text{and}\quad\mathbf{R}=\left(\begin{array}{cc}\mathbf{e}&\bm{0}\\ \bm{0}&\mathbf{e}\end{array}\right)\,,

where $\mathbf{S}_{11}$ and $\mathbf{S}_{22}$ are sub-intensity matrices of dimensions $p_{1}$ and $p_{2}$ ($p_{1}+p_{2}=p$), respectively, $\mathbf{S}_{12}$ is a $p_{1}\times p_{2}$ matrix satisfying $\mathbf{S}_{11}\mathbf{e}+\mathbf{S}_{12}\mathbf{e}=\bm{0}$, and $\bm{\pi}$ is a $p_{1}$-dimensional vector of initial probabilities. In such a case, explicit expressions for different functionals can be obtained. For example, the joint density of a random vector $\mathbf{Y}=(Y^{(1)},Y^{(2)})$ with the above parametrisation is given by

f_{\mathbf{Y}}(y^{(1)},y^{(2)})=\bm{\pi}\exp\left(\mathbf{S}_{11}y^{(1)}\right)\mathbf{S}_{12}\exp\left(\mathbf{S}_{22}y^{(2)}\right)\left(-\mathbf{S}_{22}\right)\mathbf{e}\,.

For more details on this class of bivariate PH distributions and an extension to the IPH framework, we refer to Albrecher et al. (2022b).

The tractable class of mIPH distributions (with the mPH class as a particular instance) was introduced in Bladt (2023) by considering $d$ time-inhomogeneous pure-jump Markov processes on a common state space with $p$ transient states that start in the same state and then evolve independently until their respective absorptions. If $Y^{(j)}$, $j=1,\dots,d$, are the univariate IPH distributed absorption times of these Markov jump processes, we say that the vector $\mathbf{Y}=(Y^{(1)},\dots,Y^{(d)})$ has a multivariate inhomogeneous phase-type distribution of the mIPH type, and we write

\mathbf{Y}\sim\mbox{mIPH}(\bm{\alpha},\widetilde{S},\widetilde{L})\,,\quad\mbox{where}\quad\widetilde{S}=\{\mathbf{S}_{1},\dots,\mathbf{S}_{d}\}\quad\text{and}\quad\widetilde{L}=\{\lambda_{1},\dots,\lambda_{d}\}\,.

This construction yields explicit expressions for important functionals. For instance, if $\mathbf{Y}\sim\mbox{mIPH}(\bm{\alpha},\widetilde{S},\widetilde{L})$, then

f_{\mathbf{Y}}(\mathbf{y})=\sum_{k=1}^{p}\alpha_{k}\prod_{j=1}^{d}\mathbf{e}_{k}^{\top}\exp\left(\mathbf{S}_{j}g_{j}^{-1}(y^{(j)})\right)\mathbf{s}_{j}\lambda_{j}(y^{(j)})\,,\quad\mathbf{y}\in\mathbb{R}^{d}_{+}\,.

Note that this allows for combining different types of IPH marginals, specified through $\lambda_{j}(\cdot)$ and $g_{j}(\cdot)$, which enables different tail behaviours across the marginals.

3.3. Regression

For some of the multivariate classes introduced earlier, regression on the vector of initial probabilities can be applied using the same principle as in the univariate mixture-of-experts (MoE) approach. This allows us to capture the relationships between the multivariate responses and covariates while taking advantage of the flexibility and expressiveness of the multivariate matrix distributions. Specifically, the mIPH-MoE model was introduced in Albrecher et al. (2023) and used to estimate the joint remaining lifetimes of spouses. Additionally, the mDPH-MoE and fMDPH-MoE specifications were derived in Bladt and Yslas (2023c) and illustrated on insurance frequency data.

4. Note on estimation algorithms and computational remarks

Most of the distribution classes presented above can be estimated using various forms of the expectation-maximisation (EM) algorithm. The original EM algorithm for PH distributions was introduced in Asmussen et al. (1996) and later extended to censored data in Olsson (1996). The corresponding algorithm for DPH distributions can be found in Bladt and Nielsen (2017). More recently, Albrecher et al. (2022b) derived an EM algorithm for IPH distributions and illustrated how the algorithms for MPH* and fMPH distributions can be implemented. Regarding the regression models, the estimation method for the PI model is presented in Albrecher et al. (2022a), while Bladt and Yslas (2023b) provide the algorithm for the estimation of the PH-MoE specification. This method was then adapted in Albrecher et al. (2023) for the mIPH class and in Bladt and Yslas (2023c) for the DPH framework, including extensions to the multivariate classes mDPH and fMDPH.

All estimation procedures in matrixdist related to PH distributions require an effective method for computing matrix-exponential operations, as the various E-steps involved hinge on such quantities. Several theoretical methods exist for this purpose (see, for instance, Moler and Van Loan, 1978), but an efficient numerical implementation is crucial due to the significant computational burden of the algorithms. The matrixdist package offers three primary implementations of the matrix exponential: the first, described in Asmussen et al. (1996), converts the problem into a system of ordinary differential equations. The second method uses the so-called uniformisation, with the exact description available in Albrecher et al. (2022b). The third method is the Padé approximation, which can be found in Moler and Van Loan (1978). Each method has its own advantages and disadvantages, and the choice depends on the specific problem and distribution at hand.
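
For readers interested in the uniformisation idea, the following base-R sketch (not the package's internal implementation) computes $\exp(\mathbf{S}y)$ for a sub-intensity matrix $\mathbf{S}$ by writing $\mathbf{S}=\lambda(\mathbf{P}-\mathbf{I})$ with $\lambda=\max_{i}|S_{ii}|$, so that the matrix exponential becomes a Poisson-weighted sum of powers of the sub-transition matrix $\mathbf{P}$.

# Uniformisation sketch: exp(S y) = sum_k exp(-lambda y) (lambda y)^k / k! * P^k,
# with lambda = max_i |S_ii| and P = I + S / lambda (assumes lambda > 0;
# the naive weighting below is prone to underflow for very large lambda * y)
expm_unif <- function(S, y, tol = 1e-12) {
  lambda <- max(abs(diag(S)))
  P <- diag(nrow(S)) + S / lambda
  out <- matrix(0, nrow(S), ncol(S))
  term <- diag(nrow(S))                 # P^0
  w <- exp(-lambda * y)                 # Poisson weight for k = 0
  k <- 0
  repeat {
    out <- out + w * term
    if (w < tol && k > lambda * y) break
    k <- k + 1
    w <- w * lambda * y / k
    term <- term %*% P
  }
  out
}
S <- matrix(c(-2, 1, 0.5, -3), 2, 2, byrow = TRUE)
expm_unif(S, 1.5)   # compare with, e.g., expm::expm(S * 1.5)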

5. Illustrations

5.1. Univariate distributions

In this section, we demonstrate how to use the matrixdist package to fit univariate PH and DPH distributions to real-world data.

Continuous PH distributions

We consider the AutoBi dataset from the insuranceData package, which contains information about automobile bodily injury claims. The dataset comprises eight variables: one for the economic loss (which we aim to describe), six covariates, and one variable to identify each claim. We start by providing general descriptive statistics for the variable of interest (LOSS).

# Load the data
data("AutoBi", package = "insuranceData")
claim <- AutoBi
# Economic losses resulting from an automobile accident
loss <- claim$LOSS
# Summary statistics of losses
summary(loss)
#>     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
#>    0.005    0.640    2.331    5.954    3.995 1067.697
# 10 largest values of losses
tail(sort(loss), n = 10)
#>  [1]   82.000   96.007  114.604  150.000  162.047  188.720  193.000  222.405
#>  [9]  273.604 1067.697

We observe that the losses range from 0.005 to 1067.697, with 75% of the data smaller than 3.995. The presence of large values, compared to the body of the distribution, serves as an initial indication of a heavy tail. To further confirm this observation, we create a Hill plot using the hill() function from the evir package.

library(evir)
hill(loss)
Figure 5.1. Hill plot for losses in the AutoBi dataset.

The plot exhibits a flat behaviour for the largest order statistics, suggesting a Pareto-type (regularly varying) tail. Therefore, we opt for an IPH model to capture the tail of the losses more accurately; more specifically, we use a matrix-Pareto distribution. The fitting process starts by creating an initial IPH distribution, which provides the starting parameters for the EM algorithm. This can be accomplished with the iph() method in matrixdist. For our analysis, we create an IPH object with random parameters, a general structure, dimension 6, and a Pareto inhomogeneity function.

set.seed(22)
# Create an IPH object
x <- iph(gfun = "pareto", structure = "general", dimension = 6)

With an initial IPH distribution in place, we can employ the fit() method to fit the data. Note that fit() requires specifying the number of steps of the EM algorithm. In this instance, we use 1500 steps, since the changes in the log-likelihood become negligible at this point (the same criterion is used to determine the number of iterations in the subsequent illustrations). The method outputs an IPH object containing the fitted parameters, on which further computations can be performed if required.

# Fitting procedure
z <- fit(x, y = loss, stepsEM = 1500)

Figures 5.2 and 5.3 display a QQ plot for the fitted distribution and a histogram of the (log) data against the fitted density, respectively. These plots indicate that the fitted distribution serves as a good approximation of the data. In particular, the sample and fitted quantiles align closely with the identity line, especially within the distribution's body. Although the quantiles in the tail deviate slightly, they remain relatively close to the identity line. The histogram further supports the notion that the fitted IPH distribution is a satisfactory approximation.

Figure 5.2. QQ plot of fitted distribution.
Figure 5.3. Histogram of the logarithm of losses vs fitted density (blue line).
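
For completeness, a QQ plot such as the one in Figure 5.2 can be produced, for instance, by comparing empirical quantiles of the losses with the quan() method applied to the fitted object z; this is one possible way to generate the figure, not necessarily the exact code used for it.

# One way to produce a QQ plot like Figure 5.2 from the fitted object z
qq <- seq(0.01, 0.99, length.out = 100)
plot(quantile(loss, probs = qq), quan(z, qq),
  xlab = "Sample", ylab = "Fitted", pch = 16, asp = 1
)
abline(0, 1, col = "blue")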

PH regression

We now shift our focus to the regression framework. In particular, we employ the PI specification to capture the influence of covariate information on economic losses. The initial step involves removing any incomplete covariate data. Next, we need to split the covariates and the response variable into two different objects. In our case, we investigate the effect of the variables ATTORNEY, CLMSEX and CLMAGE. The variable ATTORNEY indicates whether the claimant was represented by an attorney or not, CLMSEX is the claimant’s sex, and CLMAGE corresponds to their age.

# Covariate information, excluding missing entries
cov <- na.omit(claim)
# Separation of the information
X <- cov[, c("ATTORNEY", "CLMSEX", "CLMAGE")]
loss <- cov[, "LOSS"]

Now, we can employ the reg() method to perform the PI regression. This function works similarly to the fit() method, with the main difference being that estimation of the regression parameters is included in each iteration of the EM algorithm (see Albrecher et al., 2022a, for details). Along with the data, the reg() method requires an initial IPH distribution. In this case, we employ the previously fitted matrix-Pareto distribution without covariates.

# PI regression
z.reg <- reg(x = z, y = loss, X = X, stepsEM = 250)

The resulting object (z.reg) contains all matrix parameters along with the regression coefficients. This information can be easily accessed via the coef() function as follows.

# Fitted parameters
z.reg.par <- coef(z.reg)
z.reg.par
#> $alpha
#> [1] 2.356094e-04 3.257751e-06 5.920828e-02 3.755407e-04 4.561202e-01
#> [6] 4.840571e-01
#>
#> $S
#>               [,1]          [,2]        [,3]          [,4]          [,5]
#> [1,] -3.412707e+01  7.592078e-06  1.42774265  1.533919e-01  6.975181e-06
#> [2,]  2.145191e+01 -3.473845e+01  0.38024154  1.288647e+01  3.162116e-05
#> [3,]  3.090083e-01  6.287658e-01 -1.83234510  4.643455e-01  3.670351e-01
#> [4,]  5.666353e-01  1.166288e-04  7.77223071 -2.987094e+01  2.335434e-04
#> [5,]  1.432727e-04  3.578715e+01  0.49172745  2.247409e-04 -3.628114e+01
#> [6,]  4.561651e-06  1.243609e-03  0.08311972  3.771681e-05  4.707313e-02
#>               [,6]
#> [1,]  3.037507e+01
#> [2,]  2.799554e-05
#> [3,]  3.352413e-02
#> [4,]  2.066459e+01
#> [5,]  1.118600e-03
#> [6,] -6.641175e+00
#>
#> $beta
#> [1] 33.10073
#>
#> $B
#>
#>  1.15139949 -0.10888748 -0.01368395
#>
#> $C
#> numeric(0)

The above information can be used, for example, to create customised loss distributions for individual claimants based on their distinctive characteristics.

Take, for instance, a new claimant: a 60-year-old male, unrepresented by an attorney. The following code leverages our model to estimate his loss distribution.

# Covariate information of the new claimant
claimant <- data.frame(ATTORNEY = 2, CLMSEX = 1, CLMAGE = 60) # 2 = not represented by an attorney, 1 = male
# Multiplicative effect in the sub-intensity matrix
prop <- exp(sum(z.reg.par$B * claimant))
# IPH distribution for claimant’s loss
claimant.iph <- iph(
  alpha = z.reg.par$alpha, S = z.reg.par$S * prop,
  gfun = "pareto", gfun_pars = z.reg.par$beta
)

If we wish to visualize this claimant’s survival function, we can plot it as follows.

# Evaluation points for the survival function
q <- seq(0, 15, 0.01)
# Plot of the survival function
plot(q, cdf(claimant.iph, q, lower.tail = F),
  type = "l", col = "blue", lwd = 1.3,
  ylab = "1-F(y)", xlab = "y", main = "Survival function"
)
Figure 5.4. Survival function of losses for the new claimant.

We refer readers to Section 6 and the documentation of matrixdist for further available methods for PH and IPH distributions.

Discrete PH distributions

The matrixdist package provides an efficient approach to working with DPH distributions. To exemplify its application, we use the NMES1988 data set included in the R package AER. The data come from the US National Medical Expenditure Survey conducted in 1987 and 1988 and contain various count variables. For our illustration, we focus on the annual number of visits to a physician's office (the visits variable) for US residents aged above 66. Note, however, that when working with DPH distributions it is necessary to add 1 to data starting at 0, as the support of a DPH random variable commences at 1.

# Load the data
data("NMES1988", package = "AER")
df <- NMES1988
# Yearly physician office visits
visit <- df$visits + 1
df$y <- visit

The process of fitting a DPH distribution to data closely parallels that of fitting PH distributions. It starts by defining an initial dph object, which must then be passed to the fit() function along with the data (Figure 5.5 depicts the fitted distribution).

# Create a DPH distribution
set.seed(23)
d <- dph(structure = "general", dimension = 3)
# Fit the distribution to the data we are interested in
d.fit <- fit(x = d, y = visit, stepsEM = 1500)
Figure 5.5. Comparison between empirical probability mass function and fitted one.

The data set also provides many other covariates. Here, we concentrate on the influence of age, gender, and the number of chronic conditions on the frequency of physician visits. For this purpose, we employ the MoE framework to integrate the covariate information into the DPH specification, which can be done with the MoE() method. The first step in performing this regression is to specify a formula (f) that details how the covariates affect the number of physician visits.

# Convert the gender variable into binary format
df$gender <- ifelse(df$gender == "male", 1, 0)
# Regression formula
f <- y ~ age + gender + chronic

Next, such a formula has to be passed on to the MoE() method along with the data (including covariates) and an initial DPH distribution.

# MoE regression
d.reg <- MoE(x = d.fit, formula = f, data = df, stepsEM = 500)

The fitted model allows us to examine the impact of different covariate profiles on the number of physician visits. For example, let us compare the probability mass functions of two subjects: both are 69-year-old females, but one has five chronic conditions while the other has none.

# DPH distribution for a subject with 5 chronic conditions
dph.sub1 <- dph(alpha = d.reg$alpha[12, ], S = d.reg$S)
# DPH distribution for a subject with 0 chronic conditions
dph.sub2 <- dph(alpha = d.reg$alpha[38, ], S = d.reg$S)
# Plot of density function for both subjects
q <- seq(1, 20, 1)
plot(q, dens(dph.sub1, q),
  type = "p", col = "blue", cex = 0.8,
  ylab = "P(N=n)", xlab = "n", ylim = c(0, 0.3)
)
points(q, dens(dph.sub2, q), col = "red", cex = 0.8)
legend("topright",
  legend = c("5 chronic conditions", "0 chronic conditions"),
  col = c("blue", "red"), bty = "n", pch = 1, cex = 0.8
)

Furthermore, we can quantify the impact of chronic conditions on the expected number of visits to a physician’s office. Let us compute the expected number of visits for both individuals:

# Mean value for subject with 5 chronic conditions
mean(dph.sub1) - 1
#> [1] 8.098524
# Mean value for subject with 0 chronic conditions
mean(dph.sub2) - 1
#> [1] 3.656367

Our analysis unsurprisingly reveals that the subject with five chronic conditions is expected to visit a physician's office more often than the one with none. This demonstrates the ability of the fitted model to quantify the impact of varying covariate profiles, a tool that can assist healthcare professionals in predicting healthcare usage patterns.

5.2. Multivariate distributions

MPH* class

To illustrate the capabilities of the implementations for MPH* distributions in matrixdist, we use the rdj data set of the copula package. More specifically, our focus is on using an MPH* distribution to describe the joint behaviour of the absolute log returns of the three stocks in this data set: Intel, Microsoft, and General Electric.

First, we load the data, take absolute values, and remove any observations with a zero component.

set.seed(24)
# Load the data
data("rdj", package = "copula")
# Absolute log return of stock prices
y <- as.matrix(abs(rdj[, -1]))
# Remove observations with a zero component
y <- y[y[, 1] != 0 & y[, 2] != 0 & y[, 3] != 0, ]

As in the univariate case, the fitting process requires the definition of an initial MPHstar object that is subsequently fed into the fit() function. In the present case, we use an MPHstar object with randomly generated parameters, a general structure for the matrix parameters, and dimension 10.

# Initial MPH* distribution
x <- MPHstar(structure = "general", dimension = 10, variables = 3)
# Fitting to data
x.fit <- fit(x = x, y = y, stepsEM = 200, uni_epsilon = 1e-6, r = 0.5)

In particular, the reward matrix of the fitted distribution can be accessed as follows:

# Estimated reward matrix
x.fit@pars$R
#>              [,1]       [,2]       [,3]
#>  [1,] 0.624656883 0.03162784 0.34371527
#>  [2,] 0.448807782 0.08398967 0.46720255
#>  [3,] 0.545561519 0.42181735 0.03262113
#>  [4,] 0.783031036 0.08518594 0.13178302
#>  [5,] 0.973586315 0.01054082 0.01587287
#>  [6,] 0.318181076 0.21878855 0.46303038
#>  [7,] 0.015947222 0.78138202 0.20267076
#>  [8,] 0.498338747 0.28584901 0.21581224
#>  [9,] 0.001025196 0.99897480 0.00000000
#> [10,] 0.471608399 0.48276616 0.04562544

Finally, we assess the quality of the fit by creating QQ plots of the fitted margins.

# Names for the QQ plot
name <- colnames(y)
# Quantile levels
qq <- seq(0.01, 0.99, length.out = 100)
par(mfrow = c(1, 3))
for (j in 1:3) {
  # j-th marginal fitted distribution
  x <- marginal(x.fit, j)
  # Marginal observations
  ym <- y[, j]
  # quantiles
  y.q <- quantile(ym, probs = qq)
  x.q <- quan(x, qq)
  # QQ plot
  plot(y.q, x.q, main = name[j], xlab = "Sample", ylab = "Fitted", pch = 16, asp = 1)
  abline(0, 1, col = "blue")
}
Figure 5.6. QQ plots of the fitted distributions for each marginal.

Upon inspecting the QQ plots, we can observe that the fitted marginals provide a satisfactory approximation of the data.

mDPH class

Finally, let us illustrate the use of the mDPH class with a practical example. We use the NMES1988 data set again, but this time we focus on jointly modelling two distinct variables: the yearly number of physician office visits (visits) and of non-physician office visits (nvisits). With the matrixdist package, an mDPH distribution can be fitted to these data as follows.

# Load the data
data("NMES1988", package = "AER")
df <- NMES1988
# Observations in matrix form
mvisit <- cbind(df$visits, df$nvisits) + 1
# Initial mDPH distribution
set.seed(12345)
md <- mdph(structure = rep("general", 2), dimension = 4, variables = 2)
# Fitting procedure
md.fit <- fit(x = md, y = mvisit, stepsEM = 1500)

The expected values, variance-covariance matrix, and correlation matrix of the fitted distribution can be obtained using the subsequent code:

# Vector of marginal expectations
mean(md.fit) - 1
#> [1] 5.774399 1.618021
# Variance-covariance matrix
var(md.fit)
#>           [,1]      [,2]
#> [1,] 46.243751  5.726782
#> [2,]  5.726782 27.372432
# Correlation matrix
cor(md.fit)
#>           [,1]      [,2]
#> [1,] 1.0000000 0.1609635
#> [2,] 0.1609635 1.0000000

Note that the mean of the fitted mDPH distribution matches the sample mean, a property that also holds for the estimation methods of the PH, DPH, and mPH classes. Additionally, the fitted model captures the correlation between physician and non-physician office visits. The mDPH class also supports the MoE framework, allowing us to incorporate covariate information; next, we include age, gender, and the number of chronic conditions in the analysis.

df$y <- mvisit
# Convert the gender variable into binary format
df$gender <- ifelse(df$gender == "male", 1, 0)
# Regression formula
f <- y ~ age + gender + chronic
# MoE regression
md.reg <- MoE(x = md.fit, y = mvisit, formula = f, data = df, stepsEM = 500)

The above output allows us to predict the joint distribution for new individuals with a set of specific characteristics. This is achieved by employing the predict() function in conjunction with the nnet object included in md.reg (accessible via md.reg$mm). For instance, for a 72-year-old male patient with one chronic condition, the mDPH distribution describing the joint behaviour of the two variables of interest can be specified as follows:

# Characteristics of a new patient (age is recorded in decades in NMES1988, so 7.2 = 72 years)
patient <- data.frame(age = 7.2, gender = 1, chronic = 1)
# Predicted starting probabilities
alpha <- predict(md.reg$mm, type = "probs", newdata = patient)
# New mDPH distribution
md.patient <- mdph(alpha = alpha, S = md.reg$S)

Figure 5.7 shows the joint probability mass function of yearly office and non-office visits for the new patient.

Figure 5.7. Joint probability mass function.

6. Summary of methods

We have explored the fundamental properties of some matrix distributions and demonstrated the use of the matrixdist package for analysing real-life data through detailed examples. The package encompasses methods for computing various functionals and estimating most of the classes discussed in this paper. Table 6.1 presents the families of matrix distributions currently implemented in the package. We refer readers to the package documentation for guidance on using these constructor functions. For example, consulting ?ph will provide instructions on creating PH distributions. Additionally, Table 6.2 offers a comprehensive list of the implemented methods, along with a mapping to the applicable classes. For examples of the use of these methods, the package help can be consulted. The general structure for accessing the desired examples is ?`method_name,class_name-method`. For example, ?`fit,ph-method` directs to the documentation of the fit() method for the ph class.

Table 6.1. Classes of matrix distributions available in matrixdist
Class Constructor function
DPH dph()
PH ph()
IPH iph()
fMDPH bivdph()
mDPH mdph()
MPH* MPHstar()
fMPH bivph()
mPH mph()
fMIPH biviph()
mIPH miph()
Table 6.2. Methods for matrix distributions available in matrixdist
Method Description Available for
sim() Simulates iid realisations dph, ph, iph, bivdph, mdph, MPHstar, bivph, mph, biviph, miph
moment() Moments dph, ph, bivdph, mdph, bivph, mph
laplace() Laplace transform ph, bivph, mph
mgf() Moment-generating function ph, bivph, mph
pgf() Probability-generating function dph, bivdph, mdph
+ Parametrisation of the sum of two matrix distributions dph, ph
minimum() Parametrisation of the minimum of two matrix distributions dph, ph, iph
maximum() Parametrisation of the maximum of two matrix distributions dph, ph, iph
mixture() Parametrisation of a mixture of two matrix distributions dph, ph
Nfold() N-fold convolution with N DPH distributed dph, ph
dens() Density function dph, ph, iph, bivdph, mdph, bivph, mph, biviph, miph
cdf() Cumulative distribution function dph, ph, iph, mph, miph
haz() Hazard rate function ph, iph
quan() Quantiles ph, iph
TVR() Transformation via rewards dph, ph
fit() Estimation dph, ph, iph, bivdph, mdph, MPHstar, bivph, mph, biviph, miph
reg() PI regression ph, iph
MoE() MoE regression dph, ph, iph, bivdph, mdph, mph, miph
marginal() Parametrisation of marginal distribution bivdph, mdph, MPHstar, bivph, mph, biviph, miph
linCom() Parametrisation of a linear combination of marginals MPHstar, bivph
mean() Mean or vector of means dph, ph, bivdph, mdph, MPHstar, bivph, mph
var() Variance or covariance matrix dph, ph, bivdph, mdph, MPHstar, bivph, mph
cor() Correlation matrix bivdph, mdph, MPHstar, bivph, mph

References

  • Albrecher and Bladt, (2019) Albrecher, H. and Bladt, M. (2019). Inhomogeneous phase-type distributions and heavy tails. Journal of Applied Probability, 56(4):1044–1064.
  • Albrecher et al., (2022a) Albrecher, H., Bladt, M., Bladt, M., and Yslas, J. (2022a). Mortality modeling and regression with matrix distributions. Insurance: Mathematics and Economics, 107:68–87.
  • Albrecher et al., (2023) Albrecher, H., Bladt, M., and Müller, A. J. (2023). Joint lifetime modelling with matrix distributions. Dependence Modeling, 11(1).
  • Albrecher et al., (2022b) Albrecher, H., Bladt, M., and Yslas, J. (2022b). Fitting inhomogeneous phase-type distributions to data: The univariate and the multivariate case. Scandinavian Journal of Statistics, 49(1):44–77.
  • Asmussen et al., (1996) Asmussen, S., Nerman, O., and Olsson, M. (1996). Fitting phase-type distributions via the EM algorithm. Scandinavian Journal of Statistics, 23(4):419–441.
  • Bladt, (2022) Bladt, M. (2022). Phase-type distributions for claim severity regression modeling. ASTIN Bulletin: The Journal of the IAA, 52(2):417–448.
  • Bladt, (2023) Bladt, M. (2023). A tractable class of multivariate phase-type distributions for loss modeling. North American Actuarial Journal.
  • Bladt and Nielsen, (2017) Bladt, M. and Nielsen, B. F. (2017). Matrix-Exponential Distributions in Applied Probability, volume 81. Springer.
  • Bladt and Yslas, (2023a) Bladt, M. and Yslas, J. (2023a). matrixdist: Statistics for Matrix Distributions. R package version 1.1.9.
  • Bladt and Yslas, (2023b) Bladt, M. and Yslas, J. (2023b). Phase-type mixture-of-experts regression for loss severities. Scandinavian Actuarial Journal, 2023(4):303–329.
  • Bladt and Yslas, (2023c) Bladt, M. and Yslas, J. (2023c). Robust claim frequency modeling through phase-type mixture-of-experts regression. Insurance: Mathematics and Economics, 111:1–22.
  • Campillo-Navarro, (2018) Campillo-Navarro, A. (2018). Order statistics and multivariate discrete phase-type distributions. PhD thesis, DTU Lyngby.
  • Cox, (1972) Cox, D. R. (1972). Regression models and life-tables. Journal of the Royal Statistical Society: Series B (Methodological), 34(2):187–202.
  • Dutang et al., (2008) Dutang, C., Goulet, V., and Pigeon, M. (2008). actuar: An R package for actuarial science. Journal of Statistical Software, 25:1–37.
  • Kulkarni, (1989) Kulkarni, V. G. (1989). A new class of multivariate phase type distributions. Operations Research, 37(1):151–158.
  • Moler and Van Loan, (1978) Moler, C. and Van Loan, C. (1978). Nineteen dubious ways to compute the exponential of a matrix. SIAM Review, 20(4):801–836.
  • Okamura and Dohi, (2015) Okamura, H. and Dohi, T. (2015). mapfit: An R-based tool for PH/MAP parameter estimation. In Quantitative Evaluation of Systems: 12th International Conference, QEST 2015, Madrid, Spain, September 1-3, 2015, Proceedings 12, pages 105–112. Springer.
  • Olsson, (1996) Olsson, M. (1996). Estimation of phase-type distributions from censored data. Scandinavian Journal of Statistics, 23(4):443–460.
  • R Core Team, (2023) R Core Team (2023). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0.
  • Rivas-González et al., (2023) Rivas-González, I., Andersen, L. N., and Hobolth, A. (2023). PhaseTypeR: an R package for phase-type distributions in population genetics. Journal of Open Source Software, 8(82):5054.