
matrixdist: An R Package for Statistical Analysis of Matrix Distributions

Martin Bladt Department of Mathematical Sciences, University of Copenhagen, DK-2100 Copenhagen, Denmark [email protected] Alaric Mueller Faculty of Business and Economics, University of Lausanne, 1015 Lausanne, Switzerland [email protected]  and  Jorge Yslas Institute for Financial and Actuarial Mathematics, University of Liverpool, L69 7ZL Liverpool, UK [email protected]
Abstract.

The matrixdist R package provides a comprehensive suite of tools for the statistical analysis of matrix distributions, including phase-type, inhomogeneous phase-type, discrete phase-type, and related multivariate distributions. This paper introduces the package and its key features, including the estimation of these distributions and their extensions through expectation-maximisation algorithms, as well as the implementation of regression through the proportional intensities and mixture-of-experts models. Additionally, the paper provides an overview of the theoretical background, discusses the algorithms and methods implemented in the package, and offers practical examples to illustrate the application of matrixdist in real-world scenarios. The matrixdist R package aims to provide researchers and practitioners with a wide set of tools for analysing and modelling complex data using matrix distributions.

1. Introduction

In recent years, matrix distributions such as phase-type (PH), inhomogeneous phase-type (IPH), discrete phase-type (DPH), and related multivariate distributions have gained prominence in various fields due to their ability to seamlessly model complex or heterogeneous data. Matrix distributions can be employed to capture diverse behaviours and dependencies in real-world scenarios, making them particularly useful in areas such as survival analysis, actuarial science, queuing theory, and finance.

PH distributions (cf. Bladt and Nielsen, 2017) are a natural extension of the exponential distribution and are defined as the time to reach an absorbing state in an otherwise transient time-homogeneous pure-jump Markov process. These distributions possess closed-form formulas in terms of matrices for various functionals such as the density, cumulative distribution function, Laplace transform, and moments, and they can approximate any distribution on the positive real line arbitrarily well. Since only the time to absorption is observed and not the actual trajectory of the stochastic process, models based on PH distributions are hidden Markov models. Hence, they are typically estimated using the expectation-maximisation (EM) algorithm (cf. Asmussen et al., 1996). However, their tail behaviour remains exponential, which can be a limiting factor in applications that require a more accurate tail description. This shortcoming has led to the exploration of classes extending PH distributions, offering different tail behaviours while retaining PH-like properties. One such class comprises IPH distributions (Albrecher and Bladt, 2019), which can be defined equivalently as the law of a transformed PH distributed random variable or as the absorption time of a time-inhomogeneous pure-jump Markov process. The tail behaviour of the resulting IPH distribution can be as heavy or light as desired, depending on the chosen transformation. Moreover, for parametric transformations, an estimation procedure is outlined in Albrecher et al. (2022b). Several extensions of PH distributions to the multivariate setting have been introduced in the literature. The most general construction, the MPH* class, can be found in Kulkarni (1989); its definition considers marginals that depend on a single underlying Markov jump process but with different rewards in each state. Regarding multivariate extensions of the IPH class, Albrecher et al. (2022b) presented some alternatives that allow for estimation. Other extensions were considered in Albrecher et al. (2022b) and Bladt (2023).

For modelling count data, the DPH class (cf. Bladt and Nielsen, 2017) provides a tractable and flexible set of distributions, which also allows for estimation via an EM algorithm. Formally, DPH distributions are defined as the time to reach absorption in a discrete Markov chain with one absorbing state and all remaining states transient. However, they can also be viewed as an extension of the geometric distribution. The more general class of multivariate DPH distributions was introduced in Campillo-Navarro (2018), employing a construction principle similar to that of the MPH* class via rewards. More recently, Bladt and Yslas (2023c) presented other, more tractable classes of DPH distributions that offer similar versatility in modelling count data.

Matrix distributions have proven useful in various applications, motivating the development of regression models tailored to them. Two notable regression models for matrix distributions are the proportional intensities (PI) model and the mixture-of-experts (MoE) specification. The PI model, proposed by Albrecher et al. (2022a), extends the classical Cox proportional hazards model to the IPH setting. This model enables the incorporation of covariate information directly into the intensity matrix of the underlying Markov process. On the other hand, the MoE specification, originally introduced in Bladt and Yslas (2023b), incorporates the covariate information into the vector of initial probabilities and can easily be adapted to the multivariate case (cf. Albrecher et al., 2023).

This paper presents the matrixdist package (Bladt and Yslas, 2023a), designed for working with matrix distributions and implemented in R (R Core Team, 2023). The matrixdist package leverages the S4 class system in R to ensure a well-structured and organised implementation of the various distribution classes and their associated methods. By utilising S4 classes, the package offers a user-friendly interface for interacting with PH, IPH, DPH, and related multivariate and regression distributions. Users can create distribution objects with their respective parameters, and the package automatically manages the underlying data structures and computations. This design promotes code reusability, maintainability, and extensibility, making it easier to add new distribution classes and methods in the future while ensuring a consistent and coherent user experience. The object-oriented design is particularly convenient when dealing with large numbers of parameters during statistical estimation. Furthermore, most computationally intensive functions are implemented in C++ to enhance performance; these functions are made accessible in R via the Rcpp package, ensuring both speed and ease of use.

It is worth mentioning that matrixdist covers a substantially broader range of matrix-distribution functionality than existing packages. To provide context, we briefly summarise some alternative software tools for matrix distributions:

  • The R package actuar (Dutang et al., 2008) contains implementations for the density, cumulative distribution function, moments, and moment-generating function of univariate PH distributions.

  • The R package mapfit (Okamura and Dohi, 2015) offers some estimation methods for PH distributions and for Markovian Arrival Processes (MAP). The latter can loosely be seen as dependent concatenations of PH variables (arrivals).

  • The recent PhaseTypeR package (Rivas-González et al., 2023) includes implementations of reward transformations and functionals for some of the multivariate extensions of homogeneous PH and DPH distributions. However, no statistical estimation is considered.

The matrixdist package stands out by providing a comprehensive and efficient suite of tools covering univariate, multivariate, continuous, discrete, and regression matrix models, together with their statistical estimation. Right-censoring is also supported. The package's versatility, completeness, and speed make it well suited for researchers and practitioners working with matrix distributions.

The remainder of the paper is organised as follows. Section 2 provides the mathematical formulation of univariate DPH, PH and IPH distributions, their basic properties, and their regression extensions. Similar information for multivariate matrix distributions is given in Section 3, where the multivariate extensions of the IPH and DPH classes are introduced. A discussion of estimation methods for the introduced models via EM algorithms is provided in Section 4, and detailed illustrations of the use of the package are provided in Section 5. Finally, Section 6 gives a condensed summary of the methods available in matrixdist and shows how further information can be accessed within the package's documentation.

2. Univariate matrix distributions

2.1. Discrete phase-type distributions

Let $(Z_{n})_{n\in\mathbb{N}_{0}}$ be a discrete Markov chain (MC) on a finite state space $\{1,\dots,p,p+1\}$, where the first $p$ states are transient and state $p+1$ is absorbing. Then, the MC has transition matrix $\mathbf{P}$ given by

\mathbf{P}=\left(\begin{array}{cc}\mathbf{S}&\mathbf{s}\\ \bm{0}&1\end{array}\right)\,,

where $\mathbf{S}$ is a $p\times p$ matrix, known as the sub-transition matrix, and $\mathbf{s}$ is a column vector of dimension $p$. Since the rows of $\mathbf{P}$ sum to $1$, the relation $\mathbf{s}=\mathbf{e}-\mathbf{S}\mathbf{e}$ holds, with $\mathbf{e}=(1,\dots,1)^{\top}$ the $p$-dimensional column vector of ones. Furthermore, we assume that $(Z_{n})_{n\in\mathbb{N}_{0}}$ may start in any transient state with probability $\alpha_{k}=\mathbb{P}(Z_{0}=k)$, $k=1,\dots,p$, and let $\bm{\alpha}=(\alpha_{1},\dots,\alpha_{p})$. Then, the time until absorption $N=\inf\{n\geq 1\mid Z_{n}=p+1\}$ is said to be discrete phase-type (DPH) distributed with representation $(\bm{\alpha},\mathbf{S})$, and we write $N\sim\mbox{DPH}(\bm{\alpha},\mathbf{S})$ or $N\sim\mbox{DPH}_{p}(\bm{\alpha},\mathbf{S})$. Such a random variable has probability mass function $q$ and distribution function $Q$ given by

q(n)=\bm{\alpha}\mathbf{S}^{n-1}\mathbf{s}\,,\quad n\geq 1\,,
Q(n)=1-\bm{\alpha}\mathbf{S}^{n}\mathbf{e}\,,\quad n\geq 1\,.

The absorption time $N$ has $\kappa$-th factorial moments, $\kappa\in\mathbb{N}$, given by

\mathbb{E}(N(N-1)\cdots(N-\kappa+1))=\kappa!\,\bm{\alpha}\mathbf{S}^{\kappa-1}\left(\mathbf{I}-\mathbf{S}\right)^{-\kappa}\mathbf{e}\,.

In particular, $\mathbb{E}(N)=\bm{\alpha}\left(\mathbf{I}-\mathbf{S}\right)^{-1}\mathbf{e}$. Furthermore, its probability-generating function $P_{N}$ is given by

P_{N}(t)=\mathbb{E}(t^{N})=\bm{\alpha}\left(t^{-1}\mathbf{I}-\mathbf{S}\right)^{-1}\mathbf{s}\,,\quad|t|\leq 1\,,

where $\mathbf{I}$ is the identity matrix of appropriate dimension.
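
To make these formulas concrete, the following minimal sketch (with arbitrary illustrative parameters) evaluates the probability mass function and the mean both directly in base R and through the dph class of matrixdist.

library(matrixdist)
# Arbitrary DPH parameters: initial probabilities and sub-transition matrix
alpha <- c(0.5, 0.3, 0.2)
S <- matrix(c(0.1, 0.5, 0.2,
              0.0, 0.2, 0.6,
              0.3, 0.0, 0.4), nrow = 3, byrow = TRUE)
s <- rep(1, 3) - S %*% rep(1, 3)       # exit probabilities s = e - S e
N <- dph(alpha = alpha, S = S)
# q(n) = alpha S^(n-1) s, evaluated at n = 3
n <- 3
S_pow <- diag(3)
for (k in seq_len(n - 1)) S_pow <- S_pow %*% S
c(by_hand = as.numeric(alpha %*% S_pow %*% s), package = dens(N, n))
# E(N) = alpha (I - S)^(-1) e
c(by_hand = as.numeric(alpha %*% solve(diag(3) - S) %*% rep(1, 3)), package = mean(N))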

Let $N_{1}\sim\mbox{DPH}_{p_{1}}(\bm{\alpha}_{1},\mathbf{S}_{1})$ and $N_{2}\sim\mbox{DPH}_{p_{2}}(\bm{\alpha}_{2},\mathbf{S}_{2})$ be two independent DPH distributed random variables. Then,

N_{1}+N_{2}\sim\mbox{DPH}_{p_{1}+p_{2}}\left(\left(\bm{\alpha}_{1},\bm{0}\right),\left(\begin{array}{cc}\mathbf{S}_{1}&\mathbf{s}_{1}\bm{\alpha}_{2}\\ \bm{0}&\mathbf{S}_{2}\end{array}\right)\right)\,,
\min\left(N_{1},N_{2}\right)\sim\mbox{DPH}_{p_{1}p_{2}}\left(\bm{\alpha}_{1}\otimes\bm{\alpha}_{2},\mathbf{S}_{1}\otimes\mathbf{S}_{2}\right)\,,

and

\max\left(N_{1},N_{2}\right)\sim\mbox{DPH}_{p_{1}p_{2}+p_{1}+p_{2}}\left(\left(\bm{\alpha}_{1}\otimes\bm{\alpha}_{2},\bm{0},\bm{0}\right),\left(\begin{array}{ccc}\mathbf{S}_{1}\otimes\mathbf{S}_{2}&\mathbf{S}_{1}\otimes\mathbf{s}_{2}&\mathbf{s}_{1}\otimes\mathbf{S}_{2}\\ \bm{0}&\mathbf{S}_{1}&\bm{0}\\ \bm{0}&\bm{0}&\mathbf{S}_{2}\end{array}\right)\right)\,,

where $\otimes$ and $\oplus$ denote the Kronecker product and sum, respectively. Furthermore, if $U\sim\mbox{Bernoulli}(\nu)$, $\nu\in[0,1]$, then

UN_{1}+(1-U)N_{2}\sim\mbox{DPH}_{p_{1}+p_{2}}\left(\left(\nu\bm{\alpha}_{1}\,,(1-\nu)\bm{\alpha}_{2}\right),\left(\begin{array}{cc}\mathbf{S}_{1}&\bm{0}\\ \bm{0}&\mathbf{S}_{2}\end{array}\right)\right)\,.

Thus, the DPH class is closed under addition, minima, maxima, and finite mixtures. We refer to Bladt and Nielsen (2017) for a detailed description of DPH distributions.
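
As a quick sanity check of the closure under minima, the following base-R sketch (again with arbitrary illustrative parameters) builds the representation of $\min(N_{1},N_{2})$ via Kronecker products by hand and verifies that its survival function equals the product of the marginal survival functions; the package exposes analogous constructions directly through the methods +, minimum(), maximum(), and mixture() listed in Section 6.

library(matrixdist)
a1 <- c(0.7, 0.3); S1 <- matrix(c(0.4, 0.3, 0.1, 0.5), 2, 2, byrow = TRUE)
a2 <- c(0.2, 0.8); S2 <- matrix(c(0.6, 0.1, 0.2, 0.3), 2, 2, byrow = TRUE)
N1 <- dph(alpha = a1, S = S1)
N2 <- dph(alpha = a2, S = S2)
# Representation of min(N1, N2) built via Kronecker products
Nmin <- dph(alpha = as.vector(kronecker(a1, a2)), S = kronecker(S1, S2))
n <- 4
(1 - cdf(N1, n)) * (1 - cdf(N2, n))   # P(N1 > n) P(N2 > n)
1 - cdf(Nmin, n)                      # should coincide with the line above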

Regression

Recently, a regression model for frequency modelling based on DPH distributions was introduced in Bladt and Yslas (2023c) as a generalisation of mixture-of-experts (MoE) models. The main idea is to incorporate the covariate information into the vector of initial probabilities, which plays the role of the expert weights. The resulting distributional model is flexible, as it is dense in a broad class of regression models satisfying mild technical conditions. Next, we present the fundamental concepts of this regression model. Define the mapping

\bm{\alpha}:D\subset\mathbb{R}^{h}\rightarrow\Delta^{p-1},

where $\Delta^{p-1}=\{(\alpha_{1},\dots,\alpha_{p})\in\mathbb{R}^{p}\mid\sum_{k}\alpha_{k}=1,\;\alpha_{k}\geq 0\;\forall k\}$ is the standard $(p-1)$-simplex. Then, for any given vector of covariate information $\mathbf{X}=(X_{1},\dots,X_{h})\in\mathbb{R}^{h}$, the initial probabilities of the underlying MC of a DPH distribution are taken to be $\mathbb{P}(Z_{0}=k)=\alpha_{k}(\mathbf{X}):=(\bm{\alpha}(\mathbf{X}))_{k}$, $k=1,\dots,p$. Thus, we say that $N\mid\mathbf{X}\sim\mbox{DPH}(\bm{\alpha}(\mathbf{X}),\mathbf{S})$ follows the DPH-MoE specification. For $D=\mathbb{R}^{h}$, a convenient parametrisation of the initial probabilities is the softmax parametrisation given by

\alpha_{k}(\mathbf{X};\bm{\gamma})=\frac{\exp\left(\mathbf{X}\bm{\gamma}_{k}\right)}{\sum_{j=1}^{p}\exp\left(\mathbf{X}\bm{\gamma}_{j}\right)}\,,\quad k=1,\dots,p\,,

where $\bm{\gamma}_{k}\in\bar{\mathbb{R}}^{h}$ and $\bm{\gamma}=(\bm{\gamma}_{1}^{\top},\dots,\bm{\gamma}_{p}^{\top})^{\top}\in\bar{\mathbb{R}}^{p\times h}$. More details about this model, as well as precise denseness considerations, can be found in Bladt and Yslas (2023c). We also refer to Bladt and Yslas (2023b) for an extension of this regression model to continuous phase-type distributions.
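
The softmax parametrisation is straightforward to reproduce in base R; the short sketch below (with arbitrary coefficients gamma and covariate vector x, whose first entry plays the role of an intercept) returns a valid vector of initial probabilities.

# Softmax mapping from covariates to initial probabilities (base R sketch)
softmax_alpha <- function(x, gamma) {
  # gamma: p x h matrix of regression coefficients; x: covariate vector of length h
  eta <- as.numeric(gamma %*% x)
  exp(eta) / sum(exp(eta))
}
gamma <- matrix(c(0.5, -1.0,
                  0.0,  0.3,
                  1.2,  0.1), nrow = 3, byrow = TRUE)   # p = 3 phases, h = 2 covariates
x <- c(1, 0.7)                                          # first entry acts as an intercept
softmax_alpha(x, gamma)                                 # sums to one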

2.2. Continuous phase-type distributions

Let $(J_{t})_{t\geq 0}$ be a time-homogeneous Markov jump process on a finite state space $\{1,\dots,p,p+1\}$, where states $1,\dots,p$ are transient and state $p+1$ is absorbing. Then, the intensity matrix $\bm{\Lambda}$ of $(J_{t})_{t\geq 0}$ is of the form

\bm{\Lambda}=\left(\begin{array}{cc}\mathbf{S}&\mathbf{s}\\ \bm{0}&0\end{array}\right)\,,

where $\mathbf{S}$ is a $p\times p$ matrix, called a sub-intensity matrix, and $\mathbf{s}$ is a $p$-dimensional column vector. Since every row of $\bm{\Lambda}$ sums to zero, it follows that $\mathbf{s}=-\mathbf{S}\,\mathbf{e}$. Assume that the process starts in the transient space with probabilities $\alpha_{k}=\mathbb{P}(J_{0}=k)$, $k=1,\dots,p$, and let $\bm{\alpha}=(\alpha_{1},\dots,\alpha_{p})$. Here it is assumed that the probability of starting in the absorbing state $p+1$ is zero, i.e., $\mathbb{P}(J_{0}=p+1)=0$. Then, we say that the time until absorption $Y$ has a phase-type (PH) distribution with representation $(\bm{\alpha},\mathbf{S})$, and we write $Y\sim\mbox{PH}(\bm{\alpha},\mathbf{S})$ or $Y\sim\mbox{PH}_{p}(\bm{\alpha},\mathbf{S})$. The density $f$ and distribution function $F$ of $Y\sim\mbox{PH}(\bm{\alpha},\mathbf{S})$ are given by

f(y)=\bm{\alpha}\exp\left(\mathbf{S}y\right)\mathbf{s}\,,\quad y>0\,,
F(y)=1-\bm{\alpha}\exp\left(\mathbf{S}y\right)\mathbf{e}\,,\quad y>0\,,

where the exponential of a matrix $\mathbf{A}$ is defined by

\exp\left(\mathbf{A}\right)=\sum_{n=0}^{\infty}\frac{\mathbf{A}^{n}}{n!}\,.

The efficient computation of the matrix exponential is not a straightforward task in high dimensions, and we refer to Moler and Van Loan (1978) for a survey of different methods. The moments of $Y\sim\mbox{PH}(\bm{\alpha},\mathbf{S})$ are given by

\mathbb{E}\left(Y^{\theta}\right)=\Gamma\left(1+\theta\right)\bm{\alpha}\left(-\mathbf{S}\right)^{-\theta}\mathbf{e}\,,\quad\theta>0\,.

Moreover, its Laplace transform $L_{Y}$ is given by

L_{Y}(u)=\mathbb{E}\left(\exp(-uY)\right)=\bm{\alpha}\left(u\mathbf{I}-\mathbf{S}\right)^{-1}\mathbf{s}\,,\quad u\geq 0\,.
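
As a small numerical illustration of these formulas, the hedged sketch below evaluates the PH density by truncating the series defining the matrix exponential (adequate only for small, well-behaved matrices) and compares the result with the dens() method; it assumes that ph() accepts alpha and S arguments in the same way as its discrete counterpart dph().

library(matrixdist)
alpha <- c(0.4, 0.6)
S <- matrix(c(-2,  1,
               0, -3), 2, 2, byrow = TRUE)
s <- -S %*% rep(1, 2)                  # exit rates s = -S e
# Truncated Taylor series for exp(A); only meant for small, well-conditioned matrices
expm_taylor <- function(A, terms = 50) {
  out <- diag(nrow(A)); term <- diag(nrow(A))
  for (n in 1:terms) {
    term <- term %*% A / n
    out <- out + term
  }
  out
}
y <- 1.5
c(by_hand = as.numeric(alpha %*% expm_taylor(S * y) %*% s),
  package = dens(ph(alpha = alpha, S = S), y))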

Let $Y_{1}\sim\mbox{PH}_{p_{1}}(\bm{\alpha}_{1},\mathbf{S}_{1})$ and $Y_{2}\sim\mbox{PH}_{p_{2}}(\bm{\alpha}_{2},\mathbf{S}_{2})$ be independent PH distributed random variables. Then

Y_{1}+Y_{2}\sim\mbox{PH}_{p_{1}+p_{2}}\left(\left(\bm{\alpha}_{1},\bm{0}\right),\left(\begin{array}{cc}\mathbf{S}_{1}&\mathbf{s}_{1}\bm{\alpha}_{2}\\ \bm{0}&\mathbf{S}_{2}\end{array}\right)\right)\,,
\min\left(Y_{1},Y_{2}\right)\sim\mbox{PH}_{p_{1}p_{2}}\left(\bm{\alpha}_{1}\otimes\bm{\alpha}_{2},\mathbf{S}_{1}\oplus\mathbf{S}_{2}\right)\,,

and

\max\left(Y_{1},Y_{2}\right)\sim\mbox{PH}_{p_{1}p_{2}+p_{1}+p_{2}}\left(\left(\bm{\alpha}_{1}\otimes\bm{\alpha}_{2},\bm{0},\bm{0}\right),\left(\begin{array}{ccc}\mathbf{S}_{1}\oplus\mathbf{S}_{2}&\mathbf{I}\otimes\mathbf{s}_{2}&\mathbf{s}_{1}\otimes\mathbf{I}\\ \bm{0}&\mathbf{S}_{1}&\bm{0}\\ \bm{0}&\bm{0}&\mathbf{S}_{2}\end{array}\right)\right)\,.

This means that the PH class is closed under addition, minima, and maxima. In fact, it is closed under any order statistic (cf. Bladt and Nielsen, 2017, for the exact and more involved expression). Moreover, the class is closed under finite mixtures. Concretely, if $U\sim\mbox{Bernoulli}(\nu)$, $\nu\in[0,1]$, then

UY_{1}+(1-U)Y_{2}\sim\mbox{PH}_{p_{1}+p_{2}}\left(\left(\nu\bm{\alpha}_{1},(1-\nu)\bm{\alpha}_{2}\right),\left(\begin{array}{cc}\mathbf{S}_{1}&\bm{0}\\ \bm{0}&\mathbf{S}_{2}\end{array}\right)\right)\,.

Furthermore, when the mixing probabilities are given by a DPH distribution, the resulting distribution is again PH. More specifically, let $N\sim\mbox{DPH}_{q}(\bm{\pi},\mathbf{T})$ and let $Y_{1},Y_{2},\dots$ be iid, independent of $N$, with common distribution $\mbox{PH}_{p}(\bm{\alpha},\mathbf{S})$. Then

\sum_{n=1}^{N}Y_{n}\sim\mbox{PH}_{qp}(\bm{\pi}\otimes\bm{\alpha},\mathbf{I}\otimes\mathbf{S}+\mathbf{T}\otimes\mathbf{s}\bm{\alpha})\,.

We refer to Bladt and Nielsen (2017) for a comprehensive account of PH distributions.
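
The compound representation above can be checked numerically through Wald's identity: the mean computed from the representation $(\bm{\pi}\otimes\bm{\alpha},\mathbf{I}\otimes\mathbf{S}+\mathbf{T}\otimes\mathbf{s}\bm{\alpha})$ must equal $\mathbb{E}(N)\mathbb{E}(Y)$. The base-R sketch below does so for arbitrary illustrative parameters.

pi_vec <- c(0.6, 0.4)                                       # DPH initial probabilities
T_mat  <- matrix(c(0.3, 0.4, 0.1, 0.5), 2, 2, byrow = TRUE) # DPH sub-transition matrix
a_vec  <- c(0.5, 0.5)                                       # PH initial probabilities
S_mat  <- matrix(c(-3, 1, 0, -2), 2, 2, byrow = TRUE)       # PH sub-intensity matrix
s_vec  <- -S_mat %*% rep(1, 2)
# Sub-intensity matrix and initial vector of the compound sum
S_comp <- kronecker(diag(2), S_mat) + kronecker(T_mat, s_vec %*% t(a_vec))
alpha_comp <- as.numeric(kronecker(pi_vec, a_vec))
mean_compound <- as.numeric(alpha_comp %*% solve(-S_comp) %*% rep(1, 4))
mean_N <- as.numeric(pi_vec %*% solve(diag(2) - T_mat) %*% rep(1, 2))
mean_Y <- as.numeric(a_vec %*% solve(-S_mat) %*% rep(1, 2))
c(compound = mean_compound, wald = mean_N * mean_Y)          # the two values should agree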

By decomposing the matrix $\mathbf{S}$ into its Jordan normal form, one sees that the right tail of a PH distribution is asymptotically exponential. This limitation motivates extensions of the PH class that can accommodate diverse tail behaviours. One such extension is the class of inhomogeneous phase-type distributions, presented in the following section.

2.3. Inhomogeneous phase-type distributions

In Albrecher and Bladt (2019), the class of inhomogeneous phase-type distributions was introduced by allowing the Markov jump process in the construction of PH distributions to be time-inhomogeneous. In this way, $(J_{t})_{t\geq 0}$ has an intensity matrix of the form

\bm{\Lambda}(t)=\left(\begin{array}{cc}\mathbf{S}(t)&\mathbf{s}(t)\\ \bm{0}&0\end{array}\right)\,,

where $\mathbf{s}(t)=-\mathbf{S}(t)\,\mathbf{e}$. As before, the vector of initial probabilities is $\bm{\alpha}=(\alpha_{1},\dots,\alpha_{p})$. Then, we say that the time until absorption $Y$ has an inhomogeneous phase-type (IPH) distribution with representation $(\bm{\alpha},\mathbf{S}(t))$, and we write $Y\sim\mbox{IPH}(\bm{\alpha},\mathbf{S}(t))$. However, this setting is too general for applications, since its functionals are given in terms of product integrals. Thus, we consider the subclass of IPH distributions with $\mathbf{S}(t)$ of the form $\mathbf{S}(t)=\lambda(t)\mathbf{S}$, where $\lambda$ is a known non-negative real function and $\mathbf{S}$ is a sub-intensity matrix. In this case, we write $Y\sim\mbox{IPH}(\bm{\alpha},\mathbf{S},\lambda)$. Note that with $\lambda\equiv 1$, one recovers the conventional PH distributions. This subclass is particularly suitable for applications due to the following property. If $Y\sim\mbox{IPH}(\bm{\alpha},\mathbf{S},\lambda)$, then there exists a function $g$ such that

Y\sim g(\tau)\,,

where $\tau\sim\mbox{PH}(\bm{\alpha},\mathbf{S})$. The relationship between $g$ and $\lambda$ is given by

g^{-1}(y)=\int_{0}^{y}\lambda(t)\,dt\,.

The density $f$ and distribution function $F$ of $Y\sim\mbox{IPH}(\bm{\alpha},\mathbf{S},\lambda)$ can again be expressed using matrix exponentials and are given by

f(y)=\lambda(y)\,\bm{\alpha}\exp\left(\int_{0}^{y}\lambda(t)\,dt\ \mathbf{S}\right)\mathbf{s}\,,\quad y>0\,,
F(y)=1-\bm{\alpha}\exp\left(\int_{0}^{y}\lambda(t)\,dt\ \mathbf{S}\right)\mathbf{e}\,,\quad y>0\,.

Despite being defined in terms of matrix exponentials, the class of IPH distributions is no longer confined to exponential tails. Instead, the function $\lambda$ determines the exact asymptotic behaviour (see Albrecher et al., 2022a, for details).

It turns out that a number of IPH distributions can be expressed as classical distributions with matrix-valued parameters, properly defined using functional calculus. Table 2.1 lists the IPH transformations implemented in the matrixdist package (see Albrecher and Bladt, 2019; Albrecher et al., 2022b; Albrecher et al., 2022a, for further information and the rationale behind the different parametrisations).

                       $\lambda(t)$                                          $g(y)$                             Parameter domain
Matrix-Pareto          $(t+\beta)^{-1}$                                      $\beta\left(\exp(y)-1\right)$      $\beta>0$
Matrix-Weibull         $\beta t^{\beta-1}$                                   $y^{1/\beta}$                      $\beta>0$
Matrix-Lognormal       $\gamma(\log(t+1))^{\gamma-1}/(t+1)$                  $\exp(y^{1/\gamma})-1$             $\gamma>1$
Matrix-Loglogistic     $\theta t^{\theta-1}/(t^{\theta}+\gamma^{\theta})$    $\gamma(\exp(y)-1)^{1/\theta}$     $\gamma,\theta>0$
Matrix-Gompertz        $\exp(\beta t)$                                       $\log(\beta y+1)/\beta$            $\beta>0$
Matrix-GEV             --                                                    $\mu+\sigma(y^{-\xi}-1)/\xi$       $\mu\in\mathbb{R}$, $\sigma>0$, $\xi\in\mathbb{R}$
Table 2.1. Transformations implemented in matrixdist.
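
To illustrate the relation $Y\sim g(\tau)$ for the matrix-Pareto row of Table 2.1, the hedged sketch below (illustrative parameters, and assuming the parametrisation of the table so that $g^{-1}(y)=\log(y/\beta+1)$) checks that the distribution function of an iph object with gfun = "pareto" agrees with that of the underlying PH variable evaluated at $g^{-1}(y)$; it also assumes that ph() accepts alpha and S arguments analogously to iph().

library(matrixdist)
alpha <- c(0.5, 0.5)
S <- matrix(c(-1.0, 0.5,
               0.2, -2.0), 2, 2, byrow = TRUE)
beta <- 2
tau <- ph(alpha = alpha, S = S)                                   # underlying PH variable
Y <- iph(alpha = alpha, S = S, gfun = "pareto", gfun_pars = beta) # matrix-Pareto
y <- 3
cdf(Y, y)                      # F_Y(y)
cdf(tau, log(y / beta + 1))    # F_tau(g^{-1}(y)); should coincide with the line above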

Regression

At least two regression models based on IPH distributions exist in the literature. The first, known as the proportional intensities (PI) model, was introduced in Albrecher et al. (2022a). The authors, inspired by the proportional hazards model (cf. Cox, 1972), formulated a way of regressing the intensity function $\lambda$ on covariate information. More specifically, the main idea of the PI model is that the inhomogeneous intensity $\lambda(\cdot;\bm{\theta})$, assumed to be fully determined by a parameter vector $\bm{\theta}$, is affected proportionally by the covariate information $\mathbf{X}\in\mathbb{R}^{h}$ through a positive, measurable function $m(\cdot)$. In other words, the model assumes that

\lambda(t;\mathbf{X},\bm{\theta},\bm{\gamma})=\lambda(t;\bm{\theta})\,m(\mathbf{X}\bm{\gamma})\,,\quad t\geq 0\,,

where $\bm{\gamma}$ is an $h$-dimensional column vector of regression coefficients. With this assumption, $Y\mid\mathbf{X}\sim\mbox{IPH}(\bm{\alpha},\mathbf{S},\lambda(\cdot;\mathbf{X},\bm{\theta},\bm{\gamma}))$ with corresponding density and distribution functions

f(y)=m(\mathbf{X}\bm{\gamma})\,\lambda(y;\bm{\theta})\,\bm{\alpha}\exp\left(m(\mathbf{X}\bm{\gamma})\int_{0}^{y}\lambda(t;\bm{\theta})\,dt\ \mathbf{S}\right)\mathbf{s}\,,

and

F(y)=1-\bm{\alpha}\exp\left(m(\mathbf{X}\bm{\gamma})\int_{0}^{y}\lambda(t;\bm{\theta})\,dt\ \mathbf{S}\right)\mathbf{e}\,.

Typically, $m(\cdot)$ is taken to be $m(x)=\exp(x)$, which is the default in matrixdist. In this framework, when $p=1$, one recovers the proportional hazards model. For $p>1$, the implied hazard functions of different subgroups can deviate from proportionality in the body of the distribution but are asymptotically proportional in the tail. For more details on the PI model, we refer the reader to Albrecher et al. (2022a) and Bladt (2022).
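
A compact way to see the PI mechanism at work is to note that, under the default link $m(x)=\exp(x)$, the covariates simply rescale the sub-intensity matrix. The hedged sketch below (arbitrary illustrative parameters and a matrix-Pareto baseline) builds the conditional distribution of $Y\mid\mathbf{X}$ in this way; the reg() method illustrated in Section 5 estimates such models from data.

library(matrixdist)
alpha <- c(0.5, 0.5)
S <- matrix(c(-1.0, 0.4,
               0.3, -2.0), 2, 2, byrow = TRUE)
beta  <- 1.5                      # matrix-Pareto inhomogeneity parameter
gamma <- c(0.8, -0.5)             # regression coefficients
X <- c(1, 0.3)                    # covariates of one individual
m <- exp(sum(X * gamma))          # proportionality factor m(X gamma)
Y_given_X <- iph(alpha = alpha, S = S * m, gfun = "pareto", gfun_pars = beta)
cdf(Y_given_X, 2, lower.tail = FALSE)   # conditional survival probability at y = 2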

The second regression model, called the phase-type mixture-of-experts (PH-MoE) model, was introduced in Bladt and Yslas (2023b) as a generalisation of mixture-of-experts models to the PH framework; it follows the same rationale as the DPH-MoE specification described above (cf. Bladt and Yslas, 2023b, for further details).

3. Multivariate matrix distributions

In this section, we present the mathematical foundations of the different classes of multivariate matrix distributions implemented in matrixdist.

3.1. Multivariate discrete phase-type distributions

MDPH* class

Let $N\sim\mbox{DPH}_{p}(\bm{\alpha},\mathbf{S})$ be a DPH distributed random variable with underlying Markov chain $(Z_{n})_{n\in\mathbb{N}_{0}}$. Let $\mathbf{r}_{j}=(r_{j}(1),\dots,r_{j}(p))^{\top}$, $j=1,\dots,d$, be column vectors of dimension $p$ taking values in $\mathbb{N}_{0}^{p}$, and let $\mathbf{R}=(\mathbf{r}_{1},\dots,\mathbf{r}_{d})$ be a $p\times d$ matrix, called the reward matrix. We define

N^{(j)}=\sum_{n=0}^{N}r_{j}(Z_{n})\,,

for all $j=1,\dots,d$. Then, we say that the random vector $\mathbf{N}=(N^{(1)},\dots,N^{(d)})$ has a multivariate discrete phase-type distribution of the MDPH* type, and we write $\mathbf{N}\sim\mbox{MDPH*}(\bm{\alpha},\mathbf{S},\mathbf{R})$. This class of distributions was introduced in Campillo-Navarro (2018), and the construction principle can be seen as an extension of the so-called transformation via rewards for univariate DPH distributions. In that work, the author showed that elements of this class possess explicit expressions for the joint probability-generating function, joint moment-generating function, and joint moments. Moreover, the class is dense in the class of distributions with support in $\mathbb{N}^{d}$, and its margins are DPH distributed. However, general closed-form expressions for the joint density and distribution functions are not available, and the estimation of this general class remains an open question, which limits its use in practice. Nonetheless, there are subclasses of MDPH* distributions with explicit expressions for these two functionals that preserve the denseness property of the MDPH* class. The matrixdist package provides implementations for two of these subclasses, which we describe next.

fMDPH class

Let $\mathbf{N}=(N^{(1)},N^{(2)})\sim\mbox{MDPH*}(\bm{\alpha},\mathbf{S},\mathbf{R})$ with

\bm{\alpha}=\left(\bm{\pi},\bm{0}\right)\,,\quad\mathbf{S}=\left(\begin{array}{cc}\mathbf{S}_{11}&\mathbf{S}_{12}\\ \bm{0}&\mathbf{S}_{22}\end{array}\right)\,,\quad\text{and}\quad\mathbf{R}=\left(\begin{array}{cc}\mathbf{e}&\bm{0}\\ \bm{0}&\mathbf{e}\end{array}\right)\,.

Here, $\mathbf{S}_{11}$ and $\mathbf{S}_{22}$ are sub-transition matrices of dimensions $p_{1}$ and $p_{2}$ ($p_{1}+p_{2}=p$), respectively, $\mathbf{S}_{12}$ is a $p_{1}\times p_{2}$ matrix satisfying $\mathbf{S}_{11}\mathbf{e}+\mathbf{S}_{12}\mathbf{e}=\mathbf{e}$, and $\bm{\pi}$ is a $p_{1}$-dimensional vector of initial probabilities. Then, one can show that the joint density of this random vector is given by

f_{\mathbf{N}}(n^{(1)},n^{(2)})=\bm{\pi}\mathbf{S}_{11}^{n^{(1)}-1}\mathbf{S}_{12}\mathbf{S}_{22}^{n^{(2)}-1}\mathbf{s}_{2}\,,

where $\mathbf{s}_{2}=\mathbf{e}-\mathbf{S}_{22}\mathbf{e}$. In this case, we say that $\mathbf{N}$ is bivariate discrete phase-type distributed of the feed-forward type, and we write $\mathbf{N}\sim\mbox{fMDPH}(\bm{\pi},\mathbf{S}_{11},\mathbf{S}_{12},\mathbf{S}_{22})$. In particular, the marginals are $N^{(1)}\sim\mbox{DPH}(\bm{\pi},\mathbf{S}_{11})$ and $N^{(2)}\sim\mbox{DPH}(\bm{\pi}\left(\mathbf{I}-\mathbf{S}_{11}\right)^{-1}\mathbf{S}_{12},\mathbf{S}_{22})$. For more details on this class of bivariate DPH distributions, we refer the reader to Bladt and Yslas (2023c). Although the above construction principle can be extended to higher dimensions, the implementation of the associated methods becomes more challenging. Fortunately, another subclass of MDPH* distributions introduced in Bladt and Yslas (2023c) can easily be handled and implemented in higher dimensions, as we describe next.
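
The joint density above is straightforward to evaluate directly; the base-R sketch below does so for arbitrary illustrative parameters chosen so that $\mathbf{S}_{11}\mathbf{e}+\mathbf{S}_{12}\mathbf{e}=\mathbf{e}$ (the bivdph class of the package provides this functionality natively).

# Base-R evaluation of the fMDPH joint density f(n1, n2) = pi S11^(n1-1) S12 S22^(n2-1) s2
fmdph_dens <- function(n1, n2, pi0, S11, S12, S22) {
  matpow <- function(M, k) {            # M^k with M^0 = I
    out <- diag(nrow(M))
    if (k > 0) for (i in 1:k) out <- out %*% M
    out
  }
  s2 <- rep(1, nrow(S22)) - S22 %*% rep(1, nrow(S22))
  as.numeric(pi0 %*% matpow(S11, n1 - 1) %*% S12 %*% matpow(S22, n2 - 1) %*% s2)
}
pi0 <- c(0.6, 0.4)
S11 <- matrix(c(0.2, 0.3, 0.1, 0.4), 2, 2, byrow = TRUE)
S12 <- matrix(c(0.25, 0.25, 0.3, 0.2), 2, 2, byrow = TRUE)   # rows of (S11, S12) sum to one
S22 <- matrix(c(0.3, 0.2, 0.1, 0.5), 2, 2, byrow = TRUE)
fmdph_dens(2, 3, pi0, S11, S12, S22)
# Sanity check: the total mass over a large grid should be close to one
sum(outer(1:60, 1:60, Vectorize(function(a, b) fmdph_dens(a, b, pi0, S11, S12, S22))))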

mDPH class

Let $(Z_{n}^{(j)})_{n\in\mathbb{N}_{0}}$, $j=1,\dots,d$, be Markov chains on a common state space with $p$ transient states. All chains are assumed to start in the same state and then evolve independently until their respective absorptions. Formally,

Z_{0}^{(j)}=Z_{0}^{(l)}\,,\quad(Z_{n}^{(j)})_{n\in\mathbb{N}_{0}}\perp\!\!\!\perp_{Z_{0}^{(1)}}(Z_{n}^{(l)})_{n\in\mathbb{N}_{0}}\,,\quad l\neq j\,,\quad\forall j,l\in\{1,\dots,d\}\,.

If $N^{(j)}$, $j=1,\dots,d$, are the univariate DPH distributed absorption times of $(Z_{n}^{(j)})_{n\in\mathbb{N}_{0}}$, we say that the vector $\mathbf{N}=(N^{(1)},\dots,N^{(d)})$ has a multivariate discrete phase-type distribution of the mDPH type, and we write

\mathbf{N}\sim\mbox{mDPH}(\bm{\alpha},\widetilde{S})\,,

where $\bm{\alpha}$ is the common vector of initial probabilities and $\widetilde{S}=\{\mathbf{S}_{1},\dots,\mathbf{S}_{d}\}$, with $\mathbf{S}_{j}$ the sub-transition matrix associated with $(Z_{n}^{(j)})_{n\in\mathbb{N}_{0}}$, $j=1,\dots,d$. Explicit expressions for several functionals of interest can easily be obtained by conditioning on the common starting state of the Markov chains. For instance, the joint density function of $\mathbf{N}\sim\mbox{mDPH}(\bm{\alpha},\widetilde{S})$ is given by

f_{\mathbf{N}}(\mathbf{n})=\sum_{k=1}^{p}\alpha_{k}\prod_{j=1}^{d}\mathbf{e}_{k}^{\top}\mathbf{S}_{j}^{n^{(j)}-1}\mathbf{s}_{j}\,,\quad\mathbf{n}\in\mathbb{N}^{d}\,.
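
A direct base-R implementation of this joint density, conditioning on the common starting state, could look as follows (illustrative parameters only; the mdph class of the package provides the same functionality).

# Base-R evaluation of the mDPH joint density by conditioning on the starting state
mdph_dens <- function(n, alpha, S_list) {
  matpow <- function(M, k) {
    out <- diag(nrow(M))
    if (k > 0) for (i in 1:k) out <- out %*% M
    out
  }
  p <- length(alpha)
  total <- 0
  for (k in 1:p) {                         # condition on Z_0 = k
    term <- alpha[k]
    for (j in seq_along(S_list)) {
      Sj <- S_list[[j]]
      sj <- rep(1, p) - Sj %*% rep(1, p)
      ek <- replace(numeric(p), k, 1)      # k-th unit vector
      term <- term * as.numeric(ek %*% matpow(Sj, n[j] - 1) %*% sj)
    }
    total <- total + term
  }
  total
}
alpha <- c(0.5, 0.5)
S1 <- matrix(c(0.3, 0.2, 0.1, 0.6), 2, 2, byrow = TRUE)
S2 <- matrix(c(0.4, 0.1, 0.2, 0.5), 2, 2, byrow = TRUE)
mdph_dens(c(2, 3), alpha, list(S1, S2))    # P(N^(1) = 2, N^(2) = 3)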

3.2. Multivariate continuous phase-type distributions

We now present the MPH*, fMPH, and mPH classes of distributions, which follow construction principles similar to those of their discrete counterparts.

Let $\tau\sim\mbox{PH}_{p}(\bm{\alpha},\mathbf{S})$ be a PH distributed random variable, let $\mathbf{r}_{j}=(r_{j}(1),\dots,r_{j}(p))^{\top}$, $j=1,\dots,d$, be nonnegative column vectors of dimension $p$, and let $\mathbf{R}=(\mathbf{r}_{1},\dots,\mathbf{r}_{d})$ be a $p\times d$ reward matrix. The vectors $\mathbf{r}_{j}$ represent the rates at which a reward is accumulated while $(J_{t})_{t\geq 0}$ is in a given state. Thus, we denote by

Y^{(j)}=\int_{0}^{\tau}r_{j}(J_{t})\,dt\,,

the total reward earned prior to absorption by each component $j$, $j=1,\dots,d$. Then, we say that the random vector $\mathbf{Y}=(Y^{(1)},\dots,Y^{(d)})$ has a multivariate phase-type distribution of the MPH* type, and we write $\mathbf{Y}\sim\mbox{MPH*}(\bm{\alpha},\mathbf{S},\mathbf{R})$ (see Kulkarni, 1989; Bladt and Nielsen, 2017; Albrecher et al., 2022b, for more details). One attractive characteristic of this class of multivariate distributions is that, for a given nonnegative vector $\mathbf{w}=(w_{1},\dots,w_{d})$, the linear combination $\langle\mathbf{Y},\mathbf{w}\rangle=\sum_{j=1}^{d}w_{j}Y^{(j)}$ is PH distributed, with representation $(\bm{\alpha}_{\mathbf{w}},\mathbf{S}_{\mathbf{w}})$ say. To obtain the exact parametrisation, we split the state space $\{1,\dots,p\}=E_{+}\cup E_{0}$ such that $k\in E_{+}$ if $(\mathbf{R}\mathbf{w}^{\top})_{k}>0$ and $k\in E_{0}$ if $(\mathbf{R}\mathbf{w}^{\top})_{k}=0$. Then, by reordering the states, we may assume that $\bm{\alpha}$ and $\mathbf{S}$ can be written as

\bm{\alpha}=\left(\begin{array}{cc}\bm{\alpha}^{+}\,,&\bm{\alpha}^{0}\end{array}\right)\quad\text{and}\quad\mathbf{S}=\left(\begin{array}{cc}\mathbf{S}^{++}&\mathbf{S}^{+0}\\ \mathbf{S}^{0+}&\mathbf{S}^{00}\end{array}\right)\,,

where the $+$ terms correspond to states in $E_{+}$, and the $0$ terms to states in $E_{0}$. For instance, $\mathbf{S}^{0+}$ collects the transition intensities by which $(J_{t})_{t\geq 0}$ jumps from states in $E_{0}$ to states in $E_{+}$. After this arrangement, the distribution of the continuous part of $\langle\mathbf{Y},\mathbf{w}\rangle$ has a PH representation of the form

\bm{\alpha}_{\mathbf{w}}=\bm{\alpha}^{+}+\bm{\alpha}^{0}\left(-\mathbf{S}^{00}\right)^{-1}\mathbf{S}^{0+}\quad\text{and}\quad\mathbf{S}_{\mathbf{w}}=\bm{\Delta}((\mathbf{R}\mathbf{w}^{\top})_{+})^{-1}\left(\mathbf{S}^{++}+\mathbf{S}^{+0}\left(-\mathbf{S}^{00}\right)^{-1}\mathbf{S}^{0+}\right)\,.

Here, $\bm{\Delta}((\mathbf{R}\mathbf{w}^{\top})_{+})$ is a diagonal matrix whose diagonal is the vector $(\mathbf{R}\mathbf{w}^{\top})_{+}$, formed by the entries satisfying $(\mathbf{R}\mathbf{w}^{\top})_{k}>0$ in the appropriate order. Note that an atom at zero of size $\bm{\alpha}^{0}\left(\mathbf{I}-\left(-\mathbf{S}^{00}\right)^{-1}\mathbf{S}^{0+}\right)\mathbf{e}$ may appear, since $(J_{t})_{t\geq 0}$ may start in a state in $E_{0}$ and get absorbed before reaching a state in $E_{+}$. In particular, the above result implies that if $\mathbf{Y}\sim\mbox{MPH*}(\bm{\alpha},\mathbf{S},\mathbf{R})$, then its marginals $Y^{(j)}$ are PH distributed with parameters easily computed from the above formulas. Moreover, the so-called transformation via rewards for univariate PH distributions is retrieved when $d=1$.
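
The reward transformation is easy to prototype in base R. The hedged sketch below computes $(\bm{\alpha}_{\mathbf{w}},\mathbf{S}_{\mathbf{w}})$ for an illustrative MPH* parametrisation and weight vector; subsetting by $E_{+}$ and $E_{0}$ implicitly performs the reordering of states, and the example assumes both sets are non-empty. The linCom() method of the package offers this computation directly.

# Base-R sketch of the reward transformation: PH representation of <Y, w>
mphstar_lincomb <- function(alpha, S, R, w) {
  rw <- as.numeric(R %*% w)
  plus <- which(rw > 0); zero <- which(rw == 0)     # E_+ and E_0 (both assumed non-empty here)
  Spp <- S[plus, plus, drop = FALSE]; Sp0 <- S[plus, zero, drop = FALSE]
  S0p <- S[zero, plus, drop = FALSE]; S00 <- S[zero, zero, drop = FALSE]
  alpha_w <- alpha[plus] + alpha[zero] %*% solve(-S00) %*% S0p
  S_w <- diag(1 / rw[plus], nrow = length(plus)) %*% (Spp + Sp0 %*% solve(-S00) %*% S0p)
  list(alpha = as.numeric(alpha_w), S = S_w)
}
alpha <- c(0.5, 0.3, 0.2)
S <- matrix(c(-3,  1,  1,
               0, -2,  1,
               1,  0, -4), 3, 3, byrow = TRUE)
R <- matrix(c(1, 0,
              0, 1,
              2, 0), 3, 2, byrow = TRUE)
w <- c(0, 1)                               # extracts the second marginal Y^(2)
res <- mphstar_lincomb(alpha, S, R, w)
res
1 - sum(res$alpha)                         # size of the atom at zero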

Although members of the MPH* class possess other desirable characteristics, such as explicit expressions for the joint Laplace transform and joint moments (see Section 8.1.1 of Bladt and Nielsen, 2017), general closed-form expressions for the joint density and joint distribution functions do not exist. As for its discrete counterpart, the MDPH* class, this complicates their use in practice and, more importantly, their estimation. This motivates the introduction of subclasses of MPH* distributions for which explicit expressions can be achieved. The matrixdist package provides implementations for the fMPH and mPH subclasses, which are derived similarly to their discrete equivalents.

The fMPH class is obtained by considering MPH* parametrisations of the form $\mbox{MPH*}(\bm{\alpha},\mathbf{S},\mathbf{R})$ with

\bm{\alpha}=\left(\bm{\pi},\bm{0}\right)\,,\quad\mathbf{S}=\left(\begin{array}{cc}\mathbf{S}_{11}&\mathbf{S}_{12}\\ \bm{0}&\mathbf{S}_{22}\end{array}\right)\,,\quad\text{and}\quad\mathbf{R}=\left(\begin{array}{cc}\mathbf{e}&\bm{0}\\ \bm{0}&\mathbf{e}\end{array}\right)\,,

where $\mathbf{S}_{11}$ and $\mathbf{S}_{22}$ are sub-intensity matrices of dimensions $p_{1}$ and $p_{2}$ ($p_{1}+p_{2}=p$), respectively, $\mathbf{S}_{12}$ is a $p_{1}\times p_{2}$ matrix satisfying $\mathbf{S}_{11}\mathbf{e}+\mathbf{S}_{12}\mathbf{e}=\bm{0}$, and $\bm{\pi}$ is a $p_{1}$-dimensional vector of initial probabilities. In such a case, explicit expressions for different functionals can be obtained. For example, the joint density of a random vector $\mathbf{Y}=(Y^{(1)},Y^{(2)})$ with the above parametrisation is given by

f_{\mathbf{Y}}(y^{(1)},y^{(2)})=\bm{\pi}\exp\left(\mathbf{S}_{11}y^{(1)}\right)\mathbf{S}_{12}\exp\left(\mathbf{S}_{22}y^{(2)}\right)\left(-\mathbf{S}_{22}\right)\mathbf{e}\,.

For more details on this class of bivariate PH distributions and an extension to the IPH framework, we refer to Albrecher et al. (2022b).

The tractable class of mIPH distributions (with the mPH class as a particular instance) was introduced in Bladt (2023) by considering $d$ time-inhomogeneous pure-jump Markov processes on a common state space with $p$ transient states that start in the same state and then evolve independently until their respective absorptions. If $Y^{(j)}$, $j=1,\dots,d$, are the univariate IPH distributed absorption times of these Markov jump processes, we say that the vector $\mathbf{Y}=(Y^{(1)},\dots,Y^{(d)})$ has a multivariate inhomogeneous phase-type distribution of the mIPH type, and we write

\mathbf{Y}\sim\mbox{mIPH}(\bm{\alpha},\widetilde{S},\widetilde{L})\,,\quad\mbox{where}\quad\widetilde{S}=\{\mathbf{S}_{1},\dots,\mathbf{S}_{d}\}\quad\text{and}\quad\widetilde{L}=\{\lambda_{1},\dots,\lambda_{d}\}\,.

This construction yields explicit expressions for important functionals. For instance, if $\mathbf{Y}\sim\mbox{mIPH}(\bm{\alpha},\widetilde{S},\widetilde{L})$, then

f_{\mathbf{Y}}(\mathbf{y})=\sum_{k=1}^{p}\alpha_{k}\prod_{j=1}^{d}\mathbf{e}_{k}^{\top}\exp\left(\mathbf{S}_{j}g_{j}^{-1}(y^{(j)})\right)\mathbf{s}_{j}\lambda_{j}(y^{(j)})\,,\quad\mathbf{y}\in\mathbb{R}^{d}_{+}\,.

Note that this allows for combining different types of IPH marginals, specified through $\lambda_{j}(\cdot)$ and $g_{j}(\cdot)$, which enables different tail behaviours across the marginals.

3.3. Regression

For some of the multivariate classes introduced earlier, regression on the vector of initial probabilities can be applied using the same principle as in the univariate mixture-of-experts (MoE) approach. This allows us to capture the relationships between the multivariate responses and covariates while taking advantage of the flexibility and expressiveness of the multivariate matrix distributions. Specifically, the mIPH-MoE model was introduced in Albrecher et al. (2023) and used to estimate the joint remaining lifetimes of spouses. Additionally, the mDPH-MoE and fMDPH-MoE specifications were derived in Bladt and Yslas (2023c) and illustrated on insurance frequency data.

4. Note on estimation algorithms and computational remarks

Most of the distribution classes presented above can be estimated using various forms of the expectation-maximisation (EM) algorithm. The original EM algorithm for PH distributions was introduced in Asmussen et al. (1996) and later extended to censored data in Olsson (1996). The corresponding algorithm for DPH distributions can be found in Bladt and Nielsen (2017). More recently, Albrecher et al. (2022b) derived an EM algorithm for IPH distributions and illustrated how the algorithms for MPH* and fMPH distributions can be implemented. Regarding the regression models, the estimation method for the PI model is presented in Albrecher et al. (2022a), while Bladt and Yslas (2023b) provide the algorithm for the estimation of the PH-MoE specification. This method was then adapted in Albrecher et al. (2023) for the mIPH class and in Bladt and Yslas (2023c) for the DPH framework, including extensions to the multivariate classes mDPH and fMDPH.

All estimation procedures in matrixdist related to PH distributions require an effective method for computing matrix-exponential operations, as the various E-steps involved hinge on such quantities. Several theoretical methods exist for this purpose (see, for instance, Moler and Van Loan, 1978), but an efficient numerical implementation is crucial due to the significant computational burden of the algorithms. The matrixdist package offers three primary implementations of the matrix exponential: the first, described in Asmussen et al. (1996), converts the problem into a system of ordinary differential equations. The second method uses the so-called uniformisation, with the exact description available in Albrecher et al. (2022b). The third method is the Padé approximation, which can be found in Moler and Van Loan (1978). Each method has its own advantages and disadvantages, and the choice depends on the specific problem and distribution at hand.
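
For readers interested in the uniformisation idea, the following base-R sketch (not the package's internal implementation) computes $\exp(\mathbf{S}y)$ for a sub-intensity matrix $\mathbf{S}$ by writing $\mathbf{S}=\lambda(\mathbf{P}-\mathbf{I})$ with $\lambda=\max_{i}|S_{ii}|$, so that the matrix exponential becomes a Poisson-weighted sum of powers of the sub-transition matrix $\mathbf{P}$.

# Uniformisation sketch: exp(S y) = sum_k exp(-lambda y) (lambda y)^k / k! * P^k,
# with lambda = max_i |S_ii| and P = I + S / lambda (assumes lambda > 0;
# the naive weighting below is prone to underflow for very large lambda * y)
expm_unif <- function(S, y, tol = 1e-12) {
  lambda <- max(abs(diag(S)))
  P <- diag(nrow(S)) + S / lambda
  out <- matrix(0, nrow(S), ncol(S))
  term <- diag(nrow(S))                 # P^0
  w <- exp(-lambda * y)                 # Poisson weight for k = 0
  k <- 0
  repeat {
    out <- out + w * term
    if (w < tol && k > lambda * y) break
    k <- k + 1
    w <- w * lambda * y / k
    term <- term %*% P
  }
  out
}
S <- matrix(c(-2, 1, 0.5, -3), 2, 2, byrow = TRUE)
expm_unif(S, 1.5)   # compare with, e.g., expm::expm(S * 1.5)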

5. Illustrations

5.1. Univariate distributions

In this section, we demonstrate how to use the matrixdist package to fit univariate PH and DPH distributions to real-world data.

Continuous PH distributions

We consider the AutoBi dataset from the insuranceData package, which contains information about automobile bodily injury claims. The dataset comprises eight variables: one for the economic loss (which we aim to describe), six covariates, and one variable to identify each claim. We start by providing general descriptive statistics for the variable of interest (LOSS).

# Load the data
data("AutoBi", package = "insuranceData")
claim <- AutoBi
# Economic losses resulting from an automobile accident
loss <- claim$LOSS
# Summary statistics of losses
summary(loss)
#>     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
#>    0.005    0.640    2.331    5.954    3.995 1067.697
# 10 largest values of losses
tail(sort(loss), n = 10)
#>  [1]   82.000   96.007  114.604  150.000  162.047  188.720  193.000  222.405
#>  [9]  273.604 1067.697

We observe that the losses range from 0.005 to 1067.697, with 75% of the data smaller than 3.995. The presence of large values, compared to the body of the distribution, serves as an initial indication of a heavy tail. To further confirm this observation, we create a Hill plot using the hill() function from the evir package.

library(evir)
hill(loss)
Figure 5.1. Hill plot for losses in the AutoBi dataset.

The plot exhibits a flat behaviour for the largest order statistics, suggesting a Pareto-type (regularly varying) tail. Therefore, we opt for an IPH model to capture the tail of the losses more accurately; more specifically, we use a matrix-Pareto distribution. The fitting process starts by creating an initial IPH distribution, which provides the starting parameters for the EM algorithm. This can be accomplished with the iph() method in matrixdist. For our analysis, we create an IPH object with random parameters, a general structure, dimension 6, and a Pareto inhomogeneity function.

set.seed(22)
# Create an IPH object
x <- iph(gfun = "pareto", structure = "general", dimension = 6)

With an initial IPH distribution in place, we can employ the fit() method to fit the data. Note that fit() requires specifying the number of steps of the EM algorithm. In this instance, we use 1500 steps, since the changes in the log-likelihood become negligible at this point (the same criterion is used to determine the number of iterations in the subsequent illustrations). The method outputs an IPH object containing the fitted parameters, on which further computations can be performed if required.

# Fitting procedure
z <- fit(x, y = loss, stepsEM = 1500)

Figures 5.2 and 5.3 display a QQ plot for the fitted distribution and a histogram of the (log) data against the fitted density, respectively. These plots indicate that the fitted distribution serves as a good approximation of the data. In particular, the sample and fitted quantiles align closely with the identity line, especially within the distribution's body. Although the quantiles in the tail deviate slightly, they remain relatively close to the identity line. The histogram further supports the notion that the fitted IPH distribution is a satisfactory approximation.

Figure 5.2. QQ plot of fitted distribution.
Figure 5.3. Histogram of the logarithm of losses vs fitted density (blue line).
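
For completeness, a QQ plot such as the one in Figure 5.2 can be produced, for instance, by comparing empirical quantiles of the losses with the quan() method applied to the fitted object z; this is one possible way to generate the figure, not necessarily the exact code used for it.

# One way to produce a QQ plot like Figure 5.2 from the fitted object z
qq <- seq(0.01, 0.99, length.out = 100)
plot(quantile(loss, probs = qq), quan(z, qq),
  xlab = "Sample", ylab = "Fitted", pch = 16, asp = 1
)
abline(0, 1, col = "blue")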

PH regression

We now shift our focus to the regression framework. In particular, we employ the PI specification to capture the influence of covariate information on economic losses. The initial step involves removing any incomplete covariate data. Next, we need to split the covariates and the response variable into two different objects. In our case, we investigate the effect of the variables ATTORNEY, CLMSEX and CLMAGE. The variable ATTORNEY indicates whether the claimant was represented by an attorney or not, CLMSEX is the claimant’s sex, and CLMAGE corresponds to their age.

# Covariate information, excluding missing entries
cov <- na.omit(claim)
# Separation of the information
X <- cov[, c("ATTORNEY", "CLMSEX", "CLMAGE")]
loss <- cov[, "LOSS"]

Now, we can employ the reg() method to perform the PI regression. This function works similarly to the fit() method, with the main difference being that estimation of the regression parameters is included in each iteration of the EM algorithm (see Albrecher et al., 2022a, for details). Along with the data, the reg() method requires an initial IPH distribution. In this case, we employ the previously fitted matrix-Pareto distribution without covariates.

# PI regression
z.reg <- reg(x = z, y = loss, X = X, stepsEM = 250)

The resulting object (z.reg) contains all matrix parameters along with the regression coefficients. This information can be easily accessed via the coef() function as follows.

# Fitted parameters
z.reg.par <- coef(z.reg)
z.reg.par
#> $alpha
#> [1] 2.356094e-04 3.257751e-06 5.920828e-02 3.755407e-04 4.561202e-01
#> [6] 4.840571e-01
#>
#> $S
#>               [,1]          [,2]        [,3]          [,4]          [,5]
#> [1,] -3.412707e+01  7.592078e-06  1.42774265  1.533919e-01  6.975181e-06
#> [2,]  2.145191e+01 -3.473845e+01  0.38024154  1.288647e+01  3.162116e-05
#> [3,]  3.090083e-01  6.287658e-01 -1.83234510  4.643455e-01  3.670351e-01
#> [4,]  5.666353e-01  1.166288e-04  7.77223071 -2.987094e+01  2.335434e-04
#> [5,]  1.432727e-04  3.578715e+01  0.49172745  2.247409e-04 -3.628114e+01
#> [6,]  4.561651e-06  1.243609e-03  0.08311972  3.771681e-05  4.707313e-02
#>               [,6]
#> [1,]  3.037507e+01
#> [2,]  2.799554e-05
#> [3,]  3.352413e-02
#> [4,]  2.066459e+01
#> [5,]  1.118600e-03
#> [6,] -6.641175e+00
#>
#> $beta
#> [1] 33.10073
#>
#> $B
#>
#>  1.15139949 -0.10888748 -0.01368395
#>
#> $C
#> numeric(0)

The above information can be used, for example, to create customised loss distributions for individual claimants based on their distinctive characteristics.

Take, for instance, a new claimant: a 60-year-old male, unrepresented by an attorney. The following code leverages our model to estimate his loss distribution.

# Covariate information of the new claimant
claimant <- data.frame(ATTORNEY = 2, CLMSEX = 1, CLMAGE = 60) # 2 = not represented by an attorney, 1 = male
# Multiplicative effect in the sub-intensity matrix
prop <- exp(sum(z.reg.par$B * claimant))
# IPH distribution for claimant’s loss
claimant.iph <- iph(
  alpha = z.reg.par$alpha, S = z.reg.par$S * prop,
  gfun = "pareto", gfun_pars = z.reg.par$beta
)

If we wish to visualize this claimant’s survival function, we can plot it as follows.

# Evaluation points for the survival function
q <- seq(0, 15, 0.01)
# Plot of the survival function
plot(q, cdf(claimant.iph, q, lower.tail = F),
  type = "l", col = "blue", lwd = 1.3,
  ylab = "1-F(y)", xlab = "y", main = "Survival function"
)
Figure 5.4. Survival function of losses for the new claimant.

We refer readers to Section 6 and the documentation of matrixdist for further available methods for PH and IPH distributions.

Discrete PH distributions

The matrixdist package provides an efficient approach to working with DPH distributions. To exemplify its application, we use the NMES1988 data set included in the R package AER. The data come from the US National Medical Expenditure Survey conducted in 1987 and 1988 and contain various count variables. For our illustration, we focus on the annual number of visits to a physician's office (the visits variable) for US residents aged above 66. Note, however, that when working with DPH distributions it is necessary to add 1 to data starting at 0, as the support of a DPH random variable commences at 1.

# Load the data
data("NMES1988", package = "AER")
df <- NMES1988
# Yearly physician office visits
visit <- df$visits + 1
df$y <- visit

The process of fitting a DPH distribution to data closely parallels that of fitting PH distributions. It starts by defining an initial dph object, which must then be passed to the fit() function along with the data (Figure 5.5 depicts the fitted distribution).

# Create a DPH distribution
set.seed(23)
d <- dph(structure = "general", dimension = 3)
# Fit the distribution to the data we are interested in
d.fit <- fit(x = d, y = visit, stepsEM = 1500)
Figure 5.5. Comparison between empirical probability mass function and fitted one.

The data set also provides many other covariates. Here, we concentrate on the influence of age, gender, and the number of chronic conditions on the frequency of physician visits. For this purpose, we employ the MoE framework to integrate the covariate information into the DPH specification, which can be done with the MoE() method. The first step in performing this regression is to specify a formula (f) that details how the covariates affect the number of physician visits.

# Convert the gender variable into binary format
df$gender <- ifelse(df$gender == "male", 1, 0)
# Regression formula
f <- y ~ age + gender + chronic

Next, such a formula has to be passed on to the MoE() method along with the data (including covariates) and an initial DPH distribution.

# MoE regression
d.reg <- MoE(x = d.fit, formula = f, data = df, stepsEM = 500)

The fitted model allows us to examine the impact of different covariate profiles on the number of physician visits. For example, let us compare the probability mass functions of two subjects: both are 69-year-old females, but one has five chronic conditions while the other has none.

# DPH distribution for a subject with 5 chronic conditions
dph.sub1 <- dph(alpha = d.reg$alpha[12, ], S = d.reg$S)
# DPH distribution for a subject with 0 chronic conditions
dph.sub2 <- dph(alpha = d.reg$alpha[38, ], S = d.reg$S)
# Plot of density function for both subjects
q <- seq(1, 20, 1)
plot(q, dens(dph.sub1, q),
  type = "p", col = "blue", cex = 0.8,
  ylab = "P(N=n)", xlab = "n", ylim = c(0, 0.3)
)
points(q, dens(dph.sub2, q), col = "red", cex = 0.8)
legend("topright",
  legend = c("5 chronic conditions", "0 chronic conditions"),
  col = c("blue", "red"), bty = "n", pch = 1, cex = 0.8
)

Furthermore, we can quantify the impact of chronic conditions on the expected number of visits to a physician’s office. Let us compute the expected number of visits for both individuals:

# Mean value for subject with 5 chronic conditions
mean(dph.sub1) - 1
#> [1] 8.098524
# Mean value for subject with 0 chronic conditions
mean(dph.sub2) - 1
#> [1] 3.656367

Our analysis unsurprisingly reveals that the subject with five chronic conditions is expected to visit a physician's office more often than the one with none. This demonstrates the ability of the fitted model to quantify the impact of varying covariate profiles, a tool that can assist healthcare professionals in predicting healthcare usage patterns.

5.2. Multivariate distributions

MPH* class

To illustrate the capabilities of the implementations for MPH* distributions in matrixdist, we use the rdj data set of the copula package. More specifically, our focus is on using an MPH* distribution to describe the joint behaviour of the absolute log returns of the three stocks in this data set: Intel, Microsoft, and General Electric.

First, we load the data, take absolute values, and remove any observations with a zero component.

set.seed(24)
# Load the data
data("rdj", package = "copula")
# Absolute log return of stock prices
y <- as.matrix(abs(rdj[, -1]))
# Remove observations with a zero component
y <- y[y[, 1] != 0 & y[, 2] != 0 & y[, 3] != 0, ]

As in the univariate case, the fitting process requires the definition of an initial MPHstar object that is subsequently fed into the fit() function. In the present case, we use an MPHstar object with randomly generated parameters, a general structure for the matrix parameters, and dimension 10.

# Initial MPH* distribution
x <- MPHstar(structure = "general", dimension = 10, variables = 3)
# Fitting to data
x.fit <- fit(x = x, y = y, stepsEM = 200, uni_epsilon = 1e-6, r = 0.5)

In particular, the reward matrix of the fitted distribution can be accessed as follows:

# Estimated reward matrix
x.fit@pars$R
#>              [,1]       [,2]       [,3]
#>  [1,] 0.624656883 0.03162784 0.34371527
#>  [2,] 0.448807782 0.08398967 0.46720255
#>  [3,] 0.545561519 0.42181735 0.03262113
#>  [4,] 0.783031036 0.08518594 0.13178302
#>  [5,] 0.973586315 0.01054082 0.01587287
#>  [6,] 0.318181076 0.21878855 0.46303038
#>  [7,] 0.015947222 0.78138202 0.20267076
#>  [8,] 0.498338747 0.28584901 0.21581224
#>  [9,] 0.001025196 0.99897480 0.00000000
#> [10,] 0.471608399 0.48276616 0.04562544

Finally, we assess the quality of the fit by creating QQ plots of the fitted margins.

# Names for the QQ plot
name <- colnames(y)
# Quantile levels
qq <- seq(0.01, 0.99, length.out = 100)
par(mfrow = c(1, 3))
for (j in 1:3) {
  # j-th marginal fitted distribution
  x <- marginal(x.fit, j)
  # Marginal observations
  ym <- y[, j]
  # quantiles
  y.q <- quantile(ym, probs = qq)
  x.q <- quan(x, qq)
  # QQ plot
  plot(y.q, x.q, main = name[j], xlab = "Sample", ylab = "Fitted", pch = 16, asp = 1)
  abline(0, 1, col = "blue")
}
Figure 5.6. QQ plots of the fitted distributions for each marginal.

Upon inspecting the QQ plots, we can observe that the fitted marginals provide a satisfactory approximation of the data.

mDPH class

Finally, let us illustrate the use of the mDPH class with a practical example. We use the NMES1988 data set again, but this time we focus on jointly modelling two distinct variables: the yearly number of physician office visits (visits) and of non-physician office visits (nvisits). With the matrixdist package, an mDPH distribution can be fitted to these data as follows.

# Load the data
data("NMES1988", package = "AER")
df <- NMES1988
# Observations in matrix form
mvisit <- cbind(df$visits, df$nvisits) + 1
# Initial mDPH distribution
set.seed(12345)
md <- mdph(structure = rep("general", 2), dimension = 4, variables = 2)
# Fitting procedure
md.fit <- fit(x = md, y = mvisit, stepsEM = 1500)

The expected values, variance-covariance matrix, and correlation matrix of the fitted distribution can be obtained using the subsequent code:

# Vector of marginal expectations
mean(md.fit) - 1
#> [1] 5.774399 1.618021
# Variance-covariance matrix
var(md.fit)
#>           [,1]      [,2]
#> [1,] 46.243751  5.726782
#> [2,]  5.726782 27.372432
# Correlation matrix
cor(md.fit)
#>           [,1]      [,2]
#> [1,] 1.0000000 0.1609635
#> [2,] 0.1609635 1.0000000

Note that the mean of the fitted mDPH distribution matches the sample mean, a property that also holds for the estimation methods of the PH, DPH, and mPH classes. Additionally, the fitted model captures the correlation between physician and non-physician office visits. The mDPH class also supports the MoE framework, allowing us to incorporate covariate information; next, we include age, gender, and the number of chronic conditions in the analysis.

df$y <- mvisit
# Convert the gender variable into binary format
df$gender <- ifelse(df$gender == "male", 1, 0)
# Regression formula
f <- y ~ age + gender + chronic
# MoE regression
md.reg <- MoE(x = md.fit, y = mvisit, formula = f, data = df, stepsEM = 500)

The above output allows us to predict the joint distribution for new individuals with a set of specific characteristics. This is achieved by employing the predict() function in conjunction with the nnet object included in md.reg (accessible via md.reg$mm). For instance, for a 72-year-old male patient with one chronic condition, the mDPH distribution describing the joint behaviour of the two variables of interest can be specified as follows:

# Characteristics of a new patient (age is recorded in decades in NMES1988, so 7.2 = 72 years)
patient <- data.frame(age = 7.2, gender = 1, chronic = 1)
# Predicted starting probabilities
alpha <- predict(md.reg$mm, type = "probs", newdata = patient)
# New mDPH distribution
md.patient <- mdph(alpha = alpha, S = md.reg$S)

Figure 5.7 shows the joint probability mass function of yearly office and non-office visits for the new patient.

Figure 5.7. Joint probability mass function.

6. Summary of methods

We have explored the fundamental properties of some matrix distributions and demonstrated the use of the matrixdist package for analysing real-life data through detailed examples. The package encompasses methods for computing various functionals and estimating most of the classes discussed in this paper. Table 6.1 presents the families of matrix distributions currently implemented in the package. We refer readers to the package documentation for guidance on using these constructor functions. For example, consulting ?ph will provide instructions on creating PH distributions. Additionally, Table 6.2 offers a comprehensive list of the implemented methods, along with a mapping to the applicable classes. For examples of the use of these methods, the package help can be consulted. The general structure for accessing the desired examples is ?`method_name,class_name-method`. For example, ?`fit,ph-method` directs to the documentation of the fit() method for the ph class.

Table 6.1. Classes of matrix distributions available in matrixdist
Class Constructor function
DPH dph()
PH ph()
IPH iph()
fMDPH bivdph()
mDPH mdph()
MPH* MPHstar()
fMPH bivph()
mPH mph()
fMIPH biviph()
mIPH miph()
Table 6.2. Methods for matrix distributions available in matrixdist
Method Description Available for
sim() Simulates iid realisations dph, ph, iph, bivdph, mdph, MPHstar, bivph, mph, biviph, miph
moment() Moments dph, ph, bivdph, mdph, bivph, mph
laplace() Laplace transform ph, bivph, mph
mgf() Moment-generating function ph, bivph, mph
pgf() Probability-generating function dph, bivdph, mdph
+ Parametrisation of the sum of two matrix distributions dph, ph
minimum() Parametrisation of the minimum of two matrix distributions dph, ph, iph
maximum() Parametrisation of the maximum of two matrix distributions dph, ph, iph
mixture() Parametrisation of a mixture of two matrix distributions dph, ph
Nfold() N-fold convolution with N DPH distributed dph, ph
dens() Density function dph, ph, iph, bivdph, mdph, bivph, mph, biviph, miph
cdf() Cumulative distribution function dph, ph, iph, mph, miph
haz() Hazard rate function ph, iph
quan() Quantiles ph, iph
TVR() Transformation via rewards dph, ph
fit() Estimation dph, ph, iph, bivdph, mdph, MPHstar, bivph, mph, biviph, miph
reg() PI regression ph, iph
MoE() MoE regression dph, ph, iph, bivdph, mdph, mph, miph
marginal() Parametrisation of marginal distribution bivdph, mdph, MPHstar, bivph, mph, biviph, miph
linCom() Parametrisation of a linear combination of marginals MPHstar, bivph
mean() Mean or vector of means dph, ph, bivdph, mdph, MPHstar, bivph, mph
var() Variance or covariance matrix dph, ph, bivdph, mdph, MPHstar, bivph, mph
cor() Correlation matrix bivdph, mdph, MPHstar, bivph, mph

References

  • Albrecher and Bladt, (2019) Albrecher, H. and Bladt, M. (2019). Inhomogeneous phase-type distributions and heavy tails. Journal of Applied Probability, 56(4):1044–1064.
  • Albrecher et al., (2022a) Albrecher, H., Bladt, M., Bladt, M., and Yslas, J. (2022a). Mortality modeling and regression with matrix distributions. Insurance: Mathematics and Economics, 107:68–87.
  • Albrecher et al., (2023) Albrecher, H., Bladt, M., and Müller, A. J. (2023). Joint lifetime modelling with matrix distributions. Dependence Modeling, 11(1).
  • Albrecher et al., (2022b) Albrecher, H., Bladt, M., and Yslas, J. (2022b). Fitting inhomogeneous phase-type distributions to data: The univariate and the multivariate case. Scandinavian Journal of Statistics, 49(1):44–77.
  • Asmussen et al., (1996) Asmussen, S., Nerman, O., and Olsson, M. (1996). Fitting phase-type distributions via the EM algorithm. Scandinavian Journal of Statistics, 23(4):419–441.
  • Bladt, (2022) Bladt, M. (2022). Phase-type distributions for claim severity regression modeling. ASTIN Bulletin: The Journal of the IAA, 52(2):417–448.
  • Bladt, (2023) Bladt, M. (2023). A tractable class of multivariate phase-type distributions for loss modeling. North American Actuarial Journal.
  • Bladt and Nielsen, (2017) Bladt, M. and Nielsen, B. F. (2017). Matrix-Exponential Distributions in Applied Probability, volume 81. Springer.
  • Bladt and Yslas, (2023a) Bladt, M. and Yslas, J. (2023a). matrixdist: Statistics for Matrix Distributions. R package version 1.1.9.
  • Bladt and Yslas, (2023b) Bladt, M. and Yslas, J. (2023b). Phase-type mixture-of-experts regression for loss severities. Scandinavian Actuarial Journal, 2023(4):303–329.
  • Bladt and Yslas, (2023c) Bladt, M. and Yslas, J. (2023c). Robust claim frequency modeling through phase-type mixture-of-experts regression. Insurance: Mathematics and Economics, 111:1–22.
  • Campillo-Navarro, (2018) Campillo-Navarro, A. (2018). Order statistics and multivariate discrete phase-type distributions. PhD thesis, DTU Lyngby.
  • Cox, (1972) Cox, D. R. (1972). Regression models and life-tables. Journal of the Royal Statistical Society: Series B (Methodological), 34(2):187–202.
  • Dutang et al., (2008) Dutang, C., Goulet, V., and Pigeon, M. (2008). actuar: An R package for actuarial science. Journal of Statistical Software, 25:1–37.
  • Kulkarni, (1989) Kulkarni, V. G. (1989). A new class of multivariate phase type distributions. Operations Research, 37(1):151–158.
  • Moler and Van Loan, (1978) Moler, C. and Van Loan, C. (1978). Nineteen dubious ways to compute the exponential of a matrix. SIAM Review, 20(4):801–836.
  • Okamura and Dohi, (2015) Okamura, H. and Dohi, T. (2015). mapfit: An R-based tool for PH/MAP parameter estimation. In Quantitative Evaluation of Systems: 12th International Conference, QEST 2015, Madrid, Spain, September 1-3, 2015, Proceedings 12, pages 105–112. Springer.
  • Olsson, (1996) Olsson, M. (1996). Estimation of phase-type distributions from censored data. Scandinavian Journal of Statistics, 23(4):443–460.
  • R Core Team, (2023) R Core Team (2023). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0.
  • Rivas-González et al., (2023) Rivas-González, I., Andersen, L. N., and Hobolth, A. (2023). PhaseTypeR: an R package for phase-type distributions in population genetics. Journal of Open Source Software, 8(82):5054.