A Note on the Moments of Special Mixture Distributions, with Applications for Control Charts

Balázs Dobi, András Zempléni
Budapest, Hungary

Abstract: Control charts can be applied in a wide range of areas, this paper focuses on generalisations suitable for healthcare applications. We concentrate on the effect of using mixture distributions as the possible shifts in the process mean value. We give a closed formula for the expected squared deviation from the target value between samplings. This observation can be utilised in finding the cost-optimal parameters of the control chart.

Keywords: control chart, cost-optimality, Markov-chain, mixture distributions

1 Introduction

Control charts, as methods of statistical process control in industry have been introduced by Walter A. Shewhart in the 1920s (Montgomery, 2009). Even though initially control charts were optimised with respect to statistical criteria, the concept of cost-efficient or cost-optimal control charts appeared not long after. One of the earliest and most important work in this field is A. J. Duncan’s paper from 1956 (Duncan, 1956). The concept is still very popular today as can be seen by published articles and developed software packages (see e.g. Mortarino (2010); Zhu and Park (2013)).

Our earlier research focused on applications in healthcare settings, see e.g. Dobi and Zempléni (2019a). In this paper, the main model is a flexible framework which is based on the works of Zempléni et al. (2004). This framework uses a Markov chain-based approach, which is similar to Duncan’s cycle model as it also defines states related to the monitored process. The advantage of this approach was that it allowed generalisations for random shift sizes, random repairs and random sampling times, all of which are common in healthcare applications. Using these control charts, we were able to estimate the optimal parameters of a patient monitoring setup, which consisted of the optimal time between samplings (i.e. control visits) and critical value (i.e. medical criteria) (Dobi and Zempléni, 2019a).

In this paper we aim to assess the effect of mixture shift size distributions - distributions which model the degradation in quality - on the average cost over a given time period. We give a closed formula for one of the main terms in the cost-calculations for control charts.

The rest of the paper consists of the following parts: Section 2 contains the main results, but first the mathematical background for the later results are given. Namely Subsection 2.1 briefly discusses the Markov chain-based framework and its generalisations. Subsections 2.2 and 2.3 deal with the main subject of this note: we present here a formula for the squared expectation of mixture shift distributions over a given time interval. Section 3 concludes the paper.

2 Methods

The description of the Markov chain-based framework below is just a brief introduction and summary necessary for understanding the results presented later in the paper. For further reading and more detailed descriptions see Zempléni et al. (2004); Dobi and Zempléni (2019a, b, 2022).

2.1 The Markov chain-based framework and its generalisations

Our control charts are used to monitor a characteristic, based on a single sample element at a time. The shift in this characteristic - when it occurs - is positive. These assumptions correspond to an $X$ -chart setup with sample size $N=1$ and one-sided critical value $K$ . Usually, there are three free parameters which are the focus of optimisation: the sample size $N$ , the critical value $K$ , and the time between samplings $h$ . Since the sample size is fixed here as $N=1$ , we are left with two parameters to use: $K$ and $h$ .

The following parameters and constants used throughout the paper are supposed to be known:

Notation	Meaning
$\mu_{0}$	Target value, in-control expectation
$\sigma$	Process standard deviation
$\delta$	Expected value of the exponential distribution (shift size)
$\zeta$	Probability of geometric shift in the mixed distribution
$\xi$	Probability parameter of the geometric distribution
$s$	Expected number of shifts in a unit time
$c_{s}$	Sampling cost
$c_{os}$	Base out-of-control cost per unit time
$c_{ob}$	Shift-proportional out-of-control cost per unit time
$c_{rb}$	Base repair cost
$c_{rs}$	Shift-proportional repair cost

The followings are also assumed:

•

The measurement error is normal with expectation 0 and known standard deviation $\sigma$ . Let its cumulative distribution function (CDF) be denoted by $\phi$ . Thus the in-control process distribution is normal with parameters $\mu_{0}$ and $\sigma$ .
•

The shift intensity ( $1/s$ : the inverse of the expected number of shifts in a unit time) is constant.
•

The shift size distribution is assumed to be known.
•

The process does not repair itself, and when a repair (treatment) is carried out it does not repair the process perfectly. Furthermore, the repair itself is treated as an instantaneous event, thus all costs related to repairing should be included in the repair cost. For example if a major repair entails higher cost, then this should also be reflected in the calculation.
•

The time between shifts, the shift sizes and repair effectiveness are all assumed to be independent from each other.

Using the above assumptions, the future distances from $\mu_{0}$ are only dependent on the current distance and independent from the earlier ones. As we consider the states at the samplings, the process is a Markov chain, with a continuous state space. In our earlier papers we introduced a discretisation scheme, which was needed in the implementation of the algorithm. In this paper we do not consider this in detail, as the theoretical results will not need discretisation, only the general form of the cost function.

Our approach contains two important differences compared to the traditional models. First is the random shift size, introduced for the cost-optimal approach by Zempléni et al. (2004). The second is the random repair size, introduced by Dobi and Zempléni (2019a). In this note we focus on the first one.

Let $\tau_{i}$ denote the random shift times on the real line and let $\rho_{i}$ be the shift size at time $\tau_{i}$ . Assume that $\rho_{i}$ follows a distribution, which has a CDF with support over $(0,\infty)$ , and that the shift sizes have the same distribution, are independent from each other and from $\tau_{i}$ . Let the probability mass function (PMF) of the number of shifts after time $t$ from the process’ start (assumed to be in the in-control state) be denoted by $\nu_{t}$ . $\nu_{t}$ is a discrete distribution with support over $\mathbf{N}^{0}$ , which will be assumed to have the Poisson distribution with parameter $\lambda t$ . If the previous conditions are met, the resulting random process of the shifts - let us denote it by $H(t)$ - has step functions as trajectories, which are monotonically increasing between samplings. The CDF of the process values for a given time $t$ from the start can be written the following way (assuming that there was no alarm signal and repair before $t$ ):

(1)

Z_{t}(x)=\begin{cases}0&{\text{if }}x<0,\\ \nu_{t}(0)+\sum_{k=1}^{\infty}\nu_{t}(k)\Psi_{k}(x)&{\text{if }}x\geq 0,\end{cases}

where $\Psi_{k}()$ is the CDF of the sum of $k$ independent, identically distributed shift sizes $\rho_{i}$ . The case $x=0$ means there is no shift, and the probability of zero shift size is just the probability of no shift occurring, which is $\nu_{t}(0)$ .

The resulting process of the expected values is a monotone increasing step function between samplings and has a downward ”jump” at alarm states - as the repair is assumed to be instantaneous.

The form of the cost function we used is

(2)

\mathrm{E}(C)=C\cdot P,

where $E(C)$ is the expected cost per unit time, $C$ is a vector of the total cost per unit time associated with each state and $P$ is the stationary distribution (vector of probabilities), defined by a suitable discretisation scheme. $E(C)$ is the expected value of the discrete distribution where the possible values are the costs related to the states and their probabilities are set by the stationary distribution of the Markov chain.

The $C$ vector is

C=\Bigg{\{}\frac{c_{s}}{h},\frac{c_{s}+c_{f}}{h},\frac{c_{s}}{h}+c_{o},\frac{c_{r}+c_{s}}{h}+c_{o}B_{h}\Bigg{\}},

where $c_{s}$ is the sampling cost, $c_{f}$ is the false alarm cost, $c_{r}$ is the true alarm (repair) cost and $c_{o}$ is the out-of-control operation cost per unit time. This cost item will be investigated in more detail in the sequel.

The relationship between the distance from $\mu_{0}$ and the resulting costs are often based on a Taguchi-type loss function (Montgomery, 2009). This means that a squared loss function is assumed. The expected out-of-control operation cost can be written as the expectation of a function of the distance from the target value. At (1) the shift size distribution was defined for a given time $t$ , but this time we are interested in calculating the expected cost for a whole interval. Formally, we intend to calculate the area ${\cal C}_{h,j}^{2}$ under the curve $\{t,E((H_{t_{0},j}(t))^{2})\}=\int_{t_{0}}^{t_{0}+h}E((H_{t_{0},j}(t))^{2})dt$ , where $t_{0}\leq t\leq t_{0}+h$ and $H_{t_{0},j}$ is the shifted mean-process (its unconditional distribution function $Z_{t}$ was given in (1)) upon the condition that $H(t_{0})=j$ . Note that the area under the curve is the total cost generated in a time interval, the expected cost per unit time for practical use can be calculated from this value by simply dividing with the length of the interval.

Thus, the main question here is the integral of the expected squared shift size over a given interval. This is the subject of the next subsection, in the case of a mixture distribution.

2.2 Mixture distribution as the shift size

For practical demonstration purposes, let us assume first that the shift times form a homogeneous Poisson process, and the size of a single shift has a distribution which is a mixture of an exponential and a geometric distribution:

F_{m}(x)=\zeta F_{g}(x)+(1-\zeta)F_{e}(x),

where $\zeta\in[0,1]$ is the mixing parameter, $F_{g}()$ is the CDF of the geometric distribution, $F_{e}()$ is the CDF of the exponential distribution, and $F_{m}()$ denotes the CDF of the resulting mixture distribution. Such a distribution can model processes with slow degradation, mixed with sudden jumps. A real-life example could be the effectiveness of a professional athlete, where the continuous degradation can be attributed to age and sudden degradation to accidents. Another example could be simultaneously occurring chronic kidney disease and acute kidney injury (Hsu and Hsu, 2016). Different definitions are available for the geometric distribution, the one used here has support over positive integers, thus $F_{g}(x)=1-(1-\xi)^{x}$ , with probability parameter $\xi$ . A negative binomial distribution can be defined as the convolution of independent geometric distributions with the same parameter. Its PMF can be written in the following way:

(3)

f_{nb}(x)={x-1\choose x-r}\cdot\xi^{r}(1-\xi)^{x-r},

where $r$ is the number of summed geometrically distributed variables. The support of this distribution is $x\in\{r,r+1,r+2,...\}$ . Let us assume now that $n$ shifts occurred in a given time interval, and $r$ of these were geometrically, while the rest were exponentially distributed. The CDF of the distribution of the sum of the shifts is then

F_{M|n,r}(x)=\begin{dcases}F_{E_{n-r}}(x)&{\text{if }}r=0,\\ \sum_{l=r}^{\left\lfloor x\right\rfloor}F_{E_{n-r}}(x-l)f_{nb}(l)&{\text{otherwise}},\end{dcases}

where $F_{E_{n-r}}$ is the distribution function of the Erlang distribution $E(n-r,\frac{1}{\delta})$ . The distribution function of the sum of $n$ variables with exponential-geometric mixed distribution is then

F_{M|n}(x)=\sum_{r=0}^{n}f_{b}(r)F_{M|n,r}(x),

where $f_{b}()$ is the PMF of the binomial distribution with parameters $n$ and $\zeta$ , giving the probability of $r$ geometrically distributed shifts out of $n$ .

If one wants to construct the shift size distribution for a given time after start using the notations in (1), $\nu_{t}$ would be the PMF of the Poisson distribution, and $\Psi_{k}()$ would be $F_{M|n}()$ with $n=k$ .

For the following paragraphs, let $X$ denote an $E(u,\frac{1}{\delta})$ (Erlang) distributed random variable, which is the sum of $u$ independent exponential variates each with mean $\delta$ , and let $Y$ denote a $NegBin(u,\xi)$ (negative binomial) distributed random variable, which is the sum of $u$ independent geometric variates each with parameter $\xi$ . Note, that for the negative binomial distribution we are using the definition given at (3). Also, for practical use, one may wish to have discrete jump size other than the integers defined by the support of the negative binomial distribution. We can simply multiply the random variable by a constant (let us denote it by $J$ ) to create arbitrary jump-intervals. The initial equation thus takes the form

{\cal C}_{h,j}^{2}=\int_{0}^{h}\Bigg{[}e^{-ts}j^{2}+\Bigg{(}\sum_{k=1}^{\infty}\frac{(ts)^{k}e^{-ts}}{k!}\cdot E((X+JY+j)^{2})\Bigg{)}\Bigg{]}dt\\

In this equation $j$ represents an arbitrary starting distance greater than or equal to $\mu_{0}$ , since the process may not start at $\mu_{0}$ after a sampling. The innermost expectation, which was in Dobi and Zempléni (2019a) simply the expectation of $(X+j)^{2}$ , now is replaced by $E((X+JY+j)^{2})$ . It is easier to present the calculation in parts, as it will result in a long expression. Firstly,

(4)	$\displaystyle E((X+JY+j)^{2})=\sum_{r=0}^{k}\binom{k}{r}\zeta^{r}(1-\zeta)^{k-r}$	$\displaystyle\Bigg{[}\Bigg{(}\frac{k-r}{1/\delta}\Bigg{)}^{2}+\frac{k-r}{1/\delta^{2}}+$
	$\displaystyle\ +\Bigg{(}J\frac{r}{\xi}\Bigg{)}^{2}+J^{2}r\frac{1-\xi}{\xi^{2}}+j^{2}+$
	$\displaystyle\ +2\frac{k-r}{1/\delta}J\frac{r}{\xi}+2\frac{k-r}{1/\delta}j+2J\frac{r}{\xi}j\Bigg{]}.$

Notice that we only expanded $E(X+JY+j)^{2}$ based on the well-known form of $(a+b+c)^{2}$ , operating with the constant $j$ as a random variable with degenerate distribution, and using the independence between variables. Namely, for fixed $k$ and $r$ , $\Big{(}\frac{k-r}{1/\delta}\Big{)}^{2}+\frac{k-r}{1/\delta^{2}}$ is $E(X^{2})$ , $\Big{(}J\frac{r}{\xi}\Big{)}^{2}+J^{2}r\frac{1-\xi}{\xi^{2}}$ is $E(Y^{2})$ . The rest of the terms are just $2E(XY)$ , $2E(Xj)$ and $2E(Yj)$ . The number of geometric variables follow a binomial distribution, hence the formula in the beginning of the right hand side of the equation. $k$ in this equation is a positive integer constant, but otherwise is modelled by a Poisson distribution. The expectation is reduced to

(5)	$\displaystyle E((X+JY+j)^{2})$	$\displaystyle=k\delta^{2}+(j-k\delta(\zeta-1))^{2}+$
	$\displaystyle\ +\frac{2k\zeta J(j+\delta(k+\zeta-k\zeta-1))-\xi k(\delta\zeta)^{2}}{\xi}+$
	$\displaystyle\ +\frac{k\zeta J^{2}(2-\xi+\zeta(k-1))}{\xi^{2}}$

Notice, that the result cannot be simply partitioned into a purely Erlang or a purely negative binomial part. The rest of the calculation is the same as for the exponential (non-mixture) shift size case in Dobi and Zempléni (2019a), albeit with more complicated expressions:

(6)	$\displaystyle{\cal C}_{h,j}^{2}$	$\displaystyle=\int_{0}^{h}e^{-ts}j^{2}+\sum_{k=1}^{\infty}\frac{(ts)^{k}e^{-ts}}{k!}\big{(}E((X+JY+j)^{2})\big{)}dt$
	$\displaystyle=\int_{0}^{h}j^{2}+\frac{2jst(\delta(\xi-\xi\zeta)+\zeta J)}{\xi}+$
	$\displaystyle\ +\frac{st(\delta^{2}\xi^{2}(\zeta-1)(-2+st(\zeta-1))-2\delta\xi\zeta stJ(\zeta-1)+\zeta(2-\xi+\zeta st)J^{2})}{\xi^{2}}dt$
	$\displaystyle=j^{2}+\delta hjs-\delta hj\zeta s+\frac{hj\zeta sJ}{\xi}+$
	$\displaystyle\ +\frac{hs(-6\delta^{2}\xi^{2}(\zeta-1)-3(\xi-2)\zeta J^{2}+2hs(\delta(\xi-\xi\zeta)+\zeta J)^{2})}{6\xi^{2}}$

The generality of the proposition is highlighted by its application on mixture distributions, since there is no readily-available density function to be used to ease the calculations, like in the simpler, exponential case, treated e.g. in Dobi and Zempléni (2019a).

2.3 General form for 2-component mixture shift size distributions

The case above demonstrates that ${\cal C}_{h,j}^{2}$ can be calculated using a closed form for non-trivial shift size distributions too. Several assumptions were made for the special cases so far, however the results can be generalised. Let us now only assume that $X$ and $Y$ are random variables with finite expectation and variance. Note that $X$ and $Y$ are not the component variates themselves, but the resulting variables after summing the stacked shifts. Let us denote these expectation and variance values with $m_{X}$ , $v_{X}$ , $m_{Y}$ and $v_{Y}$ for $X$ and $Y$ respectively. The calculation of the general form is very similar to the specific case above:

{\cal C}_{h,0}^{2}=\int_{0}^{h}\Bigg{(}\sum_{k=1}^{\infty}\frac{(ts)^{k}e^{-ts}}{k!}\cdot E((X+Y)^{2})\Bigg{)}dt\\

$E((X+Y)^{2})$ takes on the following form:

(7)	$\displaystyle E((X+Y)^{2})=\sum_{r=0}^{k}\binom{k}{r}\zeta^{r}(1-\zeta)^{k-r}$	$\displaystyle\Bigg{[}[(k-r)m_{X}]^{2}+(k-r)v_{X}$
	$\displaystyle\ +[rm_{Y}]^{2}+rv_{Y}+$
	$\displaystyle\ +2(k-r)m_{X}rm_{Y}\Bigg{]}.$

The rationale behind the formulation is simply that a variable which is defined as the sum of $(k-r)$ $X$ variates will have the expectation $(k-r)m_{X}$ and variance $(k-r)v_{X}$ , since we assumed finite expectation and variance for $X$ . The same is true for a variable that is the sum of $r$ $Y$ variates. The calculation above results in

(8)	$\displaystyle E((X+Y)^{2})$	$\displaystyle=k\big{[}m_{X}^{2}k+v_{X}-\zeta v_{X}+$
	$\displaystyle\ +\zeta\big{(}v_{Y}-(m_{X}-m_{Y})(m_{Y}+m_{X}(2k-1))\big{)}+$
	$\displaystyle\ +\zeta^{2}(m_{X}-m_{Y})^{2}(k-1)\big{]}.$

Again, the rest of the calculation is quite straightforward, as we only have to repeat the steps well-established above:

(9)	$\displaystyle{\cal C}_{h,0}^{2}$	$\displaystyle=\int_{0}^{h}\sum_{k=1}^{\infty}\frac{(ts)^{k}e^{-ts}}{k!}\big{(}E((X+Y)^{2})\big{)}dt$
	$\displaystyle=\int_{0}^{h}st(m_{X}^{2}+v_{X}-m_{X}^{2}\zeta-v_{X}\zeta+m_{Y}^{2}\zeta+v_{Y}\zeta+(m_{X}-m_{X}\zeta+m_{Y}\zeta)^{2}st)dt$
	$\displaystyle=\frac{1}{6}h^{2}s(3(m_{X}^{2}+v_{X}+\zeta(m_{Y}^{2}+v_{Y}-m_{X}^{2}-v_{X}))+2h(m_{X}-\zeta(m_{X}-m_{Y}))^{2}s).$

From this general equation, one can easily get the previously-presented (Dobi and Zempléni, 2019a) exponential shift size case for $\zeta=0$ , $j=0$ , $m_{X}=\delta$ , $v_{X}=\delta^{2}$ :

(10)		$\displaystyle\frac{1}{6}h^{2}s(3(m_{X}^{2}+v_{X}+\zeta(m_{Y}^{2}+v_{Y}-m_{X}^{2}-v_{X}))+2h(m_{X}-\zeta(m_{X}-m_{Y}))^{2}s)$
	$\displaystyle=\frac{1}{6}h^{2}s(3(\delta^{2}+\delta^{2}+0(m_{Y}^{2}+v_{Y}-\delta^{2}-\delta^{2}))+2h(\delta-0(\delta-m_{Y}))^{2}s)$
	$\displaystyle=\frac{1}{6}h^{2}s(6\delta^{2}+2h\delta^{2}s)=h^{2}s\delta\Big{(}\delta+\frac{hs\delta}{3}\Big{)}.$

It is also useful to inspect the results, if $m_{X}=m_{Y}$ and $v_{X}=v_{Y}$ :

(11)		$\displaystyle\frac{1}{6}h^{2}s(3(m_{X}^{2}+v_{X}+\zeta(m_{Y}^{2}+v_{Y}-m_{X}^{2}-v_{X}))+2h(m_{X}-\zeta(m_{X}-m_{Y}))^{2}s)$
	$\displaystyle=\frac{1}{6}h^{2}s(3(m_{X}^{2}+v_{X}+\zeta(m_{X}^{2}+v_{X}-m_{X}^{2}-v_{X}))+2h(m_{X}-\zeta(m_{X}-m_{X}))^{2}s)$
	$\displaystyle=\frac{1}{6}h^{2}s\Big{(}3(m_{X}^{2}+v_{X})+2hsm_{X}^{2}\Big{)}.$

One can see that $\zeta$ disappears from the equation, as expected.

3 Conclusions

In this note we presented a general formula for the expectation of the squared shift, when the shift size is a mixture distribution. This speeded up the Markov- chart R package, as shown in Dobi and Zempl\a’eni (2020); Dobi and Zempléni (2022) and has potential applications for other important cases, like normal and Frechet mixtures - where both components are continuous, but we face a mixture of a light tailed and a heavy tailed shift. We plan to come back to the implementation of these results later.

References

Dobi and Zempléni [2019a] Balázs Dobi and András Zempléni. Markov chain-based cost-optimal control charts for health care data. Quality and Reliability Engineering International, 35(5):1379–1395, 2019a. doi: 10.1002/qre.2518.
Dobi and Zempléni [2019b] Balázs Dobi and András Zempléni. Markov chain-based cost-optimal control charts with different shift size distributions. Annales Universitatis Scientiarum Budapestinensis de Rolando Eötvös Nominatae, Sectio Computatorica, 49:129–146, 2019b. URL http://ac.inf.elte.hu/Vol_049_2019/129_49.pdf.
Dobi and Zempl\a’eni [2020] Bal\a’azs Dobi and Andr\a’as Zempl\a’eni. Markovchart: Markov Chain-Based Cost-Optimal Control Charts, 2020. URL https://CRAN.R-project.org/package=Markovchart. R package version 2.1.4.
Dobi and Zempléni [2022] Balázs Dobi and András Zempléni. Markovchart: An R package for cost-optimal patient monitoring and treatment using control charts. Computational Statistics, 2022. doi: 10.1007/s00180-021-01175-3.
Duncan [1956] Acheson J Duncan. The economic design of x charts used to maintain current control of a process. Journal of the American Statistical Association, 51(274):228–242, 1956. doi: 10.1080/01621459.1956.10501322.
Hsu and Hsu [2016] Raymond K Hsu and Chi-yuan Hsu. The role of acute kidney injury in chronic kidney disease. Seminars in Nephrology, 36(4):283–292, 2016. doi: 10.1016/j.semnephrol.2016.05.005.
Montgomery [2009] Douglas C. Montgomery. Introduction to Statistical Quality Control, 6th Edition. John Wiley & Sons, Jefferson City, MO, 2009.
Mortarino [2010] Cinzia Mortarino. Duncan’s model for x̄-control charts: Sensitivity analysis to input parameters. Quality and Reliability Engineering International, 26(1):17–26, 2010. doi: 10.1002/qre.1026.
Zempléni et al. [2004] András Zempléni, Miklós Véber, Belmiro Duarte, and Pedro Saraiva. Control charts: a cost‐optimization approach for processes with random shifts. Applied Stochastic Models in Business and Industry, 20(3):185–200, 2004. doi: 10.1002/asmb.521.
Zhu and Park [2013] Weicheng Zhu and Changsoon Park. edcc: An R package for the economic design of the control chart. Journal of Statistical Software, 52(9):1–24, 2013. doi: 10.18637/jss.v052.i09.

Balázs Dobi and András Zempléni
Department of Probability Theory and Statistics, Eötvös Loránd University
Budapest, Hungary
[email protected]
[email protected]