
A variability measure for estimates
of parameters in interval data fitting

Sergey P. Shary
Institute of Computational Technologies SB RAS
and Novosibirsk State University,
Novosibirsk, Russia
E-mail: [email protected]
Abstract

The paper presents a construction of a quantitative measure of variability for parameter estimates in the data fitting problem under interval uncertainty. The measure shows the degree of variability and ambiguity of the estimate, and the need for it is dictated by the non-uniqueness of answers in problems with interval data. A substantiation of the new variability measure is given, and its application and motivations are discussed. Several examples and a series of numerical tests are considered, showing the features of the new characteristic and the specifics of its use.
Keywords: data fitting problem, linear regression, interval data uncertainty, maximum compatibility method, strong compatibility, variability measure.
MSC 2010: 65G40, 62J10, 90C90


1 Introduction and problem statement

The purpose of this work is to present a quantitative variability measure for estimates of parameters of functional dependencies in the statistics of interval data. This is a relatively young branch of modern data science that does not rely on probability theory but makes extensive use of interval analysis methods (see, e. g., the surveys in [4, 7, 10]).

Fig. 1: A variability measure can be an estimate of the size of the set of possible solutions.

By the term “variability”, we understand the degree of variation and ambiguity of the estimate, and the need for its introduction is dictated by the fact that, in processing interval data, the answer is typically not unique. Usually, we get a whole set of different estimates that are equally consistent (compatible) with the source data and, thus, suitable as solutions to the problem. The extent to which this set is large or small is, partly, characterized by the term “variability”. In traditional probabilistic statistics, estimates of parameters are known to be random variables themselves, and the measure of their variability can be the variance of the estimates, mean absolute difference, median absolute deviation, average absolute deviation, and such like. What could be their analogues in the statistics of interval data?

At first glance, the answer to this question seems quite obvious: it can be any value that characterizes the size of the set of solutions to the problem, if it is non-empty. We can even take an enclosure of the solution set obtained by an interval method. A certain disadvantage of this variant is the excessive detailing of the answer given as a box in \mathbb{R}^{n}, a large amount of information that still needs to be “digested” and reduced to a compact and expressive form. Sometimes, an interval estimate in the form of an axis-aligned box may inadequately represent the solution set. Another disadvantage is the complexity of finding such an estimate.

It is desirable to have a relatively simple and efficiently computable quantity, expressed in a single number, because it would give a general aggregate view of the subject of interest. Similarly to variance and other probabilistic measures, it can serve as an approximate characteristic of the quality of parameter estimation. The greater the variability of an estimate, the less certain it is, and this can serve as a basis for conclusions about the quality of the estimate.

At the same time, the introduced variability measure should not be simply the “size of the solution set”. If this solution set, for example, is unstable and changes abruptly with arbitrarily small changes in the data, then its size is, to some extent, misleading and disorienting (see example in Section 4). A practically useful variability measure should take into account this possible instability of the solution set to the problem and give us a robust value.

Fig. 2: An illustration for the data fitting problem under interval uncertainty.

In our article, we work within the framework of the data fitting problem (often called the regression analysis problem): given results of measurements or observations, it is required to construct a functional dependence of a fixed type that “best fits” these data. Specifically, we need to determine the parameters x_{1}, x_{2}, \ldots, x_{n} of a linear function of the form

b = x_{1}a_{1} + \ldots + x_{n}a_{n}   (1)

from a number of values of the independent variables a_{1}, a_{2}, \ldots, a_{n} (also called exogenous, explanatory, predictor or input variables), and the corresponding values of the dependent variable b (also called endogenous, response, criterion or output variable). Both a_{1}, a_{2}, \ldots, a_{n} and b are not known precisely, and we only have intervals of their possible values (see Fig. 2). To find estimates of the coefficients x_{1}, x_{2}, \ldots, x_{n}, we use the so-called maximum compatibility method (previously called the “maximum consistency method”), which was proposed and developed in the works [6, 16, 17, 19] and others. After the estimates for x_{1}, x_{2}, \ldots, x_{n} are found, we need to somehow evaluate their variability. Our article presents a construction of the variability measure in the above data fitting problem.

Note that traditional methods of data fitting and regression analysis, such as the least squares method and its modifications, the least modulus method, etc., cannot be applied to the solution of our problem, since they are unsuitable for situations where the source data are intervals rather than points.

2 Formulation of the main results

2.1 Maximum compatibility method and tolerable solution set

The initial data for our problem is a set of values of independent and dependent variables for function (1), which are obtained as a result of m measurements (observations):

\begin{array}{ccccc}
\text{\boldmath$a$}_{11}, & \text{\boldmath$a$}_{12}, & \ldots & \text{\boldmath$a$}_{1n}, & \text{\boldmath$b$}_{1},\\
\text{\boldmath$a$}_{21}, & \text{\boldmath$a$}_{22}, & \ldots & \text{\boldmath$a$}_{2n}, & \text{\boldmath$b$}_{2},\\
\vdots & \vdots & \ddots & \vdots & \vdots\\
\text{\boldmath$a$}_{m1}, & \text{\boldmath$a$}_{m2}, & \ldots & \text{\boldmath$a$}_{mn}, & \text{\boldmath$b$}_{m}.
\end{array}   (2)

These are intervals as we assume that these data are inaccurate and have interval uncertainty due to measurement errors, etc. Both the data (2) and other interval values throughout the text are highlighted in bold mathematical font according to the informal international standard [5]. The first index of the interval values from (2) means the measurement number, and the second one, at \text{\boldmath$a$}_{ij}’s, is the number of the independent variable that takes the corresponding value in this measurement.

To find an estimate (\hat{x}_{1}, \hat{x}_{2}, \ldots, \hat{x}_{n}) of the parameters of the linear function (1), we “substitute” data (2) into equality (1), thus getting an interval system of linear algebraic equations

\left\{\ \begin{array}{ccccccccc}
\text{\boldmath$a$}_{11}x_{1} & + & \text{\boldmath$a$}_{12}x_{2} & + & \ldots & + & \text{\boldmath$a$}_{1n}x_{n} & = & \text{\boldmath$b$}_{1},\\
\text{\boldmath$a$}_{21}x_{1} & + & \text{\boldmath$a$}_{22}x_{2} & + & \ldots & + & \text{\boldmath$a$}_{2n}x_{n} & = & \text{\boldmath$b$}_{2},\\
\vdots & & \vdots & & \ddots & & \vdots & & \vdots\\
\text{\boldmath$a$}_{m1}x_{1} & + & \text{\boldmath$a$}_{m2}x_{2} & + & \ldots & + & \text{\boldmath$a$}_{mn}x_{n} & = & \text{\boldmath$b$}_{m},
\end{array}\right.   (3)

or, briefly,

\text{\boldmath$A$}x = \text{\boldmath$b$}   (4)

with an interval m\times n-matrix \text{\boldmath$A$} = (\text{\boldmath$a$}_{ij}) and an interval m-vector \text{\boldmath$b$} = (\text{\boldmath$b$}_{i}) in the right-hand side. The sets of parameters which are compatible, in this or that sense, with the measurement data (2) form various solution sets for the equations system (3). The most popular of them are the united solution set and the tolerable solution set. The united solution set, defined as

\varXi_{uni}(\text{\boldmath$A$},\text{\boldmath$b$}) = \bigl\{\, x\in\mathbb{R}^{n} \mid Ax = b \text{ for some } A\in\text{\boldmath$A$} \text{ and } b\in\text{\boldmath$b$} \,\bigr\},

corresponds to the so-called weak compatibility between the parameters of function (1) and data (2) (see [6, 16, 17]). The tolerable solution set, defined as

\varXi_{tol}(\text{\boldmath$A$},\text{\boldmath$b$}) = \bigl\{\, x\in\mathbb{R}^{n} \mid Ax\in\text{\boldmath$b$} \text{ for each matrix } A\in\text{\boldmath$A$} \,\bigr\},

corresponds to the so-called strong compatibility between the parameters of function (1) and data (2) (see [19]).

Fig. 3: An illustration of the strong compatibility between interval data and a linear function.

Further, we assume that the solution to the data fitting problem for function (1) is found by the maximum compatibility method (see [16, 17, 19]). As an estimate of the parameters of function (1), it takes the maximum point of the recognizing functional, a special function that gives a quantitative “compatibility measure” of this estimate with empirical data (2).

The maximum compatibility method has two versions, “weak” and “strong”, that differ in understanding how exactly the interval data should be “compatible” with the function that we construct on them. Weak and strong compatibility reflect two different situations that may occur in data processing. In the weak version, it is required that the graph of the constructed function just intersects the measurement uncertainty boxes (see [16, 17]). The strong version implies a more stringent condition: it requires that the function graph passes within the “corridors” specified by the intervals \text{\boldmath$b$}_{i}, i = 1, 2, \ldots, m, for any values of the independent variables a_{1}, a_{2}, \ldots, a_{n} from the respective intervals \text{\boldmath$a$}_{i1}, \text{\boldmath$a$}_{i2}, \ldots, \text{\boldmath$a$}_{in} obtained in the i-th measurement (see [19]). This is illustrated in Fig. 3, where the straight line of the function graph goes through the vertical faces of the measurement uncertainty boxes. The weak compatibility is shown in Fig. 2 by the two upper straight lines. The lower line in Fig. 2 does not satisfy the compatibility condition at all, neither weak nor strong, since it does not intersect some of the boxes.

The “strong version” of the maximum compatibility method has a number of theoretical and practical advantages over the “weak version”. These are polynomial complexity, robustness of estimates and their finite variability, the fact that the strong compatibility partially overcomes the so-called Demidenko paradox, etc. (see details in [19]). Hence, we consider below the strong version of the maximum compatibility method, which corresponds to the tolerable solution set \varXi_{tol}(\text{\boldmath$A$},\text{\boldmath$b$}) for the interval system of equations (4). Its recognizing functional is usually denoted by “Tol”,

\mathrm{Tol}\,(x,\text{\boldmath$A$},\text{\boldmath$b$}) \;=\; \min_{1\leq i\leq m}\left\{\, \mathrm{rad}\,\text{\boldmath$b$}_{i} - \left|\; \mathrm{mid}\,\text{\boldmath$b$}_{i} - \sum_{j=1}^{n} \text{\boldmath$a$}_{ij}x_{j} \,\right| \,\right\},   (5)

where

\mathrm{rad}\,\text{\boldmath$b$}_{i} = \tfrac{1}{2}\bigl(\overline{\text{\boldmath$b$}}_{i} - \underline{\text{\boldmath$b$}}_{i}\bigr), \qquad\qquad \mathrm{mid}\,\text{\boldmath$b$}_{i} = \tfrac{1}{2}\bigl(\overline{\text{\boldmath$b$}}_{i} + \underline{\text{\boldmath$b$}}_{i}\bigr)

are the radii and midpoints of the components of the right-hand side \text{\boldmath$b$}, the arithmetic operations inside the modulus in (5) are those of the classical interval arithmetic (see, e. g., [4, 8, 9]), and the modulus is understood as the maximum absolute value of the points from the interval,

|\text{\boldmath$a$}| = \max\,\{\,|a| \mid a\in\text{\boldmath$a$}\,\} = \max\,\bigl\{\,|\underline{\text{\boldmath$a$}}|,\ |\overline{\text{\boldmath$a$}}|\,\bigr\}.

Typical graphs of the functional Tol for the one-dimensional case are shown in Fig. 4 and Fig. 5.
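For readers who wish to experiment, the recognizing functional (5) is straightforward to evaluate numerically. The following Python sketch is our own illustration (the data layout with endpoint arrays A_inf, A_sup, b_inf, b_sup is an assumption, not a convention of the paper); it uses the fact that the magnitude of an interval with midpoint m and radius r equals |m| + r.

```python
import numpy as np

def tol(x, A_inf, A_sup, b_inf, b_sup):
    """Evaluate the recognizing functional Tol(x, A, b) of formula (5).

    A_inf, A_sup : (m, n) arrays with the endpoints of the interval matrix A;
    b_inf, b_sup : (m,) arrays with the endpoints of the interval vector b.
    """
    x = np.asarray(x, dtype=float)
    A_mid, A_rad = (A_sup + A_inf) / 2, (A_sup - A_inf) / 2
    b_mid, b_rad = (b_sup + b_inf) / 2, (b_sup - b_inf) / 2
    # The interval sum_j a_ij x_j has midpoint (A_mid @ x)_i and radius (A_rad @ |x|)_i,
    # so the magnitude of  mid b_i - sum_j a_ij x_j  equals |b_mid_i - (A_mid @ x)_i| + (A_rad @ |x|)_i.
    deviation = np.abs(b_mid - A_mid @ x) + A_rad @ np.abs(x)
    return np.min(b_rad - deviation)
```

Maximizing this concave piecewise linear function over x is exactly the optimization problem discussed below.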

To solve the data fitting problem for the linear function (1) and data set (2), it is necessary to find the unconstrained maximum, over all x\in\mathbb{R}^{n}, of the functional \mathrm{Tol}\,(x,\text{\boldmath$A$},\text{\boldmath$b$}),

\mathrm{Tol}\,(x,\text{\boldmath$A$},\text{\boldmath$b$}) \rightarrow \max,

and the vector \hat{x} = \arg\max_{x\in\mathbb{R}^{n}} \mathrm{Tol}\,(x,\text{\boldmath$A$},\text{\boldmath$b$}) at which this maximum is attained provides an estimate of the parameters of function (1).

If \max\,\mathrm{Tol} \geq 0, then the solution set \varXi_{tol}(\text{\boldmath$A$},\text{\boldmath$b$}), i. e., the set of parameters strongly compatible with the data, is non-empty, and \hat{x}\in\varXi_{tol}(\text{\boldmath$A$},\text{\boldmath$b$}). If \max\,\mathrm{Tol} < 0, then the solution set \varXi_{tol}(\text{\boldmath$A$},\text{\boldmath$b$}) is empty and there do not exist parameters that are strongly compatible with data (2). However, the argument \hat{x} of \max\,\mathrm{Tol} still provides the best compatibility of the constructed linear function with data (2) (more precisely, the least incompatibility).

To conclude this subsection, we give a useful result on the tolerable solution set that allows us to investigate whether it is bounded or unbounded, i. e., whether the tolerable solution set is finite in size or extends infinitely.


Irene Sharaya’s boundedness criterion [13]  Let the tolerable solution set to an interval linear system \text{\boldmath$A$}x = \text{\boldmath$b$} be nonempty. It is unbounded if and only if the matrix \text{\boldmath$A$} has linearly dependent noninterval columns.


The boundedness criterion shows that the tolerable solution set is unbounded, in fact, only under exceptional circumstances, which are almost never met in practice when working with actual interval data. That is, the tolerable solution set is mostly bounded, and the estimates obtained by the strong version of the maximum compatibility method almost always have finite variability.
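As a small illustration of the criterion, the following Python sketch (our own assumption about the data representation, with the interval matrix stored as endpoint arrays A_inf and A_sup) checks whether the noninterval columns of an interval matrix are linearly dependent, i. e., whether a nonempty tolerable solution set could be unbounded.

```python
import numpy as np

def point_columns_dependent(A_inf, A_sup):
    """Check the condition of Irene Sharaya's criterion: are the noninterval
    (zero-radius) columns of the interval matrix linearly dependent?"""
    A_inf, A_sup = np.asarray(A_inf, float), np.asarray(A_sup, float)
    rad = (A_sup - A_inf) / 2
    point_cols = np.where(np.all(rad == 0, axis=0))[0]   # indices of noninterval columns
    if point_cols.size == 0:
        return False                                     # no noninterval columns at all
    P = A_inf[:, point_cols]                             # their (point) values
    return np.linalg.matrix_rank(P) < P.shape[1]
```

For a nonempty tolerable solution set, unboundedness occurs exactly when this function returns True.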

2.2 Variability measures

As a quantity characterizing the variability of the estimate of the parameter vector \hat{x} = (\hat{x}_{1}, \hat{x}_{2}, \ldots, \hat{x}_{n}) in the linear function (1), which is obtained by the maximum compatibility method from data (2), we propose

\mathrm{IVE}\,(\text{\boldmath$A$},\text{\boldmath$b$}) \;=\; \sqrt{n}\;\max_{\mathbb{R}^{n}}\mathrm{Tol} \cdot \Bigl(\,\min_{A\in\text{\boldmath$A$}} \mathrm{cond}_{2}\,A\,\Bigr) \cdot \frac{\bigl\|\,\arg\max_{\mathbb{R}^{n}}\mathrm{Tol}\,\bigr\|_{2}}{\|\hat{\text{\boldmath$b$}}\|_{2}}.   (6)

In this formula,

n is the dimension of the parameter vector of function (1) under construction,

\|\cdot\|_{2} is the Euclidean norm (2-norm) of vectors from \mathbb{R}^{n}, defined as

\|x\|_{2} \;=\; \left(\,\sum_{i=1}^{n} |x_{i}|^{2}\right)^{1/2},

\mathrm{cond}_{2}\,A is the spectral condition number of the matrix A, defined as

\mathrm{cond}_{2}\,A \;=\; \frac{\sigma_{\max}(A)}{\sigma_{\min}(A)},

i. e., the ratio of the maximal \sigma_{\max}(A) and minimal \sigma_{\min}(A) singular values of A; it is an extension, to the rectangular case, of the concept of the condition number from computational linear algebra (see, e. g., [2, 24]);

\hat{\text{\boldmath$b$}} is a certain “most representative” point from the interval vector \text{\boldmath$b$}, which is taken as

\hat{\text{\boldmath$b$}} \;=\; \tfrac{1}{2}\bigl(|\mathrm{mid}\,\text{\boldmath$b$} + \mathrm{rad}\,\text{\boldmath$b$}| + |\mathrm{mid}\,\text{\boldmath$b$} - \mathrm{rad}\,\text{\boldmath$b$}|\bigr),   (7)

where the operations “mid” and “rad” are applied in a componentwise manner.

Fig. 4: The maximum value of the recognizing functional gives an idea of the size of the tolerable solution set \varXi_{tol}.

Despite the definite formula (7) for \hat{\text{\boldmath$b$}}, it should be noted that the introduction of this point is, to a large extent, a matter of common sense. The general approach to the definition of \hat{\text{\boldmath$b$}} is that it must be a kind of “most representative” point from the right-hand side vector \text{\boldmath$b$}, and in some situations this choice may be different from formula (7). For example, \hat{\text{\boldmath$b$}} can be a point result of the measurement, around which the uncertainty interval is built later, based on information about the accuracy of the measuring device.

Apart from (6), as a measure of relative variability of the parameter estimate, the value

n\; \Bigl(\,\min_{A\in\text{\boldmath$A$}} \mathrm{cond}_{2}A\,\Bigr) \cdot \frac{\max_{\mathbb{R}^{n}}\mathrm{Tol}}{\|\hat{\text{\boldmath$b$}}\|_{2}},   (8)

can have a certain significance. Both IVE and value (8) are defined for interval linear systems (4) with nonzero right-hand sides. They can take either positive real values or be infinite. The latter occurs only in the case \min_{A\in\text{\boldmath$A$}} \mathrm{cond}_{2}A = \infty, when all the point matrices A\in\text{\boldmath$A$} have incomplete rank, i. e., when \sigma_{\min}(A) = 0 for every A\in\text{\boldmath$A$}. Then the variability measures are set to be infinite.
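Assuming the ingredients of formula (6) have already been computed (the maximum of the recognizing functional, its argument, and an estimate of the minimum condition number), IVE and the relative measure (8) reduce to a few lines of arithmetic. The Python sketch below is illustrative only; the helper names and calling conventions are our own assumptions.

```python
import numpy as np

def b_hat(b_inf, b_sup):
    """The 'most representative' point of the interval vector b, formula (7)."""
    b_mid = (np.asarray(b_sup, float) + np.asarray(b_inf, float)) / 2
    b_rad = (np.asarray(b_sup, float) - np.asarray(b_inf, float)) / 2
    return 0.5 * (np.abs(b_mid + b_rad) + np.abs(b_mid - b_rad))

def ive(max_tol, x_hat, min_cond, b_inf, b_sup):
    """The variability measure IVE, formula (6)."""
    n = len(x_hat)
    return np.sqrt(n) * max_tol * min_cond * np.linalg.norm(x_hat) / np.linalg.norm(b_hat(b_inf, b_sup))

def relative_variability(max_tol, x_hat, min_cond, b_inf, b_sup):
    """The relative variability measure, formula (8)."""
    return len(x_hat) * min_cond * max_tol / np.linalg.norm(b_hat(b_inf, b_sup))
```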

The symbol IVE is built as an abbreviation of the phrase “interval variability of the estimate”. Below, we show that the value IVE adequately characterizes the size of a non-empty tolerable solution set for a large class of practically important situations. But first it is useful to discuss informal motivations that lead to the estimate IVE and to demonstrate that IVE has an intuitive, clear and even visual meaning.

Fig. 5: In addition to the maximum of the recognizing functional, the size of the tolerable solution set is also affected by the “steepness” of the graph.

The tolerable solution set of an interval system of linear algebraic equations is the zero level set of the recognizing functional Tol (see details in [15]), or, in other words, the intersection of the hypograph of this functional with the coordinate plane \mathrm{Tol} = 0 (this is illustrated in Fig. 4). As a consequence, the magnitude of the maximum of the recognizing functional can, other things being equal, be a measure of how extensive or narrow the tolerable solution set is. The larger \max\,\mathrm{Tol}, the larger the size of the tolerable solution set, and vice versa. An additional factor that provides “other things being equal” is the slope (steepness) of the pieces of hyperplanes of which the polyhedral graph of the functional Tol is composed (these are straight lines in the 1D case in Fig. 4 and Fig. 5). The slope of the hyperplanes is determined by the coefficients of the equations that define them, which are the endpoints of the data intervals (2). The value of this slope is summarized in terms of the condition number of point matrices from the interval data matrix \text{\boldmath$A$}. Finally, the multiplier

\frac{\|\arg\max\,\mathrm{Tol}\|_{2}}{\|\hat{\text{\boldmath$b$}}\|_{2}} \;=\; \frac{\|\hat{x}\|_{2}}{\|\hat{\text{\boldmath$b$}}\|_{2}}

is a scaling coefficient that helps to provide the commensurability of the final value with the magnitudes of the solution, \arg\max\,\mathrm{Tol}, and of the right-hand side vector of the equation system. Thus, formula (6) is obtained.

3 A justification of the variability measure

Considering the most general case, we should assume that the number of measurements m may not coincide with the number n of unknown parameters of the linear function (1). In this section, we consider only the case m \geq n. In other words, the number of measurements (observations) made is not less than the number of function parameters. Then the interval system of linear equations (4) is either square or tall (overdetermined). Of course, the data fitting problem makes sense for m < n too, the maximum compatibility method also works for this case, and the variability measure IVE is then also applicable (see Section 4), but the latter still needs a separate substantiation.

3.1 Estimates of perturbations of the solution to rectangular linear systems

The starting point of our constructions justifying the choice of (6) exactly in the form described above is the well-known inequality that estimates the perturbation \Delta x of a nonzero solution x to the system of linear algebraic equations Ax = b depending on the change \Delta b of the right-hand side b (see, e. g., [2, 24]):

\frac{\|\Delta x\|_{2}}{\|x\|_{2}} \;\leq\; \mathrm{cond}_{2}\,A \cdot \frac{\|\Delta b\|_{2}}{\|b\|_{2}}.   (9)

It is usually considered for square systems of linear equations, when m = n, but in the case of the Euclidean vector norm and the spectral condition number of matrices, this inequality holds true in the more general case with m \geq n. Naturally, estimate (9) makes sense only for \sigma_{\min}(A) \neq 0, when \mathrm{cond}_{2}A < \infty, i. e., when the matrix A has full column rank. Let us briefly recall its derivation for this case.

Given

Ax = b \quad\text{ and }\quad A(x + \Delta x) = b + \Delta b,

we have

A\Delta x = \Delta b.

Further,

\begin{aligned}
\frac{\|\Delta x\|_{2}/\|x\|_{2}}{\|\Delta b\|_{2}/\|b\|_{2}}
&= \frac{\|\Delta x\|_{2}\,\|b\|_{2}}{\|x\|_{2}\,\|\Delta b\|_{2}}
 = \frac{\|\Delta x\|_{2}\,\|Ax\|_{2}}{\|x\|_{2}\,\|A\Delta x\|_{2}}
 = \frac{\|\Delta x\|_{2}}{\|A\Delta x\|_{2}}\;\frac{\|Ax\|_{2}}{\|x\|_{2}}\\
&\leq \max_{\Delta x\neq 0}\frac{\|\Delta x\|_{2}}{\|A\Delta x\|_{2}}\;\max_{x\neq 0}\frac{\|Ax\|_{2}}{\|x\|_{2}}
 = \left(\min_{\Delta x\neq 0}\frac{\|A\Delta x\|_{2}}{\|\Delta x\|_{2}}\right)^{-1}\max_{x\neq 0}\frac{\|Ax\|_{2}}{\|x\|_{2}}\\
&= \bigl(\sigma_{\min}(A)\bigr)^{-1}\,\sigma_{\max}(A) \;=\; \mathrm{cond}_{2}(A)
\end{aligned}

by virtue of the properties of the singular values (see, e. g., [3, 24]). A comparison of the beginning and the end of this calculation leads to the inequality (9), which, as is easy to understand, is attainable for some x and \Delta x, or, equivalently, for some right-hand sides b and their perturbations \Delta b. Naturally, the above calculations and the resulting estimate make sense only for \sigma_{\min}(A) \neq 0.
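Inequality (9) is easy to check numerically for the rectangular case m > n; the following Python fragment is a purely illustrative sanity check with random data (our own assumption, not part of the paper's argument).

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 8, 3
A = rng.standard_normal((m, n))        # a random tall matrix; full column rank with probability 1

x  = rng.standard_normal(n)
dx = 1e-3 * rng.standard_normal(n)     # a small perturbation of the solution
b, db = A @ x, A @ dx                  # consistent right-hand side and its perturbation

lhs = np.linalg.norm(dx) / np.linalg.norm(x)
rhs = np.linalg.cond(A) * np.linalg.norm(db) / np.linalg.norm(b)
assert lhs <= rhs * (1 + 1e-12)        # inequality (9) holds
print(f"{lhs:.3e} <= {rhs:.3e}")
```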

3.2 Interval systems with point matrices

Let us consider an interval system of linear algebraic equations

Ax = \text{\boldmath$b$}   (10)

with a point (noninterval) m\times n-matrix A, m \geq n, and an interval m-vector \text{\boldmath$b$} in the right-hand side. We assume that A has full column rank and, therefore, \mathrm{cond}_{2}\,A < \infty.

Suppose also that the tolerable solution set for system (10) is non-empty, i. e. \varXi_{tol}(A,\text{\boldmath$b$}) = \bigl\{\, x\in\mathbb{R}^{n} \mid Ax\in\text{\boldmath$b$} \,\bigr\} \neq \varnothing. We need to quickly and with little effort estimate the size of this solution set, and our answer will be a “radius type” estimate for \varXi_{tol}(A,\text{\boldmath$b$}). More precisely, we are going to evaluate \max\|x^{\prime} - \hat{x}\|_{2} over all x^{\prime}\in\varXi_{tol}(A,\text{\boldmath$b$}) and for a special fixed point \hat{x}\in\varXi_{tol}(A,\text{\boldmath$b$}), which is taken as

\hat{x} \;=\; \arg\max_{x\in\mathbb{R}^{n}} \mathrm{Tol}\,(x, A, \text{\boldmath$b$}).

Recall that the argument \hat{x} of the maximum of the recognizing functional for system (10) is an estimate of the parameters of linear function (1) from empirical data. Strictly speaking, this point can be determined non-uniquely, but then let \hat{x} be any one of the points at which the maximum is reached.

Let x^{\prime} be a point in the tolerable solution set \varXi_{tol}(A,\text{\boldmath$b$}). How to evaluate \|x^{\prime} - \hat{x}\|_{2}? It is clear that x^{\prime} and \hat{x} are solutions of systems of linear algebraic equations with the matrix A and some right-hand sides b^{\prime} and \hat{b}, respectively, from the interval vector \text{\boldmath$b$}. If \hat{x} \neq 0 and \hat{b} \neq 0, then we can apply inequality (9), considering a perturbation of the solution \hat{x} to the system of linear algebraic equations Ax = \hat{b}. Then \Delta x = x^{\prime} - \hat{x}, \Delta b = b^{\prime} - \hat{b}, and we get

\frac{\|x^{\prime} - \hat{x}\|_{2}}{\|\hat{x}\|_{2}} \;\leq\; \mathrm{cond}_{2}\,A \cdot \frac{\|b^{\prime} - \hat{b}\|_{2}}{\|\hat{b}\|_{2}},

from which the absolute estimate is obtained:

\|x^{\prime} - \hat{x}\|_{2} \;\leq\; \mathrm{cond}_{2}\,A \cdot \|\hat{x}\|_{2} \cdot \frac{\|b^{\prime} - \hat{b}\|_{2}}{\|\hat{b}\|_{2}}.   (11)

The point \hat{x} is found as the result of maximization of the recognizing functional Tol, the point \hat{b} coincides with A\hat{x}, and the condition number \mathrm{cond}_{2}\,A can be computed by well-developed standard procedures. Therefore, for practical work with inequality (11), one needs to somehow evaluate \|b^{\prime} - \hat{b}\|_{2}.

But first, bearing in mind the further application of the deduced estimate in a situation where the matrix A may vary, we somewhat roughen (11) by taking approximately \|\hat{b}\|_{2} \approx \|\hat{\text{\boldmath$b$}}\|_{2}, that is, as the norm of the “most representative” point \hat{\text{\boldmath$b$}} of the interval vector \text{\boldmath$b$}, which we defined in Section 2.2:

\|\hat{b}\|_{2} \,\approx\, \|\hat{\text{\boldmath$b$}}\|_{2}, \qquad \text{ where }\ \hat{\text{\boldmath$b$}} \,=\, \tfrac{1}{2}\,\bigl(\,|\mathrm{mid}\,\text{\boldmath$b$} + \mathrm{rad}\,\text{\boldmath$b$}| + |\mathrm{mid}\,\text{\boldmath$b$} - \mathrm{rad}\,\text{\boldmath$b$}|\,\bigr).

In doing this, some coarsening is allowed, so instead of (11) we write

\|x^{\prime} - \hat{x}\|_{2} \;\lessapprox\; \mathrm{cond}_{2}\,A \cdot \|\hat{x}\|_{2} \cdot \frac{\|\Delta b\|_{2}}{\|\hat{\text{\boldmath$b$}}\|_{2}}.   (12)

Now it is necessary to determine the increment of the right-hand side \Delta b = b^{\prime} - \hat{b}. Its obvious upper bound is 2\,\mathrm{rad}\,\text{\boldmath$b$}, but it is too crude. To get a more accurate estimate of \Delta b, we also consider, along with system (10), a system of linear algebraic equations

Ax = \tilde{\text{\boldmath$b$}},   (13)

for which the right-hand side is obtained by uniformly “compressing” the interval vector \text{\boldmath$b$}:

\tilde{\text{\boldmath$b$}} \,:=\, \bigl[\, \underline{\text{\boldmath$b$}} + M, \;\overline{\text{\boldmath$b$}} - M \,\bigr],   (14)

where

M \;:=\; \max_{x\in\mathbb{R}^{n}} \mathrm{Tol}\,(x, A, \text{\boldmath$b$}) \;\geq\; 0.

Since the maximum M is attained at a certain value of the argument, \hat{x}, we have

M \,=\, \min_{1\leq i\leq m}\left\{\, \mathrm{rad}\,\text{\boldmath$b$}_{i} - \left|\; \mathrm{mid}\,\text{\boldmath$b$}_{i} - \sum_{j=1}^{n} \text{\boldmath$a$}_{ij}\hat{x}_{j} \,\right| \,\right\} \;\leq\; \min_{1\leq i\leq m} \mathrm{rad}\,\text{\boldmath$b$}_{i}.

As a result, \underline{\text{\boldmath$b$}} + M \leq \overline{\text{\boldmath$b$}} - M in the componentwise sense, and the endpoints in the interval vector (14) do not “overlap” each other.

But the properties of the recognizing functional imply that, for the interval system of linear algebraic equations (13) with the right-hand side (14), the maximum of the recognizing functional is zero:

\max_{x\in\mathbb{R}^{n}} \mathrm{Tol}\,(x, A, \tilde{\text{\boldmath$b$}}) \;=\; 0.

Indeed, the values of \mathrm{rad}\,\text{\boldmath$b$}_{i} are summands in all the expressions in (5), over which we take the minimum for i = 1, 2, \ldots, m. Hence, if we simultaneously increase or decrease all \mathrm{rad}\,\text{\boldmath$b$}_{i} by the same value, keeping the midpoints \mathrm{mid}\,\text{\boldmath$b$}_{i} unchanged, then the total value of the recognizing functional will increase or decrease by exactly the same value. In other words, if we take a constant C \geq 0 and the interval m-vector \text{\boldmath$e$} = ([-1,1], \ldots, [-1,1])^{\top}, then, for the system Ax = \text{\boldmath$b$} + C\text{\boldmath$e$}\, with all the right-hand sides expanded by [-C, C], we have

\mathrm{Tol}\,(x, A, \text{\boldmath$b$} + C\text{\boldmath$e$}) \;=\; \mathrm{Tol}\,(x, A, \text{\boldmath$b$}) + C.   (15)

Therefore,

\max_{x\in\mathbb{R}^{n}} \mathrm{Tol}\,(x, A, \text{\boldmath$b$} + C\text{\boldmath$e$}) \;=\; \max_{x\in\mathbb{R}^{n}} \mathrm{Tol}\,(x, A, \text{\boldmath$b$}) + C.   (16)

The uniform narrowing of the right-hand side vector acts on the tolerable solution set and the recognizing functional in a completely similar way. If we narrow down all the components by the same value M, then the maximum of the recognizing functional of the new interval system also decreases by M.
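The shift property (15), from which (16) follows by taking maxima, can be verified directly; the short Python sketch below (with illustrative random data of our own choosing) evaluates the recognizing functional of a system with a point matrix before and after expanding every right-hand side component by [-C, C].

```python
import numpy as np

def tol(x, A, b_inf, b_sup):
    """Recognizing functional (5) for a point matrix A and an interval right-hand side."""
    b_mid, b_rad = (b_sup + b_inf) / 2, (b_sup - b_inf) / 2
    return np.min(b_rad - np.abs(b_mid - A @ x))

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))
b_mid = A @ rng.standard_normal(3)
b_inf, b_sup = b_mid - 1.0, b_mid + 1.0          # some interval right-hand side

C = 0.37
for _ in range(3):                               # identity (15) at a few random points x
    x = rng.standard_normal(3)
    widened = tol(x, A, b_inf - C, b_sup + C)    # right-hand side expanded by [-C, C]
    assert np.isclose(widened, tol(x, A, b_inf, b_sup) + C)
print("identity (15) confirmed at sample points; (16) follows by taking maxima")
```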

By virtue of the properties of the recognizing functional, the tolerable solution set \varXi_{tol}(A, \tilde{\text{\boldmath$b$}}) for system (13) has empty interior (such sets are often called “non-solid” or “meager”), which we will consider equivalent to “having zero size”. Naturally, this is a simplifying assumption, since in reality the tolerable solution set corresponding to the zero maximum of the recognizing functional may not be a single-point set. But we still accept that. This simplification is also supported by the fact that the situation with the zero maximum of the recognizing functional is unstable: the corresponding tolerable solution set can become empty with an arbitrarily small data perturbation (see Section 4).

Another fact concerning the auxiliary system (13) with the narrowed right-hand side, which follows from (15)–(16), is that the point \hat{x} remains the argument of the maximum of the recognizing functional:

\hat{x} \;=\; \arg\max_{x\in\mathbb{R}^{n}} \mathrm{Tol}\,(x, A, \tilde{\text{\boldmath$b$}}).

For this reason, the point \hat{b} = A\hat{x} lies in the interval vector \tilde{\text{\boldmath$b$}} defined by (14).

From what has been said, it follows that the solution set for the system Ax = \text{\boldmath$b$} is obtained from the solution set of the system Ax = \tilde{\text{\boldmath$b$}}, which has “negligible size” and for which \max_{x\in\mathbb{R}^{n}} \mathrm{Tol}\,(x, \text{\boldmath$A$}, \tilde{\text{\boldmath$b$}}) = 0, through expanding the right-hand side vector \tilde{\text{\boldmath$b$}} in each component simultaneously by [-M, M], where

M = \max_{x\in\mathbb{R}^{n}} \mathrm{Tol}\,(x, \text{\boldmath$A$}, \text{\boldmath$b$}).

The interval vector \tilde{\text{\boldmath$b$}} \ni \hat{b} may have non-zero size, but we put

[-\Delta b, \Delta b] = ([-M, M], \ldots, [-M, M])^{\top}

in order to make our estimate (12) attainable. Accordingly, in inequality (12) we take

\|\Delta b\| = \max_{x\in\mathbb{R}^{n}} \mathrm{Tol}\,(x, \text{\boldmath$A$}, \text{\boldmath$b$}),

if the Chebyshev norm (\infty-norm) is considered, or a value that differs from it by a corrective factor from the equivalence inequality for vector norms, if we take any other norm. As is known, for any vector y\in\mathbb{R}^{n} (see [2])

\|y\|_{\infty} \leq \|y\|_{2} \leq \sqrt{n}\;\|y\|_{\infty}.   (17)

Then

\|x^{\prime} - \hat{x}\|_{2} \;\lessapprox\; \sqrt{n}\;\mathrm{cond}_{2}A \cdot \bigl\|\,\arg\max_{x\in\mathbb{R}^{n}}\mathrm{Tol}\,\bigr\|_{2} \cdot \frac{\max_{x\in\mathbb{R}^{n}}\mathrm{Tol}}{\|\hat{\text{\boldmath$b$}}\|_{2}}.   (18)

What happens if the matrix A does not have full column rank? Then, by virtue of the Irene Sharaya criterion, the nonempty tolerable solution set to the system (10) is unbounded. This is completely consistent with the fact that then \mathrm{cond}_{2}A = \infty and the value of the variability measure IVE is infinite too.

3.3 General interval systems

Finally, we consider a general interval system of linear equations \text{\boldmath$A$}x = \text{\boldmath$b$}, with an essentially interval matrix, i. e., when \mathrm{rad}\,\text{\boldmath$A$} \neq 0. In view of the properties of the tolerable solution set (see, e. g., [15]), it can be represented as

\varXi_{tol}(\text{\boldmath$A$},\text{\boldmath$b$}) \;=\; \bigcap_{A\in\text{\boldmath$A$}} \bigl\{\, x\in\mathbb{R}^{n} \mid Ax\in\text{\boldmath$b$} \,\bigr\} \;=\; \bigcap_{A\in\text{\boldmath$A$}} \varXi_{tol}(A,\text{\boldmath$b$}),   (19)

i. e., as the intersection of the solution sets to the individual systems Ax = \text{\boldmath$b$} with point matrices A\in\text{\boldmath$A$}.

For each interval linear system Ax = \text{\boldmath$b$} with A\in\text{\boldmath$A$}, we have estimate (18), if A has full column rank. Otherwise, if the point matrix A has incomplete column rank and the corresponding solution set \varXi_{tol}(A,\text{\boldmath$b$}) is unbounded, then we do not take it into account. Consequently, for the tolerable solution set of the system \text{\boldmath$A$}x = \text{\boldmath$b$}, which is the intersection of the solution sets \varXi_{tol}(A,\text{\boldmath$b$}) for all A\in\text{\boldmath$A$}, the following should be true:

\|x^{\prime} - \hat{x}\|_{2} \;\lessapprox\; \min_{A\in\text{\boldmath$A$}}\left\{\, \sqrt{n}\;\mathrm{cond}_{2}A \cdot \bigl\|\,\arg\max_{x\in\mathbb{R}^{n}}\mathrm{Tol}\,\bigr\|_{2} \cdot \frac{\max_{x\in\mathbb{R}^{n}}\mathrm{Tol}}{\|\hat{\text{\boldmath$b$}}\|_{2}} \,\right\}.   (20)

The transition from representation (19) to inequality (20) can be both very accurate and rather crude (as can be seen from considering the intersection of two 1D intervals). It all depends on the size of the intersection of the solution sets of the individual systems Ax = \text{\boldmath$b$}. On the other hand, the amount of this intersection is indirectly characterized by the magnitude of \max_{x\in\mathbb{R}^{n}}\mathrm{Tol}.

Taking the above facts into account, we perform approximate estimation of the right-hand side of inequality (20) by moving the minimum over A\in\text{\boldmath$A$} through the curly brackets. First of all, we evaluate the factor \|\arg\max_{x\in\mathbb{R}^{n}}\mathrm{Tol}\,\|_{2}, which changes to the smallest extent, by the constant available to us after the numerical solution of the data fitting problem:

\bigl\|\arg\max_{x\in\mathbb{R}^{n}}\mathrm{Tol}\,(x, A, \text{\boldmath$b$})\bigr\|_{2} \,\approx\; \mathrm{const} \;=\; \bigl\|\arg\max_{x\in\mathbb{R}^{n}}\mathrm{Tol}\,(x, \text{\boldmath$A$}, \text{\boldmath$b$})\bigr\|_{2}.   (21)

Next, the minimum of \mathrm{cond}_{2}A naturally turns into \min\mathrm{cond}_{2}A, and the most important factor \max_{x\in\mathbb{R}^{n}}\mathrm{Tol}\,(x, A, \text{\boldmath$b$}) will be changed to \max_{x\in\mathbb{R}^{n}}\mathrm{Tol}\,(x, \text{\boldmath$A$}, \text{\boldmath$b$}). This choice (as well as (21)) is rather rigidly determined by the following reasons.

The expression for our variability measure should preserve its simplicity and be uniform for all cases and situations. In particular, if the interval matrix \text{\boldmath$A$} shrinks to a point matrix A, then our measure should turn into the estimate (18) for the point case. Finally, if \max\,\mathrm{Tol} = 0, then our measure must be zero too, since the size of the (stable) tolerable solution set is also zero, and our variability measure should reliably detect such situations. All this taken together leads to the estimate

\|x^{\prime} - \hat{x}\|_{2} \;\lessapprox\; \sqrt{n}\;\Bigl(\,\min_{A\in\text{\boldmath$A$}} \mathrm{cond}_{2}A\,\Bigr) \cdot \bigl\|\,\arg\max_{x\in\mathbb{R}^{n}}\mathrm{Tol}\,\bigr\|_{2} \cdot \frac{\max_{x\in\mathbb{R}^{n}}\mathrm{Tol}}{\|\hat{\text{\boldmath$b$}}\|_{2}}.   (22)

The same estimate as (22), by virtue of the equivalence inequality (17), is also true for the Chebyshev norm:

\max_{x^{\prime}\in\varXi_{tol}(\text{\boldmath$A$},\text{\boldmath$b$})} \|x^{\prime} - \hat{x}\|_{\infty} \;\lessapprox\; \sqrt{n}\;\Bigl(\,\min_{A\in\text{\boldmath$A$}} \mathrm{cond}_{2}\,A\,\Bigr) \cdot \bigl\|\,\arg\max_{x\in\mathbb{R}^{n}}\mathrm{Tol}\,\bigr\|_{2} \cdot \frac{\max_{x\in\mathbb{R}^{n}}\mathrm{Tol}}{\|\hat{\text{\boldmath$b$}}\|_{2}}.

This completes the rationale for (6).

If we want to evaluate the relative size of the tolerable solution set, expressing it in ratio to the norm of its points, then it is reasonable to take \hat{x} = \arg\max_{x\in\mathbb{R}^{n}}\mathrm{Tol} as the “most typical” point from the tolerable solution set \varXi_{tol}(\text{\boldmath$A$},\text{\boldmath$b$}). Using (17) again, we obtain

\frac{\max_{x^{\prime}\in\varXi_{tol}(\text{\boldmath$A$},\text{\boldmath$b$})} \|x^{\prime} - x^{\prime\prime}\|_{\infty}}{\|\hat{x}\|_{\infty}} \;\lessapprox\; n\;\Bigl(\,\min_{A\in\text{\boldmath$A$}} \mathrm{cond}_{2}A\,\Bigr) \cdot \frac{\max_{x\in\mathbb{R}^{n}}\mathrm{Tol}}{\|\hat{\text{\boldmath$b$}}\|_{2}}.

This gives value (8).

4 Numerical examples and some tests

First of all, we consider an example of an unstable tolerable solution set that changes abruptly with small perturbations in the system of equations. For all interval 2\times 2-systems of linear algebraic equations of the form

\begin{pmatrix} [-1,1] & [-1,1]\\ 1 & -1 \end{pmatrix} \begin{pmatrix} x_{1}\\ x_{2} \end{pmatrix} = \begin{pmatrix} [-1,1]\\ [1,\,1+\eta] \end{pmatrix}, \qquad \eta \geq 0,   (23)

the tolerable solution sets are the same: this is the straight line segment joining the points (0,-1) and (1,0) and depicted in Fig. 6. The diameter of the solution set is essentially non-zero (namely, \sqrt{2}), but the unconstrained maximum of the recognizing functional Tol for all such systems is zero, and it is attained at the point (0.5, -0.5).

Fig. 6: The tolerable solution set for the interval equations systems (23).

At the same time, any arbitrarily small increase in the lower endpoint of the interval [1, 1+\eta] in the right-hand side of the second equation makes the tolerable solution set empty. An arbitrarily small reduction of the upper endpoint of the interval [-1,1], located in the first component of the right-hand side vector, produces a similar effect. It turns out that the maximum value of the recognizing functional Tol characterizes very precisely the instability of the original solution set and the zero size of the solution sets of perturbed systems.
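The behaviour just described can be reproduced numerically; the Python sketch below (our own illustration; the crude grid search only approximates the maximum and is not the optimization method used in the paper) evaluates the recognizing functional of system (23) with η = 0 at the reported maximizer and then for a slightly perturbed right-hand side.

```python
import numpy as np

def tol(x, A_inf, A_sup, b_inf, b_sup):
    """Recognizing functional (5); cf. the sketch in Section 2.1."""
    A_mid, A_rad = (A_sup + A_inf) / 2, (A_sup - A_inf) / 2
    b_mid, b_rad = (b_sup + b_inf) / 2, (b_sup - b_inf) / 2
    return np.min(b_rad - np.abs(b_mid - A_mid @ x) - A_rad @ np.abs(x))

A_inf = np.array([[-1.0, -1.0], [1.0, -1.0]])
A_sup = np.array([[ 1.0,  1.0], [1.0, -1.0]])
b_inf, b_sup = np.array([-1.0, 1.0]), np.array([1.0, 1.0])     # eta = 0

print(tol(np.array([0.5, -0.5]), A_inf, A_sup, b_inf, b_sup))  # 0.0 at the reported maximizer

eps = 1e-3                                                     # shrink the upper endpoint of the
b_sup_pert = np.array([1.0 - eps, 1.0])                        # first right-hand side component
grid = np.linspace(-2.0, 2.0, 401)                             # crude grid search for the maximum
best = max(tol(np.array([u, v]), A_inf, A_sup, b_inf, b_sup_pert)
           for u in grid for v in grid)
print(best)   # a small negative value: the perturbed tolerable solution set is empty
```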

As the second example, we consider the problem of constructing a linear function of two variables a_{1} and a_{2},

b = x_{1}a_{1} + x_{2}a_{2},   (24)

from the interval data obtained in 3 measurements:

\begin{array}{c|ccc}
 & \text{\boldmath$a$}_{1} & \text{\boldmath$a$}_{2} & \text{\boldmath$b$}\\
\hline
1 & [98,100] & [99,101] & [190,210]\\
2 & [97,99] & [98,100] & [200,220]\\
3 & [96,98] & [97,99] & [190,210]
\end{array}

Note that in these data the three-dimensional uncertainty boxes of measurements 1 and 2, as well as 2 and 3, substantially “overlap” each other: their intersections are boxes with non-empty interiors, the sizes of which are comparable to the sizes of the original data boxes.

Fig. 7: The tolerable solution set of the system of equations (25) in comparison with the box constructed by using the estimate IVE.

To determine the coefficients x_{1} and x_{2}, we compose an interval 3\times 2-system of linear algebraic equations

\begin{pmatrix} [98,100] & [99,101]\\ [97,99] & [98,100]\\ [96,98] & [97,99] \end{pmatrix} \begin{pmatrix} x_{1}\\ x_{2} \end{pmatrix} = \begin{pmatrix} [190,210]\\ [200,220]\\ [190,210] \end{pmatrix}.   (25)

Its matrix has incomplete rank, since it contains, as a member, a point matrix of rank 1:

\begin{pmatrix} 98 & 99\\ 98 & 99\\ 98 & 99 \end{pmatrix}.   (26)

The united solution set for system (25) is unbounded, therefore it is hardly possible to determine, with certainty, the coefficients of the linear function (24) satisfying the weak compatibility between parameters and data (see Section 2). However, the interval matrix of system (25) does not contain linearly dependent point columns, and therefore, according to the Irene Sharaya criterion [13] (see Section 2.1), the tolerable solution set is bounded. It is depicted in Fig. 7, which is drawn by the procedure EqnTol2D from the package IntLinInc2D [14]. The minimum spectral condition number of the point matrices contained in the interval matrix of (25) is 103.83, and it is reached at the matrix

\begin{pmatrix} 100 & 99\\ 97 & 100\\ 96 & 99 \end{pmatrix}.

This result can be obtained, for example, using the simulated annealing algorithm on the set of point matrices contained in the interval matrix of (25).

Numerical solution of the maximization problem for the recognizing functional Tol can be carried out within the MATLAB environment, using the free program tolsolvty2.m (available from the author’s page at ResearchGate [21]). It implements a modified version of the so-called r-algorithms for non-differentiable optimization proposed and developed by N.Z. Shor and N.G. Zhurbenko [20]. Using the precision settings specified in it “by default”, we get \max\,\mathrm{Tol} = 1.9095, which is reached at \hat{x} = (5.1857\cdot 10^{-7},\ 2.0603)^{\top}. Then,

\mathrm{IVE} \,=\, \sqrt{2}\cdot 1.9095\cdot 103.83\cdot \frac{\|\hat{x}\|_{2}}{\sqrt{200^{2}+210^{2}+200^{2}}} \,=\, 1.6399.
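This arithmetic can be reproduced directly, for instance with the following Python fragment (the numbers are those reported above, so the result is only as accurate as their rounding):

```python
import numpy as np

max_tol  = 1.9095                              # maximum of the recognizing functional
x_hat    = np.array([5.1857e-7, 2.0603])       # its argument
min_cond = 103.83                              # estimate of min cond_2 over the interval matrix
b_hat    = np.array([200.0, 210.0, 200.0])     # formula (7) for the right-hand side of (25)

ive = np.sqrt(len(x_hat)) * max_tol * min_cond * np.linalg.norm(x_hat) / np.linalg.norm(b_hat)
print(round(ive, 4))                           # approximately 1.64, as reported above
```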

The interval hull of the tolerable solution set for system (25) (that is, its optimal interval enclosure) is the box

\begin{pmatrix} [-0.9620,\ 3.0227]\\ [-0.9320,\ 3.0127] \end{pmatrix},

which can also be found by the procedure EqnTol2D. We see that the value of IVE is in satisfactory agreement with the radii of the components of the optimal estimate of the solution set, equal to 1.9924 and 1.9724 respectively.

In the maximum compatibility method, the argument \hat{x} = \arg\max_{x\in\mathbb{R}^{n}}\mathrm{Tol} of the unconstrained maximum of the recognizing functional plays a crucial role, and, in fact, our variability estimate IVE relies heavily on it. This is why it makes sense to look at the box \hat{\text{\boldmath$x$}} with the components [\hat{x}_{i} - \mathrm{IVE},\ \hat{x}_{i} + \mathrm{IVE}], i = 1, 2. It is also depicted in Fig. 7, and the substantial asymmetry of its location relative to the solution set is, of course, explained by the specific position of the center, the point \hat{x}, as well as the ill-conditioning of the point systems from (25). With other data, the box \hat{\text{\boldmath$x$}} estimates the tolerable solution sets significantly better (see further).

Next, we give an example of the opposite type (in a sense, dual to the previous example), where a linear function of three variables

b = x_{1}a_{1} + x_{2}a_{2} + x_{3}a_{3}   (27)

is to be constructed from the data of two experiments summarized below:

\begin{array}{c|cccc}
 & \text{\boldmath$a$}_{1} & \text{\boldmath$a$}_{2} & \text{\boldmath$a$}_{3} & \text{\boldmath$b$}\\
\hline
1 & [98,100] & [97,99] & [96,98] & [190,210]\\
2 & [99,101] & [98,100] & [97,99] & [200,220]
\end{array}

To find the parameters of function (27), we come to an underdetermined interval system of linear algebraic equations

\begin{pmatrix} [98,100] & [97,99] & [96,98]\\ [99,101] & [98,100] & [97,99] \end{pmatrix} \begin{pmatrix} x_{1}\\ x_{2}\\ x_{3} \end{pmatrix} = \begin{pmatrix} [190,210]\\ [200,220] \end{pmatrix}.   (28)

Its matrix is the transpose of the matrix of system (25), so \min_{A\in\text{\boldmath$A$}} \mathrm{cond}_{2}\,A is the same for it. Also, the matrix of system (28) contains a point matrix of incomplete rank 1, namely the transpose of (26) (and many more such matrices).

Fig. 8: The tolerable solution set for the interval equations system (28).

Again, the united solution set for system (28) is unbounded, and it is difficult (if at all possible) to determine the coefficients of the linear function (27), relying on the weak compatibility between parameters and data, due to the “infinite variability” of the resulting estimate. Nevertheless, in these adverse conditions, the nonempty tolerable solution set to the interval system of equations (28) is bounded by virtue of the Irene Sharaya criterion [13] (see Section 2.1). In Fig. 8, the tolerable solution set is depicted as a thin hexagonal plate. Computation of the maximum of the recognizing functional for this system using the code tolsolvty2.m gives the value \max\mathrm{Tol} = 3.9698, which is reached at the point

\hat{x} = \arg\max\mathrm{Tol} \,=\, \bigl(\, 2.0603,\ 3\cdot 10^{-6},\ 2.1\cdot 10^{-6} \,\bigr)^{\top}.

It can be taken as an estimate of the coefficients in (27). Then the variability measure of the above estimate is

\mathrm{IVE} \,=\, \sqrt{2}\cdot 3.9698\cdot 103.83\cdot \frac{\|\hat{x}\|_{2}}{\sqrt{200^{2}+210^{2}}} \,=\, 4.1413.

The interval hull (optimal interval enclosure) of the tolerable solution set for system (28) is the box

\begin{pmatrix} [-1.9747,\ 4.0302]\\ [-1.9899,\ 4.0759]\\ [-1.9949,\ 4.1071] \end{pmatrix},

which can also be computed by the procedure EqnTolR3. The radii of the components of this interval vector are 3.0024, 3.0329, and 3.0510 respectively, which is also not very different from the value of IVE. The example shows that the value IVE works even in the case of m < n, when the number of measurements is less than the number of parameters to be determined. But a rigorous justification of this fact still awaits study.

To conclude the section, we present, in Table 1, the results of numerical tests for the interval linear n\times n-system

\left(\begin{array}{cccc}
\theta & [0,2] & \cdots & [0,2]\\
[0,2] & \theta & \cdots & [0,2]\\
\vdots & \vdots & \ddots & \vdots\\
[0,2] & [0,2] & \cdots & \theta
\end{array}\right)
\left(\begin{array}{c} x_{1}\\ x_{2}\\ \vdots\\ x_{n} \end{array}\right)
=
\left(\begin{array}{c} [1,K]\\ [1,K]\\ \vdots\\ [1,K] \end{array}\right),   (29)

with various n and K. System (29) resembles the one proposed in [9], having exactly the same matrix. But the right-hand sides were taken as positive intervals [1, K], since the original balanced intervals [-1,1] in the system from [9] make the tolerable solution set “too symmetric”.

Table 1: Results of the computational tests with system (29)
θ      IVE      ‖rad(□Ξ_tol)‖_∞      ‖rad(□Ξ_tol)‖_2          θ      IVE      ‖rad(□Ξ_tol)‖_∞      ‖rad(□Ξ_tol)‖_2
n = 5, K = 10                                                    n = 10, K = 10
2.0 1.019 1.25 2.795 6.0 0.894 0.5 1.581
4.0 1.081 0.875 1.957 9.0 1.491 0.389 1.230
6.0 0.786 0.639 1.429 12.0 0.582 0.313 0.988
8.0 0.681 0.5 1.118 15.0 0.495 0.26 0.822
10.0 0.534 0.41 0.917 20.0 0.396 0.203 0.640
n = 5, K = 20                                                    n = 10, K = 20
2.0 2.953 3.75 8.385 6.0 2.489 1.333 4.216
4.0 2.698 2.125 4.752 9.0 1.831 0.944 2.987
6.0 2.015 1.472 3.292 12.0 1.432 0.729 2.306
8.0 1.591 1.125 2.516 15.0 1.255 0.593 1.876
10.0 1.378 0.91 2.035 20.0 0.985 0.453 1.431

The interval matrix of system (29) is known to be regular if and only if \theta > n for even n and \theta > \sqrt{n^{2}-1} for odd n [9]. Consequently, in Table 1, the first two rows that correspond to each separate case of n and K refer to systems with singular matrices. As the parameter \theta grows, the matrix of the system becomes regular and better conditioned.

The values of IVE are compared with \|\mathrm{rad}\,(\Box\varXi_{tol})\|_{\infty} and \|\mathrm{rad}\,(\Box\varXi_{tol})\|_{2}, that is, with the Chebyshev norm (max-norm) and the Euclidean norm of the radius of the interval hull of the tolerable solution set (denoted as \Box\varXi_{tol}). We can see that, with the exception of two cases corresponding to n = 5 and K = 10, 20, the values of IVE are always within the two-sided bounds given by \|\mathrm{rad}\,(\Box\varXi_{tol})\|_{\infty} (lower bound) and \|\mathrm{rad}\,(\Box\varXi_{tol})\|_{2} (upper bound). And that is reasonable. Overall, our numerical experiments confirm the adequacy of the new measure of variability, which gives quite satisfactory approximate estimates of the size of the tolerable solution sets in various situations.
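For readers who wish to repeat such experiments, a small Python sketch constructing the endpoint arrays of system (29) for given n, θ and K may be convenient (only the data construction is shown; maximizing Tol and estimating the minimum condition number would proceed as discussed in the previous sections):

```python
import numpy as np

def system_29(n, theta, K):
    """Endpoint arrays of the interval system (29): diagonal entries equal to theta,
    off-diagonal entries [0, 2], right-hand side components [1, K]."""
    A_inf = np.zeros((n, n)); np.fill_diagonal(A_inf, theta)
    A_sup = np.full((n, n), 2.0); np.fill_diagonal(A_sup, theta)
    b_inf, b_sup = np.ones(n), np.full(n, float(K))
    return A_inf, A_sup, b_inf, b_sup

A_inf, A_sup, b_inf, b_sup = system_29(5, 6.0, 10)   # one of the cases from Table 1
```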

5 Discussion

IVE is the first measure of variability proposed in the statistics of interval data for estimation using the maximum compatibility method, and we cannot compare IVE with other similar measures, since they simply do not exist. However, it is useful to correlate the estimate IVE with the ideal mathematical characteristics of the solution set, such as its diameter, in terms of computational convenience and laboriousness.

First of all, IVE reflects instabilities in the solution set better than the diameter (see the first example in Section 4). An instability of the tolerable solution set for an interval linear system arises in the case when the maximum value of the recognizing functional is zero, \max\mathrm{Tol} = 0. Then the tolerable solution set can be either a single-point set or an extended set with non-zero diameter and empty interior [15]. After an arbitrarily small perturbation of the data, the latter situation can abruptly turn into an empty solution set. In any case, this phenomenon is signaled by the zero value of the maximum of the recognizing functional. The corresponding variability measure IVE is also zero, which is quite natural: it makes sense to show only the “stable size” of the solution set. The equality of IVE to zero or “almost zero” thus allows us to diagnose unstable cases.

Next, the problem of computing the diameter, in the 2-norm, of the tolerable solution set to an interval linear system of equations is NP-hard in general. This follows from its reducibility to the quadratic programming problem with linear constraints (see [22]). Indeed, the membership of a point in the tolerable solution set to an interval m\times n-system of equations is determined by a system of linear inequalities of size 2m\times 2n (the Rohn theorem [12]). To compute the diameter of the tolerable solution set in the 2-norm, we have to maximize the quadratic objective function \|x^{\prime} - x^{\prime\prime}\|^{2}_{2} over all pairs of points x^{\prime}, x^{\prime\prime} from the tolerable solution set, i. e. satisfying 2m\times 2n-systems of linear inequalities. So, computing the diameter of the tolerable solution set is not easy.

The diameter of the interval hull of the tolerable solution set can be computed more simply, but it is not better than IVE in any case, since the interval hull is not the solution set itself, and the coarsening resulted from such a replacement may be large.

Calculation of IVE by formula (6) requires solving the data fitting problem, that is, finding \max\,\mathrm{Tol} and \arg\max\,\mathrm{Tol}, and then we need to evaluate the minimum of the condition number of the matrices from the interval data matrix. In turn, the recognizing functional Tol is a concave piecewise linear function [15], so computing its maximum is of polynomial complexity. The author solves this problem efficiently by nonsmooth optimization methods developed in recent decades, in particular, using the r-algorithms proposed by N.Z. Shor [20], or using separating plane algorithms (see, e. g., [11, 23]). The most difficult part in calculating IVE is thus evaluating the minimum condition number of point matrices from a given interval matrix.

Computing the exact minimum of the condition number is not simple, but for the practical problems in which the value IVE will be applied, it is sufficient to have an approximate estimate of \min_{A\in\text{\boldmath$A$}} \mathrm{cond}_{2}\,A from above. This follows from our considerations in Section 3.3. Sometimes, it is not necessary to compute \min\,\mathrm{cond}_{2}\,A at all, if we have to compare, with each other, the variability of estimates obtained for the same data matrix \text{\boldmath$A$}.

If the interval matrix is “sufficiently narrow”, being not very different from a point matrix, then we can approximate

\min_{A\in\text{\boldmath$A$}} \mathrm{cond}_{2}\,A \;\approx\; \mathrm{cond}_{2}(\mathrm{mid}\,\text{\boldmath$A$}).   (30)

But in general, this recipe may work poorly, since the left-hand and right-hand sides of the approximate equality (30) can be quite different. In the examples with systems (25) and (28) from Section 4, the condition number of the midpoint matrix is 2.38\cdot 10^{4}, and, using the simplified formula (30), we would be off by a factor of more than 20 in evaluating the variability measure IVE.

In the general case, for a more accurate evaluation of \min\,\mathrm{cond}_{2}\,A, we can use popular evolutionary optimization methods, such as genetic algorithms, simulated annealing, particle swarm optimization, etc., within the interval matrix \text{\boldmath$A$}. In the numerical experiments from Section 4, the minimum of the condition number was found using the standard simulated annealing program from the free computer mathematics system Scilab.
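As a simple, non-authoritative alternative to the evolutionary methods mentioned above, the following Python sketch estimates min cond₂ A from above by random sampling of point matrices inside the interval matrix (our own illustration, not the Scilab routine used in the paper); an approximate estimate from above is exactly what suffices here, as noted earlier.

```python
import numpy as np

def min_cond_upper_estimate(A_inf, A_sup, n_samples=20000, seed=0):
    """A crude upper estimate of min cond_2(A) over all point matrices A lying in the
    interval matrix [A_inf, A_sup], obtained by random sampling (starting from the midpoint)."""
    rng = np.random.default_rng(seed)
    A_inf, A_sup = np.asarray(A_inf, float), np.asarray(A_sup, float)
    best = np.linalg.cond((A_inf + A_sup) / 2)            # the midpoint matrix, cf. (30)
    for _ in range(n_samples):
        t = rng.random(A_inf.shape)                       # a random point matrix inside A
        best = min(best, np.linalg.cond(A_inf + t * (A_sup - A_inf)))
    return best
```

By construction, the returned value never underestimates the true minimum, which is the safe direction for computing IVE.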

Note that there is a fundamental difference between computing the variability measure IVE and computing the diameter of the tolerable solution set: in the first case, we calculate a minimum, while in the second we have to find a maximum. Using traditional optimization methods and various heuristics, in the first case we compute an approximation to the minimum from above, and in the second case we find an approximation to the maximum from below. If we want to get, with our variability measure, a guaranteed outer estimation of the solution set, then the upper estimate, which is obtained by calculating the minimum in IVE, is preferable.

There exists one more viewpoint on the variability measure IVE.

In traditional probabilistic statistics, the phenomenon of collinearity of data (also called “multicollinearity”) plays a large role. It is the presence of a linear dependence between the input (predictor) variables of the regression model. The k variables of the model in question are usually called collinear if the vectors representing them lie in a linear space of dimension less than k [1], so that one of these vectors is a linear combination of the others. In practice, such exact collinearity of data is rare, but real computational problems in data fitting often start when the data vectors are “almost linearly dependent”. Then the parameter estimates are unstable, which leads to increased statistical uncertainty, i. e., an increase in the variance of the estimates.

According to modern views, the collinearity of data is most adequately described by the condition number of the matrix made up of these data (see, e. g., [1], Chapter 3). In this sense, our IVE is, in fact, a measure of the collinearity of the data, corrected with the help of the actual value of the estimate and the compatibility of this estimate with the data (which is indicated by the maximal value of the recognizing functional). The minimum over all \mathrm{cond}_{2}A for A\in\text{\boldmath$A$} is taken due to the specifics of the strong compatibility of parameters and data, and it agrees well with the regularizing role of the tolerable solution set (see [18]).

With this interpretation, IVE makes sense even with a negative maximum of the recognizing functional, max Tol, when the tolerable solution set is empty and the parameters of the linear function (1), which are strongly compatible with the data, do not exist. The absolute value of IVE still shows, up to a certain scaling coefficient, a measure of the collinearity of the data (a measure of their ill-conditioning), and the negative sign indicates the status of the solution to the problem, i. e., that the parameter vector computed is not strongly compatible with the data, but only provides the best possible approximation for the input data of the problem.

The immediate goal of further research is to justify the use of IVE for underdetermined situations, where the number m of observations is less than the number n of parameters to be determined. The maximum compatibility method works well in this case too, we get parameter estimates and we can calculate their values of IVE, but its application needs to be justified, at least at the same level of rigor as was done in this work for m \geq n.

Acknowledgements

The author is grateful to Alexander Bazhenov, Sergey Kumkov, and Sergei Zhilin for stimulating discussions and valuable comments on the work. Also, the author thanks the anonymous reviewers for their constructive criticism and good suggestions.

References

  • [1] D.A. Belsley, E. Kuh, R.E. Welsch, Regression Diagnostics, Wiley-Interscience, Hoboken, N.J., 1980, 2004.
  • [2] G.H. Golub, Ch.F. Van Loan, Matrix Computations, The Johns Hopkins University Press, Baltimore, 1996, 2013.
  • [3] R.A. Horn, Ch.R. Johnson, Topics in Matrix Analysis, Cambridge University Press, Cambridge, 1994.
  • [4] L. Jaulin, M. Kieffer, O. Didrit, E. Walter, Applied Interval Analysis, Springer, London, 2001.
  • [5] R.B. Kearfott, M. Nakao, A. Neumaier, S. Rump, S.P. Shary, P. van Hentenryck, Standardized notation in interval analysis, Computational Technologies 15 (2010), No. 1, pp. 7–13.
  • [6] V. Kreinovich, S.P. Shary, Interval methods for data fitting under uncertainty: a probabilistic treatment, Reliable Computing 23 (2016), pp. 105–140.  URL: http://interval.louisiana.edu/reliable-computing-journal/volume-23/reliable-computing-23-pp-105-140.pdf (accessed 10 March 2020).
  • [7] M. Milanese, J. Norton, H. Piet-Lahanier, E. Walter (Eds.), Bounding Approaches to System Identification, Plenum Press, New York, 1996.  DOI: 10.1007/978-1-4757-9545-5
  • [8] R.E. Moore, R.B. Kearfott, M.J. Cloud, Introduction to Interval Analysis, SIAM, Philadelphia, 2009.
  • [9] A. Neumaier, Interval Methods for Systems of Equations, Cambridge University Press, Cambridge, 1990.
  • [10] H.T. Nguyen, V. Kreinovich, B. Wu, G. Xiang, Computing Statistics under Interval and Fuzzy Uncertainty. Applications to Computer Science and Engineering, Springer, Berlin-Heidelberg, 2012.
  • [11] E.A. Nurminski, Separating plane algorithms for convex optimization, Mathematical Programming 76 (1997), pp. 373–391.   DOI: 10.1007/BF02614389
  • [12] J. Rohn, Inner solutions of linear interval systems, in: Nickel K. (Ed.), Interval Mathematics 1985, Lecture Notes on Computer Science 212, Springer, New York, 1986, pp. 157–158.
  • [13] I.A. Sharaya, On unbounded tolerable solution sets, Reliable Computing 11 (2005), pp. 425–432.   DOI: 10.1007/s11155-005-0049-9
  • [14] I.A. Sharaya, IntLinInc2D, a MATLAB software package for visualization of solution sets to interval linear 2D systems. Novosibirsk, 2014.  URL: http://www.nsc.ru/interval/sharaya
  • [15] S.P. Shary, Solving the linear interval tolerance problem, Mathematics and Computers in Simulation 39 (1995), pp. 53–85.   DOI: 10.1016/0378-4754(95)00135-K
  • [16] S.P. Shary, Solvability of interval linear equations and data analysis under uncertainty, Automation and Remote Control 73 (2012), pp. 310–322.  DOI: 10.1134/S0005117912020099
  • [17] S.P. Shary, Maximum consistency method for data fitting under interval uncertainty, Journal of Global Optimization 66 (2016), pp. 111–126.  DOI: 10.1007/s10898-015-0340-1
  • [18] S.P. Shary, Interval regularization for imprecise linear algebraic equations. Deposited in arXiv.org on 27 Sep 2018, No. arXiv:1810.01481.
  • [19] S.P. Shary, Weak and strong compatibility in data fitting under interval uncertainty, Advances in Data Science and Adaptive Analysis 12 (2020) (in press).
  • [20] N.Z. Shor, N.G. Zhurbenko, A minimization method using space dilation in the direction of difference of two successive gradients, Cybernetics 7(3) (1971), pp. 450–459.   DOI: 10.1007/BF01070454
  • [21] TOLSOLVTY2, a MATLAB code for examining solvability of the interval linear tolerance problem.   URL: https://www.researchgate.net/publication/294889566_TOLSOLVTY2
  • [22] S.A. Vavasis, Complexity theory: Quadratic programming, in: C.A. Floudas and P.M. Pardalos (Eds.), Encyclopedia of Optimization. Second Edition, New York, Springer, 2009, pp. 451–453.
  • [23] E. Vorontsova, Extended separating plane algorithm and NSO-solutions of PageRank problem, in: Y. Kochetov, M. Khachay, V. Beresnev, E. Nurminski, P. Pardalos (Eds.), Discrete Optimization and Operations Research. Proceedings of 9th International Conference DOOR 2016, Vladivostok, Russia, September 19-23, 2016, Lecture Notes in Computer Science 9869, Cham, Switzerland, Springer International, 2016, pp. 547–560.   DOI: 10.1007/978-3-319-44914-2_43
  • [24] D.S. Watkins, Fundamentals of Matrix Computations, Wiley-Interscience, New York, 2002.