
A variability measure for estimates
of parameters in interval data fitting

Sergey P. Shary
Institute of Computational Technologies SB RAS
and Novosibirsk State University,
Novosibirsk, Russia
E-mail: [email protected]
Abstract

The paper presents a construction of a quantitative measure of variability for parameter estimates in the data fitting problem under interval uncertainty. The measure shows the degree of variability and ambiguity of the estimate, and the need for it is dictated by the non-uniqueness of answers in problems with interval data. A substantiation of the new variability measure is given, and its application and motivations are discussed. Several examples and a series of numerical tests are considered, showing the features of the new characteristic and the specifics of its use.
Keywords: data fitting problem, linear regression, interval data uncertainty, maximum compatibility method, strong compatibility, variability measure.
MSC 2010: 65G40, 62J10, 90C90


1 Introduction and problem statement

The purpose of this work is to present a quantitative variability measure for estimates of parameters of functional dependencies in the statistics of interval data. This is a relatively young branch of modern data science that does not rely on probability theory but makes extensive use of interval analysis methods (see, e. g., the surveys in [4, 7, 10]).

Fig. 1: A variability measure can be an estimate of the size of the set of possible solutions.

By the term “variability”, we understand the degree of variation and ambiguity of the estimate, and the need for its introduction is dictated by the fact that, in processing interval data, the answer is typically not unique. Usually, we get a whole set of different estimates that are equally consistent (compatible) with the source data and, thus, suitable as solutions to the problem. The extent to which this set is large or small is, partly, characterized by the term “variability”. In traditional probabilistic statistics, estimates of parameters are known to be random variables themselves, and the measure of their variability can be the variance of the estimates, mean absolute difference, median absolute deviation, average absolute deviation, and such like. What could be their analogues in the statistics of interval data?

At first glance, the answer to this question seems quite obvious: it can be any value that characterizes the size of the set of solutions to the problem, if it is non-empty. We can even take an enclosure of the solution set obtained by an interval method. A certain disadvantage of this variant is the excessive detailing of the answer given as a box in \mathbb{R}^{n}, a large amount of information that still needs to be “digested” and reduced to a compact and expressive form. Sometimes, an interval estimate in the form of an axis-aligned box may inadequately represent the solution set. Another disadvantage is the complexity of finding such an estimate.

It is desirable to have a relatively simple and efficiently computable quantity, expressed in a single number, because it would give a general aggregate view of the subject of interest. Similarly to variance and other probabilistic measures, it can serve as an approximate characteristic of the quality of parameter estimation. The greater the variability of an estimate, the less certain it is, and this can serve as a basis for conclusions about the quality of the estimate.

At the same time, the introduced variability measure should not be simply the “size of the solution set”. If this solution set, for example, is unstable and changes abruptly with arbitrarily small changes in the data, then its size is, to some extent, misleading and disorienting (see example in Section 4). A practically useful variability measure should take into account this possible instability of the solution set to the problem and give us a robust value.

Fig. 2: An illustration for the data fitting problem under interval uncertainty.

In our article, we work within the framework of the data fitting problem (often called the regression analysis problem): given results of measurements or observations, it is required to construct a functional dependence of a fixed type that “best fits” these data. Specifically, we need to determine the parameters x_{1}, x_{2}, \ldots, x_{n} of a linear function of the form

b = x_{1}a_{1} + \ldots + x_{n}a_{n}   (1)

from a number of values of the independent variables a_{1}, a_{2}, \ldots, a_{n} (also called exogenous, explanatory, predictor or input variables), and the corresponding values of the dependent variable b (also called endogenous, response, criterion or output variable). Both a_{1}, a_{2}, \ldots, a_{n} and b are not known precisely, and we only have intervals of their possible values (see Fig. 2). To find estimates of the coefficients x_{1}, x_{2}, \ldots, x_{n}, we use the so-called maximum compatibility method (previously called the “maximum consistency method”), which was proposed and developed in the works [6, 16, 17, 19] and others. After the estimates for x_{1}, x_{2}, \ldots, x_{n} are found, we need to somehow evaluate their variability. Our article presents a construction of the variability measure in the above data fitting problem.

Note that traditional methods of data fitting and regression analysis, such as the least squares method and its modifications, the least modulus method, etc., cannot be applied to the solution of our problem, since they are unsuitable for situations where the source data are intervals rather than points.

2 Formulation of the main results

2.1 Maximum compatibility method and tolerable solution set

The initial data for our problem is a set of values of independent and dependent variables for function (1), which are obtained as a result of m measurements (observations):

\begin{array}{ccccc}
\text{\boldmath$a$}_{11}, & \text{\boldmath$a$}_{12}, & \ldots & \text{\boldmath$a$}_{1n}, & \text{\boldmath$b$}_{1},\\
\text{\boldmath$a$}_{21}, & \text{\boldmath$a$}_{22}, & \ldots & \text{\boldmath$a$}_{2n}, & \text{\boldmath$b$}_{2},\\
\vdots & \vdots & \ddots & \vdots & \vdots\\
\text{\boldmath$a$}_{m1}, & \text{\boldmath$a$}_{m2}, & \ldots & \text{\boldmath$a$}_{mn}, & \text{\boldmath$b$}_{m}.
\end{array}   (2)

These are intervals as we assume that these data are inaccurate and have interval uncertainty due to measurement errors, etc. Both the data (2) and other interval values throughout the text are highlighted in bold mathematical font according to the informal international standard [5]. The first index of the interval values from (2) means the measurement number, and the second one, at \text{\boldmath$a$}_{ij}’s, is the number of the independent variable that takes the corresponding value in this measurement.

To find an estimate (\hat{x}_{1}, \hat{x}_{2}, \ldots, \hat{x}_{n}) of the parameters of the linear function (1), we “substitute” data (2) into equality (1), thus getting an interval system of linear algebraic equations

\left\{\ \begin{array}{ccccccccc}
\text{\boldmath$a$}_{11}x_{1} & + & \text{\boldmath$a$}_{12}x_{2} & + & \ldots & + & \text{\boldmath$a$}_{1n}x_{n} & = & \text{\boldmath$b$}_{1},\\
\text{\boldmath$a$}_{21}x_{1} & + & \text{\boldmath$a$}_{22}x_{2} & + & \ldots & + & \text{\boldmath$a$}_{2n}x_{n} & = & \text{\boldmath$b$}_{2},\\
\vdots & & \vdots & & \ddots & & \vdots & & \vdots\\
\text{\boldmath$a$}_{m1}x_{1} & + & \text{\boldmath$a$}_{m2}x_{2} & + & \ldots & + & \text{\boldmath$a$}_{mn}x_{n} & = & \text{\boldmath$b$}_{m},
\end{array}\right.   (3)

or, briefly,

\text{\boldmath$A$}x = \text{\boldmath$b$}   (4)

with an interval m\times n-matrix \text{\boldmath$A$} = (\text{\boldmath$a$}_{ij}) and an interval m-vector \text{\boldmath$b$} = (\text{\boldmath$b$}_{i}) in the right-hand side. The sets of parameters which are compatible, in this or that sense, with the measurement data (2) form various solution sets for the equations system (3). The most popular of them are the united solution set and the tolerable solution set. The united solution set, defined as

\varXi_{uni}(\text{\boldmath$A$},\text{\boldmath$b$}) = \bigl\{\, x\in\mathbb{R}^{n} \mid Ax = b \text{ for some } A\in\text{\boldmath$A$} \text{ and } b\in\text{\boldmath$b$} \,\bigr\},

corresponds to the so-called weak compatibility between the parameters of function (1) and data (2) (see [6, 16, 17]). The tolerable solution set, defined as

\varXi_{tol}(\text{\boldmath$A$},\text{\boldmath$b$}) = \bigl\{\, x\in\mathbb{R}^{n} \mid Ax\in\text{\boldmath$b$} \text{ for each matrix } A\in\text{\boldmath$A$} \,\bigr\},

corresponds to the so-called strong compatibility between the parameters of function (1) and data (2) (see [19]).

Fig. 3: An illustration of the strong compatibility between interval data and a linear function.

Further, we assume that the solution to the data fitting problem for function (1) is found by the maximum compatibility method (see [16, 17, 19]). As an estimate of the parameters of function (1), it takes the maximum point of the recognizing functional, a special function that gives a quantitative “compatibility measure” of this estimate with empirical data (2).

The maximum compatibility method has two versions, “weak” and “strong”, that differ in understanding how exactly the interval data should be “compatible” with the function that we construct on them. Weak and strong compatibility reflect two different situations that may occur in data processing. In the weak version, it is required that the graph of the constructed function just intersects the measurement uncertainty boxes (see [16, 17]). The strong version implies a more stringent condition: it requires that the function graph passes within the “corridors” specified by the intervals \text{\boldmath$b$}_{i}, i = 1, 2, \ldots, m, for any values of the independent variables a_{1}, a_{2}, \ldots, a_{n} from the respective intervals \text{\boldmath$a$}_{i1}, \text{\boldmath$a$}_{i2}, \ldots, \text{\boldmath$a$}_{in} obtained in the i-th measurement (see [19]). This is illustrated in Fig. 3, where the straight line of the function graph goes through the vertical faces of the measurement uncertainty boxes. The weak compatibility is shown in Fig. 2 by the two upper straight lines. The lower line in Fig. 2 does not satisfy the compatibility condition at all, neither weak nor strong, since it does not intersect some of the boxes.

The “strong version” of the maximum compatibility method has a number of theoretical and practical advantages over the “weak version”. These are polynomial complexity, robustness of estimates and their finite variability, the fact that the strong compatibility partially overcomes the so-called Demidenko paradox, etc. (see details in [19]). Hence, we consider below the strong version of the maximum compatibility method, which corresponds to the tolerable solution set \varXi_{tol}(\text{\boldmath$A$},\text{\boldmath$b$}) for the interval system of equations (4). Its recognizing functional is usually denoted by “Tol”,

\mathrm{Tol}\,(x,\text{\boldmath$A$},\text{\boldmath$b$}) \;=\; \min_{1\leq i\leq m}\left\{\, \mathrm{rad}\,\text{\boldmath$b$}_{i} - \left|\; \mathrm{mid}\,\text{\boldmath$b$}_{i} - \sum_{j=1}^{n} \text{\boldmath$a$}_{ij}x_{j} \,\right| \,\right\},   (5)

where

\mathrm{rad}\,\text{\boldmath$b$}_{i} = \tfrac{1}{2}\bigl(\overline{\text{\boldmath$b$}}_{i} - \underline{\text{\boldmath$b$}}_{i}\bigr), \qquad\qquad \mathrm{mid}\,\text{\boldmath$b$}_{i} = \tfrac{1}{2}\bigl(\overline{\text{\boldmath$b$}}_{i} + \underline{\text{\boldmath$b$}}_{i}\bigr)

are the radii and midpoints of the components of the right-hand side \text{\boldmath$b$}, the arithmetic operations inside the modulus in (5) are those of the classical interval arithmetic (see, e. g., [4, 8, 9]), and the modulus is understood as the maximum absolute value of the points from the interval,

|\text{\boldmath$a$}| = \max\,\{\,|a| \mid a\in\text{\boldmath$a$}\,\} = \max\,\bigl\{\,|\underline{\text{\boldmath$a$}}|,\ |\overline{\text{\boldmath$a$}}|\,\bigr\}.

Typical graphs of the functional Tol for the one-dimensional case are shown in Fig. 4 and Fig. 5.
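For readers who wish to experiment, the recognizing functional (5) is straightforward to evaluate numerically. The following Python sketch is our own illustration (the data layout with endpoint arrays A_inf, A_sup, b_inf, b_sup is an assumption, not a convention of the paper); it uses the fact that the magnitude of an interval with midpoint m and radius r equals |m| + r.

```python
import numpy as np

def tol(x, A_inf, A_sup, b_inf, b_sup):
    """Evaluate the recognizing functional Tol(x, A, b) of formula (5).

    A_inf, A_sup : (m, n) arrays with the endpoints of the interval matrix A;
    b_inf, b_sup : (m,) arrays with the endpoints of the interval vector b.
    """
    x = np.asarray(x, dtype=float)
    A_mid, A_rad = (A_sup + A_inf) / 2, (A_sup - A_inf) / 2
    b_mid, b_rad = (b_sup + b_inf) / 2, (b_sup - b_inf) / 2
    # The interval sum_j a_ij x_j has midpoint (A_mid @ x)_i and radius (A_rad @ |x|)_i,
    # so the magnitude of  mid b_i - sum_j a_ij x_j  equals |b_mid_i - (A_mid @ x)_i| + (A_rad @ |x|)_i.
    deviation = np.abs(b_mid - A_mid @ x) + A_rad @ np.abs(x)
    return np.min(b_rad - deviation)
```

Maximizing this concave piecewise linear function over x is exactly the optimization problem discussed below.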

To solve the data fitting problem for the linear function (1) and data set (2), it is necessary to find the unconstrained maximum, over all x\in\mathbb{R}^{n}, of the functional \mathrm{Tol}\,(x,\text{\boldmath$A$},\text{\boldmath$b$}),

\mathrm{Tol}\,(x,\text{\boldmath$A$},\text{\boldmath$b$}) \rightarrow \max,

and the vector \hat{x} = \arg\max_{x\in\mathbb{R}^{n}} \mathrm{Tol}\,(x,\text{\boldmath$A$},\text{\boldmath$b$}) at which this maximum is attained provides an estimate of the parameters of function (1).

If \max\,\mathrm{Tol} \geq 0, then the solution set \varXi_{tol}(\text{\boldmath$A$},\text{\boldmath$b$}), i. e., the set of parameters strongly compatible with the data, is non-empty, and \hat{x}\in\varXi_{tol}(\text{\boldmath$A$},\text{\boldmath$b$}). If \max\,\mathrm{Tol} < 0, then the solution set \varXi_{tol}(\text{\boldmath$A$},\text{\boldmath$b$}) is empty and there do not exist parameters that are strongly compatible with data (2). However, the argument \hat{x} of \max\,\mathrm{Tol} still provides the best compatibility of the constructed linear function with data (2) (more precisely, the least incompatibility).

To conclude this subsection, we give a useful result on the tolerable solution set that allows us to investigate whether it is bounded or unbounded, i. e., whether the tolerable solution set is finite in size or extends infinitely.


Irene Sharaya’s boundedness criterion [13]  Let the tolerable solution set to an interval linear system \text{\boldmath$A$}x = \text{\boldmath$b$} be nonempty. It is unbounded if and only if the matrix \text{\boldmath$A$} has linearly dependent noninterval columns.


The boundedness criterion shows that the tolerable solution set is unbounded, in fact, only under exceptional circumstances, which are almost never met in practice when working with actual interval data. That is, the tolerable solution set is mostly bounded, and the estimates obtained by the strong version of the maximum compatibility method almost always have finite variability.
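As a small illustration of the criterion, the following Python sketch (our own assumption about the data representation, with the interval matrix stored as endpoint arrays A_inf and A_sup) checks whether the noninterval columns of an interval matrix are linearly dependent, i. e., whether a nonempty tolerable solution set could be unbounded.

```python
import numpy as np

def point_columns_dependent(A_inf, A_sup):
    """Check the condition of Irene Sharaya's criterion: are the noninterval
    (zero-radius) columns of the interval matrix linearly dependent?"""
    A_inf, A_sup = np.asarray(A_inf, float), np.asarray(A_sup, float)
    rad = (A_sup - A_inf) / 2
    point_cols = np.where(np.all(rad == 0, axis=0))[0]   # indices of noninterval columns
    if point_cols.size == 0:
        return False                                     # no noninterval columns at all
    P = A_inf[:, point_cols]                             # their (point) values
    return np.linalg.matrix_rank(P) < P.shape[1]
```

For a nonempty tolerable solution set, unboundedness occurs exactly when this function returns True.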

2.2 Variability measures

As a quantity characterizing the variability of the estimate of the parameter vector \hat{x} = (\hat{x}_{1}, \hat{x}_{2}, \ldots, \hat{x}_{n}) in the linear function (1), which is obtained by the maximum compatibility method from data (2), we propose

\mathrm{IVE}\,(\text{\boldmath$A$},\text{\boldmath$b$}) \;=\; \sqrt{n}\;\max_{\mathbb{R}^{n}}\mathrm{Tol} \cdot \Bigl(\,\min_{A\in\text{\boldmath$A$}} \mathrm{cond}_{2}\,A\,\Bigr) \cdot \frac{\bigl\|\,\arg\max_{\mathbb{R}^{n}}\mathrm{Tol}\,\bigr\|_{2}}{\|\hat{\text{\boldmath$b$}}\|_{2}}.   (6)

In this formula,

n is the dimension of the parameter vector of function (1) under construction,

\|\cdot\|_{2} is the Euclidean norm (2-norm) of vectors from \mathbb{R}^{n}, defined as

\|x\|_{2} \;=\; \left(\,\sum_{i=1}^{n} |x_{i}|^{2}\right)^{1/2},

\mathrm{cond}_{2}\,A is the spectral condition number of the matrix A, defined as

\mathrm{cond}_{2}\,A \;=\; \frac{\sigma_{\max}(A)}{\sigma_{\min}(A)},

i. e., the ratio of the maximal \sigma_{\max}(A) and minimal \sigma_{\min}(A) singular values of A; it is an extension, to the rectangular case, of the concept of the condition number from computational linear algebra (see, e. g., [2, 24]);

\hat{\text{\boldmath$b$}} is a certain “most representative” point from the interval vector \text{\boldmath$b$}, which is taken as

\hat{\text{\boldmath$b$}} \;=\; \tfrac{1}{2}\bigl(|\mathrm{mid}\,\text{\boldmath$b$} + \mathrm{rad}\,\text{\boldmath$b$}| + |\mathrm{mid}\,\text{\boldmath$b$} - \mathrm{rad}\,\text{\boldmath$b$}|\bigr),   (7)

where the operations “mid” and “rad” are applied in a componentwise manner.

Fig. 4: The maximum value of the recognizing functional gives an idea of the size of the tolerable solution set \varXi_{tol}.

Despite the definite formula (7) for \hat{\text{\boldmath$b$}}, it should be noted that the introduction of this point is, to a large extent, a matter of common sense. The general approach to the definition of \hat{\text{\boldmath$b$}} is that it must be a kind of “most representative” point from the right-hand side vector \text{\boldmath$b$}, and in some situations this choice may be different from formula (7). For example, \hat{\text{\boldmath$b$}} can be a point result of the measurement, around which the uncertainty interval is built later, based on information about the accuracy of the measuring device.

Apart from (6), as a measure of relative variability of the parameter estimate, the value

n\; \Bigl(\,\min_{A\in\text{\boldmath$A$}} \mathrm{cond}_{2}A\,\Bigr) \cdot \frac{\max_{\mathbb{R}^{n}}\mathrm{Tol}}{\|\hat{\text{\boldmath$b$}}\|_{2}},   (8)

can have a certain significance. Both IVE and value (8) are defined for interval linear systems (4) with nonzero right-hand sides. They can take either positive real values or be infinite. The latter occurs only in the case \min_{A\in\text{\boldmath$A$}} \mathrm{cond}_{2}A = \infty, when all the point matrices A\in\text{\boldmath$A$} have incomplete rank, i. e., when \sigma_{\min}(A) = 0 for every A\in\text{\boldmath$A$}. Then the variability measures are set to be infinite.
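Assuming the ingredients of formula (6) have already been computed (the maximum of the recognizing functional, its argument, and an estimate of the minimum condition number), IVE and the relative measure (8) reduce to a few lines of arithmetic. The Python sketch below is illustrative only; the helper names and calling conventions are our own assumptions.

```python
import numpy as np

def b_hat(b_inf, b_sup):
    """The 'most representative' point of the interval vector b, formula (7)."""
    b_mid = (np.asarray(b_sup, float) + np.asarray(b_inf, float)) / 2
    b_rad = (np.asarray(b_sup, float) - np.asarray(b_inf, float)) / 2
    return 0.5 * (np.abs(b_mid + b_rad) + np.abs(b_mid - b_rad))

def ive(max_tol, x_hat, min_cond, b_inf, b_sup):
    """The variability measure IVE, formula (6)."""
    n = len(x_hat)
    return np.sqrt(n) * max_tol * min_cond * np.linalg.norm(x_hat) / np.linalg.norm(b_hat(b_inf, b_sup))

def relative_variability(max_tol, x_hat, min_cond, b_inf, b_sup):
    """The relative variability measure, formula (8)."""
    return len(x_hat) * min_cond * max_tol / np.linalg.norm(b_hat(b_inf, b_sup))
```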

The symbol IVE is built as an abbreviation of the phrase “interval variability of the estimate”. Below, we show that the value IVE adequately characterizes the size of a non-empty tolerable solution set for a large class of practically important situations. But first it is useful to discuss informal motivations that lead to the estimate IVE and to demonstrate that IVE has an intuitive, clear and even visual meaning.

Fig. 5: In addition to the maximum of the recognizing functional, the size of the tolerable solution set is also affected by the “steepness” of the graph.

The tolerable solution set of an interval system of linear algebraic equations is the zero level set of the recognizing functional Tol (see details in [15]), or, in other words, the intersection of the hypograph of this functional with the coordinate plane \mathrm{Tol} = 0 (this is illustrated in Fig. 4). As a consequence, the magnitude of the maximum of the recognizing functional can, other things being equal, be a measure of how extensive or narrow the tolerable solution set is. The larger \max\,\mathrm{Tol}, the larger the size of the tolerable solution set, and vice versa. An additional factor that provides “other things being equal” is the slope (steepness) of the pieces of hyperplanes of which the polyhedral graph of the functional Tol is composed (these are straight lines in the 1D case in Fig. 4 and Fig. 5). The slope of the hyperplanes is determined by the coefficients of the equations that define them, which are the endpoints of the data intervals (2). The value of this slope is summarized in terms of the condition number of point matrices from the interval data matrix \text{\boldmath$A$}. Finally, the multiplier

\frac{\|\arg\max\,\mathrm{Tol}\|_{2}}{\|\hat{\text{\boldmath$b$}}\|_{2}} \;=\; \frac{\|\hat{x}\|_{2}}{\|\hat{\text{\boldmath$b$}}\|_{2}}

is a scaling coefficient that helps to provide the commensurability of the final value with the magnitudes of the solution, \arg\max\,\mathrm{Tol}, and of the right-hand side vector of the equation system. Thus, formula (6) is obtained.

3 A justification of the variability measure

Considering the most general case, we should assume that the number of measurements m may not coincide with the number n of unknown parameters of the linear function (1). In this section, we consider only the case m \geq n. In other words, the number of measurements (observations) made is not less than the number of function parameters. Then the interval system of linear equations (4) is either square or tall (overdetermined). Of course, the data fitting problem makes sense for m < n too, the maximum compatibility method also works for this case, and the variability measure IVE is then also applicable (see Section 4), but the latter still needs a separate substantiation.

3.1 Estimates of perturbations of the solution to rectangular linear systems

The starting point of our constructions justifying the choice of (6) exactly in the form described above is the well-known inequality that estimates the perturbation \Delta x of a nonzero solution x to the system of linear algebraic equations Ax = b depending on the change \Delta b of the right-hand side b (see, e. g., [2, 24]):

\frac{\|\Delta x\|_{2}}{\|x\|_{2}} \;\leq\; \mathrm{cond}_{2}\,A \cdot \frac{\|\Delta b\|_{2}}{\|b\|_{2}}.   (9)

It is usually considered for square systems of linear equations, when m = n, but in the case of the Euclidean vector norm and the spectral condition number of matrices, this inequality holds true in the more general case with m \geq n. Naturally, estimate (9) makes sense only for \sigma_{\min}(A) \neq 0, when \mathrm{cond}_{2}A < \infty, i. e., when the matrix A has full column rank. Let us briefly recall its derivation for this case.

Given

Ax = b \quad\text{ and }\quad A(x + \Delta x) = b + \Delta b,

we have

A\Delta x = \Delta b.

Further,

\begin{aligned}
\frac{\|\Delta x\|_{2}/\|x\|_{2}}{\|\Delta b\|_{2}/\|b\|_{2}}
&= \frac{\|\Delta x\|_{2}\,\|b\|_{2}}{\|x\|_{2}\,\|\Delta b\|_{2}}
 = \frac{\|\Delta x\|_{2}\,\|Ax\|_{2}}{\|x\|_{2}\,\|A\Delta x\|_{2}}
 = \frac{\|\Delta x\|_{2}}{\|A\Delta x\|_{2}}\;\frac{\|Ax\|_{2}}{\|x\|_{2}}\\
&\leq \max_{\Delta x\neq 0}\frac{\|\Delta x\|_{2}}{\|A\Delta x\|_{2}}\;\max_{x\neq 0}\frac{\|Ax\|_{2}}{\|x\|_{2}}
 = \left(\min_{\Delta x\neq 0}\frac{\|A\Delta x\|_{2}}{\|\Delta x\|_{2}}\right)^{-1}\max_{x\neq 0}\frac{\|Ax\|_{2}}{\|x\|_{2}}\\
&= \bigl(\sigma_{\min}(A)\bigr)^{-1}\,\sigma_{\max}(A) \;=\; \mathrm{cond}_{2}(A)
\end{aligned}

by virtue of the properties of the singular values (see, e. g., [3, 24]). A comparison of the beginning and the end of this calculation leads to the inequality (9), which, as is easy to understand, is attainable for some x and \Delta x, or, equivalently, for some right-hand sides b and their perturbations \Delta b. Naturally, the above calculations and the resulting estimate make sense only for \sigma_{\min}(A) \neq 0.
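Inequality (9) is easy to check numerically for the rectangular case m > n; the following Python fragment is a purely illustrative sanity check with random data (our own assumption, not part of the paper's argument).

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 8, 3
A = rng.standard_normal((m, n))        # a random tall matrix; full column rank with probability 1

x  = rng.standard_normal(n)
dx = 1e-3 * rng.standard_normal(n)     # a small perturbation of the solution
b, db = A @ x, A @ dx                  # consistent right-hand side and its perturbation

lhs = np.linalg.norm(dx) / np.linalg.norm(x)
rhs = np.linalg.cond(A) * np.linalg.norm(db) / np.linalg.norm(b)
assert lhs <= rhs * (1 + 1e-12)        # inequality (9) holds
print(f"{lhs:.3e} <= {rhs:.3e}")
```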

3.2 Interval systems with point matrices

Let us consider an interval system of linear algebraic equations

Ax = \text{\boldmath$b$}   (10)

with a point (noninterval) m\times n-matrix A, m \geq n, and an interval m-vector \text{\boldmath$b$} in the right-hand side. We assume that A has full column rank and, therefore, \mathrm{cond}_{2}\,A < \infty.

Suppose also that the tolerable solution set for system (10) is non-empty, i. e. \varXi_{tol}(A,\text{\boldmath$b$}) = \bigl\{\, x\in\mathbb{R}^{n} \mid Ax\in\text{\boldmath$b$} \,\bigr\} \neq \varnothing. We need to quickly and with little effort estimate the size of this solution set, and our answer will be a “radius type” estimate for \varXi_{tol}(A,\text{\boldmath$b$}). More precisely, we are going to evaluate \max\|x^{\prime} - \hat{x}\|_{2} over all x^{\prime}\in\varXi_{tol}(A,\text{\boldmath$b$}) and for a special fixed point \hat{x}\in\varXi_{tol}(A,\text{\boldmath$b$}), which is taken as

\hat{x} \;=\; \arg\max_{x\in\mathbb{R}^{n}} \mathrm{Tol}\,(x, A, \text{\boldmath$b$}).

Recall that the argument \hat{x} of the maximum of the recognizing functional for system (10) is an estimate of the parameters of linear function (1) from empirical data. Strictly speaking, this point can be determined non-uniquely, but then let \hat{x} be any one of the points at which the maximum is reached.

Let x^{\prime} be a point in the tolerable solution set \varXi_{tol}(A,\text{\boldmath$b$}). How to evaluate \|x^{\prime} - \hat{x}\|_{2}? It is clear that x^{\prime} and \hat{x} are solutions of systems of linear algebraic equations with the matrix A and some right-hand sides b^{\prime} and \hat{b}, respectively, from the interval vector \text{\boldmath$b$}. If \hat{x} \neq 0 and \hat{b} \neq 0, then we can apply inequality (9), considering a perturbation of the solution \hat{x} to the system of linear algebraic equations Ax = \hat{b}. Then \Delta x = x^{\prime} - \hat{x}, \Delta b = b^{\prime} - \hat{b}, and we get

\frac{\|x^{\prime} - \hat{x}\|_{2}}{\|\hat{x}\|_{2}} \;\leq\; \mathrm{cond}_{2}\,A \cdot \frac{\|b^{\prime} - \hat{b}\|_{2}}{\|\hat{b}\|_{2}},

from which the absolute estimate is obtained:

\|x^{\prime} - \hat{x}\|_{2} \;\leq\; \mathrm{cond}_{2}\,A \cdot \|\hat{x}\|_{2} \cdot \frac{\|b^{\prime} - \hat{b}\|_{2}}{\|\hat{b}\|_{2}}.   (11)

The point \hat{x} is found as the result of maximization of the recognizing functional Tol, the point \hat{b} coincides with A\hat{x}, and the condition number \mathrm{cond}_{2}\,A can be computed by well-developed standard procedures. Therefore, for practical work with inequality (11), one needs to somehow evaluate \|b^{\prime} - \hat{b}\|_{2}.

But first, bearing in mind the further application of the deduced estimate in a situation where the matrix A may vary, we somewhat roughen (11) by taking approximately \|\hat{b}\|_{2} \approx \|\hat{\text{\boldmath$b$}}\|_{2}, that is, as the norm of the “most representative” point \hat{\text{\boldmath$b$}} of the interval vector \text{\boldmath$b$}, which we defined in Section 2.2:

\|\hat{b}\|_{2} \,\approx\, \|\hat{\text{\boldmath$b$}}\|_{2}, \qquad \text{ where }\ \hat{\text{\boldmath$b$}} \,=\, \tfrac{1}{2}\,\bigl(\,|\mathrm{mid}\,\text{\boldmath$b$} + \mathrm{rad}\,\text{\boldmath$b$}| + |\mathrm{mid}\,\text{\boldmath$b$} - \mathrm{rad}\,\text{\boldmath$b$}|\,\bigr).

In doing this, some coarsening is allowed, so instead of (11) we write

\|x^{\prime} - \hat{x}\|_{2} \;\lessapprox\; \mathrm{cond}_{2}\,A \cdot \|\hat{x}\|_{2} \cdot \frac{\|\Delta b\|_{2}}{\|\hat{\text{\boldmath$b$}}\|_{2}}.   (12)

Now it is necessary to determine the increment of the right-hand side \Delta b = b^{\prime} - \hat{b}. Its obvious upper bound is 2\,\mathrm{rad}\,\text{\boldmath$b$}, but it is too crude. To get a more accurate estimate of \Delta b, we also consider, along with system (10), a system of linear algebraic equations

Ax = \tilde{\text{\boldmath$b$}},   (13)

for which the right-hand side is obtained by uniformly “compressing” the interval vector \text{\boldmath$b$}:

\tilde{\text{\boldmath$b$}} \,:=\, \bigl[\, \underline{\text{\boldmath$b$}} + M, \;\overline{\text{\boldmath$b$}} - M \,\bigr],   (14)

where

M \;:=\; \max_{x\in\mathbb{R}^{n}} \mathrm{Tol}\,(x, A, \text{\boldmath$b$}) \;\geq\; 0.

Since the maximum M is attained at a certain value of the argument, \hat{x}, we have

M \,=\, \min_{1\leq i\leq m}\left\{\, \mathrm{rad}\,\text{\boldmath$b$}_{i} - \left|\; \mathrm{mid}\,\text{\boldmath$b$}_{i} - \sum_{j=1}^{n} \text{\boldmath$a$}_{ij}\hat{x}_{j} \,\right| \,\right\} \;\leq\; \min_{1\leq i\leq m} \mathrm{rad}\,\text{\boldmath$b$}_{i}.

As a result, \underline{\text{\boldmath$b$}} + M \leq \overline{\text{\boldmath$b$}} - M in the componentwise sense, and the endpoints in the interval vector (14) do not “overlap” each other.

But the properties of the recognizing functional imply that, for the interval system of linear algebraic equations (13) with the right-hand side (14), the maximum of the recognizing functional is zero:

\max_{x\in\mathbb{R}^{n}} \mathrm{Tol}\,(x, A, \tilde{\text{\boldmath$b$}}) \;=\; 0.

Indeed, the values of \mathrm{rad}\,\text{\boldmath$b$}_{i} are summands in all the expressions in (5), over which we take the minimum for i = 1, 2, \ldots, m. Hence, if we simultaneously increase or decrease all \mathrm{rad}\,\text{\boldmath$b$}_{i} by the same value, keeping the midpoints \mathrm{mid}\,\text{\boldmath$b$}_{i} unchanged, then the total value of the recognizing functional will increase or decrease by exactly the same value. In other words, if we take a constant C \geq 0 and the interval m-vector \text{\boldmath$e$} = ([-1,1], \ldots, [-1,1])^{\top}, then, for the system Ax = \text{\boldmath$b$} + C\text{\boldmath$e$}\, with all the right-hand sides expanded by [-C, C], we have

\mathrm{Tol}\,(x, A, \text{\boldmath$b$} + C\text{\boldmath$e$}) \;=\; \mathrm{Tol}\,(x, A, \text{\boldmath$b$}) + C.   (15)

Therefore,

\max_{x\in\mathbb{R}^{n}} \mathrm{Tol}\,(x, A, \text{\boldmath$b$} + C\text{\boldmath$e$}) \;=\; \max_{x\in\mathbb{R}^{n}} \mathrm{Tol}\,(x, A, \text{\boldmath$b$}) + C.   (16)

The uniform narrowing of the right-hand side vector acts on the tolerable solution set and the recognizing functional in a completely similar way. If we narrow down all the components by the same value M, then the maximum of the recognizing functional of the new interval system also decreases by M.
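The shift property (15), from which (16) follows by taking maxima, can be verified directly; the short Python sketch below (with illustrative random data of our own choosing) evaluates the recognizing functional of a system with a point matrix before and after expanding every right-hand side component by [-C, C].

```python
import numpy as np

def tol(x, A, b_inf, b_sup):
    """Recognizing functional (5) for a point matrix A and an interval right-hand side."""
    b_mid, b_rad = (b_sup + b_inf) / 2, (b_sup - b_inf) / 2
    return np.min(b_rad - np.abs(b_mid - A @ x))

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))
b_mid = A @ rng.standard_normal(3)
b_inf, b_sup = b_mid - 1.0, b_mid + 1.0          # some interval right-hand side

C = 0.37
for _ in range(3):                               # identity (15) at a few random points x
    x = rng.standard_normal(3)
    widened = tol(x, A, b_inf - C, b_sup + C)    # right-hand side expanded by [-C, C]
    assert np.isclose(widened, tol(x, A, b_inf, b_sup) + C)
print("identity (15) confirmed at sample points; (16) follows by taking maxima")
```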

By virtue of the properties of the recognizing functional, the tolerable solution set \varXi_{tol}(A, \tilde{\text{\boldmath$b$}}) for system (13) has empty interior (such sets are often called “non-solid” or “meager”), which we will consider equivalent to “having zero size”. Naturally, this is a simplifying assumption, since in reality the tolerable solution set corresponding to the zero maximum of the recognizing functional may not be a single-point set. But we still accept that. This simplification is also supported by the fact that the situation with the zero maximum of the recognizing functional is unstable: the corresponding tolerable solution set can become empty with an arbitrarily small data perturbation (see Section 4).

Another fact concerning the auxiliary system (13) with the narrowed right-hand side, which follows from (15)–(16), is that the point \hat{x} remains the argument of the maximum of the recognizing functional:

\hat{x} \;=\; \arg\max_{x\in\mathbb{R}^{n}} \mathrm{Tol}\,(x, A, \tilde{\text{\boldmath$b$}}).

For this reason, the point \hat{b} = A\hat{x} lies in the interval vector \tilde{\text{\boldmath$b$}} defined by (14).

From what has been said, it follows that the solution set for the system Ax = \text{\boldmath$b$} is obtained from the solution set of the system Ax = \tilde{\text{\boldmath$b$}}, which has “negligible size” and for which \max_{x\in\mathbb{R}^{n}} \mathrm{Tol}\,(x, \text{\boldmath$A$}, \tilde{\text{\boldmath$b$}}) = 0, through expanding the right-hand side vector \tilde{\text{\boldmath$b$}} in each component simultaneously by [-M, M], where

M = \max_{x\in\mathbb{R}^{n}} \mathrm{Tol}\,(x, \text{\boldmath$A$}, \text{\boldmath$b$}).

The interval vector \tilde{\text{\boldmath$b$}} \ni \hat{b} may have non-zero size, but we put

[-\Delta b, \Delta b] = ([-M, M], \ldots, [-M, M])^{\top}

in order to make our estimate (12) attainable. Accordingly, in inequality (12) we take

\|\Delta b\| = \max_{x\in\mathbb{R}^{n}} \mathrm{Tol}\,(x, \text{\boldmath$A$}, \text{\boldmath$b$}),

if the Chebyshev norm (\infty-norm) is considered, or a value that differs from it by a corrective factor from the equivalence inequality for vector norms, if we take any other norm. As is known, for any vector y\in\mathbb{R}^{n} (see [2])

\|y\|_{\infty} \leq \|y\|_{2} \leq \sqrt{n}\;\|y\|_{\infty}.   (17)

Then

\|x^{\prime} - \hat{x}\|_{2} \;\lessapprox\; \sqrt{n}\;\mathrm{cond}_{2}A \cdot \bigl\|\,\arg\max_{x\in\mathbb{R}^{n}}\mathrm{Tol}\,\bigr\|_{2} \cdot \frac{\max_{x\in\mathbb{R}^{n}}\mathrm{Tol}}{\|\hat{\text{\boldmath$b$}}\|_{2}}.   (18)

What happens if the matrix A does not have full column rank? Then, by virtue of the Irene Sharaya criterion, the nonempty tolerable solution set to the system (10) is unbounded. This is completely consistent with the fact that then \mathrm{cond}_{2}A = \infty and the value of the variability measure IVE is infinite too.

3.3 General interval systems

Finally, we consider a general interval system of linear equations \text{\boldmath$A$}x = \text{\boldmath$b$}, with an essentially interval matrix, i. e., when \mathrm{rad}\,\text{\boldmath$A$} \neq 0. In view of the properties of the tolerable solution set (see, e. g., [15]), it can be represented as

\varXi_{tol}(\text{\boldmath$A$},\text{\boldmath$b$}) \;=\; \bigcap_{A\in\text{\boldmath$A$}} \bigl\{\, x\in\mathbb{R}^{n} \mid Ax\in\text{\boldmath$b$} \,\bigr\} \;=\; \bigcap_{A\in\text{\boldmath$A$}} \varXi_{tol}(A,\text{\boldmath$b$}),   (19)

i. e., as the intersection of the solution sets to the individual systems Ax = \text{\boldmath$b$} with point matrices A\in\text{\boldmath$A$}.

For each interval linear system Ax = \text{\boldmath$b$} with A\in\text{\boldmath$A$}, we have estimate (18), if A has full column rank. Otherwise, if the point matrix A has incomplete column rank and the corresponding solution set \varXi_{tol}(A,\text{\boldmath$b$}) is unbounded, then we do not take it into account. Consequently, for the tolerable solution set of the system \text{\boldmath$A$}x = \text{\boldmath$b$}, which is the intersection of the solution sets \varXi_{tol}(A,\text{\boldmath$b$}) for all A\in\text{\boldmath$A$}, the following should be true:

\|x^{\prime} - \hat{x}\|_{2} \;\lessapprox\; \min_{A\in\text{\boldmath$A$}}\left\{\, \sqrt{n}\;\mathrm{cond}_{2}A \cdot \bigl\|\,\arg\max_{x\in\mathbb{R}^{n}}\mathrm{Tol}\,\bigr\|_{2} \cdot \frac{\max_{x\in\mathbb{R}^{n}}\mathrm{Tol}}{\|\hat{\text{\boldmath$b$}}\|_{2}} \,\right\}.   (20)

The transition from representation (19) to inequality (20) can be both very accurate and rather crude (as can be seen from considering the intersection of two 1D intervals). It all depends on the size of the intersection of the solution sets of the individual systems Ax = \text{\boldmath$b$}. On the other hand, the amount of this intersection is indirectly characterized by the magnitude of \max_{x\in\mathbb{R}^{n}}\mathrm{Tol}.

Taking the above facts into account, we perform approximate estimation of the right-hand side of inequality (20) by moving the minimum over A\in\text{\boldmath$A$} through the curly brackets. First of all, we evaluate the factor \|\arg\max_{x\in\mathbb{R}^{n}}\mathrm{Tol}\,\|_{2}, which changes to the smallest extent, by the constant available to us after the numerical solution of the data fitting problem:

\bigl\|\arg\max_{x\in\mathbb{R}^{n}}\mathrm{Tol}\,(x, A, \text{\boldmath$b$})\bigr\|_{2} \,\approx\; \mathrm{const} \;=\; \bigl\|\arg\max_{x\in\mathbb{R}^{n}}\mathrm{Tol}\,(x, \text{\boldmath$A$}, \text{\boldmath$b$})\bigr\|_{2}.   (21)

Next, the minimum of \mathrm{cond}_{2}A naturally turns into \min\mathrm{cond}_{2}A, and the most important factor \max_{x\in\mathbb{R}^{n}}\mathrm{Tol}\,(x, A, \text{\boldmath$b$}) will be changed to \max_{x\in\mathbb{R}^{n}}\mathrm{Tol}\,(x, \text{\boldmath$A$}, \text{\boldmath$b$}). This choice (as well as (21)) is rather rigidly determined by the following reasons.

The expression for our variability measure should preserve its simplicity and be uniform for all cases and situations. In particular, if the interval matrix \text{\boldmath$A$} shrinks to a point matrix A, then our measure should turn into the estimate (18) for the point case. Finally, if \max\,\mathrm{Tol} = 0, then our measure must be zero too, since the size of the (stable) tolerable solution set is also zero, and our variability measure should reliably detect such situations. All this taken together leads to the estimate

\|x^{\prime} - \hat{x}\|_{2} \;\lessapprox\; \sqrt{n}\;\Bigl(\,\min_{A\in\text{\boldmath$A$}} \mathrm{cond}_{2}A\,\Bigr) \cdot \bigl\|\,\arg\max_{x\in\mathbb{R}^{n}}\mathrm{Tol}\,\bigr\|_{2} \cdot \frac{\max_{x\in\mathbb{R}^{n}}\mathrm{Tol}}{\|\hat{\text{\boldmath$b$}}\|_{2}}.   (22)

The same estimate as (22), by virtue of the equivalence inequality (17), is also true for the Chebyshev norm:

\max_{x^{\prime}\in\varXi_{tol}(\text{\boldmath$A$},\text{\boldmath$b$})} \|x^{\prime} - \hat{x}\|_{\infty} \;\lessapprox\; \sqrt{n}\;\Bigl(\,\min_{A\in\text{\boldmath$A$}} \mathrm{cond}_{2}\,A\,\Bigr) \cdot \bigl\|\,\arg\max_{x\in\mathbb{R}^{n}}\mathrm{Tol}\,\bigr\|_{2} \cdot \frac{\max_{x\in\mathbb{R}^{n}}\mathrm{Tol}}{\|\hat{\text{\boldmath$b$}}\|_{2}}.

This completes the rationale for (6).

If we want to evaluate the relative size of the tolerable solution set, expressing it in ratio to the norm of its points, then it is reasonable to take \hat{x} = \arg\max_{x\in\mathbb{R}^{n}}\mathrm{Tol} as the “most typical” point from the tolerable solution set \varXi_{tol}(\text{\boldmath$A$},\text{\boldmath$b$}). Using (17) again, we obtain

\frac{\max_{x^{\prime}\in\varXi_{tol}(\text{\boldmath$A$},\text{\boldmath$b$})} \|x^{\prime} - x^{\prime\prime}\|_{\infty}}{\|\hat{x}\|_{\infty}} \;\lessapprox\; n\;\Bigl(\,\min_{A\in\text{\boldmath$A$}} \mathrm{cond}_{2}A\,\Bigr) \cdot \frac{\max_{x\in\mathbb{R}^{n}}\mathrm{Tol}}{\|\hat{\text{\boldmath$b$}}\|_{2}}.

This gives value (8).

4 Numerical examples and some tests

First of all, we consider an example of an unstable tolerable solution set that changes abruptly with small perturbations in the system of equations. For all interval 2\times 2-systems of linear algebraic equations of the form

\begin{pmatrix} [-1,1] & [-1,1]\\ 1 & -1 \end{pmatrix} \begin{pmatrix} x_{1}\\ x_{2} \end{pmatrix} = \begin{pmatrix} [-1,1]\\ [1,\,1+\eta] \end{pmatrix}, \qquad \eta \geq 0,   (23)

the tolerable solution sets are the same: this is the straight line segment joining the points (0,-1) and (1,0) and depicted in Fig. 6. The diameter of the solution set is essentially non-zero (namely, \sqrt{2}), but the unconstrained maximum of the recognizing functional Tol for all such systems is zero, and it is attained at the point (0.5, -0.5).

Fig. 6: The tolerable solution set for the interval equations systems (23).

At the same time, any arbitrarily small increase in the lower endpoint of the interval [1, 1+\eta] in the right-hand side of the second equation makes the tolerable solution set empty. An arbitrarily small reduction of the upper endpoint of the interval [-1,1], located in the first component of the right-hand side vector, produces a similar effect. It turns out that the maximum value of the recognizing functional Tol characterizes very precisely the instability of the original solution set and the zero size of the solution sets of perturbed systems.
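The behaviour just described can be reproduced numerically; the Python sketch below (our own illustration; the crude grid search only approximates the maximum and is not the optimization method used in the paper) evaluates the recognizing functional of system (23) with η = 0 at the reported maximizer and then for a slightly perturbed right-hand side.

```python
import numpy as np

def tol(x, A_inf, A_sup, b_inf, b_sup):
    """Recognizing functional (5); cf. the sketch in Section 2.1."""
    A_mid, A_rad = (A_sup + A_inf) / 2, (A_sup - A_inf) / 2
    b_mid, b_rad = (b_sup + b_inf) / 2, (b_sup - b_inf) / 2
    return np.min(b_rad - np.abs(b_mid - A_mid @ x) - A_rad @ np.abs(x))

A_inf = np.array([[-1.0, -1.0], [1.0, -1.0]])
A_sup = np.array([[ 1.0,  1.0], [1.0, -1.0]])
b_inf, b_sup = np.array([-1.0, 1.0]), np.array([1.0, 1.0])     # eta = 0

print(tol(np.array([0.5, -0.5]), A_inf, A_sup, b_inf, b_sup))  # 0.0 at the reported maximizer

eps = 1e-3                                                     # shrink the upper endpoint of the
b_sup_pert = np.array([1.0 - eps, 1.0])                        # first right-hand side component
grid = np.linspace(-2.0, 2.0, 401)                             # crude grid search for the maximum
best = max(tol(np.array([u, v]), A_inf, A_sup, b_inf, b_sup_pert)
           for u in grid for v in grid)
print(best)   # a small negative value: the perturbed tolerable solution set is empty
```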

As the second example, we consider the problem of constructing a linear function of two variables a_{1} and a_{2},

b = x_{1}a_{1} + x_{2}a_{2},   (24)

from the interval data obtained in 3 measurements:

\begin{array}{c|ccc}
 & \text{\boldmath$a$}_{1} & \text{\boldmath$a$}_{2} & \text{\boldmath$b$}\\
\hline
1 & [98,100] & [99,101] & [190,210]\\
2 & [97,99] & [98,100] & [200,220]\\
3 & [96,98] & [97,99] & [190,210]
\end{array}

Note that in these data the three-dimensional uncertainty boxes of measurements 1 and 2, as well as 2 and 3, substantially “overlap” each other: their intersections are boxes with non-empty interiors, the sizes of which are comparable to the sizes of the original data boxes.

Fig. 7: The tolerable solution set of the system of equations (25) in comparison with the box constructed by using the estimate IVE.

To determine the coefficients x_{1} and x_{2}, we compose an interval 3\times 2-system of linear algebraic equations

\begin{pmatrix} [98,100] & [99,101]\\ [97,99] & [98,100]\\ [96,98] & [97,99] \end{pmatrix} \begin{pmatrix} x_{1}\\ x_{2} \end{pmatrix} = \begin{pmatrix} [190,210]\\ [200,220]\\ [190,210] \end{pmatrix}.   (25)

Its matrix has incomplete rank, since it contains, as a member, a point matrix of rank 1:

\begin{pmatrix} 98 & 99\\ 98 & 99\\ 98 & 99 \end{pmatrix}.   (26)

The united solution set for system (25) is unbounded, therefore it is hardly possible to determine, with certainty, the coefficients of the linear function (24) satisfying the weak compatibility between parameters and data (see Section 2). However, the interval matrix of system (25) does not contain linearly dependent point columns, and therefore, according to the Irene Sharaya criterion [13] (see Section 2.1), the tolerable solution set is bounded. It is depicted in Fig. 7, which is drawn by the procedure EqnTol2D from the package IntLinInc2D [14]. The minimum spectral condition number of the point matrices contained in the interval matrix of (25) is 103.83, and it is reached at the matrix

\begin{pmatrix} 100 & 99\\ 97 & 100\\ 96 & 99 \end{pmatrix}.

This result can be obtained, for example, using the simulated annealing algorithm on the set of point matrices contained in the interval matrix of (25).

Numerical solution of the maximization problem for the recognizing functional Tol can be carried out within the MATLAB environment, using the free program tolsolvty2.m (available from the author’s page at ResearchGate [21]). It implements a modified version of the so-called r-algorithms for non-differentiable optimization proposed and developed by N.Z. Shor and N.G. Zhurbenko [20]. Using the precision settings specified in it “by default”, we get \max\,\mathrm{Tol} = 1.9095, which is reached at \hat{x} = (5.1857\cdot 10^{-7},\ 2.0603)^{\top}. Then,

\mathrm{IVE} \,=\, \sqrt{2}\cdot 1.9095\cdot 103.83\cdot \frac{\|\hat{x}\|_{2}}{\sqrt{200^{2}+210^{2}+200^{2}}} \,=\, 1.6399.
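This arithmetic can be reproduced directly, for instance with the following Python fragment (the numbers are those reported above, so the result is only as accurate as their rounding):

```python
import numpy as np

max_tol  = 1.9095                              # maximum of the recognizing functional
x_hat    = np.array([5.1857e-7, 2.0603])       # its argument
min_cond = 103.83                              # estimate of min cond_2 over the interval matrix
b_hat    = np.array([200.0, 210.0, 200.0])     # formula (7) for the right-hand side of (25)

ive = np.sqrt(len(x_hat)) * max_tol * min_cond * np.linalg.norm(x_hat) / np.linalg.norm(b_hat)
print(round(ive, 4))                           # approximately 1.64, as reported above
```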

The interval hull of the tolerable solution set for system (25) (that is, its optimal interval enclosure) is the box

\begin{pmatrix} [-0.9620,\ 3.0227]\\ [-0.9320,\ 3.0127] \end{pmatrix},

which can also be found by the procedure EqnTol2D. We see that the value of IVE is in satisfactory agreement with the radii of the components of the optimal estimate of the solution set, equal to 1.9924 and 1.9724 respectively.

In the maximum compatibility method, the argument \hat{x} = \arg\max_{x\in\mathbb{R}^{n}}\mathrm{Tol} of the unconstrained maximum of the recognizing functional plays a crucial role, and, in fact, our variability estimate IVE relies heavily on it. This is why it makes sense to look at the box \hat{\text{\boldmath$x$}} with the components [\hat{x}_{i} - \mathrm{IVE},\ \hat{x}_{i} + \mathrm{IVE}], i = 1, 2. It is also depicted in Fig. 7, and the substantial asymmetry of its location relative to the solution set is, of course, explained by the specific position of the center, the point \hat{x}, as well as the ill-conditioning of the point systems from (25). With other data, the box \hat{\text{\boldmath$x$}} estimates the tolerable solution sets significantly better (see further).

Next, we give an example of the opposite type (in a sense, dual to the previous example), where a linear function of three variables

b = x_{1}a_{1} + x_{2}a_{2} + x_{3}a_{3}   (27)

is to be constructed from the data of two experiments summarized below:

\begin{array}{c|cccc}
 & \text{\boldmath$a$}_{1} & \text{\boldmath$a$}_{2} & \text{\boldmath$a$}_{3} & \text{\boldmath$b$}\\
\hline
1 & [98,100] & [97,99] & [96,98] & [190,210]\\
2 & [99,101] & [98,100] & [97,99] & [200,220]
\end{array}

To find the parameters of function (27), we come to an underdetermined interval system of linear algebraic equations

\begin{pmatrix} [98,100] & [97,99] & [96,98]\\ [99,101] & [98,100] & [97,99] \end{pmatrix} \begin{pmatrix} x_{1}\\ x_{2}\\ x_{3} \end{pmatrix} = \begin{pmatrix} [190,210]\\ [200,220] \end{pmatrix}.   (28)

Its matrix is the transpose of the matrix of system (25), so \min_{A\in\text{\boldmath$A$}} \mathrm{cond}_{2}\,A is the same for it. Also, the matrix of system (28) contains a point matrix of incomplete rank 1, namely the transpose of (26) (and many more such matrices).

Fig. 8: The tolerable solution set for the interval equations system (28).

Again, the united solution set for system (28) is unbounded, and it is difficult (if at all possible) to determine the coefficients of the linear function (27), relying on the weak compatibility between parameters and data, due to the “infinite variability” of the resulting estimate. Nevertheless, in these adverse conditions, the nonempty tolerable solution set to the interval system of equations (28) is bounded by virtue of the Irene Sharaya criterion [13] (see Section 2.1). In Fig. 8, the tolerable solution set is depicted as a thin hexagonal plate. Computation of the maximum of the recognizing functional for this system using the code tolsolvty2.m gives the value \max\mathrm{Tol} = 3.9698, which is reached at the point

\hat{x} = \arg\max\mathrm{Tol} \,=\, \bigl(\, 2.0603,\ 3\cdot 10^{-6},\ 2.1\cdot 10^{-6} \,\bigr)^{\top}.

It can be taken as an estimate of the coefficients in (27). Then the variability measure of the above estimate is

\mathrm{IVE} \,=\, \sqrt{2}\cdot 3.9698\cdot 103.83\cdot \frac{\|\hat{x}\|_{2}}{\sqrt{200^{2}+210^{2}}} \,=\, 4.1413.

The interval hull (optimal interval enclosure) of the tolerable solution set for system (28) is the box

\begin{pmatrix} [-1.9747,\ 4.0302]\\ [-1.9899,\ 4.0759]\\ [-1.9949,\ 4.1071] \end{pmatrix},

which can also be computed by the procedure EqnTolR3. The radii of the components of this interval vector are 3.0024, 3.0329, and 3.0510 respectively, which is also not very different from the value of IVE. The example shows that the value IVE works even in the case of m < n, when the number of measurements is less than the number of parameters to be determined. But a rigorous justification of this fact still awaits study.

To conclude the section, we present, in Table 1, the results of numerical tests for the interval linear n\times n-system

\left(\begin{array}{cccc}
\theta & [0,2] & \cdots & [0,2]\\
[0,2] & \theta & \cdots & [0,2]\\
\vdots & \vdots & \ddots & \vdots\\
[0,2] & [0,2] & \cdots & \theta
\end{array}\right)
\left(\begin{array}{c} x_{1}\\ x_{2}\\ \vdots\\ x_{n} \end{array}\right)
=
\left(\begin{array}{c} [1,K]\\ [1,K]\\ \vdots\\ [1,K] \end{array}\right),   (29)

with various n and K. System (29) resembles the one proposed in [9], having exactly the same matrix. But the right-hand sides were taken as positive intervals [1, K], since the original balanced intervals [-1,1] in the system from [9] make the tolerable solution set “too symmetric”.

Table 1: Results of the computational tests with system (29)
θ      IVE      ‖rad(□Ξ_tol)‖_∞      ‖rad(□Ξ_tol)‖_2          θ      IVE      ‖rad(□Ξ_tol)‖_∞      ‖rad(□Ξ_tol)‖_2
n = 5, K = 10                                                    n = 10, K = 10
2.0 1.019 1.25 2.795 6.0 0.894 0.5 1.581
4.0 1.081 0.875 1.957 9.0 1.491 0.389 1.230
6.0 0.786 0.639 1.429 12.0 0.582 0.313 0.988
8.0 0.681 0.5 1.118 15.0 0.495 0.26 0.822
10.0 0.534 0.41 0.917 20.0 0.396 0.203 0.640
n = 5, K = 20                                                    n = 10, K = 20
2.0 2.953 3.75 8.385 6.0 2.489 1.333 4.216
4.0 2.698 2.125 4.752 9.0 1.831 0.944 2.987
6.0 2.015 1.472 3.292 12.0 1.432 0.729 2.306
8.0 1.591 1.125 2.516 15.0 1.255 0.593 1.876
10.0 1.378 0.91 2.035 20.0 0.985 0.453 1.431

The interval matrix of system (29) is known to be regular if and only if \theta > n for even n and \theta > \sqrt{n^{2}-1} for odd n [9]. Consequently, in Table 1, the first two rows that correspond to each separate case of n and K refer to systems with singular matrices. As the parameter \theta grows, the matrix of the system becomes regular and better conditioned.

The values of IVE are compared with \|\mathrm{rad}\,(\Box\varXi_{tol})\|_{\infty} and \|\mathrm{rad}\,(\Box\varXi_{tol})\|_{2}, that is, with the Chebyshev norm (max-norm) and the Euclidean norm of the radius of the interval hull of the tolerable solution set (denoted as \Box\varXi_{tol}). We can see that, with the exception of two cases corresponding to n = 5 and K = 10, 20, the values of IVE are always within the two-sided bounds given by \|\mathrm{rad}\,(\Box\varXi_{tol})\|_{\infty} (lower bound) and \|\mathrm{rad}\,(\Box\varXi_{tol})\|_{2} (upper bound). And that is reasonable. Overall, our numerical experiments confirm the adequacy of the new measure of variability, which gives quite satisfactory approximate estimates of the size of the tolerable solution sets in various situations.
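For readers who wish to repeat such experiments, a small Python sketch constructing the endpoint arrays of system (29) for given n, θ and K may be convenient (only the data construction is shown; maximizing Tol and estimating the minimum condition number would proceed as discussed in the previous sections):

```python
import numpy as np

def system_29(n, theta, K):
    """Endpoint arrays of the interval system (29): diagonal entries equal to theta,
    off-diagonal entries [0, 2], right-hand side components [1, K]."""
    A_inf = np.zeros((n, n)); np.fill_diagonal(A_inf, theta)
    A_sup = np.full((n, n), 2.0); np.fill_diagonal(A_sup, theta)
    b_inf, b_sup = np.ones(n), np.full(n, float(K))
    return A_inf, A_sup, b_inf, b_sup

A_inf, A_sup, b_inf, b_sup = system_29(5, 6.0, 10)   # one of the cases from Table 1
```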

5 Discussion

IVE is the first measure of variability proposed in the statistics of interval data for estimation using the maximum compatibility method, and we cannot compare IVE with other similar measures, since they simply do not exist. However, it is useful to correlate the estimate IVE with the ideal mathematical characteristics of the solution set, such as its diameter, in terms of computational convenience and laboriousness.

First of all, IVE reflects instabilities in the solution set better than the diameter (see the first example in Section 4). An instability of the tolerable solution set for an interval linear system arises in the case when the maximum value of the recognizing functional is zero, \max\mathrm{Tol} = 0. Then the tolerable solution set can be either a single-point set or an extended set with non-zero diameter and empty interior [15]. After an arbitrarily small perturbation of the data, the latter situation can abruptly turn into an empty solution set. In any case, this phenomenon is signaled by the zero value of the maximum of the recognizing functional. The corresponding variability measure IVE is also zero, which is quite natural: it makes sense to show only the “stable size” of the solution set. The equality of IVE to zero or “almost zero” thus allows us to diagnose unstable cases.

Next, the problem of computing the diameter, in the 2-norm, of the tolerable solution set to an interval linear system of equations is NP-hard in general. This follows from its reducibility to the quadratic programming problem with linear constraints (see [22]). Indeed, the membership of a point in the tolerable solution set to an interval m\times n-system of equations is determined by a system of linear inequalities of size 2m\times 2n (the Rohn theorem [12]). To compute the diameter of the tolerable solution set in the 2-norm, we have to maximize the quadratic objective function \|x^{\prime} - x^{\prime\prime}\|^{2}_{2} over all pairs of points x^{\prime}, x^{\prime\prime} from the tolerable solution set, i. e. satisfying 2m\times 2n-systems of linear inequalities. So, computing the diameter of the tolerable solution set is not easy.

The diameter of the interval hull of the tolerable solution set can be computed more simply, but it is not better than IVE in any case, since the interval hull is not the solution set itself, and the coarsening resulted from such a replacement may be large.

Calculation of IVE by formula (6) requires solving the data fitting problem, that is, finding \max\,\mathrm{Tol} and \arg\max\,\mathrm{Tol}, and then we need to evaluate the minimum of the condition number of the matrices from the interval data matrix. In turn, the recognizing functional Tol is a concave piecewise linear function [15], so computing its maximum is of polynomial complexity. The author solves this problem efficiently by nonsmooth optimization methods developed in recent decades, in particular, using the r-algorithms proposed by N.Z. Shor [20], or using separating plane algorithms (see, e. g., [11, 23]). The most difficult part in calculating IVE is thus evaluating the minimum condition number of point matrices from a given interval matrix.

Computing the exact minimum of the condition number is not simple, but for the practical problems in which the value IVE will be applied, it is sufficient to have an approximate estimate of \min_{A\in\text{\boldmath$A$}} \mathrm{cond}_{2}\,A from above. This follows from our considerations in Section 3.3. Sometimes, it is not necessary to compute \min\,\mathrm{cond}_{2}\,A at all, if we have to compare, with each other, the variability of estimates obtained for the same data matrix \text{\boldmath$A$}.

If the interval matrix is “sufficiently narrow”, being not very different from a point matrix, then we can approximate

\min_{A\in\text{\boldmath$A$}} \mathrm{cond}_{2}\,A \;\approx\; \mathrm{cond}_{2}(\mathrm{mid}\,\text{\boldmath$A$}).   (30)

But in general, this recipe may work poorly, since the left-hand and right-hand sides of the approximate equality (30) can be quite different. In the examples with systems (25) and (28) from Section 4, the condition number of the midpoint matrix is 2.38\cdot 10^{4}, and, using the simplified formula (30), we would be off by a factor of more than 20 in evaluating the variability measure IVE.

In the general case, for a more accurate evaluation of \min\,\mathrm{cond}_{2}\,A, we can use popular evolutionary optimization methods, such as genetic algorithms, simulated annealing, particle swarm optimization, etc., within the interval matrix \text{\boldmath$A$}. In the numerical experiments from Section 4, the minimum of the condition number was found using the standard simulated annealing program from the free computer mathematics system Scilab.
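As a simple, non-authoritative alternative to the evolutionary methods mentioned above, the following Python sketch estimates min cond₂ A from above by random sampling of point matrices inside the interval matrix (our own illustration, not the Scilab routine used in the paper); an approximate estimate from above is exactly what suffices here, as noted earlier.

```python
import numpy as np

def min_cond_upper_estimate(A_inf, A_sup, n_samples=20000, seed=0):
    """A crude upper estimate of min cond_2(A) over all point matrices A lying in the
    interval matrix [A_inf, A_sup], obtained by random sampling (starting from the midpoint)."""
    rng = np.random.default_rng(seed)
    A_inf, A_sup = np.asarray(A_inf, float), np.asarray(A_sup, float)
    best = np.linalg.cond((A_inf + A_sup) / 2)            # the midpoint matrix, cf. (30)
    for _ in range(n_samples):
        t = rng.random(A_inf.shape)                       # a random point matrix inside A
        best = min(best, np.linalg.cond(A_inf + t * (A_sup - A_inf)))
    return best
```

By construction, the returned value never underestimates the true minimum, which is the safe direction for computing IVE.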

Note that there is a fundamental difference between computing the variability measure IVE and computing the diameter of the tolerable solution set: in the first case, we calculate a minimum, while in the second we have to find a maximum. Using traditional optimization methods and various heuristics, in the first case we compute an approximation to the minimum from above, and in the second case we find an approximation to the maximum from below. If we want to get, with our variability measure, a guaranteed outer estimation of the solution set, then the upper estimate, which is obtained by calculating the minimum in IVE, is preferable.

There exists one more viewpoint on the variability measure IVE.

In traditional probabilistic statistics, the phenomenon of collinearity of data (also called “multicollinearity”) plays a large role. It is the presence of a linear dependence between the input (predictor) variables of the regression model. The k variables of the model in question are usually called collinear if the vectors representing them lie in a linear space of dimension less than k [1], so that one of these vectors is a linear combination of the others. In practice, such exact collinearity of data is rare, but real computational problems in data fitting often start when the data vectors are “almost linearly dependent”. Then the parameter estimates are unstable, which leads to increased statistical uncertainty, i. e., an increase in the variance of the estimates.

According to modern views, the collinearity of data is most adequately described by the condition number of the matrix made up of these data (see, e. g., [1], Chapter 3). In this sense, our IVE is, in fact, a measure of the collinearity of the data, corrected with the help of the actual value of the estimate and the compatibility of this estimate with the data (which is indicated by the maximal value of the recognizing functional). The minimum over all \mathrm{cond}_{2}A for A\in\text{\boldmath$A$} is taken due to the specifics of the strong compatibility of parameters and data, and it agrees well with the regularizing role of the tolerable solution set (see [18]).

With this interpretation, IVE makes sense even with a negative maximum of the recognizing functional, max Tol, when the tolerable solution set is empty and the parameters of the linear function (1), which are strongly compatible with the data, do not exist. The absolute value of IVE still shows, up to a certain scaling coefficient, a measure of the collinearity of the data (a measure of their ill-conditioning), and the negative sign indicates the status of the solution to the problem, i. e., that the parameter vector computed is not strongly compatible with the data, but only provides the best possible approximation for the input data of the problem.

The immediate goal of further research is to justify the use of IVE for underdetermined situations, where the number m of observations is less than the number n of parameters to be determined. The maximum compatibility method works well in this case too, we get parameter estimates and we can calculate their values of IVE, but its application needs to be justified, at least at the same level of rigor as was done in this work for m \geq n.

Acknowledgements

The author is grateful to Alexander Bazhenov, Sergey Kumkov, and Sergei Zhilin for stimulating discussions and valuable comments on the work. Also, the author thanks the anonymous reviewers for their constructive criticism and good suggestions.

References

  • [1] D.A. Belsley, E. Kuh, R.E. Welsch, Regression Diagnostics, Wiley-Interscience, Hoboken, N.J., 1980, 2004.
  • [2] G.H. Golub, Ch.F. Van Loan, Matrix Computations, The Johns Hopkins University Press, Baltimore, 1996, 2013.
  • [3] R.A. Horn, Ch.R. Johnson, Topics in Matrix Analysis, Cambridge University Press, Cambridge, 1994.
  • [4] L. Jaulin, M. Kieffer, O. Didrit, E. Walter, Applied Interval Analysis, Springer, London, 2001.
  • [5] R.B. Kearfott, M. Nakao, A. Neumaier, S. Rump, S.P. Shary, P. van Hentenryck, Standardized notation in interval analysis, Computational Technologies 15 (2010), No. 1, pp. 7–13.
  • [6] V. Kreinovich, S.P. Shary, Interval methods for data fitting under uncertainty: a probabilistic treatment, Reliable Computing 23 (2016), pp. 105–140.  URL: http://interval.louisiana.edu/reliable-computing-journal/volume-23/reliable-computing-23-pp-105-140.pdf (accessed 10 March 2020).
  • [7] M. Milanese, J. Norton, H. Piet-Lahanier, E. Walter (Eds.), Bounding Approaches to System Identification, Plenum Press, New York, 1996.  DOI: 10.1007/978-1-4757-9545-5
  • [8] R.E. Moore, R.B. Kearfott, M.J. Cloud, Introduction to Interval Analysis, SIAM, Philadelphia, 2009.
  • [9] A. Neumaier, Interval Methods for Systems of Equations, Cambridge University Press, Cambridge, 1990.
  • [10] H.T. Nguyen, V. Kreinovich, B. Wu, G. Xiang, Computing Statistics under Interval and Fuzzy Uncertainty. Applications to Computer Science and Engineering, Springer, Berlin-Heidelberg, 2012.
  • [11] E.A. Nurminski, Separating plane algorithms for convex optimization, Mathematical Programming 76 (1997), pp. 373–391.   DOI: 10.1007/BF02614389
  • [12] J. Rohn, Inner solutions of linear interval systems, in: Nickel K. (Ed.), Interval Mathematics 1985, Lecture Notes on Computer Science 212, Springer, New York, 1986, pp. 157–158.
  • [13] I.A. Sharaya, On unbounded tolerable solution sets, Reliable Computing 11 (2005), pp. 425–432.   DOI: 10.1007/s11155-005-0049-9
  • [14] I.A. Sharaya, IntLinInc2D, a MATLAB software package for visualization of solution sets to interval linear 2D systems. Novosibirsk, 2014.  URL: http://www.nsc.ru/interval/sharaya
  • [15] S.P. Shary, Solving the linear interval tolerance problem, Mathematics and Computers in Simulation 39 (1995), pp. 53–85.   DOI: 10.1016/0378-4754(95)00135-K
  • [16] S.P. Shary, Solvability of interval linear equations and data analysis under uncertainty, Automation and Remote Control 73 (2012), pp. 310–322.  DOI: 10.1134/S0005117912020099
  • [17] S.P. Shary, Maximum consistency method for data fitting under interval uncertainty, Journal of Global Optimization 66 (2016), pp. 111–126.  DOI: 10.1007/s10898-015-0340-1
  • [18] S.P. Shary, Interval regularization for imprecise linear algebraic equations. Deposited in arXiv.org on 27 Sep 2018, No. arXiv:1810.01481.
  • [19] S.P. Shary, Weak and strong compatibility in data fitting under interval uncertainty, Advances in Data Science and Adaptive Analysis 12 (2020) (in press).
  • [20] N.Z. Shor, N.G. Zhurbenko, A minimization method using space dilation in the direction of difference of two successive gradients, Cybernetics 7(3) (1971), pp. 450–459.   DOI: 10.1007/BF01070454
  • [21] TOLSOLVTY2, a MATLAB code for examining solvability of the interval linear tolerance problem.   URL: https://www.researchgate.net/publication/294889566_TOLSOLVTY2
  • [22] S.A. Vavasis, Complexity theory: Quadratic programming, in: C.A. Floudas and P.M. Pardalos (Eds.), Encyclopedia of Optimization. Second Edition, New York, Springer, 2009, pp. 451–453.
  • [23] E. Vorontsova, Extended separating plane algorithm and NSO-solutions of PageRank problem, in: Y. Kochetov, M. Khachay, V. Beresnev, E. Nurminski, P. Pardalos (Eds.), Discrete Optimization and Operations Research. Proceedings of 9th International Conference DOOR 2016, Vladivostok, Russia, September 19-23, 2016, Lecture Notes in Computer Science 9869, Cham, Switzerland, Springer International, 2016, pp. 547–560.   DOI: 10.1007/978-3-319-44914-2_43
  • [24] D.S. Watkins, Fundamentals of Matrix Computations, Wiley-Interscience, New York, 2002.