Distributionally robust possibilistic optimization problems
Abstract
In this paper a class of optimization problems with uncertain linear constraints is discussed. It is assumed that the constraint coefficients are random vectors whose probability distributions are only partially known. Possibility theory is used to model the imprecise probabilities. In one of the interpretations, a possibility distribution (a membership function of a fuzzy set) in the set of coefficient realizations induces a necessity measure, which in turn defines a family of probability distributions in this set. The distributionally robust approach is then used to transform the imprecise constraints into deterministic counterparts. Namely, the uncertain left-had side of each constraint is replaced with the expected value with respect to the worst probability distribution that can occur. It is shown how to represent the resulting problem by using linear or second order cone constraints. This leads to problems which are computationally tractable for a wide class of optimization models, in particular for linear programming.
Keywords: robust optimization; possibility theory; imprecise probabilities; fuzzy intervals
1 Introduction
In this paper we consider the following uncertain optimization problem:
(1) |
In formulation (1), is a vector of objective function coefficients, is a vector of constraint right-hand sides, and is a vector of uncertain coefficients of the th constraint. In the following we will use the notation . We assume that is a random vector in whose realization is unknown when a solution to (1) is determined. A particular realization of is called a scenario. After replacing with scenario , for each , we get a deterministic counterpart of (1). The true probability distribution for is only partially known. In particular, the set of possible scenarios is provided, which is the support of or its reasonable approximation. Finally, is a nonempty and bounded subset of which restricts the domain of decision variables . If is a polyhedron, for example, , for some , then we get uncertain linear programming problem. If, additionally, , then we get uncertain combinatorial optimization problem.
The method of handling the uncertain constraints and solving (1) depends on the information available about . If is a random vector with a known probability distribution, then the th constraint can be replaced with a stochastic chance constraint of the form
where is a given risk level [8, 21]. If the uncertainty set is the only information provided with , then the imprecise constraint can be replaced with the following strict robust constraint [2]:
(2) |
Choosing appropriate uncertainty set is crucial in (2). For the discrete uncertainty, consists of explicitly listed scenarios [25]. These scenarios correspond to some events which influence the value of or can be the result of sampling the random vector . For the interval uncertainty [25], one provides an interval for each component of and is the Cartesian product of these intervals (a hyperrectangle in ). In order to cut off the extreme values of this hyperrectangle, whose probability of occurrence can be negligible, one can add a budget to . This budget typically limits deviations of scenarios from some nominal scenario [5, 29]. In another approach is specified as an ellipsoid centered at some nominal scenario [2]. Using the ellipsoidal uncertainty one can model correlations between the components of . One can also construct from available data [4], which leads to various data-deriven robust models.
Another approach to handle the uncertainty consists in modeling the uncertain coefficients in (1) by using fuzzy sets. There are a lot of concepts of solving fuzzy optimization problems (see, e.g., [20, 35, 26, 34, 37, 32, 31]) – for a comprehensive description of them we refer the reader to [28]. In one of the most common class of models, is a vector of fuzzy intervals whose membership functions are interpreted as possibility distributions for the components of . A description of possibility theory that offers a general framework of dealing with imprecise or incomplete knowledge can be found, for example, in [14]. It uses two dual possibility and necessity measures to handle the uncertainty. In the possibilistic setting one can replace the uncertain constraints in (1) with fuzzy chance constraints of the form [19, 27, 23]:
where is a necessity measure induced by a possibility distribution for .
If a partial information about the probability distribution of is available, then the so-called distributionally robust approach can be used (see, e.g., [16, 39, 10]). In this approach it is assumed that the true probability distribution for belongs to the so-called ambiguity set of probability distributions. For example, one can consider all probability distributions given some bounds on its mean and covariance matrix [10]. Alternatively, the ambiguity set can be specified by providing a family of confidence sets (this approach will be adopted in this paper) [39]. Using the distributionally robust approach, one can consider the following counterpart of the imprecise constraint :
(3) |
where the left hand side is the expected value of the random variable under a worst probability distribution . Observe that (3) reduces to (2) if contains all probability distributions with the support .
In this paper we propose a new approach in the area of fuzzy optimization. Will show how the distributionally robust optimization can be naturally used in the setting of possibility theory. A preliminary version of the paper has appeared in [17]. We will start by defining a possibility distribution (a membership function of a fuzzy set) for . This can be done by using both available data or experts knowledge. Following the interpretation of possibility distribution [1, 13], we will define an ambiguity set of probability distributions for . Finally, we will apply constraints of the type (3) to compute a solution. We will show that the model of uncertainty assumed leads to a family of linear or second order cone constraints. So, the resulting problem is tractable for important cases of (1), for example for the class of linear programming problems.
Remark 1.
If one uses (3), then the assumption that and , , are precise in (1) causes no loss of generality. Indeed, if is imprecise, then we can minimize an auxiliary variable , subject to the imprecise constraint ( reflects the possible realizations of objective function values). If is imprecise, then the th constraint can be rewritten as , where is an auxiliary variable such that . In both cases, we can replace the imprecise constraint with a one of the form (3).
This paper is organized as follows. In Section 2 we recall basic notions of possibility theory, in particular its probabilistic interpretation assumed in this paper. In Section 3 we discuss the discrete uncertainty representation in which a possibility degree for each scenario is specified. Then the ambiguity set contains a family of discrete probability distributions with a finite support . We will show that (3) is then equivalent to a family of linear constraints. In Section 4 we consider the interval uncertainty representation, in which unknown probability distribution for has a compact continuous support. We propose a method of defining a continuous possibility distribution for , which can be built by using available data or experts knowledge. We show that (3) is then equivalent to a family of second order cone constraints.
2 Possibility theory and imprecise probabilities
Let be a set of alternatives. A primitive object of possibility theory (see, e.g., [1]) is a possibility distribution , which assigns to each element a possibility degree . We only assume that is normal, i.e. there is such that . Possibility distribution can be built by using available data or experts opinions (see, e.g., [14]). A possibility distribution induces the following possibility and necessity measures in :
where is the complement of . In this paper we assume that the possibility distribution represents uncertainty in , i.e. some knowledge about uncertain quantity taking values in . We now recall, following [13], a probabilistic interpretation of the pair induced by the possibility distribution . Define
(4) |
In this case and (see [13, 15, 9]). So, the possibility distribution in encodes a family of probability measures in . Any probability measure is said to be consistent with and for any event the inequalities
(5) |
hold. A detailed discussion on the expressive power of (5) can also be found in [38].
Observe that is nonempty due the assumption that is normalized. Indeed, the probability distribution such that for some such that is in . To see this consider two cases. If , then and . If , then .
3 Discrete uncertainty model
Consider uncertain constraint , where is a random vector in with an unknown probability distribution (in order to simplify notation we skip the index in the constraint). In this section we assume that the support of is finite, i.e. is the set of all possible, explicitly listed scenarios (realizations of ) which can occur with a positive probability. We thus consider the discrete uncertainty model [25, 22]. Let be a membership function of a fuzzy set in , . We will interpret as a possibility distribution in the scenario set , i.e. . In order to simplify presentation, we will identify each scenario with its index . We can thus assume that is a possibility distribution in the index set , . The value of is the possibility degree for scenario . Assume also w.l.o.g. that and thus .
Given , we can now construct an ambiguity set of probability distributions for by using (4). Because contains only discrete probability distributions in , we get
and we can describe by the following system of linear constraints:
(6) |
In description (6) the first set of constraints model the condition , (). Notice that the formulation (6) has exponential number of constraints. In the following, we will show how to reduce the number of constraints to .
Let us partition the set of indices into disjoint sets such that the elements of have the same possibility degrees and the possibility degrees of the elements in are greater than the possibility degrees of the elements in for any . For example, let be a possibility distribution (a fuzzy set) in for (i.e. a possibility distribution in the set of eight scenarios). Then , , , . Let be the possibility degree of the elements in . In the example we have , , and .
Proposition 1.
System (6) is equivalent to the following system of constraints:
(7) |
Proof.
It is easy to verify that each feasible solution to (6) is also feasible to (7). Indeed, for , , we get . So, (7) is a subset of the constraints (6). Assume that is feasible to (7). Consider any set , . If , then is feasible to (6). Assume that and let be the first index such that . If , then and the constraint in (6) holds for . Assume that . The inequality from (7)
implies
where the last equality results from the fact that . In consequence the constraint in (6) holds for . ∎
For the sample possibility distribution , the ambiguity set can be described by the following system of constraints:
Using Proposition 1, the value of , for a given , can be computed by using the following linear program:
(8) |
Theorem 1.
Given , the constraint
(9) |
is equivalent to the following system of linear constraints:
(10) |
Proof.
Consider two boundary cases of the proposed model. If for some , and for each , then is the only scenario which can occur. The constraint (3) reduces then to . If for all , then and (3) becomes the strict robust constraint (2) for . Theorem 1 yields the following corollary.
Corollary 1.
4 Interval uncertainty model
Let us again focus on an uncertain constraint , where is a random vector in with an unknown probability distribution having now a compact continuous support . We will start by describing the support . Next, we will propose a continuous possibility distribution in which will induce an ambiguity set of probability distributions according to (4). This possibility distribution can be provided by experts or can be estimated by using available data. Assume that for each component of a nominal (expected, the most probable) value and an interval containing possible values of are known. Let
i.e. is a hyperrectangle containing possible scenarios (realizations of ). The components of are often correlated. Also, the probability that a subset of components will take the extreme values can be very small or even 0. So, is typically an overestimation of the support of . In robust optimization some limit for the deviations of scenarios from the nominal vector is often assumed, i.e. , where is a deviation prescribed. The parameter is called a budget and it controls the amount of uncertainty in . One can define, for example, , which leads to the continuous budgeted uncertainty [29]. In this paper, following [2, 3] we use
where is a given matrix. Notice that , where for some postive semidefinite matrix . For example, can be an estimation of the covariance matrix for (see, e.g., [6]). Therefore
is an ellipsoid in . We postulate that contains all possible realizations of , i.e. . Hence is the estimation of the support of . In the following, we will construct a possibility distribution for , which can be easily built from , , matrix and the budget .
Let be a possibility distribution for the component of , such that that , , is continuous, strictly increasing in and continuous strictly decreasing in . One can identify the possibility distribution with membership function of a fuzzy interval shown in Figure 1a (i.e. ). Let
be the -cut of . It is easy to check that is an interval for each fixed . Set . The function , the left profile of , is continuous and strictly increasing and the function , the right profile of , is continuous and strictly decreasing for . For the fuzzy interval , the -cut of can be computed by using the following formula:
We will assume that . Then are nondegenerate intervals (they have nonempty interiors) for each . The numbers and control the amount of uncertainty for below and over , respectively. In particular, when , then becomes . On the other hand, if , then becomes the closed interval (see Figure 1).
The deviation depends on an unknown realization of , so we can treat the deviation as an uncertain quantity . We can now build the following possibility distribution for : , is continuous and strictly decreasing in and if . Let
be the -cut of . Fix . We can identify with membership function of a fuzzy interval given, for example, in the form of (see Figure 1b). The -cut of is then for .
We now build the following joint possibility distribution , for :
(12) |
The value of is the possibility degree for scenario . The first part of the possibility distribution, i.e. , is built from the possibility distributions of non-interacting components [11]. The second part is added to model interactions between the components of . Define the -cut of
It is easily seen that
Set . Observe that . Furthermore , , is a family of nested sets, i.e. if . By the continuity and monotonicity of , and we get
(13) |
Notice also that is a closed convex set for each and has nonempty interior for all .
Proposition 2.
The following equality
holds, where is the set of all probability measures on .
Proof.
Let . We need to show that . Observe first that . To see this choose any probability distribution . Since is an event for each , we get for each (see equation (13)). In consequence . We now show that . Let and consider event . If , then and . Suppose and let
Then Choose for arbitrarily small . As , there is scenario such that and . But for each scenario such that . Consequently . Because , . Hence and . ∎
In order to construct a tractable reformulation of (3) for we need to use a discretization of . Choose integer and set . Define , . We consider the following ambiguity set:
(14) |
Set is a discrete approximation of . It is easy to verify that for any . By fixing sufficiently large constant , we obtain arbitrarily close approximation of .

Theorem 2.
The constraint
(15) |
is equivalent to the following system of second-order cone constraints:
(16) |
where and is the th column of .
Proof.
The proof is adapted from [39]. The left hand side of (15) can be expressed as the problem of moments (see, e.g., [36, 39]):
(17) |
where is the set of all nonnegative measures on . Notice that the second (equality) constraint implies is a probability measure supported on . The dual of the problem of moments takes the following form (see, e.g., [24]):
(18) |
Strong duality implies that the optimal objective values of (17) and (18) are the same (see, e.g., [24, Theorem 1, Corollary 1]). Define
and . Hence , , form a partition of the support into disjoint sets. Model (18) can be then rewritten as follows:
(19) |
which is equivalent to
(20) |
The th constraint in (20) can be reformulated as
(21) |
By the linearity of in , the left hand side is maximized at the boundary of , which coincides with the boundary of . Hence (21) is equivalent to
The left hand side of this inequality, by definition of , yields the following problem:
Let us substitute , which yields:
Introducing new variables leads to the following model:
(22) |
The dual to (22) is (see the Appendix):
(23) |
where is the th column of . The strong duality (see Appendix) implies that (22) and (23) have the same optimal objective function values. Model (23) together with (20) yield (16). ∎
Theorem 2 leads to the following corollary:
Corollary 2.
Corollary 2 shows that the resulting deterministic counterpart of uncertain linear programming problem (1), under the interval uncertainty model, is polynomially solvable (see, e.g., [7]) and thus can solve efficiently by some standard off-the-shelf solvers like IBM ILOG CPLEX [18].
Consider a sample of optimization problem (1) in which the value of the objective function is uncertain and two constraints: and are precisely known. The uncertainty of the coefficients and is modeled by fuzzy intervals and , respectively, regarded as possibility distributions for their values. A possibility distribution for uncertain deviation is prescribed by fuzzy interval , , and . By (12), we obtain a joint possibility distribution for . Fix , which leads to the ambiguity set for shown in Figure 2, built according to (14). In particular, the set , being the intersection of and the ellipse contains all possible scenarios. Applying now the distributionally robust approach in the possibilistic setting gives:
(24) |
Remark 1 shows how to convert the above problem with the uncertain objective function to the one with the precise objective function and the uncertain constraint. Theorem 2 (see Corollary 2) now leads to a second-order cone program that is equivalent to (24). An optimal solution of the program is and . Let us focus on evaluating the objective function
The worst probability distribution in assigns probability 0.5 to scenario and probability 0.5 to scenario in . These two points can be obtained by maximizing the linear function over the convex sets and , respectively (see Figure 2). We thus get .
5 Risk aversion modeling
In this section we will show how the ambiguity set can be relaxed to take individual decision maker risk aversion into account. Observe that the lower bounds on the probabilities for the sets in change uniformly. In consequence, a worst probability distribution uniformly assigns probabilities to points on the boundaries of . An example is shown in Figure 2, where (for ) probability 0.5 is assigned to points on the boundaries of and . It has been observed that risk averse decision makers can perceive worst coefficient realizations as more probable (see, e.g., [33, 12]). We now propose an method of taking this risk-aversion into account. Let
for and . The function is continuous, concave, increasing in and such that and . If , then for each . On the other hand, if , then for each , so tends to a linear function (see Fig. 3).
Fix and consider the following ambiguity set:
Since , we still get a valid bound . Notice that the lower bound can be now smaller (i.e. larger probabilities can be assigned to worse scenarios), which reflects the probability distortion.
As for each and if , decreasing leads to more conservative constraint (3). If , then . On the other hand, if , then is the set of all probability measures with support , and the constraint (3) becomes
which is equivalent to the strict robust constraint (2). Theorem 2 remains valid for the ambiguity set . It is enough to replace the constant with the constant in the first inequality in (16).
6 Application to portfolio selection
In this section we will apply the model constructed in Section 4 (see also Section 5) to a portfolio selection problem. We are given a set of assets whose returns form a random vector . A portfolio is a vector , where is the share of the th asset. Then is the uncertain return of portfolio (accordingly, the quantity is the uncertain loss for ). The probability distribution for is unknown. However, we have a sample of past realizations of . Using this sample we can estimate the mean return and the covariance matrix for . The sample mean and covariance matrix for 7 assets, computed for a sample of 30 subsequent observations in Polish stock market, are
We fix , , , where for each . Using Chebyshev’s inequality one can show that with high probability. We now use the fuzzy intervals to model the possibility distributions for the asset returns. We use linear membership functions. One can, however, further refine the information for the return values using different shapes, i.e. changing the parameters and . We use the fuzzy interval for to model the possibility distribution for the deviation . We consider the following problem:
(25) |

Figure 4 shows the optimal objective value of (25) for . In the first boundary case, , there is no uncertainty and in the optimal portfolio all is allocated to the asset with the largest expected return (smallest expected loss). In the second boundary case, when , there is no limit on deviation from and all is allocated to the asset with the smallest . For the intermediate values of , we get a family of various portfolios in which a diversification is profitable. One such portfolio is shown in Figure 4.
Figure 5 shows the effect of taking the individual risk aversion into account. In this figure the optimal objective value of (25) for is shown (we fix ). For smaller values of the objective value of (25) is larger. One can see that the optimal portfolio is adjusted (see Figure 5) to take larger sets of admissible probability distributions into account.

7 Conclusions
In this paper we have proposed a method of unifying the fuzzy (possibilistic) optimization with the distributionally robust approach. Using the connections between the possibility and probability theories one can build a set of probability distributions for uncertain constraint coefficient, based on a possibilistic information. The uncertain constraint, the left-hand side, can be then replaced with the expected value with respect to the worst probability distribution that can occur. In practical applications the true probability distributions for the uncertain data are often unknown. However, in most cases some information about the possible scenarios (parameter realizations) is available. In this case the possibility theory is an attractive choice, because it is flexible and allows to take both available data and experts’ opinions into account. In this paper we have proposed some methods of defining possibility distributions for scenarios, which is easy to apply in practice. Furthermore, the resulting deterministic counterpart of the problem can be solved efficiently for a large class of problems (for example for linear programming).
The key element of our model is the definition of a joint possibility distribution in scenario set. This step is quite flexible, as there are only minor restrictions on possibility distribution function which must be imposed. The support of this possibility distribution should contain all possible scenarios and a budget should control the magnitude of deviations of scenarios from some nominal (expected) one. We believe that such possibility distribution can be a good tool to handle the uncertainty in optimization problems.
References
- [1] C. Baudrit and D. Dubois. Practical representations of incomplete probabilistic knowledge. Computational Statistics and Data Analysis, 51:86–108, 2006.
- [2] A. Ben-Tal, L. El Ghaoui, and A. Nemirovski. Robust optimization. Princeton Series in Applied Mathematics. Princeton University Press, Princeton, NJ, 2009.
- [3] A. Ben-Tal and A. Nemirovski. Robust solutions of uncertain linear programs. Operation Research Letters, 25:1–13, 1999.
- [4] D. Bertsimas, V. Gupta, and N. Kallus. Data-deriven robust optimization. Mathematical Programming, 167:235–292, 2017.
- [5] D. Bertsimas and M. Sim. The price of robustness. Operations research, 52:35–53, 2004.
- [6] D. Bertsimas and M. Sim. Robust discrete optimization under ellipsoidal uncertainty sets. Technical report, MIT, 2004.
- [7] S. Boyd and L. Vandenberghe. Covex optimization. Cambridge University Press, 2008.
- [8] A. Charnes and W. Cooper. Chance-constrainted programming. Management Science, 6(73–79), 1959.
- [9] G. De Cooman and D. Aeyels. Supremum-preserving upper probabilities. Information Sciences, 118:173–212, 1999.
- [10] E. Delage and Y. Ye. Distributionally robust optimization under moment uncertainty with application to data-deriven problems. Operations Research, 58:595–612, 2010.
- [11] S. Destercke, D. Dubois, and E. Chojnacki. A consonant approximation of the product of independent consonant random sets. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 17:773–792, 2009.
- [12] E. Diecidue and P. P. Wakker. On the intuition of rank-depedent utility. The Journal of Risk and Uncertainty, 23:281–298, 2001.
- [13] D. Dubois. Possibility theory and statistical reasoning. Computational Statistics and Data Analysis, 51:47–69, 2006.
- [14] D. Dubois and H. Prade. Possibility theory: an approach to computerized processing of uncertainty. Plenum Press, New York, 1988.
- [15] D. Dubois and H. Prade. When upper probabilities are possibility measures. Fuzzy Sets and Systems, 49:65–74, 1992.
- [16] J. Goh and M. Sim. Distributionally robust optimization and its tractable approximations. Operations Research, 58:902–917, 2010.
- [17] R. Guillaume, A. Kasperski, and P. Zieliński. Distributionally robust optimization in possibilistic setting. In 2021 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), pages 1–6, 2021.
- [18] IBM ILOG CPLEX Optimization Studio. CPLEX User’s manual. https://www.ibm.com.
- [19] M. Inuiguchi and J. Ramik. Possibilistic linear programming: a brief review of fuzzy mathematical programming and a comparison with stochastic programming in portfolio selection problem. Fuzzy Sets and Systems, 111:3–28, 2000.
- [20] M. Inuiguchi, J. Ramík, T. Tanino, and M. Vlach. Satisficing solutions and duality in interval and fuzzy linear programming. Fuzzy Sets and Systems, 135:151–177, 2003.
- [21] P. Kall and J. Mayer. Stochastic linear programming. Models, theory and computation. Springer, 2005.
- [22] A. Kasperski and P. Zieliński. Robust Discrete Optimization Under Discrete and Interval Uncertainty: A Survey. In Robustness Analysis in Decision Aiding, Optimization, and Analytics, pages 113–143. Springer-Verlag, 2016.
- [23] A. Kasperski and P. Zieliński. Soft robust solutions to possibilistic optimization problems. Fuzzy Sets and Systems, 422:130–148, 2021.
- [24] E. d. Klerk and M. Laurent. A survey of semidefinite programming approaches to the generalized problem of moments and their error analysis. In C. Araujo, G. Benkart, C. E. Praeger, and B. Tanbay, editors, World Women in Mathematics 2018. Association for Women in Mathematics Series, pages 17–56. Springer, 2019.
- [25] P. Kouvelis and G. Yu. Robust Discrete Optimization and its Applications. Kluwer Academic Publishers, 1997.
- [26] Y.-J. Lai and C.-L. Hwang. Fuzzy Mathematical Programming. Springer-Verlag, 1992.
- [27] B. Liu. Fuzzy random chance-constrained programming. IEEE Transactions on Fuzzy Systems, 9:713–720, 2001.
- [28] W. A. Lodwick and J. Kacprzyk, editors. Fuzzy Optimization - Recent Advances and Applications, volume 254 of Studies in Fuzziness and Soft Computing. Springer-Verlag, 2010.
- [29] E. Nasrabadi and J. B. Orlin. Robust optimization with incremental recourse. CoRR, abs/1312.4075, 2013.
- [30] C. H. Papadimitriou and K. Steiglitz. Combinatorial optimization: algorithms and complexity. Dover Publications Inc., 1998.
- [31] M. S. Pishvaee, J. Razmin, and S. A. Torabi. Robust possibilistic programming for socially responsible supply chain network design: A new approach. Fuzzy Sets and Systems, 206:1–20, 2012.
- [32] E. Quaeghebeur, K. Shariatmadar, and G. De Cooman. Constrained optimization problems under uncertainty with coherent lower previsions. Fuzzy Sets and Systems, 206:74–88, 2012.
- [33] J. Quiggin. A theory of anticipated utility. Journal of Economic Behaviour and Organization, 3:323–343, 1982.
- [34] J. Ramík. Duality in fuzzy linear programming with possibility and necessity relations. Fuzzy Sets Systems, 157:1283–1302, 2006.
- [35] J. Ramík and M. Vlach. Generalized concavity in fuzzy optimization and decision analysis. Kluwer Academic Publishers, 2002.
- [36] A. Shapiro. On duality theory of conic linear problems. In M. Á. Goberna and M. A. López, editors, Semi-infinite programming, chapter 7, pages 135–165. Kluwer Academic Publishers, 2001.
- [37] R. Słowiński and J. Teghem, editors. Stochastic Versus Fuzzy Approaches to Multiobjective Mathematical Programming under Uncertainty. Kluwer Academic Publishers, 1990.
- [38] M. Troffaes, E. Miranda, and S. Destercke. On the connection between probability boxes and possibility measures. Information Sciences, 224:88–108, 2013.
- [39] W. Wiesemann, D. Kuhn, and M. Sim. Distributionally robust convex optimization. Operations Research, 62:1358–1376, 2014.
Appendix
The proof of (23).
Let us rewrite (22) as follows:
(26) |
and write the Lagrangian function
where and are the Lagrangian multipliers. After rearranging the terms we get
Problem (26) is convex and the set of feasible solutions is bounded and has nonempty interior. Hence it has an optimal solution with the finite objective function value . The problem also satisfies the strong duality (see, e.g., [7]), i.e.
(27) |
Since , , are unrestricted, the equality:
(28) |
must hold. We now show that if and , otherwise. Assume, that . By Cauchy-Shwartz inequality, . Hence and . Assume that and take for some . Then , which can be made arbitrarily small by increasing . Hence, it must be
(29) |
Now, using (28) and (29), the problem (21) can be rewritten as
which is equivalent to (23). ∎