Variational Theory and Algorithms for a Class of Asymptotically Approachable Nonconvex Problems
Abstract
We investigate a class of composite nonconvex functions, where the outer function is the sum of univariate extended-real-valued convex functions and the inner function is the limit of difference-of-convex functions. A notable feature of this class is that the inner function may fail to be locally Lipschitz continuous. It covers a range of important yet challenging applications, including inverse optimal value optimization and problems under value-at-risk constraints. We propose an asymptotic decomposition of the composite function that guarantees epi-convergence to the original function, leading to necessary optimality conditions for the corresponding minimization problem. The proposed decomposition also enables us to design a numerical algorithm such that any accumulation point of the generated sequence, if it exists, satisfies the newly introduced optimality conditions. These results expand on the study of so-called amenable functions introduced by Poliquin and Rockafellar in 1992, which are compositions of convex functions with smooth maps, and on the prox-linear methods for their minimization. To demonstrate that our algorithmic framework is practically implementable, we further present verifiable termination criteria and preliminary numerical results.
Keywords:
epi-convergence; optimality conditions; nonsmooth analysis; difference-of-convex functions
1 Introduction.
We consider a class of composite optimization problems of the form:
(CP0)   $\displaystyle\operatorname*{minimize}_{x \,\in\, \mathbb{R}^n} \; \sum_{i=1}^{m} g_i\big(f_i(x)\big),$
where for each $i = 1, \ldots, m$, the outer function $g_i : \mathbb{R} \to (-\infty, +\infty]$ is proper, convex, lower semicontinuous (lsc), and the inner function $f_i$ is not necessarily locally Lipschitz continuous.
If each inner function is continuously differentiable, then the objective in (CP0) belongs to the family of amenable functions under a constraint qualification [25, 26]. For a thorough exploration of the variational theory of amenable functions, readers are referred to [30, Chapter 10(F)]. The properties of amenable functions have also led to the development of prox-linear algorithms, where convex subproblems are constructed through the linearization of the inner smooth mapping [16, 4, 5, 19, 14].
However, there are various applications of composite optimization problem in the form of (CP0) where the inner function is nondifferentiable. In the following, we provide two such examples.
Example 1.1 (inverse optimal value optimization). For $x \in \mathbb{R}^n$, consider the optimal value function
(1)
with vectors and matrices of appropriate dimensions. The function is not smooth in general. The inverse (multi-)optimal value problem [2, 24] seeks a vector that minimizes the discrepancy between the observed optimal values and the true optimal values in a prescribed metric, such as the $\ell_1$-error:
(2)
If the optimal value function is real-valued on the region of interest, one can express problem (2) in the form of (CP0) by taking each outer function to be the corresponding absolute-error term.
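To make this concrete, the following sketch evaluates a hypothetical instance of an optimal value function of the type in (1), in which the parameter enters both the cost vector and the right-hand side; the data names c, C, A, b, B and the choice of the HiGHS solver are our own illustration, not taken from the paper.

```julia
using JuMP, HiGHS

# Hypothetical parametric LP whose optimal value plays the role of the function in (1).
# The resulting value function is generally nonsmooth, and neither convex nor concave in x.
function optimal_value(x; c, C, A, b, B)
    model = Model(HiGHS.Optimizer)
    set_silent(model)
    @variable(model, y[1:length(c)])
    @constraint(model, A * y .<= b .+ B * x)   # right-hand side shifts with x
    @objective(model, Min, (c .+ C * x)' * y)  # cost vector also depends on x
    optimize!(model)
    return objective_value(model)
end
```

Minimizing the discrepancy between such optimal values and the observed ones, as in (2), then has exactly the composite form (CP0).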
Example 1.2 (portfolio optimization under a value-at-risk constraint). Given a random variable $Z$, the value-at-risk (VaR) of $Z$ at a confidence level $\alpha \in (0,1)$ is defined as $\mathrm{VaR}_\alpha(Z) := \inf\{ t \in \mathbb{R} \mid \mathbb{P}(Z \le t) \ge \alpha \}$. Let $\xi$ be the random return of investments and $c(\cdot, \xi)$ be an lsc function representing the profit of a decision, parameterized by $\xi$. An agent’s goal is to maximize the expected profit while also controlling the risk via a constraint on the VaR under a prescribed level. The model can be written as
(3)
Define $\delta_S$ as the indicator function of a set $S$, where $\delta_S(z) = 0$ for $z \in S$ and $\delta_S(z) = +\infty$ for $z \notin S$. Problem (3) can then be put into the framework (CP0) by appropriate choices of the outer and inner functions. We note that the VaR function can be nondifferentiable even if the profit function $c(\cdot, \xi)$ is differentiable for every realization of $\xi$.
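As a quick numerical illustration (ours, not from the paper), the VaR in this example is simply a quantile of the profit distribution and can be estimated from simulated scenarios:

```julia
using Statistics

# Empirical estimate of VaR_α(Z) = inf{ t : P(Z ≤ t) ≥ α } from samples of Z.
value_at_risk(samples, α) = quantile(samples, α)

# For instance, with simulated profits z = [c(x, ξ₁), ..., c(x, ξ_N)]:
# value_at_risk(z, 0.95)
```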
Due to the nondifferentiability of the inner function in (CP0), the overall objective is not amenable, and the prox-linear algorithm [16] is not applicable to solve this composite optimization problem. In this paper, we develop an algorithmic framework for a subclass of (CP0), where each inner function, although nondifferentiable, can be derived from DC functions through a limiting process. We refer to this class of functions as approachable difference-of-convex (ADC) functions (see section 2.1 for the formal definition). It is important to note that ADC functions are ubiquitous. In particular, we will show that the optimal value function in (1) and the VaR function in (3) are instances of ADC functions under mild conditions. In fact, based on a result recently shown in [31], any lsc function is the epi-limit of piecewise affine DC functions.
With this new class of functions in hand, we take a first step toward understanding the variational properties of the composite ADC minimization problem (CP0), including an in-depth analysis of its necessary optimality conditions. The novel optimality conditions are defined through a handy approximation of the subdifferential mapping that exploits the ADC structure of the inner functions. Using the notion of epi-convergence, we further show that these optimality conditions are necessary for any local solution of (CP0). Additionally, we propose a double-loop algorithm to solve (CP0), where the outer loop dynamically updates the DC functions approximating each inner function, and the inner loop finds an approximate stationary point of the resulting composite DC problem through successive convex approximations. It can be shown that any accumulation point of the sequence generated by our algorithm satisfies the newly introduced optimality conditions.
Our strategy to handle the nondifferentiable and possibly discontinuous inner function through a sequence of DC functions shares certain similarities with the approximation frameworks in the existing literature. For instance, Ermoliev et al. [15] have designed smoothing approximations for lsc functions utilizing convolutions with bounded mollifier sequences, a technique akin to local “averaging”. Research has sought to identify conditions that ensure gradient consistency for the smoothing approximation of composite nonconvex functions [10, 8, 6, 7]. Notably, Burke and Hoheisel [6] have emphasized the importance of epi-convergence for the approximating sequence, a less stringent requirement than the continuous convergence assumed in earlier works [10, 3]. In recent work, Royset [32] has studied the consistent approximation of the composite optimization in terms of the global minimizers and stationary solutions, where the inner function is assumed to be locally Lipschitz continuous. Our notion of subdifferentials and optimality conditions for (CP0) takes inspiration from these works but adapts to accommodate nonsmooth approximating sequences that exhibit the advantageous property of being DC.
The rest of the paper is organized as follows. Section 2 presents a class of ADC functions and introduces a new associated notion of subdifferential. In section 3, we investigate the necessary optimality conditions for problem (CP0). Section 4 is devoted to an algorithmic framework for solving (CP0) and its convergence analysis to the newly introduced optimality conditions. We also discuss termination criteria for practical implementation in section 4.3. Preliminary numerical experiments on the inverse optimal value problems are presented in the last section.
Notation and Terminology. Let $\|\cdot\|$ denote the Euclidean norm in $\mathbb{R}^n$. We use the symbol to denote the Euclidean ball . The sets of nonpositive and nonnegative scalars are denoted by and , respectively, and the set of nonnegative integers is denoted by . We write and . Notation is used to simplify the expression of any sequence , where the elements can be points, sets, or functions. By and , we mean that the sequence and the subsequence indexed by converge to , respectively.
Given two sets and in and a scalar , the Minkowski sum and the scalar multiple are defined as and . We also define and whenever . When and are nonempty and closed, we define the one-sided deviation of from as , where . The Hausdorff distance between and is given by . The boundary and interior of are denoted by and . The topological closure and the convex hull of are indicated by and .
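For intuition, both set distances can be computed directly when the sets are finite; the following illustration (ours) represents a finite set as a vector of points.

```julia
using LinearAlgebra

# One-sided deviation of A from B: the largest distance from a point of A to the set B.
dev(A, B) = maximum(minimum(norm(a - b) for b in B) for a in A)

# Hausdorff distance: the larger of the two one-sided deviations.
hausdorff(A, B) = max(dev(A, B), dev(B, A))

# Example: hausdorff([[0.0, 0.0], [1.0, 0.0]], [[0.0, 1.0]]) returns √2.
```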
For a sequence of sets $\{S^\nu\}$, we define its outer limit as
$$\limsup_{\nu \to \infty} S^\nu := \big\{ x \mid \exists\, \nu_k \to \infty \text{ and } x^{\nu_k} \in S^{\nu_k} \text{ with } x^{\nu_k} \to x \big\},$$
and the horizon outer limit as
$$\limsup^{\infty}_{\nu \to \infty} S^\nu := \big\{ x \mid \exists\, \nu_k \to \infty,\ x^{\nu_k} \in S^{\nu_k} \text{ and } \lambda_k \downarrow 0 \text{ with } \lambda_k x^{\nu_k} \to x \big\}.$$
The outer limit of a set-valued mapping $T$ at $\bar{x}$ is defined as
$$\limsup_{x \to \bar{x}} T(x) := \big\{ v \mid \exists\, x^\nu \to \bar{x} \text{ and } v^\nu \to v \text{ with } v^\nu \in T(x^\nu) \big\}.$$
We say $T$ is outer semicontinuous (osc) at $\bar{x}$ if $\limsup_{x \to \bar{x}} T(x) \subseteq T(\bar{x})$. Consider some index set . A sequence of sets is equi-bounded if there exists a bounded set such that for all . Otherwise, the sequence is unbounded. If there is an integer such that is equi-bounded, then the sequence is said to be eventually bounded. Interested readers are referred to [30, Chapter 4] for a comprehensive study of set convergence.
The regular normal cone and the limiting normal cone of a set $S$ at $\bar{x} \in S$ are given by
$$\widehat{N}_S(\bar{x}) := \Big\{ v \;\Big|\; \limsup_{x \to \bar{x},\, x \in S,\, x \neq \bar{x}} \frac{\langle v, x - \bar{x} \rangle}{\|x - \bar{x}\|} \le 0 \Big\} \quad \text{and} \quad N_S(\bar{x}) := \limsup_{x \to \bar{x},\, x \in S} \widehat{N}_S(x).$$
The proximal normal cone of a set $S$ at $\bar{x} \in S$ is defined as $N_S^P(\bar{x}) := \{ v \mid \exists\, \lambda > 0 \text{ such that } \bar{x} \in P_S(\bar{x} + \lambda v) \}$, where $P_S$ is the projection onto $S$ that maps any $z \in \mathbb{R}^n$ to the set of points in $S$ that are closest to $z$.
For an extended-real-valued function $h : \mathbb{R}^n \to [-\infty, +\infty]$, we write its effective domain as $\operatorname{dom} h := \{ x \mid h(x) < +\infty \}$, and the epigraph as $\operatorname{epi} h := \{ (x, t) \in \mathbb{R}^n \times \mathbb{R} \mid h(x) \le t \}$. We say $h$ is proper if $\operatorname{dom} h$ is nonempty and $h(x) > -\infty$ for all $x$. We adopt the common rules for extended arithmetic operations, and the lower and upper limits of a sequence of scalars in $[-\infty, +\infty]$ (cf. [30, Chapter 1(E)]).
Let $h$ be a proper function. We write $x \xrightarrow{h} \bar{x}$ if $x \to \bar{x}$ with $h(x) \to h(\bar{x})$. The regular subdifferential and the limiting subdifferential of $h$ at $\bar{x} \in \operatorname{dom} h$ are respectively defined as
$$\widehat{\partial} h(\bar{x}) := \Big\{ v \;\Big|\; \liminf_{x \to \bar{x},\, x \neq \bar{x}} \frac{h(x) - h(\bar{x}) - \langle v, x - \bar{x} \rangle}{\|x - \bar{x}\|} \ge 0 \Big\} \quad \text{and} \quad \partial h(\bar{x}) := \limsup_{x \xrightarrow{h} \bar{x}} \widehat{\partial} h(x).$$
For any $\bar{x} \notin \operatorname{dom} h$, we set $\widehat{\partial} h(\bar{x}) = \partial h(\bar{x}) = \emptyset$. When $h$ is locally Lipschitz continuous at $\bar{x}$, the convex hull of $\partial h(\bar{x})$ equals the Clarke subdifferential $\partial_C h(\bar{x})$. We further say $h$ is subdifferentially regular at $\bar{x}$ if $h$ is lsc at $\bar{x}$ and $\partial h(\bar{x}) = \widehat{\partial} h(\bar{x})$. When $h$ is proper and convex, $\widehat{\partial} h$, $\partial h$, and $\partial_C h$ coincide with the concept of the subdifferential in convex analysis.
Finally, we introduce the notions of function convergence. A sequence of functions $\{h^\nu\}$ is said to converge pointwise to $h$, written $h^\nu \xrightarrow{p} h$, if $h^\nu(x) \to h(x)$ for any $x$. The sequence $\{h^\nu\}$ is said to epi-converge to $h$, written $h^\nu \xrightarrow{e} h$, if for any $x$, it holds that
$$\begin{cases} \displaystyle\liminf_{\nu \to \infty} h^\nu(x^\nu) \ge h(x) & \text{for every sequence } x^\nu \to x, \\ \displaystyle\limsup_{\nu \to \infty} h^\nu(x^\nu) \le h(x) & \text{for some sequence } x^\nu \to x. \end{cases}$$
The sequence $\{h^\nu\}$ is said to converge continuously to $h$, written $h^\nu \xrightarrow{c} h$, if $h^\nu(x^\nu) \to h(x)$ for any $x$ and any sequence $x^\nu \to x$.
2 Approachable difference-of-convex functions.
In this section, we formally introduce a class of functions that can be asymptotically approximated by DC functions. A new concept of subdifferential that is defined through the approximating functions is proposed. At the end of this section, we provide several examples that demonstrate the introduced concepts.
2.1 Definitions and properties.
An extended-real-valued function can be approximated by a sequence of functions in various notions of convergence, as comprehensively investigated in [30, Chapter 7(A-C)]. Among these approaches, epi-convergence has a notable advantage in its ability to preserve the global minimizers [30, Theorem 7.31]. Our focus lies on a particular class of approximating functions, wherein each function exhibits a DC structure.
Definition 1.
A function $f$ is said to be DC on its domain if there exist proper, lsc and convex functions $f_1$ and $f_2$ such that $f(x) = f_1(x) - f_2(x)$ for any $x \in \operatorname{dom} f$.
With this definition, we introduce the concept of ADC functions.
Definition 2 (ADC functions).
Let $f$ be a proper function.
(a) $f$ is said to be pointwise approachable DC (p-ADC) if there exist proper functions $f^\nu$, DC on their respective domains, such that $f^\nu \xrightarrow{p} f$.
(b) $f$ is said to be epigraphically approachable DC (e-ADC) if there exist proper functions $f^\nu$, DC on their respective domains, such that $f^\nu \xrightarrow{e} f$.
(c) $f$ is said to be continuously approachable DC (c-ADC) if there exist proper functions $f^\nu$, DC on their respective domains, such that $f^\nu \xrightarrow{c} f$.
A function $f$ is said to be ADC associated with $\{f^\nu\}$ if the sequence satisfies one of these convergence properties. By a slight abuse of notation, we denote the DC decomposition of each $f^\nu$ as $f^\nu = f_1^\nu - f_2^\nu$, although the equality may hold only on $\operatorname{dom} f^\nu$.
A p-ADC function may not be lsc. An example is given by , where for a set , we write if and if . In this case, is not lsc at . However, is p-ADC associated with . In contrast, any e-ADC function must be lsc [30, Proposition 7.4(a)], and any c-ADC function is continuous [30, Theorem 7.14].
The relationships among different notions of function convergence, including the unaddressed uniform convergence, have been thoroughly examined in [30]. Generally, pointwise convergence and epi-convergence do not imply one another, but they coincide when the sequence is asymptotically equi-lsc everywhere [30, Theorem 7.10]. In addition, converges continuously to if and only if both and are satisfied [30, Theorem 7.11]. While verifying epi-convergence is often challenging, it becomes simpler for a monotonic sequence that converges pointwise to [30, Proposition 7.4(c-d)].
2.2 Subdifferentials of ADC functions.
Characterizing the limiting and Clarke subdifferentials can be challenging when dealing with functions that exhibit complex composite structures. Our focus in this subsection is on numerically computable approximations of the limiting subdifferentials. We begin with the definitions.
Definition 3 (approximate subdifferentials).
Consider an ADC function $f$ associated with $\{f^\nu = f_1^\nu - f_2^\nu\}$. The approximate subdifferential of $f$ (associated with $\{f^\nu\}$) at $\bar{x} \in \operatorname{dom} f$ is defined as
$$\partial_A f(\bar{x}) := \big\{ v \mid \exists\, x^\nu \to \bar{x} \text{ and } v^\nu \to v \text{ with } v^\nu \in \partial f_1^\nu(x^\nu) - \partial f_2^\nu(x^\nu) \big\}.$$
The approximate horizon subdifferential of $f$ (associated with $\{f^\nu\}$) at $\bar{x} \in \operatorname{dom} f$ is defined as
$$\partial_A^\infty f(\bar{x}) := \big\{ v \mid \exists\, x^\nu \to \bar{x},\ \lambda_\nu \downarrow 0 \text{ and } v^\nu \to v \text{ with } v^\nu \in \lambda_\nu \big( \partial f_1^\nu(x^\nu) - \partial f_2^\nu(x^\nu) \big) \big\}.$$
Unlike the limiting subdifferential, which requires $x^\nu \xrightarrow{f} \bar{x}$, the approximate subdifferential $\partial_A f(\bar{x})$ is defined using all sequences $x^\nu \to \bar{x}$ without necessitating the convergence of function values. It follows directly from the definitions that the mappings $\partial_A f$ and $\partial_A^\infty f$ are osc. The following proposition presents a sufficient condition for $\partial_A^\infty f(\bar{x}) = \{0\}$ at any $\bar{x} \in \operatorname{dom} f$.
Proposition 1.
Let . Then if for any sequence , we have for all sufficiently large . The latter condition is particularly satisfied whenever is closed and for all sufficiently large .
Proof.
Note that for any , we have for all sufficiently large due to . Thus, for any . ∎
In the subsequent analysis, we restrict our attention to . Admittedly, the set depends on the approximating sequence and the DC decomposition of each , which may contain irrelevant information concerning the local geometry of . In fact, for a given ADC function , we can make the set arbitrarily large by adding the same nonsmooth functions to both and . By Attouch’s theorem (see for example [30, Theorem 12.35]), for proper, lsc, convex functions and , if , we immediately have when taking and . In what follows, we further explore the relationships among and other commonly employed subdifferentials in the literature beyond the convex setting. As it turns out, with respect to an arbitrary DC function that is lsc, contains the limiting subdifferential of at any whenever .
Theorem 1 (subdifferentials relationships).
Consider an ADC function . The following statements hold for any .
(a) If is e-ADC associated with and is lsc, then and .
(b) If is locally Lipschitz continuous and bounded from below, then there exists a sequence of DC functions such that , , and . Consequently, , the set is nonempty and bounded, and when is subdifferentially regular at .
Proof.
(a) Let be a DC decomposition of . Since is e-ADC, it must be lsc [30, Proposition 7.4(a)]. Using epi-convergence of to , we know from [30, corollary 8.47(b)] and [30, Proposition 8.46(e)] that any element of can be generated as a limit of regular subgradients at with and for some . Indeed, we can further restrict since and . Then, we have
where the second inclusion can be verified as follows: Firstly, due to the lower semicontinuity of and , and , it follows from the sum rule of regular subdifferentials [30, corollary 10.9] that . Consequently, since and are proper and convex [30, Proposition 8.12]. Similarly, by [30, corollary 8.47(b)], we have
(b) For a locally Lipschitz continuous function , consider its Moreau envelope and the set-valued mapping . For any sequence , we demonstrate in the following that is the desired sequence of approximating functions. Firstly, since is bounded from below, it must be prox-bounded and, thus, each is continuous and for all (cf. [30, Theorem 1.25]). By the continuity of and , we have from [30, Proposition 7.4(c-d)]. It then follows from part (a) that . Consider the following DC decomposition of each :
It is clear that is level-bounded in locally uniformly in , since for any and any bounded set , the set
is bounded. Due to the level-boundedness condition, we can apply the subdifferential formula of the parametric minimization [30, Theorem 10.13] to get
where the last inclusion is due to the calculus rules [30, Proposition 10.5 and exercise 8.8(c)]. Since is convex, we have by [30, Theorem 9.61], which further yields that
(4)
For any and any , we have
Then, due to the assumption that is bounded from below and therefore . By the local Lipschitz continuity of , it follows from [30, Theorem 9.13] that the mapping is locally bounded at . Thus, there is a bounded set such that for all sufficiently large . It follows directly from [30, Example 4.22] and the definition of the approximate horizon subdifferential that .
Next, we will prove . For any , from (4), there exist sequences of vectors and with each taken from the convex hull of a bounded set . By Carathéodory’s Theorem (see, e.g. [27, Theorem 17.1]), for each , we have for some nonnegative scalars with and a sequence with . It is easy to see that the sequences and are bounded for each . We can then obtain convergent subsequences with and for each . Since , we have by using the outer semicontinuity of . Thus, . This implies that . The rest of the statements in (b) follows from the fact that is nonempty and bounded whenever is locally Lipschitz continuous [30, Theorem 9.61]. ∎
Under suitable assumptions, Theorem 1(b) guarantees the existence of an ADC decomposition that has its approximate subdifferential contained in the Clarke subdifferential of the original function. Notably, this decomposition may not always be practically useful due to the necessity of computing the Moreau envelope for a generally nonconvex function. Another noteworthy remark is that the assumptions and results of Theorem 1 can be localized to any specific point . This can be accomplished by defining a notion of “local epi-convergence” at and extending the result of [30, corollary 8.47] accordingly.
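For a univariate illustration of this construction (ours, for intuition only), the Moreau envelope of a simple nonconvex Lipschitz function can be approximated by brute-force minimization; each envelope is DC by the usual completion-of-squares argument, and the envelopes increase pointwise to the original function as the parameter decreases to zero.

```julia
# Moreau envelope e_γ f(x) = min_u { f(u) + (u − x)² / (2γ) } of a toy nonconvex
# Lipschitz function, approximated by grid search (illustration only).
f(u) = min(abs(u - 1), abs(u + 1))

function moreau_envelope(f, x, γ; grid = range(-5, 5; length = 10_001))
    return minimum(f(u) + (u - x)^2 / (2γ) for u in grid)
end

# As γ ↓ 0, moreau_envelope(f, x, γ) ↑ f(x) pointwise, giving an approximating
# sequence of the kind used in the proof of Theorem 1(b).
```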
2.3 Examples of ADC functions.
In this subsection, we provide examples of ADC functions, including functions that are discontinuous relative to their domains, with explicit and computationally tractable approximating sequences. Moreover, we undertake an investigation into the approximate subdifferentials of these ADC functions.
Example 2.1 (implicitly convex-concave functions). The concept of implicitly convex-concave (icc) functions is introduced in the monograph [13], and is further generalized to extended-real-valued functions in [20]. A proper function $f$ is icc if there exists a lifted function $\bar{f}(y, z)$ such that the following three conditions hold:
(i) if , and if ;
(ii) $\bar{f}(\cdot, z)$ is convex for any fixed $z$, and $\bar{f}(y, \cdot)$ is concave for any fixed $y$;
(iii) $f(x) = \bar{f}(x, x)$ for any $x$.
A notable example of icc functions is the optimal value function in (1), which is associated with the lifted function defined by (the subscripts/superscripts are omitted for brevity):
(5)
Let and denote the subdifferentials of the convex functions and , respectively, for any . For any , the partial Moreau envelope of an icc function associated with is given by
(6)
This decomposition, established in [20], offers computational advantages compared to the standard Moreau envelope, as the maximization problem defining is concave in for any fixed . In what follows, we present new results on the conditions under which the icc function is e-ADC and c-ADC based on the partial Moreau envelope. Additionally, we explore a relationship between and , where the latter is known to be an outer estimate of [13, Proposition 4.4.26]. The proof is deferred to Appendix A.
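To see why such a partial Moreau envelope is DC, one can complete the square; in generic notation (a sketch with a lifted function $\bar{f}$ and a parameter $\gamma > 0$, which may differ in constants and sign conventions from the display (6)),
$$\max_{z}\Big\{ \bar{f}(x, z) - \tfrac{1}{2\gamma}\|z - x\|^2 \Big\} \;=\; \underbrace{\max_{z}\Big\{ \bar{f}(x, z) - \tfrac{\|z\|^2}{2\gamma} + \tfrac{\langle z, x \rangle}{\gamma} \Big\}}_{\text{convex in } x} \;-\; \underbrace{\tfrac{\|x\|^2}{2\gamma}}_{\text{convex in } x},$$
where the first term is convex in $x$ as a pointwise supremum of convex functions (for each fixed $z$, $\bar{f}(\cdot, z)$ is convex and the remaining terms are affine in $x$).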
Proposition 2.
Let be a proper, lsc, icc function associated with , where is closed and is lsc on , bounded below on , and continuous relative to . Given a sequence of scalars , we have:
(a) is e-ADC associated with , where each . In addition, if , then is c-ADC associated with .
(b) and for any .
Example 2.2 (VaR for continuous random variables). Given a continuous random variable $Z$, its conditional value-at-risk (CVaR) at a confidence level $\alpha \in (0,1)$ is defined as $\mathrm{CVaR}_\alpha(Z) := \frac{1}{1-\alpha}\int_\alpha^1 \mathrm{VaR}_t(Z)\, dt$, where $\mathrm{VaR}_t$ is the value-at-risk given in Example 1.2 (see, e.g., [29]). For any and , we define
(7)
The following properties of VaR for continuous random variables hold, with proofs provided in Appendix A.
Proposition 3.
Let $c$ be an lsc function and $\xi$ be a random vector. Suppose that $c(\cdot, \xi)$ is convex for any fixed realization of $\xi$, and $c(x, \xi)$ is a random variable having a continuous distribution induced by that of $\xi$ for any fixed $x$. Additionally, assume that for any . For any given constant , the following properties hold.
(a) is lsc and e-ADC associated with (with the definitions of and in (7)). Additionally, if is continuous, then is continuous and c-ADC associated with .
(b) If there exists a measurable function such that and for all and , then for any ,
where for a random set-valued mapping and an event is defined as the set of conditional expectations for all measurable selections .
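For intuition, one classical route to such a DC approximation uses the identity $\int_\alpha^{\alpha+\varepsilon} \mathrm{VaR}_t(Z)\,dt = (1-\alpha)\,\mathrm{CVaR}_\alpha(Z) - (1-\alpha-\varepsilon)\,\mathrm{CVaR}_{\alpha+\varepsilon}(Z)$, so a scaled difference of two CVaR terms converges to $\mathrm{VaR}_\alpha(Z)$ as $\varepsilon \downarrow 0$; since CVaR composed with a convex $c(\cdot, \xi)$ is convex, the approximation is DC. A sample-based sketch of this construction (ours, which may differ in constants from the stripped display (7)):

```julia
using Statistics

# Empirical CVaR at level β: average of the worst (1 − β) fraction of the samples.
function cvar(z, β)
    zs = sort(z; rev = true)
    k = max(1, ceil(Int, (1 - β) * length(z)))
    return mean(zs[1:k])
end

# DC approximation of VaR_α as a scaled difference of two CVaR terms
# (requires α + ε < 1); it tends to VaR_α as ε ↓ 0 for continuous distributions.
var_dc(z, α, ε) = ((1 - α) * cvar(z, α) - (1 - α - ε) * cvar(z, α + ε)) / ε
```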
3 The convex composite ADC functions and minimization.
This section aims to derive necessary optimality conditions for (CP0), particularly focusing on inner functions that lack local Lipschitz continuity. Throughout the rest of this paper, we assume that each outer function $g_i$ is proper, convex, lsc and each inner function $f_i$ is real-valued. Depending on whether $g_i$ is nondecreasing or not, we partition the indices into two categories:
(8)
We do not specifically address the case where $g_i$ is nonincreasing, as one can always redefine $\tilde{g}_i(t) := g_i(-t)$ and $\tilde{f}_i := -f_i$, enabling the treatment of these indices in the same manner as those in the nondecreasing category. Therefore, the second set should be viewed as the collection of indices where $g_i$ is not monotone. We further make the following assumptions on the functions $g_i$ and $f_i$.
Assumption 1 For each , we have (a) is e-ADC associated with , and ; (b) for all ; (c) .
From Assumption 1(a), each is locally Lipschitz continuous since any real-valued convex function is locally Lipschitz continuous. Obviously, is sufficient for Assumption 1(b) to hold. Since , we have for each at any . However, does not hold trivially. For example, consider a continuous function and
which results in but . Additionally, Assumption 1(b) ensures that at each point and for any sequence , the sequence must be bounded.
It follows from [30, Exercise 7.8(c)] and [32, Theorem 2.4] that there are several sufficient conditions for Assumption 1(c) to hold, which differ based on the monotonicity of each $g_i$: (i) For , either is real-valued or ; (ii) For , is c-ADC and for all with , there exists a sequence with . In addition, according to [30, Proposition 7.4(a)], Assumption 1(c) implies that is lsc. We also note that Assumption 1(c) does not necessarily imply . To maintain epi-convergence under addition of functions, one may refer to the sufficient conditions in [30, Theorem 7.46].
3.1 Asymptotic stationarity under epi-convergence.
In this subsection, we introduce a novel stationarity concept for problem (CP0), grounded in a monotonic decomposition of univariate convex functions. We demonstrate that under certain constraint qualifications, epi-convergence of approximating functions ensures this stationarity concept as a necessary optimality condition. Alongside the known fact that epi-convergence also ensures the consistency of global optimal solutions [30, Theorem 7.31(b)], this highlights the usefulness of epi-convergence as a tool for studying the approximation of problem (CP0).
The following lemma is an extension of [13, Lemma 6.1.1] from real-valued univariate convex functions to extended-real-valued univariate convex functions.
Lemma 1 (a monotonic decomposition of univariate convex functions).
Let $g : \mathbb{R} \to (-\infty, +\infty]$ be a proper, lsc and convex function. Then there exist a proper, lsc, convex and nondecreasing function $g^\uparrow$, as well as a proper, lsc, convex and nonincreasing function $g^\downarrow$, such that $g = g^\uparrow + g^\downarrow$. In addition, if , then for any .
Proof.
From the convexity of $g$, the domain $\operatorname{dom} g$ is an interval on $\mathbb{R}$, possibly unbounded. In fact, we can explicitly construct $g^\uparrow$ and $g^\downarrow$ in the following two cases.
Case 1. If has no direction of recession, i.e., there does not exist such that for any , is a nonincreasing function of , it follows from [27, Theorem 27.2] that attains its minimum at some . Define
Observe that . Consequently, from [27, Theorem 23.8], we have for any .
Case 2. Otherwise, there exists such that for any , is a nonincreasing function of . Consequently, must be an unbounded interval on . Let (or ) be such a recession direction, then is nonincreasing (or nondecreasing) on . We can set and (or and ). In this case, it is obvious that for any . The proof is thus completed. ∎
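For example, $g(t) = |t|$ falls under Case 1 with the minimum attained at $\bar{t} = 0$, and the construction gives
$$g^\uparrow(t) = \max\{t, 0\}, \qquad g^\downarrow(t) = \max\{-t, 0\}, \qquad g = g^\uparrow + g^\downarrow,$$
with $g^\uparrow$ convex and nondecreasing and $g^\downarrow$ convex and nonincreasing.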
In the subsequent analysis, we use $g^\uparrow$ and $g^\downarrow$ to denote the monotonic decomposition of any univariate, proper, lsc, and convex function constructed in the proof of Lemma 1 and, in particular, we take $g^\downarrow \equiv 0$ whenever $g$ is nondecreasing. We are now ready to present the definition of asymptotically stationary points.
Definition 4 (asymptotically stationary points).
Let each be an ADC function associated with . For each , define
(9)
We say that is an asymptotically stationary (A-stationary) point of problem (CP0) if for each , there exists such that
(10)
We say that is a weakly asymptotically stationary (weakly A-stationary) point of problem (CP0) if for each , there exist , and such that
Remark 1.
(i) Given that the approximate subdifferential is determined by the approximating sequence and their corresponding DC decompositions, the notion of (weak) A-stationarity also depends on these sequences and decompositions. (ii) It follows directly from Lemma 1 that an A-stationary point must be a weakly A-stationary point if for each . (iii) When each is nondecreasing or nonincreasing, the concepts of weak A-stationarity and A-stationarity coincide. (iv) Given a point , we can rewrite (10) as
for some index set that is potentially empty. For each , although the scalar does not explicitly appear in this inclusion, its existence implies that , which plays a role in ensuring . For instance, if for some , then , and the existence of yields .
In the following, we take a detour to compare the A-stationarity with the stationarity defined in [32], where the author has focused on a more general composite problem
where is proper, lsc, convex and is a locally Lipschitz continuous mapping. Consider the special case where with . Under this setting, a vector is called a stationary point in [32] if there exist and such that
(11)
which can be equivalently written as
(12)
For any fixed , a surrogate set-valued mapping can be defined similarly as in (11) by substituting and with and for each . The cited paper provides sufficient conditions to ensure , which asserts that any accumulation point of a sequence with yields a stationary point . Our study on the asymptotic stationarity differs from [32] in the following aspects:
1. The notion of A-stationarity can be sharper than the stationarity condition (12), as elaborated below.
2. We do not require the inner function to be locally Lipschitz continuous.
If each is locally Lipschitz continuous and bounded from below, it then follows from Theorem 1 that is c-ADC associated with such that and for any . Moreover, by , one has . Thus, for any A-stationary point induced by these ADC decompositions, there exists for each such that (13) holds. Hence, is also a stationary point as defined in (12). Indeed, A-stationarity here can be sharper than the latter one as the last inclusion in (13) may not hold with equality.
When fails to be locally Lipschitz continuous for some , it is not known if (11) is still a necessary condition for a local solution of (CP0). This situation further complicates the fulfillment of conditions outlined in [32, Theorem 2.4], especially the requirement of , due to the potential discontinuity of . As will be shown in Theorem 2 below, despite these challenges, weak A-stationarity continues to be a necessary optimality condition under Assumption 1.
To proceed, for each and any , we define to be a collection of sequences:
(14)
Theorem 2 (necessary conditions for optimality).
Let be a local minimizer of problem (CP0). Suppose that Assumption 1 and the following two conditions hold:
(i) For each and any sequence , there is a positive integer such that
(15)
and
(16)
(ii) One has
(17)
Then is an A-stationary point of (CP0). Additionally, is a weakly A-stationary point of (CP0) if for each .
Proof.
By using Fermat’s rule [30, Theorem 10.1] and the sum rule of the limiting subdifferentials [30, Corollary 10.9] due to the condition (17), we have
(18)
The inclusion is due to in Assumption 1(c) and approximation of subgradients under epi-convergence [30, corollary 8.47] and [30, Proposition 8.46(e)]; follows from the nonsmooth Lagrange multiplier rule [30, Exercise 10.52] due to the local Lipschitz continuity of [30, Example 9.14] and the condition (15); and use the calculus rules of the Clarke subdifferential [12, Chapter 2.3]. For each , any sequence and any element
there is a subsequence with for some . Next, we show the existence of for each such that
(19)
By Assumption 1(b), the subsequence is bounded. Taking a subsequence if necessary, we can suppose that . If is unbounded, then has a subsequence converging to and, thus, . Additionally, there exists such that
(20)
The equation follows from [30, Proposition 8.12] by the convexity of . From and , we must have for sufficiently large . Since is lsc, it holds that and, thus, . Also, notice that is continuous relative to its domain as it is univariate convex and lsc [27, Theorem 10.2]. This continuity implies . The inclusion follows directly from the definition of the horizon subdifferential. Lastly, is due to the lower semicontinuity of the proper convex function and [30, Proposition 8.12]. Therefore, we have with due to , contradicting (16). So far, we conclude that is a bounded sequence. Suppose that and, thus, by the outer semicontinuity of [30, Proposition 8.7].
Case 1. If , inclusion (19) holds trivially for , and for we can find a subsequence such that converges to or with for all . Therefore, (19) follows from
Case 2. Otherwise, . This means that is bounded. Suppose . Then, , and (19) is evident from . ∎
3.2 An example of A-stationarity.
We present an example to illustrate the concept of A-stationarity and to study its relationship with other known optimality conditions.
Example 3.1 (bi-parametrized two-stage stochastic programs). Consider the following bi-parametrized two-stage stochastic program with fixed scenarios described in [21]:
(21)
where are convex, continuously differentiable for , and , as defined in (1), is real-valued for . At , let and represent the optimal solutions and multipliers for each second-stage problem (1). Suppose that and are bounded. Note that and are ADC functions since they are convex. Example 2.1 shows that is an ADC function, and therefore, problem (21) is a specific case of the composite model (CP0). Given an A-stationary point of (21), under the assumptions of Example 2.1, we have
(22)
where for and is defined in (5) for . By assumptions, both and are nonempty, bounded, and
It then follows from Danskin’s Theorem [11, Theorem 2.1] that
Combining these expressions with (22), we obtain
which are the Karush-Kuhn-Tucker (KKT) conditions for the deterministic equivalent of (21).
4 A computational algorithm.
In this section, we consider a double-loop algorithm for solving problem (CP0). The inner loop finds an approximate stationary point of the perturbed composite optimization problem
(23)
by solving a sequence of convex subproblems, while the outer loop drives . It is important to note the potential infeasibility in (23) because in Assumption 1(c), together with , does not guarantee for all . This can be seen from the example of , and . Obviously and by [32, Theorem 2.4(d)], but we have for . Even though for all and each , this does not imply the feasibility of convex subproblems used in the inner loop to approximate (23).
For simplicity of the analysis, we assume that in problem (CP0), is real-valued for , and for . Namely, the problem takes the following form:
(CP1)
For , the convexity of each real-valued function implies its continuity by [27, corollary 10.1.1]. Consequently, the composite function is also continuous for and due to the continuity of each approximating function . It is important to note that model (CP1) still covers discontinuous objective functions since each can be discontinuous, even though the approximating sequence only consists of locally Lipschitz continuous functions.
4.1 Assumptions and examples.
Firstly, we make an assumption to address the feasibility issue outlined at the start of this section. For all and , define
Based on these auxiliary sequences, we need an initial point that is strictly feasible to the constraints for each .
Assumption 2 (strict feasibility) There exist and nonnegative sequences for , such that for all and
To streamline our notation and analysis, we extend the definitions of and introduce for by setting for all and . Since the quantity depends on the sequence , Assumption 2 poses a condition on this approximating sequence. Consider a fixed index . One can construct the required quantities as follows. Suppose that there exist a function and a nonnegative sequence such that
Additionally, assume that for any fixed , the function is continuous on and differentiable on , and there exists a constant such that for any and . For any fixed , by the mean value theorem, there exists a point such that . Thus,
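In symbols (a sketch with generic names $F$, $\kappa$, and $\gamma_\nu$ standing in for the quantities above), writing $f^\nu(x) = F(\gamma_\nu, x)$ with $F(0, x) = f(x)$ and $\big|\frac{\partial F}{\partial \gamma}(\gamma, x)\big| \le \kappa$ on the relevant range, the mean value theorem yields some $\tilde{\gamma} \in (0, \gamma_\nu)$ such that
$$\big| f^\nu(x) - f(x) \big| \;=\; \big| F(\gamma_\nu, x) - F(0, x) \big| \;=\; \gamma_\nu \Big| \frac{\partial F}{\partial \gamma}(\tilde{\gamma}, x) \Big| \;\le\; \kappa\, \gamma_\nu,$$
so the bound on the approximation error scales linearly with the parameter $\gamma_\nu$.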
Two more assumptions on the approximating sequences are needed.
Assumption 3 (smoothness of or ) For each , there exists such that

Assumption 4 (level-boundedness) For each , the function is level-bounded, i.e., for any , the set is bounded.
Assumption 3 imposes conditions on the Lipschitz continuity of the subdifferential mapping or , which will be used to determine the termination rule of the inner loop. A straightforward sufficient condition for this assumption is that, for each and , at least one of the functions and is -smooth, i.e., or for any . We also remark that Assumption 3 can hold even though both and are nondifferentiable. This can be seen from the following univariate example: and for any . It is not difficult to verify that Assumption 3 holds for . Assumption 4 is a standard condition to ensure the boundedness of the generated sequences for each .
In addition, we need a technical assumption to ensure the boundedness of the multiplier sequences in our algorithm.
Assumption 5 (an asymptotic constraint qualification) For any , if there exists satisfying (24), where for each (with the definition of in (9)), then we must have .
The normal cone in (24) reduces to for and for . According to the definitions of and , Assumption 5 depends on the approximating sequences for . It holds trivially if each is real-valued and . By Theorem 1(b), the condition holds when the ADC decompositions are constructed using the Moreau envelope, provided that is locally Lipschitz continuous and bounded from below. However, in general, Assumption 5 is not easy to verify. For Example 3.1, the assumption translates into
This is equivalent to the Mangasarian-Fromovitz constraint qualification (MFCQ) for problem (21) by [30, Example 6.40]; see also [28].
Furthermore, if each is c-ADC associated with such that , and , Assumption 5 states that
This condition aligns with the constraint qualification for the composite optimization problem in [32, Proposition 2.1], and is stronger than the condition in the nonsmooth Lagrange multiplier rule [30, Exercise 10.52]. Finally, Assumption 5 implies the constraint qualifications (15)-(17) in Theorem 2. We formally present this conclusion in the following proposition. The proof of Proposition 4 is given in Appendix B.
Proposition 4 (consequences of Assumption 5).
In the following, we use two examples to further illustrate Assumption 3 and the computation of in Assumption 2.
Example 4.1 (icc constraints). Let be real-valued and icc associated with , where is Lipschitz continuous with modulus for any . For the sequence in Example 2.1, it follows from that Assumption 3 holds for . To construct the quantities in Assumption 2, we notice that
(25)
where the second inequality is due to for any , and the last one uses the bound between the partial Moreau envelope and the original function [20, Lemma 3]. Thus, the sequence satisfies if is summable.
Alternatively, we can construct the quantities as follows. Let the partial Moreau envelope in (6) be the function jointly defined for , and for any . We claim that is continuous on and differentiable on for any fixed . Continuity in can be simply checked by a standard argument [30, Theorem 1.17(c)], noting that the optimal value is achieved at a unique point as the function is strongly convex for any fixed . Differentiability follows from Danskin’s Theorem [11, Theorem 2.1] that with satisfying for any . It then follows from the Lipschitz continuity of that for any . Therefore, and for any sequence defined by the partial Moreau envelope with .
Example 4.2 (VaR constraints for log-normal distributions). Consider with for some random vector , where is a positive definite covariance matrix. We have . The variable is restricted to a compact set . Denote the -quantile of the standard normal distribution by and the cumulative distribution function of the standard normal distribution by . By direct calculation (cf. [23, Section 3.2]), we have
Hence, is neither convex nor concave if . For the sequence in Example 2.2, we can derive that
Since is positive definite, it is easy to see that is twice continuously differentiable. Consequently, is -smooth relative to the compact set for some , and Assumption 3 holds (relative to ). Next, we define for any and for any . Obviously, is continuous on and differentiable on for any fixed . By using the Leibniz rule for differentiating the parametric integral, for , we have
for some by the mean-value theorem. By using the fact , the monotonicity , and the compactness of , we further have
Therefore, and for any sequence with .
4.2 The algorithmic framework and convergence analysis.
We now formalize the algorithm for solving (CP1). For , recall the nonnegative sequences introduced in Assumption 2, and observe that as . For consistency of our notation, we also set for all and . At the -th outer iteration and for , consider the upper and lower approximations of at a point by taking some , and incorporating the sequences :
(26)
Observe that, for fixed , the upper approximation is convex while the lower approximation is concave. For , consider the following function
(27)
which is a convex majorization of at a point by the fact that is nondecreasing and is nonincreasing. For , consider the convex constraint as an approximation for .
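In symbols, the majorization just described can be sketched as follows (notation ours, consistent with the description above): with $\widehat{f}^{\,\nu}_{\mathrm{up}}(\cdot\,; \bar{x})$ convex and $\widehat{f}^{\,\nu}_{\mathrm{low}}(\cdot\,; \bar{x})$ concave as in (26), the surrogate
$$x \;\mapsto\; g^\uparrow\big(\widehat{f}^{\,\nu}_{\mathrm{up}}(x; \bar{x})\big) + g^\downarrow\big(\widehat{f}^{\,\nu}_{\mathrm{low}}(x; \bar{x})\big)$$
majorizes $g(f^\nu(x))$ and is convex, since a nondecreasing convex function of a convex function is convex, and a nonincreasing convex function of a concave function is convex.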
We summarize the properties of all the surrogate functions as follows. Note that (28a) and (28b) hold for , while (28c) holds only for .
(28a)
(28b)
(28c)
The proposed method for solving problem (CP1) is outlined in Algorithm 1. The inner loop of the algorithm (indexed by ) is terminated when the following conditions are satisfied:
(29)
Input: Given and satisfying Assumption 2. Let be a sequence satisfying Assumption 3. Choose , a positive sequence such that . Set .
(30)
[Algorithm 1: the prox-ADC method for solving (CP1)]
[Figure 1: the convex majorization (27) used by the prox-ADC method retains curvature of the inner maps, in contrast to the complete linearization used by the prox-linear method]
In contrast to the prox-linear algorithm that is designed to minimize amenable functions and adopts complete linearization of the inner maps, the prox-ADC method retains more curvature information inherent in these maps (see Figure 1). We emphasize that the prox-ADC method differs from [13, Algorithm 7.1.2] that is designed for solving a problem with a convex composite DC objective and DC constraints. Central to the prox-ADC method is the double-loop structure, where, in contrast to [13, Algorithm 7.1.2], the DC sequence is dynamically updated in the outer loop rather than remaining the same. This adaptation necessitates specialized termination criteria (29) and the incorporation of to maintain feasibility with each update of . In the following, we demonstrate the well-definedness of the prox-ADC method. Specifically, we establish that for each iteration , the criteria detailed in (29) are attainable in finitely many steps.
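To make the inner loop concrete, the following minimal sketch runs the successive convex approximation on a toy unconstrained DC objective for one fixed outer iteration; the toy data, the closed-form subproblem solution, and the simplified stopping test are our own stand-ins for (30) and (29).

```julia
using LinearAlgebra

# Toy DC objective f = f1 − f2 with f1(x) = ‖x‖² and f2(x) = ‖x‖₁ (our choice).
f1(x) = dot(x, x)
f2(x) = norm(x, 1)
subgrad_f2(x) = sign.(x)            # a subgradient of the ℓ1-norm

function dc_inner_loop(x; ρ = 1.0, tol = 1e-8, maxit = 1_000)
    for _ in 1:maxit
        s = subgrad_f2(x)
        # Convex subproblem: min_y f1(y) − (f2(x) + s'(y − x)) + (ρ/2)‖y − x‖²,
        # solvable in closed form for this particular f1.
        y = (s .+ ρ .* x) ./ (2 + ρ)
        norm(y - x) <= tol && return y   # crude proxy for the criteria in (29)
        x = y
    end
    return x
end

x̂ = dc_inner_loop(randn(5))
```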
Theorem 3 (convergence of the inner loop).
Proof.
We prove (a) and (b) by induction. For , notice from Assumption 2 and (28a) that for . Thus, problem (30) is feasible for . Assume that (30) is feasible for and some . Consequently, is well-defined and for ,
which yields the feasibility of (30) for , . Hence, by induction, problem (30) is feasible for and any . To proceed, recall the function defined in Assumption 4. From the update of , we have
(31)
The last inequality follows from the definition of and the second relation in (28c) that for . Observe that is bounded from below by the continuity of for (see the discussion following model (CP1)) and the level-boundedness of . Suppose for contradiction that the stopping rule of the inner loop is not achievable in finitely many steps. Then from (31), converges and . The latter further yields and thus the last condition in (29) is achievable in finitely many iterations. Next, to derive a contradiction, it suffices to prove that the first two conditions in (29) can also be achieved in finitely many steps. We only show the first one since the other can be done with similar arguments. By the level-boundedness of , the set is compact. Notice that for all due to (31). For , we then have
because is uniformly continuous on the compact set and is bounded by [27, Theorem 24.7]. Therefore, for a fixed , there exists some such that holds for . Thus, (a)-(b) hold for .
Now assume that (a)-(b) hold for some and, hence is finite. It then follows from and that for each ,
Thus, problem (30) is feasible for and any . Building upon this, we can now clearly see the validity of (b) for , as we have shown similar results earlier in the case of . By induction, we complete the proof of (a)-(b). ∎
For any , define the set of multipliers for problem (30) as
Here is uniquely determined by as the minimizer of the strongly convex problem (30). Notice that for since is nondecreasing and for . Let be a subsequence that converges to some point . As we will see in the following lemma, the asymptotic constraint qualification in Assumption 5 implies the non-emptiness and compactness of for all sufficiently large and the eventual boundedness of . These technical results play an important role in the convergence analysis of the prox-ADC method. However, a stronger property of equi-boundedness appears necessary for designing practical termination criteria for the algorithm. We will establish this strengthened property in section 4.3 under non-asymptotic constraint qualifications.
Lemma 2 (non-emptiness and eventual boundedness of multipliers).
Let be a feasible point of problem (CP1). Suppose that Assumptions 1-5 hold. Consider any sequence generated by the prox-ADC method, with a subsequence converging to . The following statements hold.
(a) The set of multipliers is non-empty and compact for all sufficiently large .
(b) Additionally, if for (with the definition of in (8)), then the sequence is eventually bounded.
Proof.
(a) Observe that because and by conditions (29). The non-emptiness and compactness of for all sufficiently large is a direct consequence of the nonsmooth Lagrange multiplier rule [30, Exercise 10.52] for problem (30) if we can show that, for all sufficiently large , is the unique solution of the following system
(32)
Suppose that the above claim does not hold. Then, there exists a subsequence such that is not the unique solution of (32) for all . Without loss of generality, suppose and take for satisfying (32) and . For each and , define
Then, for all , we have
where uses the inequalities and ; the first term in is because of the construction of upper convex majorization (26); the second term in is due to so that
Inequality is due to (32) and ; is by Assumption 3; and is implied by conditions (29) and . Equivalently, for all and , there exist with and
such that . For , since the subsequence is bounded by Assumption 1(b), we can assume without loss of generality that converges to some as . Furthermore, it can be easily seen from (28a) and (29) that converges to the same limit point for . Notice that for all and from Theorem 3(a) and, thus, each must satisfy . Suppose that for each . Then, by the outer semicontinuity of the normal cone [30, Proposition 6.6],
Obviously, , and the sequence has at least one nonzero element. Consider two cases.
Case 1. If is bounded for , then there are vectors with such that and , contradicting Assumption 5 since are not all zeros.
Case 2. Otherwise, there exists some such that is unbounded. Define the index sets
Notice that is bounded. Without loss of generality, assume that this sequence converges to some and, thus, .
Step 1: Next we prove by contradiction that, for each , the sequence is bounded. Suppose that the boundedness fails and by passing to a subsequence. Consider for . Then . Since for all , we can assume that there exist and such that . It then follows from the construction of that has a subsequence converging to some element of for each and, in particular, . From , we obtain
a contradiction to Assumption 5 since the coefficient of the term is nonzero. So far, we have shown the boundedness of for each .
Step 2: Now suppose that for each with . Thus and for each . Since , there exists such that . Then implies
which leads to a contradiction to Assumption 5. Thus, is non-empty and compact for all sufficiently large .
(b) By part (a), assume from now on that for all without loss of generality. We also assume that converges to some point for . Then, by (28a), (28b) and (29), for and for . For any and any , we have and for satisfying
(33)
Due to Assumption 3 and similar arguments in the proof of part (a), the optimality condition (33) implies that
(34)
Note that, for , is nondecreasing, i.e., . Then for all and , and the first inequality of (34) is equivalent to
(35)
Observe that the sequences and must be bounded for . Otherwise, we could assume . Then every accumulation point of the unit vectors would be in the set , contradicting our assumption that for each .
For , given that is convex, real-valued, and , we can invoke [27, Theorem 24.7] to deduce the boundedness of . A parallel reasoning applies to demonstrate the boundedness of .
For , we proceed by contradiction to establish the boundedness of based on Assumption 5. Suppose that is unbounded and by passing to a subsequence. Consider the normalized subsequences and for each . Consequently, and for . By the triangle inequality and (35), we have
which further implies by the boundedness of and for . Now suppose that for . Then from a similar reasoning in (20), for ,
and obviously . The remaining argument to derive a contradiction to Assumption 5 follows the same steps as the proof of part (a) for the two cases, with the exception that the index set is replaced by . Thus, the sequences for and for are bounded. We can conclude that is bounded for sufficiently large integer , because otherwise we could extract a subsequence of multipliers from whose norms diverge to as , in contradiction to the result of boundedness that we have shown. Hence, the subsequence is eventually bounded. ∎
We make a remark on Lemma 2(b) about the additional assumption. According to the proof of part (b), the assumption for ensures the boundedness of the set for . There are some sufficient conditions for to hold: (i) If is locally Lipschitz continuous and bounded from below, by Theorem 1(b), we have at any for the approximating sequence generated by the Moreau envelope. (ii) If is icc associated with satisfying all assumptions in Proposition 2, it then follows from Proposition 2(b) that at any for the approximating sequence based on the partial Moreau envelope. It is worth mentioning that the icc function under condition (ii) is not necessarily locally Lipschitz continuous.
The main convergence result of the prox-ADC method follows.
Theorem 4.
Suppose that Assumptions 1-5 hold. Let be the sequence generated by the prox-ADC method. Suppose that has an accumulation point and, in addition, for . Then is a weakly A-stationary point of (CP1). Moreover, if for each , the functions and are -smooth for all , i.e., there exists a sequence such that for all ,
(36)
then is also an A-stationary point of (CP1).
Proof.
Let be a subsequence converging to . By the stopping conditions (29) and , we also have . First, we prove . From Theorem 3(a), we have for and all . Due to epi-convergence in Assumption 1(c), it holds that
Thus for and . By Assumption 1(a), for all . This implies , and we can conclude that .
By Lemma 2(a), for all sufficiently large , we have
(37)
where and for . It follows from Lemma 2(b) that the subsequences and are bounded for . Suppose that and for . Recall that the subsequence is bounded by Assumption 1(b) for . Without loss of generality, assume that converges to some point for . Then, by (28a), (28b) and (29), for and for . From the outer semicontinuity of and , we have for and for .
To proceed, we prove by contradiction that the sequence is bounded for . Suppose that . For each , the boundedness of and follows from the assumption ; otherwise, any accumulation point of the unit vectors would be in , leading to a contradiction. Since and for are also bounded, we conclude that the subsequence is bounded. Thus, we can assume that
By (35), it follows that . Consider for , and then . Given for all , there must exist such that . For each , it then follows from that has a subsequence converging to some element in . In particular, . Since , this implies that
which contradicts Assumption 5. Hence, is bounded for .
We are now ready to prove that is a weakly A-stationary point. Suppose that for with . It remains to show that for each , there exists such that
which can be derived similarly as the proof of (19) in Theorem 2. Summarizing these arguments, we conclude that is a weakly A-stationary point of (CP1).
Under the additional assumption of the theorem, there exist , for , and
such that
where is implied by the optimality condition (37), employs (36) and Assumption 3, and follows from conditions (29). This inequality is a tighter version of (35) in the sense that, for each and , and are elements taken from the single-valued mapping evaluated at the same point . A straightforward adaptation of the preceding argument confirms that is an A-stationary point of (CP1). ∎
4.3 Termination criteria.
The previous subsection demonstrates the asymptotic convergence of the algorithm, showing that any accumulation point of the sequence generated by the prox-ADC method is weakly A-stationary. This subsection is dedicated to the non-asymptotic analysis of verifiable termination criteria for practical implementation.
Assumption 6 (non-asymptotic constraint qualifications) Let be the parameter in Algorithm 1. For all and any pair satisfying , if there exist for such that , then we must have .
A direct consequence of Assumption 6 and the nonsmooth Lagrange multiplier rule [30, Exercise 10.52] is that the set of multipliers is non-empty and compact for any fixed . This is in contrast with Lemma 2(a), where the results only hold for sufficiently large . We will show below that the result on the eventual boundedness of the subsequence can be strengthened to the equi-boundedness under this assumption.
Proposition 5 (equi-boundedness of multipliers).
Suppose that Assumptions 1-6 hold. Consider any sequence generated by the prox-ADC method. The following statements hold.
(a) If there is a subsequence converging to some and for , then the subsequence is equi-bounded.
(b) If is bounded and for any and , then the sequence is equi-bounded, i.e.,
(38)
Proof.
(a) We know from Lemma 2(b) that the subsequence is eventually bounded. This implies the existence of an index such that is bounded. On the other hand, it follows from Assumption 6 that is non-empty and compact for any fixed . Thus, is bounded, and is equi-bounded.
(b) Suppose for contradiction that is not equi-bounded. Then for any nonnegative integer , there is an index such that for some multiplier . Observe that the nonnegative sequence of indices is either bounded or unbounded. It suffices to consider these two cases separately.
Suppose first that is bounded. There must be an index that appears infinitely many times in . Consequently, the set is unbounded, a contradiction to Assumption 6.
Suppose next that is unbounded. For some index set , we have as . Notice that the subsequence is bounded since is bounded. By passing to a subsequence if necessary, we assume that converges to some . Using epi-convergence in Assumption 1(c) and following the same procedure as in the proof of Theorem 4, we can obtain that . Then, by the assumption in (b), for . Hence, there is a subsequence converging to some with a corresponding subsequence of multipliers such that as , which is a contradiction to the result of part (a).
We have obtained contradictions for the two cases where is bounded or unbounded. Then we can conclude that is equi-bounded and the quantity defined in (38) is finite. ∎
After obtaining the equi-boundedness of the multipliers, we next introduce a relaxation of the weakly A-stationary point in preparation for the termination criteria. For a proper and convex function $g$ and any $\delta \ge 0$, we denote $\partial g(x; \delta) := \bigcup\{ \partial g(y) \mid \|y - x\| \le \delta \}$, which is related to Goldstein’s $\delta$-subdifferential [18].
Definition 5.
Given any , and , we say a point is a -weakly A-stationary point of problem (CP0) if there exists a nonnegative integer such that
We remark that, if each outer function is the identity function, i.e., $g_i(t) = t$ for any $t$, and each inner function is DC rather than ADC, the above definition, in the context of a DC program, is independent of and expresses nearness to a -critical point [35, Definition 2]. For nonsmooth optimization problems, similar definitions based on the idea of small nearby subgradients, together with termination criteria, have appeared in the literature [18, 9].
The following proposition reveals the relationship between a -weakly A-stationary point and a weakly A-stationary point.
Proposition 6.
Proof.
By Assumption 1(b), the subsequence is bounded for each . Then, there is an index set such that converges to some for each . Using the outer semicontinuity of the subdifferential mapping of a convex function, we have
and
Thus, by taking an outer limit of the subdifferentials involved in the condition that is -weakly A-stationary for all , we know that is a weakly A-stationary point of (CP0). ∎
We conclude this section with our main result on the termination criteria.
Proposition 7 (termination criteria).
Proof.
The existence of satisfying (39) is a direct consequence of , , and . Recalling (34) in the proof of Lemma 2, at the -th outer iteration, we have
where , for . Thus, at the point , we have
where the parameters and are given by
Hence, for satisfying (39), is a -weakly A-stationary point of problem (CP1). ∎
5 Numerical examples.
We present some preliminary experiments to illustrate the performance of our algorithm on inverse optimal value optimization with and without constraints. The first experiment aims to demonstrate the practical performance of the prox-ADC method under the termination criteria in section 4.3, by varying the approximating sequences and initial points. To demonstrate the computation of ADC-constrained problems, especially the choice of the quantity and a feasible initial point in Assumption 2, we further consider the constrained inverse optimal value optimization. All experiments were run on a MacBook Air laptop with an Apple M1 chip and 16GB of memory, using Julia 1.10.2.
5.1 Inverse optimal value optimization with simple constraints.
Based on the setting in (2), we aim to find a vector to minimize the errors between the observed optimal values and true optimal values :
(40)
where each is the optimal value function as defined in (1). We fix , , , and the number of inequality constraints in the minimization problem (1). Vectors and , and matrices are randomly generated with each entry independent and normally distributed with mean and variance . For numerical stability, we then normalize matrices and by a factor of . We also generate a positive definite matrix and a random solution with . We set for each and, therefore, .
We adopt the ADC decomposition in (6), denoted by with a sequence for some exponent . Consequently, . We apply the prox-ADC algorithm to solve this example with and . In this example, the strongly convex subproblem (30) can be easily reformulated to a problem with linear objective and convex quadratic constraints, which is solved by Gurobi in our experiments.
We first investigate the performance of our algorithm under the termination criteria with different values of parameters. Figure 2 displays the logarithm of the objective values against the number of outer iterations and the total number of inner iterations. We mark three different points on the curve where the termination criteria (39) with and are satisfied.
[Figure 2: logarithm of the objective values against the number of outer iterations and against the total number of inner iterations, with the points satisfying the termination criteria (39) marked.]
We have also experimented with various values of the exponent , which determines the convergence rate of the approximating sequence, and with various initial points. In both cases, we terminate the algorithm under the conditions (39) with and . In Figure 3, we observe that different values of under the same termination criteria lead to candidate solutions with similar objective values, and that there are roughly two phases of convergence in terms of the total number of iterations. Initially, the objective value decreases faster for smaller , corresponding to a coarser approximation. Once the objective value is sufficiently small ( on this particular instance), a larger yields faster convergence to high accuracy. We remark that for , the algorithm reaches the maximum number of outer iterations and does not output a -weakly A-stationary point. Figure 4 shows the influence of initial points drawn uniformly from . On this instance, two of the initial points lead to -weakly A-stationary points with large objective values. For these two initial points, we rerun the algorithm with , and it still terminates with large objective values.
[Figure 3: objective values under different values of the exponent, with the same termination criteria.]
[Figure 4: objective values obtained from different initial points drawn uniformly at random.]
5.2 Inverse optimal value optimization with ADC constraints.
We consider a variant of the inverse optimal value optimization that is defined as follows:
(41)
In this formulation, the observations of the optimal values are divided into two groups, indexed by and . We aim to minimize the errors for the first group while ensuring that the relative errors for the second group do not exceed a specified feasibility tolerance, denoted by . In our experiment, we fix , , , , , and the number of inequality constraints in the minimization problem (1). The solution and the data, including , are randomly generated in the same way as in section 5.1. One can check that is feasible for (41) and attains the minimal objective value .
Similar to the first example, we adopt the ADC decomposition in (6), denoted by with a sequence for some positive integer and . Owing to the feasibility requirement in Assumption 2, we introduce an additional parameter to control the approximating sequences, which will be explained in detail later. We also note that treating and as two separate constraints for leads to failure of the asymptotic constraint qualification in Assumption 5, because the approximate subdifferentials of the ADC functions and are linearly dependent. This issue can be resolved by rewriting the constraints in a composite ADC form and assuming a corresponding version of Assumption 5. We omit this technical detail, since the main focus of this section is the practical implementation of our algorithm; the dependence issue itself is illustrated schematically below.
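Schematically (in generic notation of ours, with a scalar ADC constraint function $h$ and tolerance $\epsilon$):
\[
|h(x)| \le \epsilon \;\rightsquigarrow\; \begin{cases} h(x) - \epsilon \le 0,\\ -h(x) - \epsilon \le 0,\end{cases} \qquad v \in \partial h(x) \;\Longrightarrow\; \lambda\, v + \lambda\,(-v) = 0 \ \text{ for all } \lambda > 0,
\]
so the two one-sided constraints contribute the subgradient pair $\{v, -v\}$, which nonzero multipliers can annihilate; this is precisely the degeneracy excluded by Assumption 5.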
To verify Assumption 2, which asserts the existence of a strictly feasible point, we first follow the discussion after Assumption 2 to construct the quantity , where is the Lipschitz constant of for all . We can derive this Lipschitz constant by characterizing the subdifferential for a fixed via Danskin's Theorem [11, Theorem 2.1] (a generic form of this bound is recalled below) and then upper bounding the norm of this subdifferential over . The extra denominator in the expression of is due to the scaling of the constraints in (41). Then, consider the following problem:
(Feas)
where the objective is the sum of compositions of univariate convex functions and DC functions . Notice that problem (Feas) takes the same form as (23), so we can apply the inner loop of the prox-ADC method to solve it approximately. If solving this problem yields a solution with , then
(42)
and is a strictly feasible point satisfying Assumption 2. We emphasize that using the inner loop of the prox-ADC method to solve problem (Feas) and thereby obtain a strictly feasible point is merely a heuristic. Although this approach works well in our experiments, verifying Assumption 2 is in general not easy. We make a final remark on the role of . For small values of and , it is possible that , in which case (42) shows that no strictly feasible point satisfies Assumption 2 for this fixed approximating sequence. Hence, the flexibility afforded by the parameter is necessary to ensure the validity of Assumption 2.
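For reference, the Danskin-type characterization invoked earlier in this discussion takes the following generic form (notation ours): if $\varphi(x) = \max_{z \in Z}\, \psi(x, z)$ with $Z$ compact and $\psi(\cdot, z)$, $\nabla_x \psi(\cdot, \cdot)$ continuous, then
\[
\partial \varphi(x) \;=\; \operatorname{conv}\big\{ \nabla_x \psi(x, z) \;:\; z \in \operatorname*{argmax}_{z' \in Z}\, \psi(x, z') \big\},
\]
so any uniform bound on $\|\nabla_x \psi(x, z)\|$ over the region of interest yields a Lipschitz constant for $\varphi$.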
We implement the above procedure to find an initial point and then apply the prox-ADC method with and . On most of the randomly generated instances, we observe that the point obtained by solving (Feas) also remains feasible for the original problem (41) along the iterations, although this is not implied by Assumption 2. In Figure 5, we again plot the logarithm of the objective values against the total number of iterations, using various combinations of and and various initial points. It is worth mentioning that, for this constrained problem, the random initial point is not used directly in the prox-ADC method. Instead, it is first fed to problem (Feas) to generate a strictly feasible point satisfying Assumption 2, and the candidate solution of (Feas) then becomes the initial point of the prox-ADC method for solving (41); a schematic summary follows.
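A minimal sketch of this two-phase procedure is given below; all routine names are hypothetical stubs standing in for the steps described above, not the authors' code.
```julia
# Phase 1: run the inner loop of prox-ADC on (Feas) from a random start to
# produce a candidate strictly feasible point (a heuristic, as noted above).
# Phase 2: run the full prox-ADC method on (41) from that point.
solve_feas(x0) = x0                      # stub: approximate solution of (Feas)
feas_value(x) = -1.0                     # stub: (Feas) objective value at x
prox_adc(x; eps, delta) = x              # stub: prox-ADC method applied to (41)

n = 50
x0 = randn(n)                            # random start, not used directly by prox-ADC
x_feas = solve_feas(x0)
if feas_value(x_feas) < 0                # mirrors the strict-feasibility test via (42)
    x_out = prox_adc(x_feas; eps = 1e-3, delta = 1e-3)
end
```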
[Figure 5: logarithm of the objective values against the total number of iterations, for various parameter combinations and initial points.]
Acknowledgments.
The authors are partially supported by the National Science Foundation under grants CCF-2416172 and DMS-2416250, and the National Institutes of Health under grant 1R01CA287413-01. The authors are grateful to the associate editor and two reviewers for their careful reading and constructive comments that have substantially improved the paper.
References
- [1] Carlo Acerbi. Spectral measures of risk: A coherent representation of subjective risk aversion. Journal of Banking and Finance, 26(7):1505–1518, 2002.
- [2] Shabbir Ahmed and Yongpei Guan. The inverse optimal value problem. Mathematical Programming, 102(1):91–110, 2005.
- [3] Amir Beck and Marc Teboulle. Smoothing and first order methods: A unified framework. SIAM Journal on Optimization, 22(2):557–580, 2012.
- [4] James V Burke. Descent methods for composite nondifferentiable optimization problems. Mathematical Programming, 33(3):260–279, 1985.
- [5] James V Burke and Michael C Ferris. A Gauss-Newton method for convex composite optimization. Mathematical Programming, 71(2):179–194, 1995.
- [6] James V Burke and Tim Hoheisel. Epi-convergent smoothing with applications to convex composite functions. SIAM Journal on Optimization, 23(3):1457–1479, 2013.
- [7] James V Burke and Tim Hoheisel. Epi-convergence properties of smoothing by infimal convolution. Set-Valued and Variational Analysis, 25(1):1–23, 2017.
- [8] James V Burke, Tim Hoheisel, and Christian Kanzow. Gradient consistency for integral-convolution smoothing functions. Set-Valued and Variational Analysis, 21(2):359–376, 2013.
- [9] James V Burke, Adrian S Lewis, and Michael L Overton. A robust gradient sampling algorithm for nonsmooth, nonconvex optimization. SIAM Journal on Optimization, 15(3):751–779, 2005.
- [10] Xiaojun Chen. Smoothing methods for nonsmooth, nonconvex minimization. Mathematical Programming, 134(1):71–99, 2012.
- [11] Frank H Clarke. Generalized gradients and applications. Transactions of the American Mathematical Society, 205:247–262, 1975.
- [12] Frank H Clarke. Optimization and Nonsmooth Analysis. Society for Industrial and Applied Mathematics, Philadelphia, 1990.
- [13] Ying Cui and Jong-Shi Pang. Modern Nonconvex Nondifferentiable Optimization. Society for Industrial and Applied Mathematics, Philadelphia, 2021.
- [14] Dmitriy Drusvyatskiy and Courtney Paquette. Efficiency of minimizing compositions of convex functions and smooth maps. Mathematical Programming, 178(1–2):503–558, 2019.
- [15] Yuri M Ermoliev, Vladimir I Norkin, and Roger JB Wets. The minimization of semicontinuous functions: Mollifier subgradients. SIAM Journal on Control and Optimization, 33(1):149–167, 1995.
- [16] Roger Fletcher. A model algorithm for composite nondifferentiable optimization problems. Mathematical Programming Studies, 17:67–76, 1982.
- [17] Gerald B Folland. Real Analysis: Modern Techniques and Their Applications, volume 40. John Wiley & Sons, New York, 1999.
- [18] Allen A Goldstein. Optimization of Lipschitz continuous functions. Mathematical Programming, 13(1):14–22, 1977.
- [19] Adrian S Lewis and Stephen J Wright. A proximal method for composite minimization. Mathematical Programming, 158(1–2):501–546, 2016.
- [20] Hanyang Li and Ying Cui. A decomposition algorithm for two-stage stochastic programs with nonconvex recourse. SIAM Journal on Optimization, 34(1):306–335, 2024.
- [21] Junyi Liu, Ying Cui, Jong-Shi Pang, and Suvrajeet Sen. Two-stage stochastic programming with linearly bi-parameterized quadratic recourse. SIAM Journal on Optimization, 30(3):2530–2558, 2020.
- [22] Boris S Mordukhovich. Generalized differential calculus for nonsmooth and set-valued mappings. Journal of Mathematical Analysis and Applications, 183(1):250–288, 1994.
- [23] Matthew Norton, Valentyn Khokhlov, and Stan Uryasev. Calculating CVaR and bPOE for common probability distributions with application to portfolio optimization and density estimation. Annals of Operations Research, 299(1):1281–1315, 2021.
- [24] Giuseppe Paleologo and Samer Takriti. Bandwidth trading: A new market looking for help from the OR community. AIRO News, 6(3):1–4, 2001.
- [25] RA Poliquin and R Tyrrell Rockafellar. Amenable functions in optimization. Nonsmooth Optimization: Methods and Applications (Erice, 1991), pages 338–353, 1992.
- [26] RA Poliquin and R Tyrrell Rockafellar. A calculus of epi-derivatives applicable to optimization. Canadian Journal of Mathematics, 45(4):879–896, 1993.
- [27] R Tyrrell Rockafellar. Convex Analysis, volume 18. Princeton University Press, Princeton, NJ, 1970.
- [28] R Tyrrell Rockafellar. Lagrange multipliers and optimality. SIAM Review, 35(2):183–238, 1993.
- [29] R Tyrrell Rockafellar and Stanislav Uryasev. Optimization of conditional value-at-risk. Journal of Risk, 2(3):21–42, 2000.
- [30] R Tyrrell Rockafellar and Roger JB Wets. Variational Analysis, volume 317. Springer Science & Business Media, New York, 2009.
- [31] Johannes O Royset. Approximations of semicontinuous functions with applications to stochastic optimization and statistical estimation. Mathematical Programming, 184(1):289–318, 2020.
- [32] Johannes O Royset. Consistent approximations in composite optimization. Mathematical Programming, 201(1–2):339–372, 2022.
- [33] Alexander Shapiro, Darinka Dentcheva, and Andrzej Ruszczyński. Lectures on Stochastic Programming: Modeling and Theory (Third Edition). Society for Industrial and Applied Mathematics, Philadelphia, 2021.
- [34] Wim van Ackooij. A discussion of probability functions and constraints from a variational perspective. Set-Valued and Variational Analysis, 28(4):585–609, 2020.
- [35] Yao Yao, Qihang Lin, and Tianbao Yang. Large-scale optimization of partial AUC in a range of false positive rates. Advances in Neural Information Processing Systems, 35:31239–31253, 2022.
Appendix A. Proofs of Proposition 2 and Proposition 3.
Proof of Proposition 2.
(a) We first generalize the convergence result for the classical Moreau envelopes when (see, e.g., [30, Theorem 1.25]) to the partial Moreau envelopes. Fixing any , we consider the function with
Notice that . It is easy to verify that is proper and lsc based on our assumptions. Under the assumption that is bounded from below on , we can also show by contradiction that is level-bounded in locally uniformly in . Consequently, it follows from [30, Theorem 1.17] that for any fixed and each is lsc.
Hence, is a direct consequence of [30, Proposition 7.4(d)], since for all and is lsc. If , then is continuous, and thus by [30, Proposition 7.4(c)-(d)]. This completes the proof of (a).
(b) For any ,
where follows from the convexity of for any and Danskin's Theorem [11, Theorem 2.1]; is due to the optimality condition for ; is obtained by arguments similar to those in the proof of Theorem 1(b), using our assumption that is bounded from below on ; and uses the outer semicontinuity of and at [20, Lemma 5]. Therefore, for any , . Moreover, by the local boundedness of the mappings and at [20, Lemma 5], it follows from [30, Example 4.22] that . ∎
Proof of Proposition 3.
(a) Note that for any , is well-defined and takes finite value due to . Since follows a continuous distribution for any , we know from [29, Theorem 1] and [1] that CVaR has the following equivalent representations:
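Up to the paper's sign conventions (the display below is our reconstruction of the standard forms), these representations read:
\[
\operatorname{CVaR}_\alpha(Z) \;=\; \min_{t \in \mathbb{R}}\left\{\, t + \frac{1}{1-\alpha}\,\mathbb{E}\big[(Z - t)_+\big] \right\} \;=\; \frac{1}{1-\alpha}\int_\alpha^1 \operatorname{VaR}_\beta(Z)\, d\beta,
\]
with the minimum attained at $t = \operatorname{VaR}_\alpha(Z)$ for a continuously distributed $Z$; see [29, Theorem 1] and [1].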
Moreover, is convex by the convexity of for any fixed (cf. [29, Theorem 2]). Therefore, both and defined in (7) are convex. By the definitions of and , we have
Note that is nondecreasing as a function of for any fixed . Namely,
Thus, for any and . Since as a function of on is left-continuous, it follows that for all . Observe that
Based on our assumptions and [34, Proposition 2.2], for any , the probability function is lsc, which implies the closedness of the level set for any . Hence, is lsc for any given and is continuous if is further assumed to be continuous. Then (a) is a direct consequence of [30, Proposition 7.4(c-d)] by the monotonicity and the continuity of .
(b) We use to denote the space of all random variables with . According to [33, Example 6.19], the function is subdifferentiable (see [33, (9.281)] for the definition). Consider any fixed . Given that is a continuous random variable in , it follows from [33, (6.81)] that the subdifferential of at is:
(43)
We would like to mention that the event has zero probability and, thus, for every random variable . Let denote the probability measure associated with . By using [33, Theorem 6.14], we obtain the subdifferential of the convex function at :
(44)
To see , it suffices to show that, for any two elements and of the set , we have
(45)
To this end, we take any measurable selection . By the assumption that for all and , it holds that since subgradients of a convex function are uniformly bounded in norm by the Lipschitz constant. Consequently, both and are integrable as and for any by (43) and by our assumption. Observing that almost surely, we can conclude from [17, Proposition 2.23] that . This completes the proof of (45).
Next, we will explain why the closure can be removed in (44). By the convexity of for any fixed and the existence of a measurable function , it follows from [12, Theorem 2.7.2] that
where the right-hand side is the subdifferential of a convex function and is therefore a closed set. Hence, we can omit the closure and obtain the equation in (44).
Now we use the expression of to characterize . For any , taking any and , we have
with
Since the event has zero probability, we have
By the definition of the approximate subdifferential, the proof is then completed. ∎
Appendix B. Proof of Proposition 4.
We start with the chain rules for and , where the inner function is merely lsc. These results extend the nonlinear rescaling [30, Proposition 10.19(b)] to the case where may lack the strictly increasing property at a given point. One can also derive the same results through a general chain rule of the coderivative for composite set-valued mappings [22, Theorem 5.1]. However, to avoid the complicated computations that accompany the introduction of the coderivative, we give a more direct, self-contained proof below. To prepare for the chain rules, we need a technical lemma about the proximal normal cone.
Lemma 3.
Let be a lsc function. For , it holds that
Proof of Lemma 3.
Fix . By [30, Example 6.16], we have
where is the projection operator. For any with , we have
which can be equivalently written as
Then restrict the feasible region of the above problem to a subset . Since is still a feasible point in this subset, we have . Hence, implies . Using this result and the expression of , we conclude that for . ∎
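For completeness, the projection characterization of proximal normals used above can be stated as follows (cf. [30, Example 6.16], in generic notation): for a closed set $C$ and $\bar{x} \in C$,
\[
v \ \text{is a proximal normal to } C \text{ at } \bar{x} \quad\Longleftrightarrow\quad \exists\, \tau > 0 \ \text{such that}\ \bar{x} \in P_C(\bar{x} + \tau v),
\]
where $P_C$ denotes the (possibly set-valued) Euclidean projection onto $C$.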
We present the chain rules with a self-contained proof in the following lemma.
Lemma 4 (chain rules for the limiting subdifferential).
Let be proper, lsc, convex, and nondecreasing with , and be lsc. Consider . If the only scalar with is , then
Proof of Lemma 4.
The basic idea is to rewrite as a parametric minimization problem and apply [30, Theorem 10.13]. Note that for . Define the corresponding set of optimal solutions as for any . Then, we have and for any . By our assumptions, it is clear that for some . Since and is lsc, it is easy to verify that is proper, lsc, and level-bounded in locally uniformly in . We then apply [30, Theorem 10.13] to obtain
(46)
Case 1. If is a singleton , we can characterize by using the result in [30, Theorem 8.9]. Since and , it follows from our assumption that either or . Hence, based on the characterization of , (47) is satisfied.
Case 2. Otherwise, there exists such that since is lsc, nondecreasing and . Thus, from (46),
(48)
Let and . In the following, we characterize and verify (47) separately for and .
Case 2.1. For any , we first prove the inclusion:
(49)
Observe that for any , it holds that
(50)
where the first inclusion is because any normal vector is a limit of proximal normals at nearby points [30, Exercise 6.18]; the second one uses Lemma 3; the last inclusion follows from the fact that the proximal normal cone is a subset of the limiting normal cone [30, Example 6.16]. Based on the result of [30, Theorem 8.9] that
we conclude that for any . For any with , there exist , with . Then .
To prove (49), it remains to show that whenever . It follows from (50) that is a limit of proximal normals of at for some sequence . (i) First consider the case with . Following the argument in the proof of [30, Theorem 8.9], we can derive . Therefore,
where the first inclusion is due to the definition of the horizon subdifferential, and the last inclusion follows from a standard diagonal extraction procedure. (ii) In the other case, we have with and for all . It is easy to see . This establishes inclusion (49). Since , we have , and our assumption implies that is the unique solution satisfying with . Combining this with (49), we immediately obtain (47).
Case 2.2. For any , consider any sequence converging to . Then for all sufficiently large since . It is easy to see that , which gives due to [30, Exercise 6.18]. Following a pattern similar to the final part of Case 2.1, we obtain, for any ,
(51)
In this case, (47) holds trivially. Hence, we have verified (47) in both Cases 1 and 2.
Step 2: Based on (47) established in Step 1, we can now apply the sum rule [30, Corollary 10.9] for to obtain
(52)
which completes the proof. ∎
Equipped with the chain rules, we are now ready to prove Proposition 4.
Proof of Proposition 4.
Let be any feasible point, i.e., . Suppose for contradiction that (15) does not hold at . Thus, there exist , and an index set such that and for all . Take an arbitrary nonzero scalar for all . Let be any accumulation point of the unit scalars . Then, we have and , contradicting Assumption 5. This proves condition (15).
For any fixed , let for any in Assumption 5. Then the only scalar with is , which completes the proof of (16).
To derive the constraint qualification (17), we consider two cases.
Case 1. For , we have due to and for any by Theorem 1(a). Together with Assumption 5, we deduce that the only scalar with is . From this condition and the local Lipschitz continuity of for , we can apply the chain rule [30, Theorem 10.49] to get
(53)
Case 2. For , to utilize the chain rules (Lemma 4) for , we must first confirm the validity of the condition:
(54)
Indeed, it suffices to consider the case of or for some , because the statement holds trivially when is real-valued. For any element , there exist with and . Since , we must have for all sufficiently large k, i.e., , and is bounded from above due to or . The sequence is also bounded from below since is lsc as a consequence of . Then, we can assume that the bounded sequence converges to some . Note that due to . Thus, by the outer semicontinuity, . By , each can be expressed as the limit of a sequence with for any fixed . Using a standard diagonal extraction procedure, one can extract a subsequence with . Hence, and
(55)
Using the subdifferential relationships in Theorem 1 and the outer semicontinuity of , we have
(56)
By (55), (56) and Assumption 5, we immediately get (54). Thus, we can apply the chain rule in Lemma 4, and use (55), (56) again to obtain
(57)
For the last inclusion, we use by Theorem 1(a) together with a standard diagonal extraction procedure. Combining inclusions (53) and (57) from the two cases with Assumption 5, we derive (17) and complete the proof. ∎