Fused Lasso Nearly Isotonic Signal Approximation in General Dimensions
Vladimir Pastukhov
Department of Computer Science and Engineering,
Chalmers, Sweden
Email: [email protected]
Abstract
In this paper we introduce and study the fused lasso nearly-isotonic signal approximation, which is a combination of the fused lasso and generalized nearly-isotonic regression. We show how these three estimators relate to each other, derive the solution to the general problem, show that it is computationally feasible, and show that it provides a trade-off between piecewise monotonicity, sparsity and goodness-of-fit. Also, we derive an unbiased estimator of the degrees of freedom of the approximator.
1 Introduction
This work is motivated by recent papers in nearly-constrained estimation and papers in penalized least squares regression. The subject of penalized estimators starts with $\ell_1$-penalisation (Tibshirani, 1996), i.e.
$$\hat{\beta}^{lasso}(\lambda_L) = \arg\min_{\beta \in \mathbb{R}^{n}} \frac{1}{2}\sum_{i=1}^{n}(y_i - \beta_i)^{2} + \lambda_L \sum_{i=1}^{n}|\beta_i|,$$
which is called the lasso signal approximation. The $\ell_2$-penalisation is usually addressed as ridge regression (Hoerl & Kennard, 1970) or sometimes as Tikhonov-Phillips regularization (Phillips, 1962; Tikhonov et al., 1995).
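For intuition, in the signal-approximation setting both penalties admit closed-form solutions: the $\ell_1$ penalty leads to componentwise soft thresholding and the $\ell_2$ penalty to proportional shrinkage. A minimal illustrative sketch in Python, assuming the objectives $\frac{1}{2}\|y - \beta\|_2^2 + \lambda\|\beta\|_1$ and $\frac{1}{2}\|y - \beta\|_2^2 + \frac{\lambda}{2}\|\beta\|_2^2$, respectively:

```python
import numpy as np

def lasso_signal(y, lam):
    """Lasso signal approximation: componentwise soft thresholding of y."""
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

def ridge_signal(y, lam):
    """Ridge (l2-penalised) signal approximation: componentwise shrinkage of y."""
    return y / (1.0 + lam)

y = np.array([2.5, -0.3, 0.8, -1.7])
print(lasso_signal(y, lam=0.5))   # sparse: small entries are set exactly to zero
print(ridge_signal(y, lam=0.5))   # shrunk towards zero, but never exactly zero
```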
First, to set up the ideas and for simplicity of notation we consider one dimensional cases of the penalized estimators. In the next subsection we generalise these estimators to the case of isotonic constraints with respect to a general partial order.
For a given sequence of data points $y \in \mathbb{R}^{n}$ the fusion approximator (cf. Rinaldo (2009)) is given by
$$\hat{\beta}^{F}(\lambda_F) = \arg\min_{\beta \in \mathbb{R}^{n}} \frac{1}{2}\sum_{i=1}^{n}(y_i - \beta_i)^{2} + \lambda_F \sum_{i=1}^{n-1}|\beta_i - \beta_{i+1}|. \qquad (1)$$
Further, the combination of the fusion estimator and the lasso is called the fused lasso estimator and is given by
$$\hat{\beta}^{FL}(\lambda_F, \lambda_L) = \arg\min_{\beta \in \mathbb{R}^{n}} \frac{1}{2}\sum_{i=1}^{n}(y_i - \beta_i)^{2} + \lambda_F \sum_{i=1}^{n-1}|\beta_i - \beta_{i+1}| + \lambda_L \sum_{i=1}^{n}|\beta_i|. \qquad (2)$$
The fused lasso was introduced in Tibshirani et al. (2005) and its asymptotic properties were studied in detail in Rinaldo (2009).
Remark 1
In the paper Tibshirani & Taylor (2011) the estimator in (1) is called the fused lasso, while the estimator in (2) is addressed as the sparse fused lasso.
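As a numerical illustration, the one-dimensional problems (1)-(2) can be solved with a generic convex solver; the following Python sketch uses the cvxpy package, with arbitrary data and penalty levels chosen purely for illustration.

```python
import cvxpy as cp
import numpy as np

def fused_lasso_1d(y, lam_f, lam_l):
    """Solve the 1-d fused lasso (2); lam_l = 0 gives the fusion approximator (1)."""
    beta = cp.Variable(len(y))
    objective = (0.5 * cp.sum_squares(y - beta)
                 + lam_f * cp.sum(cp.abs(cp.diff(beta)))   # fusion penalty
                 + lam_l * cp.norm1(beta))                  # lasso penalty
    cp.Problem(cp.Minimize(objective)).solve()
    return beta.value

y = np.array([0.1, 0.2, 2.1, 1.9, 2.0, -0.1])
print(np.round(fused_lasso_1d(y, lam_f=0.5, lam_l=0.1), 3))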
In the area of constrained inference the basic and simplest problem is the isotonic regression in one dimension. For a given sequence of data points $y \in \mathbb{R}^{n}$ the isotonic regression is the following approximation
$$\hat{\beta}^{I} = \arg\min_{\beta \in \mathbb{R}^{n}} \sum_{i=1}^{n}(y_i - \beta_i)^{2}, \quad \text{subject to} \quad \beta_1 \le \beta_2 \le \dots \le \beta_n, \qquad (3)$$
i.e. the isotonic regression is the $\ell_2$-projection of the vector $y$ onto the set of non-decreasing vectors in $\mathbb{R}^{n}$. The notion of isotonic "regression" in this context might be confusing. Nevertheless, it is a standard notion in this subject, cf., for example, the papers Best & Chakravarti (1990); Stout (2013), where the term "isotonic regression" is used for the isotonic projection of a general vector. Also, in this paper we use the terms "regression", "estimator" and "approximator" interchangeably.
The nearly-isotonic regression, introduced in Tibshirani et al. (2011) and studied in detail in Minami (2020), is a less restrictive version of the isotonic regression and is given by the following optimization problem
$$\hat{\beta}^{NI}(\lambda_{NI}) = \arg\min_{\beta \in \mathbb{R}^{n}} \frac{1}{2}\sum_{i=1}^{n}(y_i - \beta_i)^{2} + \lambda_{NI} \sum_{i=1}^{n-1}(\beta_i - \beta_{i+1})_+, \qquad (4)$$
where $x_+ = \max(x, 0)$.
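A minimal illustrative sketch of (4) along the same lines: for small $\lambda_{NI}$ some monotonicity violations remain, while as $\lambda_{NI}$ grows the fit approaches the isotonic regression (3), here computed with scikit-learn for comparison.

```python
import cvxpy as cp
import numpy as np
from sklearn.isotonic import IsotonicRegression

def nearly_isotonic_1d(y, lam_ni):
    """Nearly-isotonic regression (4): penalise only downward jumps beta_i > beta_{i+1}."""
    beta = cp.Variable(len(y))
    objective = (0.5 * cp.sum_squares(y - beta)
                 + lam_ni * cp.sum(cp.pos(-cp.diff(beta))))  # (beta_i - beta_{i+1})_+
    cp.Problem(cp.Minimize(objective)).solve()
    return beta.value

y = np.array([1.0, 0.4, 1.5, 1.2, 2.3])
iso = IsotonicRegression(increasing=True).fit_transform(np.arange(len(y)), y)
print(np.round(nearly_isotonic_1d(y, lam_ni=0.1), 3))    # mild penalisation: violations remain
print(np.round(nearly_isotonic_1d(y, lam_ni=100.0), 3))  # heavy penalisation: close to isotonic
print(np.round(iso, 3))                                  # isotonic regression (3)
```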
In this paper we combine the fused lasso estimator with nearly-isotonic regression and call the resulting estimator the fused lasso nearly-isotonic signal approximator, i.e. for a given sequence of data points $y \in \mathbb{R}^{n}$ the problem in the one-dimensional case is the following optimization
$$\hat{\beta}^{FLNI}(\lambda_F, \lambda_L, \lambda_{NI}) = \arg\min_{\beta \in \mathbb{R}^{n}} \frac{1}{2}\sum_{i=1}^{n}(y_i - \beta_i)^{2} + \lambda_F \sum_{i=1}^{n-1}|\beta_i - \beta_{i+1}| + \lambda_L \sum_{i=1}^{n}|\beta_i| + \lambda_{NI} \sum_{i=1}^{n-1}(\beta_i - \beta_{i+1})_+. \qquad (5)$$
Also, in the case of $\lambda_L = 0$, with $\lambda_F > 0$ and $\lambda_{NI} > 0$, we call the estimator the fused nearly-isotonic regression, i.e.
$$\hat{\beta}^{FNI}(\lambda_F, \lambda_{NI}) = \arg\min_{\beta \in \mathbb{R}^{n}} \frac{1}{2}\sum_{i=1}^{n}(y_i - \beta_i)^{2} + \lambda_F \sum_{i=1}^{n-1}|\beta_i - \beta_{i+1}| + \lambda_{NI} \sum_{i=1}^{n-1}(\beta_i - \beta_{i+1})_+. \qquad (6)$$
The generalisation of nearly-isotonic regression in (6) was proposed in the conclusion of the paper Tibshirani et al. (2011).
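The one-dimensional problem (5) combines all three penalties, and the estimators (1)-(4) and (6) are its special cases. An illustrative sketch, again with arbitrary penalty levels:

```python
import cvxpy as cp
import numpy as np

def flni_1d(y, lam_f, lam_l, lam_ni):
    """Fused lasso nearly-isotonic signal approximator (5) in one dimension.

    Special cases: lam_l = lam_ni = 0 gives (1); lam_ni = 0 gives (2);
    lam_f = lam_l = 0 gives (4); lam_l = 0 gives (6).
    """
    beta = cp.Variable(len(y))
    diffs = cp.diff(beta)                       # beta_{i+1} - beta_i
    objective = (0.5 * cp.sum_squares(y - beta)
                 + lam_f * cp.sum(cp.abs(diffs))
                 + lam_l * cp.norm1(beta)
                 + lam_ni * cp.sum(cp.pos(-diffs)))
    cp.Problem(cp.Minimize(objective)).solve()
    return beta.value

y = np.array([0.2, 0.1, 1.2, 1.1, 0.9, 2.2, 2.0])
print(np.round(flni_1d(y, lam_f=0.3, lam_l=0.1, lam_ni=0.5), 3))
```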
We will state the problem in (5) for the case of isotonic constraints with respect to a general partial order, but, first, we have to introduce the notation.
1.1 Notation
We start with basic definitions of partial orders and isotonic regression. Let $\mathcal{I}$ be some index set. Next, we define the following binary relation $\preceq$ on $\mathcal{I}$.
A binary relation $\preceq$ on $\mathcal{I}$ is called a partial order if
• it is reflexive, i.e. $i \preceq i$ for all $i \in \mathcal{I}$;
• it is transitive, i.e. $i_1, i_2, i_3 \in \mathcal{I}$, $i_1 \preceq i_2$ and $i_2 \preceq i_3$ imply $i_1 \preceq i_3$;
• it is antisymmetric, i.e. $i_1, i_2 \in \mathcal{I}$, $i_1 \preceq i_2$ and $i_2 \preceq i_1$ imply $i_1 = i_2$.
Further, a vector $\beta \in \mathbb{R}^{n}$ indexed by $\mathcal{I}$ is called isotonic with respect to the partial order $\preceq$ on $\mathcal{I}$ if $i \preceq j$ implies $\beta_i \le \beta_j$. We denote the set of all isotonic vectors in $\mathbb{R}^{n}$ with respect to the partial order $\preceq$ on $\mathcal{I}$ by $\mathcal{B}^{is}$, which is also called the isotonic cone. Next, a vector $\hat{\beta}^{I} \in \mathbb{R}^{n}$ is the isotonic regression of an arbitrary vector $y \in \mathbb{R}^{n}$ over the partially ordered index set $\mathcal{I}$ if
$$\hat{\beta}^{I} = \arg\min_{\beta \in \mathcal{B}^{is}} \sum_{i \in \mathcal{I}}(y_i - \beta_i)^{2}. \qquad (7)$$
For any partial order relation $\preceq$ on $\mathcal{I}$ there exists a directed graph $G = (V, E)$, with $V = \mathcal{I}$ and
$$E = \{(i, j): \ i, j \in \mathcal{I}, \ i \preceq j, \ i \ne j, \ \text{and there is no } k \in \mathcal{I} \setminus \{i, j\} \text{ such that } i \preceq k \preceq j\}, \qquad (8)$$
such that an arbitrary vector $\beta \in \mathbb{R}^{n}$ is isotonic with respect to $\preceq$ iff $\beta_i \le \beta_j$ for any $(i, j) \in E$. Therefore, equivalently to the definition in (7), a vector $\hat{\beta}^{I} \in \mathbb{R}^{n}$ is the isotonic regression of an arbitrary vector $y \in \mathbb{R}^{n}$ indexed by the partially ordered index set $\mathcal{I}$ if
$$\hat{\beta}^{I} = \arg\min_{\beta \in \mathbb{R}^{n}} \sum_{i \in \mathcal{I}}(y_i - \beta_i)^{2}, \quad \text{subject to} \quad \beta_i \le \beta_j \ \text{for all} \ (i, j) \in E. \qquad (9)$$
Further, for the directed graph $G = (V, E)$, which corresponds to the partial order $\preceq$ on $\mathcal{I}$, the nearly-isotonic regression of $y \in \mathbb{R}^{n}$ indexed by $\mathcal{I}$ is given by
$$\hat{\beta}^{NI}(\lambda_{NI}) = \arg\min_{\beta \in \mathbb{R}^{n}} \frac{1}{2}\sum_{i \in V}(y_i - \beta_i)^{2} + \lambda_{NI}\sum_{(i, j) \in E}(\beta_i - \beta_j)_+. \qquad (10)$$
This generalisation of nearly-isotonic regression was introduced and studied in Minami (2020).
Next, the fusion and fused lasso approximators for a general directed graph $G = (V, E)$ are given by
$$\hat{\beta}^{F}(\lambda_F) = \arg\min_{\beta \in \mathbb{R}^{n}} \frac{1}{2}\sum_{i \in V}(y_i - \beta_i)^{2} + \lambda_F\sum_{(i, j) \in E}|\beta_i - \beta_j| \qquad (11)$$
and
$$\hat{\beta}^{FL}(\lambda_F, \lambda_L) = \arg\min_{\beta \in \mathbb{R}^{n}} \frac{1}{2}\sum_{i \in V}(y_i - \beta_i)^{2} + \lambda_F\sum_{(i, j) \in E}|\beta_i - \beta_j| + \lambda_L\sum_{i \in V}|\beta_i|. \qquad (12)$$
These optimization problems were introduced and solved for a general directed graph in Tibshirani & Taylor (2011).
Further, let $D$ denote the oriented incidence matrix of the directed graph $G = (V, E)$, corresponding to $\preceq$ on $\mathcal{I}$. We choose the orientation of $D$ in the following way. Assume that the graph $G$ with $n$ vertices has $m$ edges. Next, assume we label the vertices by $\{1, \dots, n\}$ and the edges by $\{1, \dots, m\}$. Then $D$ is the $m \times n$ matrix with
$$D_{l, k} = \begin{cases} 1, & \text{if vertex } k \text{ is the source of edge } e_l, \\ -1, & \text{if vertex } k \text{ is the target of edge } e_l, \\ 0, & \text{otherwise}, \end{cases}$$
so that $(D\beta)_l = \beta_i - \beta_j$ for the edge $e_l = (i, j)$.
In order to clarify the notation we consider the following examples of partial order relations. First, let us consider the monotonic order relation in the one-dimensional case. Let $\mathcal{I} = \{1, \dots, n\}$, and for $i, j \in \mathcal{I}$ we naturally define $i \preceq j$ if $i \le j$. Further, if we let $V = \mathcal{I}$ and $E = \{(i, i+1): \ i = 1, \dots, n-1\}$, then $G = (V, E)$ is the directed graph which corresponds to the one-dimensional order relation on $\mathcal{I}$. Figure 1 displays the graph and the incidence matrix for the graph.
Figure 1: (a) Graph for monotonic constraints; (b) oriented incidence matrix.
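For illustration, the matrix $D$ for this one-dimensional chain can be constructed as follows; the sign convention (+1 on the source and -1 on the target of each edge) is the one assumed above.

```python
import numpy as np

def chain_incidence(n):
    """Oriented incidence matrix D for the 1-d monotone order: row l has +1 at the
    source i and -1 at the target i+1 of edge e_l = (i, i+1), so (D b)_l = b_i - b_{i+1}."""
    D = np.zeros((n - 1, n))
    for l in range(n - 1):
        D[l, l] = 1.0
        D[l, l + 1] = -1.0
    return D

print(chain_incidence(4).astype(int))
```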
Next, we consider the two-dimensional case with bimonotonic constraints. The notion of bimonotonicity was first introduced in Beran & Dümbgen (2010) and it means the following. Let us consider the index set
$$\mathcal{I} = \{(i, j): \ i = 1, \dots, n_1, \ j = 1, \dots, n_2\}$$
with the following order relation $\preceq$ on it: for $(i_1, j_1), (i_2, j_2) \in \mathcal{I}$ we have $(i_1, j_1) \preceq (i_2, j_2)$ iff $i_1 \le i_2$ and $j_1 \le j_2$. Then, a vector $\beta \in \mathbb{R}^{n}$, with $n = n_1 n_2$, indexed by $\mathcal{I}$ is called bimonotone if it is isotonic with respect to the bimonotone order $\preceq$ defined on its index set $\mathcal{I}$. Further, we define the directed graph $G = (V, E)$ with vertices $V = \mathcal{I}$ and the edges
$$E = \{((i, j), (i+1, j)): \ 1 \le i \le n_1 - 1, \ 1 \le j \le n_2\} \cup \{((i, j), (i, j+1)): \ 1 \le i \le n_1, \ 1 \le j \le n_2 - 1\}.$$
The labeled graph and its incidence matrix are displayed in Figure 2.
Figure 2: (a) Graph for bimonotonic constraints; (b) oriented incidence matrix.
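A sketch of the corresponding construction for the bimonotone grid; the row-wise enumeration of the vertices used here is an arbitrary labelling choice introduced for illustration and need not coincide with the labelling of Figure 2.

```python
import numpy as np

def grid_incidence(n1, n2):
    """Oriented incidence matrix for the bimonotone order on an n1 x n2 grid.

    Vertices (i, j) are enumerated row-wise as k = i * n2 + j (0-based); every edge
    goes from (i, j) to its right or lower neighbour, with +1 on the source and -1
    on the target, matching the convention used for the one-dimensional chain."""
    edges = []
    for i in range(n1):
        for j in range(n2):
            if i + 1 < n1:
                edges.append((i * n2 + j, (i + 1) * n2 + j))
            if j + 1 < n2:
                edges.append((i * n2 + j, i * n2 + j + 1))
    D = np.zeros((len(edges), n1 * n2))
    for l, (src, tgt) in enumerate(edges):
        D[l, src] = 1.0
        D[l, tgt] = -1.0
    return D

print(grid_incidence(2, 3).astype(int))
```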
1.2 General statement of the problem
Now we can state the general problem studied in this paper. Let $y \in \mathbb{R}^{n}$ be a signal indexed by the index set $\mathcal{I}$ with the partial order relation $\preceq$ defined on $\mathcal{I}$. Next, let $G = (V, E)$ be the directed graph corresponding to $\preceq$ on $\mathcal{I}$. The fused lasso nearly-isotonic signal approximation with respect to $\preceq$ on $\mathcal{I}$ (or, equivalently, to the directed graph $G$, corresponding to $\preceq$) is given by
$$\hat{\beta}^{FLNI}(\lambda_F, \lambda_L, \lambda_{NI}) = \arg\min_{\beta \in \mathbb{R}^{n}} \frac{1}{2}\sum_{i \in V}(y_i - \beta_i)^{2} + \lambda_F\sum_{(i, j) \in E}|\beta_i - \beta_j| + \lambda_L\sum_{i \in V}|\beta_i| + \lambda_{NI}\sum_{(i, j) \in E}(\beta_i - \beta_j)_+. \qquad (13)$$
Therefore, the estimator in (13) is a combination of the estimators in (10) and (12).
Equivalently, we can rewrite the problem in the following way:
$$\hat{\beta}^{FLNI}(\lambda_F, \lambda_L, \lambda_{NI}) = \arg\min_{\beta \in \mathbb{R}^{n}} \frac{1}{2}\|y - \beta\|_2^{2} + \lambda_F\|D\beta\|_1 + \lambda_L\|\beta\|_1 + \lambda_{NI}\mathbf{1}^{T}(D\beta)_+, \qquad (14)$$
where $D$ is the oriented incidence matrix of the graph $G$ and $(D\beta)_+$ denotes the componentwise positive part of $D\beta$.
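As an illustrative sketch of the matrix form (14), the following code solves the problem for an arbitrary oriented incidence matrix $D$; the data and penalty levels are arbitrary, and the example uses the one-dimensional chain, but the incidence matrix of the bimonotone grid (or any other partial order) can be passed in the same way.

```python
import cvxpy as cp
import numpy as np

def flni_graph(y, D, lam_f, lam_l, lam_ni):
    """Fused lasso nearly-isotonic approximator in the matrix form (14)."""
    beta = cp.Variable(len(y))
    Db = D @ beta
    objective = (0.5 * cp.sum_squares(y - beta)
                 + lam_f * cp.norm1(Db)           # fusion penalty over the edges of G
                 + lam_l * cp.norm1(beta)         # sparsity (lasso) penalty
                 + lam_ni * cp.sum(cp.pos(Db)))   # nearly-isotonic penalty 1^T (D beta)_+
    cp.Problem(cp.Minimize(objective)).solve()
    return beta.value

# Example on the one-dimensional chain of length 5 (the matrix D of Figure 1).
n = 5
D = np.eye(n - 1, n) - np.eye(n - 1, n, k=1)
y = np.array([0.2, 0.1, 1.3, 1.1, 1.2])
print(np.round(flni_graph(y, D, lam_f=0.2, lam_l=0.05, lam_ni=0.5), 3))
```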
Analogously to the definition in the one-dimensional case, if $\lambda_L = 0$ we call the estimator the fused nearly-isotonic approximator and denote it by $\hat{\beta}^{FNI}(\lambda_F, \lambda_{NI})$.
2 The solution to the fused lasso nearly-isotonic signal approximator
First, we consider the fused nearly-isotonic regression, i.e. in (14) we assume that $\lambda_L = 0$.
Theorem 1
For a fixed data vector $y \in \mathbb{R}^{n}$ indexed by the index set $\mathcal{I}$ with the partial order relation $\preceq$ defined on $\mathcal{I}$, the solution to the fused nearly-isotonic problem in (14) with $\lambda_L = 0$ is given by
$$\hat{\beta}^{FNI}(\lambda_F, \lambda_{NI}) = y - D^{T}\hat{\nu}, \qquad (15)$$
with
$$\hat{\nu} = \arg\min_{\nu \in \mathbb{R}^{m}} \frac{1}{2}\|y - D^{T}\nu\|_2^{2}, \quad \text{subject to} \quad -\lambda_F\mathbf{1} \le \nu \le (\lambda_F + \lambda_{NI})\mathbf{1}, \qquad (16)$$
where $D$ is the incidence matrix of the directed graph $G = (V, E)$, corresponding to $\preceq$ on $\mathcal{I}$, $\mathbf{1}$ is the vector whose elements are all equal to $1$, and the notation $a \le b$ for vectors $a, b \in \mathbb{R}^{m}$ means $a_l \le b_l$ for all $l = 1, \dots, m$.
Proof.
First, following the derivations of trend filtering and the generalised lasso in Kim et al. (2009) and Tibshirani & Taylor (2011), respectively, we can write the optimization problem in (6) in the following way
$$\min_{\beta \in \mathbb{R}^{n}, \, z \in \mathbb{R}^{m}} \frac{1}{2}\|y - \beta\|_2^{2} + \lambda_F\|z\|_1 + \lambda_{NI}\mathbf{1}^{T}z_+, \quad \text{subject to} \quad z = D\beta.$$
Further, the Lagrangian is given by
$$L(\beta, z, \nu) = \frac{1}{2}\|y - \beta\|_2^{2} + \lambda_F\|z\|_1 + \lambda_{NI}\mathbf{1}^{T}z_+ + \nu^{T}(D\beta - z), \qquad (17)$$
where $\nu \in \mathbb{R}^{m}$ is a dual variable.
Note that
$$\min_{\beta \in \mathbb{R}^{n}}\Big(\frac{1}{2}\|y - \beta\|_2^{2} + \nu^{T}D\beta\Big) = \frac{1}{2}\|y\|_2^{2} - \frac{1}{2}\|y - D^{T}\nu\|_2^{2},$$
with the minimum attained at $\beta = y - D^{T}\nu$, and
$$\min_{z \in \mathbb{R}^{m}}\big(\lambda_F\|z\|_1 + \lambda_{NI}\mathbf{1}^{T}z_+ - \nu^{T}z\big) = \begin{cases} 0, & \text{if } -\lambda_F\mathbf{1} \le \nu \le (\lambda_F + \lambda_{NI})\mathbf{1}, \\ -\infty, & \text{otherwise}. \end{cases}$$
Next, the dual function is given by
$$g(\nu) = \min_{\beta, z} L(\beta, z, \nu) = \begin{cases} \frac{1}{2}\|y\|_2^{2} - \frac{1}{2}\|y - D^{T}\nu\|_2^{2}, & \text{if } -\lambda_F\mathbf{1} \le \nu \le (\lambda_F + \lambda_{NI})\mathbf{1}, \\ -\infty, & \text{otherwise}, \end{cases}$$
and, therefore, the dual problem is
$$\hat{\nu} = \arg\max_{\nu \in \mathbb{R}^{m}} g(\nu), \quad \text{subject to} \quad -\lambda_F\mathbf{1} \le \nu \le (\lambda_F + \lambda_{NI})\mathbf{1},$$
which is equivalent to
$$\hat{\nu} = \arg\min_{\nu \in \mathbb{R}^{m}} \frac{1}{2}\|y - D^{T}\nu\|_2^{2}, \quad \text{subject to} \quad -\lambda_F\mathbf{1} \le \nu \le (\lambda_F + \lambda_{NI})\mathbf{1}.$$
Lastly, taking the first derivative of the Lagrangian with respect to $\beta$ we get the following relation between $\hat{\beta}^{FNI}$ and $\hat{\nu}$:
$$\hat{\beta}^{FNI}(\lambda_F, \lambda_{NI}) = y - D^{T}\hat{\nu},$$
which completes the proof.
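The dual problem (16) is an ordinary box-constrained least-squares problem, which makes the fused nearly-isotonic fit computationally straightforward. An illustrative sketch using SciPy's bounded least-squares solver, on the chain graph of Figure 1 with arbitrary data and penalty levels:

```python
import numpy as np
from scipy.optimize import lsq_linear

def fni_via_dual(y, D, lam_f, lam_ni):
    """Fused nearly-isotonic fit via the dual (15)-(16): box-constrained least squares."""
    m = D.shape[0]
    # min_nu 0.5 * ||y - D^T nu||^2  subject to  -lam_f <= nu_l <= lam_f + lam_ni.
    res = lsq_linear(D.T, y, bounds=(-lam_f * np.ones(m), (lam_f + lam_ni) * np.ones(m)))
    return y - D.T @ res.x

n = 6
D = np.eye(n - 1, n) - np.eye(n - 1, n, k=1)      # chain graph, as in Figure 1
y = np.array([1.0, 0.3, 0.5, 2.0, 1.8, 2.2])
print(np.round(fni_via_dual(y, D, lam_f=0.3, lam_ni=0.5), 3))
```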
Next, we provide the solution to the fused lasso nearly-isotonic regression.
Theorem 2
For a given vector $y \in \mathbb{R}^{n}$ indexed by $\mathcal{I}$ the solution to the fused lasso nearly-isotonic signal approximator is given by soft thresholding the fused nearly-isotonic regression $\hat{\beta}^{FNI}(\lambda_F, \lambda_{NI})$, i.e.
$$\hat{\beta}_i^{FLNI}(\lambda_F, \lambda_L, \lambda_{NI}) = \operatorname{sign}\big(\hat{\beta}_i^{FNI}(\lambda_F, \lambda_{NI})\big)\big(|\hat{\beta}_i^{FNI}(\lambda_F, \lambda_{NI})| - \lambda_L\big)_+, \qquad (18)$$
for $i \in \mathcal{I}$.
Proof.
The proof is similar to the derivation of the solution of the fused lasso in Friedman et al. (2007). Nevertheless, for completeness of the paper we provide the proof for the one-dimensional case.
The subgradient equations (which are necessary and sufficient conditions for the solution of (5)) for $\beta_i$, with $i = 1, \dots, n$, are
$$-(y_i - \beta_i) + \lambda_F(s_i - s_{i-1}) + \lambda_{NI}(t_i - t_{i-1}) + \lambda_L u_i = 0, \qquad (19)$$
where
$$s_i \in \begin{cases} \{\operatorname{sign}(\beta_i - \beta_{i+1})\}, & \beta_i \ne \beta_{i+1}, \\ [-1, 1], & \beta_i = \beta_{i+1}, \end{cases} \quad t_i \in \begin{cases} \{1\}, & \beta_i > \beta_{i+1}, \\ \{0\}, & \beta_i < \beta_{i+1}, \\ [0, 1], & \beta_i = \beta_{i+1}, \end{cases} \quad u_i \in \begin{cases} \{\operatorname{sign}(\beta_i)\}, & \beta_i \ne 0, \\ [-1, 1], & \beta_i = 0, \end{cases} \qquad (20)$$
with the convention $s_0 = s_n = t_0 = t_n = 0$.
Next, let , and denote the values of the parameters defined above at some value of . Note, the values of and are fixed. Therefore, if for we have
and for we can set .
Next, let denote the soft thresholding of , i.e.
The goal is to prove that provides the solution to (13).
Note, analogously to the proof for the fused lasso estimator in Lemma A.1 at Friedman et al. (2007), if either or , and or ,
then we also have or , respectively. Therefore, soft thresholding of does not change the ordering of these pairs and we have and . Next, if for some we have , then and , and, again, we can set , and .
Now let us insert into subgradient equations (19) and show that we can find , for all .
First, assume that for some we have . Then
Note, that
because . Therefore, if , then .
The proof for the case when is similar and one can show that if .
Second, assume that . Then, , and
Next, if we let , then, again, we have
Therefore, we have proved that .
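In practice Theorem 2 means that the additional $\ell_1$ penalty adds essentially no computational cost: once $\hat{\beta}^{FNI}$ is available, $\hat{\beta}^{FLNI}$ is obtained by componentwise soft thresholding, as in the following illustrative sketch.

```python
import numpy as np

def soft_threshold(x, lam):
    """Componentwise soft thresholding: sign(x) * (|x| - lam)_+."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

# Per Theorem 2, if beta_fni solves (6), then soft_threshold(beta_fni, lam_l) solves (5).
beta_fni = np.array([0.05, 0.05, 1.4, 1.4, -0.2])
print(soft_threshold(beta_fni, lam=0.1))
```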
3 Properties of the fused lasso nearly-isotonic signal approximator
We start with a proposition which shows how the solutions to the optimization problems (11), (10) and (14) are related to each other. This result will be used in the next section to derive the degrees of freedom of the fused lasso nearly-isotonic signal approximator.
Proposition 1
For a fixed data vector $y \in \mathbb{R}^{n}$ indexed by $\mathcal{I}$ and penalisation parameters $\lambda_F$, $\lambda_L$ and $\lambda_{NI}$ the following relations between the estimators $\hat{\beta}^{F}$, $\hat{\beta}^{NI}$, $\hat{\beta}^{FNI}$ and $\hat{\beta}^{FLNI}$ hold (with the data vector written explicitly as the first argument of each estimator):
$$\hat{\beta}^{NI}(y, \lambda_{NI}) = \hat{\beta}^{F}\Big(y - \frac{\lambda_{NI}}{2}D^{T}\mathbf{1}, \ \frac{\lambda_{NI}}{2}\Big), \qquad (21)$$
$$\hat{\beta}^{FNI}(y, \lambda_F, \lambda_{NI}) = \hat{\beta}^{F}\Big(y - \frac{\lambda_{NI}}{2}D^{T}\mathbf{1}, \ \lambda_F + \frac{\lambda_{NI}}{2}\Big) = \hat{\beta}^{NI}\big(y + \lambda_F D^{T}\mathbf{1}, \ 2\lambda_F + \lambda_{NI}\big), \qquad (22)$$
and
$$\hat{\beta}_i^{FLNI}(y, \lambda_F, \lambda_L, \lambda_{NI}) = \operatorname{sign}\big(\hat{\beta}_i^{FNI}(y, \lambda_F, \lambda_{NI})\big)\big(|\hat{\beta}_i^{FNI}(y, \lambda_F, \lambda_{NI})| - \lambda_L\big)_+, \quad i \in \mathcal{I}, \qquad (23)$$
where $D$ is the oriented incidence matrix of the graph $G = (V, E)$ corresponding to the partial order relation $\preceq$ on $\mathcal{I}$.
Proof.
First, note that $\lambda_{NI}\mathbf{1}^{T}(D\beta)_+ = \frac{\lambda_{NI}}{2}\|D\beta\|_1 + \frac{\lambda_{NI}}{2}\mathbf{1}^{T}D\beta$, and, therefore, completing the square in the objective function of (10), from Tibshirani et al. (2011) it follows that the solution to the nearly-isotonic problem is given by
$$\hat{\beta}^{NI}(y, \lambda_{NI}) = \hat{\beta}^{F}\Big(y - \frac{\lambda_{NI}}{2}D^{T}\mathbf{1}, \ \frac{\lambda_{NI}}{2}\Big).$$
The proof for the fused lasso nearly-isotonic estimator is the same with the change of variable in (15) and (16) for the proof of the first equality in (22) and with for the second equality.
Next, we prove the result for the case of fused lasso nearly-isotonic approximator. From Theorem 2 we have
Next, we prove that, analogously to the fused lasso and nearly-isotonic regression, as one of the penalization parameters increases the constant regions in the solution can only be joined together and not split apart. We prove this result only for the one-dimensional monotonic order, and the general case is an open question. This result could potentially be useful in future research on the solution path of the fused lasso nearly-isotonic approximator.
Proposition 2
Let $\mathcal{I} = \{1, \dots, n\}$ with the natural order defined on it. Next, let $\boldsymbol{\lambda}^{(1)} = (\lambda_F^{(1)}, \lambda_L^{(1)}, \lambda_{NI}^{(1)})$ and $\boldsymbol{\lambda}^{(2)} = (\lambda_F^{(2)}, \lambda_L^{(2)}, \lambda_{NI}^{(2)})$ be triples of penalisation parameters such that one of the elements of $\boldsymbol{\lambda}^{(2)}$ is greater than the corresponding element of $\boldsymbol{\lambda}^{(1)}$, while the two others are the same. Next, assume that for some $i$ the solution satisfies
$$\hat{\beta}_i^{FLNI}(\boldsymbol{\lambda}^{(1)}) = \hat{\beta}_{i+1}^{FLNI}(\boldsymbol{\lambda}^{(1)}).$$
Then for $\boldsymbol{\lambda}^{(2)}$ we have
$$\hat{\beta}_i^{FLNI}(\boldsymbol{\lambda}^{(2)}) = \hat{\beta}_{i+1}^{FLNI}(\boldsymbol{\lambda}^{(2)}).$$
Proof.
Case 1: $\lambda_F$ and $\lambda_{NI}$ are fixed and $\lambda_L$ increases.
The result of the proposition for this case follows directly from Theorem 2.
Case 2: and are fixed and .
Let us consider the fused nearly-isotonic regression and write the subgradient equations
where and , with , are defined in (20), and, analogously to the proof of Theorem 2, , denote the values of the parameters defined above at some value of .
Assume that for in the solution we have a following constant region
(24)
and in the similar way as in Tibshirani et al. (2011) for we consider the subset of the subgradient equations
(25)
with , and show that there exists the solution for which (24) holds, and .
Note first that as increases, (24) holds until the merge with other groups happens, which means that and will not change their values until the merge of this constant region. Also, as it follows from (20), for the value of is in . Therefore, without any violation of the restrictions on we can assume that for any .
Next, taking pairwise differences between subgradient equations for we have
and, since and provide the solution to the subgradient equations (25), then
(27)
Next, as pointed out in Friedman et al. (2007) and Tibshirani et al. (2011)
then, one can show that
(28)
Further, let us consider the case of . Then we have
Recall, above we set , and and does not change their values until the merge, which means that , and .
Therefore, the subgradient equations for can be written as
Next, since the term is not changed, , and
then we have
Therefore we proved that . Since is given by soft thresholding of , then for .
Case 3: and are fixed and . The proof for this case is virtually identical to the proof for the Case 2. In this case we assume that for any . Next, and do not change their values until the merge, which, again, means that , and . Finally, we can show that
4 Degrees of freedom
In this section we discuss the estimation of the degrees of freedom for the fused nearly-isotonic regression and the fused lasso nearly-isotonic signal approximator. Let us consider the following nonparametric model
$$y_i = \beta_i^{*} + \varepsilon_i, \quad i \in \mathcal{I},$$
where $\beta^{*} \in \mathbb{R}^{n}$ is the unknown signal, and the error terms $\varepsilon_i$ are independent, with $\varepsilon_i \sim \mathcal{N}(0, \sigma^{2})$.
The degrees of freedom is a measure of the complexity of an estimator, and, following Efron (1986), for the fixed values of $\lambda_F$, $\lambda_L$ and $\lambda_{NI}$ the degrees of freedom of $\hat{\beta}^{FNI}$ and $\hat{\beta}^{FLNI}$ are given by
$$df(\hat{\beta}^{FNI}) = \frac{1}{\sigma^{2}}\sum_{i \in \mathcal{I}} \operatorname{Cov}\big(\hat{\beta}_i^{FNI}, y_i\big) \qquad (29)$$
and
$$df(\hat{\beta}^{FLNI}) = \frac{1}{\sigma^{2}}\sum_{i \in \mathcal{I}} \operatorname{Cov}\big(\hat{\beta}_i^{FLNI}, y_i\big). \qquad (30)$$
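The covariance definitions (29)-(30) can be checked by simulation: repeatedly redrawing the noise, refitting, and averaging. The following Monte Carlo sketch (illustrative only; it reuses the dual-based fit from the sketch after Theorem 1) approximates (29) for the fused nearly-isotonic regression on a chain graph.

```python
import numpy as np
from scipy.optimize import lsq_linear

def fni_fit(y, D, lam_f, lam_ni):
    """Fused nearly-isotonic fit via the dual problem (15)-(16)."""
    m = D.shape[0]
    res = lsq_linear(D.T, y, bounds=(-lam_f * np.ones(m), (lam_f + lam_ni) * np.ones(m)))
    return y - D.T @ res.x

rng = np.random.default_rng(0)
n, sigma, n_rep = 8, 0.5, 2000
beta_star = np.array([0.0, 0.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0])   # true signal
D = np.eye(n - 1, n) - np.eye(n - 1, n, k=1)                      # chain graph

fits = np.empty((n_rep, n))
ys = np.empty((n_rep, n))
for r in range(n_rep):
    ys[r] = beta_star + sigma * rng.standard_normal(n)
    fits[r] = fni_fit(ys[r], D, lam_f=0.2, lam_ni=0.4)

# Monte Carlo version of (29): df = sum_i Cov(beta_hat_i, y_i) / sigma^2.
cov = np.mean((fits - fits.mean(axis=0)) * (ys - ys.mean(axis=0)), axis=0)
print("Monte Carlo df estimate:", cov.sum() / sigma**2)
```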
The next theorem provides the unbiased estimators of the degrees of freedom $df(\hat{\beta}^{FNI})$ and $df(\hat{\beta}^{FLNI})$.
Theorem 3
For the fixed values of $\lambda_F$, $\lambda_L$ and $\lambda_{NI}$ let $\widehat{df}(\hat{\beta}^{FNI})$ denote the number of constant groups of $\hat{\beta}^{FNI}(\lambda_F, \lambda_{NI})$ over the graph $G$, i.e. the number of connected components of $G$ after removing the edges $(i, j) \in E$ with $\hat{\beta}_i^{FNI} \ne \hat{\beta}_j^{FNI}$,
and let $\widehat{df}(\hat{\beta}^{FLNI})$ denote the number of non-zero constant groups of $\hat{\beta}^{FLNI}(\lambda_F, \lambda_L, \lambda_{NI})$ over $G$.
Then we have
$$\mathbb{E}\big[\widehat{df}(\hat{\beta}^{FNI})\big] = df(\hat{\beta}^{FNI})$$
and
$$\mathbb{E}\big[\widehat{df}(\hat{\beta}^{FLNI})\big] = df(\hat{\beta}^{FLNI}).$$
Proof.
First, for the fused estimator let
Then, as it follows from Tibshirani & Taylor (2011), for we have
Further, again, using the property of the covariance, we have
Lastly, we note that the proof for the unbiased estimator of the degrees of freedom for nearly-isotonic regression, given in Tibshirani et al. (2011), can be done in the same way as in the current proof, using the relation (21) and, again, the result of the paper Tibshirani & Taylor (2011) for the fusion estimator .
We can use the estimate of the degrees of freedom for unbiased estimation of the true risk $\mathbb{E}\|\hat{\beta} - \beta^{*}\|_2^{2}$, which is given by the statistic
$$\hat{R} = \|y - \hat{\beta}\|_2^{2} - n\sigma^{2} + 2\sigma^{2}\widehat{df}(\hat{\beta}).$$
5 Acknowledgments
This work was partially supported by the Wallenberg AI, Autonomous Systems and Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation.
References
Beran, R. & Dümbgen, L. (2010), 'Least squares and shrinkage estimation under bimonotonicity constraints', Statistics and Computing 20(2), 177–189.
Best, M. J. & Chakravarti, N. (1990), 'Active set algorithms for isotonic regression; a unifying framework', Mathematical Programming 47(1), 425–439.
Efron, B. (1986), 'How biased is the apparent error rate of a prediction rule?', Journal of the American Statistical Association 81(394), 461–470.
Friedman, J., Hastie, T., Höfling, H. & Tibshirani, R. (2007), 'Pathwise coordinate optimization', The Annals of Applied Statistics 1(2), 302–332.
Hoerl, A. E. & Kennard, R. W. (1970), 'Ridge regression: Biased estimation for nonorthogonal problems', Technometrics 12(1), 55–67.
Kim, S.-J., Koh, K., Boyd, S. & Gorinevsky, D. (2009), '$\ell_1$ trend filtering', SIAM Review 51(2), 339–360.
Minami, K. (2020), 'Estimating piecewise monotone signals', Electronic Journal of Statistics 14(1), 1508–1576.
Phillips, D. L. (1962), 'A technique for the numerical solution of certain integral equations of the first kind', Journal of the ACM 9(1), 84–97.
Rinaldo, A. (2009), 'Properties and refinements of the fused lasso', The Annals of Statistics 37(5B), 2922–2952.
Stout, Q. F. (2013), 'Isotonic regression via partitioning', Algorithmica 66(1), 93–112.
Tibshirani, R. (1996), 'Regression shrinkage and selection via the lasso', Journal of the Royal Statistical Society: Series B (Methodological) 58(1), 267–288.
Tibshirani, R. J., Hoefling, H. & Tibshirani, R. (2011), 'Nearly-isotonic regression', Technometrics 53(1), 54–61.
Tibshirani, R. J. & Taylor, J. (2011), 'The solution path of the generalized lasso', The Annals of Statistics 39(3), 1335–1371.
Tibshirani, R., Saunders, M., Rosset, S., Zhu, J. & Knight, K. (2005), 'Sparsity and smoothness via the fused lasso', Journal of the Royal Statistical Society: Series B (Statistical Methodology) 67(1), 91–108.
Tikhonov, A. N., Goncharsky, A., Stepanov, V. & Yagola, A. G. (1995), Numerical Methods for the Solution of Ill-Posed Problems, Vol. 328, Springer Science & Business Media.