Generalized eigen, singular value, and partial least squares decompositions: The \pkgGSVD package

Derek Beaton
Rotman Research Institute Baycrest Health Sciences [email protected]

\Plainauthor

Derek Beaton \PlaintitleGeneralized eigen, singular value, and partial least squares decompositions: The GSVD package \Shorttitle\pkgGSVD: generalized SVD \Abstract The generalized singular value decomposition (GSVD, a.k.a. “SVD triplet”, “duality diagram” approach) provides a unified strategy and basis to perform nearly all of the most common multivariate analyses (e.g., principal components, correspondence analysis, multidimensional scaling, canonical correlation, partial least squares). Though the GSVD is ubiquitous, powerful, and flexible, it has very few implementations. Here I introduce the \pkgGSVD package for \proglangR. The general goal of \pkgGSVD is to provide a small set of accessible functions to perform the GSVD and two other related decompositions (generalized eigenvalue decomposition, generalized partial least squares-singular value decomposition). Furthermore, \pkgGSVD helps provide a more unified conceptual approach and nomenclature to many techniques. I first introduce the concept of the GSVD, followed by a formal definition of the generalized decompositions. Next I provide some key decisions made during development, and then a number of examples of how to use \pkgGSVD to implement various statistical techniques. These examples also illustrate one of the goals of \pkgGSVD: how others can (or should) build analysis packages that depend on \pkgGSVD. Finally, I discuss the possible future of \pkgGSVD. \Keywordsmultivariate analysis, generalized singular value decomposition, principal components analysis, correspondence analysis, multidimensional scaling, canonical correlation, partial least squares, \proglangR \Plainkeywordsmultivariate analysis, generalized singular value decomposition, principal components analysis, correspondence analysis, multidimensional scaling, canonical correlation, partial least squares, R \Address Derek Beaton
Rotman Research Institute, Baycrest Health Sciences
3560 Bathurst Street Toronto, ON Canada M6A 2E1
E-mail:

1 Introduction

The singular value decomposition (SVD; golub_singular_1971) is one of the most important tools in a multivariate toolbox. Conceptually and practically, the SVD is the core technique behind numerous statistical methods; the most common of which is principal component analysis (PCA; jolliffe_principal_2012; abdi_principal_2010; jolliffe_principal_2016). A lesser known—but far more ubiquitous—tool is the generalized SVD (GSVD; abdi2007singular; greenacre_theory_1984; takane_relationships_2003; holmes_multivariate_2008). The GSVD is more ubiquitous because it is the technique behind—and generalizes—many more statistical methods. The core concept behind the GSVD is that a data matrix has two companion matrices: one for the rows and one for the columns. These companion matrices are weights, constraints, or metrics imposed on the rows and columns of a data matrix. For examples, via the GSVD we can implement weighted solutions to PCA, or analyses under different metrics, such as correspondence analysis (CA; greenacre_theory_1984; lebart_multivariate_1984; escofier-cordier_analyse_1965; greenacre_correspondence_2010) or canonical correlation analysis (CCA; harold1936relations; abdi2017canonical).

Though the GSVD generalizes—and is more flexible than—the SVD, there are few if any direct implementations of the GSVD in \proglangR. Rather, the GSVD is typically part of specific packages, and these packages are typically designed around more general analyses and broad usage. Rarely, if ever, are these GSVD implementations accessible to, or meant for, more direct access by users. Given the frequent and ubiquitous uses of the SVD and GSVD, a more accessible implementation would benefit a wide range of users.

Here I introduce a package designed around the GSVD, called \pkgGSVD. \pkgGSVD is a lightweight implementation of the GSVD and two other generalized decompositions: the generalized eigendecomposition (GEVD) and the generalized partial least squares-SVD (GPLSSVD). \pkgGSVD was designed for a wide range of users, from analysts to package developers, all of whom would benefit from more direct access to the GSVD and similar decompositions. More importantly, the \pkgGSVD package and the idea of the GSVD provide a basis to unify concepts, nomenclature, and techniques across a wide array of statistical traditions and multivariate analyses approaches. \pkgGSVD has three core functions: \codegeigen(), \codegsvd(), and \codegplsssvd(). These core functions provide a way for users to implement a wide array of methods including (but not limited to) multidimensional scaling, principal components analysis, correspondence analysis, canonical correlation, partial least squares, and numerous variants and extensions of the aforementioned. \pkgGSVD also helps simplify and unify concepts across techniques because, at their core, all of these techniques can be accomplished with the SVD.

In this paper, I introduce the core functions and functionality of \pkgGSVD. Furthermore, I show how the GSVD can provide a more unified nomenclature across techniques, and I show how various techniques can be implemented through \codegeigen(), \codegsvd(), and \codegplsssvd(). Here I provide sufficient detail for the majority of users and readers. However, some readers may want to take a deeper dive into various techniques and literatures. Where possible and relevant, I point the readers to works with much more detail and substance.

This paper is outlined as follows. In Generalized decompositions I provide background on, notation for, and mathematical explanations of the three decompositions discussed here (GEVD, GSVD, and GPLSSVD), followed by some additional notes and literature. In Package description, I provide an overview of the core functions, their uses, and other development notes. In Examples of multivariate analyses I provide detailed implementations of numerous multivariate techniques, as well as variations of those techniques or variations of how to implement them with \pkgGSVD (e.g., PCA via GEVD vs PCA via GSVD). In Discussion I make some comments about the package, its uses, its potential, and possible future development.

2 Generalized decompositions

The GSVD is probably best known by way of the correspondence analysis literature (greenacre_theory_1984; lebart_multivariate_1984; escofier-cordier_analyse_1965; greenacre_correspondence_2010) and, more generally, the “French way” of multivariate data analyses (holmes_discussion_2017; holmes_multivariate_2008). There is considerable breadth and depth of materials on the GSVD and related techniques since the 1960s (and even well before then). Though such breadth and depth is beyond the scope of this paper, I provide a list of references and resources throughout the paper for any interested readers. This section instead focuses on the formulation—and to a degree the nomenclature—of the three core techniques. I first introduce notation, followed by the GEVD, the GSVD, and then formalize and introduce the GPLSSVD.

2.1 Notation

Bold uppercase letters denote matrices (e.g., $\mathbf{X}$ ). Upper case italic letters (e.g., $I$ ) denote cardinality, size, or length. Subscripts for matrices denote relationships with certain other matrices, for examples ${\mathbf{Z}}_{\mathbf{X}}$ is some matrix derived from or related to the ${\bf X}$ matrix, where something like ${\bf F}_{I}$ is a matrix related to the $I$ set of elements. When these matrices are introduced, they are also specified. Two matrices side-by-side denotes standard matrix multiplication (e.g., $\bf{X}\bf{Y}$ ). Superscript ^T denotes the transpose operation, and superscript ^-1 denotes standard matrix inversion. The diagonal operator, denoted $\mathrm{diag\{\}}$ , transforms a vector into a diagonal matrix, or extracts the diagonal of a matrix in order to produce a vector.

2.2 Generalized eigendecomposition

The generalized eigendecomposition (GEVD) requires two matrices: a $J\times J$ (square) data matrix ${\bf X}$ and a $J\times J$ constraints matrix ${\bf W}_{\bf X}$ . For the GEVD, ${\bf X}$ is typically positive semi-definite and symmetric (e.g., a covariance matrix) and ${\bf W}_{\bf X}$ is required to be positive semi-definite. The GEVD decomposes the data matrix ${\bf X}$ —with respect to its constraints ${\bf W}_{\bf X}$ —into two matrices as

{\bf X}={\bf Q}{\bf\Lambda}{\bf Q}^{T},

(1)

where ${\bf\Lambda}$ is a diagonal matrix that contains the eigenvalues and ${\bf Q}$ are the generalized eigenvectors. The GEVD finds orthogonal slices of a data matrix, with respect to its constraints, where each orthogonal slice explains the maximum possible variance. That is, the GEVD maximizes ${\bf\Lambda}={\bf Q}^{T}{\bf W}_{\bf X}{\bf X}{\bf W}_{\bf X}{\bf Q}$ under the constraint of orthogonality where ${\bf Q}^{T}{\bf W}_{\bf X}{\bf Q}={\bf I}$ . Practically, the GEVD is performed with the standard eigenvalue decomposition (EVD) as

\widetilde{\bf X}={\bf V}{\bf\Lambda}{\bf V},

(2)

where $\widetilde{\bf X}={\bf W}_{\bf X}^{\frac{1}{2}}{\bf X}{\bf W}_{\bf X}^{\frac{1}{2}}$ and ${\bf V}$ are the eigenvectors which are orthonormal, such that ${\bf V}^{T}{\bf V}={\bf I}$ . The relationship between the GEVD and EVD can be explained as the relationship between the generalized and standard eigenvectors where

{\bf Q}={\bf W}_{\bf X}^{-\frac{1}{2}}{\bf V}\Longleftrightarrow{\bf V}={\bf W}_{\bf X}^{\frac{1}{2}}{\bf Q}.

(3)

When ${\bf W}_{\bf X}={\bf I}$ , the GEVD produces exactly the same results as the EVD because $\widetilde{\bf X}={\bf X}$ and thus ${\bf Q}={\bf V}$ . Analyses with the EVD and GEVD—such as PCA—typically produce component or factor scores. With the GEVD, component scores are defined as

{\bf F}_{J}={\bf W}_{\bf X}{\bf Q}{\bf\Delta},

(4)

where ${\bf\Delta}={\bf\Lambda}^{\frac{1}{2}}$ , which are singular values. The maximization in the GEVD can be reframed as the maximization of the component scores where ${\bf\Lambda}={\bf F}_{J}^{T}{\bf W}_{\bf X}^{-1}{\bf F}_{J}$ , still subject to ${\bf Q}^{T}{\bf W}_{\bf X}{\bf Q}={\bf I}$ .

2.3 Generalized singular value decomposition

The generalized singular value decomposition (GSVD) requires three matrices: an $I\times J$ (rectangular) data matrix ${\bf X}$ , an $I\times I$ row constraints matrix ${\bf M}_{\bf X}$ , and a $J\times J$ columns constraints matrix ${\bf W}_{\bf X}$ . For the GSVD ${\bf M}_{\bf X}$ and ${\bf W}_{\bf X}$ are each required to be positive semi-definite. The GSVD decomposes the data matrix ${\bf X}$ —with respect to both of its constraints ${\bf M}_{\bf X}$ and ${\bf W}_{\bf X}$ —into three matrices as

{\bf X}={\bf P}{\bf\Delta}{\bf Q}^{T},

(5)

where ${\bf\Delta}$ is a diagonal matrix that contains the singular values, and where ${\bf P}$ and ${\bf Q}$ are the left and right generalized singular vectors, respectively. From the GSVD we can obtain eigenvalues as ${\bf\Lambda}^{2}={\bf\Delta}$ . The GSVD finds orthogonal slices of a data matrix, with respect to its constraints, where each slice explains the maximum possible square root of the variance. That is, the GSVD maximizes ${\bf\Delta}={\bf P}^{T}{\bf M}_{\bf X}{\bf X}{\bf W}_{\bf X}{\bf Q}$ under the constraint of orthogonality where ${\bf P}^{T}{\bf M}_{\bf X}{\bf P}={\bf I}={\bf Q}^{T}{\bf W}_{\bf X}{\bf Q}$ . Typically, the GSVD is performed with the standard SVD as

\widetilde{\bf X}={\bf U}{\bf\Delta}{\bf V},

(6)

where $\widetilde{\bf X}={{\bf M}_{\bf X}^{\frac{1}{2}}}{\bf X}{{\bf W}^{\frac{1}{2}}_{\bf X}}$ , and where ${\bf U}$ and ${\bf V}$ are the left and right singular vectors, respectively, which are orthonormal such that ${\bf U}^{T}{\bf U}={\bf I}={\bf V}^{T}{\bf V}$ . The relationship between the GSVD and SVD can be explained as the relationship between the generalized and standard singular vectors where

	$\displaystyle{\bf P}={{\bf M}^{-\frac{1}{2}}_{\bf X}}{\bf U}\Longleftrightarrow{\bf U}={{\bf M}^{\frac{1}{2}}_{\bf X}}{\bf P}$		(7)
	$\displaystyle{\bf Q}={{\bf W}^{-\frac{1}{2}}_{\bf X}}{\bf V}\Longleftrightarrow{\bf V}={{\bf W}^{\frac{1}{2}}_{\bf X}}{\bf Q}.$		(7)

When ${\bf M}_{\bf X}={\bf I}={\bf W}_{\bf X}$ , the GSVD produces exactly the same results as the SVD because $\widetilde{\bf X}={\bf X}$ and thus ${\bf P}={\bf U}$ and ${\bf Q}={\bf V}$ . Analyses with the SVD and GSVD—such as PCA or CA—typically produce component or factor scores. With the GSVD, component scores are defined as

{\bf F}_{I}={\bf M}_{\bf X}{\bf P}{\bf\Delta}\textrm{ and }{\bf F}_{J}={\bf W}_{\bf X}{\bf Q}{\bf\Delta},

(8)

for the left (rows) and right (columns) of ${\bf X}$ , respectively. The optimization in the GSVD can be reframed as the maximization of the component scores where ${\bf F}_{I}^{T}{\bf M}_{\bf X}^{-1}{\bf F}_{I}={\bf\Lambda}={\bf F}_{J}^{T}{\bf W}_{\bf X}^{-1}{\bf F}_{J}$ , still subject to ${\bf P}^{T}{\bf M}_{\bf X}{\bf P}={\bf I}={\bf Q}^{T}{\bf W}_{\bf X}{\bf Q}$ . Note how the optimization with respect to the component scores shows a maximization for the eigenvalues.

2.4 Generalized partial least squares singular value decomposition

The generalized partial least squares-singular value decomposition (GPLSSVD) is a reformulation of the PLSSVD. The PLSSVD is a specific type of PLS from the broader PLS family (tenenhaus1998regression). The PLSSVD has various other names—for example, PLS correlation (krishnan_partial_2011)—but canonically comes from the psychology (ketterlinus1989partial) and neuroimaging literatures (mcintosh_spatial_1996), though it traces its origins to Tucker’s interbattery factor analysis (tucker_inter-battery_1958). Recently, with other colleagues, I introduced the idea of the GPLSSVD as it helped us formalize a new version of PLS for categorical data (beaton_partial_2016; beaton_generalization_2019). However, the GPLSSVD also allows us to generalize other methods, such as canonical correlation analysis (see Supplemental Material of beaton_generalization_2019).

The GPLSSVD requires six matrices: an $N\times I$ (rectangular) data matrix ${\bf X}$ with its $N\times N$ row constraints matrix ${\bf M}_{\bf X}$ and its $I\times I$ columns constraints matrix ${\bf W}_{\bf X}$ , and an $N\times J$ (rectangular) data matrix ${\bf Y}$ with its $N\times N$ row constraints matrix ${\bf M}_{\bf Y}$ and its $J\times J$ columns constraints matrix ${\bf W}_{\bf Y}$ . For the GPLSSVD all constraint matrices are required to be positive semi-definite. The GPLSSVD decomposes the relationship between the data matrices, with respect to their constraints, and expresses the common information as the relationship between latent variables. The goal of partial least squares-SVD (PLSSVD) is to find a combination of orthogonal latent variables that maximize the relationship between two data matrices. PLS is often presented as $\mathrm{argmax(}{\bf{l_{\bf X}}_{\ell}^{T}}{{\bf l}_{\bf Y}}_{\ell}\mathrm{)}=\mathrm{argmax}\textrm{ }\mathrm{cov(}{\bf{l_{\bf X}}_{\ell}},{{\bf l}_{\bf Y}}_{\ell}\mathrm{)}$ , under the condition that ${\bf{l_{\bf X}}_{\ell}^{T}}{{\bf l}_{\bf Y}}_{\ell^{\prime}}=0$ when ${\ell}\neq{\ell^{\prime}}$ . This maximization can be framed as

{{\bf L}_{\bf X}^{T}}{\bf L}_{\bf Y}={\bf\Delta},

(9)

where ${\bf\Delta}$ is the diagonal matrix of singular values, and so ${\bf\Delta}^{2}={\bf\Lambda}$ which are eigenvalues. Like with the GSVD, the GPLSSVD decomposes the relationship between two data matrices into three matrices as

[({\bf M}_{\bf X}^{\frac{1}{2}}{\bf X})^{T}({\bf M}_{\bf Y}^{\frac{1}{2}}{\bf Y})]={\bf P}{\bf\Delta}{\bf Q}^{T},

(10)

where ${\bf\Delta}$ is the diagonal matrix of singular values, and where ${\bf P}$ and ${\bf Q}$ are the left and right generalized singular vectors, respectively. Like the GSVD and GEVD, the GPLSSVD finds orthogonal slices of $({\bf M}_{\bf X}^{\frac{1}{2}}{\bf X})^{T}({\bf M}_{\bf Y}^{\frac{1}{2}}{\bf Y})$ with respect to the column constraints. The GPLSSVD maximizes ${\bf\Delta}={\bf P}^{T}{\bf W}_{\bf X}[({\bf M}_{\bf X}^{\frac{1}{2}}{\bf X})^{T}({\bf M}_{\bf Y}^{\frac{1}{2}}{\bf Y})]{\bf W}_{\bf Y}{\bf Q}$ under the constraint of orthogonality where ${\bf P}^{T}{\bf W}_{\bf X}{\bf P}={\bf I}={\bf Q}^{T}{\bf W}_{\bf Y}{\bf Q}$ . Typically, the GPLSSVD is performed with the SVD as

\widetilde{\bf X}^{T}\widetilde{\bf Y}={\bf U}{\bf\Delta}{\bf V},

(11)

where $\widetilde{\bf X}={{\bf M}_{\bf X}^{\frac{1}{2}}}{\bf X}{{\bf W}^{\frac{1}{2}}_{\bf X}}$ and $\widetilde{\bf Y}={{\bf M}_{\bf Y}^{\frac{1}{2}}}{\bf Y}{{\bf W}^{\frac{1}{2}}_{\bf Y}}$ , and where ${\bf U}$ and ${\bf V}$ are the left and right singular vectors, respectively, which are orthonormal such that ${\bf U}^{T}{\bf U}={\bf I}={\bf V}^{T}{\bf V}$ . The relationship between the generalized and standard singular vectors are

	$\displaystyle{\bf P}={{\bf W}^{-\frac{1}{2}}_{\bf X}}{\bf U}\Longleftrightarrow{\bf U}={{\bf W}^{\frac{1}{2}}_{\bf X}}{\bf P}$		(12)
	$\displaystyle{\bf Q}={{\bf W}^{-\frac{1}{2}}_{\bf Y}}{\bf V}\Longleftrightarrow{\bf V}={{\bf W}^{\frac{1}{2}}_{\bf Y}}{\bf Q}.$		(12)

When all constraint matrices are ${\bf I}$ , the GPLSSVD produces exactly the same results as the PLSSVD because $\widetilde{\bf X}={\bf X}$ and $\widetilde{\bf Y}={\bf Y}$ and thus ${\bf P}={\bf U}$ and ${\bf Q}={\bf V}$ .

The latent variables are then expressed with respect to the constraints and generalized singular vectors as ${\bf L}_{\bf X}=({\bf M}_{\bf X}^{\frac{1}{2}}{\bf X}{\bf W}_{\bf X}{\bf P})$ and ${\bf L}_{\bf Y}=({\bf M}_{\bf Y}^{\frac{1}{2}}{\bf Y}{\bf W}_{\bf Y}{\bf Q})$ . These latent variables maximize the weighted covariance (by way of the constraints) subject to orthogonality where

	$\displaystyle{\bf L}_{\bf X}^{T}{\bf L}_{\bf Y}=$		(13)
	$\displaystyle({\bf M}_{\bf X}^{\frac{1}{2}}{\bf X}{\bf W}_{\bf X}{\bf P})^{T}({\bf M}_{\bf Y}^{\frac{1}{2}}{\bf Y}{\bf W}_{\bf Y}{\bf Q})=$
	$\displaystyle(\widetilde{\bf X}{\bf U})^{T}(\widetilde{\bf Y}{\bf V})=$
	$\displaystyle{\bf U}^{T}\widetilde{\bf X}^{T}\widetilde{\bf Y}{\bf V}={\bf\Delta}.$

We will see in the following section that the “weighted covariance” could be the correlation, which allows us to use the GPLSSVD to perform various types of “cross-decomposition” techniques. Like with the GEVD and GSVD, the GPLSSVD produces component or factor scores. The component scores are defined as

{\bf F}_{I}={\bf W}_{\bf X}{\bf P}{\bf\Delta}\textrm{ and }{\bf F}_{J}={\bf W}_{\bf Y}{\bf Q}{\bf\Delta},

(14)

for the columns of ${\bf X}$ and the columns of ${\bf Y}$ , respectively. The optimization in the GPLSSVD can be reframed as the maximization of the component scores where ${\bf F}_{I}^{T}{\bf W}_{\bf X}^{-1}{\bf F}_{I}={\bf\Lambda}={\bf F}_{J}^{T}{\bf W}_{\bf Y}^{-1}{\bf F}_{J}$ where ${\bf\Lambda}$ are the eigenvalues, and this maximization is still subject to ${\bf P}^{T}{\bf W}_{\bf X}{\bf P}={\bf I}={\bf Q}^{T}{\bf W}_{\bf Y}{\bf Q}$ .

2.5 Decomposition tuples

For simplicity, the GSVD is often referred to as a “triplet” or “the GSVD triplet” (husson_jan_2016; holmes_multivariate_2008) comprised of (1) the data matrix, (2) the column constraints, and (3) the row constraints. We can use the same concept to also define “tuples” for the GEVD and GPLSSVD. To note, the traditional way to present the GSVD triplet is in the above order (data, column constraints, row constraints). However, here I present a different order for the elements in the tuples so that I can (1) better harmonize the tuples across the three decompositions presented here, and (2) simplify the tuples such that the order of the elements within the tuples reflects the matrix multiplication steps. Furthermore, I present two different tuples for each decomposition—a complete and a partial—where the partial is a lower rank solution. The complete decomposition tuples are:

•

The complete GEVD 2-tuple: $\mathrm{GEVD(}{\bf X},{\bf W}_{\bf X}\mathrm{)}$
•

The complete GSVD decomposition 3-tuple: $\mathrm{GSVD(}{\bf M}_{\bf X},{\bf X},{\bf W}_{\bf X}\mathrm{)}$
•

The complete GPLSSVD decomposition 6-tuple: $\mathrm{GPLSSVD(}{\bf M}_{\bf X},{\bf X},{\bf W}_{\bf X},{\bf M}_{\bf Y},{\bf Y},{\bf W}_{\bf Y}\mathrm{)}$ .

Additionally, we can take the idea of tuples one step further and allow for the these tuples to also define the desired returned rank of the results referred to as “partial decompositions”. The partial decompositions produce (return) only the first $C$ components, and are defined as:

•

The partial GEVD decomposition 3-tuple: $\mathrm{GEVD(}{\bf X},{\bf W}_{\bf X},C\mathrm{)}$
•

The partial GSVD decomposition 4-tuple: $\mathrm{GSVD(}{\bf M}_{\bf X},{\bf X},{\bf W}_{\bf X},C\mathrm{)}$
•

The partial GPLSSVD decomposition 7-tuple: $\mathrm{GPLSSVD(}{\bf M}_{\bf X},{\bf X},{\bf W}_{\bf X},{\bf M}_{\bf Y},{\bf Y},{\bf W}_{\bf Y},C\mathrm{)}$ .

Overall, these tuples provide short and convenient ways to express the decompositions. And as we will see in later sections, these tuples provide a simpler way to express specific techniques under the same framework (e.g., PLS and CCA via GPLSSVD).

2.6 Restrictions and limitations

In general, \pkgGSVD was designed around the most common analyses that use the GSVD, GEVD, and GPLSSVD. These techniques are, typically, multidimensional scaling (GEVD), principal components analysis (GEVD or GSVD), correspondence analysis (GSVD), partial least squares (GSVD or GPLSSVD), reduced rank regression (GSVD or GPLSSVD), canonical correlation analysis (GSVD or GPLSSVD), and numerous variants and extensions of all the aforementioned (and more).

One of the restrictions of these generalized decompositions is that any constraint matrix must be positive semi-definite. That means, typically, these matrices are square symmetric matrices with non-negative eigenvalues. Often that means constraint matrices are, for examples, covariance matrices or diagonal matrices. \pkgGSVD performs checks to ensure adherence to positive semi-definiteness for constraint matrices.

Likewise, many techniques performed through the GEVD or EVD also assume positive semi-definiteness. For example, PCA is the EVD of a correlation or covariance matrix. Thus in \pkgGSVD, there are checks for positive semi-definite matrices in the GEVD. Furthermore, all decompositions in \pkgGSVD check for eigenvalues below a precision level. When found, these very small eigenvalues are effectively zero and not returned by any of the decompositions in \pkgGSVD. However, this can be changed with an input parameter to allow for these vectors to be returned (I discuss these parameters in the following section).

2.7 Other and related decompositions.

To note, the GSVD discussed here is not the same as another technique referred to as the “generalized singular value decomposition” (van_loan_generalizing_1976). The Van Loan generalized SVD has more recently been referred to as the “quotient SVD (QSVD)” to help distinguish it from the GSVD defined here (takane_relationships_2003). Furthermore, there are numerous other variants of the EVD and SVD beyond the GEVD and GSVD presented here. takane_relationships_2003 provides a detailed explanation of those variants as well as the relationships between the variants.

2.8 Other packages of note

There are multiple packages that implement methods based on the GSVD or GEVD, where some of do in fact make use of a GSVD or GSVD-like call. Generally, though, the GSVD calls in these packages are meant more for internal (to the package) use instead of a more external facing tool like \pkgGSVD. A core (but by no means comprehensive) list of those packages follows. There are at least four packages designed to provide interfaces to specific GSVD techniques, and each of those packages has an internal function meant for GSVD-like decomposition.

•
\pkg
ExPosition which includes the function genPDQ (beaton_exposition_2014). As the author of \pkgExPosition, I designed genPDQ under the most common usages which are generally diagonal matrices (or simply just vectors). I regret that design choice, which is one of the primary motivations why I developed \pkgGSVD.
•
\pkg
FactoMineR (le_factominer_2008) includes the function svd.triplet, which also makes use of vectors instead of matrices because the most common (conceptual) uses of the GSVD is that the row and/or column constraints are diagonal matrices.
•
\pkg
ade4 (dray_ade4_2007) which includes a function called as.dudi that is the core decomposition step. It, too, makes use of vectors.
•
\pkg
amap (lucas_amap_2019) which includes the function acp which, akin to the previous 3 packages listed, also only makes use of vectors for row and column constraints.

However, there are other packages that also more generally provide methods based on GSVD, GEVD, or GPLSSVD approaches however, they do not necessarily include a GSVD-like decomposition. Instead they make more direct use of the SVD or eigendecompositions, or alternative methods (e.g., alternating least squares). Those packages include but are not limited to:

•
\pkg
MASS (venables_modern_2002) includes MASS::corresp which is an implementation of CA,
•
\pkg
ca which includes a number of CA-based methods (nenadic_correspondence_2007),
•
\pkg
CAvariants is another package that includes standard CA and other variations not seen in other packages (lombardo_variants_2016),
•
\pkg
homals, and \pkganacor, are packages each that address a number of methods in the CA family (leeuw_gifi_2009; de_leeuw_simple_2009),
•
\pkg
candisc focuses on canonical discriminant and correlation analyses (friendly_candisc_2020),
•
\pkg
vegan that includes a very large set of ordination methods with a particular emphasis on cross-decomposition or canonical methods (oksanen_vegan_2019), and
•
\pkg
rgcca also presents a more unified approach to CCA and PLS under a more unified framework that also includes various approaches to regularization (tenenhaus_rgcca_2017; tenenhaus_regularized_2014).
•
\pkg
ics is a package for invariant coordinate selection, which can obtain unmixing matrices (i.e., as in independent components analysis) (nordhausen_tools_2008; tyler_invariant_2009).

3 Package description: core functions and features

The \pkgGSVD package has three primary “workhorse” functions:

•
\code
geigen(X, W, k = 0, tol = sqrt(.Machine$double.eps), symmetric),
•
\code
gsvd(X, LW, RW, k = 0, tol = .Machine$double.eps), and
•
\code
gplssvd(X, Y, XLW, YLW, XRW, YRW, k = 0, tol = .Machine$double.eps)

In \codegeigen() or \codegsvd() each there is one data matrix \codeX, whereas \codegplssvd() has two data matrices \codeX and \codeY. In \codegeigen() there is a single constraint matrix \codeW. In \codegsvd() there are two constraint matrices, \codeLW or “left weights” for the rows of \codeX and \codeRW or “right weights” for the columns of \codeX. The “left” and “right” references are used because of the association between these weights and the left and right generalized singular vectors. In \codegplssvd() there are two constraint matrices per data matrix (so four total constraint matrices): \codeXLW and \codeXRW for \codeX’s “left” and “right” weights, and \codeYLW and \codeYRW for \codeY’s “left” and “right” weights. The \codegeigen() includes the argument \codesymmetric to indicate if \codeX is a symmetric matrix; when missing \codeX is tested via \codeisSymmetric(). The \codesymmetric argument is eventually passed through to, and is the same as, \codesymmetric in \codebase::eigen(). All three functions include \codek which indicates how many components to return. Finally, all three functions include a tolerance argument \codetol, which is passed through to \codetolerance_svd() or \codetolerance_eigen(). These functions are the same as \codebase::svd() and \codebase::eigen(), respectively, with the added tolerance feature. In both cases, the \codetol argument is used to check for any eigenvalues or singular values below the tolerance threshold. Any eigen- or singular values below that threshold are discarded, as they are effectively zero. These values occur when data are collinear, which is common in high dimensional cases or in techniques such as Multiple Correspondence Analysis. However, the \codetol argument can be effectively turned off with the use of \codeNA, \codeNULL, \codeInf, \code-Inf, \codeNaN, or any value $<0$ . In this case, both \codetolerance_svd() and \codetolerance_eigen() simply call \codebase::svd() and \codebase::eigen() with no changes. When using the \codetol argument, eigen- and singular values are also checked to ensure that they are real and positive values. If they are not, then \codegeigen(), \codegsvd(), and \codegplssvd() stop. The motivation behind this behavior is because the \codegeigen(), \codegsvd(), and \codegplssvd() functions are meant to perform routine multivariate analyses—such as MDS, PCA, CA, CCA, or PLS—that require data and/or constraint matrices assumed to be positive semi-definite.

Data matrices are the minimally required objects for \codegeigen(), \codegsvd(), and \codegplssvd(). All other arguments (input) either have suitable defaults or are allowed to be missing. For example, when any of the constraints (“weights”) are missing, then the constraints are mathematically equivalent to identity matrices (i.e., ${\bf I}$ ) which contain $1$ s on the diagonal with $0$ s off-diagonal. Table 1 shows a mapping between our (more formal) notation above and our more intuitively named arguments for the functions. The rows of Table 1 are the three primary functions—\codegeigen(), \codegsvd(), and \codegplssvd()—where the columns are the elements used in the formal notation (and also used in the tuple notation).

	${\bf X}$	${\bf Y}$	${{\bf M}_{\bf X}}$	${{\bf W}_{\bf X}}$	${{\bf M}_{\bf Y}}$	${{\bf W}_{\bf Y}}$	$C$
\codegeigen()	\codeX	-	-	\codeW	-	-	\codek
\codegsvd()	\codeX	-	\codeLW	\codeRW	-	-	\codek
\codegplssvd()	\codeX	\codeY	\codeXRW	\codeXLW	\codeYRW	\codeYLW	\codek

Table 1: Mapping between arguments (input) to functions (rows) and notation for the analysis tuples (columns).

Additionally, there are some “helper” and convenience functions used internally to the \codegeigen(), \codegsvd(), and \codegplssvd() functions that are made available for use. These include \codesqrt_psd_matrix() and \codeinvsqrt_psd_matrix() which compute the square root (\codesqrt) and inverse square root (\codeinvsqrt) of positive semi-definite (\codepsd) matrices (\codematrix), respectively. The \pkgGSVD package also includes helpful functions for testing matrices: \codeis_diagaonal_matrix() and \codeis_empty_matrix(). Both of these tests help minimize the memory and computational footprints for, or check validity of, the constraints matrices.

Finally, the three core functions in \pkgGSVD—\codegeigen(), \codegsvd(), and \codegplssvd()—each have their own class objects but provide overlapping and identical outputs. The class object is hierarchical from a list, to a package, to the specific function: \codec("geigen","GSVD","list"), \codec("gsvd","GSVD","list"), and \codec("gplssvd","GSVD","list") for \codegeigen(), \codegsvd(), and \codegplssvd() respectively. Table 2 list the possible outputs across \codegeigen(), \codegsvd(), and \codegplssvd(). The first column of Table 2 explains the returned value, where the second column provides a mapping back to the notation used here. The last three columns indicate—with an ‘X’—which of the returned values are available from the \codegeigen, \codegsvd, or \codegplssvd functions.

	What it is	Notation	\codegeigen	\codegsvd	\codegplssvd
\coded	\codek singular values	${\bf\Delta}$	✓	✓	✓
\coded_full	all singular values	${\bf\Delta}$	✓	✓	✓
\codel	\codek eigenvalues	${\bf\Lambda}$	✓	✓	✓
\codel_full	all eigenvalues	${\bf\Lambda}$	✓	✓	✓
\codeu	\codek Left singular/eigen vectors	${\bf U}$		✓	✓
\codev	\codek Right singular/eigen vectors	${\bf V}$	✓	✓	✓
\codep	\codek Left generalized singular/eigen vectors	${\bf P}$		✓	✓
\codeq	\codek Right generalized singular/eigen vectors	${\bf Q}$	✓	✓	✓
\codefi	\codek Left component scores	${\bf F}_{I}$		✓	✓
\codefj	\codek Right component scores	${\bf F}_{J}$	✓	✓	✓
\codelx	\codek Latent variable scores for \codeX	${\bf L}_{\bf X}$			✓
\codely	\codek Latent variable scores for \codeY	${\bf L}_{\bf Y}$			✓

Table 2: Mapping of values (output from functions; rows) to their conceptual meanings, notation used here, and which \pkgGSVD functions have these values.

Different fields and traditions use different nomenclature, different descriptions, or different ways of framing optimizations. But conceptually and mathematically, numerous multivariate techniques are much more related than they appear, especially when solved with the EVD and SVD. The \pkgGSVD package provides a single framework to unify common multivariate analyses by way of three generalized decompositions. The \codearguments (function inputs) and \codevalues (function outputs) help reinforce the mathematical and coneptual equivalence and relationships between techniques, and now via the \pkgGSVD package, we see this unification programatically and through analyses. Therefore the common names—across the core functions—for the \codevalues in Table 2 was an intentional design choice.

Finally, this package is “lightweight” in that it is written in base R, with no dependencies, and has a minimal set of functions to achieve the goals of \pkgGSVD. Furthermore, a number of strategies were employed in order to minimize both memory and computational footprints. For example, when constraints matrices are available, they are checked for certain conditions. Specifically, if a matrix is a diagonal matrix is it transformed into a vector, which decreases memory consumption and speeds up some computations (e.g., multiplication). If that same vector is all $1$ s, then the matrix was an identity matrix, and is then ignored in all computation.

4 Examples of multivariate analyses

In this section, I present many of the most commonly used multivariate analyses, and how they can be performed through the \pkgGSVD package, as well as how these methods can be framed in various ways. Here, I focus primarily on what are likely the most common: principal components analysis, multidimensional scaling, correspondence analysis (and some of its variations), partial least squares, reduced rank regression (a.k.a. redundnacy analysis or multivariable multivariate regression; van1977redundancy; de2012least), and canonical correlation analysis. As I introduce these methods, I also introduce how to use various functions (and their parameters) in \pkgGSVD.

There are other very common multivariate techniques, such as log and power CA methods (greenacre_correspondence_2010; greenacre2009power), and various discriminant analyses. I forgo the descriptions of these latter cases because they tend to be specific or special cases of techniques I do highlight. For examples: log-linear CA requires additional transformations and then performs CA, and various discriminant analyses reduce to special cases of PLS, RRR, or CCA.

The \pkgGSVD package contains several toy or illustrative data sets that work as small examples of various techniques. There is also a larger and more realistic data set in the package that I use in the following examples. That data set is a synthetic data set modeled after data from the Ontario Neurodegenerative Disease Research Initiative (ONDRI; https://ondri.ca/). The synthetic data were generated from real data, but were “synthesized” with the \pkgsynthpop package (synthpop). This synthetic data set—\codesynthetic_ONDRI—contains 138 rows (observations) and 17 columns (variables). See https://github.com/ondri-nibs for more details. The \codesynthetic_ONDRI data set is particularly useful because it contains a variety of data types, e.g., some quantitative such as cognitive or behavioral scores that are continuous, and brain volumes that are strictly positive integers, as well as categorical and ordinal variables (typically demographic or clinical measures). For each of the following illustrations of techniques, I use particular subsets of the \codesynthetic_ONDRI data most relevant or appropriate for those techniques (e.g., continuous measures for PCA, distances for MDS, cross-tabulations of categories for CA).

4.1 Principal components analysis

Generally, there are two ways to approach PCA: with a covariance matrix or with a correlation matrix. First, I show both of these PCA approaches on a subset of continuous measures from the \codesynthetic_ONDRI dataset. Then I focus on correlation PCA, but with an emphasis on (some of) the variety of ways we can perform correlation PCA with generalized decompositions. PCA is illustrated with a subset of continuous measures from cognitive tasks.

We can perform a covariance PCA and a correlation PCA with the generalized eigendecomposition as:

{CodeChunk}{CodeInput}

R> continuous_data <- synthetic_ONDRI[,c("TMT_A_sec", "TMT_B_sec", + "Stroop_color_sec", "Stroop_word_sec", + "Stroop_inhibit_sec", "Stroop_switch_sec")] R> R> cov_pca_geigen <- geigen( cov(continuous_data) ) R> cor_pca_geigen <- geigen( cor(continuous_data) )

In these cases, the use here is no different—from a user perspective—of how PCA would be performed with the plain \codeeigen. For now, the major advantage of the \codegeigen approach is that the output (values) also include component scores and other measures common to these decompositions, such as singular values, as seen in the output. The following code chunk shows the results of the \codeprint method for \codegeigen

{CodeChunk}{CodeInput}

R> cov_pca_geigen {CodeOutput} **GSVD package object of class type ’geigen’.**

geigen() was performed on a marix with 6 columns/rows Number of total components = 6. Number of retained components = 6.

The ’geigen’ object contains: $d_{f}ullFullsetofsingularvalues$ l_full Full set of eigen values $dRetainedsetofsingularvalues(k)$ l Retained set of eigen values (k) $vEigen/singularvectors$ q Generalized eigen/singular vectors $fjComponentscores\CodeInput R>cor_{p}ca_{g}eigen\CodeOutput**GSVDpackageobjectofclasstype^{\prime}geigen^{\prime}.**\par geigen()wasperformedonamarixwith6columns/rowsNumberoftotalcomponents=6.Numberofretainedcomponents=6.\par The^{\prime}geigen^{\prime}objectcontains:$d_{f}ullFullsetofsingularvalues$l_{f}ullFullsetofeigenvalues$dRetainedsetofsingularvalues(k)$lRetainedsetofeigenvalues(k)$vEigen/singularvectors$qGeneralizedeigen/singularvectors$fjComponentscores\par AmorecomprehensiveapproachtoPCAistheanalysisofarectangulartable---i.e.,theobservationsbythemeasures---asopposedtothesquaresymmetricmatrixofrelationshipsbetweenthevariables.ThemorecomprehensiveapproachtoPCAcanbeperformedwith\code{gsvd},asshownbelow,withits\code{print}methodtoshowtheoutputfrom\code{gsvd}.ForthisexampleofcorrelationPCA,weintroduceasetofconstraintsfortherows.ForstandardPCA,therowconstraintsarejust$\frac{1}{I}$where$I$isthenumberofrowsinthedatamatrix.\par\CodeChunk\CodeInput R>scaled_{d}ata<-scale(continuous_{d}ata,center=T,scale=T)R>degrees_{o}f_{f}reedom<-nrow(continuous_{d}ata)-1R>row_{c}onstraints<-rep(1/(degrees_{o}f_{f}reedom),nrow(scaled_{d}ata))R>R>cor_{p}ca_{g}svd<-gsvd(scaled_{d}ata,LW=row_{c}onstraints)R>R>cor_{p}ca_{g}svd\CodeOutput**GSVDpackageobjectofclasstype^{\prime}gsvd^{\prime}.**\par gsvd()wasperformedonamatrixwith138rowsand6columnsNumberofcomponents=6.Numberofretainedcomponents=6.\par The^{\prime}gsvd^{\prime}objectcontains:$d_{f}ullFullsetofsingularvalues$l_{f}ullFullsetofeigenvalues$dRetainedsetofsingularvalues(k)$lRetainedsetofeigenvalues(k)$uLeftsingularvectors(forrowsofDAT)$vRightsingularvectors(forcolumnsofDAT)$pLeftgeneralizedsingularvectors(forrowsofDAT)$qRightgeneralizedsingularvectors(forcolumnsofDAT)$fiLeftcomponentscores(forrowsofDAT)$fjRightcomponentscores(forcolumnsofDAT)\par Inthenextcodechunk,wecanseethattheresultsforthecommonobjects---thesingularandeigenvalues,thegeneralizedandplainsingularvectors,andthecomponentscores---areallidenticalbetween\code{geigen}and\code{gsvd}.\par\CodeChunk\CodeInput R>all(+all.equal(cor_{p}ca_{g}eigen$l_{f}ull,cor_{p}ca_{g}svd$l_{f}ull),+all.equal(cor_{p}ca_{g}eigen$d_{f}ull,cor_{p}ca_{g}svd$d_{f}ull),+all.equal(cor_{p}ca_{g}eigen$v,cor_{p}ca_{g}svd$v),+all.equal(cor_{p}ca_{g}eigen$q,cor_{p}ca_{g}svd$q),+all.equal(cor_{p}ca_{g}eigen$fj,cor_{p}ca_{g}svd$fj)+)\CodeOutput[1]TRUE\par TheuseofrowconstraintshelpsillustratetheGSVDtriplet.ThepreviouscorrelationPCAexampleis$\mathrm{GSVD(}\frac{1}{I-1}{\bf I},{\bf X},{\bf I}\mathrm{)}$where$\frac{1}{I}{\bf I}$isadiagaonalmatrixof$\frac{1}{I-1}$with$I$asthenumberofrowsof${\bf X}$and${\bf I}$istheidentitymatrix.Inthisparticularcase,therowconstraintsareallequalandthusasimplescalingfactor.Alternatively,wecouldjustdividebythatscalaras\code{gsvd(scaled_{d}ata/sqrt(degrees_{o}f_{f}reedom))}whichisequivalentto\code{geigen(cor(continuous_{d}ata))}.\par ThepreviousPCAtriplet---$\mathrm{GSVD(}\frac{1}{I}{\bf I},{\bf X},{\bf I}\mathrm{)}$---helpsustransitiontoanalternativeviewofcorrelationPCAandanexpandeduseoftheGSVDtriplet.CorrelationPCAcanbereframedascovariancePCAwithadditionalconstraintsimposedonthecolumns.Let^{\prime}sfirstcomputePCAwithrowandcolumnconstraintsvia\code{gsvd},thengointomoredetailontheGSVDtripletanditscomputation.\par\CodeChunk\CodeInput R>centered_{d}ata<-scale(continuous_{d}ata,scale=F)R>degrees_{o}f_{f}reedom<-nrow(centered_{d}ata)-1R>row_{c}onstraints<-rep(1/(degrees_{o}f_{f}reedom),nrow(centered_{d}ata))R>col_{c}onstraints<-degrees_{o}f_{f}reedom/colSums((centered_{d}ata)^{2})R>R>cor_{p}ca_{g}svd_{t}riplet<-gsvd(centered_{d}ata,+LW=row_{c}onstraints,+RW=col_{c}onstraints)\par First,let^{\prime}stesttheequalitybetweenthepreviouscorrelationPCAandthisalternativeapproach.\par\CodeChunk\CodeInput R>mapply(all.equal,cor_{p}ca_{g}svd,cor_{p}ca_{g}svd_{t}riplet)\CodeOutput du"TRUE""TRUE"vd_{f}ull"TRUE""TRUE"l_{f}ulll"TRUE""TRUE"pfi"TRUE""TRUE"qfj"Meanrelativedifference:23.65976""Meanrelativedifference:0.9420834"\par Wecanseethatalmosteverythingisidentical,withtwoexceptions:\code{q}and\code{fj}.That^{\prime}sbecausethedecomposedmatrixisidenticalacrossthetwo,buttheirconstraintsarenot.Thecommonalitiesandthedifferencesstemfromhowtheproblemwasframed.Let^{\prime}sunpackwhatwedidforthethreeversionsofPCAshown.Let^{\prime}sassumethat${\bf X}$isamatrixthathasbeencolumn-wisecentered.Thecovariancematrixfor${\bf X}$is${\bf\Sigma}={\bf X}^{T}(\frac{1}{I-1}{\bf I}){\bf X}$,where${\bf d}=\mathrm{diag\{}{\bf\Sigma}\mathrm{\}}$,whicharethevariancespervariable,and${\bf D}=\mathrm{diag\{}{\bf d}\mathrm{\}}$whichisadiagonalmatrixofthevariances.EachofthethreecorrelationPCAapproachescanbeframedasthefollowinggeneralizeddecompositions:\par\begin{itemize} \par\itemize@item@$\mathrm{GEVD(}{\bf D}^{-\frac{1}{2}}{\bf S}{\bf D}^{-\frac{1}{2}},{\bf I}\mathrm{)}$ \par\itemize@item@$\mathrm{GSVD(}\frac{1}{I-1}{\bf I},{\bf X}{\bf D}^{-\frac{1}{2}},{\bf I}\mathrm{)}$ \par\itemize@item@$\mathrm{GSVD(}\frac{1}{I-1}{\bf I},{\bf X},{\bf D}^{-1}\mathrm{)}$ \par\end{itemize}\par ThesecondwayofframingPCA---$\mathrm{GSVD(}\frac{1}{I-1}{\bf I},{\bf X}{\bf D}^{-\frac{1}{2}},{\bf I}\mathrm{)}$---isthecanonical``Frenchway^{\prime\prime}ofpresentingPCAviatheGSVD:acentereddatamatrix(${\bf X}$)normalized/scaledbythesquarerootofthevariances(${\bf D}^{-\frac{1}{2}}$),withequalweightsof${\frac{1}{I}}$fortherowsandanidentitymatrixforthecolumns.\par ThoughnotalwayspracticalforsimplePCAcomputation,theuseofconstraintsinaGEVDdoubletandGSVDtripletshowsthatwecanframe(orre-frame)varioustechniquesasdecompositionsofdatawithrespecttoconstraints.InthecaseofPCA,thetheconstraint-baseddecompositionshighlightthatwecouldperformPCAwithany(suitable)constraintsappliedtotherowsandcolumns.Soforexample,wemayhave\emph{a priori}knowledgeabouttheobservations(rows)andelecttousedifferentweightsforeachobservation(we^{\prime}llactuallyseethisintheMDSexamples).Wecouldalsousesomeknownorassumedpopulationvarianceascolumnconstraintsinsteadofthesamplevariance,orusecolumnconstraintsasasetofweightstoimposeonthevariablesinthedata.\par$

Metric multidimensional scaling (MDS) is a technique akin to PCA, but specifically for the factorization of distance matrix (torgerson1952multidimensional; borg2005modern). MDS, like PCA, is also an eigen-technique. Like the PCA examples, I show several ways to perform MDS through the generalized approaches; specifically all through the GEVD. But for this particular example we will (eventually) make use of some known or a priori information as the constraints. First we show how to perform MDS as a plain EVD problem and then with constraints as a GEVD problem. In these MDS illustrations, I use a conceptually simpler albeit computationally more expensive approach. For example, I use centering matrices and matrix algebra (in \proglangR) to show the steps. Much more efficient methods exist. In these examples we use the same measures as in the PCA example.

{CodeChunk}{CodeInput}

R> data_for_distances <- synthetic_ONDRI[, + c("TMT_A_sec", "TMT_B_sec", + "Stroop_color_sec", "Stroop_word_sec", + "Stroop_inhibit_sec","Stroop_switch_sec")] R> scaled_data <- scale(data