
HODGE-AWARE CONTRASTIVE LEARNING

Abstract

Simplicial complexes prove effective in modeling data with multiway dependencies, such as data defined along the edges of networks or within other higher-order structures. Their spectrum can be decomposed into three interpretable subspaces via the Hodge decomposition, a property that has proven foundational in numerous applications. We leverage this decomposition to develop a contrastive self-supervised learning approach for processing simplicial data and generating embeddings that encapsulate specific spectral information. Specifically, we encode the pertinent data invariances through simplicial neural networks and devise augmentations that yield positive contrastive examples with suitable spectral properties for downstream tasks. Additionally, we reweight the significance of negative examples in the contrastive loss, considering the similarity of their Hodge components to the anchor. By encouraging a stronger separation among less similar instances, we obtain an embedding space that reflects the spectral properties of the data. Numerical results on two standard edge flow classification tasks show superior performance even when compared to supervised learning techniques. Our findings underscore the importance of adopting a spectral perspective for contrastive learning with higher-order data.

Index Terms—  Hodge Laplacian, simplicial filter, contrastive learning.

1 Introduction

The difficulty of representing many biological, social, and technological networks arises from the complexity of their inherent relational structures and the nuanced definitions of the associated data. Importantly, the data in such networks is often defined on higher-order components like edges and triangles, thus demanding an approach that incorporates interactions beyond the pairwise paradigm [1]. Simplicial complexes (SCs) have been proposed to model such dependencies in higher-order networks [2, 3]. Amongst others, SCs have shown particular efficacy in edge flow applications (e.g., mass, energy, information, or trajectories) for which traditional graph-based techniques do not induce a good inductive bias [4, 5]. Notably, they have been leveraged to alleviate the curse of dimensionality in autoregressive flow prediction for water networks [6] and to remove arbitrage opportunities in currency exchange markets [7].

An important property of SCs is that they enjoy an algebraic representation via the Hodge Laplacian matrices, ultimately allowing for spectral analysis. The latter is achieved through the Hodge decomposition of the spectrum and has been used to develop Hodge-aware signal processing and learning techniques for edge flows and other simplicial data [8, 9, 5]. Recent advances include the development of a simplicial Fourier transform [4], convolutional filters [7], and neural networks that are trained in a (semi-)supervised manner [10, 11, 12, 13].

While these supervised learning (SL) methods for simplicial data have their merits, they also come with limitations such as an over-reliance on labeled examples and suboptimal performance in data-scarce scenarios. SL methods are also mainly designed for a specific task and the resulting output is generally not reusable in different applications. Therefore, we propose a contrastive learning (CL) approach for simplicial data that addresses these issues. To this end, we employ a simplicial convolutional neural network (SCNN) to produce embeddings and optimize it in a self-supervised manner using the InfoNCE loss [14]. The resulting CL approach can use both labeled and unlabeled examples to create robust and reusable representations for simplicial data. These representations can subsequently be employed in various downstream tasks, offering improved accuracy especially in label-scarce scenarios.

To further enhance our method, we propose stochastic augmentations and introduce information about the Hodge decomposition into the embeddings by i) optimizing the parameters of the augmentations so that they generate positive examples that respect the spectral properties of the data; and ii) weighting the negative samples in the InfoNCE loss by a Hodge-aware distance to the anchor (true datum). This approach results in a spectrally organized embedding space and facilitates downstream learning. We corroborate the latter in two edge flow classification settings and show superior performance compared with a fully-supervised model. Related to this, our contribution is threefold:

C1) we propose simplicial contrastive learning (SCL), design related augmentations, and experimentally validate all our approaches;

C2) we show how augmentations in the simplicial domain can be optimized with respect to the Hodge decomposition;

C3) we introduce a reweighting of the negative examples based on the similarity of their Hodge components to encourage a spectrally organized embedding space.

2 Data Processing on Simplicial Complexes

In this section, we introduce the fundamental concepts behind simplicial data structures and related data processing techniques.

2.1 Simplicial Data and Structures

Let ${\mathcal{V}}=\{1,\ldots,N\}$ be a set of vertices. A $k$-simplex ${\mathcal{S}}^{k}$ is a subset of ${\mathcal{V}}$ that contains $k+1$ distinct elements. A simplicial complex ${\mathcal{X}}^{K}$ of order $K$ is a collection of simplices such that it contains at least one $K$-simplex and, if ${\mathcal{S}}^{k}\in{\mathcal{X}}^{K}$, all subsets of ${\mathcal{S}}^{k}$ are also elements of ${\mathcal{X}}^{K}$. From a geometric representation perspective, in an SC of order $K=2$, nodes are 0-simplices, edges are 1-simplices, and (filled) triangles are 2-simplices [2, 3].

Neighborhood relations in an SC can be represented via the incidence matrices and Hodge Laplacians. Specifically, the incidence matrix $\mathbf{B}_{k}\in\mathbb{R}^{N_{k-1}\times N_{k}}$ captures the adjacencies between $(k-1)$- and $k$-simplices. Consequently, $\mathbf{B}_{1}$ represents the node-to-edge incidence matrix and $\mathbf{B}_{2}$ the edge-to-triangle incidence matrix. For an SC of order two, ${\mathcal{X}}^{2}$, the related Hodge Laplacians are defined as:

$${\mathbf{L}}_{0}={\mathbf{B}}_{1}{\mathbf{B}}_{1}^{\top},\qquad {\mathbf{L}}_{1}={\mathbf{L}}_{1,\ell}+{\mathbf{L}}_{1,u}:={\mathbf{B}}_{1}^{\top}{\mathbf{B}}_{1}+{\mathbf{B}}_{2}{\mathbf{B}}_{2}^{\top},\qquad {\mathbf{L}}_{2}={\mathbf{B}}_{2}^{\top}{\mathbf{B}}_{2}$$

where $\mathbf{L}_{0}$, $\mathbf{L}_{1}$, $\mathbf{L}_{2}$ represent the neighborhood relationships between nodes, edges, and triangles, respectively. Moreover, $\mathbf{L}_{0}$ coincides with the standard graph Laplacian [3]. The lower Laplacian $\mathbf{L}_{1,\ell}=\mathbf{B}_{1}^{\top}\mathbf{B}_{1}$ and the upper Laplacian $\mathbf{L}_{1,u}=\mathbf{B}_{2}\mathbf{B}_{2}^{\top}$ split the edge adjacencies into relations that are due to common vertices and common triangles, respectively.
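To make these definitions concrete, the following sketch (ours, not from the paper) builds the incidence matrices and Hodge Laplacians for a toy order-2 complex with four nodes, five oriented edges, and one filled triangle.

```python
import numpy as np

# Toy simplicial complex of order 2 (illustrative only):
# nodes {0,1,2,3}, edges (0,1),(0,2),(1,2),(1,3),(2,3), one filled triangle (0,1,2).
# Node-to-edge incidence B1 (4 x 5): column for edge (m,n), m<n, has -1 at m and +1 at n.
B1 = np.array([[-1, -1,  0,  0,  0],
               [ 1,  0, -1, -1,  0],
               [ 0,  1,  1,  0, -1],
               [ 0,  0,  0,  1,  1]], dtype=float)

# Edge-to-triangle incidence B2 (5 x 1): boundary of (0,1,2) is +e(0,1) - e(0,2) + e(1,2).
B2 = np.array([[1], [-1], [1], [0], [0]], dtype=float)

# Hodge Laplacians.
L0     = B1 @ B1.T        # graph Laplacian (node level)
L1_low = B1.T @ B1        # lower edge Laplacian (adjacency via shared vertices)
L1_up  = B2 @ B2.T        # upper edge Laplacian (adjacency via shared triangles)
L1     = L1_low + L1_up   # 1-Hodge Laplacian
L2     = B2.T @ B2        # triangle Laplacian

assert np.allclose(B1 @ B2, 0)   # boundary-of-boundary condition of a valid SC
```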

A $k$-simplicial signal $x^{k}$ is a mapping from a $k$-simplex to the set of real numbers that formalizes the simplicial data. We collect all $k$-simplicial signals in the vector ${\mathbf{x}}^{k}=[x^{k}_{1},\ldots,x^{k}_{N_{k}}]^{\top}$, where $x_{i}^{k}$ is the signal on the $i$th simplex and $N_{k}$ is the total number of $k$-simplices. For instance, we denote an edge flow as ${\mathbf{x}}^{1}=[x^{1}_{1},\ldots,x^{1}_{N_{1}}]^{\top}$ with $x^{1}_{e}$ being the flow on the edge $e=(m,n)$ in $\mathcal{S}^{1}$. In the sequel, we focus on edge flows to ease exposition and because of their wider applicability; thus, we drop the superscript and denote them as ${\mathbf{x}}$.

2.2 The Hodge Decomposition

SCs allow for a spectral processing of simplicial signals via the Hodge decomposition, which decomposes the space ${\mathbb{R}}^{N_{k}}$ as:

$${\mathbb{R}}^{N_{k}}=\text{im}({\mathbf{B}}_{k}^{\top})\oplus\text{im}({\mathbf{B}}_{k+1})\oplus\text{ker}({\mathbf{L}}_{k}) \qquad (1)$$

where $\text{im}(\cdot)$ and $\text{ker}(\cdot)$ are the image and kernel spaces of a matrix and $\oplus$ is the direct sum of vector spaces [4, 7]. Accordingly, we can decompose any edge flow ${\mathbf{x}}$ into three parts ${\mathbf{x}}={\mathbf{x}}_{\rm G}+{\mathbf{x}}_{\rm C}+{\mathbf{x}}_{\rm H}$, each living in an orthogonal subspace: the gradient space ${\mathbf{x}}_{\rm G}\in\text{im}({\mathbf{B}}_{1}^{\top})$, the curl space ${\mathbf{x}}_{\rm C}\in\text{im}({\mathbf{B}}_{2})$, and the harmonic space ${\mathbf{x}}_{\rm H}\in\text{ker}({\mathbf{L}}_{1})$. In turn, the 1-Hodge Laplacian can be eigendecomposed as ${\mathbf{L}}_{1}={\mathbf{U}}\boldsymbol{\Lambda}{\mathbf{U}}^{\top}$ with eigenvectors ${\mathbf{U}}$ and eigenvalues $\boldsymbol{\Lambda}$. Grouping the eigenvectors as ${\mathbf{U}}=[{\mathbf{U}}_{\rm G},{\mathbf{U}}_{\rm C},{\mathbf{U}}_{\rm H}]$ and projecting the edge flow onto them gives rise to the embeddings $\tilde{\mathbf{x}}_{\rm G}=\mathbf{U}_{\rm G}^{\top}\mathbf{x}$, $\tilde{\mathbf{x}}_{\rm C}=\mathbf{U}_{\rm C}^{\top}\mathbf{x}$, $\tilde{\mathbf{x}}_{\rm H}=\mathbf{U}_{\rm H}^{\top}\mathbf{x}$. These embeddings encode different interpretable properties of the data, which we will exploit in Sec. 4 to design augmentations. Refer to [4] for more detail on the role of the Hodge decomposition in processing simplicial data.
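A minimal sketch of how the Hodge embeddings can be obtained numerically from $\mathbf{L}_{1}$ and $\mathbf{B}_{1}$; the function name and tolerance are ours, and we assume for simplicity that the lower and upper spectra share no nonzero eigenvalue, so each eigenvector returned by the solver lies entirely in one subspace.

```python
import numpy as np

def hodge_embeddings(L1, B1, x, tol=1e-8):
    """Split an edge flow x into its gradient, curl, and harmonic Hodge embeddings.
    Assumes no nonzero eigenvalue is shared between the lower and upper Laplacian
    spectra; otherwise the eigenvectors should be projected explicitly onto
    im(B1^T) and im(B2)."""
    eigvals, U = np.linalg.eigh(L1)
    is_harm = np.abs(eigvals) < tol                   # ker(L1): harmonic space
    is_grad = np.linalg.norm(B1 @ U, axis=0) > tol    # nonzero divergence -> im(B1^T)
    is_curl = ~is_harm & ~is_grad                     # remaining columns -> im(B2)
    U_G, U_C, U_H = U[:, is_grad], U[:, is_curl], U[:, is_harm]
    return (U_G, U_C, U_H), (U_G.T @ x, U_C.T @ x, U_H.T @ x)

# Example with the toy complex built above and an arbitrary edge flow x:
# (U_G, U_C, U_H), (xt_G, xt_C, xt_H) = hodge_embeddings(L1, B1, x)
# x_G, x_C, x_H = U_G @ xt_G, U_C @ xt_C, U_H @ xt_H   # then x == x_G + x_C + x_H
```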

2.3 Simplicial Convolutional Neural Networks

Simplicial convolutional filters are linear parametric mappings that can process simplicial signals [7]. For an input edge flow ${\mathbf{x}}$, the filtered signal is ${\mathbf{y}}={\mathbf{H}}({\mathbf{L}}_{1}){\mathbf{x}}$ with the simplicial filtering matrix:

$${\mathbf{H}}({\mathbf{L}}_{1}):=\epsilon\mathbf{I}+\sum_{l_{1}=0}^{L_{1}}\alpha_{l_{1}}{\mathbf{L}}_{1,\ell}^{l_{1}}+\sum_{l_{2}=0}^{L_{2}}\beta_{l_{2}}{\mathbf{L}}_{1,u}^{l_{2}} \qquad (2)$$

Here, $\{\epsilon,\alpha_{l_{1}},\beta_{l_{2}}\}$ are filter coefficients and $\mathbf{L}_{1,\ell}$, $\mathbf{L}_{1,u}$ propagate the edge signal via their respective neighbourhood relations. The filter is local in the sense that it moves information by at most $L=\max\{L_{1},L_{2}\}$ hops across the simplicial structure. Moreover, by exploiting the recursions ${\mathbf{L}}_{1,\ell}^{l_{1}}{\mathbf{x}}={\mathbf{L}}_{1,\ell}({\mathbf{L}}_{1,\ell}^{l_{1}-1}{\mathbf{x}})$ and ${\mathbf{L}}_{1,u}^{l_{2}}{\mathbf{x}}={\mathbf{L}}_{1,u}({\mathbf{L}}_{1,u}^{l_{2}-1}{\mathbf{x}})$, and since the Laplacian matrices are sparse, we can obtain the output at a cost of order ${\mathcal{O}}(LN_{1})$. The filter part related to the lower Laplacian acts on the signal's gradient embedding, whereas that related to the upper Laplacian acts on the curl embedding. All the parameters contribute to processing the signal's harmonic embedding. Refer to [7, Sec. IV] for the specific relation between filter (2) and decomposition (1).
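A sketch of filter (2) implemented with the recursions above; the coefficient values in the usage comment are arbitrary placeholders.

```python
import numpy as np

def simplicial_filter(x, L1_low, L1_up, eps, alpha, beta):
    """Apply y = H(L1) x as in (2), using the recursions L^l x = L (L^{l-1} x)
    so that only (sparse) matrix-vector products are needed.
    alpha, beta: coefficient lists for the lower/upper Laplacian powers, from l = 0."""
    y = eps * x
    t = x.copy()
    for a in alpha:          # lower part: acts on the gradient embedding
        y = y + a * t
        t = L1_low @ t
    t = x.copy()
    for b in beta:           # upper part: acts on the curl embedding
        y = y + b * t
        t = L1_up @ t
    return y

# Example (arbitrary coefficients, L1 = L2 = 1):
# y = simplicial_filter(x, L1_low, L1_up, eps=0.5, alpha=[0.3, 0.2], beta=[0.4, 0.1])
```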

The constant number of parameters and the linear computational complexity in the number of edges make the filter (2) an appealing solution to learn representations from edge flows. To also learn non-linear mappings, simplicial convolutional neural networks (SCNNs) have been developed as layered structures interleaving filters with pointwise non-linearities [13]. Specifically, given the SCNN input ${\mathbf{x}}_{0}:={\mathbf{x}}$, the propagation rule at each layer $t$ is:

$$\mathbf{x}_{t}=\sigma\big({\mathbf{H}}_{t}({\mathbf{L}}_{1})\mathbf{x}_{t-1}\big) \qquad (3)$$

where $\sigma(\cdot)$ is a pointwise nonlinearity (e.g., ReLU). The output of the final layer constitutes the SCNN embedding. Compactly, we will denote the SCNN input-output relation as ${\mathbf{h}}:=f_{\mathcal{H}}({\mathbf{x}})$, where ${\mathbf{h}}$ is referred to as the SCNN embedding and the set ${\mathcal{H}}:=\{\epsilon_{t},\{\alpha_{l_{1},t}\},\{\beta_{l_{2},t}\}\}_{t}$ collects the parameters of all layers. Furthermore, depending on the setting, a readout function is applied to ${\mathbf{h}}$ to transform it for the task at hand (e.g., binary classification).
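The following is a minimal single-feature PyTorch sketch of the SCNN in (3); the class names are ours, and the encoder used in the experiments (hidden size, multiple features, readout) may differ.

```python
import torch
import torch.nn as nn

class SCNNLayer(nn.Module):
    """One SCNN layer x_t = tanh(H_t(L1) x_{t-1}) with learnable filter coefficients.
    L1_low, L1_up are (N1 x N1) float tensors; x is a flow vector of length N1."""
    def __init__(self, L1_low, L1_up, order_low, order_up):
        super().__init__()
        self.L1_low, self.L1_up = L1_low, L1_up
        self.eps = nn.Parameter(0.1 * torch.randn(1))
        self.alpha = nn.Parameter(0.1 * torch.randn(order_low + 1))
        self.beta = nn.Parameter(0.1 * torch.randn(order_up + 1))

    def forward(self, x):
        y = self.eps * x
        t = x
        for a in self.alpha:          # lower-Laplacian terms
            y = y + a * t
            t = self.L1_low @ t
        t = x
        for b in self.beta:           # upper-Laplacian terms
            y = y + b * t
            t = self.L1_up @ t
        return torch.tanh(y)

class SCNN(nn.Module):
    """Stack of SCNN layers; h = f_H(x) is the simplicial embedding."""
    def __init__(self, L1_low, L1_up, num_layers=2, order=2):
        super().__init__()
        self.layers = nn.ModuleList(
            [SCNNLayer(L1_low, L1_up, order, order) for _ in range(num_layers)])

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x
```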

3 Problem Statement

Learning representations via the SCNN is typically done in a supervised manner, but we often have just a few labeled examples or no labels at all. To tackle this challenge, we resort to contrastive learning and propose a self-supervised learning approach for simplicial complexes. Our problem statement reads as follows:

Given a set of unlabeled edge flows, we want to train a simplicial convolutional neural network in a self-supervised manner to generate embeddings that reflect the Hodge-properties of the data and that can be used in a downstream task.

We approach this problem by training the network with the contrastive InfoNCE loss and by designing augmentations that preserve the desired spectral properties. We also reweight the negative samples in the loss to push apart spectrally-different embeddings.

Fig. 1: InfoNCE learning with augmentations. The data point (anchor) ${\mathbf{x}}$ undergoes two transformations ${\mathcal{T}}_{1\backslash 2}(\cdot)$ to generate positive augmented examples. The latter are first passed through the SCNN $f_{\mathcal{H}}(\cdot)$ to generate the simplicial embeddings ${\mathbf{h}}$ (cf. (3)) and then through a parametric map $g_{\mathcal{H}}(\cdot)$ to obtain the final representation ${\mathbf{z}}$. These representations are contrasted in loss (4) to train both $f_{\mathcal{H}}(\cdot)$ and $g_{\mathcal{H}}(\cdot)$.

4 Simplicial Contrastive Learning

To train an SCNN in a self-supervised manner, we resort to the contrastive learning framework [15, 16]. In the simplicial setting, this principle suggests that for each edge flow datum ${\mathbf{x}}$ we create both positive and negative examples and train the SCNN (a.k.a. the encoder) to map the positive embeddings close to each other and the negative ones farther apart. Specifically, we consider an edge flow datum ${\mathbf{x}}$ with its final representation ${\mathbf{z}}=g_{{\mathcal{H}}}({\mathbf{h}})=g_{\mathcal{H}}(f_{\mathcal{H}}({\mathbf{x}}))$, where $f_{{\mathcal{H}}}(\cdot)$ is the SCNN [cf. (3)] and $g_{\mathcal{H}}(\cdot)$ is a parametric map (typically a fully connected layer). Then, we create a positive pair for ${\mathbf{x}}$ via two topological augmentations $\mathbf{x}_{i}^{\prime}=\mathcal{T}_{1}(\mathbf{x})$ and $\mathbf{x}_{j}^{\prime}=\mathcal{T}_{2}(\mathbf{x})$ with respective representations ${\mathbf{z}}_{i}^{\prime}=g_{\mathcal{H}}(f_{\mathcal{H}}({\mathbf{x}}_{i}^{\prime}))$ and ${\mathbf{z}}_{j}^{\prime}=g_{\mathcal{H}}(f_{\mathcal{H}}({\mathbf{x}}_{j}^{\prime}))$. The negative examples w.r.t. ${\mathbf{x}}$ consist of other edge flows from the dataset, ${\mathbf{x}}_{m}\neq{\mathbf{x}}$, and their augmentations. With reference to Fig. 1, the overall network is trained to minimize the so-called temperature-scaled InfoNCE objective:

$$\mathcal{L}_{\text{InfoNCE}}=-\sum_{(\mathbf{x}_{i}^{\prime},\mathbf{x}_{j}^{\prime})\in\mathbb{P}}\log\left(\frac{e^{{\rm sim}(\mathbf{z}_{i}^{\prime},\mathbf{z}_{j}^{\prime})/\tau}}{\sum_{m=1}^{M}e^{{\rm sim}(\mathbf{z}^{\prime}_{i},\mathbf{z}_{m})/\tau}}\right) \qquad (4)$$

where $\mathbb{P}$ is the set of all positive pairs in the data, ${\rm sim}(\mathbf{u},\mathbf{v})=\mathbf{u}^{\top}\mathbf{v}/\|\mathbf{u}\|_{2}\|\mathbf{v}\|_{2}$ is the cosine similarity, $\tau$ is a temperature parameter, and $M$ is the number of negative examples ${\mathbf{x}}_{m}$ with representations ${\mathbf{z}}_{m}$. The numerator encourages the network to map the positive embeddings close to each other, while the denominator repulses the negative embeddings ${\mathbf{z}}_{m}$ from the positive one ${\mathbf{z}}_{i}^{\prime}$.
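A small PyTorch sketch of one term of loss (4); it follows (4) as written, with only the negatives in the denominator, whereas some contrastive implementations also include the positive pair there.

```python
import torch
import torch.nn.functional as F

def info_nce_term(z_i, z_j, z_neg, tau=0.5):
    """One summand of (4) for a positive pair (z_i, z_j).
    z_i, z_j: 1-D representations of the two augmented views;
    z_neg: (M, d) tensor with the representations of the M negative examples."""
    pos = F.cosine_similarity(z_i, z_j, dim=0) / tau                 # sim(z_i', z_j')/tau
    neg = F.cosine_similarity(z_i.unsqueeze(0), z_neg, dim=1) / tau  # (M,) sim to negatives
    # -log( exp(pos) / sum_m exp(neg_m) ), with logsumexp for numerical stability.
    return -(pos - torch.logsumexp(neg, dim=0))
```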

The InfoNCE optimizes a lower bound on the mutual information between the representations of the positive pairs [14]. Hence, augmentations should only preserve the information necessary to perform well on a downstream task [17]. Consequently, the irrelevant information is destroyed and the mutual information encoded in the representations is optimized for the task at hand. Common augmentations in contrastive learning are stochastic and include masking part of the data (pixels, vertices, edges, node features, etc.), which expresses the belief that the most important parts of the data are preserved when a few connections or feature values are removed [16, 18]. While this preserves, in probability, crucial information and works well empirically, note that the most relevant parts of a data point may not always remain intact. To mitigate this, the dropout probabilities are often chosen such that the most important properties of the data are conserved more often (e.g., in graphs nodes with a high centrality) instead of randomly removing information [19, 20, 21, 22]. With this in place, we propose the following augmentation method.

Edge flow masking. This method masks edge flows with probabilities $\mathbf{p}$ to generate a positive example. That is, $\mathbf{x}^{\prime}={\mathcal{T}}({\mathbf{x}}):=\mathbf{x}\circ\mathbf{e}$, where $\mathbf{e}$ is a random Bernoulli vector with entries $\mathbf{e}_{i}\sim{\rm Ber}(\mathbf{p}_{i})$ and $\circ$ is the elementwise product. The standard approach is to pick the same masking probability for all edges.

Such an augmentation is more effective in settings with binary flow types $\{-1,1\}$ or when a zero value is not indicative. This is because the masked flow attains a zero value and the augmentation effectively drops parts of the flow. Next, we show how to optimize the masking probabilities w.r.t. the flow's Hodge-related information.
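A sketch of the masking augmentation; we interpret $\mathbf{e}_{i}\sim{\rm Ber}(\mathbf{p}_{i})$ as the flow on edge $i$ surviving the mask with probability $\mathbf{p}_{i}$, the reading implied by $\mathbb{E}[\mathbf{x}\circ\mathbf{e}]=\mathbf{x}\circ\mathbf{p}$ in Sec. 4.1.

```python
import numpy as np

def mask_edge_flow(x, p, rng=None):
    """Augmentation T(x) = x ∘ e with e_i ~ Ber(p_i): the flow on edge i
    is kept with probability p_i and set to zero otherwise.
    Uniform masking uses the same p_i for all edges, e.g. p = 0.8 * np.ones_like(x)."""
    rng = np.random.default_rng() if rng is None else rng
    e = rng.binomial(1, p, size=x.shape)
    return x * e
```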

4.1 Hodge-Aware Spectral Augmentations

Simplicial data often have particular properties in one of the three Hodge embeddings [3, 23], which may be wrongly affected by augmentations if ignored. Hence, following the InfoMin principle, simplicial augmentations should destroy information on irrelevant Hodge embeddings and preserve it on the others. To account for the latter, we cast the choice of the dropout probabilities as a stochastic optimization problem, considering the expected difference between the Hodge embeddings of the generated examples and those of the anchor. Specifically, consider the embeddings of an augmented example $\tilde{\mathbf{x}}^{\prime}_{\rm G}=\mathbf{U}_{\rm G}^{\top}\mathbf{x}^{\prime}$, $\tilde{\mathbf{x}}^{\prime}_{\rm C}=\mathbf{U}_{\rm C}^{\top}\mathbf{x}^{\prime}$, $\tilde{\mathbf{x}}^{\prime}_{\rm H}=\mathbf{U}_{\rm H}^{\top}\mathbf{x}^{\prime}$. Then the expressions $\mathcal{L}_{\rm G}(\mathbf{p})=\mathbb{E}[\|\tilde{\mathbf{x}}_{\rm G}-\tilde{\mathbf{x}}^{\prime}_{\rm G}\|_{2}^{2}]$, $\mathcal{L}_{\rm C}(\mathbf{p})=\mathbb{E}[\|\tilde{\mathbf{x}}_{\rm C}-\tilde{\mathbf{x}}^{\prime}_{\rm C}\|_{2}^{2}]$, $\mathcal{L}_{\rm H}(\mathbf{p})=\mathbb{E}[\|\tilde{\mathbf{x}}_{\rm H}-\tilde{\mathbf{x}}^{\prime}_{\rm H}\|_{2}^{2}]$ quantify the expected quadratic differences between the original and the augmented Hodge embeddings.¹ We use these distances to optimize the probabilities based on prior knowledge such that one or more of the augmented embeddings are similar to the anchor's while the remaining ones differ. For example, when the curl and harmonic embeddings are important, such as in the trajectory prediction task that we consider in the experiments (Sec. 5), we could design ${\mathbf{p}}$ by solving:

$$\begin{array}{ll}
\min_{\mathbf{p}} & -\mathcal{L}_{\rm G}(\mathbf{p})+\mathcal{L}_{\rm C}(\mathbf{p})+\mathcal{L}_{\rm H}(\mathbf{p}) \qquad\qquad\qquad\qquad\quad \text{(5a)}\\
\text{subject to} & \mathbf{p}\in\mathcal{G}_{\mathbf{p}}:=\left\{\mathbf{p}\mid\mathbf{p}\in[0,1]^{N_{1}},\ \|\mathbf{p}\|_{1}\leq\epsilon_{\mathbf{p}}\right\}, \qquad \text{(5b)}
\end{array}$$

where the set $\mathcal{G}_{\mathbf{p}}$ puts a maximum budget $\epsilon_{\mathbf{p}}$ on the allocated drop probabilities in a sparse manner (i.e., the $\ell_{1}$-norm $\|\cdot\|_{1}$ imposes sparse probabilities on a few flows). The budget $\epsilon_{\mathbf{p}}$ is tuned as a hyperparameter.² By solving (5), we find the dropout probabilities $\mathbf{p}$ that generate examples with similar curl and harmonic components to the original data point but with a different gradient component. We solve this optimization problem with projected gradient descent, projecting $\mathbf{p}$ onto the constraint set ${\mathcal{G}}_{{\mathbf{p}}}$ after every step.

¹ To see that these indeed depend on $\mathbf{p}$, use the equality $\mathrm{Tr}({\mathbf{X}}{\mathbf{X}}^{\top})=\|{\mathbf{X}}\|_{2}^{2}$ and recall $\mathbb{E}[\mathbf{x}\circ\mathbf{e}]=\mathbf{x}\circ\mathbf{p}$. Then $\mathbb{E}[\|\tilde{\mathbf{x}}_{\rm G}-\tilde{\mathbf{x}}^{\prime}_{\rm G}\|_{2}^{2}]=\|\tilde{\mathbf{x}}_{\rm G}\|_{2}^{2}-\mathrm{Tr}(\mathbf{U}_{\rm G}^{\top}\mathbf{x}(\mathbf{x}\circ\mathbf{p})^{\top}\mathbf{U}_{\rm G})-\mathrm{Tr}(\mathbf{U}_{\rm G}^{\top}(\mathbf{x}\circ\mathbf{p})\mathbf{x}^{\top}\mathbf{U}_{\rm G})+\mathrm{Tr}(\mathbf{U}_{\rm G}^{\top}\mathbf{P}\mathbf{U}_{\rm G})$, where $\mathbf{P}$ is a matrix with entries $\mathbf{P}_{i,j}=\mathbf{x}_{i}\mathbf{x}_{j}\mathbf{p}_{i}\mathbf{p}_{j}$ for $i\neq j$ and $\mathbf{P}_{i,i}=(\mathbf{x}_{i})^{2}\mathbf{p}_{i}$.

² In (5a), we could also weight the contributions of the different components (e.g., $\alpha_{\rm C}\mathcal{L}_{\rm C}(\mathbf{p})+\alpha_{\rm H}\mathcal{L}_{\rm H}(\mathbf{p})$ with scalars $\alpha_{\rm C},\alpha_{\rm H}>0$) when the signals have some contribution in each of them.
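As an illustration, the sketch below solves a problem of the form (5) by projected gradient descent in PyTorch. The expected distances are evaluated in closed form (squared mean plus variance of the masked flow, equivalent to the trace expression in footnote 1), and the projection onto $\mathcal{G}_{\mathbf{p}}$ is a simple clip-then-rescale heuristic rather than an exact Euclidean projection; the step size, iteration count, and initialization are placeholders.

```python
import torch

def expected_hodge_gap(p, x, U_sub):
    """Closed-form E||U_sub^T x - U_sub^T (x ∘ e)||^2 with e_i ~ Ber(p_i):
    squared mean of the masked residual plus its variance term."""
    mean_part = U_sub.T @ (x * (1.0 - p))
    var_part = (x**2 * p * (1.0 - p)) * (U_sub**2).sum(dim=1)
    return mean_part.pow(2).sum() + var_part.sum()

def optimize_probabilities(x, U_G, U_C, U_H, eps_budget, steps=500, lr=0.05):
    """Sketch of (5): keep the curl and harmonic embeddings of the augmentation
    close to the anchor's while letting the gradient one differ.
    x, U_G, U_C, U_H are float tensors; the projection onto
    {p in [0,1]^N1, ||p||_1 <= eps_budget} is a clip-then-rescale heuristic."""
    p = torch.full_like(x, 0.5, requires_grad=True)
    for _ in range(steps):
        loss = (-expected_hodge_gap(p, x, U_G)
                + expected_hodge_gap(p, x, U_C)
                + expected_hodge_gap(p, x, U_H))
        loss.backward()
        with torch.no_grad():
            p -= lr * p.grad                 # gradient step
            p.clamp_(0.0, 1.0)               # box constraint
            if p.sum() > eps_budget:         # approximate l1-budget projection
                p *= eps_budget / p.sum()
            p.grad.zero_()
    return p.detach()
```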

4.2 Hodge-Aware Debiasing

Problem (5) influences the embedding space by acting on the augmentation functions ${\mathcal{T}}_{1\backslash 2}(\cdot)$ to generate better positive examples. To further improve the organization of the embedding space, we also act on the negative samples. This is known as a debiasing technique and consists of reweighting the negative samples in the InfoNCE loss [24, 25, 26]; that is, we optimize the loss:

$$\mathcal{L}_{\text{weighted}}=-\sum_{(\mathbf{x}_{i}^{\prime},\mathbf{x}_{j}^{\prime})\in\mathbb{P}}\log\left(\frac{e^{{\rm sim}(\mathbf{z}_{i}^{\prime},\mathbf{z}_{j}^{\prime})/\tau}}{\sum_{m=1}^{M}w(\mathbf{x}_{i},\mathbf{x}_{m})\,e^{{\rm sim}(\mathbf{z}^{\prime}_{i},\mathbf{z}_{m})/\tau}}\right) \qquad (6)$$

where $w(\mathbf{x}_{i},\mathbf{x}_{m})$ is the weighting term between the anchor ${\mathbf{x}}_{i}$ and the negative example ${\mathbf{x}}_{m}$. For Hodge-aware learning, this weight should reflect the spectral properties of the data; thus, we would like to push spectrally different samples further away from the anchor.

We first consider a weighted embedding similarity between two data points:

$$\mathcal{S}(\tilde{\mathbf{x}}_{i},\tilde{\mathbf{x}}_{m})=\gamma_{\rm H}\,\mathrm{CD}(\tilde{\mathbf{x}}_{{\rm H},i},\tilde{\mathbf{x}}_{{\rm H},m})+\gamma_{\rm G}\,\mathrm{CD}(\tilde{\mathbf{x}}_{{\rm G},i},\tilde{\mathbf{x}}_{{\rm G},m})+\gamma_{\rm C}\,\mathrm{CD}(\tilde{\mathbf{x}}_{{\rm C},i},\tilde{\mathbf{x}}_{{\rm C},m}) \qquad (7)$$

where $\mathrm{CD}(\tilde{\mathbf{x}}_{i},\tilde{\mathbf{x}}_{m})=1-\frac{\tilde{\mathbf{x}}^{\top}_{i}\tilde{\mathbf{x}}_{m}}{\|\tilde{\mathbf{x}}_{i}\|_{2}\|\tilde{\mathbf{x}}_{m}\|_{2}}$ is the cosine distance and the weights $\gamma_{\rm H},\gamma_{\rm G},\gamma_{\rm C}\geq 0$ are picked based on prior knowledge about the task or tuned as hyperparameters. Choosing the cosine distance leads to higher values for spectrally more dissimilar negative examples. Then, we compute the weight as the normalized similarity over the $M$ negative samples:

$$w(\mathbf{x}_{i},\mathbf{x}_{m})=\frac{\mathcal{S}(\tilde{\mathbf{x}}_{i},\tilde{\mathbf{x}}_{m})}{\sum_{m^{\prime}=1}^{M}\mathcal{S}(\tilde{\mathbf{x}}_{i},\tilde{\mathbf{x}}_{m^{\prime}})} \qquad (8)$$

which means that the weights for each datum sum to one, ensuring that the loss terms for different data points are comparable. Summarizing the above, substituting (7) into (8) and the latter into (6) leads to an overall contrastive loss that pushes spectrally different samples away from the anchor based on this dissimilarity score. This encourages an embedding space that is organized with respect to the Hodge decomposition.
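A sketch of the weight computation in (7)–(8) from the Hodge embeddings of the anchor and the negatives; the $\gamma$ values are placeholders, and the resulting weights multiply the corresponding exponentials in the denominator of (6).

```python
import numpy as np

def cosine_distance(a, b, eps=1e-12):
    """CD(a, b) = 1 - cos(a, b)."""
    return 1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps)

def hodge_aware_weights(x_anchor, X_neg, U_G, U_C, U_H,
                        gamma_G=1.0, gamma_C=1.0, gamma_H=1.0):
    """Weights w(x_i, x_m) of (8): the Hodge-embedding dissimilarity (7) between
    the anchor and each negative, normalized over the M negatives.
    X_neg is an (M, N1) array whose rows are the negative edge flows."""
    def embed(x):
        return U_G.T @ x, U_C.T @ x, U_H.T @ x
    g_i, c_i, h_i = embed(x_anchor)
    scores = []
    for x_m in X_neg:
        g_m, c_m, h_m = embed(x_m)
        scores.append(gamma_G * cosine_distance(g_i, g_m)
                      + gamma_C * cosine_distance(c_i, c_m)
                      + gamma_H * cosine_distance(h_i, h_m))
    scores = np.array(scores)
    return scores / scores.sum()        # weights sum to one over the negatives
```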

Fig. 2: Embedding distance for augmentations sampled with uniform probabilities (blue) and with the proposed spectrally optimized ones (orange). In the harmonic embedding, the distribution associated with the spectral edge drop places more probability mass on smaller differences; we are thus more likely to generate samples with similar harmonic components than with the standard method.

5 Numerical Results

We corroborate the proposed approach and compare it with supervised alternatives on two edge flow classification tasks: i) a synthetic task considering trajectories on a map; and ii) a real-data case involving ocean drifters moving around the island of Madagascar. Because of the holes in the SC representations, the harmonic embedding captures important information for solving the task. Due to the limited space, we refer the reader to [27, §5] for more details.

Setup. For the trajectory dataset, we generate 200 training, 100 validation, and 100 test data points, while there are 160 training and 40 test data points for the ocean drifters. We train the unsupervised simplicial contrastive learner (SCL) on all available unlabeled data points and fit a linear support vector machine (SVM) on the obtained embeddings. For the ocean drifters, we use a 10-fold cross-validation on the training set to estimate the penalty parameter of the SVM. We report the average accuracy over 16 data splits. We optimize the network with stochastic gradient descent and grid search the learning rate and weight decay in the interval $[10^{-5},1]$ in decimal steps. Furthermore, we select the edge flow drop probabilities ${\mathbf{p}}$ and the perturbation budget $\epsilon_{{\mathbf{p}}}$ from $[0.1,0.7]$. All models are trained for 200 epochs with a batch size of 100. For the encoder, we follow the setting in [12] (which is supervised therein) and use a Tanh activation and a hidden layer of size 64. We tune the number of layers and the convolutional orders $L_{1}=L_{2}$ in $\{1,2,3\}$. We compare the proposed approach with a fully-supervised SCNN and conduct an extensive analysis to understand the role of the different components.

Results. Table 1 depicts the overall performance on the downstream tasks. The spectral simplicial contrastive learner (SSCL$_{\text{Spec}}$), trained with reweighted negative samples and spectrally optimized probabilities, achieves the best downstream accuracy on both datasets. This shows the ability of the proposed approach to effectively encode more relevant Hodge-related information into the embeddings, facilitating the subsequent linear learner. Fig. 2 further reinforces this aspect by showing the embedding distance between the anchor and two different augmentation techniques (random edge drop and proposed). The proposed approach generates more similar harmonic embeddings, which is key to the obtained results for the task. In Fig. 3, we show that the proposed approach consistently achieves a superior performance independent of the augmentation quality.

Notably, even for models trained without a reweighted loss, the incorporation of spectrally optimized augmentations (SCL$_{\text{Spec}}$) improves the accuracy over uniform probabilities (SCL). This substantiates the importance of the spectral augmentations as a standalone feature. Furthermore, to evaluate the impact of the encoder, we tested a learner that uses only the lower Laplacian encoding (SCL$_{\text{low}}$), omitting triangle relationships. Compared to its simplicial counterpart SCL under identical conditions, SCL$_{\text{low}}$ manifests a noticeable decrease in performance. This demonstrates that the structural advantages of simplicial networks in processing flow data transfer to the contrastive learning setting.

Fig. 3: Performance comparison between a model trained with the standard loss (SCL) (cf. (4)) vs. a model trained with a spectrally reweighted loss (SSCL) (cf. (6)) for an edge drop augmentation. The SSCL outperforms the SCL irrespective of the augmentation quality.
Table 1: Test accuracies for the Trajectory and Ocean Drifters datasets. SCL denotes models trained with the standard InfoNCE loss (cf. (4)), while SSCL models are trained with spectrally reweighted negatives (cf. (6)). The subscript Spec denotes that the augmentation probabilities are spectrally optimized.
Model | Trajectory Task | Ocean Drifters
SSCL$_{\text{Spec}}$ (ours) | $97.9\pm 0.3$ | $90.3\pm 1.4$
SCNN (supervised) | $95.2\pm 0.5$ | $78.5\pm 1.1$
SSCL | $96.8\pm 0.4$ | $89.1\pm 1.0$
SCL$_{\text{Spec}}$ | $98.2\pm 0.4$ | $83.1\pm 1.1$
SCL | $96.1\pm 0.6$ | $81.6\pm 1.6$
SCL$_{\text{low}}$ | $91.0\pm 0.2$ | $77.1\pm 1.2$

6 Conclusion

We showed that a contrastive learning framework, when coupled with a simplicial neural network, is effective for generating representations of edge flow data that contain Hodge-related information. Related to this, we demonstrated that positive examples with specific spectral properties can be generated by casting the task as an optimization problem over the underlying probabilities of a dropout augmentation. Once these probabilities are optimized, they can be used to generate examples with the desired spectral characteristics. We also introduced Hodge-related information into the problem by reweighting the negative examples in the loss based on their spectral difference to the anchor. This pushes spectrally very different examples further apart and results in an embedding space that takes the relevant Hodge information into account. Empirical results demonstrate that these optimized embeddings can be used to significantly outperform a fully-supervised model on two edge flow classification tasks. For future work, exploring other types of data augmentation methods and conducting experiments with simplicial complexes of varying dimensions remain key directions.

References

  • [1] F. Battiston, G. Cencetti, I. Iacopini, V. Latora, M. Lucas, A. Patania, J.-G. Young, and G. Petri, “Networks beyond pairwise interactions: Structure and dynamics,” Physics Reports, vol. 874, pp. 1–92, 2020.
  • [2] M. Robinson, Topological Signal Processing, vol. 81, Springer, 2014.
  • [3] M. T. Schaub, Y. Zhu, J.-B. Seby, T. M. Roddenberry, and S. Segarra, “Signal processing on higher-order networks: Livin’ on the edge… and beyond,” Signal Processing, vol. 187, pp. 108149, 2021.
  • [4] S. Barbarossa and S. Sardellitti, “Topological signal processing over simplicial complexes,” IEEE Transactions on Signal Processing, vol. 68, pp. 2992–3007, 2020.
  • [5] T. M. Roddenberry, F. Frantzen, M. T. Schaub, and S. Segarra, “Hodgelets: Localized spectral representations of flows on simplicial complexes,” in ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022, pp. 5922–5926.
  • [6] J. Krishnan, R. Money, B. Beferull-Lozano, and E. Isufi, “Simplicial vector autoregressive model for streaming edge flows,” in ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023.
  • [7] E. Isufi and M. Yang, “Convolutional filtering in simplicial complexes,” in ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022, pp. 5578–5582.
  • [8] M. Yang, E. Isufi, M. T. Schaub, and G. Leus, “Finite impulse response filters for simplicial complexes,” in 2021 29th European Signal Processing Conference (EUSIPCO), 2021.
  • [9] M. Yang and E. Isufi, “Simplicial trend filtering,” in 2022 56th Asilomar Conference on Signals, Systems, and Computers, 2022.
  • [10] S. Ebli, M. Defferrard, and G. Spreemann, “Simplicial neural networks,” in TDA & Beyond, 2020.
  • [11] T. M. Roddenberry, N. Glaze, and S. Segarra, “Principled simplicial neural networks for trajectory prediction,” in International Conference on Machine Learning, PMLR, 2021, pp. 9020–9029.
  • [12] C. Bodnar, F. Frasca, Y. G. Wang, N. Otter, G. Montufar, P. Liò, and M. M. Bronstein, “Weisfeiler and Lehman go topological: Message passing simplicial networks,” in ICLR 2021 Workshop on Geometrical and Topological Representation Learning, 2021.
  • [13] M. Yang, E. Isufi, and G. Leus, “Simplicial convolutional neural networks,” in ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022, pp. 8847–8851.
  • [14] A. van den Oord, Y. Li, and O. Vinyals, “Representation learning with contrastive predictive coding,” CoRR, vol. abs/1807.03748, 2018.
  • [15] T. Chen, S. Kornblith, M. Norouzi, and G. E. Hinton, “A simple framework for contrastive learning of visual representations,” CoRR, vol. abs/2002.05709, 2020.
  • [16] Y. You, T. Chen, Y. Sui, T. Chen, Z. Wang, and Y. Shen, “Graph contrastive learning with augmentations,” in Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, and H. Lin, Eds., vol. 33, pp. 5812–5823, Curran Associates, Inc., 2020.
  • [17] Y. Tian, C. Sun, B. Poole, D. Krishnan, C. Schmid, and P. Isola, “What makes for good views for contrastive learning?,” in Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, and H. Lin, Eds., vol. 33, pp. 6827–6839, Curran Associates, Inc., 2020.
  • [18] Y. Zhu, Y. Xu, F. Yu, Q. Liu, S. Wu, and L. Wang, “Deep graph contrastive representation learning,” in ICML Workshop on Graph Representation Learning and Beyond, 2020.
  • [19] Y. Zhu, Y. Xu, F. Yu, Q. Liu, S. Wu, and L. Wang, “Graph contrastive learning with adaptive augmentation,” in Proceedings of the Web Conference 2021, ACM, Apr. 2021.
  • [20] N. Liu, X. Wang, D. Bo, C. Shi, and J. Pei, “Revisiting graph contrastive learning from the perspective of graph spectrum,” in Advances in Neural Information Processing Systems, A. H. Oh, A. Agarwal, D. Belgrave, and K. Cho, Eds., 2022.
  • [21] L. Lin, J. Chen, and H. Wang, “Spectral augmentation for self-supervised learning on graphs,” in The Eleventh International Conference on Learning Representations, 2023.
  • [22] K. Ding, Z. Xu, H. Tong, and H. Liu, “Data augmentation for deep graph learning: A survey,” SIGKDD Explor. Newsl., vol. 24, no. 2, pp. 61–77, Dec. 2022.
  • [23] H. Bhatia, G. Norgard, V. Pascucci, and P.-T. Bremer, “The Helmholtz-Hodge decomposition—a survey,” IEEE Transactions on Visualization and Computer Graphics, vol. 19, no. 8, pp. 1386–1404, 2013.
  • [24] C.-Y. Chuang, J. Robinson, L. Yen-Chen, A. Torralba, and S. Jegelka, “Debiased contrastive learning,” 2020.
  • [25] Q. Sun, W. Zhang, and X. Lin, “Progressive hard negative masking: From global uniformity to local tolerance,” 2022.
  • [26] Y. Liu, X. Yang, S. Zhou, X. Liu, Z. Wang, K. Liang, W. Tu, L. Li, J. Duan, and C. Chen, “Hard sample aware network for contrastive deep graph clustering,” 2022.
  • [27] M. T. Schaub, A. R. Benson, P. Horn, G. Lippner, and A. Jadbabaie, “Random walks on simplicial complexes and the normalized Hodge 1-Laplacian,” SIAM Review, vol. 62, no. 2, pp. 353–391, 2020.