
Relation-aware graph structure embedding with co-contrastive learning for drug-drug interaction prediction

Mengying Jiang [email protected] Guizhong Liu [email protected] Biao Zhao [email protected] Yuanchao Su [email protected] Weiqiang Jin [email protected] School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an 710049, China College of Geomatics, Xi’an University of Science and Technology, Xi’an 710054, China
Abstract

Relation-aware graph structure embedding is promising for predicting multi-relational drug-drug interactions (DDIs). Typically, existing methods begin by constructing a multi-relational DDI graph and then learning relation-aware graph structure embeddings (RaGSEs) of drugs from the DDI graph. Nevertheless, most existing approaches struggle to learn RaGSEs of new drugs, leading to serious over-fitting when the test DDIs involve such drugs. To alleviate this issue, we propose a novel DDI prediction method based on relation-aware graph structure embedding with co-contrastive learning, RaGSECo. The proposed RaGSECo constructs two heterogeneous drug graphs: a multi-relational DDI graph and a multi-attribute drug-drug similarity (DDS) graph. The two graphs are used respectively for learning and propagating the RaGSEs of drugs, aiming to ensure that all drugs, including new ones, possess effective RaGSEs. Additionally, we present a novel co-contrastive learning module to learn drug-pair (DP) representations. This mechanism learns DP representations from two distinct views (the interaction and similarity views) and encourages the views to supervise each other collaboratively to obtain more discriminative DP representations. We evaluate the effectiveness of our RaGSECo on three different tasks using two real datasets. The experimental results demonstrate that RaGSECo outperforms existing state-of-the-art prediction methods.

keywords:
Adverse drug reactions , Graph neural networks , Graph structure embedding , Self-supervised learning
Highlights

We propose propagating relation-aware graph structure embeddings from known drugs to new drugs with similar features.

This is the first attempt to learn drug-pair feature representations via cross-view contrastive learning, with the interaction and similarity views as the selected views.

We design a novel positive sample selection strategy that considers the underlying correlations between drug pairs, further enhancing the effectiveness of cross-view contrastive learning.

1 Introduction

Human diseases are often the result of complex biological processes, and single-drug treatments are often insufficient to cure them entirely [18, 51]. Consequently, combination drug therapy has become attractive as it can reduce drug resistance and improve efficacy [20, 6]. Nevertheless, the simultaneous use of multiple drugs can alter their properties and lead to adverse drug interactions, which can cause harm to patients [41]. Therefore, identifying potential DDIs is crucial for safe coadministration [50]. Traditional methods for detecting DDIs, such as biological or pharmacological assays, are labor-intensive, time-consuming, and expensive [12, 23]. In contrast, computation-based methods are generally less expensive and can achieve higher accuracy than traditional methods [53]. In recent years, many computational methods have been proposed for predicting DDI events [58].

Many existing works regard drugs with known interactions as known drugs and refer to those without known interactions as new drugs [5]. Typically, the lack of interaction information on new drugs makes it difficult for models to predict the corresponding interactions accurately [5]. Nevertheless, predicting interactions involving new drugs is significant for the efficient development of new drugs [37]. Therefore, many DDI prediction approaches address this task [30]. For example, Deng et al. [5] proposed a multimodal deep learning framework for predicting DDI events (DDIMDL). DDIMDL employs multiple biochemical attributes (such as enzymes, targets, pathways, and chemical substructures) to compute multiple similarities and builds a deep neural network (DNN) model for multi-relational DDI prediction. Zhang et al. [52] proposed the sparse feature learning ensemble method with linear neighborhood regularization (SFLLN), which, like DDIMDL, combines multiple drug features with known DDIs to predict novel DDIs. Lin et al. [25] proposed a DDI prediction method that jointly utilizes multi-source drug fusion, multi-source feature fusion, and the transformer (MDF-SA-DDI). MDF-SA-DDI first utilizes four different drug fusion networks to encode latent features of DPs and then adopts transformers to perform latent feature fusion for DP representation learning. The aforementioned methods employ multiple attributes to learn representative drug embeddings and have achieved leading performance when predicting interactions involving new drugs. Nevertheless, to ensure that the training and test sets have the same data distribution, these methods do not use the multi-relational interaction information between drugs, limiting DDI prediction performance.

Recently, self-supervised learning, which aims to derive supervision signals spontaneously from the data itself, has emerged as a promising strategy for effective representation learning [46]. Among the various techniques under the umbrella of self-supervised learning, contrastive learning has attracted substantial attention [29]. Contrastive learning first extracts positive and negative samples from the data and then maximizes the similarities between positive samples while minimizing the similarities between negative samples [55]. In this way, contrastive learning can learn discriminative representations even without abundant labels. Despite the broad application of contrastive learning in computer vision [27] and natural language processing [54], its potential in the context of DDI prediction has been scarcely explored. Implementing contrastive learning to facilitate DDI prediction is by no means trivial and requires careful consideration of the task-specific characteristics and nuances of contrastive learning.

Based on the above discussion, the primary motivation for our work lies in enabling all drugs to distill effective RaGSEs, thereby facilitating DP representation learning and improving DDI prediction. We propose a relation-aware graph structure embedding with co-contrastive learning framework for DDI prediction, abbreviated as RaGSECo, an end-to-end learning model. Implementing our RaGSECo approach includes two main steps: drug embedding and DP representation learning. The relation-aware graph structure embedding learning and propagation (RaGSELP) method is proposed for drug embedding. RaGSELP constructs a multi-relational DDI graph using known DDIs (the training set). Then, RaGSELP learns RaGSEs of known drugs by aggregating their neighbors' features under different relations. Inspired by the assumption that similar drugs may interact with the same drugs [35, 25, 5], RaGSELP constructs a multi-attribute DDS graph. Within this DDS graph, RaGSELP learns embeddings of new drugs by aggregating their neighbors' RaGSEs, aiming to enable all drugs to possess effective RaGSEs. Furthermore, we combine multi-view representation learning and cross-view contrastive learning into a novel co-contrastive learning mechanism for DP representations. Unlike previous contrastive learning, which contrasts the original and corrupted networks, we design two distinct views for DP representation learning: the interaction and similarity views. Specifically, we leverage the RaGSEs of drugs to generate DP representations under the interaction view, and we employ similarities between drugs to generate DP representations under the similarity view. The interaction view primarily utilizes known interaction relationships between drugs. The similarity view can discover inherent connections between drugs and facilitate inferring the potential therapeutic effects, toxicity reactions, or drug interactions of new drugs. Therefore, the two views are complementary. Moreover, we consider the underlying correlations between DPs to design a novel positive selection strategy that enhances cross-view contrastive learning. During training, these two views supervise each other collaboratively and learn more discriminative DP representations.

The main contributions of our work can be summarized as follows:

  • Propagating the RaGSEs of known drugs to new drugs with similar features is innovatively introduced into relation-aware methods. This significantly improves the DDI prediction performance of relation-aware methods, especially in predicting DDIs involving new drugs.

  • To the best of our knowledge, this is the first attempt to perform DP-level cross-view contrastive learning. More discriminative DP representations can be learned through co-contrastive learning in a cross-view manner.

  • The proposed RaGSECo ingeniously contrasts and incorporates two views of the drug information network (the interaction and similarity views), enabling DP representations to capture both known and potential interactions.

2 Related Work

We review previous studies relevant to this work in three areas: relation-aware graph structure embedding, contrastive learning, and drug feature extraction.

2.1 Relation-Aware Graph Structure Embedding

Methods based on relation-aware graph structure embedding learning pay attention to the topology of the graph, typically learning entity embeddings by aggregating information from neighboring entities under different relations [47]. The practicability of these methods for multi-relational DDI prediction has attracted considerable attention. Most existing methods use graph neural networks (GNNs), including relational graph convolutional networks (R-GCNs) [34] and graph attention networks (GATs) [40]. For example, Hong et al. [15] used the GAT to propose the link-aware graph attention network (LaGAT), which learns drug embeddings by aggregating features of neighbors from different attention pathways via different DDI event types, where the attention weights depend on the embedding representations of drugs and their neighbors. Wang et al. [42] proposed the graph of graphs neural network (GoGNN), which captures the information on drug structure and multi-relational drug interactions in a hierarchical way to learn drug embeddings for DDI prediction. Yu et al. [47] proposed the relation-aware network embedding for DDI prediction (RANEDDI), which considers both the multi-relational information between drugs and the relation-aware network structure information of drugs to obtain drug embeddings for DDI prediction. Yu et al. [48] proposed the substructure-aware tensor neural network model for DDI prediction (STNN-DDI), which learns a 3-D tensor of (substructure, substructure, interaction type) triplets that characterizes a substructure-substructure interaction (SSI) space. The aforementioned methods take advantage of interaction information and perform well in the multi-relational DDI prediction task. Nevertheless, the absence of interaction information on new drugs limits their performance when predicting DDIs involving new drugs.

2.2 Contrastive Learning

Approaches based on contrastive learning, which learn representations by contrasting positive pairs against negative pairs, have achieved considerable success across various domains [3]. In this section, we mainly focus on reviewing the contrastive learning methods related to DDI prediction tasks. Zhao et al. [55] constructed original and corrupted networks, then minimized the mutual information between the outputs of the original and corrupted networks, and maximized the mutual information between outputs from only the original or corrupted network. Wang et al. [44] proposed a self-supervised meta-path detection mechanism to train a deep transformer encoder model that can capture the global structure and semantic features in heterogeneous biomedical networks. Gao et al. [10] designed a drug-disease associations view and a drug or disease similarity view, then maximized the mutual information between the two views. Zhuang et al. [59] learned high-level drug representations containing graph-level structural information by maximizing the local-global mutual information. Although many works have been proposed to learn high-level drug embeddings by contrastive learning, little effort has been made toward DP-level contrastive learning. Nevertheless, learning high-level DP representations is more practical for the DDI prediction task.

2.3 Drug Feature Extraction

Drug feature extraction is crucial for model training [25]. Zhu et al. [58] took into account eight drug attributes (molecular substructure, target, enzyme, pathway, side effect, phenotype, gene, and disease) to extract features. Deng et al. [5] jointly considered targets, enzymes, and chemical substructures, achieving outstanding performance [25]. Typically, each attribute is associated with a set of descriptors. A drug can be denoted as a binary feature vector whose elements (1 or 0) indicate the presence or absence of the corresponding descriptors [25]. These sparse binary feature vectors typically have high dimensionalities. In general, high-dimensional inputs can be resource-intensive and induce the curse of dimensionality, leading to extremely poor performance in some cases [25]. Therefore, given that similar drugs may interact with the same drugs, Deng et al. [5] opted to use the Jaccard similarities calculated from the binary feature vectors to define drugs rather than the binary feature vectors themselves. The Jaccard similarity is calculated as follows:

$\operatorname{Jaccard}(U,V)=\frac{|U\cap V|}{|U\cup V|}$ (1)

where $U$ and $V$ are the descriptor sets of two drugs under a specific attribute. Herein, $|U\cap V|$ is the cardinality of the intersection of $U$ and $V$, and $|U\cup V|$ is the cardinality of their union.
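As a minimal illustration, the following Python sketch computes Eq. (1) for two binary descriptor vectors; the toy vectors and the zero-union convention are our own assumptions, not part of the original formulation.

```python
import numpy as np

def jaccard_similarity(u, v):
    """Jaccard similarity (Eq. 1) between two binary descriptor vectors."""
    u = np.asarray(u, dtype=bool)
    v = np.asarray(v, dtype=bool)
    union = np.logical_or(u, v).sum()
    if union == 0:  # neither drug has any descriptor of this attribute
        return 0.0
    return np.logical_and(u, v).sum() / union

# Example: two drugs described by six binary substructure descriptors.
drug_u = [1, 0, 1, 1, 0, 0]
drug_v = [1, 1, 1, 0, 0, 0]
print(jaccard_similarity(drug_u, drug_v))  # 2 shared / 4 in union = 0.5
```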

3 Methodology

3.1 RaGSECo framework

Figure 1: Examples of construction strategies for three different test sets. The nodes represent drugs. Solid edges indicate known DDI interactions (training set), while dotted edges represent the prediction task (test set). The edges with different colors signify various interaction types.

The RaGSECo is proposed for multi-relational DDI prediction, which can be regarded as a multi-class classification task. Based on three different experimental settings, the multi-relational DDI prediction task can be further partitioned into three different tasks, defined as follows:

  • Task 1: predicting unobserved interaction events between known drugs.

  • Task 2: predicting interaction events between known drugs and new drugs.

  • Task 3: predicting interaction events between new drugs.

These three tasks are illustrated in Fig. 1. As the figure shows, the training and test sets contain the same drugs in Task 1. In Task 2, half of the drugs involved in the test set appear in the training set. In Task 3, the training and test sets contain different drugs. Therefore, from Task 1 to Task 3, the prediction difficulty increases in turn.

Taking Task 3 as an example, the framework of our RaGSECo is illustrated in Fig. 2. RaGSECo constructs two heterogeneous drug graphs: a multi-relational DDI graph and a multi-attribute DDS graph. In the DDS graph, nodes denote drugs, and edges represent similarities between drugs. The different edge colors indicate similarities under varying attributes (targets, enzymes, and chemical substructures). Consequently, in the DDS graph, each pair of drugs can have up to three types of edges, whereas in the DDI graph, each pair of drugs has at most one edge. The framework of our RaGSECo contains two primary parts: drug embedding and DP representation learning. The RaGSELP method is proposed to learn drug embeddings, and the cross-view module is developed to learn DP representations for DDI prediction.

Figure 2: Flowchart of the proposed RaGSECo, involving two main parts: RaGSELP (Upper) and co-contrastive learning (Bottom). In the RaGSELP method, we first learn the RaGSEs of known drugs by propagating their neighbors' features under different relations within the DDI graph. Subsequently, we propagate the learned RaGSEs to the connected new drugs within the DDS graph to ensure that all drugs possess effective RaGSEs. Afterward, the co-contrastive learning module takes in multiple features (initial features, RaGSEs, and SMILES strings) of two drugs and then generates the feature representations of the DP. Finally, the Decoder calculates the classification probability distributions of the DP.

3.2 RaGSELP

The RaGSELP method is proposed to learn drug embedding. RaGSELP contains two parts: RaGSE learning (RaGSEL) and RaGSE propagation (RaGSEP). RaGSEL learns the RaGSEs of known drugs, and RaGSEP propagates the RaGSEs of known drugs to new drugs with similar features.

3.2.1 RaGSEL

We construct a multi-relational DDI graph $\mathcal{G}=(\mathcal{D},\mathcal{E},\mathcal{R})$, where nodes represent drugs and edges denote labeled interactions. Herein, $\mathcal{D}$ is the set of all drugs (including known and new drugs), and $|\mathcal{D}|=N$ is the number of drugs. Let $\mathbf{X}=\{\mathbf{x}_{i}\}_{i=1}^{N}\in\mathbb{R}^{N\times d}$ be the initial feature matrix of drugs, derived as described in Section 2.3. $\mathcal{R}$ is the set of interaction event types, and $|\mathcal{R}|=R$ is the number of interaction types. $\mathcal{E}=\{\mathcal{E}_{r}\}_{r=1}^{R}$ is the set of known interactions, where $\mathcal{E}_{r}$ is the set of interactions under interaction type $r$. Let $\mathcal{A}=\{\mathbf{A}_{r}\}_{r=1}^{R}\in\mathbb{R}^{N\times N\times R}$ be the multi-relational adjacency tensor, where $\mathbf{A}_{r}\in\mathbb{R}^{N\times N}$ is the adjacency matrix under interaction relation $r$. Let $A_{r(i,j)}$ $(i,j=1,\dots,N)$ denote the elements of $\mathbf{A}_{r}$, with $A_{r(i,j)}=1$ if $(i,j)\in\mathcal{E}_{r}$ and $A_{r(i,j)}=0$ otherwise.

Subsequently, we use the R-GCN layer [34] to learn the RaGSEs of drugs from the multi-relational DDI graph. The forward propagation function is defined as follows:

$\mathbf{h}_{i}=\sigma\left(\sum_{r\in\mathcal{R}}\sum_{j\in\mathcal{N}_{i}^{r}}\frac{\hat{A}_{r(i,j)}}{R_{i}}\mathbf{x}_{j}\mathbf{W}_{r}+\mathbf{x}_{i}\mathbf{W}_{o}\right)$ (2)

where $\mathbf{h}_{i}\in\mathbb{R}^{1\times d'}$ represents the RaGSEs of drug $i$, and $\mathbf{x}_{i}\in\mathbb{R}^{1\times d}$ is the initial feature vector of drug $i$, $i\in\mathcal{D}$. Herein, $\mathbf{W}_{r}\in\mathbb{R}^{d\times d'}$ is the relation-specific weight matrix, and adopting a set of $\mathbf{W}_{r}$ $(r=1,\dots,R)$ supports multiple edge types. In contrast, $\mathbf{W}_{o}\in\mathbb{R}^{d\times d'}$ is a single weight matrix shared across all relations. Here, $\mathcal{N}_{i}^{r}$ is the set of drugs connected to drug $i$ under relation $r$, and $\sigma(\cdot)$ is an element-wise activation function, $\operatorname{ReLU}(\cdot)=\max(0,\cdot)$. To normalize the incoming messages of each drug, we define a set of normalization constants $\{R_{i}\}_{i=1}^{N}$, where $R_{i}$ equals the number of interaction types in which drug $i$ is involved. $\hat{A}_{r(i,j)}$ is the aggregation weight assigned to the neighboring drug $j$, i.e., the corresponding element of $\hat{\mathbf{A}}_{r}$, which is calculated with a classic graph-based normalization method [19]:

$\hat{\mathbf{A}}_{r}=\mathbf{D}_{r}^{-1/2}\mathbf{A}_{r}\mathbf{D}_{r}^{-1/2}$ (3)

where $\mathbf{D}_{r}$ is the diagonal degree matrix of $\mathbf{A}_{r}$.

Herein, $\mathbf{h}_{i}$ can represent the relation-aware graph structure information of drug $i$ if drug $i$ is known. Nevertheless, if drug $i$ is new, $\mathbf{h}_{i}$ is influenced only by drug $i$ itself. Consequently, directly using the output of Eq. (2) as the drug embeddings for DDI prediction could result in severe over-fitting in Tasks 2 and 3.
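For concreteness, the following is a minimal dense PyTorch sketch of the propagation in Eq. (2); it is illustrative rather than the authors' implementation, and it assumes the normalized adjacencies of Eq. (3) are stacked into one dense tensor (a sparse implementation would be preferable for large $N$).

```python
import torch
import torch.nn as nn

class RGCNLayer(nn.Module):
    """Minimal dense sketch of the R-GCN propagation in Eq. (2).

    A_hat: (R, N, N) stack of the normalized adjacencies from Eq. (3).
    R_inv: (N, 1) tensor holding 1/R_i, with 0 for drugs involved in no
    interaction type (i.e., new drugs).
    """

    def __init__(self, d_in, d_out, num_relations):
        super().__init__()
        self.W_r = nn.Parameter(torch.empty(num_relations, d_in, d_out))
        self.W_o = nn.Parameter(torch.empty(d_in, d_out))
        nn.init.xavier_uniform_(self.W_r)
        nn.init.xavier_uniform_(self.W_o)

    def forward(self, X, A_hat, R_inv):
        # Sum over relations r and neighbors j of A_hat[r,i,j] * x_j W_r.
        msgs = torch.einsum('rij,jd,rde->ie', A_hat, X, self.W_r)
        # Normalize incoming messages, add the self term, apply ReLU.
        return torch.relu(R_inv * msgs + X @ self.W_o)
```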

3.2.2 RaGSEP

Based on the assumption that similar drugs may interact with the same drugs [25, 5], we propose a strategy that propagates the RaGSEs of known drugs to new drugs with similar features. This strategy can effectively overcome the over-fitting issues that arise in Tasks 2 and 3. To facilitate the propagation of embeddings, we construct a multi-attribute DDS graph $\mathcal{G}_{s}=(\mathcal{D},\mathcal{E},\mathcal{S})$, where nodes represent drugs and edges denote Jaccard similarities between drugs under various attributes. Herein, we still employ the three biological attributes (i.e., chemical substructure, enzyme, and target). In the DDS graph, $\mathcal{D}$ is the set of nodes, with $\mathbf{H}=\{\mathbf{h}_{i}\}_{i=1}^{N}$, the output of Eq. (2), as the node feature matrix. $\mathcal{E}$ represents the set of edges. $\mathcal{S}$ is the set of similarity types, and $|\mathcal{S}|=3$ is the number of similarity types. Let $\mathcal{A}=\{\mathbf{A}_{s},\mathbf{A}_{e},\mathbf{A}_{t}\}\in\mathbb{R}^{N\times N\times 3}$ be the adjacency tensor, where $\mathbf{A}_{s}$, $\mathbf{A}_{e}$, and $\mathbf{A}_{t}$ represent the substructure-based, enzyme-based, and target-based adjacency matrices, respectively.

To normalize the incoming messages, these three adjacency matrices also need to be normalized, following Eq. (3). Let $\hat{\mathbf{A}}_{s}$, $\hat{\mathbf{A}}_{e}$, and $\hat{\mathbf{A}}_{t}$ be the normalized adjacency matrices. We use them to propagate the learned RaGSEs to the strongly associated drugs within the DDS graph. The propagation procedure can be expressed as follows:

$\mathbf{H}_{s}=\sigma\left(\hat{\mathbf{A}}_{s}^{n}\mathbf{H}\mathbf{W}_{s}\right)$ (4)
$\mathbf{H}_{e}=\sigma\left(\hat{\mathbf{A}}_{e}^{n}\mathbf{H}\mathbf{W}_{e}\right)$ (5)
$\mathbf{H}_{t}=\sigma\left(\hat{\mathbf{A}}_{t}^{n}\mathbf{H}\mathbf{W}_{t}\right)$ (6)
$\mathbf{H}_{embed}=\mathbf{H}_{s}+\mathbf{H}_{e}+\mathbf{H}_{t}$ (7)

where $\hat{\mathbf{A}}_{s}^{n}\mathbf{H}$, $\hat{\mathbf{A}}_{e}^{n}\mathbf{H}$, and $\hat{\mathbf{A}}_{t}^{n}\mathbf{H}$ represent graph convolutions on the DDS graph. $n$ is a hyperparameter, and $\hat{\mathbf{A}}_{s}^{n}$ is $\hat{\mathbf{A}}_{s}$ raised to the power of $n$; the same definition extends to $\hat{\mathbf{A}}_{e}^{n}$ and $\hat{\mathbf{A}}_{t}^{n}$. This strategy allows RaGSECo to flexibly capture higher-order neighbor information. $\mathbf{W}_{s}$, $\mathbf{W}_{e}$, $\mathbf{W}_{t}\in\mathbb{R}^{d'\times d'}$ are the attribute-specific trainable weight matrices. $\mathbf{H}_{s}$, $\mathbf{H}_{e}$, and $\mathbf{H}_{t}$ are the propagation results along the different pathways. $\mathbf{H}_{embed}=\{\mathbf{h}_{embed,i}\}_{i=1}^{N}\in\mathbb{R}^{N\times d'}$ is the sum of the three propagation results and represents the final drug embeddings. In particular, $\mathbf{h}_{embed,i}$ can represent effective RaGSEs of drug $i$, whether drug $i$ is known or new.
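A minimal sketch of the propagation in Eqs. (4)-(7) might look as follows; the function signature is hypothetical, and we assume the powers $\hat{\mathbf{A}}^{n}$ are computed densely for clarity.

```python
import torch

def ragse_propagation(H, A_hats, Ws, n):
    """Sketch of Eqs. (4)-(7): propagate the RaGSEs H over the DDS graph.

    H: (N, d') output of Eq. (2); A_hats: the three normalized similarity
    adjacencies [A_s, A_e, A_t]; Ws: matching (d', d') weight matrices;
    n: propagation distance (n = 0 gives A^0 = I, i.e., no aggregation).
    """
    H_embed = 0
    for A_hat, W in zip(A_hats, Ws):
        A_n = (torch.linalg.matrix_power(A_hat, n) if n > 0
               else torch.eye(A_hat.size(0)))
        H_embed = H_embed + torch.relu(A_n @ H @ W)  # Eqs. (4)-(6)
    return H_embed  # Eq. (7): sum over the three propagation pathways
```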

Figure 3: Illustration of the co-contrastive learning module. This module accepts the initial feature vectors, RaGSEs, and SMILES strings of two drugs as the input and generates the DP representations. The initial feature vectors of the two drugs are concatenated and fed into an FNN, resulting in the DP representations under the similarity view. In a similar fashion, another FNN is utilized to generate the DP representations from the interaction view. Meanwhile, a convolutional neural network (CNN) is employed to obtain the SMILES-based DP representations. Additionally, a cross-view contrastive mechanism facilitates collaborative supervision between the similarity and interaction views, thereby promoting the learning of more discriminative DP representations.

3.3 Co-Contrastive Learning

The co-contrastive learning module, as illustrated in Fig. 3, is designed to learn DP representations. This module consists of two primary components: multi-view representation learning and collaborative contrastive optimization. The first component learns DP representations from multiple views. The second component facilitates a cooperative process in which these views optimize and supervise each other.

3.3.1 Multi-View Representation Learning

To increase data diversity, facilitate gradient propagation, and generate discriminative DP representations, we employ multiple drug features comprising RaGSEs, initial features, and simplified molecular-input line-entry system (SMILES) strings.

A SMILES string is a line notation that uses a predefined set of rules to describe the structure of compounds, and each drug has its own SMILES string [4]. The characters of a SMILES string represent chemical atoms or bonds, and the lengths of SMILES strings are unconstrained [31]. In this work, we convert each SMILES string to a $p\times q$-dimensional feature matrix for simplicity, where $p$ denotes the number of characters and $q$ is the unified length of the SMILES string [16]. Accordingly, the columns of the SMILES-based feature matrix are one-hot vectors. Since the initial lengths of SMILES strings are flexible, we cut off the extra part if the actual length of a SMILES string is greater than $q$, and we pad zeros if the actual length is less than $q$. Following Huang et al. [16], $p$ and $q$ are set to 64 and 100, respectively.
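This encoding can be sketched as follows; the character vocabulary shown here is purely illustrative (the actual $p=64$ character set follows [16]).

```python
import numpy as np

# Hypothetical vocabulary; the actual p = 64 character set follows [16].
CHARSET = sorted(set("#%()+-.0123456789=@BCFHIKLNOPSZ[]abcdegilnorstu"))

def smiles_to_onehot(smiles, charset=CHARSET, q=100):
    """Encode a SMILES string as a len(charset) x q one-hot matrix,
    truncating strings longer than q and zero-padding shorter ones."""
    index = {ch: i for i, ch in enumerate(charset)}
    mat = np.zeros((len(charset), q), dtype=np.float32)
    for col, ch in enumerate(smiles[:q]):  # cut off the extra part
        if ch in index:  # characters outside the vocabulary stay all-zero
            mat[index[ch], col] = 1.0
    return mat

print(smiles_to_onehot("CC(=O)OC1=CC=CC=C1C(=O)O").shape)  # aspirin
```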

Let $Q$ be the number of known DDIs. Considering computational burdens, we adopt a batch-wise scheme to train RaGSECo. Let $K$ be the number of DPs in a batch. Consider a DP $k$ $(k=1,\dots,K)$ that involves two drugs $i$ and $j$ $(i,j=1,\dots,N;\ i\neq j)$. Let $\mathbf{S}_{i},\mathbf{S}_{j}\in\mathbb{R}^{p\times q}$ be the SMILES-based feature matrices of drugs $i$ and $j$, $\mathbf{h}_{embed,i}$ and $\mathbf{h}_{embed,j}$ their RaGSEs, and $\mathbf{x}_{i}$ and $\mathbf{x}_{j}$ their initial feature vectors. Subsequently, we employ two feedforward neural networks (FNNs) to process the RaGSEs and initial features of DPs, respectively. In addition, a CNN encodes the SMILES strings of DPs. The employed CNN is a multi-layer 1-D CNN followed by a global max pooling layer, with reference to [16]. Accordingly, the three features of DP $k$ are encoded as follows:

$\mathbf{p}_{k}^{initi}=\operatorname{FNN1}(\mathbf{x}_{i}||\mathbf{x}_{j},\Theta_{\mathrm{FNN1}})$ (8)
$\mathbf{p}_{k}^{embed}=\operatorname{FNN2}(\mathbf{h}_{embed,i}||\mathbf{h}_{embed,j},\Theta_{\mathrm{FNN2}})$ (9)
$\mathbf{p}_{k}^{smile}=\operatorname{CNN}(\mathbf{S}_{i}||\mathbf{S}_{j},\Theta_{\mathrm{CNN}}).$ (10)

where $\Theta_{\mathrm{FNN1}}$, $\Theta_{\mathrm{FNN2}}$, and $\Theta_{\mathrm{CNN}}$ represent the trainable weights of FNN1, FNN2, and the CNN, respectively.

The initial feature vectors of drugs consist of Jaccard similarity scores between drugs. Thus, $\mathbf{p}_{k}^{initi}\in\mathbb{R}^{1\times d^{FNN}}$ can be considered the DP feature representation under the similarity view. Meanwhile, the drug RaGSEs primarily concentrate on interaction information between drugs. As such, $\mathbf{p}_{k}^{embed}\in\mathbb{R}^{1\times d^{FNN}}$ can be perceived as the DP representation under the interaction view.

3.3.2 Collaborative Contrastive Optimization

The interaction view mainly focuses on known interaction relationships between nodes, while the similarity view can infer potential therapeutic effects, adverse reactions, or drug interactions of new drugs by discovering underlying drug correlations. Therefore, these two views are complementary and mutually reinforcing. Herein, we define positive and negative samples for contrastive learning. In this paper, given the feature representations of a DP under the interaction view, we can simply define its feature representations under the similarity view as the positive sample. Nevertheless, our analysis of the employed datasets suggests that there may be underlying correlations between DPs. Consequently, we propose a novel positive selection strategy.

This paper utilizes two real-world, multi-relational DDI datasets: Dataset 1 and Dataset 2. In Dataset 1, the known DDIs are categorized into 65 types of DDI events, while the number of interaction event types in which each drug is involved ranges from 1 to 20. The median of the event type counts is 10, significantly lower than 65. Similar findings are observed in Dataset 2. These findings suggest that a particular drug is likely to be strongly associated with specific interaction event types. Thus, the interaction event types a drug participates in can be considered significant characteristics of the drug; we refer to them as the interaction characteristics of drugs. Let $\mathbf{t}_{i}\in\mathbb{R}^{1\times R}$ represent the interaction characteristics of drug $i$, where $R$ is the number of interaction event types. The vector $\mathbf{t}_{i}$ is binary, and its nonzero elements indicate the interaction event types drug $i$ participates in. Given a DP $k$ that involves two drugs $i$ and $j$, the interaction characteristics of DP $k$ are defined as $\mathbf{c}_{k}=\mathbf{t}_{i}+\mathbf{t}_{j}$. Based on this, we propose a new positive selection strategy: if two DPs exhibit similar interaction characteristics, they are designated as positive samples. One advantage of this strategy is that the selected positive samples may reflect the underlying interaction tendency of the target DP. For DPs $k$ and $q$, we calculate the cosine similarity of their interaction characteristics as follows:

$S_{k,q}=\frac{\mathbf{c}_{k}\cdot\mathbf{c}_{q}}{\|\mathbf{c}_{k}\|\|\mathbf{c}_{q}\|}.$ (11)

Given threshold values $t_{pos}$ and $t_{neg}$ with $t_{pos}>t_{neg}$, DP $q$ is deemed a positive sample of DP $k$ if $S_{k,q}\geq t_{pos}$ and, conversely, a negative sample if $S_{k,q}\leq t_{neg}$. Let $\mathcal{P}$ represent the set of positive pairs in one batch, and $\mathcal{N}$ the set of negative pairs. We then model a self-supervised learning task using the standard binary cross-entropy loss. With this, we present the co-contrastive learning loss function as follows:

$\mathcal{L}_{ss1}=-\frac{1}{|\mathcal{P}|+|\mathcal{N}|}\left(\sum_{(k,q)\in\mathcal{P}}\log\Psi\left(\mathbf{p}_{k}^{embed},\mathbf{p}_{q}^{initi}\right)+\sum_{(k,q)\in\mathcal{N}}\log\left(1-\Psi\left(\mathbf{p}_{k}^{embed},\mathbf{p}_{q}^{initi}\right)\right)\right)$ (12)

where the contrastive discriminator $\Psi(\cdot,\cdot)$ is instantiated as $\sigma(\mathbf{p}_{k}^{embed}\mathbf{W}{\mathbf{p}_{q}^{initi}}^{T})$, in which $\mathbf{W}$ is a learnable weight matrix and the activation function $\sigma$ is the sigmoid, producing a score that represents the probability of being a positive sample. The co-contrastive learning mechanism maximizes the mutual information of positive pairs under the interaction and similarity views. To stabilize the co-contrastive learning process and facilitate the two views supervising each other, we present another co-contrastive learning loss function:

$\mathcal{L}_{ss2}=-\frac{1}{|\mathcal{P}|+|\mathcal{N}|}\left(\sum_{(k,q)\in\mathcal{P}}\log\Psi\left(\mathbf{p}_{k}^{initi},\mathbf{p}_{q}^{embed}\right)+\sum_{(k,q)\in\mathcal{N}}\log\left(1-\Psi\left(\mathbf{p}_{k}^{initi},\mathbf{p}_{q}^{embed}\right)\right)\right).$ (13)
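The following sketch illustrates Eqs. (11) and (12) for one batch; it is a simplified rendering under our own assumptions (dense all-pairs computation, numerical clamping), and swapping the two views in the discriminator yields Eq. (13).

```python
import torch

def co_contrastive_loss(P_embed, P_initi, C, W, t_pos=0.95, t_neg=0.1):
    """Sketch of Eqs. (11)-(12) for one batch.

    P_embed, P_initi: (K, d) DP representations under the interaction
    and similarity views; C: (K, R) interaction characteristics c_k;
    W: learnable (d, d) discriminator weights.
    """
    # Eq. (11): cosine similarity between interaction characteristics.
    C_norm = C / C.norm(dim=1, keepdim=True).clamp(min=1e-8)
    S = C_norm @ C_norm.T
    pos = S >= t_pos  # boolean mask of positive DP pairs
    neg = S <= t_neg  # boolean mask of negative DP pairs

    # Discriminator Psi(p_k, p_q) = sigmoid(p_k W p_q^T), all pairs at once.
    probs = torch.sigmoid(P_embed @ W @ P_initi.T).clamp(1e-7, 1 - 1e-7)

    # Eq. (12): binary cross-entropy over positive and negative pairs.
    loss = -(torch.log(probs[pos]).sum() + torch.log(1 - probs[neg]).sum())
    return loss / (pos.sum() + neg.sum()).clamp(min=1)
```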

In conclusion, to fully utilize all significant information and learn more discriminative DP representations, we define the final DP representation $\mathbf{p}_{k}$ as follows:

$\mathbf{p}_{k}^{add}=\mathbf{p}_{k}^{initi}+\mathbf{p}_{k}^{embed}+\mathbf{p}_{k}^{smile}$ (14)
$\mathbf{p}_{k}=||\left(\mathbf{p}_{k}^{initi},\mathbf{p}_{k}^{embed},\mathbf{p}_{k}^{smile},\mathbf{p}_{k}^{add}\right).$ (15)

where the symbol $||$ denotes concatenation.

3.4 Loss Function

We construct a Decoder with two fully connected layers that maps $\mathbf{p}_{k}$ into the probability distribution space $\mathbf{y}_{k}\in\mathbb{R}^{1\times R}$:

$\mathbf{y}_{k}=\operatorname{Decoder}\left(\mathbf{p}_{k};\Theta_{\mathrm{Decoder}}\right)$ (16)

where $\Theta_{\mathrm{Decoder}}$ denotes the trainable parameters. The first fully connected layer is followed by a GELU activation function [14], a batch normalization layer [17], and a dropout layer. The second fully connected layer is followed by a softmax function.
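A minimal PyTorch sketch of such a Decoder is given below; the hidden width and the example sizes are assumptions, while the input width follows from the four concatenated parts in Eq. (15), assuming the CNN output is also $d^{FNN}$-dimensional so that Eq. (14) is well-defined.

```python
import torch.nn as nn

d_fnn, hidden, R, dr = 1000, 512, 65, 0.3  # hidden width is hypothetical

# p_k concatenates four d_FNN-dimensional parts (Eq. 15), hence 4 * d_fnn.
decoder = nn.Sequential(
    nn.Linear(4 * d_fnn, hidden),
    nn.GELU(),              # activation after the first FC layer
    nn.BatchNorm1d(hidden),
    nn.Dropout(dr),
    nn.Linear(hidden, R),
    nn.Softmax(dim=1),      # probability distribution over event types
)
```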

DDI event prediction is a multi-class classification task. Thus, we choose the cross-entropy loss as our classification loss function:

$\mathcal{L}_{ce}=-\sum_{k=1}^{K}\sum_{r=1}^{R}\hat{y}_{k,r}\log\left(y_{k,r}\right)$ (17)

where $\hat{y}_{k,r}$ represents the $r$-th element of $\hat{\mathbf{y}}_{k}$, the ground-truth vector (one-hot encoding) of DP $k$; $K$ denotes the number of DDI samples in a training batch; and $y_{k,r}$ denotes the predicted probability score of DP $k$ under class $r$, i.e., the $r$-th element of $\mathbf{y}_{k}$.

To emphasize the importance of the classification loss, we introduce a hyperparameter $\lambda$ to scale the classification cross-entropy loss $\mathcal{L}_{ce}$. Accordingly, the total loss function of the model is formulated as follows:

$\mathcal{L}=\lambda\mathcal{L}_{ce}+\mathcal{L}_{ss1}+\mathcal{L}_{ss2}$ (18)

where $\mathcal{L}$ is the loss of one batch. Furthermore, we employ Mixup [49], a data augmentation algorithm, to increase the quantity of the original data and improve the generalization ability and robustness of the model. Mixup is described in detail in the previous study [49].
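For reference, a standard Mixup step might be sketched as follows; where exactly the augmentation is applied in RaGSECo follows [49], so this batch-level rendering on DP features and one-hot labels is only an assumption.

```python
import torch

def mixup(x, y, alpha=0.2):
    """Standard Mixup [49] on a batch of features x and one-hot labels y;
    alpha is an assumed Beta-distribution parameter (see [49])."""
    lam = torch.distributions.Beta(alpha, alpha).sample()
    perm = torch.randperm(x.size(0))
    return lam * x + (1 - lam) * x[perm], lam * y + (1 - lam) * y[perm]
```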

Input: $N_{epochs}$: maximum epochs; $N_{batches}$: total batches; $\lambda$: hyperparameter; $\mathcal{T}_{r}$: the set of training DDIs; $\hat{\mathbf{Y}}$: ground truth of training DDIs; $\mathcal{T}_{e}$: the set of test DDIs.
Output: $\mathbf{Y}_{te}$: predicted types of test DDIs.
1: $\mathbf{X}\leftarrow$ obtain drugs' initial features via Eq. (1).
2: $\mathcal{G}\leftarrow$ construct a DDI graph using known DDIs and $\mathbf{X}$.
3: /* Training procedure */
4: for $epoch\in[1,N_{epochs}]$ do
5:   for $i\in[1,N_{batches}]$ do
6:     /* Acquiring drug embeddings */
7:     Use $\mathcal{G}$ to learn the RaGSEs of known drugs, $\mathbf{H}$, via Eq. (2).
8:     Use Jaccard similarities and $\mathbf{H}$ to build a DDS graph $\mathcal{G}_{s}$.
9:     Use $\mathcal{G}_{s}$ to obtain the RaGSEs of all drugs, $\mathbf{H}_{embed}$, via Eqs. (4)-(7).
10:    /* Acquiring drug-pair representations */
11:    Obtain the $i$-th batch of training DDIs from $\mathcal{T}_{r}$, and let DP $k$ belong to this batch.
12:    Acquire three representations of DP $k$: $\mathbf{p}_{k}^{initi}$, $\mathbf{p}_{k}^{embed}$, $\mathbf{p}_{k}^{smile}$, via Eqs. (8)-(10).
13:    Obtain the positive and negative sample sets via Eq. (11).
14:    $\mathcal{L}_{ss1}$, $\mathcal{L}_{ss2}$ $\leftarrow$ obtain the co-contrastive learning losses via Eqs. (12) and (13).
15:    Calculate the final representation of DP $k$, $\mathbf{p}_{k}$, via Eqs. (14) and (15).
16:    /* DDI prediction */
17:    Use $\mathbf{p}_{k}$ to calculate the probability distribution $\mathbf{y}_{k}$ via Eq. (16).
18:    $\mathcal{L}_{ce}\leftarrow$ use the ground truth $\hat{\mathbf{y}}_{k}$ and the prediction $\mathbf{y}_{k}$ to calculate the prediction loss via Eq. (17).
19:    Obtain the final loss $\mathcal{L}\leftarrow\lambda\mathcal{L}_{ce}+\mathcal{L}_{ss1}+\mathcal{L}_{ss2}$.
20:    Minimize $\mathcal{L}$ and update $\mathbf{H}$, $\mathbf{H}_{embed}$, and $\mathbf{p}_{k}$.
21:   end for
22: end for
23: /* Testing procedure */
24: Predict the types of the DDIs in $\mathcal{T}_{e}$ using the trained RaGSECo model.
Algorithm 1: Pseudocode of the proposed RaGSECo.

RaGSECo is a semi-supervised DDI prediction approach: both known and new drugs are incorporated into the training and testing phases of each task. We present pseudocode in Algorithm 1 for a clearer illustration of optimizing the stated objective. We begin by extracting drug features via Jaccard similarities (Line 1). Following this, we construct a multi-relational DDI graph using the extracted drug features and the known DDIs derived from the training data $\mathcal{T}_{r}$ (Line 2). During each batch of each training iteration, our initial step is to learn embeddings $\mathbf{H}_{embed}$ for all drugs (Lines 7-9). Then, a batch of training data is obtained from $\mathcal{T}_{r}$, and feature representations $\mathbf{p}_{k}$ for the corresponding DPs are learned (Lines 11, 12, and 15). Subsequently, we construct the sets of positive and negative samples (Line 13) and calculate the co-contrastive learning losses $\mathcal{L}_{ss1}$ and $\mathcal{L}_{ss2}$ (Line 14). The prediction loss $\mathcal{L}_{ce}$ is subsequently computed (Lines 17 and 18). In the following step, we merge $\mathcal{L}_{ss1}$, $\mathcal{L}_{ss2}$, and $\mathcal{L}_{ce}$ into the final loss $\mathcal{L}$ (Line 19), then optimize $\mathcal{L}$ to update $\mathbf{H}_{embed}$ and $\mathbf{p}_{k}$ (Line 20). Once all training iterations are complete, we use the trained RaGSECo to predict the interaction event types of the DPs in $\mathcal{T}_{e}$.

4 Results and Discussion

4.1 Datasets

This paper uses two real datasets to explore the effectiveness and competitiveness of our RaGSECo. The first dataset (Dataset 1) was collected by Deng et al. from DrugBank (https://go.drugbank.com/releases/latest) and published in [5]. Dataset 1 contains 572 drugs with 37,264 pairwise DDIs associated with 65 interaction types. Each drug is represented by four biological attributes: enzymes, targets, pathways, and chemical substructures.

We also construct a dataset (Dataset 2) from DrugBank. We use 1,000 drugs, and each drug is described by three features: enzymes, targets, and chemical substructures. Accordingly, we obtained a total of 206,029 pairwise DDIs associated with 99 types of events. Detailed information on the two datasets is listed in Table 1.

Table 1: Summary of the datasets utilized in our experiments.
Dataset Drug number DDI number DDI event types
Dataset 1 572 37264 65
Dataset 2 1000 206029 99

4.2 Baselines

We compare the proposed RaGSECo with six baselines:

• MCFF-MTDDI [11] extracts multiple drug-related features and proposes a gated recurrent unit-based multi-channel feature fusion module to yield comprehensive representations of DPs.

• MDF-SA-DDI [25] uses four encoders to generate four different DP features and adopts transformers to perform feature fusion.

• RANEDDI [47] is a relation-aware graph structure embedding method that considers the multi-relational interaction information to obtain drug embeddings for DDI prediction.

• DDIMDL [5] is a DNN method that pays attention to multiple drug-drug similarities for multi-relational DDI prediction.

• DeepDDI [33] is a DNN method that uses the structural information of DPs to train the model and make DDI predictions.

• DNN [21] uses multiple similarity information of DPs to predict the pharmacological effects of DDIs via autoencoders and a deep feed-forward network.

4.3 Experimental Settings

To demonstrate its effectiveness, we evaluate our RaGSECo on Tasks 1, 2, and 3 using the two real datasets. In Task 1, we randomly split the DDIs into five subsets via 5-fold cross-validation (5-CV), with four subsets as the training set and the remaining one as the test set. For Tasks 2 and 3, we randomly split the drugs instead of the DDIs into five subsets via 5-CV, with four subsets used as known drugs and the remaining one as new drugs. In Task 2, the DDIs involving two known drugs form the training set, and the DDIs involving one known drug and one new drug form the test set. In Task 3, the training set is the same as in Task 2, and the DDIs between two new drugs form the test set.
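The drug-level splitting protocol for Tasks 2 and 3 can be sketched as follows; the function and variable names are illustrative, not the actual implementation.

```python
import numpy as np

def split_for_tasks(ddis, drug_ids, fold, n_folds=5, seed=0):
    """Sketch of the Task 2/3 protocol: 5-fold CV over drugs, not DDIs.
    `ddis` is a list of (drug_i, drug_j) pairs; `fold` selects which
    drug subset is held out as the "new" drugs."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(drug_ids)
    new = set(np.array_split(shuffled, n_folds)[fold].tolist())
    is_known = lambda d: d not in new
    train = [p for p in ddis if is_known(p[0]) and is_known(p[1])]
    task2 = [p for p in ddis if is_known(p[0]) != is_known(p[1])]
    task3 = [p for p in ddis if not is_known(p[0]) and not is_known(p[1])]
    return train, task2, task3  # Task 1 instead splits the DDIs directly
```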

The prediction task is a multi-class classification problem. To evaluate the classification performance of our RaGSECo, we use six metrics: accuracy (ACC), the area under the precision-recall curve (AUPR), the area under the ROC curve (AUC), Precision, Recall, and F1 score. The activation function, dropout layer, and batch normalization layer [17] are used between the fully connected layers. By default, we use the Gaussian error linear unit [14] activation function and the RAdam optimizer [28]. The proposed RaGSECo is implemented with the deep learning library PyTorch. The Python and PyTorch versions are 3.8.10 and 1.9.0, respectively. All experiments are conducted on a Windows server with a GPU (NVIDIA GeForce RTX 4090 Ti).

4.4 Hyperparameter Searching and Setting

The proposed RaGSECo involves several hyperparameters that influence the prediction performance, including the batch size ($bs$), learning rate ($lr$), dropout rate ($dr$), training epochs ($te$), the dimension of drug RaGSEs ($d'$), the power of the adjacency matrices in the DDS graph ($n$), the output dimension of FNN1 ($d^{FNN}$), the threshold values ($t_{pos}$ and $t_{neg}$), and the weight of the CE loss ($\lambda$). To thoroughly investigate the impact of each parameter on the prediction results, a grid search strategy is employed in which one parameter is varied while the other parameters are kept fixed. Table 5 summarizes the values of all hyperparameters on the different datasets and tasks. Among these hyperparameters, $n$, $d^{FNN}$, $t_{pos}$, and $t_{neg}$ are specific to the proposed RaGSECo. Therefore, in the subsequent experiments, we provide the impact of these hyperparameters on the experimental results and analyze the underlying reasons.

4.4.1 Impact of RaGSE Propagation Distance $n$

Figure 4: Six metric scores versus the maximum RaGSE propagation distance $n$ on Task 1 (a), Task 2 (b), and Task 3 (c) of Dataset 1.

The hyperparameter $n$ represents the power of the adjacency matrices in the DDS graph and determines the distance of RaGSE propagation, playing a crucial role in RaGSE propagation. To understand the influence of $n$ on the prediction performance, we conduct experiments on the three tasks of Dataset 1 and evaluate six metric scores across different values of $n$. Referring to Fig. 4 (a), in Task 1, the metric scores of RaGSECo do not change drastically as $n$ increases, and RaGSECo consistently achieves good prediction performance. On the other hand, in Tasks 2 and 3, we observe that the prediction performance of RaGSECo is sensitive to the value of $n$. Specifically, when $n$ is set to 0, the prediction performance of our RaGSECo is at its lowest, and it improves significantly as the value of $n$ increases. This is because the test DDIs in Tasks 2 and 3 involve new drugs that possess no relation-aware interaction information when $n$ is 0; as a result, the model tends to over-fit. As $n$ increases, new drugs can aggregate effective relation-aware interaction information from similar known drugs, which significantly improves the generalization ability of RaGSECo. Based on the experimental results, we set $n$ to 0 in Task 1 and to 3 in Tasks 2 and 3 to achieve desirable prediction performance.

4.4.2 Impact of Output Dimension $d^{FNN}$

Figure 5: Six metric scores versus the output dimension $d^{FNN}$ of FNN1 and FNN2 on Task 1 (a), Task 2 (b), and Task 3 (c) of Dataset 1.

The output dimension $d^{FNN}$ of FNN1 and FNN2 is an important hyperparameter for our RaGSECo model. Increasing $d^{FNN}$ can enhance the generalization ability of RaGSECo to some extent. To investigate the impact of $d^{FNN}$ on the prediction performance of RaGSECo, we conducted experiments on the three tasks of Dataset 1 and examined the changes in metric scores as $d^{FNN}$ varies. From Fig. 5, it can be observed that the prediction performance of RaGSECo gradually improves as $d^{FNN}$ increases across all tasks. This indicates the robustness of our model. To strike a balance between accuracy and efficiency, we set $d^{FNN}$ to 1000 for Task 1 and 1500 for Tasks 2 and 3.

4.4.3 Impact of Threshold Values $t_{pos}$ and $t_{neg}$

Figure 6: ACC (a), AUPR (b), and F1 (c) versus the threshold values $t_{pos}$ and $t_{neg}$ on Task 1 of Dataset 1.

The threshold values $t_{pos}$ and $t_{neg}$ are important parameters for selecting positive and negative samples of DPs. In our experiments, we vary $t_{pos}$ over $[0.8, 0.85, \dots, 1]$ and $t_{neg}$ over $[0, 0.1, \dots, 0.4]$. Fig. 6 presents comparisons of three representative metric scores (ACC, AUPR, and F1) for different values of $t_{pos}$ and $t_{neg}$. It can be observed that the results are not significantly affected by variations in $t_{pos}$ and $t_{neg}$, indicating the stability of our co-contrastive learning strategy. Herein, the feature representations from the two views of each DP serve as positive samples for each other when $t_{pos}$ equals 1. As observed, the model produces superior prediction results when $t_{pos}$ is set to 0.95 rather than 1. This observation indicates the effectiveness of our positive sample selection strategy. Finally, we set $t_{pos}=0.95$ and $t_{neg}=0.1$ for subsequent experiments.

Figure 7: Experimental results of RaGSECo and its six variants in terms of ACC (a), AUPR (b), and F1 (c) on Task 1 of Dataset 1.
Figure 8: Experimental results of RaGSECo and its six variants in terms of ACC (a), AUPR (b), and F1 (c) on Task 2 of Dataset 1.
Figure 9: Experimental results of RaGSECo and its six variants in terms of ACC (a), AUPR (b), and F1 (c) on Task 3 of Dataset 1.

4.5 Analysis of RaGSECo with Its Variants

In this subsection, we compare our proposed RaGSECo with six variants to assess the necessity and effectiveness of each component for DDI prediction. RaGSECo incorporates multiple components, including the construction of the multi-relational DDI and multi-attribute DDS graphs, the integration of SMILES strings, initial features, and RaGSEs of drugs, and the adoption of a co-contrastive learning mechanism to enhance DP representation learning. By comparing RaGSECo with its variants, we aim to understand the contribution of each component to the overall performance. The six variants are defined as follows:

• RaGSECo-R: A variant of RaGSECo that does not construct a multi-relational DDI graph to learn the RaGSEs of known drugs. From a practical perspective, the initial drug features $\mathbf{X}$ are directly used as the initial node features of the multi-attribute DDS graph to learn drug embeddings.

• RaGSECo-M: A variant of RaGSECo that does not construct the multi-attribute DDS graph for RaGSE propagation. In other words, the output of RaGSEL, $\mathbf{H}$, is taken as the final drug embeddings.

• RaGSECo-I: A variant of RaGSECo that neglects the initial features of drugs during DP representation learning. Hence, the representation of DP $k$ in Eq. (15) becomes $\mathbf{p}_{k}=||\left(\mathbf{p}_{k}^{smile},\mathbf{p}_{k}^{embed},\mathbf{p}_{k}^{smile}+\mathbf{p}_{k}^{embed}\right)$.

• RaGSECo-S: A variant of RaGSECo that does not use the SMILES strings of drugs during DP representation learning. Thus, the representation of DP $k$ in Eq. (15) becomes $\mathbf{p}_{k}=||\left(\mathbf{p}_{k}^{initi},\mathbf{p}_{k}^{embed},\mathbf{p}_{k}^{initi}+\mathbf{p}_{k}^{embed}\right)$.

• RaGSECo-E: A RaGSECo variant that ignores drug RaGSEs during DP representation learning. Hence, the representation of DP $k$ in Eq. (15) becomes $\mathbf{p}_{k}=||\left(\mathbf{p}_{k}^{initi},\mathbf{p}_{k}^{smile},\mathbf{p}_{k}^{initi}+\mathbf{p}_{k}^{smile}\right)$.

• RaGSECo-C: A RaGSECo variant that neglects co-contrastive learning. Hence, the representation of DP $k$ in Eq. (15) becomes $\mathbf{p}_{k}=\mathbf{p}_{k}^{embed}$, and the total loss is $\mathcal{L}=\mathcal{L}_{ce}$.

Herein, we select three representative metric scores (ACC, AUPR, and F1) to evaluate the prediction performance of these models. Figs. 7, 8, and 9 illustrate the performance of RaGSECo and its six variants on Tasks 1, 2, and 3 of Dataset 1, respectively. These figures show that RaGSECo achieves higher metric scores than its variants, indicating the effectiveness of RaGSE learning and propagation, multimodal DP representation learning, and the co-contrastive learning mechanism. Comparing RaGSECo-R and RaGSECo-M, we can see that RaGSECo-M performs better on Task 1, while RaGSECo-R significantly outperforms RaGSECo-M on Tasks 2 and 3. This observation confirms that the test DDIs in Task 1 consist of known drugs with distinguishable relation-aware information, whereas the test DDIs in Tasks 2 and 3 include new drugs, which degrades model performance; RaGSE propagation can effectively mitigate this issue. Analyzing the results of RaGSECo-I, RaGSECo-S, and RaGSECo-E reveals that incorporating the drugs' initial features, SMILES strings, and RaGSEs enhances data diversity and improves the model's performance. Finally, the performance of RaGSECo-C further validates the effectiveness of the co-contrastive learning mechanism.

Table 2: Prediction performances of different methods on Dataset 1.
Method ACC AUPR AUC Precision Recall F1
Task 1 RaGSECo 0.9461 0.9838 0.9991 0.9121 0.9043 0.9050
MCFF-MTDDI 0.9350 0.9757 0.9985 0.9100 0.8820 0.8918
MDF-SA-DDI 0.9301 0.9737 0.9989 0.9085 0.8760 0.8878
RANEDDI 0.9228 0.9657 0.9980 0.8747 0.8701 0.8717
DDIMDL 0.8852 0.9208 0.9976 0.8471 0.7182 0.7585
DeepDDI 0.8371 0.8899 0.9961 0.7275 0.6611 0.6848
DNN 0.8797 0.9134 0.9963 0.8047 0.7027 0.7223
Task 2 RaGSECo 0.6826 0.7002 0.9535 0.6514 0.5631 0.5860
MCFF-MTDDI 0.6650 0.6800 0.9500 0.6561 0.5139 0.5574
MDF-SA-DDI 0.6633 0.6776 0.9497 0.6547 0.5078 0.5584
DDIMDL 0.6415 0.6558 0.9799 0.5607 0.4319 0.4460
DeepDDI 0.5774 0.5594 0.9575 0.3630 0.3890 0.3416
DNN 0.6239 0.6361 0.9796 0.4237 0.2840 0.2997
Task 3 RaGSECo 0.4464 0.4014 0.8848 0.3001 0.2513 0.2600
MCFF-MTDDI 0.4400 0.3870 0.8701 0.2823 0.2351 0.2437
MDF-SA-DDI 0.4338 0.3873 0.8630 0.2715 0.2226 0.2329
DDIMDL 0.4075 0.3635 0.9512 0.2408 0.1452 0.1590
DeepDDI 0.3602 0.2781 0.9059 0.1586 0.1450 0.1373
DNN 0.4087 0.3776 0.9550 0.1836 0.1093 0.1152
Figure 10: Performance comparison for each DDI event type of Dataset 1.
Table 3: Prediction performances of different methods on Dataset 2.
Method ACC AUPR AUC Precision Recall F1
Task 1 RaGSECo 0.9344 0.9805 0.9995 0.9021 0.9234 0.9113
MCFF-MTDDI 0.9010 0.9532 0.9984 0.8300 0.9122 0.8631
MDF-SA-DDI 0.8725 0.9385 0.9979 0.7518 0.9198 0.8220
RANEDDI 0.8611 0.9225 0.9872 0.8155 0.9084 0.8110
DDIMDL 0.8401 0.8824 0.9892 0.7678 0.8580 0.7800
DeepDDI 0.7813 0.8542 0.9810 0.6459 0.8065 0.6732
DNN 0.8281 0.8650 0.9722 0.7337 0.8239 0.7560
Task 2 RaGSECo 0.6570 0.6800 0.9862 0.5576 0.6087 0.5586
MCFF-MTDDI 0.6422 0.6564 0.9601 0.5407 0.5087 0.5140
MDF-SA-DDI 0.6333 0.6486 0.9667 0.5317 0.4900 0.4785
DDIMDL 0.6039 0.6159 0.9737 0.5080 0.4465 0.4470
DeepDDI 0.5266 0.5122 0.9522 0.3030 0.3592 0.3219
DNN 0.5985 0.6035 0.9666 0.3572 0.2475 0.2661
Task 3 RaGSECo 0.4775 0.4362 0.9578 0.2957 0.2898 0.2723
MCFF-MTDDI 0.4545 0.4136 0.9340 0.2857 0.2348 0.2427
MDF-SA-DDI 0.4530 0.4092 0.9358 0.2768 0.2336 0.2399
DDIMDL 0.4111 0.3739 0.9514 0.2715 0.1691 0.1823
DeepDDI 0.3466 0.2685 0.9165 0.2284 0.1535 0.1640
DNN 0.4062 0.3680 0.9454 0.1576 0.1245 0.1373

4.6 Comparison with Other Methods

4.6.1 Dataset 1

To assess the competitiveness of our RaGSECo, we compare it with several state-of-the-art DDI prediction methods, namely MCFF-MTDDI, MDF-SA-DDI, RANEDDI, DDIMDL, DeepDDI, and DNN. Table 2 presents the metric scores achieved by these methods across the three tasks of Dataset 1. As observed, in most cases, our RaGSECo outperforms the competitors across the performance metrics of the three tasks. In Task 1, although RANEDDI also considers relation-aware graph structure information to learn drug embeddings and achieves outstanding prediction performance, our RaGSECo achieves better results. Specifically, the improvements of RaGSECo over RANEDDI are 2.33%, 1.81%, 3.74%, 2.83%, and 3.33% in terms of ACC, AUPR, Precision, Recall, and F1, respectively. The reasons for this are manifold. On the one hand, RANEDDI ignores multiple drug-related attributes, such as targets, enzymes, and chemical substructures, so it lacks the ability to capture relationships between drugs beyond interactions. On the other hand, simply concatenating the embeddings of two drugs to obtain drug-pair features also limits the generalization ability of RANEDDI. MDF-SA-DDI considers multiple attributes to represent drugs and employs multiple drug fusion methods to learn drug-pair features. Nevertheless, our RaGSECo still achieves improvements of 1.63%, 1.01%, 2.83%, and 1.72% over MDF-SA-DDI with respect to ACC, AUPR, Recall, and F1, respectively. The main reason is that MDF-SA-DDI does not take advantage of the specific interaction information between drugs.

For further insight, we investigate the performance of our RaGSECo and four competitive baselines for each event. Fig. 10 displays the AUPR and AUC scores of the five prediction models for each event on Task 1. As observed, RaGSECo produces greater AUPR and AUC scores than the other methods for most event types. In most cases, RaGSECo achieves a satisfactory result. All unsatisfactory prediction results are observed in low-frequency event types, such as #39, #52, and #64, with frequencies of only 98, 24, and 10 samples, respectively. The limited availability of training samples for these low-frequency event types may contribute to the relatively poorer performance.

In Tasks 2 and 3, we compare our RaGSECo with five competitive prediction methods, i.e., MCFF-MTDDI, MDF-SA-DDI, DDIMDL, DeepDDI, and DNN. Since RANEDDI only focuses on embedding known drugs, it cannot be applied to Tasks 2 and 3. In these two tasks, the test DDIs include new drugs, and the lack of interaction information on the new drugs weakens the generalization ability of the models. Therefore, the prediction accuracy on Tasks 2 and 3 is lower than on Task 1. Nevertheless, RaGSECo outperforms the other competitors in most cases. The reasons are threefold: 1) RaGSECo enables all drugs, including new drugs, to capture effective relation-aware interaction information. 2) RaGSECo inventively combines initial features, SMILES information, and drug RaGSEs, enhancing information diversity. 3) RaGSECo captures the underlying correlations between DPs and enables the two views to supervise each other through co-contrastive learning.

4.6.2 Dataset 2

Table 3 presents the performance metrics of the proposed RaGSECo and other outstanding approaches on the three tasks of Dataset 2. Dataset 2 exhibits greater diversity than Dataset 1: it has more drugs, more DDIs, and a wider range of DDI event types. As observed, our RaGSECo achieves better prediction performance than the other competitors in most cases. In Task 1, compared with MCFF-MTDDI, the proposed RaGSECo shows performance improvements of 0.73%, 2.25%, 1.47%, and 2.11% in terms of ACC, Precision, Recall, and F1, respectively. As observed in Tasks 2 and 3, our RaGSECo still acquires the best results compared with the five competitive prediction models. The experimental results demonstrate the effectiveness and robustness of RaGSECo.

4.7 Case Study

In this section, we conduct case studies to validate the usefulness of RaGSECo. We use all DDIs of Dataset 2 and their event types to train the RaGSECo model, then apply the trained model to the remaining DPs and report the top-ranked predictions. We focus on the five events with the highest frequencies and examine the top 10 predictions for each event. We then verify the selected DPs using the DDI Checker tool (https://go.drugbank.com/interax/multi_search). Among the 50 selected DPs, 7 DDIs are confirmed and recorded in Table 4. For example, Ribavirin may decrease the excretion rate of Sofosbuvir, which could result in a higher serum level, and the metabolism of Sevoflurane can be increased when combined with Prednisolone phosphate.
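The ranking step of this case study can be sketched as follows; probs, dp_names, and event_ids are hypothetical inputs standing in for the trained model's outputs and the five most frequent events.

```python
import numpy as np

def top_predictions(probs: np.ndarray, dp_names: list,
                    event_ids: list, k: int = 10) -> dict:
    """Rank candidate drug pairs per event by predicted probability.

    probs: (n_candidates, n_events) probabilities for untested DPs;
    dp_names: candidate DP names; event_ids: the events to inspect.
    """
    ranked = {}
    for e in event_ids:
        order = np.argsort(-probs[:, e])[:k]  # indices of the k largest scores
        ranked[e] = [(dp_names[i], float(probs[i, e])) for i in order]
    return ranked
```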

Table 4: The drug names and event types of the confirmed DDIs.
Rank  Drug names                                  DDI event type
1     Prednisolone phosphate and Sevoflurane      The metabolism increase
2     Dexamethasone and Amlodipine                The metabolism increase
3     Prednisolone phosphate and Flunitrazepam    The metabolism increase
4     Gestoden and Clofazimine                    The metabolism decrease
5     Mometasone furoate and Cytarabine           The metabolism decrease
6     Prednisolone phosphate and Ranolazine       The serum concentration increase
7     Velpatasvir and Ribavirin                   The excretion rate which could result in a higher serum level decrease
Table 5: The hyper-parameters achieving the best accuracy for the proposed RaGSECo on all tasks.
Dataset    Task    Hyper-parameters
Dataset 1  Task 1  bs: 512,  lr: 2e-5, dr: 0.3, te: 120, d': 500, n: 0, d^FNN: 1000, t_pos: 0.95, t_neg: 0.1, λ: 5
Dataset 1  Task 2  bs: 512,  lr: 5e-6, dr: 0.2, te: 120, d': 500, n: 3, d^FNN: 1500, t_pos: 0.95, t_neg: 0.1, λ: 5
Dataset 1  Task 3  bs: 512,  lr: 5e-6, dr: 0.2, te: 120, d': 500, n: 3, d^FNN: 1500, t_pos: 0.95, t_neg: 0.1, λ: 5
Dataset 2  Task 1  bs: 1024, lr: 2e-5, dr: 0.5, te: 120, d': 500, n: 0, d^FNN: 1500, t_pos: 0.95, t_neg: 0.1, λ: 5
Dataset 2  Task 2  bs: 1024, lr: 5e-6, dr: 0.5, te: 120, d': 500, n: 3, d^FNN: 1500, t_pos: 0.95, t_neg: 0.1, λ: 5
Dataset 2  Task 3  bs: 1024, lr: 5e-6, dr: 0.5, te: 120, d': 500, n: 3, d^FNN: 1500, t_pos: 0.95, t_neg: 0.1, λ: 5
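For readability, one row of Table 5 can be collected into a single configuration mapping; the key names and glosses are our assumptions (e.g., reading te as training epochs and n as a propagation depth), not the paper's code.

```python
# Table 5 settings for Dataset 1, Task 1, gathered into one mapping.
config = {
    "batch_size": 512,   # bs
    "lr": 2e-5,          # learning rate
    "dropout": 0.3,      # dr
    "epochs": 120,       # te (assumed: training epochs)
    "embed_dim": 500,    # d'
    "n_layers": 0,       # n (assumed: propagation depth)
    "fnn_dim": 1000,     # d^FNN
    "t_pos": 0.95,       # positive-sample threshold
    "t_neg": 0.1,        # negative-sample threshold
    "lambda": 5,         # λ, loss-balancing weight
}
```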

5 Conclusion

In multi-relational DDI prediction, relation-aware graph embedding-based methods hold considerable promise. Nevertheless, the interaction information of new drugs is unknown, which may cause these methods to suffer from severe over-fitting when predicting DDIs involving new drugs. To address this issue, we introduce the RaGSECo approach. The primary contribution of RaGSECo is to enable all drugs to capture effective relation-aware interaction features and to encourage the training and test sets to follow similar distributions. Additionally, RaGSECo employs a cross-view contrastive mechanism that boosts DP representation learning by exploiting the underlying correlations between DPs and letting the two views supervise each other collaboratively. Our approach thus offers a promising solution for drug and DP representation learning, enhancing DDI prediction performance.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

  • [1] Assaf Gottlieb and et al. INDI: a computational framework for inferring drug interactions and their associated recommendations. Molecular systems biology, 8(1):592, 2012.
  • [2] Antoine Bordes, Nicolas Usunier, Alberto García-Durán, Jason Weston, and Oksana Yakhnenko. Translating embeddings for modeling multi-relational data. Advances in Neural Information Processing Systems, 26, 2013.
  • [3] Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey E. Hinton. A simple framework for contrastive learning of visual representations. In Proceedings of the 37th International Conference on Machine Learning, ICML, volume 119, pages 1597–1607, 2020.
  • [4] Zhongjian Cheng, Qichang Zhao, Yaohang Li, and et al. IIFDTI: predicting drug-target interactions through interactive and independent features based on attention mechanism. Bioinformatics, 38(17):4153–4161, 2022.
  • [5] Yifan Deng, Xinran Xu, Yang Qiu, and et al. A multimodal deep learning framework for predicting drug-drug interaction events. Bioinformatics, 36(15):4316–4322, 2020.
  • [6] Yijie Ding, Jijun Tang, and Fei Guo. Identification of drug-target interactions via multi-view graph regularized link propagation model. Neurocomputing, 461:618–631, 2021.
  • [7] Yijie Ding, Jijun Tang, Fei Guo, and et al. Identification of drug-side effect association via multiple information integration with centered kernel alignment. Neurocomputing, 325:211–224, 2019.
  • [8] Reza Ferdousi, Reza Safdari, and Yadollah Omidi. Computational prediction of drug-drug interactions based on drugs functional similarities. Journal of Biomedical Informatics, 70:54–64, 2017.
  • [9] Achille Fokoue, Mohammad Sadoghi, Oktie Hassanzadeh, and Ping Zhang. Predicting drug-drug interactions through large-scale similarity-based link prediction. In The Semantic Web - 13th International Conference, ESWC, volume 9678, pages 774–789, 2016.
  • [10] Zihao Gao, Huifang Ma, Xiaohui Zhang, Zheyu Wu, and Zhixin Li. Co-contrastive self-supervised learning for drug-disease association prediction. In PRICAI 2022: Trends in Artificial Intelligence - 19th Pacific Rim International Conference on Artificial Intelligence, PRICAI 2022, Shanghai, China, November 10-13, 2022, Proceedings, Part I, volume 13629 of Lecture Notes in Computer Science, pages 327–338, 2022.
  • [11] Chen-Di Han, Chun-Chun Wang, Li Huang, and Xing Chen. MCFF-MTDDI: multi-channel feature fusion for multi-typed drug-drug interaction prediction. Briefings in Bioinformatics, 2023.
  • [12] Chengxin He, Lei Duan, Huiru Zheng, Linlin Song, and Menglin Huang. An explainable framework for drug repositioning from disease information network. Neurocomputing, 511:247–258, 2022.
  • [13] Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, and Ross B. Girshick. Momentum contrast for unsupervised visual representation learning. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020, Seattle, WA, USA, June 13-19, 2020, pages 9726–9735, 2020.
  • [14] Dan Hendrycks and Kevin Gimpel. Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415, 2020.
  • [15] Yue Hong, Pengyu Luo, Shuting Jin, and et al. Lagat: link-aware graph attention network for drug-drug interaction prediction. Bioinformatics, 38(24):5406–5412, 2022.
  • [16] Kexin Huang, Tianfan Fu, Lucas M. Glass, and et al. DeepPurpose: a deep learning library for drug-target interaction prediction. Bioinformatics, 36(22-23):5545–5547, 2021.
  • [17] Sergey Ioffe and Christian Szegedy. Batch normalization: accelerating deep network training by reducing internal covariate shift. In ICML, 2015.
  • [18] Humayun Kayesh, Md. Saiful Islam, Junhu Wang, Ryoma J. Ohira, and Zhe Wang. SCAN: A shared causal attention network for adverse drug reactions detection in tweets. Neurocomputing, 479:60–74, 2022.
  • [19] Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations, ICLR, 2017.
  • [20] Geonhee Lee, Chihyun Park, and Jaegyoon Ahn. Novel deep learning model for more accurate prediction of drug-drug interaction effects. BMC Bioinformatics, 20(1):1–8, 2019.
  • [21] Geonhee Lee, Chihyun Park, and Jaegyoon Ahn. Novel deep learning model for more accurate prediction of drug-drug interaction effects. BMC Bioinformatics, 20(1):415:1–415:8, 2019.
  • [22] Zhifei Li, Hai Liu, Zhaoli Zhang, Tingting Liu, and Jiangbo Shu. Recalibration convolutional networks for learning interaction knowledge graph embedding. Neurocomputing, 427:118–130, 2021.
  • [23] Majun Lian, Xinjie Wang, and Wenli Du. Integrated multi-similarity fusion and heterogeneous graph inference for drug-target interaction prediction. Neurocomputing, 500:1–12, 2022.
  • [24] Jiacheng Lin, Lijun Wu, Jinhua Zhu, and et al. R2-DDI: relation-aware feature refinement for drug-drug interaction prediction. Briefings in Bioinformatics, 24(1), 2023.
  • [25] Shenggeng Lin, Yanjing Wang, Lingfeng Zhang, and et al. MDF-SA-DDI: predicting drug-drug interaction events based on multi-source drug fusion, multi-source feature fusion and transformer self-attention mechanism. Briefings in Bioinformatics, 23(1), 2022.
  • [26] Yankai Lin, Zhiyuan Liu, Maosong Sun, Yang Liu, and Xuan Zhu. Learning entity and relation embeddings for knowledge graph completion. In Proceedings of the AAAI conference on artificial intelligence, 2015.
  • [27] Kun Liu, Rui Meng, Longteng Li, Jingkun Mao, and Haiyong Chen. SiSL-Net: Saliency-guided self-supervised learning network for image classification. Neurocomputing, 510:193–202, 2022.
  • [28] Liyuan Liu, Haoming Jiang, Pengcheng He, and et al. On the variance of the adaptive learning rate and beyond. In ICLR, 2020.
  • [29] Qianhui Men, Edmond S.L. Ho, Hubert P.H. Shum, and Howard Leung. Focalized contrastive view-invariant learning for self-supervised skeleton-based action recognition. Neurocomputing, 537:198–209, 2023.
  • [30] Arnold K Nyamabo, Hui Yu, Zun Liu, and Jian Yu Shi. Drug-drug interaction prediction with learnable size-adaptive molecular substructures. Briefings in Bioinformatics, 23(1), 2022.
  • [31] Arnold K. Nyamabo, Hui Yu, and Jian-Yu Shi. SSI-DDI: substructure-substructure interactions for drug-drug interaction prediction. Briefings in Bioinformatics, 22(6), 2021.
  • [32] Shanchen Pang, Ying Zhang, Tao Song, Xudong Zhang, and Xun Wang. AMDE: a novel attention-mechanism-based multidimensional feature encoder for drug-drug interaction prediction. Briefings in Bioinformatics, 23(1), 2022.
  • [33] Jae Yong Ryu, Hyun Uk Kim, and et al. Deep learning improves prediction of drug-drug and drug-food interactions. Proceedings of the National Academy of Sciences, 115(18):E4304–E4311, 2018.
  • [34] Michael Sejr Schlichtkrull, Thomas N. Kipf, Peter Bloem, Rianne van den Berg, Ivan Titov, and Max Welling. Modeling relational data with graph convolutional networks. In The Semantic Web - 15th International Conference, ESWC, volume 10843, pages 593–607, 2018.
  • [35] Yifan Shang, Lin Gao, Quan Zou, and Liang Yu. Prediction of drug-target interactions based on multi-layer network representation learning. Neurocomputing, 434:80–89, 2021.
  • [36] Ying Shen, Kaiqi Yuan, Min Yang, Buzhou Tang, Yaliang Li, Nan Du, and Kai Lei. KMR: knowledge-oriented medicine representation learning for drug-drug interaction and similarity computation. Journal of Cheminformatics, 11(1):22:1–22:16, 2019.
  • [37] Jian-Yu Shi, Hua Huang, Jia-Xin Li, and et al. TMFUF: a triple matrix factorization-based unified framework for predicting comprehensive drug-drug interactions of new drugs. BMC Bioinformatics, 19(14):27–37, 2018.
  • [38] Yonglong Tian, Dilip Krishnan, and Phillip Isola. Contrastive multiview coding. In Computer Vision - ECCV, volume 12356, pages 776–794, 2020.
  • [39] Ashish Vaswani, Noam Shazeer, Niki Parmar, and et al. Attention is all you need. Advances in Neural Information Processing Systems, 30, 2017.
  • [40] Petar Velickovic, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. Graph attention networks. In 6th International Conference on Learning Representations, ICLR, 2018.
  • [41] Santiago Vilar, Carol Friedman, and George Hripcsak. Detection of drug-drug interactions through data mining studies using clinical sources, scientific literature and social media. Briefings in Bioinformatics, 19(5):863–877, 2018.
  • [42] Hanchen Wang, Defu Lian, Ying Zhang, and et al. GoGNN: graph of graphs neural network for predicting structured entity interactions. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 2020.
  • [43] Xiao Wang, Nian Liu, Hui Han, and Chuan Shi. Self-supervised heterogeneous graph neural network with co-contrastive learning. In Feida Zhu, Beng Chin Ooi, and Chunyan Miao, editors, KDD ’21: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Virtual Event, Singapore, August 14-18, 2021, pages 1726–1736. ACM, 2021.
  • [44] Xiaoqi Wang, Yaning Yang, Kenli Li, Wentao Li, Fei Li, and Shaoliang Peng. BioERP: biomedical heterogeneous network-based self-supervised representation learning approach for entity relationship predictions. Bioinformatics, 37(24):4793–4800, 2021.
  • [45] Yanda Wang, Weitong Chen, Dechang Pi, Lin Yue, Sen Wang, and Miao Xu. Self-supervised adversarial distribution regularization for medication recommendation. In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, IJCAI, pages 3134–3140, 2021.
  • [46] Yueyue Wang, Danjun Song, Wentao Wang, Shengxiang Rao, Xiaoying Wang, and Manning Wang. Self-supervised learning and semi-supervised learning for multi-sequence medical image classification. Neurocomputing, 513:383–394, 2022.
  • [47] Hui Yu, Wenmin Dong, Jianyu Shi, and et al. RANEDDI: relation-aware network embedding for drug-drug interaction prediction. Information Sciences, 582:167–180, 2022.
  • [48] Hui Yu, Shiyu Zhao, and Jianyu Shi. STNN-DDI: a substructure-aware tensor neural network to predict drug-drug interactions. Briefings in Bioinformatics, 23(4), 2022.
  • [49] Hongyi Zhang, Moustapha Cissé, Yann N. Dauphin, and David Lopez-Paz. mixup: beyond empirical risk minimization. In ICLR, 2018.
  • [50] Tianlin Zhang, Jiaxu Leng, and Ying Liu. Deep learning for drug-drug interaction extraction from the literature: a review. Briefings in Bioinformatics, 21(5):1609–1627, 2020.
  • [51] Tongxuan Zhang, Hongfei Lin, Yuqi Ren, Zhihao Yang, Jian Wang, Xiaodong Duan, and Bo Xu. Identifying adverse drug reaction entities from social media with adversarial transfer learning model. Neurocomputing, 453:254–262, 2021.
  • [52] Wen Zhang, Kanghong Jing, Feng Huang, and et al. SFLLN: A sparse feature learning ensemble method with linear neighborhood regularization for predicting drug-drug interactions. Information Sciences, 497:189–201, 2019.
  • [53] Yu P. Zhang and Quan Zou. PPTPP: a novel therapeutic peptide prediction method using physicochemical property encoding and adaptive feature representation learning. Bioinformatics, 36(13):3982–3987, 2020.
  • [54] Biao Zhao, Weiqiang Jin, Javier Del Ser, and Guang Yang. ChatAgri: Exploring potentials of chatgpt on cross-linguistic agricultural text classification, 2023.
  • [55] Chengshuai Zhao, Shuai Liu, Feng Huang, Shichao Liu, and Wen Zhang. CSGNN: Contrastive self-supervised graph neural network for molecular interaction prediction. In IJCAI, pages 3756–3763, 2021.
  • [56] Tianyi Zhao, Yang Hu, Linda R. Valsdottir, and et al. Identifying drug-target interactions based on graph convolutional network and deep neural network. Briefings in Bioinformatics, 22(2):2141–2150, 2021.
  • [57] Yi Zheng, Hui Peng, and et al. DDI-PULearn: a positive-unlabeled learning method for large-scale prediction of drug-drug interactions. BMC Bioinformatics, 20(19):661, 2019.
  • [58] Jiajing Zhu, Yongguo Liu, Yun Zhang, and et al. Multi-attribute discriminative representation learning for prediction of adverse drug-drug interaction. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(12):10129–10144, 2022.
  • [59] Luhe Zhuang, Hong Wang, Meifang Hua, Wei Li, and Hui Zhang. Predicting drug-drug adverse reactions via multi-view graph contrastive representation model. Applied Intelligence, pages 1–18, 2023.
  • [60] Marinka Zitnik, Monica Agrawal, and Jure Leskovec. Modeling polypharmacy side effects with graph convolutional networks. Bioinformatics, 34(13):i457–i466, 2018.