
Central-Smoothing Hypergraph Neural Networks for Predicting Drug-Drug Interactions

Duc Anh Nguyen, Canh Hao Nguyen, and Hiroshi Mamitsuka. The authors are with the Bioinformatics Center, Institute for Chemical Research, Kyoto University, Japan. H.M. is also with the Department of Computer Science, Aalto University, Finland. D.A.N., C.H.N., and H.M. have been supported in part by the Otsuka Toshimi Scholarship Foundation, MEXT KAKENHI [22K12150], and MEXT KAKENHI [16H02868, 19H04169, 21H05027, 22H03645] (and the AIPSE program by the Academy of Finland), respectively. Corresponding author: D.A.N. (email: [email protected]). Published at IEEE Transactions on Neural Networks and Learning Systems, DOI: 10.1109/TNNLS.2023.3261860.
Abstract

Predicting drug-drug interactions (DDI) is the problem of predicting side effects (unwanted outcomes) of a pair of drugs using drug information and known side effects of many pairs. This problem can be formulated as predicting labels (i.e., side effects) for each pair of nodes in a DDI graph, whose nodes are drugs and whose edges are interacting drug pairs with known labels. State-of-the-art methods for this problem are graph neural networks (GNNs), which leverage neighborhood information in the graph to learn node representations. For DDI, however, there are many labels with complicated relationships due to the nature of side effects. Usual GNNs often fix each label as a one-hot vector, which does not reflect label relationships and potentially does not obtain the highest performance in the difficult cases of infrequent labels. In this paper, we formulate DDI as a hypergraph where each hyperedge is a triple: two nodes for drugs and one node for a label. We then present CentSmoothie, a hypergraph neural network that learns representations of nodes and labels altogether with a novel 'central-smoothing' formulation. We empirically demonstrate the performance advantages of CentSmoothie in simulations as well as real datasets.

Index Terms:
hypergraph neural networks, hypergraph Laplacian, smoothing, drug-drug interactions

I Introduction

In drug-drug interactions (DDI), concurrent use of two drugs can lead to side effects, which are unwanted reactions in human bodies. Predicting drug-drug interactions is a very important task for guiding drug safety. Given drug information and known side effects of many pairs of drugs, one wishes to learn a model to predict side effects of all pairs of drugs, including new pairs of drugs without known side effects and known pairs (to denoise or complete side effect data). DDI is usually represented as a graph with nodes for drugs and edges for drug pairs that interact, with (binary vector) labels for (known) side effects [1]. The task is to predict labels of all pairs of nodes in the DDI graph. Fig. 1a shows an example of a DDI graph, where the dotted edge with question marks is the pair of drugs with labels to be predicted.

Figure 1: Illustrative examples of (a) a traditional graph and (b) a (proposed) hypergraph for drug-drug interactions, and (c) the central-smoothing assumption.

Recently, graph neural networks have emerged as a prominent approach for this task with high prediction performance [1, 2]. Graph neural networks for predicting DDI have two steps: learning new representations of drugs from a DDI graph, and using these representations for prediction. One drawback of this approach is the lack of learned label (i.e., side effect) representations. There are many side effects with complicated relationships. For example, our largest dataset has 964 side effects, where the number of drug pairs for one side effect (positive samples in supervised learning) ranges from 288 to 22,520. Previous methods represent each side effect as an independent one-hot vector, potentially under-utilizing the relationships among side effects [1, 3, 2]. Considering these relationships would be beneficial for predicting side effects, especially those with only small numbers of positive samples (i.e., infrequent side effects). Hence, it is desirable to learn representations for both drugs and side effects, namely both nodes and edge labels, together.

To this end, we propose to encode DDI data with a hypergraph [4]. A node in the hypergraph can be either a drug or a side effect. A hyperedge is a triple of two drugs and a side effect that they cause. Hence, a pair of drugs with multiple side effects results in many hyperedges in the hypergraph. Fig. 1b illustrates an example of a hypergraph corresponding to the DDI graph in Fig. 1a. Existing learning methods of hypergraph neural networks are based on a smoothing assumption that the representations of nodes in a hyperedge should be close to each other [5, 6, 7]. However, this assumption is not necessarily appropriate for our DDI problem, since each node representation should reflect (chemical or biological) properties of the corresponding drug, and interacting drugs do not necessarily have similar properties.

We propose CentSmoothie, a central-smoothing hypergraph neural network that uses our idea, the central-smoothing assumption (see Fig. 1c), for each hyperedge in the hypergraph for DDI. The idea is to learn node representations in a hyperedge such that (i) a drug node representation reflects the properties of the corresponding drug and (ii) a side effect node representation reflects a combination of some properties of the two drugs that cause the corresponding side effect [8, 9]. To implement (ii), we first assume that a side effect representation should be related to the midpoint of the representations of the two interacting drugs, reflecting the combination of the two drugs' properties. Furthermore, the same two drugs may have different side effects, suggesting that each side effect might be obtained by a partial combination of the two drugs' properties. Hence, we propose that the representation of each side effect is learned to be close to a weighted midpoint of the corresponding two drug representations.

We formulate the above assumption, and then define the central-smoothing hypergraph Laplacian to be used in each layer of a hypergraph neural network with spectral convolution [5]. We also provide a computational method for the proposed hypergraph Laplacian with complexity $O(|E|)$, linear in the number of hyperedges.

We conducted extensive experiments to verify the performance advantages of CentSmoothie on both synthetic and real datasets. Our experimental results demonstrated that CentSmoothie significantly outperformed existing spectral-based convolutional hypergraph neural networks in all cases. In particular, CentSmoothie achieved higher performances over baselines on real datasets with more infrequent side effects, which are more difficult to predict, justifying the benefit of learning label (side effect) representations.

II Related Work

Existing work on predicting DDI can be divided into two approaches: non-graph based and graph based. In the non-graph based approach, pre-defined feature vectors, indicating the existence of chemical substructures and interacting proteins of drugs, are used. The side effects can be predicted by a model (for example, a multilayer feedforward neural network) that receives the feature vectors of two drugs as input and outputs a vector indicating the side effects of the two drugs [10, 3].

In the graph based approach, topological information of graphs is used to enhance the representations of nodes, leading to higher performance than the non-graph based approach. Two types of graphs can be used: molecular graphs of drugs and a DDI graph. For a DDI graph, where nodes are drugs and edges are interactions between drugs, graph neural networks (GNNs) are applied to learn a new representation of a drug node based on its neighbors. Recent results show that GNNs for predicting DDI achieve state-of-the-art performance [1, 2]. An extension of a DDI graph is a DDI heterogeneous graph, where nodes are drugs and side effects, and edges are pairs of interacting drugs or drug-side effect associations [11]. However, the DDI heterogeneous graph cannot preserve triples of drug-drug-side effects.

GNNs can be further divided into two approaches: spectral convolution and spatial convolution [12]. In spectral convolution, the graph Laplacian is defined first, and then each GNN layer is constructed from the graph Fourier transform given the graph Laplacian [5, 13]. Spatial convolution uses spatial relations among nodes: a node is updated based on information from its neighbor nodes [14, 11].

Different from existing work on predicting drug-drug interactions, we formulate drug-drug interactions in the form of a hypergraph and develop a new hypergraph neural network (HGNN) on the DDI hypergraph.

In HGNNs, recent work has inherited the spectral convolution approach on graphs, adapting it to hypergraphs by defining the hypergraph Laplacian [5, 7]. Once the hypergraph Laplacian is defined, HGNNs can be constructed in the same manner as GNNs. Another approach for HGNNs is spatial convolution with attention mechanisms [6].

III Background

In this section, we briefly describe the hypergraph Laplacian derived from a smoothness measure [4]. Let $G=(V,E)$ be a general hypergraph, where $V$ is the node set and $E\subset 2^{V}$ is the hyperedge set. Let $W=\mathrm{diag}(w(e_{1}),\dots,w(e_{|E|}))\in\mathbb{R}^{|E|\times|E|}\succcurlyeq\mathbf{0}$ be the diagonal matrix in which $w(e)$ is the weight of hyperedge $e$. Let $x\in\mathbb{R}^{|V|}$ be values of nodes on the hypergraph, where $x_{u}$ is the value of $x$ at node $u$.

The hypergraph Laplacian is usually defined to be used in a similar manner to the graph Laplacian: to evaluate the smoothness of a function on a graph. Let $sh(x,G)$ be a smoothness measure of $x$ on $G$ and $ss(x,e)$ be a smoothness measure of $x$ on hyperedge $e$. The smoothness on the hypergraph usually has the following form [4]:

$$sh(x,G)=\mathcal{T}_{e\in E}\,w(e)\,ss(x,e) \qquad (1)$$

where $\mathcal{T}$ is an aggregation operator, such as sum (the most commonly used one), max, or the $l_{p}$ norm [4]. The usual smoothing assumption on hypergraphs is that nodes within a hyperedge should be close to each other [5, 6, 15], and then the smoothness measure on each hyperedge is calculated by:

$$ss(x,e)=\sum_{(u,v)\in e}(x_{u}-x_{v})^{2}. \qquad (2)$$

When $\mathcal{T}$ is the sum operator, the smoothness of a function on a hypergraph takes the following form:

$$sh(x,G)=\sum_{e\in E}w(e)\sum_{(u,v)\in e}(x_{u}-x_{v})^{2} \qquad (3)$$
$$=x^{\mathsf{T}}Lx, \qquad (4)$$

which is a quadratic form in $L$; $L$ is then called the hypergraph Laplacian of the hypergraph. In the next section, we propose a new smoothing assumption on hypergraphs and then define a new hypergraph Laplacian.
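To make this concrete, the following is a minimal numpy sketch (a toy example of ours, not code from the paper) that builds the Laplacian $L$ of Eq. (4) under the sum aggregator and checks the quadratic form against the direct sum of Eq. (3):

```python
import itertools
import numpy as np

def hypergraph_laplacian(n_nodes, hyperedges, weights):
    """Dense L such that x^T L x equals sh(x, G) in Eq. (4)."""
    L = np.zeros((n_nodes, n_nodes))
    for e, w in zip(hyperedges, weights):
        for u, v in itertools.combinations(e, 2):
            L[u, u] += w
            L[v, v] += w
            L[u, v] -= w
            L[v, u] -= w
    return L

edges = [(0, 1, 2), (1, 3)]           # one 3-node hyperedge, one 2-node edge
w = [1.0, 0.5]
x = np.random.randn(4)
L = hypergraph_laplacian(4, edges, w)
direct = sum(wi * sum((x[u] - x[v]) ** 2
                      for u, v in itertools.combinations(e, 2))
             for e, wi in zip(edges, w))
assert np.isclose(x @ L @ x, direct)  # Eq. (3) equals Eq. (4)
```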

IV CentSmoothie: Central-Smoothing Hypergraph Neural Networks

IV-A Problem Setting

We formulate the problem of predicting DDI as follows.

Input: Given a hypergraph of drug-drug interactions $G=(V,E)$, where the node set $V=V_{D}\cup V_{S}$ consists of a drug node set $V_{D}$ and a side effect node set $V_{S}$; a known hyperedge set $E\subset V_{D}\times V_{D}\times V_{S}$ (since the two drugs in a drug pair are unordered, the two triples $(u,v,t)$ and $(v,u,t)$, with $u,v\in V_{D}$ and $t\in V_{S}$, are the same); and the feature vectors of drugs $X_{D}\in\mathbb{R}^{|V_{D}|\times K_{0}}$, where $K_{0}$ is the feature size. The feature vectors of side effects are one-hot vectors.

Output: For each triple $e=(u,v,t)\in V_{D}\times V_{D}\times V_{S}$, $t$ is predicted to be a side effect of $u$ and $v$ if the score of the triple is larger than a threshold.

IV-B Central-Smoothing Hypergraph Laplacian

The key idea is a central-smoothing assumption: a hyperedge is called central-smooth if a weighted version of the midpoint of the drug node representations is close enough to the representation of the side effect node. It is motivated by biological research showing that a side effect of a pair of drugs is caused by a combination of properties of the two drugs [8, 9]. Assuming that representations reflecting all properties of drugs are obtained, the midpoint (likewise, the sum) of two drug representations should contain all these properties of the two drugs. A weighted midpoint, which in the ideal case would contain properties from each drug, represents a specific combination of the properties, potentially reflecting the cause of a side effect. The idea of summing representations to reflect a combination of features from two entities has been used in the past, such as in the translation model for knowledge graph embedding (TransE, for directed graphs [16]) or kernels for link prediction (pairwise kernels for undirected graphs [17]).

Central-smoothing measure on a hyperedge. In the $K$-dimensional embedding space, consider dimension $k$ with node embeddings $X_{k}\in\mathbb{R}^{|V|}$, where $X_{k,u}\in\mathbb{R}$ is the embedding of node $u\in V$. Given a hyperedge $e=(u,v,t)$, a weight $W_{k,t}\in\mathbb{R}^{+}$ is a parameter indicating the relevance of side effect $t$ on dimension $k$. We assign the weight of side effect $t$ to the hyperedge ($w_{k}(e)=W_{k,t}$), and let $\mathbf{W}_{k}=\mathrm{diag}(w_{k}(e_{1}),\dots,w_{k}(e_{|E|}))$ be the diagonal matrix of the hyperedge weights. The central-smoothing measure on dimension $k$ of the hyperedge is defined as:

$$ss^{c}(X_{k},e)=W_{k,t}\left(\frac{X_{k,u}+X_{k,v}}{2}-X_{k,t}\right)^{2}. \qquad (5)$$

Central-smoothing measure on the hypergraph. For hypergraph $G$, the central-smoothing measure on dimension $k$ is defined as the sum of the central-smoothing measures over all hyperedges:

$$sh^{c}(X_{k},G)=\sum_{e\in E}W_{k,t}\left(\frac{X_{k,u}+X_{k,v}}{2}-X_{k,t}\right)^{2}. \qquad (6)$$

Central-smoothing hypergraph Laplacian. Since $sh^{c}(X_{k},G)$ is a nonnegative quadratic form, there exists an $\mathbf{L}_{k}\in\mathbb{R}^{|V|\times|V|}$ such that $sh^{c}(X_{k},G)=X_{k}^{\mathsf{T}}\mathbf{L}_{k}X_{k}$. We call $\mathbf{L}_{k}$ the central-smoothing hypergraph Laplacian, which can be derived as follows.

Let $H\in\mathbb{R}^{|V|\times|E|}$ be a weighted oriented incidence matrix of $G$ such that for each hyperedge $e=(u,v,t)\in E$, $H_{u,e}=H_{v,e}=\frac{1}{2}$ and $H_{t,e}=-1$. We have:

$$sh^{c}(X_{k},G)=\sum_{e\in E}W_{k,t}\left(\frac{X_{k,u}+X_{k,v}}{2}-X_{k,t}\right)^{2}=X_{k}^{\mathsf{T}}H\mathbf{W}_{k}H^{\mathsf{T}}X_{k}\stackrel{\text{def}}{=}X_{k}^{\mathsf{T}}\mathbf{L}_{k}X_{k}. \qquad (7)$$

Then,

$$\mathbf{L}_{k}=H\mathbf{W}_{k}H^{\mathsf{T}}. \qquad (8)$$
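As a sanity check, the following minimal numpy sketch (our own illustration; the node indexing and toy values are assumptions) builds $H$ and $\mathbf{L}_{k}=H\mathbf{W}_{k}H^{\mathsf{T}}$ for a few triples and verifies that the quadratic form equals the direct sum in Eq. (6):

```python
import numpy as np

def central_laplacian(n_nodes, triples, W_k):
    """L_k = H W_k H^T (Eq. (8)); triples: (u, v, t); W_k[t]: weight of t."""
    H = np.zeros((n_nodes, len(triples)))
    w = np.zeros(len(triples))
    for idx, (u, v, t) in enumerate(triples):
        H[u, idx] = H[v, idx] = 0.5    # drug nodes
        H[t, idx] = -1.0               # side-effect node
        w[idx] = W_k[t]                # w_k(e) = W_{k,t}
    return H @ np.diag(w) @ H.T

triples = [(0, 1, 3), (0, 2, 4)]       # nodes 0-2: drugs, nodes 3-4: side effects
W_k = {3: 0.7, 4: 1.2}
X_k = np.random.randn(5)
L_k = central_laplacian(5, triples, W_k)
direct = sum(W_k[t] * ((X_k[u] + X_k[v]) / 2 - X_k[t]) ** 2
             for u, v, t in triples)
assert np.isclose(X_k @ L_k @ X_k, direct)   # matches Eq. (6)
```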

Computing the central-smoothing hypergraph Laplacian. The central-smoothing hypergraph Laplacian $\mathbf{L}_{k}$ in (8) can be computed with time complexity $O(|E|)$. Concretely, each element $\mathbf{L}_{k,i,j}$ can be computed by:

$$\mathbf{L}_{k,i,j}=\sum_{e\in E\,|\,i,j\in e}w_{k}(e)H_{i,e}H_{j,e}. \qquad (9)$$

We have four cases:

• $\mathbf{L}_{k,i,j}=\mathbf{L}_{k,j,i}=\frac{1}{4}\sum_{t\in V_{S}\,|\,(i,j,t)\in E}W_{k,t}$ if $i,j\in V_{D}$, $i\neq j$.

• $\mathbf{L}_{k,i,j}=\mathbf{L}_{k,j,i}=-\frac{1}{2}n_{d}(i,j)W_{k,j}$ if $i\in V_{D}$, $j\in V_{S}$.

• $\mathbf{L}_{k,i,i}=\frac{1}{4}\sum_{t\in V_{S}}m_{d}(i,t)W_{k,t}$ if $i\in V_{D}$.

• $\mathbf{L}_{k,i,i}=q(i)W_{k,i}$ if $i\in V_{S}$.

where $n_{d}(i,j)=|\{(u,v,j)\in E \mid u=i\lor v=i\}|$, $m_{d}(i,t)=|\{u \mid (i,u,t)\in E\lor(u,i,t)\in E\}|$, and $q(i)=|\{(u,v,i)\mid(u,v,i)\in E\}|$.

Complexity analysis. Given $N$ convolution layers, the computational complexity for all central-smoothing hypergraph Laplacians is $O(N\cdot K\cdot|E|)$. Each $\mathbf{L}_{k}$ can be computed with complexity $O(|E|)$ by iterating over all hyperedges in $E$ once and, for each hyperedge, adding the side effect weight to the corresponding elements of $\mathbf{L}_{k}$; there are $N\times K$ Laplacian matrices to compute. We note that $K$ here refers to the size of the latent features, not the original input features. In practice, even if the size of the original input features is very large, the number of latent features can be small ($\leq 200$), which is computationally tractable.
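The four cases above can equivalently be accumulated in a single pass over the hyperedges, since $\mathbf{L}_{k,i,j}=\sum_{e}w_{k}(e)H_{i,e}H_{j,e}$ receives a fixed contribution from each hyperedge containing $i$ and $j$. The following is a minimal sketch of this $O(|E|)$ construction (the sparse data layout is our assumption); its result agrees with the dense product in Eq. (8):

```python
from collections import defaultdict

def central_laplacian_sparse(triples, W_k):
    """One pass over E; L[(i, j)] accumulates w_k(e) * H_{i,e} * H_{j,e}."""
    L = defaultdict(float)
    for u, v, t in triples:
        w = W_k[t]
        L[(u, v)] += w / 4; L[(v, u)] += w / 4   # drug-drug: (1/2)*(1/2)
        L[(u, u)] += w / 4; L[(v, v)] += w / 4   # drug diagonals: (1/2)^2
        for d in (u, v):                          # drug-side effect: (1/2)*(-1)
            L[(d, t)] -= w / 2; L[(t, d)] -= w / 2
        L[(t, t)] += w                            # side-effect diagonal: (-1)^2
    return L
```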

Non-weighted version. In our experiments, we examine the need for the weight of each side effect. We therefore also consider a non-weighted version of the central-smoothing hypergraph Laplacian, called CentSimple, obtained by fixing $\mathbf{W}_{k}$ to be the identity matrix, so that the central-smoothing hypergraph Laplacian in (8) becomes:

$$\tilde{\mathbf{L}}_{k}=HH^{\mathsf{T}}. \qquad (10)$$

IV-C Central-Smoothing Hypergraph Neural Networks (HGNNs)

Transforming input features to latent spaces

We first transform the input feature vectors of drugs and the one-hot vectors of side effects into the $K$-dimensional latent space, using a two-layer feedforward neural network for drugs and a one-layer feedforward neural network (as an embedding table) for side effects, respectively, as follows:

$$X_{D}^{(0)}=f_{D}(X_{D}),\qquad X_{S}^{(0)}=f_{S}(X_{S}),$$

where $X_{D}\in\mathbb{R}^{K_{0}\times|V_{D}|}$ is the matrix of drug input features with feature size $K_{0}$, $X_{S}\in\mathbb{R}^{|V_{S}|\times|V_{S}|}$ is the matrix of one-hot vectors of side effects, $X_{D}^{(0)}\in\mathbb{R}^{K\times|V_{D}|}$, $X_{S}^{(0)}\in\mathbb{R}^{K\times|V_{S}|}$, and $f_{D}$ and $f_{S}$ are the corresponding feedforward neural networks.
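A minimal PyTorch sketch of these input transformations is given below (the hidden size and the use of nn.Embedding for the one-hot side effect inputs are our assumptions; the paper specifies only a two-layer network for drugs and a one-layer embedding table for side effects). Here rows are nodes, i.e., the transpose of the layout above:

```python
import torch
import torch.nn as nn

K0, K = 2329, 20                       # input feature size, latent size K
n_drugs, n_side_effects = 557, 964     # TWOSIDES sizes, for illustration

f_D = nn.Sequential(                   # two-layer network for drug features
    nn.Linear(K0, K), nn.ReLU(), nn.Linear(K, K))
f_S = nn.Embedding(n_side_effects, K)  # one layer acting on one-hot inputs

X_D0 = f_D(torch.randn(n_drugs, K0))          # (n_drugs, K)
X_S0 = f_S(torch.arange(n_side_effects))      # (n_side_effects, K)
X0 = torch.cat([X_D0, X_S0], dim=0)           # all |V| node embeddings
```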

Convolution layers on the latent spaces

We adapt HGNN layers [5] using $\mathbf{L}_{k}$ at dimension $k$. Given hypergraph Laplacian $\mathbf{L}_{k}$, we have the normalized adjacency matrix with a self-loop at each node:

$$\tilde{A}_{k}=2I-d_{\mathbf{L}_{k}}^{-1/2}\mathbf{L}_{k}d_{\mathbf{L}_{k}}^{-1/2}, \qquad (11)$$

where $d_{\mathbf{L}_{k}}$ is the degree matrix corresponding to Laplacian $\mathbf{L}_{k}$ and $I$ is the identity matrix.

Let $\tilde{D}_{k}$ be the corresponding degree matrix of $\tilde{A}_{k}$. Each layer of central-smoothing HGNNs has the following form:

$$X^{(l+1)}=\sigma(\tilde{X}^{(l+1)}\Theta^{(l)}), \qquad (12)$$

where $\tilde{X}^{(l+1)}=[\tilde{x}_{1}^{(l+1)},\dots,\tilde{x}_{K}^{(l+1)}]$ with $\tilde{x}_{k}^{(l+1)}=\tilde{D}_{k}^{-1/2}\tilde{A}_{k}\tilde{D}_{k}^{-1/2}x_{k}^{(l)}$, $\Theta^{(l)}\in\mathbb{R}^{K\times K}$ is the parameter matrix for the transformation from layer $(l)$ to layer $(l+1)$, and $\sigma$ is an activation function.
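The following is a minimal dense PyTorch sketch of one such layer, Eqs. (11)-(12). Reading $d_{\mathbf{L}_{k}}$ as the diagonal of $\mathbf{L}_{k}$ and $\tilde{D}_{k}$ as the row sums of $\tilde{A}_{k}$ is our interpretation, as are the shapes:

```python
import torch

def cent_smoothing_layer(X, Ls, Theta, eps=1e-8):
    """X: (|V|, K) node embeddings; Ls: list of K Laplacians, each (|V|, |V|);
    Theta: (K, K) mixing matrix."""
    n = X.shape[0]
    cols = []
    for k, L_k in enumerate(Ls):
        d = torch.clamp(torch.diagonal(L_k), min=eps) ** -0.5
        A = 2 * torch.eye(n) - d[:, None] * L_k * d[None, :]   # Eq. (11)
        dA = torch.clamp(A.sum(dim=1), min=eps) ** -0.5        # D~_k^{-1/2}
        cols.append(dA * (A @ (dA * X[:, k])))                 # propagate dim k
    return torch.relu(torch.stack(cols, dim=1) @ Theta)        # Eq. (12)
```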

IV-D Predicting Drug-Drug Interactions

Assume that $X^{*\mathsf{T}}\in\mathbb{R}^{|V|\times K}$ is the final node representation with learnt weights $W^{*}=\{W_{k}^{*}\,|\,k=1\dots K\}$. For each $e=(u,v,t)$, $t$ is predicted to be a side effect of $u$ and $v$ if the representation of $t$ is close enough to the weighted midpoint of the two drug node representations (computed by the score function $p(e,X^{*},W^{*})$). First, we compute the smoothness measure $ssa(e,X^{*},W^{*})$ of $(u,v,t)$ over all dimensions:

$$ssa(e,X^{*},W^{*})=\sum_{k=1}^{K}W^{*}_{k,t}\left(\frac{X^{*}_{k,u}+X^{*}_{k,v}}{2}-X^{*}_{k,t}\right)^{2}. \qquad (13)$$

Then, the prediction score is defined to be:

$$p(e,X^{*},W^{*})=\frac{1}{1+ssa(e,X^{*},W^{*})}. \qquad (14)$$

If $p(e,X^{*},W^{*})>h$, a predefined threshold, then $t$ is predicted to be a side effect of $u$ and $v$.
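A minimal sketch of this scoring rule, Eqs. (13)-(14), is given below; the array shapes, the indexing of $t$, and the default threshold are assumptions of this sketch:

```python
import numpy as np

def predict(u, v, t, X_star, W_star, h=0.5):
    """Score the triple (u, v, t); X_star: (|V|, K), W_star: (K, n_side_effects).
    Here t indexes both the side-effect node row in X_star and its column in
    W_star (an indexing assumption for this sketch)."""
    ssa = np.sum(W_star[:, t] * ((X_star[u] + X_star[v]) / 2 - X_star[t]) ** 2)
    p = 1.0 / (1.0 + ssa)              # Eq. (14)
    return p, p > h                    # h is the predefined threshold
```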

IV-E Objective Function of CentSmoothie

Let $\bar{E}=V_{D}\times V_{D}\times V_{S}\setminus E$ be the complement of the hyperedge set. The objective of training CentSmoothie is to maximize the scores $p(e,X^{*},W^{*})$ of the known hyperedges and minimize the scores of the complement set $\bar{E}$. The objective function can then be defined as:

$$\min_{W^{*}\geq 0,\,X^{*}}f(X^{*},W^{*})=\sum_{e\in E}(1-p(e,X^{*},W^{*}))^{2}+\lambda\sum_{e\in\bar{E}}p(e,X^{*},W^{*})^{2}, \qquad (15)$$

where $\lambda$ is a hyperparameter.

In practice, as $|\bar{E}|$ is too large, we randomly sample a subset $\Omega\subset\bar{E}$ with $|\Omega|=|E|$ to replace $\bar{E}$ in the objective function and reduce the computational cost (a CentSmoothie implementation is available at https://github.com/anhnda/CentSmoothieCode). To keep the non-negativity constraint on $W^{*}$, we used projected gradient descent [18].
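A minimal sketch of the sampled objective follows (the uniform rejection sampler and tensor shapes are our assumptions, not necessarily the released implementation):

```python
import random
import torch

def sample_negatives(E_set, n_drugs, n_se, size):
    """Uniformly draw triples (u, v, t), u < v, outside the known set E."""
    neg = []
    while len(neg) < size:
        u, v = sorted(random.sample(range(n_drugs), 2))
        t = random.randrange(n_se)
        if (u, v, t) not in E_set:
            neg.append((u, v, t))
    return neg

def centsmoothie_loss(p_pos, p_neg, lam=0.01):
    """Eq. (15) with the sampled Omega standing in for the complement set."""
    return ((1 - p_pos) ** 2).sum() + lam * (p_neg ** 2).sum()
```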

V Experiments

We conducted experiments to evaluate the performance of our proposed method, CentSmoothie, a hypergraph neural network with a central-smoothing assumption, in two scenarios: (i) a synthetic dataset and (ii) three real DDI datasets. On the synthetic dataset, we aimed to validate that CentSmoothie can achieve higher performances than traditional hypergraph neural networks, using data generated under the central-smoothing assumption. On the real DDI datasets, we examined the performance of CentSmoothie in comparison with baseline models, to show that the central-smoothing assumption is suitable for DDI data.

For both scenarios, we used 20-fold cross-validation, reporting the mean AUC (area under the ROC curve) and the mean AUPR (area under the precision-recall curve) with standard deviations, to validate the prediction performances [1].

For graph and hypergraph neural networks, the numbers of layers and the embedding sizes were searched over {1, 2, 3} and {10, 20, 30}, respectively. The activation function was the rectified linear unit (ReLU). The hyperparameter $\lambda$ was fixed at 0.01. The reported results are the highest performances, obtained with 2 layers and an embedding size of 20 for all methods. All experiments were run on a computer with an Intel Core i7-9700 CPU, an 8 GB GeForce RTX 2080 GPU, and 32 GB RAM.

V-A Synthetic Data

V-A1 Generation

We generated a synthetic dataset based on the idea that each drug has several groups of features and the combination of two groups of features leads to a side effect of the drugs. We fixed the number of drugs $D=500$ and the number of side effects $S=45$, and varied the maximum number of groups of drug features from 1 to 6. The details of the generation process can be found in the supplement.

V-A2 Comparing Methods

For the synthetic dataset, we used the central-smoothing hypergraph neural network CentSmoothie, the non-weighted central-smoothing hypergraph neural network CentSimple, and the existing spectral-based hypergraph neural network HPNN [5].

V-A3 Results

Fig. 2 shows the AUC and AUPR of each compared method, obtained by varying the maximum number of groups of drug features. CentSmoothie achieved the highest AUC and AUPR scores for all values of the x-axis, followed by CentSimple and then HPNN. In particular, the AUC scores of CentSmoothie were always higher than 0.95, while those of HPNN decreased when drugs became more complex with larger numbers of groups of drug features. This clearly shows that CentSmoothie can correctly capture the patterns generated by the central-smoothing assumption, particularly for larger numbers of groups of drug features. Similarly, the AUC scores of CentSimple decreased as the maximum number of groups of features increased, reaching around 0.75 at 6. The pattern of the AUPR scores was similar to that of the AUC scores. This result shows that CentSmoothie can learn different side effects for drug pairs more effectively than CentSimple, implying the significance of using a weight for each side effect in CentSmoothie.

Figure 2: Synthetic data performance comparison: (a) AUC and (b) AUPR.

V-B Real Data

V-B1 Data description

We used three real DDI datasets: TWOSIDES, CADDDI, and JADERDDI. TWOSIDES is a public DDI dataset extracted from the FDA Adverse Event Reporting System (US database) [19]. To our knowledge, TWOSIDES is the largest and most commonly used benchmark dataset for DDI [1, 10, 20]. In a manner similar to that of TWOSIDES [19], we used significance tests to generate two new DDI datasets: CADDDI from the Canada Vigilance adverse reaction reports (Canadian database, from 1965 to February 2021) [21] and JADERDDI from the Japanese Adverse Drug Event Report (Japanese database, from 2004 to March 2021) [22]. We only selected small-molecule drugs appearing in DrugBank [23]. Each drug feature vector was a binary vector of size 2,329, indicating the existence of 881 substructures and 1,448 interacting proteins [24]. The statistics of the final datasets are shown in Table I.

TABLE I: Statistics of the three real datasets.

Dataset | #drugs | #side effects | #drug-drug pairs | #drug-drug-side effect triples | Avg. side effects per drug pair | Drug-drug pairs per side effect (min / max / avg)
TWOSIDES | 557 | 964 | 49,677 | 3,606,046 | 72.58 | 288 / 22,520 / 3,740.7
CADDDI | 587 | 969 | 21,918 | 373,976 | 17.06 | 89 / 3,288 / 385.9
JADERDDI | 545 | 922 | 36,929 | 222,081 | 6.01 | 60 / 1,922 / 240.9
TABLE II: Comparison of performances of the methods on the real DDI datasets (mean ± standard deviation).

Method | TWOSIDES AUC | TWOSIDES AUPR | CADDDI AUC | CADDDI AUPR | JADERDDI AUC | JADERDDI AUPR
MLNN | 0.8372 ± 0.0050 | 0.7919 ± 0.0041 | 0.8689 ± 0.0021 | 0.6927 ± 0.0082 | 0.8578 ± 0.0015 | 0.3789 ± 0.0020
MRGNN | 0.8452 ± 0.0036 | 0.8029 ± 0.0039 | 0.9226 ± 0.0015 | 0.7113 ± 0.0031 | 0.9049 ± 0.0009 | 0.3698 ± 0.0019
Decagon | 0.8639 ± 0.0029 | 0.8094 ± 0.0024 | 0.9132 ± 0.0014 | 0.6338 ± 0.0029 | 0.9099 ± 0.0012 | 0.4710 ± 0.0027
SpecConv | 0.8785 ± 0.0025 | 0.8256 ± 0.0022 | 0.8971 ± 0.0055 | 0.6640 ± 0.0014 | 0.8862 ± 0.0025 | 0.5162 ± 0.0047
HETGNN | 0.9113 ± 0.0004 | 0.8267 ± 0.0005 | 0.9371 ± 0.0004 | 0.7974 ± 0.0011 | 0.8989 ± 0.0007 | 0.5618 ± 0.0012
HPNN | 0.9044 ± 0.0003 | 0.8410 ± 0.0007 | 0.9495 ± 0.0004 | 0.7020 ± 0.0018 | 0.9127 ± 0.0004 | 0.5198 ± 0.0016
CentSimple | 0.9242 ± 0.0003 | 0.8638 ± 0.0011 | 0.9584 ± 0.0005 | 0.6890 ± 0.0016 | 0.9239 ± 0.0007 | 0.5349 ± 0.0021
CentSmoothie | 0.9348 ± 0.0002 | 0.8749 ± 0.0013 | 0.9846 ± 0.0001 | 0.8230 ± 0.0019 | 0.9684 ± 0.0004 | 0.6044 ± 0.0025

V-B2 Comparing Methods

On the real datasets, we compared our proposed methods to baselines from the non-graph based, graph based, and hypergraph based approaches. For the non-graph based approach, we used a multi-layer feedforward neural network (MLNN) [10]. For graph neural networks on drug molecular graphs, we used MRGNN [20] with the recommended hyperparameter settings. On the DDI graph, we used Decagon (a spatial convolution method) [1], SpecConv (a spectral convolution graph neural network) [13], and HETGNN (a heterogeneous graph neural network) [11]. For hypergraph neural networks, we used the existing spectral convolution hypergraph neural network HPNN [5]. We also report the results of CentSimple to see the effect of central smoothing without weights for side effects.

Figure 3: Performance comparison (AUC (left) and AUPR (right)) on (a) TWOSIDES, (b) CADDDI, and (c) JADERDDI.

V-B3 Results

Table II shows the AUC and AUPR scores of all methods. Again, CentSmoothie achieved the highest AUC and AUPR scores on all three datasets. For TWOSIDES, CentSmoothie achieved 0.9348 in AUC and 0.8749 in AUPR, followed by CentSimple (0.9242 and 0.8638), HPNN (0.9044 and 0.8410), HETGNN (0.9113 and 0.8267), SpecConv (0.8785 and 0.8256), Decagon (0.8639 and 0.8094), MRGNN (0.8452 and 0.8029), and MLNN (0.8372 and 0.7919).

For CADDDI and JADERDDI, CentSmoothie had the highest performances, with AUC and AUPR of (0.9846 and 0.8230) and (0.9684 and 0.6044), respectively. The second and third best methods were CentSimple and HPNN, respectively.

In particular, in AUC, there were two clear performance gaps. The first was between the hypergraph based methods (CentSmoothie, CentSimple, and HPNN) and the non-hypergraph based methods (HETGNN, SpecConv, Decagon, MRGNN, and MLNN). The second was between CentSmoothie and the other hypergraph based methods (CentSimple and HPNN). The first gap shows the advantage of using hypergraph based methods for predicting drug-drug interactions. The second gap shows the advantage of central smoothing over regular smoothing, and in addition the importance of learning a weight for each side effect to improve the prediction performance.

In AUPR, there was a clear gap between CentSmoothie and the remaining methods. This again shows the advantage of learning weights under the central-smoothing assumption for predicting DDI.

CentSmoothie can learn the representations of side effects together with drugs to leverage the relationships among side effects (see the supplement for a visualization of side effect representations). These side effect representations might be especially useful for infrequent side effects, which are harder to predict due to the scarcity of positive training data. Fig. 3 shows the AUC (left) and AUPR (right) scores of the methods on subsets of the most infrequent side effects, obtained by starting with the most infrequent side effect and successively adding the next most infrequent side effects to the subset. From both the AUC and AUPR scores in Fig. 3, we can see that CentSmoothie achieved the best performances for all values of the x-axis (the rightmost point of the x-axis corresponds to using all side effects), followed by CentSimple and HPNN.

VI Conclusion

We have presented CentSmoothie, a hypergraph neural network for predicting drug-drug interactions that learns representations of side effects together with drug representations in the same space. A unique feature of CentSmoothie is a new central-smoothing formulation, which can be incorporated into the hypergraph Laplacian to model drug-drug interactions. Our extensive experiments on a synthetic dataset and three real datasets confirmed clear performance advantages of CentSmoothie over existing hypergraph and graph neural network methods, indicating that CentSmoothie can learn representations of drugs and side effects simultaneously under the central-smoothing assumption. Furthermore, CentSmoothie kept high performance on infrequent side effects, for which the performances of other methods dropped significantly, indicating that CentSmoothie leverages the relationships among side effects to help the difficult cases of less frequent side effects. For future work, it would be interesting to extend the central-smoothing assumption to more general cases, not limited to 3-uniform hypergraphs. In addition, learning adaptive ratios to replace the constraint of the midpoint might be considered.

References

  • [1] M. Zitnik, M. Agrawal, and J. Leskovec, “Modeling polypharmacy side effects with graph convolutional networks,” Bioinformatics, vol. 34, no. 13, pp. i457–i466, 2018.
  • [2] Y.-H. Feng, S.-W. Zhang, and J.-Y. Shi, “DPDDI: a deep predictor for drug-drug interactions,” BMC Bioinformatics, vol. 21, no. 1, pp. 1–15, 2020.
  • [3] X. Chu, Y. Lin, Y. Wang, L. Wang, J. Wang, and J. Gao, “MLRDA: a multi-task semi-supervised learning framework for drug-drug interaction prediction,” in Proceedings of the 28th International Joint Conference on Artificial Intelligence. AAAI Press, 2019, pp. 4518–4524.
  • [4] H. C. Nguyen and H. Mamitsuka, “Learning on hypergraphs with sparsity,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020.
  • [5] Y. Feng, H. You, Z. Zhang, R. Ji, and Y. Gao, “Hypergraph neural networks,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, 2019, pp. 3558–3565.
  • [6] S. Bai, F. Zhang, and P. H. Torr, “Hypergraph convolution and hypergraph attention,” arXiv preprint arXiv:1901.08150, 2019.
  • [7] N. Yadati, M. Nimishakavi, P. Yadav, V. Nitin, A. Louis, and P. Talukdar, “HyperGCN: a new method for training graph convolutional networks on hypergraphs,” in Advances in Neural Information Processing Systems, 2019, pp. 1511–1522.
  • [8] S. E. Leucuta and L. Vlase, “Pharmacokinetics and metabolic drug interactions,” Current clinical pharmacology, vol. 1, no. 1, pp. 5–20, 2006.
  • [9] K. Corrie and J. G. Hardman, “Mechanisms of drug interactions: pharmacodynamics and pharmacokinetics,” Anaesthesia & Intensive Care Medicine, vol. 12, no. 4, pp. 156–159, 2011.
  • [10] N. Rohani and C. Eslahchi, “Drug-drug interaction predicting by neural network using integrated similarity,” Scientific Reports, vol. 9, no. 1, pp. 1–11, 2019.
  • [11] C. Zhang, D. Song, C. Huang, A. Swami, and N. V. Chawla, “Heterogeneous graph neural network,” in Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2019, pp. 793–803.
  • [12] Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and S. Y. Philip, “A comprehensive survey on graph neural networks,” IEEE Transactions on Neural Networks and Learning Systems, 2020.
  • [13] T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” arXiv preprint arXiv:1609.02907, 2016.
  • [14] J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, and G. E. Dahl, “Neural message passing for quantum chemistry,” in Proceedings of the 34th International Conference on Machine Learning-Volume 70.   JMLR. org, 2017, pp. 1263–1272.
  • [15] T.-H. H. Chan and Z. Liang, “Generalizing the hypergraph laplacian via a diffusion process with mediators,” Theoretical Computer Science, vol. 806, pp. 416–428, 2020.
  • [16] A. Bordes, N. Usunier, A. Garcia-Duran, J. Weston, and O. Yakhnenko, “Translating embeddings for modeling multi-relational data,” Advances in neural information processing systems, vol. 26, 2013.
  • [17] J. Basilico and T. Hofmann, “Unifying collaborative and content-based filtering,” in Proceedings of the twenty-first international conference on Machine learning, 2004, p. 9.
  • [18] C.-J. Lin, “Projected gradient methods for nonnegative matrix factorization,” Neural computation, vol. 19, no. 10, pp. 2756–2779, 2007.
  • [19] N. P. Tatonetti, P. Y. Patrick, R. Daneshjou, and R. B. Altman, “Data-driven prediction of drug effects and interactions,” Science translational medicine, vol. 4, no. 125, pp. 125ra31–125ra31, 2012.
  • [20] N. Xu, P. Wang, L. Chen, J. Tao, and J. Zhao, “MR-GNN: multi-resolution and dual graph neural network for predicting structured entity interactions,” arXiv preprint arXiv:1905.09558, 2019.
  • [21] Canada Vigilance Program , “Canada vigilance adverse reaction online database,” https://www.canada.ca/en/health-canada/services/drugs-health-products/medeffect-canada/adverse-reaction-database.html, 2021, online; accessed 25 May 2021.
  • [22] Pharmaceutical and Medical Devices Agency , “The japanese adverse drug event report,” https://www.pmda.go.jp/safety/info-services/drugs/adr-info/suspected-adr/0003.html, 2021, online; accessed 15 March 2021.
  • [23] D. S. Wishart, Y. D. Feunang, A. C. Guo, E. J. Lo, A. Marcu, J. R. Grant, T. Sajed, D. Johnson, C. Li, Z. Sayeeda et al., “DrugBank 5.0: a major update to the DrugBank database for 2018,” Nucleic Acids Research, vol. 46, no. D1, pp. D1074–D1082, 2018.
  • [24] D. A. Nguyen, C. H. Nguyen, and H. Mamitsuka, “A survey on adverse drug reaction studies: data, tasks and machine learning methods,” Briefings in bioinformatics, vol. 22, no. 1, pp. 164–177, 2021.
  • [25] D. M. Gujral, G. Lloyd, and S. Bhattacharyya, “Effect of prophylactic betablocker or ace inhibitor on cardiac dysfunction & heart failure during anthracycline chemotherapy ± trastuzumab,” The Breast, vol. 37, pp. 64–71, 2018.
  • [26] W. W. Ogden, D. M. B. II, and J. D. Rives, “Panniculitis of the mesentery,” Annals of surgery, vol. 151, no. 5, p. 659, 1960.
  • [27] M. Stieger, J.-P. Schmid, N. Yawalkar, and T. Hunziker, “Extracorporeal shock wave therapy for injection site panniculitis in multiple sclerosis patients,” Dermatology, vol. 230, no. 1, pp. 82–86, 2015.

Derivation of incidence matrix H

We show that given $H\in\mathbb{R}^{|V|\times|E|}$, where $H_{u,e}=H_{v,e}=\frac{1}{2}$ and $H_{t,e}=-1$ for each $e=(u,v,t)\in E$, then:

$$\sum_{e\in E}W_{k,t}\left(\frac{X_{k,u}+X_{k,v}}{2}-X_{k,t}\right)^{2}=X_{k}^{\mathsf{T}}H\mathbf{W}_{k}H^{\mathsf{T}}X_{k}$$

where $X_{k}\in\mathbb{R}^{|V|\times 1}$, $X_{k,u}$ is the value corresponding to $u$ in $X_{k}$, and $\mathbf{W}_{k}=\mathrm{diag}(w_{k}(e_{1}),\dots,w_{k}(e_{|E|}))$ with $w_{k}(e)=W_{k,t}$.

Proof: Let $H_{\cdot,e}\in\mathbb{R}^{|V|\times 1}$ be the column of $H$ corresponding to hyperedge $e$. We have:

$$\sum_{e\in E}W_{k,t}\left(\frac{X_{k,u}+X_{k,v}}{2}-X_{k,t}\right)^{2}=\sum_{e\in E}\left(\frac{X_{k,u}+X_{k,v}}{2}-X_{k,t}\right)W_{k,t}\left(\frac{X_{k,u}+X_{k,v}}{2}-X_{k,t}\right)$$
$$=\sum_{e\in E}(H_{\cdot,e}^{\mathsf{T}}X_{k})\,w_{k}(e)\,(H_{\cdot,e}^{\mathsf{T}}X_{k})=X_{k}^{\mathsf{T}}H\mathbf{W}_{k}H^{\mathsf{T}}X_{k}. \qquad \square$$

Computing $\mathbf{L}_{k}$

Given the formulation for $\mathbf{L}_{k}$:

$$\mathbf{L}_{k,i,j}=\sum_{e\in E\,|\,i,j\in e}w_{k}(e)H_{i,e}H_{j,e}.$$

We have four cases:

  1. $i,j\in V_{D}$, $i\neq j$, meaning that $H_{i,e}=H_{j,e}=\frac{1}{2}$; hence:

$$\mathbf{L}_{k,i,j}=\sum_{e\in E\,|\,i,j\in e}w_{k}(e)H_{i,e}H_{j,e}=\frac{1}{4}\sum_{e\in E\,|\,i,j\in e}w_{k}(e)=\frac{1}{4}\sum_{t\in V_{S}\,|\,e=(i,j,t)\in E}W_{k,t}$$

  2. $i\in V_{D}$, $j\in V_{S}$, meaning that $H_{i,e}=\frac{1}{2}$ and $H_{j,e}=-1$; hence:

$$\mathbf{L}_{k,i,j}=\mathbf{L}_{k,j,i}=\sum_{e\in E\,|\,i,j\in e}w_{k}(e)H_{i,e}H_{j,e}=-\frac{1}{2}\sum_{e\in E\,|\,i,j\in e}W_{k,j}=-\frac{1}{2}W_{k,j}\,n_{d}(i,j)$$

where $n_{d}(i,j)=|\{(u,v,j)\in E \mid u=i\lor v=i\}|$.

  3. $i=j\in V_{D}$, so $H_{i,e}=H_{j,e}=\frac{1}{2}$; hence:

$$\mathbf{L}_{k,i,i}=\sum_{e=(u,v,t)\in E\,|\,u=i\lor v=i}w_{k}(e)H_{i,e}H_{i,e}=\frac{1}{4}\sum_{t\in V_{S}}W_{k,t}\sum_{e=(u,v,t)\in E\,|\,u=i\lor v=i}1=\frac{1}{4}\sum_{t\in V_{S}}W_{k,t}\,m_{d}(i,t)$$

where $m_{d}(i,t)=|\{u \mid (i,u,t)\in E\lor(u,i,t)\in E\}|$.

  4. $i=j\in V_{S}$, meaning that $H_{i,e}=H_{j,e}=-1$; hence:

$$\mathbf{L}_{k,i,i}=\sum_{e=(u,v,i)\in E}w_{k}(e)=W_{k,i}\sum_{e=(u,v,i)\in E}1=W_{k,i}\,q(i)$$

where $q(i)=|\{(u,v,i)\mid(u,v,i)\in E\}|$.

Details of synthetic data generation

The idea behind the synthetic data is that each drug has several groups of features and the combination of two groups of features leads to a side effect of the drugs. The generation process consists of three steps (a code sketch is given after this list):

  • Step 1: Generating groups of features and their combinations. Suppose there are $n$ groups of features $G=\{g_{1},\dots,g_{n}\}$. There are at most $\frac{n(n-1)}{2}$ group combinations: $P=\{(g_{i},g_{j})\,|\,i=1\dots n,\,j=i+1\dots n\}$. Each group combination $p_{i}\in P$, $i=1\dots|P|$, is assigned a side effect $s_{i}$.

  • Step 2: Generating drug features. Let $a$ be the number of features in a group, $D$ be the number of drugs, and $m$ be the maximum number of groups of features for each drug.

    For each drug $i$, we first uniformly sampled the number of groups $1\leq n_{i}\leq m$ and then sampled $n_{i}$ groups from $G$. Let $G_{i}\subset G$ be the sampled groups of drug $i$. Let the binary vector $\mathbf{b}_{i}\in\mathbb{R}^{a\cdot n}$ indicate the existence of features for drug $d_{i}$: $\mathbf{b}_{i}(j)=1$ if $\lfloor j/a\rfloor\in G_{i}$, otherwise $\mathbf{b}_{i}(j)=0$.

    The feature vector of drug $i$ was sampled from a Gaussian distribution with mean $\mathbf{b}_{i}$ and variance $\sigma$: $\mathbf{f}_{i}=\mathrm{Gaussian}(\mathbf{b}_{i},\sigma)$.

  • Step 3: Generating triples of drug-drug and side effects. For each pair of drugs generated in Step 2, we matched the group combinations of the two drugs with the corresponding side effects from Step 1. For a pair of drugs $i$ and $j$ with corresponding groups $G_{i}$ and $G_{j}$, let $P_{ij}=G_{i}\times G_{j}$ and $S_{ij}=\{s_{t}\,|\,p_{t}\in P_{ij}\}$; we generated the triples $E_{ij}=\{(d_{i},d_{j},s_{t})\,|\,s_{t}\in S_{ij}\}$.

By going through all pairs of drugs, we obtained the synthetic dataset with the drug feature vectors $F=\{\mathbf{f}_{i}\,|\,i=1\dots D\}$ and the triples of drug-drug-side effect $E=\cup_{i=1\dots D,\,j=i+1\dots D}E_{ij}$.

We set the number of groups $n=10$, the number of features in each group $a=3$, the variance $\sigma=0.01$, and the number of drugs $D=500$. We varied $m$ in the range $[1,2,\dots,6]$.
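The following is a minimal numpy sketch of the three steps (variable names are ours; parameter values follow the text, and sigma is used as the noise scale):

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n, a, sigma, D, m = 10, 3, 0.01, 500, 3           # m is varied in 1..6

# Step 1: group combinations, each assigned one side effect (45 in total).
pairs = list(itertools.combinations(range(n), 2))
side_of = {p: s for s, p in enumerate(pairs)}

# Step 2: per-drug groups and Gaussian feature vectors around b_i.
groups, feats = [], []
for _ in range(D):
    G_i = set(rng.choice(n, size=rng.integers(1, m + 1), replace=False))
    b = np.array([1.0 if j // a in G_i else 0.0 for j in range(a * n)])
    groups.append(G_i)
    feats.append(rng.normal(b, sigma))

# Step 3: match group combinations of each drug pair to side effects.
triples = set()
for i, j in itertools.combinations(range(D), 2):
    for g, h in itertools.product(groups[i], groups[j]):
        if g != h:
            triples.add((i, j, side_of[(min(g, h), max(g, h))]))
```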

Details of experiments

Extracting new datasets

For CADDDI, extracted from the Canada Vigilance adverse reaction reports, and JADERDDI, extracted from the Japanese Adverse Drug Event Report, each database consists of reports such that each report contains the drugs and the corresponding observed side effects of a patient.

To extract interactions from these databases, for each drug pair we divided the reports into two groups: an exposed group of reports containing the drug pair and a nonexposed group of reports not containing the drug pair. Then, for each side effect, Fisher's exact test with a p-value threshold of 0.05 was used to check whether the occurrence rate of the side effect in the exposed group was significantly higher than in the nonexposed group.

Finally, we obtained a set of significant triples of drug-drug-side effects for each database.
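A minimal sketch of this test for one drug pair and one side effect, using scipy's one-sided Fisher's exact test (the count bookkeeping is our assumption about the report format):

```python
from scipy.stats import fisher_exact

def is_significant(n_exp_se, n_exp, n_non_se, n_non, alpha=0.05):
    """One-sided Fisher's exact test: is the side-effect rate higher in the
    exposed group (reports containing the drug pair) than in the rest?"""
    table = [[n_exp_se, n_exp - n_exp_se],
             [n_non_se, n_non - n_non_se]]
    _, p = fisher_exact(table, alternative="greater")
    return p < alpha
```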

Regarding the overlap of the datasets, between TWOSIDES and CADDDI there is a 24.8% overlap in side effect names and a 59.8% overlap in drug names. For JADERDDI, we used the Google translation service to translate Japanese drug names, mostly written in Katakana and hence more reliable to translate, into English. The overlap in drug names between TWOSIDES and JADERDDI is 15%. We did not calculate the overlap of side effects for JADERDDI since the side effect names were not translated.

Splitting data and hyperparameter selection

We split the significant triples (positive set) of drug-drug-side effects into 20 folds with the same ratios of side effects in all folds. The negative set is the complement of the positive set, defined by $V_{D}\times V_{D}\times V_{S}\setminus E$.
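A minimal sketch of such a split, stratified on the side effect of each triple (the use of scikit-learn here is our assumption, not necessarily the paper's implementation):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def split_folds(triples, n_folds=20, seed=0):
    """Stratify the folds on the side-effect index of each (u, v, t) triple."""
    X = np.asarray(triples)            # rows: (u, v, t)
    skf = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=seed)
    return list(skf.split(X, X[:, 2]))
```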

We ran the methods with all hyperparameters in the grid-search ranges. For each method, we selected the hyperparameters giving the highest mean performance over the 20-fold cross-validation.

Figure 4: Visualization of representations of drugs and side effects ((a-b) Panniculitis, (c-d) Sarcoma, (e-f) Pneumoconiosis, and (g-h) Splenectomy) learnt by HPNN and CentSmoothie trained on TWOSIDES. In CentSmoothie, the representation of a side effect tends to be close to the mean of all midpoints of drug pairs causing the side effect. In HPNN, the representation of the side effect is hard to distinguish from the drugs.

Case studies for predicting unknown drug pairs on infrequent side effects

TABLE III: Predictions of unknown drug pairs for each side effect, top-ranked by CentSmoothie (trained on TWOSIDES), with prediction scores. Ranks (scores) by HPNN and Decagon are shown when within their top 200 predictions ('-' otherwise). The Literature column marks whether a supporting biomedical article was found ('yes') or not ('-').

Side effect | Drug pair | CentSmoothie | HPNN | Decagon | Literature
Panniculitis | Ranitidine, Pioglitazone | 1 (0.94) | 10 (0.53) | - | yes
 | Diazepam, Clarithromycin | 2 (0.94) | 7 (0.57) | 139 (0.27) | yes
 | Folic Acid, Metoclopramide | 3 (0.89) | 12 (0.50) | 62 (0.40) | -
 | Fexofenadine, Furosemide | 4 (0.88) | 6 (0.58) | 34 (0.47) | yes
 | Metronidazole, Salbutamol | 5 (0.87) | 5 (0.59) | 1 (0.61) | yes
 | Zolpidem, Warfarin | 6 (0.85) | 1 (0.66) | 91 (0.34) | yes
 | Salbutamol, Warfarin | 7 (0.85) | 2 (0.66) | - | yes
 | Sertraline, Hydrochlorothiazide | 8 (0.85) | 4 (0.62) | 130 (0.29) | -
 | Warfarin, Tolterodine | 9 (0.84) | 17 (0.45) | - | yes
 | Acetaminophen, Amoxicillin | 10 (0.82) | 13 (0.48) | 61 (0.40) | yes
Sarcoma | Carvedilol, Ramipril | 1 (0.80) | 2 (0.61) | - | yes [25]
 | Simvastatin, Glipizide | 2 (0.80) | 10 (0.49) | 21 (0.50) | yes
 | Ibuprofen, Mirtazapine | 3 (0.80) | 13 (0.44) | 45 (0.46) | yes
 | Lactulose, Simvastatin | 4 (0.79) | 11 (0.46) | 55 (0.42) | yes
 | Zolpidem, Fluticasone | 5 (0.78) | 1 (0.66) | 89 (0.35) | -
 | Prednisolone, Acetaminophen | 6 (0.78) | 5 (0.57) | - | yes
 | Lisinopril, Nystatin | 7 (0.77) | 17 (0.44) | 25 (0.48) | yes
 | Ibuprofen, Fluticasone | 8 (0.77) | 3 (0.61) | 31 (0.48) | yes
 | Acetylsalicylic acid, Alendronic acid | 9 (0.76) | 6 (0.57) | - | yes
 | Fluticasone, Famotidine | 10 (0.74) | 4 (0.60) | - | yes
Pneumoconiosis | Prednisone, Tolterodine | 1 (0.91) | 2 (0.58) | 2 (0.62) | yes
 | Celecoxib, Diltiazem | 2 (0.88) | 1 (0.60) | - | -
 | Atorvastatin, Fenofibrate | 3 (0.80) | 7 (0.46) | 170 (0.23) | yes
 | Rosuvastatin, Acetaminophen | 4 (0.78) | 3 (0.50) | - | yes
 | Losartan, Carisoprodol | 5 (0.77) | 10 (0.43) | 187 (0.21) | -
 | Oxycodone, Zoledronic acid | 6 (0.68) | 5 (0.49) | 18 (0.51) | -
 | Gabapentin, Diclofenac | 7 (0.68) | 4 (0.50) | 20 (0.50) | yes
 | Risedronate, Metoclopramide | 8 (0.68) | 11 (0.43) | 49 (0.43) | -
 | Rofecoxib, Pamidronate | 9 (0.66) | 9 (0.44) | - | yes
 | Tamsulosin, Ofloxacin | 10 (0.65) | 6 (0.49) | 4 (0.61) | yes
Splenectomy | Doxycycline, Alendronic acid | 1 (0.84) | 5 (0.59) | - | yes
 | Hydroxyzine, Warfarin | 2 (0.83) | 7 (0.54) | 109 (0.38) | -
 | Paroxetine, Pamidronate | 3 (0.80) | 4 (0.62) | 87 (0.40) | yes
 | Oxycodone, Venlafaxine | 4 (0.80) | 3 (0.63) | - | yes
 | Lorazepam, Acetaminophen | 5 (0.76) | 1 (0.75) | - | yes
 | Zolpidem, Lansoprazole | 6 (0.76) | 2 (0.74) | - | yes
 | Hydroxyzine, Bupropion | 7 (0.73) | 12 (0.47) | 3 (0.61) | yes
 | Paroxetine, Niacin | 8 (0.71) | 16 (0.43) | 153 (0.30) | -
 | Simvastatin, Doxazosin | 9 (0.71) | 15 (0.44) | - | yes
 | Enalapril, Cephalexin | 10 (0.68) | 10 (0.49) | 58 (0.50) | yes

We show sampled results obtained by CentSmoothie trained on the largest dataset (TWOSIDES) for predicting unknown drug pairs of each side effect, where the drug pairs with the shown side effects are not in the current drug-drug interaction data [19]. Our focus was on infrequent side effects, which are thought to be harder to predict. We also checked the biological validity of the predicted drug pairs by searching the biomedical literature for relevant articles, using keywords of the predicted drug pair and the side effect.

Table III shows the four side effects (selected from the top 5% most infrequent side effects) and, for each side effect, the ten unknown pairs with the highest prediction scores by CentSmoothie. For each of the ten drug pairs, the rank and score obtained by HPNN (and Decagon) are shown if they were within the top 200 predictions. The last column indicates whether an article relevant to the predicted drug pair was found. For 31 of the 40 predictions, we could find evidence (biomedical articles) by literature survey, implying the promise of the findings by CentSmoothie. Compared with the ranks (top ten) by CentSmoothie, those by HPNN were larger. Meanwhile, the ranks by Decagon were very large, some falling outside the top 200, meaning that CentSmoothie and Decagon have different prediction preferences.

Taking a closer look, for example, for sarcoma, the highest score was achieved by the pair of Carvedilol and Ramipril; this pair was not in the training drug-drug interaction data [19], while this new interaction was predicted by CentSmoothie and is supported by [25]. These results demonstrate that predictions by CentSmoothie for unknown drug pairs could be used for further clinical verification, and thus CentSmoothie would be a highly useful model.

Visualizing representations

Side effects and drug pairs

We visualized the representations of drugs and side effects learnt by CentSmoothie and HPNN on the TWOSIDES dataset to examine the difference between the central-smoothing assumption and the traditional smoothing assumption. We used the same four side effects as in the case studies for predicting unknown drug pairs on infrequent side effects.

Fig. 4 shows the visualization obtained by applying principal component analysis (PCA) to the resulting representations of the two methods, where for each side effect, drugs (blue dots) and the side effect (red triangle) are shown in a three-dimensional (3D) space. (For CentSmoothie, the representations on the subspace corresponding to the side effect were fed into PCA.) We drew (gray) lines for drug pairs with side effects. For CentSmoothie, we further show the midpoint of each drug pair (with a side effect) as a black dot, to see whether the midpoint is close to the representation of the side effect. We can easily see that for each side effect, the representation of the side effect tends to be located around the mean of all midpoints (black dots). For HPNN, however, it is difficult to distinguish the representations of side effects from those of the drugs. These visualizations also make it easy to understand how each pair of drugs and the side effect are positioned in the space. In particular, by checking whether the side effect is located near the midpoint of the corresponding drug pair, we can guess whether the side effect might be caused by the pair.

Side effect relationships

Figure 5: Visualization of side effect representations.

We visualized the representations of all side effects learnt by CentSmoothie on the TWOSIDES dataset to see the relationships among side effects. Fig. 5 shows the visualization of side effects in a three-dimensional space. We can see that side effects are grouped into small clusters. We highlight an infrequent side effect, Panniculitis, and two of its nearest neighbors, Fracture nonunion and Hernia inguinal. Furthermore, we could find evidence for the co-occurrence of Panniculitis with Fracture nonunion and Hernia inguinal [26, 27].