This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Interlayer link prediction in multiplex social networks: an iterative degree penalty algorithm

Rui Tang1, Shuyu Jiang1, Xingshu Chen1,2∗, Haizhou Wang1,2, Wenxian Wang2, and Wei Wang2∗ 1. College of Cybersecurity, Sichuan University, Chengdu 610065, China 2. Cybersecurity Research Institute, Sichuan University, Chengdu 610065, China
Abstract

Online social network (OSN) applications provide different experiences; for example, posting a short text on Twitter and sharing photographs on Instagram. Multiple OSNs constitute a multiplex network. For privacy protection and usage purposes, accounts belonging to the same user in different OSNs may have different usernames, photographs, and introductions. Interlayer link prediction in multiplex network aims at identifying whether the accounts in different OSNs belong to the same person, which can aid in tasks including cybercriminal behavior modeling and customer interest analysis. Many real-world OSNs exhibit a scale-free degree distribution; thus, neighbors with different degrees may exert different influences on the node matching degrees across different OSNs. We developed an iterative degree penalty (IDP) algorithm for interlayer link prediction in the multiplex network. First, we proposed a degree penalty principle that assigns a greater weight to a common matched neighbor with fewer connections. Second, we applied node adjacency matrix multiplication for efficiently obtaining the matching degree of all unmatched node pairs. Thereafter, we used the approved maximum value method to obtain the interlayer link prediction results from the matching degree matrix. Finally, the prediction results were inserted into the priori interlayer node pair set and the above processes were performed iteratively until all unmatched nodes in one layer were matched or all matching degrees of the unmatched node pairs were equal to 0. Experiments demonstrated that our advanced IDP algorithm significantly outperforms current network structure-based methods when the multiplex network average degree and node overlapping rate are low.

keywords:
Social networks , Multiplex network , Interlayer link prediction , Scale-free

1 Introduction

With rapid developments in Internet technology, Internet surfing has become increasingly convenient and efficient in recent years. Online social network (OSN) applications, such as Twitter, Facebook, Instagram, and LinkedIn, have rapidly been integrated into people’s everyday lives and have become major social communication tools. These multiple OSNs constitute a multiplex network, where each OSN can be represented as a layer within the multiplex network. In this multiplex network, nodes represent user accounts and intralayer links capture friendships or follower-followee relations. If two accounts in different OSNs belong to the same user, an interlayer link will exist between the nodes. The various OSN applications provide users with different functional experiences. For example, people generally use LinkedIn to follow work-related contents, post short text descriptions on Twitter to express their experiences at the time, and share their photographs on Instagram. Analyzing these OSNs has a significant impact on society, politics, economy, etc [1, 2, 3, 4].

In general, the OSN can be characterized by a complex network, in which nodes represent individuals and links capture the relationships between them [5]. Several researchers have conducted studies in this field relating to clustering [6], link prediction [7], information diffusion [1], and community detection [8, 9], among others. However, it is difficult to apply such a representation to describe multilayer structures such as multiple OSNs, multiple transportation networks [10] as well as the dynamic networks [7, 9]. The multilayer structures have a significant influence on the aspects of cascade [11, 12], propagation [13, 14, 15, 16], synchronization [17], and game [18, 19, 20]. In recent years, the multiplex network [11, 21, 22] has emerged to characterize these multilayer structures. It explicitly incorporates multiple connectivity channels into a system and provides a natural description for systems in which entities have a different set of neighbors in each layer [23].

A user may always be active in different OSNs using different accounts; in most cases, it is not known whether accounts across different OSNs belong to the same person. This means that most of the interlayer links are not given in advance. To identify whether accounts in different OSNs belong to the same person, the structure or feature information should be leveraged to predict the interlayer links, which is a challenging problem in network analysis in recent years [2]. It has been established that no OSN application can completely replace all other similar software at present. Therefore, online users generally register multiple accounts for using these OSNs. When one person registers his or her accounts in different OSNs, he or she may use different usernames, photographs, and introductions for the purpose of privacy protection or anonymity. In this manner, users can confidently use the Internet to chat, make friends, and share information on different social media, according to their personal preferences. However, anonymity also poses harms to society to a certain extent. For example, criminals may register large quantities of different accounts on OSN applications. They subsequently engage in illegal activities, such as spreading rumors, spreading viral links, and inducing financial fraud on these applications [6]. Given effective methods to determine the correspondence between accounts of the same user on different OSNs, we can establish the patterns of criminal violations of laws, model their online behavior, lock their geographical locations, and even determine their real identities, thereby effectively striking against them.

Moreover, numerous other benefits are offered when predicting the interlayer links in the multiplex network consisting of multiple OSNs. For example, business site owners can be aided in studying user behaviors, analyzing the interests of customers, and analyzing the factors that affect their decisions [24]. Furthermore, OSN users can be kept up to date with their virtual contacts from different OSNs in an integrated environment [25].

Three main approaches are available for interlayer link prediction in the multiplex social network: (i) feature-based interlayer link prediction, (ii) network-based interlayer link prediction, and (iii) a combination of multiple approaches. Among these, network-based methods have been applied extensively in recent years [26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41]. Its strategies for predicting interlayer links rely on examining the structures of social graphs on different OSN platforms [32].

In the process of using network structures for interlayer link prediction, in general, only the attributes of the nodes to be predicted are considered, along with the importance of the attributes of their neighbors, such as ignoring the neighbors’ degrees [36]. Many real-world OSNs exhibit a scale-free property. Their degree distribution follows a power law distribution, which is common knowledge in numerous real-world networks [42, 43]. The number of common matched neighbors and their degree attributes may have different effects on the matching degree of two nodes in different layers for the prediction of their interlayer link.

In this study, we developed an iterative degree penalty (IDP) algorithm for interlayer link prediction in the multiplex network. The major contributions of this paper can be summarized as follows:

  • 1.

    We propose a degree penalty principle to calculate the matching degree between the unmatched nodes across different layers in the multiplex network. In the real world, people’s friend relationship cycles are highly individual [36]. Therefore, with more common matched neighbors, a higher probability exists that two unmatched nodes across different layers have an interlayer link. However, common matched neighbors with fewer connections are assigned higher weights for the matching degree between two unmatched nodes. Our degree penalty principle uses the inverse log frequency of the common matched neighbors of two unmatched nodes to calculate the matching degree, thereby effectively addressing the above two situations.

  • 2.

    We develop an iterative algorithm to determine additional hidden interlayer links. We use node adjacency matrix multiplication for efficiently obtaining the matching degree of all unmatched node pairs. Thereafter, we use the approved maximum value method to obtain the interlayer link prediction results from the matching degree matrix. Finally, the prediction results are inserted into the priori interlayer node pair set, and the above processes are performed iteratively until all of the unmatched nodes in one layer are matched or all the matching degrees of the unmatched node pairs are equal to 0.

  • 3.

    We verify the effectiveness of our advanced IDP algorithm on both artificial scale-free and real-world networks. The results demonstrate that the IDP algorithm significantly outperforms the existing network structure-based method FRUI: the recall rate increases by a maximum of 36.6% and an average of 7.0% when the priori interlayer link rates are less than 10% on the real-world networks.

The remainder of this paper is organized as follows. Section 2 reviews related works. Section 3 describes the preliminaries and presents the problem definitions. Section 4 explains the proposed IDP algorithm. Section 5 presents the experimental results. Section 6 concludes the paper and provides future research directions.

2 Related Works

The problem of interlayer link prediction in a multiplex network constituted by multiple OSNs has attracted substantial research attention in the past decade [2]. Existing research efforts can be divided into three categories: feature-based prediction, network-based prediction, and a combination of multiple approaches.

2.1 Feature-based prediction

In several works, features have been extracted from profiles and contents. These studies have adopted machine learning techniques to predict the interlayer links across multiple OSNs [2]. Profile attributes include the username, gender, birthday, address, experience, and image, among others [44]. The username has been found to be the most important attribute in the profile and has been explored extensively. Zafarani and Liu [45] formalized the problem of mapping among identities across multiple websites, and conducted similar empirical tests on thousands of usernames across 12 social networks, thereby empirically validating several hypotheses. They further developed a supervised methodology for the same problem in Ref. [46], which extracted behavioral features of usernames based on the priori knowledge of linguistics and human behavior according to three aspects: human limitations, exogenous factors, and endogenous factors. Perito et al. [47] determined that a significant portion of the users can be linked by means of their usernames, and identified users with binary classifiers. Liu et al. [48] differentiated among users with the same usernames and proposed an unsupervised approach to match users. Among usernames, the recognition of photographs may provide a means of mapping users. Although usernames can be used to predict the interlayer links of the multiplex social network, it is difficult to use them in large-scale situations because some users may have the same username. Acquisti et al. [49] used publicly available photos for large-scale individual re-identification. However, users tend to publish different pieces of information in different OSNs  [50]. Only a handful of users may put their personal photos on different OSNs. Therefore, using photos for identification is appropriate for these specific users.

Apart from usernames and images, several researchers have focused on considering various profile attributes to improve the prediction performance [51, 52, 53, 54, 55, 56]. A heuristic approach based on using the username, e-mail, and birth city attributes was proposed by Carmagnola et al. [51]. Iofciu et al. [52] divided profile information into two types: usernames and tags. They determined that users can be identified by analyzing their tagging, and the performance can be improved by combining tags and usernames. Mu et al. [56] explored the concept of latent user space to describe the user profiles in different social platforms, so that two individuals with greater similarities would have closer profiles in the latent user space. Similar areas were also studied in Refs. [53, 54, 55]. Leveraging more profile attributes could improve the prediction performance effectively, but profile attributes may contain information which are null or are not sufficiently strong to indicate user identity.

Content attributes for individuals can reveal their activities, such as posting, mentioning, and commenting in OSNs [2]. These properties can capture unique characteristics for the interlayer link prediction of multiplex OSNs. Zheng et al. [57] extracted four types of writing-style features (syntactic, lexical, structural, and content) to identify authorship. Goga et al. [58] exploited the geo-location, timestamp, and writing style of user posts to identify accounts on different OSNs. Zafarani and Liu [59] extracted trajectory-based content features to capture the unique footprints of user activities for linking the same user accounts across platforms. Nevertheless, the methods based on content attributes often face the problem of data sparsity, because the amount of location data or posted content of different users varies significantly. These approaches are usually confined to some specific OSN applications, which makes it harder to leveraged them on a general multiplex social network.

2.2 Network-based prediction

Personal data available on OSNs are usually anonymized owing to privacy concerns, with users often removing their profile and attribute information or replacing these with fake information [40]. Therefore, using network structures to predict the interlayer links has been the focus of substantial research. Narayanan and Shmatikov [60] assumed that a natural person usually has a similar social network in the virtual world. Based on this assumption, Zhou et al. [36] developed a network structure-based method known as the Friend Relationship-based User Identification (FRUI) algorithm. The FRUI algorithm identifies the same user across different OSNs by leveraging the shared friend cycle, and requires several priori matched pairs. They subsequently proposed an approach that did not require priori matched pairs in Ref. [33]. Interlayer link prediction in a multiplex network consisting of multiple OSNs formed the maximum common subgraph problem in Ref. [61], which maximized the number of standard links to obtain one-to-one mapping. The maximum common subgraph problem is an NP-hard problem, and is therefore hardly used in real scenarios. Tan et al. [41] leveraged a hypergraph to model user relationships and correlate accounts across different OSNs by projecting the manifolds of two networks onto a commonly embedded space. Moreover, neighborhood-based network features were used for account alignment in [62]. Zhang et al. [63] proposed a joint link fusion algorithm to predict the social links and anchor links of two OSNs simultaneously. This algorithm transferred information relating to social links from one network to another. Its time complexity is approximately O(n3)O(n^{3}). Numerous researchers have used network embedding [64], also known as representation learning [65], to predict interlayer links in recent years. Liu et al. [37] adopted a network embedding approach to align users across multiple directed OSNs. Their method embedded two OSNs into a common space to capture the social links of user accounts. Man et al. [38] proposed a supervised model to learn cross-layer mapping for interlayer link prediction, which leveraged network embedding to capture the underlying structural regularities. Zhou et al. [31] proposed an unsupervised algorithm for user identification, using network embedding and scalable nearest neighbors. A deep reinforcement learning-based framework was developed in Ref. [32], which could embed the global network structure, thereby achieving higher correlation accuracy than other network embedding methods. Similar studies can also be found in Refs. [26, 27, 28, 29, 31, 34, 35, 39]. Most of the network embedding-based methods implicitly assume that the input networks are complete without missing edges, and it is not easy to add user attribution into the representation vectors.

2.3 Combination of multiple approaches

Other relevant approaches have combined the feature-based and structure-based methods for interlayer link prediction; for example, searching the same username from the friend lists of seed users across Facebook and Twitter [66]. Nunes et al. [67] introduced a binary classification method to classify account pairs as belonging to the same person or not. The feature vectors of the classifier were constructed by profile information, descriptions of interests, and friend lists. Kong et al. [68] studied two fully aligned OSN datasets collected from Foursquare and Twitter to determine the correspondence between different user accounts. They extracted multiple features and formulated the correspondence problem as a stable matching problem. The network structure features were one type among multiple features. Zhang et al. [69] explored a probabilistic approach for mapping individuals, in which the user friend or connection count was used as a feature. Lu et al. [24] introduced a methodology to identify customers between customer accounts, e-commerce sites, and OSN applications. The network structure was used to extract user interest features.

In the majority of combination approaches, the problem of attempting to obtain user profile features for the sake of privacy protection exists. Our study focuses on structure-based prediction. Numerous real-world OSNs exhibit a scale-free degree distribution [43]. Common matched neighbors with different degrees may have different influences on the matching degrees of nodes across two OSNs. In this study, we developed an IDP algorithm to address this issue.

3 Preliminaries and problem

In this section, we introduce the conception of the multiplex network for representing multiple OSN applications, and describe the problem of interlayer link prediction on the multiplex network. The symbols and notations frequently used in this paper are displayed in Table 1. We use bold uppercase letters for matrices, bold lowercase letters for vectors, and lowercase letters for scalars.

Refer to caption
Figure 1: Example of representing relations across multiple OSNs using multiplex network. (a) Real scenario: There are two OSN applications. OSNα has six user accounts, while OSNβ has 7. Each account exhibits certain connections with other accounts on the same OSN. In the red circle, three pairs of accounts belong to three users, and the correspondence among these accounts is revealed in advance. The task is to determine the correspondence of the other accounts. (b) Multiplex network: We use a multiplex network \mathcal{M} to represent multiple OSNs. The two OSNs are denoted by layers α\alpha and β\beta. The user accounts are denoted by nodes viαv^{\alpha}_{i} or vjβv^{\beta}_{j}. The relationships of the accounts connected in an OSN are denoted by interlayer links eijαe^{\alpha}_{ij} or eijβe^{\beta}_{ij}. The correspondence of two accounts across two OSNs is denoted by the interlayer link eijαβe^{\alpha\beta}_{ij}. The task of determining the correspondence of accounts across OSNs becomes the prediction of interlayer links in the multiplex network.

3.1 Definitions

To distinguish the different OSN platforms, we represent the relationship between nodes in one OSN as the graph Gα(Vα,Eα)G^{\alpha}(V^{\alpha},E^{\alpha}), where VαV^{\alpha} and EαVα×VαE^{\alpha}\subseteq V^{\alpha}\times V^{\alpha} are the sets of nodes and edges of the graph GαG^{\alpha}. The elements of EαE^{\alpha} are referred to as intralayer links. The scenario of multiple OSNs is represented by the multiplex network =(g,c)\mathcal{M}=(g,c), where g={Gα|α{1,,m}}g={\{G^{\alpha}|\alpha\in\{1,\cdots,m\}\}} is the set of graphs and c={EαβVα×Vβ|α,β{1,,m},αβ}c={\{E^{\alpha\beta}\subseteq V^{\alpha}\times V^{\beta}|\alpha,\beta\in\{1,\cdots,m\},\alpha\neq\beta\}} is the set of connections between the nodes of graphs GαG^{\alpha} and GβG^{\beta}. If there exists an edge eijαβe^{\alpha\beta}_{ij} between node viαv^{\alpha}_{i} in graph GαG^{\alpha} and node vjβv^{\beta}_{j} in graph GβG^{\beta}, the accounts represented by the two nodes belong to the same user. The elements of EαβE^{\alpha\beta} are known as interlayer links. Each subnetwork in gg is referred to as a layer in multiplex network \mathcal{M}. An illustration of multiplex networks is provided in Fig. 1.

Table 1: Symbols and notations
Symbol Description
\mathcal{M} The multiplex network.
GG A social network that is one layer of \mathcal{M}.
u,vu,v Nodes in \mathcal{M}.
α,β\alpha,\beta Layer indices of \mathcal{M}.
eα,eβe^{\alpha},e^{\beta} Intralayer connections in GαG^{\alpha} and GβG^{\beta}.
𝒆\bm{e},𝑬\bm{E} Intralayer connection vector and intralayer connection matrix, respectively.
eαβe^{\alpha\beta} Interlayer connection.
i,j,a,bi,j,a,b Node indices.
nα,nβn^{\alpha},n^{\beta} Number of nodes in GαG^{\alpha} and GβG^{\beta}.
nn Number of MINPs.
Γ(vi)\Gamma(v_{i}) Neighbors set of node viv_{i}.
r,𝑹r,\bm{R} Matching degree between two interlayer nodes and matching degree matrix for all interlayer nodes, respectively.
kk Degree of a node.
𝒅\bm{d} Degree vector of all nodes in a layer.
hh Reciprocal of logarithm for a priori interlayer node degree
𝒉,𝑯\bm{h},\bm{H} Reciprocal vector of logarithm for a priori interlayer node degree and reciprocal matrix of logarithm for all priori interlayer node degrees.
δ\delta Control parameter for selecting candidate matched interlayer nodes.
𝑨\bm{A} Adjacency matrix of a layer.
pp The ratio of PINPs to INPs.
ss The percentage of remaining nodes when extracting subnetworks to construct the multiplex network.
Φ\Phi Set of PINPs.
φα,φβ\varphi^{\alpha},\varphi^{\beta} Set of priori interlayer nodes in layers α\alpha and β\beta.
Ψ\Psi Set of MINPs.
ψα,ψβ\psi^{\alpha},\psi^{\beta} Set of matched interlayer nodes in layer α\alpha and β\beta.

Prior to introducing the details of our proposed method, we present the formal definitions of various important concepts in this paper.

Definition 1: Interlayer node pair (INP). Given a multiplex network \mathcal{M}, if an interlayer link exists between a node viαv^{\alpha}_{i} in layer α\alpha and a node vjβv^{\beta}_{j} in layer β\beta, we refer to the pair consisting of these two nodes as an INP, which is represented by INP(viα(v^{\alpha}_{i} ,vjβ)v^{\beta}_{j}). Moreover, the nodes belonging to the INP are known as interlayer nodes.

Definition 2: Matched interlayer node pair (MINP). Given a multiplex network \mathcal{M}, if a node pair consisting of node viαv^{\alpha}_{i} in layer α\alpha and node vjβv^{\beta}_{j} in layer β\beta is matched as an INP, we refer to this pair as an MINP and denote it by MINP (viα(v^{\alpha}_{i} ,vjβ)v^{\beta}_{j}). An MINP is not necessarily an INP, because the results calculated by the algorithm may be incorrect.

Definition 3: Priori interlayer node pair (PINP). Given a multiplex network \mathcal{M}, if several INPs are provided in advance, we refer to these INPs as PINPs. Furthermore, the nodes belonging to the PINP are known as priori interlayer nodes.

Definition 4: Unmatched node pair (UNP). Given a multiplex network \mathcal{M}, if a node viαv^{\alpha}_{i} in layer α\alpha and a node vjβv^{\beta}_{j} in layer β\beta have not been matched, we refer to the pair consisting of these two nodes as a UNP, which is represented by UNP(viα(v^{\alpha}_{i} ,vjβ)v^{\beta}_{j}).

Definition 5: Common matched neighbor (CMN). Given a PINP(viαv^{\alpha}_{i},vjβv^{\beta}_{j}), if node vaαv^{\alpha}_{a} in layer α\alpha has an intralayer link with node viαv^{\alpha}_{i}, and node vbβv^{\beta}_{b} in layer β\beta has an intralayer link with node vjβv^{\beta}_{j}, we can state that the PINP(viαv^{\alpha}_{i},vjβv^{\beta}_{j}) is the CMN of nodes vaαv^{\alpha}_{a} and vbβv^{\beta}_{b}.

It is worth noting that the terms graph and layer, link and connection, and interlayer node pair and interlayer link are used interchangeably in this paper.

3.2 Problem statement

Supposing that we have a two-layer multiplex network \mathcal{M} with a small set of priori interlayer links, the purpose of the interlayer link prediction problem is to predict which node pairs are most likely to have the interlayer links.

Given a UNP (uiα,ujβ)(u^{\alpha}_{i},u^{\beta}_{j}) in the multiplex network \mathcal{M}, the objective function of the interlayer link prediction can be defined as follows:

f(uiα,ujβ)=J(rij)={1,if eijαβ exist,0,otherwise,f(u^{\alpha}_{i},u^{\beta}_{j})=J(r_{ij})=\left\{\begin{array}[]{ll}1,&\mbox{if $e^{\alpha\beta}_{ij}$ exist,}\\ 0,&\mbox{otherwise,}\end{array}\right. (1)

where rijr_{ij} represents the matching degree of the unmatched nodes viαv^{\alpha}_{i} and vjβv^{\beta}_{j}. The interlayer link prediction problem is converted into the calculation of rijr_{ij} and the definition of the objective function JJ.

In a real scenario, certain people may possess two or more accounts in the same OSN application. For simplicity, we assume that these multiple accounts belong to different users. This means that the interlayer link prediction problem in this study is a one-to-one matching problem; that is, no two interlayer links share the same node.

4 IDP algorithm

In this section, we introduce the iterative interlayer link prediction algorithm, known as the IDP, into the multiplex network to predict the interlayer links based on only a small part of these. Firstly, we propose a degree penalty principle that assigns a greater weight to a CMN with fewer connections. Secondly, we propose node adjacency matrix multiplication for efficiently obtaining the matching degree of all of the UNPs. Thereafter, we use the approved maximum value method to obtain the interlayer link prediction results from the matching degree matrix. Finally, the prediction results are inserted into the PINP set, and the above processes are performed iteratively until all of the unmatched nodes in one layer are matched or all of the match degrees of the UNPs are equal to 0.

Refer to caption
Figure 2: Examples of interlayer link prediction with IDP. (a) A two-layer multiplex network with nine nodes in layer α\alpha and ten nodes in layer β\beta. All of the interlayer links and three of the interlayer links are provided, respectively. Now, we need to use the IDP algorithm to predict the remaining interlayer links. (b) The first iteration to output a prediction result. We use the degree penalty principle to calculate the matching degree of each candidate node pair and set δ=1\delta=1 to select the maximum pair (5,5) as a result in this iteration. (c) The next iteration for predicting the interlayer links. In this iteration, there are four MINPs. We use the same method to calculate the matching degree, and the result is (4,4). (d) The results in the later iterations, which are (3,3), (2,2), (7,7), and (9,9).

4.1 Degree penalty principle

Researchers have established that numerous natural and artificial networks exhibit certain common topological characteristics, such as small world, scale free, and core periphery [70]. The scale-free property is valuable in that the degree distribution of a scale-free network follows a power law. Zhou et al. analyzed four leading OSN sites (Delicious, Flickr, Twitter, and YouTube), and found that these sites all exhibit scale-free degree distributions [43]. This means that there exist large nodes with a small degree, as well as several nodes with a large degree. The CMNs with different degrees will have different weights for the UNP matching degrees. For example, if a person has only one friend and he or she follows an account vαv^{\alpha} on OSN GαG^{\alpha} and an account vβv^{\beta} on OSN GβG^{\beta}, it is highly likely that vαv^{\alpha} and vβv^{\beta} belong to the same person, namely his or her friend. In contrast, if this person has many friends, it is difficult to determine whether or not the accounts that he or she follows on different OSNs belong to the same person.

Based on the above statements, we propose the principle of the degree penalty of CMNs for calculating the matching degree of the unmatched nodes, which assigns more weights to smaller-degree CMNs. We define

rij=(vaα,vbβ)Φ,vaαΓ(uiα),vbβΓ(ujβ)log1(kvaα+1)+log1(kvbβ+1)r_{ij}=\sum_{\begin{subarray}{c}\forall(v^{\alpha}_{a},v^{\beta}_{b})\in\Phi,\\ v^{\alpha}_{a}\in\Gamma(u^{\alpha}_{i}),\\ v^{\beta}_{b}\in\Gamma(u^{\beta}_{j})\end{subarray}}\log^{-1}(k_{v^{\alpha}_{a}}+1)+\log^{-1}(k_{v^{\beta}_{b}}+1) (2)

as the matching degree of UNP (uiα,ujβ)(u^{\alpha}_{i},u^{\beta}_{j}). In Eq. (2), Φ\Phi represents the set of PINPs, Γ(uiα)\Gamma(u^{\alpha}_{i}) and Γ(ujβ)\Gamma(u^{\beta}_{j}) represent the neighbor sets of nodes uiαu^{\alpha}_{i} and ujβu^{\beta}_{j}, respectively, kk represents the node degree, and the constraints in the equation indicate that the PINP (vaα,vbβ)(v^{\alpha}_{a},v^{\beta}_{b}) is the CMN of UNP (uiα,ujβ)(u^{\alpha}_{i},u^{\beta}_{j}). It is noteworthy that, if kvaα=1k_{v^{\alpha}_{a}}=1, log(kvaα)\log(k_{v^{\alpha}_{a}}) will be equal to 0, and log1(kvaα)\log^{-1}(k_{v^{\alpha}_{a}}) becomes meaningless. To overcome this problem, we add 1 to each log by means of Laplace smoothing.

The degree penalty principle has the following characteristics:

(i) It can reflect the contribution of the number of CMNs to the matching degree. For any two unmatched nodes uiαu^{\alpha}_{i} and ujβu^{\beta}_{j}, a greater number of CMNs results in their matching degree rijr_{ij} being larger.

(ii) It can reflect the influence of the degree of CMNs on the matching degree. For any CMNs, a smaller degree results in the weights of these CMNs making a greater contribution to the matching degree of the UNPs connected to them.

4.2 Calculation of matching degree

The matching degree of each UNP can be calculated by the degree penalty principle proposed above. Given all of the nodes, interlayer links, and priori interlayer links, the question is how to obtain all matching degrees of the UNPs efficiently. We propose an approach based on matrix operation, inspired by Ref. [71].

Given a multiplex network \mathcal{M}, nαn^{\alpha} and nβn^{\beta} are denoted as the number of nodes in layers α\alpha and β\beta, respectively, while nn is the number of PINPs. The number of unmatched nodes in layer α\alpha can be represented by nαnn^{\alpha}-n, while the number of unmatched nodes in layer β\beta can be represented by nβnn^{\beta}-n.

For a node vaαφαv^{\alpha}_{a}\in\varphi^{\alpha}, where φα\varphi^{\alpha} represents the set of priori interlayer nodes in layer α\alpha, the degree of vaαv^{\alpha}_{a} can be expressed as

kvaα=b=1nαeabα,k_{v^{\alpha}_{a}}=\sum_{b=1}^{n_{\alpha}}{e^{\alpha}_{ab}}, (3)

where eabαe^{\alpha}_{ab} is equal to 1 if an intralayer link exists between nodes vaαv^{\alpha}_{a} and vbβv^{\beta}_{b}, and 0 otherwise.

By haα=log1(kvaα+1)h^{\alpha}_{a}=\log^{-1}(k_{v^{\alpha}_{a}}+1), we can rewrite Eq. (2) as

rij=(vaα,vbβ)Φ,vaαΓ(uiα),vbβΓ(ujβ)haα+hbβ.r_{ij}=\sum_{\begin{subarray}{c}\forall(v^{\alpha}_{a},v^{\beta}_{b})\in\Phi,\\ v^{\alpha}_{a}\in\Gamma(u^{\alpha}_{i}),\\ v^{\beta}_{b}\in\Gamma(u^{\beta}_{j})\end{subarray}}h^{\alpha}_{a}+h^{\beta}_{b}. (4)

This equation will facilitate the subsequent introduction of matrix operations.

It is clear that, if PINP (vaα,vbβ)(v^{\alpha}_{a},v^{\beta}_{b}) is the CMN of UNP (uiα,ujβ)(u^{\alpha}_{i},u^{\beta}_{j}), eiaαe^{\alpha}_{ia} and ejbβe^{\beta}_{jb} will be equal to 1; thus, eiaαhaαejbβ=haαe^{\alpha}_{ia}\cdot h^{\alpha}_{a}\cdot e^{\beta}_{jb}=h^{\alpha}_{a} and eiaαhbβejbβ=hbβe^{\alpha}_{ia}\cdot h^{\beta}_{b}\cdot e^{\beta}_{jb}=h^{\beta}_{b}. In contrast, if PINP (vaα,vbβ)(v^{\alpha}_{a},v^{\beta}_{b}) is not the CMN of UNP (uiα,ujβ)(u^{\alpha}_{i},u^{\beta}_{j}), eiaαe^{\alpha}_{ia} or ejbβe^{\beta}_{jb} will be equal to 0; thus, eiaαhaαejbβ=0e^{\alpha}_{ia}\cdot h^{\alpha}_{a}\cdot e^{\beta}_{jb}=0 and eiaαhbβejbβ=0e^{\alpha}_{ia}\cdot h^{\beta}_{b}\cdot e^{\beta}_{jb}=0. Therefore, Eq. (4) can be replaced with

rij=(vaα,vbβ)Φeiaαhaαejbβ+eiaαhbβejbβ.r_{ij}=\sum_{\forall(v^{\alpha}_{a},v^{\beta}_{b})\in\Phi}e^{\alpha}_{ia}\cdot h^{\alpha}_{a}\cdot e^{\beta}_{jb}+e^{\alpha}_{ia}\cdot h^{\beta}_{b}\cdot e^{\beta}_{jb}. (5)

If node vaαv^{\alpha}_{a} is a matched interlayer node in layer α\alpha, a counterpart node must exist in layer β\beta, and vice versa. Naturally, it is possible to make the PINPs uniform, as follows: (v1α(v^{\alpha}_{1}, v1β),(v2αv^{\beta}_{1}),(v^{\alpha}_{2}, v2β),,(vnαv^{\beta}_{2}),\cdots,(v^{\alpha}_{n}, vnβ)v^{\beta}_{n}). Therefore, Eq. (5) can be rewritten as

rij=a=1nhaαeiaαeajβ+eiaαeajβhaβ.r_{ij}=\sum_{a=1}^{n}h^{\alpha}_{a}\cdot e^{\alpha}_{ia}\cdot e^{\beta}_{aj}+e^{\alpha}_{ia}\cdot e^{\beta}_{aj}\cdot h^{\beta}_{a}. (6)

Using the vector form, Eq. (6) can be represented as

rij=[h1αei1α,h2αei2α,,hnαeinα][e1jβe2jβenjβ]+[ei1α,ei2α,,einα][e1jβh1βe2jβh2βenjβhnβ].\begin{array}[]{l}r_{ij}=[h^{\alpha}_{1}\cdot e^{\alpha}_{i1},h^{\alpha}_{2}\cdot e^{\alpha}_{i2},\cdots,h^{\alpha}_{n}\cdot e^{\alpha}_{in}]\cdot\left[\begin{array}[]{c}e^{\beta}_{1j}\\ e^{\beta}_{2j}\\ \vdots\\ e^{\beta}_{nj}\\ \end{array}\right]\\ +~{}[e^{\alpha}_{i1},e^{\alpha}_{i2},\cdots,e^{\alpha}_{in}]\cdot\left[\begin{array}[]{c}e^{\beta}_{1j}\cdot h^{\beta}_{1}\\ e^{\beta}_{2j}\cdot h^{\beta}_{2}\\ \vdots\\ e^{\beta}_{nj}\cdot h^{\beta}_{n}\\ \end{array}\right].\end{array} (7)

It is known that the Hadamard product of A[aij]A\equiv[a_{ij}] and B[bij]B\equiv[b_{ij}] with the same dimensions is the matrix AB=[aijbij]A\circ B=[a_{ij}b_{ij}]. Therefore, [h1αei1α,h2αei2α,,hnαeinα][h^{\alpha}_{1}\cdot e^{\alpha}_{i1},h^{\alpha}_{2}\cdot e^{\alpha}_{i2},\cdots,h^{\alpha}_{n}\cdot e^{\alpha}_{in}] can be represented as [h1α,h2α,,hnα][ei1α,ei2α,,einα][h^{\alpha}_{1},h^{\alpha}_{2},\cdots,h^{\alpha}_{n}]\circ[e^{\alpha}_{i1},e^{\alpha}_{i2},\cdots,e^{\alpha}_{in}]. Denoting 𝒉iα=[h1α,h2α,,hnα]T\bm{h}^{\alpha}_{i}=[h^{\alpha}_{1},h^{\alpha}_{2},\cdots,h^{\alpha}_{n}]^{T}, 𝒉jβ=[h1β,h2β,,hnβ]T\bm{h}^{\beta}_{j}=[h^{\beta}_{1},h^{\beta}_{2},\cdots,h^{\beta}_{n}]^{T}, 𝒆iα=[ei1α,ei2α,,einα]T\bm{e}^{\alpha}_{i}=[e^{\alpha}_{i1},e^{\alpha}_{i2},\cdots,e^{\alpha}_{in}]^{T}, 𝒆jβ=[e1jβ,e2jβ,,enjβ]T\bm{e}^{\beta}_{j}=[e^{\beta}_{1j},e^{\beta}_{2j},\cdots,e^{\beta}_{nj}]^{T}, where ()T(\cdot)^{T} is the transposition of ()(\cdot), Eq. (7) can be rewritten as

rij=((𝒉iα)T(𝒆iα)T)𝒆jβ+(𝒆iα)T(𝒆jβ𝒉jβ).r_{ij}=((\bm{h}^{\alpha}_{i})^{T}\circ(\bm{e}^{\alpha}_{i})^{T})\cdot\bm{e}^{\beta}_{j}+(\bm{e}^{\alpha}_{i})^{T}\cdot(\bm{e}^{\beta}_{j}\circ\bm{h}^{\beta}_{j}). (8)

Using Eq. (8), the matching degree of UNP (uiα,ujβ)(u^{\alpha}_{i},u^{\beta}_{j}) is represented by the form of the vector operation. Then, we can express the matching degree of all UNPs in the matrix operation form. In a similar manner, we denote 𝑯α=[𝒉1α,𝒉2α,,𝒉nαnα]\bm{H}^{\alpha}=[\bm{h}^{\alpha}_{1},\quad\bm{h}^{\alpha}_{2},\quad\cdots,\quad\bm{h}^{\alpha}_{n^{\alpha}-n}], 𝑬α=[𝒆1α,𝒆2α,,𝒆nαnα]\bm{E}^{\alpha}=[\bm{e}^{\alpha}_{1},\bm{e}^{\alpha}_{2},\cdots,\bm{e}^{\alpha}_{n^{\alpha}-n}], 𝑯β=[𝒉1β,𝒉2β,,𝒉nβnβ]\bm{H}^{\beta}=[\bm{h}^{\beta}_{1},\bm{h}^{\beta}_{2},\cdots,\bm{h}^{\beta}_{n^{\beta}-n}], 𝑬β=[𝒆1β,𝒆2β,,𝒆nβnβ]\bm{E}^{\beta}=[\bm{e}^{\beta}_{1},\bm{e}^{\beta}_{2},\cdots,\bm{e}^{\beta}_{n^{\beta}-n}]. The matching degree of all UNPs can be calculated as follows:

𝑹=([𝑯α]T[𝑬α]T)𝑬β+[𝑬α]T(𝑬β𝑯β).\bm{R}=([\bm{H}^{\alpha}]^{T}\circ[\bm{E}^{\alpha}]^{T})\cdot\bm{E}^{\beta}+[\bm{E}^{\alpha}]^{T}\cdot(\bm{E}^{\beta}\circ\bm{H}^{\beta}). (9)

It is worth noting that 𝑯α\bm{H}^{\alpha} consists of nαnn^{\alpha}-n copies of 𝒉iα\bm{h}^{\alpha}_{i}, because 𝒉1α=𝒉2α==𝒉nα=[h1α,h2α,,hnα]T\bm{h}^{\alpha}_{1}=\bm{h}^{\alpha}_{2}=\cdots=\bm{h}^{\alpha}_{n}=[h^{\alpha}_{1},h^{\alpha}_{2},\cdots,h^{\alpha}_{n}]^{T}. Similarly, 𝑯β\bm{H}^{\beta} consists of nβnn^{\beta}-n copies of 𝒉jβ\bm{h}^{\beta}_{j}. Moreover, 𝑬α\bm{E}^{\alpha} and 𝑬β\bm{E}^{\beta} are the submatrices of the adjacency matrix of GαG^{\alpha} and GβG^{\beta}, respectively. Each row of the submatrix represents one of the matched interlayer nodes, while each column represents one of the unmatched nodes.

Based on the above statements, the four matrices 𝑯α\bm{H}^{\alpha}, 𝑬α\bm{E}^{\alpha}, 𝑬β\bm{E}^{\beta}, and 𝑯β\bm{H}^{\beta} can be obtained. The matching degree of all unmatched nodes between the different layers of the multiplex network \mathcal{M} can be calculated efficiently.

4.3 Selecting matched interlayer node pairs

After obtaining the values of all elements of the matching degree matrix 𝑹\bm{R}, we need to determine which UNPs can be selected as MINPs. This means that we should formulate the objective function JJ. This function clarifies that a node pair can be selected as an MINP when its matching degree satisfies certain conditions. A larger value of the matching degree rijr_{ij} indicates a higher probability that the node pair (uiαu^{\alpha}_{i}, ujβu^{\beta}_{j}) is the MINP. Therefore, we define the objective function as:

J(rij)=𝟙(rijδmax(𝑹)),s.t.uiαψα,ujβψβ,\begin{array}[]{c}J(r_{ij})=\mathds{1}(r_{ij}\geq\delta\cdot\max(\bm{R})),\\ s.t.~{}u^{\alpha}_{i}\notin\psi^{\alpha},u^{\beta}_{j}\notin\psi^{\beta}\\ \end{array}, (10)

where (i) 𝟙()\mathds{1}(\cdot) is an indicator function that takes 1 if the condition inside the parenthesis is true, and zero otherwise; (ii) max()\max(\cdot) is the maximum function that takes the maximum value inside the parenthesis; (iii) ψα\psi^{\alpha} and ψβ\psi^{\beta} are the sets of matched interlayer nodes in layers α\alpha and β\beta, respectively; and (iv) δ\delta is a control parameter that takes a value from 0 to 1. When δ=1\delta=1, only the UNPs with the maximum value of the matching degree can be selected as MINPs. When δ(0,1)\delta\in(0,1), the UNPs with a matching degree greater than δmax(𝑹)\delta\cdot\max(\bm{R}) can be selected as MINPs. Moreover, when δ=0\delta=0, all of the UNPs can be selected as MINPs. It is obvious that a smaller value of δ\delta means that additional UNPs are selected as MINPs, and hence, the accuracy is lower. In contrast, a larger value of δ\delta indicates that less UNPs are selected as MINPs, and hence, the efficiency is lower. The constraints in Eq. (10) are used to ensure that each unmatched node is matched only once.

4.4 Achieving additional matched interlayer node pairs iteratively

In the previous steps, the degree penalty principle and matrix operation are leveraged to calculate the matching degree matrix 𝑹\bm{R} for all unmatched nodes among the different layers in the multiplex network \mathcal{M} and the node pairs with matching degrees greater than δ\delta times max(𝑹)\max(\bm{R}) are selected as MINPs. By performing these steps once, only one or several UNPs can be selected as MINPs. Therefore, we propose an iterative strategy, known as the IDP algorithm, to achieve additional MINPs.

Denoting Ψ\Psi as the set of MINPs, we can add elements of Ψ\Psi to Φ\Phi. Moreover, we execute the steps introduced above again to identify more MINPs. This strategy can be executed iteratively until all of the unmatched nodes in one layer are matched or all of the matching degrees of the UNPs are equal to 0.

The details of our suggested IDP algorithm for interlayer link prediction are presented in Algorithm 1. In each iteration, the time complexity of the algorithm mainly depends on the calculation of the matching degree matrix 𝑹\bm{R} of Algorithm 1, which is matrix multiplication. Provided nαn=nβn=mn^{\alpha}-n=n^{\beta}-n=m, the running time of the matrix multiplication is nm2/qnm^{2}/q, where qq is the number of compute nodes [72]. For the whole algorithm, provided that zz is the average number of selected matched interlayer nodes per iteration, the time complexity of IDP is O(nm3/zp)O(nm^{3}/zp). Moreover, an illustration of the procedure of the IDP algorithm is presented in Fig. 2.

Algorithm 1. IDP
Input:Nodes and connections of GαG^{\alpha} and GβG^{\beta} in multiplex network \mathcal{M}, PINPs, parameter δ\delta.
Output:MINPs
 1: function IDP(GαG^{\alpha}, GβG^{\beta}, PINPs, δ\delta)
 2: 𝑨α\bm{A}^{\alpha}\leftarrow adjacency matrix of GαG^{\alpha}, 𝑨β\bm{A}^{\beta}\leftarrow adjacency matrix of GβG^{\beta}
 3: 𝒅α\bm{d}^{\alpha}\leftarrow degree of nodes in GαG^{\alpha}, 𝒅β\bm{d}^{\beta}\leftarrow degree of nodes in GβG^{\beta}
 4: nαn^{\alpha}=size(𝑨α\bm{A}^{\alpha}), nβn^{\beta}=size(𝑨β\bm{A}^{\beta}), nn=size(PINPs)
 5: matchNodeα=[], unmatchNodeα=[], matchNodeβ=[], unmatchNodeβ=[], MINPs=[]
 6: while n<nαn<n^{\alpha} and n<nβn<n^{\beta} do
 7:  j=1,k=1
 8:  foreach i in nαn^{\alpha} do
 9:   if i is in PINPs do
10:    matchNodeα[j]=i, j++
11:   else do
12:    unmatchNodeα[k]=i, k++
13:  j=1,k=1
14:  foreach i in nβn^{\beta} do
15:   if i is in PINPs do
16:    matchNodeβ[j]=i, j++
17:   else do
18:    unmatchNodeβ[k]=i, k++
19:  𝑬α\bm{E}^{\alpha}=submatrix(𝑨α\bm{A}^{\alpha},matchNodeα,unmatchNodeα)
20:  𝑬β\bm{E}^{\beta}=submatrix(𝑨β\bm{A}^{\beta},matchNodeβ,unmatchNodeβ)
21:  𝒉1α\bm{h}^{\alpha}_{1}=1./log(submatrix(𝒅α\bm{d}^{\alpha},matchNodeα,1).+1)
22:  𝒉1β\bm{h}^{\beta}_{1}=1./log(submatrix(𝒅β\bm{d}^{\beta},matchNodeβ,1).+1)
23:  𝑯α=[𝒉1α,𝒉1α,,𝒉1α]n×(nαn)\bm{H}^{\alpha}=[\bm{h}^{\alpha}_{1},\bm{h}^{\alpha}_{1},\cdots,\bm{h}^{\alpha}_{1}]_{n\times(n^{\alpha}-n)}
24:  𝑯β=[𝒉1β,𝒉1β,,𝒉1β]n×(nβn)\bm{H}^{\beta}=[\bm{h}^{\beta}_{1},\bm{h}^{\beta}_{1},\cdots,\bm{h}^{\beta}_{1}]_{n\times(n^{\beta}-n)}
25:  𝑹=((𝑯α)T(𝑬α)T)𝑬β+[𝑬α]T(𝑬β𝑯β)\bm{R}=((\bm{H}^{\alpha})^{T}\circ(\bm{E}^{\alpha})^{T})\cdot\bm{E}^{\beta}+[\bm{E}^{\alpha}]^{T}\cdot(\bm{E}^{\beta}\circ\bm{H}^{\beta})
26:  max=0\max=0
27:  foreach i in nαnn^{\alpha}-n do
28:   foreach j in nβnn^{\beta}-n do
29:    if 𝑹\bm{R}[i][j]>max>\max do
30:     max=𝑹\max=\bm{R}[i][j]
31:  foreach i in nαnn^{\alpha}-n do
32:   foreach j in nβnn^{\beta}-n do
33:    if 𝑹\bm{R}[i][j]>=δmax>=\delta\cdot\max and unmatchNodeα[j] not in MINPs do
34:     MINPs \leftarrow (unmatchNodeα[i],unmatchNodeβ[j])
35:     PINPs \leftarrow (unmatchNodeα[i],unmatchNodeβ[j])
36:  nn=size(PINPs)
37: return MINPs
38: function submatrix(𝑨\bm{A},rows,cols)
39: 𝑸\bm{Q} \leftarrow Extracts the rows and columns of the matrix 𝑨\bm{A} according to the number in the arrays of rows and cols.
40: return 𝑸\bm{Q}

5 Experiments

In this section, we firstly describe the experimental settings. Thereafter, the evaluation metrics are introduced. Finally, we present the experimental results of the baseline and IDP algorithms on artificial scale-free networks and real-world networks.

Table 2: Time complexity of baselines and IDP
Methods Time Complexity
INOE O(KD(|Eα|+|Eβ|))O(KD(|E^{\alpha}|+|E^{\beta}|))
NS O(m(|Eα|+|Eβ|)dαdβ)O(m(|E^{\alpha}|+|E^{\beta}|)d^{\alpha}d^{\beta})
FRUI O(mndαdβ)O(mnd^{\alpha}d^{\beta})
IDP O(nm3/zp)O(nm^{3}/zp)

5.1 Experimental settings

We verify the effectiveness of our proposed IDP algorithm on both artificial scale-free and real-world networks. The steps for constructing the artificial scale-free multiplex networks are as follows:

(i) We create a Barabási-Albett (BA) [73] network, the degree distribution of which follows a power law distribution, according to the generation step proposed in Ref. [73]. This is an original network for the following steps. (ii) Using Gorg(Vorg,Eorg)G^{org}(V^{org},E^{org}) to represent the original BA network and ss to represent the percentage of remaining nodes, we construct two networks, Gα=GorgG^{\alpha}=G^{org} and Gβ=GorgG^{\beta}=G^{org}. (iii) For a node viαv^{\alpha}_{i} in GαG^{\alpha}, we generate a random value a1a_{1} with a uniform distribution in [0,1]. If a1sa_{1}\geq s, the node viαv^{\alpha}_{i} is discarded, as are all of the intralayer links connected with node viαv^{\alpha}_{i}. Otherwise, node viαv^{\alpha}_{i} is preserved in GαG^{\alpha}. (iv) Similarly, for a node viβv^{\beta}_{i} in GβG^{\beta}, we generate a random value a2a_{2} with a uniform distribution in [0,1]. If a2sa_{2}\geq s, the node viβv^{\beta}_{i} is discarded, as are all of the intralayer links connected with node viβv^{\beta}_{i}. Otherwise, node viβv^{\beta}_{i} is preserved in GβG^{\beta}. (v) We determine whether to add an interlayer link between viαv^{\alpha}_{i} and viβv^{\beta}_{i}. If a1<sa_{1}<s and a2<sa_{2}<s simultaneously, the counterpart nodes of node viorgv^{org}_{i}, namely viαv^{\alpha}_{i} and viβv^{\beta}_{i}, remain in layers α\alpha and β\beta. Therefore, we add an interlayer link between viαv^{\alpha}_{i} and viβv^{\beta}_{i}. Otherwise, we do nothing. (vi) Steps (iii) to (v) are repeated until all of the nodes in GαG^{\alpha} and GβG^{\beta} are traversed.

The real-world datasets contain eight networks downloaded from the websites Stanford Large Network Dataset Collection [74], Link Prediction Group [75], and Network Analysis of Advogato [76]. In a real scenario, OSN application users are often wary of their privacy. Therefore, it is difficult to obtain complete ground truth. To address this problem, we perform experiments on self-matching real-world networks[77]. The solution for obtaining the self-matching real-world multiplex network is the same as that for the BA multiplex network described above.

After constructing the multiplex networks, we use the FRUI [36], NS [78], and INOE [37] as the baselines for the experiments, where FRUI is the closest to the IDP algorithm for the interlayer link prediction as a state of the art. Each experiment is repeated 500 times. For the sake of reducing computational time in each experiment, the adjacency matrices constructed by the matched nodes and unmatched nodes of layer α\alpha and β\beta in the ttth iteration will join to 𝑬α\bm{E}^{\alpha} and 𝑬β\bm{E}^{\beta}, respectively. This could avoid reconstructing 𝑬α\bm{E}^{\alpha} and 𝑬β\bm{E}^{\beta} in (t+1)(t+1)th iteration.

The time complexity of the baseline and IDP algorithms is presented in Table 2. In particular, KK is the negative sampling number, DD is the representation dimension, |Eα||E^{\alpha}| and |Eβ||E^{\beta}| are the number of intralayer links in layer α\alpha and β\beta, respectively. dαd^{\alpha} and dβd^{\beta} are the maximal degrees of nodes in layer α\alpha and β\beta, respectively.

5.2 Evaluation metrics

We employ the recall, precision, and F1  [2] as the metrics for evaluating the performance of the FRUI and IDP algorithms, which are widely used in information retrieval, machine learning, and data mining, among others. The recall can be formulated as

Recall=TPTP+FN,Recall=\frac{TP}{TP+FN}, (11)

where TP and FN indicate the number of true positives and false negatives, respectively [48]. In this study, the recall reflects the ratio of INPs that the algorithm correctly predicts to the number of INPs that need to be predicted. The precision can be formulated as

Precision=TPTP+FP,Precision=\frac{TP}{TP+FP}, (12)

where FP indicates the number of false positives [48]. In this study, the precision reflects the ratio of INPs that the algorithm correctly predicts to all of the results predicted by the algorithm. F1F1 is the harmonic mean between the precision and recall [79]. It can be formulated as

F1=2RecallPrecisionRecall+Precision,F1=\frac{2\cdot Recall\cdot Precision}{Recall+Precision}, (13)

where the range for F1F1 is [0,1]. For these three metrics, a higher value indicates superior performance of the algorithm performance.

5.3 Results on artificial networks

In this subsection, we outline the manner in which to determine an optimal value of δ\delta, and demonstrate the comparison results of the baseline and IDP algorithms on different average node degrees, node overlaps, and network sizes.

5.3.1 Determining optimal δ\delta

A possible solution for obtaining an optimum δ\delta is to determine it experimentally. We set the default values of δ\delta to increase from 0.1 to 1 by 0.1 and execute the IDP algorithm. In these experiments, the BA network parameter mm, which represents the number of edges for attaching a new node to existing nodes, is equal to 10, the percentage of remaining nodes ss is equal to 0.5, the network sizes NN of the original BA networks are equal to 2,000, 4,000, 6,000, 8,000, and 10,000, respectively, and the ratios of PINPs to INPs pp are equal to 0.01,0.02,,0.10.01,~{}0.02,~{}\cdots,~{}0.1, respectively.

Table 3 displays the average F1 rates of these experiments. It can be observed that the IDP algorithm exhibits the best performance at δ=0.5\delta=0.5 when NN is equal to 2,000. When NN is equal to 4,000,6,000,8,000,and10,0004,000,~{}6,000,~{}8,000,and~{}10,000, the IDP algorithm exhibits superior performance at δ=0.5,0.7,0.7\delta=0.5,~{}0.7,~{}0.7, and 0.70.7, respectively. Overall, the IDP algorithm performs the best when δ\delta is equal to 0.7, at which the average F1 rate is equal to 0.3023. Therefore, we determine that the practical value of the parameter δ\delta is equal to 0.7.

Table 3: Performance of IDP algorithm on different δ\delta
Metric NN δ\delta
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
F1F_{1} 2,000 0.1029 0.1113 0.1355 0.1435 0.1445 0.1389 0.1311 0.1372 0.1299 0.1015
4,000 0.1786 0.2043 0.2303 0.2462 0.2548 0.2574 0.2546 0.2559 0.2508 0.2390
6,000 0.2199 0.2515 0.2881 0.3073 0.3176 0.3287 0.3331 0.3294 0.3344 0.3227
8,000 0.2445 0.2868 0.3207 0.3398 0.3505 0.3665 0.3711 0.3674 0.3727 0.3603
10,000 0.2786 0.3207 0.3554 0.3771 0.3998 0.4133 0.4217 0.4180 0.4221 0.4151
Average 0.2049 0.2349 0.2660 0.2828 0.2934 0.3010 0.3023 0.3016 0.3020 0.2877

5.3.2 Effects of average degree

We evaluate the performance of the baseline and IDP algorithms at different average degrees. We set N=2,000N=2,000, s=0.5s=0.5, pp increasing from 0.01 to 0.1 by 0.01, and m=5,10,15m=5,~{}10,~{}15. Figures 3(a) to (c) display the recall, precision, and F1 rates of the baseline and IDP algorithms under the above experimental settings. The following observations can be made. The IDP algorithm outperforms the baseline algorithms. This is because the IDP algorithm uses the degree penalty principle to calculate the matching degree of two unmatched nodes across different layers. This principle can reflect the contribution of the number of CMNs and influence of the degree of CMNs on the matching degree, while the FRUI algorithm only reflects the contribution of the number of CMNs. The NS algorithm depends heavily on the degree of the nodes in layer β\beta. It limits the contribution of CMNs; hence, the performance is not as good as FRUI. The INOE algorithm uses network embedding to predict the interlayer links, which needs more PINPs to train the model to show its advantage. For a given mm, the recall, precision, and F1 rates of the baseline and IDP algorithms increase with pp. This is because a larger pp means that additional CMNs can be used to calculate the matching degree of UNPs. For a given pp, the recall, precision, and F1 rates of the baseline and IDP algorithms increase with mm, as the nodes with a low degree are difficult to match. A smaller mm indicates more low-degree nodes. It is noteworthy that some lines in Figs. 3(a) to (c) are overlapped. We have listed their recall rate in Table 4.

Refer to caption
Figure 3: Comparison between baselines and IDP on different average degrees. (a) Recall rate, (b) precision rate, (c) F1 rate, (d) percentages of improvement between FRUI and IDP of recall rate, (e) percentages of improvement between FRUI and IDP of precision rate, and (f) percentages of improvement between FRUI and IDP of F1 rate versus pp. In these experiments, we set N=2,000N=2,000, s=0.5s=0.5, and m=5,10,15m=5,~{}10,~{}15. The horizontal ordinates denote the ratios of PINPs to INPs.
Table 4: Recall rate of overlapped lines in Figs. 3(a) to (c)
Method mm pp
0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.1
INOE 15 0.0079 0.0108 0.0147 0.0200 0.0273 0.0365 0.0481 0.0610 0.0764 0.0950
INOE 10 0.0074 0.0106 0.0153 0.0215 0.0289 0.0383 0.0489 0.0608 0.0752 0.0912
INOE 5 0.0084 0.0123 0.0176 0.0240 0.0307 0.0386 0.0475 0.0574 0.0680 0.0795
NS 5 0.0037 0.0071 0.0113 0.0174 0.0260 0.0338 0.0414 0.0521 0.0633 0.0718
FRUI 5 0.0038 0.0077 0.0134 0.0213 0.0277 0.0385 0.0489 0.0594 0.0713 0.0843
IDP 5 0.0022 0.0060 0.0117 0.0192 0.0293 0.0382 0.0509 0.0627 0.0743 0.0866

Figures 3(d) to (f) illustrate the percentages of improvement in the recall, precision, and F1 rates for the IDP algorithm compared to the best baseline, FRUI, algorithm under the same experimental settings as those in Figs. 3(a) to (c). The recall rate increases by a maximum of 20.8% and an average of 6.0%. The precision rate increases by a maximum of 10.5% and an average of 3.1%. The F1 rate increases by a maximum of 14% and an average of 4.1%. A larger mm results in greater overall improvement percentages. This is because, with a larger mm, each unmatched node or matched interlayer node has more interlayer links. For a UMP, more CMNs may be involved in the calculation of its matching degree. Thus, the advantage of the degree penalty principle becomes more obvious. When m=15m=15, the percentages of improvement exhibit a trend of first increasing and then decreasing with an increase in pp. This trend is caused by the following factors. (i) When pp is very small, the number of PINPs used to calculate the matching degree is small. The ability of both the FRUI and IDP algorithms is poor. Thus, the percentages of improvement are not obvious. (ii) With an increase in pp, the number of PINPs increases. Owing to the use of the degree penalty principle to calculate the matching degree, the IDP algorithm can use more useful information of the CMNs. A greater number of PINPs results in higher percentages of improvement. (iii) When the percentages of improvement reach a maximum value, the matching advantage offered by increasing the PINPs in the IDP algorithm decreases. Thus, the percentages of improvement gradually decrease until they disappear. When m=5m=5 or m=10m=10, the percentages of improvement only exhibit an increasing trend. This trend is consistent with the first half when m=15m=15. When pp is greater than 0.1 and the percentages of improvement reach the maximum value, they will decrease as pp increases.

5.3.3 Effects of node overlaps

We use the Jaccard coefficient to measure the node overlaps, as follows:

O(Gα,Gβ)=|VαVβ||VαVβ|.O(G^{\alpha},G^{\beta})=\frac{|V^{\alpha}\bigcap V^{\beta}|}{|V^{\alpha}\bigcup V^{\beta}|}. (14)

When s=0.4,0.5,0.6s=0.4,~{}0.5,~{}0.6, the node overlapping rate is approximately equal to 0.25, 0.33, and 0.43, respectively, according to Eq. (14).

Refer to caption
Figure 4: Comparison between FRUI and IDP algorithms on different node overlaps. (a) Recall rate, (b) precision rate, (c) F1 rate, (d) percentages of improvement of recall rate, (e) percentages of improvement of precision rate, and (f) percentages of improvement of F1 rate versus pp. In these experiments, we set N=2,000N=2,000, m=10m=10, and s=0.4,0.5,0.6s=0.4,~{}0.5,~{}0.6. The horizontal ordinates denote the ratios of PINPs to INPs.

We set N=2000N=2000, m=10m=10, pp increasing from 0.01 to 0.1 by 0.01, and s=0.4,0.5,0.6s=0.4,~{}0.5,~{}0.6 to execute the experiments for the evaluation of the performance of the FRUI and IDP algorithms with different node overlaps. Figures 4(a) to (c) illustrate the recall, precision, and F1 rates of the FRUI and IDP algorithms under the above experimental settings. The following observations can be made.

The IDP algorithm outperforms the FRUI algorithm. For a given ss, the recall, precision, and F1 rates of the FRUI and IDP algorithms increase with pp. The reasons are the same as those in Figs. 3(a) to (c). For a given pp, the recall, precision, and F1 rates of the FRUI and IDP algorithms increase with ss. This is because a higher overlapping rate results in a higher proportion of interlayer nodes in all nodes, and a lower probability of incorrect matching; hence, superior performance of the algorithms. For a UMP, additional interlayer links mean that more CMNs may be involved in the calculation of its matching degree. Thus, the advantage of the degree penalty principle becomes more obvious.

Figures 4(d) to (f) display the percentages of improvement under the same experimental settings as those in Figs. 4(a) to (c). The recall rate increases by a maximum of 17.5% and an average of 4.6%. The precision rate increases by a maximum of 10.9% and an average of 2.7%. The F1 rate increases by a maximum of 13.4% and an average of 3.4%. A larger ss results in greater overall percentages of improvement. This is because a larger ss indicates a greater number of INPs and PINPs. For a UMP, additional CMNs may be involved in the calculation of its matching degree; thus, the advantage of the degree penalty principle is more obvious. When s=0.6s=0.6, the percentages of improvement exhibit a trend of first increasing and then decreasing with an increase in pp. Moreover, when s=0.4s=0.4 or s=0.5s=0.5, the percentages of improvement exhibit a trend of increasing with an increase in pp. The reasons are the same as those in Figs. 3(d) to (f).

5.3.4 Effects of network size

We set s=0.5s=0.5, m=10m=10, pp increasing from 0.01 to 0.1 by 0.01, and N=N=2,000, 4,000, 6,000, 8,000, and 10,000 to execute the experiments for the evaluation of the performance of the FRUI and IDP algorithms with different network sizes. Figures 5(a) to (c) illustrate the recall, precision, and F1 rates of the FRUI and IDP algorithms under the above experimental settings. The following observations can be made.

Refer to caption
Figure 5: Comparison between FRUI and IDP algorithms on different number of nodes. (a) Recall rate, (b) precision rate, (c) F1 rate, (d) percentages of improvement of recall rate, (e) percentages of improvement of precision rate, and (f) percentages of improvement of F1 rate versus pp. In these experiments, we set m=10m=10, s=0.5s=0.5, and N=2,000,4,000,6,000,8,000N=2,000,~{}4,000,~{}6,000,~{}8,000, and 10,000. The horizontal ordinates denote the ratios of PINPs to INPs.

The IDP algorithm outperforms the FRUI algorithm. For a given NN, the recall, precision, and F1 rates of the FRUI and IDP algorithms increase with pp. The reasons are the same as those in Figs. 3(a) to (c). For a given pp, the recall, precision, and F1 rates of the FRUI and IDP algorithms increase with NN. This is because, when NN is larger, more PINPs are used to calculate the matching degree of all UNPs; hence, the performance of the algorithms is improved. When N=10,000N=10,000 and p0.07p\geq 0.07, the performance of both the FRUI and IDP algorithms decreases slightly with the increase in pp, because the number of interlayer links that can be predicted correctly has a maximum. When the maximum number is reached, the increase in pp will lead to a slight decrease in the number of interlayer links to be predicted and that are correctly predicted.

Figures 5(d) to (f) display the percentages of improvement under the same experimental settings as those in Figs. 5(a) to (c). The recall rate increases by a maximum of 23.3% and an average of 8.2%. The precision rate increases by a maximum of 12.2% and an average of 4.5%. The F1 rate increases by a maximum of 16.1% and by an average of 5.8%. A larger NN results in greater average percentages of improvement for the IDP algorithm compared to the FRUI algorithm. Because the BA networks exhibit the characteristics of preferential attachment, new vertices attach preferentially to vertices that are already well connected [73]. A larger NN means that the influence of the degree of CMNs on the matching degree is more sensitive; hence, the percentages of improvement are increased. When N=4,000,6,000,8,000N=4,000,~{}6,000,~{}8,000 or 10,000, the percentages of improvement exhibit a trend of first increasing and then decreasing with an increase in pp. Moreover, when N=2000N=2000, the percentages of improvement exhibit a trend of increasing with an increase in pp. The reasons are the same as those in Figs. 3(d) to (f).

5.4 Results on real-world networks

Table 5: Statistical characteristics of eight real-world networks, including network size (NN), number of edges (EE), maximum degree (kmaxk_{max}), first (k\langle{k}\rangle) and second moments (k2\langle{k^{2}}\rangle) of degree distribution, degree-degree correlations (rr), and clustering (cc).
No. Networks Statistical characteristics of networks
NN EE kmaxk_{max} k\langle{k}\rangle k2\langle{k^{2}}\rangle rr cc
Real-1 Email-Eu-core[80] 1005 16064 345 32.58 2386.6 0.026-0.026 0.45
Real-2 UC Irvine messages[81] 1899 13838 255 14.57 810.7 0.188-0.188 0.14
Real-3 Wikipedia vote[82] 7115 100762 1065 28.32 3530.5 0.083-0.083 0.21
Real-4 Twitter[63] 5120 130576 1725 51.01 11679.7 0.214-0.214 0.30
Real-5 Political blogs[83] 1222 16714 351 27.36 2223 0.221-0.221 0.36
Real-6 Hamsterster friendships[84] 1788 12476 272 13.96 635.6 0.089-0.089 0.17
Real-7 Hamsterster full[84] 2000 16098 273 16.1 704.7 0.023 0.57
Real-8 Foursquare[63] 5313 54233 552 20.42 1436.1 0.193-0.193 0.23

We use eight real-world networks to construct self-matching real-world multiplex networks, and evaluate the FRUI and IDP algorithms on these multiplex networks. The statistical characteristics of the eight real-world networks are presented in Table 5. Formally, we represent each of the eight real-world networks as an undirected graph. In these experiments, s=0.5s=0.5 and pp increases from 0.01 to 0.1 by 0.01.

Figures  6(a) to (f) display the recall, precision, and F1 rates of the FRUI and IDP algorithms of the eight real-world networks. The following observations can be made. The IDP algorithm outperforms the FRUI algorithm. For a given real-world network, the recall, precision, and F1 rates of the FRUI and IDP algorithms increase with pp. The reasons are the same as those in Figs. 3(a) to (c). For a given pp, different real-world networks exhibit varying performances, because the network sizes, average degrees, and clustering coefficients of the real-world networks differ.

Refer to caption
Figure 6: Comparison between FRUI and IDP algorithms on eight real-world networks. (a) and (d) Recall rate, (b) and (e) precision rate, (c) and (f) F1 rate, (g) and (j) percentages of improvement of recall rate, (h) and (k) percentages of improvement of precision rate, (i) and (l) percentages of improvement of F1 rate versus pp. In these experiments, s=0.5s=0.5. The horizontal ordinates denote the ratios of PINPs to INPs.

Figures  6(g) to (l) illustrate the percentages of improvement under the same experimental settings as those in Figs. 6(a) to (f). The following observations can be made. The recall rate increases by a maximum of 36.6% and an average of 7.0%. The precision rate increases by a maximum of 19.0% and an average of 3.8%. The F1 rate increases by a maximum of 25.0% and an average of 5.0%. On real-world network 2, 3, 4, and 8, the percentages of improvement exhibit a trend of first increasing and then decreasing with an increase in pp. On real-world networks 1, 5, 6, and 7, the percentages of improvement exhibit a trend of first increasing and then decreasing with an increase in pp. The reasons are the same as those in Figs. 3(d) to (f).

6 Conclusion

In this study, we have investigated the problem of interlayer link prediction in the multiplex network. We solved this problem by leveraging network structure attributes. We used a degree penalty principle to calculate the matching degree of two unmatched nodes across different layers, which could reflect the influence of the number of CMNs and their degree attributes. For the sake of efficiency, we adopted node adjacency matrix multiplication to obtain the matching degrees of all UNPs. Moreover, we developed an iterative algorithm to determine additional hidden interlayer links. Experiments on both artificial and real-world networks demonstrated that our advanced IDP algorithm outperformed the FRUI algorithm.

In summary, the IDP algorithm offers comparative advantages in multiplex networks with a degree distribution that follows a power law distribution. When users in multiple OSN applications input different usernames and other attribute information owing to privacy or anonymity, our proposed algorithm can effectively associate the accounts belonging to the same person across different OSNs. Such an association can aid in achieving the information fusion of multiple OSN platforms, understanding the online behavior of network users, constructing integral user profiles, and providing services for e-commerce, cyber security, and recommendation systems, among others. As future work, we will investigate the manner in which to use higher-order structures, such as communicating within a group or participating in the same online activity, to improve the performance of the interlayer link prediction.

7 Acknowledgments

This work was supported by the National Science and Technology Support Program of China (Grant No. 2012BAH18B05) and the National Natural Science Foundation of China (Grant Nos. 81602935, 81773548, 61802270, 61802271).

References

  • Shi et al. [2017] C. Shi, Y. Li, J. Zhang, Y. Sun, P. S. Yu, A survey of heterogeneous information network analysis, IEEE Transactions on Knowledge and Data Engineering 29 (2017) 17–37.
  • Shu et al. [2017] K. Shu, S. Wang, J. Tang, R. Zafarani, H. Liu, User identity linkage across online social networks: A review, ACM SIGKDD Explorations Newsletter 18 (2017) 5–17.
  • Fujita et al. [2019] H. Fujita, A. Gaeta, V. Loia, F. Orciuoli, Hypotheses analysis and assessment in counter-terrorism activities: a method based on owa and fuzzy probabilistic rough sets, IEEE Transactions on Fuzzy Systems (2019).
  • Capuano et al. [2017] N. Capuano, F. Chiclana, H. Fujita, E. Herrera-Viedma, V. Loia, Fuzzy group decision making with incomplete information guided by social influence, IEEE Transactions on Fuzzy Systems 26 (2017) 1704–1718.
  • Newman [2010] M. Newman, Networks: an introduction, Oxford university press, New York, NY, USA, 2010.
  • Han et al. [2017] X. Han, L. Wang, C. Cui, J. Ma, S. Zhang, Linking multiple online identities in criminal investigations: A spectral co-clustering framework, IEEE Transactions on Information Forensics and Security 12 (2017) 2242–2255.
  • Chi et al. [2019] K. Chi, G. Yin, Y. Dong, H. Dong, Link prediction in dynamic networks based on the attraction force between nodes, Knowledge-Based Systems (2019).
  • Yin et al. [2017] G. Yin, K. Chi, Y. Dong, H. Dong, An approach of community evolution based on gravitational relationship refactoring in dynamic networks, Physics Letters A 381 (2017) 1349–1355.
  • Zhao et al. [2019] Z. Zhao, C. Li, X. Zhang, F. Chiclana, E. H. Viedma, An incremental method to detect communities in dynamic evolving social networks, Knowledge-Based Systems 163 (2019) 404–415.
  • Boccaletti et al. [2014] S. Boccaletti, G. Bianconi, R. Criado, C. del Genio, J. Gómez-Gardeñes, M. Romance, I. Sendiña-Nadal, Z. Wang, M. Zanin, The structure and dynamics of multilayer networks, Physics Reports 544 (2014) 1–122.
  • Gao et al. [2011a] J. Gao, S. V. Buldyrev, H. E. Stanley, S. Havlin, Networks formed from interdependent networks, Nature Physics 8 (2011a) 40–48.
  • Gao et al. [2011b] J. Gao, S. V. Buldyrev, S. Havlin, H. E. Stanley, Robustness of a network of networks, Physical Review Letters 107 (2011b).
  • Granell et al. [2013] C. Granell, S. Gómez, A. Arenas, Dynamical interplay between awareness and epidemic spreading in multiplex networks, Physical Review Letters 111 (2013) 128701.
  • Wang et al. [2014] W. Wang, M. Tang, H. Yang, Y. Do, Y.-C. Lai, G. Lee, Asymmetrically interacting spreading dynamics on complex layered networks, Scientific Reports 4 (2014) 5097.
  • Wang et al. [2016] W. Wang, Q.-H. Liu, S.-M. Cai, M. Tang, L. A. Braunstein, H. E. Stanley, Suppressing disease spreading by using information diffusion on multiplex networks, Scientific Reports 6 (2016) 29259.
  • Wang et al. [2019] W. Wang, Q.-H. Liu, J. Liang, Y. Hu, T. Zhou, Coevolution spreading in complex networks, arXiv preprint arXiv:1901.02125 (2019).
  • Zhang et al. [2015] X. Zhang, S. Boccaletti, S. Guan, Z. Liu, Explosive synchronization in adaptive and multilayer networks, Physical Review Letters 114 (2015) 038701.
  • Wang et al. [2013a] Z. Wang, A. Szolnoki, M. Perc, Optimal interdependence between networks for the evolution of cooperation, Scientific Reports 3 (2013a) 2470.
  • Wang et al. [2013b] Z. Wang, A. Szolnoki, M. Perc, Interdependent network reciprocity in evolutionary games, Scientific Reports 3 (2013b) 1183.
  • Wang et al. [2014] Z. Wang, A. Szolnoki, M. Perc, Self-organization towards optimally interdependent networks by means of coevolution, New Journal of Physics 16 (2014) 033041.
  • Kivela et al. [2014] M. Kivela, A. Arenas, M. Barthelemy, J. P. Gleeson, Y. Moreno, M. A. Porter, Multilayer networks, Journal of Complex Networks 2 (2014) 203–271.
  • Cozzo et al. [2018] E. Cozzo, G. F. de Arruda, F. A. Rodrigues, Y. Moreno, Multiplex Networks: Basic Formalism and Structural Properties, Springer-Verlag GmbH, Cham, Switzerland, pp. 3–5.
  • Domenico et al. [2013] M. D. Domenico, A. Solé-Ribalta, E. Cozzo, M. Kivelä, Y. Moreno, M. A. Porter, S. Gómez, A. Arenas, Mathematical formulation of multilayer networks, Physical Review X 3 (2013).
  • Lu et al. [2014] C.-T. Lu, H.-H. Shuai, P. S. Yu, Identifying your customers in social networks, in: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, ACM, 2014, pp. 391–400.
  • Vosecky et al. [2009] J. Vosecky, D. Hong, V. Y. Shen, User identification across multiple social networks, in: 2009 first international conference on networked digital technologies, IEEE, 2009, pp. 360–365.
  • Wang et al. [2019] S. Wang, X. Li, Y. Ye, S. Feng, R. Lau, X. Huang, X. Du, Anchor link prediction across attributed networks via network embedding, Entropy 21 (2019) 254.
  • Zhang et al. [2019] S. Zhang, H. Tong, R. Maciejewski, T. Eliassi-Rad, Multilevel network alignment, in: Proceedings of the 28nd International Conference on World Wide Web, ACM Press, San Francisco, CA, USA, 2019.
  • Chu et al. [2019] X. Chu, X. Fan, D. Yao, Z. Zhu, J. Huang, J. Bi, Cross-network embedding for multi-network alignment, in: Proceedings of the 28nd International Conference on World Wide Web, ACM Press, San Francisco, CA, USA, 2019.
  • Wang et al. [2019] Y. Wang, H. Shen, J. Gao, X. Cheng, Learning binary hash codes for fast anchor link retrieval across networks, in: Proceedings of the 28nd International Conference on World Wide Web, ACM Press, San Francisco, CA, USA, 2019.
  • Zhan et al. [2018] Q. Zhan, J. Zhang, P. S. Yu, Integrated anchor and social link predictions across multiple social networks, Knowledge and Information Systems (2018).
  • Zhou et al. [2018a] X. Zhou, X. Liang, J. Zhao, A. Zhiyuli, H. Zhang, An unsupervised user identification algorithm using network embedding and scalable nearest neighbour, Cluster Computing (2018a).
  • Zhou et al. [2018b] F. Zhou, L. Liu, K. Zhang, G. Trajcevski, J. Wu, T. Zhong, Deeplink: A deep learning approach for user identity linkage, in: IEEE INFOCOM 2018-IEEE Conference on Computer Communications, IEEE, 2018b, pp. 1313–1321.
  • Zhou et al. [2018c] X. Zhou, X. Liang, X. Du, J. Zhao, Structure based user identification across social networks, IEEE Transactions on Knowledge and Data Engineering 30 (2018c) 1178–1191.
  • Zhang et al. [2018] J. Zhang, B. Chen, X. Wang, H. Chen, C. Li, F. Jin, G. Song, Y. Zhang, Mego2vec: Embedding matched ego networks for user alignment across social networks, in: Proceedings of the 27th ACM International Conference on Information and Knowledge Management (CIKM), ACM, 2018, pp. 327–336.
  • Zhang et al. [2017] S. Zhang, H. Tong, J. Tang, J. Xu, W. Fan, ineat: Incomplete network alignment, in: 17th IEEE International Conference on Data Mining (ICDM), IEEE, New Orleans, LA, USA, 2017, pp. 1189–1194.
  • Zhou et al. [2016] X. Zhou, X. Liang, H. Zhang, Y. Ma, Cross-platform identification of anonymous identical users in multiple social media networks, IEEE Transactions on Knowledge and Data Engineering 28 (2016) 411–424.
  • Liu et al. [2016] L. Liu, W. K. Cheung, X. Li, L. Liao, Aligning users across social networks using network embedding., in: Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI), pp. 1774–1780.
  • Man et al. [2016] T. Man, H. Shen, S. Liu, X. Jin, X. Cheng, Predict anchor links across social networks via an embedding approach, in: 25th International Joint Conference on Artificial Intelligence (IJCAI), volume 16, New York, USA, pp. 1823–1829.
  • Zhang and Tong [2016] S. Zhang, H. Tong, Final: Fast attributed network alignment, in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD '16, ACM Press, San Francisco, CA, USA, 2016.
  • Zhang and Philip [2015] J. Zhang, S. Y. Philip, Multiple anonymized social networks alignment, in: 2015 IEEE International Conference on Data Mining, IEEE, 2015, pp. 599–608.
  • Tan et al. [2014] S. Tan, Z. Guan, D. Cai, X. Qin, J. Bu, C. Chen, Mapping users across networks by manifold alignment on hypergraph, in: 28th AAAI Conference on Artificial Intelligence, Québec City, Québec, Canada, pp. 159–165.
  • Feng et al. [2018] R. Feng, Y. Yang, W. Hu, F. Wu, Y. Zhang, Representation learning for scale-free networks, in: Thirty-Second AAAI Conference on Artificial Intelligence.
  • Zhou et al. [2011] T. Zhou, M. Medo, G. Cimini, Z.-K. Zhang, Y.-C. Zhang, Emergence of scale-free leadership structure in social recommender systems, PLoS ONE 6 (2011) e20648.
  • Zhao et al. [2018] D. Zhao, N. Zheng, M. Xu, X. Yang, J. Xu, An improved user identification method across social networks via tagging behaviors, in: 30th International Conference on Tools with Artificial Intelligence (ICTAI), IEEE, 2018, pp. 616–622.
  • Zafarani and Liu [2009] R. Zafarani, H. Liu, Connecting corresponding identities across communities, in: Third International AAAI Conference on Weblogs and Social Media.
  • Zafarani and Liu [2013] R. Zafarani, H. Liu, Connecting users across social media sites: a behavioral-modeling approach, in: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, 2013, pp. 41–49.
  • Perito et al. [2011] D. Perito, C. Castelluccia, M. A. Kaafar, P. Manils, How unique and traceable are usernames?, in: 11st International Symposium on Privacy Enhancing Technologies Symposium, Springer, 2011, pp. 1–17.
  • Liu et al. [2013] J. Liu, F. Zhang, X. Song, Y.-I. Song, C.-Y. Lin, H.-W. Hon, What’s in a name? an unsupervised approach to link users across communities, in: 6th ACM International Conference on Web Search and Data Mining (WSDM), ACM, Rome, Italy, 2013, pp. 495–504.
  • Acquisti et al. [2014] A. Acquisti, R. Gross, F. D. Stutzman, Face recognition and privacy in the age of augmented reality, Journal of Privacy and Confidentiality 6 (2014) 1–20.
  • Labitzke et al. [2011] S. Labitzke, I. Taranu, H. Hartenstein, What your friends tell others about you: Low cost linkability of social network profiles, in: Proc. 5th International ACM Workshop on Social Network Mining and Analysis, San Diego, CA, USA, pp. 1065–1070.
  • Carmagnola and Cena [2009] F. Carmagnola, F. Cena, User identification for cross-system personalisation, Information Sciences 179 (2009) 16–32.
  • Iofciu et al. [2011] T. Iofciu, P. Fankhauser, F. Abel, K. Bischoff, Identifying users across social tagging systems, in: 5th International AAAI Conference on Weblogs and Social Media.
  • Abel et al. [2013] F. Abel, E. Herder, G.-J. Houben, N. Henze, D. Krause, Cross-system user modeling and personalization on the social web, User Modeling and User-Adapted Interaction 23 (2013) 169–209.
  • Goga et al. [2013] O. Goga, D. Perito, H. Lei, R. Teixeira, R. Sommer, Large-scale correlation of accounts across social networks, University of California at Berkeley, Berkeley, California, Tech. Rep. TR-13-002 (2013).
  • Cortis et al. [2013] K. Cortis, S. Scerri, I. Rivera, S. Handschuh, An ontology-based technique for online profile resolution, in: International Conference on Social Informatics, Springer, 2013, pp. 284–298.
  • Mu et al. [2016] X. Mu, F. Zhu, E.-P. Lim, J. Xiao, J. Wang, Z.-H. Zhou, User identity linkage by latent user space modelling, in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2016, pp. 1775–1784.
  • Zheng et al. [2006] R. Zheng, J. Li, H. Chen, Z. Huang, A framework for authorship identification of online messages: Writing-style features and classification techniques, Journal of the American Society for Information Science and Technology 57 (2006) 378–393.
  • Goga et al. [2013] O. Goga, H. Lei, S. H. K. Parthasarathi, G. Friedland, R. Sommer, R. Teixeira, Exploiting innocuous activity for correlating users across sites, in: Proceedings of the 22nd international conference on World Wide Web, ACM, 2013, pp. 447–458.
  • Riederer et al. [2016] C. Riederer, Y. Kim, A. Chaintreau, N. Korula, S. Lattanzi, Linking users across domains with location data: Theory and validation, in: Proceedings of the 25th International Conference on World Wide Web, International World Wide Web Conferences Steering Committee, 2016, pp. 707–719.
  • Narayanan and Shmatikov [2010] A. Narayanan, V. Shmatikov, Myths and fallacies of ”personally identifiable information”, Communications of the ACM 53 (2010) 24–26.
  • Zhu et al. [2012] Y. Zhu, L. Qin, J. X. Yu, Y. Ke, X. Lin, High efficiency and quality: large graphs matching, The International Journal on Very Large Data Bases 22 (2012) 345–368.
  • Zhang et al. [2015] Y. Zhang, J. Tang, Z. Yang, J. Pei, P. S. Yu, Cosnet: Connecting heterogeneous social networks with local and global consistency, in: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2015, pp. 1485–1494.
  • Zhang and Philip [2015] J. Zhang, S. Y. Philip, Integrated anchor and social link predictions across social networks, in: Proceedings of the 24th International Joint Conference on Artificial Intelligence (IJCAI), Buenos Aires, Argentina.
  • Tang et al. [2015] J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, Q. Mei, Line: Large-scale information network embedding, in: Proceedings of the 24th international conference on world wide web, International World Wide Web Conferences Steering Committee, 2015, pp. 1067–1077.
  • Perozzi et al. [2014] B. Perozzi, R. Al-Rfou, S. Skiena, Deepwalk: Online learning of social representations, in: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, 2014, pp. 701–710.
  • Jain et al. [2013] P. Jain, P. Kumaraguru, A. Joshi, @i seek 'fb.me': Identifying users across multiple online social networks, in: Proceedings of the 22nd International Conference on World Wide Web - WWW '13 Companion, ACM Press, 2013.
  • Nunes et al. [2012] A. Nunes, P. Calado, B. Martins, Resolving user identities over social networks through supervised learning and rich similarity features, in: Proceedings of the 27th Annual ACM symposium on applied computing, ACM, 2012, pp. 728–729.
  • Kong et al. [2013] X. Kong, J. Zhang, P. S. Yu, Inferring anchor links across multiple heterogeneous social networks, in: Proceedings of the 22nd ACM international conference on Information & Knowledge Management, ACM, 2013, pp. 179–188.
  • Zhang et al. [2014] H. Zhang, M.-Y. Kan, Y. Liu, S. Ma, Online social network profile linkage, in: Asia Information Retrieval Symposium, Springer, 2014, pp. 197–208.
  • Newman [2003] M. E. J. Newman, The structure and function of complex networks, SIAM Review 45 (2003) 167–256.
  • Katz [1953] L. Katz, A new status index derived from sociometric analysis, Psychometrika 18 (1953) 39–43.
  • Lee et al. [1997] J. S. Lee, S.-Y. Park, P. B. Berra, S. Ranka, I/o and memory-efficient matrix multiplication with user-controllable parallel i/o, in: Proceedings 1997 International Conference on Parallel and Distributed Systems, IEEE, pp. 59–66.
  • Barabási and Albert [1999] A.-L. Barabási, R. Albert, Emergence of scaling in random networks, Science 286 (1999) 509–512.
  • Leskovec and Krevl [2020] J. Leskovec, A. Krevl, SNAP Datasets: Stanford large network dataset collection, http://snap.stanford.edu/data, 2020.
  • lin [2020] Link prediction group, http://www.linkprediction.org/, 2020.
  • kon [2020] Network analysis of advogato, http://konect.uni-koblenz.de/networks/, 2020.
  • Kong et al. [2016] C. Kong, M. Gao, C. Xu, W. Qian, A. Zhou, Entity matching across multiple heterogeneous data sources, in: 21st International Conference on Database Systems for Advanced Applications (DASFAA), Springer, Dallas, Texas, USA, 2016, pp. 133–146.
  • Narayanan and Shmatikov [2009] A. Narayanan, V. Shmatikov, De-anonymizing social networks, in: Proceedings of the 2009 30th IEEE Symposium on Security and Privacy, IEEE Computer Society, pp. 173–187.
  • HRIPCSAK and ROTHSCHILD [2005] G. HRIPCSAK, A. S. ROTHSCHILD, Agreement, the f-measure, and reliability in information retrieval, Journal of the American Medical Informatics Association 12 (2005) 296–298.
  • Yin et al. [2017] H. Yin, A. R. Benson, J. Leskovec, D. F. Gleich, Local higher-order graph clustering, in: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2017, pp. 555–564.
  • Opsahl and Panzarasa [2009] T. Opsahl, P. Panzarasa, Clustering in weighted networks, Social networks 31 (2009) 155–163.
  • Leskovec et al. [2010] J. Leskovec, D. Huttenlocher, J. Kleinberg, Predicting positive and negative links in online social networks, in: Proceedings of the 19th international conference on World Wide Web, ACM, 2010, pp. 641–650.
  • Adamic and Glance [2005] L. A. Adamic, N. Glance, The political blogosphere and the 2004 us election: divided they blog, in: Proceedings of the 3rd international workshop on Link discovery, ACM, 2005, pp. 36–43.
  • Kunegis [2013] J. Kunegis, Konect: the koblenz network collection, in: Proceedings of the 22nd International Conference on World Wide Web, ACM, 2013, pp. 1343–1350.