Interlayer link prediction in multiplex social networks: an iterative degree penalty algorithm

Rui Tang¹, Shuyu Jiang¹, Xingshu Chen^1,2∗, Haizhou Wang^1,2, Wenxian Wang², and Wei Wang^2∗ 1. College of Cybersecurity, Sichuan University, Chengdu 610065, China 2. Cybersecurity Research Institute, Sichuan University, Chengdu 610065, China

Abstract

Online social network (OSN) applications provide different experiences; for example, posting a short text on Twitter and sharing photographs on Instagram. Multiple OSNs constitute a multiplex network. For privacy protection and usage purposes, accounts belonging to the same user in different OSNs may have different usernames, photographs, and introductions. Interlayer link prediction in multiplex network aims at identifying whether the accounts in different OSNs belong to the same person, which can aid in tasks including cybercriminal behavior modeling and customer interest analysis. Many real-world OSNs exhibit a scale-free degree distribution; thus, neighbors with different degrees may exert different influences on the node matching degrees across different OSNs. We developed an iterative degree penalty (IDP) algorithm for interlayer link prediction in the multiplex network. First, we proposed a degree penalty principle that assigns a greater weight to a common matched neighbor with fewer connections. Second, we applied node adjacency matrix multiplication for efficiently obtaining the matching degree of all unmatched node pairs. Thereafter, we used the approved maximum value method to obtain the interlayer link prediction results from the matching degree matrix. Finally, the prediction results were inserted into the priori interlayer node pair set and the above processes were performed iteratively until all unmatched nodes in one layer were matched or all matching degrees of the unmatched node pairs were equal to 0. Experiments demonstrated that our advanced IDP algorithm significantly outperforms current network structure-based methods when the multiplex network average degree and node overlapping rate are low.

keywords:

Social networks , Multiplex network , Interlayer link prediction , Scale-free

1 Introduction

With rapid developments in Internet technology, Internet surfing has become increasingly convenient and efficient in recent years. Online social network (OSN) applications, such as Twitter, Facebook, Instagram, and LinkedIn, have rapidly been integrated into people’s everyday lives and have become major social communication tools. These multiple OSNs constitute a multiplex network, where each OSN can be represented as a layer within the multiplex network. In this multiplex network, nodes represent user accounts and intralayer links capture friendships or follower-followee relations. If two accounts in different OSNs belong to the same user, an interlayer link will exist between the nodes. The various OSN applications provide users with different functional experiences. For example, people generally use LinkedIn to follow work-related contents, post short text descriptions on Twitter to express their experiences at the time, and share their photographs on Instagram. Analyzing these OSNs has a significant impact on society, politics, economy, etc [1, 2, 3, 4].

In general, the OSN can be characterized by a complex network, in which nodes represent individuals and links capture the relationships between them [5]. Several researchers have conducted studies in this field relating to clustering [6], link prediction [7], information diffusion [1], and community detection [8, 9], among others. However, it is difficult to apply such a representation to describe multilayer structures such as multiple OSNs, multiple transportation networks [10] as well as the dynamic networks [7, 9]. The multilayer structures have a significant influence on the aspects of cascade [11, 12], propagation [13, 14, 15, 16], synchronization [17], and game [18, 19, 20]. In recent years, the multiplex network [11, 21, 22] has emerged to characterize these multilayer structures. It explicitly incorporates multiple connectivity channels into a system and provides a natural description for systems in which entities have a different set of neighbors in each layer [23].

A user may always be active in different OSNs using different accounts; in most cases, it is not known whether accounts across different OSNs belong to the same person. This means that most of the interlayer links are not given in advance. To identify whether accounts in different OSNs belong to the same person, the structure or feature information should be leveraged to predict the interlayer links, which is a challenging problem in network analysis in recent years [2]. It has been established that no OSN application can completely replace all other similar software at present. Therefore, online users generally register multiple accounts for using these OSNs. When one person registers his or her accounts in different OSNs, he or she may use different usernames, photographs, and introductions for the purpose of privacy protection or anonymity. In this manner, users can confidently use the Internet to chat, make friends, and share information on different social media, according to their personal preferences. However, anonymity also poses harms to society to a certain extent. For example, criminals may register large quantities of different accounts on OSN applications. They subsequently engage in illegal activities, such as spreading rumors, spreading viral links, and inducing financial fraud on these applications [6]. Given effective methods to determine the correspondence between accounts of the same user on different OSNs, we can establish the patterns of criminal violations of laws, model their online behavior, lock their geographical locations, and even determine their real identities, thereby effectively striking against them.

Moreover, numerous other benefits are offered when predicting the interlayer links in the multiplex network consisting of multiple OSNs. For example, business site owners can be aided in studying user behaviors, analyzing the interests of customers, and analyzing the factors that affect their decisions [24]. Furthermore, OSN users can be kept up to date with their virtual contacts from different OSNs in an integrated environment [25].

Three main approaches are available for interlayer link prediction in the multiplex social network: (i) feature-based interlayer link prediction, (ii) network-based interlayer link prediction, and (iii) a combination of multiple approaches. Among these, network-based methods have been applied extensively in recent years [26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41]. Its strategies for predicting interlayer links rely on examining the structures of social graphs on different OSN platforms [32].

In the process of using network structures for interlayer link prediction, in general, only the attributes of the nodes to be predicted are considered, along with the importance of the attributes of their neighbors, such as ignoring the neighbors’ degrees [36]. Many real-world OSNs exhibit a scale-free property. Their degree distribution follows a power law distribution, which is common knowledge in numerous real-world networks [42, 43]. The number of common matched neighbors and their degree attributes may have different effects on the matching degree of two nodes in different layers for the prediction of their interlayer link.

In this study, we developed an iterative degree penalty (IDP) algorithm for interlayer link prediction in the multiplex network. The major contributions of this paper can be summarized as follows:

1.

We propose a degree penalty principle to calculate the matching degree between the unmatched nodes across different layers in the multiplex network. In the real world, people’s friend relationship cycles are highly individual [36]. Therefore, with more common matched neighbors, a higher probability exists that two unmatched nodes across different layers have an interlayer link. However, common matched neighbors with fewer connections are assigned higher weights for the matching degree between two unmatched nodes. Our degree penalty principle uses the inverse log frequency of the common matched neighbors of two unmatched nodes to calculate the matching degree, thereby effectively addressing the above two situations.
2.

We develop an iterative algorithm to determine additional hidden interlayer links. We use node adjacency matrix multiplication for efficiently obtaining the matching degree of all unmatched node pairs. Thereafter, we use the approved maximum value method to obtain the interlayer link prediction results from the matching degree matrix. Finally, the prediction results are inserted into the priori interlayer node pair set, and the above processes are performed iteratively until all of the unmatched nodes in one layer are matched or all the matching degrees of the unmatched node pairs are equal to 0.
3.

We verify the effectiveness of our advanced IDP algorithm on both artificial scale-free and real-world networks. The results demonstrate that the IDP algorithm significantly outperforms the existing network structure-based method FRUI: the recall rate increases by a maximum of 36.6% and an average of 7.0% when the priori interlayer link rates are less than 10% on the real-world networks.

The remainder of this paper is organized as follows. Section 2 reviews related works. Section 3 describes the preliminaries and presents the problem definitions. Section 4 explains the proposed IDP algorithm. Section 5 presents the experimental results. Section 6 concludes the paper and provides future research directions.

2 Related Works

The problem of interlayer link prediction in a multiplex network constituted by multiple OSNs has attracted substantial research attention in the past decade [2]. Existing research efforts can be divided into three categories: feature-based prediction, network-based prediction, and a combination of multiple approaches.

2.1 Feature-based prediction

In several works, features have been extracted from profiles and contents. These studies have adopted machine learning techniques to predict the interlayer links across multiple OSNs [2]. Profile attributes include the username, gender, birthday, address, experience, and image, among others [44]. The username has been found to be the most important attribute in the profile and has been explored extensively. Zafarani and Liu [45] formalized the problem of mapping among identities across multiple websites, and conducted similar empirical tests on thousands of usernames across 12 social networks, thereby empirically validating several hypotheses. They further developed a supervised methodology for the same problem in Ref. [46], which extracted behavioral features of usernames based on the priori knowledge of linguistics and human behavior according to three aspects: human limitations, exogenous factors, and endogenous factors. Perito et al. [47] determined that a significant portion of the users can be linked by means of their usernames, and identified users with binary classifiers. Liu et al. [48] differentiated among users with the same usernames and proposed an unsupervised approach to match users. Among usernames, the recognition of photographs may provide a means of mapping users. Although usernames can be used to predict the interlayer links of the multiplex social network, it is difficult to use them in large-scale situations because some users may have the same username. Acquisti et al. [49] used publicly available photos for large-scale individual re-identification. However, users tend to publish different pieces of information in different OSNs [50]. Only a handful of users may put their personal photos on different OSNs. Therefore, using photos for identification is appropriate for these specific users.

Apart from usernames and images, several researchers have focused on considering various profile attributes to improve the prediction performance [51, 52, 53, 54, 55, 56]. A heuristic approach based on using the username, e-mail, and birth city attributes was proposed by Carmagnola et al. [51]. Iofciu et al. [52] divided profile information into two types: usernames and tags. They determined that users can be identified by analyzing their tagging, and the performance can be improved by combining tags and usernames. Mu et al. [56] explored the concept of latent user space to describe the user profiles in different social platforms, so that two individuals with greater similarities would have closer profiles in the latent user space. Similar areas were also studied in Refs. [53, 54, 55]. Leveraging more profile attributes could improve the prediction performance effectively, but profile attributes may contain information which are null or are not sufficiently strong to indicate user identity.

Content attributes for individuals can reveal their activities, such as posting, mentioning, and commenting in OSNs [2]. These properties can capture unique characteristics for the interlayer link prediction of multiplex OSNs. Zheng et al. [57] extracted four types of writing-style features (syntactic, lexical, structural, and content) to identify authorship. Goga et al. [58] exploited the geo-location, timestamp, and writing style of user posts to identify accounts on different OSNs. Zafarani and Liu [59] extracted trajectory-based content features to capture the unique footprints of user activities for linking the same user accounts across platforms. Nevertheless, the methods based on content attributes often face the problem of data sparsity, because the amount of location data or posted content of different users varies significantly. These approaches are usually confined to some specific OSN applications, which makes it harder to leveraged them on a general multiplex social network.

2.2 Network-based prediction

Personal data available on OSNs are usually anonymized owing to privacy concerns, with users often removing their profile and attribute information or replacing these with fake information [40]. Therefore, using network structures to predict the interlayer links has been the focus of substantial research. Narayanan and Shmatikov [60] assumed that a natural person usually has a similar social network in the virtual world. Based on this assumption, Zhou et al. [36] developed a network structure-based method known as the Friend Relationship-based User Identification (FRUI) algorithm. The FRUI algorithm identifies the same user across different OSNs by leveraging the shared friend cycle, and requires several priori matched pairs. They subsequently proposed an approach that did not require priori matched pairs in Ref. [33]. Interlayer link prediction in a multiplex network consisting of multiple OSNs formed the maximum common subgraph problem in Ref. [61], which maximized the number of standard links to obtain one-to-one mapping. The maximum common subgraph problem is an NP-hard problem, and is therefore hardly used in real scenarios. Tan et al. [41] leveraged a hypergraph to model user relationships and correlate accounts across different OSNs by projecting the manifolds of two networks onto a commonly embedded space. Moreover, neighborhood-based network features were used for account alignment in [62]. Zhang et al. [63] proposed a joint link fusion algorithm to predict the social links and anchor links of two OSNs simultaneously. This algorithm transferred information relating to social links from one network to another. Its time complexity is approximately $O(n^{3})$ . Numerous researchers have used network embedding [64], also known as representation learning [65], to predict interlayer links in recent years. Liu et al. [37] adopted a network embedding approach to align users across multiple directed OSNs. Their method embedded two OSNs into a common space to capture the social links of user accounts. Man et al. [38] proposed a supervised model to learn cross-layer mapping for interlayer link prediction, which leveraged network embedding to capture the underlying structural regularities. Zhou et al. [31] proposed an unsupervised algorithm for user identification, using network embedding and scalable nearest neighbors. A deep reinforcement learning-based framework was developed in Ref. [32], which could embed the global network structure, thereby achieving higher correlation accuracy than other network embedding methods. Similar studies can also be found in Refs. [26, 27, 28, 29, 31, 34, 35, 39]. Most of the network embedding-based methods implicitly assume that the input networks are complete without missing edges, and it is not easy to add user attribution into the representation vectors.

2.3 Combination of multiple approaches

Other relevant approaches have combined the feature-based and structure-based methods for interlayer link prediction; for example, searching the same username from the friend lists of seed users across Facebook and Twitter [66]. Nunes et al. [67] introduced a binary classification method to classify account pairs as belonging to the same person or not. The feature vectors of the classifier were constructed by profile information, descriptions of interests, and friend lists. Kong et al. [68] studied two fully aligned OSN datasets collected from Foursquare and Twitter to determine the correspondence between different user accounts. They extracted multiple features and formulated the correspondence problem as a stable matching problem. The network structure features were one type among multiple features. Zhang et al. [69] explored a probabilistic approach for mapping individuals, in which the user friend or connection count was used as a feature. Lu et al. [24] introduced a methodology to identify customers between customer accounts, e-commerce sites, and OSN applications. The network structure was used to extract user interest features.

In the majority of combination approaches, the problem of attempting to obtain user profile features for the sake of privacy protection exists. Our study focuses on structure-based prediction. Numerous real-world OSNs exhibit a scale-free degree distribution [43]. Common matched neighbors with different degrees may have different influences on the matching degrees of nodes across two OSNs. In this study, we developed an IDP algorithm to address this issue.

3 Preliminaries and problem

In this section, we introduce the conception of the multiplex network for representing multiple OSN applications, and describe the problem of interlayer link prediction on the multiplex network. The symbols and notations frequently used in this paper are displayed in Table 1. We use bold uppercase letters for matrices, bold lowercase letters for vectors, and lowercase letters for scalars.

Refer to caption — Figure 1: Example of representing relations across multiple OSNs using multiplex network. (a) Real scenario: There are two OSN applications. OSN_α has six user accounts, while OSN_β has 7. Each account exhibits certain connections with other accounts on the same OSN. In the red circle, three pairs of accounts belong to three users, and the correspondence among these accounts is revealed in advance. The task is to determine the correspondence of the other accounts. (b) Multiplex network: We use a multiplex network $\mathcal{M}$ to represent multiple OSNs. The two OSNs are denoted by layers $\alpha$ and $\beta$ . The user accounts are denoted by nodes $v^{\alpha}_{i}$ or $v^{\beta}_{j}$ . The relationships of the accounts connected in an OSN are denoted by interlayer links $e^{\alpha}_{ij}$ or $e^{\beta}_{ij}$ . The correspondence of two accounts across two OSNs is denoted by the interlayer link $e^{\alpha\beta}_{ij}$ . The task of determining the correspondence of accounts across OSNs becomes the prediction of interlayer links in the multiplex network.

3.1 Definitions

To distinguish the different OSN platforms, we represent the relationship between nodes in one OSN as the graph $G^{\alpha}(V^{\alpha},E^{\alpha})$ , where $V^{\alpha}$ and $E^{\alpha}\subseteq V^{\alpha}\times V^{\alpha}$ are the sets of nodes and edges of the graph $G^{\alpha}$ . The elements of $E^{\alpha}$ are referred to as intralayer links. The scenario of multiple OSNs is represented by the multiplex network $\mathcal{M}=(g,c)$ , where $g={\{G^{\alpha}|\alpha\in\{1,\cdots,m\}\}}$ is the set of graphs and $c={\{E^{\alpha\beta}\subseteq V^{\alpha}\times V^{\beta}|\alpha,\beta\in\{1,\cdots,m\},\alpha\neq\beta\}}$ is the set of connections between the nodes of graphs $G^{\alpha}$ and $G^{\beta}$ . If there exists an edge $e^{\alpha\beta}_{ij}$ between node $v^{\alpha}_{i}$ in graph $G^{\alpha}$ and node $v^{\beta}_{j}$ in graph $G^{\beta}$ , the accounts represented by the two nodes belong to the same user. The elements of $E^{\alpha\beta}$ are known as interlayer links. Each subnetwork in $g$ is referred to as a layer in multiplex network $\mathcal{M}$ . An illustration of multiplex networks is provided in Fig. 1.

Table 1: Symbols and notations

Symbol	Description
$\mathcal{M}$	The multiplex network.
$G$	A social network that is one layer of $\mathcal{M}$ .
$u,v$	Nodes in $\mathcal{M}$ .
$\alpha,\beta$	Layer indices of $\mathcal{M}$ .
$e^{\alpha},e^{\beta}$	Intralayer connections in $G^{\alpha}$ and $G^{\beta}$ .
$\bm{e}$ , $\bm{E}$	Intralayer connection vector and intralayer connection matrix, respectively.
$e^{\alpha\beta}$	Interlayer connection.
$i,j,a,b$	Node indices.
$n^{\alpha},n^{\beta}$	Number of nodes in $G^{\alpha}$ and $G^{\beta}$ .
$n$	Number of MINPs.
$\Gamma(v_{i})$	Neighbors set of node $v_{i}$ .
$r,\bm{R}$	Matching degree between two interlayer nodes and matching degree matrix for all interlayer nodes, respectively.
$k$	Degree of a node.
$\bm{d}$	Degree vector of all nodes in a layer.
$h$	Reciprocal of logarithm for a priori interlayer node degree
$\bm{h},\bm{H}$	Reciprocal vector of logarithm for a priori interlayer node degree and reciprocal matrix of logarithm for all priori interlayer node degrees.
$\delta$	Control parameter for selecting candidate matched interlayer nodes.
$\bm{A}$	Adjacency matrix of a layer.
$p$	The ratio of PINPs to INPs.
$s$	The percentage of remaining nodes when extracting subnetworks to construct the multiplex network.
$\Phi$	Set of PINPs.
$\varphi^{\alpha},\varphi^{\beta}$	Set of priori interlayer nodes in layers $\alpha$ and $\beta$ .
$\Psi$	Set of MINPs.
$\psi^{\alpha},\psi^{\beta}$	Set of matched interlayer nodes in layer $\alpha$ and $\beta$ .

Prior to introducing the details of our proposed method, we present the formal definitions of various important concepts in this paper.

Definition 1: Interlayer node pair (INP). Given a multiplex network $\mathcal{M}$ , if an interlayer link exists between a node $v^{\alpha}_{i}$ in layer $\alpha$ and a node $v^{\beta}_{j}$ in layer $\beta$ , we refer to the pair consisting of these two nodes as an INP, which is represented by INP $(v^{\alpha}_{i}$ , $v^{\beta}_{j})$ . Moreover, the nodes belonging to the INP are known as interlayer nodes.

Definition 2: Matched interlayer node pair (MINP). Given a multiplex network $\mathcal{M}$ , if a node pair consisting of node $v^{\alpha}_{i}$ in layer $\alpha$ and node $v^{\beta}_{j}$ in layer $\beta$ is matched as an INP, we refer to this pair as an MINP and denote it by MINP $(v^{\alpha}_{i}$ , $v^{\beta}_{j})$ . An MINP is not necessarily an INP, because the results calculated by the algorithm may be incorrect.

Definition 3: Priori interlayer node pair (PINP). Given a multiplex network $\mathcal{M}$ , if several INPs are provided in advance, we refer to these INPs as PINPs. Furthermore, the nodes belonging to the PINP are known as priori interlayer nodes.

Definition 4: Unmatched node pair (UNP). Given a multiplex network $\mathcal{M}$ , if a node $v^{\alpha}_{i}$ in layer $\alpha$ and a node $v^{\beta}_{j}$ in layer $\beta$ have not been matched, we refer to the pair consisting of these two nodes as a UNP, which is represented by UNP $(v^{\alpha}_{i}$ , $v^{\beta}_{j})$ .

Definition 5: Common matched neighbor (CMN). Given a PINP( $v^{\alpha}_{i}$ , $v^{\beta}_{j}$ ), if node $v^{\alpha}_{a}$ in layer $\alpha$ has an intralayer link with node $v^{\alpha}_{i}$ , and node $v^{\beta}_{b}$ in layer $\beta$ has an intralayer link with node $v^{\beta}_{j}$ , we can state that the PINP( $v^{\alpha}_{i}$ , $v^{\beta}_{j}$ ) is the CMN of nodes $v^{\alpha}_{a}$ and $v^{\beta}_{b}$ .

It is worth noting that the terms graph and layer, link and connection, and interlayer node pair and interlayer link are used interchangeably in this paper.

3.2 Problem statement

Supposing that we have a two-layer multiplex network $\mathcal{M}$ with a small set of priori interlayer links, the purpose of the interlayer link prediction problem is to predict which node pairs are most likely to have the interlayer links.

Given a UNP $(u^{\alpha}_{i},u^{\beta}_{j})$ in the multiplex network $\mathcal{M}$ , the objective function of the interlayer link prediction can be defined as follows:

f(u^{\alpha}_{i},u^{\beta}_{j})=J(r_{ij})=\left\{\begin{array}[]{ll}1,&\mbox{if $e^{\alpha\beta}_{ij}$ exist,}\\ 0,&\mbox{otherwise,}\end{array}\right.

(1)

where $r_{ij}$ represents the matching degree of the unmatched nodes $v^{\alpha}_{i}$ and $v^{\beta}_{j}$ . The interlayer link prediction problem is converted into the calculation of $r_{ij}$ and the definition of the objective function $J$ .

In a real scenario, certain people may possess two or more accounts in the same OSN application. For simplicity, we assume that these multiple accounts belong to different users. This means that the interlayer link prediction problem in this study is a one-to-one matching problem; that is, no two interlayer links share the same node.

4 IDP algorithm

In this section, we introduce the iterative interlayer link prediction algorithm, known as the IDP, into the multiplex network to predict the interlayer links based on only a small part of these. Firstly, we propose a degree penalty principle that assigns a greater weight to a CMN with fewer connections. Secondly, we propose node adjacency matrix multiplication for efficiently obtaining the matching degree of all of the UNPs. Thereafter, we use the approved maximum value method to obtain the interlayer link prediction results from the matching degree matrix. Finally, the prediction results are inserted into the PINP set, and the above processes are performed iteratively until all of the unmatched nodes in one layer are matched or all of the match degrees of the UNPs are equal to 0.

4.1 Degree penalty principle

Researchers have established that numerous natural and artificial networks exhibit certain common topological characteristics, such as small world, scale free, and core periphery [70]. The scale-free property is valuable in that the degree distribution of a scale-free network follows a power law. Zhou et al. analyzed four leading OSN sites (Delicious, Flickr, Twitter, and YouTube), and found that these sites all exhibit scale-free degree distributions [43]. This means that there exist large nodes with a small degree, as well as several nodes with a large degree. The CMNs with different degrees will have different weights for the UNP matching degrees. For example, if a person has only one friend and he or she follows an account $v^{\alpha}$ on OSN $G^{\alpha}$ and an account $v^{\beta}$ on OSN $G^{\beta}$ , it is highly likely that $v^{\alpha}$ and $v^{\beta}$ belong to the same person, namely his or her friend. In contrast, if this person has many friends, it is difficult to determine whether or not the accounts that he or she follows on different OSNs belong to the same person.

Based on the above statements, we propose the principle of the degree penalty of CMNs for calculating the matching degree of the unmatched nodes, which assigns more weights to smaller-degree CMNs. We define

r_{ij}=\sum_{\begin{subarray}{c}\forall(v^{\alpha}_{a},v^{\beta}_{b})\in\Phi,\\ v^{\alpha}_{a}\in\Gamma(u^{\alpha}_{i}),\\ v^{\beta}_{b}\in\Gamma(u^{\beta}_{j})\end{subarray}}\log^{-1}(k_{v^{\alpha}_{a}}+1)+\log^{-1}(k_{v^{\beta}_{b}}+1)

(2)

as the matching degree of UNP $(u^{\alpha}_{i},u^{\beta}_{j})$ . In Eq. (2), $\Phi$ represents the set of PINPs, $\Gamma(u^{\alpha}_{i})$ and $\Gamma(u^{\beta}_{j})$ represent the neighbor sets of nodes $u^{\alpha}_{i}$ and $u^{\beta}_{j}$ , respectively, $k$ represents the node degree, and the constraints in the equation indicate that the PINP $(v^{\alpha}_{a},v^{\beta}_{b})$ is the CMN of UNP $(u^{\alpha}_{i},u^{\beta}_{j})$ . It is noteworthy that, if $k_{v^{\alpha}_{a}}=1$ , $\log(k_{v^{\alpha}_{a}})$ will be equal to 0, and $\log^{-1}(k_{v^{\alpha}_{a}})$ becomes meaningless. To overcome this problem, we add 1 to each log by means of Laplace smoothing.

The degree penalty principle has the following characteristics:

(i) It can reflect the contribution of the number of CMNs to the matching degree. For any two unmatched nodes $u^{\alpha}_{i}$ and $u^{\beta}_{j}$ , a greater number of CMNs results in their matching degree $r_{ij}$ being larger.

(ii) It can reflect the influence of the degree of CMNs on the matching degree. For any CMNs, a smaller degree results in the weights of these CMNs making a greater contribution to the matching degree of the UNPs connected to them.

4.2 Calculation of matching degree

The matching degree of each UNP can be calculated by the degree penalty principle proposed above. Given all of the nodes, interlayer links, and priori interlayer links, the question is how to obtain all matching degrees of the UNPs efficiently. We propose an approach based on matrix operation, inspired by Ref. [71].

Given a multiplex network $\mathcal{M}$ , $n^{\alpha}$ and $n^{\beta}$ are denoted as the number of nodes in layers $\alpha$ and $\beta$ , respectively, while $n$ is the number of PINPs. The number of unmatched nodes in layer $\alpha$ can be represented by $n^{\alpha}-n$ , while the number of unmatched nodes in layer $\beta$ can be represented by $n^{\beta}-n$ .

For a node $v^{\alpha}_{a}\in\varphi^{\alpha}$ , where $\varphi^{\alpha}$ represents the set of priori interlayer nodes in layer $\alpha$ , the degree of $v^{\alpha}_{a}$ can be expressed as

k_{v^{\alpha}_{a}}=\sum_{b=1}^{n_{\alpha}}{e^{\alpha}_{ab}},

(3)

where $e^{\alpha}_{ab}$ is equal to 1 if an intralayer link exists between nodes $v^{\alpha}_{a}$ and $v^{\beta}_{b}$ , and 0 otherwise.

By $h^{\alpha}_{a}=\log^{-1}(k_{v^{\alpha}_{a}}+1)$ , we can rewrite Eq. (2) as

r_{ij}=\sum_{\begin{subarray}{c}\forall(v^{\alpha}_{a},v^{\beta}_{b})\in\Phi,\\ v^{\alpha}_{a}\in\Gamma(u^{\alpha}_{i}),\\ v^{\beta}_{b}\in\Gamma(u^{\beta}_{j})\end{subarray}}h^{\alpha}_{a}+h^{\beta}_{b}.

(4)

This equation will facilitate the subsequent introduction of matrix operations.

It is clear that, if PINP $(v^{\alpha}_{a},v^{\beta}_{b})$ is the CMN of UNP $(u^{\alpha}_{i},u^{\beta}_{j})$ , $e^{\alpha}_{ia}$ and $e^{\beta}_{jb}$ will be equal to 1; thus, $e^{\alpha}_{ia}\cdot h^{\alpha}_{a}\cdot e^{\beta}_{jb}=h^{\alpha}_{a}$ and $e^{\alpha}_{ia}\cdot h^{\beta}_{b}\cdot e^{\beta}_{jb}=h^{\beta}_{b}$ . In contrast, if PINP $(v^{\alpha}_{a},v^{\beta}_{b})$ is not the CMN of UNP $(u^{\alpha}_{i},u^{\beta}_{j})$ , $e^{\alpha}_{ia}$ or $e^{\beta}_{jb}$ will be equal to 0; thus, $e^{\alpha}_{ia}\cdot h^{\alpha}_{a}\cdot e^{\beta}_{jb}=0$ and $e^{\alpha}_{ia}\cdot h^{\beta}_{b}\cdot e^{\beta}_{jb}=0$ . Therefore, Eq. (4) can be replaced with

r_{ij}=\sum_{\forall(v^{\alpha}_{a},v^{\beta}_{b})\in\Phi}e^{\alpha}_{ia}\cdot h^{\alpha}_{a}\cdot e^{\beta}_{jb}+e^{\alpha}_{ia}\cdot h^{\beta}_{b}\cdot e^{\beta}_{jb}.

(5)

If node $v^{\alpha}_{a}$ is a matched interlayer node in layer $\alpha$ , a counterpart node must exist in layer $\beta$ , and vice versa. Naturally, it is possible to make the PINPs uniform, as follows: $(v^{\alpha}_{1}$ , $v^{\beta}_{1}),(v^{\alpha}_{2}$ , $v^{\beta}_{2}),\cdots,(v^{\alpha}_{n}$ , $v^{\beta}_{n})$ . Therefore, Eq. (5) can be rewritten as

r_{ij}=\sum_{a=1}^{n}h^{\alpha}_{a}\cdot e^{\alpha}_{ia}\cdot e^{\beta}_{aj}+e^{\alpha}_{ia}\cdot e^{\beta}_{aj}\cdot h^{\beta}_{a}.

(6)

Using the vector form, Eq. (6) can be represented as

\begin{array}[]{l}r_{ij}=[h^{\alpha}_{1}\cdot e^{\alpha}_{i1},h^{\alpha}_{2}\cdot e^{\alpha}_{i2},\cdots,h^{\alpha}_{n}\cdot e^{\alpha}_{in}]\cdot\left[\begin{array}[]{c}e^{\beta}_{1j}\\ e^{\beta}_{2j}\\ \vdots\\ e^{\beta}_{nj}\\ \end{array}\right]\\ +~{}[e^{\alpha}_{i1},e^{\alpha}_{i2},\cdots,e^{\alpha}_{in}]\cdot\left[\begin{array}[]{c}e^{\beta}_{1j}\cdot h^{\beta}_{1}\\ e^{\beta}_{2j}\cdot h^{\beta}_{2}\\ \vdots\\ e^{\beta}_{nj}\cdot h^{\beta}_{n}\\ \end{array}\right].\end{array}

(7)

It is known that the Hadamard product of $A\equiv[a_{ij}]$ and $B\equiv[b_{ij}]$ with the same dimensions is the matrix $A\circ B=[a_{ij}b_{ij}]$ . Therefore, $[h^{\alpha}_{1}\cdot e^{\alpha}_{i1},h^{\alpha}_{2}\cdot e^{\alpha}_{i2},\cdots,h^{\alpha}_{n}\cdot e^{\alpha}_{in}]$ can be represented as $[h^{\alpha}_{1},h^{\alpha}_{2},\cdots,h^{\alpha}_{n}]\circ[e^{\alpha}_{i1},e^{\alpha}_{i2},\cdots,e^{\alpha}_{in}]$ . Denoting $\bm{h}^{\alpha}_{i}=[h^{\alpha}_{1},h^{\alpha}_{2},\cdots,h^{\alpha}_{n}]^{T}$ , $\bm{h}^{\beta}_{j}=[h^{\beta}_{1},h^{\beta}_{2},\cdots,h^{\beta}_{n}]^{T}$ , $\bm{e}^{\alpha}_{i}=[e^{\alpha}_{i1},e^{\alpha}_{i2},\cdots,e^{\alpha}_{in}]^{T}$ , $\bm{e}^{\beta}_{j}=[e^{\beta}_{1j},e^{\beta}_{2j},\cdots,e^{\beta}_{nj}]^{T}$ , where $(\cdot)^{T}$ is the transposition of $(\cdot)$ , Eq. (7) can be rewritten as

r_{ij}=((\bm{h}^{\alpha}_{i})^{T}\circ(\bm{e}^{\alpha}_{i})^{T})\cdot\bm{e}^{\beta}_{j}+(\bm{e}^{\alpha}_{i})^{T}\cdot(\bm{e}^{\beta}_{j}\circ\bm{h}^{\beta}_{j}).

(8)

Using Eq. (8), the matching degree of UNP $(u^{\alpha}_{i},u^{\beta}_{j})$ is represented by the form of the vector operation. Then, we can express the matching degree of all UNPs in the matrix operation form. In a similar manner, we denote $\bm{H}^{\alpha}=[\bm{h}^{\alpha}_{1},\quad\bm{h}^{\alpha}_{2},\quad\cdots,\quad\bm{h}^{\alpha}_{n^{\alpha}-n}]$ , $\bm{E}^{\alpha}=[\bm{e}^{\alpha}_{1},\bm{e}^{\alpha}_{2},\cdots,\bm{e}^{\alpha}_{n^{\alpha}-n}]$ , $\bm{H}^{\beta}=[\bm{h}^{\beta}_{1},\bm{h}^{\beta}_{2},\cdots,\bm{h}^{\beta}_{n^{\beta}-n}]$ , $\bm{E}^{\beta}=[\bm{e}^{\beta}_{1},\bm{e}^{\beta}_{2},\cdots,\bm{e}^{\beta}_{n^{\beta}-n}]$ . The matching degree of all UNPs can be calculated as follows:

\bm{R}=([\bm{H}^{\alpha}]^{T}\circ[\bm{E}^{\alpha}]^{T})\cdot\bm{E}^{\beta}+[\bm{E}^{\alpha}]^{T}\cdot(\bm{E}^{\beta}\circ\bm{H}^{\beta}).

(9)

It is worth noting that $\bm{H}^{\alpha}$ consists of $n^{\alpha}-n$ copies of $\bm{h}^{\alpha}_{i}$ , because $\bm{h}^{\alpha}_{1}=\bm{h}^{\alpha}_{2}=\cdots=\bm{h}^{\alpha}_{n}=[h^{\alpha}_{1},h^{\alpha}_{2},\cdots,h^{\alpha}_{n}]^{T}$ . Similarly, $\bm{H}^{\beta}$ consists of $n^{\beta}-n$ copies of $\bm{h}^{\beta}_{j}$ . Moreover, $\bm{E}^{\alpha}$ and $\bm{E}^{\beta}$ are the submatrices of the adjacency matrix of $G^{\alpha}$ and $G^{\beta}$ , respectively. Each row of the submatrix represents one of the matched interlayer nodes, while each column represents one of the unmatched nodes.

Based on the above statements, the four matrices $\bm{H}^{\alpha}$ , $\bm{E}^{\alpha}$ , $\bm{E}^{\beta}$ , and $\bm{H}^{\beta}$ can be obtained. The matching degree of all unmatched nodes between the different layers of the multiplex network $\mathcal{M}$ can be calculated efficiently.

4.3 Selecting matched interlayer node pairs

After obtaining the values of all elements of the matching degree matrix $\bm{R}$ , we need to determine which UNPs can be selected as MINPs. This means that we should formulate the objective function $J$ . This function clarifies that a node pair can be selected as an MINP when its matching degree satisfies certain conditions. A larger value of the matching degree $r_{ij}$ indicates a higher probability that the node pair ( $u^{\alpha}_{i}$ , $u^{\beta}_{j}$ ) is the MINP. Therefore, we define the objective function as:

\begin{array}[]{c}J(r_{ij})=\mathds{1}(r_{ij}\geq\delta\cdot\max(\bm{R})),\\ s.t.~{}u^{\alpha}_{i}\notin\psi^{\alpha},u^{\beta}_{j}\notin\psi^{\beta}\\ \end{array},

(10)

where (i) $\mathds{1}(\cdot)$ is an indicator function that takes 1 if the condition inside the parenthesis is true, and zero otherwise; (ii) $\max(\cdot)$ is the maximum function that takes the maximum value inside the parenthesis; (iii) $\psi^{\alpha}$ and $\psi^{\beta}$ are the sets of matched interlayer nodes in layers $\alpha$ and $\beta$ , respectively; and (iv) $\delta$ is a control parameter that takes a value from 0 to 1. When $\delta=1$ , only the UNPs with the maximum value of the matching degree can be selected as MINPs. When $\delta\in(0,1)$ , the UNPs with a matching degree greater than $\delta\cdot\max(\bm{R})$ can be selected as MINPs. Moreover, when $\delta=0$ , all of the UNPs can be selected as MINPs. It is obvious that a smaller value of $\delta$ means that additional UNPs are selected as MINPs, and hence, the accuracy is lower. In contrast, a larger value of $\delta$ indicates that less UNPs are selected as MINPs, and hence, the efficiency is lower. The constraints in Eq. (10) are used to ensure that each unmatched node is matched only once.

4.4 Achieving additional matched interlayer node pairs iteratively

In the previous steps, the degree penalty principle and matrix operation are leveraged to calculate the matching degree matrix $\bm{R}$ for all unmatched nodes among the different layers in the multiplex network $\mathcal{M}$ and the node pairs with matching degrees greater than $\delta$ times $\max(\bm{R})$ are selected as MINPs. By performing these steps once, only one or several UNPs can be selected as MINPs. Therefore, we propose an iterative strategy, known as the IDP algorithm, to achieve additional MINPs.

Denoting $\Psi$ as the set of MINPs, we can add elements of $\Psi$ to $\Phi$ . Moreover, we execute the steps introduced above again to identify more MINPs. This strategy can be executed iteratively until all of the unmatched nodes in one layer are matched or all of the matching degrees of the UNPs are equal to 0.

The details of our suggested IDP algorithm for interlayer link prediction are presented in Algorithm 1. In each iteration, the time complexity of the algorithm mainly depends on the calculation of the matching degree matrix $\bm{R}$ of Algorithm 1, which is matrix multiplication. Provided $n^{\alpha}-n=n^{\beta}-n=m$ , the running time of the matrix multiplication is $nm^{2}/q$ , where $q$ is the number of compute nodes [72]. For the whole algorithm, provided that $z$ is the average number of selected matched interlayer nodes per iteration, the time complexity of IDP is $O(nm^{3}/zp)$ . Moreover, an illustration of the procedure of the IDP algorithm is presented in Fig. 2.

Algorithm 1. IDP
Input:Nodes and connections of $G^{\alpha}$ and $G^{\beta}$ in multiplex network $\mathcal{M}$ , PINPs, parameter $\delta$ .
Output:MINPs
1: function IDP( $G^{\alpha}$ , $G^{\beta}$ , PINPs, $\delta$ )
2: $\bm{A}^{\alpha}\leftarrow$ adjacency matrix of $G^{\alpha}$ , $\bm{A}^{\beta}\leftarrow$ adjacency matrix of $G^{\beta}$
3: $\bm{d}^{\alpha}\leftarrow$ degree of nodes in $G^{\alpha}$ , $\bm{d}^{\beta}\leftarrow$ degree of nodes in $G^{\beta}$
4: $n^{\alpha}$ =size( $\bm{A}^{\alpha}$ ), $n^{\beta}$ =size( $\bm{A}^{\beta}$ ), $n$ =size(PINPs)
5: matchNode^α=[], unmatchNode^α=[], matchNode^β=[], unmatchNode^β=[], MINPs=[]
6: while $n<n^{\alpha}$ and $n<n^{\beta}$ do
7: j=1,k=1
8: foreach i in $n^{\alpha}$ do
9: if i is in PINPs do
10: matchNode^α[j]=i, j++
11: else do
12: unmatchNode^α[k]=i, k++
13: j=1,k=1
14: foreach i in $n^{\beta}$ do
15: if i is in PINPs do
16: matchNode^β[j]=i, j++
17: else do
18: unmatchNode^β[k]=i, k++
19: $\bm{E}^{\alpha}$ =submatrix( $\bm{A}^{\alpha}$ ,matchNode^α,unmatchNode^α)
20: $\bm{E}^{\beta}$ =submatrix( $\bm{A}^{\beta}$ ,matchNode^β,unmatchNode^β)
21: $\bm{h}^{\alpha}_{1}$ =1./log(submatrix( $\bm{d}^{\alpha}$ ,matchNode^α,1).+1)
22: $\bm{h}^{\beta}_{1}$ =1./log(submatrix( $\bm{d}^{\beta}$ ,matchNode^β,1).+1)
23: $\bm{H}^{\alpha}=[\bm{h}^{\alpha}_{1},\bm{h}^{\alpha}_{1},\cdots,\bm{h}^{\alpha}_{1}]_{n\times(n^{\alpha}-n)}$
24: $\bm{H}^{\beta}=[\bm{h}^{\beta}_{1},\bm{h}^{\beta}_{1},\cdots,\bm{h}^{\beta}_{1}]_{n\times(n^{\beta}-n)}$
25: $\bm{R}=((\bm{H}^{\alpha})^{T}\circ(\bm{E}^{\alpha})^{T})\cdot\bm{E}^{\beta}+[\bm{E}^{\alpha}]^{T}\cdot(\bm{E}^{\beta}\circ\bm{H}^{\beta})$
26: $\max=0$
27: foreach i in $n^{\alpha}-n$ do
28: foreach j in $n^{\beta}-n$ do
29: if $\bm{R}$ [i][j] $>\max$ do
30: $\max=\bm{R}$ [i][j]
31: foreach i in $n^{\alpha}-n$ do
32: foreach j in $n^{\beta}-n$ do
33: if $\bm{R}$ [i][j] $>=\delta\cdot\max$ and unmatchNode^α[j] not in MINPs do
34: MINPs $\leftarrow$ (unmatchNode^α[i],unmatchNode^β[j])
35: PINPs $\leftarrow$ (unmatchNode^α[i],unmatchNode^β[j])
36: $n$ =size(PINPs)
37: return MINPs
38: function submatrix( $\bm{A}$ ,rows,cols)
39: $\bm{Q}$ $\leftarrow$ Extracts the rows and columns of the matrix $\bm{A}$ according to the number in the arrays of rows and cols.
40: return $\bm{Q}$

5 Experiments

In this section, we firstly describe the experimental settings. Thereafter, the evaluation metrics are introduced. Finally, we present the experimental results of the baseline and IDP algorithms on artificial scale-free networks and real-world networks.

Table 2: Time complexity of baselines and IDP

Methods	Time Complexity
INOE	$O(KD(\|E^{\alpha}\|+\|E^{\beta}\|))$
NS	$O(m(\|E^{\alpha}\|+\|E^{\beta}\|)d^{\alpha}d^{\beta})$
FRUI	$O(mnd^{\alpha}d^{\beta})$
IDP	$O(nm^{3}/zp)$

5.1 Experimental settings

We verify the effectiveness of our proposed IDP algorithm on both artificial scale-free and real-world networks. The steps for constructing the artificial scale-free multiplex networks are as follows:

(i) We create a Barabási-Albett (BA) [73] network, the degree distribution of which follows a power law distribution, according to the generation step proposed in Ref. [73]. This is an original network for the following steps. (ii) Using $G^{org}(V^{org},E^{org})$ to represent the original BA network and $s$ to represent the percentage of remaining nodes, we construct two networks, $G^{\alpha}=G^{org}$ and $G^{\beta}=G^{org}$ . (iii) For a node $v^{\alpha}_{i}$ in $G^{\alpha}$ , we generate a random value $a_{1}$ with a uniform distribution in [0,1]. If $a_{1}\geq s$ , the node $v^{\alpha}_{i}$ is discarded, as are all of the intralayer links connected with node $v^{\alpha}_{i}$ . Otherwise, node $v^{\alpha}_{i}$ is preserved in $G^{\alpha}$ . (iv) Similarly, for a node $v^{\beta}_{i}$ in $G^{\beta}$ , we generate a random value $a_{2}$ with a uniform distribution in [0,1]. If $a_{2}\geq s$ , the node $v^{\beta}_{i}$ is discarded, as are all of the intralayer links connected with node $v^{\beta}_{i}$ . Otherwise, node $v^{\beta}_{i}$ is preserved in $G^{\beta}$ . (v) We determine whether to add an interlayer link between $v^{\alpha}_{i}$ and $v^{\beta}_{i}$ . If $a_{1}<s$ and $a_{2}<s$ simultaneously, the counterpart nodes of node $v^{org}_{i}$ , namely $v^{\alpha}_{i}$ and $v^{\beta}_{i}$ , remain in layers $\alpha$ and $\beta$ . Therefore, we add an interlayer link between $v^{\alpha}_{i}$ and $v^{\beta}_{i}$ . Otherwise, we do nothing. (vi) Steps (iii) to (v) are repeated until all of the nodes in $G^{\alpha}$ and $G^{\beta}$ are traversed.

The real-world datasets contain eight networks downloaded from the websites Stanford Large Network Dataset Collection [74], Link Prediction Group [75], and Network Analysis of Advogato [76]. In a real scenario, OSN application users are often wary of their privacy. Therefore, it is difficult to obtain complete ground truth. To address this problem, we perform experiments on self-matching real-world networks[77]. The solution for obtaining the self-matching real-world multiplex network is the same as that for the BA multiplex network described above.

After constructing the multiplex networks, we use the FRUI [36], NS [78], and INOE [37] as the baselines for the experiments, where FRUI is the closest to the IDP algorithm for the interlayer link prediction as a state of the art. Each experiment is repeated 500 times. For the sake of reducing computational time in each experiment, the adjacency matrices constructed by the matched nodes and unmatched nodes of layer $\alpha$ and $\beta$ in the $t$ th iteration will join to $\bm{E}^{\alpha}$ and $\bm{E}^{\beta}$ , respectively. This could avoid reconstructing $\bm{E}^{\alpha}$ and $\bm{E}^{\beta}$ in $(t+1)$ th iteration.

The time complexity of the baseline and IDP algorithms is presented in Table 2. In particular, $K$ is the negative sampling number, $D$ is the representation dimension, $|E^{\alpha}|$ and $|E^{\beta}|$ are the number of intralayer links in layer $\alpha$ and $\beta$ , respectively. $d^{\alpha}$ and $d^{\beta}$ are the maximal degrees of nodes in layer $\alpha$ and $\beta$ , respectively.

5.2 Evaluation metrics

We employ the recall, precision, and F1 [2] as the metrics for evaluating the performance of the FRUI and IDP algorithms, which are widely used in information retrieval, machine learning, and data mining, among others. The recall can be formulated as

Recall=\frac{TP}{TP+FN},

(11)

where TP and FN indicate the number of true positives and false negatives, respectively [48]. In this study, the recall reflects the ratio of INPs that the algorithm correctly predicts to the number of INPs that need to be predicted. The precision can be formulated as

Precision=\frac{TP}{TP+FP},

(12)

where FP indicates the number of false positives [48]. In this study, the precision reflects the ratio of INPs that the algorithm correctly predicts to all of the results predicted by the algorithm. $F1$ is the harmonic mean between the precision and recall [79]. It can be formulated as

F1=\frac{2\cdot Recall\cdot Precision}{Recall+Precision},

(13)

where the range for $F1$ is [0,1]. For these three metrics, a higher value indicates superior performance of the algorithm performance.

5.3 Results on artificial networks

In this subsection, we outline the manner in which to determine an optimal value of $\delta$ , and demonstrate the comparison results of the baseline and IDP algorithms on different average node degrees, node overlaps, and network sizes.

5.3.1 Determining optimal $\delta$

A possible solution for obtaining an optimum $\delta$ is to determine it experimentally. We set the default values of $\delta$ to increase from 0.1 to 1 by 0.1 and execute the IDP algorithm. In these experiments, the BA network parameter $m$ , which represents the number of edges for attaching a new node to existing nodes, is equal to 10, the percentage of remaining nodes $s$ is equal to 0.5, the network sizes $N$ of the original BA networks are equal to 2,000, 4,000, 6,000, 8,000, and 10,000, respectively, and the ratios of PINPs to INPs $p$ are equal to $0.01,~{}0.02,~{}\cdots,~{}0.1$ , respectively.

Table 3 displays the average F1 rates of these experiments. It can be observed that the IDP algorithm exhibits the best performance at $\delta=0.5$ when $N$ is equal to 2,000. When $N$ is equal to $4,000,~{}6,000,~{}8,000,and~{}10,000$ , the IDP algorithm exhibits superior performance at $\delta=0.5,~{}0.7,~{}0.7$ , and $0.7$ , respectively. Overall, the IDP algorithm performs the best when $\delta$ is equal to 0.7, at which the average F1 rate is equal to 0.3023. Therefore, we determine that the practical value of the parameter $\delta$ is equal to 0.7.

Table 3: Performance of IDP algorithm on different

\delta

Metric	$N$	$\delta$
Metric	$N$	0.1	0.2	0.3	0.4	0.5	0.6	0.7	0.8	0.9	1.0
$F_{1}$	2,000	0.1029	0.1113	0.1355	0.1435	0.1445	0.1389	0.1311	0.1372	0.1299	0.1015
	4,000	0.1786	0.2043	0.2303	0.2462	0.2548	0.2574	0.2546	0.2559	0.2508	0.2390
	6,000	0.2199	0.2515	0.2881	0.3073	0.3176	0.3287	0.3331	0.3294	0.3344	0.3227
	8,000	0.2445	0.2868	0.3207	0.3398	0.3505	0.3665	0.3711	0.3674	0.3727	0.3603
	10,000	0.2786	0.3207	0.3554	0.3771	0.3998	0.4133	0.4217	0.4180	0.4221	0.4151
	Average	0.2049	0.2349	0.2660	0.2828	0.2934	0.3010	0.3023	0.3016	0.3020	0.2877

5.3.2 Effects of average degree

We evaluate the performance of the baseline and IDP algorithms at different average degrees. We set $N=2,000$ , $s=0.5$ , $p$ increasing from 0.01 to 0.1 by 0.01, and $m=5,~{}10,~{}15$ . Figures 3(a) to (c) display the recall, precision, and F1 rates of the baseline and IDP algorithms under the above experimental settings. The following observations can be made. The IDP algorithm outperforms the baseline algorithms. This is because the IDP algorithm uses the degree penalty principle to calculate the matching degree of two unmatched nodes across different layers. This principle can reflect the contribution of the number of CMNs and influence of the degree of CMNs on the matching degree, while the FRUI algorithm only reflects the contribution of the number of CMNs. The NS algorithm depends heavily on the degree of the nodes in layer $\beta$ . It limits the contribution of CMNs; hence, the performance is not as good as FRUI. The INOE algorithm uses network embedding to predict the interlayer links, which needs more PINPs to train the model to show its advantage. For a given $m$ , the recall, precision, and F1 rates of the baseline and IDP algorithms increase with $p$ . This is because a larger $p$ means that additional CMNs can be used to calculate the matching degree of UNPs. For a given $p$ , the recall, precision, and F1 rates of the baseline and IDP algorithms increase with $m$ , as the nodes with a low degree are difficult to match. A smaller $m$ indicates more low-degree nodes. It is noteworthy that some lines in Figs. 3(a) to (c) are overlapped. We have listed their recall rate in Table 4.

Table 4: Recall rate of overlapped lines in Figs. 3(a) to (c)

Method	$m$	$p$
Method	$m$	0.01	0.02	0.03	0.04	0.05	0.06	0.07	0.08	0.09	0.1
INOE	15	0.0079	0.0108	0.0147	0.0200	0.0273	0.0365	0.0481	0.0610	0.0764	0.0950
INOE	10	0.0074	0.0106	0.0153	0.0215	0.0289	0.0383	0.0489	0.0608	0.0752	0.0912
INOE	5	0.0084	0.0123	0.0176	0.0240	0.0307	0.0386	0.0475	0.0574	0.0680	0.0795
NS	5	0.0037	0.0071	0.0113	0.0174	0.0260	0.0338	0.0414	0.0521	0.0633	0.0718
FRUI	5	0.0038	0.0077	0.0134	0.0213	0.0277	0.0385	0.0489	0.0594	0.0713	0.0843
IDP	5	0.0022	0.0060	0.0117	0.0192	0.0293	0.0382	0.0509	0.0627	0.0743	0.0866

Figures 3(d) to (f) illustrate the percentages of improvement in the recall, precision, and F1 rates for the IDP algorithm compared to the best baseline, FRUI, algorithm under the same experimental settings as those in Figs. 3(a) to (c). The recall rate increases by a maximum of 20.8% and an average of 6.0%. The precision rate increases by a maximum of 10.5% and an average of 3.1%. The F1 rate increases by a maximum of 14% and an average of 4.1%. A larger $m$ results in greater overall improvement percentages. This is because, with a larger $m$ , each unmatched node or matched interlayer node has more interlayer links. For a UMP, more CMNs may be involved in the calculation of its matching degree. Thus, the advantage of the degree penalty principle becomes more obvious. When $m=15$ , the percentages of improvement exhibit a trend of first increasing and then decreasing with an increase in $p$ . This trend is caused by the following factors. (i) When $p$ is very small, the number of PINPs used to calculate the matching degree is small. The ability of both the FRUI and IDP algorithms is poor. Thus, the percentages of improvement are not obvious. (ii) With an increase in $p$ , the number of PINPs increases. Owing to the use of the degree penalty principle to calculate the matching degree, the IDP algorithm can use more useful information of the CMNs. A greater number of PINPs results in higher percentages of improvement. (iii) When the percentages of improvement reach a maximum value, the matching advantage offered by increasing the PINPs in the IDP algorithm decreases. Thus, the percentages of improvement gradually decrease until they disappear. When $m=5$ or $m=10$ , the percentages of improvement only exhibit an increasing trend. This trend is consistent with the first half when $m=15$ . When $p$ is greater than 0.1 and the percentages of improvement reach the maximum value, they will decrease as $p$ increases.

5.3.3 Effects of node overlaps

We use the Jaccard coefficient to measure the node overlaps, as follows:

O(G^{\alpha},G^{\beta})=\frac{|V^{\alpha}\bigcap V^{\beta}|}{|V^{\alpha}\bigcup V^{\beta}|}.

(14)

When $s=0.4,~{}0.5,~{}0.6$ , the node overlapping rate is approximately equal to 0.25, 0.33, and 0.43, respectively, according to Eq. (14).

We set $N=2000$ , $m=10$ , $p$ increasing from 0.01 to 0.1 by 0.01, and $s=0.4,~{}0.5,~{}0.6$ to execute the experiments for the evaluation of the performance of the FRUI and IDP algorithms with different node overlaps. Figures 4(a) to (c) illustrate the recall, precision, and F1 rates of the FRUI and IDP algorithms under the above experimental settings. The following observations can be made.

The IDP algorithm outperforms the FRUI algorithm. For a given $s$ , the recall, precision, and F1 rates of the FRUI and IDP algorithms increase with $p$ . The reasons are the same as those in Figs. 3(a) to (c). For a given $p$ , the recall, precision, and F1 rates of the FRUI and IDP algorithms increase with $s$ . This is because a higher overlapping rate results in a higher proportion of interlayer nodes in all nodes, and a lower probability of incorrect matching; hence, superior performance of the algorithms. For a UMP, additional interlayer links mean that more CMNs may be involved in the calculation of its matching degree. Thus, the advantage of the degree penalty principle becomes more obvious.

Figures 4(d) to (f) display the percentages of improvement under the same experimental settings as those in Figs. 4(a) to (c). The recall rate increases by a maximum of 17.5% and an average of 4.6%. The precision rate increases by a maximum of 10.9% and an average of 2.7%. The F1 rate increases by a maximum of 13.4% and an average of 3.4%. A larger $s$ results in greater overall percentages of improvement. This is because a larger $s$ indicates a greater number of INPs and PINPs. For a UMP, additional CMNs may be involved in the calculation of its matching degree; thus, the advantage of the degree penalty principle is more obvious. When $s=0.6$ , the percentages of improvement exhibit a trend of first increasing and then decreasing with an increase in $p$ . Moreover, when $s=0.4$ or $s=0.5$ , the percentages of improvement exhibit a trend of increasing with an increase in $p$ . The reasons are the same as those in Figs. 3(d) to (f).

5.3.4 Effects of network size

We set $s=0.5$ , $m=10$ , $p$ increasing from 0.01 to 0.1 by 0.01, and $N=$ 2,000, 4,000, 6,000, 8,000, and 10,000 to execute the experiments for the evaluation of the performance of the FRUI and IDP algorithms with different network sizes. Figures 5(a) to (c) illustrate the recall, precision, and F1 rates of the FRUI and IDP algorithms under the above experimental settings. The following observations can be made.

The IDP algorithm outperforms the FRUI algorithm. For a given $N$ , the recall, precision, and F1 rates of the FRUI and IDP algorithms increase with $p$ . The reasons are the same as those in Figs. 3(a) to (c). For a given $p$ , the recall, precision, and F1 rates of the FRUI and IDP algorithms increase with $N$ . This is because, when $N$ is larger, more PINPs are used to calculate the matching degree of all UNPs; hence, the performance of the algorithms is improved. When $N=10,000$ and $p\geq 0.07$ , the performance of both the FRUI and IDP algorithms decreases slightly with the increase in $p$ , because the number of interlayer links that can be predicted correctly has a maximum. When the maximum number is reached, the increase in $p$ will lead to a slight decrease in the number of interlayer links to be predicted and that are correctly predicted.

Figures 5(d) to (f) display the percentages of improvement under the same experimental settings as those in Figs. 5(a) to (c). The recall rate increases by a maximum of 23.3% and an average of 8.2%. The precision rate increases by a maximum of 12.2% and an average of 4.5%. The F1 rate increases by a maximum of 16.1% and by an average of 5.8%. A larger $N$ results in greater average percentages of improvement for the IDP algorithm compared to the FRUI algorithm. Because the BA networks exhibit the characteristics of preferential attachment, new vertices attach preferentially to vertices that are already well connected [73]. A larger $N$ means that the influence of the degree of CMNs on the matching degree is more sensitive; hence, the percentages of improvement are increased. When $N=4,000,~{}6,000,~{}8,000$ or 10,000, the percentages of improvement exhibit a trend of first increasing and then decreasing with an increase in $p$ . Moreover, when $N=2000$ , the percentages of improvement exhibit a trend of increasing with an increase in $p$ . The reasons are the same as those in Figs. 3(d) to (f).

5.4 Results on real-world networks

Table 5: Statistical characteristics of eight real-world networks, including network size (

N

), number of edges (

E

), maximum degree (

k_{max}

), first (

\langle{k}\rangle

) and second moments (

\langle{k^{2}}\rangle

) of degree distribution, degree-degree correlations (

r

), and clustering (

c

No.	Networks	Statistical characteristics of networks
No.	Networks	$N$	$E$	$k_{max}$	$\langle{k}\rangle$	$\langle{k^{2}}\rangle$	$r$	$c$
Real-1	Email-Eu-core[80]	1005	16064	345	32.58	2386.6	$-0.026$	0.45
Real-2	UC Irvine messages[81]	1899	13838	255	14.57	810.7	$-0.188$	0.14
Real-3	Wikipedia vote[82]	7115	100762	1065	28.32	3530.5	$-0.083$	0.21
Real-4	Twitter[63]	5120	130576	1725	51.01	11679.7	$-0.214$	0.30
Real-5	Political blogs[83]	1222	16714	351	27.36	2223	$-0.221$	0.36
Real-6	Hamsterster friendships[84]	1788	12476	272	13.96	635.6	$-0.089$	0.17
Real-7	Hamsterster full[84]	2000	16098	273	16.1	704.7	0.023	0.57
Real-8	Foursquare[63]	5313	54233	552	20.42	1436.1	$-0.193$	0.23

We use eight real-world networks to construct self-matching real-world multiplex networks, and evaluate the FRUI and IDP algorithms on these multiplex networks. The statistical characteristics of the eight real-world networks are presented in Table 5. Formally, we represent each of the eight real-world networks as an undirected graph. In these experiments, $s=0.5$ and $p$ increases from 0.01 to 0.1 by 0.01.

Figures 6(a) to (f) display the recall, precision, and F1 rates of the FRUI and IDP algorithms of the eight real-world networks. The following observations can be made. The IDP algorithm outperforms the FRUI algorithm. For a given real-world network, the recall, precision, and F1 rates of the FRUI and IDP algorithms increase with $p$ . The reasons are the same as those in Figs. 3(a) to (c). For a given $p$ , different real-world networks exhibit varying performances, because the network sizes, average degrees, and clustering coefficients of the real-world networks differ.

Figures 6(g) to (l) illustrate the percentages of improvement under the same experimental settings as those in Figs. 6(a) to (f). The following observations can be made. The recall rate increases by a maximum of 36.6% and an average of 7.0%. The precision rate increases by a maximum of 19.0% and an average of 3.8%. The F1 rate increases by a maximum of 25.0% and an average of 5.0%. On real-world network 2, 3, 4, and 8, the percentages of improvement exhibit a trend of first increasing and then decreasing with an increase in $p$ . On real-world networks 1, 5, 6, and 7, the percentages of improvement exhibit a trend of first increasing and then decreasing with an increase in $p$ . The reasons are the same as those in Figs. 3(d) to (f).

6 Conclusion

In this study, we have investigated the problem of interlayer link prediction in the multiplex network. We solved this problem by leveraging network structure attributes. We used a degree penalty principle to calculate the matching degree of two unmatched nodes across different layers, which could reflect the influence of the number of CMNs and their degree attributes. For the sake of efficiency, we adopted node adjacency matrix multiplication to obtain the matching degrees of all UNPs. Moreover, we developed an iterative algorithm to determine additional hidden interlayer links. Experiments on both artificial and real-world networks demonstrated that our advanced IDP algorithm outperformed the FRUI algorithm.

In summary, the IDP algorithm offers comparative advantages in multiplex networks with a degree distribution that follows a power law distribution. When users in multiple OSN applications input different usernames and other attribute information owing to privacy or anonymity, our proposed algorithm can effectively associate the accounts belonging to the same person across different OSNs. Such an association can aid in achieving the information fusion of multiple OSN platforms, understanding the online behavior of network users, constructing integral user profiles, and providing services for e-commerce, cyber security, and recommendation systems, among others. As future work, we will investigate the manner in which to use higher-order structures, such as communicating within a group or participating in the same online activity, to improve the performance of the interlayer link prediction.

7 Acknowledgments

This work was supported by the National Science and Technology Support Program of China (Grant No. 2012BAH18B05) and the National Natural Science Foundation of China (Grant Nos. 81602935, 81773548, 61802270, 61802271).

References

Shi et al. [2017] C. Shi, Y. Li, J. Zhang, Y. Sun, P. S. Yu, A survey of heterogeneous information network analysis, IEEE Transactions on Knowledge and Data Engineering 29 (2017) 17–37.
Shu et al. [2017] K. Shu, S. Wang, J. Tang, R. Zafarani, H. Liu, User identity linkage across online social networks: A review, ACM SIGKDD Explorations Newsletter 18 (2017) 5–17.
Fujita et al. [2019] H. Fujita, A. Gaeta, V. Loia, F. Orciuoli, Hypotheses analysis and assessment in counter-terrorism activities: a method based on owa and fuzzy probabilistic rough sets, IEEE Transactions on Fuzzy Systems (2019).
Capuano et al. [2017] N. Capuano, F. Chiclana, H. Fujita, E. Herrera-Viedma, V. Loia, Fuzzy group decision making with incomplete information guided by social influence, IEEE Transactions on Fuzzy Systems 26 (2017) 1704–1718.
Newman [2010] M. Newman, Networks: an introduction, Oxford university press, New York, NY, USA, 2010.
Han et al. [2017] X. Han, L. Wang, C. Cui, J. Ma, S. Zhang, Linking multiple online identities in criminal investigations: A spectral co-clustering framework, IEEE Transactions on Information Forensics and Security 12 (2017) 2242–2255.
Chi et al. [2019] K. Chi, G. Yin, Y. Dong, H. Dong, Link prediction in dynamic networks based on the attraction force between nodes, Knowledge-Based Systems (2019).
Yin et al. [2017] G. Yin, K. Chi, Y. Dong, H. Dong, An approach of community evolution based on gravitational relationship refactoring in dynamic networks, Physics Letters A 381 (2017) 1349–1355.
Zhao et al. [2019] Z. Zhao, C. Li, X. Zhang, F. Chiclana, E. H. Viedma, An incremental method to detect communities in dynamic evolving social networks, Knowledge-Based Systems 163 (2019) 404–415.
Boccaletti et al. [2014] S. Boccaletti, G. Bianconi, R. Criado, C. del Genio, J. Gómez-Gardeñes, M. Romance, I. Sendiña-Nadal, Z. Wang, M. Zanin, The structure and dynamics of multilayer networks, Physics Reports 544 (2014) 1–122.
Gao et al. [2011a] J. Gao, S. V. Buldyrev, H. E. Stanley, S. Havlin, Networks formed from interdependent networks, Nature Physics 8 (2011a) 40–48.
Gao et al. [2011b] J. Gao, S. V. Buldyrev, S. Havlin, H. E. Stanley, Robustness of a network of networks, Physical Review Letters 107 (2011b).
Granell et al. [2013] C. Granell, S. Gómez, A. Arenas, Dynamical interplay between awareness and epidemic spreading in multiplex networks, Physical Review Letters 111 (2013) 128701.
Wang et al. [2014] W. Wang, M. Tang, H. Yang, Y. Do, Y.-C. Lai, G. Lee, Asymmetrically interacting spreading dynamics on complex layered networks, Scientific Reports 4 (2014) 5097.
Wang et al. [2016] W. Wang, Q.-H. Liu, S.-M. Cai, M. Tang, L. A. Braunstein, H. E. Stanley, Suppressing disease spreading by using information diffusion on multiplex networks, Scientific Reports 6 (2016) 29259.
Wang et al. [2019] W. Wang, Q.-H. Liu, J. Liang, Y. Hu, T. Zhou, Coevolution spreading in complex networks, arXiv preprint arXiv:1901.02125 (2019).
Zhang et al. [2015] X. Zhang, S. Boccaletti, S. Guan, Z. Liu, Explosive synchronization in adaptive and multilayer networks, Physical Review Letters 114 (2015) 038701.
Wang et al. [2013a] Z. Wang, A. Szolnoki, M. Perc, Optimal interdependence between networks for the evolution of cooperation, Scientific Reports 3 (2013a) 2470.
Wang et al. [2013b] Z. Wang, A. Szolnoki, M. Perc, Interdependent network reciprocity in evolutionary games, Scientific Reports 3 (2013b) 1183.
Wang et al. [2014] Z. Wang, A. Szolnoki, M. Perc, Self-organization towards optimally interdependent networks by means of coevolution, New Journal of Physics 16 (2014) 033041.
Kivela et al. [2014] M. Kivela, A. Arenas, M. Barthelemy, J. P. Gleeson, Y. Moreno, M. A. Porter, Multilayer networks, Journal of Complex Networks 2 (2014) 203–271.
Cozzo et al. [2018] E. Cozzo, G. F. de Arruda, F. A. Rodrigues, Y. Moreno, Multiplex Networks: Basic Formalism and Structural Properties, Springer-Verlag GmbH, Cham, Switzerland, pp. 3–5.
Domenico et al. [2013] M. D. Domenico, A. Solé-Ribalta, E. Cozzo, M. Kivelä, Y. Moreno, M. A. Porter, S. Gómez, A. Arenas, Mathematical formulation of multilayer networks, Physical Review X 3 (2013).
Lu et al. [2014] C.-T. Lu, H.-H. Shuai, P. S. Yu, Identifying your customers in social networks, in: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, ACM, 2014, pp. 391–400.
Vosecky et al. [2009] J. Vosecky, D. Hong, V. Y. Shen, User identification across multiple social networks, in: 2009 first international conference on networked digital technologies, IEEE, 2009, pp. 360–365.
Wang et al. [2019] S. Wang, X. Li, Y. Ye, S. Feng, R. Lau, X. Huang, X. Du, Anchor link prediction across attributed networks via network embedding, Entropy 21 (2019) 254.
Zhang et al. [2019] S. Zhang, H. Tong, R. Maciejewski, T. Eliassi-Rad, Multilevel network alignment, in: Proceedings of the 28nd International Conference on World Wide Web, ACM Press, San Francisco, CA, USA, 2019.
Chu et al. [2019] X. Chu, X. Fan, D. Yao, Z. Zhu, J. Huang, J. Bi, Cross-network embedding for multi-network alignment, in: Proceedings of the 28nd International Conference on World Wide Web, ACM Press, San Francisco, CA, USA, 2019.
Wang et al. [2019] Y. Wang, H. Shen, J. Gao, X. Cheng, Learning binary hash codes for fast anchor link retrieval across networks, in: Proceedings of the 28nd International Conference on World Wide Web, ACM Press, San Francisco, CA, USA, 2019.
Zhan et al. [2018] Q. Zhan, J. Zhang, P. S. Yu, Integrated anchor and social link predictions across multiple social networks, Knowledge and Information Systems (2018).
Zhou et al. [2018a] X. Zhou, X. Liang, J. Zhao, A. Zhiyuli, H. Zhang, An unsupervised user identification algorithm using network embedding and scalable nearest neighbour, Cluster Computing (2018a).
Zhou et al. [2018b] F. Zhou, L. Liu, K. Zhang, G. Trajcevski, J. Wu, T. Zhong, Deeplink: A deep learning approach for user identity linkage, in: IEEE INFOCOM 2018-IEEE Conference on Computer Communications, IEEE, 2018b, pp. 1313–1321.
Zhou et al. [2018c] X. Zhou, X. Liang, X. Du, J. Zhao, Structure based user identification across social networks, IEEE Transactions on Knowledge and Data Engineering 30 (2018c) 1178–1191.
Zhang et al. [2018] J. Zhang, B. Chen, X. Wang, H. Chen, C. Li, F. Jin, G. Song, Y. Zhang, Mego2vec: Embedding matched ego networks for user alignment across social networks, in: Proceedings of the 27th ACM International Conference on Information and Knowledge Management (CIKM), ACM, 2018, pp. 327–336.
Zhang et al. [2017] S. Zhang, H. Tong, J. Tang, J. Xu, W. Fan, ineat: Incomplete network alignment, in: 17th IEEE International Conference on Data Mining (ICDM), IEEE, New Orleans, LA, USA, 2017, pp. 1189–1194.
Zhou et al. [2016] X. Zhou, X. Liang, H. Zhang, Y. Ma, Cross-platform identification of anonymous identical users in multiple social media networks, IEEE Transactions on Knowledge and Data Engineering 28 (2016) 411–424.
Liu et al. [2016] L. Liu, W. K. Cheung, X. Li, L. Liao, Aligning users across social networks using network embedding., in: Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI), pp. 1774–1780.
Man et al. [2016] T. Man, H. Shen, S. Liu, X. Jin, X. Cheng, Predict anchor links across social networks via an embedding approach, in: 25th International Joint Conference on Artificial Intelligence (IJCAI), volume 16, New York, USA, pp. 1823–1829.
Zhang and Tong [2016] S. Zhang, H. Tong, Final: Fast attributed network alignment, in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD '16, ACM Press, San Francisco, CA, USA, 2016.
Zhang and Philip [2015] J. Zhang, S. Y. Philip, Multiple anonymized social networks alignment, in: 2015 IEEE International Conference on Data Mining, IEEE, 2015, pp. 599–608.
Tan et al. [2014] S. Tan, Z. Guan, D. Cai, X. Qin, J. Bu, C. Chen, Mapping users across networks by manifold alignment on hypergraph, in: 28th AAAI Conference on Artificial Intelligence, Québec City, Québec, Canada, pp. 159–165.
Feng et al. [2018] R. Feng, Y. Yang, W. Hu, F. Wu, Y. Zhang, Representation learning for scale-free networks, in: Thirty-Second AAAI Conference on Artificial Intelligence.
Zhou et al. [2011] T. Zhou, M. Medo, G. Cimini, Z.-K. Zhang, Y.-C. Zhang, Emergence of scale-free leadership structure in social recommender systems, PLoS ONE 6 (2011) e20648.
Zhao et al. [2018] D. Zhao, N. Zheng, M. Xu, X. Yang, J. Xu, An improved user identification method across social networks via tagging behaviors, in: 30th International Conference on Tools with Artificial Intelligence (ICTAI), IEEE, 2018, pp. 616–622.
Zafarani and Liu [2009] R. Zafarani, H. Liu, Connecting corresponding identities across communities, in: Third International AAAI Conference on Weblogs and Social Media.
Zafarani and Liu [2013] R. Zafarani, H. Liu, Connecting users across social media sites: a behavioral-modeling approach, in: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, 2013, pp. 41–49.
Perito et al. [2011] D. Perito, C. Castelluccia, M. A. Kaafar, P. Manils, How unique and traceable are usernames?, in: 11st International Symposium on Privacy Enhancing Technologies Symposium, Springer, 2011, pp. 1–17.
Liu et al. [2013] J. Liu, F. Zhang, X. Song, Y.-I. Song, C.-Y. Lin, H.-W. Hon, What’s in a name? an unsupervised approach to link users across communities, in: 6th ACM International Conference on Web Search and Data Mining (WSDM), ACM, Rome, Italy, 2013, pp. 495–504.
Acquisti et al. [2014] A. Acquisti, R. Gross, F. D. Stutzman, Face recognition and privacy in the age of augmented reality, Journal of Privacy and Confidentiality 6 (2014) 1–20.
Labitzke et al. [2011] S. Labitzke, I. Taranu, H. Hartenstein, What your friends tell others about you: Low cost linkability of social network profiles, in: Proc. 5th International ACM Workshop on Social Network Mining and Analysis, San Diego, CA, USA, pp. 1065–1070.
Carmagnola and Cena [2009] F. Carmagnola, F. Cena, User identification for cross-system personalisation, Information Sciences 179 (2009) 16–32.
Iofciu et al. [2011] T. Iofciu, P. Fankhauser, F. Abel, K. Bischoff, Identifying users across social tagging systems, in: 5th International AAAI Conference on Weblogs and Social Media.
Abel et al. [2013] F. Abel, E. Herder, G.-J. Houben, N. Henze, D. Krause, Cross-system user modeling and personalization on the social web, User Modeling and User-Adapted Interaction 23 (2013) 169–209.
Goga et al. [2013] O. Goga, D. Perito, H. Lei, R. Teixeira, R. Sommer, Large-scale correlation of accounts across social networks, University of California at Berkeley, Berkeley, California, Tech. Rep. TR-13-002 (2013).
Cortis et al. [2013] K. Cortis, S. Scerri, I. Rivera, S. Handschuh, An ontology-based technique for online profile resolution, in: International Conference on Social Informatics, Springer, 2013, pp. 284–298.
Mu et al. [2016] X. Mu, F. Zhu, E.-P. Lim, J. Xiao, J. Wang, Z.-H. Zhou, User identity linkage by latent user space modelling, in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2016, pp. 1775–1784.
Zheng et al. [2006] R. Zheng, J. Li, H. Chen, Z. Huang, A framework for authorship identification of online messages: Writing-style features and classification techniques, Journal of the American Society for Information Science and Technology 57 (2006) 378–393.
Goga et al. [2013] O. Goga, H. Lei, S. H. K. Parthasarathi, G. Friedland, R. Sommer, R. Teixeira, Exploiting innocuous activity for correlating users across sites, in: Proceedings of the 22nd international conference on World Wide Web, ACM, 2013, pp. 447–458.
Riederer et al. [2016] C. Riederer, Y. Kim, A. Chaintreau, N. Korula, S. Lattanzi, Linking users across domains with location data: Theory and validation, in: Proceedings of the 25th International Conference on World Wide Web, International World Wide Web Conferences Steering Committee, 2016, pp. 707–719.
Narayanan and Shmatikov [2010] A. Narayanan, V. Shmatikov, Myths and fallacies of ”personally identifiable information”, Communications of the ACM 53 (2010) 24–26.
Zhu et al. [2012] Y. Zhu, L. Qin, J. X. Yu, Y. Ke, X. Lin, High efficiency and quality: large graphs matching, The International Journal on Very Large Data Bases 22 (2012) 345–368.
Zhang et al. [2015] Y. Zhang, J. Tang, Z. Yang, J. Pei, P. S. Yu, Cosnet: Connecting heterogeneous social networks with local and global consistency, in: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2015, pp. 1485–1494.
Zhang and Philip [2015] J. Zhang, S. Y. Philip, Integrated anchor and social link predictions across social networks, in: Proceedings of the 24th International Joint Conference on Artificial Intelligence (IJCAI), Buenos Aires, Argentina.
Tang et al. [2015] J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, Q. Mei, Line: Large-scale information network embedding, in: Proceedings of the 24th international conference on world wide web, International World Wide Web Conferences Steering Committee, 2015, pp. 1067–1077.
Perozzi et al. [2014] B. Perozzi, R. Al-Rfou, S. Skiena, Deepwalk: Online learning of social representations, in: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, 2014, pp. 701–710.
Jain et al. [2013] P. Jain, P. Kumaraguru, A. Joshi, @i seek 'fb.me': Identifying users across multiple online social networks, in: Proceedings of the 22nd International Conference on World Wide Web - WWW '13 Companion, ACM Press, 2013.
Nunes et al. [2012] A. Nunes, P. Calado, B. Martins, Resolving user identities over social networks through supervised learning and rich similarity features, in: Proceedings of the 27th Annual ACM symposium on applied computing, ACM, 2012, pp. 728–729.
Kong et al. [2013] X. Kong, J. Zhang, P. S. Yu, Inferring anchor links across multiple heterogeneous social networks, in: Proceedings of the 22nd ACM international conference on Information & Knowledge Management, ACM, 2013, pp. 179–188.
Zhang et al. [2014] H. Zhang, M.-Y. Kan, Y. Liu, S. Ma, Online social network profile linkage, in: Asia Information Retrieval Symposium, Springer, 2014, pp. 197–208.
Newman [2003] M. E. J. Newman, The structure and function of complex networks, SIAM Review 45 (2003) 167–256.
Katz [1953] L. Katz, A new status index derived from sociometric analysis, Psychometrika 18 (1953) 39–43.
Lee et al. [1997] J. S. Lee, S.-Y. Park, P. B. Berra, S. Ranka, I/o and memory-efficient matrix multiplication with user-controllable parallel i/o, in: Proceedings 1997 International Conference on Parallel and Distributed Systems, IEEE, pp. 59–66.
Barabási and Albert [1999] A.-L. Barabási, R. Albert, Emergence of scaling in random networks, Science 286 (1999) 509–512.
Leskovec and Krevl [2020] J. Leskovec, A. Krevl, SNAP Datasets: Stanford large network dataset collection, http://snap.stanford.edu/data, 2020.
lin [2020] Link prediction group, http://www.linkprediction.org/, 2020.
kon [2020] Network analysis of advogato, http://konect.uni-koblenz.de/networks/, 2020.
Kong et al. [2016] C. Kong, M. Gao, C. Xu, W. Qian, A. Zhou, Entity matching across multiple heterogeneous data sources, in: 21st International Conference on Database Systems for Advanced Applications (DASFAA), Springer, Dallas, Texas, USA, 2016, pp. 133–146.
Narayanan and Shmatikov [2009] A. Narayanan, V. Shmatikov, De-anonymizing social networks, in: Proceedings of the 2009 30th IEEE Symposium on Security and Privacy, IEEE Computer Society, pp. 173–187.
HRIPCSAK and ROTHSCHILD [2005] G. HRIPCSAK, A. S. ROTHSCHILD, Agreement, the f-measure, and reliability in information retrieval, Journal of the American Medical Informatics Association 12 (2005) 296–298.
Yin et al. [2017] H. Yin, A. R. Benson, J. Leskovec, D. F. Gleich, Local higher-order graph clustering, in: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2017, pp. 555–564.
Opsahl and Panzarasa [2009] T. Opsahl, P. Panzarasa, Clustering in weighted networks, Social networks 31 (2009) 155–163.
Leskovec et al. [2010] J. Leskovec, D. Huttenlocher, J. Kleinberg, Predicting positive and negative links in online social networks, in: Proceedings of the 19th international conference on World Wide Web, ACM, 2010, pp. 641–650.
Adamic and Glance [2005] L. A. Adamic, N. Glance, The political blogosphere and the 2004 us election: divided they blog, in: Proceedings of the 3rd international workshop on Link discovery, ACM, 2005, pp. 36–43.
Kunegis [2013] J. Kunegis, Konect: the koblenz network collection, in: Proceedings of the 22nd International Conference on World Wide Web, ACM, 2013, pp. 1343–1350.