This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Hiding in Temporal Networks

Marcin Waniek New York University Abu Dhabi, Abu Dhabi, UAE Petter Holme Tokyo Institute of Technology, Tokyo, Japan Talal Rahwan New York University Abu Dhabi, Abu Dhabi, UAE
Abstract

Social network analysis tools can infer various attributes just by scrutinizing one’s connections. Several researchers have studied the problem faced by an evader whose goal is to strategically rewire their social connections in order to mislead such tools, thereby concealing their private attributes. However, to date this literature has only considered static networks, while neglecting the more general case of temporal networks, where the structure evolves over time. Driven by this observation, we study how the evader can conceal their importance from an adversary armed with temporal centrality measures. We consider computational and structural aspects of this problem: Is it computationally feasible to calculate optimal ways of hiding? If it is, what network characteristics facilitate hiding? This topic has been studied in static networks, but in this work, we add realism to the problem by considering temporal networks of edges changing in time. We find that it is usually computationally infeasible to find the optimal way of hiding. On the other hand, by manipulating one’s contacts, one could add a surprising amount of privacy. Compared to static networks, temporal networks offer more strategies for this type of manipulation and are thus, to some extent, easier to hide in.

1 Introduction

The increasing sophistication and ubiquity of computer-based invigilation tools is a persistent threat to the privacy of the general public. The increasing number of privacy-related scandals, such as Cambridge Analytica using data of millions of Facebook users for political agendas [18], demonstrates just how vulnerable our private information is in the age of social media. Network data, from social media in particular, can be used to uncover sensitive information, such as sexual orientation [20], relationship status [12], or political views [32].

Due to this vulnerability of network data, a number of privacy-preservation solutions have been proposed, both in terms of legislature [9] and algorithmic solutions [25, 21]. Most of the literature puts the responsibility of protecting the system users’ privacy in the hands of a centralized authority [13, 22, 10], but such an authority might be prone to error and negligence. A different body of literature proposes methods that can be applied by members of the social network to protect their own privacy, without having to rely on any central entities [56, 30].

Nevertheless, so far, the development of privacy-protection schemes for network data has focused on networks that are static, i.e., whose structure do not change over time. At the same time, network science researchers are starting to shift their attention to temporal networks—the more general case where the structure is allowed to change [16]. In many domains that involve dynamic changes, momentary contacts, and processes unfolding over time, the added complexity of a temporal-network approach can be justified by an added predictive and explanatory power [16]. Temporal networks already found application in such varied areas as communication [8, 3], microbiology [35, 36], and face-to-face interactions [41]. Nevertheless, so far no privacy-protection solutions have been proposed for temporal networks.

Motivated by the relevance of the growing privacy threats and the increasing popularity of temporal networks, in this work we examine how a member of a temporal network can avoid being detected by temporal centrality measures. We set out to investigate the issue from both theoretical and empirical standpoints. As for the theoretical analysis, we evaluate the computational complexity of the problem faced by an evader who wishes to rewire the network in order to obscure their central position in it. We consider both the decision version of the problem—if it is possible to find an optimal solution in polynomial time—as well as the optimization version—if it is possible to identify a solution that is guaranteed to be within a certain bound from optimum. As for the empirical analysis, we consider several real-life temporal network datasets and investigate how effectively the top nodes in the temporal centrality rankings can conceal their importance. We not only observe the results of the hiding process but also identify, using regression analysis, the properties of the nodes that allow them to conceal their importance effectively. Altogether, our study is the first analysis of strategic evasion of social network analysis tools in temporal networks.

2 Related Work

In recent years there has been a growing interest in temporal networks, where the network structure is not static, but instead changes with the passage of time [16]. Temporal networks found particularly relevant applications in epidemics, where they have been used to predict the infection’s reproduction number [15] and to construct static graphs based on temporal contact data [14]. In the context of our work, an essential class of tools for the analysis of temporal networks are centrality measures [23]. The approach to their design varies greatly, ranging from the analysis of network flows [42] and shortest temporal paths [34], to the applications of eigenvector-like techniques [43, 26, 44].

The topic of avoiding detection by social network analysis tools recently received some attention in the literature devoted to static networks. The greatest amount of attention is focused on hiding from centrality measures, either by lowering the node’s centrality, either in absolute terms [51] or in relative terms [50, 5]. Other works provide an axiomatic characterization of centrality measures that are resilient to being fooled [54], analyze the possible strategies of an adversary who is aware of the existence of nodes that want to hide themselves [52], or consider the problem of hiding from centrality measures in multilayer networks [49]. Other hiding problems considered in the literature include preventing the identification of closely-cooperating groups of nodes by community detection algorithms [51], avoiding the detection of private relationships by link prediction algorithms [53, 57], and investigating the possibility of concealing the source of network diffusion from source detection algorithms [48]. All of these works consider only static networks.

3 Preliminaries

3.1 Temporal Networks

Throughout the article, we will let T\langle T\rangle denote a time interval of TT discrete time steps, i.e., T={0,,T1}\langle T\rangle=\{0,\ldots,T-1\}. We will sometimes refer to a particular tTt\in\langle T\rangle as the moment tt. Let us denote by G=(V,K,T)𝔾G=(V,K,T)\in\mathbb{G} a temporal network, where VV is the set of nn nodes, KV×V×TK\subseteq V\times V\times\langle T\rangle is the set of contacts, and TT is the duration of the time interval during which the contacts in KK take place. We denote a contact (sometimes also called a temporal edge) between nodes vv and ww at time tt by (v,w,t)(v,w,t). In this work we only consider undirected temporal networks, i.e., we do not discern between contacts (v,w,t)(v,w,t) and (w,v,t)(w,v,t). Moreover, we assume that networks do not contain self-contacts, i.e., vVtT(v,v,t)K\forall_{v\in V}\forall_{t\in\langle T\rangle}(v,v,t)\notin K. We denote all contacts of a given node vv by KG(v)K_{G}(v). Finally, for KV×V×TK^{\prime}\subseteq V\times V\times\langle T\rangle we denote by GKG\cup K^{\prime} the effect of adding the set of contacts KK^{\prime} to GG, i.e., GK=(V,KK,T)G\cup K^{\prime}=(V,K\cup K^{\prime},T).

A time-respecting path in a temporal network G=(V,K,T)G=(V,K,T) is an ordered sequence of distinct contacts from KK, c1,,ck\langle c_{1},\ldots,c_{k}\rangle, in which for every two consecutive contacts (v,w,t)(v,w,t) and (v,w,t)(v^{\prime},w^{\prime},t^{\prime}) we have that w=vw=v^{\prime} and t>tt^{\prime}>t. The duration of the path is the time difference between the first and the last contact in the path, i.e., the duration of a path (v,w,t),,(v,w,t)\langle(v,w,t),\ldots,(v^{\prime},w^{\prime},t^{\prime})\rangle is ttt^{\prime}-t. Let ΠG(v,w)\Pi_{G}(v,w) denote the set of temporal paths from vv to ww with the minimal duration. We say that we can reach node ww from node vv at time tt if there exists a time-respecting path from vv to ww that starts at time greater than or equal to tt. The latency between vv and ww at time tt is the shortest time it takes to reach ww from vv starting at time tt along time-respecting paths, we denote it by λG(v,w,t)\lambda_{G}(v,w,t). If no such time-respecting path exists then λG(v,w,t)=\lambda_{G}(v,w,t)=\infty. We denote average latency between vv and ww by λ(v,w)\lambda^{*}(v,w). Notice that average latency is the area under the latency plot, divided by the length of the time interval T1T-1. Following Pan and Saramäki [34] we assume a pair-specific temporal boundary condition, where for every pair of nodes, the first time-respecting path between them is repeated after the end of the observed time interval when computing the average latency for that pair. Hence, if there exists at least one time-respecting path between the pair of nodes, then the average latency between them is finite. An example of a latency plot used to compute the average latency is presented in Figure 1.

Refer to caption
Figure 1: An example of temporal network and a latency plot. A. A temporal network with times of each contact denoted as number next to edges. B. A plot of latency between v1v_{1} and v4v_{4} for under assumption that the duration of time interval is T=7T=7. The gray area under plot is proportional to average latency between v1v_{1} and v4v_{4}. Notice that the latency in time interval (6,7](6,7] is finite due to the assumption about a pair-specific temporal boundary condition [34].

To make the notation more readable, we will often omit the temporal network itself from the notation whenever it is clear from the context, e.g., by writing λ(v,w,t)\lambda(v,w,t) instead of λG(v,w,t)\lambda_{G}(v,w,t). This applies not only to the notation presented thus far, but rather to all notation in this article.

3.2 Temporal Centrality Measures

Centrality measures quantify the importance of a given node in a network. The concept has also been extended to temporal networks [23]. In this work we consider the following temporal centrality measures:

  • Degree temporal centrality [23]—importance of a node vv corresponds to its number of contacts.

    cD((V,K,T),v)=tT|{wV:(v,w,t)K}|(n1)T.c_{D}\!\left(\left(V,\!K,\!T\right)\!,\!v\right)=\frac{\sum_{t\in\langle T\rangle}|\{w\!\in\!V\!:\!(v,w,t)\!\in\!K\}|}{(n-1)T}.
  • Closeness temporal centrality [34]—importance of a node vv corresponds to its average latency to other nodes.

    cC((V,K,T),v)=1n1wV:vw1λ(v,w).c_{C}\!\left(\left(V,\!K,\!T\right)\!,\!v\right)=\frac{1}{n-1}\sum_{w\in V:v\neq w}\frac{1}{\lambda^{*}(v,w)}.
  • Betweenneess temporal centrality [42]—importance of a node vv corresponds to the percentage of shortest temporal paths between pairs of other nodes passing through vv.

    cB((V,K,T),v)=\displaystyle c_{B}\!\left(\left(V,\!K,\!T\right)\!,\!v\right)= 1(n1)(n2)T\displaystyle\frac{1}{(n-1)(n-2)T}
    u,wV{v}:uwt=0T1|Π~tv(u,w)||Π(u,w)|\displaystyle\sum_{\begin{subarray}{c}u,w\in V\setminus\{v\}:\\ u\neq w\end{subarray}}\sum_{t=0}^{T-1}\frac{|\widetilde{\Pi}^{v}_{t}(u,\!w)|}{|\Pi(u,\!w)|}

    where Π~tv(u,w)\widetilde{\Pi}^{v}_{t}(u,w) is the set of shortest temporal paths from uu to ww such that vv belongs to the path and either contacts between vv and its predecessor and successor take place at moment tt, or contact between vv and its predecessor takes place at or before moment tt, while the contact between vv and its successor takes place after moment tt:

    Π~tv(u,w)=\displaystyle\widetilde{\Pi}^{v}_{t}(u,\!w)= {πΠ(u,w):(v,v,t)π(v,v′′,t′′)π\displaystyle\{\pi\!\in\!\Pi(u,\!w):(v^{\prime}\!\!,\!v,\!t^{\prime})\!\in\!\pi\land(v,\!v^{\prime\prime}\!\!,\!t^{\prime\prime})\!\in\!\pi
    (t=t=t′′tt<t′′)}.\displaystyle\land(t^{\prime}=t=t^{\prime\prime}\lor t^{\prime}\leq t<t^{\prime\prime})\}.
  • Eigenvector temporal centrality [26]—importance of a node vv corresponds to the importance of its neighbors. The temporal version of the eigenvector centrality was defined in a number of ways [43, 17, 55, 45]. In our work we use algorithm by Lv et al. [26], as it allows to efficiently process even relatively large networks.

4 Theoretical Analysis

We now present the formal definition of the computational problems faced by the evader, starting with finding an optimal way of hiding within a certain budget.

Definition 1 (Temporal Hiding).

An instance of the Temporal Hiding problem is defined by a tuple (G,v,A^,R^,b,c,ω)(G,v^{\dagger},\widehat{A},\widehat{R},b,c,\omega), where G=(V,K,T)G=(V,K,T) is a temporal network, vv^{\dagger} is the evader, A^V×V×T\widehat{A}\subseteq V\times V\times\langle T\rangle is the set of contacts allowed to be added, R^K\widehat{R}\subseteq K is the set of contacts allowed to be removed, bb\in\mathbb{N} is a budget specifying the maximum number of contacts that can be added or removed, c:𝔾×Vc\colon\mathbb{G}\times V\rightarrow\mathbb{R} is a temporal centrality measure, and ω\omega\in\mathbb{N} is a chosen safety threshold. The goal is then to identify two sets, AA^A^{*}\subseteq\widehat{A} and RR^R^{*}\subseteq\widehat{R}, such that |A|+|R|b|A^{*}|+|R^{*}|\leq b and:

|{wV:c(G,w)>c(G,v)}|ω\left|\left\{w\in V:c(G^{*},w)>c(G^{*},v^{\dagger})\right\}\right|\geq\omega

where G=(V,(KA)R,T)G^{*}=\left(V,(K\cup A^{*})\setminus R^{*},T\right).

We also present an approximation version of the problem, where the goal of the evader is to satisfy a certain safety threshold using as few network modifications as possible.

Definition 2 (Minimum Temporal Hiding).

An instance of the Minimum Temporal Hiding problem is defined by a tuple (G,v,A^,R^,c,ω)(G,v^{\dagger},\widehat{A},\widehat{R},c,\omega), where G=(V,K,T)G=(V,K,T) is a temporal network, vv^{\dagger} is the evader, A^V×V×T\widehat{A}\subseteq V\times V\times\langle T\rangle is the set of contacts allowed to be added, R^K\widehat{R}\subseteq K is the set of contacts allowed to be removed, c:𝔾×Vc\colon\mathbb{G}\times V\rightarrow\mathbb{R} is a temporal centrality measure, and ω\omega\in\mathbb{N} is a chosen safety threshold. The goal is then to identify two sets, AA^A^{*}\subseteq\widehat{A} and RR^R^{*}\subseteq\widehat{R}, such that the sum of their sizes |A|+|R||A^{*}|+|R^{*}| is as small as possible and and:

|{wV:c(G,w)>c(G,v)}|ω\left|\left\{w\in V:c(G^{*},w)>c(G^{*},v^{\dagger})\right\}\right|\geq\omega

where G=(V,(KA)R,T)G^{*}=\left(V,(K\cup A^{*})\setminus R^{*},T\right).

Table 1: Summary of our computational complexity results.
Centrality Temporal Hiding Minimum Temporal Hiding
Degree NP-complete (Theorem 1) We show a 22-approximation algorithm (Theorem 4)
Closeness NP-complete (Theorem 2) Inapproximable within (1ϵ)ln|A^|(1-\epsilon)\ln|\widehat{A}| for any ϵ>0\epsilon>0 (Theorem 5)
Betweenness NP-complete (Theorem 3) Inapproximable within (1ϵ)ln|A^|(1-\epsilon)\ln|\widehat{A}| for any ϵ>0\epsilon>0 (Theorem 6)

Table 1 presents the summary of our computational complexity results.

Theorem 1.

Temporal Hiding problem is NP-complete given the degree temporal centrality.

Proof.

The problem is trivially in NP, since after adding and removing a given set of contacts we can computed the ranking of all nodes according to the degree temporal centrality in polynomial time.

We will now prove the NP-hardness of the problem. To this end, we will show a reduction from the NP-complete Finding kk-Clique problem. The decision version of this problem is defined by a simple network, H=(V,E)H=(V,E), where V={v1,,vn}V=\{v_{1},\ldots,v_{n}\}, and a constant kk\in\mathbb{N}, where the goal is to determine whether there exist kk nodes forming a clique in HH.

Let (H,k)(H,k) be a given instance of the Finding kk-Clique problem. Let us assume that n=2n=2, i.e., network HH has at least two nodes.

We will now construct an instance of the Temporal Hiding problem. First, let us construct a temporal network G=(V,K,T)G=(V^{\prime},K,T) where:

  • V=V{v}i=1nj=1nk+1{uji}V^{\prime}=V\cup\{v^{\dagger}\}\cup\bigcup_{i=1}^{n}\bigcup_{j=1}^{n-k+1}\{u^{i}_{j}\},

  • K=viVujiV{(vi,uji,0)}viV{(v,vi,0)}K\!=\!\bigcup_{v_{i}\in V}\bigcup_{u^{i}_{j}\in V}\{(v_{i},\!u^{i}_{j},\!0)\}\cup\bigcup_{v_{i}\in V}\{(v^{\dagger}\!,\!v_{i},\!0)\},

  • T=1T=1.

An example of the construction of the network GG is presented in Figure 2.

Now, consider the instance (G,v,A^,R^,b,c,ω)(G,v^{\dagger},\widehat{A},\widehat{R},b,c,\omega) of the Temporal Hiding problem, where:

  • GG is the temporal network we just constructed,

  • vv^{\dagger} is the evader,

  • A^={(vi,vj,0):(vi,vj)E}\widehat{A}=\{(v_{i},v_{j},0):(v_{i},v_{j})\in E\}, i.e., only edges existing in HH can be added to GG, all of them in moment 0,

  • R^=\widehat{R}=\emptyset, i.e., none of the edges can be removed,

  • b=k(k1)2b=\frac{k(k-1)}{2},

  • cc is the temporal degree centrality,

  • ω=k\omega=k is the safety threshold.

Since R^=\widehat{R}=\emptyset, for any solution to the constructed instance of the Temporal Hiding problem we must have R=R^{*}=\emptyset. Hence, we will omit mentions of RR^{*} in the remainder of the proof, and we will assume that a solution consists just of AA^{*}.

Note that the number of contacts of the evader vv^{\dagger} in GG is nn, and it does not change after the addition of any AAA\subseteq A^{*} (as we can only add edges between the members of VV). Furthermore, note that the number of contacts of every node ujiu^{i}_{j} is 11, and it cannot be increased. Therefore, the only nodes that can contribute to satisfying the safety threshold (by increasing their centrality to a value greater than that of the evader) are the nodes in VV. The number of contacts of any viv_{i} in GG is nk+2n-k+2 (as it contacts with nk+1n-k+1 nodes ujiu^{i}_{j} and with the node vv^{\dagger}). Therefore, for a given viv_{i} to have a greater number of contacts than vv^{\dagger}, we have to add to GG at least k1k-1 contacts incident with viv_{i}.

Refer to caption
Figure 2: Construction used in the proof of Theorem 1. The network HH is given as part of the Finding kk-Clique problem instance, while the network GG is constructed as part of the Temporal Hiding problem instance. The blue nodes in GG are those corresponding to the nodes in HH, while the evader is marked red. All contacts occur at moment 0. Green dotted lines depict the contacts allowed to be added.

We will now show that the constructed instance of the Temporal Hiding problem has a solution if and only if the given instance of the Finding kk-Clique problem has a solution.

Assume that there exists a solution to the given instance of the Finding kk-Clique problem, i.e., a subset VVV^{*}\subseteq V forming a kk-clique in HH. We will show that A={(vi,vj,0):vi,vjV}A^{*}=\{(v_{i},v_{j},0):v_{i},v_{j}\in V^{*}\} is a solution to the constructed instance of the Temporal Hiding problem. First, notice that indeed AA^A^{*}\subseteq\widehat{A}, as A^\widehat{A} contains all edges from HH at moment 0, and V×VV^{*}\times V^{*} is a clique in HH. Note also that adding AA^{*} to GG increases the number of contacts of kk nodes in VV^{*} to n+1n+1, i.e., to a value greater than the number of contacts of the evader. We showed that if there exists a solution to the given instance of the Finding kk-Clique problem, then there also exists a solution to the constructed instance of the Temporal Hiding problem.

Assume that there exists a solution AA^{*} to the constructed instance of the Temporal Hiding problem. We will show that V=(vi,vj,0)A{vi,vj}V^{*}=\bigcup_{(v_{i},v_{j},0)\in A^{*}}\{v_{i},v_{j}\} forms a kk-clique in HH. Notice that since AA^{*} is a solution, it increases the number of contacts of at least kk nodes in VV (since the safety threshold is ω=k\omega=k) by at least k1k-1. However, since the budget is b=k(k1)2b=\frac{k(k-1)}{2}, adding AA^{*} must increase the number of contacts of exactly kk nodes in VV by exactly k1k-1. If such a choice is available, the nodes in VV^{*} form a clique in A^\widehat{A} at moment 0, therefore, they also form a clique in HH. We showed that if there exists a solution to the constructed instance of the Temporal Hiding problem, then there also exists a solution to the given instance of the Finding kk-Clique problem. This concludes the proof. ∎

Theorem 2.

Temporal Hiding problem is NP-complete given the closeness temporal centrality.

Proof.

The problem is trivially in NP since after adding and removing a given set of contacts, we can compute the ranking of all nodes according to the closeness temporal centrality in polynomial time.

We will now prove that the problem is NP-hard. To this end, we will show a reduction from the NP-complete Set Cover problem. The decision version of this problem is defined by a universe, U={u1,,u|U|}U=\{u_{1},\ldots,u_{|U|}\}, a collection of sets S={S1,,S|S|}S=\{S_{1},\ldots,S_{|S|}\} such that iSiU\forall_{i}S_{i}\subset U, and an integer kk\in\mathbb{N} where the goal is to determine whether there exist kk elements of SS the union of which equals UU.

Let (U,S,k)(U,S,k) be a given instance of the Set Cover problem. Let us assume that |S|3|S|\geq 3, note that all instances where |S|<3|S|<3 can be solved in polynomial time. We will now construct an instance of the Temporal Hiding problem. First, let us construct a temporal network G=(V,K,T)G=(V,K,T) where:

  • V={v,z}USi=1|S|k{yi}i=1|U|+|S|1{xi}V=\{v^{\dagger},z\}\cup U\cup S\cup\bigcup_{i=1}^{|S|-k}\{y_{i}\}\cup\bigcup_{i=1}^{|U|+|S|-1}\{x_{i}\},

  • K=xiV{(v,xi,0)}yiV{(z,yi,0)}uiSj{(ui,Sj,1)}K=\bigcup_{x_{i}\in V}\{(v^{\dagger},x_{i},0)\}\cup\bigcup_{y_{i}\in V}\{(z,y_{i},0)\}\cup\bigcup_{u_{i}\in S_{j}}\{(u_{i},S_{j},1)\},

  • T=2T=2.

An example of the construction of the network GG is presented in Figure 3.

Refer to caption
Figure 3: Construction used in the proof of Theorem 2. On the left there is an instance of the Set Cover problem, while on the right there is a network GG constructed as part of the Temporal Hiding problem instance. The blue nodes in GG correspond to the elements of the universe UU, while the evader is marked red. Dotted lines correspond to contacts at moment 0, and solid lines to contact at moment 11. Green lines depict the contacts allowed to be added.

Now, consider the instance (G,v,A^,R^,b,c,ω)(G,v^{\dagger},\widehat{A},\widehat{R},b,c,\omega) of the Temporal Hiding problem, where:

  • GG is the temporal network we just constructed,

  • vv^{\dagger} is the evader,

  • A^={(z,Si,0):SiV}\widehat{A}=\{(z,S_{i},0):S_{i}\in V\},

  • R^=\widehat{R}=\emptyset, i.e., none of the edges can be removed,

  • b=kb=k,

  • cc is the temporal closeness centrality,

  • ω=1\omega=1 is the safety threshold.

Since R^=\widehat{R}=\emptyset, for any solution to the constructed instance of the Temporal Hiding problem we must have R=R^{*}=\emptyset. Hence, we will omit mentions of RR^{*} in the remainder of the proof, and we will assume that a solution consists just of AA^{*}.

First, let us analyze the temporal closeness centrality values of the nodes in GG after the addition of any AA^A\subseteq\widehat{A}. Let qAq_{A} denote the number of nodes ujUu_{j}\in U such that λ(z,uj)<\lambda^{*}(z,u_{j})<\infty, i.e., the number of nodes ujUu_{j}\in U such that zz is connected (via the contacts from AA) with at least one node SiS_{i} connected with uju_{j} (notice that there are no other possibilities for zz to have finite average latency to a node in UU). The temporal closeness centrality values can be computed as follows:

  • cC(GA,v)=2(|U|+|S|1)n1c_{C}(G\cup A,v^{\dagger})=\frac{2(|U|+|S|-1)}{n-1},

  • cC(GA,z)=2(|S|k+|A|+qA)n1c_{C}(G\cup A,z)=\frac{2(|S|-k+|A|+q_{A})}{n-1},

  • cC(GA,Si)2(|U|+1)n1<cC(GA,v)c_{C}(G\cup A,S_{i})\leq\frac{2(|U|+1)}{n-1}<c_{C}(G\cup A,v^{\dagger}) for every SiVS_{i}\in V such that (z,Si,0)A(z,S_{i},0)\in A,

  • cC(GA,Si)2|U|n1<cC(GA,v)c_{C}(G\cup A,S_{i})\leq\frac{2|U|}{n-1}<c_{C}(G\cup A,v^{\dagger}) for every SiVS_{i}\in V such that (z,Si,0)A(z,S_{i},0)\notin A,

  • cC(GA,ui)=2(|{SjS:uiSj}|)n12|S|n1<cC(GA,v)c_{C}(G\cup A,u_{i})=\frac{2(|\{S_{j}\in S:u_{i}\in S_{j}\}|)}{n-1}\leq\frac{2|S|}{n-1}<c_{C}(G\cup A,v^{\dagger}) for every uiVu_{i}\in V,

  • cC(GA,xi)=2n1<cC(GA,v)c_{C}(G\cup A,x_{i})=\frac{2}{n-1}<c_{C}(G\cup A,v^{\dagger}) for every xiVx_{i}\in V,

  • cC(GA,yi)=2n1<cC(GA,v)c_{C}(G\cup A,y_{i})=\frac{2}{n-1}<c_{C}(G\cup A,v^{\dagger}) for every yiVy_{i}\in V.

Since the safety threshold is ω=1\omega=1, only one node has to have greater temporal closeness centrality than vv^{\dagger} after he addition of a given AA^A\subseteq\widehat{A} in order for said AA to be solution to the constructed instance of the Temporal Hiding problem. However, notice that the only node that can have greater temporal closeness centrality than vv^{\dagger} (and therefore higher position in the ranking) is zz. In other words, a given AA^A\subseteq\widehat{A} is a solution to the constructed instance of the Temporal Hiding problem if and only if we have cC(GA,z)>cC(GA,v)c_{C}(G\cup A,z)>c_{C}(G\cup A,v^{\dagger}).

We will now show that the constructed instance of the Temporal Hiding problem has a solution if and only if the given instance of the Set Cover problem has a solution.

Assume that there exists a solution to the given instance of the Set Cover problem, i.e., a subset SSS^{*}\subseteq S of size kk the union of which is the universe UU. We will show that A={z}×S×{0}A^{*}=\{z\}\times S^{*}\times\{0\} (i.e., connecting zz with nodes corresponding to all sets in SS^{*}) is a solution to the constructed instance of the Temporal Hiding problem. First, notice that after the addition of AA^{*} there exists a time-respecting path from zz to every node ujUu_{j}\in U, leading through the node corresponding to an element SiSS_{i}\in S^{*} containing uju_{j}, with which zz is now connected. Hence, we have that qA=|U|q_{A^{*}}=|U| and |A|=k|A^{*}|=k, which gives us:

cC(GA,z)=2(|S|+|U|)n1>\displaystyle c_{C}(G\cup A^{*},z)=\frac{2(|S|+|U|)}{n-1}> 2(|U|+|S|1)n1\displaystyle\frac{2(|U|+|S|-1)}{n-1}
=cC(GA,v).\displaystyle=c_{C}(G\cup A^{*},v^{\dagger}).

Therefore, after the addition of AA^{*} node zz has greater temporal closeness centrality than the evader. We showed that if there exists a solution to the given instance of the Set Cover problem, then there also exists a solution to the constructed instance of the Temporal Hiding problem.

Assume that there exists a solution AA^{*} to the constructed instance of the Temporal Hiding problem. We will show that S={SiS:(z,Si,0)A}S^{*}=\{S_{i}\in S:(z,S_{i},0)\in A^{*}\} covers the universe UU. Let us compute the difference between cC(GA,v)c_{C}(G\cup A^{*},v^{\dagger}) and cC(GA,z)c_{C}(G\cup A^{*},z):

cC(GA,z)cC(G\displaystyle c_{C}(G\cup A^{*},z)-c_{C}(G A,v)\displaystyle\cup A^{*},v^{\dagger})
=2(|A|k+qA|U|+1)n1.\displaystyle=\frac{2(|A^{*}|-k+q_{A^{*}}-|U|+1)}{n-1}.

Since AA^{*} is a solution, zz must have greater temporal closeness centrality than vv^{\dagger}, implying cC(GA,z)cC(GA,v)>0c_{C}(G\cup A^{*},z)-c_{C}(G\cup A^{*},v^{\dagger})>0, which gives us:

|A|+qA+1>|U|+k.|A^{*}|+q_{A^{*}}+1>|U|+k.

Notice however that |A|k|A^{*}|\leq k (since the evader’s budget is b=kb=k) and that qA|U|q_{A^{*}}\leq|U| (since qAq_{A^{*}} is the number of nodes in UU at finite average latency from zz). Therefore, we must have |A|=k|A^{*}|=k and qA=|U|q_{A^{*}}=|U|, which means that after the addition of AA^{*} for every ujUu_{j}\in U we have a node SiSS_{i}\in S connected to both zz and uju_{j} (there is no other way for zz to have a finite average latency to uju_{j}). Consequently, every element of the universe ujUu_{j}\in U is covered by at least one set SiSS_{i}\in S^{*}, as zz is connected with a node SiS_{i} only if it belongs to SS^{*}, and SiS_{i} is connected with uju_{j} only if it contains uju_{j} in the Set Cover problem instance. We showed that if there exists a solution to the constructed instance of the Temporal Hiding problem, then there also exists a solution to the given instance of the Set Cover problem. This concludes the proof. ∎

Theorem 3.

Temporal Hiding problem is NP-complete given the betweenness temporal centrality.

Proof.

The problem is trivially in NP, since after adding and removing a given set of contacts we can computed the ranking of all nodes according to the betweenness temporal centrality in polynomial time.

We will now prove that the problem is NP-hard. To this end, we will show a reduction from the NP-complete 33-Set Cover problem. The decision version of this problem is defined by a universe, U={u1,,u|U|}U=\{u_{1},\ldots,u_{|U|}\}, a collection of sets S={S1,,S|S|}S=\{S_{1},\ldots,S_{|S|}\} such that iSiU|Si|=3\forall_{i}S_{i}\subset U\land|S_{i}|=3, and an integer kk\in\mathbb{N} where the goal is to determine whether there exist kk elements of SS the union of which equals UU.

Let (U,S,k)(U,S,k) be a given instance of the 33-Set Cover problem. Let us assume that |U|6|U|\geq 6, note that all instances where |U|<6|U|<6 can be solved in polynomial time. We will now construct an instance of the Temporal Hiding problem. First, let us construct a temporal network G=(V,K,T)G=(V,K,T) where:

  • V={v,w,y,z}USi=1|U|+k1{xi}V=\{v^{\dagger},w,y,z\}\cup U\cup S\cup\bigcup_{i=1}^{|U|+k-1}\{x_{i}\},

  • K={(z,w,0),(v,y,0)}xiV{(v,xi,1)}uiSj{(ui,Sj,2)}K=\{(z,w,0),(v^{\dagger},y,0)\}\cup\bigcup_{x_{i}\in V}\{(v^{\dagger},x_{i},1)\}\cup\bigcup_{u_{i}\in S_{j}}\{(u_{i},S_{j},2)\},

  • T=3T=3.

An example of the construction of the network GG is presented in Figure 4.

Refer to caption
Figure 4: Construction used in the proof of Theorem 3. On the left there is an instance of the 33-Set Cover problem, while on the right there is a network GG constructed as part of the Temporal Hiding problem instance. The blue nodes in GG correspond to the elements of the universe UU, while the evader is marked red. Dashed lines correspond to contacts at moment 0, dotted lines to contacts at moment 11, and solid lines to contact at moment 22. Green lines depict the contacts allowed to be added.

Now, consider the instance (G,v,A^,R^,b,c,ω)(G,v^{\dagger},\widehat{A},\widehat{R},b,c,\omega) of the Temporal Hiding problem, where:

  • GG is the temporal network we just constructed,

  • vv^{\dagger} is the evader,

  • A^={(z,Si,1):SiV}\widehat{A}=\{(z,S_{i},1):S_{i}\in V\},

  • R^=\widehat{R}=\emptyset, i.e., none of the edges can be removed,

  • b=kb=k,

  • cc is the temporal betweenness centrality,

  • ω=1\omega=1 is the safety threshold.

Since R^=\widehat{R}=\emptyset, for any solution to the constructed instance of the Temporal Hiding problem we must have R=R^{*}=\emptyset. Hence, we will omit mentions of RR^{*} in the remainder of the proof, and we will assume that a solution consists just of AA^{*}.

First, let us analyze the temporal betweenness centrality values of the nodes in GG after the addition of any AA^A\subseteq\widehat{A}. Let qAq_{A} denote the number of nodes ujUu_{j}\in U such that λ(z,uj)<\lambda^{*}(z,u_{j})<\infty, i.e., the number of nodes ujUu_{j}\in U such that zz is connected (via the contacts from AA) with at least one node SiS_{i} connected with uju_{j} (notice that there are no other possibilities for zz to have finite average latency to a node in UU). The temporal betweenness centrality values can be computed as follows:

  • cB(GA,v)=|U|+k1(n1)(n2)Tc_{B}(G\cup A,v^{\dagger})=\frac{|U|+k-1}{(n-1)(n-2)T}, as it belongs to the shortest temporal paths from yy to all |U|+k1|U|+k-1 nodes xix_{i},

  • cB(GA,z)=|A|+qA(n1)(n2)Tc_{B}(G\cup A,z)=\frac{|A|+q_{A}}{(n-1)(n-2)T}, as it belongs to the shortest temporal paths from ww to all |A||A| nodes SiS_{i} it is connected with and to all qAq_{A} nodes uiu_{i} that can be reached from it,

  • cB(GA,Si)6(n1)(n2)TcB(GA,v)c_{B}(G\cup A,S_{i})\leq\frac{6}{(n-1)(n-2)T}\leq c_{B}(G\cup A,v^{\dagger}), for any ii, as SiS_{i} can belong to the shortest temporal paths from nodes in {z,w}\{z,w\} to the three nodes in UU that it is connected with,

  • cB(GA,w)=cB(GA,y)=cB(GA,ui)=cB(GA,xi)=0c_{B}(G\cup A,w)=c_{B}(G\cup A,y)=c_{B}(G\cup A,u_{i})=c_{B}(G\cup A,x_{i})=0, for any ii, as none of these nodes belong to the shortest temporal paths between any pairs of other nodes.

Since the safety threshold is ω=1\omega=1, only one node has to have greater temporal betweenness centrality than vv^{\dagger} after he addition of a given AA^A\subseteq\widehat{A} in order for said AA to be solution to the constructed instance of the Temporal Hiding problem. However, notice that the only node that can have greater temporal betweenness centrality than vv^{\dagger} (and therefore higher position in the ranking) is zz. In other words, a given AA^A\subseteq\widehat{A} is a solution to the constructed instance of the Temporal Hiding problem if and only if we have cB(GA,z)>cB(GA,v)c_{B}(G\cup A,z)>c_{B}(G\cup A,v^{\dagger}).

We will now show that the constructed instance of the Temporal Hiding problem has a solution if and only if the given instance of the 33-Set Cover problem has a solution.

Assume that there exists a solution to the given instance of the 33-Set Cover problem, i.e., a subset SSS^{*}\subseteq S of size kk the union of which is the universe UU. We will show that A={z}×S×{1}A^{*}=\{z\}\times S^{*}\times\{1\} (i.e., connecting zz with nodes corresponding to all sets in SS^{*}) is a solution to the constructed instance of the Temporal Hiding problem. First, notice that after the addition of AA^{*} there exists a time-respecting path from zz to every node ujUu_{j}\in U, leading through the node corresponding to an element SiSS_{i}\in S^{*} containing uju_{j}, with which zz is now connected. Hence, we have that qA=|U|q_{A^{*}}=|U| and |A|=k|A^{*}|=k, which gives us:

cB(GA,z)=|U|+k(n1)(n2)T\displaystyle c_{B}(G\cup A^{*},z)=\frac{|U|+k}{(n-1)(n-2)T} >|U|+k1(n1)(n2)T\displaystyle>\frac{|U|+k-1}{(n-1)(n-2)T}
=cB(GA,v).\displaystyle=c_{B}(G\cup A^{*},v^{\dagger}).

Therefore, after the addition of AA^{*} node zz has greater temporal betweenness centrality than the evader. We showed that if there exists a solution to the given instance of the 33-Set Cover problem, then there also exists a solution to the constructed instance of the Temporal Hiding problem.

Assume that there exists a solution AA^{*} to the constructed instance of the Temporal Hiding problem. We will show that S={SiS:(z,Si,1)A}S^{*}=\{S_{i}\in S:(z,S_{i},1)\in A^{*}\} covers the universe UU. Let us compute the difference between cB(GA,v)c_{B}(G\cup A^{*},v^{\dagger}) and cB(GA,z)c_{B}(G\cup A^{*},z):

cB(GA,z)cB\displaystyle c_{B}(G\cup A^{*},z)-c_{B} (GA,v)\displaystyle(G\cup A^{*},v^{\dagger})
=qA+|A||U|k+1(n1)(n2)T.\displaystyle=\frac{q_{A^{*}}+|A^{*}|-|U|-k+1}{(n-1)(n-2)T}.

Since AA^{*} is a solution, zz must have greater temporal betweenness centrality than vv^{\dagger}, implying cB(GA,z)cB(GA,v)>0c_{B}(G\cup A^{*},z)-c_{B}(G\cup A^{*},v^{\dagger})>0, which gives us:

qA+|A|+1>|U|+k.q_{A^{*}}+|A^{*}|+1>|U|+k.

Notice however that |A|k|A^{*}|\leq k (since the evader’s budget is b=kb=k) and that qA|U|q_{A^{*}}\leq|U| (since qAq_{A^{*}} is the number of nodes in UU at finite average latency from zz). Therefore, we must have |A|=k|A^{*}|=k and qA=|U|q_{A^{*}}=|U|, which means that after the addition of AA^{*} for every ujUu_{j}\in U we have a node SiSS_{i}\in S connected to both zz and uju_{j} (there is no other way for zz to have a finite average latency to uju_{j}). Consequently, every element of the universe ujUu_{j}\in U is covered by at least one set SiSS_{i}\in S^{*}, as zz is connected with a node SiS_{i} only if it belongs to SS^{*}, and SiS_{i} is connected with uju_{j} only if it contains uju_{j} in the 33-Set Cover problem instance. We showed that if there exists a solution to the constructed instance of the Temporal Hiding problem, then there also exists a solution to the given instance of the 33-Set Cover problem. This concludes the proof. ∎

Algorithm 1 Finding a 22-approximate solution for the Minimum Temporal Hiding problem given the degree temporal centrality.
1:Temporal networks G=(V,K,T)G=(V,K,T), evader vv^{\dagger}, set of edges allowed to be added A^\widehat{A}, set of edges allowed to be removed R^\widehat{R}, safety threshold ω\omega.
2:Solution (A,R)(A^{*},R^{*}) to the instance (G,v,A^,R^,cC,ω)(G,v^{\dagger},\widehat{A},\widehat{R},c_{C},\omega) of the Temporal Hiding problem or \perp if there is no solution.
3:Let vi=1n1\langle\!v\rangle_{i=1}^{n\!-\!1}\! be a sequence of all nodes in V\!V\!\! other than vv^{\dagger}
4:A^A^({v}×V×T)\widehat{A}\leftarrow\widehat{A}\setminus(\{v^{\dagger}\}\!\times\!V\!\times\!\langle T\rangle)
5:R^R^({v}×V×T)\widehat{R}\leftarrow\widehat{R}\cap(\{v^{\dagger}\}\!\times\!V\!\times\!\langle T\rangle)
6:for z0,,|R^|z\leftarrow 0,\ldots,|\widehat{R}| do
7:    Xiz[θ,q]X_{i}^{z}[\theta,q]\leftarrow\infty for every i,θ,qi,\theta,q
8:    X0z[0,0]0X_{0}^{z}[0,0]\leftarrow 0
9:    for i1,,n1i\leftarrow 1,\ldots,n-1 do
10:         ϕi|A^({vi}×V×T)|\phi_{i}\leftarrow\left|\widehat{A}\cap\left(\{v_{i}\}\!\times\!V\!\times\!\langle T\rangle\!\right)\right|
11:         for r0,,|R^({v}×{vi}×T)|r\leftarrow 0,\ldots,\left|\widehat{R}\cap\left(\{v^{\dagger}\}\!\times\!\{v_{i}\}\!\times\!\langle T\rangle\right)\right| do
12:             amax(0,|KG(v)|z+1|KG(vi)|+r)a\leftarrow\max\left(0,|K_{G}(v^{\dagger})|\!-\!z\!+\!1\!-\!|K_{G}(v_{i})|\!+\!r\right)\!
13:             for θ0,,min(i1,ω)\theta\leftarrow 0,\ldots,\min(i-1,\omega) do
14:                 for q0,,zq\leftarrow 0,\ldots,z do
15:                     if aϕiXi1z[θ,q]+a<Xiz[θ+1,q+r]a\!\!\leq\!\!\phi_{i}\!\land\!X_{i\!-\!1}^{z}\![\theta,\!q]\!\!+\!\!a\!\!<\!\!X_{i}^{z}\![\theta\!\!+\!\!1,\!q\!\!+\!\!r] then
16:                         Xiz[θ+1,q+r]Xi1z[θ,q]+aX_{i}^{z}[\theta+1,q+r]\leftarrow X_{i\!-\!1}^{z}[\theta,q]+a
17:                         Yiz[θ+1,q+r](a,r)Y_{i}^{z}[\theta+1,q+r]\leftarrow(a,r)                      
18:                     if a>0Xi1z[θ,q]<Xiz[θ,q+r]a\!>\!0\land X_{i\!-\!1}^{z}[\theta,\!q]\!<\!X_{i}^{z}[\theta,\!q\!+\!r] then
19:                         Xiz[θ,q+r]Xi1z[θ,q]X_{i}^{z}[\theta,q+r]\leftarrow X_{i\!-\!1}^{z}[\theta,q]
20:                         Yiz[θ,q+r](0,r)Y_{i}^{z}[\theta,q+r]\leftarrow(0,r)                                                                 
21:    CzXn1z[ω,z]+2zC_{z}\leftarrow X_{n-1}^{z}[\omega,z]+2z
22:zargminzCzz^{*}\leftarrow\operatorname*{arg\,min}_{z}C_{z}
23:if Cz=C_{z^{*}}=\infty then
24:    return \perp
25:(A,R)(,)(A^{*},R^{*})\leftarrow(\emptyset,\emptyset)
26:xCz2zx^{*}\leftarrow C_{z^{*}}\!-\!2z^{*}
27:θω\theta^{*}\leftarrow\omega
28:qzq^{*}\leftarrow z^{*}
29:for in1,,1i\leftarrow n-1,\ldots,1 do
30:    (a,r)Yiz[θ,q](a,r)\leftarrow Y_{i}^{z^{*}}[\theta^{*},q^{*}]
31:    AAselect(a,A^({vi}×V×T))A^{*}\leftarrow A^{*}\cup\operatorname*{select}\left(a,\widehat{A}\cap\left(\{v_{i}\}\!\times\!V\!\times\!\langle T\rangle\right)\right)
32:    RRselect(r,R^({v}×{vi}×T))R^{*}\leftarrow R^{*}\cup\operatorname*{select}\left(r,\widehat{R}\cap\left(\{v^{\dagger}\}\!\times\!\{v_{i}\}\!\times\!\langle T\rangle\right)\right)
33:    xxax^{*}\leftarrow x^{*}\!-\!a
34:    qqrq^{*}\leftarrow q^{*}\!-\!r
35:    if |KG(vi)|+ar>|KG(v)|z|K_{G}(v_{i})|\!+\!a\!-\!r>|K_{G}(v^{\dagger})|\!-\!z^{*} then
36:         θθ1\theta^{*}\leftarrow\theta^{*}\!-\!1     
37:return (A,R)(A^{*},R^{*})
Theorem 4.

Algorithm 1 is a 22-approximation for the Minimum Temporal Hiding problem given the degree temporal centrality. The bound is tight.

Proof.

First, notice that removing contacts that are not incident with vv^{\dagger} and adding edges that are incident with vv^{\dagger} is always suboptimal, as in case of both these changes the difference between the degree of the evader and all other nodes either increases or stays the same. Hence, no solution of the optimal size will include these types of modification and we exclude them in lines 4-5.

Let the cost of a given solution be expressed by a number of stubs (half-contacts), i.e., twice the number of contacts. In lines 6-22 the algorithm computes the minimal cost of a solution that satisfies the safety threshold, under assumption that we are allowed to connect any two stubs from contacts appearing in A^\widehat{A} (thus we are guaranteed that the actual optimum is greater or equal). It does so using the dynamic programming technique. The loop in line 6 iterates over the number of contacts removed from the network. The algorithm fills the arrays XX and YY in order to identify CzC_{z}, the optimal cost of solution that satisfies the safety thresholds while removing zz contacts from the network. The element Xiz[θ,q]X^{z}_{i}[\theta,q] is minimal number of stubs that have to added to nodes v1,,viv_{1},\ldots,v_{i} so that θ\theta of them have greater degrees than the evader, assuming that we removed qq stubs incident with them and that the degree of the evader is |KG(v)|z|K_{G}(v^{\dagger})|-z. The element Yiz[θ,q]Y^{z}_{i}[\theta,q] stores the number of stubs that are added to and removed from node viv_{i} to achieve the solution with the cost Xiz[θ,q]X^{z}_{i}[\theta,q]. The loop in line 9 iterates over all nodes in the network other than the evader, while the loop in line 11 iterates over the number of stubs removed from the node viv_{i}. The value aa computed in line is the number of stubs that have to added to node viv_{i} in order for it to contribute towards the safety threshold, i.e., in order for it to have greater degree than the evader. The loops in lines 13 and 14 iterate over all values of θ\theta and qq valid for arrays Xi1zX^{z}_{i-1} and Yi1zY^{z}_{i-1}, and based on them fill arrays XizX^{z}_{i} and YizY^{z}_{i}. If the addition of the required number of aa stubs is possible (test in line 15) then we record a solution where the node viv_{i} contributes towards the safety threshold (value θ+1\theta+1 in lines 16-17). In lines 19-20) we record a solution where the node viv_{i} does not contribute towards the safety threshold, but is only used to decrease the degree of the evader. We do it only if counting towards the threshold actually required adding any stubs (test in line 18). Finally, in line 21 we compute the cost of the optimal solution that removes zz contacts from the network, while satisfying the safety threshold ω\omega.

We showed that CzC_{z^{*}} is the minimal cost of a solution that satisfies the safety threshold, under assumption that we are allowed to connect any two stubs from contacts appearing in A^\widehat{A}. If no solution was found (which is tested in line 23), then the algorithm returns \perp in line 24. Otherwise, the algorithm constructs the solution in lines 25-37. Since we removed all contact including vv^{\dagger} from A^\widehat{A}, any solution that removes all contacts from the optimal solution and adds all the stubs from the optimal solution satisfies the safety threshold. What is more, such a solution contains at most twice as many contacts as the optimum (as it is possible that each stub in A^\widehat{A} from the optimal solution gets connected to a stub not appearing in the optimal solution). Hence, the algorithm is a 22-approximation. The complexity of the algorithm is 𝒪(|R^|3nω)\mathcal{O}(|\widehat{R}|^{3}n\omega).

Refer to caption
Figure 5: Example of a network for which the approximation ratio of Algorithm 1 is exactly 22. The red node corresponds to the evader. Numbers next to edges are the moments in which the contacts occur. Green dotted lines depict the contacts allowed to be added.

Finally, to prove the claim about the tight bound, Figure 5 presents an example of a network for which the approximation ratio of Algorithm 1 is exactly 22. Given a safety threshold ω=4\omega=4, Algorithm 1 will add to the network contacts (v1,v2,0)(v_{1},v_{2},0) and (v3,v4,0)(v_{3},v_{4},0) (because of the order of nodes in sequence vi=1n1\langle v\rangle_{i=1}^{n-1}), whereas the optimal solution is to add the contact (v5,v6,0)(v_{5},v_{6},0). This concludes the proof. ∎

Theorem 5.

The Minimum Temporal Hiding problem given the closeness temporal centrality cannot be approximated within a ratio of (1ϵ)lnn(1-\epsilon)\ln n for any ϵ>0\epsilon>0, unless P=NPP=NP.

Proof.

In order to prove the theorem, we will use the result by Dinur and Steurer [6] that the Minimum Set Cover problem cannot be approximated within a ratio of (1ϵ)lnn(1-\epsilon)\ln n for any ϵ>0\epsilon>0, unless P=NPP=NP.

Let X=(U,S)X=(U,S) be an instance of the Minimum Set Cover problem, where UU is the universe {u1,,u|U|}\{u_{1},\ldots,u_{|U|}\}, and SS is a collection {S1,,S|S|}\{S_{1},\ldots,S_{|S|}\} of subsets of UU. The goal here is to find subset SSS^{*}\subseteq S such that the union of SS^{*} equals UU and the size of SS^{*} is minimal.

First, we will show a function f(X)f(X) that based on an instance of the Minimum Set Cover problem X=(U,S)X=(U,S) constructs an instance of the Minimum Temporal Hiding problem. Let us assume that |S|3|S|\geq 3, note that all instances where |S|<3|S|<3 can be solved in polynomial time.

Let the temporal network G(X)=(V,K,T)G(X)=(V,K,T) be defined as:

  • V={v,z}USi=1|U|+|S|1{xi}V=\{v^{\dagger},z\}\cup U\cup S\cup\bigcup_{i=1}^{|U|+|S|-1}\{x_{i}\},

  • K=xiV{(v,xi,0)}SiV{(z,Si,1)}uiSj{(ui,Sj,1)}K=\bigcup_{x_{i}\in V}\{(v^{\dagger},x_{i},0)\}\cup\bigcup_{S_{i}\in V}\{(z,S_{i},1)\}\cup\bigcup_{u_{i}\in S_{j}}\{(u_{i},S_{j},1)\},

  • T=2T=2.

An example of the construction of the network G(X)G(X) is presented in Figure 6.

Refer to caption
Figure 6: Construction used in the proof of Theorem 5. On the left there is an instance XX of the Minimum Set Cover problem, while on the right there is a network G(X)G(X) constructed as part of the Minimum Temporal Hiding problem instance. The blue nodes in G(X)G(X) correspond to the elements of the universe UU, while the evader is marked red. Dotted lines correspond to contacts at moment 0, and solid lines to contact at moment 11. Green lines depict the contacts allowed to be added.

The formula of function ff is then f(X)=(G(X),v,A^,R^,c,ω)f(X)=(G(X),v^{\dagger},\widehat{A},\widehat{R},c,\omega), where:

  • G(X)G(X) is the temporal network we just constructed,

  • vv^{\dagger} is the evader,

  • A^={(z,Si,0):SiV}\widehat{A}=\{(z,S_{i},0):S_{i}\in V\},

  • R^=\widehat{R}=\emptyset, i.e., none of the edges can be removed,

  • cc is the temporal closeness centrality,

  • ω=1\omega=1 is the safety threshold.

Let AA^{*} be the solution to the constructed instance of the Minimum Temporal Hiding problem f(X)f(X) (notice that since R^=\widehat{R}=\emptyset, we must have R=R^{*}=\emptyset). The function gg computing corresponding solution to the instance XX of the Minimum Set Cover problem is now g(X,A)={SiS:(z,Si,0)A}g(X,A^{*})=\{S_{i}\in S:(z,S_{i},0)\in A^{*}\}, i.e., SS^{*} is the set of all sets SiS_{i} such that their corresponding nodes SiS_{i} are connected with zz via the contacts in AA^{*}.

Now, we will show that g(X,A)g(X,A^{*}) is indeed a correct solution to XX, i.e., that it covers the entire universe. Let qAq_{A} denote the number of nodes ujUu_{j}\in U such that λG(X)A(z,uj)<\lambda^{*}_{G(X)\cup A}(z,u_{j})<\infty for a given set of contacts AA^A\subseteq\widehat{A}. Centrality values after adding AA to the network G(X)G(X) are as follows:

  • cC(G(X)A,v)=2(|U|+|S|1)n1c_{C}(G(X)\cup A,v^{\dagger})=\frac{2(|U|+|S|-1)}{n-1},

  • cC(G(X)A,z)=2(|S|+qA)n1c_{C}(G(X)\cup A,z)=\frac{2(|S|+q_{A})}{n-1},

  • cC(G(X)A,Si)2(|U|+1)n1<cC(G(X)A,v)c_{C}(G(X)\cup A,S_{i})\leq\frac{2(|U|+1)}{n-1}<c_{C}(G(X)\cup A,v^{\dagger}) for every SiVS_{i}\in V,

  • cC(G(X)A,ui)=2(|{SjS:uiSj}|)n12|S|n1<cC(G(X)A,v)c_{C}(G(X)\cup A,u_{i})=\frac{2(|\{S_{j}\in S:u_{i}\in S_{j}\}|)}{n-1}\leq\frac{2|S|}{n-1}<c_{C}(G(X)\cup A,v^{\dagger}) for every uiVu_{i}\in V,

  • cC(G(X)A,xi)=2n1<cC(G(X)A,v)c_{C}(G(X)\cup A,x_{i})=\frac{2}{n-1}<c_{C}(G(X)\cup A,v^{\dagger}) for every xiVx_{i}\in V.

Therefore, the safety threshold ω\omega is satisfied if and only if cC(G(X)A,z)>cC(G(X)A,v)c_{C}(G(X)\cup A,z)>c_{C}(G(X)\cup A,v^{\dagger}), which is the case when qA=|U|q_{A}=|U|. Hence, if AA^{*} is a solution to f(X)f(X), then for every node uju_{j} there exists a node SiS_{i} such that (z,Si,0)A(z,S_{i},0)\in A^{*} and uju_{j} is connected with SiS_{i}. However, because of the way we constructed the network G(X)G(X), this can only be the case when ujSiu_{j}\in S_{i} in the given instance XX of the Minimum Set Cover problem. Therefore, for every element of the universe uju_{j} there exists a set SiS_{i} such that ujSiu_{j}\in S_{i} and SiS_{i} in g(X,A)g(X,A^{*}), which implies that g(X,A)g(X,A^{*}) is a solution to the given instance XX of the Minimum Set Cover problem, i.e., it covers the universe. What is more, since |g(X,A)|=|A||g(X,A^{*})|=|A^{*}|, we also have that the optimal solutions to both instances are of the same size.

Now, assume that there exists an rr-approximation algorithm for the Minimum Temporal Hiding problem where r=(1ϵ)ln|A^|r=(1-\epsilon)\ln|\widehat{A}| for some ϵ>0\epsilon>0. Let us use this algorithm to solve the constructed instance f(X)f(X) and consider solution g(X,A)g(X,A^{*}) to the given instance XX of the Minimum Set Cover problem. Since the size of the optimal solution is the same for both instances, we obtained an approximation algorithm that solves Minimum Set Cover problem to within (1ϵ)lnn(1-\epsilon)\ln n for ϵ>0\epsilon>0. However, Dinur and Steurer [6] showed that the Minimum Set Cover problem cannot be approximated within a ratio of (1ϵ)lnn(1-\epsilon)\ln n for any ϵ>0\epsilon>0, unless P=NPP=NP. Therefore, such approximation algorithm for the Minimum Temporal Hiding problem cannot exist, unless P=NPP=NP. This concludes the proof. ∎

Theorem 6.

The Minimum Temporal Hiding problem given the betweenness temporal centrality cannot be approximated within a ratio of (1ϵ)lnn(1-\epsilon)\ln n for any ϵ>0\epsilon>0, unless P=NPP=NP.

Proof.

In order to prove the theorem, we will use the result by Dinur and Steurer [6] that the Minimum Set Cover problem cannot be approximated within a ratio of (1ϵ)lnn(1-\epsilon)\ln n for any ϵ>0\epsilon>0, unless P=NPP=NP.

Let X=(U,S)X=(U,S) be an instance of the Minimum Set Cover problem, where UU is the universe {u1,,u|U|}\{u_{1},\ldots,u_{|U|}\}, and SS is a collection {S1,,S|S|}\{S_{1},\ldots,S_{|S|}\} of subsets of UU. The goal here is to find subset SSS^{*}\subseteq S such that the union of SS^{*} equals UU and the size of SS^{*} is minimal.

First, we will show a function f(X)f(X) that based on an instance of the Minimum Set Cover problem X=(U,S)X=(U,S) constructs an instance of the Minimum Temporal Hiding problem. Let us assume that i|Si|<|U|\forall_{i}|S_{i}|<|U|, note that all instances where there exists Si=US_{i}=U can be solved in polynomial time.

Let the temporal network G(X)=(V,K,T)G(X)=(V,K,T) be defined as:

  • V={v,z}USi=1|U|{wi}i=1|U|+1{xi}i=1|U|1{yi}V=\{v^{\dagger},z\}\cup U\cup S\cup\bigcup_{i=1}^{|U|}\{w_{i}\}\cup\bigcup_{i=1}^{|U|+1}\{x_{i}\}\cup\bigcup_{i=1}^{|U|-1}\{y_{i}\},

  • K=xiV{(v,xi,0)}yiV{(v,yi,2)}wiV{(z,wi,0)}wi,wjV{(wi,wj,2)}K=\bigcup_{x_{i}\in V}\{(v^{\dagger},x_{i},0)\}\cup\bigcup_{y_{i}\in V}\{(v^{\dagger},y_{i},2)\}\cup\bigcup_{w_{i}\in V}\{(z,w_{i},0)\}\cup\bigcup_{w_{i},w_{j}\in V}\{(w_{i},w_{j},2)\}
    wi,SjV{(wi,Sj,2)}SiV{(z,Si,2)}Si,SjV{(Si,Sj,2)}uiSj{(ui,Sj,2)}\cup\bigcup_{w_{i},S_{j}\in V}\{(w_{i},S_{j},2)\}\cup\bigcup_{S_{i}\in V}\{(z,S_{i},2)\}\cup\bigcup_{S_{i},S_{j}\in V}\{(S_{i},S_{j},2)\}\cup\bigcup_{u_{i}\in S_{j}}\{(u_{i},S_{j},2)\},

  • T=3T=3.

An example of the construction of the network G(X)G(X) is presented in Figure 7.

Refer to caption
Figure 7: Construction used in the proof of Theorem 6. On the left there is an instance XX of the Minimum Set Cover problem, while on the right there is a network G(X)G(X) constructed as part of the Minimum Temporal Hiding problem instance. The blue nodes in G(X)G(X) correspond to the elements of the universe UU, while the evader is marked red. Dashed lines correspond to contacts at moment 0, dotted lines to contacts at moment 11, and solid lines to contact at moment 22. Green lines depict the contacts allowed to be added. Contacts between nodes SiS_{i} and wjw_{j} are depicted grey for better readability.

The formula of function ff is then f(X)=(G(X),v,A^,R^,c,ω)f(X)=(G(X),v^{\dagger},\widehat{A},\widehat{R},c,\omega), where:

  • G(X)G(X) is the temporal network we just constructed,

  • vv^{\dagger} is the evader,

  • A^={(z,Si,1):SiV}\widehat{A}=\{(z,S_{i},1):S_{i}\in V\},

  • R^=\widehat{R}=\emptyset, i.e., none of the edges can be removed,

  • cc is the temporal betweenness centrality,

  • ω=1\omega=1 is the safety threshold.

Let AA^{*} be the solution to the constructed instance of the Minimum Temporal Hiding problem f(X)f(X) (notice that since R^=\widehat{R}=\emptyset, we must have R=R^{*}=\emptyset). The function gg computing corresponding solution to the instance XX of the Minimum Set Cover problem is now g(X,A)={SiS:(z,Si,0)A}g(X,A^{*})=\{S_{i}\in S:(z,S_{i},0)\in A^{*}\}, i.e., SS^{*} is the set of all sets SiS_{i} such that their corresponding nodes SiS_{i} are connected with zz via the contacts in AA^{*}.

Now, we will show that g(X,A)g(X,A^{*}) is indeed a correct solution to XX, i.e., that it covers the entire universe. Let qAq_{A} denote the number of nodes ujUu_{j}\in U such that λG(X)A(z,uj)<\lambda^{*}_{G(X)\cup A}(z,u_{j})<\infty for a given set of contacts AA^A\subseteq\widehat{A}. Centrality values after adding AA to the network G(X)G(X) are as follows:

  • cB(G(X)A,v)=(|U|+1)(|U|1)(n1)(n2)T=|U|21(n1)(n2)Tc_{B}(G(X)\cup A,v^{\dagger})=\frac{(|U|+1)(|U|-1)}{(n-1)(n-2)T}=\frac{|U|^{2}-1}{(n-1)(n-2)T}, as vv^{\dagger} controls all shortest temporal paths from nodes xix_{i} to yiy_{i},

  • cB(G(X)A,z)=qA|U|(n1)(n2)Tc_{B}(G(X)\cup A,z)=\frac{q_{A}|U|}{(n-1)(n-2)T}, as zz controls all shortest temporal paths from nodes wiw_{i} to nodes uju_{j} that are reachable from zz,

  • cB(G(X)A,Si)(|U|+1)(|U|1)(n1)(n2)T=|U|21(n1)(n2)T=cB(G(X)A,v)c_{B}(G(X)\cup A,S_{i})\leq\frac{(|U|+1)(|U|-1)}{(n-1)(n-2)T}=\frac{|U|^{2}-1}{(n-1)(n-2)T}=c_{B}(G(X)\cup A,v^{\dagger}), as SiS_{i} can control only paths from nodes wiw_{i} and zz (there are |U|+1|U|+1 such nodes) to nodes uju_{j} that SiS_{i} has contact with (we made an assumption that there are at most |U|1|U|-1 such nodes),

  • cB(G(X)A,wi)=cB(G(X)A,ui)=cB(G(X)A,xi)=cB(G(X)A,yi)=0<cB(G(X)A,v)c_{B}(G(X)\cup A,w_{i})=c_{B}(G(X)\cup A,u_{i})=c_{B}(G(X)\cup A,x_{i})=c_{B}(G(X)\cup A,y_{i})=0<c_{B}(G(X)\cup A,v^{\dagger}), as these node do not control any shortest paths.

Therefore, the safety threshold ω\omega is satisfied if and only if cB(G(X)A,z)>cB(G(X)A,v)c_{B}(G(X)\cup A,z)>c_{B}(G(X)\cup A,v^{\dagger}), which is the case when qA=|U|q_{A}=|U|. Hence, if AA^{*} is a solution to f(X)f(X), then for every node uju_{j} there exists a node SiS_{i} such that (z,Si,1)A(z,S_{i},1)\in A^{*} and uju_{j} is connected with SiS_{i}. However, because of the way we constructed the network G(X)G(X), this can only be the case when ujSiu_{j}\in S_{i} in the given instance XX of the Minimum Set Cover problem. Therefore, for every element of the universe uju_{j} there exists a set SiS_{i} such that ujSiu_{j}\in S_{i} and SiS_{i} in g(X,A)g(X,A^{*}), which implies that g(X,A)g(X,A^{*}) is a solution to the given instance XX of the Minimum Set Cover problem, i.e., it covers the universe. What is more, since |g(X,A)|=|A||g(X,A^{*})|=|A^{*}|, we also have that the optimal solutions to both instances are of the same size.

Now, assume that there exists an rr-approximation algorithm for the Minimum Temporal Hiding problem where r=(1ϵ)ln|A^|r=(1-\epsilon)\ln|\widehat{A}| for some ϵ>0\epsilon>0. Let us use this algorithm to solve the constructed instance f(X)f(X) and consider solution g(X,A)g(X,A^{*}) to the given instance XX of the Minimum Set Cover problem. Since the size of the optimal solution is the same for both instances, we obtained an approximation algorithm that solves Minimum Set Cover problem to within (1ϵ)lnn(1-\epsilon)\ln n for ϵ>0\epsilon>0. However, Dinur and Steurer [6] showed that the Minimum Set Cover problem cannot be approximated within a ratio of (1ϵ)lnn(1-\epsilon)\ln n for any ϵ>0\epsilon>0, unless P=NPP=NP. Therefore, such approximation algorithm for the Minimum Temporal Hiding problem cannot exist, unless P=NPP=NP. This concludes the proof. ∎

5 Empirical Analysis

In this section, we present the experimental analysis of the problem using simulations.

Table 2: Datasets considered in the simulations.
Network |V||V| |K||K|
Bluetooth [27] 74 87491
Call center [33] 52 1182
Cambridge [4] 187 8769
Conference 1 [19] 113 8218
Conference 2 [4] 199 27165
Conference 3 [38] 402 28954
Conference 4 [38] 361 19304
Copenhagen call [40] 484 1667
Copenhagen SMS [40] 535 6165
Diary [37] 49 1427
High school 1 [29] 312 28780
High school 2 [29] 310 35592
High school 3 [29] 303 30383
High school 4 [29] 295 28112
High school 5 [29] 299 26391
Hospital [47] 75 9313
Hospital colocation [47] 73 35200
Intel [4] 113 7408
Kenya [24] 52 2070
Office [11] 92 2679
Office colocation [11] 95 45357
Polish manufacturer email [31] 167 43360
Primary school 1 [39] 236 52339
Primary school 2 [39] 238 56313
Reality [7] 64 4251
Romania [28] 42 19045
St Andrews [2] 25 1483
Undergrads call [27] 75 3574
Undergrads SMS [27] 41 758
University couples call [1] 126 31155
University couples SMS [1] 110 11962

5.1 Datasets

In our simulations we consider a number of real-life temporal network datasets. All datasets are presented in Table 2.

Given that the time resolutions of different datasets are vastly different, we perform the following normalization procedure. We divide the time interval between the first and the last timestamp in the original dataset into 10001000 equal subintervals. For each pair of nodes we consider them to have contact at time step tt if and only if the tt-th subinterval contains at least one contact in the original dataset. The length of the time interval of such normalized dataset is then T=1000T=1000.

The third column of Table 2 contains the number of contacts after performing the normalization procedure. Such normalized datasets are then used in the simulations.

Refer to caption
Figure 8: Effects of hiding the top 1010 nodes in each centrality ranking. Smaller plots present results for different real-life networks. The large central plot presents data averaged over all networks (with each point corresponding to an average over nodes at a particular position in the ranking of a given centrality). The y-axis corresponds to the change in the evader’s centrality ranking relative to the number of all nodes (with positive values indicating that the evader becomes better hidden due to the heuristic). The x-axis corresponds to the evader’s influence relative to their initial influence. Different colors correspond to centrality measures, while various shapes correspond to heuristics.
Table 3: Temporal node descriptors.
Type Descriptor Definition
ϕC\phi_{C} Fraction of node’s contacts when half of all contacts happened
ϕT\phi_{T} Fraction of node’s contacts at half the time TT
ρC\rho_{C} Fraction of node’s edges present when half of all contacts happened
ρT\rho_{T} Fraction of node’s edges present at half the time TT
ρc\rho_{c} Fraction of node’s edges present at both the first and last 5%5\% of all contacts
Time evolution ρt\rho_{t} Fraction of node’s edges present at both the first and last 5%5\% of the time TT
εm\varepsilon_{m} Mean of intercontact times over node’s edges
εs\varepsilon_{s} Standard deviation of intercontact times over node’s edges
εv\varepsilon_{v} Coefficient of variation of intercontact times over node’s edges
εk\varepsilon_{k} Skewness of intercontact times over node’s edges
λm\lambda_{m} Mean of the number of other contacts between two consecutive contacts
of a node’s edge
λs\lambda_{s} Standard deviation of the number of other contacts between two consecutive
contacts of a node’s edge
λv\lambda_{v} Coefficient of variation of the number of other contacts between two consecutive
contacts of a node’s edge
λk\lambda_{k} Skewness of the number of other contacts between two consecutive
Edge activity contacts of a node’s edge
νm\nu_{m} Like εm\varepsilon_{m} but for node’s contacts
νs\nu_{s} Like εs\varepsilon_{s} but for node’s contacts
νv\nu_{v} Like εv\varepsilon_{v} but for node’s contacts
νk\nu_{k} Like εk\varepsilon_{k} but for node’s contacts
ηm\eta_{m} Like λm\lambda_{m} but for node’s contacts
ηs\eta_{s} Like λsc\lambda^{c}_{s} but for node’s contacts
ηv\eta_{v} Like λvc\lambda^{c}_{v} but for node’s contacts
Node activity ηkc\eta^{c}_{k} Like λkc\lambda^{c}_{k} but for node’s contacts
δ\delta Degree of the node
δR\delta_{R} Degree of the node divided by the average degree
κ\kappa Number of contacts of the node
κR\kappa_{R} Number of contacts of the node divided by the average number of contacts
Network structure ζ\zeta Local clustering coefficient of the node

5.2 Heuristics

As shown in Section 4, the task of finding an optimal way to hide from temporal centralities is computationally intractable. Hence, we now consider a number of heuristic solutions that add or remove contacts from the network in hope of improving the concealment of the evader:

Remove Random

Uniformly at random selects a neighbor of the evader, then removes one of the contacts with this neighbor (chosen uniformly at random),

Remove Minimum

Removes one of the contacts (chosen uniformly at random) with a neighbor with the smallest number of contacts with the evader,

Add Random

Uniformly at random selects a neighbor of the evader, then adds another contact with this neighbor (the moment of which is chosen uniformly at random),

Add New

Adds a contact (the moment of which is chosen uniformly at random) between the evader and a second degree neighbor of the evader (chosen uniformly at random), i.e., a node that has at least one common neighbor with the evader, but is not the evader’s neighbor.

For both adding and removing contacts we consider two heuristics—one selecting contacts to add or remove uniformly at random, and one with a more strategic approach, focused either on removing all contacts with some neighbors, or on finding new neighbors.

Refer to caption Refer to caption
Refer to caption Refer to caption
Refer to caption
Figure 9: Lasso regression coefficients of the node descriptors. In each plot the x-axis corresponds to descriptors (defined in Table 3) with non-zero coefficients in the Lasso regression, while the y-axis corresponds to the value of said coefficients. Different colors corresponds to different hiding heuristics.

5.3 Basic Simulations

Given a temporal network GG and a centrality measure cc, we select 1010 top nodes in the centrality ranking as the potential evaders. We then attempt to hide each such evader vv^{\dagger} using each of the considered heuristics described above. For each heuristic, we add or remove 5%5\% of the evader’s contacts (at least ten for nodes with exceptionally few contacts).

We measure the following values before and after the hiding process:

  • the position of vv^{\dagger} in the ranking of cc,

  • the centrality value of vv^{\dagger} according to cc,

  • the influence of vv^{\dagger} over the network, measured as the average probability that vv^{\dagger} gets infected if a Susceptible-Infected process starts in a different node times the expected number of infected nodes if a Susceptible-Infected process starts in vv^{\dagger} (we consider the process with the probability of infection 10%10\%).

We repeated the simulation 100100 times for each network and presented the results as averages.

Figure 8 presents the results of our simulations. As shown in the figure, in the vast majority of cases, the removal of contacts results in the evader’s being more hidden, i.e., dropping in centrality ranking and becoming less influential. At the same time, adding contacts to the network has the opposite effects, i.e., the evader becomes more influential and more exposed to temporal centrality analysis. It suggests a centrality-influence trade-off in temporal networks. When comparing the effects of heuristics performing random changes, be it adding or removing contacts, with those performing strategic manipulation, i.e., Remove Minimum and Add New heuristics, we can see that using strategic heuristics result in a more pronounced effect, i.e., the magnitude of change in centrality ranking or influence value is greater than in the case of their random counterparts. Finally, when comparing different centrality measures, the betweenness centrality is, on average, the most sensitive to manipulation, i.e., executing the hiding process results in the greatest change in the betweenness centrality ranking. The next most sensitive centrality measure is the closeness centrality, with the degree and the eigenvector centrality measures being the most resilient on average.

5.4 Regression Analysis

The results of the simulations presented so far give us some insight into how effective different hiding heuristics can be. Still, they do not explain what qualities of the evader determine how successfully it can hide from temporal centralities. To this end, we now perform the regression analysis of the hiding process. Since the analysis presented in the previous section showed that only removal heuristics successfully hide the evader, we will focus our attention on them.

For each node considered an evader in our experiments, we compute several descriptors, i.e., its characteristics relating to its contacts and the network structure in which it is embedded. We consider four different categories of descriptors: time evolution, edge activity, node activity, and network structure. All descriptors are presented in Table 3.

Figure 9 presents the regression coefficients computed using the Lasso regression methods [46]. Since the Lasso regression performs variable selection, not all descriptors appear in the figure (the coefficients for the omitted descriptors are deemed zero). As seen from the figure, the exact values of the coefficients depend on the temporal centrality under consideration. Nevertheless, we can see some trends. The descriptors that show a strong positive correlation with the evader’s ability to hide from at least two temporal centralities are the mean of the intercontact time of the evader’s contacts νm\nu_{m} and the mean of the number of other contacts between two consecutive contacts over node’s contacts ηm\eta_{m}. On the other hand, the descriptors that show negative correlation vary greatly between centrality measures. Interestingly, the local clustering coefficient of the evader ζ\zeta has positive regression coefficients values for more locally-oriented centralities, i.e., degree and eigenvector, but negative values for more global centralities, i.e., closeness and betweenness.

6 Conclusions

In this article, we analyzed both theoretically and empirically the problem faced by an evader—a member of a temporal social network who wishes to avoid being detected by temporal centrality measures. As part of the theoretical analysis, we defined the decision and the optimization versions of the problem faced by the evader. We proved that the decision version is NP-complete even for very basic temporal centrality measures. As for the optimization version, we presented a 22-approximation algorithm for the degree temporal centrality while showing that the problem is inapproximable within logarithmic bounds for the closeness and betweenness centrality measures. As part of the empirical analysis, we compared the effectiveness of several hiding heuristics in real-life temporal social networks, finding that removing existing contacts is significantly more effective in avoiding detection by temporal centralities than adding new contacts. Moreover, using regression analysis, we determined that the nodes whose contacts are more distributed throughout the analyzed period are, on average, more successful in obscuring their central position in the network. More broadly, our study contributes to the literature on hiding from social network analysis tools by considering temporal networks, which are not only more general, but also much more challenging to analyze compared to static networks.

Our findings suggest that while it is computationally infeasible to find an optimal way of hiding from temporal centrality measures, skillful manipulation of one’s contacts can still help maintain some semblance of safety even when utilizing heuristic solutions. At the same time, considering the temporal aspects of the network activity offers much more diverse ways of hiding one’s importance. Whereas in a static network, a member of the network can only decide who to connect or avoid, in a temporal network, they can also determine the distribution of the contacts, thus creating a much richer strategic space that should be further analyzed in the future research.

Acknowledgments

P.H. was supported by JSPS KAKENHI Grant Number JP 21H04595.

References

  • [1] N. Aharony, W. Pan, C. Ip, I. Khayal, and A. Pentland. Socialfmri: Investigating and shaping social mechanisms in the real world. Pervasive Mob. Comput., 7(6):643–659, 2011.
  • [2] G. Bigwood, T. Henderson, D. Rehunathan, M. Bateman, and S. Bhatti. Crawdad dataset st andrews/sassy (v. 2011-06-03). CRAWDAD wireless network data archive, 2011. Downloaded June 2011.
  • [3] J. Candia, M. C. González, P. Wang, T. Schoenharl, G. Madey, and A.-L. Barabási. Uncovering individual and collective human dynamics from mobile phone records. J. Phys. A, 41(22):224015, 2008.
  • [4] A. Chaintreau, P. Hui, J. Crowcroft, C. Diot, R. Gass, and J. Scott. Impact of human mobility on opportunistic forwarding algorithms. IEEE Trans. Mob. Comput., 6(6):606–620, June 2007.
  • [5] P. Dey and S. Medya. Covert networks: How hard is it to hide? In Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, pages 628–637, Montreal, Canada, 2019. IFAAMAS.
  • [6] I. Dinur and D. Steurer. Analytical approach to parallel repetition. In Proceedings of the forty-sixth annual ACM symposium on Theory of computing, pages 624–633. ACM, 2014.
  • [7] N. Eagle and A. S. Pentland. Reality mining: sensing complex social systems. Pers. Ubiquitous Comput., 10(4):255–268, 2006.
  • [8] J.-P. Eckmann, E. Moses, and D. Sergi. Entropy of dialogues creates coherent structures in e-mail traffic. Proc. Natl. Acad. Sci. USA, 101(40):14333–14337, 2004.
  • [9] EU. Regulation 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC. Official Journal of the European Union, L119:1–88, May 2016.
  • [10] B. C. Fung, K. Wang, R. Chen, and P. S. Yu. Privacy-preserving data publishing: A survey of recent developments. ACM Comput. Surv., 42(4):1–53, 2010.
  • [11] M. Genois, C. L. Vestergaard, J. Fournet, A. Panisson, I. Bonmarin, and A. Barrat. Data on face-to-face contacts in an office building suggest a low-cost vaccination strategy based on community linkers. Network Science, 3:326–347, 9 2015.
  • [12] R. Gross and A. Acquisti. Information revelation and privacy in online social networks. In Proceedings of the 2005 ACM workshop on Privacy in the electronic society, pages 71–80, 2005.
  • [13] M. Hay, G. Miklau, D. Jensen, P. Weis, and S. Srivastava. Anonymizing social networks. Technical report, Computer Science Department, University of Massachusetts Amherst, 2007. No. 07–19.
  • [14] P. Holme. Epidemiologically optimal static networks from temporal network data. PLoS Comput. Biol., 9(7):e1003142, 2013.
  • [15] P. Holme and N. Masuda. The basic reproduction number as a predictor for epidemic outbreaks in temporal networks. PLOS One, 10(3):e0120567, 2015.
  • [16] P. Holme and J. Saramäki. Temporal networks. Phys. Rep., 519(3):97–125, 2012.
  • [17] Q. Huang, C. Zhao, X. Zhang, X. Wang, and D. Yi. Centrality measures in temporal networks with time series analysis. EPL (Europhys. Lett.), 118(3):36001, 2017.
  • [18] J. Isaak and M. J. Hanna. User data privacy: Facebook, cambridge analytica, and privacy protection. Computer, 51(8):56–59, 2018.
  • [19] L. Isella, J. Stehlé, A. Barrat, C. Cattuto, J.-F. Pinton, and W. Van den Broeck. What’s in a crowd? analysis of face-to-face behavioral networks. J. Theor. Biol., 271(1):166–180, 2011.
  • [20] C. Jernigan and B. F. Mistree. Gaydar: Facebook friendships expose sexual orientation. First Monday, 2009.
  • [21] M. Kearns, A. Roth, Z. S. Wu, and G. Yaroslavtsev. Private algorithms for the protected in social network search. Proc. Natl. Acad. Sci. USA, page 201510612, 2016.
  • [22] V. Khatri and C. V. Brown. Designing data governance. Comm. ACM, 53(1):148–152, 2010.
  • [23] H. Kim and R. Anderson. Temporal node centrality in complex networks. Phys. Rev. E, 85(2):026107, 2012.
  • [24] M. C. Kiti, M. Tizzoni, T. M. Kinyanjui, D. C. Koech, P. K. Munywoki, M. Meriac, L. Cappa, A. Panisson, A. Barrat, C. Cattuto, et al. Quantifying social contacts in a household setting of rural kenya using wearable proximity sensors. EPJ Data Sci., 5(1):1–21, 2016.
  • [25] J. I. Lane, V. Stodden, S. Bender, and H. Nissenbaum, editors. Privacy, Big Data, and the Public Good: Frameworks for Engagement. Cambridge University Press, 2014.
  • [26] L. Lv, K. Zhang, T. Zhang, X. Li, J. Zhang, and W. Xue. Eigenvector centrality measure based on node similarity for multilayer and temporal networks. IEEE Access, 7:115725–115733, 2019.
  • [27] A. Madan, M. Cebrian, S. Moturu, K. Farrahi, and A. Pentland. Sensing the ‘health state’ of a community. IEEE Pervasive Comput., 11(4):36–45, 2012.
  • [28] R.-C. Marin, C. Dobre, and F. Xhafa. Exploring predictability in mobile interaction. In 2012 Third International Conference on Emerging Intelligent Data and Web Technologies, pages 133–139. IEEE, 2012.
  • [29] R. Mastrandrea, J. Fournet, and A. Barrat. Contact patterns in a high school: a comparison between data collected using wearable sensors, contact diaries and friendship surveys. PLOS One, 10(9):e0136497, 2015.
  • [30] T. P. Michalak, T. Rahwan, and M. Wooldridge. Strategic social network analysis. In Thirty-First AAAI Conference on Artificial Intelligence, 2017.
  • [31] R. Michalski, S. Palus, and P. Kazienko. Matching organizational structure and social network extracted from email communication. In Lecture Notes in Business Information Processing, volume 87, pages 197–206, Berlin Heidelberg, 2011. Springer.
  • [32] A. Mislove, B. Viswanath, K. P. Gummadi, and P. Druschel. You are who you know: inferring user profiles in online social networks. In Proceedings of the third ACM international conference on Web search and data mining, pages 251–260, 2010.
  • [33] D. Olguin, B. Waber, T. Kim, A. Mohan, K. Ara, and A. Pentland. Sensible organizations: Technology and methodology for automatically measuring organizational behavior. IEEE Trans. Syst. Man Cybern. Syst., 39(1):43–55, 2009.
  • [34] R. K. Pan and J. Saramäki. Path lengths, correlations, and centrality in temporal networks. Phys. Rev. E, 84(1):016105, 2011.
  • [35] T. M. Przytycka, M. Singh, and D. K. Slonim. Toward the dynamic interactome: it’s about time. Brief. Bioinform., 11(1):15–29, 2010.
  • [36] A. Rao, A. O. Hero, D. J. States, and J. D. Engel. Inferring time-varying network topologies from gene expression data. Eurasip J. Bioinform. Syst. Biol., 2007:1–12, 2007.
  • [37] J. M. Read, K. T. Eames, and W. J. Edmunds. Dynamic social networks and the implications for the spread of infectious disease. J. Roy. Soc. Interface, 5(26):1001–1007, 2008.
  • [38] J. Stehlé, N. Voirin, A. Barrat, C. Cattuto, V. Colizza, L. Isella, C. Régis, J.-F. Pinton, N. Khanafer, W. Van den Broeck, et al. Simulation of an SEIR infectious disease model on the dynamic contact network of conference attendees. BMC Med., 9(1):87, 2011.
  • [39] J. Stehlé, N. Voirin, A. Barrat, C. Cattuto, L. Isella, J.-F. Pinton, M. Quaggiotto, W. Van den Broeck, C. Régis, B. Lina, et al. High-resolution measurements of face-to-face contact patterns in a primary school. PLOS One, 6(8):e23176, 2011.
  • [40] A. Stopczynski, V. Sekara, P. Sapiezynski, A. Cuttone, J. E. Larsen, and S. Lehmann. Measuring large-scale social networks with high resolution. PLOS One, 9(4):e95978, 2014.
  • [41] T. Takaguchi, M. Nakamura, N. Sato, K. Yano, and N. Masuda. Predictability of conversation partners. Phys. Rev. X, 1(1):011008, 2011.
  • [42] J. Tang, M. Musolesi, C. Mascolo, V. Latora, and V. Nicosia. Analysing information flows and key mediators through temporal centrality metrics. In Proceedings of the 3rd Workshop on Social Network Systems, pages 1–6, 2010.
  • [43] D. Taylor, S. A. Myers, A. Clauset, M. A. Porter, and P. J. Mucha. Eigenvector-based centrality measures for temporal networks. Multiscale Model. Sim., 15(1):537–574, 2017.
  • [44] D. Taylor, M. A. Porter, and P. J. Mucha. Supracentrality analysis of temporal networks with directed interlayer coupling. In Temporal Network Theory, pages 325–344. Springer, 2019.
  • [45] D. Taylor, M. A. Porter, and P. J. Mucha. Tunable eigenvector-based centralities for multiplex and temporal networks. Multiscale Model. Sim., 19(1):113–147, 2021.
  • [46] R. Tibshirani. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Series B Stat. Methodol., 58(1):267–288, 1996.
  • [47] P. Vanhems, A. Barrat, C. Cattuto, J.-F. Pinton, N. Khanafer, C. Régis, B.-a. Kim, B. Comte, and N. Voirin. Estimating potential infection transmission routes in hospital wards using wearable proximity sensors. PLOS One, 8(9):e73970, 2013.
  • [48] M. Waniek, M. Cebrian, P. Holme, and T. Rahwan. Social diffusion sources can escape detection, 2021.
  • [49] M. Waniek, T. Michalak, and T. Rahwan. Hiding in multilayer networks. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 1021–1028, 2020.
  • [50] M. Waniek, T. P. Michalak, T. Rahwan, and M. Wooldridge. On the construction of covert networks. In Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems, pages 1341–1349, 2017.
  • [51] M. Waniek, T. P. Michalak, M. J. Wooldridge, and T. Rahwan. Hiding individuals and communities in a social network. Nat. Hum. Behav., 2(2):139–147, 2018.
  • [52] M. Waniek, J. Woundefinednica, K. Zhou, Y. Vorobeychik, T. Rahwan, and T. P. Michalak. Strategic evasion of centrality measures. In Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems, AAMAS ’21, page 1389–1397, Richland, SC, 2021. International Foundation for Autonomous Agents and Multiagent Systems.
  • [53] M. Waniek, K. Zhou, Y. Vorobeychik, E. Moro, T. P. Michalak, and T. Rahwan. How to hide one’s relationships from link prediction algorithms. Sci. Rep., 9(1):12208, 2019.
  • [54] T. Was, M. Waniek, T. Rahwan, and T. Michalak. The manipulability of centrality measures-an axiomatic approach. In Proceedings of the 19th International Conference on Autonomous Agents and MultiAgent Systems, pages 1467–1475, 2020.
  • [55] R.-R. Yin, Q. Guo, J.-N. Yang, and J.-G. Liu. Inter-layer similarity-based eigenvector centrality measures for temporal networks. Physica A, 512:165–173, 2018.
  • [56] S. Yu, M. Zhao, C. Fu, J. Zheng, H. Huang, X. Shu, Q. Xuan, and G. Chen. Target defense against link-prediction-based attacks via evolutionary perturbations. IEEE Trans. Knowl. Data Eng., 33(2):754–767, 2021.
  • [57] K. Zhou, T. P. Michalak, M. Waniek, T. Rahwan, and Y. Vorobeychik. Attacking similarity-based link prediction in social networks. In Proceedings of the 18th International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS), page 305–313, 2019.