
Population-level Balance in Signed Networks

Weijing Tang
Department of Statistics and Data Science, Carnegie Mellon University
and
Ji Zhu
Department of Statistics, University of Michigan
Abstract

Statistical network models are useful for understanding the underlying formation mechanism and characteristics of complex networks. However, statistical models for signed networks have been largely unexplored. In signed networks, there exist both positive (e.g., like, trust) and negative (e.g., dislike, distrust) edges, which are commonly seen in real-world scenarios. The positive and negative edges in signed networks lead to unique structural patterns, which pose challenges for statistical modeling. In this paper, we introduce a statistically principled latent space approach for modeling signed networks and accommodating the well-known balance theory, i.e., “the enemy of my enemy is my friend” and “the friend of my friend is my friend”. The proposed approach treats both edges and their signs as random variables, and characterizes the balance theory with a novel and natural notion of population-level balance. This approach guides us towards building a class of balanced inner-product models, and towards developing scalable algorithms via projected gradient descent to estimate the latent variables. We also establish non-asymptotic error rates for the estimates, which are further verified through simulation studies. In addition, we apply the proposed approach to an international relation network, which provides an informative and interpretable model-based visualization of countries during World War II.


Keywords: Signed Networks, Balance Theory, Latent Space Models, Projected Gradient Descent.

1 Introduction

Networks characterize connectivity relationships between individuals of a complex system and are ubiquitous in various fields, such as social science, biology, transportation, and information technology [22]. In a network, a node represents an individual and an edge between two nodes indicates the presence of a certain interaction or relation. Given the unique relational information represented by networks, many statistical models have been developed to understand the underlying mechanism of the system and help explain the observed phenomena on networks; see for example [10] for a comprehensive overview. One important class of statistical models is the latent variable model, where the presence/absence of an edge depends on the node latent variables. For example, stochastic block models use latent categorical variables to describe the block structure among nodes [1]; latent space models map nodes into a low-dimensional metric space while accounting for transitivity, homophily in node attributes, node heterogeneity, and clustering [14, 19]. Such latent variable models are attractive due to their interpretable structure, their suitability for network visualization, and their ability to support downstream network-assisted learning tasks such as node clustering, node classification, and network link prediction.

Nonetheless, most statistical network models focus only on the presence/absence of edges while ignoring different types of edges, which makes them inadequate for modeling signed networks. A signed network consists of two types of edges, positive edges and negative edges, and such polarized relationships are common in real-world networks. For example, positive and negative edges may respectively correspond to relationships of like and dislike in social networks, collaboration and competition in trading networks, or alliance and militarized dispute in international relation networks. Modeling signed networks has its own unique challenges, not merely due to the additional sign for each edge, but more importantly, because positive and negative edges affect each other's presence in certain ways. There have been various social theories that describe the structural patterns of signed networks [11, 20], an important one being the structural balance theory [12]. Specifically, the balance theory describes the distribution of different types of triangles (i.e., three nodes that are mutually connected). A triangle in a signed network is called balanced if the product of its three edge signs is positive, and it is called unbalanced otherwise (see Figure 1 for examples). The balance theory postulates that balanced triangles should be more prevalent than unbalanced triangles in signed networks, which directly coincides with the proverbs "the enemy of my enemy is my friend" and "the friend of my friend is my friend". Moreover, recent studies have found empirical evidence of the balance property in many real-world signed networks [20, 17, 9].

Figure 1: Four types of triangles in signed networks, where the left two are balanced and the right two are unbalanced.

On the other hand, there have been very few statistical models for signed networks that incorporate the balance theory into modeling. To the best of our knowledge, [8] is the only recent work; specifically, it extends the configuration model [6] to signed networks with a focus on matching not only the node degree distribution but also the sign distribution and the proportion of balanced triangles. Besides statistical models, there is a collection of work using low-rank matrix completion approaches induced by the balance theory for learning tasks such as sign prediction and clustering [15, 5]. These works assume that there are underlying signed edges (not allowing for zeros) between all possible pairs of $n$ nodes, and view the network as a fixed $n\times n$ adjacency matrix with entries in $\{+1,-1\}$. In comparison, statistical network models can provide statistically interpretable structures and account for noise in both signs and edges by modeling network distributions that precisely quantify the randomness in the observed data.

In this paper, we propose a latent space approach to accommodate the balance theory for signed networks in a statistically principled way. Specifically, we introduce a novel definition of balance at the population level, which matches the balance theory in nature while viewing an observed network as the realization of a random quantity. For concreteness, we consider an undirected signed network with $n$ nodes denoted by a symmetric signed adjacency matrix $A$, with $A_{ij}=A_{ji}=1$ if node $i$ and node $j$ are linked by a positive edge, $A_{ij}=A_{ji}=-1$ if node $i$ and node $j$ are linked by a negative edge, and $A_{ij}=A_{ji}=0$ if there is no edge between $i$ and $j$. We assume there is no self-loop and thereby the diagonal elements of $A$ are zeros. We assume the $A_{ij}$ to be random variables taking values in $\{-1,0,1\}$ and define the notion of balance at the population level as follows.

Definition 1 (Population-level balanced network).

A network is population-level balanced if

\[
E\big(A_{ij}A_{j\ell}A_{\ell i}\,\big|\,|A_{ij}A_{j\ell}A_{\ell i}|=1\big)>0, \text{ for any three different nodes } (i,j,\ell).
\]

This definition requires the expected product of signs on any triangle to be positive, but it does not require all triangles to be balanced in an observed signed network. Furthermore, the stochastic notion in Definition 1 allows us to investigate which generating mechanisms of signed networks are inherently of population-level balance. Specifically, we will focus on a general class of latent space models, due to the aforementioned merits of latent space models. Rigorous descriptions are provided in Section 2. The key finding is that, if there exists a partition of the latent space such that edges tend to be positive within the same subset and negative between different subsets, then the network generated from such a latent space model is population-level balanced.
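To make Definition 1 concrete, the following Python sketch gives a Monte Carlo check of population-level balance for a single node triple; it assumes a hypothetical user-supplied function `sample_network(rng)` that draws a signed adjacency matrix from the model under study, and it is illustrative rather than part of the proposed methodology.

```python
import numpy as np

def balance_check(sample_network, triple=(0, 1, 2), n_draws=2000, seed=0):
    """Monte Carlo estimate of E(A_ij A_jl A_li | |A_ij A_jl A_li| = 1)."""
    rng = np.random.default_rng(seed)
    i, j, l = triple
    products = []
    for _ in range(n_draws):
        A = sample_network(rng)          # n x n matrix with entries in {-1, 0, 1}
        p = A[i, j] * A[j, l] * A[l, i]  # product of signs on the triangle (i, j, l)
        if abs(p) == 1:                  # condition on the triangle being present
            products.append(p)
    return np.mean(products) if products else np.nan

# Under a population-level balanced model, the returned value should be
# positive (up to Monte Carlo error) for every node triple.
```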

Based on this finding, we further propose a class of balanced inner-product models that directly capture the population-level balance. A unique difference from latent space models for unsigned networks is that we introduce an additional latent polar variable for each node. In particular, when the product of the latent polar variables of two nodes has a large positive value, the sign of an edge between them is more likely to be positive; a node whose latent polar variable is zero has no preference between signs when forming edges with other nodes. We note that it is this novel introduction of latent polar variables that enables modeling signed networks with the population-level balance.

This paper is organized as follows. We introduce the latent space approach for signed networks in Section 2, where we also provide a sufficient condition for the population-level balance. We present the proposed balanced inner-product models in Section 3. We develop two scalable estimation methods in Section 4 and establish their non-asymptotic error rates in Section 5, which are further validated by simulation studies in Section 6. We demonstrate the effectiveness of the proposed approach in modeling a real-world signed network for international relations in Section 7. All proofs are given in the Supplemental Material.

2 A Latent Space Approach for Signed Networks

In this section, we propose a probabilistic generative process for undirected signed networks with $n$ nodes. Recall that $A\in\{-1,0,1\}^{n\times n}$ is the signed adjacency matrix. Suppose the latent space $\mathcal{U}_0$ is endowed with the probability measure $P_u$; $B(\cdot,\cdot):\mathcal{U}_0\times\mathcal{U}_0\rightarrow(0,1)$ and $f(\cdot,\cdot):\mathcal{U}_0\times\mathcal{U}_0\rightarrow(-\infty,\infty)$ are two measurable symmetric functions.

Definition 2 (A general latent space model for signed networks $G(n,\mathcal{U}_0,P_u,B,f)$).

For $1\leq i\leq n$, let $u_i\in\mathcal{U}_0$ be the latent vector independently sampled from the distribution $P_u$. Given the latent vectors of a pair of nodes $i$ and $j$, independently of other pairs, an edge between node $i$ and node $j$ is drawn with probability $B(u_i,u_j)$, i.e.,

\[
|A_{ij}|\overset{\text{ind.}}{\sim}\text{Bernoulli}(P_{ij})\ \text{ with }\ P_{ij}=B(u_i,u_j);
\]

then for each edge (i.e., $|A_{ij}|=1$), independently of all others, it takes the positive sign with logit $f(u_i,u_j)$ and the negative sign otherwise, i.e.,

\[
\text{logit}\big(A_{ij}=1\,\big|\,|A_{ij}|=1\big)=f(u_i,u_j).
\]

We write $A\sim G(n,\mathcal{U}_0,P_u,B,f)$ to denote a signed network with $n$ nodes generated from the above procedure.
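A minimal sketch of the two-stage generative process in Definition 2 is given below; the particular choices of $P_u$, $B$, and $f$ in the example are illustrative assumptions, not specifications from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_signed_network(n, sample_u, B, f, rng):
    """Draw A ~ G(n, U_0, P_u, B, f): first edges, then signs (Definition 2)."""
    u = [sample_u(rng) for _ in range(n)]            # latent vectors u_i ~ P_u
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < B(u[i], u[j]):         # edge with probability B(u_i, u_j)
                # positive sign with probability sigmoid(f(u_i, u_j))
                sign = 1.0 if rng.random() < sigmoid(f(u[i], u[j])) else -1.0
                A[i, j] = A[j, i] = sign
    return A

# Illustrative choices: Gaussian latent vectors on R^2, a distance-based edge
# probability, and f(a, b) = phi(a) * phi(b) with phi(u) = u[0].
rng = np.random.default_rng(0)
A = sample_signed_network(
    n=50,
    sample_u=lambda rng: rng.normal(size=2),
    B=lambda a, b: 0.1 + 0.5 * np.exp(-np.sum((a - b) ** 2)),
    f=lambda a, b: a[0] * b[0],
    rng=rng,
)
```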

Note that in the network generative process in Definition 2, the first part for generating edges covers many existing latent variable models for unsigned networks as special cases by specifying different functions $B(\cdot,\cdot)$; the second part for generating signs further models the sign distribution through specifying the logit-transformed probability $f(\cdot,\cdot)$, as the sign of $f$ is the same as that of the conditional expectation of $A_{ij}$. Given this general class of latent space models for signed networks, next we identify the connection between the population-level balance and the key components of the model $(P_u,B,f)$. As we will see, this connection serves as the foremost step for incorporating the balance theory into modeling signed networks. The following proposition provides a sufficient condition on the symmetric function $f(\cdot,\cdot)$ such that the generated network is population-level balanced.

Proposition 1.

Suppose a symmetric function $f(\cdot,\cdot)$ satisfies that

\[
f(a,b)\cdot f(b,c)\cdot f(c,a)>0,\ \text{ for any } a,b,c\in\mathcal{U}, \tag{1}
\]

where $\mathcal{U}$ is a subset of $\mathcal{U}_0$ with probability $1$, i.e., $P_u(\mathcal{U})=1$. Then, a network $A\sim G(n,\mathcal{U}_0,P_u,B,f)$ with a symmetric measurable function $B(\cdot,\cdot):\mathcal{U}_0\times\mathcal{U}_0\rightarrow(0,1)$ is population-level balanced.

The proof of Proposition 1 is based on the fact that $E(A_{ij}|u_i,u_j)>0$ if and only if the logit $f(u_i,u_j)>0$, provided the probability that an edge appears between node $i$ and node $j$ is nonzero. The details are provided in the Supplemental Material. We note that though there is room for relaxing the requirement (1), its simplicity provides a feasible direction for further analyses on the form of $f$.

Since not every symmetric function $f$ satisfies (1), it is desirable to study what characteristics the function $f$ should have. To this end, we establish in Theorem 1 the necessary and sufficient conditions for the function $f$ to satisfy (1).

Theorem 1.

For a symmetric function $f(\cdot,\cdot):\mathcal{U}_0\times\mathcal{U}_0\rightarrow(-\infty,\infty)$, $f(a,b)\cdot f(b,c)\cdot f(c,a)>0$ holds for any $a,b,c\in\mathcal{U}$, where $\mathcal{U}\subset\mathcal{U}_0$ with $P_u(\mathcal{U})=1$, if and only if

(i) the function $f$ is positive on $\mathcal{U}\times\mathcal{U}$, i.e., $f(a,b)>0$ for any $a,b\in\mathcal{U}$; or

(ii) there exist two nonempty subsets $S$ and $T$, with $S\cup T=\mathcal{U}$ and $S\cap T=\varnothing$, such that $\text{sign}(f(a,b))=\mathbbm{1}(a\in S)\cdot\mathbbm{1}(b\in S)$ for any $a,b\in\mathcal{U}$, where $\mathbbm{1}(\text{event})$ is not the usual indicator function, but rather equals $1$ if the event holds and $-1$ otherwise.

Theorem 1 implies that, for a function $f$ to satisfy (1), if it is not always positive, then $\mathcal{U}$ can be divided into two nonempty disjoint subsets such that the function $f$ is positive when the two arguments belong to the same subset and negative otherwise. On the other hand, if the function $f$ is always positive, it corresponds to a trivial case in which the expected signs between all pairs of nodes are positive. Note that when $\mathcal{U}$ is discrete and finite, Theorem 1 is a direct result of [12], while our theorem can be applied to more general latent spaces.

Next, we further illustrate the implication of Theorem 1 in choosing the function $f$ to describe the population-level balance by taking commonly used latent spaces as examples.

Example 1 The latent space $\mathcal{U}_0$ can be a finite set as in stochastic block models [1]. Let $\mathcal{U}_0=\{1,\cdots,K\}$ and let $u_i$ denote the community that node $i$ belongs to. Theorem 1 implies that the $K$ communities can be further combined into two groups such that edges tend to be positive within the same group and negative between different groups, as shown in the left panel of Figure 2. We provide a rigorous description of the above result in the following corollary.

Corollary 1.1.

For a finite set $\mathcal{U}_0=\{1,\dots,K\}$, the symmetric function $f(\cdot,\cdot):\mathcal{U}_0\times\mathcal{U}_0\rightarrow(-\infty,\infty)$ satisfies $f(a,b)\cdot f(b,c)\cdot f(c,a)>0$ for any $a,b,c\in\mathcal{U}_0$ if and only if there exist a grouping function $g:\{1,\dots,K\}\rightarrow\{-1,1\}$ and constants $q_{ab}=q_{ba}>0$ such that $f(a,b)=q_{ab}\cdot g(a)\cdot g(b)$ holds for $1\leq a,b\leq K$.

Note that here the grouping function $g$ identifies two antagonistic groups in the signed network, where nodes from different groups tend to "dislike" each other.

Figure 2: Illustration of the latent space partition in Theorem 1. Left: $\mathcal{U}_0$ is a finite set, where each color corresponds to a possible state in $\mathcal{U}_0$ and the two ellipses correspond to the partition. Right: $\mathcal{U}_0$ is a Euclidean space, where the two colors correspond to the partition.

Example 2 The latent space can also be a Euclidean space as in the latent distance model and the latent projection model [14]. The following proposition provides an important class of continuous symmetric functions for which the requirement (1) is satisfied.

Proposition 2.

For the Euclidean space $\mathcal{U}_0=\mathbb{R}^k$, the requirement (1) holds if $f(a,b)=\phi(a)\phi(b)$, where $\phi(\cdot)$ is a real-valued continuous function and $P_u(u:\phi(u)\neq 0)=1$.

From the mathematical perspective, Proposition 2 is an obvious result based on the inequality in (1). However, in combination with Theorem 1, it leads us to a useful and interesting interpretation of the function $\phi$. Specifically, $\phi(\cdot)$ can be viewed as the logit (or score) of any binary "classifier" that separates $\mathcal{U}_0$ into two disjoint regions; the function $f$ is positive if the two arguments are classified into the same region and negative otherwise. For instance, a linear classifier $\phi(u)=w^\top u+\gamma$ with some $w\in\mathbb{R}^k$ and $\gamma\in\mathbb{R}$ separates $\mathcal{U}_0$ into two half-spaces. As shown in the right panel of Figure 2, the boundary of the classifier tries to cut as many negative edges as possible while retaining most positive edges within the same region.

Remark 1.

For the special case $f(a,b)=a^\top b$, [13] discussed the use of the inner product of two latent variables to capture various third-order dependence patterns among residuals in the context of regression, including the balance theory. Specifically, when the dimension $k$ is one, [13] observed that all triads among residuals are balanced, which allows for clustering nodes into two groups based on the signs of latent variables. This observation aligns with our findings on the separation of the latent space. Nonetheless, the balance concept studied in [13] is at the triangle level and does not extend to the entire network structure. In this work, we have introduced a rigorous stochastic definition of population-level balance for signed networks. Furthermore, the condition in Proposition 1 for population-level balance is more general and includes the inner product as an example.

Remark 2.

Moreover, our findings in this section can be generalized beyond triangles. In the Supplemental Material, we generalize the notion of population-level balance and extend Proposition 1 to any $\ell$-loops as well as general weighted networks. In addition, we extend the infinitely exchangeable graphon model to signed networks and expand Theorem 1 accordingly.

3 Balanced Inner-product Models

Motivated by the key finding in Theorem 1, we propose two inner-product models for signed networks that fall within the general class of latent space models in Definition 2. Both models are inherently of population-level balance. We first present the separate inner-product model and then introduce the joint inner-product model by adding an additional structural assumption. We will demonstrate the usefulness of this structural assumption in estimating the latent variables both theoretically (if it is correctly specified) and empirically in subsequent sections.

3.1 Separate inner-product model

We assume that for any $1\leq i<j\leq n$, we have

\[
|A_{ij}|=|A_{ji}|\overset{\text{ind.}}{\sim}\text{Bernoulli}(P_{ij}),\ \text{ with }\ \text{logit}(P_{ij})=\Theta_{ij}=\alpha_i+\alpha_j+z_i^\top z_j, \tag{2}
\]

where $\alpha_i\in\mathbb{R}$ and $z_i\in\mathbb{R}^k$ (for $i=1,\ldots,n$) are latent variables. Further, independently of all others, we also assume

\[
\text{logit}\big(A_{ij}=1\,\big|\,|A_{ij}|=1\big)=\eta_{ij}=v_iv_j, \tag{3}
\]

where $v_i\in\mathbb{R}$ for $i=1,\ldots,n$ are also latent variables.

The proposed separate inner-product model has the capacity to capture various commonly observed characteristics of signed networks. Specifically, the parameter $\alpha_i$ enables modeling node degree heterogeneity: a larger value of $\alpha_i$ leads to a higher probability of connecting with other nodes when other parameters are fixed. Thus, we call $\{\alpha_i\}_{i=1}^n$ degree heterogeneity parameters. Next, the inner-product formation between the latent position vectors $z_i$ and $z_j$ inherently models transitivity, i.e., nodes with common neighbors (regardless of friend or enemy) are more likely to be linked. This is because the closer the latent position vectors of two nodes are in the latent space, the larger their inner product is, and the more likely the two nodes are to connect with each other. Finally, the parameters $\{v_i\}_{i=1}^n$ model the distribution of signs through their product, which satisfies the sufficient condition (1) for the population-level balance. In particular, an edge between two nodes tends to have a positive sign when their latent variables $v_i$ and $v_j$ have the same sign, and a negative sign otherwise. Moreover, the magnitude of $v_i$ controls the discrepancy level between positive and negative signs. Therefore, we name them latent polar variables. When all latent polar variables are zeros, negative and positive signs are exchangeable.
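The model components above translate directly into edge and sign probabilities. The following sketch computes them in the natural parameterization of (2)-(3); it is for illustration only.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def model_probabilities(alpha, Z, v):
    """Edge and positive-sign probabilities under the separate model (2)-(3).

    alpha: (n,) degree heterogeneity; Z: (n, k) latent positions;
    v: (n,) latent polar variables."""
    Theta = alpha[:, None] + alpha[None, :] + Z @ Z.T  # logit of P(|A_ij| = 1)
    eta = np.outer(v, v)                               # logit of P(A_ij = 1 | edge)
    return sigmoid(Theta), sigmoid(eta)
```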

Following [29], which considers unsigned networks, we impose no distributional assumptions (priors) on the latent variables $(\alpha_i,z_i,v_i)$ for the sake of modeling flexibility and estimation scalability, in contrast to treating them as random and using Bayesian estimation approaches as in existing works [19].

For presentation simplicity, we rewrite the model in matrix form. Specifically, we have $\Theta=\alpha 1_n^\top+1_n\alpha^\top+ZZ^\top$ and $\eta=vv^\top$, where $\alpha=(\alpha_1,\cdots,\alpha_n)^\top$, $1_n$ is the all-one vector in $\mathbb{R}^n$, $Z=(z_1,\cdots,z_n)^\top\in\mathbb{R}^{n\times k}$, and $v=(v_1,\cdots,v_n)^\top\in\mathbb{R}^n$.

Identifiability To ensure identifiability of the parameters $(\alpha,Z,v)$, we provide additional constraints in Proposition 3. Given centered latent position variables, that is, $J_nZ=Z$ where $J_n=I_n-\frac{1}{n}1_n1_n^\top$, the parameters are identifiable up to an orthogonal transformation and a sign flip.

Proposition 3.

Suppose two sets of parameters $(\alpha,Z,v)$ and $(\bar{\alpha},\bar{Z},\bar{v})$ satisfy A1) $J_nZ=Z$ and $J_n\bar{Z}=\bar{Z}$; and A2) $Z\in\mathbb{R}^{n\times k}$ is of full rank. Then, they specify the same network distribution through (2) and (3) if and only if there exist an orthogonal matrix $O\in\mathbb{R}^{k\times k}$ with $O^\top O=OO^\top=I_k$ and $\kappa\in\{-1,1\}$ such that $\alpha=\bar{\alpha}$, $Z=\bar{Z}O$, and $v=\kappa\bar{v}$.

3.2 Joint inner-product model

Based on the above separate inner-product model, we further consider the dependency of the latent polar variable $v_i$ on the latent position variable $z_i$. The idea of introducing their relationship originates naturally from Proposition 2, where we view the latent polar variable $v_i$ as a function of the latent position variable $z_i$, i.e., $v_i=\phi(z_i)$ with some link function $\phi$. Modeling such a link function $\phi$ can provide more structural information about the network. On the other hand, there is flexibility in choosing the family of the link function $\phi$, which would lead to different shapes of the latent space partition induced by $\phi$. For the scope of this paper, we assume $\phi$ is a linear function of $z_i$ in the joint inner-product model, i.e., $v_i=w^\top z_i+\gamma$ with $w\in\mathbb{R}^k$ and $\gamma\in\mathbb{R}$, and discuss nonlinear alternatives in Remark 3. Specifically, the joint inner-product model is given by (2) and replaces (3) with

\[
\text{logit}\big(A_{ij}=1\,\big|\,|A_{ij}|=1\big)=\eta_{ij}=(w^\top z_i+\gamma)(w^\top z_j+\gamma). \tag{4}
\]

In particular, the hyperplane $\{z\in\mathbb{R}^k:w^\top z+\gamma=0\}$ separates the latent space into two regions. A pair of nodes tends to have a positive edge when their latent positions are located on the same side of the hyperplane and a negative edge when their latent positions are located on different sides of the hyperplane. If $w=0$ and $\gamma\neq 0$, the sign of each edge has a homogeneous logit $\gamma^2$ of being positive.

Identifiability For the joint inner-product model, the identifiability condition for the parameters $(\alpha,Z,w,\gamma)$ is established correspondingly in Proposition 4.

Proposition 4.

Suppose two sets of parameters $(\alpha,Z,w,\gamma)$ and $(\bar{\alpha},\bar{Z},\bar{w},\bar{\gamma})$ satisfy A1) $J_nZ=Z$ and $J_n\bar{Z}=\bar{Z}$; and A2) $Z\in\mathbb{R}^{n\times k}$ is of full rank. Then, they specify the same network distribution through (2) and (4) if and only if there exist an orthogonal matrix $O\in\mathbb{R}^{k\times k}$ with $O^\top O=OO^\top=I_k$ and $\kappa\in\{-1,1\}$ such that $\alpha=\bar{\alpha}$, $Z=\bar{Z}O$, $w=\kappa O^\top\bar{w}$, and $\gamma=\kappa\bar{\gamma}$.

Remark 3.

Though we use a linear link function in the joint inner-product model (4), more flexible nonlinear functions can be considered. For example, we may assume $\phi$ belongs to a reproducing kernel Hilbert space (RKHS) $\mathcal{H}$ associated with an inner product $\langle\cdot,\cdot\rangle_\mathcal{H}$ under which $\mathcal{H}$ is complete. There is a positive semidefinite kernel function $\mathbb{K}(\cdot,\cdot):\mathbb{R}^k\times\mathbb{R}^k\rightarrow\mathbb{R}_+$ such that $\phi(z_i)=\langle\phi,\mathbb{K}(\cdot,z_i)\rangle_\mathcal{H}$. Multiple choices of RKHS are available for practical use, including those with the polynomial kernel, Gaussian kernel, and Laplacian kernel [23].

4 Model Estimation

In this section, we develop two methods for fitting the proposed models (2)-(4). Both methods minimize the negative log-likelihood function of the balanced inner-product models through projected gradient descent.

Under balanced inner-product models, the negative log-likelihood function consists of two parts. The first part is derived from the probability of forming edges:

\[
\mathcal{L}_e(\alpha,Z)=\sum_{i<j}\Big\{|A_{ij}|\Theta_{ij}+\log\big(1-\sigma(\Theta_{ij})\big)\Big\},
\]

where $\Theta=\alpha 1_n^\top+1_n\alpha^\top+ZZ^\top$, and $\sigma(x)=1/(1+\exp(-x))$ is the sigmoid function, which is the inverse of the logit function. The second part is derived from the probability of assigning signs:

\[
\mathcal{L}_s(v)=\sum_{i<j}\Big\{|A_{ij}|\frac{1+A_{ij}}{2}\eta_{ij}+|A_{ij}|\log\big(1-\sigma(\eta_{ij})\big)\Big\},
\]

where $\eta=vv^\top$; under the joint inner-product model, we further have $v=Zw+\gamma 1_n$, or equivalently, $v$ belongs to the column space of $(1_n,Z)$.
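For concreteness, the two likelihood parts can be computed as follows; this is a plain transcription of the displayed formulas, not the paper's actual implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def loglik_edges(A, alpha, Z):
    """L_e(alpha, Z): edge-formation part, summed over i < j."""
    Theta = alpha[:, None] + alpha[None, :] + Z @ Z.T
    L = np.abs(A) * Theta + np.log(1.0 - sigmoid(Theta))
    return np.triu(L, k=1).sum()

def loglik_signs(A, v):
    """L_s(v): sign-assignment part; only edges with |A_ij| = 1 contribute."""
    eta = np.outer(v, v)
    L = np.abs(A) * ((1.0 + A) / 2.0 * eta + np.log(1.0 - sigmoid(eta)))
    return np.triu(L, k=1).sum()
```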

The first method estimates the parameters $(\alpha,Z)$ and $v$ separately by minimizing $\mathcal{L}_e(\alpha,Z)$ and $\mathcal{L}_s(v)$ respectively; hence we name it the separate estimation method. Note that the separate estimation method does not depend on a specific relationship between the latent polar variables $v$ and the latent position vectors $Z$. Therefore, the separate estimation method can always be applied regardless of the underlying link function $\phi$. Alternatively, we also propose a joint estimation method tailored for the joint inner-product model, which exploits the structural assumption for more accurate estimation. Specifically, we jointly estimate the parameters $(\alpha,Z,v)$ by minimizing a weighted sum of $\mathcal{L}_e(\alpha,Z)$ and $\mathcal{L}_s(v)$, while constraining $v$ to be in the column space of $(1_n,Z)$.

Notation Before presenting the algorithm details, we first introduce the following general notation to be used hereafter. For any $X\in\mathbb{R}^{d_1\times d_2}$, $X_{i*}$ and $X_{*j}$ denote the $i$-th row and $j$-th column of the matrix $X$ respectively, and for any function $\omega(\cdot)$, $\omega(X)$ represents applying the function $\omega(\cdot)$ element-wise to $X$, that is, $\omega(X)\in\mathbb{R}^{d_1\times d_2}$ and $[\omega(X)]_{ij}=\omega(X_{ij})$. We use $\circ$ to denote the Hadamard product, that is, for any two matrices $X,Y\in\mathbb{R}^{d_1\times d_2}$, $X\circ Y\in\mathbb{R}^{d_1\times d_2}$ and $[X\circ Y]_{ij}=X_{ij}Y_{ij}$. Moreover, we use $\|X\|_F$, $\|X\|_{op}$, $\|X\|_*$, and $\|X\|_{\max}$ to denote the Frobenius norm, the operator norm, the nuclear norm, and the max norm of a matrix respectively. We use $col(X)$ to denote the column space of $X$. For a vector $x\in\mathbb{R}^d$, we use $\|x\|$ to denote the Euclidean norm.

4.1 Separate estimation method

First, to estimate parameters (α,Z)(\alpha,Z), we solve the non-convex optimization problem below:

\[
\min_{\alpha\in\mathbb{R}^n,\,Z\in\mathbb{R}^{n\times k}}\ -\sum_{i,j}\Big\{|A_{ij}|\Theta_{ij}+\log\big(1-\sigma(\Theta_{ij})\big)\Big\}, \tag{5}
\]

subject to $\Theta=\alpha 1_n^\top+1_n\alpha^\top+ZZ^\top$ and $Z=J_nZ$. In particular, the signed adjacency matrix enters the objective function through its absolute value, which leads to the same optimization problem studied in [29] when there is no edge covariate. Here we adopt the projected gradient descent algorithm along with the initialization method proposed in [29] because of their theoretical guarantees and scalability to large networks. We provide the detailed description of the method in Algorithm S1 and the initialization algorithm in the Supplemental Material.
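The following Python sketch illustrates projected gradient descent for (5) in the spirit of Algorithm S1; the step size schedule and the random initialization are simplifying assumptions (the paper's algorithm and initialization are detailed in its Supplemental Material).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fit_edges_pgd(A, k, M1, steps=500, tau=0.2, seed=0):
    """Projected gradient descent on (5): a sketch of Algorithm S1."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    absA = np.abs(A)
    Z = rng.normal(scale=0.1, size=(n, k))
    alpha = np.zeros(n)
    J = np.eye(n) - np.ones((n, n)) / n
    for _ in range(steps):
        Theta = alpha[:, None] + alpha[None, :] + Z @ Z.T
        G = sigmoid(Theta) - absA                 # gradient of (5) w.r.t. Theta
        np.fill_diagonal(G, 0.0)                  # exclude self-loops
        Z = Z - (tau / n) * (2.0 * G @ Z)
        alpha = alpha - (tau / n) * (2.0 * G @ np.ones(n))
        # projections onto the constraint sets C_Z and C_alpha
        Z = J @ Z                                               # center: J_n Z = Z
        norms = np.linalg.norm(Z, axis=1, keepdims=True)
        Z = Z * np.minimum(1.0, np.sqrt(M1 / 2.0) / np.maximum(norms, 1e-12))
        alpha = np.clip(alpha, -M1 / 4.0, M1 / 4.0)
    return alpha, Z
```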

Next, to estimate the latent polar variables $v$, we solve another non-convex optimization problem, i.e.,

\[
\min_{v\in\mathbb{R}^n}\ -\sum_{i,j}|A_{ij}|\Big\{\frac{1+A_{ij}}{2}\eta_{ij}+\log\big(1-\sigma(\eta_{ij})\big)\Big\}\ \text{ subject to }\ \eta=vv^\top. \tag{6}
\]

Similarly, we develop a fast gradient descent algorithm, which is summarized in Algorithm S2. We also use an initialization algorithm based on the universal singular value thresholding proposed by [26] (see the Supplemental Material).
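A corresponding sketch of the gradient descent for (6), in the spirit of Algorithm S2, is given below; the step size and initialization are again simplifying assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fit_signs_gd(A, M3, steps=500, tau=0.2, seed=0):
    """Gradient descent on (6): a sketch of Algorithm S2.
    Only observed edges (|A_ij| = 1) enter the gradient."""
    rng = np.random.default_rng(seed)
    absA = np.abs(A)
    B = absA * (1.0 + A) / 2.0            # 1 for positive edges, 0 otherwise
    v = rng.normal(scale=0.1, size=A.shape[0])
    for _ in range(steps):
        eta = np.outer(v, v)
        grad = 2.0 * (absA * sigmoid(eta) - B) @ v      # gradient w.r.t. v
        v = v - tau / max(v @ v, 1e-12) * grad          # step size tau / ||v||^2
        v = np.clip(v, -np.sqrt(M3), np.sqrt(M3))       # projection onto C_v
    return v
```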

Remark 4.

We note that, although we use gradient descent algorithms for estimating both $(\alpha,Z)$ and $v$, the subtle difference in their objectives makes the theory in [29] not directly applicable to (6). Specifically, unlike the objective in (5), not all elements of the signed adjacency matrix contribute to the objective in (6). Instead, only nonzero entries, i.e., $\{(i,j):|A_{ij}|=1\}$, are used for inferring the latent polar variables through (6). In this case, one key step in establishing the improvement in errors of iterates in [29] no longer holds. Therefore, we establish a new error bound for Algorithm S2 (see Section 5).

Remark 5.

We also note that our optimization problem in (6) is closely related to the line of research on low-rank matrix estimation; see [18, 2, 7, 4, 25, 24] for a sample of references. In particular, (6) can be viewed as a one-bit matrix completion problem, where we observe a random subset of binary entries generated from a distribution determined by a low-rank matrix. To solve this problem, [7] considered a convex relaxation that replaces the low-rank constraint by nuclear norm penalization. Though it becomes a convex optimization problem, in general, solving such a nuclear-norm penalized optimization problem requires a singular value decomposition at each iteration, which is computationally expensive for large matrices. Alternatively, gradient descent algorithms have been used to improve computational efficiency. [4] and [24] established convergence guarantees and statistical errors for gradient descent algorithms applied to low-rank matrix estimation problems, which in particular cover the one-bit matrix completion problem. However, the theories in the aforementioned works are based on the uniform random sampling assumption, i.e., each entry of the matrix is observed independently with a uniform probability $p$, while in our case, entries are observed with different probabilities $P_{ij}$. Thus, our theoretical analysis of the proposed gradient descent algorithm in Section 5 provides new results for one-bit matrix completion under non-uniform random sampling.

4.2 Joint estimation method

Under the joint inner-product model (2) and (4), we propose to jointly estimate the parameters $(\alpha,Z,v)$ by re-parameterizing $v=Zw+\gamma 1_n$ with $w\in\mathbb{R}^k$ and $\gamma\in\mathbb{R}$. Introducing a hyperparameter $\lambda$, we minimize the following weighted negative log-likelihood,

\[
\mathcal{L}_\lambda(\alpha,Z,w,\gamma)=-\sum_{i,j}\Big\{(1-\lambda)\big[|A_{ij}|\Theta_{ij}+\log\big(1-\sigma(\Theta_{ij})\big)\big]+\lambda|A_{ij}|\Big[\frac{1+A_{ij}}{2}\eta_{ij}+\log\big(1-\sigma(\eta_{ij})\big)\Big]\Big\},
\]

subject to $\Theta=\alpha 1_n^\top+1_n\alpha^\top+ZZ^\top$, $Z=J_nZ$, and $\eta=(Zw+\gamma 1_n)(Zw+\gamma 1_n)^\top$.

Here $\lambda$ controls the relative weight of information from the edge formation and the sign assignment respectively. In particular, when $\lambda=0$, no information from the edge signs is used and the joint estimation reduces to the separate estimation of $(\alpha,Z)$ in (5). Later in Section 5.2, we will theoretically show that, under certain conditions, any positive $\lambda$ below some threshold yields more accurate estimation of the latent position variables $Z$ than the separate estimation (i.e., $\lambda=0$), but the magnitude of the improvement depends on the choice of $\lambda$. In principle, we can select $\lambda$ in a data-driven manner by performing cross-validation on the observed signed adjacency matrix, where we randomly mask a subset of entries, fit the joint inner-product model using the remaining entries, repeat the process multiple times, and then select from a candidate set the $\lambda$ with the best average prediction performance on the held-out entries. In practice, we find that simply setting $\lambda=1/2$ also works generally well, in which case the solution becomes the usual maximum likelihood estimator.

To solve the above constrained minimization problem, we develop a projected gradient descent algorithm, whose details are given in Algorithm S5 in the Supplemental Material. We initialize the algorithm with $(\alpha_0,Z_0)=(\hat{\alpha},\hat{Z})$ obtained from Algorithm S1 and $(w_0,\gamma_0)=\operatorname*{arg\,min}_{w\in\mathbb{R}^k,\gamma\in\mathbb{R}}\mathcal{L}_\lambda(\alpha_0,Z_0,w,\gamma)$.
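One iteration of the joint update can be sketched as follows; the gradients follow from the chain rule through $\Theta=\alpha 1_n^\top+1_n\alpha^\top+ZZ^\top$ and $\eta=vv^\top$ with $v=Zw+\gamma 1_n$, while the step sizes are simplifying assumptions (the exact schedule of Algorithm S5 is in the Supplemental Material).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def joint_pgd_step(A, alpha, Z, w, gamma, lam=0.5, tau=0.2):
    """One projected-gradient step on L_lambda: a sketch of Algorithm S5."""
    n = Z.shape[0]
    absA = np.abs(A)
    B = absA * (1.0 + A) / 2.0
    ones = np.ones(n)
    v = Z @ w + gamma * ones
    Theta = alpha[:, None] + alpha[None, :] + Z @ Z.T
    G_theta = sigmoid(Theta) - absA                  # gradient w.r.t. Theta
    np.fill_diagonal(G_theta, 0.0)                   # exclude self-loops
    G_eta = absA * sigmoid(np.outer(v, v)) - B       # gradient w.r.t. eta
    grad_v = 2.0 * G_eta @ v                         # chain rule through eta = v v^T
    Z_new = Z - (tau / n) * (2.0 * (1 - lam) * G_theta @ Z + lam * np.outer(grad_v, w))
    alpha_new = alpha - (tau / n) * (2.0 * (1 - lam) * G_theta @ ones)
    w_new = w - (tau / n) * lam * (Z.T @ grad_v)
    gamma_new = gamma - (tau / n) * lam * (ones @ grad_v)
    J = np.eye(n) - np.ones((n, n)) / n
    return alpha_new, J @ Z_new, w_new, gamma_new    # re-center Z after the step
```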

5 Theoretical Results

In this section, we establish high probability error bounds for the two proposed estimation methods. Note that for the separate estimation method, the error bound for estimating the latent position vectors $Z$ under model (2) and that for estimating the latent polar variables $v$ under model (3) are derived separately. Thus, the separate estimation method is robust in the sense that, when one of models (2) and (3) is mis-specified, our theoretical results still hold for the other. On the other hand, for the joint estimation method that utilizes the relationship between $v$ and $Z$, we further discuss how incorporating their dependency can help reduce the estimation error of the latent variables under the joint inner-product model (4).

5.1 Results for the separate estimation method

We present theoretical guarantees of Algorithms S1 and S2 under the separate inner-product model (2) and model (3) respectively. Note that the error bound for the outputs of Algorithm S1 is a straightforward result of [29, Theorem 9] when there is no edge covariate; we adjust it in Proposition 5 for presentation coherence. Nonetheless, their theory cannot be directly applied to the setting of Algorithm S2, because only nonzero entries of the signed adjacency matrix are included in the objective (6), which breaks an important step towards establishing the estimation improvements for successive iterations in their proof. Thus, our established error bound for the outputs of Algorithm S2 is a new result for a more general setting, where entries are observed with non-uniform probabilities.

We describe the error bounds for the outputs of Algorithms S1 and S2 in detail below. We first define the parameter spaces as

\[
\mathcal{F}_\theta(n,k,M_1,M_2)=\Big\{\alpha\in\mathbb{R}^n,\ Z\in\mathbb{R}^{n\times k},\ \Theta\in\mathbb{R}^{n\times n}\ \big|\ \Theta=\alpha 1_n^\top+1_n\alpha^\top+ZZ^\top,\ J_nZ=Z,\ \max_{1\leq i\leq n}\|Z_{i*}\|^2,\ 2\|\alpha\|_{\max}\leq\frac{M_1}{2},\ \max_{1\leq i\neq j\leq n}\Theta_{ij}\leq-M_2\Big\} \tag{7}
\]

and

\[
\mathcal{F}_\eta(n,M_3)=\Big\{v\in\mathbb{R}^n,\ \eta\in\mathbb{R}^{n\times n}\ \big|\ \eta=vv^\top,\ \|v\|_{\max}^2\leq M_3\Big\}, \tag{8}
\]

where $M_i>0$ for $i=1,2,3$. We allow $k$, $M_1$, $M_2$, and $M_3$ in (7)-(8) to change with the network size $n$, similarly as in [29]. Note that, given the inequalities in (7), it is straightforward to see that, for any $\Theta\in\mathcal{F}_\theta(n,k,M_1,M_2)$, we have $-M_1\leq\Theta_{ij}\leq-M_2$ for $1\leq i\neq j\leq n$. Therefore, $M_2$ controls the network sparsity through the upper bound $-M_2$ on the logit-transformed probabilities of observing edges: a larger value leads to a sparser network. The true parameters are denoted by $(\alpha^*,Z^*,v^*)$, $\Theta^*=\alpha^*1_n^\top+1_n{\alpha^*}^\top+Z^*{Z^*}^\top$, and $\eta^*=v^*{v^*}^\top$.

Error bound for Algorithm S1 Let $(\alpha_t,Z_t)$ be the updated parameters at the $t$-th iteration in Algorithm S1 and $\Theta_t=\alpha_t1_n^\top+1_n\alpha_t^\top+Z_tZ_t^\top$. Since the latent position vectors $Z\in\mathbb{R}^{n\times k}$ are identifiable up to an orthogonal transformation, we define the distance between two latent matrices $Z_1$ and $Z_2$ as $dist(Z_1,Z_2)=\min_{O\in O(k)}\|Z_1-Z_2O\|_F$, where $O(k)$ is the collection of all $k\times k$ orthogonal matrices. Let $O_t=\arg\min_{O\in O(k)}\|Z_t-Z^*O\|_F$, $\Delta_{Z_t}=Z_t-Z^*O_t$, and $\Delta_{\Theta_t}=\Theta_t-\Theta^*$.
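The distance $dist(Z_1,Z_2)$ is an orthogonal Procrustes problem with a closed-form solution via the singular value decomposition; a short sketch:

```python
import numpy as np

def procrustes_dist(Z1, Z2):
    """dist(Z1, Z2) = min over orthogonal O of ||Z1 - Z2 O||_F.
    The minimizer is O = U V^T, where Z2^T Z1 = U S V^T."""
    U, _, Vt = np.linalg.svd(Z2.T @ Z1)
    return np.linalg.norm(Z1 - Z2 @ (U @ Vt))
```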

For theoretical justification, in Algorithm S1, we further assume projection onto the constraint sets $\mathcal{C}_Z=\{Z\in\mathbb{R}^{n\times k},J_nZ=Z,\max_{1\leq i\leq n}\|Z_{i*}\|^2\leq M_1/2\}$ and $\mathcal{C}_\alpha=\{\alpha\in\mathbb{R}^n,2\|\alpha\|_{\max}\leq M_1/2\}$ at each iteration. The following proposition establishes the high probability error bounds for estimating both the latent position matrix $Z$ and the logit-transformed probability matrix $\Theta$.

Proposition 5.

Suppose 1) the initializers $\alpha_0,Z_0$ in Algorithm S1 satisfy $\|Z^*\|_{op}^2\|\Delta_{Z_0}\|_F^2+\|\Delta_{\alpha_0}1_n^\top\|_F^2\leq c_0e^{-2M_1}\|Z^*\|_{op}^4/\kappa_{Z^*}^4$ for a sufficiently small positive constant $c_0$, where $\kappa_{Z^*}$ is the condition number of $Z^*$; and 2) $\|Z^*\|_{op}^2\geq C_1\kappa_{Z^*}^2\sqrt{n}e^{M_1-M_2/2}\max\{\sqrt{\tau k}e^{M_1},1\}$ for a sufficiently large constant $C_1$. Then there exist positive constants $\rho$, $c_1$, and $C$ uniformly over $\mathcal{F}_\theta(n,k,M_1,M_2)$ such that, with probability at least $1-n^{-c_1}$, we have

\[
\|Z^*\|_{op}^2\|\Delta_{Z_T}\|_F^2,\ \|\Delta_{\Theta_T}\|_F^2\leq C\kappa_{Z^*}^2e^{2M_1}nk\cdot\max\Big\{e^{-M_2},\frac{\log n}{n}\Big\},
\]

for some $T\leq\log\big(\frac{M_1^2}{\kappa_{Z^*}^2e^{4M_1-M_2}}\frac{n}{k^3}\big)\big/\log\big(1-\frac{\tau}{e^{M_1}\kappa_{Z^*}^2}\rho\big)^{-1}$.

Error bound for Algorithm S2 Let $v_t$ be the updated parameters at the $t$-th iteration in Algorithm S2 and $\eta_t=v_tv_t^\top$. Similarly, as the latent polar variables $v\in\mathbb{R}^n$ are identifiable up to a sign, we define the distance between two latent vectors $v_1$ and $v_2$ as $dist(v_1,v_2)=\min_{\kappa\in\{-1,1\}}\|v_1-\kappa v_2\|$. Let $\kappa_t=\arg\min_{\kappa\in\{-1,1\}}\|v_t-\kappa v^*\|$ and $\Delta_{v_t}=v_t-\kappa_tv^*$, and further let $\Delta_{\eta_t}=\eta_t-\eta^*$.

Although the error bound presented below does not rely on a specific generating process of edges such as model (2) and the parameter space $\mathcal{F}_\theta(n,k,M_1,M_2)$ in (7), it depends on the lower bound of the probability of observing an edge. For notational consistency, we use $-M_1$ to denote the lower bound of the logit-transformed probability matrix, i.e., $\Theta_{ij}\geq-M_1$ for $1\leq i,j\leq n$. Similarly, for theoretical justification, we constrain $v$ to be in the set $\mathcal{C}_v=\{v\in\mathbb{R}^n,\|v\|_{\max}^2\leq M_3\}$ at each iteration of Algorithm S2. The following theorem establishes the high probability error bounds for estimating the latent polar variables $v$ and the logit-transformed probability matrix $\eta$.

Theorem 2.

Set the step size as $\tau_v=\tau/\|v_0\|^2$ for any $\tau\leq c$, where $c>0$ is a universal constant. Suppose 1) the initializer $v_0$ in Algorithm S2 satisfies $\|\Delta_{v_0}\|\leq c_0e^{-(M_1+M_3)/2}\|v^*\|$ for a sufficiently small positive constant $c_0$; and 2) $\|v^*\|^2\geq C_1\sqrt{n}e^{M_1+M_3}\max\{\sqrt{\tau e^{M_1+M_3}},1\}$ for a sufficiently large constant $C_1$. Then there exist positive constants $\rho$, $c_1$, and $C$ uniformly over $\mathcal{F}_\eta(n,M_3)$ and $M_1$ such that, with probability at least $1-n^{-c_1}$, we have

\[
\|v^*\|^2\|\Delta_{v_T}\|^2,\ \|\Delta_{\eta_T}\|_F^2\leq Ce^{2(M_1+M_3)}n,
\]

for some $T\leq\log\big(\frac{M_3^2}{e^{3(M_1+M_3)}}n\big)\big/\log\big(1-\frac{\tau}{e^{M_1+M_3}}\rho\big)^{-1}$.

Theorem 2 implies that the mean squared error $\|\Delta_{\eta_T}\|_F^2/n^2$ is of order $\mathcal{O}(1/n)$, which coincides with the existing error rate for one-bit rank-$1$ matrix completion problems [7, 4], while our result can be viewed as their extension to the case where entries of the one-bit matrix are randomly observed with non-uniform probabilities. In particular, for the more general non-uniform case, the key ingredient in our proof is to derive a lower bound for the sampling operator $|A|\in\{0,1\}^{n\times n}$. We prove that the sampling operator $|A|$ has a positive lower bound, i.e., $\||A|\circ\eta\|_F\geq c\|\eta\|_F$ with some $c>0$, as long as $\eta$ belongs to a specific data-dependent set. This positive lower bound enables us to extend the proof in [29] when establishing iterative improvements. The proof of Theorem 2 is given in the Supplemental Material.

Remark 6.

Note that the error bounds for $v$ and $\eta$ in Theorem 2 hold regardless of the concrete form of model (2); therefore, the above results still hold even if model (2) is mis-specified. However, the bound still depends on the probability of observing edges. Recall that the probabilities of forming edges are bounded as $e^{-M_1}\asymp 1/(1+e^{M_1})\leq P_{ij}\leq 1/(1+e^{M_2})\asymp e^{-M_2}$. For $M_1\asymp M_2$, the network density is of the order $\rho=e^{-M_1}$. Theorem 2 implies that the error bound decreases as the network density $\rho$ grows. Specifically, the impact of the network density lies in the multiplier $1/\rho^2$. Intuitively, a sparse network, with fewer observed edges, leads to larger estimation errors for $v$ and $\eta$ due to the lack of sign observations. Similarly, Proposition 5 implies that the network density affects the error bound of $Z$ through the multiplier $\max\{\rho,\log n/n\}/\rho^2$. For networks with modest density ($\rho\geq\log n/n$), the multiplier is $1/\rho$. For sparser networks where $\rho\leq\log n/n$, the multiplier adjusts to $\log n/(n\rho^2)$.

Remark 7.

The assumptions in both Proposition 5 and Theorem 2 require relatively good initializations of $(\alpha,Z,v)$. We note that the conditions for $\alpha_0$ and $Z_0$ can be achieved with theoretical justification by the initialization algorithm proposed in [29], which is based on universal singular value thresholding [26]. We further extend this algorithm to initialize $v_0$. Based on our simulation studies (see the Supplemental Material), we find that simple random initialization also achieves similar estimation errors after the algorithm converges, while requiring more iterations for convergence.

5.2 Results for the joint estimation method

We first present the convergence guarantee and the error bound for the estimators obtained by Algorithm S5. Then we further investigate how the joint estimation method could improve the estimation of $Z$ on top of the separate estimation.

Under the joint inner-product model, we redefine the parameter space as

\[
\begin{aligned}
\mathcal{F}(n,k,M_1,M_2,M_3)=\Big\{&\alpha,v\in\mathbb{R}^n,\ Z\in\mathbb{R}^{n\times k},\ \Theta,\eta\in\mathbb{R}^{n\times n}\ \big|\ \Theta=\alpha 1_n^\top+1_n\alpha^\top+ZZ^\top,\ J_nZ=Z,\\
&\eta=vv^\top,\ v=Zw+\gamma 1_n,\ \max_{1\leq i\leq n}\|Z_{i*}\|^2,\ \|\alpha\|_{\max}\leq\frac{M_1}{2},\\
&\max_{1\leq i\neq j\leq n}\Theta_{ij}\leq-M_2,\ \|v\|_{\max}^2\leq M_3,\ \|w\|\leq M,\ |\gamma|\leq M'\Big\},
\end{aligned}
\]

where $k$, $M_1$, $M_2$, and $M_3$ are allowed to change with the network size $n$. Let $(\alpha^*,Z^*,v^*)$ be the true parameters, where $v^*=Z^*w^*+\gamma^*1_n$ with some $w^*\in\mathbb{R}^k$ and $\gamma^*\in\mathbb{R}$.

Error bound for Algorithm S5 Let $(\alpha_t,Z_t,v_t)$ be the updated parameters at the $t$-th iteration in Algorithm S5. We assume projection onto the same constraint sets $\mathcal{C}_\alpha$, $\mathcal{C}_Z$, and $\mathcal{C}_v$ at the end of each iteration as those for Algorithms S1 and S2. The following theorem first guarantees that the error of the iterates $\{(\alpha_t,Z_t)\}_{t\geq 0}$ converges up to a statistical error and then gives the high probability error bounds for the estimators of $Z$ and $\Theta$.

Theorem 3.

Set the step sizes as $\tau_Z=r_0\tau/\|Z_0\|_{op}^2$ and $\tau_\alpha=\tau/(2n)$, and the weight $\lambda=\tilde{\lambda}r_0/(e^{M_1}\kappa_{Z^*}^2)$ with $r_0=\min\{1,\|Z_0\|_{op}^2/\|v_0\|^2\}$, for any $\tau\leq c_\tau$ and $\tilde{\lambda}\leq c_\lambda$, where $c_\tau$ and $c_\lambda$ are universal constants. Let $\zeta_n=\max\{\||A|-P\|_{op},1\}$ and $\varphi_n=\max\{\||A|\circ((1+A)/2-Q)\|_{op},1\}$. Denote the error metric for the iterates as $\tilde{e}_{Z,t}=\|\Delta_{Z_t}\|_F^2\|Z_0\|_{op}^2+\|\Delta_{\alpha_t}1_n^\top\|_F^2$. Suppose the initializers $\alpha_0,Z_0$ in Algorithm S5 satisfy $\tilde{e}_{Z,0}\leq c_0e^{-2M_1-3M_3}\|Z^*\|_{op}^4/\kappa_{Z^*}^4$ for a sufficiently small positive constant $c_0$, where $\kappa_{Z^*}$ is the condition number of $Z^*$. Then, we have

1. (Deterministic bounds for iterative errors) If $\|Z^*\|_{op}^2\geq C_0e^{M_1}\kappa_{Z^*}^2\zeta_n\max\{\sqrt{\tau k}e^{M_1+3M_3/2}\kappa_{Z^*},1\}$ and $\|v^*\|^2\geq C_0e^{M_1+M_3}\varphi_n\max\{\sqrt{\tau}e^{M_1/2+M_3},1\}$ for a sufficiently large constant $C_0$, then there exist universal positive constants $\rho_1$, $\rho_2$, $C'$, and $C''$ such that for all $t\geq 0$,

\[
\begin{aligned}
\tilde{e}_{Z,t+1}\leq\ &\Big(1-\frac{r_0\tau\rho_1}{e^{M_1}\kappa_{Z^*}^2}\Big)\tilde{e}_{Z,t}-\lambda\frac{r_0\tau\rho_2}{e^{M_3}}\min\big\{\||A|\circ\Delta_{\eta_t}\|_F^2,\ e^{-M_1}\|\Delta_{\eta_t}\|_F^2\big\}\\
&+r_0\tau C'e^{M_1}\zeta_n^2k+\lambda r_0\tau C''e^{M_1+M_3}\varphi_n^2.
\end{aligned}
\]
2. (High-probability bounds) Suppose $\|Z^*\|_{op}^2\geq C_0e^{M_1-M_2/2}\kappa_{Z^*}^2\sqrt{n}\max\{\sqrt{\tau k}e^{M_1+3M_3/2}\kappa_{Z^*},1\}$ and $\|v^*\|^2\geq C_0e^{M_1+M_3}\sqrt{n}\max\{\sqrt{\tau}e^{M_1/2+M_3},1\}$ for a sufficiently large constant $C_0$. Then there exist positive constants $\rho_1$, $c$, and $C$ uniformly over $\mathcal{F}(n,k,M_1,M_2,M_3)$ such that, with probability at least $1-n^{-c}$, we have

\[
\|Z^*\|_{op}^2\|\Delta_{Z_T}\|_F^2,\ \|\Delta_{\Theta_T}\|_F^2\leq C\kappa_{Z^*}^2e^{2M_1}nk\cdot\max\Big\{e^{-M_2},\frac{\log n}{n},e^{M_3-M_1}\frac{1}{\kappa_{Z^*}^2k}\Big\},
\]

for some $T\leq\log\big(\frac{M_1^2}{\kappa_{Z^*}^2e^{4M_1+3M_3-M_2}}\frac{n}{k^3}\big)\big/\log\big(1-\frac{r_0\tau\rho_1}{e^{M_1}\kappa_{Z^*}^2}\big)^{-1}$.

The first part of Theorem 3 indicates that, compared to the separate estimation method, the joint method, by involving the gradient of the sign likelihood, yields an extra improvement in the error bound of the iterates, which depends on $\Delta_{\eta_t}$, while introducing another statistical error term $\varphi_n$. As a result, in the second part, the high probability error bounds depend on the maximum of three terms, among which the first two are the same as in Proposition 5 and the third results from $\varphi_n$. When $M_3\leq M_1-M_2+\log(\kappa_{Z^*}^2k)$, the maximum multiplier reduces to the one in Proposition 5. Overall, the error bounds of $Z$ and $\Theta$ for the joint estimation method are still of order $\mathcal{O}(nk)$, the same as for the separate estimation method. In addition, the following corollary gives the error bounds of $v_T$ and $\eta_T$ obtained from line 5 of Algorithm S5.

Corollary 3.1.

For $v_t=Z_tw_t+\gamma_t1_n$ with $(w_t,\gamma_t)=\operatorname*{arg\,min}_{w\in\mathbb{R}^k,\gamma\in\mathbb{R}}\mathcal{L}_\lambda(\alpha_t,Z_t,w,\gamma)$, we have $\|\Delta_{\eta_t}\|_F\leq 16e^{M_1+M_3}\max\{\zeta_n,\varphi_n\}+e^{M_1/2+M_3}(2+\|\Delta_{Z_t}\|_F\|w^*\|)\|v^*\|\|\Delta_{Z_t}\|_F\|w^*\|$ for $t\geq 0$. Suppose the conditions for the high probability bounds in Theorem 3 hold; then there exist positive constants $\rho_1$, $c$, and $C$ uniformly over $\mathcal{F}(n,k,M_1,M_2,M_3)$ such that, with probability at least $1-n^{-c}$, we have

\[
\|v^*\|^2\|\Delta_{v_T}\|^2,\ \|\Delta_{\eta_T}\|^2\leq Ce^{3M_1+2M_3}nk\cdot\max\Big\{\frac{e^{M_3-M_1}}{k},\ \kappa_{Z^*}^2\max\big\{e^{-M_2},\frac{\log n}{n}\big\}\Big\},
\]

for some $T\leq\log\big(\frac{M_1^2}{\kappa_{Z^*}^2e^{4M_1+3M_3-M_2}}\frac{n}{k^3}\big)\big/\log\big(1-\frac{r_0\tau\rho_1}{e^{M_1}\kappa_{Z^*}^2}\big)^{-1}$.

In particular, the deterministic error bound for $\Delta_{\eta_t}$ consists of the statistical error term $\max\{\zeta_n,\varphi_n\}$ and the estimation error of $Z_t$, and with high probability $\|\Delta_{\eta_T}\|^2$ is dominated by the estimation error of $Z_t$ and thus is also of order $\mathcal{O}(nk)$.

One-step improvement Although Theorem 3 guarantees the convergence of Algorithm S5 up to certain statistical errors, the achieved error rate is of the same order as that for the separate estimation method. To further investigate how exploiting the extra structural information in the joint inner-product model helps estimate the latent variables, we consider below the estimation error after moving one step against the gradient from the estimators obtained by the separate method.

Suppose we are given estimators $(\bar{\alpha},\bar{Z})$ of the latent variables obtained from the separate estimation Algorithm S1 and an estimator $\bar{v}=\bar{Z}\bar{w}+\bar{\gamma}1_n$. Then we update the estimator of $Z$ by one step through Algorithm S5 as below:

\[
\hat{Z}=\bar{Z}-2\tau_z(1-\lambda)\big(\sigma(\bar{\Theta})-|A|\big)\bar{Z}-2\tau_z\lambda\big(|A|\circ\sigma(\bar{\eta})-B\big)\bar{v}\bar{w}^\top, \tag{9}
\]

where $\bar{\Theta}=\bar{\alpha}1_n^\top+1_n\bar{\alpha}^\top+\bar{Z}\bar{Z}^\top$, $\bar{\eta}=\bar{v}\bar{v}^\top$, and $B=|A|\circ(A+1)/2$. Note that $[B]_{ij}=|A_{ij}|b_{ij}$ with $b_{ij}$ independently following Bernoulli$(\sigma(\eta_{ij}^*))$ conditional on $|A|$. The following proposition provides insights on the scenarios in which the one-step update in the joint estimation method could lead to better estimates of the latent position vectors $Z$. Below, for ease of derivation, we consider the parameter space $\mathcal{F}(n,k,M_1,M_2,M_3)$ with fixed $M_i$ ($i=1,2,3$) and $k$.
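The one-step update (9) is straightforward to transcribe; the following sketch assumes given $(\bar{\alpha},\bar{Z},\bar{w},\bar{\gamma})$ and user-chosen $\lambda$ and $\tau_z$.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def one_step_update(A, alpha_bar, Z_bar, w_bar, gamma_bar, lam, tau_z):
    """One-step improvement of Z via equation (9)."""
    n = Z_bar.shape[0]
    absA = np.abs(A)
    B = absA * (A + 1.0) / 2.0
    v_bar = Z_bar @ w_bar + gamma_bar * np.ones(n)
    Theta_bar = alpha_bar[:, None] + alpha_bar[None, :] + Z_bar @ Z_bar.T
    eta_bar = np.outer(v_bar, v_bar)
    return (Z_bar
            - 2.0 * tau_z * (1 - lam) * (sigmoid(Theta_bar) - absA) @ Z_bar
            - 2.0 * tau_z * lam * np.outer((absA * sigmoid(eta_bar) - B) @ v_bar, w_bar))
```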

Proposition 6.

Suppose we are given the estimators $(\bar{\alpha},\bar{Z})$ obtained from Algorithm S1 and estimators $(\bar{w},\bar{\gamma})$ that are independent of $B$ conditional on $|A|$ and satisfy $\|\bar{w}-w^*\|^2+\|\bar{\gamma}-\gamma^*\|^2=\mathcal{O}(1/n)$, and that we update $\bar{Z}$ for one step by (9) to obtain $\hat{Z}$. Suppose further that the conditions in Proposition 5 hold, and that the singular values of the sample covariance ${Z^*}^\top Z^*/n$ are of constant order. Then there exists an optimal $\lambda_{opt}$ that minimizes $\mathbb{E}\|\hat{Z}-Z^*\|_F^2$. Furthermore, if $\lambda_{opt}>0$, we have $\mathbb{E}\|\Delta_{\hat{Z}}\|_F^2<\mathbb{E}\|\Delta_{\bar{Z}}\|_F^2$ for any $\lambda\in(0,2\lambda_{opt})$, and the improvement $\mathbb{E}\|\Delta_{\bar{Z}}\|_F^2-\mathbb{E}\|\Delta_{\hat{Z}}\|_F^2$ with $\lambda_{opt}$ is at least

\[
\frac{\big\||A|\circ\xi\circ T_1\big\|_F^2\left(\big\||A|\circ\xi\circ T_1\big\|_F-\big\||A|\circ\xi\circ T_2\big\|_F-\big\||A|\circ\xi\circ T_3\big\|_F\right)^2}{16\left(\big\||A|\circ\xi\circ\xi\circ(T_1+T_2-T_3)\big\|_{op}^2+\mathbb{E}\big\|B-|A|\circ\sigma(\eta^*)\big\|_{op}^2\big/\big(\|\bar{Z}\|_{op}^2\|\bar{w}\|^4\big)\right)},
\]

where TiT_{i}’s are given in (S42)-(S44) respectively for i=1,2,3i=1,2,3 in the Supplemental Material with T1F=𝒪(1)\|T_{1}\|_{F}=\mathcal{O}(1), T2F=𝒪(1)/w\|T_{2}\|_{F}=\mathcal{O}(1)/\|w^{*}\|, and T3F=𝒪(1/n)\|T_{3}\|_{F}=\mathcal{O}(1/\sqrt{n}), and ξ\xi is an element-wise positive constant matrix. Here 𝔼\mathbb{E} represents the conditional expectation of BB given |A||A|.

We provide the expression of the optimal $\lambda_{opt}$ that minimizes $\mathbb{E}\|\hat{Z}-Z^*\|_F^2$ and the proof of Proposition 6 in the Supplemental Material, where we also discuss when the conditional independence assumption and the prerequisite error rate of $(\bar{w},\bar{\gamma})$ in Proposition 6 hold. Since a positive $\lambda_{opt}$ implies a strict decrease in the mean squared error of $Z$ after the one-step update, we further investigate in which cases $\lambda_{opt}$ tends to be positive. In particular, $\lambda_{opt}>0$ if and only if $\big\||A|\circ\xi\circ T_1\big\|_F-\big\||A|\circ\xi\circ T_2\big\|_F-\big\||A|\circ\xi\circ T_3\big\|_F$ is strictly positive. Our analysis of the upper bounds of the three terms suggests that the first two terms are the dominating terms and that a larger $\|w^*\|$ more likely results in a positive $\lambda_{opt}$. Therefore, when the signal from the edge sign distribution is strong, incorporating information from the observed signs in the joint estimation method is useful for improving the estimation of $Z$. Moreover, the magnitude of the improvement also depends on the levels of the signal and the noise in the sign distribution. Specifically, as $\|w^*\|\asymp\|\bar{w}\|$ increases, the difference between the upper bounds of the two dominating terms in the numerator increases while the upper bound of the denominator decreases, so overall the improvement is likely to increase. This implies that larger signals in the edge signs would lead to greater improvement in estimating $Z$. On the other hand, we find that the improvement decreases when the noise $\mathbb{E}\|B-|A|\circ\sigma(\eta^*)\|_{op}^2/\|\bar{Z}\|_{op}^2$ in the edge sign distribution increases in the denominator.

6 Simulation Studies

In this section, we conduct simulation studies to investigate how the estimation errors of the proposed methods depend on: 1) the network size and the dimension of the latent position vectors; 2) the network density; and 3) the proportion of positive edges.

Estimation methods We compare three estimation methods. In addition to the separate estimation method and the joint estimation method introduced in Section 4, we add an intermediate method, one-step-joint estimation, to illustrate the one-step improvement discussed in Proposition 6. Specifically, given $\bar{Z}$ and $\tilde{v}$ obtained from Algorithms S1 and S2 respectively, we compute the one-step-joint estimators $(J_{n}\hat{Z},\bar{v})$ by first updating $\bar{v}=\bar{Z}\bar{w}+\bar{\gamma}1_{n}$ with $(\bar{w},\bar{\gamma})=\operatorname*{arg\,min}_{w\in\mathbb{R}^{k},\gamma\in\mathbb{R}}\|\tilde{v}-\bar{Z}w-\gamma 1_{n}\|$ and then obtaining $\hat{Z}$ by plugging $(\bar{Z},\bar{v})$ into (9). We set $\lambda=1/2$, so that the joint estimation coincides with maximum likelihood estimation.
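For concreteness, the least-squares update for $(\bar{w},\bar{\gamma})$ admits a one-line implementation; below is a minimal NumPy sketch, where the helper name one_step_joint_v and the variables Z_bar and v_tilde (standing for $\bar{Z}$ and $\tilde{v}$) are ours:

import numpy as np

def one_step_joint_v(Z_bar, v_tilde):
    # Regress v_tilde on the columns of Z_bar plus an intercept, i.e.,
    # (w_bar, gamma_bar) = argmin_{w, gamma} ||v_tilde - Z_bar w - gamma 1_n||.
    n = Z_bar.shape[0]
    X = np.hstack([Z_bar, np.ones((n, 1))])      # design matrix [Z_bar, 1_n]
    coef, *_ = np.linalg.lstsq(X, v_tilde, rcond=None)
    w_bar, gamma_bar = coef[:-1], coef[-1]
    v_bar = Z_bar @ w_bar + gamma_bar            # v_bar = Z_bar w_bar + gamma_bar 1_n
    return w_bar, gamma_bar, v_bar

The returned v_bar is then plugged, together with $\bar{Z}$, into the one-step update (9).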

Simulation settings For a given network size $n$ and latent dimension $k$, we set the model parameters as follows. We first generate the latent positions $Z_{ij}\overset{\text{iid}}{\sim}\mathcal{N}(0,1)$ for $1\leq i\leq n$, $1\leq j\leq k$. Centering the columns of $Z$ gives $Z^{*}=J_{n}Z$, where $J_{n}=I_{n}-\frac{1}{n}1_{n}1_{n}^{\top}$. We then rescale $Z^{*}$ so that $\|Z^{*}{Z^{*}}^{\top}\|_{F}=n$. Next, we generate the node degree heterogeneity parameters $\alpha_{i}^{*}=-\alpha_{i}/\sum_{i'=1}^{n}\alpha_{i'}$, where $\alpha_{i}\overset{\text{iid}}{\sim}U(1,3)$ for $1\leq i\leq n$. Finally, we set $w^{*}=1_{k}/\sqrt{k}$, $\gamma^{*}=0$, and $v^{*}_{i}={w^{*}}^{\top}Z^{*}_{i}+\gamma^{*}$.
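This generation procedure can be written compactly; the following NumPy sketch mirrors the description above (the helper name generate_parameters is ours):

import numpy as np

def generate_parameters(n, k, seed=0):
    rng = np.random.default_rng(seed)
    # Latent positions: iid N(0,1) entries, column-centered (Z* = J_n Z),
    # then rescaled so that ||Z* Z*^T||_F = n.
    Z = rng.standard_normal((n, k))
    Z_star = Z - Z.mean(axis=0)
    Z_star *= np.sqrt(n / np.linalg.norm(Z_star @ Z_star.T, 'fro'))
    # Degree heterogeneity: alpha_i* = -alpha_i / sum_j alpha_j, alpha_i ~ U(1,3).
    alpha = rng.uniform(1.0, 3.0, size=n)
    alpha_star = -alpha / alpha.sum()
    # Polar variables: w* = 1_k / sqrt(k), gamma* = 0, v*_i = w*^T Z*_i + gamma*.
    w_star = np.ones(k) / np.sqrt(k)
    gamma_star = 0.0
    v_star = Z_star @ w_star + gamma_star
    return Z_star, alpha_star, w_star, gamma_star, v_star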

Given the true latent variables $Z^{*}$, $\alpha^{*}$, and $v^{*}$, we randomly generate 20 replications of the signed adjacency matrix following (2) and (3), and fit models with the three estimation methods. For each method, we measure the relative errors for $Z$, $v$, $\Theta$, and $\eta$. Due to the identifiability conditions in Proposition 4, we define the relative error for $Z$ as $\|\hat{Z}-Z^{*}\bar{Q}\|_{F}/\|Z^{*}\|_{F}$, where $\bar{Q}=\operatorname*{arg\,min}_{Q\in O(k)}\|\hat{Z}-Z^{*}Q\|_{F}$ and $O(k)$ is the set of $k\times k$ orthogonal matrices. We define the relative error for $v$ as $\|\hat{v}-\bar{\kappa}v^{*}\|/\|v^{*}\|$, where $\bar{\kappa}=\operatorname*{arg\,min}_{\kappa\in\{1,-1\}}\|\hat{v}-\kappa v^{*}\|$. The relative errors for $\Theta$ and $\eta$ are defined as $\|\hat{\Theta}-\Theta^{*}\|_{F}/\|\Theta^{*}\|_{F}$ and $\|\hat{\eta}-\eta^{*}\|_{F}/\|\eta^{*}\|_{F}$, respectively, where $\hat{\Theta}=\hat{\alpha}1_{n}^{\top}+1_{n}\hat{\alpha}^{\top}+\hat{Z}\hat{Z}^{\top}$ and $\hat{\eta}=\hat{v}\hat{v}^{\top}$.
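Both alignment problems admit closed-form solutions, which we use when computing the errors. The sketch below (helper names are ours) obtains $\bar{Q}$ from the singular value decomposition of ${Z^{*}}^{\top}\hat{Z}$, by the standard orthogonal Procrustes argument, and $\bar{\kappa}$ by comparing the two sign choices:

import numpy as np

def relative_error_Z(Z_hat, Z_star):
    # argmin_{Q in O(k)} ||Z_hat - Z_star Q||_F is attained at Q = U V^T,
    # where Z_star^T Z_hat = U S V^T.
    U, _, Vt = np.linalg.svd(Z_star.T @ Z_hat)
    Q = U @ Vt
    return np.linalg.norm(Z_hat - Z_star @ Q) / np.linalg.norm(Z_star)

def relative_error_v(v_hat, v_star):
    # argmin_{kappa in {1,-1}} ||v_hat - kappa v_star||.
    err = min(np.linalg.norm(v_hat - v_star), np.linalg.norm(v_hat + v_star))
    return err / np.linalg.norm(v_star)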

Figure 3: Log-log plots of relative errors with respect to the network size $n$. The dimension of the latent position vector is fixed at $k=2$.

6.1 Varying the network size and the dimension of the latent space

In Figure 3, we summarize how the estimation errors vary with the network size. We fix $k=2$ and vary $n\in\{500,1000,2000,4000\}$. For a fixed dimension of the latent space, the relative errors of all three estimation methods decrease at the rate of $1/\sqrt{n}$ as the network size $n$ grows, in agreement with the theoretical error rates given in Section 5. Next, compared to the separate estimation method, the joint estimation method consistently achieves smaller estimation errors on all four quantities of interest across different network sizes. In addition, the one-step-joint estimation, which updates the estimates by a single gradient step, already reduces the estimation errors relative to the separate estimation method.

We further summarize how the estimation errors of $Z$ and $\Theta$ vary with the dimension of the latent position vector in Figure S3 in the Supplemental Material, for the sake of space. We fix $n=2000$ and vary $k\in\{2,4,8\}$. For a fixed network size, the relative errors increase at the rate of $\sqrt{k}$ as the dimension $k$ grows, which also agrees with our theoretical results. The relative ordering of the three estimation methods for different $k$ is similar to that in Figure 3, with the joint estimation method consistently the best.

6.2 Varying the network density

We investigate how the estimation errors of the three estimation methods vary with the network density. To this end, we generate the node degree heterogeneity parameters $\alpha_{i}^{*}=-\bar{\alpha}-\alpha_{i}/\sum_{i'=1}^{n}\alpha_{i'}$, where $\alpha_{i}\overset{\text{iid}}{\sim}U(1,3)$ for $1\leq i\leq n$. We fix $n=2000$ and $k=4$, and vary $\bar{\alpha}\in\{0,0.25,0.5,0.75,1,1.25\}$, which leads to network densities ranging from $0.1$ to $0.5$.
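As a sanity check on this construction, the expected density implied by a given offset $\bar{\alpha}$ can be computed directly; the sketch below assumes, consistent with the logistic form of the model, that $P(|A_{ij}|=1)=\sigma(\Theta^{*}_{ij})$ with $\Theta^{*}=\alpha^{*}1_{n}^{\top}+1_{n}{\alpha^{*}}^{\top}+Z^{*}{Z^{*}}^{\top}$ (the helper name is ours):

import numpy as np

def expected_density(Z_star, alpha_star, alpha_bar):
    # Shift the degree parameters by -alpha_bar and average the edge
    # probabilities sigma(Theta_ij) over distinct pairs i < j.
    a = alpha_star - alpha_bar
    Theta = a[:, None] + a[None, :] + Z_star @ Z_star.T
    P = 1.0 / (1.0 + np.exp(-Theta))
    iu = np.triu_indices(len(a), k=1)
    return P[iu].mean()

Increasing alpha_bar makes every $\Theta^{*}_{ij}$ smaller and hence the network sparser.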

Figure S4 in the Supplemental Material summarizes the relative estimation errors of $Z$ and $v$ over 20 replications under different network densities. The estimation errors of both $Z$ and $v$ decrease for all three methods as the network gets denser, and the joint estimation method consistently achieves lower errors than the other two methods across network densities. In particular, when the network is dense, the improvement in estimating $Z$ from the joint estimation over the separate estimation increases. This is expected because, in joint estimation, the observed edge signs are also useful for inferring $Z$, and denser networks provide more sign information. In addition, regarding the estimation error of $v$, the joint and one-step-joint estimation methods, which use the additional structural relation between $Z$ and $v$, perform more stably than the separate estimation method as the network gets sparser.

Figure 4: Relative errors with respect to the proportion of positive edges. The network size is fixed at $n=2000$ and the dimension of the latent position vector is fixed at $k=4$.

6.3 Varying the proportion of positive edges

We also investigate the effect of the proportion of positive edges on the three estimation methods. For this purpose, we modify the simulation setting: we fix $n=2000$ and $k=4$, and vary $\gamma^{*}\in\{0,0.3,0.6,0.9,1.2,1.5,1.8\}$, which results in proportions of positive edges ranging from $0.52$ to $0.91$. To eliminate the artificial effect of varying $\gamma^{*}$ when evaluating the estimation error of $v$, we focus on the estimation error of the centered $v$, i.e., $v_{cen}=J_{n}v$. We define the relative error for $v_{cen}$ as $\|J_{n}\hat{v}-\bar{\kappa}J_{n}v^{*}\|/\|J_{n}v^{*}\|=\|J_{n}\hat{v}-\bar{\kappa}Z^{*}w^{*}\|/\|Z^{*}w^{*}\|$, where $\bar{\kappa}=\operatorname*{arg\,min}_{\kappa\in\{1,-1\}}\|J_{n}\hat{v}-\kappa J_{n}v^{*}\|$.
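The centered error can be evaluated analogously to the earlier alignment-based errors; a small sketch (the helper name is ours), using the fact that $J_{n}v$ simply subtracts the mean of $v$ from each entry:

import numpy as np

def relative_error_v_centered(v_hat, v_star):
    # J_n v = v - mean(v) 1_n; then sign-align as before.
    c_hat = v_hat - v_hat.mean()
    c_star = v_star - v_star.mean()
    err = min(np.linalg.norm(c_hat - c_star), np.linalg.norm(c_hat + c_star))
    return err / np.linalg.norm(c_star)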

Figure 4 summarizes the relative estimation errors of $Z$ and $v_{cen}$ over 20 replications under different proportions of positive edges. Overall, the joint estimation method performs the best among the three methods and is robust across different sign distributions. Note that the estimation error of $Z$ for the separate method does not change, because the generated absolute adjacency matrix $|A|$ is unaffected by $\gamma^{*}$, so the separate estimates $\hat{Z}$ stay the same. We also observe that the best performance of the separate method for estimating $v$ is not achieved at around 50% positive signs. Intuitively, in the extreme case where $v\approx 0$, the negative and positive signs become indistinguishable, with the proportion of positive edges close to 50%, which makes estimating $v$ challenging. The proportion that minimizes the estimation error may vary case by case; we investigate an SBM example in the Supplemental Material for illustration.

7 International Relation Data

In this section, we apply the proposed method to an international relation dataset, the Correlates of War (COW) [16], to demonstrate how it can be used to produce informative interpretations and visualizations of signed networks. The COW dataset records various types of international relations among countries, such as wars, alliances, and militarized interstate disputes. Following [17], we construct a signed network of countries, where positive edges represent alliance relationships and negative edges represent militarized disputes between countries. We take the snapshot of the records during World War II (WWII), i.e., from 1939 to 1945. In particular, if two countries were involved in both alliances and militarized disputes, we set the sign of their edge to positive if the number of years of alliance exceeds that of militarized disputes, and to negative otherwise. According to the COW records, 68 countries were involved in alliances or militarized disputes during WWII, and the resulting signed network contains 566 positive edges and 519 negative edges. The network contains 2,148 balanced triangles, outnumbering the 613 unbalanced triangles by more than threefold. To further assess the balance property, we applied a stratified permutation test [9]: under the null hypothesis, the sign of an edge is independent of its position in the graph, which is regarded as balance-free. The p-value of the test is less than $0.001$, indicating strong evidence of the balance property in this international relation network.
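The triangle counts, and a simplified variant of the test, can be computed directly from the signed adjacency matrix $A$ with entries in $\{-1,0,+1\}$. The sketch below (helper names are ours) uses the identity that $\mathrm{trace}(A^{3})$ counts each triangle six times weighted by the product of its edge signs; note that it replaces the stratified scheme of [9] with a plain sign shuffle for illustration:

import numpy as np

def triangle_counts(A):
    # trace(|A|^3)/6 = number of triangles; trace(A^3)/6 = balanced - unbalanced.
    total = np.trace(np.abs(A) @ np.abs(A) @ np.abs(A)) / 6.0
    signed = np.trace(A @ A @ A) / 6.0
    return (total + signed) / 2.0, (total - signed) / 2.0  # (balanced, unbalanced)

def sign_permutation_pvalue(A, n_perm=1000, seed=0):
    # Unstratified illustration: shuffle the observed signs over the fixed
    # edge positions and compare balanced-triangle counts.
    rng = np.random.default_rng(seed)
    rows, cols = np.where(np.triu(np.abs(A), k=1))
    signs = A[rows, cols]
    observed = triangle_counts(A)[0]
    hits = 0
    for _ in range(n_perm):
        B = np.zeros_like(A)
        B[rows, cols] = rng.permutation(signs)
        B = B + B.T
        hits += triangle_counts(B)[0] >= observed
    return (hits + 1) / (n_perm + 1)

Applied to the WWII network, triangle_counts would return the 2,148 balanced and 613 unbalanced triangles reported above.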

To incorporate the balance theory, we fit the joint inner-product model to the COW dataset. We set the dimension of the latent position vectors to $k=2$, so that the estimated $\hat{Z}$ can be directly visualized on a 2-dimensional plane. The model fitted by the joint estimation method is visualized in Figure 5, where each node represents a country and its coordinates are given by $\hat{Z}$. The size of each node is determined by the estimated degree heterogeneity parameter $\hat{\alpha}$, with larger nodes corresponding to larger $\hat{\alpha}_{i}$ values. The color and shape of each node $i$ indicate the estimated latent polar variable $\hat{v}_{i}$: if $\hat{v}_{i}>0$, the node is drawn as a red circle, and if $\hat{v}_{i}<0$, as a blue square. For both red and blue nodes, the larger the magnitude $|\hat{v}_{i}|$, the darker the color. The sign of each edge is also indicated, with dashed green lines being positive and solid purple lines being negative.

Figure 5: Visualization of the model fitted by the joint estimation method on the COW dataset. The nodes are countries involved in WWII. Green dashed lines represent positive edges (alliance) and purple solid lines represent negative edges (militarized dispute). Node sizes are determined by the estimated node degree heterogeneity parameters $\hat{\alpha}$. Node colors and shapes are determined by the estimated latent polar variables $\hat{v}$: node $i$ is a red circle if $\hat{v}_{i}>0$ and a blue square otherwise, and the larger the magnitude of $\hat{v}_{i}$, the darker the color. The grey dash-dotted line represents the linear boundary between the red and blue nodes. To keep the figure readable, only countries with degree greater than 5 are labeled.

Figure 5 shows that the proposed model fitted by the joint estimation method captures important information in the signed network of countries during WWII. First, in terms of the estimated node degree heterogeneity $\hat{\alpha}$, the top 11 countries are Germany, Italy, Japan, the United Kingdom (UK), Romania, the United States (USA), Brazil, Bulgaria, Hungary, France, and the Soviet Union (USSR). The UK, the USA, and the USSR were the three leading countries of the Allies of WWII; France played important and complicated roles in both the Allies and the Axis; Brazil was the only South American country that actively participated in WWII; all other countries were members of the Axis. Since members of the Axis were on average more active than those of the Allies, it is reasonable that even small Axis members have high values of $\hat{\alpha}$. Second, regarding the estimated latent polar variable $\hat{v}$, the model divides the countries into two groups (blue and red) that mostly align with the division between the Axis and the Allies. As we assume a linear transformation from $Z$ to $v$, the plane (the space of $Z$) is linearly separated into two areas, with the boundary illustrated by the grey dash-dotted line. Edges crossing the boundary are mainly negative (purple), while edges within the same side are mainly positive (green). Finally, the estimated latent position vectors $\hat{Z}$ capture various other interesting aspects. Within the Allies, $\hat{Z}$ further clusters the countries in the Americas together (see the right part of the figure), among which most edges are positive (green). The Middle East countries form a cluster as well (at the top of the figure), though not as tight as the countries in the Americas. It is also interesting that France, one of the major Allied powers in WWII, is positioned on the same side as Germany, Italy, and Japan. This is because, after being occupied by Germany, France was divided into two political powers, Free France and Vichy France, with the latter collaborating with Germany and fighting against the Allies in several campaigns from 1940 to 1944. We perform an additional sensitivity analysis in the Supplemental Material to assess the impact of splitting France into two political entities.

For comparison, we also fit the separate inner-product model and visualize the fitted model in Figure S6 in the Supplemental Material, for the sake of space. We find that the model fitted by the separate estimation method captures useful information from the network but is not as interpretable as the one fitted by the joint estimation method (see the Supplemental Material for a more detailed discussion).

8 Discussion

In this paper, we propose a latent space approach that accommodates the structural balance theory for modeling signed networks. In particular, we introduce a novel notion of population-level balance, which is a natural choice to characterize the structural balance theory when we treat both the edges and their signs as random variables. We develop sufficient conditions for a latent space model to be balanced at the population level, and propose two balanced inner-product models following the conditions. We also provide scalable estimation algorithms with theoretical guarantees.

In Section 3, we proposed balanced inner-product models that use a one-dimensional polar variable. This design offers a direct and intuitive interpretation: the sign of the polar variable indicates membership in one of two antagonistic groups. Beyond this, multi-dimensional polar variables, i.e., $\eta_{ij}=v_{i}^{\top}v_{j}$ for $v_{i}\in\mathbb{R}^{d}$ with $d>1$, could offer more flexibility in capturing multiple latent factors in real-world applications. We have verified that our estimation method and high-probability error bounds remain applicable in the multi-dimensional case. However, it is important to note that, with multi-dimensional polar variables, the conditions in Theorem 1 no longer hold; such variables also have the potential drawback of complicating interpretation.

There are a few other directions to explore in the future. First, the joint inner-product model could be extended to include a nonlinear link function $\phi$, which would increase the model's flexibility. Second, we may generalize the proposed approach to weighted signed networks to better leverage the richer edge information available in real-world networks. Finally, it is desirable to extend the latent space approach from undirected to directed networks, which could potentially be used to model other interesting social theories for signed networks, such as social status theory.


References

  • [1] Emmanuel Abbe “Community Detection and Stochastic Block Models: Recent Developments” In Journal of Machine Learning Research 18.177, 2018, pp. 1–86
  • [2] Emmanuel J. Candes, Yonina C. Eldar, Thomas Strohmer and Vladislav Voroninski “Phase Retrieval via Matrix Completion” In SIAM Journal on Imaging Sciences 6.1, 2013, pp. 199–225 DOI: 10.1137/110848074
  • [3] Sourav Chatterjee “Matrix estimation by Universal Singular Value Thresholding” In Annals of Statistics 43.1 The Institute of Mathematical Statistics, 2015, pp. 177–214 DOI: 10.1214/14-AOS1272
  • [4] Yudong Chen and Martin J Wainwright “Fast low-rank estimation by projected gradient descent: General statistical and algorithmic guarantees” In arXiv preprint arXiv:1509.03025, 2015
  • [5] Kai-Yang Chiang, Cho-Jui Hsieh, Nagarajan Natarajan, Inderjit S. Dhillon and Ambuj Tewari “Prediction and Clustering in Signed Networks: A Local to Global Perspective” In Journal of Machine Learning Research 15.34, 2014, pp. 1177–1213
  • [6] Fan Chung and Linyuan Lu “Complex Graphs and Networks (CBMS Regional Conference Series in Mathematics)” USA: American Mathematical Society, 2006
  • [7] Mark A Davenport, Yaniv Plan, Ewout Van Den Berg and Mary Wootters “1-bit matrix completion” In Information and Inference: A Journal of the IMA 3.3 OUP, 2014, pp. 189–223
  • [8] Tyler Derr, Charu Aggarwal and Jiliang Tang “Signed network modeling based on structural balance theory” In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, 2018, pp. 557–566
  • [9] Derek Feng, Randolf Altmeyer, Derek Stafford, Nicholas A. Christakis and Harrison H. Zhou “Testing for Balance in Social Networks” In Journal of the American Statistical Association, Taylor & Francis, 2020, pp. 1–19 DOI: 10.1080/01621459.2020.1764850
  • [10] Anna Goldenberg, Alice X. Zheng, Stephen E. Fienberg and Edoardo M. Airoldi “A Survey of Statistical Network Models” In Foundations and Trends® in Machine Learning 2.2, 2010, pp. 129–233 DOI: 10.1561/2200000005
  • [11] Ramanthan Guha, Ravi Kumar, Prabhakar Raghavan and Andrew Tomkins “Propagation of trust and distrust” In Proceedings of the 13th International Conference on World Wide Web, 2004, pp. 403–412
  • [12] Frank Harary “On the notion of balance of a signed graph.” In The Michigan Mathematical Journal 2.2 The University of Michigan, 1953, pp. 143–146
  • [13] Peter D Hoff “Bilinear mixed-effects models for dyadic data” In Journal of the American Statistical Association 100.469 Taylor & Francis, 2005, pp. 286–295
  • [14] Peter D Hoff, Adrian E Raftery and Mark S Handcock “Latent Space Approaches to Social Network Analysis” In Journal of the American Statistical Association 97.460 Taylor & Francis, 2002, pp. 1090–1098 DOI: 10.1198/016214502388618906
  • [15] Cho-Jui Hsieh, Kai-Yang Chiang and Inderjit S Dhillon “Low rank modeling of signed networks” In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2012, pp. 507–515
  • [16] Ahmet Izmirlioglu “The Correlates of War Dataset” In Journal of World-Historical Information 3.1 University Library System, University of Pittsburgh, 2017
  • [17] Alec Kirkley, George T. Cantwell and M. E. J. Newman “Balance in signed networks” In Physical Review E 99 American Physical Society, 2019, pp. 012320 DOI: 10.1103/PhysRevE.99.012320
  • [18] Vladimir Koltchinskii, Karim Lounici and Alexandre B. Tsybakov “Nuclear-norm penalization and optimal rates for noisy low-rank matrix completion” In Annals of Statistics 39.5 The Institute of Mathematical Statistics, 2011, pp. 2302–2329 DOI: 10.1214/11-AOS894
  • [19] Pavel N. Krivitsky, Mark S. Handcock, Adrian E. Raftery and Peter D. Hoff “Representing degree distributions, clustering, and homophily in social networks with latent cluster random effects models” In Social Networks 31.3, 2009, pp. 204–213 DOI: https://doi.org/10.1016/j.socnet.2009.04.001
  • [20] Jure Leskovec, Daniel Huttenlocher and Jon Kleinberg “Signed networks in social media” In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 2010, pp. 1361–1370
  • [21] Zhuang Ma, Zongming Ma and Hongsong Yuan “Universal Latent Space Model Fitting for Large Networks with Edge Covariates.” In Journal of Machine Learning Research 21.4, 2020, pp. 1–67
  • [22] Mark Newman “Networks: An Introduction” Oxford University Press, 2010
  • [23] Bernhard Scholkopf and Alexander J Smola “Learning with kernels: support vector machines, regularization, optimization, and beyond” Adaptive Computation and Machine Learning series, 2018
  • [24] Lingxiao Wang, Xiao Zhang and Quanquan Gu “A unified computational and statistical framework for nonconvex low-rank matrix estimation” In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, 2017, pp. 981–990 PMLR
  • [25] Qinqing Zheng and John Lafferty “Convergence analysis for rectangular matrix completion using Burer-Monteiro factorization and gradient descent” In arXiv preprint arXiv:1605.07051, 2016
