Population-level Balance in Signed Networks
Abstract
Statistical network models are useful for understanding the underlying formation mechanism and characteristics of complex networks. However, statistical models for signed networks have been largely unexplored. In signed networks, there exist both positive (e.g., like, trust) and negative (e.g., dislike, distrust) edges, which are commonly seen in real-world scenarios. The positive and negative edges in signed networks lead to unique structural patterns, which pose challenges for statistical modeling. In this paper, we introduce a statistically principled latent space approach for modeling signed networks and accommodating the well-known balance theory, i.e., “the enemy of my enemy is my friend” and “the friend of my friend is my friend”. The proposed approach treats both edges and their signs as random variables, and characterizes the balance theory with a novel and natural notion of population-level balance. This approach guides us towards building a class of balanced inner-product models, and towards developing scalable algorithms via projected gradient descent to estimate the latent variables. We also establish non-asymptotic error rates for the estimates, which are further verified through simulation studies. In addition, we apply the proposed approach to an international relation network, which provides an informative and interpretable model-based visualization of countries during World War II.
Keywords: Signed Networks, Balance Theory, Latent Space Models, Projected Gradient Descent.
1 Introduction
Networks characterize connectivity relationships between individuals of a complex system and are ubiquitous in various fields, such as social science, biology, transportation, and information technology [22]. In a network, a node represents an individual and an edge between two nodes indicates the presence of certain interaction or relation. Given the unique relational information represented by networks, many statistical models have been developed to understand the underlying mechanism of the system and help explain the observed phenomena on networks; see for example [10] for a comprehensive overview. One important class of statistical models is the latent variable model, where the presence/absence of an edge depends on the node latent variables. For example, stochastic block models use latent categorical variables to describe the block structure among nodes [1]; latent space models map nodes into a low-dimensional metric space while accounting for transitivity, homophily of node attributes, node degree heterogeneity, and clustering [14, 19]. Such latent variable models are attractive due to their interpretable structure, their natural use for network visualization, and their ability to support downstream network-assisted learning such as node clustering, node classification, and network link prediction.
Nonetheless, most statistical network models only focus on the presence/absence of edges while ignoring different types of edges, which makes them inadequate for modeling signed networks. A signed network consists of two types of edges, positive edges and negative edges, and such polarized relationships are common in real-world networks. For example, positive and negative edges may respectively correspond to relationships of like and dislike in social networks, collaboration and competition in trading networks, or alliance and militarized dispute in international relation networks. Modeling signed networks has its own unique challenges, not merely due to the additional sign for each edge, but more importantly because the presence of positive and negative edges affects each other in certain ways. There have been various social theories that describe the structural pattern of signed networks [11, 20], an important one being the structural balance theory [12]. Specifically, the balance theory describes the distribution of different types of triangles (i.e., three mutually connected nodes). A triangle in a signed network is called balanced if the product of its three edge signs is positive, and unbalanced otherwise (see Figure 1 for examples). The balance theory postulates that balanced triangles should be more prevalent than unbalanced triangles in signed networks, which directly coincides with the proverbs "the enemy of my enemy is my friend" and "the friend of my friend is my friend". Moreover, recent studies have found empirical evidence of the balance property in many real-world signed networks [20, 17, 9].
[Figure 1: Examples of balanced and unbalanced triangles in a signed network.]
On the other hand, there have been very few statistical models for signed networks that incorporate the balance theory. To the best of our knowledge, [8] is the only recent work; specifically, it extends the configuration model [6] to signed networks with a focus on matching not only the node degree distribution but also the sign distribution and the proportion of balanced triangles. Besides statistical models, there is a collection of work using low-rank matrix completion approaches induced by the balance theory for learning tasks such as sign prediction and clustering [15, 5]. These works assume that there are underlying signed edges (not allowing for zeros) between all possible pairs of nodes, and view the network as a fixed adjacency matrix with entries of $\pm 1$. In comparison, statistical network models can provide statistically interpretable structures and account for noise in both signs and edges by modeling network distributions that precisely quantify the randomness in the observed data.
In this paper, we propose a latent space approach to accommodate the balance theory for signed networks in a statistically principled way. Specifically, we introduce a novel definition of balance at the population level, which matches the balance theory in nature while viewing an observed network as the realization of a random quantity. For concreteness, we consider an undirected signed network with $n$ nodes denoted by a symmetric signed adjacency matrix $A \in \{1, -1, 0\}^{n \times n}$, with $A_{ij} = 1$ if node $i$ and node $j$ are linked by a positive edge, $A_{ij} = -1$ if node $i$ and node $j$ are linked by a negative edge, and $A_{ij} = 0$ if there is no edge between $i$ and $j$. We assume there is no self-loop, and thereby the diagonal elements of $A$ are zeros. We treat the entries $A_{ij}$ as random variables taking values in $\{1, -1, 0\}$ and define the notion of balance at the population level as follows.
Definition 1 (Population-level balanced network).
A signed network $A$ is population-level balanced if $\mathbb{E}(A_{ij} A_{jk} A_{ki}) > 0$ for any triple of distinct nodes $(i, j, k)$.
This definition requires the expected product of signs on any triangle to be positive, but does not require all triangles in an observed signed network to be balanced. Furthermore, the stochastic notion in Definition 1 allows us to investigate which generating mechanisms of signed networks are inherently population-level balanced. Specifically, we will focus on a general class of latent space models, due to the aforementioned merits of latent space models. Rigorous descriptions are provided in Section 2. The key finding is that, if there exists a partition of the latent space such that edges tend to be positive within the same subset and negative between different subsets, then the network generated from such a latent space model is population-level balanced.
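To make Definition 1 concrete, here is a minimal sketch in Python (the helper name and the toy network are ours, not from the paper) that tallies balanced versus unbalanced triangles in one observed network; population-level balance instead concerns the expectation of these products over the distribution of $A$.

```python
import numpy as np

def triangle_sign_products(A):
    """Return the products A_ij * A_jk * A_ki over all node triples i < j < k."""
    n = A.shape[0]
    prods = []
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(j + 1, n):
                prods.append(A[i, j] * A[j, k] * A[k, i])
    return np.array(prods)

# A single observed network: two antagonistic groups {0, 1} and {2, 3},
# positive within groups, negative between groups.
A = np.array([[ 0,  1, -1, -1],
              [ 1,  0, -1, -1],
              [-1, -1,  0,  1],
              [-1, -1,  1,  0]])
prods = triangle_sign_products(A)
print("balanced:", np.sum(prods > 0), "unbalanced:", np.sum(prods < 0))

# Population-level balance requires E(A_ij A_jk A_ki) > 0, i.e., an average
# over the *distribution* of A, not over one realization.
```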
Based on this finding, we further propose a class of balanced inner-product models that directly capture the population-level balance. A unique difference from latent space models for unsigned networks is that we introduce an additional latent polar variable for each node. In particular, when the product of the latent polar variables of two nodes has a large positive value, the sign of an edge between them is more likely to be positive; a node whose latent polar variable is zero has no preference on signs when forming edges with other nodes. We note that it is this novel introduction of latent polar variables that enables modeling signed networks with population-level balance.
This paper is organized as follows. We introduce the latent space approach for signed networks in Section 2, where we also provide a sufficient condition for the population-level balance. We present the proposed balanced inner product models in Section 3. We develop two scalable estimation methods in Section 4 and establish their non-asymptotic error rates in Section 5, which are further validated by simulation studies in Section 6. We demonstrate the effectiveness of the proposed approach in modeling a real-world signed network for international relations in Section 7. All proofs are given in the Supplemental Material.
2 A Latent Space Approach for Signed Networks
In this section, we propose a probabilistic generative process for undirected signed networks with $n$ nodes. Recall that $A$ is the signed adjacency matrix. Suppose the latent space $\mathcal{Z}$ is endowed with a probability measure $P_z$, and let $f: \mathcal{Z} \times \mathcal{Z} \to [0, 1]$ and $g: \mathcal{Z} \times \mathcal{Z} \to \mathbb{R}$ be two measurable symmetric functions.
Definition 2 (A general latent space model for signed networks).
For $i = 1, \dots, n$, let $z_i$ be the latent vector independently sampled from the distribution $P_z$. Given the latent vectors of a pair of nodes $i$ and $j$, independently of other pairs, an edge between node $i$ and node $j$ is drawn with probability $f(z_i, z_j)$, i.e.,
$$\mathbb{P}(|A_{ij}| = 1 \mid z_i, z_j) = f(z_i, z_j);$$
then, for each edge (i.e., $|A_{ij}| = 1$), independently of all others, it takes the positive sign with logit $g(z_i, z_j)$ and the negative sign otherwise, i.e.,
$$\mathrm{logit}\, \mathbb{P}(A_{ij} = 1 \mid |A_{ij}| = 1, z_i, z_j) = g(z_i, z_j).$$
We write $A \sim \mathcal{G}(f, g, P_z)$ to denote a signed network with $n$ nodes generated from the above procedure.
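The two-stage sampling in Definition 2 can be sketched directly; the following Python snippet is illustrative, with the particular choices of $f$, $g$, and $P_z$ being our assumptions rather than the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def sample_signed_network(n, sample_z, f, g):
    """Two-stage sampling: draw each edge with probability f(z_i, z_j),
    then give it a positive sign with logit g(z_i, z_j)."""
    z = [sample_z() for _ in range(n)]
    A = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < f(z[i], z[j]):               # edge present?
                p_pos = sigmoid(g(z[i], z[j]))             # sign distribution
                A[i, j] = 1 if rng.random() < p_pos else -1
                A[j, i] = A[i, j]
    return A, z

# Illustrative choices: latent positions in R^2, f built from inner products,
# and g(z1, z2) = h(z1) h(z2) with h(z) = z[0], which satisfies requirement (1)
# in Proposition 1 below.
A, z = sample_signed_network(
    n=50,
    sample_z=lambda: rng.normal(size=2),
    f=lambda z1, z2: sigmoid(z1 @ z2 - 1.0),
    g=lambda z1, z2: z1[0] * z2[0],
)
```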
Note that in the network generative process in Definition 2, the first part, which generates edges, covers many existing latent variable models for unsigned networks as special cases by specifying different functions $f$; the second part, which generates signs, further models the sign distribution through the logit-transformed probability $g(z_i, z_j)$, as the sign of $g(z_i, z_j)$ is the same as that of the conditional expectation of $A_{ij}$ given that an edge is present. Given this general class of latent space models for signed networks, we next identify the connection between the population-level balance and the key components $(f, g, P_z)$ of the model. As we will see, this connection serves as the foremost step in incorporating the balance theory into the modeling of signed networks. The following proposition provides a sufficient condition on the symmetric function $g$ such that the generated network is population-level balanced.
Proposition 1.
Suppose a symmetric function $g$ satisfies
$$g(z_1, z_2)\, g(z_2, z_3)\, g(z_3, z_1) > 0 \quad \text{for any } z_1, z_2, z_3 \in \tilde{\mathcal{Z}}, \qquad (1)$$
where $\tilde{\mathcal{Z}}$ is a subset of $\mathcal{Z}$ with probability one, i.e., $P_z(\tilde{\mathcal{Z}}) = 1$. Then, a network $A \sim \mathcal{G}(f, g, P_z)$ with a symmetric measurable function $f$ is population-level balanced.
The proof of Proposition 1 is based on the fact that $\mathbb{E}(A_{ij} \mid z_i, z_j) > 0$ if and only if the logit $g(z_i, z_j) > 0$, provided that the probability that an edge appears between node $i$ and node $j$ is nonzero. The details are provided in the Supplemental Material. We note that though there is room for relaxation of requirement (1), its simplicity provides a feasible direction for further analyses of the form of $g$.
Since not every symmetric function $g$ satisfies (1), it is desirable to study what characteristics the function $g$ should have. To this end, we establish in Theorem 1 the necessary and sufficient conditions for $g$ to satisfy (1).
Theorem 1.
For a symmetric function $g$, $g(z_1, z_2)\, g(z_2, z_3)\, g(z_3, z_1) > 0$ holds for any $z_1, z_2, z_3 \in \tilde{\mathcal{Z}}$, where $\tilde{\mathcal{Z}} \subseteq \mathcal{Z}$ with $P_z(\tilde{\mathcal{Z}}) = 1$, if and only if
- (i) the function $g$ is positive on $\tilde{\mathcal{Z}}$, i.e., $g(z_1, z_2) > 0$ for any $z_1, z_2 \in \tilde{\mathcal{Z}}$; or
- (ii) there exist two nonempty subsets $\mathcal{Z}_1$ and $\mathcal{Z}_2$, with $\mathcal{Z}_1 \cup \mathcal{Z}_2 = \tilde{\mathcal{Z}}$ and $\mathcal{Z}_1 \cap \mathcal{Z}_2 = \emptyset$, such that $\mathrm{sign}\{g(z_1, z_2)\} = \tilde{\mathbb{1}}\{z_1 \text{ and } z_2 \text{ belong to the same subset}\}$ for any $z_1, z_2 \in \tilde{\mathcal{Z}}$, where $\tilde{\mathbb{1}}(\cdot)$ is not the usual indicator function, but rather equals $1$ if the event holds and $-1$ otherwise.
Theorem 1 implies that, for a function $g$ to satisfy (1), if it is not always positive, then $\tilde{\mathcal{Z}}$ can be divided into two nonempty disjoint subsets such that $g$ is positive when its two arguments belong to the same subset and negative otherwise. On the other hand, if $g$ is always positive, it corresponds to a trivial case in which the expected signs between all pairs of nodes are positive. Note that when $\mathcal{Z}$ is discrete and finite, Theorem 1 is a direct result of [12], while our theorem applies to more general latent spaces.
Next, we further illustrate the implications of Theorem 1 for choosing the function $g$ to describe the population-level balance, taking commonly used latent spaces as examples.
Example 1 The latent space $\mathcal{Z}$ can be a finite set as in stochastic block models [1]. Let $\mathcal{Z} = \{1, \dots, K\}$ and let $z_i$ denote the community that node $i$ belongs to. Theorem 1 implies that the communities can be further combined into two groups such that edges tend to be positive within the same group and negative between different groups, as shown in the left panel of Figure 2. We provide a rigorous description of the above result in the following corollary.
Corollary 1.1.
For a finite set $\mathcal{Z} = \{1, \dots, K\}$, the symmetric function $g$ satisfies (1) if and only if there exist a grouping function $\sigma: \{1, \dots, K\} \to \{1, 2\}$ and positive constants $b_{kl}$ such that $g(k, l) = \tilde{\mathbb{1}}\{\sigma(k) = \sigma(l)\}\, b_{kl}$ holds for all $k, l \in \mathcal{Z}$.
Note that here the grouping function $\sigma$ identifies two antagonistic groups in the signed network, where nodes from different groups tend to “dislike” each other.
[Figure 2: Left: communities combined into two groups, with positive edges within groups and negative edges between groups. Right: a classifier boundary partitioning the Euclidean latent space into two regions.]
Example 2 The latent space can also be a Euclidean space, $\mathcal{Z} = \mathbb{R}^d$, as in the latent distance model and the latent projection model [14]. The following proposition provides an important class of continuous symmetric functions for which requirement (1) is satisfied.
Proposition 2.
For the Euclidean space $\mathcal{Z} = \mathbb{R}^d$, requirement (1) holds if $g(z_1, z_2) = h(z_1)\, h(z_2)$, where $h: \mathbb{R}^d \to \mathbb{R}$ is a continuous function and $P_z(\{z : h(z) = 0\}) = 0$.
From the mathematical perspective, Proposition 2 is an immediate consequence of the inequality in (1). However, in combination with Theorem 1, it leads to a useful and interesting interpretation of the function $h$. Specifically, $h$ can be viewed as the logit (or score) of any binary “classifier” that separates $\mathbb{R}^d$ into two disjoint regions; the function $g$ is positive if its two arguments are classified into the same region and negative otherwise. As shown in the right panel of Figure 2, the boundary of the classifier tries to cut as many negative edges as possible while retaining most positive edges within the same region.
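To make the reasoning behind Proposition 2 explicit: since $g(z_1, z_2) = h(z_1)\, h(z_2)$, for any $z_1, z_2, z_3$ at which $h$ is nonzero,
$$g(z_1, z_2)\, g(z_2, z_3)\, g(z_3, z_1) = \{h(z_1)\, h(z_2)\, h(z_3)\}^2 > 0,$$
so requirement (1) holds on $\tilde{\mathcal{Z}} = \{z : h(z) \neq 0\}$, which has probability one by assumption; the two subsets in Theorem 1 are then $\{z : h(z) > 0\}$ and $\{z : h(z) < 0\}$.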
Remark 1.
For the special case $g(z_1, z_2) = z_1^{\top} z_2$, [13] discussed the use of the inner product of two latent variables to capture various third-order dependence patterns among residuals in the context of regression, including the balance theory. Specifically, when the dimension is one, [13] observed that all triads among residuals are balanced, which allows for clustering nodes into two groups based on the signs of the latent variables. This observation aligns with our findings on the separation of the latent space. Nonetheless, the balance concept studied in [13] is at the triangle level and does not extend to the entire network structure. In this work, we have introduced a rigorous stochastic definition of population-level balance for signed networks. Furthermore, the condition in Proposition 1 for population-level balance is more general and includes the inner product as an example.
Remark 2.
Moreover, our findings in this section can be generalized beyond triangles. In the Supplemental Material, we generalize the notion of population-level balance and extend Proposition 1 to $k$-loops as well as to general weighted networks. In addition, we extend the infinitely exchangeable graphon model to signed networks and expand Theorem 1 accordingly.
3 Balanced Inner-product Models
Motivated by the key finding in Theorem 1, we propose two inner-product models for signed networks that fall within the general class of latent space models in Definition 2. Both models are inherently population-level balanced. We first present the separate inner-product model and then introduce the joint inner-product model by adding an additional structural assumption. We will demonstrate the usefulness of this structural assumption for estimating the latent variables both theoretically (when it is correctly specified) and empirically in subsequent sections.
3.1 Separate inner-product model
We assume that for any $1 \le i < j \le n$, we have
$$\mathrm{logit}\, \mathbb{P}(|A_{ij}| = 1) = \alpha_i + \alpha_j + z_i^{\top} z_j, \qquad (2)$$
where $\alpha_i \in \mathbb{R}$ and $z_i \in \mathbb{R}^d$ (for $i = 1, \dots, n$) are latent variables. Further, independently of all others, we also assume
$$\mathrm{logit}\, \mathbb{P}(A_{ij} = 1 \mid |A_{ij}| = 1) = v_i v_j, \qquad (3)$$
where $v_i \in \mathbb{R}$ for $i = 1, \dots, n$ are also latent variables.
The proposed separate inner-product model has the capacity to capture various commonly observed characteristics of signed networks. Specifically, the parameter $\alpha_i$ enables modeling node degree heterogeneity: a larger value leads to a higher probability of connecting with other nodes, with the other parameters fixed. Thus, we call the $\alpha_i$'s degree heterogeneity parameters. Next, the inner-product form between the latent position vectors $z_i$ and $z_j$ inherently models transitivity, i.e., nodes with common neighbors (regardless of friend or enemy) are more likely to be linked: the closer the latent position vectors of two nodes are in the latent space, the larger their inner product, and the more likely they are to connect. Finally, the parameters $v_i$ model the distribution of signs through their product, which satisfies the sufficient condition (1) for population-level balance. In particular, an edge between two nodes tends to have a positive sign when their latent variables $v_i$ and $v_j$ have the same sign and a negative sign otherwise. Moreover, the magnitude of $v_i v_j$ controls the discrepancy level between positive and negative signs. Therefore, we name the $v_i$'s latent polar variables. When all latent polar variables are zero, negative and positive signs are exchangeable.
Following [29], which is for unsigned networks, we impose no distributional assumptions (prior) on the latent variables for the sake of modeling flexibility and estimation scalability, in comparison to treating them as random and using Bayesian estimation approaches as in existing works [19].
For presentation simplicity, we rewrite the model in matrix form. Specifically, we have $\Theta = \alpha \mathbf{1}_n^{\top} + \mathbf{1}_n \alpha^{\top} + Z Z^{\top}$, where $\Theta_{ij} = \mathrm{logit}\, \mathbb{P}(|A_{ij}| = 1)$, $\mathbf{1}_n$ is the all-one vector in $\mathbb{R}^n$, $\alpha = (\alpha_1, \dots, \alpha_n)^{\top}$, and $Z = (z_1, \dots, z_n)^{\top} \in \mathbb{R}^{n \times d}$.
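In this matrix form, simulating the separate inner-product model takes only a few lines. The following Python sketch (with parameter choices of our own, purely for illustration) generates $A$ from (2)-(3):

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

n, d = 200, 2
Z = rng.normal(size=(n, d))
Z = Z - Z.mean(axis=0)                   # centering: J Z = Z (identifiability)
alpha = rng.uniform(-2.0, -1.0, size=n)  # degree heterogeneity parameters
v = rng.normal(size=n)                   # latent polar variables

Theta = alpha[:, None] + alpha[None, :] + Z @ Z.T   # logit P(|A_ij| = 1)
S = np.outer(v, v)                                  # logit P(A_ij = +1 | edge)

E = (rng.random((n, n)) < sigmoid(Theta)).astype(int)    # edge indicators
signs = np.where(rng.random((n, n)) < sigmoid(S), 1, -1) # edge signs
A = np.triu(E * signs, k=1)              # keep i < j only, no self-loops
A = A + A.T                              # symmetrize
```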
Identifiability To ensure identifiability of the parameters $(\alpha, Z, v)$, we provide additional constraints in Proposition 3. Given centered latent position vectors, that is, $JZ = Z$ with $J = I_n - \mathbf{1}_n \mathbf{1}_n^{\top}/n$, the parameters are identifiable up to an orthogonal transformation of $Z$ and a sign flip of $v$.
3.2 Joint inner-product model
Based on the above separate inner-product model, we further consider the dependency of the latent polar variable $v_i$ on the latent position vector $z_i$. The idea of introducing their relationship originates naturally from Proposition 2, where we view the latent polar variable as a function of the latent position vector, i.e., $v_i = h(z_i)$ for some link function $h$. Modeling such a link function can provide more structural information about the network. On the other hand, there is flexibility in choosing the family of the link function $h$, which leads to different shapes of the latent space partition derived from $h$. For the scope of this paper, we assume $h$ is linear in the joint inner-product model, i.e., $h(z) = w^{\top} z + b$ with $w \in \mathbb{R}^d$ and $b \in \mathbb{R}$, and discuss nonlinear alternatives in Remark 3. Specifically, the joint inner-product model is given by (2) and replaces (3) with
$$\mathrm{logit}\, \mathbb{P}(A_{ij} = 1 \mid |A_{ij}| = 1) = (w^{\top} z_i + b)(w^{\top} z_j + b). \qquad (4)$$
In particular, the hyperplane $\{z \in \mathbb{R}^d : w^{\top} z + b = 0\}$ separates the latent space into two regions. A pair of nodes tends to have a positive edge when their latent positions are located on the same side of the hyperplane and a negative edge when their latent positions are located on different sides. If $w = 0$ and $b \neq 0$, the sign of each edge has a homogeneous logit $b^2 > 0$ of being positive.
Identifiability For the joint inner-product model, the identifiability condition for the parameters $(\alpha, Z, w, b)$ is established correspondingly in Proposition 4.
Proposition 4.
Given centered latent position vectors, that is, $JZ = Z$, the parameters $(\alpha, Z, w, b)$ in the joint inner-product model are identifiable up to an orthogonal transformation applied jointly to $Z$ and $w$, and a simultaneous sign flip of $(w, b)$.
Remark 3.
Though we use a linear link function in the joint inner-product model (4), more flexible nonlinear functions can be considered. For example, we may assume $h$ belongs to a reproducing kernel Hilbert space (RKHS) $\mathcal{H}$ associated with an inner product $\langle \cdot, \cdot \rangle_{\mathcal{H}}$ under which $\mathcal{H}$ is complete. There exists a positive semidefinite kernel function $K(\cdot, \cdot)$ such that $h(z) = \langle h, K(z, \cdot) \rangle_{\mathcal{H}}$. Multiple choices of RKHS are available for practical use, including those with polynomial kernel, Gaussian kernel, and Laplacian kernel [23].
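As an illustration of one such nonlinear alternative, the following Python sketch parameterizes $h$ through a Gaussian-kernel expansion over the latent positions; this finite expansion is our assumed representation for illustration, not a construction from the paper.

```python
import numpy as np

def gaussian_kernel(Z1, Z2, gamma=1.0):
    """K(z, z') = exp(-gamma * ||z - z'||^2)."""
    sq = ((Z1[:, None, :] - Z2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

# A representer-style expansion: h(z) = sum_j beta_j K(z_j, z),
# so the polar variables become v = K beta for the kernel matrix K.
rng = np.random.default_rng(0)
Z = rng.normal(size=(100, 2))      # latent positions
beta = rng.normal(size=100)        # expansion coefficients
v = gaussian_kernel(Z, Z) @ beta   # nonlinear polar variables
```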
4 Model Estimation
In this section, we develop two methods for fitting the proposed models (2)-(4). Both methods minimize the negative log-likelihood function of the balanced inner-product models through projected gradient descent.
Under balanced inner-product models, the negative log-likelihood function consists of two parts. The first part is derived from the probability of forming edges:
$$\mathcal{L}_1(\alpha, Z) = -\sum_{1 \le i < j \le n} \big[\, |A_{ij}| \log \sigma(\Theta_{ij}) + (1 - |A_{ij}|) \log\{1 - \sigma(\Theta_{ij})\} \,\big],$$
where $\Theta = \alpha \mathbf{1}_n^{\top} + \mathbf{1}_n \alpha^{\top} + Z Z^{\top}$ and $\sigma(x) = 1/(1 + e^{-x})$ is the sigmoid function, the inverse of the logit function. The second part is derived from the probability of assigning signs:
$$\mathcal{L}_2(v) = -\sum_{i < j:\, A_{ij} \neq 0} \big[\, \mathbb{1}(A_{ij} = 1) \log \sigma(v_i v_j) + \mathbb{1}(A_{ij} = -1) \log\{1 - \sigma(v_i v_j)\} \,\big],$$
where $v = (v_1, \dots, v_n)^{\top}$; under the joint inner-product model, we further have $v = Zw + b\mathbf{1}_n$, or equivalently, $v$ belongs to the column space of $(Z, \mathbf{1}_n)$.
The first method estimates the parameters $(\alpha, Z)$ and $v$ separately by minimizing $\mathcal{L}_1$ and $\mathcal{L}_2$ respectively; hence we name it the separate estimation method. Note that the separate estimation method does not depend on a specific relationship between the latent polar variables $v_i$ and the latent position vectors $z_i$. Therefore, it can always be applied regardless of the underlying link function $h$. Alternatively, we also propose a joint estimation method tailored to the joint inner-product model, which exploits the structural assumption for more accurate estimation. Specifically, we jointly estimate the parameters by minimizing a weighted sum of $\mathcal{L}_1$ and $\mathcal{L}_2$, while constraining $v$ to be in the column space of $(Z, \mathbf{1}_n)$.
Notation Before presenting the algorithm details, we first introduce the following general notation to be used hereafter. For any matrix $M$, $M_{i \cdot}$ and $M_{\cdot j}$ denote the $i$-th row and $j$-th column of $M$ respectively, and for any function $f: \mathbb{R} \to \mathbb{R}$, $f(M)$ represents applying the function element-wise to $M$, that is, $[f(M)]_{ij} = f(M_{ij})$. We use $\circ$ to denote the Hadamard product, that is, for any two matrices $M$ and $N$ of the same dimensions, $[M \circ N]_{ij} = M_{ij} N_{ij}$. Moreover, we use $\|\cdot\|_F$, $\|\cdot\|_{\mathrm{op}}$, $\|\cdot\|_*$, and $\|\cdot\|_{\max}$ to denote the Frobenius norm, the operator norm, the nuclear norm, and the max norm of a matrix respectively. We use $\mathrm{col}(M)$ to denote the column space of $M$. For a vector $x$, we use $\|x\|_2$ to denote its Euclidean norm.
4.1 Separate estimation method
First, to estimate the parameters $(\alpha, Z)$, we solve the non-convex optimization problem
$$(\hat{\alpha}, \hat{Z}) = \operatorname*{arg\,min}_{\alpha, Z} \mathcal{L}_1(\alpha, Z) \quad \text{subject to } JZ = Z. \qquad (5)$$
In particular, the signed adjacency matrix enters the objective function through its absolute value $|A|$, which leads to the same optimization problem studied in [29] when there is no edge covariate. Here we adopt the projected gradient descent algorithm along with the initialization method proposed in [29] because of their theoretical guarantees and scalability to large networks. We provide a detailed description of the method in Algorithm S1 and the initialization algorithm in the Supplemental Material.
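For intuition, a bare-bones version of such a projected gradient step for (5) is sketched below in Python; the step sizes, random initialization, and the simple centering projection are placeholder assumptions, and the actual Algorithm S1 uses the initialization and projections described in the Supplemental Material.

```python
import numpy as np

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def pgd_edge_model(A, d, eta_z=1e-3, eta_a=1e-3, iters=500):
    """Projected gradient descent on the edge part L1(alpha, Z)."""
    n = A.shape[0]
    Aabs = np.abs(A)
    rng = np.random.default_rng(0)
    Z = rng.normal(scale=0.1, size=(n, d))
    alpha = np.zeros(n)
    for _ in range(iters):
        Theta = alpha[:, None] + alpha[None, :] + Z @ Z.T
        G = sigmoid(Theta) - Aabs             # gradient of L1 w.r.t. Theta
        np.fill_diagonal(G, 0.0)              # no self-loops
        Z -= eta_z * 2.0 * (G @ Z)            # chain rule through Z Z^T
        alpha -= eta_a * 2.0 * G.sum(axis=1)  # chain rule through alpha terms
        Z -= Z.mean(axis=0)                   # project onto {Z : J Z = Z}
    return alpha, Z
```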
Next, to estimate the latent polar variables $v$, we solve another non-convex optimization problem, i.e.,
$$\hat{v} = \operatorname*{arg\,min}_{v} \mathcal{L}_2(v). \qquad (6)$$
Similarly, we develop a fast gradient descent algorithm, which is summarized in Algorithm S2. We also use an initialization algorithm based on the universal singular value thresholding proposed by [26] (see the Supplemental Material).
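A matching sketch for (6) highlights the key difference from (5): only the observed edges $\{(i, j) : A_{ij} \neq 0\}$ contribute to the gradient. Again, the step size and initialization below are placeholders rather than the choices in Algorithm S2.

```python
import numpy as np

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def gd_sign_model(A, eta=1e-3, iters=500):
    """Gradient descent on the sign part L2(v); unobserved pairs contribute nothing."""
    n = A.shape[0]
    observed = (A != 0).astype(float)   # mask of present edges
    pos = (A == 1).astype(float)        # positive-sign indicators
    rng = np.random.default_rng(0)
    v = rng.normal(scale=0.1, size=n)
    for _ in range(iters):
        P = sigmoid(np.outer(v, v))     # model probability of a positive sign
        G = observed * (P - pos)        # gradient w.r.t. v v^T; zero off-support
        np.fill_diagonal(G, 0.0)
        v -= eta * 2.0 * (G @ v)        # chain rule through outer(v, v)
    return v
```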
Remark 4.
We note that, although we use gradient descent algorithms for estimating both $(\alpha, Z)$ and $v$, a subtle difference in their objectives makes the theory in [29] not directly applicable to (6). Specifically, unlike the objective in (5), not all elements of the signed adjacency matrix contribute to the objective in (6). Instead, only the nonzero entries, i.e., $\{(i, j) : A_{ij} \neq 0\}$, are used for inferring the latent polar variables through (6). In this case, one key step in establishing the improvement in errors of iterates in [29] no longer holds. Therefore, we establish a new error bound for Algorithm S2 (see Section 5).
Remark 5.
We also note that our optimization problem in (6) is closely related to the line of research on low-rank matrix estimation. See [18, 2, 7, 4, 25, 24] for a sample of references. In particular, (6) can be viewed as a one-bit matrix completion problem, where we observe a random subset of binary entries generated from a distribution determined by a low-rank matrix. To solve this problem, [7] considered a convex relaxation that replaces the low-rank constraint by nuclear norm penalization. Though it becomes a convex optimization problem, in general, solving such a nuclear-norm penalized optimization problem requires a singular value decomposition at each iteration, which is computationally expensive for large matrices. Alternatively, gradient descent algorithms have been used to improve computational efficiency. [4] and [24] have established convergence guarantees and statistical errors for gradient descent algorithms applied to low-rank matrix estimation problems, which in particular cover the one-bit matrix completion problem. However, the theories in the aforementioned works are based on the uniform random sampling assumption, i.e., each entry of the matrix is observed independently with a uniform probability, while in our case, entries are observed with different probabilities $\sigma(\Theta_{ij})$. Thus, our theoretical analysis of the proposed gradient descent algorithm in Section 5 provides new results for one-bit matrix completion under non-uniform random sampling.
4.2 Joint estimation method
Under the joint inner-product model given by (2) and (4), we propose to jointly estimate the parameters $(\alpha, Z, w, b)$ by re-parameterizing $v = Zw + b\mathbf{1}_n$. By introducing a hyperparameter $\lambda \ge 0$, we minimize the following weighted negative log-likelihood,
$$\mathcal{L}_{\lambda}(\alpha, Z, v) = \mathcal{L}_1(\alpha, Z) + \lambda\, \mathcal{L}_2(v),$$
subject to $JZ = Z$ and $v \in \mathrm{col}\{(Z, \mathbf{1}_n)\}$.
Here $\lambda$ controls the relative weight of information from the edge formation and the sign assignment. In particular, when $\lambda = 0$, no information from the edge signs is used and the joint estimation reduces to the separate estimation of $(\alpha, Z)$ in (5). Later in Section 5.2, we will theoretically show that, under certain conditions, any positive $\lambda$ below some threshold yields more accurate estimation of the latent position variables than the separate estimation (i.e., $\lambda = 0$), but the magnitude of the improvement depends on the choice of $\lambda$. In principle, we can select $\lambda$ in a data-driven manner by performing cross-validation on the observed signed adjacency matrix, where we randomly mask a subset of entries, fit the joint inner-product model using the remaining entries, repeat the process multiple times, and then select the $\lambda$ from a candidate set with the best average prediction performance on the holdout entries. In practice, we find that simply setting $\lambda = 1$ also works generally well, in which case the solution becomes the usual maximum likelihood estimator.
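A sketch of this cross-validation procedure for selecting $\lambda$ is given below in Python; the masking fraction, the candidate grid, and the `fit` and `heldout_loglik` helpers are hypothetical stand-ins for Algorithm S5 and a held-out predictive log-likelihood.

```python
import numpy as np

def select_lambda(A, fit, heldout_loglik, candidates=(0.1, 0.5, 1.0, 2.0),
                  n_splits=5, mask_frac=0.1, seed=0):
    """Pick lambda by masking entries, fitting on the rest, scoring the holdout.
    `fit(A_train, lam)` and `heldout_loglik(params, A, mask)` stand in for the
    joint estimation algorithm and its predictive log-likelihood."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    rows, cols = np.triu_indices(n, k=1)
    scores = {lam: [] for lam in candidates}
    for _ in range(n_splits):
        mask = rng.random(len(rows)) < mask_frac   # held-out pairs (i < j)
        A_train = A.copy()
        # NOTE: setting masked entries to 0 conflates "missing" with "no edge";
        # a real fit routine should carry the missingness mask explicitly.
        A_train[rows[mask], cols[mask]] = 0
        A_train[cols[mask], rows[mask]] = 0
        for lam in candidates:
            params = fit(A_train, lam)
            scores[lam].append(heldout_loglik(params, A, mask))
    return max(candidates, key=lambda lam: np.mean(scores[lam]))
```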
To solve the above constrained minimization problem, we develop a projected gradient descent algorithm, whose details are given in Algorithm S5 in the Supplemental Material. We initialize the algorithm with the estimates obtained from Algorithms S1 and S2.
5 Theoretical Results
In this section, we establish high-probability error bounds for the two proposed estimation methods. Note that for the separate estimation method, the error bound for estimating the latent position vectors $z_i$ under model (2) and that for estimating the latent polar variables $v_i$ under model (3) are derived separately. Thus, the separate estimation method is robust in the sense that, when one of models (2) and (3) is mis-specified, our theoretical results still hold for the other. On the other hand, for the joint estimation method that utilizes the relationship between $v_i$ and $z_i$, we further discuss how incorporating their dependency can help reduce the estimation error of the latent variables under the joint inner-product model (4).
5.1 Results for the separate estimation method
We present theoretical guarantees of Algorithms S1 and S2 under the separate inner-product model (2) and model (3) respectively. Note that the error bound for the outputs of Algorithm S1 is a straightforward result of [29, Theorem 9] when there is no edge covariate; we adjust it in Proposition 5 for presentation coherence. Nonetheless, their theory cannot be directly applied to the setting of Algorithm S2, because only nonzero entries of the signed adjacency matrix are included in the objective (6), which breaks an important step towards establishing the estimation improvements for successive iterations in their proof. Thus, our established error bound for the outputs of Algorithm S2 is a new result for a more general setting, where entries are observed with non-uniform probabilities.
We describe error bounds for the outputs of Algorithms S1 and S2 in detail below. We first define the parameter spaces as
(7)
and
(8)
We allow the bound parameters in (7)-(8) to change with the network size $n$, similarly as in [29]. Note that, given the inequalities in (7), the parameter bounding the logit-transformed probabilities of observing edges controls the network sparsity, with a larger value leading to a sparser network. The true parameters are denoted by $\alpha^*$, $Z^*$, and $v^*$.
Error bound for Algorithm S1 Let $(\alpha^{(t)}, Z^{(t)})$ be the updated parameters at the $t$-th iteration in Algorithm S1 and $\Theta^{(t)} = \alpha^{(t)} \mathbf{1}_n^{\top} + \mathbf{1}_n (\alpha^{(t)})^{\top} + Z^{(t)} (Z^{(t)})^{\top}$. Since the latent position vectors are identifiable up to an orthogonal transformation, we define the distance between two latent matrices $Z_1$ and $Z_2$ as $\mathrm{dist}(Z_1, Z_2) = \min_{O \in \mathbb{O}(d)} \|Z_1 O - Z_2\|_F$, where $\mathbb{O}(d)$ is the collection of all orthogonal matrices in $\mathbb{R}^{d \times d}$.
For theoretical justification, in Algorithm S1, we further assume a projection onto the constraint sets induced by (7) at each iteration. The following proposition establishes high-probability error bounds for estimating both the latent position matrix $Z^*$ and the logit-transformed probability matrix $\Theta^*$.
Proposition 5.
Suppose 1) the initializers in Algorithm S1 are sufficiently close to the truth, with the required closeness governed by a sufficiently small positive constant and the condition number of $Z^*$; and 2) the signal-strength condition of [29, Theorem 9] holds with a sufficiently large constant. Then, uniformly over the parameter space in (7), with high probability, the estimation errors of $\hat{Z}$ (in squared distance $\mathrm{dist}^2$) and $\hat{\Theta}$ (in squared Frobenius norm) are bounded; the dependence of these bounds on the network density is discussed in Remark 6.
Error bound for Algorithm S2 Let $v^{(t)}$ be the updated parameters at the $t$-th iteration in Algorithm S2. Similarly, as the latent polar variables are identifiable up to a sign, we define the distance between two latent vectors $v_1$ and $v_2$ as $\mathrm{dist}(v_1, v_2) = \min\{\|v_1 - v_2\|_2, \|v_1 + v_2\|_2\}$.
Although the error bound presented below does not rely on a specific generating process for the edges, such as model (2) with the parameter space in (7), it depends on the lower bound of the probability of observing an edge. For notation consistency, we quantify this through a lower bound on the entries of the logit-transformed probability matrix $\Theta^*$. Similarly, for theoretical justification, we constrain the iterates to lie in the set induced by (8) at each iteration in Algorithm S2. The following theorem establishes high-probability error bounds for estimating the latent polar variables $v^*$ and the logit-transformed probability matrix $v^* (v^*)^{\top}$.
Theorem 2.
Suppose the step size is chosen sufficiently small, up to a universal constant; 1) the initializer in Algorithm S2 is sufficiently close to $v^*$, with the required closeness governed by a sufficiently small positive constant; and 2) the network is sufficiently dense, with the corresponding constant sufficiently large. Then, uniformly over the parameter space in (8) and the edge probabilities, with high probability, the estimation errors of $\hat{v}$ (in squared distance $\mathrm{dist}^2$) and $\hat{v}\hat{v}^{\top}$ (in squared Frobenius norm) are bounded at the rate discussed below.
Theorem 2 implies that the mean squared error coincides with the existing error rate for one-bit rank-one matrix completion problems [7, 4], while our result can be viewed as an extension to the case where entries of the one-bit matrix are randomly observed with non-uniform probabilities. In particular, for the more general non-uniform case, the key ingredient in our proof is to derive a lower bound for the sampling operator. We prove that the sampling operator is bounded below by a positive constant as long as its argument belongs to a specific data-dependent set. This positive lower bound enables us to extend the proof in [29] when establishing iterative improvements. The proof of Theorem 2 is given in the Supplemental Material.
Remark 6.
Note that the error bounds for $\hat{v}$ and $\hat{v}\hat{v}^{\top}$ in Theorem 2 hold regardless of the concrete form of model (2); the above results therefore remain valid even if model (2) is mis-specified. The bounds do, however, depend on the probability of observing edges. Theorem 2 implies that the error bound decreases as the network density grows, with the impact of the density entering through a multiplicative factor. Intuitively, a sparse network, with fewer observed edges, leads to larger estimation errors for $v$ and $v v^{\top}$ due to the lack of sign observations. Similarly, Proposition 5 implies that the network density affects the error bound of $\hat{\Theta}$ through a multiplicative factor, which takes a different form for networks of modest density than for sparser networks.
Remark 7.
The assumptions in both Proposition 5 and Theorem 2 require relatively good initializations. We note that the conditions for the initializers of $(\alpha, Z)$ can be achieved with theoretical justification by the initialization algorithm of [29], which is based on universal singular value thresholding [26]. We further extend this algorithm to initialize $v$. Based on our simulation studies (see the Supplemental Material), we find that simple random initialization also achieves similar estimation errors after the algorithm converges, while requiring more iterations for convergence.
5.2 Results for the joint estimation method
We first present the convergence guarantee and the error bound for the estimators obtained by Algorithm S5. Then we further investigate how the joint estimation method can improve the estimation of $Z$ on top of the separate estimation.
Under the joint inner-product model, we redefine the parameter space accordingly, with bound parameters again allowed to change with the network size $n$. The true parameters are denoted by $(\alpha^*, Z^*, w^*, b^*)$, with $v^* = Z^* w^* + b^* \mathbf{1}_n$.
Error bound for Algorithm S5 Let $(\alpha^{(t)}, Z^{(t)}, v^{(t)})$ be the updated parameters at the $t$-th iteration in Algorithm S5. We assume projection onto the same constraint sets at the end of each iteration as those for Algorithms S1 and S2. The following theorem first guarantees that the error of the iterates converges up to a statistical error, and then gives high-probability error bounds for the estimators of $Z^*$ and $\Theta^*$.
Theorem 3.
Suppose the step sizes and the weight $\lambda$ are chosen suitably small, up to universal constants, and define the error metric for the iterates as the sum of the squared estimation errors of $(\alpha^{(t)}, Z^{(t)}, v^{(t)})$. Suppose the initializers in Algorithm S5 are sufficiently close to the truth, with the required closeness governed by a sufficiently small positive constant and the condition number of $Z^*$. Then, we have
1. (Deterministic bounds for iterative errors) If the statistical error terms are dominated by the initial error, with a sufficiently large constant, then there exist universal positive constants such that, for all iterations $t$, the error of the iterates contracts geometrically up to a statistical error term.
2. (High-probability bounds) Under the corresponding sample-size conditions with a sufficiently large constant, there exist positive constants, uniform over the parameter space, such that with high probability the estimation errors of $\hat{Z}$ and $\hat{\Theta}$ are bounded.
The first part of Theorem 3 indicates that, compared to the separate estimation method, the joint method, which involves the gradient of the sign likelihood, yields an extra improvement in the error bound of the iterates, while introducing another statistical error term. As a result, in the second part, the high-probability error bounds depend on the maximum of three terms, among which the first two are the same as in Proposition 5 and the third results from the sign likelihood. When $\lambda = 0$, the maximum reduces to the one in Proposition 5. Overall, the error bounds of $\hat{Z}$ and $\hat{\Theta}$ for the joint estimation method are of the same order as those for the separate estimation method. In addition, the following corollary gives the error bounds of $(\hat{w}, \hat{b})$ and $\hat{v}$ obtained from line 5 in Algorithm S5.
Corollary 3.1.
Suppose the conditions for the high-probability bounds in Theorem 3 hold. Then there exist positive constants, uniform over the parameter space, such that with high probability the estimation errors of $(\hat{w}, \hat{b})$ and $\hat{v}$ are bounded.
In particular, the deterministic error bound for $\hat{v}$ consists of a statistical error term and the estimation error of $(\hat{Z}, \hat{\Theta})$; with high probability, it is dominated by the latter and is thus of the same order.
One-step improvement Although Theorem 3 guarantees the convergence of Algorithm S5 up to certain statistical errors, the achieved error rate is of the same order as that for the separate estimation method. To further investigate how exploiting the extra structural information in the joint inner-product model helps estimate the latent variables, we consider the estimation error after moving one step against the gradient from the estimators obtained by the separate method, as described below.
Suppose we are given estimators $(\hat{\alpha}, \hat{Z})$ of the latent variables obtained from the separate estimation Algorithm S1 and an estimator $\hat{v}$. Then we update the estimator of $Z$ by one step through Algorithm S5 as below:
(9)
Note that, conditional on being observed, the edge signs independently follow a Bernoulli distribution. The following proposition provides insight into the scenarios under which the one-step update in the joint estimation method leads to better estimates of the latent position vectors. Below, for ease of derivation, we consider the parameter space with the dimension $d$ held fixed.
Proposition 6.
Given the estimators obtained from Algorithm S1 and an estimator of the polar variables that is conditionally independent of the adjacency matrix and satisfies a prerequisite error rate, we update $Z$ for one step by (9). Suppose the conditions in Proposition 5 hold, and the singular values of the sample covariance of the latent positions are of constant order. Then there exists an optimal step size that minimizes the resulting estimation error. Furthermore, under a signal-strength condition on the edge signs, the one-step update strictly improves upon the separate estimator, and the improvement at the optimal step size is bounded below by an explicit quantity given in (S42)-(S44) in the Supplemental Material.
We provide the expression of the optimal step size that minimizes the estimation error, the proof of Proposition 6, and a discussion of when the conditional independence assumption and the prerequisite error rate in Proposition 6 hold in the Supplemental Material. Since a positive improvement implies a strict decrease in the mean squared error of $\hat{Z}$ after the one-step update, we further investigate when the improvement tends to be positive. Our analysis of the upper bounds of the three terms in the improvement suggests that the first two terms dominate, and that a larger sign signal more likely results in a positive improvement. Therefore, when the signal from the edge sign distribution is strong, incorporating information from the observed signs in the joint estimation method is useful for improving the estimation of $Z$. Moreover, the magnitude of the improvement also depends on the levels of the signal and the noise in the sign distribution. Specifically, as the signal increases, the difference between the upper bounds of the two dominating terms in the numerator increases while the upper bound of the denominator decreases, so the improvement is likely to increase; that is, larger signals in the edge signs lead to greater improvement in estimating $Z$. On the other hand, the improvement decreases as the noise in the edge sign distribution increases.
6 Simulation Studies
In this section, we conduct simulation studies to investigate how the estimation errors of the proposed methods depend on: 1) the network size $n$ and the dimension $d$ of the latent position vectors; 2) the network density; and 3) the proportion of positive edges.
Estimation methods We compare three estimation methods. In addition to the separate estimation method and the joint estimation method introduced in Section 4, we add an intermediate method, one-step-joint estimation, to illustrate the one-step improvement discussed in Proposition 6. Specifically, given $(\hat{\alpha}, \hat{Z})$ and $\hat{v}$ obtained from Algorithms S1 and S2 respectively, we compute the one-step-joint estimators by plugging the separate estimates into the one-step update in (9). We set $\lambda = 1$, so that the joint estimation is the same as maximum likelihood estimation.
Simulation settings For a given network size $n$ and latent position dimension $d$, we set the model parameters as follows. We first generate the latent positions from the standard normal distribution for each node. By centering the columns of the resulting matrix, we obtain $Z^*$ satisfying $JZ^* = Z^*$. We further normalize $Z^*$ element-wise so that its entries are bounded. Next, we generate the node degree heterogeneity parameters $\alpha_i^*$ from a uniform distribution. Finally, we set $w^*$ and $b^*$, and thereby $v^* = Z^* w^* + b^* \mathbf{1}_n$.
Given the true latent variables $Z^*$, $\alpha^*$, and $v^*$, we randomly generate 20 replications of the signed adjacency matrix following (2) and (3), and fit models by the three estimation methods. For each method, we measure the relative errors of $\hat{Z}$, $\hat{\Theta}$, $\hat{v}$, and $\hat{v}\hat{v}^{\top}$. Due to the identifiability conditions in Proposition 4, we define the relative error of $\hat{Z}$ as $\min_{O \in \mathbb{O}(d)} \|\hat{Z} O - Z^*\|_F^2 / \|Z^*\|_F^2$, where $\mathbb{O}(d)$ is the collection of all orthogonal matrices in $\mathbb{R}^{d \times d}$. We define the error of $\hat{v}$ as $\min\{\|\hat{v} - v^*\|_2^2, \|\hat{v} + v^*\|_2^2\} / \|v^*\|_2^2$. The relative errors of $\hat{\Theta}$ and $\hat{v}\hat{v}^{\top}$ are defined analogously in the Frobenius norm.
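The orthogonal alignment in these metrics has a closed-form solution via the orthogonal Procrustes problem; a small Python sketch (function names are ours):

```python
import numpy as np
from scipy.linalg import orthogonal_procrustes

def rel_error_Z(Z_hat, Z_star):
    """Relative error of Z up to the best orthogonal transformation."""
    O, _ = orthogonal_procrustes(Z_hat, Z_star)  # argmin_O ||Z_hat O - Z_star||_F
    return np.linalg.norm(Z_hat @ O - Z_star)**2 / np.linalg.norm(Z_star)**2

def rel_error_v(v_hat, v_star):
    """Relative error of v up to a global sign flip."""
    err = min(np.linalg.norm(v_hat - v_star), np.linalg.norm(v_hat + v_star))
    return err**2 / np.linalg.norm(v_star)**2
```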
[Figure 3: Relative estimation errors of the three estimation methods under varying network sizes.]
6.1 Varying the network size and the dimension of the latent space
In Figure 3, we summarize how the estimation errors vary with the network size. We fix the latent dimension $d$ and vary the network size $n$. We can see that, for a fixed dimension of the latent space, the relative errors of all three estimation methods decrease as the network size grows, at a rate that aligns well with the theoretical error rates given in Section 5. Next, compared to the separate estimation method, the joint estimation method consistently achieves smaller estimation errors for all four quantities of interest across different network sizes. In addition, the one-step-joint estimation, which simply updates the estimates by one gradient step, is able to reduce the estimation errors relative to the separate estimation method.
We further summarize how the estimation errors of $\hat{Z}$ and $\hat{\Theta}$ vary with the dimension of the latent position vectors in Figure S3 in the Supplemental Material, for the sake of space. We fix the network size $n$ and vary $d$. We find that, for a fixed network size, the relative errors increase as the dimension of the latent position vectors grows, at a rate that also agrees well with our theoretical results. The relative ordering of the three estimation methods for different $d$ is similar to that in Figure 3, with the joint estimation method consistently the best.
6.2 Varying the network density
We investigate how the estimation errors of the three estimation methods vary with the network density. To this end, we generate the node degree heterogeneity parameters from a uniform distribution whose location is shifted to control the density. We fix the network size $n$ and the dimension $d$, and vary this location parameter, which leads to a wide range of network densities.
Figure S4 in the Supplemental Material summarizes the relative estimation errors of $\hat{Z}$ and $\hat{v}$ over the replications under different network densities. We can see that the estimation errors of both quantities for all three estimation methods decrease as the network gets denser, and the joint estimation method achieves lower estimation errors than the other two methods consistently across network densities. In particular, when the network is dense, the improvement in estimating $Z$ from the joint estimation over the separate estimation increases, which is expected because in joint estimation the observed edge signs are also useful for inferring $Z$, and denser networks provide more sign information. In addition, regarding the estimation error of $\hat{v}$, the joint and one-step-joint estimation methods, which use the additional structural information between $v$ and $Z$, perform more stably than the separate estimation method as the network gets sparser.
[Figure 4: Relative estimation errors of the three estimation methods under varying proportions of positive edges.]
6.3 Varying the proportion of positive edges
We also investigate the effect of the proportion of positive edges on the three estimation methods. For this purpose, we change the simulation setting: we fix the network size $n$ and the dimension $d$, and vary $b^*$, which results in a wide range of proportions of positive edges. To eliminate the artificial effect resulting from varying $b^*$ when evaluating the estimation error of $\hat{v}$, we focus on the estimation error of the centered polar variables, with the relative error defined analogously after centering both $\hat{v}$ and $v^*$.
Figure 4 summarizes the relative estimation errors of $\hat{Z}$ and the centered $\hat{v}$ over the replications under different proportions of positive edges. Overall, the joint estimation method performs consistently the best among the three methods and is robust across different sign distributions. We note that the estimation error of $\hat{Z}$ for the separate method does not change, because the generated absolute adjacency matrix $|A|$ does not change when varying $b^*$, and thereby the separate estimates stay the same. We also observe that the optimal performance of the separate method for estimating $v$ is not achieved at around 50% positive signs. Intuitively, in the extreme case where the polar variables shrink to zero, the negative and positive signs become indistinguishable, with the proportion of positive edges close to 50%; this makes it challenging to estimate $v$. The optimal proportion minimizing the estimation error may vary case by case; we investigate a stochastic block model example in the Supplemental Material for illustration.
7 International Relation Data
In this section, we apply the proposed method to an international relations dataset, the Correlates of War (COW) [16], to demonstrate how it can be used to produce informative and interpretable visualizations of signed networks. The COW dataset records various types of international relations among countries, such as wars, alliances, and militarized interstate disputes. Similarly to [17], we construct a signed network of countries, where positive edges represent alliance relationships and negative edges represent the existence of militarized disputes between countries. We take the snapshot of the records during World War II (WWII), i.e., from 1939 to 1945. In particular, if two countries were involved in both alliances and militarized disputes, we set the sign of their edge to positive if the number of years of alliance is larger than that of militarized disputes, and to negative otherwise. According to the COW records, 68 countries were involved in alliances or militarized disputes during WWII, and the resulting signed network contains 566 positive edges and 519 negative edges. The total number of balanced triangles is 2,148, outnumbering the unbalanced triangles (613) by over threefold. To further assess the balance property, we applied a stratified permutation test [9]. Under the null hypothesis, the sign of an edge is independent of its position in the graph, which corresponds to a balance-free network. The p-value of the stratified permutation test is below any conventional significance level, which indicates strong evidence of the balance property in this international relation network.
To incorporate the balance theory, we fit the joint inner-product model to the COW dataset. We set the dimension of the latent position vectors to $d = 2$, so that the estimated latent positions $\hat{z}_i$ can be directly visualized on a two-dimensional plane. The model fitted by the joint estimation method is visualized in Figure 5. In the figure, each node represents a country, with coordinates given by $\hat{z}_i$. The size of each node is determined by the estimated degree heterogeneity parameter $\hat{\alpha}_i$, with larger nodes corresponding to larger values. The color and the shape of each node encode the estimated latent polar variable $\hat{v}_i$: if $\hat{v}_i > 0$, the node is drawn as a red circular point; if $\hat{v}_i < 0$, as a blue square point. For both the red and the blue points, the larger the magnitude $|\hat{v}_i|$, the darker the color. The sign of each edge is also indicated in the figure, with dashed green being positive and solid purple being negative.
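As a sketch of how such a figure can be assembled (Python with matplotlib; the encodings follow the description above, while all function and argument names are ours):

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_signed_latent(Z_hat, alpha_hat, v_hat, A, labels=None):
    """Scatter nodes at Z_hat; size ~ alpha_hat, color/shape ~ sign of v_hat;
    positive edges dashed green, negative edges solid purple."""
    n = Z_hat.shape[0]
    for i in range(n):
        for j in range(i + 1, n):
            if A[i, j] != 0:
                style = dict(color="green", ls="--") if A[i, j] > 0 \
                        else dict(color="purple", ls="-")
                plt.plot(Z_hat[[i, j], 0], Z_hat[[i, j], 1], lw=0.5, **style)
    size = 40 * np.exp(alpha_hat - alpha_hat.min())  # larger alpha -> larger node
    for marker, group in (("o", v_hat >= 0), ("s", v_hat < 0)):
        plt.scatter(Z_hat[group, 0], Z_hat[group, 1], s=size[group],
                    c=np.abs(v_hat[group]),
                    cmap="Reds" if marker == "o" else "Blues",
                    marker=marker, zorder=3)
    if labels is not None:
        for i, name in enumerate(labels):
            plt.annotate(name, Z_hat[i], fontsize=6)
    plt.show()
```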
[Figure 5: Model-based visualization of the international relation network during WWII, fitted by the joint estimation method.]
Figure 5 shows that the proposed model fitted by the joint estimation method captures important information in the signed network of countries during WWII. First, in terms of the estimated node degree heterogeneity $\hat{\alpha}_i$, the top 11 countries are Germany, Italy, Japan, the United Kingdom (UK), Romania, the United States (USA), Brazil, Bulgaria, Hungary, France, and the Soviet Union (USSR). In particular, the UK, the USA, and the USSR were the three leading countries of the Allies of WWII. France played important and complicated roles in both the Allies and the Axis. Brazil was the only South American country that actively participated in WWII. All other countries were members of the Axis. Since members of the Axis were more active than those of the Allies on average, it is reasonable that even small Axis members have high values of $\hat{\alpha}_i$. Second, regarding the estimated latent polar variables $\hat{v}_i$, the model is able to divide the countries into two groups (blue and red) that mostly align with the division between the Axis and the Allies. As we assume a linear transformation from the latent positions to the polar variables, the plane of latent positions can be linearly separated into two areas, with the boundary illustrated by the grey dash-dotted line. We can see that edges crossing the boundary are mainly negative (purple), while edges within the same side are mainly positive (green). Finally, the estimated latent position vectors $\hat{z}_i$ also capture various other interesting aspects. Within the Allies, the latent positions further cluster the countries of the Americas together (see the right part of the figure), among which most edges are positive (green). The Middle East countries form a cluster as well (at the top of the figure), though not as tight as the countries of the Americas. It is also interesting to see that France, one of the major Allied powers in WWII, is positioned on the same side as Germany, Italy, and Japan. This is because, after being occupied by Germany, France was divided into two political powers, Free France and Vichy France, with the latter collaborating with Germany and fighting against the Allies in several campaigns from 1940 to 1944. We perform an additional sensitivity analysis in the Supplemental Material to assess the impact of splitting France into these two political entities.
We also fit the separate inner-product model and visualize the fitted model in Figure S6 in the Supplemental Material, for the sake of space. We find that the model fitted by the separate estimation method captures useful information from the network but is not as interpretable as the fit obtained by the joint estimation method (see the Supplemental Material for a more detailed discussion).
8 Discussion
In this paper, we propose a latent space approach that accommodates the structural balance theory for modeling signed networks. In particular, we introduce a novel notion of population-level balance, which is a natural choice to characterize the structural balance theory when we treat both the edges and their signs as random variables. We develop sufficient conditions for a latent space model to be balanced at the population level, and propose two balanced inner-product models following the conditions. We also provide scalable estimation algorithms with theoretical guarantees.
In Section 3, we proposed balanced inner-product models that use a one-dimensional polar variable. This design offers a direct and intuitive interpretation, where the sign of the polar variable inherently indicates membership in one of two antagonistic groups. Beyond this, multi-dimensional polar variables, i.e., $v_i \in \mathbb{R}^{d_v}$ with $d_v > 1$, could be more flexible in terms of capturing multiple latent factors in real-world applications. In that case, we have ascertained that our estimation method and high-probability error bounds still apply to the multi-dimensional scenario. However, it is also important to note that, with multi-dimensional polar variables, the conditions in Theorem 1 no longer hold. In addition, multi-dimensional polar variables have the potential drawback of complicating interpretation.
There are a few other directions we may continue to explore in the future. First, the joint inner-product model could be extended to a nonlinear link function $h$, which would increase the flexibility of the model. Second, we may generalize the proposed approach to weighted signed networks to better leverage the richer edge information available in real-world networks. Finally, it is desirable to extend the latent space approach from undirected to directed networks, which could potentially be used to model other interesting social theories for signed networks, such as social status theory.
References
- [1] Emmanuel Abbe “Community Detection and Stochastic Block Models: Recent Developments” In Journal of Machine Learning Research 18.177, 2018, pp. 1–86
- [2] Emmanuel J. Candes, Yonina C. Eldar, Thomas Strohmer and Vladislav Voroninski “Phase Retrieval via Matrix Completion” In SIAM Journal on Imaging Sciences 6.1, 2013, pp. 199–225 DOI: 10.1137/110848074
- [3] Sourav Chatterjee “Matrix estimation by Universal Singular Value Thresholding” In Annals of Statistics 43.1 The Institute of Mathematical Statistics, 2015, pp. 177–214 DOI: 10.1214/14-AOS1272
- [4] Yudong Chen and Martin J Wainwright “Fast low-rank estimation by projected gradient descent: General statistical and algorithmic guarantees” In arXiv preprint arXiv:1509.03025, 2015
- [5] Kai-Yang Chiang, Cho-Jui Hsieh, Nagarajan Natarajan, Inderjit S. Dhillon and Ambuj Tewari “Prediction and Clustering in Signed Networks: A Local to Global Perspective” In Journal of Machine Learning Research 15.34, 2014, pp. 1177–1213
- [6] Fan Chung and Linyuan Lu “Complex Graphs and Networks (CBMS Regional Conference Series in Mathematics)” USA: American Mathematical Society, 2006
- [7] Mark A Davenport, Yaniv Plan, Ewout Van Den Berg and Mary Wootters “1-bit matrix completion” In Information and Inference: A Journal of the IMA 3.3 OUP, 2014, pp. 189–223
- [8] Tyler Derr, Charu Aggarwal and Jiliang Tang “Signed network modeling based on structural balance theory” In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, 2018, pp. 557–566
- [9] Derek Feng, Randolf Altmeyer, Derek Stafford, Nicholas A. Christakis and Harrison H. Zhou “Testing for Balance in Social Networks” In Journal of the American Statistical Association Taylor & Francis, 2020, pp. 1–19 DOI: 10.1080/01621459.2020.1764850
- [10] Anna Goldenberg, Alice X. Zheng, Stephen E. Fienberg and Edoardo M. Airoldi “A Survey of Statistical Network Models” In Foundations and Trends® in Machine Learning 2.2, 2010, pp. 129–233 DOI: 10.1561/2200000005
- [11] Ramanthan Guha, Ravi Kumar, Prabhakar Raghavan and Andrew Tomkins “Propagation of trust and distrust” In Proceedings of the 13th International Conference on World Wide Web, 2004, pp. 403–412
- [12] Frank Harary “On the notion of balance of a signed graph.” In The Michigan Mathematical Journal 2.2 The University of Michigan, 1953, pp. 143–146
- [13] Peter D Hoff “Bilinear mixed-effects models for dyadic data” In Journal of the American Statistical Association 100.469 Taylor & Francis, 2005, pp. 286–295
- [14] Peter D Hoff, Adrian E Raftery and Mark S Handcock “Latent Space Approaches to Social Network Analysis” In Journal of the American Statistical Association 97.460 Taylor & Francis, 2002, pp. 1090–1098 DOI: 10.1198/016214502388618906
- [15] Cho-Jui Hsieh, Kai-Yang Chiang and Inderjit S Dhillon “Low rank modeling of signed networks” In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2012, pp. 507–515
- [16] Ahmet Izmirlioglu “The Correlates of War Dataset” In Journal of World-Historical Information 3.1 University Library System, University of Pittsburgh, 2017
- [17] Alec Kirkley, George T. Cantwell and M. E. J. Newman “Balance in signed networks” In Physical Review E 99 American Physical Society, 2019, pp. 012320 DOI: 10.1103/PhysRevE.99.012320
- [18] Vladimir Koltchinskii, Karim Lounici and Alexandre B. Tsybakov “Nuclear-norm penalization and optimal rates for noisy low-rank matrix completion” In Annals of Statistics 39.5 The Institute of Mathematical Statistics, 2011, pp. 2302–2329 DOI: 10.1214/11-AOS894
- [19] Pavel N. Krivitsky, Mark S. Handcock, Adrian E. Raftery and Peter D. Hoff “Representing degree distributions, clustering, and homophily in social networks with latent cluster random effects models” In Social Networks 31.3, 2009, pp. 204–213 DOI: https://doi.org/10.1016/j.socnet.2009.04.001
- [20] Jure Leskovec, Daniel Huttenlocher and Jon Kleinberg “Signed networks in social media” In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 2010, pp. 1361–1370
- [21] Zhuang Ma, Zongming Ma and Hongsong Yuan “Universal Latent Space Model Fitting for Large Networks with Edge Covariates.” In Journal of Machine Learning Research 21.4, 2020, pp. 1–67
- [22] Mark Newman “Networks: An Introduction” Oxford University Press, 2010
- [23] Bernhard Scholkopf and Alexander J Smola “Learning with kernels: support vector machines, regularization, optimization, and beyond” Adaptive Computation and Machine Learning series, 2018
- [24] Lingxiao Wang, Xiao Zhang and Quanquan Gu “A unified computational and statistical framework for nonconvex low-rank matrix estimation” In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, 2017, pp. 981–990 PMLR
- [25] Qinqing Zheng and John Lafferty “Convergence analysis for rectangular matrix completion using Burer-Monteiro factorization and gradient descent” In arXiv preprint arXiv:1605.07051, 2016
Supplemental References
- [26] Sourav Chatterjee “Matrix estimation by Universal Singular Value Thresholding” In Annals of Statistics 43.1 The Institute of Mathematical Statistics, 2015, pp. 177–214 DOI: 10.1214/14-AOS1272
- [27] Chao Gao, Zongming Ma, Anderson Y Zhang and Harrison H Zhou “Achieving optimal misclassification proportion in stochastic block models” In Journal of Machine Learning Research 18.1 JMLR. org, 2017, pp. 1980–2024
- [28] Jing Lei and Alessandro Rinaldo “Consistency of spectral clustering in stochastic block models” In Annals of Statistics 43.1, 2015, pp. 215–237 DOI: 10.1214/14-AOS1274
- [29] Zhuang Ma, Zongming Ma and Hongsong Yuan “Universal Latent Space Model Fitting for Large Networks with Edge Covariates.” In Journal of Machine Learning Research 21.4, 2020, pp. 1–67
- [30] Yurii Nesterov “Introductory lectures on convex optimization: A basic course” Springer Science & Business Media, 2013
- [31] Stephen Tu, Ross Boczar, Max Simchowitz, Mahdi Soltanolkotabi and Ben Recht “Low-rank Solutions of Linear Matrix Equations via Procrustes Flow” In Proceedings of the 33rd International Conference on Machine Learning, 2016, pp. 964–973
- [32] Anderson Y Zhang and Harrison H Zhou “Minimax rates of community detection in stochastic block models” In The Annals of Statistics 44.5 Institute of Mathematical Statistics, 2016, pp. 2252–2280 DOI: 10.1214/15-AOS1428