
Sample Complexity of Linear Regression Models for Opinion Formation in Networks

Haolin Liu1, Rajmohan Rajaraman 2, Ravi Sundaram2,
Anil Vullikanti1, Omer Wasim2, Haifeng Xu3
Authors listed in alphabetical order.
Abstract

Consider public health officials aiming to spread awareness about a new vaccine in a community interconnected by a social network. How can they distribute information with minimal resources, so as to avoid polarization and ensure community-wide convergence of opinion? To tackle such challenges, we initiate the study of sample complexity of opinion formation in networks. Our framework is built on a well-established opinion formation game, where we regard each agent's opinion as a data-derived model, unlike previous works that treat opinions as data-independent scalars. The opinion model of every agent is initially learned from its local samples and evolves game-theoretically as all agents communicate with neighbors and revise their models towards an equilibrium. Our focus is on the sample complexity needed to ensure that the opinions converge to an equilibrium at which every agent's final model has low generalization error.

Our paper has two main technical results. First, we present a novel polynomial-time optimization framework to quantify the total sample complexity for arbitrary networks when the underlying learning problem is (generalized) linear regression. Second, we leverage this optimization to study the network gain, which measures the improvement in sample complexity when learning over a network compared to learning in isolation. Towards this end, we derive network gain bounds for various network classes, including cliques, star graphs, and random regular graphs. Additionally, our framework provides a method to study sample distribution within the network, suggesting that it is sufficient to allocate samples inversely to the degree. Empirical results on both synthetic and real-world networks strongly support our theoretical findings.

1 Introduction

In today’s interconnected world, rapid dissemination of information plays a pivotal role in shaping public understanding and behavior, especially in areas of critical importance like public health. People learn and form opinions through personal investigations and interactions with neighbors. The heterogeneity of social networks often means that not every individual requires the same amount of information to form an informed opinion. Some may be heavily influenced by their peers, while others may need more direct information. This differential requirement presents both a challenge and an opportunity: How can one distribute information to ensure the entire community is well-informed, without unnecessary redundancies or gaps in knowledge dissemination?

To answer this question, the first step is to understand the dynamics of opinion formation within social networks. Starting from the seminal work of DeGroot (1974), one of the predominant models assumes that individuals shape their own beliefs by consistently observing and averaging the opinions of their network peers (DeGroot 1974; Friedkin and Johnsen 1990; DeMarzo, Vayanos, and Zwiebel 2003; Golub and Jackson 2008; Bindel, Kleinberg, and Oren 2015; Haddadan, Xin, and Gao 2024). We will adopt this seminal framework to model the opinion formation process among agents. However, these works always encapsulate opinions as single numeric values. While this provides a foundational understanding, it cannot capture the nuanced ways in which individuals process and integrate new information into their existing belief systems. To address this issue, our formulation generalizes the opinions to be a model parameter learned using a machine learning-based approach, as explored in (Haghtalab, Jackson, and Procaccia 2021) to study belief polarization. Such formulation aligns with the human decision-making procedure proposed by (Simon 2013) based on prior experiences and available data.

Specifically, we adopt the game-theoretic formulation of this opinion dynamics process introduced by (Bindel, Kleinberg, and Oren 2015). Within this framework, an agent's equilibrium opinion emerges as a weighted average of all agents' initial opinions on the network (which are learned from their locally available data), where the weights are unique and determined by the network structure. It is not guaranteed, however, that a node's equilibrium opinion has a small generalization error. Motivated by the work of (Blum et al. 2021), we study the conditions under which this actually happens (i.e., each node has a model with a small generalization error): specifically, how should samples be distributed across the network so that, at the equilibrium of the opinion formation game, everyone has a model close enough to the ground truth while the total number of samples is minimized? We are also interested in understanding the network gain, i.e., how much collaboration over a network helps the sample complexity.

We note that when opinions are treated as data-driven model parameters, the equilibrium models share the same mathematical structure as the fine-grained federated learning model introduced in (Donahue and Kleinberg 2021). However, their study assumes fixed sample sizes without considering networks and focuses on collaboration incentives. Our work focuses on the optimal allocation of samples across networks to ensure that the generalization error of the equilibrium model remains low.

1.1 Problem formulation

We consider a fixed set $V=\{1,2,\cdots,n\}$ of agents (also referred to as nodes) connected by a network $G=(V,E)$. Let $N(i)$ denote the set of neighbors of agent $i$, and $d_i$ its degree. Every agent $i$ in a given network $G$ has a dataset $S_i=\{(x_i^j,y_i^j)\}_{j\in[m_i]}$, where $m_i$ is the number of samples at $i$ and each $x_i^j\in\mathbb{R}^k$, $j\in[m_i]$, is drawn independently from a fixed distribution $D$ of dimension $k$. Each agent $i$ learns an initial model $\bar{\theta}_i$ (known as its internal opinion); $\bar{\theta}_i$ need not be a fixed number (as in (Bindel, Kleinberg, and Oren 2015)), but can be learned through a machine learning approach, similar to (Haghtalab, Jackson, and Procaccia 2021). We assume that these datasets are allocated by a system designer and that $m_i\geq k$ (i.e., $S_i$ has at least $k$ samples), which ensures everyone has enough basic knowledge to form their own opinion before learning over the social network. Our full paper (Liu et al. 2023) includes a table of all notations.

Agent $i$'s goal is to find a model $\theta_i$ that is close to its internal opinion as well as to those of its neighbors $N(i)$; i.e., compute $\theta_i$ that minimizes the loss function $\|\theta_i-\bar{\theta}_i\|^2+\sum_{j\in N(i)}v_{ij}\|\theta_i-\theta_j\|^2$, where $v_{ij}\geq 0$ measures the influence of a neighboring agent $j$ on agent $i$. We refer to $\boldsymbol{v}$, where $[\boldsymbol{v}]_{ij}=v_{ij}$, as the influence factor matrix. In Lemma 1, we show that the unique Nash equilibrium of this game is $(\theta_1^{eq},\cdots,\theta_n^{eq})^T=W^{-1}(\bar{\theta}_1,\cdots,\bar{\theta}_n)^T$, where $W=\mathcal{L}+I$ and $\mathcal{L}$ is the weighted Laplacian matrix of $G$.

We define the total sample complexity, $TSC(G,\boldsymbol{v},k,\epsilon)$, as the minimum total number of samples, $\sum_i m_i$, subject to the constraint $m_i\geq k$, such that the expected square loss of $\theta_i^{eq}$ is at most $\epsilon$ for all $i$; here $m_i=|S_i|$. In the special case where the influence is uniform, i.e., $v_{ij}=\alpha$ for all $i,j$, we write $TSC(G,\alpha,k,\epsilon)$ for the total sample complexity.

Let $\widetilde{M}(k,\epsilon)$ denote the minimum number of samples any agent $i$ needs so that the best model $\bar{\theta}_i$ learned using only its own samples has expected error at most a target $\epsilon$; if there were no interaction between the agents, a total of $n\widetilde{M}(k,\epsilon)$ samples would be needed to ensure that every $\bar{\theta}_i$ has low error. Since every agent has at least $k$ samples, we refer to

$$\mu(G,\boldsymbol{v},k,\epsilon)=\frac{n(\widetilde{M}(k,\epsilon)-k)}{TSC(G,\boldsymbol{v},k,\epsilon)-nk},\qquad(1)$$

the ratio of the additional samples needed to achieve error $\epsilon$ for every agent under independent learning to that needed under social learning, as the network gain of $G$. We will show that for linear regression $\widetilde{M}(k,\epsilon)=\Theta(\frac{k}{\epsilon})$ (Theorem 2). We assume $\widetilde{M}(k,\epsilon)>k$; note that the network gain can be infinite when $TSC(G,\boldsymbol{v},k,\epsilon)=nk$.

1.2 Overview of results

Our focus here is to estimate $TSC(G,\boldsymbol{v},k,\epsilon)$ and the network gain $\mu(G,\boldsymbol{v},k,\epsilon)$, and to characterize the distribution of the optimal $m_i$'s across the network. Most proofs are presented in our full paper (Liu et al. 2023).

Tight bounds on TSC. Using the structure of the Nash equilibrium of the opinion formation game (Lemma 1) and regression error bounds (Theorem 3), we derive tight bounds on $TSC(G,\boldsymbol{v},k,\epsilon)$ for any graph $G$ via a mathematical program (Theorem 4). We also show that the TSC can be estimated in polynomial time using second-order cone programming, allowing us to study it empirically.

Graph class	Network gain
Clique	$\Omega(n)$
Star	$O(1)$
Hypercube	$\Omega(d^2)$
Random $d$-regular	$\Omega(\min\{d^2,n\})$
$d$-Expander	$\Omega(\min\{d^2\tau(G)^4,n\})$
Table 1: Network gain $\mu(G,\alpha,k,\epsilon)$ for different graphs, where $\tau(G)$ denotes the edge expansion of $G$ (defined in Section 4).

Impact of graph structure on TSC, network gain, and sample distribution. We begin with the case of a uniform influence factor $\alpha$. We show that $\sum_{i=1}^n\frac{1}{(\alpha d_i+1)^2}\lesssim\frac{TSC(G,\alpha,k,\epsilon)}{\widetilde{M}(k,\epsilon)}\lesssim\sum_{i=1}^n\frac{\alpha+1}{\alpha d_i+1}$ (informal version of Theorem 5), where $d_i$ denotes the degree of agent $i$. Assigning $\max\{k,\frac{\alpha+1}{\alpha d_i+1}\cdot\frac{k}{\epsilon}\}$ samples to agent $i$ ensures everyone learns a good model. Thus, it suffices to solve the TSC problem with sample counts inversely proportional to node degrees. In other words, low-degree nodes need more samples, in stark contrast to many network mining problems, e.g., influence maximization, where it suffices to choose high-degree nodes. Clearly, this result has policy implications. Building on Theorem 5, we derive tight bounds on the network gain for different classes of graphs, summarized in Table 1.

From Table 1, we can see that a well-connected network offers a substantial reduction in TSC (high network gain), whereas a star graph provides almost no gain compared to individual learning. We also demonstrate that the lower bounds on network gain in Table 1 are tight. More detailed results for specific networks can be found in Table 2.

Finally, we consider general influence factors and derive upper and lower bounds on TSC for arbitrary graphs (Theorem 7). We find that these bounds are empirically tight.

Experimental evaluation. In our simulation experiments, we compute the total sample complexity and near-optimal solutions for a large number of synthetic and real-world networks. We begin with the case of uniform influence factors and first validate the finding of Theorem 5 that sample sizes in the optimal allocation are negatively correlated with degree. We then experimentally evaluate the $d^2$ network gain for random $d$-regular graphs. Overall, we find that the experimentally computed solutions are consistent with our bounds in Table 1. Finally, we consider general influence factors and find that our theoretical bounds on TSC in Theorem 7 are quite tight empirically.

1.3 Related Work and Comparisons

Sample complexity of collaborative learning. Building on Blum et al. (2017), a series of papers (Chen, Zhang, and Zhou 2018; Nguyen and Zakynthinou 2018; Blum et al. 2021; Haghtalab, Jordan, and Zhao 2022) studied the minimum number of samples to ensure every agent has a low-error model in collaborative learning. In this setting, there is a center that can iteratively draw samples from different mixtures of agents’ data distributions, and dispatch the final model to each agent. In contrast, we use the well-established decentralized opinion formation framework to describe the model exchange game; the final model of every agent is given by the equilibrium of this game.

Our formulation is similar to that of (Blum et al. 2021), which also considers the sample complexity of an equilibrium that ensures every agent a sufficient utility guarantee. However, (Blum et al. 2021) study this problem in an incentive-aware setting without networks, focusing mainly on the stability and fairness of the equilibrium. In contrast, our research centers on the network effect of the equilibrium generated by the opinion formation model. Moreover, (Blum et al. 2021) assume the agents' utilities have certain structures not derived from error bounds, while we directly minimize the generalization error of agents' final models.

(Haddadan, Xin, and Gao 2024) is another related paper that considers learning on networks under an opinion dynamics process. However, they study how to select $K$ agents whose predictions are corrected so as to maximize overall accuracy in the entire network, rather than the sample complexity needed to ensure every individual learner has a good model, as in our paper.

Fully decentralized federated learning. To reduce the communication cost of standard federated learning (McMahan et al. 2017), Lalitha et al. (2018, 2019) first studied fully decentralized federated learning on networks, using a Bayesian approach to model agents' beliefs and giving an algorithm that enables agents to learn a good model by exchanging beliefs with neighbors. This setting can be regarded as a combination of Bayesian opinion formation models (Banerjee 1992; Bikhchandani, Hirshleifer, and Welch 1992; Smith and Sørensen 2000; Acemoglu, Chernozhukov, and Yildiz 2006) and federated learning. In the literature on opinion formation in networks, besides these Bayesian models, non-Bayesian models are usually considered more flexible and practical (DeGroot 1974; Friedkin and Johnsen 1990; DeMarzo, Vayanos, and Zwiebel 2003; Golub and Jackson 2008; Bindel, Kleinberg, and Oren 2015; Haddadan, Xin, and Gao 2024). Comprehensive surveys of both kinds of models can be found in Jackson (2010) and Acemoglu and Ozdaglar (2011). Our study makes connections between non-Bayesian opinion formation models and federated learning for the first time. Compared with Lalitha et al. (2018, 2019), we assume each agent can observe the models of its neighbors, rather than a belief function. We do not restrict to specific algorithms but use a game-theoretic approach to find the unique Nash equilibrium and analyze the sample complexity at this equilibrium.

2 The Opinion Formation Game

As mentioned earlier, we utilize a variation on the seminal DeGroot model (DeGroot 1974) proposed by Friedkin and Johnsen (1990) and also studied by Bindel, Kleinberg, and Oren (2015). Formally, agent $i$ seeks a $\theta_i$ which minimizes the loss

$$\|\theta_i-\bar{\theta}_i\|^2+\sum_{j\in N(i)}v_{ij}\|\theta_i-\theta_j\|^2$$

where $v_{ij}\geq 0$, for all $i\in[n]$ and $j\in N(i)$, measures the influence of agent $j$ on agent $i$. In general, the $v_{ij}$'s might not be known, and we will also study the simpler uniform case where $v_{ij}=\alpha\geq 0$ for all $i,j$. Lemma 1 gives the unique equilibrium of this game (also studied by Bindel, Kleinberg, and Oren (2015)); its proof is presented in Appendix B.2 of (Liu et al. 2023).

Lemma 1 (Nash equilibrium of opinion formation).

The unique Nash equilibrium $\boldsymbol{\theta^{eq}}=(\theta_1^{eq},\cdots,\theta_n^{eq})^T$ of the above game is $\boldsymbol{\theta^{eq}}=W^{-1}\boldsymbol{\bar{\theta}}$, where

$$W_{ij}=\begin{cases}\sum_{l\in N(i)}v_{il}+1 & j=i\\ -v_{ij} & j\in N(i)\\ 0 & j\notin N(i),\ j\neq i\end{cases}$$

and $\boldsymbol{\bar{\theta}}=(\bar{\theta}_1,\cdots,\bar{\theta}_n)^T$. When all $v_{ij}=\alpha\geq 0$,

$$W_{ij}=\begin{cases}\alpha d_i+1 & j=i\\ -\alpha & j\in N(i)\\ 0 & j\notin N(i),\ j\neq i.\end{cases}$$

Furthermore, we have $\sum_{j=1}^n W_{ij}^{-1}=1$ and $W_{ij}^{-1}\geq 0$ for all $i,j\in[n]$.

Thus, from Lemma 1, the equilibrium model of every agent is a convex combination of all agents' internal opinions on the network. We show that $\boldsymbol{\theta^{eq}}$ can be computed in a federated manner using the algorithm of (Vanhaesebrouck, Bellet, and Tommasi 2017).
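As a concrete illustration of Lemma 1, the following numpy sketch (our own illustration, not code from the paper; the 4-cycle, $\alpha=0.5$, and internal opinions are example choices) builds $W$ for the uniform-influence case and checks that the rows of $W^{-1}$ are nonnegative and sum to one, so the equilibrium is indeed a convex combination of internal opinions.

```python
import numpy as np

def equilibrium_opinions(adj, alpha, theta_bar):
    """Uniform-influence opinion formation game (Lemma 1):
    W = I + alpha * Laplacian(G); equilibrium = W^{-1} @ theta_bar."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    W = np.diag(alpha * deg + 1.0) - alpha * adj
    Winv = np.linalg.inv(W)
    return Winv, Winv @ theta_bar

# Illustrative 4-cycle with alpha = 0.5 (values chosen for the example).
adj = np.array([[0, 1, 0, 1],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [1, 0, 1, 0]])
theta_bar = np.array([1.0, 0.0, 0.0, 0.0])   # scalar internal opinions (k = 1)
Winv, theta_eq = equilibrium_opinions(adj, 0.5, theta_bar)
# Rows of W^{-1} are convex-combination weights: nonnegative and summing to 1.
```

Because $W$ is a symmetric diagonally dominant M-matrix here, $W^{-1}$ is entrywise nonnegative and its rows sum to one, matching the last claim of Lemma 1.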

3 Total Sample Complexity of Opinion Formation

Recall that each agent $i$'s dataset $S_i=\{(x_i^j,y_i^j)\}_{j\in[m_i]}$ has $m_i\geq k$ samples, where $x_i^j\in\mathbb{R}^k$. We assume there is a ground-truth model $\theta^*$ such that for any $x_i^j\sim D$, $y_i^j=(x_i^j)^\top\theta^*+\eta_i^j$, where $\eta_i^j\sim\eta_i(x_i^j)$ and $\eta_i:\mathbb{R}^k\rightarrow\Delta(\mathbb{R})$ is an agent-dependent noise function mapping samples to noise distributions. We consider unbiased noise with bounded variance: every agent $i$'s noise function $\eta_i$ is independently chosen from $\mathcal{N}=\{\eta:\mathbb{E}[\eta(x)]=0,\ \mathrm{Var}[\eta(x)]\leq\sigma^2,\ \forall x\sim D\}$. Let $X_i=[x_i^1,\cdots,x_i^{m_i}]^\top$ and $Y_i=[y_i^1,\cdots,y_i^{m_i}]^\top$. Every agent performs ordinary linear regression to obtain its initial opinion $\bar{\theta}_i=\operatorname{argmin}_\theta\sum_{j=1}^{m_i}\left((x_i^j)^\top\theta-y_i^j\right)^2$. For our study, we first make the standard Assumption 1 on the data distribution.

Assumption 1 (non-degeneracy).

For data distribution $D$ over $\mathbb{R}^k$: if $x$ is drawn from $D$, then for any linear hyperplane $H\subset\mathbb{R}^k$, we have $\mathbb{P}(x\in H)=0$.

Assumption 1 is standard and ensures that the data distribution spans the whole space $\mathbb{R}^k$. If it holds, then by Fact 1 in (Mourtada 2022), for any $X_i$ with $m_i\geq k$, the matrix $X_i^\top X_i$ is invertible almost surely. This implies that the ordinary linear regression each agent uses to form its initial opinion admits the closed-form solution $\bar{\theta}_i=\left(X_i^\top X_i\right)^{-1}X_i^\top Y_i$.
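As a sanity check of this closed form, here is a standalone numpy sketch on synthetic data (the Gaussian distribution, noise level, and ground-truth $\theta^*$ are assumptions of the example, not fixed by the paper); the closed-form solution coincides with numpy's least-squares solver.

```python
import numpy as np

rng = np.random.default_rng(0)
k, m = 3, 50                                   # dimension and sample count (illustrative)
theta_star = np.array([1.0, -2.0, 0.5])        # assumed ground-truth model
X = rng.normal(size=(m, k))                    # Gaussian rows satisfy Assumption 1
y = X @ theta_star + 0.1 * rng.normal(size=m)  # unbiased noise with sigma = 0.1

# Closed-form OLS initial opinion: theta_bar = (X^T X)^{-1} X^T Y.
theta_bar = np.linalg.solve(X.T @ X, X.T @ y)
```

With $m_i\geq k$ and non-degenerate data, `X.T @ X` is invertible almost surely, so `np.linalg.solve` recovers the same minimizer as `np.linalg.lstsq`.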

Loss measure. We use the expected square loss under worst-case noise to measure the quality of agents' final opinions $(\theta_1^{eq},\cdots,\theta_n^{eq})$ at the equilibrium. Namely, for agent $i$, we consider the loss

$$L(\theta_i^{eq})=\sup_{\eta_{1:n}\in\mathcal{N}}\mathbb{E}_{x\sim D,\ \forall j,\ S_j\sim D^{m_j}(\eta_j)}\left[\left(x^\top\theta_i^{eq}-x^\top\theta^*\right)^2\right]\qquad(2)$$

where $D^{m_j}(\eta_j)$ denotes the joint distribution of $S_j=\{(x_j^l,y_j^l)\}_{l\in[m_j]}$ given noise function $\eta_j$. We take the supremum over all possible noise functions and the expectation over all datasets because $\theta_i^{eq}$ depends on $\bar{\theta}_j$ for every agent $j$.

3.1 Derivation of error bounds

We first define the error of the initial opinion as

$$L(\bar{\theta}_i)=\sup_{\eta_i\in\mathcal{N}}\mathbb{E}_{x\sim D,\ S_i\sim D^{m_i}(\eta_i)}\left[\left(x^\top\bar{\theta}_i-x^\top\theta^*\right)^2\right].$$

To upper-bound $L(\bar{\theta}_i)$, we additionally consider the following assumption on the data distribution.

Assumption 2 (small-ball condition).

For data distribution $D$, there exist constants $C_i\geq 1$ and $\alpha_i\in(0,1]$ such that for every hyperplane $H\subset\mathbb{R}^k$ and every $t>0$, if $x$ is drawn from $D$, we have $\mathbb{P}\left(\mathrm{dist}(\Sigma_i^{-\frac{1}{2}}x,H)\leq t\right)\leq(C_i t)^{\alpha_i}$, where $\Sigma_i=\mathbb{E}_{x\sim D}\left[xx^\top\right]$.

Given Assumption 1, $\Sigma_i$ is guaranteed to be invertible. Assumption 2 ensures that $\Sigma_i^{-\frac{1}{2}}x$ is not too close to any fixed hyperplane; it was introduced in (Mourtada 2022) and is a variant of the small-ball condition of (Koltchinskii and Mendelson 2015; Mendelson 2015; Lecué and Mendelson 2016). By Proposition 5 in (Mourtada 2022) (see also Theorem 1.2 in (Rudelson and Vershynin 2015)), if the coordinates of $D$ are independent and have bounded density ratios, then Assumption 2 holds. More discussion of this assumption can be found in Section 3.3 of (Mourtada 2022).

Theorem 2 (Theorem 1, Proposition 2 and Equation 17 in (Mourtada 2022)).

For every agent $i$ with $m_i$ samples, ordinary linear regression attains loss $L(\bar{\theta}_i)=\Theta\left(\frac{k}{m_i}\right)$.

For certain data distributions, tighter closed-form bounds have been derived. For instance, if $D$ is a $k$-dimensional multivariate Gaussian distribution with zero mean, then $L(\bar{\theta}_i)=\frac{\sigma^2 k}{m_i-k-1}$ (Anderson et al. 1958; Breiman and Freedman 1983; Donahue and Kleinberg 2021). For mean estimation, $L(\bar{\theta}_i)=\frac{\sigma^2}{m_i}$ (Donahue and Kleinberg 2021). Theorem 2 shows that $L(\bar{\theta}_i)$ scales with $\frac{k}{m_i}$, assuming the data distribution satisfies non-degeneracy and the small-ball condition. Thus, $\widetilde{M}(k,\epsilon)=\Theta(\frac{k}{\epsilon})$ samples suffice for a single agent to learn a model with error $\epsilon$.

We next present our main technique for bounding the generalization error of an equilibrium model, which uses Theorem 2 to express the error as a function of the matrix $W$ and the number of samples at each agent.

Theorem 3 (Bound on generalization error).

For every agent $i$, we have $L(\theta_i^{eq})=\Theta\left(k\sum_{j=1}^n\frac{\left(W_{ij}^{-1}\right)^2}{m_j}\right)$, where $\theta_i^{eq}$ and the matrix $W$ are defined in Lemma 1.

Remark. In Appendix B.2 of (Liu et al. 2023), we show that Theorem 3 also holds for generalized linear regression under adapted assumptions, where there exists a mapping function $\phi$ such that $\mathbb{E}[y]=\phi(x)^\top\theta^*$ for any possible data $(x,y)$.
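In computations, the expression in Theorem 3 can be evaluated directly (up to its hidden constant) once $W^{-1}$ and the sample counts are known. The sketch below is our own illustration on the two-agent path with $\alpha=1$ (so $W^{-1}=\frac{1}{3}\begin{psmallmatrix}2&1\\1&2\end{psmallmatrix}$); the graph, sample counts, and $k$ are example choices.

```python
import numpy as np

def equilibrium_error_proxy(Winv, m, k):
    """Theorem 3, up to constants: L(theta_i^eq) ~ k * sum_j Winv[i,j]^2 / m[j]."""
    return k * (Winv ** 2 / np.asarray(m, dtype=float)).sum(axis=1)

# Two agents joined by one edge, uniform influence alpha = 1:
# W = [[2, -1], [-1, 2]], so W^{-1} = (1/3) * [[2, 1], [1, 2]].
Winv = np.array([[2.0, 1.0], [1.0, 2.0]]) / 3.0
err = equilibrium_error_proxy(Winv, m=[10, 10], k=1)  # each agent: (4/9 + 1/9)/10 = 1/18
```

Making the sample counts asymmetric (e.g., `m=[100, 4]`) shows that the agent whose equilibrium weight concentrates on the data-poor neighbor incurs the larger error.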

3.2 Total sample complexity

Armed with Theorem 3, we initiate the study of the total sample complexity of opinion convergence. Recall that we want to ensure that, at equilibrium, every node on the network has a model with small generalization error; specifically, $L(\theta_i^{eq})\leq\epsilon$ for every $i\in[n]$ and a given $\epsilon>0$. Theorem 3 gives us precisely the mechanism to achieve a desired error bound.

The Total Sample Complexity (TSC) problem. Recall that for a given graph $G$, influence factor matrix $\boldsymbol{v}$, dimension $k$, and error $\epsilon$, the total sample complexity $TSC(G,\boldsymbol{v},k,\epsilon)$ is the minimum value of $\sum_i m_i$ under the constraints $m_i\geq k$, $m_i\in\mathbb{Z}^+$, and $L(\theta_i^{eq})\leq\epsilon$ for every $i\in[n]$. Our central result, Theorem 4, derives near-tight bounds on the total sample complexity.

Theorem 4 (Bounds on TSCTSC).

For any $\epsilon>0$, let $(m_i^*,\ i=1,\ldots,n)$ denote an optimal solution of the following optimization problem, which measures the minimum samples for opinion formation on graph $G$ with influence factor $\boldsymbol{v}$:

$$\begin{aligned}
\min_{m_1,\cdots,m_n}\quad & \sum_{i=1}^n m_i\\
\text{s.t.}\quad & \sum_{j=1}^n\frac{(W_{ij}^{-1})^2}{m_j}\leq\frac{\epsilon}{k},\ \forall i\qquad(3)\\
& m_i>0,\ \forall i
\end{aligned}$$

where $W$ is defined in Lemma 1. Then $TSC(G,\boldsymbol{v},k,\epsilon)=\Theta(\sum_{i=1}^n m_i^*+nk)$. Assigning $\max\{O(\lceil m_i^*\rceil),k\}$ samples to agent $i$ suffices for $L(\theta_i^{eq})\leq\epsilon$ for every $i\in[n]$. Moreover, $\frac{\epsilon m_i^*}{k}$ is a fixed value for any $k$ and $\epsilon$.

From Theorem 4, $TSC(G,\boldsymbol{v},k,\epsilon)$ has order $\sum_{i=1}^n m_i^*+nk$, and assigning $\max\{\lceil m_i^*\rceil,k\}$ samples to agent $i$ suffices to solve the TSC problem up to a constant factor. Thus, we only need to focus on the solution $m_i^*,\ \forall i\in[n]$, of the optimization problem (3). In the following sections, we characterize properties of the $m_i^*$ and use them to establish the network gain for different graphs.
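The paper solves this program as a second-order cone program; as a rough standalone alternative (our own sketch, not the paper's implementation), the continuous relaxation can also be handed to a generic solver for small instances. The 4-node star, $\alpha=1$, $k=2$, and $\epsilon=0.5$ below are illustrative choices.

```python
import numpy as np
from scipy.optimize import minimize

def tsc_relaxation(Winv, k, eps):
    """Continuous relaxation of the program in Theorem 4, solved with SLSQP
    (a sketch; the paper uses second-order cone programming instead)."""
    n = Winv.shape[0]
    A = Winv ** 2
    cons = [{'type': 'ineq', 'fun': lambda m, a=A[i]: eps / k - np.sum(a / m)}
            for i in range(n)]
    x0 = np.full(n, 10.0 * k / eps)   # feasible start, since rows of Winv sum to 1
    res = minimize(lambda m: m.sum(), x0, method='SLSQP',
                   bounds=[(1e-6, None)] * n, constraints=cons)
    return res.x

# Star on 4 nodes (center = node 0), uniform influence alpha = 1.
adj = np.array([[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]], dtype=float)
W = np.diag(adj.sum(axis=1) + 1.0) - adj
m_star = tsc_relaxation(np.linalg.inv(W), k=2, eps=0.5)
```

On this instance, the solver allocates fewer samples to the high-degree center than to each leaf, previewing the degree-based allocation of Theorem 5.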

4 Network Effects on Total Sample Complexity

In this section, we analyze the network effects on the total sample complexity (TSC) of opinion convergence. We derive bounds on the network gain, showing how the number of samples needed decreases due to network learning. Starting with uniform influence weights, we obtain asymptotically tight bounds for key network classes and characterize how samples should be distributed by node degree to minimize sample complexity. We then extend to arbitrary influence weights and derive bounds on TSC that are empirically tight.

4.1 Uniform influence factors

We model uniform influence factors by setting $v_{ij}=\alpha$ for every agent $i$ and every neighbor $j$ of $i$, for a given real $\alpha\geq 0$. Let $TSC(G,\alpha,k,\epsilon)$ be the solution of the optimization in Theorem 4 for this case. Theorem 5 provides interpretable, degree-based bounds on $TSC(G,\alpha,k,\epsilon)$ and serves as the first step toward tight bounds for specific graph classes. The proof of Theorem 5 leverages the dual form of the optimization problem (3), together with a careful analysis of the matrix $W$ defined in Lemma 1. The detailed proof can be found in (Liu et al. 2023).

Theorem 5 (Sample allocation and degree distribution).

The optimal solution $\{m_i^*\}$ to the optimization problem (3) satisfies

$$\max\left\{\sum_{i=1}^n\frac{1}{(\alpha d_i+1)^2},1\right\}\leq\sum_{i\in[n]}m_i^*\cdot\frac{\epsilon}{k}\leq\sum_{i=1}^n\frac{\alpha+1}{\alpha d_i+1}.$$

We have $TSC(G,\alpha,k,\epsilon)=\Theta(\sum_{i\in[n]}m_i^*+nk)$. Assigning $\max\{O\left(\lceil\frac{\alpha+1}{\alpha d_i+1}\cdot\frac{k}{\epsilon}\rceil\right),k\}$ samples to every agent $i$ suffices for $L(\theta_i^{eq})\leq\epsilon$ for all $i\in[n]$.

Theorem 5 suggests that it is sufficient to allocate samples inversely proportional to node degrees. This has interesting policy implications: in contrast to other social network models, e.g., the classic work on influence maximization (Kempe, Kleinberg, and Tardos 2003), our result advocates that allocating more resources to low-connectivity agents benefits the network at large. We provide empirical validation of this phenomenon in Section 5.
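To make the allocation rule concrete, the numpy sketch below (our own example: a 4-node star with $\alpha=1$; the hidden constant in Theorem 5's $O(\cdot)$ is taken to be 1, which happens to suffice for this small instance) assigns each agent $\frac{\alpha+1}{\alpha d_i+1}\cdot\frac{k}{\epsilon}$ samples and checks the per-agent error constraint of the TSC program from Theorem 3.

```python
import numpy as np

# Star on 4 nodes (center = node 0), uniform influence alpha = 1 (illustrative).
adj = np.array([[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]], dtype=float)
alpha, k, eps = 1.0, 2.0, 0.1
deg = adj.sum(axis=1)
Winv = np.linalg.inv(np.diag(alpha * deg + 1.0) - alpha * adj)

# Theorem 5's degree-based rule: samples inversely proportional to degree.
m = (alpha + 1.0) / (alpha * deg + 1.0) * k / eps
# Per-agent constraint of the TSC program: sum_j Winv[i,j]^2 / m[j] <= eps/k.
lhs = (Winv ** 2 / m).sum(axis=1)
```

Here the degree-3 center receives half as many samples as each leaf, and every constraint holds with room to spare on this example.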

With the help of Theorem 5, we can derive bounds on the network gain for any graph in Corollary 6.

Corollary 6.

For any graph $G$, if $\epsilon\leq O(\frac{1}{\alpha\max_i d_i+1})$, then $\mu(G,\alpha,k,\epsilon)\geq\Omega\left(\frac{n}{\sum_{i=1}^n\frac{\alpha+1}{\alpha d_i+1}}\right)$. If $\epsilon\leq O(\frac{1}{\max_i(\alpha d_i+1)^2})$, then $\mu(G,\alpha,k,\epsilon)\leq O\left(\min\left\{\frac{n}{\sum_{i\in[n]}\frac{1}{(\alpha d_i+1)^2}},n\right\}\right)$.

For small $\epsilon$, Corollary 6 demonstrates that the network gain is at least $\Omega\left(\frac{n}{\sum_{i\in[n]}\frac{1}{d_i}}\right)$, implying that networks with more high-degree nodes have higher network gain, but the gain is limited to $O\left(\min\left\{\frac{n}{\sum_{i\in[n]}\frac{1}{d_i^2}},n\right\}\right)$.

Now we are ready to analyze the tight network gain for special graph classes. Our main results are shown in Table 2, with detailed lemmas and proofs deferred to (Liu et al. 2023). The third column of Table 2 gives $\sum_{i\in[n]}m_i^*$, which suffices to characterize the total sample complexity (TSC) because $TSC(G,\alpha,k,\epsilon)=\Theta\left(\sum_{i\in[n]}m_i^*+nk\right)$ from Theorem 4. The last column indicates the number of samples required per agent to satisfy all constraints of the TSC problem: for each graph class, giving every agent the maximum of $k$ and the term in the last column guarantees $L(\theta_i^{eq})\leq\epsilon$ for all $i$. To prove the results in Table 2, we leverage the spectral properties of the graph Laplacian $\mathcal{L}$ for different networks, together with the lower bound in Theorem 5. Note that the lower bounds in Table 2 do not require any constraint on $\epsilon$; this contrasts with the bounds in Corollary 6 and requires a more refined analysis. On the other hand, there is no general upper bound on the network gain because it can be infinite.

We now discuss the results in Table 2 in more detail. (1) For a clique, the network gain is $\Omega(n)$ and $TSC(G,\alpha,k,\epsilon)=\Theta\left(\frac{k}{\epsilon}+nk\right)$. By Corollary 6, this is the best possible lower bound on network gain for any graph. (2) For a star, when $\epsilon$ is less than some constant, the network gain is $O(1)$ and $TSC(G,\alpha,k,\epsilon)=\Theta\left(\frac{nk}{\epsilon}\right)$; a star graph gains almost nothing over individual learning. (3) For a hypercube, when $\alpha\geq\frac{3}{8}$, the network gain is $\Omega(d^2)$ and $TSC(G,\alpha,k,\epsilon)=\Theta\left(\frac{nk}{d^2\epsilon}+nk\right)$. (4) For a random $d$-regular graph, with high probability the network gain is $\Omega(d^2)$ when $d\leq\sqrt{n}$ and $\Omega(n)$ when $d>\sqrt{n}$; the total sample complexity is $TSC(G,\alpha,k,\epsilon)=\Theta\left(\frac{nk}{\min\{d^2,n\}\epsilon}+nk\right)$. (5) For a $d$-regular graph $G$, the edge expansion is $\tau(G)=\min_{|S|:|S|\leq n/2}\frac{|\partial S|}{d|S|}$, where $\partial S=\{(u,v)\in E: u\in S, v\in V\setminus S\}$. Edge expansion measures graph connectivity and can be viewed as a lower bound on the probability that a randomly chosen edge incident to any vertex subset $S$ has one endpoint outside $S$. A $d$-expander denotes a $d$-regular graph with edge expansion $\tau(G)$. For such graphs, the network gain is $\Omega(\min\{d^2\tau(G)^4,n\})$ and $TSC(G,\alpha,k,\epsilon)=\Theta(\frac{nk}{\min\{d^2\tau(G)^4,n\}\epsilon}+nk)$. Note that the $\Omega\left(\min\{d^2,n\}\right)$ lower bound is also optimal for these $d$-regular graphs, by the upper bound in Corollary 6.
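The edge expansion defined above can be computed by brute force on tiny graphs; the following sketch (our own illustration, exponential in $n$ and meant only for small examples such as the 4-cycle) enumerates all vertex subsets of size at most $n/2$.

```python
import numpy as np
from itertools import combinations

def edge_expansion(adj):
    """tau(G) = min over nonempty S with |S| <= n/2 of |boundary(S)| / (d|S|),
    for a d-regular graph; brute force, exponential in n."""
    adj = np.asarray(adj)
    n = adj.shape[0]
    d = int(adj[0].sum())
    best = 1.0                                 # any singleton S attains ratio 1
    for size in range(1, n // 2 + 1):
        for S in combinations(range(n), size):
            mask = np.zeros(n, dtype=bool)
            mask[list(S)] = True
            boundary = adj[mask][:, ~mask].sum()   # edges leaving S
            best = min(best, boundary / (d * size))
    return best

# 4-cycle (2-regular): cutting an adjacent pair gives tau = 2 / (2 * 2) = 0.5.
cycle4 = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]])
tau = edge_expansion(cycle4)
```

For well-connected graphs such as the complete graph, $\tau(G)$ stays bounded below by a constant, which is the regime where the $\Omega(d^2\tau(G)^4)$ gain is strongest.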

Graph class | Network gain | $\sum_{i\in[n]}m_{i}^{*}$ | Samples per node (omit $\max$ with $k$)
Clique (Lemma 13) | $\Omega(n)$ | $\Theta(\frac{k}{\epsilon})$ | $O(\frac{k}{\epsilon n})$
Star (Lemma 14) | $O(1)$ when $\epsilon\leq O(1)$ | $\Theta(\frac{nk}{\epsilon})$ | $O(\frac{k}{\epsilon})$ (leaf), $O(\frac{k}{n\epsilon})$ (center)
Hypercube (Lemma 16) | $\Omega(d^{2})$ when $\alpha\geq\frac{3}{8}$ | $\Theta(\frac{nk}{d^{2}\epsilon})$ | $O(\frac{k}{d^{2}\epsilon})$
Random $d$-regular (Lemma 17) | $\Omega(\min\{n,d^{2}\})$ w.h.p. | $\Theta(\frac{nk}{\min\{n,d^{2}\}\epsilon})$ | $O(\frac{k}{\min\{n,d^{2}\}\epsilon})$
$d$-Expander (Lemma 18) | $\Omega(\min\{n,d^{2}\tau^{4}\})$ | $\Theta(\frac{nk}{\min\{n,d^{2}\tau^{4}\}\epsilon})$ | $O(\frac{k}{\min\{n,d^{2}\tau^{4}\}\epsilon})$
Table 2: Network gain $\mu(G,\alpha,\epsilon)$, $\sum_{i\in[n]}m_{i}^{*}$, and the distribution of samples in different network classes for constant $\alpha$. The number of samples at a node is the maximum of $k$ and the term in the last column. In the last row, $\tau=\tau(G)$ is the edge expansion.

It follows from Table 2 and Corollary 6 that both random and expander $d$-regular graphs (with constant $\tau(G)$) have the optimal lower bound for network gain. Interestingly, this property also holds for the hypercube, which has regular degree $d=\log n$, even though its expansion is $O(\frac{1}{\log n})=o(1)$. A natural open question is to characterize other degree-bounded network families which also achieve near-optimal network gains.
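As a concrete sanity check of the clique row of Table 2, the following sketch builds $W=\mathcal{L}+I_n$ for a clique with uniform influence factor $\alpha$ (Lemma 1) and verifies that assigning each node $O(\frac{k}{\epsilon n})$ samples satisfies every constraint of the optimization in Theorem 4. This is a minimal illustration, not the paper's code; the parameter values and the constant $2$ are arbitrary choices.

```python
import numpy as np

# Build W = L + I_n for a clique with uniform influence factor alpha
# (Lemma 1), then check the clique row of Table 2: giving every node
# O(k/(eps*n)) samples satisfies all constraints of the TSC program.
# The parameter values and the constant 2 below are arbitrary choices.
n, alpha, k, eps = 50, 0.5, 10, 0.01
adj = np.ones((n, n)) - np.eye(n)              # adjacency matrix of K_n
W = np.diag(alpha * adj.sum(axis=1) + 1.0) - alpha * adj
Winv = np.linalg.inv(W)

# Lemma 1: rows of W^{-1} are non-negative and sum to one.
assert np.allclose(Winv.sum(axis=1), 1.0)
assert (Winv > -1e-12).all()

# Assign m_i = 2k/(eps*n) to every node and verify each constraint
# sum_j (W^{-1}_{ij})^2 / m_j <= eps/k of the optimization in Theorem 4.
m = np.full(n, 2 * k / (eps * n))
lhs = (Winv ** 2 / m).sum(axis=1)
assert (lhs <= eps / k).all()
```

Summing the per-node assignment gives $\Theta(\frac{k}{\epsilon})$ total samples, matching the clique entry of Table 2.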

4.2 General influence factors

Going beyond uniform influence factors, we investigate the total sample complexity of opinion convergence with general influence factors. The non-uniformity of the influence factors implies that the sample complexity depends not only on the network structure but also on how each individual agent weighs the influence of its neighbors. A trivial bound is $\Omega\left(\frac{k}{\epsilon}\right)\leq TSC(G,\boldsymbol{v},k,\epsilon)\leq O\left(\frac{nk}{\epsilon}\right)$, which means learning through the game is never worse than learning individually, but requires at least as many samples as a single agent needs to learn a good model.

To derive more sophisticated and tighter bounds, we need to analyze the matrix $W$ with general weights, which captures both the network topology and the influence factors. Since $W^{-1}$ is positive definite, by the Schur product theorem, $(W^{-1})\circ(W^{-1})$ is also positive definite, where $\circ$ is the Hadamard (element-wise) product. Define $B=((W^{-1})\circ(W^{-1}))^{-1}$ and $[B]_{ij}=b_{ij}$. Theorem 7 establishes a bound for graphs with general influence factors, which is empirically tight in our experiments in Section 5. The proof of the upper bound relies on a thorough analysis of the properties of the $b_{ij}$s, and the lower bound is derived from the dual form of Equation B.2 with general weights. We refer the reader to (Liu et al. 2023) for the full proof.

Theorem 7 (TSC under general influence factors).

Let $\gamma_{i}=\max\{0,\sum_{j=1}^{n}\frac{b_{ij}}{(\sum_{k=1}^{n}b_{jk})^{2}}\}$ for every $i\in[n]$. Then the optimal solution $\{m^{*}_{i}\}$ to Equation B.2 satisfies

$$\sum_{i=1}^{n}\left(2\sqrt{\sum_{j=1}^{n}\gamma_{j}(W^{-1}_{ij})^{2}}-\gamma_{i}\right)\leq\sum_{i}m_{i}^{*}\cdot\frac{\epsilon}{k}\leq\sum_{i=1}^{n}\frac{1}{\sum_{j=1}^{n}b_{ij}}$$

and $TSC(G,\boldsymbol{v},k,\epsilon)$ is $\Theta(\sum_{i}m_{i}^{*}+nk)$. Assigning $\max\left\{k,O\left(\lceil\frac{k}{\epsilon\sum_{j=1}^{n}b_{ij}}\rceil\right)\right\}$ samples to agent $i$ suffices to ensure $L(\theta_{i}^{eq})\leq\epsilon$ for every $i\in[n]$.
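The quantities in Theorem 7 are straightforward to compute numerically. The sketch below is illustrative only: the complete-graph instance, the weight range $[0.5,1]$, and the random seed are our own arbitrary assumptions, not from the paper. It forms $B=((W^{-1})\circ(W^{-1}))^{-1}$, the $\gamma_i$, and both bounds, and checks that the lower bound does not exceed the upper bound.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
# Hypothetical instance: a complete graph with random symmetric influence
# factors v_ij in [0.5, 1] (an arbitrary, well-conditioned choice).
V = np.triu(rng.uniform(0.5, 1.0, size=(n, n)), 1)
V = V + V.T

W = np.diag(V.sum(axis=1) + 1.0) - V           # Lemma 1, general weights
Winv = np.linalg.inv(W)
B = np.linalg.inv(Winv * Winv)                 # B = ((W^-1) o (W^-1))^-1
row = B.sum(axis=1)                            # row sums: sum_j b_ij

# gamma_i = max{0, sum_j b_ij / (sum_k b_jk)^2}
gamma = np.maximum(0.0, (B / row ** 2).sum(axis=1))

# Theorem 7: lower <= (eps/k) * sum_i m_i^* <= upper
lower = (2 * np.sqrt((Winv ** 2 * gamma).sum(axis=1)) - gamma).sum()
upper = (1.0 / row).sum()
assert lower <= upper + 1e-9
```

The per-agent allocation from the theorem, $\max\{k,O(\lceil\frac{k}{\epsilon\sum_{j}b_{ij}}\rceil)\}$, is proportional to `1.0 / row` in this notation.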

Although the above two bounds give good empirical estimates of the TSC, deriving a closed-form solution for Equation B.2 is still of interest; Lemma 8 provides one under a certain condition.

Lemma 8.

If $\sum_{j=1}^{n}\frac{b_{ij}}{(\sum_{k=1}^{n}b_{jk})^{2}}\geq 0$ for all $i\in[n]$, then $\sum_{i\in[n]}m_{i}^{*}=\sum_{i=1}^{n}\frac{1}{\sum_{j=1}^{n}b_{ij}}\cdot\frac{k}{\epsilon}$, where the optimal $m_{i}^{*}=\frac{1}{\sum_{j=1}^{n}b_{ij}}\cdot\frac{k}{\epsilon}$.

It is easy to verify that the condition in Lemma 8 holds for cliques, but beyond cliques, we have yet to identify other instances where it holds. Characterizing the graphs that meet this condition remains an interesting open problem.
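The clique case can also be checked numerically. The following sketch (an illustrative verification with arbitrary $n$ and $\alpha$, not the paper's code) confirms the condition of Lemma 8 on a clique and that the closed-form solution makes every constraint of Equation B.2 tight; tightness follows from the exact identity $(W^{-1}\circ W^{-1})B\mathbf{1}=\mathbf{1}$.

```python
import numpy as np

# Numerical check of Lemma 8 on a clique (arbitrary n and alpha).
n, alpha = 8, 0.3
adj = np.ones((n, n)) - np.eye(n)
W = np.diag(alpha * adj.sum(axis=1) + 1.0) - alpha * adj
Winv = np.linalg.inv(W)
B = np.linalg.inv(Winv * Winv)                 # entries b_ij
row = B.sum(axis=1)                            # sum_j b_ij

# Condition of Lemma 8: sum_j b_ij / (sum_k b_jk)^2 >= 0 for all i.
cond = (B / row ** 2).sum(axis=1)
assert (cond >= -1e-9).all()

# Closed form in units of k/eps: eps*m_i^*/k = 1/(sum_j b_ij).  At this
# solution every constraint of Equation B.2 is tight, since
# (W^-1 o W^-1) @ row = (W^-1 o W^-1) @ B @ 1 = 1 exactly.
m_scaled = 1.0 / row
lhs = (Winv ** 2 / m_scaled).sum(axis=1)       # should be exactly 1
assert np.allclose(lhs, 1.0)
```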

5 Experiments

We use experiments on both synthetic and real-world networks to further understand the distribution of samples (Theorem 5), the quality of our bounds for general influence weights (Theorem 7), and the network gains for random $d$-regular graphs (Table 2 and Lemma 17). We observe good agreement with our theoretical bounds. We present only part of the results here; more experiments can be found in (Liu et al. 2023).

From Theorem 4, given a network and influence factors, $\frac{\epsilon m_{i}^{*}}{k}$ is fixed for any $k$ and $\epsilon$. This value can be computed directly by reformulating Equation B.2. Since $\widetilde{M}(k,\epsilon)=\Theta(\frac{k}{\epsilon})$, $\frac{\epsilon m_{i}^{*}}{k}$ indicates the fraction of samples an agent needs at the equilibrium relative to the number of samples required to learn independently, which is the main quantity we are interested in. Thus, we only consider $\frac{\epsilon m_{i}^{*}}{k}$ in the experiments for Theorem 5 and Theorem 7, without fixing specific $\epsilon$ and $k$.

To solve for $m_{i}^{\star}$, or equivalently $\frac{\epsilon m_{i}^{*}}{k}$, exactly for every $i\in[n]$, we reformulate Equation B.2 as a second-order cone program (SOCP) and then use the solver CVXOPT (Andersen et al. 2013). More details are in (Liu et al. 2023). We now describe the networks used in our experiments.
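For illustration, Equation B.2 can also be attacked with a generic nonlinear solver in place of the SOCP reformulation. The following is a rough sketch using SciPy's SLSQP; this is our own simplification with arbitrary parameters, not the paper's CVXOPT pipeline. Setting the right-hand side to $1$ solves the program in units of $\frac{k}{\epsilon}$, so the solution is exactly the scale-free quantity $\frac{\epsilon m_i^*}{k}$ discussed above.

```python
import numpy as np
from scipy.optimize import minimize

def solve_tsc_relaxation(Winv, eps_over_k):
    """min sum_i m_i  s.t.  sum_j (W^{-1}_ij)^2 / m_j <= eps/k,  m_i > 0.

    Solving with eps_over_k = 1 returns eps*m_i^*/k for each agent.
    """
    n = Winv.shape[0]
    A = Winv ** 2
    cons = [{"type": "ineq",                   # SciPy convention: fun >= 0
             "fun": lambda m, a=A[i]: eps_over_k - (a / m).sum()}
            for i in range(n)]
    x0 = np.full(n, n / eps_over_k)            # a feasible starting point
    res = minimize(lambda m: m.sum(), x0, method="SLSQP",
                   bounds=[(1e-8, None)] * n, constraints=cons)
    return res.x

# Sanity check on a 4-clique with uniform alpha = 1 (arbitrary choices).
n, alpha = 4, 1.0
adj = np.ones((n, n)) - np.eye(n)
W = np.diag(alpha * adj.sum(axis=1) + 1.0) - alpha * adj
Winv = np.linalg.inv(W)
m = solve_tsc_relaxation(Winv, eps_over_k=1.0)
assert ((Winv ** 2 / m).sum(axis=1) <= 1.0 + 1e-4).all()   # feasible
assert m.sum() <= n * n                                     # improved on x0
```

The SOCP route used in the paper is preferable at scale; this sketch is only meant to make the optimization concrete on a toy instance.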

Synthetic networks. We use three types of synthetic networks: scale-free (SF) networks (Barabási and Albert 1999), random $d$-regular graphs (RR), and Erdős–Rényi random graphs (ER). The exact parameters for these graphs used in different experiments are given in (Liu et al. 2023).

Real-world networks. The networks we use (labeled RN) are shown in Table 3 with their features ($err_{u}/err_{l}$ will be defined later). We use the 130bit network, the econ-mahindas network (Rossi and Ahmed 2015), the ego-Facebook network (Leskovec and Mcauley 2012), and the email-Eu-core temporal network (Paranjape, Benson, and Leskovec 2017). The last two networks are also available in Leskovec and Krevl (2014).

Network | Nodes | Edges | $err_{u}$ | $err_{l}$
ego-Facebook | 4039 | 88234 | 27% | 24%
Econ | 1258 | 7620 | 6.6% | 1.3%
Email-Eu | 986 | 16064 | 30% | 21%
130bit | 584 | 6067 | 41% | 36%
Table 3: Real-world networks and bound performance

Our results are summarized below.

Distribution of samples (Theorem 5). We set the uniform influence factor $\alpha=0.1$ and show how the sample assignment relates to degree at the solution of Equation B.2 on different kinds of networks: scale-free (Figures 1(a) and 1(b)), Erdős–Rényi random networks (Figures 1(c) and 1(d)), and real-world networks (Figures 1(e) and 1(f)). Figures 1(a), 1(c), and 1(e) show the average number of samples (divided by $\frac{k}{\epsilon}$) for each degree; Figures 1(b), 1(d), and 1(f) show the variance of the sample number for each degree.

Our main observations are: (1) the samples assigned to nodes with the same degree tend to be almost the same (i.e., the variance is small); (2) fewer samples tend to be assigned to high-degree agents. These observations are consistent with Theorem 5, which states that sample assignment is negatively correlated with degree.

Tightness of bounds (Theorem 7). We empirically evaluate the performance of the bounds in Theorem 7. Here, all influence factors $v_{ij}$ are generated randomly from $[0,1]$. For each kind of synthetic network, we generate 150 different instances with different numbers of nodes $n$ and different $v_{ij}$s. We measure the performance of the bounds in Theorem 7 by the relative error, i.e., $\frac{|U\text{ or }L-\sum_{i\in[n]}m_{i}^{*}|}{\sum_{i\in[n]}m_{i}^{*}}$, where $U$ is the upper bound in Theorem 7 and $L$ is the maximum of $1$ (the trivial lower bound) and the lower bound in Theorem 7. We visualize the relative error through frequency distributions over the generated networks. Our observation is that the bounds are very tight for scale-free graphs (Figure 2(a)), Erdős–Rényi random graphs (Figure 2(b)), and random $d$-regular graphs (Figure 2(c)).

[Figure 1 panels: (a) SF Average, (b) SF Variance, (c) ER Average, (d) ER Variance, (e) RN Average, (f) RN Variance.]
Figure 1: Relationship between sample distribution and degree for different networks when $\alpha=1$. The x-axis is node degree and the y-axis is the average or variance of $N^{d}=\{\frac{\epsilon m_{i}^{*}}{k}:d_{i}=d,i\in[n]\}$, where $m_{i}^{*},\forall i\in[n]$, is the solution of Equation B.2.
[Figure 2 panels: (a) SF with random $v_{ij}$s, (b) ER with random $v_{ij}$s, (c) RR with random $v_{ij}$s.]
Figure 2: Bound tightness on synthetic networks with random $v_{ij}$s. The x-axis is the relative error of the upper/lower bounds and the y-axis is the frequency.

For real-world networks, we generate 50 different sets of $v_{ij}\in[0,1]$ and denote the maximum relative errors of the upper/lower bounds by $err_{u}/err_{l}$. From Table 3, the relative errors remain relatively small for real-world networks, showing the tightness of our bounds.

Network Gain (Table 2 and Lemma 17). We empirically demonstrate the network gain results for random $d$-regular graphs (Lemma 17). To calculate the network gain, we replace $TSC(G,\boldsymbol{v},k,\epsilon)$ in the network gain (Equation 1) by its upper bound $\sum_{i=1}^{n}m_{i}^{*}+nk+n$. This allows us to calculate a lower bound on the network gain, which aligns with the trend of the lower bound in Lemma 17. We set $k=5$ and $\epsilon=0.01$, making $\frac{k}{\epsilon}=500$. Additionally, we set $\alpha=1$ and consider degrees ranging from $2$ to $10$. We choose the number of nodes $n$ such that the degree $d\leq\sqrt{n}$, which implies a $d^{2}$ gain in theory (4th row of Table 2 and Lemma 17). The empirical result, shown in Figure 3, confirms that the gain is $\Theta(d^{2})$ in our setup. Moreover, we also find that when the degree $d=11$, the gain becomes infinite. This is because $\frac{k}{d^{2}\epsilon}=\frac{500}{121}<k=5$. According to the 4th row of Table 2 (Lemma 17), in this scenario it is sufficient for each node to have $k$ samples, resulting in an infinite network gain.

Figure 3: Network gain of RR graphs. The x-axis is the square of the degree and the y-axis is a lower bound on the network gain.

Impact of influence factors. When α\alpha is small, sample assignment is mostly independent of nn; however, for large α\alpha, increasing nn leads to fewer samples for nodes of the same degree. Additionally, the bounds in Theorem 7 get worse as influence factors increase. See (Liu et al. 2023) for details.

6 Discussion and future work

We initiate a study of the total sample complexity (TSC) required for opinion formation over networks. By extending the standard opinion formation game into a machine-learning framework, we characterize the minimum number of samples needed to ensure low generalization error for the equilibrium models. We provide tight bounds on the TSC and analyze sample distributions, showing that network structure significantly reduces the TSC, with the inverse relationship between sample assignment and node degree being a notable finding.

Our formulation opens several research directions. We currently focus on (generalized) linear regression, but it is worth investigating whether our framework extends to other problems such as kernel methods or soft SVMs. Moreover, it turns out that our problem formulation is similar to the best arm identification problem for multi-armed bandits with fixed confidence (Karnin, Koren, and Somekh 2013). It would be interesting to extend our results to an analog of the fixed-budget setting: assuming the total number of samples is fixed, how should they be assigned to minimize the total error? Another future direction is to consider individual-level incentives for sample collection, building on the results of (Blum et al. 2021).

Acknowledgments

RR and OW were partially supported by NSF grant CCF-2335187. AV was partially supported by NSF grants CCF-1918656, IIS-1955797, CNS 2317193 and IIS 2331315. RS was partially supported by NSF grant D-ISN 2146502.

References

  • Acemoglu, Chernozhukov, and Yildiz (2006) Acemoglu, D.; Chernozhukov, V.; and Yildiz, M. 2006. Learning and disagreement in an uncertain world.
  • Acemoglu and Ozdaglar (2011) Acemoglu, D.; and Ozdaglar, A. 2011. Opinion dynamics and learning in social networks. Dynamic Games and Applications, 1(1): 3–49.
  • Andersen et al. (2013) Andersen, M. S.; Dahl, J.; Vandenberghe, L.; et al. 2013. CVXOPT: A Python package for convex optimization. abel.ee.ucla.edu/cvxopt, 88.
  • Anderson (1958) Anderson, T. W. 1958. An introduction to multivariate statistical analysis, volume 2. Wiley New York.
  • Audibert and Catoni (2010) Audibert, J.-Y.; and Catoni, O. 2010. Linear regression through PAC-Bayesian truncation. arXiv preprint arXiv:1010.0072.
  • Banerjee (1992) Banerjee, A. V. 1992. A simple model of herd behavior. The quarterly journal of economics, 107(3): 797–817.
  • Barabási and Albert (1999) Barabási, A.-L.; and Albert, R. 1999. Emergence of scaling in random networks. science, 286(5439): 509–512.
  • Bikhchandani, Hirshleifer, and Welch (1992) Bikhchandani, S.; Hirshleifer, D.; and Welch, I. 1992. A theory of fads, fashion, custom, and cultural change as informational cascades. Journal of political Economy, 100(5): 992–1026.
  • Bindel, Kleinberg, and Oren (2015) Bindel, D.; Kleinberg, J.; and Oren, S. 2015. How bad is forming your own opinion? Games and Economic Behavior, 92: 248–265.
  • Blum et al. (2021) Blum, A.; Haghtalab, N.; Phillips, R. L.; and Shao, H. 2021. One for one, or all for all: Equilibria and optimality of collaboration in federated learning. In International Conference on Machine Learning, 1005–1014. PMLR.
  • Blum et al. (2017) Blum, A.; Haghtalab, N.; Procaccia, A. D.; and Qiao, M. 2017. Collaborative PAC Learning. In Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS’17, 2389–2398. Red Hook, NY, USA: Curran Associates Inc. ISBN 9781510860964.
  • Boyd and Vandenberghe (2004) Boyd, S.; and Vandenberghe, L. 2004. Convex optimization, 232–236. Cambridge university press.
  • Breiman and Freedman (1983) Breiman, L.; and Freedman, D. 1983. How many variables should be entered in a regression equation? Journal of the American Statistical Association, 78(381): 131–136.
  • Chen, Zhang, and Zhou (2018) Chen, J.; Zhang, Q.; and Zhou, Y. 2018. Tight bounds for collaborative pac learning via multiplicative weights. Advances in neural information processing systems, 31.
  • DeGroot (1974) DeGroot, M. H. 1974. Reaching a consensus. Journal of the American Statistical association, 69(345): 118–121.
  • DeMarzo, Vayanos, and Zwiebel (2003) DeMarzo, P. M.; Vayanos, D.; and Zwiebel, J. 2003. Persuasion bias, social influence, and unidimensional opinions. The Quarterly journal of economics, 118(3): 909–968.
  • Donahue and Kleinberg (2021) Donahue, K.; and Kleinberg, J. 2021. Model-sharing games: Analyzing federated learning under voluntary participation. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, 5303–5311.
  • Friedkin and Johnsen (1990) Friedkin, N. E.; and Johnsen, E. C. 1990. Social influence and opinions. Journal of Mathematical Sociology, 15(3-4): 193–206.
  • Friedman (2003) Friedman, J. 2003. A proof of Alon’s second eigenvalue conjecture. In Proceedings of the thirty-fifth annual ACM symposium on Theory of computing, 720–724.
  • Golub and Jackson (2008) Golub, B.; and Jackson, M. O. 2008. How homophily affects diffusion and learning in networks. arXiv preprint arXiv:0811.4013.
  • Haddadan, Xin, and Gao (2024) Haddadan, S.; Xin, C.; and Gao, J. 2024. Optimally Improving Cooperative Learning in a Social Setting. In International Conference on Machine Learning. PMLR.
  • Haghtalab, Jackson, and Procaccia (2021) Haghtalab, N.; Jackson, M. O.; and Procaccia, A. D. 2021. Belief polarization in a complex world: A learning theory perspective. Proceedings of the National Academy of Sciences, 118(19): e2010144118.
  • Haghtalab, Jordan, and Zhao (2022) Haghtalab, N.; Jordan, M.; and Zhao, E. 2022. On-demand sampling: Learning optimally from multiple distributions. Advances in Neural Information Processing Systems, 35: 406–419.
  • Jackson (2010) Jackson, M. O. 2010. Social and economic networks. Princeton university press.
  • Karnin, Koren, and Somekh (2013) Karnin, Z.; Koren, T.; and Somekh, O. 2013. Almost optimal exploration in multi-armed bandits. In International Conference on Machine Learning, 1238–1246. PMLR.
  • Kempe, Kleinberg, and Tardos (2003) Kempe, D.; Kleinberg, J.; and Tardos, E. 2003. Maximizing the Spread of Influence through a Social Network. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’03, 137–146. New York, NY, USA: Association for Computing Machinery. ISBN 1581137370.
  • Koltchinskii and Mendelson (2015) Koltchinskii, V.; and Mendelson, S. 2015. Bounding the smallest singular value of a random matrix without concentration. International Mathematics Research Notices, 2015(23): 12991–13008.
  • Lalitha et al. (2019) Lalitha, A.; Kilinc, O. C.; Javidi, T.; and Koushanfar, F. 2019. Peer-to-peer federated learning on graphs. arXiv preprint arXiv:1901.11173.
  • Lalitha et al. (2018) Lalitha, A.; Shekhar, S.; Javidi, T.; and Koushanfar, F. 2018. Fully decentralized federated learning. In Third workshop on Bayesian Deep Learning (NeurIPS).
  • Lecué and Mendelson (2016) Lecué, G.; and Mendelson, S. 2016. Performance of empirical risk minimization in linear aggregation.
  • Leskovec and Krevl (2014) Leskovec, J.; and Krevl, A. 2014. SNAP Datasets: Stanford Large Network Dataset Collection. http://snap.stanford.edu/data.
  • Leskovec and Mcauley (2012) Leskovec, J.; and Mcauley, J. 2012. Learning to discover social circles in ego networks. Advances in neural information processing systems, 25.
  • Liu et al. (2023) Liu, H.; Rajaraman, R.; Sundaram, R.; Vullikanti, A.; Wasim, O.; and Xu, H. 2023. Sample Complexity of Opinion Formation on Networks. arXiv preprint arXiv:2311.02349.
  • McMahan et al. (2017) McMahan, B.; Moore, E.; Ramage, D.; Hampson, S.; and y Arcas, B. A. 2017. Communication-efficient learning of deep networks from decentralized data. In Artificial intelligence and statistics, 1273–1282. PMLR.
  • Mendelson (2015) Mendelson, S. 2015. Learning without concentration. Journal of the ACM (JACM), 62(3): 1–25.
  • Mourtada (2022) Mourtada, J. 2022. Exact minimax risk for linear least squares, and the lower tail of sample covariance matrices. The Annals of Statistics, 50(4): 2157–2178.
  • Nguyen and Zakynthinou (2018) Nguyen, H.; and Zakynthinou, L. 2018. Improved algorithms for collaborative PAC learning. Advances in Neural Information Processing Systems, 31.
  • Paranjape, Benson, and Leskovec (2017) Paranjape, A.; Benson, A. R.; and Leskovec, J. 2017. Motifs in temporal networks. In Proceedings of the tenth ACM international conference on web search and data mining, 601–610.
  • Rossi and Ahmed (2015) Rossi, R. A.; and Ahmed, N. K. 2015. The Network Data Repository with Interactive Graph Analytics and Visualization. In AAAI.
  • Rudelson and Vershynin (2015) Rudelson, M.; and Vershynin, R. 2015. Small ball probabilities for linear images of high-dimensional distributions. International Mathematics Research Notices, 2015(19): 9594–9617.
  • Simon (2013) Simon, H. A. 2013. Administrative behavior. Simon and Schuster.
  • Smith and Sørensen (2000) Smith, L.; and Sørensen, P. 2000. Pathological outcomes of observational learning. Econometrica, 68(2): 371–398.
  • Spielman (2019) Spielman, D. A. 2019. Spectral and Algebraic Graph Theory Incomplete Draft, dated December 4, 2019.
  • Vanhaesebrouck, Bellet, and Tommasi (2017) Vanhaesebrouck, P.; Bellet, A.; and Tommasi, M. 2017. Decentralized Collaborative Learning of Personalized Models over Networks. In Singh, A.; and Zhu, X. J., eds., Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, AISTATS 2017, 20-22 April 2017, Fort Lauderdale, FL, USA, volume 54 of Proceedings of Machine Learning Research, 509–517. PMLR.
  • Young (2014) Young, D. M. 2014. Iterative solution of large linear systems, 42–44. Elsevier.

Appendix A Additional details on preliminaries

$D$ | Data distribution
$k$ | Dimension of $D$
$S_{i}$ | Set of samples for agent $i$; $S_{i}=\{(x_{i}^{j},y_{i}^{j})\}_{j\in[m_{i}]}$
$m_{i}$ | Number of samples for agent $i$
$\bar{\theta}_{i}$ | Initial model for agent $i$ (also known as the “internal opinion”)
$\theta_{i}$ | Model learned by agent $i$ in the opinion formation game
$N(i),d_{i}$ | Set of neighbors and degree of node $i$, respectively
$\theta^{eq}_{i}$ | Model for agent $i$ in the Nash equilibrium
$\widetilde{M}(k,\epsilon)$ | Minimum number of samples that any agent $i$ would need to ensure that the best model $\bar{\theta}_{i}$, learned using only their own samples, has expected error at most the target $\epsilon$ (bounded in Theorem 2 for linear regression)
$a_{ij}$ | We denote $a_{ij}=(W^{-1})_{ij}$ for all $i,j\in[n]$ to simplify notation.
$b_{ij}$ | We denote $b_{ij}=[(W^{-1}\circ W^{-1})^{-1}]_{ij}$ for all $i,j\in[n]$.
Table 4: Notation used in the paper

Appendix B Missing proofs in Sections 2 and 3.1

B.1 Missing proof in Section 2

Lemma 1. The unique Nash equilibrium $\boldsymbol{\theta^{eq}}=(\theta_{1}^{eq},\cdots,\theta_{n}^{eq})^{\top}$ of the above game is $\boldsymbol{\theta^{eq}}=W^{-1}\boldsymbol{\bar{\theta}}$, where
$$W_{ij}=\begin{cases}\sum_{j\in N(i)}v_{ij}+1 & j=i\\ -v_{ij} & j\in N(i)\\ 0 & j\notin N(i),\,j\neq i\end{cases}$$
and $\boldsymbol{\bar{\theta}}=(\bar{\theta}_{1},\cdots,\bar{\theta}_{n})$. When all $v_{ij}=\alpha\geq 0$,
$$W_{ij}=\begin{cases}\alpha D+1 & j=i\\ -\alpha & j\in N(i)\\ 0 & j\notin N(i),\,j\neq i.\end{cases}$$
Furthermore, we have $\sum_{j=1}^{n}W^{-1}_{ij}=1$ and $W^{-1}_{ij}\geq 0$ for all $i,j\in[n]$.

Proof.

For any agent $i$, given $\theta_{j}$ for $j\in N(i)$, setting the gradient of the loss function to zero shows that the best response for $i$ is $\theta_{i}=\frac{\bar{\theta}_{i}+\sum_{j\in N(i)}v_{ij}\theta_{j}}{1+\sum_{j\in N(i)}v_{ij}}$. An action profile $\boldsymbol{\theta^{eq}}=(\theta_{1}^{eq},\cdots,\theta_{n}^{eq})^{\top}$ is a Nash equilibrium if and only if for every $i$, $\theta_{i}^{eq}=\frac{\bar{\theta}_{i}+\sum_{j\in N(i)}v_{ij}\theta_{j}^{eq}}{1+\sum_{j\in N(i)}v_{ij}}$, which is equivalent to $(\sum_{j\in N(i)}v_{ij}+1)\theta_{i}^{eq}-\sum_{j\in N(i)}v_{ij}\theta_{j}^{eq}=\bar{\theta}_{i}$ for every $i$. Combining all these equations, we have $W\boldsymbol{\theta}^{eq}=\boldsymbol{\bar{\theta}}$ with $W$ as defined in the lemma statement. Note that $W=\mathcal{L}+I_{n}$, where $\mathcal{L}$ is the weighted Laplacian matrix of the graph $G$ and $I_{n}$ is the $n\times n$ identity matrix. Since $\mathcal{L}$ is positive semi-definite, $W$ is positive definite. Thus $W$ is invertible and $\boldsymbol{\theta}^{eq}=W^{-1}\boldsymbol{\bar{\theta}}$. Given the graph structure, $\boldsymbol{\theta}^{eq}$ is unique and thus the Nash equilibrium is unique. When all $v_{ij}=\alpha$, the simplified form of $W$ stated in the lemma follows by direct calculation.

Since $v_{ij}>0$ for every edge, every off-diagonal element of $W$ is non-positive. On the other hand, since $W=\mathcal{L}+I_{n}$, where $\mathcal{L}$ is the weighted Laplacian matrix of $G$, $W$ is symmetric and positive definite. Thus $W$ is a Stieltjes matrix. From Young (2014), the inverse of any Stieltjes matrix is a non-negative matrix. Thus, $W^{-1}$ is a non-negative matrix.

Let $\bar{1}$ be the all-ones vector. Since $W\bar{1}=\bar{1}$, we have $\bar{1}=W^{-1}W\bar{1}=W^{-1}\bar{1}$. Thus, $\sum_{j=1}^{n}(W^{-1})_{ij}=1$ for every $i$ and $(W^{-1})_{ij}\in[0,1]$. Since the best response update for $i$ is $\theta_{i}=\frac{\bar{\theta}_{i}+\sum_{j\in N(i)}v_{ij}\theta_{j}}{1+\sum_{j\in N(i)}v_{ij}}$, the final weight $(W^{-1})_{ij}=0$ if and only if $i$ and $j$ are not connected. Since we consider a connected network (otherwise, we can consider each connected component separately), we have $(W^{-1})_{ij}>0$. Thus, the $(W^{-1})_{ij}$s are normalized weights. ∎
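The claims of Lemma 1 can be checked numerically with a quick sketch (a toy example with an arbitrary random weighted cycle-with-chord and scalar opinions, not the paper's code): iterating the best response converges to $W^{-1}\boldsymbol{\bar{\theta}}$, and the rows of $W^{-1}$ are strictly positive and sum to one.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
# A small connected weighted graph: a path closed into a cycle, plus one
# chord (arbitrary illustrative choice), with weights v_ij in (0.1, 1).
V = np.zeros((n, n))
edges = [(i, i + 1) for i in range(n - 1)] + [(0, n - 1), (0, 2)]
for i, j in edges:
    V[i, j] = V[j, i] = rng.uniform(0.1, 1.0)

W = np.diag(V.sum(axis=1) + 1.0) - V           # W = L + I_n (Lemma 1)
Winv = np.linalg.inv(W)
theta_bar = rng.normal(size=n)                 # scalar internal opinions
theta = theta_bar.copy()

# Iterate the best response:
# theta_i <- (bar_theta_i + sum_j v_ij theta_j) / (1 + sum_j v_ij).
# This is a Jacobi iteration for W theta = bar_theta; W is strictly
# diagonally dominant, so it converges.
for _ in range(2000):
    theta = (theta_bar + V @ theta) / (1.0 + V.sum(axis=1))

assert np.allclose(theta, Winv @ theta_bar)    # equilibrium = W^{-1} bar_theta
assert np.allclose(Winv.sum(axis=1), 1.0)      # rows of W^{-1} sum to one
assert (Winv > 0).all()                        # and are strictly positive
```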

B.2 Missing proofs in Section 3.1

We consider a slightly more general case here, where every agent $i$'s dataset $S_{i}=\{(x_{i}^{j},y_{i}^{j})\}_{j\in[m_{i}]}$ has $m_{i}\geq k$ samples, and $x_{i}^{j}\in\mathbb{R}^{k}$ for all $j\in[m_{i}]$ are drawn independently from distribution $D$. We assume there is a global ground-truth model $\theta^{*}$ such that $y_{i}^{j}=f(\theta^{*},x_{i}^{j})+\eta_{i}$, where $\eta_{i}\in\mathcal{N}$ is an unbiased noise with bounded variance $\sigma^{2}$. We further assume $f$ is linear in $\theta$, i.e., $f(\lambda_{1}\theta_{1}+\lambda_{2}\theta_{2},x)=\lambda_{1}f(\theta_{1},x)+\lambda_{2}f(\theta_{2},x)$. This captures the case of generalized linear regression (Audibert and Catoni 2010), where $f(\theta,x)=\phi(x)^{\top}\theta$ and $\phi$ is an arbitrary mapping function.

We further assume every agent learns an unbiased estimator $\bar{\theta}_{i}$ from their dataset $S_{i}$, i.e., $\mathbb{E}_{S_{i}}\left[\bar{\theta}_{i}\right]=\theta^{*}$. Note that in ordinary linear regression, $(X_{i}^{\top}X_{i})^{-1}X_{i}^{\top}Y_{i}$ is an unbiased estimator. In generalized linear regression, $(\Phi_{i}^{\top}\Phi_{i})^{-1}\Phi_{i}^{\top}Y_{i}$ is unbiased, where $\Phi_{i}=[\phi(x_{i}^{1}),\cdots,\phi(x_{i}^{m_{i}})]$.

We extend the definition of L()L(\cdot) as follows.

$$L(\theta_{i}^{eq})=\sup_{\eta_{1},\cdots,\eta_{n}\in\mathcal{N}}\mathbb{E}_{x\sim D,\forall j,S_{j}\sim D^{m_{j}}(\eta_{j})}\left[\left(f(\theta_{i}^{eq},x)-f(\theta^{*},x)\right)^{2}\right]$$
$$L(\bar{\theta}_{i})=\sup_{\eta_{i}\in\mathcal{N}}\mathbb{E}_{x\sim D,S_{i}\sim D^{m_{i}}(\eta_{i})}\left[\left(f(\bar{\theta}_{i},x)-f(\theta^{*},x)\right)^{2}\right].$$

Theorem 3. If Assumption 1 holds, then for every agent ii,

$$L(\theta_{i}^{eq})=\sum_{j=1}^{n}\left(W_{ij}^{-1}\right)^{2}L(\bar{\theta}_{j})$$

where $\theta_{i}^{eq}$ and the matrix $W$ are defined in Lemma 1. Moreover, if we consider linear regression and the condition in Theorem 2 holds, we have

$$L(\theta_{i}^{eq})=\Theta\left(k\sum_{j=1}^{n}\frac{\left(W_{ij}^{-1}\right)^{2}}{m_{j}}\right).$$
Proof.

To simplify notation, we define $w_{ij}=W_{ij}^{-1}$ in our proof. By Lemma 1, we have $\sum_{j=1}^{n}w_{ij}=1$ and $w_{ij}\geq 0$ for all $i,j\in[n]$. For any agent $i$ and any noise $\eta_{1},\cdots,\eta_{n}\in\mathcal{N}$, we have

$$\begin{aligned}
&\mathbb{E}\left[\left(f\left(\theta_{i}^{eq},x\right)-f(\theta^{*},x)\right)^{2}\right]\\
&=\mathbb{E}\left[\left(f\left(\sum_{j=1}^{n}w_{ij}\bar{\theta}_{j},x\right)-f(\theta^{*},x)\right)^{2}\right]\\
&=\mathbb{E}\left[\left(\sum_{j=1}^{n}w_{ij}f\left(\bar{\theta}_{j},x\right)-f(\theta^{*},x)\right)^{2}\right]
&&(\text{$f(\theta,x)$ is linear in $\theta$})\\
&=\mathbb{E}\left[\left(\sum_{j=1}^{n}w_{ij}\left(f\left(\bar{\theta}_{j},x\right)-f\left(\theta^{*},x\right)\right)\right)^{2}\right]
&&(\textstyle\sum_{j=1}^{n}w_{ij}=1)\\
&=\mathbb{E}\left[\sum_{j=1}^{n}w_{ij}^{2}\left(f\left(\bar{\theta}_{j},x\right)-f\left(\theta^{*},x\right)\right)^{2}\right]\\
&\qquad+\underbrace{\sum_{j=1}^{n}\sum_{k\neq j}w_{ij}w_{ik}\,\mathbb{E}\left[\left(f\left(\bar{\theta}_{j},x\right)-f\left(\theta^{*},x\right)\right)\left(f\left(\bar{\theta}_{k},x\right)-f\left(\theta^{*},x\right)\right)\right]}_{=0}\\
&=\sum_{j=1}^{n}w_{ij}^{2}\,\mathbb{E}_{x\sim D,S_{j}\sim D^{m_{j}}(\eta_{j})}\left[\left(f\left(\bar{\theta}_{j},x\right)-f\left(\theta^{*},x\right)\right)^{2}\right]
&&(\text{the $\bar{\theta}_{j}$, $\forall j\in[n]$, are unbiased estimators})
\end{aligned}$$
where, unless subscripted otherwise, expectations are taken over $x\sim D$ and $S_{j}\sim D^{m_{j}}(\eta_{j})$ for all $j$.

Thus, we have

$$\begin{aligned}
L(\theta_{i}^{eq})&=\sup_{\eta_{1},\cdots,\eta_{n}\in\mathcal{N}}\mathbb{E}_{x\sim D,\forall j,S_{j}\sim D^{m_{j}}(\eta_{j})}\left[\left(f(\theta_{i}^{eq},x)-f(\theta^{*},x)\right)^{2}\right]\\
&=\sup_{\eta_{1},\cdots,\eta_{n}\in\mathcal{N}}\sum_{j=1}^{n}w_{ij}^{2}\,\mathbb{E}_{x\sim D,S_{j}\sim D^{m_{j}}(\eta_{j})}\left[\left(f\left(\bar{\theta}_{j},x\right)-f\left(\theta^{*},x\right)\right)^{2}\right]\\
&=\sum_{j=1}^{n}w_{ij}^{2}\sup_{\eta_{j}\in\mathcal{N}}\mathbb{E}_{x\sim D,S_{j}\sim D^{m_{j}}(\eta_{j})}\left[\left(f\left(\bar{\theta}_{j},x\right)-f\left(\theta^{*},x\right)\right)^{2}\right]
&&(\text{independence of noises})\\
&=\sum_{j=1}^{n}w_{ij}^{2}L(\bar{\theta}_{j}).
\end{aligned}$$

For linear regression, under the condition given in Theorem 2, we directly have

$$L(\theta_{i}^{eq})=\Theta\left(k\sum_{j=1}^{n}\frac{\left(W_{ij}^{-1}\right)^{2}}{m_{j}}\right).$$

Note that we can also obtain the counterpart of Theorem 2 for generalized linear regression. The only required change is that Assumption 1 and Assumption 2 should be imposed on the distribution of $\phi(x)$ rather than on $x$. ∎
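Theorem 3 can be illustrated by a small Monte Carlo simulation. This is our own sketch with arbitrary parameters: it models each $\bar{\theta}_{j}$ directly as an unbiased scalar estimator with a known variance (its $L(\bar{\theta}_{j})$) rather than running a full regression, and checks that the empirical error of $\theta_{i}^{eq}$ matches $\sum_{j}(W^{-1}_{ij})^{2}L(\bar{\theta}_{j})$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, trials = 4, 200_000
adj = np.ones((n, n)) - np.eye(n)              # clique, uniform alpha = 0.5
W = np.diag(0.5 * adj.sum(axis=1) + 1.0) - 0.5 * adj
Winv = np.linalg.inv(W)

theta_star = 1.7                               # scalar ground truth
s = np.array([0.5, 1.0, 1.5, 2.0])             # std of each local estimator

# Draw unbiased local estimators and form the equilibrium models; since
# the rows of W^{-1} sum to one, theta_eq - theta* = W^{-1} @ noise.
theta_bar = theta_star + rng.normal(size=(trials, n)) * s
theta_eq = theta_bar @ Winv.T                  # theta_eq = W^{-1} theta_bar

mse = ((theta_eq - theta_star) ** 2).mean(axis=0)  # empirical L(theta_i^eq)
predicted = (Winv ** 2) @ (s ** 2)                 # sum_j (W^{-1}_ij)^2 L(bar_theta_j)
assert np.allclose(mse, predicted, rtol=0.05)
```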

Theorem 4. For any $\epsilon>0$, let $(m_{i}^{*},i=1,\ldots,n)$ denote an optimal solution of the following optimization problem, which measures the minimum samples for opinion formation on graph $G$ with influence factors $\boldsymbol{v}$.

$$\begin{aligned}
\min_{m_{1},\cdots,m_{n}}\quad&\sum_{i=1}^{n}m_{i}\\
\text{s.t.}\quad&\sum_{j=1}^{n}\frac{(W_{ij}^{-1})^{2}}{m_{j}}\leq\frac{\epsilon}{k},\,\,\forall i\\
&m_{i}>0,\,\,\forall i
\end{aligned}$$

where WW is defined in Lemma 1. Then, TSC(G,𝒗,k,ϵ)=Θ(i=1nmi+nk)TSC(G,\boldsymbol{v},k,\epsilon)=\Theta(\sum_{i=1}^{n}m_{i}^{*}+nk). Moreover, given GG and 𝒗\boldsymbol{v}, ϵmik\frac{\epsilon m_{i}^{\star}}{k} is a fixed value for any kk and ϵ\epsilon. If ϵmikϵ\frac{\epsilon m_{i}^{\star}}{k}\geq\epsilon for every i[n]i\in[n], then TSC(G,𝒗,k,ϵ)=Θ(i=1nmi)TSC(G,\boldsymbol{v},k,\epsilon)=\Theta(\sum_{i=1}^{n}m_{i}^{*}).

Proof.

We first show a tight characterization of TSC. By Theorem 3, there exist constants c1,c2c_{1},c_{2} such that

c1kj=1n(Wij1)2mjL(θieq)c2kj=1n(Wij1)2mjc_{1}k\sum_{j=1}^{n}\frac{\left(W_{ij}^{-1}\right)^{2}}{m_{j}}\leq L(\theta_{i}^{eq})\leq c_{2}k\sum_{j=1}^{n}\frac{\left(W_{ij}^{-1}\right)^{2}}{m_{j}} (4)

By definition, TSC(G,𝒗,k,ϵ)TSC(G,\boldsymbol{v},k,\epsilon) is the solution of Equation 5

minm1,,mn\displaystyle\min_{m_{1},\cdots,m_{n}}\quad i=1nmi\displaystyle\sum_{i=1}^{n}m_{i}
s.t. L(θieq)ϵ,i\displaystyle L(\theta_{i}^{eq})\leq\epsilon,\,\,\forall i (5)
mik,i\displaystyle m_{i}\geq k,\,\,\forall i
mi+\displaystyle m_{i}\in\mathbb{Z}^{+}

We consider the following two optimizations. Denote the solution of Equation 6 as Opt-L(G,𝒗,k,ϵ)\text{Opt-L}(G,\boldsymbol{v},k,\epsilon) and the solution of Equation 7 as Opt-U(G,𝒗,k,ϵ)\text{Opt-U}(G,\boldsymbol{v},k,\epsilon).

minm1,,mn\displaystyle\min_{m_{1},\cdots,m_{n}}\quad i=1nmi\displaystyle\sum_{i=1}^{n}m_{i}
s.t. c1kj=1n(Wij1)2mjϵ,i\displaystyle c_{1}k\sum_{j=1}^{n}\frac{\left(W_{ij}^{-1}\right)^{2}}{m_{j}}\leq\epsilon,\,\,\forall i (6)
mik,i\displaystyle m_{i}\geq k,\,\,\forall i
mi+\displaystyle m_{i}\in\mathbb{Z}^{+}
minm1,,mn\displaystyle\min_{m_{1},\cdots,m_{n}}\quad i=1nmi\displaystyle\sum_{i=1}^{n}m_{i}
s.t. c2kj=1n(Wij1)2mjϵ,i\displaystyle c_{2}k\sum_{j=1}^{n}\frac{\left(W_{ij}^{-1}\right)^{2}}{m_{j}}\leq\epsilon,\,\,\forall i (7)
mik,i\displaystyle m_{i}\geq k,\,\,\forall i
mi+\displaystyle m_{i}\in\mathbb{Z}^{+}

From Equation 4, we have

Opt-L(G,𝒗,k,ϵ)TSC(G,𝒗,k,ϵ)Opt-U(G,𝒗,k,ϵ).\displaystyle\text{Opt-L}(G,\boldsymbol{v},k,\epsilon)\leq TSC(G,\boldsymbol{v},k,\epsilon)\leq\text{Opt-U}(G,\boldsymbol{v},k,\epsilon). (8)

We restate Equation B.2 here and denote the solution as mim_{i}^{\star} for every i[n]i\in[n].

minm1,,mn\displaystyle\min_{m_{1},\cdots,m_{n}}\quad i=1nmi\displaystyle\sum_{i=1}^{n}m_{i}
s.t. j=1n(Wij1)2mjϵk,i\displaystyle\sum_{j=1}^{n}\frac{(W_{ij}^{-1})^{2}}{m_{j}}\leq\frac{\epsilon}{k},\,\,\forall i
mi>0,i\displaystyle m_{i}>0,\,\,\forall i

By defining mi=ϵmikm_{i}^{\prime}=\frac{\epsilon m_{i}}{k}, Equation B.2 can be reformulated as

minm1,,mn\displaystyle\min_{m_{1}^{\prime},\cdots,m_{n}^{\prime}}\quad i=1nmi\displaystyle\sum_{i=1}^{n}m_{i}^{\prime}
s.t. j=1n(Wij1)2mj1,i\displaystyle\sum_{j=1}^{n}\frac{(W_{ij}^{-1})^{2}}{m_{j}^{\prime}}\leq 1,\,\,\forall i (9)
mi>0,i\displaystyle m_{i}^{\prime}>0,\,\,\forall i

Thus, ϵmik\frac{\epsilon m_{i}^{*}}{k} is a fixed value for any kk, ϵ\epsilon. This also implies that if ϵk\frac{\epsilon}{k} is replaced by 1cϵk\frac{1}{c}\!\cdot\!\frac{\epsilon}{k} for some constant cc in Equation B.2, the solution will become cmicm_{i}^{*}. Thus,

Opt-L(G,𝒗,k,ϵ)c1i[n]mi\text{Opt-L}(G,\boldsymbol{v},k,\epsilon)\geq c_{1}\sum_{i\in[n]}m_{i}^{*} (10)

Since Opt-L(G,𝒗,k,ϵ)nk\text{Opt-L}(G,\boldsymbol{v},k,\epsilon)\geq nk, we have

Opt-L(G,𝒗,k,ϵ)c12i[n]mi+nk2\displaystyle\text{Opt-L}(G,\boldsymbol{v},k,\epsilon)\geq\frac{c_{1}}{2}\sum_{i\in[n]}m_{i}^{*}+\frac{nk}{2} (11)

On the other hand, since max{c2mi,k},i[n]\max\left\{\lceil c_{2}m_{i}^{\star}\rceil,k\right\},\,\,\forall i\in[n] is always a feasible solution of Equation 7, we have

Opt-U(G,𝒗,k,ϵ)i[n]max{c2mi,k}c2i[n]mi+nk+n\text{Opt-U}(G,\boldsymbol{v},k,\epsilon)\leq\sum_{i\in[n]}\max\left\{\lceil c_{2}m_{i}^{\star}\rceil,k\right\}\leq c_{2}\sum_{i\in[n]}m_{i}^{*}+nk+n (12)

Combining Equation 8, Equation 11, and Equation 12, we have TSC(G,𝒗,k,ϵ)=Θ(i[n]mi+nk)TSC(G,\boldsymbol{v},k,\epsilon)=\Theta(\sum_{i\in[n]}m_{i}^{\star}+nk). Assigning max{c2mi,k}\max\left\{\lceil c_{2}m_{i}^{\star}\rceil,k\right\} samples to every agent i[n]i\in[n] ensures L(θieq)ϵL(\theta_{i}^{eq})\leq\epsilon for every ii.

If ϵmikϵ\frac{\epsilon m_{i}^{\star}}{k}\geq\epsilon for every i[n]i\in[n], then mikm_{i}^{*}\geq k and i[n]mink\sum_{i\in[n]}m_{i}^{*}\geq nk. Thus, TSC(G,𝒗,k,ϵ)=Θ(i=1nmi)TSC(G,\boldsymbol{v},k,\epsilon)=\Theta(\sum_{i=1}^{n}m_{i}^{*}). From experiments in Section 5, we can observe that for many networks, ϵmik0.001\frac{\epsilon m_{i}^{\star}}{k}\geq 0.001, thus, setting ϵ=0.001\epsilon=0.001 suffices for TSC(G,𝒗,k,ϵ)=Θ(i=1nmi)TSC(G,\boldsymbol{v},k,\epsilon)=\Theta(\sum_{i=1}^{n}m_{i}^{*}) in these networks.

Appendix C Missing proof in Section 4.1

C.1 Dual form of Equation B.2

Lemma 9.

The dual form of Equation B.2 is

maxλ1,,λn\displaystyle\max_{\lambda_{1},\cdots,\lambda_{n}}\quad 2i=1nj=1nλj(Wij1)2ϵki=1nλi\displaystyle 2\sum_{i=1}^{n}\sqrt{\sum_{j=1}^{n}\lambda_{j}(W_{ij}^{-1})^{2}}-\frac{\epsilon}{k}\sum_{i=1}^{n}\lambda_{i} (13)
s.t. λi0,i.\displaystyle\lambda_{i}\geq 0,\forall i.

Moreover, strong duality holds, which implies that the optimal value of Equation B.2 equals that of Equation 13.

Proof.

The Lagrangian of optimization Equation B.2, writing wij=(W1)ijw_{ij}=(W^{-1})_{ij} for brevity, is

L=i=1nmi+i=1nλi(j=1nwij2mjϵk)(λi0)L=\sum_{i=1}^{n}m_{i}+\sum_{i=1}^{n}\lambda_{i}\left(\sum_{j=1}^{n}\frac{w_{ij}^{2}}{m_{j}}-\frac{\epsilon}{k}\right)\qquad(\lambda_{i}\geq 0)

in domain mi>0,im_{i}>0,\forall i. When λi=0\lambda_{i}=0 for all ii, we have infm1,,mnL=0\inf\limits_{m_{1},\cdots,m_{n}}L=0. When λi0\exists\lambda_{i}\neq 0, from Lmi=1j=1nλjwji2mi2\frac{\partial L}{\partial m_{i}}=1-\sum_{j=1}^{n}\lambda_{j}\frac{w_{ji}^{2}}{m_{i}^{2}}, the Lagrangian attains its minimum at mi=j=1nλjwij2>0m_{i}=\sqrt{\sum_{j=1}^{n}\lambda_{j}w_{ij}^{2}}>0. Now

minm1,,mnL\displaystyle\min\limits_{m_{1},\cdots,m_{n}}L =i=1nj=1nλjwij2+i=1nλi(j=1nwij2k=1nλkwkj2ϵk)\displaystyle=\sum_{i=1}^{n}\sqrt{\sum_{j=1}^{n}\lambda_{j}w_{ij}^{2}}+\sum_{i=1}^{n}\lambda_{i}(\sum_{j=1}^{n}\frac{w_{ij}^{2}}{\sqrt{\sum_{k=1}^{n}\lambda_{k}w_{kj}^{2}}}-\frac{\epsilon}{k})
=i=1nj=1nλjwij2+j=1n(i=1nλiwij2k=1nλkwkj2)ϵki=1nλi\displaystyle=\sum_{i=1}^{n}\sqrt{\sum_{j=1}^{n}\lambda_{j}w_{ij}^{2}}+\sum_{j=1}^{n}(\frac{\sum_{i=1}^{n}\lambda_{i}w_{ij}^{2}}{\sqrt{\sum_{k=1}^{n}\lambda_{k}w_{kj}^{2}}})-\frac{\epsilon}{k}\sum_{i=1}^{n}\lambda_{i}
=2i=1nj=1nλjwij2ϵki=1nλi\displaystyle=2\sum_{i=1}^{n}\sqrt{\sum_{j=1}^{n}\lambda_{j}w_{ij}^{2}}-\frac{\epsilon}{k}\sum_{i=1}^{n}\lambda_{i}

Thus, when λi0\exists\lambda_{i}\neq 0, the max-min problem of the Lagrangian is

maxλ1,,λn\displaystyle\max_{\lambda_{1},\cdots,\lambda_{n}}\quad 2i=1nj=1nλjwij2ϵki=1nλi\displaystyle 2\sum_{i=1}^{n}\sqrt{\sum_{j=1}^{n}\lambda_{j}w_{ij}^{2}}-\frac{\epsilon}{k}\sum_{i=1}^{n}\lambda_{i}
s.t. λi0,i,λi0 for some i\displaystyle\lambda_{i}\geq 0,\forall i,\quad\lambda_{i}\neq 0\text{ for some }i

Note that when all λi=0\lambda_{i}=0, the objective equals 0, which matches the infimum of the Lagrangian in that case. Thus we can remove the constraint that some λi0\lambda_{i}\neq 0. The dual problem of optimization Equation B.2 is:

maxλ1,,λn\displaystyle\max_{\lambda_{1},\cdots,\lambda_{n}}\quad 2i=1nj=1nλjwij2ϵki=1nλi\displaystyle 2\sum_{i=1}^{n}\sqrt{\sum_{j=1}^{n}\lambda_{j}w_{ij}^{2}}-\frac{\epsilon}{k}\sum_{i=1}^{n}\lambda_{i}
s.t. λi0,i\displaystyle\lambda_{i}\geq 0,\forall i

Since both the objective function and the feasible region of Equation B.2 are convex, and there exists a point that strictly satisfies all constraints, Slater's condition (Boyd, Boyd, and Vandenberghe 2004) implies that strong duality holds. ∎
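As a quick numerical sanity check of Lemma 9 (an illustrative sketch, not part of the paper), the snippet below evaluates the primal Equation B.2 and the dual Equation 13 on an assumed toy instance: a 3-clique with uniform influence α = 1, normalized so that ε/k = 1. By symmetry the uniform allocation is optimal there, and the two optimal values coincide, as strong duality predicts. The Gauss-Jordan helper `inv` is our own illustrative routine.

```python
import math
from fractions import Fraction as F

def inv(M):
    """Invert a rational matrix by Gauss-Jordan elimination."""
    n = len(M)
    A = [[F(x) for x in row] + [F(int(i == j)) for j in range(n)]
         for i, row in enumerate(M)]
    for c in range(n):
        p = next(r for r in range(c, n) if A[r][c] != 0)
        A[c], A[p] = A[p], A[c]
        piv = A[c][c]
        A[c] = [x / piv for x in A[c]]
        for r in range(n):
            if r != c:
                f = A[r][c]
                A[r] = [x - f * y for x, y in zip(A[r], A[c])]
    return [row[n:] for row in A]

n, alpha = 3, 1                      # assumed instance: 3-clique, alpha = 1
# W = I + alpha * Laplacian; the clique Laplacian is (n-1) on the diagonal
# and -1 off the diagonal
W = [[1 + alpha * (n - 1) if i == j else -alpha for j in range(n)]
     for i in range(n)]
a = inv(W)                           # a[i][j] = (W^{-1})_{ij}

# primal (Equation B.2, normalized so eps/k = 1): by symmetry the uniform
# allocation m_i = sum_j a_ij^2 is feasible with all constraints tight
m = [sum(x * x for x in row) for row in a]
assert all(sum(a[i][j] ** 2 / m[j] for j in range(n)) <= 1 for i in range(n))
primal = sum(m)

# dual (Equation 13, eps/k = 1) at lambda_j = m_j, which satisfies the
# stationarity condition m_i = sqrt(sum_j lambda_j * a_ij^2) from the proof
dual = 2 * sum(math.sqrt(sum(m[j] * a[i][j] ** 2 for j in range(n)))
               for i in range(n)) - sum(m)

assert math.isclose(dual, float(primal))   # strong duality holds
```

On this instance both optimal values equal 9/8, certifying that the symmetric primal point and the symmetric dual certificate are simultaneously optimal.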

C.2 Proof of Theorem 5

To simplify notation, write aij=(W1)ija_{ij}=(W^{-1})_{ij} for all i,j[n]i,j\in[n]. We first list additional observations about aija_{ij}.

Observation 10.

1αdi+1aii1αα+1di+1\frac{1}{\alpha d_{i}+1}\leq a_{ii}\leq\frac{1}{\frac{\alpha}{\alpha+1}d_{i}+1}.

Proof.

From WW1=IWW^{-1}=I, we have (diα+1)aiiαjN(i)aij=1(d_{i}\alpha+1)a_{ii}-\alpha\sum_{j\in N(i)}a_{ij}=1. From aii+jiaij=1a_{ii}+\sum_{j\neq i}a_{ij}=1, we have (diα+1)aiiα(1aii)1(d_{i}\alpha+1)a_{ii}-\alpha(1-a_{ii})\leq 1, which is equivalent to aii1αα+1di+1a_{ii}\leq\frac{1}{\frac{\alpha}{\alpha+1}d_{i}+1}. On the other hand, since aij0a_{ij}\geq 0, we have (diα+1)aii1(d_{i}\alpha+1)a_{ii}\geq 1, so aii1diα+1a_{ii}\geq\frac{1}{d_{i}\alpha+1}. Thus 1αdi+1aii1αα+1di+1\frac{1}{\alpha d_{i}+1}\leq a_{ii}\leq\frac{1}{\frac{\alpha}{\alpha+1}d_{i}+1}. ∎

Observation 11.

aijααmax{di,dj}+1a_{ij}\leq\frac{\alpha}{\alpha\max\{d_{i},d_{j}\}+1} for iji\neq j.

Proof.

From WW1=IWW^{-1}=I, we have (diα+1)aijαkN(i)akj=0(d_{i}\alpha+1)a_{ij}-\alpha\sum_{k\in{N(i)}}a_{kj}=0. Combining with kN(i)akj=kN(i)ajk1\sum_{k\in N(i)}a_{kj}=\sum_{k\in N(i)}a_{jk}\leq 1, we have aijααdi+1a_{ij}\leq\frac{\alpha}{\alpha d_{i}+1}. Similarly, we get ajiααdj+1a_{ji}\leq\frac{\alpha}{\alpha d_{j}+1}. Since aij=ajia_{ij}=a_{ji}, we have aijααmax{di,dj}+1a_{ij}\leq\frac{\alpha}{\alpha\max\{d_{i},d_{j}\}+1}. ∎
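Both observations are easy to check numerically. The sketch below (illustrative, not part of the paper) verifies Observations 10 and 11 exactly in rational arithmetic on an assumed 3-node path with α = 1; the Gauss-Jordan helper `inv` is our own.

```python
from fractions import Fraction as F

def inv(M):
    """Invert a rational matrix by Gauss-Jordan elimination."""
    n = len(M)
    A = [[F(x) for x in row] + [F(int(i == j)) for j in range(n)]
         for i, row in enumerate(M)]
    for c in range(n):
        p = next(r for r in range(c, n) if A[r][c] != 0)
        A[c], A[p] = A[p], A[c]
        piv = A[c][c]
        A[c] = [x / piv for x in A[c]]
        for r in range(n):
            if r != c:
                f = A[r][c]
                A[r] = [x - f * y for x, y in zip(A[r], A[c])]
    return [row[n:] for row in A]

alpha = F(1)                                 # assumed uniform influence factor
L = [[1, -1, 0], [-1, 2, -1], [0, -1, 1]]    # Laplacian of the 3-node path
deg = [1, 2, 1]
n = len(L)
W = [[F(int(i == j)) + alpha * L[i][j] for j in range(n)] for i in range(n)]
a = inv(W)

for i in range(n):
    # Observation 10: 1/(alpha d_i + 1) <= a_ii <= 1/(alpha d_i/(alpha+1) + 1)
    assert 1 / (alpha * deg[i] + 1) <= a[i][i] \
        <= 1 / (alpha / (alpha + 1) * deg[i] + 1)
    for j in range(n):
        if i != j:
            # Observation 11: a_ij <= alpha/(alpha max{d_i, d_j} + 1)
            assert a[i][j] <= alpha / (alpha * max(deg[i], deg[j]) + 1)
    assert sum(a[i]) == 1                    # rows of W^{-1} sum to one
```

Note that for the middle node the upper bound of Observation 10 is tight here (a_ii = 1/2), so the asserted inequalities cannot be sharpened in general.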

We restate Theorem 5 and prove it here.

Theorem 5. The optimal solution {mi}\{m^{*}_{i}\} to Equation B.2 satisfies

max{i=1n1(αdi+1)2,1}kϵi[n]mii=1nα+1αdi+1kϵ\max\left\{\sum_{i=1}^{n}\frac{1}{(\alpha d_{i}+1)^{2}},1\right\}\!\cdot\!\frac{k}{\epsilon}\leq\sum_{i\in[n]}m_{i}^{*}\leq\sum_{i=1}^{n}\frac{\alpha+1}{\alpha d_{i}+1}\!\cdot\!\frac{k}{\epsilon}

Assigning α+1αdi+1kϵ\frac{\alpha+1}{\alpha d_{i}+1}\!\cdot\!\frac{k}{\epsilon} samples to node ii ensures that at the equilibrium, every agent i[n]i\in[n] has error L(θieq)ϵL(\theta^{eq}_{i})\leq\epsilon. Assigning max{O(α+1αdi+1kϵ),k}\max\left\{O\left(\frac{\alpha+1}{\alpha d_{i}+1}\!\cdot\!\frac{k}{\epsilon}\right),k\right\} samples to node ii is sufficient to solve the TSC problem.

Proof.

(1) For the upper bound, let mi=|S|αdi+1m_{i}=\frac{|S|}{\alpha d_{i}+1}. Equation B.2 then becomes

min|S|\displaystyle\min_{|S|}\quad |S|i=1n1αdi+1\displaystyle|S|\sum_{i=1}^{n}\frac{1}{\alpha d_{i}+1}
s.t. 1|S|j=1naij2(αdj+1)ϵk,i\displaystyle\frac{1}{|S|}\sum_{j=1}^{n}a_{ij}^{2}(\alpha d_{j}+1)\leq\frac{\epsilon}{k},\quad\forall i
|S|>0\displaystyle|S|>0

The optimal value is pinv=kϵ(i=1n1αdi+1)maxi{j=1naij2(αdj+1)}p^{*}_{inv}=\frac{k}{\epsilon}\!\cdot\!(\sum_{i=1}^{n}\frac{1}{\alpha d_{i}+1})\max\limits_{i}\{\sum_{j=1}^{n}a_{ij}^{2}(\alpha d_{j}+1)\}, attained at |S|=kϵmaxi{j=1naij2(αdj+1)}|S|=\frac{k}{\epsilon}\!\cdot\!\max\limits_{i}\{\sum_{j=1}^{n}a_{ij}^{2}(\alpha d_{j}+1)\}. From Observation 10 and Observation 11,

j=1naij2(αdj+1)\displaystyle\sum_{j=1}^{n}a_{ij}^{2}(\alpha d_{j}+1) =aii2(αdi+1)+jiaij2(αdj+1)\displaystyle=a_{ii}^{2}(\alpha d_{i}+1)+\sum_{j\neq i}a_{ij}^{2}(\alpha d_{j}+1)
aiiα+1αdi+α+1(αdi+1)+jiaijααmax{di,dj}+1(αdj+1)\displaystyle\leq a_{ii}\frac{\alpha+1}{\alpha d_{i}+\alpha+1}(\alpha d_{i}+1)+\sum_{j\neq i}a_{ij}\frac{\alpha}{\alpha\max\{d_{i},d_{j}\}+1}(\alpha d_{j}+1)
aiiα+1αdi+1(αdi+1)+jiaijα+1αdj+1(αdj+1)\displaystyle\leq a_{ii}\frac{\alpha+1}{\alpha d_{i}+1}(\alpha d_{i}+1)+\sum_{j\neq i}a_{ij}\frac{\alpha+1}{\alpha d_{j}+1}(\alpha d_{j}+1)
=(α+1)j=1naij\displaystyle=(\alpha+1)\sum_{j=1}^{n}a_{ij}
=(α+1)\displaystyle=(\alpha+1)

Note that this bound holds for all ii, thus we have

pinv\displaystyle p^{*}_{inv} =kϵ(i=1n1αdi+1)maxi{j=1naij2(αdj+1)}\displaystyle=\frac{k}{\epsilon}\!\cdot\!(\sum_{i=1}^{n}\frac{1}{\alpha d_{i}+1})\max\limits_{i}\{\sum_{j=1}^{n}a_{ij}^{2}(\alpha d_{j}+1)\}
kϵ(i=1nα+1αdi+1)\displaystyle\leq\frac{k}{\epsilon}\!\cdot\!(\sum_{i=1}^{n}\frac{\alpha+1}{\alpha d_{i}+1})

Thus, i[n]mipinv(i=1nα+1αdi+1)kϵ\sum_{i\in[n]}m_{i}^{*}\leq p^{*}_{inv}\leq(\sum_{i=1}^{n}\frac{\alpha+1}{\alpha d_{i}+1})\!\cdot\!\frac{k}{\epsilon}. Since mi=α+1αdi+1kϵm_{i}=\frac{\alpha+1}{\alpha d_{i}+1}\!\cdot\!\frac{k}{\epsilon} is a feasible solution of Equation B.2, assigning mim_{i} samples to node ii ensures that at the equilibrium, every agent i[n]i\in[n] has error L(θieq)ϵL(\theta^{eq}_{i})\leq\epsilon. Assigning max{O(α+1αdi+1kϵ),k}\max\left\{O\left(\frac{\alpha+1}{\alpha d_{i}+1}\!\cdot\!\frac{k}{\epsilon}\right),k\right\} samples to node ii is sufficient to solve the TSC problem.
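To illustrate part (1), the sketch below checks the degree-inverse allocation on an assumed 4-node star (center node 0, α = 1), normalized so that k/ε = 1: assigning each node (α+1)/(αd_i+1) samples satisfies every constraint of Equation B.2. The graph and the Gauss-Jordan helper `inv` are illustrative choices, not from the paper.

```python
from fractions import Fraction as F

def inv(M):
    """Invert a rational matrix by Gauss-Jordan elimination."""
    n = len(M)
    A = [[F(x) for x in row] + [F(int(i == j)) for j in range(n)]
         for i, row in enumerate(M)]
    for c in range(n):
        p = next(r for r in range(c, n) if A[r][c] != 0)
        A[c], A[p] = A[p], A[c]
        piv = A[c][c]
        A[c] = [x / piv for x in A[c]]
        for r in range(n):
            if r != c:
                f = A[r][c]
                A[r] = [x - f * y for x, y in zip(A[r], A[c])]
    return [row[n:] for row in A]

alpha, n = F(1), 4                   # assumed instance: star with center 0
edges = [(0, 1), (0, 2), (0, 3)]
deg = [0] * n
L = [[F(0)] * n for _ in range(n)]
for u, v in edges:
    deg[u] += 1
    deg[v] += 1
    L[u][v] = L[v][u] = F(-1)
for i in range(n):
    L[i][i] = F(deg[i])
W = [[F(int(i == j)) + alpha * L[i][j] for j in range(n)] for i in range(n)]
a = inv(W)

# allocation from the upper bound of Theorem 5, normalized so k/eps = 1:
# the high-degree center needs fewer samples than each leaf
m = [(alpha + 1) / (alpha * deg[i] + 1) for i in range(n)]
for i in range(n):
    assert sum(a[i][j] ** 2 / m[j] for j in range(n)) <= 1   # feasible
```

Here the center receives 1/2 unit of samples and each leaf 1 unit, matching the intuition that samples can be allocated inversely to degree.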

(2) For the lower bound, we utilize the dual form Equation 13. Let λi=λ(αdi+1)x>0\lambda_{i}=\lambda(\alpha d_{i}+1)^{x}>0 where xx\in\mathbb{R} will be optimized later. The dual optimization then becomes

maxλ\displaystyle\max_{\lambda}\quad 2λi=1nj=1naij2(αdj+1)xλϵki=1n(αdi+1)x\displaystyle 2\sqrt{\lambda}\sum_{i=1}^{n}\sqrt{\sum_{j=1}^{n}a_{ij}^{2}(\alpha d_{j}+1)^{x}}-\frac{\lambda\epsilon}{k}\sum_{i=1}^{n}(\alpha d_{i}+1)^{x}
s.t. λ>0\displaystyle\lambda>0

Let dpower(x)d^{*}_{power}(x) be the optimal value of this optimization for a given xx\in\mathbb{R}. Since the objective is quadratic in λ\sqrt{\lambda}, it attains its maximum when

λ=kϵi=1nj=1naij2(αdj+1)xi=1n(αdi+1)x>0\sqrt{\lambda}=\frac{k}{\epsilon}\!\cdot\!\frac{\sum_{i=1}^{n}\sqrt{\sum_{j=1}^{n}a_{ij}^{2}(\alpha d_{j}+1)^{x}}}{\sum_{i=1}^{n}(\alpha d_{i}+1)^{x}}>0

Thus,

dpower(x)=kϵ(i=1nj=1naij2(αdj+1)x)2i=1n(αdi+1)xd^{*}_{power}(x)=\frac{k}{\epsilon}\!\cdot\!\frac{(\sum_{i=1}^{n}\sqrt{\sum_{j=1}^{n}a_{ij}^{2}(\alpha d_{j}+1)^{x}})^{2}}{\sum_{i=1}^{n}(\alpha d_{i}+1)^{x}}

From Observation 10, we have aii21(diα+1)2a_{ii}^{2}\geq\frac{1}{(d_{i}\alpha+1)^{2}}, thus

dpower(x)\displaystyle d^{*}_{power}(x) =kϵ(i=1nj=1naij2(αdj+1)x)2i=1n(αdi+1)x\displaystyle=\frac{k}{\epsilon}\!\cdot\!\frac{(\sum_{i=1}^{n}\sqrt{\sum_{j=1}^{n}a_{ij}^{2}(\alpha d_{j}+1)^{x}})^{2}}{\sum_{i=1}^{n}(\alpha d_{i}+1)^{x}}
kϵ(i=1n(αdi+1)x22)2i=1n(αdi+1)x\displaystyle\geq\frac{k}{\epsilon}\!\cdot\!\frac{(\sum_{i=1}^{n}(\alpha d_{i}+1)^{\frac{x-2}{2}})^{2}}{\sum_{i=1}^{n}(\alpha d_{i}+1)^{x}}

Since this bound holds for all xx\in\mathbb{R}, to obtain the tightest bound we maximize the last term over xx. By the Cauchy–Schwarz inequality, we have

(i=1n(αdi+1)x22)2i=1n(αdi+1)x\displaystyle\frac{(\sum_{i=1}^{n}(\alpha d_{i}+1)^{\frac{x-2}{2}})^{2}}{\sum_{i=1}^{n}(\alpha d_{i}+1)^{x}} i=1n((αdi+1)x22)2(αdi+1)x\displaystyle\leq\sum_{i=1}^{n}\frac{((\alpha d_{i}+1)^{\frac{x-2}{2}})^{2}}{(\alpha d_{i}+1)^{x}}
=i=1n1(αdi+1)2\displaystyle=\sum_{i=1}^{n}\frac{1}{(\alpha d_{i}+1)^{2}}

This maximum value is achieved when x=2x=-2. Thus i[n]midpower(x)i=1n1(αdi+1)2kϵ\sum_{i\in[n]}m_{i}^{*}\geq d^{*}_{power}(x)\geq\sum_{i=1}^{n}\frac{1}{(\alpha d_{i}+1)^{2}}\!\cdot\!\frac{k}{\epsilon}.

(3) In the dual form Equation 13, λi=k2nϵ2\lambda_{i}=\frac{k^{2}}{n\epsilon^{2}} for any ii is a feasible solution. From strong duality, we have

i[n]mi\displaystyle\sum_{i\in[n]}m_{i}^{*} 2kϵi=1n1nj=1n(Wij1)2kϵ\displaystyle\geq 2\frac{k}{\epsilon}\!\cdot\!\sum_{i=1}^{n}\sqrt{\frac{1}{n}\sum_{j=1}^{n}(W_{ij}^{-1})^{2}}-\frac{k}{\epsilon}
2kϵi=1n1n(j=1nWij1n)2kϵ\displaystyle\geq 2\frac{k}{\epsilon}\sum_{i=1}^{n}\sqrt{\frac{1}{n}\left(\frac{\sum_{j=1}^{n}W_{ij}^{-1}}{\sqrt{n}}\right)^{2}}-\frac{k}{\epsilon} (Cauchy-Schwarz)
kϵ\displaystyle\geq\frac{k}{\epsilon} (j=1nWij1=1\sum_{j=1}^{n}W_{ij}^{-1}=1 for all ii)

C.3 Bounds for the Network Gain

From Theorem 2, for every agent i[n]i\in[n], we have

c1kmiL(θ¯i)c2kmi\displaystyle c_{1}\frac{k}{m_{i}}\leq L(\bar{\theta}_{i})\leq c_{2}\frac{k}{m_{i}} (14)

Recall that M~(k,ϵ)\widetilde{M}(k,\epsilon) is the minimal number of samples one agent needs to learn a model with ϵ\epsilon error. We have

c1kϵM~(k,ϵ)c2kϵ\displaystyle c_{1}\frac{k}{\epsilon}\leq\widetilde{M}(k,\epsilon)\leq c_{2}\frac{k}{\epsilon} (15)

By Theorem 3, we have

c1kj=1n(Wij1)2mjL(θieq)c2kj=1n(Wij1)2mj.\displaystyle c_{1}k\sum_{j=1}^{n}\frac{\left(W_{ij}^{-1}\right)^{2}}{m_{j}}\leq L(\theta_{i}^{eq})\leq c_{2}k\sum_{j=1}^{n}\frac{\left(W_{ij}^{-1}\right)^{2}}{m_{j}}. (16)

Corollary 6. For any network GG, if ϵc2(α+1)αmaxidi+1\epsilon\leq\frac{c_{2}(\alpha+1)}{\alpha\max_{i}d_{i}+1}, then μ(G,α,k,ϵ)Ω(ni=1nα+1αdi+1)\mu(G,\alpha,k,\epsilon)\geq\Omega\left(\frac{n}{\sum_{i=1}^{n}\frac{\alpha+1}{\alpha d_{i}+1}}\right). If ϵc12maxi(αdi+1)2\epsilon\leq\frac{c_{1}}{2\max_{i}(\alpha d_{i}+1)^{2}}, then μ(G,α,k,ϵ)O(min{ni[n]1(αdi+1)2,n})\mu(G,\alpha,k,\epsilon)\leq O\left(\min\left\{\frac{n}{\sum_{i\in[n]}\frac{1}{(\alpha d_{i}+1)^{2}}},n\right\}\right).

Proof.

We first derive the lower bound of the network gain. From Equation 16, Equation 8, and Equation 12, we have

TSC(G,𝒗,k,ϵ)c2i[n]mi+nk+n.\displaystyle TSC(G,\boldsymbol{v},k,\epsilon)\leq c_{2}\sum_{i\in[n]}m_{i}^{*}+nk+n.

This implies

μ(G,𝒗,k,ϵ)=n(M~(k,ϵ)k)TSC(G,𝒗,k,ϵ)nkc1nkϵnkc2i[n]mi+nc1nkϵnkc2i=1nα+1αdi+1kϵ+n\displaystyle\mu(G,\boldsymbol{v},k,\epsilon)=\frac{n(\widetilde{M}(k,\epsilon)-k)}{TSC(G,\boldsymbol{v},k,\epsilon)-nk}\geq\frac{c_{1}n\frac{k}{\epsilon}-nk}{c_{2}\sum_{i\in[n]}m_{i}^{*}+n}\geq\frac{c_{1}n\frac{k}{\epsilon}-nk}{c_{2}\sum_{i=1}^{n}\frac{\alpha+1}{\alpha d_{i}+1}\!\cdot\!\frac{k}{\epsilon}+n}

where the last step applies the upper bound in Theorem 5. The assumption ϵc2(α+1)αmaxidi+1\epsilon\leq\frac{c_{2}(\alpha+1)}{\alpha\max_{i}d_{i}+1} implies for every i[n]i\in[n], c2α+1αdi+1kϵkc_{2}\frac{\alpha+1}{\alpha d_{i}+1}\!\cdot\!\frac{k}{\epsilon}\geq k and c2i=1nα+1αdi+1kϵnkc_{2}\sum_{i=1}^{n}\frac{\alpha+1}{\alpha d_{i}+1}\!\cdot\!\frac{k}{\epsilon}\geq nk. We have

μ(G,𝒗,k,ϵ)c1nkϵnk2c2i=1nα+1αdi+1kϵ=c1n2c2i=1nα+1αdi+1nk2c2i=1nα+1αdi+1kϵc1n2c2i=1nα+1αdi+112Ω(ni=1nα+1αdi+1).\displaystyle\mu(G,\boldsymbol{v},k,\epsilon)\geq\frac{c_{1}n\frac{k}{\epsilon}-nk}{2c_{2}\sum_{i=1}^{n}\frac{\alpha+1}{\alpha d_{i}+1}\!\cdot\!\frac{k}{\epsilon}}=\frac{c_{1}n}{2c_{2}\sum_{i=1}^{n}\frac{\alpha+1}{\alpha d_{i}+1}}-\frac{nk}{2c_{2}\sum_{i=1}^{n}\frac{\alpha+1}{\alpha d_{i}+1}\!\cdot\!\frac{k}{\epsilon}}\geq\frac{c_{1}n}{2c_{2}\sum_{i=1}^{n}\frac{\alpha+1}{\alpha d_{i}+1}}-\frac{1}{2}\geq\Omega\left(\frac{n}{\sum_{i=1}^{n}\frac{\alpha+1}{\alpha d_{i}+1}}\right).

We now derive a general upper bound of the network gain. From Equation 16, Equation 8, and Equation 10, we have

TSC(G,𝒗,k,ϵ)c1i[n]mi.\displaystyle TSC(G,\boldsymbol{v},k,\epsilon)\geq c_{1}\sum_{i\in[n]}m_{i}^{*}.

This implies

μ(G,𝒗,k,ϵ)=n(M~(k,ϵ)k)TSC(G,𝒗,k,ϵ)nkc2nkϵnkc1i[n]minkc2nkϵnkc1i=1n1(αdi+1)2kϵnk\displaystyle\mu(G,\boldsymbol{v},k,\epsilon)=\frac{n(\widetilde{M}(k,\epsilon)-k)}{TSC(G,\boldsymbol{v},k,\epsilon)-nk}\leq\frac{c_{2}n\frac{k}{\epsilon}-nk}{c_{1}\sum_{i\in[n]}m_{i}^{*}-nk}\leq\frac{c_{2}n\frac{k}{\epsilon}-nk}{c_{1}\sum_{i=1}^{n}\frac{1}{(\alpha d_{i}+1)^{2}}\!\cdot\!\frac{k}{\epsilon}-nk} (17)

where the last step applies the lower bound in Theorem 5. The assumption ϵc12maxi(αdi+1)2\epsilon\leq\frac{c_{1}}{2\max_{i}(\alpha d_{i}+1)^{2}} implies that c12(αdi+1)2kϵk\frac{c_{1}}{2(\alpha d_{i}+1)^{2}}\!\cdot\!\frac{k}{\epsilon}\geq k for every i[n]i\in[n]. This implies c12i=1n1(αdi+1)2kϵnk\frac{c_{1}}{2}\sum_{i=1}^{n}\frac{1}{(\alpha d_{i}+1)^{2}}\!\cdot\!\frac{k}{\epsilon}\geq nk, which leads to

μ(G,𝒗,k,ϵ)2c2nkϵ2nkc1i=1n1(αdi+1)2kϵ2c2nc1i=1n1(αdi+1)2O(ni=1n1(αdi+1)2)\displaystyle\mu(G,\boldsymbol{v},k,\epsilon)\leq\frac{2c_{2}n\!\cdot\!\frac{k}{\epsilon}-2nk}{c_{1}\sum_{i=1}^{n}\frac{1}{(\alpha d_{i}+1)^{2}}\!\cdot\!\frac{k}{\epsilon}}\leq\frac{2c_{2}n}{c_{1}\sum_{i=1}^{n}\frac{1}{(\alpha d_{i}+1)^{2}}}\leq O\left(\frac{n}{\sum_{i=1}^{n}\frac{1}{(\alpha d_{i}+1)^{2}}}\right)

When maxidin1α\max_{i}d_{i}\leq\frac{\sqrt{n}-1}{\alpha}, we have i=1n1(αdi+1)21\sum_{i=1}^{n}\frac{1}{(\alpha d_{i}+1)^{2}}\geq 1, thus, μ(G,𝒗,k,ϵ)O(min{ni=1n1(αdi+1)2,n})\mu(G,\boldsymbol{v},k,\epsilon)\leq O\left(\min\left\{\frac{n}{\sum_{i=1}^{n}\frac{1}{(\alpha d_{i}+1)^{2}}},n\right\}\right). When maxidi>n1α\max_{i}d_{i}>\frac{\sqrt{n}-1}{\alpha}, ϵc12maxi(αdi+1)2<c12n\epsilon\leq\frac{c_{1}}{2\max_{i}(\alpha d_{i}+1)^{2}}<\frac{c_{1}}{2n}, this implies c12kϵnk\frac{c_{1}}{2}\!\cdot\!\frac{k}{\epsilon}\geq nk. We have

μ(G,𝒗,k,ϵ)c2nkϵnkc1i[n]minkc2nkϵnkc1kϵnk2c2c1nO(n)\displaystyle\mu(G,\boldsymbol{v},k,\epsilon)\leq\frac{c_{2}n\frac{k}{\epsilon}-nk}{c_{1}\sum_{i\in[n]}m_{i}^{*}-nk}\leq\frac{c_{2}n\frac{k}{\epsilon}-nk}{c_{1}\frac{k}{\epsilon}-nk}\leq\frac{2c_{2}}{c_{1}}n\leq O(n)

where we use the trivial lower bound in Theorem 5. Thus, μ(G,𝒗,k,ϵ)O(min{ni=1n1(αdi+1)2,n})\mu(G,\boldsymbol{v},k,\epsilon)\leq O\left(\min\left\{\frac{n}{\sum_{i=1}^{n}\frac{1}{(\alpha d_{i}+1)^{2}}},n\right\}\right). ∎

We next give another lower bound on the network gain, which will be used to bound the network gain of special networks.

Lemma 12.

Let mi=m,i[n]m_{i}=m,\forall i\in[n] be a feasible solution of Equation B.2. If c2mk\lceil c_{2}m\rceil\leq k, the network gain is infinite. If c2m>k\lceil c_{2}m\rceil>k, we have

μ(G,𝒗,k,ϵ)c1k2c2mϵ1.\mu(G,\boldsymbol{v},k,\epsilon)\geq\frac{c_{1}k}{2c_{2}m\epsilon}-1.
Proof.

Since max{c2m,k},i[n]\max\left\{\lceil c_{2}m\rceil,k\right\},\,\,\forall i\in[n] is a feasible solution of Equation 7, combining Equation 16 and Equation 8 we have

nkTSC(G,𝒗,k,ϵ)i[n]max{c2m,k}\displaystyle nk\leq TSC(G,\boldsymbol{v},k,\epsilon)\leq\sum_{i\in[n]}\max\left\{\lceil c_{2}m\rceil,k\right\}

If c2mk\lceil c_{2}m\rceil\leq k, then TSC(G,𝒗,k,ϵ)=nkTSC(G,\boldsymbol{v},k,\epsilon)=nk and the network gain is infinite. If c2m>k\lceil c_{2}m\rceil>k, then

TSC(G,𝒗,k,ϵ)i[n]max{c2m,k}c2nm+nk+n\displaystyle TSC(G,\boldsymbol{v},k,\epsilon)\leq\sum_{i\in[n]}\max\left\{\lceil c_{2}m\rceil,k\right\}\leq c_{2}nm+nk+n

c2m>k\lceil c_{2}m\rceil>k implies c2m>max{1,k1}c_{2}m>\max\{1,k-1\} because k1k\geq 1. Combining these two bounds with Equation 15, we have

μ(G,𝒗,k,ϵ)=n(M~(k,ϵ)k)TSC(G,𝒗,k,ϵ)nkc1nkϵnkc2nm+nc1nkϵnk2c2nmc1k2c2mϵk2c2mc1k2c2mϵ1.\mu(G,\boldsymbol{v},k,\epsilon)=\frac{n(\widetilde{M}(k,\epsilon)-k)}{TSC(G,\boldsymbol{v},k,\epsilon)-nk}\geq\frac{c_{1}n\frac{k}{\epsilon}-nk}{c_{2}nm+n}\geq\frac{c_{1}n\frac{k}{\epsilon}-nk}{2c_{2}nm}\geq\frac{c_{1}k}{2c_{2}m\epsilon}-\frac{k}{2c_{2}m}\geq\frac{c_{1}k}{2c_{2}m\epsilon}-1.

C.4 Proof of Network Gain for Special Networks

Again, to simplify notation, write aij=(W1)ija_{ij}=(W^{-1})_{ij} for all i,j[n]i,j\in[n].

Lemma 13.

If the network GG is an nn-clique, i[n]mi=(1+n1(nα+1)2)kϵ\sum_{i\in[n]}m_{i}^{*}=(1+\frac{n-1}{(n\alpha+1)^{2}})\!\cdot\!\frac{k}{\epsilon}, where mi=(α+1)2+(n1)α2(nα+1)2kϵm_{i}^{*}=\frac{(\alpha+1)^{2}+(n-1)\alpha^{2}}{(n\alpha+1)^{2}}\!\cdot\!\frac{k}{\epsilon} for every i[n]i\in[n]. The network gain μ(G,𝐯,k,ϵ)=Ω(n)\mu(G,\boldsymbol{v},k,\epsilon)=\Omega(n).

Proof.

For the nn-clique, we have [1+(n1)αααα1+(n1)αααα1+(n1)α][θ1eqθ2eqθneq]=[θ¯1θ¯2θ¯n]\begin{bmatrix}1+(n-1)\alpha&-\alpha&\cdots&-\alpha\\ -\alpha&1+(n-1)\alpha&\cdots&-\alpha\\ \vdots&\ddots&\ddots&\vdots\\ -\alpha&-\alpha&\cdots&1+(n-1)\alpha\end{bmatrix}\begin{bmatrix}\theta_{1}^{eq}\\ \theta_{2}^{eq}\\ \vdots\\ \theta_{n}^{eq}\end{bmatrix}=\begin{bmatrix}\bar{\theta}_{1}\\ \bar{\theta}_{2}\\ \vdots\\ \bar{\theta}_{n}\end{bmatrix}. The solution to this system is θieq=(α+1)θ¯i+αjiθ¯jnα+1\theta_{i}^{eq}=\frac{(\alpha+1)\bar{\theta}_{i}+\alpha\sum_{j\neq i}\bar{\theta}_{j}}{n\alpha+1} for any i[n]i\in[n]. Thus for any ii, aii=α+1nα+1,aij=αnα+1(ji)a_{ii}=\frac{\alpha+1}{n\alpha+1},a_{ij}=\frac{\alpha}{n\alpha+1}(j\neq i). Since the constraints of Equation B.2 are symmetric now, there is an optimal solution such that m1=m2==mn=|S|m_{1}=m_{2}=\cdots=m_{n}=|S|. The constraint now becomes |S|kϵj=1naij2=(α+1)2+(n1)α2(nα+1)2kϵ>0|S|\geq\frac{k}{\epsilon}\!\cdot\!\sum_{j=1}^{n}a_{ij}^{2}=\frac{(\alpha+1)^{2}+(n-1)\alpha^{2}}{(n\alpha+1)^{2}}\!\cdot\!\frac{k}{\epsilon}>0. Thus the unique optimal value is n(α+1)2+n(n1)α2(nα+1)2kϵ=(1+n1(nα+1)2)kϵ\frac{n(\alpha+1)^{2}+n(n-1)\alpha^{2}}{(n\alpha+1)^{2}}\!\cdot\!\frac{k}{\epsilon}=(1+\frac{n-1}{(n\alpha+1)^{2}})\!\cdot\!\frac{k}{\epsilon} when |S|=(α+1)2+(n1)α2(nα+1)2kϵ|S|=\frac{(\alpha+1)^{2}+(n-1)\alpha^{2}}{(n\alpha+1)^{2}}\!\cdot\!\frac{k}{\epsilon}.

From Lemma 12, if c2mik\lceil c_{2}m_{i}^{\star}\rceil\leq k, the network gain is infinite. If c2mi>k\lceil c_{2}m_{i}^{\star}\rceil>k, then

μ(G,𝒗,k,ϵ)c1k2c2miϵ1=c1n2c2(1+n1(nα+1)2)1c1n2c2(1+12α)1=Ω(n)\displaystyle\mu(G,\boldsymbol{v},k,\epsilon)\geq\frac{c_{1}k}{2c_{2}m_{i}^{*}\epsilon}-1=\frac{c_{1}n}{2c_{2}\left(1+\frac{n-1}{(n\alpha+1)^{2}}\right)}-1\geq\frac{c_{1}n}{2c_{2}\left(1+\frac{1}{2\alpha}\right)}-1=\Omega(n)
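The clique closed form can be confirmed numerically. The sketch below (illustrative, with an assumed instance n = 4 and α = 1/2, and our own Gauss-Jordan helper `inv`) inverts W exactly and verifies both the entries of W^{-1} and the per-agent optimum stated in Lemma 13.

```python
from fractions import Fraction as F

def inv(M):
    """Invert a rational matrix by Gauss-Jordan elimination."""
    n = len(M)
    A = [[F(x) for x in row] + [F(int(i == j)) for j in range(n)]
         for i, row in enumerate(M)]
    for c in range(n):
        p = next(r for r in range(c, n) if A[r][c] != 0)
        A[c], A[p] = A[p], A[c]
        piv = A[c][c]
        A[c] = [x / piv for x in A[c]]
        for r in range(n):
            if r != c:
                f = A[r][c]
                A[r] = [x - f * y for x, y in zip(A[r], A[c])]
    return [row[n:] for row in A]

n, alpha = 4, F(1, 2)                # assumed instance: 4-clique, alpha = 1/2
W = [[1 + alpha * (n - 1) if i == j else -alpha for j in range(n)]
     for i in range(n)]
a = inv(W)

for i in range(n):
    # closed form from Lemma 13: a_ii = (alpha+1)/(n alpha+1),
    # a_ij = alpha/(n alpha+1) for j != i
    assert a[i][i] == (alpha + 1) / (n * alpha + 1)
    for j in range(n):
        if j != i:
            assert a[i][j] == alpha / (n * alpha + 1)

# per-agent optimum of Equation B.2, normalized so k/eps = 1
m = sum(a[0][j] ** 2 for j in range(n))
assert m == ((alpha + 1) ** 2 + (n - 1) * alpha ** 2) / (n * alpha + 1) ** 2
assert n * m == 1 + F(n - 1) / (n * alpha + 1) ** 2
```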

Lemma 14.

For an nn-agent star graph, n1(α+1)2kϵi[n]minkϵ\frac{n-1}{(\alpha+1)^{2}}\!\cdot\!\frac{k}{\epsilon}\leq\sum_{i\in[n]}m_{i}^{*}\leq n\!\cdot\!\frac{k}{\epsilon}. If ϵc14(α+1)2\epsilon\leq\frac{c_{1}}{4(\alpha+1)^{2}}, the network gain μ(G,𝐯,k,ϵ)=O(1)\mu(G,\boldsymbol{v},k,\epsilon)=O(1).

Proof.

The bounds on i[n]mi\sum_{i\in[n]}m_{i}^{*} follow directly from the trivial bound and the lower bound of Theorem 5. From Equation 17 and the lower bound n1(α+1)2kϵ\frac{n-1}{(\alpha+1)^{2}}\!\cdot\!\frac{k}{\epsilon}, we have

μ(G,𝒗,k,ϵ)c2nkϵnkc1i=1n1(αdi+1)2kϵnkc2nkϵnkc1n1(α+1)2kϵnk\displaystyle\mu(G,\boldsymbol{v},k,\epsilon)\leq\frac{c_{2}n\!\cdot\!\frac{k}{\epsilon}-nk}{c_{1}\sum_{i=1}^{n}\frac{1}{(\alpha d_{i}+1)^{2}}\!\cdot\!\frac{k}{\epsilon}-nk}\leq\frac{c_{2}n\!\cdot\!\frac{k}{\epsilon}-nk}{c_{1}\frac{n-1}{(\alpha+1)^{2}}\!\cdot\!\frac{k}{\epsilon}-nk}

From the assumption of ϵ\epsilon and n2n\geq 2, we have ϵc14(α+1)2c12(α+1)2(11n)\epsilon\leq\frac{c_{1}}{4(\alpha+1)^{2}}\leq\frac{c_{1}}{2(\alpha+1)^{2}}(1-\frac{1}{n}). This implies c12n1(α+1)2kϵnk\frac{c_{1}}{2}\frac{n-1}{(\alpha+1)^{2}}\!\cdot\!\frac{k}{\epsilon}\geq nk, which leads to

μ(G,𝒗,k,ϵ)2c2nkϵ2nkc1n1(α+1)2kϵ2c2nc1n1(α+1)24c2(α+1)2c1=O(1).\displaystyle\mu(G,\boldsymbol{v},k,\epsilon)\leq\frac{2c_{2}n\!\cdot\!\frac{k}{\epsilon}-2nk}{c_{1}\frac{n-1}{(\alpha+1)^{2}}\!\cdot\!\frac{k}{\epsilon}}\leq\frac{2c_{2}n}{c_{1}\frac{n-1}{(\alpha+1)^{2}}}\leq\frac{4c_{2}(\alpha+1)^{2}}{c_{1}}=O(1).

Recall the optimization problem in Theorem 4. To upper bound i[n]mi\sum_{i\in[n]}m_{i}^{*}, we upper bound the quantity j=1n(W1)ij2\sum_{j=1}^{n}(W^{-1})_{ij}^{2} for every ii. The key idea is to investigate the structure of W1W^{-1}. Since W=α+IW=\alpha\mathcal{L}+I is a symmetric positive definite matrix when the influence factors have uniform value α\alpha, it admits the eigendecomposition W=PDPTW=PDP^{T}, where PP is an orthonormal matrix whose columns p1,p2,,pnp_{1},p_{2},\ldots,p_{n} are unit eigenvectors of WW such that pipj=0p_{i}^{\top}p_{j}=0 for any iji\neq j, and DD is the diagonal matrix of the corresponding eigenvalues of WW. Let λ1λ2λn\lambda_{1}\leq\lambda_{2}\leq\cdots\leq\lambda_{n} denote the eigenvalues of the Laplacian matrix of GG. By the definition of WW, the eigenvalues of WW are 1+αλ1,1+αλ2,,1+αλn1+\alpha\lambda_{1},1+\alpha\lambda_{2},\ldots,1+\alpha\lambda_{n}. It is well known that λ1=0\lambda_{1}=0 for any graph, and that λi[0,2d]\lambda_{i}\in[0,2d] for all i[n]i\in[n] in dd-regular graphs. Furthermore, PTP=PPT=IP^{T}P=PP^{T}=I, so P1=PTP^{-1}=P^{T}. We also observe that the eigenvectors p1,p2,,pnp_{1},p_{2},\ldots,p_{n} of WW are simply the eigenvectors of the Laplacian \mathcal{L}.

Similarly, we express W1W^{-1} in terms of its eigendecomposition. Since W=PDPTW=PDP^{T}, it follows that W1=PD1PTW^{-1}=PD^{-1}P^{T} where D1D^{-1} is the diagonal matrix such that (D1)ii=11+αλi(D^{-1})_{ii}=\frac{1}{1+\alpha\lambda_{i}} for all i[n]i\in[n]. To bound the entries of W1W^{-1}, it suffices to compute the eigenvalues and eigenvectors of the Laplacian matrix LL. Based on the above observations, we have

(W1)ij=k=1n11+αλkPikPjk=k=1n11+αλk(pk)i(pk)j\displaystyle(W^{-1})_{ij}=\sum_{k=1}^{n}\frac{1}{1+\alpha\lambda_{k}}P_{ik}P_{jk}=\sum_{k=1}^{n}\frac{1}{1+\alpha\lambda_{k}}(p_{k})_{i}(p_{k})_{j}

where (pk)i(p_{k})_{i} denotes the ithi^{th}-entry of the eigenvector pkp_{k}. This implies that

j=1n(W1)ij2=j=1nk=1n11+αλk(pk)i(pk)j)2=(W2)ii\displaystyle\sum_{j=1}^{n}(W^{-1})_{ij}^{2}=\sum_{j=1}^{n}\sum_{k=1}^{n}\frac{1}{1+\alpha\lambda_{k}}(p_{k})_{i}(p_{k})_{j})^{2}=(W^{-2})_{ii}

where W2W^{-2} denotes the matrix (W1)2(W^{-1})^{2}. The last equality follows from the fact that WW is symmetric. Now, note that W2=P(D1)2PTW^{-2}=P(D^{-1})^{2}P^{T} so that, (W2)ii=k=1n1(1+αλk)2Pik2=k=1n1(1+αλk)2(pk)i2(W^{-2})_{ii}=\sum_{k=1}^{n}\frac{1}{(1+\alpha\lambda_{k})^{2}}P_{ik}^{2}=\sum_{k=1}^{n}\frac{1}{(1+\alpha\lambda_{k})^{2}}(p_{k})_{i}^{2}.

Effectively, to bound i[n]mi\sum_{i\in[n]}m_{i}^{*}, we upper bound the quantity (W2)ii(W^{-2})_{ii}, which depends only on α\alpha and the eigenvalues and eigenvectors of the graph Laplacian LL, as summarized in the following lemma.

Lemma 15.

i[n]minkϵmaxi{(W2)ii}=nkϵmaxi{k=1n1(αλk+1)2(pk)i2}\sum_{i\in[n]}m_{i}^{*}\leq\frac{nk}{\epsilon}\!\cdot\!\max\limits_{i}\{(W^{-2})_{ii}\}=\frac{nk}{\epsilon}\!\cdot\!\max\limits_{i}\{\sum_{k=1}^{n}\frac{1}{(\alpha\lambda_{k}+1)^{2}}(p_{k})_{i}^{2}\}. Moreover, mi=m=kϵmaxi{k=1n1(αλk+1)2(pk)i2}m_{i}=m=\frac{k}{\epsilon}\!\cdot\!\max\limits_{i}\{\sum_{k=1}^{n}\frac{1}{(\alpha\lambda_{k}+1)^{2}}(p_{k})_{i}^{2}\} for every i[n]i\in[n] is a feasible solution to Equation B.2.

Proof.

At the end of the proof of the upper bound in Theorem 5, by setting m1=m2==mnm_{1}=m_{2}=\cdots=m_{n}, we showed that i[n]minkϵmaxi{j=1n(Wij1)2}\sum_{i\in[n]}m_{i}^{*}\leq\frac{nk}{\epsilon}\!\cdot\!\max\limits_{i}\{\sum_{j=1}^{n}(W_{ij}^{-1})^{2}\}. Thus i[n]minkϵmaxi{k=1n1(αλk+1)2(pk)i2}\sum_{i\in[n]}m_{i}^{*}\leq\frac{nk}{\epsilon}\!\cdot\!\max\limits_{i}\{\sum_{k=1}^{n}\frac{1}{(\alpha\lambda_{k}+1)^{2}}(p_{k})_{i}^{2}\}. This bound is achieved by setting mi=kϵmaxi{k=1n1(αλk+1)2(pk)i2}m_{i}=\frac{k}{\epsilon}\!\cdot\!\max\limits_{i}\{\sum_{k=1}^{n}\frac{1}{(\alpha\lambda_{k}+1)^{2}}(p_{k})_{i}^{2}\} for every ii, which is a feasible solution to Equation B.2. ∎

Lemma 16.

Consider a graph GG which is a dd-dimensional hypercube where d=lognd=\log n, and nn is a power of 22. Then, when α\alpha is a constant with α38\alpha\geq\frac{3}{8}, i[n]mi=Θ(nkd2ϵ)\sum_{i\in[n]}m_{i}^{*}=\Theta(\frac{nk}{d^{2}\epsilon}). The network gain μ(G,𝐯,k,ϵ)Ω(d2)\mu(G,\boldsymbol{v},k,\epsilon)\geq\Omega\left(d^{2}\right). Assigning max{O(kd2ϵ),k}\max\{O\left(\lceil\frac{k}{d^{2}\epsilon}\rceil\right),k\} samples to every agent suffices to solve the TSC problem.

Proof.

It is known that for the hypercube, every eigenvector pkp_{k} satisfies (pk)i{1n,1n}(p_{k})_{i}\in\{\frac{-1}{\sqrt{n}},\frac{1}{\sqrt{n}}\} for all i[n]i\in[n], and the eigenvalues of LL are 2i2i for i{0,1,,d}i\in\{0,1,\ldots,d\}, with multiplicity (di)\binom{d}{i} respectively (Spielman 2019). Based on this, we have that for any ii,

(W2)ii\displaystyle(W^{-2})_{ii} =k=1n1(1+αλk)2(pk)i2\displaystyle=\sum_{k=1}^{n}\frac{1}{(1+\alpha\lambda_{k})^{2}}(p_{k})_{i}^{2}
=1nk=1n1(1+αλk)2\displaystyle=\frac{1}{n}\sum_{k=1}^{n}\frac{1}{(1+\alpha\lambda_{k})^{2}}
=1ni=0d(di)1(2iα+1)2\displaystyle=\frac{1}{n}\sum_{i=0}^{d}\binom{d}{i}\frac{1}{(2i\alpha+1)^{2}}

We bound the quantity, i=0d(di)1(2iα+1)2\sum_{i=0}^{d}\binom{d}{i}\frac{1}{(2i\alpha+1)^{2}}. For this purpose, we look at the quantity z(1+z2)d=i=0d(di)z2iα+1z(1+z^{2})^{d}=\sum_{i=0}^{d}\binom{d}{i}z^{2i\alpha+1} where zz will be set later. Now,

z(1+z2)d\displaystyle z(1+z^{2})^{d} =i=0d(di)z2i+1\displaystyle=\sum_{i=0}^{d}\binom{d}{i}z^{2i+1} \displaystyle\iff
(1+z2)dz𝑑z\displaystyle\int(1+z^{2})^{d}z\,dz =i=0d(di)z2iz𝑑z\displaystyle=\sum_{i=0}^{d}\binom{d}{i}\int z^{2i}z\,dz \displaystyle\iff
(1+z2)d+12(d+1)\displaystyle\frac{(1+z^{2})^{d+1}}{2(d+1)} =i=0d(di)z2i+22i+2\displaystyle=\sum_{i=0}^{d}\binom{d}{i}\frac{z^{2i+2}}{2i+2} \displaystyle\iff
(1+z2)d+12(d+1)z\displaystyle\frac{(1+z^{2})^{d+1}}{2(d+1)}z =i=0d(di)z2i+22i+2z\displaystyle=\sum_{i=0}^{d}\binom{d}{i}\frac{z^{2i+2}}{2i+2}z \displaystyle\iff
(1+z2)d+12(d+1)z𝑑z\displaystyle\int\frac{(1+z^{2})^{d+1}}{2(d+1)}z\,dz =i=0d(di)z2i+22i+2z𝑑z\displaystyle=\sum_{i=0}^{d}\binom{d}{i}\int\frac{z^{2i+2}}{2i+2}z\,dz \displaystyle\iff
(1+z2)d+24(d+1)(d+2)\displaystyle\frac{(1+z^{2})^{d+2}}{4(d+1)(d+2)} =i=0d(di)z2i+4(2i+2)(2i+4)\displaystyle=\sum_{i=0}^{d}\binom{d}{i}\frac{z^{2i+4}}{(2i+2)(2i+4)}

Integrating over [0,1][0,1] and setting d=lognd=\log n we get that,

i=0d(di)1(2i+2)(2i+4)=4n4(d+1)(d+2)=n(d+1)(d+2).\displaystyle\sum_{i=0}^{d}\binom{d}{i}\frac{1}{(2i+2)(2i+4)}=\frac{4n}{4(d+1)(d+2)}=\frac{n}{(d+1)(d+2)}.

Assuming $\alpha\geq\frac{3}{8}$, we get

\begin{align*}
\sum_{i=0}^{d}\binom{d}{i}\frac{1}{(2i\alpha+1)^{2}}&=\sum_{i=0}^{d}\binom{d}{i}\frac{1}{4\alpha^{2}i^{2}+4\alpha i+1}\\
&=\sum_{i=0}^{d}\binom{d}{i}\frac{8}{8(4\alpha^{2}i^{2}+4\alpha i+1)}\\
&\leq\sum_{i=0}^{d}\binom{d}{i}\frac{8}{4i^{2}+12i+8}\\
&=\sum_{i=0}^{d}\binom{d}{i}\frac{8}{(2i+2)(2i+4)}.
\end{align*}

It follows that

$$\sum_{i=0}^{d}\binom{d}{i}\frac{1}{(2i\alpha+1)^{2}}\leq 8\sum_{i=0}^{d}\binom{d}{i}\frac{1}{(2i+2)(2i+4)}\leq\frac{8n}{(d+1)(d+2)},$$

so that

\begin{align*}
(W^{-2})_{ii}&=\frac{1}{n}\sum_{i=0}^{d}\binom{d}{i}\frac{1}{(2i\alpha+1)^{2}}\\
&\leq\frac{1}{n}\cdot\frac{8n}{(d+1)(d+2)}\\
&=O\left(\frac{1}{d^{2}}\right).
\end{align*}
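This diagonal bound can also be checked directly from the spectral closed form; the sketch below is illustrative only (the helper `w2_diag` is introduced here just for the check):

```python
from math import comb

def w2_diag(d, alpha):
    # (W^{-2})_{ii} for the d-dimensional hypercube (n = 2^d) via the
    # spectral formula (1/n) * sum_i C(d,i) / (2*i*alpha + 1)^2.
    n = 2 ** d
    return sum(comb(d, i) / (2 * i * alpha + 1) ** 2 for i in range(d + 1)) / n

# Verify (W^{-2})_{ii} <= 8 / ((d+1)(d+2)) whenever alpha >= 3/8.
for d in range(1, 14):
    for alpha in (0.375, 0.5, 1.0, 5.0):
        assert w2_diag(d, alpha) <= 8 / ((d + 1) * (d + 2))
```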

Note that this holds for all $i\in[n]$. Thus, by Lemma 15, $m_{i}=m=O\left(\frac{k}{d^{2}\epsilon}\right)$ is a feasible solution to Equation B.2, and $\sum_{i\in[n]}m_{i}^{*}\leq O\left(\frac{nk}{d^{2}\epsilon}\right)$. On the other hand, from the lower bound in Theorem 5, we have $\sum_{i\in[n]}m_{i}^{*}\geq\Omega\left(\frac{nk}{d^{2}\epsilon}\right)$. Since $d=\log{n}<\sqrt{n}$, we conclude $\sum_{i\in[n]}m_{i}^{*}=\Theta\left(\frac{nk}{d^{2}\epsilon}\right)$. Moreover, since $m_{i}=m=O\left(\frac{k}{d^{2}\epsilon}\right)$ is feasible, assigning $\max\{O\left(\lceil\frac{k}{d^{2}\epsilon}\rceil\right),k\}$ samples to every agent is sufficient to solve the TSC problem. By Lemma 12, we have

$$\mu(G,\boldsymbol{v},k,\epsilon)\geq\frac{c_{1}k}{2c_{2}m\epsilon}-1=\Omega\left(d^{2}\right).$$
∎

Lemma 17.

For a random $d$-regular graph, with probability at least $1-O(n^{-\tau})$, where $\tau=\lceil\frac{\sqrt{d-1}+1}{2}\rceil-1$:

(1) If $d\geq\frac{\sqrt{n}}{\alpha}$, then $\sum_{i\in[n]}m_{i}^{*}=\Theta(\frac{k}{\epsilon})$. Assigning $\max\{O\left(\lceil\frac{k}{n\epsilon}\rceil\right),k\}$ samples to every agent suffices to solve the TSC problem, and the network gain satisfies $\mu(G,\boldsymbol{v},k,\epsilon)\geq\Omega\left(n\right)$.

(2) If $d<\frac{\sqrt{n}}{\alpha}$, then $\sum_{i\in[n]}m_{i}^{*}=\Theta(\frac{nk}{\alpha^{2}d^{2}\epsilon})$. Assigning $\max\{O\left(\lceil\frac{k}{\alpha^{2}d^{2}\epsilon}\rceil\right),k\}$ samples to every agent suffices to solve the TSC problem, and the network gain satisfies $\mu(G,\boldsymbol{v},k,\epsilon)\geq\Omega\left(d^{2}\right)$.

Proof.

For a random $d$-regular graph, it is known that the largest eigenvalue of the adjacency matrix is $d$ and, with probability at least $1-O(n^{-\tau})$ where $\tau=\lceil\frac{\sqrt{d-1}+1}{2}\rceil-1$, the remaining eigenvalues lie in the range $[-2\sqrt{d}-o(1),2\sqrt{d}+o(1)]$ (Friedman 2003). This implies that the smallest eigenvalue of the Laplacian is $0$ and the remaining eigenvalues are $\Theta(d)$. Since the corresponding eigenvector of the Laplacian satisfies $(p_{1})_{i}=\frac{1}{\sqrt{n}}$ for all $i\in[n]$, we have that, for every $i$,

\begin{align*}
(W^{-2})_{ii}&=\sum_{k=1}^{n}\frac{1}{(1+\alpha\lambda_{k})^{2}}(p_{k})_{i}^{2}\\
&=\frac{1}{n}+\frac{1}{\Omega(\alpha^{2}d^{2})}\sum_{k=2}^{n}(p_{k})_{i}^{2}\\
&=\frac{1}{n}+\frac{1}{\Omega(\alpha^{2}d^{2})}\left(1-\frac{1}{n}\right)\\
&\leq\frac{1}{n}+\frac{1}{\Omega(\alpha^{2}d^{2})}.
\end{align*}

From Lemma 15, we have $\sum_{i\in[n]}m_{i}^{*}\leq\max_{i}\{(W^{-2})_{ii}\}\cdot n\cdot\frac{k}{\epsilon}\leq(\frac{1}{n}+\frac{1}{\Omega(\alpha^{2}d^{2})})\cdot n\cdot\frac{k}{\epsilon}$. Thus, when $d\geq\frac{\sqrt{n}}{\alpha}$, $\sum_{i\in[n]}m_{i}^{*}\leq\frac{nk}{\Omega(n)\epsilon}=O(\frac{k}{\epsilon})$, and $m_{i}=m=O(\frac{k}{n\epsilon})$ is a feasible solution to Equation B.2. Assigning $\max\{O\left(\lceil\frac{k}{n\epsilon}\rceil\right),k\}$ samples to every agent suffices to solve the TSC problem. On the other hand, from the trivial lower bound, we have $\sum_{i\in[n]}m_{i}^{*}\geq\frac{k}{\epsilon}$, implying $\sum_{i\in[n]}m_{i}^{*}=\Theta(\frac{k}{\epsilon})$. By Lemma 12, we have

$$\mu(G,\boldsymbol{v},k,\epsilon)\geq\frac{c_{1}k}{2c_{2}m\epsilon}-1=\Omega\left(n\right).$$

When $d<\frac{\sqrt{n}}{\alpha}$, $\sum_{i\in[n]}m_{i}^{*}\leq\frac{nk}{\Omega(\alpha^{2}d^{2})\epsilon}=O(\frac{nk}{\alpha^{2}d^{2}\epsilon})$, and $m_{i}=m=O(\frac{k}{\alpha^{2}d^{2}\epsilon})$ is a feasible solution to Equation B.2. From the trivial bound and the lower bound of Theorem 5, we have $\sum_{i\in[n]}m_{i}^{*}\geq\frac{nk}{(\alpha d+1)^{2}\epsilon}$. Since $d<\frac{\sqrt{n}}{\alpha}$, we have $\sum_{i\in[n]}m_{i}^{*}=\Theta(\frac{nk}{\alpha^{2}d^{2}\epsilon})$. Assigning $\max\{O\left(\lceil\frac{k}{\alpha^{2}d^{2}\epsilon}\rceil\right),k\}$ samples to every agent suffices to solve the TSC problem. By Lemma 12, we have

$$\mu(G,\boldsymbol{v},k,\epsilon)\geq\frac{c_{1}k}{2c_{2}m\epsilon}-1=\Omega\left(d^{2}\right).$$
∎

Lemma 18.

Consider a graph GG which is a dd-regular expander with expansion τ(G)\tau(G). Then, we have that

(1) When dnατ(G)2d\leq\frac{\sqrt{n}}{\alpha\tau(G)^{2}}, then i[n]miO(nkα2d2τ(G)4ϵ)\sum_{i\in[n]}m_{i}^{*}\leq O(\frac{nk}{\alpha^{2}d^{2}\tau(G)^{4}\epsilon}). Assigning max{O(kα2d2τ(G)4ϵ),k}\max\{O\left(\lceil\frac{k}{\alpha^{2}d^{2}\tau(G)^{4}\epsilon}\rceil\right),k\} samples to every agent ii suffices to solve the TSC problem. The network gain μ(G,𝐯,k,ϵ)Ω(d2τ(G)4)\mu(G,\boldsymbol{v},k,\epsilon)\geq\Omega\left(d^{2}\tau(G)^{4}\right).

(2) When d>nατ(G)2d>\frac{\sqrt{n}}{\alpha\tau(G)^{2}}, then i[n]miO(kϵ)\sum_{i\in[n]}m_{i}^{*}\leq O(\frac{k}{\epsilon}). Assigning max{O(knϵ),k}\max\{O\left(\lceil\frac{k}{n\epsilon}\rceil\right),k\} samples to every agent ii suffices to solve the TSC problem. The network gain μ(G,𝐯,k,ϵ)Ω(n)\mu(G,\boldsymbol{v},k,\epsilon)\geq\Omega\left(n\right).

Proof.

Since $\lambda_{1}=0$ for any graph, we know from Cheeger's inequality that the second smallest eigenvalue $\lambda_{2}$ satisfies

$$\frac{\lambda_{2}}{2d}\leq\tau(G)\leq\sqrt{\frac{2\lambda_{2}}{d}},$$

from which it follows that $\lambda_{2}\geq\frac{d\tau(G)^{2}}{2}$. Then, we have that

\begin{align*}
(W^{-2})_{ii}&=\sum_{k=1}^{n}\frac{1}{(1+\alpha\lambda_{k})^{2}}(p_{k})_{i}^{2}\\
&\leq\frac{1}{n}+\frac{4}{\alpha^{2}d^{2}\tau(G)^{4}}\sum_{k=2}^{n}(p_{k})_{i}^{2}\\
&\leq\frac{1}{n}+\frac{4}{\alpha^{2}d^{2}\tau(G)^{4}}.
\end{align*}

Note that this holds for all $i\in[n]$. Thus, by Lemma 15, $\sum_{i\in[n]}m_{i}^{*}\leq(\frac{1}{n}+\frac{4}{\alpha^{2}d^{2}\tau(G)^{4}})\cdot n\cdot\frac{k}{\epsilon}$, and $m_{i}=m=(\frac{1}{n}+\frac{4}{\alpha^{2}d^{2}\tau(G)^{4}})\cdot\frac{k}{\epsilon}$ is a feasible solution to Equation B.2. Thus, when $d\geq\frac{\sqrt{n}}{\alpha\tau(G)^{2}}$, $\sum_{i\in[n]}m_{i}^{*}\leq O(\frac{k}{\epsilon})$. In this case $m=O(\frac{k}{n\epsilon})$, so assigning $\max\{O\left(\lceil\frac{k}{n\epsilon}\rceil\right),k\}$ samples to every agent suffices to solve the TSC problem, and the network gain satisfies $\mu(G,\boldsymbol{v},k,\epsilon)\geq\Omega\left(n\right)$.

When $d<\frac{\sqrt{n}}{\alpha\tau(G)^{2}}$, $\sum_{i\in[n]}m_{i}^{*}\leq O(\frac{nk}{\alpha^{2}d^{2}\tau(G)^{4}\epsilon})$. In this case $m=O\left(\frac{k}{\alpha^{2}d^{2}\tau(G)^{4}\epsilon}\right)$, so assigning $\max\{O\left(\lceil\frac{k}{\alpha^{2}d^{2}\tau(G)^{4}\epsilon}\rceil\right),k\}$ samples to every agent suffices to solve the TSC problem, and the network gain satisfies $\mu(G,\boldsymbol{v},k,\epsilon)\geq\Omega\left(d^{2}\tau(G)^{4}\right)$. ∎

Appendix D Missing proof in Section 4.2

To simplify notation, let $a_{ij}=(W^{-1})_{ij}$ for all $i,j\in[n]$. Since $W^{-1}$ is positive definite, the Schur product theorem implies that $(W^{-1})\circ(W^{-1})$ is also positive definite, where $\circ$ denotes the Hadamard (element-wise) product. Let $B=((W^{-1})\circ(W^{-1}))^{-1}$ and $[B]_{ij}=b_{ij}$.

Lemma 19.

$\sum_{j=1}^{n}b_{ij}>0$ for all $i\in[n]$.

Proof.

Let $W_{2}=W^{-1}\circ W^{-1}$, so that $W_{2}^{-1}=B$. Let $W_{2}^{*}$ and $(W^{-1})^{*}$ be the adjugate (classical adjoint) matrices of $W_{2}$ and $W^{-1}$, respectively. We have $B=W_{2}^{-1}=\frac{1}{\det(W_{2})}W_{2}^{*}$. Consider the sum of the $n^{th}$ row of $B$: since $\det(W_{2})$ is positive, to prove this sum is positive it suffices to prove that the sum of the $n^{th}$ row of $W_{2}^{*}$ is positive. The $n^{th}$-row sum of $W_{2}^{*}$ is $\det(W_{2}(n))$, where $W_{2}(n)$ is obtained by replacing the $n^{th}$ row of $W_{2}$ with the all-ones vector. Similarly, we can define $W^{-1}(n)=\begin{bmatrix}a_{11}&\cdots&a_{1n}\\ \vdots&\ddots&\vdots\\ a_{(n-1)1}&\cdots&a_{(n-1)n}\\ 1&\cdots&1\end{bmatrix}$. Since $W^{-1}$ is positive definite, the first $(n-1)$ leading principal minors of $W^{-1}(n)$, which are exactly those of $W^{-1}$, are positive. On the other hand, $\det(W^{-1}(n))=\sum_{j=1}^{n}(W^{-1})^{*}_{nj}=\det(W^{-1})\sum_{j=1}^{n}W_{nj}=\det(W^{-1})>0$, using that each row of $W$ sums to $1$. Thus, $W^{-1}(n)$ is positive definite. Note that $W_{2}(n)=W^{-1}(n)\circ W^{-1}(n)$, so by the Schur product theorem, $W_{2}(n)$ is positive definite. Thus $\det(W_{2}(n))>0$, which means $\sum_{j=1}^{n}b_{nj}>0$. By relabeling any node as $n$, we get $\sum_{j=1}^{n}b_{ij}>0$ for all $i\in[n]$. ∎
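Lemma 19 can be sanity-checked on random instances. The sketch below assumes $W=I+\alpha L$, which is consistent with the eigenvalues $1+\alpha\lambda_{k}$ and the unit row sums of $W$ used in the proof; it is an illustration, not a proof:

```python
import numpy as np

rng = np.random.default_rng(0)
n, alpha = 8, 0.7
# Random undirected graph and its Laplacian.
A = (rng.random((n, n)) < 0.4).astype(float)
A = np.triu(A, 1)
A = A + A.T
L = np.diag(A.sum(axis=1)) - A
W = np.eye(n) + alpha * L            # symmetric positive definite, row sums 1
Winv = np.linalg.inv(W)
B = np.linalg.inv(Winv * Winv)       # B = ((W^{-1}) o (W^{-1}))^{-1}
assert (B.sum(axis=1) > 0).all()     # Lemma 19: all row sums of B are positive
```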

We first prove Lemma 8, which guides the derivation of Theorem 7.

Lemma 8. When $\sum_{j=1}^{n}\frac{b_{ij}}{(\sum_{k=1}^{n}b_{jk})^{2}}\geq 0$ for all $i\in[n]$, the optimal value of Equation B.2 is $\sum_{i=1}^{n}\frac{1}{\sum_{j=1}^{n}b_{ij}}\cdot\frac{k}{\epsilon}$, attained at $\lambda_{i}=\sum_{j=1}^{n}\frac{b_{ij}}{(\sum_{k=1}^{n}b_{jk})^{2}}\cdot\frac{k^{2}}{\epsilon^{2}}$ and $m_{i}=\frac{1}{\sum_{j=1}^{n}b_{ij}}\cdot\frac{k}{\epsilon}$ for all $i\in[n]$.

Proof.

From Lemma 9, strong duality holds. Thus, it suffices to consider the dual problem:

\begin{align*}
\max_{\lambda_{1},\cdots,\lambda_{n}}\quad&2\sum_{i=1}^{n}\sqrt{\sum_{j=1}^{n}\lambda_{j}a_{ij}^{2}}-\frac{\epsilon}{k}\sum_{i=1}^{n}\lambda_{i}\\
\text{s.t.}\quad&\lambda_{i}\geq 0,\ \forall i
\end{align*}

Let $g(\lambda)$ be the objective function of Equation 13. Setting the gradient to zero, we have $\frac{\partial g}{\partial\lambda_{k}}=\sum_{i=1}^{n}\frac{a_{ik}^{2}}{\sqrt{\sum_{j=1}^{n}\lambda_{j}a_{ij}^{2}}}-\frac{\epsilon}{k}=\sum_{i=1}^{n}\frac{a_{ki}^{2}}{\sqrt{\sum_{j=1}^{n}\lambda_{j}a_{ij}^{2}}}-\frac{\epsilon}{k}=0$ for every $k$. Let $p(i)=\frac{1}{\sqrt{\sum_{j=1}^{n}\lambda_{j}a_{ij}^{2}}}$; then

$$\begin{bmatrix}a_{11}^{2}&\cdots&a_{1n}^{2}\\ a_{21}^{2}&\cdots&a_{2n}^{2}\\ \vdots&\ddots&\vdots\\ a_{n1}^{2}&\cdots&a_{nn}^{2}\end{bmatrix}\begin{bmatrix}p(1)\\ \vdots\\ p(n)\end{bmatrix}=\begin{bmatrix}\frac{\epsilon}{k}\\ \vdots\\ \frac{\epsilon}{k}\end{bmatrix}.$$

The matrix here is exactly $(W^{-1})\circ(W^{-1})$, where $\circ$ denotes the Hadamard product. Since $W$ is positive definite, $W^{-1}$ is also positive definite. By the Schur product theorem, $(W^{-1})\circ(W^{-1})$ is positive definite and hence invertible. Since $B=((W^{-1})\circ(W^{-1}))^{-1}$ and $[B]_{ij}=b_{ij}$, we have $p(i)=\frac{\epsilon}{k}\sum_{j=1}^{n}b_{ij}$. In Lemma 19, we showed that $\sum_{j=1}^{n}b_{ij}>0$; thus, this solution is valid. Solving for $\lambda_{i}$, we have $\sum_{j=1}^{n}\lambda_{j}a_{ij}^{2}=\frac{1}{p^{2}(i)}$ for every $i$, i.e.,

$$\begin{bmatrix}a_{11}^{2}&\cdots&a_{1n}^{2}\\ a_{21}^{2}&\cdots&a_{2n}^{2}\\ \vdots&\ddots&\vdots\\ a_{n1}^{2}&\cdots&a_{nn}^{2}\end{bmatrix}\begin{bmatrix}\lambda_{1}\\ \vdots\\ \lambda_{n}\end{bmatrix}=\begin{bmatrix}\frac{1}{p^{2}(1)}\\ \vdots\\ \frac{1}{p^{2}(n)}\end{bmatrix},$$

so $\lambda_{i}=\sum_{j=1}^{n}\frac{b_{ij}}{p^{2}(j)}=\sum_{j=1}^{n}\frac{b_{ij}}{(\sum_{k=1}^{n}b_{jk})^{2}}\cdot\frac{k^{2}}{\epsilon^{2}}$ (note that this can be negative).
This solution is valid if and only if $\sum_{j=1}^{n}\frac{b_{ij}}{(\sum_{k=1}^{n}b_{jk})^{2}}\geq 0$ for every $i\in[n]$. If it is valid, the optimal value is:

\begin{align*}
&2\sum_{i=1}^{n}\sqrt{\sum_{j=1}^{n}\lambda_{j}a_{ij}^{2}}-\frac{\epsilon}{k}\sum_{i=1}^{n}\lambda_{i}\\
&=2\sum_{i=1}^{n}\frac{1}{p(i)}-\frac{\epsilon}{k}\sum_{i=1}^{n}\sum_{j=1}^{n}\frac{b_{ij}}{(\sum_{k=1}^{n}b_{jk})^{2}}\cdot\frac{k^{2}}{\epsilon^{2}}\\
&=\frac{2k}{\epsilon}\sum_{i=1}^{n}\frac{1}{\sum_{j=1}^{n}b_{ij}}-\frac{k}{\epsilon}\sum_{j=1}^{n}\frac{\sum_{i=1}^{n}b_{ij}}{(\sum_{k=1}^{n}b_{kj})^{2}}\\
&=\frac{2k}{\epsilon}\sum_{i=1}^{n}\frac{1}{\sum_{j=1}^{n}b_{ij}}-\frac{k}{\epsilon}\sum_{i=1}^{n}\frac{1}{\sum_{j=1}^{n}b_{ij}}\\
&=\frac{k}{\epsilon}\sum_{i=1}^{n}\frac{1}{\sum_{j=1}^{n}b_{ij}}.
\end{align*}

By the KKT conditions, setting the gradient of the Lagrangian of (P) to zero, we have

\begin{align*}
m_{i}&=\sqrt{\sum_{k=1}^{n}\lambda_{k}a_{ik}^{2}}\\
&=\frac{k}{\epsilon}\sqrt{\sum_{k=1}^{n}\sum_{t=1}^{n}\frac{b_{kt}}{(\sum_{j=1}^{n}b_{tj})^{2}}a_{ik}^{2}}\\
&=\frac{k}{\epsilon}\sqrt{\sum_{t=1}^{n}\frac{1}{(\sum_{j=1}^{n}b_{tj})^{2}}\sum_{k=1}^{n}a_{ik}^{2}b_{kt}}\\
&=\frac{k}{\epsilon}\sqrt{\sum_{t=1}^{n}\frac{1}{(\sum_{j=1}^{n}b_{tj})^{2}}I_{it}}\\
&=\frac{1}{\sum_{j=1}^{n}b_{ij}}\cdot\frac{k}{\epsilon},
\end{align*}

where $I_{it}$ is the entry in the $i^{th}$ row and $t^{th}$ column of the identity matrix $I$. The fourth equality follows from $B=(W^{-1}\circ W^{-1})^{-1}$, where $(W^{-1}\circ W^{-1})_{ij}=a_{ij}^{2}$. ∎
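The closed-form allocation of Lemma 8 can be checked against the constraints of Equation B.2: with $m_{i}=\frac{1}{\sum_{j}b_{ij}}\cdot\frac{k}{\epsilon}$, every constraint should hold with equality. A numerical sketch (again assuming $W=I+\alpha L$ for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n, alpha, k_over_eps = 6, 0.5, 100.0       # k/eps chosen arbitrarily
A = (rng.random((n, n)) < 0.5).astype(float)
A = np.triu(A, 1)
A = A + A.T
L = np.diag(A.sum(axis=1)) - A
a = np.linalg.inv(np.eye(n) + alpha * L)   # a_ij = (W^{-1})_ij
B = np.linalg.inv(a * a)                   # B = ((W^{-1}) o (W^{-1}))^{-1}
m = k_over_eps / B.sum(axis=1)             # closed-form m_i from Lemma 8
lhs = (a * a) @ (1.0 / m)                  # sum_j a_ij^2 / m_j for each i
assert np.allclose(lhs, 1.0 / k_over_eps)  # every constraint tight at eps/k
```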

Theorem 7. Let $\gamma_{i}=\max\{0,\sum_{j=1}^{n}\frac{b_{ij}}{(\sum_{k=1}^{n}b_{jk})^{2}}\}$ for every $i\in[n]$. Then we have

$$\max\Big\{2\sum_{i=1}^{n}\sqrt{\sum_{j=1}^{n}\gamma_{j}(W^{-1}_{ij})^{2}}-\sum_{i=1}^{n}\gamma_{i},\,1\Big\}\cdot\frac{k}{\epsilon}\;\leq\;\sum_{i\in[n]}m_{i}^{*}\;\leq\;\min\Big\{\sum_{i=1}^{n}\frac{1}{\sum_{j=1}^{n}b_{ij}},\,n\Big\}\cdot\frac{k}{\epsilon}.$$

Assigning 1j=1nbijkϵ\frac{1}{\sum_{j=1}^{n}b_{ij}}\!\cdot\!\frac{k}{\epsilon} samples to node ii suffices to solve the TSC problem.

Proof.

(1) $\sum_{i\in[n]}m_{i}^{*}\leq\frac{nk}{\epsilon}$ is trivial: $m_{i}=\frac{k}{\epsilon}$ for every $i$ is a feasible solution to Equation B.2, since $\sum_{j=1}^{n}(W_{ij}^{-1})^{2}\leq 1$ for all $i\in[n]$.

(2) Let $\bar{m}=(\frac{1}{m_{1}},\cdots,\frac{1}{m_{n}})^{\top}$ and $\bar{M}=(\frac{\epsilon}{k},\cdots,\frac{\epsilon}{k})^{\top}$. In Equation B.2, setting the first constraint to equality for every $i\in[n]$, we have $(W^{-1}\circ W^{-1})\bar{m}=\bar{M}$. From Lemma 19, the solution $\bar{m}=B\bar{M}$ is feasible ($m_{i}>0$ for all $i$). Now $\frac{1}{m_{i}}=\frac{\epsilon\sum_{j=1}^{n}b_{ij}}{k}$, so $m_{i}=\frac{1}{\sum_{j=1}^{n}b_{ij}}\cdot\frac{k}{\epsilon}$ and $\sum_{i=1}^{n}m_{i}=\sum_{i=1}^{n}\frac{1}{\sum_{j=1}^{n}b_{ij}}\cdot\frac{k}{\epsilon}$. Since this $\bar{m}$ corresponds to a point in the feasible region, $\sum_{i\in[n]}m_{i}^{*}\leq\sum_{i=1}^{n}\frac{1}{\sum_{j=1}^{n}b_{ij}}\cdot\frac{k}{\epsilon}$, and assigning $\frac{1}{\sum_{j=1}^{n}b_{ij}}\cdot\frac{k}{\epsilon}$ samples to node $i$ suffices to solve the TSC problem.

(3) In the dual form Equation 13, $\lambda_{i}=\frac{k^{2}}{n\epsilon^{2}}$ for every $i$ is a feasible solution. From strong duality, we have

\begin{align*}
\sum_{i\in[n]}m_{i}^{*}&\geq\frac{2k}{\epsilon}\sum_{i=1}^{n}\sqrt{\frac{1}{n}\sum_{j=1}^{n}(W_{ij}^{-1})^{2}}-\frac{k}{\epsilon}\\
&\geq\frac{2k}{\epsilon}\sum_{i=1}^{n}\sqrt{\frac{1}{n}\left(\frac{\sum_{j=1}^{n}W_{ij}^{-1}}{\sqrt{n}}\right)^{2}}-\frac{k}{\epsilon}&&\text{(Cauchy-Schwarz)}\\
&\geq\frac{k}{\epsilon}.&&\text{(since $\sum_{j=1}^{n}W_{ij}^{-1}=1$ for all $i$)}
\end{align*}

(4) The other lower bound is inspired by Lemma 8: we truncate each $\lambda_{i}$ to $\max\{0,\sum_{j=1}^{n}\frac{b_{ij}}{(\sum_{k=1}^{n}b_{jk})^{2}}\}\cdot\frac{k^{2}}{\epsilon^{2}}$ so that the resulting point lies in the feasible set of the dual form Equation 13. ∎
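By weak duality, the truncated dual value in Theorem 7 can never exceed the primal upper bound. This sandwich can be checked numerically (an illustrative sketch assuming $W=I+\alpha L$):

```python
import numpy as np

rng = np.random.default_rng(2)
n, alpha, k_over_eps = 7, 0.3, 50.0
A = (rng.random((n, n)) < 0.5).astype(float)
A = np.triu(A, 1)
A = A + A.T
L = np.diag(A.sum(axis=1)) - A
Winv = np.linalg.inv(np.eye(n) + alpha * L)
B = np.linalg.inv(Winv * Winv)
row = B.sum(axis=1)                                  # row sums of B
gamma = np.maximum(0.0, (B / row**2).sum(axis=1))    # truncated dual variables
lower = max(2 * np.sqrt((Winv**2) @ gamma).sum() - gamma.sum(), 1.0) * k_over_eps
upper = min((1.0 / row).sum(), n) * k_over_eps
assert lower <= upper + 1e-9                         # Theorem 7 sandwich
```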

Appendix E Missing proof in Section 5

Lemma 20.

The optimization Equation B.2 is equivalent to the following second-order cone program.

\begin{align*}
\min_{t_{1},\cdots,t_{n},z_{1},\cdots,z_{n}}\quad&\sum_{i=1}^{n}z_{i}\\
\text{s.t.}\quad&\sum_{j=1}^{n}a_{ij}^{2}t_{j}\leq\frac{\epsilon}{k},\quad\forall i\\
&\|[t_{i}-z_{i},2]\|_{2}\leq t_{i}+z_{i},\quad\forall i\\
&t_{i}>0,\quad\forall i
\end{align*}

The optimal solution (z1,,zn)(z_{1}^{*},\cdots,z_{n}^{*}) is equal to the optimal solution (m1,,mn)(m_{1}^{*},\cdots,m_{n}^{*}) in the optimization (P).

Proof.

The original optimization Equation B.2 is:

\begin{align*}
\min_{m_{1},\cdots,m_{n}}\quad&\sum_{i=1}^{n}m_{i}\\
\text{s.t.}\quad&\sum_{j=1}^{n}\frac{a_{ij}^{2}}{m_{j}}\leq\frac{\epsilon}{k},\quad\forall i\\
&m_{i}>0,\quad\forall i
\end{align*}

Letting $t_{i}=\frac{1}{m_{i}}$ and $z_{i}\geq\frac{1}{t_{i}}$, the program is equivalent to

\begin{align*}
\min_{t_{1},\cdots,t_{n},z_{1},\cdots,z_{n}}\quad&\sum_{i=1}^{n}z_{i}\\
\text{s.t.}\quad&\sum_{j=1}^{n}a_{ij}^{2}t_{j}\leq\frac{\epsilon}{k},\quad\forall i\\
&z_{i}t_{i}\geq 1,\quad\forall i\\
&t_{i}>0,\quad\forall i
\end{align*}

The constraint $z_{i}t_{i}\geq 1$ (with $t_{i},z_{i}>0$) is equivalent to $\|[t_{i}-z_{i},2]\|_{2}\leq t_{i}+z_{i}$. Thus, the program is equivalent to the following second-order cone program:

\begin{align*}
\min_{t_{1},\cdots,t_{n},z_{1},\cdots,z_{n}}\quad&\sum_{i=1}^{n}z_{i}\\
\text{s.t.}\quad&\sum_{j=1}^{n}a_{ij}^{2}t_{j}\leq\frac{\epsilon}{k},\quad\forall i\\
&\|[t_{i}-z_{i},2]\|_{2}\leq t_{i}+z_{i},\quad\forall i\\
&t_{i}>0,\quad\forall i
\end{align*}

∎
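The key algebraic step, that $z_{i}t_{i}\geq 1$ with $t_{i},z_{i}>0$ is equivalent to $\|[t_{i}-z_{i},2]\|_{2}\leq t_{i}+z_{i}$, can be checked numerically by expanding the squares:

```python
import random

random.seed(0)
for _ in range(1000):
    t = random.uniform(0.01, 10.0)
    z = random.uniform(0.01, 10.0)
    # ||[t - z, 2]||_2 <= t + z  <=>  (t - z)^2 + 4 <= (t + z)^2  <=>  z * t >= 1
    soc = ((t - z) ** 2 + 4.0) ** 0.5 <= t + z
    assert soc == (z * t >= 1.0)
```

Squaring both sides (valid since $t+z>0$) shows the two constraints differ by exactly $4zt\geq 4$.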

Appendix F More Experiments

In this section, we provide more detailed experiments. Our code can be found at https://github.com/liuhl2000/Sample-Complexity-of-Opinion-Formation-on-Networks. The experiments were run on seven 8-core Intel CPUs. Each optimization is solved by running the solver until the convergence error is less than $10^{-6}$; all optimizations for our chosen networks finished within 150 iterations. The most time-consuming part is generating random weights 50 times and running the corresponding experiments in Section F.2. These random trials exhibit little variance, so to reproduce our experiments quickly, you can simply set the "Iter num" parameter to $1$ in our code.

F.1 Distribution of Samples at the Optimal Point

To study the distribution of samples, we assume all $v_{ij}=\alpha\geq 0$. In our experiments, we choose $\alpha=0.01,0.1,1,5,10$. For the figures in this section, the x-axis is node degree and the y-axis is the average or variance of $N^{d}=\{\frac{\epsilon m_{i}^{*}}{k}:d_{i}=d,i\in[n]\}$, where $m_{i}^{*}$ is the number of samples assigned to agent $i$ at the optimal point of Equation B.2.

We generate scale-free networks using Barabasi-Albert preferential attachment (Barabási and Albert 1999) with attachment parameter $2$ (each incoming node attaches to two existing nodes) for $n=100,200,400,600,800,1000$. Figures 4 and 5 show the average sample number and variance for different $\alpha$.

Figure 4: Relationship between average samples and degree of SF for different $\alpha$ (panels (a)-(e): $\alpha=0.01,0.1,1,5,10$).
Figure 5: Relationship between sample variance and degree of SF for different $\alpha$ (panels (a)-(e): $\alpha=0.01,0.1,1,5,10$).

We use Erdos-Renyi random networks with edge probability $0.3$ (each pair of nodes is connected by an edge with probability $0.3$) for $n=600,620,650,680,700$. Figures 6 and 7 show the average sample number and variance for different $\alpha$.

Figure 6: Relationship between average samples and degree of ER for different $\alpha$ (panels (a)-(e): $\alpha=0.01,0.1,1,5,10$).
Figure 7: Relationship between sample variance and degree of ER for different $\alpha$ (panels (a)-(e): $\alpha=0.01,0.1,1,5,10$).

For real-world networks, Figures 8, 9, and 10 show the average sample number and variance for different $\alpha$.

Figure 8: Relationship between average samples and degree of RN for different $\alpha$ (panels (a)-(e): $\alpha=0.01,0.1,1,5,10$).
Figure 9: Relationship between average samples and degree of RN for different $\alpha$, when $d\geq 8$ (panels (a)-(e): $\alpha=0.01,0.1,1,5,10$).
Figure 10: Relationship between sample variance and degree of RN for different $\alpha$ (panels (a)-(e): $\alpha=0.01,0.1,1,5,10$).

Based on the above results, low-degree agents tend to receive more samples, and agents with the same degree tend to receive similar numbers of samples. When $\alpha$ is small, the sample assignment is roughly monotone in $d_{i}$, but as $\alpha$ gets larger, the trend at high degrees becomes less predictable.

Another observation is that, as $n$ increases, the number of samples assigned to each degree decreases when $\alpha$ gets larger. Our theory cannot yet explain this well, and investigating it further is an interesting direction for future work.

F.2 Tightness of bounds

In this section, we test the tightness of the upper and lower bounds in Theorem 7. We also study the impact of the magnitude of the influence factors on the performance of the bounds. The influence factors $v_{ij}$ are generated uniformly at random from $[0,0.01],[0,0.1],[0,1],[0,5],[0,10]$.

Our synthesized graphs are generated for $n=100,200,400,600,800,1000$. For each $n$, we generate $25$ networks with different structures and $v_{ij}$ (the detailed process is described below). For all these networks, we calculate the optimal solution and the upper and lower bounds indicated in Theorem 7. We measure the performance of the bounds in Theorem 7 by the relative error $\frac{|U\text{ or }L-\sum_{i\in[n]}m_{i}^{*}|}{\sum_{i\in[n]}m_{i}^{*}}$, where $U$ is the upper bound in Theorem 7 and $L$ is the maximum of $1$ and the lower bound in Theorem 7. In the following plots, the x-axis is the relative error of the upper/lower bounds and the y-axis is the frequency (number of networks in each bin).

For scale-free networks, for each nn, we randomly generate 1515 networks with attachment parameter 22 and 1010 networks with attachment parameter 33. The influence factors of these networks are also randomly generated in a given range. The error plot is given in Figure 11.

Figure 11: Bounds of SF for different influence factors $v_{ij}$ (panels (a)-(e): $v_{ij}\in[0,0.01],[0,0.1],[0,1],[0,5],[0,10]$).

For random regular graphs, for each $n$, we generate graphs with degree roughly equal to $n^{0.25+0.1k}$, where $k\in\{0,1,2,3,4\}$. For each degree, we generate $5$ graphs. The influence factors of these networks are also randomly generated in a given range. The error plot is given in Figure 12.

Figure 12: Bounds of RR for different influence factors $v_{ij}$ (panels (a)-(e): $v_{ij}\in[0,0.01],[0,0.1],[0,1],[0,5],[0,10]$).

For Erdos-Renyi graphs, for each nn, we generate 2525 graphs with edge-connection probability 0.250.25. The influence factors of these networks are also randomly generated in a given range. The error plot is given in Figure 13.

Figure 13: Bounds of ER for different influence factors $v_{ij}$ (panels (a)-(e): $v_{ij}\in[0,0.01],[0,0.1],[0,1],[0,5],[0,10]$).

In Table 5, we show the relative error of the bounds on different real-world networks for different $v_{ij}$. Let $err_{u}$ and $err_{l}$ be the maximum relative errors of the upper and lower bounds in Theorem 7, respectively, over these random experiments.

Network Upper bound of vijv_{ij} erruerr_{u} errlerr_{l}
ego-Facebook 0.01 2.2e6%2.2\mathrm{e}{-6}\% 6.6e7%6.6\mathrm{e}{-7}\%
ego-Facebook 0.1 0.4%0.4\% 3e3%3\mathrm{e}{-3}\%
ego-Facebook 1 27%27\% 24%24\%
ego-Facebook 5 89%89\% 74%74\%
ego-Facebook 10 98%98\% 70%70\%
Econ 0.01 1e6%1\mathrm{e}{-6}\% 1e14%1\mathrm{e}{-14}\%
Econ 0.1 0.1%0.1\% 3e5%3\mathrm{e}{-5}\%
Econ 1 6.6%6.6\% 1.3%1.3\%
Econ 5 64%64\% 88%88\%
Econ 10 112%112\% 75%75\%
Email-Eu 0.01 3e7%3\mathrm{e}{-7}\% 1e13%1\mathrm{e}{-13}\%
Email-Eu 0.1 0.23%0.23\% 4.3e5%4.3\mathrm{e}{-5}\%
Email-Eu 1 30%30\% 21%21\%
Email-Eu 5 117%117\% 76%76\%
Email-Eu 10 151%151\% 60%60\%
130bit 0.01 2e7%2\mathrm{e}{-7}\% 3e13%3\mathrm{e}{-13}\%
130bit 0.1 0.8%0.8\% 8e3%8\mathrm{e}{-3}\%
130bit 1 41%41\% 36%36\%
130bit 5 65%65\% 11%11\%
130bit 10 42%42\% 3.4%3.4\%
Table 5: Bounds for real-world networks for different influence factors vijv_{ij}s

From all the experiments above, we observe that the bounds in Theorem 7 perform excellently when the influence factors are small, but degrade as the influence factors become larger.

The good performance of these two bounds for small influence factors has the following intuitive explanation. Consider the original optimization Equation B.2: when all influence factors are zero, the optimal value is achieved by setting every inequality in the first constraint to equality. When the influence factors are small, the optimal solution also tends toward the value derived in this way, which is exactly how we obtain our upper and lower bounds.

An evident trend across all these experiments is that when $\alpha$ gets large, the total sample complexity decreases. However, our upper bound in Theorem 5, obtained by assigning each agent $\frac{\alpha+1}{\alpha d_{i}+1}\cdot\frac{k}{\epsilon}$ samples, does not capture this trend. It seems possible to remove the $\alpha$ in the numerator to improve this bound; we leave this as future work.