Quantifying Community Evolution in Developer Social Networks: Proof of Indices’ Properties

Liang Wang (ORCID 0000-0001-5444-748X), Ying Li (ORCID 0000-0002-4637-1742), Jierui Zhang (ORCID 0000-0002-7290-790X), and Xianping Tao (ORCID 0000-0002-5536-3891). State Key Laboratory for Novel Software Technology, Nanjing University, 163 Xianlin Ave., Nanjing, China

Abstract.

This document provides proofs of the properties of the community evolution indices, covering the community split and shrink indices, presented in the paper: Liang Wang, Ying Li, Jierui Zhang, and Xianping Tao. 2022. Quantifying Community Evolution in Developer Social Networks. In Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE ’22), November 14–18, 2022, Singapore, Singapore. ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/3540250.3549106. The proofs of the properties of the community merge and expand indices are similar.

Proof of Properties, Online Material

CCS Concepts: Software and its engineering → Programming teams; Software and its engineering → Open source model; General and reference → Metrics

1. Brief Introduction to the Properties of Community Split and Shrink Indices

Let $\mathcal{I}^{\psi}_{c_{t,i}}$ and $\mathcal{I}^{\eta}_{c_{t,i}}$ denote the community split and shrink indices, respectively. Without loss of generality, we assume $m\geq 1$ , where $m$ is the number of communities detected in the next time step. The properties of the two indices are as follows.

P-1. $\mathcal{I}^{\psi}_{c_{t,i}}$ and $\mathcal{I}^{\eta}_{c_{t,i}}$ are strictly monotonic increasing functions of $m$ , given $0<\eta_{i}<1$ , and $\hat{\psi}_{i,j}=\frac{1}{m},j=1,2,\cdots,m$ .

P-2. $\mathcal{I}^{\psi}_{c_{t,i}}$ / $\mathcal{I}^{\eta}_{c_{t,i}}$ is a strictly monotonic decreasing / increasing function of $\eta_{i}$ , respectively, for $\eta_{i}>0$ , given $m>1$ , and member migration distribution $\hat{\psi}_{i,j},j=1,2,\cdots,m$ with $\mathcal{H}_{c_{t,i}}>0$ .

P-3. Given $m$ and $\eta_{i}$ , the maximum split index $\mathcal{I}^{\psi}_{c_{t,i}}=(1-\eta_{i})\mathcal{H}^{*}_{t\rightarrow t+1}$ is obtained when the members of $c_{t,i}$ migrate to the communities detected in the next step with an even distribution, i.e., when $\hat{\psi}_{i,j}=\frac{1}{m},j=1,2,\cdots,m$ . The minimum split index $\mathcal{I}^{\psi}_{c_{t,i}}=0$ is obtained when $m=1$ or when all the members of $c_{t,i}$ who stay in the project migrate to a single community in the next step, i.e., there exists a $j^{\prime}$ -th community at time $t+1$ such that $\hat{\psi}_{i,j^{\prime}}=1$ and $\hat{\psi}_{i,j}=0,\forall j\neq j^{\prime}$ , resulting in $\mathcal{H}_{c_{t,i}}=0$ and $\mathcal{I}^{\psi}_{c_{t,i}}=0$ .

P-4. Given $m>1$ and $\eta_{i}$ , the maximum shrink index $\mathcal{I}^{\eta}_{c_{t,i}}=\eta_{i}\mathcal{H}^{*}_{t\rightarrow t+1}$ is obtained when the corresponding split index is minimized, i.e., when all remaining members of community $c_{t,i}$ migrate to a single community in the next step. The minimum shrink index $\mathcal{I}^{\eta}_{c_{t,i}}=\eta_{i}^{2}\mathcal{H}^{*}_{t\rightarrow t+1}$ is obtained when the members of $c_{t,i}$ migrate evenly to the communities at time $t+1$ . For the special case of $m=1$ , the shrink index is determined only by $\eta_{i}$ .

Figure 1. Curves of the split and shrink indices under different conditions specified by $m$ , $\eta_{i}$ , and $\hat{\psi}_{i,j}$ . The even distribution corresponds to the case $\hat{\psi}_{i,j}=\frac{1}{m},j=1,\cdots,m$ . The random distribution is obtained by randomly assigning the values of the $\hat{\psi}_{i,j}$ ’s.

Fig. 1 illustrates the curves of the split and shrink indices under different conditions, from which we can find correspondence to the above properties.
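As a minimal sketch of how the two indices can be evaluated, the following Python snippet implements the definitions used in the proofs in Section 2: $\mathcal{I}^{\psi}_{c_{t,i}}=(1-\eta_{i})\mathcal{H}_{c_{t,i}}$ and $\mathcal{I}^{\eta}_{c_{t,i}}=\eta_{i}(\mathcal{H}^{*}_{t\rightarrow t+1}-\mathcal{I}^{\psi}_{c_{t,i}}+\sigma_{\eta_{i}})$ , with $\mathcal{H}^{*}_{t\rightarrow t+1}=\log_{2}m$ , $\sigma_{\eta_{i}}=0.5\eta_{i}$ for $m=1$ , and $\sigma_{\eta_{i}}=0$ for $m>1$ . The function names (`entropy`, `split_index`, `shrink_index`) and the sample inputs are illustrative assumptions, not part of the original paper.

```python
import math


def entropy(psi):
    """Shannon entropy (base 2) of a migration distribution psi_hat_{i,1..m}."""
    # Terms with p == 0 contribute nothing (0 * log 0 is taken as 0).
    return -sum(p * math.log2(p) for p in psi if p > 0)


def split_index(eta, psi):
    """Split index: (1 - eta) * H."""
    return (1.0 - eta) * entropy(psi)


def shrink_index(eta, psi):
    """Shrink index: eta * (H* - split index + sigma), with H* = log2(m)."""
    m = len(psi)
    h_max = math.log2(m)
    # sigma is 0.5 * eta only in the special case m == 1.
    sigma = 0.5 * eta if m == 1 else 0.0
    return eta * (h_max - split_index(eta, psi) + sigma)


# Illustrative inputs (assumed values): m = 4 and half of the members leave.
eta = 0.5
even = [0.25] * 4               # even migration distribution
single = [1.0, 0.0, 0.0, 0.0]   # all remaining members move to one community

for name, psi in (("even", even), ("single", single)):
    print(name, split_index(eta, psi), shrink_index(eta, psi))
```

With the even distribution the split index attains $(1-\eta_{i})\log_{2}m$ and the shrink index attains $\eta_{i}^{2}\log_{2}m$ , while the single-community case gives a zero split index and a shrink index of $\eta_{i}\log_{2}m$ , in line with P-3 and P-4.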

2. Proof of The Properties

The proofs of the above four properties are as follows.

P-1. $\mathcal{I}^{\psi}_{c_{t,i}}$ and $\mathcal{I}^{\eta}_{c_{t,i}}$ are strictly monotonic increasing functions of $m$ , given $0<\eta_{i}<1$ , and $\hat{\psi}_{i,j}=\frac{1}{m},j=1,2,\cdots,m$ .

Proof. First, for the community split index $\mathcal{I}^{\psi}_{c_{t,i}}$ , taking the partial derivative of $\mathcal{I}^{\psi}_{c_{t,i}}$ with respect to $m$ gives us:

(1)

\frac{\partial\mathcal{I}^{\psi}_{c_{t,i}}}{\partial m}=\frac{\partial(1-\eta_{i})\mathcal{H}_{c_{t,i}}}{\partial m}=(1-\eta_{i})\frac{\partial\mathcal{H}_{c_{t,i}}}{\partial m}.

When $\hat{\psi}_{i,j}=\frac{1}{m},j=1,2,\cdots,m$ , we have $\mathcal{H}_{c_{t,i}}=-\sum_{j=1}^{m}\hat{\psi}_{i,j}\log_{2}(\hat{\psi}_{i,j})=-\log_{2}\frac{1}{m}$ . As a result, we have

(2)

\frac{\partial\mathcal{H}_{c_{t,i}}}{\partial m}=\frac{1}{m\ln(2)}.

Substituting Eq. (2) into Eq. (1), we have

(3)

\frac{\partial\mathcal{I}^{\psi}_{c_{t,i}}}{\partial m}=\frac{1-\eta_{i}}{m\ln(2)}.

We can see from Eq. (3) that $\frac{\partial\mathcal{I}^{\psi}_{c_{t,i}}}{\partial m}>0$ , for $m\geq 1$ , and $0\leq\eta_{i}<1$ .

As a result, the community split index $\mathcal{I}^{\psi}_{c_{t,i}}$ is a strictly monotonic increasing function of $m$ , for $m\geq 1$ , $0\leq\eta_{i}<1$ , and $\hat{\psi}_{i,j}=\frac{1}{m},j=1,2,\cdots,m$ .

For the case of $\eta_{i}=1$ , we always have $\mathcal{I}^{\psi}_{c_{t,i}}=0,\forall m\geq 1$ , regardless of the distribution of $\hat{\psi}_{i,j},j=1,2,\cdots,m$ . This is intuitively correct: a community cannot split if all of its members leave the project.

Next, for the community shrink index $\mathcal{I}^{\eta}_{c_{t,i}}$ , we also take its partial derivative with respect to $m$ , which is

(4)

\frac{\partial\mathcal{I}^{\eta}_{c_{t,i}}}{\partial m}=\frac{\partial\eta_{i}(\mathcal{H}^{*}_{t\rightarrow t+1}-\mathcal{I}^{\psi}_{c_{t,i}}+\sigma_{\eta_{i}})}{\partial m}=\eta_{i}(\frac{\partial\mathcal{H}^{*}_{t\rightarrow t+1}}{\partial m}-\frac{\partial\mathcal{I}^{\psi}_{c_{t,i}}}{\partial m}).

Since $\frac{\partial\mathcal{H}^{*}_{t\rightarrow t+1}}{\partial m}=\frac{\partial-\log_{2}(\frac{1}{m})}{\partial m}=\frac{1}{m\ln(2)}$ , and $\frac{\partial\mathcal{I}^{\psi}_{c_{t,i}}}{\partial m}=\frac{1-\eta_{i}}{m\ln(2)}$ following Eq. (3), we have

(5)

\frac{\partial\mathcal{I}^{\eta}_{c_{t,i}}}{\partial m}=\frac{\eta_{i}^{2}}{m\ln(2)}.

From Eq. (5), we have $\frac{\partial\mathcal{I}^{\eta}_{c_{t,i}}}{\partial m}>0$ for $m\geq 1$ , $0<\eta_{i}\leq 1$ , and $\hat{\psi}_{i,j}=\frac{1}{m},j=1,2,\cdots,m$ .

For the case of $\eta_{i}=0$ , we always have $\mathcal{I}^{\eta}_{c_{t,i}}=0,\forall m\geq 1$ , given any member migration distribution $\hat{\psi}_{i,j},j=1,2,\cdots,m$ . This is reasonable because a community does not shrink if none of its members leave the project, regardless of the number of communities detected in the next step (i.e., $m$ ).

Combining the above two results, we show that: $\mathcal{I}^{\psi}_{c_{t,i}}$ and $\mathcal{I}^{\eta}_{c_{t,i}}$ are strictly monotonic increasing functions of $m$ , given $0<\eta_{i}<1$ , and $\hat{\psi}_{i,j}=\frac{1}{m},j=1,2,\cdots,m$ .

$\hfill\Box$
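The monotonicity stated in P-1 can be checked numerically. The sketch below evaluates both indices under the even distribution for $m=1,\ldots,10$ and compares a central finite difference of the split index with Eq. (3). The helper names, the choice $\eta_{i}=0.3$ , and the evaluation point $m=4$ are assumptions made for illustration only.

```python
import math


def indices_even(m, eta):
    """Split and shrink indices when psi_hat_{i,j} = 1/m for all j."""
    h = math.log2(m)                        # H equals H* under the even distribution
    sigma = 0.5 * eta if m == 1 else 0.0    # sigma is non-zero only for m == 1
    split = (1.0 - eta) * h
    shrink = eta * (h - split + sigma)
    return split, shrink


def strictly_increasing(values):
    """True if every element is strictly larger than its predecessor."""
    return all(a < b for a, b in zip(values, values[1:]))


eta = 0.3  # assumed value with 0 < eta < 1
splits, shrinks = zip(*(indices_even(m, eta) for m in range(1, 11)))
print(strictly_increasing(splits), strictly_increasing(shrinks))

# Central finite difference of the split index in m, treating m as continuous,
# compared with the closed form (1 - eta) / (m * ln 2) from Eq. (3).
m0, step = 4.0, 1e-6
numeric = ((1.0 - eta) * math.log2(m0 + step)
           - (1.0 - eta) * math.log2(m0 - step)) / (2 * step)
analytic = (1.0 - eta) / (m0 * math.log(2))
print(numeric, analytic)
```

Setting `eta` to 0 or 1 in `indices_even` makes the shrink or split index identically zero, respectively, which is why P-1 requires $0<\eta_{i}<1$ .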

Proof of P-2. First, we show that the community split index $\mathcal{I}^{\psi}_{c_{t,i}}$ is a strictly monotonic decreasing function of $\eta_{i}$ under the given conditions. The partial derivative of $\mathcal{I}^{\psi}_{c_{t,i}}$ with respect to $\eta_{i}$ is

(6)

\frac{\partial\mathcal{I}^{\psi}_{c_{t,i}}}{\partial\eta_{i}}=\frac{\partial(1-\eta_{i})\mathcal{H}_{c_{t,i}}}{\partial\eta_{i}}=-\mathcal{H}_{c_{t,i}}.

As a result, we have $\frac{\partial\mathcal{I}^{\psi}_{c_{t,i}}}{\partial\eta_{i}}<0$ , meaning that $\mathcal{I}^{\psi}_{c_{t,i}}$ is a strictly monotonic decreasing function of $\eta_{i}$ , as long as $\mathcal{H}_{c_{t,i}}>0$ .

We have $\mathcal{H}_{c_{t,i}}=0$ and thus $\mathcal{I}^{\psi}_{c_{t,i}}=0,\forall\eta_{i}\in[0,1]$ when exactly one of the $\hat{\psi}_{i,j}$ ’s is one and the rest of the $\hat{\psi}_{i,j}$ ’s are zero (including the case of $m=1$ ). Intuitively, this means that a community is not regarded as splitting if all of its remaining members migrate to a single community in the next step, regardless of the number of members who leave the project.

Next, we show that the community shrink index $\mathcal{I}^{\eta}_{c_{t,i}}$ is a strictly monotonic increasing function of $\eta_{i}$ under the given conditions. Taking the partial derivative of $\mathcal{I}^{\eta}_{c_{t,i}}$ with respect to $\eta_{i}$ gives us

(7)

\begin{split}\frac{\partial\mathcal{I}^{\eta}_{c_{t,i}}}{\partial\eta_{i}}&=\frac{\partial\eta_{i}(\mathcal{H}^{*}_{t\rightarrow t+1}-\mathcal{I}^{\psi}_{c_{t,i}}+\sigma_{\eta_{i}})}{\partial\eta_{i}}=(\mathcal{H}^{*}_{t\rightarrow t+1}-\mathcal{I}^{\psi}_{c_{t,i}}+\sigma_{\eta_{i}})+\eta_{i}\frac{\partial(\mathcal{H}^{*}_{t\rightarrow t+1}-\mathcal{I}^{\psi}_{c_{t,i}}+\sigma_{\eta_{i}})}{\partial\eta_{i}}\\ &=\mathcal{H}^{*}_{t\rightarrow t+1}-(1-\eta_{i})\mathcal{H}_{c_{t,i}}+\sigma_{\eta_{i}}-\eta_{i}\frac{\partial\mathcal{I}^{\psi}_{c_{t,i}}}{\partial\eta_{i}}+\eta_{i}\frac{\partial\sigma_{\eta_{i}}}{\partial\eta_{i}}.\end{split}

The fourth term in Eq. (7) is $-\eta_{i}\frac{\partial\mathcal{I}^{\psi}_{c_{t,i}}}{\partial\eta_{i}}=-\eta_{i}\frac{\partial(1-\eta_{i})\mathcal{H}_{c_{t,i}}}{\partial\eta_{i}}=\eta_{i}\mathcal{H}_{c_{t,i}}$ . The last term in Eq. (7) is $\eta_{i}\frac{\partial\sigma_{\eta_{i}}}{\partial\eta_{i}}=0.5\eta_{i}$ when $m=1$ , and $\eta_{i}\frac{\partial\sigma_{\eta_{i}}}{\partial\eta_{i}}=0$ when $m>1$ , following the definition of $\sigma_{\eta_{i}}$ .

For the case of $m>1$ , Eq. (7) can be written as

(8)

\frac{\partial\mathcal{I}^{\eta}_{c_{t,i}}}{\partial\eta_{i}}=\mathcal{H}^{*}_{t\rightarrow t+1}-(1-2\eta_{i})\mathcal{H}_{c_{t,i}}.

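Because $\mathcal{H}^{*}_{t\rightarrow t+1}$ is the maximum entropy and $m>1$ , we have $\mathcal{H}^{*}_{t\rightarrow t+1}>0$ and $\mathcal{H}^{*}_{t\rightarrow t+1}\geq\mathcal{H}_{c_{t,i}}\geq 0$ . Moreover, $(1-2\eta_{i})<1$ when $\eta_{i}>0$ , so $(1-2\eta_{i})\mathcal{H}_{c_{t,i}}<\mathcal{H}^{*}_{t\rightarrow t+1}$ : if $\mathcal{H}_{c_{t,i}}=0$ the left-hand side is zero, and otherwise $(1-2\eta_{i})\mathcal{H}_{c_{t,i}}<\mathcal{H}_{c_{t,i}}\leq\mathcal{H}^{*}_{t\rightarrow t+1}$ . We then have $\frac{\partial\mathcal{I}^{\eta}_{c_{t,i}}}{\partial\eta_{i}}>0$ for $\eta_{i}>0$ and $m>1$ .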

For the case of $m=1$ , we have $\sigma_{\eta_{i}}=\eta_{i}\frac{\partial\sigma_{\eta_{i}}}{\partial\eta_{i}}=0.5\eta_{i}$ , and $\mathcal{H}^{*}_{t\rightarrow t+1}=\mathcal{H}_{c_{t,i}}=0$ . Eq. (7) then becomes

(9)

\frac{\partial\mathcal{I}^{\eta}_{c_{t,i}}}{\partial\eta_{i}}=\eta_{i}.

We also have $\frac{\partial\mathcal{I}^{\eta}_{c_{t,i}}}{\partial\eta_{i}}>0$ for $\eta_{i}>0$ and $m=1$ . As a result, $\mathcal{I}^{\eta}_{c_{t,i}}$ is a strictly monotonic increasing function of $\eta_{i}$ for $\eta_{i}>0$ , and $m\geq 1$ .

Summarizing the above, we show that: $\mathcal{I}^{\psi}_{c_{t,i}}$ / $\mathcal{I}^{\eta}_{c_{t,i}}$ is a strictly monotonic decreasing / increasing function of $\eta_{i}$ , respectively, for $\eta_{i}>0$ , given $m>1$ , and the distribution of member migration $\hat{\psi}_{i,j},j=1,2,\cdots,m$ with $\mathcal{H}_{c_{t,i}}>0$ .

$\hfill\Box$
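The closed-form derivatives in Eqs. (6), (8), and (9) can be cross-checked against finite differences. The following sketch does so for one assumed migration distribution and one assumed value of $\eta_{i}$ ; the function names and the inputs are illustrative and do not come from the original paper.

```python
import math


def entropy(psi):
    """Shannon entropy (base 2); zero-probability terms are skipped."""
    return -sum(p * math.log2(p) for p in psi if p > 0)


def split_index(eta, psi):
    return (1.0 - eta) * entropy(psi)


def shrink_index(eta, psi):
    m = len(psi)
    sigma = 0.5 * eta if m == 1 else 0.0
    return eta * (math.log2(m) - split_index(eta, psi) + sigma)


def central_diff(f, x, h=1e-6):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


psi = [0.5, 0.3, 0.2]   # assumed distribution with m = 3 and H > 0
eta = 0.4               # assumed fraction of members leaving the project
h_c = entropy(psi)
h_max = math.log2(len(psi))

d_split = central_diff(lambda e: split_index(e, psi), eta)
d_shrink = central_diff(lambda e: shrink_index(e, psi), eta)
d_shrink_m1 = central_diff(lambda e: shrink_index(e, [1.0]), eta)

print(d_split, -h_c)                               # Eq. (6)
print(d_shrink, h_max - (1 - 2 * eta) * h_c)       # Eq. (8), m > 1
print(d_shrink_m1, eta)                            # Eq. (9), m = 1
```

Each printed pair should agree up to the finite-difference error, with a negative derivative for the split index and positive derivatives for the shrink index.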

Proof of P-3. $\mathcal{I}^{\psi}_{c_{t,i}}=(1-\eta_{i})\mathcal{H}_{c_{t,i}}$ is a monotonic increasing function of $\mathcal{H}_{c_{t,i}}$ for $0\leq\eta_{i}\leq 1$ . Referring to the properties of information entropy (Shannon, 1948), the entropy attains its maximum $\mathcal{H}_{c_{t,i}}=\mathcal{H}^{*}_{t\rightarrow t+1}=-\log_{2}\frac{1}{m}$ when $\hat{\psi}_{i,j}=\frac{1}{m},j=1,2,\cdots,m$ . The minimum value $\mathcal{H}_{c_{t,i}}=0$ is obtained when exactly one of the $\hat{\psi}_{i,j}$ ’s equals one and the others equal zero, which also includes the case of $m=1$ . As a result, the maximum split index is $\mathcal{I}^{\psi}_{c_{t,i}}=(1-\eta_{i})\mathcal{H}^{*}_{t\rightarrow t+1}=-(1-\eta_{i})\log_{2}\frac{1}{m}$ given $\eta_{i}$ . When $\eta_{i}=0$ , we have $\mathcal{I}^{\psi}_{c_{t,i}}=-\log_{2}\frac{1}{m}$ , the maximum possible value of the split index, which is determined only by $m$ . The minimum split index is $\mathcal{I}^{\psi}_{c_{t,i}}=0$ .

$\hfill\Box$
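P-3 can be illustrated by sampling migration distributions at random and confirming that none of them yields a split index above the value reached by the even distribution. This is a sketch under assumed inputs ( $m=5$ , $\eta_{i}=0.2$ , a fixed random seed), and the helper `random_distribution` is a hypothetical name introduced here.

```python
import math
import random


def entropy(psi):
    """Shannon entropy (base 2); zero-probability terms are skipped."""
    return -sum(p * math.log2(p) for p in psi if p > 0)


def split_index(eta, psi):
    return (1.0 - eta) * entropy(psi)


def random_distribution(m, rng):
    """Draw m non-negative weights and normalize them to sum to one."""
    weights = [rng.random() for _ in range(m)]
    total = sum(weights)
    return [w / total for w in weights]


rng = random.Random(0)      # fixed seed so the run is repeatable
m, eta = 5, 0.2             # assumed values
upper = (1.0 - eta) * math.log2(m)          # claimed maximum split index

samples = [split_index(eta, random_distribution(m, rng)) for _ in range(1000)]
even_value = split_index(eta, [1.0 / m] * m)
single_value = split_index(eta, [1.0] + [0.0] * (m - 1))

print(max(samples) <= upper, even_value, upper, single_value)
```

The even distribution reaches the upper bound, the single-community distribution gives zero, and every sampled distribution falls in between.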

Proof of P-4. Given $m>1$ and $\eta_{i}$ , the community shrink index $\mathcal{I}^{\eta}_{c_{t,i}}=\eta_{i}(\mathcal{H}^{*}_{t\rightarrow t+1}-\mathcal{I}^{\psi}_{c_{t,i}}+\sigma_{\eta_{i}})$ is a monotonic decreasing function of the community split index $\mathcal{I}^{\psi}_{c_{t,i}}$ . Referring to Property 3 presented above, the shrink index is maximized when the split index is minimized, i.e., $\mathcal{H}_{c_{t,i}}=0$ . The maximum shrink index is given by $\mathcal{I}^{\eta}_{c_{t,i}}=\eta_{i}\mathcal{H}^{*}_{t\rightarrow t+1}=-\eta_{i}\log_{2}\frac{1}{m}$ . The minimum shrink index $\mathcal{I}^{\eta}_{c_{t,i}}=\eta_{i}^{2}\mathcal{H}^{*}_{t\rightarrow t+1}=-\eta_{i}^{2}\log_{2}\frac{1}{m}$ is obtained when the split index attains its maximum $\mathcal{I}^{\psi}_{c_{t,i}}=(1-\eta_{i})\mathcal{H}^{*}_{t\rightarrow t+1}$ .

In the above analysis we have $\sigma_{\eta_{i}}=0$ because we assume $m>1$ . When $m=1$ , we have $\sigma_{\eta_{i}}=0.5\eta_{i}$ and $\mathcal{H}^{*}_{t\rightarrow t+1}=\mathcal{H}_{c_{t,i}}=0$ . The shrink index is then $\mathcal{I}^{\eta}_{c_{t,i}}=0.5\eta_{i}^{2}$ , which is determined only by $\eta_{i}$ .

Following the above analysis, if we also let $\eta_{i}$ vary, the maximum and minimum possible values of the shrink index are $\mathcal{I}^{\eta}_{c_{t,i}}=-\log_{2}\frac{1}{m}$ when $\eta_{i}=1$ , and $\mathcal{I}^{\eta}_{c_{t,i}}=0$ when $\eta_{i}=0$ , respectively.

$\hfill\Box$

From properties P-3 and P-4 we can further see that, given $m>1$ , the community split and shrink indices vary in the same range, $\mathcal{I}^{\psi}_{c_{t,i}}\in[0,-\log_{2}\frac{1}{m}]$ and $\mathcal{I}^{\eta}_{c_{t,i}}\in[0,-\log_{2}\frac{1}{m}]$ , for different values of $\eta_{i}$ and different distributions of member migration specified by $\hat{\psi}_{i,j},j=1,2,\cdots,m$ . As a result, it is feasible to draw meaningful conclusions, such as whether a community shows a stronger tendency to split or to shrink, by directly comparing the values of the community split and shrink indices.
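The bounds in P-4 and the shared range of the two indices can also be probed by random sampling. The sketch below draws random values of $\eta_{i}$ and random migration distributions for an assumed $m=4$ , and records any sample that violates $\eta_{i}^{2}\mathcal{H}^{*}_{t\rightarrow t+1}\leq\mathcal{I}^{\eta}_{c_{t,i}}\leq\eta_{i}\mathcal{H}^{*}_{t\rightarrow t+1}$ or falls outside $[0,\log_{2}m]$ . All names, the seed, and the tolerance are assumptions for illustration.

```python
import math
import random


def entropy(psi):
    """Shannon entropy (base 2); zero-probability terms are skipped."""
    return -sum(p * math.log2(p) for p in psi if p > 0)


def indices(eta, psi):
    """Split and shrink indices for m > 1, where sigma is zero."""
    h_max = math.log2(len(psi))
    split = (1.0 - eta) * entropy(psi)
    shrink = eta * (h_max - split)
    return split, shrink


def random_distribution(m, rng):
    """Draw m non-negative weights and normalize them to sum to one."""
    weights = [rng.random() for _ in range(m)]
    total = sum(weights)
    return [w / total for w in weights]


rng = random.Random(1)      # fixed seed so the run is repeatable
m = 4                       # assumed number of communities at time t+1
h_max = math.log2(m)
tol = 1e-12                 # slack for floating-point rounding

violations = []
for _ in range(1000):
    eta = rng.random()
    psi = random_distribution(m, rng)
    split, shrink = indices(eta, psi)
    in_p4_bounds = eta ** 2 * h_max - tol <= shrink <= eta * h_max + tol
    in_range = -tol <= split <= h_max + tol and -tol <= shrink <= h_max + tol
    if not (in_p4_bounds and in_range):
        violations.append((eta, psi, split, shrink))

print(len(violations))
```

An empty `violations` list is consistent with P-4 and with both indices sharing the range $[0,\log_{2}m]$ ; it is a sanity check on sampled inputs, not a substitute for the proof.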

Acknowledgements.

We thank all the reviewers for their efforts in improving the paper. This work is supported by the National Key R&D Program of China No. 2018AAA0102302, NSFC No. 62172203, Fundamental Research Funds for the Central Universities, and the Collaborative Innovation Center of Novel Software Technology and Industrialization.

References

Claude Elwood Shannon. 1948. A mathematical theory of communication. The Bell System Technical Journal 27, 3 (1948), 379–423.