The process of polarisation as a loss of dimensionality: measuring changes in polarisation using Singular Value Decomposition of Random Dot Product Graphs

Sage Anastasi, Giulio Valentino Dalla Riva

Abstract

In this paper we present new methods that extend Baldassarri and Gelman’s theory of polarisation. Baldassarri and Gelman show that it is useful to define polarisation as increasing correlation between positions in the ideological field, which reduces political pluralism. We also draw from post-structuralist work which argues that this correlation extends to elements of culture beyond the ideological, and deliberate development of these correlations is a feature of polarised regimes such as apartheid.

To measure polarisation in social networks, we use Random Dot Product Graphs to embed social networks in metric spaces. Singular Value Decomposition of a social network provides an embedded dimensionality which corresponds to the number of uncorrelated dimensions in the network. Each uncorrelated dimension in a social network represents a part of that society which allows two people from different groups to form a social connection, such as living in a racially integrated neighbourhood. A decrease in the optimal dimensionality for the embedding of the network graph means that the dimensions in the network are becoming more correlated, and therefore the network is becoming more polarised.

First, we apply this method to the communication interactions among New Zealand Twitter users discussing climate change issues, from 2017 to 2023. We find that the discussion is becoming more polarised over time, as shown by a decrease in the dimensionality of the communication network. Second, we apply this method to discussions of the COP climate change conferences, showing that our methods agree with other researchers’ detection of polarisation in this space. Finally, we use networks generated by stochastic block models to explore how increasing the isolation between distinct communities, or increasing the predominance of one community over the other, decreases the embedded dimensionality of the social network and is therefore identifiable as a polarisation process.

Corresponding author: Sage Anastasi, [email protected], University of Canterbury, Private Bag 4800, Christchurch 8140

keywords:

polarisation, ideological polarisation, political polarisation, social complexity, climate change, random dot product graphs, graph dimensionality, singular value decomposition

MSC:

62P25

Journal: Mathematical Social Science

Affiliation: Department of Mathematics and Statistics, Canterbury University, New Zealand

1 Introduction

Social and political polarisation is an issue of increasing concern in New Zealand [1], as well as in a wide range of other countries [2] [3] [4] [5] [6]. While it was initially thought that New Zealand had avoided the populist takeover seen in countries such as the USA [7], radical changes in the makeup of the government in both 2020 and 2023 have since called this into question. This paper explores existing methods of measuring polarisation and proposes a new measurement based on Singular Value Decomposition of social Random Dot Product Graphs.

New Zealand was considered to have escaped the polarisation that had developed in similar nations, particularly following the 2016 election of Donald Trump in the USA [7]. Analysis of voter polarisation in NZ from 2009-2018 showed little evidence of polarisation in that period [8]. Topic modelling of political party manifestos showed similarities between the manifestos of parties on the left and right [9], suggesting that despite their rivalries the parties are not deeply divided. Tan [10] argues that the lack of bipartisanship between NZ’s left and right indicates that the parties are polarised; however, they found that the perceived distances between left and right parties, and between those parties and voters, were stable over time, suggesting that polarisation is not increasing. Satherley et al. [11] find that there is a high level of stability in the partisanship of NZ voters (i.e. they do not frequently change which party they support) in the years up until 2017, and note that this creates a risk of polarisation if voters stay committed to a party that becomes increasingly extreme. In another paper, Satherley et al. [12] find that, in an analysis of polarisation between the Labour, Green, and National parties, NZ European voters and voters of high socio-economic status exhibit more polarisation in their party preferences than other groups. Stanley et al. [13] find that climate change is not a polarising issue for New Zealand, with the voting public largely accepting its existence.

This history of political stability highlights how irregular the NZ election results in 2020 and 2023 have been, with significant implications for potential polarisation. The 2020 election saw the Labour party win more than 50% of the vote, the first outright majority since the change to Mixed Member Proportional government in 1996, and saw the right-wing ACT minor party increase their vote share from 0.5% to 7.58%. The Covid-19 pandemic has subsequently become a focus for disinformation producers, prompting concerns about an increase in polarisation since 2020 [14] [15] [16]. The 2023 election produced further novel results; despite the return to a right-wing government, both the NZ First and ACT parties were required for the National party to form a coalition, creating the first instance of the National party having to negotiate with two coalition partners of this size. The 2023 Coalition Government has rapidly removed Māori language from official Government documents [17], and removed references to the Treaty of Waitangi (a constitutional document that forms the basis of the relationship between the indigenous Māori people and the British settlers) from the law [18]. The ACT party have also introduced a parliamentary bill to redefine the “Treaty Principles” in ways that remove references to Māori. The Waitangi Tribunal (the permanent commission of inquiry tasked with investigating governmental breaches of the Treaty of Waitangi) has argued that “the Crown’s [Government’s] actions ‘threaten to disfigure or rupture’ the Crown–Māori relationship and ‘could set back the foundational relationships of Aotearoa New Zealand for decades’” [19]. A rise in general racist sentiment, as well as racist hate crimes and violence, has been documented [20].
As such, we think that questions about whether polarisation may be happening along an axis other than party affiliation are particularly relevant in the current political atmosphere, since there is a risk that these policies will create a polarisation between Māori and pākehā (descendants of British settlers).

Polarisation scholarship in New Zealand is particularly influenced by research in the USA, so it is useful to cover some key analyses of the US system as well as general data-based polarisation measurement methods. Heatherington [21] famously found that the Republican and Democrat parties have no overlap in their policies or voting patterns, identifying a bimodal division by inspection (i.e. no overlap between the ideological groups). Most subsequent research assesses the significance of bimodality using tests such as Hartigan’s Dip test [22] [23]. When investigating social networks, bimodality is expressed as strong in-group/out-group divisions [24], and is often characterised by hostility between the two groups; it is common to consider this hostility as a key sign of polarisation, in addition to the differences in policy positions between the two poles (usually the Republican and Democrat parties in the USA) [25] [26]. On social media, these hostile divisions usually take the form of “echo chambers” that focus on one set of political views and exclude all others [27] [28].

We have some specific concerns about the use of bimodality in detecting polarisation. The most important is that hypothesis tests can be used to detect when a distribution is polarised, but after the first significant result it is difficult to robustly show that any further increase in polarisation is also significant. That is to say, while subsequent results may also be statistically significant, it is difficult to show that the increase from the first result to the second is itself significant. This means that we are not confident that studies which use bimodality are able to claim that polarisation is increasing over time. We are therefore interested in developing measurements of polarisation that do not rely on significance testing in this way. A secondary concern is that this approach can struggle when assessing multi-party democracies that do not have a clear left- and right-wing split [29]. In the New Zealand context, the NZ First party makes it difficult to clearly reduce the field to left and right blocs, as they have formed coalition governments with both the Labour and National parties many times. As such, finding a method of measuring polarisation that can better handle this type of situation would be beneficial to researchers in New Zealand and similar countries.

Another common aspect of definitions of polarisation is that polarisation is maximised when the groups are of equal size, as well as strongly divided. This was first proposed by Esteban & Ray [30]. Taking group size into account is intended to help tell polarisation conflicts apart from other major social conflicts, such as conflicts over wealth inequality. However, a consequence of this theoretical approach is that it sets a rigid upper limit on how severe a polarisation can become if one of the groups is a small minority. This concerns us because of issues such as potential polarisation against refugees; they are by definition a small proportion of the population but are still considered to be a potential cause of polarisation, especially in countries that are seeing influxes of refugees due to war or climate change. We are interested in finding definitions of polarisation, and tools for measuring it, that are not as affected by the size of the groups.

Computational data science approaches to measuring polarisation are popular, especially for their speed when analysing millions of records from social media services. There are fast and simple algorithms for detecting communities that can then be investigated for polarisation, such as the Louvain method [31]. In some cases, clustering algorithms are used to measure polarisation; there are a number of ways of measuring distances between the clusters, which can indicate polarisation between the groups if the clusters also have low internal variation [32]. However, this has similar issues to bimodality testing insofar as it becomes difficult to interpret when there are more than two clusters. It is possible to argue that three or more groups can all be polarised against each other, but we are unconvinced by this; as the number of polarised groups increases, so does the number of dimensions required for their clusters to all be equidistant from each other, which increases the complexity of the space so much that it becomes more complex than what we would consider to be “polarised”.

In computational analyses focussed on bimodality instead of clustering, network data is usually processed using a dimension reduction algorithm, such as Principal Component Analysis or Canonical Correspondence Analysis, in order to create a single dimension that can be evaluated for bimodality. Measuring latent positions empirically means projecting them into lower dimensional spaces, and then assessing the resulting first dimension for bimodality [33]. These analyses do not treat the correlation structure delivered by the PCA or CCA as informative in and of itself; it is simply used to discover the largest principal component or axis (i.e. the largest dimension). An advantage of this approach is that the issue driving polarisation is generated from the data, rather than researchers presuming what it is and risking choosing incorrectly. However, it does not entirely mitigate the risk of choosing the wrong dimension, since it presumes that polarisation is happening along the first dimension, and it will therefore still miss cases in which polarisation is happening along one of the other dimensions.

In order to comprehensively address the issue of correct dimension selection, Baldassarri and Gelman [34] propose a definition of polarisation as increasing correlation in the ideological space, i.e. that pairs of ideological issues asked about in the American National Election Study become more correlated as a result of polarisation, reducing political pluralism and restricting possible ideological opinions, until maximum correlation is reached and an oppositional binary is created. This polarisation stands in comparison to an integrated, non-polarised society which “is not a society in which conflict is absent, but rather one in which conflict expresses itself through nonencompassing interests and identities”. In their tests of Pearson correlations of pairs of issues in the American National Election Study, they did not find that there was increasing correlation in the ideological field of US-American voters pre-2004. However, subsequent research by Kozlowski and Murphy [35] using the same methods found that correlation between pairs of ideological issues rapidly increased between 2004 and 2016. They noted that the increase in polarisation was strongest in the domains of economics and civil rights issues, rather than in the domain of moral issues that the “culture war” framing of polarisation may suggest. Similarly, DellaPosta [36] conceptualises polarisation as similar to an oil spill, with the increasing correlation in ideological positions spreading polarisation to previously apolitical members of society. The article analyses how the “belief network” of US-American politics has changed over time, concluding that the network has developed clusters which have reduced the prevalence of cross-cutting ideological positions; this means that pluralism has decreased and polarisation has increased.

Incorporating post-structuralist political theory allows us to expand on this understanding of polarisation as correlation. Ernesto Laclau & Chantal Mouffe [37] begin from the same understanding of pluralism and polarisation as scholars such as Baldassarri and Gelman, but they develop this concept beyond just the ideological field. They argue that “In a colonized country, the presence of the dominant power is every day made evident through a variety of contents: differences of dress, of language, of skin colour, of customs […] the colonizer is discursively constructed as the anti-colonized.” (p. 128). This is to say that it is possible for elements of a society’s culture that are not explicitly political to become associated with ideologies and drawn into the correlation, further reducing opportunities for political pluralism. Two poles are constructed that are mutually exclusive and have nothing in common, sustained by segregation in all layers of society (e.g. the South African regime of racial apartheid). Notably, the polarisations that they examine do not occur primarily in the division between political parties, but along fault lines such as race and ethnicity, economic class, the urban-rural divide, and the division between coloniser and colonised. The starkest demonstration of this principle is in the actions of the 1970s Argentinian military dictatorship, which banned Venn diagrams from being taught in primary schools because they “were feared to encourage subversive models of collectivity” [38].

In this paper we present a novel method of measuring polarisation that follows from both Laclau & Mouffe and Baldassarri & Gelman, namely using Singular Value Decomposition to determine the correlation structure of social networks. Using social networks allows us to capture interactions between people that are not explicitly political — being neighbours, sharing a workplace, etc — but which become politicised and segregated during extreme polarisation. As such, we are able to determine whether correlation is increasing not just among possible political positions, but whether it is increasing among social determinants of interaction as well. Our method gives a value that corresponds to the network’s capacity for complexity, and is inversely related to its level of polarisation. Incorporating these additional layers of a society into the analysis should make it easier to detect whether polarisation is occurring.

2 Methods

We represent the conversation happening on a social platform (Twitter, Facebook, Instagram, etc.) as a network. Each user that took part in the conversation is mapped to a node. We add an edge between two nodes if the two respective users have communicated in the time window of the observation. Depending on the social platform considered and the specific research question, a communication can be a reply, a mention (“tagging” another user in a post), a quote, a repost/retweet/share, or a set of these. These networks can be directed (as is more common) or undirected. Here, we consider communication networks only as unweighted graphs, although the generalization to weighted graphs does not present any fundamental challenges.
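The construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' Julia code: the function name `build_adjacency`, the example user names, and the choice to ignore self-mentions are all hypothetical.

```python
def build_adjacency(interactions, directed=True):
    """Build an unweighted adjacency matrix from (source, target) pairs.

    Each pair stands for one communication (reply, mention, quote, ...).
    Returns the sorted list of users and a list-of-lists 0/1 matrix A,
    where A[i][j] = 1 if user i contacted user j at least once.
    """
    users = sorted({user for pair in interactions for user in pair})
    index = {user: k for k, user in enumerate(users)}
    n = len(users)
    A = [[0] * n for _ in range(n)]
    for source, target in interactions:
        if source == target:
            continue  # assumption of this sketch: self-mentions are ignored
        i, j = index[source], index[target]
        A[i][j] = 1  # unweighted: repeated contacts still give a single edge
        if not directed:
            A[j][i] = 1
    return users, A


# Hypothetical interaction log: "ana" mentions "cat" twice, giving one edge.
interactions = [("ana", "ben"), ("ben", "ana"), ("ana", "cat"), ("ana", "cat")]
users, A = build_adjacency(interactions)
_, A_undirected = build_adjacency(interactions, directed=False)
```

In the directed version, row `i` records who user `i` contacted and column `j` records who contacted user `j`, which is the distinction the two latent spaces in the next subsection are meant to capture.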

2.1 Network modelling

Having established these networks, we model them as Random Dot Product Graphs (RDPGs) [39]. RDPGs are used instead of other graph embeddings because their optimal embedding dimension (see below) is established a priori to the analysis, so it is independent of the network’s size.

In the most general, directed, case under the RDPG model, each node $i\in\{1,\ldots,N\}$ in a graph $G$ is associated with two vectors of traits, $L_{i}$ and $R_{i}$, that give the node's position in a pair of metric spaces, $L$ and $R$. $L_{i}$ and $R_{i}$ are in general not directly observable. Then, the probability that an edge from node $i$ to node $j$ exists is given by the proximity of $L_{i}$ and $R_{j}$, namely by the dot product

$$L_{i}\cdot R_{j}=\mathbb{P}(i\rightarrow j)\,.$$

In other words, the position of a node in $L$ describes its outgoing edge topology, and the position of a node in $R$ describes its incoming edge topology. In general, given the two matrices $L$ and $R$, the edges are drawn with independent probabilities given by the entries of $LR$. We call the pair $(L,R)$ the RDPG embedding of $G$.

In inference tasks, starting from an observed graph, the goal is to estimate the position of the nodes in the latent spaces, given the interaction structure of the network. We do not parametrise the network for this analysis. For a fixed dimension $d$ of the two latent spaces, this is achieved by a $d$-truncated Singular Value Decomposition as follows (a full description of SVD can be found in Noble [40]).

Let $A$ be the adjacency matrix of $G$. Let $A=U\Sigma V^{\prime}$ be a singular value decomposition of $A$, so that $U$ and $V^{\prime}$ are orthogonal matrices, and $\Sigma$ is the diagonal matrix whose $i$-th diagonal entry is the $i$-th singular value of $A$ (sorted in decreasing order). Notice that in general $U$ and $V^{\prime}$ are only identifiable up to orthogonal transformations (any rotation of them would keep the dot product constant, so they would determine the same graph). Denoting by $M|_{k}$ the truncation of a matrix $M$ to its first $k$ columns, for any $d$, the two matrices $\hat{L}=U|_{d}\sqrt{\Sigma}$ and $\hat{R}=\sqrt{\Sigma}\left(V|_{d}\right)^{\prime}$ determine an optimal rank-$d$ approximation of $A$. That is, $\hat{L}\hat{R}=\hat{A}$ minimizes the Frobenius distance to $A$ among all rank-$d$ matrices. Here ${\Sigma}$ is truncated to its first $d$ diagonal elements; $\hat{L}$ has $N$ rows and $d$ columns, and $\hat{R}$ has $d$ rows and $N$ columns.

In the undirected case, $\hat{L}=\hat{R}$ so that $\hat{L}_{i}\cdot\hat{R}_{j}=\hat{R}_{i}\cdot\hat{L}_{j}$ and the probabilities of interaction are symmetric.
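As a concrete illustration of the truncated SVD embedding, the following is a minimal sketch using NumPy's dense `numpy.linalg.svd`. It is an assumption of this example rather than the authors' implementation, which uses PROPACK.jl in Julia; the function name `rdpg_embedding` and the small example matrix are hypothetical.

```python
import numpy as np


def rdpg_embedding(A, d):
    """Return (L_hat, R_hat, sigma) for a d-dimensional RDPG embedding of A.

    L_hat has shape (N, d) and R_hat has shape (d, N), so that
    L_hat @ R_hat is a rank-d approximation of the adjacency matrix.
    """
    A = np.asarray(A, dtype=float)
    U, sigma, Vt = np.linalg.svd(A)    # sigma is sorted in decreasing order
    root = np.sqrt(sigma[:d])          # square roots of the top d singular values
    L_hat = U[:, :d] * root            # scale the columns of U|_d
    R_hat = root[:, None] * Vt[:d, :]  # scale the rows of (V|_d)'
    return L_hat, R_hat, sigma


# Small hypothetical directed network on five nodes.
A = np.array([
    [0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
], dtype=float)

L_hat, R_hat, sigma = rdpg_embedding(A, d=2)
A_hat = L_hat @ R_hat
# Squared Frobenius error of the rank-2 approximation.
residual = np.linalg.norm(A - A_hat, "fro") ** 2
```

By the Eckart-Young theorem, `residual` equals the sum of the squared discarded singular values, which is a convenient sanity check on any implementation. For networks of realistic size a sparse, truncated solver would be preferable to a dense SVD.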

Embedding dimension

We define the dimension of a communication network as the optimal choice of $d$ for the RDPG embedding of the network and denote it $\hat{d}$ .

An a-priori optimal choice for $\hat{d}$ can be obtained from $\Sigma$, the sorted sequence of singular values of the network’s adjacency matrix $A$. Various methods exist; here we adopt the elbow method presented in [41]. The elbow method identifies the most likely change point in the sequence of values of $\Sigma$ by sequentially fitting two Gaussian distributions with independent means and equal variance. One Gaussian distribution is fitted to the largest $d$ singular values, and the other to the smallest $K-d$, where $K$ is the number of singular values considered. Then, the optimal $\hat{d}$ is the value of $d$ that maximises the sum of the log-likelihoods of the two distributions.
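The change-point search described above can be sketched as follows. This is a minimal illustration of the two-Gaussian, shared-variance likelihood as we read it from the description, not the code in DotProductGraphs.jl; the function name `elbow_dimension`, the pooled-variance estimator with $K-2$ degrees of freedom, and the example values are assumptions.

```python
import math


def elbow_dimension(singular_values):
    """Pick the split d that maximises a two-Gaussian log-likelihood.

    The d largest values get one mean, the remaining K - d values get
    another, and both groups share a single pooled variance.
    """
    sv = sorted(singular_values, reverse=True)
    K = len(sv)
    if K < 3:
        raise ValueError("need at least three singular values")
    best_d, best_ll = 1, -math.inf
    for d in range(1, K):
        top, rest = sv[:d], sv[d:]
        mean_top = sum(top) / len(top)
        mean_rest = sum(rest) / len(rest)
        # Sum of squared deviations of each group from its own mean.
        ss = (sum((x - mean_top) ** 2 for x in top)
              + sum((x - mean_rest) ** 2 for x in rest))
        var = ss / (K - 2)  # pooled variance (assumed estimator)
        if var == 0.0:
            return d  # both groups are constant: a perfect step at d
        # Sum of the Gaussian log-likelihoods of the two groups.
        ll = -0.5 * K * math.log(2 * math.pi * var) - ss / (2 * var)
        if ll > best_ll:
            best_d, best_ll = d, ll
    return best_d


# Hypothetical spectrum with a clear gap after the third value.
example = [10.0, 9.5, 9.0, 1.2, 1.0, 0.9, 0.8, 0.7]
d_hat = elbow_dimension(example)
```

For the example spectrum the search selects the split after the third value, where the gap between the two groups is largest relative to their internal spread.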

Notice that $\hat{d}$ is robust to network size, i.e. its value is determined by the complexity of the network rather than the size of the network [39].

SVD Entropy

Given a network, we can assess its graph complexity by computing its SVD entropy. We choose this instead of other entropy measures because it is based on $\Sigma$. A network has higher SVD entropy when many of its singular vectors are highly important for its structure, meaning that the network cannot be efficiently compressed. This is commonly read as an indication of high network complexity [42], and it is related (although not in a linear or straightforward way) to its dimension. We normalise the SVD entropy using Pielou’s evenness [43], so that the results do not depend on the network size.

In particular, let $\Sigma$ be the sequence of singular values of a network’s adjacency matrix $A$. The nuclear norm of $A$, $\|A\|_{*}$, is given by the sum of all singular values in $\Sigma$. We define the normalised values $s_{i}=\frac{\sigma_{i}}{\|A\|_{*}}$, where $\sigma_{i}$ is the $i$-th singular value and $i\in\{1,\ldots,N\}$.

Then, the (Pielou normalised) SVD entropy of a graph $G$ is given by

$$J=-\frac{1}{\ln(N)}\sum_{i=1}^{N}s_{i}\ln(s_{i})$$

where the sum term in the definition is, indeed, an entropy.
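The definition of $J$ translates directly into code. The sketch below takes a list of singular values as input and is illustrative only; the function name `svd_entropy` is hypothetical. It assumes that $N$ is the number of singular values supplied, and that zero singular values contribute nothing to the sum (the usual convention $0\ln 0=0$).

```python
import math


def svd_entropy(singular_values):
    """Pielou-normalised SVD entropy J of a sequence of singular values."""
    n = len(singular_values)
    nuclear_norm = sum(singular_values)  # sum of all singular values
    if n < 2 or nuclear_norm <= 0:
        raise ValueError("need at least two values with a positive sum")
    entropy = 0.0
    for sigma in singular_values:
        if sigma > 0:  # convention: 0 * ln(0) = 0
            s = sigma / nuclear_norm
            entropy -= s * math.log(s)
    return entropy / math.log(n)  # divide by the maximum entropy ln(N)


flat = svd_entropy([2.0, 2.0, 2.0, 2.0])    # all directions equally important
peaked = svd_entropy([1.0, 0.0, 0.0, 0.0])  # a single dominant direction
```

The two extremes bracket the measure: a flat spectrum gives $J=1$ (no compression possible), while a spectrum with one non-zero value gives $J=0$.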

2.2 Polarisation

We define a process of polarisation as the loss of dimensionality of a graph observed in time. Namely, we find the optimal RDPG embedding dimension $\hat{d}$ at multiple time points. If $\hat{d}$ decreases over time then we argue that the network has become more polarised during that time. This is based on the same principles as the view that the process of polarisation is one of increasing correlation. The $\hat{d}$ dimensions of the embedding are mutually uncorrelated; as such, a reduction in $\hat{d}$ corresponds to a loss of uncorrelated dimensions in the social network graph, which is an increase in correlation.

We complement this definition by also observing the complexity of the graph, as determined by its SVD entropy, and notice whether it corresponds to an increase or decrease of polarisation.

2.3 Code

All the social network analyses discussed above were performed in Julia [44], in particular using the packages Graphs.jl [45] for network manipulation, PROPACK.jl [46] for computing the (truncated) singular values, and DotProductGraphs.jl for computing the embedding dimension and SVD entropy.

All scripts are available at https://doi.org/10.5281/zenodo.11043841. In Julia, all required packages can be installed by running the following from the project directory: using Pkg; Pkg.activate("."); Pkg.instantiate().

3 Results

We apply our computational framework for polarisation to three different data sets: two from Twitter, and one consisting of simulated interaction networks. These have been chosen in order to show that our method works well, and to explore some common beliefs about polarisation in a new way.

3.1 Climate discussion in New Zealand Twitter

We obtained 12939 tweets by querying Twitter’s Academic API v2.0 for keywords related to climate change: climate, pollution, agw (anthropogenic global warming), CO2, and carbon. Some keywords used in other research, such as “COP 2x” for the international climate change conference, did not return many results when searched on New Zealand tweets. We restricted our search to original tweets geographically tagged as being from New Zealand, and tweets published after 2017, in order to investigate whether climate discussions in New Zealand were becoming more polarised.

We divided the data into two time windows that corresponded to equally sized, large networks: between 2017 and 2020, and 2020 to 2023. For each time frame, we built a network by considering each user (identified by their unique IDs) as a node, and any mention, reply, or quote tweet between two users as an edge. Retweets were excluded in order to maintain focus on the New Zealand network and prevent the network expanding to international discussions of climate change. There were 6767 tweets in the 2017-2020 network and 6172 in the 2020-2023 network; as such, the network sizes given are based on the number of nodes unless otherwise stated. We analysed the two networks independently.

We computed the results both in the original networks and in the giant component of each network (the table of pointwise estimates below). We also computed results for 1000 bootstrapped samples of each network, in which we sampled (with replacement) the same number of nodes as in the original giant component of the graph (the box plots below). This bootstrapped sampling allows us to detect possible effects of sample limitations in the original graph; we did not find any notable sample limitations.
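One way to read the node bootstrap is sketched below. The text does not specify how repeated nodes are handled, so this example makes an explicit assumption: a node drawn more than once appears as several copies that share the original node's connections, and copies of the same node are not linked to each other. The function name `bootstrap_adjacency` and the toy matrix are hypothetical.

```python
import random


def bootstrap_adjacency(A, rng):
    """Resample the nodes of A with replacement and return the induced matrix.

    A is a square list-of-lists adjacency matrix. The result has the same
    size; row and column k correspond to the k-th sampled node.
    """
    n = len(A)
    picks = [rng.randrange(n) for _ in range(n)]  # n draws with replacement
    return [[A[i][j] for j in picks] for i in picks]


# Hypothetical undirected toy network on four nodes.
A = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
rng = random.Random(0)  # fixed seed so the replicates are reproducible
replicates = [bootstrap_adjacency(A, rng) for _ in range(1000)]
```

Each replicate would then be passed through the same pipeline as the observed network (singular values, elbow method, SVD entropy), and the spread of the resulting values summarised in a box plot.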

Figure 1: Plot comparing $\hat{d}$ of NZ climate change tweets in 2017-2020 to $\hat{d}$ in 2020-2023.

Table 1: Pointwise estimates of network dimensionality ($\hat{d}$) and SVD entropy, for the full networks and for their giant components (GC).

Year	Dimension	Dimension GC	Entropy	Entropy GC
2017-2020	39	39	0.980229	0.979954
2020-2023	27	24	0.97439	0.97372

We find that $\hat{d}$ for the NZ Twitter discussion of climate change has decreased in the 2020-2023 network compared to the 2017-2020 network, indicating that the network was more polarised in the second time period than in the first. Similarly, the SVD entropy of the network is lower in the 2020-2023 period than in the 2017-2020 period. These results suggest that the complexity of the network decreased between these two time periods. This would support the argument that the political positions held by users are becoming narrower over time, which matches Baldassarri and Gelman’s definition of polarisation.

3.2 COP discussion in Twitter

Falkenberg et al. collected a very large corpus of tweets by querying Twitter’s Academic API v2.0 for tweets mentioning “COP2x” where x was an integer between 0 and 6 (inclusive) [33]. Details about this data can be found in their original paper. They restricted their search to tweets in English, and covered the COPs from 20 to 26 (years 2014 to 2021, with no conference held in 2020 due to the Covid-19 pandemic). Their adjacency matrix was constructed based on whether a user $i$ retweeted tweets from a political influencer $j$. Their focus was whether there was noticeable division among users based on whether they were spreading true information about climate change or disinformation from climate change denialist influencers. To test for polarisation using our framework, we built a network for each year’s COP using the same data by considering each user (identified by their unique IDs) as a node, and any mention, reply, or quote tweet between two users as an edge. The network for each year was analysed independently. We then graphed $\hat{d}$ and the SVD entropy for each year in order to show how they, and therefore the polarisation of networks that discussed the COP conferences, changed over time.

Table 2: Pointwise estimates of $\hat{d}$ and entropy based on the first 100 singular values.

COP	Dimension	Entropy
20	14	0.9802473
21	7	0.97777313
22	2	0.97607696
23	3	0.97573924
24	9	0.9748669
25	3	0.9791759
26	2	0.9754415

Table 3: Pointwise estimates of $\hat{d}$ and entropy based on the first 1000 singular values.

COP	Dimension	Entropy
20	54	0.97711855
21	47	0.9803695
22	62	0.9787037
23	52	0.9789476
24	42	0.9741172
25	78	0.9807784
26	38	0.9823967

We found that $\hat{d}$ for the network of Twitter users discussing the COP conferences has decreased over time, though not monotonically. Unexpectedly, the SVD entropy of the network did not decrease in this way; based on the first 1000 singular values (Table 3), it was at its highest for COP26 even though $\hat{d}$ was at its lowest.

Interestingly, Falkenberg et al. expected to find polarisation during COP21, due to the signing of the Paris Agreement at that conference [33]. Their Hartigan’s Dip Test for COP21 returned a significant result (p = 0.003), but they go on to claim that COP21 was not polarised despite this result. In our data, COP21 has a lower $\hat{d}$ than the conferences immediately before and after it when the first 1000 singular values are used (Table 3). It may be possible that the network as a whole became more polarised, which is captured by our data, but this effect had not yet occurred among the “influencers” that Falkenberg et al. selected. Our results support Falkenberg et al.’s suggestion that the increase in polarisation they observed was due to an increase in the prominence of anti-climate and generally far-right influencers on Twitter, since COP26 was the conference with the lowest $\hat{d}$.

3.3 Synthetic Data

We wanted to explore what happens when common understandings of polarisation are evaluated using our approach. This both allows us to test these common understandings from new angles in order to see whether they hold up, and to find out whether there are specific cases where our methods may need to have corrections applied. We use stochastic block models for these experiments because they are straightforward to model using RDPGs while also being a good match for the experiments we wish to run. We focus on experimental computation of the statistical properties rather than analytical computation because the stochastic block models are the platform for our experiments rather than objects of interest in and of themselves.

3.3.1 Engagement Between Two Groups

The first common understanding we investigate is whether polarisation decreases as engagement between two groups increases (i.e. as echo chambers are removed). Understanding the effectiveness of this anti-polarisation strategy is important for implementing it in real-world social networks. We simulated a stochastic block network of 1000 nodes, split into two equally sized groups. We varied the probability of each node forming a connection within its own group, and the probability of each node forming a connection with the other group. In particular, we simulated networks with in-group link probabilities from 0.3 to 0.45 in steps of 0.05, and between-group link probabilities of 0.1, 0.05, and 0.01. Each combination of in-group and between-group probabilities was simulated 100 times.
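The simulation design above can be sketched with a small stochastic block model sampler. This is an illustrative sketch rather than the authors' Julia code: the function name `sample_sbm` is hypothetical, edges are assumed to be undirected and drawn independently, and the example network uses 200 nodes instead of 1000 only to keep it quick to run.

```python
import random


def sample_sbm(sizes, p_in, p_between, rng):
    """Sample an undirected stochastic block model.

    sizes lists the number of nodes in each group. Two nodes in the same
    group are linked with probability p_in, and two nodes in different
    groups with probability p_between. Returns (labels, A).
    """
    labels = [group for group, size in enumerate(sizes) for _ in range(size)]
    n = len(labels)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            p = p_in if labels[i] == labels[j] else p_between
            if rng.random() < p:
                A[i][j] = A[j][i] = 1  # undirected edge (assumption)
    return labels, A


# Parameter grid from the text: 4 in-group x 3 between-group probabilities.
grid = [(p_in, p_between)
        for p_in in (0.30, 0.35, 0.40, 0.45)
        for p_between in (0.1, 0.05, 0.01)]

# One small example network with two equally sized groups.
labels, A = sample_sbm([100, 100], p_in=0.30, p_between=0.05,
                       rng=random.Random(42))
```

In the full experiment, each of the 12 parameter combinations would be sampled 100 times, and $\hat{d}$ and the SVD entropy computed for every sampled network. Unequal group sizes are obtained by changing `sizes`, for example `[990, 10]`.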

As expected, increasing the chance of connection between the two blocks increases $\hat{d}$ (and therefore decreases the polarisation of the network). The effect was consistent across all in-group link probabilities tested. This indicates that a potential social strategy to decrease polarisation could include facilitating the creation of connections between different groups.

For cases where the in-group link probability was lower, the SVD entropy decreased as the out-group link probability increased. In cases where the in-group link probability was higher, the entropy remained consistent or increased as the out-group link probability increased. As such, SVD entropy may be a less reliable indicator of polarisation than $\hat{d}$ .

3.3.2 One Group Becoming Larger

The second common understanding we tested is that polarisation decreases as one of the groups becomes much bigger than the other. This has implications for real-world cases of polarisation where one group is much larger than the other, e.g. ethnic minorities or refugee communities.

We simulated a stochastic block network of 1000 nodes, split into two groups. We used in-group link probabilities from 0.3 to 0.45 in steps of 0.05, and fixed the between-group link probability at 0.05. We varied the sizes of the two groups, progressively decreasing the size of the smaller group from 0.5 of the full network to 0.2, 0.1, and finally 0.01. Each combination of block size and in-group link probability was simulated 100 times.

We found that $\hat{d}$ increases slightly as one group becomes predominant in the network, but decreases strongly when one group is much larger than the other. This effect was consistent across all in-group link probabilities tested. The SVD entropy of the network also strongly decreased when one group was much larger than the other (99 to 1), but did not exhibit the same behaviour as $\hat{d}$ when the group was only starting to become predominant (80 to 20, and 90 to 10). At low in-group link probabilities, the SVD entropy decreased as one group became predominant; at higher in-group link probabilities, the entropy either remained stable or increased slowly as one group became predominant.

It is important to note that this low $\hat{d}$ arises because, at the greatest difference in group size, the smaller group contains only 10 nodes. This means that there is a risk of the smaller group becoming disconnected entirely from the larger group, and that even at its greatest likelihood of connecting the smaller group is only sparsely connected to the larger group. While it would be possible to redesign the experiment to focus on edge density rather than only connection probability, we think this would correspond conceptually to a change in how people in the network were making connections with each other, and we wished to keep this aspect the same in our experiment rather than varying it. We believe that the sparsity in our experiment is a useful feature, as some noted polarisations have one group that is much smaller than the other (ethnic minorities, refugee populations, and LGBT+ communities being examples). In these polarisations, effects created by one of the groups being very small are relevant: the current understanding of polarisation would suggest that there is a mathematical upper limit to how severe polarisation against a small group can become, and we suspect that this is not accurate.
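The effect of group size on the number of cross-group edges can be made concrete with a short calculation. The Python sketch below uses a hypothetical helper, `expected_edge_counts`, and takes an in-group link probability of 0.3 and a between-group link probability of 0.05 as an illustrative choice from the ranges above. It computes expected edge counts directly from the block sizes and link probabilities, with no simulation involved.

```python
from math import comb


def expected_edge_counts(n_nodes, small_fraction, p_in, p_out):
    """Expected within- and between-group edge counts for a two-block model."""
    n_small = round(n_nodes * small_fraction)
    n_large = n_nodes - n_small
    # Each unordered pair of nodes forms an edge independently.
    within = (comb(n_small, 2) + comb(n_large, 2)) * p_in
    between = n_small * n_large * p_out
    return {
        "n_small": n_small,
        "within": within,
        "between": between,
        "between_share": between / (within + between),
        # Expected ties from one large-group node to the small group.
        "cross_ties_per_large_node": n_small * p_out,
    }


balanced = expected_edge_counts(1000, 0.5, p_in=0.3, p_out=0.05)
skewed = expected_edge_counts(1000, 0.01, p_in=0.3, p_out=0.05)
print(balanced["between_share"], skewed["between_share"])
```

With these illustrative values, cross-group edges fall from roughly 14% of all expected edges in the balanced case to roughly 0.3% when the smaller group has 10 nodes, and a node in the larger group then expects only 0.5 ties to the smaller group.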

It is possible that our experiment did not decrease the group size far enough to trigger the effect expected by Esteban and Ray [30]. However, it does demonstrate that polarisation does not linearly decrease as one group becomes predominant, as was expected, and that the behaviour of the stochastic block model is more complicated.

4 Conclusions

We have demonstrated a novel method for measuring polarisation through the embedded dimensionality $\hat{d}$ of random dot-product graphs. This is a reliable and straightforward implementation of the correlation-based approach to polarisation suggested by Baldassarri and Gelman. Our method captured the presence of polarisation in all the scenarios where it was expected and had been found by other researchers, in both simulated data and real social media networks. In particular, the ability to observe the changes in $\hat{d}$ over time in data from the COP conferences provided a useful demonstration of how this method can be applied to longitudinal data. The RDPG approach also allows us to easily see that the process of polarisation is occurring in a network over time, through its embedded dimensionality reducing, rather than relying on a binary test of whether the network was polarised or non-polarised.

Another advantage of the RDPG approach is that it is computationally light; the main bottleneck is the computation of the first singular values of a large matrix, but this is well known in computer science literature and has already been strongly optimised. We found that the SVD was feasible even when used on networks with millions of nodes. Bimodality-based methods typically use SVD or correspondence analysis to determine the dimension they will test for bimodality, so our approach is at least as efficient.
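To illustrate why only the leading singular values are needed, the following Python sketch computes the top few singular values of a sparse adjacency matrix with SciPy's iterative solver `scipy.sparse.linalg.svds`, without forming a dense matrix. It is an illustration under stated assumptions, not the implementation used here: the graph is a small random one built by the hypothetical helper `random_sparse_graph`, and `top_singular_values` is likewise a hypothetical wrapper.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import svds


def random_sparse_graph(n_nodes, n_pairs, rng):
    """Build a sparse symmetric 0/1 adjacency matrix from randomly drawn node pairs."""
    rows = rng.integers(0, n_nodes, size=n_pairs)
    cols = rng.integers(0, n_nodes, size=n_pairs)
    keep = rows != cols  # drop self-loops
    ones = np.ones(int(keep.sum()))
    half = sp.coo_matrix(
        (ones, (rows[keep], cols[keep])), shape=(n_nodes, n_nodes)
    ).tocsr()
    adjacency = half + half.T
    adjacency.data[:] = 1.0  # collapse repeated pairs to a single edge
    return adjacency


def top_singular_values(adjacency, k):
    """Return the k largest singular values in descending order (requires k < n)."""
    values = svds(adjacency, k=k, return_singular_vectors=False)
    return np.sort(values)[::-1]


rng = np.random.default_rng(0)
adjacency = random_sparse_graph(n_nodes=400, n_pairs=4000, rng=rng)
top = top_singular_values(adjacency, k=5)
print(top)
```

Because the solver only needs matrix-vector products, its cost grows with the number of edges rather than with the square of the number of nodes, which is what makes the approach practical for large, sparse social networks.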

Our approach is also highly interpretable, without forcing the latent ideological distributions into an artificially unidimensional space. Rather than creating a unidimensional space and then interpreting its political meaning (such as pro- and anti-climate, or left- and right-wing), the dimensionality method instead focuses on the number of dimensions rather than what those dimensions are. In high-dimensional spaces, we do not need to know exactly what ideologies the dimensions correspond to; the important part is that they signal that there are ideological connections being made between nodes that would not be possible if the network was polarised.

The SVD entropy of the network did not relate to $\hat{d}$ as closely as we expected, though it did reflect major changes in the networks when they occurred. As such, we think it is best to use $\hat{d}$ of the network to measure its polarisation.

A major limitation of this method which could be improved is that it does not capture affective polarisation very well. Our method functionally considers any interaction between two nodes to be “good”; this means that it is not capable of capturing antagonistic interactions between nodes, and as a result it may overestimate $\hat{d}$ of the network by mistaking brief antagonistic reactions for positive social bonds. There is a great deal of scope for integrating the concept of affective polarisation into our model, through methods such as using signed matrices and classification systems such as sentiment analysis to determine whether interactions in a social network are positive, negative, or neutral before determining $\hat{d}$.

Another possible extension of this method would be to implement a nonparametric two-sample hypothesis test [47]. This would allow us to test whether two networks are significantly different, providing additional evidence that polarisation has occurred. We believe that being able to observe the embedded dimensionality of the graph alone is useful; however, we understand that sometimes a hypothesis test is demanded, and we believe this would help demonstrate that changes in the embedded dimensionality of the network are significant.

In our tests using stochastic block models, we found the expected result that $\hat{d}$ increases as the probability of edges forming between the groups increases. This supports the common belief that connections between groups decrease their polarisation. We also found the unexpected result that $\hat{d}$ does not straightforwardly increase when one group is much larger than the other, and in fact decreases greatly when one group is 100 times the size of the other. While this is partly a property of the size of the network and its sparsity, since the small group was very small and thus unable to make a large number of edges with the big group, we believe that it is an important case to have tested and that closer investigation of this scenario is warranted. We think that properties such as sparsity are important for modelling the real conditions of social networks at risk of polarisation, such as communities who have recently received refugees, and should be carefully considered in the model design to ensure that real effects are not being mathematically counteracted.

It would also be productive to experiment more with stochastic block models, since we only tested two possible scenarios. For example, this paper has only explored the two-block case; many instances of online “echo chambers” involve a large number of groups that are each hostile to people different from them, and it would be useful to see what happens to the embedded dimensionality in such cases. Similarly, our testing on group prevalence showed a decrease in embedded dimensionality when one group was 100 times the size of the other, but we did not follow the smaller group’s size all the way to zero, so we do not know whether the embedded dimensionality remains low or rebounds to a higher value.

4.1 Acknowledgements

References

[1] K. Hannah, S. Hattotuwa, Opportunism and polarisation: Presentations of the violence in Israel and Palestine by Aotearoa disinformation networks, Tech. rep., University of Auckland (2023).
[2] J. A. Tucker, A. Guess, P. Barberá, C. Vaccari, A. Siegel, S. Sanovich, D. Stukal, B. Nyhan, Social media, political polarization, and political disinformation: A review of the scientific literature (2018).
[3] J. Hebenstreit, Voter polarisation in Germany: Unpolarised Western but polarised Eastern Germany?, German Politics (2022).
[4] T. Rodon, Affective and territorial polarisation: The impact on vote choice in Spain, South European Society and Politics (2022).
[5] A. R. Kozłowski, G. Krzykowski, G. Fallon, Clustering of Polish citizens on the bases of their support for leaving and remaining the European Union, Polish Political Science Yearbook (2023).
[6] C. Teney, L. K. Rupieper, A new social conflict on globalisation-related issues in Germany? A longitudinal perspective, KZfSS Kölner Zeitschrift für Soziologie und Sozialpsychologie (2023).
[7] J. Vowles, J. Curtin, A Populist Exception?: The 2017 New Zealand General Election, ANU Press, 2020.
[8] N. Satherley, L. M. Greaves, D. Osborne, C. G. Sibley, State of the nation: trends in New Zealand voters’ polarisation from 2009–2018, Political Science (2020).
[9] S. Orellana, H. Bisgin, Using natural language processing to analyze political party manifestos from New Zealand, Information (2023).
[10] A. Tan, Using natural language processing to analyze political party manifestos from New Zealand, Preprint (2020).
URL https://ir.canterbury.ac.nz/server/api/core/bitstreams/94ea9ccd-a0ad-41ea-bffc-7d29d3ab133c/content
[11] N. Satherley, D. Osborne, C. G. Sibley, Stability and change in New Zealanders’ political party support, The New Zealand Journal of Psychology (2021).
[12] N. Satherley, D. Osborne, C. G. Sibley, Identity, ideology, and personality: Examining moderators of affective polarization in New Zealand, Journal of Research in Personality (2020).
[13] S. Stanley, C. Ng Tseung-Wong, Z. Leviston, I. Walker, Acceptance of climate change and climate refugee policy in Australia and New Zealand: The case against political polarisation, Climatic Change (2021).
[14] M. Soar, V. L. Smith, M. Dentith, D. Barnett, K. Hannah, G. V. Dalla Riva, A. Sporle, Evaluating the infodemic: assessing the prevalence and nature of COVID-19 unreliable and untrustworthy information in Aotearoa New Zealand’s social media, January-August 2020, Tech. rep., Te Pūnaha Matatini (2020).
[15] M. Dentith, Covid-19 in Aotearoa New Zealand: The darker side of paradise, in: Covid Conspiracy Theories In Global Perspective, Routledge, 2023, pp. 381–392.
[16] K. Hannah, S. Hattotuwa, K. Taylor, The murmuration of information disorders: Aotearoa New Zealand mis- and disinformation ecologies and the parliament protest, Pacific Journalism Review (2022).
[17] G. McConnell, Less te reo and fewer treaty clauses under new government, Te Ao Māori News (2023).
URL https://www.teaonews.co.nz/2023/11/25/less-te-reo-and-fewer-treaty-clauses-under-new-government/
[18] T. A. Hurihanganui, Govt moves to replace or repeal treaty principles clauses from laws, One News (2024).
URL https://www.1news.co.nz/2024/05/27/govt-moves-to-replace-or-repeal-treaty-principles-clauses-from-laws/
[19] Waitangi Tribunal, Ngā Mātāpono – The Principles: The interim report of the Tomokia Ngā Tatau o Matangireia – the Constitutional Kaupapa Inquiry Panel on the Crown’s Treaty Principles Bill and Treaty clause review policies, Tech. rep., Waitangi Tribunal (2024).
[20] K. Hannah, S. Hattotuwa, Race and rage: Examining rising anti-Māori racism and white supremacist ideologies in Aotearoa New Zealand, Tech. rep., University of Auckland (2023).
[21] M. J. Hetherington, Review Article: Putting Polarization in Perspective, British Journal of Political Science (2009).
[22] J. A. Hartigan, P. M. Hartigan, The dip test of unimodality, The Annals of Statistics (1985).
[23] E. Kopacheva, V. Yantseva, Users’ polarisation in dynamic discussion networks: The case of refugee crisis in Sweden, PLOS ONE (2022).
[24] C. M. Valensise, M. Cinelli, W. Quattrociocchi, The dynamics of online polarization, arXiv (2022).
[25] S. Iyengar, G. Sood, Y. Lelkes, Affect, Not Ideology, Public Opinion Quarterly (2012).
[26] A. Tanesini, Affective polarisation and emotional distortions on social media, Royal Institute of Philosophy Supplements (2022).
[27] M. Del Vicario, G. Vivaldo, A. Bessi, F. Zollo, A. Scala, G. Caldarelli, W. Quattrociocchi, Echo Chambers: Emotional Contagion and Group Polarization on Facebook, Scientific Reports (2016).
[28] Y. Gao, F. Liu, L. Gao, Echo chamber effects on short video platforms, Scientific Reports (2023).
[29] L. Röllicke, Polarisation, identity and affect - conceptualising affective polarisation in multi-party systems, Electoral Studies (2023).
[30] J.-M. Esteban, D. Ray, On the Measurement of Polarization, Econometrica (1994).
[31] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, E. Lefebvre, Fast unfolding of communities in large networks, Journal of Statistical Mechanics: Theory and Experiment (2008).
[32] E. Schubert, Stop using the elbow criterion for k-means, arXiv (2022).
[33] M. Falkenberg, A. Galeazzi, M. Torricelli, N. Di Marco, F. Larosa, M. Sas, A. Mekacher, W. Pearce, F. Zollo, W. Quattrociocchi, A. Baronchelli, Growing polarisation around climate change on social media, arXiv (2022).
[34] D. Baldassarri, A. Gelman, Partisans without constraint: Political polarization and trends in American public opinion, American Journal of Sociology (2008).
[35] A. C. Kozlowski, J. P. Murphy, Issue alignment and partisanship in the American public: Revisiting the ‘partisans without constraint’ thesis, Social Science Research (2021).
[36] D. DellaPosta, Pluralistic collapse: The “oil spill” model of mass opinion polarization, American Sociological Review (2020).
[37] E. Laclau, C. Mouffe, Hegemony and socialist strategy, Verso, London, 1985.
[38] S. Cotter, “The Ungovernables”, Artforum (2012).
URL https://www.artforum.com/events/the-ungovernables-192566/
[39] A. Athreya, D. E. Fishkind, M. Tang, C. E. Priebe, Y. Park, J. T. Vogelstein, K. Levin, V. Lyzinski, Y. Qin, Statistical inference on random dot product graphs: a survey, The Journal of Machine Learning Research (2017).
[40] B. Noble, Applied Linear Algebra, Pearson, 1969.
[41] M. Zhu, A. Ghodsi, Automatic dimensionality selection from the scree plot via the use of profile likelihood, Computational Statistics & Data Analysis 51 (2) (2006) 918–930.
[42] R. Gu, Y. Shao, How long the singular value decomposed entropy predicts the stock market? Evidence from the Dow Jones Industrial Average index, Physica A: Statistical Mechanics and its Applications (2016).
[43] E. Pielou, Ecological diversity, Wiley, New York, 1975.
[44] J. Bezanson, A. Edelman, S. Karpinski, V. B. Shah, Julia: A fresh approach to numerical computing, SIAM Review 59 (1) (2017) 65–98. doi:10.1137/141000671.
URL https://epubs.siam.org/doi/10.1137/141000671
[45] J. Fairbanks, M. Besançon, S. Simon, J. Hoffiman, N. Eubank, S. Karpinski, JuliaGraphs/Graphs.jl: an optimized graphs package for the Julia programming language (2021).
URL https://github.com/JuliaGraphs/Graphs.jl/
[46] Dominique, Alexis, A. Noack, JSOBot, J. Chen, MonssafToukal, tmigot, A. S. Siqueira, J. TagBot, JuliaSmoothOptimizers/PROPACK.jl: v0.5.0 (Oct. 2022). doi:10.5281/zenodo.7150606.
URL https://doi.org/10.5281/zenodo.7150606
[47] M. Tang, A. Athreya, D. L. Sussman, V. Lyzinski, C. Priebe, A nonparametric two-sample hypothesis testing problem for random graphs, Bernoulli (2017).