Data Trading with a Monopoly Social Network
Outcomes Are Mostly Privacy-Welfare Damaging
1 Abstract
This paper argues that the data of strategic individuals with heterogeneous privacy valuations in a distributed online social network (e.g., Facebook) will be under-priced if traded in a monopoly buyer setting, leading to diminishing utilitarian welfare. This result, for a certain family of online community data trading problems, stands in stark contrast to the popular information-economics intuition that increasing the amount of end-user data signals in a data market improves its efficiency. Our proposed theory paves the way for a future (counter-intuitive) analysis of data trading oligopoly markets for online social networks (OSNs).
Keywords
distributed community, monopoly, social welfare
2 Introduction
Data of billions of online individuals are currently gathered, processed, and analyzed for personalized advertising and other online services. (Facebook alone has approximately 2.5 billion monthly active individual users.) This trend continues to accelerate with the steady growth of online apps, IoT technologies, and advanced AI/ML methodologies. It is a common and age-old notion in economics (see [1, 2, 3, 4, 5, 6, 7, 8, 9]) that sharing individual information with the demand side of an information market is beneficial for targeted customization, demand-side profit, and the growth of data-'hungry' AI/ML-controlled businesses. It has also been argued by economists [10, 11] that, because of the above-mentioned benefits that individual data brings to a market setting, a competitive market mechanism might generate too little data sharing from the supply side.
In this letter, we rigorously argue, through a counter-example for a simple application type, that this popular economic intuition does not hold in general, at least in a monopoly information market setting. More specifically, we show that for some community settings (e.g., Facebook), trading end-user data/information signals in a monopoly market leads to diminishing economic utilitarian social welfare. The intuition behind this result lies primarily in the negative externalities created by trading statistically correlated end-user signals when these heterogeneous users have varying privacy valuations of their data signals. The result stands in contrast to recent results [5, 9, 12, 13, 14, 15, 16] which intuit or prove that privacy can be detrimental to information market efficiency when, ideally, one's value of privacy is not high or one's data is only mildly correlated with others'. Specifically, we prove that information markets will be inefficient in non-ideal community settings, as formally hinted earlier in [14]. Our analysis is complete for a monopoly structure, with a major takeaway being that in practical social community settings, sub-population privacy will be jeopardized at a monopoly data trading market equilibrium. To the best of our knowledge, we are the first to mathematically dispel the traditional information economics intuition, albeit for social network settings only. Moreover, our work paves the way for a future (counter-intuitive) analysis of general community data trading oligopoly markets.
The rest of the paper is organized as follows. In Section 3, we provide an intuition, via an example, towards proving our claim. We then follow this up in Section 4 with the description of a formal monopoly market model. In Section 5, we analyze this market model and formally prove our claim. We provide illustrative examples of our theory in Section 6. We conclude the letter in Section 7.
3 Intuition
We provide an example-driven intuition that leads us to formally investigate the validity of the hypothesis that OSN user information promotes efficient data trading markets.
We focus on the widely publicized Cambridge Analytica scandal. The company acquired private information of millions of individuals from data shared by 270,000 Facebook users who voluntarily downloaded an app for mapping personality traits, called This is your digital life. The app accessed users' news feeds, timelines, posts, and messages, and revealed information about other Facebook users. The company was ultimately able to infer valuable information about more than 50 million Facebook users, which it deployed for designing personalized political messages and advertising in the Brexit referendum and the 2016 US presidential election. This scandal highlighted two important facets: (i) the private information (e.g., behavior, habits, preferences) of users who are part of an online social community such as Facebook is correlated, so leaking some users' data reveals such information about other users whose data is not leaked (the habits and preferences of a highly educated gay person from a particular locality are informative about others with the same profile residing in the same area); and (ii) once it is openly publicized that valuable user information has been breached to satisfy external objectives, users are often miffed, resulting in a huge social uproar, as happened in the case of the Cambridge Analytica scandal. These observations motivated our skepticism regarding the popular economic notion that more data implies increased information market efficiency. It could also be that trading data in return for incentives in such community settings might not sit well with users' privacy preferences (especially likely in scenarios of social uproar following publicly known data breaches, if not in cases where breaches go unnoticed), consequently hampering societal welfare (see Section 4 for a definition).
To state our intuition in a relatively more formal manner (courtesy of [15]), consider a community platform with two users, i = 1, 2. Each user owns her own personal data, which we represent with a random variable X_i (from the viewpoint of the platform). The relevant data of the two users are related, which we capture by assuming that X_1 and X_2 are jointly normally distributed with mean zero and correlation coefficient \rho. The community platform can acquire or buy the data of a user in order to better estimate her preferences or actions. Its objective is to minimize the mean square error of its estimates of user types, or equivalently, to maximize the amount of leaked information about them. Suppose that the valuation (in monetary terms) of the platform for the users' leaked information is one, while the value that the first user attaches to her privacy, again in terms of leaked information about her, is v_1 < 1, and for the second user it is v_2. We also assume that the platform makes take-it-or-leave-it offers to the users to purchase their data. In the absence of any restrictions on data markets or transaction costs, the first user will always sell her data (because her valuation of privacy, v_1, is less than the value of information to the platform, 1). But given the correlation between the types of the two users, this implies that the platform will already have a fairly good estimate of the second user's information. Suppose, for illustration, that \rho is close to 1. In this case, the platform will know almost everything relevant about user 2 from user 1's data, and this undermines the willingness of user 2 to protect her data. In fact, since user 1 is revealing almost everything about her, user 2 would be willing to sell her own data for a very low price (approximately 0 given \rho \approx 1). But once the second user is selling her data, this also reveals the first user's data, so the first user can only charge a very low price for her data.
Therefore, in this simple example, the community platform will be able to acquire both users' data at approximately zero price. Critically, however, this price does not reflect the users' valuation of privacy. When v_2 < 1, the equilibrium is efficient because data sharing is socially beneficial in this case (even if data externalities change the distribution of economic surplus between the platform and users). However, it can be arbitrarily inefficient when v_2 is sufficiently high. This is because the first user, by selling her data, is creating a negative externality on the second user.
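The effect of correlation in this two-user example can be checked numerically. The following sketch is our own construction (not code from [15]); unit variances and \rho = 0.99 are assumed. It computes the reduction in the MSE of the platform's Bayes estimate of each X_i when only user 1 sells her noisy signal S_1 = X_1 + Z_1:

```python
import numpy as np

def leaked_info(Sigma, shared):
    """Breached information I_i: reduction in MSE of the Bayes estimate
    of each X_i, given signals S_j = X_j + Z_j (Z_j ~ N(0,1), indep.)
    from the users j with shared[j] = 1."""
    n = Sigma.shape[0]
    idx = [j for j in range(n) if shared[j]]
    if not idx:
        return np.zeros(n)
    SA = Sigma[np.ix_(idx, idx)] + np.eye(len(idx))  # Cov of shared signals
    cross = Sigma[:, idx]                            # Cov(X_i, S_shared)
    # Quadratic form cross @ inv(SA) @ cross^T, diagonal entries only
    return np.einsum('ij,jk,ik->i', cross, np.linalg.inv(SA), cross)

rho = 0.99
Sigma = np.array([[1.0, rho], [rho, 1.0]])

# Only user 1 shares: how much is revealed about each user?
I_only1 = leaked_info(Sigma, [1, 0])
print(I_only1)  # I_1 = 1/2, I_2 = rho^2 / 2
```

With \rho close to 1, user 1's signal reveals almost as much about user 2 (\rho^2/2 \approx 0.49) as about user 1 herself (1/2), which is precisely why user 2's willingness to protect (or price) her own data collapses.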
4 System Model
A simple example such as the one above clearly provides an intuition regarding the inefficiency of information trading in community settings with heterogeneous privacy valuations. In this section, en route to generalizing the validity (or invalidity) of our intuition, we propose a monopoly information trading market model (reproduced from [15], whose notation we retain for consistency) consisting of n platform users and a profit-maximizing community platform (e.g., Facebook).
We consider n community users represented by the set N = \{1, \dots, n\}. Each user i \in N has a type, denoted by x_i, which is a realization of a random variable X_i. We assume that the vector of random variables X = (X_1, \dots, X_n) has a joint normal distribution N(0, \Sigma), where \Sigma is the publicly known covariance matrix of X. Let \Sigma_{ij} designate the (i, j)-th entry of \Sigma, and \sigma_i^2 = \Sigma_{ii} denote the variance of individual i's type. Each user i has some personal data, S_i, which is informative about its type, i.e., the 'DNA' that drives the user's tastes (for example, based on her past behavior, preferences, or contacts). We suppose that S_i = X_i + Z_i, where Z_i is an independent random variable with standard normal distribution, i.e., Z_i \sim N(0, 1) (a modeling choice that has taken various forms in the information privacy literature [17]). For any user joining the community platform, the platform can derive additional revenue (e.g., due to benefits of targeted advertising) if it can predict the user's type. We simply assume that the community platform's revenue from each user is a decreasing function of the mean square error of its forecast of the user's type, minus what the platform pays to users to acquire their information. More specifically, the objective of the platform is to minimize
\sum_{i \in N} \Big( \mathbb{E}\big[(X_i - \hat{X}_i(S_a))^2\big] - \sigma_i^2 + p_i \Big),    (1)
where S_a is the vector of data the platform acquires, \hat{X}_i(S_a) is the platform's estimate of user i's type given this information, \sigma_i^2 is included as a convenient normalization, and p_i denotes the payment (be it explicit or implicit) to user i from the platform for her data, equal to zero for users whose data is not acquired (we ignore for simplicity any other transaction costs incurred by the platform).
Users value their privacy, which we also model in a reduced-form manner as a function of the same mean square error, reflecting both pecuniary and non-pecuniary motives. (For example, a user may receive a greater consumer surplus when the platform knows less about her, or she may have a genuine demand for keeping her preferences, behavior, and information private. There may also be political and social reasons for privacy, e.g., concealing dissident activities or behaviors disapproved of by some groups.) We assume, specifically, that user i's value of privacy is v_i \ge 0 and her payoff is
p_i + v_i \Big( \mathbb{E}\big[(X_i - \hat{X}_i(S_a))^2\big] - \sigma_i^2 \Big).
This expression, and its comparison with objective (1), clarifies that the platform and users have potentially opposing preferences over information about user types. We have again subtracted \sigma_i^2 as a normalization, which ensures that if the platform acquires no additional information about the user and makes no payment to her, her payoff is zero. Critically, users with v_i < 1 value their privacy less than the valuation that the platform attaches to information about them, and thus reducing the mean square error of the estimates of their types is socially beneficial. In contrast, users with v_i > 1 value their privacy more, and reducing their mean square error is socially costly. In settings without data externalities (where data about one user has no relevance to information about other users - an example being collection agencies not gathering address locations), the first group of users should allow the platform to acquire (buy) their data, while the second group should not. A simple market mechanism based on prices for data can implement this efficient outcome, in accordance with the traditional economic notion that more information implies better market efficiency. However, the situation can be very different in the presence of data externalities (e.g., online community settings such as Facebook).
A key notion for our analysis is breached information, which captures the reduction in the mean square error of the platform's estimate of the type of a user. When the platform has no information about user i, its estimate satisfies \hat{X}_i = \mathbb{E}[X_i] = 0, with mean square error \sigma_i^2. As the platform receives data from this and other users, its estimate improves and the mean square error declines. The notion of breached information captures this reduction in mean square error (MSE). Specifically, let a_i \in \{0, 1\} denote the data sharing action of user i, with a_i = 1 corresponding to sharing. Denote the profile of sharing decisions by a = (a_1, \dots, a_n) and the decisions of agents other than i by a_{-i}. We also use the notation S_a to denote the data of all individuals j for whom a_j = 1, i.e., S_a = (S_j)_{j : a_j = 1}. Given a profile of actions a, the breached information of (or about) user i is the reduction in the MSE of the best estimator of the type of user i:
I_i(a) = \sigma_i^2 - \min_{\hat{X}_i} \mathbb{E}\big[(X_i - \hat{X}_i(S_a))^2\big].
Notably, because of data externalities, breached information about user i depends not just on her own decision but also on the sharing actions taken by all users. With this notion at hand, we can write the payoff of user i, given the price vector p = (p_1, \dots, p_n), as
u_i(a, p) = a_i p_i - v_i I_i(a),
where, recall, v_i is user i's value of privacy. We also express the monopoly platform's payoff more compactly as
U(a, p) = \sum_{i \in N} I_i(a) - \sum_{i \in N} a_i p_i.    (2)
An action profile a of the strategic users and a price vector p for the users constitute a pure strategy equilibrium of the user-platform game if both the users and the community platform maximize their payoffs given the other players' strategies. More formally, we define an equilibrium of this game as a Stackelberg equilibrium in which the monopoly platform chooses the price vector recognizing the user equilibrium that will result following this choice.
Definition 1
Given the price vector p, an action profile a = (a_i, a_{-i}) is a user equilibrium if, for all i \in N and all a_i' \in \{0, 1\}, u_i(a_i, a_{-i}, p) \ge u_i(a_i', a_{-i}, p).
We denote the set of user equilibria at a given price vector p by A(p). A pair (p^E, a^E) of price and action vectors is a pure strategy Stackelberg equilibrium if a^E \in A(p^E) and there is no profitable deviation for the platform, i.e., U(a^E, p^E) \ge U(a, p) for every price vector p and every a \in A(p).
In what follows, we refer to a pure strategy Stackelberg equilibrium simply as an equilibrium.
We now characterize two important properties of the breached information function I_i(\cdot):
1. Monotonicity: for two action profiles a and a' with a \ge a' (componentwise), I_i(a) \ge I_i(a') for all i \in N.
2. Submodularity: for two profiles a_{-i} and a'_{-i} with a_{-i} \ge a'_{-i}, I_i(1, a_{-i}) - I_i(0, a_{-i}) \le I_i(1, a'_{-i}) - I_i(0, a'_{-i}).
The monotonicity property states that as the set of community users who share their information expands, the breached information about each user (weakly) increases. This is an intuitive consequence of the fact that more information always facilitates the estimation problem of the platform and reduces the mean square error of its estimates. More important for the rest of our analysis is the submodularity property, which implies that the marginal increase in the breached information from individual i's sharing decision is decreasing in the information shared by others. This too is intuitive and follows from the fact that when others' actions reveal more information, there is less to be revealed by the sharing decision of any given individual. Thus, from the celebrated result due to Topkis [18], for any price vector p, the set A(p) is a complete lattice, and thus has a least and a greatest element. This implies that the set of user equilibria is always non-empty.
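Both properties are theorems (proven in [15]), so they can be sanity-checked by exhaustive enumeration on a small instance. The sketch below is our own verification code under an arbitrary (seeded) positive-definite covariance; it checks monotonicity and the submodular marginal condition over all profiles of three users:

```python
import itertools
import numpy as np

def leaked_info(Sigma, a):
    """Breached information I_i(a): reduction in MSE of the Bayes
    estimate of each X_i given signals S_j = X_j + Z_j from sharers."""
    idx = [j for j in range(Sigma.shape[0]) if a[j]]
    if not idx:
        return np.zeros(Sigma.shape[0])
    SA = Sigma[np.ix_(idx, idx)] + np.eye(len(idx))
    cross = Sigma[:, idx]
    return np.einsum('ij,jk,ik->i', cross, np.linalg.inv(SA), cross)

rng = np.random.default_rng(0)
B = rng.normal(size=(3, 3))
Sigma = B @ B.T + np.eye(3)          # random positive-definite covariance

profiles = list(itertools.product([0, 1], repeat=3))
I = {a: leaked_info(Sigma, a) for a in profiles}

mono, sub = True, True
for a in profiles:
    for b in profiles:
        if all(x >= y for x, y in zip(a, b)):       # a >= b componentwise
            mono &= bool(np.all(I[a] >= I[b] - 1e-9))
            for i in range(3):
                # marginal breached info from i sharing, at a_{-i} vs b_{-i}
                flip = lambda p, bit: tuple(bit if j == i else p[j]
                                            for j in range(3))
                d_a = I[flip(a, 1)][i] - I[flip(a, 0)][i]
                d_b = I[flip(b, 1)][i] - I[flip(b, 0)][i]
                sub &= bool(d_a <= d_b + 1e-9)
print(mono, sub)  # True True
```

Enumeration over all componentwise-comparable profile pairs confirms both properties up to numerical tolerance.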
5 Monopoly Market Analysis
En route to analyzing the market welfare generated by the aforementioned game setting, we first define the benchmark first-best welfare outcome as the data sharing decisions that maximize utilitarian social welfare (social surplus), given by the sum of the payoffs of the platform and the users. The social surplus (SoS) from an action profile a is
SoS(a) = \sum_{i \in N} (1 - v_i) I_i(a).
Note that prices do not appear in this expression because they are transfers from the community platform to users. The first-best action profile, a^W, maximizes this expression. The following theorem (built on [15]) characterizes the first-best action profile.
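The cancellation of prices can be made explicit by summing the platform payoff (2) and the user payoffs (a one-line check in the notation above):

```latex
\mathrm{SoS}(a)
  \;=\; \underbrace{\sum_{i\in N} I_i(a) \;-\; \sum_{i\in N} a_i p_i}_{\text{platform payoff }U(a,p)}
  \;+\; \underbrace{\sum_{i\in N} \bigl(a_i p_i \;-\; v_i I_i(a)\bigr)}_{\text{user payoffs }\sum_i u_i(a,p)}
  \;=\; \sum_{i\in N} (1 - v_i)\, I_i(a).
```

The payment terms a_i p_i cancel exactly, leaving only the externality-weighted breached information.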
Theorem 1
Implication - To understand this result, consider first the case in which there are no data externalities, so that the covariance terms in the condition of Theorem 1 are zero except \Sigma_{ii}; the left-hand side then reduces to (1 - v_i) times the information leaked by user i's own data. This yields a_i^W = 1 if and only if v_i \le 1 (thus a no-externality setting becomes mathematically equivalent to the case where no sharing user values her privacy more than the platform does). The situation is different in the presence of data externalities, because now the covariance terms are non-zero. In this case, an individual should optimally share her data only if it does not reveal too much about users with v_j > 1. Note here that the covariance matrix \Sigma can be robustly estimated from publicly observed values, as dependencies are usually preserved under the addition of noise within a threshold.
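The no-externality rule a_i^W = 1 iff v_i \le 1 can be confirmed by brute force over all action profiles. This is our own sketch; the valuations v = (0.5, 0.8, 1.5) and the diagonal covariance are assumed for illustration:

```python
import itertools
import numpy as np

def leaked_info(Sigma, a):
    """Breached information I_i(a) under noisy signals S_j = X_j + Z_j."""
    idx = [j for j in range(Sigma.shape[0]) if a[j]]
    if not idx:
        return np.zeros(Sigma.shape[0])
    SA = Sigma[np.ix_(idx, idx)] + np.eye(len(idx))
    cross = Sigma[:, idx]
    return np.einsum('ij,jk,ik->i', cross, np.linalg.inv(SA), cross)

def social_surplus(Sigma, v, a):
    # SoS(a) = sum_i (1 - v_i) * I_i(a)
    return float(np.dot(1 - v, leaked_info(Sigma, a)))

v = np.array([0.5, 0.8, 1.5])        # assumed privacy valuations
Sigma = np.diag([1.0, 2.0, 1.0])     # diagonal: no data externalities

best = max(itertools.product([0, 1], repeat=3),
           key=lambda a: social_surplus(Sigma, v, a))
print(best)  # (1, 1, 0): exactly the users with v_i < 1 share
```

With a diagonal \Sigma each user's term a_i (1 - v_i) \sigma_i^4 / (\sigma_i^2 + 1) is independent of the others, so the maximizer shares exactly the users with v_i < 1; introducing off-diagonal covariance couples these terms and can overturn this rule.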
In this section, we adopt the more realistic assumption that, to start with, the monopoly platform does not know the exact privacy valuations of users in a community (in contrast to assumptions made in existing works such as [19]); these valuations are private to the users, but the platform knows that they are drawn from a cumulative distribution function F with density f (and upper support \bar{v}). We allow the platform to design a pricing mechanism that elicits the true privacy valuations from the users (as Step 1 of the economy), somewhat similar to the seminal Vickrey-Clarke-Groves (VCG) mechanism with a minor variation. More specifically, for any user i \in N, the price offered to user i (as Step 2 of the economy) is equal to the surplus of all other users on the platform when user i is present minus the surplus when user i is absent. We consequently have the following result.
Theorem 2
(due to [15]) Let v = (v_1, \dots, v_n) be the reported vector of values of privacy. Then the non-negative pricing scheme of Step 2, which pays each user the externality her presence exerts on the surplus of the rest of the economy, incentivizes users to report their value of privacy truthfully.
Definition 2
An equilibrium is a pair of functions (p(\cdot), a(\cdot)) of the reported valuations such that each user reports its true value of privacy and the expected payoff of the platform is maximized.
We now have the following theorem characterizing the equilibrium of the monopoly market setting.
Theorem 3
Moreover, all users report truthfully; the expected payoff of the platform then involves a non-decreasing function of the reported valuations representing the additional rent that a user will capture in incentive-compatible mechanisms.
Implication - The theorem guarantees the existence of a unique monopoly market equilibrium at which community platform users report their true valuations. A sufficient condition for this rent function to be non-decreasing is that the reversed hazard rate f/F be non-increasing. This requirement is satisfied for a variety of distributions, such as the uniform and the exponential [20].
We now investigate whether the reachable market equilibrium is efficient. We have the following result, via [15], in this regard.
Theorem 4
(due to [15])
1. Suppose high-value users (those with v_i > 1) are uncorrelated with all other users, and every low-value user has a value of privacy less than one. Then the market equilibrium is efficient.
2. Suppose some high-value users are correlated with low-value users. Then there exists a valuation threshold such that, when the privacy values of these high-value users exceed it, the market equilibrium is inefficient.
3. Suppose every high-value user is uncorrelated with all users whose value of privacy is less than one, but users in a nonempty subset of the remaining low-value users are correlated with at least one high-value user. Then there exist thresholds on the correlation and the privacy value such that if both are exceeded for some such user, the market equilibrium is inefficient.
4. Suppose every high-value user is uncorrelated with all low-value users and at least one high-value user is correlated with another high-value user. Let H' denote the subset of high-value users correlated with at least one other high-value user. Then for each user in H' there exists a threshold such that if her value of privacy exceeds it, the market equilibrium is inefficient.
5. The social surplus at market equilibrium, SoS(a^E), under the truth-eliciting mechanism is at least as large as that obtained for any v (either known truthfully or otherwise).
Implication - The theorem provides the conditions under which the market equilibrium in a monopoly community information trading setting is utilitarian-welfare (in)efficient. Note that the market efficiency results in the theorem are conservative in the sense that we assume user privacy valuations are unknown in the worst case, and the platform does its best to elicit true valuation responses. Inefficiency in this setting would imply inefficiency in the case where community user valuations are untruthful (see point #5 in the theorem). According to the points in the theorem, the information trading market is efficient only when high-value users (those with v_i > 1) are uncorrelated with all other users - something practically rare to achieve - and low-value users have values less than one (note here that low-value users always have values below the high-value threshold but could have values greater than 1). Failing either condition leads to incentive compatibility constraints preventing an efficient allocation. In all other cases, the information trading market is inefficient (SoS at equilibrium is not optimal), and the extent of inefficiency depends on whether high-value users are correlated with low-value users with values greater or less than one.
6 Examples
In this section, we provide numerical examples (as in [15]) to lucidly illustrate (a) the existence of a data trading market equilibrium, and (b) the social surplus (SoS) zone at market equilibrium.
Example 1. Suppose there are two users, 1 and 2, with a given covariance matrix and a common value of privacy. At a suitably chosen price vector, both action profiles (1, 1) and (0, 0) are user equilibria. This is a consequence of the submodularity of the leaked information function: when user 1 shares her data, she is also revealing a lot about user 2, making it less costly for user 2 to share her data. Conversely, when user 1 does not share, this encourages user 2 not to share. Despite this multiplicity of user equilibria, there exists a unique (Stackelberg) equilibrium for this game, given by a price vector and the all-share profile (1, 1). This uniqueness follows because the platform can choose the price vector to encourage both users to share. The next example suggests that though there may be multiple equilibria in the Stackelberg game, all of them yield the same payoff for the community platform.
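The covariance and price values of this example did not survive extraction, so the sketch below uses assumed numbers (\sigma^2 = 1, \rho = 0.8, v_1 = v_2 = 1, p = (0.4, 0.4), our own choices) that exhibit the same multiplicity of user equilibria:

```python
import itertools
import numpy as np

def leaked_info(Sigma, a):
    """Breached information I_i(a) under noisy signals S_j = X_j + Z_j."""
    idx = [j for j in range(Sigma.shape[0]) if a[j]]
    if not idx:
        return np.zeros(Sigma.shape[0])
    SA = Sigma[np.ix_(idx, idx)] + np.eye(len(idx))
    cross = Sigma[:, idx]
    return np.einsum('ij,jk,ik->i', cross, np.linalg.inv(SA), cross)

def is_user_equilibrium(Sigma, v, p, a):
    """No user gains by flipping her decision, given
    u_i = a_i * p_i - v_i * I_i(a)."""
    for i in range(len(a)):
        dev = tuple(1 - a[j] if j == i else a[j] for j in range(len(a)))
        u_now = a[i] * p[i] - v[i] * leaked_info(Sigma, a)[i]
        u_dev = dev[i] * p[i] - v[i] * leaked_info(Sigma, dev)[i]
        if u_dev > u_now + 1e-9:
            return False
    return True

rho = 0.8
Sigma = np.array([[1.0, rho], [rho, 1.0]])
v, p = (1.0, 1.0), (0.4, 0.4)

eqa = [a for a in itertools.product([0, 1], repeat=2)
       if is_user_equilibrium(Sigma, v, p, a)]
print(eqa)  # [(0, 0), (1, 1)]
```

At this price, sharing is a best response only when the other user already shares (her data is then largely revealed anyway), so both the all-share and no-share profiles survive, exactly as submodularity predicts.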
Example 2. Suppose there are three users with the same value of privacy and the same variance, v_i = v and \sigma_i^2 = \sigma^2 for i = 1, 2, 3, and let all off-diagonal entries of \Sigma be equal. Any action profile where two out of three users share their information is an equilibrium, and thus there are three distinct equilibria. But it is straightforward to verify that they all yield the same payoff to the platform.
The following example illustrates the social welfare zone at market equilibrium with variations in the correlation coefficient and privacy valuation for high-value community users.
Example 3. We consider a setting with two communities, each of size 10. Suppose that all users in community 1 are low-value and have a value of privacy equal to 0.9, while all users in community 2 are high-value (with a common value of privacy greater than 1). We also take the variances of all user data to be 1, the correlation between any two users who belong to the same community to be 1/20, and the correlation between any two users who belong to different communities to be \rho. Equilibrium surplus can then be examined as a function of \rho and the high-value users' privacy valuation.
Two points are worth noting from this example. First, relatively small values of the correlation coefficient \rho are sufficient for the social surplus to be negative. Second, when the high-value users' privacy valuation is very close to 1, the social surplus is always positive, because the negative surplus from high-value users is compensated by the social benefits their data sharing creates for low-value users.
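The first point can be illustrated directly on the breached-information side (the high-value users' exact valuation is elided in the source, so we only examine leakage). In this sketch, when only the low-value community shares, the information breached about each high-value user is \rho^2 \cdot \mathbf{1}^\top (\Sigma_1 + I)^{-1} \mathbf{1} = 10\rho^2 / 2.45, which grows quadratically in the cross-community correlation; the \rho values are assumed:

```python
import numpy as np

def leaked_info(Sigma, a):
    """Breached information I_i(a) under noisy signals S_j = X_j + Z_j."""
    idx = [j for j in range(Sigma.shape[0]) if a[j]]
    if not idx:
        return np.zeros(Sigma.shape[0])
    SA = Sigma[np.ix_(idx, idx)] + np.eye(len(idx))
    cross = Sigma[:, idx]
    return np.einsum('ij,jk,ik->i', cross, np.linalg.inv(SA), cross)

def two_community_cov(n1, n2, within, cross):
    """Unit-variance covariance: `within` correlation inside each
    community, `cross` correlation across communities."""
    n = n1 + n2
    Sigma = np.full((n, n), cross)
    Sigma[:n1, :n1] = within
    Sigma[n1:, n1:] = within
    np.fill_diagonal(Sigma, 1.0)
    return Sigma

n1 = n2 = 10
a = (1,) * n1 + (0,) * n2        # only low-value community 1 shares

for rho in (0.1, 0.3):
    Sigma = two_community_cov(n1, n2, 0.05, rho)
    I_high = leaked_info(Sigma, a)[n1:]   # breached info about community 2
    print(rho, I_high[0])
```

Even at \rho = 0.3, each non-sharing high-value user already leaks about 0.37 of her unit variance through the other community's data; with a privacy valuation above 1, each unit of such leakage enters the social surplus negatively, which is how modest correlations drive the surplus below zero.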
7 Conclusion
In this paper, we mathematically argued, using recent developments in [15], that social community data trading is not economically welfare efficient in a monopoly market setting, going against the popular economic philosophy/intuition that increased amounts of end-user data signals in a market improves utilitarian social welfare. The primary reason behind our result is the significant negative externality (via user signal correlations) generated by privacy breaches in the information market that cannot be cancelled out via market equilibrium prices handed over to the users for their information.
8 Acknowledgement
This work was supported by NSF grants CNS-1616575, CNS-1939006, and CNS-2012001, and by ARO grant W911NF1810208.
References
- [1] Richard Posner. The right of privacy. Georgia Law Review, 12(3), 1978.
- [2] Richard Posner. The economics of privacy. American Economic Review, 71(2), 1981.
- [3] George Stigler. An introduction to privacy in economics and politics. Journal of Legal Studies, 9(4), 1980.
- [4] Kenneth C. Laudon. Markets and privacy. Commun. ACM, 39(9):92–104, September 1996.
- [5] Alessandro Acquisti, Curtis Taylor, and Liad Wagman. The economics of privacy. Journal of Economic Literature, 54(2):442–92, 2016.
- [6] Andrew Odlyzko. Privacy, economics, and price discrimination on the internet. In Economics of Information Security (Eds. L. Jean Camp and Stephen Lewis). Springer, 2004.
- [7] Pamela Samuelson. Privacy as intellectual property? Stanford law review, pages 1125–1173, 2000.
- [8] Paul M Schwartz. Property, privacy, and personal data. Harv. L. Rev., 117:2056, 2003.
- [9] Eric A Posner and E Glen Weyl. Radical markets: Uprooting capitalism and democracy for a just society. Princeton University Press, 2018.
- [10] Hal R Varian. Economic aspects of personal privacy. In Internet policy and economics, pages 101–109. Springer, 2009.
- [11] Maryam Farboodi, Roxana Mihet, Thomas Philippon, and Laura Veldkamp. Big data and firm dynamics. In AEA papers and proceedings, volume 109, pages 38–42, 2019.
- [12] Ranjan Pal and Jon Crowcroft. Privacy trading in the surveillance capitalism age: viewpoints on 'privacy-preserving' societal value creation. ACM SIGCOMM Computer Communication Review, 49(3):26–31, 2019.
- [13] Ryan Calo. Privacy and markets: a love story. Notre Dame L. Rev., 91:649, 2015.
- [14] Ranjan Pal, Jon Crowcroft, Yixuan Wang, Yong Li, Swades De, Sasu Tarkoma, Mingyan Liu, Bodhibrata Nag, Abhishek Kumar, and Pan Hui. Preference-based privacy markets. IEEE Access, 8:146006–146026, 2020.
- [15] Daron Acemoglu, Ali Makhdoumi, Azarakhsh Malekian, and Asuman Ozdaglar. Too much data: Prices and inefficiencies in data markets. Technical report, National Bureau of Economic Research, 2019.
- [16] Nikolaos Laoutaris. Why online services should pay you for your data? the arguments for a human-centric data economy. IEEE Internet Computing, 23(5):29–35, 2019.
- [17] Anand D Sarwate and Kamalika Chaudhuri. Signal processing and machine learning with differential privacy: Algorithms and challenges for continuous data. IEEE signal processing magazine, 30(5):86–94, 2013.
- [18] Donald M Topkis. Minimizing a submodular function on a lattice. Operations research, 26(2):305–321, 1978.
- [19] Weina Wang, Lei Ying, and Junshan Zhang. The value of privacy: Strategic data subjects, incentive mechanisms and fundamental limits. In ACM SIGMETRICS Performance Evaluation Review, volume 44, pages 249–260. ACM, 2016.
- [20] Marco Burkschat and Nuria Torrado. On the reversed hazard rate of sequential order statistics. Statistics & Probability Letters, 85:106–113, 2014.