Data Sharing Markets
Abstract
With the growing use of distributed machine learning techniques, there is a growing need for data markets that allows agents to share data with each other. Nevertheless data has unique features that separates it from other commodities including replicability, cost of sharing, and ability to distort. We study a setup where each agent can be both buyer and seller of data. For this setup, we consider two cases: bilateral data exchange (trading data with data) and unilateral data exchange (trading data with money). We model bilateral sharing as a network formation game and show the existence of strongly stable outcome under the top agents property by allowing limited complementarity. We propose ordered match algorithm which can find the stable outcome in ( is the number of agents). For the unilateral sharing, under the assumption of additive cost structure, we construct competitive prices that can implement any social welfare maximizing outcome. Finally for this setup when agents have private information, we propose mixed-VCG mechanism which uses zero cost data distortion of data sharing with its isolated impact to achieve budget balance while truthfully implementing socially optimal outcomes to the exact level of budget imbalance of standard VCG mechanisms. Mixed-VCG uses data distortions as data money for this purpose. We further relax zero cost data distortion assumption by proposing distorted-mixed-VCG. We also extend our model and results to data sharing via incremental inquiries and differential privacy costs.
1 Introduction
The recent rapid advances in machine learning algorithms, hardware, and platforms has had a major effect on fielded applications—access to data has become the limiting constraint in many such applications. The limitation often arises because data is generated in a decentralized fashion, held privately under a variety of ownership models. There is accordingly a growing need to study distributed machine learning problems in which self-interested agents are incentivized to share their data. Designing such incentives requires understanding the costs that users incur in sharing their data, including communication and privacy costs, and trading off those costs against possible benefits from data analysis. The costs and benefits will generally be agent- and type-specific. The overall problem is a complex market-design problem that aims at incentivizing the sharing of data while achieving computational, economic, and statistical efficiencies (Agarwal et al.,, 2019; Fernandez et al.,, 2020; Liang and Madsen,, 2020).
As an example consider banks who are interested in deploying up-to-date fraud detection algorithms. Each bank has its own data on financial frauds coming from its historical record of transactions. But such data will often not have good coverage and may be out of date. Rather than relying solely on its own data, banks would ideally share data, obtaining a better model than would be possible without sharing. As another example, hospitals each have their own set of patient data; e.g. radiology data, genomic data, or epidemiological data. The data available to a single hospital may be not sufficient for accurate inference or particularly for causal inference. Indeed, the per hospital data may be biased in various ways. Ideally, hospitals would share data to mitigate these and other problems.
Data has unique features as a commodity that make data markets distinct from other markets. First, in some cases data is traded for other data (bilateral sharing) as opposed to an exchange of data for money (unilateral sharing). In bilateral sharing, the agents gain access to each others’ data. Bilateral sharing is natural in use cases in which it is hard to evaluate and attach a price to data (Mehta et al.,, 2019). Indeed, the fundamental challenge of quantifying the value of data in algorithmic predictions and decision-making is an active research area (Ghorbani and Zou,, 2019; Mehta et al.,, 2019). It is also natural when regulations do not permit the exchange of money for data; this would be the case for the sharing of patient data. Bilateral trade without money also arises in other markets. Companies exchange their pool of patents with each other because they can not evaluate and attach a price to their patent pool. Also, regulatory limits on trading commodities with money also exist for various forms of matching markets, including kidney exchange markets. Relationships, such as friendships or marriage, are other forms of markets with bilateral exchange where money is not used. Even within the context of such examples, however, data markets are distinct. In particular, data is traded often at some privacy and communication cost for agents (in contrast to patent market) and one agent can share data with multiple agents through bilateral exchanges (in contrast to organ donation or marriage markets).
Our first problem of study in this paper, in the tradition of the seminal paper of Gale and Shapley, (1962), is to find conditions under which stable outcomes exist for bilateral sharing, and can be found in polynomial time. We model bilateral sharing as a general network/graph formation game, similar to Jackson and Wolinsky, (1996) and Jackson and Van den Nouweland, (2005). This game is a more general version of hedonic games that are known not to have core stable outcomes in general (Bogomolnaia and Jackson,, 2002). We also consider the notion of strong stability. In this case, without imposing a utility structure on the agents, we find conditions under which the game has a strongly stable outcome and we design an algorithm that can find such outcomes in time, where is the number of agents. Our framework weakens some of the unrealistic assumptions that have traditionally been imposed in the literature on strongly stable outcomes (Ostrovsky,, 2008); in particular, we allow cyclicity in the trades and limited complementarity. We achieve this weakening by isolating a property—the top agent property—which defines an ordering for agents that is based on how much each agent is preferred by all other agents. Limited complementarity then arises by ensuring that no subset of agents ranked below an agent can be preferred to that agent. Our notion of top agent property has similarities to that of ordered coalitions in Bogomolnaia and Jackson, (2002), the notion of the common ranking property (Farrell and Scotchmer,, 1988; Caskurlu and Kizilkaya,, 2019) and the top coalition property (Banerjee et al.,, 2001) used in establishing the existence of core stable outcomes in hedonic games.
We also consider unilateral data sharing where agents can assess the value of data and can trade data for money. An example of this paradigm arises in the case of renewable energy forecasts (Gonçalves et al.,, 2020). Here we build on the seminal work of Shapley and Shubik, (1971), and study conditions that allow existence of competitive prices in these markets which also implement social-welfare-maximizing sharing. Our data sharing model is a one-sided, many-to-many sharing protocol, limited to trading of a single indivisible good in each directed trade. We show that if the agents’ cost of sharing is additive with respect to the agents that gain access to their data, then there exists set of competitive prices, and every social-welfare-maximizing outcome can be implemented with such competitive prices in a way that it will be a stable outcome.
Data has other unique features that influence the design of markets. In particular, the quality of data shared with other agents can be distorted upward or downward at almost zero cost, and without impact on the quality of data passed to other agents. There are several ways in which the quality of data can be distorted. For example, downward distortion can be achieved by removing part of one agent’s data when passed to the another agent, and upward distortion can be obtained by adding new data that is delivered to an agent (e.g., by generating new data by using generative adversarial networks). Alternatively, noise can be added to the data, or in more complicated setups where agents share a machine learning models, the quality of those models can be distorted. All these distortions can take place without impact on the quality of data received by other agents.
Our third problem is designing market mechanisms for unilateral data sharing when agents have private types. The market is run by a social planner. We consider Vickrey-Clarke-Groves (VCG) mechanisms (Vickrey,, 1961), which have been shown to uniquely implement social welfare at dominant strategies (Holmström,, 1979). However, VCG mechanisms are known to suffer from budget imbalance for the intermediary. We look for market mechanisms that have the following features: they implement approximate social-welfare-maximizing outcomes at dominant strategies, are individually rational, and are budget balanced. To this end, we propose a variation of the VCG mechanism which we refer to as mixed-VCG. Mixed-VCG uses both data distortion, referred to as data money, and monetary payment, to achieve budget balance while implementing social welfare via calibration to the value of budget imbalance of the standard VCG mechanism. Mixed-VCG uses downward distortion for budget surplus and upward distortion for budget deficit of the intermediary. While distorting data quality downward is practically at zero cost, for example by adding noise to the data, an upward distortion may be costly to the intermediary. To address this issue, we present a mechanism that we refer to as the distorted-mixed-VCG mechanism. Here the intermediary adds a fixed base downward distortion of data in a zero cost way, for example fixed noise added to data. Then instead of adding new data, the intermediary simply lifts controlled amounts of the fixed noise.
While our presentation focuses on the essential components of a data sharing market, we note that our model can be extended to more complicated algorithms for data sharing. For example a popular data sharing mechanism is one in which each agent only asks for outcome of an inquiry over another agent’s data and the second agent incurs a privacy cost per inquiry. We will also show that our model and results can be extended to capture this setup.
The rest of the paper is organized as follows. Section 2 reviews the relevant background literature. In Section 3, we study our first problem for existence of stable outcomes for bilateral sharing. In Section 4, we show the existence of competitive prices for unilateral sharing, and Section 5 proposes the mixed-VCG mechanism for unilateral sharing with private information. In Section 6 we extend our model and results to the case of data sharing limited to incremental inquiries over data under a restriction expressed in terms of differential privacy. We conclude the paper and discuss further research directions in Section 7.
2 Related Work
In this section we review some of the relevant literature, focusing on four main areas: an emerging literature on data markets, and three classical areas in microeconomics, specifically stable outcomes in cooperative games, competitive equilibrium, and social-welfare-maximizing mechanism design. We discuss these areas separately below.
Data markets and relevant applications: There is a growing recent literature for data markets that studies specific properties of data as a economic good [see, e.g., for a survey Mehta et al., (2019)]. The literature addresses market structures and mechanisms for sales of data and data products (e.g., ratings, predictions, recommendations, and machine learning models).
To the best of our knowledge, our paper is the first to study non-monetary mechanisms for the trade of data (bilateral data exchange). There has been, however, a growing literature on monetary data market mechanisms which we review here. The first set of papers in this area study data sharing among strategic agents/firms. These papers implement information sharing among competing agents via the use of an intermediary. In this literature the intermediary is treated as a welfare maximizer, and is limited to aggregating data from agents and sharing it with all others (see, e.g., Raith,, 1996). Bergemann and Morris, (2013) and Bergemann et al., (2018) extend this literature to the case of a rent-seeking intermediary who can also distort data. In contrast to this literature, we do not consider competition among the agents in downstream markets, and we also do not pose corresponding restricting assumptions on trade and on utility functions. We allow for bilateral trade, both with and without intermediaries. In our study, the intermediary is a social-welfare maximizer and we also allow the intermediary to add noise to the model.
The other relevant strand of the data-market literature are two-sided markets where buyers and sellers are separate. Agarwal et al., (2019) propose a market mechanism for finding prices and distributing welfare while allowing individual prices per buyer. The prices are not allowed to be seller specific. They also not consider the cost of sharing, including privacy costs. Our work extends this model to buyer-seller specific prices in a setup where buyers can also be sellers. We also consider the cost of sharing, as do Ghosh and Roth, (2011) and Ligett and Roth, (2012) who study the tradeoff with revenue gain. Acemoglu et al., (2019) discuss the externalities in data privacy.
There are several applications similar to data sharing. These can be unified under information economics. Patents have similar features to data (Lerner and Tirole,, 2004). They can be sold to multiple agents, can be combined together for higher productivity, and can be ordered in their value for consumers. However, the literature on patents assumes a negligible cost of sharing. Moreover patent quality can not be distorted by the intermediary. Ad auctions (Varian,, 2009) and online matching (Mehta,, 2013) are other relevant applications where intermediaries sell data about customers or the other side of the market. In these markets the sales of data is limited, a single ad shown to a target customer, data for each customer is sold to a single advertiser, and there is a one-to-one matching in online matching.
Stable outcomes in cooperative games: There is an extensive literature of cooperative matching games dating back to the seminal work of (Gale and Shapley,, 1962). The major question is to existence of stable outcomes and efficient (polynomial order) algorithms for finding them. The games studied differ in various ways—in the structure of the matching market (e.g., two-sided vs one-sided, one-to-one vs one-to many vs many-to-many, etc.), and in the particular notion of stability employed (strong stability, core stability, Nash stability, individual stability, contractual individual stability, etc). (Gale and Shapley,, 1962) studied two-sided games with two-sided preferences and established the existence of a core stable outcome via a deferred acceptance algorithm that runs in time . Two-sided matching games with one-sided preferences and with ownership is studied in Shapley and Scarf, (1974) where the top-trading cycle algorithm demonstrates the existence of a core stable outcome calculated in time . Finally, two-sided games without ownership are studied in Hylland and Zeckhauser, (1979), who studied a class of “random serial dictatorship” algorithms, and in Abdulkadiroğlu and Sönmez, (1998), who proposed a “core from random endowment mechanism.” While the majority of this work involves two-sided matching, Irving, (1985) studied core stable outcomes in one-sided team formation problems in the context of roommate matching. This is a special case of hedonic games where agents form disjoint partitions and have preferences only in terms of agents in their own coalition (Bogomolnaia and Jackson,, 2002). Hedonic games do not have core stable outcomes in general. Bogomolnaia and Jackson, (2002) show that hedonic game with ordered coalitions or when players have additive and separable preferences have individually stable coalitions.
Our game is more general than hedonic games since matchings are not necessarily in coalition partitions; it could be that agents A and B share data together but C only shares data with A and not B. Therefore our game is a network/graph formation game (Jackson and Wolinsky,, 1996).
Another difference between our work and existing literature is that the latter focuses on core stable outcomes, while focus on strong stability. We will henceforth refer to strong stability as “stability.” Recall that the outcome of the game is strongly stable if there is no deviating coalition of agents that can form contracts among each other, also keeping or dropping their existing contracts with agents out of the deviating coalition at will, in a way such that all members of the deviating coalition weakly prefer the new outcome and at least one of them strongly prefers the new outcome.
Most of the literature is based on two assumptions that are used to establish the existence of strong stability (Ostrovsky,, 2008): substitutability and acyclicity of trade. We relax both assumptions in this study.
Games that weaken the acyclicity assumption are studied by Jackson and Wolinsky, (1996) in the context of friendship networks. The results, however, are limited to a weak notion of pair-wise stability and to restricted value allocation for the outcomes. Jackson and Van den Nouweland, (2005) studied strong stability in network formation games but imposed strong assumptions on the structure of the value function; for example, component-wise separability. In contrast we consider strong stability and general preferences without value functions.
A line of work that relaxes complementarity is presented for large markets by Azevedo and Hatfield, (2018) but it relies on the existence of transferable (monetary payments) between agents. Pycia, (2012) study the existence of stability in two-sided matching markets but assume that preferences are pair-wise aligned in two sides. In this paper, we study a network formation game involving finite number of agents with general preferences (without utility functions). We also consider a limited complementarity property which is satisfied (for example) under the substitutability condition of Ostrovsky, (2008) but also holds more broadly.
In our work the limited complementarity property arises from the top agent property, which induces an ordering among the agents on how much they are preferred by others. Our notion of top agent property has similarities to that of “ordered coalitions” in Bogomolnaia and Jackson, (2002) as well as the notion of the “common ranking property” (Farrell and Scotchmer,, 1988; Caskurlu and Kizilkaya,, 2019) and the “top coalition property” (Banerjee et al.,, 2001) used for establishing the existence of core stable outcomes in Hedonic games.
Finally while it is NP hard to find core stable outcomes in hedonic games under the assumptions that are generally made in the literature, in our setup we we are able to derive an algorithm that finds stable outcomes in time , where is the number of agents.
Competitive equilibrium: There is an extensive literature on competitive prices. The main questions are existence of competitive prices, their structure, mechanisms that can find/form those prices, and their properties (including fairness, stability, and efficiency). Shapley and Shubik, (1971) studied two-sided matching games in which agents can have monetary transfer and show the existence of competitive prices. They also show that these prices realize core stable outcomes, and are social-welfare maximizing. The extension to the case of non-quasilinear utility is studied in Demange and Gale, (1985) and Alaei et al., (2016). Sotomayor, (1992) provide an extension to multi-partner two-sided matching markets case where each worker (firm) can have multiple but a limited number of jobs (hiring positions). Further study of many-to-one matching and many-to-many matching in two-sided matching markets is provided in Gul and Stacchetti, (1999) and Ausubel and Milgrom, (2002). Our data sharing market is not two-sided and allows multiple partners without limitation on their number. we also consider quasilinear utilities. However, we only allow one unit of an indivisible good to be traded in each directed partnership. Considering an additive structure on the utility function, we establish the existence of competitive prices for this setup, and we show that every social-welfare-maximizing outcome can be implemented with competitive prices such that it will be a stable outcome.
Social-welfare-maximizing mechanism design: There is large literature on mechanism design where agents have private information. The VCG mechanism, presented in the seminal paper of Vickrey, (1961), is the only truthful dominant mechanisms that maximizes the utilitarian social welfare under certain assumptions, including unrestricted preferences and at least three different outcomes (Holmström,, 1979). VCG mechanisms are also ex-post individually rational under monotonicity of the choice set and non-negativity of externalities. However, VCG mechanisms have major practical shortcomings. In particular, they are not individually rational and budget balanced at the same time.
Green-Laffont’s impossibility result shows that no dominant-strategy incentive-compatible mechanism is always both social-welfare maximizing and weakly budget balanced (Green and Laffont,, 1977). We extend the VCG mechanism to address these issues by leveraging the structure of the data sharing economy. We consider unique features of data, particularly the possibility of distort the quality of data for an agent at zero cost and without impact on other agents’ utilities. We then introduce a “mixed-VCG” framework that achieves approximately optimal social welfare, dominant strategy incentive compatibility, budget balance, and ex-post individual rationality.
3 Bidirectional Data Sharing: The Network Formation Game
We begin by presenting our model of the bidirectional data sharing problem, posing it as as a network formation game. Then we present our results on the existence of strongly stable outcomes for this game.
3.1 Model for bidirectional data sharing
Consider agents (e.g., banks or hospitals). Each agent generates value from data available to them (e.g., by training predictive models for fraud or cancer detection). Agent has a local data batch of size . Agents can also share data with each other, but sharing comes at some cost (e.g., a privacy cost or communication cost).
Assumption 3.1.
Upon sharing, data is reported fully and truthfully.
The truthful report of data can be justified practically by the existence of an auditing third party. Furthermore, data sharing can be unidirectional (data in exchange of money) or bidirectional (data in exchange of data).
Assumption 3.2.
(Bidirectional Sharing) Once agents and share data then both gain access to the others’ data.
Bidirectional data sharing is in the cases where it is hard to attach a monetary value to data (e.g. for banks rare fraud data) and hence data is exchanged with data, or the case that regulations prevent sales of data for money (e.g. patients data for banks).
We model the data sharing game as a Network Formation Game, based with a graph that is defined as follows.
Definition 3.1 (Sharing Graph).
In the sharing graph each node is an agent and each edge represents the sharing of data between two nodes. For bidirectional sharing, an edge between nodes and means they share data in both directions (we exclude edges from a node to itself).
If agent weakly prefers outcome graph to we denote this preference by . Strong preference is denoted by .
Our work is based on the following notion of stability.
Definition 3.2 (Stable Outcome).
A graph is stable if no coalition of agents can form a new subgraph while retaining some of their existing edges, such that all members of the coalition weakly prefer the new graph to the original graph, and at least one of them strongly prefers it.
Next, we provide structures on agents’ preferences. Denote by the set of nodes connected to including itself.
Assumption 3.3 (Preference Structure).
Agent ’s preferences over outcome graphs in are a function solely of .
Remark 3.1.
The structure in Assumption 3.3 does not allow for the modeling competition of agents in downstream markets, because each agent’s preference is independent of what data is available to himself and not the other agents.
The preference structure in Assumption 3.3 is a common assumption in the literature on network formation games and hedonic games.
3.2 Existence of stable outcomes
The network formation game does not have a stable outcome in general. We present the top agent and limited complimentarity assumptions and prove under these assumptions, stable outcomes always exist. We also show by example that relaxing any of these assumptions leads to no stable outcome.
Definition 3.3 (Total Preference Ordering).
We say agent prefers to , denoted by , if for any subset of agents with , . Agent has a total preference ordering if it has preference over every other two agents and furthermore this preference is complete, transitive, and reflexive.
Definition 3.4 (Top Agent Property).
Agents preferences satisfy the top agent property if all agents have a total preference ordering, and furthermore their preferences over other agents are the same.
Under the top agent property, the agents can be ranked from the top choice agent to the lowest preferred agent for data sharing.
Corollary 3.1.1 (Homogeneous Preference Structure).
In data sharing markets, if each agent ’s preference is only a function of the total size of shared data that includes him, , and agents’ data sizes are orderable, , then the top agent property holds.
Definition 3.5 (Limited Complementarity).
Agents have limited complementarity if for every such that then for all .
Limited complementarity argues that if adding an agent to the set decreases the preference for agent , then adding any subset of agents with less than or equal preference to will also decrease the preference for agent ; in other words, the complementarity for agent across any such subset is limited from above.
We are now ready to establish the existence of stable outcomes for network formation under the top agent and limited complementarity assumptions.
Proposition 3.1.
Proof.
We prove by constructing an algorithm. Since total preference ordering exists, without loss of generality we assume agents are sorted in their value for others, with the highest preferred and the least preferred. Consider the following algorithm which we refer to by the ordered match. Initially set to be an empty edge graph. Agent moves first (set ), swipes from to and proposes to each agent to share data together if
(1) |
Agent accepts if
(2) |
If accepts then an edge between and is added to . and are also updated correspondingly. Then moves to . After finishes his swipe, agent does the same process (set ) but passing through to and so on. We denote the outcome graph of the ordered algorithm run over the set of agents by .
To show the outcomes of the ordered match are stable, we use the following lemma and corollary.
Lemma 3.2.
All agents weakly prefer the ordered match outcome graph to the alternative graph formed from removing some of its edges.
Proof.
Denote the outcome of the ordered match by . Consider removing some edges from resulting in graph . Denote the set obtained from removing exactly of the least valuable nodes from by . It is obvious from the ordered match algorithm that . Consider a node with the set of removed connections , where is the size of . To complete the proof, we show that . To show this, we construct an algorithm that starts from and converges to by replacing the nodes such that in each step the preference of agent is non-decreasing. The replacement algorithm is in two steps which are repeated for times: (a) first extend the set by adding the least valuable node in to it to form a new set denoted by , (b) remove the least valuable node in from to form . Note that this least valuable node of always exists in , because either it has been a member of from the beginning, or otherwise it is added in step . By the total preference ordering assumption, Definition 3.3, we have . We next run steps (a) and (b) on by adding the second least valuable node in to and removing the second least valuable node of from . Again agent has non-decreasing preferences over the resulting set. Continuing this procedure times forms set . ∎
Corollary 3.2.1.
If agent does not offer to agent in ordered match algorithm—i.e., (1) does not hold—then adding any subset of agents in to strongly decreases ’s utility.
This corollary is a direct consequence of the limited complementarity assumption in Definition 3.5.
We now complete the proof of the theorem. Consider . Consider a deviating coalition of agents with agents ordered in their value with being the highest valued agent. Denote their deviating sharing graph, which consists only of nodes , by . Note that if then no agent is strongly better with the deviation due to lemma 3.2, so . Next, search for a new edge in compared to using the following ordered search algorithm: start from and swipe over to the first edge in the coalition graph that did not exist in the ordered match graph . If no such edge was found, start with and swipe over , etc, and so on, to the point that the first new edge to is found. Without loss of generality assume is weakly more valuable than , so . Since the link between and does not exist in , at least one of (1) or (2) did not hold when was swiping in the ordered match algorithm. We show in the first case strongly prefers to and in the second case strongly prefers to .
-
1.
In the first case that (1) does not hold, we start from and construct through first a replacement algorithm forming and then a replacement algorithm forming from . Both of these stages are run through steps that only weakly increase the preference. By considering different cases we show that at least one of them strongly increases the preference. We use the following corollary for our proof here.
Corollary 3.2.2.
strongly prefers all nodes in to . Also weakly prefers to all nodes in .
The corollary holds since in the case where (1) does not hold, is not connected to in or any node less valuable than . Therefore strongly prefers all nodes in , and consequently all nodes in , to . On the other hand, since was the first edge found in the ordered search algorithm, is weakly preferred by to all nodes in .
The replacement algorithm forms as follows:
-
•
For a number of repetitions equal to do
-
(a)
Consider set and replace an arbitrary node in by an arbitrary node in to form .
-
(b)
Replace with and repeat the step (a).
-
(a)
We prove that . To this end, consider two cases. In case one, which means and consequently from lemma 3.2, . In case two, which means in a way that prefers all nodes in less than or equal to . Consequently from Corollary 3.2.1, . Next, in case where , due to Corollary 3.2.2, and therefore . Finally if , considering that , it should be that which means in a way that includes nodes that are preferred by less than or equal to . Furthermore in this case . Therefore, from Corollary 3.2.1, .
-
•
-
2.
In the second case that (2) does not hold, we start from and construct through the same replacement algorithm as in the first case such that the preference of is strongly increased. We skip repetition of the details for this case.
The ordered match algorithm finishes in time since each of the agents swipes on average over other agents. ∎
Remark 3.3.
The ordered match results in a unique outcome.
Remark 3.4.
The outcome of the ordered match algorithm is stable and therefore Pareto optimal. However it is not necessarily social-welfare maximizing. Consider simple case that there are two agents and with the following utility functions: , . Under the ordered match agent does not share with agent since his utility by sharing will decrease. However, the social-welfare-maximizing outcome is for agents and to share data.
Proposition 3.1 does not hold when either top agent or limited complementarity assumption is relaxed. To see this consider the following examples.
Example 3.5 (No stable outcome without top agent property).
Consider three agents. Agent has preferences over defined by ; agent has preferences over as ; agent 3 has preferences over as . Here the top agent property does not hold. The game does not have a stable outcome.
4 Unidirectional Sharing: Competitive Prices
In this Section, we study the case where data is traded with money. To this end we first extend the model in Section 3 to include utility functions and allow money transfer. Following Shapley and Scarf, (1974), we investigate existence of competitive prices that can realize stable outcomes. We will present structures on the utility function that guarantee existence of such prices.
4.1 Model
We update the model used in Section 3. We assume the agents’ preferences from each sharing graph is measurable with a utility function. We consider the following structures for this utility function.
The sharing graph for the unidirectional sharing will be a directed graph , where a directed edge from to indicates agent sharing his data with agent .
Assumption 4.1 (Utility structure).
Agents’ preferences over outcomes can be represented with a total utility function. Agent ’s total utility function is . Moreover, is decomposed as a utility function minus a cost function :
(3) |
where is the set of agents that agent gains access to their data, and is the set of agents that gain access to agent ’s data.
We now leverage the particular features of data for machine learning algorithms to put structure on the utility functions and the cost function.
Assumption 4.2 (Additive Cost Structure).
Agent ’s cost is additive with respect to the agents who access his data:
(4) |
Remark 4.1.
The cost structure in (4) is additively decomposable. It can represent privacy cost as well as peer-to-peer communication costs. However it does not include one-time communication with an intermediary which can then send the data to the others. It also does not model differential privacy costs where there is a privacy cost per inquiry of agent with respect to agent ’s data. Extension of the model to incorporate differential privacy is considered in Section 6.
Assumption 4.3.
Agents can have monetary transfer. We denote the payment by/charge to agent with , and its quasilinear utility function by
(5) |
4.2 Existence of competitive prices
Proposition 4.1.
Proof.
We prove by construction. (i) Consider the set of prices where agent charges a price of for selling his data to agent . We show these prices are competitive (form a competitive equilibrium). The total payment of agent will be
(6) |
We determine the demand and supply of each agent . Under Assumption 4.2,
(7) | ||||
(8) |
Agent ’ demand, which is the optimal subset of agents to buy data from denoted by , is calculated by
(9) |
Agent is indifferent on the amount of supply, which is the set of agents agent shares its data with; this is because (7) is independent of . Consequently, for any demand set , , the optimal supply of agent can be determined uniquely in a way to meet the demand and clear the market:
(10) |
(ii) We prove that every social-welfare-maximizing outcome can be implemented with the competitive prices we proposed above. Start with a social-welfare-maximizing outcome and decompose it as follows:
(11) | ||||
(12) |
where the decomposition in (12) matches that of (9). This shows if every agent optimizes (9) individually, then the solution will be social welfare maximizing. To see under the proposed competitive prices, the social welfare maximizing outcomes are stable, note that from (9) every agent is gaining its maximum utility at the outcome of the social-welfare-maximizing outcome compared to any other coalition. ∎
Remark 4.2.
The results in Proposition 4.1 are based on two features in this problem which are specific to the data sharing economy: first, the sharing costs are decomposable as a sum of the sharing cost between each pair of individuals according to Eq. (4) in Assumption 4.2. Second, data can be replicated freely so there is no constraint on number of agents who buy data from an agent. This last assumption makes it possible to define market-clearing supply amounts.
Proposition 4.2 (Space of the Competitive Prices).
Competitive prices of the unidirecitonal sharing game are not unique to those presented in Proposition 4.1. For example, fixing all other prices to be , , agent can set its prices for agent to be any value in where is a price such that the solutions to the following optimization remains the same as that in (9) for all :
(13) |
5 Unilateral Sharing with Private Types: Mixed-VCG Mechanism
In this section we extend the model by assuming agents belong in type space , determining their utility from sharing. We further assume types are private information known only to the agents themselves. There also exists a social planner intermediary who runs a market for coordinating data sharing. We look for market mechanisms that have the following features: they implement approximate social-welfare-maximizing outcomes at dominant equilibrium, they are individually rational, and they are budget balanced. To this end, we propose a variation of the VCG mechanisms—mixed VCG—which uses a mix of data distortion (referred to as “data money”) and monetary payment to achieve budget balance. We will then investigate how to address communication/message complexity of the VCG mechanisms for data sharing, using specific features of data.
5.1 Model
We extend the model in Section 4.
Assumption 5.1.
Agents are from type space with distribution . Types are private information.
Note that type is a determinant of agent ’s utility and cost functions. It does not involve its private data .
Assumption 5.2.
There exists an intermediary for running the market. The intermediary is a social planner.
Furthermore, we extend the game by making the following two assumptions which capture unique features of data economy. We discuss at the end of this section how these assumptions can be relaxed.
Assumption 5.3.
[Zero Cost Data Distortion] The intermediary has the capacity to distort the quality of data passed to each agent downward or upward, at zero cost to the intermediary.
There are several ways the quality of data can be distorted. For example, downward distortion could be achieved by removing part of one agent’s data when passed to another, and upward distortion could be achieved by adding new data passed to an agent (perhaps by simulating new data from a model). Alternatively, the intermediary can add noise to data, or to the machine learning model passed to each agent. Later in this section, we will relax the assumption of zero cost distortion to the intermediary.
Assumption 5.4.
[Isolated Impact of Data Distortion] We assume that data rationing by the intermediary for data shared with one agent does not change the total utility of the other agents.
Assumption 5.3 extends the outcomes of the game from the graph in Section 3 to a weighted directed graph defined below.
Assumption 5.4 implies that other agents do not incur a change in their utility or costs from data sharing, such as privacy cost, due to rationing on distortion of another agent’s data. This allows for isolating the impact of rationing data passed to an agent only to itself.
Definition 5.1.
The outcome of a unidirectional sharing game where the intermediary can change quality of the data is a weighted directed graph . The weight of each edge from to is denoted by .
One way to interpret the weights is as the fraction of the data from passed by the intermediary to .
We also extend the utility of the agents to be explicitly a function of their type and the weighted directed graph :
(14) |
Similar to (5), is the utility from gaining access to other agents’ data, is the cost of sharing, and is the monetary payment.
Before introducing our mechanism we provide the following definition which helps in studying the properties of our mechanism.
Definition 5.2.
Agents’ social welfare is
(15) |
Total social welfare is the sum of utilities of all agents plus the payments to the intermediary:
(16) | ||||
(17) |
Remark 5.1.
in (17) is agents’ utility only from allocations (including costs imposed to the intermediary for data distortion) excluding payments. The standard classical VCG mechanism implements maximum total social welfare (often referred to simply as social welfare). VCG does not implement maximum .
5.2 Mixed-VCG mechanism
We introduce a mechanism that we refer to as Mixed-VCG to achieve approximately optimal , truthful reporting as a dominant strategy, budget balance and ex-post individual rationality Moreover, the mechanism will have the same as standard VCG.
Definition 5.3 (Mixed-VCG).
Agents report their type to the intermediary; agent ’s reported type is denoted by s. The intermediary then outputs a weighted directed sharing graph and a payment vector by taking the following steps.
First the intermediary determines a sharing graph similar to standard VCG and without any distortions,
(18) |
determines as
(19) |
where
(20) |
and calculates the total budget surplus or deficit of the standard VCG:
(21) |
The intermediary then determines each agent ’s real payment and data payment (which will be translated into distortion to agent ’s data) such that
(22) |
Without loss of generality assume . If —i.e., there is budget surplus in standard VCG—then the intermediary starts from agent and sets
(23) | ||||
(24) |
If , i.e., there is a budget deficit, then the intermediary starts from agent and sets
(25) | ||||
(26) |
Next the intermediary sets monetary payments as follows:
(27) |
and calculates data distortions, , to form the sharing graph so that the following is satisfied:
(28) |
Proposition 5.1.
The mixed-VCG mechanism is dominant truthful, budget balanced and ex-post individually rational. It implements agents’ social welfare, , whose suboptimality is exactly equal to the level of budget imbalance of standard VCG. It also implements the same as standard VCG.
Proof.
To see individual rationality and dominant truthfulness, it is sufficient to note that for the vector of reported types based on (28), all agents gain the same utility for both standard VCG and mixed-VCG. Individual rationality and dominant truthfulness then follow from the fact that these properties hold for standard VCG.
Budget balance is a direct consequence of definition of the monetary payments in mixed-VCG, which is the redistribution of the standard VCG budget imbalance across agents according to (22)-(27). This redistribution ensures .
To see the difference of the between mixed-VCG and standard VCG, from (17), (28), budget balance of the mixed-VCG mechanism which induces , and (21),
(29) | ||||
(30) | ||||
(31) |
∎
Remark 5.2.
The NM-VCG mechanism relies on isolated impact of data distortion in Assumption 5.4, the fact that the intermediary can distort the data allocations to change an individual’s utility without changing the others’. This is a property of data economy and not the case for other commodities (for example house allocation).
We now consider relaxing Assumption 5.3. While distorting data quality downward is practically at zero cost, for example by adding noise to the data, an upward distortion may be practically costly to the intermediary, for example through training a model for generating new data. To address this issue, we extend the mixed-VCG mechanism. The intermediary can add a fixed base downward distortion of data in a zero cost way, for example fixed noise added to data. Then instead of adding new data, the intermediary just lifts the fixed noise which will be at negligible cost in practice. We call this mechanism Distorted mixed-VCG (D-mixed-VCG). Formally consider to be the chosen ex-ante weights placed on the edges. Denote the class of all such sharing graphs as (the set of graphs that either does not pass his data to or gives a distorted version at weight . Then intermediary determines a sharing graph according to
(32) |
and determines as
(33) |
where
(34) |
the rest of the mechanism is analogous to the mixed VCG mechanism per (21)-(28).
D-mixed-VCG implements lower and compared to mixed-VCG because it adds distortion to the data. The proof is straightforward, and we leave it to the reader.
Remark 5.3.
We now discuss the communication complexity of the mixed-VCG mechanism for data sharing by using specific features of data as a commodity. From statistical machine learning results, the agents’ utility and costs from sharing data, and , are of certain structural forms; for example, for several machine learning methods, the marginal value of the new data for learning a model reduces at a rate of where is the size of the entire data available to the agent. Using such structural properties allows reducing the complexity of the type space, hence the communication complexity.
6 Extensions to Differential Privacy
In this section we briefly show how the model and results can be adapted to a more complicated setting, by bringing differential privacy into the picture. We relax full reporting of data in Assumption 3.1, and assume agents make queries over each others’ data, where for each query there is a differential privacy cost for the data owner. In this setup, the unit of commodity offered by each agent is the outcome of a query over its data, rather than its whole raw data. We omit the proofs of this section.
We update the model by extending the results in Section 3 to 5. First, the sharing graph in Definition 3.1 is extended with integer weights on the edges which reflect the number of queries agent runs on agent ’s data. We refer to this graph by . The cost and utility functions in Assumptions 4.2 are extended to be a function of graph and , , where , Given these updates to the model, together with Proposition 3.1, we can show that a core stable outcome exists and can be found by updating the ordered match algorithm as follows: every time an agent swipes an agent , it also proposes a set of sharing weights that are acceptable to (the benefits of sharing are greater than costs). Agent then either accepts to share by choosing from this set or rejects the sharing of data.
Next consider a cost structure in which each agent has a differential privacy cost which, along the line of (4) is additively decomposable with respect to the agents he is sharing data with:
(35) |
Then, along the lines of Proposition 4.1, there exists a set of competitive prices where each agent charges agent an amount of per inquiry. Moreover, each other agent is allowed to purchase multiple inquiries from agent .
Finally, the VCG mechanism in Section 5 can be updating by considering doubly weighted directed graphs with weights on each edge from to being the number of inquiries of on ’s data, and the distortion to quality of each inquiry. Then (36) can be extended as
(36) |
The mixed-VCG mechanism is updated to include weights determined by the intermediary.
7 Conclusions
We have studied data sharing markets for both bidirectional (data-data exchange) and unidirectional (data-money exchange) sharing. We proved existence of strongly stable outcomes for bilateral sharing via formulating the problem as a network formation game. We achieved this with limited complementarity and without posing utility structure on agents’ preferences. For unilateral sharing, we constructed competitive prices that implement socially optimal outcomes, and in the case of agents’ private information of their types, we also presented budget balanced mechanisms which truthfully implement approximately optimal outcomes.
References
- Abdulkadiroğlu and Sönmez, (1998) Abdulkadiroğlu, A. and Sönmez, T. (1998). Random serial dictatorship and the core from random endowments in house allocation problems. Econometrica, 66(3):689–701.
- Acemoglu et al., (2019) Acemoglu, D., Makhdoumi, A., Malekian, A., and Ozdaglar, A. (2019). Too much data: Prices and inefficiencies in data markets. Technical report, National Bureau of Economic Research.
- Agarwal et al., (2019) Agarwal, A., Dahleh, M., and Sarkar, T. (2019). A marketplace for data: An algorithmic solution. In Proceedings of the 2019 ACM Conference on Economics and Computation, pages 701–726.
- Alaei et al., (2016) Alaei, S., Jain, K., and Malekian, A. (2016). Competitive equilibria in two-sided matching markets with general utility functions. Operations Research, 64(3):638–645.
- Ausubel and Milgrom, (2002) Ausubel, L. M. and Milgrom, P. R. (2002). Ascending auctions with package bidding. Advances in Theoretical Economics, 1(1).
- Azevedo and Hatfield, (2018) Azevedo, E. M. and Hatfield, J. W. (2018). Existence of equilibrium in large matching markets with complementarities. Available at SSRN 3268884.
- Banerjee et al., (2001) Banerjee, S., Konishi, H., and Sönmez, T. (2001). Core in a simple coalition formation game. Social Choice and Welfare, 18(1):135–153.
- Bergemann et al., (2018) Bergemann, D., Bonatti, A., and Smolin, A. (2018). The design and price of information. American Economic Review, 108(1):1–48.
- Bergemann and Morris, (2013) Bergemann, D. and Morris, S. (2013). Robust predictions in games with incomplete information. Econometrica, 81(4):1251–1308.
- Bogomolnaia and Jackson, (2002) Bogomolnaia, A. and Jackson, M. O. (2002). The stability of hedonic coalition structures. Games and Economic Behavior, 38(2):201–230.
- Caskurlu and Kizilkaya, (2019) Caskurlu, B. and Kizilkaya, F. E. (2019). On hedonic games with common ranking property. In International Conference on Algorithms and Complexity, pages 137–148. Springer.
- Demange and Gale, (1985) Demange, G. and Gale, D. (1985). The strategy structure of two-sided matching markets. Econometrica: Journal of the Econometric Society, pages 873–888.
- Farrell and Scotchmer, (1988) Farrell, J. and Scotchmer, S. (1988). Partnerships. The Quarterly Journal of Economics, 103(2):279–297.
- Fernandez et al., (2020) Fernandez, R. C., Subramaniam, P., and Franklin, M. J. (2020). Data market platforms: Trading data assets to solve data problems. Proceedings of the VLDB Endowment, 13(12):1933–1947.
- Gale and Shapley, (1962) Gale, D. and Shapley, L. S. (1962). College admissions and the stability of marriage. The American Mathematical Monthly, 69(1):9–15.
- Ghorbani and Zou, (2019) Ghorbani, A. and Zou, J. (2019). Data shapley: Equitable valuation of data for machine learning. In International Conference on Machine Learning, pages 2242–2251. PMLR.
- Ghosh and Roth, (2011) Ghosh, A. and Roth, A. (2011). Selling privacy at auction. In Proceedings of the 12th ACM Conference on Electronic Commerce, pages 199–208.
- Gonçalves et al., (2020) Gonçalves, C., Pinson, P., and Bessa, R. J. (2020). Towards data markets in renewable energy forecasting. IEEE Transactions on Sustainable Energy, 12(1):533–542.
- Green and Laffont, (1977) Green, J. and Laffont, J.-J. (1977). Characterization of satisfactory mechanisms for the revelation of preferences for public goods. Econometrica: Journal of the Econometric Society, pages 427–438.
- Gul and Stacchetti, (1999) Gul, F. and Stacchetti, E. (1999). Walrasian equilibrium with gross substitutes. Journal of Economic Theory, 87(1):95–124.
- Holmström, (1979) Holmström, B. (1979). Groves’ scheme on restricted domains. Econometrica: Journal of the Econometric Society, pages 1137–1144.
- Hylland and Zeckhauser, (1979) Hylland, A. and Zeckhauser, R. (1979). The efficient allocation of individuals to positions. Journal of Political Economy, 87(2):293–314.
- Irving, (1985) Irving, R. W. (1985). An efficient algorithm for the “stable roommates” problem. Journal of Algorithms, 6(4):577–595.
- Jackson and Van den Nouweland, (2005) Jackson, M. O. and Van den Nouweland, A. (2005). Strongly stable networks. Games and Economic Behavior, 51(2):420–444.
- Jackson and Wolinsky, (1996) Jackson, M. O. and Wolinsky, A. (1996). A strategic model of social and economic networks. Journal of Economic Theory, 71(1):44–74.
- Lerner and Tirole, (2004) Lerner, J. and Tirole, J. (2004). Efficient patent pools. American Economic Review, 94(3):691–711.
- Liang and Madsen, (2020) Liang, A. and Madsen, E. (2020). Data and incentives. In Proceedings of the 21st ACM Conference on Economics and Computation, pages 41–42.
- Ligett and Roth, (2012) Ligett, K. and Roth, A. (2012). Take it or leave it: Running a survey when privacy comes at a cost. In International Workshop on Internet and Network Economics, pages 378–391. Springer.
- Mehta, (2013) Mehta, A. (2013). Online matching and ad allocation.
- Mehta et al., (2019) Mehta, S., Dawande, M., Janakiraman, G., and Mookerjee, V. (2019). How to sell a dataset? pricing policies for data monetization. Pricing Policies for Data Monetization (August 1, 2019).
- Ostrovsky, (2008) Ostrovsky, M. (2008). Stability in supply chain networks. American Economic Review, 98(3):897–923.
- Pycia, (2012) Pycia, M. (2012). Stability and preference alignment in matching and coalition formation. Econometrica, 80(1):323–362.
- Raith, (1996) Raith, M. (1996). A general model of information sharing in oligopoly. Journal of Economic Theory, 71(1):260–288.
- Shapley and Scarf, (1974) Shapley, L. and Scarf, H. (1974). On cores and indivisibility. Journal of Mathematical Economics, 1(1):23–37.
- Shapley and Shubik, (1971) Shapley, L. S. and Shubik, M. (1971). The assignment game i: The core. International Journal of Game Theory, 1(1):111–130.
- Sotomayor, (1992) Sotomayor, M. (1992). The multiple partners game. In Equilibrium and Dynamics, pages 322–354. Springer.
- Varian, (2009) Varian, H. R. (2009). Online ad auctions. American Economic Review, 99(2):430–34.
- Vickrey, (1961) Vickrey, W. (1961). Counterspeculation, auctions, and competitive sealed tenders. The Journal of Finance, 16(1):8–37.