This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Data Sharing Markets

Mohammad Rasouli, Michael Jordan
Abstract

With the growing use of distributed machine learning techniques, there is a growing need for data markets that allows agents to share data with each other. Nevertheless data has unique features that separates it from other commodities including replicability, cost of sharing, and ability to distort. We study a setup where each agent can be both buyer and seller of data. For this setup, we consider two cases: bilateral data exchange (trading data with data) and unilateral data exchange (trading data with money). We model bilateral sharing as a network formation game and show the existence of strongly stable outcome under the top agents property by allowing limited complementarity. We propose ordered match algorithm which can find the stable outcome in O(N2)O(N^{2}) (NN is the number of agents). For the unilateral sharing, under the assumption of additive cost structure, we construct competitive prices that can implement any social welfare maximizing outcome. Finally for this setup when agents have private information, we propose mixed-VCG mechanism which uses zero cost data distortion of data sharing with its isolated impact to achieve budget balance while truthfully implementing socially optimal outcomes to the exact level of budget imbalance of standard VCG mechanisms. Mixed-VCG uses data distortions as data money for this purpose. We further relax zero cost data distortion assumption by proposing distorted-mixed-VCG. We also extend our model and results to data sharing via incremental inquiries and differential privacy costs.

1 Introduction

The recent rapid advances in machine learning algorithms, hardware, and platforms has had a major effect on fielded applications—access to data has become the limiting constraint in many such applications. The limitation often arises because data is generated in a decentralized fashion, held privately under a variety of ownership models. There is accordingly a growing need to study distributed machine learning problems in which self-interested agents are incentivized to share their data. Designing such incentives requires understanding the costs that users incur in sharing their data, including communication and privacy costs, and trading off those costs against possible benefits from data analysis. The costs and benefits will generally be agent- and type-specific. The overall problem is a complex market-design problem that aims at incentivizing the sharing of data while achieving computational, economic, and statistical efficiencies (Agarwal et al.,, 2019; Fernandez et al.,, 2020; Liang and Madsen,, 2020).

As an example consider banks who are interested in deploying up-to-date fraud detection algorithms. Each bank has its own data on financial frauds coming from its historical record of transactions. But such data will often not have good coverage and may be out of date. Rather than relying solely on its own data, banks would ideally share data, obtaining a better model than would be possible without sharing. As another example, hospitals each have their own set of patient data; e.g. radiology data, genomic data, or epidemiological data. The data available to a single hospital may be not sufficient for accurate inference or particularly for causal inference. Indeed, the per hospital data may be biased in various ways. Ideally, hospitals would share data to mitigate these and other problems.

Data has unique features as a commodity that make data markets distinct from other markets. First, in some cases data is traded for other data (bilateral sharing) as opposed to an exchange of data for money (unilateral sharing). In bilateral sharing, the agents gain access to each others’ data. Bilateral sharing is natural in use cases in which it is hard to evaluate and attach a price to data (Mehta et al.,, 2019). Indeed, the fundamental challenge of quantifying the value of data in algorithmic predictions and decision-making is an active research area (Ghorbani and Zou,, 2019; Mehta et al.,, 2019). It is also natural when regulations do not permit the exchange of money for data; this would be the case for the sharing of patient data. Bilateral trade without money also arises in other markets. Companies exchange their pool of patents with each other because they can not evaluate and attach a price to their patent pool. Also, regulatory limits on trading commodities with money also exist for various forms of matching markets, including kidney exchange markets. Relationships, such as friendships or marriage, are other forms of markets with bilateral exchange where money is not used. Even within the context of such examples, however, data markets are distinct. In particular, data is traded often at some privacy and communication cost for agents (in contrast to patent market) and one agent can share data with multiple agents through bilateral exchanges (in contrast to organ donation or marriage markets).

Our first problem of study in this paper, in the tradition of the seminal paper of Gale and Shapley, (1962), is to find conditions under which stable outcomes exist for bilateral sharing, and can be found in polynomial time. We model bilateral sharing as a general network/graph formation game, similar to Jackson and Wolinsky, (1996) and Jackson and Van den Nouweland, (2005). This game is a more general version of hedonic games that are known not to have core stable outcomes in general (Bogomolnaia and Jackson,, 2002). We also consider the notion of strong stability. In this case, without imposing a utility structure on the agents, we find conditions under which the game has a strongly stable outcome and we design an algorithm that can find such outcomes in O(N2)O(N^{2}) time, where NN is the number of agents. Our framework weakens some of the unrealistic assumptions that have traditionally been imposed in the literature on strongly stable outcomes (Ostrovsky,, 2008); in particular, we allow cyclicity in the trades and limited complementarity. We achieve this weakening by isolating a property—the top agent property—which defines an ordering for agents that is based on how much each agent is preferred by all other agents. Limited complementarity then arises by ensuring that no subset of agents ranked below an agent can be preferred to that agent. Our notion of top agent property has similarities to that of ordered coalitions in Bogomolnaia and Jackson, (2002), the notion of the common ranking property (Farrell and Scotchmer,, 1988; Caskurlu and Kizilkaya,, 2019) and the top coalition property (Banerjee et al.,, 2001) used in establishing the existence of core stable outcomes in hedonic games.

We also consider unilateral data sharing where agents can assess the value of data and can trade data for money. An example of this paradigm arises in the case of renewable energy forecasts (Gonçalves et al.,, 2020). Here we build on the seminal work of Shapley and Shubik, (1971), and study conditions that allow existence of competitive prices in these markets which also implement social-welfare-maximizing sharing. Our data sharing model is a one-sided, many-to-many sharing protocol, limited to trading of a single indivisible good in each directed trade. We show that if the agents’ cost of sharing is additive with respect to the agents that gain access to their data, then there exists set of competitive prices, and every social-welfare-maximizing outcome can be implemented with such competitive prices in a way that it will be a stable outcome.

Data has other unique features that influence the design of markets. In particular, the quality of data shared with other agents can be distorted upward or downward at almost zero cost, and without impact on the quality of data passed to other agents. There are several ways in which the quality of data can be distorted. For example, downward distortion can be achieved by removing part of one agent’s data when passed to the another agent, and upward distortion can be obtained by adding new data that is delivered to an agent (e.g., by generating new data by using generative adversarial networks). Alternatively, noise can be added to the data, or in more complicated setups where agents share a machine learning models, the quality of those models can be distorted. All these distortions can take place without impact on the quality of data received by other agents.

Our third problem is designing market mechanisms for unilateral data sharing when agents have private types. The market is run by a social planner. We consider Vickrey-Clarke-Groves (VCG) mechanisms (Vickrey,, 1961), which have been shown to uniquely implement social welfare at dominant strategies (Holmström,, 1979). However, VCG mechanisms are known to suffer from budget imbalance for the intermediary. We look for market mechanisms that have the following features: they implement approximate social-welfare-maximizing outcomes at dominant strategies, are individually rational, and are budget balanced. To this end, we propose a variation of the VCG mechanism which we refer to as mixed-VCG. Mixed-VCG uses both data distortion, referred to as data money, and monetary payment, to achieve budget balance while implementing social welfare via calibration to the value of budget imbalance of the standard VCG mechanism. Mixed-VCG uses downward distortion for budget surplus and upward distortion for budget deficit of the intermediary. While distorting data quality downward is practically at zero cost, for example by adding noise to the data, an upward distortion may be costly to the intermediary. To address this issue, we present a mechanism that we refer to as the distorted-mixed-VCG mechanism. Here the intermediary adds a fixed base downward distortion of data in a zero cost way, for example fixed noise added to data. Then instead of adding new data, the intermediary simply lifts controlled amounts of the fixed noise.

While our presentation focuses on the essential components of a data sharing market, we note that our model can be extended to more complicated algorithms for data sharing. For example a popular data sharing mechanism is one in which each agent only asks for outcome of an inquiry over another agent’s data and the second agent incurs a privacy cost per inquiry. We will also show that our model and results can be extended to capture this setup.

The rest of the paper is organized as follows. Section 2 reviews the relevant background literature. In Section 3, we study our first problem for existence of stable outcomes for bilateral sharing. In Section 4, we show the existence of competitive prices for unilateral sharing, and Section 5 proposes the mixed-VCG mechanism for unilateral sharing with private information. In Section 6 we extend our model and results to the case of data sharing limited to incremental inquiries over data under a restriction expressed in terms of differential privacy. We conclude the paper and discuss further research directions in Section 7.

2 Related Work

In this section we review some of the relevant literature, focusing on four main areas: an emerging literature on data markets, and three classical areas in microeconomics, specifically stable outcomes in cooperative games, competitive equilibrium, and social-welfare-maximizing mechanism design. We discuss these areas separately below.

Data markets and relevant applications: There is a growing recent literature for data markets that studies specific properties of data as a economic good [see, e.g., for a survey Mehta et al., (2019)]. The literature addresses market structures and mechanisms for sales of data and data products (e.g., ratings, predictions, recommendations, and machine learning models).

To the best of our knowledge, our paper is the first to study non-monetary mechanisms for the trade of data (bilateral data exchange). There has been, however, a growing literature on monetary data market mechanisms which we review here. The first set of papers in this area study data sharing among strategic agents/firms. These papers implement information sharing among competing agents via the use of an intermediary. In this literature the intermediary is treated as a welfare maximizer, and is limited to aggregating data from agents and sharing it with all others (see, e.g., Raith,, 1996). Bergemann and Morris, (2013) and Bergemann et al., (2018) extend this literature to the case of a rent-seeking intermediary who can also distort data. In contrast to this literature, we do not consider competition among the agents in downstream markets, and we also do not pose corresponding restricting assumptions on trade and on utility functions. We allow for bilateral trade, both with and without intermediaries. In our study, the intermediary is a social-welfare maximizer and we also allow the intermediary to add noise to the model.

The other relevant strand of the data-market literature are two-sided markets where buyers and sellers are separate. Agarwal et al., (2019) propose a market mechanism for finding prices and distributing welfare while allowing individual prices per buyer. The prices are not allowed to be seller specific. They also not consider the cost of sharing, including privacy costs. Our work extends this model to buyer-seller specific prices in a setup where buyers can also be sellers. We also consider the cost of sharing, as do Ghosh and Roth, (2011) and Ligett and Roth, (2012) who study the tradeoff with revenue gain. Acemoglu et al., (2019) discuss the externalities in data privacy.

There are several applications similar to data sharing. These can be unified under information economics. Patents have similar features to data (Lerner and Tirole,, 2004). They can be sold to multiple agents, can be combined together for higher productivity, and can be ordered in their value for consumers. However, the literature on patents assumes a negligible cost of sharing. Moreover patent quality can not be distorted by the intermediary. Ad auctions (Varian,, 2009) and online matching (Mehta,, 2013) are other relevant applications where intermediaries sell data about customers or the other side of the market. In these markets the sales of data is limited, a single ad shown to a target customer, data for each customer is sold to a single advertiser, and there is a one-to-one matching in online matching.

Stable outcomes in cooperative games: There is an extensive literature of cooperative matching games dating back to the seminal work of (Gale and Shapley,, 1962). The major question is to existence of stable outcomes and efficient (polynomial order) algorithms for finding them. The games studied differ in various ways—in the structure of the matching market (e.g., two-sided vs one-sided, one-to-one vs one-to many vs many-to-many, etc.), and in the particular notion of stability employed (strong stability, core stability, Nash stability, individual stability, contractual individual stability, etc). (Gale and Shapley,, 1962) studied two-sided games with two-sided preferences and established the existence of a core stable outcome via a deferred acceptance algorithm that runs in time O(n2)O(n^{2}). Two-sided matching games with one-sided preferences and with ownership is studied in Shapley and Scarf, (1974) where the top-trading cycle algorithm demonstrates the existence of a core stable outcome calculated in time O(n2)O(n^{2}). Finally, two-sided games without ownership are studied in Hylland and Zeckhauser, (1979), who studied a class of “random serial dictatorship” algorithms, and in Abdulkadiroğlu and Sönmez, (1998), who proposed a “core from random endowment mechanism.” While the majority of this work involves two-sided matching, Irving, (1985) studied core stable outcomes in one-sided team formation problems in the context of roommate matching. This is a special case of hedonic games where agents form disjoint partitions and have preferences only in terms of agents in their own coalition (Bogomolnaia and Jackson,, 2002). Hedonic games do not have core stable outcomes in general. Bogomolnaia and Jackson, (2002) show that hedonic game with ordered coalitions or when players have additive and separable preferences have individually stable coalitions.

Our game is more general than hedonic games since matchings are not necessarily in coalition partitions; it could be that agents A and B share data together but C only shares data with A and not B. Therefore our game is a network/graph formation game (Jackson and Wolinsky,, 1996).

Another difference between our work and existing literature is that the latter focuses on core stable outcomes, while focus on strong stability. We will henceforth refer to strong stability as “stability.” Recall that the outcome of the game is strongly stable if there is no deviating coalition of agents that can form contracts among each other, also keeping or dropping their existing contracts with agents out of the deviating coalition at will, in a way such that all members of the deviating coalition weakly prefer the new outcome and at least one of them strongly prefers the new outcome.

Most of the literature is based on two assumptions that are used to establish the existence of strong stability (Ostrovsky,, 2008): substitutability and acyclicity of trade. We relax both assumptions in this study.

Games that weaken the acyclicity assumption are studied by Jackson and Wolinsky, (1996) in the context of friendship networks. The results, however, are limited to a weak notion of pair-wise stability and to restricted value allocation for the outcomes. Jackson and Van den Nouweland, (2005) studied strong stability in network formation games but imposed strong assumptions on the structure of the value function; for example, component-wise separability. In contrast we consider strong stability and general preferences without value functions.

A line of work that relaxes complementarity is presented for large markets by Azevedo and Hatfield, (2018) but it relies on the existence of transferable (monetary payments) between agents. Pycia, (2012) study the existence of stability in two-sided matching markets but assume that preferences are pair-wise aligned in two sides. In this paper, we study a network formation game involving finite number of agents with general preferences (without utility functions). We also consider a limited complementarity property which is satisfied (for example) under the substitutability condition of Ostrovsky, (2008) but also holds more broadly.

In our work the limited complementarity property arises from the top agent property, which induces an ordering among the agents on how much they are preferred by others. Our notion of top agent property has similarities to that of “ordered coalitions” in Bogomolnaia and Jackson, (2002) as well as the notion of the “common ranking property” (Farrell and Scotchmer,, 1988; Caskurlu and Kizilkaya,, 2019) and the “top coalition property” (Banerjee et al.,, 2001) used for establishing the existence of core stable outcomes in Hedonic games.

Finally while it is NP hard to find core stable outcomes in hedonic games under the assumptions that are generally made in the literature, in our setup we we are able to derive an algorithm that finds stable outcomes in time O(N2)O(N^{2}), where NN is the number of agents.

Competitive equilibrium: There is an extensive literature on competitive prices. The main questions are existence of competitive prices, their structure, mechanisms that can find/form those prices, and their properties (including fairness, stability, and efficiency). Shapley and Shubik, (1971) studied two-sided matching games in which agents can have monetary transfer and show the existence of competitive prices. They also show that these prices realize core stable outcomes, and are social-welfare maximizing. The extension to the case of non-quasilinear utility is studied in Demange and Gale, (1985) and Alaei et al., (2016). Sotomayor, (1992) provide an extension to multi-partner two-sided matching markets case where each worker (firm) can have multiple but a limited number of jobs (hiring positions). Further study of many-to-one matching and many-to-many matching in two-sided matching markets is provided in Gul and Stacchetti, (1999) and Ausubel and Milgrom, (2002). Our data sharing market is not two-sided and allows multiple partners without limitation on their number. we also consider quasilinear utilities. However, we only allow one unit of an indivisible good to be traded in each directed partnership. Considering an additive structure on the utility function, we establish the existence of competitive prices for this setup, and we show that every social-welfare-maximizing outcome can be implemented with competitive prices such that it will be a stable outcome.

Social-welfare-maximizing mechanism design: There is large literature on mechanism design where agents have private information. The VCG mechanism, presented in the seminal paper of Vickrey, (1961), is the only truthful dominant mechanisms that maximizes the utilitarian social welfare under certain assumptions, including unrestricted preferences and at least three different outcomes (Holmström,, 1979). VCG mechanisms are also ex-post individually rational under monotonicity of the choice set and non-negativity of externalities. However, VCG mechanisms have major practical shortcomings. In particular, they are not individually rational and budget balanced at the same time.

Green-Laffont’s impossibility result shows that no dominant-strategy incentive-compatible mechanism is always both social-welfare maximizing and weakly budget balanced (Green and Laffont,, 1977). We extend the VCG mechanism to address these issues by leveraging the structure of the data sharing economy. We consider unique features of data, particularly the possibility of distort the quality of data for an agent at zero cost and without impact on other agents’ utilities. We then introduce a “mixed-VCG” framework that achieves approximately optimal social welfare, dominant strategy incentive compatibility, budget balance, and ex-post individual rationality.

3 Bidirectional Data Sharing: The Network Formation Game

We begin by presenting our model of the bidirectional data sharing problem, posing it as as a network formation game. Then we present our results on the existence of strongly stable outcomes for this game.

3.1 Model for bidirectional data sharing

Consider NN agents (e.g., banks or hospitals). Each agent generates value from data available to them (e.g., by training predictive models for fraud or cancer detection). Agent nn has a local data batch DnD_{n} of size dn=|Dn|d_{n}=|D_{n}|. Agents can also share data with each other, but sharing comes at some cost (e.g., a privacy cost or communication cost).

Assumption 3.1.

Upon sharing, data is reported fully and truthfully.

The truthful report of data can be justified practically by the existence of an auditing third party. Furthermore, data sharing can be unidirectional (data in exchange of money) or bidirectional (data in exchange of data).

Assumption 3.2.

(Bidirectional Sharing) Once agents nn and mm share data then both gain access to the others’ data.

Bidirectional data sharing is in the cases where it is hard to attach a monetary value to data (e.g. for banks rare fraud data) and hence data is exchanged with data, or the case that regulations prevent sales of data for money (e.g. patients data for banks).

We model the data sharing game as a Network Formation Game, based with a graph GG that is defined as follows.

Definition 3.1 (Sharing Graph).

In the sharing graph GG each node is an agent and each edge represents the sharing of data between two nodes. For bidirectional sharing, an edge between nodes nn and mm means they share data in both directions (we exclude edges from a node to itself).

If agent nn weakly prefers outcome graph G1G_{1} to G2G_{2} we denote this preference by G1iG2G_{1}\succcurlyeq_{i}G_{2}. Strong preference is denoted by G1iG2G_{1}\succ_{i}G_{2}.

Our work is based on the following notion of stability.

Definition 3.2 (Stable Outcome).

A graph GG is stable if no coalition of agents can form a new subgraph while retaining some of their existing edges, such that all members of the coalition weakly prefer the new graph to the original graph, and at least one of them strongly prefers it.

Next, we provide structures on agents’ preferences. Denote by SnS_{n} the set of nodes connected to nn including nn itself.

Assumption 3.3 (Preference Structure).

Agent nn’s preferences over outcome graphs in 𝒢\mathcal{G} are a function solely of SnS_{n}.

Remark 3.1.

The structure in Assumption 3.3 does not allow for the modeling competition of agents in downstream markets, because each agent’s preference is independent of what data is available to himself and not the other agents.

The preference structure in Assumption 3.3 is a common assumption in the literature on network formation games and hedonic games.

The graph formation game above is similar to that of Jackson and Wolinsky, (1996) and Jackson and Van den Nouweland, (2005), with the distinction that in our case agents do not have a utility function, which makes the game more general.

3.2 Existence of stable outcomes

The network formation game does not have a stable outcome in general. We present the top agent and limited complimentarity assumptions and prove under these assumptions, stable outcomes always exist. We also show by example that relaxing any of these assumptions leads to no stable outcome.

Definition 3.3 (Total Preference Ordering).

We say agent kk prefers ii to jj, denoted by ikji\succ_{k}j, if for any subset of agents SkS_{k} with {i,j}Sk=\{i,j\}\cap S_{k}=\emptyset, Sk{i}kSk{j}S_{k}\cup\{i\}\succ_{k}S_{k}\cup\{j\}. Agent kk has a total preference ordering if it has preference over every other two agents and furthermore this preference is complete, transitive, and reflexive.

Definition 3.4 (Top Agent Property).

Agents preferences satisfy the top agent property if all agents have a total preference ordering, and furthermore their preferences over other agents are the same.

Under the top agent property, the agents can be ranked from the top choice agent to the lowest preferred agent for data sharing.

Corollary 3.1.1 (Homogeneous Preference Structure).

In data sharing markets, if each agent nn’s preference is only a function of the total size of shared data that includes him, kSndk\sum_{k\in S_{n}}d_{k}, and agents’ data sizes are orderable, d1>d2>>dNd_{1}>d_{2}>...>d_{N}, then the top agent property holds.

Definition 3.5 (Limited Complementarity).

Agents have limited complementarity if for every k,ik,i such that SkkSk{i}S_{k}\succ_{k}S_{k}\cup\{i\} then SkkSkSS_{k}\succ_{k}S_{k}\cup S^{\prime} for all S{i,i+1,.,N}S^{\prime}\subset\{i,i+1,....,N\}.

Limited complementarity argues that if adding an agent ii to the set SkS_{k} decreases the preference for agent kk, then adding any subset of agents with less than or equal preference to kk will also decrease the preference for agent kk; in other words, the complementarity for agent kk across any such subset is limited from above.

We are now ready to establish the existence of stable outcomes for network formation under the top agent and limited complementarity assumptions.

Proposition 3.1.

Under top agent assumption in Definition 3.3 and limited complementarity assumption in Definition 3.5, there exists a stable outcome that can be found in time O(N2)O(N^{2}) for the network formation game.

Proof.

We prove by constructing an algorithm. Since total preference ordering exists, without loss of generality we assume agents are sorted in their value for others, with 11 the highest preferred and NN the least preferred. Consider the following algorithm which we refer to by the ordered match. Initially set GG to be an empty edge graph. Agent 11 moves first (set n=1n=1), swipes \ell from 22 to NN and proposes to each agent \ell to share data together if

Sn{}nSn,\displaystyle S_{n}\cup\{\ell\}\succcurlyeq_{n}S_{n}, (1)

Agent \ell accepts if

S{n}S.\displaystyle S_{\ell}\cup\{n\}\succcurlyeq_{\ell}S_{\ell}. (2)

If \ell accepts then an edge between n=1n=1 and \ell is added to GG. SnS_{n} and SS_{\ell} are also updated correspondingly. Then 11 moves to +1\ell+1. After 11 finishes his swipe, agent 22 does the same process (set n=2n=2) but passing through 33 to NN and so on. We denote the outcome graph of the ordered algorithm run over the set of agents SS by Ord(S)Ord(S).

To show the outcomes of the ordered match are stable, we use the following lemma and corollary.

Lemma 3.2.

All agents weakly prefer the ordered match outcome graph to the alternative graph formed from removing some of its edges.

Proof.

Denote the outcome of the ordered match by GG. Consider removing some edges from GG resulting in graph GG^{\prime}. Denote the set obtained from removing exactly ii of the least valuable nodes from SmS_{m} by SmiS_{m}^{-i}. It is obvious from the ordered match algorithm that SmmSmiS_{m}\succcurlyeq_{m}S_{m}^{-i}. Consider a node mm with the set of removed connections Rm:=Sm\SmR^{\prime}_{m}:=S_{m}\backslash S^{\prime}_{m}, where |Rm||R^{\prime}_{m}| is the size of RmR^{\prime}_{m}. To complete the proof, we show that SmmSm|Rm|S^{\prime}_{m}\succcurlyeq_{m}S_{m}^{-|R^{\prime}_{m}|}. To show this, we construct an algorithm that starts from SmS^{\prime}_{m} and converges to Sm|Rm|S_{m}^{-|R^{\prime}_{m}|} by replacing the nodes such that in each step the preference of agent mm is non-decreasing. The replacement algorithm is in two steps which are repeated for |Rm||R^{\prime}_{m}| times: (a) first extend the set SmS^{\prime}_{m} by adding the least valuable node in RmR^{\prime}_{m} to it to form a new set denoted by Sm,extS^{\prime}_{m,ext}, (b) remove the least valuable node in SmS_{m} from Sm,extS^{\prime}_{m,ext} to form Sm′′S^{\prime\prime}_{m}. Note that this least valuable node of SmS_{m} always exists in Sm,extS^{\prime}_{m,ext}, because either it has been a member of SmS^{\prime}_{m} from the beginning, or otherwise it is added in step (a)(a). By the total preference ordering assumption, Definition 3.3, we have Sm′′mSmS^{\prime\prime}_{m}\succcurlyeq_{m}S^{\prime}_{m}. We next run steps (a) and (b) on Sm′′S^{\prime\prime}_{m} by adding the second least valuable node in RmR_{m} to Sm′′S^{\prime\prime}_{m} and removing the second least valuable node of SmS_{m} from Sm′′S^{\prime\prime}_{m}. Again agent mm has non-decreasing preferences over the resulting set. Continuing this procedure |Rm||R_{m}| times forms set Sm|Rm|S_{m}^{-|R_{m}|}. ∎

Corollary 3.2.1.

If agent ii does not offer to agent jj in ordered match algorithm—i.e., (1) does not hold—then adding any subset of agents in {j,j+1,,N}\{j,j+1,...,N\} to SiS_{i} strongly decreases ii’s utility.

This corollary is a direct consequence of the limited complementarity assumption in Definition 3.5.

We now complete the proof of the theorem. Consider G=Ord(S)G=Ord(S). Consider a deviating coalition S={1,2,,N}S^{\prime}=\{1^{\prime},2^{\prime},...,N^{\prime}\} of agents with agents ordered in their value with 11^{\prime} being the highest valued agent. Denote their deviating sharing graph, which consists only of nodes S={1,2,,N}S^{\prime}=\{1^{\prime},2^{\prime},...,N^{\prime}\}, by GG^{\prime}. Note that if GGG^{\prime}\subset G then no agent is strongly better with the deviation due to lemma 3.2, so GGG^{\prime}\not\subset G. Next, search for a new edge in GG^{\prime} compared to GG using the following ordered search algorithm: start from 11^{\prime} and swipe over 2,3,2^{\prime},3^{\prime},... to the first edge in the coalition graph GG^{\prime} that did not exist in the ordered match graph GG. If no such edge was found, start with 22^{\prime} and swipe over 33^{\prime}, 44^{\prime} etc, and so on, to the point that the first new edge ii^{\prime} to jj^{\prime} is found. Without loss of generality assume ii^{\prime} is weakly more valuable than jj^{\prime}, so i<ji^{\prime}<j^{\prime}. Since the link between ii^{\prime} and jj^{\prime} does not exist in GG, at least one of (1) or (2) did not hold when ii^{\prime} was swiping jj^{\prime} in the ordered match algorithm. We show in the first case ii^{\prime} strongly prefers GG to GG^{\prime} and in the second case jj^{\prime} strongly prefers GG to GG^{\prime}.

  1. 1.

    In the first case that (1) does not hold, we start from SiS^{\prime}_{i^{\prime}} and construct SiS_{i^{\prime}} through first a replacement algorithm forming SirepS^{rep}_{i^{\prime}} and then a replacement algorithm forming SiS_{i^{\prime}} from SirepS^{rep}_{i^{\prime}}. Both of these stages are run through steps that only weakly increase the ii^{\prime} preference. By considering different cases we show that at least one of them strongly increases the ii^{\prime} preference. We use the following corollary for our proof here.

    Corollary 3.2.2.

    ii^{\prime} strongly prefers all nodes in Si\(SiSi)S_{i^{\prime}}\backslash(S_{i^{\prime}}\cap S^{\prime}_{i^{\prime}}) to jj^{\prime}. Also ii^{\prime} weakly prefers jj^{\prime} to all nodes in Si\(SiSi)S^{\prime}_{i^{\prime}}\backslash(S_{i^{\prime}}\cap S^{\prime}_{i}).

    The corollary holds since in the case where (1) does not hold, ii^{\prime} is not connected to jj^{\prime} in GG or any node less valuable than jj^{\prime}. Therefore ii^{\prime} strongly prefers all nodes in SiS_{i^{\prime}}, and consequently all nodes in Si\(SiSi)S_{i^{\prime}}\backslash(S_{i^{\prime}}\cap S^{\prime}_{i^{\prime}}), to jj^{\prime}. On the other hand, since iji^{\prime}j^{\prime} was the first edge found in the ordered search algorithm, jj^{\prime} is weakly preferred by ii^{\prime} to all nodes in Si\(SiSi)S_{i^{\prime}}\backslash(S_{i^{\prime}}\cap S_{i}).

    The replacement algorithm forms SirepS^{rep}_{i^{\prime}} as follows:

    • For a number of repetitions equal to min{|Si\(SiSi)|,|Si\(SiSi)|}\min\{|S_{i^{\prime}}\backslash(S^{\prime}_{i^{\prime}}\cap S_{i^{\prime}})|,|S^{\prime}_{i^{\prime}}\backslash(S^{\prime}_{i^{\prime}}\cap S_{i^{\prime}})|\} do

      1. (a)

        Consider set SiS^{\prime}_{i^{\prime}} and replace an arbitrary node in Si\(SiSi)S^{\prime}_{i^{\prime}}\backslash(S_{i^{\prime}}\cap S^{\prime}_{i^{\prime}}) by an arbitrary node in Si\(SiSi)S_{i^{\prime}}\backslash(S_{i^{\prime}}\cap S^{\prime}_{i^{\prime}}) to form Si′′S^{\prime\prime}_{i^{\prime}}.

      2. (b)

        Replace SiS^{\prime}_{i^{\prime}} with Si′′S^{\prime\prime}_{i^{\prime}} and repeat the step (a).

    We prove that SiiSirepS_{i^{\prime}}\succcurlyeq_{i^{\prime}}S^{rep}_{i^{\prime}}. To this end, consider two cases. In case one, |Si\(SiSi)||Si\(SiSi)||S_{i^{\prime}}\backslash(S_{i^{\prime}}\cap S_{i})|\geq|S^{\prime}_{i^{\prime}}\backslash(S^{\prime}_{i^{\prime}}\cap S_{i})| which means SirepSiS^{rep}_{i^{\prime}}\subset S_{i^{\prime}} and consequently from lemma 3.2, SiiSirepS_{i^{\prime}}\succcurlyeq_{i^{\prime}}S^{rep}_{i^{\prime}}. In case two, |Si\(SiSi)|<|Si\(SiSi)||S_{i^{\prime}}\backslash(S^{\prime}_{i^{\prime}}\cap S_{i^{\prime}})|<|S^{\prime}_{i^{\prime}}\backslash(S^{\prime}_{i^{\prime}}\cap S_{i^{\prime}})| which means SiSirepS_{i^{\prime}}\subset S^{rep}_{i^{\prime}} in a way that ii^{\prime} prefers all nodes in Sirep§iS^{rep}_{i^{\prime}}\S_{i^{\prime}} less than or equal to jj^{\prime}. Consequently from Corollary 3.2.1, SiiSirepS_{i^{\prime}}\succ_{i^{\prime}}S^{rep}_{i^{\prime}}. Next, in case where min{|Si\(SiSi)|,|Si\(SiSi)|}>0\min\{|S_{i^{\prime}}\backslash(S^{\prime}_{i^{\prime}}\cap S_{i^{\prime}})|,|S^{\prime}_{i^{\prime}}\backslash(S^{\prime}_{i^{\prime}}\cap S_{i^{\prime}})|\}>0, due to Corollary 3.2.2, SirepiSiS^{rep}_{i^{\prime}}\succ_{i^{\prime}}S^{\prime}_{i^{\prime}} and therefore SiiSiS_{i^{\prime}}\succ_{i^{\prime}}S^{\prime}_{i^{\prime}}. Finally if min{|Si\(SiSi)|,|Si\(SiSi)|}=0\min\{|S_{i^{\prime}}\backslash(S^{\prime}_{i^{\prime}}\cap S_{i^{\prime}})|,|S^{\prime}_{i^{\prime}}\backslash(S^{\prime}_{i^{\prime}}\cap S_{i^{\prime}})|\}=0, considering that j|Si\(SiSi)|j^{\prime}\in|S^{\prime}_{i^{\prime}}\backslash(S^{\prime}_{i^{\prime}}\cap S_{i^{\prime}})|, it should be that |Si\(SiSi)|=0|S_{i^{\prime}}\backslash(S^{\prime}_{i^{\prime}}\cap S_{i^{\prime}})|=0 which means SiSiS_{i^{\prime}}\in S^{\prime}_{i^{\prime}} in a way that Si\SiSiS^{\prime}_{i^{\prime}}\backslash S^{\prime}_{i^{\prime}}\cap S_{i^{\prime}} includes nodes that are preferred by ii^{\prime} less than or equal to jj^{\prime}. Furthermore in this case Sirep=SiS_{i^{\prime}}^{rep}=S^{\prime}_{i}. Therefore, from Corollary 3.2.1, SiiSiS_{i^{\prime}}\succ_{i^{\prime}}S^{\prime}_{i^{\prime}}.

  2. 2.

    In the second case that (2) does not hold, we start from SjS^{\prime}_{j^{\prime}} and construct SjS_{j^{\prime}} through the same replacement algorithm as in the first case such that the preference of jj^{\prime} is strongly increased. We skip repetition of the details for this case.

The ordered match algorithm finishes in time O(N2)O(N^{2}) since each of the NN agents swipes on average over N/2N/2 other agents. ∎

Remark 3.3.

The ordered match results in a unique outcome.

Remark 3.4.

The outcome of the ordered match algorithm is stable and therefore Pareto optimal. However it is not necessarily social-welfare maximizing. Consider simple case that there are two agents 11 and 22 with the following utility functions: u1({1})=1,u1({1,2})=0u_{1}(\{1\})=1,u_{1}(\{1,2\})=0, u2({1,2})=10,u2({2})=0u_{2}(\{1,2\})=10,u_{2}(\{2\})=0. Under the ordered match agent 11 does not share with agent 22 since his utility by sharing will decrease. However, the social-welfare-maximizing outcome is for agents 11 and 22 to share data.

Proposition 3.1 does not hold when either top agent or limited complementarity assumption is relaxed. To see this consider the following examples.

Example 3.5 (No stable outcome without top agent property).

Consider three agents. Agent 11 has preferences over S1S_{1} defined by {1,3}1{1,2}1{1}1{1,2,3}\{1,3\}\succ_{1}\{1,2\}\succ_{1}\{1\}\succ_{1}\{1,2,3\}; agent 22 has preferences over S2S_{2} as {1,2,3}2{2}2{2,3}\{1,2,3\}\succ_{2}\{2\}\succ_{2}\{2,3\}; agent 3 has preferences over S3S_{3} as {1,2,3}3{3,2}3{3}3{3,1}\{1,2,3\}\succ_{3}\{3,2\}\succ_{3}\{3\}\succ_{3}\{3,1\}. Here the top agent property does not hold. The game does not have a stable outcome.

4 Unidirectional Sharing: Competitive Prices

In this Section, we study the case where data is traded with money. To this end we first extend the model in Section 3 to include utility functions and allow money transfer. Following Shapley and Scarf, (1974), we investigate existence of competitive prices that can realize stable outcomes. We will present structures on the utility function that guarantee existence of such prices.

4.1 Model

We update the model used in Section 3. We assume the agents’ preferences from each sharing graph is measurable with a utility function. We consider the following structures for this utility function.

The sharing graph for the unidirectional sharing will be a directed graph GG, where a directed edge from ii to jj indicates agent ii sharing his data with agent jj.

Assumption 4.1 (Utility structure).

Agents’ preferences over outcomes GG can be represented with a total utility function. Agent ii’s total utility function is Vi:𝒢RV_{i}:\mathcal{G}\rightarrow{R}. Moreover, ViV_{i} is decomposed as a utility function Ui(SiI)U_{i}(S_{i}^{I}) minus a cost function Ci(SiO)C_{i}(S_{i}^{O}):

Vi(G):=Ui(SiI)Ci(SiO),\displaystyle V_{i}(G):=U_{i}(S_{i}^{I})-C_{i}(S_{i}^{O}), (3)

where SiIS_{i}^{I} is the set of agents that agent ii gains access to their data, and SiOS_{i}^{O} is the set of agents that gain access to agent ii’s data.

We now leverage the particular features of data for machine learning algorithms to put structure on the utility functions and the cost function.

Assumption 4.2 (Additive Cost Structure).

Agent nn’s cost is additive with respect to the agents who access his data:

Ci(SiO)=jSiOci(j).\displaystyle C_{i}(S^{O}_{i})=\sum_{j\in S^{O}_{i}}c_{i}(j). (4)
Remark 4.1.

The cost structure in (4) is additively decomposable. It can represent privacy cost as well as peer-to-peer communication costs. However it does not include one-time communication with an intermediary which can then send the data to the others. It also does not model differential privacy costs where there is a privacy cost per inquiry of agent jj with respect to agent ii’s data. Extension of the model to incorporate differential privacy is considered in Section 6.

Assumption 4.3.

Agents can have monetary transfer. We denote the payment by/charge to agent ii with tit_{i}, and its quasilinear utility function by

Vi(G)ti:=Ui(SiI)Ci(SiO)ti.\displaystyle V_{i}(G)-t_{i}:=U_{i}(S^{I}_{i})-C_{i}(S^{O}_{i})-t_{i}. (5)

4.2 Existence of competitive prices

Proposition 4.1.

If costs are additively decomposable, see (4) in Assumption 4.2, then for a unidirectional sharing graph (i) there exist competitive prices, and (ii) every social-welfare-maximizing outcome can be implemented with a competitive price, such that it is also a stable outcome.

Proof.

We prove by construction. (i) Consider the set of prices where agent ii charges a price of ci(j)c_{i}(j) for selling his data to agent jj. We show these prices are competitive (form a competitive equilibrium). The total payment of agent ii will be

ti=jSjIcj(i)jSjOci(j).\displaystyle t_{i}=\sum_{j\in S^{I}_{j}}c_{j}(i)-\sum_{j\in S^{O}_{j}}c_{i}(j). (6)

We determine the demand and supply of each agent ii. Under Assumption 4.2,

Ui(SiI)Ci(SiO)ti\displaystyle U_{i}(S^{I}_{i})-C_{i}(S^{O}_{i})-t_{i} =Ui(SiI)jSiOci(j)jSjIcj(i)+jSjOci(j)\displaystyle=U_{i}(S^{I}_{i})-\sum_{j\in S^{O}_{i}}c_{i}(j)-\sum_{j\in S^{I}_{j}}c_{j}(i)+\sum_{j\in S^{O}_{j}}c_{i}(j) (7)
=Ui(SiI)jSjIcj(i).\displaystyle=U_{i}(S^{I}_{i})-\sum_{j\in S^{I}_{j}}c_{j}(i). (8)

Agent ii’ demand, which is the optimal subset of agents to buy data from denoted by SiI{S^{I}_{i}}^{*}, is calculated by

SiIargmaxSiIUi(SiI)jSiIcj(i).\displaystyle{S^{I}_{i}}^{*}\in\arg\max_{S^{I}_{i}}U_{i}(S^{I}_{i})-\sum_{j\in S^{I}_{i}}c_{j}(i). (9)

Agent ii is indifferent on the amount of supply, which is the set of agents agent ii shares its data with; this is because (7) is independent of SiOS^{O}_{i}. Consequently, for any demand set SjIS^{I}_{j}, j{1,2,,N}j\in\{1,2,...,N\}, the optimal supply of agent ii can be determined uniquely in a way to meet the demand and clear the market:

SiO={j:j{1,2,,N},iSjI}.\displaystyle{S^{O}_{i}}^{*}=\{j:j\in\{1,2,...,N\},i\in{S^{I}_{j}}^{*}\}. (10)

(ii) We prove that every social-welfare-maximizing outcome can be implemented with the competitive prices we proposed above. Start with a social-welfare-maximizing outcome and decompose it as follows:

maxG𝒢i=1NUi(SiI)Ci(SiO)\displaystyle\max_{G\in\mathcal{G}}\sum_{i=1}^{N}U_{i}(S^{I}_{i})-C_{i}(S^{O}_{i}) =i=1NmaxSiI{1,2,,N}(Ui(SiI)jSiIcj(i))\displaystyle=\sum_{i=1}^{N}\max_{S_{i}^{I}\subset\{1,2,...,N\}}\bigg{(}U_{i}(S^{I}_{i})-\sum_{j\in S^{I}_{i}}c_{j}(i)\bigg{)} (11)
=i=1NmaxSiI{1,2,,N},SiO{1,2,,N}(Ui(SiI)Ci(SiO)ti),\displaystyle=\sum_{i=1}^{N}\max_{S_{i}^{I}\subset\{1,2,...,N\},S_{i}^{O}\subset\{1,2,...,N\}}\bigg{(}U_{i}(S^{I}_{i})-C_{i}(S^{O}_{i})-t_{i}\bigg{)}, (12)

where the decomposition in (12) matches that of (9). This shows if every agent optimizes (9) individually, then the solution will be social welfare maximizing. To see under the proposed competitive prices, the social welfare maximizing outcomes are stable, note that from (9) every agent is gaining its maximum utility at the outcome of the social-welfare-maximizing outcome compared to any other coalition. ∎

Remark 4.2.

The results in Proposition 4.1 are based on two features in this problem which are specific to the data sharing economy: first, the sharing costs are decomposable as a sum of the sharing cost between each pair of individuals according to Eq. (4) in Assumption 4.2. Second, data can be replicated freely so there is no constraint on number of agents who buy data from an agent. This last assumption makes it possible to define market-clearing supply amounts.

Proposition 4.2 (Space of the Competitive Prices).

Competitive prices of the unidirecitonal sharing game are not unique to those presented in Proposition 4.1. For example, fixing all other prices to be ci(j)c_{i}(j), imi\neq m, agent mm can set its prices for agent jj to be any value in [cm(j),pmmax(j)][c_{m}(j),p^{\max}_{m}(j)] where pmmax(j)p^{\max}_{m}(j) is a price such that the solutions to the following optimization remains the same as that in (9) for all i{1,2,,N}i\in\{1,2,...,N\}:

argmaxSiIUi(SiI)I(mSnI)pmmax(i)jSnI,jmcj(i).\displaystyle\arg\max_{S^{I}_{i}}U_{i}(S^{I}_{i})-I(m\in S_{n}^{I})p^{\max}_{m}(i)-\sum_{j\in S^{I}_{n},j\neq m}c_{j}(i). (13)

5 Unilateral Sharing with Private Types: Mixed-VCG Mechanism

In this section we extend the model by assuming agents belong in type space Θ\Theta, determining their utility from sharing. We further assume types are private information known only to the agents themselves. There also exists a social planner intermediary who runs a market for coordinating data sharing. We look for market mechanisms that have the following features: they implement approximate social-welfare-maximizing outcomes at dominant equilibrium, they are individually rational, and they are budget balanced. To this end, we propose a variation of the VCG mechanisms—mixed VCG—which uses a mix of data distortion (referred to as “data money”) and monetary payment to achieve budget balance. We will then investigate how to address communication/message complexity of the VCG mechanisms for data sharing, using specific features of data.

5.1 Model

We extend the model in Section 4.

Assumption 5.1.

Agents are from type space Θ\Theta with distribution fθf_{\theta}. Types are private information.

Note that type θn\theta_{n} is a determinant of agent nn’s utility and cost functions. It does not involve its private data DnD_{n}.

Assumption 5.2.

There exists an intermediary for running the market. The intermediary is a social planner.

Furthermore, we extend the game by making the following two assumptions which capture unique features of data economy. We discuss at the end of this section how these assumptions can be relaxed.

Assumption 5.3.

[Zero Cost Data Distortion] The intermediary has the capacity to distort the quality of data passed to each agent downward or upward, at zero cost to the intermediary.

There are several ways the quality of data can be distorted. For example, downward distortion could be achieved by removing part of one agent’s data when passed to another, and upward distortion could be achieved by adding new data passed to an agent (perhaps by simulating new data from a model). Alternatively, the intermediary can add noise to data, or to the machine learning model passed to each agent. Later in this section, we will relax the assumption of zero cost distortion to the intermediary.

Assumption 5.4.

[Isolated Impact of Data Distortion] We assume that data rationing by the intermediary for data shared with one agent does not change the total utility of the other agents.

Assumption 5.3 extends the outcomes of the game from the graph GG in Section 3 to a weighted directed graph G^\hat{G} defined below.

Assumption 5.4 implies that other agents do not incur a change in their utility or costs from data sharing, such as privacy cost, due to rationing on distortion of another agent’s data. This allows for isolating the impact of rationing data passed to an agent only to itself.

Definition 5.1.

The outcome of a unidirectional sharing game where the intermediary can change quality of the data is a weighted directed graph G^\hat{G}. The weight of each edge from ii to jj is denoted by wi,j[0,)w_{i,j}\in[0,\infty).

One way to interpret the weights is as the fraction of the data from ii passed by the intermediary to jj.

We also extend the utility of the agents to be explicitly a function of their type and the weighted directed graph G^\hat{G}:

V(G^,θi)ti=U(S^iI,θi)C(S^iO,θn)ti.\displaystyle V(\hat{G},\theta_{i})-t_{i}=U(\hat{S}^{I}_{i},\theta_{i})-C(\hat{S}^{O}_{i},\theta_{n})-t_{i}. (14)

Similar to (5), UU is the utility from gaining access to other agents’ data, CC is the cost of sharing, and tt is the monetary payment.

Before introducing our mechanism we provide the following definition which helps in studying the properties of our mechanism.

Definition 5.2.

Agents’ social welfare is

SWA(G^):=i{1,2,,N}(V(G^,θi)ti).\displaystyle SW^{A}(\hat{G}):=\sum_{i\in\{1,2,...,N\}}\bigg{(}V(\hat{G},\theta_{i})-t_{i}\bigg{)}. (15)

Total social welfare is the sum of utilities of all agents plus the payments to the intermediary:

SWT(G^):=\displaystyle SW^{T}(\hat{G}):= SWA(G^)+i{1,2,,N}ti\displaystyle SW^{A}(\hat{G})+\sum_{i\in\{1,2,...,N\}}t_{i} (16)
=\displaystyle= i{1,2,,N}V(G^,θi).\displaystyle\sum_{i\in\{1,2,...,N\}}V(\hat{G},\theta_{i}). (17)
Remark 5.1.

SWTSW^{T} in (17) is agents’ utility only from allocations (including costs imposed to the intermediary for data distortion) excluding payments. The standard classical VCG mechanism implements maximum total social welfare SWTSW^{T} (often referred to simply as social welfare). VCG does not implement maximum SWASW^{A}.

5.2 Mixed-VCG mechanism

We introduce a mechanism that we refer to as Mixed-VCG to achieve approximately optimal SWASW^{A}, truthful reporting as a dominant strategy, budget balance and ex-post individual rationality Moreover, the mechanism will have the same SWTSW^{T} as standard VCG.

Definition 5.3 (Mixed-VCG).

Agents report their type to the intermediary; agent ii’s reported type is denoted by θ^i\hat{\theta}_{i}s. The intermediary then outputs a weighted directed sharing graph G^\hat{G} and a payment vector tt by taking the following steps.

First the intermediary determines a sharing graph GG^{*} similar to standard VCG and without any distortions,

G\displaystyle{G}^{*} =argmaxG𝒢i{1,2,,N}V(G,θi).\displaystyle=\arg\max_{{G}\in\mathcal{G}}\sum_{i\in\{1,2,...,N\}}V({G},{\theta}_{i}). (18)

determines t~i\tilde{t}_{i} as

t~i=jiV(Gi,θj)jiV(G,θj),\displaystyle\tilde{t}_{i}=\sum_{j\neq i}V({G}_{-i}^{*},{\theta}_{j})-\sum_{j\neq i}V({G}^{*},{\theta}_{j}), (19)

where

Gi=argmaxG𝒢jiV(G,θi).\displaystyle{G}_{-i}^{*}=\arg\max_{{G}\in\mathcal{G}}\sum_{j\neq i}V({G},{\theta}_{i}). (20)

and calculates the total budget surplus or deficit of the standard VCG:

Δ=it~i.\displaystyle\Delta=\sum_{i}\tilde{t}_{i}. (21)

The intermediary then determines each agent nn’s real payment tnt_{n} and data payment tndt^{d}_{n} (which will be translated into distortion to agent nn’s data) such that

tn+tnd=t~n.\displaystyle t_{n}+t^{d}_{n}=\tilde{t}_{n}. (22)

Without loss of generality assume t~1>t~2>\tilde{t}_{1}>\tilde{t}_{2}>\cdot\cdot\cdot. If Δ>0\Delta>0—i.e., there is budget surplus in standard VCG—then the intermediary starts from agent 11 and sets

t1d\displaystyle t^{d}_{1} =min{t~1,Δ}\displaystyle=\min\{\tilde{t}_{1},\Delta\} (23)
tnd\displaystyle t^{d}_{n} =min{t~n,Δj=1n1t1d}n=2,3,,N.\displaystyle=\min\{\tilde{t}_{n},\Delta-\sum_{j=1}^{n-1}t^{d}_{1}\}\qquad\forall n=2,3,...,N. (24)

If Δ<0\Delta<0, i.e., there is a budget deficit, then the intermediary starts from agent NN and sets

tNd\displaystyle t^{d}_{N} =max{t~N,Δ}\displaystyle=\max\{\tilde{t}_{N},\Delta\} (25)
tnd\displaystyle t^{d}_{n} =max{t~n,Δj=Nn+1tnd}n=N1,N2,,1.\displaystyle=\max\{\tilde{t}_{n},\Delta-\sum_{j=N}^{n+1}t^{d}_{n}\}\qquad\forall n=N-1,N-2,...,1. (26)

Next the intermediary sets monetary payments as follows:

ti=t~itidnN,\displaystyle t_{i}=\tilde{t}_{i}-t^{d}_{i}\qquad\forall n\in N, (27)

and calculates data distortions, wi,j,i,jw^{*}_{i,j},\forall i,j, to form the sharing graph G^\hat{G}^{*} so that the following is satisfied:

V(G^,θi)ti=V(G,θi)t~i.\displaystyle V(\hat{G}^{*},\theta_{i})-t_{i}=V(G^{*},\theta_{i})-\tilde{t}_{i}. (28)
Proposition 5.1.

The mixed-VCG mechanism is dominant truthful, budget balanced and ex-post individually rational. It implements agents’ social welfare, SWASW^{A}, whose suboptimality is exactly equal to the level of budget imbalance of standard VCG. It also implements the same SWTSW^{T} as standard VCG.

Proof.

To see individual rationality and dominant truthfulness, it is sufficient to note that for the vector of reported types (θ^1,θ^2,,θ^N)(\hat{\theta}_{1},\hat{\theta}_{2},...,\hat{\theta}_{N}) based on (28), all agents gain the same utility for both standard VCG and mixed-VCG. Individual rationality and dominant truthfulness then follow from the fact that these properties hold for standard VCG.

Budget balance is a direct consequence of definition of the monetary payments in mixed-VCG, which is the redistribution of the standard VCG budget imbalance across agents according to (22)-(27). This redistribution ensures i={1,2,,N}ti=0\sum_{i=\{1,2,...,N\}}t_{i}=0.

From (28), i={1,2,,N}V(G^,θi)ti=i={1,2,,N}V(G,θi)t~i\sum_{i=\{1,2,...,N\}}V(\hat{G}^{*},\theta_{i})-{t}_{i}=\sum_{i=\{1,2,...,N\}}V(G^{*},\theta_{i})-\tilde{t}_{i}, so from (15) mixed-VCG achieves the same SWASW^{A} as the standard VCG.

To see the difference of the SWTSW^{T} between mixed-VCG and standard VCG, from (17), (28), budget balance of the mixed-VCG mechanism which induces i={1,2,,N}ti=0\sum_{i=\{1,2,...,N\}}t_{i}=0, and (21),

SWmixedVCGT\displaystyle SW^{T}_{mixed-VCG} ={i=1,2,,N}V(G^,θi)\displaystyle=\sum_{\{i=1,2,...,N\}}V(\hat{G}^{*},\theta_{i}) (29)
={i=1,2,,N}V(G,θi)+{i=1,2,,N}ti{i=1,2,,N}t~i)\displaystyle=\sum_{\{i=1,2,...,N\}}V({G}^{*},\theta_{i})+\sum_{\{i=1,2,...,N\}}t_{i}-\sum_{\{i=1,2,...,N\}}\tilde{t}_{i}) (30)
={i=1,2,,N}V(G,θi)Δ=SWVCGTΔ.\displaystyle=\sum_{\{i=1,2,...,N\}}V({G}^{*},\theta_{i})-\Delta=SW^{T}_{VCG}-\Delta. (31)

Remark 5.2.

The NM-VCG mechanism relies on isolated impact of data distortion in Assumption 5.4, the fact that the intermediary can distort the data allocations to change an individual’s utility without changing the others’. This is a property of data economy and not the case for other commodities (for example house allocation).

We now consider relaxing Assumption 5.3. While distorting data quality downward is practically at zero cost, for example by adding noise to the data, an upward distortion may be practically costly to the intermediary, for example through training a model for generating new data. To address this issue, we extend the mixed-VCG mechanism. The intermediary can add a fixed base downward distortion of data in a zero cost way, for example fixed noise added to data. Then instead of adding new data, the intermediary just lifts the fixed noise which will be at negligible cost in practice. We call this mechanism Distorted mixed-VCG (D-mixed-VCG). Formally consider wn,m0w^{0}_{n,m} to be the chosen ex-ante weights placed on the edges. Denote the class of all such sharing graphs as 𝒢^0\hat{\mathcal{G}}_{0} (the set of graphs that ii either does not pass his data to jj or gives a distorted version at weight wi,jw_{i,j}. Then intermediary determines a sharing graph G^0\hat{G}^{*}_{0} according to

G^0\displaystyle\hat{G}_{0}^{*} =argmaxG^0𝒢^0i{1,2,,N}V(G^0,θi),\displaystyle=\arg\max_{\hat{G}_{0}\in\hat{\mathcal{G}}_{0}}\sum_{i\in\{1,2,...,N\}}V(\hat{G}_{0},{\theta}_{i}), (32)

and determines t~i\tilde{t}_{i} as

t~i=jiV(G^0,i,θj)jiV(G^0,θj),\displaystyle\tilde{t}_{i}=\sum_{j\neq i}V(\hat{G}_{0,-i}^{*},{\theta}_{j})-\sum_{j\neq i}V(\hat{G}_{0}^{*},{\theta}_{j}), (33)

where

G^0,i=argmaxG^0𝒢^0jiV(G^0,i,θi).\displaystyle\hat{G}_{0,-i}^{*}=\arg\max_{\hat{G}_{0}\in\mathcal{\hat{G}}_{0}}\sum_{j\neq i}V(\hat{G}_{0,-i}^{*},{\theta}_{i}). (34)

the rest of the mechanism is analogous to the mixed VCG mechanism per (21)-(28).

D-mixed-VCG implements lower SWTSW^{T} and SWASW^{A} compared to mixed-VCG because it adds distortion to the data. The proof is straightforward, and we leave it to the reader.

Remark 5.3.

We now discuss the communication complexity of the mixed-VCG mechanism for data sharing by using specific features of data as a commodity. From statistical machine learning results, the agents’ utility and costs from sharing data, UU and CC, are of certain structural forms; for example, for several machine learning methods, the marginal value of the new data for learning a model reduces at a rate of 1D\frac{1}{\sqrt{D}} where DD is the size of the entire data available to the agent. Using such structural properties allows reducing the complexity of the type space, hence the communication complexity.

6 Extensions to Differential Privacy

In this section we briefly show how the model and results can be adapted to a more complicated setting, by bringing differential privacy into the picture. We relax full reporting of data in Assumption 3.1, and assume agents make queries over each others’ data, where for each query there is a differential privacy cost for the data owner. In this setup, the unit of commodity offered by each agent nn is the outcome of a query over its data, rather than its whole raw data. We omit the proofs of this section.

We update the model by extending the results in Section 3 to 5. First, the sharing graph GG in Definition 3.1 is extended with integer weights on the edges w~i,j\tilde{w}_{i,j} which reflect the number of queries agent ii runs on agent jj’s data. We refer to this graph by G~\tilde{G}. The cost and utility functions in Assumptions 4.2 are extended to be a function of graph S~iO\tilde{S}^{O}_{i} and S~iI\tilde{S}^{I}_{i}, Vi(G~)=Ui(S~iO)Ci(S~iO)V_{i}(\tilde{G})=U_{i}(\tilde{S}^{O}_{i})-C_{i}(\tilde{S}^{O}_{i}), where S~iI={(j,w~i,j):jN}\tilde{S}^{I}_{i}=\{(j,\tilde{w}_{i,j}):j\in N\}, S~iO={(j,w~j,i):jN}\tilde{S}^{O}_{i}=\{(j,\tilde{w}_{j,i}):j\in N\} Given these updates to the model, together with Proposition 3.1, we can show that a core stable outcome exists and can be found by updating the ordered match algorithm as follows: every time an agent nn swipes an agent \ell, it also proposes a set of sharing weights {(w~n,,w~,n)}\{(\tilde{w}_{n,\ell},\tilde{w}_{\ell,n})\} that are acceptable to nn (the benefits of sharing are greater than costs). Agent \ell then either accepts to share by choosing from this set or rejects the sharing of data.

Next consider a cost structure in which each agent ii has a differential privacy cost which, along the line of (4) is additively decomposable with respect to the agents he is sharing data with:

Ci(S~iO)=jS~iOw~i,jci(j).\displaystyle C_{i}(\tilde{S}^{O}_{i})=\sum_{j\in\tilde{S}^{O}_{i}}\tilde{w}_{i,j}c_{i}(j). (35)

Then, along the lines of Proposition 4.1, there exists a set of competitive prices where each agent ii charges agent jj an amount of ci(j)c_{i}(j) per inquiry. Moreover, each other agent jj is allowed to purchase multiple inquiries from agent ii.

Finally, the VCG mechanism in Section 5 can be updating by considering doubly weighted directed graphs G~^\hat{\tilde{G}} with weights on each edge from ii to jj being w~i,j\tilde{w}_{i,j} the number of inquiries of jj on ii’s data, and wi,jw_{i,j} the distortion to quality of each inquiry. Then (36) can be extended as

V(G~^,θi)ti=U(S~^iI,θi)C(S~^iO,θi)ti.\displaystyle V(\hat{\tilde{G}},\theta_{i})-t_{i}=U(\hat{\tilde{S}}^{I}_{i},\theta_{i})-C(\hat{\tilde{S}}^{O}_{i},\theta_{i})-t_{i}. (36)

The mixed-VCG mechanism is updated to include weights w~ij\tilde{w}_{ij} determined by the intermediary.

7 Conclusions

We have studied data sharing markets for both bidirectional (data-data exchange) and unidirectional (data-money exchange) sharing. We proved existence of strongly stable outcomes for bilateral sharing via formulating the problem as a network formation game. We achieved this with limited complementarity and without posing utility structure on agents’ preferences. For unilateral sharing, we constructed competitive prices that implement socially optimal outcomes, and in the case of agents’ private information of their types, we also presented budget balanced mechanisms which truthfully implement approximately optimal outcomes.

References

  • Abdulkadiroğlu and Sönmez, (1998) Abdulkadiroğlu, A. and Sönmez, T. (1998). Random serial dictatorship and the core from random endowments in house allocation problems. Econometrica, 66(3):689–701.
  • Acemoglu et al., (2019) Acemoglu, D., Makhdoumi, A., Malekian, A., and Ozdaglar, A. (2019). Too much data: Prices and inefficiencies in data markets. Technical report, National Bureau of Economic Research.
  • Agarwal et al., (2019) Agarwal, A., Dahleh, M., and Sarkar, T. (2019). A marketplace for data: An algorithmic solution. In Proceedings of the 2019 ACM Conference on Economics and Computation, pages 701–726.
  • Alaei et al., (2016) Alaei, S., Jain, K., and Malekian, A. (2016). Competitive equilibria in two-sided matching markets with general utility functions. Operations Research, 64(3):638–645.
  • Ausubel and Milgrom, (2002) Ausubel, L. M. and Milgrom, P. R. (2002). Ascending auctions with package bidding. Advances in Theoretical Economics, 1(1).
  • Azevedo and Hatfield, (2018) Azevedo, E. M. and Hatfield, J. W. (2018). Existence of equilibrium in large matching markets with complementarities. Available at SSRN 3268884.
  • Banerjee et al., (2001) Banerjee, S., Konishi, H., and Sönmez, T. (2001). Core in a simple coalition formation game. Social Choice and Welfare, 18(1):135–153.
  • Bergemann et al., (2018) Bergemann, D., Bonatti, A., and Smolin, A. (2018). The design and price of information. American Economic Review, 108(1):1–48.
  • Bergemann and Morris, (2013) Bergemann, D. and Morris, S. (2013). Robust predictions in games with incomplete information. Econometrica, 81(4):1251–1308.
  • Bogomolnaia and Jackson, (2002) Bogomolnaia, A. and Jackson, M. O. (2002). The stability of hedonic coalition structures. Games and Economic Behavior, 38(2):201–230.
  • Caskurlu and Kizilkaya, (2019) Caskurlu, B. and Kizilkaya, F. E. (2019). On hedonic games with common ranking property. In International Conference on Algorithms and Complexity, pages 137–148. Springer.
  • Demange and Gale, (1985) Demange, G. and Gale, D. (1985). The strategy structure of two-sided matching markets. Econometrica: Journal of the Econometric Society, pages 873–888.
  • Farrell and Scotchmer, (1988) Farrell, J. and Scotchmer, S. (1988). Partnerships. The Quarterly Journal of Economics, 103(2):279–297.
  • Fernandez et al., (2020) Fernandez, R. C., Subramaniam, P., and Franklin, M. J. (2020). Data market platforms: Trading data assets to solve data problems. Proceedings of the VLDB Endowment, 13(12):1933–1947.
  • Gale and Shapley, (1962) Gale, D. and Shapley, L. S. (1962). College admissions and the stability of marriage. The American Mathematical Monthly, 69(1):9–15.
  • Ghorbani and Zou, (2019) Ghorbani, A. and Zou, J. (2019). Data shapley: Equitable valuation of data for machine learning. In International Conference on Machine Learning, pages 2242–2251. PMLR.
  • Ghosh and Roth, (2011) Ghosh, A. and Roth, A. (2011). Selling privacy at auction. In Proceedings of the 12th ACM Conference on Electronic Commerce, pages 199–208.
  • Gonçalves et al., (2020) Gonçalves, C., Pinson, P., and Bessa, R. J. (2020). Towards data markets in renewable energy forecasting. IEEE Transactions on Sustainable Energy, 12(1):533–542.
  • Green and Laffont, (1977) Green, J. and Laffont, J.-J. (1977). Characterization of satisfactory mechanisms for the revelation of preferences for public goods. Econometrica: Journal of the Econometric Society, pages 427–438.
  • Gul and Stacchetti, (1999) Gul, F. and Stacchetti, E. (1999). Walrasian equilibrium with gross substitutes. Journal of Economic Theory, 87(1):95–124.
  • Holmström, (1979) Holmström, B. (1979). Groves’ scheme on restricted domains. Econometrica: Journal of the Econometric Society, pages 1137–1144.
  • Hylland and Zeckhauser, (1979) Hylland, A. and Zeckhauser, R. (1979). The efficient allocation of individuals to positions. Journal of Political Economy, 87(2):293–314.
  • Irving, (1985) Irving, R. W. (1985). An efficient algorithm for the “stable roommates” problem. Journal of Algorithms, 6(4):577–595.
  • Jackson and Van den Nouweland, (2005) Jackson, M. O. and Van den Nouweland, A. (2005). Strongly stable networks. Games and Economic Behavior, 51(2):420–444.
  • Jackson and Wolinsky, (1996) Jackson, M. O. and Wolinsky, A. (1996). A strategic model of social and economic networks. Journal of Economic Theory, 71(1):44–74.
  • Lerner and Tirole, (2004) Lerner, J. and Tirole, J. (2004). Efficient patent pools. American Economic Review, 94(3):691–711.
  • Liang and Madsen, (2020) Liang, A. and Madsen, E. (2020). Data and incentives. In Proceedings of the 21st ACM Conference on Economics and Computation, pages 41–42.
  • Ligett and Roth, (2012) Ligett, K. and Roth, A. (2012). Take it or leave it: Running a survey when privacy comes at a cost. In International Workshop on Internet and Network Economics, pages 378–391. Springer.
  • Mehta, (2013) Mehta, A. (2013). Online matching and ad allocation.
  • Mehta et al., (2019) Mehta, S., Dawande, M., Janakiraman, G., and Mookerjee, V. (2019). How to sell a dataset? pricing policies for data monetization. Pricing Policies for Data Monetization (August 1, 2019).
  • Ostrovsky, (2008) Ostrovsky, M. (2008). Stability in supply chain networks. American Economic Review, 98(3):897–923.
  • Pycia, (2012) Pycia, M. (2012). Stability and preference alignment in matching and coalition formation. Econometrica, 80(1):323–362.
  • Raith, (1996) Raith, M. (1996). A general model of information sharing in oligopoly. Journal of Economic Theory, 71(1):260–288.
  • Shapley and Scarf, (1974) Shapley, L. and Scarf, H. (1974). On cores and indivisibility. Journal of Mathematical Economics, 1(1):23–37.
  • Shapley and Shubik, (1971) Shapley, L. S. and Shubik, M. (1971). The assignment game i: The core. International Journal of Game Theory, 1(1):111–130.
  • Sotomayor, (1992) Sotomayor, M. (1992). The multiple partners game. In Equilibrium and Dynamics, pages 322–354. Springer.
  • Varian, (2009) Varian, H. R. (2009). Online ad auctions. American Economic Review, 99(2):430–34.
  • Vickrey, (1961) Vickrey, W. (1961). Counterspeculation, auctions, and competitive sealed tenders. The Journal of Finance, 16(1):8–37.