
Inria and École Polytechnique, France
{sayan.biswas,gangsoo.zeong}@inria.fr
[email protected]

An Incentive Mechanism for Trading Personal Data in Data Markets

Sayan Biswas    Kangsoo Jung    Catuscia Palamidessi
Abstract

With the proliferation of the digital data economy, digital data is often considered the crude oil of the twenty-first century, and its value is increasing. Keeping pace with this trend, the model of data markets, in which data providers trade with data consumers, is starting to emerge as a process to obtain high-quality personal information in exchange for some compensation. However, the risk of privacy violations caused by personal data analysis hinders data providers’ participation in the data market. Differential privacy, a de-facto standard for privacy protection, can solve this problem, but, on the other hand, it deteriorates the data utility. In this paper, we introduce a pricing mechanism that takes into account the trade-off between privacy and accuracy. We propose a method to induce each data provider to report her privacy price accurately, and we optimize it in order to maximize the data consumer’s profit within budget constraints. We show formally that the proposed mechanism achieves these properties, and we also validate them experimentally.

Keywords:
Data Market · Differential Privacy · Incentive Mechanism · Game Theory.

1 Introduction

Nowadays, digital data is becoming an essential resource for the information society, and the value of personal data is increasing. In the past, data broker companies such as Acxiom collected personal data and sold them to companies that needed them. However, as the value of personal data becomes clear to the data providers, and concern about privacy increases among them, people are less and less willing to let their data be collected for free. In this scenario, the model of the data market is starting to emerge, as a process to obtain high-quality personal information in exchange for compensation. Liveen [1] and Datacoup [2] are examples of prototypes of data market services, where the data providers can obtain additional revenue from selling their data, and the consumers can collect the desired personal data.

The problem of privacy violation by personal data analysis is one of the major issues in such data markets. As the population becomes more and more aware of the negative consequences of privacy breaches, such as the Cambridge Analytica scandal, people are reluctant to release their data, unless they are properly sanitised. In order to solve this problem, techniques like noise insertion [3], synthetic data [4], secure multi-party computation (SMC) [5], and homomorphic encryption [6] are being actively studied. Differential privacy [3], a de-facto standard for privacy protection, is one of the techniques to prevent privacy violations in the data market.

Differential privacy provides a privacy protection framework based on solid mathematical foundations, and enables quantified privacy protection according to the amount of noise inserted. However, like all privacy-protection methods, it deteriorates the data utility. If the data provider inserts too much noise because of privacy concerns, the data consumer cannot carry out the data analysis with the required performance. This trade-off between privacy and utility is a long-standing problem in differential privacy. Privacy protection and data utility depend on the amount of noise inserted when applying differential privacy, and this amount is determined by the noise parameter $\epsilon$. Thus, determining the appropriate value of the parameter $\epsilon$ is a fundamental problem in differential privacy. It is difficult to establish the appropriate $\epsilon$ value because it depends on many factors that are difficult to quantify, like the attitude towards privacy of the data provider, which may differ from person to person.

We propose an incentive mechanism to encourage the data providers to join in the data market and motivate them to share more accurate data. The amount of noise insertion depends on the data providers’ privacy preference and the incentives provided to them by data consumers, and the data consumers decide on incentives to pay to the data provider by considering the profit to be made from the collected data. By sharing some of the consumers’ profit with the data provider as incentive, the data provider can get fair prices for providing her data. The proposed mechanism consists of the truthful price report mechanism and an optimization method within budget constraints. The truthful price report mechanism guarantees that the data provider takes the optimal profit when she reports her privacy price to the data consumer honestly. Based on a data provider’s reported privacy price, a data consumer can maximize her profit within a potential budget constraint.

1.1 Contribution

The contributions of this paper are as follows:

(i) Truthful price report mechanism: We propose an incentive mechanism that guarantees that the data provider maximizes her benefit when she reports her privacy price honestly.

(ii) Optimized incentive mechanism within budget constraints: We propose an optimization method to maximize the data consumer’s profit and information gain in the setting where the data consumer has a fixed financial budget for data collection.

(iii) Optimized privacy budget splitting mechanism: We propose a method of splitting the privacy budget for the data providers, which allows each of them to maximize her utility gain within a fixed privacy budget, in a multiple-data-consumer environment.

The properties of our methods are both proved formally and validated through experiments.

1.2 Structure of the paper

The structure of this paper is as follows: we explain the related works and preliminaries in Sections 2 and 3, respectively. We describe the proposed incentive mechanism in Section 4 and validate the proposed incentive mechanism through experiments in Section 5. Our conclusion and some potential directions of future work are discussed in Section 6.

2 Related work

2.1 Methods for choosing $\epsilon$

In differential privacy, the parameter $\epsilon$ is the knob that controls the privacy-utility trade-off. The smaller the $\epsilon$, the higher the privacy protection level and the more the data utility deteriorates. Conversely, a larger $\epsilon$ decreases the privacy protection level and enhances the data utility. However, there is no gold standard for determining the appropriate value of $\epsilon$. Apple has been promoting the use of differential privacy to protect user data since iOS 10 was released, but the analysis of [7] showed that the $\epsilon$ value was set at approximately 10 without any particular justification. The work of [8] showed that the privacy protection level set by an arbitrary $\epsilon$ can be infringed by inference using previously disclosed information, and proposed an $\epsilon$-setting method that takes the posterior probability into account. This matter is the main factor undermining the claim that personal information is protected by differential privacy. Much research has been conducted to study and solve this problem [9, 10, 11, 12], but the problem of determining a reasonable way of choosing an optimal value for $\epsilon$ still remains open, as there are many factors to consider in deciding the value of $\epsilon$. In this paper, we propose a technique to determine an appropriate value of $\epsilon$ by setting a price on the privacy of the data provider.

2.2 Pricing mechanism

One of the solutions to find an appropriate value of $\epsilon$ is to price it according to the data accuracy [13, 14, 15, 16, 17, 18]. In [13], the strength of the privacy guarantee and the accuracy of the published results are considered to set the $\epsilon$ value, and a simple $\epsilon$-setting model that can satisfy data providers and consumers is suggested. In [14], the authors proposed a compensation mechanism via auction in which data providers are rewarded based on data accuracy and the data consumer’s budget when they provide data with differential privacy. This is the work most similar to our study. The main differences between our paper and Ghosh and Roth’s work are as follows:

(i) We define a truthful price report mechanism under which a data provider obtains her best profit when she reports her privacy price honestly, and we prove this property.

(ii) We propose an optimized incentive mechanism to maximize the data consumer’s profit with a fixed expense budget, and a privacy budget splitting method to maximize the data provider’s utility gain in a multi-data-consumer environment.

In [17] the authors design a mechanism that can estimate statistics accurately without compromising the user’s privacy. They propose a Bayesian incentive-compatible and privacy-preserving mechanism that guarantees privacy and data accuracy. The study of [18] proposes a Stackelberg game to maximize the utility of mobile users who provide their trajectory data.

Several techniques for pricing data assuming a data market environment have been studied in [19, 20, 21, 22, 23, 24, 25, 26].

In [19] the authors suggested a data pricing mechanism to balance privacy and price in a data market environment. In [20], the authors propose a data market model for the IoT environment and show that the proposed pricing model has a global optimal point. In [21] the authors proposed a theoretical framework for assigning prices to noisy query answers in a differentially private data market. However, this research cannot flexibly reflect the requirements of the data market. In the study of [23], the authors proposed an $\epsilon$-choosing method based on Rubinstein bargaining, which assumes a market manager that mediates between a data provider and a consumer in the data trading.

It is realistic to consider personal data as a digital asset, and reasonable to attempt to find a bridge between the privacy protection level and the price according to the value of $\epsilon$ in differential privacy, as is done in this paper. Existing studies attempt to find an equilibrium between data providers and consumers under the assumption that both are rational individuals. In this paper, we follow a research direction similar to existing studies, and focus on an incentive mechanism that motivates a data provider to report her privacy price honestly. In particular, we consider that the value of differentially private data increases non-linearly with the value of $\epsilon$.

3 Preliminaries

In this section, we explain the basic concepts of differential privacy. Differential privacy is a mathematical model that guarantees privacy protection at a specified level $\epsilon$. A mechanism satisfies $\epsilon$-differential privacy if, for all datasets $D_1$ and $D_2$ differing in exactly one element, the ratio between the probabilities of any outcome of a specific query $K$ on the two databases is bounded by the threshold $e^{\epsilon}$. The definition of differential privacy is as follows:

Definition 1 (Differential privacy[3]).

A randomized function $\mathcal{K}$ provides $\epsilon$-differential privacy if, for all datasets $D_1$ and $D_2$ differing in only one element, and all subsets $S\subseteq\text{Range}(\mathcal{K})$,

$$\mathbb{P}[\mathcal{K}(D_1)\in S]\leq e^{\epsilon}\,\mathbb{P}[\mathcal{K}(D_2)\in S]$$

The Laplace mechanism [3] is one of the most common methods for achieving $\epsilon$-differential privacy: it adds to the true answer of a query noise drawn from the Laplace distribution, with scale proportional to $1/\epsilon$.
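As a concrete illustration, the following is a minimal Python sketch of the Laplace mechanism (the counting query, its sensitivity of 1, and the value $\epsilon=0.5$ are illustrative assumptions, not taken from this paper):

import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng=None):
    # Release an epsilon-differentially private answer by adding
    # Laplace noise with scale sensitivity/epsilon to the true answer.
    rng = rng or np.random.default_rng()
    return true_value + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# Example: a counting query (sensitivity 1) released with epsilon = 0.5.
noisy_count = laplace_mechanism(true_value=42, sensitivity=1.0, epsilon=0.5)

A smaller $\epsilon$ yields a larger noise scale, which is exactly the privacy-utility knob discussed in Section 2.1.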

One of the important properties of differential privacy is compositionality, which allows queries to be composed and thus facilitates modular design [3].

Sequential compositionality

For any database $D$, let $K_1$ and $K_2$ be randomization mechanisms answering two independent queries. If $K_1(D)$ and $K_2(D)$ guarantee $\epsilon_1$- and $\epsilon_2$-differential privacy respectively, then their combined result is $(\epsilon_1+\epsilon_2)$-differentially private.

Parallel compositionality

Let $A$ and $B$ be a partition of a database $D$ (i.e., $A\cup B=D$, $A\cap B=\emptyset$). Then the combined result of the randomization mechanisms $K_1(A)$ and $K_2(B)$ is $\max(\epsilon_1,\epsilon_2)$-differentially private.
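As a toy illustration of these two accounting rules (the individual $\epsilon$ values below are arbitrary assumptions):

# Sequential composition: two independent queries on the same database.
eps_sequential = 0.3 + 0.2        # total privacy cost: 0.5

# Parallel composition: one query on each disjoint partition of the database.
eps_parallel = max(0.3, 0.2)      # total privacy cost: 0.3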

Recently, a variant of differential privacy called local differential privacy has been proposed [27, 28, 29, 30]. In this model, data providers obfuscate their own data by themselves. Local differential privacy has an advantage that it does not need a trusted third party to satisfy the differential privacy. The properties of parallel and sequential compositionality hold for the local model as well.

In the rest of this paper, we consider the local model of differential privacy.

4 Incentive mechanism for data markets

4.1 Overview of the proposed technique

The data market aims at collecting personal data legally with the consent of the provider. A data provider can sell her own data and get paid for it, and a data consumer can collect the personal data for analysis by paying a price, resulting in a win-win situation.

Naturally, the data consumer wants to collect personal data as accurately as possible at the lowest possible price, and the data provider wants to sell her data at a price as high as possible while protecting sensitive information. In general, every effective protection technique affects the utility of the data negatively. In the particular case of differential privacy, the levels of utility and privacy are determined by the parameter $\epsilon$; thus, the data price is affected directly by the value of $\epsilon$.

Determining the appropriate value of $\epsilon$ and the actual price of the data is critical to the success of the data market. However, this is not an easy task, also because each data provider has different privacy needs [30].

We propose an incentive mechanism to find the price of the data and the value of $\epsilon$ that can satisfy both the data provider and the data consumer. The proposed method consists of two parts: an incentive mechanism encouraging the data provider to report her privacy price honestly to the data consumer, and an optimization scheme to maximize both the data consumer’s and the provider’s profit within a budget constraint.

We consider a scenario with $n$ data providers, $u_1,\ldots,u_n$, and $m$ data consumers, $D_1,\ldots,D_m$, where each provider and consumer proceeds with the deal independently (we use the terms “data provider” and “data producer” interchangeably, in the same sense). The term “$\epsilon$ unit price” (e.g., 1$ per $\epsilon$ value 0.1) will be used to express the price, where $\epsilon$ is the parameter of differential privacy, which is a measure of the accuracy of information. We recall that, as $\epsilon$ increases, the data becomes less private and more information can be obtained from it, and vice versa. Thus, the price per unit of $\epsilon$ represents the “value” of the provider’s information. (The $\epsilon$ unit price can be of any form, including a monetary one. The method we propose is independent of the nature of the price, so we do not need to specify it.) The price of $\epsilon$ is expected to differ from one data provider to another, because each individual has different privacy needs. We denote the $\epsilon$ unit price reported by $u_i$ as $p_i$ and her true $\epsilon$ unit price as $\pi_i$.


Figure 1: An example of the data trading process. In this figure, $u_i$ denotes the $i^{th}$ data provider and $D_j$ the $j^{th}$ data consumer.

Fig. 1 illustrates how the process works. At first, every data consumer broadcasts a function $f$ to the data providers, which represents the amount of data (expressed in $\epsilon$ units) the consumer is willing to buy for a given $\epsilon$ unit price. Each consumer has her own such function, and it can differ from one consumer to another. We will call it the $\epsilon$-allocating function. We assume $f$ to be monotonically decreasing, as the consumers naturally prefer to buy more data from those data producers who are willing to offer it for less. Note that the product $p_i f(p_i)$ represents the total amount that will be paid by the data consumer to the producer if they agree on the trade. The function $f$, however, also has a second purpose: as we show in Section 4.2, it is designed to encourage providers to demand the price that they really consider the true price of their privacy, rather than asking for more.

Then, thanks to the truthful price report mechanism (cf. Section 4.2), the data providers report the prices of their data honestly to the data consumers in accordance with the published $f$. In the example in Fig. 1, $u_1$ reports her $\epsilon$ price per 0.1 as 1$ and $u_2$ reports her $\epsilon$ price per 0.1 as 2$. Finally, the data consumer checks the price reported by the data provider and determines the total price and the value of $\epsilon$ to be obtained from each provider using $f$. In this example, the data consumer $D_1$ determines $\epsilon_1$ to be 0.7 and $\epsilon_2$ to be 0.4.

Then, the data providers select the consumers to whom to sell their data in order to maximize their profits, and confirm with them the values of their $\epsilon$ and the total price they would receive. In the example in Fig. 1, $D_1$ pays 7$ to $u_1$ and 8$ to $u_2$. Finally, the data providers add noise to their data based on the determined $\epsilon$ and share the sanitized data with the respective consumers, and the consumers pay the corresponding prices to the providers. We assume that data providers and consumers keep the promise of the value of $\epsilon$ and the compensation decided in the deal, once confirmed.

This process can be repeated until the data consumers exhaust all their budget or achieve the targeted amount of information. The tasks of allocating a suitable budget in each round and of determining the amount of information needed are also important topics, but they are out of the scope of this paper and are left for future work.

4.2 Truthful price report mechanism

For the correct functioning of the data trading, the data provider should be honest and demand her true privacy price. However, she may be motivated to report a higher price, in the hope of persuading the data consumer that the information is “more valuable”, so that the consumer will be willing to pay more. Note also that the true privacy price of each data provider is personal information that only the provider herself knows and is not obliged to disclose.

To solve this problem, we propose a truthful price report mechanism to ensure that the data providers report their $\epsilon$ unit prices honestly. The purpose of the mechanism is to provide an incentive so that the providers are guaranteed to get the greatest profit when they report their true price.

When the data provider reports her price $p_i$, the data consumer determines the amount of $\epsilon$ to purchase using $f(p_i)$, where $f$ is the $\epsilon$-allocating function introduced in Section 4.1. We recall that $f$ is a monotonically decreasing function, chosen by the consumer. We assume that the domain of $f$, the $\epsilon$ unit price, is normalized to take values in the interval $[0,1]$. The total price for the data estimated by the consumer is the product of the $\epsilon$ unit price and the amount to be purchased, namely, $p_i f(p_i)$. To this value, the consumer adds an incentive $\int_{p_i}^{\infty}f(z)\,dz$, the purpose of which is to make it convenient for the data producer to report the true price (we assume that the data producer knows $f$ and the strategy of the consumer in advance). The consumer should of course choose $f$ so as to be satisfied with the incentive. In particular, the incentive should be finite, so the contribution of $f(z)$ should vanish as $z$ goes to $\infty$. An example of such a function is illustrated in Fig. 2.

Thus the data consumer sets the offer $\mu(p_i)$ to the provider $u_i$ as follows:

Definition 2 (Payment offer).

The offer $\mu(p_i)$ is defined as:

$$\mu(p_i)=p_i f(p_i)+\int_{p_i}^{\infty}f(z)\,dz$$
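As an illustration, the offer can be evaluated numerically for the example family of Fig. 2, $f(z)=\ln(e-cz)$. This is a minimal sketch, assuming SciPy is available and taking $c=1$ and a reported price of $0.5$ as arbitrary values; the integral only runs up to $(e-1)/c$, beyond which $f$ vanishes:

import numpy as np
from scipy.integrate import quad

def f(z, c):
    # Example epsilon-allocating function of Fig. 2: ln(e - cz), truncated at 0.
    val = np.e - c * z
    return np.log(val) if val >= 1.0 else 0.0

def offer(p, c):
    # Payment offer mu(p) = p*f(p) + integral_p^infinity f(z) dz (Definition 2).
    upper = (np.e - 1.0) / c  # f vanishes beyond this point
    incentive, _ = quad(f, p, max(p, upper), args=(c,))
    return p * f(p, c) + incentive

print(offer(0.5, c=1.0))  # offer made for a reported unit price of 0.5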


Figure 2: An example of a monotonically decreasing function $f(z)$. Let $c$ be a parameter representing the “reported value-to-admitted $\epsilon$ value” ratio. For $z\geq 0$, we set $f(z)=\ln(e-cz)$ if $e-cz\geq 1$, and $f(z)=0$ otherwise.

We now illustrate how this strategy achieves its purpose of convincing the provider to report her true price. We start by defining the utility that the data provider obtains by selling her data as the difference between the offer and the true price of her data, represented by the product of the true $\epsilon$ unit price and the amount to be sold, namely $\pi_i f(p_i)$:

Definition 3 (Utility of the data provider).

The utility $\rho(p_i)$ of the provider $u_i$, for the reported price $p_i$, is defined as:

$$\rho(p_i)=\mu(p_i)-\pi_i f(p_i)$$

We are now going to show that the proposed mechanism guarantees truthfulness. The basic reason is that each provider $u_i$ achieves the best utility when reporting the true price. Namely, $\rho(\pi_i)\geq\rho(p_i)$ for any $p_i\in\mathbb{R}^{+}$, where we recall that $\pi_i$ is the true price of the provider $u_i$. The only technical condition is that the function $f$ is monotonically decreasing. Under this assumption, we have the following results (see also Fig. 3 to get the intuition of the proof):


Figure 3: Graphical illustration of Theorem 4.1. We prove that $\rho(\pi_i)$ (blue hatched area) is always larger than $\rho(p_i)$ (blue rectangle area plus red hatched area minus green rectangle area).
Lemma 1.

If $u_i$ reports a price greater than her true price, i.e., $p_i\geq\pi_i$, then her utility will be at most the utility for the true price, i.e., $\rho(p_i)\leq\rho(\pi_i)$.

Proof.

See Appendix A.

Lemma 2.

If $u_i$ reports a price smaller than her true price, i.e., $p_i\leq\pi_i$, then her utility will be at most the utility for the true price, i.e., $\rho(p_i)\leq\rho(\pi_i)$.

Proof.

See Appendix A.

Combining Lemma 1 and Lemma 2 gives the announced result. We assume of course that each data producer is a rational individual, i.e., capable of identifying the best strategy to maximize her utility.

Theorem 4.1.

If every data producer acts rationally, then the proposed incentive mechanism guarantees the truthfulness of the system.

Proof.

Immediate from Lemma 1 and Lemma 2. ∎
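This property can also be checked numerically: for a fixed true price $\pi$, the provider’s utility $\rho(p)=\mu(p)-\pi f(p)$ should peak at $p=\pi$. A small sketch reusing the functions f and offer from the sketch after Definition 2 (the true price of 0.5 is an arbitrary assumption):

import numpy as np

pi_true = 0.5                                # the provider's true unit price (assumed)
prices = np.linspace(0.01, np.e - 1.0, 100)  # candidate reported prices
utilities = [offer(p, c=1.0) - pi_true * f(p, c=1.0) for p in prices]
print(prices[int(np.argmax(utilities))])     # approximately pi_true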

4.3 Optimizing the incentive mechanism

In this section, we propose an optimization mechanism to identify an optimal function $f$ for the data consumer with respect to the following two desiderata:

(i) Maximum Information: maximize the total information gain of the data consumer with a fixed budget.

(ii) Maximum Profit: maximize the total profit of the data consumer with a fixed budget.

By “budget” here we mean the budget of the data consumer to pay the data providers.

We start by introducing the notions of total information and profit for the consumer. Note that, by the sequential compositionality of differential privacy, the total information is the sum of the information obtained from each data provider.

Definition 4 (Total information).

The total information $\mathcal{I}(\boldsymbol{u})$ obtained by the data consumer by concluding trades with each of the data providers of the tuple $\boldsymbol{u}=(u_1,\ldots,u_n)$ is defined as

$$\mathcal{I}(\boldsymbol{u})=\sum_{i=1}^{n}f(p_i)$$

As for the profit, we can reasonably assume it to be monotonically increasing with the amount of information obtained, and that the total profit is the sum of the profits obtained from each individual trade. The latter is naturally defined as the difference between the benefit (aka payoff) obtained by re-selling or processing the data, and the price paid to the data provider.

Definition 5 (Payoff and profit).

• The payoff function for the data consumer, denoted by $\tau(\cdot)$, is the benefit that the data consumer receives by processing or selling the information gathered from the different data providers. The argument of $\tau(\cdot)$ is $\epsilon$, the amount of the information received. We assume $\tau(\epsilon)$ to be monotonically increasing with $\epsilon$.

• The total profit for the data consumer is given by $\sum_{i=1}^{n}(\tau(\epsilon_i)-\mu(p_i))$, where $\epsilon_i=f(p_i)$, i.e., the $\epsilon$-value allocated to $u_i$.

We will consider a family of functions $\mathcal{F}$, indexed by a parameter $c$, to which the $\epsilon$-allocating function $f$ belongs. The parameter $c$ reflects the data consumer’s will to collect the information and, for technical reasons, we assume $f$ to be continuous, differentiable and concave with respect to it. For each data provider, different values of $c$ will give a different $f$, which, in turn, will give rise to a different incentive curve as per Definition 2, which the data consumer should adhere to when compensating for the information obtained from that data provider.

As described in the previous sections, the $\epsilon$-allocating functions should be monotonically decreasing with the $\epsilon$ unit price, as the consumer is motivated to buy more information from the providers that offer it at a lower price. This property also ensures, by Theorem 4.1, that the prices reported by the data producers will be their true prices. Hence we impose the following constraint on $\mathcal{F}$:

$$\mathcal{F}\subseteq\{f(\cdot,\cdot)\,:\,c,p\in\mathbb{R}^{+},\ f(c,p)\text{ is continuous, differentiable and concave in }c,\text{ and decreasing in }p\}. \qquad (1)$$

Note that we have added the parameter $c$ as an additional argument of $f$, so $f$ now has two arguments.

Example 1.

An example of such a class $\mathcal{F}$ is that of Fig. 2:

$$\mathcal{F}=\{\ln(e-cp):c\in\mathbb{R}^{+}\}.$$
Example 2.

Another example is:

$$\mathcal{F}=\{1-cp:c\in\mathbb{R}^{+}\}.$$

After the prices $p_1,\ldots,p_n$ have been reported by the data producers $u_1,\ldots,u_n$, the data consumer will try to choose an optimal $c$ maximizing her profit. Fig. 4 illustrates an example with two data providers’ incentive curves and the payoff for the data consumer.

We will analyse the possibility of choosing an optimal $c$ that, in turn, leads to an optimal $f(c,\cdot)$ addressing scenarios (i) and (ii).


Figure 4: Illustration of the payoff for $c$ and the incentive plots for the data consumer involving two data providers reporting $p_1,p_2$. The Y-intercept of $\mu_1$ is $\int_{p_1}^{\infty}f(z)\,dz$ and that of $\mu_2$ is $\int_{p_2}^{\infty}f(z)\,dz$.

In the context of differential privacy, we may assume that $\tau$ (the data consumer’s payoff function) is additive, i.e.,

$$\textbf{Additivity}\quad \tau(a+b)=\tau(a)+\tau(b)\quad\text{ for every }a,b\in\mathbb{R}^{+}. \qquad (2)$$

This is a reasonable assumption that goes well with the sequential compositionality property of differential privacy, at least for small values of $\epsilon$. (From a technical point of view, the additivity property also holds for large values of $\epsilon$. However, from a practical point of view, for large values of $\epsilon$, for instance 200 and 400, the original information is almost entirely revealed in both cases, and it would not make sense to pay twice the price of 200 $\epsilon$ units to achieve 400 $\epsilon$ units.)

We start by showing that the two desiderata (i) and (ii) are equivalent:

Theorem 4.2.

If $\tau(\cdot)$ is additive, then maximizing information and maximizing profit (desiderata (i) and (ii)) are equivalent, in the sense that an $\epsilon$-allocating function $f(\cdot,\cdot)$ that maximizes the one also maximizes the other.

Proof.

See Appendix A.

Corollary 1.

If $\tau(\cdot)$ is additive, then the optimal choice of $f(\cdot,\cdot)$ w.r.t. the selected family of functions will maximize both the information gain and the profit for the data consumer.

Proof.

Immediate from Theorem 4.2. ∎

We now consider the problem of finding the optimal $f(\cdot,\cdot)$. Due to the assumptions made in Equation (1), and to the additivity of $\tau$, we can apply the method of Lagrange multipliers to find such an $f(\cdot,\cdot)$ (cf. Appendix A).

Theorem 4.3.

If $\tau$ is additive, then there exists a $c$ that gives an optimal profit-maximizing function $f(c,\cdot)\in\mathcal{F}$ for a fixed budget, and we can derive such a $c$ via the method of Lagrange multipliers.

Proof.

See Appendix A. ∎

Theorem 4.4.

There exists a $c$ that gives an optimal information-maximizing function $f(c,\cdot)\in\mathcal{F}$ for a fixed budget, and we can derive such a $c$ via the method of Lagrange multipliers.

Proof.

See Appendix A.

To demonstrate how the method works, we show how to compute the specific values of $c$ for the two classes $\mathcal{F}$ of Examples 1 and 2. Such a $c$ gives the optimal $\epsilon$-allocating function $f(c,\cdot)$, maximizing $\mathcal{I}(\boldsymbol{u})$ for a given budget. The derivations are described in detail in Appendix B. In each example, $p_i$ is the reported $\epsilon$ unit price of $u_i$.

Example 3.

Let $\mathcal{F}=\{\ln(e-cp):c\in\mathbb{R}^{+}\}$. The optimal parameter $c$ is the solution of the equation $\ln\big(\prod_{i=1}^{n}e^{p_i}(e-cp_i)^{\frac{e}{c}}\big)=B+\frac{n(e-1)}{c}$.

Example 4.

Let $\mathcal{F}=\{1-cp:c\in\mathbb{R}^{+}\}$. The optimal parameter $c$ is the solution of the equation $c^{2}\sum_{i=1}^{n}p_i^{2}+2Bc-n=0$.
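Both conditions are easy to solve in practice. The following sketch computes the optimal $c$ for each family, assuming SciPy’s brentq root finder and purely illustrative reported prices and budget (the bracketing interval for the log family is an assumption that must contain a sign change):

import numpy as np
from scipy.optimize import brentq

prices = np.array([0.3, 0.5, 0.8, 1.2])  # reported epsilon unit prices (illustrative)
B = 2.0                                  # consumer's budget (illustrative)
n = len(prices)

# Example 4 (linear family 1 - cp): positive root of c^2*sum(p_i^2) + 2Bc - n = 0.
sp2 = np.sum(prices ** 2)
c_linear = (-B + np.sqrt(B ** 2 + n * sp2)) / sp2

# Example 3 (log family ln(e - cp)): root of
# sum_i [p_i + (e/c)*ln(e - c*p_i)] - B - n(e-1)/c = 0.
def g(c):
    return np.sum(prices + (np.e / c) * np.log(np.e - c * prices)) - B - n * (np.e - 1.0) / c

c_log = brentq(g, 1e-6, (np.e - 1.0) / prices.max() - 1e-6)
print(c_linear, c_log)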

4.4 Discussion

In our model, for the scenario we have considered so far, the parameter $c$ is determined by the number of providers and the budget. We observe that, in both Examples 3 and 4, if $n$ increases then $c$ increases, and vice versa. This seems natural because, in the families of both these examples, the incentive that the consumer is going to propose decreases monotonically with $c$. This means that the larger the offer, the smaller the incentive that the consumer needs to pay. In other words, the examples confirm the well-known market law according to which the price decreases when the offer increases, and vice versa.

We note that we have been assuming that there is enough offer to satisfy the consumer’s demand. If this hypothesis is not satisfied, i.e., if the offer is smaller than the demand, then the situation is quite different: now the data producer can choose to whom to sell her data. In particular, the data consumer who sets a lower $c$ will have a better chance to buy data because, naturally, the provider prefers to sell her data to the data consumers who give a higher incentive. In the next section we explore the process in more detail, from the perspective of the data provider, in the case in which the demand is higher than the offer.

4.5 Optimized privacy budget splitting mechanism for data providers

After optimizing an incentive mechanism for a given data consumer dealing with multiple data providers, we focus on the flip side of the setup. We assume a scenario in which a given data provider has to provide her data to multiple data consumers, and that there is enough demand so that she can sell all her data.

Let there be $m$ data consumers, $D_1,\ldots,D_m$, seeking to obtain data from the user $u$. By the truthful price report mechanism discussed in Section 4.2, $u$ reports her true price $\pi$ to each $D_i$. As discussed in Section 4.3, $D_i$ computes her optimal $\epsilon$-allocating function $f_i$ and requests data from $u$, differentially privatized with $\epsilon=f_i(\pi)$. After receiving $f_1,\ldots,f_m$, $u$ would like to provide her data in the way that maximizes the utility she receives from sharing it.

Definition 6.

We say that the data provider has made a deal with the data consumer $D_i$ if, upon reporting the true per-unit price of her information, $\pi$, she agrees to share her data privatized with privacy parameter $\epsilon=f_i(\pi)$.

It is important to note here that $u$ is not obliged to deal with any data consumer $D_i$, even after receiving $f_i$. Realistically, $u$ has a privacy budget $\epsilon_{\text{total}}$, which she will not exceed at any price. Let $S=\{i_1,\ldots,i_k\}$ be an arbitrary subset of $\{1,\ldots,m\}$. By the sequential composition property of differential privacy, the final privacy parameter incurred by $u$ by sharing her data with an arbitrary set of data consumers $D_{i_1},\ldots,D_{i_k}$ is $\epsilon_S=\sum_{j\in S}f_j(\pi)$. $u$’s main intention is to share her data in a way that ensures $\epsilon_S\leq\epsilon_{\text{total}}$, while maximizing $\sum_{j\in S}\rho(\pi,f_j)$, i.e., the total utility received (here $\rho(\pi,f_j)$ denotes the utility of Definition 3 computed with the allocating function $f_j$). Reducing this to the 0/1 knapsack problem, we propose that $u$ should deal with $\{D_{i_1},\ldots,D_{i_k}\}$, where $S^{*}=\{i_1,\ldots,i_k\}\subseteq\{1,\ldots,m\}$ is chosen as

$$S^{*}=\operatorname*{arg\,max}_{S}\Big\{\sum_{j\in S}\rho(\pi,f_j)\ \Big|\ S\subseteq\{1,\ldots,m\},\ \sum_{j\in S}f_j(\pi)\leq\epsilon_{\text{total}}\Big\} \qquad (3)$$

We show the pseudocode for the $\epsilon$ allocation algorithm and for the entire process in Algorithms 1 and 2, and a runnable sketch of the selection step after them.

Input: requested privacy parameters $\{\epsilon_1,\ldots,\epsilon_m\}$ stored in array w, utilities $\{\rho(\pi,f_1),\ldots,\rho(\pi,f_m)\}$ stored in array v, privacy budget $\epsilon_{\text{total}}$;
Output: list of data consumers $\{D_{i_1},\ldots,D_{i_k}\}$ selected to sell data to;
initialize the two-dimensional array M with zeros;
for i := 1 to m do
       for j := 0 to $\epsilon_{\text{total}}$ do (j ranges over multiples of the $\epsilon$ unit)
              if w[i] > j then
                     M[i, j] := M[i-1, j]
              else
                     M[i, j] := max(M[i-1, j], M[i-1, j-w[i]] + v[i])
              end if
       end for
end for
backtrack using the table M to find the indices of the selected data consumers;
return the list of selected data consumers;
Algorithm 1 Optimized privacy budget splitting algorithm
Input: data providers $\{u_1,\ldots,u_n\}$, data consumers $\{D_1,\ldots,D_m\}$;
Output: list of provider-consumer pairs whose trades are completed;
for i := 1 to m do
       $D_i$ calculates the parameter c that optimizes $f_i(\cdot)$;
       $D_i$ announces $f_i(\cdot)$ to the data providers
end for
for j := 1 to n do
       $u_j$ reports her price $p_j$ to the data consumers
end for
for i := 1 to m do
       for j := 1 to n do
              $D_i$ calculates $\epsilon_j$ based on $p_j$;
              $D_i$ informs $u_j$ of $\epsilon_j$
       end for
end for
for j := 1 to n do
       $u_j$ runs the optimized privacy budget splitting algorithm (Algorithm 1) to maximize her utility
end for
Algorithm 2 The proposed data trading process
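The following is a minimal Python sketch of the selection step of Algorithm 1, with the $\epsilon$ values discretized to a grid (the grid step of 0.1 matches the $\epsilon$ unit used in this paper; the requested $\epsilon$ values, utilities, and budget in the usage example are illustrative assumptions):

def split_privacy_budget(eps_requests, utilities, eps_total, step=0.1):
    # 0/1 knapsack: select the set of consumers maximizing total utility
    # subject to the sum of requested epsilons not exceeding eps_total.
    w = [round(e / step) for e in eps_requests]  # integer weights on the grid
    cap = round(eps_total / step)
    n = len(w)
    dp = [0.0] * (cap + 1)                 # dp[j]: best utility with capacity j
    keep = [[False] * (cap + 1) for _ in range(n)]
    for i in range(n):
        for j in range(cap, w[i] - 1, -1):
            if dp[j - w[i]] + utilities[i] > dp[j]:
                dp[j] = dp[j - w[i]] + utilities[i]
                keep[i][j] = True
    chosen, j = [], cap                    # backtrack to recover the selection
    for i in range(n - 1, -1, -1):
        if keep[i][j]:
            chosen.append(i)
            j -= w[i]
    return sorted(chosen)

# Three consumers requesting eps 0.4, 0.7, 0.5 with utilities 3, 5, 4 and a
# total privacy budget of 1.0: the best deal set is consumers 0 and 2.
print(split_privacy_budget([0.4, 0.7, 0.5], [3.0, 5.0, 4.0], 1.0))  # [0, 2]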

5 Experimental results

In this section we perform some experiments to verify that the proposed optimization method can find the best profit for the data consumer. For the experiments, we consider the families $\mathcal{F}$ of Examples 3 and 4, namely $\mathcal{F}=\{\ln(e-cp):c\in\mathbb{R}^{+}\}$ and $\mathcal{F}=\{1-cp:c\in\mathbb{R}^{+}\}$. For these two families the optimal parameter $c$ is also derived formally in Appendix B.

The experimental variables are set as follows: we assume that there are 10 data consumers, and the total number of data providers $n$ ranges from 1000 to 2000 at intervals of 500. The data providers’ $\epsilon$ unit prices are distributed normally with mean 1 and standard deviation 1, i.e., $\mathcal{N}(1,1)$, and unit prices below 0 or above 2 are clipped to 0 and 2, respectively. We set the $\epsilon$ unit value to 0.1, and the maximum $\epsilon$ value of a data provider to 3. We set the budgets to 60, 90, and 120 and the number of data providers to 1,000, 1,500, and 2,000. We assume that the data consumer earns a profit of 10 per 0.1 of $\epsilon$, and set the parameter $c$ to 1 and 10 for comparison.
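A sketch of this simulation setup, using the linear family of Example 4 (the closed-form incentive integral $(1-cp)^{2}/(2c)$ follows from Definition 2; the random seed and the grid over $c$ are arbitrary choices of ours):

import numpy as np

rng = np.random.default_rng(0)
n, B = 1000, 60                                        # providers and budget
prices = np.clip(rng.normal(1.0, 1.0, n), 0.0, 2.0)   # epsilon unit prices

def f(c, p):                  # linear family of Example 4, truncated at 0
    return np.maximum(1.0 - c * p, 0.0)

def mu(c, p):                 # offer: p*f(p) + integral_p^{1/c} (1 - cz) dz
    return p * f(c, p) + np.where(p < 1.0 / c, (1.0 - c * p) ** 2 / (2.0 * c), 0.0)

def profit(c, rate=100.0):    # tau(eps) = 100*eps, i.e., a profit of 10 per 0.1 eps
    return np.sum(rate * f(c, prices) - mu(c, prices))

cs = np.linspace(0.1, 10.0, 200)
feasible = [c for c in cs if np.sum(mu(c, prices)) <= B]   # budget constraint
print(max(feasible, key=profit) if feasible else None)     # most profitable c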


Figure 5: Experimental result of profit under a fixed budget. “Log function” refers to the family $\ln(e-cp)$ and “Linear function” to the family $1-cp$. We let the parameter $c$ range from 0 to 11. The red bin represents the optimal value of $c$, namely the $c$ that gives maximum information.

The results are shown in Fig. 5. For instance, in the case of the log family $\ln(e-cp)$, the optimal parameter $c$ is 5.36, and in the case of the linear family $1-cp$, the optimal parameter $c$ is 4.9. It is easy to verify that the optimal values of $c$ correspond to those determined by solving Equations (8) and (13) in Appendix B, of Examples 3 and 4 respectively.

6 Conclusion and future work

As machine learning and data mining are getting more and more deployed, individuals are becoming increasingly aware of privacy issues and of the value of their data. This evolution of people’s attitude towards privacy induces companies to develop new approaches to obtain personal data. We envision a scenario where data consumers can buy private data directly from the data providers by paying the price for the respective data, which makes it possible to obtain personal information that could not be obtained in the traditional manner. In order to ensure a steady offer in the data market, it is imperative to provide the privacy protection that the data providers deem necessary. Differential privacy can be applied to meet this requirement. However, the lack of standards for setting an appropriate value of the differential privacy parameter $\epsilon$, which determines the levels of data utility and privacy protection, makes it difficult to apply this framework in the data market.

In order to address this problem, we have developed a method, based on incentives and optimization, to find an appropriate value for $\epsilon$ in the process of data trading. The proposed incentive mechanism motivates every data provider to report her privacy price honestly in order to maximize her benefit, and the proposed optimization method maximizes the profit for the data consumer under a fixed financial budget. Additionally, in an environment involving multiple data consumers, our mechanism suggests an optimal way for the data providers to split their privacy budgets, maximizing their utility. Through experiments, we have verified that the proposed method provides the best profits to the provider and the consumer.

Along the lines of what we have studied in this paper, there are many interesting research issues still open in this area. In future work, we plan to study the following issues:

1. Mechanisms for a fair incentive share in an environment where the data providers form a federation for privacy protection.

2. Maximization of the data consumers’ profits by estimating the privacy price distribution of the data providers in an environment where the supply from the data providers may change dynamically.

Acknowledgement

This work was supported by the European Research Council (ERC) project HYPATIA under the European Union’s Horizon 2020 research and innovation programme. Grant agreement n. 835294.

References

  • [1] Liveen, https://www.liveen.com/
  • [2] Datacoup, https://datacoup.com/
  • [3] C. Dwork, A. Roth, “The algorithmic foundations of differential privacy”, Foundations and Trends in Theoretical Computer Science, Vol. 9, No. 3–4, pp. 211-407, (2014)
  • [4] C. Bowen, J. Snoke, ”Comparative study of differentially private synthetic data algorithms from the NIST PSCR differential privacy synthetic data challenge”, arXiv preprint arXiv:1911.12704, pp. 1-32, (2019)
  • [5] N. Volgushev, et al. , ”Conclave: secure multi-party computation on big data”, Proceedings of the 14th EuroSys Conference, pp.1-18, (2019)
  • [6] A. Acar, et al., ”A survey on homomorphic encryption schemes: Theory and implementation”, ACM Computing Surveys (CSUR), Vol. 51, NO. 4, pp. 1-35, (2018)
  • [7] J. Tang, et al., “Privacy loss in Apple’s implementation of differential privacy on macOS 10.12”, arXiv preprint arXiv:1709.02753, pp. 1-12, (2017)
  • [8] J. Lee, C. Clifton, “How much is enough? Choosing Epsilon for Differential Privacy”, Proceedings of the International Conference on Information Security, pp.325–340, (2011)
  • [9] Y. Chen, et al., “Truthful mechanisms for agents that value privacy”, ACM Transactions on Economics and Computation, Vol. 4, No. 3, pp. 1-30, (2016)
  • [10] K. Ligett and A. Roth, ”Take it or leave it: Running a survey when privacy comes at a cost”, International Workshop on Internet and Network Economics, pp. 378–391, (2012)
  • [11] D. Xiao, ”Is privacy compatible with truthfulness?”, Proceedings of the 4th conference on Innovations in Theoretical Computer Science, pp. 67–86, (2013)
  • [12] K. Nissim, C. Orlandi, and R. Smorodinsky, ”Privacy-aware mechanism design”, Proceedings of the 13th ACM Conference on Electronic Commerce, pp. 774–789, (2012)
  • [13] J. Hsu, et al., “Differential privacy: An economic method for choosing epsilon”, Proceedings of the 27th IEEE Computer Security Foundations Symposium, pp.1–29, (2014)
  • [14] A. Ghosh, A. Roth, “Selling privacy at auction”, Games and Economic Behavior, Vol. 91, No.1, pp.334-346, (2015)
  • [15] P. Dandekar, N. Fawaz, and S. Ioannidis, ”Privacy Auctions for Recommender Systems”, https://arxiv.org/abs/1111.2885, pp. 1-23, (2012)
  • [16] A. Roth, “Buying private data at auction: the sensitive surveyor’s problem”, ACM SIGecom Exchanges, Vol.11, No.1, pp.1-8, (2012)
  • [17] L. K. Fleischer, Y. H. Lyu, “Approximately optimal auctions for selling privacy when costs are correlated with data”, Proceedings of the 13th ACM Conference on Electronic Commerce, pp.568-585, (2012)
  • [18] W. Li, C. Zhang, Z. Liu, Y. Tanaka, ”Incentive mechanism design for crowdsourcing-based indoor localization”, IEEE Access, Vol. 6, pp.54042-54051, (2018)
  • [19] R. Nget, Y. Cao,M. Yoshikawa, “How to balance privacy and money through pricing mechanism in personal data market”, arXiv preprint arXiv:1705.02982, pp.1-10, (2018)
  • [20] H. Oh et al., “Personal data trading scheme for data brokers in IoT data marketplaces”, IEEE Access, Vol. 7, pp. 40120-40132, (2019)
  • [21] C. Li, D. Y. Li, G. Miklau, D. Suciu, “A theory of pricing private data”, ACM Transactions on Database Systems, Vol. 39, No. 4, pp. 34-60, (2013)
  • [22] C. Aperjis, B. A. Huberman, “A market for unbiased private data: Paying individuals according to their privacy attitudes”, Available at SSRN: https://ssrn.com/abstract=2046861, pp.1-17, (2012)
  • [23] K. Jung, S. Park, ”Privacy bargaining with fairness: Privacy-price negotiation system for applying differential privacy in data market environments.” Proceedings of the International Conference on Big Data, pp.1389-1394, (2019)
  • [24] S. Krehbiel, “Choosing epsilon for privacy as a service”, Proceedings on Privacy Enhancing Technologies, pp. 192-205, (2019)
  • [25] T. Zhang, Q. Zhu., ”On the Differential Private Data Market: Endogenous Evolution, Dynamic Pricing, and Incentive Compatibility”. arXiv preprint arXiv:2101.04357, pp. 1-30, (2021)
  • [26] Z. Jorgensen, T. Yu, G. Cormode, ”Conservative or liberal? Personalized differential privacy.” Proceedings of the 31St international conference on data engineering, pp. 1023-1034. IEEE, (2015)
  • [27] U. Erlingsson, V. Pihur, and A. Korolova, “RAPPOR: Randomized aggregatable privacy-preserving ordinal response”, Proceedings of the International Conference on Computer and Communications Security, pp. 1054–1067, (2014)
  • [28] G. Cormode et al., “Privacy at Scale: Local Differential Privacy in Practice,” Proceedings of the International Conference on Management of Data, pp. 1655–1658, (2018)
  • [29] T. T. Nguyên, X. Xiao, Y. Yang, et al., “Collecting and analyzing data from smart device users with local differential privacy”, https://arxiv.org/abs/1606.05053, pp. 1-11, (2016)
  • [30] S. P. Kasiviswanathan et al., “What can we learn privately?”, SIAM Journal on Computing, Vol. 40, No. 3, pp. 793–826, (2011)

Appendix A Proofs

Lemma 1 (restated).

Proof.

Suppose the $i^{th}$ provider reports the privacy price as $p_i>\pi_i$. We want to show that $\rho(\pi_i)-\rho(p_i)\geq 0$:

$$\begin{aligned}
\rho(\pi_i)-\rho(p_i)\geq 0
&\Leftrightarrow\mu(\pi_i)-\pi_i f(\pi_i)-\mu(p_i)+\pi_i f(p_i)\geq 0\\
&\Leftrightarrow\mu(\pi_i)-\mu(p_i)-\pi_i(f(\pi_i)-f(p_i))\geq 0\\
&\Leftrightarrow\pi_i f(\pi_i)+\int_{\pi_i}^{\infty}f(z)\,dz-\Big(p_i f(p_i)+\int_{p_i}^{\infty}f(z)\,dz\Big)-\pi_i(f(\pi_i)-f(p_i))\geq 0\\
&\Leftrightarrow\pi_i f(p_i)-p_i f(p_i)+\int_{\pi_i}^{p_i}f(z)\,dz\geq 0\\
&\Leftrightarrow\int_{\pi_i}^{p_i}f(z)\,dz\geq f(p_i)(p_i-\pi_i)
\end{aligned} \qquad (4)$$

Let $g(z)=f(z)-f(p_i)$ for any $z\in\mathbb{R}$, so that $\int_{\pi_i}^{p_i}f(z)\,dz=f(p_i)(p_i-\pi_i)+\int_{\pi_i}^{p_i}g(z)\,dz$. Since $f$ is monotonically decreasing, $g(z)\geq 0$ for every $z\leq p_i$; hence $\int_{\pi_i}^{p_i}g(z)\,dz\geq 0$, which gives $\int_{\pi_i}^{p_i}f(z)\,dz\geq f(p_i)(p_i-\pi_i)$, as required. ∎

Lemma 2 (restated).

Proof.

Similar and symmetric argument to that of Lemma 1. ∎

Theorem 4.2 (restated).

Proof.

Let $\tau$ be a monotonically increasing and additive function representing the payoff earned by the data consumer by processing the information obtained from the different data providers. We wish to maximize her profit, i.e., $\sum_{i=1}^{n}(\tau(\epsilon_i)-\mu(p_i))$, for a fixed budget of expenses $B$, i.e., $\sum_{i=1}^{n}\mu(p_i)=B$. Therefore, this boils down to maximizing $\sum_{i=1}^{n}\tau(\epsilon_i)=\tau(\sum_{i=1}^{n}\epsilon_i)$ subject to $\sum_{i=1}^{n}\mu(p_i)=B$, where $\epsilon_i=f(c,p_i)$; here the equality uses the additivity of $\tau$. As $\tau$ is increasing, it attains its maximum if and only if $\sum_{i=1}^{n}\epsilon_i=\sum_{i=1}^{n}f(c,p_i)$ is maximum. Then just observe that $\sum_{i=1}^{n}f(c,p_i)=\mathcal{I}(\boldsymbol{u})$ (cf. Definition 4). ∎

Theorem 4.3 (restated).

Proof.

The profit of the data consumer is $\sum_{i=1}^{n}(\tau(\epsilon_i)-\mu(p_i))$, where $\epsilon_i=f(c,p_i)$. Therefore, to maximize her profit for a fixed budget of expenses $B$ (used to pay the incentives to the data providers), i.e., $\sum_{i=1}^{n}\mu(p_i)=B$, we just have to maximize $\sum_{i=1}^{n}\tau(\epsilon_i)=\sum_{i=1}^{n}\tau(f(c,p_i))$, which, by the additivity of $\tau$, is equal to $\tau(\sum_{i=1}^{n}f(c,p_i))$.

Note that the definition of $\mathcal{F}$ (cf. (1)) ensures that all its functions are continuous in $c$. Since the sum of continuous functions is continuous, $\sum_{i=1}^{n}f(c,p_i)$ is also continuous in $c$. Moreover, $c$ ranges in a closed interval: $c\in[0,c_{\text{min}}]$, where $c_{\text{min}}=\min_{p\in\{p_1,\ldots,p_n\}}\{c:f(c,p)=0\}$. Thus, by the Extreme Value Theorem, there exists a $c$ which maximizes $\sum_{i=1}^{n}f(c,p_i)$ and which, in turn, maximizes $\sum_{i=1}^{n}(\tau(\epsilon_i)-\mu(p_i))$.

Furthermore, the condition of differentiability makes it possible to apply the method of Lagrange multipliers to find the maximum, by imposing that the partial derivatives are 0. The concavity condition implies that those points correspond to a (global) maximum. ∎

Theorem 4.4 (restated).

Proof.

By the sequential compositionality of differential privacy, the total information obtained by the data consumer is $\sum_{i=1}^{n}\epsilon_i=\sum_{i=1}^{n}f(c,p_i)$. The rest follows as in the proof of Theorem 4.3. ∎

Appendix B Examples

Example 3    Let $\mathcal{F}=\{\ln(e-cp):c\in\mathbb{R}^{+}\}$. We want to maximize

$$F(c)=\sum_{i=1}^{n}f(c,p_i)$$

subject to the constraint

$$G(c)=\sum_{i=1}^{n}\mu(p_i)-B=\sum_{i=1}^{n}\Big[p_i\ln(e-cp_i)+\int_{p_i}^{\frac{e-1}{c}}\ln(e-cz)\,dz\Big]-B=0,$$

for a fixed budget constant $B\in\mathbb{R}^{+}$. By the sequential compositionality of differential privacy, we have $\sum_{i=1}^{n}f(c,p_i)=\sum_{i=1}^{n}\ln(e-cp_i)$. Furthermore, evaluating the integral gives $G(c)=\sum_{i=1}^{n}\big[p_i\ln(e-cp_i)+p_i\ln(\frac{e}{e-cp_i})+\frac{e}{c}\ln(e-cp_i)\big]-K=0$, where $K=B+\frac{n(e-1)}{c}$.

Using the method of Lagrange multipliers, we define the Lagrangian function

$$\mathcal{L}(c,\lambda)=F(c)-\lambda\Big(\sum_{i=1}^{n}\Big[p_i\ln(e-cp_i)+p_i\ln\Big(\frac{e}{e-cp_i}\Big)+\frac{e}{c}\ln(e-cp_i)\Big]-K\Big)=\sum_{i=1}^{n}\ln(e-cp_i)-\lambda\Big(\sum_{i=1}^{n}\Big[p_i+\frac{e}{c}\ln(e-cp_i)\Big]-K\Big).$$

Thus we have

$$\mathcal{L}(c,\lambda)=\sum_{i=1}^{n}\ln(e-cp_i)-\lambda\sum_{i=1}^{n}p_i-\frac{\lambda e}{c}\sum_{i=1}^{n}\ln(e-cp_i)+\lambda K \qquad (6)$$

Hence, to find the optimal solution for $c$, we wish to solve the following:

$$\frac{\partial\mathcal{L}(c,\lambda)}{\partial\lambda}=0 \qquad (7)$$
$$\frac{\partial\mathcal{L}(c,\lambda)}{\partial c}=0 \qquad (8)$$

Solving equation (7), we get

$$\frac{\partial\mathcal{L}(c,\lambda)}{\partial\lambda}=0
\;\Leftrightarrow\; K-\sum_{i=1}^{n}\Big[p_i+\frac{e}{c}\ln(e-cp_i)\Big]=0
\;\Leftrightarrow\;\sum_{i=1}^{n}\Big[p_i+\frac{e}{c}\ln(e-cp_i)\Big]=K
\;\Leftrightarrow\;\ln\Big(\prod_{i=1}^{n}e^{p_i}(e-cp_i)^{\frac{e}{c}}\Big)=K \qquad (9)$$

Now, observe that $\lim_{c\to 0}\ln\big(\prod_{i=1}^{n}e^{p_i}(e-cp_i)^{\frac{e}{c}}\big)=\infty$ and $\lim_{c\to\infty}\ln\big(\prod_{i=1}^{n}e^{p_i}(e-cp_i)^{\frac{e}{c}}\big)=0$. We recall that $K=B+\frac{n(e-1)}{c}$; hence $K$ is positive and finite. Thus, as $\ln\big(\prod_{i=1}^{n}e^{p_i}(e-cp_i)^{\frac{e}{c}}\big)$ is continuous, by the Intermediate Value Theorem we can conclude that the curve $y=\ln\big(\prod_{i=1}^{n}e^{p_i}(e-cp_i)^{\frac{e}{c}}\big)$ intersects the curve $y=K$ at least once for $c\in(0,\infty)$, implying that equation (9) has at least one solution ($\dagger_1$).

Next, solving equation (8), we get

$$\frac{\partial\mathcal{L}(c,\lambda)}{\partial c}=0
\;\Leftrightarrow\;\frac{\partial}{\partial c}\Big(\sum_{i=1}^{n}\ln(e-cp_i)\Big(1-\frac{\lambda e}{c}\Big)-\lambda\sum_{i=1}^{n}p_i+\lambda K\Big)=0
\;\Leftrightarrow\;\frac{\partial}{\partial c}\Big(\sum_{i=1}^{n}\ln(e-cp_i)\Big(1-\frac{\lambda e}{c}\Big)\Big)=0
\;\Leftrightarrow\;\sum_{i=1}^{n}\frac{-p_i}{e-cp_i}\Big(1-\frac{\lambda e}{c}\Big)+\frac{\lambda e}{c^{2}}\sum_{i=1}^{n}\ln(e-cp_i)=0 \qquad (10)$$

Equation (10) is linear in $\lambda$, implying that for every given $c$ we can find a $\lambda$ satisfying it ($\dagger_2$).

Therefore, combining ($\dagger_1$) and ($\dagger_2$), we conclude that there is at least one optimal choice of $f(\cdot,\cdot)$ in $\mathcal{F}$ that maximizes the information gathered by the data consumer, subject to the fixed budget.

Example 4    Let $\mathcal{F}=\{1-cp:c\in\mathbb{R}^{+}\}$. We want to maximize

$$F(c)=\sum_{i=1}^{n}f(c,p_i)=\sum_{i=1}^{n}(1-cp_i)$$

subject to

$$G(c)=\sum_{i=1}^{n}\mu(p_i)=\sum_{i=1}^{n}\Big[p_i(1-cp_i)+\int_{p_i}^{\frac{1}{c}}(1-cz)\,dz\Big]=B,$$

for a fixed budget $B\in\mathbb{R}^{+}$.

Observe that $G(c)=\frac{n}{2c}-\frac{c}{2}\sum_{i=1}^{n}p_i^{2}$. Using the method of Lagrange multipliers, and dropping the constant term $n$ from $F(c)$ (which does not affect the optimization), we define the Lagrangian function $\mathcal{L}(c,\lambda)=F(c)-\lambda(G(c)-B)=-c\sum_{i=1}^{n}p_i-\lambda\big(\frac{n}{2c}-\frac{c}{2}\sum_{i=1}^{n}p_i^{2}-B\big)$. Thus we have

$$\mathcal{L}(c,\lambda)=-c\sum_{i=1}^{n}p_i-\lambda\Big(\frac{n}{2c}-\frac{c}{2}\sum_{i=1}^{n}p_i^{2}-B\Big) \qquad (11)$$

Hence, to find the optimal solution for $c$, we solve the following:

$$\frac{\partial\mathcal{L}(c,\lambda)}{\partial\lambda}=0 \qquad (12)$$
$$\frac{\partial\mathcal{L}(c,\lambda)}{\partial c}=0 \qquad (13)$$

Solving equation (12), we get

$$\frac{\partial\mathcal{L}(c,\lambda)}{\partial\lambda}=0
\;\Leftrightarrow\;-\Big(\frac{n}{2c}-\frac{c}{2}\sum_{i=1}^{n}p_i^{2}-B\Big)=0
\;\Leftrightarrow\;\frac{n}{2c}-\frac{c}{2}\sum_{i=1}^{n}p_i^{2}=B
\;\Leftrightarrow\; c^{2}\sum_{i=1}^{n}p_i^{2}+2Bc-n=0 \qquad (14)$$

Note that the LHS of equation (14) is quadratic in $c$ and, as the discriminant $4B^{2}+4n\sum_{i=1}^{n}p_i^{2}>0$, equation (14) has two distinct real roots; since $c\in\mathbb{R}^{+}$, the positive root is the candidate for the optimal $c$ ($\dagger_1'$).

Next, solving equation (13), we get

$$\frac{\partial\mathcal{L}(c,\lambda)}{\partial c}=0
\;\Leftrightarrow\;-\sum_{i=1}^{n}p_i+\frac{\lambda n}{2c^{2}}+\frac{\lambda}{2}\sum_{i=1}^{n}p_i^{2}=0 \qquad (15)$$

Equation (15) is linear in $\lambda$, implying that for every given $c$ we can find a $\lambda$ satisfying it ($\dagger_2'$).

Therefore, combining ($\dagger_1'$) and ($\dagger_2'$), we conclude that there is at least one optimal choice of $f(\cdot,\cdot)$ in $\mathcal{F}$ that maximizes the information gathered by the data consumer, subject to the fixed budget.