CanaryTrap: Detecting Data Misuse by Third-Party Apps on Online Social Networks

Shehroze Farooqi, Maaz Musa, Zubair Shafiq, Fareed Zaffar

Affiliations: (1) The University of Iowa; (2) The University of Iowa/Lahore University of Management and Sciences; (3) The University of Iowa; (4) Lahore University of Management and Sciences

Abstract

Online social networks support a vibrant ecosystem of third-party apps that get access to personal information of a large number of users. Despite several recent high-profile incidents, methods to systematically detect data misuse by third-party apps on online social networks are lacking. We propose CanaryTrap to detect misuse of data shared with third-party apps. CanaryTrap associates a honeytoken to a user account and then monitors its unrecognized use via different channels after sharing it with the third-party app. We design and implement CanaryTrap to investigate misuse of data shared with third-party apps on Facebook. Specifically, we share the email address associated with a Facebook account as a honeytoken by installing a third-party app. We then monitor the received emails and use Facebook’s ad transparency tool to detect any unrecognized use of the shared honeytoken. Our deployment of CanaryTrap to monitor 1,024 Facebook apps has uncovered multiple cases of misuse of data shared with third-party apps on Facebook including ransomware, spam, and targeted advertising.

1 Introduction

Online social networks such as Facebook and Twitter have millions of third-party applications (or apps) on their platforms [fb9millionApps, onemilliontwitterapps]. These third-party apps can potentially get access to billions of accounts including personal information of users who install these apps. For example, single sign-on (SSO) apps on Facebook typically require access to a user’s email address, date of birth, gender, and likes [fbpermissions]. Third-party apps with access to personal information of a large number of users have a high potential for misuse. There have already been several high-profile incidents of data misuse by third-party apps on online social networks [CALeak17GuardianT2, millionFBAccountsAppBought, DigiTrendsSuspendedAppsT5a, googlePlusDataLeak500kT8, googlePlusDataLeak52mT9, appdataleakAWSservers, myPersonalityQuizLeakT6, nameTestDataLeakT7, rankwaveLawsuit, rankwaveFacebook]. Most notably, a personality quiz app “thisisyourdigitallife” harvested data of an estimated 87 million Facebook users that was then used by Cambridge Analytica to create targeted advertising campaigns during the 2016 US presidential election [CALeak17GuardianT2]. These incidents have invited scrutiny from regulators. In July 2019, the FTC imposed sweeping new privacy restrictions on Facebook [ftc5Billion], including a mandate to suspend third-party apps that do not certify compliance with Facebook’s platform policies. However, the verification of these compliance certifications has proven challenging for Facebook [CambridgeAnalyticaFbNewsRoom, rankwaveLawsuit, rankwaveFacebook].

There is a lack of methods to systematically detect data misuse by third-party apps. The main issue is that online social networking platforms lose control over their data once it is retrieved by third-party apps. These third-party apps can store the retrieved data on their servers, from where it can be further transferred to other entities. Neither users nor online social networks have any visibility into the use of data stored on the servers of third-party apps. This makes detecting data misuse extremely challenging, since it is hard to track data that is no longer under the platform's control.

In this paper, we present CanaryTrap that uses honeytokens to monitor misuse of data shared with third-party apps on online social networks. A honeytoken refers to a piece of information such as email address or credit card information that can be intentionally leaked or shared to detect its unrecognized (or potentially unauthorized) use [honeytokenPotNetTerminologies, honeytokenOtherHoneypot, honeytokenCybercriminals]. CanaryTrap shares a honeytoken with a third-party app and detects its misuse using different monitoring channels. For example, if an email address is shared as a honeytoken then received emails act as the channel for detecting unrecognized use of the shared email address.

We design and implement CanaryTrap to investigate misuse of data shared with third-party apps on Facebook. We share the email address associated with a Facebook account as a honeytoken by installing a third-party app and then monitor the received emails to detect any unrecognized use of the shared email address. We conclude that a honeytoken shared with a third-party app has been potentially misused if the sender of a received email cannot be recognized as the third-party app. In addition to using emails as a channel to detect data misuse, we also leverage the fact that advertisers on Facebook can use email addresses to target ads to custom audiences [customAudienceAdsFB]. Specifically, we use Facebook’s ad transparency tool “Why Am I Seeing This?” to monitor advertisers who have used the shared honeytoken (i.e., email address) to run custom audience ad campaigns on Facebook [fbAdTransparencyHowTo]. We conclude that a honeytoken shared with a third-party app has been potentially misused if the advertiser cannot be recognized as the third-party app.

There are two main challenges in scaling CanaryTrap to monitor a large number of third-party apps that exist on Facebook. First, we would need to create a Facebook account using a unique email address per-app as a honeytoken. It is infeasible to create a large number of Facebook accounts because Facebook’s anti-abuse systems thwart bulk account registration [fbFakeAccountsActions]. To overcome this challenge, we propose an array framework that rotates email addresses associated with a Facebook account while maintaining a one-to-one mapping between shared honeytokens (i.e., email addresses) and third-party apps. Second, Facebook’s anti-abuse systems also limit our ability to frequently rotate email addresses associated with a Facebook account. Thus, it is infeasible for us to rotate email addresses for each app as we try to scale CanaryTrap to monitor a large number of third-party apps on Facebook. We can install multiple third-party apps on a Facebook account associated with an email address; however, we would not be able to correctly attribute the third-party app responsible for data misuse since the email address was shared with multiple apps. To overcome this challenge, we propose a matrix framework that uses an $n\times m$ matrix arrangement to install $nm$ apps across a pair of Facebook accounts by rotating $n+m$ email addresses. This matrix arrangement allows us to attribute the responsible third-party app by establishing a unique two-dimensional mapping for each app.

We deploy array and matrix variants of CanaryTrap to monitor misuse of data shared with 1,024 Facebook third-party apps. We share honeytokens with these apps using array and matrix frameworks and then monitor the misuse through received emails over the duration of more than a year. By analyzing the emails received on the honeytoken email addresses, the array framework detects 16 apps that share email addresses with unrecognized senders. On the other hand, the matrix framework detects 9 out of these 16 apps mainly due to non-deterministic app behavior and implementation issues (which can be readily mitigated in future deployments of CanaryTrap). By analyzing Facebook’s ad transparency tool, while we are unable to attribute the responsible apps because Facebook does not reveal the email address used by advertisers, we are able to detect unrecognized use of data by 9 advertisers.

Our further investigation reveals that Facebook does not fully enforce its policies [tos] that require app developers to disclose their data collection and sharing practices as well as respond to data deletion requests by users. Of the analyzed apps, 6% fail to provide the required privacy policies, 48% do not respond to data deletion requests, and a few apps even continue using user data after confirming data deletion.

2 Background & Related Work

2.1 Background

We briefly describe the privacy implications of making user data accessible to third-party apps.

Third-party app ecosystem. Online social networks provide APIs to enable the development of third-party apps to enhance user experience (e.g., games, entertainment, utilities). Popular online social networks such as Facebook and Twitter have millions of third-party apps that are used by hundreds of millions of users [FarooqiOauthAbuse17IMC, appdata, onemilliontwitterapps]. For example, Facebook’s third-party apps are being used by more than 42,000 of the top million websites [SSOLinkPaper]. These third-party apps, with a very large user base on the order of hundreds of millions of users, get restricted access to their users’ accounts depending on the permissions granted by the users. Third-party apps can request permissions to retrieve user profile information such as their contact information (email address, phone number) and demographics (date of birth, age, gender, sexual orientation) as well as any content they or their friends/followers may have shared (e.g., posts, likes, comments). While third-party apps are supposed to request the minimal set of permissions needed to implement their functionality, prior research has shown that many third-party apps request more permissions than they actually need [Chia12IsAppSafeWWW]. In fact, it is not far-fetched to assume that many apps request more permissions than necessary with malicious intent. In summary, third-party app access to personal information of millions of users opens up possibilities for data misuse. Despite its privacy risks, the third-party app ecosystem also benefits users. For example, the integration of single sign-on (SSO) features into websites expands users’ choices for authentication. Therefore, protecting users from data misuse is important for the growth of this ecosystem.

Data leakage/misuse. Figure 1 illustrates how a third-party app may engage in data misuse. Third-party apps use the developer APIs (e.g., Facebook Graph API) to retrieve data of users who install their apps. The retrieved data is typically stored on the servers controlled by third-party apps essentially providing them perpetual access to the retrieved data. These apps can then use the retrieved data to implement their functionality. For example, a music streaming app (e.g., Spotify) can leverage the data to suggest relevant music or allow users to share music with their friends or followers. It is noteworthy that the terms of service (TOS) generally prohibit any use of the retrieved data outside the scope of the third-party app [tos]. Hence, third-party apps should not leak users’ data intentionally (e.g., selling/sharing data to data brokers and advertisers) [rankwaveFacebook] or inadvertently (e.g., accidentally making data publicly available) [appdataleakAWSservers]. Leaked data can be misused by an attacker for sending spam emails or targeted ad campaigns. We define misuse as any use of a user’s data, which is retrieved by a third-party app, that is outside the scope of the app.

Fig. 1: Illustration of misuse of data shared with a third-party Facebook app. User data stored on the app server can be leaked to an attacker through unauthorized sale, sharing, or accidental public exposure. The attacker can then misuse this leaked data for various purposes such as marketing through spam or targeted ad campaigns.

Role of online social networks in curbing data misuse. There have been several recent high-profile incidents of misuse of data shared with third-party apps [CALeak17GuardianT2, myPersonalityQuizLeakT6, DigiTrendsSuspendedAppsT5a, nameTestDataLeakT7, googlePlusDataLeak500kT8, googlePlusDataLeak52mT9, appdataleakAWSservers, millionFBAccountsAppBought]. Most notably, a voter-profiling company Cambridge Analytica reportedly used a third-party app “thisisyourdigitallife” to harvest personal data (public and private profile information including basic demographics, places visited, interests, and friends) of more than 50 million Facebook users [CambridgeAnalyticaFiles, CALeak17GuardianT2]. Data collected through third-party Facebook apps containing user identifiers (user name, email address, user ID) was even reportedly sold on a black market [millionFBAccountsAppBought]. These high-profile incidents of data misuse by third-party apps and pressure from regulators have nudged Facebook to audit third-party apps on its platform. Facebook recently curbed developer access to user data and also started revoking access to dormant APIs [devAccessAPIReduceFB, DigiTrendsSuspendedAppsT5a, fbAppInvestigationLatest]. As a part of its recent settlement with the FTC, Facebook now requires developers to annually certify compliance with its TOS [ftc5Billion]. However, the verification of these certifications has proven challenging for Facebook in the past [rankwaveFacebook, CambridgeAnalyticaFbNewsRoom]. Thus, online social networks need to proactively monitor third-party apps on their platform for potential data misuse. Unfortunately, online social network operators have skirted their responsibility. For example, an ex-employee accused Facebook of intentionally limiting its audits of data accessed by third-party apps unless there is negative press or pressure by regulators [nytimeFBManagerPrivacyApps, guardianFBManagerPrivacyApps].
The callous attitude of online social networks in protecting their users’ data from third-party apps further highlights the need for methods that can be independently deployed to detect data misuse by third-party apps. However, the unavailability of privileged information (e.g., API access logs) and stringent crawling restrictions imposed by online social networks limit the ability of independent researchers and watchdogs to investigate the misuse of data by third-party apps at a large scale.

2.2 Related Work

Detecting data leakage. A large body of prior work has focused on detecting leakage of user data on the client-side (e.g., mobile apps, web browsers) to online trackers and advertisers [Krishnamurthy11privacyleakage, mayer12thirdSP, englehardt15cookiesWWW, starov16contactFormPETS, englehardt18emailTrackingPETS, song2015privacyguardSPSM, ren2016reconMobiSys, renLongitudinalPIINDSS, reyes2018CoppaChildPETS, razaghpanah2018appsPrivacyNDSS, KrishnamurthyPIILeakageWOSN09, HuberAppInspectCOSN13]. First, some prior work has focused on detecting data leakage in mobile apps through network traffic analysis [song2015privacyguardSPSM, ren2016reconMobiSys, renLongitudinalPIINDSS, reyes2018CoppaChildPETS, razaghpanah2018appsPrivacyNDSS]. For example, Ren et al. [renLongitudinalPIINDSS] showed that more than 50% of the 100 most popular mobile apps leak personally identifiable information (PII). Second, prior work has focused on detecting data leakage in web browsers through cookies, contact forms, or emails [Krishnamurthy11privacyleakage, mayer12thirdSP, englehardt15cookiesWWW, starov16contactFormPETS, englehardt18emailTrackingPETS]. For example, Starov et al. [starov16contactFormPETS] showed that more than 8% of websites leak users’ PII to online trackers through contact forms. Finally, a subset of prior work on detecting data leakage in web browsers has focused on the leakage of user data by third-party apps on online social networks [KrishnamurthyPIILeakageWOSN09, HuberAppInspectCOSN13]. Huber et al. [HuberAppInspectCOSN13] reported more than a hundred examples of PII (e.g., Facebook’s user ID and name) being leaked to analytics services by third-party apps. This line of prior work has two key limitations. First, it is limited to detecting data leakage on the client-side – it cannot detect data leakage on the server-side. Second, it is limited to detecting data leakage – it cannot detect potential misuse of the leaked data. 
In contrast to prior work on detecting data leakage, we focus on the detection of data misuse by third-party apps irrespective of whether it is leaked at the client-side or server-side.

Honeypots. Prior work has used honeypots to investigate reputation manipulation in online social networks and account/website compromise. First, prior work has used honeypots to monitor attackers’ interactions with honeypots by leaking credentials [DeBlasioTripwire17IMC, FarooqiOauthAbuse17IMC, OnaolapoHoneyGmail16IMC]. For example, Onaolapo et al. [OnaolapoHoneyGmail16IMC] leaked credentials of honey email accounts to blackhat forums and paste sites and monitored activities of attackers who accessed their honey email accounts. Second, prior work has used honeypots to monitor attacker interactions with honeypots without explicitly handing over their access, such as by purchasing fake followers on Facebook, Instagram, or Twitter [Stringhini13FollowGreenIMC, Cristofaro14PayingForLikesIMC, DeKovenFollowFootstepInsta2018IMC]. For example, DeKoven et al. [DeKovenFollowFootstepInsta2018IMC] purchased Instagram followers for honeypot accounts to monitor the activities and infrastructure of spammers.

Honeytokens. A honeytoken is a specialized form of honeypot, typically a digital piece of information (e.g., email address, credit card information) [honeytokenPotNetTerminologies]. Honeytokens have been used in prior work to investigate insider threats [spitznerInsiderHT03ACSAC, BowenBaitingDecoy09, kesheeInsiderThreat14IET], phishing attacks [mcraePhishingHT07HICSS], and website compromise [DeBlasioTripwire17IMC]. For example, Spitzner et al. [spitznerInsiderHT03ACSAC] discussed various kinds of honeytokens such as creating a bogus medical record to detect unauthorized access by employees. More recently, DeBlasio et al. [DeBlasioTripwire17IMC] used an email address and password pair as a honeytoken to detect website compromise. In a similar spirit, we are interested in using honeytokens to detect misuse of data shared with third-party apps in online social networks.

3 CanaryTrap

Prior work lacks systematic methods that can be used by online social networks or independent watchdogs to detect misuse of data shared with third-party apps. Even though online social networking platforms such as Facebook and Google have unrestricted access to their internal logs, it is challenging for them to detect data misuse by third-party apps because data leaves their platform once a third-party app retrieves it through the APIs. Online social networks can look for excessive (or “anomalous”) API access patterns by third-party apps but they have no visibility into how the data is used by third-party apps once it leaves their premises. Even though such privileged information (available exclusively to online social networking platforms) can help narrow down the potential misuses, it is desirable to build methods that can be deployed by independent watchdogs without needing cooperation from online social networks. Next, we explain the design and implementation of our proposed approach to this end.

3.1 Design

Overview. We introduce CanaryTrap, a honeytoken based approach to detect misuse of data shared with third-party apps on online social networks without needing their cooperation. Inspired by prior research on honeypots, CanaryTrap uses a honeytoken to detect misuse of data shared with third-party apps. A honeytoken refers to a piece of information such as email address or credit card information that can be intentionally leaked/shared to detect its unrecognized (or potentially unauthorized) use [honeytokenPotNetTerminologies]. CanaryTrap shares a honeytoken with third-party apps and detects its misuse using different channels. For example, if an email address is shared as a honeytoken to a third-party app then the received emails act as the channel for detecting unrecognized use of the shared email address. CanaryTrap can detect misuse of data shared with a third-party app on any platform (e.g., Facebook, Google) by associating the honeytoken (e.g., email address) to a user account on the platform and then sharing it with the third-party app. Next, we explain the design of CanaryTrap and then propose two implementation frameworks that allow its deployment at scale.

Design Space. CanaryTrap relies on sharing a honeytoken in the data associated with an account that can be used as a bait to trigger misuse. There are various attributes in a user account including but not limited to name, email address, date of birth, photos, timeline posts, check-ins, phone number, and address that can be used as honeytokens. The use of each of the aforementioned attributes as a honeytoken has different pros and cons. Below, we discuss the desirable properties in a honeytoken to detect the misuse of data shared with a third-party app.

1. Exploitability: The honeytoken must provide substantial value and incentives for exploitation.

2. Soundness: The honeytoken must allow setting up channels that can be soundly monitored for misuse detection.

3. Feasibility: The monitoring channels for the honeytoken must be feasible to set up at scale.

4. Availability: The honeytoken must be commonly available in the account information and also commonly requested by third-party apps.

While we considered various attributes, we ended up selecting email address as our honeytoken since it best satisfies the aforementioned properties. Next, we justify our choice of email address as the honeytoken.

1) Email address is exploitable by an attacker since its uniqueness gives it a crucial role in the misuse of data and it allows the attacker to contact users directly by sending emails. Typically, online services including websites, games, and software require users to associate an email address with their accounts. Users tend to use the same email address across different accounts [minkusEmailReuse14Pets, PeritoUniqueUsernames11Pets]. Hence, email address has emerged as a universal identifier [emailIsKey1, emailIsKey2, emailIsKey3]. It also provides data brokers and advertisers the opportunity to unify data acquired on a user from multiple sources to increase its coverage.

2) Email address allows us to set up sound channels to monitor the misuse of data. We can set up our own email server or we can partner with an email provider to receive emails on the corresponding email address. We can then analyze these emails to detect the misuse of the email address shared with third-party apps.

3) Email address allows us to set up feasible channels in a large-scale controlled experiment. We can acquire a large number of unique email addresses to monitor millions of third-party apps. We can either set up our own email server to create any number of email accounts or we can partner with an email provider.

4) Email address is available in the account information of most online platforms. It is also commonly requested by third-party apps [wangAppPermission11CHIMIT]. Moreover, email is one of the few permissions that can be requested by third-party apps on Facebook without requiring a review [loginPermissionsReview].

3.2 Implementation

Next, we discuss CanaryTrap’s implementation to detect misuse of data shared with third-party apps on Facebook using email address as the honeytoken.

Sharing a honeytoken with a third-party app. We create a fresh Facebook account to share an email address as the honeytoken with a third-party app. We can either partner with a major email provider [DeBlasioTripwire17IMC, OnaolapoHoneyGmail16IMC] or set up our own server for email accounts. We assign an email address to the Facebook account. We then share the email address with the third-party app by installing the third-party app. The third-party app is able to access the email address using Facebook’s Graph API.

Monitoring Channels. We set up two channels to monitor the misuse of the honeytoken shared with the third-party app monitored by CanaryTrap. First, we use received emails on the email account of the shared honeytoken as a monitoring channel. We conclude that a honeytoken shared with a third-party app has been potentially misused if the sender of a received email cannot be recognized as the third-party app. Second, we use Facebook’s ad transparency tool “Why Am I Seeing This?” [fbAdTransparencyHowTo, fbAdTransparencyCustomAudience] as a monitoring channel. We monitor whether advertisers use the shared email address to target ads to Facebook custom audiences [customAudienceAdsFB]. We conclude that a honeytoken shared with a third-party app has been potentially misused if the advertiser cannot be recognized as the third-party app. We explain this process in detail below.

1) Detecting data misuse using received emails. Given a received email and a third-party app (app’s name and domain name) as input, we use keyword matching to determine whether the sender of the email is the input third-party app. To match an email with a third-party app, we generate keywords using the app’s name and its host website’s domain name. For an app’s name, we start with its full name and create tokens of the name if it is composed of multiple words. For example, if the app name is “test application” we generate “test application”, “test”, and “application” as keywords. For the domain name, we use the complete domain name and also create tokens of domain levels, excluding TLDs (e.g., .com, .co, .co.uk). For example, if the domain name is “subdomain.example.com” we create “subdomain.example.com”, “subdomain”, and “example” as keywords. We then search for these keywords in the header of the received email. Specifically, we search for them in the “from”, “reply-to”, “message-id”, and “subject” fields. If any of the keywords is found in any of these fields, we label the email as recognized; otherwise, we label the email as unrecognized. We refer to the sender of an unrecognized email as an unrecognized sender. An unrecognized sender can be a partner website or an external service used by the app to send emails. Therefore, we manually analyze the content of unrecognized emails to further determine data misuse.
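The keyword-matching step described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names `app_keywords` and `label_email`, the small `SUFFIX_LABELS` set standing in for a full TLD list, and the use of case-insensitive substring matching are all assumptions.

```python
from email import message_from_string

# Hypothetical, deliberately incomplete set of TLD labels to exclude.
# A real deployment would need a fuller list; this one is an assumption.
SUFFIX_LABELS = {"com", "co", "uk", "org", "net"}

# Header fields searched for keywords, as listed in the text.
HEADER_FIELDS = ("From", "Reply-To", "Message-ID", "Subject")


def app_keywords(app_name, domain):
    """Build lowercase keywords from an app's name and its host domain."""
    name = app_name.lower().strip()
    keywords = {name, *name.split()}          # full name plus each word
    domain = domain.lower().strip()
    keywords.add(domain)                      # complete domain name
    keywords.update(
        label for label in domain.split(".") if label not in SUFFIX_LABELS
    )                                         # domain levels without TLDs
    keywords.discard("")
    return keywords


def label_email(raw_email, keywords):
    """Label an email 'recognized' if any keyword occurs in the chosen headers."""
    msg = message_from_string(raw_email)
    haystack = " ".join(str(msg.get(f, "")) for f in HEADER_FIELDS).lower()
    return "recognized" if any(k in haystack for k in keywords) else "unrecognized"


kw = app_keywords("test application", "subdomain.example.com")
print(label_email("From: News <news@example.com>\nSubject: Hello\n\nbody", kw))
```

Plain substring matching is the simplest reading of "search these keywords in the header". Short keywords can match unrelated words under this rule, which is one reason a manual review of the labeled emails remains useful.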

2) Detecting data misuse using Facebook’s ad transparency tool. We identify the names of advertisers who upload our shared honeytoken by crawling the advertiser information provided by Facebook’s ad transparency tool [fbAdTransparencyHowTo, fbAdTransparencyCustomAudience]. Specifically, Facebook’s ad transparency tool provides a list of advertisers who uploaded a list with an email address associated with a Facebook account [VenkatadriPPITargettingDataBroker18SP]. For each of the listed advertisers, we extract the advertiser’s name and its domain (if available) listed on their Facebook page. We then match the advertiser’s name and its domain against the app’s keywords generated using the aforementioned keyword matching process. If any of the keywords is matched with the name of the advertiser, we label the advertiser as recognized; otherwise, we label the advertiser as unrecognized. We conclude that data has been potentially misused by an unrecognized advertiser because Facebook’s ad transparency tool provides the evidence that the email address shared with the third-party app has been uploaded by the unrecognized advertiser with Facebook.
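The advertiser check can be sketched in the same spirit. The snippet below assumes the advertiser name and optional domain have already been collected from the ad transparency tool. It does not model the crawling itself, and the function name `label_advertiser` and the substring rule are illustrative assumptions.

```python
def label_advertiser(advertiser_name, advertiser_domain, keywords):
    """Label an advertiser 'recognized' if any app keyword appears in its
    name or (when available) its domain; otherwise 'unrecognized'."""
    text = f"{advertiser_name} {advertiser_domain or ''}".lower()
    return "recognized" if any(k in text for k in keywords) else "unrecognized"


# Keywords for a hypothetical app "test application" hosted on
# subdomain.example.com, matching the example in the text.
keywords = {
    "test application", "test", "application",
    "subdomain.example.com", "subdomain", "example",
}
print(label_advertiser("Example Store", "shop.example.com", keywords))
print(label_advertiser("Acme Loans", None, keywords))
```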

Summary. Figure 2 provides an overview of CanaryTrap to detect misuse of data shared with third-party apps on Facebook. CanaryTrap uses a Facebook account to share an email address as a honeytoken with a third-party app. Misuse is detected using two channels: by analyzing the received emails and the advertisers listed by Facebook’s ad transparency tool.

3.3 Deployment

Since online platforms support a large number of third-party apps, it is important that CanaryTrap’s deployment can scale. We face two challenges in scaling CanaryTrap’s deployment on Facebook.

First, CanaryTrap requires creating a Facebook account to share a unique email address as a honeytoken with a third-party app. Since Facebook’s anti-abuse systems thwart automated/manual bulk account registration [fbFakeAccountsActions], it is infeasible to create a large number of Facebook accounts to monitor a large number of apps. To overcome this challenge, we propose an array framework that rotates multiple email addresses associated with a Facebook account to maintain one-to-one mapping between the shared honeytokens (i.e., email addresses) and third-party apps. Note that we can practically create as many email accounts as necessary by setting up our own email server as we discuss later in Section 4.1.

Second, Facebook’s anti-abuse systems also limit our ability to frequently rotate the email address associated with a Facebook account. Thus, it is infeasible for us to rotate email addresses for each app as we try to scale CanaryTrap to monitor a large number of third-party apps on Facebook. As an alternative, we can share the same honeytoken with multiple third-party apps using a single Facebook account. However, if a third-party app misuses our data, it would not be possible to identify the responsible app since the email address was shared with multiple apps. To overcome this challenge, we propose a matrix arrangement to install $n\times m$ apps across a pair of Facebook accounts by rotating $n+m$ email addresses. This matrix arrangement enables us to attribute the responsible third-party app by establishing a unique two-dimensional mapping for each app.

The matrix framework is more scalable than the array framework. More specifically, the matrix framework allows us to reuse a honeytoken for monitoring multiple third-party apps. It is able to provide a unique app-to-honeytoken mapping by sharing two honeytokens with each app in a two-dimensional arrangement. It allows CanaryTrap to monitor $N$ apps using as few as $2\sqrt{N}$ honeytokens, instead of the $N$ honeytokens needed in the array framework.
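As a worked instance of this comparison, take $N=1{,}024$, the number of apps monitored in the deployment. The even split $n=m=32$ is an illustrative assumption, chosen because $1{,}024$ is a perfect square, and the subscripted names below are introduced only for this example:

$$N_{\text{array}} = N = 1{,}024, \qquad N_{\text{matrix}} = n+m = 2\sqrt{N} = 2\times 32 = 64.$$

Under that assumption the matrix arrangement would need 16 times fewer email addresses, at the cost of sharing each address with 32 apps.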

Next, we detail the deployment of CanaryTrap using both array and matrix frameworks.

3.3.1 Array Framework

In the array framework, we install all third-party apps on a single Facebook account such that a unique honeytoken (email address) is shared with each app. To this end, we create a single Facebook account, $A$, and set up different honeytoken email addresses for it, $HT_{A}$.

$$F=\{A\},\quad HT_{A}=\{e_{1},e_{2},\ldots,e_{n}\},\quad App=\{a_{1},a_{2},\ldots,a_{n}\}$$

We start by associating an email address $e_{i}$ from $HT_{A}$ with the Facebook account $A$ and install a new app $a_{i}$. We then uninstall the app $a_{i}$ and remove the email address $e_{i}$. This ensures that the app $a_{i}$ is shared only a single honeytoken, i.e., $e_{i}$. We repeat this process for all of the apps in $App$ such that each app $a_{i}$ is shared the honeytoken $e_{i}$.

Since only the third-party app $a_{i}$ knows the email address $e_{i}$, any email received on $e_{i}$ is attributed to the third-party app $a_{i}$. Let’s assume that $e_{i}$ receives $k$ emails. We denote emails received by $e_{i}$ as $Mail_{e_{i}}=\{mail_{1},mail_{2},\ldots,mail_{k}\}$. We then input each email from $Mail_{e_{i}}$ and third-party app $a_{i}$ to the matching process explained in Section 3.2 to check whether the email is labeled as recognized or unrecognized. We say that an app $a_{i}$ is responsible for the unrecognized use of the honeytoken shared with it if one or more emails from $Mail_{e_{i}}$ are labeled as unrecognized.
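The attribution rule of the array framework can be sketched as below. It is an illustration under stated assumptions: `inbox` is a hypothetical mapping from each honeytoken address to the emails it received, and `is_recognized` is a hypothetical stand-in for the matching process of Section 3.2.

```python
def attribute_array(apps, honeytokens, inbox, is_recognized):
    """Return the apps a_i whose honeytoken e_i received at least one
    unrecognized email. apps[i] is assumed to have been shared only
    honeytokens[i], as in the one-to-one mapping of the array framework."""
    if len(apps) != len(honeytokens):
        raise ValueError("array framework needs one honeytoken per app")
    flagged = []
    for app, token in zip(apps, honeytokens):
        mails = inbox.get(token, [])          # Mail_{e_i}; empty if no mail
        if any(not is_recognized(mail, app) for mail in mails):
            flagged.append(app)
    return flagged


# Toy data: an email counts as recognized if it mentions the app's name.
apps = ["alpha", "beta"]
tokens = ["e1@honey.test", "e2@honey.test"]
inbox = {
    "e1@honey.test": ["welcome from alpha"],
    "e2@honey.test": ["welcome from beta", "cheap pills"],
}
print(attribute_array(apps, tokens, inbox, lambda mail, app: app in mail))
```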

3.3.2 Matrix Framework

The matrix framework allows us to attribute the third-party app responsible for the misuse while sharing a honeytoken with multiple apps. To monitor $N$ apps using the matrix framework, we require $n$ and $m$ honeytokens such that $N\leq n\times m$. If $N$ is a perfect square then $n=m=\sqrt{N}$. If $N$ is not a perfect square then $n\geq\sqrt{N}$ and $m\geq\sqrt{N}$ and $N<n\times m$.
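One simple way to pick dimensions that satisfy $N\leq n\times m$ is to round $\sqrt{N}$ up and use it for both sides. The sketch below does that. The function name `matrix_dims` is hypothetical, and other choices of $n$ and $m$ would also satisfy the constraint.

```python
import math


def matrix_dims(num_apps):
    """Return (n, m) with n = m = ceil(sqrt(num_apps)), so n * m >= num_apps."""
    if num_apps < 1:
        raise ValueError("num_apps must be positive")
    side = math.isqrt(num_apps)               # floor of the square root
    if side * side < num_apps:
        side += 1                             # round up for non-squares
    return side, side


print(matrix_dims(1024))   # perfect square
print(matrix_dims(10))     # not a perfect square
```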

To implement CanaryTrap using a two-dimensional matrix framework, we create two Facebook accounts, $R$ and $C$ . Each account is associated with a set of unique email addresses that will act as our honeytokens. We share two unique email addresses with each third-party app: one email address associated with the Facebook account $R$ (row) and one email address associated with the Facebook account $C$ (column). Figure LABEL:fig:_LinMat shows the deployment of CanaryTrap using two Facebook accounts associated with $n$ and $m$ email addresses. This matrix arrangement allows us to monitor data misuse by $n\times m$ third-party apps using $n+m$ honeytokens.
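The row and column arrangement can be illustrated with a short Python sketch. It assumes a row-major placement of apps in the matrix; the function names `assign_honeytokens` and `app_for_pair` are hypothetical. It only demonstrates that each (row, column) pair of honeytokens identifies exactly one app. How an observation on a single honeytoken is handled is not covered by this sketch.

```python
from typing import Dict, List, Optional, Tuple


def assign_honeytokens(
    apps: List[str], row_tokens: List[str], col_tokens: List[str]
) -> Dict[str, Tuple[str, str]]:
    """Place apps row by row in the matrix and return {app: (row token, column token)}."""
    n_cols = len(col_tokens)
    if len(apps) > len(row_tokens) * n_cols:
        raise ValueError("not enough honeytokens for this many apps")
    return {
        app: (row_tokens[k // n_cols], col_tokens[k % n_cols])
        for k, app in enumerate(apps)
    }


def app_for_pair(
    assignment: Dict[str, Tuple[str, str]], row_token: str, col_token: str
) -> Optional[str]:
    """Return the unique app that was shared both honeytokens, if any."""
    for app, pair in assignment.items():
        if pair == (row_token, col_token):
            return app
    return None
```

With two row honeytokens and three column honeytokens, five email addresses cover up to six apps, and every app receives a distinct pair.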

$$F=\{R,C\}$$

$$HT_{R}=\{r_{1},r_{2},\ldots,r_{m}\}\qquad HT_{C}=\{c_{1},c_{2},\ldots,c_{n}\}$$

$$App=\begin{bmatrix}a_{1,1}&\cdots&a_{1,n}\\ \vdots&\ddots&\vdots\\ a_{m,1}&\cdots&a_{m,n}\end{bmatrix}$$

4 Results

In this section, we first explain our experiments to deploy CanaryTrap to monitor 1,024 third-party apps on Facebook. We then investigate cases of data misuse detected by CanaryTrap by analyzing the received emails and Facebook’s ad transparency tool. Finally, we evaluate the effectiveness of CanaryTrap’s matrix framework in detecting data misuse.

4.1 Experimental Setup

Identifying third-party apps. Since Facebook does not provide an index of third-party apps, we use a list of Facebook apps from prior research [SSOLinkPaper] that was compiled by crawling the web. This list contains 43,332 host websites that integrate Facebook apps (e.g., SSO, social plugins). By re-crawling these 43,332 host websites, we find active Facebook apps on 25,800 websites that request email address. We randomly select 1,024 of these host websites that integrate Facebook apps to deploy CanaryTrap for monitoring data misuse. Our random selection includes different services such as music streaming and news sites.

Setting up email honeytokens. The array framework requires sharing one email honeytoken per app. Thus, we need 1,024 honeytokens to monitor 1,024 third-party apps using the array framework. Matrix framework requires sharing $2\sqrt{mn}$ honeytokens to monitor $mn$ apps. Thus, we need 64 honeytokens to monitor 1,024 third-party apps using the matrix framework. To this end, we set up a .com email server and use a list of popular names [SurnamesGithub] to create email accounts using the firstname-lastname@example.com template. For example, if we have a first name “john” and last name “doe”, we generate an email account with email address “john-doe@example.com”.

Setting up Facebook accounts. We register three fresh Facebook accounts in total: one Facebook account $A$ for the array framework and two Facebook accounts $R$ and $C$ for the matrix framework. We set the privacy settings of these accounts such that their personal information, including email addresses, remains private to everyone except for the installed apps.

Figure 4: Automated workflow of our CanaryTr