Defending Touch-based Continuous Authentication Systems from Active Adversaries Using Generative Adversarial Networks

Mohit Agrawal, Pragyan Mehrotra, Rajesh Kumar, Rajiv Ratn Shah

¹Teradata R&D Labs, India. ²IIIT Delhi, India. ³Haverford College, USA.

Abstract

Previous studies have demonstrated that commonly studied (vanilla) touch-based continuous authentication systems (V-TCAS) are susceptible to population attacks. This paper proposes a novel Generative Adversarial Network assisted TCAS (G-TCAS) framework, which showed more resilience to the population attack. The G-TCAS framework was tested on a dataset of $117$ users who interacted with a smartphone and tablet pair. On average, the increase in the false accept rates (FARs) under the population attack was much higher for V-TCAS ( $22\%$ ) than for G-TCAS ( $13\%$ ) on the smartphone. Likewise, the increase in the FARs was $25\%$ for V-TCAS compared to $6\%$ for G-TCAS on the tablet.

*Authors contributed equally.
2021 IEEE International Joint Conference on Biometrics (IJCB). 978-1-6654-3780-6/21/$31.00 ©2021 IEEE

1 Introduction

An individual’s identity can be authenticated via what one can memorize (e.g., PINs, passwords), can carry (e.g., magnetic cards, key fobs), has (e.g., face, fingerprint, iris), and does (e.g., walking, talking, swiping). These means of authentication offer advantages over one another primarily in usability, privacy, and security. For instance, PINs and passwords need to be memorized, are time-consuming to enter, and can be stolen [1, 2]. Face and fingerprint need not be memorized and offer faster authentication. However, some users might find them intrusive. Similarly, swiping or typing patterns can offer continuous authentication of identity, while PINs, passwords, and fingerprints offer only entry-point authentication. Touch gestures are among the most widely studied means for continuous authentication [3, 4, 5, 6, 7, 8, 9, 10, 11]. One of the primary reasons is that touch gestures meet most of the criteria (universality, distinctiveness, permanence, collectability, performance, acceptability, and circumvention) that define a viable biometric [12, 13].

Hertenstein and Keltner [14] stated that touch is the most developed sensory modality since birth and contributes to people’s cognitive and socio-emotional development. They suggest that people can decode anger, fear, disgust, love, gratitude, and sympathy via touch. One can even estimate the length of thumbs from touch gestures produced on touch-enabled smartphones [15]. The authors cited anthropometrics to argue that many individual body segments’ lengths follow a unique proportional relationship. The pervasiveness of touch-enabled devices allows us to capture touch from different dimensions, including the touch locations and area and pressure at each of those locations. Numerous studies [3, 4, 5, 7, 8] have demonstrated that commonly studied (vanilla) touch-based continuous authentication systems (V-TCAS) achieve practically low error rates. However, most of these studies have evaluated V-TCAS’s performance under a zero-effort adversarial environment despite some studies suggesting that V-TCAS are vulnerable to data injection and imitation attacks [16, 17, 18].

We believe that it is essential to take a proactive approach and test TCAS under the adversarial environments described in the literature. This paper makes an effort in this direction. The main contributions are summarized as follows:

• We implement and test the vanilla TCAS (V-TCAS) under traditional zero-effort and population-based adversarial scenarios. In line with previous studies, we observed that V-TCAS’s false accept rate increases significantly under the population-based adversarial scenarios.
• Next, we propose a novel Generative Adversarial Network assisted TCAS framework (G-TCAS), test it under zero-effort and population attack adversarial scenarios, and find it to be more resilient than V-TCAS.
• We benchmarked four widely studied classifiers (each with a diverse learning paradigm) on a dataset of $117$ users who provided their samples in a multi-device environment. The superiority of G-TCAS over V-TCAS was evident across the classifiers and devices (https://github.com/midas-research/IJCB2021GANTouch).

The rest of the paper is organized as follows. Section 2 discusses the closely related works. Section 3 presents the design of experiments. Section 4 presents and discusses the results. Finally, we conclude the paper and provide future research directions in Section 5.

2 Related work

The closely related works are structured as follows. The first part discusses the studies that focus on TCAS, while the second part systematically describes the possible adversarial environments.

2.1 Continuous Authentication via Touch Gestures

Some of the earliest studies that explored touch gestures for authentication include [3, 4, 5, 6, 7]. These studies focused primarily on collecting data and demonstrating that touch gestures are unique to an individual, under the assumption that no active adversary exists. Later studies [8, 6, 19] divided the swipes into multiple types such as left, right, up, and down, and built a separate model for each of these types to authenticate users. Kumar et al. [8] studied touch gestures and the corresponding movements captured by an accelerometer. They argued that it would be more difficult for adversaries to reproduce both the touch gestures and the underlying movements at any given point in time than to spoof the swipes alone.

The majority of the previous studies focused on analyzing touch gestures collected from smartphones [3, 20, 4, 21, 22, 6, 5, 8]. These studies collected data from human participants while they answered a set of questions, browsed websites, or scrolled through images. They then extracted a list of features from the raw touch events, which usually consisted of touch coordinates, the time of each touch event, and the pressure and area at those coordinates. The extracted features were used to train the authentication models. The training part consisted of training one-class [23, 24] or two-class classifiers [3, 6, 19], while the testing part focused on passing genuine and non-genuine samples through the model and computing the genuine fail rates (false reject rates) and impostor pass rates (false accept rates). The majority of the studies have used these two metrics to report the performance of TCAS. Some studies have also reported the Equal Error Rate [3] and the Half Total Error Rate (an average of the false accept and false reject rates) [20]. Researchers have recommended the use of the Half Total Error Rate to report the testing performance, as it is not possible to change the threshold during testing [25].

The majority of the studies that use touch gestures collected from smartphones have reported error rates under $10\%$, which is an acceptable range, especially for continuous authentication. A continuous authentication system with $10\%$ or lower error rates has several applications in the civilian domain. The problem, however, is that the studies have assumed the non-existence of active adversaries or people with malicious intent. Since the data generated or stored on smart devices are invaluable, there is a high likelihood that active adversaries would attack these systems. Therefore, this study focuses on evaluating TCAS under an active adversarial environment and proposing countermeasures.

Apart from smartphones, the tablet is one of the most heavily used touch-enabled devices. Thus, a few studies have explored touch gestures collected via tablets for continuous authentication [24, 26]. Saravanan et al. [24] trained a one-class classifier using LibSVM and obtained an average accuracy of $97.9\%$ and $96.79\%$ on the smartphone and tablet, respectively, on a small dataset of $20$ users. Trojahn et al. [26] analyzed the slide-touch-type and reported error rates as low as $2\%$ on a dataset of $18$ users. It is worthwhile to note that not many studies have explored smartphones and tablets together, especially keeping the users common across the devices.

2.2 Adversarial Frameworks for TCAS

The idea of adversarial frameworks or attack models is still developing with a common goal, i.e., to fool the authentication system. Based on the current literature, the adversarial frameworks can be divided into two categories, namely, data-injection-based [18] and imitation-based [17, 16].

The data-injection-based attacks assume that it would be possible to bypass the link between the sensor and the authentication Application Programming Interface (API). Thus, the attackers inject snooped, spoofed, random, or population-derived samples into the authentication API such that the API grants (or keeps granting) access [18]. The data-injection-based attacks can be grouped into random [18], user-tailored (studied for keystrokes [27]), and population-based attacks (studied for keystrokes [28]). The random attacks include the injection of samples collected from random individuals or random samples generated statistically. The user-tailored attacks include the injection of samples stolen or estimated for a targeted user (popularly known as snoop-forge-replay attacks). In comparison, population-based attacks inject average samples (similar to an average face) derived from a large public dataset. Notably, the population attack eliminates the need to steal samples from the genuine user.

On the other hand, the imitation-based attacks use live samples collected from humans, robots, or humans plus robots. Based on the amount and method of training the individuals or robots, the imitation-based attacks fall into zero-effort or high-effort categories. Zero-effort refers to the scenarios in which attackers use random users who produce the touch gestures without any explicit training or attempt to copy [3, 20, 4, 21, 22, 6, 5, 8]. Under high-effort, the attackers train either an individual or a robot to mimic a targeted individual or the average gestures derived from publicly available databases [17, 16]. The imitation-based attacks are realistic but tedious and require a much higher level of effort than the data-injection attacks because they eliminate the assumption of bypassing the sensor-to-API link. In this study, we focus on the data-injection-based attack frameworks and leave the imitation-based attacks for the future, as they require many more resources to realize (e.g., designing a robot).

It is important to note that a biometric system like TCAS could be attacked from many different sources, as described in [29]. The attack methods other than the ones described earlier would require write permission to the authentication API. The authentication API’s implementation is supposed to be secure, leaving other sources of attack almost impractical. Therefore, researchers have focused mostly on data-injection- and imitation-based attacks. The success of these attacks depends on the resilience of the pattern matching component in the authentication API. Researchers have noted that raw-data-level distance-based matchers are more resilient to the attacks; however, they generally exhibit very high error rates [18, 30, 31, 32]. Machine learning-based matchers, on the other hand, achieve much lower error rates and have therefore been studied more heavily for implementing TCAS [19, 12, 6] than the distance-based matchers. Consequently, we chose to experiment with machine learning-based implementations of TCAS. To summarize, we evaluate the most widely studied implementations of TCAS under the most common adversarial scenarios.

3 Design of Experiments

This section details the data collection, feature engineering, feature analysis, and implementations of V-TCAS and G-TCAS.

3.1 The Datasets

We used a public dataset named Syracuse University and Assured Information Security-Behavioral Biometrics Multi-Device and Multi-Activity Data (SU-AIS BB-MAS) [33]. SU-AIS BB-MAS consists of multiple modalities like keystroke, touch, and gait; however, we consider only the touch part and therefore refer to the dataset as BBMAS-Touch throughout this paper. The reasons for choosing BBMAS-Touch include the number of users, the number of samples per user, and the common users across multiple devices. Additionally, we used another publicly available dataset [6] to create a more realistic population-based attack environment and refer to it as Serwadda-Touch.

Both the BBMAS-Touch and Serwadda-Touch datasets consist of raw touch information (coordinates, pressure, and area at every touchpoint) collected while participants answered a series of questions in two different sessions. The questions were designed in such a way that they generated both horizontal and vertical swipes. More details on the data and the data collection experiments are available in [6] and [33, 34].

Table 1: List of features gleaned from individual swipes [3].

Feature	Description
swipe\_duration	$t_{end}-t_{start}$
start\_x, start\_y, end\_x, end\_y	$x_{0}$, $y_{0}$, $x_{n-1}$, $y_{n-1}$
dp	Displacement of swipe
l	Length of the swipe
velocity	$dp/(t_{end}-t_{start})$
initial\_v	Initial velocity (first 5% of the points)
final\_v	Final velocity (final 5% of the points)
mean\_v	Pairwise average velocity (magnitude)
direction	Slope of line joining start and end points
area	Average area of the fingertip over the swipe
acceleration	Acceleration between start and end points
mean\_a	Pairwise average acceleration (magnitude)
initial\_a	Initial acceleration (first 5% of the points)
final\_a	Final acceleration (final 5% of the points)
$aP_{25}$, $aP_{50}$, $aP_{75}$	Acceleration percentile ( $aP_{m}$ ) at $m$% of the swipe
$vP_{25}$, $vP_{50}$, $vP_{75}$	Velocity percentile ( $vP_{m}$ ) at $m$% of the swipe
speed	$l/(t_{end}-t_{start})$
initial\_s, final\_s	Initial speed, final speed
$sP_{25}$, $sP_{50}$, $sP_{75}$	Speed percentile ( $sP_{m}$ ) at $m$% of the swipe
mean\_$v_{x}$, mean\_$v_{y}$, mean\_$a_{x}$, mean\_$a_{y}$, mean\_d	Average of $v_{x}$, $v_{y}$, $a_{x}$, $a_{y}$, $dp$
max\_d	Maximum of deviations
$v_{x}P_{25}$, $v_{x}P_{50}$, $v_{x}P_{75}$	Mean velocity percentile ( $v_{x}P_{m}$ ) at $m$% of the swipe
$v_{y}P_{25}$, $v_{y}P_{50}$, $v_{y}P_{75}$	Mean velocity percentile ( $v_{y}P_{m}$ ) at $m$% of the swipe
$a_{x}P_{25}$, $a_{x}P_{50}$, $a_{x}P_{75}$	Mean acceleration percentile ( $a_{x}P_{m}$ ) at $m$% of the swipe
$a_{y}P_{25}$, $a_{y}P_{50}$, $a_{y}P_{75}$	Mean acceleration percentile ( $a_{y}P_{m}$ ) at $m$% of the swipe

3.2 Feature Engineering and Analysis

As mentioned earlier, comparison of raw data does not result in acceptable error rates. Therefore, almost all previous studies have relied on the extraction of features from the raw touch events. The derived features have been effectively used to train classifiers that successfully distinguish between genuine and non-genuine (impostor) users. Following previous studies, we extracted features from individual swipes (a swipe here is the series of touch events from finger down, through the slide, until finger up). The outlying swipes (with fewer than five touch points) were removed from the database. A swipe with five or fewer points can be considered a tap. We decided to exclude them from our study because those events were not found to be unique enough among the users during exploratory swipe analysis. The total numbers of swipes extracted from the phone and tablet were 22625 and 18527, respectively. The outlier removal process resulted in the exclusion of 2339 (10.33%) and 4713 (25.43%) swipes, respectively. We chose to include swipes with longer latency, in contrast to [20], because the data collection experiments consisted of questions with varying cognitive load, resulting in longer-duration swipes. In other words, longer swipes could be someone’s unique trait. We extracted the $30$ features used in Frank et al. [3] from each swipe and added $17$ more features that were variations on those thirty. As a result, we used $47$ features in total from each swipe, as listed in Table 1.

A swipe $S$ is a culmination of $n$ touch events. It can be represented as a tuple given in Equation 1.

S=(x,y,t,a,b)_{\texttt{i=1 to n}}

(1)

where $x$, $y$, $t$, $a$, and $b$ represent the x-coordinate, y-coordinate, time, major axis, and minor axis of the fingertip for each touch event, respectively.

Pairwise velocities ( $v_{x}$ and $v_{y}$ ) for x and y axes, $\forall i\in[1,n)$ were computed as described in Equation 2.

(v_{x})_{i}=\frac{x_{i}-x_{i-1}}{t_{i}-t_{i-1}},(v_{y})_{i}=\frac{y_{i}-y_{i-1}}{t_{i}-t_{i-1}}

(2)

Similarly, the pairwise accelerations ( $a_{x}$ and $a_{y}$ ), the length of the swipe ( $l$ ), and the mean area of the fingertip ( $A$ ) can be calculated as shown in Equations 3, 4, and 5, respectively.

(a_{x})_{i}={\frac{(v_{x})_{i}-(v_{x})_{i-1}}{t_{i}-t_{i-1}}},(a_{y})_{i}={\frac{(v_{y})_{i}-(v_{y})_{i-1}}{t_{i}-t_{i-1}}}

(3)

l=\sum_{i=1}^{n-1}\sqrt{(x_{i-1}-x_{i})^{2}+(y_{i-1}-y_{i})^{2}}

(4)

A=\frac{1}{n}\sum_{i=1}^{n}\pi\times a_{i}\times b_{i}

(5)

Meanwhile, the deviations of each point, defined as the distance from the line joining the starting and ending point of the swipe ( $d_{i}$ ), can be calculated with Equation 6.

d_{i}=\frac{|y_{i}-m\times x_{i}-c|}{\sqrt{1+m^{2}}}

(6)

Here, the line joining the starting and ending points of the swipe is assumed to have the equation $y=m\times x+c$.

The velocity, acceleration, and speed features were computed at the first, second, and third quartiles of the swipe’s touch events. These features capture the live nature of the swipes. Similarly, the velocities and accelerations at the beginning and end of the swipe help us distinguish between users who start or end their swipes aggressively and those who do so gently.
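The pairwise quantities in Equations 2 to 6 can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors’ implementation: the function names are hypothetical, a swipe is assumed to be given as parallel lists of coordinates and timestamps, and the deviation helper assumes a non-vertical line between the end points.

```python
import math

def pairwise_velocities(xs, ys, ts):
    """Equation 2: finite-difference velocities between consecutive touch events."""
    vx = [(xs[i] - xs[i - 1]) / (ts[i] - ts[i - 1]) for i in range(1, len(xs))]
    vy = [(ys[i] - ys[i - 1]) / (ts[i] - ts[i - 1]) for i in range(1, len(ys))]
    return vx, vy

def pairwise_accelerations(vs, ts):
    """Equation 3: change in consecutive velocities over the matching time step."""
    # vs[k] is the velocity between events k and k+1, so it "ends" at ts[k+1].
    return [(vs[k] - vs[k - 1]) / (ts[k + 1] - ts[k]) for k in range(1, len(vs))]

def swipe_length(xs, ys):
    """Equation 4: sum of Euclidean distances between consecutive points."""
    return sum(math.hypot(xs[i] - xs[i - 1], ys[i] - ys[i - 1])
               for i in range(1, len(xs)))

def mean_area(major, minor):
    """Equation 5: mean of pi * a_i * b_i over all touch events."""
    return sum(math.pi * a * b for a, b in zip(major, minor)) / len(major)

def deviations(xs, ys):
    """Equation 6: distance of each point from the line y = m*x + c."""
    m = (ys[-1] - ys[0]) / (xs[-1] - xs[0])  # assumption: swipe is not vertical
    c = ys[0] - m * xs[0]
    return [abs(y - m * x - c) / math.sqrt(1 + m * m) for x, y in zip(xs, ys)]

# Toy swipe with three touch events (made-up values, for illustration only).
xs, ys, ts = [0.0, 3.0, 6.0], [0.0, 4.0, 0.0], [0.0, 1.0, 2.0]
vx, vy = pairwise_velocities(xs, ys, ts)
ax, ay = pairwise_accelerations(vx, ts), pairwise_accelerations(vy, ts)
print(vx, vy, ax, ay, swipe_length(xs, ys), max(deviations(xs, ys)))
```

The mean and maximum of the list returned by `deviations` would correspond to the mean\_d and max\_d entries of Table 1 under this reading.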

3.3 Authentication Framework

The authentication framework depicted in Figure 1 consisted of the following components.

The training and testing datasets. The dataset was split into train and test sets with $60\%$ and $40\%$ of swipes for every user, respectively. The train set was used for parameter tuning with $k$ -fold cross-validation ( $k=5$ ). The test set was kept separate (unseen) and was used only during the testing process to mimic a real-world setup.

Continuous authentication. One of the simplest ways to achieve continuous authentication via touch gestures is to use sliding windows of touch gestures and provide an authentication decision for each window. The design goal generally is to minimize both the window size and the sliding interval. The former decides the time taken for the first authentication decision, while the latter dictates the time taken for each subsequent authentication decision. We took $p$ consecutive swipes in one window, with the removal and addition of $q$ swipes for the next window. A range of values for $p$ and $q$ was inspected. $p=5$ and $q=1$ achieved the best training error rates and were therefore adopted for training the final models and for testing.
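The windowing scheme can be illustrated with a short Python sketch. The helper name `sliding_windows` is hypothetical, and the sketch only forms the windows; how the swipes inside a window are combined into a single decision is not specified here and is left out.

```python
def sliding_windows(swipes, p=5, q=1):
    """Return windows of p consecutive swipes, advancing by q swipes each time."""
    if p <= 0 or q <= 0:
        raise ValueError("p and q must be positive")
    # The last window starts at index len(swipes) - p, so no window is partial.
    return [swipes[i:i + p] for i in range(0, len(swipes) - p + 1, q)]

# Eight swipes (represented by their indices) with p=5 and q=1.
windows = sliding_windows(list(range(8)), p=5, q=1)
for w in windows:
    print(w)
```

With $p=5$ and $q=1$, the first decision needs five swipes and every later decision needs only one new swipe, which is the trade-off the paragraph above describes.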

Choice of classification algorithms. We applied three criteria to choose the classifiers for our study. First, the classifiers must have been used successfully (achieved less than $10\%$ error rates) in the past studies. Second, the classifier should not be data-hungry for training, primarily because most public datasets have a small number of touch gestures per user. The third criterion was the diversity of learning paradigms. The process resulted in the selection of Support Vector Machine (SVM), Random Forest (R.F.), and Multilayer Perceptron (MLP) [3, 19, 6, 16]. We added Extreme Gradient Boosting (XGB), which has not been studied much in the touch biometric domain but has performed well in other domains.

The class imbalance. The two-class classifiers require samples from both classes to be trained. To meet this requirement, traditionally, researchers have used the genuine users’ feature vectors as genuine samples and feature vectors of users other than the genuine users as impostor samples. However, this strategy results in a heavy class imbalance as the number of impostor samples is always higher. Past studies have used under-sampling, over-sampling, or both to address the problem. In this paper, first, we take only a limited number (four from each possible impostor) of samples from the rest of the users, so the number of impostor samples is $4\times(n-1)$ , where $n$ is the total number of users in the dataset. The process of not choosing all the samples from all possible impostors can be termed as under-sampling. At the same time, we employed an Adaptive synthetic sampling approach for imbalanced learning (ADASYN) [35] to over-sample the genuine feature vectors. As a result, every training and testing activity in our experiment used an equal number of genuine and impostor samples.
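A minimal sketch of the under-sampling step is given below. It assumes that the four vectors per impostor are drawn at random (the selection rule is not stated above), and the names `sample_impostors` and `features_by_user` are hypothetical. The ADASYN over-sampling of the genuine class is not reproduced here.

```python
import random

def sample_impostors(features_by_user, genuine_user, per_user=4, seed=0):
    """Pick a few feature vectors from every user except the genuine one."""
    rng = random.Random(seed)  # fixed seed so the selection is repeatable
    impostors = []
    for user, vectors in features_by_user.items():
        if user == genuine_user:
            continue
        # Assumption: vectors are chosen uniformly at random without replacement.
        impostors.extend(rng.sample(vectors, min(per_user, len(vectors))))
    return impostors

# Toy data: 5 users, 10 two-dimensional feature vectors each (made-up values).
toy = {f"u{k}": [[float(k), float(j)] for j in range(10)] for k in range(5)}
impostor_set = sample_impostors(toy, genuine_user="u0")
print(len(impostor_set))  # 4 * (5 - 1) impostor vectors
```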

Algorithm 1: \texttt{population\_attack}(X,n)
Input: $(X,n)$, where $X$ is the combined feature matrix of all users in the attack dataset and $n$ is the number of population attack vectors to be generated.
Output: $X^{\prime}$, the set of $n$ population attack vectors.
$\mu\leftarrow$ [ $mean(X_{i})$ for $X_{i}\in X.cols$ ]
$\sigma\leftarrow$ [ $std(X_{i})$ for $X_{i}\in X.cols$ ]
$X^{\prime}\leftarrow[]$
for $i\leftarrow 0$ to $n$ do
    $attackvector\leftarrow[]$
    for $j\in|X.cols|$ do
        $r\leftarrow\mathcal{N}(0,3)$
        $attackvector.add(\mu[j]+r\times\sigma[j])$
    end for
    $X^{\prime}.add(attackvector)$
end for

The adversarial environments. We implemented two adversarial environments. The first is the zero-effort adversarial environment, one of the most studied adversarial environments in the literature, likely because it is the easiest to implement. The feature vectors of the users other than the genuine user serve as the attack vectors in this environment. The second commonly studied adversarial environment is the population-based attack, in which the impostor samples are derived from all the users in a given database. In the phone case, we were able to find a separate public dataset (Serwadda-Touch). Thus, we used BBMAS-Touch as well as the Serwadda-Touch dataset for the population attack. In contrast, we failed to find such a dataset for the tablet environment. Consequently, we used the same BBMAS-Touch dataset for the population attack. The process of computing the population feature vectors from a dataset is described in Algorithm 1. As shown, we generated $n$ population attack vectors using the formula $\mu+r\times\sigma$, where $r\sim\mathcal{N}(0,3)$.
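A plain-Python rendering of the population attack may help make the procedure concrete. It is a sketch under stated assumptions rather than the authors’ code: the second parameter of $\mathcal{N}(0,3)$ is treated as a standard deviation (it could equally denote a variance), the population standard deviation is used for $\sigma$, and the `spread` and `seed` parameters are illustrative additions.

```python
import random
import statistics

def population_attack(X, n, spread=3.0, seed=None):
    """Generate n attack vectors as mu + r * sigma, one draw of r per feature."""
    rng = random.Random(seed)
    cols = list(zip(*X))                             # feature columns of X
    mu = [statistics.fmean(col) for col in cols]     # per-feature mean
    sigma = [statistics.pstdev(col) for col in cols] # per-feature std (assumed population std)
    attack_vectors = []
    for _ in range(n):
        # Assumption: r ~ N(0, spread) with spread interpreted as a standard deviation.
        vector = [mu[j] + rng.gauss(0.0, spread) * sigma[j] for j in range(len(cols))]
        attack_vectors.append(vector)
    return attack_vectors

# Toy feature matrix: 4 swipes x 3 features (made-up values).
X = [[1.0, 10.0, 5.0], [2.0, 12.0, 5.0], [3.0, 14.0, 5.0], [4.0, 16.0, 5.0]]
for v in population_attack(X, n=3, seed=42):
    print(v)
```

Because the attack vectors depend only on the column means and standard deviations, no sample from the targeted user is needed, which is the property that distinguishes this attack from the user-tailored ones.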

Training V-TCAS. We followed the strategy widely implemented in the literature to train the vanilla authentication framework. To train the selected classifiers for user $u_{i}$ , we labeled the feature vectors belonging to $u_{i}$ as genuine and the feature vectors belonging to users other than $u_{i}$ as impostors. We conducted five-fold cross-validation to find the hyperparameters that resulted in the highest balanced accuracy. The models were retrained using the best hyperparameters and pickled for testing.

Training G-TCAS. GANs have been extensively used to generate synthetic data points [36]. A GAN consists of a Generator ( $G$ ) and a Discriminator ( $D$ ). $G$ captures the data distribution, whereas $D$ estimates the probability that a sample came from the training data rather than from $G$ . $G$ is trained until $D$ exhibits an acceptable level of error. The idea behind GANs is to consider $G$ and $D$ as two players of a min-max game with the value function $V(D,G)$ given in Equation 9, where $E_{1}$ and $E_{2}$ are defined in Equations 7 and 8.

E_{1}=\mathbb{E}_{\boldsymbol{x}\sim p_{\text{data }}(\boldsymbol{x})}[\log D(\boldsymbol{x})]

(7)

E_{2}=\mathbb{E}_{\boldsymbol{z}\sim p_{\boldsymbol{z}}(\boldsymbol{z})}[\log(1-D(G(\boldsymbol{z})))]

(8)

\min_{G}\max_{D}V(D,G)=E_{1}+E_{2}

(9)

To learn the generator’s distribution $p_{g}$ over data $x$ , a prior on input noise variables $p_{z}(z)$ is defined, and $G(z;\theta_{g})$ represents a mapping to data space, where $G$ is a function with parameters $\theta_{g}$ . A second network, $D(x;\theta_{d})$ , outputs a single scalar. Here, $D(x)$ represents the probability that $x$ came from the data and not from $p_{g}$ . We train $D$ to maximize the probability of assigning the correct label to both training examples and samples from $G$ . We simultaneously train $G$ to minimize $\log(1-D(G(z)))$ .
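To make the value function in Equations 7 to 9 concrete, the following sketch estimates $V(D,G)$ from discriminator outputs on a batch of real samples and a batch of generated samples. The function name and the toy probabilities are illustrative assumptions; no network is trained here.

```python
import math

def value_function(d_real, d_fake):
    """Empirical V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))]."""
    e1 = sum(math.log(d) for d in d_real) / len(d_real)        # Equation 7
    e2 = sum(math.log(1.0 - d) for d in d_fake) / len(d_fake)  # Equation 8
    return e1 + e2                                             # Equation 9

# A discriminator that cannot tell real from generated outputs 0.5 everywhere.
undecided = value_function([0.5] * 4, [0.5] * 4)

# A confident discriminator: high scores on real data, low scores on fakes.
confident = value_function([0.9] * 4, [0.1] * 4)

print(undecided, -math.log(4))  # both equal -log(4)
print(confident > undecided)    # D prefers the larger value; G tries to lower it
```

When $D$ outputs $0.5$ for every input, the value equals $-\log 4$, which is the value reached at the equilibrium of the min-max game in the original GAN formulation.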

The combination of GAN-generated data and the actual dataset can make the authentication models more resilient to attacks. This hypothesis motivated us to implement a GAN-assisted TCAS. Figure 1 shows the block diagram of the framework. To summarize, we used two GANs to generate both legitimate and adversarial feature vectors. We customized the implementation of GAN provided at [37] to fit our requirements. In particular, we tuned the parameters (epoch, batch_size, and learning_rate) to obtain an accuracy beyond $93\%$ for both GANs. We refer to these GANs as Legitimate-GAN and Adversarial-GAN, respectively. We generated synthetic samples for each user separately. The input to the Legitimate-GAN was the genuine feature vectors, while the input to the Adversarial-GAN was the impostor feature vectors (i.e., feature vectors from users other than the genuine user).

Figure 1: The system architecture of TCAS that uses Legitimate and Adversarial Generative Adversarial Networks.

The GANs did not perform well when the number of requested synthetic feature vectors was too low or too high. To find the optimal number of generated feature vectors, we tested many randomly chosen values in the range $[50,1000]$ and found that $n=250$ was optimal.

Table 2: Authentication results for both Vanilla TCAS (V-TCAS) and GAN TCAS (G-TCAS) are presented in terms of mean FAR, FRR, and HTER under zero-effort and population-based adversarial environments, for both smartphone and tablet devices. The performance of both V-TCAS and G-TCAS are comparable for the zero-effort adversarial environment in almost all the settings. However, the performance of the V-TCAS degraded significantly under the population-based adversarial environment while G-TCAS showed more resilience to the population-attack.

Device	Adversarial environment	Metric	SVM V-TCAS	SVM G-TCAS	RForest V-TCAS	RForest G-TCAS	MLP V-TCAS	MLP G-TCAS	XGBoost V-TCAS	XGBoost G-TCAS
Phone	Zero-effort	FRR	$0.03$	$0.03$	$0.03$	$0.03$	$0.04$	$0.04$	$0.03$	$0.03$
Phone	Zero-effort	FAR	$0.09$	$0.12$	$0.08$	$0.07$	$0.08$	$0.08$	$0.07$	$0.07$
Phone	Zero-effort	HTER	$0.06$	$0.07$	$0.05$	$0.05$	$0.06$	$0.06$	$0.05$	$0.05$
Phone	Population (Same)	FAR	$0.23$	$0.11$	$0.29$	$0.10$	$0.28$	$0.24$	$0.24$	$0.19$
Phone	Population (Same)	HTER	$0.11$	$0.05$	$0.14$	$0.05$	$0.14$	$0.12$	$0.12$	$0.09$
Phone	Population (Different)	FAR	$0.29$	$0.19$	$0.26$	$0.13$	$0.36$	$0.28$	$0.28$	$0.27$
Phone	Population (Different)	HTER	$0.14$	$0.09$	$0.13$	$0.06$	$0.18$	$0.14$	$0.14$	$0.13$
Tablet	Zero-effort	FRR	$0.05$	$0.02$	$0.06$	$0.03$	$0.05$	$0.04$	$0.05$	$0.03$
Tablet	Zero-effort	FAR	$0.07$	$0.12$	$0.07$	$0.08$	$0.09$	$0.09$	$0.07$	$0.07$
Tablet	Zero-effort	HTER	$0.06$	$0.07$	$0.06$	$0.05$	$0.07$	$0.06$	$0.06$	$0.05$
Tablet	Population (Same)	FAR	$0.19$	$0.12$	$0.38$	$0.13$	$0.36$	$0.18$	$0.40$	$0.19$
Tablet	Population (Same)	HTER	$0.09$	$0.06$	$0.19$	$0.06$	$0.18$	$0.09$	$0.20$	$0.09$

Performance evaluation. The performance of the implemented TCAS was evaluated using three measures: False Accept Rate (FAR, the percentage of successful impostor attempts), False Reject Rate (FRR, the percentage of failed genuine attempts), and Half Total Error Rate (HTER, the average of FAR and FRR). Another measure that has been widely used to report the test performance of TCAS is the Equal Error Rate (EER). However, researchers have advised using HTER instead of EER to report the test performance of a biometric system, primarily because EER is computed by varying the threshold. The threshold, however, cannot and should not be adjusted during testing [25]. The attack’s impact can be measured by the increase in the FAR.
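The three measures can be computed from lists of accept/reject decisions as sketched below. The function name `error_rates` and the toy decision lists are hypothetical; the counts were chosen only so that the rates land on round values.

```python
def error_rates(genuine_accepted, impostor_accepted):
    """Return (FAR, FRR, HTER) from boolean accept decisions at a fixed threshold."""
    # FRR: fraction of genuine attempts that were rejected.
    frr = genuine_accepted.count(False) / len(genuine_accepted)
    # FAR: fraction of impostor attempts that were accepted.
    far = impostor_accepted.count(True) / len(impostor_accepted)
    # HTER: average of the two error rates.
    hter = (far + frr) / 2
    return far, frr, hter

# Toy example: 100 genuine windows (3 rejected), 100 impostor windows (9 accepted).
genuine = [True] * 97 + [False] * 3
impostor = [True] * 9 + [False] * 91
far, frr, hter = error_rates(genuine, impostor)
print(f"FAR={far:.2f} FRR={frr:.2f} HTER={hter:.2f}")
```

Unlike EER, none of these values requires sweeping the decision threshold, so they can be reported on a held-out test set with the threshold fixed during training.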

4 Results and Discussion

4.1 Performance Analysis of V-TCAS

Table 2 presents the performance of Vanilla TCAS (V-TCAS) in terms of FAR, FRR, and HTER under the zero-effort adversarial environment for both smartphone and tablet devices. V-TCAS’s performance under the population-based adversarial environment is reported using only FAR and HTER; the FRRs remain the same as in the zero-effort adversarial environment.

Smartphone. Both Random Forest and XGBoost achieved $5\%$ HTERs which was better than SVM and MLP ( $6\%$ HTERs) under the zero-effort adversarial setup. On the other hand, SVM ( $14\%$ increase in the FAR) and XGBoost ( $17\%$ increase in FAR) showed more resilience to the population-based adversarial setup created using the same dataset than Random Forest ( $21\%$ increase in FAR) and MLP ( $20\%$ increase in FAR).

Tablet. According to Table 2, SVM, Random Forest, and XGBoost each achieved $6\%$ HTER under the zero-effort adversarial setup, which was better than MLP ( $7\%$ HTER). SVM was the most resilient classifier ( $12\%$ increase in the FAR) compared to MLP ( $27\%$ ), Random Forest ( $31\%$ ), and XGBoost ( $33\%$ ).

We can conclude that SVM was the most resilient classifier, irrespective of the device.

4.2 Performance Analysis of G-TCAS

Table 2 presents G-TCAS’s performance in terms of FAR, FRR, and HTER under the zero-effort adversarial environment for both smartphone and tablet devices. G-TCAS’s performance under the population-based adversarial environment is reported using only FAR and HTER; the FRRs remain the same as in the zero-effort adversarial environment.

Smartphone. G-TCAS achieved similar performance to V-TCAS for the majority of the classifiers. Random Forest ( $5\%$ HTER) and XGBoost ( $5\%$ HTER) did better than SVM ( $7\%$ HTER) and MLP ( $6\%$ HTER) under the zero-effort adversarial setup. However, SVM ( $1\%$ decrease in FAR) and Random Forest ( $3\%$ increase in FAR) showed more resilience to the population-based adversarial setup created using the same dataset than MLP ( $16\%$ increase in FAR) and XGBoost ( $12\%$ increase in FAR). The same holds for the population-based adversarial setup created using a different dataset: Random Forest ( $6\%$ ) and SVM ( $7\%$ ) showed more resilience than XGBoost ( $20\%$ ) and MLP ( $20\%$ ).

Tablet. According to Table 2, Random Forest and XGBoost achieved $5\%$ HTER under the zero-effort adversarial setup, which was better than MLP ( $6\%$ HTER) and SVM ( $7\%$ HTER). SVM ( $0\%$ increase in FAR) and Random Forest ( $5\%$ increase in FAR) were the most resilient compared to MLP ( $9\%$ ) and XGBoost ( $12\%$ ).

4.3 V-TCAS vs. G-TCAS

Overall, Table 2 suggests that G-TCAS performed much better under a stringent (population-based) adversarial environment than V-TCAS. The GAN-generated synthetic samples (both legitimate and adversarial) included in the authentication pipeline helped draw a better decision boundary. To examine this idea, we plotted the top two principal components for a random user (a sample is shown in Figure 2). We observed that G-TCAS helped separate the classes better, making the implementation more robust than V-TCAS.
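The comparison can be checked with a little arithmetic on the FAR rows of Table 2. The sketch below averages, over the four classifiers, the increase in FAR from the zero-effort setup to the population attack. It assumes that the phone comparison uses the "Population (Different)" rows and the tablet comparison uses the "Population (Same)" rows; the helper name is hypothetical.

```python
def mean_far_increase(zero_effort_far, attack_far):
    """Average per-classifier increase in FAR caused by the attack."""
    increases = [a - z for z, a in zip(zero_effort_far, attack_far)]
    return sum(increases) / len(increases)

# FAR values transcribed from Table 2, ordered SVM, RForest, MLP, XGBoost.
phone_v = mean_far_increase([0.09, 0.08, 0.08, 0.07], [0.29, 0.26, 0.36, 0.28])
phone_g = mean_far_increase([0.12, 0.07, 0.08, 0.07], [0.19, 0.13, 0.28, 0.27])
tablet_v = mean_far_increase([0.07, 0.07, 0.09, 0.07], [0.19, 0.38, 0.36, 0.40])
tablet_g = mean_far_increase([0.12, 0.08, 0.09, 0.07], [0.12, 0.13, 0.18, 0.19])

print(f"phone:  V-TCAS {phone_v:.4f}  G-TCAS {phone_g:.4f}")
print(f"tablet: V-TCAS {tablet_v:.4f}  G-TCAS {tablet_g:.4f}")
```

Under these assumptions the averages come out close to the figures quoted in the abstract, with V-TCAS showing a clearly larger increase than G-TCAS on both devices.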

5 Conclusion and Future Work

We proposed a novel framework to implement TCAS, which showed more resilience to the population-based adversarial attacks than the widely studied TCAS designs. SVM-based implementations consistently performed better than the other classifiers both in terms of accuracy and resilience.

This work is limited to touch biometrics, ML models, and two datasets; we would like to expand the study across different biometrics (e.g., gait, keystroke), learning frameworks (e.g., DL models and the one-class paradigm), and a variety of datasets. Additionally, we aim to explore whether the proposed defense mechanism works under more rigorous attack environments, e.g., imitation-based attacks.

6 Acknowledgment

We thank the anonymous reviewers for their insightful feedback. Rajiv Ratn Shah was partly supported by the Infosys Center for Artificial Intelligence and the Center of Design and New Media at IIIT Delhi, India.

References

[1] D. Shukla et al. Beware, your hands reveal your secrets! In ACM CCS, 2014.
[2] D. Shukla et al. Stealing passwords by observing hands movement. IEEE TIFS, 2019.
[3] M. Frank et al. Touchalytics: On the applicability of touchscreen input as a behavioral biometric for continuous authentication. IEEE TIFS, 2012.
[4] T. Feng et al. Continuous mobile authentication using touchscreen gestures. In IEEE-HST, 2012.
[5] L. Li et al. Unobservable re-authentication for smartphones. In NDSS, 2013.
[6] A. Serwadda et al. Which verifiers work?: A benchmark evaluation of touch-based authentication algorithms. In IEEE BTAS, 2013.
[7] S. Mondal et al. Swipe gesture based continuous authentication for mobile devices. In IEEE ICB, 2015.
[8] R. Kumar et al. Continuous authentication of smartphone users by fusing typing, swiping, and phone movement patterns. In IEEE BTAS, 2016.
[9] V. M. Patel et al. Continuous user authentication on mobile devices: Recent progress and remaining challenges. IEEE Signal Processing Magazine, 2016.
[10] A. Primo et al. Context-aware active authentication using smartphone accelerometer measurements. In CVPRW, 2014.
[11] H. Zhang et al. Touch gesture-based active user authentication using dictionaries. IEEE, 2015.
[12] E. Ellavarason et al. Touch-dynamics based behavioural biometrics on mobile devices – a review from a usability and performance perspective. ACM Computing Surveys, 2020.
[13] A. K. Jain et al. An introduction to biometric recognition. IEEE T-CSVT, 2004.
[14] M. J. Hertenstein et al. Touch communicates distinct emotions. Emotion, 2006.
[15] C. Bevan et al. Different strokes for different folks? Revealing the physical characteristics of smartphone users from their swipe gestures. International Journal of HCS, 2016.
[16] A. Serwadda et al. Toward robotic robbery on the touch screen. ACM TISSEC, 2016.
[17] H. Khan et al. Targeted mimicry attacks on touch input based implicit authentication schemes. In MobiSys, 2016.
[18] B. Zhao et al. On the resilience of biometric authentication systems against random inputs. In NDSS, 2020.
[19] J. Fierrez et al. Benchmarking touchscreen biometrics for mobile authentication. IEEE TIFS, 2018.
et al [2015c] Z. Sitová et al. Hmog: New behavioral biometric features for continuous authentication of smartphone users. TIFS, 2015c.
et al [2016f] U. Mahbub et al. Active user authentication for smartphones: A challenge data set and benchmark results. In IEEE-BTAS, 2016f.
et al [2019b] U. Mahbub et al. Continuous authentication of smartphones based on application usage. IEEE TBIOM, 2019b.
et al [2018b] R. Kumar et al. Continuous authentication using one-class classifiers and their fusion. In IEEE ISBA, 2018b.
et al [2014c] P. Saravanan et al. Latentgesture: Active user authentication through background touch analysis. In Chinese CHI, 2014c.
et al [2002] S. Bengio et al. Confidence measures for multimodal identity verification. Information Fusion, 2002.
et al [2013c] M. Trojahn et al. Toward mobile authentication with keystroke dynamics on mobile phones and tablets. In IEEE Conf. on Adv. Info. Net. and App. Workshops, 2013c.
et al [2013d] K. A. Rahman et al. Snoop-forge-replay attacks on continuous verification with keystrokes. IEEE TIFS, 2013d.
et al [2013e] A. Serwadda et al. Examining a large keystroke biometrics dataset for statistical-attack openings. ACM TISSEC, 2013e.
et al [2001] N. K. Ratha et al. An analysis of minutiae matching strength. In Springer-AVBPA, 2001.
et al [2014d] E. Pagnin et al. On the leakage of information in biometric authentication. In Springer, 2014d.
et al [2020c] R. Kumar et al. Treadmill assisted gait spoofing (tags): An emerging threat to wearable sensor-based gait authentication. ACM DTRAP, 2020c.
et al [2015d] R. Kumar et al. Treadmill attack on gait-based authentication systems. IEEE-BTAS, 2015d.
et al [2019c] A.K Belman et al. Insights from bb-mas–a large dataset for typing, gait and swipes of the same person on desktop, tablet and phone. arXiv, 2019c.
et al [2020d] V. Udandarao et al. On the inference of soft biometrics from typing patterns collected in a multi-device environment. In IEEE-BigMM, 2020d.
et al [2008] H. He et al. Adasyn: Adaptive synthetic sampling approach for imbalanced learning. In IEEE IJCNN, 2008.
et al [2014e] I. J. Goodfellow et al. Generative adversarial nets. In NIPS, 2014e.
Maskara [2021] V. Maskara. Generating tabular synthetic data using gans. https://www.maskaravivek.com/post/gan-synthetic-data-generation/, 2021. Last Accessed: April 4, 2021.