
Not All Datasets Are Born Equal:
On Heterogeneous Data and Adversarial Examples

Yael Mathov* [email protected]  Eden Levy* [email protected]  Ziv Katzir [email protected]  Asaf Shabtai [email protected]  Yuval Elovici [email protected]
*Contributed equally.
Abstract

Recent work on adversarial learning has focused mainly on neural networks and the domains in which those networks excel, such as computer vision and audio processing. The data in these domains is typically homogeneous, whereas heterogeneous tabular domains remain underexplored despite their prevalence. When searching for adversarial patterns within heterogeneous input spaces, an attacker must simultaneously preserve the complex domain-specific validity rules of the data and the adversarial nature of the identified samples. As such, applying adversarial manipulations to heterogeneous datasets has proven to be a challenging task, and no generic attack method has been suggested thus far. We, however, argue that machine learning models trained on heterogeneous tabular data are as susceptible to adversarial manipulations as those trained on continuous or homogeneous data such as images. To support our claim, we introduce a generic optimization framework for identifying adversarial perturbations in heterogeneous input spaces. We define distribution-aware constraints for preserving the consistency of the adversarial examples and incorporate them by embedding the heterogeneous input into a continuous latent space. Due to the nature of the underlying datasets, we focus on \ell_{0} perturbations and demonstrate their applicability in real life. We evaluate the effectiveness of our approach using three datasets from different content domains. Our results show that despite the constraints imposed on input validity in heterogeneous datasets, machine learning models trained on such data are still just as susceptible to adversarial examples.

keywords:
Adversarial Examples, Adversarial Learning, Tabular Data
Department of Software and Information Systems Engineering, Ben-Gurion University of the Negev, Beer-Sheva 8410501, Israel

1 Introduction

The susceptibility of learning algorithms to adversarial input manipulations has puzzled researchers in recent years, leading to extensive research. Early work on adversarial attacks was demonstrated on learning tasks that were popular at the time, such as spam filtering [1, 2] and malware detection [3]; notably, a homogeneous, bag-of-words feature representation was used in all of the relevant use cases. Later on, as deep learning became prevalent, the vast majority of papers published in recent years have discussed attacks and defenses in the context of neural networks. Naturally, these works have focused on input domains where deep learning excels, such as image and audio, where data representation remains mainly homogeneous and input spaces are continuous. A significant amount of research effort has been put into computer vision applications and related tasks, e.g., image segmentation [4], face detection [5], and, more specifically, image classification [6, 7, 8, 9, 10, 11]. Other domains include audio [12], and the exclusively discrete domains of malware represented by binary vectors [13, 14] and the classification of sparsely encoded text [15, 16].

In contrast, there has been little to no research on heterogeneous input spaces in the context of adversarial examples. The reasons for this range from the intrinsic diversity of feature data types to the relatively underexplored robustness of the tree-based models that are most associated with such data [17, 18, 19]. The few attacks that were demonstrated in the context of heterogeneous input domains have focused on exploiting the architecture of the learning model, while ignoring the effects of the manipulation on the validity of the data itself. Another possible explanation is that machine learning research in recent years has been almost entirely focused on deep learning based methods, leading researchers to study data taken from high-dimensional, homogeneous, and continuous distributions, where deep learning typically demonstrates better performance than traditional approaches [20]. Conversely, tabular data is drawn from low-dimensional, heterogeneous, and largely discrete input domains. More specifically, tabular data has the following characteristics: (a) the input often includes a combination of nominal, ordinal, and real-valued features, (b) different value ranges are associated with different features, (c) missing values are common, (d) some features are considered immutable in the context of an attack, and (e) complex cross-feature interactions define valid and anomalous feature combinations. Examples of domains where the data is normally heterogeneous include healthcare, real estate, and financial applications.

In this paper, we study the process of crafting adversarial examples for heterogeneous tabular data. Several challenges are unique to this case, both from the perspective of the adversary and from that of the optimization procedure used to find adversarial perturbations that comply with the data's heterogeneity. In contrast to audio or language processing applications, which are considered digital, machine learning tasks on heterogeneous data are usually related to real-life scenarios, and the relevant features describe tangible information, such as a person's salary. Therefore, an adversary that targets heterogeneous data would like to minimize the number of modified features (i.e., a minimal \ell_{0} perturbation), as opposed to minimizing other frequently used distance metrics (e.g., \ell_{1}, \ell_{2}, \ell_{\infty}), which are compromised by the vastly different feature scales. Additionally, some input features must be considered immutable in the context of the attack, as it is impossible (or at least very difficult) to change their values in real life; as an example, consider features that denote a person's height, credit history, or residential address. This implies that an adversary must also address the real-life feasibility of the identified adversarial perturbations. Data heterogeneity leads to optimization challenges as well: a generic attack algorithm must preserve the consistency of the adversarial examples in terms of feature correlation and maintain the various feature types (e.g., integers, positive values, real numbers).

Our approach for perturbing samples from heterogeneous input spaces addresses each of the aforementioned challenges, while being agnostic to the target learning algorithm and the data domain. First, we mathematically model the validity of the data in terms of its consistency (i.e., preserving feature correlation) and feasibility (i.e., keeping immutable features intact). Then, we capture the complexity of the heterogeneous input space by using a data transformation function, whose purpose is to embed the heterogeneous data into a continuous latent space and lead to more consistent adversarial examples. This function encompasses the correlations between the features and thus allows us to use gradient-based optimization techniques to identify adversarial examples without harming the consistency of the input. Using these tools and techniques, we define a general optimization problem for finding valid adversarial examples of heterogeneous tabular data, with the ultimate goal of applying them in real life. Finally, we propose a concrete implementation of our approach and its various components, based on neural networks and methods from metric learning. We test our approach and implementation using three datasets from different content domains and a variety of target learning algorithms.

Our results suggest that machine learning models in domains with heterogeneous input spaces are just as prone to adversarial manipulations as those used in continuous homogeneous domains. This implies that further research is required to understand the role played by data, as opposed to the model or algorithm, in the context of adversarial susceptibility.

2 Background

Tree-based models achieve state-of-the-art performance in learning tasks that involve heterogeneous tabular data [17, 18, 19]. The robustness of such models has been extensively researched, including attacks on decision stumps or trees [21, 22, 23], random forests [24, 23], and gradient boosting machines [25, 23]. Similarly to these works, prior studies on traditional non-tree-based models were either too specific to the domains they were demonstrated in [1, 26, 27] or tailored to a certain class of learning algorithms (e.g., SVM or linear models in general) [2, 28]. The most prominent attack methods today target neural networks [6, 7, 8, 9, 10]. The vast majority of these attacks relate to a more general technique based on the network's gradient or its approximation. However, in the case of nominal features, which are common in tabular data, the derivative with respect to such a feature (i.e., \frac{\partial\mathcal{L}}{\partial x_{i}} for a loss function \mathcal{L} and a nominal feature x_{i}) cannot be correctly interpreted, as it is arbitrary at best. As a result of this and other issues, all previously mentioned research either lacks generality or cannot be adapted out-of-the-box to applications with a heterogeneous input space. We aim to bridge this gap by extending well-known adversarial deep learning tools to create a neural network-based implementation of our generic approach.

Recently, there have been some attempts at manipulating neural networks trained on heterogeneous tabular data. Sarkar et al. [29] performed a number of attacks on a credit score classifier and demonstrated vulnerabilities in financial applications. However, their experiment only included continuous features despite the heterogeneous data domain, and the evaluation metrics are not well-suited to a real-life adversary. Ballet et al. [30] presented a real-life, data-oriented approach that could be adapted to domains other than finance. It employs the means to (1) preserve the data types, by clipping, and (2) reduce the perturbation's perceptibility, by incorporating feature importance in the attack objective in an attempt to mimic the decision process of a domain expert. Their method and experiment did not appropriately reflect the heterogeneity of the data, since categorical features were treated as continuous, and nominal features were discarded altogether. Moreover, the evaluation of the perturbations was based on the \ell_{2} norm, which is misrepresentative given the significant variability in feature scales, and the feature importance (i.e., the knowledge of a domain expert) was improperly modeled by using the Pearson correlation with the target variable.

A primary goal of our work is to cover all aspects related to the feature space challenges of heterogeneous tabular data. The following works integrate components related to covariances or outliers with the intent of preserving feature correlations or feature types. An early work [3] in the adversarial learning domain takes the underlying distribution of the data into account in its proposed attack. The method includes modeling the a priori estimate (i.e., p(x\,|\,y) for a data point x and its label y) by using a kernel density estimator as an additional component in the attack objective. However, this method was not evaluated on heterogeneous input spaces and therefore lacks additional related aspects that are essential. Ustun et al. [31] formulated an integer linear programming (ILP) problem to find perturbations that do not include immutable features and are consistent (i.e., preserve the data's structure under its various mathematical constraints) by adding constraints to the problem definition. Kanamori et al. [32] extended this work by incorporating non-linear statistical elements that correspond to the empirical distribution of the data, in a mixed ILP-based optimization instance. Their method aims to create realistic perturbations that conform with feature correlations and outlier values, by adding approximations of the Mahalanobis distance [33] and the local outlier factor [34] to their objective function and optimization constraints. The last two works were presented in the context of counterfactual explanation methods [35] for machine learning models, which are used to provide recourse to users who wish to gain access to the factors underlying the model's decision (e.g., a credit scoring application where customers try to improve their score). In contrast, our work focuses on an adversarial setting and related topics.

The approach we propose in this paper complements previous work by weaving the concepts discussed so far into one exhaustive formulation and implementation, and thus addresses the challenges that arise when crafting adversarial examples for heterogeneous data. To the best of our knowledge, we are the first to introduce a comprehensive and general adversarial attack method that is agnostic to the target learning algorithm and focuses on a truly heterogeneous tabular input space.

3 Our Approach

In this section, we describe our approach using the following notations. The attacker targets a machine learning model M:\mathcal{X}\rightarrow\mathcal{Y}. For brevity, we assume \mathcal{X} is a discrete input space with d discrete random variables. The label space is denoted by \mathcal{Y}\subseteq\{0,1\}^{m}, where labels are one-hot encoded.

Let X_{1},\dots,X_{d} be the random variables that represent the features in the data, with discrete distributions \mathcal{F}_{1},\dots,\mathcal{F}_{d}, such that X_{i}\sim\mathcal{F}_{i} for each 1\leq i\leq d. Therefore, an input vector X=(X_{1},\dots,X_{d}) follows the multivariate joint distribution \mathcal{F}, i.e., X\sim\mathcal{F}. We note that X_{1},\dots,X_{d} are not necessarily independent or identically distributed; in fact, it is safer to assume some level of dependency between the features, as is natural in tabular data.

Given a benign sample x\in\mathcal{X} with the corresponding ground-truth label y\in\mathcal{Y}, a maximal perturbation size \lambda, and a context-dependent distance metric \mathcal{D}, the adversarial goal is to find an adversarial example x^{*}=x+\delta\in\mathcal{X} such that \mathcal{D}(x^{*},x)<\lambda and M(x^{*})\neq y.

We assume that the attacker has access to the training data \mathbb{X}\subseteq\mathcal{X} used for constructing M. Although our approach introduces an untargeted attack, it can be easily modified to fit a targeted attack.

3.1 Definitions for Data Validity

To ensure the validity of x^{*}, the attack algorithm must preserve the distributional consistency of the various features and refrain from modifying immutable features, so that the resulting perturbation pattern remains feasible in real life. Addressing these two challenges is of vital importance, as it enables the attacker to eventually apply the adversarial perturbations in practice (i.e., in the real world).

3.1.1 Data Consistency

Changing a sample x with the corresponding label y might result in values that are unlikely given \mathcal{F}; i.e., values that lie in unsupported regions of \mathcal{F}. In such a case, we refer to x^{*} as inconsistent and therefore consider it invalid. Formally, for a consistency threshold \epsilon>0, we consider a sample to be \epsilon-inconsistent if \mathbb{P}_{X\sim\mathcal{F}}(X=x^{*}\,|\,y)\leq\epsilon. Note that consistency is measured in the context of a specific class label, so an input vector can be highly likely given one label but anomalous for another.

Two examples of inconsistency are described as follows:

  1. A scenario in which x_{i}^{*}\in x^{*}, where x_{i}^{*} is an undefined outcome of X_{i} or \mathbb{P}_{X_{i}\sim\mathcal{F}_{i}}(X_{i}=x_{i}^{*}\,|\,y)=0; for instance, X_{i} represents the number of children but x_{i}^{*}\notin\mathbb{N}.

  2. For each x_{i}^{*}\in x^{*}, we have \mathbb{P}_{X_{i}\sim\mathcal{F}_{i}}(X_{i}=x_{i}^{*}\,|\,y)>0 but \mathbb{P}_{X\sim\mathcal{F}}(X=x^{*}\,|\,y)\leq\epsilon; consider X_{i},X_{j} to be the year of birth and year of death, respectively, but x_{i}^{*}>x_{j}^{*}.

Consequently, we define x^{*} as an \epsilon-consistent adversarial example if the following holds:

\mathbb{P}_{X\sim\mathcal{F}}(X=x^{*}\,|\,y)>\epsilon \quad (1)

3.1.2 Data Feasibility

Data describing real-life use cases involves features that represent somewhat tangible details. In real-world applications, some of the information collected is factual and immutable, meaning it cannot be changed by an attacker. Although it is possible to digitally perturb immutable features, they cannot be changed in real life, and any change to them would render the example unfeasible and therefore invalid. Examples of immutable features include a patient's blood type, a customer's history of financial transactions, and the size of a piece of real estate.

We define the set of indexes of immutable features as I\subseteq\{1,\dots,d\}, and require an adversarial example x^{*} to satisfy the following in order to be feasible:

\forall i\in I,\; x_{i}^{*}=x_{i} \quad (2)

3.1.3 Data Validity

A valid adversarial example x^{*} must satisfy both Equations 1 and 2, i.e., be both consistent and feasible. Therefore, for a consistency threshold \epsilon and a set of immutable features I, the validity of an adversarial example x^{*} is modeled as the conjunction:

\mathbb{P}_{X\sim\mathcal{F}}(X=x^{*}\,|\,y)>\epsilon \;\land\; \forall i\in I,\; x_{i}^{*}=x_{i} \quad (3)
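To make the definition concrete, the following minimal Python sketch checks a candidate x^{*} against Equation 3. It assumes access to some class-conditional density estimate density(x, y) that approximates \mathbb{P}_{X\sim\mathcal{F}}(X=x^{*}\,|\,y) (e.g., obtained with one of the estimators discussed in Section 3.3.2); the estimator, threshold, and index set shown here are illustrative assumptions rather than part of our method.

```python
import numpy as np

def is_valid(x, x_adv, y, density, immutable_idx, eps):
    """Check Equation 3: epsilon-consistency and feasibility of x_adv.

    `density(x_adv, y)` is assumed to approximate P(X = x_adv | y), e.g.,
    via a class-conditional density model fitted on the training data.
    """
    # Feasibility (Equation 2): immutable features must remain unchanged.
    feasible = np.all(x_adv[immutable_idx] == x[immutable_idx])
    # Consistency (Equation 1): the perturbed sample must remain likely under F.
    consistent = density(x_adv, y) > eps
    return bool(feasible and consistent)
```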

3.2 Data Transformation

Unlike continuous data, categorical and nominal features impair the performance of commonly used optimization techniques, such as the training of neural networks [20]. As a result, many methods for crafting adversarial examples are limited when used in conjunction with heterogeneous tabular data. In addition, perturbing a non-continuous feature without taking into account the support of its corresponding distribution is highly likely to result in an illegal value. This puts the attack method in jeopardy and may result in \epsilon-inconsistent and illegal adversarial examples.

To address these challenges, we suggest using a structure-preserving transformation function, which practically serves as an embedding function and is defined as f:\mathcal{X}\rightarrow\mathbb{R}^{e}. This new representation provides two advantages:

  1. The process of constructing f induces the learning of latent patterns of features that are otherwise less adequately represented, i.e., categorical and nominal features. Such patterns include interactions among features, semantic meanings of nominal features, and, more generally, the modeling of the probability distribution of the data. The parameters of f carry meaningful information and allow the gradients of the network to encompass these patterns, which in turn assists the optimization process in finding consistent adversarial examples.

  2. The data is transformed into a continuous and homogeneous space, making it easier to manipulate using optimization-based attacks and to measure distances in.

The embedding function f can be implemented using various general concepts, including generative adversarial networks [36], autoencoders [37], and metric learning [38].
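As one illustrative realization of f (a minimal sketch under assumed feature groupings, not the exact architecture used in our experiments), a neural embedding can map each categorical feature through a learned embedding table and concatenate the result with the numeric features before projecting into \mathbb{R}^{e}:

```python
import torch
import torch.nn as nn

class TabularEmbedder(nn.Module):
    """Sketch of a structure-preserving transformation f: X -> R^e."""

    def __init__(self, cat_cardinalities, num_numeric, embed_dim=8, latent_dim=16):
        super().__init__()
        # One embedding table per categorical/nominal feature.
        self.cat_embeds = nn.ModuleList(
            [nn.Embedding(card, embed_dim) for card in cat_cardinalities]
        )
        in_dim = embed_dim * len(cat_cardinalities) + num_numeric
        self.net = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, latent_dim)
        )

    def forward(self, x_cat, x_num):
        # x_cat: (batch, n_cat) integer codes; x_num: (batch, n_num) floats.
        embedded = [emb(x_cat[:, i]) for i, emb in enumerate(self.cat_embeds)]
        return self.net(torch.cat(embedded + [x_num], dim=1))  # f(x) in R^e
```

Such a network can then be trained with any metric-learning or reconstruction objective, so that nearby points in the latent space correspond to consistent feature combinations.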

3.3 Attack Description

Recall that the attacker's goal is to craft adversarial examples for a target model M:\mathcal{X}\rightarrow\mathcal{Y}. The target model is trained on the heterogeneous dataset \mathbb{X}\subseteq\mathcal{X}, which represents real-world information. Therefore, M could be based on any machine learning algorithm; however, M is most likely tree-based, as this is the type of model most associated with heterogeneous tabular data. Additionally, perturbing a benign input sample x\in\mathcal{X} should yield a valid adversarial example x^{*}\in\mathcal{X}; hence, the attack must preserve the validity of the data.

Therefore, the proposed attack method (presented in Fig. 1) uses two components that are designed to preserve the validity of the data: an embedding function that conserves the consistency of the data, and a set of rules that protects the immutable features (i.e., creates a feasible x^{*}). The embedding function is the structure-preserving transformation f:\mathcal{X}\rightarrow\mathbb{R}^{e}. Since it transforms each input sample x\in\mathcal{X} into a new continuous space, M cannot be used to classify f(x). We address this challenge by using the knowledge of \mathbb{X} to build a surrogate model \widetilde{M}:\mathcal{X}\rightarrow\mathcal{Y} that mimics the predictions of M. The architecture of \widetilde{M} includes the embedding function and a classifier that receives the embedded data samples as input. Then, we use \widetilde{M} to craft the adversarial examples.

Figure 1: The suggested attack approach: A set of rules is applied to the tabular data x, which is fed into an embedding function f. The embedded data f(x) is sent to a learning-based task solver C, which outputs a prediction similar to that of the target model M. Since our design uses f to transform x, M cannot be used as a task solver; instead, we use the surrogate model \widetilde{M}=C\circ f (the dotted rectangle). When using a gradient-based method to craft adversarial examples (the dashed line), the rules and the embedding function preserve the validity of the perturbed data.

Since we craft adversarial examples for \widetilde{M} and not for M, our attack algorithm relies on the transferability property [21] of adversarial examples: the attacker crafts adversarial examples for a surrogate model \widetilde{M} and uses them against the target model M. The surrogate is created to support the requirements of the data transformation process and to facilitate the generation of adversarial examples that comply with the constraints set forth in Equation 3.

3.3.1 Constructing a Surrogate Model

To preserve the consistency of the data, a surrogate model \widetilde{M}:\mathcal{X}\rightarrow\mathcal{Y} is constructed by composing a classifier C and an embedding function f, such that \widetilde{M}=C\circ f. The structure-preserving function f transforms the data, and C:\mathbb{R}^{e}\rightarrow\mathcal{Y} uses the embedded input samples to solve the original learning task. It is important to note that f and C are trained separately and sequentially, with C trained on \{f(x)\;|\;x\in\mathbb{X}\}. Similarly to the embedding function f, the task solver can be implemented using multiple techniques. However, the implementation should comply with the requirements of the attack method and the structure of the embedded data.
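Continuing the illustrative sketch from Section 3.2 (the TabularEmbedder is a hypothetical stand-in for f, not our exact implementation), the surrogate \widetilde{M}=C\circ f can be assembled by freezing the trained embedder and training a small task-solver head on the embedded samples:

```python
import torch.nn as nn

class Surrogate(nn.Module):
    """M~ = C o f: a frozen embedding network f followed by a task-solver head C."""

    def __init__(self, embedder, latent_dim, num_classes):
        super().__init__()
        for p in embedder.parameters():
            p.requires_grad = False          # f is trained first, then kept fixed
        self.f = embedder
        self.c = nn.Linear(latent_dim, num_classes)  # trained on {f(x) | x in X}

    def forward(self, x_cat, x_num):
        return self.c(self.f(x_cat, x_num))  # surrogate prediction (logits)
```

Only the head c is optimized in the second training stage, which mirrors the separate and sequential training of f and C described above.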

3.3.2 Generating Adversarial Examples

Let x\in\mathcal{X} be a valid input sample, such that y\in\mathcal{Y} is the true label of x. Given a consistency threshold \epsilon>0, a set of immutable features I\subseteq\{1,\dots,d\}, a distance metric \mathcal{D}, and a maximum perturbation size \lambda, the attacker aims to find a valid adversarial example x^{*}\in\mathcal{X}, such that M(x^{*})\approx\widetilde{M}(x^{*})\neq y. Due to the design of \widetilde{M} and its components, which are differentiable by our definition, the attacker can also use any optimization-based attack to craft adversarial examples and transfer them to the target model.

Formally, based on Equation 3, finding a valid x^{*} is done by solving the following optimization problem:

\operatorname*{arg\,min}_{x^{*}} \;\; \mathcal{L}_{adv}(\widetilde{M}(x^{*}),\,y)+\|f(x)-f(x^{*})\|_{2}
s.t. \;\; \mathbb{P}_{X\sim\mathcal{F}}(X=x^{*}\,|\,y)>\epsilon,
\;\;\;\;\; \forall i\in I,\; x_{i}^{*}=x_{i},
\;\;\;\;\; \mathcal{D}(x,\,x^{*})<\lambda
(4)

where \mathcal{L}_{adv} is the adversarial objective function. The attacker chooses \mathcal{L}_{adv} according to their goals, capabilities, and data domain. For example, \mathcal{L}_{adv} can combine the original task's loss function (e.g., cross-entropy, mean squared error) with additional domain-specific objectives, such as the real-life complexity of changing the values of individual input features. The second component in the objective, defined as \mathcal{L}_{spatial}=\|f(x)-f(x^{*})\|_{2}, helps arrive at a closer point in the latent space, which results in more consistent adversarial examples by the construction of f.

The optimization problem in Equation 4 is not a simple task, especially because of its validity constraints. Calculating \mathbb{P}(X=x^{*}\,|\,y) cannot be done trivially, but it can be approximated by using CTGAN [39], which is specific to heterogeneous tabular data, or other more general approaches, such as Bayesian networks [40] and decision trees [41]. As \epsilon increases, an \epsilon-consistent adversarial example becomes more likely under \mathcal{F} and therefore less perceptible in real life. However, there is a clear trade-off between finding such an \epsilon-consistent adversarial example and its likelihood, as this optimization problem is highly non-convex and its convergence is not guaranteed. The feasibility constraint can be enforced by applying a feature mask that prevents perturbations from being added to immutable features.
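For concreteness, the relaxed objective of Equation 4 and the feasibility mask can be sketched as follows (illustrative PyTorch under the same assumed setup as the earlier sketches; the specific loss choices and the restriction to numeric features are assumptions, not a prescribed implementation):

```python
import torch
import torch.nn.functional as F

def adversarial_objective(surrogate, embedder, x_cat, x_num, x_adv_num, y, mask):
    """L_adv + L_spatial from Equation 4, with immutable features masked out.

    Tensors are assumed to carry a batch dimension. For simplicity, only numeric
    features are relaxed here; `mask` is 1 for mutable features and 0 for
    immutable ones (Equation 2).
    """
    x_pert = x_num + mask * (x_adv_num - x_num)       # feasibility via masking
    logits = surrogate(x_cat, x_pert)
    l_adv = -F.cross_entropy(logits, y)               # untargeted: push away from y
    l_spatial = torch.norm(embedder(x_cat, x_pert) - embedder(x_cat, x_num), p=2)
    return l_adv + l_spatial
```

The consistency constraint is not encoded explicitly here; as discussed above, it can be approximated with a learned density model and checked after the optimization, or folded into \mathcal{L}_{adv} as an additional penalty.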

The challenges above call for a relaxed optimization instance that approximates the original problem and can be solved using existing algorithms. We propose an implementation of such an instance in our experiments, which are described in the next section. We begin by describing the learning process of the surrogate model, which includes an embedding function as outlined in Section 3.2 and a task solver. Then we illustrate the complete course of the adversarial attack algorithm, as well as the means employed to enforce the data validity rules.

4 Experiments

4.1 Datasets

We conducted our empirical study on three tabular datasets from different content domains. These datasets were chosen because they all have the various challenging characteristics of tabular data, and especially because they are heterogeneous and pertain to real-life oriented tasks.

  • Home Credit Default Risk (HCDR) [42] - includes current and historical financial information on just over 300,000 customers and their loan requests. This dataset's use case is a binary classification task in the credit risk assessment area, designed to predict a client's repayment abilities.

  • Intensive Care Unit (ICU) [43] - contains medical data on around 91,000 patients in an intensive care unit. This dataset is associated with a binary classification task aimed at helping predict the death of ICU patients.

  • U.S. Lodging Listings (Airbnb) [44] - data scraped from the Airbnb website on some 57,000 property listings in major U.S. cities. This dataset's use case is a regression task aimed at predicting the price of lodging.

In the binary classification tasks, both of the datasets suffer from class imbalance, with a ratio of approximately 92:8. The global minimum and maximum values in each dataset can be extremely far apart, with the difference between them ranging from approximately 10^{3} in Airbnb to 10^{5} in ICU and 10^{9} in HCDR. Missing values are quite common as well, with ICU having nearly a quarter of all of the data missing, down to 15% in HCDR and 9% in Airbnb. HCDR, ICU, and Airbnb include 22, 81, and 48 immutable features, respectively. Table 1 lists the number of constraints imposed on the features in each dataset by constraint type and the total number of features.

Dataset Integer Positive Negative Normalized Categorical Total Features
HCDR 11 11 5 15 21 52
ICU 42 104 0 1 21 126
Airbnb 20 22 1 5 133 157
Table 1: Number of constraints by type (features may have more than one constraint).

As commonly done for these data domains, we applied the following preprocessing steps to the datasets. First, the data was cleaned of outliers and of features for which over 75% of the values were missing. In cases in which there was a group of features with unusually high correlation (over 0.95 or under -0.95), we dropped all but one randomly chosen feature from the group. Missing values in each of the remaining features were imputed with one of the following: the modal value for categorical variables, the mean value for regular numeric variables, or a unique value outside the variable's legitimate value range. We chose the combination of methods that yielded the best performance. Categorical features were encoded with consecutive integers, so that all of the data is numeric. Finally, all of the data was re-scaled or standardized so that each feature is in the range [0,1] or normally distributed.
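For illustration only, the preprocessing described above roughly corresponds to the following pandas/scikit-learn sketch; the 75% and 0.95 thresholds follow the text, while the column handling and helper structure are assumptions:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

def preprocess(df, cat_cols):
    # Drop features with more than 75% missing values.
    df = df.loc[:, df.isna().mean() <= 0.75].copy()
    cat_cols = [c for c in cat_cols if c in df.columns]
    num_cols = [c for c in df.columns if c not in cat_cols]

    # Drop all but one feature from highly correlated groups (|corr| > 0.95).
    corr = df[num_cols].corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    drop = [c for c in upper.columns if (upper[c] > 0.95).any()]
    df = df.drop(columns=drop)
    num_cols = [c for c in num_cols if c not in drop]

    # Impute: mode for categorical features, mean for numeric features.
    for c in cat_cols:
        df[c] = df[c].fillna(df[c].mode().iloc[0])
    for c in num_cols:
        df[c] = df[c].fillna(df[c].mean())

    # Encode categorical features with consecutive integers.
    for c in cat_cols:
        df[c] = df[c].astype("category").cat.codes

    # Re-scale numeric features to [0, 1].
    df[num_cols] = MinMaxScaler().fit_transform(df[num_cols])
    return df
```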

Note that our preprocessing maintains the challenging characteristics of tabular datasets, and the attack method backpropagates through the relevant operations. Each learning algorithm has its own capabilities, limitations, and implementation; therefore, some of the algorithms do not require all of the above preprocessing. For example, the LightGBM algorithm [45] handles missing values and categorical variables as per its implementation, so we skipped the relevant preprocessing steps for that model.

4.2 Attack Implementation

The following is an implementation of the various components of our approach. It is important to note that other choices can be made for any component or part of the process (the models, the data transformation, the validity rules, the method for perturbing the input samples, etc.), as long as the general requirements are met.

4.2.1 Models

Here we describe the target and surrogate models used in the experiments. All of the datasets were split into training and validation sets consisting of 80% and 20% of the data, respectively. Another separate set of 500 samples was used to craft the adversarial examples.

Target models. As previously noted, heterogeneous tabular data is most commonly associated with tree-based models, such as decision trees and random forests, as well as gradient boosting machines like LightGBM. Due to their popularity, we demonstrate the transferability of our attack on the surrogate model to these three learning algorithms. The performance evaluation of the models for each dataset is presented in Table 2.

Decision Tree Random Forest GBM Surrogate
Dataset Metric Train Validation Train Validation Train Validation Train Validation
HCDR AUC 0.72 0.69 0.74 0.72 0.78 0.75 0.74 0.73
ICU AUC 0.83 0.83 0.87 0.87 0.95 0.91 0.90 0.88
Airbnb MSE 0.13 0.14 0.13 0.13 0.07 0.08 0.11 0.11
Table 2: Evaluation scores of the training and validation sets on the target and surrogate models for each task. The performance is evaluated using the ROC-AUC metric for classification tasks and the mean squared error (MSE) for regression.

Surrogate model. The surrogate model consists of an embedding network and a task solver network. Motivated by concepts from the person re-identification task, in each embedding network we use batch hard triplet loss [38, 46] with the cosine similarity distance metric. To fit the regression task to the same embedding learning process used in classification, we use equal frequency binning on the target variable. We train the embedding networks with the Adagrad optimizer [47], a learning rate of 0.001, and a mini-batch size of 32. In our experiments, the size of the embedding vectors for HCDR, ICU, and Airbnb is four, 20, and 15, respectively. Then, we freeze the embedding network’s parameters and add the task solver sequentially; it consists of a single layer with one neuron and a sigmoid or softmax activation function. The solver receives the embedding vectors from the trained embedding network as input. We use the Adam optimizer [48] to minimize the standard objectives matching the learning task: cross-entropy for classification and MSE for regression.
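As a reference for the embedding objective, a batch-hard triplet loss with a cosine distance (following the general formulation of [38, 46]; the margin value and tensor shapes below are illustrative assumptions) can be sketched as:

```python
import torch
import torch.nn.functional as F

def batch_hard_triplet_loss(embeddings, labels, margin=0.3):
    """Batch-hard triplet loss with cosine distance (sketch).

    embeddings: (batch, e) latent vectors produced by the embedding network.
    labels:     (batch,) class labels.
    """
    z = F.normalize(embeddings, dim=1)
    dist = 1.0 - z @ z.t()                                   # pairwise cosine distance
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    pos_mask = same.float() - torch.eye(len(labels), device=z.device)
    neg_mask = (~same).float()

    # Hardest positive: farthest sample of the same class.
    hardest_pos = (dist * pos_mask).max(dim=1).values
    # Hardest negative: closest sample of a different class.
    hardest_neg = (dist + (1 - neg_mask) * 1e6).min(dim=1).values

    return F.relu(hardest_pos - hardest_neg + margin).mean()
```

For the Airbnb regression task, the labels passed to such a loss are the equal-frequency bin indices of the target variable, as described above.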

4.2.2 Attack Overview

Our attack implementation, presented in Algorithm 1, is a variant of the \ell_{0}-oriented Jacobian saliency map attack [8]. This iterative algorithm performs the following main steps. First, it selects the feature that has the most impact on the prediction of the model, excluding immutable features. To avoid having the algorithm repetitively make changes that will be clipped at the end of the iteration, features that have already been selected are not reconsidered. The impact should be measured according to the model that is being attacked; in the case of our neural network-based surrogate model, it is the gradients, as described in Algorithm 1, and for the tree-based target models it is the information gain of each feature. Second, a perturbation vector \alpha for the selected features is calculated by using the Adam optimizer, which minimizes the adversarial objective function, as in Equation 4. The auxiliary method for this calculation is denoted by ComputeStep. Lastly, the Project method is performed on x^{*}, projecting each feature value onto its consistent set or continuous range and ensuring that for each feature i, \mathbb{P}_{X_{i}\sim\mathcal{F}_{i}}(X_{i}=x_{i}^{*})>0 (though not ensuring that \mathbb{P}_{X\sim\mathcal{F}}(X=x^{*})>0). We set the maximum perturbation size \lambda to be the number of mutable features. For regression tasks, the first part of the exit condition of the algorithm is different: |\widetilde{M}(x^{*})-y|>\tau, for a predefined threshold \tau. Given that the target variable in the Airbnb use case is the logarithm of the property's price, and considering its variance, we set \tau=0.75.

Input: \widetilde{M}, x, y, \mathcal{L}, \mathcal{L}^{*}, I, \lambda
1:  x^{*} \leftarrow x
2:  S \leftarrow \emptyset
3:  while \widetilde{M}(x^{*}) = y and |S| < \lambda do
4:      G \leftarrow \nabla_{x^{*}}\mathcal{L}(\widetilde{M}(x^{*}), y)
5:      S \leftarrow S \cup \{\arg\max_{i \notin S \cup I} G_{i}\}
6:      \alpha \leftarrow ComputeStep(x^{*}, \mathcal{L}^{*})
7:      forall i \in S do
8:          x_{i}^{*} \leftarrow x_{i}^{*} + \alpha_{i}
9:      x^{*} \leftarrow Project(x^{*})
10: return x^{*}
Algorithm 1: Crafting an adversarial example. \widetilde{M} is the surrogate model, x is the candidate benign sample, y its label, \mathcal{L} is the original task's loss function, \mathcal{L}^{*} is the adversarial objective function composed of \mathcal{L}_{adv} and \mathcal{L}_{spatial}, I is the set of immutable features, and \lambda is the maximum distortion (\ell_{0}-wise).
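A minimal Python sketch of Algorithm 1 is given below. It assumes a differentiable surrogate that accepts a single flat (already encoded) feature vector, an adv_objective callable implementing the relaxed objective of Equation 4, and a hypothetical project helper that maps each feature back onto its legal values (integer rounding, value ranges, category codes); all of these names are illustrative assumptions rather than our exact implementation.

```python
import torch

def craft_adversarial(surrogate, task_loss, adv_objective, project, x, y,
                      immutable_idx, max_l0, n_steps=50, lr=0.05):
    """Sketch of Algorithm 1: an iterative, l0-oriented attack on the surrogate."""
    x_adv = x.clone()
    selected = set()
    while int(surrogate(x_adv).argmax()) == int(y) and len(selected) < max_l0:
        # Select the mutable, not-yet-selected feature with the largest gradient.
        x_adv.requires_grad_(True)
        grad = torch.autograd.grad(task_loss(surrogate(x_adv), y), x_adv)[0].abs()
        excluded = list(selected) + list(immutable_idx)
        if excluded:
            grad[excluded] = -1.0
        selected.add(int(grad.argmax()))
        x_adv = x_adv.detach()

        # ComputeStep: minimize the adversarial objective over selected features.
        step = torch.zeros_like(x_adv, requires_grad=True)
        mask = torch.zeros_like(x_adv)
        mask[list(selected)] = 1.0
        opt = torch.optim.Adam([step], lr=lr)
        for _ in range(n_steps):
            opt.zero_grad()
            adv_objective(x_adv + mask * step, x, y).backward()
            opt.step()

        # Project each feature onto its consistent set (types, ranges, categories).
        x_adv = project((x_adv + mask * step).detach())
    return x_adv
```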

5 Results

The surrogate and target models are evaluated using the standard metrics relevant to each task: the area under the receiver operating characteristic curve (ROC-AUC) for the imbalanced classification tasks and the MSE for regression. The performance of the models presented in Table 2 shows that the GBM model consistently outperforms every other model in every use case and that its performance is also very close to that of highly ranked submissions in the corresponding Kaggle competitions (apart from Airbnb, which is not part of a Kaggle competition). It is important to note that our surrogate model does not fall far behind the GBM model, which means that the embedding sub-network did not affect the overall performance too negatively.

As we have established, the metric most relevant to the case of heterogeneous data and real-world use cases is the \ell_{0} norm, as the other norms are compromised by the significantly different feature scales. However, we also include the \ell_{1} norm of the numeric features in the perturbations in our analysis, in order to obtain some understanding of how these features are perturbed in cases in which there are unusually big changes. In addition to the \ell_{0} norm, we also looked at its percentage of the entire dataset's features, as the number of features varies between the use cases. An example of the need to analyze this percentage can be found in the Airbnb use case, where the average total \ell_{0} distance is 5.28, but when incorporating the total number of features in the dataset, which is 157, we get an average relative \ell_{0} (i.e., total \ell_{0}(\%)) of only 3.36%. Table 3 shows that in every use case, the bulk of the average perturbation consists of changes in categorical features. Moreover, since the Airbnb dataset contains considerably more categorical features, which are mostly binary (representing the existence of amenities in the property), the difference is even more noticeable, with 4.33 of the total \ell_{0} distance of 5.28 being changes in categorical features. Fig. 2 presents histograms of the \ell_{0} norms of the perturbations, which are highly right-skewed and indicate that the vast majority of the adversarial examples include changes to only 1-2 features. The success rate is consistently high throughout the use cases, with nearly all of the samples in the attack set becoming adversarial examples.

Categorical Numeric Total
Dataset \ell_{0} \ell_{0} (%) \ell_{1} \ell_{0} \ell_{0} (%) \ell_{0} \ell_{0} (%) Success
HCDR 1.60 7.62% 6.71 0.67 2.16% 2.27 4.37% 98.20%
ICU 1.80 8.56% 4.96 1.64 1.56% 3.51 2.79% 94.40%
Airbnb 4.33 3.26% 4.25 0.95 3.96% 5.28 3.36% 99.80%
Table 3: Analysis of the adversarial attack on the surrogate model. The metrics correspond to the average across the perturbations of successful adversarial examples. The \ell_{0} value is partitioned into categorical and numeric features, and each \ell_{0} (%) column is relative to the number of features of the corresponding type. The success rate is relative to the attack set, which consists of 500 samples in each task.
[Figure 2: three histogram panels, (a)-(c).]
Figure 2: Number of adversarial examples by the total number of features changed (i.e., \ell_{0} distance).

In addition to attacking the surrogate model, we examine the transferability of the adversarial examples to the GBM, random forest, and decision tree models. As described in Algorithm 1, the iterative feature selection is based on the gradients of the surrogate network. When targeting models other than the surrogate, this selection process can be adjusted so that the impact of each feature is represented explicitly by the target model, rather than implicitly by the surrogate's gradients. We changed it to the feature importance of the tree-based models we target, namely the information gain of each feature. In Table 4 we report the transferability rates of the adversarial examples for each of the target models, as well as the rates of the adjusted attack (i.e., when feature selection is based on the target model's feature importance) in parentheses. The base transferability rates in all use cases and models are already high; however, the adjustment yields evident and substantial improvements consistently throughout our experiments.

Dataset GBM Random Forest Decision Tree
HCDR 38.29% (44.26%) 35.85% (42.22%) 38.20% (47.50%)
ICU 27.00% (36.32%) 20.40% (36.32%) 23.60% (60.53%)
Airbnb 14.42% (29.78%) 18.83% (32.93%) 21.36% (31.50%)
Table 4: Ratios of successfully transferred adversarial examples that were generated in an attack on the surrogate model. The parentheses indicate the transferability rates of attacks which were adjusted to each target model.
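As a sketch of the adjusted selection step (assuming scikit-learn-style tree models that expose feature_importances_, e.g., information gain; the surrounding attack loop is unchanged from Algorithm 1):

```python
import numpy as np

def select_feature_by_importance(target_model, selected, immutable_idx):
    """Pick the next feature to perturb using the target model's own importances
    (e.g., information gain) instead of the surrogate's gradients."""
    importance = np.asarray(target_model.feature_importances_, dtype=float)
    excluded = list(selected) + list(immutable_idx)
    importance[excluded] = -np.inf   # never reselect or touch immutable features
    return int(np.argmax(importance))
```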

In our experiments, the specific composition of each dataset (i.e., the sizes of the groups of feature types) and the distribution of the constraints (as described in Table 1) have a direct effect on the adversarial optimization process. Generally, we observed that heterogeneous input spaces involve several constraints and pose many challenges to crafting adversarial examples against learning algorithms. Our results show that not only is it possible to overcome such challenges and successfully attack neural networks trained on heterogeneous tabular data, but also that the transferability property of the generated adversarial examples is rather persistent in this unique case as well.

6 Conclusions

In this paper, we explore the role played by data in the susceptibility of machine learning models to adversarial manipulations. We introduce a general approach that formalizes the generation of valid adversarial examples for heterogeneous tabular data, and we provide an implementation of such an attack in an optimization-based context. Our approach focuses on an \ell_{0}-oriented adversary and her ability to apply the adversarial perturbations in real life. To the best of our knowledge, this is the first study to present an automated attack for heterogeneous tabular data. Our results support the possibility of successfully attacking previously unexplored domains with heterogeneous input spaces.

Future work may place greater emphasis on the important aspect of preserving feature correlation and improving our implementation, as well as developing ways to properly evaluate the success of such preservation. Additional implementations for our approach should also be explored, mainly different embedding functions and other methods for crafting adversarial examples. Advancement of mitigation methods could have more potential in the case of heterogeneous data than in the general case, as the baseline naive approach would be to limit the training to immutable features.

We believe that our work has identified an important research direction in the field of adversarial learning and broadens its scope beyond the main applications of computer vision.

References

  • Dalvi et al. [2004] N. Dalvi, P. Domingos, S. Sanghai, D. Verma, Adversarial classification, in: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, 2004, pp. 99–108.
  • Lowd and Meek [2005] D. Lowd, C. Meek, Adversarial learning, in: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, 2005, pp. 641–647.
  • Biggio et al. [2013] B. Biggio, I. Corona, D. Maiorca, B. Nelson, N. Šrndić, P. Laskov, G. Giacinto, F. Roli, Evasion attacks against machine learning at test time, in: Joint European conference on machine learning and knowledge discovery in databases, Springer, 2013, pp. 387–402.
  • Arnab et al. [2018] A. Arnab, O. Miksik, P. H. Torr, On the robustness of semantic segmentation models to adversarial attacks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 888–897.
  • Sharif et al. [2016] M. Sharif, S. Bhagavatula, L. Bauer, M. K. Reiter, Accessorize to a crime: Real and stealthy attacks on state-of-the-art face recognition, in: Proceedings of the 2016 ACM SIGSAC conference on computer and communications security, 2016, pp. 1528–1540.
  • Szegedy et al. [2013] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, R. Fergus, Intriguing properties of neural networks, arXiv preprint arXiv:1312.6199 (2013).
  • Goodfellow et al. [2014] I. J. Goodfellow, J. Shlens, C. Szegedy, Explaining and harnessing adversarial examples, arXiv preprint arXiv:1412.6572 (2014).
  • Papernot et al. [2016] N. Papernot, P. McDaniel, S. Jha, M. Fredrikson, Z. B. Celik, A. Swami, The limitations of deep learning in adversarial settings, in: 2016 IEEE European symposium on security and privacy (EuroS&P), IEEE, 2016, pp. 372–387.
  • Kurakin et al. [2016] A. Kurakin, I. Goodfellow, S. Bengio, Adversarial examples in the physical world, arXiv preprint arXiv:1607.02533 (2016).
  • Carlini and Wagner [2017] N. Carlini, D. Wagner, Towards evaluating the robustness of neural networks, in: 2017 IEEE Symposium on Security and Privacy (S&P), IEEE, 2017, pp. 39–57.
  • Su et al. [2019] J. Su, D. V. Vargas, K. Sakurai, One pixel attack for fooling deep neural networks, IEEE Transactions on Evolutionary Computation 23 (2019) 828–841.
  • Carlini and Wagner [2018] N. Carlini, D. Wagner, Audio adversarial examples: Targeted attacks on speech-to-text, in: 2018 IEEE Security and Privacy Workshops (SPW), IEEE, 2018, pp. 1–7.
  • Grosse et al. [2016] K. Grosse, N. Papernot, P. Manoharan, M. Backes, P. McDaniel, Adversarial perturbations against deep neural networks for malware classification, arXiv preprint arXiv:1606.04435 (2016).
  • Hu and Tan [2017] W. Hu, Y. Tan, Generating adversarial malware examples for black-box attacks based on gan, arXiv preprint arXiv:1702.05983 (2017).
  • Jia and Liang [2017] R. Jia, P. Liang, Adversarial examples for evaluating reading comprehension systems, in: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 2017, pp. 2021–2031.
  • Ebrahimi et al. [2018] J. Ebrahimi, A. Rao, D. Lowd, D. Dou, Hotflip: white-box adversarial examples for text classification, in: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2018, pp. 31–36.
  • Shavitt and Segal [2018] I. Shavitt, E. Segal, Regularization learning networks: Deep learning for tabular datasets, in: Advances in Neural Information Processing Systems, 2018, pp. 1379–1389.
  • Arik and Pfister [2019] S. O. Arik, T. Pfister, Tabnet: Attentive interpretable tabular learning, arXiv preprint arXiv:1908.07442 (2019).
  • Popov et al. [2019] S. Popov, S. Morozov, A. Babenko, Neural oblivious decision ensembles for deep learning on tabular data, in: International Conference on Learning Representations (ICLR), 2019.
  • Nielsen [2015] M. A. Nielsen, Neural networks and deep learning, volume 2018, Determination press San Francisco, CA, USA:, 2015.
  • Papernot et al. [2016] N. Papernot, P. McDaniel, I. Goodfellow, Transferability in machine learning: From phenomena to black-box attacks using adversarial samples, arXiv preprint arXiv:1605.07277 (2016).
  • Andriushchenko and Hein [2019] M. Andriushchenko, M. Hein, Provably robust boosted decision stumps and trees against adversarial attacks, in: Advances in Neural Information Processing Systems, 2019, pp. 12997–13008.
  • Chen et al. [2019] H. Chen, H. Zhang, D. Boning, C.-J. Hsieh, Robust decision trees against adversarial examples, in: International Conference on Machine Learning, 2019, pp. 1122–1131.
  • Kantchelian et al. [2016] A. Kantchelian, J. D. Tygar, A. Joseph, Evasion and hardening of tree ensemble classifiers, in: International Conference on Machine Learning, 2016, pp. 2387–2396.
  • Cheng et al. [2019] M. Cheng, T. Le, P.-Y. Chen, H. Zhang, J. Yi, C.-J. Hsieh, Query-efficient hard-label black-box attack: An optimization-based approach, in: International Conference on Learning Representation (ICLR), 2019.
  • Kołcz and Teo [2009] A. Kołcz, C. H. Teo, Feature weighting for improved classifier robustness, in: CEAS’09: sixth conference on email and anti-spam, 2009.
  • Biggio et al. [2010] B. Biggio, G. Fumera, F. Roli, Multiple classifier systems for robust classifier design in adversarial environments, International Journal of Machine Learning and Cybernetics 1 (2010) 27–41.
  • Biggio et al. [2011] B. Biggio, B. Nelson, P. Laskov, Support vector machines under adversarial label noise, in: Asian conference on machine learning, 2011, pp. 97–112.
  • Sarkar et al. [2018] S. K. Sarkar, K. Oshiba, D. Giebisch, Y. Singer, Robust classification of financial risk, in: NeurIPS Workshop on Challenges and Opportunities for AI in Financial Services, 2018.
  • Ballet et al. [2019] V. Ballet, X. Renard, J. Aigrain, T. Laugel, P. Frossard, M. Detyniecki, Imperceptible adversarial attacks on tabular data, in: NeurIPS Workshop on Robust AI in Financial Services, 2019.
  • Ustun et al. [2019] B. Ustun, A. Spangher, Y. Liu, Actionable recourse in linear classification, in: Proceedings of the Conference on Fairness, Accountability, and Transparency, 2019, pp. 10–19.
  • Kanamori et al. [2020] K. Kanamori, T. Takagi, K. Kobayashi, H. Arimura, Dace: Distribution-aware counterfactual explanation by mixed-integer linear optimization, in: Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI 2020, 2020.
  • Mahalanobis [1936] P. C. Mahalanobis, On the generalized distance in statistics, in: Proceedings of the National Institute of Science of India, 1936.
  • Breunig et al. [2000] M. M. Breunig, H.-P. Kriegel, R. T. Ng, J. Sander, Lof: identifying density-based local outliers, in: Proceedings of the 2000 ACM SIGMOD international conference on Management of data, 2000, pp. 93–104.
  • Wachter et al. [2018] S. Wachter, B. Mittelstadt, C. Russell, Counterfactual explanations without opening the black box: automated decisions and the gdpr, Harvard Journal of Law and Technology 31 (2018).
  • Goodfellow et al. [2014] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative adversarial nets, in: Advances in neural information processing systems, 2014, pp. 2672–2680.
  • Rumelhart et al. [1985] D. E. Rumelhart, G. E. Hinton, R. J. Williams, Learning internal representations by error propagation, Technical Report, California Univ San Diego La Jolla Inst for Cognitive Science, 1985.
  • Weinberger and Saul [2009] K. Q. Weinberger, L. K. Saul, Distance metric learning for large margin nearest neighbor classification, Journal of Machine Learning Research 10 (2009) 207–244.
  • Xu et al. [2019] L. Xu, M. Skoularidou, A. Cuesta-Infante, K. Veeramachaneni, Modeling tabular data using conditional gan, in: Advances in Neural Information Processing Systems, 2019, pp. 7335–7345.
  • Zhang et al. [2017] J. Zhang, G. Cormode, C. M. Procopiuc, D. Srivastava, X. Xiao, Privbayes: Private data release via bayesian networks, ACM Transactions on Database Systems (TODS) 42 (2017) 1–41.
  • Reiter [2005] J. P. Reiter, Using cart to generate partially synthetic public use microdata, Journal of Official Statistics 21 (2005) 441.
  • Group [2018] H. C. Group, Home credit default risk, https://www.kaggle.com/c/home-credit-default-risk/data, 2018. Accessed August 2021.
  • The Global Open Source Severity of Illness Score Consortium [2018] The Global Open Source Severity of Illness Score Consortium, Intensive care unit, https://www.kaggle.com/c/widsdatathon2020/data, 2018. Accessed August 2021.
  • Airbnb [2018] I. Airbnb, Lodging airbnb listings in major U.S. cities, http://insideairbnb.com/get-the-data.html, 2018. Accessed August 2021.
  • Ke et al. [2017] G. Ke, Q. Meng, T. Finley, T. Wang, W. Chen, W. Ma, Q. Ye, T.-Y. Liu, Lightgbm: A highly efficient gradient boosting decision tree, in: Advances in neural information processing systems, 2017, pp. 3146–3154.
  • Hermans et al. [2017] A. Hermans, L. Beyer, B. Leibe, In defense of the triplet loss for person re-identification, arXiv preprint arXiv:1703.07737 (2017).
  • Duchi et al. [2011] J. Duchi, E. Hazan, Y. Singer, Adaptive subgradient methods for online learning and stochastic optimization., Journal of machine learning research 12 (2011).
  • Kingma and Ba [2014] D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, in: International Conference on Learning Representation (ICLR), 2014.