
An Algorithm for Out-Of-Distribution Attack to Neural Network Encoder

Liang Liang & Linhai Ma & Linchen Qian & Jiasong Chen
Department of Computer Science
University of Miami
Coral Gables, FL 33146, USA
{liang.liang, l.ma, lxq93, jasonchen}@miami.edu
Abstract

Deep neural networks (DNNs), especially convolutional neural networks, have achieved superior performance on image classification tasks. However, such performance is only guaranteed if the input to a trained model is similar to the training samples, i.e., the input follows the probability distribution of the training set. Out-Of-Distribution (OOD) samples do not follow the distribution of the training set, and therefore the predicted class labels on OOD samples become meaningless. Classification-based methods have been proposed for OOD detection; however, in this study we show that this type of method has no theoretical guarantee and is practically breakable by our OOD Attack algorithm because of dimensionality reduction in the DNN models. We also show that Glow likelihood-based OOD detection is breakable as well.

1 Introduction

Deep neural networks (DNNs), especially convolutional neural networks (CNNs), have become the method of choice for image classification. Under the i.i.d. (independent and identically distributed) assumption, a high-performance DNN model can correctly classify an input sample as long as the sample is “generated” from the distribution of the training data. If an input sample is not from this distribution, it is called Out-Of-Distribution (OOD), and the predicted class label from the model is meaningless. Ideally, a model should be able to distinguish OOD samples from in-distribution samples. OOD detection is especially needed when applying DNN models in life-critical applications, e.g., vision-based self-driving or image-based medical diagnosis.

Nguyen et al. (2015) showed that DNN classifiers can be easily fooled by OOD data, using an evolutionary algorithm to generate OOD samples on which DNN classifiers produce high-confidence outputs. Since then, many methods have been proposed for OOD detection using classifiers or encoders (Hendrycks & Gimpel, 2017; Hendrycks et al., 2019; Liang et al., 2018; Lee et al., 2018a;b; Alemi et al., 2018). For instance, Hendrycks & Gimpel (2017) showed that a classifier’s prediction probabilities on OOD examples tend to be more uniform, and therefore the maximum predicted class probability from the softmax layer can be used as an OOD detection score. Regardless of the details, every method of this type needs a classifier or an encoder that takes an image $x$ as input and compresses it into a vector $z$ in the latent space; after some further transform, $z$ is converted to an OOD detection score $\tau$. This computing process can be expressed as $z=f(x)$ and $\tau=d(z)$. To perform OOD detection, a detection threshold is specified, and $x$ is declared OOD if $\tau$ is smaller/larger than the threshold. For evaluation, an OOD detector is usually trained on one dataset (e.g., Fashion-MNIST as in-distribution) and then tested against another dataset (e.g., MNIST as OOD) (Hendrycks & Gimpel, 2017).

As will be shown in this study, the above-mentioned classification-based OOD detection methods are practically breakable. As an example (more details in Section 3), we used the Resnet-18 model (He et al., 2016) pre-trained on the ImageNet dataset. Let $x_{in}$ denote a 224×224×3 image (in-distribution sample) from ImageNet and $x_{out}$ denote an OOD sample, which could be any kind of image (even random noise) not belonging to any category in ImageNet. Let $z$ denote the 512-dimensional feature vector in Resnet-18, i.e., the input to the last fully-connected linear layer before the softmax operation. Thus, we have $z_{in}=f(x_{in})$ and $z_{out}=f(x_{out})$. In Fig. 1, $x_{in}$ is an image of Santa Claus, and $x_{out}$ could be a chest x-ray image or a random-noise image; “surprisingly”, $z_{out}\cong z_{in}$, which renders the OOD detection score useless: $d(z_{out})\cong d(z_{in})$.

In Section 2, we will introduce an algorithm to generate OOD samples such that $z_{out}\cong z_{in}$. In Section 3, we will show evaluation results on publicly available datasets, including an ImageNet subset, GTSRB, OCT, and COVID-19 CT.

Figure 1: The 1st column shows the image of Santa Claus $x_{in}$ and the scatter plot of $z_{in}$ (blue dots). The 2nd column shows a chest x-ray image $x_{out}$ and the scatter plot of $z_{out}$ (red circles) and $z_{in}$ (blue). The 3rd column shows a random image $x_{out}$ and the scatter plot of $z_{out}$ (red) and $z_{in}$ (blue).

Since some generative models (e.g., Glow (Kingma & Dhariwal, 2018)) can approximate the distribution of training samples, i.e., $p(x_{in})$, likelihood-based generative models have been utilized for OOD detection (Nalisnick et al., 2019). It has been shown that likelihoods derived from generative models may not distinguish between OOD and training samples (Nalisnick et al., 2019; Ren et al., 2019; Choi et al., 2018), and a fix to the problem could be using a likelihood ratio instead of the raw likelihood score (Serrà et al., 2019). Although not the main focus of this study, we will show that an OOD sample’s likelihood score from the Glow model (Kingma & Dhariwal, 2018; Serrà et al., 2019) can be arbitrarily manipulated by our algorithm (Section 2.1) such that $p(x_{out})\cong p(x_{in})$, which further diminishes the effectiveness of any Glow likelihood-based detection method.

2 Methodology

2.1 OOD attack on DNN Encoder

We introduce an algorithm to perform an OOD attack on a DNN encoder $z=f(x)$, which takes an image $x$ as input and transforms it into a feature vector $z$ in a latent space. Preprocessing of $x$ can be considered the very first layer inside the model $f(x)$. The algorithm only needs the weak assumption that $f(x)$ is sub-differentiable. A CNN classifier can be considered a composition of a feature encoder $z=f(x)$ and a feature classifier $p=g(z)$, where $p$ is the softmax probability distribution over multiple classes.

Consider an in-distribution sample $x_{in}$ and an OOD sample $x_{out}^{\prime}$, and apply the model: $z_{in}=f(x_{in})$ and $z_{out}^{\prime}=f(x_{out}^{\prime})$. Usually, $z_{out}^{\prime}\neq z_{in}$. However, if we add a relatively small perturbation $\delta$ to $x_{out}^{\prime}$, it may be possible that $f(x_{out}^{\prime}+\delta)=z_{in}$ while $x_{out}^{\prime}+\delta$ is still OOD. This idea is realized in Algorithm 1, OOD Attack on DNN Encoder.

Algorithm 1 OOD Attack on DNN Encoder
1: Input: an in-distribution sample $x_{in}$ in a dataset; an OOD sample $x_{out}^{\prime}$ not similar to any sample in the dataset; $f$, the neural network feature encoder; $\epsilon$, the maximum perturbation measured by the Lp norm; $N$, the total number of iterations; $\alpha$, the learning rate of the optimizer.
2: Output: an OOD sample $x_{out}$ s.t. $f(x_{out})\cong f(x_{in})$
3: Process:
4: Generate a random noise $\xi$ with $||\xi||\leq\epsilon$
5: Initialize $x_{out}=x_{out}^{\prime}+\xi$
6: Set up the loss $J(x_{out})=||f(x_{out})-f(x_{in})||^{2}$ (L2 norm)
7: for $n$ from $1$ to $N$ do
8:     $x_{out}\leftarrow clip(x_{out}-\alpha\cdot h(J^{\prime}(x_{out})))$, where $J^{\prime}(x)=\partial J/\partial x$
9: end for
10: Note: The clip operation ensures that $||x_{out}-x_{out}^{\prime}||_{p}\leq\epsilon$ and that pixel values stay within the feasible range (e.g., 0 to 1). If the L-inf norm is used, $h(J^{\prime})$ is the sign function; if the L2 norm is used, $h(J^{\prime})$ normalizes $J^{\prime}$ by its L2 norm. The Adamax optimizer is used in the implementation.

The clip operation in Algorithm 1 is essential: it limits the difference between $x_{out}$ and $x_{out}^{\prime}$ so that $x_{out}$ remains OOD. The algorithm is inspired by projected gradient descent (PGD) (Kurakin et al., 2016; Madry et al., 2018), which is used for adversarial attacks. We note that the term “adversarial attack” usually refers to adding a small perturbation to a clean sample $x$ in a dataset such that a classifier incorrectly classifies the noisy sample while correctly classifying the original clean sample $x$. Thus, an OOD attack and an adversarial attack are completely different things.

In practice, Algorithm 1 can be repeated many times to find the best solution. Random initialization is performed in the first two steps of the algorithm’s process (noise generation and initialization). By adding initial random noise $\xi$ to $x_{out}^{\prime}$, the algorithm has a better chance of avoiding local minima caused by a bad initialization.
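For concreteness, the following is a minimal PyTorch sketch of Algorithm 1 under the L2-norm setting. The function and variable names (e.g., ood_attack, x_out_init) are ours and not from any released code, and details such as the optimizer settings may differ from the actual implementation.

```python
import torch

def ood_attack(f, x_in, x_out_init, eps=10.0, n_iter=10000, alpha=None):
    # Sketch of Algorithm 1 (L2 version): find x_out near x_out_init
    # such that f(x_out) is close to f(x_in).
    alpha = eps / 100.0 if alpha is None else alpha
    with torch.no_grad():
        z_in = f(x_in)                          # target latent code
        xi = torch.randn_like(x_out_init)       # random initial noise
        xi = eps * xi / xi.norm()               # scale the noise into the eps-ball
        x_out = (x_out_init + xi).clamp(0, 1)
    x_out.requires_grad_(True)
    optimizer = torch.optim.Adamax([x_out], lr=alpha)
    for _ in range(n_iter):
        loss = (f(x_out) - z_in).pow(2).sum()   # J(x_out) = ||f(x_out) - f(x_in)||^2
        optimizer.zero_grad()
        loss.backward()
        x_out.grad /= x_out.grad.norm() + 1e-12 # h(J') for the L2 norm
        optimizer.step()
        with torch.no_grad():                   # the clip operation
            delta = x_out - x_out_init
            if delta.norm() > eps:
                x_out.copy_(x_out_init + delta * (eps / delta.norm()))
            x_out.clamp_(0, 1)
    return x_out.detach()
```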

2.2 Dimensionality Reduction and OOD Attack

Recall that in a classification-based OOD detection approach, a DNN encoder transforms the input into a feature vector, i.e., $z=f(x)$, and an OOD detection score is computed by another transform on $z$, i.e., $\tau=d(z)$. If $z_{out}\cong z_{in}$, then $d(z_{out})\cong d(z_{in})$, which breaks the OOD detector regardless of the transform $d$. Usually, a DNN encoder performs dimensionality reduction: the dimension of $z$ is significantly smaller than the dimension of $x$. In the example shown in Fig. 1, $z$ is a 512-dimensional feature vector ($dim(z)=512$) in Resnet-18, and the dimension of $x$ is 150528 ($224\times 224\times 3$).

Dimensionality reduction in an encoder provides the opportunity for OOD and in-distribution samples to be mapped to the same locations in the latent space. This is simply because the vectors in a lower-dimensional space cannot represent all of the vectors/objects in a higher-dimensional space, which is the Pigeonhole Principle. Consider the Resnet-18 example in Fig. 1. Each pixel of the color image $x$ has 8 bits per channel. In the 150528-dimensional discrete input space, there are $256^{224\times 224\times 3}$ different images/vectors, which defines the size of the input space. The float32 data type is usually used in computation, and a float32 variable can represent roughly $2^{32}$ unique real numbers. Thus, in the 512-dimensional latent space, there are $2^{32\times 512}$ unique vectors/objects, which defines the size of the latent space. The ratio $\frac{2^{32\times 512}}{256^{224\times 224\times 3}}\ll 1$ shows that the latent space is significantly smaller than the input space. Thus, for some sample $x$ in the dataset, we can find another sample $x^{\prime}$ such that $f(x^{\prime})=f(x)$ as long as $dim(z)<dim(x)$. A question arises: will such an $x^{\prime}$ be in-distribution or OOD? To answer this question, let's partition the discrete input space $\Omega$ into two disjoint regions ($\Omega=\Omega_{in}\cup\Omega_{out}$): $\Omega_{in}$ of in-distribution samples and $\Omega_{out}$ of OOD samples. $\left|\Omega\right|$ denotes the size of $\Omega$. Usually, the training set is only a subset of $\Omega_{in}$, and the size of $\Omega_{out}$ is significantly larger than the size of $\Omega_{in}$. For example, if $\Omega_{in}$ is ImageNet, then $\Omega_{out}$ contains medical images, noise images, and other weird images; if $\Omega_{in}$ contains human face images, then $\Omega_{out}$ contains non-face images, and $\left|\Omega_{in}\right|\ll\left|\Omega_{out}\right|$. The latent space (z-space) is denoted by $\mathcal{F}$ and partitioned into two subspaces: $\mathcal{F}=\mathcal{F}_{in}\cup\mathcal{F}_{out}$. An encoder maps $\Omega_{in}\rightarrow\mathcal{F}_{in}$ and $\Omega_{out}\rightarrow\mathcal{F}_{out}$. If there is overlap, $\mathcal{F}_{in}\cap\mathcal{F}_{out}\neq\emptyset$, then the encoder is vulnerable to an OOD attack. Usually, the encoder is part of a classifier trained to classify in-distribution samples into different classes, and therefore the encoder cannot guarantee that there is no overlap between $\mathcal{F}_{in}$ and $\mathcal{F}_{out}$. What is the size of $\mathcal{F}_{in}\cap\mathcal{F}_{out}$, or what is the probability $P\left(|\mathcal{F}_{in}\cap\mathcal{F}_{out}|\geq a\right)$? While it is hard to calculate for an arbitrary encoder and dataset, we can do a worst-case-scenario analysis. Assuming that every OOD sample is i.i.d. mapped to the latent space with a uniform distribution over the $|\mathcal{F}|$ spots, the probability of OOD samples covering the entire latent space is $P\left(\mathcal{F}_{out}=\mathcal{F}\right)=\left|\mathcal{F}\right|!\times Stirling(\left|\Omega_{out}\right|,\left|\mathcal{F}\right|)/\left|\mathcal{F}\right|^{\left|\Omega_{out}\right|}\rightarrow 1$ as $\left|\mathcal{F}\right|/\left|\Omega_{out}\right|\rightarrow 0$, where $Stirling$ denotes the Stirling number of the second kind. Noting that $\left|\mathcal{F}\right|/\left|\Omega_{out}\right|=\frac{2^{32\times 512}}{256^{224\times 224\times 3}-1.4\times{10}^{7}}\approx 0$, where $1.4\times{10}^{7}$ is the number of samples in ImageNet, it could be true that almost (with probability close to 1) the entire latent space of Resnet-18 is covered by the $z$ vectors of OOD samples.
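As a quick sanity check on these orders of magnitude, the sizes of the two spaces can be compared on a log2 scale with a few lines of Python (our own illustration, not part of the paper's experiments):

```python
# log2 of the size of the discrete input space: 256^(224*224*3)
log2_input_space = 8 * 224 * 224 * 3      # = 1,204,224 bits

# log2 of the size of the float32 latent space of Resnet-18: 2^(32*512)
log2_latent_space = 32 * 512              # = 16,384 bits

# log2 of the ratio |F| / |Omega|: a hugely negative number,
# i.e., the latent space is vanishingly small compared to the input space
print(log2_latent_space - log2_input_space)   # -1187840
```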

Next, we discuss how to construct OOD samples to fool neural networks. First, consider a one-layer linear network $z=Wx$, with an in-distribution input $x\in\mathcal{R}^{M}$, latent code $z\in\mathcal{R}^{K}$, and $K\ll M$. $W$ is a $K\times M$ matrix with $rank\left(W\right)\leq K$. The null space of $W$ is $\Omega_{null}=\{\eta;\ W\eta=0\}$. Take the basis vectors of this space, $\eta_{1}, \eta_{2}, \ldots, \eta_{M-K}$, and compute $x^{\prime}=\sum_{i}{\lambda_{i}\eta_{i}}+x$, where each $\lambda_{i}$ is a non-zero scalar. Obviously, $z^{\prime}=Wx^{\prime}=z$. We can set the magnitude of the “noise” $\sum_{i}{\lambda_{i}\eta_{i}}$ to be arbitrarily large such that $x^{\prime}$ looks like garbage and becomes OOD, which is another explanation of the existence of such OOD samples (a numerical illustration is sketched below). This attack can then be extended to multi-layer neural networks. If the network only uses ReLU activations, the input-output relationship can be exactly expressed as a piecewise-linear mapping (Ding et al., 2020), and a similar approach can be applied layer by layer. If ReLU is not used, a new method is needed. We note that the filter bank of a convolution layer can be converted to a weight matrix. We have examined the state-of-the-art CNN models that are pre-trained on ImageNet and available in Pytorch, and dimensionality reduction is performed in most of the layers (except 1 or 2 layers near the input), i.e., $\left|\mathcal{F}\right|\leq\left|\Omega_{in}\right|\ll\left|\Omega_{out}\right|$. Instead of constructing an OOD sample by adding perturbations to an in-distribution sample, in Algorithm 1 we construct an OOD sample paired with an in-distribution sample by starting from an initial sample that is OOD.
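The linear-network argument above can be checked numerically. The PyTorch snippet below (our own illustration; W, x, and the basis construction are arbitrary choices) builds a large null-space perturbation that leaves z = Wx unchanged:

```python
import torch

torch.manual_seed(0)
K, M = 8, 128                        # K << M, so W has a non-trivial null space
W = torch.randn(K, M)
x = torch.rand(M)                    # an "in-distribution" input

# orthonormal basis of the null space of W: rows of Vh beyond rank(W)
_, _, Vh = torch.linalg.svd(W)
null_basis = Vh[K:]                  # (M - K) x M

# add an arbitrarily large null-space component to x
lam = 100.0 * torch.randn(M - K)     # the lambda_i coefficients
x_prime = x + lam @ null_basis       # x' = x + sum_i lambda_i * eta_i

print((W @ x - W @ x_prime).abs().max())   # ~0 up to float precision: z' = z
```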

Could an encoder be made robust to the OOD attack by including OOD samples in the training set for supervised binary classification (in vs. out)? Usually $\left|\Omega_{in}\right|\ll|\Omega_{out}|$, so we would have to collect and label “enough” samples in $\Omega_{out}$, which is infeasible considering the large size of $\Omega_{out}\approx\Omega$. As a comparison, to enhance DNN classifier robustness against adversarial noises, it is very effective to include noisy samples in the training set, i.e., $\Omega_{in}=\Omega_{in\_clean}\cup\Omega_{in\_noisy}$. This is known as adversarial training (Goodfellow et al., 2018) and is computationally feasible because $|\Omega_{in\_noisy}|\ll|\Omega_{out}|$.

2.3 Problem of Glow likelihood-based OOD Detection

Generative models have been developed to approximate the training data distribution. Glow (Kingma & Dhariwal, 2018) is one such model, and it has a very special property: it is bijective, and the latent space dimension is the same as the input space dimension, i.e., there is no dimensionality reduction, which is the reason we studied this model.

Several studies have found problems with Glow-based OOD detection: likelihoods derived from Glow may not distinguish between OOD and training samples (Ren et al., 2019; Choi et al., 2018), and a possible fix could be using a likelihood ratio (Serrà et al., 2019). In this study, we further show that the negative log-likelihood (NLL) from the Glow model can be arbitrarily manipulated by our algorithm with $f(x)$ set to the NLL. The results on the CelebA face image dataset are in Section 3. We think the major reason for Glow’s vulnerability to the OOD attack is that we do not have enough training data in the high-dimensional space. Glow defines the mapping $x_{in}\rightarrow z_{in}\rightarrow p(z_{in})\rightarrow p(x_{in})$, the probability of $x_{in}$. For an OOD sample $x_{out}$, the mapping is $x_{out}\rightarrow z_{out}\rightarrow p(z_{out})\rightarrow p(x_{out})$. Since the number of training samples is significantly smaller than the size of the space, there are a huge number of “holes” in the latent space (i.e., regions that no training samples are mapped to), and it is easy to put $z_{out}$ in one of these “holes” close to $z_{in}$ such that $p(z_{out})\cong p(z_{in})$.

2.4 Reconstruction-based OOD Detection

Auto-encoder style OOD detection has been developed for anomaly detection (Chalapathy & Chawla, 2019; Cohen et al., 2019) based on the reconstruction error. The data flow of an auto-encoder is $x\rightarrow z\rightarrow\hat{x}$, where $\hat{x}$ is the reconstruction of $x$. The OOD detection score can be the difference between $x$ and $\hat{x}$, e.g., the Lp distance $||x-\hat{x}||_{p}$ or the Mahalanobis distance. This type of method has two known issues. The first issue is that an auto-encoder may reconstruct OOD samples well, i.e., $x_{out}\approx{\hat{x}}_{out}$. Thus, one needs to make sure it has large reconstruction errors on OOD samples, which can be done by limiting the capacity of the auto-encoder or saturating it with in-distribution samples. The second issue is that pixel-to-pixel distance is not a good measurement of image dissimilarity, especially for medical images. For example, $x$ could be a CT image of a heart and $\hat{x}$ could be the image of the same heart deformed a little bit; the two images are similar, but the pixel-to-pixel distance between $x$ and $\hat{x}$ can be very large. Thus, a robust image similarity measurement is needed.

Interestingly, the proposed OOD attack algorithm has no effect on this type of method. Consider the data flows $x_{in}\rightarrow z_{in}\rightarrow{\hat{x}}_{in}$ and $x_{out}\rightarrow z_{out}\rightarrow{\hat{x}}_{out}$. If $z_{out}=z_{in}$, then ${\hat{x}}_{out}={\hat{x}}_{in}$. It is then easy to find out that $x_{out}$ is OOD because $||x_{out}-\hat{x}_{out}||_{p}=||x_{out}-\hat{x}_{in}||_{p}$, which is very large. Ironically, in this case, the attack algorithm helps to identify the OOD sample. In future work, we will evaluate the effectiveness of combining the proposed algorithm and an auto-encoder for OOD detection.
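For completeness, a minimal sketch of a reconstruction-error detection score is given below, assuming a trained auto-encoder object with encode and decode methods (these names are our assumption, not a specific library API):

```python
import torch

def reconstruction_ood_score(autoencoder, x, p=2):
    """Lp reconstruction error per sample; large values suggest OOD."""
    with torch.no_grad():
        x_hat = autoencoder.decode(autoencoder.encode(x))
    diff = (x - x_hat).flatten(start_dim=1)   # flatten each sample
    return diff.norm(p=p, dim=1)
```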

3 Experiments

We applied the proposed algorithm to attack state-of-the-art DNN models on several image datasets. For each in-distribution sample $x_{in}$, an OOD sample $x_{out}$ is generated by the algorithm. To measure attack strength, the mean absolute percentage error is calculated as $MAPE(z_{out})=mean(|z_{out}-z_{in}|)/max(|z_{in}|)$, where $z_{out}=f(x_{out})$ and $z_{in}=f(x_{in})$. $|z_{out}-z_{in}|$ is an element-wise error vector, $mean(|z_{out}-z_{in}|)$ is the average error, and $max(|z_{in}|)$ is the maximum absolute value in the vector $z_{in}$. We also applied the algorithm to attack the Glow model on the CelebA dataset. In all of the evaluations, the L2 norm was used in the proposed algorithm. Pytorch was used to implement the algorithm, and Nvidia Titan V GPUs were used for model training and testing.
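The MAPE defined above can be computed in a couple of lines of PyTorch (a sketch; the function name is ours):

```python
import torch

def mape(z_out: torch.Tensor, z_in: torch.Tensor) -> torch.Tensor:
    """Mean absolute percentage error between latent codes, per the definition above."""
    return (z_out - z_in).abs().mean() / z_in.abs().max()
```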

3.1 Evaluation on a Subset of ImageNet

ILSVRC2012 ImageNet has over 1 million images in 1000 classes. Given the limited computing power, it is impractical to test the algorithm on the whole dataset; instead, we used a subset of 1000 images covering 200 categories. The size of each image is 224×224×3. Two CNN models pretrained on ImageNet were evaluated: Resnet-18 and Densenet-121, both available in Pytorch.

Figure 2: The 1st column shows three in-distribution samples (e.g., Santa Claus) $x_{in}$ and the corresponding scatter plots of $z_{in}$ (blue dots). The 2nd column shows OOD samples $x_{out}$ generated from a CT image $x_{out}^{\prime}$ and the corresponding scatter plots of $z_{out}$ (red) and $z_{in}$ (blue). The 3rd column shows OOD samples $x_{out}$ generated from a random image $x_{out}^{\prime}$ and the corresponding scatter plots of $z_{out}$ (red) and $z_{in}$ (blue). The 4th column shows OOD samples $x_{out}$ generated from an x-ray image $x_{out}^{\prime}$ and the corresponding scatter plots of $z_{out}$ (red) and $z_{in}$ (blue). MAPE values are embedded in these scatter plots. Please zoom in for better visualization.

The Resnet-18 latent space has 512 dimensions. Since ImageNet covers a wide variety of natural and artificial objects, we chose medical images and random-noise images to make sure that $x_{out}^{\prime}$ is indeed OOD. Using each of the three initial OOD samples (chest x-ray, lung CT, and random noise as $x_{out}^{\prime}$), we generated 1000 OOD samples paired with the 1000 in-distribution samples in the dataset and calculated MAPE values. The three MAPE histograms are shown in Fig. 3. Most of the MAPE values are less than 0.1%.

We also evaluated another CNN, Densenet-121, and obtained similar results. Its latent space has 1024 dimensions. Again, using each of the three initial OOD samples, 1000 OOD samples were generated for the samples in the dataset, and MAPE values were calculated. The three MAPE histograms are shown in Fig. 4. Most of the MAPE values are less than 0.1%, indicating a strong OOD attack.

From the results in Fig. 2 to Fig. 4, it can be seen that each of the two CNN models mapped significantly different OOD and in-distribution samples to almost the same locations in the latent space. Dimensionality reduction leads to the existence of such mappings, and our algorithm can find such OOD samples. In other words, the mapping from the input space to the latent space is many-to-one, not bijective. Therefore, it is almost guaranteed that such OOD samples exist, and they can break any OOD detector $d$ that computes a detection score $d(z)$ only from the latent space (z-space). We tested a classical OOD detection method using the maximum of the softmax output as the detection score (Hendrycks & Gimpel, 2017). The results are shown in Table 1; the AUROC scores are close to 0.5, showing that the method is unable to tell the difference between the 1000 OOD samples and the 1000 in-distribution samples.
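The AUROC reported here can be computed from the max-softmax scores roughly as follows (a sketch using scikit-learn; function and variable names are ours):

```python
import torch
from sklearn.metrics import roc_auc_score

def max_softmax_auroc(model, x_in_batch, x_out_batch):
    """AUROC of the baseline max-softmax detector (in-distribution vs OOD)."""
    with torch.no_grad():
        s_in = torch.softmax(model(x_in_batch), dim=1).max(dim=1).values
        s_out = torch.softmax(model(x_out_batch), dim=1).max(dim=1).values
    labels = [1] * len(s_in) + [0] * len(s_out)     # 1 = in-distribution
    scores = torch.cat([s_in, s_out]).cpu().numpy()
    return roc_auc_score(labels, scores)
```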

Figure 3: Left: the MAPE histogram using a chest x-ray as the initial OOD sample. Middle: the MAPE histogram using a lung CT image as the initial OOD sample. Right: the MAPE histogram using a random-noise image as the initial OOD sample. The results are from Resnet-18.
Figure 4: Left: MAPE histogram using a chest x-ray as the initial OOD sample. Middle: MAPE histogram using a lung CT image as the initial OOD sample. Right: MAPE histogram using a random-noise image as the initial OOD sample. The results are from Densenet-121.
Table 1: AUROC of two networks under OOD attack with each of x-ray, CT, and random noise as the initial OOD sample
              x-ray   CT      random-noise
Resnet-18     0.643   0.633   0.500
Densenet-121  0.638   0.651   0.500

3.2 Evaluation on CelebA Dataset

We tested the algorithm and the Glow model (Kingma & Dhariwal, 2018) on the CelebA dataset (human face images). The size of each image is 64×64×3. After training, the model was able to generate realistic face images. The model also outputs the negative log-likelihood (NLL) of the input sample, i.e., $NLL\left(x\right)=-\log\left(p\left(x\right)\right)$. By setting $f\left(x\right)=NLL\left(x\right)$, our algorithm can make $f\left(x_{out}\right)$ close to 0 or very large in order to match any $f\left(x_{in}\right)$, which renders the NLL score useless for OOD detection. To demonstrate the effectiveness of our algorithm, we randomly selected 160 in-distribution samples from the dataset. We used a color spiral image as the initial OOD sample $x_{out}^{\prime}$, with $NLL\left(x_{out}^{\prime}\right)=3.5268$. The distributions of $NLL(x_{in})$ from the 160 in-distribution samples and $NLL(x_{out})$ from the 160 corresponding OOD samples, together with some OOD sample images, are shown in Fig. 5. The two distributions are almost identical. More examples of OOD samples are shown in Fig. 6. In each row of Fig. 6, although the images have different NLL scores, they look like each other.

Figure 5: Top: NLL histogram (blue bars) of the in-distribution samples; Middle: NLL histogram (red bars) of the OOD samples; Bottom: some OOD samples with NLL from 0 to 1. The initial OOD sample is a spiral image.
Figure 6: Top: OOD samples generated by using one 2×2 checkerbox image for initialization; Bottom: OOD samples generated by using one 8×8 checkerbox image for initialization.

We have done more evaluations of our algorithm against OOD detection methods; please see the appendices.

4 Discussion

We hypothesized that dimensionality reduction in an encoder provides the opportunity for OOD and in-distribution samples to be mapped to the same locations in the latent space. We applied the OOD Attack algorithm to DNN classifiers on various datasets (see Appendices A and B), and the results (i.e., low MAPE values) confirmed our hypothesis. The results imply that classifier/encoder-based OOD detection methods may be vulnerable to the OOD attack.

By using our OOD Attack algorithm, we evaluated nine OOD detection methods (see Appendices C to J). The AUROC scores of these methods are close to 0.5 in our experiments, which means these methods could not distinguish between the in-distribution samples (e.g., CIFAR10) and the OOD samples generated by our algorithm. Our algorithm was unable to break a recent method named Certified Certain Uncertainty (Meinke & Hein, 2020) because this method utilizes Gaussian mixture models (GMMs) in the input space (note: no dimensionality reduction in GMMs). However, it is well known that GMMs have convergence issues for high-dimensional data (e.g., medical images).

Compared to adversarial attacks and defenses, it is much more difficult to defend against OOD attacks. Adversarial attacks and OOD attacks are doing completely different things to neural networks, although the attack algorithms may use similar optimization techniques. For image classification applications, an adversarial attack will add a small amount of noise to the input (clean) image, and the resulting noisy image is still human-recognizable. Therefore, the magnitudes of adversarial noises are constrained. For example, a noisy image of a panda is still an image of the panda. By the judgment of humans, the noisy image and the clean image are the images of the same object, and the two images should be classified into the same class. Compared to adversarial samples, OOD samples, which can be generated by our OOD Attack algorithm, have much more freedom (e.g. they can be random noises), as long as they do not look like in-distribution samples. Thus, OOD detection is very challenging.

We would like to point out that it is difficult to evaluate an OOD detector to “prove” that it can detect, say, 90% of the OOD samples by experimentally testing it on $\Omega_{out}$, because $\Omega_{out}$ is too large to be tested on: $\left|\Omega_{in}\right|\ll|\Omega_{out}|\approx|\Omega|$. For example, if Fashion-MNIST is used as in-distribution, then MNIST and Omniglot are usually used as OOD, which is the “standard” approach in the literature. Clearly, MNIST and Omniglot cannot cover $\Omega_{out}$, the space of OOD samples. If the image size is larger, then $|\Omega_{out}|$ becomes much larger. Could we design an evaluation method (experimental or analytical) that does not rely on OOD samples?

Before the OOD detection issue is fully resolved, for life-critical applications, any machine learning system that uses DNN classifiers should not make decisions independently and can only serve as assistants to humans. The OOD Attack algorithm and the experimental results can serve as a reference for the evaluation of new OOD detection methods.

We will release the code on GitHub when the paper is accepted. All figures are high-resolution; please zoom in.

References

  • Alemi et al. (2018) Alexander A Alemi, Ian Fischer, and Joshua V Dillon. Uncertainty in the variational information bottleneck. UAI 2018 - Uncertainty in Deep Learning Workshop, 2018.
  • Arcos-Garcia et al. (2018) Alvaro Arcos-Garcia, Juan A Alvarez-Garcia, and Luis M Soria-Morillo. Deep neural network for traffic sign recognition systems: An analysis of spatial transformers and stochastic optimisation methods. Neural Networks, 99:158–165, 2018.
  • Chalapathy & Chawla (2019) Raghavendra Chalapathy and Sanjay Chawla. Deep learning for anomaly detection: A survey. arXiv preprint arXiv:1901.03407, 2019.
  • Choi et al. (2018) Hyunsun Choi, Eric Jang, and Alexander A Alemi. Waic, but why? generative ensembles for robust anomaly detection. arXiv preprint arXiv:1810.01392, 2018.
  • Cohen et al. (2019) Joseph Paul Cohen, Paul Bertin, and Vincent Frappier. Chester: A web delivered locally computed chest x-ray disease prediction system. arXiv preprint arXiv:1901.11210, 2019.
  • Ding et al. (2020) Gavin Weiguang Ding, Yash Sharma, Kry Yik Chau Lui, and Ruitong Huang. Max-margin adversarial (mma) training: Direct input space margin maximization through adversarial training. International Conference on Learning Representations, 2020.
  • Du & Mordatch (2019) Yilun Du and Igor Mordatch. Implicit generation and generalization in energy-based models. Advances in Neural Information Processing Systems, 2019.
  • Eykholt et al. (2018) Kevin Eykholt, Ivan Evtimov, Earlence Fernandes, Bo Li, Amir Rahmati, Chaowei Xiao, Atul Prakash, Tadayoshi Kohno, and Dawn Song. Robust physical-world attacks on deep learning visual classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.  1625–1634, 2018.
  • Goodfellow et al. (2018) Ian Goodfellow, Patrick McDaniel, and Nicolas Papernot. Making machine learning robust against adversarial inputs. Communications of the ACM, 61(7):56–66, 2018.
  • Grathwohl et al. (2020) Will Grathwohl, Kuan-Chieh Wang, Jörn-Henrik Jacobsen, David Duvenaud, Mohammad Norouzi, and Kevin Swersky. Your classifier is secretly an energy based model and you should treat it like one. International Conference on Learning Representations, 2020.
  • He et al. (2016) Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp.  770–778, 2016.
  • Hendrycks & Gimpel (2017) Dan Hendrycks and Kevin Gimpel. A baseline for detecting misclassified and out-of-distribution examples in neural networks. International Conference on Learning Representations, 2017.
  • Hendrycks et al. (2019) Dan Hendrycks, Mantas Mazeika, and Thomas Dietterich. Deep anomaly detection with outlier exposure. International Conference on Learning Representations, 2019.
  • Kermany et al. (2018) Daniel S Kermany, Michael Goldbaum, Wenjia Cai, Carolina CS Valentim, Huiying Liang, Sally L Baxter, Alex McKeown, Ge Yang, Xiaokang Wu, Fangbing Yan, et al. Identifying medical diagnoses and treatable diseases by image-based deep learning. Cell, 172(5):1122–1131, 2018.
  • Kingma & Dhariwal (2018) Durk P Kingma and Prafulla Dhariwal. Glow: Generative flow with invertible 1x1 convolutions. In Advances in neural information processing systems, pp. 10215–10224, 2018.
  • Kurakin et al. (2016) Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533, 2016.
  • Lakshminarayanan et al. (2017) Balaji Lakshminarayanan, Alexander Pritzel, and Charles Blundell. Simple and scalable predictive uncertainty estimation using deep ensembles. In Advances in neural information processing systems, pp. 6402–6413, 2017.
  • Lee et al. (2018a) Kimin Lee, Honglak Lee, Kibok Lee, and Jinwoo Shin. Training confidence-calibrated classifiers for detecting out-of-distribution samples. International Conference on Learning Representations, 2018a.
  • Lee et al. (2018b) Kimin Lee, Kibok Lee, Honglak Lee, and Jinwoo Shin. A simple unified framework for detecting out-of-distribution samples and adversarial attacks. In Advances in Neural Information Processing Systems, pp. 7167–7177, 2018b.
  • Liang et al. (2018) Shiyu Liang, Yixuan Li, and Rayadurgam Srikant. Enhancing the reliability of out-of-distribution image detection in neural networks. International Conference on Learning Representations, 2018.
  • Madry et al. (2018) Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. International Conference on Learning Representations, 2018.
  • Meinke & Hein (2020) Alexander Meinke and Matthias Hein. Towards neural networks that provably know when they don’t know. International Conference on Learning Representations, 2020.
  • Nalisnick et al. (2019) Eric Nalisnick, Akihiro Matsukawa, Yee Whye Teh, Dilan Gorur, and Balaji Lakshminarayanan. Do deep generative models know what they don’t know? International Conference on Learning Representations, 2019.
  • Nguyen et al. (2015) Anh Nguyen, Jason Yosinski, and Jeff Clune. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp.  427–436, 2015.
  • Ren et al. (2019) Jie Ren, Peter J Liu, Emily Fertig, Jasper Snoek, Ryan Poplin, Mark Depristo, Joshua Dillon, and Balaji Lakshminarayanan. Likelihood ratios for out-of-distribution detection. In Advances in Neural Information Processing Systems, pp. 14707–14718, 2019.
  • Sastry & Oore (2019) Chandramouli Shama Sastry and Sageev Oore. Detecting out-of-distribution examples with in-distribution examples and gram matrices. NeurIPS 2019 Workshop on Safety and Robustness in Decision Making, 2019.
  • Serrà et al. (2019) Joan Serrà, David Álvarez, Vicenç Gómez, Olga Slizovskaia, José F Núñez, and Jordi Luque. Input complexity and out-of-distribution detection with likelihood-based generative models. International Conference on Learning Representations 2020, 2019.
  • Shi et al. (2020) Feng Shi, Jun Wang, Jun Shi, Ziyan Wu, Qian Wang, Zhenyu Tang, Kelei He, Yinghuan Shi, and Dinggang Shen. Review of artificial intelligence techniques in imaging data acquisition, segmentation and diagnosis for covid-19. IEEE reviews in biomedical engineering, 2020.
  • Soares et al. (2020) Eduardo Soares, Plamen Angelov, Sarah Biaso, Michele Higa Froes, and Daniel Kanda Abe. Sars-cov-2 ct-scan dataset: A large dataset of real patients ct scans for sars-cov-2 identification. medRxiv, 2020.
  • Wu & He (2018) Yuxin Wu and Kaiming He. Group normalization. In Proceedings of the European conference on computer vision (ECCV), pp.  3–19, 2018.

Appendix A Appendix

The parameters in each evaluation are listed.

1. Parameters for the evaluation on the ImageNet subset. Attacking Resnet-18: $\varepsilon=5$, $N=10^{4}$, $\alpha=\varepsilon/100$ with x-ray and CT; $\varepsilon=20$, $N=10^{4}$, $\alpha=\varepsilon/100$ with random noise. Attacking Densenet-121: $\varepsilon=5$, $N=10^{4}$, $\alpha=\varepsilon/100$ with x-ray and CT; $\varepsilon=30$, $N=10^{4}$, $\alpha=\varepsilon/100$ with random noise. We inspected the OOD images generated from random noise: they are not recognizable to human vision.

2. Parameters for the evaluation on the OCT dataset: $\varepsilon=10$, $N=10^{4}$, $\alpha=\varepsilon/100$ with the retinal fundus photography image; $\varepsilon=20$, $N=10^{4}$, $\alpha=\varepsilon/100$ with random noise.

3. Parameters for the evaluation on the COVID-19 CT dataset: $\varepsilon=20$, $N=10^{4}$, $\alpha=\varepsilon/100$.

4. Parameters for the evaluation on the GTSRB dataset: $\varepsilon=10$, $N=10^{4}$, $\alpha=\varepsilon/100$.

5. Parameters for the evaluation on the CelebA dataset: $\varepsilon=10$, $N=10^{4}$, $\alpha=\varepsilon/100$.

Appendix B Appendix

Figure 7: The 1st column shows in-distribution samples $x_{in}$ in the OCT dataset and the scatter plots of $z_{in}$ (blue dots). The 2nd column shows OOD samples $x_{out}$ generated from a retinal fundus photography image $x_{out}^{\prime}$ and the scatter plots of $z_{out}$ (red) and $z_{in}$ (blue). The 3rd column shows OOD samples $x_{out}$ generated from a random image $x_{out}^{\prime}$ and the scatter plots of $z_{out}$ (red) and $z_{in}$ (blue). MAPE values are embedded in these scatter plots. Please zoom in for better visualization.

B.1 Evaluation on OCT dataset

We tested our algorithm and Resnet-18 on a retinal optical coherence tomography (OCT) dataset (Kermany et al., 2018), which has four classes. Each image was resized to 224×224. 1000 samples per class were randomly selected to obtain a training set of 4000 samples; the test set has 968 images. We modified Resnet-18 for this four-class classification task. The latent space has 512 dimensions. After training, the Resnet-18 model achieved a classification accuracy $>95\%$ on the test set.

We used two reference images as the initial OOD sample $x_{out}^{\prime}$. The first reference image is a grayscale retinal image converted from an RGB color retinal fundus photography image. Compared to this retinal fundus photography image, the OCT images have unique patterns of horizontal “white bands”. We selected this OOD image on purpose: there may be a chance that both types of images are needed for retinal diagnosis. The second reference image is generated from random noise. Examples are shown in Fig. 7, and the two MAPE histograms are shown in Fig. 8. The results confirm that the algorithm can generate OOD samples (968) that are mapped by the DNN model to the locations of the in-distribution samples (968) in the latent space, i.e., $z_{out}\cong z_{in}$.

Figure 8: Left: MAPE histogram using a retinal fundus photography image as the initial OOD sample. Right: MAPE histogram using a random-noise image as the initial OOD sample. Please zoom in for better visualization.

B.2 Evaluation on COVID-19 CT Dataset

Figure 9: Left: MAPE histogram using a chest x-ray image as the initial OOD sample. Right: MAPE histogram using a random-noise image as the initial OOD sample. Please zoom in for better visualization.

We also tested our algorithm and Resnet-18 on a public COVID-19 lung CT (2D) image dataset (Soares et al., 2020). It contains 1252 CT scans (2D images) that are positive for COVID-19 infection and 1230 CT scans (2D images) from patients not infected by COVID-19, 2482 CT scans in total. From the infected cases, we randomly selected 200 samples for testing, 30 for validation, and 1022 for training. From the uninfected cases, we randomly selected 200 for testing, 30 for validation, and 1000 for training. Each image was resized to 224×224.

We modified the last layer of Resnet-18 for this binary classification task, infected vs. uninfected. We also replaced batch normalization with instance normalization because it is known that batch normalization is not stable for small batch sizes (Wu & He, 2018). The latent space still has 512 dimensions. We set the batch size to 32 and the number of training epochs to 100, and used the AdamW optimizer with the default parameters. After training, the model achieved a classification accuracy $>95\%$ on the test set.

We used two reference images as the initial OOD sample $x_{out}^{\prime}$: a chest x-ray image and a random-noise image. The two MAPE histograms are shown in Fig. 9; most of the MAPE values are less than 0.1%. The results also confirm that the algorithm can generate OOD samples (400) that are mapped by the DNN model to the locations of the in-distribution samples (400) in the latent space, i.e., $z_{out}\cong z_{in}$.

Examples are shown in Fig. 10. As reported in previous studies (Shi et al., 2020), infected regions in the images have a unique pattern called ground-glass opacity. The CT images in the 1st and 3rd rows show COVID-19 infections with ground-glass opacity in the upper-left area. The CT image in the 5th row does not show any sign of infection. It can be seen that the random-noise images and the COVID-19 CT images have the same feature vectors in the latent space, which is astonishing.

Figure 10: The 1st column shows in-distribution samples $x_{in}$ in the COVID-19 dataset and the scatter plots of $z_{in}$ (blue dots). The 2nd column shows OOD samples $x_{out}$ generated from a chest x-ray image $x_{out}^{\prime}$ and the scatter plots of $z_{out}$ (red) and $z_{in}$ (blue). The 3rd column shows OOD samples $x_{out}$ generated from a random image $x_{out}^{\prime}$ and the scatter plots of $z_{out}$ (red) and $z_{in}$ (blue). MAPE values are embedded in these scatter plots. Please zoom in for better visualization.

B.3 Evaluation on GTSRB Traffic Sign Dataset

We tested our algorithm and a state-of-the-art traffic sign classifier on the GTSRB dataset. The classifier is similar to the one in (Arcos-Garcia et al., 2018), which has a spatial-transformer network. The size of each image is 32×32×3, and the latent space has 128 dimensions. After training, the classifier achieved over 99% accuracy on the test set. We used a random-noise image as the initial OOD sample $x_{out}^{\prime}$ to generate 12630 OOD samples paired with the 12630 in-distribution samples in the test set. The MAPE histogram is shown in Fig. 11; most of the MAPE values are less than 0.1%. Examples are shown in Fig. 12.

It can be seen that the $z_{out}$ of the random-noise images are almost the same as the $z_{in}$ of the stop sign, the speed limit sign, and the turning signs. Not only can the classifier not tell the difference between a real traffic sign and a generated noise image, but any detector that computes an OOD score from the latent code $z$ will also fail. We note that the adversarial robustness of traffic sign classifiers has been studied (Eykholt et al., 2018): after adding adversarial noises to traffic sign images, the noisy images are still recognizable. OOD noises and adversarial noises are very different (discussed in Sections 2.1 and 2.2). Thus, it would be wise to disable any vision-based auto-pilot in your self-driving cars today until this issue is resolved.

Figure 11: Left: MAPE histogram. Right: zoom-in view of the histogram.
Figure 12: The 1st row shows four traffic sign images. The 3rd row shows the generated OOD images. The 2nd row shows the scatter plots of $z_{out}$ (red) and $z_{in}$ (blue). MAPE values are embedded in these scatter plots. Please zoom in for better visualization.

Appendix C Appendix

We applied our OOD Attack algorithm to test the OOD detection method named ODIN (Liang et al., 2018).

C.1 Summary of the ODIN method

The method applies temperature scaling to the logits (input to softmax), i.e., logits/$T$, which is Eq. (1) in the ODIN paper. The temperature $T$ could be in the range of 1 to 1000. The ODIN method also applies input preprocessing, which is Eq. (2) in the ODIN paper; the perturbation magnitude ($PM$) could be in the range of 0 to 0.004. The OOD score is defined to be the maximum of the softmax outputs from the neural network, given the preprocessed input. An OOD sample is expected to have a low OOD score.
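A minimal sketch of ODIN-style scoring based on the summary above (our own paraphrase of Eq. (1) and Eq. (2), not the authors' code; names are ours):

```python
import torch
import torch.nn.functional as F

def odin_score(model, x, T=1000.0, pm=0.001):
    """Max softmax of temperature-scaled logits after input preprocessing."""
    x = x.clone().requires_grad_(True)
    log_probs = F.log_softmax(model(x) / T, dim=1)
    # perturb the input to increase the max softmax score (Eq. (2) of ODIN)
    loss = -log_probs.max(dim=1).values.sum()
    loss.backward()
    x_pre = (x - pm * x.grad.sign()).detach()
    with torch.no_grad():
        return torch.softmax(model(x_pre) / T, dim=1).max(dim=1).values
```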

C.2 Evaluation on CIFAR10

A wide residual network with depth 28 and widening factor 10 is used in the ODIN paper. After training for 200 epochs, the model achieved a classification accuracy of 94.71% on the CIFAR10 test set.

In our algorithm, we set $f(x)$ to be the logits output of the model, given the preprocessed input. For the CIFAR10 dataset, the logits output contains 10 elements, which is a significant dimensionality reduction compared to the size of an input color image ($32\times 32\times 3$). The parameters of our algorithm are $\varepsilon=10$, $\alpha=\varepsilon/100$, $N=100$. The initial OOD sample is a random-noise image. For every sample in the CIFAR10 test set, the algorithm generated an OOD sample to match the logits output. The generated OOD samples look like random noise. The OOD scores of these samples were calculated by the ODIN method.

The results are reported in Table 2. Fig. 13 shows the OOD score histograms of the in-distribution and OOD samples when $T=1000$ and $PM=0.001$. When $T=1$ and $PM=0$, ODIN reduces to the Baseline method (Hendrycks & Gimpel, 2017).

Table 2: AUROC scores of ODIN on CIFAR10 vs OOD
          T=1     T=10    T=100   T=1000
PM=0      0.500   0.500   0.500   0.500
PM=0.001  0.500   0.500   0.500   0.500
PM=0.002  0.500   0.500   0.500   0.500
PM=0.004  0.500   0.500   0.500   0.500
Figure 13: The OOD score histograms of the in-distribution (blue) and OOD (red) samples when T=1000 and PM=0.001.

Appendix D Appendix

We applied our OOD Attack algorithm to test the OOD detection method named Mahalanobis (Lee et al., 2018b).

D.1 Summary of the Mahalanobis method

The method extracts feature maps from multiple layers of a neural network and applies average pooling per channel to reduce each feature map to a 1D feature vector. Then, the Mahalanobis distance is calculated between each feature vector and the corresponding mean vector. The distance values from all of the feature vectors are linearly combined to produce a single distance, i.e., the OOD score. The OOD score of an OOD sample is expected to be large. To further improve performance, the method applies input preprocessing with a given perturbation magnitude (PM), and the OOD score of the preprocessed input is obtained. The weights for combining the Mahalanobis distances from multiple layers could be determined on a validation set of OOD samples; in practice, it is impossible to obtain such a validation set. In our evaluation, we simply take the average of the distance values as the OOD score.
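A simplified sketch of the per-layer score is given below (class_means and precision would be estimated on training data; the names are ours, and the released implementation differs in details). The OOD score used in our evaluation is the average of this quantity over the selected layers.

```python
import torch

def layer_mahalanobis_score(feat_map, class_means, precision):
    """Per-layer score: channel-wise average pooling, then the minimum
    Mahalanobis distance to the class mean vectors."""
    v = feat_map.mean(dim=(2, 3))                        # (batch, channels)
    dists = []
    for mu in class_means:                               # one mean vector per class
        d = v - mu
        dists.append(torch.einsum('bi,ij,bj->b', d, precision, d))
    return torch.stack(dists, dim=1).min(dim=1).values   # (batch,)
```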

Although the feature maps from multiple layers are utilized, the method still does dimensionality reduction to those feature maps (e.g. averaging). Therefore, the method is breakable by our OOD Attack algorithm.

D.2 Evaluation on CIFAR10 and CIFAR100

The neural network model is a residual network named Resnet34 in the Mahalanobis paper, and by changing the number of outputs, it can be used for both CIFAR10 and CIFAR100. We used the pre-trained models that are available online at https://github.com/pokaxpoka/deep_Mahalanobis_detector/. The layers used for feature extraction are exactly the same as those in the source code of the method.

In our algorithm, we have two different settings for $f(x)$: (1) the OOD score itself, and (2) the concatenation of the feature vectors, given the original (not preprocessed) input. We ran experiments with both settings. The parameters are $\varepsilon=10$, $\alpha=\varepsilon/100$, $N=1000$ for all experiments. The initial OOD sample is a random-noise image. For every sample in the test set, the algorithm generated an OOD sample to match the corresponding output. The generated OOD samples look like random noise. The OOD scores of these samples were calculated by the Mahalanobis method.

The results on the two datasets are reported in Table 3 and Table 4. Fig. 14 shows the OOD score histograms of the in-distribution and OOD samples when the in-distribution dataset is CIFAR10, PM=0.01, and $f(x)$ = OOD score.

Table 3: AUROC scores of Mahalanobis on CIFAR10 vs OOD
          $f(x)$ = OOD score of $x$   $f(x)$ = feature concatenation
PM=0      0.500                       0.467
PM=0.01   0.500                       0.179
Table 4: AUROC scores of Mahalanobis on CIFAR100 vs OOD
          $f(x)$ = OOD score of $x$   $f(x)$ = feature concatenation
PM=0      0.500                       0.604
PM=0.01   0.500                       0.377
Figure 14: The OOD score histograms of the in-distribution (blue) and OOD (red) samples. The in-distribution dataset is CIFAR10, PM=0.01, and $f(x)$ = OOD score of $x$.

Fig. 15 shows the OOD score histograms of the in-distribution and OOD samples when the in-distribution dataset is CIFAR10, PM=0.01, and $f(x)$ = feature concatenation. It can be seen that the OOD samples have smaller distances, which is caused by input preprocessing.

Figure 15: The OOD score histograms of the in-distribution (blue) and OOD (red) samples. The in-distribution dataset is CIFAR10, PM=0.01, and $f(x)$ = feature concatenation.

Appendix E Appendix

We applied our OOD Attack algorithm to test the OOD detection method named Outlier Exposure (Hendrycks et al., 2019).

E.1 Summary of the Outlier Exposure method

The method trains a neural network not only on the standard training set (in-distribution) but also on an auxiliary dataset of outliers (OOD samples). The paper states that the OOD score is the cross-entropy between a uniform distribution and the softmax output distribution. In the actual implementation (i.e., the source code of the method), the OOD score is the average of the logits minus the logsumexp of the logits. In our evaluation, we used the actual implementation in the source code.
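The score used in our evaluation can be sketched as follows (our own paraphrase of the released implementation; the function name is ours):

```python
import torch

def outlier_exposure_score(logits: torch.Tensor) -> torch.Tensor:
    """OOD score as summarized above: mean of the logits minus their logsumexp."""
    return logits.mean(dim=1) - torch.logsumexp(logits, dim=1)
```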

E.2 Evaluation on SVHN, CIFAR10 and CIFAR100

Wide residual networks are used in the Outlier Exposure paper. We downloaded the source code and pre-trained weights from https://github.com/hendrycks/outlier-exposure. The models were trained from scratch using Outlier Exposure and are named “oe_scratch” by the authors.

In our algorithm, we set $f(x)$ to be the logits (input to softmax) from each model. The parameters are $\varepsilon=10$, $\alpha=\varepsilon/100$, $N=10^{4}$ for all experiments. The initial OOD sample is a random-noise image. For every sample in each test set, the algorithm generated an OOD sample to match the logits output. The generated OOD samples look like random noise. The OOD scores of these samples were calculated by the Outlier Exposure method.

The results are reported in Table 5. The OOD score histograms of the in-distribution and OOD samples are shown in Fig. 16, where the in-distribution dataset is CIFAR10.

Table 5: AUROC of Outlier Exposure on three datasets
SVHN vs OOD CIFAR10 vs OOD CIFAR100 vs OOD
0.500 0.500 0.500
Figure 16: The OOD score histograms of the in-distribution (blue) and OOD (red) samples. The in-distribution dataset is CIFAR10.

Appendix F Appendix

We applied our OOD Attack algorithm to test the OOD detection method named Deep Ensemble (Lakshminarayanan et al., 2017).

F.1 Summary of the Deep Ensemble method

A Deep Ensemble is a collection of neural network models working together for a classification task. The output of a Deep Ensemble is a probability distribution over the classes, which is the average of the probability/softmax outputs of the individual models. In the experiments of the original paper, the number of models in an ensemble is 5. To further improve performance, adversarial training is applied to the models. The OOD score is defined to be the entropy of the probability distribution from the Deep Ensemble; the entropy is expected to be large for an OOD sample.
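A sketch of the ensemble entropy score (our own illustration; the models argument is assumed to be a list of trained classifiers):

```python
import torch

def deep_ensemble_entropy(models, x):
    """Entropy of the averaged softmax output; expected to be large for OOD samples."""
    with torch.no_grad():
        probs = torch.stack([torch.softmax(m(x), dim=1) for m in models]).mean(dim=0)
    return -(probs * torch.log(probs + 1e-12)).sum(dim=1)
```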

F.2 Evaluation on CIFAR10

The authors of the Deep Ensemble method did not provide source code or trained models. Therefore, we used pre-trained models from a recent work on adversarial robustness (Ding et al., 2020), which presented a state-of-the-art adversarial training method. Six pre-trained models were downloaded from https://github.com/BorealisAI/mma_training/tree/master/trained_models. The names of the models are cifar10-L2-MMA-1.0-sd0, cifar10-L2-MMA-2.0-sd0, cifar10-L2-OMMA-1.0-sd0, cifar10-L2-OMMA-2.0-sd0, cifar10-Linf-MMA-12-sd0, and cifar10-Linf-OMMA-12-sd0. The models were trained on CIFAR10 to be robust against adversarial noises over a large range. The classification accuracy of the ensemble on the test set is 89.85%.

In our algorithm, we set $f(x)$ to be the concatenation of the logits from each of the six models. The parameters are $\varepsilon=10$, $\alpha=\varepsilon/100$, $N=10^{4}$ for all experiments. The initial OOD sample is a random-noise image. For every sample in the test set, the algorithm generated an OOD sample to match the concatenated logits. The generated OOD samples look like random noise. The OOD scores of these samples were calculated by the Deep Ensemble method.

The AUROC of the Deep Ensemble method is 0.500 on CIFAR10 vs OOD. The OOD score histograms of the in-distribution and OOD samples are shown in Fig. 17.

Figure 17: The OOD score histograms of the in-distribution (blue) and OOD (red) samples.

Appendix G Appendix

We applied our OOD Attack algorithm to test the OOD detection method that builds Confidence-Calibrated Classifiers (Lee et al., 2018a).

G.1 Summary of the OOD detection method

The method jointly trains a classification network and a generative neural network (i.e., a GAN) that generates OOD samples for training the classification network. Given an input, the OOD score is defined to be the maximum of the softmax outputs from the classification network. The OOD score is expected to be low for an OOD sample.

G.2 Evaluation on SVHN and CIFAR10

The neural network model is VGG13, and the source code of the method is provided by the authors at https://github.com/alinlab/Confident_classifier. We downloaded the code and trained one VGG13 model with a GAN on SVHN and another VGG13 model with a GAN on CIFAR10, using the parameters in the source code. VGG13 has a feature module and a classifier module.

In our algorithm, we set $f(x)$ to be the vector input to the classifier module of VGG13. The parameters are $\varepsilon=10$, $\alpha=\varepsilon/100$, $N=10^{4}$ for all experiments. The initial OOD sample is a random-noise image. For every sample in the test set, the algorithm generated an OOD sample to match the vector input to the classifier module. The generated OOD samples look like random noise. The OOD scores of these samples were calculated by the OOD detection method.

The results are reported in Table 6. The OOD score histograms of the in-distribution and OOD samples are shown in Fig. 18, where the in-distribution dataset is CIFAR10.

Table 6: AUROC of the method on two datasets.
SVHN vs OOD: 0.501
CIFAR10 vs OOD: 0.576
Figure 18: The OOD score histograms of the in-distribution (blue) and OOD (red) samples. The in-distribution dataset is CIFAR10.

Appendix H

We applied our OOD Attack algorithm to test the OOD detection method named Gram (Sastry & Oore, 2019).

H.1 Summary of the Gram method

The method extracts the feature map $F_{l}$ from every layer of a network and then computes the p-th order Gram matrix $G_{l}^{p}=\left(F_{l}^{p}{F_{l}^{p}}^{T}\right)^{1/p}$, where the power and the root are applied element-wise. Gram matrices with different values of $p$ from different layers are then used to compute the OOD score, which the authors call the Total Deviation of the input sample. An OOD sample is expected to have a high OOD score.
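A direct transcription of this formula for one layer (a sketch written for this appendix; $F_{l}$ is assumed to be flattened to a channels-by-pixels matrix):

import torch

def gram_matrix_p(feature_map, p):
    # feature_map: (C, H, W) activation of one layer; flatten the spatial dimensions.
    F_l = feature_map.flatten(start_dim=1)   # shape (C, H*W)
    F_p = F_l ** p                           # element-wise p-th power
    G = F_p @ F_p.t()                        # (C, C) Gram matrix
    return G ** (1.0 / p)                    # element-wise p-th root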

H.2 Resolving a numerical problem

The formula of the p-th order Gram matrix can be written as $A=B^{1/p}$. The Gram matrices caused the gradients to become inf or nan during back-propagation in the OOD Attack algorithm. To resolve this problem, we tried three tricks (a minimal sketch of trick (b) is given after the list):

(a) use double precision (float64)

(b) rewrite the formula as $A=\exp\left(\frac{1}{p}\log(B+\mathrm{eps})\right)$ where eps = 1e-40

(c) use the equation in (b) to generate images during OOD attack and use the original equation $A=B^{1/p}$ to compute OOD scores.
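The numerically stable rewrite in trick (b) can be expressed as follows (a minimal sketch written for this appendix):

import torch

def stable_p_th_root(B, p, eps=1e-40):
    # Trick (b): compute A = B^(1/p) as exp(log(B + eps) / p), which keeps the
    # gradients finite during back-propagation; optionally cast B to float64 (trick (a)).
    return torch.exp(torch.log(B + eps) / p)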

The above tricks work for $p$ in the range of 1 to 5. For larger $p$, we still encountered numerical problems (inf or nan). As shown in Fig. 2 of the Gram paper, the method already outperforms the Mahalanobis method when the maximum value of $p$ is 5. Thus, we set the maximum value of $p$ to 5 in our experiments.

H.3 Evaluation on CIFAR10 and CIFAR100

The source code and pre-trained Resnet models are provided by the authors at https://github.com/VectorInstitute/gram-ood-detection

Because of how the method computes its per-layer statistics, it is difficult to process minibatches in parallel, so we set batch_size = 1. The computation is very time-consuming, and therefore we used the first 500 samples of the CIFAR10 test set and the first 500 samples of the CIFAR100 test set in our experiments.

In our algorithm, we set $f(x)$ to be the OOD score of $x$, and the parameters are $\varepsilon=10$, $\alpha=\varepsilon/100$, and $N=100$. The initial OOD sample is a random-noise image. For every in-distribution sample, the algorithm generated an OOD sample to match the OOD score. The generated OOD samples look like random noise. The OOD scores of these samples were calculated by the Gram method.

The results are reported in Table 7. The OOD score histograms of the in-distribution and OOD samples are shown in Fig. 19, where the in-distribution dataset is CIFAR10. The results show that the OOD score from the Gram method can be arbitrarily manipulated by our algorithm.

Table 7: AUROC of Gram on two datasets.
CIFAR10 vs OOD: 0.500
CIFAR100 vs OOD: 0.500
Figure 19: The OOD score histograms of the in-distribution (blue) and OOD (red) samples. The in-distribution dataset is CIFAR10.

Appendix I

We applied our OOD Attack algorithm to test the OOD detection method based on Glow (Serrà et al., 2019).

I.1 Summary of the OOD detection method

The method combines the Glow negative log-likelihood (NLL) with an input-complexity estimate. The input image is compressed with PNG, and the input complexity $L$ is measured in bits per dimension, where the "bits" are the number of bits of the compressed image and the dimension is the total number of pixels per image. The OOD score is NLL $- L$.
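The complexity term and the final score can be sketched as follows (written for this appendix; the NLL is assumed to already be expressed in bits per dimension, and the denominator follows the pixel-count convention described above, which may differ slightly from the original paper's normalization):

import io
from PIL import Image

def glow_png_ood_score(nll_bits_per_dim, image_uint8):
    # image_uint8: HxWxC uint8 array. L = size of the PNG-compressed image in bits,
    # divided by the number of pixels (bits per dimension).
    buf = io.BytesIO()
    Image.fromarray(image_uint8).save(buf, format="PNG", optimize=True)
    n_bits = 8 * len(buf.getvalue())
    L = n_bits / float(image_uint8.shape[0] * image_uint8.shape[1])
    # OOD score used by the method: NLL minus complexity.
    return nll_bits_per_dim - L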

I.2 Evaluation on CelebA

The source code of the Glow model was downloaded from https://github.com/rosinality/glow-pytorch, and we trained it from scratch on the CelebA dataset, in which the size of each face image is $64\times 64\times 3$. After training, the model was able to generate realistic face images. For method evaluation, we randomly selected 160 samples from the dataset because the computational cost is very high.

In our algorithm, we set $f(x)$ to be the OOD score of $x$, and the parameters are $\varepsilon=10$, $\alpha=\varepsilon/100$, and $N=10^{4}$. The initial OOD sample is a color spiral image. For every in-distribution sample, the algorithm generated an OOD sample to match the OOD score. The generated OOD samples look like color spirals, i.e., not face images. The OOD scores of these samples were calculated by the OOD detection method.

The AUROC of the method is 0.500 on CelebA vs OOD. The OOD score histograms of the in-distribution and OOD samples are shown in Fig. 20. The result indicates that NLL combined with input complexity can still be arbitrarily manipulated.

Figure 20: The OOD score histograms of the in-distribution (blue) and OOD (red) samples.

Appendix J

We applied our OOD Attack algorithm to test the OOD detection method using an energy-based model named JEM (Grathwohl et al., 2020).

J.1 Summary of the OOD detection method using JEM

In the JEM paper, it was shown that a standard classifier can be trained to be an energy-based model (EBM). Using the EBM, three types of OOD scores can be obtained for an input sample: (1) the log-likelihood $\log p(x)$, (2) the maximum of the softmax classification output, i.e., $\max_{y}p(y|x)$, and (3) $-\left\|\frac{\partial \log p(x)}{\partial x}\right\|_{2}$. An OOD sample is expected to have a low OOD score.
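The three scores can be sketched as follows (written for this appendix; it uses the JEM identity that $\log p(x)$ equals the logsumexp of the logits up to an additive constant, and the model/variable names are illustrative):

import torch

def jem_ood_scores(model, x):
    # JEM reinterprets a classifier as an EBM: log p(x) = logsumexp_y(logits) - log Z.
    x = x.clone().detach().requires_grad_(True)
    logits = model(x)
    log_px = torch.logsumexp(logits, dim=1)                   # score (1), up to log Z
    max_py = torch.softmax(logits, dim=1).max(dim=1).values   # score (2)
    grad = torch.autograd.grad(log_px.sum(), x)[0]
    grad_score = -grad.flatten(start_dim=1).norm(p=2, dim=1)  # score (3)
    return log_px, max_py, grad_score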

J.2 Evaluation on CIFAR10

A wide residual network pretrained on CIFAR10 is available at https://github.com/wgrathwohl/JEM.

In our algorithm, we set $f(x)$ to be the logits (i.e., the input to the softmax for classification), and the parameters are $\varepsilon=10$, $\alpha=\varepsilon/100$, and $N=10^{3}$. The initial OOD sample is a color spiral image. For every in-distribution sample, the algorithm generated an OOD sample to match the logits output. The generated OOD samples look very strange and do not belong to any of the 10 classes of CIFAR10. The OOD scores of these samples were calculated by the OOD detection method.

We note that we also tried using random noise as the initial OOD sample, but many of the generated images looked like images in the CIFAR10 dataset; therefore, we used a color spiral image as the initial OOD sample.

The results are reported in Table 8. The OOD score histograms of the in-distribution and OOD samples are shown in Fig. 21, Fig. 22, and Fig. 23. We note that when using $-\left\|\frac{\partial \log p(x)}{\partial x}\right\|_{2}$ as the OOD score, the AUROC is 0.203. One might think that flipping the sign of this OOD score would raise the AUROC to 0.797; however, doing so would drive the AUROC scores in the last row of Table 3 of the JEM paper close to 0 for the OOD detection experiments done by the authors.

Table 8: AUROC of the OOD detection method on CIFAR10 vs OOD.
OOD score $\log p(x)$: AUROC = 0.559
OOD score $\max_{y}p(y|x)$: AUROC = 0.513
OOD score $-\left\|\frac{\partial \log p(x)}{\partial x}\right\|_{2}$: AUROC = 0.203
Figure 21: The OOD score ($\log p(x)$) histograms of the in-distribution (blue) and OOD (red) samples.
Figure 22: The OOD score ($\max_{y}p(y|x)$) histograms of the in-distribution (blue) and OOD (red) samples.
Figure 23: The OOD score ($-\left\|\frac{\partial \log p(x)}{\partial x}\right\|_{2}$) histograms of the in-distribution (blue) and OOD (red) samples.

Fig. 24 shows an example of the loss curve over 1000 iterations of the OOD Attack algorithm.

Figure 24: An example of the loss curve from the OOD Attack algorithm.

Fig. 25 shows some of the generated images, which look like Frankenstein's monsters: parts of objects put together at random, twisted and deformed, and then covered with paint. It may be difficult for neural networks to learn what an object (e.g., an airplane) is just from images and class labels.

Figure 25: Examples of the generated images.

Energy-based models (EBMs), such as JEM, can generate OOD samples during training, which may explain why the OOD attack failed when the initial OOD sample was random noise. A closer look at the sampling procedure (e.g., Langevin dynamics) and the objective function shows that the EBM training algorithm pulls down the energy scores of positive (in-distribution) samples and pulls up the energy scores of negative (generated/OOD) samples (Du & Mordatch, 2019), which is similar in spirit to adversarial training. From this perspective, OOD detection using EBMs could be a promising direction if the computational cost is acceptable, and the remaining challenge is how to train a neural network to learn what an object is.
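A schematic sketch of this training signal (short-run Langevin sampling plus a contrastive objective; written for this appendix with illustrative step sizes and noise scales, not the exact procedure of any specific EBM paper):

import torch

def langevin_sample(energy, x_init, n_steps=20, step_size=1.0, noise_std=0.01):
    # Short-run Langevin dynamics: move samples toward low energy while injecting noise.
    x = x_init.clone().detach()
    for _ in range(n_steps):
        x.requires_grad_(True)
        grad = torch.autograd.grad(energy(x).sum(), x)[0]
        x = (x - step_size * grad + noise_std * torch.randn_like(x)).detach()
    return x

def ebm_contrastive_loss(energy, x_pos, x_init):
    # Pull down the energy of positive (in-distribution) samples and pull up the
    # energy of negative samples produced by the sampler.
    x_neg = langevin_sample(energy, x_init)
    return energy(x_pos).mean() - energy(x_neg).mean()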