
A Global Model Approach to Robust Few-Shot SAR Automatic Target Recognition

Nathan Inkawhich
Air Force Research Laboratory
Abstract

In real-world scenarios, it may not always be possible to collect hundreds of labeled samples per class for training deep learning-based SAR Automatic Target Recognition (ATR) models. This work specifically tackles the few-shot SAR ATR problem, where only a handful of labeled samples may be available to support the task of interest. Our approach is composed of two stages. In the first, a global representation model is trained via self-supervised learning on a large pool of diverse and unlabeled SAR data. In the second stage, the global model is used as a fixed feature extractor and a classifier is trained to partition the feature space given the few-shot support samples, while simultaneously being calibrated to detect anomalous inputs. Unlike competing approaches which require a pristine labeled dataset for pretraining via meta-learning, our approach learns highly transferable features from unlabeled data that have little-to-no relation to the downstream task. We evaluate our method in standard and extended MSTAR operating conditions and find it to achieve high accuracy and robust out-of-distribution detection in many different few-shot settings. Our results are particularly significant because they show the merit of a global model approach to SAR ATR, which makes minimal assumptions, and provides many axes for extendability.

Index Terms:
Few Shot Learning, Automatic Target Recognition, Out-of-Distribution Detection, Deep Learning.

I Introduction

Most modern Synthetic Aperture Radar (SAR) Automatic Target Recognition (ATR) algorithms use Deep Neural Networks (DNNs) to perform the underlying recognition function [1]. While DNNs have been shown to be very effective in ideal, laboratory-like conditions, they present several challenges when considered for deployment in “real-world” scenarios. One challenge is that DNNs are data-hungry, often requiring hundreds to thousands of training samples per class to achieve state-of-the-art accuracy. Another key challenge is that DNNs are notoriously susceptible to producing erroneous predictions on anomalous/out-of-distribution (OOD) inputs (i.e., inputs that fall outside of the categories in the training distribution). These challenges spark two main concerns. First, what if we, as the model practitioners, cannot obtain hundreds of labeled samples per class for training, and instead can only come up with ~5 (i.e., a few shots)? And second, in these few-shot settings, does OOD detection become harder because the models are primarily seeking generalization and have not been given enough information to learn highly detailed/nuanced representations of the in-distribution (ID) classes?

Figure 1: Overview of our two-stage approach. In Stage 1 we train the feature extractor on a pool of unlabeled data and in Stage 2 we learn a classifier on top of the feature extractor for a given few shot learning problem.

Our goal in this paper is to address the stated concerns by developing an accurate and robust SAR target classifier given only a few labeled samples per class. We also strive to keep a minimal assumption set to prioritize applicability and extendability. Our high level approach involves two stages and is shown in Fig. 1. In the first stage, a global representation model is trained on a diverse and unlabeled pool of SAR data such that it is capable of extracting quality features from (nearly) any SAR chip. In Stage 2, for an arbitrary downstream (M-way, N-shot) few-shot learning (FSL) task we train a light-weight classifier using the global model as a fixed feature extractor.

A key part of our methodology is how we train the global model. Importantly, we do not assume to have labeled data in this step, which eliminates the option for a meta-learning approach [2, 3]. To learn quality representations we instead aggregate publicly available data from recent papers and leverage a modern unsupervised learning paradigm called Self-Supervised Learning (SSL). Specifically, we use the SimCLR [4] method to represent the SSL class of algorithms in this work. SSL has shown great promise for transfer and few-shot learning in the natural imagery domain [5], but has not been studied widely in the context of SAR ATR. Our hypothesis is that these modern algorithms can learn a highly flexible, expressive, and transferable feature extractor from unlabeled SAR data sourced from different sensors, imaging modes, polarizations, resolutions, and target types. Further, the unlabeled data does not necessarily have to represent the anticipated downstream conditions. This is contrary to several lines of concurrent research whose goal is to generate high-fidelity synthetic imagery as close to the downstream task conditions as possible for supervised pre-training [6].

The second key part of our methodology is how we improve robustness to OOD inputs. This is accomplished in the classifier training stage and given our stated operating environment it is nearly a “free-lunch.” Practically, we can use the SSL model’s pretraining dataset as an OOD dataset for outlier exposure (OE) training [7, 8]. Intuitively, the classifier is taught to be accurate and confident on labeled task data while being minimally confident on the OOD data. This objective creates a confidence calibration effect which can be exploited to detect novel OOD samples during deployment [9].

Using MSTAR [10] as our primary few-shot test environment, we find that our methodology can generate highly accurate models in both standard and extended operating conditions. We also find that by OE training in Stage 2, our models can reliably detect a spectrum of fine- to coarse-grained OOD types unseen during training. Finally, we uncover and discuss an important trade-off between OOD generalization and detection. Overall, we make the following contributions:

  • We show a proof-of-concept that a highly transferable global SAR feature extractor can be trained without labels on diverse data and used to great effect in downstream FSL tasks;

  • We develop a robust FSL classification scheme that can reliably detect and reject OOD inputs at test time without adding overhead or assumptions;

  • We uncover a critical trade-off between OOD detection and generalization that directly motivates future work.

II Methodology

II-A Global Model Training

II-A1 Data Collection

Our global model approach is highly data driven. In order to learn a generalized and transferable SAR feature extractor, intuition says we must train it on data from different sensors, imaging modes, polarizations, resolutions, target/scene types, etc. Thus, the first step in our methodology is to aggregate data from the following public sources: SAR-Ships [11]; HRSID [12]; FUSAR [13]; SSDD [14]; LS-SSDD [15]; Dual-Pol Ships [16]; SRSDD [17]; and CVDome [18]. After the necessary pre-processing (e.g., chipping from a full frame), we are left with ~100k unlabeled SAR chips, which we refer to as 𝒟_pretrain. For reference, Fig. 2 shows a handful of samples from this set. An important note is that 90% of the 𝒟_pretrain chips contain ship-like targets and none of the chips contain MSTAR land-vehicle targets. Given that MSTAR will be our downstream task for FSL evaluation, this sets up a true test of generalization. However, in practice, if data resembling the anticipated downstream task were available, it could easily be leveraged here.

II-A2 Representation Learning

With 𝒟_pretrain curated we can execute Stage 1, which is to train the global model feature extractor f_ϕ. We use the powerful SimCLR [4] contrastive learning algorithm, which trains by enforcing that two “views” of the same image lie near each other in a feature space, while views of different images lie far apart. Each view is created via aggressive stochastic augmentations that preserve the underlying semantic features but force the model to learn an informative feature set by minimizing the contrastive objective. We specifically leverage the normalized temperature-scaled cross-entropy loss (NT-Xent) from [4], and refer the reader to that work for a more technical description of the algorithm and to [19] for a PyTorch implementation.
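To make the training objective concrete, below is a minimal sketch of an NT-Xent loss, assuming a batch of 2N projector outputs arranged so that rows i and i+N are the two views of the same chip; this is an illustration under those assumptions, not the exact implementation of [19].

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z, temperature=0.01):
    """NT-Xent over 2N projections, where z[i] and z[i + N] are two views of chip i."""
    n = z.shape[0] // 2
    z = F.normalize(z, dim=1)                      # compare in cosine-similarity space
    sim = z @ z.t() / temperature                  # (2N, 2N) similarity logits
    sim.fill_diagonal_(float('-inf'))              # a view is never its own positive
    # the positive for row i is its counterpart view at i+N (and vice versa)
    targets = torch.arange(2 * n, device=z.device).roll(n)
    return F.cross_entropy(sim, targets)
```

The default temperature of 0.01 matches the value reported in the experimental setup (Sec. III).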

Perhaps the most unique part about the application of SimCLR (and many other SSL algorithms) to the SAR ATR setting is the composition of augmentations needed to create informative views. Since the SimCLR algorithm was developed and tested in the natural imagery domain, the standard augmentation pipeline contains transforms that are relevant to RGB images. However, several of the key augmentations such as color jittering and random gray-scaling [19] do not fit with our magnitude-only gray-scale SAR modality. Thus, we create a new augmentation pipeline made of seven image-level transformations that work to create translation invariance, and to manipulate the signal-to-noise ratio and frequency content of the chips. Included are random resized cropping, random horizontal flipping, element-wise power-scaling, Gaussian noising and filtering, and linear re-scaling. See Supplemental A for more information about our transformation pipeline. Lastly, we mention that we do not believe our design choices made in this section are “optimal.” Important areas of future work are to experiment with other SSL algorithms (see Supplemental C) and to develop more domain-relevant augmentations that may lead to higher quality learned features.

Figure 2: Samples from 𝒟_pretrain, sourced from: SAR-Ships [11]; HRSID [12]; FUSAR [13]; SSDD [14]; LS-SSDD [15]; Dual-Pol Ships [16]; SRSDD [17]; and CVDome [18].
TABLE I: MSTAR Standard Operating Condition Accuracy Results (with 95% confidence interval)
2-way 5-way 10-way
Method 1-shot 5-shot 10-shot 20-shot 25-shot 1-shot 5-shot 10-shot 20-shot 25-shot 1-shot 5-shot 10-shot 20-shot 25-shot
Scratch-AConv 59.6±1.2 73.6±1.2 84.3±0.9 92.1±0.7 93.8±0.6 31.6±0.6 55.9±0.8 72.3±0.8 85.7±0.5 88.5±0.4 20.9±0.3 47.4±0.4 65.0±0.4 80.7±0.2 84.5±0.2
Scratch-RN18 55.8±1.2 74.9±1.4 84.1±1.1 89.6±0.8 92.1±0.6 30.6±0.7 53.7±0.8 67.0±0.7 79.3±0.6 83.2±0.5 20.7±0.3 43.4±0.4 58.9±0.4 74.0±0.4 78.5±0.3
SimCLR+basic 64.5±1.4 83.0±1.1 91.4±0.7 96.1±0.4 97.4±0.2 38.8±0.7 68.4±0.7 82.6±0.5 91.8±0.3 94.0±0.2 28.3±0.3 59.2±0.3 75.2±0.2 87.7±0.1 90.6±0.1
SimCLR+OE 65.4±1.5 86.4±1.0 92.2±0.8 96.5±0.4 97.5±0.2 39.9±0.7 69.7±0.6 83.3±0.5 92.3±0.3 93.8±0.2 28.8±0.3 60.5±0.3 75.6±0.2 87.0±0.1 89.3±0.1

II-B Classifier Training

In Stage 2 (ref. Fig. 1) we use the global model f_ϕ as a fixed feature extractor and learn a classifier c_θ for the given (M-way, N-shot) FSL task, described by the support set 𝒟_support = {(x_i, y_i)}, i = 1, …, M×N. To be clear, “ways” refers to the number of categories in the label space and “shots” is the number of labeled training chips per category. Unlike related works in the natural imagery domain that use simple linear regression or nearest-neighbor-style classifiers [5], we find it beneficial to utilize a 2-layer neural network, which has the flexibility to learn non-linear decision boundaries.
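For illustration, a minimal sketch of such a classifier head is shown below; the hidden width and activation are assumptions, as the exact layer sizes are not specified in the text.

```python
import torch.nn as nn

class FewShotHead(nn.Module):
    """2-layer non-linear classifier trained on frozen global-model features."""
    def __init__(self, feat_dim=512, hidden_dim=256, num_ways=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, num_ways),
        )

    def forward(self, features):      # features = f_phi(x); no gradient flows to the backbone
        return self.net(features)     # class logits
```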

We specifically consider two ways to train c_θ. The first, called the basic method, is to train under a vanilla supervised learning objective which minimizes empirical risk on 𝒟_support samples only. Mathematically, the basic training method is described as

\min_{\theta}\;\mathbb{E}_{(x,y)\sim\mathcal{D}_{support}}\big[L(c_{\theta}(f_{\phi}(x)),y)\big],   (1)

where L is the cross-entropy loss between the classifier’s prediction and the ground truth label y. The intuition for this training method is to learn a classifier that produces accurate and confident predictions on the ID data.

While the basic method can achieve high ID accuracy, it gives no guidance to the model for how to behave when an OOD input is encountered. Without violating or adding any assumptions, we can do something clever to greatly improve the classifier’s ability to handle OOD inputs. Specifically, we can re-purpose the 𝒟_pretrain dataset from Stage 1 as an outlier exposure (OE) set in Stage 2. The intuition for OE [7] is to teach the model a confidence calibration on ID and OOD inputs – it should make accurate and confident predictions on ID data and minimally confident predictions on OOD data. Functionally, the minimum-confidence state is when the predicted probability is 1/#classes, so to achieve the desired effect we set the training targets of the OE data to be a Uniform distribution over the classes, 𝒰_𝒞, while the targets of the ID data remain 1-hots. The complete OE objective is

\min_{\theta}\;\mathbb{E}_{(x,y)\sim\mathcal{D}_{support}}\big[L(c_{\theta}(f_{\phi}(x)),y)\big] + \lambda\,\mathbb{E}_{\tilde{x}\sim\mathcal{D}_{pretrain}}\big[L(c_{\theta}(f_{\phi}(\tilde{x})),\mathcal{U}_{\mathcal{C}})\big],   (2)

where λ is a weighting factor between the ID- and OOD-focused terms that we set to 0.5 here [7].
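A minimal sketch of one OE training step implementing Eq. (2) is given below, assuming batches of support data and of 𝒟_pretrain outliers are available; the uniform-target term is written as the mean negative log-softmax, which equals the cross-entropy against a uniform distribution over the M known classes.

```python
import torch
import torch.nn.functional as F

def oe_step(classifier, feat_extractor, x_id, y_id, x_oe, lam=0.5):
    """One training step of Eq. (2): CE on support data plus a uniform-target
    (outlier exposure) term on unlabeled pretraining chips."""
    with torch.no_grad():                         # global model stays frozen in Stage 2
        f_id = feat_extractor(x_id)
        f_oe = feat_extractor(x_oe)
    logits_id = classifier(f_id)
    logits_oe = classifier(f_oe)

    loss_id = F.cross_entropy(logits_id, y_id)    # 1-hot targets on ID data
    # cross-entropy to the Uniform distribution over the known classes
    loss_oe = -F.log_softmax(logits_oe, dim=1).mean(dim=1).mean()
    return loss_id + lam * loss_oe
```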

II-C OOD Detection

The final detail of our methodology is how we detect OOD samples during deployment. To conceptualize the process, think of each test input x being assigned a real-valued score 𝒮_ID(x) as a measure of its ID-ness. During operation, if 𝒮_ID(x) ≥ β_thresh (an application-dependent threshold), the input is considered ID and the classifier releases its prediction over the set of known classes. Otherwise, if 𝒮_ID(x) < β_thresh, the input is considered OOD and the system abstains from releasing the prediction. In this work we use a temperature-scaled Maximum Softmax Probability detector [20] to produce 𝒮_ID(x) scores, described as

\mathcal{S}_{ID}(x)=\max_{i\in\mathcal{C}}\frac{\exp\big(c_{\theta}^{(i)}(f_{\phi}(x))/\tau\big)}{\sum_{j=1}^{|\mathcal{C}|}\exp\big(c_{\theta}^{(j)}(f_{\phi}(x))/\tau\big)}.   (3)

Here, τ is a temperature hyperparameter which we set to 100 via validation. The intuition for this detection scheme is straightforward: ID samples should be predicted by the classifier with higher confidence than OOD samples. This fits naturally with the OE training goal, making it a logical choice in our system. We note that OOD detection is currently a popular research topic and future works may investigate alternative scoring mechanisms.
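A sketch of the temperature-scaled MSP score of Eq. (3) and the corresponding accept/reject rule follows; the threshold value is application dependent and illustrative only.

```python
import torch
import torch.nn.functional as F

def id_score(logits, tau=100.0):
    """Temperature-scaled maximum softmax probability, i.e. Eq. (3)."""
    return F.softmax(logits / tau, dim=1).max(dim=1).values

def predict_or_abstain(logits, beta_thresh, tau=100.0):
    """Release the class prediction only when the ID score clears the threshold."""
    scores = id_score(logits, tau)
    preds = logits.argmax(dim=1)
    accept = scores >= beta_thresh                # False entries are treated as OOD and rejected
    return preds, accept, scores
```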

III Experiments

To show the merit of our approach to the robust few-shot SAR ATR problem we perform three primary experiments and one analysis. The first experiment (Sec. III-A) shows results in MSTAR’s Standard Operating Condition (SOC) and the second (Sec. III-B) investigates generalization performance in an Extended Operating Condition (EOC). The third experiment (Sec. III-C) measures OOD detection performance against a spectrum of granularities and the analysis (Sec. III-D) discusses a critical trade-off between OOD detection and generalization.

All of the experiments follow a similar setup. In Stage 1, we initialize the feature extractor as a ResNet-18 (RN18) [21] backbone with output dimension 512, and the Projector network as a 3-layer neural net [19] with output dimension 128. These components are trained with an ADAM optimizer for 200 epochs, with 1024 batch size, weight decay of 1e-4, and a learning rate of 3e-4 following a cosine decay schedule. For the SimCLR NT-Xent loss we use a temperature of 0.01. In Stage 2, we discard the Projector network, fix the weights of the RN18, and initialize our 2-layer neural net classifier. We train the classifier for 500 iterations, with label smoothing, using an ADAM optimizer with cosine decayed learning rate starting at 1e-3. On the data side, we train with 64×64 px crops and use Gaussian noise and random flipping augmentations. Finally, the results in this document are averages over 250 randomized runs.
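For reference, a minimal sketch of how the Stage-1 components described above could be assembled is shown below; the handling of the single-channel SAR input and the projector's hidden width are assumptions not stated in the text.

```python
import torch.nn as nn
from torchvision.models import resnet18

def build_stage1_model(proj_hidden=512, proj_out=128):
    backbone = resnet18(weights=None)             # randomly initialized RN18
    # single-channel SAR chips instead of 3-channel RGB (assumed handling)
    backbone.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
    backbone.fc = nn.Identity()                   # expose the 512-d feature vector
    projector = nn.Sequential(                    # 3-layer projection head to 128-d
        nn.Linear(512, proj_hidden), nn.ReLU(inplace=True),
        nn.Linear(proj_hidden, proj_hidden), nn.ReLU(inplace=True),
        nn.Linear(proj_hidden, proj_out),
    )
    return backbone, projector
```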

III-A Standard Operating Conditions

To measure SOC performance on MSTAR we train the models on 17° elevation imagery and test the accuracy of classifying 15° data [22]. As mentioned, because we do not assume to have labeled pretraining data, meta-learning-based FSL methods are inappropriate for comparison (also see Supplemental B). For baselines we instead train supervised models from scratch on the 𝒟_support samples only. We examine two architectures of varying complexities: A-ConvNet [22] and RN18 (displayed as Scratch-{AConv,RN18}). The MSTAR SOC results are shown in Table I for #-ways={2,5,10} and #-shots={1,5,10,20,25}, covering a gamut of potential scenarios.

To start, we find several intuitive takeaways. First, accuracy increases with the number of shots; and second, as the number of ways increases, the models need more shots to achieve the same performance. In all cases, our SimCLR-based models significantly outperform the models trained from scratch, and in most cases the OE classifier has a slight edge over the basic classifier. When given only 10 shots, our average margin of improvement over the best baseline is +9.8%. On full 10-class MSTAR, we reach 90% average accuracy at ~25 shots, which is a 6% improvement over the best baseline. Lastly, we want to emphasize the implications of these results. The self-supervised SimCLR backbone, which has been trained mostly on ships, is able to separate the MSTAR data by class even though it has never been trained on such data!

III-B Extended Operating Conditions

For the MSTAR EOC test we train the classification models on 17° elevation imagery and test on 30° data [22]. The large difference in collection geometry causes a sizable change in the target signatures and offers a more challenging test of generalization. Our results are reported in Table II. We see several trends similar to the SOC results w.r.t. the #-shots and #-ways. In all cases, the SimCLR-based models outperform the train-from-scratch baselines, while the basic and OE classifiers perform very similarly to each other. Interestingly, the largest margins over the baselines are at the lowest number of shots. For example, at 5-shots our average improvement over the best baseline is +9%.

TABLE II: MSTAR Extended Operating Condition Accuracy Results
2-way 4-way
Method 5-shot 10-shot 20-shot 25-shot 5-shot 10-shot 20-shot 25-shot
Scratch-AConv 68.9 78.4 87.0 89.3 53.8 66.2 74.3 76.3
Scratch-RN18 72.1 82.1 86.6 86.9 56.0 65.8 73.2 74.5
SimCLR+basic 78.8 85.4 90.3 90.7 65.2 74.2 80.5 82.0
SimCLR+OE 80.9 85.7 89.6 89.8 65.3 74.1 79.7 81.0

III-C Out-of-Distribution Detection

Our goal in the third experiment is to examine OOD detection performance across a spectrum of difficulties/granularities. To do this we leverage OOD data from various sources. For a particularly hard OOD test we use a holdout scheme: we build a 7-way classifier where, in each test iteration, 7 of the 10 MSTAR classes are randomly selected for the ID label space, and data from the remaining 3 classes serve as Holdout OOD samples. For medium-difficulty OOD, we use data from SARSIM-Roads, SARSIM-Medium, and SARSIM-Grass [6]. This is synthetic SAR data which has been generated to specifically match the collection conditions of MSTAR with different backgrounds (note, we remove the Bulldozer and Tank classes to avoid potential ID/OOD ambiguities). For coarse-grained OOD we generate FakeData by creating random Uniform noise chips and use the MNIST test set of hand-written digits. Both are clearly OOD w.r.t. any SAR classifier. Fig. 3 shows examples from each of these datasets. Importantly, keep in mind that none of the test OOD data is in 𝒟_pretrain (the OE set), meaning the model has never seen it before.

Table III shows the average OOD detection performance (as measured by % AUROC [20]) of a 7-way MSTAR SOC classifier at different #-shots. Note, the AUROC value can be interpreted as the probability that an ID test input would have a greater 𝒮_ID score than an OOD input [23], where 100% is a perfect detector. At each setting we examine the performance of the basic versus OE classifier, which we know both achieve high ID accuracy from Sec. III-A. The most important takeaway from Table III is that the OE classifier is better than the basic classifier in all cases by a large margin. Interestingly, for the basic model the coarse-grained OOD sets are the hardest to detect, while the OE classifier detects these almost perfectly. This surprising behavior is something we plan to study in a future work. Another interesting finding is that detection of Holdout samples improves significantly as #-shots increases, while detection of the other OOD types stays relatively constant w.r.t. #-shots. Finally, a somewhat expected result is that for OE models, the Holdout samples are the most difficult to detect, while the medium- and coarse-grained OOD inputs are less challenging. Given that these results still show room for improvement, we highly encourage future system designers (whether in few-shot scenarios or not) to use OOD-aware classifier training schemes such as OE to build robustness to the many forms of OOD inputs.
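For clarity, the AUROC numbers in Table III can be computed directly from per-sample 𝒮_ID scores, treating ID as the positive class; a minimal sketch using scikit-learn is shown below (array names are illustrative).

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def detection_auroc(scores_id, scores_ood):
    """AUROC of separating ID from OOD given numpy arrays of S_ID scores.
    100% means every ID chip scores above every OOD chip."""
    labels = np.concatenate([np.ones_like(scores_id), np.zeros_like(scores_ood)])
    scores = np.concatenate([scores_id, scores_ood])
    return 100.0 * roc_auc_score(labels, scores)
```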

Figure 3: Sample test chips from each dataset used in the OOD experiment.
TABLE III: MSTAR Out-of-Distribution Detection Results for 7-way classifier (% AUROC)
Notation = basic / OE
OOD Set 1-shot 5-shot 10-shot 20-shot 25-shot
Holdout 54.1 / 56.0 64.1 / 67.2 70.2 / 76.2 78.0 / 81.9 81.7 / 84.0
SARSIM-R 59.6 / 93.2 56.3 / 95.9 58.8 / 96.6 66.0 / 97.2 68.3 / 97.3
SARSIM-M 55.5 / 88.4 50.8 / 90.5 53.9 / 92.4 60.3 / 93.5 61.2 / 93.6
SARSIM-G 44.8 / 87.9 40.3 / 87.0 44.2 / 88.1 50.6 / 87.9 52.5 / 87.2
FakeData 42.1 / 98.5 27.0 / 99.5 21.7 / 99.6 21.6 / 99.5 21.3 / 99.5
MNIST 15.1 / 94.1 15.5 / 95.8 16.6 / 96.3 21.1 / 95.5 22.2 / 95.8

III-D Generalization vs. Detection Trade-off

Our final experiment is an analysis that considers the question: can we have it all? Specifically, can we have high accuracy on ID data in SOC, high accuracy on ID classes in EOC, and reliable OOD detection? To answer this question we create Fig. 4, which shows density plots of 𝒮_ID(x) scores measured on a SimCLR+OE model for ID test data in SOC (SOC-ID), ID test data in EOC (EOC-ID), and Holdout OOD data in SOC (Holdout-OOD).

Firstly, we observe that SOC-ID and Holdout-OOD are relatively separable, which matches the findings in Table III. However, EOC-ID and Holdout-OOD are not very separable, as both are predicted with relatively low scores. Thus, with our method, if good Holdout-OOD detection is required then most of the EOC-ID data would also get flagged as OOD. Conversely, if high recall on EOC-ID is required then many Holdout-OOD samples would pass through the detector. These results indicate that we cannot have it all with this design, and highlight a serious trade-off to be considered in future work. Finally, while this trade-off may not be completely unique to the FSL setting, our earlier observation that fewer shots make Holdout-OOD detection harder (Sec. III-C) means the trade-off may be more dire in the few-shot setting.

Figure 4: Density plots of 𝒮_ID scores for ID data in SOC, ID data in EOC, and Holdout OOD data. Also shown are hypothetical thresholds to achieve 80% True Positive Rates on ID data in SOC and EOC.

IV Conclusion

We now confirm our initial hypothesis (Sec. I) that modern SSL techniques like SimCLR are capable of learning highly generalizable SAR feature extractors from large pools of diverse and unlabeled SAR data. As evidence, our experiments show that a model trained mostly on unlabeled SAR ships is able to provide an informative-enough feature space to do few-shot classification of MSTAR targets. This result provides merit to the under-studied global model approach to SAR ATR and confirms that SAR features can be highly transferable, even across sensors, imaging modes, target types, etc. At a more concrete level, we provide a method for performing robust few-shot SAR ATR in a limited environment where there is no labeled data for pretraining. We also provide a methodology that boosts OOD detection performance of the classifier without adding significant assumptions or overhead. We hope that this work motivates further study on few-shot SAR ATR and representation learning for SAR, with specific considerations for robustness to OOD inputs and the trade-off between OOD detection and generalization.

Lastly, we provide some suggestions for future work. We believe there are potential gains in developing improved domain-relevant augmentations for both Stage 1 and Stage 2 training. We also believe that adding synthetically generated data crafted to be relevant to the downstream task may improve the quality of learned features. Next, one may investigate the use of different SSL algorithms and backbone model architectures. Finally, an interesting study would be to evaluate when it becomes beneficial to fine-tune the feature extractor instead of keeping it fixed during Stage 2 training.

Public Release Number: AFRL-2022-3418

References

  • [1] U. K. Majumder, E. P. Blasch, and D. A. Garren, Deep Learning for Radar and Communications Automatic Target Recognition.   Artech House, 2020.
  • [2] J. Snell, K. Swersky, and R. Zemel, “Prototypical networks for few-shot learning,” in Advances in Neural Information Processing Systems, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, Eds., vol. 30, 2017.
  • [3] L. Zhang, X. Leng, S. Feng, X. Ma, K. Ji, G. Kuang, and L. Liu, “Domain knowledge powered two-stream deep network for few-shot sar vehicle recognition,” IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1–15, 2022.
  • [4] T. Chen, S. Kornblith, M. Norouzi, and G. E. Hinton, “A simple framework for contrastive learning of visual representations,” in International Conference on Machine Learning, 2020, pp. 1597–1607.
  • [5] Y. Tian, Y. Wang, D. Krishnan, J. B. Tenenbaum, and P. Isola, “Rethinking few-shot image classification: A good embedding is all you need?” in European Conference on Computer Vision, vol. 12359, 2020, pp. 266–282.
  • [6] D. Malmgren-Hansen, A. Kusk, J. Dall, A. A. Nielsen, R. Engholm, and H. Skriver, “Improving sar automatic target recognition models with transfer learning from simulated data,” IEEE Geoscience and Remote Sensing Letters, vol. 14, no. 9, pp. 1484–1488, 2017.
  • [7] D. Hendrycks, M. Mazeika, and T. G. Dietterich, “Deep anomaly detection with outlier exposure,” in International Conference on Learning Representations, 2019.
  • [8] N. Inkawhich, E. Davis, M. Inkawhich, U. K. Majumder, and Y. Chen, “Training sar-atr models for reliable operation in open-world environments,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 14, pp. 3954–3966, 2021.
  • [9] N. Inkawhich, J. Zhang, E. K. Davis, R. Luley, and Y. Chen, “Improving out-of-distribution detection by learning from the deployment environment,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 15, pp. 2070–2086, 2022.
  • [10] T. Ross, S. Worrell, V. Velten, J. Mossing, and M. Bryant, “Standard sar atr evaluation experiments using the mstar public release data set,” in SPIE Conference on Algorithms for Synthetic Aperture Radar Imagery V, 1998.
  • [11] Y. Wang, C. Wang, H. Zhang, Y. Dong, and S. Wei, “A sar dataset of ship detection for deep learning under complex backgrounds,” Remote Sensing, vol. 11, no. 7, p. 765, Mar 2019.
  • [12] S. Wei, X. Zeng, Q. Qu, M. Wang, H. Su, and J. Shi, “Hrsid: A high-resolution sar images dataset for ship detection and instance segmentation,” IEEE Access, vol. 8, pp. 120 234–120 254, 2020.
  • [13] X. Hou, W. Ao, Q. Song, J. Lai, H. Wang, and F. Xu, “Fusar-ship: building a high-resolution SAR-AIS matchup dataset of gaofen-3 for ship detection and recognition,” Sci. China Inf. Sci., vol. 63, no. 4, 2020.
  • [14] T. Zhang, X. Zhang, J. Li, X. Xu, B. Wang, X. Zhan, Y. Xu, X. Ke, T. Zeng, H. Su, I. Ahmad, D. Pan, C. Liu, Y. Zhou, J. Shi, and S. Wei, “Sar ship detection dataset (ssdd): Official release and comprehensive data analysis,” Remote Sensing, vol. 13, no. 18, 2021.
  • [15] T. Zhang, X. Zhang, X. Ke, X. Zhan, J. Shi, S. Wei, D. Pan, J. Li, H. Su, Y. Zhou, and D. Kumar, “Ls-ssdd-v1.0: A deep learning dataset dedicated to small ship detection from large-scale sentinel-1 sar images,” Remote Sensing, vol. 12, no. 18, 2020.
  • [16] Y. Hu, Y. Li, and Z. Pan, “A dual-polarimetric sar ship detection dataset and a memory-augmented autoencoder-based detection method,” Sensors, vol. 21, no. 24, 2021.
  • [17] S. Lei, D. Lu, X. Qiu, and C. Ding, “Srsdd-v1.0: A high-resolution sar rotation ship detection dataset,” Remote Sensing, vol. 13, no. 24, 2021.
  • [18] K. E. Dungan, C. Austin, J. Nehrbass, and L. C. Potter, “Civilian vehicle radar data domes,” in SPIE Algorithms for Synthetic Aperture Radar Imagery XVII, 2010.
  • [19] T. Silva, “Simclr,” https://github.com/sthalles/SimCLR, 2022.
  • [20] S. Liang, Y. Li, and R. Srikant, “Enhancing the reliability of out-of-distribution image detection in neural networks,” in International Conference on Learning Representations, 2018.
  • [21] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in IEEE Conference on Computer Vision and Pattern Recognition, 2016.
  • [22] S. Chen, H. Wang, F. Xu, and Y. Jin, “Target classification using the deep convolutional networks for sar images,” IEEE Transactions on Geoscience and Remote Sensing, vol. 54, no. 8, pp. 4806–4817, 2016.
  • [23] D. Hendrycks and K. Gimpel, “A baseline for detecting misclassified and out-of-distribution examples in neural networks,” in International Conference on Learning Representations, 2017.

Supplemental Materials

A. SimCLR Augmentations for SAR Data

In this section we discuss the augmentation pipeline used to train the SimCLR model in more detail. Fig. 5 shows the PyTorch transform block that defines the augmentations. Note that several of the operations are from torchvision.transforms and we leave description of those to the official docs (https://pytorch.org/vision/0.8/transforms.html). The remaining transforms are defined as follows (a minimal code sketch of these custom transforms is given after the list):

  • ClipAndScale(a,b) - linearly re-scale the range of pixel values in the image given a randomly chosen maximum value between a and b. Pseudo: max_val = random.uniform(a, b); x_new = clip(x_orig, 0, max_val) / max_val.

  • PowScale(a,b) - pixel-wise exponentiation using a randomly selected value between a and b. Pseudo: val = random.choice([random.uniform(a, 1), random.uniform(1, b)]); x_new = x_orig ** val.

  • SpeckleNoise(a,b) - randomly replace a subset of pixel values with values chosen from a Uniform(0,1) distribution. The size of the subset, as a percentage of total pixels, is chosen with random.uniform(a, b).

  • GaussianNoise(a,b) - apply pixel-wise additive Gaussian noise using stdev = random.uniform(a, b).

  • GaussianBlur - convolve the chip with a Gaussian filter.
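Since the code figure (Fig. 5) is not reproduced in this text conversion, the following is a minimal sketch of how the custom transforms above could be implemented for single-channel tensor chips; the actual (a, b) ranges and transform ordering are those given in Fig. 5 and are not repeated here.

```python
import random
import torch

class ClipAndScale:
    def __init__(self, a, b): self.a, self.b = a, b
    def __call__(self, x):
        max_val = random.uniform(self.a, self.b)
        return torch.clamp(x, 0.0, max_val) / max_val   # linear re-scale to [0, 1]

class PowScale:
    def __init__(self, a, b): self.a, self.b = a, b
    def __call__(self, x):
        val = random.choice([random.uniform(self.a, 1.0), random.uniform(1.0, self.b)])
        return x.pow(val)                                # element-wise power scaling

class SpeckleNoise:
    def __init__(self, a, b): self.a, self.b = a, b
    def __call__(self, x):
        frac = random.uniform(self.a, self.b)            # fraction of pixels to replace
        mask = torch.rand_like(x) < frac
        return torch.where(mask, torch.rand_like(x), x)  # replaced values ~ Uniform(0,1)

class GaussianNoise:
    def __init__(self, a, b): self.a, self.b = a, b
    def __call__(self, x):
        stdev = random.uniform(self.a, self.b)
        return x + stdev * torch.randn_like(x)           # additive Gaussian noise
```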

Finally, Fig. 6 shows several examples of augmented pairs generated with this transform block. Recall, the SimCLR training objective encourages the two different views in a pair to lie near each other in projection space, while views from different pairs should lie far apart.

Figure 5: PyTorch-like code describing transforms used to train SimCLR model.
Figure 6: Examples of augmented pairs.

B. Comparison With A Powerful Recent Method for Few-Shot MSTAR

In the main document we do not compare our method to meta-learning-based FSL algorithms. As mentioned several times, this is because meta-learning approaches assume the existence of a labeled dataset for pretraining the feature extractor, which is a violation of our basic assumption set. However, for the sake of completeness we now present a detailed comparison with a state-of-the-art meta-learning approach to few-shot MSTAR called Domain Knowledge Powered Two Stream Deep Network (DKTS-N) [3]. Below are some notable points in no particular order:

  • DKTS-N uses the labeled SARSIM dataset [6] as the source of pretraining data for meta-learning. We argue that this is essentially an ideal scenario because SARSIM is specifically/conveniently synthesized to match the conditions of MSTAR (i.e., the downstream FSL task). We believe that this implies a more significant assumption than any we make in our work. Also, it is unclear if the method would be nearly as effective if the pretraining dataset was not so closely related to the downstream task. Finally, although our method is not reliant on such a related dataset, in practice we could leverage more data by adding it to 𝒟pretrain\mathcal{D}_{pretrain}.

  • DKTS-N reports results using a preprocessing function that performs perfect azimuth angle normalization on both the training and test data. We believe this is done using meta-data provided in the MSTAR dataset. They also show that as the normalization function incurs errors, performance can degrade significantly. It is unclear how one would achieve perfect azimuth angle normalization in the “wild,” and thus we do not use it here. However, if such a function were developed, we believe it could complement our model in a similar way to theirs.

  • DKTS-N does not consider OOD detection in the few-shot classifier and thus is susceptible to producing erroneous predictions on OOD data at test time.

  • DKTS-N involves a time-intensive inference procedure which scales poorly as #-shots increases (see Table VII of [3]). This is because for each test input it performs an iterative optimization procedure w.r.t. each support sample. Our inference procedure is far simpler: a single forward pass through a DNN (e.g., an RN18).

  • While DKTS-N shows very high performance in MSTAR SOC, our method actually outperforms it in several of the EOC settings. Specifically, in the (4-way, 10-shot) and (4-way, 25-shot) cases our method is over 3% more accurate.

C. Additional results using a different self-supervised pretraining algorithm

In the main manuscript we use the SimCLR algorithm to pretrain the feature extractor in Stage 1 of the pipeline. SimCLR is often used as a standard baseline for comparison in Self-Supervised Learning (SSL) literature and our intention is to use it as a representative of the SSL class of algorithms. To be clear, our framework is not tied to the SimCLR algorithm and we believe that any modern SSL algorithm may be used for pretraining. To show the potential impact of a different SSL algorithm we run experiments with the Bootstrap Your Own Latent (BYOL) method [†]. BYOL uses a non-contrastive distillation scheme to perform the representation learning. This is thought to offer some advantages over SimCLR because the learning signal is not reliant on many negative samples to contrast the positive pair with. Because the field of SSL is advancing so quickly it is unclear which SSL paradigm will ultimately emerge as the “best,” but for now these two methodologies make for an interesting comparison. As mentioned in the paper, an important future work is still to consider using different SSL algorithms for pretraining within our framework.
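As a rough illustration of the distillation mechanism, BYOL maintains a target network as an exponential moving average (EMA) of the online network; a minimal sketch of that parameter update is below (the default momentum matches the τ = 0.9995 reported in the next paragraph, and batch-norm buffer handling is omitted).

```python
import torch

@torch.no_grad()
def ema_update(online_net, target_net, momentum=0.9995):
    """BYOL-style target update: target <- m * target + (1 - m) * online."""
    for p_online, p_target in zip(online_net.parameters(), target_net.parameters()):
        p_target.mul_(momentum).add_(p_online, alpha=1.0 - momentum)
```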

Table IV shows the extended few-shot classification results in MSTAR Standard Operating Conditions, where the Scratch and SimCLR numbers are copied from Table I in the main paper. The BYOL training parameters are nearly identical to the SimCLR parameters, including the extractor architecture, optimization procedure, augmentation scheme, and training schedule. The only method-specific parameter is the EMA momentum, which we set to τ = 0.9995. In the table we see that BYOL actually outperforms the Scratch and SimCLR models in all scenarios. Interestingly, the benefits of BYOL over SimCLR are most clear in the very low-shot scenarios (i.e., 1- and 5-shots) where the average margin of improvement is ~5%. Focusing on the 10-way scenario, BYOL outperforms the best scratch model by at least 12% in the 1-10 shot range. Finally, we note that BYOL-based models trained with 10 shots perform on par with Scratch models trained with 20 shots! The ease with which BYOL is swapped into our framework is a key advantage and speaks to the flexibility and extendability we were striving for.

TABLE IV: Extended MSTAR Standard Operating Condition Accuracy Results (with 95% confidence interval)
2-way 5-way 10-way
Method 1-shot 5-shot 10-shot 20-shot 25-shot 1-shot 5-shot 10-shot 20-shot 25-shot 1-shot 5-shot 10-shot 20-shot 25-shot
Scratch-AConv 59.6±1.2 73.6±1.2 84.3±0.9 92.1±0.7 93.8±0.6 31.6±0.6 55.9±0.8 72.3±0.8 85.7±0.5 88.5±0.4 20.9±0.3 47.4±0.4 65.0±0.4 80.7±0.2 84.5±0.2
Scratch-RN18 55.8±1.2 74.9±1.4 84.1±1.1 89.6±0.8 92.1±0.6 30.6±0.7 53.7±0.8 67.0±0.7 79.3±0.6 83.2±0.5 20.7±0.3 43.4±0.4 58.9±0.4 74.0±0.4 78.5±0.3
SimCLR+basic 64.5±1.4 83.0±1.1 91.4±0.7 96.1±0.4 97.4±0.2 38.8±0.7 68.4±0.7 82.6±0.5 91.8±0.3 94.0±0.2 28.3±0.3 59.2±0.3 75.2±0.2 87.7±0.1 90.6±0.1
SimCLR+OE 65.4±1.5 86.4±1.0 92.2±0.8 96.5±0.4 97.5±0.2 39.9±0.7 69.7±0.6 83.3±0.5 92.3±0.3 93.8±0.2 28.8±0.3 60.5±0.3 75.6±0.2 87.0±0.1 89.3±0.1
BYOL+basic 72.4±1.9 86.7±1.1 93.8±0.6 97.1±0.3 98.0±0.2 44.7±0.9 74.5±0.6 86.0±0.5 93.9±0.2 94.8±0.2 32.8±0.4 65.1±0.3 79.5±0.2 89.8±0.1 92.1±0.1
BYOL+OE 70.1±1.8 87.6±1.1 94.7±0.5 97.8±0.2 98.1±0.2 45.9±0.8 75.9±0.7 87.4±0.4 93.8±0.2 94.9±0.2 33.7±0.4 66.2±0.4 79.8±0.2 88.7±0.1 90.7±0.1

[†] J.-B. Grill, F. Strub, F. Altché, C. Tallec, P. H. Richemond, E. Buchatskaya, C. Doersch, B. Avila Pires, Z. D. Guo, M. Gheshlaghi Azar, B. Piot, K. Kavukcuoglu, R. Munos, and M. Valko, “Bootstrap your own latent: A new approach to self-supervised learning,” in Advances in Neural Information Processing Systems, 2020.