
TEMPORAL FLOW MASK ATTENTION FOR OPEN-SET LONG-TAILED RECOGNITION OF WILD ANIMALS IN CAMERA-TRAP IMAGES

Abstract

Camera traps (unmanned observation devices) and deep learning-based image recognition systems have greatly reduced the human effort required to collect and analyze wildlife images. However, the data collected by such apparatus exhibit 1) long-tailed and 2) open-ended distribution problems. To tackle the open-set long-tailed recognition problem, we propose the Temporal Flow Mask Attention Network, which comprises three key building blocks: 1) an optical flow module, 2) an attention residual module, and 3) a meta-embedding classifier. We extract temporal features from sequential frames using the optical flow module and learn informative representations using attention residual blocks. Moreover, we show that applying the meta-embedding technique boosts performance in open-set long-tailed recognition. We apply this method to a Korean Demilitarized Zone (DMZ) dataset. We conduct extensive experiments with quantitative and qualitative analyses to show that our method effectively tackles the open-set long-tailed recognition problem while remaining robust to unknown classes. This research was supported by the National Research Foundation of Korea, funded by the Ministry of Science and ICT (NRF-2018R1A5A7025409).

Index Terms—  Open-set Long-tailed Recognition, Temporal Flow Mask Attention, DMZ Dataset, Camera Trap
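
To make the three-stage pipeline described in the abstract concrete, the sketch below shows one plausible arrangement of the building blocks in PyTorch. It is a minimal illustration under stated assumptions, not the paper's implementation: the frame-difference mask stands in for a learned optical flow module (e.g., PWC-Net or FlowNet), the ResNet-18 backbone stands in for the attention residual module, and the learnable class-centroid memory approximates a meta-embedding classifier. All class and member names (`TemporalFlowMaskAttentionSketch`, `flow_mask`, `memory`) are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision


class TemporalFlowMaskAttentionSketch(nn.Module):
    """Illustrative three-stage pipeline: flow mask -> attention residual -> meta-embedding."""

    def __init__(self, num_classes, feat_dim=512):
        super().__init__()
        # ResNet-18 backbone stands in for the attention residual module.
        backbone = torchvision.models.resnet18(weights=None)
        self.encoder = nn.Sequential(*list(backbone.children())[:-1])
        # Learnable class-centroid memory used by the meta-embedding step (hypothetical).
        self.memory = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.classifier = nn.Linear(feat_dim, num_classes)

    def flow_mask(self, frame_t, frame_tp1):
        # Stand-in for an optical flow module: a simple frame-difference motion mask.
        diff = (frame_tp1 - frame_t).abs().mean(dim=1, keepdim=True)  # (B, 1, H, W)
        return torch.sigmoid(diff)

    def forward(self, frame_t, frame_tp1):
        mask = self.flow_mask(frame_t, frame_tp1)            # temporal flow mask
        attended = frame_tp1 * mask + frame_tp1              # residual-style attention gating
        direct = self.encoder(attended).flatten(1)           # (B, feat_dim) direct feature
        # Meta-embedding: enrich the direct feature with a memory read-out.
        attn = F.softmax(direct @ self.memory.t(), dim=1)    # (B, num_classes)
        meta = direct + attn @ self.memory                   # (B, feat_dim)
        return self.classifier(meta)


# Usage: two consecutive camera-trap frames of shape (B, 3, H, W).
model = TemporalFlowMaskAttentionSketch(num_classes=10)
logits = model(torch.rand(2, 3, 224, 224), torch.rand(2, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 10])
```

In the actual method, the mask would come from a dedicated optical flow network operating on sequential camera-trap frames, and the memory would be derived from class centroids learned on the long-tailed training set; the sketch only conveys how a motion-derived mask can gate spatial features before a meta-embedding classification step.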
