Detection and Annotation of Plant Organs from Digitized Herbarium Scans using Deep Learning
Abstract
As herbarium specimens are increasingly becoming digitized and accessible in online repositories, advanced computer vision techniques are being used to extract information from them. The presence of certain plant organs on herbarium sheets is useful information in various scientific contexts, and automatic recognition of these organs will help mobilize such information. In our study we use deep learning to detect plant organs on digitized herbarium specimens with Faster R-CNN. For our experiment we manually annotated hundreds of herbarium scans with thousands of bounding boxes for six types of plant organs and used them for training and evaluating the plant organ detection model. The model worked particularly well on leaves and stems; flowers, although also present in large numbers on the sheets, were not recognized equally well.
Keywords: Herbarium Specimens · Plant Organ Detection · Deep Learning · Convolutional Neural Networks · Object Detection and Localization · Image Annotation · Digitization
1 Introduction
Herbarium collections have been the basis of systematic botany for centuries. More than 3000 herbaria are active on a global level, comprising c. 400 million specimens, a number that has doubled since the early 1970s and is growing steadily [32]. Accessibility of these collections has been improved by international science infrastructure aggregating specimen data and, increasingly, digital images of the specimens. Plant specimens, being usually flat and of a standard format approximating A3 size, are easier to digitize than most other biological collection objects. The Global Plants Initiative [29] has been very successful in digitizing type specimens around the world, single collections like the National Museum of Natural History in Paris have digitized their collections completely [15], and large-scale national or regional digitization initiatives are already taking place or are planned for the near future [3]. Presently there are more than 27 million plant specimen records with images available via the GBIF platform (www.gbif.org), the vast majority of these images being herbarium scans.
This rising number of digitized herbarium sheets provides an opportunity to employ computer-based image processing techniques like deep learning to automatically identify species and higher taxa [4, 41, 5] or to extract other useful information from the images, such as the presence of pathogens (as done for live plant photos by Mohanty et al. 2016 [22]). Deep learning is a subset of machine learning methods for learning data representations. Deep learning techniques require huge amounts of training data to learn the features and representation of that data for the specified task by fine-tuning the parameters of hundreds or thousands of neurons arranged in multiple layers. Learning the values of these parameters can take vast computing and time resources, especially on huge datasets.
The most common type of deep learning network architecture used for extracting image features is the Convolutional Neural Network (CNN) [16]. The layers and connectivity of neurons are inspired by the biological organization of the animal visual cortex [21, 12]. A convolutional neural network extracts the features of an image by passing it through a series of convolutional, nonlinear and pooling (image downsampling) layers and passes the result to a fully connected layer to get the desired output. Each convolutional layer extracts the visual features of the image by applying convolution operations with kernels, using a local receptive field, to produce feature maps that are passed as input to the next layer. The initial layers in the network compute primitive features of the image such as corners and edges, the deeper layers use these features to compute more complex features consisting of curves and basic shapes, and the deepest layers combine these shapes and curves to create recognizable shapes of objects in the image [39, 42].
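To make this layer sequence concrete, the following minimal PyTorch sketch stacks two convolutional layers with nonlinearities and pooling, followed by a fully connected classifier. The layer sizes, image size and class count are illustrative only and are not those of the networks used later in this study.

```python
import torch
import torch.nn as nn

class MinimalCNN(nn.Module):
    """Toy CNN: convolution -> nonlinearity -> pooling, then a fully connected classifier."""
    def __init__(self, num_classes: int = 6):  # illustrative class count
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),   # local receptive fields produce feature maps
            nn.ReLU(),
            nn.MaxPool2d(2),                              # downsample the feature maps
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # deeper layer: more complex features
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 56 * 56, num_classes)  # fully connected output layer (for 224x224 input)

    def forward(self, x):
        x = self.features(x)       # extract visual features
        x = torch.flatten(x, 1)    # flatten feature maps into a vector
        return self.classifier(x)  # class scores

# Example: one 3-channel 224x224 image
scores = MinimalCNN()(torch.rand(1, 3, 224, 224))
print(scores.shape)  # torch.Size([1, 6])
```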
In this paper we use deep learning for detecting plant organs on herbarium scans. The plant organs are detected using an object detection network, which works by localizing each object with a bounding box on the image and classifying it. Many CNN-based network architectures have been proposed for this task. In this study, a network called Faster R-CNN [27] was used, which is part of the R-CNN family of object detectors. Region-based Convolutional Networks (R-CNN) identify objects and their locations in an image. Faster R-CNN networks have shown state-of-the-art performance in various object detection applications and competitions [43]. Therefore many researchers have explored the use of Faster R-CNN for detecting various plant organs like flowers, fruits and seedlings [28, 30, 9, 20, 31, 1, 13]. To our knowledge this is the first time object detection has been used for detecting multiple types of plant organs on herbarium scans. Identifying and localizing plant organs on herbarium sheets is a first necessary step for some interesting applications. The presence and state of organs like flowers and fruits can be used in phenological studies over long time periods and may give us more insight into climate change effects since the time of the industrial revolution [37, 14].
2 Methods
2.1 Network architecture
A typical object detection network consists of object localization and classification integrated into one convolutional network. There are two main types of meta-architectures available for this application: single-stage detectors like the Single Shot Multibox Detector (SSD) [19] and You Only Look Once (YOLO) [26], and two-stage, region-based CNN detectors like Faster R-CNN. Single-stage detectors use a single feed-forward network to predict object class probabilities along with bounding box coordinates on the image. Faster R-CNN is composed of three modules: 1) a deep CNN image feature extraction network, 2) a Region Proposal Network (RPN), used for detection of a predefined number of Regions of Interest (RoIs) where the object(s) of interest could reside within the image, followed by 3) Fast R-CNN [8], which computes a classification score along with class-specific bounding box regression for each of these regions.
The CNN feature extraction network used in this paper is based on the ResNet-50 architecture [10], without the final fully connected layer. The Region Proposal Network creates thousands of prior or anchor boxes to estimate the location of objects in the image. The anchor boxes are predefined bounding boxes of certain heights and widths tiled across the image, determined by their scales and aspect ratios, in order to capture objects of different sizes for specific classes. The RPN generates its proposals by adjusting these anchors with coordinate offsets of the object bounding boxes and predicts the possibility of each anchor being a foreground object or background. These proposals are sorted according to their score, and the top N proposals are selected by Non-Maximum Suppression (NMS) and then passed to the Fast R-CNN stage. NMS reduces the high number of proposals for the next stage by shortlisting the proposals with the highest scores that have minimum overlap with each other, removing proposals whose overlap exceeds a predefined threshold for each category. In the next stage the proposals, with feature maps of different shapes, are pooled with an RoI pooling layer, which performs max-pooling on inputs of non-uniform sizes to obtain a fixed number of uniform-size feature maps. These feature maps are propagated through fully connected layers, which end in two sibling fully connected layers for object classification and bounding box regression respectively. An illustration of Faster R-CNN is shown in Figure 1.
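The greedy suppression step described above can be illustrated with a minimal sketch; the boxes, scores and the 0.6 threshold below are plugged in only for illustration and are not taken from the trained model.

```python
import torch
from torchvision.ops import box_iou  # IoU between two sets of boxes in (x1, y1, x2, y2) format

def greedy_nms(boxes: torch.Tensor, scores: torch.Tensor, iou_threshold: float = 0.6):
    """Keep the highest-scoring proposals; drop any proposal overlapping a kept one above the threshold."""
    order = scores.argsort(descending=True)  # proposals sorted by objectness score
    keep = []
    while order.numel() > 0:
        top = order[0]
        keep.append(top.item())
        if order.numel() == 1:
            break
        ious = box_iou(boxes[top].unsqueeze(0), boxes[order[1:]]).squeeze(0)
        order = order[1:][ious <= iou_threshold]  # discard heavily overlapping proposals
    return torch.tensor(keep)

# Two strongly overlapping proposals and one distinct proposal (illustrative values)
boxes = torch.tensor([[0., 0., 100., 100.], [10., 10., 110., 110.], [200., 200., 300., 300.]])
scores = torch.tensor([0.9, 0.8, 0.7])
print(greedy_nms(boxes, scores))  # tensor([0, 2])
```

In practice the RPN applies this suppression to thousands of proposals per image; torchvision's built-in `torchvision.ops.nms` implements the same logic more efficiently.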

2.2 Image Annotation
The herbarium scans annotated for training the object detection network were selected from the MNHN (Muséum national d’Histoire naturelle) vascular plant herbarium collection dataset in Paris [15], from open access images contributed to the GBIF portal. A total of 653 images were downloaded and rescaled from their original average size of c. 5100 by 3500 pixels to 1200 by 800 pixels, in order to preserve the aspect ratio of the scans and to speed up the learning by reducing the number of pixels. All these images were annotated for six different types of organs using LabelImg [33], a Python graphical toolkit for image annotation with bounding boxes. The average rate of manual image annotation was 8 to 15 herbarium sheets per hour, depending on the difficulty and the number of bounding boxes to be annotated. The total number of annotated bounding boxes for all 653 images was 19654, with an average of 30.1 bounding boxes per image. Of these 653 annotated images, 155 were either annotated or verified by an expert, forming a validated subset that was used for testing, while the remaining 498 were used for training, as shown in Figure 2 and in more detail in Table 1.
Category | Training subset (498 images) | Test subset (155 images) | Complete dataset (653 images)
---|---|---|---
Leaf | 7886 | 2051 | 9937
Flower | 3179 | 763 | 3942
Fruit | 1047 | 296 | 1343
Seed | 4 | 6 | 10
Stem | 3323 | 961 | 4284
Root | 78 | 60 | 138
Total | 15517 | 4137 | 19654

Preparing our data was not always straightforward. The manual localization and labelling of plant organs from specimens encountered the following difficulties: Buds, flowers and fruits are different stages in the life cycle of plant reproductive organs, and in some cases it was therefore difficult to draw a clear distinction between these structures. In some taxa, different plant organs were impossible to separate because they were small and crowded, e.g. in dense inflorescences with bracts and flowers, or stems densely covered by leaves. In a few cases it was also hard to differentiate between roots and stolons or other stem structures from the digital image alone. In all these cases we placed our labelled boxes in a way that best characterizes the respective plant organ. Sometimes this involved including parts of other organs; sometimes, if sufficient clearly assignable material was available, difficult parts were left out.
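LabelImg stores its bounding boxes by default as Pascal VOC style XML files. As a minimal sketch, such a file could be read into plain Python dictionaries as follows; the file name is a hypothetical placeholder and the field layout assumes the default Pascal VOC output of LabelImg.

```python
import xml.etree.ElementTree as ET

def load_voc_annotation(xml_path: str):
    """Parse one LabelImg (Pascal VOC style) XML file into a list of labelled boxes."""
    root = ET.parse(xml_path).getroot()
    boxes = []
    for obj in root.iter("object"):
        name = obj.findtext("name")  # organ category, e.g. "leaf" or "flower"
        bb = obj.find("bndbox")
        boxes.append({
            "category": name,
            "bbox": [int(float(bb.findtext(t))) for t in ("xmin", "ymin", "xmax", "ymax")],
        })
    return boxes

# Hypothetical annotation file produced with LabelImg for one rescaled herbarium scan
for box in load_voc_annotation("herbarium_scan_0001.xml"):
    print(box["category"], box["bbox"])
```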
2.3 Implementation
The object recognition task was performed using Faster R-CNN, as described in the network architecture, with a Feature Pyramid Network [17] backbone. The Feature Pyramid Network increases the accuracy of the object detection task by generating multi-scale feature maps from the single-scale feature map of the ResNet output, adding top-down pathways to the usual bottom-up pathways used by a regular convolutional network for feature extraction, where each layer of the network represents one pyramid level. The bottom-up pathway increases the semantic value of the image features, from corners and edges in the initial layers to high-level structures and shapes of objects in the final layers, while reducing their resolution at each layer. The top-down pathway then reconstructs higher resolution layers from the most semantically rich layer, with predictions made independently at all levels as shown in Figure 3. This approach provides Faster R-CNN with feature maps at different resolutions for detecting objects of multiple scales.
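As a hedged illustration of the multi-scale output such a backbone produces, torchvision ships a Faster R-CNN model with a ResNet-50 FPN backbone whose backbone forward pass returns one feature map per pyramid level; this is only a sketch of the general idea, not the Detectron2 model used in this study, and the `pretrained`/`weights` keyword differs between torchvision versions.

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# Faster R-CNN with a ResNet-50 FPN backbone; only the backbone is called here
# to inspect the multi-scale feature maps it produces.
model = fasterrcnn_resnet50_fpn(pretrained=False)  # on newer torchvision: weights=None
model.eval()

# A dummy 800x1200 image; the FPN returns an ordered dict with one entry per pyramid
# level ('0'..'3' and 'pool'), each with the same channel depth but decreasing resolution.
features = model.backbone(torch.rand(1, 3, 800, 1200))
for level, fmap in features.items():
    print(level, tuple(fmap.shape))
```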

In order to reduce the training time and, more importantly, because of the small size of the training dataset, transfer learning [39] was used to initialize the model with weights pre-trained on the ImageNet dataset [6]. Since the initial layers of a CNN usually learn very generic features that can also be used in new contexts, the pre-trained weights can initialize these layers directly. The deeper layers are likewise initialized from the ImageNet pre-trained weights and then fine-tuned during training on the annotated herbarium scan dataset until convergence.
The model was implemented with the Detectron2 [38] library in the PyTorch framework and trained using the Stochastic Gradient Descent optimizer with a learning rate of 0.0025 and momentum of 0.9. The anchor generator in the Region Proposal Network (see the section above on network architecture) had 6 anchor scales [32, 64, 128, 256, 512, 1024] (square root of area in absolute pixels), each with 3 aspect ratios of [1:2, 1:1, 2:1]. The thresholds for non-maximum suppression (NMS) were 0.6 for training and 0.25 for testing.
Because of the large image size and the additional parameters of Faster R-CNN, a minibatch size of 4 images per GPU (TITAN Xp) was selected for training the model. The model was trained twice: once with the training subset of 498 images on a single GPU for 9000 iterations, with performance evaluated on the test subset of 155 images, also on a single GPU; and then again on all 653 annotated images on 3 GPUs for 18000 iterations, for predicting plant organs on another, un-annotated dataset. This dataset consists of 708 full-scale herbarium scans, with an average size of c. 9600 by 6500 pixels, from the Herbarium Senckenbergianum (HS) [24].
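A sketch of how such a setup might be expressed with Detectron2's config system is given below. The dataset names are placeholders for registered datasets, and the mapping of the reported anchor scales and the two NMS thresholds onto individual config keys is our assumption rather than the exact configuration used in this study.

```python
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultTrainer

cfg = get_cfg()
# Faster R-CNN with ResNet-50 FPN backbone; backbone weights pre-trained on ImageNet (transfer learning)
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = "detectron2://ImageNetPretrained/MSRA/R-50.pkl"

cfg.DATASETS.TRAIN = ("herbarium_train",)  # placeholder: 498 annotated MNHN scans
cfg.DATASETS.TEST = ("herbarium_test",)    # placeholder: 155 expert-verified scans
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 6        # leaf, flower, fruit, seed, stem, root

cfg.SOLVER.BASE_LR = 0.0025                # SGD learning rate
cfg.SOLVER.MOMENTUM = 0.9
cfg.SOLVER.IMS_PER_BATCH = 4               # minibatch of 4 images (per GPU in the paper)
cfg.SOLVER.MAX_ITER = 9000

# One plausible mapping of the reported anchor scales, aspect ratios and NMS thresholds (assumptions)
cfg.MODEL.ANCHOR_GENERATOR.SIZES = [[32, 64, 128, 256, 512, 1024]]
cfg.MODEL.ANCHOR_GENERATOR.ASPECT_RATIOS = [[0.5, 1.0, 2.0]]
cfg.MODEL.RPN.NMS_THRESH = 0.6              # NMS threshold during training
cfg.MODEL.ROI_HEADS.NMS_THRESH_TEST = 0.25  # NMS threshold at test time

trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
```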
3 Results
The minimum threshold for any prediction to be accepted was a score (probability) of 0.5. The performance of the model was evaluated with the Average Precision metric using the Pascal VOC 2012 [7] and COCO [18] methods. The performance on the test subset of the model trained on the MNHN training subset is shown in Table 2. The Pascal VOC method considers all predictions as positive that have an Intersection over Union of at least 0.5. Intersection over Union (IoU) is a measure of the overlap of a predicted bounding box with its ground truth bounding box. If multiple detections of the same object occur, the first one is counted as a positive and the rest as negatives. The Average Precision is calculated by estimating the area under the precision-recall curve for all correct predictions, giving a score between 0 and 1. This metric for Average Precision with an IoU of 0.5 is called AP50. In the COCO method, AP is calculated with three metrics, all having values between 0 and 100. The first metric is the same as in Pascal VOC and is also called AP50. The second metric is AP75, with a minimum IoU of 0.75, and the third, AP, is the average over 10 IoU levels from 0.5 to 0.95 with a step size of 0.05. This method also gives the AP for each category, as shown in Table 3, along with the total number of bounding boxes for each category in the test subset. The difference between the Pascal VOC and COCO values for AP50 is most likely due to the different methods of calculating the Average Precision.
AP50 (Pascal VOC) | AP50 (COCO) | AP75 | AP
---|---|---|---
0.54 | 22.8 | 6.8 | 9.7
Category | Bounding Boxes | AP |
---|---|---|
Leaf | 2051 | 26.5 |
Flower | 763 | 4.7 |
Fruit | 296 | 7.8 |
Seed | 6 | 0.0 |
Stem | 961 | 9.9 |
Root | 60 | 9.4 |
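For reference, the IoU underlying these metrics reduces to a few lines for axis-aligned boxes; the following minimal sketch uses illustrative box coordinates only.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# A prediction must overlap a ground-truth box with IoU >= 0.5 to count as positive under AP50.
print(iou((0, 0, 100, 100), (25, 25, 125, 125)))  # about 0.39, so this prediction would not count
```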
A sample result of the model on the HS dataset, trained on all 653 annotated MNHN images, is shown in Figure 4. The organ detection model successfully detected almost all plant organs in the majority of images. Out of these 708 scans, 203 were annotated based on the predictions of the organ detection model in order to test its performance. The dataset of these 203 herbarium scans, along with the detection results and the annotations, is available at PANGAEA (link) [40].
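A sketch of running the trained detector over un-annotated scans, reusing the `cfg` object from the training sketch above and keeping only detections above the 0.5 score threshold, could look as follows; the image path is a placeholder.

```python
import cv2
from detectron2.engine import DefaultPredictor

cfg.MODEL.WEIGHTS = "output/model_final.pth"   # weights written by the training run above
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5    # minimum score for a prediction to be kept

predictor = DefaultPredictor(cfg)
image = cv2.imread("hs_scan_0001.jpg")         # placeholder: one full-scale HS herbarium scan
outputs = predictor(image)

instances = outputs["instances"].to("cpu")
for box, cls, score in zip(instances.pred_boxes, instances.pred_classes, instances.scores):
    print(int(cls), float(score), box.tolist())  # category index, confidence, (x1, y1, x2, y2)
```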

AP50 | AP75 | AP
---|---|---
31.9 | 25.7 | 26.4
Category | Bounding Boxes | AP |
---|---|---|
Leaf | 3330 | 52.9 |
Flower | 1899 | 29.9 |
Fruit | 153 | 11.3 |
Seed | 2 | 0.0 |
Stem | 1055 | 53.0 |
Root | 77 | 11.0 |
The performance of the model on the annotated Herbarium Senckenbergianum dataset is shown in Table 4 and Table 5. The average precision on these 203 scans is generally higher than on the MNHN test subset. There are two main reasons for this: 1) the organ detection model used for full-scale detection was trained on all 653 images of the annotated MNHN dataset before detection on the HS dataset, and 2) the annotations of these 203 images from the HS dataset were made based on the predictions of organs on the scans, as shown in Figure 4.
4 Discussion
This paper presents a method to detect multiple types of plant organs on herbarium scans. For this research we annotated hundreds of images by hand with thousands of bounding boxes, one for each recognizable plant organ. A subset of these annotated scans was then used for training a deep learning model for organ detection. After training, the model was used to predict the type and location of plant organs on the test subset. The automated detection of plant organs in our study was most successful for leaves and stems (Table 3 and Table 5). The best AP values, obtained for leaves, are likely due to leaves having the largest set of annotated bounding boxes. The good values for stems and roots may be explained by the relative uniformity of these organs throughout the plant kingdom, as compared to the morphologically more diverse flowers, with fruits in between. Seeds are rarely visible on herbarium sheets and require more training material.
The model was then trained again on all the previously annotated scans and tested on a different, un-annotated dataset. Based on visual inspection, the model performed well. In order to evaluate its performance with the average precision metric, around 200 of these scans were annotated by hand starting from the predicted bounding boxes. The predicted bounding boxes dramatically reduced the time needed to annotate these scans, since the predictions for leaves and stems were fairly accurate. Once annotated, these scans were compared with the predictions to evaluate the precision of the organ detection model on this dataset.
Most computer vision approaches on plants focus on live plants, often in the context of agriculture or plant breeding, and therefore include only a limited set of taxa. The present approach not only targets a much larger group of organisms and greater morphological diversity, comparable to applications in citizen science [35], but can also be applied on a wider time scale by including collection objects from hundreds of years of botanical research. Significant recent approaches to detect plant organs on herbarium scans include GinJinn [23] and LeafMachine [36]. GinJinn uses an object detection pipeline for automated feature extraction from herbarium specimens. This pipeline can be used to detect any type of plant organ, which its authors demonstrated by detecting leaves on a sample dataset. LeafMachine is another approach, which tries to automate the extraction of leaf traits, like class, size and number, from digitized herbarium specimens with machine learning.
5 Conclusions
Our present work focuses on the detection of plant organs from specimen images. The presence of flowers and fruits on specimens is a new source of data for phenological studies [37], which are of interest in the context of climate change. The presence of roots would identify plant specimens potentially containing root symbionts like mycorrhizal fungi or N-fixing bacteria for further study by microbiological or genetic methods [11]. Up to now, this requires visual examination of the specimens by humans; an automated approach using computer vision would considerably reduce the effort. Furthermore, the detection and localization of specific plant organs on a herbarium sheet would also enable or improve further computer vision applications, including quantitative approaches based on counting these organs, improved recognition of qualitative organ-specific traits like leaf shape, as well as quantitative measures such as leaf area or fruit size.
Localization of plant organs will improve the automated recognition and measurement of organ-specific traits by preselecting appropriate training material for these approaches. The general approach of measuring traits from images instead of the specimen itself has been shown to be precise, except for very small objects [2]. Of course, measurements that involve further processing of plant parts, as often done in traditional morphological studies on herbarium specimens, are not possible from images.
Automated pathogen detection on collection material will also profit from the segmentation of plant organs from herbarium sheet images, as many pathogens or symptoms of a plant disease only occur on specific organs. Studies on gall midges [34] have found herbarium specimens to be interesting study objects and would potentially profit from computer vision.
Manual annotation of herbarium specimens with bounding boxes, as done for the training and test datasets in this study, is a rather time-consuming process. Verification and correction of automatically annotated specimens is considerably faster, especially if the error rate is low. By iteratively incorporating expert-verified computer-generated data into new training datasets, the results can be further improved with reasonable effort using Continual Learning [25].
Acknowledgments
SY, MS and SD received funding from the DFG Project Mobilization of trait data from digital image files by deep learning approaches (grant 316452578). We gratefully acknowledge the support of NVIDIA Corporation with the donation of the TITAN Xp GPU to CW used for this research. Digitization of the Senckenberg specimens used in this study has taken place in the frame of the Global Plants Initiative.
Author Contributions
Sohaib Younis is computer scientist at Senckenberg Biodiversity and Climate Research Center with focus on deep learning and image processing. Contributions: convolutional network modeling, image preprocessing, annotation of herbarium scans, organ detection, description of results and preparation of the manuscript.
Marco Schmidt is botanist at Senckenberg Biodiversity and Climate Research Center (SBIK-F) with a focus on African savannas and biodiversity informatics (e.g. online databases like African Plants - a photo guide and West African vegetation) and is working at Palmengarten’s scientific service, curating living collections and collection databases. Contributions: concept of study, annotation and verification of herbarium scans, preparation of the manuscript.
Claus Weiland is scientific programmer at SBIK-F’s Data & Modelling Centre with main interests in large-scale machine learning, trait semantics and scientific data management. Contributions: Design of the GPU platform, data analysis and preparation of the manuscript.
Stefan Dressler is curator of the phanerogam collection of the Herbarium Senckenbergianum Frankfurt/M., which includes its digitization and curation of associated databases. Taxonomically he is working on Marcgraviaceae, Theaceae, Pentaphylacaceae and several Phyllanthaceous genera. Contribution: Herbarium Senckenbergianum collection, preparation of the manuscript.
Bernhard Seeger is professor of computer science systems at the Philipps University of Marburg. His research fields include high-performance database systems, parallel computation and real-time processing of high-throughput data with a focus on spatial biodiversity data. Contribution: Provision of support in machine learning and data processing.
Thomas Hickler is head of SBIK-F’s Data & Modelling Centre and Professor for Biogeography at the Goethe University Frankfurt. He is particularly interested in interactions between climate and the terrestrial biosphere, including potential impacts of climate change on species, ecosystems and associated ecosystem services. Contribution: Preparation of the manuscript, comprehensive concept of study within biodiversity sciences.
Conflict of interest
No potential conflict of interest was reported by the authors.
References
- [1] Suchet Bargoti and James Underwood. Deep fruit detection in orchards. In 2017 IEEE International Conference on Robotics and Automation (ICRA), pages 3626–3633. IEEE, 2017.
- [2] Leonardo M Borges, Victor Candido Reis, Rafael Izbicki, and SP Carlos. Schrödinger’s phenotypes: herbarium specimens show two-dimensional images are both good and (not so) bad sources of morphological data. bioRxiv, 2020.
- [3] Thomas Borsch, Albert-Dieter Stevens, Eva Häffner, Anton Güntsch, Walter G Berendsohn, Marc Appelhans, Christina Barilaro, Bánk Beszteri, Frank Blattner, Oliver Bossdorf, et al. A complete digitization of german herbaria is possible, sensible and should be started now. Research Ideas and Outcomes, 6:e50675, 2020.
- [4] Jose Carranza-Rojas, Herve Goeau, Pierre Bonnet, Erick Mata-Montero, and Alexis Joly. Going deeper in the automated identification of herbarium specimens. BMC Evolutionary Biology, 17(1):1–14, 2017.
- [5] Jose Carranza-Rojas, Alexis Joly, Hervé Goëau, Erick Mata-Montero, and Pierre Bonnet. Automated identification of herbarium specimens at different taxonomic levels. In Multimedia Tools and Applications for Environmental & Biodiversity Informatics, pages 151–167. Springer, 2018.
- [6] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, pages 248–255. Ieee, 2009.
- [7] Mark Everingham and John Winn. The pascal visual object classes challenge 2012 (voc2012) development kit. Pattern Analysis, Statistical Modelling and Computational Learning, Tech(Rep 8), 2011.
- [8] Ross Girshick. Fast r-cnn. In Proceedings of the IEEE international conference on computer vision, pages 1440–1448, 2015.
- [9] Nicolai Häni, Pravakar Roy, and Volkan Isler. A comparative study of fruit detection and counting methods for yield mapping in apple orchards. Journal of Field Robotics, 37(2):263–282, 2020.
- [10] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
- [11] J Mason Heberling and David J Burke. Utilizing herbarium specimens to quantify historical mycorrhizal communities. Applications in plant sciences, 7(4):e01223, 2019.
- [12] David H Hubel and Torsten N Wiesel. Receptive fields and functional architecture of monkey striate cortex. The Journal of physiology, 195(1):215–243, 1968.
- [13] Yu Jiang, Changying Li, Andrew H Paterson, and Jon S Robertson. Deepseedling: deep convolutional network and kalman filter for plant seedling detection and counting in the field. Plant methods, 15(1):141, 2019.
- [14] Patricia LM Lang, Franziska M Willems, JF Scheepens, Hernán A Burbano, and Oliver Bossdorf. Using herbaria to study global environmental change. New Phytologist, 221(1):110–122, 2019.
- [15] Gwenaël Le Bras, Marc Pignal, Marc L Jeanson, Serge Muller, Cécile Aupic, Benoît Carré, Grégoire Flament, Myriam Gaudeul, Claudia Gonçalves, Vanessa R Invernón, et al. The french muséum national d’histoire naturelle vascular plant herbarium collection dataset. Scientific data, 4(1):1–16, 2017.
- [16] Yann LeCun, Yoshua Bengio, et al. Convolutional networks for images, speech, and time series. The handbook of brain theory and neural networks, 3361(10):1995, 1995.
- [17] Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie. Feature pyramid networks for object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2117–2125, 2017.
- [18] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European conference on computer vision, pages 740–755. Springer, 2014.
- [19] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In European conference on computer vision, pages 21–37. Springer, 2016.
- [20] Xiaochun Mai, Hong Zhang, and Max Q-H Meng. Faster r-cnn with classifier fusion for small fruit detection. In 2018 IEEE International Conference on Robotics and Automation (ICRA), pages 7166–7172. IEEE, 2018.
- [21] Masakazu Matsugu, Katsuhiko Mori, Yusuke Mitari, and Yuji Kaneda. Subject independent facial expression recognition with robust face detection using a convolutional neural network. Neural Networks, 16(5-6):555–559, 2003.
- [22] Sharada P Mohanty, David P Hughes, and Marcel Salathé. Using deep learning for image-based plant disease detection. Frontiers in plant science, 7:1419, 2016.
- [23] Tankred Ott, Christoph Palm, Robert Vogt, and Christoph Oberprieler. Ginjinn: An object-detection pipeline for automated feature extraction from herbarium specimens. Applications in Plant Sciences, page e11351, 2020.
- [24] V. Otte, K. Wesche, S. Dressler, G. Zizka, M. Hoppenrath, and F. Kienast. The new herbarium senckenbergianum: Old institutions under a new common roof. Taxon, 60(2):617–618, 2011.
- [25] German I Parisi, Ronald Kemker, Jose L Part, Christopher Kanan, and Stefan Wermter. Continual lifelong learning with neural networks: A review. Neural Networks, 113:54–71, 2019.
- [26] Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You only look once: Unified, real-time object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 779–788, 2016.
- [27] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster r-cnn: Towards real-time object detection with region proposal networks. In Advances in neural information processing systems, pages 91–99, 2015.
- [28] Inkyu Sa, Zongyuan Ge, Feras Dayoub, Ben Upcroft, Tristan Perez, and Chris McCool. Deepfruits: A fruit detection system using deep neural networks. Sensors, 16(8):1222, 2016.
- [29] Gideon F Smith and Estrela Figueiredo. The global plants initiative: where it all started. Taxon, 63(3):707–709, 2014.
- [30] Madeleine Stein, Suchet Bargoti, and James Underwood. Image based mango fruit detection, localisation and yield estimation using multiple view geometry. Sensors, 16(11):1915, 2016.
- [31] Jun Sun, Xiaofei He, Xiao Ge, Xiaohong Wu, Jifeng Shen, and Yingying Song. Detection of key organs in tomato based on deep migration learning in a complex background. Agriculture, 8(12):196, 2018.
- [32] Barbara M Thiers. The world’s herbaria 2019: A summary report based on data from index herbariorum. New York Botanical Garden, (3), 2020.
- [33] Tzutalin. Labelimg, 2015.
- [34] Anneke A Veenstra. Herbarium collections—an invaluable resource for gall midge taxonomists. Muelleria, 30(1):59–64, 2012.
- [35] Jana Wäldchen and Patrick Mäder. Flora incognita–wie künstliche intelligenz die pflanzenbestimmung revolutioniert: Botanik. Biologie in unserer Zeit, 49(2):99–101, 2019.
- [36] William N Weaver, Julienne Ng, and Robert G Laport. Leafmachine: Using machine learning to automate leaf trait extraction from digitized herbarium specimens. Applications in Plant Sciences, 8(6):e11367, 2020.
- [37] Charles G Willis, Elizabeth R Ellwood, Richard B Primack, Charles C Davis, Katelin D Pearson, Amanda S Gallinat, Jenn M Yost, Gil Nelson, Susan J Mazer, Natalie L Rossington, et al. Old plants, new tricks: Phenological research using herbarium specimens. Trends in ecology & evolution, 32(7):531–546, 2017.
- [38] Yuxin Wu, Alexander Kirillov, Francisco Massa, Wan-Yen Lo, and Ross Girshick. Detectron2, 2019.
- [39] Jason Yosinski, Jeff Clune, Yoshua Bengio, and Hod Lipson. How transferable are features in deep neural networks? In Advances in neural information processing systems, pages 3320–3328, 2014.
- [40] Sohaib Younis, Marco Schmidt, and Stefan Dressler. Plant organ detections and annotations on digitized herbarium scans, 2020.
- [41] Sohaib Younis, Claus Weiland, Robert Hoehndorf, Stefan Dressler, Thomas Hickler, Bernhard Seeger, and Marco Schmidt. Taxon and trait recognition from digitized herbarium specimens using deep convolutional neural networks. Botany Letters, 165(3-4):377–383, 2018.
- [42] Matthew D Zeiler and Rob Fergus. Visualizing and understanding convolutional networks. In European conference on computer vision, pages 818–833. Springer, 2014.
- [43] Zhong-Qiu Zhao, Peng Zheng, Shou-tao Xu, and Xindong Wu. Object detection with deep learning: A review. IEEE transactions on neural networks and learning systems, 30(11):3212–3232, 2019.