Deep Learning for Plant Identification and Disease Classification from Leaf Images: Multi-prediction Approaches

Jianping Yao, School of Information and Communication Technology, University of Tasmania, TAS 7248, Australia; Son N. Tran, School of Information Technology, Deakin University, VIC 3125, Australia; Saurabh Garg, School of Information and Communications Technology, University of Tasmania, TAS 7248, Australia; and Samantha Sawyer, Tasmania Institute of Agriculture, University of Tasmania, Tasmania, Australia

(2022)

Abstract.

Deep learning plays an important role in modern agriculture, especially in plant pathology using leaf images, where convolutional neural networks (CNNs) are attracting a lot of attention. While numerous reviews have explored the applications of deep learning within this research domain, there remains a notable absence of an empirical study offering insightful comparisons, because existing evaluations employ varied datasets. Furthermore, a majority of these approaches address the problem as a single prediction task, overlooking the multifaceted nature of predicting both plant species and disease types. Lastly, there is an evident need for deeper consideration of the semantic relationships that underlie plant species and disease types. In this paper, we start our study by surveying current deep learning approaches for plant identification and disease classification. We categorise the approaches into multi-model, multi-label, multi-output, and multi-task, in which different backbone CNNs can be employed. Furthermore, based on the survey of existing approaches in plant pathology and the study of available approaches in machine learning, we propose a new model named Generalised Stacking Multi-output CNN (GSMo-CNN). To investigate the effectiveness of different backbone CNNs and learning approaches, we conduct an intensive experiment on three benchmark datasets: Plant Village, Plant Leaves, and PlantDoc. The experimental results demonstrate that InceptionV3 can be a good choice for a backbone CNN, as its performance is better than that of AlexNet, VGG16, ResNet101, EfficientNet, MobileNet, and a custom CNN developed by us. Interestingly, there is empirical evidence to support the hypothesis that using a single model for both tasks can be comparable to or better than using two models, one for each task. Finally, we show that the proposed GSMo-CNN achieves state-of-the-art performance on the three benchmark datasets.

deep learning, convolutional neural networks, multi-prediction, plant identification, leaf disease classification, plant pathology

CCS Concepts: Computing methodologies → Computer vision tasks; Applied computing → Agriculture

1. Introduction

Deep learning (DL) has been a major disruptor in a wide range of real-life applications, from autonomous vehicles to clinical decision support. In agriculture, deep learning approaches have been emerging as a revolutionary tool for sustainable production. In particular, they play an important role in precision agriculture (PA) and smart agriculture (SA) (Vijaykanth Reddy and Sashi Rekha, 2021; Gajjar et al., 2021; Chouhan et al., 2021; Mureşan et al., 2020; Chouhan et al., 2020), including, but not limited to, pest, weed, and irrigation control, automatic harvesting, yield estimation, and plant disease and fruit detection. Deep learning within the field of plant pathology has garnered significant attention from both the research and industrial sectors. In these domains, the identification of plants and the classification of diseases based on leaf images have witnessed extensive study and practical application (Azlah et al., 2019; Othman et al., 2022; Shelke and Mehendale, 2022; Kanda et al., 2021; Wu et al., 2007; Agarwal et al., 2019; Ashok et al., 2020; Agarwal et al., 2020; Saraswathi et al., 2021; G. and J., 2019; Sunil et al., 2020; Lee et al., 2021). Leaf images are commonly used because leaves are an important part of plants: they carry out photosynthesis and are the most visible part of most plants throughout their growth (Chouhan et al., 2020). Leaf colour, texture, and shape can characterise plant species and are therefore useful in plant identification for large-scale plant and crop management (Wu et al., 2007). In addition, many plant diseases are visible on leaves. Leaf disease is one of the important factors that disrupt the health of the whole plant and is one of the main causes of reduced crop yield. Therefore, it is critical for farmers to detect leaf disease as early as possible and minimise its negative impact or, at least, keep it under control. In general, there is a rising demand from industry for effective methods to accurately detect and/or classify leaf diseases.

Deep learning is revolutionising traditional methods for SA/PA, especially in plant and disease classification. Although popular in the past, traditional approaches have several obvious limitations, mostly caused by manual costs such as labour training, the need for human involvement in many stages of the prediction process, and reliance on expert knowledge. With manual methods, diseases are difficult to detect in a timely manner, and the diagnosis may rest on subjective judgment. Nowadays, with the assistance of computer vision, the Internet of Things (IoT), and machine learning, leaf diseases can be detected in real time through various devices, e.g., mobile phone applications (Paymode et al., 2021), websites (Wadhawan et al., 2020), IoT applications (Chen et al., 2020), and smart glasses (Ponnusamy et al., 2020). For example, (Chen et al., 2020) introduced a combined approach of IoT and AI models to detect rice blast disease. Compared to our work, this approach uses a custom convolutional neural network (CNN), similar to the first model we implement in Section 3.3.1, but it works on different input data. In particular, in (Chen et al., 2020) an IoT platform for soil cultivation was utilised to extract non-image data, while our study focuses on image data.

With these advanced technologies, the difficulty of production operations can be greatly reduced, and accuracy and efficiency can be improved. Among them, machine learning has been emerging as a key player, leading the innovation pathway towards more effective prediction solutions in plant identification and disease classification. In the earlier stage, researchers employed traditional machine learning approaches, combining feature extraction and classification, with limited success (Kirti and Rajpal, 2020; Barburiceanu et al., 2020; Singh et al., 2020b; Das et al., 2020; Gadade and Kirange, 2020; JAYAPRAKASH and BALAMURUGAN, 2021; Shahidur Harun Rumy et al., 2021; Bharate and Shirdhonkar, 2020). The key issue is that such approaches rely heavily on an independent step that crafts features from the images in order to classify plant species and disease types. The features are normally defined by domain experts or generated by general image processing techniques such as Scale-invariant Feature Transform (SIFT) (JAYAPRAKASH and BALAMURUGAN, 2021) and grey-level co-occurrence matrix (GLCM) (Dang-Ngoc et al., 2021; Bharate and Shirdhonkar, 2020; Shahidur Harun Rumy et al., 2021; Tulshan and Raul, 2019; Kumar et al., 2020b).

Because they are crafted independently of the later step of learning a classifier, these handcrafted features may not be optimal for prediction. Deep learning has emerged recently as an effective solution. For example, CNNs can provide an end-to-end classification pipeline to identify plants and classify diseases directly from leaf images. An advantage of CNNs is their ability to learn distinctive features tailored to specific tasks. Furthermore, their adaptability allows models to be fine-tuned to maximise effectiveness on a particular dataset. Finally, their distributed computation capability makes them an ideal choice for large-scale solutions, enabling efficient processing and analysis of substantial datasets.

Although the application of DL to plant identification and disease classification is not new, we found that most current studies develop CNN models as single-prediction classifiers. It would be more convenient to have a multi-prediction approach for plant species and disease types, as the two have different indicative features in leaf images. More importantly, besides sharing some common types of disease, different plant species are prone to different diseases of their own. Therefore, we hypothesise that by incorporating the two tasks in one single model, we can improve the prediction performance for each task. To investigate this hypothesis, we employ multi-label, multi-output, and multi-task approaches with deep learning as the core. The main constraint is that each image must carry multiple labels to enable the training of such deep models. Let us formally define the problem as follows.

Problem Statement: Given a data set $\mathcal{X}=\{(x^{(n)},p^{(n)},d^{(n)}) \mid n=1,\ldots,N\}$, where $x^{(n)}\in\mathbb{R}^{W\times H\times C}$ is an image with width $W$, height $H$, and $C$ channels; $p^{(n)}\in\mathcal{P}$ is a plant species; and $d^{(n)}\in\mathcal{D}$ is a plant disease, how can we train a deep learning model $\mathcal{N}$ to accurately identify the plant species and the disease type from an unseen image $x^{*}$?

A plethora of deep learning models, mostly CNNs, have been employed for plant and leaf classification separately, including AlexNet (G. and J., 2019; Ashok et al., 2020; Agarwal et al., 2019; ANANDHAKRISHNAN and JAISAKTHI, 2020), GoogLeNet (Vijaykanth Reddy and Sashi Rekha, 2021; Zhang et al., 2018), VGG (Agarwal et al., 2020; ANANDHAKRISHNAN and JAISAKTHI, 2020; Bir et al., 2020; Agarwal et al., 2019; Huang et al., 2020; Thet et al., 2020), Inception (Agarwal et al., 2020; KRISHNAMOORTHY and PARAMESWARI, 2021; Hassan et al., 2021; Sai, 2021), ResNet (ANANDHAKRISHNAN and JAISAKTHI, 2020; Guan, 2021; Kumar et al., 2020a; Vijaykanth Reddy and Sashi Rekha, 2021), and MobileNet (Mwebaze et al., 2019; Surya and Gautama, 2020; Agarwal et al., 2020; Huang et al., 2020). However, several questions have not been studied properly, including (i) which backbone CNNs are most useful for plant identification and disease classification; (ii) what other deep learning approaches can be employed for this task; (iii) whether separate models for plant identification and disease classification perform better than a single model for both tasks; and (iv) whether their performance comparison is consistent across different datasets.

In this paper, we aim to answer the above questions, and finally verify our hypothesis, by surveying, developing, and comparing a wide range of CNN architectures. To this end, we solve the problem of plant identification and disease classification by employing and evaluating a variety of deep learning approaches. We conduct an empirical study to analyse the usefulness of current deep learning models for plant species identification and leaf disease detection from leaves. We categorise the deep learning approaches into:

• Multi-model: This is an ensemble of two CNN models, one tasked with predicting plant species from leaf images and the other with predicting diseases. For completeness, we use and compare different backbone CNNs for these models, including our custom CNNs, AlexNet, VGG16, ResNet101, EfficientNet, InceptionV3, and MobileNetV2.
• Multi-label: This is a single CNN model with a multi-label output. Different from standard multi-label learning (Madjarov et al., 2012), in this case we combine the labels to make a power-set label for the prediction. In other words, we use one combined label to represent multiple labels. We also employ different backbone CNNs for this approach.
• Multi-output: This is a single CNN model with multiple output layers, one for each prediction target. Similarly, different backbone CNNs are used for this model as well.
• Multi-task: We can also adapt multi-task learning to this problem. In this case, the target tasks are different, but the input data for the tasks come from the same distribution. Multi-task learning has been employed effectively in many computer vision problems (Yuan et al., 2012; Zhang et al., 2013), and deep learning can bolster this class of approaches because one task’s features may benefit other tasks’ learning (Zhang and Yang, 2021; Zhang et al., 2014).
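To make the difference between these categories concrete, the following minimal sketch shows how the training targets for the same images could be encoded under the power-set (multi-label) scheme and under the multi-output scheme. The class names, the toy samples, and the helper functions `build_powerset_index` and `multi_output_targets` are hypothetical and chosen only for illustration; they are not taken from the released source code.

```python
# Hypothetical toy label sets; the benchmark datasets have many more classes.
SPECIES = ["apple", "tomato"]
DISEASES = ["healthy", "black_rot", "late_blight"]


def build_powerset_index(pairs):
    """Power-set labelling: each distinct (species, disease) pair becomes one class id."""
    unique_pairs = sorted(set(pairs))
    return {pair: idx for idx, pair in enumerate(unique_pairs)}


def multi_output_targets(pair):
    """Multi-output labelling: one class id per output layer (species_id, disease_id)."""
    species, disease = pair
    return SPECIES.index(species), DISEASES.index(disease)


# Each sample stands for the labels of one leaf image.
samples = [
    ("apple", "healthy"),
    ("apple", "black_rot"),
    ("tomato", "healthy"),
    ("tomato", "late_blight"),
    ("apple", "black_rot"),
]

powerset_index = build_powerset_index(samples)
powerset_targets = [powerset_index[pair] for pair in samples]
output_targets = [multi_output_targets(pair) for pair in samples]

print(powerset_targets)  # one joint class id per image
print(output_targets)    # one (species_id, disease_id) tuple per image
```

Under this encoding, a multi-model ensemble would train one classifier on the first element of each tuple and a second classifier on the other, whereas a multi-output model would attach both targets to a single shared backbone. The power-set scheme only creates classes for pairs that actually occur in the data, which is why the number of joint classes can be much smaller than the product of the two label-set sizes.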

Furthermore, based on the theoretical analysis of the above approaches, we propose a new method to improve the performance of plant identification and disease classification. After that, we review and select suitable benchmark datasets for an extensive experiment. For the empirical analysis, we use 3 different datasets including Plant Village (Hughes and Salathe, 2016), Plant Leaves (Chouhan et al., 2019), and PlantDoc (Singh et al., 2020a). The experimental results show that InceptionV3 is the best CNN backbone for both plant identification and disease classification. Interestingly, there is empirical evidence to support the hypothesis stated earlier, as single models for both tasks can achieve better performance than an ensemble of independent models. Finally, our proposed model can achieve state-of-the-art results on the benchmark datasets studied in this paper. The contribution of this paper is threefold:

• We conduct a detailed literature review through appropriate selection criteria, covering recent common machine learning methods and available public datasets in the field of plant identification and disease classification. We generalise the methods into two classes of multi-prediction paradigm (multi-model CNNs and multi-label CNNs) and extend the study by re-introducing several methods that are available in the machine learning literature (multi-output CNNs and multi-task learning) but have not been studied deeply in plant pathology research. As far as we know, this is the first study to survey and compare different deep multi-prediction techniques.
• We propose a novel model, namely Generalised Stacking Multi-output CNN (GSMo-CNN), to improve the performance of plant identification and disease classification. Our model creates multiple output layers and stacks them one on top of another to form a chain of classification in a hierarchical manner. By doing this, we aim to represent the relationship between plant species and disease types during the inference process. The experimental results show that GSMo-CNN achieves state-of-the-art performance on all datasets studied in this paper.
• Our study reveals several important empirical findings. First, we show that the selection of the backbone CNN is critical for achieving good performance, and that InceptionV3 has the best performance in our study. Second, we demonstrate the advantage of single models for multi-prediction over multiple models. This is an interesting finding, as single models are more compact and easier to train. Third, we showcase the effectiveness of combining labels in a hierarchical manner: together with transfer learning, it can significantly improve the prediction performance for both plant species and leaf diseases. These findings offer valuable insights for researchers and practitioners in agriculture. They can save time otherwise spent searching for suitable approaches to more accurate and efficient plant species identification and disease classification, ultimately contributing to improved crop management and disease control. For the sake of reproducibility, we share the source code and datasets of this study at: https://github.com/funzi-son/plant_pathology_dl.
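The idea of stacking output layers into a chain can be illustrated with a small forward-pass sketch. This is only one possible reading of the description above, under stated assumptions: the disease head here receives the backbone features concatenated with the species probabilities, and the feature vector, weights, and the function `stacked_heads` are all hypothetical. The actual GSMo-CNN wiring is defined by the authors' model and released code, not by this sketch.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, bias, inputs):
    """Dense layer: one output per weight row."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]


def stacked_heads(features, species_head, disease_head):
    """Chain two output layers (assumed wiring, for illustration only).

    The species head sees the backbone features; the disease head sees the
    same features concatenated with the predicted species probabilities.
    """
    w_s, b_s = species_head
    species_probs = softmax(linear(w_s, b_s, features))
    w_d, b_d = disease_head
    disease_probs = softmax(linear(w_d, b_d, features + species_probs))
    return species_probs, disease_probs


# Hypothetical 3-dimensional backbone features, 2 species, 3 diseases.
features = [0.5, -1.0, 2.0]
species_head = ([[0.2, 0.1, 0.4], [-0.3, 0.5, 0.1]], [0.0, 0.1])
disease_head = (
    [[0.1, 0.0, 0.2, 1.0, -1.0],
     [0.3, -0.2, 0.1, -1.0, 1.0],
     [0.0, 0.4, -0.1, 0.5, 0.5]],
    [0.0, 0.0, 0.0],
)

species_probs, disease_probs = stacked_heads(features, species_head, disease_head)
print(species_probs, disease_probs)
```

The point of the chain is that the disease prediction can condition on what the model believes about the species, which is one way to encode the observation that different species are prone to different diseases.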

2. Machine Learning for Plant Identification and/or Disease Classification

2.1. Article Selection Criteria

Table 1. The Publication Years of Referenced Academic Articles (* Denotes Review Papers).

Year	Paper	Tech	Review	Total
2021 - 2023	(Metre and Sawarkar, 2021), (Sujatha et al., 2021), (Vijaykanth Reddy and Sashi Rekha, 2021), (Shahidur Harun Rumy et al., 2021), (Dang-Ngoc et al., 2021), (Raina and Gupta, 2021)*, (Paymode et al., 2021), (Guan, 2021), (KRISHNAMOORTHY and PARAMESWARI, 2021), (Chowdhury et al., 2021), (Saraswathi et al., 2021), (JANA et al., 2021), (Mukhopadhyay et al., 2021), (Li et al., 2021b)*, (Chouhan et al., 2021), (Gajjar et al., 2021), (Hu et al., 2021), (JAYAPRAKASH and BALAMURUGAN, 2021), (Hassan et al., 2021), (Ekanayake and Nawarathna, 2021)*, (Sai, 2021), (Kathiresan et al., 2021), (Metre and Sawarkar, 2022)*, (Ariyapadath, 2021), (Qin et al., 2021), (Vandenhende et al., 2021b), (Hassanin et al., 2021), (Lee et al., 2021), (Yousef Methkal Abd et al., 2023), (Pandey et al., 2023), (Thangaraj et al., 2023)	27	4	31
2020	(Xie et al., 2020), (Singh et al., 2020a), (Kirti and Rajpal, 2020), (Bharate and Shirdhonkar, 2020), (Barburiceanu et al., 2020), (Huang et al., 2020), (Thet et al., 2020), (Singh et al., 2020b), (Bhowmik et al., 2020), (Ashok et al., 2020), (Kumar et al., 2020a), (lakshmi and Nickolas, 2020), (Kawatra et al., 2020), (Kumar et al., 2020b), (Surya and Gautama, 2020), (Wadhawan et al., 2020), (Rajesh et al., 2020), (Sunil et al., 2020), (Chaudhari and Patil, 2020), (Bir et al., 2020), (Das et al., 2020), (Ponnusamy et al., 2020), (Mureşan et al., 2020)*, (Gadade and Kirange, 2020), (Sharma et al., 2020), (Agarwal et al., 2020), (ANANDHAKRISHNAN and JAISAKTHI, 2020), (Chouhan et al., 2020)*, (Fu et al., 2020), (Chen et al., 2020)	28	2	30
2015 - 2019	(G. and J., 2019), (Agarwal et al., 2019), (Tulshan and Raul, 2019), (Mwebaze et al., 2019), (Zhang et al., 2018), (Sardogan et al., 2018), (Padol and Yadav, 2016), (Hughes and Salathé, 2015), (Hughes and Salathe, 2016), (dos Santos Ferreira et al., 2017), (Jasitha et al., 2019), (Misra et al., 2016), (Shinohara, 2016), (Wang et al., 2016)	14	0	14

The academic papers for this study were selected according to three criteria. Firstly, the publication timeline: as illustrated in Table 1, over 80% of the research articles in this study were published between 2020 and 2023. This timeframe was chosen to ensure the relevance and timeliness of this research. Secondly, the citation count of each paper and the ranking of its venue (such as Q1 for journals and CORE A/A* for conferences) were considered. Lastly, relevance to this research was determined through keyword searches, including “leaf disease”, “plant disease”, “machine learning”, “deep learning”, “classification”, and “detection”. These keywords were used to search reputable databases such as EBSCOhost, Scopus, and Google Scholar.
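The "over 80%" figure can be checked directly from the totals in Table 1. The short calculation below uses only the counts listed there; the dictionary keys are shorthand labels for the three table rows.

```python
# Counts per period as listed in Table 1: (technical papers, review papers).
table1 = {
    "2021-2023": (27, 4),
    "2020": (28, 2),
    "2015-2019": (14, 0),
}

totals = {period: tech + review for period, (tech, review) in table1.items()}
all_papers = sum(totals.values())
recent = totals["2021-2023"] + totals["2020"]
recent_share = recent / all_papers

print(f"{recent}/{all_papers} = {recent_share:.1%}")  # prints "61/75 = 81.3%"
```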

2.2. Traditional Machine Learning versus Deep Learning

In earlier years, traditional (shallow) machine learning was used for plant identification and leaf disease classification (Ariyapadath, 2021; Xie et al., 2020; Agarwal et al., 2020; Thet et al., 2020; Sardogan et al., 2018; Bhowmik et al., 2020; lakshmi and Nickolas, 2020). This machine learning paradigm in plant pathology consists of two different steps: feature extraction (Raina and Gupta, 2021; Li et al., 2021b; Metre and Sawarkar, 2022, 2021) and classifier training (Kirti and Rajpal, 2020; Barburiceanu et al., 2020; Tulshan and Raul, 2019; Bharate and Shirdhonkar, 2020). In some cases, researchers also include data segmentation after collecting and pre-processing the data and before applying feature extraction (Metre and Sawarkar, 2022, 2021). Among the many feature extraction techniques, K-means clustering (Kirti and Rajpal, 2020; Padol and Yadav, 2016; Kumar et al., 2020b; Chaudhari and Patil, 2020) and grey-level co-occurrence matrix (GLCM) (Bharate and Shirdhonkar, 2020; Tulshan and Raul, 2019; Dang-Ngoc et al., 2021; Kumar et al., 2020b; Shahidur Harun Rumy et al., 2021) are the most common. In terms of shallow learning classifiers, Support Vector Machines (SVMs) (Kirti and Rajpal, 2020; Barburiceanu et al., 2020; Singh et al., 2020b; Das et al., 2020; Gadade and Kirange, 2020; Shahidur Harun Rumy et al., 2021; Padol and Yadav, 2016; Bharate and Shirdhonkar, 2020; Dang-Ngoc et al., 2021; Kumar et al., 2020b; Chaudhari and Patil, 2020; Mukhopadhyay et al., 2021) were the most popular, followed by K-Nearest Neighbor (KNN) (Tulshan and Raul, 2019; Bharate and Shirdhonkar, 2020), Random Forest (RF) (Shahidur Harun Rumy et al., 2021), Multilayer Perceptron (MLP) (JAYAPRAKASH and BALAMURUGAN, 2021), and Decision Tree (Rajesh et al., 2020). They are all popular in machine learning applications and achieve good performance in leaf disease detection and/or classification. When data segmentation is used, the region of interest must first be detected in the data before it is fed to the feature extractors and then to the classifiers. As we can see, the whole process is complex, involving several consecutive steps such as data acquisition, data processing, feature extraction, and prediction (Raina and Gupta, 2021; Li et al., 2021b; Metre and Sawarkar, 2022, 2021).
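As an illustration of the handcrafted feature-extraction step, the sketch below computes a grey-level co-occurrence matrix for a tiny image patch and derives the standard contrast statistic from it. The 4×4 patch, the choice of a horizontal offset, and the function names are assumptions made for the example; the surveyed papers typically rely on library implementations and several offsets and statistics.

```python
def glcm(image, levels, dx=1, dy=0):
    """Count how often grey level i is followed by grey level j at offset (dx, dy)."""
    matrix = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                matrix[image[r][c]][image[r2][c2]] += 1
    return matrix


def contrast(matrix):
    """GLCM contrast: sum of (i - j)^2 weighted by the normalised co-occurrence counts."""
    total = sum(sum(row) for row in matrix)
    return sum(((i - j) ** 2) * count / total
               for i, row in enumerate(matrix)
               for j, count in enumerate(row))


# Hypothetical 4x4 patch quantised to 4 grey levels (0..3).
patch = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]

cooccurrence = glcm(patch, levels=4)
patch_contrast = contrast(cooccurrence)
print(cooccurrence)
print(patch_contrast)
```

In a shallow pipeline, statistics such as this contrast value would be collected into a fixed-length feature vector and passed to a separately trained classifier such as an SVM, which is exactly the decoupling between features and classifier discussed above.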

Table 2. Comparison Between Deep Learning (DL) and Non-deep Learning (Non-DL) Approaches for Plant Pathology Using Leaf Images.

Paper	Year	Dataset	Non-DL	DL
(Sharma et al., 2020)	2020	Plant Village (Part)	66.4%	98%
(Saraswathi et al., 2021)	2021	Plant Village (Part)	90%	96%
(Agarwal et al., 2019)	2019	Plant Village (Grape)	97.5%	99%
(G. and J., 2019)	2019	Plant Village (Whole)	87.87%	97.87%
(Kumar et al., 2020a)	2020	Plant Village (Modified)	88.06%	96.51%
(Vijaykanth Reddy and Sashi Rekha, 2021)	2021	Plant Village (Apple)	68.73%	97.62%
(Ashok et al., 2020)	2020	Tomato Leaves	92.94%	98.12%
(Sujatha et al., 2021)	2021	Citrus Leaves (Rauf et al., 2019)	87%	89.5%
(Yousef Methkal Abd et al., 2023)	2023	Citrus Leaves (Rauf et al., 2019)	86%	99.98%

The rise of deep learning has brought substantial changes in technology adoption. As Table 2 shows, recent studies have affirmed that deep learning can achieve better performance than traditional (non-deep) learning approaches (Sujatha et al., 2021; Sharma et al., 2020; Saraswathi et al., 2021; Agarwal et al., 2019; G. and J., 2019; Kumar et al., 2020a; Vijaykanth Reddy and Sashi Rekha, 2021). A class of deep learning models, known as convolutional neural networks (CNNs), has been widely applied to plant identification and/or leaf disease classification. In (dos Santos Ferreira et al., 2017) the authors employed a popular CNN model named AlexNet for plant classification. In that work, AlexNet was shown to successfully classify grass and broadleaf with an average accuracy of up to 99%. For leaf disease classification, AlexNet achieved 91.19% accuracy on apple leaf images (Vijaykanth Reddy and Sashi Rekha, 2021), 86.5% on grape leaves (Agarwal et al., 2019), and 95.75% on tomato leaf images (Ashok et al., 2020). Note that these are separate models, one for each task. A deeper architecture, known as very deep convolutional neural networks (VGG or VGGNet), has shown better performance than AlexNet in image classification tasks. When applied to plant identification, VGG achieved 97% accuracy on the Leaf1 Dataset (8 species), 96.57% on the Flavia Dataset (32 species), and 85.37% on the D-Leaf Dataset (43 species) (Jasitha et al., 2019). VGG has also produced promising results for leaf disease classification. For example, for grape leaf disease, VGG-16 (16 hidden layers) has been applied to many datasets (Agarwal et al., 2019; Huang et al., 2020; Thet et al., 2020). Notably, in (Thet et al., 2020) VGG-16 with a Global Average Pooling (GAP) layer achieved 98.4% accuracy. Another common VGG architecture is VGG with 19 hidden layers, known as VGG-19, which achieved 96.86% accuracy in tomato leaf disease classification (Bir et al., 2020).

Another CNN architecture, known as Residual Networks or ResNets, has shown remarkable results in image classification tasks in recent years (it won the 2015 ImageNet competition). In (Qin et al., 2021), a set of ResNet variants was employed for plant identification. An evaluation on a real leaf dataset collected by the authors, with 15207 images of 201 species, reported the following accuracies: 91.83% (ResNet-50), 92.71% (Res2Net-50), and 92.95% (Res2Net-101). In the case of leaf disease classification, ResNet-50 achieved 98.40% accuracy for tomato leaves (ANANDHAKRISHNAN and JAISAKTHI, 2020), a customised ResNet model reached 82.78% accuracy on a modified Plant Village dataset (Guan, 2021), and ResNet-34 achieved 99.40% accuracy and a 0.9651 F1-score in betel vine leaf disease (Kumar et al., 2020a). In (Vijaykanth Reddy and Sashi Rekha, 2021) ResNet-20 was tested on apple leaf images and achieved 92.76% accuracy. Last but not least, InceptionV3 has recently emerged as a good model for classification tasks with leaf images. InceptionV3 is the third version of Google’s Inception CNN. In (Agarwal et al., 2020), it achieved 63.4% accuracy on tomato leaf diseases, and in (KRISHNAMOORTHY and PARAMESWARI, 2021) it achieved 95.41% in rice leaf disease classification. InceptionV3 has also been applied to the well-known Plant Village dataset with very promising results, as shown in (Hassan et al., 2021) (98.42% accuracy) and in (Sai, 2021) (99.74% accuracy).

Besides the very deep and complex models discussed above, several studies have also shown the advantages of light-weight CNNs in plant identification and leaf disease classification. One advantage of light-weight CNNs is that they can run on low-resource devices, enabling a wider range of applications in smart agriculture and precision agriculture. For example, MobileNet has been deployed on smartphones and IoT devices. In (Agarwal et al., 2020), it achieved 63.75% accuracy in tomato leaf disease classification, and in (Huang et al., 2020) it achieved 86% accuracy in grape leaf disease classification. Another light-weight CNN model is EfficientNet, whose variants (B0, B4, B7) have been used to classify tomato leaf diseases (Plant Village) (Chowdhury et al., 2021). To enable high performance for EfficientNet, the authors relabelled the dataset into three subtasks: task 1: healthy & unhealthy; task 2: 5 classes of leaf state, including bacterial & fungal; task 3: 1 healthy & 9 diseases. The results show that B7 performed best in task 1 (99.95%) and task 2 (99.12%), and B4 performed best in task 3 (99.89%).

The related work above applied CNNs separately to plant identification and disease classification from leaf images. Each paper evaluates its CNNs on a different dataset, and sometimes on a modified dataset, making it difficult to benchmark their performance. In contrast, this paper sets up a comprehensive evaluation to provide a comparative view of the CNNs, using multiple datasets.

Several attempts in recent years have shown promising approaches to using a single CNN for multiple tasks (Misra et al., 2016; Fu et al., 2020; Vandenhende et al., 2021b; Shinohara, 2016; Hassanin et al., 2021; Wang et al., 2016). In the case of plant pathology, a CNN can learn to predict both plant species and diseases at the same time with leaf images as input. For example, in (Lee et al., 2021) the authors showed the effectiveness of conditional multi-task learning for the simultaneous identification of plant species and classification of diseases. Unlike other studies on large-scale multi-task learning, the approach adapts the multi-task learning idea to interrelated labels, where the input data of the different tasks come from the same distribution. In the Plant Village and PlantDoc datasets, the labels are combinations of species and diseases, e.g., Apple Black Rot in Plant Village and Tomato Leaf Late Blight in PlantDoc. This combination deals with the multi-prediction problem by creating a joint label known as a power-set label, which transforms a multi-task or multi-label problem into a large-scale multi-class task. A trained model can predict the species and the disease simultaneously, and the predicted results are joint species-disease labels. Several examples that employ this idea are shown in Table 3. In (Sharma et al., 2020; Saraswathi et al., 2021; Sunil et al., 2020) the authors applied power-set CNNs to a subset of the Plant Village dataset, and in (G. and J., 2019; Kumar et al., 2020a; Guan, 2021; Kawatra et al., 2020) the authors worked on the whole or a modified Plant Village dataset. Most of these models achieved more than 90% accuracy. In contrast, according to a study on the PlantDoc dataset, with the support of image segmentation and power-set labelling, VGG-16 achieved 60.41% accuracy, InceptionV3 achieved 62.06% accuracy, and InceptionResNet V2 achieved 70.53% accuracy (Singh et al., 2020a).

Table 3. Existing Deep Learning Approaches for Plant Identification and/or Leaf Disease Classification

Paper	Year	Dataset	Categories	Size/Ratio (training/test)	Methods & Accuracy
Plant Identification
(Othman et al., 2022)	2022	Coriander & Parsley (Private)	2	100 (70%/ 30%)	CNN (90%)
(Shelke and Mehendale, 2022)	2022	Private Dataset	79	2591 (80%/20%)	DenseNet-161 (97.3%)
(Kanda et al., 2021)	2021	Middle European Woody Plants (Novotný and Suk, 2013)	119	(80%/20%)	LR (98.72%)
		Flavia Dataset (Wu et al., 2007)	32	1703 (80%/20%)	LR (99.58%)
		MalayaKew (MK) Leaf Dataset (Lee et al., 2015)	44	2816 (80%/20%)	LR (89.35%)
		MK Leaf (Lee et al., 2015) + Synthetic Dataset	44	(80%/20%)	LR (93.33%)
		Folio Dataset (Munisami et al., 2015)	32	(80%/20%)	LR (98.75%)
		Amazon Forest (Vizcarra et al., 2021)	9	59,441 (80%/20%)	LR (98.87%)
		LeafSnap Dataset (Kumar et al., [n. d.])	185	23,147 (80%/20%)	LR (89.27%)
		Swedish Dataset (Söderkvist, 2001)	15	1125 (80%/20%)	LR (100%)
(Ariyapadath, 2021)	2021	Swedish Dataset	15	1125 (70%/15%/15%, training/validation/test)	ANN (98.99%), KNN (96.68%) & RF (97.12%)
		Flavia Dataset	32	1907 (70%/15%/15%, training/validation/test)	ANN (96.29%), KNN (93.79%) & RF (95.24%)
		D-Leaf Dataset (TAN and Chang, 2018)	43	1290 (70%/15%/15%, training/validation/test)	ANN (95.31%), KNN (86.3%) & RF (91.5%)
(Jasitha et al., 2019)	2019	Leaf1 Dataset	8	75 (80%/20%)	GoogLeNet (98%) & VGG-16 (97%)
		Flavia Dataset	32	1879 (80%/20%)	GoogLeNet (94%) & VGG-16 (96.57%)
		D-Leaf Dataset (TAN and Chang, 2018)	43	1290 (80%/20%)	GoogLeNet (88.74%) & VGG-16 (85.37%)
Disease Classification
(Agarwal et al., 2019)	2019	Plant Village (Grape)	4	3800/200	CNN (99%), AlexNet (86.5%) & VGG16 (97.5%)
(Vijaykanth Reddy and Sashi Rekha, 2021)	2021	Plant Village (Apple)	4	10888/2801	CNN (97.62%), AlexNet (91.19%), GoogLeNet (95.69%), ResNet-20 (92.76%) & VGG-16 (96.32%)
(Ashok et al., 2020)	2020	Tomato Leaves (Self)	4	N/A	CNN (98.12%), AlexNet (95.75%) & ANN (92.94%)
(Agarwal et al., 2020)	2020	Plant Village (Tomato)	10	10,000/7,000/500 (training/validation/test)	CNN (91.2%), MobileNet (63.75%), VGG-16 (77.2%) & InceptionV3 (63.4%)
(Sardogan et al., 2018)	2018	Plant Village (Tomato)	5	500 (80%/20%)	CNN (86%)
(Chowdhury et al., 2021)	2021	Plant Village (Tomato)	2, 6, 10	5-fold	EfficientNet B0, B4, B7 (97% - 99%)
(ANANDHAKRISHNAN and JAISAKTHI, 2020)	2020	Plant Village (Tomato)	10	80%/20%	Xception V4 (99.45%), AlexNet (90.1%), LeNet (88.3%), ResNet (98.40%) & VGG-16 (90.1%)
(Yousef Methkal Abd et al., 2023)	2023	Citrus Leaves (Rauf et al., 2019)	5	609	C-GAN (99.6% & 97%), CNN (99.97% & 99.98%), SGD (85% & 86%) & ACO-CNN (99.98% & 99.99%) (accuracy & F1-score)
(Pandey et al., 2023)	2023	Cotton Leaf Disease Dataset	4	1661	SVM (98.7% & 98.7%), CNN (98.8% & 98.8%) & Hybrid (98.9% & 98.9%) (accuracy & F1-score)
(Thangaraj et al., 2023)	2023	Plant Village (Tomato)	10	80%/20%	InceptionV3 (94.58%), MobileNetV1 (82.7%), MobileNetV2 (92.1%) & MX-MLF2 (99.61%)
Single model for Plant Identification & Disease Classification
(Sharma et al., 2020)	2020	Plant Village (Part)	19	N/A	CNN (98%)
(Saraswathi et al., 2021)	2021	Plant Village (Part)	15	80%/20%	CNN (96%)
(G. and J., 2019)	2019	Plant Village (Whole)	38	55,636/1950	CNN (97.87%), AlexNet (87.34%), ResNet (92.56%), VGG16 (92.87%) & InceptionV3 (94.32%)
(Kumar et al., 2020a)	2020	Plant Village (Modified)	38	15,200 (80%/20%)	ResNet34 (99.40% & 96.51%; Accuracy & F1-score)
(Singh et al., 2020a)	2020	PlantDoc (Cropped)	28	80%/20%	VGG-16 (60.41%), InceptionV3 (62.06%) & InceptionResNet V2 (70.53%)
(Guan, 2021)	2021	Plant Village (Modified)	61	31718/4540	Stacking Model (87%), ResNet (82.78%), InceptionNet (82.22%), DenseNet (83.44%) & InceptionResNet (84.07%)
(Kawatra et al., 2020)	2020	Plant Village (Whole)	38	N/A	Hybrid (AlexNet + Linear SVM) Model (99.98%), Basic AlexNet (96.34%) & AlexNet with GAP Layer (97.29%)
(Sunil et al., 2020)	2020	Plant Village (Peach, Pepper & Strawberry)	6	70%/30%	Multi Convolutional Layered-based CNN (87.47% - 99.25%) with different epochs (50, 75,100 & 125)
(Lee et al., 2021)	2021	Plant Village + Digipathos (Garcia Arnal Barbedo et al., 2018) + Web images	1146 (311 species, 289 diseases)	10324 (80%/20%)	InceptionV3 & Conditional Multi-task Learning (CMTL), Total Top-1 Accuracy: 69.43%, Total Avg Accuracy: 64.79%, Disease Top-1: 82.96%, Species Top-1: 78.64%

The above studies are all based on multi-class classification, either directly or indirectly through the use of power-set labelling. Recently, researchers have also begun to adopt multi-task learning methods (different from power-set labelling) to directly address the classification of both species and diseases. For example, in (Lee et al., 2021) the authors proposed a conditional multi-task learning (CMTL) approach with InceptionV3 as the backbone CNN to predict both leaf species and diseases. In addition, the paper showed that the predicted plant species can be used to improve the prediction of disease. The dataset for the evaluation of CMTL consists of Plant Village, Digipathos (Garcia Arnal Barbedo et al., 2018) and Web images, which together make up 12,290 leaf images with 1146 joint species-disease labels (311 species & 289 diseases). The experiment showed that the total Top-1 accuracy for joint prediction of species-disease labels is 69.43%, the total average accuracy is 64.79%, the disease Top-1 accuracy is 82.96%, and the species Top-1 accuracy is 78.64%. Although power-set multi-label CNNs and CMTL are promising, many other multi-prediction approaches that could benefit plant identification and disease classification have not been deeply explored.

In this paper, we survey, implement, and evaluate a wide range of multi-prediction approaches with different CNN backbones that can be employed for predicting both plant species and diseases. We also propose a new deep learning architecture with a learning strategy to improve prediction performance.

3. Methodology

3.1. Datasets

Table 4. Public Leaf Disease Datasets.

ID	Dataset	Year	Species	Disease	Link
1	Plant Village	2016	14	22	https://data.mendeley.com/datasets/tywbtsjrjv/1
2	Plant Leaves	2019	12	22	https://data.mendeley.com/datasets/hb74ynkjcn/1
3	Plantae_k	2019	8	9	https://data.mendeley.com/datasets/t6j2h22jpx/1
4	PlantDoc	2020	13	17	https://github.com/pratikkayal/PlantDoc-Dataset
5	Plant Pathology 2021 - FGVC8	2021	1	6	https://www.kaggle.com/c/plant-pathology-2021-fgvc8/overview
6	Maize Leaf (NLB)	2018	1	2	https://osf.io/p67rz/
7	Citrus Leaves	2019	1	5	https://data.mendeley.com/datasets/3f83gxmv57/2
8	Rice Diseases Image Dataset	2019	1	4	https://www.kaggle.com/minhhuy2810/rice-diseases-image-dataset
9	JMuBEN (Arabica Coffee Leaf Images)	2021	1	3	https://data.mendeley.com/datasets/t2r6rszp5c/1
10	JMuBEN2	2021	1	2	https://data.mendeley.com/datasets/tgv3zb82nd/1
11	Cassava Diseases	2019	1	5	https://www.kaggle.com/c/cassava-disease/data
12	UCI Rice Leaf Diseases	2017	1	3	https://archive.ics.uci.edu/ml/datasets/Rice+Leaf+Diseases

Data plays a central role in modern AI technologies, including machine learning, deep learning, and computer vision, and in this study it is also the basis for comparing different methods. This section surveys the available datasets and selects suitable ones for benchmarking in the experiments. Unlike previous studies, which in most cases use only one dataset, we select three data sources and build four evaluation sets from them to provide a comprehensive comparison of different approaches. In (Chouhan et al., 2020), the authors showed that the foremost problem facing most researchers in this field is the lack of available datasets, which greatly restricts research on machine learning for plant identification and disease classification from leaf images. Fortunately, in recent years researchers have devoted considerable effort to collecting plant disease data, helping to fill this data availability gap. Table 4 lists recently published public datasets of plant leaf diseases for computer vision research. In the table, “Year” denotes the year of publication, “Species” denotes the number of plant species in the dataset, and “Disease” denotes the number of distinct diseases in the dataset (different plants may have different sets of diseases). In the experiments, we select three datasets and resize their images to the input shape required by each model; for example, the input size for the custom CNN and AlexNet is $256\times 256$ .

In Table 4, the datasets can be divided into two groups: multi-prediction datasets (Plant Village, Plant Leaves, PlantDoc & Plant Pathology 2021) and single-prediction datasets (Maize Leaf, Rice Diseases Image, JMuBEN, Cassava Diseases & UCI Rice Leaf). A multi-prediction dataset covers several plant species (the “Species” column in Table 4) and several types of disease (the “Disease” column in Table 4). Such datasets can be used for both species and disease classification, and we employ them in this study to explore the usefulness of multi-prediction approaches and to verify our hypothesis. A single-prediction dataset normally has only a single plant species and a set of disease types for that plant. The benchmark datasets selected for our study are detailed as follows:

3.1.1. Plant Village Dataset

The Plant Village dataset is currently one of the most widely used public datasets for research on leaf disease identification and classification. It is available in an original version and a data-augmented version. The original dataset, published in 2016 (Hughes and Salathe, 2016), consists of $54,305$ images of diseased and healthy leaves from 14 plant species (Apple, Blueberry, Cherry, Corn, Grape, Orange, Peach, Bell Pepper, Potato, Raspberry, Soybean, Squash, Strawberry & Tomato). Each species has 1 to 10 classes of related diseases, giving 22 unique disease categories in total, with some species sharing several diseases. Overall, the dataset has 38 unique combinations of species and diseases (e.g. Apple Black Rot), plus one additional category of images without leaves ( $1,143$ background images). The data-augmented version was released in 2019 (Hughes and Salathé, 2015). In this version, the creators applied six augmentation methods to improve the quality and quantity of the data: image flipping, Gamma correction, noise injection, principal component analysis (PCA) colour augmentation, rotation, and scaling. As a result, the number of samples increased from $54,305$ in the original version to $61,486$ . In our study, we carry out the experiments on the original version, splitting the data into $70\%$ - $10\%$ - $20\%$ for training, validation, and test sets respectively. Figures 1(a) and 2(a) show the class distribution of Plant Village: because Tomato has 9 disease groups and 1 healthy group, it has the largest number of images. Figure 3(a) shows the relationships between species and diseases in Plant Village.

3.1.2. Plant Leaves Dataset

The Plant Leaves dataset consists of $4,502$ images of healthy and unhealthy leaves divided into 22 categories by species and health condition. The images are in high-resolution JPG format. The dataset has 12 plant species: Alstonia Scholaris, Arjun, Bael, Basil, Chinar, Guava, Jamun, Jatropha, Lemon, Mango, Pomegranate, and Pongamia Pinnata. We partition the data samples into three sets, with $70\%$ for training, $10\%$ for validation and $20\%$ for testing. Figures 1(b) and 2(b) show the class distribution of Plant Leaves: because each species has 1 disease group and 1 healthy group, the healthy category has the largest number of images. The relationships between species and diseases in Plant Leaves are shown in Figure 3(b).

3.1.3. PlantDoc Dataset

In contrast to the Plant Village dataset, the PlantDoc dataset aims to establish a challenging benchmark with real-field images. The images in Plant Village were taken in a laboratory setup rather than in real cultivation fields, which limits the efficacy of the trained models in practice (Singh et al., 2020a). Plant Village may therefore be of limited use for developing applications that identify leaf diseases in the real world. PlantDoc is a large-scale non-lab dataset for leaf disease detection whose images have cluttered and diverse backgrounds. It has similar categories of plant species and disease types to Plant Village, with $2,598$ leaf images, 13 plant species, and 17 unique diseases. There are 38 classes for the combinations of species and diseases (e.g., Apple Scab Leaf). Originally, the data was partitioned into a training set of $2,360$ samples and a small test set of $238$ samples. We refer to this set as PlantDoc-1.0. To facilitate deeper comparison (with other works and for future study), we re-partition the data to create another dataset from PlantDoc: we shuffle the whole dataset and split it into $70\%$ - $10\%$ - $20\%$ for training, validation, and testing respectively. This set is referred to as PlantDoc-0.2. We use both versions of PlantDoc in this study. Figures 1(c) and 2(c) show PlantDoc’s class distribution: because Tomato has 7 disease groups and 1 healthy group, it has the largest number of leaf images. Figure 3(c) shows the relationships between species and diseases in PlantDoc.
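
The $70\%$ - $10\%$ - $20\%$ partitioning applied to each dataset can be sketched as follows. This is a minimal illustration rather than the script used in the experiments: the function name `split_70_10_20`, the fixed seed, the placeholder file names, and the rounding rule (sizes floored for training and validation, remainder assigned to the test set) are all assumptions.

```python
import random


def split_70_10_20(items, seed=0):
    """Shuffle items and split them into train/validation/test lists.

    Assumption: the training (70%) and validation (10%) sizes are floored
    and the remainder (about 20%) goes to the test set.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(0.7 * len(shuffled))
    n_val = int(0.1 * len(shuffled))
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Stand-in for the 2,598 PlantDoc images (hypothetical file names).
images = [f"img_{i:04d}.jpg" for i in range(2598)]
train, val, test = split_70_10_20(images)
print(len(train), len(val), len(test))
```

Fixing the seed keeps the three subsets reproducible, so that different models are compared on the same test images.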

3.2. Models

In this section, we survey different deep learning approaches which have been or can be applied to plant identification and disease classification. The deep learning models currently employed for plant identification or disease classification are the CNN models described in Section 3.2.1. Most of the other approaches presented below, however, have not been widely applied to plant pathology, although they are highly relevant and already exist in the machine learning literature.

3.2.1. Backbone CNNs

In recent years, many different architectures based on Convolutional Neural Networks (CNN) have been designed to deal with spatial data such as images and videos, especially in computer vision tasks. With their flexible and computationally efficient architectures, CNNs can be adapted to different scenarios and tasks. In what follows, we re-introduce several popular CNN models that have demonstrated excellent performance on image classification tasks. They have been tested on benchmark datasets such as CIFAR-100 and ImageNet, and are also the state-of-the-art approaches for plant identification and disease classification in recent research, as shown in Table 3.

Convolutional Neural Networks (CNN). CNNs refer to a class of neural networks that employ convolutional operators to propagate information from layer to layer. The convolutional operation is useful for image analysis as it helps neural networks learn local features, which makes CNNs popular for image data (Lecun et al., 1998). It is also the foundation for a series of later deep neural network structures. A CNN has an input layer and an output layer, and between these two layers there are several hidden layers in which the connections between a lower layer and an upper layer are formed by convolutional operators. The number of hidden layers is chosen depending on the complexity of the task. Recent techniques improve the performance of CNNs and allow them to scale to larger datasets. These include Rectified Linear Units (ReLU) and other activation functions, pooling (average and max pooling), and normalisation (Batch Norm and Layer Norm). Normally, a CNN architecture has several fully-connected layers after its series of convolutional layers and before the outputs. A fully-connected layer is a normal layer with dense connections (instead of convolutional connections). These layers process and combine the features from the preceding convolutional layers to make the prediction. One advantage of CNNs is that they can process raw pixel values from images and learn discriminative features in an end-to-end fashion.

Based on our study of neural networks, we design a custom CNN which can be applied to plant identification or leaf disease classification. The structure of our CNN is similar to (Chen et al., 2020); however, we apply our structure to image data. The input size of our CNN is set to $256\times 256\times 3$ and it has 4 convolutional layers in total. Each convolutional layer is followed by a batch normalization layer. After every 2 convolutional layers and their 2 batch normalization layers, we place a max-pooling layer (so this combination appears twice). The activation functions for all units are ReLU. On top of these layers, depending on the task, we add output layers to perform prediction. The units in each of these layers are constrained together as a softmax group. Besides the custom CNN, in what follows we present the most common off-the-shelf CNNs which have been used for image classification in general and plant pathology in particular.
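
The layer ordering of the custom CNN can be traced with a small shape calculation. Kernel sizes, padding, pooling size and filter counts are not specified above, so the values used here ('same'-padded convolutions, $2\times 2$ max pooling, 32 and 64 filters) are assumptions chosen only for illustration, as is the function name `custom_cnn_shapes`.

```python
def custom_cnn_shapes(height=256, width=256, channels=3, filters=(32, 64)):
    """Trace feature-map shapes through two [conv+BN, conv+BN, max-pool] blocks.

    Assumptions (not taken from the paper): convolutions use 'same' padding
    and keep the spatial size, max pooling is 2x2 and halves it, and the
    filter counts are 32 and 64.
    """
    shapes = [("input", (height, width, channels))]
    for block, n_filters in enumerate(filters, start=1):
        for conv in (1, 2):
            channels = n_filters  # a convolution sets the channel count
            shapes.append((f"block{block}_conv{conv}+bn", (height, width, channels)))
        height, width = height // 2, width // 2  # max pooling halves H and W
        shapes.append((f"block{block}_maxpool", (height, width, channels)))
    shapes.append(("flatten", (height * width * channels,)))
    return shapes


for name, shape in custom_cnn_shapes():
    print(f"{name:20s} {shape}")
```

Under these assumptions the $256\times 256\times 3$ input passes through four convolutional layers and two pooling steps before being flattened for the task-specific output layers.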

AlexNet. This CNN architecture starts with five convolutional layers, with two max-pooling layers between the first three convolutional layers, and ends with three fully connected layers. An interesting feature of AlexNet is its use of the non-saturating ReLU as its activation function. AlexNet was one of the early CNN models that made a breakthrough in image classification, notably being the first CNN to win the ImageNet challenge, in 2012.

VGG16. This CNN architecture has 16 layers and uses multiple $3\times 3$ kernel-size filters for its convolutions, in contrast to the larger kernels ($11\times 11$ and $5\times 5$) in the first and second convolutional layers of AlexNet. VGG was designed to increase the depth of CNNs, and it has several max-pooling layers. In the later stage of its architecture, VGG16 has three large fully connected layers, two with $4,096$ units each and one with $1,000$ units. In image classification, VGG16 achieved 92.7% top-5 test accuracy in the 2014 ImageNet challenge.

ResNet101. ResNet is a powerful structure that allows models with a large number of layers to be designed and trained to gain superior performance. The key component of ResNet is its “skip connections”, which skip one or several layers before rejoining the following layer. This idea helps mitigate the vanishing gradient issue and the degradation issue. ResNet can reduce the training error when more layers are added to the CNN, hence providing a good structure for scalable learning (He et al., 2016). ResNet was the winner of the ILSVRC 2015 challenge (which uses a subset of ImageNet).

InceptionV3. Inception is a class of CNNs that utilises Inception modules to build deeper structures with more efficient computation. The motivation of Inception is to prevent the number of parameters from becoming too large when building deeper neural networks (Szegedy et al., 2015). Inception consists of asymmetric and symmetric building blocks that employ different layers, including convolution layers, average and max-pooling layers, concatenate layers, dropout layers and fully connected layers. Each Inception module in this architecture consists of four operations in parallel, and the modules are linked by concatenate layers. Batch normalization is applied to the outputs of convolutional layers and is used widely throughout the model. In this study, we use the most popular version of Inception, i.e., InceptionV3.

MobileNetV2. MobileNet is one of the most popular light-weight CNN architectures. It aims to significantly reduce the size of the parameters and to increase the computational speed while maintaining accuracy. It was designed based on the inverted residual structure. However, different from other residual models, its residual block’s input and output are thin bottleneck layers. Also, the light-weight depthwise convolution operator is used in its intermediate expansion layer to reduce the number of parameters (Sandler et al., 2018). Interestingly enough, the lightweight depthwise convolution can not only reduce the complexity of the model, i.e. size of the model, but also greatly reduce computational cost. MobileNet, therefore, is popular for low-resource devices, especially for mobile devices. In this paper, we use MobileNetV2.
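
To illustrate why depthwise convolutions reduce the number of parameters, the following sketch compares the weight count of a standard convolution with that of a depthwise convolution followed by a $1\times 1$ pointwise convolution. The layer sizes are illustrative assumptions, biases are ignored, and the function names are hypothetical; this is not a description of the exact MobileNetV2 block.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a k x k standard convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a k x k depthwise convolution plus a 1 x 1 pointwise one."""
    depthwise = k * k * c_in  # one k x k filter per input channel
    pointwise = c_in * c_out  # the 1 x 1 convolution mixes the channels
    return depthwise + pointwise


k, c_in, c_out = 3, 144, 144  # illustrative sizes, not taken from the paper
std = standard_conv_params(k, c_in, c_out)
sep = depthwise_separable_params(k, c_in, c_out)
print(std, sep, round(std / sep, 2))
```

For these sizes the separable form needs roughly an eighth of the weights, and the saving grows with the kernel size and the number of output channels.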

EfficientNet. Similar to MobileNet, this is one of the light-weight CNN architectures. EfficientNet is based on a scaling approach which employs a compound coefficient with fixed scaling coefficients to scale the depth, width, and resolution dimensions together. In this study, we employ EfficientNet with two core parts: inverted bottleneck residual blocks (adopted from MobileNetV2) and Squeeze-and-Excitation blocks (SENet). EfficientNet achieved 77.3% top-1 accuracy on the ImageNet dataset.

Vision Transformer (ViT). The Vision Transformer (ViT) model (Dosovitskiy et al., 2021) was released in 2021, based on the idea of Transformers (Vaswani et al., 2017) originally developed for natural language processing. ViTs can handle a wide range of image sizes without requiring architectural changes. ViTs have a mechanism to capture global context information: they can attend to all image patches simultaneously, allowing them to understand the relationships between distant parts of an image. ViTs leverage the self-attention mechanism, which can capture complex relationships between image patches. They have shown advantages over CNNs in several tasks; for example, ViTs perform better than ResNet in image classification on ImageNet and CIFAR-10 (Dosovitskiy et al., 2021).

We have presented popular CNN models used in this study. Those models can work alone to predict plant species or disease types, or they can be the backbone in a multi-prediction model to predict these two labels simultaneously, as shown in what follows.

3.2.2. Multi-model CNNs

This is the most straightforward application of CNNs for multi-prediction. A deep learning model can consist of two independent CNNs, one for each task. In this study we use two independent CNNs, one for predicting plant species and the other for predicting disease types, as shown in Figure 4. The two CNNs have the same architecture but each has a different set of parameters.

3.2.3. Multi-label (power-set) CNNs

The second approach for multi-prediction is multi-label, where the two tasks (plant prediction and disease prediction) are encoded in a single output layer. The most feasible way to do this is to join the labels, also known as power-set labelling. This transforms a multi-label task into a multi-class task to which we can directly apply the backbone CNNs above. In particular, the plant species label and the disease type label are combined into a joint label representing both plant and disease. For example, a power-set label “apple_scab” can be created from the plant label “apple” and the disease label “scab”. This is the most common method for plant identification and disease classification in the literature. However, our paper is the first to apply and compare different backbone CNNs. Despite being simple, this multi-label approach has a scalability issue when facing a large number of classes for each label. In the worst case, the power-set label will consist of $|\mathcal{P}|\times|\mathcal{D}|$ classes, where $|\mathcal{P}|$ is the number of plants and $|\mathcal{D}|$ is the number of diseases, which may lead to growth in computational complexity.
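
Power-set labelling can be sketched in a few lines. The helper names `to_power_set_label` and `build_power_set_classes` and the toy annotations are assumptions made for illustration; only the “apple_scab” naming pattern comes from the text.

```python
def to_power_set_label(plant, disease):
    """Join a plant label and a disease label into one power-set class."""
    return f"{plant}_{disease}"


def build_power_set_classes(samples):
    """Map each (plant, disease) pair seen in the data to a class index."""
    joint = sorted({to_power_set_label(p, d) for p, d in samples})
    return {label: index for index, label in enumerate(joint)}


# Toy annotations; the label names are illustrative only.
samples = [("apple", "scab"), ("apple", "healthy"),
           ("tomato", "healthy"), ("apple", "scab")]
classes = build_power_set_classes(samples)
print(classes)

plants = {p for p, _ in samples}
diseases = {d for _, d in samples}
print(len(classes), "observed classes; worst case", len(plants) * len(diseases))
```

Only the combinations that actually occur become classes, so the number of output units usually stays below the $|\mathcal{P}|\times|\mathcal{D}|$ worst case.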

3.2.4. Multi-output CNNs

This is a class of CNN models in which we have an input layer and multiple output layers, one for each task, as shown in Figure 4 (third model). Theoretically, compared to multi-label CNNs, multi-output CNNs have fewer parameters because they have fewer connections to the output layer(s). The total number of output units in multi-output CNNs for plant and disease prediction is $|\mathcal{P}|+|\mathcal{D}|$ . Learning in these CNNs is done by optimising the model for all tasks simultaneously, which also distinguishes them from the multi-task learning we discuss below.
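
The difference in output size can be made concrete with the Plant Village counts from Table 4 (14 species and 22 diseases). The size of the last hidden layer (`n_features = 1024`) and the helper `output_layer_weights` are assumptions for illustration only.

```python
def output_layer_weights(n_features, n_units):
    """Weights and biases of one dense softmax layer."""
    return (n_features + 1) * n_units


n_plants, n_diseases = 14, 22  # Plant Village counts from Table 4
n_features = 1024              # assumed size of the last hidden layer

multi_output_units = n_plants + n_diseases     # |P| + |D|
power_set_worst_units = n_plants * n_diseases  # |P| x |D|
print(multi_output_units, power_set_worst_units)

multi_output_weights = (output_layer_weights(n_features, n_plants)
                        + output_layer_weights(n_features, n_diseases))
power_set_weights = output_layer_weights(n_features, power_set_worst_units)
print(multi_output_weights, power_set_weights)
```

This compares against the worst case only. Plant Village has 38 observed species-disease combinations, so the gap is much smaller in practice for that dataset.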

3.2.5. Multi-task Deep Learning

Finally, we can adapt multi-task deep learning architectures for leaf disease and plant type classification. Originally, a multi-task learning problem is to learn a model from different (related) domains, where each task $i$ is associated with a dataset $\mathcal{D}_{i}=\{(x_{i}^{(n)},y_{i}^{(n)}),n=1,...,N_{i}\}$ . Multi-task learning is well suited to deep learning models, as the features learned from one task may benefit the learning of another task (Zhang and Yang, 2021). We can utilise those structures for plant identification (task 1) and disease classification (task 2) by sharing input data among the tasks, i.e. $x_{1}^{(n)}=x_{2}^{(n)}=x^{(n)}$ and $y_{1}^{(n)}=p^{(n)}$ , $y_{2}^{(n)}=d^{(n)}$ , where $\{(x^{(n)},p^{(n)},d^{(n)})|n=1,...,N\}$ is a plant leaf dataset in our problem statement. Although they have the same outputs as multi-output CNNs, multi-task models learn differently: for each data point (an image), they optimise for only one task at a time. In this study, instead of directly using the backbone CNNs (which would require us to implement the learning strategies), we employ state-of-the-art multi-task models, as shown below.

Cross-stitch Network. Cross-stitch is a multi-task approach to learning shared and task-specific representations (Misra et al., 2016). To this end, cross-stitch units are designed with a soft-parameter sharing mechanism. These units integrate the features output by multiple networks, each of which can represent different patterns of the tasks. In other words, they provide soft feature fusion among multiple single-task networks through a linear combination of every layer’s activations. As a result, cross-stitch networks are able to fuse discriminative features across multiple tasks to improve performance, even with a small number of training examples. In (Vandenhende et al., 2021a), the authors found that each single-task network should be pre-trained before the networks are stitched together for better performance. In our study, the cross-stitch units are deployed in the middle and at the end of two CNNs, one for plant prediction and the other for disease prediction.
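
The linear combination performed by a single cross-stitch unit can be sketched as below. The function name, the toy activations and the fixed mixing matrix `alpha` are assumptions for illustration; in a cross-stitch network the mixing weights are learned together with the rest of the model.

```python
def cross_stitch(x_plant, x_disease, alpha):
    """Linearly combine two activation vectors with a 2 x 2 matrix alpha.

    alpha[0] holds the mixing weights for the plant network's new
    activations and alpha[1] those for the disease network's.
    """
    new_plant = [alpha[0][0] * a + alpha[0][1] * b
                 for a, b in zip(x_plant, x_disease)]
    new_disease = [alpha[1][0] * a + alpha[1][1] * b
                   for a, b in zip(x_plant, x_disease)]
    return new_plant, new_disease


# Toy activations from the same layer of the two single-task networks.
x_plant = [1.0, 2.0, 3.0]
x_disease = [0.0, 4.0, -2.0]
alpha = [[0.9, 0.1],
         [0.1, 0.9]]
print(cross_stitch(x_plant, x_disease, alpha))
```

With an identity matrix for `alpha` the two networks stay independent, while larger off-diagonal values share more features between the tasks.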

Multi-Task Attention Network (MTAN). Different from the parallel learning of shared and task-specific features in cross-stitch networks, MTAN first learns shared (global) features from the images and then allows task-specific features to be learned from those global features via soft-attention modules (Liu et al., 2019). MTAN is built upon a single shared network with a global feature pool and associates each task with a task-specific soft-attention module. Compared to cross-stitch networks, MTAN shares a general feature pool among different single-task networks and is therefore not affected by the scalability issue. However, its limitation is a lack of diversity in the task-specific features, as they are all produced from the shared pool (Vandenhende et al., 2021a).

Task Switching Network (TSN). TSN is based on a task-conditional single-encoder-single-decoder architecture which works on one task at a time while switching between the tasks (Ronneberger et al., 2015). TSN’s decoder is based on a U-Net architecture and its encoder is a ResNet-based backbone (ResNet-18). In TSN, a small task embedding network switches between the tasks, one task after another. Meanwhile, the single encoder-decoder shares all of its parameters among the tasks.

Model-Contrastive Learning (MOON). MOON is a recent model from the federated learning paradigm that aims to leverage the similarity of different tasks’ representations to enhance the local training of each task (Li et al., 2021a). In (Li et al., 2021a), it is shown that the global features are more useful than the local features learned from each task’s dataset. Therefore, MOON proposes a contrastive learning strategy to fine-tune the local representations at the model level by maximizing the agreement between the representations of the local model and the global model. The advantages of MOON are its simplicity, effectiveness, and ability to deal with the non-IID data issue. Compared to state-of-the-art approaches, MOON achieves a significant improvement on various image classification tasks.

3.2.6. Our model: Generalised Stacking Multi-output CNNs (GSMo-CNNs)

For completeness, we propose a new model for plant identification and disease classification. The CNN architecture of our model is inspired by the relationship between plant species and disease types. It is commonly known that some diseases appear only in particular plants and, therefore, information about diseases can be useful for the prediction of plants. The same reasoning applies in reverse: plant information can be used to predict diseases. We realise this idea by stacking the output (softmax) layers for plant identification and disease classification one on top of another. We generalise the effect of the relationships between plant species and disease types by adding weights to the different loss functions at each level of the stack for each output. In what follows, we show the details of our model.

Architecture. The structure of our model is shown in Figure 5. The model is based on the multi-output approach, with all convolutional layers reused from the backbone CNNs presented earlier. The changes here are (1) the split of dense layers for different tasks; and (2) the stacking of prediction layers. The motivation behind the split is that the convolutional layers can learn global features while the dense layers learn task-specific features. For the stacking of prediction layers, GSMo-CNNs first infer temporary probabilities of plant species and of diseases at the first prediction level. After that, we apply cross connections, i.e. we concatenate the predicted plant probabilities with the CNN features to make the final prediction of diseases, and similarly, we concatenate the predicted disease probabilities with the CNN features to make the final prediction of plants.

In particular, the proposed model has two sets of fully connected layers (called “branches” here) after the flatten layer, and each branch connects to a concatenate layer at a later stage. These two branches first produce a pair of predicted plant and disease results, named here $p^{temp}$ (plant temporary) and $d^{temp}$ (disease temporary). We do not make an actual prediction at this level; instead, we extract the prediction probabilities from the two softmax layers for the next step. At the concatenate layer, the image features from the flatten layer of the CNN are combined with the probabilities from the softmax layers at the temporary prediction level ( $p^{temp}$ or $d^{temp}$ , depending on the branch used). The two softmax layers connect to two different concatenate layers, where they join with the shared CNN’s flatten layer. On top of each concatenate layer, we have another softmax layer for cross prediction. In other words, our GSMo-CNN uses the predicted probabilities of plant species (from $p^{temp}$ ) to predict leaf diseases ( $d$ ) and uses the predicted probabilities of diseases (from $d^{temp}$ ) to predict the plant species ( $p$ ) from leaf images. Note that we can use any of the four output layers ( $p$ , $d$ , $p^{temp}$ and $d^{temp}$ ) for prediction, but we will show in the experiments that, by stacking the softmax layers, the final output layers give better performance. Using probabilities from the temporary prediction layer (with the $softmax$ function) instead of predicted values (with the $argmax$ function) helps smooth the propagation of gradients during learning. Our model is inspired by the classifier chain (Read et al., 2009), but the difference is that the “chain” in GSMo-CNNs is implemented in a stacking fashion with parallel inference instead of sequential inference.
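
The stacked forward pass can be sketched in plain Python as follows. This is an illustration under simplifying assumptions, not the implementation used in the experiments: each branch is reduced to a single dense layer, the weights are random, the sizes are toy values, and all function and dictionary names (`gsmo_forward`, `init_layer`, `params`) are hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dense(inputs, weights, biases):
    """Fully connected layer; weights has one row per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (illustrative initialisation)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def gsmo_forward(features, params):
    """Two-level stacked prediction from the shared flattened CNN features."""
    # Level 1: temporary probabilities from the two task branches.
    p_temp = softmax(dense(features, *params["p_temp"]))
    d_temp = softmax(dense(features, *params["d_temp"]))
    # Level 2: cross connection. The shared features are concatenated with
    # the temporary probabilities of the other task.
    d_final = softmax(dense(features + p_temp, *params["d"]))
    p_final = softmax(dense(features + d_temp, *params["p"]))
    return p_temp, d_temp, p_final, d_final


rng = random.Random(0)
n_features, n_plants, n_diseases = 8, 3, 4  # toy sizes
params = {
    "p_temp": init_layer(n_features, n_plants, rng),
    "d_temp": init_layer(n_features, n_diseases, rng),
    "d": init_layer(n_features + n_plants, n_diseases, rng),
    "p": init_layer(n_features + n_diseases, n_plants, rng),
}
features = [rng.random() for _ in range(n_features)]
outputs = gsmo_forward(features, params)
for name, probs in zip(("p_temp", "d_temp", "p", "d"), outputs):
    print(name, [round(v, 3) for v in probs])
```

Because the second level receives probabilities rather than hard class decisions, every operation in the sketch is differentiable, which is what allows gradients from the final outputs to reach the temporary layers.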

Training. As we have four outputs, we aim to optimise the prediction at every output layer. This is because a good estimation of plant species in the temporary prediction layer will help improve the prediction of diseases in the final (top) prediction layer. Similarly, a good estimation of diseases in the temporary prediction layer will help the prediction of plants in the final layer. As a result, we have four loss functions to minimise. Normally, we can train our model to minimise the losses simultaneously to take advantage of the underlying relationships between plant species and leaf diseases. This also allows us to impose the logical negation constraint that some diseases will not appear on specific plants and vice versa. Furthermore, we are interested in how much these relationships and constraints affect the learning. However, such relationships and constraints cause an issue when the temporary layers have not learned well and pass wrong predictions to the top layers. This is bound to happen at the beginning of training where, just after initialisation, the model has very low performance and makes many mistakes. Therefore, in order to control this impact, we associate each loss with a balance weight. In particular, we train our model by minimising the following total loss function:

$$\mathcal{C}=\beta_{1}\mathcal{L}(P_{1},\hat{P}_{1})+\beta_{2}\mathcal{L}(P_{2},\hat{P}_{2})+\delta_{1}\mathcal{L}(D_{1},\hat{D}_{1})+\delta_{2}\mathcal{L}(D_{2},\hat{D}_{2})\tag{1}$$

where $\mathcal{L}$ is a cross-entropy loss function; $P_{1}$ , $P_{2}$ are the ground truth for the plant species at level 1 and level 2 in the stack respectively; $\hat{P}_{1}$ , $\hat{P}_{2}$ are the corresponding predicted plant species; $D_{1}$ , $D_{2}$ are the ground truth for the disease types at level 1 and level 2 in the stack respectively; $\hat{D}_{1}$ , $\hat{D}_{2}$ are the corresponding predicted diseases; and $\beta_{1}$ , $\beta_{2}$ , $\delta_{1}$ , $\delta_{2}$ are the balance weights. In the experiments, $\beta_{1}$ , $\beta_{2}$ , $\delta_{1}$ , and $\delta_{2}$ are treated as hyper-parameters. The reason for using balance weights for the losses at each level is that we want to fine-tune the learning in the upper level based on what has been learned in the lower level. We also apply different balance weights to the losses on different outputs to see how they influence each other. We anticipate that these balance weights will be similar, as the prediction of plants helps the prediction of diseases and vice versa.

Inference. Our model performs two stages of inference for plant species and disease types, but only the outcomes of the $2^{nd}$ stage are used for the final prediction. However, by using the balance weights in training, we generalise the multi-output architecture, allowing control of the interactions between different outputs and between outputs at different levels. As a result, the multi-model and multi-output approaches are just special cases of our model, as follows.

• If $\beta_{1}=1$ and $\beta_{2}=\delta_{1}=\delta_{2}=0$, then our model becomes a single CNN for plant identification. In this case the layer for plant identification at the first level is used as the output.
• If $\delta_{1}=1$ and $\beta_{1}=\beta_{2}=\delta_{2}=0$, then our model becomes a single CNN for disease classification. In this case the layer for disease classification at the first level is used as the output.
• If $\beta_{1}=\delta_{1}=1$ and $\beta_{2}=\delta_{2}=0$, then our model becomes a regular multi-output CNN, as shown earlier. The outputs are the two layers in the first prediction level.
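The special cases listed above can be summarised as a small lookup. The sketch below is illustrative only: the function name `special_case` and the output-layer names are hypothetical, and any weight setting outside the three listed cases is treated as the general GSMo-CNN, whose final prediction is read from the second level.

```python
def special_case(beta1, beta2, delta1, delta2):
    """Return (model description, output layers to read) for a weight setting.

    Argument order follows Eq. (1): plant weights (beta) at levels 1 and 2,
    then disease weights (delta) at levels 1 and 2.
    """
    weights = (beta1, beta2, delta1, delta2)
    if weights == (1, 0, 0, 0):
        return "single CNN for plant identification", ["plant_level1"]
    if weights == (0, 0, 1, 0):
        return "single CNN for disease classification", ["disease_level1"]
    if weights == (1, 0, 1, 0):
        return "regular multi-output CNN", ["plant_level1", "disease_level1"]
    # Any other setting: the general model, read out at the second level.
    return "general GSMo-CNN", ["plant_level2", "disease_level2"]


for setting in [(1, 0, 0, 0), (0, 0, 1, 0), (1, 0, 1, 0), (0.1, 0.4, 0.1, 0.5)]:
    name, outputs = special_case(*setting)
    print(setting, "->", name, outputs)
```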

3.3. Evaluation Metrics

Based on the study of related work and classification tasks, we found that Accuracy and F1-score are the most common evaluation metrics for plant identification and disease classification. Both metrics can be calculated from a confusion matrix (Chicco and Jurman, 2020). A prediction is correct if its output is the same as the actual label (ground truth), and accuracy is the ratio of the number of correct predictions to the total number of samples. Accuracy is generally a sensible metric for classification tasks; however, on imbalanced data it becomes biased towards the class with the largest number of samples. For example, if 95 out of 100 images are apple leaves, then always predicting “apple” achieves 95% accuracy. Although not all datasets are imbalanced, and sometimes the imbalance is not severe, for completeness we also employ F1-score as another evaluation metric. The metrics are calculated as follows: $\text{Recall}=\frac{TP}{(TP+FN)}$, $\text{Precision}=\frac{TP}{(TP+FP)}$, $\text{Accuracy}=\frac{(TP+TN)}{(TP+FN+FP+TN)}$, $\text{F1\_score}=\frac{(2\ast Precision\ast Recall)}{(Precision+Recall)}$, $\text{FPR}=\frac{FP}{(FP+TN)}$.

Here, TP, TN, FP, and FN denote True Positives, True Negatives, False Positives, and False Negatives respectively, and FPR denotes the False Positive Rate. As we can see, F1-score is based on both Recall and Precision. As the harmonic mean of the two, it balances precision and recall, which makes it more reliable than accuracy on imbalanced data. In this study, F1-score is the key evaluation metric, e.g. it is used for model selection. Together with Accuracy, it provides a comprehensive view of the performance of a model. We evaluate a model based on its performance (Accuracy & F1-score) on plant prediction, disease prediction, and (combined) plant-disease prediction. False Positive Rates are used to assist in evaluating the optimal models for all approaches. FPR is the proportion of actual negative samples that are incorrectly predicted as positive; the sum of FP and TN is the total number of actual negative samples.
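The formulas above can be checked with a short sketch in plain Python. It computes the metrics for a single class from its confusion-matrix counts; returning 0 when a denominator is zero is an assumption of this sketch, and how per-class scores are averaged over multiple classes is not specified here. The function name `binary_metrics` is hypothetical.

```python
def binary_metrics(tp, tn, fp, fn):
    """Recall, Precision, Accuracy, F1-score and FPR from confusion-matrix counts.

    Assumption: a ratio with a zero denominator is reported as 0.0.
    """
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return {"recall": recall, "precision": precision,
            "accuracy": accuracy, "f1": f1, "fpr": fpr}


# The imbalance example from the text: 95 of 100 images are apple leaves and
# the model always predicts "apple". Taking the 5 non-apple images as the
# positive class, none of them is ever detected.
always_apple = binary_metrics(tp=0, tn=95, fp=0, fn=5)
print(always_apple["accuracy"], always_apple["f1"])  # 0.95 0.0
```

The example shows why accuracy alone can be misleading: the trivial predictor reaches 95% accuracy while its F1-score on the minority class is zero.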

3.4. Setup

With the models ready, we are now in a position to prepare for the empirical study. A collection of common and recent CNN models for plant species and diseases with various learning approaches (multi-model, multi-label, multi-output, multi-task, and GSMo-CNNs) will be tested. Backbone models (i.e., our custom CNN, AlexNet, VGG16, ResNet101, EfficientNet, InceptionV3 & MobileNetV2) are implemented in TensorFlow 2.6, and multi-task models are in PyTorch (downloaded from the GitHub repositories of the cited papers). We perform the experiments on three public datasets, namely Plant Village, Plant Leaves and PlantDoc, as detailed in Section 3.1. The ratio of training to test sets is 80:20, and 10% of the training set is used as a validation set. All hyper-parameters in the experiments were set uniformly across the models and approaches. During training, we adopt early stopping to avoid over-fitting: if a model’s validation loss has not decreased for 50 epochs, we stop the training. The input size of leaf images is set to $256\times 256\times 3$ pixels and all samples are in RGB format. Model selection is done by measuring F1-score on the validation set. For GSMo-CNNs, we search the balance weights using a coarse grid [0.2, 0.4, 0.6, 0.8] and then narrow down the search to find the best combination for all datasets. We find that $\beta_{1}=0.1$, $\beta_{2}=0.4$, $\delta_{1}=0.1$, $\delta_{2}=0.5$ are generally good for all three datasets and we report the results of GSMo-CNNs under this configuration. We run each experiment 10 times and report the average accuracy & F1-score with standard deviation.
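The data split, the early-stopping rule, and the coarse grid described above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the validation set is taken from the 80% training portion, "not decreased" is read as "no new minimum of the validation loss", and all function names are hypothetical rather than taken from the authors' code.

```python
from itertools import product


def split_sizes(n_samples, test_fraction=0.2, val_fraction_of_train=0.1):
    """Sizes of the (train, validation, test) sets for an 80:20 split,
    with 10% of the training portion held out for validation."""
    n_test = round(n_samples * test_fraction)
    n_train_full = n_samples - n_test
    n_val = round(n_train_full * val_fraction_of_train)
    return n_train_full - n_val, n_val, n_test


def early_stopping_epoch(val_losses, patience=50):
    """Return the 1-based epoch at which training stops, or None if it never does.

    Training stops once the validation loss has failed to reach a new
    minimum for `patience` consecutive epochs.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


# Coarse grid over the four balance weights (beta1, beta2, delta1, delta2).
coarse_values = [0.2, 0.4, 0.6, 0.8]
coarse_grid = list(product(coarse_values, repeat=4))

print(split_sizes(1000))                                           # (720, 80, 200)
print(early_stopping_epoch([1.0, 0.9, 0.95, 0.96, 0.97], patience=3))  # 5
print(len(coarse_grid))                                            # 256
```

The coarse grid alone contains 256 weight combinations, which is why the search is narrowed afterwards rather than refined exhaustively.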

4. Experimental Results

In this section, we report the results of different models on the benchmark datasets and analyse their performance. We divide the results into “Plant Prediction”, “Disease Classification”, and both (Plant Identification & Disease Classification). In multi-model, multi-label (power-set), multi-output, and our GSMo-CNNs, we evaluate different CNN backbones, including AlexNet, VGG16, ResNet101, EfficientNet, InceptionV3, MobileNetV2, and our custom CNN. We test two versions of our GSMo-CNNs, one without balance weights ($\beta_{1}=\beta_{2}=\delta_{1}=\delta_{2}=1$), and the other with balance weights. To save time in model selection, we choose the best backbone CNN found in the former version (GSMo-CNNs without balance weights) for the latter. For completeness, we apply transfer learning to our GSMo-CNNs to see if improvement can be achieved. We implement transfer learning by initialising our model with backbone CNNs pre-trained on the ImageNet dataset and fine-tuning it on a leaf image dataset.

4.1. Plant Identification

Let us discuss the plant identification task first, to evaluate the models and approaches on their ability to predict plant species from leaf images. This is a classification task where we identify plant types among a fixed set of categories (species). Unlike previous work on identifying plants using healthy leaves only, the challenge of this task in our study is that we have images of both healthy and diseased leaves. It is worth noting that identifying plant species from corrupted or damaged leaves is non-trivial. Nevertheless, we expect that an effective classification model should distinguish them accurately. Table 5 shows all the plant type results from the experiment. The results contain the accuracy and F1-scores of multi-model, multi-label (power-set), multi-output, multi-task models, and GSMo-CNNs on the benchmark datasets with different CNN backbones.

Table 5 and Figure 6 show that almost all approaches achieve good performance (more than $90\%$ accuracy & $0.90$ F1-score) on Plant Village and Plant Leaves. These results confirm the effectiveness of deep learning approaches for classification tasks with image inputs. For PlantDoc (both PlantDoc-0.2 and PlantDoc-1.0), the performance is lower. We anticipated this outcome because the images in PlantDoc are more complex, with noisy backgrounds, and there can be multiple leaves in one image. Note that, for the sake of fair evaluation, we do not apply any data processing or augmentation techniques, even though previous research shows that they can improve the performance greatly (Saraswathi et al., 2021; G. and J., 2019; Agarwal et al., 2019).

In terms of backbone CNNs, in the multi-model approach no single backbone performs best across all datasets. In particular, MobileNetV2 has the highest accuracy and F1-score in Plant Village, VGG16 is the best in PlantDoc-0.2, while InceptionV3 has the best results in both Plant Leaves and PlantDoc-1.0. In contrast, in the multi-label and multi-output approaches InceptionV3 clearly shows its advantages, with higher performance than the other backbones. In PlantDoc-0.2 and PlantDoc-1.0, InceptionV3 achieves the best performance by a wide margin. In particular, its accuracy/F1-score are $51.437\%$ / $0.47071$ on PlantDoc-0.2 and $43.093\%$ / $0.38037$ on PlantDoc-1.0. In the case of GSMo-CNN, InceptionV3 is also better than other backbone CNNs in Plant Village, Plant Leaves, PlantDoc-0.2, and PlantDoc-1.0.

In terms of common approaches (multi-model, multi-label, multi-output, multi-task), multi-label (power-set) achieves the best results in Plant Village (Accuracy: 99.469% & F1-score: 0.99469) and multi-model has the highest average accuracy and F1-score (99.175% & 0.99175) in the Plant Leaves dataset. Multi-output outperforms multi-model, multi-label, and multi-task in PlantDoc-0.2 (Accuracy: $51.437\%$, F1-score: $0.47071$) and PlantDoc-1.0 (Accuracy: $43.093\%$, F1-score: $0.38037$). Multi-task is inferior to the other approaches. This is not surprising, as both tasks (plant identification and disease classification) share the same input data, which differs from the original multi-task learning paradigm where different tasks have different sets of data.

For our proposed model GSMo-CNN (denoted as “Our Methods (Plant)” in Table 5), we applied 7 backbone CNNs to the new model structure without a balance weight (BW) on each loss (i.e. $\beta_{1}:\beta_{2}:\delta_{1}:\delta_{2}$ set to 1:1:1:1). InceptionV3 is also the best backbone in this case, so it is the backbone we use for GSMo-CNN with balance weights. Even without the balance weights, the performance of GSMo-CNN is promising, as it achieves better results than multi-model, multi-label, multi-output, and multi-task on Plant Village (Accuracy: $99.850\%$, F1-score: $0.99850$) and on PlantDoc-1.0 (Accuracy: $44.492\%$, F1-score: $0.41757$). With the balance weights, GSMo-CNN achieves the best performance in three cases: Plant Leaves, PlantDoc-0.2, and PlantDoc-1.0. The accuracy and F1-score are as follows: Plant Village: $99.687\%$ & $0.99688$; Plant Leaves: $99.646\%$ & $0.99647$; PlantDoc-0.2: $49.068\%$ & $0.47960$; and PlantDoc-1.0: $46.864\%$ & $0.44804$.

Finally, when transfer learning is applied we can see improvement in all cases, especially for PlantDoc-0.2 and PlantDoc-1.0. In Plant Village, the best results without transfer learning are $99.850\%$ accuracy and $0.99850$ F1-score, while transfer learning achieves $99.928\%$ accuracy and $0.99927$ F1-score. A large improvement can be seen in the PlantDoc sets. In particular, our model with transfer learning increases the performance from $51.437\%$ accuracy & $0.48051$ F1-score to $71.262\%$ & $0.70602$ on PlantDoc-0.2, and from $46.992\%$ accuracy & $0.44934$ F1-score to $73.644\%$ & $0.71980$ on PlantDoc-1.0.

4.2. Disease Classification

The disease classification task aims to predict the diseases of plants from leaf images. Note that “healthy” is also treated as a category of plant diseases. The task not only identifies whether a plant has a disease but also needs to accurately categorise the disease types from different plants. A model should be able to focus on the diseases and not be confused by the common patterns from leaves of the same type. Table 6 shows the disease classification results of the different models and approaches, including multi-model CNNs, multi-label (power-set) CNNs, multi-output CNNs, multi-task learning approaches, and our new model.

As we can see in Table 6 and Figure 7, all approaches achieve good performance on the Plant Village and Plant Leaves datasets (most accuracy & F1-score values are over 80% & 0.80). In terms of backbone CNNs, we observe a similar trend to plant identification, where InceptionV3 performs better than other CNNs in most of the approaches and datasets. The only exceptions are in the multi-model approach, where MobileNetV2 achieved the best results in Plant Village ($99.275\%$ accuracy & $0.99274$ F1-score) and VGG16 achieved $40.291\%$ accuracy and $0.33305$ F1-score in PlantDoc-0.2. In terms of common approaches (multi-model, multi-label, multi-output, multi-task), multi-model (with InceptionV3) achieves the best accuracy ($51.992\%$) and F1-score ($0.48344$) on PlantDoc-1.0; multi-output (with InceptionV3) is better than the others on Plant Village ($99.414\%$ accuracy and $0.99413$ F1-score) and on PlantDoc-0.2 ($43.165\%$ accuracy and $0.41892$ F1-score); multi-label (with InceptionV3) achieves the highest accuracy ($97.547\%$) and F1-score ($0.97468$) on Plant Leaves.

Our proposed model also performs well in the case of disease classification. Without the balance weights, GSMo-CNN achieves better results than multi-model, multi-label, multi-output, and multi-task approaches in Plant Village and PlantDoc-0.2. The accuracy and F1-score of our new model without balance weights and with InceptionV3 as the backbone are as follows: Plant Village (99.418% & 0.99417), Plant Leaves (97.292% & 0.97201), PlantDoc-0.2 (45.029% & 0.41716) and PlantDoc-1.0 (47.542% & 0.46156). When balance weights are used, with InceptionV3 as the best backbone, GSMo-CNN achieves the best results in 3 out of 4 cases. In particular, it has $99.466\%$ accuracy and $0.994466$ F1-score on the Plant Village dataset; $97.759\%$ accuracy and $0.97738$ F1-score on Plant Leaves; and $45.301\%$ accuracy and $0.43095$ F1-score on PlantDoc-0.2. Only on PlantDoc-1.0 does GSMo-CNN rank second; there, multi-model achieves the best accuracy ($51.992\%$) and F1-score ($0.48344$). If we use InceptionV3 weights pre-trained on ImageNet as the backbone for GSMo-CNN, we can achieve much higher performance. The accuracy and F1-score of our model improve substantially on all datasets, as follows: Plant Village (99.615% accuracy & 0.99615 F1-score), Plant Leaves (98.149% accuracy & 0.98146 F1-score), PlantDoc-0.2 (64.544% accuracy & 0.62968 F1-score) and PlantDoc-1.0 (63.305% accuracy & 0.63757 F1-score).

4.3. Plant Identification & Disease Classification

For completeness, we evaluate the models and approaches in predicting plant species and disease types together. This is the combination of the plant identification task and the disease classification task, as discussed earlier. However, instead of evaluating each task separately, we are interested in studying how well deep learning models can make accurate predictions for both tasks at once. A prediction is accurate if and only if both plant species and disease type are inferred correctly. This would be the key criterion for users selecting a model for their applications, as usually we want to have information about both plants and diseases to find a suitable treatment. For evaluation, plant species and disease type are combined into joint species-disease labels for the calculation of average accuracy and F1-score.
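The joint criterion can be made concrete with a short sketch in plain Python, in which the two labels of each sample are paired into one species-disease label before scoring. The function names and the toy labels are hypothetical and only serve to illustrate the rule stated above.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the ground truth."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def joint_accuracy(plant_true, disease_true, plant_pred, disease_pred):
    """Accuracy on joint (plant, disease) labels: a sample counts as correct
    only if both the plant species and the disease type are correct."""
    joint_true = list(zip(plant_true, disease_true))
    joint_pred = list(zip(plant_pred, disease_pred))
    return accuracy(joint_true, joint_pred)


# Toy example with four leaf images.
plant_true = ["apple", "apple", "tomato", "tomato"]
plant_pred = ["apple", "tomato", "tomato", "tomato"]
disease_true = ["scab", "healthy", "healthy", "late_blight"]
disease_pred = ["scab", "healthy", "late_blight", "late_blight"]

print(accuracy(plant_true, plant_pred))      # 0.75
print(accuracy(disease_true, disease_pred))  # 0.75
print(joint_accuracy(plant_true, disease_true, plant_pred, disease_pred))  # 0.5
```

Because errors on either label count against the joint prediction, the joint score can never exceed the lower of the two single-task scores.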

Table 7 and Figure 8 show the results of all models and approaches. Although the combined prediction of plant species and disease types is more complex than the previous two tasks, all approaches achieved promising results, with more than $85\%$ average accuracy and $0.85$ F1-score on Plant Village and Plant Leaves. In terms of backbone CNNs, the multi-model approach demonstrates that MobileNetV2 achieves the best performance ($99.05\%$ accuracy and $0.99148$ F1-score) on Plant Village and VGG16 achieves the highest results on PlantDoc-0.2 ($16.252\%$ accuracy & $0.15821$ F1-score). The multi-output approach shows that AlexNet has the highest accuracy of $22.458\%$ and F1-score of $0.20815$ on PlantDoc-1.0. Apart from these three cases, InceptionV3 achieves higher performance than the other backbone CNNs.

Among the common multi-prediction approaches (multi-model, multi-label, multi-output, and multi-task), multi-label (power-set) performs the best on Plant Village, Plant Leaves, and PlantDoc-1.0, while multi-output has the best results on PlantDoc-0.2. This makes sense, as the power-set labelling in the multi-label approach combines the two labels into one, so the learning directly optimises the prediction of the combined label (plant & disease). Interestingly, multi-output performs well in PlantDoc-0.2 even though there is no communication between the two labels during the learning. One possible explanation is that the gradients back-propagated from the two branches into the shared CNN act as a regulariser for the shared features. Such regularisation would help the model to learn more generalised features rather than task-specific features. The multi-task learning models (TSNs, MOON, Cross Stitch & MTAN) show good results in Plant Village and Plant Leaves; however, they fall short of the other approaches.
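To illustrate the power-set labelling mentioned above, the following plain-Python sketch maps each (plant, disease) pair that occurs in the data to a single class index, so that a standard single-output classifier can be trained on the combined label. It is a generic label power-set encoding under the assumption that only observed combinations become classes; the function names and toy labels are hypothetical.

```python
def build_powerset_classes(plant_labels, disease_labels):
    """Assign one class index to every (plant, disease) pair seen in the data."""
    pairs = sorted(set(zip(plant_labels, disease_labels)))
    return {pair: index for index, pair in enumerate(pairs)}


def encode(plant_labels, disease_labels, class_index):
    """Turn paired labels into combined class indices."""
    return [class_index[pair] for pair in zip(plant_labels, disease_labels)]


def decode(class_ids, class_index):
    """Recover the (plant, disease) pairs from combined class indices."""
    inverse = {index: pair for pair, index in class_index.items()}
    return [inverse[class_id] for class_id in class_ids]


plants = ["apple", "apple", "tomato", "tomato", "apple"]
diseases = ["healthy", "scab", "healthy", "late_blight", "scab"]

class_index = build_powerset_classes(plants, diseases)
targets = encode(plants, diseases, class_index)
print(class_index)
print(targets)  # [0, 1, 2, 3, 1]
```

In this toy data there are 2 plants and 3 disease labels, but only 4 of the 6 possible combinations occur, so the combined classifier has 4 classes.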

Our proposed model GSMo-CNN achieves the best performance on all datasets, except for accuracy on PlantDoc-1.0, where multi-label with the InceptionV3 backbone tops the list with $25.466\%$. In the case where balance weights are not applied ($\beta_{1}:\delta_{1}:\beta_{2}:\delta_{2}=1:1:1:1$), GSMo-CNN is better than multi-model, multi-label, multi-output, and multi-task. Its best performance is: $99.315\%$ accuracy and $0.99333$ F1-score on Plant Village; $96.593\%$ accuracy and $0.96641$ F1-score on Plant Leaves; $25.359\%$ accuracy and $0.24692$ F1-score on PlantDoc-0.2; and $23.814\%$ accuracy and $0.22487$ F1-score on PlantDoc-1.0. Again, InceptionV3 is the best backbone CNN for GSMo-CNN. When balance weights are employed, improvement can be seen on Plant Leaves, PlantDoc-0.2, and PlantDoc-1.0. As mentioned earlier, we apply a grid search and narrow down the grid to find a combination of the weights that performs well on all datasets. The results reported in Tables 5, 6, and 7 are from the balance weights $\beta_{1}:\delta_{1}:\beta_{2}:\delta_{2}=0.1:0.1:0.4:0.5$. In particular, GSMo-CNN with BW achieves $99.208\%$ accuracy and $0.99245$ F1-score on Plant Village; $97.524\%$ accuracy and $0.97746$ F1-score on Plant Leaves; $26.175\%$ accuracy and $0.26760$ F1-score on PlantDoc-0.2; and $24.153\%$ accuracy and $0.23191$ F1-score on PlantDoc-1.0.

Similar to the independent prediction of plant and disease shown in Sections 4.1 and 4.2, when we initialise InceptionV3 from weights pre-trained on ImageNet, GSMo-CNN improves the performance on all datasets, producing the highest accuracy and F1-scores in this study. The accuracy and F1-scores are as follows: Plant Village: 99.558% & 0.99565; Plant Leaves: 98.042% & 0.98116; PlantDoc-0.2: 50.291% & 0.50396; and PlantDoc-1.0: 50.000% & 0.50191.

It is worth noting that our proposed method does not suffer from computational overhead because (1) we do not need to combine the labels during the learning; and (2) stacking the labels is efficient and its structure only needs a single CNN as the backbone. The only complexity users might find in our approach is the search for balance weights. Fortunately, our experiments show that this search is optional, as GSMo-CNN without balance weights already achieves better performance than the other approaches. Combined with transfer learning, the balance weights further bolster the effectiveness of GSMo-CNN. This shows the flexibility of our approach, where users are provided with different options, including the choice of backbone CNNs, the choice of the balance weights for the losses, and/or the choice of pre-training weights. All these options have proved to be effective in improving the performance of plant identification and disease classification. The experiments support the idea behind the proposed deep learning structure, in which the two labels (Plant & Disease) help each other to eliminate some wrong options and thereby improve the prediction accuracy. In the next section, we summarise the findings and provide further analysis of the results.

Table 5. Plant Type Results.

		Plant Village		Plant leaves		PlantDoc-0.2		PlantDoc-1.0
		Acc	F1	Acc	F1	Acc	F1	Acc	F1
Multi-model (Plant)	CNN	$97.448\%\pm 00.148\%$	$0.97441\pm 0.00152$	$94.451\%\pm 00.888\%$	$0.94416\pm 0.00872$	$35.515\%\pm 00.984\%$	$0.31791\pm 0.00757$	$37.712\%\pm 00.000\%$	$0.33216\pm 0.00000$
	AlexNet	$98.655\%\pm 00.321\%$	$0.98654\pm 0.00322$	$96.085\%\pm 00.840\%$	$0.96089\pm 0.00852$	$40.019\%\pm 02.968\%$	$0.32573\pm 0.03386$	$36.398\%\pm 01.449\%$	$0.34042\pm 0.03723$
	VGG16	$99.465\%\pm 00.319\%$	$0.99465\pm 0.00320$	$96.792\%\pm 00.755\%$	$0.96789\pm 0.00758$	$45.553\%\pm 02.971\%$	$0.37827\pm 0.03285$	$42.119\%\pm 02.919\%$	$0.36066\pm 0.04843$
	ResNet101	$99.427\%\pm 00.095\%$	$0.99428\pm 0.00095$	$98.467\%\pm 00.000\%$	$0.98465\pm 0.00000$	$26.369\%\pm 03.394\%$	$0.18736\pm 0.01701$	$29.958\%\pm 01.444\%$	$0.21007\pm 0.02095$
	EfficientNet	$99.389\%\pm 00.074\%$	$0.99388\pm 0.00075$	$94.717\%\pm 00.689\%$	$0.94707\pm 0.00674$	$34.369\%\pm 00.000\%$	$0.28476\pm 0.00829$	$31.780\%\pm 01.705\%$	$0.23555\pm 0.01912$
	InceptionV3	$99.389\%\pm 00.074\%$	$0.99388\pm 0.00075$	$99.175\%\pm 00.000\%$	$0.99175\pm 0.00000$	$45.068\%\pm 01.495\%$	$0.36668\pm 0.02233$	$40.297\%\pm 03.301\%$	$0.37522\pm 0.03833$
	MobileNetV2	$99.750\%\pm 00.000\%$	$0.99750\pm 0.00000$	$96.050\%\pm 00.260\%$	$0.96065\pm 0.00253$	$13.592\%\pm 00.000\%$	$0.03253\pm 0.00000$	$25.593\%\pm 07.288\%$	$0.11020\pm 0.04417$
	ViT	$92.045\%\pm 00.875\%$	$0.91833\pm 0.00980$	$94.617\%\pm 00.166\%$	$0.94635\pm 0.00153$	$30.874\%\pm 00.793\%$	$0.15409\pm 0.01404$	$29.661\%\pm 00.346\%$	$0.14268\pm 0.00854$
Multi-label (Plant)	CNN	$98.645\%\pm 00.160\%$	$0.98643\pm 0.00159$	$94.384\%\pm 01.338\%$	$0.94404\pm 0.01318$	$38.252\%\pm 00.000\%$	$0.36807\pm 0.00239$	$34.746\%\pm 00.000\%$	$0.32100\pm 0.00000$
	AlexNet	$98.578\%\pm 00.092\%$	$0.98581\pm 0.00089$	$96.404\%\pm 00.489\%$	$0.96405\pm 0.00501$	$33.981\%\pm 02.044\%$	$0.32072\pm 0.02660$	$32.500\%\pm 06.460\%$	$0.31231\pm 0.06649$
	VGG16	$97.114\%\pm 00.185\%$	$0.97110\pm 0.00187$	$92.175\%\pm 00.055\%$	$0.92139\pm 0.00031$	$30.252\%\pm 01.738\%$	$0.28354\pm 0.01460$	$23.771\%\pm 03.301\%$	$0.22337\pm 0.02978$
	ResNet101	$99.704\%\pm 00.056\%$	$0.99704\pm 0.00056$	$98.257\%\pm 00.051\%$	$0.98261\pm 0.00048$	$27.340\%\pm 02.315\%$	$0.24259\pm 0.02912$	$23.390\%\pm 03.187\%$	$0.21037\pm 0.01491$
	EfficientNet	$99.314\%\pm 00.089\%$	$0.99315\pm 0.00089$	$95.538\%\pm 00.932\%$	$0.95510\pm 0.00902$	$23.903\%\pm 01.097\%$	$0.21702\pm 0.01132$	$21.102\%\pm 03.559\%$	$0.17609\pm 0.05528$
	InceptionV3	$99.810\%\pm 00.118\%$	$0.99810\pm 0.00118$	$99.112\%\pm 00.000\%$	$0.99111\pm 0.00000$	$41.981\%\pm 02.608\%$	$0.40630\pm 0.01026$	$38.771\%\pm 00.636\%$	$0.36693\pm 0.00783$
	MobileNetV2	$99.469\%\pm 00.019\%$	$0.99469\pm 0.00019$	$95.450\%\pm 00.778\%$	$0.95464\pm 0.00786$	$13.592\%\pm 00.000\%$	$0.03253\pm 0.00000$	$11.017\%\pm 00.000\%$	$0.02187\pm 0.00000$
	ViT	$92.238\%\pm 00.175\%$	$0.92285\pm 0.00168$	$95.560\%\pm 00.333\%$	$0.95563\pm 0.00350$	$22.718\%\pm 00.583\%$	$0.20821\pm 0.00993$	$17.797\%\pm 01.695\%$	$0.16317\pm 0.02967$
Multi-output (Plant)	CNN	$97.843\%\pm 00.154\%$	$0.97838\pm 0.00150$	$92.963\%\pm 00.997\%$	$0.92930\pm 0.01038$	$39.398\%\pm 02.940\%$	$0.37328\pm 0.02085$	$35.805\%\pm 00.636\%$	$0.33279\pm 0.00302$
	AlexNet	$99.356\%\pm 00.056\%$	$0.99357\pm 0.00056$	$95.871\%\pm 00.435\%$	$0.95862\pm 0.00473$	$39.456\%\pm 02.032\%$	$0.37112\pm 0.01990$	$40.254\%\pm 00.000\%$	$0.36676\pm 0.00000$
	VGG16	$99.407\%\pm 00.035\%$	$0.99406\pm 0.00034$	$96.659\%\pm 00.812\%$	$0.96652\pm 0.00803$	$41.592\%\pm 00.117\%$	$0.33072\pm 0.00733$	$42.585\%\pm 03.178\%$	$0.35115\pm 0.01855$
	ResNet101	$99.681\%\pm 00.019\%$	$0.99681\pm 0.00019$	$97.658\%\pm 00.033\%$	$0.97651\pm 0.00031$	$34.951\%\pm 00.000\%$	$0.25710\pm 0.00000$	$34.746\%\pm 00.000\%$	$0.27496\pm 0.00000$
	EfficientNet	$99.506\%\pm 00.106\%$	$0.99506\pm 0.00106$	$95.183\%\pm 01.100\%$	$0.95161\pm 0.01112$	$33.126\%\pm 02.228\%$	$0.25029\pm 0.01327$	$31.695\%\pm 01.525\%$	$0.21581\pm 0.01123$
	InceptionV3	$99.752\%\pm 00.044\%$	$0.99753\pm 0.00044$	$98.524\%\pm 00.222\%$	$0.98521\pm 0.00223$	$\textbf{51.437\%}\pm 01.513\%$	$0.47071\pm 0.02533$	$43.093\%\pm 03.902\%$	$0.38037\pm 0.04974$
	MobileNetV2	$99.634\%\pm 00.103\%$	$0.99634\pm 0.00104$	$94.750\%\pm 01.129\%$	$0.94731\pm 0.01103$	$25.010\%\pm 07.474\%$	$0.10613\pm 0.04818$	$20.127\%\pm 09.110\%$	$0.07708\pm 0.05521$
	ViT	$93.352\%\pm 00.837\%$	$0.93246\pm 0.00906$	$90.035\%\pm 01.002\%$	$0.89969\pm 0.00947$	$29.900\%\pm 00.000\%$	$0.13770\pm 0.00000$	$29.237\%\pm 00.000\%$	$0.13229\pm 0.00000$
Multi-task (Plant)	Cross-Stitch	$98.197\%\pm 00.155\%$	$0.98196\pm 0.00153$	$95.743\%\pm 00.318\%$	$0.95737\pm 0.00324$	$39.845\%\pm 00.720\%$	$0.35536\pm 0.01282$	$38.814\%\pm 02.734\%$	$0.34577\pm 0.03165$
	MTAN	$95.461\%\pm 00.390\%$	$0.95464\pm 0.00395$	$90.377\%\pm 02.631\%$	$0.90345\pm 0.02651$	$27.825\%\pm 02.394\%$	$0.19536\pm 0.02203$	$30.169\%\pm 01.180\%$	$0.18824\pm 0.01435$
	TSNs	$91.704\%\pm 01.452\%$	$0.91686\pm 0.01468$	$85.749\%\pm 03.536\%$	$0.85800\pm 0.03514$	$27.786\%\pm 02.775\%$	$0.18716\pm 0.02020$	$32.881\%\pm 01.989\%$	$0.21219\pm 0.03381$
	MOON	$97.589\%\pm 00.410\%$	$0.97586\pm 0.00411$	$95.627\%\pm 01.013\%$	$0.95646\pm 0.01016$	$36.350\%\pm 01.437\%$	$0.25105\pm 0.01992$	$35.339\%\pm 01.542\%$	$0.25096\pm 0.01692$
Our Methods w.o BW (Plant)	CNN	$98.550\%\pm 00.000\%$	$0.98548\pm 0.00000$	$97.780\%\pm 00.000\%$	$0.97781\pm 0.00000$	$42.252\%\pm 00.621\%$	$0.39688\pm 0.00666$	$38.390\%\pm 02.330\%$	$0.35370\pm 0.02055$
	AlexNet	$98.925\%\pm 00.245\%$	$0.98928\pm 0.00243$	$95.716\%\pm 00.805\%$	$0.95688\pm 0.00816$	$37.942\%\pm 01.963\%$	$0.36431\pm 0.01388$	$38.856\%\pm 03.014\%$	$0.35983\pm 0.03556$
	VGG16	$99.511\%\pm 00.084\%$	$0.99512\pm 0.00085$	$96.249\%\pm 00.583\%$	$0.96236\pm 0.00579$	$44.621\%\pm 00.666\%$	$0.36913\pm 0.00923$	$43.898\%\pm 03.495\%$	$0.35444\pm 0.02802$
	ResNet101	$99.714\%\pm 00.052\%$	$0.99714\pm 0.00052$	$98.069\%\pm 00.217\%$	$0.98066\pm 0.00217$	$36.039\%\pm 04.150\%$	$0.29830\pm 0.03925$	$36.610\%\pm 01.356\%$	$0.27054\pm 0.00194$
	EfficientNet	$99.625\%\pm 00.037\%$	$0.99625\pm 0.00038$	$95.816\%\pm 00.592\%$	$0.95798\pm 0.00596$	$31.631\%\pm 04.450\%$	$0.23771\pm 0.03248$	$33.898\%\pm 01.271\%$	$0.24964\pm 0.01032$
	InceptionV3	$\textbf{99.850\%}\pm 00.019\%$	$\textbf{0.99850}\pm 0.00019$	$98.690\%\pm 00.097\%$	$0.98692\pm 0.00098$	$46.757\%\pm 04.038\%$	$0.45049\pm 0.03376$	$44.492\%\pm 06.196\%$	$0.41757\pm 0.05976$
	MobileNetV2	$99.594\%\pm 00.119\%$	$0.99593\pm 0.00120$	$95.627\%\pm 00.761\%$	$0.95682\pm 0.00727$	$13.592\%\pm 00.000\%$	$0.03253\pm 0.00000$	$20.127\%\pm 09.110\%$	$0.07708\pm 0.05521$
	ViT	$90.282\%\pm 00.290\%$	$0.90219\pm 0.00281$	$94.458\%\pm 00.472\%$	$0.94426\pm 0.00470$	$32.136\%\pm 01.650\%$	$0.25778\pm 0.02330$	$30.085\%\pm 01.271\%$	$0.24128\pm 0.00673$
Our Methods with BW (Plant)	InceptionV3	$99.687\%\pm 00.000\%$	$0.99688\pm 0.0000$	$\textbf{99.646\%}\pm 00.000\%$	$\textbf{0.99647}\pm 0.00000$	$49.262\%\pm 02.442\%$	$\textbf{0.48051}\pm 0.01716$	$\textbf{46.992}\%\pm 03.637\%$	$\textbf{0.44934}\pm 0.03012$
Our Methods w. BW + TF (Plant)	InceptionV3	$99.928\%\pm 00.012\%$	$0.99927\pm 0.00012$	$99.776\%\pm 00.227\%$	$0.99777\pm 0.00226$	$71.262\%\pm 03.962\%$	$0.70602\pm 0.04275$	$73.644\%\pm 02.548\%$	$0.71980\pm 0.02900$

Table 6. Plant Disease Results.

		Plant Village		Plant leaves		PlantDoc-0.2		PlantDoc-1.0
		Acc	F1	Acc	F1	Acc	F1	Acc	F1
Multi-model (Disease)	CNN	$95.374\%\pm 00.323\%$	$0.95364\pm 0.00312$	$89.345\%\pm 02.442\%$	$0.89046\pm 0.02561$	$33.146\%\pm 01.573\%$	$0.30879\pm 0.01138$	$42.797\%\pm 00.000\%$	$0.36857\pm 0.00000$
	AlexNet	$97.399\%\pm 00.322\%$	$0.97400\pm 0.00318$	$90.165\%\pm 00.543\%$	$0.89938\pm 0.00537$	$33.864\%\pm 00.303\%$	$0.30133\pm 0.01534$	$41.059\%\pm 00.898\%$	$0.35199\pm 0.02495$
	VGG16	$99.105\%\pm 00.715\%$	$0.99104\pm 0.00716$	$93.042\%\pm 00.472\%$	$0.92978\pm 0.00452$	$40.291\%\pm 01.226\%$	$0.33305\pm 0.02199$	$44.237\%\pm 01.887\%$	$0.36279\pm 0.01651$
	ResNet101	$98.828\%\pm 00.058\%$	$0.98825\pm 0.00056$	$92.642\%\pm 00.094\%$	$0.92414\pm 0.00113$	$31.204\%\pm 01.139\%$	$0.24170\pm 0.01724$	$41.314\%\pm 00.434\%$	$0.29068\pm 0.00875$
	EfficientNet	$98.561\%\pm 00.074\%$	$0.98558\pm 0.00076$	$88.974\%\pm 00.921\%$	$0.88555\pm 0.01084$	$31.320\%\pm 00.990\%$	$0.25177\pm 0.00279$	$39.873\%\pm 00.481\%$	$0.31227\pm 0.00996$
	InceptionV3	$98.561\%\pm 00.074\%$	$0.98558\pm 0.00076$	$97.170\%\pm 00.000\%$	$0.97164\pm 0.00000$	$35.786\%\pm 02.346\%$	$0.30539\pm 0.02399$	$\textbf{51.992\%}\pm 03.272\%$	$\textbf{0.48344}\pm 0.05286$
	MobileNetV2	$99.275\%\pm 00.000\%$	$0.99274\pm 0.00000$	$87.807\%\pm 01.868\%$	$0.87907\pm 0.01975$	$07.029\%\pm 00.078\%$	$0.00923\pm 0.00020$	$08.475\%\pm 00.000\%$	$0.01324\pm 0.00000$
	ViT	$90.314\%\pm 01.031\%$	$0.89702\pm 0.01010$	$89.789\%\pm 00.888\%$	$0.89693\pm 0.00833$	$30.680\%\pm 02.220\%$	$0.18810\pm 0.03184$	$38.701\%\pm 00.528\%$	$0.25129\pm 0.03408$
Multi-label (Disease)	CNN	$96.748\%\pm 00.023\%$	$0.96752\pm 0.00025$	$91.632\%\pm 00.977\%$	$0.91562\pm 0.01059$	$35.184\%\pm 00.534\%$	$0.33953\pm 0.00126$	$46.610\%\pm 00.000\%$	$0.42350\pm 0.00000$
	AlexNet	$96.438\%\pm 00.115\%$	$0.96441\pm 0.00121$	$92.431\%\pm 00.381\%$	$0.92444\pm 0.00312$	$32.000\%\pm 02.120\%$	$0.30747\pm 0.01745$	$41.864\%\pm 04.797\%$	$0.38276\pm 0.05842$
	VGG16	$94.680\%\pm 00.444\%$	$0.94659\pm 0.00436$	$88.380\%\pm 00.736\%$	$0.88040\pm 0.00741$	$32.757\%\pm 00.815\%$	$0.29959\pm 0.00604$	$39.958\%\pm 03.137\%$	$0.35054\pm 0.01112$
	ResNet101	$98.926\%\pm 00.083\%$	$0.98927\pm 0.00083$	$95.572\%\pm 00.402\%$	$0.95467\pm 0.00394$	$30.155\%\pm 03.440\%$	$0.28404\pm 0.02025$	$38.220\%\pm 03.048\%$	$0.34683\pm 0.01162$
	EfficientNet	$98.087\%\pm 00.183\%$	$0.98083\pm 0.00185$	$91.188\%\pm 00.089\%$	$0.90920\pm 0.00027$	$26.777\%\pm 01.800\%$	$0.26184\pm 0.01083$	$32.203\%\pm 05.932\%$	$0.30440\pm 0.04113$
	InceptionV3	$99.383\%\pm 00.099\%$	$0.99384\pm 0.00099$	$97.547\%\pm 00.033\%$	$0.97468\pm 0.00029$	$39.573\%\pm 01.218\%$	$0.38842\pm 0.00837$	$46.483\%\pm 00.381\%$	$0.43180\pm 0.00194$
	MobileNetV2	$98.571\%\pm 00.064\%$	$0.98569\pm 0.00064$	$91.787\%\pm 01.372\%$	$0.91625\pm 0.01531$	$07.146\%\pm 00.078\%$	$0.00953\pm 0.00020$	$05.424\%\pm 01.017\%$	$0.00575\pm 0.00250$
	ViT	$90.052\%\pm 00.668\%$	$0.89733\pm 0.00664$	$92.675\%\pm 00.222\%$	$0.92528\pm 0.00239$	$26.796\%\pm 00.000\%$	$0.24090\pm 0.00985$	$38.983\%\pm 00.847\%$	$0.32884\pm 0.01774$
Multi-output (Disease)	CNN	$94.898\%\pm 00.493\%$	$0.94882\pm 0.00487$	$89.467\%\pm 00.948\%$	$0.89130\pm 0.01004$	$39.592\%\pm 01.452\%$	$0.35808\pm 0.01346$	$41.314\%\pm 00.636\%$	$0.39240\pm 0.00270$
	AlexNet	$98.058\%\pm 00.127\%$	$0.98056\pm 0.00127$	$92.508\%\pm 00.114\%$	$0.92443\pm 0.00163$	$36.621\%\pm 01.807\%$	$0.33630\pm 0.02018$	$47.034\%\pm 00.000\%$	$0.44287\pm 0.00000$
	VGG16	$98.643\%\pm 00.042\%$	$0.98642\pm 0.00044$	$94.839\%\pm 01.049\%$	$0.94706\pm 0.01153$	$42.252\%\pm 00.816\%$	$0.35863\pm 0.01492$	$48.517\%\pm 01.907\%$	$0.39686\pm 0.03644$
	ResNet101	$99.021\%\pm 00.049\%$	$0.99020\pm 0.00049$	$95.627\%\pm 00.466\%$	$0.95588\pm 0.00429$	$35.730\%\pm 00.000\%$	$0.29591\pm 0.00000$	$41.949\%\pm 00.000\%$	$0.32134\pm 0.00000$
	EfficientNet	$98.539\%\pm 00.103\%$	$0.98539\pm 0.00103$	$91.831\%\pm 01.184\%$	$0.91649\pm 0.01190$	$30.583\%\pm 00.558\%$	$0.24967\pm 0.00594$	$38.644\%\pm 01.017\%$	$0.26158\pm 0.01855$
	InceptionV3	$99.414\%\pm 00.056\%$	$0.99413\pm 0.00056$	$97.414\%\pm 00.494\%$	$0.97330\pm 0.00540$	$43.165\%\pm 00.089\%$	$0.41892\pm 0.00977$	$46.441\%\pm 03.239\%$	$0.40202\pm 0.03922$
	MobileNetV2	$98.581\%\pm 00.207\%$	$0.98577\pm 0.00206$	$91.931\%\pm 00.740\%$	$0.91873\pm 0.00750$	$05.553\%\pm 01.068\%$	$0.00604\pm 0.00235$	$07.881\%\pm 01.780\%$	$0.01204\pm 0.00359$
	ViT	$90.553\%\pm 00.928\%$	$0.90458\pm 0.00712$	$86.733\%\pm 01.946\%$	$0.86373\pm 0.02181$	$31.456\%\pm 00.000\%$	$0.15054\pm 0.00000$	$38.136\%\pm 00.000\%$	$0.21056\pm 0.00000$
Multi-task (Disease)	Cross-Stitch	$95.241\%\pm 00.169\%$	$0.95237\pm 0.00179$	$93.785\%\pm 00.248\%$	$0.93739\pm 0.00276$	$40.466\%\pm 00.995\%$	$0.35972\pm 0.00859$	$44.492\%\pm 01.431\%$	$0.38920\pm 0.01102$
	MTAN	$91.410\%\pm 00.710\%$	$0.91411\pm 0.00702$	$87.148\%\pm 02.273\%$	$0.86893\pm 0.02296$	$30.971\%\pm 01.075\%$	$0.20637\pm 0.01803$	$38.941\%\pm 01.079\%$	$0.25224\pm 0.01672$
	TSNs	$79.926\%\pm 02.681\%$	$0.79900\pm 0.02653$	$81.953\%\pm 03.986\%$	$0.81046\pm 0.04324$	$30.039\%\pm 01.139\%$	$0.19396\pm 0.01471$	$37.627\%\pm 01.254\%$	$0.25399\pm 0.02069$
	MOON	$95.307\%\pm 00.889\%$	$0.95296\pm 0.00886$	$92.986\%\pm 01.060\%$	$0.92804\pm 0.01191$	$34.757\%\pm 01.074\%$	$0.22978\pm 0.02701$	$41.568\%\pm 01.485\%$	$0.31160\pm 0.02726$
Our Methods w.o BW (Disease)	CNN	$96.363\%\pm 00.000\%$	$0.96352\pm 0.00000$	$95.006\%\pm 00.000\%$	$0.94992\pm 0.00000$	$42.058\%\pm 00.932\%$	$0.38199\pm 0.00462$	$46.229\%\pm 01.359\%$	$0.41789\pm 0.01399$
	AlexNet	$97.380\%\pm 00.496\%$	$0.97376\pm 0.00496$	$92.519\%\pm 00.928\%$	$0.92387\pm 0.00963$	$37.631\%\pm 00.756\%$	$0.30592\pm 0.03019$	$44.619\%\pm 02.218\%$	$0.39379\pm 0.02078$
	VGG16	$98.716\%\pm 00.139\%$	$0.98713\pm 0.00142$	$95.061\%\pm 00.259\%$	$0.94922\pm 0.00299$	$42.718\%\pm 00.951\%$	$0.34025\pm 0.01214$	$45.636\%\pm 00.194\%$	$0.37864\pm 0.01234$
	ResNet101	$98.943\%\pm 00.056\%$	$0.98943\pm 0.00056$	$95.461\%\pm 00.674\%$	$0.95442\pm 0.00664$	$32.893\%\pm 01.054\%$	$0.23341\pm 0.00812$	$38.729\%\pm 00.339\%$	$0.29411\pm 0.01903$
	EfficientNet	$98.639\%\pm 00.034\%$	$0.98642\pm 0.00035$	$92.786\%\pm 00.880\%$	$0.92687\pm 0.00850$	$30.194\%\pm 02.639\%$	$0.24793\pm 0.02109$	$40.466\%\pm 00.212\%$	$0.33290\pm 0.00744$
	InceptionV3	$99.418\%\pm 00.029\%$	$0.99417\pm 0.00030$	$97.292\%\pm 00.546\%$	$0.97201\pm 0.00608$	$45.029\%\pm 03.134\%$	$0.41716\pm 0.03430$	$47.542\%\pm 04.664\%$	$0.46156\pm 0.03985$
	MobileNetV2	$98.645\%\pm 00.335\%$	$0.98648\pm 0.00329$	$91.720\%\pm 01.414\%$	$0.91710\pm 0.01342$	$07.010\%\pm 00.058\%$	$0.00918\pm 0.00015$	$06.356\%\pm 01.271\%$	$0.00787\pm 0.00294$
	ViT	$86.728\%\pm 00.087\%$	$0.86416\pm 0.00183$	$90.330\%\pm 00.354\%$	$0.90116\pm 0.00410$	$32.913\%\pm 00.291\%$	$0.24160\pm 0.01754$	$38.983\%\pm 00.424\%$	$0.30192\pm 0.00490$
Our Methods with BW (Disease)	InceptionV3	$\textbf{99.466\%}\pm 00.000\%$	$\textbf{0.99466}\pm 0.00000$	$\textbf{97.759\%}\pm 00.000\%$	$\textbf{0.97738}\pm 0.00000$	$\textbf{45.301\%}\pm 03.939\%$	$\textbf{0.43104}\pm 0.03197$	$50.212\%\pm 02.584\%$	$0.48154\pm 0.03108$
Our Methods w. BW + TF (Disease)	InceptionV3	$99.615\%\pm 00.049\%$	$0.99615\pm 0.00049$	$98.149\%\pm 00.167\%$	$0.98146\pm 0.00173$	$64.544\%\pm 03.089\%$	$0.62968\pm 0.03476$	$63.305\%\pm 00.786\%$	$0.63757\pm 0.00953$

Table 7. Both (Type & Disease) Results.

		Plant Village		Plant Leaves		PlantDoc-0.2		PlantDoc-1.0
		Acc	F1	Acc	F1	Acc	F1	Acc	F1
Multi-model (Both)	CNN	$93.273\%\pm 00.396\%$	$0.94202\pm 0.00310$	$85.283\%\pm 03.019\%$	$0.86703\pm 0.02708$	$11.456\%\pm 00.583\%$	$0.12928\pm 0.00427$	$12.712\%\pm 00.000\%$	$0.11760\pm 0.00000$
	AlexNet	$96.275\%\pm 00.582\%$	$0.96831\pm 0.00447$	$88.007\%\pm 00.971\%$	$0.88788\pm 0.00804$	$12.117\%\pm 01.836\%$	$0.12140\pm 0.03146$	$14.746\%\pm 02.678\%$	$0.12720\pm 0.02971$
	VGG16	$98.634\%\pm 00.978\%$	$0.98852\pm 0.00827$	$90.778\%\pm 00.283\%$	$0.92730\pm 0.00409$	$16.252\%\pm 01.648\%$	$0.15821\pm 0.02069$	$14.746\%\pm 01.150\%$	$0.12209\pm 0.01727$
	ResNet101	$98.295\%\pm 00.065\%$	$0.98524\pm 0.00052$	$91.698\%\pm 00.094\%$	$0.92189\pm 0.00087$	$05.612\%\pm 00.566\%$	$0.03379\pm 0.00847$	$05.805\%\pm 00.601\%$	$0.03731\pm 0.00309$
	EfficientNet	$98.034\%\pm 00.000\%$	$0.98301\pm 0.00029$	$85.814\%\pm 00.592\%$	$0.87085\pm 0.00535$	$07.553\%\pm 00.058\%$	$0.06758\pm 0.00492$	$07.839\%\pm 01.820\%$	$0.05904\pm 0.00971$
	InceptionV3	$98.034\%\pm 00.000\%$	$0.98301\pm 0.00029$	$96.462\%\pm 00.000\%$	$0.97142\pm 0.00000$	$14.194\%\pm 00.421\%$	$0.13135\pm 0.00396$	$20.212\%\pm 02.924\%$	$0.19346\pm 0.04271$
	MobileNetV2	$99.050\%\pm 00.000\%$	$0.99148\pm 0.00000$	$85.307\%\pm 01.726\%$	$0.87486\pm 0.01687$	$04.078\%\pm 01.553\%$	$0.00361\pm 0.00301$	$00.847\%\pm 01.695\%$	$0.00069\pm 0.00138$
	ViT	$84.481\%\pm 01.864\%$	$0.85548\pm 0.01771$	$86.238\%\pm 00.999\%$	$0.87549\pm 0.00578$	$03.883\%\pm 01.040\%$	$0.02103\pm 0.00969$	$04.802\%\pm 00.871\%$	$0.01652\pm 0.00460$
Multi-label (Both)	CNN	$96.537\%\pm 00.057\%$	$0.96542\pm 0.00062$	$89.523\%\pm 01.655\%$	$0.89448\pm 0.01606$	$20.680\%\pm 00.445\%$	$0.18933\pm 0.00913$	$20.763\%\pm 00.000\%$	$0.16217\pm 0.00000$
	AlexNet	$96.150\%\pm 00.172\%$	$0.96156\pm 0.00175$	$91.099\%\pm 00.652\%$	$0.90945\pm 0.00583$	$18.000\%\pm 00.772\%$	$0.15703\pm 0.01071$	$19.195\%\pm 04.383\%$	$0.15931\pm 0.04658$
	VGG16	$94.350\%\pm 00.433\%$	$0.94316\pm 0.00444$	$85.527\%\pm 00.743\%$	$0.85292\pm 0.00795$	$16.854\%\pm 00.986\%$	$0.14534\pm 0.00831$	$13.432\%\pm 01.748\%$	$0.09539\pm 0.02077$
	ResNet101	$98.841\%\pm 00.070\%$	$0.98841\pm 0.00071$	$94.684\%\pm 00.230\%$	$0.94643\pm 0.00240$	$16.000\%\pm 00.956\%$	$0.12609\pm 0.02183$	$11.864\%\pm 00.709\%$	$0.07009\pm 0.00548$
	EfficientNet	$97.923\%\pm 00.204\%$	$0.97908\pm 0.00207$	$89.789\%\pm 00.444\%$	$0.89457\pm 0.00349$	$12.913\%\pm 00.806\%$	$0.09489\pm 0.00599$	$08.983\%\pm 00.678\%$	$0.04257\pm 0.00800$
	InceptionV3	$99.310\%\pm 00.171\%$	$0.99311\pm 0.00173$	$97.114\%\pm 00.000\%$	$0.97077\pm 0.00000$	$22.971\%\pm 01.129\%$	$0.21467\pm 0.00871$	$\textbf{25.466\%}\pm 00.127\%$	$0.21130\pm 0.00654$
	MobileNetV2	$98.483\%\pm 00.023\%$	$0.98481\pm 0.00021$	$90.122\%\pm 01.469\%$	$0.90013\pm 0.01574$	$06.408\%\pm 01.553\%$	$0.00813\pm 0.00301$	$05.000\%\pm 00.254\%$	$0.00477\pm 0.00044$
	ViT	$87.699\%\pm 00.571\%$	$0.87090\pm 0.00500$	$90.899\%\pm 00.333\%$	$0.90881\pm 0.00392$	$13.592\%\pm 00.000\%$	$0.10092\pm 0.00978$	$11.864\%\pm 00.000\%$	$0.07795\pm 0.01533$
Multi-output (Both)	CNN	$93.208\%\pm 00.569\%$	$0.93766\pm 0.00483$	$85.094\%\pm 01.141\%$	$0.86082\pm 0.01036$	$14.078\%\pm 00.701\%$	$0.15213\pm 0.00644$	$12.288\%\pm 01.271\%$	$0.11036\pm 0.01495$
	AlexNet	$97.719\%\pm 00.131\%$	$0.97855\pm 0.00121$	$90.777\%\pm 00.578\%$	$0.90887\pm 0.00518$	$16.272\%\pm 01.319\%$	$0.16022\pm 0.01750$	$22.458\%\pm 00.000\%$	$0.20815\pm 0.00000$
	VGG16	$98.258\%\pm 00.085\%$	$0.98359\pm 0.00066$	$92.941\%\pm 01.441\%$	$0.93384\pm 0.01352$	$17.767\%\pm 01.456\%$	$0.15111\pm 0.01682$	$16.737\%\pm 02.331\%$	$0.11670\pm 0.02365$
	ResNet101	$98.789\%\pm 00.071\%$	$0.98843\pm 0.00053$	$94.295\%\pm 00.466\%$	$0.94471\pm 0.00399$	$09.126\%\pm 00.000\%$	$0.06699\pm 0.00000$	$11.441\%\pm 00.000\%$	$0.05938\pm 0.00000$
	EfficientNet	$98.195\%\pm 00.162\%$	$0.98290\pm 0.00135$	$89.734\%\pm 01.825\%$	$0.89997\pm 0.01753$	$08.738\%\pm 01.387\%$	$0.06837\pm 0.00493$	$06.737\%\pm 00.127\%$	$0.03745\pm 0.00025$
	InceptionV3	$99.215\%\pm 00.017\%$	$0.99239\pm 0.00018$	$96.493\%\pm 00.519\%$	$0.96693\pm 0.00536$	$23.650\%\pm 00.356\%$	$0.23228\pm 0.01445$	$18.136\%\pm 04.308\%$	$0.14635\pm 0.03692$
	MobileNetV2	$98.309\%\pm 00.284\%$	$0.98401\pm 0.00276$	$89.234\%\pm 01.277\%$	$0.89726\pm 0.01118$	$02.155\%\pm 03.292\%$	$0.00289\pm 0.00441$	$02.119\%\pm 02.119\%$	$0.00172\pm 0.00172$
	ViT	$87.276\%\pm 01.327\%$	$0.87476\pm 0.01048$	$82.842\%\pm 02.300\%$	$0.82853\pm 0.02119$	$01.942\%\pm 00.000\%$	$0.00074\pm 0.00000$	$03.390\%\pm 00.000\%$	$0.00222\pm 0.00000$
Multi-task (Both)	Cross-Stitch	$93.870\%\pm 00.310\%$	$0.94509\pm 0.00276$	$91.356\%\pm 00.106\%$	$0.92334\pm 0.00091$	$17.068\%\pm 00.629\%$	$0.17532\pm 0.00541$	$16.610\%\pm 02.241\%$	$0.15028\pm 0.01913$
	MTAN	$89.509\%\pm 00.898\%$	$0.90269\pm 0.00836$	$82.420\%\pm 03.549\%$	$0.83209\pm 0.03344$	$04.175\%\pm 01.210\%$	$0.02986\pm 0.01344$	$06.102\%\pm 00.988\%$	$0.02871\pm 0.00839$
	TSNs	$76.808\%\pm 03.190\%$	$0.77433\pm 0.03101$	$74.673\%\pm 05.778\%$	$0.75099\pm 0.05800$	$05.029\%\pm 01.179\%$	$0.02603\pm 0.01110$	$05.975\%\pm 01.545\%$	$0.02682\pm 0.01049$
	MOON	$94.414\%\pm 00.997\%$	$0.94586\pm 0.00974$	$90.744\%\pm 01.470\%$	$0.90727\pm 0.01542$	$08.272\%\pm 01.571\%$	$0.04720\pm 0.01639$	$09.068\%\pm 01.506\%$	$0.04485\pm 0.01521$
Our Methods w.o BW (Both)	CNN	$95.500\%\pm 00.000\%$	$0.95741\pm 0.00000$	$93.563\%\pm 00.000\%$	$0.93847\pm 0.00000$	$21.748\%\pm 01.553\%$	$0.21419\pm 0.01257$	$18.178\%\pm 02.524\%$	$0.16272\pm 0.02979$
	AlexNet	$96.968\%\pm 00.588\%$	$0.97061\pm 0.00561$	$90.765\%\pm 01.310\%$	$0.90818\pm 0.01255$	$15.553\%\pm 02.203\%$	$0.15499\pm 0.02119$	$17.246\%\pm 01.229\%$	$0.15248\pm 0.01697$
	VGG16	$98.395\%\pm 00.198\%$	$0.98445\pm 0.00189$	$93.219\%\pm 00.644\%$	$0.93439\pm 0.00628$	$16.000\%\pm 01.332\%$	$0.14380\pm 0.01274$	$17.119\%\pm 02.330\%$	$0.11884\pm 0.01286$
	ResNet101	$98.773\%\pm 00.053\%$	$0.98796\pm 0.00056$	$94.550\%\pm 00.737\%$	$0.94664\pm 0.00664$	$09.825\%\pm 01.492\%$	$0.07677\pm 0.01649$	$10.254\%\pm 00.678\%$	$0.06014\pm 0.00730$
	EfficientNet	$98.445\%\pm 00.052\%$	$0.98482\pm 0.00060$	$90.877\%\pm 00.980\%$	$0.90985\pm 0.00847$	$08.039\%\pm 01.318\%$	$0.06106\pm 0.01903$	$09.322\%\pm 00.424\%$	$0.07318\pm 0.00716$
	InceptionV3	$\textbf{99.315\%}\pm 00.037\%$	$\textbf{0.99333}\pm 0.00037$	$96.593\%\pm 00.447\%$	$0.96641\pm 0.00457$	$25.359\%\pm 02.170\%$	$0.24692\pm 0.02342$	$23.814\%\pm 02.910\%$	$0.22487\pm 0.03326$
	MobileNetV2	$98.408\%\pm 00.404\%$	$0.98445\pm 0.00392$	$89.989\%\pm 01.468\%$	$0.90320\pm 0.01200$	$03.689\%\pm 01.165\%$	$0.00286\pm 0.00226$	$04.661\%\pm 00.424\%$	$0.00418\pm 0.00074$
	ViT	$83.003\%\pm 00.064\%$	$0.83316\pm 0.00037$	$87.913\%\pm 00.295\%$	$0.87931\pm 0.00502$	$08.447\%\pm 01.262\%$	$0.07872\pm 0.01946$	$06.780\%\pm 00.424\%$	$0.05606\pm 0.00594$
Our Methods with BW (Both)	InceptionV3	$99.208\%\pm 00.000\%$	$0.99245\pm 0.00000$	$\textbf{97.524\%}\pm 00.000\%$	$\textbf{0.97746}\pm 0.00000$	$\textbf{26.175\%}\pm 03.008\%$	$\textbf{0.26760}\pm 0.03055$	$24.153\%\pm 02.584\%$	$\textbf{0.23191}\pm 0.02879$
Our Methods w. BW + TF (Both)	InceptionV3	$99.558\%\pm 00.055\%$	$0.99565\pm 0.00064$	$98.042\%\pm 00.270\%$	$0.98116\pm 0.00196$	$50.291\%\pm 03.895\%$	$0.50396\pm 0.04018$	$50.000\%\pm 02.023\%$	$0.50191\pm 0.02036$

4.4. Accuracy, F1-score and False Positive Rates of the Optimums

Table 8 evaluates the best model of each approach across all datasets. All models use an InceptionV3 backbone, the top-performing backbone in Tables 5, 6, and 7. For each model we report accuracy (Acc), F1-score (F1), and false positive rate (FPR); higher accuracy and F1-score indicate better performance, whereas an FPR closer to 0 is better. Our method with the classifier chain, optimised balance weights, and transfer learning (the last row) performs best overall. Its accuracy and F1-score are close to 1.0 on both the Plant Village and Plant Leaves datasets. On the two PlantDoc datasets it exceeds 0.7 for plant species classification (Plant) and 0.6 for leaf disease classification (Dis), and the combined classification (Both) exceeds 0.5 in both accuracy and F1-score. Its FPR is the lowest, or tied for the lowest, in 11 of the 12 columns; the only exception is the combined prediction on the original PlantDoc split, where the multi-model approach is marginally lower (0.01221 versus 0.01304).

In summary, these results underscore the effectiveness of combining the classifier chain structure, balance weights optimised through grid search, transfer learning from ImageNet weights, and an InceptionV3 backbone. Together, these techniques yield a more robust and precise classification system with a consistently low FPR, that is, few false alarms. The last three rows of the table, which correspond to the classifier chain, show the contribution of each technique: as balance weights and then transfer learning are added, accuracy and F1-score increase and FPR decreases in most columns, with the largest gains on PlantDoc. This level of effectiveness may make such models a valuable choice for practical applications in smart agriculture and plant health management.
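
As an illustration, the three metrics reported in Table 8 can be computed from true and predicted labels as in the following minimal Python sketch. It assumes macro-averaging over classes and a one-vs-rest definition of FPR, i.e. FP / (FP + TN) per class; the averaging scheme actually used for Table 8 may differ, and the function names and toy labels are hypothetical.

```python
def per_class_counts(y_true, y_pred, labels):
    """Return {label: (tp, fp, fn, tn)} using one-vs-rest counting."""
    counts = {}
    pairs = list(zip(y_true, y_pred))
    for c in labels:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        tn = len(pairs) - tp - fp - fn
        counts[c] = (tp, fp, fn, tn)
    return counts


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1-scores (assumed averaging)."""
    scores = []
    for tp, fp, fn, _ in per_class_counts(y_true, y_pred, labels).values():
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def macro_fpr(y_true, y_pred, labels):
    """Unweighted mean of the per-class false positive rates FP / (FP + TN)."""
    rates = []
    for _, fp, _, tn in per_class_counts(y_true, y_pred, labels).values():
        rates.append(fp / (fp + tn) if (fp + tn) else 0.0)
    return sum(rates) / len(rates)


# Toy labels for illustration only; these are not taken from any dataset.
labels = ["apple", "grape", "corn"]
y_true = ["apple", "apple", "grape", "grape", "corn", "corn"]
y_pred = ["apple", "grape", "grape", "grape", "corn", "apple"]

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred, labels)
fpr = macro_fpr(y_true, y_pred, labels)
```

Macro-averaging gives every class the same influence regardless of its frequency, which is one reason the F1-score is informative on imbalanced data.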

Table 8. The Optimums’ Accuracy, F1-score & False Positive Rates.

Approaches	Metric	Plant Village			Plant Leaves			Plant Doc 0.2			Plant Doc Original
Approaches	Metric	Plant	Dis	Both	Plant	Dis	Both	Plant	Dis	Both	Plant	Dis	Both
Multi-model	Acc	0.99750	0.99275	0.99050	0.99175	0.97170	0.96462	0.44854	0.36311	0.14563	0.44492	0.50847	0.21186
	F1	0.99750	0.99274	0.99148	0.99175	0.97164	0.97142	0.36914	0.30423	0.13374	0.43324	0.47997	0.21345
	FPR	0.00033	0.00039	0.00022	0.00093	0.00373	0.00118	0.05168	0.04737	0.01723	0.04504	0.03634	0.01221
Multi-label	Acc	0.99834	0.99558	0.99484	0.99112	0.97558	0.97114	0.33786	0.37282	0.21748	0.41949	0.47034	0.24576
	F1	0.99834	0.99559	0.99485	0.99111	0.97477	0.97077	0.32529	0.33752	0.17350	0.41247	0.46042	0.21948
	FPR	0.00013	0.00023	0.00014	0.00080	0.00361	0.00137	0.05330	0.04435	0.02964	0.04725	0.03855	0.02834
Multi-output	Acc	0.99888	0.99388	0.99288	0.98890	0.96670	0.96004	0.52427	0.43107	0.23883	0.44492	0.51271	0.19915
	F1	0.99887	0.99388	0.99319	0.98893	0.96616	0.96127	0.48730	0.42531	0.24174	0.39350	0.45603	0.17277
	FPR	0.00017	0.00032	0.00020	0.00104	0.00437	0.00167	0.04249	0.03991	0.01745	0.04800	0.03861	0.02533
Our Methods	Acc	0.99838	0.99400	0.99300	0.99112	0.97336	0.96559	0.46214	0.44854	0.28155	0.51695	0.50424	0.25424
	F1	0.99837	0.99400	0.99319	0.99119	0.97201	0.96599	0.44974	0.41720	0.25822	0.47412	0.48480	0.24085
	FPR	0.00025	0.00032	0.00020	0.00081	0.00414	0.00149	0.04403	0.03973	0.01491	0.04185	0.03658	0.02038
Our Methods with BW	Acc	0.99725	0.99450	0.99225	0.99646	0.97759	0.97524	0.48932	0.50874	0.31456	0.53814	0.53390	0.31356
	F1	0.99726	0.99449	0.99248	0.99647	0.97738	0.97746	0.51162	0.47363	0.33094	0.51138	0.50913	0.28519
	FPR	0.00037	0.00030	0.00022	0.00039	0.00327	0.00107	0.04121	0.03774	0.01443	0.03818	0.03411	0.01660
Our Methods w. BW + TF	Acc	0.99913	0.99650	0.99600	1.00000	0.98231	0.98231	0.76505	0.65049	0.55340	0.76271	0.62712	0.51271
	F1	0.99912	0.99650	0.99625	1.00000	0.98237	0.98225	0.76336	0.63742	0.54967	0.75084	0.63908	0.50652
	FPR	0.00013	0.00019	0.00011	0.00000	0.00217	0.00094	0.02015	0.02508	0.01113	0.02011	0.02545	0.01304

4.5. Summary and Ablation Study

4.5.1. Backbone CNNs and ViT

The first question we address in this study is which backbone architectures are most useful for plant identification and disease classification. Although we cannot test every CNN available in the literature, we selected the most popular ones in image classification and plant pathology: AlexNet, VGG16, ResNet101, EfficientNet, InceptionV3, MobileNetV2, ViT, and our custom CNN. Figure 9 summarises the results as four groups of histograms, where each column represents the average F1-score of a backbone over all approaches in this study (multi-model, multi-label, multi-output, and GSMo-CNN). The underlying results are detailed in Tables 5, 6, and 7. As highlighted in Section 3.3, the F1-score is more appropriate than accuracy for evaluating predictions on imbalanced data. The figure compares the backbones and shows how each performs on different datasets. Each plot in Figure 9 shows the average performance in plant identification, disease classification, and combined plant and disease prediction (Both) over the different approaches. InceptionV3 has a better average F1-score than AlexNet, VGG16, ResNet101, EfficientNet, MobileNetV2, ViT, and our custom CNN on all datasets. On Plant Village, its margin over the other backbone CNNs is small, as all backbone CNNs achieve an F1-score above $0.95$. On Plant Leaves, PlantDoc-0.2, and PlantDoc-1.0, however, InceptionV3 clearly demonstrates its advantage over the other CNNs. The second-best backbone CNN on Plant Village and Plant Leaves is ResNet101. On PlantDoc-0.2 and PlantDoc-1.0, our custom CNN and AlexNet are better than the other backbones, just behind InceptionV3. This differs from Plant Village and Plant Leaves, where our custom CNN has the lowest performance.
Besides, ViT achieved reasonable results across all datasets but fell short of the best backbones. The light-weight CNNs (MobileNetV2 and EfficientNet) perform reasonably well, and similarly to each other, on Plant Village and Plant Leaves. On PlantDoc-0.2 and PlantDoc-1.0, however, both light-weight CNNs have the lowest F1-scores, with MobileNetV2 trailing behind. This suggests that light-weight CNNs should be used only for close-up images of a single leaf, and not for images with multiple leaves and complex backgrounds.
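
The per-backbone averaging behind Figure 9 can be sketched as follows. The F1-scores are the Plant Village "Both" means copied from Table 7 for three backbones; the sketch assumes that the GSMo-CNN rows without balance weights are the ones entering the average (the only GSMo-CNN rows that cover every backbone), and the variable and function names are hypothetical.

```python
from statistics import mean

# Mean F1-scores for combined (Both) prediction on Plant Village, from Table 7.
f1_both_plant_village = {
    "InceptionV3": {
        "multi-model": 0.98301,
        "multi-label": 0.99311,
        "multi-output": 0.99239,
        "GSMo-CNN (w.o BW)": 0.99333,
    },
    "ResNet101": {
        "multi-model": 0.98524,
        "multi-label": 0.98841,
        "multi-output": 0.98843,
        "GSMo-CNN (w.o BW)": 0.98796,
    },
    "MobileNetV2": {
        "multi-model": 0.99148,
        "multi-label": 0.98481,
        "multi-output": 0.98401,
        "GSMo-CNN (w.o BW)": 0.98445,
    },
}


def average_f1_per_backbone(table):
    """Average each backbone's F1-score over all learning approaches."""
    return {backbone: mean(by_approach.values())
            for backbone, by_approach in table.items()}


avg_f1 = average_f1_per_backbone(f1_both_plant_village)
# Backbones ordered from highest to lowest average F1-score.
ranking = sorted(avg_f1, key=avg_f1.get, reverse=True)
```

For these three backbones the ordering places InceptionV3 first and ResNet101 second, in line with the discussion above.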

The advantage of InceptionV3 can be explained by three factors. First, InceptionV3 is deeper than VGG16 while requiring fewer parameters. Second, InceptionV3 incorporates techniques such as batch normalisation and dropout that help mitigate overfitting, making it more robust on smaller datasets than VGG. Finally, and most importantly, InceptionV3 combines convolutional layers with different kernel sizes and max-pooling layers to capture multi-scale features efficiently. This allows it to learn features representing disease types from small areas of leaves and thus to achieve competitive performance compared to other backbone CNNs.

4.5.2. Single Models versus Dual Models

In the past, it was common and effective to use one model per task, such as plant species identification (Othman et al., 2022; Shelke and Mehendale, 2022; Kanda et al., 2021) or leaf disease classification (Agarwal et al., 2019; Vijaykanth Reddy and Sashi Rekha, 2021; Ashok et al., 2020). Training an individual model for a particular task is reasonable and straightforward, as one can optimise the model for that specific task. Multi-task learning has been shown to be useful; however, it is designed for tasks with different data, where combining the data can benefit all tasks. This is not the case in this study, where the data for all tasks are the same, which is why the multi-task approach does not show its advantage in our experiments. Nevertheless, we are interested in investigating whether a single model for the two tasks is better than an ensemble of CNNs, one for each task. A single model that effectively predicts both plant species and disease types would also be more efficient in terms of both computation and memory.

In Figure 10 we compare all approaches in this study, including multi-model, multi-label, multi-output, multi-task, and our methods (GSMo-CNNs with and without balance weights). Note that the multi-model approach uses dual models, one for each task. Each bar in the graph represents the highest F1-score of an approach in plant identification (Plant), disease classification (Disease), or combined plant and disease prediction (Both). The underlying results are given in Tables 5, 6, and 7. As mentioned in Section 3.3, we use the F1-score here because it is more suitable for evaluating predictions on imbalanced data. Using dual models, one for each task, is not advantageous: in most cases it performs worse than approaches that use a single CNN for both tasks. Compared to multi-label, multi-model is better only in Plant Leaves for plant identification, and in PlantDoc-1.0 for plant identification and combined prediction. Multi-model beats multi-output in Plant Leaves but is inferior in Plant Village, PlantDoc-0.2, and PlantDoc-1.0. Compared to our methods, multi-model achieves lower performance on all datasets. These results demonstrate the potential of single-model approaches, where a shared backbone CNN for multiple predictions can be more effective than training individual CNNs, one per label. We attribute the advantage of single models over dual models to their ability to learn features common to plant species classification and disease classification, so that the two tasks can inform each other. A single model has an end-to-end structure in which the backbone learns common features while the prediction layers are optimised to learn task-specific features. In this way, a single model can achieve better generalisation.

4.5.3. Learning Approaches

Figure 10 also shows the F1-scores of the different approaches (detailed in Tables 5, 6, and 7). As mentioned in Section 3.3, the F1-score is better suited than accuracy to evaluating predictions on imbalanced data. Overall, the proposed GSMo-CNNs achieve the best results on all datasets, followed by multi-output and multi-label. Multi-output is comparable to multi-label in Plant Village and slightly better in Plant Leaves, whereas multi-label is better in PlantDoc-0.2 and PlantDoc-1.0. Compared to the existing methods summarised in Table 3, we achieve state-of-the-art results. The only exception is AlexNet+SVM (Kawatra et al., 2020), which reports $99.98\%$ validation accuracy on Plant Village. However, this is a hybrid approach, the result is for a validation set, and it is not clear how the data is partitioned for evaluation.

The strong performance of GSMo-CNNs can be attributed to two factors. First, the model uses a chain of predictions to jointly predict species and diseases. This helps it learn the relationships between plant species and disease types and facilitates feature sharing, enabling the model to extract general image characteristics effectively. Second, balance weights are employed to address class imbalances, helping the model to predict all classes accurately. Together, these theoretical and technical advantages make GSMo-CNNs an advantageous approach for plant species identification and disease classification.

4.5.4. Balance Weights

As presented in Section 3.2, GSMo-CNN can apply balance weights to its loss functions. Figure 11 shows heat maps for the balance weights of plant and disease on the final prediction layer and on the temporary layer. A hotter colour represents a higher F1-score. In general, the balance weights for plant and disease need to be tuned case by case, as different tasks require different weight values. For example, in Figure 11(a), the highest F1-scores of GSMo-CNN on the Plant Village dataset appear at a plant weight (WP) of 0.3 with a disease weight (WD) of 0.5, and at WP of 0.5 to 0.6 with WD of 0.2 to 0.4. In Figure 11(c), by contrast, the highest F1-scores on the Plant Leaves dataset appear at WP of 0.6 and WD of 0.6, where the two weights are equal.
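
A grid search of the kind that produces such heat maps can be sketched as below. The grid (0.1 to 0.9 in steps of 0.1), the function names, and in particular `toy_f1` are assumptions for illustration: `toy_f1` is a synthetic stand-in for training a GSMo-CNN with the given weights and measuring its validation F1-score, and does not reproduce any measured value.

```python
from itertools import product


def grid_search_balance_weights(evaluate_f1, grid):
    """Evaluate every (WP, WD) pair and return the best pair with all scores.

    evaluate_f1(wp, wd) is expected to train/evaluate a model with the given
    plant and disease weights and return its F1-score.
    """
    scores = {(wp, wd): evaluate_f1(wp, wd) for wp, wd in product(grid, repeat=2)}
    (best_wp, best_wd), best_f1 = max(scores.items(), key=lambda item: item[1])
    return best_wp, best_wd, best_f1, scores


def toy_f1(wp, wd):
    """Hypothetical surrogate with a single peak at WP = WD = 0.6 (not real data)."""
    return 0.97 - (wp - 0.6) ** 2 - (wd - 0.6) ** 2


# Assumed grid of candidate weights: 0.1, 0.2, ..., 0.9.
grid = [round(0.1 * k, 1) for k in range(1, 10)]

best_wp, best_wd, best_f1, scores = grid_search_balance_weights(toy_f1, grid)
# `scores` holds one F1-score per grid cell, i.e. the data behind a heat map.
```

Because each grid cell requires a full training run, the cost grows quadratically with the grid resolution, which is one motivation for determining the weights automatically.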

4.5.5. Prediction Layer

In Figure 12, we show that stacking prediction layers improves performance and that balance weights help to increase the F1-score further. Finally, we show that transfer learning is useful in this study.

With one prediction level we have the multi-output approach, which achieves good results on Plant Village and Plant Leaves. Although its performance on PlantDoc-0.2 and PlantDoc-1.0 is not high, it is still comparable to or better than the multi-model, multi-label, and multi-task approaches. Stacking another prediction layer on top of the first (temporary) prediction layer with cross connections, disease (temporary) $\rightarrow$ plant and plant (temporary) $\rightarrow$ disease, improves performance in all cases except plant identification and disease classification in PlantDoc-0.2. With balance weights ($\beta_{1}:\delta_{1}:\beta_{2}:\delta_{2}=0.1:0.1:0.4:0.5$) for the training loss functions, further improvement can be seen in 9 out of 12 cases. The other 3 cases, where the balance weights do not show their advantage, are plant identification (Plant Village), disease classification (Plant Leaves), and combined prediction (Plant Village). Finally, the usefulness of transfer learning is apparent, as it fails to increase performance only for disease classification on Plant Leaves. On PlantDoc-0.2 and PlantDoc-1.0 in particular, it brings improvements by large margins.
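
To make the stacked, cross-connected prediction layers and the weighted loss concrete, the following pure-Python sketch shows one possible forward pass and loss for a single sample. Several points are assumptions rather than the exact GSMo-CNN definition: $\beta$ is taken to weight the plant losses and $\delta$ the disease losses, subscript 1 is taken to denote the temporary layer and subscript 2 the final layer, each final head is assumed to receive the backbone features concatenated with the other task's temporary probabilities, and all names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Cross-entropy loss of one sample given logits and the true class index."""
    return -math.log(softmax(logits)[target])


def linear(x, weights, bias):
    """Dense layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def stacked_forward(features, params):
    """Temporary heads first, then final heads fed through cross connections."""
    tmp_plant = linear(features, *params["plant_tmp"])
    tmp_dis = linear(features, *params["dis_tmp"])
    # disease (temporary) -> plant; plant (temporary) -> disease
    plant = linear(features + softmax(tmp_dis), *params["plant"])
    dis = linear(features + softmax(tmp_plant), *params["dis"])
    return tmp_plant, tmp_dis, plant, dis


def balanced_loss(outputs, plant_y, dis_y, weights=(0.1, 0.1, 0.4, 0.5)):
    """Weighted sum of the four losses, weights = (beta1, delta1, beta2, delta2)."""
    tmp_plant, tmp_dis, plant, dis = outputs
    b1, d1, b2, d2 = weights
    return (b1 * cross_entropy(tmp_plant, plant_y)
            + d1 * cross_entropy(tmp_dis, dis_y)
            + b2 * cross_entropy(plant, plant_y)
            + d2 * cross_entropy(dis, dis_y))


def zeros(rows, cols):
    """Zero-initialised (weights, bias) pair for a dense layer."""
    return [[0.0] * cols for _ in range(rows)], [0.0] * rows


# Toy setup: 2 backbone features, 2 plant classes, 3 disease classes.
params = {
    "plant_tmp": zeros(2, 2),
    "dis_tmp": zeros(3, 2),
    "plant": zeros(2, 2 + 3),  # features + temporary disease probabilities
    "dis": zeros(3, 2 + 2),    # features + temporary plant probabilities
}
outputs = stacked_forward([0.5, -1.0], params)
loss = balanced_loss(outputs, plant_y=1, dis_y=2)
```

With the weights above, the final-layer losses dominate the objective while the temporary-layer losses act as auxiliary supervision.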

5. Conclusion and Future Work

We presented a comprehensive survey and empirical study on deep learning for plant identification and disease classification from leaf images. The paper addresses a gap in modern plant pathology, where cutting-edge technologies such as deep learning have been widely adopted but benchmarking studies are still lacking. Besides reviewing currently used methods, we also present methods from the machine learning literature that have not yet been employed for plant identification and/or disease classification. Furthermore, we investigate the hypothesis that a single model for multi-prediction is more useful than multiple models, one for each task, in terms of implementation, computation, memory, and effectiveness. To this end, we categorise the approaches into multi-model, multi-label, multi-output, and multi-task, where different CNN structures can serve as the backbone, if applicable. For completeness, we also propose a new model, based on the multi-output approach and the idea of the classifier chain, that stacks and cross-connects prediction layers. We ran intensive experiments to evaluate and compare these backbone models and approaches in uniform settings. We found that:

• InceptionV3 is the best choice for a backbone CNN in this study, as it performs better than AlexNet, VGG16, ResNet101, MobileNetV2, EfficientNet, ViT, and our custom CNN.
• Using a single model for both tasks is more useful than using a separate model for each. Single models are more convenient (easier model selection, lower memory use, greater efficiency) and also achieve better results.
• Stacking and cross-connecting prediction layers can improve the accuracy and F1-score for plant identification and disease classification. The approach is also flexible, as balance weights can be applied to search for a better combination of the loss functions at the prediction layers.
• Transfer learning is promising. Transferring InceptionV3 weights trained on ImageNet significantly improves the performance of our new model.

This research can be extended in several directions with the potential to benefit the agricultural industry. First, building on the promising results of GSMo-CNNs in this paper, we can expand the idea of a hierarchical combination of labels for further improvement. One idea is to develop an algorithm that determines optimal balance weights automatically. This would be particularly valuable when many publicly available datasets are involved, each covering different plant species and diseases. Additionally, we can explore deep supervision to smooth the gradients during training at different prediction levels: at each output layer, we compute a prediction loss and add it to the final loss function.

Second, we found that each dataset has a different set of plant species and disease types. While we can build a model on each dataset for a narrow task, such a model cannot be deployed to another task with new plants and diseases. We therefore intend to investigate lifelong learning further. A foundation model for plant pathology could be developed so that it continually acquires and adapts knowledge over time. In particular, the model could learn from newly available datasets without re-training on previously learned datasets. At the same time, knowledge could be distilled from the model for different downstream tasks.

We believe that these efforts will advance our research and lead to new results that benefit precision agriculture and smart agriculture.

References

Agarwal et al. (2019) Mohit Agarwal, Suneet Kr. Gupta, and K.K. Biswas. 2019. Grape Disease Identification Using Convolution Neural Network. In 2019 23rd International Computer Science and Engineering Conference (ICSEC). 224–229. https://doi.org/10.1109/ICSEC47112.2019.8974752
Agarwal et al. (2020) Mohit Agarwal, Abhishek Singh, Siddhartha Arjaria, Amit Sinha, and Suneet Gupta. 2020. ToLeD: Tomato Leaf Disease Detection using Convolution Neural Network. Procedia Computer Science 167 (2020), 293 – 301. https://login.ezproxy.utas.edu.au/login?url=https://search.ebscohost.com/login.aspx?direct=true&db=edselp&AN=S1877050920306906&site=eds-live
ANANDHAKRISHNAN and JAISAKTHI (2020) T ANANDHAKRISHNAN and SM JAISAKTHI. 2020. IDENTIFICATION OF TOMATO LEAF DISEASE DETECTION USING PRETRAINED DEEP CONVOLUTIONAL NEURAL NETWORK MODELS. Scalable Computing: Practice and Experience 21, 4 (2020), 625 – 635. https://login.ezproxy.utas.edu.au/login?url=https://search.ebscohost.com/login.aspx?direct=true&db=aps&AN=147704496&site=eds-live
Ariyapadath (2021) Sujith Ariyapadath. 2021. Plant Leaf Classification and Comparative Analysis of Combined Feature Set Using Machine Learning Techniques. Traitement du Signal 38, 6 (01 Dec 2021), 1587–1598.
Ashok et al. (2020) Surampalli Ashok, Gemini Kishore, Velpula Rajesh, S. Suchitra, S.G. Gino Sophia, and B. Pavithra. 2020. Tomato Leaf Disease Detection Using Deep Learning Techniques. In 2020 5th International Conference on Communication and Electronics Systems (ICCES). 979–983. https://doi.org/10.1109/ICCES48766.2020.9137986
Azlah et al. (2019) Muhammad Azfar Firdaus Azlah, Lee Suan Chua, Fakhrul Razan Rahmad, Farah Izana Abdullah, and Sharifah Rafidah Wan Alwi. 2019. Review on Techniques for Plant Leaf Classification and Recognition. Computers 8, 4 (2019). https://doi.org/10.3390/computers8040077
Barburiceanu et al. (2020) Stefania Barburiceanu, Romulus Terebes, and Serban Meza. 2020. Grape Leaf Disease Classification using LBP-derived Texture Operators and Colour. In 2020 IEEE International Conference on Automation, Quality and Testing, Robotics (AQTR). 1–6. https://doi.org/10.1109/AQTR49680.2020.9130019
Bharate and Shirdhonkar (2020) Anil A. Bharate and M. S. Shirdhonkar. 2020. Classification of Grape Leaves using KNN and SVM Classifiers. In 2020 Fourth International Conference on Computing Methodologies and Communication (ICCMC). 745–749. https://doi.org/10.1109/ICCMC48092.2020.ICCMC-000139
Bhowmik et al. (2020) Shyamtanu Bhowmik, Anjan Kumar Talukdar, and Kandarpa Kumar Sarma. 2020. Detection of Disease in Tea Leaves Using Convolution Neural Network. In 2020 Advanced Communication Technologies and Signal Processing (ACTS). 1–6. https://doi.org/10.1109/ACTS49415.2020.9350413
Bir et al. (2020) Paarth Bir, Rajesh Kumar, and Ghanshyam Singh. 2020. Transfer Learning based Tomato Leaf Disease Detection for mobile applications. In 2020 IEEE International Conference on Computing, Power and Communication Technologies (GUCON). 34–39. https://doi.org/10.1109/GUCON48875.2020.9231174
Chaudhari and Patil (2020) Vandana Chaudhari and Manoj Patil. 2020. Banana leaf disease detection using K-means clustering and Feature extraction techniques. In 2020 International Conference on Advances in Computing, Communication & Materials (ICACCM). 126–130. https://doi.org/10.1109/ICACCM50413.2020.9212816
Chen et al. (2020) Wen-Liang Chen, Yi-Bing Lin, Fung-Ling Ng, Chun-You Liu, and Yun-Wei Lin. 2020. RiceTalk: Rice Blast Detection Using Internet of Things and Artificial Intelligence Technologies. IEEE Internet of Things Journal 7, 2 (2020), 1001–1010. https://doi.org/10.1109/JIOT.2019.2947624
Chicco and Jurman (2020) Davide Chicco and Giuseppe Jurman. 2020. The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genomics 21, 1 (2020), 1–13.
Chouhan et al. (2019) Siddharth Singh Chouhan, Ajay Kaul, Uday Pratap Singh, and Sanjeev Jain. 2019. A database of leaf images: Practice towards plant conservation with Plant Pathology. https://doi.org/10.17632/hb74ynkjcn.1
Chouhan et al. (2020) Siddharth Singh Chouhan, Uday Pratap Singh, and Sanjeev Jain. 2020. Applications of Computer Vision in Plant Pathology: A Survey. Archives of Computational Methods in Engineering: State of the Art Reviews 27, 2 (2020), 611.
Chouhan et al. (2021) Siddharth Singh Chouhan, Uday Pratap Singh, and Sanjeev Jain. 2021. Automated Plant Leaf Disease Detection and Classification Using Fuzzy Based Function Network. Wireless Personal Communications: An International Journal (2021), 1.
Chowdhury et al. (2021) Muhammad E. H. Chowdhury, Tawsifur Rahman, Amith Khandakar, Mohamed Arselene Ayari, Aftab Ullah Khan, Muhammad Salman Khan, Nasser Al-Emadi, Mamun Bin Ibne Reaz, Mohammad Tariqul Islam, and Sawal Hamid Md Ali. 2021. Automatic and Reliable Leaf Disease Detection Using Deep Learning Techniques. AgriEngineering 3, 2 (2021), 294–312.
Dang-Ngoc et al. (2021) Hanh Dang-Ngoc, Trang N. M. Cao, and Chau Dang-Nguyen. 2021. Citrus Leaf Disease Detection and Classification Using Hierarchical Support Vector Machine. In 2021 International Symposium on Electrical and Electronics Engineering (ISEE). 69–74. https://doi.org/10.1109/ISEE51682.2021.9418680
Das et al. (2020) Debasish Das, Mahinderpal Singh, Sarthak Swaroop Mohanty, and S. Chakravarty. 2020. Leaf Disease Detection using Support Vector Machine. In 2020 International Conference on Communication and Signal Processing (ICCSP). 1036–1040. https://doi.org/10.1109/ICCSP48568.2020.9182128
dos Santos Ferreira et al. (2017) Alessandro dos Santos Ferreira, Daniel Matte Freitas, Gercina Gonçalves da Silva, Hemerson Pistori, and Marcelo Theophilo Folhes. 2017. Weed detection in soybean crops using ConvNets. Computers and Electronics in Agriculture 143 (2017), 314–324. https://doi.org/10.1016/j.compag.2017.10.027
Dosovitskiy et al. (2021) Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. 2021. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. In International Conference on Learning Representations. https://openreview.net/forum?id=YicbFdNTTy
Ekanayake and Nawarathna (2021) EMTYK Ekanayake and RD Nawarathna. 2021. Novel deep learning approaches for crop leaf disease classification: A review. In 2021 International Research Conference on Smart Computing and Systems Engineering (SCSE), Vol. 4. 49–52. https://doi.org/10.1109/SCSE53661.2021.9568324
Fu et al. (2020) Yan Fu, Xutao Li, and Yunming Ye. 2020. A multi-task learning model with adversarial data augmentation for classification of fine-grained images. Neurocomputing 377 (2020), 122–129. https://doi.org/10.1016/j.neucom.2019.10.002
G. and J. (2019) Geetharamani G. and Arun Pandian J. 2019. Identification of plant leaf diseases using a nine-layer deep convolutional neural network. Computers and Electrical Engineering 76 (2019), 323–338.
Gadade and Kirange (2020) Haridas D. Gadade and D. K. Kirange. 2020. Tomato Leaf Disease Diagnosis and Severity Measurement. In 2020 Fourth World Conference on Smart Trends in Systems, Security and Sustainability (WorldS4). 318–323. https://doi.org/10.1109/WorldS450073.2020.9210294
Gajjar et al. (2021) Ruchi Gajjar, Nagendra Gajjar, Vaibhavkumar Jigneshkumar Thakor, Nikhilkumar Pareshbhai Patel, and Stavan Ruparelia. 2021. Real-time detection and identification of plant leaf diseases using convolutional neural networks on an embedded platform. The Visual Computer: International Journal of Computer Graphics (2021), 1.
Garcia Arnal Barbedo et al. (2018) Jayme Garcia Arnal Barbedo, Luciano Vieira Koenigkan, Bernardo Almeida Halfeld-Vieira, Rodrigo Veras Costa, Katia Lima Nechet, Claudia Vieira Godoy, Murillo Lobo Junior, Flavia Rodrigues Alves Patricio, Viviane Talamini, Luiz Gonzaga Chitarra, Saulo Alves Santos Oliveira, Alessandra Keiko Nakasone Ishida, Jose Mauricio Cunha Fernandes, Thiago Teixeira Santos, Fabio Rossi Cavalcanti, Daniel Terao, and Francislene Angelotti. 2018. Annotated Plant Pathology Databases for Image-Based Detection and Recognition of Diseases. IEEE Latin America Transactions 16, 6 (2018), 1749–1757. https://doi.org/10.1109/TLA.2018.8444395
Guan (2021) Xulang Guan. 2021. A Novel Method of Plant Leaf Disease Detection Based on Deep Learning and Convolutional Neural Network. In 2021 6th International Conference on Intelligent Computing and Signal Processing (ICSP). 816–819. https://doi.org/10.1109/ICSP51882.2021.9408806
Hassan et al. (2021) Sk Mahmudul Hassan, Arnab Kumar Maji, Michał Jasiński, Zbigniew Leonowicz, and Elżbieta Jasińska. 2021. Identification of Plant-Leaf Diseases Using CNN and Transfer-Learning Approach. Electronics 10, 12 (2021). https://doi.org/10.3390/electronics10121388
Hassanin et al. (2021) Mohammed Hassanin, Ibrahim Radwan, Salman Khan, and Murat Tahtali. 2021. Learning Discriminative Representations for Multi-Label Image Recognition. arXiv:2107.11159 [cs.CV]
He et al. (2016) Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep Residual Learning for Image Recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 770–778. https://doi.org/10.1109/CVPR.2016.90
Hu et al. (2021) Gensheng Hu, Huaiyu Wang, Yan Zhang, and Mingzhu Wan. 2021. Detection and severity analysis of tea leaf blight based on deep learning. Computers and Electrical Engineering 90 (2021).
Huang et al. (2020) Zhaohua Huang, Ally Qin, Jingshu Lu, Aparna Menon, and Jerry Gao. 2020. Grape Leaf Disease Detection and Classification Using Machine Learning. In 2020 International Conferences on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData) and IEEE Congress on Cybermatics (Cybermatics). 870–877. https://doi.org/10.1109/iThings-GreenCom-CPSCom-SmartData-Cybermatics50389.2020.00150
Hughes and Salathé (2015) David P. Hughes and Marcel Salathé. 2015. An open access repository of images on plant health to enable the development of mobile disease diagnostics through machine learning and crowdsourcing. CoRR abs/1511.08060 (2015). arXiv:1511.08060 http://arxiv.org/abs/1511.08060
Hughes and Salathé (2016) David P. Hughes and Marcel Salathé. 2016. An open access repository of images on plant health to enable the development of mobile disease diagnostics. arXiv:1511.08060 [cs.CY]
Jana et al. (2021) S. Jana, A. Rijuvana Begum, S. Selvaganesan, and P. Suresh. 2021. Deep Belief Network Based Disease Detection in Pepper Leaf for Farming Sector. Turkish Journal of Physiotherapy Rehabilitation 32, 2 (2021), 994–1002.
Jasitha et al. (2019) P Jasitha, M R Dileep, and M Divya. 2019. Venation Based Plant Leaves Classification Using GoogLeNet and VGG. In 2019 4th International Conference on Recent Trends on Electronics, Information, Communication and Technology (RTEICT). 715–719. https://doi.org/10.1109/RTEICT46194.2019.9016966
Jayaprakash and Balamurugan (2021) K. Jayaprakash and S. P. Balamurugan. 2021. An Efficient Hand-Crafted Features with Machine Learning-Based Plant Leaf Disease Diagnosis and Classification Model. Turkish Journal of Physiotherapy Rehabilitation 32, 2 (2021), 1683–1698.
Kanda et al. (2021) Paul Shekonya Kanda, Kewen Xia, and Olanrewaju Hazzan Sanusi. 2021. A Deep Learning-Based Recognition Technique for Plant Leaf Classification. IEEE Access 9 (2021), 162590–162613. https://doi.org/10.1109/ACCESS.2021.3131726
Kathiresan et al. (2021) Gugan Kathiresan, M Anirudh, M Nagharjun, and R Karthik. 2021. Disease detection in rice leaves using transfer learning techniques. Journal of Physics: Conference Series 1911, 1 (May 2021), 012004. https://doi.org/10.1088/1742-6596/1911/1/012004
Kawatra et al. (2020) Mihir Kawatra, Shreyas Agarwal, and Raghu Kapur. 2020. Leaf Disease Detection using Neural Network Hybrid Models. In 2020 IEEE 5th International Conference on Computing Communication and Automation (ICCCA). 225–230. https://doi.org/10.1109/ICCCA49541.2020.9250885
Kirti and Rajpal (2020) Kirti and Navin Rajpal. 2020. Black Rot Disease Detection in Grape Plant (Vitis vinifera) Using Colour Based Segmentation & Machine Learning. In 2020 2nd International Conference on Advances in Computing, Communication Control and Networking (ICACCCN). 976–979. https://doi.org/10.1109/ICACCCN51052.2020.9362812
Krishnamoorthy and Parameswari (2021) N. Krishnamoorthy and V. R. Loga Parameswari. 2021. Rice Leaf Disease Detection Via Deep Neural Networks With Transfer Learning For Early Identification. Turkish Journal of Physiotherapy Rehabilitation 32, 2 (2021), 1087–1097.
Kumar et al. ([n. d.]) Neeraj Kumar, Peter N. Belhumeur, Arijit Biswas, David W. Jacobs, W. John Kress, Ida C. Lopez, and João V. B. Soares. [n. d.]. Leafsnap: A Computer Vision System for Automatic Plant Species Identification. In ECCV 2012.
Kumar et al. (2020b) Sandeep Kumar, KMVV Prasad, A. Srilekha, T. Suman, B. Pranav Rao, and J. Naga Vamshi Krishna. 2020b. Leaf Disease Detection and Classification based on Machine Learning. In 2020 International Conference on Smart Technologies in Computing, Electrical and Electronics (ICSTCEE). 361–365. https://doi.org/10.1109/ICSTCEE49637.2020.9277379
Kumar et al. (2020a) Vinod Kumar, Hritik Arora, Harsh, and Jatin Sisodia. 2020a. ResNet-based approach for Detection and Classification of Plant Leaf Diseases. In 2020 International Conference on Electronics and Sustainable Communication Systems (ICESC). 495–502. https://doi.org/10.1109/ICESC48915.2020.9155585
Lakshmi and Nickolas (2020) R. Kavitha Lakshmi and S. Nickolas. 2020. Deep Learning based Betelvine leaf Disease Detection (Piper Betle L.). In 2020 IEEE 5th International Conference on Computing Communication and Automation (ICCCA). 215–219. https://doi.org/10.1109/ICCCA49541.2020.9250911
LeCun et al. (1998) Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. 1998. Gradient-based learning applied to document recognition. Proc. IEEE 86, 11 (1998), 2278–2324. https://doi.org/10.1109/5.726791
Lee et al. (2015) Sue Han Lee, Chee Seng Chan, Paul Wilkin, and Paolo Remagnino. 2015. Deep-plant: Plant identification with convolutional neural networks. In 2015 IEEE International Conference on Image Processing (ICIP). 452–456. https://doi.org/10.1109/ICIP.2015.7350839
Lee et al. (2021) Sue Han Lee, Hervé Goëau, Pierre Bonnet, and Alexis Joly. 2021. Conditional Multi-Task learning for Plant Disease Identification. In 2020 25th International Conference on Pattern Recognition (ICPR). 3320–3327. https://doi.org/10.1109/ICPR48806.2021.9412643
Li et al. (2021b) Lili Li, Shujuan Zhang, and Bin Wang. 2021b. Plant Disease Detection and Classification by Deep Learning—A Review. IEEE Access 9 (2021), 56683–56698. https://doi.org/10.1109/ACCESS.2021.3069646
Li et al. (2021a) Qinbin Li, Bingsheng He, and Dawn Song. 2021a. Model-Contrastive Federated Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 10713–10722.
Liu et al. (2019) Shikun Liu, Edward Johns, and Andrew J. Davison. 2019. End-To-End Multi-Task Learning With Attention. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
Madjarov et al. (2012) Gjorgji Madjarov, Dragi Kocev, Dejan Gjorgjevikj, and Sašo Džeroski. 2012. An extensive experimental comparison of methods for multi-label learning. Pattern Recognition 45, 9 (2012), 3084–3104. https://doi.org/10.1016/j.patcog.2012.03.004 Best Papers of Iberian Conference on Pattern Recognition and Image Analysis (IbPRIA’2011).
Metre and Sawarkar (2021) Vishakha A. Metre and Sudhir D. Sawarkar. 2021. Research Opportunities for the Detection and Classification of Plant Leaf Diseases. In 2021 2nd Global Conference for Advancement in Technology (GCAT). 1–8. https://doi.org/10.1109/GCAT52182.2021.9587775
Metre and Sawarkar (2022) Vishakha A. Metre and Sudhir D. Sawarkar. 2022. Reviewing Important Aspects of Plant Leaf Disease Detection and Classification. In 2022 International Conference for Advancement in Technology (ICONAT). 1–8. https://doi.org/10.1109/ICONAT53423.2022.9725870
Misra et al. (2016) Ishan Misra, Abhinav Shrivastava, Abhinav Gupta, and Martial Hebert. 2016. Cross-Stitch Networks for Multi-Task Learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Mukhopadhyay et al. (2021) Somnath Mukhopadhyay, Munti Paul, Ramen Pal, and Debashis De. 2021. Tea leaf disease detection using multi-objective image segmentation. Multimedia Tools and Applications: An International Journal 80, 1 (2021), 753.
Munisami et al. (2015) Trishen Munisami, Mahess Ramsurn, Somveer Kishnah, and Sameerchand Pudaruth. 2015. Plant Leaf Recognition Using Shape Features and Colour Histogram with K-nearest Neighbour Classifiers. Procedia Computer Science 58 (2015), 740–747. https://doi.org/10.1016/j.procs.2015.08.095 Second International Symposium on Computer Vision and the Internet (VisionNet’15).
Mureşan et al. (2020) H. B. Mureşan, A. M. Coroiu, and A. D. Călin. 2020. Detecting Leaf Plant Diseases Using Deep Learning: A Review. In 2020 International Conference on Software, Telecommunications and Computer Networks (SoftCOM). 1–6. https://doi.org/10.23919/SoftCOM50211.2020.9238318
Mwebaze et al. (2019) Ernest Mwebaze, Timnit Gebru, Andrea Frome, Solomon Nsumba, and Jeremy Tusubira. 2019. iCassava 2019 Fine-Grained Visual Categorization Challenge. arXiv:1908.02900 [cs.CV]
Novotný and Suk (2013) Petr Novotný and Tomáš Suk. 2013. Leaf recognition of woody species in Central Europe. Biosystems Engineering 115, 4 (2013), 444–452. https://doi.org/10.1016/j.biosystemseng.2013.04.007
Othman et al. (2022) Nor Azlan Othman, Nor Salwa Damanhuri, Nabilah Md Ali, Belinda Chong Chiew Meng, and Ahmad Asri Abd Samat. 2022. Plant Leaf Classification Using Convolutional Neural Network. In 2022 8th International Conference on Control, Decision and Information Technologies (CoDIT), Vol. 1. 1043–1048. https://doi.org/10.1109/CoDIT55151.2022.9804121
Padol and Yadav (2016) Pranjali B. Padol and Anjali A. Yadav. 2016. SVM classifier based grape leaf disease detection. In 2016 Conference on Advances in Signal Processing (CASP). 175–179. https://doi.org/10.1109/CASP.2016.7746160
Pandey et al. (2023) Bihari Nandan Pandey, Raghvendra Pratap Singh, Mahima Shanker Pandey, and Sachin Jain. 2023. Cotton Leaf Disease Classification Using Deep Learning based Novel Approach. In 2023 International Conference on Disruptive Technologies (ICDT). 559–561. https://doi.org/10.1109/ICDT57929.2023.10150884
Paymode et al. (2021) Ananda S. Paymode, Shyamsundar P. Magar, and Vandana B. Malode. 2021. Tomato Leaf Disease Detection and Classification using Convolution Neural Network. In 2021 International Conference on Emerging Smart Computing and Informatics (ESCI). 564–570. https://doi.org/10.1109/ESCI50559.2021.9397001
Ponnusamy et al. (2020) Vijayakumar Ponnusamy, Amrith Coumaran, Akhash Subramanian Shunmugam, Kritin Rajaram, and Sanoj Senthilvelavan. 2020. Smart Glass: Real-Time Leaf Disease Detection using YOLO Transfer Learning. In 2020 International Conference on Communication and Signal Processing (ICCSP). 1150–1154. https://doi.org/10.1109/ICCSP48568.2020.9182146
Qin et al. (2021) Xiao Qin, Yu Shi, Xiao Huang, Huiting Li, Jiangtao Huang, Changan Yuan, and Chunxia Liu. 2021. Attention-Based Deep Multi-scale Network for Plant Leaf Recognition. In Intelligent Computing Theories and Application, De-Shuang Huang, Kang-Hyun Jo, Jianqiang Li, Valeriya Gribova, and Vitoantonio Bevilacqua (Eds.). Springer International Publishing, Cham, 302–313.
Raina and Gupta (2021) Sakshi Raina and Abhishek Gupta. 2021. A Study on Various Techniques for Plant Leaf Disease Detection Using Leaf Image. In 2021 International Conference on Artificial Intelligence and Smart Systems (ICAIS). 900–905. https://doi.org/10.1109/ICAIS50930.2021.9396023
Rajesh et al. (2020) B. Rajesh, M. Vishnu Sai Vardhan, and L. Sujihelen. 2020. Leaf Disease Detection and Classification by Decision Tree. In 2020 4th International Conference on Trends in Electronics and Informatics (ICOEI). 705–708. https://doi.org/10.1109/ICOEI48184.2020.9142988
Rauf et al. (2019) Hafiz Tayyab Rauf, Basharat Ali Saleem, M. Ikram Ullah Lali, Muhammad Attique Khan, Muhammad Sharif, and Syed Ahmad Chan Bukhari. 2019. A citrus fruits and leaves dataset for detection and classification of citrus diseases through machine learning. Data in Brief 26 (2019), 104340. https://doi.org/10.1016/j.dib.2019.104340
Read et al. (2009) Jesse Read, Bernhard Pfahringer, Geoff Holmes, and Eibe Frank. 2009. Classifier Chains for Multi-label Classification. In Machine Learning and Knowledge Discovery in Databases, Wray Buntine, Marko Grobelnik, Dunja Mladenić, and John Shawe-Taylor (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 254–269.
Ronneberger et al. (2015) Olaf Ronneberger, Philipp Fischer, and Thomas Brox. 2015. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, Nassir Navab, Joachim Hornegger, William M. Wells, and Alejandro F. Frangi (Eds.). Springer International Publishing, Cham, 234–241.
Sai (2021) Dilli Charan Sai. 2021. A Review on Crop Disease Detection using Deep Learning. Journal of Network and Information Security 9, 1 (2021), 18–22.
Sandler et al. (2018) Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. 2018. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Saraswathi et al. (2021) R Vijaya Saraswathi, Vaishali Bitla, P Radhika, and T Navneeth Kumar. 2021. Leaf Disease Detection and Remedy Suggestion Using Convolutional Neural Networks. In 2021 5th International Conference on Computing Methodologies and Communication (ICCMC). 788–794. https://doi.org/10.1109/ICCMC51019.2021.9418013
Sardogan et al. (2018) Melike Sardogan, Adem Tuncer, and Yunus Ozen. 2018. Plant Leaf Disease Detection and Classification Based on CNN with LVQ Algorithm. In 2018 3rd International Conference on Computer Science and Engineering (UBMK). 382–385. https://doi.org/10.1109/UBMK.2018.8566635
Shahidur Harun Rumy et al. (2021) S. M. Shahidur Harun Rumy, Md. Ishan Arefin Hossain, Forji Jahan, and Tanjina Tanvin. 2021. An IoT based System with Edge Intelligence for Rice Leaf Disease Detection using Machine Learning. In 2021 IEEE International IOT, Electronics and Mechatronics Conference (IEMTRONICS). 1–6. https://doi.org/10.1109/IEMTRONICS52119.2021.9422499
Sharma et al. (2020) Pushkara Sharma, Pankaj Hans, and Subhash Chand Gupta. 2020. Classification Of Plant Leaf Diseases Using Machine Learning And Image Preprocessing Techniques. In 2020 10th International Conference on Cloud Computing, Data Science & Engineering (Confluence). 480–484. https://doi.org/10.1109/Confluence47617.2020.9057889
Shelke and Mehendale (2022) Ankita Shelke and Ninad Mehendale. 2022. A CNN-based android application for plant leaf classification at remote locations. Neural Computing and Applications (02 Sep 2022). https://doi.org/10.1007/s00521-022-07740-1
Shinohara (2016) Yusuke Shinohara. 2016. Adversarial Multi-Task Learning of Deep Neural Networks for Robust Speech Recognition. In Proc. Interspeech 2016. 2369–2372. https://doi.org/10.21437/Interspeech.2016-879
Singh et al. (2020a) Davinder Singh, Naman Jain, Pranjali Jain, Pratik Kayal, Sudhakar Kumawat, and Nipun Batra. 2020a. PlantDoc: A Dataset for Visual Plant Disease Detection. Proceedings of the 7th ACM IKDD CoDS and 25th COMAD (Jan 2020). https://doi.org/10.1145/3371158.3371196
Singh et al. (2020b) Ujjwal Singh, Anuj Srivastava, Divyanshu Chauhan, and Astha Singh. 2020b. Computer Vision Technique for Detection of Grape Esca (Black Measles) Disease from Grape Leaf Samples. In 2020 International Conference on Contemporary Computing and Applications (IC3A). 110–115. https://doi.org/10.1109/IC3A48958.2020.233281
Söderkvist (2001) Oskar Söderkvist. 2001. Computer Vision Classification of Leaves from Swedish Trees. 74 pages.
Sujatha et al. (2021) R. Sujatha, Jyotir Moy Chatterjee, NZ Jhanjhi, and Sarfraz Nawaz Brohi. 2021. Performance of deep learning vs machine learning in plant leaf disease detection. Microprocessors and Microsystems 80 (2021), 103615. https://doi.org/10.1016/j.micpro.2020.103615
Sunil et al. (2020) C K Sunil, C D Jaidhar, and Nagamma Patil. 2020. Empirical Study on Multi Convolutional Layer-based Convolutional Neural Network Classifier for Plant Leaf Disease Detection. In 2020 IEEE 15th International Conference on Industrial and Information Systems (ICIIS). 460–465. https://doi.org/10.1109/ICIIS51140.2020.9342729
Surya and Gautama (2020) Rafi Surya and Elliana Gautama. 2020. Cassava Leaf Disease Detection Using Convolutional Neural Networks. In 2020 6th International Conference on Science in Information Technology (ICSITech). 97–102. https://doi.org/10.1109/ICSITech49800.2020.9392051
Szegedy et al. (2015) Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. 2015. Going deeper with convolutions. In 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 1–9. https://doi.org/10.1109/CVPR.2015.7298594
Tan and Chang (2018) Jing Wei Tan and Siow-Wee Chang. 2018. D-Leaf Dataset. (Jan 2018). https://doi.org/10.6084/m9.figshare.5732955.v1
Thangaraj et al. (2023) Rajasekaran Thangaraj, P. Pandiyan, S. Anandamurugan, and Sivaramakrishnan Rajendar. 2023. A deep convolution neural network model based on feature concatenation approach for classification of tomato leaf disease. Multimedia Tools and Applications (2023). https://doi.org/10.1007/s11042-023-16347-0
Thet et al. (2020) Khaing Zin Thet, Khine Khine Htwe, and Myint Myint Thein. 2020. Grape Leaf Diseases Classification using Convolutional Neural Network. In 2020 International Conference on Advanced Information Technologies (ICAIT). 147–152. https://doi.org/10.1109/ICAIT51105.2020.9261801
Tulshan and Raul (2019) Amrita S. Tulshan and Nataasha Raul. 2019. Plant Leaf Disease Detection using Machine Learning. In 2019 10th International Conference on Computing, Communication and Networking Technologies (ICCCNT). 1–6. https://doi.org/10.1109/ICCCNT45670.2019.8944556
Vandenhende et al. (2021a) Simon Vandenhende, Stamatios Georgoulis, Wouter Van Gansbeke, Marc Proesmans, Dengxin Dai, and Luc Van Gool. 2021a. Multi-Task Learning for Dense Prediction Tasks: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence (2021), 1–1. https://doi.org/10.1109/TPAMI.2021.3054719
Vandenhende et al. (2021b) Simon Vandenhende, Stamatios Georgoulis, Wouter Van Gansbeke, Marc Proesmans, Dengxin Dai, and Luc Van Gool. 2021b. Multi-Task Learning for Dense Prediction Tasks: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence (2021), 1–1. https://doi.org/10.1109/TPAMI.2021.3054719
Vaswani et al. (2017) Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is All you Need. In Advances in Neural Information Processing Systems, I. Guyon, U. Von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.), Vol. 30. Curran Associates, Inc. https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf
Vijaykanth Reddy and Sashi Rekha (2021) T. Vijaykanth Reddy and K Sashi Rekha. 2021. Deep Leaf Disease Prediction Framework (DLDPF) with Transfer Learning for Automatic Leaf Disease Detection. In 2021 5th International Conference on Computing Methodologies and Communication (ICCMC). 1408–1415. https://doi.org/10.1109/ICCMC51019.2021.9418245
Vizcarra et al. (2021) Gerson Vizcarra, Danitza Bermejo, Antoni Mauricio, Ricardo Zarate Gomez, and Erwin Dianderas. 2021. The Peruvian Amazon forestry dataset: A leaf image classification corpus. Ecological Informatics 62 (2021), 101268. https://doi.org/10.1016/j.ecoinf.2021.101268
Wadhawan et al. (2020) Radhika Wadhawan, Mayyank Garg, and Ashish Kumar Sahani. 2020. Rice Plant Leaf Disease Detection and Severity Estimation. In 2020 IEEE 15th International Conference on Industrial and Information Systems (ICIIS). 455–459. https://doi.org/10.1109/ICIIS51140.2020.9342653
Wang et al. (2016) Jiang Wang, Yi Yang, Junhua Mao, Zhiheng Huang, Chang Huang, and Wei Xu. 2016. CNN-RNN: A Unified Framework for Multi-Label Image Classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Wu et al. (2007) Stephen Gang Wu, Forrest Sheng Bao, Eric You Xu, Yu-Xuan Wang, Yi-Fan Chang, and Qiao-Liang Xiang. 2007. A Leaf Recognition Algorithm for Plant Classification Using Probabilistic Neural Network. In 2007 IEEE International Symposium on Signal Processing and Information Technology. 11–16. https://doi.org/10.1109/ISSPIT.2007.4458016
Xie et al. (2020) Xiaoyue Xie, Yuan Ma, Bin Liu, Jinrong He, Shuqin Li, and Hongyan Wang. 2020. A Deep-Learning-Based Real-Time Detector for Grape Leaf Diseases Using Improved Convolutional Neural Networks. Frontiers in Plant Science 11 (2020), 751. https://doi.org/10.3389/fpls.2020.00751
Yousef Methkal Abd et al. (2023) Algani Yousef Methkal Abd, Caro Orlando Juan Marquez, Bravo Liz Maribel Robladillo, Kaur Chamandeep, Ansari Mohammed Saleh Al, and B. Kiran Bala. 2023. Leaf disease identification and classification using optimized deep learning. Measurement: Sensors 25 (2023). https://doi.org/10.1016/j.measen.2022.100643
Yuan et al. (2012) Xiao-Tong Yuan, Xiaobai Liu, and Shuicheng Yan. 2012. Visual Classification With Multitask Joint Sparse Representation. IEEE Transactions on Image Processing 21, 10 (2012), 4349–4360. https://doi.org/10.1109/TIP.2012.2205006
Zhang et al. (2013) Tianzhu Zhang, Bernard Ghanem, Si Liu, and Narendra Ahuja. 2013. Robust Visual Tracking via Structured Multi-Task Sparse Learning. International Journal of Computer Vision 101, 2 (2013), 367–383.
Zhang et al. (2018) Xihai Zhang, Yue Qiao, Fanfeng Meng, Chengguo Fan, and Mingming Zhang. 2018. Identification of Maize Leaf Diseases Using Improved Deep Convolutional Neural Networks. IEEE Access 6 (2018), 30370–30377. https://doi.org/10.1109/ACCESS.2018.2844405
Zhang and Yang (2021) Yu Zhang and Qiang Yang. 2021. A Survey on Multi-Task Learning. IEEE Transactions on Knowledge and Data Engineering (2021), 1–1. https://doi.org/10.1109/TKDE.2021.3070203
Zhang et al. (2014) Zhanpeng Zhang, Ping Luo, Chen Change Loy, and Xiaoou Tang. 2014. Facial Landmark Detection by Deep Multi-task Learning. In Computer Vision – ECCV 2014, David Fleet, Tomas Pajdla, Bernt Schiele, and Tinne Tuytelaars (Eds.). Springer International Publishing, Cham, 94–108.