1 Email: {fuadahmed2, raihan.kamal}@iut-dhaka.edu
2 Department of Computer Science & Engineering, International University of Business Agriculture and Technology (IUBAT), Bangladesh
  Email: [email protected]
3 Department of Genetics, Genomics, and Informatics, The University of Tennessee Health Science Center (UTHSC), United States
  Email: [email protected]
AutoCl: A Visual Interactive System for Automatic Deep Learning Classifier Recommendation Based on Model Performance
Abstract
As deep learning (DL) models are increasingly applied across various fields, people without technical expertise or domain knowledge struggle to find an appropriate model for their task. In this paper, we introduce AutoCl, a visual interactive recommender system aimed at helping non-experts adopt an appropriate DL classifier. Our system enables users to compare the performance and behavior of multiple classifiers trained with various hyperparameter setups, and automatically recommends the best classifier with appropriate hyperparameters. We compare the features of AutoCl against several recent AutoML systems and show that it better helps non-experts choose a DL classifier. Finally, we demonstrate use cases for image classification on a publicly available dataset to show the capability of our system.
Keywords:
Deep Learning · Visualization · Visual Interactive System · Data Analytics · Image Classifiers · Recommendation System

1 Introduction
In recent years, we have seen an increase in the use of deep learning (DL) models to support data-driven decision making in various fields, such as speech recognition, computer vision, natural language processing, drug discovery, and the biomedical sciences. However, despite the effectiveness of DL models, we still have a limited theoretical understanding of them. Additionally, developing a DL model requires highly technical expertise and domain knowledge. For example, to improve the performance of a DL model, careful selection of the model, layers, epochs, optimizer, and batch size is critical. Classification is one of the most important machine learning problems, and a wide range of classification algorithms is available. However, in most real-world applications, selecting a classification method for a new dataset or application area remains a challenging task.
Researchers have proposed numerous automated machine learning (AutoML) methods to fill the gap in human knowledge by automating the DL model development process. These systems often have steep learning curves due to manual input and a lack of performance visualization features [8]. As a result, they are often inaccessible to the growing number of practitioners who lack the time or background to learn sophisticated tools [4]. Unlike common AutoML methods, AutoCl is not a black-box optimization process: it compares and visualizes the performance of multiple classifiers at the class level and recommends the best-performing classifier with appropriate hyperparameters.
Most existing AutoML methods recommend a single model as output, whereas AutoCl not only recommends the best model but also reports multiple candidate models and gives users hyperparameter tuning capability, making it easier to compare the performance of multiple models at the class level. Existing studies show that choosing a suitable model for a given task is challenging without strong DL expertise and domain knowledge.

The main objective of this paper is to help non-experts, such as researchers and practitioners, choose the most appropriate deep learning classifier for their requirements. We present AutoCl, an interactive deep learning classifier recommendation system, shown in Figure 1, which visualizes the performance ranking of multiple DL models and recommends the best possible model with the best hyperparameter settings. The contributions of this work include:
• We demonstrate a suite of visualization tools that illustrate the performance of multiple classification algorithms.
• We show how hyperparameters affect classifier performance at the class level.
• Finally, the system recommends a top-performing classifier with appropriate hyperparameters.
The rest of this paper is organized as follows. Section 2 explores related work, and Section 3 describes the system design and development for exploring the performance of deep learning classifiers. Sections 4 to 6 present the system description, a usage scenario, and the evaluation. Finally, Section 7 concludes.
2 Related Work
Building a deep learning classifier is challenging for non-experts; it is also a time-consuming and error-prone procedure. These challenges have motivated many researchers to propose various AutoML systems supporting automatic algorithm selection and hyperparameter tuning over the years. These systems follow statistical approaches, decision trees, neural networks, and other computational intelligence techniques.
Maher and Sakr proposed SmartML [7], a system for automatic classification algorithm selection and hyperparameter tuning using both meta-learning and Bayesian optimization. Their system is limited to 15 machine learning classifiers. Auto-sklearn [2] is another framework, based on scikit-learn, for automatic classification and regression tasks, but it only works with traditional ML algorithms (such as SVM and KNN).
Only a few deep learning frameworks have been proposed for AutoML tasks; Auto-Keras [5] is one of them. It is built on Keras, another popular deep learning framework, and focuses on discovering neural network architectures that compete with those designed by human experts. It provides an appropriate deep learning model based on the user's input data, and it also supports multi-modal data and multi-task learning. In addition, very few studies have examined and compared the performance of different AutoML systems [3], [1]. In general, these studies reveal that there is no obvious winner, since there are always trade-offs that must be addressed and optimized based on the context of the problem and the user's needs.
To the best of our knowledge, this is the first work that automatically and interactively recommends the best possible deep learning classifier, with appropriate hyperparameters, to non-expert users. It also enables users to tune different hyperparameter settings and compare multiple models' performance metrics at the class level.
3 System Design and Development
We design AutoCl, an interactive recommender system for non-expert users that automatically recommends a DL classifier with appropriate hyperparameter settings. Below we describe how AutoCl processes data and recommends the best classifier based on user input data. The system enables users to tune multiple hyperparameter settings to obtain optimal results. The architecture of AutoCl, shown in Figure 2, consists of two modules: (A) the background unit and (B) the interface unit.

A. Background Unit: The background unit of AutoCl works as a data processor. As the system evaluates multiple models and their metrics, it takes a dataset as input and, after preprocessing the data, generates evaluation metrics as output. First, the dataset goes through preprocessing and is split into training and testing subsets. It contains file indices and ground-truth labels, or class information. The dataset is used to train multiple deep learning classifiers, which are then evaluated on the testing subset. The background unit calculates high-level statistics about the performance of each model to construct the performance summary. For the automatic recommendation task, the Auto-Keras [5] library is run on multiple classifiers and generates a summary of classifiers with the best combinations of hyperparameters. It performs a neural architecture search (NAS) with network morphism, guided by Bayesian optimization, to select the best classifier settings.
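For concreteness, a minimal sketch of this search step with the Auto-Keras API is shown below. It assumes the preprocessed data are already available as NumPy arrays; the trial count and epoch budget are illustrative, not the values used by AutoCl.

```python
# A minimal sketch of the automatic search step, assuming the data are
# already split into NumPy arrays (x_train, y_train, x_test, y_test).
import autokeras as ak

# ImageClassifier searches over network architectures; max_trials
# bounds the number of candidate models evaluated during the search.
clf = ak.ImageClassifier(max_trials=10, overwrite=True)
clf.fit(x_train, y_train, epochs=20)

# Evaluate the best model found and export it as a plain Keras model.
loss, accuracy = clf.evaluate(x_test, y_test)
best_model = clf.export_model()
best_model.summary()
```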
Finally, the classifiers' performance metrics and recommendation data are saved in a JSON file, and the data is passed to the interface unit for visualization. The performance data include the model name; hyperparameter settings such as layer structure, epoch number, batch size, and optimizer; and metrics such as accuracy, precision, specificity, and sensitivity. They also include a list of top-ranked classifiers with appropriate hyperparameters for the recommendation task.
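The sketch below illustrates the shape of one such record. The field names, values, and output filename are illustrative assumptions, not the system's actual schema.

```python
# A hedged sketch of the JSON record handed from the background unit
# to the interface unit; field names and numbers are illustrative.
import json

result = {
    "model": "VGG-16",
    "hyperparameters": {"layers": 16, "epochs": 20,
                        "batch_size": 64, "optimizer": "adam"},
    "metrics": {"accuracy": 0.91, "precision": 0.90,
                "specificity": 0.93, "sensitivity": 0.89},
}

# "performance.json" is a hypothetical filename for this sketch.
with open("performance.json", "w") as f:
    json.dump([result], f, indent=2)
```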
B. Interface Unit: The AutoCl interface unit consists of three user interface (UI) modules and is responsible for visualizing all the results. As shown in Figure 1, the modules are as follows: (A) an option-selection module that enables users to choose and tune options such as hyperparameters and metrics; (B) a bump-chart performance ranking view showing the performance of multiple models at the class level; and (C) a recommendation view that automatically shows the top-ranked classifier with suggested hyperparameter settings. Users can adopt the best possible classifier suggested by the recommendation module. They can also tune and compare multiple classifiers effectively and gather actionable insights about which classifier to adopt using different combinations of hyperparameters.
4 Description of AutoCl and Discussion
4.1 Dashboard Description
In this section, we illustrate how AutoCl analyzes images across models to demonstrate their different performances, using several DL classifiers: CNN, VGG-16, AlexNet, DenseNet, and ResNet-50. The AutoCl dashboard, shown in Figure 1, consists of three main modules:
A. Input Selection: Users can filter hyperparameters and accuracy metrics using the AutoCl option controls. They can interactively explore and compare models based on performance ranking and ground-truth labels, as shown in Figure 1(A). Users can further adjust hyperparameters such as batch size, epochs, layers, and optimizer to see which parameter combination yields higher performance.
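The sketch below shows the kind of hyperparameter grid these controls expose. The value ranges are illustrative assumptions, and train_and_score is a hypothetical helper standing in for the background unit's training step.

```python
# A minimal sketch of sweeping the option-control hyperparameters;
# the value ranges here are illustrative assumptions.
from itertools import product

grid = {
    "batch_size": [32, 64, 128],
    "epochs": [10, 20, 50],
    "optimizer": ["adam", "sgd", "rmsprop"],
}

for batch_size, epochs, optimizer in product(*grid.values()):
    # train_and_score is a hypothetical helper that trains one
    # classifier with this combination and returns its accuracy.
    score = train_and_score(batch_size=batch_size,
                            epochs=epochs, optimizer=optimizer)
    print(batch_size, epochs, optimizer, score)
```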
B. Performance Ranking: The performance ranking module allows DL practitioners to evaluate the performance of multiple models simultaneously. They can assess class-level performance as well as individual performance metrics for any class. In Figure 1(B), we use a bump chart to visualize multi-class and multi-model performance at the same time. In the chart, each model is represented by a ranking line, and each column represents a ground-truth class label. Each class is denoted by a circle showing the performance metric value inside it; the circle grows or shrinks with the measured value for that class. The ranking lines are colored from red to green according to their performance metric values. Once a user clicks any model in the chart, that model's ranking line becomes bold, making it easier to see which classes perform poorly. We can see that both CNN and VGG-16 have higher average performance and stable predictions across all classes, whereas DenseNet and ResNet-50 perform poorly.
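To make the bump-chart encoding concrete, here is a minimal matplotlib sketch of the idea: one line per model, one column per class, with vertical position given by the model's rank on that class. The accuracy numbers, model subset, and class subset are illustrative, and this is not the system's actual rendering code.

```python
# A minimal sketch of the bump-chart idea with illustrative numbers.
import numpy as np
import matplotlib.pyplot as plt

classes = ["airplane", "automobile", "bird", "cat"]
per_class_acc = {                      # illustrative accuracies
    "CNN":       [0.92, 0.95, 0.81, 0.74],
    "VGG-16":    [0.90, 0.93, 0.84, 0.71],
    "ResNet-50": [0.78, 0.80, 0.66, 0.58],
}

scores = np.array(list(per_class_acc.values()))
# Rank the models within each class column (1 = best accuracy).
ranks = scores.shape[0] - scores.argsort(axis=0).argsort(axis=0)

for (name, acc), rank in zip(per_class_acc.items(), ranks):
    plt.plot(classes, rank, marker="o", label=name)
    for x, (r, a) in enumerate(zip(rank, acc)):
        # Label each circle with its metric value, as in Figure 1(B).
        plt.annotate(f"{a:.2f}", (x, r), textcoords="offset points",
                     xytext=(0, 8), ha="center")

plt.gca().invert_yaxis()               # rank 1 at the top
plt.ylabel("rank (1 = best)")
plt.legend()
plt.show()
```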
C. Recommendation System: In the automatic recommender view, users see a list of classifiers ranked by accuracy and other metrics. Each classifier is shown with its model name and a set of hyperparameter settings, as in Figure 1(C). This ranking is generated from each classifier's performance.
5 Usage Scenario: Image Classification
In this section, we show how AutoCl can effectively support non-experts in choosing the best deep learning classifier for their work. Specifically, we demonstrate how comparing the performance of multiple classifiers and tuning hyperparameters can support their exploration.
John is a student interested in image classification. For a study, he needs to select an image classifier and wants to know how such a model classifies images. He has done some data visualization before, but he does not have much deep learning expertise. He finds the publicly available CIFAR-10 dataset [6], a multiclass dataset containing 60,000 32x32 color images in ten classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. Each class has exactly 6,000 images, split into 50,000 training images and 10,000 test images.
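For reference, a minimal sketch of loading CIFAR-10 using the copy bundled with Keras (an assumption; the paper does not specify the loading mechanism):

```python
# Load CIFAR-10 from the Keras dataset bundle; shapes match the
# description above.
from tensorflow.keras.datasets import cifar10

(x_train, y_train), (x_test, y_test) = cifar10.load_data()
print(x_train.shape)                 # (50000, 32, 32, 3) training images
print(x_test.shape)                  # (10000, 32, 32, 3) test images
print(len(set(y_train.flatten())))   # 10 classes
```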
He loads the dataset into the AutoCl system, specifies a few classifiers (CNN, VGG-16, AlexNet, DenseNet, and ResNet-50), and selects the classes he wants to predict. Once he runs the AutoCl background unit, the system processes the data and calculates the necessary metrics and recommendation data, which are then displayed in the interface unit. The AutoCl recommender view suggests the top-performing classifier with the necessary hyperparameters: CNN performs best, whereas ResNet-50 performs poorly.
He also wants to know which classifier performs better at the class level. From the AutoCl option selection and performance ranking views, Figure 1(A-B), he sets different hyperparameters and instantly observes performance metrics such as accuracy and precision. He varies the epoch setting and discovers that performance changes with the number of epochs. He checks all the option settings and interactively learns how each classifier classifies images at the class level. After using the AutoCl dashboard, he concludes that CNN is the best-performing classifier for his task on the CIFAR-10 dataset.
Table 1: Feature comparison of AutoCl with existing AutoML systems.

| No. | Features of dashboard     | SmartML | Auto-sklearn | Auto-Keras | AutoCl |
|-----|---------------------------|---------|--------------|------------|--------|
| F1  | Multiple model comparison | No      | No           | No         | Yes    |
| F2  | Performance ranking       | No      | No           | No         | Yes    |
| F3  | Supports deep learning    | No      | No           | Yes        | Yes    |
| F4  | Visual interpretation     | No      | No           | No         | Yes    |
| F5  | Hyperparameter tuning     | No      | No           | Yes        | Yes    |
| F6  | Automated process         | Yes     | No           | No         | Yes    |
6 Evaluation
We conducted a two-stage evaluation study to assess the potential usability and usefulness of our system. In the first stage, we demonstrated a usage scenario, as described in Section 5. In the second stage, we compared AutoCl with three other systems [7], [2], [5], as shown in Table 1, where Yes and No indicate the presence or absence of each feature. In short, AutoCl is useful for recommending classifiers and for evaluating performance at the class level to compare multiple classifiers effectively.
7 Conclusion
In this paper, we presented AutoCl, an interactive recommender system that suggests the best-performing classifier to non-expert users. Although several cloud-based and open-source AutoML systems exist, they are either costly or not feature-rich. Our system helps non-experts effectively select a deep learning classifier for their work: it lets them interactively tune hyperparameters and suggests the best-performing classifier. In the future, we will extend the system into a visual interactive DL tool that updates and learns additional models at the instance level with a higher accuracy rate. We will also adapt our system to other deep learning tasks such as prediction and natural language processing.
References
- [1] Elshawi, R., Maher, M., Sakr, S.: Automated machine learning: State-of-the-art and open challenges. arXiv preprint arXiv:1906.02287 (2019)
- [2] Feurer, M., Klein, A., Eggensperger, K., Springenberg, J.T., Blum, M., Hutter, F.: Auto-sklearn: efficient and robust automated machine learning. In: Automated Machine Learning, pp. 113–134. Springer, Cham (2019)
- [3] He, X., Zhao, K., Chu, X.: Automl: A survey of the state-of-the-art. Knowledge-Based Systems 212, 106622 (2021)
- [4] Hu, K., Bakker, M.A., Li, S., Kraska, T., Hidalgo, C.: Vizml: A machine learning approach to visualization recommendation. In: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. pp. 1–12 (2019)
- [5] Jin, H., Song, Q., Hu, X.: Auto-keras: An efficient neural architecture search system. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. pp. 1946–1956 (2019)
- [6] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009)
- [7] Maher, M., Sakr, S.: Smartml: A meta learning-based framework for automated selection and hyperparameter tuning for machine learning algorithms. In: EDBT: 22nd International Conference on Extending Database Technology (2019)
- [8] Wang, Q., Chen, Z., Wang, Y., Qu, H.: Applying machine learning advances to data visualization: A survey on ml4vis. arXiv preprint arXiv:2012.00467 (2020)