A Central Asian Food Dataset for Personalized Dietary Interventions: Extended Abstract
Abstract
Nowadays, it is common for people to photograph every beverage, snack, or meal they eat and post these photographs on social media platforms. Leveraging this trend, real-time recognition and reliable classification of captured food images can potentially replace some of the tedious recording and coding of food diaries and thereby enable personalized dietary interventions. Although Central Asian cuisine is culturally and historically distinct, little data on the food and dietary habits of people in this region has been published. To fill this gap, we aim to create a reliable dataset of regional foods that is easily accessible to both public consumers and researchers. To the best of our knowledge, this is the first work on creating a Central Asian Food Dataset (CAFD). The final dataset contains 42 food categories and over 16,000 images of national dishes unique to this region. We achieved a classification accuracy of 88.70% across the 42 classes on the CAFD using the ResNet152 neural network model. The food recognition models trained on the CAFD demonstrate the effectiveness and high accuracy of computer vision for dietary assessment.
I Introduction
This manuscript is an extended abstract of our previously published article, A Central Asian Food Dataset for Personalized Dietary Interventions, which appeared in Nutrients (MDPI) in 2023 [1].
Food computation from visual data has gained prominence due to advances in computer vision and increased smartphone and social media usage [2]. These platforms provide access to food-related information, which can be utilized for various tasks, including medical, gastronomic, and agronomic research. Deep learning-based food image recognition systems have been developed for applications in dietary assessment, smart restaurants, food safety inspection, and agriculture. Automatic food image recognition can improve the accuracy of nutritional records and assist visually impaired individuals [2].
Table I: Overview of existing food image datasets.

| Dataset | Year | # classes | # images | Cuisine | Public |
|---|---|---|---|---|---|
| Food-101 [3] | 2014 | 101 | 101,000 | European | yes |
| VireoFood-172 [4] | 2016 | 172 | 110,241 | Chinese/Asian | yes |
| TurkishFoods-15 [5] | 2017 | 15 | 7,500 | Turkish | yes |
| FoodAI [6] | 2019 | 756 | 400,000 | International | no |
| VireoFood-251 [7] | 2020 | 251 | 169,673 | Chinese/Asian | yes |
| ISIA Food-500 [8] | 2020 | 500 | 399,726 | Chinese/Intern. | yes |
| Food2K [9] | 2021 | 2,000 | 1,036,564 | Chinese/Intern. | no |
| Food1K [9] | 2021 | 1,000 | 400,000 | Chinese/Intern. | yes |
| CAFD [1] | 2022 | 42 | 16,499 | Central Asian | yes |
Existing food classification datasets mostly cover Western, European, Chinese, and other Asian cuisines [10, 6]. Examples of such datasets are presented in Table I. However, FoodAI is not open source, and Food2K is not publicly available. Food1K, a food recognition challenge dataset, has been released with 400,000 images and 1,000 food classes.
These datasets lack the national dishes specific to regions such as Central Asia. To address this gap, we aim to develop a food recognition system tailored to our region, considering local preferences, specialties, and cuisines. In this paper, we describe the development of our dataset and food recognition models, report their performance, and conclude with a summary of our findings.
II Central Asian Food Dataset
This paper introduces the Central Asian Food Dataset (CAFD), consisting of 16,499 images across 42 classes of popular Central Asian dishes. We ensured the dataset's high quality through extensive data cleaning, iterative annotation, and multiple inspections. The CAFD can also serve as a benchmark for food image representation learning.
We followed a five-step process to create a diverse and high-quality dataset. First, we listed popular Central Asian food items. Second, we scraped images from search engines and social media using a Python script with Selenium, and extracted additional frames from recipe videos using Roboflow [11] to enlarge underrepresented classes. Third, we removed duplicates using the ImageHash Python library. Fourth, two annotators created bounding boxes for each food item using the Roboflow software. Fifth, we cropped the food items based on the bounding box coordinates and stored them in separate directories by class. The final dataset is imbalanced, with 99 to 922 images per class. The de-duplication and cropping steps are sketched below.
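A minimal sketch of steps three and five, assuming the ImageHash and Pillow libraries, a flat directory of scraped JPEGs, and an illustrative (label, xmin, ymin, xmax, ymax) bounding-box format; the actual CAFD pipeline may differ in its thresholds and file layout:

```python
from pathlib import Path
from PIL import Image
import imagehash  # pip install ImageHash

RAW_DIR = Path("raw_images")   # assumed: flat folder of scraped images
CROP_DIR = Path("cafd_crops")  # output: one sub-directory per food class

def deduplicate(image_dir: Path) -> list:
    """Keep only the first image for each perceptual hash (drops exact and near duplicates)."""
    seen, unique = set(), []
    for path in sorted(image_dir.glob("*.jpg")):
        h = str(imagehash.phash(Image.open(path)))
        if h not in seen:
            seen.add(h)
            unique.append(path)
    return unique

def crop_to_class_dirs(path: Path, boxes) -> None:
    """Crop each annotated food item and save it under its class directory."""
    img = Image.open(path).convert("RGB")
    for i, (label, xmin, ymin, xmax, ymax) in enumerate(boxes):
        out_dir = CROP_DIR / label
        out_dir.mkdir(parents=True, exist_ok=True)
        img.crop((xmin, ymin, xmax, ymax)).save(out_dir / f"{path.stem}_{i}.jpg")
```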

III Food Recognition Models
Image classification is a computer vision task that assigns a single class label to an image. State-of-the-art models are based on CNNs and have improved with the availability of large datasets. Transfer learning is often used when sufficient training data is unavailable, as it leverages knowledge from pre-trained models to solve similar problems in different domains [12]. In this work, we applied transfer learning to food classification using model weights pre-trained on ImageNet, a large dataset with over 14 million images [13].
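As a concrete illustration of this setup, the sketch below loads an ImageNet-pretrained ResNet152 from torchvision (version 0.13 or later is assumed for the weights API) and replaces the final fully connected layer with a 42-way classifier for the CAFD; it is a minimal sketch, not the exact training code:

```python
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 42  # CAFD food categories

# Load ResNet152 with ImageNet weights and swap in a new classification head.
model = models.resnet152(weights=models.ResNet152_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)
```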
We selected nine models of different architectures and complexities to evaluate their performance on the CAFD: VGG-16, SqueezeNet, and five models with ResNet-style architectures [14, 15, 16, 17, 18], as well as DenseNet-121 and EfficientNet-b4, which build on related ideas but introduce dense connectivity and compound scaling, respectively [19, 20].
We also trained the models on the Food1K dataset and on the combined CAFD+Food1K dataset. We carefully split the datasets into training, validation, and test sets to avoid bias and data leakage. Table II shows the number of images in each set for the three datasets.
We performed transfer learning in PyTorch using models pre-trained on ImageNet. Models were trained for 40 epochs with a learning rate of 0.001, a batch size of 64, and a categorical cross-entropy loss. The input image size varied depending on the model. We used Top-1 and Top-5 accuracy as evaluation metrics, and precision, recall, and F1-score to identify and analyze the best- and worst-classified food classes. A training sketch follows Table II.
Table II: Training, validation, and test set sizes.

| Dataset | Train size | Validation size | Test size |
|---|---|---|---|
| CAFD | 11,008 | 2,763 | 2,728 |
| Food1K | 317,277 | 26,495 | 26,495 |
| CAFD+Food1K | 328,285 | 29,258 | 29,223 |
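A minimal PyTorch training sketch matching the configuration above (40 epochs, learning rate 0.001, batch size 64, cross-entropy loss); the optimizer choice, input resolution, and directory layout are illustrative assumptions, since the abstract does not specify them:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

# ImageNet-pretrained ResNet152 with a 42-way head, as in the previous sketch.
model = models.resnet152(weights=models.ResNet152_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 42)

# Illustrative input pipeline; the actual input size varied per model.
train_tf = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
train_set = datasets.ImageFolder("cafd_crops/train", transform=train_tf)  # assumed layout
train_loader = DataLoader(train_set, batch_size=64, shuffle=True, num_workers=4)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
criterion = nn.CrossEntropyLoss()  # categorical cross-entropy
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)  # assumed optimizer

for epoch in range(40):
    model.train()
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```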
IV Results and Discussion
Table III summarizes the classification results. All models performed better on the CAFD than on Food1K and CAFD+Food1K, indicating the quality and cleanliness of the CAFD. VGG-16 achieved 86.03% Top-1 and 98.33% Top-5 accuracy on the CAFD, while SqueezeNet performed worse. ResNet architectures achieved around 88% Top-1 and 98% Top-5 accuracy on the CAFD, with accuracy increasing with network depth. Wide ResNet-50 improved on ResNet50, and EfficientNet-b4 achieved the best results on Food1K and CAFD+Food1K.
Table III: Top-1 and Top-5 classification accuracy (%) on the CAFD, Food1K, and CAFD+Food1K test sets.

| Base Model | CAFD Top-1 | CAFD Top-5 | Food1K Top-1 | Food1K Top-5 | CAFD+Food1K Top-1 | CAFD+Food1K Top-5 |
|---|---|---|---|---|---|---|
| VGG-16 | 86.03 | 98.33 | 80.67 | 95.24 | 80.87 | 96.19 |
| SqueezeNet1_0 | 79.58 | 97.29 | 71.33 | 91.23 | 69.16 | 90.15 |
| ResNet50 | 88.03 | 98.44 | 82.44 | 97.01 | 83.22 | 97.25 |
| ResNet101 | 88.51 | 98.44 | 84.10 | 97.34 | 84.20 | 97.45 |
| ResNet152 | 88.70 | 98.59 | 84.85 | 97.80 | 84.75 | 97.58 |
| ResNeXt50-32 | 87.95 | 98.44 | 81.17 | 96.67 | 84.81 | 97.65 |
| Wide ResNet-50 | 88.21 | 98.59 | 82.20 | 97.28 | 85.27 | 97.81 |
| DenseNet-121 | 86.95 | 98.26 | 83.03 | 97.14 | 82.45 | 96.93 |
| EfficientNet-b4 | 81.28 | 97.37 | 87.47 | 98.04 | 87.75 | 98.01 |

Tables IV and V list the 10 best and worst detected CAFD classes for ResNet152 and EfficientNet-b4, respectively. Classes with many images and distinct visual features were recognized best, while fine-grained, similar-looking classes, examples of which are presented in Figure 2, were confused with one another and degraded model performance.
Table IV: The 10 best and worst detected CAFD classes for ResNet152 (precision, recall, F1-score).

| Best detected class | Prec. | Rec. | F1 | Worst detected class | Prec. | Rec. | F1 |
|---|---|---|---|---|---|---|---|
| Sushki | 0.96 | 1.00 | 0.98 | Shashlyk chicken w/ v. | 0.71 | 0.67 | 0.69 |
| Achichuk | 0.95 | 1.00 | 0.98 | Shashlyk beef w/ v. | 0.66 | 0.72 | 0.69 |
| Sheep head | 0.94 | 1.00 | 0.97 | Shashlyk chicken | 0.67 | 0.74 | 0.70 |
| Naryn | 0.96 | 0.98 | 0.97 | Shashlyk minced meat | 0.79 | 0.64 | 0.71 |
| Plov | 0.93 | 0.99 | 0.96 | Asip | 0.85 | 0.62 | 0.72 |
| Tushpara w/ s. | 0.93 | 0.97 | 0.95 | Shashlyk beef | 0.74 | 0.69 | 0.72 |
| Soup plain | 0.97 | 0.93 | 0.95 | Lagman without soup | 0.83 | 0.68 | 0.75 |
| Samsa | 0.94 | 0.96 | 0.95 | Kazy-karta | 0.83 | 0.74 | 0.78 |
| Hvorost | 0.98 | 0.91 | 0.95 | Beshbarmak with kazy | 0.78 | 0.80 | 0.79 |
| Manty | 0.92 | 0.95 | 0.94 | Tushpara fried | 0.88 | 0.76 | 0.81 |
Table V: The 10 best and worst detected CAFD classes for EfficientNet-b4 (precision, recall, F1-score).

| Best detected class | Prec. | Rec. | F1 | Worst detected class | Prec. | Rec. | F1 |
|---|---|---|---|---|---|---|---|
| Sushki | 0.91 | 1.00 | 0.96 | Lagman without soup | 0.60 | 0.27 | 0.37 |
| Achichuk | 1.00 | 0.95 | 0.97 | Asip | 0.88 | 0.38 | 0.53 |
| Sheep head | 0.94 | 0.94 | 0.94 | Talkan-zhent | 0.86 | 0.53 | 0.66 |
| Airan-katyk | 0.83 | 0.93 | 0.88 | Doner lavash | 0.75 | 0.60 | 0.67 |
| Plov | 0.97 | 0.90 | 0.93 | Shashlyk chicken w/ v. | 0.88 | 0.64 | 0.74 |
| Cheburek | 0.92 | 0.90 | 0.91 | Lagman fried | 0.96 | 0.68 | 0.80 |
| Irimshik | 0.93 | 0.88 | 0.91 | Doner nan | 1.00 | 0.68 | 0.81 |
| Samsa | 0.93 | 0.88 | 0.90 | Shashlyk chicken | 0.61 | 0.69 | 0.65 |
| Naryn | 0.97 | 0.87 | 0.92 | Shashlyk beef | 0.67 | 0.69 | 0.68 |
| Chak-chak | 0.90 | 0.87 | 0.92 | Kazy-karta | 0.80 | 0.70 | 0.74 |
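Per-class precision, recall, and F1-scores like those in Tables IV and V, along with Top-k accuracy, can be computed from model outputs; a short sketch using scikit-learn and NumPy, with hypothetical variable names (probs, labels, class_names):

```python
import numpy as np
from sklearn.metrics import classification_report

def top_k_accuracy(probs: np.ndarray, labels: np.ndarray, k: int = 5) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    top_k = np.argsort(probs, axis=1)[:, -k:]
    return float(np.mean([labels[i] in top_k[i] for i in range(len(labels))]))

# probs: (num_samples, 42) softmax outputs; labels: (num_samples,) ground-truth indices
# print(classification_report(labels, probs.argmax(axis=1), target_names=class_names))
```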
V Conclusion
The Central Asian Food Dataset (CAFD) offers a unique resource for automating and improving the accuracy of dietary assessment. It has potential applications in creating recipes and personalized dietary plans, helping restaurants and food service providers plan menus, optimizing food production, improving food quality and safety, and combating fraudulent food practices, and it can be integrated with other food recognition systems.
Comprising 16,499 images of 42 food classes, the CAFD demonstrates the effectiveness of computer vision models for food recognition. Our models achieved Top-5 accuracies of 98.59% and 98.01% on the CAFD and CAFD+Food1K, respectively. The dataset, source code, and pre-trained models are available in the GitHub repository: https://github.com/IS2AI/Central-Asian-Food-Dataset.
Future work includes exploring different neural network architectures and data augmentation methods, and utilizing the CAFD for other dietary-related tasks. We also plan to develop food scene recognition datasets with multiple food items per image and to extend the current set of food categories.
Author contributions
MYC and HAV conceived and designed the study. AK, HAV, and MYC contributed to defining the research scope and objectives. AK and AB collected and prepared the dataset and trained the models. AB created a pipeline for processing images in Roboflow. HAV provided guidelines for the project experiments. AK performed the final check of the dataset and finalized the experimental results. PI of the project: MYC. AK, MYC, and AB wrote the article, and all the authors contributed to the manuscript revision and approved the submitted version.
References
- [1] A. Karabay, A. Bolatov, H. A. Varol, and M.-Y. Chan, “A Central Asian food dataset for personalized dietary interventions,” Nutrients, vol. 15, no. 7, 2023. [Online]. Available: https://www.mdpi.com/2072-6643/15/7/1728
- [2] D. Allegra, S. Battiato, A. Ortis, S. Urso, and R. Polosa, “A review on food recognition technology for health applications,” Health Psychology Research, vol. 8, no. 3, Dec. 2020. [Online]. Available: https://doi.org/10.4081/hpr.2020.9297
- [3] L. Bossard, M. Guillaumin, and L. V. Gool, “Food-101 – mining discriminative components with random forests,” in Computer Vision – ECCV 2014. Springer International Publishing, 2014, pp. 446–461. [Online]. Available: https://doi.org/10.1007/978-3-319-10599-4_29
- [4] J. Chen and C.-W. Ngo, “Deep-based ingredient recognition for cooking recipe retrieval,” in Proceedings of the 24th ACM International Conference on Multimedia. ACM, Oct. 2016. [Online]. Available: https://doi.org/10.1145/2964284.2964315
- [5] C. Gungor, F. Baltacı, A. Erdem, and E. Erdem, “Turkish cuisine: A benchmark dataset with Turkish meals for food recognition,” in 2017 25th Signal Processing and Communications Applications Conference (SIU), 2017, pp. 1–4.
- [6] D. Sahoo, W. Hao, S. Ke, W. Xiongwei, H. Le, P. Achananuparp, E.-P. Lim, and S. C. H. Hoi, “FoodAI,” in Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, Jul. 2019. [Online]. Available: https://doi.org/10.1145/3292500.3330734
- [7] J. Chen, B. Zhu, C.-W. Ngo, T.-S. Chua, and Y.-G. Jiang, “A study of multi-task and region-wise deep learning for food ingredient recognition,” IEEE Transactions on Image Processing, vol. 30, pp. 1514–1526, 2021. [Online]. Available: https://doi.org/10.1109/tip.2020.3045639
- [8] W. Min, L. Liu, Z. Wang, Z. Luo, X. Wei, X. Wei, and S. Jiang, “ISIA Food-500: A dataset for large-scale food recognition via stacked global-local attention network,” in Proceedings of the 28th ACM International Conference on Multimedia, 2020.
- [9] W. Min, Z. Wang, Y. Liu, M. Luo, L. Kang, X. Wei, X. Wei, and S. Jiang, “Large scale visual food recognition,” 2021. [Online]. Available: https://arxiv.org/abs/2103.16107
- [10] D. Herzig, C. T. Nakas, J. Stalder, C. Kosinski, C. Laesser, J. Dehais, R. Jaeggi, A. B. Leichtle, F.-M. Dahlweid, C. Stettler, and L. Bally, “Volumetric food quantification using computer vision on a depth-sensing smartphone: Preclinical study,” JMIR mHealth and uHealth, vol. 8, no. 3, Mar. 2020. [Online]. Available: https://doi.org/10.2196/15294
- [11] Roboflow, “Roboflow: Give your software the sense of sight,” 2022, accessed: 2022-11-25. [Online]. Available: https://roboflow.com/
- [12] L. Torrey and J. Shavlik, “Transfer learning,” in Handbook of Research on Machine Learning Applications and Trends: Algorithms, Methods, and Techniques. IGI Global, 2010, pp. 242–264.
- [13] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “ImageNet: A large-scale hierarchical image database,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 248–255.
- [14] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” 2014. [Online]. Available: https://arxiv.org/abs/1409.1556
- [15] F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. J. Dally, and K. Keutzer, “SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size,” 2016. [Online]. Available: https://arxiv.org/abs/1602.07360
- [16] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, Jun. 2016. [Online]. Available: https://doi.org/10.1109/cvpr.2016.90
- [17] S. Xie, R. Girshick, P. Dollar, Z. Tu, and K. He, “Aggregated residual transformations for deep neural networks,” 2016. [Online]. Available: https://arxiv.org/abs/1611.05431
- [18] S. Zagoruyko and N. Komodakis, “Wide residual networks,” 2016. [Online]. Available: https://arxiv.org/abs/1605.07146
- [19] G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” 2016. [Online]. Available: https://arxiv.org/abs/1608.06993
- [20] M. Tan and Q. V. Le, “EfficientNet: Rethinking model scaling for convolutional neural networks,” 2019. [Online]. Available: https://arxiv.org/abs/1905.11946