A Central Asian Food Dataset for Personalized Dietary Interventions: Extended Abstract
Abstract
Nowadays, it is common for people to photograph every beverage, snack, or meal they eat and post these photographs on social media platforms. Leveraging this trend, real-time recognition and reliable classification of captured food images can potentially replace some of the tedious recording and coding of food diaries and thereby enable personalized dietary interventions. Although Central Asian cuisine is culturally and historically distinct, little data on the food and dietary habits of people in this region has been published. To fill this gap, we aim to create a reliable dataset of regional foods that is easily accessible to both public consumers and researchers. To the best of our knowledge, this is the first work on creating a Central Asian Food Dataset (CAFD). The final dataset contains 42 food categories and over 16,000 images of national dishes unique to this region. We achieved a classification accuracy of 88.70% across the 42 classes on the CAFD using the ResNet152 neural network model. The food recognition models trained on the CAFD demonstrate the effectiveness and high accuracy of computer vision for dietary assessment.
I Introduction
This manuscript is an extended abstract of our previously published article, A Central Asian Food Dataset for Personalized Dietary Interventions, which appeared in Nutrients (MDPI) in 2023 [1].
Food computation from visual data has gained prominence due to advances in computer vision and increased smartphone and social media usage [2]. These platforms provide access to food-related information, which can be utilized for various tasks, including medical, gastronomic, and agronomic research. Deep learning-based food image recognition systems have been developed for applications in dietary assessment, smart restaurants, food safety inspection, and agriculture. Automatic food image recognition can improve the accuracy of nutritional records and assist visually impaired individuals [2].
Table I: Overview of existing food image datasets.

| Dataset | Year | # classes | # images | Cuisine | Public |
|---|---|---|---|---|---|
| Food-101 [3] | 2014 | 101 | 101,000 | European | yes |
| VireoFood-172 [4] | 2016 | 172 | 110,241 | Chinese/Asian | yes |
| TurkishFoods-15 [5] | 2017 | 15 | 7,500 | Turkish | yes |
| FoodAI [6] | 2019 | 756 | 400,000 | International | no |
| VireoFood-251 [7] | 2020 | 251 | 169,673 | Chinese/Asian | yes |
| ISIA Food-500 [8] | 2020 | 500 | 399,726 | Chinese/Intern. | yes |
| Food2K [9] | 2021 | 2,000 | 1,036,564 | Chinese/Intern. | no |
| Food1K [9] | 2021 | 1,000 | 400,000 | Chinese/Intern. | yes |
| CAFD [1] | 2022 | 42 | 16,499 | Central Asian | yes |
Existing food classification datasets mostly cover Western, European, Chinese, and other Asian cuisines [10, 6]. Examples of such datasets are presented in Table I. However, FoodAI is not open source, and Food2K is not publicly available. Food1K, a food recognition challenge dataset, has been released with 400,000 images and 1,000 food classes.
These datasets lack the national dishes specific to regions such as Central Asia. To address this gap, we aim to develop a food recognition system tailored to our region, considering local preferences, specialties, and cuisines. In this paper, we describe the development of our dataset and food recognition models, report their performance, and conclude with a summary of our findings.
II Central Asian Food Dataset
This paper introduces the Central Asian Food Dataset (CAFD), consisting of 16,499 images across 42 classes of popular Central Asian dishes. We ensured the dataset's high quality through extensive data cleaning, iterative annotation, and multiple inspections. The CAFD can also serve as a benchmark for food image representation learning.
We followed a five-step process to create a diverse and high-quality dataset. First, we listed popular Central Asian food items. Second, we scraped images from search engines and social media using a Python script with Selenium, and extracted additional frames from recipe videos using Roboflow [11] to enlarge underrepresented classes. Third, we removed duplicates using the ImageHash Python library. Fourth, two annotators created bounding boxes for each food item using the Roboflow software. Fifth, we cropped the food items based on the bounding box coordinates and stored them in separate directories by class. The final dataset is imbalanced, with 99 to 922 images per class. The de-duplication and cropping steps are sketched below.
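A minimal sketch of steps three and five, assuming the ImageHash and Pillow libraries, a flat directory of scraped JPEGs, and an illustrative (label, xmin, ymin, xmax, ymax) bounding-box format; the actual CAFD pipeline may differ in its thresholds and file layout:

```python
from pathlib import Path
from PIL import Image
import imagehash  # pip install ImageHash

RAW_DIR = Path("raw_images")   # assumed: flat folder of scraped images
CROP_DIR = Path("cafd_crops")  # output: one sub-directory per food class

def deduplicate(image_dir: Path) -> list:
    """Keep only the first image for each perceptual hash (drops exact and near duplicates)."""
    seen, unique = set(), []
    for path in sorted(image_dir.glob("*.jpg")):
        h = str(imagehash.phash(Image.open(path)))
        if h not in seen:
            seen.add(h)
            unique.append(path)
    return unique

def crop_to_class_dirs(path: Path, boxes) -> None:
    """Crop each annotated food item and save it under its class directory."""
    img = Image.open(path).convert("RGB")
    for i, (label, xmin, ymin, xmax, ymax) in enumerate(boxes):
        out_dir = CROP_DIR / label
        out_dir.mkdir(parents=True, exist_ok=True)
        img.crop((xmin, ymin, xmax, ymax)).save(out_dir / f"{path.stem}_{i}.jpg")
```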

III Food Recognition Models
Image classification is a computer vision task that assigns a single class label to an image. State-of-the-art models are based on CNNs and have improved with the availability of large datasets. Transfer learning is often used when sufficient training data is unavailable, as it leverages knowledge from pre-trained models to solve similar problems in different domains [12]. In this work, we applied transfer learning to food classification using model weights pre-trained on ImageNet, a large dataset with over 14 million images [13].
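As a concrete illustration of this setup, the sketch below loads an ImageNet-pretrained ResNet152 from torchvision (version 0.13 or later is assumed for the weights API) and replaces the final fully connected layer with a 42-way classifier for the CAFD; it is a minimal sketch, not the exact training code:

```python
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 42  # CAFD food categories

# Load ResNet152 with ImageNet weights and swap in a new classification head.
model = models.resnet152(weights=models.ResNet152_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)
```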
We selected nine models of different architectures and complexities to evaluate their performance on the CAFD: VGG-16, SqueezeNet, and five models with ResNet-style architectures [14, 15, 16, 17, 18], as well as DenseNet-121 and EfficientNet-b4, which build on related ideas but introduce dense connectivity and compound scaling, respectively [19, 20].
We also trained the models on the Food1K dataset and on the combined CAFD+Food1K dataset. We carefully split the datasets into training, validation, and test sets to avoid bias and data leakage. Table II shows the number of images in each set for the three datasets.
We performed transfer learning in PyTorch using models pre-trained on ImageNet. Models were trained for 40 epochs with a learning rate of 0.001, a batch size of 64, and a categorical cross-entropy loss. The input image size varied depending on the model. We used Top-1 and Top-5 accuracy as evaluation metrics, and precision, recall, and F1-score to identify and analyze the best- and worst-classified food classes. A training sketch follows Table II.
Table II: Training, validation, and test set sizes.

| Dataset | Train size | Validation size | Test size |
|---|---|---|---|
| CAFD | 11,008 | 2,763 | 2,728 |
| Food1K | 317,277 | 26,495 | 26,495 |
| CAFD+Food1K | 328,285 | 29,258 | 29,223 |
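A minimal PyTorch training sketch matching the configuration above (40 epochs, learning rate 0.001, batch size 64, cross-entropy loss); the optimizer choice, input resolution, and directory layout are illustrative assumptions, since the abstract does not specify them:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

# ImageNet-pretrained ResNet152 with a 42-way head, as in the previous sketch.
model = models.resnet152(weights=models.ResNet152_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 42)

# Illustrative input pipeline; the actual input size varied per model.
train_tf = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
train_set = datasets.ImageFolder("cafd_crops/train", transform=train_tf)  # assumed layout
train_loader = DataLoader(train_set, batch_size=64, shuffle=True, num_workers=4)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
criterion = nn.CrossEntropyLoss()  # categorical cross-entropy
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)  # assumed optimizer

for epoch in range(40):
    model.train()
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```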
IV Results and Discussion
Table III summarizes the classification results. All models performed better on the CAFD than on Food1K and CAFD+Food1K, indicating the quality and cleanliness of the CAFD. VGG-16 achieved 86.03% Top-1 and 98.33% Top-5 accuracy on the CAFD, while SqueezeNet performed worse. ResNet architectures achieved around 88% Top-1 and 98% Top-5 accuracy on the CAFD, with accuracy increasing with network depth. Wide ResNet-50 improved on ResNet50, and EfficientNet-b4 achieved the best results on Food1K and CAFD+Food1K.
Table III: Top-1 and Top-5 classification accuracy (%) on the CAFD, Food1K, and CAFD+Food1K test sets.

| Base Model | CAFD Top-1 | CAFD Top-5 | Food1K Top-1 | Food1K Top-5 | CAFD+Food1K Top-1 | CAFD+Food1K Top-5 |
|---|---|---|---|---|---|---|
| VGG-16 | 86.03 | 98.33 | 80.67 | 95.24 | 80.87 | 96.19 |
| SqueezeNet1_0 | 79.58 | 97.29 | 71.33 | 91.23 | 69.16 | 90.15 |
| ResNet50 | 88.03 | 98.44 | 82.44 | 97.01 | 83.22 | 97.25 |
| ResNet101 | 88.51 | 98.44 | 84.10 | 97.34 | 84.20 | 97.45 |
| ResNet152 | 88.70 | 98.59 | 84.85 | 97.80 | 84.75 | 97.58 |
| ResNeXt50-32 | 87.95 | 98.44 | 81.17 | 96.67 | 84.81 | 97.65 |
| Wide ResNet-50 | 88.21 | 98.59 | 82.20 | 97.28 | 85.27 | 97.81 |
| DenseNet-121 | 86.95 | 98.26 | 83.03 | 97.14 | 82.45 | 96.93 |
| EfficientNet-b4 | 81.28 | 97.37 | 87.47 | 98.04 | 87.75 | 98.01 |

Tables IV and V list the 10 best and worst detected CAFD classes for ResNet152 and EfficientNet-b4, respectively. Classes with many images and distinct visual features were recognized best, while fine-grained, similar-looking classes, examples of which are presented in Figure 2, were confused with one another and degraded model performance.
Table IV: The 10 best and worst detected CAFD classes for ResNet152 (precision, recall, F1-score).

| Best detected class | Prec. | Rec. | F1 | Worst detected class | Prec. | Rec. | F1 |
|---|---|---|---|---|---|---|---|
| Sushki | 0.96 | 1.00 | 0.98 | Shashlyk chicken w/ v. | 0.71 | 0.67 | 0.69 |
| Achichuk | 0.95 | 1.00 | 0.98 | Shashlyk beef w/ v. | 0.66 | 0.72 | 0.69 |
| Sheep head | 0.94 | 1.00 | 0.97 | Shashlyk chicken | 0.67 | 0.74 | 0.70 |
| Naryn | 0.96 | 0.98 | 0.97 | Shashlyk minced meat | 0.79 | 0.64 | 0.71 |
| Plov | 0.93 | 0.99 | 0.96 | Asip | 0.85 | 0.62 | 0.72 |
| Tushpara w/ s. | 0.93 | 0.97 | 0.95 | Shashlyk beef | 0.74 | 0.69 | 0.72 |
| Soup plain | 0.97 | 0.93 | 0.95 | Lagman without soup | 0.83 | 0.68 | 0.75 |
| Samsa | 0.94 | 0.96 | 0.95 | Kazy-karta | 0.83 | 0.74 | 0.78 |
| Hvorost | 0.98 | 0.91 | 0.95 | Beshbarmak with kazy | 0.78 | 0.80 | 0.79 |
| Manty | 0.92 | 0.95 | 0.94 | Tushpara fried | 0.88 | 0.76 | 0.81 |
Table V: The 10 best and worst detected CAFD classes for EfficientNet-b4 (precision, recall, F1-score).

| Best detected class | Prec. | Rec. | F1 | Worst detected class | Prec. | Rec. | F1 |
|---|---|---|---|---|---|---|---|
| Sushki | 0.91 | 1.00 | 0.96 | Lagman without soup | 0.60 | 0.27 | 0.37 |
| Achichuk | 1.00 | 0.95 | 0.97 | Asip | 0.88 | 0.38 | 0.53 |
| Sheep head | 0.94 | 0.94 | 0.94 | Talkan-zhent | 0.86 | 0.53 | 0.66 |
| Airan-katyk | 0.83 | 0.93 | 0.88 | Doner lavash | 0.75 | 0.60 | 0.67 |
| Plov | 0.97 | 0.90 | 0.93 | Shashlyk chicken w/ v. | 0.88 | 0.64 | 0.74 |
| Cheburek | 0.92 | 0.90 | 0.91 | Lagman fried | 0.96 | 0.68 | 0.80 |
| Irimshik | 0.93 | 0.88 | 0.91 | Doner nan | 1.00 | 0.68 | 0.81 |
| Samsa | 0.93 | 0.88 | 0.90 | Shashlyk chicken | 0.61 | 0.69 | 0.65 |
| Naryn | 0.97 | 0.87 | 0.92 | Shashlyk beef | 0.67 | 0.69 | 0.68 |
| Chak-chak | 0.90 | 0.87 | 0.92 | Kazy-karta | 0.80 | 0.70 | 0.74 |
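Per-class precision, recall, and F1-scores like those in Tables IV and V, along with Top-k accuracy, can be computed from model outputs; a short sketch using scikit-learn and NumPy, with hypothetical variable names (probs, labels, class_names):

```python
import numpy as np
from sklearn.metrics import classification_report

def top_k_accuracy(probs: np.ndarray, labels: np.ndarray, k: int = 5) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    top_k = np.argsort(probs, axis=1)[:, -k:]
    return float(np.mean([labels[i] in top_k[i] for i in range(len(labels))]))

# probs: (num_samples, 42) softmax outputs; labels: (num_samples,) ground-truth indices
# print(classification_report(labels, probs.argmax(axis=1), target_names=class_names))
```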
V Conclusion
The Central Asian Food Dataset (CAFD) offers a unique resource for automating and improving the accuracy of dietary assessment. It has potential applications in creating recipes and personalized dietary plans, helping restaurants and food service providers plan menus, optimizing food production, improving food quality and safety, and combating fraudulent food practices, and it can be integrated with other food recognition systems.
Comprising 16,499 images of 42 food classes, the CAFD demonstrates the effectiveness of computer vision models for food recognition. Our models achieved Top-5 accuracies of 98.59% and 98.01% on the CAFD and CAFD+Food1K, respectively. The dataset, source code, and pre-trained models are available in the GitHub repository: https://github.com/IS2AI/Central-Asian-Food-Dataset.
Future work includes exploring different neural network architectures and data augmentation methods, and utilizing the CAFD for other dietary-related tasks. We also plan to develop food scene recognition datasets with multiple food items per image and to extend the current set of food categories.
Author contributions
MYC and HAV conceived and designed the study. AK, HAV, and MYC contributed to defining the research scope and objectives. AK and AB collected and prepared the dataset and trained the models. AB created a pipeline for processing images in Roboflow. HAV provided guidelines for the project experiments. AK performed the final check of the dataset and finalized the experimental results. PI of the project: MYC. AK, MYC, and AB wrote the article, and all the authors contributed to the manuscript revision and approved the submitted version.
References
- [1] A. Karabay, A. Bolatov, H. A. Varol, and M.-Y. Chan, “A Central Asian food dataset for personalized dietary interventions,” Nutrients, vol. 15, no. 7, 2023. [Online]. Available: https://www.mdpi.com/2072-6643/15/7/1728
- [2] D. Allegra, S. Battiato, A. Ortis, S. Urso, and R. Polosa, “A review on food recognition technology for health applications,” Health Psychology Research, vol. 8, no. 3, Dec. 2020. [Online]. Available: https://doi.org/10.4081/hpr.2020.9297
- [3] L. Bossard, M. Guillaumin, and L. V. Gool, “Food-101 – mining discriminative components with random forests,” in Computer Vision – ECCV 2014. Springer International Publishing, 2014, pp. 446–461. [Online]. Available: https://doi.org/10.1007/978-3-319-10599-4_29
- [4] J. Chen and C.-W. Ngo, “Deep-based ingredient recognition for cooking recipe retrieval,” in Proceedings of the 24th ACM International Conference on Multimedia. ACM, Oct. 2016. [Online]. Available: https://doi.org/10.1145/2964284.2964315
- [5] C. Gungor, F. Baltacı, A. Erdem, and E. Erdem, “Turkish cuisine: A benchmark dataset with Turkish meals for food recognition,” in 2017 25th Signal Processing and Communications Applications Conference (SIU), 2017, pp. 1–4.
- [6] D. Sahoo, W. Hao, S. Ke, W. Xiongwei, H. Le, P. Achananuparp, E.-P. Lim, and S. C. H. Hoi, “FoodAI,” in Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, Jul. 2019. [Online]. Available: https://doi.org/10.1145/3292500.3330734
- [7] J. Chen, B. Zhu, C.-W. Ngo, T.-S. Chua, and Y.-G. Jiang, “A study of multi-task and region-wise deep learning for food ingredient recognition,” IEEE Transactions on Image Processing, vol. 30, pp. 1514–1526, 2021. [Online]. Available: https://doi.org/10.1109/tip.2020.3045639
- [8] W. Min, L. Liu, Z. Wang, Z. Luo, X. Wei, X. Wei, and S. Jiang, “ISIA Food-500: A dataset for large-scale food recognition via stacked global-local attention network,” in Proceedings of the 28th ACM International Conference on Multimedia, 2020.
- [9] W. Min, Z. Wang, Y. Liu, M. Luo, L. Kang, X. Wei, X. Wei, and S. Jiang, “Large scale visual food recognition,” 2021. [Online]. Available: https://arxiv.org/abs/2103.16107
- [10] D. Herzig, C. T. Nakas, J. Stalder, C. Kosinski, C. Laesser, J. Dehais, R. Jaeggi, A. B. Leichtle, F.-M. Dahlweid, C. Stettler, and L. Bally, “Volumetric food quantification using computer vision on a depth-sensing smartphone: Preclinical study,” JMIR mHealth and uHealth, vol. 8, no. 3, Mar. 2020. [Online]. Available: https://doi.org/10.2196/15294
- [11] Roboflow, “Roboflow: Give your software the sense of sight,” 2022, accessed: 2022-11-25. [Online]. Available: https://roboflow.com/
- [12] L. Torrey and J. Shavlik, “Transfer learning,” in Handbook of Research on Machine Learning Applications and Trends: Algorithms, Methods, and Techniques. IGI Global, 2010, pp. 242–264.
- [13] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “ImageNet: A large-scale hierarchical image database,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 248–255.
- [14] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” 2014. [Online]. Available: https://arxiv.org/abs/1409.1556
- [15] F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. J. Dally, and K. Keutzer, “SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size,” 2016. [Online]. Available: https://arxiv.org/abs/1602.07360
- [16] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, Jun. 2016. [Online]. Available: https://doi.org/10.1109/cvpr.2016.90
- [17] S. Xie, R. Girshick, P. Dollar, Z. Tu, and K. He, “Aggregated residual transformations for deep neural networks,” 2016. [Online]. Available: https://arxiv.org/abs/1611.05431
- [18] S. Zagoruyko and N. Komodakis, “Wide residual networks,” 2016. [Online]. Available: https://arxiv.org/abs/1605.07146
- [19] G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” 2016. [Online]. Available: https://arxiv.org/abs/1608.06993
- [20] M. Tan and Q. V. Le, “EfficientNet: Rethinking model scaling for convolutional neural networks,” 2019. [Online]. Available: https://arxiv.org/abs/1905.11946