Transcription
Food Image Recognition by Deep Learning. Assoc. Prof. Steven HOI, School of Information Systems, Singapore Management University
National Day Rally 2017: Singapore's War on Diabetes. www.moh.gov.sg/budget2016 “Four simple ways to fight diabetes: Go for regular medical check-ups; Exercise more; Watch your diet; and Cut down on soft drinks.” - PM Lee Hsien Loong
Traditional Food les/images/food-journal-1 0.jpg
Smart Food Logging: Healthy 365. Powered by
Roadmap: Problem, Approach, Research, Cases
Food Image Recognition: Visual Recognition. Laksa? Machine Learning
Food Image Recognition: Could be very challenging
Singapore Tea or Teh
  Teh, tea with milk and sugar
  Teh-C, tea with evaporated milk
  Teh-C-kosong, tea with evaporated milk and no sugar
  Teh-O, tea with sugar only
  Teh-O-kosong, plain tea without milk or sugar
  Teh tarik, the Malay tea
  Teh-halia, tea with ginger water
  Teh-bing, tea with ice, aka Teh-ice
  Teh-siu-dai, tea with less sugar
  Teh-gah-dai, tea with extra sweetened milk
/madnesskopiteh.jpg
Food Name Hierarchy
Food Item                          Visual Food       Food Category
Teh O                              Teh O             Tea, no milk
Teh O siu dai                      Teh O             Tea, no milk
Teh O kosong                       Teh O             Tea, no milk
Green tea                          Green tea         Tea, no milk
Green tea (no sugar)               Green tea         Tea, no milk
Iced lemon tea                     Iced lemon tea    Tea, no milk
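The three-level naming scheme can be sketched as two lookup tables. This is only an illustration, assuming the slide groups Teh O, Teh O siu dai and Teh O kosong under the visual food "Teh O", the two green teas under "Green tea", and places all three visual foods in the category "Tea, no milk". The names `FOOD_ITEM_TO_VISUAL`, `VISUAL_TO_CATEGORY`, `resolve` and `candidate_items` are hypothetical and do not come from FoodAI.

```python
# Hypothetical lookup tables built only from the rows on the slide.
# Several food items can share one visual food because they look the same in a photo.
FOOD_ITEM_TO_VISUAL = {
    "Teh O": "Teh O",
    "Teh O siu dai": "Teh O",
    "Teh O kosong": "Teh O",
    "Green tea": "Green tea",
    "Green tea (no sugar)": "Green tea",
    "Iced lemon tea": "Iced lemon tea",
}

# Assumed reading of the slide: all three visual foods fall under one category.
VISUAL_TO_CATEGORY = {
    "Teh O": "Tea, no milk",
    "Green tea": "Tea, no milk",
    "Iced lemon tea": "Tea, no milk",
}


def resolve(food_item):
    """Return (visual food, food category) for a logged food item."""
    visual = FOOD_ITEM_TO_VISUAL[food_item]
    return visual, VISUAL_TO_CATEGORY[visual]


def candidate_items(visual_food):
    """List the food items a user could choose from once a visual food is predicted."""
    return sorted(
        item for item, visual in FOOD_ITEM_TO_VISUAL.items() if visual == visual_food
    )


if __name__ == "__main__":
    print(resolve("Teh O kosong"))
    print(candidate_items("Green tea"))
```

A classifier trained on visual foods would predict "Teh O", and the sugar level would still have to come from the user, since it cannot be seen in the image.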
Roadmap: Problem, Approach, Research, Cases
Visual Recognition
  Classical Computer Vision: Feature Extraction, then Trainable Classifier (ML). Outputs: Laksa, Mee siam, Mee Goreng
  Deep Learning Approach: Deep NN. Outputs: Laksa, Mee siam, Mee Goreng
Deep Convolutional Neural Networks (CNN): Low-level, Mid-level, High-level features. LeNet [LeCun et al. 1998]. Photos taken from neural-network.html
Deep CNN for Visual Recognition: Revolution of Depth. From AlexNet (8 layers) in 2012 [Krizhevsky et al. 2012]
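The low-level features of a CNN come from small filters slid across the image. As a rough illustration only (this is not LeNet; the 4x4 image and the hand-picked 2x2 filter are assumptions made for the sketch, whereas a real CNN learns its filters), a single convolution followed by ReLU can be written in plain Python:

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image without padding (cross-correlation, as in most CNN libraries)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Keep positive responses and zero out the rest."""
    return [[max(0, value) for value in row] for row in feature_map]


# Toy image: dark on the left, bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-picked filter that responds to a dark-to-bright vertical edge.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

feature_map = relu(conv2d_valid(image, vertical_edge))

if __name__ == "__main__":
    for row in feature_map:
        print(row)
```

The response is non-zero only in the column where the brightness changes, which is the kind of edge detector a first layer tends to learn. Deeper layers combine such maps into mid-level and high-level features.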
Why Deep Learning? [Figure: accuracy versus data size, from small data to big data, comparing Deep Learning with Traditional Learning. Diagram labels: Machine Learning, HPC (GPU), Data, Product]
GPU for High Performance Computing: Deep Learning on GPU Clusters
  DGX-1: NVIDIA Pascal-powered Tesla P100
  Performance equal to 250 conventional servers
  NVIDIA DGX-1 AI Supercomputer
  Singapore's 1st DGX-1 Deep Learning Supercomputer (with P100 GPUs)
SG-FOOD
SGFOOD Data Statistics
SGFood724 Dataset     Training    Validation    Test
# total images        361,676     7,240         36,200
# images per class    500         10            50
# Food Items: 1038
# Visual Food: 724
# Food Category: 158
Histogram of # visual foods (724 visual food classes)
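The split sizes can be checked against the per-class counts. The sketch below multiplies the 724 visual food classes by the stated images per class (500 training, 10 validation, 50 test) and compares the result with the reported totals (361,676, 7,240 and 36,200). The variable and function names are hypothetical.

```python
NUM_VISUAL_CLASSES = 724

# Figures as reported on the slide.
IMAGES_PER_CLASS = {"training": 500, "validation": 10, "test": 50}
REPORTED_TOTALS = {"training": 361_676, "validation": 7_240, "test": 36_200}


def nominal_total(split):
    """Total images if every class had exactly the stated number of images."""
    return NUM_VISUAL_CLASSES * IMAGES_PER_CLASS[split]


def shortfall(split):
    """How many images the reported total falls below the nominal total."""
    return nominal_total(split) - REPORTED_TOTALS[split]


if __name__ == "__main__":
    for split in IMAGES_PER_CLASS:
        print(split, nominal_total(split), REPORTED_TOTALS[split], shortfall(split))
    average = REPORTED_TOTALS["training"] / NUM_VISUAL_CLASSES
    print(f"average training images per class: {average:.2f}")
```

Validation and test match exactly (724 x 10 and 724 x 50). The training total is 324 images below 724 x 500 = 362,000, so "500 images per class" is best read as a nominal figure, with an average of about 499.55 per class.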
FoodAI: Open API Services http://www.foodai.org
FoodAI System [Diagram components: Engine, API Service, Model Training, External Data Collection, Web, Database, Annotation System]
Roadmap: Problem, Approach, Research, Cases
Research Challenges
  How to train a good CNN model?
  How to deal with new food?
  How does the labeled data size affect the accuracy?
Model Training: A Family of CNN models for visual recognition
  ImageNet: 1000 classes, 1.2 million images for training
  “An Analysis of Deep Neural Network Models for Practical Applications”, Alfredo Canziani, Adam Paszke, Eugenio Culurciello. Published 2016 in arXiv
Experimental Setups
CNN Models
  GoogLeNet
  ResNet: 18, 50, 101, 152
Settings
  Toolbox: Caffe & TensorFlow
  Finetuned from ImageNet pretrained models
  Batch Size: From 16 to 128
  Optimizer: SGD with momentum / RMSProp / Adam
  Learning rate: Fixed / multi-step / exponential decay
  Dropout / Batch Normalization
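The three learning-rate policies and the SGD-with-momentum update listed in the settings can be sketched generically as below. These are textbook formulations, not the exact configuration used in the experiments; every numeric value (base rate, milestones, decay factor, momentum) is an assumed placeholder.

```python
def fixed_lr(base_lr, step):
    """Fixed policy: the rate never changes."""
    return base_lr


def multistep_lr(base_lr, step, milestones, gamma=0.1):
    """Multi-step policy: multiply the rate by gamma each time a milestone is passed."""
    passed = sum(1 for milestone in milestones if step >= milestone)
    return base_lr * gamma ** passed


def exponential_lr(base_lr, step, gamma):
    """Exponential decay: multiply the rate by gamma at every step."""
    return base_lr * gamma ** step


def sgd_momentum_step(weight, velocity, grad, lr, momentum=0.9):
    """One SGD-with-momentum update for a single scalar weight."""
    velocity = momentum * velocity - lr * grad
    return weight + velocity, velocity


if __name__ == "__main__":
    # Placeholder schedule: drop the rate tenfold at steps 10 and 20.
    for step in (0, 10, 25):
        print(step, fixed_lr(0.1, step), multistep_lr(0.1, step, [10, 20]))
    print(exponential_lr(1.0, 3, 0.5))
    print(sgd_momentum_step(weight=1.0, velocity=0.0, grad=2.0, lr=0.1))
```

When fine-tuning from ImageNet-pretrained weights, a decaying schedule is a common choice because large late updates can undo the pretrained features, but which policy worked best here is not stated on the slide.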
Benchmark of FoodAI
724 visual food classes, 361,676 images for training, 500 images per class
Models (SGFOOD)         Top-1 Accuracy (%)    Top-5 Accuracy (%)
[model name missing]    [missing]             93.3
ResNet-101              73.2                  91.9
ResNet-152              74.7                  92.7
1000 object classes, 1.2 million images for training, 1200 images per class
Models (IMAGENET)       Top-1 Accuracy (%)    Top-5 Accuracy (%)
ResNet-101              78.2                  93.9
ResNet-152              78.6                  94.3
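Top-1 and Top-5 accuracy count a prediction as correct when the true class is the highest-scoring class, or among the five highest-scoring classes, respectively. A minimal sketch of the metric follows; the scores are made-up toy values over three classes (so k=2 stands in for k=5), and `top_k_accuracy` is a hypothetical helper, not part of any benchmark code.

```python
def top_k_accuracy(scores, labels, k):
    """Percentage of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(labels)


CLASSES = ["Laksa", "Mee siam", "Mee Goreng"]

# Toy classifier scores, one row per image, one column per class.
scores = [
    [0.7, 0.2, 0.1],  # true class Laksa: correct at top-1
    [0.5, 0.4, 0.1],  # true class Mee siam: wrong at top-1, correct at top-2
    [0.6, 0.3, 0.1],  # true class Mee Goreng: wrong at both
    [0.1, 0.1, 0.8],  # true class Mee Goreng: correct at top-1
]
labels = [0, 1, 2, 2]

if __name__ == "__main__":
    print("top-1:", top_k_accuracy(scores, labels, k=1))
    print("top-2:", top_k_accuracy(scores, labels, k=2))
```

Top-k accuracy can never be lower than Top-1, which is why the Top-5 column is consistently higher. For visually similar foods, a Top-5 list gives the user a short menu to pick the right item from.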
Food Saliency Map
How to handle NEW food?
  Too many possible food items in the market
  Only consider popular food for the majority of users
  New food has few images available at the beginning
  Workflow: New food discovery, New food image annotation, Model re-training with new food, Update FoodAI Inference Engine
What if 10x less labeled data is available to train a CNN model?
Training on 10x less labeled data [Bar chart: Top-1 accuracy and Top-5 accuracy for ResNet-50 (10%), ResNet-50 (10%) augmentation, and ResNet-50 (100%). Values shown: 83.6, 82.7, 60.0, 58.0, 76.1, 93.3]
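One way to set up such an experiment is to keep 10% of the labeled images in every class, so the class balance is preserved, and then enlarge the reduced set with augmentation. The sketch below assumes per-class random sampling and a horizontal flip; the slide does not say how the 10% subset was drawn or which augmentations were applied, so both are illustrative choices, and all names are hypothetical.

```python
import random
from collections import defaultdict


def stratified_subset(samples, fraction, seed=0):
    """Keep the same fraction of images in every class.

    samples: list of (image_id, label) pairs.
    """
    by_class = defaultdict(list)
    for image_id, label in samples:
        by_class[label].append(image_id)

    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    subset = []
    for label in sorted(by_class):
        ids = by_class[label]
        keep = min(len(ids), max(1, round(len(ids) * fraction)))
        subset.extend((image_id, label) for image_id in rng.sample(ids, keep))
    return subset


def horizontal_flip(image):
    """Mirror an image stored as a list of pixel rows (an assumed augmentation)."""
    return [row[::-1] for row in image]


if __name__ == "__main__":
    # Toy dataset: 3 classes with 500 image ids each.
    samples = [
        (f"{label}_{i}", label)
        for label in ("Laksa", "Mee siam", "Mee Goreng")
        for i in range(500)
    ]
    small = stratified_subset(samples, fraction=0.1)
    print(len(small))  # 50 images per class are kept
    print(horizontal_flip([[1, 2, 3], [4, 5, 6]]))
```

Sampling per class rather than over the whole pool avoids ending up with classes that have almost no training images.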
Roadmap: Problem, Approach, Research, Cases
Case Studies: Food logging photos from users. Web, Mobile App. Powered by
Case Studies: Easy Cases
Case Studies: Hard Cases. Large inter-class similarity (e.g., drinks): Kopi O, Americano
Case Studies: Hard Cases. Large inter-class similarity (e.g., drinks): Instant Coffee, Teh C / Teh, Plain Porridge, Soya milk
Case Studies: Hard Cases. Large inter-class similarity (e.g., drinks): Instant Coffee, Teh O, Teh / Teh C
Case Studies: Hard Cases. Large intra-class diversity (e.g., Economy rice)
Case Studies: Hard Cases. Incomplete Food
Case Studies: Hard Cases. Non Food
Case Studies: Hard Cases. Poorly taken photos (illumination, rotation, occlusion, etc.)
Case Studies: Hard Cases. Multiple food items
Case Studies: Hard Cases. Unknown food / food not in our list
How to build a more sustainable solution?
  Better Learning: Go beyond supervised CNN
  Crowdsourcing: Combined with human wisdom
Thank w.larc.smu.edu.sg