Food Image Recognition By Deep Learning - Nvidia

Food Image Recognition by Deep Learning
Assoc. Prof. Steven HOI
School of Information Systems
Singapore Management University

National Day Rally 2017: Singapore's War on Diabetes
www.moh.gov.sg/budget2016
“Four simple ways to fight diabetes: Go for regular medical check-ups; Exercise more; Watch your diet; and Cut down on soft drinks.”
- PM Lee Hsien Loong

Traditional Food les/images/food-journal-1 0.jpg

Smart Food Logging
Healthy 365
Powered by

Roadmap: Problem, Approach, Research, Cases

Food Image Recognition
Visual Recognition
Laksa?
Machine Learning

Food Image Recognition
Could be very challenging
Singapore Tea or Teh
Teh: tea with milk and sugar
Teh-C: tea with evaporated milk
Teh-C-kosong: tea with evaporated milk and no sugar
Teh-O: tea with sugar only
Teh-O-kosong: plain tea without milk or sugar
Teh tarik: the Malay tea
Teh-halia: tea with ginger water
Teh-bing: tea with ice, aka Teh-ice
Teh-siu-dai: tea with less sugar
Teh-gah-dai: tea with extra sweetened milk
/madnesskopiteh.jpg

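Food Name Hierarchy
| Food Item            | Visual Food    | Food Category |
| Teh O                | Teh O          | Tea, no milk  |
| Teh O siu dai        | Teh O          | Tea, no milk  |
| Teh O kosong         | Teh O          | Tea, no milk  |
| Green tea            | Green tea      | Tea, no milk  |
| Green tea (no sugar) | Green tea      | Tea, no milk  |
| Iced lemon tea       | Iced lemon tea | Tea, no milk  |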
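The three-level naming scheme on this slide can be sketched as two lookup tables. This is only an illustration: the dictionary names and the `resolve` helper are hypothetical, and the row-to-column assignment (every listed item falling under "Tea, no milk") is one reading of the flattened slide table, not a confirmed FoodAI data structure.

```python
# Hypothetical lookup tables; entries mirror the slide's examples only.
# Level 1 -> level 2: several food items share one visual food class
# because they look the same in a photo (sugar level is not visible).
ITEM_TO_VISUAL = {
    "Teh O": "Teh O",
    "Teh O siu dai": "Teh O",
    "Teh O kosong": "Teh O",
    "Green tea": "Green tea",
    "Green tea (no sugar)": "Green tea",
    "Iced lemon tea": "Iced lemon tea",
}

# Level 2 -> level 3: visual food classes roll up into a food category.
# Assumption: all three visual foods belong to "Tea, no milk".
VISUAL_TO_CATEGORY = {
    "Teh O": "Tea, no milk",
    "Green tea": "Tea, no milk",
    "Iced lemon tea": "Tea, no milk",
}


def resolve(food_item):
    """Return (visual food, food category) for a food item name."""
    visual = ITEM_TO_VISUAL[food_item]
    return visual, VISUAL_TO_CATEGORY[visual]


print(resolve("Teh O kosong"))
```

Under this reading, an image classifier only needs to separate visual food classes; the finer food item (for example the sugar level) would have to come from somewhere other than the photo.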

Visual Recognition
Classical Computer Vision: Feature Extraction -> Trainable Classifier (ML) -> Laksa / Mee siam / Mee Goreng
Deep Learning Approach: Deep NN -> Laksa / Mee siam / Mee Goreng

Deep Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)
Low-level, Mid-level, High-level
LeNet [LeCun et al. 1998]
Photos taken from neural-network.html
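For readers who want to see the basic operation a convolutional layer repeats, here is a minimal sketch in plain Python. The function name `conv2d_valid` and the toy image and kernel are illustrative assumptions; real frameworks use optimized GPU kernels and learn the kernel weights during training.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` without padding (stride 1).

    As in most deep learning frameworks, this is technically
    cross-correlation: the kernel is not flipped.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Toy 2x4 image with a dark-to-bright vertical edge in the middle.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-picked 1x2 kernel that responds to left-to-right brightness increases.
edge_kernel = [[-1, 1]]

feature_map = conv2d_valid(image, edge_kernel)
print(feature_map)
```

The feature map is non-zero only where the edge sits, which is the kind of low-level response early CNN layers tend to produce before deeper layers combine them.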

Deep CNN for Visual Recognition
Revolution of Depth
From AlexNet (8 layers) in 2012 [Krizhevsky et al. 2012]

Why Deep Learning?
[Figure: Accuracy plotted against Data Size (small data to big data) for Traditional Learning and Deep Learning. Other labels: Machine Learning, HPC (GPU), Data, Product.]

GPU for High Performance Computing
Deep Learning on GPU Clusters
DGX-1: NVIDIA Pascal-powered Tesla P100
Performance equal to 250 conventional servers.
NVIDIA DGX-1 AI Supercomputer
Singapore's 1st DGX-1 Deep Learning Supercomputer (with P100 GPUs)

SG-FOOD

SGFOOD Data Statistics
SGFood724 Dataset
|                    | Training | Validation | Test   |
| # total images     | 361,676  | 7,240      | 36,200 |
| # images per class | 500      | 10         | 50     |
# Food Items: 1038
# Visual Food: 724
# Food Category: 158
Histogram of # visual foods (724 visual food classes)
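The split sizes can be sanity-checked with a few lines of arithmetic. This sketch assumes the figures 500, 10 and 50 are per-class image counts for the training, validation and test splits, which is a reading of the flattened slide table; the variable names are illustrative.

```python
NUM_VISUAL_CLASSES = 724

# Assumed per-class image counts for each split (read from the slide).
PER_CLASS = {"training": 500, "validation": 10, "test": 50}

# Totals as reported on the slide.
REPORTED_TOTALS = {"training": 361_676, "validation": 7_240, "test": 36_200}


def nominal_totals(num_classes, per_class):
    """Totals you would get if every class had exactly the per-class count."""
    return {split: num_classes * n for split, n in per_class.items()}


nominal = nominal_totals(NUM_VISUAL_CLASSES, PER_CLASS)
shortfall = {split: nominal[split] - REPORTED_TOTALS[split] for split in nominal}

for split in nominal:
    print(split, nominal[split], REPORTED_TOTALS[split], shortfall[split])
```

Under that assumption the validation and test totals match exactly, while the training total is slightly below 724 x 500, which would be consistent with a few classes having fewer than 500 training images.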

FoodAI: Open API Services
http://www.foodai.org

FoodAI System
[Figure: system diagram with components Engine, API Service, Model Training, External Data Collection (Web), Database, Annotation System]

Research Challenges
How to train a good CNN model?
How to deal with new food?
How does the labeled data size affect the accuracy?

Model Training
A Family of CNN models for visual recognition
ImageNet: 1000 classes, 1.2 million images for training
“An Analysis of Deep Neural Network Models for Practical Applications”, Alfredo Canziani, Adam Paszke, Eugenio Culurciello. Published 2016 in arXiv.

Experimental Setups
CNN Models
GoogLeNet
ResNet: 18, 50, 101, 152
Settings
Toolbox: Caffe & TensorFlow
Fine-tuned from ImageNet pretrained models
Batch Size: From 16 to 128
Optimizer: SGD with momentum / RMSProp / Adam
Learning rate: Fixed / multi-step / exponential decay
Dropout / Batch Normalization
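The three learning-rate policies named in the settings can be written as plain functions of the training step. This is a framework-independent sketch: the slide does not give the base rate, milestones or decay factors, so every numeric value below is an illustrative assumption, and the function names are hypothetical rather than Caffe or TensorFlow API calls.

```python
def fixed_lr(base_lr, step):
    """Fixed policy: the learning rate never changes."""
    return base_lr


def multistep_lr(base_lr, step, milestones, gamma=0.1):
    """Multi-step policy: multiply by `gamma` each time a milestone is passed."""
    drops = sum(1 for m in milestones if step >= m)
    return base_lr * gamma ** drops


def exponential_lr(base_lr, step, decay_rate=0.96, decay_steps=1000):
    """Exponential decay: shrink smoothly by `decay_rate` every `decay_steps`."""
    return base_lr * decay_rate ** (step / decay_steps)


# Illustrative values only; none of these come from the slides.
base_lr = 0.01
milestones = [30_000, 60_000]
for step in (0, 30_000, 60_000, 90_000):
    print(
        step,
        fixed_lr(base_lr, step),
        multistep_lr(base_lr, step, milestones),
        exponential_lr(base_lr, step),
    )
```

When fine-tuning from ImageNet-pretrained weights, a smaller base rate than training from scratch is a common choice, but the right schedule is something the experiments would have to determine.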

Benchmark of FoodAI
724 visual food classes, 361,676 images for training, 500 images per class
| Models (SGFOOD) | Top-1 Accuracy (%) | Top-5 Accuracy (%) |
| …               | …                  | 93.3               |
| ResNet-101      | 73.2               | 91.9               |
| ResNet-152      | 74.7               | 92.7               |
1000 object classes, 1.2 million images for training, 1200 images per class
| Models (IMAGENET) | Top-1 Accuracy (%) | Top-5 Accuracy (%) |
| …                 | …                  | …                  |
| ResNet-101        | 78.2               | 93.9               |
| ResNet-152        | 78.6               | 94.3               |
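Top-1 and Top-5 accuracy, the two metrics in this benchmark, can be computed as follows. This is a minimal plain-Python sketch; the function name and the toy scores are illustrative and unrelated to the actual FoodAI evaluation code.

```python
def top_k_accuracy(scores, labels, k):
    """Percentage of samples whose true label is among the k highest scores.

    scores: one list of per-class scores per sample.
    labels: the true class index of each sample.
    """
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score, truncated to k.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in top_k
    return 100.0 * hits / len(labels)


# Toy example: 3 samples, 4 classes.
scores = [
    [0.1, 0.6, 0.2, 0.1],  # true class 1 is ranked first
    [0.5, 0.3, 0.1, 0.1],  # true class 1 is ranked second
    [0.2, 0.3, 0.4, 0.1],  # true class 0 is ranked third
]
labels = [1, 1, 0]

for k in (1, 2, 3):
    print(k, round(top_k_accuracy(scores, labels, k), 1))
```

Top-5 is the more forgiving metric, which suits food logging: an app can show several candidates and let the user pick the right one.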

Food Saliency Map

How to handle NEW food?
Too many possible food items in the market
Only consider popular food for the majority of users
New food has few images available at the beginning
New food discovery -> New food image annotation -> Model re-training with new food -> Update FoodAI Inference Engine

What if only one-tenth as much labeled data is available to train a CNN model?

Training on 10x less labeled data
[Figure: bar chart of Top-1 and Top-5 accuracy for ResNet-50 (10%) augmentation, ResNet-50 (100%), and ResNet-50 (10%). Values as extracted: 83.6, 82.7, 60.0, 58.0, 76.1, 93.3]
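One way to set up a reduced-data experiment like this is to keep a fixed fraction of the images in every class and then augment what remains. The sketch below is an assumption about how that could be done, not the authors' procedure: the slides do not say how the 10% subset was drawn or which augmentations were used, and horizontal flipping is shown only as a common example.

```python
import random
from collections import defaultdict


def subsample_per_class(samples, fraction, seed=0):
    """Keep `fraction` of the samples in each class (at least one per class).

    samples: list of (image_id, label) pairs.
    """
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    by_label = defaultdict(list)
    for image_id, label in samples:
        by_label[label].append(image_id)

    kept = []
    for label, ids in sorted(by_label.items()):
        n_keep = max(1, round(len(ids) * fraction))
        kept.extend((image_id, label) for image_id in rng.sample(ids, n_keep))
    return kept


def horizontal_flip(image):
    """Mirror an image stored as a list of pixel rows (assumed augmentation)."""
    return [row[::-1] for row in image]


# Toy dataset: 3 classes with 50 images each.
samples = [(f"img{c}_{i}", c) for c in range(3) for i in range(50)]
subset = subsample_per_class(samples, 0.1)
print(len(subset))
print(horizontal_flip([[1, 2, 3], [4, 5, 6]]))
```

Sampling per class keeps the subset balanced, so with 500 training images per class a 10% subset would leave about 50 per class.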

Case Studies: Food logging photos from users
Web
Mobile App
Powered by

Case Studies: Easy Cases

Case Studies: Hard Cases
Large inter-class similarity (e.g., drinks)
Kopi O, Americano

Case Studies: Hard Cases
Large inter-class similarity (e.g., drinks)
Instant Coffee, Teh C / Teh, Plain Porridge, Soya milk

Case Studies: Hard Cases
Large inter-class similarity (e.g., drinks)
Instant Coffee, Teh O, Teh / Teh C

Case Studies: Hard Cases
Large intra-class diversity (e.g., Economy rice)

Case Studies: Hard Cases
Incomplete Food

Case Studies: Hard Cases
Non Food

Case Studies: Hard Cases
Poorly taken photos (illumination, rotation, occlusion, etc.)

Case Studies: Hard Cases
Multiple food items

Case Studies: Hard Cases
Unknown food / food not in our list

How to build a more sustainable solution?
Better Learning: Go beyond supervised CNN
Crowdsourcing: Combined with human wisdom

Thank w.larc.smu.edu.sg
