TensorFlow Lite - Red Cat Labs

Transcription

TensorFlow Lite: Lightweight, cross-platform solution for mobile and embedded devices

Martin Andrews, Google Developer Expert (Machine Learning), Red Dragon AI

Why TensorFlow Lite?

ML runs in many places: Access to more data. Fast and closely knit interactions. Privacy preserving.

Creates many challenges: Reduced compute power. Limited memory. Battery constraints.

Simplifying ML on-device: TensorFlow Lite makes these challenges much easier!

What can I do with it?

Many use cases: Recognition, Object detection, Translation, Video generation, Prediction, Text to Speech, Object location, Voice synthesis, Text generation, Speech to Text, OCR, Gesture recognition, Facial modelling, Segmentation, Clustering, Compression, Super resolution, Audio generation.

Who is using it?

2B mobile devices have TensorFlow Lite deployed on them in production.

Some of the users: Photos, NetEase, GBoard, iQiyi, Gmail, AutoML, Nest, ML Kit, Assistant, and many more.

Google Assistant is on 1B devices. Wide range of devices: high/low end, ARM, x86, battery powered, plugged in, many operating systems. Phones, TVs, Speakers, Laptops, Smart Displays, Cars, Wearables, Others.

Key speech on-device capabilities: “Hey Google” hotword with Voice Match (tiny memory and computation footprint, running continuously; extremely latency sensitive). On-device speech recognition (high computation running in shorter bursts).

Online education brand with the largest number of users in China: 800 million users in total, 22 million DAU.

Youdao applications with TensorFlow Lite: Youdao Dictionary, Youdao Translator, U-Dictionary.

Youdao on-device AI translation & OCR: Applied in the Youdao dictionary and translator apps. Offline photo translation speed improved 30-40%. Supports realtime AR translation.

Model conversion: The conversion flow to TensorFlow Lite is simple: TensorFlow (estimator or Keras) → SavedModel → TF Lite Converter → TF Lite Model.
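For reference, that flow maps to just a few lines of Python; a minimal sketch, assuming a SavedModel exported to a placeholder directory saved_model_dir:

import tensorflow as tf

# Convert a SavedModel (from an estimator or Keras model) into a
# TF Lite flatbuffer. "saved_model_dir" is a placeholder path.
converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
tflite_model = converter.convert()

# Write the converted model to disk for deployment on-device.
with open("model.tflite", "wb") as f:
    f.write(tflite_model)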

Model conversion: However, there are points of failure: limited ops; unsupported semantics (e.g. control flow in RNNs).

Model conversion: TensorFlow Select. Available now: enables hundreds more ops from TensorFlow on CPU. Caveat: binary size increase (~6MB compressed). In the pipeline: selective registration; improved performance.
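A sketch of opting in to TensorFlow Select through the converter (the OpsSet names are from the public tf.lite API; the SavedModel path is a placeholder):

import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
# Prefer native TF Lite kernels, but fall back to full TensorFlow ops
# for anything unsupported; this fallback is what costs extra binary size.
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,
    tf.lite.OpsSet.SELECT_TF_OPS,
]
tflite_model = converter.convert()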

Model conversion: Control flow support. In the pipeline: control flow is core to many ops (e.g. RNNs) and graphs, so we are adding support for: loops, conditions.

Inference performance (MobileNet V1 on Pixel 2, single-threaded CPU): CPU: 124 ms. CPU w/ quantization: 64 ms (1.9x). GPU (OpenGL): 16 ms (7.7x). Edge TPU (quantized, fixed-point): 2 ms (62x).
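For context, each of these timings is the cost of one interpreter invocation; a minimal Python sketch of a single inference pass (model path and zero-valued input are placeholders):

import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed one input tensor of the shape/dtype the model expects.
dummy = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])
interpreter.set_tensor(input_details[0]["index"], dummy)
interpreter.invoke()
output = interpreter.get_tensor(output_details[0]["index"])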

Benchmarking: Benchmarking and profiling. Available: improvements to the Model Benchmark tool: support for threading; per-op profiling; support for Android NN API.
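The benchmark tool itself is a separate binary; as a crude stand-in, average latency can be measured from Python like this (continuing the interpreter setup in the sketch above; the run count is arbitrary):

import time

runs = 50
start = time.perf_counter()
for _ in range(runs):
    interpreter.invoke()
elapsed_ms = (time.perf_counter() - start) / runs * 1000
print("average latency: %.2f ms" % elapsed_ms)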

Benchmarking: Per-op profiling breakdown.

Benchmarking: Profiling summary.

What is a delegate? A delegate hands execution of graph operations off to alternative kernels (e.g. hardware accelerators) instead of the default CPU kernels.

Fast execution: Android Neural Network API delegate. Enables hardware supported by the Android NN API.

Fast execution: GPU delegate. Preview available! 2-7x faster than the floating point CPU implementation. Adds 250KB to binary size (Android/iOS).

Fast execution: GPU delegate. In the pipeline: expand coverage of operations; further optimize performance; evolve and finalize the APIs; make it generally available!
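Delegates are normally attached from the Java/C++ APIs on device, but the Python API shows the shape of the mechanism; a sketch, assuming a platform-specific delegate shared library (the library name here is a placeholder):

import tensorflow as tf

# Load a delegate from a shared library and hand it to the interpreter;
# ops the delegate supports then run there instead of on the CPU kernels.
gpu_delegate = tf.lite.experimental.load_delegate("libtensorflowlite_gpu_delegate.so")
interpreter = tf.lite.Interpreter(
    model_path="model.tflite",
    experimental_delegates=[gpu_delegate])
interpreter.allocate_tensors()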

Fast execution: Edge TPU delegate. Enables next generation ML hardware! High performance; small physical and power footprint. Available in the Edge TPU development kit.

Optimization: Make your models even smaller and faster.

Optimization: Model optimization toolkit. Available now: post-training quantization (CPU). In the pipeline: Keras-based quantized training (CPU/NPU); post-training quantization (CPU/NPU); Keras-based connection pruning; other optimizations.

Optimization: Quantization. New tools: post-training quantization with float & fixed point. Great for CPU deployments!
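A minimal sketch of post-training (weight) quantization with the converter, using the tf.lite.Optimize flag from the public API (the SavedModel path is a placeholder):

import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
# Quantize weights after training; no retraining or calibration data needed.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_quant_model = converter.convert()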

Optimization: Quantization. Benefits: 4x reduction in model sizes. Models which consist primarily of convolutional layers get 10-50% faster execution (CPU). Fully-connected & RNN-based models get up to 3x speed-up (CPU).

Optimization: Quantization. In the pipeline: training with quantization (Keras-based API); post-training quantization with fixed-point math only. Even better performance on CPU, plus enables many NPUs!

Keras-based quantization API. First, the baseline Keras MNIST model:

import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(512, activation=tf.nn.relu),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax)])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test)

Keras-based quantization API. The same model with the layers to quantize wrapped (preview API as shown on the slide; names may change by release):

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    quantize.Quantize(tf.keras.layers.Dense(512, activation=tf.nn.relu)),
    quantize.Quantize(tf.keras.layers.Dense(10, activation=tf.nn.softmax))])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test)

Optimization: Quantization (post-training). Flow: TensorFlow (estimator or Keras) → SavedModel → TF Lite Converter → TF Lite Model.

Optimization: Quantization (post-training). Flow with calibration: TensorFlow (estimator or Keras) → SavedModel + calibration data → TF Lite Converter → TF Lite Model.
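The calibration data is fed in as a representative dataset so the converter can pick fixed-point ranges for activations; a sketch with random stand-in data (real inputs, shaped to match your model's input, should be used):

import numpy as np
import tensorflow as tf

# Stand-in calibration data: ~100 samples shaped like the model input.
calibration_images = np.random.rand(100, 28, 28).astype(np.float32)

def representative_dataset():
    for image in calibration_images:
        yield [image[np.newaxis, ...]]  # one batch-of-1 input per step

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
tflite_model = converter.convert()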

Optimization: Connection pruning. What does it mean? Drop connections during training; dense tensors will now be sparse (filled with zeros).

Optimization: Connection pruning. Benefits: smaller models (sparse tensors can be compressed); faster models (fewer operations to execute).

Optimization: Connection pruning. Coming soon: training with connection pruning in a Keras-based API (compression benefits). In the pipeline: inference support for sparse models (speed-ups on CPU and selected NPUs).
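The Keras pruning API later shipped in the tensorflow-model-optimization package; a minimal sketch, with an illustrative sparsity schedule (targets and step counts are arbitrary):

import tensorflow as tf
import tensorflow_model_optimization as tfmot

prune = tfmot.sparsity.keras.prune_low_magnitude

# Ramp sparsity from 0% to 50% over the first 1000 training steps.
schedule = tfmot.sparsity.keras.PolynomialDecay(
    initial_sparsity=0.0, final_sparsity=0.5,
    begin_step=0, end_step=1000)

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    prune(tf.keras.layers.Dense(512, activation='relu'), pruning_schedule=schedule),
    prune(tf.keras.layers.Dense(10, activation='softmax'), pruning_schedule=schedule),
])
# Training additionally needs the UpdatePruningStep callback in model.fit().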

Optimization: Pruning results: negligible accuracy loss at 50% sparsity; small accuracy loss at 75%.

Model repository: Added new model repository. In-depth sample applications & tutorials for: image classification, object detection, pose estimation, segmentation, smart reply.

TF Mobile: Deprecated. Provided 6 months of notice. Limiting developer support in favor of TensorFlow Lite. The code is still available on GitHub.

TensorFlow Lite for Microcontrollers: Smaller, cheaper & a wider range of devices.

What am I talking about? Tiny models on tiny computers! Microcontrollers are everywhere. Speech researchers were pioneers. Models just tens of kilobytes.

Here’s one I have in my pocket. Get ready for a live demo! https://www.sparkfun.com/products/15170 : 384KB RAM, 1MB Flash, $15. Low single-digit milliwatt power usage; days on a coin battery!

Why is this useful? Running entirely on-device, with tiny constraints: it’s using a 20KB model, and runs using less than 100KB of RAM and 80KB of Flash.

What is Coral? Coral is a platform for creating products with on-device ML acceleration. Our first products feature Google’s Edge TPU in SBC and USB accessory forms.

Edge TPU: A Google-designed ASIC that lets you run inference on-device: very fast inference speed (object detection in less than 15ms); enables greater data privacy; no reliance on a network connection; runs inference with TensorFlow Lite; enables unique workloads and new applications.
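Running a model on the Edge TPU follows Coral's documented pattern: compile the quantized model with the Edge TPU compiler, then attach the libedgetpu delegate; a sketch (the model filename is a placeholder):

import tflite_runtime.interpreter as tflite

# The Edge TPU delegate routes supported ops to the ASIC; the model
# must first be compiled with the Edge TPU compiler.
interpreter = tflite.Interpreter(
    model_path="model_edgetpu.tflite",
    experimental_delegates=[tflite.load_delegate("libedgetpu.so.1")])
interpreter.allocate_tensors()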

Coral Dev Board:
  CPU: i.MX 8M SoC w/ quad-core A53
  GPU: Integrated GC7000 Lite GPU
  TPU: Google Edge TPU
  RAM: 1GB LPDDR4
  Flash: 8GB eMMC
  Security/Crypto: eMMC secure block for TrustZone; MCHP ATECC608A crypto chip
  Power: 5V 3A via Type-C connector
  Connectors: USB-C, RJ45, 3.5mm TRRS, HDMI
  Supported OS: Mendel Linux (Debian derivative), Android
  Supported ML: TensorFlow Lite

Coral Accelerator:
  TPU: Google Edge TPU
  Power: 5V 3A via Type-C connector
  Connectors: USB 3.1 (gen 1) via USB Type-C
  Supported OS: Debian 6.0 or higher, other Debian derivatives
  Supported architectures: x86-64, ARMv8
  Supported ML: TensorFlow Lite

These actually exist! They're available now at coral.withgoogle.com

Get it. Try it. Code: github.com/tensorflow/tensorflow. Docs: tensorflow.org/lite/. Discuss: tflite@tensorflow.org mailing list.

Deep Learning MeetUp Group. The group: MeetUp.com/TensorFlow-and-Deep-Learning-Singapore, 3,500 members. The meetings: next on 16 April, hosted at Google. Something for beginners; something from the bleeding edge; lightning talks.

Deep Learning JumpStart Workshop. This Saturday (plus Tues & Thurs evenings next week). Hands-on with real model code. Build your own project. Action points: http://bit.ly/jump-start-march-2019. Cost is heavily subsidised for SC/PR.

Advanced Deep Learning Courses. Module #1: JumpStart (see previous slide). Each 'module' will include: in-depth instruction, by practitioners; individual projects; 70%-100% funding via IMDA for SG/PR. Action points: stay informed at http://bit.ly/rdai-courses-2019

Red Dragon AI: Intern Hunt. Opportunity to do Deep Learning “all day”. Key features: work on something cutting-edge (publish!); location: Singapore (SG/PR FTW) and/or remote. Action points: need to coordinate timing; contact Martin or Sam via LinkedIn.
