Access Purpose-built ML Hardware with Web Neural Network API

Transcription

Access purpose-built ML hardware with Web Neural Network API
Ningxin Hu, Intel Corporation
July 2020

The JS ML frameworks and AI Web apps
[Diagram: AI features of Web apps (e.g. recognition, noise suppression) run on JS ML frameworks in the Web browser; the frameworks execute through WebAssembly on the CPU and through WebGL/WebGPU on the GPU.]
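To ground the diagram, here is a minimal sketch of this execution path using TensorFlow.js (named later in the deck); tf.setBackend chooses between the WebGL (GPU) and WebAssembly (CPU) paths shown above. The model URL is a hypothetical placeholder.

    // Sketch: a JS ML framework running on the Web execution paths from this
    // slide. 'webgl' runs kernels on the GPU, 'wasm' on the CPU.
    import * as tf from '@tensorflow/tfjs';
    import '@tensorflow/tfjs-backend-wasm'; // registers the Wasm (CPU) backend

    async function classify(image) {
      await tf.setBackend('webgl'); // or 'wasm' to target the CPU
      await tf.ready();
      // Hypothetical model URL, for illustration only.
      const model = await tf.loadGraphModel('https://example.com/mobilenet/model.json');
      const input = tf.browser.fromPixels(image)
          .resizeBilinear([224, 224])
          .expandDims(0)
          .toFloat()
          .div(255);
      return model.predict(input);
    }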

The purpose-built ML hardware
[Diagram: the same stack, now with ML extensions on the CPU and GPU and purpose-built ML hardware alongside them: NPUs, VPUs, and DSPs.]

The performance gap: Web and native
[Charts: MobileNet* inference latency in ms (smaller is better). Laptop with VNNI**: Wasm/SIMD128/FP32 33, OpenVINO/CPU/FP32 3.4, WebGL/GPU/FP16 26.8, OpenVINO/GPU/FP16 3, OpenVINO/VNNI/INT8 1.1. Smartphone with DSP: Wasm/SIMD128/FP32 and WebGL/GPU/FP16 versus NNAPI/CPU/FP32, NNAPI/GPU/FP16, and NNAPI/DSP/INT8, with 2.6X, 5.8X, and 16X Web-to-native gaps annotated.]
* Batch size: 1, input size: 224x224, width multiplier: 1.0
** VNNI: Vector Neural Network Instruction

The Web is disconnected from ML hardware
[Diagram: AI features of Web apps and JS ML frameworks (TensorFlow.js, ONNX.js, Paddle.js, OpenCV.js) reach the CPU through WebAssembly and the GPU through WebGL/WebGPU, but have no path to the ML extensions, NPU, VPU, or DSP.]

WebNN: the architecture view
[Diagram: Web apps load TensorFlow models, ONNX models, and other models into JS ML frameworks (TensorFlow.js, ONNX.js, etc.) running in the Web browser on WebAssembly and WebNN; WebNN maps to native ML APIs (DirectML on Windows, NN API on Android, OpenVINO on Linux) and through them to CPUs and GPUs with ML extensions and to ML accelerators.]
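As a hedged sketch of how a JS ML framework might target this architecture, the snippet below feature-detects the navigator.ml entry point from the 2020 WebNN draft and falls back to the existing Web paths otherwise; the backend names are illustrative, not part of any spec.

    // Sketch: backend selection in a framework that can delegate to WebNN.
    // navigator.ml is the entry point in the 2020 WebNN draft; through it the
    // browser can reach DirectML, NN API, or OpenVINO and the ML hardware.
    function selectBackend() {
      if (typeof navigator !== 'undefined' && navigator.ml !== undefined) {
        return 'webnn'; // hardware-accelerated path via native ML APIs
      }
      // Fall back to the existing Web execution paths.
      return (typeof WebGLRenderingContext !== 'undefined') ? 'webgl' : 'wasm';
    }

    console.log(`Selected inference backend: ${selectBackend()}`);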

WebNN: the programming model
nn = navigator.ml.getNeuralNetworkContext()
Build a computational graph: nn.input / nn.constant / nn.conv2d / nn.add / nn.relu / ...
Compilation: compile the computational graph
Execution: set inputs and outputs, then startCompute
webmachinelearning.github.io/webnn/
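A minimal sketch of this model follows. The graph ops (nn.input, nn.constant, nn.conv2d, nn.add, nn.relu) and startCompute come from the slide; the descriptor format and the createModel/compile/createExecution plumbing follow the 2020 draft and are assumptions here, since the WebNN spec has changed since.

    // Sketch of the 2020-draft WebNN programming model:
    // build a computational graph (conv2d -> add -> relu), compile, execute.
    const nn = navigator.ml.getNeuralNetworkContext();

    // Build the computational graph.
    const input = nn.input('input',
        {type: 'tensor-float32', dimensions: [1, 3, 224, 224]});
    const filter = nn.constant(
        {type: 'tensor-float32', dimensions: [32, 3, 3, 3]},
        new Float32Array(32 * 3 * 3 * 3)); // real weights would be loaded, not zeros
    const bias = nn.constant(
        {type: 'tensor-float32', dimensions: [1, 32, 1, 1]},
        new Float32Array(32));
    const output = nn.relu(nn.add(nn.conv2d(input, filter), bias));

    // Compile the graph, then execute it on the underlying hardware.
    const model = await nn.createModel([{name: 'output', operand: output}]);
    const compilation = await model.compile();
    const execution = await compilation.createExecution();
    execution.setInput('input', new Float32Array(1 * 3 * 224 * 224));
    const result = new Float32Array(1 * 32 * 222 * 222); // valid-padding output size
    execution.setOutput('output', result);
    await execution.startCompute();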

WebNN: the proof-of-concept implementation
[Diagram: in the PoC browser, the renderer process exposes WebNN and a service in the GPU process dispatches to per-OS implementations (Android, Windows, Linux, macOS) that call native APIs such as NN API, DirectML, and MPS/BNNS to reach CPUs and GPUs with ML extensions and ML accelerators.]

WebNN: the demos
[Demos: WebNN image classification running on a laptop with VNNI and on a smartphone with DSP.]

WebNN: the PoC performance
[Chart: MobileNet* inference latency in ms on a laptop with VNNI** (smaller is better), comparing Wasm and WebGL against WebNN/OpenVINO/GPU/FP16 and WebNN/OpenVINO/VNNI/INT8.]
* Batch size: 1, input size: 224x224, width multiplier: 1.0
** VNNI: Vector Neural Network Instruction

WebNN: the PoC performance – cont’d
[Chart: MobileNet* inference latency in ms on a smartphone with DSP (smaller is better), comparing Wasm/SIMD128/FP32 and WebGL/GPU/FP16 against WebNN/NNAPI/DSP/INT8 and native NNAPI/CPU/FP32, NNAPI/GPU/FP16, and NNAPI/DSP/INT8.]
* Batch size: 1, input size: 224x224, width multiplier: 1.0

Call to action
webmachinelearning.github.io/webnn/

Thanks

Appendix
WebNN spec: https://webmachinelearning.github.io/webnn/
WebML CG: https://www.w3.org/community/webmachinelearning/
NN API: https://developer.android.com/ndk/guides/neuralnetworks
DirectML: https://docs.microsoft.com/en-us/windows/win32/direct3d12/dml-intro
MPS: https://developer.apple.com/documentation/metalperformanceshaders
OpenVINO: https://docs.openvinotoolkit.org/
TensorFlow.js: https://js.tensorflow.org/
ONNX.js: https://github.com/microsoft/onnxjs
Paddle.js: evelop/web
OpenCV.js: https://docs.opencv.org/3.4.10/d5/d10/tutorial_js_root.html
AI-benchmark: http://ai-benchmark.com/
TensorFlow.js benchmark:
Wasm SIMD128: https://github.com/WebAssembly/simd
AVX512-VNNI: https://en.wikichip.org/wiki/x86/avx512_vnni
