Jetson AGX Xavier and the New Era of Autonomous Machines - NVIDIA

Transcription

JETSON AGX XAVIER AND THE NEW ERA OF AUTONOMOUS MACHINES

WEBINAR AGENDA
- Intro to Jetson AGX Xavier
  - AI for Autonomous Machines
  - Jetson AGX Xavier Compute Module
  - Jetson AGX Xavier Developer Kit
- Xavier Architecture
  - Volta GPU
  - Deep Learning Accelerator (DLA)
  - Carmel ARM CPU
  - Vision Accelerator (VA)
- Jetson SDKs
  - JetPack 4.1
  - DeepStream SDK
  - ISAAC SDK
- Resources & Support
  - Developer Site & Documentation
  - Community Forums & Wiki
  - Tutorials
  - Quick Start Platforms

BILLIONS OF AUTONOMOUS MACHINES
Construction, Agriculture, Smart City, Retail, Logistics, Inventory Mgmt, Delivery, Inspection, Service

EXAMPLE - AI DELIVERY
Total: 20-30 TOPS

EXAMPLE - VIDEO ANALYTICS
Typical application: 30 TOPS

VISION NETWORKS: COMPUTE DEMAND

Task                Network       Input Size   GOPs/Frame   GOPs @ 30 Hz
Image Recognition   -             -            -            -
Pose Estimation     -             368x368      136          4,080
Object Detection    Faster-RCNN   600x850      172          5,160
Stereo Depth DNN    -             1280x640     260          7,800
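The "GOPs @ 30 Hz" column is simply GOPs per frame multiplied by the frame rate. A minimal Python sketch of that arithmetic follows, keyed by the input sizes on the slide; summing all networks into one concurrent budget is an illustrative assumption, not something the slide states.

```python
# Compute demand of vision networks processed at a fixed frame rate.
# GOPs/frame values come from the slide; summing them is an illustrative assumption.
GOPS_PER_FRAME = {
    "368x368": 136,    # pose estimation network
    "1280x640": 260,
    "600x850": 172,    # Faster-RCNN
}
FRAME_RATE_HZ = 30


def gops_per_second(gops_per_frame: float, frame_rate_hz: float = FRAME_RATE_HZ) -> float:
    """Giga-operations per second needed to process every frame at the given rate."""
    return gops_per_frame * frame_rate_hz


def total_tops(table: dict, frame_rate_hz: float = FRAME_RATE_HZ) -> float:
    """Combined demand in TOPS if all networks run on every frame concurrently."""
    return sum(gops_per_second(g, frame_rate_hz) for g in table.values()) / 1000.0


if __name__ == "__main__":
    for size, gops in GOPS_PER_FRAME.items():
        print(f"{size}: {gops_per_second(gops):,.0f} GOPs/s")
    print(f"total: {total_tops(GOPS_PER_FRAME):.2f} TOPS")
```

Running the three networks together at 30 Hz comes to roughly 17 TOPS of raw compute before any utilization losses, which puts the 20-30 TOPS application budgets on the example slides in perspective.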

NVIDIA AGX SYSTEMS: EMBEDDED AI HPC
- NVIDIA DRIVE AGX: Self-Driving Cars
- NVIDIA Jetson AGX: Autonomous Machines
- NVIDIA Clara AGX: Medical Imaging

JETSON AGX XAVIER
World's first AI computer for autonomous machines
- AI server performance in 30W / 15W / 10W
- 512 Volta CUDA cores
- 2x NVDLA
- 8-core CPU
- 32 DL TOPS

COMPREHENSIVE HIGH-PERFORMANCE I/O SUBSYSTEM
- PCIe: 5x 16 GT/s Gen4 controllers (1x8, 1x4, 1x2, 2x1); 3x Root port/Endpoint, 2x Root port
- Display: 3x DP/HDMI/eDP; 4K @ 60 Hz; DP HBR3; HDMI 2.0
- USB: 3x USB 3.1 (10 GT/s) ports; 4x USB 2.0 ports
- Camera: 16 CSI-2 lanes, 8 SLVS-EC lanes; 40 Gbps in D-PHY 1.2 mode; 109 Gbps in C-PHY 1.1 mode; up to 36 virtual channels
- Ethernet: 1x Gigabit Ethernet-AVB; RGMII PHY; PTP, WoL
- Other I/Os: I2C, I2S, UFS, CAN, SPI, SD, UART, GPIO
Total I/O: 650 Gbps

JETSON AGX XAVIER COMPUTE MODULE
Each row: Jetson TX2 | Jetson AGX Xavier
- GPU: 256-core Pascal @ 1.3 GHz | 512-core Volta @ 1.37 GHz, 64 Tensor Cores
- DL Accelerator: - | (2x) NVDLA
- Vision Accelerator: - | (2x) 7-way VLIW processor
- CPU: 6-core Denver and A57 @ 2 GHz, (2x) 2MB L2 | 8-core Carmel ARM CPU @ 2.26 GHz, (4x) 2MB L2 + 4MB L3
- Memory: 8GB 128-bit LPDDR4, 58.4 GB/s | 16GB 256-bit LPDDR4x @ 2133 MHz, 137 GB/s
- Storage: 32GB eMMC | 32GB eMMC
- Video encode: (2x) 4K @ 30 HEVC | (4x) 4Kp60 / (8x) 4Kp30 HEVC
- Video decode: (2x) 4K @ 30, 12-bit support | (2x) 8Kp30 / (6x) 4Kp60, 12-bit support
- Camera: 12 lanes MIPI CSI-2, D-PHY 1.2 30 Gbps | 16 lanes MIPI CSI-2, 8 lanes SLVS-EC, D-PHY 40 Gbps / C-PHY 109 Gbps
- PCI Express: 5 lanes PCIe Gen 2 (1x4, 1x1, 2x1, 1x4) | 16 lanes PCIe Gen 4 (1x8, 1x4, 1x2, 2x1)
- Mechanical: 50mm x 87mm, 400-pin connector | 100mm x 87mm, 699-pin connector
- Power: 7.5W / 15W | 10W / 15W / 30W
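The 137 GB/s memory bandwidth quoted for Jetson AGX Xavier follows from the bus width and memory clock. A minimal sketch, assuming the quoted 2133 MHz is the I/O clock of a double-data-rate interface (two transfers per clock) and that GB means 10^9 bytes:

```python
def peak_dram_bandwidth_gbs(bus_width_bits: int, io_clock_mhz: float) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (10**9 bytes/s) for a DDR interface."""
    transfers_per_second = io_clock_mhz * 1e6 * 2   # double data rate: 2 transfers per clock
    bytes_per_transfer = bus_width_bits / 8
    return transfers_per_second * bytes_per_transfer / 1e9


if __name__ == "__main__":
    # Jetson AGX Xavier: 256-bit LPDDR4x @ 2133 MHz
    print(f"{peak_dram_bandwidth_gbs(256, 2133):.1f} GB/s")  # 136.5 GB/s
```

The result, about 136.5 GB/s, rounds to the 137 GB/s on the slide; sustained bandwidth in practice is lower than this theoretical peak.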

JETSON AGX XAVIER: 20x PERFORMANCE IN 18 MONTHS
Jetson TX2 vs. Jetson AGX Xavier:
- 2x CPU (cumulative DMIPS)
- 2.4x DRAM bandwidth (58 GB/s vs. 137 GB/s)
- 4x CODEC (4K encode and decode)
- 8x CUDA (FP16 TFLOPS: 1.3 vs. 11)
- 24x DL / AI (TOPS: 1.3 vs. 32)

JETSON AGX XAVIER: GPU WORKSTATION PERFORMANCE AT 1/10TH THE POWER
Jetson AGX Xavier vs. Core i7 + GTX 1070:
- AI inference performance (ResNet-50 images/sec): 1.4x
- AI inference efficiency (ResNet-50 images/sec/W): 14x

JETSON AGX XAVIER COMPUTE MODULE
- Jetson ... (TTP)
- 16GB LPDDR4x, 32GB eMMC
- $1599 (qty. 10), $1299 (qty. 100)
- Coming soon

JETSON AGX XAVIER DEVELOPER KIT
I/O:
- PCIe x16: PCIe Gen4 x8 / SLVS-EC x8
- RJ45: Gigabit Ethernet
- USB-C: (2x) USB 3.1, DisplayPort, Power Delivery
- eSATAp + USB 3.0: SATA (power + data for 2.5" SATA) + USB 3.0
- Micro USB: (1x) USB 2.0
- Camera header: (16x) CSI-2 lanes
- M.2 Key M: NVMe storage
- M.2 Key E: PCIe x1 (for Wi-Fi / LTE / 5G)
- 40-pin header: UART, SPI, CAN, I2C, I2S, DMIC, GPIOs
- HD Audio header: High-Definition Audio
- HDMI Type A: HDMI 2.0, eDP 1.2a, DP 1.4
- uSD / UFS card: SD / UFS
- DC barrel jack: 9V - 20V DC
Size: 105mm x 105mm
Price: $2499 (retail), $1799 (qty. 10), $1299 (Developer Special, limit 1)
Available now, see NVIDIA.com


JETSON AGX XAVIER DEVELOPER KIT: CONNECTORS AND CONTROLS
- Expansion header
- USB-C connector (flash, debug)
- Micro USB B (debug)
- VDD 5V LED
- Power button
- Force recovery button
- Reset button
- Jetson Xavier module connector
- Camera connector
- M.2 Key M connector
- M.2 Key E connector
- JTAG header
- Audio panel header
- Automation header
- USB-C connector (general purpose)
- Fan header
- UFS / SD card socket
- PCIe x16 connector
- Voltage select jumper
- HDMI Type A connector
- eSATA + USB 3.1 Type A connector
- RJ45 connector
- Power jack

JETSON AGX XAVIER ECOSYSTEM
- AI software
- Distributors worldwide
- Quick start platforms
- Research
- Sensors
- System design
- System software/tools

XAVIER ARCHITECTURE
- Volta GPU
- Deep Learning Accelerator (DLA)
- Carmel ARM CPU
- Vision Accelerator

VOLTA GPU
Optimized for Inference
- 8x Volta SM @ 1377 MHz
- 512 CUDA cores, 64 Tensor Cores
- 22 TOPS INT8, 11 TFLOPS FP16
- 8x larger L1 cache size
- 4x faster L2 cache access
- 4 scheduler partitions per SM
- CUDA compute capability 7.2

TENSOR CORES
HMMA / IMMA
- 4x4 matrix processing array: D = A * B + C
- HMMA/IMMA: FP16/INT8 matrix multiply-accumulate
- Accumulation occurs in full precision with overflow protection
- Each Tensor Core performs 64 FP16 or 128 INT8 multiply-accumulate operations per clock
- Results can be composed to construct larger matrix multiplies and convolutions
- Integrated with cuBLAS, cuDNN and TensorRT, and programmable through CUDA
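The 11 TFLOPS FP16 and 22 TOPS INT8 figures can be reproduced from the Tensor Core count and the 1377 MHz GPU clock if each per-clock Tensor Core operation is read as a multiply-accumulate (MAC) worth two operations. A minimal Python sketch under those assumptions (8 Tensor Cores per SM, i.e. 64 / 8 SMs); `mma_4x4` is a plain reference of the D = A * B + C primitive and does not model FP16 rounding or INT8 saturation:

```python
# Peak Tensor Core throughput of the Xavier Volta GPU, derived from the slide figures.
# Assumptions: 8 Tensor Cores per SM (64 / 8 SMs); one multiply-accumulate counts as 2 ops.
SM_COUNT = 8
TENSOR_CORES_PER_SM = 8
GPU_CLOCK_HZ = 1377e6
FP16_MACS_PER_CLOCK = 64    # one 4x4x4 matrix multiply-accumulate per Tensor Core
INT8_MACS_PER_CLOCK = 128


def peak_tera_ops(macs_per_clock: int) -> float:
    """Peak throughput in tera-operations per second across all Tensor Cores."""
    tensor_cores = SM_COUNT * TENSOR_CORES_PER_SM
    return tensor_cores * macs_per_clock * 2 * GPU_CLOCK_HZ / 1e12


def mma_4x4(a, b, c):
    """Reference D = A*B + C on 4x4 matrices (lists of lists); uses exact Python arithmetic."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(4)) + c[i][j] for j in range(4)]
        for i in range(4)
    ]


if __name__ == "__main__":
    print(f"FP16: {peak_tera_ops(FP16_MACS_PER_CLOCK):.1f} TFLOPS")  # 11.3 TFLOPS
    print(f"INT8: {peak_tera_ops(INT8_MACS_PER_CLOCK):.1f} TOPS")    # 22.6 TOPS
```

A 4x4x4 multiply-accumulate is exactly 4 * 4 * 4 = 64 MACs, which is why one such operation per clock matches the 64 FP16 MACs per Tensor Core.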

DEEP LEARNING ACCELERATOR (DLA)
- 2x DLA engines per Xavier
- 5 TOPS INT8, 2.5 TFLOPS FP16 per DLA
- Optimized for energy efficiency (500-1500 mW)
- Programmed with TensorRT 5.0
- Supported layers include: Convolution, Deconvolution, Activations, Pooling, Normalization, Fully Connected
- Block diagram (DLA alongside the GPU SMs): configuration and control, post-processing, memory interface to SDRAM, internal RAM
- Open-source architecture: NVDLA.org
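The 32 DL TOPS headline for Jetson AGX Xavier appears to be the GPU's 22 TOPS INT8 plus 5 TOPS from each of the two DLAs. A minimal sketch of that budget, assuming all three engines are kept busy concurrently at their peak rates (an upper bound, not a measured number):

```python
# Peak deep learning throughput budget for Jetson AGX Xavier, from the GPU and DLA slides.
# Assumption: the "32 DL TOPS" headline is the INT8 sum of the GPU and both DLA engines.
GPU_INT8_TOPS = 22.0
GPU_FP16_TFLOPS = 11.0
DLA_INT8_TOPS = 5.0
DLA_FP16_TFLOPS = 2.5
DLA_ENGINES = 2


def combined(gpu: float, per_dla: float, dla_engines: int = DLA_ENGINES) -> float:
    """Peak throughput when the GPU and every DLA engine are busy at the same time."""
    return gpu + dla_engines * per_dla


if __name__ == "__main__":
    print(f"INT8: {combined(GPU_INT8_TOPS, DLA_INT8_TOPS):.0f} TOPS")       # 32 TOPS
    print(f"FP16: {combined(GPU_FP16_TFLOPS, DLA_FP16_TFLOPS):.0f} TFLOPS")  # 16 TFLOPS
```

Compared with the 1.3 TOPS quoted for Jetson TX2, 32 TOPS is roughly the 24x DL / AI gain claimed on the performance slide.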

Jetson AGX Xavier running GPU and (2x) DLA


POWER MODES
- Different power mode presets: 10W, 15W and 30W
- Default mode is 15W
- Users can create their own presets, specifying clocks and online cores in /etc/nvpmodel.conf, for example:

    < POWER_MODEL ID=2 NAME=MODE_15W >
    CPU_ONLINE CORE_0 1
    CPU_ONLINE CORE_4 0
    CPU_DENVER_0 MAX_FREQ 1200000
    GPU MIN_FREQ 0
    GPU MAX_FREQ 670000000
    EMC MAX_FREQ 1331200000

NVIDIA Power Model Tool:
- sudo nvpmodel -q (query the current mode)
- sudo nvpmodel -m 0 (change the mode; persists after reboot)
- sudo tegrastats (monitor clocks and core utilization)
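To inspect presets programmatically, the POWER_MODEL blocks can be read into a dictionary. This is a minimal, hypothetical Python sketch that assumes the layout of the slide's example (a `< POWER_MODEL ID=... NAME=... >` header followed by `PARAM KEY VALUE` lines); the real /etc/nvpmodel.conf contains other `<...>` sections, which this parser simply skips. `parse_power_models` is an illustrative helper, not part of the nvpmodel tool.

```python
import re

# Header line of one preset, e.g. "< POWER_MODEL ID=2 NAME=MODE_15W >" (assumed layout).
POWER_MODEL_HEADER = re.compile(r"<\s*POWER_MODEL\s+ID=(\d+)\s+NAME=(\S+)\s*>")


def _coerce(value: str):
    """Return an int for numeric values (clocks, core flags), otherwise the raw string."""
    try:
        return int(value)
    except ValueError:
        return value


def parse_power_models(text: str) -> dict:
    """Hypothetical helper: map each POWER_MODEL ID to its name and (PARAM, KEY) -> value."""
    models = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        header = POWER_MODEL_HEADER.match(line)
        if header:
            current = {"name": header.group(2), "settings": {}}
            models[int(header.group(1))] = current
            continue
        if line.startswith("<"):
            current = None  # any other <...> section ends the current preset
            continue
        fields = line.split()
        if current is not None and len(fields) == 3:
            param, key, value = fields
            current["settings"][(param, key)] = _coerce(value)
    return models


if __name__ == "__main__":
    sample = """< POWER_MODEL ID=2 NAME=MODE_15W >
CPU_ONLINE CORE_0 1
CPU_ONLINE CORE_4 0
CPU_DENVER_0 MAX_FREQ 1200000
GPU MIN_FREQ 0
GPU MAX_FREQ 670000000
EMC MAX_FREQ 1331200000
"""
    mode = parse_power_models(sample)[2]
    print(mode["name"], mode["settings"][("GPU", "MAX_FREQ")])  # MODE_15W 670000000
```

On a device, the same function could be given the contents of /etc/nvpmodel.conf; keying settings by `(PARAM, KEY)` keeps entries such as `GPU MIN_FREQ` and `GPU MAX_FREQ` distinct.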

NVPMODEL CLOCK CONFIGURATION

Mode ID                          0      1      2      3      4      5      6
Mode Name                        EDP    10W    15W    30W    30W    30W    30W
Power Budget                     n/a    10W    15W    30W    30W    30W    30W
Vision Accelerator (VA) cores    2      0      1      1      1      1      1
VA Maximal Frequency (MHz)       1088   0      550    760    760    760    760
Memory Maximal Freq. (MHz)       2133   1066   1333   1600   1600   1600   1600

The table also specifies, per mode, the online CPU cores, CPU maximal frequency (MHz), GPU TPCs, GPU maximal frequency (MHz), DLA cores and DLA maximal frequency (MHz).

The default mode is 15W (ID 2).

CARMEL CPU COMPLEX
- Full ARMv8.2 including RAS support
- 8 NVIDIA Carmel cores @ 2.26 GHz
- 500-1500 mW power per core
- 2 cores and 2MB L2 per cluster
- Cache coherent across the CPU complex
- 4MB exclusive L3 cache
- NVIDIA Dynamic Code Optimization
- I/O coherent memory

CPU BENCHMARKS
Speed-up of Xavier over TX2:
- SpecINT-Rate 8X (est.): 2.8x
- SpecFP-Rate 8X (est.): 2.6x

VISION ACCELERATOR
- 2x Vision Accelerator engines
- Optimized offloading of imaging and vision algorithms: feature detection and matching, stereo, optical flow
- SW support enabled in a future JetPack release
- Each Vision Accelerator includes:
  - Cortex-R5 for configuration and control
  - 2x 7-way VLIW vector processing units
  - 2x DMA for data movement to/from internal and external memories

JETSON SDKs
- JetPack: AI at the Edge
- DeepStream: Intelligent Video Analytics (IVA)
- ISAAC: Robotics & Autonomous Machines


JETPACK SDK FOR AI AT THE EDGE
- Sample code
- Nsight developer tools
- Deep Learning: TensorRT, cuDNN, TF, PyTorch, ...
- Computer Vision: VisionWorks, OpenCV, NPP
- Media: Multimedia API, V4L2
- CUDA, Linux for Tegra, ROS
- Jetson AGX Xavier: advanced GPU, 64-bit CPU, video CODEC, DLAs

PACKAGE VERSIONS: JETPACK 4.1 DEVELOPER PREVIEW (EARLY ACCESS)
Available now for Jetson AGX Xavier: developer.nvidia.com/jetpack
- L4T BSP: 31.0.2
- CUDA: 10.0
- Linux Kernel: 4.9
- cuDNN: 7.3
- CBoot: 1.0
- TensorRT: 5.0 .5
- OpenCV: 3.3.1
- EGL: 1.5
- NPP: 10.0
- GLX: 1.4
- X11 ABI: 24
- Xrandr: 1.4
- Multimedia API: 31.1
- Argus Camera API: 0.97
- GStreamer: 1.14
- Nsight Systems: 2018.1
- Nsight Graphics: 1.0
- Jetson OS: Ubuntu 18.04
- Host OS: Ubuntu 16.04 / 18.04
Install TensorFlow, PyTorch, Caffe, ROS, and other GPU libraries.
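To confirm that a device is running the L4T BSP listed above, one option is to read the release file that L4T installs. This Python sketch assumes the first line of /etc/nv_tegra_release looks like `# R31 (release), REVISION: 0.2, ...` (major release plus revision); both the path and the line format are assumptions to check on your own board, and `l4t_version` is a hypothetical helper.

```python
import re
from typing import Optional

# Assumed first-line format of /etc/nv_tegra_release, e.g.
# "# R31 (release), REVISION: 0.2, GCID: ..., BOARD: t186ref, EABI: aarch64, DATE: ..."
RELEASE_LINE = re.compile(r"#\s*R(\d+)\s*\(release\),\s*REVISION:\s*([\d.]+)")


def l4t_version(release_text: str) -> Optional[str]:
    """Return the L4T version as 'major.revision' (e.g. '31.0.2'), or None if not found."""
    match = RELEASE_LINE.search(release_text)
    if match is None:
        return None
    return f"{match.group(1)}.{match.group(2)}"


def read_l4t_version(path: str = "/etc/nv_tegra_release") -> Optional[str]:
    """Read the release file on a Jetson device; returns None if the file is missing."""
    try:
        with open(path, encoding="utf-8") as f:
            return l4t_version(f.read())
    except FileNotFoundError:
        return None


if __name__ == "__main__":
    version = read_l4t_version()
    print(version or "not running on L4T")
    if version is not None and version != "31.0.2":
        print("note: differs from the L4T BSP listed for JetPack 4.1 (31.0.2)")
```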

NVIDIA TensorRT: PRODUCTION INFERENCING
- Compile and optimize neural networks
- Support for every framework
- Optimize for each target platform: Tesla V100, Tesla P4/T4, Jetson, DRIVE, NVIDIA DLA

NVIDIA TensorRT 5
Deep Learning Inference Optimizer and Runtime: trained neural network, then TensorRT Optimizer, then TensorRT Runtime Engine
Optimizer:
- Fuse network layers
- Eliminate concatenation layers
- Kernel specialization
- Auto-tuning for target platform
- Select optimal tensor layout
- Batch size tuning
- Mixed-precision INT8/FP16 support
New support for Jetson AGX Xavier in TensorRT 5:
- Volta GPU INT8 & Tensor Cores (HMMA/IMMA)
- Early-access DLA FP16 support
- Updated samples to enable DLA
- Fine-grained control of DLA layers and GPU fallback
- New APIs added to IBuilder: getMaxDLABatchSize(), allowGPUFallback()
idia.com/tensorRT
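The idea behind DLA layer control with GPU fallback (enabled through `allowGPUFallback()` on the IBuilder) can be illustrated with a toy placement pass. This is a hypothetical Python sketch, not the TensorRT API: the supported-layer set is taken from the layer types listed on the DLA slide, and `Layer` and `place_layers` are illustrative names.

```python
from dataclasses import dataclass
from typing import Dict, Iterable

# Layer types listed as DLA-capable on the DLA slide (illustrative, not an exhaustive TensorRT list).
DLA_SUPPORTED_KINDS = {
    "Convolution", "Deconvolution", "Activation",
    "Pooling", "Normalization", "FullyConnected",
}


@dataclass(frozen=True)
class Layer:
    name: str
    kind: str


def place_layers(layers: Iterable[Layer], allow_gpu_fallback: bool) -> Dict[str, str]:
    """Hypothetical placement pass: run a layer on the DLA when its type is supported,
    otherwise on the GPU if fallback is allowed; without fallback, fail the build."""
    placement = {}
    for layer in layers:
        if layer.kind in DLA_SUPPORTED_KINDS:
            placement[layer.name] = "DLA"
        elif allow_gpu_fallback:
            placement[layer.name] = "GPU"
        else:
            raise ValueError(
                f"layer {layer.name!r} ({layer.kind}) cannot run on the DLA "
                "and GPU fallback is disabled"
            )
    return placement


if __name__ == "__main__":
    net = [Layer("conv1", "Convolution"), Layer("relu1", "Activation"),
           Layer("concat1", "Concatenation"), Layer("fc1", "FullyConnected")]
    print(place_layers(net, allow_gpu_fallback=True))
```

With fallback enabled, the unsupported `Concatenation` layer lands on the GPU while the rest stay on the DLA; with fallback disabled, the same network is rejected, which is the trade-off the flag controls.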

AI: EDGE TO CLOUD
- Edge and on-premises: Jetson (TensorRT, DeepStream, JetPack)
- Cloud: Tesla (NVIDIA GPU Cloud, DIGITS, DGX)
- Inference on edge devices; training and inference on servers

NVIDIA DEEPSTREAM
- Zero memory copies
- Typical multi-stream application: 30 TOPS


ISAAC
Isaac SDK: Simulation to Reality
- World model: warehouse, office, store, home
- Robot model: Carter, URDF loader
- Interactions with navigation
- Simulation Engine: photo-realistic graphics, physics, soft bodies, procedural generation, massive parallelism; Unreal Engine 4 / Unity 3D
- Virtual sensors and virtual actuators (simulation); HW sensors and HW actuators (real robot)
- Isaac Framework: codelets, behaviors, 3D poses, distributed messaging, synchronization, record & replay, configuration, visualization
- GEMs: ML, optimizers, algebra, EKFs, depth, ...
- Drivers: lidar, camera, IMU, robot base, ...
- Jetson: TensorRT, CUDA, TensorFlow, ...; fully integrated with TX2 and Xavier
- Sensor processing and actuator control
- Unified Message API: use the same messages for simulation, actual hardware and across all apps

RESOURCES & SUPPORT
- Developer Site
- Documentation
- Forums & Wiki
- Tutorials
- Quick-Start Platforms

JETSON DEVELOPER SITE
End-to-end development from idea to final product
- JetPack and Isaac SDKs
- Developer tools
- Design collateral
- Developer forum
- Training and tutorials
- Ecosystem
developer.nvidia.com/jetson

GETTING HELP
Jetson Community
- Developer Forums: devtalk.nvidia.com
- eLinux Wiki: eLinux.org/Jetson

TWO DAYS TO A DEMO
Getting Started with Deep Learning
- AI workflow: train using DIGITS and cloud/PC; deploy to the field with Jetson
- Training guides: all the steps required to train your own models, including the datasets
- Deep vision primitives: image recognition, object detection and segmentation
github.com/dusty-nv/jetson-inference

TWO DAYS TO A DEMO
Reinforcement Learning Edition
- OpenAI Gym: test environments and games for research and verification
- RL algorithms: DQN, A3C, Actor-Critic using PyTorch
- Robotic simulation: observation from vision, pixels-to-actions
- Transfer learning: adapt the network to a real robot, online learning in the field
github.com/dusty-nv/jetson-reinforcement

TENSORFLOW
Accelerated Performance with Jetson AGX Xavier
- Download pip wheel installers from the Jetson Download Center
- Follow tutorials for popular vision tasks like object detection
- Optimize for deployment with NVIDIA TensorRT (UFF / TF-TRT)
b.com/NVIDIA-Jetson/tf_to_trt_image_classification
github.com/NVIDIA-Jetson/tf_trt_models

JETSON QUICK-START PLATFORMS
- Toyota HSR
- Clearpath Robotics: Jackal UGV
- Aion Robotics: R1 UGV
- JetsonHacks: RACECAR/J
- NVIDIA: Redtail UAV

Thank you!
- Developer Site
- Download JetPack
- 2 Days to a Demo
- DevTalk Forums
- Visit the ux.org/Jetson
- Dev Blog: NVIDIA Jetson AGX Xavier Opens New Era of AI in Robotics
Q&A: What can I help you build?
