Vulkan: The Essentials - Nvidia

Transcription

Vulkan: the essentials
Tristan Lorach, March 17th 2016

Slide 2: Analogy on Graphics APIs

Slide 3: Analogy
Fixed-function OpenGL: pre-assembled toy car. Fun out of the box, not much room for customization.

Slide 4: Analogy
Modern AZDO OpenGL with Programmable Shaders: LEGO kit. You build it yourself; comes with plenty of useful, pre-shaped pieces.
AZDO = Approaching Zero Driver Overhead

Slide 5: Analogy
Vulkan: Pine Wood Derby kit. You build it yourself to race from raw materials; power tools used to assemble, adult supervision highly recommended.

Slide 6: Analogy
Different valid approaches: Fixed-function OpenGL, Modern AZDO OpenGL with Programmable Shaders, Vulkan.

Slide 7: Beneficial Vulkan Scenarios
Flowchart, from "start"; each "yes" leads toward "Vulkan friendly":
- Is your graphics work CPU bound?
- Your graphics platform is fixed
- You put a premium on avoiding hitches
- Can your graphics creation be parallelized?
- You'll do whatever it takes to squeeze out max perf.
- You can manage your graphics resource allocations

Slide 8: Beneficial Vulkan Scenarios
Same flowchart as slide 7, with extra tongue-in-cheek questions overlaid:
- Tired with OpenGL (state-machine) or even D3D? Kinda
- Want to learn new stuff? yes
- Spend lots of time coding? (it's a Yes)
- No sleep? Alright (Yes)
All paths end at "Vulkan friendly".

Slide 9: Unlikely to Benefit
Scenarios to reconsider coding to Vulkan:
1. Need for compatibility to pre-Vulkan platforms
2. Heavily GPU-bound application
3. Heavily CPU-bound application due to non-graphics work
4. Single-threaded application, unlikely to change
5. App can target a middleware engine, avoiding 3D graphics API dependencies
   - Consider using an engine targeting Vulkan, instead of dealing with Vulkan yourself
Diagram label: OpenGL / D3D

Slide 10: Big Picture – Typical OpenGL Case
[Diagram] OpenGL commands and Cmd bundles feed the GPU; OpenGL resources live in a heap managed by the driver.
- Resources in memory: Element buffer (EBO), Draw Indirect Buffer, Vertex Buffer, Uniform Block, Texture Fetch, Image Load/Store, Atomic Counter, Shader Storage, FBO resources (Textures / RB), Tr. Feedback buffer
- GPU stages: Front-End (decoder), Vertex Puller (IA), Vertex Shader, TCS (Tessellation), Tessellator, TES (Tessellation), Geometry Shader, Transform Feedback, Rasterization, Fragment Shader, Per-Fragment Ops, Framebuffer
- Other labels: Graphics pipeline States, Dependencies

Slide 11: Big Picture – Vulkan
[Diagram] Cmd-buffers / queues and Cmd bundles feed the GPU.
- Less translation, validation checking and internal management
- Pipeline States, Render Passes, Descriptor Sets
- Minimal memory management (Heap)
- Resources in memory: Element buffer (EBO), Draw Indirect Buffer, Vertex Buffer (VBO), Uniform Block, Texture Fetch, Atomic Counter, Shader Storage, FBO resources (Textures / RB), Tr. Feedback buffer
- GPU stages: Front-End (decoder), Vertex Puller (IA), Vertex Shader, TCS (Tessellation), Tessellator, TES (Tessellation), Geometry Shader, Transform Feedback, Rasterization, Fragment Shader, Per-Fragment Ops, Framebuffer
- Other labels: Dependencies, Image on Push-Buffer

Slide 12: Vulkan Components
[Diagram] Overview of the Vulkan objects and how they relate.
- Objects: Device, Queue, Heap, Memory, Buffer, Image, Image View, Sampler, Framebuffer, Render-Pass, Graphics pipeline, Descriptor-Set, DescriptorSet Pool, Cmd. Buffer Pool, Command-buffer, 2ndary Command-buffer
- Commands recorded in a Command-buffer: Barrier synchronization, Begin Render-Pass, Bind Graphics-pipeline, Set misc. dynamic states, Bind Vertex/Idx Buffer(s), Update Buffer, Bind Descriptor-Set(s), Draw, Execute Commands, End Render-Pass

Slide 14: Vulkan Objects: Device
- Can have many
- VkPhysicalDevice
- Capabilities
- Memory Management
- Queues
- Objects: Buffers, Images, Sync Primitives

Slide 15: NVIDIA's Vulkan Capabilities
- Properties listed from the Physical Device
- NVIDIA is almost full featured
- Top to bottom: from GeForce, Quadro down to Tegra
- Check http://vulkan.gpuinfo.org/listreports.php

Slide 16: NVIDIA's Vulkan Capabilities
[Figure] Capability listings for GeForce GTX 980 and Tegra X1 & K1.

Slide 18: Queues
- The command queue was hidden in the OpenGL context; it is now explicitly declared
- Multiple threads can submit work to a queue (or queues)!
- Queues accept GPU work via CommandBuffer submissions
  - Few operations available: "submit work" and "wait for idle"
- Queue submissions can include sync primitives for the queue to:
  - Wait upon before processing the submitted work
  - Signal when the work in this submission is completed
- Queue "families" can accept different types of work, e.g.
  - NVIDIA exposes 16 Queues
  - Only one type of queue for all the types of work
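The queue-family bullet above can be sketched as a small lookup. This is a minimal model that does not include the Vulkan headers: `QueueFamily` and `findQueueFamily` are hypothetical names for illustration, standing in for `VkQueueFamilyProperties` and the loop an application would typically run over the result of `vkGetPhysicalDeviceQueueFamilyProperties`. Only the flag bit values are taken from the real API.

```cpp
#include <cstdint>
#include <vector>

// Bit values mirror VkQueueFlagBits from the Vulkan headers.
enum QueueFlag : std::uint32_t {
    kGraphics = 0x1,  // VK_QUEUE_GRAPHICS_BIT
    kCompute  = 0x2,  // VK_QUEUE_COMPUTE_BIT
    kTransfer = 0x4,  // VK_QUEUE_TRANSFER_BIT
};

// Hypothetical stand-in for VkQueueFamilyProperties (only the fields used here).
struct QueueFamily {
    std::uint32_t flags;       // OR of QueueFlag bits
    std::uint32_t queueCount;  // number of queues the family exposes
};

// Returns the index of the first family that supports every required flag,
// or -1 when no family qualifies.
int findQueueFamily(const std::vector<QueueFamily>& families, std::uint32_t required) {
    for (std::size_t i = 0; i < families.size(); ++i) {
        const bool hasQueues = families[i].queueCount > 0;
        const bool supportsAll = (families[i].flags & required) == required;
        if (hasQueues && supportsAll) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

With a layout like the one the slide describes for NVIDIA (a single family of 16 queues accepting every type of work), any request resolves to index 0; on devices that split work across families the loop skips the ones that lack a required bit.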

Slide 20: Command-Buffers
- Vulkan Rendering Command-Buffers
- Almost what the GPU will get at the Front-End (FIFO)
  - Minor translation & optimization from the Driver prior to sending to the GPU
- Each can be created either for one shot or for multiple frames/submissions
- Cannot create Graphic Work from the GPU (command-lists can): API calls to vkCmd…() between Begin & End
- Multi-threading friendly!
- A Primary Cmd-Buffer can call many 2ndary Cmd-Buffers
[Diagram] Primary Cmd-buffer: Barrier synchronization, Begin Render-Pass, Bind Graphics-pipeline, Set misc. dynamic states, Bind Vertex/Idx Buffer(s), Update Buffer, Bind Descriptor-Set(s), Draw, Execute Commands (2ndary Command-buffer), End Render-Pass

Slide 21: Command-Buffers: Update/Push Constants
- 2 more ways to update constants/uniforms for Shaders from the Command-Buffer
- Update-Buffer: prior to the Render-Pass; can target any Buffer bound by Descriptor Sets
    layout(set = 0, binding = 2) uniform MyBuffer { mat4 mW; ... };
    vkCmdUpdateBuffer()
- Push-Constants: targets a dedicated section in GLSL/SPIR-V
    layout(push_constant) uniform objectBuffer { mat4 matrixObject; vec4 diffuse; } object;
    vkCmdPushConstants()
  - New values appended "in-band": in the Command-Buffer
  - Efficient, but only good for a small amount of values
[Diagram] Primary Cmd-buffer: vkCmdUpdateBuffer() comes before Begin Render-Pass; vkCmdPushConstants comes before Draw.
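To make "a small amount of values" concrete, here is a sketch of a CPU-side struct matching the `objectBuffer` push-constant block on the slide, with a size check. The names `ObjectPushConstants`, `kGuaranteedPushConstantBytes` and `fitsPushConstants` are illustrative assumptions, not part of the Vulkan API. The 128-byte figure is the minimum the Vulkan specification requires for `VkPhysicalDeviceLimits::maxPushConstantsSize`; a real application would query the actual limit from the device.

```cpp
#include <cstddef>
#include <cstdint>

// CPU-side mirror of the GLSL block:
//   layout(push_constant) uniform objectBuffer { mat4 matrixObject; vec4 diffuse; } object;
struct ObjectPushConstants {
    float matrixObject[16];  // mat4, 64 bytes
    float diffuse[4];        // vec4, 16 bytes
};

static_assert(sizeof(ObjectPushConstants) == 80, "mat4 + vec4 should pack to 80 bytes");
static_assert(offsetof(ObjectPushConstants, diffuse) == 64, "diffuse follows the matrix");

// Minimum push-constant space every conformant implementation must provide.
constexpr std::uint32_t kGuaranteedPushConstantBytes = 128;

// True when `size` bytes at `offset` fit within `limit` and respect the rule that
// vkCmdPushConstants requires offset and size to be multiples of 4.
constexpr bool fitsPushConstants(std::uint32_t offset, std::uint32_t size,
                                 std::uint32_t limit = kGuaranteedPushConstantBytes) {
    return size > 0 && offset % 4 == 0 && size % 4 == 0 &&
           offset < limit && size <= limit - offset;
}
```

The slide's block uses 80 of the guaranteed 128 bytes, so one more `mat4` would already exceed the guaranteed minimum; larger data belongs in a buffer updated through `vkCmdUpdateBuffer()` or a mapped uniform buffer.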

Slide 22: Synchronization
- Semaphores: used to synchronize work across queues, or across coarse-grained submissions to a single queue
- Events and barriers: used to synchronize work within a command buffer or a sequence of command buffers submitted to a single queue
- Fences: used to synchronize work between the device and the host
[Diagram] Barriers and events sit between Cmd-buffers on one Queue; Semaphores sit between Queues; Fences sit between Device and Host.

Slide 23: Command-Buffers and Multi-Threading
[Diagram] Timeline of one main thread and four worker threads.
- Main thread (Busy): Game Work, Thread Coordination, Create 1ary Cmd Buffer, Collect 2dary Cmd Buffers, 1ary Cmd calls 2dary ones, Submit to Q, Swapping
- Threads 1 to 4 (Busy): Update Work, each with its own cmd. Buffer Pool; Create 2dary Cmd Buffer, Feed Cmd Buffers, Give out Cmd Buffers
- The Command Buffer Pool is local to the thread!

Slide 24: Command Buffer Thread Safety
- Must not recycle a CommandBuffer for rewriting until it is no longer in flight (in flight: the GPU is still consuming it on its side)
- But can't flush the queue each frame: that would break parallelism!
- VkFences can be provided with a queue submission to test when a command buffer is ready to be recycled
[Diagram] App submissions to the Queue: a group of CommandBuffers followed by Fence A, then another group followed by Fence B. The GPU consumes the Queue; once Fence A is signaled to the App, the command buffers ahead of it can be rewritten.

Slide 25: Threads And Command Pools
- Threads can have more than 1 Command Pool
- Ring-buffer: one Command-Pool per Frame
- When that thread/frame is no longer in flight (using Fences), it is faster to simply reset a pool
[Diagram] Per-thread CommandPools and CommandBuffers (Thread 1, Thread 2) laid out per frame, starting at Frame N-2.
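The fence-gated recycling from the previous slide and the per-frame pool ring here can be modeled without a GPU. The sketch below is a plain C++ simulation under stated assumptions: `FrameSlot` and `FrameRing` are hypothetical names, the `fenceSignaled` flag stands in for polling a `VkFence`, and `resetCount` stands in for a call to `vkResetCommandPool`. It only illustrates the bookkeeping, not real submission.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// One ring entry per frame in flight.
struct FrameSlot {
    bool submitted = false;        // a submission using this slot has been made
    bool fenceSignaled = false;    // models the fence of that submission being signaled
    std::uint32_t resetCount = 0;  // models how often the slot's pool was reset
};

template <std::size_t N>
class FrameRing {
public:
    // Tries to claim the slot for the next frame. Returns nullptr while the slot's
    // previous submission is still in flight; the caller then has to wait on its fence.
    FrameSlot* beginFrame() {
        FrameSlot& slot = slots_[next_];
        if (slot.submitted && !slot.fenceSignaled) {
            return nullptr;  // GPU may still be reading these command buffers
        }
        if (slot.submitted) {
            ++slot.resetCount;  // recycle every command buffer of the pool at once
        }
        slot.submitted = false;
        slot.fenceSignaled = false;
        return &slot;
    }

    // Marks the current slot as submitted and advances to the next one.
    void submit() {
        slots_[next_].submitted = true;
        next_ = (next_ + 1) % N;
    }

    // Models the GPU signaling the fence attached to slot `index`.
    void signal(std::size_t index) { slots_[index].fenceSignaled = true; }

private:
    std::array<FrameSlot, N> slots_{};
    std::size_t next_ = 0;
};
```

In a multi-threaded renderer each worker thread would own its own ring, since the slide notes that command pools are local to a thread; the number of slots is a tuning choice that trades memory for how far the CPU may run ahead of the GPU.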

Slide 27: Graphics Pipeline
- Snapshot of all States
  - Including …
- Pre-compiled & Immutable
- Ideally: done at Initialization time
  - Ok at render-time *if* using the Pipeline-Cache
- Prevents validation overhead during the rendering loop
- Some Render-states can be excluded from it: they become "Dynamic" States
[Diagram] Graphics pipeline state blocks: Pipeline-layout, Vertex Input, Tess. State, Shader Module, Viewport State, Sample State, Depth & Stencil State, Color Blend State.
Optional Dynamic States: Viewport, Scissor, Blend const, Stencil Ref, Depth Bounds, Depth Bias.
State fields shown: depthBiasEnable, depthBias, depthBiasClamp, slopeScaledDepthBias, lineWidth.

Slide 28: Graphics Pipeline
- The Graphics Pipeline must be consistent with the shaders
  - No "introspection", so everything is known & prepared in advance
- Vertex Input: tells how Attributes (Locations) are attached to which Vertex Buffer at which offset
- Pipeline Layout: tells how to map Sets and Bindings for the shaders at each stage (Vtx, Fragment, Geom ...)
[Diagram] Pipeline-layout references Descriptor-set Layouts; each Shader Stage uses a Shader Module holding SPIR-V compiled from GLSL code such as:
    layout(std140, set = 0, binding = 0) uniform A { ... };
    layout(std140, set = 0, binding = 1) uniform B { ... };
    layout(std140, set = 1, binding = 2) uniform C { ... };
    layout(location = 0) in vec3 pos;
    layout(location = 1) in vec3 N;
    void main() { ... }
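Since there is no introspection, the application has to spell out how the `pos` and `N` inputs above map onto a vertex buffer. A minimal sketch follows, assuming an interleaved layout with both attributes in one buffer. `Vertex`, `AttributeDesc` and `BindingDesc` are hypothetical stand-ins for illustration; in real code the corresponding structures are `VkVertexInputAttributeDescription` and `VkVertexInputBindingDescription`, which also carry a format and an input rate omitted here.

```cpp
#include <cstddef>
#include <cstdint>

// CPU-side vertex matching the shader inputs:
//   layout(location = 0) in vec3 pos;
//   layout(location = 1) in vec3 N;
struct Vertex {
    float pos[3];
    float N[3];
};

// Stand-in for a vertex attribute description (format field omitted).
struct AttributeDesc {
    std::uint32_t location;  // matches layout(location = ...) in the shader
    std::uint32_t binding;   // which bound vertex buffer feeds this attribute
    std::uint32_t offset;    // byte offset inside one vertex
};

// Stand-in for a vertex binding description (input rate omitted).
struct BindingDesc {
    std::uint32_t binding;  // vertex buffer slot
    std::uint32_t stride;   // bytes between consecutive vertices
};

// One interleaved vertex buffer bound at slot 0.
constexpr BindingDesc kVertexBinding = {0, sizeof(Vertex)};

// Both attributes read from binding 0 at the offsets of their struct members.
const AttributeDesc kVertexAttributes[2] = {
    {0, 0, offsetof(Vertex, pos)},
    {1, 0, offsetof(Vertex, N)},
};
```

Deriving offsets and stride from the struct with `offsetof` and `sizeof` keeps the pipeline description in step with the CPU-side data if the vertex layout changes later.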

Slide 30: Buffers
- Highly heterogeneous. Most often used for:
  - Uniform Buffers (matrices, material parameters ...)
- Vulkan Object: must be bound to some Device Memory
- Memory can be CPU accessible (mappable)
- Can be CPU cached
- Can be GPU accessible only: need a "Staging Buffer" to write into it
  - But most efficient
- (More on Device Memory later ...)
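Choosing between mappable and GPU-only memory usually comes down to scanning the memory types a device reports. The sketch below models that scan without the Vulkan headers: `findMemoryType` and `needsStagingBuffer` are hypothetical helper names, the vector of flags stands in for the `memoryTypes` array of `VkPhysicalDeviceMemoryProperties`, and `allowedTypeBits` plays the role of `VkMemoryRequirements::memoryTypeBits`. Only the flag bit values come from the real API.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Bit values mirror VkMemoryPropertyFlagBits from the Vulkan headers.
enum MemoryProperty : std::uint32_t {
    kDeviceLocal  = 0x1,  // VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT
    kHostVisible  = 0x2,  // VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT
    kHostCoherent = 0x4,  // VK_MEMORY_PROPERTY_HOST_COHERENT_BIT
    kHostCached   = 0x8,  // VK_MEMORY_PROPERTY_HOST_CACHED_BIT
};

// typeFlags[i] holds the property flags of memory type i. Bit i of allowedTypeBits
// is set when the buffer may be placed in type i. Returns the first allowed type
// that has every wanted property, or -1 when nothing matches.
int findMemoryType(const std::vector<std::uint32_t>& typeFlags,
                   std::uint32_t allowedTypeBits, std::uint32_t wanted) {
    for (std::size_t i = 0; i < typeFlags.size() && i < 32; ++i) {
        const bool allowed = (allowedTypeBits & (1u << i)) != 0;
        if (allowed && (typeFlags[i] & wanted) == wanted) {
            return static_cast<int>(i);
        }
    }
    return -1;
}

// True when memory with these flags cannot be mapped by the CPU, so uploads have to
// go through a host-visible staging buffer followed by a copy command.
bool needsStagingBuffer(std::uint32_t flags) {
    return (flags & kHostVisible) == 0;
}
```

A typical pattern, under these assumptions, is to request `kDeviceLocal` for static vertex and index data and accept the staging copy, while per-frame uniform data goes into a `kHostVisible` type that can be written directly.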

Images And ImageView
- Images represent all kinds of 'pixel-like' arrays
  - Textures: Color or Depth-Stencil
  - Render targets: Color and Depth-Stencil
  - Even Compute data
  - Shader …