CUDA Compiler Driver NVCC
Reference Guide
TRM-06721-001 v11.7, May 2022

Table of Contents

Chapter 1. Introduction
1.1. Overview
1.1.1. CUDA Programming Model
1.1.2. CUDA Sources
1.1.3. Purpose of NVCC
1.2. Supported Host Compilers
Chapter 2. Compilation Phases
2.1. NVCC Identification Macro
2.2. NVCC Phases
2.3. Supported Input File Suffixes
2.4. Supported Phases
Chapter 3. The CUDA Compilation Trajectory
Chapter 4. NVCC Command Options
4.1. Command Option Types and Notation
4.2. Command Option Description
4.2.1. File and Path Specifications
4.2.1.1. --output-file file (-o)
4.2.1.2. --objdir-as-tempdir (-objtemp)
4.2.1.3. --pre-include file,... (-include)
4.2.1.4. --library library,... (-l)
4.2.1.5. --define-macro def,... (-D)
4.2.1.6. --undefine-macro def,... (-U)
4.2.1.7. --include-path path,... (-I)
4.2.1.8. --system-include path,... (-isystem)
4.2.1.9. --library-path path,... (-L)
4.2.1.10. --output-directory directory (-odir)
4.2.1.11. --dependency-output file (-MF)
4.2.1.12. --generate-dependency-targets (-MP)
4.2.1.13. --compiler-bindir directory (-ccbin)
4.2.1.14. --allow-unsupported-compiler (-allow-unsupported-compiler)
4.2.1.15. --archiver-binary executable (-arbin)
4.2.1.16. --cudart {none|shared|static} (-cudart)
4.2.1.17. --cudadevrt {none|static} (-cudadevrt)
4.2.1.18. --libdevice-directory directory (-ldir)
4.2.1.19. --target-directory string (-target-dir)

4.2.2. Options for Specifying the Compilation Phase
4.2.2.1. --link (-link)
4.2.2.2. --lib (-lib)
4.2.2.3. --device-link (-dlink)
4.2.2.4. --device-c (-dc)
4.2.2.5. --device-w (-dw)
4.2.2.6. --cuda (-cuda)
4.2.2.7. --compile (-c)
4.2.2.8. --fatbin (-fatbin)
4.2.2.9. --cubin (-cubin)
4.2.2.10. --ptx (-ptx)
4.2.2.11. --preprocess (-E)
4.2.2.12. --generate-dependencies (-M)
4.2.2.13. --generate-nonsystem-dependencies (-MM)
4.2.2.14. --generate-dependencies-with-compile (-MD)
4.2.2.15. --generate-nonsystem-dependencies-with-compile (-MMD)
4.2.2.16. --optix-ir (-optix-ir)
4.2.2.17. --run (-run)
4.2.3. Options for Specifying Behavior of Compiler/Linker
4.2.3.1. --profile (-pg)
4.2.3.2. --debug (-g)
4.2.3.3. --device-debug (-G)
4.2.3.4. --extensible-whole-program (-ewp)
4.2.3.5. --no-compress (-no-compress)
4.2.3.6. --generate-line-info (-lineinfo)
4.2.3.7. --optimization-info kind,... (-opt-info)
4.2.3.8. --optimize level (-O)
4.2.3.9. --dopt kind (-dopt)
4.2.3.10. --dlink-time-opt (-dlto)
4.2.3.11. --ftemplate-backtrace-limit limit (-ftemplate-backtrace-limit)
4.2.3.12. --ftemplate-depth limit (-ftemplate-depth)
4.2.3.13. --no-exceptions (-noeh)
4.2.3.14. --shared (-shared)
4.2.3.15. --x {c|c++|cu} (-x)
4.2.3.16. --std {c++03|c++11|c++14|c++17} (-std)
4.2.3.17. --no-host-device-initializer-list (-nohdinitlist)
4.2.3.18. --expt-relaxed-constexpr (-expt-relaxed-constexpr)
4.2.3.19. --extended-lambda (-extended-lambda)

4.2.3.20. --expt-extended-lambda (-expt-extended-lambda)
4.2.3.21. --machine {32|64} (-m)
4.2.3.22. --m32 (-m32)
4.2.3.23. --m64 (-m64)
4.2.3.24. --host-linker-script {use-lcs|gen-lcs} (-hls)
4.2.3.25. --augment-host-linker-script (-aug-hls)
4.2.3.26. --host-relocatable-link (-r)
4.2.4. Options for Passing Specific Phase Options
4.2.4.1. --compiler-options options,... (-Xcompiler)
4.2.4.2. --linker-options options,... (-Xlinker)
4.2.4.3. --archive-options options,... (-Xarchive)
4.2.4.4. --ptxas-options options,... (-Xptxas)
4.2.4.5. --nvlink-options options,... (-Xnvlink)
4.2.5. Options for Guiding the Compiler Driver
4.2.5.1. --forward-unknown-to-host-compiler (-forward-unknown-to-host-compiler)
4.2.5.2. --forward-unknown-to-host-linker (-forward-unknown-to-host-linker)
4.2.5.3. --dont-use-profile (-noprof)
4.2.5.4. --threads number (-t)
4.2.5.5. --dryrun (-dryrun)
4.2.5.6. --verbose (-v)
4.2.5.7. --keep (-keep)
4.2.5.8. --keep-dir directory (-keep-dir)
4.2.5.9. --save-temps (-save-temps)
4.2.5.10. --clean-targets (-clean)
4.2.5.11. --run-args arguments,... (-run-args)
4.2.5.12. --use-local-env (-use-local-env)
4.2.5.13. --input-drive-prefix prefix (-idp)
4.2.5.14. --dependency-drive-prefix prefix (-ddp)
4.2.5.15. --drive-prefix prefix (-dp)
4.2.5.16. --dependency-target-name target (-MT)
4.2.5.18. --no-device-link (-nodlink)
4.2.5.19. --allow-unsupported-compiler (-allow-unsupported-compiler)
4.2.6. Options for Steering CUDA Compilation
4.2.6.1. --default-stream {legacy|null|per-thread} (-default-stream)
4.2.7. Options for Steering GPU Code Generation
4.2.7.1. --gpu-architecture {arch|native|all|all-major} (-arch)
4.2.7.2. --gpu-code code,... (-code)
4.2.7.3. --generate-code specification (-gencode)

4.2.7.4. --relocatable-device-code {true|false} (-rdc)
4.2.7.5. --entries entry,... (-e)
4.2.7.6. --maxrregcount amount (-maxrregcount)
4.2.7.7. --use_fast_math (-use_fast_math)
4.2.7.8. --ftz {true|false} (-ftz)
4.2.7.9. --prec-div {true|false} (-prec-div)
4.2.7.10. --prec-sqrt {true|false} (-prec-sqrt)
4.2.7.11. --fmad {true|false} (-fmad)
4.2.7.12. --extra-device-vectorization (-extra-device-vectorization)
4.2.7.13. --compile-as-tools-patch (-astoolspatch)
4.2.7.14. --keep-device-functions (-keep-device-functions)
4.2.8. Generic Tool Options
4.2.8.1. --disable-warnings (-w)
4.2.8.2. --source-in-ptx (-src-in-ptx)
4.2.8.3. --restrict (-restrict)
4.2.8.4. --Wno-deprecated-gpu-targets (-Wno-deprecated-gpu-targets)
4.2.8.5. --Wno-deprecated-declarations (-Wno-deprecated-declarations)
4.2.8.6. --Wreorder (-Wreorder)
4.2.8.7. --Wdefault-stream-launch (-Wdefault-stream-launch)
4.2.8.8. --Wmissing-launch-bounds (-Wmissing-launch-bounds)
4.2.8.9. --Wext-lambda-captures-this (-Wext-lambda-captures-this)
4.2.8.10. --Werror kind,... (-Werror)
4.2.8.11. --display-error-number (-err-no)
4.2.8.12. --no-display-error-number (-no-err-no)
4.2.8.13. --diag-error errNum,... (-diag-error)
4.2.8.14. --diag-suppress errNum,... (-diag-suppress)
4.2.8.15. --diag-warn errNum,... (-diag-warn)
4.2.8.16. --resource-usage (-res-usage)
4.2.8.17. --help (-h)
4.2.8.18. --version (-V)
4.2.8.19. --options-file file,... (-optf)
4.2.8.20. --time filename (-time)
4.2.8.21. --qpp-config config (-qpp-config)
4.2.8.22. --list-gpu-code (-code-ls)
4.2.8.23. --list-gpu-arch (-arch-ls)
4.2.9. Phase Options
4.2.9.1. Ptxas Options
4.2.9.2. NVLINK Options

4.3. NVCC Environment Variables
Chapter 5. GPU Compilation
5.1. GPU Generations
5.2. GPU Feature List
5.3. Application Compatibility
5.4. Virtual Architectures
5.5. Virtual Architecture Feature List
5.6. Further Mechanisms
5.6.1. Just-in-Time Compilation
5.6.2. Fatbinaries
5.7. NVCC Examples
5.7.1. Base Notation
5.7.2. Shorthand
5.7.2.1. Shorthand 1
5.7.2.2. Shorthand 2
5.7.2.3. Shorthand 3
5.7.3. Extended Notation
5.7.4. Virtual Architecture Macros
Chapter 6. Using Separate Compilation in CUDA
6.1. Code Changes for Separate Compilation
6.2. NVCC Options for Separate Compilation
6.3. Libraries
6.4. Examples
6.5. Optimization Of Separate Compilation
6.6. Potential Separate Compilation Issues
6.6.1. Object Compatibility
6.6.2. JIT Linking Support
6.6.3. Implicit CUDA Host Code
6.6.4. Using __CUDA_ARCH__
6.6.5. Device Code in Libraries
Chapter 7. Miscellaneous NVCC Usage
7.1. Cross Compilation
7.2. Keeping Intermediate Phase Files
7.3. Cleaning Up Generated Files
7.4. Printing Code Generation Statistics

List of Figures

Figure 1. CUDA Compilation Trajectory
Figure 2. Two-Staged Compilation with Virtual and Real Architectures
Figure 3. Just-in-Time Compilation of Device Code
Figure 4. CUDA Separate Compilation Trajectory


Chapter 1. Introduction

1.1. Overview

1.1.1. CUDA Programming Model

The CUDA Toolkit targets a class of applications whose control part runs as a process on a general purpose computing device, and which use one or more NVIDIA GPUs as coprocessors for accelerating single program, multiple data (SPMD) parallel jobs. Such jobs are self-contained, in the sense that they can be executed and completed by a batch of GPU threads entirely without intervention by the host process, thereby gaining optimal benefit from the parallel graphics hardware.

The GPU code is implemented as a collection of functions in a language that is essentially C++, but with some annotations for distinguishing them from the host code, plus annotations for distinguishing different types of data memory that exist on the GPU. Such functions may have parameters, and they can be called using a syntax that is very similar to regular C function calling, but slightly extended for being able to specify the matrix of GPU threads that must execute the called function. During its lifetime, the host process may dispatch many parallel GPU tasks.

For more information on the CUDA programming model, consult the CUDA C++ Programming Guide.

1.1.2. CUDA Sources

Source files for CUDA applications consist of a mixture of conventional C++ host code, plus GPU device functions. The CUDA compilation trajectory separates the device functions from the host code, compiles the device functions using the proprietary NVIDIA compilers and assembler, compiles the host code using a C++ host compiler that is available, and afterwards embeds the compiled GPU functions as fatbinary images in the host object file. In the linking stage, specific CUDA runtime libraries are added for supporting remote SPMD procedure calling and for providing explicit GPU manipulation such as allocation of GPU memory buffers and host-GPU data transfer.

1.1.3. Purpose of NVCC

The compilation trajectory involves several splitting, compilation, preprocessing, and merging steps for each CUDA source file. It is the purpose of nvcc, the CUDA compiler driver, to hide the intricate details of CUDA compilation from developers. It accepts a range of conventional compiler options, such as for defining macros and include/library paths, and for steering the compilation process. All non-CUDA compilation steps are forwarded to a C++ host compiler that is supported by nvcc, and nvcc translates its options to appropriate host compiler command line options.

1.2. Supported Host Compilers

A general purpose C++ host compiler is needed by nvcc in the following situations:

‣ During non-CUDA phases (except the run phase), because these phases will be forwarded by nvcc to this compiler.
‣ During CUDA phases, for several preprocessing stages and host code compilation (see also The CUDA Compilation Trajectory).

nvcc assumes that the host compiler is installed with the standard method designed by the compiler provider. If the host compiler installation is non-standard, the user must make sure that the environment is set appropriately and use relevant nvcc compile options.

The following documents provide detailed information about supported host compilers:

‣ NVIDIA CUDA Installation Guide for Linux
‣ NVIDIA CUDA Installation Guide for Microsoft Windows

On all platforms, the default host compiler executable (gcc and g++ on Linux and cl.exe on Windows) found in the current execution search path will be used, unless specified otherwise with appropriate options (see File and Path Specifications).

Chapter 2. Compilation Phases

2.1. NVCC Identification Macro

nvcc predefines the following macros:

__NVCC__
    Defined when compiling C/C++/CUDA source files.

__CUDACC__
    Defined when compiling CUDA source files.

__CUDACC_RDC__
    Defined when compiling CUDA source files in relocatable device code mode (see NVCC Options for Separate Compilation).

__CUDACC_EWP__
    Defined when compiling CUDA source files in extensible whole program mode (see Options for Specifying Behavior of Compiler/Linker).

__CUDACC_DEBUG__
    Defined when compiling CUDA source files in the device-debug mode (see Options for Specifying Behavior of Compiler/Linker).

__CUDACC_RELAXED_CONSTEXPR__
    Defined when the --expt-relaxed-constexpr flag is specified on the command line. Refer to the CUDA C++ Programming Guide for more details.

__CUDACC_EXTENDED_LAMBDA__
    Defined when the --expt-extended-lambda or --extended-lambda flag is specified on the command line. Refer to the CUDA C++ Programming Guide for more details.

__CUDACC_VER_MAJOR__
    Defined with the major version number of nvcc.

__CUDACC_VER_MINOR__
    Defined with the minor version number of nvcc.

__CUDACC_VER_BUILD__
    Defined with the build version number of nvcc.

__NVCC_DIAG_PRAGMA_SUPPORT__
    Defined when the CUDA frontend compiler supports diagnostic control with the nv_diag_suppress, nv_diag_error, nv_diag_warning, nv_diag_default, nv_diag_once, and nv_diagnostic pragmas.

2.2. NVCC Phases

A compilation phase is a logical translation step that can be selected by command line options to nvcc. A single compilation phase can still be broken up by nvcc into smaller steps, but these smaller steps are just implementations of the phase: they depend on seemingly arbitrary capabilities of the internal tools that nvcc uses, and all of these internals may change with a new release of the CUDA Toolkit. Hence, only compilation phases are stable across releases, and although nvcc provides options to display the compilation steps that it executes, these are for debugging purposes only and must not be copied into build scripts.

nvcc phases are selected by a combination of command line options and input file name suffixes, and the execution of these phases may be modified by other command line options. In phase selection, the input file suffix defines the phase input, while the command line option defines the required output of the phase.

The following paragraphs list the recognized file name suffixes and the supported compilation phases. A full explanation of the n
