CUDA Toolkit Documentation

CUDA Toolkit Documentation v11.6.1, last updated February 22, 2022

Release Notes: The Release Notes for the CUDA Toolkit.

CUDA Features Archive: The list of CUDA features by release.
EULA: The CUDA Toolkit End User License Agreement applies to the NVIDIA CUDA Toolkit, the NVIDIA CUDA Samples, the NVIDIA Display Driver, NVIDIA Nsight tools (Visual Studio Edition), and the associated documentation on CUDA APIs, programming model and development tools. If you do not agree with the terms and conditions of the license agreement, then do not download or use the software.

Installation Guides

Quick Start Guide: This guide provides the minimal first-step instructions for installing and verifying CUDA on a standard system.

Installation Guide Windows: This guide discusses how to install and check for correct operation of the CUDA Development Tools on Microsoft Windows systems.

Installation Guide Linux: This guide discusses how to install and check for correct operation of the CUDA Development Tools on GNU/Linux systems.

Programming Guides

Programming Guide: This guide provides a detailed discussion of the CUDA programming model and programming interface. It then describes the hardware implementation, and provides guidance on how to achieve maximum performance. The appendices include a list of all CUDA-enabled devices, a detailed description of all extensions to the C++ language, listings of supported mathematical functions, C++ features supported in host and device code, details on texture fetching, and technical specifications of various devices, and conclude by introducing the low-level driver API.

Best Practices Guide: This guide presents established parallelization and optimization techniques and explains coding metaphors and idioms that can greatly simplify programming for CUDA-capable GPU architectures. The intent is to provide guidelines for obtaining the best performance from NVIDIA GPUs using the CUDA Toolkit.

Maxwell Compatibility Guide: This application note is intended to help developers ensure that their NVIDIA CUDA applications will run properly on GPUs based on the NVIDIA Maxwell Architecture. This document provides guidance to ensure that your software applications are compatible with Maxwell.
Pascal Compatibility Guide: This application note is intended to help developers ensure that their NVIDIA CUDA applications will run properly on GPUs based on the NVIDIA Pascal Architecture. This document provides guidance to ensure that your software applications are compatible with Pascal.

Volta Compatibility Guide: This application note is intended to help developers ensure that their NVIDIA CUDA applications will run properly on GPUs based on the NVIDIA Volta Architecture. This document provides guidance to ensure that your software applications are compatible with Volta.

Turing Compatibility Guide: This application note is intended to help developers ensure that their NVIDIA CUDA applications will run properly on GPUs based on the NVIDIA Turing Architecture. This document provides guidance to ensure that your software applications are compatible with Turing.

NVIDIA Ampere GPU Architecture Compatibility Guide: This application note is intended to help developers ensure that their NVIDIA CUDA applications will run properly on GPUs based on the NVIDIA Ampere GPU Architecture. This document provides guidance to ensure that your software applications are compatible with NVIDIA Ampere GPU architecture.

Kepler Tuning Guide: Kepler is NVIDIA's 3rd-generation architecture for CUDA compute applications. Applications that follow the best practices for the Fermi architecture should typically see speedups on the Kepler architecture without any code changes. This guide summarizes the ways that applications can be fine-tuned to gain additional speedups by leveraging Kepler architectural features.

Maxwell Tuning Guide: Maxwell is NVIDIA's 4th-generation architecture for CUDA compute applications. Applications that follow the best practices for the Kepler architecture should typically see speedups on the Maxwell architecture without any code changes. This guide summarizes the ways that applications can be fine-tuned to gain additional speedups by leveraging Maxwell architectural features.
Pascal Tuning Guide: Pascal is NVIDIA's 5th-generation architecture for CUDA compute applications. Applications that follow the best practices for the Maxwell architecture should typically see speedups on the Pascal architecture without any code changes. This guide summarizes the ways that applications can be fine-tuned to gain additional speedups by leveraging Pascal architectural features.

Volta Tuning Guide: Volta is NVIDIA's 6th-generation architecture for CUDA compute applications. Applications that follow the best practices for the Pascal architecture should typically see speedups on the Volta architecture without any code changes. This guide summarizes the ways that applications can be fine-tuned to gain additional speedups by leveraging Volta architectural features.

Turing Tuning Guide: Turing is NVIDIA's 7th-generation architecture for CUDA compute applications. Applications that follow the best practices for the Pascal architecture should typically see speedups on the Turing architecture without any code changes. This guide summarizes the ways that applications can be fine-tuned to gain additional speedups by leveraging Turing architectural features.

NVIDIA Ampere GPU Architecture Tuning Guide: NVIDIA Ampere GPU Architecture is NVIDIA's 8th-generation architecture for CUDA compute applications. Applications that follow the best practices for the NVIDIA Volta architecture should typically see speedups on the NVIDIA Ampere GPU Architecture without any code changes. This guide summarizes the ways that applications can be fine-tuned to gain additional speedups by leveraging NVIDIA Ampere GPU Architecture's features.

PTX ISA: This guide provides detailed instructions on the use of PTX, a low-level parallel thread execution virtual machine and instruction set architecture (ISA). PTX exposes the GPU as a data-parallel computing device.

Developer Guide for Optimus: This document explains how CUDA APIs can be used to query for GPU capabilities in NVIDIA Optimus systems.

Video Decoder: NVIDIA Video Decoder (NVCUVID) is deprecated. Instead, use the NVIDIA Video Codec SDK (https://developer.nvidia.com/nvidia-video-codec-sdk).
PTX Interoperability: This document shows how to write PTX that is ABI-compliant and interoperable with other CUDA code.

Inline PTX Assembly: This document shows how to inline PTX (parallel thread execution) assembly language statements into CUDA code. It describes the available assembler statement parameters and constraints, and also lists some pitfalls that you may encounter.

CUDA Occupancy Calculator: The CUDA Occupancy Calculator allows you to compute the multiprocessor occupancy of a GPU by a given CUDA kernel.

CUDA API References

CUDA Runtime API: Fields in structures might appear in an order that is different from the order of declaration.

CUDA Driver API: Fields in structures might appear in an order that is different from the order of declaration.

CUDA Math API: The CUDA math API.

cuBLAS: The cuBLAS library is an implementation of BLAS (Basic Linear Algebra Subprograms) on top of the NVIDIA CUDA runtime. It allows the user to access the computational resources of an NVIDIA Graphics Processing Unit (GPU), but does not auto-parallelize across multiple GPUs.

cuDLA API: The cuDLA API.

NVBLAS: The NVBLAS library is a multi-GPU accelerated drop-in BLAS (Basic Linear Algebra Subprograms) built on top of the NVIDIA cuBLAS Library.

nvJPEG: The nvJPEG Library provides high-performance GPU-accelerated JPEG decoding functionality for image formats commonly used in deep learning and hyperscale multimedia applications.

cuFFT: The cuFFT library user guide.

CUB: The user guide for CUB.

CUDA C++ Standard Library: The API reference for libcu++, the CUDA C++ standard library.

cuFile API Reference Guide: The NVIDIA® GPUDirect® Storage cuFile API Reference Guide provides information about the preliminary version of the cuFile API reference guide that is used in applications and frameworks to leverage GDS technology and describes the intent, context, and operation of those APIs, which are part of the GDS technology.

cuRAND: The cuRAND library user guide.

cuSPARSE: The cuSPARSE library user guide.
NPP: NVIDIA NPP is a library of functions for performing CUDA accelerated processing. The initial set of functionality in the library focuses on imaging and video processing and is widely applicable for developers in these areas. NPP will evolve over time to encompass more of the compute heavy tasks in a variety of problem domains. The NPP library is written to maximize flexibility, while maintaining high performance.

NVRTC (Runtime Compilation): NVRTC is a runtime compilation library for CUDA C++. It accepts CUDA C++ source code in character string form and creates handles that can be used to obtain the PTX. The PTX string generated by NVRTC can be loaded by cuModuleLoadData and cuModuleLoadDataEx, and linked with other modules by cuLinkAddData of the CUDA Driver API. This facility can often provide optimizations and performance not possible in a purely offline static compilation.

Thrust: The Thrust getting started guide.

cuSOLVER: The cuSOLVER library user guide.

PTX Compiler API References

PTX Compiler APIs: This guide shows how to compile a PTX program into GPU assembly code using APIs provided by the static PTX Compiler library.

Miscellaneous

CUDA Samples: This document contains a complete listing of the code samples that are included with the NVIDIA CUDA Toolkit. It describes each code sample, lists the minimum GPU specification, and provides links to the source code and white papers if available.

CUDA Demo Suite: This document describes the demo applications shipped with the CUDA Demo Suite.

CUDA on WSL: This guide is intended to help users get started with using NVIDIA CUDA on Windows Subsystem for Linux (WSL 2). The guide covers installation and running CUDA applications and containers in this environment.

Multi-Instance GPU (MIG): This edition of the user guide describes the Multi-Instance GPU feature of the NVIDIA® A100 GPU.

CUDA Compatibility: This document describes CUDA Compatibility, including CUDA Enhanced Compatibility and CUDA Forward Compatible Upgrade.

CUPTI: The CUPTI-API. The CUDA Profiling Tools Interface (CUPTI) enables the creation of profiling and tracing tools that target CUDA applications.

Debugger API: The CUDA debugger API.
GPUDirect RDMA: A technology introduced in Kepler-class GPUs and CUDA 5.0, enabling a direct path for communication between the GPU and a third-party peer device on the PCI Express bus when the devices share the same upstream root complex using standard features of PCI Express. This document introduces the technology and describes the steps necessary to enable a GPUDirect RDMA connection to NVIDIA GPUs within the Linux device driver model.

GPUDirect Storage: The documentation for GPUDirect Storage.

vGPU: vGPUs that support CUDA.

Tools

NVCC: This is a reference document for nvcc, the CUDA compiler driver. nvcc accepts a range of conventional compiler options, such as for defining macros and include/library paths, and for steering the compilation process.

CUDA-GDB: The NVIDIA tool for debugging CUDA applications running on Linux and QNX, providing developers with a mechanism for debugging CUDA applications running on actual hardware. CUDA-GDB is an extension to the x86-64 port of GDB, the GNU Project debugger.

CUDA-MEMCHECK: CUDA-MEMCHECK is a suite of runtime tools capable of precisely detecting out-of-bounds and misaligned memory access errors, checking device allocation leaks, reporting hardware errors and identifying shared memory data access hazards.

Compute Sanitizer: The user guide for Compute Sanitizer.

Nsight Eclipse Plugins Installation Guide: The Nsight Eclipse Plugins installation guide.

Nsight Eclipse Plugins Edition: The Nsight Eclipse Plugins Edition getting started guide.

Nsight Systems: The documentation for Nsight Systems.

Nsight Compute: NVIDIA Nsight Compute is the next-generation interactive kernel profiler for CUDA applications. It provides detailed performance metrics and API debugging via a user interface and command line tool.

Nsight Visual Studio Edition: The documentation for Nsight Visual Studio Edition.

Profiler: This is the guide to the Profiler.

CUDA Binary Utilities: The application notes for cuobjdump, nvdisasm, and nvprune.
White Papers

Floating Point and IEEE 754: A number of issues related to floating point accuracy and compliance are a frequent source of confusion on both CPUs and GPUs. The purpose of this white paper is to discuss the most common issues related to NVIDIA GPUs and to supplement the documentation in the CUDA C++ Programming Guide.

Incomplete-LU and Cholesky Preconditioned Iterative Methods: In this white paper we show how to use the cuSPARSE and cuBLAS libraries to achieve a 2x speedup over CPU in the incomplete-LU and Cholesky preconditioned iterative methods. We focus on the Bi-Conjugate Gradient Stabilized and Conjugate Gradient iterative methods, that can be used to solve large sparse nonsymmetric and symmetric positive definite linear systems, respectively. Also, we comment on the parallel sparse triangular solve, which is an essential building block in these algorithms.

Application Notes

CUDA for Tegra: This application note provides an overview of NVIDIA® Tegra® memory architecture and considerations for porting code from a discrete GPU (dGPU) attached to an x86 system to the Tegra® integrated GPU (iGPU). It also discusses EGL interoperability.

Compiler SDK

libNVVM API: The libNVVM API.

libdevice User's Guide: The libdevice library is an LLVM bitcode library that implements common functions for GPU kernels.

NVVM IR: NVVM IR is a compiler IR (intermediate representation) based on the LLVM IR. The NVVM IR is designed to represent GPU compute kernels (for example, CUDA kernels). High-level language front-ends, like the CUDA C compiler front-end, can generate NVVM IR.
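
The Floating Point and IEEE 754 white paper listed above deals with floating-point accuracy questions. As a small illustration of that kind of behavior, the following host-only C++ sketch shows two well-known IEEE 754 effects: floating-point addition is not associative, and a fused multiply-add (one rounding) can differ from a separate multiply and add (two roundings). This is not code from the white paper. The function names are hypothetical, chosen only for illustration, and the sketch runs on the CPU without any CUDA dependency.

```cpp
#include <cmath>

// Hypothetical helper: computes (a + b) + c in single precision.
// The volatile temporary forces the intermediate sum to be rounded to float.
float sum_left(float a, float b, float c) {
    volatile float t = a + b;
    return t + c;
}

// Hypothetical helper: computes a + (b + c) in single precision.
float sum_right(float a, float b, float c) {
    volatile float t = b + c;
    return a + t;
}

// Hypothetical helper: x*x - c with the product rounded to float first
// (two roundings). The volatile store also keeps the compiler from
// contracting the expression into a fused multiply-add.
float square_minus_separate(float x, float c) {
    volatile float p = x * x;
    return p - c;
}

// Hypothetical helper: x*x - c as a fused multiply-add, so the exact
// product feeds the subtraction and only the final result is rounded.
float square_minus_fused(float x, float c) {
    return std::fma(x, x, -c);
}
```

With a = 1e20f, b = -1e20f and c = 1.0f, sum_left returns 1 while sum_right returns 0, because the 1 is lost when it is added to a value of magnitude 1e20. With x = 1 + 2^-12 and c = 1 + 2^-11, the exact square of x is 1 + 2^-11 + 2^-24; the separate version rounds the product to 1 + 2^-11 and returns 0, while the fused version returns 2^-24.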


