CUDA FAQ | NVIDIA Developer
Sections: General Questions, Hardware and Architecture, Programming Questions

General Questions

Q: What is CUDA?
CUDA® is a parallel computing platform and programming model that enables dramatic increases in computing performance by harnessing the power of the graphics processing unit (GPU). Since its introduction in 2006, CUDA has been widely deployed through thousands of applications and published research papers, and is supported by an installed base of hundreds of millions of CUDA-enabled GPUs in notebooks, workstations, compute clusters and supercomputers. Applications in astronomy, biology, chemistry, physics, data mining, manufacturing, finance, and other computationally intense fields increasingly use CUDA to deliver the benefits of GPU acceleration.

Q: What is NVIDIA Tesla™?
With the world's first teraflop many-core processor, NVIDIA® Tesla™ computing solutions enable the necessary transition to energy-efficient parallel computing. With thousands of CUDA cores per processor, Tesla scales to solve the world's most important computing challenges quickly and accurately.

Q: What is OpenACC?
OpenACC is an open industry standard for compiler directives, or hints, that can be inserted in code written in C or Fortran, enabling the compiler to generate code that runs in parallel on multi-CPU and GPU-accelerated systems. OpenACC directives are an easy and powerful way to leverage the power of GPU computing while keeping your code compatible with non-accelerated, CPU-only systems. Learn more at /openacc.

Q: What kind of performance increase can I expect using GPU computing over CPU-only code?
This depends on how well the problem maps onto the architecture. For data-parallel applications, accelerations of more than two orders of magnitude have been seen. You can browse research, developer, application and partner stories on our CUDA In Action page.

Q: What operating systems does CUDA support?
CUDA supports Windows, Linux and Mac OS. For the full list, see the latest CUDA Toolkit Release Notes; the latest version is available at http://docs.nvidia.com.

Q: Which GPUs support running CUDA-accelerated applications?
CUDA is a standard feature in all NVIDIA GeForce, Quadro, and Tesla GPUs as well as NVIDIA GRID solutions.
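The CUDA-capable GPUs in a system can also be enumerated at run time. A minimal sketch using the CUDA runtime API (compiled with nvcc; assumes an NVIDIA GPU and driver are present):

```cuda
// Enumerate the CUDA-capable devices in the system and print each
// one's name and compute capability, using the CUDA runtime API.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    printf("%d CUDA-capable device(s) found\n", count);

    for (int d = 0; d < count; d++) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, d);
        // prop.major and prop.minor form the compute capability, e.g. 3.5
        printf("Device %d: %s (compute capability %d.%d, %d multiprocessors)\n",
               d, prop.name, prop.major, prop.minor,
               prop.multiProcessorCount);
    }
    return 0;
}
```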
A full list can be found on the CUDA GPUs page.

Q: What is the "compute capability"?
The compute capability of a GPU determines its general specifications and available features. For details, see the Compute Capabilities section in the CUDA C Programming Guide.

Q: Where can I find a good introduction to parallel programming?
There are several university courses online, technical webinars, article series and also several excellent books on parallel computing. These can be found on our CUDA Education page.

Hardware and Architecture

Q: Will I have to rewrite my CUDA kernels when the next new GPU architecture is released?
No. CUDA C/C++ provides an abstraction; it is a means for you to express how you want your program to execute. The compiler generates PTX code, which is also not hardware specific. At run time the PTX is compiled for a specific target GPU; this is the responsibility of the driver, which is updated every time a new GPU is released. It is possible that changes in the number of registers or the size of shared memory may open up opportunities for further optimization, but that is optional. So write your code now, and enjoy it running on future GPUs.

Q: Does CUDA support multiple graphics cards in one system?
Yes. Applications can distribute work across multiple GPUs. This is not done automatically, however, so the application has complete control. See the "multiGPU" example in the GPU Computing SDK for an example of programming multiple GPUs.

Q: Where can I find more information on NVIDIA GPU architecture?
Two good places to start are the Kepler Architecture Whitepaper and the Fermi Architecture Whitepaper.

Programming Questions

Q: I think I've found a bug in CUDA. How do I report it?
Sign up as a CUDA registered developer; once your application has been approved, you can file bugs, which will be reviewed by NVIDIA engineering. Your bug report should include a simple, self-contained piece of code that demonstrates the bug, along with a description of the bug and the expected behavior. Please include the following information with your report: machine configuration (CPU, motherboard, memory, etc.), operating system, CUDA Toolkit version, and display driver version. For Linux users, please attach an nvidia-bug-report.log, which is generated by running "nvidia-bug-report.sh".
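For illustration, the kind of self-contained repro a bug report asks for usually boils down to one small kernel plus explicit error checking. A hypothetical sketch (the kernel and the CHECK macro are inventions for this example, not part of any NVIDIA sample):

```cuda
// A hypothetical minimal repro: one kernel, explicit error checks,
// and a printed result, so an engineer can build and run it as-is.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if any CUDA runtime call fails.
#define CHECK(call)                                               \
    do {                                                          \
        cudaError_t err = (call);                                 \
        if (err != cudaSuccess) {                                 \
            fprintf(stderr, "CUDA error %s at line %d\n",         \
                    cudaGetErrorString(err), __LINE__);           \
            exit(1);                                              \
        }                                                         \
    } while (0)

__global__ void increment(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

int main() {
    const int n = 1024;
    int host[n];
    for (int i = 0; i < n; i++) host[i] = i;

    int *dev;
    CHECK(cudaMalloc(&dev, n * sizeof(int)));
    CHECK(cudaMemcpy(dev, host, n * sizeof(int), cudaMemcpyHostToDevice));
    increment<<<(n + 255) / 256, 256>>>(dev, n);
    CHECK(cudaGetLastError());  // catch kernel launch errors
    CHECK(cudaMemcpy(host, dev, n * sizeof(int), cudaMemcpyDeviceToHost));
    CHECK(cudaFree(dev));

    // Expected behavior: host[i] == i + 1 for all i.
    printf("host[0] = %d, host[%d] = %d\n", host[0], n - 1, host[n - 1]);
    return 0;
}
```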
Q: How does CUDA structure computation?
CUDA broadly follows the data-parallel model of computation. Typically each thread executes the same operation on different elements of the data in parallel. The data is split up into a 1D, 2D or 3D grid of blocks. Each block can be 1D, 2D or 3D in shape, and can consist of over 512 threads on current hardware. Threads within a thread block can cooperate via shared memory. Thread blocks are executed as smaller groups of threads known as "warps".

Q: Can the CPU and GPU run in parallel?
Kernel invocation in CUDA is asynchronous, so the driver will return control to the application as soon as it has launched the kernel. The "cudaThreadSynchronize()" API call should be used when measuring performance, to ensure that all device operations have completed before stopping the timer. CUDA functions that perform memory copies and that control graphics interoperability are synchronous and implicitly wait for all kernels to complete.

Q: Can I transfer data and run a kernel in parallel (for streaming applications)?
Yes, CUDA supports overlapping GPU computation and data transfers using CUDA streams. See the Asynchronous Concurrent Execution section of the CUDA C Programming Guide for more details.

Q: Is it possible to DMA directly into GPU memory from another PCI-E device?
GPUDirect allows you to DMA directly to GPU host memory. See the GPUDirect technology page for details.

Q: What are the peak transfer rates between the CPU and GPU?
The performance of memory transfers depends on many factors, including the size of the transfer and the type of system motherboard used. On PCI-Express 2.0 systems we have measured up to 6.0 GB/sec transfer rates. You can measure the bandwidth on your system using the bandwidthTest sample from the SDK. Transfers from page-locked memory are faster because the GPU can DMA directly from this memory. However, allocating too much page-locked memory can significantly affect the overall performance of the system, so allocate it with care.

Q: What is the precision of mathematical operations in CUDA?
All of the current range of NVIDIA GPUs, and all GPUs since GT200, have double-precision floating point. See the programming guide for more details.
All compute-capable NVIDIA GPUs support 32-bit integer and single-precision floating-point arithmetic. They follow the IEEE 754 standard for single-precision binary floating-point arithmetic, with some minor differences.

Q: Why are the results of my GPU computation slightly different from the CPU results?
There are many possible reasons. Floating-point computations are not guaranteed to give identical results across different processor architectures. The order of operations will often be different when implementing algorithms in a data-parallel way on the GPU. This is a very good reference on floating-point arithmetic: Precision & Performance: Floating Point and IEEE 754 Compliance for NVIDIA GPUs.

Q: Does CUDA support double-precision arithmetic?
Yes. GPUs with compute capability 1.3 and higher support double-precision floating point in hardware.

Q: How do I get double-precision floating point to work in my kernel?
You need to add the switch "-arch sm_13" (or a higher compute capability) to your nvcc command line; otherwise doubles will be silently demoted to floats. See the "Mandelbrot" sample included in the CUDA installer for an example of how to switch between different kernels based on the compute capability of the GPU.

Q: Can I read double-precision floats from texture?
The hardware doesn't support double-precision float as a texture format, but it is possible to use int2 and cast it to double, as long as you don't need interpolation.
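The int2 cast described above can be sketched with the texture reference API and the __hiloint2double intrinsic. This is only a sketch: the texture reference API shown here is legacy and has been deprecated in recent CUDA releases, and the function names other than the intrinsics are invented for this example.

```cuda
// Fetching double-precision values through a texture by viewing them
// as int2 and reassembling with __hiloint2double. No filtering or
// interpolation is available on this path.
#include <cuda_runtime.h>

// Legacy texture reference, bound to a double array viewed as int2.
texture<int2, 1, cudaReadModeElementType> tex_d;

__device__ double fetch_double(int i) {
    int2 v = tex1Dfetch(tex_d, i);
    return __hiloint2double(v.y, v.x);  // y = high 32 bits, x = low 32 bits
}

__global__ void copy_through_texture(double *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = fetch_double(i);
}

// Host side (sketch): bind the device buffer before launching.
//   cudaBindTexture(0, tex_d, dev_ptr, n * sizeof(double));
//   copy_through_texture<<<blocks, threads>>>(out, n);
```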