CUDA | FSU Research Computing Center


Software Category: Programming Languages and Compilers
Version: 11.1

Scientific simulations can often be significantly accelerated by hardware accelerators such as Graphics Processing Units (GPUs). GPUs are available on several HPC nodes.

The GPUs currently available are NVIDIA GeForce GTX 1080 Ti cards, which use the Pascal micro-architecture and have compute capability 6.1. The CUDA driver version is 11.1.

The following table shows the key parameters of the GPU at the RCC:

    Brand Name                            GTX 1080 Ti
    Compute Capability                    6.1
    Micro-Architecture                    Pascal
    Number of Streaming Multiprocessors   28
    Number of CUDA Cores                  3584
    Boost Clock                           1600 MHz
    Memory Capacity                       11 GB
    Memory Bandwidth                      ~484 GB/s
    FP32 Performance                      ~11.4 TFLOPS

Note about CUDA Availability

The CUDA module, the CUDA libraries, and the NVIDIA CUDA compilers are only available on the login nodes, the Spear nodes, and the GPU nodes, not on the compute nodes.

Compile CUDA code

To compile CUDA/C/C++ code, first load the cuda module:

    $ module load cuda/11.1

The CUDA compiler nvcc should then be immediately available:

    $ which nvcc
    /usr/local/cuda-11.1/bin/nvcc

You can check the CUDA version via:

    $ nvcc -V
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2020 NVIDIA Corporation
    Built on Mon_Oct_12_20:09:46_PDT_2020
    Cuda compilation tools, release 11.1, V11.1.105
    Build cuda_11.1.TC455_06.29190527_0

You can then compile your CUDA/C/C++ code with the nvcc compiler:

    $ nvcc -O3 -arch sm_61 -o a.out a.cu

In the above, the compiler option "-arch sm_61" specifies compute capability 6.1, which corresponds to the Pascal micro-architecture.
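To try the compile command, a small source file is needed. The following is a minimal sketch of what a.cu could contain; it is not taken from the RCC documentation. The kernel name vecAdd, the helpers blocksFor and check, the array size, and the block size of 256 threads are all illustrative choices, and the use of managed memory is an assumption made to keep the example short.

```cuda
// a.cu: hypothetical vector-addition example, not part of the RCC documentation
#include <cstdio>
#include <cstdlib>
#include <cmath>
#include <cuda_runtime.h>

// Each thread adds one pair of elements; the bounds check covers the last block.
__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

// Number of blocks needed to cover n elements (ceiling division).
int blocksFor(int n, int threadsPerBlock)
{
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// Abort with a readable message if a CUDA runtime call fails.
static void check(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

int main()
{
    const int n = 1 << 20;                 // illustrative problem size
    const size_t bytes = n * sizeof(float);

    // Managed memory is accessible from both the host and the device.
    float *a, *b, *c;
    check(cudaMallocManaged(&a, bytes), "cudaMallocManaged(a)");
    check(cudaMallocManaged(&b, bytes), "cudaMallocManaged(b)");
    check(cudaMallocManaged(&c, bytes), "cudaMallocManaged(c)");

    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    const int threads = 256;               // illustrative block size
    vecAdd<<<blocksFor(n, threads), threads>>>(a, b, c, n);
    check(cudaGetLastError(), "kernel launch");
    check(cudaDeviceSynchronize(), "cudaDeviceSynchronize");

    // Count elements that differ from the expected sum 1 + 2 = 3.
    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (fabsf(c[i] - 3.0f) > 1e-6f) ++mismatches;
    printf("n = %d, mismatches = %d\n", n, mismatches);

    cudaFree(a); cudaFree(b); cudaFree(c);
    return mismatches == 0 ? 0 : 1;
}
```

On a node with a working GPU, the program should report zero mismatches. Checking cudaGetLastError after the launch is a design choice worth keeping: a binary built for an architecture the card does not support typically fails at launch time, and without the check that failure can go unnoticed.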
Submit a CUDA Job

To submit a GPU job to the HPC cluster, first create a SLURM submit script sub.sh similar to the following:

    #!/bin/bash
    #SBATCH -N 1
    #SBATCH -n 1
    #SBATCH -J "cuda-job"
    #SBATCH -t 4:00:00
    #SBATCH -p backfill
    #SBATCH --gres=gpu:1
    #SBATCH --mail-type=ALL

    # load the cuda module to set up the environment
    module load cuda/11.1

    # the following line should provide the full path to the cuda compiler
    which nvcc

    # execute your cuda executable a.out
    srun -n 1 ./a.out > output.txt

Not all compute nodes have GPU cards, and a GPU node contains up to 4 GPU cards. To request a compute node with GPUs, add the following line to your submit script, replacing [1-4] with the number of GPU cards you need (from 1 to 4):

    #SBATCH --gres=gpu:[1-4]

The following example, deviceQuery.cu, prints selected properties of GPU device 0:

    #include <stdio.h>

    int main() {
        int dev = 0;              /* query the first GPU */
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        printf("device id %d, name %s\n", dev, prop.name);
        printf("number of multi-processors = %d\n",
               prop.multiProcessorCount);
        printf("Total constant memory: %4.2f kb\n",
               prop.totalConstMem / 1024.0);
        printf("Shared memory per block: %4.2f kb\n",
               prop.sharedMemPerBlock / 1024.0);
        printf("Total registers per block: %d\n",
               prop.regsPerBlock);
        printf("Maximum threads per block: %d\n",
               prop.maxThreadsPerBlock);
        printf("Maximum threads per multi-processor: %d\n",
               prop.maxThreadsPerMultiProcessor);
        /* a warp consists of 32 threads */
        printf("Maximum number of warps per multi-processor %d\n",
               prop.maxThreadsPerMultiProcessor / 32);
        return 0;
    }

Compile the code via:

    $ module load cuda
    $ nvcc -o deviceQuery deviceQuery.cu

After a successful run, the output will be similar to the following:

    device id 0, name GeForce GTX 1080 Ti
    number of multi-processors = 28
    Total constant memory: 64.00 kb
    Shared memory per block: 48.00 kb
    Total registers per block: 65536
    Maximum threads per block: 1024
    Maximum threads per multi-processor: 2048
    Maximum number of warps per multi-processor 64
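Because a job can request up to 4 GPU cards, it can be useful to query every device the job can see and not only device 0. The following is a minimal sketch of such a variant, not part of the RCC documentation. The file name listDevices.cu and the helper warpsPerSM are hypothetical. Whether a job sees only the cards it was allocated depends on how SLURM is configured on the cluster, which is an assumption here.

```cuda
// listDevices.cu: hypothetical multi-GPU variant of the device query
#include <cstdio>
#include <cuda_runtime.h>

// Warps per multi-processor, given the resident-thread limit and the warp size.
int warpsPerSM(int maxThreadsPerSM, int warpSize)
{
    return maxThreadsPerSM / warpSize;
}

int main()
{
    // Ask the runtime how many GPUs are visible to this process.
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }
    printf("visible GPUs: %d\n", count);

    // Print a one-line summary for each visible device.
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            fprintf(stderr, "device %d: query failed: %s\n",
                    dev, cudaGetErrorString(err));
            continue;
        }
        printf("device %d: %s, compute capability %d.%d, "
               "%d multi-processors, %.1f GB global memory, "
               "%d warps per multi-processor\n",
               dev, prop.name, prop.major, prop.minor,
               prop.multiProcessorCount,
               prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0),
               warpsPerSM(prop.maxThreadsPerMultiProcessor, prop.warpSize));
    }
    return 0;
}
```

It can be compiled in the same way as deviceQuery.cu, for example with nvcc -o listDevices listDevices.cu, and run inside a job that requests GPUs through the --gres option.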


