OpenCL | NVIDIA Developer
OpenCL™ (Open Computing Language) is a low-level API for heterogeneous computing that runs on CUDA-powered GPUs. Using the OpenCL API, developers can launch compute kernels written using a limited subset of the C programming language on a GPU.

NVIDIA is now OpenCL 3.0 conformant, and OpenCL 3.0 is available on R465 and later drivers. This is supported on x86/x86_64 Linux and Windows only, and the drivers are available at www.nvidia.com/drivers

In addition to OpenCL, NVIDIA supports a variety of GPU-accelerated libraries and high-level programming solutions that enable developers to get started quickly with GPU Computing.

OpenCL is a trademark of Apple Inc., used under license by Khronos.

NVIDIA OpenCL SDK Code Samples

OpenCL-Vulkan Interop Samples
Sine wave and box filter simulations demonstrating the use of external buffer and image sharing, and synchronization through external semaphores, between Vulkan and OpenCL.
Download Samples

OpenCL Multi Threads
This sample shows the implementation of multi-threaded heterogeneous computing workloads with tight cooperation between CPU and GPU. It uses the new OpenCL 1.1 features: user events, thread-safe API calls, and event callbacks.
Download: Windows (x86), Windows (x64), Linux/Mac

Using Inline PTX with OpenCL
A simple test application that demonstrates a new CUDA 4.0 driver ability to embed PTX in an OpenCL kernel.
Download: Windows (x86), Windows (x64), Linux/Mac

OpenCL Marching Cubes Isosurfaces
This sample extracts a geometric isosurface from a volume dataset using the marching cubes algorithm. It uses the scan (prefix sum) function from the oclScan SDK sample to perform stream compaction.
Download: Windows (x86), Windows (x64), Linux/Mac

OpenCL Tridiagonal
Efficient matrix solvers for a large number of small independent tridiagonal linear systems. OpenCL implementation of 3 different solvers: Parallel Cyclic Reduction, Cyclic Reduction, and Sweep (Gauss elimination + reordering optimization for full coalescing).
Download: Windows (x86), Windows (x64), Linux/Mac

OpenCL Device Query
This sample enumerates the properties of the OpenCL devices present in the system.
Download: Windows (x86), Windows (x64), Linux/Mac

OpenCL Bandwidth Test
This is a simple test program to measure the memcopy bandwidth of the GPU. It can currently measure device-to-device copy bandwidth, and host-to-device and device-to-host copy bandwidth for pageable and page-locked memory, with memory-mapped and direct access.
Download: Windows (x86), Windows (x64), Linux/Mac

OpenCL Vector Addition
Element-by-element addition of two 1-dimensional arrays. Implemented in OpenCL for CUDA GPUs, with functional comparison against a simple C++ host CPU implementation.
Download: Windows (x86), Windows (x64), Linux/Mac

OpenCL Dot Product
Dot product (scalar product) of a set of input vector pairs. Implemented in OpenCL for CUDA GPUs, with functional comparison against a simple C++ host CPU implementation.
Download: Windows (x86), Windows (x64), Linux/Mac

OpenCL Matrix Vector Multiplication
Simple matrix-vector multiplication example showing increasingly optimized implementations.
Download: Windows (x86), Windows (x64), Linux/Mac

OpenCL Overlapped Copy/Compute Sample
Element-by-element hypotenuse for two 1-dimensional arrays. Implemented in OpenCL for CUDA GPUs, with functional comparison against a simple C++ host CPU implementation. Demonstrates overlapped copy/compute in 2 command queues.
Download: Windows (x86), Windows (x64), Linux/Mac

OpenCL Simple Multi-GPU
This application demonstrates how to make use of multiple GPUs in OpenCL.
Download: Windows (x86), Windows (x64), Linux/Mac

OpenCL Simple OpenGL Interop
Simple program which demonstrates interoperability between OpenCL and OpenGL. The program modifies vertex positions with OpenCL and uses OpenGL to render the geometry.
Download: Windows (x86), Windows (x64), Linux/Mac

Simple OpenCL D3D10 Texture
Simple program which demonstrates Direct3D10 texture interoperability with OpenCL. The program creates a number of D3D10 textures (2D, 3D, and CubeMap) which are written to from OpenCL kernels. Direct3D then renders the results on the screen.
Download: Windows (x86), Windows (x64)

Simple OpenCL D3D9 Texture
Simple program which demonstrates Direct3D9 texture interoperability with OpenCL. The program creates a number of D3D9 textures (2D, 3D, and CubeMap) which are written to from OpenCL kernels. Direct3D then renders the results on the screen.
Download: Windows (x86), Windows (x64)

OpenCL Scan
This example demonstrates an efficient OpenCL implementation of parallel prefix sum, also known as "scan". Given an array of numbers, scan computes a new array in which each element is the sum of all the elements before it in the input array.
Download: Windows (x86), Windows (x64), Linux/Mac

OpenCL Parallel Reduction
A parallel sum reduction that computes the sum of large arrays of values. This sample demonstrates several important optimization strategies for parallel algorithms like reduction.
Download: Windows (x86), Windows (x64), Linux/Mac

OpenCL Matrix Transpose
Efficient matrix transpose.
Download: Windows (x86), Windows (x64), Linux/Mac

OpenCL Matrix Multiplication
This sample implements matrix multiplication and is exactly the same as Chapter 6 of the programming guide. It has been written for clarity of exposition to illustrate various OpenCL programming principles, not with the goal of providing the most performant generic kernel for matrix multiplication. CUBLAS provides high-performance matrix multiplication.
Download: Windows (x86), Windows (x64), Linux/Mac

OpenCL 3D FDTD
This sample applies a finite-difference time-domain (FDTD) progression stencil on a 3D surface.
Download: Windows (x86), Windows (x64), Linux/Mac

OpenCL DCT8x8
This sample demonstrates how the Discrete Cosine Transform (DCT) for 8x8 blocks can be implemented in OpenCL.
Download: Windows (x86), Windows (x64), Linux/Mac

OpenCL DirectX Texture Compressor (DXTC)
High-quality DXT compression using OpenCL. This example shows how to implement an existing computationally intensive CPU compression algorithm in parallel on the GPU and obtain an order-of-magnitude performance improvement.
Download: Windows (x86), Windows (x64), Linux/Mac

OpenCL Radix Sort
This sample demonstrates a very fast and efficient parallel radix sort implemented in OpenCL for CUDA GPUs.
Download: Windows (x86), Windows (x64), Linux/Mac

OpenCL Sorting Networks
This sample implements the bitonic sort algorithm for batches of short arrays.
Download: Windows (x86), Windows (x64), Linux/Mac

OpenCL Black-Scholes Option Pricing
This sample evaluates fair call and put prices for a given set of European options using the Black-Scholes formula.
Download: Windows (x86), Windows (x64), Linux/Mac

OpenCL Hidden Markov Model
This sample implements a Hidden Markov Model in OpenCL for the GPU.
Download: Windows (x86), Windows (x64), Linux/Mac

OpenCL Quasirandom Generator
This sample implements the Niederreiter quasirandom number generator and Moro's Inverse Cumulative Normal Distribution generator.
Download: Windows (x86), Windows (x64), Linux/Mac

OpenCL Mersenne Twister
This sample implements the Mersenne Twister random number generator and the Cartesian Box-Muller transformation on the GPU.
Download: Windows (x86), Windows (x64), Linux/Mac

OpenCL 64-bin and 256-bin Histogram
This sample demonstrates efficient implementation of 64-bin and 256-bin histograms.
Download: Windows (x86), Windows (x64), Linux/Mac

OpenCL Post-Process OpenGL-Rendered Image
This sample shows how to post-process an image rendered in OpenGL using OpenCL.
Download: Windows (x86), Windows (x64), Linux/Mac

OpenCL Simple Texture 3D
Simple example that demonstrates the use of 3D textures in OpenCL.
Download: Windows (x86), Windows (x64), Linux/Mac

OpenCL Box Filter
Linear 2-dimensional variable-width Box Filter of an RGBA image. Implemented in OpenCL for CUDA GPUs, with performance comparison against simple C++ on the host CPU. Each of the R, G, B and A channels is treated independently, with results computed concurrently for each.
Download: Windows (x86), Windows (x64), Linux/Mac

OpenCL Sobel Filter
2-dimensional 3x3 Sobel Magnitude Filter of an RGBA image. Implemented in OpenCL for CUDA GPUs, with performance comparison against simple C++ on the host CPU. The gradient magnitude for each of the R, G and B channels is computed concurrently and independently, then combined into a single gradient intensity with linear weighting factors.
Download: Windows (x86), Windows (x64), Linux/Mac

OpenCL Median Filter
Multi-GPU enabled, 2-dimensional 3x3 Median Filter of an RGBA image. Implemented in OpenCL for CUDA GPUs, with performance comparison against simple C++ on the host CPU. Each of the R, G and B channels is treated independently, with results computed concurrently for each.
Download: Windows (x86), Windows (x64), Linux/Mac

OpenCL Separable Convolution
This sample implements a convolution filter of a 2D image with an arbitrary separable kernel.
Download: Windows (x86), Windows (x64), Linux/Mac

OpenCL Recursive Gaussian Filter
2-dimensional Gaussian Blur Filter of an RGBA image using the IRF method. Implemented in OpenCL for CUDA GPUs, with performance comparison against simple C++ on the host CPU. Each of the R, G, B and A channels is treated independently, with results computed concurrently for each.
Download: Windows (x86), Windows (x64), Linux/Mac

OpenCL Volume Rendering
This sample demonstrates basic volume rendering using 3D textures.
Download: Windows (x86), Windows (x64), Linux/Mac

OpenCL Particle Collision Simulation
Simulation of elastic collisions of a large number of bodies. Implemented in OpenCL for CUDA GPUs.
Download: Windows (x86), Windows (x64), Linux/Mac

OpenCL N-Body Physics Simulation
Gravitational simulation of a large number of bodies. Implemented in OpenCL for CUDA GPUs.
Download: Windows (x86), Windows (x64), Linux/Mac
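
As an illustration of what the OpenCL Scan sample computes, the following is a minimal serial sketch in plain C of the prefix sum described above, in which each output element is the sum of all input elements before it (an exclusive scan). It is not the SDK's code: the function name `exclusive_scan_ref` is hypothetical, and the loop is only the kind of simple host CPU reference that several samples mention using for functional comparison, not a parallel GPU kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Serial CPU reference for an exclusive prefix sum ("scan").
 * out[i] = in[0] + in[1] + ... + in[i-1], and out[0] = 0.
 * Hypothetical helper for illustration; not taken from the NVIDIA SDK. */
void exclusive_scan_ref(const unsigned int *in, unsigned int *out, size_t n)
{
    /* Both pointers must be valid whenever there is work to do. */
    assert(n == 0 || (in != NULL && out != NULL));

    unsigned int running = 0;   /* sum of all elements seen so far */
    for (size_t i = 0; i < n; ++i) {
        unsigned int v = in[i]; /* read first so in == out also works */
        out[i] = running;       /* sum of elements strictly before i */
        running += v;
    }
}
```

For the input {3, 1, 7, 0, 4}, this produces {0, 3, 4, 11, 11}. A reference like this is typically run on the host and compared element by element against the GPU result.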