OpenCL | NVIDIA Developer


OpenCL™ (Open Computing Language) is a low-level API for heterogeneous computing that runs on CUDA-powered GPUs. Using the OpenCL API, developers can launch compute kernels, written in a limited subset of the C programming language, on a GPU.

NVIDIA is now OpenCL 3.0 conformant, and OpenCL 3.0 support is available in R465 and later drivers. It is supported on x86/x86_64 Linux and Windows only, and the drivers are available at www.nvidia.com/drivers

In addition to OpenCL, NVIDIA supports a variety of GPU-accelerated libraries and high-level programming solutions that enable developers to get started quickly with GPU computing.

OpenCL is a trademark of Apple Inc., used under license by Khronos.

NVIDIA OpenCL SDK Code Samples

OpenCL-Vulkan Interop Samples
Sine wave and box filter simulations demonstrating the use of external buffer and image sharing, and synchronization through external semaphores, between Vulkan and OpenCL.
Download Samples

OpenCL Multi Threads
This sample shows the implementation of multi-threaded heterogeneous computing workloads with tight cooperation between CPU and GPU. It uses the new OpenCL 1.1 features: user events, thread-safe API calls, and event callbacks.
Download: Windows (x86), Windows (x64), Linux/Mac

Using Inline PTX with OpenCL
A simple test application that demonstrates a new CUDA 4.0 driver ability to embed PTX in an OpenCL kernel.
Download: Windows (x86), Windows (x64), Linux/Mac

OpenCL Marching Cubes Isosurfaces
This sample extracts a geometric isosurface from a volume dataset using the marching cubes algorithm. It uses the scan (prefix sum) function from the oclScan SDK sample to perform stream compaction.
Download: Windows (x86), Windows (x64), Linux/Mac

OpenCL Tridiagonal
Efficient matrix solvers for a large number of small independent tridiagonal linear systems. OpenCL implementation of 3 different solvers: Parallel Cyclic Reduction, Cyclic Reduction, and Sweep (Gauss elimination + reordering optimization for full coalescing).
Download: Windows (x86), Windows (x64), Linux/Mac

OpenCL Device Query
This sample enumerates the properties of the OpenCL devices present in the system.
Download: Windows (x86), Windows (x64), Linux/Mac

OpenCL Bandwidth Test
This is a simple test program to measure the memcopy bandwidth of the GPU. It can currently measure device-to-device copy bandwidth, and host-to-device and device-to-host copy bandwidth for pageable and page-locked memory, using memory-mapped and direct access.
Download: Windows (x86), Windows (x64), Linux/Mac

OpenCL Vector Addition
Element-by-element addition of two 1-dimensional arrays. Implemented in OpenCL for CUDA GPUs, with functional comparison against a simple C++ host CPU implementation.
Download: Windows (x86), Windows (x64), Linux/Mac

OpenCL Dot Product
Dot product (scalar product) of a set of input vector pairs. Implemented in OpenCL for CUDA GPUs, with functional comparison against a simple C++ host CPU implementation.
Download: Windows (x86), Windows (x64), Linux/Mac

OpenCL Matrix Vector Multiplication
Simple matrix-vector multiplication example showing increasingly optimized implementations.
Download: Windows (x86), Windows (x64), Linux/Mac

OpenCL Overlapped Copy/Compute Sample
Element-by-element hypotenuse for two 1-dimensional arrays. Implemented in OpenCL for CUDA GPUs, with functional comparison against a simple C++ host CPU implementation. Demonstrates overlapped copy/compute in 2 command queues.
Download: Windows (x86), Windows (x64), Linux/Mac

OpenCL Simple Multi-GPU
This application demonstrates how to make use of multiple GPUs in OpenCL.
Download: Windows (x86), Windows (x64), Linux/Mac

OpenCL Simple OpenGL Interop
Simple program that demonstrates interoperability between OpenCL and OpenGL. The program modifies vertex positions with OpenCL and uses OpenGL to render the geometry.
Download: Windows (x86), Windows (x64), Linux/Mac

Simple OpenCL D3D10 Texture
Simple program that demonstrates Direct3D10 texture interoperability with OpenCL. The program creates a number of D3D10 textures (2D, 3D, and CubeMap) which are written to from OpenCL kernels. Direct3D then renders the results on the screen.
Download: Windows (x86), Windows (x64)

Simple OpenCL D3D9 Texture
Simple program that demonstrates Direct3D9 texture interoperability with OpenCL. The program creates a number of D3D9 textures (2D, 3D, and CubeMap) which are written to from OpenCL kernels. Direct3D then renders the results on the screen.
Download: Windows (x86), Windows (x64)

OpenCL Scan
This example demonstrates an efficient OpenCL implementation of parallel prefix sum, also known as "scan". Given an array of numbers, scan computes a new array in which each element is the sum of all the elements before it in the input array.
Download: Windows (x86), Windows (x64), Linux/Mac

OpenCL Parallel Reduction
A parallel sum reduction that computes the sum of large arrays of values. This sample demonstrates several important optimization strategies for parallel algorithms like reduction.
Download: Windows (x86), Windows (x64), Linux/Mac

OpenCL Matrix Transpose
Efficient matrix transpose.
Download: Windows (x86), Windows (x64), Linux/Mac

OpenCL Matrix Multiplication
This sample implements matrix multiplication and is exactly the same as Chapter 6 of the programming guide. It has been written for clarity of exposition to illustrate various OpenCL programming principles, not with the goal of providing the most performant generic kernel for matrix multiplication. CUBLAS provides high-performance matrix multiplication.
Download: Windows (x86), Windows (x64), Linux/Mac

OpenCL 3D FDTD
This sample applies a finite differences time domain progression stencil on a 3D surface.
Download: Windows (x86), Windows (x64), Linux/Mac

OpenCL DCT8x8
This sample demonstrates how the Discrete Cosine Transform (DCT) for 8x8 blocks can be implemented in OpenCL.
Download: Windows (x86), Windows (x64), Linux/Mac

OpenCL DirectX Texture Compressor (DXTC)
High-quality DXT compression using OpenCL. This example shows how to implement an existing computationally intensive CPU compression algorithm in parallel on the GPU and obtain an order-of-magnitude performance improvement.
Download: Windows (x86), Windows (x64), Linux/Mac

OpenCL Radix Sort
This sample demonstrates a very fast and efficient parallel radix sort implemented in OpenCL for CUDA GPUs.
Download: Windows (x86), Windows (x64), Linux/Mac

OpenCL Sorting Networks
This sample implements the bitonic sort algorithm for batches of short arrays.
Download: Windows (x86), Windows (x64), Linux/Mac

OpenCL Black-Scholes Option Pricing
This sample evaluates fair call and put prices for a given set of European options using the Black-Scholes formula.
Download: Windows (x86), Windows (x64), Linux/Mac

OpenCL Hidden Markov Model
This sample implements a Hidden Markov Model in OpenCL for the GPU.
Download: Windows (x86), Windows (x64), Linux/Mac

OpenCL Quasirandom Generator
This sample implements the Niederreiter quasirandom number generator and Moro's Inverse Cumulative Normal Distribution generator.
Download: Windows (x86), Windows (x64), Linux/Mac

OpenCL Mersenne Twister
This sample implements the Mersenne Twister random number generator and the Cartesian Box-Muller transformation on the GPU.
Download: Windows (x86), Windows (x64), Linux/Mac

OpenCL 64-bin and 256-bin Histogram
This sample demonstrates an efficient implementation of 64-bin and 256-bin histograms.
Download: Windows (x86), Windows (x64), Linux/Mac

OpenCL Post-Process OpenGL-Rendered Image
This sample shows how to post-process an image rendered in OpenGL using OpenCL.
Download: Windows (x86), Windows (x64), Linux/Mac

OpenCL Simple Texture 3D
Simple example that demonstrates the use of 3D textures in OpenCL.
Download: Windows (x86), Windows (x64), Linux/Mac

OpenCL Box Filter
Linear 2-dimensional variable-width box filter of an RGBA image. Implemented in OpenCL for CUDA GPUs, with performance comparison against simple C++ on the host CPU. Each of the R, G, B and A channels is treated independently, with results computed concurrently for each.
Download: Windows (x86), Windows (x64), Linux/Mac

OpenCL Sobel Filter
2-dimensional 3x3 Sobel magnitude filter of an RGBA image. Implemented in OpenCL for CUDA GPUs, with performance comparison against simple C++ on the host CPU. Gradient magnitude for each of the R, G and B channels is computed concurrently and independently, then combined into a single gradient intensity with linear weighting factors.
Download: Windows (x86), Windows (x64), Linux/Mac

OpenCL Median Filter
Multi-GPU enabled, 2-dimensional 3x3 median filter of an RGBA image. Implemented in OpenCL for CUDA GPUs, with performance comparison against simple C++ on the host CPU. Each of the R, G and B channels is treated independently, with results computed concurrently for each.
Download: Windows (x86), Windows (x64), Linux/Mac

OpenCL Separable Convolution
This sample implements a convolution filter of a 2D image with an arbitrary separable kernel.
Download: Windows (x86), Windows (x64), Linux/Mac

OpenCL Recursive Gaussian Filter
2-dimensional Gaussian blur filter of an RGBA image using the IRF method. Implemented in OpenCL for CUDA GPUs, with performance comparison against simple C++ on the host CPU. Each of the R, G, B and A channels is treated independently, with results computed concurrently for each.
Download: Windows (x86), Windows (x64), Linux/Mac

OpenCL Volume Rendering
This sample demonstrates basic volume rendering using 3D textures.
Download: Windows (x86), Windows (x64), Linux/Mac

OpenCL Particle Collision Simulation
Simulation of elastic collisions of a large number of bodies. Implemented in OpenCL for CUDA GPUs.
Download: Windows (x86), Windows (x64), Linux/Mac

OpenCL N-Body Physics Simulation
Gravitational simulation of a large number of bodies. Implemented in OpenCL for CUDA GPUs.
Download: Windows (x86), Windows (x64), Linux/Mac
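
The OpenCL Scan entry above defines scan as producing an array in which each element is the sum of all the elements before it, and the Marching Cubes entry mentions using scan for stream compaction. As a rough illustration only, the following is a sequential host-side sketch in Python of that definition and of how scan offsets can drive compaction. It is not the code from the SDK samples, which run in parallel on the GPU, and the names `exclusive_scan` and `compact` are hypothetical, chosen for this example.

```python
from typing import List, Sequence


def exclusive_scan(values: Sequence[int]) -> List[int]:
    """Sequential reference: out[i] is the sum of values[0..i-1]."""
    out: List[int] = []
    running = 0
    for v in values:
        out.append(running)  # sum of everything strictly before this element
        running += v
    return out


def compact(items: Sequence, keep_flags: Sequence[int]) -> list:
    """Stream compaction: keep items whose flag is 1, preserving order.

    The exclusive scan of the 0/1 flags gives each kept item its
    destination index, so every write targets a distinct slot. That
    property is what makes the pattern suitable for parallel hardware;
    here it is only emulated with a plain loop.
    """
    if not keep_flags:
        return []
    offsets = exclusive_scan(keep_flags)
    total = offsets[-1] + keep_flags[-1]  # number of kept items
    out = [None] * total
    for item, flag, dst in zip(items, keep_flags, offsets):
        if flag:
            out[dst] = item
    return out
```

For example, `exclusive_scan([3, 1, 7, 0, 4])` yields `[0, 3, 4, 11, 11]`, and `compact(list("abcde"), [1, 0, 1, 1, 0])` yields `['a', 'c', 'd']`. A reference like this is the kind of simple host CPU implementation a GPU result could be checked against.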


