OpenCL - Wikipedia


Open standard for programming heterogeneous computing systems, such as CPUs or GPUs.

Not to be confused with OpenGL. For the cryptographic library initially known as OpenCL, see Botan (programming library).

OpenCL API
Original author(s): Apple Inc.
Developer(s): Khronos Group
Initial release: August 28, 2009
Stable release: 3.0.11[1] / May 6, 2022
Written in: C with C++ bindings
Operating system: Android (vendor dependent),[2] FreeBSD,[3] Linux, macOS (via Pocl), Windows
Platform: ARMv7, ARMv8,[4] Cell, IA-32, Power, x86-64
Type: Heterogeneous computing API
License: OpenCL specification license
Website: www.khronos.org/opencl/

OpenCL C/C++ and C++ for OpenCL
Paradigm: Imperative (procedural), structured, (C++ only) object-oriented, generic programming
Family: C
Stable release: OpenCL C++ 1.0 revision V2.2-11;[5] OpenCL C 3.0 revision V3.0.11;[6] C++ for OpenCL 1.0 revision 2[7] / March 31, 2021
Typing discipline: Static, weak, manifest, nominal
Implementation language: Implementation specific
Filename extensions: .cl .clcpp
Website: www.khronos.org/opencl
Major implementations: AMD, Gallium Compute, IBM, Intel NEO, Intel SDK, Texas Instruments, Nvidia, POCL, Arm
Influenced by: C99, CUDA, C++14, C++17
OpenCL (Open Computing Language) is a framework for writing programs that execute across heterogeneous platforms consisting of central processing units (CPUs), graphics processing units (GPUs), digital signal processors (DSPs), field-programmable gate arrays (FPGAs) and other processors or hardware accelerators. OpenCL specifies programming languages (based on C99, C++14 and C++17) for programming these devices and application programming interfaces (APIs) to control the platform and execute programs on the compute devices. OpenCL provides a standard interface for parallel computing using task- and data-based parallelism.

OpenCL is an open standard maintained by the non-profit technology consortium Khronos Group. Conformant implementations are available from Altera, AMD, ARM, Creative, IBM, Imagination, Intel, Nvidia, Qualcomm, Samsung, Vivante, Xilinx, and ZiiLABS.[8][9]

Overview

OpenCL views a computing system as consisting of a number of compute devices, which might be central processing units (CPUs) or "accelerators" such as graphics processing units (GPUs), attached to a host processor (a CPU). It defines a C-like language for writing programs. Functions executed on an OpenCL device are called "kernels".[10]: 17
A single compute device typically consists of several compute units, which in turn comprise multiple processing elements (PEs). A single kernel execution can run on all or many of the PEs in parallel. How a compute device is subdivided into compute units and PEs is up to the vendor; a compute unit can be thought of as a "core", but the notion of core is hard to define across all the types of devices supported by OpenCL (or even within the category of "CPUs"),[11]: 49–50 and the number of compute units may not correspond to the number of cores claimed in vendors' marketing literature (which may actually be counting SIMD lanes).[12]

In addition to its C-like programming language, OpenCL defines an application programming interface (API) that allows programs running on the host to launch kernels on the compute devices and manage device memory, which is (at least conceptually) separate from host memory. Programs in the OpenCL language are intended to be compiled at run-time, so that OpenCL-using applications are portable between implementations for various host devices.[13] The OpenCL standard defines host APIs for C and C++; third-party APIs exist for other programming languages and platforms such as Python,[14] Java, Perl,[15] D[16] and .NET.[11]: 15 An implementation of the OpenCL standard consists of a library that implements the API for C and C++, and an OpenCL C compiler for the compute device(s) targeted.

In order to open the OpenCL programming model to other languages or to protect the kernel source from inspection, the Standard Portable Intermediate Representation (SPIR)[17] can be used as a target-independent way to ship kernels between a front-end compiler and the OpenCL back-end.
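To make the execution model described above more concrete, the following plain C sketch emulates on the host what a kernel launch does conceptually: the same function body is invoked once per work-item, each time with a different global index. This is an illustration only, not OpenCL API code. The names saxpy_args, saxpy_work_item and launch_1d are hypothetical, and the sequential loop stands in for processing elements that would run in parallel on a real device.

```c
#include <assert.h>
#include <stddef.h>

/* Arguments shared by every work-item of the emulated kernel
 * (hypothetical names, chosen for this sketch). */
struct saxpy_args {
    float a;
    const float *x;
    float *y;
};

/* Body of the emulated kernel: one invocation handles one element,
 * selected by the global id it is given. */
void saxpy_work_item(size_t global_id, const struct saxpy_args *args)
{
    args->y[global_id] = args->a * args->x[global_id] + args->y[global_id];
}

/* Stand-in for a one-dimensional kernel launch: runs the kernel body
 * once per work-item. A real device may run these in parallel. */
void launch_1d(size_t global_size, const struct saxpy_args *args)
{
    assert(args != NULL);
    for (size_t gid = 0; gid < global_size; ++gid) {
        saxpy_work_item(gid, args);
    }
}
```

Calling launch_1d(n, &args) with n equal to the array length updates every element of y; because each work-item touches only its own index, the iterations are independent of one another, which is what allows a device to spread them over its processing elements.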
More recently, Khronos Group has ratified SYCL,[18] a higher-level programming model for OpenCL as a single-source eDSL based on pure C++17 to improve programming productivity. People interested in C++ kernels but not in the SYCL single-source programming style can use C++ features with compute kernel sources written in the "C++ for OpenCL" language.[19]

Memory hierarchy

OpenCL defines a four-level memory hierarchy for the compute device:[13]

- global memory: shared by all processing elements, but has high access latency (__global);
- read-only memory: smaller, low latency, writable by the host CPU but not the compute devices (__constant);
- local memory: shared by a group of processing elements (__local);
- per-element private memory (registers; __private).

Not every device needs to implement each level of this hierarchy in hardware. Consistency between the various levels in the hierarchy is relaxed, and only enforced by explicit synchronization constructs, notably barriers.

Devices may or may not share memory with the host CPU.[13] The host API provides handles on device memory buffers and functions to transfer data back and forth between host and devices.

OpenCL kernel language

The programming language that is used to write compute kernels is called the kernel language. OpenCL adopts C/C++-based languages to specify the kernel computations performed on the device, with some restrictions and additions to facilitate efficient mapping to the heterogeneous hardware resources of accelerators. Traditionally, OpenCL C was used to program the accelerators in the OpenCL standard; later, the C++ for OpenCL kernel language was developed, which inherits all functionality from OpenCL C but allows C++ features to be used in the kernel sources.
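As a rough illustration of how the global, local and private levels of the memory hierarchy relate, the following plain C sketch emulates a launch in which each group of work-items stages its inputs in a scratch array before one of them reduces the staged values. It is a sequential host-side emulation written for this explanation, not kernel code. The function name group_sums and the bound MAX_LOCAL_SIZE are assumptions, and the boundary between the two inner loops stands in for the barrier a real kernel would need.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LOCAL_SIZE 256  /* assumed upper bound for this sketch */

/* Emulates a 1-D launch in which every work-group copies its inputs into
 * a scratch array (standing in for __local memory) and then work-item 0
 * of the group writes the group's sum back to "global" memory. */
void group_sums(const float *global_in, float *global_out,
                size_t global_size, size_t local_size)
{
    assert(local_size > 0 && local_size <= MAX_LOCAL_SIZE);
    assert(global_size % local_size == 0);

    size_t num_groups = global_size / local_size;
    for (size_t group = 0; group < num_groups; ++group) {
        float local_scratch[MAX_LOCAL_SIZE];  /* shared within one group */

        /* Phase 1: each work-item loads one element from global memory
         * into a private variable, then publishes it in local memory. */
        for (size_t lid = 0; lid < local_size; ++lid) {
            float private_value = global_in[group * local_size + lid];
            local_scratch[lid] = private_value;
        }

        /* The end of the loop above plays the role of a barrier: nothing
         * reads local_scratch until every work-item has written to it. */

        /* Phase 2: work-item 0 reduces the staged values. */
        float sum = 0.0f;
        for (size_t lid = 0; lid < local_size; ++lid) {
            sum += local_scratch[lid];
        }
        global_out[group] = sum;
    }
}
```

On a real device the two phases run concurrently across work-items, so without an explicit barrier between them a work-item could read a scratch entry that its neighbour has not yet written; the sequential emulation hides that hazard.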
OpenCL C language

OpenCL C[20] is a C99-based language dialect adapted to fit the device model in OpenCL. Memory buffers reside in specific levels of the memory hierarchy, and pointers are annotated with the region qualifiers __global, __local, __constant, and __private, reflecting this. Instead of a device program having a main function, OpenCL C functions are marked __kernel to signal that they are entry points into the program to be called from the host program. Function pointers, bit fields and variable-length arrays are omitted, and recursion is forbidden.[21] The C standard library is replaced by a custom set of standard functions, geared toward math programming.

OpenCL C is extended to facilitate use of parallelism with vector types and operations, synchronization, and functions to work with work-items and work-groups.[21] In particular, besides scalar types such as float and double, which behave similarly to the corresponding types in C, OpenCL provides fixed-length vector types such as float4 (4-vector of single-precision floats); such vector types are available in lengths two, three, four, eight and sixteen for various base types.[20]: § 6.1.2 Vectorized operations on these types are intended to map onto SIMD instruction sets, e.g., SSE or VMX, when running OpenCL programs on CPUs.[13] Other specialized types include 2-d and 3-d image types.[20]: 10–11

Example: matrix–vector multiplication

[Figure: Each invocation (work-item) of the kernel takes a row of the green matrix (A in the code), multiplies this row with the red vector (x) and places the result in an entry of the blue vector (y). The number of columns n is passed to the kernel as ncols; the number of rows is implicit in the number of work-items produced by the host program.]

The following is a matrix–vector multiplication algorithm in OpenCL C.

    // Multiplies A*x, leaving the result in y.
    // A is a row-major matrix, meaning the (i,j) element is at A[i*ncols+j].
    __kernel void matvec(__global const float *A, __global const float *x,
                         uint ncols, __global float *y)
    {
        size_t i = get_global_id(0);              // Global id, used as the row index
        __global float const *a = &A[i*ncols];    // Pointer to the i'th row
        float sum = 0.f;                          // Accumulator for dot product
        for (size_t j = 0; j < ncols; j++) {
            sum += a[j] * x[j];
        }
        y[i] = sum;
    }

Example: computing the FFT

The host program below sets up and launches a kernel that computes a 1024-point FFT.

    #include <stdio.h>
    #include <time.h>
    #include "CL/opencl.h"

    #define NUM_ENTRIES 1024

    int main() // (int argc, const char *argv[])
    {
        // CONSTANTS
        // The source code of the kernel is represented as a string
        // located inside file: "fft1D_1024_kernel_src.cl". For the details see the next listing.
        const char *KernelSource =
            #include "fft1D_1024_kernel_src.cl"
            ;

        // Looking up the available GPUs
        cl_uint num = 1;    // written by clGetDeviceIDs, so it must not be const
        clGetDeviceIDs(NULL, CL_DEVICE_TYPE_GPU, 0, NULL, &num);

        cl_device_id devices[1];
        // request at most one device, because the array holds only one entry
        clGetDeviceIDs(NULL, CL_DEVICE_TYPE_GPU, 1, devices, NULL);

        // create a compute context with GPU device
        cl_context context = clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU, NULL, NULL, NULL);

        // create a command queue
        clGetDeviceIDs(NULL, CL_DEVICE_TYPE_DEFAULT, 1, devices, NULL);
        cl_command_queue queue = clCreateCommandQueue(context, devices[0], 0, NULL);

        // allocate the buffer memory objects
        // (no host pointer is supplied here, so CL_MEM_COPY_HOST_PTR is not used;
        // the input data still has to be uploaded before the result is meaningful)
        cl_mem memobjs[] = { clCreateBuffer(context, CL_MEM_READ_ONLY, sizeof(float) * 2 * NUM_ENTRIES, NULL, NULL),
                             clCreateBuffer(context, CL_MEM_READ_WRITE, sizeof(float) * 2 * NUM_ENTRIES, NULL, NULL) };

        // create the compute program
        // const char *fft1D_1024_kernel_src[1] = { };
        cl_program program = clCreateProgramWithSource(context, 1, (const char **)&KernelSource, NULL, NULL);

        // build the compute program executable
        clBuildProgram(program, 0, NULL, NULL, NULL, NULL);

        // create the compute kernel
        cl_kernel kernel = clCreateKernel(program, "fft1D_1024", NULL);

        // set the args values
        size_t local_work_size[1] = { 256 };

        clSetKernelArg(kernel, 0, sizeof(cl_mem), (void *)&memobjs[0]);
        clSetKernelArg(kernel, 1, sizeof(cl_mem), (void *)&memobjs[1]);
        clSetKernelArg(kernel, 2, sizeof(float) * (local_work_size[0] + 1) * 16, NULL);
        clSetKernelArg(kernel, 3, sizeof(float) * (local_work_size[0] + 1) * 16, NULL);

        // create N-D range object with work-item dimensions and execute kernel
        size_t global_work_size[1] = { 256 };

        global_work_size[0] = NUM_ENTRIES;
        local_work_size[0] = 64; // Nvidia: 192 or 256
        clEnqueueNDRangeKernel(queue, kernel, 1, NULL, global_work_size, local_work_size, 0, NULL, NULL);

        return 0;
    }

The actual calculation inside file "fft1D_1024_kernel_src.cl" (based on Fitting FFT onto the G80 Architecture):[23]

    R"(
      // This kernel computes FFT of length 1024. The 1024 length FFT is decomposed into
      // calls to a radix 16 function, another radix 16 function and then a radix 4 function

      __kernel void fft1D_1024(__global float2 *in, __global float2 *out,
                               __local float *sMemx, __local float *sMemy) {
        int tid = get_local_id(0);
        int blockIdx = get_group_id(0) * 1024 + tid;
        float2 data[16];

        // starting index of data to/from global memory
        in = in + blockIdx;  out = out + blockIdx;

        globalLoads(data, in, 64); // coalesced global reads
        fftRadix16Pass(data);      // in-place radix-16 pass
        twiddleFactorMul(data, tid, 1024, 0);

        // local shuffle using local memory
        localShuffle(data, sMemx, sMemy, tid, (((tid & 15) * 65) + (tid >> 4)));
        fftRadix16Pass(data);               // in-place radix-16 pass
        twiddleFactorMul(data, tid, 64, 4); // twiddle factor multiplication

        localShuffle(data, sMemx, sMemy, tid, (((tid >> 4) * 64) + (tid & 15)));

        // four radix-4 function calls
        fftRadix4Pass(data);      // radix-4 function number 1
        fftRadix4Pass(data + 4);  // radix-4 function number 2
        fftRadix4Pass(data + 8);  // radix-4 function number 3
        fftRadix4Pass(data + 12); // radix-4 function number 4

        // coalesced global writes
        globalStores(data, out, 64);
      }
    )"

A full, open source implementation of an OpenCL FFT can be found on Apple's website.[24]

C++ for OpenCL language

In 2020, Khronos announced[25] the transition to the community-driven C++ for OpenCL programming language[26] that provides features from C++17 in combination with the traditional OpenCL C features. This language makes it possible to leverage a rich variety of language features from standard C++ while preserving backward compatibility with OpenCL C. This opens up a smooth transition path to C++ functionality for OpenCL kernel code developers, as they can continue using a familiar programming flow and even tools, as well as leverage existing extensions and libraries available for OpenCL C.
The language semantics are described in the documentation published in the releases of the OpenCL-Docs[27] repository hosted by the Khronos Group, but the language is currently not ratified by the Khronos Group. The C++ for OpenCL language is not documented in a stand-alone document; it is based on the specifications of C++ and OpenCL C. The open source Clang compiler has supported C++ for OpenCL since release 9.[28]

C++ for OpenCL was originally developed as a Clang compiler extension and appeared in release 9.[29] As it was tightly coupled with OpenCL C and did not contain any Clang-specific functionality, its documentation has been re-hosted in the OpenCL-Docs repository[27] from the Khronos Group, along with the sources of other specifications and reference cards. The first official release of this document, describing C++ for OpenCL version 1.0, was published in December 2020.[30] C++ for OpenCL 1.0 contains features from C++17 and is backward compatible with OpenCL C 2.0. A work-in-progress draft of its documentation can be found on the Khronos website.[31]

Features

C++ for OpenCL supports most of the features (syntactically and semantically) from OpenCL C except for nested parallelism and blocks.[32] However, there are minor differences in some supported features, mainly related to differences in semantics between C++ and C. For example, C++ is stricter with implicit type conversions and does not support the restrict type qualifier.[32] The following C++ features are not supported by C++ for OpenCL: virtual functions, the dynamic_cast operator, non-placement new/delete operators, exceptions, pointers to member functions, references to functions, and the C++ standard libraries.[32] C++ for OpenCL extends the concept of separate memory regions (address spaces) from OpenCL C to C++ features: functional casts, templates, class members, references, lambda functions, and operators. Most C++ features are not available for kernel functions, e.g. overloading or templating, or arbitrary class layout in parameter types.[32]

Example: complex-number arithmetic

The following code snippet illustrates how kernels with complex-number arithmetic can be implemented in the C++ for OpenCL language with convenient use of C++ features.

    // Define a class complex_t that can perform complex-number computations with
    // various precision when different types for T are used - double, float, half.
    template<typename T>
    class complex_t {
      T m_re; // Real component.
      T m_im; // Imaginary component.

    public:
      complex_t(T re, T im): m_re{re}, m_im{im} {};
      // Define operator for complex-number multiplication.
      complex_t operator*(const complex_t &other) const
      {
        return {m_re * other.m_re - m_im * other.m_im,
                m_re * other.m_im + m_im * other.m_re};
      }
      T get_re() const { return m_re; }
      T get_im() const { return m_im; }
    };

    // A helper function to compute multiplication over complex numbers read from
    // the input buffer and to store the computed result into the output buffer.
    template<typename T>
    void compute_helper(__global T *in, __global T *out) {
      auto idx = get_global_id(0);
      // Every work-item uses 4 consecutive items from the input buffer
      // - two for each complex number.
      auto offset = idx * 4;
      auto num1 = complex_t{in[offset], in[offset + 1]};
      auto num2 = complex_t{in[offset + 2], in[offset + 3]};
      // Perform complex-number multiplication.
      auto res = num1 * num2;
      // Every work-item writes 2 consecutive items to the output buffer.
      out[idx * 2] = res.get_re();
      out[idx * 2 + 1] = res.get_im();
    }

    // This kernel is used for complex-number multiplication in single precision.
    __kernel void compute_sp(__global float *in, __global float *out) {
      compute_helper(in, out);
    }

    #ifdef cl_khr_fp16
    // This kernel is used for complex-number multiplication in half precision when
    // it is supported by the device.
    #pragma OPENCL EXTENSION cl_khr_fp16: enable
    __kernel void compute_hp(__global half *in, __global half *out) {
      compute_helper(in, out);
    }
    #endif

Tooling and Execution Environment

The C++ for OpenCL language can be used for the same applications or libraries, and in the same way, as the OpenCL C language. Due to the rich variety of C++ language features, applications written in C++ for OpenCL can express complex functionality more conveniently than applications written in OpenCL C; in particular, the generic programming paradigm from C++ is very attractive to library developers.
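For comparison with the complex-number kernel shown in the preceding example, the same arithmetic and buffer layout can be written as a host-side reference in plain C, which is one way to check a kernel's output. This is a sketch under assumptions: the function name complex_mul_reference is hypothetical, it handles single precision only, and it processes work-items sequentially rather than in parallel.

```c
#include <assert.h>
#include <stddef.h>

/* Host-side reference for the interleaved layout used by the kernel:
 * work-item idx reads in[4*idx .. 4*idx+3] as (re1, im1, re2, im2) and
 * writes the product to out[2*idx] and out[2*idx+1]. */
void complex_mul_reference(const float *in, float *out, size_t num_items)
{
    assert(in != NULL && out != NULL);
    for (size_t idx = 0; idx < num_items; ++idx) {
        size_t offset = idx * 4;
        float re1 = in[offset];
        float im1 = in[offset + 1];
        float re2 = in[offset + 2];
        float im2 = in[offset + 3];
        /* (re1 + i*im1) * (re2 + i*im2) */
        out[idx * 2]     = re1 * re2 - im1 * im2;
        out[idx * 2 + 1] = re1 * im2 + im1 * re2;
    }
}
```

Without templates, a C version like this would need a separate function per precision, which is the kind of duplication the generic compute_helper in the kernel source avoids.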
C++ for OpenCL sources can be compiled by OpenCL drivers that support the cl_ext_cxx_for_opencl extension.[33] Arm announced support for this extension in December 2020.[34] However, due to the increasing complexity of the algorithms accelerated on OpenCL devices, it is expected that more applications will compile C++ for OpenCL kernels offline using stand-alone compilers such as Clang[35] into an executable binary format or a portable binary format, e.g. SPIR-V.[36] Such an executable can be loaded during the OpenCL application's execution using a dedicated OpenCL API.[37]

Binaries compiled from sources in C++ for OpenCL 1.0 can be executed on OpenCL 2.0 conformant devices. Depending on the language features used in such kernel sources, they can also be executed on devices supporting earlier OpenCL versions or OpenCL 3.0.

Aside from OpenCL drivers, kernels written in C++ for OpenCL can be compiled for execution on Vulkan devices using the clspv[38] compiler and the clvk[39] runtime layer, in just the same way as OpenCL C kernels.

Contributions

C++ for OpenCL is an open language developed by the community of contributors listed in its documentation.[31] New contributions to the language semantic definition or open source tooling support are accepted from anyone interested, as long as they are aligned with the main design philosophy and are reviewed and approved by the experienced contributors.[19]

History

OpenCL was initially developed by Apple Inc., which holds trademark rights, and refined into an initial proposal in collaboration with technical teams at AMD, IBM, Qualcomm, Intel, and Nvidia. Apple submitted this initial proposal to the Khronos Group. On June 16, 2008, the Khronos Compute Working Group was formed[40] with representatives from CPU, GPU, embedded-processor, and software companies. This group worked for five months to finish the technical details of the specification for OpenCL 1.0 by November 18, 2008.[41] This technical specification was reviewed by the Khronos members and approved for public release on December 8, 2008.[42]

OpenCL 1.0

OpenCL 1.0 was released with Mac OS X Snow Leopard on August 28, 2009. According to an Apple press release:[43]
"Snow Leopard further extends support for modern hardware with Open Computing Language (OpenCL), which lets any application tap into the vast gigaflops of GPU computing power previously available only to graphics applications. OpenCL is based on the C programming language and has been proposed as an open standard."

AMD decided to support OpenCL instead of the now deprecated Close to Metal in its Stream framework.[44][45] RapidMind announced their adoption of OpenCL underneath their development platform to support GPUs from multiple vendors with one interface.[46] On December 9, 2008, Nvidia announced its intention to add full support for the OpenCL 1.0 specification to its GPU Computing Toolkit.[47] On October 30, 2009, IBM released its first OpenCL implementation as a part of the XL compilers.[48]

With OpenCL, calculations on graphics cards can be accelerated by a factor of up to 1000 compared with a normal CPU.[49] Some important features of later OpenCL versions, such as double- or half-precision operations, are optional in 1.0.[50]

OpenCL 1.1

OpenCL 1.1 was ratified by the Khronos Group on June 14, 2010,[51] and adds significant functionality for enhanced parallel programming flexibility, functionality, and performance, including:

- New data types including 3-component vectors and additional image formats;
- Handling commands from multiple host threads and processing buffers across multiple devices;
- Operations on regions of a buffer including read, write and copy of 1D, 2D, or 3D rectangular regions;
- Enhanced use of events to drive and control command execution;
- Additional OpenCL built-in C functions such as integer clamp, shuffle, and asynchronous strided copies;
- Improved OpenGL interoperability through efficient sharing of images and buffers by linking OpenCL and OpenGL events.

OpenCL 1.2

On November 15, 2011, the Khronos Group announced the OpenCL 1.2 specification,[52] which added significant functionality over the previous versions in terms of performance and features for parallel programming. Most notable features include:

- Device partitioning: the ability to partition a device into sub-devices so that work assignments can be allocated to individual compute units. This is useful for reserving areas of the device to reduce latency for time-critical tasks.
- Separate compilation and linking of objects: the functionality to compile OpenCL into external libraries for inclusion into other programs.
- Enhanced image support (optional): 1.2 adds support for 1D images and 1D/2D image arrays. Furthermore, the OpenGL sharing extensions now allow for OpenGL 1D textures and 1D/2D texture arrays to be used to create OpenCL images.
- Built-in kernels: custom devices that contain specific unique functionality are now integrated more closely into the OpenCL framework. Kernels can be called to use specialised or non-programmable aspects of underlying hardware. Examples include video encoding/decoding and digital signal processors.
- DirectX functionality: DX9 media surface sharing allows for efficient sharing between OpenCL and DX9 or DXVA media surfaces. Equally, for DX11, seamless sharing between OpenCL and DX11 surfaces is enabled.
- The ability to force IEEE 754 compliance for single-precision floating-point math: OpenCL by default allows the single-precision versions of the division, reciprocal, and square root operations to be less accurate than the correctly rounded values that IEEE 754 requires.[53] If the programmer passes the "-cl-fp32-correctly-rounded-divide-sqrt" command line argument to the compiler, these three operations will be computed to IEEE 754 requirements if the OpenCL implementation supports this, and will fail to compile if the OpenCL implementation does not support computing these operations to their correctly-rounded values as defined by the IEEE 754 specification.[53] This ability is supplemented by the ability to query the OpenCL implementation to determine if it can perform these operations to IEEE 754 accuracy.[53]

OpenCL 2.0

On November 18, 2013, the Khronos Group announced the ratification and public release of the finalized OpenCL 2.0 specification.[54] Updates and additions to OpenCL 2.0 include:

- Shared virtual memory
- Nested parallelism
- Generic address space
- Images (optional, include 3D-Image)
- C11 atomics
- Pipes
- Android installable client driver extension
- half precision extended with optional cl_khr_fp16 extension
- cl_double: double precision IEEE 754 (optional)

OpenCL 2.1
The ratification and release of the OpenCL 2.1 provisional specification was announced on March 3, 2015 at the Game Developer Conference in San Francisco. It was released on November 16, 2015.[55] It introduced the OpenCL C++ kernel language, based on a subset of C++14, while maintaining support for the preexisting OpenCL C kernel language. Vulkan and OpenCL 2.1 share SPIR-V as an intermediate representation, allowing high-level language front-ends to share a common compilation target. Updates to the OpenCL API include:

- Additional subgroup functionality
- Copying of kernel objects and states
- Low-latency device timer queries
- Ingestion of SPIR-V code by runtime
- Execution priority hints for queues
- Zero-sized dispatches from host

AMD, ARM, Intel, HPC, and YetiWare have declared support for OpenCL 2.1.[56][57]

OpenCL 2.2

OpenCL 2.2 brings the OpenCL C++ kernel language into the core specification for significantly enhanced parallel programming productivity.[58][59][60] It was released on May 16, 2017.[61] A maintenance update with bug fixes was released in May 2018.[62]

- The OpenCL C++ kernel language is a static subset of the C++14 standard and includes classes, templates, lambda expressions, function overloads and many other constructs for generic and meta-programming.
- It uses the new Khronos SPIR-V 1.1 intermediate language, which fully supports the OpenCL C++ kernel language.
- OpenCL library functions can now use the C++ language to provide increased safety and reduced undefined behavior while accessing features such as atomics, iterators, images, samplers, pipes, and device queue built-in types and address spaces.
- Pipe storage is a new device-side type in OpenCL 2.2 that is useful for FPGA implementations by making connectivity size and type known at compile time, enabling efficient device-scope communication between kernels.
- OpenCL 2.2 also includes features for enhanced optimization of generated code: applications can provide the value of a specialization constant at SPIR-V compilation time, a new query can detect non-trivial constructors and destructors of program scope global objects, and user callbacks can be set at program release time.
- It runs on any OpenCL 2.0-capable hardware (only a driver update is required).
OpenCL 3.0

The OpenCL 3.0 specification was released on September 30, 2020, after being in preview since April 2020. OpenCL 1.2 functionality has become a mandatory baseline, while all OpenCL 2.x and OpenCL 3.0 features were made optional. The specification retains the OpenCL C language and deprecates the OpenCL C++ Kernel Language, replacing it with the C++ for OpenCL language[19] based on a Clang/LLVM compiler which implements a subset of C++17 and SPIR-V intermediate code.[63][64][65] Version 3.0.7 of C++ for OpenCL, with some Khronos OpenCL extensions, was presented at IWOCL 21.[66] The current version is 3.0.11, with some new extensions and corrections. NVIDIA, working closely with the Khronos OpenCL Working Group, improved Vulkan interop with semaphores and memory sharing.[67]

Roadmap

[Figure: The International Workshop on OpenCL (IWOCL) held by the Khronos Group]

When releasing OpenCL 2.2, the Khronos Group announced that OpenCL would converge where possible with Vulkan to enable OpenCL software deployment flexibility over both APIs.[68][69] This has now been demonstrated by Adobe's Premiere Rush using the clspv[38] open source compiler to compile significant amounts of OpenCL C kernel code to run on a Vulkan runtime for deployment on Android.[70] OpenCL has a forward-looking roadmap independent of Vulkan, with 'OpenCL Next' under development and targeting release in 2020. OpenCL Next may integrate extensions such as Vulkan/OpenCL Interop, Scratch-Pad Memory Management, Extended Subgroups, SPIR-V 1.4 ingestion and SPIR-V Extended debug info. OpenCL is also considering a Vulkan-like loader and layers and a 'Flexible Profile' for deployment flexibility on multiple accelerator types.[71]

Open source implementations

[Figure: clinfo, a command-line tool to see OpenCL information]
OpenCL consists of a set of headers and a shared object that is loaded at runtime. An installable client driver (ICD) must be installed on the platform for every vendor whose devices the runtime needs to support. For example, in order to support Nvidia devices on a Linux platform, the Nvidia ICD would need to be installed such that the OpenCL runtime (the ICD loader) would be able to locate the ICD for the vendor and redirect the calls appropriately. The standard OpenCL header is used by the consumer application; calls to each function are then proxied by the OpenCL runtime to the appropriate driver using the ICD. Each vendor must implement each OpenCL call in their driver.[72]

The Apple,[73] Nvidia,[74] ROCm, RapidMind[75] and Gallium3D[76] implementations of OpenCL are all based on the LLVM compiler technology and use the Clang compiler as their frontend.

MESA Gallium Compute
An implementation of OpenCL (currently an incomplete 1.1, mostly done for AMD Radeon GCN) for a number of platforms is maintained as part of the Gallium Compute Project,[77] which builds on the work of the Mesa project to support multiple platforms. It was formerly known as Clover.[78] Current development mostly concerns keeping the incomplete framework running with current LLVM and Clang, plus some new features such as fp16 in 17.3.[79] The target is complete OpenCL 1.0, 1.1 and 1.2 for AMD and Nvidia. New basic development is done by Red Hat, with SPIR-V also for Clover.[80][81] The new target is a modular OpenCL 3.0 with full support of OpenCL 1.2. The current state is available in Mesamatrix. Image support is the current focus of development.
RustiCL is a new implementation for Gallium compute, written in Rust instead of C for better code. An experimental implementation will be available in Mesa 22.2, with OpenCL 3.0 support and an implementation of the image extension for programs like Darktable.[82]

BEIGNET
An implementation by Intel for its Ivy Bridge+ hardware was released in 2013.[83] This software, from Intel's China team, has attracted criticism from developers at AMD and Red Hat,[84] as well as Michael Larabel of Phoronix.[85] The current version 1.3.2 fully supports OpenCL 1.2 (Ivy Bridge and higher) and optionally OpenCL 2.0 for Skylake and newer.[86][87] Support for Android has been added to Beignet.[88] Current development targets only 1.2 and 2.0; the road to OpenCL 2.1, 2.2 and 3.0 has moved to NEO.

NEO
An implementation by Intel for Gen. 8 Broadwell and Gen. 9 hardware, released in 2018.[89] This driver replaces the Beignet implementation for supported platforms (not the older Gen. 6 to Haswell). NEO provides OpenCL 2.1 support on Core platforms and OpenCL 1.2 on Atom platforms.[90] As of 2020, Gen 11 Ice Lake and Gen 12 Tiger Lake graphics are also supported. OpenCL 3.0 is available for Alder Lake and for Tiger Lake down to Broadwell with version 20.41+. It now includes the optional OpenCL 2.0 and 2.1 features in full and some of 2.2.

ROCm
Created as part of AMD's GPUOpen, ROCm (Radeon Open Compute) is an open source Linux project built on OpenCL 1.2 with language support for 2.0. The system is compatible with all modern AMD CPUs and APUs (currently partly GFX 7, GFX 8 and 9), as well as Intel Gen7.5+ CPUs (only with PCIe 3.0).[91][92] With version 1.9, support was extended experimentally in some respects to hardware with PCIe 2.0 and without atomics. An overview of the current work was given at XDC2018.[93][94] ROCm version 2.0 supports full OpenCL 2.0, but some errors and limitations remain on the to-do list.[95][96] Version 3.3 improved details.[97] Version 3.5 supports OpenCL 2.2.[98] Version 3.10 brought improvements and new APIs.[99] ROCm 4.0, announced at SC20, supports the AMD Instinct MI100 compute card.[100] Current documentation for 5.1.1 and earlier is available on GitHub.[101][102] OpenCL 3.0 is available.
POCL
A portable implementation supporting CPUs and some GPUs (via CUDA and HSA), building on Clang and LLVM.[103] With version 1.0, OpenCL 1.2 was nearly fully implemented along with some 2.x features.[104] Version 1.2 supports LLVM/Clang 6.0 and 7.0 and provides full OpenCL 1.2 support, with all tickets in milestone 1.2 closed.[104][105] OpenCL 2.0 is nearly fully implemented.[106] Version 1.3 supports Mac OS X.[107] Version 1.4 includes support for LLVM 8.0 and 9.0.[108] Version 1.5 implements LLVM/Clang 10 support.[109] Version 1.6 implements LLVM/Clang 11 support and CUDA acceleration.[110] Current targets are complete OpenCL 2.x, OpenCL 3.0 and improved performance. With manual optimization, POCL 1.6 is at the same level as the Intel compute runtime.[111] Version 1.7 implements LLVM/Clang 12 support and some new OpenCL 3.0 features.[112] Version 1.8 implements LLVM/Clang 13 support.[113] Version 3.0 implements OpenCL 3.0 at minimum level and LLVM/Clang 14.[114]

Shamrock
A port of Mesa Clover for ARM with full support of OpenCL 1.2;[115][116] there is no current development for 2.0.

FreeOCL
A CPU-focused implementation of OpenCL 1.2 that implements an external compiler to create a more reliable platform;[117] there is no current development.
MOCL
An OpenCL implementation based on POCL by the NUDT researchers for Matrix-2000 was released in 2018. The Matrix-2000 architecture is designed to replace the Intel Xeon Phi accelerators of the TianHe-2 supercomputer. This programming framework is built on top of LLVM v5.0 and reuses some code pieces from POCL as well. To unlock the hardware potential, the device runtime uses a push-based task dispatching strategy, and the performance of the kernel atomics is improved significantly. This framework has been deployed on the TH-2A system and is readily available to the public.[118] Some of the software will next be ported to improve POCL.[104]

VC4CL
An OpenCL 1.2 implementation for the VideoCore IV (BCM2763) processor used in the Raspberry Pi before its model 4.[119]

Vendor implementations

Timeline of vendor implementations

- June, 2008: During Apple's WWDC conference, an early beta of Mac OS X Snow Leopard was made available to the participants. It included the first beta implementation of OpenCL, about 6 months before the final version 1.0 specification was ratified in late 2008. Apple also showed two demos. One was a grid of 8x8 screens rendered, each displaying the screen of an emulated Apple II machine (64 independent instances in total, each running a famous karate game). This showed task parallelism on the CPU. The other demo was an N-body simulation running on the GPU of a Mac Pro, a data-parallel task.
- December 10, 2008: AMD and Nvidia held the first public OpenCL demonstration, a 75-minute presentation at SIGGRAPH Asia 2008. AMD showed a CPU-accelerated OpenCL demo explaining the scalability of OpenCL on one or more cores, while Nvidia showed a GPU-accelerated demo.[120][121]
- March 16, 2009: at the 4th Multicore Expo, Imagination Technologies announced the PowerVR SGX543MP, the first GPU of this company to feature OpenCL support.[122]
- March 26, 2009: at GDC 2009, AMD and Havok demonstrated the first working implementation for OpenCL accelerating Havok Cloth on an AMD Radeon HD 4000 series GPU.[123]
- April 20, 2009: Nvidia announced the release of its OpenCL driver and SDK to developers participating in its OpenCL Early Access Program.[124]
- August 5, 2009: AMD unveiled the first development tools for its OpenCL platform as part of its ATI Stream SDK v2.0 Beta Program.[125]
- August 28, 2009: Apple released Mac OS X Snow Leopard, which contains a full implementation of OpenCL.[126]
- September 28, 2009: Nvidia released its own OpenCL drivers and SDK implementation.
- October 13, 2009: AMD released the fourth beta of the ATI Stream SDK 2.0, which provides a complete OpenCL implementation on both R700/R800 GPUs and SSE3-capable CPUs. The SDK is available for both Linux and Windows.[127]
- November 26, 2009: Nvidia released drivers for OpenCL 1.0 (rev 48).
- October 27, 2009: S3 released their first product supporting native OpenCL 1.0 – the Chrome 5400E embedded graphics processor.[128]
- December 10, 2009: VIA released their first product supporting OpenCL 1.0 – the ChromotionHD 2.0 video processor included in the VN1000 chipset.[129]
- December 21, 2009: AMD released the production version of the ATI Stream SDK 2.0,[130] which provides OpenCL 1.0 support for R800 GPUs and beta support for R700 GPUs.
June 1, 2010: ZiiLABS released details of their first OpenCL implementation for the ZMS processor for handheld, embedded and digital home products.[131]
June 30, 2010: IBM released a fully conformant version of OpenCL 1.0.[4]
September 13, 2010: Intel released details of their first OpenCL implementation for the Sandy Bridge chip architecture. Sandy Bridge will integrate Intel's newest graphics chip technology directly onto the central processing unit.[132]
November 15, 2010: Wolfram Research released Mathematica 8 with the OpenCLLink package.
March 3, 2011: Khronos Group announces the formation of the WebCL working group to explore defining a JavaScript binding to OpenCL. This creates the potential to harness GPU and multi-core CPU parallel processing from a Web browser.[133][134]
March 31, 2011: IBM released a fully conformant version of OpenCL 1.1.[4][135]
April 25, 2011: IBM released OpenCL Common Runtime v0.1 for Linux on x86 Architecture.[136]
May 4, 2011: Nokia Research releases an open source WebCL extension for the Firefox web browser, providing a JavaScript binding to OpenCL.[137]
July 1, 2011: Samsung Electronics releases an open source prototype implementation of WebCL for WebKit, providing a JavaScript binding to OpenCL.[138]
August 8, 2011: AMD released the OpenCL-driven AMD Accelerated Parallel Processing (APP) Software Development Kit (SDK) v2.5, replacing the ATI Stream SDK as technology and concept.[139]
December 12, 2011: AMD released AMD APP SDK v2.6,[140] which contains a preview of OpenCL 1.2.
February 27, 2012: The Portland Group released the PGI OpenCL compiler for multi-core ARM CPUs.[141]
April 17, 2012: Khronos released a WebCL working draft.[142]
May 6, 2013: Altera released the Altera SDK for OpenCL, version 13.0.[143] It is conformant to OpenCL 1.0.[144]
November 18, 2013: Khronos announced that the specification for OpenCL 2.0 had been finalized.[145]
March 19, 2014: Khronos releases the WebCL 1.0 specification.[146][147]
August 29, 2014: Intel releases an HD Graphics 5300 driver that supports OpenCL 2.0.[148]
September 25, 2014: AMD releases Catalyst 14.41 RC1, which includes an OpenCL 2.0 driver.[149]
January 14, 2015: Xilinx Inc. announces the SDAccel development environment for OpenCL, C, and C++, which achieves Khronos conformance.[150]
April 13, 2015: Nvidia releases WHQL driver v350.12, which includes OpenCL 1.2 support for GPUs based on Kepler or later architectures.[151] Drivers 340+ support OpenCL 1.1 for Tesla and Fermi.
August 26, 2015: AMD released AMD APP SDK v3.0,[152] which contains full support of OpenCL 2.0 and sample code.
November 16, 2015: Khronos announced that the specification for OpenCL 2.1 had been finalized.[153]
April 18, 2016: Khronos announced that the specification for OpenCL 2.2 had been provisionally finalized.[59]
November 3, 2016: Intel support for OpenCL 2.1 on Gen7+ in SDK 2016 r3.[154]
February 17, 2017: Nvidia begins evaluation support of OpenCL 2.0 with driver 378.66.[155][156][157]
May 16, 2017: Khronos announced that the specification for OpenCL 2.2 had been finalized with SPIR-V 1.2.[158]
May 14, 2018: Khronos announced a maintenance update for OpenCL 2.2 with bug fixes and unified headers.[62]
April 27, 2020: Khronos announced the provisional version of OpenCL 3.0.
June 1, 2020: Intel Neo runtime with OpenCL 3.0 for the new Tiger Lake.
June 3, 2020: AMD announced ROCm 3.5 with OpenCL 2.2 support.[159]
September 30, 2020: Khronos announced that the specifications for OpenCL 3.0 had been finalized (CTS also available).
October 16, 2020: Intel announced support for OpenCL 3.0 with Neo 20.41 (including most of the optional OpenCL 2.x features).
April 6, 2021: Nvidia supports OpenCL 3.0 for Ampere. Maxwell and later GPUs also support OpenCL 3.0 with Nvidia driver 465+.[160]

Devices

As of 2016, OpenCL runs on graphics processing units (GPUs), CPUs with SIMD instructions, FPGAs, Movidius Myriad 2, Adapteva Epiphany and DSPs.

Khronos Conformance Test Suite

To be officially conformant, an implementation must pass the Khronos Conformance Test Suite (CTS), with results being submitted to the Khronos Adopters Program.[161] The Khronos CTS code for all OpenCL versions has been available in open source since 2017.[162]

Conformant products

The Khronos Group maintains an extended list of OpenCL-conformant products.[4]

Synopsis of OpenCL conformant products[4]

AMD SDKs (supports OpenCL CPU and accelerated processing unit devices), (GPU: Terascale 1: OpenCL 1.1, Terascale 2: 1.2, GCN 1: 1.2+, GCN 2+: 2.0+)
    X86 + SSE2 (or higher) compatible CPUs 64-bit & 32-bit,[163] Linux 2.6 PC, Windows Vista/7/8.x/10 PC
    AMD Fusion E-350, E-240, C-50, C-30 with HD 6310/HD 6250
    AMD Radeon/Mobility HD 6800, HD 5x00 series GPU, iGPU HD 6310/HD 6250, HD 7xxx, HD 8xxx, R2xx, R3xx, RX 4xx, RX 5xx, Vega Series
    AMD FirePro Vx800 series GPU and later, Radeon Pro
Intel SDK for OpenCL Applications 2013[164] (supports Intel Core processors and Intel HD Graphics 4000/2500); 2017 R2 with OpenCL 2.1 (Gen7+); SDK 2019 removed OpenCL 2.1;[165] current SDK: 2020 update 3
    Intel CPUs with SSE 4.1, SSE 4.2 or AVX support.[166][167] Microsoft Windows, Linux
    Intel Core i7, i5, i3; 2nd Generation Intel Core i7/5/3, 3rd Generation Intel Core Processors with Intel HD Graphics 4000/2500 and newer
    Intel Core 2 Solo, Duo, Quad, Extreme and newer
    Intel Xeon 7x00, 5x00, 3x00 (Core based) and newer
IBM Servers with OpenCL Development Kit for Linux on Power running on Power VSX[168][169]
    IBM Power 775 (PERCS), 750
    IBM BladeCenter PS70x Express
    IBM BladeCenter JS2x, JS43
    IBM BladeCenter QS22
IBM OpenCL Common Runtime (OCR)[170]
    X86 + SSE2 (or higher) compatible CPUs 64-bit & 32-bit;[171] Linux 2.6 PC
    AMD Fusion, Nvidia Ion and Intel Core i7, i5, i3; 2nd Generation Intel Core i7/5/3
    AMD Radeon, Nvidia GeForce and Intel Core 2 Solo, Duo, Quad, Extreme
    ATI FirePro, Nvidia Quadro and Intel Xeon 7x00, 5x00, 3x00 (Core based)
Nvidia OpenCL Driver and Tools,[172] Chips: Tesla, Fermi: OpenCL 1.1 (Driver 340+); Kepler, Maxwell, Pascal, Volta, Turing: OpenCL 1.2 (Driver 370+), OpenCL 2.0 beta (378.66); OpenCL 3.0: Maxwell to Ampere (Driver 465+)
    Nvidia Tesla C/D/S
    Nvidia GeForce GTS/GT/GTX
    Nvidia Ion
    Nvidia Quadro FX/NVX/Plex, Quadro, Quadro K, Quadro M, Quadro P, Quadro with Volta, Quadro RTX with Turing, Ampere

All standard-conformant implementations can be queried using one of the clinfo tools (there are multiple tools with the same name and similar feature set).[173][174][175]

Version support

Products and their version of OpenCL support include:[176]

OpenCL 3.0 support

Possible on all hardware that supports OpenCL 1.2 or later; OpenCL 2.x features are optional. The Khronos test suite has been available since 2020-10.[177][178]
(2020) Intel NEO Compute: 20.41+ for Gen 12 Tiger Lake to Broadwell (includes full 2.0 and 2.1 support and parts of 2.2)[179]
(2020) Intel 6th, 7th, 8th, 9th, 10th, 11th gen processors (Skylake, Kaby Lake, Coffee Lake, Comet Lake, Ice Lake, Tiger Lake) with latest Intel Windows graphics driver
(2021) Intel 11th, 12th gen processors (Rocket Lake, Alder Lake) with latest Intel Windows graphics driver
(2022) Intel 13th gen processors (Raptor Lake) with latest Intel Windows graphics driver
(2022) Intel Arc discrete graphics with latest Intel Arc Windows graphics driver
(2021) Nvidia Maxwell, Pascal, Volta, Turing and Ampere with Nvidia graphics driver 465+.[160]

OpenCL 2.2 support

None yet. The Khronos test suite is ready; with a driver update, support is possible on all hardware that supports 2.0 and 2.1.
Intel NEO Compute: work in progress for current products[180]
ROCm: version 3.5+, mostly

OpenCL 2.1 support

(2018+) Support backported to Intel 5th and 6th gen processors (Broadwell, Skylake)
(2017+) Intel 7th, 8th, 9th, 10th gen processors (Kaby Lake, Coffee Lake, Comet Lake, Ice Lake)
Khronos: with a driver update, possible on all hardware that supports 2.0

OpenCL 2.0 support

(2011+) AMD GCN GPUs (HD 7700+/HD 8000/Rx 200/Rx 300/Rx 400/Rx 500/Rx 5000-Series); some GCN 1st gen GPUs support only 1.2 with some extensions
(2013+) AMD GCN APUs (Jaguar, Steamroller, Puma, Excavator & Zen-based)
(2014+) Intel 5th & 6th gen processors (Broadwell, Skylake)
(2015+) Qualcomm Adreno 5xx series
(2018+) Qualcomm Adreno 6xx series
(2017+) ARM Mali (Bifrost) G51 and G71 in Android 7.1 and Linux
(2018+) ARM Mali (Bifrost) G31, G52, G72 and G76
(2017+) Incomplete evaluation support: Nvidia Kepler, Maxwell, Pascal, Volta and Turing GPUs (GeForce 600, 700, 800, 900 & 10-series, Quadro K-, M- & P-series, Tesla K-, M- & P-series) with driver version 378.66+

OpenCL 1.2 support

(2011+) Some AMD GCN 1st gen GPUs: some OpenCL 2.0 features are not possible, but many more extensions are supported than on TeraScale
(2009+) AMD TeraScale 2 & 3 GPUs (RV8xx, RV9xx in HD 5000, 6000 & 7000 Series)
(2011+) AMD TeraScale APUs (K10, Bobcat & Piledriver-based)
(2012+) Nvidia Kepler, Maxwell, Pascal, Volta and Turing GPUs (GeForce 600, 700, 800, 900, 10, 16, 20 series, Quadro K-, M- & P-series, Tesla K-, M- & P-series)
(2012+) Intel 3rd & 4th gen processors (Ivy Bridge, Haswell)
(2013+) Qualcomm Adreno 4xx series
(2013+) ARM Mali Midgard 3rd gen (T760)
(2015+) ARM Mali Midgard 4th gen (T8xx)

OpenCL 1.1 support

(2008+) Some AMD TeraScale 1 GPUs (RV7xx in HD 4000-series)
(2008+) Nvidia Tesla, Fermi GPUs (GeForce 8, 9, 100, 200, 300, 400, 500-series, Quadro-series or Tesla-series with Tesla or Fermi GPU)
(2011+) Qualcomm Adreno 3xx series
(2012+) ARM Mali Midgard 1st and 2nd gen (T-6xx, T720)

OpenCL 1.0 support

Mostly updated to 1.1 and 1.2 after initial drivers that supported only 1.0.

Portability, performance and alternatives

A key feature of OpenCL is portability, achieved through its abstracted memory and execution model. The programmer cannot directly use hardware-specific technologies such as inline Parallel Thread Execution (PTX) for Nvidia GPUs unless they are willing to give up direct portability to other platforms. It is possible to run any OpenCL kernel on any conformant implementation.
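
Because the supported OpenCL version differs so much between the devices listed above, portable host code typically checks what a device reports before relying on newer features. The following is a minimal sketch of such a check, not a definitive implementation. It assumes the version string has the layout that the OpenCL specification gives for `CL_DEVICE_VERSION`, namely "OpenCL <major>.<minor> <vendor-specific information>". The helper names `parse_cl_version` and `cl_version_at_least` are hypothetical. In a real program the string would be obtained with `clGetDeviceInfo`; that call is left out here so the sketch builds without an OpenCL SDK.

```c
#include <stdio.h>

/* Hypothetical helper (not part of the OpenCL API): extracts the major and
 * minor numbers from a string shaped like
 * "OpenCL <major>.<minor> <vendor-specific information>".
 * Returns 1 on success, 0 if the string does not have that shape. */
int parse_cl_version(const char *s, int *major, int *minor)
{
    if (s == NULL || major == NULL || minor == NULL)
        return 0;
    /* sscanf returns the number of fields converted; both are required. */
    return sscanf(s, "OpenCL %d.%d", major, minor) == 2;
}

/* Hypothetical helper: returns 1 if the reported version is at least
 * want_major.want_minor, and 0 otherwise (including on a parse failure). */
int cl_version_at_least(const char *s, int want_major, int want_minor)
{
    int major = 0, minor = 0;
    if (!parse_cl_version(s, &major, &minor))
        return 0;
    return major > want_major ||
           (major == want_major && minor >= want_minor);
}
```

With these helpers, `cl_version_at_least("OpenCL 1.2 ExampleVendor", 2, 0)` evaluates to 0 (the sample string is made up), so a caller could fall back to a code path that needs only OpenCL 1.2 features. Since OpenCL 2.x features are optional under OpenCL 3.0, a version check like this can only serve as a first filter.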
However, performance of the kernel is not necessarily portable across platforms. Existing implementations have been shown to be competitive when kernel code is properly tuned, though, and auto-tuning has been suggested as a solution to the performance portability problem,[181] yielding "acceptable levels of performance" in experimental linear algebra kernels.[182] Portability of an entire application containing multiple kernels with differing behaviors was also studied, and the results show that portability required only limited tradeoffs.[183]

A study at Delft University from 2011 that compared CUDA programs and their straightforward translation into OpenCL C found CUDA to outperform OpenCL by at most 30% on the Nvidia implementation. The researchers noted that their comparison could be made fairer by applying manual optimizations to the OpenCL programs, in which case there was "no reason for OpenCL to obtain worse performance than CUDA". The performance differences could mostly be attributed to differences in the programming model (especially the memory model) and to NVIDIA's compiler optimizations for CUDA compared to those for OpenCL.[181]

Another study at D-Wave Systems Inc. found that "The OpenCL kernel's performance is between about 13% and 63% slower, and the end-to-end time is between about 16% and 67% slower" than CUDA's performance.[184]

The fact that OpenCL allows workloads to be shared by CPU and GPU, executing the same programs, means that programmers can exploit both by dividing work among the devices.[185] This leads to the problem of deciding how to partition the work, because the relative speeds of operations differ among the devices. Machine learning has been suggested to solve this problem: Grewe and O'Boyle describe a system of support-vector machines trained on compile-time features of a program that can decide the device partitioning problem statically, without actually running the programs to measure their performance.[186]

A comparison of current graphics cards from the AMD RDNA 2 and Nvidia RTX series using OpenCL tests produced no clear winner. Possible performance increases from the use of Nvidia CUDA or OptiX were not tested.[187]

See also

Advanced Simulation Library
AMD FireStream
BrookGPU
C++ AMP
Close to Metal
CUDA
DirectCompute
GPGPU
HIP
Larrabee
Lib Sh
List of OpenCL applications
OpenACC
OpenGL
OpenHMPP
OpenMP
Metal
RenderScript
SequenceL
SIMD
SYCL
Vulkan
WebCL

References

1. "Khronos OpenCL Registry". Khronos Group. April 27, 2020. Retrieved April 27, 2020.
2. "Android Devices With OpenCL support". Google Docs. ArrayFire. Retrieved April 28, 2015.
3. "FreeBSD Graphics/OpenCL". FreeBSD. Retrieved December 23, 2015.
4. "Conformant Products". Khronos Group. Retrieved May 9, 2015.
5. Sochacki, Bartosz (July 19, 2019). "The OpenCL C++ 1.0 Specification" (PDF). Khronos OpenCL Working Group. Retrieved July 19, 2019.
6. Munshi, Aaftab; Howes, Lee; Sochacki, Bartosz (April 27, 2020). "The OpenCL C Specification Version: 3.0 Document Revision: V3.0.7" (PDF). Khronos OpenCL Working Group. Archived from the original (PDF) on September 20, 2020. Retrieved April 28, 2021.
7. "The C++ for OpenCL 1.0 Programming Language Documentation Revision 2". Khronos OpenCL Working Group. March 31, 2021. Retrieved April 18, 2021.
8. "Conformant Companies". Khronos Group. Retrieved April 8, 2015.
9. Gianelli, Silvia E. (January 14, 2015). "Xilinx SDAccel Development Environment for OpenCL, C, and C++, Achieves Khronos Conformance". PR Newswire. Xilinx. Retrieved April 27, 2015.
10. Howes, Lee (November 11, 2015). "The OpenCL Specification Version: 2.1 Document Revision: 23" (PDF). Khronos OpenCL Working Group. Retrieved November 16, 2015.
11. Gaster, Benedict; Howes, Lee; Kaeli, David R.; Mistry, Perhaad; Schaa, Dana (2012). Heterogeneous Computing with OpenCL: Revised OpenCL 1.2 Edition. Morgan Kaufmann.
12. Tompson, Jonathan; Schlachter, Kristofer (2012). "An Introduction to the OpenCL Programming Model" (PDF). New York University Media Research Lab. Archived from the original (PDF) on July 6, 2015. Retrieved July 6, 2015.
13. Stone, John E.; Gohara, David; Shi, Guochun (2010). "OpenCL: a parallel programming standard for heterogeneous computing systems". Computing in Science & Engineering. 12 (3): 66–73. Bibcode:2010CSE....12c..66S. doi:10.1109/MCSE.2010.69. PMC 2964860. PMID 21037981.
14. Klöckner, Andreas; Pinto, Nicolas; Lee, Yunsup; Catanzaro, Bryan; Ivanov, Paul; Fasih, Ahmed (2012). "PyCUDA and PyOpenCL: A scripting-based approach to GPU run-time code generation". Parallel Computing. 38 (3): 157–174. arXiv:0911.3456. doi:10.1016/j.parco.2011.09.001. S2CID 18928397.
15. "OpenCL - Open Computing Language Bindings". metacpan.org. Retrieved August 18, 2018.
16. "D binding for OpenCL". dlang.org. Retrieved June 29, 2021.
17. "SPIR - The first open standard intermediate language for parallel compute and graphics". Khronos Group. January 21, 2014.
18. "SYCL - C++ Single-source Heterogeneous Programming for OpenCL". Khronos Group. January 21, 2014. Archived from the original on January 18, 2021. Retrieved October 24, 2016.
19. "C++ for OpenCL, OpenCL-Guide". GitHub. Retrieved April 18, 2021.
20. Aaftab Munshi, ed. (2014). "The OpenCL C Specification, Version 2.0" (PDF). Retrieved June 24, 2014.
21. "Introduction to OpenCL Programming 201005" (PDF). AMD. pp. 89–90. Archived from the original (PDF) on May 16, 2011. Retrieved August 8, 2017.
22. "OpenCL" (PDF). SIGGRAPH 2008. August 14, 2008. Archived from the original (PDF) on February 16, 2012. Retrieved August 14, 2008.
23. "Fitting FFT onto G80 Architecture" (PDF). Vasily Volkov and Brian Kazian, UC Berkeley CS258 project report. May 2008. Retrieved November 14, 2008.
24. "OpenCL_FFT". Apple. June 26, 2012. Retrieved June 18, 2022.
25. Trevett, Neil (April 28, 2020). "Khronos Announcements and Panel Discussion" (PDF).
26. Stulova, Anastasia; Hickey, Neil; van Haastregt, Sven; Antognini, Marco; Petit, Kevin (April 27, 2020). "The C++ for OpenCL Programming Language". Proceedings of the International Workshop on OpenCL. IWOCL '20. Munich, Germany: Association for Computing Machinery: 1–2. doi:10.1145/3388333.3388647. ISBN 978-1-4503-7531-3. S2CID 216554183.
27. KhronosGroup/OpenCL-Docs, The Khronos Group, April 16, 2021, retrieved April 18, 2021.
28. "Clang release 9 documentation, OpenCL support". releases.llvm.org. September 2019. Retrieved April 18, 2021.
29. "Clang 9, Language Extensions, OpenCL". releases.llvm.org. September 2019. Retrieved April 18, 2021.
30. "Release of Documentation of C++ for OpenCL kernel language, version 1.0, revision 1 · KhronosGroup/OpenCL-Docs". GitHub. December 2020. Retrieved April 18, 2021.
31. "The C++ for OpenCL 1.0 Programming Language Documentation". www.khronos.org. Retrieved April 18, 2021.
32. "Release of C++ for OpenCL Kernel Language Documentation, version 1.0, revision 2 · KhronosGroup/OpenCL-Docs". GitHub. March 2021. Retrieved April 18, 2021.
33. "cl_ext_cxx_for_opencl". www.khronos.org. September 2020. Retrieved April 18, 2021.
34. "Mali SDK Supporting Compilation of Kernels in C++ for OpenCL". community.arm.com. December 2020. Retrieved April 18, 2021.
35. "Clang Compiler User's Manual - C++ for OpenCL Support". clang.llvm.org. Retrieved April 18, 2021.
36. "OpenCL-Guide, Offline Compilation of OpenCL Kernel Sources". GitHub. Retrieved April 18, 2021.
37. "OpenCL-Guide, Programming OpenCL Kernels". GitHub. Retrieved April 18, 2021.
38. Clspv is a prototype compiler for a subset of OpenCL C to Vulkan compute shaders: google/clspv, August 17, 2019, retrieved August 20, 2019.
39. Petit, Kévin (April 17, 2021), Experimental implementation of OpenCL on Vulkan, retrieved April 18, 2021.
40. "Khronos Launches Heterogeneous Computing Initiative" (Press release). Khronos Group. June 16, 2008. Archived from the original on June 20, 2008. Retrieved June 18, 2008.
41. "OpenCL gets touted in Texas". MacWorld. November 20, 2008. Retrieved June 12, 2009.
42. "The Khronos Group Releases OpenCL 1.0 Specification" (Press release). Khronos Group. December 8, 2008. Retrieved December 4, 2016.
43. "Apple Previews Mac OS X Snow Leopard to Developers" (Press release). Apple Inc. June 9, 2008. Archived from the original on March 18, 2012. Retrieved June 9, 2008.
44. "AMD Drives Adoption of Industry Standards in GPGPU Software Development" (Press release). AMD. August 6, 2008. Retrieved August 14, 2008.
45. "AMD Backs OpenCL, Microsoft DirectX 11". eWeek. August 6, 2008. Archived from the original on March 19, 2012. Retrieved August 14, 2008.
46. "HPCWire: RapidMind Embraces Open Source and Standards Projects". HPCWire. November 10, 2008. Archived from the original on December 18, 2008. Retrieved November 11, 2008.
47. "Nvidia Adds OpenCL To Its Industry Leading GPU Computing Toolkit" (Press release). Nvidia. December 9, 2008. Retrieved December 10, 2008.
48. "OpenCL Development Kit for Linux on Power". alphaWorks. October 30, 2009. Retrieved October 30, 2009.
49. "Opencl Standard - an overview | ScienceDirect Topics".
50. http://developer.amd.com/wordpress/media/2012/10/opencl-1.0.48.pdf
51. "Khronos Drives Momentum of Parallel Computing Standard with Release of OpenCL 1.1 Specification". Archived from the original on March 2, 2016. Retrieved February 24, 2016.
52. "Khronos Releases OpenCL 1.2 Specification". Khronos Group. November 15, 2011. Retrieved June 23, 2015.
53. "OpenCL 1.2 Specification" (PDF). Khronos Group. Retrieved June 23, 2015.
54. "Khronos Finalizes OpenCL 2.0 Specification for Heterogeneous Computing". Khronos Group. November 18, 2013. Retrieved February 10, 2014.
55. "Khronos Releases OpenCL 2.1 and SPIR-V 1.0 Specifications for Heterogeneous Parallel Programming". Khronos Group. November 16, 2015. Retrieved November 16, 2015.
56. "Khronos Announces OpenCL 2.1: C++ Comes to OpenCL". AnandTech. March 3, 2015. Retrieved April 8, 2015.
57. "Khronos Releases OpenCL 2.1 Provisional Specification for Public Review". Khronos Group. March 3, 2015. Retrieved April 8, 2015.
58. "OpenCL Overview". Khronos Group. July 21, 2013.
59. "Khronos Releases OpenCL 2.2 Provisional Specification with OpenCL C++ Kernel Language for Parallel Programming". Khronos Group. April 18, 2016.
60. Trevett, Neil (April 2016). "OpenCL – A State of the Union" (PDF). IWOCL. Vienna: Khronos Group. Retrieved January 2, 2017.
61. "Khronos Releases OpenCL 2.2 With SPIR-V 1.2". Khronos Group. May 16, 2017.
62. "OpenCL 2.2 Maintenance Update Released". The Khronos Group. May 14, 2018.
63. "OpenCL 3.0 Bringing Greater Flexibility, Async DMA Extensions - Phoronix".
64. "Khronos Group Releases OpenCL 3.0". April 26, 2020.
65. https://www.khronos.org/registry/OpenCL/specs/3.0-unified/pdf/OpenCL_API.pdf
66. https://www.iwocl.org/wp-content/uploads/k03-iwocl-syclcon-2021-trevett-updated.mp4.pdf
67. "Using Semaphore and Memory Sharing Extensions for Vulkan Interop with NVIDIA OpenCL". February 24, 2022.
68. "Breaking: OpenCL Merging Roadmap into Vulkan | PC Perspective". www.pcper.com. Archived from the original on November 1, 2017. Retrieved May 17, 2017.
69. "SIGGRAPH 2018: OpenCL-Next Taking Shape, Vulkan Continues Evolving - Phoronix". www.phoronix.com.
70. "Vulkan Update SIGGRAPH 2019" (PDF).
71. Trevett, Neil (May 23, 2019). "Khronos and OpenCL Overview EVS Workshop May 19" (PDF). Khronos Group.
72. "OpenCL ICD Specification". Retrieved June 23, 2015.
73. "Apple entry on LLVM Users page". Retrieved August 29, 2009.
74. "Nvidia entry on LLVM Users page". Retrieved August 6, 2009.
75. "Rapidmind entry on LLVM Users page". Retrieved October 1, 2009.
76. "Zack Rusin's blog post about the Gallium3D OpenCL implementation". February 2009. Retrieved October 1, 2009.
77. "GalliumCompute". dri.freedesktop.org. Retrieved June 23, 2015.
78. "Clover Status Update" (PDF).
79. "mesa/mesa - The Mesa 3D Graphics Library". cgit.freedesktop.org.
80. "Gallium Clover With SPIR-V & NIR Opening Up New Compute Options Inside Mesa - Phoronix". www.phoronix.com. Archived from the original on October 22, 2020. Retrieved December 13, 2018.
81. https://xdc2018.x.org/slides/clover.pdf
82. "Mesa's "Rusticl" Implementation Now Manages to Handle Darktable OpenCL".
83. Larabel, Michael (January 10, 2013). "Beignet: OpenCL/GPGPU Comes For Ivy Bridge On Linux". Phoronix.
84. Larabel, Michael (April 16, 2013). "More Criticism Comes Towards Intel's Beignet OpenCL". Phoronix.
85. Larabel, Michael (December 24, 2013). "Intel's Beignet OpenCL Is Still Slowly Baking". Phoronix.
86. "Beignet". freedesktop.org.
87. "beignet - Beignet OpenCL Library for Intel Ivy Bridge and newer GPUs". cgit.freedesktop.org.
88. "Intel Brings Beignet To Android For OpenCL Compute - Phoronix". www.phoronix.com.
89. "01.org Intel Open Source - Compute Runtime". February 7, 2018.
90. "NEO GitHub README". GitHub. March 21, 2019.
91. "ROCm". GitHub. Archived from the original on October 8, 2016.
92. "RadeonOpenCompute/ROCm: ROCm - Open Source Platform for HPC and Ultrascale GPU Computing". GitHub. March 21, 2019.
93. "A Nice Overview Of The ROCm Linux Compute Stack - Phoronix". www.phoronix.com.
94. "XDC Lightning.pdf". Google Docs.
95. "Radeon ROCm 2.0 Officially Out With OpenCL 2.0 Support, TensorFlow 1.12, Vega 48-bit VA - Phoronix". www.phoronix.com.
96. "Taking Radeon ROCm 2.0 OpenCL For A Benchmarking Test Drive - Phoronix". www.phoronix.com.
97. https://github.com/RadeonOpenCompute/ROCm/blob/master/AMD_ROCm_Release_Notes_v3.3.pdf [dead link]
98. "Radeon ROCm 3.5 Released with New Features but Still No Navi Support - Phoronix".
99. "Radeon ROCm 3.10 Released with Data Center Tool Improvements, New APIs - Phoronix".
100. "AMD Launches Arcturus as the Instinct MI100, Radeon ROCm 4.0 - Phoronix".
101. "Welcome to AMD ROCm™ Platform - ROCm Documentation 1.0.0 documentation".
102. https://docs.amd.com/
103. Jääskeläinen, Pekka; Sánchez de La Lama, Carlos; Schnetter, Erik; Raiskila, Kalle; Takala, Jarmo; Berg, Heikki (2016). "pocl: A Performance-Portable OpenCL Implementation". Int'l J. Parallel Programming. 43 (5): 752–785. arXiv:1611.07083. Bibcode:2016arXiv161107083J. doi:10.1007/s10766-014-0320-y. S2CID 9905244.
104. "pocl home page". pocl.
105. "GitHub - pocl/pocl: pocl: Portable Computing Language". March 14, 2019 – via GitHub.
106. "HSA support implementation status as of 2016-05-17 - Portable Computing Language (pocl) 1.3-pre documentation". portablecl.org.
107. "PoCL home page".
108. "PoCL home page".
109. "PoCL home page".
110. "Archived copy". Archived from the original on January 17, 2021. Retrieved December 3, 2020.
111. https://www.iwocl.org/wp-content/uploads/30-iwocl-syclcon-2021-baumann-slides.pdf
112. "PoCL home page".
113. "PoCL home page".
114. "PoCL home page".
115. "About". Git.Linaro.org.
116. Gall, T.; Pitney, G. (March 6, 2014). "LCA14-412: GPGPU on ARM SoC" (PDF). Amazon Web Services. Archived from the original (PDF) on July 26, 2020. Retrieved January 22, 2017.
117. "zuzuf/freeocl". GitHub. Retrieved April 13, 2017.
118. Zhang, Peng; Fang, Jianbin; Yang, Canqun; Tang, Tao; Huang, Chun; Wang, Zheng (2018). MOCL: An Efficient OpenCL Implementation for the Matrix-2000 Architecture (PDF). Proc. Int'l Conf. on Computing Frontiers. doi:10.1145/3203217.3203244.
119. "Status". GitHub. March 16, 2022.
120. "OpenCL Demo, AMD CPU". YouTube. December 10, 2008. Retrieved March 28, 2009.
121. "OpenCL Demo, Nvidia GPU". YouTube. December 10, 2008. Retrieved March 28, 2009.
122. "Imagination Technologies launches advanced, highly-efficient POWERVR SGX543MP multi-processor graphics IP family". Imagination Technologies. March 19, 2009. Archived from the original on April 3, 2014. Retrieved January 30, 2011.
123. "AMD and Havok demo OpenCL accelerated physics". PC Perspective. March 26, 2009. Archived from the original on April 5, 2009. Retrieved March 28, 2009.
124. "Nvidia Releases OpenCL Driver To Developers". Nvidia. April 20, 2009. Archived from the original on February 4, 2012. Retrieved April 27, 2009.
125. "AMD does reverse GPGPU, announces OpenCL SDK for x86". Ars Technica. August 5, 2009. Retrieved August 6, 2009. [permanent dead link]
126. Moren, Dan; Snell, Jason (June 8, 2009). "Live Update: WWDC 2009 Keynote". MacWorld.com. MacWorld. Retrieved June 12, 2009.
127. "ATI Stream Software Development Kit (SDK) v2.0 Beta Program". Archived from the original on August 9, 2009. Retrieved October 14, 2009.
128. "S3 Graphics launched the Chrome 5400E embedded graphics processor". Archived from the original on December 2, 2009. Retrieved October 27, 2009.
129. "VIA Brings Enhanced VN1000 Graphics Processor". Archived from the original on December 15, 2009. Retrieved December 10, 2009.
130. "ATI Stream SDK v2.0 with OpenCL 1.0 Support". Archived from the original on November 1, 2009. Retrieved October 23, 2009.
131. "OpenCL". ZiiLABS. Retrieved June 23, 2015.
132. "Intel discloses new Sandy Bridge technical details". Archived from the original on October 31, 2013. Retrieved September 13, 2010.
133. "WebCL related stories". Khronos Group. Retrieved June 23, 2015.
134. "Khronos Releases Final WebGL 1.0 Specification". Khronos Group. Archived from the original on July 9, 2015. Retrieved June 23, 2015.
135. "Community".
136. "Welcome to Wikis". www.ibm.com. October 20, 2009.
137. "Nokia Research releases WebCL prototype". Khronos Group. May 4, 2011. Archived from the original on December 5, 2020. Retrieved June 23, 2015.
138. Kamath K, Sharath. "Samsung's WebCL Prototype for WebKit". Github.com. Archived from the original on February 18, 2015. Retrieved June 23, 2015.
139. "AMD Opens the Throttle on APU Performance with Updated OpenCL Software Development". Amd.com. August 8, 2011. Retrieved June 16, 2013.
140. "AMD APP SDK v2.6". Forums.amd.com. March 13, 2015. Retrieved June 23, 2015. [dead link]
141. "The Portland Group Announces OpenCL Compiler for ST-Ericsson ARM-Based NovaThor SoCs". Retrieved May 4, 2012.
142. "WebCL Latest Spec". Khronos Group. November 7, 2013. Archived from the original on August 1, 2014. Retrieved June 23, 2015.
143. "Altera Opens the World of FPGAs to Software Programmers with Broad Availability of SDK and Off-the-Shelf Boards for OpenCL". Altera.com. Archived from the original on January 9, 2014. Retrieved January 9, 2014.
144. "Altera SDK for OpenCL is First in Industry to Achieve Khronos Conformance for FPGAs". Altera.com. Archived from the original on January 9, 2014. Retrieved January 9, 2014.
145. "Khronos Finalizes OpenCL 2.0 Specification for Heterogeneous Computing". Khronos Group. November 18, 2013. Retrieved June 23, 2015.
146. "WebCL 1.0 Press Release". Khronos Group. March 19, 2014. Retrieved June 23, 2015.
147. "WebCL 1.0 Specification". Khronos Group. March 14, 2014. Retrieved June 23, 2015.
148. "Intel OpenCL 2.0 Driver". Archived from the original on September 17, 2014. Retrieved October 14, 2014.
149. "AMD OpenCL 2.0 Driver". Support.AMD.com. June 17, 2015. Retrieved June 23, 2015.
150. "Xilinx SDAccel development environment for OpenCL, C, and C++, achieves Khronos Conformance - khronos.org news". The Khronos Group. Retrieved June 26, 2017.
151. "Release 349 Graphics Drivers for Windows, Version 350.12" (PDF). April 13, 2015. Retrieved February 4, 2016.
152. "AMD APP SDK 3.0 Released". Developer.AMD.com. August 26, 2015. Retrieved September 11, 2015.
153. "Khronos Releases OpenCL 2.1 and SPIR-V 1.0 Specifications for Heterogeneous Parallel Programming". Khronos Group. November 16, 2015.
154. "What's new? Intel® SDK for OpenCL™ Applications 2016, R3". Intel Software.
155. "NVIDIA 378.66 drivers for Windows offer OpenCL 2.0 evaluation support". Khronos Group. February 17, 2017. Archived from the original on August 6, 2020. Retrieved March 17, 2017.
156. Szuppe, Jakub (February 22, 2017). "NVIDIA enables OpenCL 2.0 beta-support".
157. Szuppe, Jakub (March 6, 2017). "NVIDIA beta-support for OpenCL 2.0 works on Linux too".
158. "The Khronos Group". The Khronos Group. March 21, 2019.
159. "GitHub - RadeonOpenCompute/ROCm at roc-3.5.0". GitHub.
160. "NVIDIA is Now OpenCL 3.0 Conformant". April 12, 2021.
161. "The Khronos Group". The Khronos Group. August 20, 2019. Retrieved August 20, 2019.
162. "KhronosGroup/OpenCL-CTL: The OpenCL Conformance Tests". GitHub. March 21, 2019.
163. "OpenCL and the AMD APP SDK". AMD Developer Central. developer.amd.com. Archived from the original on August 4, 2011. Retrieved August 11, 2011.
164. "About Intel OpenCL SDK 1.1". software.intel.com. intel.com. Retrieved August 11, 2011.
165. "Intel® SDK for OpenCL™ Applications - Release Notes". software.intel.com. March 14, 2019.
166. "Product Support". Retrieved August 11, 2011.
167. "Intel OpenCL SDK – Release Notes". Archived from the original on July 17, 2011. Retrieved August 11, 2011.
168. "Announcing OpenCL Development Kit for Linux on Power v0.3". IBM. Retrieved August 11, 2011.
169. "IBM releases OpenCL Development Kit for Linux on Power v0.3 – OpenCL 1.1 conformant release available". OpenCL Lounge. ibm.com. Retrieved August 11, 2011.
170. "IBM releases OpenCL Common Runtime for Linux on x86 Architecture". IBM. October 20, 2009. Retrieved September 10, 2011.
171. "OpenCL and the AMD APP SDK". AMD Developer Central. developer.amd.com. Archived from the original on September 6, 2011. Retrieved September 10, 2011.
172. "Nvidia Releases OpenCL Driver". April 22, 2009. Retrieved August 11, 2011.
173. "clinfo by Simon Leblanc". GitHub. Retrieved January 27, 2017.
174. "clinfo by Oblomov". GitHub. Retrieved January 27, 2017.
175. "clinfo: openCL INFOrmation". Retrieved January 27, 2017.
176. "Khronos Products". The Khronos Group. Retrieved May 15, 2017.
177. "OpenCL-CTS/Test_conformance at main · KhronosGroup/OpenCL-CTS". GitHub.
178. "Issues · KhronosGroup/OpenCL-CTS". GitHub.
179. "Intel Compute-Runtime 20.43.18277 Brings Alder Lake Support".
180. "compute-runtime". 01.org. February 7, 2018.
181. Fang, Jianbin; Varbanescu, Ana Lucia; Sips, Henk (2011). "A Comprehensive Performance Comparison of CUDA and OpenCL". 2011 International Conference on Parallel Processing. Proc. Int'l Conf. on Parallel Processing. pp. 216–225. doi:10.1109/ICPP.2011.45. ISBN 978-1-4577-1336-1.
182. Du, Peng; Weber, Rick; Luszczek, Piotr; Tomov, Stanimire; Peterson, Gregory; Dongarra, Jack (2012). "From CUDA to OpenCL: Towards a performance-portable solution for multi-platform GPU programming". Parallel Computing. 38 (8): 391–407. CiteSeerX 10.1.1.193.7712. doi:10.1016/j.parco.2011.10.002.
183. Dolbeau, Romain; Bodin, François; de Verdière, Guillaume Colin (September 7, 2013). "One OpenCL to rule them all?". 2013 IEEE 6th International Workshop on Multi-/Many-core Computing Systems (MuCoCoS). pp. 1–6. doi:10.1109/MuCoCoS.2013.6633603. ISBN 978-1-4799-1010-6. S2CID 225784.
184. Karimi, Kamran; Dickson, Neil G.; Hamze, Firas (2011). "A Performance Comparison of CUDA and OpenCL". arXiv:1005.2581v3 [cs.PF].
185. A Survey of CPU-GPU Heterogeneous Computing Techniques, ACM Computing Surveys, 2015.
186. Grewe, Dominik; O'Boyle, Michael F. P. (2011). "A Static Task Partitioning Approach for Heterogeneous Systems Using OpenCL". Compiler Construction. Proc. Int'l Conf. on Compiler Construction. Lecture Notes in Computer Science. Vol. 6601. pp. 286–305. doi:10.1007/978-3-642-19861-8_16. ISBN 978-3-642-19860-1.
187. "Radeon RX 6800 Series Has Excellent ROCm-Based OpenCL Performance on Linux".

External links

Official website
Official website for WebCL
International Workshop on OpenCL (IWOCL), sponsored by The Khronos Group. Archived January 26, 2021, at the Wayback Machine.

Retrieved from "https://en.wikipedia.org/w/index.php?title=OpenCL&oldid=1096715527"