OpenCL
From Wikipedia, the free encyclopedia
Open standard for programming heterogeneous computing systems, such as CPUs or GPUs
Not to be confused with OpenGL.
For the cryptographic library initially known as OpenCL, see Botan (programming library).
OpenCL API
Original author(s): Apple Inc.
Developer(s): Khronos Group
Initial release: August 28, 2009
Stable release: 3.0.11[1] / May 6, 2022
Written in: C with C++ bindings
Operating system: Android (vendor dependent),[2] FreeBSD,[3] Linux, macOS (via Pocl), Windows
Platform: ARMv7, ARMv8,[4] Cell, IA-32, Power, x86-64
Type: Heterogeneous computing API
License: OpenCL specification license
Website: www.khronos.org/opencl/
OpenCL C/C++ and C++ for OpenCL
Paradigm: Imperative (procedural), structured, (C++ only) object-oriented, generic programming
Family: C
Stable release: OpenCL C++ 1.0 revision V2.2-11[5]; OpenCL C 3.0 revision V3.0.11[6]; C++ for OpenCL 1.0 revision 2[7] / March 31, 2021
Typing discipline: Static, weak, manifest, nominal
Implementation language: Implementation specific
Filename extensions: .cl .clcpp
Website: www.khronos.org/opencl
Major implementations: AMD, Gallium Compute, IBM, Intel NEO, Intel SDK, Texas Instruments, Nvidia, POCL, Arm
Influenced by: C99, CUDA, C++14, C++17
OpenCL (Open Computing Language) is a framework for writing programs that execute across heterogeneous platforms consisting of central processing units (CPUs), graphics processing units (GPUs), digital signal processors (DSPs), field-programmable gate arrays (FPGAs) and other processors or hardware accelerators. OpenCL specifies programming languages (based on C99, C++14 and C++17) for programming these devices and application programming interfaces (APIs) to control the platform and execute programs on the compute devices. OpenCL provides a standard interface for parallel computing using task- and data-based parallelism.
OpenCL is an open standard maintained by the non-profit technology consortium Khronos Group. Conformant implementations are available from Altera, AMD, ARM, Creative, IBM, Imagination, Intel, Nvidia, Qualcomm, Samsung, Vivante, Xilinx, and ZiiLABS.[8][9]
Contents
1 Overview
1.1 Memory hierarchy
2 OpenCL kernel language
2.1 OpenCL C language
2.1.1 Example: matrix–vector multiplication
2.1.2 Example: computing the FFT
2.2 C++ for OpenCL language
2.2.1 Features
2.2.2 Example: complex-number arithmetic
2.2.3 Tooling and Execution Environment
2.2.4 Contributions
3 History
3.1 OpenCL 1.0
3.2 OpenCL 1.1
3.3 OpenCL 1.2
3.4 OpenCL 2.0
3.5 OpenCL 2.1
3.6 OpenCL 2.2
3.7 OpenCL 3.0
4 Roadmap
5 Open source implementations
6 Vendor implementations
6.1 Timeline of vendor implementations
7 Devices
7.1 Khronos Conformance Test Suite
7.2 Conformant products
7.3 Version support
7.3.1 OpenCL 3.0 support
7.3.2 OpenCL 2.2 support
7.3.3 OpenCL 2.1 support
7.3.4 OpenCL 2.0 support
7.3.5 OpenCL 1.2 support
7.3.6 OpenCL 1.1 support
7.3.7 OpenCL 1.0 support
8 Portability, performance and alternatives
9 See also
10 References
11 External links
Overview
OpenCL views a computing system as consisting of a number of compute devices, which might be central processing units (CPUs) or "accelerators" such as graphics processing units (GPUs), attached to a host processor (a CPU). It defines a C-like language for writing programs. Functions executed on an OpenCL device are called "kernels".[10]: 17 A single compute device typically consists of several compute units, which in turn comprise multiple processing elements (PEs). A single kernel execution can run on all or many of the PEs in parallel. How a compute device is subdivided into compute units and PEs is up to the vendor. A compute unit can be thought of as a "core", but the notion of core is hard to define across all the types of devices supported by OpenCL (or even within the category of "CPUs").[11]: 49–50 The number of compute units may also not correspond to the number of cores claimed in vendors' marketing literature, which may actually be counting SIMD lanes.[12]
In addition to its C-like programming language, OpenCL defines an application programming interface (API) that allows programs running on the host to launch kernels on the compute devices and manage device memory, which is (at least conceptually) separate from host memory. Programs in the OpenCL language are intended to be compiled at run-time, so that OpenCL-using applications are portable between implementations for various host devices.[13] The OpenCL standard defines host APIs for C and C++. Third-party APIs exist for other programming languages and platforms such as Python,[14] Java, Perl,[15] D[16] and .NET.[11]: 15 An implementation of the OpenCL standard consists of a library that implements the API for C and C++, and an OpenCL C compiler for the compute device(s) targeted.
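The following plain-C sketch is a rough illustration of the host/kernel split described above. It emulates a one-dimensional kernel launch serially on the CPU. It is not OpenCL code:
- the names kernel_fn, run_ndrange_1d, saxpy_args and saxpy_kernel are hypothetical;
- the gid parameter stands in for get_global_id(0);
- a real implementation would compile the kernel from OpenCL C at run time and execute the work-items in parallel on a device.

```c
#include <stddef.h>

/* Hypothetical stand-in for a compiled kernel: gid plays the role of
 * get_global_id(0) and args carries the kernel arguments. */
typedef void (*kernel_fn)(size_t gid, void *args);

/* Serially "enqueue" a one-dimensional range of global_size work-items.
 * A real device would run these invocations in parallel, in no fixed order. */
static void run_ndrange_1d(kernel_fn kernel, size_t global_size, void *args)
{
    for (size_t gid = 0; gid < global_size; gid++)
        kernel(gid, args);
}

/* Arguments of the example kernel y[i] = a * x[i] + y[i] (SAXPY). */
struct saxpy_args {
    float a;
    const float *x;
    float *y;
};

/* Body executed once per work-item. */
static void saxpy_kernel(size_t gid, void *p)
{
    struct saxpy_args *s = p;
    s->y[gid] = s->a * s->x[gid] + s->y[gid];
}
```

Take x = {1, 2, 3, 4}, y = {1, 1, 1, 1} and a = 2. The four emulated work-items then leave y = {3, 5, 7, 9}.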
The Standard Portable Intermediate Representation (SPIR)[17] can be used as a target-independent way to ship kernels between a front-end compiler and the OpenCL back-end. This opens the OpenCL programming model to other languages and protects the kernel source from inspection.
More recently, the Khronos Group has ratified SYCL,[18] a higher-level programming model for OpenCL. SYCL is a single-source eDSL based on pure C++17, intended to improve programming productivity. People who want C++ in their kernels but not the SYCL single-source programming style can write compute kernel sources in the "C++ for OpenCL" language.[19]
Memory hierarchy
OpenCL defines a four-level memory hierarchy for the compute device:[13]
global memory: shared by all processing elements, but has high access latency (__global);
read-only memory: smaller, low latency, writable by the host CPU but not the compute devices (__constant);
local memory: shared by a group of processing elements (__local);
per-element private memory (registers; __private).
Not every device needs to implement each level of this hierarchy in hardware. Consistency between the various levels in the hierarchy is relaxed, and only enforced by explicit synchronization constructs, notably barriers.
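The interplay of __local memory and barriers can be sketched in plain C by emulating one work-group serially. WG_SIZE and workgroup_sum are hypothetical names. The array local_mem plays the role of a __local buffer shared by the work-group. Each pass of an outer loop corresponds to the code between two calls to barrier(CLK_LOCAL_MEM_FENCE) in a real kernel.

```c
#include <stddef.h>

#define WG_SIZE 8 /* hypothetical work-group size; must be a power of two */

/* Serially emulates one work-group summing WG_SIZE consecutive values of
 * global_in. local_mem stands in for a __local array; each comment marked
 * "barrier" is where a kernel would call barrier(CLK_LOCAL_MEM_FENCE). */
static float workgroup_sum(const float *global_in, size_t group_id)
{
    float local_mem[WG_SIZE];

    /* Every work-item copies one element from global to local memory. */
    for (size_t lid = 0; lid < WG_SIZE; lid++)
        local_mem[lid] = global_in[group_id * WG_SIZE + lid];
    /* barrier */

    /* Tree reduction: each step halves the number of active work-items.
     * Work-item lid reads slot lid + stride, which no active work-item writes
     * in the same step, so the parallel version would be race-free. */
    for (size_t stride = WG_SIZE / 2; stride > 0; stride /= 2) {
        for (size_t lid = 0; lid < stride; lid++)
            local_mem[lid] += local_mem[lid + stride];
        /* barrier */
    }

    /* Work-item 0 would write this result back to global memory. */
    return local_mem[0];
}
```

Without the barriers, a work-item could read a slot of local_mem before its partner had finished writing it. This is the relaxed consistency that the explicit synchronization guards against.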
Devices may or may not share memory with the host CPU.[13] The host API provides handles on device memory buffers and functions to transfer data back and forth between host and devices.
OpenCL kernel language
The programming language that is used to write compute kernels is called the kernel language. OpenCL adopts C/C++-based languages to specify the kernel computations performed on the device. These languages add some restrictions and extensions to facilitate efficient mapping to the heterogeneous hardware resources of accelerators. Traditionally, OpenCL C was used to program the accelerators in the OpenCL standard. Later, the C++ for OpenCL kernel language was developed; it inherits all functionality from OpenCL C but allows C++ features to be used in the kernel sources.
OpenCL C language
OpenCL C[20] is a C99-based language dialect adapted to fit the device model in OpenCL. Memory buffers reside in specific levels of the memory hierarchy, and pointers are annotated with the region qualifiers __global, __local, __constant, and __private, reflecting this. Instead of a device program having a main function, OpenCL C functions are marked __kernel to signal that they are entry points into the program to be called from the host program. Function pointers, bit fields and variable-length arrays are omitted, and recursion is forbidden.[21] The C standard library is replaced by a custom set of standard functions, geared toward math programming.
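Because recursion and variable-length arrays are unavailable, recursive algorithms are typically rewritten with an explicit, fixed-size stack. The following plain-C sketch shows the pattern for quicksort: pending index ranges are kept in a fixed-size array instead of on the call stack. It is not taken from any OpenCL specification, and quicksort_iterative and MAX_STACK are hypothetical names.

```c
#define MAX_STACK 64 /* fixed capacity: no variable-length arrays or malloc */

/* Quicksort without recursion: pending [lo, hi] ranges are kept on an
 * explicit fixed-size stack. The larger partition is pushed and the smaller
 * one is processed immediately, which keeps the stack depth at most log2(n). */
static void quicksort_iterative(int *v, int n)
{
    int stack[MAX_STACK][2];
    int top = 0;

    if (n < 2)
        return;
    stack[top][0] = 0;
    stack[top][1] = n - 1;
    top++;

    while (top > 0) {
        top--;
        int lo = stack[top][0], hi = stack[top][1];
        while (lo < hi) {
            /* Lomuto partition around the last element. */
            int pivot = v[hi], p = lo;
            for (int k = lo; k < hi; k++) {
                if (v[k] < pivot) {
                    int t = v[k]; v[k] = v[p]; v[p] = t;
                    p++;
                }
            }
            int t = v[p]; v[p] = v[hi]; v[hi] = t;

            /* Defer the larger side, keep working on the smaller side. */
            if (p - lo > hi - p) {
                stack[top][0] = lo; stack[top][1] = p - 1; top++;
                lo = p + 1;
            } else {
                stack[top][0] = p + 1; stack[top][1] = hi; top++;
                hi = p - 1;
            }
        }
    }
}
```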
OpenCL C is extended to facilitate use of parallelism with vector types and operations, synchronization, and functions to work with work-items and work-groups.[21] Besides scalar types such as float and double, which behave similarly to the corresponding types in C, OpenCL provides fixed-length vector types such as float4 (a 4-vector of single-precision floats). Such vector types are available in lengths two, three, four, eight and sixteen for various base types.[20]: § 6.1.2 Vectorized operations on these types are intended to map onto SIMD instruction sets, e.g., SSE or VMX, when running OpenCL programs on CPUs.[13] Other specialized types include 2-d and 3-d image types.[20]: 10–11
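On the CPU side, a comparable effect can be approximated with compiler vector extensions. The sketch below uses the GCC/Clang vector_size attribute. This is a compiler extension rather than standard C, and it is not the OpenCL float4 type itself; float4_t and madd4 are hypothetical names. The single expression a * b + c operates on all four lanes at once, much as it would on float4 operands in OpenCL C. The compiler may lower it to SSE or similar SIMD instructions.

```c
/* GCC/Clang vector extension: four floats in one 16-byte vector. This is a
 * compiler extension standing in for OpenCL C's float4, not standard C. */
typedef float float4_t __attribute__((vector_size(16)));

/* Lane-wise a * b + c on all four components in one expression,
 * the same way OpenCL C code would write it for float4 operands. */
static float4_t madd4(float4_t a, float4_t b, float4_t c)
{
    return a * b + c;
}
```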
Example: matrix–vector multiplication
Each invocation (work-item) of the kernel takes a row of the green matrix (A in the code), multiplies this row with the red vector (x) and places the result in an entry of the blue vector (y). The number of columns n is passed to the kernel as ncols; the number of rows is implicit in the number of work-items produced by the host program.
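A serial CPU version of the same computation is handy as a reference when checking the device's output. The plain-C sketch below uses the hypothetical name matvec_ref. It makes the work-item loop that the OpenCL C kernel that follows leaves implicit into an explicit loop: the outer loop stands in for the work-items, one per row, and the inner loop is what each work-item executes.

```c
#include <stddef.h>

/* Serial CPU reference for the matvec kernel: the outer loop stands in for
 * the nrows work-items (i plays the role of get_global_id(0)), and the inner
 * loop is the body each work-item executes. A is row-major, as in the kernel. */
static void matvec_ref(const float *A, const float *x, unsigned ncols,
                       float *y, size_t nrows)
{
    for (size_t i = 0; i < nrows; i++) {
        const float *a = &A[i * ncols]; /* pointer to the i'th row */
        float sum = 0.f;                /* accumulator for the dot product */
        for (size_t j = 0; j < ncols; j++)
            sum += a[j] * x[j];
        y[i] = sum;
    }
}
```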
The following is a matrix–vector multiplication algorithm in OpenCL C.
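// Multiplies A*x, leaving the result in y.
// A is a row-major matrix, meaning the (i,j) element is at A[i*ncols+j].
__kernel void matvec(__global const float *A, __global const float *x,
                     uint ncols, __global float *y)
{
    size_t i = get_global_id(0);              // Global id, used as the row index
    __global float const *a = &A[i*ncols];    // Pointer to the i'th row
    float sum = 0.f;                          // Accumulator for dot product
    for (size_t j = 0; j < ncols; j++) {
        sum += a[j] * x[j];
    }
    y[i] = sum;
}
In each invocation, the kernel function matvec computes the dot product of a single row of the matrix A and the vector x: y_i = Σ_j A_ij x_j. To extend this into a full matrix–vector multiplication, the OpenCL runtime maps the kernel over the rows of the matrix. On the host side, the clEnqueueNDRangeKernel function does this. It takes the kernel to execute, whose arguments have been set with clSetKernelArg, and a number of work-items corresponding to the number of rows in the matrix A.
Example: computing the FFT
This example loads a fast Fourier transform (FFT) kernel and executes it. The host program below:
- asks the OpenCL library for a GPU on the first available platform;
- creates memory buffers for reading and writing (from the perspective of the GPU);
- compiles the FFT kernel at run time;
- runs the kernel on one 1024-point input and reads the result back.
The program includes the kernel source file as a C++11 raw string literal, so it must be compiled as C++.
// Host program for the FFT kernel. The kernel file included below holds a
// C++11 raw string literal, so compile this file as C++11 or later.
#define CL_TARGET_OPENCL_VERSION 120
#include <stdio.h>
#include <stdlib.h>
#include "CL/opencl.h"
#define NUM_ENTRIES 1024
// Print a message and exit if an OpenCL call did not return CL_SUCCESS.
#define CHECK(err, what) \
    do { if ((err) != CL_SUCCESS) { fprintf(stderr, "%s failed (%d)\n", what, (int)(err)); exit(1); } } while (0)
int main()
{
    // The source code of the kernel is represented as a string
    // located inside file "fft1D_1024_kernel_src.cl". For the details see the next listing.
    const char *KernelSource =
#include "fft1D_1024_kernel_src.cl"
        ;
    cl_int err;
    // Look up the first platform and its first GPU
    cl_platform_id platform;
    err = clGetPlatformIDs(1, &platform, NULL);
    CHECK(err, "clGetPlatformIDs");
    cl_device_id device;
    err = clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);
    CHECK(err, "clGetDeviceIDs");
    // Create a compute context and a command queue for that GPU
    cl_context context = clCreateContext(NULL, 1, &device, NULL, NULL, &err);
    CHECK(err, "clCreateContext");
    cl_command_queue queue = clCreateCommandQueue(context, device, 0, &err);
    CHECK(err, "clCreateCommandQueue");
    // Host data: NUM_ENTRIES complex values stored as interleaved (re, im) floats
    static float host_in[2 * NUM_ENTRIES], host_out[2 * NUM_ENTRIES];
    for (int k = 0; k < NUM_ENTRIES; k++) {
        host_in[2 * k] = (float)(k % 8);  // arbitrary test signal
        host_in[2 * k + 1] = 0.0f;
    }
    // Allocate the buffer memory objects; CL_MEM_COPY_HOST_PTR needs a host pointer
    cl_mem memobjs[2];
    memobjs[0] = clCreateBuffer(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                                sizeof(host_in), host_in, &err);
    CHECK(err, "clCreateBuffer (input)");
    memobjs[1] = clCreateBuffer(context, CL_MEM_READ_WRITE, sizeof(host_out), NULL, &err);
    CHECK(err, "clCreateBuffer (output)");
    // Create the compute program from source, build it, and create the kernel
    cl_program program = clCreateProgramWithSource(context, 1, &KernelSource, NULL, &err);
    CHECK(err, "clCreateProgramWithSource");
    err = clBuildProgram(program, 1, &device, NULL, NULL, NULL);
    CHECK(err, "clBuildProgram");
    cl_kernel kernel = clCreateKernel(program, "fft1D_1024", &err);
    CHECK(err, "clCreateKernel");
    // Each work-item processes 16 points, so one 1024-point FFT is computed
    // by a single work-group of 64 work-items.
    size_t local_work_size[1] = {64};
    size_t global_work_size[1] = {NUM_ENTRIES / 16};
    size_t smem_bytes = sizeof(float) * (local_work_size[0] + 1) * 16;  // padded __local tile
    // Set the argument values (__local arguments take a size and a NULL value)
    clSetKernelArg(kernel, 0, sizeof(cl_mem), &memobjs[0]);
    clSetKernelArg(kernel, 1, sizeof(cl_mem), &memobjs[1]);
    clSetKernelArg(kernel, 2, smem_bytes, NULL);  // __local float *sMemx
    clSetKernelArg(kernel, 3, smem_bytes, NULL);  // __local float *sMemy
    // Execute the kernel over the N-D range, then read the result back (blocking)
    err = clEnqueueNDRangeKernel(queue, kernel, 1, NULL, global_work_size, local_work_size,
                                 0, NULL, NULL);
    CHECK(err, "clEnqueueNDRangeKernel");
    err = clEnqueueReadBuffer(queue, memobjs[1], CL_TRUE, 0, sizeof(host_out), host_out,
                              0, NULL, NULL);
    CHECK(err, "clEnqueueReadBuffer");
    // Release the OpenCL objects
    clReleaseKernel(kernel);
    clReleaseProgram(program);
    clReleaseMemObject(memobjs[0]);
    clReleaseMemObject(memobjs[1]);
    clReleaseCommandQueue(queue);
    clReleaseContext(context);
    return 0;
}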
The actual calculation inside file "fft1D_1024_kernel_src.cl" (based on "Fitting FFT onto the G80 Architecture"):[23]
R"(
  // This kernel computes FFT of length 1024. The 1024 length FFT is decomposed into
  // calls to a radix 16 function, another radix 16 function and then a radix 4 function.
  // The helper functions it calls (globalLoads, fftRadix16Pass, twiddleFactorMul,
  // localShuffle, fftRadix4Pass, globalStores) are not shown in this listing; they
  // must be defined earlier in this file (see the full implementation referenced below).
  __kernel void fft1D_1024(__global float2 *in, __global float2 *out,
                           __local float *sMemx, __local float *sMemy) {
    int tid = get_local_id(0);
    int blockIdx = get_group_id(0) * 1024 + tid;
    float2 data[16];
    // starting index of data to/from global memory
    in = in + blockIdx;  out = out + blockIdx;
    globalLoads(data, in, 64); // coalesced global reads
    fftRadix16Pass(data);      // in-place radix-16 pass
    twiddleFactorMul(data, tid, 1024, 0);
    // local shuffle using local memory
    localShuffle(data, sMemx, sMemy, tid, (((tid & 15) * 65) + (tid >> 4)));
    fftRadix16Pass(data);               // in-place radix-16 pass
    twiddleFactorMul(data, tid, 64, 4); // twiddle factor multiplication
    localShuffle(data, sMemx, sMemy, tid, (((tid >> 4) * 64) + (tid & 15)));
    // four radix-4 function calls
    fftRadix4Pass(data);      // radix-4 function number 1
    fftRadix4Pass(data + 4);  // radix-4 function number 2
    fftRadix4Pass(data + 8);  // radix-4 function number 3
    fftRadix4Pass(data + 12); // radix-4 function number 4
    // coalesced global writes
    globalStores(data, out, 64);
  }
)"
A full, open source implementation of an OpenCL FFT can be found on Apple's website.[24]
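The final stage of the kernel above applies fftRadix4Pass to four groups of four values. Its body is not shown in this listing, so the following plain-C sketch only illustrates the underlying arithmetic: a 4-point discrete Fourier transform, X_k = Σ_n x_n e^(-2πikn/4). Its twiddle factors are ±1 and ±i, so it needs no trigonometric calls. The cfloat2 struct (mirroring OpenCL's float2, with x as the real part) and the function dft4 are hypothetical. The in-place ordering used by the real helper may differ.

```c
/* Mirrors OpenCL's float2 when used as a complex number: x = real, y = imaginary. */
typedef struct {
    float x, y;
} cfloat2;

/* In-place 4-point DFT, X[k] = sum over n of a[n] * exp(-2*pi*i*k*n/4).
 * The twiddle factors are only +1, -1, +i and -i, so the transform needs
 * nothing but additions, subtractions and swapped components. */
static void dft4(cfloat2 *a)
{
    cfloat2 s02 = {a[0].x + a[2].x, a[0].y + a[2].y}; /* a0 + a2 */
    cfloat2 d02 = {a[0].x - a[2].x, a[0].y - a[2].y}; /* a0 - a2 */
    cfloat2 s13 = {a[1].x + a[3].x, a[1].y + a[3].y}; /* a1 + a3 */
    cfloat2 d13 = {a[1].x - a[3].x, a[1].y - a[3].y}; /* a1 - a3 */
    cfloat2 m = {d13.y, -d13.x};                       /* -i * (a1 - a3) */

    a[0] = (cfloat2){s02.x + s13.x, s02.y + s13.y};
    a[1] = (cfloat2){d02.x + m.x, d02.y + m.y};
    a[2] = (cfloat2){s02.x - s13.x, s02.y - s13.y};
    a[3] = (cfloat2){d02.x - m.x, d02.y - m.y};
}
```

For example, the shifted impulse {0, 1, 0, 0} transforms to {1, -i, -1, i}, and the constant input {1, 1, 1, 1} transforms to {4, 0, 0, 0}.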
C++ for OpenCL language
In 2020, Khronos announced[25] the transition to the community-driven C++ for OpenCL programming language,[26] which combines features from C++17 with the traditional OpenCL C features. The language lets developers use a rich variety of standard C++ features while preserving backward compatibility with OpenCL C. This gives OpenCL kernel code developers a smooth transition path to C++ functionality. They can continue using a familiar programming flow and even familiar tools, and can use the existing extensions and libraries available for OpenCL C.
The language semantics are described in the documentation published in the releases of the OpenCL-Docs[27] repository hosted by the Khronos Group, but the language is currently not ratified by the Khronos Group. The C++ for OpenCL language is not documented in a stand-alone document; it is based on the specifications of C++ and OpenCL C. The open source Clang compiler has supported C++ for OpenCL since release 9.[28]
C++ for OpenCL was originally developed as a Clang compiler extension and appeared in release 9.[29] It was tightly coupled with OpenCL C and did not contain any Clang-specific functionality. For this reason, its documentation has been re-hosted to the OpenCL-Docs repository[27] from the Khronos Group, along with the sources of other specifications and reference cards. The first official release of this document, describing C++ for OpenCL version 1.0, was published in December 2020.[30] C++ for OpenCL 1.0 contains features from C++17 and is backward compatible with OpenCL C 2.0. A work-in-progress draft of its documentation can be found on the Khronos website.[31]
Features
C++ for OpenCL supports most of the features (syntactically and semantically) from OpenCL C except for nested parallelism and blocks.[32] However, there are minor differences in some supported features, mainly related to differences in semantics between C++ and C. For example, C++ is more strict with implicit type conversions, and it does not support the restrict type qualifier.[32] The following C++ features are not supported by C++ for OpenCL: virtual functions, the dynamic_cast operator, non-placement new/delete operators, exceptions, pointers to member functions, references to functions, and the C++ standard libraries.[32] C++ for OpenCL extends the concept of separate memory regions (address spaces) from OpenCL C to C++ features: functional casts, templates, class members, references, lambda functions, and operators. Most C++ features are not available for the kernel functions themselves, e.g. overloading or templating, or arbitrary class layout in parameter types.[32]
Example: complex-number arithmetic
The following code snippet illustrates how kernels with complex-number arithmetic can be implemented in the C++ for OpenCL language with convenient use of C++ features.
// Define a class complex_t that can perform complex-number computations with
// various precision when different types for T are used - double, float, half.
template<typename T>
class complex_t {
    T m_re; // Real component.
    T m_im; // Imaginary component.
public:
    complex_t(T re, T im): m_re{re}, m_im{im} {}
    // Define operator for complex-number multiplication.
    complex_t operator*(const complex_t &other) const
    {
        return {m_re * other.m_re - m_im * other.m_im,
                m_re * other.m_im + m_im * other.m_re};
    }
    T get_re() const { return m_re; }
    T get_im() const { return m_im; }
};
// A helper function to compute multiplication over complex numbers read from
// the input buffer and to store the computed result into the output buffer.
template<typename T>
void compute_helper(__global T *in, __global T *out) {
    auto idx = get_global_id(0);
    // Every work-item uses 4 consecutive items from the input buffer
    // - two for each complex number.
    auto offset = idx * 4;
    auto num1 = complex_t{in[offset], in[offset + 1]};
    auto num2 = complex_t{in[offset + 2], in[offset + 3]};
    // Perform complex-number multiplication.
    auto res = num1 * num2;
    // Every work-item writes 2 consecutive items to the output buffer.
    out[idx * 2] = res.get_re();
    out[idx * 2 + 1] = res.get_im();
}
// This kernel is used for complex-number multiplication in single precision.
__kernel void compute_sp(__global float *in, __global float *out) {
    compute_helper(in, out);
}
#ifdef cl_khr_fp16
// This kernel is used for complex-number multiplication in half precision when
// it is supported by the device.
#pragma OPENCL EXTENSION cl_khr_fp16 : enable
__kernel void compute_hp(__global half *in, __global half *out) {
    compute_helper(in, out);
}
#endif
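When testing such a kernel, the host needs expected values to compare against. The following plain-C sketch recomputes what compute_sp produces, following the buffer layout described in the kernel's comments: four input floats and two output floats per work-item. compute_sp_ref is a hypothetical name, and the loop over idx stands in for the work-items.

```c
#include <stddef.h>

/* Host-side reference for compute_sp: work-item idx reads in[4*idx .. 4*idx+3]
 * as two complex numbers (re, im, re, im) and writes their product to
 * out[2*idx] (real part) and out[2*idx+1] (imaginary part). */
static void compute_sp_ref(const float *in, float *out, size_t num_work_items)
{
    for (size_t idx = 0; idx < num_work_items; idx++) {
        float ar = in[idx * 4],     ai = in[idx * 4 + 1];
        float br = in[idx * 4 + 2], bi = in[idx * 4 + 3];
        /* (ar + i*ai)(br + i*bi) = (ar*br - ai*bi) + i(ar*bi + ai*br) */
        out[idx * 2]     = ar * br - ai * bi;
        out[idx * 2 + 1] = ar * bi + ai * br;
    }
}
```

For the input {1, 2, 3, 4}, the single work-item computes (1 + 2i)(3 + 4i) = -5 + 10i and writes {-5, 10}.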
Tooling and Execution Environment
The C++ for OpenCL language can be used for the same applications or libraries, and in the same way, as the OpenCL C language. Thanks to the rich variety of C++ language features, applications written in C++ for OpenCL can express complex functionality more conveniently than applications written in OpenCL C. The generic programming paradigm from C++ is particularly attractive to library developers.
C++ for OpenCL sources can be compiled by OpenCL drivers that support the cl_ext_cxx_for_opencl extension.[33] Arm announced support for this extension in December 2020.[34] However, the algorithms accelerated on OpenCL devices are becoming more complex. More applications are therefore expected to compile C++ for OpenCL kernels offline, using stand-alone compilers such as Clang,[35] into an executable binary format or a portable binary format, e.g. SPIR-V.[36] Such an executable can be loaded during the execution of an OpenCL application using a dedicated OpenCL API.[37]
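A host program may want to sanity-check an offline-compiled SPIR-V module before passing it to the runtime, for example with clCreateProgramWithIL (available since OpenCL 2.1). The following C sketch relies only on the SPIR-V header layout. The header is five words long. It starts with the magic number 0x07230203, followed by a version word that encodes the major version in bits 16-23 and the minor version in bits 8-15. The sketch assumes the module has already been read into an array of 32-bit words in host byte order. spirv_version is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

#define SPIRV_MAGIC 0x07230203u

/* Returns the SPIR-V version of a module as major * 100 + minor (e.g. 101 for
 * SPIR-V 1.1), or -1 if the words do not start with a valid header. Assumes
 * the words are already in host byte order; a byte-swapped magic number
 * (0x03022307) would indicate the opposite endianness and is not handled. */
static int spirv_version(const uint32_t *words, size_t nwords)
{
    if (nwords < 5 || words[0] != SPIRV_MAGIC)
        return -1;
    uint32_t major = (words[1] >> 16) & 0xFFu;
    uint32_t minor = (words[1] >> 8) & 0xFFu;
    return (int)(major * 100u + minor);
}
```

A check like this catches truncated or wrongly loaded files early. The runtime still performs its own validation when the module is ingested.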
Binaries compiled from sources in C++ for OpenCL 1.0 can be executed on OpenCL 2.0 conformant devices. Depending on the language features used in such kernel sources, they can also be executed on devices supporting earlier OpenCL versions or OpenCL 3.0.
Aside from OpenCL drivers, kernels written in C++ for OpenCL can be compiled for execution on Vulkan devices using the clspv[38] compiler and the clvk[39] runtime layer, in the same way as OpenCL C kernels.
Contributions
C++ for OpenCL is an open language developed by the community of contributors listed in its documentation.[31] New contributions to the language semantic definition or to open source tooling support are accepted from anyone interested. Contributions must align with the main design philosophy and be reviewed and approved by experienced contributors.[19]
History
OpenCL was initially developed by Apple Inc., which holds trademark rights, and refined into an initial proposal in collaboration with technical teams at AMD, IBM, Qualcomm, Intel, and Nvidia. Apple submitted this initial proposal to the Khronos Group. On June 16, 2008, the Khronos Compute Working Group was formed[40] with representatives from CPU, GPU, embedded-processor, and software companies. This group worked for five months to finish the technical details of the specification for OpenCL 1.0 by November 18, 2008.[41] This technical specification was reviewed by the Khronos members and approved for public release on December 8, 2008.[42]
OpenCL 1.0
OpenCL 1.0 was released with Mac OS X Snow Leopard on August 28, 2009. According to an Apple press release:[43]
Snow Leopard further extends support for modern hardware with Open Computing Language (OpenCL), which lets any application tap into the vast gigaflops of GPU computing power previously available only to graphics applications. OpenCL is based on the C programming language and has been proposed as an open standard.
AMD decided to support OpenCL instead of the now deprecated Close to Metal in its Stream framework.[44][45] RapidMind announced their adoption of OpenCL underneath their development platform to support GPUs from multiple vendors with one interface.[46] On December 9, 2008, Nvidia announced its intention to add full support for the OpenCL 1.0 specification to its GPU Computing Toolkit.[47] On October 30, 2009, IBM released its first OpenCL implementation as a part of the XL compilers.[48]
For some calculations, OpenCL on graphics cards can achieve speedups of up to a factor of 1000 over a conventional CPU.[49]
Some important features of later versions of OpenCL, such as double- or half-precision operations, are optional in 1.0.[50]
OpenCL 1.1
OpenCL 1.1 was ratified by the Khronos Group on June 14, 2010[51] and adds significant functionality for enhanced parallel programming flexibility, functionality, and performance, including:
New data types including 3-component vectors and additional image formats;
Handling commands from multiple host threads and processing buffers across multiple devices;
Operations on regions of a buffer including read, write and copy of 1D, 2D, or 3D rectangular regions;
Enhanced use of events to drive and control command execution;
Additional OpenCL built-in C functions such as integer clamp, shuffle, and asynchronous strided copies;
Improved OpenGL interoperability through efficient sharing of images and buffers by linking OpenCL and OpenGL events.
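The rectangular-region operations are exposed in the host API as clEnqueueReadBufferRect, clEnqueueWriteBufferRect and clEnqueueCopyBufferRect. They address a buffer through row and slice pitches. The following host-only C sketch reproduces that addressing for a copy between two byte arrays; copy_rect and rect_offset are hypothetical helpers. As in the API, origin[0] and region[0] are given in bytes, while the other components count rows and slices.

```c
#include <stddef.h>
#include <string.h>

/* Byte offset of an (x, y, z) origin in a pitched buffer:
 * origin[2] * slice_pitch + origin[1] * row_pitch + origin[0]. */
static size_t rect_offset(const size_t origin[3], size_t row_pitch, size_t slice_pitch)
{
    return origin[2] * slice_pitch + origin[1] * row_pitch + origin[0];
}

/* Copy a region of region[0] bytes x region[1] rows x region[2] slices
 * from src to dst, each array having its own origin and pitches. */
static void copy_rect(unsigned char *dst, const size_t dst_origin[3],
                      size_t dst_row_pitch, size_t dst_slice_pitch,
                      const unsigned char *src, const size_t src_origin[3],
                      size_t src_row_pitch, size_t src_slice_pitch,
                      const size_t region[3])
{
    size_t s0 = rect_offset(src_origin, src_row_pitch, src_slice_pitch);
    size_t d0 = rect_offset(dst_origin, dst_row_pitch, dst_slice_pitch);
    for (size_t z = 0; z < region[2]; z++)
        for (size_t y = 0; y < region[1]; y++)
            memcpy(dst + d0 + z * dst_slice_pitch + y * dst_row_pitch,
                   src + s0 + z * src_slice_pitch + y * src_row_pitch,
                   region[0]);
}
```

For example, copying the 2×2 block at origin (1, 1, 0) of a 4×4 byte image holding 0..15 (row pitch 4) into a packed 2×2 array yields {5, 6, 9, 10}.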
OpenCL 1.2
On November 15, 2011, the Khronos Group announced the OpenCL 1.2 specification,[52] which added significant functionality over the previous versions in terms of performance and features for parallel programming. The most notable features include:
Device partitioning: the ability to partition a device into sub-devices so that work assignments can be allocated to individual compute units. This is useful for reserving areas of the device to reduce latency for time-critical tasks.
Separate compilation and linking of objects: the functionality to compile OpenCL into external libraries for inclusion into other programs.
Enhanced image support (optional): 1.2 adds support for 1D images and 1D/2D image arrays. Furthermore, the OpenGL sharing extensions now allow OpenGL 1D textures and 1D/2D texture arrays to be used to create OpenCL images.
Built-in kernels: custom devices that contain specific unique functionality are now integrated more closely into the OpenCL framework. Kernels can be called to use specialised or non-programmable aspects of the underlying hardware. Examples include video encoding/decoding and digital signal processors.
DirectX functionality: DX9 media surface sharing allows efficient sharing between OpenCL and DX9 or DXVA media surfaces. Similarly, seamless sharing between OpenCL and DX11 surfaces is enabled for DX11.
The ability to force IEEE 754 compliance for single-precision floating-point math: by default, OpenCL allows the single-precision versions of the division, reciprocal, and square root operations to be less accurate than the correctly rounded values that IEEE 754 requires.[53] The programmer can pass the "-cl-fp32-correctly-rounded-divide-sqrt" command line argument to the compiler. If the OpenCL implementation supports this, these three operations are then computed to IEEE 754 requirements. If it does not support computing them to their correctly rounded values as defined by the IEEE 754 specification, the program fails to compile.[53] This ability is supplemented by the ability to query the OpenCL implementation to determine whether it can perform these operations to IEEE 754 accuracy.[53]
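The accuracy of these operations is specified in units in the last place (ulp), so a host-side test that compares device results with correctly rounded reference values needs an ulp distance. The C sketch below computes it from the IEEE 754 bit patterns of two finite single-precision floats of the same sign. For such floats, adjacent representable values differ by exactly one in their bit patterns. ulp_distance is a hypothetical helper.

For the switch itself, the device query is clGetDeviceInfo with CL_DEVICE_SINGLE_FP_CONFIG, checking the CL_FP_CORRECTLY_ROUNDED_DIVIDE_SQRT bit. The build option is passed in the options string of clBuildProgram.

```c
#include <stdint.h>
#include <string.h>

/* Distance in units in the last place between two finite floats of the SAME
 * sign, computed from their IEEE 754 bit patterns. Mixed signs, NaN and
 * infinity are not handled by this sketch. */
static uint32_t ulp_distance(float a, float b)
{
    uint32_t ua, ub;
    memcpy(&ua, &a, sizeof ua); /* reinterpret the bits without aliasing issues */
    memcpy(&ub, &b, sizeof ub);
    return ua > ub ? ua - ub : ub - ua;
}
```

For example, ulp_distance(1.0f, 2.0f) is 8388608 (2^23), because a single-precision binade contains 2^23 representable values.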
OpenCL 2.0
On November 18, 2013, the Khronos Group announced the ratification and public release of the finalized OpenCL 2.0 specification.[54] Updates and additions to OpenCL 2.0 include:
Shared virtual memory
Nested parallelism
Generic address space
Images (optional, including 3D images)
C11 atomics
Pipes
Android installable client driver extension
Half precision extended with the optional cl_khr_fp16 extension
cl_double: double precision IEEE 754 (optional)
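Pipe semantics can be modeled in a few lines of C. In OpenCL C 2.0, write_pipe returns 0 on success and a negative value when the pipe is full. Likewise, read_pipe returns 0 on success and a negative value when the pipe is empty. The single-threaded ring buffer below follows that convention. The type int_pipe, the functions pipe_write and pipe_read, and the capacity PIPE_PACKETS are hypothetical. A real pipe is a memory object shared by concurrently running kernels, created on the host with clCreatePipe, which takes the packet size and maximum packet count.

```c
#include <stddef.h>

#define PIPE_PACKETS 4 /* hypothetical capacity, fixed at compile time */

/* Single-threaded model of a pipe: a FIFO of int packets in a ring buffer. */
typedef struct {
    int packets[PIPE_PACKETS];
    size_t head;  /* index of the oldest packet */
    size_t count; /* number of packets currently stored */
} int_pipe;

/* Append one packet; returns 0 on success, -1 if the pipe is full. */
static int pipe_write(int_pipe *p, const int *pkt)
{
    if (p->count == PIPE_PACKETS)
        return -1;
    p->packets[(p->head + p->count) % PIPE_PACKETS] = *pkt;
    p->count++;
    return 0;
}

/* Remove the oldest packet; returns 0 on success, -1 if the pipe is empty. */
static int pipe_read(int_pipe *p, int *pkt)
{
    if (p->count == 0)
        return -1;
    *pkt = p->packets[p->head];
    p->head = (p->head + 1) % PIPE_PACKETS;
    p->count--;
    return 0;
}
```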
OpenCL 2.1
The ratification and release of the OpenCL 2.1 provisional specification was announced on March 3, 2015 at the Game Developers Conference in San Francisco. The final specification was released on November 16, 2015.[55] It introduced the OpenCL C++ kernel language, based on a subset of C++14, while maintaining support for the preexisting OpenCL C kernel language. Vulkan and OpenCL 2.1 share SPIR-V as an intermediate representation, allowing high-level language front-ends to share a common compilation target. Updates to the OpenCL API include:
Additional subgroup functionality
Copying of kernel objects and states
Low-latency device timer queries
Ingestion of SPIR-V code by the runtime
Execution priority hints for queues
Zero-sized dispatches from the host
AMD, ARM, Intel, HPC, and YetiWare have declared support for OpenCL 2.1.[56][57]
OpenCL 2.2
OpenCL 2.2 brings the OpenCL C++ kernel language into the core specification for significantly enhanced parallel programming productivity.[58][59][60] It was released on May 16, 2017.[61] A maintenance update with bug fixes was released in May 2018.[62]
The OpenCL C++ kernel language is a static subset of the C++14 standard and includes classes, templates, lambda expressions, function overloads and many other constructs for generic and meta-programming.
It uses the new Khronos SPIR-V 1.1 intermediate language, which fully supports the OpenCL C++ kernel language.
OpenCL library functions can now use the C++ language to provide increased safety and reduced undefined behavior while accessing features such as atomics, iterators, images, samplers, pipes, and device queue built-in types and address spaces.
Pipe storage is a new device-side type in OpenCL 2.2. It is useful for FPGA implementations because it makes connectivity size and type known at compile time, enabling efficient device-scope communication between kernels.
OpenCL 2.2 also includes features for enhanced optimization of generated code:
- applications can provide the value of specialization constants at SPIR-V compilation time;
- a new query can detect non-trivial constructors and destructors of program-scope global objects;
- user callbacks can be set at program release time.
It runs on any OpenCL 2.0-capable hardware (only a driver update is required).
OpenCL 3.0
The OpenCL 3.0 specification was released on September 30, 2020, after being in preview since April 2020. OpenCL 1.2 functionality has become a mandatory baseline, while all OpenCL 2.x and OpenCL 3.0 features were made optional. The specification retains the OpenCL C language and deprecates the OpenCL C++ Kernel Language. The replacement is the C++ for OpenCL language,[19] based on a Clang/LLVM compiler that implements a subset of C++17 and generates SPIR-V intermediate code.[63][64][65]
Version 3.0.7, with C++ for OpenCL and some Khronos OpenCL extensions, was presented at IWOCL 21.[66] The current version is 3.0.11, with some new extensions and corrections.
NVIDIA, working closely with the Khronos OpenCL Working Group, improved Vulkan interop with semaphores and memory sharing.[67]
Roadmap
The International Workshop on OpenCL (IWOCL) held by the Khronos Group
When releasing OpenCL 2.2, the Khronos Group announced that OpenCL would converge where possible with Vulkan to enable OpenCL software deployment flexibility over both APIs.[68][69] Adobe's Premiere Rush has since demonstrated this by using the clspv[38] open source compiler to compile significant amounts of OpenCL C kernel code to run on a Vulkan runtime for deployment on Android.[70] OpenCL also has a forward-looking roadmap independent of Vulkan, with 'OpenCL Next' under development and targeting release in 2020. OpenCL Next may integrate extensions such as Vulkan/OpenCL Interop, Scratch-Pad Memory Management, Extended Subgroups, SPIR-V 1.4 ingestion and SPIR-V Extended debug info. OpenCL is also considering a Vulkan-like loader and layers and a ‘Flexible Profile’ for deployment flexibility on multiple accelerator types.[71]
Open source implementations
clinfo, a command-line tool to see OpenCL information
OpenCL consists of a set of headers and a shared object that is loaded at runtime. An installable client driver (ICD) must be installed on the platform for every class of vendor that the runtime needs to support. For example, to support Nvidia devices on a Linux platform, the Nvidia ICD must be installed so that the OpenCL runtime (the ICD loader) can locate the ICD for the vendor and redirect the calls appropriately. The consumer application uses the standard OpenCL header; the OpenCL runtime then proxies each function call to the appropriate driver using the ICD. Each vendor must implement each OpenCL call in their driver.[72]
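The forwarding described above can be pictured with a small, self-contained C model. Everything in it is hypothetical: the vendor_dispatch struct, the vendor names and the loader_get_device_count function. In the real mechanism, the loader discovers installed ICDs at run time and routes each call through the dispatch table associated with the OpenCL object it receives. The underlying idea of a per-vendor table of function pointers is roughly the same.

```c
#include <string.h>

/* Hypothetical model of an ICD loader: each installed vendor driver exposes
 * a table of function pointers, and the loader forwards an API call to the
 * table of the driver that owns the target platform. These structures are
 * illustrative only and do not match the real Khronos ICD interfaces. */
typedef struct {
    const char *vendor;
    int (*get_device_count)(void);
} vendor_dispatch;

static int vendor_a_device_count(void) { return 2; } /* e.g. a GPU driver */
static int vendor_b_device_count(void) { return 1; } /* e.g. a CPU driver */

static const vendor_dispatch installed_icds[] = {
    {"VendorA", vendor_a_device_count},
    {"VendorB", vendor_b_device_count},
};

/* Loader entry point: find the vendor's dispatch table and forward the call.
 * Returns -1 when no ICD for that vendor is installed. */
static int loader_get_device_count(const char *vendor)
{
    size_t n = sizeof installed_icds / sizeof installed_icds[0];
    for (size_t k = 0; k < n; k++)
        if (strcmp(installed_icds[k].vendor, vendor) == 0)
            return installed_icds[k].get_device_count();
    return -1;
}
```

The application only ever calls the loader's entry point. Adding support for another vendor means installing another table, with no change to the application.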
The Apple,[73] Nvidia,[74] ROCm, RapidMind[75] and Gallium3D[76] implementations of OpenCL are all based on the LLVM Compiler technology and use the Clang compiler as their frontend.
MESA Gallium Compute
An implementation of OpenCL for a number of platforms is maintained as part of the Gallium Compute Project.[77] It currently supports OpenCL 1.1, incompletely, and is mostly done for AMD Radeon GCN. The project builds on the work of the Mesa project to support multiple platforms. It was formerly known as CLOVER.[78] Current development mostly consists of support for running the incomplete framework with current LLVM and Clang, plus some new features such as fp16 in version 17.3.[79] The goal is complete OpenCL 1.0, 1.1 and 1.2 support for AMD and Nvidia. New basic development, including SPIR-V support for Clover, is done by Red Hat.[80][81] The new target is a modular OpenCL 3.0 with full support of OpenCL 1.2. The current state is available in the Mesa matrix. Image support is a current focus of development.
RustiCL is a new implementation for Gallium compute, written in Rust instead of C with the aim of better code. An experimental version with OpenCL 3.0 support and an implementation of the image extensions, for programs such as Darktable, is to be available in Mesa 22.2.[82]
BEIGNET
An implementation by Intel for its Ivy Bridge and later hardware was released in 2013.[83] This software from Intel's China Team has attracted criticism from developers at AMD and Red Hat,[84] as well as from Michael Larabel of Phoronix.[85] The current version, 1.3.2, fully supports OpenCL 1.2 (Ivy Bridge and later) and optionally supports OpenCL 2.0 on Skylake and newer.[86][87] Support for Android has been added to Beignet.[88] Current development targets only support for 1.2 and 2.0; work toward OpenCL 2.1, 2.2 and 3.0 has moved to NEO.
NEO
An implementation by Intel for Gen 8 (Broadwell) and Gen 9 hardware, released in 2018.[89] This driver replaces the Beignet implementation on supported platforms, but not on older hardware from Gen 6 through Haswell. NEO provides OpenCL 2.1 support on Core platforms and OpenCL 1.2 on Atom platforms.[90] As of 2020, Gen 11 (Ice Lake) and Gen 12 (Tiger Lake) graphics are also supported. OpenCL 3.0 is available for Alder Lake, and for Tiger Lake back to Broadwell, with version 20.41 and later. It now includes the optional OpenCL 2.0 and 2.1 features in full, and some of the 2.2 features.
ROCm
Created as part of AMD's GPUOpen, ROCm (Radeon Open Compute) is an open source Linux project built on OpenCL 1.2 with language support for 2.0. The system is compatible with all modern AMD CPUs and APUs (currently partly GFX 7, GFX 8 and 9), as well as Intel Gen7.5+ CPUs (only with PCI 3.0).[91][92] With version 1.9, support was extended experimentally in some respects to hardware with PCIe 2.0 and without atomics. An overview of current work was given at XDC2018.[93][94] ROCm version 2.0 supports full OpenCL 2.0, but some errors and limitations are still on the to-do list.[95][96] Version 3.3 brings improvements in details.[97] Version 3.5 supports OpenCL 2.2.[98] Version 3.10 brought improvements and new APIs.[99] ROCm 4.0, announced at SC20, supports the AMD Instinct MI100 compute card.[100] Current documentation for 5.1.1 and earlier is available on GitHub.[101][102] OpenCL 3.0 is available.
POCL
A portable implementation supporting CPUs and some GPUs (via CUDA and HSA), building on Clang and LLVM.[103] With version 1.0, OpenCL 1.2 was nearly fully implemented along with some 2.x features.[104] Version 1.2 works with LLVM/Clang 6.0 and 7.0 and has full OpenCL 1.2 support, with all tickets in Milestone 1.2 closed.[104][105] OpenCL 2.0 is nearly fully implemented.[106] Version 1.3 supports Mac OS X.[107] Version 1.4 includes support for LLVM 8.0 and 9.0.[108] Version 1.5 implements LLVM/Clang 10 support.[109] Version 1.6 implements LLVM/Clang 11 support and CUDA acceleration.[110] Current targets are complete OpenCL 2.x, OpenCL 3.0 and improved performance. With manual optimization, POCL 1.6 performs at the same level as the Intel compute runtime.[111] Version 1.7 implements LLVM/Clang 12 support and some new OpenCL 3.0 features.[112] Version 1.8 implements LLVM/Clang 13 support.[113] Version 3.0 implements OpenCL 3.0 at the minimum level and LLVM/Clang 14.[114]
Shamrock
A port of Mesa Clover for ARM with full support of OpenCL 1.2;[115][116] there is no current development toward 2.0.
FreeOCL
A CPU-focused implementation of OpenCL 1.2 that implements an external compiler to create a more reliable platform;[117] it is no longer under active development.
MOCL
An OpenCL implementation based on POCL was released in 2018 by the NUDT researchers for Matrix-2000. The Matrix-2000 architecture is designed to replace the Intel Xeon Phi accelerators of the TianHe-2 supercomputer. This programming framework is built on top of LLVM v5.0 and also reuses some code from POCL. To unlock the hardware potential, the device runtime uses a push-based task dispatching strategy, and the performance of the kernel atomics is improved significantly. This framework has been deployed on the TH-2A system and is readily available to the public.[118] Some of the software will next be ported to improve POCL.[104]
VC4CL
An OpenCL 1.2 implementation for the VideoCore IV (BCM2763) processor used in the Raspberry Pi before its model 4.[119]
Vendorimplementations[edit]
Timelineofvendorimplementations[edit]
June,2008:DuringApple’sWWDCconferenceanearlybetaofMacOSXSnowLeopardwasmadeavailabletotheparticipants,itincludedthefirstbetaimplementationofOpenCL,about6monthsbeforethefinalversion1.0specificationwasratifiedlate2008.Theyalsoshowedtwodemos.Onewasagridof8x8screensrendered,eachdisplayingthescreenofanemulatedAppleIImachine—64independentinstancesintotal,eachrunningafamouskarategame.Thisshowedtaskparallelism,ontheCPU.TheotherdemowasaN-bodysimulationrunningontheGPUofaMacPro,adataparalleltask.
December10,2008:AMDandNvidiaheldthefirstpublicOpenCLdemonstration,a75-minutepresentationatSIGGRAPHAsia2008.AMDshowedaCPU-acceleratedOpenCLdemoexplainingthescalabilityofOpenCLononeormorecoreswhileNvidiashowedaGPU-accelerateddemo.[120][121]
March16,2009:atthe4thMulticoreExpo,ImaginationTechnologiesannouncedthePowerVRSGX543MP,thefirstGPUofthiscompanytofeatureOpenCLsupport.[122]
March26,2009:atGDC2009,AMDandHavokdemonstratedthefirstworkingimplementationforOpenCLacceleratingHavokClothonAMDRadeonHD4000seriesGPU.[123]
April20,2009:NvidiaannouncedthereleaseofitsOpenCLdriverandSDKtodevelopersparticipatinginitsOpenCLEarlyAccessProgram.[124]
August5,2009:AMDunveiledthefirstdevelopmenttoolsforitsOpenCLplatformaspartofitsATIStreamSDKv2.0BetaProgram.[125]
August28,2009:ApplereleasedMacOSXSnowLeopard,whichcontainsafullimplementationofOpenCL.[126]
September28,2009:NvidiareleaseditsownOpenCLdriversandSDKimplementation.
October13,2009:AMDreleasedthefourthbetaoftheATIStreamSDK2.0,whichprovidesacompleteOpenCLimplementationonbothR700/R800GPUsandSSE3capableCPUs.TheSDKisavailableforbothLinuxandWindows.[127]
November26,2009:NvidiareleaseddriversforOpenCL1.0(rev48).
October27,2009:S3releasedtheirfirstproductsupportingnativeOpenCL1.0–theChrome5400Eembeddedgraphicsprocessor.[128]
December 10, 2009: VIA released their first product supporting OpenCL 1.0, the ChromotionHD 2.0 video processor included in the VN1000 chipset.[129]
December 21, 2009: AMD released the production version of the ATI Stream SDK 2.0,[130] which provides OpenCL 1.0 support for R800 GPUs and beta support for R700 GPUs.
June 1, 2010: ZiiLABS released details of their first OpenCL implementation for the ZMS processor for handheld, embedded and digital home products.[131]
June 30, 2010: IBM released a fully conformant version of OpenCL 1.0.[4]
September 13, 2010: Intel released details of their first OpenCL implementation for the Sandy Bridge chip architecture. Sandy Bridge integrates what was then Intel's newest graphics chip technology directly onto the central processing unit.[132]
November 15, 2010: Wolfram Research released Mathematica 8 with the OpenCLLink package.
March 3, 2011: Khronos Group announced the formation of the WebCL working group to explore defining a JavaScript binding to OpenCL. This created the potential to harness GPU and multi-core CPU parallel processing from a Web browser.[133][134]
March 31, 2011: IBM released a fully conformant version of OpenCL 1.1.[4][135]
April 25, 2011: IBM released OpenCL Common Runtime v0.1 for Linux on x86 Architecture.[136]
May 4, 2011: Nokia Research released an open source WebCL extension for the Firefox web browser, providing a JavaScript binding to OpenCL.[137]
July 1, 2011: Samsung Electronics released an open source prototype implementation of WebCL for WebKit, providing a JavaScript binding to OpenCL.[138]
August 8, 2011: AMD released the OpenCL-driven AMD Accelerated Parallel Processing (APP) Software Development Kit (SDK) v2.5, replacing the ATI Stream SDK as technology and concept.[139]
December 12, 2011: AMD released AMD APP SDK v2.6,[140] which contains a preview of OpenCL 1.2.
February 27, 2012: The Portland Group released the PGI OpenCL compiler for multi-core ARM CPUs.[141]
April 17, 2012: Khronos released a WebCL working draft.[142]
May 6, 2013: Altera released the Altera SDK for OpenCL, version 13.0.[143] It is conformant to OpenCL 1.0.[144]
November 18, 2013: Khronos announced that the specification for OpenCL 2.0 had been finalized.[145]
March 19, 2014: Khronos released the WebCL 1.0 specification.[146][147]
August 29, 2014: Intel released an HD Graphics 5300 driver that supports OpenCL 2.0.[148]
September 25, 2014: AMD released Catalyst 14.41 RC1, which includes an OpenCL 2.0 driver.[149]
January 14, 2015: Xilinx Inc. announced that the SDAccel development environment for OpenCL, C, and C++ had achieved Khronos conformance.[150]
April 13, 2015: Nvidia released WHQL driver v350.12, which includes OpenCL 1.2 support for GPUs based on Kepler or later architectures.[151] Drivers 340+ support OpenCL 1.1 for Tesla and Fermi.
August 26, 2015: AMD released AMD APP SDK v3.0,[152] which contains full support for OpenCL 2.0 and sample code.
November 16, 2015: Khronos announced that the specification for OpenCL 2.1 had been finalized.[153]
April 18, 2016: Khronos announced that the specification for OpenCL 2.2 had been provisionally finalized.[59]
November 3, 2016: Intel added support for OpenCL 2.1 on Gen7+ in SDK 2016 r3.[154]
February 17, 2017: Nvidia began evaluation support of OpenCL 2.0 with driver 378.66.[155][156][157]
May 16, 2017: Khronos announced that the specification for OpenCL 2.2 had been finalized with SPIR-V 1.2.[158]
May 14, 2018: Khronos announced a maintenance update for OpenCL 2.2 with bug fixes and unified headers.[62]
April 27, 2020: Khronos announced the provisional version of OpenCL 3.0.
June 1, 2020: Intel released the NEO runtime with OpenCL 3.0 for the new Tiger Lake processors.
June 3, 2020: AMD announced ROCm 3.5 with OpenCL 2.2 support.[159]
September 30, 2020: Khronos announced that the specifications for OpenCL 3.0 had been finalized (CTS also available).
October 16, 2020: Intel announced support for OpenCL 3.0 with NEO 20.41 (including most of the optional OpenCL 2.x features).
April 6, 2021: Nvidia added OpenCL 3.0 support for Ampere. Maxwell and later GPUs also support OpenCL 3.0 with Nvidia driver 465+.[160]
Devices
As of 2016, OpenCL runs on graphics processing units (GPUs), CPUs with SIMD instructions, FPGAs, Movidius Myriad 2, Adapteva Epiphany and DSPs.
Khronos Conformance Test Suite
To be officially conformant, an implementation must pass the Khronos Conformance Test Suite (CTS), with results being submitted to the Khronos Adopters Program.[161] The Khronos CTS code for all OpenCL versions has been available as open source since 2017.[162]
Conformant products
The Khronos Group maintains an extended list of OpenCL-conformant products.[4]
Synopsis of OpenCL conformant products[4]
AMD SDKs (support OpenCL on CPU and accelerated processing unit devices; GPU: TeraScale 1: OpenCL 1.1, TeraScale 2: 1.2, GCN 1: 1.2+, GCN 2+: 2.0+)
X86 + SSE2 (or higher) compatible CPUs, 64-bit & 32-bit,[163] Linux 2.6 PC, Windows Vista/7/8.x/10 PC
AMD Fusion E-350, E-240, C-50, C-30 with HD 6310/HD 6250
AMD Radeon/Mobility HD 6800, HD 5x00 series GPU, iGPU HD 6310/HD 6250, HD 7xxx, HD 8xxx, R2xx, R3xx, RX 4xx, RX 5xx, Vega Series
AMD FirePro Vx800 series GPU and later, Radeon Pro
Intel SDK for OpenCL Applications 2013[164] (supports Intel Core processors and Intel HD Graphics 4000/2500); 2017 R2 with OpenCL 2.1 (Gen7+); SDK 2019 removed OpenCL 2.1;[165] current SDK: 2020 update 3
Intel CPUs with SSE4.1, SSE4.2 or AVX support.[166][167] Microsoft Windows, Linux
Intel Core i7, i5, i3; 2nd Generation Intel Core i7/5/3, 3rd Generation Intel Core Processors with Intel HD Graphics 4000/2500 and newer
Intel Core 2 Solo, Duo, Quad, Extreme and newer
Intel Xeon 7x00, 5x00, 3x00 (Core based) and newer
IBM Servers with OpenCL Development Kit for Linux on Power running on Power VSX[168][169]
IBM Power 775 (PERCS), 750
IBM BladeCenter PS70x Express
IBM BladeCenter JS2x, JS43
IBM BladeCenter QS22
IBM OpenCL Common Runtime (OCR)[170]
X86 + SSE2 (or higher) compatible CPUs, 64-bit & 32-bit;[171] Linux 2.6 PC
AMD Fusion, Nvidia Ion and Intel Core i7, i5, i3; 2nd Generation Intel Core i7/5/3
AMD Radeon, Nvidia GeForce and Intel Core 2 Solo, Duo, Quad, Extreme
ATI FirePro, Nvidia Quadro and Intel Xeon 7x00, 5x00, 3x00 (Core based)
Nvidia OpenCL Driver and Tools,[172] chips: Tesla, Fermi: OpenCL 1.1 (driver 340+); Kepler, Maxwell, Pascal, Volta, Turing: OpenCL 1.2 (driver 370+), OpenCL 2.0 beta (378.66); OpenCL 3.0: Maxwell to Ampere (driver 465+)
Nvidia Tesla C/D/S
Nvidia GeForce GTS/GT/GTX
Nvidia Ion
Nvidia Quadro FX/NVX/Plex, Quadro, Quadro K, Quadro M, Quadro P, Quadro with Volta, Quadro RTX with Turing, Ampere
All standard-conformant implementations can be queried using one of the clinfo tools (there are multiple tools with the same name and a similar feature set).[173][174][175]
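Among other things, the clinfo tools print the version string that each platform and device reports. The OpenCL specification defines this string as "OpenCL", a space, the major.minor version, a space, and then vendor-specific information. The Python sketch below assumes such a string has already been obtained, for example copied from clinfo output or returned by a binding for the CL_DEVICE_VERSION query. The helper names parse_cl_version and supports_at_least are hypothetical and not part of any OpenCL API, and the sample vendor suffixes are invented.

```python
import re
from typing import NamedTuple

# Documented layout: "OpenCL<space><major>.<minor><space><vendor-specific information>"
_VERSION_RE = re.compile(r"OpenCL (\d+)\.(\d+)(?: (.*))?")


class CLVersion(NamedTuple):
    major: int
    minor: int
    vendor_info: str


def parse_cl_version(text: str) -> CLVersion:
    """Split a platform/device version string into its numeric and vendor parts."""
    match = _VERSION_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not an OpenCL version string: {text!r}")
    major, minor, info = match.groups()
    return CLVersion(int(major), int(minor), info or "")


def supports_at_least(text: str, major: int, minor: int) -> bool:
    """True if the reported version is >= major.minor (integer tuple comparison)."""
    version = parse_cl_version(text)
    return (version.major, version.minor) >= (major, minor)


if __name__ == "__main__":
    # Sample strings in the documented format; the vendor suffixes are invented.
    for reported in ["OpenCL 3.0 CUDA 11.2.162", "OpenCL 2.1 NEO", "OpenCL 1.2"]:
        v = parse_cl_version(reported)
        print(v.major, v.minor, repr(v.vendor_info), supports_at_least(reported, 2, 0))
```

Comparing (major, minor) as a tuple of integers, rather than comparing the strings or converting to a float, keeps the ordering correct even if a minor version number ever exceeds 9.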
Version support
Products and their version of OpenCL support include:[176]
OpenCL 3.0 support
OpenCL 3.0 is possible on all hardware with OpenCL 1.2+, because the OpenCL 2.x features are only optional in 3.0. The Khronos Test Suite has been available since October 2020.[177][178]
(2020) Intel NEO Compute: 20.41+ for Gen 12 Tiger Lake back to Broadwell (includes full 2.0 and 2.1 support and parts of 2.2)[179]
(2020) Intel 6th, 7th, 8th, 9th, 10th, 11th gen processors (Skylake, Kaby Lake, Coffee Lake, Comet Lake, Ice Lake, Tiger Lake) with the latest Intel Windows graphics driver
(2021) Intel 11th, 12th gen processors (Rocket Lake, Alder Lake) with the latest Intel Windows graphics driver
(2022) Intel 13th gen processors (Raptor Lake) with the latest Intel Windows graphics driver
(2022) Intel Arc discrete graphics with the latest Intel Arc Windows graphics driver
(2021) Nvidia Maxwell, Pascal, Volta, Turing and Ampere with Nvidia graphics driver 465+.[160]
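Because OpenCL 3.0 keeps the 1.2 functionality as the mandatory baseline and makes the 2.x functionality optional, application code can no longer infer a capability from the version number alone. The Python sketch below models that decision under stated assumptions: the three feature-macro names are spelled as in the OpenCL C 3.0 specification, but missing_features is a hypothetical helper, and the rule that 2.x devices provide all three features while 1.x devices provide none is a simplification. On a real 3.0 device the reported set would come from querying the device (for example the CL_DEVICE_OPENCL_C_FEATURES query) rather than from a literal.

```python
from __future__ import annotations

# Feature-macro names as spelled in the OpenCL C 3.0 specification. This is only
# an illustrative subset of functionality that was core in OpenCL 2.0.
FORMERLY_CORE_IN_2X = frozenset({
    "__opencl_c_generic_address_space",
    "__opencl_c_pipes",
    "__opencl_c_device_enqueue",
})


def missing_features(version: tuple[int, int], reported: set[str], required: set[str]) -> set[str]:
    """Return the required features the device cannot be assumed to support.

    Simplified model (an assumption, not the full specification rules):
      * OpenCL 1.x devices: none of these 2.x features;
      * OpenCL 2.x devices: all of them, since they were core in 2.0;
      * OpenCL 3.0+ devices: only the features the device reports.
    """
    major, _minor = version
    if major < 2:
        available: set[str] = set()
    elif major == 2:
        available = set(FORMERLY_CORE_IN_2X)
    else:
        available = set(reported)
    return set(required) - available


if __name__ == "__main__":
    needed = {"__opencl_c_generic_address_space", "__opencl_c_pipes"}
    # A hypothetical 3.0 device that reports generic address space but not pipes.
    print(missing_features((3, 0), {"__opencl_c_generic_address_space"}, needed))
```

Keying the decision on the reported features instead of the version number is what the optional-feature model of OpenCL 3.0 calls for.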
OpenCL 2.2 support
None yet: the Khronos Test Suite is ready; with a driver update, support is possible on all hardware with 2.0 and 2.1 support
Intel NEO Compute: work in progress for current products[180]
ROCm: version 3.5+ mostly
OpenCL 2.1 support
(2018+) Support backported to Intel 5th and 6th gen processors (Broadwell, Skylake)
(2017+) Intel 7th, 8th, 9th, 10th gen processors (Kaby Lake, Coffee Lake, Comet Lake, Ice Lake)
Khronos: with a driver update, support is possible on all hardware with 2.0 support
OpenCL 2.0 support
(2011+) AMD GCN GPUs (HD 7700+/HD 8000/Rx 200/Rx 300/Rx 400/Rx 500/Rx 5000-Series); some 1st-generation GCN GPUs only support 1.2 with some extensions
(2013+) AMD GCN APUs (Jaguar, Steamroller, Puma, Excavator & Zen-based)
(2014+) Intel 5th & 6th gen processors (Broadwell, Skylake)
(2015+) Qualcomm Adreno 5xx series
(2018+) Qualcomm Adreno 6xx series
(2017+) ARM Mali (Bifrost) G51 and G71 in Android 7.1 and Linux
(2018+) ARM Mali (Bifrost) G31, G52, G72 and G76
(2017+) incomplete evaluation support: Nvidia Kepler, Maxwell, Pascal, Volta and Turing GPUs (GeForce 600, 700, 800, 900 & 10-series, Quadro K-, M- & P-series, Tesla K-, M- & P-series) with driver version 378.66+
OpenCL 1.2 support
(2011+) some AMD 1st-generation GCN GPUs: some OpenCL 2.0 features are not possible, but many more extensions are available than on TeraScale
(2009+) AMD TeraScale 2 & 3 GPUs (RV8xx, RV9xx in HD 5000, 6000 & 7000 Series)
(2011+) AMD TeraScale APUs (K10, Bobcat & Piledriver-based)
(2012+) Nvidia Kepler, Maxwell, Pascal, Volta and Turing GPUs (GeForce 600, 700, 800, 900, 10, 16, 20 series, Quadro K-, M- & P-series, Tesla K-, M- & P-series)
(2012+) Intel 3rd & 4th gen processors (Ivy Bridge, Haswell)
(2013+) Qualcomm Adreno 4xx series
(2013+) ARM Mali Midgard 3rd gen (T760)
(2015+) ARM Mali Midgard 4th gen (T8xx)
OpenCL 1.1 support
(2008+) some AMD TeraScale 1 GPUs (RV7xx in HD 4000-series)
(2008+) Nvidia Tesla, Fermi GPUs (GeForce 8, 9, 100, 200, 300, 400, 500-series, Quadro-series or Tesla-series with Tesla or Fermi GPU)
(2011+) Qualcomm Adreno 3xx series
(2012+) ARM Mali Midgard 1st and 2nd gen (T-6xx, T720)
OpenCL 1.0 support
Mostly updated to 1.1 and 1.2 after a first driver for 1.0 only
Portability, performance and alternatives
A key feature of OpenCL is portability, via its abstracted memory and execution model. The programmer cannot directly use hardware-specific technologies such as inline Parallel Thread Execution (PTX) for Nvidia GPUs without giving up direct portability to other platforms. It is possible to run any OpenCL kernel on any conformant implementation.
However, performance of the kernel is not necessarily portable across platforms. Existing implementations have been shown to be competitive when kernel code is properly tuned, though, and auto-tuning has been suggested as a solution to the performance portability problem,[181] yielding "acceptable levels of performance" in experimental linear algebra kernels.[182] Portability of an entire application containing multiple kernels with differing behaviors was also studied, and the study showed that portability required only limited tradeoffs.[183]
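In its simplest form, auto-tuning means running the same kernel under several candidate configurations (for example different work-group sizes) on the target device and keeping the fastest one. The Python sketch below shows only that exhaustive-search loop. The cost function is an assumption: in a real program it would launch the kernel with the given configuration and wait for completion, whereas here a synthetic model stands in for it. The names autotune, time_call and synthetic_cost are hypothetical. Practical auto-tuners typically search much larger parameter spaces and control for measurement noise more carefully.

```python
from __future__ import annotations

import time
from typing import Callable, Iterable, TypeVar

Config = TypeVar("Config")


def time_call(fn: Callable[[], object], repeats: int = 3) -> float:
    """Best-of-`repeats` wall-clock time of fn() in seconds.

    fn must block until the work is finished (e.g. wait on the kernel's event),
    otherwise only the launch overhead is measured.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def autotune(candidates: Iterable[Config], cost: Callable[[Config], float]) -> tuple[Config, float]:
    """Evaluate every candidate configuration and return (best_config, best_cost)."""
    scored = [(cfg, cost(cfg)) for cfg in candidates]
    if not scored:
        raise ValueError("no candidate configurations given")
    # min() keeps the first candidate on ties, so the result is deterministic.
    return min(scored, key=lambda item: item[1])


if __name__ == "__main__":
    # Synthetic stand-in for "run the kernel with this work-group size":
    # it simply pretends that 128 work-items per group is the sweet spot.
    def synthetic_cost(work_group_size: int) -> float:
        return 1.0 + abs(work_group_size - 128) / 128

    print(autotune([32, 64, 128, 256, 512], synthetic_cost))  # (128, 1.0)

    # Measuring a real callable works the same way; here a CPU loop stands in.
    print(time_call(lambda: sum(range(100_000))) >= 0.0)  # True
```

Because the best configuration is found by measurement on each target device, the same kernel source can end up with different tuned parameters on different platforms, which is the point of auto-tuning for performance portability.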
A 2011 study at Delft University that compared CUDA programs and their straightforward translation into OpenCL C found CUDA to outperform OpenCL by at most 30% on the Nvidia implementation. The researchers noted that their comparison could be made fairer by applying manual optimizations to the OpenCL programs, in which case there was "no reason for OpenCL to obtain worse performance than CUDA". The performance differences could mostly be attributed to differences in the programming model (especially the memory model) and to NVIDIA's compiler optimizations for CUDA compared to those for OpenCL.[181]
Another study at D-Wave Systems Inc. found that "The OpenCL kernel's performance is between about 13% and 63% slower, and the end-to-end time is between about 16% and 67% slower" than CUDA's performance.[184]
The fact that OpenCL allows workloads to be shared by CPU and GPU, executing the same programs, means that programmers can exploit both by dividing work among the devices.[185] This leads to the problem of deciding how to partition the work, because the relative speeds of operations differ among the devices. Machine learning has been suggested to solve this problem: Grewe and O'Boyle describe a system of support-vector machines trained on compile-time features of a program that can decide the device partitioning problem statically, without actually running the programs to measure their performance.[186]
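Grewe and O'Boyle predict the split with trained support-vector machines, which is beyond a short example. The Python sketch below instead shows a much simpler heuristic for the same partitioning problem: give each device a share of the work proportional to its estimated throughput. This is not the method of the cited paper. The throughput figures are assumed inputs (they might come from a calibration run), the device names are placeholders, and partition is a hypothetical helper.

```python
from __future__ import annotations


def partition(n_items: int, throughput: dict[str, float]) -> dict[str, int]:
    """Split n_items work-items across devices in proportion to their throughput.

    throughput maps a device name to an estimated rate (items per second).
    Counts are rounded down first; the leftover items go to the devices with
    the largest fractional remainders, so the counts always sum to n_items.
    """
    if n_items < 0:
        raise ValueError("n_items must be non-negative")
    if any(rate < 0 for rate in throughput.values()):
        raise ValueError("throughput estimates must be non-negative")
    total = sum(throughput.values())
    if total <= 0:
        raise ValueError("at least one device needs a positive throughput")

    exact = {dev: n_items * rate / total for dev, rate in throughput.items()}
    counts = {dev: int(share) for dev, share in exact.items()}
    leftover = n_items - sum(counts.values())
    by_remainder = sorted(exact, key=lambda dev: exact[dev] - counts[dev], reverse=True)
    for dev in by_remainder[:leftover]:
        counts[dev] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical estimate: the GPU processes items twice as fast as the CPU.
    print(partition(10, {"cpu": 1.0, "gpu": 2.0}))  # {'cpu': 3, 'gpu': 7}
```

The largest-remainder rounding guarantees that every item is assigned exactly once; a real scheduler would also have to respect work-group granularity, which this sketch ignores. Unlike the static approach described above, this heuristic needs throughput estimates, which usually means running the program at least once.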
A comparison of then-current AMD RDNA 2 and Nvidia RTX series graphics cards in OpenCL tests gave mixed results, with no clear winner. Possible performance gains from using Nvidia CUDA or OptiX were not tested.[187]
See also
Advanced Simulation Library
AMD FireStream
BrookGPU
C++ AMP
Close to Metal
CUDA
DirectCompute
GPGPU
HIP
Larrabee
LibSh
List of OpenCL applications
OpenACC
OpenGL
OpenHMPP
OpenMP
Metal
RenderScript
SequenceL
SIMD
SYCL
Vulkan
WebCL
References
[1] "Khronos OpenCL Registry". Khronos Group. April 27, 2020. Retrieved April 27, 2020.
[2] "Android Devices With OpenCL support". Google Docs. ArrayFire. Retrieved April 28, 2015.
[3] "FreeBSD Graphics/OpenCL". FreeBSD. Retrieved December 23, 2015.
[4] "Conformant Products". Khronos Group. Retrieved May 9, 2015.
[5] Sochacki, Bartosz (July 19, 2019). "The OpenCL C++ 1.0 Specification" (PDF). Khronos OpenCL Working Group. Retrieved July 19, 2019.
[6] Munshi, Aaftab; Howes, Lee; Sochacki, Bartosz (April 27, 2020). "The OpenCL C Specification Version: 3.0 Document Revision: V3.0.7" (PDF). Khronos OpenCL Working Group. Archived from the original (PDF) on September 20, 2020. Retrieved April 28, 2021.
[7] "The C++ for OpenCL 1.0 Programming Language Documentation Revision 2". Khronos OpenCL Working Group. March 31, 2021. Retrieved April 18, 2021.
[8] "Conformant Companies". Khronos Group. Retrieved April 8, 2015.
[9] Gianelli, Silvia E. (January 14, 2015). "Xilinx SDAccel Development Environment for OpenCL, C, and C++, Achieves Khronos Conformance". PR Newswire. Xilinx. Retrieved April 27, 2015.
[10] Howes, Lee (November 11, 2015). "The OpenCL Specification Version: 2.1 Document Revision: 23" (PDF). Khronos OpenCL Working Group. Retrieved November 16, 2015.
[11] Gaster, Benedict; Howes, Lee; Kaeli, David R.; Mistry, Perhaad; Schaa, Dana (2012). Heterogeneous Computing with OpenCL: Revised OpenCL 1.2 Edition. Morgan Kaufmann.
[12] Tompson, Jonathan; Schlachter, Kristofer (2012). "An Introduction to the OpenCL Programming Model" (PDF). New York University Media Research Lab. Archived from the original (PDF) on July 6, 2015. Retrieved July 6, 2015.
[13] Stone, John E.; Gohara, David; Shi, Guochin (2010). "OpenCL: a parallel programming standard for heterogeneous computing systems". Computing in Science & Engineering. 12 (3): 66–73. Bibcode:2010CSE....12c..66S. doi:10.1109/MCSE.2010.69. PMC 2964860. PMID 21037981.
[14] Klöckner, Andreas; Pinto, Nicolas; Lee, Yunsup; Catanzaro, Bryan; Ivanov, Paul; Fasih, Ahmed (2012). "PyCUDA and PyOpenCL: A scripting-based approach to GPU run-time code generation". Parallel Computing. 38 (3): 157–174. arXiv:0911.3456. doi:10.1016/j.parco.2011.09.001. S2CID 18928397.
[15] "OpenCL - Open Computing Language Bindings". metacpan.org. Retrieved August 18, 2018.
[16] "D binding for OpenCL". dlang.org. Retrieved June 29, 2021.
[17] "SPIR - The first open standard intermediate language for parallel compute and graphics". Khronos Group. January 21, 2014.
[18] "SYCL - C++ Single-source Heterogeneous Programming for OpenCL". Khronos Group. January 21, 2014. Archived from the original on January 18, 2021. Retrieved October 24, 2016.
[19] "C++ for OpenCL, OpenCL-Guide". GitHub. Retrieved April 18, 2021.
[20] Aaftab Munshi, ed. (2014). "The OpenCL C Specification, Version 2.0" (PDF). Retrieved June 24, 2014.
[21] "Introduction to OpenCL Programming 201005" (PDF). AMD. pp. 89–90. Archived from the original (PDF) on May 16, 2011. Retrieved August 8, 2017.
[22] "OpenCL" (PDF). SIGGRAPH 2008. August 14, 2008. Archived from the original (PDF) on February 16, 2012. Retrieved August 14, 2008.
[23] "Fitting FFT onto G80 Architecture" (PDF). Vasily Volkov and Brian Kazian, UC Berkeley CS258 project report. May 2008. Retrieved November 14, 2008.
[24] "OpenCL_FFT". Apple. June 26, 2012. Retrieved June 18, 2022.
[25] Trevett, Neil (April 28, 2020). "Khronos Announcements and Panel Discussion" (PDF).
[26] Stulova, Anastasia; Hickey, Neil; van Haastregt, Sven; Antognini, Marco; Petit, Kevin (April 27, 2020). "The C++ for OpenCL Programming Language". Proceedings of the International Workshop on OpenCL. IWOCL '20. Munich, Germany: Association for Computing Machinery: 1–2. doi:10.1145/3388333.3388647. ISBN 978-1-4503-7531-3. S2CID 216554183.
[27] KhronosGroup/OpenCL-Docs, The Khronos Group, April 16, 2021, retrieved April 18, 2021.
[28] "Clang release 9 documentation, OpenCL support". releases.llvm.org. September 2019. Retrieved April 18, 2021.
[29] "Clang 9, Language Extensions, OpenCL". releases.llvm.org. September 2019. Retrieved April 18, 2021.
[30] "Release of Documentation of C++ for OpenCL kernel language, version 1.0, revision 1 · KhronosGroup/OpenCL-Docs". GitHub. December 2020. Retrieved April 18, 2021.
[31] "The C++ for OpenCL 1.0 Programming Language Documentation". www.khronos.org. Retrieved April 18, 2021.
[32] "Release of C++ for OpenCL Kernel Language Documentation, version 1.0, revision 2 · KhronosGroup/OpenCL-Docs". GitHub. March 2021. Retrieved April 18, 2021.
[33] "cl_ext_cxx_for_opencl". www.khronos.org. September 2020. Retrieved April 18, 2021.
[34] "Mali SDK Supporting Compilation of Kernels in C++ for OpenCL". community.arm.com. December 2020. Retrieved April 18, 2021.
[35] "Clang Compiler User's Manual - C++ for OpenCL Support". clang.llvm.org. Retrieved April 18, 2021.
[36] "OpenCL-Guide, Offline Compilation of OpenCL Kernel Sources". GitHub. Retrieved April 18, 2021.
[37] "OpenCL-Guide, Programming OpenCL Kernels". GitHub. Retrieved April 18, 2021.
[38] Clspv is a prototype compiler for a subset of OpenCL C to Vulkan compute shaders: google/clspv, August 17, 2019, retrieved August 20, 2019.
[39] Petit, Kévin (April 17, 2021), Experimental implementation of OpenCL on Vulkan, retrieved April 18, 2021.
[40] "Khronos Launches Heterogeneous Computing Initiative" (Press release). Khronos Group. June 16, 2008. Archived from the original on June 20, 2008. Retrieved June 18, 2008.
[41] "OpenCL gets touted in Texas". MacWorld. November 20, 2008. Retrieved June 12, 2009.
[42] "The Khronos Group Releases OpenCL 1.0 Specification" (Press release). Khronos Group. December 8, 2008. Retrieved December 4, 2016.
[43] "Apple Previews Mac OS X Snow Leopard to Developers" (Press release). Apple Inc. June 9, 2008. Archived from the original on March 18, 2012. Retrieved June 9, 2008.
[44] "AMD Drives Adoption of Industry Standards in GPGPU Software Development" (Press release). AMD. August 6, 2008. Retrieved August 14, 2008.
[45] "AMD Backs OpenCL, Microsoft DirectX 11". eWeek. August 6, 2008. Archived from the original on March 19, 2012. Retrieved August 14, 2008.
[46] "HPCWire: RapidMind Embraces Open Source and Standards Projects". HPCWire. November 10, 2008. Archived from the original on December 18, 2008. Retrieved November 11, 2008.
[47] "Nvidia Adds OpenCL To Its Industry Leading GPU Computing Toolkit" (Press release). Nvidia. December 9, 2008. Retrieved December 10, 2008.
[48] "OpenCL Development Kit for Linux on Power". alphaWorks. October 30, 2009. Retrieved October 30, 2009.
[49] "Opencl Standard - an overview | ScienceDirect Topics".
[50] http://developer.amd.com/wordpress/media/2012/10/opencl-1.0.48.pdf
[51] "Khronos Drives Momentum of Parallel Computing Standard with Release of OpenCL 1.1 Specification". Archived from the original on March 2, 2016. Retrieved February 24, 2016.
[52] "Khronos Releases OpenCL 1.2 Specification". Khronos Group. November 15, 2011. Retrieved June 23, 2015.
[53] "OpenCL 1.2 Specification" (PDF). Khronos Group. Retrieved June 23, 2015.
[54] "Khronos Finalizes OpenCL 2.0 Specification for Heterogeneous Computing". Khronos Group. November 18, 2013. Retrieved February 10, 2014.
[55] "Khronos Releases OpenCL 2.1 and SPIR-V 1.0 Specifications for Heterogeneous Parallel Programming". Khronos Group. November 16, 2015. Retrieved November 16, 2015.
[56] "Khronos Announces OpenCL 2.1: C++ Comes to OpenCL". AnandTech. March 3, 2015. Retrieved April 8, 2015.
[57] "Khronos Releases OpenCL 2.1 Provisional Specification for Public Review". Khronos Group. March 3, 2015. Retrieved April 8, 2015.
[58] "OpenCL Overview". Khronos Group. July 21, 2013.
[59] "Khronos Releases OpenCL 2.2 Provisional Specification with OpenCL C++ Kernel Language for Parallel Programming". Khronos Group. April 18, 2016.
[60] Trevett, Neil (April 2016). "OpenCL – A State of the Union" (PDF). IWOCL. Vienna: Khronos Group. Retrieved January 2, 2017.
[61] "Khronos Releases OpenCL 2.2 With SPIR-V 1.2". Khronos Group. May 16, 2017.
[62] "OpenCL 2.2 Maintenance Update Released". The Khronos Group. May 14, 2018.
[63] "OpenCL 3.0 Bringing Greater Flexibility, Async DMA Extensions - Phoronix".
[64] "Khronos Group Releases OpenCL 3.0". April 26, 2020.
[65] https://www.khronos.org/registry/OpenCL/specs/3.0-unified/pdf/OpenCL_API.pdf
[66] https://www.iwocl.org/wp-content/uploads/k03-iwocl-syclcon-2021-trevett-updated.mp4.pdf
[67] "Using Semaphore and Memory Sharing Extensions for Vulkan Interop with NVIDIA OpenCL". February 24, 2022.
[68] "Breaking: OpenCL Merging Roadmap into Vulkan | PC Perspective". www.pcper.com. Archived from the original on November 1, 2017. Retrieved May 17, 2017.
[69] "SIGGRAPH 2018: OpenCL-Next Taking Shape, Vulkan Continues Evolving - Phoronix". www.phoronix.com.
[70] "Vulkan Update SIGGRAPH 2019" (PDF).
[71] Trevett, Neil (May 23, 2019). "Khronos and OpenCL Overview EVS Workshop May19" (PDF). Khronos Group.
[72] "OpenCL ICD Specification". Retrieved June 23, 2015.
[73] "Apple entry on LLVM Users page". Retrieved August 29, 2009.
[74] "Nvidia entry on LLVM Users page". Retrieved August 6, 2009.
[75] "Rapidmind entry on LLVM Users page". Retrieved October 1, 2009.
[76] "Zack Rusin's blog post about the Gallium3D OpenCL implementation". February 2009. Retrieved October 1, 2009.
[77] "GalliumCompute". dri.freedesktop.org. Retrieved June 23, 2015.
[78] "Clover Status Update" (PDF).
[79] "mesa/mesa - The Mesa 3D Graphics Library". cgit.freedesktop.org.
[80] "Gallium Clover With SPIR-V & NIR Opening Up New Compute Options Inside Mesa - Phoronix". www.phoronix.com. Archived from the original on October 22, 2020. Retrieved December 13, 2018.
[81] https://xdc2018.x.org/slides/clover.pdf
[82] "Mesa's "Rusticl" Implementation Now Manages to Handle Darktable OpenCL".
[83] Larabel, Michael (January 10, 2013). "Beignet: OpenCL/GPGPU Comes For Ivy Bridge On Linux". Phoronix.
[84] Larabel, Michael (April 16, 2013). "More Criticism Comes Towards Intel's Beignet OpenCL". Phoronix.
[85] Larabel, Michael (December 24, 2013). "Intel's Beignet OpenCL Is Still Slowly Baking". Phoronix.
[86] "Beignet". freedesktop.org.
[87] "beignet - Beignet OpenCL Library for Intel Ivy Bridge and newer GPUs". cgit.freedesktop.org.
[88] "Intel Brings Beignet To Android For OpenCL Compute - Phoronix". www.phoronix.com.
[89] "01.org Intel Open Source - Compute Runtime". February 7, 2018.
[90] "NEO GitHub README". GitHub. March 21, 2019.
[91] "ROCm". GitHub. Archived from the original on October 8, 2016.
[92] "RadeonOpenCompute/ROCm: ROCm - Open Source Platform for HPC and Ultrascale GPU Computing". GitHub. March 21, 2019.
[93] "A Nice Overview Of The ROCm Linux Compute Stack - Phoronix". www.phoronix.com.
[94] "XDC Lightning.pdf". Google Docs.
[95] "Radeon ROCm 2.0 Officially Out With OpenCL 2.0 Support, TensorFlow 1.12, Vega 48-bit VA - Phoronix". www.phoronix.com.
[96] "Taking Radeon ROCm 2.0 OpenCL For A Benchmarking Test Drive - Phoronix". www.phoronix.com.
[97] https://github.com/RadeonOpenCompute/ROCm/blob/master/AMD_ROCm_Release_Notes_v3.3.pdf [dead link]
[98] "Radeon ROCm 3.5 Released with New Features but Still No Navi Support - Phoronix".
[99] "Radeon ROCm 3.10 Released with Data Center Tool Improvements, New APIs - Phoronix".
[100] "AMD Launches Arcturus as the Instinct MI100, Radeon ROCm 4.0 - Phoronix".
[101] "Welcome to AMD ROCm™ Platform - ROCm Documentation 1.0.0 documentation".
[102] https://docs.amd.com/
[103] Jääskeläinen, Pekka; Sánchez de La Lama, Carlos; Schnetter, Erik; Raiskila, Kalle; Takala, Jarmo; Berg, Heikki (2016). "pocl: A Performance-Portable OpenCL Implementation". Int'l J. Parallel Programming. 43 (5): 752–785. arXiv:1611.07083. Bibcode:2016arXiv161107083J. doi:10.1007/s10766-014-0320-y. S2CID 9905244.
[104] "pocl home page". pocl.
[105] "GitHub - pocl/pocl: pocl: Portable Computing Language". March 14, 2019, via GitHub.
[106] "HSA support implementation status as of 2016-05-17 - Portable Computing Language (pocl) 1.3-pre documentation". portablecl.org.
[107] "PoCL home page".
[108] "PoCL home page".
[109] "PoCL home page".
[110] "Archived copy". Archived from the original on January 17, 2021. Retrieved December 3, 2020.
[111] https://www.iwocl.org/wp-content/uploads/30-iwocl-syclcon-2021-baumann-slides.pdf
[112] "PoCL home page".
[113] "PoCL home page".
[114] "PoCL home page".
[115] "About". Git.Linaro.org.
[116] Gall, T.; Pitney, G. (March 6, 2014). "LCA14-412: GPGPU on ARM SoC" (PDF). Amazon Web Services. Archived from the original (PDF) on July 26, 2020. Retrieved January 22, 2017.
[117] "zuzuf/freeocl". GitHub. Retrieved April 13, 2017.
[118] Zhang, Peng; Fang, Jianbin; Yang, Canqun; Tang, Tao; Huang, Chun; Wang, Zheng (2018). MOCL: An Efficient OpenCL Implementation for the Matrix-2000 Architecture (PDF). Proc. Int'l Conf. on Computing Frontiers. doi:10.1145/3203217.3203244.
[119] "Status". GitHub. March 16, 2022.
[120] "OpenCL Demo, AMD CPU". YouTube. December 10, 2008. Retrieved March 28, 2009.
[121] "OpenCL Demo, Nvidia GPU". YouTube. December 10, 2008. Retrieved March 28, 2009.
[122] "Imagination Technologies launches advanced, highly-efficient POWERVR SGX543MP multi-processor graphics IP family". Imagination Technologies. March 19, 2009. Archived from the original on April 3, 2014. Retrieved January 30, 2011.
[123] "AMD and Havok demo OpenCL accelerated physics". PC Perspective. March 26, 2009. Archived from the original on April 5, 2009. Retrieved March 28, 2009.
[124] "Nvidia Releases OpenCL Driver To Developers". Nvidia. April 20, 2009. Archived from the original on February 4, 2012. Retrieved April 27, 2009.
[125] "AMD does reverse GPGPU, announces OpenCL SDK for x86". Ars Technica. August 5, 2009. Retrieved August 6, 2009. [permanent dead link]
[126] Moren, Dan; Snell, Jason (June 8, 2009). "Live Update: WWDC 2009 Keynote". MacWorld.com. MacWorld. Retrieved June 12, 2009.
[127] "ATI Stream Software Development Kit (SDK) v2.0 Beta Program". Archived from the original on August 9, 2009. Retrieved October 14, 2009.
[128] "S3 Graphics launched the Chrome 5400E embedded graphics processor". Archived from the original on December 2, 2009. Retrieved October 27, 2009.
[129] "VIA Brings Enhanced VN1000 Graphics Processor". Archived from the original on December 15, 2009. Retrieved December 10, 2009.
[130] "ATI Stream SDK v2.0 with OpenCL 1.0 Support". Archived from the original on November 1, 2009. Retrieved October 23, 2009.
[131] "OpenCL". ZiiLABS. Retrieved June 23, 2015.
[132] "Intel discloses new Sandy Bridge technical details". Archived from the original on October 31, 2013. Retrieved September 13, 2010.
[133] "WebCL related stories". Khronos Group. Retrieved June 23, 2015.
[134] "Khronos Releases Final WebGL 1.0 Specification". Khronos Group. Archived from the original on July 9, 2015. Retrieved June 23, 2015.
[135] "Community".
[136] "Welcome to Wikis". www.ibm.com. October 20, 2009.
[137] "Nokia Research releases WebCL prototype". Khronos Group. May 4, 2011. Archived from the original on December 5, 2020. Retrieved June 23, 2015.
[138] Kamath K, Sharath. "Samsung's WebCL Prototype for WebKit". Github.com. Archived from the original on February 18, 2015. Retrieved June 23, 2015.
[139] "AMD Opens the Throttle on APU Performance with Updated OpenCL Software Development". Amd.com. August 8, 2011. Retrieved June 16, 2013.
[140] "AMD APP SDK v2.6". Forums.amd.com. March 13, 2015. Retrieved June 23, 2015. [dead link]
[141] "The Portland Group Announces OpenCL Compiler for ST-Ericsson ARM-Based NovaThor SoCs". Retrieved May 4, 2012.
[142] "WebCL Latest Spec". Khronos Group. November 7, 2013. Archived from the original on August 1, 2014. Retrieved June 23, 2015.
[143] "Altera Opens the World of FPGAs to Software Programmers with Broad Availability of SDK and Off-the-Shelf Boards for OpenCL". Altera.com. Archived from the original on January 9, 2014. Retrieved January 9, 2014.
[144] "Altera SDK for OpenCL is First in Industry to Achieve Khronos Conformance for FPGAs". Altera.com. Archived from the original on January 9, 2014. Retrieved January 9, 2014.
[145] "Khronos Finalizes OpenCL 2.0 Specification for Heterogeneous Computing". Khronos Group. November 18, 2013. Retrieved June 23, 2015.
[146] "WebCL 1.0 Press Release". Khronos Group. March 19, 2014. Retrieved June 23, 2015.
[147] "WebCL 1.0 Specification". Khronos Group. March 14, 2014. Retrieved June 23, 2015.
[148] "Intel OpenCL 2.0 Driver". Archived from the original on September 17, 2014. Retrieved October 14, 2014.
[149] "AMD OpenCL 2.0 Driver". Support.AMD.com. June 17, 2015. Retrieved June 23, 2015.
[150] "Xilinx SDAccel development environment for OpenCL, C, and C++, achieves Khronos Conformance - khronos.org news". The Khronos Group. Retrieved June 26, 2017.
[151] "Release 349 Graphics Drivers for Windows, Version 350.12" (PDF). April 13, 2015. Retrieved February 4, 2016.
[152] "AMD APP SDK 3.0 Released". Developer.AMD.com. August 26, 2015. Retrieved September 11, 2015.
[153] "Khronos Releases OpenCL 2.1 and SPIR-V 1.0 Specifications for Heterogeneous Parallel Programming". Khronos Group. November 16, 2015.
[154] "What's new? Intel® SDK for OpenCL™ Applications 2016, R3". Intel Software.
[155] "NVIDIA 378.66 drivers for Windows offer OpenCL 2.0 evaluation support". Khronos Group. February 17, 2017. Archived from the original on August 6, 2020. Retrieved March 17, 2017.
[156] Szuppe, Jakub (February 22, 2017). "NVIDIA enables OpenCL 2.0 beta-support".
[157] Szuppe, Jakub (March 6, 2017). "NVIDIA beta-support for OpenCL 2.0 works on Linux too".
[158] "The Khronos Group". The Khronos Group. March 21, 2019.
[159] "GitHub - RadeonOpenCompute/ROCm at roc-3.5.0". GitHub.
[160] "NVIDIA is Now OpenCL 3.0 Conformant". April 12, 2021.
[161] "The Khronos Group". The Khronos Group. August 20, 2019. Retrieved August 20, 2019.
[162] "KhronosGroup/OpenCL-CTL: The OpenCL Conformance Tests". GitHub. March 21, 2019.
[163] "OpenCL and the AMD APP SDK". AMD Developer Central. developer.amd.com. Archived from the original on August 4, 2011. Retrieved August 11, 2011.
[164] "About Intel OpenCL SDK 1.1". software.intel.com. intel.com. Retrieved August 11, 2011.
[165] "Intel® SDK for OpenCL™ Applications - Release Notes". software.intel.com. March 14, 2019.
[166] "Product Support". Retrieved August 11, 2011.
[167] "Intel OpenCL SDK – Release Notes". Archived from the original on July 17, 2011. Retrieved August 11, 2011.
[168] "Announcing OpenCL Development Kit for Linux on Power v0.3". IBM. Retrieved August 11, 2011.
[169] "IBM releases OpenCL Development Kit for Linux on Power v0.3 – OpenCL 1.1 conformant release available". OpenCL Lounge. ibm.com. Retrieved August 11, 2011.
[170] "IBM releases OpenCL Common Runtime for Linux on x86 Architecture". IBM. October 20, 2009. Retrieved September 10, 2011.
[171] "OpenCL and the AMD APP SDK". AMD Developer Central. developer.amd.com. Archived from the original on September 6, 2011. Retrieved September 10, 2011.
[172] "Nvidia Releases OpenCL Driver". April 22, 2009. Retrieved August 11, 2011.
[173] "clinfo by Simon Leblanc". GitHub. Retrieved January 27, 2017.
[174] "clinfo by Oblomov". GitHub. Retrieved January 27, 2017.
[175] "clinfo: openCL INFOrmation". Retrieved January 27, 2017.
[176] "Khronos Products". The Khronos Group. Retrieved May 15, 2017.
[177] "OpenCL-CTS/Test_conformance at main · KhronosGroup/OpenCL-CTS". GitHub.
[178] "Issues · KhronosGroup/OpenCL-CTS". GitHub.
[179] "Intel Compute-Runtime 20.43.18277 Brings Alder Lake Support".
[180] "compute-runtime". 01.org. February 7, 2018.
[181] Fang, Jianbin; Varbanescu, Ana Lucia; Sips, Henk (2011). "A Comprehensive Performance Comparison of CUDA and OpenCL". 2011 International Conference on Parallel Processing. Proc. Int'l Conf. on Parallel Processing. pp. 216–225. doi:10.1109/ICPP.2011.45. ISBN 978-1-4577-1336-1.
[182] Du, Peng; Weber, Rick; Luszczek, Piotr; Tomov, Stanimire; Peterson, Gregory; Dongarra, Jack (2012). "From CUDA to OpenCL: Towards a performance-portable solution for multi-platform GPU programming". Parallel Computing. 38 (8): 391–407. CiteSeerX 10.1.1.193.7712. doi:10.1016/j.parco.2011.10.002.
[183] Dolbeau, Romain; Bodin, François; de Verdière, Guillaume Colin (September 7, 2013). "One OpenCL to rule them all?". 2013 IEEE 6th International Workshop on Multi-/Many-core Computing Systems (MuCoCoS). pp. 1–6. doi:10.1109/MuCoCoS.2013.6633603. ISBN 978-1-4799-1010-6. S2CID 225784.
[184] Karimi, Kamran; Dickson, Neil G.; Hamze, Firas (2011). "A Performance Comparison of CUDA and OpenCL". arXiv:1005.2581v3 [cs.PF].
[185] A Survey of CPU-GPU Heterogeneous Computing Techniques, ACM Computing Surveys, 2015.
[186] Grewe, Dominik; O'Boyle, Michael F. P. (2011). "A Static Task Partitioning Approach for Heterogeneous Systems Using OpenCL". Compiler Construction. Proc. Int'l Conf. on Compiler Construction. Lecture Notes in Computer Science. Vol. 6601. pp. 286–305. doi:10.1007/978-3-642-19861-8_16. ISBN 978-3-642-19860-1.
[187] "Radeon RX 6800 Series Has Excellent ROCm-Based OpenCL Performance on Linux".
External links
Official website
Official website for WebCL
International Workshop on OpenCL (IWOCL), sponsored by The Khronos Group (archived January 26, 2021, at the Wayback Machine)
Retrieved from "https://en.wikipedia.org/w/index.php?title=OpenCL&oldid=1096715527"