What's An SRE? Site Reliability Engineer Roles and... - Splunk


What’s An SRE? Site Reliability Engineer Roles and Responsibilities

By Stephen Watts, June 27, 2022

DevOps gained popularity in order to combat siloed workflows, decreased collaboration and a lack of visibility across the software development lifecycle. While establishing a culture of DevOps has helped teams collaborate better and deliver reliable software faster, DevOps teams don’t necessarily have someone specifically dedicated to developing systems that increase site reliability and performance.

That’s where a site reliability engineer (SRE) comes into the picture.

Site reliability engineers sit at the crossroads of traditional IT and software development. Basically, SRE teams are made up of software engineers who build and implement software to improve the reliability of their systems.

So, in this article, let’s define the basic roles and responsibilities of a site reliability engineer and show how SRE can drastically improve the resilience of your people, processes and technology.

SRE overview

Site reliability engineering was originally developed by Google. In the words of Ben Treynor, SRE is “what happens when you ask a software engineer to design an operations function.”

In a traditional setup of siloed IT operations and software development teams, developers would throw their code over to IT professionals. Then, IT would be in charge of deployment, maintenance and any on-call responsibilities associated with the system in production. Luckily, DevOps came along and forced developers to share accountability for systems in production, own their code and take on-call responsibilities.

DevOps pushed shared responsibility for the reliability of your applications and infrastructure. And, while this is a great first step forward, it doesn’t proactively help teams add resilience to their system. Many DevOps teams, even with shortened feedback loops and improved collaboration, can still find themselves deploying new, unreliable services into production at a rapid pace.

Site reliability engineering is a way to bridge the gap between developers and IT operations, even in a DevOps culture. It isn’t SRE versus DevOps; it’s SRE with DevOps. SRE is kind of like a more proactive form of quality assurance (QA). Site reliability engineers are dedicated full-time to creating software that improves the reliability of systems in production. Their work includes fixing issues, responding to incidents and, usually, taking on-call responsibilities.

Aside from its growing role today, SRE’s biggest claim to fame might be the four golden signals of monitoring: latency, traffic, errors and saturation.

Common SRE roles and responsibilities

Implementing an SRE team will greatly benefit both IT operations and software development teams. Not only can SRE drive deeper reliability to systems in production, but it will likely help IT, support and development teams spend less time working on support escalations, giving them focused time to build new features and services.

So, let’s look at common site reliability engineering roles and responsibilities you can expect to see.

Building software to help DevOps, ITOps & support teams

SRE teams are in charge of proactively building and implementing services to make IT and support better at their jobs. This can be anything from adjustments to monitoring and alerting to code changes in production. A site reliability engineer can be tasked with building a homegrown tool from scratch to help with weaknesses in software delivery or incident management.

Fixing support escalation issues

Similar to the point above, a site reliability engineer can expect to spend time fixing support escalation cases. But, as your SRE operations mature, your systems will become more reliable and you’ll see fewer critical incidents in production, leading to fewer support escalations.

Because an SRE team touches so many different parts of the engineering and IT organization, it can be a great source of knowledge and can help route issues to the right people and teams.

Optimizing on-call rotations & processes

More often than not, site reliability engineers will need to take on-call responsibilities. At most organizations, the SRE role will have a lot of say in how the team can improve system reliability through the optimization of on-call processes.

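On-call alerting is usually built on top of the four golden signals named earlier in this article. As a rough illustration only, the sketch below computes latency, traffic, errors and saturation for one time window and reports which ones exceed a limit. Every name in it (`Request`, `golden_signals`, `breached_signals`), the use of CPU utilization as the saturation measure, and all threshold values are assumptions made for this example, not part of any particular monitoring product.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    """One served request. The fields are hypothetical, not from any real tool."""
    latency_ms: float
    is_error: bool


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def golden_signals(requests, window_seconds, cpu_utilization):
    """Summarize the four golden signals for one time window.

    cpu_utilization (0.0 to 1.0) stands in for saturation here; which
    resource is the real bottleneck depends on the service.
    """
    latencies = sorted(r.latency_ms for r in requests)
    errors = sum(1 for r in requests if r.is_error)
    return {
        "latency_p95_ms": percentile(latencies, 95) if latencies else None,
        "traffic_rps": len(requests) / window_seconds,
        "error_rate": errors / len(requests) if requests else 0.0,
        "saturation": cpu_utilization,
    }


def breached_signals(signals, thresholds):
    """Return the names of signals whose value exceeds its threshold."""
    return [
        name
        for name, limit in thresholds.items()
        if signals.get(name) is not None and signals[name] > limit
    ]


if __name__ == "__main__":
    # Nine fast successes and one slow failure in a 10-second window.
    window = [Request(100.0, False)] * 9 + [Request(900.0, True)]
    signals = golden_signals(window, window_seconds=10, cpu_utilization=0.5)
    # Threshold values are made up for illustration.
    limits = {"latency_p95_ms": 500, "error_rate": 0.05, "saturation": 0.8}
    print(signals)
    print(breached_signals(signals, limits))
```

With this sample window, the latency and error-rate limits are exceeded while saturation stays under its limit. In practice the thresholds would come from the service's own reliability targets rather than fixed constants like these.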
SRE teams will help add automation and context to alerts, leading to better real-time collaborative response from on-call responders. Additionally, site reliability engineers can update runbooks, tools and documentation to help prepare on-call teams for future incidents.

Documenting “tribal” knowledge

SRE teams gain exposure to systems in both staging and production, as well as all technical teams. They take part in work with software development, support, IT operations and on-call duties, meaning they build up a great amount of historical knowledge over time. Instead of siloing this knowledge into the mind of one team or one person, site reliability engineers can be tasked with documenting much of what they know. Constant upkeep of documentation and runbooks can ensure that teams get the information they need right when they need it.

Conducting post-incident reviews

Without thorough post-incident reviews, you have no way to identify what’s working and what’s not. SRE teams need to keep teams honest and ensure that everyone, software developers and IT professionals alike, is conducting post-incident reviews, documenting their findings and taking action on their learnings. Then, site reliability engineers are often tasked with action items for building or optimizing some part of the SDLC or incident lifecycle to bolster the reliability of their service.

Where does SRE fit on your team?

Site reliability engineering roles and responsibilities are crucial to the continuous improvement of people, processes and technology within any organization. Whether your team has already taken on a full-blown DevOps culture or you’re still attempting to make the transition, SRE offers numerous benefits to speed and reliability.

SRE fits right at the crossroads of IT operations, support and software engineering. SRE serves as the perfect blend of skills to tighten the relationship between IT and developers, leading to shorter feedback loops, better collaboration and more reliable software.

Ready for an SRE approach? Learn how to hire the best candidates with these SRE interview questions.

Pros & cons of being a Site Reliability Engineer

Catchpoint’s 2021 SRE Report survey indicates that site reliability engineers were some of the happiest employees in software development and IT. While SREs can’t spend all their time building new features for customers, they’re constantly making an impact on customer experience. In fact, if you’re looking for a role designed to help customers the most, then SRE is it.

Site reliability engineering not only improves the lives of customers but, when done right, improves the lives of on-call teams, IT professionals and software developers.

SRE can be one of the most fulfilling roles for a software engineer. It can help you better understand the struggles of IT and support, making you a better developer going forward. For more support, explore these DevOps & SRE conferences.

This posting does not necessarily represent Splunk's position, strategies or opinion.

Posted by Stephen Watts

Stephen Watts works in growth marketing at Splunk. Stephen holds a degree in Philosophy from Auburn University and is an MSIS candidate at UC Denver. He contributes to a variety of publications including CIO.com, Search Engine Journal, ITSM.Tools, IT Chronicles, DZone, and CompTIA.


