Who is a Site Reliability Engineer (SRE) - Flagship.io
文章推薦指數: 80 %
A site reliability engineer (SRE) creates a bridge between development and IT operations by taking on the tasks typically done by operations. SoftwareReleaseGlossaryMostcommonlyusedtermsandacronymsbyproductmanagers,engineersanddevopsABCDEFGHIJKLMNOPQRSTUVWXYZSiteReliabilityEngineer Asitereliabilityengineer(SRE)createsabridgebetweendevelopmentandIToperationsbytakingonthetaskstypicallydonebyoperations.Instead,suchtasksaregiventothesetypesofengineerswhouseautomationtoolstosolveproblemsbycreatingscalableandreliablesoftwaresystems. StandardizationandautomationareattheheartofwhatanSREdoes,especiallyassystemsmigratetothecloud.Thus,theyoftenhaveabackgroundinsoftwareorsystemengineeringorsystemadministrationwithIToperationsexperience. Whatissitereliabilityengineering? Wewillstartwithadefinitionofwhatthistypeofengineeringisbeforewemoveontotheroleandresponsibilitiesofasitereliabilityengineer. SitereliabilityengineeringisatermthatwasfirstcoinedbyGoogle,whereitisdescribedas“whenyoutreatoperationsasifit’sasoftwareproblem.” ThemainpurposeofSREisdevelopingsoftwaresystemsandautomatedsolutionsforoperationalaspects.Thus,SREdoestheworktraditionallydonebyoperationsbutinsteadusingengineerswithsoftwareexpertisetosolvecomplexproblems. Therefore,sitereliabilityengineeringcanbeconsideredasetofpracticesthatincorporatesaspectsofsoftwareengineeringintooperationstherebyincreasingtheefficiencyandreliabilityofsoftwaresystemsandimprovingworkflow. SREandDevOps SitereliabilityengineeringiscloselyrelatedtoDevOps,anotherconceptthatlinkssoftwaredevelopmentandoperations,andcanbeseenasageneralizationofcoreSREprinciples.Consequently,SREplaysalargepartinsuccessfullyimplementingDevOpspractices. Additionally,bothDevOpsandSREseektobridgethegapbetweenoperationsanddevelopmentteamstodeliversoftwarefaster. However,anarticlebyGooglemakesadistinctionbetweenthetwotermsstatingthatSRE“happenstoembodythephilosophiesofDevOps,buthasamuchmoreprescriptivewayofmeasuringandachievingreliabilitythroughengineeringandoperationswork.Inotherwords,SREprescribeshowtosucceedinthevariousDevOpsareas.” ClickheretoreadmoreaboutDevOpsandwhataDevOpsengineerdoes. Whatdoesasitereliabilityengineerdo? Asitereliabilityengineer(SRE)worksbetweendevelopmentandoperations.TheSRE,then,isasoftwaredeveloperwithexperienceinandknowledgeofIToperations. Alotofthisrolerevolvesaroundwritinganddevelopingcodetoautomateprocesses,suchasanalyzinglogs,testingproductionenvironmentsandrespondingtoanyissues,sothisengineerwillbeanexpertinwritingcode. Suchautomationallowsdevelopers,inturn,tofocusexclusivelyonfeaturedevelopmentenablingthemtobringnewfeaturestoproductionasquicklyaspossible. Theoperationsteam,fortheirpart,willfindtheirworkloaddecreasingasaSREwillautomatesolutionsforanyrecurringproblem. Thus,he/shewillbeshiftingbetweendevelopmentandoperationsworkandmaintainabalancebetweenthem. BecauseanSREengineer’smainfocusisonautomation,thismeansthathe/sheenhancesperformance,efficiencyandmonitoringofsoftwaredevelopmentprocesses. Requiredskillset SREsdedicatetheirtimetocreatingsoftwarethatwillimprovethereliabilityofsystems,fixingissuesandrespondingtoincidentsandissues.Assuch,theywillneedvarioustechnicalskills. Theywillneedtohaveknowledgeofvariousautomationtoolsastheyareusuallyresponsibleforbuildingandintegratingsoftwaretoolstoenhanceanorganizationalsystem’sreliabilityandscalability. Asmentionedabove,theSREwillrequireknowledgeofcodingandmostofthecommonprogramminglanguagesincludingRuby,JavascriptandPHP. He/shewillalsoneedtohaveexpertiseinthemajorcloudproviderssuchasAWSandGoogleCloud. DailyrolesandresponsibilitiesofanSRE Automation Asmentionedpreviously,SREengineersbuildtoolsforautomationtomanageIToperations.Thus,insteadofmanuallyperformingthesefunctions,theiraimistoautomatethem.Suchfunctionsinclude: ContinuousintegrationandcontinuousdeliveryMonitoring IncidentresponseAlerts Monitoring SREengineersareresponsibleforensuringthattheunderlyinginfrastructureisrunningsmoothlyandthatsystemsandtoolsareworkingasexpected. Theyalsomonitorcriticalapplicationsandservicestominimizedowntimeandensuretheiravailability. Issueresolution Theseengineersworkcloselywithdevelopers,especiallywhenissuesarisesotheywillcollaboratewithdeveloperstohelpwithtroubleshootingandprovideconsultationwhenalertsareissued. Thisengineerwillinvestigateandthenresolvetheissueintheeventthatadeveloperrunsintoaproblem. Followingtheincidentresolution,theengineerwillrevisittheissueanddeterminethecausetoensureitdoesn’thappenagain. Crossteamcollaboration Basedontheabove,SREsworkacrossdifferentteams,mainlyoperationsanddevelopment.Bybuildingreliablesystemsandprovidingsupporttotheseteams,thiswillgivetheseteamsmoretimetodiverttheirattentiontobuildingnewfeaturesandhencegettheseoutfastertocustomers. CommontoolsusedbySREs Monitoring:suchtoolsincludeAWSCloudWatchandNewRelic Incidentmanagement/on-call:suchasPagerDutyandVictorOpsProjectmanagementandissuetracking:suchasJiraandTrelloInfrastructureorchestration:includingTerraformandSaltStack Tofindoutmoretoolsfromprojectmanagementtoolstoinfrastructureandcontainerorchestrationusedbysitereliabilityengineers,checkoutthiscuratedlistofSREtools. HowmuchdoesanSREmake? Accordingtopayscale,thistypeofengineermakesasalaryanywherebetween$76,000to$158,000ayearintheUnitedStateswiththeaveragebeing$117,768peryear. Conclusion Asitereliabilityengineerisbecominganincreasinglyimportantrolewithinorganizations.Itisachallengingrolethatrequiresapassionforcodingandautomation. Havingsuchengineersinyourorganizationwillhelpreduceyouroperationalcostswhileimprovingthereliabilityofyoursystems. Info Moretermsfromtheglossary SoakTesting Soaktestingisatypeofperformanceandloadtestthatevaluateshowasoftwareapplicationhandlesagrowingnumberofusersforanextendedperiodoftime. Readdescription→ UserAcceptanceTesting Useracceptancetesting(UAT)isusedtoverifywhetherasoftwaremeetsbusinessrequirementsandwhetherit’sreadyforusebycustomers. Readdescription→ FakeDoorTesting Fakedoortestingisamethodwhereyoucanmeasureinterestinaproductornewfeaturewithoutactuallycodingit. Readdescription→ linkedin-squaretwittergithubangle-downyoutube-playcrossmenu Copylink CopyCopied
延伸文章資訊
- 1What is SRE (site reliability engineering)? - Red Hat
Site reliability engineering (SRE) is a software engineering approach to IT operations. SRE teams...
- 2網站可靠性工程|Google 的系統管理之道(Site Reliability ...
網站可靠性工程|Google 的系統管理之道(Site Reliability Engineering: How Google Runs Production Systems)(SRE). Be...
- 3Site Reliability Engineering: Google
SRE is what you get when you treat operations as if it's a software problem. Our mission is to pr...
- 4[好文翻譯] 你在找的是SRE 還是DevOps? - Medium
SRE is a DevOps (香蕉是一種水果) ... Site Reliability Engineering (SRE)和DevOps 是目前相當熱門的開發與維運文化,有著很高的相似程度。
- 5Site reliability engineering - Wikipedia
Site reliability engineering (SRE) is a set of principles and practices that incorporates aspects...