What is SRE (site reliability engineering)? - Red Hat
Site reliability engineering (SRE) is a software engineering approach to IT operations. SRE teams use software as a tool to manage systems, solve problems, and automate operations tasks. SRE takes the tasks that have historically been done by operations teams, often manually, and instead gives them to engineers or ops teams who use software and automation to solve problems and manage production systems.

SRE is a valuable practice when creating scalable and highly reliable software systems. It helps you manage large systems through code, which is more scalable and sustainable for sysadmins managing thousands or hundreds of thousands of machines.

The concept of site reliability engineering comes from the Google engineering team and is credited to Ben Treynor Sloss.
SRE helps teams find a balance between releasing new features and making sure that they are reliable for users. Standardization and automation are two important components of the SRE model. Site reliability engineers should always be looking for ways to enhance and automate operations tasks. In this way, SRE helps improve the reliability of a system today, while also improving it as it grows over time. SRE also supports teams that are moving from a traditional approach to IT operations to a cloud-native approach.

A site reliability engineer is a unique role that requires either a background as a software developer with additional operations experience, or a background as a sysadmin or in an IT operations role with software development skills.

SRE teams are responsible for how code is deployed, configured, and monitored, as well as the availability, latency, change management, emergency response, and capacity management of services in production. Site reliability engineering helps teams determine which new features can be launched, and when, by using service-level agreements (SLAs) to define the required reliability of the system through service-level indicators (SLIs) and service-level objectives (SLOs).

An SLI is a defined measure of a specific aspect of the service level being provided. Key SLIs include request latency, availability, error rate, and system throughput. An SLO is a target value or range of values for a service level, as measured by an SLI. The SLO for the required system reliability is then set according to how much downtime is agreed upon as acceptable. That acceptable level of failure (100% minus the SLO) is referred to as the error budget: the maximum allowable threshold for errors and outages.

With SRE, 100% reliability is not expected; failure is planned for and accepted. The development team is able to "spend" the error budget when releasing a new feature. Using the SLO and error budget, the development team can determine whether or not a product or service can launch. If a service is running within the error budget, the development team can launch whenever they want; if the system currently has too many errors, or goes down for longer than the error budget allows, no new launches can take place until the errors are back within budget.
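To make the relationship between SLIs, SLOs, and the error budget concrete, here is a minimal Python sketch. It assumes a request-based availability SLI (successful requests divided by total requests over a window) and a 99.9% SLO; the `Window` type, the helper names, and the sample counts are hypothetical illustrations, not part of any Red Hat or Google tool.

```python
from dataclasses import dataclass


@dataclass
class Window:
    """Request counts observed over one SLO window (hypothetical example data)."""
    total_requests: int
    failed_requests: int


def failure_rate(window: Window) -> float:
    """Observed fraction of failed requests; 0.0 if there was no traffic."""
    if window.total_requests == 0:
        return 0.0
    return window.failed_requests / window.total_requests


def availability_sli(window: Window) -> float:
    """Availability SLI: fraction of requests that succeeded."""
    return 1.0 - failure_rate(window)


def allowed_downtime_minutes(slo: float, days: int) -> float:
    """Downtime the error budget permits over `days`, if the SLO is time-based."""
    return (1.0 - slo) * days * 24 * 60


def error_budget_remaining(window: Window, slo: float) -> float:
    """Share of the error budget left: 1.0 = untouched, 0.0 = used up, <0 = overspent.

    The error budget is the failure rate the SLO tolerates, i.e. 1 - slo.
    """
    budget = 1.0 - slo
    return (budget - failure_rate(window)) / budget


def can_launch(window: Window, slo: float) -> bool:
    """Launch gate: new releases are allowed only while some budget remains."""
    return error_budget_remaining(window, slo) > 0.0


if __name__ == "__main__":
    slo = 0.999  # assumed 99.9% availability SLO
    month = Window(total_requests=1_000_000, failed_requests=400)
    print(round(availability_sli(month), 4))             # 0.9996
    print(round(error_budget_remaining(month, slo), 2))  # 0.6
    print(can_launch(month, slo))                        # True
    print(round(allowed_downtime_minutes(slo, 30), 1))   # 43.2
```

With 400 failures out of a million requests, 40% of the 0.1% budget is spent, so launches may continue; at 1,500 failures the budget is overspent and `can_launch` returns `False`. In practice the counts would come from a monitoring system, and teams often alert on how fast the budget is burning rather than on a single snapshot; both refinements are outside this sketch.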
The development team conducts automated operations tests to demonstrate reliability.

Site reliability engineers split their time between operations tasks and project work. According to SRE best practices from Google, a site reliability engineer should spend at most 50% of their time on operations, and this share should be monitored to ensure they don't go over. The rest of their time should be spent on development tasks like creating new features, scaling the system, and implementing automation. Excess operational work and poorly performing services can be handed back to the development team to run, rather than having the site reliability engineer spend too much time operating an application or service.

Automation is an important part of the site reliability engineer's role. If they are dealing with a problem repeatedly, they will automate a solution. This also helps keep operations work at or below half of their workload. Maintaining the balance between operations and development work is a key component of SRE.

DevOps is an approach to culture, automation, and platform design intended to deliver increased business value and responsiveness through rapid, high-quality service delivery. SRE can be considered an implementation of DevOps. Like DevOps, SRE is about team culture and relationships. Both SRE and DevOps work to bridge the gap between development and operations teams to deliver services faster.

Faster application development lifecycles, improved service quality and reliability, and reduced IT time per application developed are benefits that can be achieved by both DevOps and SRE practices. SRE differs in that it relies on site reliability engineers within the development team who also have an operations background, which removes communication and workflow problems. The site reliability engineer role itself combines the skill sets of development and operations teams by requiring an overlap in responsibilities. SRE can help DevOps teams whose developers are overwhelmed by operations tasks and need someone with more specialized ops skills. In terms of code and new features, DevOps focuses on moving through the development pipeline efficiently, while SRE focuses on balancing site reliability with creating new features.
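The 50% cap on operations work is straightforward to monitor mechanically. The Python sketch below assumes an engineer's week is logged as hours per work category; the category names, the `ops_fraction` and `over_ops_cap` helpers, and the sample hours are hypothetical and do not come from any Google or Red Hat tooling.

```python
from collections.abc import Mapping

# Hypothetical categories counted as operations work; everything else is project work.
OPS_CATEGORIES = frozenset({"on_call", "tickets", "manual_ops"})
OPS_CAP = 0.5  # at most 50% of an SRE's time on operations


def ops_fraction(hours: Mapping[str, float]) -> float:
    """Share of logged hours spent on operations work (0.0 if nothing is logged)."""
    total = sum(hours.values())
    if total == 0:
        return 0.0
    ops = sum(h for category, h in hours.items() if category in OPS_CATEGORIES)
    return ops / total


def over_ops_cap(hours: Mapping[str, float], cap: float = OPS_CAP) -> bool:
    """True when operations work exceeds the cap, signalling that excess work
    should be automated away or handed back to the development team."""
    return ops_fraction(hours) > cap


if __name__ == "__main__":
    week = {"on_call": 12, "tickets": 10, "manual_ops": 4, "automation": 10, "features": 4}
    print(ops_fraction(week))  # 0.65
    print(over_ops_cap(week))  # True
```

An exact 50/50 split is treated as within the cap, matching the "maximum of 50%" wording; a real setup would pull these hours from ticketing or on-call records rather than a hand-written dictionary.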
Modern application platforms based on container technology, Kubernetes, and microservices are critical to DevOps practices, helping deliver secure and innovative software services.

SRE relies on automation of routine operational tasks and standardization across an app's lifecycle. Linux® containers give your team the underlying technology needed for a cloud-native development style. Containers support a unified environment for development, delivery, integration, and automation. And Kubernetes is the modern way to automate Linux container operations: it helps you manage clusters running Linux containers across public, private, or hybrid clouds. With the right platform, you can best take advantage of the culture and process changes you've implemented. Red Hat® OpenShift® is the enterprise-ready Kubernetes platform to support SRE initiatives.