What is SRE (site reliability engineering)? - Red Hat

2024-09-21


Site reliability engineering (SRE) is a software engineering approach to IT operations. SRE teams use software as a tool to manage systems, solve problems, and automate operations tasks. SRE takes the tasks that have historically been done by operations teams, often manually, and instead gives them to engineers or ops teams who use software and automation to solve problems and manage production systems.

SRE is a valuable practice when creating scalable and highly reliable software systems. It helps you manage large systems through code, which is more scalable and sustainable for sysadmins managing thousands or hundreds of thousands of machines.

The concept of site reliability engineering comes from the Google engineering team and is credited to Ben Treynor Sloss.
SRE helps teams find a balance between releasing new features and making sure that they are reliable for users. Standardization and automation are 2 important components of the SRE model. Site reliability engineers should always be looking for ways to enhance and automate operations tasks. In this way, SRE helps to improve the reliability of a system today, while also improving it as it grows over time.

SRE supports teams who are moving from a traditional approach to IT operations to a cloud-native approach.

A site reliability engineer is a unique role that requires either a background as a software developer with additional operations experience, or as a sysadmin or in an IT operations role that also has software development skills.

SRE teams are responsible for how code is deployed, configured, and monitored, as well as the availability, latency, change management, emergency response, and capacity management of services in production.

Site reliability engineering helps teams to determine what new features can be launched and when by using service-level agreements (SLAs) to define the required reliability of the system through service-level indicators (SLIs) and service-level objectives (SLOs).

An SLI is a defined measure of specific aspects of provided service levels. Key SLIs include request latency, availability, error rate, and system throughput. An SLO is based on the target value or range for a specified service level based on the SLI. An SLO for the required system reliability is then determined based on the downtime agreed upon as acceptable. This downtime level is referred to as an error budget, the maximum allowable threshold for errors and outages.

With SRE, 100% reliability is not expected; failure is planned for and accepted.

The development team is able to "spend" the error budget when releasing a new feature. Using the SLO and error budget, the development team can determine whether or not a product or service is able to launch based on the available error budget. If a service is running within the error budget, then the development team can launch whenever they want, but if the system currently has too many errors or goes down for longer than the error budget allows then no new launches can take place until the errors are within budget.
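To make the relationship between SLI, SLO, and error budget concrete, here is a minimal Python sketch. It assumes a request-based availability SLO. The class name `ErrorBudget`, its fields, and the helper `allowed_downtime_minutes` are hypothetical names chosen for illustration; they do not come from the article or from any Red Hat or Google tooling.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    """Illustrative error budget for a request-based availability SLO."""

    slo_target: float     # assumed target, e.g. 0.999 means 99.9% of requests succeed
    total_requests: int   # requests observed during the SLO window
    failed_requests: int  # requests counted as errors during the window

    @property
    def sli(self) -> float:
        """Availability SLI: the fraction of requests that succeeded."""
        if self.total_requests == 0:
            return 1.0  # no traffic, so nothing has failed
        return 1 - self.failed_requests / self.total_requests

    @property
    def allowed_failures(self) -> float:
        """Total error budget: failures the SLO tolerates in this window."""
        return (1 - self.slo_target) * self.total_requests

    @property
    def remaining(self) -> float:
        """Budget left to 'spend'; negative means the budget is exhausted."""
        return self.allowed_failures - self.failed_requests

    def can_launch(self) -> bool:
        """Launches are allowed only while some error budget remains."""
        return self.remaining > 0


def allowed_downtime_minutes(slo_target: float, window_days: int) -> float:
    """Time-based view: minutes of downtime an availability SLO permits."""
    return (1 - slo_target) * window_days * 24 * 60


if __name__ == "__main__":
    budget = ErrorBudget(slo_target=0.999, total_requests=1_000_000, failed_requests=400)
    print(f"SLI: {budget.sli:.4%}, can launch: {budget.can_launch()}")
    # A 99.9% SLO over a 30-day window allows roughly 43.2 minutes of downtime.
    print(f"{allowed_downtime_minutes(0.999, 30):.1f} minutes")
```

In this sketch the launch decision is a simple threshold. A real team would likely also look at how quickly the budget is being consumed, which the article does not cover.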
The development team conducts automated operations tests to demonstrate reliability.

Site reliability engineers split their time between operations tasks and project work. According to SRE best practices from Google, a site reliability engineer can only spend a maximum of 50% of their time on operations, which should be monitored to ensure they don't go over.

The rest of the time should be spent on development tasks like creating new features, scaling the system, and implementing automation. Excess operational work and poorly performing services can be redirected back to the dev team to run instead of the site reliability engineer spending too much time on the operations of an application or service.

Automation is an important part of the site reliability engineer's role. If they are dealing with a problem repeatedly then they will automate a solution. This also helps ensure that operations work remains at half of their workload.

Maintaining the balance between operations and development work is a key component of SRE.

DevOps is an approach to culture, automation, and platform design intended to deliver increased business value and responsiveness through rapid, high-quality service delivery. SRE can be considered an implementation of DevOps. Like DevOps, SRE is about team culture and relationships. Both SRE and DevOps work to bridge the gap between development and operations teams to deliver services faster.

Faster application development life cycles, improved service quality and reliability, and reduced IT time per application developed are benefits that can be achieved by both DevOps and SRE practices. SRE is different because it relies on site reliability engineers within the development team who also have an operations background to remove communication and workflow problems. The site reliability engineer role itself combines the skill set of dev teams and operations teams by requiring an overlap in responsibilities.

SRE can help DevOps teams whose developers are overwhelmed by operations tasks and need someone with more specialized ops skills.

In terms of code and new features, DevOps focuses on moving through the development pipeline efficiently, while SRE is focused on balancing site reliability with creating new features.
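Returning to the 50% guideline on operations work: the article says this share "should be monitored" but does not say how. One possible approach, sketched below in Python, is to total logged hours per engineer and flag anyone above the cap. The log format of `(engineer, category, hours)` tuples, the category labels, and the function names `ops_share` and `over_cap` are all assumptions made for this example.

```python
from collections import defaultdict

OPS_CAP = 0.5  # the 50% ceiling the article attributes to Google's SRE practice


def ops_share(entries):
    """Return each engineer's fraction of logged hours spent on operations.

    `entries` is an iterable of (engineer, category, hours) tuples, where
    category is "ops" or "project". This log format is hypothetical.
    """
    totals = defaultdict(float)
    ops = defaultdict(float)
    for engineer, category, hours in entries:
        totals[engineer] += hours
        if category == "ops":
            ops[engineer] += hours
    # Skip engineers with no logged hours to avoid dividing by zero.
    return {name: ops[name] / total for name, total in totals.items() if total > 0}


def over_cap(entries, cap=OPS_CAP):
    """List engineers whose operations share exceeds the cap, sorted by name."""
    return sorted(name for name, share in ops_share(entries).items() if share > cap)


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    week = [
        ("ana", "ops", 12), ("ana", "project", 28),
        ("raj", "ops", 26), ("raj", "project", 14),
    ]
    print(over_cap(week))
```

An engineer flagged this way is a candidate for the remedies the article describes: automating the recurring task, or handing excess operational work back to the development team.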
Modern application platforms based on container technology, Kubernetes and microservices are critical to DevOps practices, helping deliver secure and innovative software services.

SRE relies on automating routine operational tasks and standardization across an app's lifecycle. Linux® containers give your team the underlying technology needed for a cloud-native development style. Containers support a unified environment for development, delivery, integration, and automation.

And Kubernetes is the modern way to automate Linux container operations. Kubernetes helps you easily and efficiently manage clusters running Linux containers across public, private, or hybrid clouds.

With the right platform, you can best take advantage of the culture and process changes you've implemented. Red Hat® OpenShift® is the enterprise-ready Kubernetes platform to support SRE initiatives.

If you want to take full advantage of the agility and responsiveness of DevOps, IT security must play a role in the full life cycle of your apps. CI/CD introduces ongoing automation and continuous monitoring throughout the lifecycle of apps, from integration and testing phases to delivery and deployment. A DevOps engineer has a unique combination of skills and expertise that enables collaboration, innovation, and cultural shifts within an organization.

