Security Automation Lessons from Site Reliability Engineering ...
Achieving Autonomic Security Operations: Automation as a Force Multiplier

Anton Chuvakin, Head of Solutions Strategy
Iman Ghanizada, Global Head of Autonomic Security Operations
February 10, 2022

As we discussed in “Achieving Autonomic Security Operations: Reducing toil”, your Security Operations Center (SOC) can learn lessons from Site Reliability Engineering (SRE). This means that applying software engineering practices to security operations challenges can radically improve an organization’s security. In this post, we discuss how you can leverage another core principle of SRE, automation, as a means to achieve better outcomes in your SOC.

Let’s make it very clear: a fully automated Security Operations Center that requires no human involvement is not possible today. The essence of Autonomic Security Operations is the belief that organizations need their threat management functions to scale, involuntarily, on-demand and that such growth needs to be faster than the growth of the threats and the assets to be covered. Once your SOC is able to stay ahead of the rise in assets, threats, and complexities, you’ve effectively achieved an “Autonomic” state of existence within your organization. This has already happened on the build side of technology: think about how the adoption of DevOps has allowed software teams to become “autonomic”, where businesses are able to enter new markets and build new products on-demand without worrying about the elasticity of their technology teams. These functions certainly are not fully automated, and so these parallels draw us to the same theory: the way we drive to this aspirational state starts with the ideology we align to.

Hence, we defined Autonomic Security Operations as a combination of philosophies, practices, and tools that improve an organization's ability to withstand security attacks through an adaptive, agile, and highly automated approach to threat management.

Let’s review specific examples and principles that your SOC can learn from Site Reliability Engineering.

Naturally, building towards an autonomic approach to threat management means that implementing automation where possible is a principle that every practitioner in the SOC should aspire to. Whether you’re a security engineer, an analyst, an incident responder, or an architect, creative ways to automate operations (starting with the mundane tasks first) can be a force multiplier that can reduce the burden of operations on your team. This is parallel to how things stand in the domain of SRE. However, the SRE book also adds that “multiplying force does not naturally change the accuracy of where that force is applied.” This reminds us that automating a broken process often makes it more broken, but also that automating something that isn't game-changing or systemic for a SOC would only make you slightly better, if that.

In practice, automation is applied across many different areas and various roles in the SOC. Examples span the range of building playbooks for response activities (with tools like Siemplify or other SOAR products), data enrichment, linking your processes to managed security services, and endless other workflows in your SOC. Any time you do something repetitively, ask yourself whether this can be creatively automated.

It is often suggested that the value of automation is solely about saving time. In fact, our SRE peers remind us that “automation provides more than just time saving, so it’s worth implementing in more cases than a simple time-expended versus time-saved calculation might suggest.” Both SOC practitioners and SREs agree that consistency is also a core element of automation, as is scaling (“scale is an obvious motivation for automation”).
Specifically, consistency also allows you to solve for defects in the process. While defects often align to reliability, let’s consider that reliability in security is both system reliability as well as signal reliability. Your tools and practices should minimize downtime, minimize noise, and maximize true positives, thereby increasing system and signal reliability so your team can focus on the continual resolution and improvement of defenses.

Also, consider that some work in the SOC is manual-by-design, such as threat hunting, which is an analyst-centric process, even if aided by tools. Ideally, once a threat is identified and neutralized during a hunt, the data behind the adversary’s tactics and techniques should be leveraged to improve detection use cases, feeding the automation wherever possible.

Speed still comes up a lot in SRE discussions of automation; after all, “humans don’t usually react as fast as machines.” In the past, sub-second speed mattered little in security, especially in the day and age of 200+ day response timelines. Today, ransomware has changed things and speed does matter. Note that speed also doesn’t necessarily have to be only correlated to detection speed and reducing the Mean Time to Detect. Let’s say you detect something that may be affecting a production workload, and you have automation that sends a request with actions to the project owner. If the affected team does not respond and you need to “break glass”, the speed of your response can determine what the outcome will be. So look at how you can automate away the performance bottlenecks.

As a result, the key lesson for SOC automation from SRE is that “the factors of consistency, quickness, and reliability dominate most conversations about the trade-offs of performing automation.” These lessons resonate when working to make your SOC scale faster than the threats you face and also faster than your IT assets grow.

Further, we picked up a particular new insight from the SRE book, namely that automation separates the operation from an operator (“Decoupling operator from operation is very powerful.”). Specifically, “once you have encapsulated some task in automation, anyone can execute the task.” What does this solve? Some of the talent shortage problems in your SOC! This again gives us a chance to scale faster than the growth of threats and assets.

Here is another very useful reminder for your SOC from the world of SRE: “automatic systems also provide a platform.” What does it mean? That script you wrote is not a platform, even if it automates some minor task. The way we think about it, the platform is a programmable entity, a base to develop other important things (a good SOAR is a platform, for example). This means you have a chance to make automation of your activities more systematic.

However, there is a slightly paradoxical consequence here: “A platform also centralizes mistakes. In other words, a bug fixed in the code will be fixed there once and forever”. Think about it for a second. This is not about a SOC being a great place to come and make mistakes; it is about the fact that you go to one place to look for mistakes, rather than chase them over 50 tools and 200 regional offices. Centralizing mistakes is a great way to accelerate continuous improvement.

Finally, “automation as a platform” delivers helpful metrics: “a platform can export metrics about its performance, or otherwise allow you to discover details about your process you didn’t know previously.” For SOC practitioners who follow the Autonomic approach, the quality of your service-level objectives (SLOs) depends on the data you have on your systems.

Our SRE colleagues have also pointed out a few negatives of automation as well as its risks. The very obvious topic that every SOC team highlights is that automated responses can sometimes result in disastrous outcomes, if not planned correctly. This can happen in both IT operations and Security. The SRE book describes examples where many production systems at a major technology company were deleted by automation, reimaged straight to demagnetized dust with enviable scale and effectiveness. This is a possibility with automation, so it is important to have peer reviews, QA & testing, highly descriptive playbooks, and other processes in place when developing automated responses.

Also in SRE, “Automation needs to be careful about relying on implicit "safety" signals.” In the SOC, a classic example would be blocking access based on badness, without checking for business criticality. We imply that it is safe to block access, but do we have an explicit “this machine is OK to auto-block” list? Is this safe to shut down? Is this safe to block access to? Using explicit safety signals for automation is a useful insight to implement in a SOC.

We have learned about other challenges that are relevant to the world of security operations. For example, some SOAR users complain that when the security tools change, their SOAR systems don't always follow quickly enough. This is a well-known problem in the world of SRE: automation “being maintained separately from the core system therefore suffers from “bit rot,” i.e., not changing when the underlying systems change.”

Another lesson that we are starting to see in many security operations centers is that those automations that are infrequent, such as playbooks run upon seeing rare attack indicators, are difficult to test. “Automation that is crucial but only executed at infrequent intervals and therefore difficult to test is often particularly fragile because of the extended feedback cycle.”
It is easy to refine an efficient playbook that runs 10 times a day, but it's much harder to test and improve a playbook that is aimed at a particular type of advanced attack that may happen this year… or not. How do we fix that? By more automation: test automation and simulations in this case.

Across most leading security operations centers, and deep in the SRE book, is another key lesson: “The most functional tools are usually written by those who use them.” This has been discussed in many detection engineering articles, but it is definitely not common in many mainstream SOCs. This is why in our ASO workshops we explain that “SOC analysts” and “detection engineers” need to converge and become one.

Earlier we discussed the belief of an Autonomic Security Operations practice, but we also have learnings from SRE that chart a path of arriving at the autonomic system itself (that does not need extraneous automation) by starting from a manual approach and then evolving to automation. Here is an automation evolution example from the SRE book:

“Operator-triggered manual action (no automation)
Operator-written, system-specific automation
Externally maintained generic automation
Internally maintained, system-specific automation
Autonomous systems that need no human intervention”

While some of the above is not obviously related to SOC, there are obvious parallels.

While it is clear that automation is no magic bullet and certainly is not a single tool to buy, the sum of the whole is what makes a system secure, efficient, reliable, and allows businesses to sleep at night. Consider inspiring your team to do brainstorming exercises and workshops on key areas they can automate within their roles. This shift of incentivizing your staff to think creatively is how DevOps and SRE are so widely successful, and if we want to get ahead of the complexity of modern technology footprints, automation is a core principle of success.

Related posts:
“Achieving Autonomic Security Operations: Reducing toil”
“Modernizing SOC ... Introducing Autonomic Security Operations”
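
To make the point about explicit safety signals more concrete, here is a minimal Python sketch of an auto-block decision that refuses to act on an implied "it is probably safe" assumption. Everything in it is hypothetical and for illustration only: the asset names, the `AUTO_BLOCK_APPROVED` list, the severity labels, and the three outcomes are assumptions, not part of any SOAR product or of the SRE book.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical explicit safety signal: assets whose owners have agreed,
# in advance, that automation may block them without asking a human.
AUTO_BLOCK_APPROVED = frozenset({"kiosk-017", "lab-vm-03"})


@dataclass(frozen=True)
class Detection:
    asset_id: str
    severity: str  # assumed labels: "low" or "high"


def decide_response(
    detection: Detection,
    approved: frozenset[str] = AUTO_BLOCK_APPROVED,
) -> str:
    """Pick a response; auto-block only when the asset is explicitly approved."""
    if detection.severity != "high":
        # Low-severity findings never trigger a blocking action here.
        return "ticket"
    if detection.asset_id in approved:
        # Explicit "this machine is OK to auto-block" signal is present.
        return "auto_block"
    # Badness alone is not a safety signal: hand the decision to a person.
    return "escalate_to_human"


if __name__ == "__main__":
    print(decide_response(Detection("kiosk-017", "high")))
    print(decide_response(Detection("payments-db-01", "high")))
```

The design choice worth noting is the default: an asset that is missing from the list is escalated rather than blocked, so a gap in the inventory fails toward human review instead of toward an outage.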