Writing Runbook Documentation When You're An SRE


Tips and tricks for writing effective runbook documentation when you aren’t a technical writer

Taylor Barnett · Jan 30th, 2020

The sad reality is, no one actually wants to read your runbook documentation. Engineers who get paged while on-call want to mitigate and resolve an incident as fast as possible, and move on. Nonetheless, runbooks, sometimes called playbooks, are necessary. As The Site Reliability Workbook says, playbooks “reduce stress, the mean time to repair (MTTR), and the risk of human error.”

Often I have found that engineers don’t want to write documentation for two main reasons: there isn’t an incentive structure for doing the work, and they are unsure of how to write good documentation. Focusing on the latter, while no runbook will be “a substitute for smart engineers able to think on the fly,” as the Site Reliability Engineering: How Google Runs Production Systems book says, “clear and thorough troubleshooting steps and tips are valuable when responding to a high-stakes or time-sensitive page.”

Unlike the tons of content for engineers on how to write good code, there’s a lot less for documentation, especially runbooks. Whether your team is practicing DevOps or traditional IT operations, this blog post is focused on helping Site Reliability Engineers (SREs) and other engineers who are involved in on-call engineering create clearer and more effective runbook documentation.

Runbook Templates

Blank pages are no fun. Using a template can be beneficial because starting from a blank document is incredibly hard. A template gives you an outline to start out with. It’s guidance on how to get started, which is the hardest part when writing.

What template you use for your runbooks heavily depends on your team. It’s essential to get buy-in from the team; otherwise, the team won’t use it. If there’s a section that a majority of the team doesn’t find useful, they are unlikely to fill it out.

I’m not going to recommend one template to rule them all, but I can recommend some templates that will hopefully inspire you. I chose these because I feel they have the right balance of information. It’s easy for the template to grow to be very large and daunting for any engineer to fill out. Check out these examples:

- "Why SRE Documents Matter" by Shylaja Nukala and Vivek Rau, ACM Queue November/December 2019 issue (scroll to the bottom for the templates)
- Runbook Template from "Tackling Alert Fatigue" by Caitie McCaffrey, Monitorama 2016
- "Building a Better Ops Runbook" by Shawn Stafford

The one thing I do recommend is that alert names map to the runbook name. This can be very helpful for making your runbooks discoverable. It can also help you evaluate the runbook coverage you have for your on-call team.

The Curse of Knowledge

The Curse of Knowledge is a cognitive bias that occurs when someone is communicating with others and unknowingly assumes the level of knowledge of the people they are communicating with. As we progress in our fields, we gain more experience, and as this happens, it becomes harder to recreate a state of mind without this new knowledge. It is a significant barrier to expressing empathy in documentation.

The ramifications of the Curse of Knowledge can be pretty harmful. For example, it can cause us to leave out whole steps in step-by-step instructions, like needing to install a particular piece of software or script. It can also lead to oversimplifying things and using words such as “simply,” “easy,” “just,” and other words that can vary based on experience level.

So, what’s the solution? Remove those words from your documentation. At best, they don’t help anyone. At worst, they are demeaning when you are struggling in an incident.

Other solutions include making sure people at all levels who might be using the runbooks have a chance to review and catch anything that might have been missed. To do this effectively, though, you need to have a collaborative environment where someone feels comfortable speaking up on something that feels left out or is confusing. The best part of doing this work is that you are working towards more empathetic documentation.

SRE Documentation Glossaries

Glossaries can be helpful for a few reasons:

- Glossaries help you repeat yourself less. When you can refer to a definition with a linked explanation, you just saved yourself time and words.
- Glossaries make descriptions more consistent. If something is explained in five different ways, it can get confusing.
- Glossaries allow a runbook to be more easily used by engineers with different levels of experience. By referencing a glossary in your runbook, you allow someone newer to the on-call rotation to get the explanation of concepts or terms they need. For more experienced on-call engineers, you remove extraneous information from the runbook.

Also, make sure to add unique acronyms to the glossary. Some teams and organizations use unique acronyms that might not be widely known. A glossary is a great place to explain them.

Prevent Runbook Search Failure

Users quickly glance over documentation to try to find what they are looking for. Commonly, they miss critical information they are looking for because of the structure or format of the documentation, causing what I call “search failure.”

The Nielsen Norman Group has been researching how people read on the web through eye-tracking studies for years. They found that often people read in F-shape patterns. The two implications they pointed out from this pattern are that the “first lines of text on a page receive more gazes than subsequent lines of text on the same page” and the “first few words on the left of each line of text receive more fixations than subsequent words on the same line.”

So, what does this mean for your runbook documentation? Your runbook templates must include a section at the top to describe in one sentence the intent of the runbook. This can help an engineer quickly confirm if they are looking at the right information.

Also, you should only have one step, command, or instruction per paragraph or list item. It will make it easier for readers not to miss a step. Along with this, shorter sentences reduce the chances of search failure. Long, drawn-out paragraphs and sentences often get glanced over, so make sure to break up different information into new sentences, paragraphs, and list items.

Readable Runbook Steps

Often paragraphs in a runbook can become more readable if they are turned into a bulleted list. If order matters, make sure to number the list items, turning it into a numbered list of steps (e.g., 1, 2, 3). It makes it easier to follow and to reference. It can prevent readers from skipping steps during incidents when treated as a checklist.

Even a basic three-item list in a sentence can be improved. For example, quickly read the sentence below from another Transposit blog post:

This blog post is the second in a series of a few posts where I’ll cover how Transposit uses the OpenAPI Specification, AWS APIs, Boto, and why we had to support them differently with OpenAPI, and how we created OpenAPI extensions and what we learned from this process.

And now quickly read the bulleted list below:

This blog post is the second in a series of a few posts where I’ll cover:

- How Transposit uses the OpenAPI Specification
- AWS APIs, Boto, and why we had to support them differently with OpenAPI
- How we created OpenAPI extensions and what we learned from this process

Which did you get more information out of? (Most likely, the latter.) Anytime there is a list in a sentence, turn it into a bulleted list. It will help prevent search failure and help readers of your list absorb the information.

Lastly, start sentences with an imperative verb, also known as a command, in your lists. For example, words like “download,” “configure,” “restart,” and “open.” This helps readers since their eyes will likely only scan the first few words on the left of each line of text.

Code in Runbooks

If you have ever copied and pasted anything from some documentation to the command line, you’ve probably encountered some “command not found” problem. Whether it is documentation including the $ or a library that should have been installed first, it is vital to give the user context.

For example, instead of including the $ to represent using a command on your command line, instruct the user where to use the command instead: “In your terminal…” Or, if there are some installation prerequisites, describe them in the runbook or add a link to them.

Lastly, if a script is longer than a single line, treat it like code, and check it into a repository to be source controlled and potentially tested. This will ensure the quality is maintained and that incorrect or even dangerous scripts don’t get used during the response to an incident.

Writing Runbook Documentation is Hard

Hopefully, these tips and tricks will help you when you are writing a new or updating an existing runbook. Like learning any new skill, it can be hard and takes practice. Having on-call teammates of all skill levels help you review your runbooks can be very helpful to progress your skills. Now go improve some runbooks!

P.S. Check out the blog post that my teammate, Dan, wrote about what makes a good runbook.
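
Two of the tips above lend themselves to a quick automated check: removing words like “simply,” “easy,” and “just,” and not prefixing commands with a $. The Python sketch below is one possible way to flag both in a runbook's text. It is an illustration only, not a tool from the post: the function name `lint_runbook`, the word list, and the messages are all assumptions, and a real team would likely tune them.

```python
import re

# Hypothetical word list based on the examples in the post; extend as needed.
MINIMIZING_WORDS = ("simply", "easy", "just")

# Match any of the listed words as a whole word, ignoring case.
WORD_PATTERN = re.compile(
    r"\b(" + "|".join(MINIMIZING_WORDS) + r")\b", re.IGNORECASE
)


def lint_runbook(text):
    """Return a list of (line_number, message) findings for runbook text."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Flag every minimizing word found on the line.
        for match in WORD_PATTERN.finditer(line):
            word = match.group(1).lower()
            findings.append((number, f"avoid the word '{word}'"))
        # Flag commands written with a leading shell prompt.
        if line.lstrip().startswith("$ "):
            findings.append(
                (number, "drop the leading '$' and say where to run the command")
            )
    return findings


if __name__ == "__main__":
    sample = "Simply restart the service.\n$ systemctl restart web\n"
    for number, message in lint_runbook(sample):
        print(f"line {number}: {message}")
```

A check like this only catches wording. It is not a replacement for having teammates at different experience levels review the runbook.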


