Day 7 -- Pattern Matching

CONTENTS

Introduction
The Match Operators
Match-Operator Precedence
Special Characters in Patterns
The + Character
The [] Special Characters
The * and ? Special Characters
Escape Sequences for Special Characters
Matching Any Letter or Number
Anchoring Patterns
Variable Substitution in Patterns
Excluding Alternatives
Character-Range Escape Sequences
Matching Any Character
Matching a Specified Number of Occurrences
Specifying Choices
Reusing Portions of Patterns
Pattern-Sequence Scalar Variables
Special-Character Precedence
Specifying a Different Pattern Delimiter
Pattern-Matching Options
Matching All Possible Patterns
Ignoring Case
Treating the String as Multiple Lines
Evaluating a Pattern Only Once
Treating the String as a Single Line
Using White Space in Patterns
The Substitution Operator
Using Pattern-Sequence Variables in Substitutions
Options for the Substitution Operator
Evaluating a Pattern Only Once
Treating the String as Single or Multiple Lines
Using White Space in Patterns
Specifying a Different Delimiter
The Translation Operator
Options for the Translation Operator
Extended Pattern-Matching
Parenthesizing Without Saving in Memory
Embedding Pattern Options
Positive and Negative Look-Ahead
Pattern Comments
Summary
Q&A
Workshop
Quiz
Exercises

This lesson describes the pattern-matching features of Perl. Today, you learn about the following:

How pattern matching works
The pattern-matching operators
Special characters supported in pattern matching
Pattern-matching options
Pattern substitution
Translation
Extended pattern-matching features

Introduction

A pattern is a sequence of characters to be searched for in a character string. In Perl, patterns are normally enclosed in slash characters:

    /def/

This represents the pattern def.

If the pattern is found, a match occurs. For example, if you search the string redefine for the pattern /def/, the pattern matches the third, fourth, and fifth characters.
    redefine
      ^^^

You already have seen a simple example of pattern matching in the library function split.

    @array = split(/ /, $line);

Here the pattern / / matches a single space, which splits a line into words.

The Match Operators

Perl defines special operators that test whether a particular pattern appears in a character string.

The =~ operator tests whether a pattern is matched, as shown in the following:

    $result = $var =~ /abc/;

The result of the =~ operation is one of the following:

A nonzero value (1), or true, if the pattern is found in the string
A false value (the empty string, which is treated as 0 in a numeric context) if the pattern is not matched

In this example, the value stored in the scalar variable $var is searched for the pattern abc. If abc is found, $result is assigned a nonzero value; otherwise, $result is set to a false value.

The !~ operator is similar to =~, except that it checks whether a pattern is not matched.

    $result = $var !~ /abc/;

Here, $result is set to a false value if abc appears in the string assigned to $var, and to a nonzero value if abc is not found.

Because =~ and !~ produce either true or false as their result, these operators are ideally suited for use in conditional expressions. Listing 7.1 is a simple program that uses the =~ operator to test whether a particular sequence of characters exists in a character string.

Listing 7.1. A program that illustrates the use of the matching operator.

    1:  #!/usr/local/bin/perl
    2:
    3:  print("Ask me a question politely:\n");
    4:  $question = <STDIN>;
    5:  if ($question =~ /please/) {
    6:          print("Thank you for being polite!\n");
    7:  } else {
    8:          print("That was not very polite!\n");
    9:  }

    $ program7_1
    Ask me a question politely:
    May I have a glass of water, please?
    Thank you for being polite!
    $

Line 5 is an example of the use of the match operator =~ in a conditional expression. The following expression is true if the value stored in $question contains the word please, and it is false if it does not:

    $question =~ /please/

Match-Operator Precedence

Like all operators, the match operators have a defined precedence. By definition, the =~ and !~ operators have higher precedence than multiplication and division, and lower precedence than the exponentiation operator **.

For a complete list of Perl operators and their precedence, see Day 4, "More Operators."
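The following short script is a sketch (not one of the book's listings) of the values that =~ and !~ produce, and of how the precedence of =~ over * affects an expression. It assumes Perl 5 with the strict and warnings pragmas; the variable names are illustrative only.

```perl
use strict;
use warnings;

my $var = "xyzabcxyz";

# =~ yields a true value (1) when the pattern is found ...
my $found    = $var =~ /abc/;
# ... and a false value ("" as a string, 0 as a number) when it is not.
my $notfound = $var =~ /def/;

# !~ reverses the test: it is true when the pattern is NOT found.
my $absent   = $var !~ /def/;

# =~ binds more tightly than *, so this is 2 * ($var =~ /abc/), i.e. 2 * 1.
my $product  = 2 * $var =~ /abc/;

printf("found=%d notfound=%d absent=%d product=%d\n",
       $found, $notfound, $absent, $product);
```

Because the match is evaluated before the multiplication, no parentheses are needed around $var =~ /abc/ in the last assignment.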
Special Characters in Patterns

Perl supports a variety of special characters inside patterns, which enables you to match any of a number of character strings. These special characters are what make patterns useful.

The + Character

The special character + means "one or more of the preceding characters." For example, the pattern /de+f/ matches any of the following:

    def
    deef
    deeef
    deeeeeeef

NOTE
Patterns containing + always try to match as many characters as possible. For example, if the pattern

    /ab+/

is searching in the string

    abbc

it matches abb, not ab.

The + special character makes it possible to define a better way to split lines into words. So far, the sample programs you have seen have used

    @words = split(/ /, $line);

to break an input line into words. This works well if there is exactly one space between words. However, if an input line contains more than one space between words, as in

    Here's  multiple  spaces.

the call to split produces the following list:

    ("Here's", "", "multiple", "", "spaces.")

The pattern / / tells split to start a new word whenever it sees a space. Because there are two spaces between each word, split starts a word when it sees the first space, and then starts another word when it sees the second space. This means that there are now "empty words" in the line.

The + special character gets around this problem. Suppose the call to split is changed to this:

    @array = split(/ +/, $line);

Because the pattern / +/ tries to match as many blank characters as possible, the line

    Here's  multiple  spaces.

produces the following list:

    ("Here's", "multiple", "spaces.")

Listing 7.2 shows how you can use the / +/ pattern to produce a count of the number of words in a file.

Listing 7.2. A word-count program that handles multiple spaces between words.

    1:  #!/usr/local/bin/perl
    2:
    3:  $wordcount = 0;
    4:  $line = <STDIN>;
    5:  while ($line ne "") {
    6:          chop($line);
    7:          @words = split(/ +/, $line);
    8:          $wordcount += @words;
    9:          $line = <STDIN>;
    10: }
    11: print("Total number of words: $wordcount\n");

    $ program7_2
    Here is some input.
    Here are some more words.
    Here is my last line.
    ^D
    Total number of words: 14
    $

This is the same word-count program you saw in Listing 5.15, with only one change: The pattern / +/ is being used to break the line into words. As you can see, this handles spaces between words properly.
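To see the difference between the two split patterns side by side, the following sketch splits the same two-spaces-between-words line both ways. The sample string and variable names are illustrative assumptions, not part of the original listings.

```perl
use strict;
use warnings;

my $line = "Here's  multiple  spaces.";    # two spaces between the words

# / / starts a new word at every single space, producing empty "words".
my @single = split(/ /, $line);

# / +/ treats each run of spaces as one separator.
my @multi = split(/ +/, $line);

print(scalar(@single), " elements: ", join("|", @single), "\n");
print(scalar(@multi),  " elements: ", join("|", @multi),  "\n");
```

The first call yields five elements, two of them empty strings; the second yields the three words you would expect.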
You might have noticed the following problems with this word-count program:

Spaces at the beginning of a line are counted as a word, because split always starts a new word when it sees a space.
Tab characters are not treated as separators: a tab surrounded by spaces is counted as a word, and words separated only by a tab are counted as one word.

For an example of the first problem, take a look at the following input line:

       This line contains leading spaces.

The call to split in line 7 breaks the preceding into the following list:

    ("", "This", "line", "contains", "leading", "spaces.")

This yields a word count of 6, not the expected 5.

There can be at most one empty word produced from a line, no matter how many leading spaces there are, because the pattern / +/ matches as many spaces as possible. Note also that the program can distinguish between lines containing words and lines that are blank or contain just spaces. If a line is blank or contains only spaces, the line

    @words = split(/ +/, $line);

assigns the empty list to @words. With this in mind, you can fix the problem of leading spaces in lines by modifying line 8 as follows:

    $wordcount += (@words > 0 && $words[0] eq "" ?
            @words - 1 : @words);

This checks for lines containing leading spaces; if a line contains leading spaces, the first "word" (which is the empty string) is not added to the word count.

To find out how to modify the program to deal with tab characters as well as spaces, see the following section.

The [] Special Characters

The [] special characters enable you to define patterns that match one of a group of alternatives. For example, the following pattern matches def or dEf:

    /d[eE]f/

You can specify as many alternatives as you like.

    /a[0123456789]c/

This matches a, followed by any digit, followed by c.

You can combine [] with + to match a sequence of characters of any length.

    /d[eE]+f/

This matches all of the following:

    def
    dEf
    deef
    dEef
    dEEEeeeEef

Any combination of E and e, in any order, is matched by [eE]+.

You can use [] and + together to modify the word-count program you've just seen to accept either tab characters or spaces. Listing 7.3 shows how you can do this.

Listing 7.3. A word-count program that handles multiple spaces and tabs between words.
    1:  #!/usr/local/bin/perl
    2:
    3:  $wordcount = 0;
    4:  $line = <STDIN>;
    5:  while ($line ne "") {
    6:          chop($line);
    7:          @words = split(/[ \t]+/, $line);
    8:          $wordcount += @words;
    9:          $line = <STDIN>;
    10: }
    11: print("Total number of words: $wordcount\n");

    $ program7_3
    Here is some input.
    Here are some more words.
    Here is my last line.
    ^D
    Total number of words: 14
    $

This program is identical to Listing 7.2, except that the pattern is now /[ \t]+/.

The \t special-character sequence represents the tab character, and this pattern matches any combination or quantity of spaces and tabs.

NOTE
Any escape sequence that is supported in double-quoted strings is supported in patterns. See Day 3, "Understanding Scalar Values," for a list of the escape sequences that are available.

The * and ? Special Characters

As you have seen, the + character matches one or more occurrences of a character. Perl also defines two other special characters that match a varying number of characters: * and ?.

The * special character matches zero or more occurrences of the preceding character. For example, the pattern

    /de*f/

matches df, def, deef, and so on.

This character can also be used with the [] special character.

    /[eE]*/

This matches the empty string as well as any combination of E or e in any order.

Be sure not to confuse the * special character with the + special character. If you use the wrong special character, you might not get the results that you want.

For example, suppose that you modify Listing 7.3 to call split as follows:

    @words = split(/[ \t]*/, $line);

This matches zero or more occurrences of the space or tab character. When you run this with the input

    a line

here's the list that is assigned to @words:

    ("a", "l", "i", "n", "e")

Because the pattern /[ \t]*/ matches on zero occurrences of the space or tab character, it matches after every character. This means that split starts a word after every character that is not a space or tab. (It skips spaces and tabs because /[ \t]*/ matches them.)

The best way to avoid problems such as this one is to use the * special character only when there is another character appearing in the pattern. Patterns such as

    /b*[c]/

never match the null string, because the matched sequence has to contain at least the character c.
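The following sketch reproduces the * pitfall and contrasts it with +, then checks the ? character against the df/def/deef examples. The input strings are illustrative assumptions chosen to mirror the text.

```perl
use strict;
use warnings;

# [ \t]* can match the empty string, so split breaks the text after every
# character (the space itself is consumed as a separator).
my @chars = split(/[ \t]*/, "a line");

# [ \t]+ needs at least one space or tab, so only real separators count.
my @words = split(/[ \t]+/, "one\ttwo  three");

# ? makes the preceding e optional: df and def match, deef does not.
my @optional = grep { /de?f/ } ("df", "def", "deef");

print(join(",", @chars), "\n");       # a,l,i,n,e
print(join(",", @words), "\n");       # one,two,three
print(join(",", @optional), "\n");    # df,def
```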
The ? character matches zero or one occurrence of the preceding character. For example, the pattern

    /de?f/

matches either df or def. Note that it does not match deef, because the ? character does not match two occurrences of a character.

Escape Sequences for Special Characters

If you want your pattern to include a character that is normally treated as a special character, precede the character with a backslash (\). For example, to check for one or more occurrences of * in a string, use the following pattern:

    /\*+/

The backslash preceding the * tells the Perl interpreter to treat the * as an ordinary character, not as the special character meaning "zero or more occurrences."

To include a backslash in a pattern, specify two backslashes:

    /\\+/

This pattern tests for one or more occurrences of \ in a string.

If you are running Perl 5, another way to tell Perl that a special character is to be treated as a normal character is to precede it with the \Q escape sequence. When the Perl interpreter sees \Q, every character following the \Q is treated as a normal character until \E is seen. This means that the pattern

    /\Q^ab*/

matches any occurrence of the string ^ab*, and the pattern

    /\Q^ab\E*/

matches ^a followed by zero or more occurrences of b.

For a complete list of special characters in patterns that require \ to be given their natural meaning, see the section titled "Special-Character Precedence," which contains a table that lists them.

TIP
In Perl, any character that is not a letter or a digit can be preceded by a backslash. If the character isn't a special character in Perl, the backslash is ignored.
If you are not sure whether a particular character is a special character, preceding it with a backslash will ensure that your pattern behaves the way you want it to.

Matching Any Letter or Number

As you have seen, the pattern

    /a[0123456789]c/

matches a, followed by any digit, followed by c. Another way of writing this is as follows:

    /a[0-9]c/

Here, the range [0-9] represents any digit between 0 and 9. This pattern matches a0c, a1c, a2c, and so on up to a9c.

Similarly, the range [a-z] matches any lowercase letter, and the range [A-Z] matches any uppercase letter. For example, the pattern

    /[A-Z][A-Z]/

matches any two uppercase letters.
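The escaping rules above can be checked with a few one-line matches. This is a sketch with made-up test strings; each hash entry records 1 if the pattern matched and 0 if it did not.

```perl
use strict;
use warnings;

# Each entry pairs a test string with one pattern; 1 = matched, 0 = not.
my %result = (
    star      => ("5*3 = 15" =~ /\*+/)      ? 1 : 0,  # literal asterisk
    backslash => ("C:\\temp" =~ /\\+/)      ? 1 : 0,  # string is C:\temp
    quoted    => ("x ^ab* y" =~ /\Q^ab*/)   ? 1 : 0,  # \Q makes ^ and * ordinary
    no_anchor => ("abbb"     =~ /\Q^ab*/)   ? 1 : 0,  # needs a literal ^, so no match
    partial   => ("^abbb"    =~ /\Q^ab\E*/) ? 1 : 0,  # after \E, * applies to b again
);

for my $name (sort keys %result) {
    printf("%-9s %d\n", $name, $result{$name});
}
```

Note that "C:\\temp" is written with two backslashes because it is a double-quoted string literal; the string itself contains only one.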
To match any uppercase letter, lowercase letter, or digit, use the following range:

    /[0-9a-zA-Z]/

Listing 7.4 provides an example of the use of ranges with the [] special characters. This program checks whether a given input line contains a legal Perl scalar, array, or file-variable name. (Note that this program handles only simple input lines. Later examples will solve this problem in a better way.)

Listing 7.4. A simple variable-name validation program.

    1:  #!/usr/local/bin/perl
    2:
    3:  print("Enter a variable name:\n");
    4:  $varname = <STDIN>;
    5:  chop($varname);
    6:  if ($varname =~ /\$[A-Za-z][_0-9a-zA-Z]*/) {
    7:          print("$varname is a legal scalar variable\n");
    8:  } elsif ($varname =~ /@[A-Za-z][_0-9a-zA-Z]*/) {
    9:          print("$varname is a legal array variable\n");
    10: } elsif ($varname =~ /[A-Za-z][_0-9a-zA-Z]*/) {
    11:         print("$varname is a legal file variable\n");
    12: } else {
    13:         print("I don't understand what $varname is.\n");
    14: }

    $ program7_4
    Enter a variable name:
    $result
    $result is a legal scalar variable
    $

Line 6 checks whether the input line contains the name of a legal scalar variable. Recall that a legal scalar variable consists of the following:

A $ character
An uppercase or lowercase letter
Zero or more letters, digits, or underscore characters

Each part of the pattern tested in line 6 corresponds to one of the conditions listed above. The first part of the pattern, \$, ensures that the pattern matches only if it begins with a $ character.

NOTE
The $ is preceded by a backslash, because $ is a special character in patterns. See the following section, "Anchoring Patterns," for more information on the $ special character.

The second part of the pattern,

    [A-Za-z]

matches exactly one uppercase or lowercase letter. The final part of the pattern,

    [_0-9a-zA-Z]*

matches zero or more underscores, digits, or letters in any order.

The patterns in line 8 and line 10 are very similar to the one in line 6. The only difference in line 8 is that the pattern there matches a string whose first character is @, not $. In line 10, this first character is omitted completely.

The pattern in line 8 corresponds to the definition of a legal array-variable name, and the pattern in line 10 corresponds to the definition of a legal file-variable name.
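As a sketch, the three tests from Listing 7.4 can be wrapped in a subroutine and run on several names at once. The subroutine name classify and the sample names are hypothetical, chosen for illustration. Because the patterns are not anchored, a name with extra characters in front of it, such as junk$result, is still reported as a scalar.

```perl
use strict;
use warnings;

# classify is a hypothetical helper name used only for this sketch; it applies
# the same unanchored patterns as lines 6, 8, and 10 of Listing 7.4.
sub classify {
    my ($name) = @_;
    return "scalar" if $name =~ /\$[A-Za-z][_0-9a-zA-Z]*/;
    return "array"  if $name =~ /\@[A-Za-z][_0-9a-zA-Z]*/;
    return "file"   if $name =~ /[A-Za-z][_0-9a-zA-Z]*/;
    return "unknown";
}

# Single quotes keep $result and @list2 from being interpolated.
for my $name ('$result', '@list2', 'OUTFILE', '123', 'junk$result') {
    printf("%-12s %s\n", $name, classify($name));
}
```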
Anchoring Patterns

Although Listing 7.4 can determine whether a line of input contains a legal Perl variable name, it cannot determine whether there is extraneous input on the line. For example, it can't tell the difference between the following three lines of input:

    $result
    junk$result
    $result#junk

In all three cases, the pattern

    /\$[a-zA-Z][_0-9a-zA-Z]*/

finds the string $result and matches successfully; however, only the first line is a legal Perl variable name.

To fix this problem, you can use pattern anchors. Table 7.1 lists the pattern anchors defined in Perl.

Table 7.1. Pattern anchors in Perl.

    Anchor     Description
    ^ or \A    Match at beginning of string only
    $ or \Z    Match at end of string only
    \b         Match on word boundary
    \B         Match inside word

These pattern anchors are described in the following sections.

The ^ and $ Pattern Anchors

The pattern anchors ^ and $ ensure that the pattern is matched only at the beginning or the end of a string. For example, the pattern

    /^def/

matches def only if these are the first three characters in the string. Similarly, the pattern

    /def$/

matches def only if these are the last three characters in the string. (A single newline at the very end of the string is ignored by $.)

You can combine ^ and $ to force matching of the entire string, as follows:

    /^def$/

This matches only if the string is def.

In most cases, the escape sequences \A and \Z (defined in Perl 5) are equivalent to ^ and $, respectively:

    /\Adef\Z/

This also matches only if the string is def.

NOTE
\A and \Z behave differently from ^ and $ when the multiple-line pattern-matching option is specified. Pattern-matching options are described later today.

Listing 7.5 shows how you can use pattern anchors to ensure that a line of input is, in fact, a legal Perl scalar-, array-, or file-variable name.

Listing 7.5. A better variable-name validation program.
    1:  #!/usr/local/bin/perl
    2:
    3:  print("Enter a variable name:\n");
    4:  $varname = <STDIN>;
    5:  chop($varname);
    6:  if ($varname =~ /^\$[A-Za-z][_0-9a-zA-Z]*$/) {
    7:          print("$varname is a legal scalar variable\n");
    8:  } elsif ($varname =~ /^@[A-Za-z][_0-9a-zA-Z]*$/) {
    9:          print("$varname is a legal array variable\n");
    10: } elsif ($varname =~ /^[A-Za-z][_0-9a-zA-Z]*$/) {
    11:         print("$varname is a legal file variable\n");
    12: } else {
    13:         print("I don't understand what $varname is.\n");
    14: }

    $ program7_5
    Enter a variable name:
    x$result
    I don't understand what x$result is.
    $

The only difference between this program and the one in Listing 7.4 is that this program uses the pattern anchors ^ and $ in the patterns in lines 6, 8, and 10. These anchors ensure that a match succeeds only if the input consists of nothing but the characters that make up a legal Perl scalar, array, or file variable.

In the sample output given here, the input

    x$result

is rejected, because the pattern in line 6 is matched only when the $ character appears at the beginning of the line.

Word-Boundary Pattern Anchors

The word-boundary pattern anchors, \b and \B, specify whether a matched pattern must be on a word boundary or inside a word. (A word boundary is the beginning or end of a word.)

The \b pattern anchor specifies that the pattern must be on a word boundary. For example, the pattern

    /\bdef/

matches only if def is the beginning of a word. This means that def and defghi match but abcdef does not.

You can also use \b to indicate the end of a word. For example,

    /def\b/

matches def and abcdef, but not defghi. Finally, the pattern

    /\bdef\b/

matches only the word def, not abcdef or defghi.

NOTE
A word is assumed to contain letters, digits, and underscore characters, and nothing else. This means that

    /\bdef/

matches $defghi: because $ is not assumed to be part of a word, def is the beginning of the word defghi, and /\bdef/ matches it.

The \B pattern anchor is the opposite of \b. \B matches only if the pattern is contained in a word. For example, the pattern

    /\Bdef/

matches abcdef, but not def. Similarly, the pattern

    /def\B/

matches defghi, and

    /\Bdef\B/

matches cdefg or abcdefghi, but not def, defghi, or abcdef.
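The word-boundary rules above can be checked mechanically. This sketch runs each anchored pattern from the text against the same four sample strings (an illustrative choice) and records which ones match; the last lines confirm the $defghi case from the note.

```perl
use strict;
use warnings;

my @strings = ("def", "defghi", "abcdef", "cdefg");

# For each anchored pattern (kept in single quotes so the backslashes survive),
# record which of the sample strings it matches.
my %matches;
for my $pattern ('\bdef', 'def\b', '\bdef\b', '\Bdef', 'def\B', '\Bdef\B') {
    $matches{$pattern} = join(" ", grep { /$pattern/ } @strings);
}

for my $pattern (sort keys %matches) {
    printf("%-9s matches: %s\n", $pattern, $matches{$pattern});
}

# $ is not a word character, so \b still finds the start of "defghi".
my $dollar = ('$defghi' =~ /\bdef/) ? 1 : 0;
print("\$defghi matches /\\bdef/: $dollar\n");
```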
The \b and \B pattern anchors enable you to search for words in an input line without having to break up the line using split. For example, Listing 7.6 uses \b to count the number of lines of an input file that contain the word the.

Listing 7.6. A program that counts the number of input lines containing the word the.

    1:  #!/usr/local/bin/perl
    2:
    3:  $thecount = 0;
    4:  print("Enter the input here:\n");
    5:  $line = <STDIN>;
    6:  while ($line ne "") {
    7:          if ($line =~ /\bthe\b/) {
    8:                  $thecount += 1;
    9:          }
    10:         $line = <STDIN>;
    11: }
    12: print("Number of lines containing 'the': $thecount\n");

    $ program7_6
    Enter the input here:
    Now is the time
    for all good men
    to come to the aid
    of the party.
    ^D
    Number of lines containing 'the': 3
    $

This program checks each line in turn to see if it contains the word the, and then prints the total number of lines that contain the word.

Line 7 performs the actual checking by trying to match the pattern

    /\bthe\b/

If this pattern matches, the line contains the word the, because the pattern checks for word boundaries at either end.

Note that this program doesn't check whether the word the appears on a line more than once. It is not difficult to modify the program to do this; in fact, you can do it in several different ways.

The most obvious but most laborious way is to break up lines that you know contain the into words, and then check each word, as follows:

    if ($line =~ /\bthe\b/) {
            @words = split(/[ \t]+/, $line);
            $count = 1;
            while ($count <= @words) {
                    if ($words[$count-1] eq "the") {
                            $thecount += 1;
                    }
                    $count++;
            }
    }

A cute way to accomplish the same thing is to use the pattern itself to break the line into words:

    if ($line =~ /\bthe\b/) {
            @words = split(/\bthe\b/, $line);
            $thecount += @words - 1;
    }

In fact, you don't even need the if statement.

    @words = split(/\bthe\b/, $line);
    $thecount += @words - 1;

Here's why this works: Every time split sees the word the, it starts a new word. Therefore, the number of occurrences of the is equal to one less than the number of elements in @words. If there are no occurrences of the, @words has the length 1, and $thecount is not changed.

This trick works only if you know that there is at least one word on the line.
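Here is a runnable sketch of the split-counting trick applied to the sample input from Listing 7.6, plus one extra made-up line. The extra line shows that \b keeps words such as other and theme from being counted. As in the text, each line keeps its newline character.

```perl
use strict;
use warnings;

# Sample input lines (the last one is made up for this sketch); each keeps
# its trailing newline, just as lines read from <STDIN> would.
my @lines = ("Now is the time\n", "for all good men\n",
             "to come to the aid\n", "of the party.\n",
             "the other theme, the end\n");

my $thecount = 0;
for my $line (@lines) {
    # Each occurrence of the word "the" ends one element and starts another,
    # so the number of occurrences is one less than the number of elements.
    # The trailing newline guarantees at least one element per line.
    my @words = split(/\bthe\b/, $line);
    $thecount += @words - 1;
}
print("Occurrences of 'the': $thecount\n");
```

The first four lines contribute one occurrence each except "for all good men", and the last line contributes two, for a total of 5.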
Consider the following code, which tries to use the aforementioned trick on a line that has had its newline character removed using chop:

    $line = <STDIN>;
    chop($line);
    @words = split(/\bthe\b/, $line);
    $thecount += @words - 1;

This code actually subtracts 1 from $thecount if the line is blank or consists only of the word the, because in these cases @words is the empty list and the length of @words is 0. For the same reason, it undercounts by one when the line ends with the word the, because split discards the empty "word" that would follow it.

Leaving off the call to chop protects against this problem, because there will always be at least one "word" in every line (consisting of the newline character).

Variable Substitution in Patterns

If you like, you can use the value of a scalar variable in a pattern. For example, the following code splits the line $line into words:

    $pattern = "[ \\t]+";
    @words = split(/$pattern/, $line);

Because you can use a scalar variable in a pattern, there is nothing to stop you from reading the pattern from the standard input file.

Listing 7.7 reads a search pattern from the standard input file and then searches for the pattern in the input files listed on the command line. If it finds the pattern, it prints the filename and line number of the match; at the end, it prints the total number of matches.

This example assumes that two files exist, file1 and file2. Each file contains the following:

    This is a line of input.
    This is another line of input.

If you run this program with command-line arguments file1 and file2 and search for the pattern another, you get the output shown.

Listing 7.7. A simple pattern-search program.
    1:  #!/usr/local/bin/perl
    2:
    3:  print("Enter the search pattern:\n");
    4:  $pattern = <STDIN>;
    5:  chop($pattern);
    6:  $filename = $ARGV[0];
    7:  $linenum = $matchcount = 0;
    8:  print("Matches found:\n");
    9:  while ($line = <>) {
    10:         $linenum += 1;
    11:         if ($line =~ /$pattern/) {
    12:                 print("$filename, line $linenum\n");
    13:                 @words = split(/$pattern/, $line);
    14:                 $matchcount += @words - 1;
    15:         }
    16:         if (eof) {
    17:                 $linenum = 0;
    18:                 $filename = $ARGV[0];
    19:         }
    20: }
    21: if ($matchcount == 0) {
    22:         print("No matches found.\n");
    23: } else {
    24:         print("Total number of matches: $matchcount\n");
    25: }

    $ program7_7 file1 file2
    Enter the search pattern:
    another
    Matches found:
    file1, line 2
    file2, line 2
    Total number of matches: 2
    $

This program uses the following scalar variables to keep track of information:

$pattern contains the search pattern read in from the standard input file.
$filename contains the file currently being searched.
$linenum contains the line number of the line currently being searched.
$matchcount contains the total number of matches found to this point.

Line 6 sets the current filename, which corresponds to the first element of the built-in array variable @ARGV. This array variable lists the arguments supplied on the command line. (To refresh your memory on how @ARGV works, refer back to Day 6, "Reading from and Writing to Files.") This current filename needs to be stored in a scalar variable, because the <> operator in line 9 shifts @ARGV and destroys this name.

Line 9 reads from each of the files on the command line in turn, one line at a time. The current input line is stored in the scalar variable $line. Once the line is read, line 10 adds 1 to the current line number.

Lines 11-15 handle the matching process. Line 11 checks whether the pattern stored in $pattern is contained in the input line stored in $line. If a match is found, line 12 prints out the current filename and line number. Line 13 then splits the line into "words," using the trick described in the earlier section, "Word-Boundary Pattern Anchors." Because the number of elements of the list stored in @words is one larger than the number of times the pattern is matched, the expression @words - 1 is equivalent to the number of matches; its value is added to $matchcount.
Line 16 checks whether the <> operator has reached the end of the current input file. If it has, line 17 resets the current line number to 0. This ensures that the next pass through the loop will set the current line number to 1 (to indicate that the program is on the first line of the next file). Line 18 sets the filename to the next file mentioned on the command line, which is currently stored in $ARGV[0].

Lines 21-25 either print the total number of matches or indicate that no matches were found.

NOTE
Make sure that you remember to include the enclosing / characters when you use a scalar-variable name in a pattern. The Perl interpreter does not complain when it sees the following, for example, but the result might not be what you want:

    @words = split($pattern, $line);

Excluding Alternatives

As you have seen, when the special characters [] appear in a pattern, they specify a set of alternatives to choose from. For example, the pattern

    /d[eE]f/

matches def or dEf.

When the ^ character appears as the first character after the [, it indicates that the pattern is to match any character except the ones displayed between the [ and ]. For example, the pattern

    /d[^eE]f/

matches any three-character sequence that satisfies the following criteria:

The first character is d.
The second character is anything other than e or E.
The last character is f.

NOTE
To include a ^ character in a set of alternatives, precede it with a backslash, as follows:

    /d[\^eE]f/

This pattern matches d^f, def, or dEf.

Character-Range Escape Sequences

In the section titled "Matching Any Letter or Number" earlier in this chapter, you learned that you can represent consecutive letters or numbers inside the [] special characters by specifying ranges. For example, in the pattern

    /a[1-3]c/

the [1-3] matches any of 1, 2, or 3.

Some ranges occur frequently enough that Perl defines special escape sequences for them. For example, instead of writing

    /[0-9]/

to indicate that any digit is to be matched, you can write

    /\d/

The \d escape sequence means "any digit."

Table 7.2 lists the character-range escape sequences, what they match, and their equivalent character ranges.

Table 7.2. Character-range escape sequences.
    Escape sequence   Description                       Range
    \d                Any digit                         [0-9]
    \D                Anything other than a digit       [^0-9]
    \w                Any word character                [_0-9a-zA-Z]
    \W                Anything not a word character     [^_0-9a-zA-Z]
    \s                White space                       [ \r\t\n\f]
    \S                Anything other than white space   [^ \r\t\n\f]

These escape sequences can be used anywhere ordinary characters are used. For example, the following pattern matches any digit or lowercase letter:

    /[\da-z]/

NOTE
The definition of word boundary as used by the \b and \B special characters corresponds to the definition of word character used by \w and \W.
If the pattern /\w\W/ matches a particular pair of characters, the first character is part of a word and the second is not; this means that the first character is the end of a word, and that a word boundary exists between the first and second characters matched by the pattern.
Similarly, if /\W\w/ matches a pair of characters, the first character is not part of a word and the second character is. This means that the second character is the beginning of a word. Again, a word boundary exists between the first and second characters matched by the pattern.

Matching Any Character

Another special character supported in patterns is the period (.) character, which matches any character except the newline character. For example, the following pattern matches d, followed by any non-newline character, followed by f:

    /d.f/

The . character is often used in conjunction with the * character. For example, the following pattern matches any string that contains the character d preceding the character f:

    /d.*f/

Normally, the .* special-character combination tries to match as much as possible. For example, if the string banana is searched using the following pattern, the pattern matches banana, not ba or bana:

    /b.*a/

NOTE
There is one exception to the preceding rule: The .* combination matches only the longest possible string that still enables the pattern match as a whole to succeed.
For example, suppose the string Mississippi is searched using the pattern

    /M.*i.*pi/

Here, the first .* in /M.*i.*pi/ matches ississ, so that M.* covers Mississ. If it tried to go further, so that M.* covered Mississipp or even all of Mississippi, there would be nothing left for the rest of the pattern to match.
When the first .* match is limited to ississ (so that M.* matches Mississ), the rest of the pattern, i.*pi, matches ippi, and the pattern as a whole succeeds.

Matching a Specified Number of Occurrences

Several special characters in patterns that you have seen enable you to match a specified number of occurrences of a character. For example, + matches one or more occurrences of a character, and ? matches zero or one occurrence.

Perl enables you to define how many occurrences of a character constitute a match. To do this, use the special characters { and }.

For example, the pattern

    /de{1,3}f/

matches d, followed by one, two, or three occurrences of e, followed by f. This means that def, deef, and deeef match, but df and deeeef do not.

To specify an exact number of occurrences, include only one value between the { and the }.

    /de{3}f/

This specifies exactly three occurrences of e, which means this pattern only matches deeef.

To specify a minimum number of occurrences, leave off the upper bound.

    /de{3,}f/

This matches d, followed by at least three es, followed by f.

Finally, to specify a maximum number of occurrences, use 0 as the lower bound.

    /de{0,3}f/

This matches d, followed by no more than three es, followed by f.

NOTE
You can use { and } with character ranges or any other special character, as follows:

    /[a-z]{1,3}/

This matches one, two, or three lowercase letters.

    /.{3}/

This matches any three characters (other than newlines).

Specifying Choices

The special character | enables you to specify two or more alternatives to choose from when matching a pattern. For example, the pattern

    /def|ghi/

matches either def or ghi. The pattern

    /[a-z]+|[0-9]+/

matches one or more lowercase letters or one or more digits.

Listing 7.8 is a simple example of a program that uses the | special character. It reads a number and checks whether it is a legitimate Perl integer.

Listing 7.8. A simple integer-validation program.

    1:  #!/usr/local/bin/perl
    2:
    3:  print("Enter a number:\n");
    4:  $number = <STDIN>;
    5:  chop($number);
    6:  if ($number =~ /^-?\d+$|^-?0[xX][\da-fA-F]+$/) {
    7:          print("$number is a legal integer.\n");
    8:  } else {
    9:          print("$number is not a legal integer.\n");
    10: }

    $ program7_8
    Enter a number:
    0x3ff1
    0x3ff1 is a legal integer.
    $

Recall that Perl integers can be in any of three forms:

Standard base-10 notation, as in 123
Base-8 (octal) notation, indicated by a leading 0, as in 0123
Base-16 (hexadecimal) notation, indicated by a leading 0x or 0X, as in 0X1ff

Line 6 checks whether a number is a legal Perl integer. The first alternative in the pattern,

    ^-?\d+$

matches a string consisting of one or more digits, optionally preceded by a -. (The ^ and $ characters ensure that the entire string must consist of these characters.) This takes care of integers in standard base-10 notation and integers in octal notation.

The second alternative in the pattern,

    ^-?0[xX][\da-fA-F]+$

matches integers in hexadecimal notation. Take a look at this pattern one piece at a time:

The ^ matches the beginning of the line. This ensures that lines containing leading spaces or extraneous characters are not treated as valid hexadecimal integers.
The -? matches a - if it is present. This ensures that negative numbers are matched.
The 0 matches the leading 0.
The [xX] matches the x or X that follows the leading 0.
The [\da-fA-F] matches any digit, any letter between a and f, or any letter between A and F. Recall that these are precisely the characters which are allowed to appear in hexadecimal digits.
The + indicates that the pattern is to match one or more hexadecimal digits.
The closing $ indicates that the pattern is to match only if there are no extraneous characters following the hexadecimal integer.

Beware that the following pattern matches either x or one or more of y, not one or more of x or y:

    /x|y+/

See the section called "Special-Character Precedence" later today for details on how to specify special-character precedence in patterns.

Reusing Portions of Patterns

Suppose that you want to write a pattern that matches the following:

One or more digits or lowercase letters
Followed by a colon or semicolon
Followed by another group of one or more digits or lowercase letters
Another colon or semicolon
Yet another group of one or more digits or lowercase letters

One way to indicate this pattern is as follows:

    /[\da-z]+[:;][\da-z]+[:;][\da-z]+/

This pattern is somewhat complicated and is quite repetitive.
Perl provides an easier way to specify patterns that contain multiple repetitions of a particular sequence. When you enclose a portion of a pattern in parentheses, as in

    ([\da-z]+)

Perl stores the matched sequence in memory. To retrieve a sequence from memory, use the special character \n, where n is an integer representing the nth pattern stored in memory.

For example, if the three groups of digits or letters must be identical, the aforementioned pattern can be written as

    /([\da-z]+)[:;]\1[:;]\1/

Here, the sequence matched by [\da-z]+ is stored in memory. When the Perl interpreter sees the escape sequence \1, it matches that same sequence of characters again. (Note that \1 does not match just any sequence that [\da-z]+ could match; it matches only the characters that were actually stored in memory.)

You also can store the sequence [:;] in memory, and write this pattern as follows:

    /([\da-z]+)([:;])\1\2\1/

Pattern sequences are stored in memory from left to right, so \1 represents the subpattern matched by [\da-z]+ and \2 represents the subpattern matched by [:;]. (This version therefore also requires both separators to be the same character.)

Pattern-sequence memory is often used when you want to match the same character in more than one place but don't care which character you match. For example, if you are looking for a date in dd-mm-yy format, you might want to match

    /\d{2}([\W])\d{2}\1\d{2}/

This matches two digits, a non-word character, two more digits, the same non-word character, and two more digits. This means that the following strings all match:

    12-05-92
    26.11.87
    07 04 92

However, the following string does not match:

    21-05.91

This is because the pattern is looking for a - between the 05 and the 91, not a period.

Beware that the pattern

    /\d{2}([\W])\d{2}\1\d{2}/

is not the same as the pattern

    /(\d{2})([\W])\1\2\1/

In the first pattern, any digit can appear anywhere. The second pattern matches any two digits as the first two characters, but then only matches the same two digits again. This means that

    17-17-17

matches, but the following does not:

    17-05-91

Pattern-Sequence Scalar Variables

Note that pattern-sequence memory is preserved only for the length of the pattern. This means that if you define the following pattern (which, incidentally, matches any floating-point number that does not contain an exponent):

    /-?(\d+)\.?(\d+)/

you cannot then define another pattern, such as the following:

    /\1/

and expect the Perl interpreter to remember that \1 refers to the first \d+ (the digits before the decimal point).
To get around this problem, Perl defines special built-in variables that remember the value of patterns matched in parentheses. These special variables are named $n, where n is the nth set of parentheses in the pattern.

For example, consider the following:

    $string = "This string contains the number 25.11.";
    $string =~ /-?(\d+)\.?(\d+)/;
    $integerpart = $1;
    $decimalpart = $2;

In this case, the pattern

    /-?(\d+)\.?(\d+)/

matches 25.11, and the subpattern in the first set of parentheses matches 25. This means that 25 is stored in $1 and is later assigned to $integerpart. Similarly, the second set of parentheses matches 11, which is stored in $2 and later assigned to $decimalpart.

The values stored in $1, $2, and so on, are overwritten when another pattern match succeeds. If you need these values, be sure to assign them to other scalar variables.

There is also one other built-in scalar variable, $&, which contains the entire matched pattern, as follows:

    $string = "This string contains the number 25.11.";
    $string =~ /-?(\d+)\.?(\d+)/;
    $number = $&;

Here, the pattern matched is 25.11, which is stored in $& and then assigned to $number.

Special-Character Precedence

Perl defines rules of precedence to determine the order in which special characters in patterns are interpreted. For example, the pattern

    /x|y+/

matches either x or one or more occurrences of y, because + has higher precedence than | and is therefore interpreted first.

Table 7.3 lists the special characters that can appear in patterns in order of precedence (highest to lowest). Special characters with higher precedence are always interpreted before those of lower precedence.

Table 7.3. The precedence of pattern-matching special characters.

    Special character   Description
    ()                  Pattern memory
    + * ? {}            Number of occurrences
    ^ $ \b \B           Pattern anchors
    |                   Alternatives

Because the pattern-memory special characters () have the highest precedence, you can use them to force other special characters to be evaluated first. For example, the pattern

    (ab|cd)+

matches one or more occurrences of either ab or cd. This matches, for example, abcdab.
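The following sketch combines the $1, $2, and $& example with a quick check of the precedence rules. It is an illustration only: the ^ and $ anchors are added around each pattern so that the whole test string must match, and the variable names $xx, $yyy, and $grp are made up for this example.

```perl
use strict;
use warnings;

my $string = "This string contains the number 25.11.";
my ($integerpart, $decimalpart, $number) = ("", "", "");

if ($string =~ /-?(\d+)\.?(\d+)/) {
    # Copy the values at once: the next successful match overwrites them.
    $integerpart = $1;    # digits in the first set of parentheses
    $decimalpart = $2;    # digits in the second set of parentheses
    $number      = $&;    # the entire matched text
}
print("$integerpart / $decimalpart / $number\n");

# + binds more tightly than |, so (x|y+) means "x" or "one or more y".
my $xx  = ("xx"     =~ /^(x|y+)$/)   ? 1 : 0;   # the + does not apply to x
my $yyy = ("yyy"    =~ /^(x|y+)$/)   ? 1 : 0;
# Parentheses make + repeat the whole alternation.
my $grp = ("abcdab" =~ /^(ab|cd)+$/) ? 1 : 0;
print("xx=$xx yyy=$yyy abcdab=$grp\n");
```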
Remember that when you use parentheses to force the order of precedence, you also are storing into pattern memory. For example, in the sequence

/(ab|cd)+(.)(ef|gh)+\1/

the \1 refers to what ab|cd matched, not to what the . special character matched.

Now that you know all of the special-pattern characters and their precedence, look at a program that does more complex pattern matching. Listing 7.9 uses the various special-pattern characters, including the parentheses, to check whether a given input string is a valid twentieth-century date.

Listing 7.9. A date-validation program.

1: #!/usr/local/bin/perl
2:
3: print ("Enter a date in the format YYYY-MM-DD:\n");
4: $date = <STDIN>;
5: chop ($date);
6:
7: # Because this pattern is complicated, we split it
8: # into parts, assign the parts to scalar variables,
9: # then substitute them in later.
10:
11: # handle 31-day months
12: $md1 = "(0[13578]|1[02])\\2(0[1-9]|[12]\\d|3[01])";
13: # handle 30-day months
14: $md2 = "(0[469]|11)\\2(0[1-9]|[12]\\d|30)";
15: # handle February, without worrying about whether it's
16: # supposed to be a leap year or not
17: $md3 = "02\\2(0[1-9]|[12]\\d)";
18:
19: # check for a twentieth-century date
20: $match = $date =~ /^(19)?\d\d(.)($md1|$md2|$md3)$/;
21: # check for a valid but non-20th century date
22: $olddate = $date =~ /^(\d{1,4})(.)($md1|$md2|$md3)$/;
23: if ($match) {
24:     print ("$date is a valid date\n");
25: } elsif ($olddate) {
26:     print ("$date is not in the 20th century\n");
27: } else {
28:     print ("$date is not a valid date\n");
29: }

$ program7_9
Enter a date in the format YYYY-MM-DD:
1991-04-31
1991-04-31 is not a valid date
$

Don't worry: this program is a lot less complicated than it looks! Basically, this program does the following:

It checks whether the date is in the format YYYY-MM-DD. (It allows YY-MM-DD, and also enables you to use a character other than a hyphen to separate the year, month, and date.)
It checks whether the year is in the twentieth century or not.
It checks whether the month is between 01 and 12.
Finally, it checks whether the date field is a legal date for that month. Legal date fields are between 01 and either 29, 30, or 31, depending on the number of days in that month.
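The pattern /(ab|cd)+(.)(ef|gh)+\1/ from the start of this part has one more subtlety worth trying out: when a parenthesized group is repeated by +, pattern memory keeps only the text from its last repetition. The following minimal sketch (the sample strings are made up for illustration) shows that \1 never refers to what (.) matched, and that abcdxefab fails because \1 holds cd, not ab.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# \1 is whatever the group (ab|cd) matched on its LAST repetition;
# it never refers to the character matched by (.).
my %result;
for my $s ('ababxefab', 'abcdxefcd', 'abcdxefab', 'abcdxefx') {
    if ($s =~ /(ab|cd)+(.)(ef|gh)+\1/) {
        $result{$s} = "matched $& with \\1 = $1";
    } else {
        $result{$s} = 'no match';
    }
    print "$s: $result{$s}\n";
}
```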
If the date is legal, the program tells you so. If the date is not a twentieth-century date but is legal, the program informs you of this also.

Because the pattern to be matched is too long to fit on one line, this program breaks it into pieces and assigns the pieces to scalar variables. This is possible because scalar-variable substitution is supported in patterns.

Line 12 is the pattern to match for months with 31 days. Note that the escape sequences (such as \d) are preceded by another backslash (producing \\d). This is because the program actually wants to store a backslash in the scalar variable. (Recall that backslashes in double-quoted strings are treated as escape sequences.) The pattern

(0[13578]|1[02])\2(0[1-9]|[12]\d|3[01])

which is assigned to $md1, consists of the following components:

The sequence (0[13578]|1[02]), which matches the month values 01, 03, 05, 07, 08, 10, and 12 (the 31-day months)
\2, which matches the character that separates the day, month, and year
The sequence (0[1-9]|[12]\d|3[01]), which matches any two-digit number between 01 and 31

Note that \2 matches the separator character because the separator character will eventually be the second pattern sequence stored in memory (when the pattern is finally assembled).

Line 14 is similar to line 12 and handles 30-day months. The only differences between this subpattern and the one in line 12 are as follows:

The month values accepted are 04, 06, 09, and 11.
The valid date fields are 01 through 30, not 01 through 31.

Line 17 is another similar pattern that checks whether the month is 02 (February) and the date field is between 01 and 29.

Line 20 does the actual pattern match that checks whether the date is a valid twentieth-century date. This pattern is divided into three parts:

^(19)?\d\d, which matches any two-digit number at the beginning of a line, or any four-digit number starting with 19
The separator character, which is the second item in parentheses (and therefore the second item stored in memory), so it can be retrieved using \2
($md1|$md2|$md3)$, which matches any of the valid month-day combinations defined in lines 12, 14, and 17, provided it appears at the end of the line

The result of the pattern match, either true or false, is stored in the scalar variable $match.
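Because the doubled backslashes in lines 12, 14, and 17 are easy to get wrong, here is a minimal sketch of the same checks that stores the pieces in single-quoted strings instead, where \2 and \d are kept exactly as typed. The subroutine name classify_date and its return strings are hypothetical; the two patterns are the ones from lines 20 and 22 of Listing 7.9, so \2 still refers to the separator.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Month/day pieces from Listing 7.9, in single quotes so each backslash is
# typed once. After interpolation, (.) is the second set of parentheses in
# each complete pattern, so \2 matches the separator.
my $md1 = '(0[13578]|1[02])\2(0[1-9]|[12]\d|3[01])';   # 31-day months
my $md2 = '(0[469]|11)\2(0[1-9]|[12]\d|30)';           # 30-day months
my $md3 = '02\2(0[1-9]|[12]\d)';                       # February, 01-29

# Hypothetical helper wrapping the tests from lines 20 and 22.
sub classify_date {
    my ($date) = @_;
    return 'valid' if $date =~ /^(19)?\d\d(.)($md1|$md2|$md3)$/;
    return 'not in the 20th century' if $date =~ /^(\d{1,4})(.)($md1|$md2|$md3)$/;
    return 'not valid';
}

for my $date ('1991-04-31', '1991-04-30', '91/12/31', '1850-02-28', '1991-02-30') {
    print "$date: ", classify_date($date), "\n";
}
```

Like the original listing, this sketch does not check leap years.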
Line 22 checks whether the date is a valid date in any century. The only difference between this pattern and the one in line 20 is that the year can be any one-to-four-digit number. The result of the pattern match is stored in $olddate.

Lines 23-29 check whether either $match or $olddate is true and print the appropriate message.

As you can see, the pattern-matching facility in Perl is quite powerful. This program is fewer than 30 lines long, including comments; the equivalent program in almost any other programming language would be substantially longer and much more difficult to write.

Specifying a Different Pattern Delimiter

So far, all the patterns you have seen have been enclosed by / characters.

/de*f/

These / characters are known as pattern delimiters.

Because / is the pattern-delimiter character, you must use \/ to include a / character in a pattern. This can become awkward if you are searching for a directory such as, for example, /u/jqpublic/perl/prog1.

/\/u\/jqpublic\/perl\/prog1/

To make it easier to write patterns that include / characters, Perl enables you to use any pattern-delimiter character you like. The following pattern also matches the directory /u/jqpublic/perl/prog1:

m!/u/jqpublic/perl/prog1!

Here, the m indicates the pattern-matching operation. If you are using a pattern delimiter other than /, you must include the m.

There are two things you should watch out for when you use other pattern delimiters.

First, if you use the ' character as a pattern delimiter, the Perl interpreter does not substitute for scalar-variable names.

m'$var'

Here, $var is not replaced by the current value of the scalar variable $var; the pattern is passed to the pattern matcher exactly as written. (The $ is still a special character inside the pattern, so to match the literal string $var you must also escape it, as in m'\$var'.)

Second, if you use a pattern delimiter that is normally a special-pattern character, you will not be able to use that special character in your pattern. For example, if you want to match the pattern ab?c (which matches a, optionally followed by b, followed by c) you cannot use the ? character as a pattern delimiter. The pattern

m?ab?c?

produces a syntax error, because the Perl interpreter assumes that the ? after the b is a pattern delimiter. You can still use

m?ab\?c?
but this pattern won't match what you want. Because the ? inside the pattern is escaped, the Perl interpreter assumes that you want to match the actual ? character, and the pattern matches the sequence ab?c.

Pattern-Matching Options

When you specify a pattern, you also can supply options that control how the pattern is to be matched. Table 7.4 lists these pattern-matching options.

Table 7.4. Pattern-matching options.

Option   Description
g        Match all possible patterns
i        Ignore case
m        Treat string as multiple lines
o        Only evaluate once
s        Treat string as single line
x        Ignore white space in pattern

All pattern options are included immediately after the pattern. For example, the following pattern uses the i option to ignore case:

/ab*c/i

You can specify as many of the options as you like, and the options can be in any order.

Matching All Possible Patterns

The g option tells the Perl interpreter to match all the possible patterns in a string. For example, if you search the string balata using the pattern

/.a/g

which matches any character followed by a, the pattern matches ba, la, and ta.

If a pattern with the g option specified appears as an assignment to an array variable, the array variable is assigned a list consisting of all the patterns matched. For example,

@matches = "balata" =~ /.a/g;

assigns the following list to @matches:

("ba", "la", "ta")

Now, consider the following statement:

$match = "balata" =~ /.a/g;

Here the match is evaluated in scalar context, so it returns true or false rather than the matched text. The first time this statement is executed, the pattern matches ba, $match is assigned a true value, and ba is available in the built-in variable $&. If this statement is executed again, the search resumes where the previous match stopped, so the next match is la, and so on until the pattern runs out of matches and the result is false.

This means that you can use patterns with the g option in loops. Listing 7.10 shows how this works.

Listing 7.10. A program that loops using a pattern.

1: #!/usr/local/bin/perl
2:
3: while ("balata" =~ /.a/g) {
4:     $match = $&;
5:     print ("$match\n");
6: }

$ program7_10
ba
la
ta
$

The first time through the loop, $match has the value of the first pattern matched, which is ba.
(The built-in variable $& always contains the last pattern matched; this pattern is assigned to $match in line 4.) When the loop is executed for a second time, $match has the value la. The third time through, $match has the value ta. After this, the loop terminates; because the pattern doesn't match anything else, the conditional expression is now false.

Determining the Match Location

If you need to know how much of a string has been searched by the pattern matcher when the g option is specified, use the pos function.

$offset = pos($string);

This returns the position at which the next pattern match will be started.

You can reposition the pattern matcher by putting pos() on the left side of an assignment.

pos($string) = $newoffset;

This tells the Perl interpreter to start the next pattern match at the position specified by $newoffset.

If you change the string being searched, the match position is reset to the beginning of the string.

NOTE
The pos function is not available in Perl version 4.

Ignoring Case

The i option enables you to specify that a matched letter can either be uppercase or lowercase. For example, the following pattern matches de, dE, De, or DE:

/de/i

Patterns that match either uppercase or lowercase letters are said to be case-insensitive.

Treating the String as Multiple Lines

The m option tells the Perl interpreter that the string to be matched contains multiple lines of text. When the m option is specified, the ^ special character matches either the start of the string or the start of any new line. For example, the pattern

/^The/m

matches the word The in

This pattern matches\nThe first word on the second line

The m option also specifies that the $ special character is to match the end of any line. This means that the pattern

/line.$/m

is matched in the following string:

This is the end of the first line.\nHere's another line.

NOTE
The m option is defined only in Perl 5. To treat a string as multiple lines when you run Perl 4, set the $* system variable, described on Day 17, "System Variables."
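The following minimal Perl 5 sketch ties together the g option in scalar context, the pos function, and the m option. It uses a variable rather than a literal string so that pos() has something to report; the variable names are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string = "balata";
my (@found, @offsets);

# Scalar-context m//g: each test resumes where the previous match ended.
while ($string =~ /.a/g) {
    push @found,   $&;              # the text matched this time
    push @offsets, pos($string);    # where the next search will start
}

# Assigning to pos() repositions the matcher for the next m//g.
pos($string) = 3;
my $from_three = ($string =~ /.a/g) ? $& : '';

# Without m, ^ matches only at the start of the string; with m, it also
# matches just after each newline.
my $text  = "This pattern matches\nThe first word on the second line";
my $plain = ($text =~ /^The/)  ? 1 : 0;
my $multi = ($text =~ /^The/m) ? 1 : 0;

print "@found | @offsets | $from_three | $plain $multi\n";
```

When the while loop's last match fails, the search position is reset, which is why the example must assign to pos() before searching from offset 3.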
Evaluating a Pattern Only Once

The o option enables you to tell the Perl interpreter that a pattern is to be evaluated only once. For example, consider the following:

$var = 1;
$line = <STDIN>;
while ($var < 10) {
    $result = $line =~ /$var/o;
    $line = <STDIN>;
    $var++;
}

The first time the Perl interpreter sees the pattern /$var/, it replaces the name $var with the current value of $var, which is 1; this means that the pattern to be matched is /1/.

Because the o option is specified, the pattern to be matched remains /1/ even when the value of $var changes. If the o option had not been specified, the pattern would have been /2/ the next time through the loop.

TIP
There's no real reason to use the o option for patterns unless you are keen on efficiency. Here's an easier way to do the same thing:

$var = 1;
$matchval = $var;
$line = <STDIN>;
while ($var < 10) {
    $result = $line =~ /$matchval/;
    $line = <STDIN>;
    $var++;
}

The value of $matchval never changes, so the o option is not necessary.

Treating the String as a Single Line

The s option specifies that the string to be matched is to be treated as a single line of text. In this case, the . special character matches every character in a string, including the newline character. For example, the pattern /a.*bc/s is matched successfully in the following string:

axxxxx\nxxxxbc

If the s option is not specified, this pattern does not match, because the . character does not match the newline.

NOTE
The s option is defined only in Perl 5.

Using White Space in Patterns

One problem with patterns in Perl is that they can become difficult to follow. For example, consider this pattern, which you saw earlier:

/\d{2}([\W])\d{2}\1\d{2}/

Patterns such as this are difficult to follow, because there are a lot of backslashes, braces, and brackets to sort out.
Perl 5 makes life a little easier by supplying the x option. This tells the Perl interpreter to ignore white space in a pattern unless it is preceded by a backslash. This means that the preceding pattern can be rewritten as the following, which is much easier to follow:

/\d{2} ([\W]) \d{2} \1 \d{2}/x

Here is an example of a pattern containing an actual blank space:

/[A-Z] [a-z]+ \ [A-Z] [a-z]+/x

This matches a name in the standard first-name/last-name format (such as John Smith). Normally, you won't want to use the x option if you're actually trying to match white space, because you wind up with the backslash problem all over again.

NOTE
The x option is defined only in Perl 5.

The Substitution Operator

Perl enables you to replace part of a string using the substitution operator, which has the following syntax:

s/pattern/replacement/

The Perl interpreter searches for the pattern specified by the placeholder pattern. If it finds pattern, it replaces it with the string represented by the placeholder replacement. For example:

$string = "abc123def";
$string =~ s/123/456/;

Here, 123 is replaced by 456, which means that the value stored in $string is now abc456def.

You can use any of the pattern special characters in the substitution operator. For example,

s/[abc]+/0/

searches for a sequence consisting of one or more occurrences of the letters a, b, and c (in any order) and replaces the sequence with 0.

If you just want to delete a sequence of characters rather than replace it, leave out the replacement string as in the following example, which deletes the first occurrence of the pattern abc:

s/abc//

Using Pattern-Sequence Variables in Substitutions

You can use pattern-sequence variables to include a matched pattern in the replacement string. The following is an example:

s/(\d+)/[$1]/

This matches a sequence of one or more digits. Because this sequence is enclosed in parentheses, it is stored in the scalar variable $1. In the replacement string, [$1], the scalar variable name $1 is replaced by its value, which is the matched pattern.
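The four substitutions described above can be tried side by side with the following minimal sketch. The sample strings other than abc123def are made up for illustration; note that without the g option each substitution changes only the first match.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $replaced = "abc123def";
$replaced =~ s/123/456/;          # replace the first "123"

my $letters = "xxcabyyba";
$letters =~ s/[abc]+/0/;          # only the first run of a, b, c becomes "0"

my $deleted = "xxabcyyabc";
$deleted =~ s/abc//;              # only the first "abc" is removed

my $bracketed = "call 555 now";
$bracketed =~ s/(\d+)/[$1]/;      # $1 inserts the digits that matched

print "$replaced $letters $deleted $bracketed\n";
```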
NOTE
Because the replacement string in the substitution operator is a string, not a pattern, the pattern special characters, such as [], *, and +, do not have a special meaning. For example, in the substitution

s/abc/[def]/

the replacement string is [def] (including the square brackets).

Options for the Substitution Operator

The substitution operator supports several options, which are listed in Table 7.5.

Table 7.5. Options for the substitution operator.

Option   Description
g        Change all occurrences of the pattern
i        Ignore case in pattern
e        Evaluate replacement string as expression
m        Treat string to be matched as multiple lines
o        Evaluate only once
s        Treat string to be matched as single line
x        Ignore white space in pattern

As with pattern matching, options are appended to the end of the operator. For example, to change all occurrences of abc to def, use the following:

s/abc/def/g

Global Substitution

The g option changes all occurrences of a pattern in a particular string. For example, the following substitution puts parentheses around any number in the string:

s/(\d+)/($1)/g

Listing 7.11 is an example of a program that uses global substitution. It examines each line of its input, removes all extraneous leading and trailing spaces and tabs, and replaces multiple spaces and tabs between words with a single space.

Listing 7.11. A simple white-space cleanup program.

1: #!/usr/local/bin/perl
2:
3: @input = <STDIN>;
4: $count = 0;
5: while ($input[$count] ne "") {
6:     $input[$count] =~ s/^[ \t]+//;
7:     $input[$count] =~ s/[ \t]+\n$/\n/;
8:     $input[$count] =~ s/[ \t]+/ /g;
9:     $count++;
10: }
11: print ("Formatted text:\n");
12: print (@input);

$ program7_11
   This is   a line of   input.
Here  is another    line.
      This is my   last line of input.
^D
Formatted text:
This is a line of input.
Here is another line.
This is my last line of input.
$

This program performs three substitutions on each line of its input. The first substitution, in line 6, checks whether there are any spaces or tabs at the beginning of the line. If any exist, they are removed.

Similarly, line 7 checks whether there are any spaces or tabs at the end of the line (before the trailing newline character).
If any exist, they are removed. To do this, line 7 replaces the following pattern (one or more spaces and tabs, followed by a newline character, followed by the end of the line) with a newline character:

/[ \t]+\n$/

Line 8 uses a global substitution to remove extra spaces and tabs between words. The following pattern matches one or more spaces or tabs, in any order; these spaces and tabs are replaced by a single space:

/[ \t]+/

Ignoring Case

The i option ignores case when substituting. For example, the following substitution replaces all occurrences of the words no, No, NO, and nO with NO. (Recall that the \b escape sequence specifies a word boundary.)

s/\bno\b/NO/gi

Replacement Using an Expression

The e option treats the replacement string as an expression, which it evaluates before replacing. For example, consider the following:

$string = "0abc1";
$string =~ s/[a-zA-Z]+/$& x 2/e;

The substitution shown here is a quick way to duplicate part of a string. Here's how it works:

The pattern /[a-zA-Z]+/ matches abc, which is stored in the built-in variable $&.
The e option indicates that the replacement string, $& x 2, is to be treated as an expression. This expression is evaluated, producing the result abcabc.
abcabc is substituted for abc in the string stored in $string. This means that the new value of $string is 0abcabc1.

Listing 7.12 is another example that uses the e option in a substitution. This program takes every integer in a list of input files and multiplies it by 2, leaving the rest of the contents unchanged. (For the sake of simplicity, the program assumes that there are no floating-point numbers in the file.)

Listing 7.12. A program that multiplies every integer in a file by 2.

1: #!/usr/local/bin/perl
2:
3: $count = 0;
4: while ($ARGV[$count] ne "") {
5:     open (FILE, "$ARGV[$count]");
6:     @file = <FILE>;
7:     $linenum = 0;
8:     while ($file[$linenum] ne "") {
9:         $file[$linenum] =~ s/\d+/$& * 2/eg;
10:        $linenum++;
11:    }
12:    close (FILE);
13:    open (FILE, ">$ARGV[$count]");
14:    print FILE (@file);
15:    close (FILE);
16:    $count++;
17: }

If a file named foo contains the text

This contains the number 1.
This contains the number 26.

and the name foo is passed as a command-line argument to this program, the file foo becomes

This contains the number 2.
This contains the number 52.
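The substitutions in this part can be checked without touching any files. The following minimal sketch assumes the lines of foo are already held in an array (the array @file here is filled in by hand rather than read from disk), applies the same s/\d+/$& * 2/eg as line 9 of Listing 7.12, and also runs the $& x 2 and \bno\b examples.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# e: the replacement is evaluated as Perl code; its value replaces the match.
my $string = "0abc1";
$string =~ s/[a-zA-Z]+/$& x 2/e;            # "abc" x 2 is "abcabc"

# Line 9 of Listing 7.12, applied to lines held in memory instead of a file:
# g visits every integer on the line, e doubles each one.
my @file = ("This contains the number 1.\n",
            "This contains the number 26.\n");
s/\d+/$& * 2/eg for @file;                  # for aliases $_ to each element

# i with g: every case variant of the word "no" becomes "NO".
my $answer = "No, no, and nO; but not nothing.";
$answer =~ s/\bno\b/NO/gi;

print $string, "\n", @file, $answer, "\n";
```

The \b anchors keep not and nothing unchanged, because no is followed by another word character in both.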
This program uses the built-in variable @ARGV to retrieve filenames from the command line. Note that the program cannot use <>, because the following statement reads the entire contents of all the files into a single array:

@file = <>;

Lines 8-11 process the lines of the file one at a time. Line 9 performs the actual substitution as follows:

The pattern \d+ matches a sequence of one or more digits, which is automatically assigned to $&.
The value of $& is substituted into the replacement string.
The e option indicates that this replacement string is to be treated as an expression. This expression multiplies the matched integer by 2.
The result of the multiplication is then substituted into the file in place of the original integer.
The g option indicates that every integer on the line is to be substituted for.

After all the lines in the file have been read, the file is closed and reopened for writing. The call to print in line 14 takes the list stored in @file (the contents of the current file) and writes it back out to the file, overwriting the original contents.

Evaluating a Pattern Only Once

As with the match operator, the o option to the substitution operator tells the Perl interpreter to replace a scalar variable name with its value only once. For example, the following statement substitutes the current value of $var for its name, producing a replacement string:

$string =~ s/abc/$var/o;

This replacement string then never changes, even if the value of $var changes. For example:

$var = 17;
while ($var > 0) {
    $string = <STDIN>;
    $string =~ s/abc/$var/o;
    print ($string);
    $var--;    # the replacement string is still "17"
}

Again, as with the match operator, there is no real reason to use the o option.

Treating the String as Single or Multiple Lines

As in the pattern-matching operator, the s and m options specify that the string to be matched is to be treated as a single line or as multiple lines, respectively.

The s option ensures that the newline character \n is matched by the . special character.

$string = "This is a\ntwo-line string.";
$string =~ s/a.*o/a one/s;
# $string now contains "This is a one-line string."

If the m option is specified, ^ and $ match the beginning and end of any line.
$string = "The The first line\nThe The second line";
$string =~ s/^The //gm;
# $string now contains "The first line\nThe second line"
$string =~ s/e$/k/gm;
# $string now contains "The first link\nThe second link"

The \A and \Z escape sequences (defined in Perl 5) always match only the beginning and end of the string, respectively. (The m option is the only case in which \A and \Z behave differently from ^ and $.)

NOTE
The m and s options are defined only in Perl 5. To treat a string as multiple lines when you run Perl 4, set the $* system variable, described on Day 17.

Using White Space in Patterns

The x option tells the Perl interpreter to ignore all white space unless preceded by a backslash. As with the pattern-matching operator, ignoring white space makes complicated string patterns easier to read.

$string =~ s/(\d{2}) ([\W]) (\d{2}) \2 (\d{2})/$1-$3-$4/x;

This converts a day-month-year string such as 14/04/95 to the dd-mm-yy format, 14-04-95. The parentheses around each pair of digits store the digits in $1, $3, and $4; the separator is stored in $2, which is why the pattern uses \2 to match it a second time.

NOTE
Even if the x option is specified, spaces in the replacement string are not ignored. For example, the following replaces 14/04/95 with 14 - 04 - 95, not 14-04-95:

$string =~ s/(\d{2}) ([\W]) (\d{2}) \2 (\d{2})/$1 - $3 - $4/x;

Also note that the x option is defined only in Perl 5.

Specifying a Different Delimiter

You can specify a different delimiter to separate the pattern and replacement string in the substitution operator. For example, the following substitution operator replaces /u/bin with /usr/local/bin:

s#/u/bin#/usr/local/bin#

The search and replacement strings can be enclosed in parentheses or angle brackets.

s(/u/bin)(/usr/local/bin)
s</u/bin>/\/usr\/local\/bin/

NOTE
As with the match operator, you cannot use a special character both as a delimiter and in a pattern.

s.a.c.def.

This substitution will be flagged as containing an error because the . character is being used as the delimiter. The substitution

s.a\.c.def.

does work, but it substitutes def for a.c, where . is an actual period and not the pattern special character.

The Translation Operator

Perl also provides another way to substitute one group of characters for another: the tr translation operator. This operator uses the following syntax:

tr/string1/string2/

Here, string1 contains a list of characters to be replaced, and string2 contains the characters that replace them.
The first character in string1 is replaced by the first character in string2, the second character in string1 is replaced by the second character in string2, and so on.

Here is a simple example:

$string = "abcdefghicba";
$string =~ tr/abc/def/;

Here, the characters a, b, and c are to be replaced as follows:

All occurrences of the character a are to be replaced by the character d.
All occurrences of the character b are to be replaced by the character e.
All occurrences of the character c are to be replaced by the character f.

After the translation, the scalar variable $string contains the value defdefghifed.

NOTE
If the string listing the characters to be replaced is longer than the string containing the replacement characters, the last character of the replacement string is repeated. For example:

$string = "abcdefgh";
$string =~ tr/efgh/abc/;

Here, there is no character corresponding to h in the replacement list, so c, the last character in the replacement list, replaces h. This translation sets the value of $string to abcdabcc.

Also note that if the same character appears more than once in the list of characters to be replaced, the first replacement is used:

$string =~ tr/AAA/XYZ/;    # replaces A with X

The most common use of the translation operator is to convert alphabetic characters from uppercase to lowercase or vice versa. Listing 7.13 provides an example of a program that converts a file to all lowercase characters.

Listing 7.13. An uppercase-to-lowercase conversion program.

1: #!/usr/local/bin/perl
2:
3: while ($line = <STDIN>) {
4:     $line =~ tr/A-Z/a-z/;
5:     print ($line);
6: }

$ program7_13
THIS LINE IS IN UPPERCASE.
this line is in uppercase.
ThiS LiNE Is iN mIxED cASe.
this line is in mixed case.
^D
$

This program reads a line at a time from the standard input file, terminating when it sees a line containing the Ctrl+D (end-of-file) character.

Line 4 performs the translation operation. As in the other pattern-matching operations, the range character (-) indicates a range of characters to be included. Here, the range a-z refers to all the lowercase characters, and the range A-Z refers to all the uppercase characters.

NOTE
There are two things you should note about the translation operator:

The pattern special characters are not supported by the translation operator.
You can use y in place of tr if you want.

$string =~ y/a-z/A-Z/;

Options for the Translation Operator

The translation operator supports three options, which are listed in Table 7.6.

The c option (c is for "complement") translates all characters that are not specified. For example, the statement

$string =~ tr/0-9/ /c;

replaces everything that is not a digit with a space.

Table 7.6. Options for the translation operator.

Option   Description
c        Translate all characters not specified
d        Delete all specified characters
s        Replace multiple identical output characters with a single character

The d option deletes every specified character.

$string =~ tr/ \t//d;

This deletes all the tabs and spaces from $string.

The s option (for "squeeze") checks the output from the translation. If two or more consecutive characters translate to the same output character, only one output character is actually used. For example, the following replaces everything that is not a digit and outputs only one space between digits:

$string =~ tr/0-9/ /cs;

Listing 7.14 is a simple example of a program that uses some of these translation options. It reads a number from the standard input file, and it gets rid of every input character that is not actually a digit.

Listing 7.14. A program that ensures that a string consists of nothing but digits.

1: #!/usr/local/bin/perl
2:
3: $string = <STDIN>;
4: $string =~ tr/0-9//cd;
5: print ("$string\n");

$ program7_14
The number 45 appears in this string.
45
$

Line 4 of this program performs the translation. The d option indicates that the translated characters are to be deleted, and the c option indicates that every character not in the list is to be deleted. Therefore, this translation deletes every character in the string that is not a digit. Note that the trailing newline character is not a digit, so it is one of the characters deleted.

Extended Pattern-Matching

Perl 5 provides some additional pattern-matching capabilities not found in Perl 4 or in standard UNIX pattern-matching operations. Extended pattern-matching capabilities employ the following syntax:

(?<c>pattern)

<c> is a single character representing the extended pattern-matching capability being used, and pattern is the pattern or subpattern to be affected.
The following extended pattern-matching capabilities are supported by Perl 5:

Parenthesizing subpatterns without saving them in memory
Embedding options in patterns
Positive and negative look-ahead conditions
Comments

Parenthesizing Without Saving in Memory

In Perl, when a subpattern is enclosed in parentheses, the subpattern is also stored in memory. If you want to enclose a subpattern in parentheses without storing it in memory, use the ?: extended pattern-matching feature. For example, consider this pattern:

/(?:a|b|c)(d|e)f\1/

This matches the following:

One of a, b, or c
One of d or e
f
Whichever of d or e was matched earlier

Here, \1 matches either d or e, because the subpattern a|b|c was not stored in memory. Compare this with the following:

/(a|b|c)(d|e)f\1/

Here, the subpattern a|b|c is stored in memory, and one of a, b, or c is matched by \1.

Embedding Pattern Options

Perl 5 provides a way of specifying a pattern-matching option within the pattern itself. For example, the following patterns are equivalent:

/[a-z]+/i
/(?i)[a-z]+/

In both cases, the pattern matches one or more alphabetic characters; the i option indicates that case is to be ignored when matching.

The syntax for embedded pattern options is

(?option)

where option is one of the options shown in Table 7.7.

Table 7.7. Options for embedded patterns.

Option   Description
i        Ignore case in pattern
m        Treat pattern as multiple lines
s        Treat pattern as single line
x        Ignore white space in pattern

The g and o options are not supported as embedded pattern options.

Embedded pattern options give you more flexibility when you are matching patterns. For example:

$pattern1 = "[a-z0-9]+";
$pattern2 = "(?i)[a-z]+";
if ($string =~ /$pattern1|$pattern2/) {
    ...
}

Here, the i option is specified for some, but not all, of a pattern. (This pattern matches either any collection of lowercase letters mixed with digits, or any collection of letters.)

Positive and Negative Look-Ahead

Perl 5 enables you to use the ?= feature to define a boundary condition that must be matched in order for the pattern to match. For example, the following pattern matches abc only if it is followed by def:

/abc(?=def)/

This is known as a positive look-ahead condition.
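The three extended features covered so far can be compared in one place with the following minimal Perl 5 sketch. The test strings bdfd and HELLO are made up for illustration; $pattern1 and $pattern2 are the strings from the text, and the sketch anchors the combined pattern with ^ and $ (inside a ?: group) so that the whole string must match one alternative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# ?: groups a|b|c without storing it, so \1 refers to (d|e).
my $nonsaving = ("bdfd" =~ /(?:a|b|c)(d|e)f\1/) ? 1 : 0;   # \1 is "d"
my $saving    = ("bdfd" =~ /(a|b|c)(d|e)f\1/)   ? 1 : 0;   # \1 is "b"

# (?i) embeds the i option in the second alternative only.
my $pattern1 = "[a-z0-9]+";
my $pattern2 = "(?i)[a-z]+";
my $embedded = ("HELLO" =~ /^(?:$pattern1|$pattern2)$/) ? 1 : 0;

# (?=...) must follow the match, but it is not part of $&.
my $lookahead = '';
if ("25abc8" =~ /abc(?=[0-9])/) {
    $lookahead = $&;
}

print "$nonsaving $saving $embedded $lookahead\n";
```

Without the embedded (?i), neither alternative would accept the uppercase string HELLO.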
NOTE
The positive look-ahead condition is not part of the pattern matched. For example, consider these statements:

$string = "25abc8";
$string =~ /abc(?=[0-9])/;
$matched = $&;

Here, as always, $& contains the matched pattern, which in this case is abc, not abc8.

Similarly, the ?! feature defines a negative look-ahead condition, which is a boundary condition that must not be present if the pattern is to match. For example, the pattern /abc(?!def)/ matches any occurrence of abc unless it is followed by def.

Pattern Comments

Perl 5 enables you to add comments to a pattern using the ?# feature. For example:

if ($string =~ /(?i)[a-z]{2,3}(?# match two or three alphabetic characters)/) {
    ...
}

Adding comments makes it easier to follow complicated patterns.

Summary

Perl enables you to search for sequences of characters using patterns. If a pattern is found in a string, the pattern is said to be matched.

Patterns often are used in conjunction with the pattern-match operators, =~ and !~. The =~ operator returns true if the pattern matches, and the !~ operator returns true if the pattern does not match.

Special-pattern characters enable you to search for a string that meets one of a variety of conditions.

The + character matches one or more occurrences of a character.
The * character matches zero or more occurrences of a character.
The [] characters enclose a set of characters, any one of which matches.
The ? character matches zero or one occurrence of a character.
The ^ and $ characters match the beginning and end of a line, respectively. The \b and \B escape sequences match a word boundary or somewhere other than a word boundary, respectively.
The {} characters specify the number of occurrences of a character.
The | character specifies alternatives, either of which match.

To give a special character its natural meaning in a pattern, precede it with a backslash \.

Enclosing a part of a pattern in parentheses stores the matched subpattern in memory; this stored subpattern can be recalled later in the same pattern using the character sequence \n, and used after the match through the built-in scalar variable $n. The built-in scalar variable $& stores the entire matched pattern.
You can substitute for scalar-variable names in patterns, specify different pattern delimiters, or supply options that match every possible pattern, ignore case, or perform scalar-variable substitution only once.

The substitution operator, s, enables you to replace a matched pattern with a specified string. Options to the substitution operator enable you to replace every matched pattern, ignore case, treat the replacing string as an expression, or perform scalar-variable substitution only once.

The translation operator, tr, enables you to translate one set of characters into another set. Options exist that enable you to perform translation on everything not in the list, to delete characters in the list, or to ignore multiple identical output characters.

Perl 5 provides extended pattern-matching capabilities not provided in Perl 4. To use one of these extended pattern features on a subpattern, put (? at the beginning of the subpattern and ) at the end of the subpattern.

Q&A

Q: How many subpatterns can be stored in memory using \1, \2, and so on?
A: Basically, as many as you like. After you store more than nine patterns, you can retrieve the later patterns using two-digit numbers preceded by a backslash, such as \10.

Q: Why does pattern-memory variable numbering start with 1, whereas subscript numbering starts with 0?
A: Subscript numbering starts with 0 to remain compatible with the C programming language. There is no such thing as pattern memory in C, so there is no need to be compatible with it.

Q: What happens when the replacement string in the translate command is left out, as in tr/abc//?
A: If the replacement string is omitted, a copy of the first string is used. This means that

tr/abc//

does not change the string (it only counts the occurrences of a, b, and c), because it is the same as

tr/abc/abc/

If the replacement string is omitted in the substitute command, as in

s/abc//

the pattern matched (in this case, abc) is deleted.

Q: Why does Perl use characters such as +, *, and ? as pattern special characters?
A: These special characters usually correspond to special characters used in other UNIX applications, such as vi and csh. Some of the special characters, such as +, are used in formal syntax description languages.

Q: Why does Perl use both \1 and $1 to store pattern memory?
A: To enable you to distinguish between a subpattern matched in the current pattern (which is stored in \1) and a subpattern matched in the previous statement (which is stored in $1).

Workshop

The Workshop provides quiz questions to help you solidify your understanding of the material covered and exercises to give you experience in using what you've learned. Try and understand the quiz and exercise answers before you go on to tomorrow's lesson.

Quiz

1. What do the following patterns match?
a. /a|bc*/
b. /[\d]{1,3}/
c. /\bc[aou]t\b/
d. /(xy+z)\.\1/
e. /^$/

2. Write patterns that match the following:
a. Five or more lowercase letters (a-z).
b. Either the number 1 or the string one.
c. A string of digits optionally containing a decimal point.
d. Any letter, followed by any vowel, followed by the same letter again.
e. One or more + characters.

3. Suppose the variable $var has the value abc123. Indicate whether the following conditional expressions return true or false.
a. $var =~ /./
b. $var =~ /[A-Z]*/
c. $var =~ /\w{4-6}/
d. $var =~ /(\d)2(\1)/
e. $var =~ /abc$/
f. $var =~ /1234?/

4. Suppose the variable $var has the value abc123abc. What is the value of $var after the following substitutions?
a. $var =~ s/abc/def/;
b. $var =~ s/[a-z]+/X/g;
c. $var =~ s/B/W/i;
d. $var =~ s/(.)\d.*\1/d/;
e. $var =~ s/(\d+)/$1*2/e;

5. Suppose the variable $var has the value abc123abc. What is the value of $var after the following translations?
a. $var =~ tr/a-z/A-Z/;
b. $var =~ tr/123/456/;
c. $var =~ tr/231/564/;
d. $var =~ tr/123//s;
e. $var =~ tr/123//cd;

Exercises

1. Write a program that reads all the input from the standard input file, converts all the vowels (except y) to uppercase, and prints the result on the standard output file.
2. Write a program that counts the number of times each digit appears in the standard input file. Print the total for each digit and the sum of all the totals.
3. Write a program that reverses the order of the first three words of each input line (from the standard input file) using the substitution operator. Leave the spacing unchanged, and print each resulting line.
4. Write a program that adds 1 to every number in the standard input file. Print the results.
5. BUG BUSTER: What is wrong with the following program?
#!/usr/local/bin/perl

while ($line = <STDIN>) {
    # put quotes around each line of input
    $line =~ /^.*$/"\1"/;
    print ($line);
}

6. BUG BUSTER: What is wrong with the following program?

#!/usr/local/bin/perl

while ($line = <STDIN>) {
    if ($line =~ /[\d]*/) {
        print ("This line contains the digits '$&'\n");
    }
}


