Chapter 7
Pattern Matching
CONTENTS
Introduction
The Match Operators
Match-Operator Precedence
Special Characters in Patterns
The + Character
The [] Special Characters
The * and ? Special Characters
Escape Sequences for Special Characters
Matching Any Letter or Number
Anchoring Patterns
Variable Substitution in Patterns
Excluding Alternatives
Character-Range Escape Sequences
Matching Any Character
Matching a Specified Number of Occurrences
Specifying Choices
Reusing Portions of Patterns
Pattern-Sequence Scalar Variables
Special-Character Precedence
Specifying a Different Pattern Delimiter
Pattern-Matching Options
Matching All Possible Patterns
Ignoring Case
Treating the String as Multiple Lines
Evaluating a Pattern Only Once
Treating the String as a Single Line
Using White Space in Patterns
The Substitution Operator
Using Pattern-Sequence Variables in Substitutions
Options for the Substitution Operator
Evaluating a Pattern Only Once
Treating the String as Single or Multiple Lines
Using White Space in Patterns
Specifying a Different Delimiter
The Translation Operator
Options for the Translation Operator
Extended Pattern-Matching
Parenthesizing Without Saving in Memory
Embedding Pattern Options
Positive and Negative Look-Ahead
Pattern Comments
Summary
Q&A
Workshop
Quiz
Exercises
This lesson describes the pattern-matching features of Perl. Today,
you learn about the following:
How pattern matching works
The pattern-matching operators
Special characters supported in pattern matching
Pattern-matching options
Pattern substitution
Translation
Extended pattern-matching features
Introduction
A pattern is a sequence of characters to be searched for
in a character string. In Perl, patterns are normally enclosed
in slash characters:
/def/
This represents the pattern def.
If the pattern is found, a match occurs. For example, if you search
the string redefine for the pattern /def/, the
pattern matches the third, fourth, and fifth characters.
redefine
You already have seen a simple example of pattern matching in
the library function split.
@array = split(/ /, $line);
Here the pattern / / matches a single space, which splits
a line into words.
The Match Operators
Perl defines special operators that test whether a particular
pattern appears in a character string.
The =~ operator tests whether a pattern is matched, as
shown in the following:
$result = $var =~ /abc/;
The result of the =~ operation is one of the following:
A nonzero value, or true, if the pattern is found in the string
A false value if the pattern is not matched (Perl returns the empty string, which is treated as 0 in numeric contexts)
In this example, the value stored in the scalar variable $var
is searched for the pattern abc. If abc is found,
$result is assigned a nonzero value; otherwise, $result
is set to a false value.
The !~ operator is similar to =~, except that
it checks whether a pattern is not matched.
$result = $var !~ /abc/;
Here, $result is set to a false value if abc appears
in the string assigned to $var, and to a nonzero value
if abc is not found.
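The two operators can be compared side by side in a short sketch. The sample string and the variable names here are made up for illustration and are not part of the book's listings.

```perl
use strict;
use warnings;

my $var = "xabcx";   # hypothetical sample value

my $result  = ($var =~ /abc/);   # true: "abc" appears in $var
my $negated = ($var !~ /abc/);   # false: !~ reverses the test

print "match:    ", ($result  ? "true" : "false"), "\n";   # match:    true
print "no match: ", ($negated ? "true" : "false"), "\n";   # no match: false
```

Testing the results with ?: rather than printing them directly avoids depending on exactly which false value Perl returns.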
Because =~ and !~ produce either true or false
as their result, these operators are ideally suited for use in
conditional expressions. Listing 7.1 is a simple program that
uses the =~ operator to test whether a particular sequence
of characters exists in a character string.
Listing 7.1. A program that illustrates the use of the matching
operator.
1:  #!/usr/local/bin/perl
2:
3:  print ("Ask me a question politely:\n");
4:  $question = <STDIN>;
5:  if ($question =~ /please/) {
6:          print ("Thank you for being polite!\n");
7:  } else {
8:          print ("That was not very polite!\n");
9:  }
$ program7_1
Ask me a question politely:
May I have a glass of water, please?
Thank you for being polite!
$
Line 5 is an example of the use of the match
operator =~ in a conditional expression. The following
expression is true if the value stored in $question contains
the word please, and it is false if it does not:
$question =~ /please/
Match-Operator Precedence
Like all operators, the match operators have a defined precedence.
By definition, the =~ and !~ operators have
higher precedence than multiplication and division, and lower
precedence than the exponentiation operator **.
For a complete list of Perl operators and their precedence, see
Day 4, "More Operators."
Special Characters in Patterns
Perl supports a variety of special characters inside patterns,
which enables you to match any of a number of character strings.
These special characters are what make patterns useful.
The + Character
The special character + means "one or more occurrences of the
preceding character." For example, the pattern /de+f/
matches any of the following:
def
deef
deeef
deeeeeeef
NOTE
Patterns containing + always try to match as many characters as possible. For example, if the pattern
/ab+/
is searching in the string
abbc
it matches abb, not ab.
The + special character makes it possible to define a
better way to split lines into words. So far, the sample programs
you have seen have used
@words = split(/ /, $line);
to break an input line into words. This works well if there is
exactly one space between words. However, if an input line contains
more than one space between words, as in
Here's  multiple  spaces.
the call to split produces the following list:
("Here's", "", "multiple", "", "spaces.")
The pattern / / tells split to start a new word
whenever it sees a space. Because there are two spaces between
each word, split starts a word when it sees the first
space, and then starts another word when it sees the second space.
This means that there are now "empty words" in the line.
The + special character gets around this problem. Suppose
the call to split is changed to this:
@words = split(/ +/, $line);
Because the pattern / +/ tries to match as many blank
characters as possible, the line
Here's  multiple  spaces.
produces the following list:
("Here's", "multiple", "spaces.")
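The difference between the two patterns can be checked directly. This is a minimal sketch using the same sample line; the array names are chosen here for illustration only.

```perl
use strict;
use warnings;

my $line = "Here's  multiple  spaces.";   # two spaces between words

my @single = split(/ /,  $line);   # a new field starts at every space
my @plus   = split(/ +/, $line);   # a new field starts after each run of spaces

print scalar(@single), " fields with / /\n";    # 5, two of them empty strings
print scalar(@plus),   " fields with / +/\n";   # 3
```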
Listing 7.2 shows how you can use the / +/ pattern to
produce a count of the number of words in a file.
Listing 7.2. A word-count program that handles multiple spaces
between words.
1:  #!/usr/local/bin/perl
2:
3:  $wordcount = 0;
4:  $line = <STDIN>;
5:  while ($line ne "") {
6:          chop ($line);
7:          @words = split(/ +/, $line);
8:          $wordcount += @words;
9:          $line = <STDIN>;
10: }
11: print ("Total number of words: $wordcount\n");
$ program7_2
Here is some input.
Here are some more words.
Here is my last line.
^D
Total number of words: 14
$
This is the same word-count program you saw
in Listing 5.15, with only one change: The pattern / +/
is being used to break the line into words. As you can see, this
handles spaces between words properly.
You might have noticed the following problems with this word-count
program:
Spaces at the beginning of a line are counted as a word, because
split always starts a new word when it sees a space.
Tab characters are not treated as separators: a tab surrounded by spaces is counted as a word, and two words separated only by a tab are counted as one.
For an example of the first problem, take a look at the following
input line:
   This line contains leading spaces.
The call to split in line 7 breaks the preceding line into
the following list:
("", "This", "line", "contains", "leading", "spaces.")
This yields a word count of 6, not the expected 5.
There can be at most one empty word produced from a line, no matter
how many leading spaces there are, because the pattern / +/
matches as many spaces as possible. Note also that the program
can distinguish between lines containing words and lines that
are blank or contain just spaces. If a line is blank or contains
only spaces, the line
@words = split(/ +/, $line);
assigns the empty list to @words. Because of this, you
can fix the problem of leading spaces in lines by modifying line
8 as follows:
$wordcount += (@words > 0 && $words[0] eq "" ?
        @words-1 : @words);
This checks for lines containing leading spaces; if a line contains
leading spaces, the first "word" (which is the empty
string) is not added to the word count.
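Perl's split also has a special case that this lesson does not use: when the first argument is the string ' ' (a single space in quotes rather than a pattern in slashes), split breaks the line on any run of white space and skips leading white space. The sketch below compares the two calls; it is offered as an alternative to the modified line 8, not as the book's method.

```perl
use strict;
use warnings;

my $line = "   This line contains leading spaces.";

my @with_pattern = split(/ +/, $line);   # first element is the empty string
my @awk_style    = split(' ',  $line);   # leading white space is skipped

print scalar(@with_pattern), "\n";   # 6
print scalar(@awk_style),    "\n";   # 5
```

Because the ' ' form treats tabs as white space too, it also avoids the tab problem, at the cost of hiding how the pattern works.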
To find out how to modify the program to deal with tab characters
as well as spaces, see the following section.
The [] Special Characters
The [] special characters enable you to define patterns
that match one of a group of alternatives. For example, the following
pattern matches def or dEf:
/d[eE]f/
You can specify as many alternatives as you like.
/a[0123456789]c/
This matches a, followed by any digit, followed by c.
You can combine [] with + to match a sequence
of characters of any length.
/d[eE]+f/
This matches all of the following:
def
dEf
deef
dEef
dEEEeeeEef
Any combination of E and e, in any order, is
matched by [eE]+.
You can use [] and + together to modify the
word-count program you've just seen to accept either tab characters
or spaces. Listing 7.3 shows how you can do this.
Listing 7.3. A word-count program that handles multiple spaces
and tabs between words.
1:  #!/usr/local/bin/perl
2:
3:  $wordcount = 0;
4:  $line = <STDIN>;
5:  while ($line ne "") {
6:          chop ($line);
7:          @words = split(/[\t ]+/, $line);
8:          $wordcount += @words;
9:          $line = <STDIN>;
10: }
11: print ("Total number of words: $wordcount\n");
$ program7_3
Here is some input.
Here are some more words.
Here is my last line.
^D
Total number of words: 14
$
This program is identical to Listing 7.2, except
that the pattern is now /[\t ]+/.
The \t special-character sequence represents the tab
character, and this pattern matches any combination or quantity
of spaces and tabs.
NOTE
Any escape sequence that is supported in double-quoted strings is supported in patterns. See Day 3, "Understanding Scalar Values," for a list of the escape sequences that are available.
The * and ? Special Characters
As you have seen, the + character matches one or more
occurrences of a character. Perl also defines two other special
characters that match a varying number of characters: *
and ?.
The * special character matches zero or more occurrences
of the preceding character. For example, the pattern
/de*f/
matches df, def, deef, and so on.
This character can also be used with the [] special characters.
/[eE]*/
This matches the empty string as well as any combination of E
or e in any order.
Be sure not to confuse the * special character with the + special character. If you use the wrong special character, you might not get the results that you want.
For example, suppose that you modify Listing 7.3 to call split as follows:
@words = split(/[\t ]*/, $line);
This matches zero or more occurrences of the space or tab character. When you run this with the input
a line
here's the list that is assigned to @words:
("a", "l", "i", "n", "e")
Because the pattern /[\t ]*/ also matches zero occurrences of the space or tab character, it matches after every character. This means that split starts a word after every character that is not a space or tab. (It skips spaces and tabs
because /[\t ]*/ matches them.)
The best way to avoid problems such as this one is to use the * special character only when there is another character appearing in the pattern. Patterns such as
/b*[c]/
never match the null string, because the matched sequence has to contain at least the character c.
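The effect of swapping + for * in the split pattern can be seen in a few lines. This is a small sketch with illustrative variable names, not one of the numbered listings.

```perl
use strict;
use warnings;

my $line = "a line";

my @star = split(/[\t ]*/, $line);   # pattern can match the empty string
my @plus = split(/[\t ]+/, $line);   # pattern needs at least one space or tab

print join("|", @star), "\n";   # a|l|i|n|e
print join("|", @plus), "\n";   # a|line
```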
The ? character matches zero or one occurrence of the
preceding character. For example, the pattern
/de?f/
matches either df or def. Note that it does
not match deef, because the ? character does
not match two occurrences of a character.
Escape Sequences for Special Characters
If you want your pattern to include a character that is normally
treated as a special character, precede the character with a backslash
\. For example, to check for one or more occurrences
of * in a string, use the following pattern:
/\*+/
The backslash preceding the * tells the Perl interpreter
to treat the * as an ordinary character, not as the special
character meaning "zero or more occurrences."
To include a backslash in a pattern, specify two backslashes:
/\\+/
This pattern tests for one or more occurrences of \ in
a string.
If you are running Perl 5, another way to tell Perl that a special
character is to be treated as a normal character is to precede
it with the \Q escape sequence. When the Perl interpreter
sees \Q, every character following the \Q is
treated as a normal character until \E is seen. This
means that the pattern
/\Q^ab*/
matches any occurrence of the string ^ab*, and the pattern
/\Q^ab\E*/
matches ^a followed by zero or more occurrences of b.
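The following sketch checks the two \Q patterns above against sample strings and, for contrast, tries the same characters without \Q. The strings and variable names are invented for the example.

```perl
use strict;
use warnings;

my $text = "x^ab*y";   # hypothetical sample string

# \Q quotes everything up to the end of the pattern: ^, a, b, and * are literal.
my $literal = ($text =~ /\Q^ab*/) ? "yes" : "no";

# \E ends the quoting, so the final * again means "zero or more of b".
my $partial = ("x^abbb" =~ /\Q^ab\E*/) ? "yes" : "no";

# Without \Q, ^ is an anchor, so the pattern cannot match in mid-string.
my $unquoted = ($text =~ /^ab*/) ? "yes" : "no";

print "$literal $partial $unquoted\n";   # yes yes no
```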
For a complete list of special characters in patterns that require
\ to be given their natural meaning, see the section
titled "Special-Character Precedence," which contains
a table that lists them.
TIP
In Perl, any character that is not a letter or a digit can be preceded
by a backslash. If the character isn't a special character in Perl, the
backslash is ignored.
If you are not sure whether a particular character is a special
character, preceding it with a backslash will ensure that your pattern
behaves the way you want it to.
Matching Any Letter or Number
As you have seen, the pattern
/a[0123456789]c/
matches a, followed by any digit, followed by c.
Another way of writing this is as follows:
/a[0-9]c/
Here, the range [0-9] represents any digit between 0
and 9. This pattern matches a0c, a1c, a2c,
and so on up to a9c.
Similarly, the range [a-z] matches any lowercase letter,
and the range [A-Z] matches any uppercase letter. For
example, the pattern
/[A-Z][A-Z]/
matches any two uppercase letters.
To match any uppercase letter, lowercase letter, or digit, use
the following range:
/[0-9a-zA-Z]/
Listing 7.4 provides an example of the use of ranges with the
[] special characters. This program checks whether a
given input line contains a legal Perl scalar, array, or file-variable
name. (Note that this program handles only simple input lines.
Later examples will solve this problem in a better way.)
Listing 7.4. A simple variable-name validation program.
1:  #!/usr/local/bin/perl
2:
3:  print ("Enter a variable name:\n");
4:  $varname = <STDIN>;
5:  chop ($varname);
6:  if ($varname =~ /\$[A-Za-z][_0-9a-zA-Z]*/) {
7:          print ("$varname is a legal scalar variable\n");
8:  } elsif ($varname =~ /@[A-Za-z][_0-9a-zA-Z]*/) {
9:          print ("$varname is a legal array variable\n");
10: } elsif ($varname =~ /[A-Za-z][_0-9a-zA-Z]*/) {
11:         print ("$varname is a legal file variable\n");
12: } else {
13:         print ("I don't understand what $varname is.\n");
14: }
$ program7_4
Enter a variable name:
$result
$result is a legal scalar variable
$
Line 6 checks whether the input line contains
the name of a legal scalar variable. Recall that a legal scalar
variable consists of the following:
A $ character
An uppercase or lowercase letter
Zero or more letters, digits, or underscore characters
Each part of the pattern tested in line 6 corresponds to one of
the conditions listed above. The first part of the pattern,
\$, matches the $ character that every scalar variable
name begins with.
NOTE
The $ is preceded by a backslash, because $ is a special character in patterns. See the following section, "Anchoring Patterns," for more information on the $ special character.
The second part of the pattern,
[A-Za-z]
matches exactly one uppercase or lowercase letter. The final part
of the pattern,
[_0-9a-zA-Z]*
matches zero or more underscores, digits, or letters in any order.
The patterns in line 8 and line 10 are very similar to the one
in line 6. The only difference in line 8 is that the pattern there
matches a string whose first character is @, not $.
In line 10, this first character is omitted completely.
The pattern in line 8 corresponds to the definition of a legal
array-variable name, and the pattern in line 10 corresponds to
the definition of a legal file-variable name.
Anchoring Patterns
Although Listing 7.4 can determine whether a line of input contains
a legal Perl variable name, it cannot determine whether there
is extraneous input on the line. For example, it can't tell the
difference between the following three lines of input:
$result
junk$result
$result#junk
In all three cases, the pattern
/\$[a-zA-Z][_0-9a-zA-Z]*/
finds the string $result and matches successfully; however,
only the first line is a legal Perl variable name.
To fix this problem, you can use pattern anchors. Table
7.1 lists the pattern anchors defined in Perl.
Table 7.1. Pattern anchors in Perl.
Anchor      Description
^ or \A     Match at beginning of string only
$ or \Z     Match at end of string only
\b          Match on word boundary
\B          Match inside word
These pattern anchors are described in the following sections.
The ^ and $ Pattern Anchors
The pattern anchors ^ and $ ensure that the
pattern is matched only at the beginning or the end of a string.
For example, the pattern
/^def/
matches def only if these are the first three characters
in the string. Similarly, the pattern
/def$/
matches def only if these are the last three characters
in the string.
You can combine ^ and $ to force matching of
the entire string, as follows:
/^def$/
This matches only if the string is def.
In most cases, the escape sequences \A and \Z
(defined in Perl 5) are equivalent to ^ and $,
respectively:
/\Adef\Z/
This also matches only if the string is def.
NOTE
\A and \Z behave differently from ^ and $ when the multiple-line pattern-matching option is specified. Pattern-matching options are described later today.
Listing 7.5 shows how you can use pattern anchors to ensure that
a line of input is, in fact, a legal Perl scalar-, array-, or
file-variable name.
Listing 7.5. A better variable-name validation program.
1:  #!/usr/local/bin/perl
2:
3:  print ("Enter a variable name:\n");
4:  $varname = <STDIN>;
5:  chop ($varname);
6:  if ($varname =~ /^\$[A-Za-z][_0-9a-zA-Z]*$/) {
7:          print ("$varname is a legal scalar variable\n");
8:  } elsif ($varname =~ /^@[A-Za-z][_0-9a-zA-Z]*$/) {
9:          print ("$varname is a legal array variable\n");
10: } elsif ($varname =~ /^[A-Za-z][_0-9a-zA-Z]*$/) {
11:         print ("$varname is a legal file variable\n");
12: } else {
13:         print ("I don't understand what $varname is.\n");
14: }
$ program7_5
Enter a variable name:
x$result
I don't understand what x$result is.
$
The only difference between this program and
the one in Listing 7.4 is that this program uses the pattern anchors
^ and $ in the patterns in lines 6, 8, and 10.
These anchors ensure that the input matches only if it consists
solely of the characters that make up a legal Perl scalar, array,
or file variable.
In the sample output given here, the input
x$result
is rejected, because the pattern in line 6 is matched only when
the $ character appears at the beginning of the line.
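To see the anchors at work on the three inputs from the start of this section, the line 6 pattern can be wrapped in a small subroutine. The subroutine name is_scalar_name is hypothetical and is not part of Listing 7.5; this is only a sketch of how the anchored pattern behaves.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $name is a scalar variable name
# and nothing else, 0 otherwise. Uses the pattern from line 6 of Listing 7.5.
sub is_scalar_name {
    my ($name) = @_;
    return $name =~ /^\$[A-Za-z][_0-9a-zA-Z]*$/ ? 1 : 0;
}

# Only the first input is accepted; the other two contain extra characters.
foreach my $input ('$result', 'junk$result', '$result#junk') {
    print $input, ": ", (is_scalar_name($input) ? "legal" : "rejected"), "\n";
}
```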
Word-Boundary Pattern Anchors
The word-boundary pattern anchors, \b and \B,
specify whether a matched pattern must be on a word boundary or
inside a word. (A word boundary is the beginning or end
of a word.)
The \b pattern anchor specifies that the pattern must
be on a word boundary. For example, the pattern
/\bdef/
matches only if def is the beginning of a word. This
means that def and defghi match but abcdef
does not.
You can also use \b to indicate the end of a word. For
example,
/def\b/
matches def and abcdef, but not defghi.
Finally, the pattern
/\bdef\b/
matches only the word def, not abcdef or defghi.
NOTE
A word is assumed to contain letters, digits, and underscore characters, and nothing else. This means that
/\bdef/
matches $defghi: because $ is not assumed to be part of a word, def is the beginning of the word defghi, and /\bdef/ matches it.
The \B pattern anchor is the opposite of \b.
\B matches only if the pattern is contained in a word.
For example, the pattern
/\Bdef/
matches abcdef, but not def. Similarly, the
pattern
/def\B/
matches defghi, and
/\Bdef\B/
matches cdefg or abcdefghi, but not def,
defghi, or abcdef.
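The combinations described above can be tabulated with a short sketch. The subroutine name anchors_matching is made up for this example; it simply reports which of four anchored patterns match a given word.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the anchored patterns (as text) that match $word.
sub anchors_matching {
    my ($word) = @_;
    my @hits;
    push(@hits, '\bdef')   if $word =~ /\bdef/;
    push(@hits, 'def\b')   if $word =~ /def\b/;
    push(@hits, '\bdef\b') if $word =~ /\bdef\b/;
    push(@hits, '\Bdef\B') if $word =~ /\Bdef\B/;
    return join(" ", @hits);
}

foreach my $word ("def", "abcdef", "defghi", "abcdefghi") {
    print $word, " => ", anchors_matching($word), "\n";
}
```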
The \b and \B pattern anchors enable you to
search for words in an input line without having to break up the
line using split. For example, Listing 7.6 uses \b
to count the number of lines of an input file that contain the
word the.
Listing 7.6. A program that counts the number of input lines
containing the word the.
1:  #!/usr/local/bin/perl
2:
3:  $thecount = 0;
4:  print ("Enter the input here:\n");
5:  $line = <STDIN>;
6:  while ($line ne "") {
7:          if ($line =~ /\bthe\b/) {
8:                  $thecount += 1;
9:          }
10:         $line = <STDIN>;
11: }
12: print ("Number of lines containing 'the': $thecount\n");
$ program7_6
Enter the input here:
Now is the time
for all good men
to come to the aid
of the party.
^D
Number of lines containing 'the': 3
$
This program checks each line in turn to see
if it contains the word the, and then prints the total
number of lines that contain the word.
Line 7 performs the actual checking by trying to match the pattern
/\bthe\b/
If this pattern matches, the line contains the word the,
because the pattern checks for word boundaries at either end.
Note that this program doesn't check whether the word the
appears on a line more than once. It is not difficult to modify
the program to do this; in fact, you can do it in several different
ways.
The most obvious but most laborious way is to break up lines that
you know contain the into words, and then check each
word, as follows:
if ($line =~ /\bthe\b/) {
        @words = split(/[\t ]+/, $line);
        $count = 1;
        while ($count <= @words) {
                if ($words[$count-1] eq "the") {
                        $thecount += 1;
                }
                $count++;
        }
}
A more compact way to accomplish the same thing is to use the pattern
itself to break the line into words:
if ($line =~ /\bthe\b/) {
        @words = split(/\bthe\b/, $line);
        $thecount += @words - 1;
}
In fact, you don't even need the if statement.
@words = split(/\bthe\b/, $line);
$thecount += @words - 1;
Here's why this works: Every time split sees the word
the, it starts a new word. Therefore, the number of occurrences
of the is equal to one less than the number of elements
in @words. If there are no occurrences of the,
@words has the length 1, and $thecount is not
changed.
This trick works only if you know that there is at least one word on the line.
Consider the following code, which tries to use the aforementioned
trick on a line that has had its newline character removed using chop:
$line = <STDIN>;
chop ($line);
@words = split(/\bthe\b/, $line);
$thecount += @words - 1;
This code actually subtracts 1 from $thecount if the line is blank or consists only of the word the, because in these cases @words is the empty list and the length of @words is 0.
More generally, because split discards trailing empty fields, an occurrence of the at the very end of a chopped line is not counted.
Leaving off the call to chop protects against
this problem, because there will always be at least one "word" in every
line (consisting of the newline character).
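Another way around the trailing-field problem, not used in this lesson, is split's optional third argument: a negative limit tells split to keep trailing empty fields. The sketch below assumes a hypothetical helper named count_the; note that split still returns no fields at all for an empty string, so that case is handled separately.

```perl
use strict;
use warnings;

# Hypothetical helper: counts occurrences of the word "the" in $line.
sub count_the {
    my ($line) = @_;
    return 0 if $line eq "";   # split returns no fields for an empty string
    # The limit -1 makes split keep trailing empty fields.
    my @words = split(/\bthe\b/, $line, -1);
    return @words - 1;
}

print count_the("the"), "\n";                 # 1
print count_the("to the aid of the"), "\n";   # 2
print count_the("no match here"), "\n";       # 0
```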
Variable Substitution in Patterns
If you like, you can use the value of a scalar variable in a pattern.
For example, the following code splits the line $line
into words:
$pattern = "[\\t ]+";
@words = split(/$pattern/, $line);
Because you can use a scalar variable in a pattern, there is nothing
to stop you from reading the pattern from the standard input file.
Listing 7.7 accepts a search pattern from the standard input file and
then searches for the pattern in the input files listed on the command line.
If it finds the pattern, it prints the filename and line number
of the match; at the end, it prints the total number of matches.
This example assumes that two files exist, file1 and
file2. Each file contains the following:
This is a line of input.
This is another line of input.
If you run this program with command-line arguments file1
and file2 and search for the pattern another,
you get the output shown.
Listing 7.7. A simple pattern-search program.
1:  #!/usr/local/bin/perl
2:
3:  print ("Enter the search pattern:\n");
4:  $pattern = <STDIN>;
5:  chop ($pattern);
6:  $filename = $ARGV[0];
7:  $linenum = $matchcount = 0;
8:  print ("Matches found:\n");
9:  while ($line = <>) {
10:         $linenum += 1;
11:         if ($line =~ /$pattern/) {
12:                 print ("$filename, line $linenum\n");
13:                 @words = split(/$pattern/, $line);
14:                 $matchcount += @words - 1;
15:         }
16:         if (eof) {
17:                 $linenum = 0;
18:                 $filename = $ARGV[0];
19:         }
20: }
21: if ($matchcount == 0) {
22:         print ("No matches found.\n");
23: } else {
24:         print ("Total number of matches: $matchcount\n");
25: }
$ program7_7 file1 file2
Enter the search pattern:
another
Matches found:
file1, line 2
file2, line 2
Total number of matches: 2
$
This program uses the following scalar variables
to keep track of information:
$pattern contains the search pattern read in from
the standard input file.
$filename contains the file currently being searched.
$linenum contains the line number of the line currently
being searched.
$matchcount contains the total number of matches
found to this point.
Line 6 sets the current filename, which corresponds to the first
element of the built-in array variable @ARGV. This array
variable lists the arguments supplied on the command line. (To
refresh your memory on how @ARGV works, refer back to
Day 6, "Reading from and Writing to Files.") This current
filename needs to be stored in a scalar variable, because the
<> operator in line 9 shifts @ARGV and
destroys this name.
Line 9 reads from each of the files on the command line in turn,
one line at a time. The current input line is stored in the scalar
variable $line. Once the line is read, line 10 adds 1
to the current line number.
Lines 11-15 handle the matching process. Line 11 checks whether
the pattern stored in $pattern is contained in the input
line stored in $line. If a match is found, line 12 prints
out the current filename and line number. Line 13 then splits
the line into "words," using the trick described in
the earlier section, "Word-Boundary Pattern Anchors."
Because the number of elements of the list stored in @words
is one larger than the number of times the pattern is matched,
the expression @words - 1 is equivalent to the number
of matches; its value is added to $matchcount.
Line 16 checks whether the <> operator has reached
the end of the current input file. If it has, line 17 resets the
current line number to 0. This ensures that the next pass through
the loop will set the current line number to 1 (to indicate that
the program is on the first line of the next file). Line 18 sets
the filename to the next file mentioned on the command line, which
is currently stored in $ARGV[0].
Lines 21-25 either print the total number of matches or indicate
that no matches were found.
NOTE
Make sure that you remember to include the enclosing /
characters when you use a scalar-variable name in a pattern. The Perl
interpreter does not complain when it sees the following, for example,
but the result might not be what you want:
@words = split($pattern, $line);
Excluding Alternatives
As you have seen, when the special characters [] appear
in a pattern, they specify a set of alternatives to choose from.
For example, the pattern
/d[eE]f/
matches def or dEf.
When the ^ character appears as the first character after
the [, it indicates that the pattern is to match any
character except the ones displayed between the [
and ]. For example, the pattern
/d[^eE]f/
matches any pattern that satisfies the following criteria:
The first character is d.
The second character is anything other than e or
E.
The last character is f.
NOTE
To include a ^ character in a set of alternatives, precede it with a backslash, as follows:
/d[\^eE]f/
This pattern matches d^f, def, or dEf.
Character-Range Escape Sequences
In the section titled "Matching Any Letter or Number"
earlier in this chapter, you learned that you can represent consecutive
letters or numbers inside the [] special characters by
specifying ranges. For example, in the pattern
/a[1-3]c/
the [1-3] matches any of 1, 2, or 3.
Some ranges occur frequently enough that Perl defines special
escape sequences for them. For example, instead of writing
/[0-9]/
to indicate that any digit is to be matched, you can write
/\d/
The \d escape sequence means "any digit."
Table 7.2 lists the character-range escape sequences, what they
match, and their equivalent character ranges.
Table 7.2. Character-range escape sequences.
Escape sequence   Description                         Range
\d                Any digit                           [0-9]
\D                Anything other than a digit         [^0-9]
\w                Any word character                  [_0-9a-zA-Z]
\W                Anything not a word character       [^_0-9a-zA-Z]
\s                White space                         [ \r\t\n\f]
\S                Anything other than white space     [^ \r\t\n\f]
These escape sequences can be used anywhere ordinary characters
are used. For example, the following pattern matches any digit
or lowercase letter:
/[\da-z]/
NOTE
The definition of word boundary as used by the \b and \B special characters corresponds to the definition of word character used by \w and \W.
If the pattern /\w\W/ matches a particular
pair of characters, the first character is part of a word and the
second is not; this means that the first character is the end of a
word, and that a word boundary exists between the first and second
characters matched by the pattern.
Similarly, if /\W\w/ matches a pair of
characters, the first character is not part of a word and the second
character is. This means that the second character is the beginning of
a word. Again, a word boundary exists between the first and second
characters matched by the pattern.
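As a quick illustration of Table 7.2, the sketch below applies \s, \d, and \w\W to one sample line. The line and the variable names are invented for the example, and the equivalences in the comments assume plain ASCII input.

```perl
use strict;
use warnings;

my $line = "item42\tcosts 3 dollars";   # hypothetical input containing a tab

my @fields    = split(/\s+/, $line);          # split on runs of white space
my $has_digit = ($line =~ /\d/)    ? 1 : 0;   # like /[0-9]/ for ASCII text
my $word_end  = ($line =~ /\w\W/)  ? 1 : 0;   # word character, then non-word character

print scalar(@fields), " fields\n";   # 4 fields
print "digit found: $has_digit, word end found: $word_end\n";
```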
Matching Any Character
Another special character supported in patterns is the period
(.) character, which matches any character except the
newline character. For example, the following pattern matches
d, followed by any non-newline character, followed by
f:
/d.f/
The . character is often used in conjunction with the
* character. For example, the following pattern matches
any string that contains the character d preceding the
character f:
/d.*f/
Normally, the .* special-character combination tries
to match as much as possible. For example, if the string banana
is searched using the following pattern, the pattern matches banana,
not ba or bana:
/b.*a/
NOTE
There is one exception to the preceding rule: The .* combination matches only the longest possible string that still enables the pattern match as a whole to succeed.
For example, suppose the string Mississippi is searched using the pattern
/M.*i.*pi/
Here, the first .* in /M.*i.*pi/ matches the characters ississ.
If it tried to go further and match ississi or even ississip,
there would be nothing left for the rest of the pattern to match.
When the first .* match is limited to ississ,
the rest of the pattern, i.*pi, matches ippi, and the pattern as a whole succeeds.
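You can confirm what each .* ends up matching by enclosing it in parentheses, a feature described later today in "Reusing Portions of Patterns." This is only a sketch; the variable names are chosen for illustration.

```perl
use strict;
use warnings;

my $string = "Mississippi";

# In list context, a match returns what each set of parentheses matched.
my ($first, $second) = $string =~ /M(.*)i(.*)pi/;

print "first .* matched '$first'\n";     # first .* matched 'ississ'
print "second .* matched '$second'\n";   # second .* matched 'p'
```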
Matching a Specified Number of Occurrences
Several special characters in patterns that you have seen enable
you to match a specified number of occurrences of a character.
For example, + matches one or more occurrences of a character,
and ? matches zero or one occurrences.
Perl enables you to define how many occurrences of a character
constitute a match. To do this, use the special characters {
and }.
For example, the pattern
/de{1,3}f/
matches d, followed by one, two, or three occurrences
of e, followed by f. This means that def,
deef, and deeef match, but df and deeeef
do not.
To specify an exact number of occurrences, include only one value
between the { and the }.
/de{3}f/
This specifies exactly three occurrences of e, which
means this pattern only matches deeef.
To specify a minimum number of occurrences, leave off the upper
bound.
/de{3,}f/
This matches d, followed by at least three es,
followed by f.
Finally, to specify a maximum number of occurrences, use 0 as
the lower bound.
/de{0,3}f/
This matches d, followed by no more than three es,
followed by f.
NOTE
You can use { and } with character ranges or any other special character, as follows:
/[a-z]{1,3}/
This matches one, two, or three lowercase letters.
/.{3}/
This matches any three characters.
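The four forms of { and } can be compared on a handful of strings. The helper name quantifiers_matching is hypothetical; the sketch just reports which of the patterns from this section match each string.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the repetition counts whose pattern matches $s.
sub quantifiers_matching {
    my ($s) = @_;
    my @ok;
    push(@ok, '{1,3}') if $s =~ /de{1,3}f/;
    push(@ok, '{3}')   if $s =~ /de{3}f/;
    push(@ok, '{3,}')  if $s =~ /de{3,}f/;
    push(@ok, '{0,3}') if $s =~ /de{0,3}f/;
    return join(" ", @ok);
}

foreach my $s ("df", "def", "deeef", "deeeef") {
    print $s, " => ", quantifiers_matching($s), "\n";
}
```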
Specifying Choices
The special character | enables you to specify two or
more alternatives to choose from when matching a pattern. For
example, the pattern
/def|ghi/
matches either def or ghi. The pattern
/[a-z]+|[0-9]+/
matches one or more lowercase letters or one or more digits.
Listing 7.8 is a simple example of a program that uses the |
special character. It reads a number and checks whether it is
a legitimate Perl integer.
Listing 7.8. A simple integer-validation program.
1:  #!/usr/local/bin/perl
2:
3:  print ("Enter a number:\n");
4:  $number = <STDIN>;
5:  chop ($number);
6:  if ($number =~ /^-?\d+$|^-?0[xX][\da-fA-F]+$/) {
7:          print ("$number is a legal integer.\n");
8:  } else {
9:          print ("$number is not a legal integer.\n");
10: }
$ program7_8
Enter a number:
0x3ff1
0x3ff1 is a legal integer.
$
Recall that Perl integers can be in any of
three forms:
Standard base-10 notation, as in 123
Base-8 (octal) notation, indicated by a leading 0,
as in 0123
Base-16 (hexadecimal) notation, indicated by a leading 0x
or 0X, as in 0X1ff
Line 6 checks whether a number is a legal Perl integer. The first
alternative in the pattern,
^-?\d+$
matches a string consisting of one or more digits, optionally
preceded by a -. (The ^ and $ characters
ensure that this is the only string that matches.) This takes
care of integers in standard base-10 notation and integers in
octal notation.
The second alternative in the pattern,
^-?0[xX][\da-fA-F]+$
matches integers in hexadecimal notation. Take a look at this
pattern one piece at a time:
The ^ matches the beginning of the line. This ensures
that lines containing leading spaces or extraneous characters
are not treated as valid hexadecimal integers.
The -? matches a - if it is present. This
ensures that negative numbers are matched.
The 0 matches the leading 0.
The [xX] matches the x or X that
follows the leading 0.
The [\da-fA-F] matches any digit, any letter between
a and f, or any letter between A and
F. Recall that these are precisely the characters that
are allowed to appear in hexadecimal numbers.
The + indicates that the pattern is to match one
or more hexadecimal digits.
The closing $ indicates that the pattern is to match
only if there are no extraneous characters following the hexadecimal
integer.
Beware that the following pattern matches either x or one or more of y, not one or more of x or y:
/x|y+/
See the section called "Special-Character Precedence" later
today for details on how to specify special-character precedence in
patterns.
Reusing Portions of Patterns
Suppose that you want to write a pattern that matches the following:
One or more digits or lowercase letters
Followed by a colon or semicolon
Followed by another group of one or more digits or lowercase
letters
Another colon or semicolon
Yet another group of one or more digits or lowercase letters
One way to indicate this pattern is as follows:
/[\da-z]+[:;][\da-z]+[:;][\da-z]+/
This pattern is somewhat complicated and is quite repetitive.
Perl provides an easier way to specify patterns that contain multiple
repetitions of a particular sequence. When you enclose a portion
of a pattern in parentheses, as in
([\da-z]+)
Perl stores the matched sequence in memory. To retrieve a sequence
from memory, use the special character \n, where n
is an integer representing the nth pattern stored in
memory (as in \1 or \2).
For example, a pattern similar to the aforementioned one can be written as
/([\da-z]+)[:;]\1[:;]\1/
Here, the text matched by [\da-z]+ is stored in memory.
When the Perl interpreter sees the escape sequence \1,
it matches that same text again. (This makes the rewritten pattern
stricter than the original: all three groups must now be identical.)
You also can store the sequence [:;] in memory, and write
this pattern as follows:
/([\da-z]+)([:;])\1\2\1/
Pattern sequences are stored in memory from left to right, so
\1 represents the text matched by [\da-z]+
and \2 represents the text matched by [:;].
Pattern-sequence memory is often used when you want to match the
same character in more than one place but don't care which character
you match. For example, if you are looking for a date in dd-mm-yy
format, you might want to match
/\d{2}([\W])\d{2}\1\d{2}/
This matches two digits, a non-word character, two more digits,
the same non-word character, and two more digits. This means that
the following strings all match:
12-05-92
26.11.87
07 04 92
However, the following string does not match:
21-05.91
This is because the pattern is looking for a - between
the 05 and the 91, not a period.
Beware that the pattern
/\d{2}([\W])\d{2}\1\d{2}/
is not the same as the pattern
/(\d{2})([\W])\1\2\1/
In the first pattern, any digit can appear anywhere. The second pattern
matches any two digits as the first two characters, but then only
matches the same two digits again. This means that
17-17-17
matches, but the following does not:
17-05-91
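The date example can be tried on the sample strings from this section. The subroutine name looks_like_date is invented for this sketch, which uses the same unanchored pattern as the text.

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if $s contains dd?mm?yy with the same
# non-word separator in both positions, 0 otherwise.
sub looks_like_date {
    my ($s) = @_;
    return $s =~ /\d{2}([\W])\d{2}\1\d{2}/ ? 1 : 0;
}

# The first three strings match; the last mixes two separators and does not.
foreach my $s ("12-05-92", "26.11.87", "07 04 92", "21-05.91") {
    print $s, ": ", (looks_like_date($s) ? "match" : "no match"), "\n";
}
```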
Pattern-SequenceScalarVariables
Notethatpattern-sequencememoryispreservedonlyforthelength
ofthepattern.Thismeansthatifyoudefinethefollowingpattern
(which,incidentally,matchesanyfloating-pointnumberthatdoes
notcontainanexponent):
/-?(\d+)\.?(\d+)/
youcannotthendefineanotherpattern,suchasthefollowing:
/\1/
andexpectthePerlinterpretertorememberthat\1refers
tothefirst\d+(thedigitsbeforethedecimalpoint).
Togetaroundthisproblem,Perldefinesspecialbuilt-invariables
thatrememberthevalueofpatternsmatchedinparentheses.These
specialvariablesarenamed$n,wherenisthe
nthsetofparenthesesinthepattern.
For example, consider the following:
$string = "This string contains the number 25.11.";
$string =~ /-?(\d+)\.?(\d+)/;
$integerpart = $1;
$decimalpart = $2;
In this case, the pattern
/-?(\d+)\.?(\d+)/
matches 25.11, and the subpattern in the first set of
parentheses matches 25. This means that 25 is
stored in $1 and is later assigned to $integerpart.
Similarly, the second set of parentheses matches 11,
which is stored in $2 and later assigned to $decimalpart.
The values stored in $1, $2,
and so on, are overwritten when another pattern match succeeds. If
you need these values, be sure to assign them to other scalar
variables.
There is also one other built-in scalar variable, $&,
which contains the entire matched pattern, as follows:
$string = "This string contains the number 25.11.";
$string =~ /-?(\d+)\.?(\d+)/;
$number = $&;
Here, the pattern matched is 25.11, which is stored in
$& and then assigned to $number.
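The following is a minimal sketch of saving the pattern-sequence variables before they can be overwritten. It copies the values inside an if block so that they are only used when the match actually succeeded; that guard is an addition for this example.

```perl
use strict;
use warnings;

my $string = "This string contains the number 25.11.";
my ($integerpart, $decimalpart, $number) = ("", "", "");

if ($string =~ /-?(\d+)\.?(\d+)/) {
    # Copy the values right away; a later successful match replaces them.
    $integerpart = $1;   # first set of parentheses
    $decimalpart = $2;   # second set of parentheses
    $number      = $&;   # the entire matched text
}

print("$integerpart $decimalpart $number\n");
```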
Special-Character Precedence
Perl defines rules of precedence to determine the order in which
special characters in patterns are interpreted. For example, the
pattern
/x|y+/
matches either x or one or more occurrences of y,
because + has higher precedence than | and is
therefore interpreted first.
Table 7.3 lists the special characters that can appear in patterns
in order of precedence (highest to lowest). Special characters
with higher precedence are always interpreted before those of
lower precedence.
Table 7.3. The precedence of pattern-matching special characters.
Special character    Description
()                   Pattern memory
+ * ? {}             Number of occurrences
^ $ \b \B            Pattern anchors
|                    Alternatives
Because the pattern-memory special characters () have
the highest precedence, you can use them to force other special
characters to be evaluated first. For example, the pattern
(ab|cd)+
matches one or more occurrences of either ab or cd.
This matches, for example, abcdab.
Remember that when you use parentheses
to force the order of precedence, you also are storing into pattern
memory. For example, in the sequence
/(ab|cd)+(.)(ef|gh)+\1/
the \1 refers to what ab|cd matched, not to what the . special character matched.
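To make the precedence rule concrete, this small sketch compares /x|y+/ with the parenthesized form /(x|y)+/ on the same input. The test string and variable names are made up for the example.

```perl
use strict;
use warnings;

my $input = "yyxy";
my ($plain, $grouped, $remembered) = ("", "", "");

# + binds more tightly than |, so this is "x" or "one or more y".
if ($input =~ /x|y+/) {
    $plain = $&;
}

# Parentheses force the alternation to be evaluated first,
# and they also store what they matched in pattern memory.
if ($input =~ /(x|y)+/) {
    $grouped    = $&;
    $remembered = $1;   # what the group matched on its last repetition
}

print("$plain $grouped $remembered\n");
```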
Now that you know all of the special-pattern characters and their
precedence, look at a program that does more complex pattern matching.
Listing 7.9 uses the various special-pattern characters, including
the parentheses, to check whether a given input string is a valid
twentieth-century date.
Listing 7.9. A date-validation program.
1:  #!/usr/local/bin/perl
2:
3:  print("Enter a date in the format YYYY-MM-DD:\n");
4:  $date = <STDIN>;
5:  chop($date);
6:
7:  # Because this pattern is complicated, we split it
8:  # into parts, assign the parts to scalar variables,
9:  # then substitute them in later.
10:
11: # handle 31-day months
12: $md1 = "(0[13578]|1[02])\\2(0[1-9]|[12]\\d|3[01])";
13: # handle 30-day months
14: $md2 = "(0[469]|11)\\2(0[1-9]|[12]\\d|30)";
15: # handle February, without worrying about whether it's
16: # supposed to be a leap year or not
17: $md3 = "02\\2(0[1-9]|[12]\\d)";
18:
19: # check for a twentieth-century date
20: $match = $date =~ /^(19)?\d\d(.)($md1|$md2|$md3)$/;
21: # check for a valid but non-20th century date
22: $olddate = $date =~ /^(\d{1,4})(.)($md1|$md2|$md3)$/;
23: if ($match) {
24:     print("$date is a valid date\n");
25: } elsif ($olddate) {
26:     print("$date is not in the 20th century\n");
27: } else {
28:     print("$date is not a valid date\n");
29: }
$ program7_9
Enter a date in the format YYYY-MM-DD:
1991-04-31
1991-04-31 is not a valid date
$
Don't worry: this program is a lot less complicated
than it looks! Basically, this program does the following:
It checks whether the date is in the format YYYY-MM-DD. (It allows YY-MM-DD, and also enables you to use a character other than a hyphen to separate the year, month, and date.)
It checks whether the year is in the twentieth century or not.
It checks whether the month is between 01 and 12.
Finally, it checks whether the date field is a legal date for that month. Legal date fields are between 01 and either 29, 30, or 31, depending on the number of days in that month.
If the date is legal, the program tells you so. If the date is
not a twentieth-century date but is legal, the program informs
you of this also.
Because the pattern to be matched is too long to fit on one line,
this program breaks it into pieces and assigns the pieces to scalar
variables. This is possible because scalar-variable substitution
is supported in patterns.
Line 12 is the pattern to match for months with 31 days. Note
that the escape sequences (such as \d) are preceded by
another backslash (producing \\d). This is because the
program actually wants to store a backslash in the scalar variable.
(Recall that backslashes in double-quoted strings are treated
as escape sequences.) The pattern
(0[13578]|1[02])\2(0[1-9]|[12]\d|3[01])
which is assigned to $md1, consists of the following
components:
The sequence (0[13578]|1[02]), which matches the month values 01, 03, 05, 07, 08, 10, and 12 (the 31-day months)
\2, which matches the character that separates the day, month, and year
The sequence (0[1-9]|[12]\d|3[01]), which matches any two-digit number between 01 and 31
Note that \2 matches the separator character because
the separator character will eventually be the second pattern
sequence stored in memory (when the pattern is finally assembled).
Line 14 is similar to line 12 and handles 30-day months. The only
differences between this subpattern and the one in line 12 are
as follows:
The month values accepted are 04, 06, 09, and 11.
The valid date fields are 01 through 30, not 01 through 31.
Line 17 is another similar pattern that checks whether the month
is 02 (February) and the date field is between 01
and 29.
Line 20 does the actual pattern match that checks whether the
date is a valid twentieth-century date. This pattern is divided
into three parts.
^(19)?\d\d, which matches any two-digit number at the beginning of a line, or any four-digit number starting with 19
The separator character, which is the second item in parentheses (the second item stored in memory) and thus can be retrieved using \2
($md1|$md2|$md3)$, which matches any of the valid month-day combinations defined in lines 12, 14, and 17, provided it appears at the end of the line
The result of the pattern match, either true or false, is stored
in the scalar variable $match.
Line 22 checks whether the date is a valid date in any century.
The only difference between this pattern and the one in line 20
is that the year can be any one-to-four-digit number. The result
of the pattern match is stored in $olddate.
Lines 23-29 check whether either $match or $olddate
is true and print the appropriate message.
As you can see, the pattern-matching facility in Perl is quite
powerful. This program is less than 30 lines long, including comments;
the equivalent program in almost any other programming language
would be substantially longer and much more difficult to write.
Specifying a Different Pattern Delimiter
So far, all the patterns you have seen have been enclosed by /
characters.
/de*f/
These / characters are known as pattern delimiters.
Because / is the pattern-delimiter character, you must
use \/ to include a / character in a pattern.
This can become awkward if you are searching for a directory such
as, for example, /u/jqpublic/perl/prog1.
/\/u\/jqpublic\/perl\/prog1/
To make it easier to write patterns that include / characters,
Perl enables you to use any pattern-delimiter character you like.
The following pattern also matches the directory /u/jqpublic/perl/prog1:
m!/u/jqpublic/perl/prog1!
Here, the m indicates the pattern-matching operation.
If you are using a pattern delimiter other than /, you
must include the m.
There are two things you should watch out for when you use other pattern delimiters.
First, if you use the ' character as a pattern delimiter, the Perl interpreter does not substitute for scalar-variable names.
m'$var'
Here the pattern is the text $var exactly as written, not the current value of the scalar variable $var. (Inside a pattern, $ still acts as the end-of-line anchor, so write \$var if you want to match the characters $var themselves.)
Second, if you use a pattern delimiter that is normally a
special-pattern character, you will not be able to use that special
character in your pattern. For example, if you want to match the
pattern ab?c (which matches a, optionally
followed by b, followed by c) you cannot use the ? character as a pattern delimiter. The pattern
m?ab?c?
produces a syntax error, because the Perl interpreter assumes that the ? after the b is a pattern delimiter. You can still use
m?ab\?c?
but this pattern won't match what you want. Because the ? inside the pattern is escaped, the Perl interpreter assumes that you want to match the actual ? character, and the pattern matches the sequence ab?c.
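As a brief sketch of the delimiter choices discussed above, the following compares the default / delimiter with two alternatives. The variable names, and the use of # as a second alternative delimiter, are assumptions made for illustration.

```perl
use strict;
use warnings;

my $path = "/u/jqpublic/perl/prog1";

# Default delimiter: every / inside the pattern must be escaped.
my $with_slashes = ($path =~ /\/u\/jqpublic\/perl\/prog1/) ? 1 : 0;

# Other delimiters: the leading m is required, but the escapes are not.
my $with_bang = ($path =~ m!/u/jqpublic/perl/prog1!) ? 1 : 0;
my $with_hash = ($path =~ m#^/u/jqpublic/#) ? 1 : 0;

print("$with_slashes $with_bang $with_hash\n");
```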
Pattern-Matching Options
When you specify a pattern, you also can supply options that control
how the pattern is to be matched. Table 7.4 lists these pattern-matching
options.
Table 7.4. Pattern-matching options.
Option    Description
g         Match all possible patterns
i         Ignore case
m         Treat string as multiple lines
o         Only evaluate once
s         Treat string as single line
x         Ignore white space in pattern
All pattern options are included immediately after the pattern.
For example, the following pattern uses the i option
to ignore case:
/ab*c/i
You can specify as many of the options as you like, and the options
can be in any order.
Matching All Possible Patterns
The g option tells the Perl interpreter to match all
the possible patterns in a string. For example, if you search
the string balata using the pattern
/.a/g
which matches any character followed by a, the pattern
matches ba, la, and ta.
If a pattern with the g option specified appears as an
assignment to an array variable, the array variable is assigned
a list consisting of all the patterns matched. For example,
@matches = "balata" =~ /.a/g;
assigns the following list to @matches:
("ba", "la", "ta")
Now, consider the following statement:
$match = "balata" =~ /.a/g;
Here the match is evaluated in a scalar context, so $match is
assigned a true value if the pattern matches, not the matched text.
The first time the pattern is matched this way, it finds ba (which is
available in $&). If the match is performed again on the same string,
it continues from where it left off and finds la,
and so on until the pattern runs out of matches.
This means that you can use patterns with the g option
in loops. Listing 7.10 shows how this works.
Listing 7.10. A program that loops using a pattern.
1:  #!/usr/local/bin/perl
2:
3:  while ("balata" =~ /.a/g) {
4:      $match = $&;
5:      print("$match\n");
6:  }
$ program7_10
ba
la
ta
$
The first time through the loop, $match
has the value of the first pattern matched, which is ba.
(The system variable $& always contains the last
pattern matched; this pattern is assigned to $match in
line 4.) When the loop is executed for a second time, $match
has the value la. The third time through, $match
has the value ta. After this, the loop terminates; because
the pattern doesn't match anything else, the conditional expression
is now false.
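The difference between using the g option in a list and in a loop can be sketched as follows. Storing the string in a variable named $string, and collecting the loop results in @seen, are choices made only for this example.

```perl
use strict;
use warnings;

my $string = "balata";

# Assigning to an array: every match is returned at once.
my @matches = ($string =~ /.a/g);

# In a loop condition: each evaluation finds the next match and
# returns true; the matched text itself is read from $&.
my @seen;
while ($string =~ /.a/g) {
    push(@seen, $&);
}

print("@matches\n");
print("@seen\n");
```

Both lines of output list ba, la, and ta in that order.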
Determining the Match Location
If you need to know how much of a string has been searched by
the pattern matcher when the g option is specified,
use the pos function.
$offset = pos($string);
This returns the position at which the next pattern match will
be started.
You can reposition the pattern matcher by putting pos()
on the left side of an assignment.
pos($string) = $newoffset;
This tells the Perl interpreter to start the next pattern match
at the position specified by $newoffset.
If you change the string being searched, the match position is reset to the beginning of the string.
NOTE
The pos function is not available in Perl version 4.
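Here is a small sketch of reading and setting the match position with pos. The offsets it records are character positions counted from 0; the variable names are invented for the example.

```perl
use strict;
use warnings;

my $string = "balata";
my @offsets;

# After each successful match, pos reports where the next search starts.
while ($string =~ /.a/g) {
    push(@offsets, pos($string));
}

# Reposition the matcher so that the next search starts at offset 2.
pos($string) = 2;
my $next = "";
if ($string =~ /.a/g) {
    $next = $&;
}

print("@offsets $next\n");
```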
Ignoring Case
The i option enables you to specify that a matched letter
can either be uppercase or lowercase. For example, the following
pattern matches de, dE, De, or DE:
/de/i
Patterns that match either uppercase or lowercase letters are
said to be case-insensitive.
Treating the String as Multiple Lines
The m option tells the Perl interpreter that the string
to be matched contains multiple lines of text. When the m
option is specified, the ^ special character matches
either the start of the string or the start of any new line. For
example, the pattern
/^The/m
matches the word The in
This pattern matches\nThe first word on the second line
The m option also specifies that the $ special
character is to match the end of any line. This means that the
pattern
/line.$/m
is matched in the following string:
This is the end of the first line.\nHere's another line.
NOTE
The m option is defined only in Perl 5. To treat a string as multiple lines when you run Perl 4, set the $* system variable, described on Day 17, "System Variables."
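The effect of the m option on ^ and $ can be checked with a short sketch like the one below. The second pattern, which collects the last word of each line, is an extra illustration that is not part of the original example.

```perl
use strict;
use warnings;

my $text = "This pattern matches\nThe first word on the second line";

# Without m, ^ matches only at the very start of the string.
my $without_m = ($text =~ /^The/)  ? 1 : 0;

# With m, ^ also matches just after each newline.
my $with_m    = ($text =~ /^The/m) ? 1 : 0;

# With m, $ matches at the end of every line, so this collects
# the last word on each line.
my @last_words = ($text =~ /(\w+)$/mg);

print("$without_m $with_m @last_words\n");
```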
Evaluating a Pattern Only Once
The o option enables you to tell the Perl interpreter
that a pattern is to be evaluated only once. For example, consider
the following:
$var = 1;
$line = <STDIN>;
while ($var < 10) {
    $result = $line =~ /$var/o;
    $line = <STDIN>;
    $var++;
}
The first time the Perl interpreter sees the pattern /$var/,
it replaces the name $var with the current value of $var,
which is 1; this means that the pattern to be matched
is /1/.
Because the o option is specified, the pattern to be
matched remains /1/ even when the value of $var
changes. If the o option had not been specified, the
pattern would have been /2/ the next time through the
loop.
TIP
There's no real reason to use the o option for patterns unless you are keen on efficiency. Here's an easier way to do the same thing:
$var = <STDIN>;
$matchval = $var;
$line = <STDIN>;
while ($var < 10) {
    $result = $line =~ /$matchval/;
    $line = <STDIN>;
    $var++;
}
The value of $matchval never changes, so the o option is not necessary.
Treating the String as a Single Line
The s option specifies that the string to be matched
is to be treated as a single line of text. In this case, the .
special character matches every character in a string, including
the newline character. For example, the pattern /a.*bc/s
is matched successfully in the following string:
axxxxx\nxxxxbc
If the s option is not specified, this pattern does not
match, because the . character does not match the newline.
NOTE
The s option is defined only in Perl 5.
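A quick way to see the s option at work is to try the same pattern with and without it, as in this minimal sketch (the variable names are made up for the example):

```perl
use strict;
use warnings;

my $text = "axxxxx\nxxxxbc";

# Without s, the . character stops at the newline, so the match fails.
my $without_s = ($text =~ /a.*bc/)  ? 1 : 0;

# With s, the . character also matches the newline.
my $with_s    = ($text =~ /a.*bc/s) ? 1 : 0;

print("$without_s $with_s\n");
```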
Using White Space in Patterns
One problem with patterns in Perl is that they can become difficult
to follow. For example, consider this pattern, which you saw earlier:
/\d{2}([\W])\d{2}\1\d{2}/
Patterns such as this are difficult to follow, because there are
a lot of backslashes, braces, and brackets to sort out.
Perl 5 makes life a little easier by supplying the x
option. This tells the Perl interpreter to ignore white space
in a pattern unless it is preceded by a backslash. This means
that the preceding pattern can be rewritten as the following,
which is much easier to follow:
/\d{2} ([\W]) \d{2} \1 \d{2}/x
Here is an example of a pattern containing an actual blank space:
/[A-Z] [a-z]+ \ [A-Z] [a-z]+/x
This matches a name in the standard first-name/last-name format
(such as John Smith). Normally, you won't want to use
the x option if you're actually trying to match white
space, because you wind up with the backslash problem all over
again.
NOTE
The x option is defined only in Perl 5.
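The following sketch shows one way the x option might be used to spread a pattern over several lines. The comments inside the pattern rely on the fact that, under x, a # starts a comment that runs to the end of the line; the layout and variable names are choices made for this example.

```perl
use strict;
use warnings;

my $date    = "12-05-92";
my $date_ok = 0;

if ($date =~ /
        \d{2}      # two digits
        ([\W])     # one non-word separator, saved in pattern memory
        \d{2}      # two more digits
        \1         # the same separator again
        \d{2}      # two final digits
    /x) {
    $date_ok = 1;
}

# A backslash keeps a real blank space in the pattern.
my $name_ok = ("John Smith" =~ /[A-Z] [a-z]+ \ [A-Z] [a-z]+/x) ? 1 : 0;

print("$date_ok $name_ok\n");
```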
The Substitution Operator
Perl enables you to replace part of a string using the substitution
operator, which has the following syntax:
s/pattern/replacement/
The Perl interpreter searches for the pattern specified by the
placeholder pattern. If it finds pattern, it
replaces it with the string represented by the placeholder replacement.
For example:
$string = "abc123def";
$string =~ s/123/456/;
Here, 123 is replaced by 456, which means that
the value stored in $string is now abc456def.
You can use any of the pattern special characters in the substitution
operator. For example,
s/[abc]+/0/
searches for a sequence consisting of one or more occurrences
of the letters a, b, and c (in any
order) and replaces the sequence with 0.
If you just want to delete a sequence of characters rather than
replace it, leave out the replacement string as in the following
example, which deletes the first occurrence of the pattern abc:
s/abc//
Using Pattern-Sequence Variables in Substitutions
You can use pattern-sequence variables to include a matched pattern
in the replacement string. The following is an example:
s/(\d+)/[$1]/
This matches a sequence of one or more digits. Because this sequence
is enclosed in parentheses, it is stored in the scalar variable
$1. In the replacement string, [$1], the scalar
variable name $1 is replaced by its value, which is the
matched pattern.
NOTE
Because the replacement string in the substitution operator is a string, not a pattern, the pattern special characters, such as [], *, and +, do not have a special meaning. For example, in the substitution
s/abc/[def]/
the replacement string is [def] (including the square brackets).
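The substitutions described so far can be tried out with a sketch like this one. Each result is written to a fresh variable by copying the original string first; that copying idiom and the variable names are assumptions made for the example.

```perl
use strict;
use warnings;

my $string = "abc123def";

# Wrap the matched digits in brackets using $1.
(my $bracketed = $string) =~ s/(\d+)/[$1]/;

# Delete the first occurrence of abc by leaving the replacement empty.
(my $deleted = $string) =~ s/abc//;

# Pattern special characters work on the left side only.
(my $collapsed = "aabbcc-xyz") =~ s/[abc]+/0/;
(my $literal   = "abc")        =~ s/abc/[def]/;

print("$bracketed $deleted $collapsed $literal\n");
```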
Options for the Substitution Operator
The substitution operator supports several options, which are
listed in Table 7.5.
Table 7.5. Options for the substitution operator.
Option    Description
g         Change all occurrences of the pattern
i         Ignore case in pattern
e         Evaluate replacement string as expression
m         Treat string to be matched as multiple lines
o         Evaluate only once
s         Treat string to be matched as single line
x         Ignore white space in pattern
As with pattern matching, options are appended to the end of the
operator. For example, to change all occurrences of abc
to def, use the following:
s/abc/def/g
Global Substitution
The g option changes all occurrences of a pattern in
a particular string. For example, the following substitution puts
parentheses around any number in the string:
s/(\d+)/($1)/g
Listing 7.11 is an example of a program that uses global substitution.
It examines each line of its input, removes all extraneous leading
spaces and tabs, and replaces multiple spaces and tabs between
words with a single space.
Listing 7.11. A simple white space cleanup program.
1:  #!/usr/local/bin/perl
2:
3:  @input = <STDIN>;
4:  $count = 0;
5:  while ($input[$count] ne "") {
6:      $input[$count] =~ s/^[ \t]+//;
7:      $input[$count] =~ s/[ \t]+\n$/\n/;
8:      $input[$count] =~ s/[ \t]+/ /g;
9:      $count++;
10: }
11: print("Formatted text:\n");
12: print(@input);
$ program7_11
This    is  a  line   of    input.
   Here    is another   line.
This   is  my    last line of   input.
^D
Formatted text:
This is a line of input.
Here is another line.
This is my last line of input.
$
This program performs three substitutions on
each line of its input. The first substitution, in line 6, checks
whether there are any spaces or tabs at the beginning of the line.
If any exist, they are removed.
Similarly, line 7 checks whether there are any spaces or tabs
at the end of the line (before the trailing newline character).
If any exist, they are removed. To do this, line 7 replaces the
following pattern (one or more spaces and tabs, followed by a
newline character, followed by the end of the line) with a newline
character:
/[ \t]+\n$/
Line 8 uses a global substitution to remove extra spaces and tabs
between words. The following pattern matches one or more spaces
or tabs, in any order; these spaces and tabs are replaced by a
single space:
/[ \t]+/
Ignoring Case
The i option ignores case when substituting. For example,
the following substitution replaces all occurrences of the words
no, No, NO, and nO with NO.
(Recall that the \b escape character specifies a word
boundary.)
s/\bno\b/NO/gi
Replacement Using an Expression
The e option treats the replacement string as an expression,
which it evaluates before replacing. For example, consider the
following:
$string = "0abc1";
$string =~ s/[a-zA-Z]+/$& x 2/e;
The substitution shown here is a quick way to duplicate part of
a string. Here's how it works:
The pattern /[a-zA-Z]+/ matches abc, which is stored in the built-in variable $&.
The e option indicates that the replacement string, $& x 2, is to be treated as an expression. This expression is evaluated, producing the result abcabc.
abcabc is substituted for abc in the string stored in $string. This means that the new value of $string is 0abcabc1.
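The steps above can be sketched in runnable form as follows. The second substitution, which doubles every integer in a sample line, is an extra illustration using an invented string.

```perl
use strict;
use warnings;

my $string = "0abc1";

# With e, the replacement is evaluated as the expression $& x 2,
# which repeats the matched text twice.
$string =~ s/[a-zA-Z]+/$& x 2/e;

# Combining e with g applies the expression to every match.
my $line = "This contains the numbers 1 and 26.";
$line =~ s/\d+/$& * 2/eg;

print("$string\n$line\n");
```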
Listing 7.12 is another example that uses the e option
in a substitution. This program takes every integer in a list
of input files and multiplies them by 2, leaving the rest of the
contents unchanged. (For the sake of simplicity, the program assumes
that there are no floating-point numbers in the file.)
Listing 7.12. A program that multiplies every integer in a
file by 2.
1:  #!/usr/local/bin/perl
2:
3:  $count = 0;
4:  while ($ARGV[$count] ne "") {
5:      open(FILE, "$ARGV[$count]");
6:      @file = <FILE>;
7:      $linenum = 0;
8:      while ($file[$linenum] ne "") {
9:          $file[$linenum] =~ s/\d+/$& * 2/eg;
10:         $linenum++;
11:     }
12:     close(FILE);
13:     open(FILE, ">$ARGV[$count]");
14:     print FILE (@file);
15:     close(FILE);
16:     $count++;
17: }
If a file named foo contains the text
This contains the number 1.
This contains the number 26.
and the name foo is passed as a command-line
argument to this program, the file foo becomes
This contains the number 2.
This contains the number 52.
This program uses the built-in variable @ARGV to retrieve
filenames from the command line. Note that the program cannot
use <>, because the following statement reads the
entire contents of all the files into a single array:
@file = <>;
Lines 8-11 read and substitute one line of a file at a time. Line
9 performs the actual substitution as follows:
The pattern \d+ matches a sequence of one or more digits, which is automatically assigned to $&.
The value of $& is substituted into the replacement string.
The e option indicates that this replacement string is to be treated as an expression. This expression multiplies the matched integer by 2.
The result of the multiplication is then substituted into the file in place of the original integer.
The g option indicates that every integer on the line is to be substituted for.
After all the lines in the file have been read, the file is closed
and reopened for writing. The call to print in line 14
takes the list stored in @file (the contents of the current
file) and writes them back out to the file, overwriting the original
contents.
Evaluating a Pattern Only Once
As with the match operator, the o option to the substitution
operator tells the Perl interpreter to replace a scalar variable
name with its value only once. For example, the following statement
substitutes the current value of $var for its name, producing
a replacement string:
$string =~ s/abc/$var/o;
This replacement string then never changes, even if the value
of $var changes. For example:
$var = 17;
while ($var > 0) {
    $string = <STDIN>;
    $string =~ s/abc/$var/o;
    print($string);
    $var--;    # the replacement string is still "17"
}
Again, as with the match operator, there is no real reason to
use the o option.
Treating the String as Single or Multiple Lines
As in the pattern-matching operator, the s and m
options specify that the string to be matched is to be treated
as a single line or as multiple lines, respectively.
The s option ensures that the newline character \n
is matched by the . special character.
$string = "This is a\ntwo-line string.";
$string =~ s/a.*o/one/s;
# $string now contains "This is one-line string."
If the m option is specified, ^ and $
match the beginning and end of any line.
$string = "The The first line\nThe The second line";
$string =~ s/^The //gm;
# $string now contains "The first line\nThe second line"
$string =~ s/e$/k/gm;
# $string now contains "The first link\nThe second link"
The \A and \Z
escape sequences (defined in Perl 5) always match only the beginning
and end of the string, respectively. (This is the only case where \A and \Z behave differently from ^ and $.)
NOTE
The m and s options are defined only in Perl 5. To treat a string as multiple lines when you run Perl 4, set the $* system variable, described on Day 17.
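Using White Space in Patterns
The x option tells the Perl interpreter to ignore all
white space unless preceded by a backslash. As with the pattern-matching
operator, ignoring white space makes complicated string patterns
easier to read.
$string =~ s/(\d{2}) ([\W]) (\d{2}) \2 (\d{2})/$1-$3-$4/x;
This converts a day-month-year string to the dd-mm-yy
format. (Each two-digit field is enclosed in its own set of parentheses so that it can be recalled in the replacement string; the separator is the second set, which is why the pattern refers to it as \2.)
NOTE
Even if the x option is specified, spaces in the replacement string are not ignored. For example, the following replaces 14/04/95 with 14 - 04 - 95, not 14-04-95:
$string =~ s/(\d{2}) ([\W]) (\d{2}) \2 (\d{2})/$1 - $3 - $4/x;
Also note that the x option is defined only in Perl 5.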
Specifying a Different Delimiter
You can specify a different delimiter to separate the pattern
and replacement string in the substitution operator. For example,
the following substitution operator replaces /u/bin with
/usr/local/bin:
s#/u/bin#/usr/local/bin#
The search and replacement strings can be enclosed in parentheses
or angle brackets.
s(/u/bin)(/usr/local/bin)
s</u/bin>/\/usr\/local\/bin/
NOTE
As with the match operator, you cannot use a special character both as a delimiter and in a pattern.
s.a.c.def.
This substitution will be flagged as containing an error because the . character is being used as the delimiter. The substitution
s.a\.c.def.
does work, but it substitutes def for a.c, where . is an actual period and not the pattern special character.
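The three delimiter styles shown above can be compared in a short sketch. The sample path /u/bin/perl and the variable names are made up for this example.

```perl
use strict;
use warnings;

my $original = "/u/bin/perl";

# A single alternative delimiter separates pattern and replacement.
(my $with_hash = $original) =~ s#/u/bin#/usr/local/bin#;

# Parentheses enclose the pattern and the replacement separately.
(my $with_parens = $original) =~ s(/u/bin)(/usr/local/bin);

# Angle brackets around the pattern, ordinary slashes around the
# replacement (which therefore needs its own / characters escaped).
(my $with_angles = $original) =~ s</u/bin>/\/usr\/local\/bin/;

print("$with_hash\n$with_parens\n$with_angles\n");
```

All three forms produce the same result.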
The Translation Operator
Perl also provides another way to substitute one group of characters
for another: the tr translation operator. This operator
uses the following syntax:
tr/string1/string2/
Here, string1 contains a list of characters to be replaced,
and string2 contains the characters that replace them.
The first character in string1 is replaced by the first
character in string2, the second character in string1
is replaced by the second character in string2, and so
on.
Here is a simple example:
$string = "abcdefghicba";
$string =~ tr/abc/def/;
Here, the characters a, b, and c are
to be replaced as follows:
All occurrences of the character a are to be replaced by the character d.
All occurrences of the character b are to be replaced by the character e.
All occurrences of the character c are to be replaced by the character f.
After the translation, the scalar variable $string contains
the value defdefghifed.
NOTE
If the string listing the characters to be replaced is longer than the
string containing the replacement characters, the last character of the
replacement string is repeated. For example:
$string = "abcdefgh";
$string =~ tr/efgh/abc/;
Here, there is no character corresponding to h in the replacement list, so c, the last character in the replacement list, replaces h. This translation sets the value of $string to abcdabcc.
Also note that if the same character appears more than once in the list
of characters to be replaced, the first replacement is used:
$string =~ tr/AAA/XYZ/;    # replaces A with X
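The translation rules above can be verified with a minimal sketch such as the following. The variable names are invented, and the last statement relies on tr returning the number of characters it matched, a detail not covered in the text above.

```perl
use strict;
use warnings;

# One-for-one translation: a becomes d, b becomes e, c becomes f.
(my $simple = "abcdefghicba") =~ tr/abc/def/;

# Shorter replacement list: its last character (c) is reused for h.
(my $padded = "abcdefgh") =~ tr/efgh/abc/;

# Repeated search character: only the first pairing (A to X) applies.
(my $repeated = "BANANA") =~ tr/AAA/XYZ/;

# tr also returns how many characters it matched; with an empty
# replacement list the string itself is left unchanged.
my $counted = "abcdefghicba";
my $count   = ($counted =~ tr/a-c//);

print("$simple $padded $repeated $count\n");
```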
The most common use of the translation operator is to convert
alphabetic characters from uppercase to lowercase or vice versa.
Listing 7.13 provides an example of a program that converts a
file to all lowercase characters.
Listing 7.13. An uppercase-to-lowercase conversion program.
1:  #!/usr/local/bin/perl
2:
3:  while ($line = <STDIN>) {
4:      $line =~ tr/A-Z/a-z/;
5:      print($line);
6:  }
$ program7_13
THIS LINE IS IN UPPER CASE.
this line is in upper case.
ThiS LiNE Is iN mIxED cASe.
this line is in mixed case.
^D
$
This program reads a line at a time from the
standard input file, terminating when it sees a line containing
the Ctrl+D (end-of-file) character.
Line 4 performs the translation operation. As in the other pattern-matching
operations, the range character (-) indicates a range
of characters to be included. Here, the range a-z refers
to all the lowercase characters, and the range A-Z refers
to all the uppercase characters.
NOTE
There are two things you should note about the translation operator:
The pattern special characters are not supported by the translation operator.
You can use y in place of tr if you want.
$string =~ y/a-z/A-Z/;
Options for the Translation Operator
The translation operator supports three options, which are listed
in Table 7.6.
Table 7.6. Options for the translation operator.
Option    Description
c         Translate all characters not specified
d         Delete all specified characters
s         Replace multiple identical output characters with a single character
The c option (c is for "complement")
translates all characters that are not specified. For example,
the statement
$string =~ tr/0-9/ /c;
replaces everything that is not a digit with a space. (The range 0-9 is used here rather than \d because the translation operator does not support the pattern special characters.)
The d option deletes every specified character.
$string =~ tr/\t //d;
This deletes all the tabs and spaces from $string.
The s option (for "squeeze") checks the output
from the translation. If two or more consecutive characters translate
to the same output character, only one output character is actually
used. For example, the following replaces everything that is not
a digit and outputs only one space between digits:
$string =~ tr/0-9/ /cs;
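Here is a small sketch that applies each translation option to a sample string. The sample strings and variable names are chosen only for illustration.

```perl
use strict;
use warnings;

# c: translate every character that is NOT a digit into a space.
(my $complemented = "a1b22c") =~ tr/0-9/ /c;

# d: delete every tab and space.
(my $deleted = "a \tb c") =~ tr/\t //d;

# c and s together: runs of non-digits collapse into a single space.
(my $squeezed = "ab12cd34ef") =~ tr/0-9/ /cs;

# c and d together: delete everything that is not a digit.
(my $digits = "The number 45 appears in this string.\n") =~ tr/0-9//cd;

print("[$complemented] [$deleted] [$squeezed] [$digits]\n");
```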
Listing 7.14 is a simple example of a program that uses some of
these translation options. It reads a number from the standard
input file, and it gets rid of every input character that is not
actually a digit.
Listing 7.14. A program that ensures that a string consists
of nothing but digits.
1:  #!/usr/local/bin/perl
2:
3:  $string = <STDIN>;
4:  $string =~ tr/0-9//cd;
5:  print("$string\n");
$ program7_14
The number 45 appears in this string.
45
$
Line 4 of this program performs the translation.
The d option indicates that the translated characters
are to be deleted, and the c option indicates that every
character not in the list is to be deleted. Therefore, this translation
deletes every character in the string that is not a digit. Note
that the trailing newline character is not a digit, so it is one
of the characters deleted.
Extended Pattern-Matching
Perl 5 provides some additional pattern-matching capabilities
not found in Perl 4 or in standard UNIX pattern-matching operations.
Extended pattern-matching capabilities employ the following syntax:
(?cpattern)
Here, c is a placeholder for a single character representing the extended
pattern-matching capability being used, and pattern is
the pattern or subpattern to be affected.
The following extended pattern-matching capabilities are supported
by Perl 5:
Parenthesizing subpatterns without saving them in memory
Embedding options in patterns
Positive and negative look-ahead conditions
Comments
Parenthesizing Without Saving in Memory
In Perl, when a subpattern is enclosed in parentheses, the subpattern
is also stored in memory. If you want to enclose a subpattern
in parentheses without storing it in memory, use the ?:
extended pattern-matching feature. For example, consider this
pattern:
/(?:a|b|c)(d|e)f\1/
This matches the following:
One of a, b, or c
One of d or e
f
Whichever of d or e was matched earlier
Here, \1 matches either d or e, because
the subpattern a|b|c was not stored in memory. Compare
this with the following:
/(a|b|c)(d|e)f\1/
Here, the subpattern a|b|c is stored in memory, and one
of a, b, or c is matched by \1.
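The difference between the two patterns can be seen by running them against the same input, as in this sketch (the test strings and variable names are made up):

```perl
use strict;
use warnings;

# With ?:, the first group is not stored, so \1 (and $1) refer
# to what (d|e) matched.
my $saved = "";
if ("adfd" =~ /(?:a|b|c)(d|e)f\1/) {
    $saved = $1;
}

# With ordinary parentheses, \1 refers to what (a|b|c) matched,
# so "adfa" matches only the capturing form.
my $capturing    = ("adfa" =~ /(a|b|c)(d|e)f\1/)   ? 1 : 0;
my $noncapturing = ("adfa" =~ /(?:a|b|c)(d|e)f\1/) ? 1 : 0;

print("$saved $capturing $noncapturing\n");
```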
Embedding Pattern Options
Perl 5 provides a way of specifying a pattern-matching option
within the pattern itself. For example, the following patterns
are equivalent:
/[a-z]+/i
/(?i)[a-z]+/
In both cases, the pattern matches one or more alphabetic characters;
the i option indicates that case is to be ignored when
matching.
The syntax for embedded pattern options is
(?option)
where option is one of the options shown in Table 7.7.
Table 7.7. Options for embedded patterns.
Option    Description
i         Ignore case in pattern
m         Treat pattern as multiple lines
s         Treat pattern as single line
x         Ignore white space in pattern
The g and o options are not supported as embedded
pattern options.
Embedded pattern options give you more flexibility when you are
matching patterns. For example:
$pattern1 = "[a-z0-9]+";
$pattern2 = "(?i)[a-z]+";
if ($string =~ /$pattern1|$pattern2/) {
    ...
}
Here, the i option is specified for some, but not all,
of a pattern. (This pattern matches either any collection of lowercase
letters mixed with digits, or any collection of letters.)
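As a hedged illustration of an option that applies to only part of a pattern, the sketch below anchors the two alternatives with ^ and $ inside a (?:...) group so that the whole string must match. The anchors, the grouping, the helper name check, and the test strings are all additions for this example, and it assumes a Perl 5 release in which an embedded option takes effect from the point where it appears.

```perl
use strict;
use warnings;

my $pattern1 = "[a-z0-9]+";     # case-sensitive: lowercase letters and digits
my $pattern2 = "(?i)[a-z]+";    # case-insensitive: letters only

# Hypothetical helper: returns 1 if the whole string matches
# either alternative, 0 otherwise.
sub check {
    my ($string) = @_;
    return ($string =~ /^(?:$pattern1|$pattern2)$/) ? 1 : 0;
}

my $lower_mixed = check("abc1");   # matched by the first alternative
my $upper_only  = check("ABC");    # matched by the second alternative
my $upper_mixed = check("ABC1");   # matched by neither

print("$lower_mixed $upper_only $upper_mixed\n");
```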
Positive and Negative Look-Ahead
Perl 5 enables you to use the ?= feature to define a
boundary condition that must be matched in order for the pattern
to match. For example, the following pattern matches abc
only if it is followed by def:
/abc(?=def)/
This is known as a positive look-ahead condition.
NOTE
The positive look-ahead condition is not part of the pattern matched. For example, consider these statements:
$string = "25abc8";
$string =~ /abc(?=[0-9])/;
$matched = $&;
Here, as always, $& contains the matched pattern, which in this case is abc, not abc8.
Similarly, the ?! feature defines a negative look-ahead
condition, which is a boundary condition that must not be
present if the pattern is to match. For example, the pattern /abc(?!def)/
matches any occurrence of abc unless it is followed by
def.
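Both look-ahead conditions can be tried out with a minimal sketch like the following; the strings abcdef and abcxyz are sample inputs invented for the example.

```perl
use strict;
use warnings;

# Positive look-ahead: the digit must follow abc, but it is not
# included in the matched text.
my $string  = "25abc8";
my $matched = "";
if ($string =~ /abc(?=[0-9])/) {
    $matched = $&;
}

# Negative look-ahead: abc matches only when def does NOT follow.
my $followed_by_def = ("abcdef" =~ /abc(?!def)/) ? 1 : 0;
my $followed_by_xyz = ("abcxyz" =~ /abc(?!def)/) ? 1 : 0;

print("$matched $followed_by_def $followed_by_xyz\n");
```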
Pattern Comments
Perl 5 enables you to add comments to a pattern using the ?#
feature. For example:
if ($string =~ /(?i)[a-z]{2,3}(?# match two or three alphabetic characters)/) {
    ...
}
Adding comments makes it easier to follow complicated patterns.
Summary
Perl enables you to search for sequences of characters using patterns.
If a pattern is found in a string, the pattern is said to be matched.
Patterns often are used in conjunction with the pattern-match
operators, =~ and !~. The =~ operator
returns true if the pattern matches, and the !~ operator
returns true if the pattern does not match.
Special-pattern characters enable you to search for a string that
meets one of a variety of conditions.
The + character matches one or more occurrences of a character.
The * character matches zero or more occurrences of a character.
The [] characters enclose a set of characters, any one of which matches.
The ? character matches zero or one occurrences of a character.
The ^ and $ characters match the beginning and end of a line, respectively. The \b and \B characters match a word boundary or somewhere other than a word boundary, respectively.
The {} characters specify the number of occurrences of a character.
The | character specifies alternatives, either of which match.
To make a special character match itself literally in a pattern,
precede it with a backslash \.
Enclosing a part of a pattern in parentheses stores the matched
subpattern in memory; this stored subpattern can be recalled using
the character sequence \n (a backslash followed by the number n of the
set of parentheses), and stored in a scalar variable
using the built-in scalar variable $n. The built-in scalar
variable $& stores the entire matched pattern.
You can substitute for scalar-variable names in patterns, specify
different pattern delimiters, or supply options that match every
possible pattern, ignore case, or perform scalar-variable substitution
only once.
The substitution operator, s, enables you to replace
a matched pattern with a specified string. Options to the substitution
operator enable you to replace every matched pattern, ignore case,
treat the replacing string as an expression, or perform scalar-variable
substitution only once.
The translation operator, tr, enables you to translate
one set of characters into another set. Options exist that enable
you to perform translation on everything not in the list, to delete
characters in the list, or to squeeze multiple identical output
characters into one.
Perl 5 provides extended pattern-matching capabilities not provided
in Perl 4. To use one of these extended pattern features on a
subpattern, put (? at the beginning of the subpattern
and ) at the end of the subpattern.
Q&A
Q: How many subpatterns can be stored in memory using \1, \2, and so on?
A: Basically, as many as
you like. After you store more than nine patterns, you can retrieve the
later patterns using two-digit numbers preceded by a backslash, such as
\10.
Q: Why does pattern-memory variable numbering start with 1, whereas subscript numbering starts with 0?
A: Subscript numbering
starts with 0 to remain compatible with the C programming language.
There is no such thing as pattern memory in C, so there is no need to
be compatible with it.
Q: What happens when the replacement string in the translate command is left out, as in tr/abc//?
A: If the replacement string is omitted, a copy of the first string is used. This means that
tr/abc//
does not do anything, because it is the same as
tr/abc/abc/
If the replacement string is omitted in the substitute command, as in
s/abc//
the pattern matched (in this case, abc) is deleted.
Q: Why does Perl use characters such as +, *, and ? as pattern special characters?
A: These special characters usually correspond to special characters used in other UNIX applications, such as vi and csh. Some of the special characters, such as +, are used in formal
syntax description languages.
Q: Why does Perl use both \1 and $1 to store pattern memory?
A: To enable you to distinguish between a subpattern matched in the current pattern (which is stored in \1) and a subpattern matched in the previous statement (which is stored in $1).
Workshop
The Workshop provides quiz questions to help you solidify your
understanding of the material covered and exercises to give you
experience in using what you've learned. Try to understand the
quiz and exercise answers before you go on to tomorrow's lesson.
Quiz
1. What do the following patterns match?
a. /a|bc*/
b. /[\d]{1,3}/
c. /\bc[aou]t\b/
d. /(xy+z)\.\1/
e. /^$/
2. Write patterns that match the following:
a. Five or more lowercase letters (a-z).
b. Either the number 1 or the string one.
c. A string of digits optionally containing a decimal point.
d. Any letter, followed by any vowel, followed by the same letter again.
e. One or more + characters.
3. Suppose the variable $var has the value abc123.
Indicate whether the following conditional expressions return
true or false.
a. $var =~ /./
b. $var =~ /[A-Z]*/
c. $var =~ /\w{4-6}/
d. $var =~ /(\d)2(\1)/
e. $var =~ /abc$/
f. $var =~ /1234?/
4. Suppose the variable $var has the value abc123abc.
What is the value of $var after the following substitutions?
a. $var =~ s/abc/def/;
b. $var =~ s/[a-z]+/X/g;
c. $var =~ s/B/W/i;
d. $var =~ s/(.)\d.*\1/d/;
e. $var =~ s/(\d+)/$1*2/e;
5. Suppose the variable $var has the value abc123abc.
What is the value of $var after the following translations?
a. $var =~ tr/a-z/A-Z/;
b. $var =~ tr/123/456/;
c. $var =~ tr/231/564/;
d. $var =~ tr/123//s;
e. $var =~ tr/123//cd;
Exercises
1. Write a program that reads all the input from the standard
input file, converts all the vowels (except y) to uppercase,
and prints the result on the standard output file.
2. Write a program that counts the number of times each digit
appears in the standard input file. Print the total for each digit
and the sum of all the totals.
3. Write a program that reverses the order of the first three
words of each input line (from the standard input file) using
the substitution operator. Leave the spacing unchanged, and print
each resulting line.
4. Write a program that adds 1 to every number in the standard
input file. Print the results.
5. BUG BUSTER: What is wrong with the following program?
#!/usr/local/bin/perl
while ($line = <STDIN>) {
    # put quotes around each line of input
    $line =~ /^.*$/"\1"/;
    print($line);
}
6. BUG BUSTER: What is wrong with the following program?
#!/usr/local/bin/perl
while ($line = <STDIN>) {
    if ($line =~ /[\d]*/) {
        print("This line contains the digits '$&'\n");
    }
}