Perl the swiss knife
Perl one liners
Table of Contents
Executing Perl code
Simple search and replace
    in place editing
Line filtering
    Regular expressions based filtering
    Fixed string matching
    Line number based filtering
Field processing
    Field comparison
    Specifying different input field separator
    Specifying different output field separator
Changing record separators
    Input record separator
    Output record separator
Multiline processing
Perl regular expressions
    sed vs perl subtle differences
    Backslash sequences
    Non-greedy quantifier
    Lookarounds
    Ignoring specific matches
    Special capture groups
    Modifiers
    Quoting metacharacters
    Matching position
Using modules
Two file processing
    Comparing whole lines
    Comparing specific fields
    Line number matching
Creating new fields
Multiple file input
Dealing with duplicates
Lines between two REGEXPs
    All unbroken blocks
    Specific blocks
    Broken blocks
Array operations
    Iteration and filtering
    Sorting
    Transforming
Miscellaneous
    split
    Fixed width processing
    String and file replication
    transliteration
    Executing external commands
Further Reading
$ perl -le 'print $^V'
v5.22.1

$ man perl
PERL(1)                Perl Programmers Reference Guide                PERL(1)

NAME
       perl - The Perl 5 language interpreter

SYNOPSIS
       perl [ -sTtuUWX ]      [ -hv ] [ -V[:configvar] ]
            [ -cw ] [ -d[t][:debugger] ] [ -D[number/list] ]
            [ -pna ] [ -Fpattern ] [ -l[octal] ] [ -0[octal/hexadecimal] ]
            [ -Idir ] [ -m[-]module ] [ -M[-]'module...' ] [ -f ]
            [ -C [number/list] ]      [ -S ]      [ -x[dir] ]
            [ -i[extension] ]
            [ [-e|-E] 'command' ] [ -- ] [ programfile ] [ argument ]...

       For more information on these options, you can run "perldoc perlrun".
...
Prerequisites and notes
familiarity with programming concepts like variables, printing, control structures, arrays, etc
Perl borrows syntax/features from C, shell scripting, awk, sed etc. Prior experience working with them would help a lot
familiarity with regular expression basics
    if not, check out ERE portion of GNU sed regular expressions
    examples for non-greedy, lookarounds, etc will be covered here
this tutorial is primarily focussed on short programs that are easily usable from command line, similar to using grep, sed, awk etc
    do NOT use style/syntax presented here when writing full fledged Perl programs which should use strict, warnings etc
    see perldoc - perlintro and learnxinyminutes - perl for quick intro to using Perl for full fledged programs
links to Perl documentation will be added as necessary
unless otherwise specified, consider input as ASCII encoded text only
    see also stackoverflow - why UTF-8 is not default
Executing Perl code
One way is to put code in a file and use perl command with filename as argument
Another is to use a shebang at the beginning of the script, make the file executable and run it directly
$ cat code.pl
print "Hello Perl\n"
$ perl code.pl
Hello Perl

$ # similar to bash
$ cat code.sh
echo 'Hello Bash'
$ bash code.sh
Hello Bash
For short programs, one can use -e command line option to provide code from command line itself
Use -E option to use newer features like say. See perldoc - new features
This entire chapter is about using perl this way from command line
$ perl -e 'print "Hello Perl\n"'
Hello Perl

$ # say automatically adds newline character
$ perl -E 'say "Hello Perl"'
Hello Perl

$ # similar to
$ bash -c 'echo "Hello Bash"'
Hello Bash

$ # multiple commands can be issued separated by ;
$ # -l will be covered later, here used to append newline to print
$ perl -le '$x=25; $y=12; print $x**$y'
59604644775390625
Perl is (in)famous for being able to do things in more than one way
examples in this chapter will mostly try to use the syntax that avoids (){}
$ # shows different syntax usage of if/say/print
$ perl -e 'if(2<3){print("2 is less than 3\n")}'
2 is less than 3
$ perl -E 'say "2 is less than 3" if 2<3'
2 is less than 3
$ # string comparison uses eq for ==, lt for < and so on
...
real    0m0.005s
$ time seq 14323 14563435 | perl -ne 'print if $.==234' > /dev/null
real    0m2.439s
$ # mimicking head command, same as: head -n3 or sed '3q'
$ seq 14 25 | perl -pe 'exit if $.>3'
14
15
16

$ # same as: sed '3Q'
$ seq 14 25 | perl -pe 'exit if $.==3'
14
15
selecting range of lines
.. is perldoc - range operator
$ # same as: sed -n '3,5p' or awk 'NR>=3 && NR<=5'
$ # in this context, the range is compared against $.
$ seq 14 25 | perl -ne 'print if 3..5'
16
17
18

$ # selecting from particular line number to end of input
$ # same as: sed -n '10,$p' or awk 'NR>=10'
$ seq 14 25 | perl -ne 'print if $.>=10'
23
24
25
Field processing
-a option will auto-split each input record based on one or more continuous white-space, similar to default behavior in awk
    See also split section
Special variable array @F will contain all the elements, indexing starts from 0
    negative indexing is also supported, -1 gives last element, -2 gives last-but-one and so on
    see Array operations section for examples on array usage
$ cat fruits.txt
fruit   qty
apple   42
banana  31
fig     90
guava   6

$ # print only first field, indexing starts from 0
$ # same as: awk '{print $1}' fruits.txt
$ perl -lane 'print $F[0]' fruits.txt
fruit
apple
banana
fig
guava

$ # print only second field
$ # same as: awk '{print $2}' fruits.txt
$ perl -lane 'print $F[1]' fruits.txt
qty
42
31
90
6
by default, leading and trailing whitespaces won't be considered when splitting the input record
    mimicking awk's default behavior
$ printf ' a    ate b\tc   \n'
 a    ate b   c   
$ printf ' a    ate b\tc   \n' | perl -lane 'print $F[0]'
a
$ printf ' a    ate b\tc   \n' | perl -lane 'print $F[-1]'
c

$ # number of fields, $#F gives index of last element - so add 1
$ echo '1 a 7' | perl -lane 'print $#F+1'
3
$ printf ' a    ate b\tc   \n' | perl -lane 'print $#F+1'
4

$ # or use scalar context
$ echo '1 a 7' | perl -lane 'print scalar @F'
3
Field comparison
for numeric context, Perl automatically tries to convert the string to number, ignoring white-space
for string comparison, use eq for ==, ne for != and so on
$ # if first field exactly matches the string 'apple'
$ # same as: awk '$1=="apple"{print $2}' fruits.txt
$ perl -lane 'print $F[1] if $F[0] eq "apple"' fruits.txt
42

$ # print first field if second field > 35 (excluding header)
$ # same as: awk 'NR>1 && $2>35{print $1}' fruits.txt
$ perl -lane 'print $F[0] if $F[1]>35 && $.>1' fruits.txt
apple
fig

$ # print header and lines with qty < 35
$ # same as: awk 'NR==1 || $2<35' fruits.txt
$ perl -ane 'print if $F[1]<35 || $.==1' fruits.txt
fruit   qty
banana  31
guava   6

$ # if first field does NOT contain 'a'
$ # same as: awk '$1 !~ /a/' fruits.txt
$ perl -ane 'print if $F[0] !~ /a/' fruits.txt
fruit   qty
fig     90
Specifying different input field separator
by using -F command line option
See also split section, which covers details about trailing empty fields
$ # second field where input field separator is :
$ # same as: awk -F: '{print $2}'
$ echo 'foo:123:bar:789' | perl -F: -lane 'print $F[1]'
123

$ # last field, same as: awk -F: '{print $NF}'
$ echo 'foo:123:bar:789' | perl -F: -lane 'print $F[-1]'
789
$ # second last field, same as: awk -F: '{print $(NF-1)}'
$ echo 'foo:123:bar:789' | perl -F: -lane 'print $F[-2]'
bar

$ # second and last field
$ # other ways to print more than 1 element will be covered later
$ echo 'foo:123:bar:789' | perl -F: -lane 'print "$F[1] $F[-1]"'
123 789

$ # use quotes to avoid clashes with shell special characters
$ echo 'one;two;three;four' | perl -F';' -lane 'print $F[2]'
three
Regular expressions based input field separator
$ # same as: awk -F'[0-9]+' '{print $2}'
$ echo 'Sample123string54with908numbers' | perl -F'\d+' -lane 'print $F[1]'
string

$ # first field will be empty as there is nothing before '{'
$ # same as: awk -F'[{}= ]+' '{print $1}'
$ # \x20 is space character, can't use literal space within [] when using -F
$ echo '{foo}   bar=baz' | perl -F'[{}=\x20]+' -lane 'print $F[0]'

$ echo '{foo}   bar=baz' | perl -F'[{}=\x20]+' -lane 'print $F[1]'
foo
$ echo '{foo}   bar=baz' | perl -F'[{}=\x20]+' -lane 'print $F[2]'
bar
empty argument to -F will split the input record character wise
$ # same as: gawk -v FS= '{print $1}'
$ echo 'apple' | perl -F -lane 'print $F[0]'
a
$ echo 'apple' | perl -F -lane 'print $F[1]'
p
$ echo 'apple' | perl -F -lane 'print $F[-1]'
e

$ # use -C option when dealing with unicode characters
$ # S will turn on UTF-8 for stdin/stdout/stderr streams
$ printf 'hi👍 how are you?' | perl -CS -F -lane 'print $F[2]'
👍
Specifying different output field separator
Method 1: use $, to change separator between print arguments
    could be remembered easily by noting that , is used to separate print arguments
$ # by default, the various arguments are concatenated
$ echo 'foo:123:bar:789' | perl -F: -lane 'print $F[1], $F[-1]'
123789

$ # change $, if different separator is needed
$ echo 'foo:123:bar:789' | perl -F: -lane '$,=" "; print $F[1], $F[-1]'
123 789
$ echo 'foo:123:bar:789' | perl -F: -lane '$,="-"; print $F[1], $F[-1]'
123-789

$ # argument can be array too
$ echo 'foo:123:bar:789' | perl -F: -lane '$,="-"; print @F[1,-1]'
123-789
$ echo 'foo:123:bar:789' | perl -F: -lane '$,="-"; print @F'
foo-123-bar-789
Method 2: use join
$ echo 'foo:123:bar:789' | perl -F: -lane 'print join "-", $F[1], $F[-1]'
123-789
$ echo 'foo:123:bar:789' | perl -F: -lane 'print join "-", @F[1,-1]'
123-789
$ echo 'foo:123:bar:789' | perl -F: -lane 'print join "-", @F'
foo-123-bar-789
Method 3: use $" to change separator when array is interpolated, default is space character
    could be remembered easily by noting that interpolation happens within double quotes
$ # default is space
$ echo 'foo:123:bar:789' | perl -F: -lane 'print "@F[1,-1]"'
123 789
$ echo 'foo:123:bar:789' | perl -F: -lane '$"="-"; print "@F[1,-1]"'
123-789
$ echo 'foo:123:bar:789' | perl -F: -lane '$"=","; print "@F"'
foo,123,bar,789
use BEGIN if same separator is to be used for all lines
    statements inside BEGIN are executed before processing any input text
$ # can also use: perl -lane 'BEGIN{$"=","} print "@F"' fruits.txt
$ perl -lane 'BEGIN{$,=","} print @F' fruits.txt
fruit,qty
apple,42
banana,31
fig,90
guava,6
Changing record separators
Before seeing examples for changing record separators, let's cover a detail about contents of input record and use of -l option
See also perldoc - chomp
$ # input record includes the record separator as well
$ # can also use: perl -pe 's/$/ 123/'
$ echo 'foo' | perl -pe 's/\n/ 123\n/'
foo 123

$ # this example shows better use case
$ # similar to paste -sd but with ability to use multi-character delimiter
$ seq 5 | perl -pe 's/\n/ : / if !eof'
1 : 2 : 3 : 4 : 5

$ # -l option will chomp off the record separator (among other things)
$ echo 'foo' | perl -l -pe 's/\n/ 123\n/'
foo

$ # -l also sets output record separator which gets added to print statements
$ # ORS gets input record separator value if no argument is passed to -l
$ # hence the newline automatically getting added for print in this example
$ perl -lane 'print $F[0] if $F[1]<35 && $.>1' fruits.txt
banana
guava
Input record separator
by default, newline character is used as input record separator
use $/ to specify a different input record separator
    unlike awk, only string can be used, no regular expressions
for single character separator, can also use -0 command line option which accepts octal/hexadecimal value as argument
if -l option is also used
    input record separator will be chomped from input record
    in addition, if argument is not passed to -l, output record separator will get whatever is current value of input record separator
    so, order of -l, -0 and/or $/ usage becomes important
$ s='this is a sample string'

$ # space as input record separator, printing all records
$ # same as: awk -v RS=' ' '{print NR, $0}'
$ # ORS is newline as -l is used before $/ gets changed
$ printf "$s" | perl -lne 'BEGIN{$/=" "} print "$. $_"'
1 this
2 is
3 a
4 sample
5 string

$ # print all records containing 'a'
$ # same as: awk -v RS=' ' '/a/'
$ printf "$s" | perl -l -0040 -ne 'print if /a/'
a
sample

$ # if the order is changed, ORS will be space, not newline
$ printf "$s" | perl -0040 -l -ne 'print if /a/'
a sample 
-0 option used without argument will use the ASCII NUL character as input record separator
$ printf 'foo\0bar\0' | cat -A
foo^@bar^@$ 
$ printf 'foo\0bar\0' | perl -l -0 -ne 'print'
foo
bar

$ # could be golfed to: perl -l -0pe ''
$ # but don't use `-l0` as `0` will be treated as argument to `-l`
values -0400 to -0777 will cause entire file to be slurped
    idiomatically, -0777 is used
$ # s modifier allows . to match newline as well
$ perl -0777 -pe 's/red.*are //s' poem.txt
Roses are you.

$ # replace first newline with '. '
$ perl -0777 -pe 's/\n/. /' greeting.txt
Hello there. Have a safe journey
for paragraph mode (two or more consecutive newline characters), use -00 or assign empty string to $/
Consider the below sample file
$ cat sample.txt
Hello World

Good day
How are you

Just do-it
Believe it

Today is sunny
Not a bit funny
No doubt you like it too

Much ado about nothing
He he he
again, input record will have the separator too and using -l will chomp it
however, if more than two consecutive newline characters separate the paragraphs, only two newlines will be preserved and the rest discarded
    use $/="\n\n" to avoid this behavior
$ # print all paragraphs containing 'it'
$ # same as: awk -v RS= -v ORS='\n\n' '/it/' sample.txt
$ perl -00 -ne 'print if /it/' sample.txt
Just do-it
Believe it

Today is sunny
Not a bit funny
No doubt you like it too

$ # based on number of lines in each paragraph
$ perl -F'\n' -00 -ane 'print if $#F==0' sample.txt
Hello World

$ # unlike awk -F'\n' -v RS= -v ORS='\n\n' 'NF==2 && /do/' sample.txt
$ # there won't be empty line at end because input file didn't have it
$ perl -F'\n' -00 -ane 'print if $#F==1 && /do/' sample.txt
Just do-it
Believe it

Much ado about nothing
He he he
Re-structuring paragraphs
$ # same as: awk 'BEGIN{FS="\n"; OFS=". "; RS=""; ORS="\n\n"} {$1=$1} 1'
$ perl -F'\n' -00 -ane 'print join ". ", @F; print "\n\n"' sample.txt
Hello World

Good day. How are you

Just do-it. Believe it

Today is sunny. Not a bit funny. No doubt you like it too

Much ado about nothing. He he he

multi-character separator
$ cat report.log
blah blah
Error: something went wrong
more blah
whatever
Error: something surely went wrong
some text
some more text
blah blah blah

$ # number of records, same as: awk -v RS='Error:' 'END{print NR}'
$ perl -lne 'BEGIN{$/="Error:"} print $. if eof' report.log
3
$ # print first record
$ perl -lne 'BEGIN{$/="Error:"} print if $.==1' report.log
blah blah

$ # same as: awk -v RS='Error:' '/surely/{print RS $0}' report.log
$ perl -lne 'BEGIN{$/="Error:"} print "$/$_" if /surely/' report.log
Error: something surely went wrong
some text
some more text
blah blah blah
Joining lines based on specific end of line condition
$ cat msg.txt
Hello there.
It will rain to-
day. Have a safe
and pleasant jou-
rney.
$ # same as: awk -v RS='-\n' -v ORS= '1' msg.txt
$ # can also use: perl -pe 's/-\n//' msg.txt
$ perl -pe 'BEGIN{$/="-\n"} chomp' msg.txt
Hello there.
It will rain today. Have a safe
and pleasant journey.
Output record separator
one way is to use $\ to specify a different output record separator
    by default it doesn't have a value
$ # note that despite $\ not having a value, output has newlines
$ # because the input record still has the input record separator
$ seq 3 | perl -ne 'print'
1
2
3
$ # same as: awk -v ORS='\n\n' '{print $0}'
$ seq 3 | perl -ne 'BEGIN{$\="\n"} print'
1

2

3

$ seq 2 | perl -ne 'BEGIN{$\="---\n"} print'
1
---
2
---
dynamically changing output record separator
$ # same as: awk '{ORS = NR%2 ? " " : "\n"} 1'
$ # note the use of -l to chomp the input record separator
$ seq 6 | perl -lpe '$\ = $.%2 ? " " : "\n"'
1 2
3 4
5 6

$ # -l also sets the output record separator
$ # but gets overridden by $\
$ seq 6 | perl -lpe '$\ = $.%3 ? "-" : "\n"'
1-2-3
4-5-6
passing argument to -l to set output record separator
$ seq 8 | perl -ne 'print if /[24]/'
2
4

$ # null separator, note how -l also chomps input record separator
$ seq 8 | perl -l0 -ne 'print if /[24]/' | cat -A
2^@4^@
$ # comma separator, won't have a newline at end
$ seq 8 | perl -l054 -ne 'print if /[24]/'
2,4,
$ # to add a final newline to output, use END and printf
$ seq 8 | perl -l054 -ne 'print if /[24]/; END{printf "\n"}'
2,4,
Multiline processing
Processing consecutive lines
$ cat poem.txt
Roses are red,
Violets are blue,
Sugar is sweet,
And so are you.

$ # match two consecutive lines
$ # same as: awk 'p~/are/ && /is/{print p ORS $0} {p=$0}' poem.txt
$ perl -ne 'print $p, $_ if /is/ && $p=~/are/; $p=$_' poem.txt
Violets are blue,
Sugar is sweet,

$ # if only the second line is needed, same as: awk 'p~/are/ && /is/; {p=$0}'
$ perl -ne 'print if /is/ && $p=~/are/; $p=$_' poem.txt
Sugar is sweet,

$ # print if line matches a condition as well as condition for next 2 lines
$ # same as: awk 'p2~/red/ && p1~/blue/ && /is/{print p2} {p2=p1; p1=$0}'
$ perl -ne 'print $p2 if /is/ && $p1=~/blue/ && $p2=~/red/;
             $p2=$p1; $p1=$_' poem.txt
Roses are red,
Consider this sample input file
$ cat range.txt
foo
BEGIN
1234
6789
END
bar
BEGIN
a
b
c
END
baz
extracting lines around matching line
how $n && $n-- works:
    need to note that right hand side of && is processed only if left hand side is true
    so for example, if initially $n=2, then we get
        2 && 2; $n=1 - evaluates to true
        1 && 1; $n=0 - evaluates to true
        0 && - evaluates to false ... no decrementing $n and hence will be false until $n is re-assigned non-zero value
$ # similar to: grep --no-group-separator -A1 'BEGIN' range.txt
$ # same as: awk '/BEGIN/{n=2} n && n--' range.txt
$ perl -ne '$n=2 if /BEGIN/; print if $n && $n--' range.txt
BEGIN
1234
BEGIN
a

$ # print only line after matching line, same as: awk 'n && n--; /BEGIN/{n=1}'
$ perl -ne 'print if $n && $n--; $n=1 if /BEGIN/' range.txt
1234
a

$ # generic case: print nth line after match, awk 'n && !--n; /BEGIN/{n=3}'
$ perl -ne 'print if $n && !--$n; $n=3 if /BEGIN/' range.txt
END
c

$ # print second line prior to matched line
$ # same as: awk '/END/{print p2} {p2=p1; p1=$0}' range.txt
$ perl -ne 'print $p2 if /END/; $p2=$p1; $p1=$_' range.txt
1234
b

$ # use reversing trick for generic case of nth line before match
$ # same as: tac range.txt | awk 'n && !--n; /END/{n=3}' | tac
$ tac range.txt | perl -ne 'print if $n && !--$n; $n=3 if /END/' | tac
BEGIN
a
Further Reading
stackoverflow - multiline find and replace
stackoverflow - delete line based on content of previous/next lines
softwareengineering - FSM examples
wikipedia - FSM
Perl regular expressions
examples to showcase some of the features not present in ERE and modifiers not available in sed's substitute command
many features of Perl regular expressions will NOT be covered, but external links will be provided wherever relevant
    See perldoc - perlre for complete reference
    and perldoc - regular expressions FAQ
examples/descriptions based only on ASCII encoding
sed vs perl subtle differences
input record separator being part of input record
$ echo 'foo:123:bar:789' | sed -E 's/[^:]+$/xyz/'
foo:123:bar:xyz
$ # newline character gets replaced too as shown by shell prompt
$ echo 'foo:123:bar:789' | perl -pe 's/[^:]+$/xyz/'
foo:123:bar:xyz$ 
$ # simple workaround is to use -l option
$ echo 'foo:123:bar:789' | perl -lpe 's/[^:]+$/xyz/'
foo:123:bar:xyz

$ # of course it has uses too
$ seq 10 | paste -sd, | sed 's/,/ : /g'
1 : 2 : 3 : 4 : 5 : 6 : 7 : 8 : 9 : 10
$ seq 10 | perl -pe 's/\n/ : / if !eof'
1 : 2 : 3 : 4 : 5 : 6 : 7 : 8 : 9 : 10
how much does * match?
$ # sed will choose biggest match
$ echo ',baz,,xyz,,,' | sed 's/[^,]*/A/g'
A,A,A,A,A,A,A
$ echo 'foo,baz,,xyz,,,123' | sed 's/[^,]*/A/g'
A,A,A,A,A,A,A

$ # but perl will match both empty and non-empty strings
$ echo ',baz,,xyz,,,' | perl -lpe 's/[^,]*/A/g'
A,AA,A,AA,A,A,A
$ echo 'foo,baz,,xyz,,,123' | perl -lpe 's/[^,]*/A/g'
AA,AA,A,AA,A,A,AA

$ echo '42,789' | sed 's/[0-9]*/"&"/g'
"42","789"
$ echo '42,789' | perl -lpe 's/\d*/"$&"/g'
"42""","789"""
$ echo '42,789' | perl -lpe 's/\d+/"$&"/g'
"42","789"
backslash sequences inside character classes
$ # \w would simply match w
$ echo 'w=y-x+9*3' | sed 's/[\w=]//g'
y-x+9*3

$ # \w would match any word character
$ echo 'w=y-x+9*3' | perl -pe 's/[\w=]//g'
-+*
replacing specific occurrence
See stackoverflow - substitute the nth occurrence of a match in a Perl regex for workarounds
$ echo 'foo:123:bar:baz' | sed 's/:/-/2'
foo:123-bar:baz
$ echo 'foo:123:bar:baz' | perl -pe 's/:/-/2'
Unknown regexp modifier "/2" at -e line 1, at end of line
Execution of -e aborted due to compilation errors.

$ # e modifier covered later, allows Perl code in replacement section
$ echo 'foo:123:bar:baz' | perl -pe '$c=0; s/:/++$c==2 ? "-" : $&/ge'
foo:123-bar:baz

$ # or use non-greedy and \K (covered later), same as: sed 's/and/-/3'
$ echo 'foo and bar and baz land good' | perl -pe 's/(and.*?){2}\Kand/-/'
foo and bar and baz l- good

$ # emulating GNU sed's number+g modifier
$ a='456:foo:123:bar:789:baz
x:y:z:a:v:xc:gf'
$ echo "$a" | sed 's/:/-/3g'
456:foo:123-bar-789-baz
x:y:z-a-v-xc-gf
$ echo "$a" | perl -pe '$c=0; s/:/++$c<3 ? $& : "-"/ge'
456:foo:123-bar-789-baz
x:y:z-a-v-xc-gf
variable interpolation when $ or @ is used
See also perldoc - Quote and Quote-like Operators
$ seq 2 | sed 's/$x/xyz/'
1
2

$ # uninitialized variable, same applies for: perl -pe 's/@a/xyz/'
$ seq 2 | perl -pe 's/$x/xyz/'
xyz1
xyz2

$ # initialized variable
$ seq 2 | perl -pe '$x=2; s/$x/xyz/'
1
xyz

$ # using single quotes as delimiter won't interpolate
$ # not usable for one-liners given shell's own single/double quotes behavior
$ cat sub_sq.pl
s'$x'xyz'
$ seq 2 | perl -p sub_sq.pl
1
2
backreference
See also perldoc - Warning on \1 Instead of $1
$ # use $& to refer entire matched string in replacement section
$ echo 'hello world' | sed 's/.*/"&"/'
"hello world"
$ echo 'hello world' | perl -pe 's/.*/"&"/'
"&"
$ echo 'hello world' | perl -pe 's/.*/"$&"/'
"hello world"

$ # use \1, \2, etc or \g1, \g2 etc for back referencing in search section
$ # use $1, $2, etc in replacement section
$ echo 'a a a walking for for a cause' | perl -pe 's/\b(\w+)( \1)+\b/$1/g'
a walking for a cause
Backslash sequences
\d for [0-9]
\s for [ \t\r\n\f\v]
\h for [ \t]
\n for newline character
\D, \S, \H, \N respectively for their opposites
See perldoc - perlrecharclass for full list and details
$ # same as: sed -E 's/[0-9]+/xxx/g'
$ echo 'like 42 and 37' | perl -pe 's/\d+/xxx/g'
like xxx and xxx

$ # same as: sed -E 's/[^0-9]+/xxx/g'
$ # note again the use of -l because of newline in input record
$ echo 'like 42 and 37' | perl -lpe 's/\D+/xxx/g'
xxx42xxx37

$ # no need -l here as \h won't match newline
$ echo 'a b c  ' | perl -pe 's/\h*$//'
a b c
Non-greedy quantifier
adding a ? to ? or * or + or {} quantifiers will change matching from greedy to non-greedy. In other words, to match as minimally as possible
    also known as lazy quantifier
See also regular-expressions.info - Possessive Quantifiers
$ # greedy matching
$ echo 'foo and bar and baz land good' | perl -pe 's/foo.*and//'
 good
$ # non-greedy matching
$ echo 'foo and bar and baz land good' | perl -pe 's/foo.*?and//'
 bar and baz land good

$ echo '12342789' | perl -pe 's/\d{2,5}//'
789
$ echo '12342789' | perl -pe 's/\d{2,5}?//'
342789

$ # for single character, non-greedy is not always needed
$ echo '123:42:789:good:5:bad' | perl -pe 's/:.*?:/:/'
123:789:good:5:bad
$ echo '123:42:789:good:5:bad' | perl -pe 's/:[^:]*:/:/'
123:789:good:5:bad

$ # just like greedy, overall matching is considered, as minimal as possible
$ echo '123:42:789:good:5:bad' | perl -pe 's/:.*?:[a-z]/:/'
123:ood:5:bad
$ echo '123:42:789:good:5:bad' | perl -pe 's/:.*:[a-z]/:/'
123:ad
Lookarounds
Ability to add if conditions to match before/after required pattern
There are four types
    positive lookahead (?=
    negative lookahead (?!
    positive lookbehind (?<=
    negative lookbehind (?<!
...
for backreference, use \k<name>
accessible via %+ hash in replacement section
$ s='baz 2008-03-24 and 2012-08-12 foo 2016-03-25'
$ echo "$s" | perl -pe 's/(\d{4})-(\d{2})-(\d{2})/$3-$2-$1/g'
baz 24-03-2008 and 12-08-2012 foo 25-03-2016

$ # naming the capture groups might offer clarity
$ echo "$s" | perl -pe 's/(?<y>\d{4})-(?<m>\d{2})-(?<d>\d{2})/$+{d}-$+{m}-$+{y}/g'
baz 24-03-2008 and 12-08-2012 foo 25-03-2016
$ echo "$s" | perl -pe 's/(?<y>\d{4})-(?<m>\d{2})-(?<d>\d{2})/$+{m}-$+{d}-$+{y}/g'
baz 03-24-2008 and 08-12-2012 foo 03-25-2016

$ # and useful to transform different capture groups
$ s='"foo,bar",123,"x,y,z",42'
$ echo "$s" | perl -lpe 's/"(?<a>[^"]+)",|(?<a>[^,]+),/$+{a}|/g'
foo,bar|123|x,y,z|42

$ # can also use (?| branch reset
$ echo "$s" | perl -lpe 's/(?|"([^"]+)",|([^,]+),)/$1|/g'
foo,bar|123|x,y,z|42
Further Reading
perldoc - Extended Patterns
rexegg - all the (? usages
regular-expressions - recursion
Modifiers
some are already seen, like the g (global match) and i (case insensitive matching)
first up, the r modifier which returns the substitution result instead of modifying the variable it is acting upon
$ perl -e '$x="feed"; $y = $x=~s/e/E/gr; print "x=$x\ny=$y\n"'
x=feed
y=fEEd

$ # the r modifier is available for transliteration operator too
$ perl -e '$x="food"; $y = $x=~tr/a-z/A-Z/r; print "x=$x\ny=$y\n"'
x=food
y=FOOD
e modifier allows to use Perl code in replacement section instead of string
use ee if you need to construct a string and then apply evaluation
$ # replace numbers with their squares
$ echo '4 and 10' | perl -pe 's/\d+/$&*$&/ge'
16 and 100

$ # replace matched string with incremental value
$ echo '4 and 10 foo 57' | perl -pe 's/\d+/++$c/ge'
1 and 2 foo 3
$ # passing initial value
$ echo '4 and 10 foo 57' | c=100 perl -pe 's/\d+/$ENV{c}++/ge'
100 and 101 foo 102

$ # formatting string
$ echo 'a1-2-deed' | perl -lpe 's/[^-]+/sprintf "%04s", $&/ge'
00a1-0002-deed

$ # calling a function
$ echo 'food:12:explain:789' | perl -pe 's/\w+/length($&)/ge'
4:2:7:3

$ # applying another substitution to matched string
$ echo '"mango" and "guava"' | perl -pe 's/"[^"]+"/$&=~s|a|A|gr/ge'
"mAngo" and "guAvA"
multiline modifiers
$ # m modifier to match beginning/end of each line within multiline string
$ perl -00 -ne 'print if /^Believe/' sample.txt
$ perl -00 -ne 'print if /^Believe/m' sample.txt
Just do-it
Believe it

$ perl -00 -ne 'print if /funny$/' sample.txt
$ perl -00 -ne 'print if /funny$/m' sample.txt
Today is sunny
Not a bit funny
No doubt you like it too

$ # s modifier to allow . meta character to match newlines as well
$ perl -00 -ne 'print if /do.*he/' sample.txt
$ perl -00 -ne 'print if /do.*he/s' sample.txt
Much ado about nothing
He he he
Further Reading
perldoc - perlre Modifiers
stackoverflow - replacement within matched string
Quoting metacharacters
part of regular expression can be surrounded within \Q and \E to prevent matching meta characters within that portion
    however, $ and @ would still be interpolated as long as delimiter isn't single quotes
    \E is optional if applying \Q till end of search expression
typical use case is string to be protected is already present in a variable, for ex: user input or result of another command
quotemeta will add a backslash to all characters other than \w characters
See also perldoc - Quoting metacharacters
$ # quotemeta in action
$ perl -le '$x="[a].b+c^"; print quotemeta $x'
\[a\]\.b\+c\^

$ # same as: s='a+b' perl -ne 'print if index($_, $ENV{s})==0' eqns.txt
$ s='a+b' perl -ne 'print if /^\Q$ENV{s}/' eqns.txt
a+b,pi=3.14,5e12
$ s='a+b' perl -pe 's/^\Q$ENV{s}/ABC/' eqns.txt
a=b,a-b=c,c*d
ABC,pi=3.14,5e12
i*(t+9-g)/8,4-a+b
$ s='a+b' perl -pe 's/\Q$ENV{s}\E.*,/ABC,/' eqns.txt
a=b,a-b=c,c*d
ABC,5e12
i*(t+9-g)/8,4-a+b
use q operator for replacement section
    it would treat contents as if they were placed inside single quotes and hence no interpolation
See also perldoc - Quote and Quote-like Operators
$ # q in action
$ perl -le '$x="[a].b+c^$@123"; print $x'
[a].b+c^123
$ perl -le '$x=q([a].b+c^$@123); print $x'
[a].b+c^$@123
$ perl -le '$x=q([a].b+c^$@123); print quotemeta $x'
\[a\]\.b\+c\^\$\@123

$ echo 'foo 123' | perl -pe 's/foo/$foo/'
 123
$ echo 'foo 123' | perl -pe 's/foo/q($foo)/e'
$foo 123
$ echo 'foo 123' | perl -pe 's/foo/q{$f)oo}/e'
$f)oo 123

$ # string saved in other variables do not need special attention
$ echo 'foo 123' | s='a$b' perl -pe 's/foo/$ENV{s}/'
a$b 123
$ echo 'foo 123' | perl -pe 's/foo/a$b/'
a 123
Matching position
From perldoc - perlvar
    $-[0] is the offset of the start of the last successful match
    $+[0] is the offset into the string of the end of the entire match
$ cat poem.txt
Roses are red,
Violets are blue,
Sugar is sweet,
And so are you.

$ # starting position of match
$ perl -lne 'print "line: $., offset: $-[0]" if /are/' poem.txt
line: 1, offset: 6
line: 2, offset: 8
line: 4, offset: 7

$ # if offset is needed starting from 1 instead of 0
$ perl -lne 'print "line: $., offset: ", $-[0]+1 if /are/' poem.txt
line: 1, offset: 7
line: 2, offset: 9
line: 4, offset: 8

$ # ending position of match
$ perl -lne 'print "line: $., offset: $+[0]" if /are/' poem.txt
line: 1, offset: 9
line: 2, offset: 11
line: 4, offset: 10
for multiple matches, use while loop to go over all the matches
$ perl -lne 'print "$.:$&:$-[0]" while /is|so|are/g' poem.txt
1:are:6
2:are:8
3:is:6
4:so:4
4:are:7
Using modules
There are many standard modules available that come with Perl installation
and many more available from Comprehensive Perl Archive Network (CPAN)
    stackoverflow - easiest way to install a missing module
$ echo '34,17,6' | perl -F, -lane 'BEGIN{use List::Util qw(max)} print max @F'
34
$ # -M option provides a way to specify modules from command line
$ echo '34,17,6' | perl -MList::Util=max -F, -lane 'print max @F'
34
$ echo '34,17,6' | perl -MList::Util=sum0 -F, -lane 'print sum0 @F'
57
$ echo '34,17,6' | perl -MList::Util=product -F, -lane 'print product @F'
3468

$ s='1,2,3,4,5'
$ echo "$s" | perl -MList::Util=shuffle -F, -lane 'print join ",", shuffle @F'
5,3,4,1,2

$ s='3,b,a,c,d,1,d,c,2,3,1,b'
$ echo "$s" | perl -MList::MoreUtils=uniq -F, -lane 'print join ",", uniq @F'
3,b,a,c,d,1,2

$ echo 'foo 123 baz' | base64
Zm9vIDEyMyBiYXoK
$ echo 'foo 123 baz' | perl -MMIME::Base64 -ne 'print encode_base64 $_'
Zm9vIDEyMyBiYXoK
$ echo 'Zm9vIDEyMyBiYXoK' | perl -MMIME::Base64 -ne 'print decode_base64 $_'
foo 123 baz
a cool module O helps to convert one-liners to full fledged programs
    similar to -o option for GNU awk
$ # command being deparsed is discussed in a later section
$ perl -MO=Deparse -ne 'if(!$#ARGV){$h{$_}=1; next}
            print if $h{$_}' colors_1.txt colors_2.txt
LINE: while (defined($_ = <ARGV>)) {
    unless ($#ARGV) {
        $h{$_} = 1;
        next;
    }
    print $_ if $h{$_};
}
-e syntax OK

$ perl -MO=Deparse -00 -ne 'print if /it/' sample.txt
BEGIN { $/ = ""; $\ = undef; }
LINE: while (defined($_ = <ARGV>)) {
    print $_ if /it/;
}
-e syntax OK
Further Reading
perldoc - perlmodlib
perldoc - Core modules
unix.stackexchange - example for Algorithm::Combinatorics
unix.stackexchange - example for Text::ParseWords
stackoverflow - regular expression modules
metacpan - String::Approx - Perl extension for approximate matching (fuzzy matching)
metacpan - Tie::IxHash - ordered associative arrays for Perl
Two file processing
First, a bit about $#ARGV and hash variables
$ # $#ARGV can be used to know which file is being processed
$ perl -lne 'print $#ARGV' ...
...
<FILEHANDLE> reads line(s) from the specified file
    defaults to current file argument (includes stdin as well), so <> can be used as shortcut
    <STDIN> will read only from stdin, there are also predefined handles for stdout/stderr
    in list context, all the lines would be read
    See perldoc - I/O Operators for details
$ # using if-else instead of next
$ perl -ne 'if(!$#ARGV){ $h{$_}=1 }
             else{ print if $h{$_} }' colors_1.txt colors_2.txt
Blue
Red

$ # read all lines of first file in BEGIN block
$ # <> reads a line from current file argument
$ # eof will ensure only first file is read
$ perl -ne 'BEGIN{ $h{<>}=1 while !eof; }
             print if $h{$_}' colors_1.txt colors_2.txt
Blue
Red

$ # this method also allows to easily reset line number
$ # close ARGV is similar to calling nextfile in GNU awk
$ perl -ne 'BEGIN{ $h{<>}=1 while !eof; close ARGV }
             print "$.\n" if $h{$_}' colors_1.txt colors_2.txt
2
4

$ # or pass 1st file content as STDIN, $. will be automatically reset as well
$ perl -ne 'BEGIN{ $h{$_}=1 while <STDIN> }
             print if $h{$_}' ...
...
$ perl -ane 'if(!$#ARGV){ $d{$F[0]}=1; $m{$F[0]}=$F[1] }
             else{ print if $d{$F[0]} && $F[2] >= $m{$F[0]} }' list3 marks.txt
ECE     Joel    72
EEE     Moi     68
CSE     Surya   81
ECE     Om      92
See also stackoverflow - Fastest way to find lines of a text file from another larger text file
Line number matching
$ # replace mth line in poem.txt with nth line from nums.txt
$ # assumes that there are at least n lines in nums.txt
$ # same as: awk -v m=3 -v n=2 'BEGIN{while(n-- > 0) getline s ...
$ ... while $ENV{n}-- > 0; close ARGV}
                     $_=$s if $.==$ENV{m}' nums.txt poem.txt
Roses are red,
Violets are blue,
-2
And so are you.

$ # print line from fruits.txt if corresponding line from nums.txt is +ve number
$ # same as: awk -v file='nums.txt' '(getline num ... 0'
$ ... > 0' fruits.txt
fruit   qty
banana  31
Creating new fields
Number of fields in input record can be changed by simply manipulating $#F
$ s='foo,bar,123,baz'

$ # reducing fields
$ # same as: awk -F, -v OFS=, '{NF=2} 1'
$ echo "$s" | perl -F, -lane '$,=","; $#F=1; print @F'
foo,bar

$ # creating new empty field(s)
$ # same as: awk -F, -v OFS=, '{NF=5} 1'
$ echo "$s" | perl -F, -lane '$,=","; $#F=4; print @F'
foo,bar,123,baz,

$ # assigning to field greater than $#F will create empty fields as needed
$ # same as: awk -F, -v OFS=, '{$7=42} 1'
$ echo "$s" | perl -F, -lane '$,=","; $F[6]=42; print @F'
foo,bar,123,baz,,,42
adding a field based on existing fields
    See also split and Array operations sections
$ # adding a new 'Grade' field
$ # same as: awk 'BEGIN{OFS="\t"; split("DCBAS",g,//)}
$ #          {NF++; $NF = NR==1 ? "Grade" : g[int($(NF-1)/10)-4]} 1' marks.txt
$ perl -lane 'BEGIN{$,="\t"; @g = split //, "DCBAS"} $#F++;
              $F[-1] = $.==1 ? "Grade" : $g[$F[-2]/10-5]; print @F' marks.txt
Dept    Name    Marks   Grade
ECE     Raj     53      D
ECE     Joel    72      B
EEE     Moi     68      C
CSE     Surya   81      A
EEE     Tia     59      D
ECE     Om      92      S
CSE     Amy     67      C

$ # alternate syntax: array initialization and appending array element
$ perl -lane 'BEGIN{$,="\t"; @g = qw(D C B A S)}
              push @F, $.==1 ? "Grade" : $g[$F[-1]/10-5]; print @F' marks.txt
two file example
$ cat list4
Raj class_rep
Amy sports_rep
Tia placement_rep

$ # same as: awk -v OFS='\t' 'NR==FNR{r[$1]=$2; next}
$ #          {NF++; $NF = FNR==1 ? "Role" : $NF=r[$2]} 1' list4 marks.txt
$ perl -lane 'if(!$#ARGV){ $r{$F[0]}=$F[1]; $.=0 }
              else{ push @F, $.==1 ? "Role" : $r{$F[1]};
                    print join "\t", @F }' list4 marks.txt
Dept    Name    Marks   Role
ECE     Raj     53      class_rep
ECE     Joel    72
EEE     Moi     68
CSE     Surya   81
EEE     Tia     59      placement_rep
ECE     Om      92
CSE     Amy     67      sports_rep
Multiple file input
there is no gawk's FNR/BEGINFILE/ENDFILE equivalent in perl, but it can be worked around
$ # same as: awk 'FNR==2' poem.txt greeting.txt
$ # close ARGV will reset $. to 0
$ perl -ne 'print if $.==2; close ARGV if eof' poem.txt greeting.txt
Violets are blue,
Have a safe journey
$ # same as: awk 'BEGINFILE{print "file: "FILENAME} ENDFILE{print $0"\n------"}'
$ perl -lne 'print "file: $ARGV" if $.==1;
             print "$_\n------" and close ARGV if eof' poem.txt greeting.txt
file: poem.txt
And so are you.
------
file: greeting.txt
Have a safe journey
------
workaround for gawk's nextfile
    to skip remaining lines from current file being processed and move on to next file
$ # same as: head -q -n1 and awk 'FNR>1{nextfile} 1'
$ perl -pe 'close ARGV if $.>=1' poem.txt greeting.txt fruits.txt
Roses are red,
Hello there
fruit   qty

$ # same as: awk 'tolower($1) ~ /red/{print FILENAME; nextfile}' *
$ perl -lane 'print $ARGV and close ARGV if $F[0] =~ /red/i' *
colors_1.txt
colors_2.txt
Dealing with duplicates
retain only first copy of duplicates
$ cat duplicates.txt
abc  7   4
food toy ****
abc  7   4
test toy 123
good toy ****

$ # whole line, same as: awk '!seen[$0]++' duplicates.txt
$ perl -ne 'print if !$seen{$_}++' duplicates.txt
abc  7   4
food toy ****
test toy 123
good toy ****

$ # particular column, same as: awk '!seen[$2]++' duplicates.txt
$ perl -ane 'print if !$seen{$F[1]}++' duplicates.txt
abc  7   4
food toy ****

$ # total count, same as: awk '!seen[$2]++{c++} END{print +c}' duplicates.txt
$ perl -lane '$c++ if !$seen{$F[1]}++; END{print $c+0}' duplicates.txt
2
if input is so large that integer numbers can overflow
    See also perldoc - bignum
$ perl -le 'print "equal" if
            102**33==1922231403943151831696327756255167543169267432774552016351387451392'
$ # -M option here enables the use of bignum module
$ perl -Mbignum -le 'print "equal" if
                     102**33==1922231403943151831696327756255167543169267432774552016351387451392'
equal
$#avoidunnecessarycountingaltogether
$#sameas:awk'!($2inseen);{seen[$2]}'duplicates.txt
$perl-ane'printif!$seen{$F[1]};$seen{$F[1]}=1'duplicates.txt
abc74
foodtoy****
$#sameas:awk-M'!($2inseen){c++}{seen[$2]}END{print+c}'duplicates.txt
$perl-Mbignum-lane'$c++if!$seen{$F[1]};$seen{$F[1]}=1;
END{print$c+0}'duplicates.txt
2
multiple fields
See also unix.stackexchange - based on same fields that could be in different order
$ # same as: awk '!seen[$2,$3]++' duplicates.txt
$ # default SUBSEP (stored in $;) is \034, same as GNU awk
$ perl -ane 'print if !$seen{$F[1],$F[2]}++' duplicates.txt
abc  7   4
food toy ****
test toy 123
$ # or use multidimensional key
$ perl -ane 'print if !$seen{$F[1]}{$F[2]}++' duplicates.txt
abc  7   4
food toy ****
test toy 123
retaining specific copy
$ # second occurrence of duplicate
$ # same as: awk '++seen[$2]==2' duplicates.txt
$ perl -ane 'print if ++$seen{$F[1]}==2' duplicates.txt
abc  7   4
test toy 123
$ # third occurrence of duplicate
$ # same as: awk '++seen[$2]==3' duplicates.txt
$ perl -ane 'print if ++$seen{$F[1]}==3' duplicates.txt
good toy ****
$ # retaining only last copy of duplicate
$ # reverse the input line-wise, retain first copy and then reverse again
$ # same as: tac duplicates.txt | awk '!seen[$2]++' | tac
$ tac duplicates.txt | perl -ane 'print if !$seen{$F[1]}++' | tac
abc  7   4
good toy ****
filtering based on duplicate count
allows emulating the uniq command for specific fields
$ # all duplicates based on 1st column
$ # same as: awk 'NR==FNR{a[$1]++; next} a[$1]>1' duplicates.txt duplicates.txt
$ perl -ane 'if(!$#ARGV){ $x{$F[0]}++ }
             else{ print if $x{$F[0]}>1 }' duplicates.txt duplicates.txt
abc  7   4
abc  7   4
$ # more than 2 duplicates based on 2nd column
$ # same as: awk 'NR==FNR{a[$2]++; next} a[$2]>2' duplicates.txt duplicates.txt
$ perl -ane 'if(!$#ARGV){ $x{$F[1]}++ }
             else{ print if $x{$F[1]}>2 }' duplicates.txt duplicates.txt
food toy ****
test toy 123
good toy ****
$ # only unique lines based on 3rd column
$ # same as: awk 'NR==FNR{a[$3]++; next} a[$3]==1' duplicates.txt duplicates.txt
$ perl -ane 'if(!$#ARGV){ $x{$F[2]}++ }
             else{ print if $x{$F[2]}==1 }' duplicates.txt duplicates.txt
test toy 123
Lines between two REGEXPs
This section deals with filtering lines bound by two REGEXPs (referred to as blocks)
For simplicity, the two REGEXPs used in most of the examples below are the strings BEGIN and END
All unbroken blocks
Consider the sample input file below, which doesn't have any broken blocks (i.e. BEGIN and END are always present in pairs)
$ cat range.txt
foo
BEGIN
1234
6789
END
bar
BEGIN
a
b
c
END
baz
Extracting lines between starting and ending REGEXP
$ # include both starting/ending REGEXP
$ # same as: awk '/BEGIN/{f=1} f; /END/{f=0}' range.txt
$ perl -ne '$f=1 if /BEGIN/; print if $f; $f=0 if /END/' range.txt
BEGIN
1234
6789
END
BEGIN
a
b
c
END
$ # can also use: perl -ne 'print if /BEGIN/../END/' range.txt
$ # which is similar to sed -n '/BEGIN/,/END/p'
$ # but not suitable to extend for other cases
other variations
$ # same as: awk '/END/{f=0} f; /BEGIN/{f=1}' range.txt
$ perl -ne '$f=0 if /END/; print if $f; $f=1 if /BEGIN/' range.txt
1234
6789
a
b
c
$ # check out what these do:
$ perl -ne '$f=1 if /BEGIN/; $f=0 if /END/; print if $f' range.txt
$ perl -ne 'print if $f; $f=0 if /END/; $f=1 if /BEGIN/' range.txt
Extracting lines other than lines between the two REGEXPs
$ # same as: awk '/BEGIN/{f=1} !f; /END/{f=0}' range.txt
$ # can also use: perl -ne 'print if !(/BEGIN/../END/)' range.txt
$ perl -ne '$f=1 if /BEGIN/; print if !$f; $f=0 if /END/' range.txt
foo
bar
baz
$ # the other three cases would be
$ perl -ne '$f=0 if /END/; print if !$f; $f=1 if /BEGIN/' range.txt
$ perl -ne 'print if !$f; $f=1 if /BEGIN/; $f=0 if /END/' range.txt
$ perl -ne '$f=1 if /BEGIN/; $f=0 if /END/; print if !$f' range.txt
Specific blocks
Getting first block
$ # same as: awk '/BEGIN/{f=1} f; /END/{exit}' range.txt
$ perl -ne '$f=1 if /BEGIN/; print if $f; exit if /END/' range.txt
BEGIN
1234
6789
END
$ # use other tricks discussed in previous section as needed
$ # same as: awk '/END/{exit} f; /BEGIN/{f=1}' range.txt
$ perl -ne 'exit if /END/; print if $f; $f=1 if /BEGIN/' range.txt
1234
6789
Getting last block
$ # reverse input linewise, change the order of REGEXPs, finally reverse again
$ # same as: tac range.txt | awk '/END/{f=1} f; /BEGIN/{exit}' | tac
$ tac range.txt | perl -ne '$f=1 if /END/; print if $f; exit if /BEGIN/' | tac
BEGIN
a
b
c
END
$ # or, save the blocks in a buffer and print the last one alone
$ # same as: awk '/4/{f=1; b=$0; next} f{b=b ORS $0} /6/{f=0} END{print b}'
$ seq 30 | perl -ne 'if(/4/){$f=1; $b=$_; next}
                     $b.=$_ if $f; $f=0 if /6/; END{print $b}'
24
25
26
Getting blocks based on a counter
$ # get only 2nd block
$ # same as: seq 30 | awk -v b=2 '/4/{c++} c==b{print; if(/6/) exit}'
$ seq 30 | b=2 perl -ne '$c++ if /4/; if($c==$ENV{b}){print; exit if /6/}'
14
15
16
$ # to get all blocks after the first 'b' blocks
$ # same as: seq 30 | awk -v b=1 '/4/{f=1; c++} f && c>b; /6/{f=0}'
$ seq 30 | b=1 perl -ne '$f=1, $c++ if /4/;
                         print if $f && $c>$ENV{b}; $f=0 if /6/'
14
15
16
24
25
26
excluding a particular block
$ # excludes 2nd block
$ # same as: seq 30 | awk -v b=2 '/4/{f=1; c++} f && c!=b; /6/{f=0}'
$ seq 30 | b=2 perl -ne '$f=1, $c++ if /4/;
                         print if $f && $c!=$ENV{b}; $f=0 if /6/'
4
5
6
24
25
26
extract block only if it matches another string as well
$ # string to match inside block: 23
$ perl -ne 'if(/BEGIN/){$f=1; $m=0; $b=""}; $m=1 if $f && /23/;
            $b.=$_ if $f; if(/END/){print $b if $m; $f=0}' range.txt
BEGIN
1234
6789
END
$ # line to match inside block: 5 or 25
$ seq 30 | perl -ne 'if(/4/){$f=1; $m=0; $b=""}; $m=1 if $f && /^(5|25)$/;
                     $b.=$_ if $f; if(/6/){print $b if $m; $f=0}'
4
5
6
24
25
26
Broken blocks
If there are blocks with an ending REGEXP but without a corresponding start, the techniques used earlier will suffice
Consider the modified input file where the starting REGEXP doesn't have a corresponding ending
$ cat broken_range.txt
foo
BEGIN
1234
6789
END
bar
BEGIN
a
b
c
baz
$ # the file reversing trick comes in handy here as well
$ # same as: tac broken_range.txt | awk '/END/{f=1} f; /BEGIN/{f=0}' | tac
$ tac broken_range.txt | perl -ne '$f=1 if /END/;
                                   print if $f; $f=0 if /BEGIN/' | tac
BEGIN
1234
6789
END
But if both kinds of broken blocks are present, for example:
$ cat multiple_broken.txt
qqqqqqq
BEGIN
foo
BEGIN
1234
6789
END
bar
END
0-42-1
BEGIN
a
BEGIN
b
END
xyzabc
then use buffers to accumulate the records and print accordingly
$ # same as: awk '/BEGIN/{f=1; buf=$0; next} f{buf=buf ORS $0}
$ #               /END/{f=0; if(buf) print buf; buf=""}' multiple_broken.txt
$ perl -ne 'if(/BEGIN/){$f=1; $b=$_; next} $b.=$_ if $f;
            if(/END/){$f=0; print $b if $b; $b=""}' multiple_broken.txt
BEGIN
1234
6789
END
BEGIN
b
END
$ # note how buffer is initialized as well as cleared
$ # on matching beginning/end REGEXPs respectively
$ # 'undef $b' can also be used here instead of $b=""
Array operations
initialization
$ # list example, each value is separated by comma
$ perl -e '($x, $y) = (4, 5); print "$x:$y\n"'
4:5
$ # using list to initialize arrays, allows variable interpolation
$ # ($x, $y) = ($y, $x) will swap variables :)
$ perl -e '@nums = (4, 5, 84); print "@nums\n"'
4 5 84
$ perl -e '@nums = (4, 5, 84, "foo"); print "@nums\n"'
4 5 84 foo
$ perl -e '$x = 5; @y = (3, 2); @nums = ($x, "good", @y); print "@nums\n"'
5 good 3 2
$ # use qw to specify string elements separated by space, no interpolation
$ perl -e '@nums = qw(4 5 84 "foo"); print "@nums\n"'
4 5 84 "foo"
$ perl -e '@nums = qw(a $x @y); print "@nums\n"'
a $x @y
$ # use different delimiter as needed
$ perl -e '@nums = qw/baz 1) foo/; print "@nums\n"'
baz 1) foo
accessing individual elements
See also perldoc - functions for arrays for push, pop, shift, unshift functions
$ # index starts from 0
$ perl -le '@nums = (4, "foo", 2, "x"); print $nums[0]'
4
$ # note the use of $ when accessing individual element
$ perl -le '@nums = (4, "foo", 2, "x"); print $nums[2]'
2
$ # to access elements from the end, use negative index starting from -1
$ perl -le '@nums = (4, "foo", 2, "x"); print $nums[-1]'
x
$ # index of last element in array
$ perl -le '@nums = (4, "foo", 2, "x"); print $#nums'
3
$ # size of array, i.e. total number of elements
$ perl -le '@nums = (4, "foo", 2, "x"); $s = @nums; print $s'
4
$ perl -le '@nums = (4, "foo", 2, "x"); print scalar @nums'
4
array slices
See also perldoc - Range Operators
$ # note the use of @ when accessing more than one element
$ echo 'a b c d' | perl -lane 'print "@F[0,-1,2]"'
a d c
$ # range operator
$ echo 'a b c d' | perl -lane 'print "@F[1..2]"'
b c
$ # rotating elements
$ echo 'a b c d' | perl -lane 'print "@F[1..$#F,0]"'
b c d a
$ # index needed can be given from another array too
$ echo 'a b c d' | perl -lane '@i=(3,1); print "@F[@i]"'
d b
$ # easy swapping of columns
$ perl -lane 'print join "\t", @F[1,0]' fruits.txt
qty     fruit
42      apple
31      banana
90      fig
6       guava
range operator also allows handy initialization
$ perl -le '@n=(12..17); print "@n"'
12 13 14 15 16 17
$ perl -le '@n=(l..ad); print "@n"'
l m n o p q r s t u v w x y z aa ab ac ad
Iteration and filtering
See also stackoverflow - extracting multiline text and performing substitution
$ # foreach will return each value one by one
$ # can also use 'for' keyword instead of 'foreach'
$ perl -le 'print $_*2 foreach (12..14)'
24
26
28
$ # iterate using index
$ perl -le '@x=(a..e); foreach (0..$#x){print $x[$_]}'
a
b
c
d
e
$ # C-style for loop can be used as well
$ perl -le '@x=(a..c); for($i=0;$i<=$#x;$i++){print $x[$i]}'
a
b
c
use grep for filtering array elements based on a condition
See also unix.stackexchange - extract specific fields and use corresponding header text
$ # as usual, $_ will get the value each iteration
$ perl -le '$,=" "; print grep { /[35]/ } 2..26'
3 5 13 15 23 25
$ # alternate syntax
$ perl -le '$,=" "; print grep /[35]/, 2..26'
3 5 13 15 23 25
$ # to get index instead of matches
$ perl -le '$,=" "; @n=(2..26); print grep {$n[$_]=~/[35]/} 0..$#n'
1 3 11 13 21 23
$ # compare values
$ s='23 756 -983 5'
$ echo "$s" | perl -lane 'print join " ", grep $_<100, @F'
23 -983 5
$ # filters only those elements with successful substitution
$ # note that it would modify array elements as well
$ echo "$s" | perl -lane 'print join " ", grep s/3/E/, @F'
2E -98E
more examples
$ # filtering column(s) based on header
$ perl -lane '@i = grep {$F[$_] eq "Name"} 0..$#F if $.==1;
              print @F[@i]' marks.txt
Name
Raj
Joel
Moi
Surya
Tia
Om
Amy
$ cat split.txt
foo,1:2:5,baz
wry,4,look
free,3:8,oh
$ # print line if more than one column has a digit
$ perl -F: -lane 'print if (grep /\d/, @F) > 1' split.txt
foo,1:2:5,baz
free,3:8,oh
to get random element from array
$ s='65 23 756 -983 5'
$ echo "$s" | perl -lane 'print $F[rand @F]'
5
$ echo "$s" | perl -lane 'print $F[rand @F]'
23
$ echo "$s" | perl -lane 'print $F[rand @F]'
-983
$ # in scalar context, size of array gets passed to rand
$ # rand actually returns a float
$ # which then gets converted to int index
Sorting
See perldoc - sort for details
$a and $b are special variables used for sorting, avoid using them as user defined variables
$ # by default, sort does string comparison
$ s='foo baz v22 aimed'
$ echo "$s" | perl -lane 'print join " ", sort @F'
aimed baz foo v22
$ # same as default sort
$ echo "$s" | perl -lane 'print join " ", sort { $a cmp $b } @F'
aimed baz foo v22
$ # descending order, note how $a and $b are switched
$ echo "$s" | perl -lane 'print join " ", sort { $b cmp $a } @F'
v22 foo baz aimed
$ # functions can be used for custom sorting
$ # lc lowercases string, so this sorts case insensitively
$ perl -lane 'print join " ", sort { lc $a cmp lc $b } @F' poem.txt
are red, Roses
are blue, Violets
is Sugar sweet,
And are so you.
sorting characters within word
$ echo 'foobar' | perl -F -lane 'print sort @F'
abfoor
$ cat words.txt
bot
art
are
boat
toe
flee
reed
$ # words with characters in ascending order
$ perl -F -lane 'print if (join "", sort @F) eq $_' words.txt
bot
art
$ # words with characters in descending order
$ perl -F -lane 'print if (join "", sort {$b cmp $a} @F) eq $_' words.txt
toe
reed
for numeric comparison, use <=> instead of cmp
$ s='23 756 -983 5'
$ echo "$s" | perl -lane 'print join " ", sort { $a <=> $b } @F'
-983 5 23 756
$ echo "$s" | perl -lane 'print join " ", sort { $b <=> $a } @F'
756 23 5 -983
$ # sorting strings based on their length
$ s='floor bat to dubious four'
$ echo "$s" | perl -lane 'print join ":", sort { length $a <=> length $b } @F'
to:bat:four:floor:dubious
sorting columns based on header
$ # need to get indexes of order required for header, then use it for all lines
$ perl -lane '@i = sort {$F[$a] cmp $F[$b]} 0..$#F if $.==1;
              print join "\t", @F[@i]' marks.txt
Dept    Marks   Name
ECE     53      Raj
ECE     72      Joel
EEE     68      Moi
CSE     81      Surya
EEE     59      Tia
ECE     92      Om
CSE     67      Amy
$ perl -lane '@i = sort {$F[$b] cmp $F[$a]} 0..$#F if $.==1;
              print join "\t", @F[@i]' marks.txt
Name    Marks   Dept
Raj     53      ECE
Joel    72      ECE
Moi     68      EEE
Surya   81      CSE
Tia     59      EEE
Om      92      ECE
Amy     67      CSE
Further Reading
perldoc - How do I sort a hash (optionally by value instead of key)?
stackoverflow - sort the keys of a hash by value
stackoverflow - sort only from 2nd field, ignore header
stackoverflow - sort based on group of lines
Transforming
shuffling list elements
$ s='23 756 -983 5'
$ # note that this doesn't change the input array
$ echo "$s" | perl -MList::Util=shuffle -lane 'print join " ", shuffle @F'
756 23 -983 5
$ echo "$s" | perl -MList::Util=shuffle -lane 'print join " ", shuffle @F'
5 756 23 -983
$ # randomizing file contents
$ perl -MList::Util=shuffle -e 'print shuffle <>' poem.txt
Sugar is sweet,
And so are you.
Violets are blue,
Roses are red,
$ # or if shuffle order is known
$ seq 5 | perl -e '@lines=<>; print @lines[3,1,0,2,4]'
4
2
1
3
5
use map to transform every element
$ echo '23 756 -983 5' | perl -lane 'print join " ", map {$_*$_} @F'
529 571536 966289 25
$ echo 'a b c' | perl -lane 'print join ",", map {qq/"$_"/} @F'
"a","b","c"
$ echo 'a b c' | perl -lane 'print join ",", map {uc qq/"$_"/} @F'
"A","B","C"
$ # changing the array itself
$ perl -le '@s=(4, 245, 12); map {$_*$_} @s; print join " ", @s'
4 245 12
$ perl -le '@s=(4, 245, 12); map {$_=$_*$_} @s; print join " ", @s'
16 60025 144
$ # ASCII int values for each character
$ echo 'AaBbCc' | perl -F -lane 'print join " ", map ord, @F'
65 97 66 98 67 99
$ s='this is a sample sentence'
$ # shuffle each word, split here converts each element to character array
$ # join the characters after shuffling with empty string
$ # finally print each changed element with space as separator
$ echo "$s" | perl -MList::Util=shuffle -lane '$,=" ";
              print map {join "", shuffle split//} @F;'
tshi si a mleasp ncstneee
fun little unreadable script...
$ cat para.txt
Why cannot I go back to my ignorant days with wild imaginations and fantasies?
Perhaps the answer lies in not being able to adapt to my freedom.
Those little dreams, goal setting, anticipation of results, used to be my world.
All joy within the soul and less dependent on outside world.
But all these are absent for a long time now.
Hope I can wake those dreams all over again.
$ perl -MList::Util=shuffle -F'/([^a-zA-Z]+)/' -lane '
print map {@c=split//; $#c<3 || /[^a-zA-Z]/? $_ :
  join "",$c[0],(shuffle @c[1..$#c-1]),$c[-1]} @F;' para.txt
Why coannt I go back to my inoagrnt dyas wtih wild imiaintangos and fatenasis?
Phearps the awsenr lies in not bieng albe to aadpt to my fedoerm.
Toshe llttie draems, goal stetnig, aaioiciptntn of rtuelss, uesd to be my wrlod.
All joy witihn the suol and less dnenepedt on oiduste world.
But all tsehe are abenst for a lnog tmie now.
Hpoe I can wkae toshe daemrs all over aiagn.
reverse array
See also stackoverflow - apply tr and reverse to particular column
$ s='23 756 -983 5'
$ echo "$s" | perl -lane 'print join " ", reverse @F'
5 -983 756 23
$ echo 'foobar' | perl -lne 'print reverse split//'
raboof
$ # can also use scalar context instead of using split
$ echo 'foobar' | perl -lne '$x=reverse; print $x'
raboof
$ echo 'foobar' | perl -lne 'print scalar reverse'
raboof
Miscellaneous
split
the -a command line option uses split and automatically saves the results in @F array
default separator is \s+ (leading whitespace is ignored, similar to awk)
by default acts on $_
and by default all splits are performed
See also perldoc - split function
$ echo 'a 1 b 2 c' | perl -lane 'print $F[2]'
b
$ echo 'a 1 b 2 c' | perl -lne '@x=split; print $x[2]'
b
$ # temp variable can be avoided by using list context
$ echo 'a 1 b 2 c' | perl -lne 'print join ":", (split)[2,-1]'
b:c
$ # using digits as separator
$ echo 'a 1 b 2 c' | perl -lne '@x=split /\d+/; print ":$x[1]:"'
: b :
$ # specifying maximum number of splits
$ echo 'a 1 b 2 c' | perl -lne '@x=split /\h+/,$_,2; print "$x[0]:$x[1]:"'
a:1 b 2 c:
$ # specifying limit using -F option
$ echo 'a 1 b 2 c' | perl -F'/\h+/,$_,2' -lane 'print "$F[0]:$F[1]:"'
a:1 b 2 c:
by default, trailing empty fields are stripped
specify a negative value to preserve trailing empty fields
$ echo ':123::' | perl -lne 'print scalar split /:/'
2
$ echo ':123::' | perl -lne 'print scalar split /:/,$_,-1'
4
$ echo ':123::' | perl -F: -lane 'print scalar @F'
2
$ echo ':123::' | perl -F'/:/,$_,-1' -lane 'print scalar @F'
4
to save the separators as well, use capture groups
$ echo 'a 1 b 2 c' | perl -lne '@x=split /(\d+)/; print "$x[1],$x[3]"'
1,2
$ # or, without the temp variable
$ echo 'a 1 b 2 c' | perl -lne 'print join ",", (split /(\d+)/)[1,3]'
1,2
$ # same can be done for -F option
$ echo 'a 1 b 2 c' | perl -F'(\d+)' -lane 'print "$F[1],$F[3]"'
1,2
single line to multiple lines by splitting a column
$ cat split.txt
foo,1:2:5,baz
wry,4,look
free,3:8,oh
$ perl -F, -ane 'print join ",", $F[0],$_,$F[2] for split /:/,$F[1]' split.txt
foo,1,baz
foo,2,baz
foo,5,baz
wry,4,look
free,3,oh
free,8,oh
weird behavior if literal space character is used with -F option
$ # only one element in @F array
$ echo 'a 1 b 2 c' | perl -F'/b /' -lane 'print $F[1]'
$ # space is not used as part of the separator
$ echo 'a 1 b 2 c' | perl -F'b ' -lane 'print $F[1]'
 2 c
$ # correct behavior
$ echo 'a 1 b 2 c' | perl -F'b\x20' -lane 'print $F[1]'
2 c
$ # errors out if space is used inside character class
$ echo 'a 1 b 2 c' | perl -F'/b[ ]/' -lane 'print $F[1]'
Unmatched [ in regex; marked by <-- HERE in m/b[ <-- HERE / at -e line 1.
... out.txt/'
$ cat out.txt
1,2,3,4,5,6,7,8,9,10
$ cat f2
I bought two bananas and three mangoes
$ echo 'f1,f2,odd.txt' | perl -F, -lane 'system "cat $F[1]"'
I bought two bananas and three mangoes
the return value of system has the exit status information, or $? can be used
see perldoc - system for details
$ perl -le '$es=system q/ls poem.txt/; print "$es"'
poem.txt
0
$ perl -le 'system q/ls poem.txt/; print "exit status: $?"'
poem.txt
exit status: 0
$ perl -le 'system q/ls xyz.txt/; print "exit status: $?"'
ls: cannot access 'xyz.txt': No such file or directory
exit status: 512
to save result of external command, use backticks or qx operator
newline gets saved too, use chomp if needed
$ perl -e '$lines = `wc -l