Perl: Pattern-Matching Operators

Source: Internet    Editor: 程序博客网    Time: 2024/06/06 10:33

Some characters don't match themselves, but "misbehave" in some way. We call these metacharacters. (All metacharacters are naughty in their own right, but some are so bad that they also cause other nearby characters to misbehave as well.)

Here are the miscreants:

\ | ( ) [ { ^ $ * + ? .

We can always match any of these twelve characters literally by putting a backslash in front of it. Backslashing an alphanumeric character does the opposite: it turns the literal character into something special. Whenever you see such a two-character sequence, it is a metasymbol with a special meaning:

\d \D \t \3 \s

\b matches a word boundary. A word boundary is zero characters wide because it's the spot between two characters, so we call \b a zero-width assertion.
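The difference between a backslashed metacharacter and a backslashed alphanumeric can be sketched as follows. This is a minimal illustration; the sample strings and variable names are made up for the example.

```perl
use strict;
use warnings;

my $price = 'cost: $3.50 (approx.)';

# Backslashed metacharacters match literally: a dollar sign, digits, a dot, digits.
my $has_price = $price =~ /\$\d+\.\d+/ ? 1 : 0;

# \b is zero-width: it matches at the edges of "cat" without consuming a character.
my $standalone = 'a cat sat'   =~ /\bcat\b/ ? 1 : 0;
my $embedded   = 'concatenate' =~ /\bcat\b/ ? 1 : 0;

print "$has_price $standalone $embedded\n";   # prints "1 1 0"
```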

Pattern-Matching Operators

The m// and s/// operators also provide the power of double-quote interpolation. Since patterns are parsed like double-quoted strings, all the normal double-quote conventions will work, including variable interpolation and special characters indicated with backslash escapes. These are applied before the string is interpreted as a regular expression. (This is one of the few places in the Perl language where a string undergoes more than one pass of processing.) The first pass is not quite normal double-quote interpolation, in that it knows what it should interpolate and what it should pass on to the regular expression parser.

 

    $foo = "bar";

    /$foo$/;    # equivalent to /bar$/

Another consequence of this two-pass parsing is that the ordinary Perl tokener finds the end of the regular expression first, just as if it were looking for the terminating delimiter of an ordinary string. Only after it has found the end of the string is the pattern treated as a regular expression.

You should also know that interpolating variables into a pattern slows down the pattern matcher, because it feels it needs to check whether the variable has changed, in case it has to recompile the pattern.

The tr/// transliteration operator does not interpolate variables; it doesn't even use regular expressions! It does share one feature with m// and s///, however: it binds to variables using the =~ and !~ operators.
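As a small sketch of the interpolation pass described above (the test strings are invented for illustration): the variable is expanded, while the trailing $ is left alone and reaches the regex parser as an end-of-string anchor.

```perl
use strict;
use warnings;

my $foo = "bar";

# $foo is interpolated; the final $ is passed through as an anchor.
my $ends_with_bar = "crowbar" =~ /$foo$/ ? 1 : 0;
my $bar_in_middle = "barfly"  =~ /$foo$/ ? 1 : 0;

print "$ends_with_bar $bar_in_middle\n";   # prints "1 0"
```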

    m//    (same as //)   : matching a string against a pattern

    tr///  (same as y///) : transliterating one set of characters to another set

    s///                  : substituting some string for a substring matched by a pattern

If the righthand side of =~ or !~ is none of these three, it still counts as an m// matching operation, but there'll be no place to put any trailing modifiers, and you'll have to handle your own quoting.

Apart from the m// and s/// operators, regular expressions show up in two other places in Perl. The first argument to the split function is a special match operator specifying what not to return when breaking a string into multiple substrings. The other is the qr// (quote regex) operator, which returns the compiled form of the regex for future use.

Without a binding operator, $_ is implicitly used as the "topic".

=~ and !~ have rather high precedence: they bind more tightly than multiplication, but less tightly than unary operators such as ! and unary minus.
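A minimal sketch of both uses, with made-up data: the pattern handed to split describes the separators to throw away, and qr// produces a compiled pattern that can be stored in a variable and interpolated later.

```perl
use strict;
use warnings;

# The pattern given to split describes the separators, i.e. what is NOT returned.
my @fields = split /\s*,\s*/, "egg , larva,pupa ,imago";

# qr// compiles a pattern and returns it for later use inside m// or s///.
my $stage  = qr/larva|pupa/;
my @middle = grep { /^$stage$/ } @fields;

print join("|", @fields), "\n";   # prints "egg|larva|pupa|imago"
print join("|", @middle), "\n";   # prints "larva|pupa"
```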

 

Since m//, s///, and tr/// are quote operators, you may pick your own delimiters. When using paired delimiters with s/// or tr///, if the first part is one of the four customary bracketing pairs (angle, round, square, or curly), you may choose different delimiters for the second part than you chose for the first:

    s(egg)<larva>;    s{larva}{pupa};    s[pupa]/imago/;

Whitespace is allowed in front of the opening delimiters:

    s (egg)   <larva>;
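The delimiter choices above can be tried out in sequence. This is only a sketch; the variable name is hypothetical.

```perl
use strict;
use warnings;

my $stage = "egg";

$stage =~ s(egg)<larva>;      # round brackets, then angle brackets
$stage =~ s{larva} {pupa};    # whitespace between the two bracketed parts is allowed
$stage =~ s[pupa]/imago/;     # brackets for the pattern, slashes for the replacement

print "$stage\n";   # prints "imago"
```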

Each time a pattern successfully matches (including the pattern in a substitution), it sets the variables $`, $&, and $' (the text left of the match, the whole match, and the text right of the match).

Use parentheses to capture the particular portions that you want to keep around. Each pair of parentheses captures the substring corresponding to the subpattern in the parentheses. The pairs of parentheses are numbered from left to right by the positions of the left parentheses; the substrings corresponding to these subpatterns are available after the match in the numbered variables $1, $2, $3, and so on.

$`, $&, $', and the numbered variables are global variables implicitly localized to the enclosing dynamic scope. They last until the next successful pattern match or the end of the current scope, whichever comes first. Note that the numbering starts at $1, not $0, which holds the name of your program.
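A short sketch of these variables after one successful match; the input line is invented, and the values are copied into lexicals right away because the next successful match would overwrite them.

```perl
use strict;
use warnings;

my $line = "released 2024-06-06 at noon";
my ($year, $month, $day, $before, $after);

if ($line =~ /(\d+)-(\d+)-(\d+)/) {
    # Groups are numbered by the position of their left parenthesis.
    ($year, $month, $day) = ($1, $2, $3);
    # $` and $' hold the text before and after the match.
    ($before, $after) = ($`, $');
}

print join("|", $year, $month, $day, $before, $after), "\n";
# prints "2024|06|06|released | at noon"
```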

【Pattern Modifiers】

Immediately following the final delimiter of an m//, s///, qr//, or tr/// operator, you may optionally place one or more single-letter modifiers, in any order. The tr/// operator does not take regexes, so these modifiers do not apply to it.

        /i  : Ignore alphabetic case

        /s  : Let . match newline and ignore deprecated $* variable

        /m  : Let ^ and $ match next to embedded \n

        /x  : Ignore whitespace and permit comments in pattern

        /o  : Compile pattern once only

        /g  : Globally find all matches, only for m// and s///

        /cg : Allow continued search after failed /g match, only for m//

        /e  : Evaluate the right side as an expression, only for s///

The /o modifier controls pattern recompilation. Unless the delimiters chosen are single quotes (m'PATTERN'), any variables in the pattern will be interpolated every time the pattern operator is evaluated. The /o modifier prevents these expensive run-time recompilations by interpolating and compiling the pattern only once. For better control over recompilation, use the qr// regex quoting operator.

The /x modifier is the expressive one: it lets you use whitespace and explanatory comments to make your pattern more legible, even extending the pattern across line boundaries. /x changes the meaning of the whitespace characters (and the # character): instead of matching themselves as ordinary characters do, they become metacharacters that the matcher ignores. /x therefore allows spaces, tabs, and newlines for formatting, just like regular Perl code. It also allows the # character to introduce a comment that extends through the end of the current line within the pattern string:

    m/\w+:(\s+\w+)\s*\d+/;         # a word, colon, space, word, space, digits

    m/\w+: (\s+ \w+) \s* \d+/x;    # a word, colon, space, word, space, digits
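The same pattern can also be spread over several lines with one comment per piece. This is a sketch with an invented input string; note that comments inside the pattern should avoid the delimiter and variable-like text, since the pattern is still scanned for its end and interpolated first.

```perl
use strict;
use warnings;

my $record = "widgets:   blue   42";

my $matched = $record =~ m/
    \w+ :          # a word followed by a colon
    ( \s+ \w+ )    # whitespace and a word, captured as group 1
    \s* \d+        # optional whitespace, then digits
/x;

my $captured = $1;
print "[$captured]\n";   # prints "[   blue]"
```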

【The m// Operator (Matching)】

      EXPR =~ m/PATTERN/cgimosx

      EXPR =~ /PATTERN/cgimosx

      EXPR =~ ?PATTERN?cgimosx

      m/PATTERN/cgimosx

      /PATTERN/cgimosx

      ?PATTERN?cgimosx

If / or ? is the delimiter, the initial m is optional. (In Perl 5.22 and later the bare ?PATTERN? form is no longer accepted; write m?PATTERN? instead.) Both ? and ' have special meaning as delimiters: the first is a once-only match; the second suppresses variable interpolation.

If PATTERN evaluates to a null string, the last successfully executed regular expression not hidden within an inner block is used instead.

    In scalar context, the operator returns true if successful, false otherwise.

Used in list context, m// returns a list of substrings matched by the capturing parentheses in the pattern (that is, $1, $2, and so on). If the match succeeds in list context but there were no capturing parentheses (nor /g), a list value of (1) is returned. In list context, m//g returns a list of all matches found. If there are no capturing parentheses within the /g pattern, then the complete matches are returned. If there are capturing parentheses, then only the strings captured are returned.

If ? is the delimiter, as in ?PATTERN?, this works just like a normal /PATTERN/ search, except that it matches only once between calls to the reset operator. This can be a convenient optimization when you want to match only the first occurrence of the pattern during the run of the program, not all occurrences. The operator runs the search every time you call it, up until it finally matches something, after which it turns itself off, returning false until you explicitly turn it back on with reset.
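The list-context cases above can be compared side by side. A minimal sketch; the input string and variable names are invented.

```perl
use strict;
use warnings;

my $text = "x=1, y=22, z=333";

# List context with captures: returns the captured strings of the first match.
my ($name, $value) = $text =~ /(\w)=(\d+)/;

# List context with /g and no captures: returns every complete match.
my @numbers = $text =~ /\d+/g;

# List context with /g and captures: returns only the captured strings.
my %pairs = $text =~ /(\w)=(\d+)/g;

# List context, no captures, no /g: a successful match returns (1).
my @flag = $text =~ /y=/;

print "$name $value | @numbers | @flag | $pairs{z}\n";
# prints "x 1 | 1 22 333 | 1 | 333"
```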
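A sketch of the once-only behaviour, written with the explicit m?PATTERN? spelling; the helper first_beta and the sample lines are hypothetical.

```perl
use strict;
use warnings;

my @lines = ("alpha", "beta one", "beta two", "gamma");

# Collect every line on which the once-only match succeeds.
sub first_beta {
    my @hits;
    for my $line (@_) {
        push @hits, $line if $line =~ m?beta?;
    }
    return @hits;
}

my @run1 = first_beta(@lines);   # matches once, on "beta one", then switches off
my @run2 = first_beta(@lines);   # still switched off, so nothing matches
reset;                           # re-arm the once-only matches in this package
my @run3 = first_beta(@lines);   # matches once again

print scalar(@run1), " ", scalar(@run2), " ", scalar(@run3), "\n";   # prints "1 0 1"
```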

【The s/// Operator (Substitution)】

      LVALUE =~ s/PATTERN/REPLACEMENT/egimosx;

      s/PATTERN/REPLACEMENT/egimosx;

The return value of an s/// operation is the number of times it succeeded (which can be more than once only if /g is used). The replacement portion is treated as a double-quoted string. You may use any of the dynamically scoped pattern variables described earlier ($`, $&, $', $1, $2, and so on) in the replacement string, for instance:

        s/revision|version|release/\u$&/g;

        s/version ([0-9.]+)/the $Names{$1} release/g;

The /e modifier treats the replacement as a chunk of Perl code rather than as an interpolated string. The result of executing that code is used as the replacement string. For example, s/([0-9]+)/sprintf("%#x", $1)/ge.
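Both behaviours, the return count and an /e replacement, can be seen in a short sketch. The sample text is made up, and the second substitution works on a copy so that both results stay available.

```perl
use strict;
use warnings;

my $text = "version 10, revision 255";

# With /g the return value is the number of substitutions made.
my $count = ($text =~ s/revision|version/\u$&/g);

# With /e the replacement is Perl code; here each number is rewritten in hex.
(my $hex = $text) =~ s/([0-9]+)/sprintf("%#x", $1)/ge;

print "$count\n";   # prints "2"
print "$text\n";    # prints "Version 10, Revision 255"
print "$hex\n";     # prints "Version 0xa, Revision 0xff"
```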

You can't use an s/// operator directly on an array. For that, you need a loop.

Occasionally, you can't just use /g to get all the changes to occur, either because the substitutions have to happen right-to-left or because you need the length of $` to change between matches. You can usually do what you want by calling s/// repeatedly. However, you want the loop to stop when the s/// finally fails, so you have to put it into the conditional, which leaves nothing to do in the main part of the loop. So we just write a 1, which is a rather boring thing to do, but bored is the best you can hope for sometimes.

    1 while s/(\d)(\d\d\d)(?!\d)/$1,$2/;
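Applied to a concrete value (chosen here only for illustration), the loop inserts one comma per pass, working from the right-hand end of the digit run.

```perl
use strict;
use warnings;

my $number = "1234567";

# Pass 1 gives "1234,567", pass 2 gives "1,234,567", pass 3 fails and ends the loop.
# The loop body is just "1" because all the work happens in the condition.
1 while $number =~ s/(\d)(\d\d\d)(?!\d)/$1,$2/;

print "$number\n";   # prints "1,234,567"
```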

【The tr/// Operator (Transliteration)】

  LVALUE =~ tr/SEARCHLIST/REPLACEMENTLIST/cds

  tr/SEARCHLIST/REPLACEMENTLIST/cds

  y/// === tr///.

You can't call a function named y, any more than you can call a function named q or m. tr/// scans a string, character by character, and replaces each occurrence of a character found in SEARCHLIST with the corresponding character from REPLACEMENTLIST. This operator returns the number of characters replaced or deleted. The SEARCHLIST and REPLACEMENTLIST may define ranges of sequential characters with a dash. The SEARCHLIST and REPLACEMENTLIST are not variable interpolated as double-quoted strings; you may, however, use those backslash sequences that map to a specific character, such as \n or \015.

    Modifiers for tr///:

        /c : Complement SEARCHLIST. The search list then consists of all the characters not in SEARCHLIST.

        /d : Delete found but unreplaced characters: any characters specified by SEARCHLIST but not given a replacement in REPLACEMENTLIST are deleted.

        /s : Squash duplicate replaced characters

If the same character occurs more than once in SEARCHLIST, only the first is used. Therefore, tr/AAA/XYZ/ will change any single character A to an X.
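The return value and the three modifiers can be sketched on small invented strings. Each transliteration works on a copy so the original stays intact.

```perl
use strict;
use warnings;

my $dna = "AACCGGTTNN";

# With an empty REPLACEMENTLIST and no /d the string is unchanged; tr/// just counts.
my $count_a = ($dna =~ tr/A//);

# /d deletes characters in SEARCHLIST that have no replacement.
(my $no_n = $dna) =~ tr/N//d;

# /c complements SEARCHLIST; /s squashes runs of the same replaced character.
(my $masked = "ab12cd345") =~ tr/0-9/#/cs;

# A duplicated search character uses only its first mapping.
(my $dup = "AAA") =~ tr/AAA/XYZ/;

print "$count_a $no_n $masked $dup\n";   # prints "2 AACCGGTT #12#345 XXX"
```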

  Although variables aren't interpolated into tr///, you can still get the same effect by using eval EXPR:

    $count = eval "tr/$oldlist/$newlist/";
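A runnable sketch of the eval approach follows. The two list variables are hypothetical and must come from a trusted source, since their contents become part of the code that eval compiles.

```perl
use strict;
use warnings;

my ($oldlist, $newlist) = ("a-y", "b-z");   # hypothetical lists built at run time

$_ = "hello";
# The string is interpolated first, then compiled and run as tr/a-y/b-z/ on $_.
my $count = eval "tr/$oldlist/$newlist/";
die $@ if $@;

print "$count $_\n";   # prints "5 ifmmp"
```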

One more note: if you want to change your text to uppercase or lowercase, don't use tr///. Use the \U or \L sequences in a double-quoted string (or the equivalent uc and lc functions), since they will pay attention to locale or Unicode information and tr/a-z/A-Z/ won't.
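The difference shows up as soon as the text contains non-ASCII letters. In this sketch the Greek letters alpha and beta are written as \x{...} escapes so the example does not depend on the source file's encoding; the variable names are illustrative.

```perl
use strict;
use warnings;

my $word = "abc \x{3b1}\x{3b2}";        # "abc" plus Greek small alpha and beta

(my $by_tr = $word) =~ tr/a-z/A-Z/;      # maps only the ASCII letters
my $by_uc  = uc $word;                   # also maps the Greek letters
my $by_U   = "\U$word";                  # same result as uc, inside a string

print $by_uc eq $by_U  ? "same" : "different", "\n";   # prints "same"
print $by_tr eq $by_uc ? "same" : "different", "\n";   # prints "different"
```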