Perl: Staying in Control

Source: Internet  Editor: 程序博客网  Date: 2024/05/25 05:35

【When backslashes happen】

When you think of double-quote interpolation, you usually think of both variable and backslash interpolation. For regular expressions there are two passes, and the interpolation pass defers most of the backslash interpretation to the regular expression parser.

Suppose you abstracted out the column separator, like this:

    $colsep = "\t+";    # double quotes

    ($col1, $col2) = /(.*?) $colsep (.*?)/x;

Now you've just blown it, because the \t turns into a real tab before it gets to the regex parser, which will think you said /(.*?)+(.*?)/ after it discards the whitespace. To fix, avoid /x, or use single quotes. Or better, use qr//.
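The pitfall and two of the fixes can be sketched as follows. This is only an illustration under assumptions: the sample line, the variable names, and the use of (\w+) instead of (.*?) are not from the original, and were chosen so that the broken version visibly fails to match.

```perl
use strict;
use warnings;

my $line = "alpha\t\tbeta";   # hypothetical input: two tab-separated columns

my $sep_dq = "\t+";     # double quotes: \t is already a real tab character
my $sep_sq = '\t+';     # single quotes: backslash and "t" stay as typed
my $sep_qr = qr/\t+/;   # qr//: compiled as a regex in its own right

# Under /x the literal tab in $sep_dq is discarded as whitespace, so the
# pattern degenerates to /^(\w+)+(\w+)$/ and cannot get past the tabs.
my @broken = $line =~ /^(\w+) $sep_dq (\w+)$/x;

# Both fixes hand \t+ to the regex parser intact.
my @with_sq = $line =~ /^(\w+) $sep_sq (\w+)$/x;
my @with_qr = $line =~ /^(\w+) $sep_qr (\w+)$/x;

print "broken:  ", scalar(@broken), " captures\n";
print "single:  @with_sq\n";
print "qr//:    @with_qr\n";
```

With this input, the double-quoted separator yields no captures, while the single-quoted and qr// versions both split the line into "alpha" and "beta".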

The only double-quote escapes that are processed as such are the six translation escapes: \U, \u, \L, \l, \Q, and \E.
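A small sketch of translation escapes acting during the interpolation pass, before the regex parser sees the pattern. The variables $literal and $word are made up for this example.

```perl
use strict;
use warnings;

my $literal = "a.b";    # hypothetical text containing a regex metacharacter
my $word    = "camel";  # hypothetical lowercase word

# Without \Q, the dot reaches the regex parser as a wildcard.
my $plain_matches_axb  = "axb" =~ /^$literal$/     ? 1 : 0;

# \Q...\E runs during interpolation and escapes the dot first.
my $quoted_matches_axb = "axb" =~ /^\Q$literal\E$/ ? 1 : 0;
my $quoted_matches_a_b = "a.b" =~ /^\Q$literal\E$/ ? 1 : 0;

# \U and \u change case during interpolation as well.
my $upper_matches   = "CAMEL" =~ /^\U$word\E$/ ? 1 : 0;
my $ucfirst_matches = "Camel" =~ /^\u$word$/   ? 1 : 0;

print "plain=$plain_matches_axb quoted_axb=$quoted_matches_axb ",
      "quoted_a.b=$quoted_matches_a_b upper=$upper_matches ",
      "ucfirst=$ucfirst_matches\n";
```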

To defeat interpolation, you can use single quotes as your pattern delimiter. In m'...', qr'...', and s'...'...', the single quotes suppress variable interpolation and the processing of translation escapes, just as they would in a single-quoted string.
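To illustrate the effect of single-quote delimiters, here is a minimal sketch. The variable $name and the test strings are assumptions made for the example.

```perl
use strict;
use warnings;

my $name = "Frodo";   # hypothetical variable

# Ordinary delimiters: $name is interpolated, so the pattern is /^Frodo$/.
my $interpolated = "Frodo" =~ /^$name$/ ? 1 : 0;

# Single-quote delimiters: $name is not interpolated. The regex parser sees
# an end-of-line anchor followed by the letters "name", which cannot match.
my $suppressed = "Frodo" =~ m'^$name$' ? 1 : 0;

# Backslash sequences such as \t still work, because the regex parser
# (not the interpolation pass) is the one that interprets them.
my $tab_found = "a\tb" =~ m'a\tb' ? 1 : 0;

print "interpolated=$interpolated suppressed=$suppressed tab=$tab_found\n";
```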

【The qr// quote regex operator】

Variables that interpolate into patterns necessarily do so at run time, not compile time. This slows down execution because Perl has to check whether you've changed the contents of the variables; if so, it would have to recompile the regular expression. But you can use the /o option to interpolate and compile only once: print if /$pattern/o;

    qr/PATTERN/imosx

This operator quotes, and compiles, its pattern as a regular expression. PATTERN is interpolated the same way as in m/PATTERN/. If ' is used as the delimiter, no interpolation of variables is done. The operator returns a Perl value that may be used instead of the equivalent literal in a corresponding pattern match or substitute.

    $regex = qr/my.String/is;

    s/$regex/something else/;

is equivalent to:

    s/my.String/something else/is;
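As a rough sketch of how the compiled value behaves, the example below reuses $regex from above both on its own and inside a larger pattern. The test strings are invented for illustration.

```perl
use strict;
use warnings;

my $regex = qr/my.String/is;

# Used on its own, the compiled pattern carries its /i and /s flags with it,
# so it matches across the newline and regardless of case.
my $text = "MY\nSTRING here";
(my $replaced = $text) =~ s/$regex/something else/;

# Interpolated into a larger pattern, the flags apply only to the qr// part:
# " here" outside it is still matched case-sensitively.
my $lower = "my string here" =~ /^$regex here$/ ? 1 : 0;
my $upper = "MY STRING HERE" =~ /^$regex here$/ ? 1 : 0;

print "$replaced\n";
print "lower=$lower upper=$upper\n";
```

The flags given to qr// stay attached to that fragment, which is one reason it composes more predictably than a plain string.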

Any time you interpolate strings of unknown provenance into a pattern, you should be prepared to handle any exceptions thrown by the regex compiler, in case someone fed you a string containing untamable beasties.

    $re = qr/$pat/is;                   # dies if $pat is not a valid regex

    $re = eval { qr/$pat/is } || warn;  # traps the exception and warns instead
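One way to package this, as a sketch only: the helper compile_pattern below is hypothetical (it is not a built-in or part of the original text) and returns either a compiled regex or the error message captured from $@.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (regex, undef) on success or (undef, error)
# when the regex compiler rejects the pattern.
sub compile_pattern {
    my ($pat) = @_;
    my $re = eval { qr/$pat/is };
    return defined $re ? ($re, undef) : (undef, $@);
}

my ($good, $good_err) = compile_pattern('fo+');
my ($bad,  $bad_err)  = compile_pattern('(unclosed');

print "good pattern compiled\n"           if defined $good;
print "bad pattern rejected: $bad_err"    unless defined $bad;
```

Returning the error alongside the result lets the caller decide whether to warn, log, or fall back to a default pattern.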

【The Regex Compiler】

By saying use re "debug", you can examine how the regex parser processes your pattern.

    #!/usr/bin/perl

    use re "debug";

    "Smeagol" =~ /^Sm(.*)g[aeiou]l$/;

【The Little Engine That /Could(n't)?/】

The Engine uses a nondeterministic finite-state automaton to find a match. That just means that it keeps track of what it has tried and what it hasn't, and when something doesn't pan out, it backs up and tries something else. This is known as backtracking. The Engine is capable of trying a million subpatterns at one spot, then giving up on all those, backing up to within one choice of the beginning, and trying the million subpatterns again at a different spot. The Engine is not terribly intelligent; just persistent, and thorough.
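Backtracking can be observed through what the capture groups end up holding. The following is a small sketch; the second pattern and its input are made up for illustration.

```perl
use strict;
use warnings;

# (.*) first grabs "eagol", then gives back one character at a time until
# "g[aeiou]l" can match the tail of the string.
my ($middle) = "Smeagol" =~ /^Sm(.*)g[aeiou]l$/;

# The first a+ initially takes all three letters, then backs off by one so
# that the second a+ has something left to match.
my ($first, $second) = "aaa" =~ /^(a+)(a+)$/;

print "middle=$middle first=$first second=$second\n";
# prints "middle=ea first=aa second=a"
```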

Rule 1: The Engine tries to match as far left in the string as it can, such that the entire regular expression matches under Rule 2.

Rule 2: When the Engine encounters a set of alternatives (separated by |), either at the top level or at the current "cluster" level, it tries them left-to-right, stopping on the first successful match that allows successful completion of the entire pattern.

Rule 3: Any particular alternative matches if every item listed in the alternative matches sequentially according to Rules 4 and 5.

Rule 4: If an assertion does not match at the current position, the Engine backtracks to Rule 3 and retries higher-pecking-order items with different choices.

Rule 5: A quantified atom matches only if the atom itself matches some number of times that is allowed by the quantifier.

Rule 6: Each atom matches according to the designated semantics of its type. If the atom doesn't match, the Engine backtracks to Rule 5 and tries the next choice for the atom's quantity.
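A few of these rules can be seen in action with short matches. The strings and patterns below are illustrative assumptions, not taken from the original text.

```perl
use strict;
use warnings;

# Rule 1: the leftmost possible match wins, even though "foo" is listed
# first among the alternatives.
my ($leftmost) = "bar foo" =~ /(foo|bar)/;

# Rule 2: alternatives are tried left to right, so "foo" is taken before
# "foobar" gets a chance...
my ($first_alt) = "foobar" =~ /(foo|foobar)/;

# ...unless the first alternative prevents the rest of the pattern from
# completing, in which case the Engine moves on to the next one.
my ($whole_alt) = "foobar" =~ /^(foo|foobar)$/;

# Rules 5 and 6: \d+ starts with all five digits, then tries smaller
# quantities until the trailing \d\d can match.
my ($head, $tail) = "12345" =~ /^(\d+)(\d\d)$/;

print "leftmost=$leftmost first_alt=$first_alt whole_alt=$whole_alt\n";
print "head=$head tail=$tail\n";
```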
