SAS--Perl Regular Expressi…

Source: Internet. Editor: 程序博客网. Date: 2024/06/05 02:13
Original post: SAS--Perl Regular Expressions. Author: SAS_Miner

Regular expression basics

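A regular expression consists of ordinary characters and metacharacters. Ordinary characters include uppercase and lowercase letters and digits, while metacharacters have special meanings (see the SAS Help for details).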

A regular expression is a formula that uses a pattern to match a whole class of strings.

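Many people shy away from regular expressions because they look odd and complicated, but they are actually quite easy to write. Once you understand them, text-processing jobs that would take hours of tedious, error-prone work can be done in minutes, or even seconds.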
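To make the distinction between ordinary characters and metacharacters concrete, here is a minimal sketch. The data set name PHONE_CHECK, the variable names, and the sample values are invented for illustration. The pattern describes a whole class of strings (three digits, a hyphen, four digits) rather than one fixed string:

```sas
/* PHONE_CHECK and the sample values are hypothetical */
data phone_check;
   length phone $ 12;
   input phone $;
   /* "-" is an ordinary character and matches itself; \d is a metacharacter
      (any digit); ^ and $ anchor the start and end; {3} means exactly 3.
      TRIM removes the trailing blanks SAS pads the variable with,
      otherwise $ would never match. */
   valid = (prxmatch('/^\d{3}-\d{4}$/', trim(phone)) > 0);
   datalines;
555-1234
5551234
555-12345
;
run;

proc print data=phone_check;
run;
```

Only the first value matches: the second lacks the hyphen, and the third has five digits after it.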

 

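1. PRXMATCH(regular-expression-id | perl-regular-expression, source)

PRXMATCH searches source for the pattern and returns the position at which the first match begins, or 0 if there is no match.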

data _null_;
   /* "world" starts at the 7th character, so position=7 */
   position=prxmatch('/world/', 'Hello world!');
   put position=;
run;
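PRXMATCH also accepts a regular-expression-id produced by PRXPARSE instead of a literal pattern, which is the first form in the syntax above. A minimal sketch, assuming a made-up data set name PRXMATCH_DEMO. The trailing i modifier makes the match case-insensitive, and a failed search returns 0:

```sas
/* PRXMATCH_DEMO is a hypothetical data set name */
data prxmatch_demo;
   /* compile the pattern once; PRXPARSE returns a regular-expression-id */
   re = prxparse('/world/i');            /* i = ignore case */
   p1 = prxmatch(re, 'Hello WORLD!');    /* 7: start of the first match */
   p2 = prxmatch(re, 'Hello there!');    /* 0: no match */
   put p1= p2=;
   call prxfree(re);                     /* release the compiled pattern */
   drop re;
run;
```

Parsing once with PRXPARSE is mainly useful when the same pattern is applied to many observations.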

 

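2. PRXCHANGE(perl-regular-expression | regular-expression-id, times, source)

PRXCHANGE performs a pattern-matching replacement: times is the number of matches to replace, and -1 replaces every match.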

data _null_;
   x="fejiwof'wefji'f''fe";
   /* times=-1: replace every single quote with M */
   y=prxchange("s/'/M/",-1,x);
   put y=;   /* y=fejiwofMwefjiMfMMfe */
run;
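The times argument controls how many matches are replaced. This sketch (the data set and variable names are hypothetical) contrasts 1 with -1 on the same string:

```sas
/* PRXCHANGE_TIMES, Y_ONCE and Y_ALL are hypothetical names */
data prxchange_times;
   length y_once y_all $ 30;
   x = "fejiwof'wefji'f''fe";
   y_once = prxchange("s/'/M/", 1, x);    /* only the first quote */
   y_all  = prxchange("s/'/M/", -1, x);   /* every quote */
   put y_once= / y_all=;
run;
```

Y_ONCE becomes fejiwofMwefji'f''fe, while Y_ALL becomes fejiwofMwefjiMfMMfe.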

 

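3. Insert an asterisk wherever a digit and a letter are adjacent. The first alternative, (\d)([a-z]), matches a digit followed by a lowercase letter; the second, ([a-z])(\d), matches the reverse. Only one alternative matches at a time, so the unused capture groups are empty and $1$3*$2$4 writes the two characters back with an asterisk between them.

data _null_;
   length y $ 60;   /* the result is longer than TEXT */
   text='aaaa111 bbb222ccc333 444dd55';
   y=prxchange('s/(\d)([a-z])|([a-z])(\d)/$1$3*$2$4/',-1,text);
   put y;
run;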

Result: aaaa*111 bbb*222*ccc*333 444*dd*55
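The [a-z] classes above only match lowercase letters. A sketch, with invented sample text and a hypothetical data set name, comparing the same pattern with and without the i (ignore case) modifier:

```sas
/* MIXED_CASE and the sample text are hypothetical */
data mixed_case;
   length y_case y_nocase $ 40;
   text = 'AB12cd 7Xy';
   /* case-sensitive: [a-z] does not match A, B or X */
   y_case   = prxchange('s/(\d)([a-z])|([a-z])(\d)/$1$3*$2$4/', -1, text);
   /* with i, [a-z] also matches uppercase letters */
   y_nocase = prxchange('s/(\d)([a-z])|([a-z])(\d)/$1$3*$2$4/i', -1, text);
   put y_case= / y_nocase=;
run;
```

Without i the result is AB12*cd 7Xy; with i it is AB*12*cd 7*Xy.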

 

4. Remove the space in the ADD field that separates a single letter from a string of one or more digits:

 

Example: c 32 -> c32

add=prxchange("s/(\b[A-Za-z])\s(\d+\b)/$1$2/",-1,add);
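A minimal sketch that applies the pattern above to a few sample addresses (the data set name ADDR_FIX and the values are made up). Because of the \b before [A-Za-z], only a letter that stands alone is joined to the following digits; in "abc 32" the c is part of a longer word, so that value is left unchanged:

```sas
/* ADDR_FIX and the sample addresses are hypothetical */
data addr_fix;
   length add $ 40;
   input add $ 1-40;   /* column input keeps the embedded blanks */
   add = prxchange("s/(\b[A-Za-z])\s(\d+\b)/$1$2/", -1, add);
   datalines;
Unit c 32 Main St
Block B 7
abc 32
;
run;
```

The three values become Unit c32 Main St, Block B7, and abc 32.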

 

Insert a space between digits and letters:

bbb222ccc333 -> bbb 222 ccc 333

 

addr=prxchange('s/(\d)([A-Za-z])|([A-Za-z])(\d)/$1$3 $2$4/',-1,add);
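The statement above assumes ADD and ADDR already exist. A self-contained version (the data set name ADD_SPACES is hypothetical) that declares a length for ADDR, since inserting spaces makes the result longer than the input:

```sas
/* ADD_SPACES is a hypothetical data set name */
data add_spaces;
   length addr $ 40;   /* room for the extra blanks */
   add = 'bbb222ccc333';
   /* the blank between $1$3 and $2$4 is the inserted space */
   addr = prxchange('s/(\d)([A-Za-z])|([A-Za-z])(\d)/$1$3 $2$4/', -1, add);
   put addr=;
run;
```

ADDR becomes bbb 222 ccc 333.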

 

 

Metacharacter reference (from the SAS Help)

[a-z]

specifies a range of characters that matches any character in the range:

  • "[a-z]" matches any lowercase alphabetic character in the range "a" through "z"

 

[^a-z]

specifies a range of characters that does not match any character in the range:

  • "[^a-z]" matches any character that is not in the range "a" through "z"

\b

matches a word boundary (the position between a word and a space):

  • "er\b" matches the "er" in "never"
  • "er\b" does not match the "er" in "verb"

\B

matches a non-word boundary:

  • "er\B" matches the "er" in "verb"
  • "er\B" does not match the "er" in "never"

\d

matches a digit character and is equivalent to [0-9].

\D

matches a non-digit character and is equivalent to [^0-9].

\s

matches any white space character, including space, tab, form feed, and so on, and is equivalent to [ \f\n\r\t\v].

\S

matches any character that is not a white space character and is equivalent to [^ \f\n\r\t\v].

\t

matches a tab character and is equivalent to \x09.

\w

matches any word character, including the underscore, and is equivalent to [A-Za-z0-9_].

\W

matches any non-word character and is equivalent to [^A-Za-z0-9_].
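A short sketch exercising several of the metacharacters listed above. The \b and \B cases reuse the "never"/"verb" examples from the list; the data set name META_DEMO and the other sample strings are illustrative assumptions:

```sas
/* META_DEMO and the sample strings are hypothetical */
data meta_demo;
   length squeezed $ 20;
   /* \b vs \B (SAS names are case-insensitive, hence b_ and nb_ prefixes) */
   b_never  = prxmatch('/er\b/', 'never');   /* 4: "er" ends the word */
   b_verb   = prxmatch('/er\b/', 'verb');    /* 0 */
   nb_verb  = prxmatch('/er\B/', 'verb');    /* 2: "er" inside the word */
   nb_never = prxmatch('/er\B/', 'never');   /* 0 */
   /* \w+ \s+ \d+: a word, then white space, then digits */
   pos = prxmatch('/\w+\s+\d+/', 'Order:  Item  42');   /* 9: at "Item" */
   /* \s+ collapses each run of white space into a single blank */
   squeezed = prxchange('s/\s+/ /', -1, 'a   b    c');  /* a b c */
   put b_never= b_verb= nb_verb= nb_never= pos= squeezed=;
run;
```

"Order" is not reported because it is followed by a colon, which \s does not match.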

 

 
