Understanding Regular Expression Syntax


  http://liujie198578.blog.163.com/blog/static/349894282013102710047646


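.   matches any single character

*   matches zero or more occurrences of the preceding character

.*   matches any number of characters (zero or more)

By comparison, in the shell a bare * matches any number of characters, which is what .* means in a regular expression.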
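The difference between a shell pattern and a regular expression can be checked with a small sketch. The file name and variable names here are made up for illustration.

```shell
# Compare a shell glob with the equivalent regular expression.
name="report.txt"

# Shell pattern: * alone stands for any run of characters.
case "$name" in
  rep*) glob_result="match" ;;
  *)    glob_result="no match" ;;
esac

# Regular expression: the same idea is written .* (anchored with ^ here).
if printf '%s\n' "$name" | grep -q '^rep.*'; then
  regex_result="match"
else
  regex_result="no match"
fi

# In a regex, rep* means "re" followed by zero or more "p",
# so the two-letter string "re" matches it.
star_only=$(printf '%s\n' "re" | grep -c '^rep*$')

echo "$glob_result / $regex_result / $star_only"
```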


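\ The backslash is the escape character; it turns a metacharacter into an ordinary character.

\.sh matches text containing a literal .sh

\\ matches text containing a literal \

In sed, \ can also turn an ordinary character into a metacharacter:

\( \) \{ \} \n

Here \n stands for a digit from 1 to 9, referring back to the text matched by the n-th \( \) group.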
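As a minimal sketch of \( \) and \n, the following uses made-up input strings; the variable names are illustrative only.

```shell
# \( \) saves the matched text; \1 and \2 recall it in the replacement.
swapped=$(printf '%s\n' "first:second" | sed 's/\(.*\):\(.*\)/\2:\1/')

# A back-reference can also appear in the pattern itself:
# \(ab\)\1 matches "ab" immediately followed by the same text again.
doubled=$(printf '%s\n' "abab" "abcd" | grep '\(ab\)\1')

echo "$swapped"
echo "$doubled"
```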

 

[root@keep ~]# cat sample
                                                                            abcd
[root@keep ~]#  grep abcd. sample
[root@keep ~]#  grep abcd sample
                                                                            abcd

abcd. produces no output: grep matches one line at a time, and . does not match the newline that ends the line.

 

[Ww]hat matches what or What

It also matches inside longer words such as Whatever and somewhat
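A quick check of this character class, using an invented word list (the variable name is illustrative):

```shell
# [Ww] accepts either case for the first letter only,
# and the pattern is free to match inside a longer word.
matches=$(printf '%s\n' "what" "Whatever" "somewhat" "WHAT" | grep '[Ww]hat')
echo "$matches"
```

WHAT is not printed, because only the first letter may vary in case.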

 

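\.H[12345] matches .H1, .H2, .H3, .H4, or .H5

.[!?;:,".]  . matches any character followed by an exclamation mark, question mark, semicolon, colon, comma, quotation mark, or period, then two spaces and any character. Note that there are three periods: the first and last are wildcard metacharacters, while the middle one is inside [] and is therefore an ordinary character.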


 

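[A-Z] matches any uppercase letter

[0-9] matches any digit

[cC]hapter [1-9] matches Chapter or chapter followed by a space and then a digit from 1 to 9

[0-9a-z?,.;:'"] matches any single character that is a digit, a lowercase letter, a question mark, comma, period, semicolon, colon, or a single or double quotation mark

[a-zA-Z][.?!] matches any letter followed by a period, question mark, or exclamation mark.

Dates written as MM-DD-YY or MM/DD/YY can be matched with:

[0-1][0-9][-/][0-3][0-9][-/][0-9][0-9]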
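The date pattern can be tried against a few invented strings; the variable names below are only for illustration. The sketch also shows that the pattern checks the shape of a date, not its validity.

```shell
# Inside [-/] the hyphen comes first, so it is a literal character, not a range.
date_re='[0-1][0-9][-/][0-3][0-9][-/][0-9][0-9]'

dates=$(printf '%s\n' "05-25-24" "12/31/99" "5-25-24" "19/39/00" | grep "$date_re")
echo "$dates"
```

5-25-24 is rejected because the month needs two digits, while 19/39/00 is accepted even though it is not a real date.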

 

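[^0-9] matches any character that is not a digit, which covers all uppercase and lowercase letters and all special characters

[^aeiou] matches any character that is not a lowercase vowel, including any consonant, any uppercase vowel, and special characters

\.DS "[^1]" matches .DS followed by a space, a quotation mark, a character other than 1, and a quotation mark

.DS "1" is the one case that does not match; lines such as .DS "L" or .DS "2" do.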
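A small sketch of the negated class, fed with the three sample lines mentioned above (the variable name is illustrative):

```shell
# [^1] accepts any single character except 1 between the quotation marks.
kept=$(printf '%s\n' '.DS "1"' '.DS "L"' '.DS "2"' | grep '\.DS "[^1]"')
echo "$kept"
```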

 

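␣"*hypertext"*␣ (where ␣ stands for a single space) matches the word hypertext between spaces whether or not it is enclosed in quotation marks: "* allows zero or more quotation marks on each side.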

Given the numbers

1
5
10
50
100
500
1000
5000

[15]0* matches all of them

[15]00* matches all of them except 1 and 5
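Counting matching lines is one way to confirm the two patterns; this is a sketch with illustrative variable names.

```shell
numbers=$(printf '%s\n' 1 5 10 50 100 500 1000 5000)

# [15]0*  : a 1 or 5 followed by zero or more zeros, so every number matches.
all=$(printf '%s\n' "$numbers" | grep -c '[15]0*')

# [15]00* : at least one zero is required, so the bare 1 and 5 drop out.
with_zero=$(printf '%s\n' "$numbers" | grep -c '[15]00*')

echo "$all $with_zero"
```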

 

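␣␣* matches one or more spaces (not zero or more)

".*" matches any number of characters between two quotation marks

grep '<.*>' sample finds every line in the file sample that contains a <> tag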

 

[root@keep ~]# cat test1
i can do it
i cannot do it
i can not do it
i can't do it
i cant do it
[root@keep ~]# grep "can[ no' ]*t" test1
i cannot do it
i can not do it
i can't do it
i cant do it

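Only "i can do it" fails to match. Between can and t, the characters n, o, apostrophe, and space may appear in any combination, any number of times.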

[root@keep ~]# grep can.*t test1
i can do it
i cannot do it
i can not do it
i can't do it
i cant do it

can.*t matches every line.

 

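? matches zero or one occurrence of the preceding character (an extended metacharacter, available in egrep and awk)

80[234]?86 matches 80286, 80386, 80486, or 8086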
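Because ? belongs to the extended syntax, this sketch uses grep -E; the input list, including the non-matching 80586, is invented for illustration.

```shell
# [234]? makes the middle digit optional but restricts it to 2, 3, or 4.
chips=$(printf '%s\n' 8086 80286 80386 80486 80586 | grep -E '80[234]?86')
echo "$chips"
```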

 

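␣book.*␣ matches a space, book, any number of characters, and a space

␣book.?␣ matches a space, book, at most one more character, and a space (for example book or books)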


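^ as the first character of a pattern anchors the match to the beginning of a line

$ as the last character of a pattern anchors the match to the end of a line

^ followed by a tab matches lines that begin with a tab

␣␣*$ matches lines that end with one or more spaces

^\...␣ matches lines that begin with a period followed by any two characters and a space

^␣*$ matches blank lines, including lines that contain only spaces

^.*$ matches every line in its entirety

^$ matches empty lines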
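The anchored patterns can be compared by counting lines in a tiny made-up input; the helper function sample_lines and the variable names are hypothetical.

```shell
# Four lines: "first", an empty line, a line of three spaces, "last" plus three spaces.
sample_lines() { printf 'first\n\n   \nlast   \n'; }

empty=$(sample_lines | grep -c '^$')       # truly empty lines only
blank=$(sample_lines | grep -c '^ *$')     # empty lines and lines of spaces
trailing=$(sample_lines | grep -c '  *$')  # lines ending in one or more spaces

echo "$empty $blank $trailing"
```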

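In sed and grep, ^ and $ are special only when they appear at the beginning and the end of the regular expression, respectively; anywhere else they are ordinary characters.

a^b$c is therefore five literal characters.

In awk, ^ and $ are special at any position in a regular expression, so escape them when you mean the literal characters.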
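A short sketch of the difference, assuming a grep that follows POSIX basic regular expression rules; the variable names are illustrative. Single quotes keep the shell from expanding $c.

```shell
# grep: ^ and $ in the middle of a basic regular expression are literal.
grep_literal=$(printf '%s\n' 'a^b$c' | grep -c 'a^b$c')

# awk: escape ^ and $ to match the literal characters.
awk_escaped=$(printf '%s\n' 'a^b$c' | awk '/a\^b\$c/ { n++ } END { print n + 0 }')

echo "$grep_literal $awk_escaped"
```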


1001, 10001, and 100001 can be matched with 10\{2,4\}1

[0-9]\{3\}-[0-9]\{2\}-[0-9]\{4\} matches three digits, a hyphen, two digits, another hyphen, and four digits
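Both interval expressions can be exercised with grep. The inputs are invented, and the ^ and $ anchors in the first pattern are an addition so that longer runs of zeros are rejected.

```shell
# \{2,4\} allows two to four zeros between the two 1s.
zeros=$(printf '%s\n' 101 1001 10001 100001 1000001 | grep '^10\{2,4\}1$')

# \{n\} demands exactly n repetitions: 3 digits, 2 digits, 4 digits.
ssn=$(printf '%s\n' "123-45-6789" "12-345-6789" | grep '[0-9]\{3\}-[0-9]\{2\}-[0-9]\{4\}')

echo "$zeros"
echo "$ssn"
```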

 

| alternation: matches any one of the regular expressions it joins (extended syntax)

UNIX|LINUX|NETBSD matches UNIX, LINUX, or NETBSD

compan(y|ies) matches company or companies
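Alternation needs the extended syntax, so this sketch uses grep -E on invented word lists (variable names are illustrative). It also shows that matching is case-sensitive.

```shell
# Each alternative is tried as a whole; lowercase "linux" is not among them.
systems=$(printf '%s\n' "UNIX" "linux" "NETBSD" | grep -E 'UNIX|LINUX|NETBSD')

# Parentheses limit the alternation to the word ending.
words=$(printf '%s\n' "company" "companies" "companion" | grep -E 'compan(y|ies)')

echo "$systems"
echo "$words"
```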

 

() groups part of a regular expression and sets precedence

BigOne( Computer)? matches BigOne or BigOne Computer

egrep "Lab(oratorie)?s" mail.list matches Labs or Laboratories, so it finds lines such as:

Bell Laboratories, Lucent Technologies

Bell Labs

 

[zhangsan@keep ftp]$ cat bookwords
This file tests for book in various places, such as
book at the beginning of a line or
at the end of a line book
as well as the plural books and
handbooks. Here are some
phrases that use the word in different ways:
"book of the year award"
to look for a line with the word "book"
A GREAT book!
A great book? NO
told them about (the books) until it
Here are the books that you requested
Yes, it is a good book for children
amzing that it was called a "harmful book" when
once you get to the end of the book, you can't believe
A well-written regular expression should
avoid matching unrelated words,
such as booky(is that a word?)
and bookish and
bookworm and so on.
 book! and so on.

[zhangsan@keep ftp]$ grep " [\"[{(]*book[]\"\!?.,;:' s]* " bookwords
This file tests for book in various places, such as
as well as the plural books and
A great book? NO
Here are the books that you requested
Yes, it is a good book for children
amzing that it was called a "harmful book" when
once you get to the end of the book, you can't believe
 book! and so on.

 

[zhangsan@keep ftp]$ egrep "(^| )[\"[{(]*book[]\"\!?.,;:' s]*( |$)" bookwords
This file tests for book in various places, such as
book at the beginning of a line or
at the end of a line book
as well as the plural books and
"book of the year award"
to look for a line with the word "book"
A GREAT book!
A great book? NO
Here are the books that you requested
Yes, it is a good book for children
amzing that it was called a "harmful book" when
once you get to the end of the book, you can't believe
 book! and so on.

The gres script below replaces the first match of a pattern on each line of a file.

[root@keep ftp]# cat gres

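#!/bin/sh
# gres: replace the first match of a pattern on each line of a file
if [ $# -lt 3 ]; then
    echo "Usage: gres pattern replacement file" >&2
    exit 1
fi
pattern=$1
replacement=$2
if [ -f "$3" ]; then
    file=$3
else
    echo "$3 is not a file." >&2
    exit 1
fi
# "/" is the sed delimiter here, so pattern and replacement must not contain it.
sed -e "s/$pattern/$replacement/" "$file"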

 

[root@keep ftp]# cat test
Apple
Age
[root@keep ftp]# ./gres A a test
apple
age


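[root@keep ftp]# ./gres '"[^"]*"' '00' smpleLine
.Se 00 "Full Program Listings"
[root@keep ftp]# ./gres '".*"' '00' smpleLine
.Se 00
[root@keep ftp]# cat smpleLine
.Se "Appendix" "Full Program Listings"

".*" matches through the last quotation mark on the line: the longest possible match.

"[^"]*" stops at the second quotation mark, because [^"] cannot match a quotation mark; this gives the shortest quoted string.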

 

[root@keep ftp]# sed 's/\([0-9][0-9]*\)\.\{5,\}\([0-9][0-9]*\)/\1-\2/' sample
1-5
5-10
10-20
100-200
[root@keep ftp]# cat sample
1.......5
5.......10
10......20
100.....200

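Useful regular expressions (␣ denotes a single space)

Item                                     Regular expression
Postal abbreviation for a state          ␣[A-Z][A-Z]␣
City, state                              ^.*,␣[A-Z][A-Z]
City, state, ZIP code                    ^.*,␣[A-Z][A-Z]␣[0-9]{5}(-[0-9]{4})?
Month, day, year                         [A-Z][a-z]\{3,9\}␣[0-9]\{1,2\},␣[0-9]\{4\}
U.S. Social Security number              [0-9]\{3\}-[0-9]\{2\}-[0-9]\{4\}
North American local telephone number    [0-9]\{3\}-[0-9]\{4\}
Formatted dollar amount                  \$[␣0-9]*\.[0-9][0-9]
troff in-line font request               \\f[(BIRP]C*[BW]*
troff request                            ^\.[a-z]\{2\}
troff macro                              ^\.[A-Z12].
troff macro with arguments               ^\.[A-Z12].␣".*"
HTML in-line code                        <[^>]*>
Ventura Publisher style code             ^@.*␣=␣.*
Blank line                               ^$
Entire line                              ^.*$
One or more spaces                       ␣␣*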
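A few of these patterns can be tried directly with sed and grep. The input lines and variable names below are invented for illustration only.

```shell
# HTML in-line codes: delete every <...> tag from a line.
plain=$(printf '%s\n' '<p>Hello, <b>world</b>!</p>' | sed 's/<[^>]*>//g')

records() { printf '%s\n' 'id 123-45-6789' 'phone 555-1234'; }

# Social Security number: 3 digits, 2 digits, 4 digits.
ssn_lines=$(records | grep -c '[0-9]\{3\}-[0-9]\{2\}-[0-9]\{4\}')

# Local telephone number: 3 digits, 4 digits.
phone_lines=$(records | grep -c '[0-9]\{3\}-[0-9]\{4\}')

echo "$plain"
echo "$ssn_lines $phone_lines"
```

These patterns only describe the shape of the data, so in real input they may need anchors or surrounding context to avoid false matches.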
