Understanding Regular Expression Syntax

Source: Internet. Posted by: picturebox 网络图片. Editor: 程序博客网. Time: 2024/05/25 01:34

http://liujie198578.blog.163.com/blog/static/349894282013102710047646

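. matches any single character

* matches zero or more occurrences of the preceding character

.* matches any number of characters (zero or more)

By comparison, in the shell a lone * already stands for any number of characters, which is what .* means in a regular expression.

\ is the backslash, the escape character; it turns a metacharacter into an ordinary character.

\.sh matches text that contains the literal string .sh

\\ matches text that contains a literal backslash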
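
The difference between a shell wildcard and a regular expression can be tried out with a small sketch. It assumes a POSIX shell and grep; the file names backup.sh and notes.txt are made up for illustration.

```shell
# Shell glob: * on its own stands for any run of characters.
glob_result=no
case "backup.sh" in
    *.sh) glob_result=yes ;;
esac

# Regular expression: the same idea needs .* and an escaped \. for the literal dot.
regex_result=$(printf '%s\n' backup.sh notes.txt | grep '^.*\.sh$')

echo "$glob_result $regex_result"
```

Without the backslash, the dot in .sh would match any character, so a name such as backupXsh would also be accepted.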

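In sed, \ can also turn ordinary characters into metacharacters:

 \( \) \{ \} \n

In \n, n stands for a digit from 1 to 9; it recalls the text saved by the n-th \( \) pair.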

[root@keep ~]# cat sample
abcd
[root@keep ~]# grep abcd. sample
[root@keep ~]# grep abcd sample
abcd

grep abcd. returns nothing, which shows that . does not match the newline character.

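[Ww]hat matches what or What

It also matches Whatever and somewhat

\.H[12345] matches .H1, .H2, .H3, .H4, or .H5

.[!?;:,".]␣␣. matches any character followed by an exclamation point, question mark, semicolon, colon, comma, quotation mark, or period, and then two spaces (written ␣␣ here) and any character. Note that there are three periods: the first and last are the wildcard metacharacter, while the middle one is inside [] and is therefore an ordinary character.

[A-Z] matches any uppercase letter

[0-9] matches any digit

[cC]hapter [1-9] matches Chapter or chapter followed by a space and then a digit from 1 to 9

[0-9a-z?,.;:'"] matches any single character that is a digit, a lowercase letter, a question mark, comma, period, semicolon, colon, or quotation mark (single or double)

[a-zA-Z][.?!] matches any English letter followed by a period, question mark, or exclamation point.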

MM-DD-YY

MM/DD/YY

[0-1][0-9][-/][0-3][0-9][-/][0-9][0-9]
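
As a rough check of the date pattern, the sketch below runs it with grep over three sample lines. The sample lines are invented for illustration; only the pattern itself comes from the text.

```shell
# The pattern from the text: two-digit month, day and year, separated by - or /.
date_re='[0-1][0-9][-/][0-3][0-9][-/][0-9][0-9]'

# Hypothetical input; only the first two lines contain a date in that shape.
matches=$(printf '%s\n' 'due 05-25-24' 'sent 12/31/99' 'id 5-25-2024' | grep "$date_re")

echo "$matches"
```

The pattern only checks the shape of the text, so an impossible date such as 19-39-00 would also be accepted, as would mixed separators.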

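[^0-9] matches any character that is not a digit: every uppercase and lowercase letter and every special character

[^aeiou] matches any character that is not a lowercase vowel, including every consonant, the uppercase vowels, and special characters

\.DS "[^1]" matches .DS followed by a space, a quotation mark, one character other than 1, and a closing quotation mark

.DS "1" is the one case it will not match; lines such as .DS "L" or .DS "2" do match the pattern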
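
A minimal sketch of the negated class in action, assuming grep and three made-up input lines:

```shell
# Three hypothetical lines; only the one whose argument is 1 should be skipped.
not_one=$(printf '%s\n' '.DS "1"' '.DS "2"' '.DS "L"' | grep '\.DS "[^1]"' | tr '\n' ' ')

echo "$not_one"
```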

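␣"*hypertext"*␣ (␣ stands for a space) matches the word with or without quotation marks around it: because "* allows zero quotation marks, hypertext is matched even when it is not quoted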

100

500

1000

5000

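[15]0* matches all of the numbers above

[15]00* also matches all of the numbers above, but because it requires at least one 0 it would not match a bare 1 or 5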
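
The two patterns can be compared side by side with a short sketch. It assumes grep with the -x option (match the whole line); the bare 1 and 5 are added to the input only to make the difference visible.

```shell
# -x makes grep match the whole line, so each number must fit the pattern exactly.
numbers=$(printf '%s\n' 1 5 100 500 1000 5000)

zero_or_more=$(printf '%s\n' "$numbers" | grep -x '[15]0*' | tr '\n' ' ')
one_or_more=$(printf '%s\n' "$numbers" | grep -x '[15]00*' | tr '\n' ' ')

echo "[15]0*  : $zero_or_more"
echo "[15]00* : $one_or_more"
```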

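␣␣* (␣ is a space) matches one or more spaces, not zero or more

".*" matches any number of characters between two quotation marks

grep '<.*>' sample finds every line in the file sample that contains a <> tag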

[root@keep ~]# cat test1
i can do it
i cannot do it
i can not do it
i can't do it
i cant do it
[root@keep ~]# grep "can[ no' ]*t" test1
i cannot do it
i can not do it
i can't do it
i cant do it

Only i can do it fails to match. Between can and t, the characters n, o, apostrophe, and space may appear any number of times in any combination.

[root@keep ~]# grep can.*t test1
i can do it
i cannot do it
i can not do it
i can't do it
i cant do it

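can.*t matches every line

? matches zero or one occurrence of the preceding character (extended regular expressions, as in egrep and awk)

80[234]?86 matches 80286, 80386, 80486, or 8086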
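
A quick way to try the optional character, sketched with grep -E (the standard spelling of egrep); the 80586 entry is a made-up non-matching case.

```shell
# ? belongs to extended regular expressions, hence grep -E; -x matches whole lines.
chips=$(printf '%s\n' 8086 80286 80386 80486 80586 | grep -E -x '80[234]?86' | tr '\n' ' ')

echo "$chips"
```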

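␣book.*␣ matches book followed by any number of characters up to a space; because .* also matches spaces, the match can run past the end of the word

␣book.?␣ matches book followed by at most one character, as in book or books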

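^ followed by a character anchors that character to the start of the line

A character followed by $ anchors that character to the end of the line

^ followed by a tab matches lines that begin with a tab

␣␣*$ matches lines that end in one or more spaces

^\...␣ matches lines that begin with a period followed by any two characters and a space

^␣*$ matches blank lines, including lines that contain only spaces

^.*$ matches every line in its entirety

^$ matches empty lines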
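
The last three anchored patterns can be told apart by counting matches, as in this sketch. It assumes grep -c (count matching lines) and a four-line sample invented for the purpose.

```shell
# Four hypothetical lines: text, an empty line, a line of two spaces, text.
sample=$(printf 'first\n\n  \nlast')

empty=$(printf '%s\n' "$sample" | grep -c '^$')     # truly empty lines
blank=$(printf '%s\n' "$sample" | grep -c '^ *$')   # empty or spaces only
all=$(printf '%s\n' "$sample" | grep -c '^.*$')     # every line

echo "$empty $blank $all"
```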

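In sed and grep, ^ and $ are special only when they appear at the beginning and the end of the regular expression, respectively; anywhere else they are treated as ordinary characters

a^b$c, for example, is five ordinary literal characters

In awk, ^ and $ are special anywhere in a regular expression, so escape them when a literal is needed.

1001, 10001, and 100001 can all be matched by 10\{2,4\}1

[0-9]\{3\}-[0-9]\{2\}-[0-9]\{4\} matches three digits, a hyphen, two digits, another hyphen, and finally four digits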
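
Both interval expressions can be exercised with grep, as sketched below. The input values other than 1001, 10001, and 100001 are made up to show what falls outside the range.

```shell
# \{2,4\} repeats the preceding 0 between two and four times (basic-regex syntax).
hits=$(printf '%s\n' 101 1001 10001 100001 1000001 | grep -x '10\{2,4\}1' | tr '\n' ' ')

# \{3\}, \{2\} and \{4\} each demand an exact count of digits.
ssn=$(printf '%s\n' '123-45-6789' '12-345-6789' | grep -x '[0-9]\{3\}-[0-9]\{2\}-[0-9]\{4\}')

echo "$hits"
echo "$ssn"
```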

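| forms a union of regular expressions (alternation)

UNIX|LINUX|NETBSD matches UNIX, LINUX, or NETBSD

compan(y|ies) matches company or companies

() groups regular expressions and sets precedence

BigOne( Computer)? matches BigOne or BigOne Computer

egrep "Lab(oratorie)?s" mail.list matches Labs or Laboratories, as in these lines: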

Bell Laboratories, Lucent Technologies

Bell Labs
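
Alternation and grouping can be tried without a mail.list file by piping sample text into grep -E, as in this sketch. The words companion and Bell Laboratory are invented non-matching cases.

```shell
# (y|ies) offers two endings; -x requires the whole line to match.
words=$(printf '%s\n' company companies companion | grep -E -x 'compan(y|ies)' | tr '\n' ' ')

# (oratorie)? makes the whole group optional; -c counts the matching lines.
labs=$(printf '%s\n' 'Bell Labs' 'Bell Laboratories, Lucent Technologies' 'Bell Laboratory' \
    | grep -E -c 'Lab(oratorie)?s')

echo "$words"
echo "$labs"
```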

[zhangsan@keep ftp]$ cat bookwords
This file tests for book in various places, such as
book at the beginning of a line or
at the end of a line book
as well as the plural books and
handbooks. Here are some
phrases that use the word in different ways:
"book of the year award"
to look for a line with the word "book"
A GREAT book!
A great book? NO
told them about (the books) until it
Here are the books that you requested
Yes, it is a good book for children
amzing that it was called a "harmful book" when
once you get to the end of the book, you can't believe
A well-written regular expression should
avoid matching unrelated words,
such as booky(is that a word?)
and bookish and
bookworm and so on.
book! and so on.

[zhangsan@keep ftp]$ grep " [\"[{(]*book[]\"\!?.,;:' s]* " bookwords
This file tests for book in various places, such as
as well as the plural books and
A great book? NO
Here are the books that you requested
Yes, it is a good book for children
amzing that it was called a "harmful book" when
once you get to the end of the book, you can't believe
book! and so on.

[zhangsan@keep ftp]$ egrep "(^| )[\"[{(]*book[]\"\!?.,;:' s]*( |$)" bookwords
This file tests for book in various places, such as
book at the beginning of a line or
at the end of a line book
as well as the plural books and
"book of the year award"
to look for a line with the word "book"
A GREAT book!
A great book? NO
Here are the books that you requested
Yes, it is a good book for children
amzing that it was called a "harmful book" when
once you get to the end of the book, you can't believe
book! and so on.

The gres script is written as follows; the sample run below uses it to replace a single character.

[root@keep ftp]# cat gres

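# gres: replace the first match of a pattern on each line of a file
if [ $# -lt 3 ]; then
    echo "Usage: gres pattern replacement file" >&2
    exit 1
fi
pattern=$1
replacement=$2
if [ -f "$3" ]; then
    file=$3
else
    # report the problem on standard error, not into a file named after $2
    echo "$3 is not a file." >&2
    exit 1
fi
#A="'echo | tr '\0142' '\001' '"#
sed -e "s/$pattern/$replacement/" "$file"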

[root@keep ftp]# cat test
Apple
Age
[root@keep ftp]# ./gres A a test
apple
age

[root@keep ftp]# ./gres.sh '"[^"]*"' '00' smpleLine
.Se 00 "Full Program Listings"
[root@keep ftp]# ./gres.sh '".*"' '00' smpleLine
.Se 00
[root@keep ftp]# cat smpleLine
.Se "Appendix" "Full Program Listings"

".*" matches through the last quotation mark on the line: the longest match.

"[^"]*" stops at the second quotation mark: the shortest match.

[root@keep ftp]# sed 's/\([0-9][0-9]*\)\.\{5,\}\([0-9][0-9]*\)/\1-\2/' sample
1-5
5-10
10-20
100-200
[root@keep ftp]# cat sample
1.......5
5.......10
10......20
100.....200
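
The two sed behaviours shown above, longest match versus a negated class, and saving text with \( \) for recall as \1 and \2, can be reproduced without any files. This is a sketch that feeds the same sample text to sed through a pipe.

```shell
line='.Se "Appendix" "Full Program Listings"'

# ".*" runs to the last quotation mark; "[^"]*" stops at the next one.
greedy=$(printf '%s\n' "$line" | sed 's/".*"/00/')
first_only=$(printf '%s\n' "$line" | sed 's/"[^"]*"/00/')

# \( \) saves each number; \1 and \2 recall them around a single hyphen.
converted=$(printf '%s\n' '1.......5' '100.....200' \
    | sed 's/\([0-9][0-9]*\)\.\{5,\}\([0-9][0-9]*\)/\1-\2/')

echo "$greedy"
echo "$first_only"
echo "$converted"
```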

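Useful regular expressions (␣ stands for a single space)

Item                                     Regular expression
State postal abbreviation                ␣[A-Z][A-Z]␣
City, state                              ^.*,␣[A-Z][A-Z]
City, state, ZIP code (extended syntax)  ^.*,␣[A-Z][A-Z]␣[0-9]{5}(-[0-9]{4})?
Month, day, year                         [A-Z][a-z]\{3,9\}␣[0-9]\{1,2\},␣[0-9]\{4\}
U.S. Social Security number              [0-9]\{3\}-[0-9]\{2\}-[0-9]\{4\}
North American local telephone number    [0-9]\{3\}-[0-9]\{4\}
Formatted dollar amount                  \$[␣0-9]*\.[0-9][0-9]
troff in-line font request               \\f[(BIRP]C*[BW]*
troff request                            ^\.[a-z]\{2\}
troff macro                              ^\.[A-Z12].
troff macro with arguments               ^\.[A-Z12].␣".*"
HTML in-line code                        <[^>]*>
Ventura Publisher style codes            ^@.*␣=␣.*
Blank line                               ^$
Entire line                              ^.*$
One or more spaces                       ␣␣*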
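
Two of these entries in use, as a sketch with sed and grep; the HTML fragment and the phone numbers are made-up sample data.

```shell
# <[^>]*> matches one tag at a time, so deleting every match leaves only the text.
text=$(printf '%s\n' '<p>Hello <b>world</b></p>' | sed 's/<[^>]*>//g')

# The local telephone pattern needs three digits, a hyphen, then four digits.
phones=$(printf '%s\n' '555-1234' '55-1234' | grep -c '[0-9]\{3\}-[0-9]\{4\}')

echo "$text"
echo "$phones"
```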
