Java Strings: Regular Expressions

Source: Internet  Editor: 程序博客网  Date: 2024/05/24 06:48

Regular Expressions


Basics

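In a Java string literal, "\\" inserts a single regular expression backslash, so the character that follows takes on its special meaning (for example, one digit is written "\\d").
A newline is an ordinary string escape and needs only one backslash: "\n".
One or more of the preceding expression: "+". A literal plus sign: "\\+".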
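
To make the escaping rules above concrete, here is a minimal sketch. The class name EscapeDemo, its two helper methods, and the sample inputs are invented for illustration and are not part of the original article.

```java
class EscapeDemo {
    // Matches "digit, literal plus sign, digit", for example "1+2".
    // "\\d" reaches the regex engine as \d, and "\\+" as \+ (a literal plus).
    static boolean isSimpleSum(String s) {
        return s.matches("\\d\\+\\d");
    }

    // True if the string contains a backslash anywhere.
    // A literal backslash in the regex is \\, which is "\\\\" in Java source.
    static boolean hasBackslash(String s) {
        return s.matches(".*\\\\.*");
    }

    public static void main(String[] args) {
        System.out.println(isSimpleSum("1+2"));        // true
        System.out.println(isSimpleSum("112"));        // false
        System.out.println(hasBackslash("C:\\temp"));  // true
        System.out.println(hasBackslash("C:/temp"));   // false
    }
}
```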

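The simplest way to use regular expressions is through the functionality built into the String class, which has the following regex-related methods:
String.matches(String regex): tests whether the whole string matches the regular expression
String.split(String regex): splits the string around matches of the regular expression
String.replaceFirst(String regex, String replacement): replaces only the first matching substring
String.replaceAll(String regex, String replacement): replaces every matching substring
Example: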

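import java.util.Arrays;

public class IntegerMath {
    public static void main(String[] args) {
        // One or more digits
        System.out.println("12345".matches("\\d+"));
        // Optional sign followed by digits
        System.out.println("12345".matches("(-|\\+)?\\d+"));
        // Mandatory minus sign followed by digits
        System.out.println("12345".matches("-\\d+"));
        // Split on every single digit
        System.out.println(Arrays.toString("A1B2C3D4E5F".split("\\d")));
        // Replace the first digit only, then every digit
        System.out.println("12345".replaceFirst("\\d", "A"));
        System.out.println("12345".replaceAll("\\d", "A"));
    }
}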

Output:

true
true
false
[A, B, C, D, E, F]
A2345
AAAAA

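Character classes

.             any character
[abc]         any of the characters a, b, or c (same as a|b|c)
[^abc]        any character except a, b, and c (negation)
[A-Za-z]      any letter from A to Z or from a to z (range)
[abc[hij]]    any of a, b, c, h, i, j (union)
[a-z&&[hij]]  any of h, i, j (intersection)
\s            a whitespace character
\S            a non-whitespace character
\d            a digit, [0-9]
\D            a non-digit, [^0-9]
\w            a word character, [a-zA-Z_0-9]
\W            a non-word character, [^\w]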
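
The following sketch tries a few of the character classes on one sample string. The class CharClassDemo, the helper keepMatching, and the input are assumptions made for this example; the helper simply concatenates every match it finds.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharClassDemo {
    // Returns the characters of input that the given single-character
    // regex matches, in their original order.
    static String keepMatching(String charClass, String input) {
        StringBuilder sb = new StringBuilder();
        Matcher m = Pattern.compile(charClass).matcher(input);
        while (m.find()) {
            sb.append(m.group());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = "a1 hZ_j!";
        System.out.println(keepMatching("[abc[hij]]", input));   // ahj     (union)
        System.out.println(keepMatching("[a-z&&[hij]]", input)); // hj      (intersection)
        System.out.println(keepMatching("\\d", input));          // 1
        System.out.println(keepMatching("\\w", input));          // a1hZ_j  (underscore counts)
        System.out.println(keepMatching("\\S", input));          // a1hZ_j!
    }
}
```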

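Quantifiers: describe how a pattern absorbs input text

Greedy: the default. The expression takes as much of the input as it can and gives characters back only when the rest of the pattern would otherwise fail: X*
Reluctant: specified with a question mark. Matches the minimum number of characters needed to satisfy the pattern: X*?
Possessive: specified with a plus sign. Never gives back what it has matched, which keeps the regular expression from running away through backtracking: X*+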
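
The difference between the three quantifier types is easiest to see on one input. This is a minimal sketch; the class QuantifierDemo, the helper firstMatch, and the sample string are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Returns the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "<a><b>";
        // Greedy: .+ grabs everything, then backs off one character for '>'.
        System.out.println(firstMatch("<.+>", input));   // <a><b>
        // Reluctant: .+? stops at the first '>' it can.
        System.out.println(firstMatch("<.+?>", input));  // <a>
        // Possessive: .++ grabs everything and never gives the final '>' back,
        // so the pattern cannot match at all.
        System.out.println(firstMatch("<.++>", input));  // null
    }
}
```

The possessive case fails here on purpose: it shows that a possessive quantifier trades matches that need backtracking for predictable running time.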


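Pattern and Matcher

The Pattern class lets you build more powerful regular expression objects:
Pattern p = Pattern.compile(String regex): compiles the regex into a Pattern object
Matcher m = p.matcher(CharSequence input): creates a Matcher object for the input
m.find(): finds successive matches, moving forward through the input like an iterator (returns a boolean)
Groups: a group is a part of the regular expression delimited by parentheses. Group 0 is the entire expression, group 1 is the first pair of parentheses, and so on.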

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Groups {
    static public final String POEM =
            "Twas brillig, and the slithy toves\n" +
            "Did gyre and gimble in the wabe.\n" +
            "All mimsy were the borogoves,\n";

    public static void main(String[] args) {
        // Capture the last three words of each line
        Matcher m =
                Pattern.compile("(?m)(\\S+)\\s+((\\S+)\\s+(\\S+))$")
                        .matcher(POEM);
        while (m.find()) {
            // Group 0 is the whole match; groups 1..groupCount() are the parenthesized parts
            for (int j = 0; j <= m.groupCount(); j++)
                System.out.print("[" + m.group(j) + "]");
            System.out.println();
        }
    }
}

Output:

[the slithy toves][the][slithy toves][slithy][toves]
[in the wabe.][in][the wabe.][the][wabe.]
[were the borogoves,][were][the borogoves,][the][borogoves,]

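Normally $ matches only at the end of the entire input sequence, so we must explicitly tell the regular expression to pay attention to the line terminators inside the input. The pattern flag "(?m)" at the start of the expression does this.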
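
A short sketch of the effect of "(?m)" on $. The class MultilineDemo, the helper findAll, and the three-line input are assumptions introduced for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultilineDemo {
    // Collects every match of regex in input.
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "one two\nthree four\nfive six";
        // Without (?m), $ matches only at the end of the whole input.
        System.out.println(findAll("\\S+$", input));      // [six]
        // With (?m), $ also matches just before each line terminator.
        System.out.println(findAll("(?m)\\S+$", input));  // [two, four, six]
    }
}
```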


start() and end()

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StartEnd2 {
    public static String s =
            "As long as there is injustice,whenever a\n" +
            "Targathian baby cries out,wherever a distress\n" +
            "signal sounds among the stars ...We'll be there.\n" +
            "This fine ship, and this fine crew ...\n" +
            "Never give up! Never surrender!";

    public static void main(String[] args) {
        // Any word containing "ere"
        Pattern p1 = Pattern.compile("\\w*ere\\w*");
        Matcher m = p1.matcher(s);
        while (m.find()) {
            // start() is the index of the first matched character,
            // end() is the index just past the last matched character
            System.out.println(m.group() + " start=" + m.start() + " end=" + m.end());
        }
    }
}

Output:

there start=11 end=16
wherever start=67 end=75
there start=129 end=134

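Pattern flags

Pattern Pattern.compile(String regex, int flag)

Pattern.CASE_INSENSITIVE (?i): matching ignores case
Pattern.COMMENTS (?x): whitespace in the pattern is ignored, and comments that start with # are ignored up to the end of the line
Pattern.MULTILINE (?m): in multiline mode, ^ and $ match at the beginning and end of each line; by default they match only at the beginning and end of the entire input string
Pattern.DOTALL (?s): the expression "." matches any character, including line terminators (by default it does not match them)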

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReFlags {
    public static void main(String[] args) {
        // Match "java" at the start of any line, ignoring case
        Pattern p = Pattern.compile("^java",
                Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
        Matcher m = p.matcher(
                "java has regex\nJava has regex\n" +
                "JAVA has pretty good regular expressions\n" +
                "Regular expressions are in Java\n" + "JAva");
        while (m.find())
            System.out.println(m.group());
    }
}

Output:

java
Java
JAVA
JAva
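
The COMMENTS and DOTALL flags from the list above can be sketched the same way. The class MoreFlagsDemo, its helpers, and the phone-number format are hypothetical and chosen only for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MoreFlagsDemo {
    // Does "a.b" match the three characters a, newline, b under the given flags?
    static boolean dotMatchesNewline(int flags) {
        return Pattern.compile("a.b", flags).matcher("a\nb").matches();
    }

    // With COMMENTS, whitespace and "# ..." comments inside the pattern are
    // ignored, so this is equivalent to the regex (\d{3})-\d{4}.
    static String prefixOf(String phone) {
        Pattern p = Pattern.compile(
                "(\\d{3})   # group 1: three-digit prefix\n" +
                "-          # separator\n" +
                "\\d{4}     # four-digit line number",
                Pattern.COMMENTS);
        Matcher m = p.matcher(phone);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(dotMatchesNewline(0));               // false
        System.out.println(dotMatchesNewline(Pattern.DOTALL));  // true
        System.out.println(prefixOf("555-1234"));               // 555
        System.out.println(prefixOf("5551234"));                // null
    }
}
```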

split()

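Splits the input string into an array of String objects around matches of the regular expression. An optional second argument limits the number of pieces.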

import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitDemo {
    public static void main(String[] args) {
        String input =
                "This!!unusual use!!of exclamation!!points";
        // No limit: split at every "!!"
        System.out.println(Arrays.toString(
                Pattern.compile("!!").split(input)));
        // Limit of 3: at most three pieces, the last one keeps the rest
        System.out.println(Arrays.toString(
                Pattern.compile("!!").split(input, 3)));
    }
}

Output:

[This, unusual use, of exclamation, points]
[This, unusual use, of exclamation!!points]
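
One detail the limit argument also controls is what happens to trailing empty strings. The sketch below assumes a made-up class SplitLimitDemo and a comma-separated sample input; it is an illustration of the standard Pattern.split(input, limit) behavior, not code from the original article.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitLimitDemo {
    // Splits input on commas with the given limit.
    static String[] pieces(String input, int limit) {
        return Pattern.compile(",").split(input, limit);
    }

    public static void main(String[] args) {
        String input = "a,b,,";
        // Limit 0 (same as no limit argument): trailing empty strings are dropped.
        System.out.println(Arrays.toString(pieces(input, 0)));   // [a, b]
        // Negative limit: split as often as possible and keep trailing empty strings.
        System.out.println(Arrays.toString(pieces(input, -1)));  // [a, b, , ]
        // Positive limit: at most that many pieces.
        System.out.println(Arrays.toString(pieces(input, 2)));   // [a, b,,]
    }
}
```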

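Replacement operations

replaceFirst(String replacement): replaces the first part that matches with replacement
replaceAll(String replacement): replaces every part that matches with replacement
appendReplacement(StringBuffer sbuf, String replacement): performs step-by-step replacement. It lets you call other methods to generate or process replacement, so you can break the target into groups programmatically and build much more powerful replacements
appendTail(StringBuffer sbuf): called after one or more calls to appendReplacement() to copy the remaining, unmatched part of the input string into sbuf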


import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TheReplacements {
    public static void main(String[] args) {
        String s = "/*!Here's a block of text to use as input to\n" +
                   "the regular expression matcher. Note that we'll\n" +
                   "first extract the block of text by looking for\n" +
                   "the special delimiters, then process the\n" +
                   "extracted block. !*/";
        // Extract the text between the /*! and !*/ delimiters
        Matcher mInput =
                Pattern.compile("/\\*!(.*)!\\*/", Pattern.DOTALL).matcher(s);
        if (mInput.find())
            s = mInput.group(1);
        s = s.replaceAll(" {2,}", " ");  // collapse runs of two or more spaces into one
        s = s.replaceAll("(?m)^ +", ""); // strip leading spaces from each line
        System.out.println(s);
        s = s.replaceFirst("[aeiou]", "(VOWEL1)");
        System.out.println(s);
        StringBuffer sbuf = new StringBuffer();
        Pattern p = Pattern.compile("[aeiou]");
        Matcher m = p.matcher(s);
        // Replace each lowercase vowel with its uppercase form
        while (m.find())
            m.appendReplacement(sbuf, m.group().toUpperCase());
        m.appendTail(sbuf); // copy the unmatched text after the last vowel ("ck. ")
        System.out.println(sbuf);
    }
}

Output:

Here's a block of text to use as input to
the regular expression matcher. Note that we'll
first extract the block of text by looking for
the special delimiters, then process the
extracted block. 
H(VOWEL1)re's a block of text to use as input to
the regular expression matcher. Note that we'll
first extract the block of text by looking for
the special delimiters, then process the
extracted block. 
H(VOWEL1)rE's A blOck Of tExt tO UsE As InpUt tO
thE rEgUlAr ExprEssIOn mAtchEr. NOtE thAt wE'll
fIrst ExtrAct thE blOck Of tExt by lOOkIng fOr
thE spEcIAl dElImItErs, thEn prOcEss thE
ExtrActEd blOck. 

When the replacement text needs special processing, such as the conversion to uppercase here, use the appendReplacement() method.
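
As another sketch of a computed replacement, the example below doubles every integer in a string. The class DoubleNumbers and its method are invented for illustration, and the code assumes the numbers are small enough to fit in an int.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DoubleNumbers {
    // Replaces every run of digits with twice its numeric value.
    // Assumption: each number fits in an int.
    static String doubleNumbers(String input) {
        Matcher m = Pattern.compile("\\d+").matcher(input);
        StringBuffer sbuf = new StringBuffer();
        while (m.find()) {
            int doubled = Integer.parseInt(m.group()) * 2;
            // quoteReplacement keeps '$' and '\' in the replacement from
            // being interpreted as group references or escapes.
            m.appendReplacement(sbuf,
                    Matcher.quoteReplacement(String.valueOf(doubled)));
        }
        m.appendTail(sbuf); // copy whatever follows the last match
        return sbuf.toString();
    }

    public static void main(String[] args) {
        System.out.println(doubleNumbers("3 apples and 12 pears"));
        // 6 apples and 24 pears
    }
}
```

The replacement here cannot be expressed as a fixed string passed to replaceAll(), because it depends on the value of each match.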


reset()

reset() lets you apply an existing Matcher object to a new character sequence.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Resetting {
    public static void main(String[] args) throws Exception {
        Matcher m = Pattern.compile("[frb][aiu][gx]")
                .matcher("fix the rug with bags");
        while (m.find())
            System.out.print(m.group() + " ");
        System.out.println();
        // Reuse the same Matcher on a new input
        m.reset("fix the rig with rags");
        while (m.find())
            System.out.print(m.group() + " ");
    }
}

Output:

fix rug bag 
fix rig rag 
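
reset() can also be called without arguments, which rewinds the Matcher to the start of its current input. A minimal sketch, with the class ResetDemo and the helper countRemaining made up for this example:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ResetDemo {
    // Counts the matches that find() still returns from the current position.
    static int countRemaining(Matcher m) {
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        Matcher m = Pattern.compile("[frb][aiu][gx]")
                .matcher("fix the rug with bags");
        System.out.println(countRemaining(m)); // 3 (fix, rug, bag)
        m.reset();                             // rewind to the start of the same input
        System.out.println(countRemaining(m)); // 3
        m.reset("fig jam");                    // switch to a new input
        System.out.println(countRemaining(m)); // 1 (fig)
    }
}
```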