Java Regular Expressions

Source: Internet. Published by: 主力净买入指标源码. Editor: 程序博客网. Time: 2024/05/17 22:42

I. Regular expression support in java.lang.String

    public boolean matches(String regex)
    public String replaceFirst(String regex, String replacement)
    public String replaceAll(String regex, String replacement)
    public String[] split(String regex, int limit)
    public String[] split(String regex)

Example:

    import java.util.Arrays;

    String regex = "abc";
    String input = "abc";
    boolean b = input.matches(regex);   // matches() must match the entire string
    System.out.println(b);              // true

    regex = "a";
    input = "ayatem";
    System.out.println(input.replaceFirst(regex, "s")); // syatem
    System.out.println(input.replaceAll(regex, "s"));   // system

    // Note the difference from replace(). replaceAll() compiles its first argument as a
    // regular expression and also interprets $ and \ in the replacement (group references),
    // so replaceAll("\\d", "*") replaces every digit. replace() treats both arguments
    // literally: replace("\\d", "*") only replaces the two-character text \d.

    regex = ":";
    input = "1:2:3:4";
    System.out.println(Arrays.toString(input.split(regex)));    // [1, 2, 3, 4]
    System.out.println(Arrays.toString(input.split(regex, 3))); // [1, 2, 3:4]
    System.out.println(Arrays.toString(input.split(regex, 2))); // [1, 2:3:4]
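To make the difference between replace and replaceAll concrete, here is a minimal sketch. The class name StringRegexDemo and its helper methods are made up for illustration; only the String and Matcher calls are standard API. It also shows that trailing empty strings are dropped by split unless a negative limit is passed.

```java
import java.util.Arrays;
import java.util.regex.Matcher;

// Hypothetical demo class; the names are illustrative only.
class StringRegexDemo {
    // replaceAll compiles its first argument as a regex: every digit becomes '*'.
    static String maskDigits(String s) {
        return s.replaceAll("\\d", "*");
    }

    // replace is literal: only the two-character text \d is replaced.
    static String replaceLiteral(String s) {
        return s.replace("\\d", "*");
    }

    // The replacement of replaceAll is parsed too: $1 and $2 refer to captured groups.
    static String swapPair(String s) {
        return s.replaceAll("(\\w+)=(\\w+)", "$2=$1");
    }

    // Matcher.quoteReplacement escapes $ and \ so the replacement is taken literally.
    static String literalDollar(String s) {
        return s.replaceAll("\\d+", Matcher.quoteReplacement("$N"));
    }

    public static void main(String[] args) {
        System.out.println(maskDigits("a1b22"));        // a*b**
        System.out.println(replaceLiteral("a1\\d"));    // a1*
        System.out.println(swapPair("key=value"));      // value=key
        System.out.println(literalDollar("cost 42"));   // cost $N
        // Trailing empty strings are removed unless the limit is negative.
        System.out.println(Arrays.toString("1:2::".split(":")));     // [1, 2]
        System.out.println(Arrays.toString("1:2::".split(":", -1))); // [1, 2, , ]
    }
}
```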

II. Basic regular expression syntax

1. Characters (match a single character)
    x        The character x
    \\       The backslash character
    \0n      The character with octal value 0n (0 <= n <= 7)
    \0nn     The character with octal value 0nn (0 <= n <= 7)
    \0mnn    The character with octal value 0mnn (0 <= m <= 3, 0 <= n <= 7)
    \xhh     The character with hexadecimal value 0xhh
    \uhhhh   The character with hexadecimal value 0xhhhh
    \t       The tab character ('\u0009')
    \n       The newline (line feed) character ('\u000A')
    \r       The carriage-return character ('\u000D')
    \f       The form-feed character ('\u000C')
    \a       The alert (bell) character ('\u0007')
    \e       The escape character ('\u001B')
    \cx      The control character corresponding to x

2. Character classes (character ranges; match a single character)
    [abc]          a, b, or c (simple class)
    [^abc]         Any character except a, b, or c (negation)
    [a-zA-Z]       a through z or A through Z, inclusive (range)
    [a-d[m-p]]     a through d, or m through p: [a-dm-p] (union)
    [a-z&&[def]]   d, e, or f (intersection)
    [a-z&&[^bc]]   a through z, except for b and c: [ad-z] (subtraction)
    [a-z&&[^m-p]]  a through z, and not m through p: [a-lq-z] (subtraction)

3. Predefined character classes (shorthand for common character ranges)
    .    Any character (may or may not match line terminators)
    \d   A digit: [0-9]
    \D   A non-digit: [^0-9]
    \s   A whitespace character: [ \t\n\x0B\f\r]
    \S   A non-whitespace character: [^\s]
    \w   A word character: [a-zA-Z_0-9]
    \W   A non-word character: [^\w]

4. Boundary matchers
    ^    The beginning of a line
    $    The end of a line
    \b   A word boundary
    \B   A non-word boundary
    \A   The beginning of the input
    \G   The end of the previous match
    \Z   The end of the input but for the final terminator, if any
    \z   The end of the input

5. Quantifiers
    X?       X, once or not at all
    X*       X, zero or more times
    X+       X, one or more times
    X{n}     X, exactly n times
    X{n,}    X, at least n times
    X{n,m}   X, at least n but not more than m times

6. Logical operators
    XY     X followed by Y
    X|Y    Either X or Y
    (X)    X, as a capturing group

7. Back references
    \n         Whatever the nth capturing group matched
    \k<name>   Whatever the named-capturing group "name" matched
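A few of the constructs listed above can be tried out with the short sketch below. The class SyntaxDemo and its method names are hypothetical; the patterns only use syntax from the table (class intersection, quantifier ranges, word boundaries, numbered and named back references).

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class SyntaxDemo {
    // Character class intersection: a lowercase letter that is not a vowel.
    static boolean isLowercaseConsonant(String s) {
        return s.matches("[a-z&&[^aeiou]]");
    }

    // Quantifier range: between three and five digits.
    static boolean isShortNumber(String s) {
        return s.matches("\\d{3,5}");
    }

    // Word boundaries plus numbered back reference: the same word twice in a row.
    static boolean hasDoubledWord(String s) {
        return Pattern.compile("\\b(\\w+)\\s+\\1\\b").matcher(s).find();
    }

    // Named back reference: the closing quote must equal the opening quote.
    static boolean isQuoted(String s) {
        return s.matches("(?<q>['\"]).*\\k<q>");
    }

    public static void main(String[] args) {
        System.out.println(isLowercaseConsonant("b"));           // true
        System.out.println(isLowercaseConsonant("e"));           // false
        System.out.println(isShortNumber("1234"));               // true
        System.out.println(isShortNumber("12"));                 // false
        System.out.println(hasDoubledWord("this is is a test")); // true
        System.out.println(hasDoubledWord("this is a test"));    // false
        System.out.println(isQuoted("'abc'"));                   // true
        System.out.println(isQuoted("'abc\""));                  // false
    }
}
```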

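III. Class Pattern

java.util.regex.Pattern
    A Pattern object represents a compiled regular expression. It can be matched against any character sequence.

Constructor
    The constructor of Pattern is private.

Declaration
    public final class Pattern extends Object implements Serializable

Static factory methods that create a Pattern
    public static Pattern compile(String regex)
        Compiles the given regular expression into a Pattern object and returns it.
    public static Pattern compile(String regex, int flags)
        Compiles the given regular expression with the given flags into a Pattern object and returns it.

Flags
    public static final int CASE_INSENSITIVE
        Enables case-insensitive matching; on its own it only applies to ASCII characters.
    public static final int UNICODE_CASE
        Used together with CASE_INSENSITIVE, makes case-insensitive matching Unicode-aware.
    public static final int DOTALL
        Enables dotall mode, in which "." matches any character, including line terminators.

    Further flags include:
    public static final int COMMENTS
    public static final int LITERAL
    public static final int MULTILINE
    public static final int CANON_EQ
    public static final int UNIX_LINES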

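Example:

    import java.util.regex.Pattern;

    // Obtain a Pattern object
    String regex = "\\d+";
    Pattern pattern = Pattern.compile(regex);

    // Obtain a Pattern object with flags; several flags are combined with bitwise OR
    // int flags = Pattern.CANON_EQ;
    int flags = Pattern.CANON_EQ | Pattern.CASE_INSENSITIVE;
    Pattern patternWithFlags = Pattern.compile(regex, flags);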
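The effect of the flags is easier to see side by side. The sketch below is illustrative only: FlagsDemo and its two helpers are hypothetical names, and the inputs are chosen so that each flag changes the result.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class FlagsDemo {
    // True if the whole input matches the regex under the given flags.
    static boolean matches(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    // True if some part of the input matches the regex under the given flags.
    static boolean finds(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).find();
    }

    public static void main(String[] args) {
        // CASE_INSENSITIVE: ASCII letters match regardless of case.
        System.out.println(matches("hello", 0, "HELLO"));                        // false
        System.out.println(matches("hello", Pattern.CASE_INSENSITIVE, "HELLO")); // true

        // UNICODE_CASE is needed in addition for non-ASCII letters (here e-acute).
        int ci = Pattern.CASE_INSENSITIVE;
        System.out.println(matches("\u00e9", ci, "\u00c9"));                        // false
        System.out.println(matches("\u00e9", ci | Pattern.UNICODE_CASE, "\u00c9")); // true

        // DOTALL: "." also matches line terminators.
        System.out.println(matches("a.b", 0, "a\nb"));              // false
        System.out.println(matches("a.b", Pattern.DOTALL, "a\nb")); // true

        // MULTILINE: ^ and $ also match at line boundaries inside the input.
        System.out.println(finds("^b$", 0, "a\nb\nc"));                 // false
        System.out.println(finds("^b$", Pattern.MULTILINE, "a\nb\nc")); // true
    }
}
```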

IV. Class Matcher

java.util.regex.Matcher: the matcher

Declaration
    public final class Matcher extends Object implements MatchResult

Common methods

Study methods review the input string and return a boolean indicating whether or not the pattern is found.
    public boolean lookingAt(): Attempts to match the input sequence, starting at the beginning of the region, against the pattern.
    public boolean find(): Attempts to find the next subsequence of the input sequence that matches the pattern.
    public boolean find(int start): Resets this matcher and then attempts to find the next subsequence of the input sequence that matches the pattern, starting at the specified index.
    public boolean matches(): Attempts to match the entire region against the pattern.

Index methods provide useful index values that show precisely where the match was found in the input string:
    public int start(): Returns the start index of the previous match.
    public int start(int group): Returns the start index of the subsequence captured by the given group during the previous match operation.
    public int end(): Returns the offset after the last character matched.
    public int end(int group): Returns the offset after the last character of the subsequence captured by the given group during the previous match operation.

Example:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String regex = "w(el)(come)";
    String input = "Ladies and Gentlemen, welcome to China, welcome to Shandong";
    Pattern pattern = Pattern.compile(regex);
    Matcher matcher = pattern.matcher(input);
    /*
     * matches:   whole-input match; returns true only if the entire character sequence matches.
     *            If a prefix matched before the attempt failed, the start of the next find()
     *            may move (an implementation side effect, not documented behavior).
     * lookingAt: prefix match; always starts at the first character and does not scan
     *            further, whether it succeeds or fails.
     * find:      partial match; starts at the current position, finds the next matching
     *            subsequence and advances the position for the following search.
     * reset:     reset(CharSequence) gives the Matcher a new input; reset() without an
     *            argument moves the Matcher back to the start of the current input.
     */
    System.out.println(matcher.lookingAt()); // false
    System.out.println(matcher.matches());   // false
    System.out.println(matcher.find());      // true
    System.out.println("text: " + matcher.group() + " start index: " + matcher.start() + " end index:" + matcher.end());
    // text: welcome start index: 22 end index:29
    System.out.println(matcher.find());      // true
    System.out.println("text: " + matcher.group() + " start index: " + matcher.start() + " end index:" + matcher.end());
    // text: welcome start index: 40 end index:47
    matcher.reset();                         // reset the match position
    System.out.println(matcher.find());      // true
    System.out.println("text: " + matcher.group() + " start index: " + matcher.start() + " end index:" + matcher.end());
    // text: welcome start index: 22 end index:29

The concept of groups

    String regex = "w(el)(come)";
    String input = "Ladies and Gentlemen, welcome to China, welcome to Shandong";
    Pattern pattern = Pattern.compile(regex);
    Matcher matcher = pattern.matcher(input);
    int groupCount = matcher.groupCount();
    System.out.println(groupCount); // 2: each () is one group; group(0) is the whole match
    while (matcher.find()) {
        System.out.println(matcher.group(0)); // welcome
        System.out.println(matcher.group(1)); // el
        System.out.println(matcher.group(2)); // come
        // matcher.group(3) would throw IndexOutOfBoundsException: there is no group 3
    }
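As a compact illustration of the three study methods and the index methods, the following sketch uses a fresh Matcher for each call so that no state carries over between them. MatcherDemo, probe and findAll are hypothetical names introduced here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class MatcherDemo {
    // Returns {lookingAt, matches, find}, each evaluated on its own fresh Matcher.
    static boolean[] probe(String regex, String input) {
        Pattern p = Pattern.compile(regex);
        return new boolean[] {
            p.matcher(input).lookingAt(),
            p.matcher(input).matches(),
            p.matcher(input).find()
        };
    }

    // Collects every match as "text@start-end" using the find() loop idiom.
    static List<String> findAll(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.group() + "@" + m.start() + "-" + m.end());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(probe("\\d+", "123abc"))); // [true, false, true]
        System.out.println(Arrays.toString(probe("\\d+", "abc123"))); // [false, false, true]
        System.out.println(findAll("w(el)(come)", "welcome, welcome"));
        // [welcome@0-7, welcome@9-16]
    }
}
```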

---------------------------------------- Updated 2017/02/11 -------------------------------------

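Differences between matches and find

1. matches attempts to match the entire input; find looks for a matching part of it

    String str = "I have an apple";
    String regex = "\\w+";
    Matcher matcher = Pattern.compile(regex).matcher(str);
    matcher.matches(); // false: the whole string is not a single word
    matcher.reset();   // start again from the beginning of the input
    matcher.find();    // true: "I" matches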

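2. find starts at the current position, finds the next matching substring, and advances the position for the following search

    String str = "I have an apple";
    String regex = "\\w+";
    Matcher matcher = Pattern.compile(regex).matcher(str);
    while (matcher.find()) {
        System.out.print(matcher.group()); // prints Ihaveanapple (print adds no separator)
    }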

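matches returns true only if the entire character sequence matches, and false otherwise. However, if a prefix matched before the attempt failed, the position of the next find may have moved past that prefix. This is an implementation side effect rather than behavior documented in the Javadoc, and it may differ between JDK versions, so call reset() before find() when every match is needed.

    String str = "I have an apple";
    String regex = "\\w+";
    Matcher matcher = Pattern.compile(regex).matcher(str);
    System.out.println(matcher.matches()); // false
    while (matcher.find()) {
        // Observed by the author: have, an, apple. The "I" that the failed matches()
        // consumed was not printed.
        System.out.println(matcher.group());
    }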
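Because the position after a failed matches() is not something to rely on, a defensive pattern is to call reset() before iterating with find(). The sketch below assumes nothing beyond the standard Matcher API; ResetDemo and wordsAfterMatches are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class ResetDemo {
    // Calls matches() first, then resets the matcher so find() starts at index 0 again.
    static List<String> wordsAfterMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        m.matches();   // result ignored here; the call may leave internal state behind
        m.reset();     // discard that state explicitly
        List<String> words = new ArrayList<>();
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(wordsAfterMatches("\\w+", "I have an apple"));
        // [I, have, an, apple]
    }
}
```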

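group

The regex in the example below has 4 groups; counting the whole match (group 0), that makes 5.

    String str = "date:2015-3-2 14:35";
    // The trailing .* is required: without it, $ cannot match after "14" and matches() fails.
    String regex = "^.*(\\d)-(\\d+)-(\\d+) (\\d+).*$";
    Matcher matcher = Pattern.compile(regex).matcher(str);
    if (matcher.matches()) {
        System.out.println(matcher.group(0)); // date:2015-3-2 14:35
        System.out.println(matcher.group(1)); // 5, not 2015: the greedy .* takes "date:201" and (\\d) is a single digit
        System.out.println(matcher.group(2)); // 3
        System.out.println(matcher.group(3)); // 2
        System.out.println(matcher.group(4)); // 14
    }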
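One possible way to capture the full year in group 1 is to make the leading .* reluctant, so that it stops as early as possible and leaves all four digits to the group. This is a sketch of that idea, not necessarily the fix the author had in mind; GreedyPrefixDemo and firstGroup are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class GreedyPrefixDemo {
    // Returns group 1 of a whole-input match, or null if the input does not match.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String str = "date:2015-3-2 14:35";
        // Greedy prefix: .* gives back as little as possible, so group 1 gets one digit.
        System.out.println(firstGroup("^.*(\\d+)-(\\d+)-(\\d+) (\\d+).*$", str));  // 5
        // Reluctant prefix: .*? takes as little as possible, so group 1 gets the whole year.
        System.out.println(firstGroup("^.*?(\\d+)-(\\d+)-(\\d+) (\\d+).*$", str)); // 2015
    }
}
```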

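Named groups (new in Java 7)

The example below matches the same input as the previous one, but the matched substrings are easier to retrieve.

    String str = "date:2015-3-2 14:35";
    String regex = "^.*(?<year>\\d{4})-(?<month>\\d+)-(?<day>\\d+) (?<hour>\\d+).*$";
    Matcher matcher = Pattern.compile(regex).matcher(str);
    if (matcher.matches()) {
        System.out.println(matcher.group(0));       // date:2015-3-2 14:35
        System.out.println(matcher.group("year"));  // 2015: \\d{4} fixes the problem above
        System.out.println(matcher.group("month")); // 3
        System.out.println(matcher.group("day"));   // 2
        System.out.println(matcher.group("hour"));  // 14
    }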
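Named groups can also be referenced from a replacement string with the ${name} syntax. The sketch below is illustrative; NamedGroupDemo, field and reorder are hypothetical names, and the date layout is just the one used in the example above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class NamedGroupDemo {
    private static final Pattern DATE = Pattern.compile(
        "^.*(?<year>\\d{4})-(?<month>\\d+)-(?<day>\\d+) (?<hour>\\d+).*$");

    // Returns the text captured by the named group, or null if the input does not match.
    static String field(String input, String name) {
        Matcher m = DATE.matcher(input);
        return m.matches() ? m.group(name) : null;
    }

    // Rewrites year-month-day as day/month/year using ${name} in the replacement.
    static String reorder(String input) {
        return input.replaceAll(
            "(?<year>\\d{4})-(?<month>\\d+)-(?<day>\\d+)", "${day}/${month}/${year}");
    }

    public static void main(String[] args) {
        String str = "date:2015-3-2 14:35";
        System.out.println(field(str, "year")); // 2015
        System.out.println(field(str, "hour")); // 14
        System.out.println(reorder(str));       // date:2/3/2015 14:35
    }
}
```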

Matching modes

Largely adapted from: http://blog.csdn.net/chs_jdmdr/article/details/46885421

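1. Greediness (greedy): maximal match

X?, X*, X+ and X{n,} are all maximal (greedy) matches. For example, if you use "<.+>" to match "a<tr>aava</tr>abb", you might expect it to match "<tr>", but the actual result is "<tr>aava</tr>". Why? Let us trace how the maximal match proceeds.
① "<" matches the "<" in the string. ② ".+" matches "tr>aava</tr>abb": in a maximal match it consumes both ">" characters and every other character up to the final "b" of the text. ③ At this point ">" cannot be matched, so the engine backtracks along the same path, trying ">" against "b", "b" and "a" in turn, until the ">" just before "abb" matches.
That is maximal matching: to predict the result, look for the last position at which the match can still succeed.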

Example:

    String str = "a<tr>aava</tr>abb";
    String regex = "<.+>";
    Matcher matcher = Pattern.compile(regex).matcher(str);
    if (matcher.find()) {
        System.out.println(matcher.group(0)); // <tr>aava</tr>
    }

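2. Reluctant (Laziness): minimal match

X?, X*, X+ and X{n,} are all maximal matches. Appending a ? turns them into lazy matches: X??, X*?, X+?, X{n,}? and X{n,m}? are all minimal matches. X{n}? is redundant, because X{n} can only match exactly n times anyway.
Minimal matching means that .+? matches one character and immediately tries to match >; if that fails, .+? matches one more character and tries > again. The JDK documentation calls these Greedy and Reluctant and uses the metaphor of taking a bite ("eat"), so "gluttonous" and "(reluctant) picky eater" would be the closest translations. I prefer the terms maximal match and minimal match, though.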
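Example:

    String str = "a<tr>aava</tr>abb";
    String regex = "<.+?>";
    Matcher matcher = Pattern.compile(regex).matcher(str);
    while (matcher.find()) {
        System.out.print(matcher.group(0));
    }
    // prints <tr></tr> (print adds no separator between the two matches)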
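The reluctant form also matters for bounded quantifiers: X{n,m}? prefers n repetitions where X{n,m} prefers m, while X{n}? behaves exactly like X{n}. A small sketch, with QuantifierRangeDemo and findAll as hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class QuantifierRangeDemo {
    // Collects the text of every match found by repeated find() calls.
    static List<String> findAll(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(findAll("a{2,3}", "aaaa"));  // [aaa]: greedy takes three, one a is left over
        System.out.println(findAll("a{2,3}?", "aaaa")); // [aa, aa]: reluctant stops at two
        System.out.println(findAll("a{2}", "aaaa"));    // [aa, aa]
        System.out.println(findAll("a{2}?", "aaaa"));   // [aa, aa]: same as a{2}
    }
}
```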

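3. Possessive: all-or-nothing match

Besides maximal matching there is one more form: X?+, X*+, X++, X{n,}+ and so on, called possessive matching. Like a maximal match, it consumes as many characters as it can, up to the end of the text, but it never backtracks. In other words, it takes one big bite, and if the rest of the pattern then fails, the match simply fails.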

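    String test = "a<tr>aava</tr>abb";
    String test2 = "<tr>";
    String reg = "<.++>";   // possessive: .++ consumes everything after "<" and never gives ">" back
    String reg2 = "<tr>";   // plain literal pattern, for comparison
    System.out.println(test.replaceAll(reg, "###"));   // a<tr>aava</tr>abb (no match, unchanged)
    System.out.println(test2.replaceAll(reg2, "###")); // ###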
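Putting the three modes next to each other on the same input may help. The sketch below also includes "<[^>]++>", a possessive pattern that does match because the negated class can never swallow the closing ">". PossessiveDemo and findAll are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class PossessiveDemo {
    // Collects the text of every match found by repeated find() calls.
    static List<String> findAll(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "a<tr>aava</tr>abb";
        System.out.println(findAll("<.+>", input));     // [<tr>aava</tr>]  greedy
        System.out.println(findAll("<.+?>", input));    // [<tr>, </tr>]    reluctant
        System.out.println(findAll("<.++>", input));    // []               possessive, no backtracking
        System.out.println(findAll("<[^>]++>", input)); // [<tr>, </tr>]    possessive that stops before ">"
    }
}
```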

0 0