Thinking in Java-36: Regular Expressions
Source: Internet  Published: API data interface  Editor: 程序博客网  Time: 2024/05/19 23:17
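1. What is a regular expression?
Some of the content here is adapted from this source; see that source for the full details.
A regular expression defines a search pattern for strings.
The search pattern can be as simple as a single character or a fixed string, or it can be a complex expression that describes a special pattern.
In other words, if a string fits the pattern described by a regular expression, it matches what we are looking for. ("If a string has these things in it, then it matches what I'm looking for.")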
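2. Common regular expression rules
This section is an overview of the available metacharacters. Regular expressions are not limited to Java; most other languages, such as Perl and Groovy, support them as well (each language's support differs slightly). The content of this section should nonetheless be largely language-independent.
2.1. Common matching symbols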
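As a minimal sketch of a few widely used matching symbols (`.`, `[...]`, `[^...]`, ranges, `|`, `^`, `$`), the snippet below tries each one against a short input. The class name `MatchSymbolsDemo` and its two helpers are made up for illustration; only `String.matches` and `java.util.regex.Pattern` come from the JDK.

```java
import java.util.regex.Pattern;

class MatchSymbolsDemo {
    // True if the regex matches the ENTIRE input (String.matches semantics).
    static boolean matchesWhole(String regex, String input) {
        return input.matches(regex);
    }

    // True if the regex matches somewhere inside the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("c.t", "cat"));      // true:  . is any single character
        System.out.println(matchesWhole("[abc]", "b"));      // true:  one of a, b, c
        System.out.println(matchesWhole("[^abc]", "b"));     // false: anything except a, b, c
        System.out.println(matchesWhole("[a-d1-7]", "5"));   // true:  ranges a-d or 1-7
        System.out.println(matchesWhole("cat|dog", "dog"));  // true:  alternation
        System.out.println(found("^The", "The end"));        // true:  ^ anchors at the start
        System.out.println(found("end$", "The end"));        // true:  $ anchors at the end
        System.out.println(found("^end", "The end"));        // false: "end" is not at the start
    }
}
```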
2.2 Metacharacters
Metacharacters have a predefined meaning and make common patterns easier to write, for example \d in place of [0-9].
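To illustrate, here is a small sketch using the predefined classes `\d` (digit), `\D` (non-digit), `\s` (whitespace) and `\w` (word character). The class and method names are hypothetical; the point is only that `\d+` and `[0-9]+` behave the same on ASCII input.

```java
class MetaCharDemo {
    // Keeps only the digits by deleting every non-digit character (\D).
    static String digitsOnly(String s) {
        return s.replaceAll("\\D", "");
    }

    // True if the string consists solely of word characters (letters, digits, underscore).
    static boolean isWord(String s) {
        return s.matches("\\w+");
    }

    // Splits on any run of whitespace (spaces, tabs, newlines).
    static String[] tokens(String s) {
        return s.trim().split("\\s+");
    }

    public static void main(String[] args) {
        System.out.println("2024".matches("\\d+"));        // true
        System.out.println("2024".matches("[0-9]+"));      // true, same meaning as \d+
        System.out.println(digitsOnly("Tel: 555-1234"));   // 5551234
        System.out.println(isWord("user_42"));             // true
        System.out.println(isWord("user 42"));             // false, contains a space
        System.out.println(tokens("  a \t b\nc ").length); // 3
    }
}
```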
2.3 Quantifiers
A quantifier defines how often an element may occur. The symbols ?, *, + and {} specify the number of occurrences.
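The following sketch shows each quantifier on a tiny input, plus the difference between a greedy `.*` and a reluctant `.*?`. The names `QuantifierDemo`, `greedy` and `reluctant` are assumptions made for this example.

```java
class QuantifierDemo {
    // True if the regex matches the entire input.
    static boolean matches(String regex, String input) {
        return input.matches(regex);
    }

    // Greedy: .* grabs as much as possible, so both tags collapse into one match.
    static String greedy(String html) {
        return html.replaceAll("<b>.*</b>", "#");
    }

    // Reluctant: .*? grabs as little as possible, so each tag is matched separately.
    static String reluctant(String html) {
        return html.replaceAll("<b>.*?</b>", "#");
    }

    public static void main(String[] args) {
        System.out.println(matches("colou?r", "color"));   // true:  ? means zero or one
        System.out.println(matches("ab*", "a"));           // true:  * means zero or more
        System.out.println(matches("ab+", "a"));           // false: + means one or more
        System.out.println(matches("\\d{3}", "123"));      // true:  exactly three digits
        System.out.println(matches("\\d{2,4}", "12345"));  // false: two to four digits only
        System.out.println(greedy("<b>x</b><b>y</b>"));    // #
        System.out.println(reluctant("<b>x</b><b>y</b>")); // ##
    }
}
```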
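2.4 Grouping and back-references
Parts of a regular expression can be grouped with parentheses (). In a replacement string, a captured group can be referred to with $ followed by the group number, as in the following example: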
package com.fqy.blog;

import org.junit.Test;

public class StrRegex {
    public static final String EXAMPLE_TEST = "The extra space is for testing removal , hha .";

    @Test
    public void testBackRef() {
        // Removes whitespace between a word character and . or ,
        String pattern = "(\\w)(\\s+)([\\.,])";
        System.out.println(EXAMPLE_TEST.replaceAll(pattern, "$1$3"));

        // Extract the text between the two title elements
        String titleStr = "<Title title1 in> Header</title>";
        pattern = "(?i)(<title.*?>)(.+?)(</title>)";
        String updated = titleStr.replaceAll(pattern, "$2");
        System.out.println(updated);
    }
}

//Running result:
The extra space is for testing removal, hha.
 Header
2.5 Negative lookahead
A negative lookahead is written (?!pattern). For example, the following expression matches 'a' only if that 'a' is not followed by 'b':
a(?!b)
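A short sketch of this pattern in use: every `a` that is not followed by `b` is replaced, while the lookahead itself consumes no characters. The class and method names are illustrative only.

```java
class NegativeLookaheadDemo {
    // Replaces each 'a' that is NOT immediately followed by 'b' with 'X'.
    // The character after 'a' is only inspected, never consumed.
    static String markLonelyA(String input) {
        return input.replaceAll("a(?!b)", "X");
    }

    public static void main(String[] args) {
        System.out.println(markLonelyA("ab ac ad")); // ab Xc Xd
        System.out.println(markLonelyA("a"));        // X (nothing follows, so no 'b' follows)
    }
}
```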
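2.6 Specifying modes inside the regular expression
Mode modifiers can be placed at the start of a regular expression. Several modes can be combined by writing them together, e.g. (?ismx).
- (?i): makes the regular expression case-insensitive.
- (?s): single-line mode; '.' matches every character, including line terminators.
- (?m): multi-line mode; '^' and '$' match at the start and end of each line, not only at the start and end of the whole string.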
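The modes above can be sketched as follows. The inline modifiers are compared with the equivalent `Pattern` flags (`Pattern.CASE_INSENSITIVE`, `Pattern.DOTALL`); the class name and helper methods are assumptions for this example.

```java
import java.util.regex.Pattern;

class RegexModesDemo {
    // (?i): case-insensitive match of the whole input against "hello".
    static boolean ignoreCase(String input) {
        return input.matches("(?i)hello");
    }

    // (?s): '.' also matches a line terminator.
    static boolean dotAll(String input) {
        return input.matches("(?s)a.b");
    }

    // (?m): '^' matches at the start of every line, so each line gets a prefix.
    static String quoteLines(String input) {
        return input.replaceAll("(?m)^", "> ");
    }

    // Same as "(?is)a.b", expressed with flags instead of inline modifiers.
    static boolean withFlags(String input) {
        return Pattern.compile("a.b", Pattern.CASE_INSENSITIVE | Pattern.DOTALL)
                .matcher(input)
                .matches();
    }

    public static void main(String[] args) {
        System.out.println(ignoreCase("HeLLo"));     // true
        System.out.println("a\nb".matches("a.b"));   // false: '.' skips line terminators by default
        System.out.println(dotAll("a\nb"));          // true
        System.out.println(quoteLines("one\ntwo"));  // "> one" and "> two" on separate lines
        System.out.println(withFlags("A\nB"));       // true
    }
}
```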
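2.7 About the backslash '\'
In Java strings the backslash \ is an escape character, which means it already has a predefined meaning in Java source code. To define a single backslash you must write \\. For example, to use \w in a regular expression you write \\w in the Java string; to match a literal backslash you write \\\\, because \ is also an escape character in regular expressions.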
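A brief sketch of these escaping rules. `Pattern.quote` is shown as one way to avoid hand-escaping a literal; the class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

class BackslashDemo {
    // "\\." in Java source is the regex \. (a literal dot).
    static String[] splitOnDot(String s) {
        return s.split("\\.");
    }

    // "\\\\" in Java source is the regex \\ (a literal backslash).
    static String[] splitOnBackslash(String s) {
        return s.split("\\\\");
    }

    // Pattern.quote turns any string into a regex that matches it literally.
    static String[] splitOnLiteral(String s, String literal) {
        return s.split(Pattern.quote(literal));
    }

    public static void main(String[] args) {
        System.out.println("\\w".length());                              // 2: a backslash and 'w'
        System.out.println(splitOnDot("a.b.c").length);                  // 3
        System.out.println("a.b.c".split(".").length);                   // 0: unescaped '.' matches every character
        System.out.println(splitOnBackslash("C:\\temp\\file.txt").length); // 3
        System.out.println(splitOnLiteral("a.b.c", ".").length);         // 3
    }
}
```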
3. Regular expressions in the String class
- boolean java.lang.String.matches(String regex)
- String[] java.lang.String.split(String regex)
- String java.lang.String.replaceFirst(String regex, String replacement)
- String java.lang.String.replaceAll(String regex, String replacement)
package com.fqy.blog;

import org.junit.Test;

public class StrRegex {
    @Test
    public void testBackRef() {
        // Removes whitespace between a word character and . or ,
        final String EXAMPLE_TEST = "The extra space is for testing removal , hha .";
        String pattern = "(\\w)(\\s+)([\\.,])";
        System.out.println(EXAMPLE_TEST.replaceAll(pattern, "$1$3"));

        // Extract the text between the two title elements
        String titleStr = "<Title title1 in> Header</title>";
        pattern = "(?i)(<title.*?>)(.+?)(</title>)";
        String updated = titleStr.replaceAll(pattern, "$2");
        System.out.println(updated);
    }

    @Test
    public void strRegTest() {
        String EXAMPLE_TEST = "This is my small example "
                + "string which I'm going to "
                + "use for pattern matching.";
        System.out.println(EXAMPLE_TEST.matches("\\w.*")); // true
        String[] splitString = EXAMPLE_TEST.split("\\s+");
        System.out.println(splitString.length); // should be 14
        for (String string : splitString) {
            System.out.print(string + " ");
        }
        System.out.println();
        // replace all whitespace with tabs
        System.out.println(EXAMPLE_TEST.replaceAll("\\s+", "\t"));
    }
}

//Running result of strRegTest (the last line is tab-separated):
true
14
This is my small example string which I'm going to use for pattern matching.
This	is	my	small	example	string	which	I'm	going	to	use	for	pattern	matching.
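Of the four String methods listed in this section, replaceFirst is the only one not exercised above. The sketch below contrasts it with replaceAll and also shows that matches tests the whole string rather than a substring. The class and method names are hypothetical.

```java
class StringRegexDemo {
    // Replaces only the first run of whitespace.
    static String first(String s) {
        return s.replaceFirst("\\s+", "_");
    }

    // Replaces every run of whitespace.
    static String all(String s) {
        return s.replaceAll("\\s+", "_");
    }

    public static void main(String[] args) {
        System.out.println(first("a  b   c"));             // a_b   c
        System.out.println(all("a  b   c"));               // a_b_c
        // matches() succeeds only if the regex covers the entire string.
        System.out.println("abc123".matches("\\d+"));      // false
        System.out.println("abc123".matches(".*\\d+"));    // true
    }
}
```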
Some more examples:
// StringMatcher.java
public class StringMatcher {
    // returns true if the string matches exactly "true"
    public boolean isTrue(String s) {
        return s.matches("true");
    }

    // returns true if the string matches exactly "true" or "True"
    public boolean isTrueVersion2(String s) {
        return s.matches("[tT]rue");
    }

    // returns true if the string matches exactly "true" or "True"
    // or "yes" or "Yes"
    public boolean isTrueOrYes(String s) {
        return s.matches("[tT]rue|[yY]es");
    }

    // returns true if the string contains "true" anywhere
    public boolean containsTrue(String s) {
        return s.matches(".*true.*");
    }

    // returns true if the string consists of exactly three letters
    public boolean isThreeLetters(String s) {
        return s.matches("[a-zA-Z]{3}");
        // shorter form of:
        // return s.matches("[a-zA-Z][a-zA-Z][a-zA-Z]");
    }

    // returns true if the string does not have a number at the beginning
    public boolean isNoNumberAtBeginning(String s) {
        return s.matches("^[^\\d].*");
    }

    // returns true if the string consists of any number of word characters other than b
    public boolean isIntersection(String s) {
        return s.matches("([\\w&&[^b]])*");
    }

    // returns true if the string contains a number less than 300
    public boolean isLessThenThreeHundred(String s) {
        return s.matches("[^0-9]*[12]?[0-9]{1,2}[^0-9]*");
    }
}

// StringMatcherTest.java
import org.junit.Before;
import org.junit.Test;

import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

public class StringMatcherTest {
    private StringMatcher m;

    @Before
    public void setup() {
        m = new StringMatcher();
    }

    @Test
    public void testIsTrue() {
        assertTrue(m.isTrue("true"));
        assertFalse(m.isTrue("true2"));
        assertFalse(m.isTrue("True"));
    }

    @Test
    public void testIsTrueVersion2() {
        assertTrue(m.isTrueVersion2("true"));
        assertFalse(m.isTrueVersion2("true2"));
        assertTrue(m.isTrueVersion2("True"));
    }

    @Test
    public void testIsTrueOrYes() {
        assertTrue(m.isTrueOrYes("true"));
        assertTrue(m.isTrueOrYes("yes"));
        assertTrue(m.isTrueOrYes("Yes"));
        assertFalse(m.isTrueOrYes("no"));
    }

    @Test
    public void testContainsTrue() {
        assertTrue(m.containsTrue("thetruewithin"));
    }

    @Test
    public void testIsThreeLetters() {
        assertTrue(m.isThreeLetters("abc"));
        assertFalse(m.isThreeLetters("abcd"));
    }

    @Test
    public void testisNoNumberAtBeginning() {
        assertTrue(m.isNoNumberAtBeginning("abc"));
        assertFalse(m.isNoNumberAtBeginning("1abcd"));
        assertTrue(m.isNoNumberAtBeginning("a1bcd"));
        assertTrue(m.isNoNumberAtBeginning("asdfdsf"));
    }

    @Test
    public void testisIntersection() {
        assertTrue(m.isIntersection("1"));
        assertFalse(m.isIntersection("abcksdfkdskfsdfdsf"));
        assertTrue(m.isIntersection("skdskfjsmcnxmvjwque484242"));
    }

    @Test
    public void testLessThenThreeHundred() {
        assertTrue(m.isLessThenThreeHundred("288"));
        assertFalse(m.isLessThenThreeHundred("3288"));
        assertFalse(m.isLessThenThreeHundred("328 8"));
        assertTrue(m.isLessThenThreeHundred("1"));
        assertTrue(m.isLessThenThreeHundred("99"));
        assertFalse(m.isLessThenThreeHundred("300"));
    }
}
4. The String methods above are not optimized for performance; Java provides Pattern and Matcher as the optimized alternative
java.util.regex.Pattern
java.util.regex.Matcher
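Pattern: the Pattern class is used to define the regular expression.
Matcher: a Pattern object creates a Matcher object for a given string, and the regular expression operations on that string are performed through the Matcher.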
package com.fqy.blog;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.junit.Test;

public class StrRegex {
    @Test
    public void strPatternMatcher() {
        String EXAMPLE_TEST = "This is my small example string which I'm going to use for pattern matching.";
        Pattern pattern = Pattern.compile("\\w+");
        // in case you would like to ignore case sensitivity,
        // you could use this statement:
        // Pattern pattern = Pattern.compile("\\w+", Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(EXAMPLE_TEST);
        // check all occurrences
        while (matcher.find()) {
            System.out.print("Start index: " + matcher.start());
            System.out.print(" End index: " + matcher.end() + " ");
            System.out.println(matcher.group());
        }
        // now create a new pattern and matcher to replace whitespace with tabs
        Pattern replace = Pattern.compile("\\s+");
        Matcher matcher2 = replace.matcher(EXAMPLE_TEST);
        System.out.println(matcher2.replaceAll("\t"));
    }
}

//Running results (the last line is tab-separated):
Start index: 0 End index: 4 This
Start index: 5 End index: 7 is
Start index: 8 End index: 10 my
Start index: 11 End index: 16 small
Start index: 17 End index: 24 example
Start index: 25 End index: 31 string
Start index: 32 End index: 37 which
Start index: 38 End index: 39 I
Start index: 40 End index: 41 m
Start index: 42 End index: 47 going
Start index: 48 End index: 50 to
Start index: 51 End index: 54 use
Start index: 55 End index: 58 for
Start index: 59 End index: 66 pattern
Start index: 67 End index: 75 matching
This	is	my	small	example	string	which	I'm	going	to	use	for	pattern	matching.
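One common way to benefit from Pattern is to compile the expression once and reuse it, reading captured groups from the Matcher. The sketch below assumes a made-up "key=number" input format; `KeyValueParser`, `PAIR` and `parse` are hypothetical names, not part of the original example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeyValueParser {
    // Compiled once and shared; a Pattern is immutable and safe to reuse,
    // whereas a Matcher holds state and is created per input.
    private static final Pattern PAIR = Pattern.compile("(\\w+)=(\\d+)");

    // Collects every "key=number" pair found in the input as "key -> number".
    static List<String> parse(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            // group(1) is the key, group(2) the digits after '='.
            result.add(m.group(1) + " -> " + m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        // "depth=x" is skipped because 'x' is not a digit.
        System.out.println(parse("width=10, height=20; depth=x"));
        // [width -> 10, height -> 20]
    }
}
```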