Common Regular Expression Matching Examples

Source: Internet  Editor: 程序博客网  Date: 2024/05/22 07:04

Match Chinese characters
[\u4E00-\u9FA5]
Match blank lines (assumes CRLF line endings)
\n\s*\r
Match HTML tags
<(\S*?)[^>]*>.*?</\1>|<.*?/>
Match URLs
[a-zA-Z]+://[^\s]*
Match Chinese landline numbers with an area code (e.g. 0668-7610110)

\d{3}-\d{8}|\d{4}-\d{7}

Match Tencent QQ numbers
[1-9][0-9]{4,}
Match Chinese postal codes
[1-9]\d{5}(?!\d)

Match ID card numbers (15 digits, or 18 characters where the last may be X)
\d{17}[\dXx]|\d{15}
Match IP addresses
\d+\.\d+\.\d+\.\d+
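The patterns above are written in plain regex syntax; in Java source each backslash has to be doubled. The following is a minimal sketch of how a few of them might be wrapped as validators. The class name `CommonPatterns` and its method names are hypothetical, and the 0-255 range check on each IP octet is an addition that the list above does not perform.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; names are illustrative, not from the original article.
final class CommonPatterns {
    private static final Pattern CHINESE = Pattern.compile("[\\u4E00-\\u9FA5]");
    private static final Pattern QQ = Pattern.compile("[1-9][0-9]{4,}");
    private static final Pattern POSTCODE = Pattern.compile("[1-9]\\d{5}");
    private static final Pattern PHONE = Pattern.compile("\\d{3}-\\d{8}|\\d{4}-\\d{7}");
    private static final Pattern IPV4 =
            Pattern.compile("(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})");

    // find() looks for the pattern anywhere in the input.
    static boolean containsChinese(String s) {
        return CHINESE.matcher(s).find();
    }

    // matches() requires the whole input to fit the pattern.
    static boolean isQq(String s) {
        return QQ.matcher(s).matches();
    }

    static boolean isPostcode(String s) {
        return POSTCODE.matcher(s).matches();
    }

    static boolean isPhone(String s) {
        return PHONE.matcher(s).matches();
    }

    // Assumption: each octet must also be in the range 0-255.
    static boolean isIpv4(String s) {
        Matcher m = IPV4.matcher(s);
        if (!m.matches()) {
            return false;
        }
        for (int i = 1; i <= 4; i++) {
            if (Integer.parseInt(m.group(i)) > 255) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isPhone("0668-7610110"));
        System.out.println(isIpv4("192.168.0.1"));
        System.out.println(isIpv4("256.1.1.1"));
    }
}
```

Using `matches()` for validation avoids accepting a valid-looking fragment embedded in a longer, invalid string.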

Finding a desired substring in a given string:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTest {
    public static void main(String[] args) {
        // Search the string with the given pattern
        String line = "My name is Jianguotang。I am a Android programmer.I am 21 years old";
        // Group 1: leading non-digits, group 2: first run of digits, group 3: the rest
        String pattern = "(\\D*)(\\d+)(.*)";
        // Create the Pattern object
        Pattern r = Pattern.compile(pattern);
        // Now create the Matcher object
        Matcher m = r.matcher(line);
        if (m.find()) {
            System.out.println("Found value: " + m.group(0));
            System.out.println("Found value: " + m.group(1));
            System.out.println("Found value: " + m.group(2));
            System.out.println("Found value: " + m.group(3));
        } else {
            System.out.println("NO MATCH");
        }
    }
}

Output

Found value: My name is Jianguotang。I am a Android programmer.I am 21 years old
Found value: My name is Jianguotang。I am a Android programmer.I am 
Found value: 21
Found value:  years old
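Numbered groups become hard to read as a pattern grows. As a small sketch of an alternative, the class below uses a named group (supported since Java 7) to pull the age out of a sentence like the one above. The class name `AgeExtractor`, the group name `age`, and the `-1` "not found" value are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class; not part of the original article.
final class AgeExtractor {
    // One to three digits at a word boundary, followed by " years old".
    private static final Pattern AGE = Pattern.compile("\\b(?<age>\\d{1,3}) years old");

    // Returns the age, or -1 when the text contains no match.
    static int findAge(String text) {
        Matcher m = AGE.matcher(text);
        return m.find() ? Integer.parseInt(m.group("age")) : -1;
    }

    public static void main(String[] args) {
        System.out.println(findAge("I am an Android programmer. I am 21 years old"));
    }
}
```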

Suppose you want to remove all whitespace between a word character and a following period or comma

String pattern = "(\\w)(\\s+)([\\.,])";
// Keep the word character ($1) and the punctuation ($3); drop the whitespace ($2)
System.out.println(EXAMPLE_TEST.replaceAll(pattern, "$1$3"));
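Because `EXAMPLE_TEST` is not defined in this snippet, here is a self-contained sketch of the same idea. The class and method names are hypothetical. It captures only the two pieces that are kept, so the replacement string refers to `$1` and `$2`.

```java
// Hypothetical example class; not part of the original article.
final class PunctuationSpacing {
    // Removes whitespace that sits between a word character and a '.' or ','.
    static String removeSpaceBeforePunctuation(String text) {
        return text.replaceAll("(\\w)\\s+([.,])", "$1$2");
    }

    public static void main(String[] args) {
        // prints "Hello, world."
        System.out.println(removeSpaceBeforePunctuation("Hello , world ."));
    }
}
```

An empty replacement string would delete the captured letter and punctuation along with the whitespace, which is why the groups are written back explicitly.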

Replace a title element with the text between its tags

pattern = "(?i)(<title.*?>)(.+?)(</title>)";
// Replace each whole title element with its inner text ($2)
String updated = EXAMPLE_TEST.replaceAll(pattern, "$2");
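If only the title text is needed, rather than the whole document with the tags stripped, reading the capture group directly may be simpler. This is a sketch under that assumption. `TitleExtractor` and `extractTitle` are hypothetical names, and returning `null` for "no title" is an arbitrary choice.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class; not part of the original article.
final class TitleExtractor {
    // CASE_INSENSITIVE accepts <TITLE>; DOTALL lets the title span several lines.
    private static final Pattern TITLE = Pattern.compile(
            "<title\\b[^>]*>(.+?)</title>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the trimmed text of the first title element, or null if there is none.
    static String extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        System.out.println(extractTitle("<html><head><TITLE>My Page</TITLE></head></html>"));
    }
}
```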

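Pattern and Matcher

You first create a Pattern object that defines the regular expression. This Pattern object lets you create a Matcher object for a given string, and the Matcher then lets you perform regex operations on that string.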

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTestPatternMatcher {
    public static final String EXAMPLE_TEST = "This is my small example string which I'm going to use for pattern matching.";

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("\\w+");
        // To ignore case, compile the pattern with a flag instead:
        // Pattern pattern = Pattern.compile("\\w+", Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(EXAMPLE_TEST);
        // Walk through every match
        while (matcher.find()) {
            System.out.print("Start index: " + matcher.start());
            System.out.print(" End index: " + matcher.end() + " ");
            System.out.println(matcher.group());
        }
        // Now create a new Pattern and Matcher to replace whitespace with tabs
        Pattern replace = Pattern.compile("\\s+");
        Matcher matcher2 = replace.matcher(EXAMPLE_TEST);
        System.out.println(matcher2.replaceAll("\t"));
    }
}

Output

Start index: 0 End index: 4 This
Start index: 5 End index: 7 is
Start index: 8 End index: 10 my
Start index: 11 End index: 16 small
Start index: 17 End index: 24 example
Start index: 25 End index: 31 string
Start index: 32 End index: 37 which
Start index: 38 End index: 39 I
Start index: 40 End index: 41 m
Start index: 42 End index: 47 going
Start index: 48 End index: 50 to
Start index: 51 End index: 54 use
Start index: 55 End index: 58 for
Start index: 59 End index: 66 pattern
Start index: 67 End index: 75 matching
This    is  my  small   example string  which   I'm going   to  use for pattern matching.
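A Matcher offers three ways to apply a pattern, and they are easy to confuse: `matches()` requires the whole input to match, `lookingAt()` requires a match at the start of the input, and `find()` scans for the next match anywhere. The sketch below contrasts them with the same `\w+` pattern. The class name `MatchModes` and its method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class; not part of the original article.
final class MatchModes {
    private static final Pattern WORD = Pattern.compile("\\w+");

    // matches(): the entire input must be one word.
    static boolean wholeInputIsWord(String s) {
        return WORD.matcher(s).matches();
    }

    // lookingAt(): the input must begin with a word.
    static boolean startsWithWord(String s) {
        return WORD.matcher(s).lookingAt();
    }

    // find(): each call advances to the next word, wherever it is.
    static int countWords(String s) {
        Matcher m = WORD.matcher(s);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(wholeInputIsWord("small example"));
        System.out.println(startsWithWord("small example"));
        System.out.println(countWords("small example"));
    }
}
```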

Building a link checker

This example extracts all valid links from a web page. Links that start with "javascript:" or "mailto:" are ignored.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkGetter {
    private final Pattern htmltag;
    private final Pattern link;

    public LinkGetter() {
        // A whole <a ... href="...">...</a> element
        htmltag = Pattern.compile("<a\\b[^>]*href=\"[^>]*>(.*?)</a>");
        // The href attribute through the end of the opening tag
        link = Pattern.compile("href=\"[^>]*\">");
    }

    public List<String> getLinks(String url) {
        List<String> links = new ArrayList<String>();
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader bufferedReader = new BufferedReader(
                new InputStreamReader(new URL(url).openStream()))) {
            String s;
            StringBuilder builder = new StringBuilder();
            while ((s = bufferedReader.readLine()) != null) {
                builder.append(s);
            }
            Matcher tagmatch = htmltag.matcher(builder.toString());
            while (tagmatch.find()) {
                Matcher matcher = link.matcher(tagmatch.group());
                if (!matcher.find()) {
                    // The opening tag does not end with a quoted attribute; skip it
                    continue;
                }
                // Strip the href=" prefix, the closing ">, and any target attribute
                String href = matcher.group().replaceFirst("href=\"", "")
                        .replaceFirst("\">", "")
                        .replaceFirst("\"[\\s]?target=\"[a-zA-Z_0-9]*", "");
                if (valid(href)) {
                    links.add(makeAbsolute(url, href));
                }
            }
        } catch (IOException e) {
            // Also covers MalformedURLException, which is an IOException
            e.printStackTrace();
        }
        return links;
    }

    private boolean valid(String s) {
        return !s.matches("javascript:.*|mailto:.*");
    }

    // Naive concatenation: joins url and link with exactly one slash between them
    private String makeAbsolute(String url, String link) {
        if (link.matches("https?://.*")) {
            return link; // already absolute
        }
        if (link.matches("/.*") && url.matches(".*/")) {
            return url + link.substring(1); // avoid a double slash
        }
        if (link.matches("[^/].*") && url.matches(".*[^/]")) {
            return url + "/" + link; // add the missing slash
        }
        return url + link; // exactly one side supplies the slash
    }
}
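The extraction step can be tried without a network connection by running a pattern over an HTML string held in memory. The following is a simplified sketch, not the class above: `HrefExtractor` is a hypothetical name, it uses a single pattern with a capture group for the attribute value, and it assumes double-quoted `href` values. Regular expressions are not a full HTML parser, so unusual markup may be missed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class; not part of the original article.
final class HrefExtractor {
    // Group 1 is the value of a double-quoted href attribute inside an <a> tag.
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\shref=\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Returns href values in document order, skipping javascript: and mailto: links.
    static List<String> extract(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String href = m.group(1);
            if (!href.matches("(?i)(javascript|mailto):.*")) {
                links.add(href);
            }
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<a href=\"/a.html\">A</a>"
                + "<a class=\"x\" href=\"mailto:me@example.com\">M</a>"
                + "<a href=\"http://example.com/b\" target=\"_blank\">B</a>";
        System.out.println(extract(html));
    }
}
```

Capturing the attribute value directly avoids the chain of `replaceFirst` calls and works regardless of where `href` appears among the tag's attributes.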

Finding duplicated words
The following regular expression matches a word that is immediately repeated.

\b(\w+)\s+\1\b

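\b is a word boundary, and \1 refers to the text captured by the first group, i.e. the first word. (?!-in)\b(\w+) \1\b adds a negative lookahead, so it finds repeated words only where the text at that position does not start with "-in". Tip: \s+ already matches line breaks; add (?s) only if the pattern contains a . that must match across lines.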
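In Java, the backreference is written `\\1` inside the string literal. The sketch below applies the pattern both to list repeated words and to collapse each repeated pair into one word. `DuplicateWords` and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class; not part of the original article.
final class DuplicateWords {
    // A word, some whitespace, then the same word again as a whole word.
    private static final Pattern DUP = Pattern.compile("\\b(\\w+)\\s+\\1\\b");

    // Returns each word that is immediately followed by a copy of itself.
    static List<String> find(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = DUP.matcher(text);
        while (m.find()) {
            words.add(m.group(1));
        }
        return words;
    }

    // Replaces every repeated pair with a single occurrence of the word.
    static String collapse(String text) {
        return DUP.matcher(text).replaceAll("$1");
    }

    public static void main(String[] args) {
        String text = "This is is a test test of the the regex";
        System.out.println(find(text));
        System.out.println(collapse(text));
    }
}
```

The trailing `\b` keeps "the then" from being reported, since "the" followed by "then" is not a repeated whole word.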

Finding elements that start on a new line
The following regular expression finds the word "title" when it starts on a new line.

(\n\s*)title
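The pattern above needs a preceding line break, so it cannot match "title" on the very first line. One alternative, sketched here as an assumption rather than the article's method, is the MULTILINE flag `(?m)`, which makes `^` match at the start of every line. `LineStartFinder` is a hypothetical name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class; not part of the original article.
final class LineStartFinder {
    // (?m) lets ^ match at each line start; [ \t]* allows leading indentation.
    private static final Pattern LINE_TITLE = Pattern.compile("(?m)^[ \\t]*title");

    // Counts the lines that begin with "title", ignoring leading spaces and tabs.
    static int count(String text) {
        Matcher m = LINE_TITLE.matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(count("title one\nsubtitle\n  title two\nthe title"));
    }
}
```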