Processing XML with Regular Expressions

Source: Internet | Editor: 程序博客网 | Time: 2024/05/20 01:34
<tr><td>5345454354</td><td>2010-3-29 13:48:33</td><td>周杰伦</td></tr>
<tr><td>6565465466</td><td>2010-3-29 15:34:38</td><td>张学友</td></tr>
<tr><td>6546546546</td><td>2010-3-30 19:30:50</td><td>刘德华</td></tr>
<tr><td>9875646545</td><td>2010-3-31 2:20:58</td><td>郭富城</td></tr>
<tr><td>7868768768</td><td>2010-3-31 8:03:11</td><td>梁朝伟</td></tr>
<tr><td>1434444446 </td><td>2010-3-31 8:45:52</td><td>习近平</td></tr>
<tr><td>7665466666</td><td>2010-3-31 18:00:46</td><td>李长春</td></tr>

To extract the content between the <td> and </td> tags, the pattern can be built up from the following pieces.

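Expression         Description
(?<=Expression)    Positive lookbehind: the text immediately to the left of the current position must match Expression.
(?<!Expression)    Negative lookbehind: the text immediately to the left of the current position must not match Expression.
(?=Expression)     Positive lookahead: the text immediately to the right of the current position must match Expression.
(?!Expression)     Negative lookahead: the text immediately to the right of the current position must not match Expression.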
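The four lookaround forms can be tried out with a small Java sketch. The class name `LookaroundDemo`, the helper `findAll`, and the sample string are illustrative assumptions and do not come from the original post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names and the sample string are illustrative only.
class LookaroundDemo {

    // Collects every match of regex in input, in order of appearance.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "a1 b2 a3";
        System.out.println(findAll("(?<=a)\\d", s));  // digits preceded by 'a'
        System.out.println(findAll("(?<!a)\\d", s));  // digits not preceded by 'a'
        System.out.println(findAll("[a-z](?=2)", s)); // letters followed by '2'
        System.out.println(findAll("[a-z](?!2)", s)); // letters not followed by '2'
    }
}
```

In every case the lookaround only tests the neighbouring text; the characters it inspects are never part of the returned match.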

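The complete pattern: (?is)(?<=<td>).+?(?=</td>)

Expression    Description
(?is)         Mode modifiers: i makes matching case-insensitive; s enables single-line mode, so . also matches carriage returns and line feeds.
(?<=<td>)     Positive lookbehind: the match must be immediately preceded by <td>, but <td> itself is not consumed, so the result does not include <td>.
.+?           One or more of any character, matched lazily: after each character is consumed, the engine tries the rest of the pattern, and it consumes one more character only if that attempt fails. The match therefore stops at the first </td>.
(?=</td>)     Positive lookahead: the match must be immediately followed by </td>, but </td> itself is not consumed, so the result does not include </td>.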
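As a minimal sketch of how this pattern might be applied in Java, the following runs it over the first sample row. The class name `TdExtractDemo` and the method `cells` are hypothetical and are not part of the original post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative, only the pattern comes from the text.
class TdExtractDemo {

    // Compiled once and reused for every call.
    private static final Pattern TD_CONTENT =
            Pattern.compile("(?is)(?<=<td>).+?(?=</td>)");

    // Returns the text of every <td>...</td> cell found in html.
    static List<String> cells(String html) {
        List<String> out = new ArrayList<>();
        Matcher m = TD_CONTENT.matcher(html);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String row = "<tr><td>5345454354</td><td>2010-3-29 13:48:33</td><td>周杰伦</td></tr>";
        for (String cell : cells(row)) {
            System.out.println(cell);
        }
    }
}
```

One caveat: because `.+?` needs at least one character, an empty cell such as `<td></td>` makes the match run on into the next cell. Using `.*?` instead avoids that, at the cost of also returning empty strings.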

Is extracting XML content with regex 50 times faster than dom4j?

long t1 = System.nanoTime();
String str = "<xml><ToUserName><![CDATA[gh_520f99dff7cc]]></ToUserName><FromUserName><![CDATA[oBAMOs3aZB0dkbILsBR1wksbmli4]]></FromUserName><CreateTime>1416900555</CreateTime><MsgType><![CDATA[event]]></MsgType><Event><![CDATA[MASSSENDJOBFINISH]]></Event><MsgID>2348714844</MsgID><Status><![CDATA[send success]]></Status><TotalCount>1</TotalCount><FilterCount>1</FilterCount><SentCount>1</SentCount><ErrorCount>0</ErrorCount></xml>";

// dom4j version, commented out for comparison:
// Document doc = null;
// try {
//     doc = DocumentHelper.parseText(str);
// } catch (DocumentException e) {
//     log.error("Failed to parse mass-send XML: " + e.getMessage(), e);
// }
//
// Element root = doc.getRootElement();
// String msgid = root.elementTextTrim("MsgID");
// String Status = root.elementTextTrim("Status");
// String TotalCount = root.elementTextTrim("TotalCount");
// String FilterCount = root.elementTextTrim("FilterCount");
// String SentCount = root.elementTextTrim("SentCount");
// String ErrorCount = root.elementTextTrim("ErrorCount");

// Regex version: one lookaround pattern per field.
String msgid = RegExp.getString(str,
        "(?<=<MsgID>)[\\s\\S]*?(?=</MsgID>)").trim();
String Status = RegExp.getString(str,
        "(?<=<Status><!\\[CDATA\\[)[\\s\\S]*?(?=\\]\\]></Status>)").trim();
String TotalCount = RegExp.getString(str,
        "(?<=<TotalCount>)[\\s\\S]*?(?=</TotalCount>)").trim();
String FilterCount = RegExp.getString(str,
        "(?<=<FilterCount>)[\\s\\S]*?(?=</FilterCount>)").trim();
String SentCount = RegExp.getString(str,
        "(?<=<SentCount>)[\\s\\S]*?(?=</SentCount>)").trim();
String ErrorCount = RegExp.getString(str,
        "(?<=<ErrorCount>)[\\s\\S]*?(?=</ErrorCount>)").trim();
long t2 = System.nanoTime();

log.info(t2 - t1);               // elapsed time in nanoseconds
log.info((t2 - t1) * 0.000001);  // elapsed time in milliseconds
log.info(msgid + ", " + Status + ", " + TotalCount + ", " + FilterCount + ", " + SentCount + ", " + ErrorCount);
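The snippet above times a single cold run, and `RegExp.getString` compiles a new `Pattern` on every call, so the measured figure includes class loading, JIT warm-up, and pattern compilation. A slightly more careful measurement could look like the sketch below. All names here (`FieldExtractBench`, `fieldPattern`, `extract`) are hypothetical, the capturing-group pattern is swapped in for the lookaround patterns used in the post, and the loop counts are arbitrary. It is still only a rough timing, not a rigorous benchmark; a harness such as JMH would be more reliable.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; class and method names are not from the original post.
class FieldExtractBench {

    // Builds a pattern whose group 1 captures the element text,
    // whether or not it is wrapped in a CDATA section.
    static Pattern fieldPattern(String tag) {
        String t = Pattern.quote(tag);
        return Pattern.compile(
                "<" + t + ">(?:<!\\[CDATA\\[)?(.*?)(?:\\]\\]>)?</" + t + ">",
                Pattern.DOTALL);
    }

    // Returns the trimmed element text, or "" when the element is absent.
    static String extract(Pattern p, String xml) {
        Matcher m = p.matcher(xml);
        return m.find() ? m.group(1).trim() : "";
    }

    public static void main(String[] args) {
        String xml = "<xml><MsgID>2348714844</MsgID>"
                + "<Status><![CDATA[send success]]></Status></xml>";
        Pattern msgId = fieldPattern("MsgID");   // compiled once, outside the timed loop
        Pattern status = fieldPattern("Status");

        // Warm-up so class loading and JIT compilation are not timed.
        int sink = 0;
        for (int i = 0; i < 10_000; i++) {
            sink += extract(msgId, xml).length() + extract(status, xml).length();
        }

        int runs = 100_000;
        long t1 = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            sink += extract(msgId, xml).length() + extract(status, xml).length();
        }
        long t2 = System.nanoTime();

        System.out.println("average ns per message: " + (t2 - t1) / runs);
        System.out.println(extract(msgId, xml) + ", " + extract(status, xml));
        System.out.println("sink (keeps the loops from being optimized away): " + sink);
    }
}
```

Running the dom4j version inside the same warm-up and timed loops would give a fairer comparison than a single `nanoTime` difference.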

The regex helper code:

import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegExp {

    // Returns every match of regex in source.
    public static ArrayList<String> getStrs(String source, String regex) {
        Pattern p = Pattern.compile(regex);
        Matcher m = p.matcher(source);
        ArrayList<String> list = new ArrayList<>();
        while (m.find()) {
            list.add(m.group());
        }
        return list;
    }

    // Returns the first match of regex in source, or "" if there is none.
    public static String getString(String source, String regex) {
        ArrayList<String> list = getStrs(source, regex);
        if (list.size() > 0) {
            return list.get(0);
        }
        return "";
    }

    // Returns every span between beginStr and endStr.
    // isLong = true matches greedily (longest span), false lazily (shortest span).
    public static ArrayList<String> getStrs(String source, String beginStr,
        String endStr, boolean isLong) {
        if (isLong) {
            return getStrs(source,
                "(?<=" + replay(beginStr) + ")[\\s\\S]*(?=" + replay(endStr) + ")");
        }
        return getStrs(source,
            "(?<=" + replay(beginStr) + ")[\\s\\S]*?(?=" + replay(endStr) + ")");
    }

    // Returns the first span between beginStr and endStr, or "" if there is none.
    public static String getString(String source, String beginStr,
        String endStr, boolean isLong) {
        if (isLong) {
            return getString(source,
                "(?<=" + replay(beginStr) + ")[\\s\\S]*(?=" + replay(endStr) + ")");
        }
        return getString(source,
            "(?<=" + replay(beginStr) + ")[\\s\\S]*?(?=" + replay(endStr) + ")");
    }

    // Escapes regex metacharacters so that source is matched literally.
    private static String replay(String source) {
        // The backslash must be escaped first, and every later step must build
        // on result; the original second step called source.replace(...) and so
        // discarded the backslash escaping.
        String result = source.replace("\\", "\\\\");
        result = result.replace(".", "\\.");
        result = result.replace("(", "\\(");
        result = result.replace(")", "\\)");
        result = result.replace("[", "\\[");
        result = result.replace("]", "\\]");
        result = result.replace("{", "\\{");
        result = result.replace("}", "\\}");
        result = result.replace("$", "\\$");
        result = result.replace("?", "\\?");
        result = result.replace("&", "\\&");
        result = result.replace("*", "\\*");
        result = result.replace("!", "\\!");
        result = result.replace("^", "\\^");
        result = result.replace("+", "\\+");
        result = result.replace("#", "\\#");
        // Added: the alternation operator was missing from the original list.
        result = result.replace("|", "\\|");
        return result;
    }
}
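The hand-written escaping in `replay` can also be delegated to the JDK: `Pattern.quote` wraps a string so that the regex engine treats it as a literal. The sketch below shows that alternative for the lazy "text between two delimiters" case. The class name `QuoteDemo` and the method `between` are hypothetical and not part of the original helper.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical alternative to the replay-based helper; names are illustrative.
class QuoteDemo {

    // Returns the shortest text found between begin and end, or "" if absent.
    static String between(String source, String begin, String end) {
        Pattern p = Pattern.compile(
                "(?<=" + Pattern.quote(begin) + ")[\\s\\S]*?(?=" + Pattern.quote(end) + ")");
        Matcher m = p.matcher(source);
        return m.find() ? m.group() : "";
    }

    public static void main(String[] args) {
        String xml = "<Status><![CDATA[send success]]></Status>";
        // The delimiters contain '[', ']' and '!' but need no manual escaping.
        System.out.println(between(xml, "<![CDATA[", "]]>"));
    }
}
```

Delegating to `Pattern.quote` avoids having to maintain a list of metacharacters by hand, which is where the bug in the original `replay` came from.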