Handling the URLDecoder error "Incomplete trailing escape (%) pattern"
Source: Internet | Editor: 程序博客网 | Date: 2024/05/18 00:06
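When crawling, you may run into URLs that contain a plain % character, one that does not start a %XX escape. Passing such a URL straight to URLDecoder.decode() throws an IllegalArgumentException with the message in the title. The fix is to first encode '%' as '%25' and then decode the URL.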
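```java
import java.net.URLDecoder;

public class TrailingPercentDemo {
    public static void main(String[] args) throws Exception {
        String test = "http://www.baidu.com?123%"; // an arbitrary example
        // URLDecoder.decode(test, "UTF-8"); // decoding directly throws the error from the title
        // Turn every % into %25 first, so decoding gives back a literal %
        System.out.println(URLDecoder.decode(test.replace("%", "%25"), "UTF-8"));
    }
}
```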
Output:
http://www.baidu.com?123%
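This is the simplest case. In most URLs, however, some % characters are real escape markers, and replacing every % with %25 then fails to recover the correct URL: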
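```java
import java.net.URLDecoder;

public class NaiveReplaceDemo {
    public static void main(String[] args) throws Exception {
        // %e4%b8%ad%e5%9b%bd is the UTF-8 encoding of "中国" (China)
        String test = "http://www.baidu.com?%e4%b8%ad%e5%9b%bd123%";
        // Escaping every % also escapes the real escapes, so nothing gets decoded
        System.out.println(URLDecoder.decode(test.replace("%", "%25"), "UTF-8"));
    }
}
```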
Output:
http://www.baidu.com?%e4%b8%ad%e5%9b%bd123%
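The solution depends on how URLDecoder.decode validates escapes. It reports a malformed escape with an IllegalArgumentException. A % with fewer than two characters after it gives "Incomplete trailing escape (%) pattern". A % followed by two characters that do not form a hex pair gives "Illegal hex characters in escape (%) pattern", and the text after that prefix varies by JDK version. The class DecodeErrorDemo and its method decodeError are hypothetical names. This is a small sketch for reproducing both cases, not part of the original post.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo: shows which inputs URLDecoder.decode rejects and with what message
class DecodeErrorDemo {
    // Returns the IllegalArgumentException message for s, or null if s decodes cleanly
    static String decodeError(String s) {
        try {
            URLDecoder.decode(s, "UTF-8");
            return null;
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        } catch (UnsupportedEncodingException e) {
            // Not expected: every Java platform must support UTF-8
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Lone % at the end: fewer than two characters follow it
        System.out.println(decodeError("http://www.baidu.com?123%"));
        // % followed by two characters that are not hex digits
        System.out.println(decodeError("http://www.baidu.com?a=%zz"));
        // Well-formed escape (%41 is "A"): no error, so null is printed
        System.out.println(decodeError("http://www.baidu.com?a=%41"));
    }
}
```

Both failures come from a % that is not followed by two hex digits. That is the condition the fix has to detect.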
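Solution: convert only the % characters that are not followed by two hexadecimal digits into %25, and leave genuine %XX escapes untouched.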
```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class ConvertPercent {
    // Returns true if c is a hexadecimal digit (0-9, a-f, A-F)
    public static boolean isHex(char c) {
        return (c >= '0' && c <= '9')
                || (c >= 'a' && c <= 'f')
                || (c >= 'A' && c <= 'F');
    }

    // Rewrites every % that does not start a valid %XX escape as %25
    public static String convertPercent(String str) {
        StringBuilder sb = new StringBuilder(str);
        for (int i = 0; i < sb.length(); i++) {
            // Only the escape character % needs to be inspected
            if (sb.charAt(i) == '%') {
                // A valid escape needs two more characters, and both must be hex digits
                boolean isEscape = i + 2 < sb.length()
                        && isHex(sb.charAt(i + 1))
                        && isHex(sb.charAt(i + 2));
                if (!isEscape) {
                    // A plain %: turn it into %25
                    sb.insert(i + 1, "25");
                }
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String test = "http://www.baidu.com?%e4%b8%ad%e5%9b%bd123%";
        // URLDecoder.decode(test, "UTF-8"); // decoding directly throws the error from the title
        String url = convertPercent(test);
        System.out.println(url);
        System.out.println(URLDecoder.decode(url, "UTF-8"));
    }
}
```
Output:
http://www.baidu.com?%e4%b8%ad%e5%9b%bd123%25
http://www.baidu.com?中国123%
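The same rule can also be written as one regular expression with a negative lookahead: a % starts an escape only if two hex digits follow it. The sketch below is an alternative to the loop above, not the author's code. The class name PercentFixRegex and the methods escapeLonePercents and decodeLenient are hypothetical. The Charset overload of URLDecoder.decode used here assumes Java 10 or later.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Hypothetical helper: regex alternative to the convertPercent loop
class PercentFixRegex {
    // A % NOT followed by two hex digits, i.e. a literal % that URLDecoder would reject
    private static final Pattern LONE_PERCENT = Pattern.compile("%(?![0-9a-fA-F]{2})");

    // Rewrites each literal % as %25 and leaves well-formed escapes such as %e4 alone
    static String escapeLonePercents(String s) {
        return LONE_PERCENT.matcher(s).replaceAll("%25");
    }

    // Repairs literal % characters, then decodes as UTF-8 (Charset overload needs Java 10+)
    static String decodeLenient(String s) {
        return URLDecoder.decode(escapeLonePercents(s), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String test = "http://www.baidu.com?%e4%b8%ad%e5%9b%bd123%";
        System.out.println(escapeLonePercents(test));
        System.out.println(decodeLenient(test));
    }
}
```

Like the loop version, this assumes that every % followed by two hex digits is meant as an escape. A URL containing literal text such as %41 cannot be told apart from an encoded "A".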