【Java utility class】Website security: URL-encoding a URL and neutralizing javascript: injection

Source: Internet. Editor: 程序博客网. Time: 2024/06/07 11:23

【Preface】

To make a URL harmless, we can implement Java versions of encodeURIComponent and encodeURI, and then filter out the keyword javascript: where needed.

【What is the difference between the two in JavaScript?】

They work the same way; only the whitelist of characters left unescaped differs:

Explanation of encodeURIComponent:

http://www.w3school.com.cn/js/jsref_encodeURIComponent.asp

Explanation of encodeURI:

http://www.w3school.com.cn/js/jsref_encodeURI.asp

Two articles with more detailed explanations:

http://www.cnblogs.com/winner/archive/2007/08/28/873498.html

http://www.cnblogs.com/artwl/archive/2012/03/07/2382848.html

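Both functions encode using UTF-8, which was new to me.

We can see that:

The former is stricter and escapes even the colon, while the latter is more lenient. So for a complete URL (one containing http:// and the like) encodeURI is recommended, and encodeURIComponent is suited to encoding individual parts, such as a single query parameter value.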
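To make the whitelist difference concrete, here is a minimal sketch that lists the punctuation encodeURI leaves alone but encodeURIComponent escapes. The class name WhitelistDiff and its method are illustrative assumptions, not part of the utility class in this post; the two punctuation sets are the unescaped sets of the JavaScript functions.

```java
public class WhitelistDiff {
    // Punctuation left unescaped by JavaScript's encodeURI (in addition to 0-9, a-z, A-Z).
    static final String URI_PUNCT = "!#$&'()*+,-./:;=?@_~";
    // Punctuation left unescaped by JavaScript's encodeURIComponent.
    static final String COMPONENT_PUNCT = "!'()*-._~";

    // Returns the characters that only encodeURI keeps as-is.
    static String onlyInEncodeURI() {
        StringBuilder diff = new StringBuilder();
        for (char c : URI_PUNCT.toCharArray()) {
            if (COMPONENT_PUNCT.indexOf(c) == -1) {
                diff.append(c);
            }
        }
        return diff.toString();
    }

    public static void main(String[] args) {
        System.out.println(onlyInEncodeURI()); // prints #$&+,/:;=?@
    }
}
```

These eleven characters are the URL delimiters, which is why encodeURI keeps a full URL usable while encodeURIComponent is the safer choice for a single parameter value.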

A more detailed explanation follows:

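encodeURI: this function escapes every character in the input string that is not a basic (alphanumeric) character, a mark character, or a reserved character. Each character that needs escaping is converted to its UTF-8 bytes, and each byte is written as a hexadecimal escape (%xx). For example, the space character " " becomes "%20". Under this scheme, ASCII characters that need escaping are replaced by one escaped byte, characters between \u0080 and \u07ff by two escaped bytes, and the remaining 16-bit Unicode characters by three escaped bytes. Characters outside the 16-bit range (surrogate pairs) take four escaped bytes.
encodeURIComponent: this function behaves like encodeURI with one difference: it also escapes the reserved characters. For example, the character ":" is replaced by "%3A".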
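The byte counts described above can be checked with a short sketch. The class Utf8EscapeDemo and its escapeAll method are hypothetical names used only for illustration; the method escapes every byte unconditionally, so it shows the UTF-8 step only and applies no whitelist.

```java
import java.nio.charset.StandardCharsets;

public class Utf8EscapeDemo {
    // Writes every UTF-8 byte of the input as %XX (no whitelist applied).
    static String escapeAll(String s) {
        StringBuilder out = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            out.append(String.format("%%%02X", b & 0xff));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeAll(" "));            // %20          (1 byte, ASCII)
        System.out.println(escapeAll("\u00e9"));       // %C3%A9       (2 bytes, U+00E9)
        System.out.println(escapeAll("\u4f60"));       // %E4%BD%A0    (3 bytes, U+4F60)
        System.out.println(escapeAll("\uD83D\uDE00")); // %F0%9F%98%80 (4 bytes, U+1F600)
    }
}
```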

Implementation:

package Easis.HTTP;

import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class URIHelper {

    /*
     * Modeled on JavaScript's encodeURIComponent. 71 characters are left unescaped:
     * ! ' ( ) * - . _ ~ 0-9 a-z A-Z
     * (The space is NOT in this set; it must be encoded as %20.)
     */
    public static final String ALLOWED_CHARS_URICOMPONENT =
            "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!'()*-._~";

    /*
     * Modeled on JavaScript's encodeURI. 82 characters are left unescaped:
     * ! # $ & ' ( ) * + , - . / : ; = ? @ _ ~ 0-9 a-z A-Z
     */
    public static final String ALLOWED_CHARS_URI =
            "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!#$&'()*+,-./:;=?@_~";

    public static String encodeURIComponent(String input) {
        return encode(input, ALLOWED_CHARS_URICOMPONENT);
    }

    public static String encodeURI(String input) {
        return encode(input, ALLOWED_CHARS_URI);
    }

    // Percent-encodes, as UTF-8, every code point that is not in the allowed set.
    // Iterating by code point keeps surrogate pairs together (4-byte sequences).
    private static String encode(String input, String allowed) {
        if (input == null || input.isEmpty()) {
            return input;
        }
        StringBuilder o = new StringBuilder(input.length() * 3);
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i);
            int len = Character.charCount(cp);
            if (allowed.indexOf(cp) != -1) {
                o.appendCodePoint(cp);
            } else {
                o.append(getHex(input.substring(i, i + len).getBytes(StandardCharsets.UTF_8)));
            }
            i += len;
        }
        return o.toString();
    }

    // Formats each byte as %XX with upper-case hex digits.
    private static String getHex(byte[] buf) {
        StringBuilder o = new StringBuilder(buf.length * 3);
        for (byte b : buf) {
            int n = b & 0xff;
            o.append('%');
            if (n < 0x10) {
                o.append('0');
            }
            o.append(Integer.toString(n, 16).toUpperCase());
        }
        return o.toString();
    }

    // Returns the value of an ASCII hex digit, or -1 if c is not one.
    private static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    // Collects the decoded bytes and lets the JDK decode them as UTF-8.
    // '+' is kept literally, as JavaScript's decodeURIComponent does; a '%' that is
    // not followed by two hex digits is also kept literally instead of throwing.
    public static String decodeURIComponent(String encodedURI) {
        if (encodedURI == null) {
            return null;
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream(encodedURI.length());
        for (int i = 0; i < encodedURI.length(); i++) {
            char c = encodedURI.charAt(i);
            if (c == '%' && i + 2 < encodedURI.length()) {
                int hb = hexValue(encodedURI.charAt(i + 1));
                int lb = hexValue(encodedURI.charAt(i + 2));
                if (hb >= 0 && lb >= 0) {
                    bytes.write((hb << 4) | lb);
                    i += 2;
                    continue;
                }
            }
            // Literal character: copy its UTF-8 bytes (whole code point at once).
            int cp = encodedURI.codePointAt(i);
            byte[] raw = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
            bytes.write(raw, 0, raw.length);
            i += Character.charCount(cp) - 1;
        }
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    // Simplification: unlike JavaScript's decodeURI, this also decodes escaped
    // reserved characters such as %3A.
    public static String decodeURI(String encodedURI) {
        return decodeURIComponent(encodedURI);
    }

    public static void main(String[] args) {
        System.out.println(decodeURIComponent("%E4%BD%A0%E5%A5%BD%20%E7%9C%9F%E7%9A%84"));
        System.out.println(encodeURIComponent("你好 真的"));
        System.out.println("%E4%BD%A0%E5%A5%BD%20%E7%9C%9F%E7%9A%84");
        String url___1 = "jAVascript:http://www.dgdgdgdg.com/index.jsp?????=idfidif&^%$#@!())_javascript:void(0)&alert=javascript:alert('ok')javascript:vaoidfdfdf";
        String result1 = filterURI(url___1);
        System.out.println(result1);
    }

    /**
     * Encodes special characters so that the result no longer carries a special
     * meaning. It follows JavaScript's encodeURI and additionally disables the
     * javascript: form.
     * For reference:
     * escape leaves 69 characters unescaped: * + - . / @ _ 0-9 a-z A-Z
     * encodeURI leaves 82 characters unescaped: ! # $ & ' ( ) * + , - . / : ; = ? @ _ ~ 0-9 a-z A-Z
     * encodeURIComponent leaves 71 characters unescaped: ! ' ( ) * - . _ ~ 0-9 a-z A-Z
     * Search Baidu or Google for details.
     *
     * @param url the URL that must not be able to cause an injection
     */
    public static String filterURI(String url) {
        if (url == null || url.trim().isEmpty()
                || url.trim().equalsIgnoreCase("javascript:void(0)")) {
            return "javascript:void(0)";
        } else if (url.trim().equals("#")) {
            return "#";
        }
        String str_url = encodeURI(url);
        // Escape the colon of every "javascript:" (any letter case), wherever it occurs.
        // The original escaped only the first 11 characters, whatever they were.
        return str_url.replaceAll("(?i)(javascript):", "$1%3A");
    }
}
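Filtering the single keyword javascript: is a blacklist. A complementary check, which is not part of URIHelper, is to accept only URLs whose scheme is on an allow-list. The following is a minimal sketch under stated assumptions: the class name SchemeAllowList, the method isAllowed, and the choice of http, https and mailto are all hypothetical and should be adapted to what a site actually needs.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class SchemeAllowList {
    // Assumed allow-list; adjust to the schemes your site really uses.
    private static final Set<String> ALLOWED =
            new HashSet<>(Arrays.asList("http", "https", "mailto"));

    static boolean isAllowed(String url) {
        if (url == null) {
            return false;
        }
        // Browsers strip tabs and newlines when parsing URLs, so remove
        // whitespace and control characters before looking for the scheme.
        String s = url.replaceAll("[\\s\\p{Cntrl}]", "");
        int colon = s.indexOf(':');
        if (colon == -1) {
            return true; // no scheme at all: a relative URL
        }
        String head = s.substring(0, colon);
        // A colon that comes after '/', '?' or '#' is part of the path,
        // query or fragment, so the URL is still relative.
        if (head.indexOf('/') != -1 || head.indexOf('?') != -1 || head.indexOf('#') != -1) {
            return true;
        }
        return ALLOWED.contains(head.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("http://www.example.com/index.jsp")); // true
        System.out.println(isAllowed("jAVascript:alert('ok')"));           // false
        System.out.println(isAllowed("java\tscript:alert('ok')"));         // false
        System.out.println(isAllowed("/index.jsp?next=a:b"));              // true
    }
}
```

An allow-list also rejects other script-capable schemes such as data: or vbscript: without having to name them one by one.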

There may still be some bugs; feel free to fix them as you find them.