When URL encoding is needed, and why

Source: Internet  Editor: 程序博客网  Time: 2024/06/06 12:51
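RFC 1738 specifies the set of characters that may be used in a URL as follows:

Only the alphanumerics 0-9, a-z and A-Z, the special characters "$-_.+!*'(),", and reserved characters used for their reserved purposes may appear unencoded in a URL.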

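HTML 4, on the other hand, allows the entire Unicode character set in documents, so any character outside the URL-safe set has to be encoded before it can appear in a URL.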

So which characters actually need to be encoded?

1. ASCII control characters

Reason: they are not printable.

Character range: ISO-8859-1 (ISO-Latin) positions 00-1F and 7F (hex).

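2. Non-ASCII characters

Reason: these characters are not part of the ASCII set, so they are not considered legal in a URL.

Character range: the upper half of ISO-Latin, positions 80-FF (hex).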
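As a quick illustration of these first two categories, here is a minimal sketch using JavaScript's built-in encodeURIComponent. One caveat to keep in mind: the ranges above are given in ISO-Latin terms, whereas encodeURIComponent escapes non-ASCII characters as their UTF-8 bytes, so "é" (E9 in ISO-Latin) comes out as two escaped bytes rather than %E9.

```javascript
// Control characters (00-1F and 7F) are always percent-encoded.
const tab = encodeURIComponent("\t");      // "%09"
const newline = encodeURIComponent("\n");  // "%0A"
const del = encodeURIComponent("\x7F");    // "%7F"

// Non-ASCII characters are escaped byte by byte, using UTF-8.
// "\u00E9" is "é": E9 in ISO-Latin, but C3 A9 in UTF-8.
const eAcute = encodeURIComponent("\u00E9"); // "%C3%A9"
```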

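3. Reserved characters

Reason: URLs use certain reserved characters to define their syntax. When one of these characters appears in a URL without its special role, it must be encoded.

Characters: $ & + , / : ; = ? @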

 

Character                       Code point (hex)   Code point (dec)
Dollar ("$")                    24                 36
Ampersand ("&")                 26                 38
Plus ("+")                      2B                 43
Comma (",")                     2C                 44
Forward slash/Virgule ("/")     2F                 47
Colon (":")                     3A                 58
Semi-colon (";")                3B                 59
Equals ("=")                    3D                 61
Question mark ("?")             3F                 63
'At' symbol ("@")               40                 64
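To see why reserved characters matter, consider the small JavaScript sketch below. The host example.com, the path /search and the parameter name q are made up for illustration. An unencoded "&" or "=" inside a value is read as query-string syntax, while the encoded form travels as plain data.

```javascript
// All ten reserved characters from the table, encoded in one go.
const reserved = encodeURIComponent("$&+,/:;=?@");
// "%24%26%2B%2C%2F%3A%3B%3D%3F%40"

// Hypothetical value that happens to contain reserved characters.
const value = "a&b=c";

// Unencoded: "&" and "=" are interpreted as URL syntax.
const broken = "http://example.com/search?q=" + value;

// Encoded: the whole value is kept as data.
const safe = "http://example.com/search?q=" + encodeURIComponent(value);

// Parsing both URLs shows the difference.
const brokenQ = new URL(broken).searchParams.get("q"); // "a"
const safeQ = new URL(safe).searchParams.get("q");     // "a&b=c"
```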

4. Unsafe characters

Reason: some characters can cause ambiguity when they appear in a URL. These characters must also be encoded:

 

Character                        Code point (hex)   Code point (dec)   Why encode?
Space                            20                 32                 Significant sequences of spaces may be lost in some uses (especially multiple spaces).
Quotation marks                  22                 34                 These characters are often used to delimit URLs in plain text.
'Less Than' symbol ("<")         3C                 60
'Greater Than' symbol (">")      3E                 62
'Pound' character ("#")          23                 35                 This is used in URLs to indicate where a fragment identifier (bookmarks/anchors in HTML) begins.
Percent character ("%")          25                 37                 This is used to URL encode/escape other characters, so it should itself also be encoded.
Misc. characters:                                                      Some systems can possibly modify these characters.
   Left Curly Brace ("{")        7B                 123
   Right Curly Brace ("}")       7D                 125
   Vertical Bar/Pipe ("|")       7C                 124
   Backslash ("\")               5C                 92
   Caret ("^")                   5E                 94
   Tilde ("~")                   7E                 126
   Left Square Bracket ("[")     5B                 91
   Right Square Bracket ("]")    5D                 93
   Grave Accent ("`")            60                 96
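The hex codes in the table above can be checked with a short JavaScript sketch. It assumes the built-in encodeURIComponent as the encoder. One detail worth noticing is that encodeURIComponent does not escape the tilde, even though the table lists it as unsafe.

```javascript
// The unsafe characters from the table (tilde excluded), as one string:
// space, quotation mark, <, >, #, %, {, }, |, backslash, ^, [, ], grave accent.
const unsafe = ' "<>#%{}|\\^[]`';
const encoded = encodeURIComponent(unsafe);
// "%20%22%3C%3E%23%25%7B%7D%7C%5C%5E%5B%5D%60"

// Tilde is listed in the table, but encodeURIComponent leaves it as-is.
const tilde = encodeURIComponent("~"); // "~"
```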

How is URL encoding done?

The URL encoding of a character consists of a "%" sign followed by the character's ISO-Latin code written as two hexadecimal digits.

For example:

space = %20
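The rule can be written out directly. The sketch below is only an illustration of the "% plus two hex digits" rule for single characters in the ISO-Latin range; percentEncodeChar is a hypothetical helper name, not a built-in function.

```javascript
// Hypothetical helper: percent-encode one character by its ISO-Latin code (00-FF).
function percentEncodeChar(ch) {
  const code = ch.charCodeAt(0);
  if (code > 0xff) {
    throw new RangeError("character is outside the ISO-Latin range");
  }
  // "%" followed by exactly two uppercase hexadecimal digits.
  return "%" + code.toString(16).toUpperCase().padStart(2, "0");
}

const space = percentEncodeChar(" ");            // "%20"
const tabChar = percentEncodeChar("\t");         // "%09"
const eAcuteLatin = percentEncodeChar("\u00E9"); // "%E9"
```

The padStart call matters for codes below 10 (hex), which would otherwise produce a single digit such as "%9".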

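In JavaScript, the encodeURIComponent function performs this encoding. Note that it escapes non-ASCII characters as their UTF-8 bytes rather than as ISO-Latin codes, and that it leaves - _ . ! ~ * ' ( ) unescaped.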
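A short usage sketch follows. The parameter name "name", the sample value and the example.com URL are assumptions chosen for illustration. It also contrasts encodeURIComponent with the related built-in encodeURI, which is intended for a whole URL and therefore keeps the reserved characters intact.

```javascript
// encodeURIComponent is meant for a single component, such as one query value.
const title = "Tom & Jerry";
const query = "name=" + encodeURIComponent(title); // "name=Tom%20%26%20Jerry"

// decodeURIComponent reverses the encoding.
const roundTrip = decodeURIComponent("Tom%20%26%20Jerry"); // "Tom & Jerry"

// encodeURI is meant for a whole URL: it keeps reserved characters such as
// ":", "/", "?", "&" and "=" and only escapes the rest (here, the space).
const whole = encodeURI("http://example.com/a b?x=1&y=2");
// "http://example.com/a%20b?x=1&y=2"
```

Using encodeURI on a single value would leave "&" and "=" unescaped, so encodeURIComponent is usually the safer choice for query parameters.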