QP Encoding (Quoted-Printable)

Source: Internet; Editor: 程序博客网; Date: 2024/05/17 01:24

Quoted-Printable encoding rules (RFC 1341):

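1. Any octet, other than those forming a line break, may be written as
=XX, where XX is the octet's value as two hexadecimal digits drawn from
0-9 and A-F (uppercase must be used when sending). Unless one of the
rules below allows an alternative, this rule is mandatory.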

2. Octets with decimal values 33-60 and 62-126 (note: this excludes '=',
which is 61) may be written as the corresponding ASCII characters
without conversion.

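3. Octets with decimal values 9 (TAB) and 32 (SPACE) may also be written
literally, without conversion. (This is optional: they may be encoded
instead.) They must not appear literally at the end of an encoded line;
there they must be encoded as =09 or =20 according to rule 1. When
decoding, any trailing white space on a line must be deleted, because it
can only have been added in transit.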

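4. A line break in a text body must be represented in the encoded
output by an RFC822 line break, that is, a CRLF sequence. Only when
isolated CR or LF octets, or LF CR and CR LF sequences, appear as binary
data rather than as line breaks must they be encoded as
"=0D", "=0A", "=0A=0D" and "=0D=0A" respectively.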

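5. (Soft line breaks) Quoted-Printable requires that encoded lines be no
more than 76 characters long. Longer lines must be split with soft line
breaks: an '=' as the last character of an encoded line marks a
non-significant (soft) line break. So if the raw, unencoded line reads: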

Now's the time for all folk to come to the aid of their country.

then in Quoted-Printable it can be represented as:

Now's the time =
for all folk to come=
 to the aid of their country.

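This provides a mechanism for encoding overlong lines in such a way that
the user agent can restore the original input. The CRLF at the end of a
line does not count toward the 76-character limit, but every other
character, including any '=' sign, does.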
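
The five rules above can be sketched as a small encoder. This is a minimal illustration, not a reference implementation: the names encode_octet, is_literal, encode_line and qp_encode are made up for this example, the input is assumed to be text whose hard line breaks are already CRLF, and the line-packing strategy (always reserving one column for a possible '=') is just one valid choice. The standard-library quopri module is imported only to cross-check the result.

```python
import quopri  # standard library; used here only to cross-check the sketch
from typing import List


def encode_octet(b: int) -> bytes:
    """Rule 1: '=' followed by two uppercase hexadecimal digits."""
    return b"=%02X" % b


def is_literal(b: int) -> bool:
    """Rules 2 and 3: octets that may stand for themselves."""
    return 33 <= b <= 60 or 62 <= b <= 126 or b in (9, 32)


def encode_line(line: bytes, max_len: int = 76) -> List[bytes]:
    """Encode one line (without its line break) into encoded lines."""
    tokens = []
    for i, b in enumerate(line):
        at_end = i == len(line) - 1
        # Rule 3: TAB/SPACE must not be literal at the end of a line.
        if is_literal(b) and not (at_end and b in (9, 32)):
            tokens.append(bytes([b]))
        else:
            tokens.append(encode_octet(b))
    # Rule 5: insert a soft line break before a line would get too long.
    lines, cur = [], b""
    for tok in tokens:
        if len(cur) + len(tok) > max_len - 1:  # keep one column for '='
            lines.append(cur + b"=")
            cur = b""
        cur += tok
    lines.append(cur)
    return lines


def qp_encode(text: bytes, max_len: int = 76) -> bytes:
    """Rule 4: hard line breaks (CRLF in the input) stay CRLF."""
    out = []
    for line in text.split(b"\r\n"):
        out.extend(encode_line(line, max_len))
    return b"\r\n".join(out)


sample = b"caf\xe9 = 1 " + b"x" * 100 + b"\r\ntrailing space \r\n"
encoded = qp_encode(sample)
# Decoding with the standard library should give back the input.
assert quopri.decodestring(encoded) == sample
```

Splitting the work into "tokens" first keeps an =XX triplet from ever being cut in half by a soft line break.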

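Because the hyphen '-' represents itself in Quoted-Printable, care is
needed when encoding the body of a multipart entity: the boundary
must never appear anywhere in the encoded body.
(A good strategy is to include the sequence "=_" in the boundary, since
it can never occur in a Quoted-Printable body; for details see the
definition of multipart messages in RFC 1341.)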
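
The boundary advice can be illustrated as follows. This is only a sketch: make_boundary is a hypothetical helper, and using uuid4 for the random part is an assumption made for the example, not something RFC 1341 prescribes. The point is that in an encoded body every '=' is followed either by two hexadecimal digits or by a line break, so "=_" cannot occur.

```python
import quopri
import uuid


def make_boundary() -> str:
    """Return a boundary string that contains the sequence '=_'."""
    # '_' is not a hexadecimal digit, so '=_' cannot appear in a
    # Quoted-Printable body.
    return "=_" + uuid.uuid4().hex


boundary = make_boundary()

# Even a body that contains "=_" in its raw form is safe once encoded,
# because the '=' itself is rewritten as "=3D".
encoded_body = quopri.encodestring(b"tricky =_ text")
assert b"=_" not in encoded_body
```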

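Note: Quoted-Printable is something of a compromise between readability
and reliability in transport. Bodies encoded with Quoted-Printable will
work reliably over most mail gateways, but may not work perfectly over a
few of them, most notably those that involve translation into
EBCDIC. (In theory, an EBCDIC gateway could decode a Quoted-Printable
body and re-encode it using Base64, but such gateways do not yet
exist.)
Base64 offers a higher level of confidence. A reasonably reliable way to
get through EBCDIC gateways is to also quote the following ASCII
characters according to [rule 1]: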

!"#$@[\]^`{|}~

For more information, see [Appendix B] of RFC1341.
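
A sketch of that extra quoting, under stated assumptions: encode_octets and its ebcdic_safe flag are hypothetical names, and the function only handles the per-octet decision. It does not wrap lines at 76 characters or protect trailing white space.

```python
# The characters the RFC suggests quoting for EBCDIC gateways.
EBCDIC_UNSAFE = b'!"#$@[\\]^`{|}~'


def encode_octets(line: bytes, ebcdic_safe: bool = False) -> bytes:
    """Encode the octets of one line; no wrapping, no trailing-space fix."""
    out = bytearray()
    for b in line:
        # Rules 2 and 3: printable ASCII except '=', plus TAB and SPACE.
        literal = 33 <= b <= 60 or 62 <= b <= 126 or b in (9, 32)
        if ebcdic_safe and b in EBCDIC_UNSAFE:
            literal = False  # force rule 1 for the risky characters
        out += bytes([b]) if literal else b"=%02X" % b
    return bytes(out)
```

With ebcdic_safe left at False the characters pass through unchanged; with it set to True, for instance, '@' becomes "=40".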

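Because Quoted-Printable data is generally assumed to be line-oriented,
it is to be expected that the breaks between lines of encoded data may
be altered in transport (translator's note: Unix, Windows and Mac use
different newline conventions), in the same way that plain text mail has
always been altered in Internet mail when passing between systems with
differing newline conventions. If such alterations are likely to
constitute a corruption of the data, it is probably more sensible to use
Base64 rather than Quoted-Printable.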
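
The difference can be made concrete with the standard-library quopri and base64 modules. This is an illustration only: rewrite_newlines is a hypothetical stand-in for a gateway that converts line endings, and the Quoted-Printable body is written by hand.

```python
import base64
import quopri

original = b"a=b\r\nsecond line\r\n"
qp_body = b"a=3Db\r\nsecond line\r\n"      # hand-encoded Quoted-Printable
b64_body = base64.encodebytes(original)    # Base64 with line breaks


def rewrite_newlines(body: bytes, newline: bytes) -> bytes:
    """Hypothetical gateway: replace every line ending with `newline`."""
    return b"".join(line + newline for line in body.splitlines())


results = {}
for newline in (b"\r\n", b"\n", b"\r"):
    qp_ok = quopri.decodestring(rewrite_newlines(qp_body, newline)) == original
    b64_ok = base64.b64decode(rewrite_newlines(b64_body, newline)) == original
    results[newline] = (qp_ok, b64_ok)
```

In this sketch the Quoted-Printable body decodes to the original bytes only when CRLF is kept, while the Base64 body survives all three conventions, because the Base64 decoder ignores line breaks altogether.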

Original text: RFC 1341, MIME: Multipurpose Internet Mail Extensions,
Borenstein & Freed, June 1992, pages 14-16.

5.1  Quoted-Printable Content-Transfer-Encoding

The Quoted-Printable encoding is intended to represent data that
largely consists of octets that correspond to printable characters in
the ASCII character set.  It encodes the data in such a way that the
resulting octets are unlikely to be modified by mail transport.  If the
data being encoded are mostly ASCII text, the encoded form of the data
remains largely recognizable by humans.  A body which is entirely ASCII
may also be encoded in Quoted-Printable to ensure the integrity of the
data should the message pass through a character-translating, and/or
line-wrapping gateway.

In this encoding, octets are to be represented as determined by the
following rules:

     Rule #1: (General 8-bit representation) Any octet, except those
     indicating a line break according to the newline convention of the
     canonical form of the data being encoded, may be represented by an
     "=" followed by a two digit hexadecimal representation of the
     octet's value. The digits of the hexadecimal alphabet, for this
     purpose, are "0123456789ABCDEF". Uppercase letters must be used
     when sending hexadecimal data, though a robust implementation may
     choose to recognize lowercase letters on receipt. Thus, for
     example, the value 12 (ASCII form feed) can be represented by
     "=0C", and the value 61 (ASCII EQUAL SIGN) can be represented by
     "=3D".  Except when the following rules allow an alternative
     encoding, this rule is mandatory.

     Rule #2: (Literal representation) Octets with decimal values of 33
     through 60 inclusive, and 62 through 126, inclusive, MAY be
     represented as the ASCII characters which correspond to those
     octets (EXCLAMATION POINT through LESS THAN, and GREATER THAN
     through TILDE, respectively).

     Rule #3: (White Space): Octets with values of 9 and 32 MAY be
     represented as ASCII TAB (HT) and SPACE characters, respectively,
     but MUST NOT be so represented at the end of an encoded line. Any
     TAB (HT) or SPACE characters on an encoded line MUST thus be
     followed on that line by a printable character.  In particular, an
     "=" at the end of an encoded line, indicating a soft line break
     (see rule #5) may follow one or more TAB (HT) or SPACE characters.
     It follows that an octet with value 9 or 32 appearing at the end
     of an encoded line must be represented according to Rule #1. This
     rule is necessary because some MTAs (Message Transport Agents,
     programs which transport messages from one user to another, or
     perform a part of such transfers) are known to pad lines of text
     with SPACEs, and others are known to remove "white space"
     characters from the end of a line. Therefore, when decoding a
     Quoted-Printable body, any trailing white space on a line must be
     deleted, as it will necessarily have been added by intermediate
     transport agents.

     Rule #4 (Line Breaks): A line break in a text body part,
     independent of what its representation is following the canonical
     representation of the data being encoded, must be represented by a
     (RFC 822) line break, which is a CRLF sequence, in the
     Quoted-Printable encoding.  If isolated CRs and LFs, or LF CR and
     CR LF sequences are allowed to appear in binary data according to
     the canonical form, they must be represented using the "=0D",
     "=0A", "=0A=0D" and "=0D=0A" notations respectively.

     Note that many implementation may elect to encode the local
     representation of various content types directly. In particular,
     this may apply to plain text material on systems that use newline
     conventions other than CRLF delimiters. Such an implementation is
     permissible, but the generation of line breaks must be generalized
     to account for the case where alternate representations of newline
     sequences are used.

     Rule #5 (Soft Line Breaks): The Quoted-Printable encoding REQUIRES
     that encoded lines be no more than 76 characters long. If longer
     lines are to be encoded with the Quoted-Printable encoding, 'soft'
     line breaks must be used. An equal sign as the last character on a
     encoded line indicates such a non-significant ('soft') line break
     in the encoded text. Thus if the "raw" form of the line is a
     single unencoded line that says:

          Now's the time for all folk to come to the aid of
          their country.

     This can be represented, in the Quoted-Printable encoding, as

          Now's the time =
          for all folk to come=
           to the aid of their country.

     This provides a mechanism with which long lines are encoded in
     such a way as to be restored by the user agent.  The 76 character
     limit does not count the trailing CRLF, but counts all other
     characters, including any equal signs.

Since the hyphen character ("-") is represented as itself in the
Quoted-Printable encoding, care must be taken, when encapsulating a
quoted-printable encoded body in a multipart entity, to ensure that the
encapsulation boundary does not appear anywhere in the encoded body.  (A
good strategy is to choose a boundary that includes a character sequence
such as "=_" which can never appear in a quoted-printable body.  See the
definition of multipart messages later in this document.)

NOTE:  The quoted-printable encoding represents something of a
compromise between readability and reliability in transport.  Bodies
encoded with the quoted-printable encoding will work reliably over most
mail gateways, but may not work perfectly over a few gateways, notably
those involving translation into EBCDIC.  (In theory, an EBCDIC gateway
could decode a quoted-printable body and re-encode it using base64, but
such gateways do not yet exist.)  A higher level of confidence is
offered by the base64 Content-Transfer-Encoding.  A way to get
reasonably reliable transport through EBCDIC gateways is to also quote
the ASCII characters

     !"#$@[\]^`{|}~

according to rule #1.  See Appendix B for more information.

Because quoted-printable data is generally assumed to be line-oriented,
it is to be expected that the breaks between the lines of quoted
printable data may be altered in transport, in the same manner that
plain text mail has always been altered in Internet mail when passing
between systems with differing newline conventions.  If such alterations
are likely to constitute a corruption of the data, it is probably more
sensible to use the base64 encoding rather than the quoted-printable
encoding.
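
The decoding side implied by these rules can be sketched as well. This is a minimal illustration under assumptions, not a complete decoder: qp_decode is a hypothetical name, the body is assumed to use CRLF line breaks, and malformed input (such as a stray '=' in the middle of a line) is passed through untouched. It strips trailing white space as Rule #3 requires, treats a final '=' as a soft line break (Rule #5), and accepts lowercase hexadecimal digits as Rule #1 allows a robust implementation to do.

```python
import re

# '=' followed by two hexadecimal digits; lowercase is tolerated on receipt.
_HEX_PAIR = re.compile(rb"=([0-9A-Fa-f]{2})")


def qp_decode(body: bytes) -> bytes:
    """Decode a Quoted-Printable body whose line breaks are CRLF."""
    lines = body.split(b"\r\n")
    out = []
    for i, line in enumerate(lines):
        # Rule #3: trailing white space was added in transit; drop it.
        line = line.rstrip(b" \t")
        # Rule #5: a trailing '=' is a soft line break, not data.
        soft = line.endswith(b"=")
        if soft:
            line = line[:-1]
        # Rule #1: turn each =XX back into the octet it stands for.
        out.append(_HEX_PAIR.sub(lambda m: bytes([int(m.group(1), 16)]), line))
        # Rule #4: a hard line break is restored as CRLF.
        if not soft and i < len(lines) - 1:
            out.append(b"\r\n")
    return b"".join(out)


rfc_example = (
    b"Now's the time =\r\n"
    b"for all folk to come=\r\n"
    b" to the aid of their country."
)
restored = qp_decode(rfc_example)
```

Applied to the RFC's own example, the three encoded lines are joined back into the single raw line.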