5.4 The extract_addr function: parsing mail addresses

Source: Internet. Published by: 电子商务域名. Editor: 程序博客网. Time: 2024/06/05 07:40

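By default, Postfix accepts mail addresses in RFC 822 format and does not force clients to supply addresses in RFC 821 format. The form we usually see is zhangsan@163.com. The address format defined by RFC 822 is far more complex, however; all of the following are valid: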

         TheBoss<zhangsan@163.com>

         "TheBoss"<zhangsan@163.com>

         zhangsan@163.com(TheBoss)

 

         The following is the raw source of a message as viewed in Foxmail (see Figure 5-4):


Figure 5-4 Addresses in the raw source of a message

 

To parse mail addresses, Postfix defines the TOK822 structure:

/*
 * Internal address representation: a token tree.
 */
typedef struct TOK822 {
    int     type;               /* token value, see below */
    VSTRING *vstr;              /* token contents */
    struct TOK822 *prev;        /* peer */
    struct TOK822 *next;        /* peer */
    struct TOK822 *head;        /* group members */
    struct TOK822 *tail;        /* group members */
    struct TOK822 *owner;       /* group owner */
} TOK822;


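         The type field gives the node type and the vstr field holds the node's value; the remaining fields are links that build the tree. Because mail addresses can take complex forms, several node types are defined: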

/*
 * Token values for multi-character objects. Single-character operators are
 * represented by their own character value.
 */
#define TOK822_MINTOK   256
#define TOK822_ATOM     256     /* non-special character sequence */
#define TOK822_QSTRING  257     /* stuff between "", not nesting */
#define TOK822_COMMENT  258     /* comment including (), may nest */
#define TOK822_DOMLIT   259     /* stuff between [] not nesting */
#define TOK822_ADDR     260     /* actually a token group */
#define TOK822_STARTGRP 261     /* start of named group */
#define TOK822_MAXTOK   261

/global/tok822_parse.c, the file that contains the tok822_parse function, includes a main function for unit testing. Running it shows that the address "zhangsan"zhangsan@163.com is organized into the following tree (see Figure 5-5):


Figure 5-5 Test result for the extract_addr function

 

The type of this tree is address, that is, the macro TOK822_ADDR.

 

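         Users rarely hand complex mail addresses to a mail server from the command line, but MUA software may do so. The extract_addr function therefore has to extract the real mail address from every address form that may appear: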

/smtpd/smtpd.c
2122 /* extract_addr - extract address from rubble */
2123
2124 static int extract_addr(SMTPD_STATE *state, SMTPD_TOKEN *arg,
2125                                 int allow_empty_addr, int strict_rfc821,
2126                                 int smtputf8)
2127 {
2128     const char *myname = "extract_addr";
2129     TOK822 *tree;
2130     TOK822 *tp;
2131     TOK822 *addr = 0;
2132     int     naddr;
2133     int     non_addr;
2134     int     err = 0;
2135     char   *junk = 0;
2136     char   *text;
2137     char   *colon;
2138
2139     /*
2140      * Special case.
2141      */
2142 #define PERMIT_EMPTY_ADDR       1
2143 #define REJECT_EMPTY_ADDR       0
2144
2145     /*
2146      * Some mailers send RFC822-style address forms (with comments and such)
2147      * in SMTP envelopes. We cannot blame users for this: the blame is with
2148      * programmers violating the RFC, and with sendmail for being permissive.
2149      *
2150      * XXX The SMTP command tokenizer must leave the address in externalized
2151      * (quoted) form, so that the address parser can correctly extract the
2152      * address from surrounding junk.
2153      *
2154      * XXX We have only one address parser, written according to the rules of
2155      * RFC 822. That standard differs subtly from RFC 821.
2156      */
2157     if (msg_verbose)
2158         msg_info("%s: input: %s", myname, STR(arg->vstrval));
2159     if (STR(arg->vstrval)[0] == '<'
2160         && STR(arg->vstrval)[LEN(arg->vstrval) - 1] == '>') {
2161         junk = text = mystrndup(STR(arg->vstrval) + 1, LEN(arg->vstrval) - 2);
2162     } else
2163         text = STR(arg->vstrval);

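2159-2163 A client may supply one of two kinds of address: an RFC 821 conforming address enclosed in angle brackets, or an address that does not conform to RFC 821. In the first case we copy the text between the angle brackets (the copy is remembered in junk so that it can be freed later); in the second case we use the argument as is for now.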
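The bracket handling on lines 2159-2163 can be tried outside Postfix. The following is a minimal sketch that uses only the C standard library: strip_angle_brackets is a hypothetical helper name, and malloc stands in for Postfix's mystrndup, so this illustrates the idea and is not the Postfix code itself.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not part of Postfix: return a newly allocated copy
 * of s with one enclosing <...> pair removed, or a plain copy when s is
 * not bracketed. The caller frees the result. Returns NULL if malloc fails.
 */
static char *strip_angle_brackets(const char *s)
{
    size_t len;
    char *copy;

    assert(s != NULL);
    len = strlen(s);
    /* Same test as lines 2159-2160: first char '<' and last char '>'. */
    if (len >= 2 && s[0] == '<' && s[len - 1] == '>') {
        s += 1;                 /* skip '<' */
        len -= 2;               /* drop '<' and '>' */
    }
    copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, s, len);
    copy[len] = '\0';
    return copy;
}
```

Unlike the original, this sketch always returns a fresh copy, which keeps ownership simple at the cost of one extra allocation for unbracketed input.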

 

2164
2165     /*
2166      * Truncate deprecated route address form.
2167      */
2168     if (*text == '@' && (colon = strchr(text, ':')) != 0)
2169         text = colon + 1;

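2168-2169 Drop the deprecated route form: if the text starts with '@' and contains a ':', only the part after the first colon is kept.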
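The truncation on lines 2168-2169 is small enough to isolate. The sketch below wraps the same test in a hypothetical function named skip_route; the host name relay.example in the usage note is made up for illustration.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper mirroring lines 2168-2169: when text starts with
 * '@' and contains a ':', return a pointer just past the first colon;
 * otherwise return text unchanged. No memory is allocated.
 */
static const char *skip_route(const char *text)
{
    const char *colon;

    assert(text != NULL);
    if (*text == '@' && (colon = strchr(text, ':')) != NULL)
        return colon + 1;
    return text;
}
```

For example, skip_route("@relay.example:zhangsan@163.com") points at "zhangsan@163.com", while a string that starts with '@' but has no colon is returned untouched.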

 

2170    tree = tok822_parse(text);

 

2170 Parse the address into a TOK822 tree.

2171
2172     if (junk)
2173         myfree(junk);
2174
2175     /*
2176      * Find trouble.
2177      */
2178     for (naddr = non_addr = 0, tp = tree; tp != 0; tp = tp->next) {
2179         if (tp->type == TOK822_ADDR) {
2180             addr = tp;
2181             naddr += 1;                         /* count address forms */
2182         } else if (tp->type == '<' || tp->type == '>') {
2183              /* void */ ;                       /* ignore brackets */
2184         } else {
2185             non_addr += 1;                      /* count non-address forms */
2186         }
2187     }


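2178-2187 Walk the top-level nodes of the tree, remember the address node, and count the address nodes and the non-address nodes. Angle brackets are counted as neither.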
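The counting loop on lines 2178-2187 can be reproduced with a stripped-down node type. In this sketch struct top_tok, struct scan_result and scan_top_level are hypothetical names; only the type and next fields of TOK822 are modelled, and the two enum values are copied from the TOK822_ATOM and TOK822_ADDR macros.

```c
#include <assert.h>
#include <stddef.h>

/* Values copied from the TOK822_ATOM and TOK822_ADDR macros. */
enum { TOK_ATOM = 256, TOK_ADDR = 260 };

/* Simplified stand-in for TOK822: only what the loop needs. */
struct top_tok {
    int type;                   /* character value, or a code >= 256 */
    struct top_tok *next;       /* next top-level peer */
};

struct scan_result {
    int naddr;                  /* number of address nodes */
    int non_addr;               /* number of other, non-bracket nodes */
    const struct top_tok *addr; /* last address node seen, or NULL */
};

static struct scan_result scan_top_level(const struct top_tok *tree)
{
    struct scan_result r = {0, 0, NULL};
    const struct top_tok *tp;

    for (tp = tree; tp != NULL; tp = tp->next) {
        assert(tp->type > 0);   /* token types are always positive */
        if (tp->type == TOK_ADDR) {
            r.addr = tp;
            r.naddr += 1;
        } else if (tp->type == '<' || tp->type == '>') {
            /* brackets count as neither address nor junk */
        } else {
            r.non_addr += 1;
        }
    }
    return r;
}
```

A list of the shape (other token, '<', address, '>') yields naddr == 1 and non_addr == 1, which is the case that strict RFC 821 mode later rejects.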

2188
2189     /*
2190      * Report trouble. XXX Should log a warning only if we are going to
2191      * sleep+reject so that attackers can't flood our logfiles.
2192      *
2193      * XXX Unfortunately, the sleep-before-reject feature had to be abandoned
2194      * (at least for small error counts) because servers were DOS-ing
2195      * themselves when flooded by backscatter traffic.
2196      */
2197     if (naddr > 1
2198         || (strict_rfc821 && (non_addr || *STR(arg->vstrval) != '<'))) {
2199         msg_warn("Illegal address syntax from %s in %s command: %s",
2200                  state->namaddr, state->where,
2201                  printable(STR(arg->vstrval), '?'));
2202         err = 1;
2203     }
2204
2205     /*
2206      * Don't overwrite the input with the extracted address. We need the
2207      * original (external) form in case the client does not send ORCPT
2208      * information; and error messages are more accurate if we log the
2209      * unmodified form. We need the internal form for all other purposes.
2210      */
2211     if (addr)
2212         tok822_internalize(state->addr_buf, addr->head, TOK822_STR_DEFL);
2213     else
2214         vstring_strcpy(state->addr_buf, "");

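2211-2214 The tok822_internalize function converts the address tree into a string, which is stored in the addr_buf field of SMTPD_STATE. The original address supplied by the client is still needed, so the parsed address goes into addr_buf instead of overwriting the original.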
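To make the tree-to-string step on lines 2211-2214 concrete, here is a much simplified sketch of flattening the members of an address node into a buffer. It is not tok822_internalize: struct member_tok and flatten are hypothetical names, a fixed char buffer replaces VSTRING, and quoting, comments and domain literals are ignored. It also assumes the local part and the domain arrive as separate atom and single-character tokens, as in the test data below.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { MEMBER_ATOM = 256 };     /* same value as TOK822_ATOM */

/* Simplified stand-in for a group member of a TOK822_ADDR node. */
struct member_tok {
    int type;                   /* character value, or a code >= 256 */
    const char *text;           /* contents of multi-character tokens */
    struct member_tok *next;
};

/*
 * Hypothetical helper: concatenate all members into buf (capacity cap,
 * including the terminator). Returns 0 on success, -1 if buf is too small.
 */
static int flatten(const struct member_tok *head, char *buf, size_t cap)
{
    size_t used = 0;
    const struct member_tok *tp;

    assert(buf != NULL && cap > 0);
    buf[0] = '\0';
    for (tp = head; tp != NULL; tp = tp->next) {
        char single[2] = {0, 0};
        const char *piece;
        size_t n;

        if (tp->type >= 256) {
            piece = tp->text;           /* atom: copy its contents */
        } else {
            single[0] = (char) tp->type;
            piece = single;             /* operator: the character itself */
        }
        assert(piece != NULL);
        n = strlen(piece);
        if (used + n + 1 > cap)
            return -1;
        memcpy(buf + used, piece, n + 1);   /* includes the terminator */
        used += n;
    }
    return 0;
}
```

Writing into a separate output buffer mirrors the point made by the source comment: the client's original string is never modified.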

2215
2216     /*
2217      * Report trouble. XXX Should log a warning only if we are going to
2218      * sleep+reject so that attackers can't flood our logfiles. Log the
2219      * original address.
2220      */
2221     if (err == 0)
2222         if ((STR(state->addr_buf)[0] == 0 && !allow_empty_addr)
2223             || (strict_rfc821 && STR(state->addr_buf)[0] == '@')
2224             || (SMTPD_STAND_ALONE(state) == 0
2225                 && smtpd_check_addr(STR(state->addr_buf), smtputf8) != 0)) {
2226             msg_warn("Illegal address syntax from %s in %s command: %s",
2227                      state->namaddr, state->where,
2228                      printable(STR(arg->vstrval), '?'));
2229             err = 1;
2230         }


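2221-2230 If no error has been found so far, the parsed address is rejected in three cases: it is empty and empty addresses are not allowed; strict RFC 821 mode is on and it still starts with '@'; or smtpd is not running stand-alone and smtpd_check_addr reports a problem with it.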
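The two "Report trouble" tests in the listing (lines 2197-2198 and 2222-2223) are plain boolean conditions, so they can be written as small predicates. This is a sketch under assumptions: struct addr_facts, tree_is_illegal and result_is_illegal are hypothetical names, and the smtpd_check_addr call on lines 2224-2225 is deliberately left out because it depends on Postfix internals.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of what extract_addr has gathered so far. */
struct addr_facts {
    int naddr;              /* number of address nodes found */
    int non_addr;           /* number of non-address nodes found */
    const char *original;   /* argument exactly as the client sent it */
    const char *parsed;     /* internal form, "" when no address was found */
};

/* First test (lines 2197-2198): trouble visible in the token tree. */
static bool tree_is_illegal(const struct addr_facts *f, bool strict_rfc821)
{
    assert(f != NULL && f->original != NULL);
    return f->naddr > 1
        || (strict_rfc821 && (f->non_addr != 0 || f->original[0] != '<'));
}

/* Second test (lines 2222-2223), without the smtpd_check_addr call. */
static bool result_is_illegal(const struct addr_facts *f,
                              bool allow_empty_addr, bool strict_rfc821)
{
    assert(f != NULL && f->parsed != NULL);
    return (f->parsed[0] == '\0' && !allow_empty_addr)
        || (strict_rfc821 && f->parsed[0] == '@');
}
```

With these predicates, an input such as TheBoss<zhangsan@163.com> (one address node, one non-address node) passes in permissive mode and fails tree_is_illegal only when strict_rfc821 is set.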

2231
2232     /*
2233      * Cleanup.
2234      */
2235     tok822_free_tree(tree);


2235 Free the TOK822 tree.

 

2236     if (msg_verbose)
2237         msg_info("%s: in: %s, result: %s",
2238                  myname, STR(arg->vstrval), STR(state->addr_buf));
2239     return (err);
2240 }

