Unix Programming Frequently Asked Questions - Part V

5. Miscellaneous programming

5.1 How do I compare strings using wildcards?

The answer to that depends on what exactly you mean by `wildcards'.

There are two quite different concepts that qualify as `wildcards'. They are:

Filename patterns
These are what the shell uses for filename expansion (`globbing').
Regular Expressions
These are used by editors, grep, etc. for matching text, but they normally aren't applied to filenames.

5.1.1 How do I compare strings using filename patterns?

Unless you are unlucky, your system should have a function fnmatch() to do filename matching. This generally allows only the Bourne shell style of pattern; i.e. it recognises `*', `[...]' and `?', but probably won't support the more arcane patterns available in the Korn and Bourne-Again shells.

If you don't have this function, then rather than reinvent the wheel, you are probably better off snarfing a copy from the BSD or GNU sources.
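Where fnmatch() is available, a call might look like the following minimal sketch. The wrapper name matches_pattern() is made up for illustration; only fnmatch() and FNM_NOMATCH come from <fnmatch.h>. Passing 0 for the flags argument means `*' and `?' are also allowed to match `/' and a leading `.', which may not be what you want for real pathnames.

```c
#include <fnmatch.h>

/* Hypothetical helper (not part of any library): returns 1 if NAME
 * matches the shell-style PATTERN, 0 if it does not, and -1 if
 * fnmatch() reports an error.
 */
int matches_pattern(const char *pattern, const char *name)
{
    int rc = fnmatch(pattern, name, 0);   /* 0 = no special flags */

    if (rc == 0)
        return 1;
    if (rc == FNM_NOMATCH)
        return 0;
    return -1;
}
```

For example, matches_pattern("*.c", "main.c") should report a match, while matches_pattern("*.c", "main.h") should not. Flags such as FNM_PATHNAME and FNM_PERIOD tighten the matching to behave more like the shell does for filenames.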

Also, for the common cases of matching actual filenames, look for glob(), which will find all existing files matching a pattern.
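As a rough illustration of glob(), here is a sketch that prints every existing pathname matching a pattern. The function name list_matches() is invented for this example; glob(), globfree(), glob_t and GLOB_NOMATCH are the standard <glob.h> interface.

```c
#include <glob.h>
#include <stdio.h>

/* Hypothetical helper: prints each existing pathname that matches
 * PATTERN, one per line.  Returns the number of matches, 0 if there
 * were none, or -1 if glob() failed for some other reason.
 */
int list_matches(const char *pattern)
{
    glob_t g;
    size_t i;
    int count;
    int rc = glob(pattern, 0, NULL, &g);

    if (rc == GLOB_NOMATCH)
        return 0;
    if (rc != 0)
    {
        globfree(&g);      /* release any partial results */
        return -1;
    }

    for (i = 0; i < g.gl_pathc; i++)
        puts(g.gl_pathv[i]);

    count = (int) g.gl_pathc;
    globfree(&g);
    return count;
}
```

Calling list_matches("/etc/*.conf"), say, would list whatever files of that form exist on the system; results come back sorted unless GLOB_NOSORT is given.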

5.1.2 How do I compare strings using regular expressions?

There are a number of slightly different syntaxes for regular expressions; most systems use at least two: the one recognised by ed, sometimes known as `Basic Regular Expressions', and the one recognised by egrep, `Extended Regular Expressions'. Perl has its own slightly different flavour, as does Emacs.

To support this multitude of formats, there is a corresponding multitude of implementations. Systems will generally have regexp-matching functions (usually regcomp() and regexec()) supplied, but be wary; some systems have more than one implementation of these functions available, with different interfaces. In addition, there are many library implementations available. (It's common, BTW, for regexps to be compiled to an internal form before use, on the assumption that you may compare several separate strings against the same regexp.)

One library available for this is the `rx' library, available from the GNU mirrors. This seems to be under active development, which may be a good or a bad thing depending on your point of view :-)
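To make the compile-once, match-many idea concrete, here is a small sketch using the POSIX regcomp()/regexec() interface, assuming your system provides that flavour of <regex.h>. The function name count_matching() is invented for the example. It compiles an Extended Regular Expression once and then tests several strings against it.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: compiles PATTERN once as an Extended Regular
 * Expression, then counts how many of the N strings match it.
 * Returns -1 if the pattern fails to compile.
 */
int count_matching(const char *pattern, const char *const *strings, size_t n)
{
    regex_t re;
    size_t i;
    int count = 0;

    /* REG_NOSUB: we only want yes/no answers, not match offsets */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    for (i = 0; i < n; i++)
        if (regexec(&re, strings[i], 0, NULL, 0) == 0)
            count++;

    regfree(&re);          /* release the compiled form */
    return count;
}
```

Dropping REG_EXTENDED from the regcomp() flags selects Basic Regular Expression syntax instead. If you need to report why a pattern failed to compile, regerror() will turn the return value of regcomp() into a message.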

5.2 What's the best way to send mail from a program?

There are several ways to send email from a Unix program. Which is the best method to use in a given situation varies, so I'll present two of them. A third possibility, not covered here, is to connect to a local SMTP port (or a smarthost) and use SMTP directly; see RFC 821.

5.2.1 The simple method: /bin/mail

For simple applications, it may be sufficient to invoke mail (usually `/bin/mail', but could be `/usr/bin/mail' on some systems).

WARNING: Some versions of UCB Mail may execute commands prefixed by `~!' or `~|' given in the message body even in non-interactive mode. This can be a security risk.

Invoked as `mail -s 'subject' recipients...' it will take a message body on standard input, and supply a default header (including the specified subject), and pass the message to sendmail for delivery.

This example mails a test message to root on the local system:

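    #include <stdio.h>
    #include <stdlib.h>   /* for exit() */

    #define MAILPROG "/bin/mail"

    int main()
    {
        /* open a pipe to the mailer; what we write becomes the body */
        FILE *mail = popen(MAILPROG " -s 'Test Message' root", "w");
        if (!mail)
        {
            perror("popen");
            exit(1);
        }

        fprintf(mail, "This is a test.\n");

        /* pclose() returns the mailer's exit status */
        if (pclose(mail))
        {
            fprintf(stderr, "mail failed!\n");
            exit(1);
        }

        return 0;
    }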

If the text to be sent is already in a file, then one can do:

    system(MAILPROG " -s 'file contents' root </tmp/filename");

These methods can be extended to more complex cases, but there are many pitfalls to watch out for:

  • If using system() or popen(), you must be very careful about quoting arguments to protect them from filename expansion or word splitting
  • Constructing command lines from user-specified data is a common source of buffer-overrun errors and other security holes
  • This method does not allow for CC: or BCC: recipients to be specified (some versions of /bin/mail may allow this, some do not)

5.2.2 Invoking the MTA directly: /usr/lib/sendmail

The mail program is an example of a Mail User Agent, a program intended to be invoked by the user to send and receive mail, but which does not handle the actual transport. A program for transporting mail is called an MTA, and the most commonly found MTA on Unix systems is called sendmail. There are other MTAs in use, such as MMDF, but these generally include a program that emulates the usual behaviour of sendmail.

Historically, sendmail has usually been found in `/usr/lib', but the current trend is to move library programs out of `/usr/lib' into directories such as `/usr/sbin' or `/usr/libexec'. As a result, one normally invokes sendmail by its full path, which is system-dependent.

To understand how sendmail behaves, it's useful to understand the concept of an envelope. This is very much like paper mail; the envelope defines who the message is to be delivered to, and who it is from (for the purpose of reporting errors). Contained in the envelope are the headers, and the body, separated by a blank line. The format of the headers is specified primarily by RFC 822; see also the MIME RFCs.

There are two main ways to use sendmail to originate a message: either the envelope recipients can be explicitly supplied, or sendmail can be instructed to deduce them from the message headers. Both methods have advantages and disadvantages.

5.2.2.1 Supplying the envelope explicitly

The recipients of a message can simply be specified on the command line. This has the drawback that mail addresses can contain characters that give system() and popen() considerable grief, such as single quotes, quoted strings etc. Passing these constructs successfully through shell interpretation presents pitfalls. (One can do it by replacing any single quotes by the sequence single-quote backslash single-quote single-quote, then surrounding the entire address with single quotes. Ugly, huh?)
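For the curious, the quoting trick just described can be sketched as follows. The function name shell_quote() is invented for this example; it is not a library routine, and it only covers quoting for Bourne-style shells.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc()ed copy of S wrapped in
 * single quotes, with each embedded single quote replaced by the
 * four characters '\'' so that /bin/sh sees the original string as
 * one word.  Returns NULL if allocation fails; the caller must
 * free() the result.
 */
char *shell_quote(const char *s)
{
    size_t len = 2;        /* opening and closing quote */
    const char *p;
    char *out, *q;

    /* first pass: work out how much space is needed */
    for (p = s; *p; p++)
        len += (*p == '\'') ? 4 : 1;

    out = malloc(len + 1);
    if (!out)
        return NULL;

    /* second pass: copy, expanding single quotes */
    q = out;
    *q++ = '\'';
    for (p = s; *p; p++)
    {
        if (*p == '\'')
        {
            memcpy(q, "'\\''", 4);
            q += 4;
        }
        else
            *q++ = *p;
    }
    *q++ = '\'';
    *q = '\0';

    return out;
}
```

The result can then be pasted into a command line built for system() or popen(). Sizing the buffer from the input, as above, also sidesteps the fixed-size-buffer overruns that hand-built command lines tend to attract.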

Some of this unpleasantness can be avoided by eschewing the use of system() or popen(), and resorting to fork() and exec() directly. This is sometimes necessary in any event; for example, user-installed handlers for SIGCHLD will usually break pclose() to a greater or lesser extent.

Here's an example:

    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>
    #include <stdlib.h>
    #include <string.h>   /* for memcpy() */
    #include <fcntl.h>
    #include <sysexits.h>
    /* #include <paths.h> if you have it */

    #ifndef _PATH_SENDMAIL
    #define _PATH_SENDMAIL "/usr/lib/sendmail"
    #endif

    /* -oi means "don't treat . as a message terminator"
     * remove ,"--" if using a pre-V8 sendmail (and hope that no-one
     * ever uses a recipient address starting with a hyphen)
     * you might wish to add -oem (report errors by mail)
     */
    #define SENDMAIL_OPTS "-oi","--"

    /* this is a macro for returning the number of elements in array */
    #define countof(a) ((sizeof(a))/sizeof((a)[0]))

    /* defined elsewhere; the prototype is assumed here */
    void closeall(int fd);

    /* send the contents of the file open for reading on FD to the
     * specified recipients; the file is assumed to contain RFC822 headers
     * & body, the recipient list is terminated by a NULL pointer; returns
     * -1 if error detected, otherwise the return value from sendmail
     * (which uses <sysexits.h> to provide meaningful exit codes)
     */
    int send_message(int fd, const char **recipients)
    {
        static const char *argv_init[] = { _PATH_SENDMAIL, SENDMAIL_OPTS };
        const char **argvec = NULL;
        int num_recip = 0;
        pid_t pid;
        int rc;
        int status;

        /* count number of recipients */
        while (recipients[num_recip])
            ++num_recip;

        if (!num_recip)
            return 0;    /* sending to no recipients is successful */

        /* alloc space for argument vector */
        argvec = malloc(sizeof(char*) * (num_recip+countof(argv_init)+1));
        if (!argvec)
            return -1;

        /* initialise argument vector */
        memcpy(argvec, argv_init, sizeof(argv_init));
        memcpy(argvec+countof(argv_init),
               recipients, num_recip*sizeof(char*));
        argvec[num_recip + countof(argv_init)] = NULL;

        /* may need to add some signal blocking here. */

        /* fork */
        switch (pid = fork())
        {
        case 0:   /* child */
            /* Plumbing */
            if (fd != STDIN_FILENO)
                dup2(fd, STDIN_FILENO);

            /* defined elsewhere -- closes all FDs >= argument */
            closeall(3);

            /* go for it: (the cast drops const to suit execv's prototype) */
            execv(_PATH_SENDMAIL, (char **) argvec);
            _exit(EX_OSFILE);

        default:  /* parent */
            free(argvec);
            rc = waitpid(pid, &status, 0);
            if (rc < 0)
                return -1;
            if (WIFEXITED(status))
                return WEXITSTATUS(status);
            return -1;

        case -1:  /* error */
            free(argvec);
            return -1;
        }
    }

5.2.2.2 Allowing sendmail to deduce the recipients

The `-t' option to sendmail instructs sendmail to parse the headers of the message, and use all the recipient-type headers (i.e. To:, Cc: and Bcc:) to construct the list of envelope recipients. This has the advantage of simplifying the sendmail command line, but makes it impossible to specify recipients other than those listed in the headers. (This is not usually a problem.)

As an example, here's a program to mail a file on standard input to specified recipients as a MIME attachment. Some error checks have been omitted for brevity. This requires the `mimencode' program from the metamail distribution.

    #include <stdio.h>
    #include <stdlib.h>   /* for exit() and atexit() */
    #include <unistd.h>
    #include <fcntl.h>
    /* #include <paths.h> if you have it */

    #ifndef _PATH_SENDMAIL
    #define _PATH_SENDMAIL "/usr/lib/sendmail"
    #endif

    #define SENDMAIL_OPTS "-oi"
    #define countof(a) ((sizeof(a))/sizeof((a)[0]))

    char tfilename[L_tmpnam];
    char command[128+L_tmpnam];

    /* remove the temporary file on exit */
    void cleanup(void)
    {
        unlink(tfilename);
    }

    int main(int argc, char **argv)
    {
        FILE *msg;
        int i;

        if (argc < 2)
        {
            fprintf(stderr, "usage: %s recipients...\n", argv[0]);
            exit(2);
        }

        if (tmpnam(tfilename) == NULL
            || (msg = fopen(tfilename,"w")) == NULL)
            exit(2);
        atexit(cleanup);

        /* reopen in append mode, so mimencode's output lands after ours */
        fclose(msg);
        msg = fopen(tfilename,"a");
        if (!msg)
            exit(2);

        /* construct recipient list */
        fprintf(msg, "To: %s", argv[1]);
        for (i = 2; i < argc; i++)
            fprintf(msg, ",\n\t%s", argv[i]);
        fputc('\n',msg);

        /* Subject */
        fprintf(msg, "Subject: file sent by mail\n");

        /* sendmail can add its own From:, Date:, Message-ID: etc. */

        /* MIME stuff */
        fprintf(msg, "MIME-Version: 1.0\n");
        fprintf(msg, "Content-Type: application/octet-stream\n");
        fprintf(msg, "Content-Transfer-Encoding: base64\n");

        /* end of headers -- insert a blank line */
        fputc('\n',msg);
        fclose(msg);

        /* invoke encoding program */
        sprintf(command, "mimencode -b >>%s", tfilename);
        if (system(command))
            exit(1);

        /* invoke mailer */
        sprintf(command, "%s %s -t <%s",
                _PATH_SENDMAIL, SENDMAIL_OPTS, tfilename);
        if (system(command))
            exit(1);

        return 0;
    }