Writing your own FreeBSD kernel firewall module


Hooking Filters for Fun and Profit: PFIL_HOOKS

Murat Balaban

$Id: article.sgml,v 1.15 2005/11/25 20:07:04 murat Exp $

FreeBSD is a registered trademark of the FreeBSD Foundation.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this document, and the FreeBSD Project was aware of the trademark claim, the designations have been followed by the “™” or the “®” symbol.

This article documents how to use the PFIL_HOOKS mechanism to add packet filtering modules to your kernel without touching a single line of the code already present in the networking subsystems of the BSD family of operating systems.


1. Preface

The traditional way for a packet filter to hook itself into the network I/O subsystem of an operating system has been to modify the relevant kernel code and insert function calls that hand packets to the filter for inspection and/or transformation. Packet filters were therefore hard-wired into the kernel, and many modifications to the existing protocol code were necessary. In the old days Darren Reed, the author of ipfilter, one of the most famous open source packet filters, had to patch the IP protocol stack (mainly ip_input.c and ip_output.c) of NetBSD, OpenBSD and FreeBSD so that his program could get at the IP packets.


2. History

Obviously this was not efficient, and there had to be better ways to do this kind of hooking. This is why the BSD people came up with the idea of "packet filter hooks", namely PFIL_HOOKS. The framework was originally designed and developed by one of the NetBSD committers, Matthew R. Green, and was committed to NetBSD -current on Sept 14, 1996 [1]. The main motivation was to place IPFILTER into the IP input/output flow, but the design aims to provide a generic, fast and portable way to add extra packet processing code to the kernel with very little overhead.

Since PFIL_HOOKS was originally intended for ipfilter, some details prevented it from being truly "generic". When the hooks were invoked, the hook function found some parts of the IP header in network byte order and other parts in host byte order. Although the goal was to make the framework as generic as possible in theory, this was not true in practice: the mechanism did not actually provide hook points for protocols other than IP. Jason Thorpe, one of the NetBSD committers, suggested changes, and after a little discussion [2] they were approved. Jason made the following comments in the CVS commit for the changes he made on Nov 11, 2000 [3]:

        Restructure the PFIL_HOOKS mechanism a bit:
        - All packets are passed to PFIL_HOOKS as they come off the wire, i.e. fields in protocol
          headers are in network byte order, etc.
        - Allow for multiple hooks to be registered, using a "key", and a "dlt".
          The "dlt" is a BPF data link type, indicating what type of header is present.
        - INET and INET6 register with key == AF_INET or AF_INET6, and
          dlt == DLT_RAW.
        - PFIL_HOOKS now take an argument for the filter hook, and mbuf **, and ifnet *, and a
          direction (PFIL_IN, PFIL_OUT), thus making them less IP (really, IP Filter) centric.

        Maintain compatibility with IP Filter by adding wrapper functions for IP Filter.


This was a major milestone. These updates made PFIL_HOOKS much more generic than before: it gained the ability to support other network protocols, its API became less IP-specific, and from then on multiple packet filters could be added for a specific protocol. FreeBSD adopted PFIL_HOOKS when Darren Reed converted his ipfilter to use PFIL_HOOKS in NetBSD and ported it to FreeBSD -current on May 10, 2000 [4]. Later, on Sep 22, 2004, PFIL_HOOKS was MFC'ed to -STABLE and made a permanent part of the kernel by Andre Oppermann [6]. Fine-grained locking was introduced with FreeBSD 5.2. Darren also suggested that OpenBSD adopt the work [5]; however, it turns out that the OpenBSD people did not find it interesting enough to add to their CVS repo. At the time of writing, I could not find pfil.c in the OpenBSD kernel sources.


3. Internals

Packet filtering points are registered with the pfil_head_register(9) function. This is usually done at the entry and exit points of a given protocol, e.g. IP. For example, the IP protocol registers a packet filter head (packet filtering point) with type PFIL_TYPE_AF and an address family of AF_INET (IPv4):

        /* Initialize packet filter hooks. */
        inet_pfil_hook.ph_type = PFIL_TYPE_AF;
        inet_pfil_hook.ph_af = AF_INET;
        if ((i = pfil_head_register(&inet_pfil_hook)) != 0)
FreeBSD 7.0-CURRENT /usr/src/sys/netinet/ip_input.c:ip_init [7]

The same code can be seen in the IPv6 protocol code: /usr/src/sys/netinet6/ip6_input.c:ip6_init [8]

Packet filtering points registered by protocol families are stored in a singly linked list. Each element of the list holds a tail queue of packet filter functions for the IN direction and another for the OUT direction, along with a pfil type and an address family. What the code above does is add the pfil_head structure to this linked list of filtering points.
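A tail queue can be walked in either direction, which is what allows inbound and outbound processing to visit the same hooks in opposite order. The following is a minimal userspace sketch of that idea using the TAILQ macros from sys/queue.h. The names demo_hook, demo_walk_in and demo_walk_out are hypothetical and are not part of the pfil API; the kernel's own bookkeeping is more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical hook record; the kernel's real structure differs. */
struct demo_hook {
    int id;                          /* stands in for a filter function */
    TAILQ_ENTRY(demo_hook) link;
};

TAILQ_HEAD(demo_hook_list, demo_hook);

/* Record hook ids in head-to-tail order (the "inbound" walk). */
static size_t
demo_walk_in(struct demo_hook_list *list, int *out, size_t max)
{
    struct demo_hook *h;
    size_t n = 0;

    assert(list != NULL && out != NULL);
    TAILQ_FOREACH(h, list, link) {
        if (n < max)
            out[n++] = h->id;
    }
    return n;
}

/* Record hook ids in tail-to-head order (the "outbound" walk). */
static size_t
demo_walk_out(struct demo_hook_list *list, int *out, size_t max)
{
    struct demo_hook *h;
    size_t n = 0;

    assert(list != NULL && out != NULL);
    TAILQ_FOREACH_REVERSE(h, list, demo_hook_list, link) {
        if (n < max)
            out[n++] = h->id;
    }
    return n;
}
```

With hooks 1, 2 and 3 appended in that order, the inbound walk yields 1, 2, 3 and the outbound walk yields 3, 2, 1.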

"The pfil input and output lists were originally implemented as sys/queue.h LIST structures; however this was changed in NetBSD 1.4 to TAILQ structures. This change was to allow the input and output filters to be processed in reverse order, to allow the same path to be taken, in or out of the kernel." [9] [10]

    % grep pfil_head_register */*.c
    netinet/ip_input.c:     if ((i = pfil_head_register(&inet_pfil_hook)) != 0)
    netinet6/ip6_input.c:   if ((i = pfil_head_register(&inet6_pfil_hook)) != 0)
    %


This shows us that the FreeBSD kernel has filtering points registered for the IPv4 and IPv6 protocols. It would, however, be trivial to add filtering points for other protocols, say, TCP or UDP.

The busy_count member of the pfil_head structure shows whether the list is being modified; a value of -1 means that the pfil_head has no filters attached, which lets the protocol skip running filters altogether.

pfil_run_hooks [9] traverses the input (note the PFIL_IN argument) tail queue of inet_pfil_hook (of type struct pfil_head) and runs the filters in it. The IPv4 code in the FreeBSD kernel runs the AF_INET pfil hooks as follows:

        if (pfil_run_hooks(&inet_pfil_hook, &m, m->m_pkthdr.rcvif,
            PFIL_IN, NULL) != 0)
                return;
FreeBSD 7.0-CURRENT /usr/src/sys/netinet/ip_input.c:ip_input [7]
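The control flow of running a chain of hooks can be illustrated with a small userspace sketch. It assumes the behaviour described in this article (a non-zero return stops processing) plus the convention that a hook may consume the packet by setting the pointer to NULL. All names here (demo_pkt, demo_hook, demo_run_hooks, demo_count, demo_limit) are hypothetical stand-ins, not kernel interfaces.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical packet stand-in; a real hook receives struct mbuf **. */
struct demo_pkt {
    size_t len;
};

typedef int (*demo_hook_fn)(void *arg, struct demo_pkt **pp);

struct demo_hook {
    demo_hook_fn fn;
    void *arg;      /* opaque value handed back to fn on every call */
};

/*
 * Call each hook in order. Stop at the first hook that returns non-zero
 * or that consumes the packet by setting *pp to NULL.
 */
static int
demo_run_hooks(const struct demo_hook *hooks, size_t nhooks,
    struct demo_pkt **pp)
{
    int rv = 0;
    size_t i;

    assert(pp != NULL);
    for (i = 0; i < nhooks; i++) {
        rv = hooks[i].fn(hooks[i].arg, pp);
        if (rv != 0 || *pp == NULL)
            break;
    }
    return rv;
}

/* Example hook: add the packet length to the counter passed via arg. */
static int
demo_count(void *arg, struct demo_pkt **pp)
{
    *(size_t *)arg += (*pp)->len;
    return 0;
}

/* Example hook: reject packets longer than the limit passed via arg. */
static int
demo_limit(void *arg, struct demo_pkt **pp)
{
    return ((*pp)->len > *(const size_t *)arg) ? 1 : 0;
}
```

If demo_limit is placed before demo_count, a rejected packet never reaches the counter, which mirrors how an earlier filter can hide traffic from a later one.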

The pfil_head_unregister [9] function deletes any filters added to the head's input and output lists and removes the given packet filter head from the pfil_heads linked list.

pfil_add_hook [9] is used to add filters to a given packet filtering point (pfil_head). This function takes the function to be invoked, a void *arg parameter to be supplied to the filter function, flags (e.g. PFIL_IN / PFIL_OUT) and a pointer to the pfil_head filtering point to which the hook will be added.
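The life cycle of a filtering point (register, look up, unregister) can be sketched in userspace as follows. This is an illustration under assumptions, not the kernel code: it uses the LIST macros from sys/queue.h, the names demo_head, demo_head_register, demo_head_get and demo_head_unregister are hypothetical, and rejecting a duplicate (type, key) pair with EEXIST is a choice made for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical filtering point, identified by a (type, key) pair. */
struct demo_head {
    int type;                       /* kind of head, e.g. "address family" */
    int key;                        /* value within that kind, e.g. AF_INET */
    LIST_ENTRY(demo_head) link;
};

LIST_HEAD(demo_head_list, demo_head);
static struct demo_head_list demo_heads = LIST_HEAD_INITIALIZER(demo_heads);

/* Find a registered head by (type, key); NULL if there is none. */
static struct demo_head *
demo_head_get(int type, int key)
{
    struct demo_head *h;

    LIST_FOREACH(h, &demo_heads, link) {
        if (h->type == type && h->key == key)
            return h;
    }
    return NULL;
}

/* Add a head to the global list; refuse duplicates. */
static int
demo_head_register(struct demo_head *h)
{
    assert(h != NULL);
    if (demo_head_get(h->type, h->key) != NULL)
        return EEXIST;
    LIST_INSERT_HEAD(&demo_heads, h, link);
    return 0;
}

/* Remove a previously registered head from the global list. */
static void
demo_head_unregister(struct demo_head *h)
{
    assert(h != NULL);
    LIST_REMOVE(h, link);
}
```

A filter module only needs the lookup step: it asks for the head of the protocol it cares about and attaches its hooks there.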

The IPv4 input routine uses a simple and efficient scheme for locating the input routine for the receiving protocol of an incoming packet. ip_input uses a 256-element mapping array to map from the protocol number in the ip_proto field of the IP packet to the protocol-switch entry of the receiving protocol. Then, for each protocol with a separate implementation in the system, the corresponding map entry is set to the index of the protocol in the IP protocol switch. When a packet is received, IP simply uses the protocol field to index into the mapping array and calls the input routine of the appropriate protocol. [11]

A new member, pr_pfh, of type struct pfil_head has been added to the protocol switch structure so that protocol filters can be run dynamically, without touching a single line of code in the protocol packet flow.

    struct ipprotosw {
            ...
            struct  pfil_head       pr_pfh;
    };
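The mapping-array scheme is easy to show in a few lines of userspace C. In this sketch the table contents and the names demo_protosw, demo_protox, demo_proto_init and demo_proto_lookup are hypothetical, and routing every unlisted protocol number to entry 0 (a "raw" fallback) is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_NPROTO 256

struct demo_protosw {
    int proto;          /* IP protocol number; 0 marks the fallback entry */
    const char *name;   /* stands in for the protocol's input routine */
};

/* Entry 0 is the fallback for protocols without their own handler. */
static const struct demo_protosw demo_sw[] = {
    { 0,  "raw"  },
    { 1,  "icmp" },
    { 6,  "tcp"  },
    { 17, "udp"  },
};

/* Maps a protocol number to an index into demo_sw. */
static unsigned char demo_protox[DEMO_NPROTO];

static void
demo_proto_init(void)
{
    size_t i;

    /* Every protocol number starts out pointing at the fallback entry. */
    memset(demo_protox, 0, sizeof(demo_protox));
    for (i = 1; i < sizeof(demo_sw) / sizeof(demo_sw[0]); i++) {
        assert(demo_sw[i].proto > 0 && demo_sw[i].proto < DEMO_NPROTO);
        demo_protox[demo_sw[i].proto] = (unsigned char)i;
    }
}

/* One array index and one table index: no searching at packet time. */
static const struct demo_protosw *
demo_proto_lookup(unsigned char proto)
{
    return &demo_sw[demo_protox[proto]];
}
```

The cost of a lookup is constant regardless of how many protocols are registered, which is why the scheme suits the per-packet fast path.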
FreeBSD 7.0-CURRENT /usr/src/sys/netinet/ipprotosw.h [12]

When the protocol's head is looked up with pfil_head_get, the filters attached to that protocol are returned, which makes per-protocol filters possible.


4. Hooking the filter

Now that we have a sufficient understanding of what PFIL_HOOKS is, the ideas behind its design, and a bit of its internals, we can look at live examples of how IP Filter, IPFW and PF hook themselves into the registered IPv4 and IPv6 pfil_heads. Let's start with IPFW:

    static int
    ipfw_hook(void)
    {
            struct pfil_head *pfh_inet;

            if (ipfw_pfil_hooked)
                    return EEXIST;

            pfh_inet = pfil_head_get(PFIL_TYPE_AF, AF_INET);
            if (pfh_inet == NULL)
                    return ENOENT;

            pfil_add_hook(ipfw_check_in, NULL, PFIL_IN | PFIL_WAITOK, pfh_inet);
            pfil_add_hook(ipfw_check_out, NULL, PFIL_OUT | PFIL_WAITOK, pfh_inet);

            return 0;
    }
FreeBSD 7.0-CURRENT /usr/src/sys/netinet/ip_fw_pfil.c [13]

Here ipfw first requests the pfil_head structure for the IPv4 protocol (PFIL_TYPE_AF, AF_INET), then inserts ipfw_check_in for inbound IP traffic and ipfw_check_out for outbound IP traffic.

IPF:

    #   if (__NetBSD_Version__ >= 105110000) || (__FreeBSD_version >= 501108)
            ph_inet = pfil_head_get(PFIL_TYPE_AF, AF_INET);
    #    ifdef USE_INET6
            ph_inet6 = pfil_head_get(PFIL_TYPE_AF, AF_INET6);
            ..
            ..
            if (ph_inet != NULL)
                    error = pfil_add_hook((void *)fr_check_wrapper, NULL,
                                          PFIL_IN|PFIL_OUT, ph_inet);
FreeBSD 7.0-CURRENT /usr/src/sys/contrib/ipfilter/netinet/ip_fil.c:ipl_enable [17]

PF:

            if (pf_pfil_hooked)
                    return (0);

            pfh_inet = pfil_head_get(PFIL_TYPE_AF, AF_INET);
            if (pfh_inet == NULL)
                    return (ESRCH); /* XXX */
            pfil_add_hook(pf_check_in, NULL, PFIL_IN | PFIL_WAITOK, pfh_inet);
            pfil_add_hook(pf_check_out, NULL, PFIL_OUT | PFIL_WAITOK, pfh_inet);
FreeBSD 7.0-CURRENT /usr/src/sys/contrib/pf/net/pf_ioctl.c [15]


5. Hooking your own filter: a kernel module example

Now that we know how PFIL_HOOKS is used by the well-known open source firewalls, we can write a simple kernel module and hook it into the BSD IPv4 stack. Our module will count the number of bytes received and sent by the host. If you do not know how loadable kernel modules work or how to write them, take a look at the FreeBSD Architecture Handbook [16]. Chapter 9 (Writing FreeBSD Device Drivers), Part II (Dynamic Kernel Linker Facility) gives an example of how to write a simple "Hello world" LKM.

We'll use this Makefile to compile our hisar module:

SRCS=hisar.c
KMOD=hisar

rclean:
	@make clean
	rm -f *~

.include <bsd.kmod.mk>


I will not discuss the entire source code, since that is a subject for another article; I'll go over the important parts. Below are init_module and deinit_module. init_module is called when the module is loaded, and deinit_module is called when it is unloaded.

init_module adds the hisar_chkinput function to the list of functions called when an IP packet is received, and the hisar_chkoutput function to the list of IP output filters. Then it creates and registers a character device, /dev/flwacct. deinit_module removes the previously added filter functions from the IPv4 filters and destroys the character device.

    static int
    init_module(void)
    {
            struct pfil_head *pfh_inet;

            if (hisar_hooked)
                    return (0);

            pfh_inet = pfil_head_get(PFIL_TYPE_AF, AF_INET);
            if (pfh_inet == NULL)
                    return ESRCH;

            pfil_add_hook(hisar_chkinput, NULL, PFIL_IN | PFIL_WAITOK, pfh_inet);
            pfil_add_hook(hisar_chkoutput, NULL, PFIL_OUT | PFIL_WAITOK, pfh_inet);
            hisar_hooked = 1;

            flwbuf = (void *)malloc(PAGE_SIZE, M_TEMP, M_WAITOK | M_ZERO);
            sdev = make_dev(&flwacct_cdevsw,
                            FLWACCT_MINOR,
                            UID_ROOT,
                            GID_WHEEL,
                            0600,
                            "flwacct");

            uprintf("Loaded %s %s\n", MODNAME, MODVERSION);
            return 0;
    }

    static int
    deinit_module(void)
    {
            struct pfil_head *pfh_inet;

            if (!hisar_hooked)
                    return (0);

            pfh_inet = pfil_head_get(PFIL_TYPE_AF, AF_INET);
            if (pfh_inet == NULL)
                    return ESRCH;

            pfil_remove_hook(hisar_chkinput, NULL, PFIL_IN | PFIL_WAITOK, pfh_inet);
            pfil_remove_hook(hisar_chkoutput, NULL, PFIL_OUT | PFIL_WAITOK, pfh_inet);
            hisar_hooked = 0;

            free(flwbuf, M_TEMP);
            destroy_dev(sdev);

            uprintf("Unloaded %s %s\n", MODNAME, MODVERSION);
            return 0;
    }


The filter function we pass to pfil_add_hook(9) has the following prototype:

    int filter_func(void *arg, struct mbuf **m, struct ifnet *ifp, int dir, struct inpcb *inp)

The function takes a void * argument, which is handed to it for every packet; a pointer to a pointer to the mbuf structure carrying the packet; a pointer to the ifnet interface structure from which the packet originates; the packet direction, PFIL_IN or PFIL_OUT; and a pointer to the inpcb protocol control block structure.

Our hisar_chkinput has been added to the PFIL_IN filters, so whenever an IP packet arrives our code is called from pfil_run_hooks in ip_input. hisar_chkinput only reads the length stored in the mbuf and adds it to the in_bytes global variable. Note that m_len is the amount of data held in the first mbuf only; when a packet spans a chain of mbufs, (*m)->m_pkthdr.len gives the length of the whole packet.

    static int
    hisar_chkinput(void *arg, struct mbuf **m, struct ifnet *ifp, int dir, struct inpcb *inp)
    {
            in_bytes += (*m)->m_len;
            return 0;
    }

You could, for instance, stop accepting traffic once a threshold number of bytes has been received, and block the packet like this:

    static int
    hisar_chkinput(void *arg, struct mbuf **m, struct ifnet *ifp, int dir, struct inpcb *inp)
    {
            in_bytes += (*m)->m_len;
            if (in_bytes > in_maxbytes) {
                    /* ip_input just returns on a non-zero result, so free the packet here */
                    m_freem(*m);
                    *m = NULL;
                    return FLW_DROP;    /* return (1) */
            }
            return 0;
    }

If any filter function returns a non-zero value, the remaining packet processing stops and the packet is not processed any further, which means the packet is blocked. Because ip_input simply returns in that case, a hook that drops a packet should also free the mbuf, as ipfw and pf do.

Similarly, hisar_chkoutput is invoked from the PFIL_OUT filters of the IPv4 pfil_head structure. It adds the packet length to the out_bytes global variable.

    static int
    hisar_chkoutput(void *arg, struct mbuf **m, struct ifnet *ifp, int dir, struct inpcb *inp)
    {
            out_bytes += (*m)->m_len;
            return 0;
    }
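Both counters add (*m)->m_len. In BSD kernels a packet may be spread over a chain of mbufs linked through m_next, and m_len describes one buffer only, while the packet header of the first mbuf records the total in m_pkthdr.len. The userspace sketch below makes the difference concrete; demo_mbuf is a hypothetical, heavily simplified stand-in for struct mbuf, and demo_chain_len is not a kernel function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for struct mbuf. */
struct demo_mbuf {
    struct demo_mbuf *m_next;   /* next buffer of the same packet */
    int m_len;                  /* bytes of data in this buffer only */
};

/* Sum m_len over the whole chain to obtain the full packet length. */
static int
demo_chain_len(const struct demo_mbuf *m)
{
    int total = 0;

    for (; m != NULL; m = m->m_next) {
        assert(m->m_len >= 0);
        total += m->m_len;
    }
    return total;
}
```

For a 1500-byte packet stored as a 100-byte first buffer followed by a 1400-byte second buffer, counting only the first m_len would record 100 bytes, whereas walking the chain gives 1500.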


When the /dev/flwacct device is opened and read(2) is called on it, dev_read in our module is called. We copy a string containing the in_bytes and out_bytes values to user space:

    int
    dev_read(struct cdev *dev, struct uio *uio, int ioflag)
    {
            int rv = 0;

            sprintf(flwbuf, "%016d,%016d\n", in_bytes, out_bytes);
            rv = uiomove(flwbuf, MIN(uio->uio_resid, strlen(flwbuf)), uio);

            return rv;
    }
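The record that dev_read produces has a fixed layout: 16 digits, a comma, 16 digits and a newline, 34 bytes in total. The userspace sketch below reproduces the formatting and the "copy no more than the caller asked for" step. The names demo_format and demo_copyout and the constant DEMO_RECLEN are hypothetical, and snprintf is used here instead of sprintf so that the buffer size is always respected.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DEMO_RECLEN 34   /* 16 digits + ',' + 16 digits + '\n' */

/* Format both counters into buf, which must hold DEMO_RECLEN + 1 bytes. */
static int
demo_format(char *buf, size_t buflen, int in_bytes, int out_bytes)
{
    assert(buf != NULL && buflen > DEMO_RECLEN);
    return snprintf(buf, buflen, "%016d,%016d\n", in_bytes, out_bytes);
}

/* Copy at most resid bytes of the record, like MIN(uio_resid, strlen()). */
static size_t
demo_copyout(char *dst, size_t resid, const char *rec)
{
    size_t n = strlen(rec);

    if (n > resid)
        n = resid;
    memcpy(dst, rec, n);
    return n;
}
```

Because the width is fixed, a reader can find the second counter at offset 17 without parsing the first one, at the price of a record that is always 34 bytes long.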


Here is the entire source code for our Hisar kernel module:

    /*
     *  A simple network flow accountant
     *  exemplifying PFIL_HOOKS
     *
     *  Murat Balaban
     *  $Id: article.sgml,v 1.15 2005/11/25 20:07:04 murat Exp $
     *
     */

    #include <sys/types.h>
    #include <sys/module.h>
    #include <sys/systm.h>
    #include <sys/errno.h>
    #include <sys/param.h>
    #include <sys/kernel.h>
    #include <sys/conf.h>
    #include <sys/uio.h>
    #include <sys/malloc.h>
    #include <sys/ioccom.h>
    #include <sys/mbuf.h>
    #include <sys/socket.h>
    #include <sys/sysent.h>
    #include <sys/sysproto.h>
    #include <sys/proc.h>
    #include <sys/syscall.h>

    #include <machine/iodev.h>

    #include <netinet/in_systm.h>
    #include <netinet/in.h>
    #include <netinet/ip.h>

    #include <net/if.h>
    #include <net/pfil.h>

    #define MODNAME "EnderUNIX HISAR - A simple network flow accountant"
    #define MODVERSION  "1.0"
    #define FLWACCT_MINOR   11

    static volatile int hisar_hooked = 0;

    d_open_t dev_open;
    d_close_t dev_close;
    d_read_t dev_read;
    d_write_t dev_write;

    static struct cdev *sdev;
    static int count = 0;       /* Device Busy flag */
    static int in_bytes = 0;    /* Bytes IN */
    static int out_bytes = 0;   /* Bytes OUT */
    static char *flwbuf = NULL; /* Priv. Buffer */

    static struct cdevsw flwacct_cdevsw = {
            .d_version = D_VERSION,
            .d_open = dev_open,
            .d_close = dev_close,
            .d_read = dev_read,
            .d_name = "flwacct",
            .d_maj = CDEV_MAJOR,
            .d_flags = D_TTY,
    };

    /* Inbound IPv4 hook: account the bytes and let the packet through. */
    static int
    hisar_chkinput(void *arg, struct mbuf **m, struct ifnet *ifp, int dir,
            struct inpcb *inp)
    {
            in_bytes += (*m)->m_len;
            return 0;
    }

    /* Outbound IPv4 hook: account the bytes and let the packet through. */
    static int
    hisar_chkoutput(void *arg, struct mbuf **m, struct ifnet *ifp, int dir,
            struct inpcb *inp)
    {
            out_bytes += (*m)->m_len;
            return 0;
    }

    /* Attach both hooks to the IPv4 pfil head and create /dev/flwacct. */
    static int
    init_module(void)
    {
            struct pfil_head *pfh_inet;

            if (hisar_hooked)
                    return (0);

            pfh_inet = pfil_head_get(PFIL_TYPE_AF, AF_INET);
            if (pfh_inet == NULL)
                    return ESRCH;

            pfil_add_hook(hisar_chkinput, NULL, PFIL_IN | PFIL_WAITOK, pfh_inet);
            pfil_add_hook(hisar_chkoutput, NULL, PFIL_OUT | PFIL_WAITOK, pfh_inet);
            hisar_hooked = 1;

            flwbuf = (void *)malloc(PAGE_SIZE, M_TEMP, M_WAITOK | M_ZERO);
            sdev = make_dev(&flwacct_cdevsw,
                            FLWACCT_MINOR,
                            UID_ROOT,
                            GID_WHEEL,
                            0600,
                            "flwacct");

            uprintf("Loaded %s %s\n", MODNAME, MODVERSION);
            return 0;
    }

    /* Detach both hooks and destroy the character device. */
    static int
    deinit_module(void)
    {
            struct pfil_head *pfh_inet;

            if (!hisar_hooked)
                    return (0);

            pfh_inet = pfil_head_get(PFIL_TYPE_AF, AF_INET);
            if (pfh_inet == NULL)
                    return ESRCH;

            pfil_remove_hook(hisar_chkinput, NULL, PFIL_IN | PFIL_WAITOK, pfh_inet);
            pfil_remove_hook(hisar_chkoutput, NULL, PFIL_OUT | PFIL_WAITOK, pfh_inet);
            hisar_hooked = 0;

            free(flwbuf, M_TEMP);
            destroy_dev(sdev);

            uprintf("Unloaded %s %s\n", MODNAME, MODVERSION);
            return 0;
    }

    /* Module event handler */
    static int
    mod_evhandler(struct module *m, int what, void *arg)
    {
            int err = 0;

            switch(what) {
            case MOD_LOAD:
                    err = init_module();
                    break;
            case MOD_UNLOAD:
                    err = deinit_module();
                    break;
            default:
                    err = EINVAL;
                    break;
            }
            return err;
    }

    /* Allow a single opener at a time. */
    int
    dev_open(struct cdev *dev, int oflags, int devtype, struct thread *td)
    {
            int err = 0;

            if (count > 0)
                    return EBUSY;
            count = 1;
            return (err);
    }

    int
    dev_close(struct cdev *dev, int fflag, int devtype, struct thread *td)
    {
            int err = 0;

            count = 0;
            return (err);
    }

    /* Hand the current counters to user space as "IN,OUT\n". */
    int
    dev_read(struct cdev *dev, struct uio *uio, int ioflag)
    {
            int rv = 0;

            sprintf(flwbuf, "%016d,%016d\n", in_bytes, out_bytes);
            rv = uiomove(flwbuf, MIN(uio->uio_resid, strlen(flwbuf)), uio);

            return rv;
    }

    DEV_MODULE(hisarmodule, mod_evhandler, NULL);
    MODULE_VERSION(hisarmodule, 1);


To compile the source, type:

    efe@~/freebsd-kernel/pfil/lkm# make
    Warning: Object directory not changed from original /root/freebsd-kernel/pfil/lkm
    @ -> /usr/src/sys
    machine -> /usr/src/sys/i386/include
    cc -O -pipe  -D_KERNEL -DKLD_MODULE -nostdinc -I-  \
      -I. -I@ -I@/contrib/altq -I@/../include -I/usr/include -finline-limit=8000 \
      -fno-common  -mno-align-long-strings -mpreferred-stack-boundary=2  \
      -mno-mmx -mno-3dnow -mno-sse -mno-sse2 -ffreestanding -Wall -Wredundant-decls \
      -Wnested-externs -Wstrict-prototypes  -Wmissing-prototypes -Wpointer-arith \
      -Winline -Wcast-qual  -fformat-extensions -std=c99 -c hisar.c
    ld  -d -warn-common -r -d -o hisar.kld hisar.o
    touch /root/freebsd-kernel/pfil/lkm/export_syms
    awk -f /sys/conf/kmod_syms.awk hisar.kld  /root/freebsd-kernel/pfil/lkm/export_syms |  xargs -J% objcopy % hisar.kld
    ld -Bshareable  -d -warn-common -o hisar.ko hisar.kld
    objcopy --strip-debug hisar.ko
    efe@~/freebsd-kernel/pfil/lkm#


To load the compiled kernel module, use the kldload command. kldstat lists the currently loaded kernel modules:

    efe@~/freebsd-kernel/pfil/lkm# kldload ./hisar.ko
    Loaded EnderUNIX HISAR - A simple network flow accountant 1.0
    efe@~/freebsd-kernel/pfil/lkm# kldstat
    Id Refs Address    Size     Name
     1    8 0xc0400000 5e5c34   kernel
     2    1 0xc09e6000 59c4     snd_ich.ko
     3    2 0xc09ec000 1d4fc    sound.ko
     4   15 0xc0a0a000 58034    acpi.ko
     5    1 0xc227e000 2000     fire_saver.ko
     6    1 0xc25d7000 2000     hisar.ko
    efe@~/freebsd-kernel/pfil/lkm#

Hisar is now loaded into the kernel address space and is counting input and output bytes. Remember that our module registers a character device; when read(2) is called on it, it returns the current number of bytes received and sent over the IP stack. The code below opens /dev/flwacct and reads from it:

    /* flwcnt.c read from /dev/flwacct */
    #include <stdio.h>
    #include <stdlib.h>     /* exit */
    #include <unistd.h>     /* read, sleep, close */
    #include <fcntl.h>

    int
    main(void)
    {
            char buf[1024];
            int fd;

            if ((fd = open("/dev/flwacct", O_RDONLY)) == -1) {
                    perror("open");
                    exit(1);
            }

            /* Each record is "IN,OUT\n" with two 16-digit fields. */
            while (read(fd, buf, sizeof(buf) - 1) > 0) {
                    printf("IN: %.16s bytes, OUT: %.16s bytes\n", buf, buf + 17);
                    sleep(1);
            }

            close(fd);
            return 0;
    }


Compile and run:

    efe@~# make flwcnt
    cc -O -pipe   flwcnt.c  -o flwcnt
    efe@~# ./flwcnt
    IN: 0000000004621389 bytes, OUT: 0000000000900378 bytes
    IN: 0000000004621429 bytes, OUT: 0000000000900514 bytes
    IN: 0000000004621469 bytes, OUT: 0000000000900650 bytes
    IN: 0000000004621509 bytes, OUT: 0000000000900786 bytes
    ^C
    efe@~#
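Since the reader samples the counters once per second, the difference between two successive records is a rough throughput figure. The following userspace sketch shows one way to turn raw records into rates. It is only an illustration: demo_parse and demo_rate are hypothetical helpers that are not part of flwcnt.c, and the record format is assumed to be the "IN,OUT" layout written by the module.

```c
#include <assert.h>
#include <stdio.h>

/* Parse one "IN,OUT\n" record as written by the module; 0 on success. */
static int
demo_parse(const char *rec, long *in_bytes, long *out_bytes)
{
    assert(rec != NULL && in_bytes != NULL && out_bytes != NULL);
    /* %ld reads plain decimal, so the leading zeros are harmless. */
    return (sscanf(rec, "%16ld,%16ld", in_bytes, out_bytes) == 2) ? 0 : -1;
}

/* Bytes per second between two samples taken secs seconds apart. */
static long
demo_rate(long before, long after, long secs)
{
    assert(secs > 0);
    return (after - before) / secs;
}
```

Applied to the first two lines of the run above, this gives 40 bytes per second inbound and 136 bytes per second outbound.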