Internals and Optimization of Erlang Binary Construction (Part 1)

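Source: Internet  Editor: 程序博客网  Date: 2024/06/05 05:46
This article builds on "Introduction to the Internal Structure and Types of Erlang Binaries". It examines which situations, when constructing a binary, let you take full advantage of the optimizations the Erlang runtime system applies to binary construction.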


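The example below is taken from the official documentation; the C source of the emulator is then used to explain the internal mechanics of binary construction in more detail.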

    Bin0 = <<0>>,                    %% 1
    Bin1 = <<Bin0/binary,1,2,3>>,    %% 2
    Bin2 = <<Bin1/binary,4,5,6>>,    %% 3
    Bin3 = <<Bin2/binary,7,8,9>>,    %% 4
    Bin4 = <<Bin1/binary,17>>,       %% 5 !!!
    {Bin4,Bin3}                      %% 6

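In line 1, the system creates a heap binary.
As described in "Introduction to the Internal Structure and Types of Erlang Binaries", a heap binary is stored directly on the process heap and holds at most 64 bytes. For anything larger than 64 bytes, a reference-counted binary (refc binary) is created instead.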
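
The size rule above can be written down as a tiny C sketch. This is only an illustration: `choose_binary_kind`, `binary_kind` and `ONHEAP_BIN_LIMIT` are hypothetical names made up for this example, and the 64-byte constant is simply the limit stated in the text.

```c
#include <stddef.h>

/* Assumption: the 64-byte limit quoted in the text. */
#define ONHEAP_BIN_LIMIT 64

/* Hypothetical labels for the two representations discussed here. */
typedef enum { HEAP_BINARY, REFC_BINARY } binary_kind;

/* Which representation a freshly created binary of nbytes would get. */
binary_kind choose_binary_kind(size_t nbytes)
{
    return (nbytes <= ONHEAP_BIN_LIMIT) ? HEAP_BINARY : REFC_BINARY;
}
```

With this rule, the one-byte `<<0>>` from line 1 is a heap binary, while a 65-byte binary would be a refc binary.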


Line 2 is a binary append operation. It calls the erts_bs_append function in erl_bits.c; the C source with annotations follows:

    Eterm
    erts_bs_append(Process* c_p, Eterm* reg, Uint live, Eterm build_size_term,
                   Uint extra_words, Uint unit)
    {
        Eterm bin;                  /* Given binary */
        Eterm* ptr;
        Eterm hdr;
        ErlSubBin* sb;
        ProcBin* pb;
        Binary* binp;
        Uint heap_need;
        Uint build_size_in_bits;
        Uint used_size_in_bits;
        Uint unsigned_bits;
        ERL_BITS_DEFINE_STATEP(c_p);

        // Number of bits to be built: build_size_in_bits
        if (is_small(build_size_term)) {
            Sint signed_bits = signed_val(build_size_term);
            if (signed_bits < 0) {
                goto badarg;
            }
            build_size_in_bits = (Uint) signed_bits;
        } else if (term_to_Uint(build_size_term, &unsigned_bits)) {
            build_size_in_bits = unsigned_bits;
        } else {
            c_p->freason = unsigned_bits;
            return THE_NON_VALUE;
        }

        bin = reg[live];
        if (!is_boxed(bin)) {
        badarg:
            c_p->freason = BADARG;
            return THE_NON_VALUE;
        }
        ptr = boxed_val(bin);
        // Fetch the header word of the binary
        hdr = *ptr;
        if (!is_binary_header(hdr)) {
            goto badarg;
        }
        // #MARK_A
        if (hdr != HEADER_SUB_BIN) {
            // Not a sub binary, so not writable
            goto not_writable;
        }
        sb = (ErlSubBin *) ptr;
        if (!sb->is_writable) {
            // is_writable == 0, so not writable
            goto not_writable;
        }
        pb = (ProcBin *) boxed_val(sb->orig);
        // Must be a refc binary
        ASSERT(pb->thing_word == HEADER_PROC_BIN);
        if ((pb->flags & PB_IS_WRITABLE) == 0) {
            // Flagged as not writable
            goto not_writable;
        }

        /*
         * OK, the binary is writable.
         */
        erts_bin_offset = 8*sb->size + sb->bitsize;
        if (unit > 1) {
            if ((unit == 8 && (erts_bin_offset & 7) != 0) ||
                (erts_bin_offset % unit) != 0) {
                goto badarg;
            }
        }
        used_size_in_bits = erts_bin_offset + build_size_in_bits;
        // Mark the old sub binary as no longer writable, because the
        // space following it is about to be filled with data
        // #MARK_B
        sb->is_writable = 0;        /* Make sure that no one else can write. */
        // Extend to the required size
        pb->size = NBYTES(used_size_in_bits);
        pb->flags |= PB_ACTIVE_WRITER;

        /*
         * Reallocate the binary if it is too small.
         */
        binp = pb->val;
        // If the container is too small, reallocate it to twice the
        // required size
        if (binp->orig_size < pb->size) {
            Uint new_size = 2*pb->size;
            binp = erts_bin_realloc(binp, new_size);
            binp->orig_size = new_size;
            // Note: reallocation changes the pb->val pointer, so this
            // binary must not be referenced from anywhere else
            pb->val = binp;
            pb->bytes = (byte *) binp->orig_bytes;
        }
        erts_current_bin = pb->bytes;

        /*
         * Allocate heap space and build a new sub binary.
         */
        reg[live] = sb->orig;
        heap_need = ERL_SUB_BIN_SIZE + extra_words;
        if (c_p->stop - c_p->htop < heap_need) {
            (void) erts_garbage_collect(c_p, heap_need, reg, live+1);
        }
        // Create a new sub binary that points at the start of the original
        // binary; compared with the old sub binary, only the size is
        // extended to the required value
        sb = (ErlSubBin *) c_p->htop;   // written at the heap top
        // Advance the heap top by ERL_SUB_BIN_SIZE words
        // (20 bytes on a 32-bit emulator)
        c_p->htop += ERL_SUB_BIN_SIZE;
        sb->thing_word = HEADER_SUB_BIN;
        sb->size = BYTE_OFFSET(used_size_in_bits);
        sb->bitsize = BIT_OFFSET(used_size_in_bits);
        sb->offs = 0;
        sb->bitoffs = 0;
        // The newest sub binary is marked writable. In other words, in a
        // chain of append operations only the last sub binary is writable
        sb->is_writable = 1;
        sb->orig = reg[live];
        return make_binary(sb);

        /*
         * The binary is not writable. We must create a new writable binary and
         * copy the old contents of the binary.
         */
     not_writable:
        {
            Uint used_size_in_bytes; /* Size of old binary + data to be built */
            Uint bin_size;
            Binary* bptr;
            byte* src_bytes;
            Uint bitoffs;
            Uint bitsize;
            Eterm* hp;

            /*
             * Allocate heap space.
             */
            heap_need = PROC_BIN_SIZE + ERL_SUB_BIN_SIZE + extra_words;
            if (c_p->stop - c_p->htop < heap_need) {
                (void) erts_garbage_collect(c_p, heap_need, reg, live+1);
                bin = reg[live];
            }
            hp = c_p->htop;

            /*
             * Calculate sizes. The size of the new binary, is the sum of the
             * build size and the size of the old binary. Allow some room
             * for growing.
             */
            ERTS_GET_BINARY_BYTES(bin, src_bytes, bitoffs, bitsize);
            erts_bin_offset = 8*binary_size(bin) + bitsize;
            if (unit > 1) {
                if ((unit == 8 && (erts_bin_offset & 7) != 0) ||
                    (erts_bin_offset % unit) != 0) {
                    goto badarg;
                }
            }
            used_size_in_bits = erts_bin_offset + build_size_in_bits;
            used_size_in_bytes = NBYTES(used_size_in_bits);
            bin_size = 2*used_size_in_bytes;
            // At least 256 bytes
            bin_size = (bin_size < 256) ? 256 : bin_size;

            /*
             * Allocate the binary data struct itself.
             */
            // Create a binary twice the required size (minimum 256 bytes).
            // It acts as a container and lives outside the process heap;
            // the process heap only holds the refc binary that refers to it
            bptr = erts_bin_nrml_alloc(bin_size);
            bptr->flags = 0;
            bptr->orig_size = bin_size;
            erts_refc_init(&bptr->refc, 1);
            erts_current_bin = (byte *) bptr->orig_bytes;

            /*
             * Now allocate the ProcBin on the heap.
             */
            // Create the refc binary that refers to the binary above and
            // store it on the process heap
            pb = (ProcBin *) hp;
            hp += PROC_BIN_SIZE;
            pb->thing_word = HEADER_PROC_BIN;
            // Set to the size actually needed for now; later appends
            // can extend it
            pb->size = used_size_in_bytes;
            pb->next = MSO(c_p).first;
            MSO(c_p).first = (struct erl_off_heap_header*)pb;
            pb->val = bptr;
            pb->bytes = (byte*) bptr->orig_bytes;
            pb->flags = PB_IS_WRITABLE | PB_ACTIVE_WRITER;
            OH_OVERHEAD(&(MSO(c_p)), pb->size / sizeof(Eterm));

            /*
             * Now allocate the sub binary and set its size to include the
             * data about to be built.
             */
            // Create the sub binary that refers to the refc binary above
            // and set it to the required size
            sb = (ErlSubBin *) hp;
            hp += ERL_SUB_BIN_SIZE;
            sb->thing_word = HEADER_SUB_BIN;
            sb->size = BYTE_OFFSET(used_size_in_bits);
            sb->bitsize = BIT_OFFSET(used_size_in_bits);
            sb->offs = 0;
            sb->bitoffs = 0;
            sb->is_writable = 1;
            sb->orig = make_binary(pb);
            c_p->htop = hp;

            /*
             * Now copy the data into the binary.
             */
            copy_binary_to_buffer(erts_current_bin, 0, src_bytes, bitoffs, erts_bin_offset);
            return make_binary(sb);
        }
    }

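As the code at #MARK_A shows, a binary that is not a sub binary jumps to not_writable. That is what happens to Bin0 in line 2: the function creates the required container, a refc binary and a sub binary, and copies the contents of Bin0 into the container (see the comments in the not_writable section for details), so that later appends can write in place.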
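
The two sizing rules in erts_bs_append can be isolated as a small C sketch. The helper names `bits_to_bytes`, `initial_capacity` and `grown_capacity` are hypothetical and exist only for this illustration; the sketch also assumes that NBYTES rounds a bit count up to whole bytes.

```c
#include <stddef.h>

/* Assumption: like NBYTES, round a bit count up to whole bytes. */
size_t bits_to_bytes(size_t used_bits)
{
    return (used_bits + 7) / 8;
}

/* not_writable path: a new container gets twice the bytes needed,
 * but never less than 256 bytes. */
size_t initial_capacity(size_t used_bits)
{
    size_t cap = 2 * bits_to_bytes(used_bits);
    return (cap < 256) ? 256 : cap;
}

/* Writable path: keep the container if it is large enough,
 * otherwise grow it to twice the new size. */
size_t grown_capacity(size_t capacity, size_t used_bits)
{
    size_t need = bits_to_bytes(used_bits);
    return (capacity < need) ? 2 * need : capacity;
}
```

For Bin1 in the example (4 bytes, 32 bits), `initial_capacity(32)` gives 256, so the following small appends fit without any reallocation.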
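    Bin0 = <<0>>,                    %% 1
    Bin1 = <<Bin0/binary,1,2,3>>,    %% 2
    Bin2 = <<Bin1/binary,4,5,6>>,    %% 3
    Bin3 = <<Bin2/binary,7,8,9>>,    %% 4
    Bin4 = <<Bin1/binary,17>>,       %% 5 !!!
    {Bin4,Bin3}                      %% 6

In line 3, Bin1 is the result of the most recent append, so the space following it in the container is free and can be extended. Bin1 itself can never change either. Bin1 is therefore not copied: 4, 5 and 6 are simply written right after its data.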

Line 4 is executed in the same way as line 3.

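Line 5 appends to Bin1, not to Bin3. Bin1 is no longer the result of the most recent append, which means the space following it already holds other data (4,5,6,7,8,9 have been stored after Bin1 by now). This line therefore cannot be executed the way the previous two were. Instead, the not_writable path creates a new writable binary with its own sub binary, copies Bin1 into it, and then appends 17.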


How does the system know that no more data may be appended after Bin1? The documentation raises the same question:
We will not explain here how the run-time system can know that it is not allowed to write into Bin1; it is left as an exercise to the curious reader to figure out how it is done by reading the emulator sources, primarily erl_bits.c.

The answer can be found in the append function above. When line 3 was executed, Bin1 had already been marked as not writable (see #MARK_B).
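
To make the role of the flag concrete, here is a toy C model of the mechanism. It is a sketch under simplifying assumptions, not the emulator's code: `toy_buf`, `toy_bin`, `toy_literal` and `toy_append` are hypothetical names, sizes are whole bytes only, and the ProcBin-level PB_IS_WRITABLE check and garbage collection are left out.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the off-heap container. */
typedef struct {
    unsigned char *bytes;
    size_t capacity;
} toy_buf;

/* Hypothetical stand-in for a sub binary viewing the container's start. */
typedef struct {
    toy_buf *orig;      /* container this binary looks into */
    size_t size;        /* bytes visible through this binary */
    int is_writable;    /* 1 only for the newest result of an append */
} toy_bin;

static void *must(void *p) { if (!p) abort(); return p; }

/* A plain, non-writable binary such as Bin0. */
toy_bin toy_literal(const unsigned char *bytes, size_t n)
{
    toy_buf *buf = must(malloc(sizeof *buf));
    buf->capacity = n;
    buf->bytes = must(malloc(n ? n : 1));
    memcpy(buf->bytes, bytes, n);
    toy_bin b = { buf, n, 0 };
    return b;
}

/* Append n bytes to src, in place when src is still writable. */
toy_bin toy_append(toy_bin *src, const unsigned char *extra, size_t n)
{
    size_t used = src->size + n;
    toy_bin out;
    if (src->is_writable) {
        toy_buf *buf = src->orig;
        src->is_writable = 0;           /* same idea as #MARK_B */
        if (buf->capacity < used) {     /* grow to twice the new size */
            buf->capacity = 2 * used;
            buf->bytes = must(realloc(buf->bytes, buf->capacity));
        }
        out.orig = buf;
    } else {
        /* Not writable: new container with room to grow, then copy. */
        toy_buf *buf = must(malloc(sizeof *buf));
        buf->capacity = (2 * used < 256) ? 256 : 2 * used;
        buf->bytes = must(malloc(buf->capacity));
        memcpy(buf->bytes, src->orig->bytes, src->size);
        out.orig = buf;
    }
    memcpy(out.orig->bytes + src->size, extra, n);
    out.size = used;
    out.is_writable = 1;
    return out;
}
```

Replaying the six-line example against this model, Bin2 and Bin3 end up sharing Bin1's container, while the append in line 5 finds Bin1's flag already cleared and copies into a fresh container.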

Internals and Optimization of Erlang Binary Construction (Part 2)