DARE: High-Performance State Machine Replication on RDMA Networks

Source: Internet. Editor: 程序博客网. Time: 2024/05/07 23:28

The log is described by four dynamic pointers.

commit points to the first not-committed log entry; it is updated by the leader during log replication.

    struct dare_log_t {
        uint64_t write;
        uint64_t len;
    };

    static int
    rc_memory_reg()
    {
        /* Register memory for local log */
        IBDEV->lcl_mr[LOG_QP] = ibv_reg_mr(IBDEV->rc_pd,
                SRV_DATA->log, sizeof(dare_log_t) + SRV_DATA->log->len,
                IBV_ACCESS_REMOTE_WRITE | IBV_ACCESS_REMOTE_ATOMIC |
                IBV_ACCESS_REMOTE_READ | IBV_ACCESS_LOCAL_WRITE);
        /* !!length = sizeof(dare_log_t) + SRV_DATA->log->len */
        return 0;
    }
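The length passed to the registration call covers the log header plus the entry buffer. The following is a minimal sketch of that size computation only, with no RDMA calls. The type demo_log_t, its flexible array member entries[], and both helper functions are hypothetical names introduced for illustration; the real dare_log_t has more fields than the excerpt shows.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical log layout: a fixed header followed by `len` bytes of entries. */
typedef struct {
    uint64_t write;      /* field name taken from the excerpt above */
    uint64_t len;        /* capacity of entries[] in bytes */
    uint8_t  entries[];  /* assumption: the entry buffer directly follows the header */
} demo_log_t;

/* Number of bytes to register so that peers can reach both header and entries. */
static size_t demo_log_region_size(const demo_log_t *log)
{
    assert(log != NULL);
    return sizeof(demo_log_t) + (size_t)log->len;
}

/* Allocate a zeroed log with room for `len` bytes of entries; NULL on failure. */
static demo_log_t *demo_log_new(uint64_t len)
{
    demo_log_t *log = calloc(1, sizeof(demo_log_t) + (size_t)len);
    if (log != NULL)
        log->len = len;
    return log;
}
```

Registering only sizeof(demo_log_t) bytes would expose the header but not the entries, which is presumably why the original comment flags the length with "!!".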

When the leader is suspected to have failed, the servers elect another leader. Each election starts a new term, a period of time in which at most one leader exists. A server that wins an election during a term becomes the leader of that term.

Imagine you have a TCP connection and you want a so-called idle timeout, that is, you want to be called when there have been, say, 60 seconds of inactivity on the socket. The easiest way to do this is to configure an ev_timer with a repeat value of 60 and then call ev_timer_again each time you successfully read or write some data.

    /** Example: Create a timeout timer that times out after 10 seconds of inactivity. */
    static void
    timeout_cb (struct ev_loop *loop, struct ev_timer *w, int revents)
    {
        /* .. ten seconds without any activity */
    }

    struct ev_timer mytimer;
    ev_timer_init (&mytimer, timeout_cb, 0., 10.); /* note, only repeat used */
    ev_timer_again (loop, &mytimer); /* start timer */
    ev_loop (loop, 0);

    /* and in some piece of code that gets executed on any "activity":
       reset the timeout to start ticking again at 10 seconds */
    ev_timer_again (loop, &mytimer);
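To make the "again" semantics concrete without depending on libev, here is a small stand-alone model of an idle timer. It is only a sketch: idle_timer_t and its three functions are hypothetical names, time is passed in explicitly as a double, and nothing here is part of the libev API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a timer that only uses its repeat value. */
typedef struct {
    double repeat;    /* inactivity window in seconds */
    double deadline;  /* absolute time at which the timer fires */
    bool   active;    /* false until the timer is started */
} idle_timer_t;

static void idle_timer_init(idle_timer_t *t, double repeat)
{
    assert(repeat > 0.0);
    t->repeat = repeat;
    t->deadline = 0.0;
    t->active = false;
}

/* Models the "again" idea: (re)start a full repeat interval counted from `now`. */
static void idle_timer_again(idle_timer_t *t, double now)
{
    t->deadline = now + t->repeat;
    t->active = true;
}

/* True once `repeat` seconds have passed since the last call to idle_timer_again. */
static bool idle_timer_expired(const idle_timer_t *t, double now)
{
    return t->active && now >= t->deadline;
}
```

Calling idle_timer_again on every read or write pushes the deadline forward, so the timer fires only after a full window with no activity.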

The candidate sends vote requests to the other servers: It updates its corresponding entry in the vote request array (one of the control data arrays) at all other servers by issuing RDMA write operations.

    struct ctrl_data_t {
        /* State identifier (SID) */
        uint64_t    sid;

        /* DARE arrays */
        vote_req_t  vote_req[MAX_SERVER_COUNT];    /* vote requests */
    };

    /* Set remote offset */
    uint32_t offset = (uint32_t) (offsetof(ctrl_data_t, vote_req) + sizeof(vote_req_t) * idx);
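The offset computation above can be tried in isolation. The sketch below is self-contained and makes several assumptions that are not in the excerpt: the fields of vote_req_t are hypothetical, MAX_SERVER_COUNT is set to an arbitrary illustrative value, and vote_req_offset is a helper name invented here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SERVER_COUNT 13  /* assumption: illustrative value only */

/* Hypothetical vote request; the real field list is not shown in the excerpt. */
typedef struct {
    uint64_t sid;
    uint64_t index;
    uint64_t term;
} vote_req_t;

typedef struct {
    uint64_t   sid;                          /* State identifier (SID) */
    vote_req_t vote_req[MAX_SERVER_COUNT];   /* one slot per server */
} ctrl_data_t;

/* Byte offset of slot `idx` of vote_req[] inside the remote ctrl_data_t. */
static uint32_t vote_req_offset(uint8_t idx)
{
    assert(idx < MAX_SERVER_COUNT);
    return (uint32_t)(offsetof(ctrl_data_t, vote_req) + sizeof(vote_req_t) * idx);
}
```

Because every server writes only to the slot indexed by its own identifier, candidates can issue their RDMA writes concurrently without overwriting each other's requests.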

Servers not aware of a leader periodically check the vote request array for incoming requests. They only consider requests for the leadership of a higher (more recent) term than their own.

    static void
    poll_vote_requests()
    {
        if (SID_GET_L(data.cached_sid)) {
            /* Active leader known; just ignore vote requests */
            return;
        }

        /* No leader known; make sure about this. */
        ..
        /* Okay, so there is no known leader...
           ...look for vote requests. */
    }

A faulty leader is eventually detected by all the non-faulty servers; thus, a leader election starts. By using randomized timeouts [1] for restarting the election, DARE ensures that a leader is eventually elected.
Every other server checks its heartbeat array regularly, with a period ∆: if its own term is smaller than the most recent term found in the array, then a change in leadership occurred; thus, the server updates its own term to indicate its support.
Removing a server: The leader detects failed servers by using the Queue Pair (QP) timeouts provided by the Reliable Connection (RC) transport mechanism.

[1] Ongaro, Diego, and John Ousterhout. "In search of an understandable consensus algorithm." Proc. USENIX Annual Technical Conference. 2014.

0 0