MIT 6.824 Distributed Systems (2012), YFS Lab 6


Labs 1 through 5 assume good network conditions and a lock server that never fails. In reality, though, the lock server can fail.

Labs 6 and 7 add fault tolerance to the lock server. With a single lock server, the whole system goes down whenever that server does. The approach is to run the lock service as a replicated state machine (RSM) over several lock servers (say, five of them) and use the Paxos protocol to manage the group: as long as enough of the replicas are alive, the service can still hand out locks. That is what Lab 6 is about: using Paxos as the basis for lock server fault tolerance.

The core of this lab is implementing the Paxos protocol, which also spells out the role of the learner. Once I finish it, I will share more notes if I have time.

6.824 - Spring 2012

6.824 Lab 6: Paxos

Due: Friday, April 13th, 5:00pm.


Introduction

In labs 6 and 7, you will replicate the lock service using the replicated state machine approach. See Schneider's RSM paper for a good, but non-required, reference. In the replicated state machine approach, one machine is the master; the master receives requests from clients and executes them on all replicas in the same order.

When the master fails, any of the replicas can take over its job, because they all should have the same state as the failed master. One of the key challenges is ensuring that everyone agrees on which replica is the master and which of the slaves are alive, despite arbitrary sequences of crashes or network partitions. We use Paxos to reach such an agreement.

In this lab, you will implement Paxos and use it to agree to a sequence of membership changes (i.e., view changes). We will implement the replicated lock server in Lab 7. We have modified lock_smain.cc in this lab to start the RSM instead of the lock server; however, we will not actually replicate locks until Lab 7. As a result, in this lab, the lock_server processes are actually serving as the configuration servers. We will use the terms configuration server and lock_server interchangeably in the following text.

When you complete this lab and the next you will have a replicated state machine that manages a group of lock servers. You should be able to start new lock servers, which will contact the master and ask to join the replica group. Nodes can also be removed from the replica group when they fail. The set of nodes in the group at a particular time is a view, and each time the view changes, you will run Paxos to agree on the new view.

The design we have given you consists of three layered modules. The RSM and config layers make downcalls to tell the layers below them what to do. The config and Paxos modules also make upcalls to the layers above them to inform them of significant events (e.g., Paxos agreed to a value, or a node became unreachable).

RSM module
The RSM module is in charge of replication. When a node joins, the RSM module directs the config module to add the node. The RSM module also runs a recovery thread on every node to ensure that nodes in the same view have consistent states. In this lab, the only state to recover is the sequence of Paxos operations that have been executed. In Lab 7, you will extend the RSM module to replicate the lock service.
config module
The config module is in charge of view management. When the RSM module asks it to add a node to the current view, the config module invokes Paxos to agree on a new view. The config module also sends periodic heartbeats to check if other nodes are alive, and removes a node from the current view if it can't contact some of the members of the current view. It removes a node by invoking Paxos to agree on a new view without the node.
Paxos module
The Paxos module is in charge of running Paxos to agree on a value. In principle the value could be anything. In our system, the value is the list of nodes constituting the next view.
The focus of this lab is on the Paxos module. You'll replicate the lock server in the next lab.

Each module has threads and internal mutexes. As described above, a thread may call down through the layers. For instance, the RSM could tell the config module to add a node, and the config module tells Paxos to agree to a new view. When Paxos finishes, a thread will invoke an upcall to inform higher layers of the completion. To avoid deadlock, we suggest that you use the rule that a module releases its internal mutexes before it upcalls, but can keep its mutexes when calling down.
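To make that rule concrete, here is a minimal C++ sketch of a config-like layer that holds its mutex across a downcall but drops it before the upcall. Every name in it (config_sketch, cfg_mutex, paxos_agree, notify_view_change) is hypothetical; it only illustrates the locking discipline, not the handout's actual config module.

#include <pthread.h>
#include <string>

class config_sketch {
  pthread_mutex_t cfg_mutex;
  std::string current_view;
 public:
  config_sketch() { pthread_mutex_init(&cfg_mutex, NULL); }

  // Downcall from the RSM layer: holding our own mutex while calling down is fine.
  void add_node(const std::string &node) {
    pthread_mutex_lock(&cfg_mutex);
    std::string newview = current_view + ";" + node;
    paxos_agree(newview);               // downcall into the Paxos layer
    pthread_mutex_unlock(&cfg_mutex);
  }

  // Called by the Paxos layer when a value has been decided.
  void paxos_commit(unsigned instance, const std::string &v) {
    pthread_mutex_lock(&cfg_mutex);
    current_view = v;                   // update our own state first...
    pthread_mutex_unlock(&cfg_mutex);   // ...release before making the upcall
    notify_view_change(instance, v);    // upcall to the RSM layer, no mutex held
  }

 private:
  void paxos_agree(const std::string &v) { /* stand-in for proposer::run */ }
  void notify_view_change(unsigned instance, const std::string &v) { /* stand-in for the RSM upcall */ }
};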

Getting Started

Begin by initializing your Lab 6 branch with your implementation from Lab 5.

% cd ~/lab
% git commit -am 'my solution to lab5'
Created commit ...
% git pull
remote: Generating pack...
...
% git checkout -b lab6 origin/lab6
Branch lab6 set up to track remote branch refs/remotes/origin/lab6.
Switched to a new branch "lab6"
% git merge lab5

This will add new files (paxos_protocol.h, paxos.{cc,h}, log.{cc,h}, rsm_tester.pl, config.{cc,h}, rsm.{cc,h}, and rsm_protocol.h) to your lab/ directory and update the GNUmakefile from your previous lab. It will also incorporate minor changes into your lock_smain.cc to initialize the RSM module when the lock server starts. Note that since the RSM and the lock server both bind on the same port, this will actually disable your lock server until Lab 7, unless you change the relevant line in lock_smain.cc back. The lock server will now take two command-line arguments: the master's port and the port that the lock server you are starting should bind to.

In rsm.{cc,h}, we have provided you with code to set up the appropriate RPC handlers and manage recovery in this lab.

In files paxos.{cc,h}, you will find a sketch implementation of the acceptor and proposer classes that will use the Paxos protocol to agree on view changes. The file paxos_protocol.h defines the suggested RPC protocol between instances of Paxos running on different replicas, including structures for arguments and return types, and marshall code for those structures. You'll be finishing this Paxos code in this lab.

The files log.{cc,h} provide a full implementation of a log class, which should be used by your acceptor and proposer classes to log important Paxos events to disk. Then, if the node fails and later re-joins, it has some memory about past views of the system. Do not make any changes to this class, as we will use our own original versions of these files during testing.

config.cc maintains views using Paxos. You will need to understand how it interacts with the Paxos and RSM layers, but you should not need to make any changes to it for this lab.

In the next lab we will test if the replicated lock service maintains the state of replicated locks correctly, but in this lab we will just test if view changes happen correctly. The lab tester rsm_tester.pl will automatically start several lock servers, kill and restart some of them, and check that you have implemented the Paxos protocol and used it correctly.

Understanding how Paxos is used for view changes

There are two classes that together implement the Paxos protocol: acceptor and proposer. Each replica runs both classes. The proposer class leads the Paxos protocol by proposing new values and sending requests to all replicas. The acceptor class processes the requests from the proposer and sends responses back. The method proposer::run(nodes, v) is used to get the members of the current view (nodes) to agree to a value v. When an agreement instance completes, acceptor will call config's paxos_commit(instance, v) method with the value that was chosen. As explained below, other nodes may also attempt to start Paxos and propose a value, so there is no guarantee that the value that a server proposed is the same as the one that is actually chosen. (In fact, Paxos might abort if it can't get a majority to accept its prepare or accept messages!)

// Every replica runs both the acceptor and proposer classes. The proposer proposes a value v to all replicas; the acceptor handles the proposer's requests and replies. run(nodes, v) asks the members of the current view to agree on v.

When an agreement instance finishes, the acceptor calls the config layer's paxos_commit(instance, v) with the value that was chosen.
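As a hedged sketch of the call pattern just described (the exact argument types live in paxos.h and config.h, so treat these declarations as assumptions, not the handout's real signatures):

#include <string>
#include <vector>

// proposer::run(nodes, v): ask the members of the current view to agree on
// value v for a given Paxos instance. It reports whether this proposer's
// attempt succeeded; a competing proposer's value may be chosen instead.
bool run(unsigned instance, const std::vector<std::string> &nodes,
         const std::string &v);

// config::paxos_commit(instance, v): upcall made by the Paxos layer once a
// value has been chosen for the instance.
void paxos_commit(unsigned instance, const std::string &value);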

The config module performs view changes among the set of participating nodes. The first view of the system is specified manually. Subsequent view changes rely on Paxos to agree on a unique next view to displace the current view.

The config module handles view changes. The first view is specified manually; each subsequent view is decided with Paxos.

When the system starts from scratch, the first node creates view 1 containing itself only, i.e. view_1={1}. When node 2 joins after the first node, node 2's RSM module joins node 1 and transfers view 1 from node 1. Then, node 2 asks its config module to add itself to view 1. The config module will use Paxos to propose to nodes in view_1={1} a new view_2 containing nodes 1 and 2. When Paxos succeeds, view_2 is formed, i.e., view_2={1,2}. When node 3 joins, its RSM module will download the last view from the first node (view 2) and it will then attempt to propose to nodes in view 2 a new view_3={1,2,3}. And so on.

At the start, the first node creates view 1, which contains only itself. When node 2 joins later, node 2's RSM module contacts node 1 and fetches view 1 from it. Node 2 then asks its config module to add itself to view 1, and the config module runs Paxos to propose a new view 2. If that succeeds, view_2={1,2} is established. The same happens when node 3 arrives, and so on.

The config module will also initiate view changes when it discovers that some nodes in the current view are not responding. In particular, the node with the smallest id periodically sends heartbeat RPCs to all others (and all other servers periodically send heartbeats to the node with the smallest id). If a heartbeat RPC times out, the config module calls the proposer's run(nodes, v) method to start a new round of the Paxos protocol. Because each node independently decides if it should run Paxos, there may be several instances of Paxos running simultaneously; Paxos sorts that out correctly.

Likewise, the config module starts a view change when it notices that some node is not responding; heartbeats are the simple mechanism used to detect this. The node with the smallest id periodically sends heartbeats to every other node, and every other node sends heartbeats to that smallest-id node. If a heartbeat times out, the node calls run(nodes, v) to start a new round of the Paxos protocol. Since each node decides this on its own, several Paxos rounds may run at the same time, but Paxos sorts that out correctly.
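As a rough illustration of this heartbeat logic only (the real implementation already exists in config.cc and should not be modified; every helper name below is a hypothetical stand-in):

#include <string>
#include <vector>

std::vector<std::string> view_members();                // current view, smallest id first
std::string my_id();                                    // this node's id
bool heartbeat(const std::string &node);                // false if the RPC times out
void propose_view(const std::vector<std::string> &cur,  // stand-in for proposer::run
                  const std::vector<std::string> &next);

// One heartbeat round: the smallest-id node pings everyone else, and every
// other node pings the smallest-id node. A timeout triggers a Paxos round
// that proposes the current view minus the unresponsive node.
void heartbeat_round_sketch() {
  std::vector<std::string> cur = view_members();
  std::vector<std::string> targets;
  if (my_id() == cur[0])
    targets.assign(cur.begin() + 1, cur.end());
  else
    targets.push_back(cur[0]);

  for (size_t i = 0; i < targets.size(); i++) {
    if (!heartbeat(targets[i])) {
      std::vector<std::string> next;
      for (size_t j = 0; j < cur.size(); j++)
        if (cur[j] != targets[i]) next.push_back(cur[j]);
      propose_view(cur, next);
    }
  }
}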

The proposer keeps track of whether the current view is stable or not (using the proposer::stable variable). If the current view is stable, there are no on-going Paxos view change attempts by this node. When the current view is not stable, the node is initiating the Paxos protocol.

The view is stable when this node has no Paxos view change in progress, and not stable while it is running one.

The acceptor logs important Paxos events as well as a complete history of all values agreed to on disk. At any time a node can reboot and when it re-joins, it may be many views behind. Unless the node brings itself up-to-date on the current view, it won't be able to participate in Paxos. By remembering all views, the other nodes can bring this re-joined node up to date.

The acceptor records the important events and the history of agreed values. If a node restarts or reconnects, it may have missed many views and cannot immediately work with the other nodes; but because everything is logged, the other nodes can bring it up to date so it can rejoin.

The Paxos Protocol

The Paxos Made Simple paper describes a protocol that agrees on a value and then terminates. Since we want to run another instance of Paxos every time there is a view change, we need to ensure that messages from different instances are not confused. We do this by adding instance numbers (which are not the same as proposal numbers) to all messages. Since we are using Paxos to agree on view changes, the instance numbers in our use of Paxos are the same as the view numbers in the config module.
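For example, a prepare request carries the instance number alongside the proposal number. The struct below is only a sketch of what such a message might look like; the real argument structs (and their marshalling code) are already defined in paxos_protocol.h and may use different fields.

#include "paxos_protocol.h"   // provides prop_t (the proposal-number type)

// Sketch only; paxos_protocol.h already defines the real argument structs.
struct preparearg_sketch {
  unsigned instance;   // which agreement instance (== view number in this lab)
  prop_t n;            // proposal number within that instance
};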

Paxos can't guarantee that every node learns the chosen value right away; some of them may be partitioned or crashed. Therefore, some nodes may be behind, stuck in an old instance of Paxos while the rest of the system has moved on to a new instance. If a node's acceptor gets an RPC request for an old instance, it should reply to the proposer with a special RPC response (set oldinstance to true). This response informs the calling proposer that it is behind and tells it what value was chosen for that instance.

Below is the pseudocode for Paxos. The acceptor and proposer skeleton classes contain member variables, RPCs, and RPC handlers corresponding to this code. Except for the additions to handle instances as described above, it mirrors the protocol described in the paper.

proposer run(instance, v):
  choose n, unique and higher than any n seen so far
  send prepare(instance, n) to all servers including self
  if oldinstance(instance, instance_value) from any node:
    commit to the instance_value locally
  else if prepare_ok(n_a, v_a) from majority:
    v' = v_a with highest n_a; choose own v otherwise
    send accept(instance, n, v') to all
    if accept_ok(n) from majority:
      send decided(instance, v') to all

acceptor state:
  must persist across reboots
  n_h (highest prepare seen)
  instance_h (highest instance accepted)
  n_a, v_a (highest accept seen)

acceptor prepare(instance, n) handler:
  if instance <= instance_h
    reply oldinstance(instance, instance_value)
  else if n > n_h
    n_h = n
    reply prepare_ok(n_a, v_a)
  else
    reply prepare_reject

acceptor accept(instance, n, v) handler:
  if n >= n_h
    n_a = n
    v_a = v
    reply accept_ok(n)
  else
    reply accept_reject

acceptor decide(instance, v) handler:
  paxos_commit(instance, v)
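Below is a hedged C++ sketch of the acceptor's prepare handler corresponding to that pseudocode. The handler name, response fields (oldinstance, accept, n_a, v_a), and member variables (instance_h, n_h, values, pxs_mutex, the log pointer l) are assumptions about the skeleton in paxos.{cc,h}; adapt them to what the handout actually declares.

// Sketch of the prepare handler; member and field names are assumptions.
paxos_protocol::status
acceptor::preparereq(std::string src, paxos_protocol::preparearg a,
                     paxos_protocol::prepareres &r)
{
  ScopedLock ml(&pxs_mutex);          // assumed per-acceptor mutex
  if (a.instance <= instance_h) {
    // The caller is behind: report the value already chosen for that instance.
    r.oldinstance = true;
    r.v_a = values[a.instance];       // assumed map of decided values
  } else if (a.n > n_h) {
    n_h = a.n;
    l->logprop(n_h);                  // persist the new highest proposal seen
    r.oldinstance = false;
    r.accept = true;                  // plays the role of prepare_ok
    r.n_a = n_a;
    r.v_a = v_a;
  } else {
    r.oldinstance = false;
    r.accept = false;                 // reject: a higher proposal was already seen
  }
  return paxos_protocol::OK;
}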

For a given instance of Paxos, potentially many nodes can make proposals, and each of these proposals has a unique proposal number. When comparing different proposals, the highest proposal number wins. To ensure that each proposal number is unique, each proposal consists of a number and the node's identifier. We provide you with a struct prop_t in paxos_protocol.h that you should use for proposal numbers; we also provide the > and >= operators for the class.
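For instance, a proposer can derive its next proposal number from the highest one it has seen so far. The sketch below assumes prop_t has a numeric part and a node-identifier part; the real definition and its comparison operators are already provided in paxos_protocol.h, so treat these field names as illustrative only.

#include <string>

// Sketch only: prop_t and its > / >= operators come from paxos_protocol.h;
// the field names below (n, m) are assumptions.
struct prop_t_sketch {
  unsigned n;     // monotonically increasing proposal counter
  std::string m;  // proposing node's identifier, breaks ties between nodes
};

// One way a proposer might pick a number strictly higher than any seen so
// far (my_n_h would track the highest proposal number observed).
prop_t_sketch next_proposal(const prop_t_sketch &my_n_h, const std::string &me)
{
  prop_t_sketch p;
  p.n = my_n_h.n + 1;  // larger numeric part than anything seen
  p.m = me;            // node id makes the pair unique across proposers
  return p;
}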

Each replica must log certain changes to its Paxos state (in particular the n_a, v_a, and n_h fields), as well as log every agreed value. The provided log class does this for you; please use it without modification, as the test program depends on its output being in a particular format.

Add the extra parameter rpcc::to(1000) to your RPC calls to prevent the RPC library from spending a long time attempting to contact crashed nodes.
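For example, a prepare RPC sent from the proposer's Phase 1 loop might look like the fragment below; the client handle (cl) and the argument and result variables are placeholders, and the point is only the trailing rpcc::to(1000) timeout.

// Fragment from inside the proposer's prepare loop; names are placeholders.
int ret = cl->call(paxos_protocol::preparereq, me, arg, res, rpcc::to(1000));
if (ret != paxos_protocol::OK) {
  // The node timed out or failed; treat it as not having responded.
}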

Your Job

The measure of success for this lab is to pass tests 0-7 of rsm_tester.pl. (The remaining tests are reserved for the next lab.) The tester starts 3 or 4 configuration servers, kills and restarts some of them, and checks that all servers indeed go through a unique sequence of view changes by examining their on-disk logs.
% ./rsm_tester.pl 0 1 2 3 4 5 6 7
test0...
...
test1...
...
test2...
...
test3...
...
test4...
...
test5...
...
test6...
...
test7...
...
tests done OK

Important: If rsm_tester.pl fails in the middle of a test, the remaining lock_server processes are not killed and the log files are not cleaned up (so you can debug the causes). Make sure you run 'killall lock_server; rm -f *.log' to clean up the lingering processes before running rsm_tester.pl again.

Detailed Guidance

We guide you through a series of steps to get this lab working incrementally.

Step One: Implement Paxos

Fill in the Paxos implementation in paxos.cc, following the pseudocode above. Do not worry about failures yet.

Use the RPC protocol we provide in paxos_protocol.h. In order to pass the tests, when the proposer sends an RPC, you should set an RPC timeout of 1000 milliseconds. Note that though the pseudocode shows different types of responses to each kind of RPC, our protocol combines these responses into one type of return structure. For example, the prepareres struct can act as a prepare_ok, an oldinstance, or a reject message, depending on the situation.
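Here is a hedged fragment showing how the proposer might interpret one such combined response; the prepareres field names (oldinstance, accept, n_a, v_a) and the helpers used are assumptions, not the exact handout interface.

// Fragment from the proposer's Phase 1; field and helper names are assumptions.
if (res.oldinstance) {
  // We are behind: adopt the value already chosen for this instance and stop.
  commit_locally(a.instance, res.v_a);
  return false;
} else if (res.accept) {
  // Counts as a prepare_ok: remember the highest previously accepted pair.
  prepare_ok_nodes.push_back(node);
  if (res.v_a != "" && res.n_a > highest_n_a) {
    highest_n_a = res.n_a;
    v_prime = res.v_a;      // must propose this value instead of our own
  }
} else {
  // Rejected: some other proposer with a higher number is active.
}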

You may find it helpful for debugging to look in the paxos-[port].log files, which are written to by log.cc. rsm_tester.pl does not remove these logs when a test fails, so that you can use the logs for debugging. rsm_tester.pl also redirects the stdout and stderr of your configuration server to lock_server-[arg1]-[arg2].log.

Upon completing this step, you should be able to pass 'rsm_tester.pl 0'. This test starts three configuration servers one after another and checks that all servers go through the same three views.

Step Two: Simple failures

Test whether your Paxos handles simple failures by running 'rsm_tester.pl 0 1 2'. You will not have to write any new code for this step if your code is already correct.

Step Three: Logging Paxos state

Modify your Paxos implementation to use the log class to log changes to n_h, and to n_a and v_a when they are updated. Convince yourself why these three values must be logged to disk if we want to re-start a previously crashed node correctly. We have provided the code to write and read logs in log.cc (see log::logprop() and log::logaccept()), so you just have to make sure to call the appropriate methods at the right times.
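For example, in the accept handler the newly accepted pair should be written to disk before the reply goes out. A hedged fragment follows; check log.{cc,h} for the exact method arguments, which are assumed here, as are the member and response field names.

// Fragment from the accept handler; the log call's arguments are assumptions.
if (a.n >= n_h) {
  n_a = a.n;
  v_a = a.v;
  l->logaccept(n_a, v_a);   // persist the accepted pair before replying
  r.accept = true;
} else {
  r.accept = false;
}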

Now you can run tests that involve restarting a node after it fails. In particular, you should be able to pass 'rsm_tester.pl 3 4'. In test 4, rsm_tester.pl starts three servers, kills the third server (the remaining two nodes should be able to agree on a new view), kills the second server (the remaining one node tries to run Paxos, but cannot succeed since no majority of nodes are present in the current view), restarts the third server (it will not help with the agreement since the third server is not in the current view), kills the third server, restarts the second server (now agreement can be reached), and finally restarts the third server.

Step Four: Complicated failures

Finally, you need to verify that your code handles some of the tricky corner cases that Paxos is supposed to deal with. Our test scripts do not test all possible corner cases, so you could still have a buggy Paxos implementation after this step, but you will have a good feel for the protocol.

In paxos.cc, we use two methods, breakpoint1() and breakpoint2(), to induce complex failures. The proposer::run function calls breakpoint1() just after completing Phase 1, but before starting Phase 2. Similarly, it calls breakpoint2() in between Phases 2 and 3. The RSM layer runs a small RPC server that accepts the rsm_test_protocol RPCs defined in rsm_protocol.h. The tester uses rsm_tester to send RPCs that cause the server to exit at the respective breakpoint.

  • Test 5: This test starts three nodes and kills the third node. The first node will become the leader to initiate Paxos, but the test will cause it to crash at breakpoint 1 (at the end of Phase 1). Then the test will restart the killed third node, which together with the remaining node should be able to finish Paxos (ignoring the failed first node) and complete the view change successfully. The script will verify that the Paxos logs show the correct view changes.

  • Test 6: This test starts four nodes one by one and kills the fourth node. The first node initiates Paxos as a leader, but the test causes it to fail at breakpoint 2 (after Phase 2). When the fourth node re-joins the system, the rest of the nodes should finish agreeing on the view originally proposed by the first node, before making a new view of their own.

  • Test 7: This test is identical to test 6, except that it kills all the remaining nodes after the first node exits. Then it restarts all slaves and checks that they first agree on the first node's proposed view before making a new view of their own.

By now, your code should reliably pass all required tests, i.e. 'rsm_tester.pl 0 1 2 3 4 5 6 7'.

Debugging Hints

  • Make sure you kill any remaining lock servers and remove all the log files (killall lock_server; rm -f *.log) before you start a new test run.
  • If a test fails, first check the Paxos logs (paxos-*.log) and make sure the sequence of proposals and views make sense. Do all the nodes (that didn't crash) go through the same sequence of views? Does the sequence of views make sense given what the test does?
  • If a server gets stuck, check whether one of the lock servers (particularly the leader) deadlocked or crashed when it shouldn't have. You can get a list of the process ids of running lock servers with `pgrep -s0 lock_server'. You can debug a running lock server with `gdb -p pid'.
  • Use printfs and check the relevant lock_server-*.log files to see what is going on. Some particularly important events are view changes, RSM requesting to add a node, and heartbeat events to remove nodes. In Paxos, print out important state variables (e.g., my_n, n_h, n_a, instance_h) and verify that they make sense.

Handin procedure

E-mail your code as a gzipped tar file to 6.824-submit@pdos.csail.mit.edu by the deadline at the top of the page. To do this, execute these commands:
% cd ~/lab
% ./stop.sh
% make clean
% rm core.*
% rm *.log
% cd ..
% tar czvf `whoami`-lab6.tgz lab/
or
% cd ~/6.824/lab
% make handin
That should produce a file called [your_user_name]-lab6.tgz in your lab directory. Attach that file to an email and send it to the 6.824 submit address.
Please post questions or comments on Piazza.