Kafka Source Code Reading: KafkaController (5)


Reassigning replicas

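When new machines are added to a cluster, the replica assignment of the partitions under a topic may need to be adjusted. Kafka does not rebalance replica assignments automatically based on load, so the cluster administrator has to adjust them manually.
The following example moves all replicas of the two topics foo1 and foo2 to broker 5 and broker 6.
First, provide a file that specifies which topics to migrate: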

> cat topics-to-move.json
{"topics": [{"topic": "foo1"},
            {"topic": "foo2"}],
 "version":1
}
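For a longer topic list, the file can be generated rather than typed by hand. The following is only a minimal sketch: the helper name build_topics_to_move is hypothetical and not part of Kafka; it simply emits JSON with the same fields as the file shown above.

```python
import json


def build_topics_to_move(topics, version=1):
    """Hypothetical helper: return the JSON text of a topics-to-move file."""
    return json.dumps(
        {"topics": [{"topic": name} for name in topics], "version": version}
    )


# Same two topics as in the example file above.
topics_json = build_topics_to_move(["foo1", "foo2"])
print(topics_json)
```

The printed text can be redirected into topics-to-move.json before running the tool.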

Once the file is ready, run the following command to generate Kafka's proposed replica assignment:

> bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --topics-to-move-json-file topics-to-move.json --broker-list "5,6" --generate
Current partition replica assignment

{"version":1,
 "partitions":[{"topic":"foo1","partition":2,"replicas":[1,2]},
               {"topic":"foo1","partition":0,"replicas":[3,4]},
               {"topic":"foo2","partition":2,"replicas":[1,2]},
               {"topic":"foo2","partition":0,"replicas":[3,4]},
               {"topic":"foo1","partition":1,"replicas":[2,3]},
               {"topic":"foo2","partition":1,"replicas":[2,3]}]
}

Proposed partition reassignment configuration

{"version":1,
 "partitions":[{"topic":"foo1","partition":2,"replicas":[5,6]},
               {"topic":"foo1","partition":0,"replicas":[5,6]},
               {"topic":"foo2","partition":2,"replicas":[5,6]},
               {"topic":"foo2","partition":0,"replicas":[5,6]},
               {"topic":"foo1","partition":1,"replicas":[5,6]},
               {"topic":"foo2","partition":1,"replicas":[5,6]}]
}

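The tool above only generates a proposed replica assignment; it does not actually perform the reassignment. Save the proposed reassignment generated above to the file expand-cluster-reassignment.json, then run the following command to execute it: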

> bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --reassignment-json-file expand-cluster-reassignment.json --execute
Current partition replica assignment

{"version":1,
 "partitions":[{"topic":"foo1","partition":2,"replicas":[1,2]},
               {"topic":"foo1","partition":0,"replicas":[3,4]},
               {"topic":"foo2","partition":2,"replicas":[1,2]},
               {"topic":"foo2","partition":0,"replicas":[3,4]},
               {"topic":"foo1","partition":1,"replicas":[2,3]},
               {"topic":"foo2","partition":1,"replicas":[2,3]}]
}

Save this to use as the --reassignment-json-file option during rollback
Successfully started reassignment of partitions
{"version":1,
 "partitions":[{"topic":"foo1","partition":2,"replicas":[5,6]},
               {"topic":"foo1","partition":0,"replicas":[5,6]},
               {"topic":"foo2","partition":2,"replicas":[5,6]},
               {"topic":"foo2","partition":0,"replicas":[5,6]},
               {"topic":"foo1","partition":1,"replicas":[5,6]},
               {"topic":"foo2","partition":1,"replicas":[5,6]}]
}

That is the whole process of moving replicas onto new brokers with the reassignment tool that Kafka provides.
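Before executing a reassignment it can help to see exactly which brokers gain and lose replicas for each partition. The sketch below is an illustration, not part of the Kafka tooling: the function names by_partition and replica_moves are hypothetical, and the data is the foo1 subset of the "current" and "proposed" JSON printed by the tool above.

```python
import json

# Subset (topic foo1 only) of the two JSON documents printed by the tool.
current = json.loads("""
{"version":1, "partitions":[
  {"topic":"foo1","partition":2,"replicas":[1,2]},
  {"topic":"foo1","partition":0,"replicas":[3,4]},
  {"topic":"foo1","partition":1,"replicas":[2,3]}]}
""")
proposed = json.loads("""
{"version":1, "partitions":[
  {"topic":"foo1","partition":2,"replicas":[5,6]},
  {"topic":"foo1","partition":0,"replicas":[5,6]},
  {"topic":"foo1","partition":1,"replicas":[5,6]}]}
""")


def by_partition(assignment):
    """Index an assignment document by (topic, partition)."""
    return {(p["topic"], p["partition"]): p["replicas"]
            for p in assignment["partitions"]}


def replica_moves(current_doc, proposed_doc):
    """For each proposed partition, list brokers to add and brokers to remove."""
    old, new = by_partition(current_doc), by_partition(proposed_doc)
    moves = {}
    for key in sorted(new):
        before = old.get(key, [])
        moves[key] = {
            "add": sorted(set(new[key]) - set(before)),
            "remove": sorted(set(before) - set(new[key])),
        }
    return moves


moves = replica_moves(current, proposed)
for key, change in moves.items():
    print(key, change)
```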

Implementation of replica reassignment

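Generating the proposed replica assignment is fairly simple: the replica assignment algorithm is run over the specified brokers. The replica assignment strategy was covered in Kafka Source Code Reading: KafkaController (3).
When the assignment is executed, the kafka-reassign-partitions.sh command writes the new assignment as JSON to the ZooKeeper path /admin/reassign_partitions, in the following format: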

{"partitions":[{"topic":"t1","partition":"p1","replicas":"r1"},{"topic":"t2","partition":"p2","replicas":"r2"}]}

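The PartitionsReassignedListener class is notified of the ZooKeeper change and its handleDataChange function is called. It reads the partition reassignment data from ZooKeeper and builds a ReassignedPartitionsContext, which contains two fields: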

// The new replica list
var newReplicas: Seq[Int] = Seq.empty,
// ISR listener, used to determine whether the newly added replicas have joined the ISR
var isrChangeListener: ReassignedPartitionsIsrChangeListener = null
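To make the data flow concrete, here is a rough Python sketch of turning the JSON stored under /admin/reassign_partitions into one context object per partition. It is not the Kafka code: the class ReassignmentContext and the function parse_reassignment are hypothetical stand-ins that only mirror the two fields listed above, and the payload is sample data in the format the tool emits.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class ReassignmentContext:
    """Hypothetical stand-in mirroring the two fields described above."""
    new_replicas: List[int]
    # No listener is attached yet at parse time.
    isr_change_listener: Optional[Any] = None


def parse_reassignment(json_text: str) -> Dict[Tuple[str, int], ReassignmentContext]:
    """Map (topic, partition) to a context holding the new replica list."""
    data = json.loads(json_text)
    return {
        (p["topic"], int(p["partition"])): ReassignmentContext(list(p["replicas"]))
        for p in data["partitions"]
    }


# Sample payload (assumed data, same shape as the tool output).
payload = (
    '{"version":1,"partitions":['
    '{"topic":"foo1","partition":0,"replicas":[5,6]},'
    '{"topic":"foo2","partition":1,"replicas":[5,6]}]}'
)
contexts = parse_reassignment(payload)
for key, ctx in contexts.items():
    print(key, ctx.new_replicas)
```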

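Replica reassignment is a fairly complex process, and the source code explains it in some detail. First, a few abbreviations:
RAR = reassigned replicas, i.e. the replicas being assigned; this can be understood as the new replica list
OAR = original list of replicas for the partition, i.e. the old replica list
AR = current assigned replicas, i.e. the current replica list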
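The reassignment steps are usually described with set-style expressions over these lists (RAR+OAR, RAR-OAR, OAR-RAR). As a small illustration, assuming a hypothetical partition whose old and new replica lists overlap on broker 3, they can be computed as follows; the helper names union and difference are made up for this sketch.

```python
def union(a, b):
    """Order-preserving union: all of a, then the elements of b not already present."""
    result = list(a)
    for x in b:
        if x not in result:
            result.append(x)
    return result


def difference(a, b):
    """Elements of a that are not in b, keeping the order of a."""
    return [x for x in a if x not in b]


# Hypothetical example values, not taken from the article.
oar = [1, 2, 3]   # old replica list
rar = [3, 4, 5]   # new replica list

expanded_ar = union(rar, oar)      # RAR+OAR: AR while the reassignment is in progress
to_catch_up = difference(rar, oar) # RAR-OAR: new replicas that must fetch from the leader
to_remove = difference(oar, rar)   # OAR-RAR: old replicas dropped at the end

print(expanded_ar, to_catch_up, to_remove)
```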
[Figure: replica reassign]
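The whole process can be roughly divided into three main steps:
1. The initiateReassignReplicasForTopicPartition function initializes the replica reassignment. It registers a listener on the ZooKeeper path /brokers/topics/[topic]/partitions/[partitionId]/state/ to watch for ISR changes.
2. First call to onPartitionReassignment: AR is updated to RAR+OAR, which updates both ZooKeeper and the controller context, and a LeaderAndIsr message is then sent to every replica in AR. Recall what a broker does when it receives this message: a follower adds a fetcher for the leader, and the leader updates its ISR and related state. The replicas in RAR-OAR start fetching messages from the leader (no election has taken place yet, so the leader is still in OAR).
3. Once all replicas in RAR have caught up with the leader and joined the ISR, ReassignedPartitionsIsrChangeListener fires and onPartitionReassignment is called a second time. The previous leader is in OAR, so a new leader has to be elected from RAR. The replicas in OAR-RAR are then removed from AR, going through the state changes OnlineReplica->OfflineReplica->NonExistentReplica. Finally, an UpdateMetadataRequest message is sent to all brokers.
onPartitionReassignment is called twice here. This is a little hard to follow at first, but it becomes clear once you are familiar with how LeaderAndIsrRequest messages are handled.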
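The two calls can be pictured with a toy model. The following is a heavily simplified sketch, not the controller code: PartitionState, start_reassignment and finish_reassignment are hypothetical names, the catch-up of the new replicas is simulated by editing the ISR directly, and the leader election is reduced to "pick the first replica in RAR", which is an assumption made for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionState:
    ar: list       # current assigned replicas
    leader: int
    isr: list      # in-sync replicas
    log: list = field(default_factory=list)


def start_reassignment(state, rar):
    """First call: expand AR to RAR+OAR and notify the replicas."""
    oar = list(state.ar)
    state.ar = list(rar) + [r for r in oar if r not in rar]
    state.log.append(f"LeaderAndIsr -> replicas {state.ar}")
    return oar


def finish_reassignment(state, oar, rar):
    """Second call: only valid once every replica in RAR is in the ISR."""
    if not all(r in state.isr for r in rar):
        raise ValueError("RAR has not fully joined the ISR yet")
    if state.leader not in rar:
        state.leader = rar[0]  # simplified election (assumption)
    removed = [r for r in oar if r not in rar]
    for r in removed:
        state.log.append(
            f"replica {r}: OnlineReplica -> OfflineReplica -> NonExistentReplica"
        )
    state.isr = [r for r in state.isr if r in rar]
    state.ar = list(rar)
    state.log.append("UpdateMetadataRequest -> all brokers")
    return removed


# foo1 partition 0 from the example: replicas move from [3,4] to [5,6].
state = PartitionState(ar=[3, 4], leader=3, isr=[3, 4])
rar = [5, 6]
oar = start_reassignment(state, rar)
ar_during = list(state.ar)

state.isr = [3, 4, 5, 6]  # simulate brokers 5 and 6 catching up
removed = finish_reassignment(state, oar, rar)

for line in state.log:
    print(line)
```

The split into two functions reflects why there are two calls: the second half cannot run until the ISR change shows that the new replicas have caught up.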
