Kafka Source Code Analysis (1): core.kafka.server.KafkaServer
Source: Internet · Editor: 程序博客网 · Date: 2024/06/05 02:47
Preface: In April I spent half a month reading the Kafka source code, submitted a few patches, and wrote a KIP (whether it gets accepted is another matter). Three months have passed since then, and the insights from that reading are starting to fade, so it is time to write things down to consolidate them. I plan to make this a series, starting with Kafka 0.8.2.2: its code base is small and bare, and later versions are built on the same design ideas. The articles will start from broker startup and gradually work outward to producing and consuming.
1. Kafka's startup command
To start a Kafka broker we use the kafka-server-start.sh script.
Every sh script under Kafka's bin directory ends up calling kafka-run-class.sh, and kafka-server-start.sh is no exception:
exec $base_dir/kafka-run-class.sh $EXTRA_ARGS kafka.Kafka $@
As you can see, starting Kafka runs the kafka.Kafka class.
The following snippet in kafka-server-start.sh is also worth noting:
if [ "x$KAFKA_HEAP_OPTS" = "x" ]; then
    export KAFKA_HEAP_OPTS="-Xmx1G -Xms1G"
fi
So the broker's JVM heap defaults to 1 GB.
Likewise, note this snippet in kafka-run-class.sh:
if [ -z "$KAFKA_HEAP_OPTS" ]; then
    KAFKA_HEAP_OPTS="-Xmx256M"
fi
For the other commands, such as creating a topic, kafka-run-class.sh defaults to a 256 MB JVM heap.
2. core.kafka.Kafka
The code of core.kafka.Kafka is fairly simple:
object Kafka extends Logging {

  def main(args: Array[String]): Unit = {
    if (args.length != 1) {
      println("USAGE: java [options] %s server.properties".format(classOf[KafkaServer].getSimpleName()))
      System.exit(1)
    }
    try {
      val props = Utils.loadProps(args(0))
      val serverConfig = new KafkaConfig(props)
      KafkaMetricsReporter.startReporters(serverConfig.props)
      val kafkaServerStartable = new KafkaServerStartable(serverConfig)

      // attach shutdown handler to catch control-c
      Runtime.getRuntime().addShutdownHook(new Thread() {
        override def run() = {
          kafkaServerStartable.shutdown
        }
      })

      kafkaServerStartable.startup
      kafkaServerStartable.awaitShutdown
    }
    catch {
      case e: Throwable => fatal(e)
    }
    System.exit(0)
  }
}
It first checks that exactly one argument, the server.properties path, was supplied, loads it into a KafkaConfig object serverConfig,
and creates a KafkaServerStartable object from that config.
A shutdown hook registered via Runtime.getRuntime().addShutdownHook calls kafkaServerStartable.shutdown when the JVM terminates, e.g. on Ctrl-C.
It then runs kafkaServerStartable.startup followed by kafkaServerStartable.awaitShutdown.
Nothing more to see here; next, let's step into KafkaServerStartable.
3. core.kafka.server.KafkaServerStartable
There is not much to this class: it instantiates a KafkaServer object,
and all of KafkaServerStartable's methods simply delegate to the corresponding KafkaServer methods.
4. core.kafka.server.KafkaServer
This class is the heart of the broker.
First, the fields it defines:
private var isShuttingDown = new AtomicBoolean(false)
private var shutdownLatch = new CountDownLatch(1)
private var startupComplete = new AtomicBoolean(false)
val brokerState: BrokerState = new BrokerState
val correlationId: AtomicInteger = new AtomicInteger(0)
var socketServer: SocketServer = null
var requestHandlerPool: KafkaRequestHandlerPool = null
var logManager: LogManager = null
var offsetManager: OffsetManager = null
var kafkaHealthcheck: KafkaHealthcheck = null
var topicConfigManager: TopicConfigManager = null
var replicaManager: ReplicaManager = null
var apis: KafkaApis = null
var kafkaController: KafkaController = null
// number of threads for background tasks, 10 by default
val kafkaScheduler = new KafkaScheduler(config.backgroundThreads)
var zkClient: ZkClient = null
These fields reach deep into the code base; refer back to this list as each one comes up.
One worth explaining now is shutdownLatch, a CountDownLatch whose count is set to 1 during startup. The line in core.kafka.Kafka

kafkaServerStartable.awaitShutdown

keeps the main thread from exiting: awaitShutdown blocks until the latch reaches 0. When shutdown() executes, it calls shutdownLatch.countDown(), dropping the count to 0, so the main thread stops waiting and the process ends.
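This latch handshake can be sketched in plain Java (class and method names here are illustrative, not Kafka's actual code):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Minimal sketch of the startup/awaitShutdown/shutdown handshake:
// the main thread parks on a CountDownLatch initialized to 1, and
// shutdown() counts it down to 0, releasing awaitShutdown().
public class ServerLatchSketch {
    private final CountDownLatch shutdownLatch = new CountDownLatch(1);

    // Called by the shutdown hook: release the latch so the main thread can exit.
    public void shutdown() {
        shutdownLatch.countDown(); // count goes 1 -> 0
    }

    // Blocks the calling (main) thread until shutdown() has run.
    // Returns true if the latch reached 0 within the timeout.
    public boolean awaitShutdown(long timeoutMs) throws InterruptedException {
        return shutdownLatch.await(timeoutMs, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        ServerLatchSketch server = new ServerLatchSketch();
        // Registering shutdown() as a JVM shutdown hook mirrors what
        // core.kafka.Kafka does with kafkaServerStartable.shutdown.
        Runtime.getRuntime().addShutdownHook(new Thread(server::shutdown));
        // Trigger shutdown() directly so this demo terminates on its own.
        server.shutdown();
        System.out.println("released: " + server.awaitShutdown(1000)); // prints "released: true"
    }
}
```

The timeout overload is used here only so the demo cannot hang; Kafka's real awaitShutdown waits indefinitely.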
4.1 startup
First, the code:
def startup() {
  try {
    info("starting")
    brokerState.newState(Starting)
    isShuttingDown = new AtomicBoolean(false)
    shutdownLatch = new CountDownLatch(1)

    /* start scheduler */
    kafkaScheduler.startup()

    /* setup zookeeper */
    zkClient = initZk()

    /* start log manager */
    logManager = createLogManager(zkClient, brokerState)
    logManager.startup()

    socketServer = new SocketServer(config.brokerId,
                                    config.hostName,
                                    config.port,
                                    config.numNetworkThreads,
                                    config.queuedMaxRequests,
                                    config.socketSendBufferBytes,
                                    config.socketReceiveBufferBytes,
                                    config.socketRequestMaxBytes,
                                    config.maxConnectionsPerIp,
                                    config.connectionsMaxIdleMs,
                                    config.maxConnectionsPerIpOverrides)
    socketServer.startup()

    replicaManager = new ReplicaManager(config, time, zkClient, kafkaScheduler, logManager, isShuttingDown)

    /* start offset manager */
    offsetManager = createOffsetManager()

    kafkaController = new KafkaController(config, zkClient, brokerState)

    /* start processing requests */
    apis = new KafkaApis(socketServer.requestChannel, replicaManager, offsetManager, zkClient, config.brokerId, config, kafkaController)
    requestHandlerPool = new KafkaRequestHandlerPool(config.brokerId, socketServer.requestChannel, apis, config.numIoThreads)
    brokerState.newState(RunningAsBroker)

    Mx4jLoader.maybeLoad()

    replicaManager.startup()

    kafkaController.startup()

    topicConfigManager = new TopicConfigManager(zkClient, logManager)
    topicConfigManager.startup()

    /* tell everyone we are alive */
    kafkaHealthcheck = new KafkaHealthcheck(config.brokerId, config.advertisedHostName, config.advertisedPort, config.zkSessionTimeoutMs, zkClient)
    kafkaHealthcheck.startup()

    registerStats()
    startupComplete.set(true)
    info("started")
  }
  catch {
    case e: Throwable =>
      fatal("Fatal error during KafkaServer startup. Prepare to shutdown", e)
      shutdown()
      throw e
  }
}
Let's walk through what startup() does.
It first sets brokerState to Starting. brokerState is a BrokerState object with seven possible states, as the diagram in the source shows:
                +-----------+
                |Not Running|
                +-----+-----+
                      |
                      v
                +-----+-----+
                |Starting   +--+
                +-----+-----+  |  +----+------------+
                      |        +->+RecoveringFrom   |
                      v           |UncleanShutdown  |
+----------+    +-----+-----+    +-------+---------+
|RunningAs |    |RunningAs  |            |
|Controller+<-->+Broker     +<-----------+
+----+-----+    +-----+-----+
     |                |
     |                v
     |      +---------+--------+
     +----->+PendingControlled |
            |Shutdown          |
            +---------+--------+
                      |
                      v
             +--------+-------+
             |BrokerShutting  |
             |Down            |
             +--------+-------+
                      |
                      v
               +------+----+
               |Not Running|
               +-----------+
The seven states are:

NotRunning: the broker is not running
Starting: starting up
RecoveringFromUncleanShutdown: recovering from an unclean shutdown
RunningAsBroker: running as a broker
RunningAsController: running as the controller
PendingControlledShutdown: has reported its shutdown to the controller and is waiting
BrokerShuttingDown: shutting down
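Restated as code, the states and the transitions read off the diagram might look like the following Java sketch. Note the transition table is illustrative: the real BrokerStates.scala only documents the diagram in a comment and does not enforce transitions.

```java
import java.util.EnumSet;
import java.util.Set;

// Sketch of the broker state machine from the diagram above.
public class BrokerStateSketch {
    public enum State {
        NOT_RUNNING, STARTING, RECOVERING_FROM_UNCLEAN_SHUTDOWN,
        RUNNING_AS_BROKER, RUNNING_AS_CONTROLLER,
        PENDING_CONTROLLED_SHUTDOWN, BROKER_SHUTTING_DOWN
    }

    // Legal next states for each state, transcribed from the ASCII diagram.
    public static Set<State> validNext(State s) {
        switch (s) {
            case NOT_RUNNING:
                return EnumSet.of(State.STARTING);
            case STARTING:
                return EnumSet.of(State.RECOVERING_FROM_UNCLEAN_SHUTDOWN,
                                  State.RUNNING_AS_BROKER);
            case RECOVERING_FROM_UNCLEAN_SHUTDOWN:
                return EnumSet.of(State.RUNNING_AS_BROKER);
            case RUNNING_AS_BROKER:
                return EnumSet.of(State.RUNNING_AS_CONTROLLER,
                                  State.PENDING_CONTROLLED_SHUTDOWN);
            case RUNNING_AS_CONTROLLER:
                return EnumSet.of(State.RUNNING_AS_BROKER,
                                  State.PENDING_CONTROLLED_SHUTDOWN);
            case PENDING_CONTROLLED_SHUTDOWN:
                return EnumSet.of(State.BROKER_SHUTTING_DOWN);
            case BROKER_SHUTTING_DOWN:
                return EnumSet.of(State.NOT_RUNNING);
            default:
                return EnumSet.noneOf(State.class);
        }
    }
}
```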
After that come the creations and startups of the various subsystems: kafkaScheduler, logManager, socketServer, replicaManager, kafkaController, apis, topicConfigManager, and kafkaHealthcheck. Each will be introduced in turn later in the series.
registerStats() just registers JMX metrics; nothing of interest there.
Finally, startupComplete is set to true.
4.2 Connecting to ZooKeeper
During startup a ZooKeeper client object is created by calling the initZk() method; the code is as follows:
private def initZk(): ZkClient = {
  info("Connecting to zookeeper on " + config.zkConnect)
  val chroot = {
    // if zookeeper.connect contains a "/", take everything after it --
    // this is the chroot directory Kafka uses inside ZooKeeper
    if (config.zkConnect.indexOf("/") > 0)
      config.zkConnect.substring(config.zkConnect.indexOf("/"))
    else
      ""
  }
  // when a chroot is configured, make sure the path exists before connecting
  if (chroot.length > 1) {
    val zkConnForChrootCreation = config.zkConnect.substring(0, config.zkConnect.indexOf("/"))
    val zkClientForChrootCreation = new ZkClient(zkConnForChrootCreation, config.zkSessionTimeoutMs, config.zkConnectionTimeoutMs, ZKStringSerializer)
    ZkUtils.makeSurePersistentPathExists(zkClientForChrootCreation, chroot)
    info("Created zookeeper path " + chroot)
    zkClientForChrootCreation.close()
  }
  val zkClient = new ZkClient(config.zkConnect, config.zkSessionTimeoutMs, config.zkConnectionTimeoutMs, ZKStringSerializer)
  ZkUtils.setupCommonPaths(zkClient)
  zkClient
}
The connection code itself is unremarkable; the only special handling is for a zookeeper.connect that stores Kafka under its own directory (a chroot), e.g.:
127.0.0.1:2181/kafka
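The host-list/chroot split that initZk() performs can be sketched as two small helpers (a hypothetical ZkConnectSketch class, not Kafka's code):

```java
// Sketch of how initZk() splits zookeeper.connect into the host list and an
// optional chroot path, e.g. "127.0.0.1:2181/kafka" ->
// hosts "127.0.0.1:2181", chroot "/kafka".
public class ZkConnectSketch {
    // Everything from the first '/' onward is the chroot; "" if there is none.
    public static String chroot(String zkConnect) {
        int i = zkConnect.indexOf('/');
        return i > 0 ? zkConnect.substring(i) : "";
    }

    // The host:port list is everything before the chroot.
    public static String hosts(String zkConnect) {
        int i = zkConnect.indexOf('/');
        return i > 0 ? zkConnect.substring(0, i) : zkConnect;
    }

    public static void main(String[] args) {
        System.out.println(chroot("127.0.0.1:2181/kafka")); // prints "/kafka"
        System.out.println(hosts("127.0.0.1:2181/kafka"));  // prints "127.0.0.1:2181"
        System.out.println(chroot("127.0.0.1:2181"));       // prints "" (no chroot)
    }
}
```

In the real method, when a chroot is present, Kafka first connects using only the host list to create the chroot path, then reconnects with the full string so all subsequent paths are relative to the chroot.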
4.3 shutdown
def shutdown() {
  try {
    info("shutting down")
    // guarantees shutdown only runs once
    val canShutdown = isShuttingDown.compareAndSet(false, true)
    if (canShutdown) {
      // swallow runs the given function and, if it throws, catches the error and logs it
      Utils.swallow(controlledShutdown())
      brokerState.newState(BrokerShuttingDown)
      if(socketServer != null)
        Utils.swallow(socketServer.shutdown())
      if(requestHandlerPool != null)
        Utils.swallow(requestHandlerPool.shutdown())
      if(offsetManager != null)
        offsetManager.shutdown()
      Utils.swallow(kafkaScheduler.shutdown())
      if(apis != null)
        Utils.swallow(apis.close())
      if(replicaManager != null)
        Utils.swallow(replicaManager.shutdown())
      if(logManager != null)
        Utils.swallow(logManager.shutdown())
      if(kafkaController != null)
        Utils.swallow(kafkaController.shutdown())
      if(zkClient != null)
        Utils.swallow(zkClient.close())

      brokerState.newState(NotRunning)
      shutdownLatch.countDown()
      startupComplete.set(false)
      info("shut down completed")
    }
  }
  catch {
    case e: Throwable =>
      fatal("Fatal error during KafkaServer shutdown.", e)
      throw e
  }
}
It calls each module's shutdown method in turn; the modules themselves will be explained one by one later.
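Two idioms in this method are worth isolating: compareAndSet(false, true) makes shutdown run at most once even under concurrent calls, and swallow keeps one failing module from aborting the rest of the cleanup. A minimal Java sketch, with illustrative names:

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch (not Kafka's code) of the shutdown() guard and the swallow helper.
public class ShutdownSketch {
    private final AtomicBoolean isShuttingDown = new AtomicBoolean(false);
    public final AtomicInteger cleanupSteps = new AtomicInteger(0);

    // Runs an action, logging (here: printing) instead of propagating errors,
    // so later cleanup steps still execute.
    static void swallow(Runnable action) {
        try {
            action.run();
        } catch (Throwable t) {
            System.out.println("swallowed: " + t.getMessage());
        }
    }

    public void shutdown() {
        // Only the first caller flips false -> true; every later call is a no-op.
        if (isShuttingDown.compareAndSet(false, true)) {
            swallow(() -> { throw new RuntimeException("module A failed"); });
            swallow(() -> cleanupSteps.incrementAndGet()); // still reached
        }
    }
}
```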
4.4 controlledShutdown
shutdown() begins by calling the controlledShutdown method, which sends a request to the controller announcing that this broker is about to shut down:
private def controlledShutdown() {
  // tell the controller we are shutting down; on failure, wait a configured
  // backoff and retry a configured number of times, then give up on the
  // controlled shutdown
  if (startupComplete.get() && config.controlledShutdownEnable) {
    // number of retries set in the config file
    var remainingRetries = config.controlledShutdownMaxRetries
    info("Starting controlled shutdown")
    var channel : BlockingChannel = null
    var prevController : Broker = null
    var shutdownSuceeded : Boolean = false
    try {
      // update the broker state
      brokerState.newState(PendingControlledShutdown)
      // loop until the request succeeds or the retries are exhausted
      while (!shutdownSuceeded && remainingRetries > 0) {
        remainingRetries = remainingRetries - 1
        // read the current controller's broker id from zookeeper
        val controllerId = ZkUtils.getController(zkClient)
        ZkUtils.getBrokerInfo(zkClient, controllerId) match {
          case Some(broker) =>
            // reconnect if there is no channel to the controller yet, no
            // controller was recorded, or the recorded controller is stale;
            // in other words, if we already have a connection and the
            // controller has not changed, skip this block
            if (channel == null || prevController == null || !prevController.equals(broker)) {
              if (channel != null)
                channel.disconnect()
              channel = new BlockingChannel(broker.host, broker.port,
                BlockingChannel.UseDefaultBufferSize,
                BlockingChannel.UseDefaultBufferSize,
                config.controllerSocketTimeoutMs)
              channel.connect()
              prevController = broker
            }
          case None =>
            // ignore and retry
        }
        // send the request to the controller
        if (channel != null) {
          var response: Receive = null
          try {
            val request = new ControlledShutdownRequest(correlationId.getAndIncrement, config.brokerId)
            channel.send(request)
            response = channel.receive()
            val shutdownResponse = ControlledShutdownResponse.readFrom(response.buffer)
            // if the controller reported no error and no partitions remain, mark success
            if (shutdownResponse.errorCode == ErrorMapping.NoError && shutdownResponse.partitionsRemaining != null &&
                shutdownResponse.partitionsRemaining.size == 0) {
              shutdownSuceeded = true
              info("Controlled shutdown succeeded")
            }
            else {
              info("Remaining partitions to move: %s".format(shutdownResponse.partitionsRemaining.mkString(",")))
              info("Error code from controller: %d".format(shutdownResponse.errorCode))
            }
          }
          catch {
            case ioe: java.io.IOException =>
              channel.disconnect()
              channel = null
              warn("Error during controlled shutdown, possibly because leader movement took longer than the configured socket.timeout.ms: %s".format(ioe.getMessage))
          }
        }
        // on failure, sleep the configured backoff before retrying
        if (!shutdownSuceeded) {
          Thread.sleep(config.controlledShutdownRetryBackoffMs)
          warn("Retrying controlled shutdown after the previous attempt failed...")
        }
      }
    }
    finally {
      if (channel != null) {
        channel.disconnect()
        channel = null
      }
    }
    if (!shutdownSuceeded) {
      warn("Proceeding to do an unclean shutdown as all the controlled shutdown attempts failed")
    }
  }
}
What the controller actually does with a ControlledShutdownRequest will be covered in a later article.
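The retry skeleton of controlledShutdown can be distilled into a small Java sketch. The attempt callback and the parameter names are illustrative stand-ins for the real controller round-trip and for the controlled.shutdown.max.retries / controlled.shutdown.retry.backoff.ms configs; the real loop also sleeps after the final failed attempt.

```java
import java.util.function.BooleanSupplier;

// Sketch of the controlled-shutdown retry loop: attempt the leadership
// hand-off, and on failure sleep a backoff and retry up to a maximum,
// falling back to an unclean shutdown if every attempt fails.
public class ControlledShutdownSketch {
    public static boolean retryWithBackoff(BooleanSupplier attempt,
                                           int maxRetries,
                                           long backoffMs) throws InterruptedException {
        int remaining = maxRetries;
        boolean succeeded = false;
        while (!succeeded && remaining > 0) {
            remaining--;
            succeeded = attempt.getAsBoolean();
            if (!succeeded && remaining > 0) {
                Thread.sleep(backoffMs); // wait before asking the controller again
            }
        }
        return succeeded; // false -> proceed with an unclean shutdown
    }

    public static void main(String[] args) throws InterruptedException {
        int[] calls = {0};
        // Simulated hand-off that fails twice, then succeeds on the third try.
        boolean ok = retryWithBackoff(() -> ++calls[0] >= 3, 5, 10);
        System.out.println(ok + " after " + calls[0] + " attempts"); // prints "true after 3 attempts"
    }
}
```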