Unlike other message queues, Kafka leaves consumer state to be maintained externally (by the consumer itself, or by external storage such as ZooKeeper). This makes consumption more flexible, but it also brings problems: a client whose poll times out may be judged dead, its partitions get reassigned, and the messages are consumed again by other consumers; a client crash can likewise cause duplicate consumption.
What this article covers

Kafka consumers can be used in many different ways and with many different models. *This article focuses on manual offset committing and multi-threaded usage with the new consumer API introduced in Kafka 0.9.* Other consumer approaches, such as storing offsets externally, manually seeking to offsets, and manual partition assignment, will be covered in separate articles.
Single-threaded consumers

Below are two single-threaded consumers: one with auto-commit and one with manual commit.

1. Auto-commit: poll in a single thread, then consume
```java
Properties props = new Properties();
props.put("bootstrap.servers", servers);
props.put("group.id", "autoCommitGroup");
props.put("enable.auto.commit", "true");
props.put("auto.commit.interval.ms", "1000");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList(topic));
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    for (ConsumerRecord<String, String> record : records)
        System.out.printf("offset = %d, key = %s, value = %s%n",
                record.offset(), record.key(), record.value());
}
```
The auto-commit behavior is easy to misread: offsets are not committed by a background thread; the commit actually happens inside poll().
2. Manual commit: poll in a single thread, commit offsets only after consuming a batch of records
```java
Properties props = new Properties();
props.put("bootstrap.servers", servers);
props.put("group.id", "manualOffsetControlTest");
props.put("enable.auto.commit", "false");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList(topic));
final int minBatchSize = 200;
ArrayList<ConsumerRecord<String, String>> buffer = new ArrayList<>();
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    for (ConsumerRecord<String, String> record : records) {
        buffer.add(record);
    }
    if (buffer.size() >= minBatchSize) {
        doSomething(buffer);
        consumer.commitAsync();
        buffer.clear();
    }
}
```
Features of the new Kafka consumer

Before going further, a few features of the Kafka consumer need a brief introduction:

- The consumer API was rewritten in Kafka 0.9.
- The consumer maintains the current consumption state itself and is not thread-safe.
- The new consumer is built on a single-threaded model: offset auto-commit happens inside the poll method. From 0.9 through 0.10.0.1, the client heartbeat is also sent inside poll; starting with 0.10.1.0, heartbeats are sent asynchronously from a background thread.
- In 0.9 there is no way to cap the amount of data returned per poll, so a single poll returns everything from the last consumed position up to the latest position (or up to the maximum fetch size). From 0.10.0.1 on, max.poll.records can be set in the consumer properties to limit the maximum number of records returned per poll.
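For example, the cap can be set alongside the other consumer properties. The sketch below is mine: the broker address, group id, and the value 500 are placeholders, not values from this article's setup.

```java
import java.util.Properties;

// Sketch: consumer properties with a per-poll record cap (0.10.0.1+).
// Broker address and group id are placeholders.
class MaxPollConfig {
    static Properties consumerProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("group.id", "myGroup");                 // placeholder
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("max.poll.records", "500"); // at most 500 records per poll()
        return props;
    }
}
```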
My design

The Kafka version I use is 0.10.0.1, so I have the new consumer API and can limit the number of records returned per poll, but heartbeats and auto-commit still happen inside poll().

In the single-threaded examples above, if message processing takes too long, the interval between poll() calls grows too large: heartbeats are not sent in time, offsets cannot be committed, the client times out and is judged dead, and messages with uncommitted offsets are re-consumed by other consumers once the partitions are reassigned.

To prevent this, my design is:
- Use max.poll.records to cap the number of records each poll() returns.
- Separate polling from message processing, so that poll() runs as often as possible and heartbeats go out in time.
- Each processing thread handles exactly one partition; once it has processed a certain number of records, or a certain time has passed since the last commit, the poll thread commits the offset on its behalf.
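The last rule, commit when either a count threshold or a time threshold is reached, can be distilled into a small self-contained helper. This is a sketch of mine; the class and method names are illustrative and not part of the actual implementation.

```java
import java.time.Duration;
import java.time.Instant;

// Sketch of the commit-trigger rule: signal a commit after every
// `commitLength` processed records, or when `commitInterval` has elapsed
// since the last commit. Names are illustrative.
class CommitTrigger {
    private final long commitLength;
    private final Duration commitInterval;
    private long completed = 0;
    private Instant lastCommit;

    CommitTrigger(long commitLength, Duration commitInterval, Instant start) {
        this.commitLength = commitLength;
        this.commitInterval = commitInterval;
        this.lastCommit = start;
    }

    // Called once per processed record; returns true when the poll thread
    // should be asked to commit the current offset.
    boolean onRecordProcessed(Instant now) {
        completed++;
        boolean byCount = completed % commitLength == 0;
        boolean byTime = now.isAfter(lastCommit.plus(commitInterval));
        if (byCount || byTime) {
            lastCommit = now;
            return true;
        }
        return false;
    }
}
```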
The architecture works as follows. Suppose there are two consumer threads (MsgReceiver): one is assigned partitions 1 and 2, the other partitions 3 and 4.

- Multiple consumer threads poll messages in a while loop.
- Each consumer hands records to the RecordProcessor thread for the corresponding partition, i.e. one RecordProcessor thread processes records from exactly one partition.
- After a RecordProcessor has processed a certain number of records, or a certain time has passed since its last commit, it puts the current partition's offset onto the commit queue.
- Before each poll, the consumer (MsgReceiver) first drains the commit queue; if there are pending offsets, it commits them to Kafka, then continues polling.
Code implementation

1. Consumer task: MsgReceiver
```java
public class MsgReceiver implements Runnable {
    private static final Logger logger = LoggerFactory.getLogger(MsgReceiver.class);
    private BlockingQueue<Map<TopicPartition, OffsetAndMetadata>> commitQueue = new LinkedBlockingQueue<>();
    private Map<String, Object> consumerConfig;
    private String alarmTopic;
    private ConcurrentHashMap<TopicPartition, RecordProcessor> recordProcessorTasks;
    private ConcurrentHashMap<TopicPartition, Thread> recordProcessorThreads;

    public MsgReceiver(Map<String, Object> consumerConfig, String alarmTopic,
                       ConcurrentHashMap<TopicPartition, RecordProcessor> recordProcessorTasks,
                       ConcurrentHashMap<TopicPartition, Thread> recordProcessorThreads) {
        this.consumerConfig = consumerConfig;
        this.alarmTopic = alarmTopic;
        this.recordProcessorTasks = recordProcessorTasks;
        this.recordProcessorThreads = recordProcessorThreads;
    }

    @Override
    public void run() {
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerConfig);
        consumer.subscribe(Arrays.asList(alarmTopic));
        try {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    // before polling, commit any offsets the processors handed over
                    Map<TopicPartition, OffsetAndMetadata> toCommit = commitQueue.poll();
                    if (toCommit != null) {
                        logger.debug("commit TopicPartition offset to kafka: " + toCommit);
                        consumer.commitSync(toCommit);
                    }
                    ConsumerRecords<String, String> records = consumer.poll(100);
                    if (records.count() > 0) {
                        logger.debug("poll records size: " + records.count());
                    }
                    for (final ConsumerRecord<String, String> record : records) {
                        String topic = record.topic();
                        int partition = record.partition();
                        TopicPartition topicPartition = new TopicPartition(topic, partition);
                        // one RecordProcessor (and one thread) per partition, created lazily
                        RecordProcessor processTask = recordProcessorTasks.get(topicPartition);
                        if (processTask == null) {
                            processTask = new RecordProcessor(commitQueue);
                            recordProcessorTasks.put(topicPartition, processTask);
                            Thread thread = new Thread(processTask);
                            thread.setName("Thread-for " + topicPartition.toString());
                            logger.info("start Thread: " + thread.getName());
                            thread.start();
                            recordProcessorThreads.put(topicPartition, thread);
                        }
                        processTask.addRecordToQueue(record);
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                    logger.warn("MsgReceiver exception " + e + " ignore it");
                }
            }
        } finally {
            consumer.close();
        }
    }
}
```
2. Record processing task: RecordProcessor
```java
public class RecordProcessor implements Runnable {
    private static Logger logger = LoggerFactory.getLogger(RecordProcessor.class);
    private BlockingQueue<ConsumerRecord<String, String>> queue = new LinkedBlockingQueue<>();
    private BlockingQueue<Map<TopicPartition, OffsetAndMetadata>> commitQueue;
    private LocalDateTime lastTime = LocalDateTime.now();
    // commit after this many records, or after this much time since the last commit
    private long commitLength = 20L;
    private Duration commitTime = Duration.ofSeconds(2);
    private int completeTask = 0;
    // the last processed record whose offset has not yet been handed over for committing
    private ConsumerRecord<String, String> lastUncommittedRecord;

    public RecordProcessor(BlockingQueue<Map<TopicPartition, OffsetAndMetadata>> commitQueue) {
        this.commitQueue = commitQueue;
    }

    @Override
    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                // MILLISECONDS, not MICROSECONDS, otherwise this loop busy-waits
                ConsumerRecord<String, String> record = queue.poll(100, TimeUnit.MILLISECONDS);
                if (record != null) {
                    process(record);
                    this.completeTask++;
                    lastUncommittedRecord = record;
                }
                commitToQueue();
            }
        } catch (InterruptedException e) {
            logger.info(Thread.currentThread() + " is interrupted");
        }
    }

    private void process(ConsumerRecord<String, String> record) {
        System.out.println(record);
    }

    // hand the offset over to the poll thread when either threshold is reached
    private void commitToQueue() throws InterruptedException {
        if (lastUncommittedRecord == null) {
            return;
        }
        boolean arrivedCommitLength = this.completeTask % commitLength == 0;
        LocalDateTime currentTime = LocalDateTime.now();
        boolean arrivedTime = currentTime.isAfter(lastTime.plus(commitTime));
        if (arrivedCommitLength || arrivedTime) {
            lastTime = currentTime;
            long offset = lastUncommittedRecord.offset();
            int partition = lastUncommittedRecord.partition();
            String topic = lastUncommittedRecord.topic();
            TopicPartition topicPartition = new TopicPartition(topic, partition);
            logger.debug("partition: " + topicPartition + " submit offset: " + (offset + 1L) + " to consumer task");
            // commit offset + 1: the committed offset is the position of the next record to consume
            Map<TopicPartition, OffsetAndMetadata> map =
                    Collections.singletonMap(topicPartition, new OffsetAndMetadata(offset + 1L));
            commitQueue.put(map);
            lastUncommittedRecord = null;
        }
    }

    public void addRecordToQueue(ConsumerRecord<String, String> record) {
        try {
            queue.put(record);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
}
```
3. The managing object

It starts the MsgReceiver consumer threads, keeps references to them and to the RecordProcessor tasks and threads, and destroys all of them on shutdown.
```java
public class KafkaMultiProcessorTest {
    private static final Logger logger = LoggerFactory.getLogger(KafkaMultiProcessorTest.class);
    private String alarmTopic;
    private String servers;
    private String group;
    private Map<String, Object> consumerConfig;
    private Thread[] threads;
    // shared maps: the poll threads register processor tasks/threads here,
    // so they can be looked up and destroyed later
    private ConcurrentHashMap<TopicPartition, RecordProcessor> recordProcessorTasks = new ConcurrentHashMap<>();
    private ConcurrentHashMap<TopicPartition, Thread> recordProcessorThreads = new ConcurrentHashMap<>();

    public static void main(String[] args) {
        KafkaMultiProcessorTest test = new KafkaMultiProcessorTest();
        test.init();
    }

    public void init() {
        consumerConfig = getConsumerConfig();
        logger.debug("get kafka consumerConfig: " + consumerConfig.toString());
        int threadsNum = 3;
        logger.debug("create " + threadsNum + " threads to consume kafka warn msg");
        threads = new Thread[threadsNum];
        for (int i = 0; i < threadsNum; i++) {
            MsgReceiver msgReceiver = new MsgReceiver(consumerConfig, alarmTopic,
                    recordProcessorTasks, recordProcessorThreads);
            Thread thread = new Thread(msgReceiver);
            threads[i] = thread;
            thread.setName("alarm msg consumer " + i);
        }
        for (int i = 0; i < threadsNum; i++) {
            threads[i].start();
        }
        logger.debug("finish creating " + threadsNum + " threads to consume kafka warn msg");
    }

    public void destroy() {
        closeRecordProcessThreads();
        closeKafkaConsumer();
    }

    private void closeRecordProcessThreads() {
        logger.debug("start to interrupt record process threads");
        for (Map.Entry<TopicPartition, Thread> entry : recordProcessorThreads.entrySet()) {
            Thread thread = entry.getValue();
            thread.interrupt();
        }
        logger.debug("finish interrupting record process threads");
    }

    private void closeKafkaConsumer() {
        logger.debug("start to interrupt kafka consumer threads");
        for (int i = 0; i < threads.length; i++) {
            threads[i].interrupt();
        }
        logger.debug("finish interrupting consumer threads");
    }

    private Map<String, Object> getConsumerConfig() {
        return ImmutableMap.<String, Object>builder()
                .put("bootstrap.servers", servers)
                .put("group.id", group)
                .put("enable.auto.commit", "false")
                .put("session.timeout.ms", "30000")
                .put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
                .put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
                .put("max.poll.records", 1000)
                .build();
    }

    public void setAlarmTopic(String alarmTopic) {
        this.alarmTopic = alarmTopic;
    }

    public void setServers(String servers) {
        this.servers = servers;
    }

    public void setGroup(String group) {
        this.group = group;
    }
}
```
Shortcomings

The code above still has a flaw. The processing tasks and threads are kept in maps, so if a rebalance (for example, a new consumer instance joining) takes a partition away from this consumer, the corresponding processing task and thread are never destroyed. To react to partition reassignment, we can use org.apache.kafka.clients.consumer.ConsumerRebalanceListener.
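As a sketch of the cleanup such a listener should perform in onPartitionsRevoked: interrupt and forget the processor thread of every revoked partition. String keys stand in for TopicPartition here so the example stays self-contained; the class and method names are mine.

```java
import java.util.Collection;
import java.util.Map;

// Sketch of the cleanup a ConsumerRebalanceListener should do when
// partitions are revoked: interrupt and remove each partition's processor
// thread and task. String keys stand in for TopicPartition.
class ProcessorCleaner {
    static void onPartitionsRevoked(Collection<String> revoked,
                                    Map<String, Thread> processorThreads,
                                    Map<String, Runnable> processorTasks) {
        for (String tp : revoked) {
            Thread t = processorThreads.remove(tp);
            if (t != null) {
                t.interrupt();          // RecordProcessor.run() exits on interrupt
            }
            processorTasks.remove(tp);  // drop the task so a fresh one is
                                        // created if the partition comes back
        }
    }
}
```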