Spark Streaming Kafka OffsetOutOfRangeException: Analysis and Resolution
Ever since upgrading Spark from 1.3 to 1.6, Kafka streaming problems have come up frequently; here is the latest one.
The jobs use a Kafka DirectStream to read data from a topic and then process it. One test job had been stopped for several days; when it was restarted, it threw kafka.common.OffsetOutOfRangeException. This post records the analysis and resolution of that exception.
Exception analysis
Taken literally, the message says a Kafka topic offset is out of range. The job uses a Kafka DirectStream, and after each batch is processed successfully the corresponding offsets are committed to ZooKeeper. Just as with array indexing, an offset can be out of range at either end, giving two cases:
- Head out of range: the offset saved in ZooKeeper lies before the offset of the oldest message still present in the topic (zk_offset < earliest_offset);
- Tail out of range: the offset saved in ZooKeeper lies after the offset of the newest message in the topic (zk_offset > last_offset).
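As a minimal sketch (the class and method names here are invented for illustration, not from the job's code), the two cases boil down to simple comparisons against the partition's earliest and latest offsets:

```java
public class OffsetRangeCheck {

    // Head out of range: the stored offset points before the oldest retained message.
    static boolean headOutOfRange(long zkOffset, long earliestOffset) {
        return zkOffset < earliestOffset;
    }

    // Tail out of range: the stored offset points past the newest message.
    static boolean tailOutOfRange(long zkOffset, long latestOffset) {
        return zkOffset > latestOffset;
    }

    public static void main(String[] args) {
        // Suppose the broker now retains offsets [200, 500]
        System.out.println(headOutOfRange(100, 200)); // true: offset 100 was purged
        System.out.println(tailOutOfRange(300, 500)); // false: still in range
    }
}
```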
Because the code already uses the method from an earlier post, tail overflow is impossible, so the suspect was a head overflow.
What could cause a head overflow?
The Kafka broker configuration had been changed to keep messages for only 24 hours:
log.retention.hours=24 (the minimum age of a log file to be eligible for deletion)
So unconsumed data in Kafka must have been purged by the broker, leaving the offset stored in ZooKeeper to the left of the oldest surviving message's offset: an offset that used to be valid had become invalid.
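To make the failure mode concrete, here is a toy simulation (all numbers invented; a real broker purges whole log segments, this only illustrates the offset arithmetic) of how retention advances earliest_offset past a stale stored offset:

```java
public class RetentionDemo {

    // messageAgeHours[i] is the age of the message at offset i (older messages
    // have lower offsets, so ages are non-increasing). Return the earliest
    // offset that survives the retention window.
    static long earliestSurvivingOffset(long[] messageAgeHours, long retentionHours) {
        long earliest = 0;
        for (int i = 0; i < messageAgeHours.length; i++) {
            if (messageAgeHours[i] > retentionHours) {
                earliest = i + 1; // this message is eligible for deletion
            }
        }
        return earliest;
    }

    public static void main(String[] args) {
        long[] ages = {30, 26, 20, 5};  // offsets 0..3
        long earliest = earliestSurvivingOffset(ages, 24);
        System.out.println(earliest);   // 2: offsets 0 and 1 were purged
        // A zk_offset of 1 is now < earliest_offset -> OffsetOutOfRangeException
    }
}
```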
Verifying the guess
- Change the Kafka broker's retention time to 2 minutes in the configuration file kafka/config/server.properties (log.retention.hours=168 -> log.retention.minutes=2), then restart Kafka.
- Use zk shell commands to read the zk_offset saved for the job.
- Stop the Spark Streaming Kafka DirectStream job.
- Send data to the Kafka topic and wait a while (more than two minutes).
- Start the streaming job: the exception is reproduced.
This confirms the cause: because of the log.retention.hours setting, the Kafka broker deleted some of the topic's data, and within the retention window the streaming job never consumed the messages that were about to be purged. The offset in ZooKeeper therefore ended up to the left of earliest_offset, triggering the exception.
Solution
The first fix that comes to mind is to have the streaming job consume the topic's data promptly, so that the consumption lag never exceeds the log.retention configuration.
A better fix, however, is to let the job keep running even when this situation occurs, which means detecting zk_offset < earliest_offset and correcting zk_offset to a valid value.
This uses the same approach as the earlier post on solving Spark Streaming's 'numRecords must not be negative' problem.
Code:

```java
package com.frey.v1.utils.kafka;

import com.google.common.collect.Lists;
import com.google.common.collect.Maps;
import kafka.api.PartitionOffsetRequestInfo;
import kafka.cluster.Broker;
import kafka.common.TopicAndPartition;
import kafka.javaapi.*;
import kafka.javaapi.consumer.SimpleConsumer;

import java.util.Date;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * KafkaOffsetTool
 *
 * @author FREY
 * @date 2016/4/11
 */
public class KafkaOffsetTool {

  private static KafkaOffsetTool instance;
  final int TIMEOUT = 100000;
  final int BUFFERSIZE = 64 * 1024;

  private KafkaOffsetTool() {
  }

  public static synchronized KafkaOffsetTool getInstance() {
    if (instance == null) {
      instance = new KafkaOffsetTool();
    }
    return instance;
  }

  /**
   * Fetch the latest (tail) offset of every partition of the given topics.
   */
  public Map<TopicAndPartition, Long> getLastOffset(String brokerList,
      List<String> topics, String groupId) {
    Map<TopicAndPartition, Long> topicAndPartitionLongMap = Maps.newHashMap();
    Map<TopicAndPartition, Broker> topicAndPartitionBrokerMap =
        KafkaOffsetTool.getInstance().findLeader(brokerList, topics);

    for (Map.Entry<TopicAndPartition, Broker> topicAndPartitionBrokerEntry :
        topicAndPartitionBrokerMap.entrySet()) {
      // query the partition's leader broker
      Broker leaderBroker = topicAndPartitionBrokerEntry.getValue();
      SimpleConsumer simpleConsumer = new SimpleConsumer(leaderBroker.host(),
          leaderBroker.port(), TIMEOUT, BUFFERSIZE, groupId);
      long readOffset = getTopicAndPartitionLastOffset(simpleConsumer,
          topicAndPartitionBrokerEntry.getKey(), groupId);
      topicAndPartitionLongMap.put(topicAndPartitionBrokerEntry.getKey(), readOffset);
    }
    return topicAndPartitionLongMap;
  }

  /**
   * Fetch the earliest (head) offset of every partition of the given topics.
   */
  public Map<TopicAndPartition, Long> getEarliestOffset(String brokerList,
      List<String> topics, String groupId) {
    Map<TopicAndPartition, Long> topicAndPartitionLongMap = Maps.newHashMap();
    Map<TopicAndPartition, Broker> topicAndPartitionBrokerMap =
        KafkaOffsetTool.getInstance().findLeader(brokerList, topics);

    for (Map.Entry<TopicAndPartition, Broker> topicAndPartitionBrokerEntry :
        topicAndPartitionBrokerMap.entrySet()) {
      Broker leaderBroker = topicAndPartitionBrokerEntry.getValue();
      SimpleConsumer simpleConsumer = new SimpleConsumer(leaderBroker.host(),
          leaderBroker.port(), TIMEOUT, BUFFERSIZE, groupId);
      long readOffset = getTopicAndPartitionEarliestOffset(simpleConsumer,
          topicAndPartitionBrokerEntry.getKey(), groupId);
      topicAndPartitionLongMap.put(topicAndPartitionBrokerEntry.getKey(), readOffset);
    }
    return topicAndPartitionLongMap;
  }

  /**
   * Find the leader broker of every TopicAndPartition.
   */
  private Map<TopicAndPartition, Broker> findLeader(String brokerList, List<String> topics) {
    String[] brokerUrlArray = getBrokerUrlFromBrokerList(brokerList);
    Map<String, Integer> brokerPortMap = getPortFromBrokerList(brokerList);

    Map<TopicAndPartition, Broker> topicAndPartitionBrokerMap = Maps.newHashMap();
    for (String broker : brokerUrlArray) {
      SimpleConsumer consumer = null;
      try {
        consumer = new SimpleConsumer(broker, brokerPortMap.get(broker), TIMEOUT,
            BUFFERSIZE, "leaderLookup" + new Date().getTime());
        TopicMetadataRequest req = new TopicMetadataRequest(topics);
        TopicMetadataResponse resp = consumer.send(req);
        List<TopicMetadata> metaData = resp.topicsMetadata();
        for (TopicMetadata item : metaData) {
          for (PartitionMetadata part : item.partitionsMetadata()) {
            TopicAndPartition topicAndPartition =
                new TopicAndPartition(item.topic(), part.partitionId());
            topicAndPartitionBrokerMap.put(topicAndPartition, part.leader());
          }
        }
      } catch (Exception e) {
        e.printStackTrace();
      } finally {
        if (consumer != null)
          consumer.close();
      }
    }
    return topicAndPartitionBrokerMap;
  }

  /**
   * Get the latest offset of one partition.
   */
  private long getTopicAndPartitionLastOffset(SimpleConsumer consumer,
      TopicAndPartition topicAndPartition, String clientName) {
    Map<TopicAndPartition, PartitionOffsetRequestInfo> requestInfo =
        new HashMap<TopicAndPartition, PartitionOffsetRequestInfo>();
    requestInfo.put(topicAndPartition,
        new PartitionOffsetRequestInfo(kafka.api.OffsetRequest.LatestTime(), 1));
    OffsetRequest request = new OffsetRequest(requestInfo,
        kafka.api.OffsetRequest.CurrentVersion(), clientName);
    OffsetResponse response = consumer.getOffsetsBefore(request);
    if (response.hasError()) {
      System.out.println("Error fetching offset data from the broker. Reason: "
          + response.errorCode(topicAndPartition.topic(), topicAndPartition.partition()));
      return 0;
    }
    long[] offsets = response.offsets(topicAndPartition.topic(), topicAndPartition.partition());
    return offsets[0];
  }

  /**
   * Get the earliest offset of one partition.
   */
  private long getTopicAndPartitionEarliestOffset(SimpleConsumer consumer,
      TopicAndPartition topicAndPartition, String clientName) {
    Map<TopicAndPartition, PartitionOffsetRequestInfo> requestInfo =
        new HashMap<TopicAndPartition, PartitionOffsetRequestInfo>();
    requestInfo.put(topicAndPartition,
        new PartitionOffsetRequestInfo(kafka.api.OffsetRequest.EarliestTime(), 1));
    OffsetRequest request = new OffsetRequest(requestInfo,
        kafka.api.OffsetRequest.CurrentVersion(), clientName);
    OffsetResponse response = consumer.getOffsetsBefore(request);
    if (response.hasError()) {
      System.out.println("Error fetching offset data from the broker. Reason: "
          + response.errorCode(topicAndPartition.topic(), topicAndPartition.partition()));
      return 0;
    }
    long[] offsets = response.offsets(topicAndPartition.topic(), topicAndPartition.partition());
    return offsets[0];
  }

  /**
   * Extract the broker hosts from a "host:port,host:port" broker list.
   */
  private String[] getBrokerUrlFromBrokerList(String brokerlist) {
    String[] brokers = brokerlist.split(",");
    for (int i = 0; i < brokers.length; i++) {
      brokers[i] = brokers[i].split(":")[0];
    }
    return brokers;
  }

  /**
   * Map each broker host to its port.
   */
  private Map<String, Integer> getPortFromBrokerList(String brokerlist) {
    Map<String, Integer> map = new HashMap<String, Integer>();
    String[] brokers = brokerlist.split(",");
    for (String item : brokers) {
      String[] itemArr = item.split(":");
      if (itemArr.length > 1) {
        map.put(itemArr[0], Integer.parseInt(itemArr[1]));
      }
    }
    return map;
  }

  public static void main(String[] args) {
    List<String> topics = Lists.newArrayList();
    topics.add("my_topic");
    // topics.add("bugfix");
    Map<TopicAndPartition, Long> topicAndPartitionLongMap =
        KafkaOffsetTool.getInstance().getEarliestOffset(
            "broker1:9092,broker2:9092", topics, "com.frey.group");
    for (Map.Entry<TopicAndPartition, Long> entry : topicAndPartitionLongMap.entrySet()) {
      System.out.println(entry.getKey().topic() + "-"
          + entry.getKey().partition() + ":" + entry.getValue());
    }
  }
}
```
Core offset-correction code:

```java
/** Correct the offsets (fromOffsets holds the offsets read back from ZooKeeper) */
// latest offsets
Map<TopicAndPartition, Long> latestTopicAndPartitionLongMap =
    KafkaOffsetTool.getInstance().getLastOffset(
        kafkaParams.get("metadata.broker.list"),
        Lists.newArrayList(topicsSet),
        kafkaParams.get(Constants.KAFKA_CONSUMER_GROUP_ID));

// earliest offsets
Map<TopicAndPartition, Long> earliestTopicAndPartitionLongMap =
    KafkaOffsetTool.getInstance().getEarliestOffset(
        kafkaParams.get("metadata.broker.list"),
        Lists.newArrayList(topicsSet),
        kafkaParams.get(Constants.KAFKA_CONSUMER_GROUP_ID));

for (Map.Entry<TopicAndPartition, Long> topicAndPartitionLongEntry : fromOffsets.entrySet()) {
  long zkOffset = topicAndPartitionLongEntry.getValue();
  long latestOffset = latestTopicAndPartitionLongMap.get(topicAndPartitionLongEntry.getKey());
  long earliestOffset = earliestTopicAndPartitionLongMap.get(topicAndPartitionLongEntry.getKey());

  // zkOffset is outside the range of available message offsets
  if (zkOffset > latestOffset || zkOffset < earliestOffset) {
    // reset it to earliestOffset
    logger.warn("Correcting offset: " + zkOffset + " -> " + earliestOffset);
    topicAndPartitionLongEntry.setValue(earliestOffset);
  }
}
```