Using the Kafka 0.8 Simple Consumer API

  • Using the SimpleConsumer
    • Why use the SimpleConsumer
    • Code Example
      • Finding the Lead Broker for a Topic and Partition
      • Finding Starting Offset for Reads
      • Error Handling
      • Reading the Data

Using the SimpleConsumer

Why use the SimpleConsumer

The main reason to use the SimpleConsumer is that you want finer-grained control over partition consumption than consumer groups provide. For example, you may want to:

  • Read a message multiple times
  • Consume only some of the partitions of a topic in a process
  • Use transaction management to ensure a message is processed once and only once

Downsides of using the SimpleConsumer:
- It requires considerably more development work than using consumer groups
- You must track offsets in your application so you know where to resume consuming
- You must find out which broker is the leader of each partition of the topic
- You must handle leader changes

Steps for using the SimpleConsumer (a sketch tying these steps together follows the list):
- Find an active broker and determine which broker is the leader for the topic's partition
- Determine which brokers store the replicas of the topic's partition
- Fetch the data
- Detect and recover from leader changes
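
Below is a minimal sketch of how a driver method might tie these four steps together. It is not taken from this article: findLeader(), getLastOffset() and findNewLeader() are the helpers developed in the sections that follow, m_replicaBrokers and the buffer sizes come from those code samples, and everything else is illustrative.

// A minimal driver sketch (not from the original article) tying the four steps together.
// findLeader(), getLastOffset() and findNewLeader() are the helpers shown below;
// the fetch-loop body is the code under "Reading the Data".
public void run(long a_maxReads, String a_topic, int a_partition,
                List<String> a_seedBrokers, int a_port) throws Exception {
    // Step 1: query a seed broker for metadata and locate the partition leader
    PartitionMetadata metadata = findLeader(a_seedBrokers, a_port, a_topic, a_partition);
    if (metadata == null || metadata.leader() == null) {
        System.out.println("Can't find leader for topic/partition. Exiting");
        return;
    }
    // Step 2: findLeader() also recorded the replica brokers in m_replicaBrokers
    String leadBroker = metadata.leader().host();
    String clientName = "Client_" + a_topic + "_" + a_partition;

    SimpleConsumer consumer = new SimpleConsumer(leadBroker, a_port, 100000, 64 * 1024, clientName);
    long readOffset = getLastOffset(consumer, a_topic, a_partition,
            kafka.api.OffsetRequest.EarliestTime(), clientName);

    int numErrors = 0;
    while (a_maxReads > 0) {
        if (consumer == null) {
            consumer = new SimpleConsumer(leadBroker, a_port, 100000, 64 * 1024, clientName);
        }
        // Step 3: build a FetchRequest at readOffset and process the returned messages
        //         (the code under "Reading the Data" goes here).
        // Step 4: if fetchResponse.hasError(), close the consumer and call
        //         findNewLeader() (see "Error Handling").
        break; // placeholder: the real loop body decrements a_maxReads as messages are read
    }
    if (consumer != null) consumer.close();
}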

Code Example

Finding the Lead Broker for a Topic and Partition

The easiest way is to pass a set of known brokers into your application logic, either through a properties file or via command-line arguments. This does not have to be every broker in the cluster, just a seed set from which you can start looking for a live broker to query for leader information (i.e., only metadata is requested from these seed brokers).

private PartitionMetadata findLeader(List<String> a_seedBrokers, int a_port, String a_topic, int a_partition) {
    PartitionMetadata returnMetaData = null;
    loop:
    for (String seed : a_seedBrokers) {
        SimpleConsumer consumer = null;
        try {
            consumer = new SimpleConsumer(seed, a_port, 100000, 64 * 1024, "leaderLookup");
            List<String> topics = Collections.singletonList(a_topic);
            TopicMetadataRequest req = new TopicMetadataRequest(topics);
            kafka.javaapi.TopicMetadataResponse resp = consumer.send(req);
            // Ask the connected broker for detailed metadata about the topic
            List<TopicMetadata> metaData = resp.topicsMetadata();
            // Walk through all partition metadata until we find the partition we are interested in
            for (TopicMetadata item : metaData) {
                for (PartitionMetadata part : item.partitionsMetadata()) {
                    if (part.partitionId() == a_partition) {
                        returnMetaData = part;
                        break loop;
                    }
                }
            }
        } catch (Exception e) {
            System.out.println("Error communicating with Broker [" + seed + "] to find Leader for [" + a_topic
                    + ", " + a_partition + "] Reason: " + e);
        } finally {
            if (consumer != null) consumer.close();
        }
    }
    if (returnMetaData != null) {
        m_replicaBrokers.clear();
        for (kafka.cluster.Broker replica : returnMetaData.replicas()) {
            m_replicaBrokers.add(replica.host());
        }
    }
    return returnMetaData;
}
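
A possible way to call this helper looks like the following; the broker hostnames, port and topic are placeholders, not values from the article.

// Illustrative call to findLeader(); broker names, port and topic are placeholders.
List<String> seedBrokers = Arrays.asList("broker1.example.com", "broker2.example.com");
PartitionMetadata metadata = findLeader(seedBrokers, 9092, "my-topic", 0);
if (metadata == null || metadata.leader() == null) {
    throw new RuntimeException("Can't find leader for my-topic, partition 0");
}
String leadBroker = metadata.leader().host();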

Finding Starting Offset for Reads

Kafka provides two constants to help: kafka.api.OffsetRequest.EarliestTime() finds the beginning of the data in the log and starts reading from there, while kafka.api.OffsetRequest.LatestTime() reads only newly arriving messages.

public static long getLastOffset(SimpleConsumer consumer, String topic, int partition,
                                 long whichTime, String clientName) {
    TopicAndPartition topicAndPartition = new TopicAndPartition(topic, partition);
    Map<TopicAndPartition, PartitionOffsetRequestInfo> requestInfo = new HashMap<TopicAndPartition, PartitionOffsetRequestInfo>();
    requestInfo.put(topicAndPartition, new PartitionOffsetRequestInfo(whichTime, 1));
    kafka.javaapi.OffsetRequest request = new kafka.javaapi.OffsetRequest(requestInfo, kafka.api.OffsetRequest.CurrentVersion(), clientName);
    OffsetResponse response = consumer.getOffsetsBefore(request);

    if (response.hasError()) {
        System.out.println("Error fetching data Offset Data the Broker. Reason: " + response.errorCode(topic, partition));
        return 0;
    }
    long[] offsets = response.offsets(topic, partition);
    return offsets[0];
}
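
For example, a small sketch of picking a starting offset with the two constants; the host, port, topic and client name below are placeholders rather than values from the article.

// Illustrative only: pick a starting offset for partition 0 of "my-topic";
// host, port, topic and client name are placeholders.
SimpleConsumer consumer = new SimpleConsumer("broker1.example.com", 9092, 100000, 64 * 1024, "offsetLookup");
// Oldest message still retained in the log:
long earliest = getLastOffset(consumer, "my-topic", 0, kafka.api.OffsetRequest.EarliestTime(), "offsetLookup");
// Offset of the next message to be produced, i.e. read only new data:
long latest = getLastOffset(consumer, "my-topic", 0, kafka.api.OffsetRequest.LatestTime(), "offsetLookup");
long readOffset = earliest; // or `latest` to consume only newly arriving messages
consumer.close();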

Error Handling

The SimpleConsumer does not handle leader failures for you, so you have to write a little code to deal with them yourself.

if (fetchResponse.hasError()) {
    numErrors++;
    // Something went wrong!
    short code = fetchResponse.errorCode(a_topic, a_partition);
    System.out.println("Error fetching data from the Broker:" + leadBroker + " Reason: " + code);
    if (numErrors > 5) break;
    if (code == ErrorMapping.OffsetOutOfRangeCode()) {
        // We asked for an invalid offset. For simple case ask for the last element to reset
        readOffset = getLastOffset(consumer, a_topic, a_partition, kafka.api.OffsetRequest.LatestTime(), clientName);
        continue;
    }
    consumer.close();
    consumer = null;
    leadBroker = findNewLeader(leadBroker, a_topic, a_partition, a_port);
    continue;
}

Here, as soon as the fetch returns an error we print the reason, close the consumer, and then work out which broker is the new leader.

private String findNewLeader(String a_oldLeader, String a_topic, int a_partition, int a_port) throws Exception {
    for (int i = 0; i < 3; i++) {
        boolean goToSleep = false;
        PartitionMetadata metadata = findLeader(m_replicaBrokers, a_port, a_topic, a_partition);
        if (metadata == null) {
            goToSleep = true;
        } else if (metadata.leader() == null) {
            goToSleep = true;
        } else if (a_oldLeader.equalsIgnoreCase(metadata.leader().host()) && i == 0) {
            // first time through if the leader hasn't changed give ZooKeeper a second to recover
            // second time, assume the broker did recover before failover, or it was a non-Broker issue
            goToSleep = true;
        } else {
            return metadata.leader().host();
        }
        if (goToSleep) {
            try {
                Thread.sleep(1000);
            } catch (InterruptedException ie) {
            }
        }
    }
    System.out.println("Unable to find new leader after Broker failure. Exiting");
    throw new Exception("Unable to find new leader after Broker failure. Exiting");
}

This method uses the findLeader() logic we defined earlier to find the new leader, except here we only try to connect to one of the replicas for the topic/partition. This way if we can’t reach any of the Brokers with the data we are interested in we give up and exit hard.
Since it may take a short time for ZooKeeper to detect the leader loss and assign a new leader, we sleep if we don’t get an answer. In reality ZooKeeper often does the failover very quickly so you never sleep.

Reading the Data

// When calling FetchRequestBuilder, it's important NOT to call .replicaId(), which is meant for internal use only.
// Setting the replicaId incorrectly will cause the brokers to behave incorrectly.
FetchRequest req = new FetchRequestBuilder()
        .clientId(clientName)
        .addFetch(a_topic, a_partition, readOffset, 100000)
        .build();
FetchResponse fetchResponse = consumer.fetch(req);

if (fetchResponse.hasError()) {
    // See code in previous section
}

numErrors = 0;
long numRead = 0;
for (MessageAndOffset messageAndOffset : fetchResponse.messageSet(a_topic, a_partition)) {
    long currentOffset = messageAndOffset.offset();
    if (currentOffset < readOffset) {
        System.out.println("Found an old offset: " + currentOffset + " Expecting: " + readOffset);
        continue;
    }
    readOffset = messageAndOffset.nextOffset();
    ByteBuffer payload = messageAndOffset.message().payload();

    byte[] bytes = new byte[payload.limit()];
    payload.get(bytes);
    System.out.println(String.valueOf(messageAndOffset.offset()) + ": " + new String(bytes, "UTF-8"));
    numRead++;
    a_maxReads--;
}

if (numRead == 0) {
    try {
        Thread.sleep(1000);
    } catch (InterruptedException ie) {
    }
}

Note that ‘readOffset’ asks the last read message what the next offset will be, so that once this block of messages has been processed we know where to tell Kafka to start the next fetch.
Also note that we are explicitly checking that the offset being read is not less than the offset that we requested. This is needed since if Kafka is compressing the messages, the fetch request will return an entire compressed block even if the requested offset isn’t the beginning of the compressed block. Thus a message we saw previously may be returned again. Note also that we ask for a fetchSize of 100000 bytes. If the Kafka producers are writing large batches, this might not be enough, and might return an empty message set. In this case, the fetchSize should be increased until a non-empty set is returned.
Finally, we keep track of the # of messages read. If we didn’t read anything on the last request we go to sleep for a second so we aren’t hammering Kafka when there is no data.
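
If you hit the empty-message-set case described above, one possible way to grow the fetch size automatically is a small wrapper like the following. This helper is hypothetical (not part of the original example); the 10 MB cap is an arbitrary choice.

// Hypothetical helper: retry the fetch with a growing fetchSize until a non-empty
// message set (or an error) comes back, as suggested in the text above.
private static FetchResponse fetchWithGrowingSize(SimpleConsumer consumer, String clientName,
                                                  String topic, int partition, long offset) {
    int fetchSize = 100000;                       // the example's initial fetch size
    final int maxFetchSize = 10 * 1024 * 1024;    // arbitrary 10 MB cap
    while (true) {
        FetchRequest req = new FetchRequestBuilder()
                .clientId(clientName)
                .addFetch(topic, partition, offset, fetchSize)
                .build();
        FetchResponse resp = consumer.fetch(req);
        boolean empty = !resp.hasError()
                && !resp.messageSet(topic, partition).iterator().hasNext();
        // An empty set at the cap most likely just means there is no new data yet,
        // so stop growing and let the caller sleep and retry as before.
        if (!empty || fetchSize >= maxFetchSize) {
            return resp;
        }
        fetchSize = Math.min(fetchSize * 2, maxFetchSize);
    }
}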