Using the Kafka 0.8 Simple API

  • Using the SimpleConsumer
    • Why use the SimpleConsumer
    • Code example
      • Finding the Lead Broker for a Topic and Partition
      • Finding Starting Offset for Reads
      • Error Handling
      • Reading the Data

Using the SimpleConsumer

Why use the SimpleConsumer


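The SimpleConsumer gives you more control over partition consumption than a consumer group does. For example, you may want to:

  • Read a message multiple times
  • Consume only a subset of the partitions of a topic in a process
  • Manage transactions so that each message is processed once and only once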

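Downsides of using the SimpleConsumer:

- It requires considerably more development work than a consumer group
- You must track offsets in your application so that you know where to resume consuming
- You must work out yourself which broker is the leader for each topic and partition you read
- You must handle leader changes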
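
To make the offset-tracking point concrete, here is a minimal sketch of an application-side offset store. The class name `FileOffsetStore` and the choice of a plain text file are assumptions made for illustration; the original example does not prescribe where offsets are kept, and a real application might use a database or ZooKeeper instead.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper (not part of Kafka): keeps the next offset to read in a text file. */
final class FileOffsetStore {
    private final Path file;

    FileOffsetStore(Path file) {
        this.file = file;
    }

    /** Returns the saved offset, or defaultOffset if nothing has been saved yet. */
    long load(long defaultOffset) {
        if (!Files.exists(file)) {
            return defaultOffset;
        }
        try {
            String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
            return text.isEmpty() ? defaultOffset : Long.parseLong(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Overwrites the file with the offset the next fetch should start from. */
    void save(long nextOffset) {
        try {
            Files.write(file, Long.toString(nextOffset).getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("offsets", ".txt");
        FileOffsetStore store = new FileOffsetStore(file);
        System.out.println("start from " + store.load(0L));   // empty file, so the default is used
        store.save(42L);                                       // after processing up to offset 41
        System.out.println("resume from " + new FileOffsetStore(file).load(0L));
        Files.delete(file);
    }
}
```

Saving the offset only after a message has been fully processed means a crash causes a re-read rather than a skipped message. This sketch does not write the file atomically, so treat it as a starting point only.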

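Using the SimpleConsumer involves the following steps:

- Find an active broker and determine which broker is the leader for your topic and partition
- Determine which brokers hold the replicas for your topic and partition
- Fetch the data
- Detect and recover from leader changes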


Finding the Lead Broker for a Topic and Partition


    private PartitionMetadata findLeader(List<String> a_seedBrokers, int a_port, String a_topic, int a_partition) {
        PartitionMetadata returnMetaData = null;
        loop:
        for (String seed : a_seedBrokers) {
            SimpleConsumer consumer = null;
            try {
                consumer = new SimpleConsumer(seed, a_port, 100000, 64 * 1024, "leaderLookup");
                List<String> topics = Collections.singletonList(a_topic);
                TopicMetadataRequest req = new TopicMetadataRequest(topics);
                kafka.javaapi.TopicMetadataResponse resp = consumer.send(req);

                // Ask the connected broker for the topic's metadata
                List<TopicMetadata> metaData = resp.topicsMetadata();
                // Walk the partition metadata until we find the partition we are interested in
                for (TopicMetadata item : metaData) {
                    for (PartitionMetadata part : item.partitionsMetadata()) {
                        if (part.partitionId() == a_partition) {
                            returnMetaData = part;
                            break loop;
                        }
                    }
                }
            } catch (Exception e) {
                System.out.println("Error communicating with Broker [" + seed + "] to find Leader for [" + a_topic
                        + ", " + a_partition + "] Reason: " + e);
            } finally {
                if (consumer != null) consumer.close();
            }
        }
        // Remember the replica brokers so they can be queried if the leader changes
        if (returnMetaData != null) {
            m_replicaBrokers.clear();
            for (kafka.cluster.Broker replica : returnMetaData.replicas()) {
                m_replicaBrokers.add(replica.host());
            }
        }
        return returnMetaData;
    }

Finding Starting Offset for Reads


    public static long getLastOffset(SimpleConsumer consumer, String topic, int partition,
                                     long whichTime, String clientName) {
        TopicAndPartition topicAndPartition = new TopicAndPartition(topic, partition);
        Map<TopicAndPartition, PartitionOffsetRequestInfo> requestInfo =
                new HashMap<TopicAndPartition, PartitionOffsetRequestInfo>();
        // Request a single offset for the given time
        requestInfo.put(topicAndPartition, new PartitionOffsetRequestInfo(whichTime, 1));
        kafka.javaapi.OffsetRequest request = new kafka.javaapi.OffsetRequest(
                requestInfo, kafka.api.OffsetRequest.CurrentVersion(), clientName);
        OffsetResponse response = consumer.getOffsetsBefore(request);

        if (response.hasError()) {
            System.out.println("Error fetching offset data from the Broker. Reason: "
                    + response.errorCode(topic, partition));
            return 0;
        }
        long[] offsets = response.offsets(topic, partition);
        return offsets[0];
    }

Error Handling


    if (fetchResponse.hasError()) {
        numErrors++;
        // Something went wrong!
        short code = fetchResponse.errorCode(a_topic, a_partition);
        System.out.println("Error fetching data from the Broker:" + leadBroker + " Reason: " + code);
        if (numErrors > 5) break;
        if (code == ErrorMapping.OffsetOutOfRangeCode()) {
            // We asked for an invalid offset. For simple case ask for the last element to reset
            readOffset = getLastOffset(consumer, a_topic, a_partition, kafka.api.OffsetRequest.LatestTime(), clientName);
            continue;
        }
        consumer.close();
        consumer = null;
        leadBroker = findNewLeader(leadBroker, a_topic, a_partition, a_port);
        continue;
    }


    private String findNewLeader(String a_oldLeader, String a_topic, int a_partition, int a_port) throws Exception {
        for (int i = 0; i < 3; i++) {
            boolean goToSleep = false;
            PartitionMetadata metadata = findLeader(m_replicaBrokers, a_port, a_topic, a_partition);
            if (metadata == null) {
                goToSleep = true;
            } else if (metadata.leader() == null) {
                goToSleep = true;
            } else if (a_oldLeader.equalsIgnoreCase(metadata.leader().host()) && i == 0) {
                // first time through if the leader hasn't changed give ZooKeeper a second to recover
                // second time, assume the broker did recover before failover, or it was a non-Broker issue
                goToSleep = true;
            } else {
                return metadata.leader().host();
            }
            if (goToSleep) {
                try {
                    Thread.sleep(1000);
                } catch (InterruptedException ie) {
                }
            }
        }
        System.out.println("Unable to find new leader after Broker failure. Exiting");
        throw new Exception("Unable to find new leader after Broker failure. Exiting");
    }

This method uses the findLeader() logic we defined earlier to find the new leader, except here we only try to connect to one of the replicas for the topic/partition. This way if we can’t reach any of the Brokers with the data we are interested in we give up and exit hard.
Since it may take a short time for ZooKeeper to detect the leader loss and assign a new leader, we sleep if we don’t get an answer. In reality ZooKeeper often does the failover very quickly so you never sleep.
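
The retry-and-sleep pattern described above can be tried out without a running cluster. The following is a minimal sketch that mirrors the control flow of findNewLeader(). The `LeaderLookup` interface and the `awaitNewLeader` method are hypothetical stand-ins for the metadata request, not part of the Kafka API, and the attempt count and sleep time are parameters only so the sketch runs quickly.

```java
final class LeaderRetrySketch {
    /** Hypothetical lookup: returns the current leader host, or null if none is known yet. */
    interface LeaderLookup {
        String currentLeader();
    }

    /** Polls the lookup until a usable leader is reported or the attempts run out. */
    static String awaitNewLeader(String oldLeader, LeaderLookup lookup, int maxAttempts, long sleepMillis) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            String leader = lookup.currentLeader();
            // On the first attempt an unchanged leader is not trusted: failover may still be in progress.
            boolean unchangedOnFirstTry = attempt == 0 && oldLeader.equalsIgnoreCase(leader);
            if (leader != null && !unchangedOnFirstTry) {
                return leader;
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt(); // keep the interrupt visible to callers
                throw new IllegalStateException("Interrupted while waiting for a new leader", ie);
            }
        }
        throw new IllegalStateException("Unable to find new leader after " + maxAttempts + " attempts");
    }

    public static void main(String[] args) {
        final int[] calls = {0};
        // Simulated cluster: no leader for the first two lookups, then broker2 takes over.
        LeaderLookup lookup = () -> ++calls[0] < 3 ? null : "broker2";
        System.out.println(awaitNewLeader("broker1", lookup, 5, 10L)); // prints broker2
    }
}
```

Unlike the original method, this sketch restores the interrupt flag instead of swallowing the InterruptedException. That is a design choice of the sketch, not something the original example does.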


Reading the Data

    // When calling FetchRequestBuilder, it's important NOT to call .replicaId(), which is meant for internal use only.
    // Setting the replicaId incorrectly will cause the brokers to behave incorrectly.
    FetchRequest req = new FetchRequestBuilder()
            .clientId(clientName)
            .addFetch(a_topic, a_partition, readOffset, 100000)
            .build();
    FetchResponse fetchResponse = consumer.fetch(req);

    if (fetchResponse.hasError()) {
        // See code in previous section
    }
    numErrors = 0;

    long numRead = 0;
    for (MessageAndOffset messageAndOffset : fetchResponse.messageSet(a_topic, a_partition)) {
        long currentOffset = messageAndOffset.offset();
        if (currentOffset < readOffset) {
            System.out.println("Found an old offset: " + currentOffset + " Expecting: " + readOffset);
            continue;
        }
        readOffset = messageAndOffset.nextOffset();
        ByteBuffer payload = messageAndOffset.message().payload();

        byte[] bytes = new byte[payload.limit()];
        payload.get(bytes);
        System.out.println(String.valueOf(messageAndOffset.offset()) + ": " + new String(bytes, "UTF-8"));
        numRead++;
        a_maxReads--;
    }

    if (numRead == 0) {
        try {
            Thread.sleep(1000);
        } catch (InterruptedException ie) {
        }
    }

Note that ‘readOffset’ is updated from the last message read, which reports the offset of the message that follows it. Once a block of messages has been processed, we therefore know which offset to ask Kafka for in the next fetch.
Also note that we are explicitly checking that the offset being read is not less than the offset that we requested. This is needed since if Kafka is compressing the messages, the fetch request will return an entire compressed block even if the requested offset isn’t the beginning of the compressed block. Thus a message we saw previously may be returned again. Note also that we ask for a fetchSize of 100000 bytes. If the Kafka producers are writing large batches, this might not be enough, and might return an empty message set. In this case, the fetchSize should be increased until a non-empty set is returned.
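
The old-offset check can be illustrated with a small simulation. This is a sketch under stated assumptions: `FetchedMessage` is a hypothetical stand-in for Kafka's MessageAndOffset, and the list passed to `keepFrom` plays the role of a compressed block that starts before the requested offset.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for Kafka's MessageAndOffset; not a Kafka class. */
final class FetchedMessage {
    final long offset;
    final String payload;

    FetchedMessage(long offset, String payload) {
        this.offset = offset;
        this.payload = payload;
    }
}

final class OffsetFilterSketch {
    /** Returns the payloads at or after readOffset, skipping messages replayed from the block. */
    static List<String> keepFrom(long readOffset, List<FetchedMessage> fetched) {
        List<String> kept = new ArrayList<String>();
        for (FetchedMessage message : fetched) {
            if (message.offset < readOffset) {
                continue; // part of the same compressed block, but already processed
            }
            kept.add(message.payload);
        }
        return kept;
    }

    public static void main(String[] args) {
        // The simulated block covers offsets 10..13, although offset 12 was requested.
        List<FetchedMessage> block = Arrays.asList(
                new FetchedMessage(10, "a"),
                new FetchedMessage(11, "b"),
                new FetchedMessage(12, "c"),
                new FetchedMessage(13, "d"));
        System.out.println(keepFrom(12, block)); // prints [c, d]
    }
}
```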
Finally, we keep track of the number of messages read. If the last request returned nothing, we sleep for a second so that we aren’t hammering Kafka when there is no data.
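
The earlier advice about increasing the fetchSize when large batches come back as an empty set could be sketched as follows. The `Fetcher` interface and the `fetchGrowing` method are hypothetical and only simulate a broker; the doubling strategy and the upper bound are assumptions of this sketch, since the text above only says the size should be increased until a non-empty set is returned.

```java
import java.util.Collections;
import java.util.List;

final class FetchSizeSketch {
    /** Hypothetical fetch call: returns the messages that fit into fetchSize bytes. */
    interface Fetcher {
        List<String> fetch(long offset, int fetchSize);
    }

    /** Doubles the fetch size until messages come back or maxSize has been tried. */
    static List<String> fetchGrowing(Fetcher fetcher, long offset, int initialSize, int maxSize) {
        if (initialSize <= 0) {
            throw new IllegalArgumentException("initialSize must be positive");
        }
        int size = initialSize;
        while (true) {
            List<String> messages = fetcher.fetch(offset, size);
            if (!messages.isEmpty() || size >= maxSize) {
                return messages; // either data arrived or the cap was reached
            }
            size = (int) Math.min((long) size * 2, (long) maxSize);
        }
    }

    public static void main(String[] args) {
        // Simulated broker: the batch at this offset only fits into 300000 bytes or more.
        Fetcher broker = (offset, fetchSize) -> fetchSize >= 300000
                ? Collections.singletonList("big batch")
                : Collections.<String>emptyList();
        System.out.println(fetchGrowing(broker, 0L, 100000, 1 << 20)); // prints [big batch]
    }
}
```

An empty result can also simply mean that no new data has arrived, so the cap matters: without it, an idle partition would make the loop grow the request indefinitely.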