kafka




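Kafka provides two versions of the consumer API.

High-level

The first is the high-level version. It is the simpler of the two: you do not have to manage offsets yourself, because it automatically reads the consumer group's last offset from ZooKeeper.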

Reference: https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example

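There are a few things to keep in mind when working with multiple partitions and multiple consumers:

1. Having more consumers than partitions is wasteful. Kafka does not allow concurrent consumption within a single partition, so the number of consumers should not exceed the number of partitions.

2. With fewer consumers than partitions, one consumer is assigned several partitions. Balance the two numbers carefully, otherwise data will be drained unevenly from the partitions.

   Ideally the partition count is an integer multiple of the consumer count, which makes the partition count an important choice. With 24 partitions, for example, it is easy to pick a consumer count.

3. When a consumer reads from several partitions, ordering across them is not guaranteed. Kafka guarantees order only within a single partition; across partitions the order depends on the order in which you read.

4. Adding or removing consumers, brokers, or partitions triggers a rebalance, after which the partitions assigned to each consumer may change.

5. The high-level API blocks when no data is available.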
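
The sizing advice in points 1 and 2 can be illustrated with a small, self-contained simulation. It is only a sketch: it assumes a range-style split in which partitions are handed out in contiguous blocks, and the names RangeAssignmentSketch and assign are made up for this example rather than taken from Kafka.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Kafka: splits partitions 0..P-1 over C consumers.
class RangeAssignmentSketch {
    static List<List<Integer>> assign(int numPartitions, int numConsumers) {
        List<List<Integer>> result = new ArrayList<List<Integer>>();
        int base = numPartitions / numConsumers;   // partitions every consumer gets
        int extra = numPartitions % numConsumers;  // the first 'extra' consumers get one more
        int next = 0;
        for (int c = 0; c < numConsumers; c++) {
            int count = base + (c < extra ? 1 : 0);
            List<Integer> owned = new ArrayList<Integer>();
            for (int k = 0; k < count; k++) {
                owned.add(next++);
            }
            result.add(owned);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(assign(3, 4));   // the fourth consumer owns nothing
        System.out.println(assign(24, 5));  // uneven: four consumers own 5, one owns 4
        System.out.println(assign(24, 6));  // even: every consumer owns 4
    }
}
```

With 24 partitions, consumer counts of 1, 2, 3, 4, 6, 8, 12, or 24 all divide evenly, which is why such a number is convenient.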

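Simple version:

A simple pitfall: if your test procedure is to produce some data first and then read it with a consumer, remember to add the first setting in the code below.

The initial offset is invalid by default, and this setting tells the consumer how to correct an invalid offset. The default is largest, meaning the newest position, so without this setting you will not read the data you produced earlier. Adding smallest afterwards does not help either, because by then the offset is valid and will not be corrected again; you have to reset the offset by hand or with a tool.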

Properties props = new Properties();
props.put("auto.offset.reset", "smallest"); // required if you want to read older data
props.put("zookeeper.connect", "localhost:2181");
props.put("group.id", "pv");
props.put("zookeeper.session.timeout.ms", "400");
props.put("zookeeper.sync.time.ms", "200");
props.put("auto.commit.interval.ms", "1000");

ConsumerConfig conf = new ConsumerConfig(props);
ConsumerConnector consumer = kafka.consumer.Consumer.createJavaConsumerConnector(conf);
String topic = "page_visits";
Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
topicCountMap.put(topic, new Integer(1));
Map<String, List<KafkaStream<byte[], byte[]>>> consumerMap = consumer.createMessageStreams(topicCountMap);
List<KafkaStream<byte[], byte[]>> streams = consumerMap.get(topic);

KafkaStream<byte[], byte[]> stream = streams.get(0);
ConsumerIterator<byte[], byte[]> it = stream.iterator();

while (it.hasNext()) {
    System.out.println("message: " + new String(it.next().message()));
}

if (consumer != null) consumer.shutdown(); // never actually reached, because hasNext() above blocks
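
Why adding smallest later has no effect can be seen in a short simulation of the rule described above. This is a sketch of the decision only, not Kafka's code: OffsetResetSketch, startOffset, and the idea of passing the stored offset as a nullable Long are assumptions made for illustration.

```java
// Hypothetical model of the auto.offset.reset decision; not Kafka's implementation.
class OffsetResetSketch {
    /**
     * Picks the offset a consumer would start from.
     * committed: offset stored for the group, or null if none exists yet.
     * earliest/latest: bounds of the data currently held in the partition.
     * reset: "smallest" or "largest", mirroring the auto.offset.reset values.
     */
    static long startOffset(Long committed, long earliest, long latest, String reset) {
        boolean valid = committed != null && committed >= earliest && committed <= latest;
        if (valid) {
            return committed; // a valid offset is never corrected
        }
        return "smallest".equals(reset) ? earliest : latest;
    }

    public static void main(String[] args) {
        // First run without the setting: the group starts at the end and sees no old data.
        System.out.println(startOffset(null, 0, 21, "largest"));   // 21
        // First run with the setting: the group starts at the beginning.
        System.out.println(startOffset(null, 0, 21, "smallest"));  // 0
        // Setting added too late: 21 is already stored and valid, so it is kept.
        System.out.println(startOffset(21L, 0, 21, "smallest"));   // 21
    }
}
```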

Two handy tools when using the high-level consumer:

1. bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --group pv

This shows the current offset status of a group. Here it is the status of pv, whose topic has 3 partitions:

Group          Topic                         Pid Offset         logSize        Lag            Owner
pv             page_visits                   0   21             21             0              none
pv             page_visits                   1   19             19             0              none
pv             page_visits                   2   20             20             0              none

The key columns are Offset, logSize, and Lag.

Here everything has already been read, so Offset = logSize and Lag = 0.
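
The relationship between the three columns is plain arithmetic: Lag is logSize minus Offset. A minimal sketch, with LagSketch as a hypothetical helper name:

```java
// Hypothetical helper showing how the Lag column follows from the other two.
class LagSketch {
    static long lag(long offset, long logSize) {
        return logSize - offset; // messages written but not yet consumed
    }

    public static void main(String[] args) {
        System.out.println(lag(21, 21)); // 0: the group has caught up
        System.out.println(lag(0, 21));  // 21: nothing has been consumed yet
    }
}
```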

2. bin/kafka-run-class.sh kafka.tools.UpdateOffsetsInZK earliest config/consumer.properties page_visits

It takes 3 arguments:

[earliest | latest]: where to set the offset

consumer.properties: the path to the configuration file

topic: the topic name, here page_visits

After running this against the pv group above, checking the group offset status again gives:

Group          Topic                         Pid Offset         logSize        Lag            Owner
pv             page_visits                   0   0              21             21             none
pv             page_visits                   1   0              19             19             none
pv             page_visits                   2   0              20             20             none

As you can see, Offset has been reset to 0 and Lag = logSize.

Below is the complete code for the multi-threaded consumer from the original article.

import kafka.consumer.ConsumerConfig;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ConsumerGroupExample {
    private final ConsumerConnector consumer;
    private final String topic;
    private ExecutorService executor;

    public ConsumerGroupExample(String a_zookeeper, String a_groupId, String a_topic) {
        // Create the connector; note the configuration built in createConsumerConfig below
        consumer = kafka.consumer.Consumer.createJavaConsumerConnector(
                createConsumerConfig(a_zookeeper, a_groupId));
        this.topic = a_topic;
    }

    public void shutdown() {
        if (consumer != null) consumer.shutdown();
        if (executor != null) executor.shutdown();
    }

    public void run(int a_numThreads) { // create the concurrent consumers
        Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
        topicCountMap.put(topic, new Integer(a_numThreads)); // which topic to read and how many threads to use
        Map<String, List<KafkaStream<byte[], byte[]>>> consumerMap = consumer.createMessageStreams(topicCountMap); // create the streams
        List<KafkaStream<byte[], byte[]>> streams = consumerMap.get(topic); // one KafkaStream per thread

        // now launch all the threads
        executor = Executors.newFixedThreadPool(a_numThreads);

        // now create an object to consume the messages
        int threadNumber = 0;
        for (final KafkaStream stream : streams) {
            executor.submit(new ConsumerTest(stream, threadNumber)); // start a consumer thread
            threadNumber++;
        }
    }

    private static ConsumerConfig createConsumerConfig(String a_zookeeper, String a_groupId) {
        Properties props = new Properties();
        props.put("zookeeper.connect", a_zookeeper);
        props.put("group.id", a_groupId);
        props.put("zookeeper.session.timeout.ms", "400");
        props.put("zookeeper.sync.time.ms", "200");
        props.put("auto.commit.interval.ms", "1000");

        return new ConsumerConfig(props);
    }

    public static void main(String[] args) {
        String zooKeeper = args[0];
        String groupId = args[1];
        String topic = args[2];
        int threads = Integer.parseInt(args[3]);

        ConsumerGroupExample example = new ConsumerGroupExample(zooKeeper, groupId, topic);
        example.run(threads);

        try {
            Thread.sleep(10000);
        } catch (InterruptedException ie) {
            // ignored: the example simply shuts down after the sleep
        }
        example.shutdown();
    }
}
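
The listing above hands each stream to a ConsumerTest worker, a class this article does not show. The thread-per-stream pattern itself can be tried without a broker. The sketch below is a stand-in under clear assumptions: a BlockingQueue plays the role of a KafkaStream, take() plays the role of the blocking hasNext(), and the END marker, StreamWorkerSketch, and consumeAll are invented for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical simulation: one worker thread per "stream", as in run() above.
class StreamWorkerSketch {
    // Marker telling a worker that its stream has ended (stands in for shutdown).
    static final String END = "__END__";

    static List<String> consumeAll(List<BlockingQueue<String>> streams) {
        ExecutorService executor = Executors.newFixedThreadPool(streams.size());
        final List<String> seen = Collections.synchronizedList(new ArrayList<String>());
        int threadNumber = 0;
        for (final BlockingQueue<String> stream : streams) {
            final int n = threadNumber++;
            executor.submit(new Runnable() {
                @Override
                public void run() {
                    try {
                        while (true) {
                            String msg = stream.take(); // blocks while the stream is empty
                            if (END.equals(msg)) {
                                break;
                            }
                            seen.add("Thread " + n + ": " + msg);
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            });
        }
        executor.shutdown(); // accept no new tasks; running workers finish their streams
        try {
            executor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen;
    }

    public static void main(String[] args) {
        List<BlockingQueue<String>> streams = new ArrayList<BlockingQueue<String>>();
        for (int i = 0; i < 2; i++) {
            BlockingQueue<String> q = new LinkedBlockingQueue<String>();
            q.add("message-" + i);
            q.add(END);
            streams.add(q);
        }
        System.out.println(consumeAll(streams));
    }
}
```

Messages from one queue keep their order, while the interleaving between queues varies from run to run, which mirrors the per-partition ordering guarantee described earlier.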

 

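Low-level


SimpleConsumer


The other version is SimpleConsumer. Despite the name, which suggests a simple interface, it is actually the low-level consumer and the more complex API.


Reference: https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example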


When should you use this interface?


  1. Read a message multiple times


  2. Consume only a subset of the partitions in a topic in a process


  3. Manage transactions to make sure a message is processed once and only once


 


Using this interface comes at a cost: partitions, brokers, and offsets are no longer transparent to you. You have to manage them yourself and also handle broker leader changes, which is a lot of trouble.


So unless you really need it, it is best avoided.


  1. You must keep track of the offsets in your application to know where you left off consuming.


  2. You must figure out which Broker is the lead Broker for a topic and partition.


  3. You must handle Broker leader changes.


Steps for using the SimpleConsumer:


  1. Find an active Broker and find out which Broker is the leader for your topic and partition


  2. Determine who the replica Brokers are for your topic and partition


  3. Build the request defining what data you are interested in


  4. Fetch the data


  5. Identify and recover from leader changes


First, you must know which partition of which topic to read.


Then, find the leader broker responsible for that partition, and from it the brokers that hold replicas of the partition.


Next, build the request yourself and fetch the data.


Finally, you need to detect and handle broker leader changes.


 


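Step by step:


Finding the Lead Broker for a Topic and Partition


The idea is to iterate over the brokers, fetch the topic's metadata from each, and then iterate over the partition metadata it contains, returning as soon as the partition we are looking for is found.


The leader broker is then obtained from the returned PartitionMetadata via leader().host().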


private PartitionMetadata findLeader(List<String> a_seedBrokers, int a_port, String a_topic, int a_partition) {
    PartitionMetadata returnMetaData = null;
    loop:
    for (String seed : a_seedBrokers) { // iterate over the brokers
        SimpleConsumer consumer = null;
        try {
            // Create the SimpleConsumer. Its Scala signature is:
            // class SimpleConsumer(val host: String, val port: Int, val soTimeout: Int,
            //                      val bufferSize: Int, val clientId: String)
            consumer = new SimpleConsumer(seed, a_port, 100000, 64 * 1024, "leaderLookup");

            List<String> topics = Collections.singletonList(a_topic);
            TopicMetadataRequest req = new TopicMetadataRequest(topics);
            kafka.javaapi.TopicMetadataResponse resp = consumer.send(req); // send the TopicMetadataRequest

            List<TopicMetadata> metaData = resp.topicsMetadata(); // the returned topic metadata

            for (TopicMetadata item : metaData) {
                for (PartitionMetadata part : item.partitionsMetadata()) { // iterate over the partition metadata
                    if (part.partitionId() == a_partition) { // is this the partition we are looking for?
                        returnMetaData = part;
                        break loop; // found it, stop searching
                    }
                }
            }
        } catch (Exception e) {
            System.out.println("Error communicating with Broker [" + seed + "] to find Leader for [" + a_topic
                    + ", " + a_partition + "] Reason: " + e);
        } finally {
            if (consumer != null) consumer.close();
        }
    }
    return returnMetaData;
}


 


Finding Starting Offset for Reads


The main piece of information in the request is a Map<TopicAndPartition, PartitionOffsetRequestInfo>.


TopicAndPartition simply wraps the topic and partition.


PartitionOffsetRequestInfo is defined as:


case class PartitionOffsetRequestInfo(time: Long, maxNumOffsets: Int)


The time parameter indicates where to start reading data and takes one of two values:


kafka.api.OffsetRequest.EarliestTime(): the beginning of the data in the logs


kafka.api.OffsetRequest.LatestTime(): will only stream new messages


Do not assume the starting offset is always 0, because messages expire and get deleted.


The other parameter, maxNumOffsets, caps the number of offsets returned; the code passes 1.


public static long getLastOffset(SimpleConsumer consumer, String topic, int partition,
                                 long whichTime, String clientName) {
    TopicAndPartition topicAndPartition = new TopicAndPartition(topic, partition);
    Map<TopicAndPartition, PartitionOffsetRequestInfo> requestInfo = new HashMap<TopicAndPartition, PartitionOffsetRequestInfo>();
    requestInfo.put(topicAndPartition, new PartitionOffsetRequestInfo(whichTime, 1)); // build the offset fetch request info
    kafka.javaapi.OffsetRequest request = new kafka.javaapi.OffsetRequest(requestInfo,
            kafka.api.OffsetRequest.CurrentVersion(), clientName);
    OffsetResponse response = consumer.getOffsetsBefore(request); // fetch the offsets

    if (response.hasError()) {
        System.out.println("Error fetching data Offset Data the Broker. Reason: " + response.errorCode(topic, partition));
        return 0;
    }
    long[] offsets = response.offsets(topic, partition); // the returned array of offsets
    return offsets[0]; // start reading from the first one
}
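
The warning that the first offset is not necessarily 0 can be made concrete with a toy model. This sketch assumes a partition log made of segments identified by their base offsets, with the oldest segments already deleted by retention. OffsetLookupSketch and lookup are hypothetical names; only the two sentinel values -2 and -1 mirror what EarliestTime() and LatestTime() return in Kafka.

```java
// Hypothetical model of an earliest/latest offset lookup; not Kafka's implementation.
class OffsetLookupSketch {
    // Same sentinel values as kafka.api.OffsetRequest.EarliestTime() and LatestTime().
    static final long EARLIEST_TIME = -2L;
    static final long LATEST_TIME = -1L;

    /**
     * segmentBaseOffsets: base offsets of the segments still on disk, oldest first.
     * logEndOffset: offset that the next produced message will receive.
     */
    static long lookup(long[] segmentBaseOffsets, long logEndOffset, long whichTime) {
        if (whichTime == EARLIEST_TIME) {
            return segmentBaseOffsets[0]; // oldest message that has not been deleted
        }
        if (whichTime == LATEST_TIME) {
            return logEndOffset;          // only messages produced from now on
        }
        throw new IllegalArgumentException("only the two sentinel times are modelled");
    }

    public static void main(String[] args) {
        // Segments starting at 0 and 60 were already removed by retention.
        long[] segments = {120L, 200L, 310L};
        System.out.println(lookup(segments, 350L, EARLIEST_TIME)); // 120, not 0
        System.out.println(lookup(segments, 350L, LATEST_TIME));   // 350
    }
}
```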


 


Reading the Data


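First, add a fetch to the FetchRequest, specifying the topic, the partition, the starting offset, and the number of bytes to read.


If the producer writes very large messages, the 100000 bytes specified here may not be enough and an empty message set will be returned. In that case, increase the value until you get a non-empty message set.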


// When calling FetchRequestBuilder, it's important NOT to call .replicaId(), which is meant for internal use only.
// Setting the replicaId incorrectly will cause the brokers to behave incorrectly.
FetchRequest req = new FetchRequestBuilder()
        .clientId(clientName)
        .addFetch(a_topic, a_partition, readOffset, 100000) // fetch size: 100000 bytes
        .build();
FetchResponse fetchResponse = consumer.fetch(req);

if (fetchResponse.hasError()) {
    // See Error Handling
}
numErrors = 0;

long numRead = 0;
for (MessageAndOffset messageAndOffset : fetchResponse.messageSet(a_topic, a_partition)) {
    long currentOffset = messageAndOffset.offset();
    // Required check: for compressed messages the whole block is returned, so it may contain old messages
    if (currentOffset < readOffset) {
        System.out.println("Found an old offset: " + currentOffset + " Expecting: " + readOffset);
        continue;
    }
    readOffset = messageAndOffset.nextOffset(); // advance to the next offset to read
    ByteBuffer payload = messageAndOffset.message().payload();

    byte[] bytes = new byte[payload.limit()];
    payload.get(bytes);
    System.out.println(String.valueOf(messageAndOffset.offset()) + ": " + new String(bytes, "UTF-8"));
    numRead++;
}

if (numRead == 0) {
    try {
        Thread.sleep(1000);
    } catch (InterruptedException ie) {
    }
}
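
The advice to keep raising the fetch size until the message set is non-empty can be sketched without a broker. The following is a toy model under stated assumptions: fetch stands in for a real fetch and simply counts how many whole messages fit into the byte budget, and the names FetchSizeSketch and growUntilNonEmpty, the doubling policy, and the upper limit are illustrative choices, not part of Kafka.

```java
// Hypothetical simulation of growing the fetch size; not Kafka's implementation.
class FetchSizeSketch {
    // Stand-in for a fetch: how many whole messages fit into maxBytes, in order.
    static int fetch(int[] messageSizes, int maxBytes) {
        int used = 0;
        int count = 0;
        for (int size : messageSizes) {
            if (used + size > maxBytes) {
                break; // a partial message is never returned
            }
            used += size;
            count++;
        }
        return count;
    }

    // Doubles the fetch size until at least one message comes back.
    // Assumes at least one message is waiting; gives up once limitBytes is reached.
    static int growUntilNonEmpty(int[] messageSizes, int initialBytes, int limitBytes) {
        int fetchSize = initialBytes;
        while (fetch(messageSizes, fetchSize) == 0) {
            if (fetchSize >= limitBytes) {
                throw new IllegalStateException("message larger than " + limitBytes + " bytes");
            }
            fetchSize = Math.min(fetchSize * 2, limitBytes);
        }
        return fetchSize;
    }

    public static void main(String[] args) {
        int[] pending = {250000, 1200};
        System.out.println(fetch(pending, 100000));                      // 0: first message does not fit
        System.out.println(growUntilNonEmpty(pending, 100000, 1 << 20)); // 400000
    }
}
```

A cap such as limitBytes is worth keeping in real code too, so that one oversized message cannot make the consumer request unbounded amounts of memory.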


 


Error Handling


if (fetchResponse.hasError()) {
    numErrors++;
    // Something went wrong!
    short code = fetchResponse.errorCode(a_topic, a_partition);
    System.out.println("Error fetching data from the Broker:" + leadBroker + " Reason: " + code);
    if (numErrors > 5) break;

    if (code == ErrorMapping.OffsetOutOfRangeCode()) { // invalid offset: reset to the latest offset
        // We asked for an invalid offset. For simple case ask for the last element to reset
        readOffset = getLastOffset(consumer, a_topic, a_partition, kafka.api.OffsetRequest.LatestTime(), clientName);
        continue;
    }
    consumer.close();
    consumer = null;
    leadBroker = findNewLeader(leadBroker, a_topic, a_partition, a_port); // refresh the leader broker
    continue;
}


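There is no special logic in findNewLeader; it simply calls findLeader again to get the leader broker.


To guard against the leader information being unavailable while the switch is in progress, it also adds sleep logic.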


private String findNewLeader(String a_oldLeader, String a_topic, int a_partition, int a_port) throws Exception {
    for (int i = 0; i < 3; i++) {
        boolean goToSleep = false;
        PartitionMetadata metadata = findLeader(m_replicaBrokers, a_port, a_topic, a_partition);
        if (metadata == null) {
            goToSleep = true;
        } else if (metadata.leader() == null) {
            goToSleep = true;
        } else if (a_oldLeader.equalsIgnoreCase(metadata.leader().host()) && i == 0) {
            // first time through if the leader hasn't changed give ZooKeeper a second to recover
            // second time, assume the broker did recover before failover, or it was a non-Broker issue
            goToSleep = true;
        } else {
            return metadata.leader().host();
        }
        if (goToSleep) {
            try {
                Thread.sleep(1000);
            } catch (InterruptedException ie) {
            }
        }
    }
    System.out.println("Unable to find new leader after Broker failure. Exiting");
    throw new Exception("Unable to find new leader after Broker failure. Exiting");
}
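
The retry behaviour of findNewLeader is easier to see when the metadata lookup is replaced by a scripted sequence of answers. The sketch below keeps the same three attempts and the same "unchanged leader counts as a failure only on the first attempt" rule. LeaderRetrySketch, the Iterator of lookup results (null meaning no leader known), the configurable sleep, and the unchecked exception are all assumptions made so the example runs without a cluster.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical simulation of the leader retry loop; lookups replace findLeader().
class LeaderRetrySketch {
    static String findNewLeader(String oldLeader, Iterator<String> lookups, long sleepMs) {
        for (int i = 0; i < 3; i++) {
            // null models "no metadata" or "no leader elected yet"
            String leader = lookups.hasNext() ? lookups.next() : null;
            boolean unchangedOnFirstTry = leader != null && oldLeader.equalsIgnoreCase(leader) && i == 0;
            if (leader != null && !unchangedOnFirstTry) {
                return leader;
            }
            try {
                Thread.sleep(sleepMs); // give the cluster time to elect a leader
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        throw new IllegalStateException("Unable to find new leader after Broker failure");
    }

    public static void main(String[] args) {
        // No leader at first, then broker2 takes over.
        System.out.println(findNewLeader("broker1", Arrays.asList((String) null, "broker2").iterator(), 10));
        // The old leader is still reported: accepted on the second attempt.
        System.out.println(findNewLeader("broker1", Arrays.asList("broker1", "broker1").iterator(), 10));
    }
}
```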


 


Full Source Code


package com.test.simple;

import kafka.api.FetchRequest;
import kafka.api.FetchRequestBuilder;
import kafka.api.PartitionOffsetRequestInfo;
import kafka.common.ErrorMapping;
import kafka.common.TopicAndPartition;
import kafka.javaapi.*;
import kafka.javaapi.consumer.SimpleConsumer;
import kafka.message.MessageAndOffset;

import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SimpleExample {
    public static void main(String args[]) {
        SimpleExample example = new SimpleExample();
        long maxReads = Long.parseLong(args[0]);
        String topic = args[1];
        int partition = Integer.parseInt(args[2]);
        List<String> seeds = new ArrayList<String>();
        seeds.add(args[3]);
        int port = Integer.parseInt(args[4]);
        try {
            example.run(maxReads, topic, partition, seeds, port);
        } catch (Exception e) {
            System.out.println("Oops:" + e);
            e.printStackTrace();
        }
    }
 
   
    private List<String> m_replicaBrokers = new ArrayList<String>();

    public SimpleExample() {
        m_replicaBrokers = new ArrayList<String>();
    }
 
   
    public void run(long a_maxReads, String a_topic, int a_partition, List<String> a_seedBrokers, int a_port) throws Exception {
        // find the meta data about the topic and partition we are interested in
        PartitionMetadata metadata = findLeader(a_seedBrokers, a_port, a_topic, a_partition);
        if (metadata == null) {
            System.out.println("Can't find metadata for Topic and Partition. Exiting");
            return;
        }
        if (metadata.leader() == null) {
            System.out.println("Can't find Leader for Topic and Partition. Exiting");
            return;
        }
        String leadBroker = metadata.leader().host();
        String clientName = "Client_" + a_topic + "_" + a_partition;

        SimpleConsumer consumer = new SimpleConsumer(leadBroker, a_port, 100000, 64 * 1024, clientName);
        long readOffset = getLastOffset(consumer, a_topic, a_partition, kafka.api.OffsetRequest.EarliestTime(), clientName);

        int numErrors = 0;
        while (a_maxReads > 0) {
            if (consumer == null) {
                consumer = new SimpleConsumer(leadBroker, a_port, 100000, 64 * 1024, clientName);
            }
            FetchRequest req = new FetchRequestBuilder()
                    .clientId(clientName)
                    .addFetch(a_topic, a_partition, readOffset, 100000) // Note: this fetchSize of 100000 might need to be increased if large batches are written to Kafka
                    .build();
            FetchResponse fetchResponse = consumer.fetch(req);

            if (fetchResponse.hasError()) {
                numErrors++;
                // Something went wrong!
                short code = fetchResponse.errorCode(a_topic, a_partition);
                System.out.println("Error fetching data from the Broker:" + leadBroker + " Reason: " + code);
                if (numErrors > 5) break;
                if (code == ErrorMapping.OffsetOutOfRangeCode()) {
                    // We asked for an invalid offset. For simple case ask for the last element to reset
                    readOffset = getLastOffset(consumer, a_topic, a_partition, kafka.api.OffsetRequest.LatestTime(), clientName);
                    continue;
                }
                consumer.close();
                consumer = null;
                leadBroker = findNewLeader(leadBroker, a_topic, a_partition, a_port);
                continue;
            }
            numErrors = 0;

            long numRead = 0;
            for (MessageAndOffset messageAndOffset : fetchResponse.messageSet(a_topic, a_partition)) {
                long currentOffset = messageAndOffset.offset();
                if (currentOffset < readOffset) {
                    System.out.println("Found an old offset: " + currentOffset + " Expecting: " + readOffset);
                    continue;
                }
                readOffset = messageAndOffset.nextOffset();
                ByteBuffer payload = messageAndOffset.message().payload();

                byte[] bytes = new byte[payload.limit()];
                payload.get(bytes);
                System.out.println(String.valueOf(messageAndOffset.offset()) + ": " + new String(bytes, "UTF-8"));
                numRead++;
                a_maxReads--;
            }

            if (numRead == 0) {
                try {
                    Thread.sleep(1000);
                } catch (InterruptedException ie) {
                }
            }
        }
        if (consumer != null) consumer.close();
    }
 
   
    public static long getLastOffset(SimpleConsumer consumer, String topic, int partition,
                                     long whichTime, String clientName) {
        TopicAndPartition topicAndPartition = new TopicAndPartition(topic, partition);
        Map<TopicAndPartition, PartitionOffsetRequestInfo> requestInfo = new HashMap<TopicAndPartition, PartitionOffsetRequestInfo>();
        requestInfo.put(topicAndPartition, new PartitionOffsetRequestInfo(whichTime, 1));
        kafka.javaapi.OffsetRequest request = new kafka.javaapi.OffsetRequest(
                requestInfo, kafka.api.OffsetRequest.CurrentVersion(), clientName);
        OffsetResponse response = consumer.getOffsetsBefore(request);

        if (response.hasError()) {
            System.out.println("Error fetching data Offset Data the Broker. Reason: " + response.errorCode(topic, partition));
            return 0;
        }
        long[] offsets = response.offsets(topic, partition);
        return offsets[0];
    }
 
   
    private String findNewLeader(String a_oldLeader, String a_topic, int a_partition, int a_port) throws Exception {
        for (int i = 0; i < 3; i++) {
            boolean goToSleep = false;
            PartitionMetadata metadata = findLeader(m_replicaBrokers, a_port, a_topic, a_partition);
            if (metadata == null) {
                goToSleep = true;
            } else if (metadata.leader() == null) {
                goToSleep = true;
            } else if (a_oldLeader.equalsIgnoreCase(metadata.leader().host()) && i == 0) {
                // first time through if the leader hasn't changed give ZooKeeper a second to recover
                // second time, assume the broker did recover before failover, or it was a non-Broker issue
                goToSleep = true;
            } else {
                return metadata.leader().host();
            }
            if (goToSleep) {
                try {
                    Thread.sleep(1000);
                } catch (InterruptedException ie) {
                }
            }
        }
        System.out.println("Unable to find new leader after Broker failure. Exiting");
        throw new Exception("Unable to find new leader after Broker failure. Exiting");
    }
 
   
    private PartitionMetadata findLeader(List<String> a_seedBrokers, int a_port, String a_topic, int a_partition) {
        PartitionMetadata returnMetaData = null;
        loop:
        for (String seed : a_seedBrokers) {
            SimpleConsumer consumer = null;
            try {
                consumer = new SimpleConsumer(seed, a_port, 100000, 64 * 1024, "leaderLookup");
                List<String> topics = Collections.singletonList(a_topic);
                TopicMetadataRequest req = new TopicMetadataRequest(topics);
                kafka.javaapi.TopicMetadataResponse resp = consumer.send(req);

                List<TopicMetadata> metaData = resp.topicsMetadata();
                for (TopicMetadata item : metaData) {
                    for (PartitionMetadata part : item.partitionsMetadata()) {
                        if (part.partitionId() == a_partition) {
                            returnMetaData = part;
                            break loop;
                        }
                    }
                }
            } catch (Exception e) {
                System.out.println("Error communicating with Broker [" + seed + "] to find Leader for [" + a_topic
                        + ", " + a_partition + "] Reason: " + e);
            } finally {
                if (consumer != null) consumer.close();
            }
        }
        if (returnMetaData != null) {
            m_replicaBrokers.clear();
            for (kafka.cluster.Broker replica : returnMetaData.replicas()) {
                m_replicaBrokers.add(replica.host());
            }
        }
        return returnMetaData;
    }
}


 


Source document: <http://blog.csdn.net/strawbingo/article/details/45366061>


 
