Kafka and Samza: Real-time stream processing

Source: Internet; editor: 程序博客网; time: 2024/05/21 09:42

As we know, for big data analysis we have already learned two processing models[1]:





Batch processing is represented by MapReduce, and iterative processing by Spark. Both have one thing in common: they process a fixed dataset. Once a processing job starts, its input data cannot change at all, which is a disadvantage for real-time data analysis.
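To make the "fixed input" property concrete, here is a minimal Java sketch of a batch word count in the MapReduce style: a map step splits lines into words, and a reduce step sums the count of each word. The job copies its whole input when it starts, so anything appended afterwards is invisible to that run. `BatchWordCount` is a hypothetical, single-process illustration, not the Hadoop MapReduce API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process illustration of a batch (MapReduce-style) job.
class BatchWordCount {
    // Runs once over the whole input. The input is copied up front,
    // so lines added to the caller's list later are never seen by this run.
    static Map<String, Integer> run(List<String> input) {
        List<String> snapshot = new ArrayList<>(input);

        // "Map" step: emit one word per whitespace-separated token.
        List<String> words = new ArrayList<>();
        for (String line : snapshot) {
            for (String w : line.split("\\s+")) {
                if (!w.isEmpty()) { // leading whitespace yields an empty token
                    words.add(w);
                }
            }
        }

        // "Reduce" step: sum the occurrences of each word.
        Map<String, Integer> counts = new TreeMap<>();
        for (String w : words) {
            counts.merge(w, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> data = new ArrayList<>(List.of("a b", "b c"));
        Map<String, Integer> result = run(data);
        data.add("c c"); // arrives after the job ran: not reflected in result
        System.out.println(result); // {a=1, b=2, c=1}
    }
}
```

Because the result only exists once the whole input has been read, new data means rerunning the job, which is exactly the drawback for real-time analysis.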


Now, for real-time analysis, we introduce stream processing. Here is the concept of stream processing[1]:




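By contrast, a stream processor treats its input as unbounded: it handles each message as it arrives and keeps running state, so a result is available at any moment. The Java sketch below illustrates this with a per-message word count; `StreamingWordCount`, `onMessage` and `snapshot` are hypothetical names chosen for this example.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: a stream processor keeps running state and
// updates it per message, so results are available at any point.
class StreamingWordCount {
    private final Map<String, Integer> counts = new TreeMap<>();

    // Called once for every incoming message, in arrival order.
    void onMessage(String line) {
        for (String w : line.split("\\s+")) {
            if (!w.isEmpty()) {
                counts.merge(w, 1, Integer::sum);
            }
        }
    }

    // Copy of the result so far; there is no "end of input" to wait for.
    Map<String, Integer> snapshot() {
        return new TreeMap<>(counts);
    }

    public static void main(String[] args) {
        StreamingWordCount job = new StreamingWordCount();
        job.onMessage("a b");
        job.onMessage("b c");
        System.out.println(job.snapshot()); // {a=1, b=2, c=1}
        job.onMessage("c c"); // a later message is simply processed too
        System.out.println(job.snapshot()); // {a=1, b=2, c=3}
    }
}
```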
In our Kafka + Samza setup, Samza is the processing framework, while Kafka only organises the streams as topics and messages and serves as the source that Samza reads from. Now, let's take a look at the details.
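This division of labour can be sketched as follows, under simplifying assumptions: the Kafka side is modeled as an in-memory append-only log where each message gets the next offset, and the processing side remembers the next offset it needs, so each poll only picks up messages it has not seen yet. `TopicLog` and `OffsetTrackingConsumer` are hypothetical classes, not Kafka or Samza APIs; a real system would persist the checkpointed offset instead of keeping it in memory.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for a Kafka topic partition:
// an append-only log where each message gets the next offset.
class TopicLog {
    private final List<String> messages = new ArrayList<>();

    long append(String msg) {
        messages.add(msg);
        return messages.size() - 1; // offset of the new message
    }

    // Returns messages from 'offset' (inclusive) up to the current end.
    List<String> readFrom(long offset) {
        return new ArrayList<>(messages.subList((int) offset, messages.size()));
    }
}

// Hypothetical processing side: it tracks its own position (offset),
// which lets it continue where it left off instead of re-reading.
class OffsetTrackingConsumer {
    private long nextOffset = 0;
    final List<String> processed = new ArrayList<>();

    void poll(TopicLog log) {
        for (String msg : log.readFrom(nextOffset)) {
            processed.add(msg.toUpperCase()); // stand-in for real processing
            nextOffset++;
        }
    }

    // The position a real system would persist as a checkpoint.
    long checkpoint() {
        return nextOffset;
    }

    public static void main(String[] args) {
        TopicLog log = new TopicLog();
        log.append("m0");
        log.append("m1");
        OffsetTrackingConsumer consumer = new OffsetTrackingConsumer();
        consumer.poll(log);
        log.append("m2");  // new data keeps arriving
        consumer.poll(log); // only m2 is read this time
        System.out.println(consumer.processed);    // [M0, M1, M2]
        System.out.println(consumer.checkpoint()); // 3
    }
}
```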


 


Here are some concepts in Kafka:




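The core Kafka concepts (a topic split into partitions, each partition an append-only log addressed by offset, and producers choosing a partition from the message key) can be sketched in memory like this. Messages with the same key land in the same partition, so their relative order is preserved. The `Topic` class is hypothetical, and its partitioner uses `String.hashCode()` for brevity, whereas Kafka's default partitioner hashes the key bytes with murmur2.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of a Kafka topic: a fixed number of
// partitions, each an append-only list of messages indexed by offset.
class Topic {
    final String name;
    private final List<List<String>> partitions = new ArrayList<>();

    Topic(String name, int numPartitions) {
        this.name = name;
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
    }

    // Simplified partitioner: the same key always maps to the same partition.
    // (Kafka's default uses a murmur2 hash; hashCode is used here for brevity.)
    int partitionFor(String key) {
        return Math.floorMod(key.hashCode(), partitions.size());
    }

    // "Producer" side: appends the value and returns {partition, offset}.
    long[] send(String key, String value) {
        int p = partitionFor(key);
        List<String> log = partitions.get(p);
        log.add(value);
        return new long[] {p, log.size() - 1};
    }

    // "Consumer" side: reads one message by partition and offset.
    String read(int partition, long offset) {
        return partitions.get(partition).get((int) offset);
    }

    public static void main(String[] args) {
        Topic clicks = new Topic("clicks", 3);
        long[] first = clicks.send("user-42", "click A");
        long[] second = clicks.send("user-42", "click B");
        // Same key, so same partition, with consecutive offsets that keep the order.
        System.out.println(first[0] == second[0]);                    // true
        System.out.println(second[1] - first[1]);                     // 1
        System.out.println(clicks.read((int) second[0], second[1]));  // click B
    }
}
```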
Here are some basic concepts about Samza: 




NM = Node Manager; RM = Resource Manager.
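Samza runs on YARN: the RM grants containers to an application, and the NM on each machine launches and monitors the containers there. A Samza job's tasks are then spread across those containers. The sketch below shows one simple way to distribute tasks over a given number of containers; the round-robin policy, the task names and the `ContainerPlanner` name are assumptions for illustration, and Samza's actual task-to-container grouping is configurable and may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of spreading a job's tasks across YARN containers.
// The round-robin policy is an assumption made for illustration.
class ContainerPlanner {
    // Returns, for each container, the list of task names it would run.
    static List<List<String>> assign(List<String> tasks, int containerCount) {
        if (containerCount <= 0) {
            throw new IllegalArgumentException("containerCount must be positive");
        }
        List<List<String>> containers = new ArrayList<>();
        for (int c = 0; c < containerCount; c++) {
            containers.add(new ArrayList<>());
        }
        for (int i = 0; i < tasks.size(); i++) {
            containers.get(i % containerCount).add(tasks.get(i));
        }
        return containers;
    }

    public static void main(String[] args) {
        List<String> tasks = List.of("task-0", "task-1", "task-2", "task-3", "task-4");
        // e.g. the RM grants 2 containers, each launched by an NM on some machine
        System.out.println(assign(tasks, 2));
        // [[task-0, task-2, task-4], [task-1, task-3]]
    }
}
```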


Here is a typical Samza job:




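A typical job reads messages from input streams, applies some logic to each message, and sends results to output streams. The Java sketch below mirrors that flow with a per-message callback, loosely modeled on the shape of Samza's low-level `StreamTask`, whose `process` method receives, among other arguments, an incoming message envelope and a collector for output. All types here (`Envelope`, `Collector`, `SimpleStreamTask`, `ErrorFilterTask`) and the stream names `logs` and `errors` are hypothetical simplifications, not the real Samza classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins loosely shaped like Samza's low-level API
// (these are NOT the real org.apache.samza classes).
class Envelope {
    final String stream;
    final int partition;
    final String message;

    Envelope(String stream, int partition, String message) {
        this.stream = stream;
        this.partition = partition;
        this.message = message;
    }
}

interface Collector {
    void send(String outputStream, String message);
}

interface SimpleStreamTask {
    // Called once per incoming message.
    void process(Envelope in, Collector out);
}

// Example job logic: keep only error lines and forward them to an output stream.
class ErrorFilterTask implements SimpleStreamTask {
    @Override
    public void process(Envelope in, Collector out) {
        if (in.message.startsWith("ERROR")) {
            out.send("errors", in.message);
        }
    }

    public static void main(String[] args) {
        List<String> output = new ArrayList<>();
        Collector collector = (stream, msg) -> output.add(stream + ":" + msg);
        SimpleStreamTask task = new ErrorFilterTask();
        String[] input = {"INFO started", "ERROR disk full", "INFO ok"};
        for (String line : input) {
            task.process(new Envelope("logs", 0, line), collector);
        }
        System.out.println(output); // [errors:ERROR disk full]
    }
}
```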
In general, one Samza task acts as one Kafka consumer. Each input stream corresponds to a Kafka topic, and a task consumes one partition of that topic (one stream partition in Samza is one topic partition in Kafka).
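The partition-to-task relationship can be sketched as grouping by partition number: task N consumes partition N of every input stream that has one, so the number of tasks equals the largest partition count among the inputs. This matches how Samza's default grouping is usually described, but the code is a simplified model; `PartitionGrouper` and the topic names `clicks` and `orders` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of partition-to-task grouping: task N consumes
// partition N of every input stream that has a partition N.
class PartitionGrouper {
    // 'inputs' maps each input stream (Kafka topic) name to its partition count.
    // Returns task index -> list of "stream/partition" entries it consumes.
    static Map<Integer, List<String>> group(Map<String, Integer> inputs) {
        int taskCount = 0;
        for (int n : inputs.values()) {
            taskCount = Math.max(taskCount, n);
        }
        Map<String, Integer> sorted = new TreeMap<>(inputs); // deterministic order
        Map<Integer, List<String>> tasks = new TreeMap<>();
        for (int t = 0; t < taskCount; t++) {
            List<String> assigned = new ArrayList<>();
            for (Map.Entry<String, Integer> e : sorted.entrySet()) {
                if (t < e.getValue()) {
                    assigned.add(e.getKey() + "/" + t);
                }
            }
            tasks.put(t, assigned);
        }
        return tasks;
    }

    public static void main(String[] args) {
        Map<String, Integer> inputs = Map.of("clicks", 3, "orders", 2);
        System.out.println(group(inputs));
        // {0=[clicks/0, orders/0], 1=[clicks/1, orders/1], 2=[clicks/2]}
    }
}
```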


Reference:


[1] 15-619 Cloud Computing, Carnegie Mellon University (CMU)

