Spark Streaming and the Advanced Data Source Kafka
There are two approaches to combining Spark Streaming with Kafka.

Approach 1: Receiver-based Approach
Approach 2: Direct Approach (No Receivers)

In the receiver-based approach, a receiver built on Kafka's high-level consumer API runs inside an executor and continuously stores the received data in Spark for processing. In the direct approach there is no receiver: each batch queries Kafka for the latest offsets in every topic and partition and reads the corresponding ranges of messages directly, which gives simpler parallelism and stronger delivery guarantees.
Example 1 ---- KafkaReceive
----------------------------------------------------------Prerequisites---------------------------------------------------------------------------------
Start the ZooKeeper cluster
Start the Kafka cluster
-------------------------------------------------------------------------------------------------------------------------------------------------
1. Create a topic named "sparkStreamingOnKafkaReceive" in Kafka:

root@master:/usr/local/kafka# bin/kafka-topics.sh --create --zookeeper master:2181,worker1:2181,worker2:2181 --replication-factor 2 --partitions 1 --topic sparkStreamingOnKafkaReceive

2. Start a console producer for this topic:
bin/kafka-console-producer.sh --broker-list master:9092,worker1:9092,worker2:9092 --topic sparkStreamingOnKafkaReceive
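If you want to confirm that the topic was created correctly, kafka-topics.sh also supports a --describe flag that prints the partition and replica assignment:

bin/kafka-topics.sh --describe --zookeeper master:2181,worker1:2181,worker2:2181 --topic sparkStreamingOnKafkaReceive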
3. Run the Spark Streaming program, shown below:
import org.apache.spark.streaming.kafka.KafkaUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.{SparkConf, SparkContext}

import scala.collection._

object streamingOnKafkaReceive {
  def main(args: Array[String]) {
    val conf = new SparkConf().setMaster("local[4]").setAppName("streamingOnKafkaReceive")
    val sc = new SparkContext(conf)
    val ssc = new StreamingContext(sc, Seconds(6))
    // Checkpointing is required so receiver-based state can be recovered
    ssc.checkpoint("/Res")

    // Map each topic to the number of receiver threads that consume it
    val topic = immutable.Map("sparkStreamingOnKafkaReceive" -> 2)
    // createStream is the receiver-based API; it connects through ZooKeeper
    val lines = KafkaUtils.createStream(ssc,
      "master:2181,worker1:2181,worker2:2181", "MyStreamingGroup", topic).map(_._2)

    val words = lines.flatMap(_.split(" "))
    val wordCount = words.map(x => (x, 1)).reduceByKey(_ + _)
    wordCount.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
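The program depends on the spark-streaming-kafka integration artifact, which is not bundled with Spark itself. A minimal build.sbt sketch, assuming Spark 1.6.x on Scala 2.10 (adjust both versions to match your cluster):

name := "streaming-on-kafka"

scalaVersion := "2.10.6"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"            % "1.6.3",
  "org.apache.spark" %% "spark-streaming"       % "1.6.3",
  "org.apache.spark" %% "spark-streaming-kafka" % "1.6.3"
)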
4. Type a few arbitrary strings into the producer console and watch the word counts appear in the program's output.
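For example, typing the following line into the producer console:

hello spark hello kafka

should yield output from print() along these lines (the timestamp is illustrative, and the pairs may appear in any order):

-------------------------------------------
Time: 1496060670000 ms
-------------------------------------------
(hello,2)
(spark,1)
(kafka,1)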
Example 2 ---- DirectStream
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming._
import org.apache.spark.streaming.kafka._

object DirectKafkaWordCount {
  def main(args: Array[String]) {
    if (args.length < 2) {
      System.err.println(s"""
        |Usage: DirectKafkaWordCount <brokers> <topics>
        |  <brokers> is a list of one or more Kafka brokers
        |  <topics> is a list of one or more kafka topics to consume from
        |
        """.stripMargin)
      System.exit(1)
    }

    val Array(brokers, topics) = args

    // Create context with 2 second batch interval
    val sparkConf = new SparkConf().setAppName("DirectKafkaWordCount").setMaster("local[2]")
    val ssc = new StreamingContext(sparkConf, Seconds(2))

    // Create direct kafka stream with brokers and topics
    val topicsSet = topics.split(",").toSet
    val kafkaParams = Map[String, String]("metadata.broker.list" -> brokers)
    val messages = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topicsSet)

    // Get the lines, split them into words, count the words and print
    val lines = messages.map(_._2)
    val words = lines.flatMap(_.split(" "))
    val wordCounts = words.map(x => (x, 1L)).reduceByKey(_ + _)
    wordCounts.print()

    // Start the computation
    ssc.start()
    ssc.awaitTermination()
  }
}
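A sketch of launching this program with spark-submit, reusing the topic from Example 1; the jar name DirectKafkaWordCount.jar is a placeholder for however you package the job:

bin/spark-submit --class DirectKafkaWordCount \
  DirectKafkaWordCount.jar \
  master:9092,worker1:9092,worker2:9092 sparkStreamingOnKafkaReceive

The first trailing argument is <brokers> (a comma-separated broker list) and the second is <topics>. Note that the direct approach connects to the brokers on port 9092 rather than to ZooKeeper on port 2181.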