Spark Streaming + Kafka Integration Example
Source: Internet | Editor: 程序博客网 | Time: 2024/05/15 14:37
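Summary: This article walks through an example of integrating Spark Streaming with Kafka.
Project download: https://github.com/appleappleapple/BigDataLearning
1. Project directory structure
2. Add the dependencies (pom.xml)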
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.lin</groupId>
  <artifactId>SparkStreaming-Demo</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <name>${project.artifactId}</name>
  <description>My wonderful scala app</description>
  <inceptionYear>2015</inceptionYear>
  <licenses>
    <license>
      <name>My License</name>
      <url>http://....</url>
      <distribution>repo</distribution>
    </license>
  </licenses>

  <properties>
    <maven.compiler.source>1.8</maven.compiler.source>
    <maven.compiler.target>1.8</maven.compiler.target>
    <encoding>UTF-8</encoding>
    <scala.version>2.11.5</scala.version>
    <scala.compat.version>2.11</scala.compat.version>
  </properties>

  <dependencies>
    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-log4j12</artifactId>
      <version>1.7.8</version>
    </dependency>
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>${scala.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.11</artifactId>
      <version>2.1.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming_2.11</artifactId>
      <version>2.1.0</version>
    </dependency>
    <!-- Kafka 0.8 integration (KafkaUtils.createDirectStream with StringDecoder). Its version must match
         Spark (2.1.0); the older spark-streaming-kafka_2.11:1.6.1 artifact was built for Spark 1.6
         and does not work with Spark 2.x. -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming-kafka-0-8_2.11</artifactId>
      <version>2.1.0</version>
    </dependency>
  </dependencies>

  <build>
    <sourceDirectory>src/main/scala</sourceDirectory>
    <testSourceDirectory>src/test/scala</testSourceDirectory>
    <resources>
      <resource>
        <directory>src/main/resources</directory>
        <targetPath>${basedir}/target/classes</targetPath>
        <includes>
          <include>**/*.properties</include>
          <include>**/*.xml</include>
        </includes>
        <filtering>true</filtering>
      </resource>
      <resource>
        <directory>src/main/resources</directory>
        <targetPath>${basedir}/target/resources</targetPath>
        <includes>
          <include>**/*.properties</include>
          <include>**/*.xml</include>
        </includes>
        <filtering>true</filtering>
      </resource>
    </resources>
    <plugins>
      <plugin>
        <!-- see http://davidb.github.com/scala-maven-plugin -->
        <groupId>net.alchim31.maven</groupId>
        <artifactId>scala-maven-plugin</artifactId>
        <version>3.2.0</version>
        <executions>
          <execution>
            <goals>
              <goal>compile</goal>
              <goal>testCompile</goal>
            </goals>
            <configuration>
              <args>
                <!-- <arg>-make:transitive</arg> -->
                <arg>-dependencyfile</arg>
                <arg>${project.build.directory}/.scala_dependencies</arg>
              </args>
            </configuration>
          </execution>
        </executions>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-surefire-plugin</artifactId>
        <version>2.18.1</version>
        <configuration>
          <useFile>false</useFile>
          <disableXmlReport>true</disableXmlReport>
          <includes>
            <include>**/*Test.*</include>
            <include>**/*Suite.*</include>
          </includes>
        </configuration>
      </plugin>
      <plugin>
        <artifactId>maven-assembly-plugin</artifactId>
        <version>2.6</version>
        <configuration>
          <descriptorRefs>
            <descriptorRef>jar-with-dependencies</descriptorRef>
          </descriptorRefs>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>
3. Write the computation code (the streaming word-count job)
package com.lin.demo

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Durations, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object KafkaWordCount {
  def main(args: Array[String]): Unit = {
    // local[2]: at least two threads. A receiver-based stream occupies one thread with its receiver
    // and needs another for processing; the direct stream below has no receiver, but two threads
    // is still a safe choice for local runs.
    val sparkConf = new SparkConf().setAppName("WordCount").setMaster("local[2]")
    // One micro-batch every 3 seconds
    val ssc = new StreamingContext(sparkConf, Durations.seconds(3))

    // Kafka broker(s) to read from
    val kafkaParams = Map[String, String]("metadata.broker.list" -> "127.0.0.1:9092")
    // Set of topics to read; the direct stream can read several topics in parallel
    val topics = Set[String]("linlin")

    // Kafka records arrive as (key, value) pairs; only the value needs to be split into words
    val lines = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics)
    val words = lines.flatMap { _._2.split(" ") }

    // Number of words received in each batch (printed on the driver)
    words.foreachRDD { rdd =>
      println("Read from topic " + topics + ", words in batch: " + rdd.count())
    }
    words.print()

    // Word count per batch: emit (word, 1) and sum the ones for each word
    val wordCounts = words.map { x => (x, 1) }.reduceByKey { _ + _ }
    wordCounts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
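To see which messages each 3-second batch of the direct stream actually picks up, here is a simplified plain-Java model of its offset bookkeeping. It is a sketch, not Spark's code, and it assumes a single partition, no rate limit (spark.streaming.kafka.maxRatePerPartition unset), and the 0.8 integration's default of starting at the latest offset unless auto.offset.reset is set to "smallest". DirectStreamOffsetsSketch, Range and plan are hypothetical names, and the offsets in main are made-up numbers.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified, single-partition model of how a direct Kafka stream chooses the offsets each
 * batch reads. Not Spark code. Assumptions: no rate limit, and the stream starts at the
 * partition's latest offset (the default unless auto.offset.reset is "smallest").
 */
public class DirectStreamOffsetsSketch {

    /** Hypothetical stand-in for a half-open offset range [from, until) read by one batch. */
    public static final class Range {
        public final long from;
        public final long until;

        public Range(long from, long until) {
            this.from = from;
            this.until = until;
        }

        public long count() {
            return until - from;
        }

        @Override
        public String toString() {
            return "[" + from + ", " + until + ")";
        }
    }

    /**
     * @param startOffset     latest offset when the job starts (where the stream begins)
     * @param latestAtBatches latest offset observed at each successive batch time
     */
    public static List<Range> plan(long startOffset, long[] latestAtBatches) {
        List<Range> ranges = new ArrayList<>();
        long current = startOffset;
        for (long latest : latestAtBatches) {
            ranges.add(new Range(current, latest)); // everything written since the previous batch
            current = latest;                        // the next batch continues from here
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Hypothetical numbers: 500 messages were already in the topic before the job started,
        // then the producer keeps writing while a batch fires every 3 seconds.
        long[] latest = {500, 800, 800, 1250};
        for (Range r : plan(500, latest)) {
            System.out.println(r + " -> " + r.count() + " messages");
        }
    }
}
```

Under these assumptions the first 500 messages are never read, and a batch in which nothing new arrived is simply empty. Note that the line printed by foreachRDD in the job counts words, not messages, so its number is the message count multiplied by the words per message.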
4. Start ZooKeeper and Kafka
Start ZooKeeper
Start Kafka
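Before running the job and the producer, it can help to confirm that both services accept connections. The check below is a small sketch that assumes the addresses used throughout this article (ZooKeeper on 127.0.0.1:2181, Kafka on 127.0.0.1:9092). PortCheck and isListening are hypothetical names, and a successful TCP connect only shows that something is listening on the port, not that it is a healthy broker.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/**
 * Hypothetical helper: reports whether anything accepts TCP connections on host:port.
 * Assumes ZooKeeper and Kafka run locally on their default ports 2181 and 9092.
 */
public class PortCheck {

    public static boolean isListening(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;  // TCP handshake succeeded
        } catch (IOException e) {
            return false; // refused, timed out, or host unreachable
        }
    }

    public static void main(String[] args) {
        System.out.println("ZooKeeper 127.0.0.1:2181 listening: " + isListening("127.0.0.1", 2181, 1000));
        System.out.println("Kafka     127.0.0.1:9092 listening: " + isListening("127.0.0.1", 9092, 1000));
    }
}
```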
5. Send messages (producer)
package com.lin.demo.producer;

import java.util.Properties;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class KafkaProducer {
    private final Producer<String, String> producer;
    public final static String TOPIC = "linlin";

    private KafkaProducer() {
        Properties props = new Properties();
        // Kafka broker address (host:port)
        props.put("metadata.broker.list", "127.0.0.1:9092");
        // Serializer for the message value
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        // Serializer for the message key
        props.put("key.serializer.class", "kafka.serializer.StringEncoder");
        // -1: wait until all in-sync replicas have acknowledged each message
        props.put("request.required.acks", "-1");
        producer = new Producer<String, String>(new ProducerConfig(props));
    }

    void produce() {
        int messageNo = 1000;
        final int COUNT = 10000;
        // Send messages 1000 .. 9999, using the message number as the key
        while (messageNo < COUNT) {
            String key = String.valueOf(messageNo);
            String data = "INFO JobScheduler: Finished job streaming job 1493090727000 ms.0 from job set of time 1493090727000 ms" + key;
            producer.send(new KeyedMessage<String, String>(TOPIC, key, data));
            System.out.println(data);
            messageNo++;
        }
        producer.close();
    }

    public static void main(String[] args) {
        new KafkaProducer().produce();
    }
}
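6. Verification
Run both the code from step 3 (the Spark job) and the code from step 5 (the producer). Start the Spark job first: by default the direct stream begins at the topic's latest offset, so messages sent before it starts are not counted. Every 3 seconds the job then prints the number of words in the batch, followed by the first words and the first word counts (DStream.print() shows up to 10 elements).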
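To check the console output by hand, the sketch below recomputes, without Spark or Kafka, what one batch should contain when it holds a given run of producer messages. It copies the payload string from the producer and mirrors the job's flatMap / map / reduceByKey in plain Java. ExpectedBatchOutput and its methods are hypothetical names, and how many messages land in one batch depends on timing, so the batch size is an input here.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical helper for verification: word counts for a batch holding messages
 * firstNo .. firstNo + n - 1 sent by the producer. Plain Java, no Spark or Kafka.
 */
public class ExpectedBatchOutput {

    // Copied from the producer's produce(); the key is appended directly after "ms", with no space
    static final String PREFIX = "INFO JobScheduler: Finished job streaming job 1493090727000 ms.0"
            + " from job set of time 1493090727000 ms";

    public static String payload(int messageNo) {
        return PREFIX + String.valueOf(messageNo);
    }

    public static Map<String, Integer> countBatch(int firstNo, int n) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int no = firstNo; no < firstNo + n; no++) {
            for (String word : payload(no).split(" ")) { // like flatMap { _._2.split(" ") }
                counts.merge(word, 1, Integer::sum);     // like map((_, 1)).reduceByKey(_ + _)
            }
        }
        return counts;
    }

    /** Total number of words, i.e. what the job's foreachRDD line reports as rdd.count(). */
    public static int totalWords(Map<String, Integer> counts) {
        int total = 0;
        for (int c : counts.values()) {
            total += c;
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countBatch(1000, 10);                     // 10 messages in one batch
        System.out.println("words in batch: " + totalWords(counts));           // 150
        System.out.println("job -> " + counts.get("job"));                     // 30
        System.out.println("1493090727000 -> " + counts.get("1493090727000")); // 20
        System.out.println("ms1000 -> " + counts.get("ms1000"));               // 1
    }
}
```

Each message splits into 15 words, three of them "job" and two of them "1493090727000". Because the producer appends the key without a space, every message also contributes one unique word such as "ms1000", so the number of distinct words in a batch grows with the number of messages it contains.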