Spark Streaming Custom Receivers
Source: Internet · Editor: 程序博客网 · Date: 2024/06/07 09:04
Create a Maven project and add the dependency:
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-streaming_2.10 -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.10</artifactId>
    <version>1.5.1</version>
    <scope>provided</scope>
</dependency>
The custom receiver:
package com.eastcom.test.first.stream;

import java.io.InputStream;
import java.util.List;

import org.apache.commons.io.IOUtils;
import org.apache.spark.storage.StorageLevel;
import org.apache.spark.streaming.receiver.Receiver;

public class FileReceiver extends Receiver<String> {

    private static final long serialVersionUID = 1L;

    public FileReceiver(StorageLevel storageLevel) {
        super(storageLevel);
    }

    /**
     * onStart() typically launches a worker thread that keeps reading the
     * data source and pushing records into Spark via store().
     *
     * Spark then consumes the stored records through
     * jsc.receiverStream(new FileReceiver(StorageLevel.MEMORY_ONLY())).
     */
    @Override
    public void onStart() {
        new Worker().start();
        // store("Hello World");
    }

    @Override
    public void onStop() {
    }

    class Worker extends Thread {
        @Override
        public void run() {
            // Loop until Spark asks the receiver to stop
            // (the original used while (true), which ignores onStop()).
            while (!isStopped()) {
                InputStream resource = FileReceiver.class.getResourceAsStream("content.txt");
                try {
                    List<String> lines = IOUtils.readLines(resource);
                    for (String line : lines) {
                        store(line); // push the record into Spark
                        System.out.println("sent " + line);
                        Thread.sleep(1000);
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                } finally {
                    IOUtils.closeQuietly(resource);
                }
            }
        }
    }

    // Local smoke test only; in a real job Spark starts the receiver itself.
    public static void main(String[] args) {
        FileReceiver fileReceiver = new FileReceiver(StorageLevel.MEMORY_ONLY());
        fileReceiver.onStart();
    }
}
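The core pattern in the receiver above is a worker thread that produces records while Spark consumes whatever store() has buffered. A minimal plain-Java sketch of that producer/consumer shape, with a BlockingQueue standing in for Spark's store() and an AtomicBoolean standing in for isStopped() (class and field names here are hypothetical; no Spark required to run it):

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

public class ReceiverSketch {
    // Stands in for Spark's store(): the worker produces, Spark would consume.
    final BlockingQueue<String> store = new LinkedBlockingQueue<>();
    // Stands in for Receiver.isStopped().
    final AtomicBoolean stopped = new AtomicBoolean(false);
    Thread worker;

    // Analogous to onStart(): launch a thread that feeds the source into store().
    void onStart(List<String> source) {
        worker = new Thread(() -> {
            for (String line : source) {
                if (stopped.get()) {
                    return; // honor the stop signal, like while (!isStopped())
                }
                store.add(line); // analogous to store(line)
            }
        });
        worker.start();
    }

    // Analogous to onStop(): signal the worker and wait for it to finish.
    void onStop() throws InterruptedException {
        stopped.set(true);
        worker.join();
    }

    public static void main(String[] args) throws Exception {
        ReceiverSketch r = new ReceiverSketch();
        r.onStart(Arrays.asList("hello world", "hello spark"));
        r.onStop();
        System.out.println(r.store.size()); // prints 2
    }
}
```

The point of checking the stop flag inside the loop is the same as checking isStopped() in the real receiver: without it, onStop() returns but the worker thread keeps producing forever.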
The main method:
package com.eastcom.test.first.stream;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

import org.apache.spark.SparkConf;
import org.apache.spark.storage.StorageLevel;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

import scala.Tuple2;

/**
 * Tests the custom receiver; the streams from multiple receivers are
 * merged into a single stream with union.
 */
public class TestApplication {

    private static final Pattern SPACE = Pattern.compile(" ");
    private static JavaStreamingContext jsc;

    public static void main(String[] args) throws Exception {
        onSparkConf();     // driver code; runs on the driver node
        // init();         // single receiver; executors run on worker nodes,
        //                 // triggered once per 5-second batch interval
        initUnionStream();
        startAndWait();    // runs on the driver node
    }

    public static void onSparkConf() {
        System.setProperty("hadoop.home.dir", "D:/softTools/Hadoop/hadoop-2.6.5");
        // The driver takes one core and each receiver permanently occupies one
        // core. If n in local[n] is too small, the application only receives
        // data and never processes any batches.
        SparkConf conf = new SparkConf().setAppName("SparkStreaming").setMaster("local[8]");
        jsc = new JavaStreamingContext(conf, Durations.seconds(5));
        // jsc.checkpoint("/checkpoint");
    }

    /**
     * Single receiver.
     */
    public static void init() {
        JavaReceiverInputDStream<String> lines =
                jsc.receiverStream(new FileReceiver(StorageLevel.MEMORY_ONLY()));
        JavaDStream<String> words = lines.flatMap(x -> Arrays.asList(SPACE.split(x)));
        JavaPairDStream<String, Integer> wordCounts = words
                .mapToPair(s -> new Tuple2<>(s, 1))
                .reduceByKey((i1, i2) -> i1 + i2);
        wordCounts.print();
    }

    /**
     * Multiple receivers: merge several input streams into one with union.
     */
    public static void initUnionStream() {
        List<JavaDStream<String>> streams = new ArrayList<>();
        JavaDStream<String> lines_1 = jsc.receiverStream(new FileReceiver(StorageLevel.MEMORY_ONLY()));
        streams.add(lines_1);
        JavaDStream<String> lines_2 = jsc.receiverStream(new FileReceiver(StorageLevel.MEMORY_ONLY()));
        streams.add(lines_2);
        JavaDStream<String> lines_3 = jsc.receiverStream(new FileReceiver(StorageLevel.MEMORY_ONLY()));
        streams.add(lines_3);
        JavaDStream<String> unifiedStream =
                jsc.union(streams.get(0), streams.subList(1, streams.size()));
        JavaDStream<String> words = unifiedStream.flatMap(x -> Arrays.asList(SPACE.split(x)));
        JavaPairDStream<String, Integer> wordCounts = words
                .mapToPair(s -> new Tuple2<>(s, 1))
                .reduceByKey((i1, i2) -> i1 + i2);
        wordCounts.print();
    }

    /**
     * Start Spark, wait for the job to terminate, then shut down.
     */
    public static void startAndWait() {
        jsc.start();
        jsc.awaitTermination();
        jsc.close();
    }
}
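Stripped of the DStream machinery, the union + word-count pipeline reduces to: concatenate the lines from every receiver, split each line on spaces (flatMap), pair each word with 1 (mapToPair), and sum the counts per word (reduceByKey). A plain-Java sketch of that same logic (hypothetical class name, no Spark required):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class WordCountSketch {
    private static final Pattern SPACE = Pattern.compile(" ");

    // Mirrors union -> flatMap -> mapToPair -> reduceByKey over one batch.
    static Map<String, Integer> countWords(List<List<String>> streams) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> stream : streams) {            // union: iterate all streams
            for (String line : stream) {
                for (String word : SPACE.split(line)) {  // flatMap: line -> words
                    counts.merge(word, 1, Integer::sum); // reduceByKey: sum the 1s
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<List<String>> streams = Arrays.asList(
                Arrays.asList("hello world"),
                Arrays.asList("hello spark"));
        System.out.println(countWords(streams).get("hello")); // prints 2
    }
}
```

In Spark the same computation is distributed per batch interval, but the per-batch result printed by wordCounts.print() corresponds to exactly this map.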