Spark-Flume Integration: Pull Mode


This is the second way to integrate Spark Streaming with Flume.

Flume uses a netcat source, memory channel, and custom Spark sink architecture.

Local testing

1. Start the Spark Streaming application locally (polling 192.168.145.128 10000).

2. Start the Flume agent on the server.

3. Use telnet to send data to the source port, and watch the output in the local IDEA console.

Server testing

Package with Maven: mvn clean package -DskipTests

Upload the jar to the server.

Start Flume first:

flume-ng agent \
  --name netcat-memory-spark \
  --conf $FLUME_HOME/conf \
  --conf-file $FLUME_HOME/conf/spark-pull-flume.conf \
  -Dflume.root.logger=INFO,console
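Note that in pull mode the Flume agent itself must be able to load org.apache.spark.streaming.flume.sink.SparkSink. A common way to set this up is to drop the sink jar and its dependencies into Flume's lib directory before starting the agent (the jar file names below assume Spark 2.2.0 with Scala 2.11; adjust them to your versions):

```shell
# Assumed jar versions (Spark 2.2.0 / Scala 2.11) -- adjust to your cluster.
cp spark-streaming-flume-sink_2.11-2.2.0.jar  $FLUME_HOME/lib/
cp scala-library-2.11.8.jar                   $FLUME_HOME/lib/
cp commons-lang3-3.5.jar                      $FLUME_HOME/lib/
```

If these jars are missing, the agent fails to instantiate the spark-sink and logs a ClassNotFoundException at startup.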

Then submit the Spark application:

spark-submit \
  --class com.tuzhihai.flumespark.SparkPullFlume \
  --master local[2] \
  --packages org.apache.spark:spark-streaming-flume_2.11:2.2.0 \
  /root/soft_down/lib/sparklearn-1.0.jar \
  192.168.145.128 10000

Send data to the source port:

telnet 192.168.145.128 9999

Watch the Flume console output.

Why, in pull mode, must Flume be started before Spark?

First of all, pull mode is more reliable than push mode, and it is used far more often in real-world work.

In pull mode, Flume collects data and buffers it inside an agent; when Spark wants data, it simply pulls it from that agent. This is clearly the friendlier design and a better fit for production: instead of you actively pushing data to me, I fetch it from you whenever I want it. It also answers the question above: the Spark polling receiver must connect to the SparkSink running inside the Flume agent, so the agent has to be up and listening before the Spark job starts.
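For reference, a minimal sketch of what the SparkPullFlume driver might look like. The class name matches the spark-submit command above, but the body is an assumption, not the author's actual code; it uses FlumeUtils.createPollingStream from spark-streaming-flume, which is the pull-mode entry point:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.flume.FlumeUtils

// Hypothetical sketch of the driver submitted above as
// com.tuzhihai.flumespark.SparkPullFlume. Word count is an assumed workload.
object SparkPullFlume {
  def main(args: Array[String]): Unit = {
    if (args.length != 2) {
      System.err.println("Usage: SparkPullFlume <hostname> <port>")
      System.exit(1)
    }
    val Array(hostname, port) = args

    val sparkConf = new SparkConf().setAppName("SparkPullFlume")
    val ssc = new StreamingContext(sparkConf, Seconds(5))

    // Pull mode: Spark polls the SparkSink running inside the Flume agent,
    // so the agent must already be listening on <hostname>:<port>.
    val flumeStream = FlumeUtils.createPollingStream(ssc, hostname, port.toInt)

    flumeStream
      .map(sfe => new String(sfe.event.getBody.array()).trim)
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Because the receiver pulls batches transactionally from the sink's channel, events are only removed from Flume after Spark has stored them, which is where the extra reliability over push mode comes from.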




spark-pull-flume.conf
flume-ng agent \
  --name netcat-memory-spark \
  --conf $FLUME_HOME/conf \
  --conf-file $FLUME_HOME/conf/spark-pull-flume.conf \
  -Dflume.root.logger=INFO,console

# example: netcat-memory-spark
netcat-memory-spark.sources = netcat-source
netcat-memory-spark.sinks = spark-sink
netcat-memory-spark.channels = memory-channel

# Describe/configure the source
netcat-memory-spark.sources.netcat-source.type = netcat
netcat-memory-spark.sources.netcat-source.bind = 192.168.145.128
netcat-memory-spark.sources.netcat-source.port = 9999

# Describe the sink
netcat-memory-spark.sinks.spark-sink.type = org.apache.spark.streaming.flume.sink.SparkSink
netcat-memory-spark.sinks.spark-sink.hostname = 192.168.145.128
netcat-memory-spark.sinks.spark-sink.port = 10000

# Use a channel which buffers events in memory
netcat-memory-spark.channels.memory-channel.type = memory

# Bind the source and sink to the channel
netcat-memory-spark.sources.netcat-source.channels = memory-channel
netcat-memory-spark.sinks.spark-sink.channel = memory-channel