Spark-Flume Integration: Pull Mode


This is the second way to integrate Spark Streaming with Flume.

Flume uses a netcat source, memory channel, and custom Spark sink architecture.

Local testing

1. Start the Spark Streaming application locally (polling 192.168.145.128 10000).

2. Start the Flume agent on the server.

3. Use telnet to send data to the source port, and watch the output in the local IDEA console.

Server testing

Package with Maven: mvn clean package -DskipTests

Upload the jar to the server.

Start Flume first:

flume-ng agent \
  --name netcat-memory-spark \
  --conf $FLUME_HOME/conf \
  --conf-file $FLUME_HOME/conf/spark-pull-flume.conf \
  -Dflume.root.logger=INFO,console
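Note that in pull mode the Flume agent itself must be able to load org.apache.spark.streaming.flume.sink.SparkSink. A common way to set this up is to drop the sink jar and its dependencies into Flume's lib directory before starting the agent (the jar file names below assume Spark 2.2.0 with Scala 2.11; adjust them to your versions):

```shell
# Assumed jar versions (Spark 2.2.0 / Scala 2.11) -- adjust to your cluster.
cp spark-streaming-flume-sink_2.11-2.2.0.jar  $FLUME_HOME/lib/
cp scala-library-2.11.8.jar                   $FLUME_HOME/lib/
cp commons-lang3-3.5.jar                      $FLUME_HOME/lib/
```

If these jars are missing, the agent fails to instantiate the spark-sink and logs a ClassNotFoundException at startup.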

Then submit the Spark application:

spark-submit \
  --class com.tuzhihai.flumespark.SparkPullFlume \
  --master local[2] \
  --packages org.apache.spark:spark-streaming-flume_2.11:2.2.0 \
  /root/soft_down/lib/sparklearn-1.0.jar \
  192.168.145.128 10000

Send data to the source port:

telnet 192.168.145.128 9999

Watch the Flume console output.

Why, in pull mode, must Flume be started before Spark?

First of all, pull mode is more reliable than push mode, and it is used far more often in real-world work.

In pull mode, Flume collects data and buffers it inside an agent; when Spark wants data, it simply pulls it from that agent. This is clearly the friendlier design and a better fit for production: instead of you actively pushing data to me, I fetch it from you whenever I want it. It also answers the question above: the Spark polling receiver must connect to the SparkSink running inside the Flume agent, so the agent has to be up and listening before the Spark job starts.
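For reference, a minimal sketch of what the SparkPullFlume driver might look like. The class name matches the spark-submit command above, but the body is an assumption, not the author's actual code; it uses FlumeUtils.createPollingStream from spark-streaming-flume, which is the pull-mode entry point:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.flume.FlumeUtils

// Hypothetical sketch of the driver submitted above as
// com.tuzhihai.flumespark.SparkPullFlume. Word count is an assumed workload.
object SparkPullFlume {
  def main(args: Array[String]): Unit = {
    if (args.length != 2) {
      System.err.println("Usage: SparkPullFlume <hostname> <port>")
      System.exit(1)
    }
    val Array(hostname, port) = args

    val sparkConf = new SparkConf().setAppName("SparkPullFlume")
    val ssc = new StreamingContext(sparkConf, Seconds(5))

    // Pull mode: Spark polls the SparkSink running inside the Flume agent,
    // so the agent must already be listening on <hostname>:<port>.
    val flumeStream = FlumeUtils.createPollingStream(ssc, hostname, port.toInt)

    flumeStream
      .map(sfe => new String(sfe.event.getBody.array()).trim)
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Because the receiver pulls batches transactionally from the sink's channel, events are only removed from Flume after Spark has stored them, which is where the extra reliability over push mode comes from.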




spark-pull-flume.conf
flume-ng agent \
  --name netcat-memory-spark \
  --conf $FLUME_HOME/conf \
  --conf-file $FLUME_HOME/conf/spark-pull-flume.conf \
  -Dflume.root.logger=INFO,console

# example: netcat-memory-spark
netcat-memory-spark.sources = netcat-source
netcat-memory-spark.sinks = spark-sink
netcat-memory-spark.channels = memory-channel

# Describe/configure the source
netcat-memory-spark.sources.netcat-source.type = netcat
netcat-memory-spark.sources.netcat-source.bind = 192.168.145.128
netcat-memory-spark.sources.netcat-source.port = 9999

# Describe the sink
netcat-memory-spark.sinks.spark-sink.type = org.apache.spark.streaming.flume.sink.SparkSink
netcat-memory-spark.sinks.spark-sink.hostname = 192.168.145.128
netcat-memory-spark.sinks.spark-sink.port = 10000

# Use a channel which buffers events in memory
netcat-memory-spark.channels.memory-channel.type = memory

# Bind the source and sink to the channel
netcat-memory-spark.sources.netcat-source.channels = memory-channel
netcat-memory-spark.sinks.spark-sink.channel = memory-channel