How-to: make spark streaming collect data from Kafka topics and store data into hdfs
Source: Internet | Editor: 程序博客网 | Time: 2024/05/17 22:55
Development steps:
- Develop a class that connects to Kafka topics and stores the data into HDFS.
In the Spark project:
./examples/src/main/scala/org/apache/spark/examples/streaming/Kafka.scala
package org.apache.spark.examples.streaming

import org.apache.spark.SparkConf
import org.apache.spark.streaming._
import org.apache.spark.streaming.kafka._

object Kafka {
  def main(args: Array[String]): Unit = {
    if (args.length < 5) {
      System.err.println("Usage: Kafka <zkQuorum> <group> <topics> <numThreads> <output>")
      System.exit(1)
    }
    val Array(zkQuorum, group, topics, numThreads, output) = args
    val sparkConf = new SparkConf().setAppName("Kafka")
    // Micro-batch interval of 2 seconds
    val ssc = new StreamingContext(sparkConf, Seconds(2))
    ssc.checkpoint("checkpoint")
    // One entry per topic, each consumed with numThreads receiver threads
    val topicMap = topics.split(",").map((_, numThreads.toInt)).toMap
    // createStream yields (key, message) pairs; keep only the message
    val lines = KafkaUtils.createStream(ssc, zkQuorum, group, topicMap).map(_._2)
    lines.print()
    // Each batch is written to a directory named <output>-<batch time in ms>.txt
    lines.saveAsTextFiles(output, "txt")
    ssc.start()
    ssc.awaitTermination()
  }
}
- Generate a new spark-examples jar:
cd examples
mvn -Pyarn -DskipTests clean package
- Replace the cluster's spark-examples-*.jar with the newly generated one
- Start the Kafka server and a console producer:
cd ${KAFKA_HOME}
bin/kafka-server-start.sh config/server.properties
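If the test topic does not exist yet, it can be created first. This is a hedged sketch: the ZooKeeper address, replication factor, and partition count are assumptions chosen to match the single-node setup in this post, using the classic Kafka CLI of the 0.8.x era (contemporary with Spark 1.3 / CDH 5.4):

```shell
# Assumed: single-node ZooKeeper at localhost:2181; topic name "test"
# matches the producer and spark-submit commands below.
bin/kafka-topics.sh --create --zookeeper localhost:2181 \
    --replication-factor 1 --partitions 1 --topic test
```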
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
- Start spark streaming to connect to Kafka:
bin/spark-submit --master yarn-cluster --class org.apache.spark.examples.streaming.Kafka /opt/spark/lib/spark-examples-1.3.0-cdh5.4.1-hadoop2.6.0-cdh5.4.1.jar localhost:2183 group_kafka test 1 topics
Notice: group_kafka is the group id for the current spark streaming consumer; it could be anything.
- When the yarn application turns into state RUNNING, type a message in the Kafka producer:
this is a testing message
- Data is being written into hdfs:
The number between topics and .txt is the batch time in milliseconds (TIME_IN_MS).
[hadoop@master root]$ hadoop fs -ls /user/hadoop/
Found 82 items
drwxr-xr-x - hadoop supergroup 0 2015-06-18 13:13 /user/hadoop/.sparkStaging
drwxr-xr-x - hadoop supergroup 0 2015-06-18 13:13 /user/hadoop/checkpoint
drwxr-xr-x - hadoop supergroup 0 2015-06-18 13:11 /user/hadoop/topics-1434604268000
drwxr-xr-x - hadoop supergroup 0 2015-06-18 13:11 /user/hadoop/topics-1434604270000
drwxr-xr-x - hadoop supergroup 0 2015-06-18 13:11 /user/hadoop/topics-1434604272000
drwxr-xr-x - hadoop supergroup 0 2015-06-18 13:11 /user/hadoop/topics-1434604274000
[hadoop@master root]$ hadoop fs -cat /user/hadoop/topics-1434604274000/part-00000
this is a testing message
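The batch timestamp embedded in each output directory name can be decoded back to wall-clock time. A small sketch, assuming GNU coreutils `date`; the timestamp below is taken from the HDFS listing above:

```shell
# Decode a Spark Streaming batch-directory timestamp (ms since the epoch).
# 1434604274000 is from the "topics-1434604274000" directory listed above.
ts_ms=1434604274000
date -u -d "@$((ts_ms / 1000))" '+%Y-%m-%d %H:%M:%S UTC'
# prints 2015-06-18 05:11:14 UTC (13:11 in the UTC+8 listing above)
```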