Problems Encountered with Spark Streaming Kafka createDirectStream

Problem 1:

Submitting the job with spark-submit fails with the error shown below.

Analysis: my Spark cluster was originally deployed on YARN, so Hadoop-related settings were configured in spark-env.sh and spark-defaults.conf. Later I wanted to run the program in Spark standalone mode, so I commented out those Hadoop-related settings in spark-env.sh and spark-defaults.conf. Submitting the program then produced:
Exception in thread "main" java.net.ConnectException: Call From node1/192.168.88.130 to node1:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
The exception means the connection to Hadoop HDFS was refused. While troubleshooting I assumed some Hadoop setting under the Spark conf directory had not been fully commented out, but repeated checks found nothing. I then suspected a Linux environment variable, ran echo $HADOOP_CONF_DIR, and sure enough it was still set, configured in /etc/profile. Commenting it out solved the problem.

Takeaway: avoid putting application-specific environment variables in /etc/profile; configure them in the environment files of the big-data framework itself (for Spark, spark-env.sh).
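
A quick way to see which Hadoop configuration a Spark application has actually picked up is to print fs.defaultFS from the SparkContext's Hadoop configuration. A minimal sketch (the object name and master setting are just for illustration):

import org.apache.spark.{SparkConf, SparkContext}

object PrintDefaultFs {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("PrintDefaultFs").setMaster("local[2]"))
    // If a stray HADOOP_CONF_DIR (or core-site.xml) is still being picked up,
    // this prints something like hdfs://node1:9000 instead of file:///
    println(sc.hadoopConfiguration.get("fs.defaultFS"))
    sc.stop()
  }
}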


Problem 2:

My setup is a pseudo-distributed single Linux node, with one Master and one Worker running on the same machine. I overlooked the fact that the application is still running in cluster mode and carelessly set the checkpoint directory to a local filesystem path. After submitting the application, the following warning appeared and checkpointing did not take effect:
 WARN SparkContext: Spark is not running in local mode, therefore the checkpoint directory must not be on 
 the local filesystem. Directory 'file:/home/daxineckpoint' appears to be on the local filesystem.
 

When a Spark application runs in cluster mode, the checkpoint directory must not be on the local filesystem; it has to be on a shared, fault-tolerant filesystem such as HDFS.
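
A minimal sketch of pointing the checkpoint at HDFS instead (the host, port and path are assumptions for illustration):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}

object CheckpointOnHdfs {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("CheckpointOnHdfs").setMaster("spark://node1:7077")
    val sc = new SparkContext(conf)
    val ssc = new StreamingContext(sc, Seconds(2))
    // On a cluster the checkpoint directory must live on a shared, fault-tolerant
    // filesystem such as HDFS; a file:/ path is only acceptable with a local[*] master.
    ssc.checkpoint("hdfs://node1:9000/streamingcheckpoint")
    // ... define DStreams here, then call ssc.start() and ssc.awaitTermination()
  }
}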


Problem 3:

Working code (all transformations and actions are defined inside the function passed to StreamingContext.getOrCreate):
package com.sparkstreaming.direct

import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.KafkaUtils
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}

/**
  * Created by Dax1n on 2016/12/1.
  */
object DirectCreateDstream1 {

  val kafkaParams = Map[String, String](
    "metadata.broker.list" -> "node1:9092,node1:9093,node1:9094",
    "group.id" -> "onlyOneCk1")

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
    conf.setAppName("LocalDirect").setMaster("local[2]")
    val sc = new SparkContext(conf)
    sc.setLogLevel("WARN")

    def createStreamingContext(): StreamingContext = {
      val ssc = new StreamingContext(sc, Seconds(2))
      ssc.checkpoint("C:\\streamingcheckpoint1")
      val dStream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
        ssc, kafkaParams, Set("orderNumOnlyOne1"))
      val dStream1 = dStream.map { x =>
        x._1 + " - " + x._2
      }
      dStream1.print()
      ssc
    }

    // Important: all Spark transformations and actions must be written inside the
    // createStreamingContext function passed to getOrCreate, otherwise the job fails
    // with the error shown below. See the Checkpointing section of the official guide:
    // http://spark.apache.org/docs/latest/streaming-programming-guide.html
    val ssc = StreamingContext.getOrCreate("C:\\streamingcheckpoint1", createStreamingContext _)

    ssc.start()
    ssc.awaitTermination()
  }
}






If the Spark transformations and actions are written outside the createStreamingContext function, the following error is thrown:

16/12/01 09:04:38 ERROR streaming.StreamingContext: Error starting the context, marking it as stopped

org.apache.spark.SparkException: org.apache.spark.streaming.dstream.MappedDStream@4c2a67cc has not been initialized


Incorrect code:

package com.sparkstreaming.direct

import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.KafkaUtils
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}

/**
  * Created by Dax1n on 2016/12/1.
  */
object DirectCreateDstream1 {

  val kafkaParams = Map[String, String](
    "metadata.broker.list" -> "node1:9092,node1:9093,node1:9094",
    "group.id" -> "onlyOneCk1")

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
    conf.setAppName("LocalDirect").setMaster("local[2]")
    val sc = new SparkContext(conf)
    sc.setLogLevel("WARN")

    def createStreamingContext(): StreamingContext = {
      val ssc = new StreamingContext(sc, Seconds(2))
      ssc.checkpoint("C:\\streamingcheckpoint1")
      ssc
    }

    // Wrong: the transformations and actions are defined outside the
    // createStreamingContext factory, so the DStream graph is not part of the
    // checkpointed context and fails with "has not been initialized" on start.
    val ssc = StreamingContext.getOrCreate("C:\\streamingcheckpoint1", createStreamingContext _)
    val dStream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("orderNumOnlyOne1"))
    val dStream1 = dStream.map { x =>
      x._1 + " - " + x._2
    }
    dStream1.print()

    ssc.start()
    ssc.awaitTermination()
  }
}



