Spark Stage Summary 4


 

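1.    Overview

This article covers the fourth stage of my Spark study notes. The main goal is an end-to-end demo of Spark Streaming + Flume + log4j + MongoDB, building on the demo introduced in <Spark Stage Summary 3>. The accompanying GitHub repository: https://github.com/riverlight/spark-study-1.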

 

2.    MongoDB installation and usage

Installation guide: http://www.runoob.com/mongodb/mongodb-linux-install.html

Using MongoDB from Scala: http://blog.csdn.net/yaoyasong/article/details/39698339
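As a rough illustration of what calling MongoDB from Scala looks like, here is a minimal sketch using Casbah, the driver that mfs.scala imports. It assumes a mongod instance listening on localhost:27017; the object name CasbahSketch and the helper countDoc are made up for this example, and the database and collection names simply mirror the ones used later in mfs.scala.

```scala
import com.mongodb.casbah.Imports._

object CasbahSketch {
  // Hypothetical helper: builds the document that will be stored
  def countDoc(n: Long): DBObject = MongoDBObject("count" -> n)

  def main(args: Array[String]): Unit = {
    // Assumption: mongod is reachable on localhost:27017
    val client = MongoClient("localhost", 27017)
    // Database "sca" and collection "mysca" mirror the names used in mfs.scala
    val coll = client("sca")("mysca")

    // Insert one document
    coll.insert(countDoc(42L))

    // find() returns a cursor that can be iterated like a Scala collection
    coll.find().foreach(doc => println(doc))

    client.close()
  }
}
```

MongoDB creates the database and the collection on first insert, so nothing needs to be set up beforehand.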

 

3.    mfs

3.1   Code: mfs.scala

package com.leon

import com.mongodb.casbah.MongoClient
import com.mongodb.casbah.commons.MongoDBObject
import org.apache.spark.streaming.flume._
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.Seconds
import org.apache.spark.storage.StorageLevel

/**
  * Created by leon on 2016/1/21.
  */
object mfs {
  def main(args: Array[String]): Unit = {
    println("Hi, this is a mongodb+flume+sparkdemo program")

    if (args.length < 2) {
      // Both arguments are read below, so stop here instead of failing on args(0)
      println("please enter host and port")
      System.exit(1)
    }

    val mongoClient = MongoClient("192.168.227.132",27017)
    val db = mongoClient("sca")
    db.collectionNames
    val mysca = db("mysca")

    val sc = new SparkConf().setAppName("FlumeEventCount")
    val ssc = new StreamingContext(sc, Seconds(20))

    val hostname = args(0)
    val port = args(1).toInt
    val storageLevel = StorageLevel.MEMORY_ONLY

   
    println(hostname + " " + port)
    val flumeStream = FlumeUtils.createPollingStream(ssc, hostname, port, storageLevel)
    flumeStream.foreachRDD( rdd => {
      //rdd.count().map( cnt => "Received" + cnt + " flume events." ).print()
     
      // Count the batch once and reuse the value for both the log line and the insert
      val cnt = rdd.count()
      println(cnt.toString)
      val count1 = MongoDBObject("count" -> cnt)
      mysca.insert(count1)
    })

//    flumeStream.count().map(cnt => "Received " + cnt + " flume events.").print()
//
//    val count1 = MongoDBObject("count" -> flumeStream.count())
//    mysca.insert(count1)

   
    ssc.start()
    // Wait for the streaming computation to terminate, then exit
    ssc.awaitTermination()
  }
}

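3.2   Notes

Note the foreachRDD call and the commented-out code below it. I first tried to write to the database with the commented-out code, and it failed: flumeStream.count() returns a DStream[Long] (a description of a stream of per-batch counts) rather than a number, so it cannot be stored as a field value. Inside foreachRDD, each batch is an ordinary RDD, so rdd.count() yields a concrete Long that can be inserted.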
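To make the difference easier to see without a Flume agent or a MongoDB server, here is a small sketch under some assumptions: it runs with a local[2] master, feeds the stream from an in-memory queue via queueStream instead of Flume, and prints a plain Map built by a hypothetical helper countRecord where mfs.scala would insert a MongoDBObject. None of this is part of the original program; it only illustrates the types involved.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.dstream.DStream
import org.apache.spark.streaming.{Seconds, StreamingContext}
import scala.collection.mutable

object ForeachRddSketch {
  // Hypothetical stand-in for building the MongoDB document
  def countRecord(n: Long): Map[String, Long] = Map("count" -> n)

  def main(args: Array[String]): Unit = {
    // Assumption: local master and an in-memory queue replace the Flume source
    val conf = new SparkConf().setMaster("local[2]").setAppName("ForeachRddSketch")
    val ssc = new StreamingContext(conf, Seconds(1))
    val queue = mutable.Queue[RDD[String]](
      ssc.sparkContext.parallelize(Seq("e1", "e2", "e3")))
    val stream: DStream[String] = ssc.queueStream(queue)

    // count() on a DStream only describes another stream of per-batch counts;
    // it is not a number that could be stored as a field value.
    val counts: DStream[Long] = stream.count()

    // Inside foreachRDD each batch is a plain RDD, so count() is a concrete Long.
    stream.foreachRDD { rdd =>
      val n: Long = rdd.count()
      println(countRecord(n))
    }

    ssc.start()
    ssc.awaitTerminationOrTimeout(5000) // end the sketch after about 5 seconds
    ssc.stop()
  }
}
```

The body of foreachRDD runs on the driver once per batch, which is why a driver-side client such as mongoClient in mfs.scala can be used there directly.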
