Spark: Study Notes (24)

Source: Internet  Editor: 程序博客网  Time: 2024/04/30 22:11

1. Setting the master in code:

    .setMaster("spark://192.168.30.129:7077") may cause the job to run out of memory


2. Replacing the "/" operator:

    val x = new Rational(1,2)

    x: Rational = 1/2


3. Ending an identifier with an underscore is not recommended.


4. Rational:

    scala> implicit def intToRational(x: Int) = new Rational(x)
    scala> val r = new Rational(2, 3)
    r: Rational = 2/3
    scala> 2 * r
    res0: Rational = 4/3
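The notes use a Rational class without showing its definition. Below is a minimal sketch of what such a class might look like, so that the "/" and "2 * r" examples can be tried out. The field names (numer, denom), the gcd helper and the RationalDemo object are assumptions made for illustration, not the original author's code.

```scala
import scala.language.implicitConversions

// Minimal sketch of a Rational class (assumed shape, for illustration only)
class Rational(n: Int, d: Int) {
  require(d != 0, "denominator must be non-zero")

  // Normalise so that 2/4 is stored as 1/2
  private val g = gcd(n.abs, d.abs)
  val numer: Int = n / g
  val denom: Int = d / g

  // Auxiliary constructor: the integer n becomes n/1
  def this(n: Int) = this(n, 1)

  def *(that: Rational): Rational =
    new Rational(numer * that.numer, denom * that.denom)

  def /(that: Rational): Rational =
    new Rational(numer * that.denom, denom * that.numer)

  override def toString: String = s"$numer/$denom"

  private def gcd(a: Int, b: Int): Int = if (b == 0) a else gcd(b, a % b)
}

object RationalDemo {
  // Lets an Int appear where a Rational is expected, so that `2 * r` compiles
  implicit def intToRational(x: Int): Rational = new Rational(x)

  def main(args: Array[String]): Unit = {
    val r = new Rational(2, 3)
    println(2 * r)                  // 4/3
    println(new Rational(1, 2) / r) // 3/4
  }
}
```

Without the implicit conversion in scope, `r * 2` and `2 * r` would both fail to compile, because neither Int nor this Rational sketch defines a `*` taking the other type.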

5. Converting the form on the left into the one on the right:

    (x: Int) => x + 1    or    increase = (x: Int) => {
                                   println(...)
                                   ...
                                   x + 1
                               }
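A small sketch of the two function-literal forms from item 5 may help. The names increaseShort and FunctionLiteralDemo, and the message passed to println, are made up for illustration, since the notes elide the body.

```scala
object FunctionLiteralDemo {
  // Short form: the body is a single expression
  val increaseShort: Int => Int = (x: Int) => x + 1

  // Block form: several statements in braces; the last expression is the result
  val increase: Int => Int = (x: Int) => {
    println("incrementing " + x) // hypothetical side effect
    x + 1
  }

  def main(args: Array[String]): Unit = {
    println(increaseShort(10)) // 11
    println(increase(10))      // prints the message, then 11
  }
}
```

Both values have the same type, Int => Int, so they can be passed wherever such a function is expected.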


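6. Every collection class offers a foreach method. It takes a function as its argument and calls that function on each element, for example to print all elements of the collection.

  *foreach is available through the trait Iterable, which List, Set and Map all extend; Array is a plain JVM array and gains foreach through an implicit conversion.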
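As a quick illustration of item 6, the sketch below calls foreach on each of the four collection types mentioned. The object name ForeachDemo and the helper total are hypothetical and exist only to make the example checkable.

```scala
object ForeachDemo {
  // Adds up the elements by passing a function literal to foreach
  def total(xs: Iterable[Int]): Int = {
    var sum = 0
    xs.foreach(x => sum += x)
    sum
  }

  def main(args: Array[String]): Unit = {
    List(1, 2, 3).foreach(x => println(x))
    Set("a", "b").foreach(x => println(x))
    Array(1.5, 2.5).foreach(x => println(x))
    // For a Map, each element is a (key, value) pair
    Map("one" -> 1, "two" -> 2).foreach { case (k, v) => println(k + " -> " + v) }
    println(total(List(1, 2, 3))) // 6
  }
}
```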


7. The "_" placeholder.

    scala> val d = sum _
    d: (Int, Int, Int) => Int = <function>
    scala> d(10, 20, 30)
    res0: Int = 60
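The REPL session in item 7 presupposes a three-argument method named sum. A minimal sketch, assuming sum simply adds its arguments (which matches the result 60), could look like this. The names PlaceholderDemo and addToThree are made up for illustration.

```scala
object PlaceholderDemo {
  // Assumed definition of the `sum` used in the notes
  def sum(a: Int, b: Int, c: Int): Int = a + b + c

  // `sum _` turns the method into a function value of type (Int, Int, Int) => Int
  val d: (Int, Int, Int) => Int = sum _

  // An underscore can also stand in for a single argument
  val addToThree: Int => Int = sum(1, _: Int, 2)

  def main(args: Array[String]): Unit = {
    println(d(10, 20, 30)) // 60
    println(addToThree(5)) // 8
  }
}
```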

8. Computing the median:

    package akria

    import org.apache.log4j.{Level, Logger}
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.SparkContext.rddToPairRDDFunctions

    /**
     * Created by sendoh on 2015/4/11.
     */
    object Median {
      def main(args: Array[String]): Unit = {
        // Reduce log noise
        Logger.getLogger("org.apache.spark").setLevel(Level.WARN)
        Logger.getLogger("org.eclipse.jetty.server").setLevel(Level.OFF)
        if (args.length != 3) {
          println("Usage: java -jar code.jar dependency_jars file_location save_location")
          System.exit(0)
        }
        // args(0) is a comma-separated list of dependency jars
        val jars = args(0).split(',').toSeq
        val conf = new SparkConf().setAppName("Median").setSparkHome("/usr/local/spark-1.2.0-bin-hadoop2.4").setJars(jars)
        val sc = new SparkContext(conf)
        // The input is expected to hold one integer per line
        val data = sc.textFile("hdfs://localhost:9000/datatnt/textwordc.txt").map(_.trim.toInt)
        // Logically split the data into regions of width 1000 and count the elements in each region
        val mappeddata = data.map(num => (num / 1000, num))
        val count = mappeddata.mapValues(_ => 1L).reduceByKey(_ + _).collect().sortBy(_._1)
        require(count.nonEmpty, "input is empty")
        // Accumulate the counts region by region, from the lowest region upwards,
        // to find the region that contains the median
        val sumCount = count.map(_._2).sum
        val mid = (sumCount + 1) / 2 // 1-based rank of the (lower) median
        var temp = 0L                // number of elements in the regions already passed
        var i = 0
        while (temp + count(i)._2 < mid) {
          temp += count(i)._2
          i += 1
        }
        val index = count(i)._1
        // 1-based offset of the median inside its region
        val offset = (mid - temp).toInt
        // The offset-th smallest element of that region is the median
        val result = mappeddata.filter(_._1 == index).values.takeOrdered(offset)
        println("Median is " + result(offset - 1))
        sc.stop()
      }
    }
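To check the bucketing logic of item 8 without a cluster, the same idea can be sketched on a local collection. This is only an illustration of the technique, not the Spark job itself: the object name BucketMedian, the width parameter and the choice of the lower median for even-sized input are assumptions.

```scala
object BucketMedian {
  // Lower median of xs, found by counting elements per bucket of the given width
  def median(xs: Seq[Int], width: Int = 1000): Int = {
    require(xs.nonEmpty, "median of an empty collection is undefined")
    // (bucket id, number of elements), sorted by bucket id
    val counts = xs.groupBy(x => x / width).toSeq
      .map { case (bucket, members) => (bucket, members.size) }
      .sortBy(_._1)
    val mid = (xs.size + 1) / 2 // 1-based rank of the lower median
    var before = 0              // elements in the buckets already passed
    var i = 0
    while (before + counts(i)._2 < mid) {
      before += counts(i)._2
      i += 1
    }
    val offset = mid - before   // 1-based position inside the median's bucket
    // Only the one bucket that holds the median needs to be sorted
    xs.filter(x => x / width == counts(i)._1).sorted.apply(offset - 1)
  }

  def main(args: Array[String]): Unit = {
    println(median(Seq(5, 1500, 2500, 999, 3000))) // 1500
  }
}
```

The point of the bucketing is that only one bucket has to be sorted, which is what makes the approach attractive when the full data set is too large to sort in one place.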

