Spark sorting: a secondary sort in Scala with a custom key. Scala is elegant and concise where Java is verbose!

Source: Internet | Published by: java数据结构与算法题 | Editor: 程序博客网 | Time: 2024/06/06 05:45

A secondary sort on Spark, written in Scala

[Input data file]

2 3

4 1

3 2

4 3

8 7

2 1

[Output] Sorted in descending order, by the first column and then by the second

8 7

4 3

4 1

3 2

2 3

2 1


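[Source files] SecondarySortApp.scala, SecondarySortKey.scala

The class SecondarySortKey defines the ordering through its compare method: keys are compared on the first field, and on the second field only when the first fields are equal.

class SecondarySortKey(val first: Int, val second: Int) extends Ordered[SecondarySortKey] with Serializable {
  def compare(other: SecondarySortKey): Int = {
    if (this.first - other.first != 0) {
      this.first - other.first
    } else {
      this.second - other.second
    }
  }
}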
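The subtraction-based compare is fine for small values like the sample data, but the difference of two Int values can wrap around when they are far apart. As a minimal sketch, assuming you want to guard against that, the comparison can be written with Integer.compare instead. The names SafeSecondarySortKey and OverflowDemo are hypothetical and not part of the original code.

```scala
// Hypothetical variant of SecondarySortKey; the class name is illustrative only.
class SafeSecondarySortKey(val first: Int, val second: Int)
    extends Ordered[SafeSecondarySortKey] with Serializable {

  // Integer.compare returns a negative, zero, or positive value without
  // subtracting the operands, so it cannot overflow.
  def compare(other: SafeSecondarySortKey): Int = {
    val byFirst = Integer.compare(this.first, other.first)
    if (byFirst != 0) byFirst else Integer.compare(this.second, other.second)
  }
}

object OverflowDemo {
  def main(args: Array[String]): Unit = {
    val small = new SafeSecondarySortKey(Int.MinValue, 0)
    val large = new SafeSecondarySortKey(1, 0)
    // Int.MinValue - 1 wraps around to Int.MaxValue, so a subtraction-based
    // compare would report the smaller key as the greater one.
    println(Int.MinValue - 1 > 0) // prints "true"
    // The Integer.compare version orders the two keys correctly.
    println(small < large)        // prints "true"
  }
}
```

Because the class still extends Ordered and Serializable, it could be used as a sort key in the same way as SecondarySortKey.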

SecondarySortApp

1. Read the data line by line

val lines = sc.textFile("G://IMFBigDataSpark2016//tesdata//helloSpark.txt", 1) // read the local file; the second argument requests at least one partition

2. Turn each line into a (K, V) tuple: the key is a SecondarySortKey holding the first and second numbers of the line, and the value is the whole line

val pairWithSortKey = lines.map(line => (
  new SecondarySortKey(line.split(" ")(0).toInt, line.split(" ")(1).toInt), line
))

3. Sort pairWithSortKey by key; passing false to sortByKey gives descending order

val sorted = pairWithSortKey.sortByKey(false)

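4. Each element of the sorted result, sortedLine, is a (K, V) pair; keep only the value sortedLine._2, which is the original line

val sortedResult = sorted.map(sortedLine => sortedLine._2)

5. Collect the result to the driver and print it

sortedResult.collect().foreach(println)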
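To check the expected ordering without starting Spark, steps 2 to 4 can be mirrored on an ordinary Scala collection. This is only a sketch under assumptions: LocalSecondarySort, sortKey and sortDescending are hypothetical names, and a plain (Int, Int) tuple stands in for SecondarySortKey, since the standard library already orders tuples by the first element and then the second.

```scala
object LocalSecondarySort {
  // Parse a line such as "4 3" into an (Int, Int) key.
  def sortKey(line: String): (Int, Int) = {
    val fields = line.trim.split("\\s+")
    (fields(0).toInt, fields(1).toInt)
  }

  // Mirrors steps 2 to 4: pair each line with its key, sort by key in
  // descending order, then keep only the original line.
  def sortDescending(lines: Seq[String]): Seq[String] =
    lines
      .map(line => (sortKey(line), line))
      .sortBy(_._1)(Ordering[(Int, Int)].reverse)
      .map(_._2)

  def main(args: Array[String]): Unit = {
    // The sample input from the data file above.
    val input = Seq("2 3", "4 1", "3 2", "4 3", "8 7", "2 1")
    sortDescending(input).foreach(println)
  }
}
```

Run on the sample input, this prints the lines in the same order as the Output section: 8 7, 4 3, 4 1, 3 2, 2 3, 2 1.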

Full source code:

// SecondarySortKey.scala
package com.dt.spark

class SecondarySortKey(val first: Int, val second: Int) extends Ordered[SecondarySortKey] with Serializable {
  // Compare on the first field; fall back to the second field on a tie
  def compare(other: SecondarySortKey): Int = {
    if (this.first - other.first != 0) {
      this.first - other.first
    } else {
      this.second - other.second
    }
  }
}

// SecondarySortApp.scala
package com.dt.spark

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

/*
 * Taught by Wang Jialin http://weibo.com/ilovepains
 */
object SecondarySortApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf() // create the SparkConf object
    conf.setAppName("SecondarySortApp!") // application name, shown in the monitoring UI while the program runs
    conf.setMaster("local") // run locally; no Spark cluster needs to be installed
    val sc = new SparkContext(conf)
    val lines = sc.textFile("G://IMFBigDataSpark2016//tesdata//helloSpark.txt", 1) // read the local file; the second argument requests at least one partition

    val pairWithSortKey = lines.map { line =>
      val splited = line.split(" ") // split once and reuse both fields
      (new SecondarySortKey(splited(0).toInt, splited(1).toInt), line)
    }

    val sorted = pairWithSortKey.sortByKey(false) // false = descending order

    val sortedResult = sorted.map(sortedLine => sortedLine._2) // keep only the original line

    sortedResult.collect().foreach(println)
  }
}

0 1

SPARK排序算法，使用Scala开发 二次排序 自定义KEY值，相比JAVA的罗嗦，Scala优雅简洁！！！

SPARK排序算法，使用Scala开发二次排序自定义KEY值，相比JAVA的罗嗦，Scala优雅简洁！！！