Spark RDD: Two-Level Grouping and Sorting to Get the TopK

Source: Internet | Published by: 订餐的软件 | Editor: 程序博客网 | Time: 2024/06/03 10:48

Basic requirement

Use Spark to find the top 3 students in each subject, for every class in every department.

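Sample data

Data format: id,studentId,language,math,english,classId,departmentId, that is: record id, student number, Chinese score, math score, English (foreign language) score, class, department. The class and department values are kept in Chinese in the data: 1班 and 2班 are Class 1 and Class 2, 经济系 is the Department of Economics, and 外语系 is the Department of Foreign Languages.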

1,111,68,69,90,1班,经济系
2,112,73,80,96,1班,经济系
3,113,90,74,75,1班,经济系
4,114,89,94,93,1班,经济系
5,115,99,93,89,1班,经济系
6,121,96,74,79,2班,经济系
7,122,89,86,85,2班,经济系
8,123,70,78,61,2班,经济系
9,124,76,70,76,2班,经济系
10,211,89,93,60,1班,外语系
11,212,76,83,75,1班,外语系
12,213,71,94,90,1班,外语系
13,214,94,94,66,1班,外语系
14,215,84,82,73,1班,外语系
15,216,85,74,93,1班,外语系
16,221,77,99,61,2班,外语系
17,222,80,78,96,2班,外语系
18,223,79,74,96,2班,外语系
19,224,75,80,78,2班,外语系
20,225,82,85,63,2班,外语系
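The record format above can be made explicit with a small parser. The following is only a minimal sketch in plain Scala (no Spark needed); the names Score and ScoreParser are hypothetical and do not appear in the original post. It converts the three scores to Int and rejects malformed lines instead of throwing.

```scala
import scala.util.Try

// Hypothetical record type; field names follow the format line above.
final case class Score(
  id: Int,
  studentId: String,
  language: Int,
  math: Int,
  english: Int,
  classId: String,
  departmentId: String
)

object ScoreParser {
  // Returns None when the line does not have exactly 7 comma-separated
  // fields, or when a numeric field cannot be parsed as an Int.
  def parse(line: String): Option[Score] =
    line.split(",").map(_.trim) match {
      case Array(id, sid, lang, math, eng, cls, dept) =>
        Try(Score(id.toInt, sid, lang.toInt, math.toInt, eng.toInt, cls, dept)).toOption
      case _ => None
    }
}
```

With an RDD of lines, the same function could presumably be applied as sc.textFile(path).flatMap(line => ScoreParser.parse(line)), which silently drops lines that fail to parse; whether dropping or failing is the right behavior depends on the data source.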

Implementation with Spark Core

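import org.apache.spark.{SparkConf, SparkContext}

/**
  * Student score TopK problem
  *
  * Top 3 in each subject, for every class in every department.
  * Line format: id,studentId,language,math,english,classId,departmentId
  */
object TestGroupBy {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("TestGroupBy").setMaster("local[4]")
    val sc = new SparkContext(conf)
    sc.setLogLevel("WARN")

    val studentsScore = sc.textFile("C:\\Users\\lenovo\\Desktop\\scores2.txt").map(_.split(","))

    // Group by department (groupByKey), then by class (groupBy).
    // Scores are converted to Int so that sorting is numeric; comparing the raw
    // strings would rank "100" below "99".
    val groups = studentsScore
      .map(scoreInfo => (scoreInfo(6),
        (scoreInfo(5), (scoreInfo(1), scoreInfo(2).toInt, scoreInfo(3).toInt, scoreInfo(4).toInt))))
      .groupByKey()
      .map(item => item._2.groupBy(_._1).map(it => (item._1, it._1, it._2.map(_._2))))

    // TopK per subject in each class (sortWith, descending)
    val topK = groups.map(item => {
      item.map(c => {
        // Top 3 in Chinese
        val languageTopK = c._3.toList.sortWith(_._2 > _._2).take(3).map(d => s"${d._2} pts: student ${d._1}")
        // Top 3 in math
        val mathTopK = c._3.toList.sortWith(_._3 > _._3).take(3).map(d => s"${d._3} pts: student ${d._1}")
        // Top 3 in English
        val englishTopK = c._3.toList.sortWith(_._4 > _._4).take(3).map(d => s"${d._4} pts: student ${d._1}")
        (c._1, c._2, Map("Chinese top 3" -> languageTopK, "Math top 3" -> mathTopK, "English top 3" -> englishTopK))
      })
    })

    // Print the result
    topK.foreach(item => item.foreach(println))

    sc.stop()
  }
}

/*
(经济系,2班,Map(Chinese top 3 -> List(96 pts: student 121, 89 pts: student 122, 76 pts: student 124), Math top 3 -> List(86 pts: student 122, 78 pts: student 123, 74 pts: student 121), English top 3 -> List(85 pts: student 122, 79 pts: student 121, 76 pts: student 124)))
(经济系,1班,Map(Chinese top 3 -> List(99 pts: student 115, 90 pts: student 113, 89 pts: student 114), Math top 3 -> List(94 pts: student 114, 93 pts: student 115, 80 pts: student 112), English top 3 -> List(96 pts: student 112, 93 pts: student 114, 90 pts: student 111)))
(外语系,2班,Map(Chinese top 3 -> List(82 pts: student 225, 80 pts: student 222, 79 pts: student 223), Math top 3 -> List(99 pts: student 221, 85 pts: student 225, 80 pts: student 224), English top 3 -> List(96 pts: student 222, 96 pts: student 223, 78 pts: student 224)))
(外语系,1班,Map(Chinese top 3 -> List(94 pts: student 214, 89 pts: student 211, 85 pts: student 216), Math top 3 -> List(94 pts: student 213, 94 pts: student 214, 93 pts: student 211), English top 3 -> List(93 pts: student 216, 90 pts: student 213, 75 pts: student 212)))
*/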
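The program above collects every record of a department in memory through groupByKey before sorting. As an alternative (not the approach used in the original post), a bounded TopK can be kept per key so that at most k entries per key are ever held or shuffled. The sketch below shows only the two combining functions in plain Scala; the names TopKSketch, Entry, insert and merge are hypothetical. They have the shapes that Spark's aggregateByKey(zeroValue)(seqOp, combOp) expects, and the Spark call itself appears only as a comment and was not run here.

```scala
object TopKSketch {
  // (score, studentId)
  type Entry = (Int, String)

  // seqOp: add one entry to a partial result, keeping at most k entries
  // in descending score order. sortBy is stable, so among equal scores
  // the entry that was seen first stays first.
  def insert(k: Int)(acc: List[Entry], e: Entry): List[Entry] =
    (acc :+ e).sortBy(entry => -entry._1).take(k)

  // combOp: merge two partial results into one of at most k entries.
  def merge(k: Int)(a: List[Entry], b: List[Entry]): List[Entry] =
    (a ++ b).sortBy(entry => -entry._1).take(k)

  // Assumed Spark usage, with pairs: RDD[((String, String), Entry)]
  // keyed by (departmentId, classId):
  //   pairs.aggregateByKey(List.empty[Entry])(insert(3), merge(3))
}

// Local check with the math scores of 外语系 1班 from the sample data.
val mathScores: List[TopKSketch.Entry] =
  List((93, "211"), (83, "212"), (94, "213"), (94, "214"), (82, "215"), (74, "216"))

val top3: List[TopKSketch.Entry] =
  mathScores.foldLeft(List.empty[TopKSketch.Entry])(TopKSketch.insert(3))
```

For this input, top3 holds students 213, 214 and 211, which agrees with the math ranking printed for that class above. On a real cluster the order among tied scores can differ, because partial results from different partitions are merged in no fixed order.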

0 0