Converting between Spark RDDs and DataFrames

Source: Internet  Editor: 程序博客网  Time: 2024/05/16 07:59

RDD -> DF

 

There are two ways to do this.

1. Inferring the Schema Using Reflection

 

Map the RDD[T] to an RDD of objects (case class instances), then call toDF() on it.

 

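// Needed for the implicit .toDF() conversion on RDDs
import spark.implicits._

// Person is assumed to be defined beforehand, e.g. case class Person(name: String, age: Int)
val peopleDF = spark.sparkContext
  .textFile("examples/src/main/resources/people.txt")
  .map(_.split(","))
  .map(attributes => Person(attributes(0), attributes(1).trim.toInt))
  .toDF()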
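The reflection-based conversion can also be tried without an input file. The following is a minimal sketch, not the original author's code: the Person case class, the application name, the local master setting, and the in-memory sample lines are all assumptions made for illustration.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical case class; its field names become the DataFrame column names
case class Person(name: String, age: Int)

// Assumed local session, for experimentation only
val spark = SparkSession.builder()
  .appName("rdd-to-df-reflection")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// In-memory stand-in for the lines of people.txt (assumed sample data)
val lines = spark.sparkContext.parallelize(Seq("Michael, 29", "Andy, 30", "Justin, 19"))

// Parse each line into a Person, then let Spark infer the schema from the case class
val peopleDF = lines
  .map(_.split(","))
  .map(a => Person(a(0), a(1).trim.toInt))
  .toDF()

peopleDF.printSchema()
```

The schema is derived from the case class fields, so renaming a field in Person renames the corresponding column.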

 

 

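An RDD can also be converted directly to a Dataset. This requires importing the implicit conversions: import spark.implicits._

If the elements being converted are tuples, the resulting columns are named _1, _2, and so on.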
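The tuple naming behaviour can be checked with a small sketch. The session settings and the sample pairs below are assumptions for illustration; toDF with explicit names is shown as one way to avoid the positional _1, _2 column names.

```scala
import org.apache.spark.sql.SparkSession

// Assumed local session, for experimentation only
val spark = SparkSession.builder()
  .appName("tuple-columns")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Assumed sample data: an RDD of (String, Int) tuples
val pairs = spark.sparkContext.parallelize(Seq(("Michael", 29), ("Andy", 30)))

// Without arguments, tuple positions become the column names _1 and _2
val defaultDF = pairs.toDF()

// Passing names to toDF replaces the positional names
val namedDF = pairs.toDF("name", "age")

// toDS keeps the element type, giving a Dataset[(String, Int)]
val ds = pairs.toDS()
```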

 

 

 

2. Programmatically Specifying the Schema

 

Combine the column metadata (a StructType schema) with an RDD of Rows by passing both to createDataFrame.

 

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val peopleRDD = spark.sparkContext.textFile("examples/src/main/resources/people.txt")

// The schema is encoded in a string
val schemaString = "name age"

// Generate the schema based on the string of schema
val fields = schemaString.split(" ")
  .map(fieldName => StructField(fieldName, StringType, nullable = true))
val schema = StructType(fields)

// Convert each line of the RDD into a Row
val rowRDD = peopleRDD
  .map(_.split(","))
  .map(attributes => Row(attributes(0), attributes(1).trim))

// Apply the schema to the RDD
val peopleDF = spark.createDataFrame(rowRDD, schema)

// Creates a temporary view using the DataFrame
peopleDF.createOrReplaceTempView("people")
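The example above declares every column as StringType. As a hedged variation, the sketch below declares age as IntegerType and queries the temporary view. The session settings, the sample lines, and the SQL query are assumptions for illustration. Note that the values placed in each Row have to match the declared types, so age is parsed to an Int before the Row is built.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Assumed local session, for experimentation only
val spark = SparkSession.builder()
  .appName("explicit-schema")
  .master("local[*]")
  .getOrCreate()

// In-memory stand-in for the lines of people.txt (assumed sample data)
val lines = spark.sparkContext.parallelize(Seq("Michael, 29", "Andy, 30", "Justin, 19"))

// Column metadata with a distinct type per column
val schema = StructType(Seq(
  StructField("name", StringType, nullable = true),
  StructField("age", IntegerType, nullable = true)
))

// Row values must agree with the schema, so age is converted to Int here
val rowRDD = lines
  .map(_.split(","))
  .map(a => Row(a(0), a(1).trim.toInt))

val peopleDF = spark.createDataFrame(rowRDD, schema)
peopleDF.createOrReplaceTempView("people")

// Query the view with SQL; the numeric comparison relies on age being an integer column
val teenagersDF = spark.sql("SELECT name FROM people WHERE age BETWEEN 13 AND 19")
teenagersDF.show()
```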

 

 

 

 

 

 

DF -> RDD

 

// teenagersDF is any existing DataFrame; .rdd returns its rows as an RDD[Row]
val tt = teenagersDF.rdd
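Calling .rdd on a DataFrame yields an RDD[Row], so fields have to be read back out of each Row. The sketch below shows this, along with a typed alternative that goes through a Dataset first. The session settings and the sample DataFrame are assumptions for illustration.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SparkSession}

// Assumed local session, for experimentation only
val spark = SparkSession.builder()
  .appName("df-to-rdd")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Assumed sample DataFrame with columns name (String) and age (Int)
val df = Seq(("Justin", 19), ("Andy", 30)).toDF("name", "age")

// A DataFrame converts to an RDD of generic Row objects
val rowRDD: RDD[Row] = df.rdd

// Fields are read from each Row by column name and expected type
val names: RDD[String] = rowRDD.map(row => row.getAs[String]("name"))

// Converting to a typed Dataset first gives an RDD of tuples instead of Rows
// (for tuple types, columns are mapped by position)
val typedRDD: RDD[(String, Int)] = df.as[(String, Int)].rdd
```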

