Handling null values returned by the DataFrame getAs method in Spark

Source: Internet  Editor: 程序博客网  Time: 2024/06/05 21:10


Here are the two cases I ran into.

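val DF = hc.sql("...............")

val rdd = DF.rdd.map { row =>
  // "age" is a nullable integer column
  val label = row.getAs[Int]("age")
  // return the value so the resulting RDD is not an RDD[Unit]
  label
}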

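1. With getAs[Integer]("age"), a null value comes back as null, because Integer is a reference type and can hold null.

2. With getAs[Int]("age"), label = 0 (I had expected this to throw an error). Int is a primitive type and cannot hold null, so the null is turned into the zero value of the type.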
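The difference between the two cases can be reproduced without Spark. The sketch below is only an illustration: `getAs` here is a hypothetical stand-in of my own, not Spark's method, and it assumes (as far as I can tell from the 1.6 source) that Row.getAs boils down to a generic `asInstanceOf[T]` cast. When the result is used as an Int, Scala unboxes the null to 0; when it is used as an Integer, it stays a null reference.

```scala
object NullUnboxingSketch {
  // Hypothetical stand-in for Row.getAs: a generic cast of an untyped value.
  // Because of type erasure the cast itself does nothing at runtime.
  def getAs[T](value: Any): T = value.asInstanceOf[T]

  def main(args: Array[String]): Unit = {
    // The null is unboxed at the call site, which yields the zero value.
    val asInt: Int = getAs[Int](null)
    // java.lang.Integer is a reference type, so null is preserved.
    val asInteger: Integer = getAs[Integer](null)

    assert(asInt == 0)
    assert(asInteger == null)
    println(s"Int: $asInt, Integer: $asInteger")
  }
}
```

So no exception is thrown in the Int case: the 0 is indistinguishable from a real age of 0 unless the null is checked separately.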

Source code (Spark 1.6):

  /**
   * Returns the value of a given fieldName.
   * For primitive types if value is null it returns 'zero value' specific for primitive
   * ie. 0 for Int - use isNullAt to ensure that value is not null
   *
   * @throws UnsupportedOperationException when schema is not defined.
   * @throws IllegalArgumentException when fieldName do not exist.
   * @throws ClassCastException when data type does not match.
   */
  def getAs[T](fieldName: String): T = getAs[T](fieldIndex(fieldName))
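The Scaladoc suggests using isNullAt to make sure the value is not null. A minimal sketch of that advice follows, assuming Spark is on the classpath; `SafeGetAs` and `getOption` are hypothetical names of my own, not part of Spark. It only combines the Row methods `fieldIndex`, `isNullAt` and `getAs`.

```scala
import org.apache.spark.sql.Row

object SafeGetAs {
  // Hypothetical helper: returns None when the field is null,
  // otherwise wraps the value in Some.
  def getOption[T](row: Row, fieldName: String): Option[T] = {
    // fieldIndex throws if the row has no schema or the field does not exist
    val i = row.fieldIndex(fieldName)
    if (row.isNullAt(i)) None else Some(row.getAs[T](i))
  }
}
```

Used inside the map from the example above, this would read something like `SafeGetAs.getOption[Int](row, "age")`, which makes a missing age (None) distinguishable from an age of 0 (Some(0)).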

Suggestion: if null is not data you want, filter it out at the SQL stage.
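In the SQL itself this usually means adding a condition such as `WHERE age IS NOT NULL`. If the query cannot be changed, the same filtering can be done on the DataFrame. The sketch below assumes a DataFrame with a nullable "age" column, as in the example above; `DropNullAges` and its method names are hypothetical, while `filter`, `isNotNull` and `na.drop` are the DataFrame calls doing the work.

```scala
import org.apache.spark.sql.DataFrame

object DropNullAges {
  // Keep only the rows whose "age" column is not null.
  def viaFilter(df: DataFrame): DataFrame =
    df.filter(df("age").isNotNull)

  // Same effect using the null-handling functions: drop rows
  // that contain a null in any of the listed columns.
  def viaNaDrop(df: DataFrame): DataFrame =
    df.na.drop(Seq("age"))
}
```

Either way, the rows reaching getAs[Int]("age") no longer contain null, so a label of 0 really means an age of 0.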

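A supplementary note on member variables and local variables in Java

Relationship and differences between member variables and local variables:
a) Both member variables and local variables must be declared (defined) before use.
b) A local variable must be initialized before use; a member variable need not be. If a member variable is used without being initialized, it takes the default initial value for its type:
    i. byte, short, int, long: the initial value is 0
    ii. float, double: the initial value is 0.0
    iii. char: the initial value is '\u0000'
    iv. boolean: the initial value is false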
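The defaults listed above can be checked from Scala as well, since Scala fields compile to ordinary JVM fields. This is a small sketch assuming Scala 2 syntax, where `var x: T = _` leaves a field at its default value (Scala 3 prefers `scala.compiletime.uninitialized` for this); the class and field names are made up for illustration.

```scala
object DefaultValuesSketch {
  // Fields initialized with `_` receive the JVM default for their type.
  class Defaults {
    var b: Byte = _
    var i: Int = _
    var l: Long = _
    var d: Double = _
    var c: Char = _
    var flag: Boolean = _
    var s: String = _ // reference types default to null
  }

  def main(args: Array[String]): Unit = {
    val x = new Defaults
    assert(x.b == 0 && x.i == 0 && x.l == 0L)
    assert(x.d == 0.0)
    assert(x.c == '\u0000')
    assert(!x.flag)
    assert(x.s == null)
    println("all defaults match")
  }
}
```

These are the same zero values that getAs falls back to for primitive types when the stored value is null.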
