Day 61: Spark SQL Data Loading and Saving Internals, Deep Dive and Hands-On
Loading Data in Spark SQL

Spark SQL input and output center on the DataFrame, which provides generic load and save operations.
load creates a DataFrame from a data source; save writes a DataFrame out to a file. In both directions you can name the concrete format to read or to write, or read a file of a supported type directly:
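The division of labor can be sketched outside Spark with a tiny format registry: load dispatches on a format name to produce data, and save dispatches on the same name to write it. This is a hypothetical pure-Python illustration of the pattern (all names here are invented), not Spark's actual API:

```python
import json

def _read_json(path):
    with open(path) as f:
        return json.load(f)

def _write_json(data, path):
    with open(path, "w") as f:
        json.dump(data, f)

# Hypothetical registry: format name -> (reader, writer),
# mimicking how DataFrameReader/DataFrameWriter dispatch on format().
FORMATS = {"json": (_read_json, _write_json)}

def load(path, fmt="json"):
    reader, _ = FORMATS[fmt]   # pick the data source by format name
    return reader(path)

def save(data, path, fmt="json"):
    _, writer = FORMATS[fmt]   # pick the sink by format name
    writer(data, path)
```

Registering a new (reader, writer) pair under a new key is all it takes to support another format, which is the same extension point Spark exposes through format(source).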
SQLContext source code:
The load and save methods:
@deprecated("Use read.load(path). This will be removed in Spark 2.0.", "1.4.0")
def load(path: String): DataFrame = {
  read.load(path)
}

/**
 * Returns the dataset stored at path as a DataFrame, using the given data source.
 *
 * @group genericdata
 * @deprecated As of 1.4.0, replaced by `read().format(source).load(path)`.
 *             This will be removed in Spark 2.0.
 */
@deprecated("Use read.format(source).load(path). This will be removed in Spark 2.0.", "1.4.0")
def load(path: String, source: String): DataFrame = {
  read.format(source).load(path)
}
DataFrameReader source code:
/**
 * Specifies the input data source format.
 *
 * @since 1.4.0
 */
def format(source: String): DataFrameReader = {
  this.source = source
  this
}

/**
 * Loads input in as a [[DataFrame]], for data sources that don't require a path (e.g. external
 * key-value stores).
 *
 * @since 1.4.0
 */
def load(): DataFrame = {
  val resolved = ResolvedDataSource(
    sqlContext,
    userSpecifiedSchema = userSpecifiedSchema,
    partitionColumns = Array.empty[String],
    provider = source,
    options = extraOptions.toMap)
  DataFrame(sqlContext, LogicalRelation(resolved.relation))
}
ResolvedDataSource源码
objectResolvedDataSource extendsLogging {
/** A map to maintain backward compatibility in case wemove data sources around. */
private val backwardCompatibilityMap= Map(
"org.apache.spark.sql.jdbc" ->classOf[jdbc.DefaultSource].getCanonicalName,
"org.apache.spark.sql.jdbc.DefaultSource" -> classOf[jdbc.DefaultSource].getCanonicalName,
"org.apache.spark.sql.json" ->classOf[json.DefaultSource].getCanonicalName,
"org.apache.spark.sql.json.DefaultSource" -> classOf[json.DefaultSource].getCanonicalName,
"org.apache.spark.sql.parquet" ->classOf[parquet.DefaultSource].getCanonicalName,
"org.apache.spark.sql.parquet.DefaultSource"-> classOf[parquet.DefaultSource].getCanonicalName
)
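The map above lets old short provider names keep working after the data source packages moved. The lookup rule is simple: rewrite the provider name if it is a known legacy alias, otherwise use it as given. A minimal Python model of that resolution (the canonical class names below are quoted from the map; the function name is invented for illustration):

```python
# Illustrative model of ResolvedDataSource's backward-compatibility lookup:
# legacy short provider names resolve to the canonical DefaultSource class name.
BACKWARD_COMPATIBILITY_MAP = {
    "org.apache.spark.sql.jdbc": "org.apache.spark.sql.jdbc.DefaultSource",
    "org.apache.spark.sql.json": "org.apache.spark.sql.json.DefaultSource",
    "org.apache.spark.sql.parquet": "org.apache.spark.sql.parquet.DefaultSource",
}

def lookup_data_source(provider):
    # Known alias -> rewrite; anything else (e.g. a third-party source) passes through.
    return BACKWARD_COMPATIBILITY_MAP.get(provider, provider)
```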
Formats that can be read directly through these built-in sources: jdbc, json, parquet.
def apply(
    sqlContext: SQLContext,
    provider: String,
    partitionColumns: Array[String],
    mode: SaveMode,
    options: Map[String, String],
    data: DataFrame): ResolvedDataSource = {
DataFrameWriter source code:
/**
 * Specifies the behavior when data or table already exists. Options include:
 *   - `SaveMode.Overwrite`: overwrite the existing data.
 *   - `SaveMode.Append`: append the data.
 *   - `SaveMode.Ignore`: ignore the operation (i.e. no-op).
 *   - `SaveMode.ErrorIfExists`: default option, throw an exception at runtime.
 *
 * @since 1.4.0
 */
def mode(saveMode: SaveMode): DataFrameWriter = {
  this.mode = saveMode
  this
}
A complete Java example (note that the SparkConf must be passed to the JavaSparkContext constructor, or the local master setting is never used):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

/**
 * Created: 2016-05-08 07:54:28
 * Loads a JSON file into a DataFrame, then saves the "name" column back out as JSON.
 */
public class SparkSQLLoadSaveOps {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf().setMaster("local").setAppName("rdd2d");
    JavaSparkContext sc = new JavaSparkContext(conf);
    SQLContext sqlContext = new SQLContext(sc);
    DataFrame peopleDF = sqlContext.read().format("json").load("D://person.json");
    peopleDF.select("name").write().format("json").save("D://logs//personName.json");
    sc.stop();
  }
}
Append behavior: whether save creates new output or appends to existing output is controlled by the SaveMode passed to mode().
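The four SaveMode options can be modeled against a plain file. A pure-Python sketch (not Spark code; function and mode-string names are invented) of the decision save makes when the target already exists, with ErrorIfExists as the default:

```python
import os

def save_text(path, text, mode="error_if_exists"):
    """Model of SaveMode semantics: overwrite / append / ignore / error_if_exists."""
    if os.path.exists(path):
        if mode == "error_if_exists":   # default: refuse to clobber existing data
            raise FileExistsError(path)
        if mode == "ignore":            # no-op when the target already exists
            return
    # "append" adds to existing output; "overwrite" (and fresh writes) replace it
    with open(path, "a" if mode == "append" else "w") as f:
        f.write(text)
```

In real Spark code the equivalent choice is df.write().mode(SaveMode.Append).save(path) versus the default ErrorIfExists behavior.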