Spark SQL Summary
Source: Internet | Editor: 程序博客网 | Date: 2024/05/19 15:40
1. Create a JavaSparkContext:
SparkConf conf = new SparkConf().setAppName(appName).setMaster(master);
JavaSparkContext sc = new JavaSparkContext(conf);
2. Create an RDD:
(1) parallelize a Collection:
List<Integer> data = Arrays.asList(1, 2, 3, 4, 5);
JavaRDD<Integer> distData = sc.parallelize(data);
(2) JavaRDD<String> distFile = sc.textFile("data.txt");
Reads data.txt, producing one RDD element per line.
3. RDD methods. Reference: http://blog.csdn.net/lxxc11/article/details/51333088
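The link above covers the RDD API in detail. As a quick illustration, RDD methods fall into two groups: transformations (lazy, return a new RDD) and actions (trigger computation and return a result). A minimal sketch, assuming the `sc` from step 1:

```java
// Transformations are lazy; nothing executes until an action is called.
JavaRDD<Integer> nums = sc.parallelize(Arrays.asList(1, 2, 3, 4, 5));

// Transformation: build a new RDD of doubled values.
JavaRDD<Integer> doubled = nums.map(new Function<Integer, Integer>() {
    public Integer call(Integer x) { return x * 2; }
});

// Action: reduce the RDD to a single value on the driver (1+2+3+4+5 = 15).
Integer sum = nums.reduce(new Function2<Integer, Integer, Integer>() {
    public Integer call(Integer a, Integer b) { return a + b; }
});
```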
4. Create a SQLContext:
JavaSparkContext sc = ...; // An existing JavaSparkContext.
SQLContext sqlContext = new org.apache.spark.sql.SQLContext(sc);
5. Create a DataFrame:
(1) From a JSON file:
DataFrame df = sqlContext.jsonFile("examples/src/main/resources/people.json");
(2) From a SQL query over a registered table:
DataFrame df = sqlContext.sql("SELECT * FROM some_table"); // "some_table" must already be registered
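Note that `sqlContext.sql(...)` can only see tables that have been registered with the SQLContext, so in practice a DataFrame is registered first. A minimal sketch (the table name `people` is illustrative):

```java
DataFrame df = sqlContext.jsonFile("examples/src/main/resources/people.json");
df.registerTempTable("people");  // make the DataFrame queryable by name
DataFrame adults = sqlContext.sql("SELECT name FROM people WHERE age >= 18");
```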
6. Convert an RDD to a DataFrame:
(1) By reflection, from a JavaBean class (Person needs getters/setters):
JavaRDD<Person> people = sc.textFile("examples/src/main/resources/people.txt").map(
    new Function<String, Person>() {
        public Person call(String line) throws Exception {
            String[] parts = line.split(",");
            Person person = new Person();
            person.setName(parts[0]);
            person.setAge(Integer.parseInt(parts[1].trim()));
            return person;
        }
    });
DataFrame schemaPeople = sqlContext.createDataFrame(people, Person.class);
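The resulting DataFrame can then be registered and queried with SQL, mapping rows back to Java values; this follows the same Spark 1.3-era Java API as the snippet above:

```java
schemaPeople.registerTempTable("people");
DataFrame teenagers = sqlContext.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19");
// Convert each result Row back to a String on the driver.
List<String> names = teenagers.javaRDD().map(new Function<Row, String>() {
    public String call(Row row) { return "Name: " + row.getString(0); }
}).collect();
```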
(2) Programmatically, by building the schema at runtime (the Spark 1.3+ Java API uses DataTypes and RowFactory):
JavaRDD<String> people = sc.textFile("examples/src/main/resources/people.txt");
// The schema is encoded in a string
String schemaString = "name age";
// Generate the schema from the string
List<StructField> fields = new ArrayList<StructField>();
for (String fieldName : schemaString.split(" ")) {
    fields.add(DataTypes.createStructField(fieldName, DataTypes.StringType, true));
}
StructType schema = DataTypes.createStructType(fields);
// Convert records of the RDD (people) to Rows
JavaRDD<Row> rowRDD = people.map(
    new Function<String, Row>() {
        public Row call(String record) throws Exception {
            String[] parts = record.split(",");
            return RowFactory.create(parts[0], parts[1].trim());
        }
    });
// Apply the schema to the RDD
DataFrame peopleDataFrame = sqlContext.createDataFrame(rowRDD, schema);
(3) From an RDD of JSON strings:
List<String> jsonData = Arrays.asList("{\"name\":\"Yin\",\"address\":{\"city\":\"Columbus\",\"state\":\"Ohio\"}}");
JavaRDD<String> anotherPeopleRDD = sc.parallelize(jsonData);
DataFrame anotherPeople = sqlContext.jsonRDD(anotherPeopleRDD);
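Because this JSON is nested, the inferred schema contains a struct column (`address`), and nested fields can be reached with dot notation. A sketch; the table name `json_people` is illustrative:

```java
anotherPeople.printSchema();  // shows name plus the nested address struct
anotherPeople.registerTempTable("json_people");
DataFrame cities = sqlContext.sql("SELECT address.city FROM json_people");
```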