Converting an RDD to a DataFrame via reflection (RDD2DataFrame)
Source: Internet  Editor: 程序博客网  Time: 2024/05/22 14:32
Java version:
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SQLContext;
import java.util.List;
/**
* Created by rong on 2016/3/19.
*/
public class RDD2DataFrame {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("RDD2DataFrame").setMaster("local");
        JavaSparkContext sc = new JavaSparkContext(conf);
        JavaRDD<String> testFile = sc.textFile("C://Users//rong//Desktop//persons.txt");
        SQLContext sqlContext = new SQLContext(sc);

        // Parse each "id,name,age" line into a Person bean.
        JavaRDD<Person> persons = testFile.map(new Function<String, Person>() {
            public Person call(String line) throws Exception {
                String[] str = line.split(",");
                Person p = new Person();
                p.setId(Integer.valueOf(str[0]));
                p.setName(str[1]);
                p.setAge(Integer.valueOf(str[2]));
                return p;
            }
        });

        // Use reflection on Person.class to build the DataFrame.
        DataFrame df = sqlContext.createDataFrame(persons, Person.class);
        df.registerTempTable("persons");
        df.show(); // equivalent to: select * from persons
        df.select(df.col("name")).show(); // equivalent to: select name from persons
        df.select(df.col("id"), df.col("name")).show(); // equivalent to: select id, name from persons
        df.filter(df.col("age").gt(6)).show(); // equivalent to: select * from persons where age > 6

        DataFrame dfs = sqlContext.sql("select * from persons where age >= 6");

        // Convert the query result back into an RDD of Person objects.
        JavaRDD<Row> row = dfs.javaRDD();
        JavaRDD<Person> personRdd = row.map(new Function<Row, Person>() {
            public Person call(Row row) throws Exception {
                // The indices below rely on the bean-derived columns being
                // ordered by field name: age (0), id (1), name (2).
                Person p = new Person();
                p.setId(row.getInt(1));
                p.setName(row.getString(2));
                p.setAge(row.getInt(0));
                return p;
            }
        });

        List<Person> list = personRdd.collect();
        for (Person p : list) {
            System.out.println(p);
        }
    }
}
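The program above appears to assume that every line of persons.txt holds three comma-separated fields in the order id, name, age; the file itself is not shown in the post. As a minimal, Spark-free sketch of that parsing step, the following uses made-up sample lines and a hypothetical helper named parseLine (neither comes from the original post) that also trims whitespace and skips malformed rows:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PersonLineParserSketch {

    // Hypothetical helper (not from the original post): parses "id,name,age"
    // into {id, name, age}, or returns null when the line is malformed.
    public static Object[] parseLine(String line) {
        String[] parts = line.split(",");
        if (parts.length != 3) {
            return null;
        }
        try {
            int id = Integer.parseInt(parts[0].trim());
            String name = parts[1].trim();
            int age = Integer.parseInt(parts[2].trim());
            return new Object[] {id, name, age};
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Assumed sample content of persons.txt; the real file is not shown.
        List<String> lines = Arrays.asList("1,Tom,5", "2, Jerry ,7", "bad line");
        List<Object[]> rows = new ArrayList<>();
        for (String line : lines) {
            Object[] row = parseLine(line);
            if (row != null) {
                rows.add(row);
            }
        }
        for (Object[] row : rows) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

In the Spark job the same checks could sit inside the Function passed to map, combined with a filter that drops the rejected lines, so that a single bad row does not fail the whole job.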
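Person class:
The Person class must implement the Serializable interface and must be public. It also needs getter and setter methods that follow the JavaBean conventions, because the reflection step reads the schema from them.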
import java.io.Serializable;
/**
* Created by rong on 2016/3/19.
*/
public class Person implements Serializable {
    private int id;
    private String name;
    private int age;

    public int getId() {
        return id;
    }

    public void setId(int id) {
        this.id = id;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public int getAge() {
        return age;
    }

    public void setAge(int age) {
        this.age = age;
    }

    @Override
    public String toString() {
        return "Person{" + "id=" + id + ", name='" + name + '\'' + ", age=" + age + '}';
    }
}
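To get a feel for what the reflection step has to work with, the sketch below inspects a Person-like bean with the JDK's java.beans.Introspector and lists the properties it finds. This is only an illustration of JavaBean reflection: the class BeanSchemaSketch and the method describe are hypothetical names, and Spark's own schema inference may differ in detail between versions.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class BeanSchemaSketch {

    // Stand-in for the Person bean above, nested here to keep the sketch self-contained.
    public static class Person implements Serializable {
        private int id;
        private String name;
        private int age;

        public int getId() { return id; }
        public void setId(int id) { this.id = id; }
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public int getAge() { return age; }
        public void setAge(int age) { this.age = age; }
    }

    // Hypothetical helper: returns "property: type" for every property with a getter.
    public static List<String> describe(Class<?> beanClass) throws IntrospectionException {
        // Passing Object.class as the stop class leaves out the inherited "class" property.
        BeanInfo info = Introspector.getBeanInfo(beanClass, Object.class);
        List<String> columns = new ArrayList<>();
        for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
            if (pd.getReadMethod() != null) {
                columns.add(pd.getName() + ": " + pd.getPropertyType().getSimpleName());
            }
        }
        return columns;
    }

    public static void main(String[] args) throws IntrospectionException {
        for (String column : describe(Person.class)) {
            System.out.println(column);
        }
    }
}
```

The properties are discovered from the getter and setter methods rather than from the private fields, which is why a bean without accessors would produce no columns. The order in which they are listed is not the declaration order, so reading a Row by column name is less fragile than reading it by position.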
Scala version:
import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkConf, SparkContext}
/**
* Created by rong on 2016/3/19.
*/
object RDD2DataFrameByScala {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("RDD2DataFrameByScala").setMaster("local")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    // Brings the implicit toDF conversion for RDDs of case classes into scope.
    import sqlContext.implicits._
    val df = sc.textFile("C://Users//rong//Desktop//persons.txt")
      .map(line => line.split(","))
      .map(fields => Person(fields(0).toInt, fields(1), fields(2).toInt))
      .toDF()
    df.show()
  }
}

// Defined at the top level, outside main, so that the schema can be derived from it by reflection.
case class Person(id: Int, name: String, age: Int)