Creating a Spark Application and Running It on a Cluster


1. Write the Spark application
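The code below targets the Spark 1.6-era Java API (matching the spark-1.6.0 path used in the submit script later). For reference, the snippets rely on the following imports:

import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.api.java.function.VoidFunction;

import scala.Tuple2;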

1) Create a SparkConf and set the Spark application's configuration, including the URL of the cluster's master node. "local" means run locally; to run on a cluster, simply remove the .setMaster call (the master is then supplied by spark-submit).

SparkConf conf = new SparkConf().setAppName("map"); // add .setMaster("local") to run locally

2) Create a JavaSparkContext object

JavaSparkContext sc = new JavaSparkContext(conf);

3) Create the initial RDD. There are two ways: read an external dataset, or parallelize an existing collection in the driver program.

JavaRDD<String> lines = sc.textFile("/usr/soft/words");
or

// Build a collection
List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5);
// Parallelize the collection to create the initial RDD
JavaRDD<Integer> numberRDD = sc.parallelize(numbers);

4) Apply transformation operations to the initial RDD

// Split each line into individual words
JavaRDD<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
    private static final long serialVersionUID = 1L;
    public Iterable<String> call(String line) throws Exception {
        return Arrays.asList(line.split(" "));
    }
});

// Map each word to a (word, 1) pair
JavaPairRDD<String, Integer> pairs = words.mapToPair(new PairFunction<String, String, Integer>() {
    private static final long serialVersionUID = 1L;
    public Tuple2<String, Integer> call(String word) throws Exception {
        return new Tuple2<String, Integer>(word, 1);
    }
});

// Sum the counts for each key (word)
JavaPairRDD<String, Integer> wordCounts = pairs.reduceByKey(new Function2<Integer, Integer, Integer>() {
    private static final long serialVersionUID = 1L;
    public Integer call(Integer v1, Integer v2) throws Exception {
        return v1 + v2;
    }
});

// Print each (word, count) result
wordCounts.foreach(new VoidFunction<Tuple2<String, Integer>>() {
    private static final long serialVersionUID = 1L;
    public void call(Tuple2<String, Integer> wordCount) throws Exception {
        System.out.println(wordCount._1 + " appeared " + wordCount._2 + " times.");
    }
});

sc.close();
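Note that foreach is an action, not a transformation: it is what actually triggers the computation, and on a cluster the println output goes to the executors' stdout rather than the driver console. To persist the results instead, an action such as saveAsTextFile can be used; the output path below is only an example:

// Save the word counts as text files under the given directory (example path)
wordCounts.saveAsTextFile("/usr/soft/wordcount-output");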

2. Package the Spark project with the Maven plugin, and upload the resulting jar to the cluster.

Right-click the project → Run As → Run Configurations → Maven Build → right-click New → fill in the Name, Goals, and other fields.
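Equivalently, if Maven is installed locally, the project can be packaged from the command line without Eclipse (assuming a standard Maven layout; the jar ends up under target/):

mvn clean package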

3. Write the spark-submit script

/usr/soft/spark-1.6.0/bin/spark-submit \
--class ###(package name).###(class name) \
--num-executors 3 \
--driver-memory 100m \
--executor-memory 100m \
--executor-cores 3 \
/usr/soft/####.jar
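Since .setMaster was removed from the application code, the master is supplied here by spark-submit. On a YARN cluster, for example, you would add a flag such as the following to the script (Spark 1.6 syntax; the exact value depends on your cluster):

--master yarn-cluster \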
By default the script cannot be executed because its permissions are insufficient, so you need to grant the file execute permission:

chmod 777 ###.sh

(chmod +x ###.sh, which only adds execute permission, is also sufficient and less permissive.)
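Once the permission is set, the job can be launched directly:

./###.sh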




