Remote debugging Spark from IntelliJ IDEA

Source: Internet  Published: 高斯面膜 知乎  Editor: 程序博客网  Time: 2024/06/05 00:57

For a tutorial on setting up a Spark cluster, see this post on cnblogs (博客园): http://www.cnblogs.com/purstar/p/6293605.html

Create a Maven project. For the pom configuration, follow the example on the official Spark website.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

public class Test {
    public static void main(String[] args) {
        // Input file on the cluster's HDFS
        String logFile = "hdfs://master:9000/Hadoop/Input/README.md";
        SparkConf conf = new SparkConf()
                .setAppName("Simple Application")
                // Standalone master of the remote cluster
                .setMaster("spark://master:7077")
                // Set the path of the application jar so the executors can load
                // this class; otherwise they throw ClassNotFoundException
                .setJars(new String[]{"D:\\repo\\org\\deng\\SparkTest\\1.0-SNAPSHOT\\SparkTest-1.0-SNAPSHOT.jar"});
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Read with at least 2 partitions and cache, since the RDD is scanned twice
        JavaRDD<String> logData = sc.textFile(logFile, 2).cache();

        long numAs = logData.filter(new Function<String, Boolean>() {
            public Boolean call(String s) { return s.contains("a"); }
        }).count();

        long numBs = logData.filter(new Function<String, Boolean>() {
            public Boolean call(String s) { return s.contains("b"); }
        }).count();

        System.out.println("Lines with a: " + numAs + ", lines with b: " + numBs);
        sc.stop();
    }
}
```
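The `setJars` comment matters because a wrong jar path may not stop the driver when the context is created; the problem can show up only later, as a `ClassNotFoundException` on the executors. A minimal sketch of a fail-fast check, assuming the jar is built locally first (for example with `mvn install`, which puts it under the local repository path used above). `JarCheck` and `requireJars` are hypothetical names, not part of Spark.

```java
import java.nio.file.Files;
import java.nio.file.Paths;

public class JarCheck {
    // Hypothetical helper: throws if any jar intended for SparkConf.setJars()
    // is missing on the local disk, and otherwise returns the paths unchanged
    static String[] requireJars(String... paths) {
        for (String p : paths) {
            if (!Files.isRegularFile(Paths.get(p))) {
                throw new IllegalStateException(
                        "Application jar not found: " + p + " (build it with mvn package/install first)");
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        // Check every path given on the command line
        String[] jars = requireJars(args);
        System.out.println("All " + jars.length + " jar(s) found");
    }
}
```

In `Test.main` it would wrap the existing argument, e.g. `.setJars(JarCheck.requireJars("D:\\repo\\org\\deng\\SparkTest\\1.0-SNAPSHOT\\SparkTest-1.0-SNAPSHOT.jar"))`, so a stale or missing build is reported on the driver before any job is submitted.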
Then simply run the main method from IDEA.
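To sanity-check the numbers the cluster prints, the same two counts can be computed locally with plain Java streams, without Spark. This is a sketch assuming a local copy of the input file (for example fetched with `hdfs dfs -get /Hadoop/Input/README.md`); `LocalLineCount` is a hypothetical class name. Both this code and `textFile` split lines on `\n`, `\r` or `\r\n`, so the counts should normally agree.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class LocalLineCount {
    // Counts lines containing the substring, like logData.filter(s -> s.contains(needle)).count()
    static long countContaining(List<String> lines, String needle) {
        return lines.stream().filter(s -> s.contains(needle)).count();
    }

    public static void main(String[] args) throws IOException {
        // Local copy of the input file; defaults to README.md in the working directory
        List<String> lines = Files.readAllLines(Paths.get(args.length > 0 ? args[0] : "README.md"));
        long numAs = countContaining(lines, "a");
        long numBs = countContaining(lines, "b");
        System.out.println("Lines with a: " + numAs + ", lines with b: " + numBs);
    }
}
```

If the local output differs from what the cluster job prints, the job is probably reading a different file than expected.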
