Spark Standalone Mode Application Development

I. Scala Version

The program is as follows:

package scala

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf

object Test {
  def main(args: Array[String]) {
    val logFile = "file:///spark-bin-0.9.1/README.md"
    val conf = new SparkConf().setAppName("Spark Application in Scala")
    val sc = new SparkContext(conf)
    // Read the file as an RDD (with a minimum of 2 partitions) and cache it,
    // since it is scanned twice below.
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
  }
}

To compile this file, you need to create an xxx.sbt file, which plays a role similar to a pom.xml file. Here we create a file named scala.sbt with the following content:

name := "Spark application in Scala"
version := "1.0"
scalaVersion := "2.10.4"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0"
resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
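
For sbt to find both the build definition and the source, the project typically follows sbt's standard layout. A minimal sketch of one possible arrangement (these paths follow sbt's default conventions and are an assumption, not taken from the original post):

./scala.sbt                  # the build definition above
./src/main/scala/Test.scala  # the Scala source above (declares package scala)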

Compile:

# sbt/sbt package
[info] Done packaging.
[success] Total time: 270 s, completed Jun 11, 2014 1:05:54 AM
II. Java Version
/**
 * User: 过往记忆
 * Date: 14-6-10
 * Time: 11:37 PM
 * Blog: https://www.iteblog.com
 * Original post: https://www.iteblog.com/archives/1041
 * The 过往记忆 blog focuses on Hadoop, Hive, Spark, Shark, and Flume, with plenty of practical content.
 * WeChat public account of the 过往记忆 blog: iteblog_hadoop
 */
/* SimpleApp.java */
import org.apache.spark.api.java.*;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.function.Function;
 
public class SimpleApp {
    public static void main(String[] args) {
        String logFile = "file:///spark-bin-0.9.1/README.md";
        SparkConf conf = new SparkConf().setAppName("Spark Application in Java");
        JavaSparkContext sc = new JavaSparkContext(conf);
        JavaRDD<String> logData = sc.textFile(logFile).cache();
 
        long numAs = logData.filter(new Function<String, Boolean>() {
            public Boolean call(String s) { return s.contains("a"); }
        }).count();
 
        long numBs = logData.filter(new Function<String, Boolean>() {
            public Boolean call(String s) { return s.contains("b"); }
        }).count();
 
        System.out.println("Lines with a: " + numAs + ", lines with b: " + numBs);
    }
}
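
In Spark 1.0 the Java function classes were converted to interfaces with a single abstract method, so with a Java 8 or newer compiler the two anonymous classes above can be replaced by lambdas. A minimal sketch of the equivalent filters (an alternative formulation, not from the original post):

// Equivalent to the anonymous Function classes above (requires Java 8+):
long numAs = logData.filter(s -> s.contains("a")).count();
long numBs = logData.filter(s -> s.contains("b")).count();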

This program counts the number of lines in README.md that contain "a" and the number that contain "b". The project's pom.xml file is as follows:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
            http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
 
    <groupId>spark</groupId>
    <artifactId>spark</artifactId>
    <version>1.0</version>
 
    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.10</artifactId>
            <version>1.0.0</version>
        </dependency>
    </dependencies>
</project>

Compile this project with Maven:

# mvn install
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 5.815s
[INFO] Finished at: Wed Jun 11 00:01:57 CST 2014
[INFO] Final Memory: 13M/32M
[INFO] ------------------------------------------------------------------------
III. Python Version
#
# User: 过往记忆
# Date: 14-6-10
# Time: 11:37 PM
# Blog: https://www.iteblog.com
# Original post: https://www.iteblog.com/archives/1041
# The 过往记忆 blog focuses on Hadoop, Hive, Spark, Shark, and Flume, with plenty of practical content.
# WeChat public account of the 过往记忆 blog: iteblog_hadoop
#
from pyspark import SparkContext
 
logFile = "file:///spark-bin-0.9.1/README.md"
sc = SparkContext("local", "Spark Application in Python")
# Read the file as an RDD and cache it, since it is scanned twice below.
logData = sc.textFile(logFile).cache()
 
numAs = logData.filter(lambda s: 'a' in s).count()
numBs = logData.filter(lambda s: 'b' in s).count()
 
print("Lines with a: %i, lines with b: %i" % (numAs, numBs))
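
Note that this version hard-codes the master as "local" in the SparkContext constructor; properties set directly in application code take precedence over spark-submit flags, so --master will not override it the way it does for the Scala and Java versions. A variant that leaves the master to spark-submit (a sketch, not from the original post):

from pyspark import SparkConf, SparkContext

# Only the application name is set here; the master comes from spark-submit.
conf = SparkConf().setAppName("Spark Application in Python")
sc = SparkContext(conf=conf)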
IV. Test Runs

The runtime environment for these programs is Spark 1.0.0 in single-machine mode. The tests are as follows:
1. Testing the Scala program

# bin/spark-submit --class "scala.Test"  \
                   --master local[4]     \
                   target/scala-2.10/simple-project_2.10-1.0.jar
 
14/06/11 01:07:53 INFO spark.SparkContext: Job finished:
count at Test.scala:18, took 0.019705 s
Lines with a: 62, Lines with b: 35

2. Testing the Java program

# bin/spark-submit --class "SimpleApp"  \
                   --master local[4]    \
                   target/spark-1.0-SNAPSHOT.jar
 
14/06/11 00:49:14 INFO spark.SparkContext: Job finished:
count at SimpleApp.java:22, took 0.019374 s
Lines with a: 62, lines with b: 35

3. Testing the Python program

# bin/spark-submit --master local[4]    \
                   simple.py
 
Lines with a: 62, lines with b: 35