spark note
SparkContext:

def createSparkContext(): SparkContext = {
  // Use the explicitly configured master if there is one; otherwise fall
  // back to the MASTER environment variable, and finally to local mode.
  val master = this.master match {
    case Some(m) => m
    case None =>
      val prop = System.getenv("MASTER")
      if (prop != null) prop else "local"
  }
  sparkContext = new SparkContext(master, "Spark shell")
}
For a client to establish a connection to the Spark cluster, the SparkContext object
needs some basic information, as follows:
master: The master URL can be in one of the following formats:
  local[n]: for local mode, running with n worker threads
  spark://[sparkip]: to point to a Spark standalone cluster
  mesos://: for a Mesos master URL if you are running a Mesos cluster
application name: This is the human-readable application name
sparkHome: This is the path to Spark on the master/worker machines
jars: This gives the paths to the JAR files required for your job
Scala
In a Scala program, you can create a SparkContext instance using the following code:
val sparkContext = new SparkContext(master_path, "application name",
  ["optional spark home path"], ["optional list of jars"])
While you can hardcode all of these values, it's better to read them from the
environment with reasonable defaults. This approach provides maximum flexibility
to run the code in a changing environment without recompiling it. Using local
as the default for the master makes it easy to launch your application in a
local test environment. By carefully choosing the defaults, you can avoid
having to over-specify them. An example would be as follows:
import spark.SparkContext
import spark.SparkContext._
import scala.util.Properties

// Read the connection settings from the environment, with sensible defaults.
val master = Properties.envOrElse("MASTER", "local")
val sparkHome = System.getenv("SPARK_HOME")   // may be null if unset
val myJars = Seq(System.getenv("JARS"))
val sparkContext = new SparkContext(master, "my app", sparkHome, myJars)
The collect() function is especially useful for testing, in much the same way as
the parallelize() function is. The collect() function only works if your data fits
in memory on a single host, and even then it introduces the bottleneck of
everything having to come back to a single machine.
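As a sketch of that testing pattern (assuming the sparkContext created above),
you can parallelize a small local collection, transform it, and collect the
result back to the driver to verify it:

// Distribute a small local collection, transform it, and bring the
// result back to the driver for verification.
val input = sparkContext.parallelize(Seq(1, 2, 3, 4))
val doubled = input.map(_ * 2).collect()
assert(doubled.sameElements(Array(2, 4, 6, 8)))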