spark-broadcast
Source: Internet | Editor: 程序博客网 | Time: 2024/06/13 03:31
@(spark)[broadcast]
Spark's broadcast variables are used to broadcast immutable datasets to all nodes.
Broadcast
    /**
     * A broadcast variable. Broadcast variables allow the programmer to keep a read-only variable
     * cached on each machine rather than shipping a copy of it with tasks. They can be used, for
     * example, to give every node a copy of a large input dataset in an efficient manner. Spark also
     * attempts to distribute broadcast variables using efficient broadcast algorithms to reduce
     * communication cost.
     *
     * Broadcast variables are created from a variable `v` by calling
     * [[org.apache.spark.SparkContext#broadcast]].
     * The broadcast variable is a wrapper around `v`, and its value can be accessed by calling the
     * `value` method. The interpreter session below shows this:
     *
     * {{{
     * scala> val broadcastVar = sc.broadcast(Array(1, 2, 3))
     * broadcastVar: org.apache.spark.broadcast.Broadcast[Array[Int]] = Broadcast(0)
     *
     * scala> broadcastVar.value
     * res0: Array[Int] = Array(1, 2, 3)
     * }}}
     *
     * After the broadcast variable is created, it should be used instead of the value `v` in any
     * functions run on the cluster so that `v` is not shipped to the nodes more than once.
     * In addition, the object `v` should not be modified after it is broadcast in order to ensure
     * that all nodes get the same value of the broadcast variable (e.g. if the variable is shipped
     * to a new node later).
     *
     * @param id A unique identifier for the broadcast variable.
     * @tparam T Type of the data contained in the broadcast variable.
     */
    abstract class Broadcast[T: ClassTag](val id: Long) extends Serializable with Logging {

    /**
     * :: DeveloperApi ::
     * An interface for all the broadcast implementations in Spark (to allow
     * multiple broadcast implementations). SparkContext uses a user-specified
     * BroadcastFactory implementation to instantiate a particular broadcast for the
     * entire Spark job.
     */
    @DeveloperApi
    trait BroadcastFactory {
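The Scaladoc for `Broadcast` says the wrapper should be used in place of `v` inside functions that run on the cluster. A minimal sketch of that pattern follows, assuming `spark-core` is on the classpath and a local master is acceptable; the object name `BroadcastLookupExample`, the helper `resolve`, and the lookup data are illustrative and not part of Spark.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object BroadcastLookupExample {
  // Resolve country codes against a lookup table that is broadcast once,
  // instead of being captured in the closure and shipped with every task.
  def resolve(sc: SparkContext, codes: Seq[String]): Array[String] = {
    val countryNames = Map("cn" -> "China", "us" -> "United States") // illustrative data
    val namesBc = sc.broadcast(countryNames)
    sc.parallelize(codes)
      .map(code => namesBc.value.getOrElse(code, "unknown")) // read through .value on the executor
      .collect()
  }

  def main(args: Array[String]): Unit = {
    // local[2] is an assumption so the sketch runs without a cluster.
    val conf = new SparkConf().setAppName("broadcast-lookup").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try println(resolve(sc, Seq("cn", "us", "fr")).mkString(", "))
    finally sc.stop()
  }
}
```

The closure references `namesBc` rather than `countryNames`, so only the small broadcast handle travels with each task.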
There are currently two implementations; the default is the latter, TorrentBroadcast.
HttpBroadcast
    /**
     * A [[org.apache.spark.broadcast.Broadcast]] implementation that uses HTTP server
     * as a broadcast mechanism. The first time a HTTP broadcast variable (sent as part of a
     * task) is deserialized in the executor, the broadcasted data is fetched from the driver
     * (through a HTTP server running at the driver) and stored in the BlockManager of the
     * executor to speed up future accesses.
     */
    private[spark] class HttpBroadcast[T: ClassTag](
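The fetch-once-then-cache behaviour described in that comment can be modelled without Spark. In the sketch below, `Driver`, `Executor`, and `HttpBroadcastSketch` are hypothetical names rather than Spark classes, and a plain mutable map stands in for each executor's BlockManager.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the driver-side HTTP server; counts how often it is hit.
final class Driver(data: Map[Long, Array[Byte]]) {
  var requests = 0
  def fetch(id: Long): Array[Byte] = { requests += 1; data(id) }
}

// Hypothetical executor whose mutable map plays the role of its BlockManager.
final class Executor(driver: Driver) {
  private val blockStore = mutable.Map.empty[Long, Array[Byte]]

  // Return the broadcast bytes, contacting the driver only on a local miss.
  def read(id: Long): Array[Byte] =
    blockStore.getOrElseUpdate(id, driver.fetch(id))
}

object HttpBroadcastSketch {
  def main(args: Array[String]): Unit = {
    val driver = new Driver(Map(0L -> "lookup-table".getBytes("UTF-8")))
    val executors = Seq.fill(3)(new Executor(driver))
    // Each executor reads the variable twice, as two tasks on the same executor would.
    for (e <- executors; _ <- 1 to 2) e.read(0L)
    println(s"driver requests: ${driver.requests}") // prints "driver requests: 3"
  }
}
```

Every executor's first read goes to the driver, so in this model the driver serves one full copy per executor.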
TorrentBroadcast
    /**
     * A BitTorrent-like implementation of [[org.apache.spark.broadcast.Broadcast]].
     *
     * The mechanism is as follows:
     *
     * The driver divides the serialized object into small chunks and
     * stores those chunks in the BlockManager of the driver.
     *
     * On each executor, the executor first attempts to fetch the object from its BlockManager. If
     * it does not exist, it then uses remote fetches to fetch the small chunks from the driver and/or
     * other executors if available. Once it gets the chunks, it puts the chunks in its own
     * BlockManager, ready for other executors to fetch from.
     *
     * This prevents the driver from being the bottleneck in sending out multiple copies of the
     * broadcast data (one per executor) as done by the [[org.apache.spark.broadcast.HttpBroadcast]].
     *
     * When initialized, TorrentBroadcast objects read SparkEnv.get.conf.
     *
     * @param obj object to broadcast
     * @param id A unique identifier for the broadcast variable.
     */
    private[spark] class TorrentBroadcast[T: ClassTag](obj: T, id: Long)
      extends Broadcast[T](id) with Logging with Serializable {
Picking a random remote node to fetch each chunk from is handled by the BlockManager.
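The chunk-and-share mechanism from the TorrentBroadcast comment can be sketched in plain Scala. This is a simplified model under stated assumptions, not Spark's implementation: `Node`, `blockify`, `unblockify`, `fetchAll`, `chunksOf`, and `simulate` are hypothetical names, Java serialization stands in for Spark's serializer, and an in-memory `holders` map stands in for the BlockManager's knowledge of which nodes hold which chunk.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import scala.collection.mutable
import scala.util.Random

object TorrentBroadcastSketch {
  // A node (driver or executor); its map stands in for the local BlockManager.
  final class Node(val name: String) {
    val chunks = mutable.Map.empty[Int, Array[Byte]]
    var served = 0 // number of chunks this node has handed to other nodes
  }

  // Java-serialize the value and cut the bytes into chunks of at most chunkSize bytes.
  def blockify(value: AnyRef, chunkSize: Int): Array[Array[Byte]] = {
    val bos = new ByteArrayOutputStream()
    val oos = new ObjectOutputStream(bos)
    oos.writeObject(value)
    oos.close()
    bos.toByteArray.grouped(chunkSize).toArray
  }

  // Concatenate the chunks in order and deserialize the original value.
  def unblockify[T](chunks: Array[Array[Byte]]): T = {
    val ois = new ObjectInputStream(new ByteArrayInputStream(Array.concat(chunks: _*)))
    try ois.readObject().asInstanceOf[T] finally ois.close()
  }

  // Fetch every missing chunk from a randomly chosen holder, then register as a holder.
  def fetchAll(self: Node, numChunks: Int,
               holders: mutable.Map[Int, mutable.Buffer[Node]], rnd: Random): Unit =
    for (i <- 0 until numChunks if !self.chunks.contains(i)) {
      val candidates = holders(i)
      val source = candidates(rnd.nextInt(candidates.size))
      self.chunks(i) = source.chunks(i)
      source.served += 1
      candidates += self // later executors may now fetch chunk i from this node
    }

  // Collect a node's chunks in index order.
  def chunksOf(node: Node, numChunks: Int): Array[Array[Byte]] =
    (0 until numChunks).map(i => node.chunks(i)).toArray

  // The driver stores all chunks; executors then fetch them one executor at a time.
  def simulate(value: AnyRef, chunkSize: Int, numExecutors: Int,
               rnd: Random): (Node, Seq[Node], Int) = {
    val driver = new Node("driver")
    val pieces = blockify(value, chunkSize)
    pieces.zipWithIndex.foreach { case (bytes, i) => driver.chunks(i) = bytes }
    val holders = mutable.Map(pieces.indices.map(i => i -> mutable.Buffer(driver)): _*)
    val executors = (1 to numExecutors).map(n => new Node(s"executor-$n"))
    executors.foreach(e => fetchAll(e, pieces.length, holders, rnd))
    (driver, executors, pieces.length)
  }

  def main(args: Array[String]): Unit = {
    val value = Vector.tabulate(1000)(i => s"row-$i")
    val (driver, executors, n) = simulate(value, chunkSize = 1024, numExecutors = 4, new Random(42))
    executors.foreach { e =>
      val ok = unblockify[Vector[String]](chunksOf(e, n)) == value
      println(s"${e.name}: rebuilt=$ok served=${e.served}")
    }
    println(s"driver served ${driver.served} of ${n * executors.size} chunk transfers")
  }
}
```

In this model the first executor can only fetch from the driver, while later executors spread their requests across every node that already holds a chunk, which is the load-sharing effect the comment describes.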