Spark join shuffle: reading the shuffle data files
Source: Internet | Editor: 程序博客网 | Date: 2024/06/05 05:13
Let's look at the classes that are called when the data files are read during a shuffle.

    // Read the partition data of this shuffle, then run the aggregate operations on it
    val blockFetcherItr = new ShuffleBlockFetcherIterator(
      context,
      blockManager.shuffleClient,
      blockManager,
      mapOutputTracker.getMapSizesByExecutorId(handle.shuffleId, startPartition, endPartition),
      // Note: we use getSizeAsMb when no suffix is provided for backwards compatibility
      SparkEnv.get.conf.getSizeAsMb("spark.reducer.maxSizeInFlight", "48m") * 1024 * 1024)
The class above, ShuffleBlockFetcherIterator, is responsible for fetching the shuffle blocks, both remote and local.
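The last constructor argument is the in-flight byte budget. The sketch below only mirrors the arithmetic, without any Spark dependency; `InFlightBudget` is a hypothetical name, and the value 48 assumes the default `"48m"` shown in the call above is in effect.

```scala
// Hypothetical helper, not part of Spark: it only reproduces the arithmetic.
object InFlightBudget {
  // With the default "48m", getSizeAsMb yields 48 (assumption: no override is configured).
  val maxSizeInFlightMb: Long = 48L

  // Same conversion as in the constructor call: megabytes to bytes.
  val maxBytesInFlight: Long = maxSizeInFlightMb * 1024 * 1024

  // splitLocalRemoteBlocks caps each remote request at one fifth of the budget.
  val targetRequestSize: Long = math.max(maxBytesInFlight / 5, 1L)
}
```

With the default setting, a single remote request therefore targets roughly 10 MB, so that up to five nodes can be fetched from in parallel.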
In the method ShuffleBlockFetcherIterator.initialize:

    private[this] def initialize(): Unit = {
      // Add a task completion callback (called in both success case and failure case) to cleanup.
      context.addTaskCompletionListener(_ => cleanup())

      // Split local and remote blocks.
      // Work out which blocks have to be fetched remotely
      val remoteRequests = splitLocalRemoteBlocks()
      // Add the remote requests into our queue in a random order
      fetchRequests ++= Utils.randomize(remoteRequests)

      // Send out initial requests for blocks, up to our maxBytesInFlight
      // Take requests off the queue and send them
      fetchUpToMaxBytes()

      val numFetches = remoteRequests.size - fetchRequests.size
      logInfo("Started " + numFetches + " remote fetches in" + Utils.getUsedTimeMs(startTime))

      // Get Local Blocks
      // Read the local blocks
      fetchLocalBlocks()
      logDebug("Got local blocks in " + Utils.getUsedTimeMs(startTime))
    }
As shown above, the fetch handles two kinds of data: remote and local. The first step is to classify the blocks to be fetched as local or remote.

    private[this] def splitLocalRemoteBlocks(): ArrayBuffer[FetchRequest] = {
      // Make remote requests at most maxBytesInFlight / 5 in length; the reason to keep them
      // smaller than maxBytesInFlight is to allow multiple, parallel fetches from up to 5
      // nodes, rather than blocking on reading output from one node.
      val targetRequestSize = math.max(maxBytesInFlight / 5, 1L)
      logDebug("maxBytesInFlight: " + maxBytesInFlight + ", targetRequestSize: " + targetRequestSize)

      // Split local and remote blocks. Remote blocks are further split into FetchRequests of size
      // at most maxBytesInFlight in order to limit the amount of data in flight.
      val remoteRequests = new ArrayBuffer[FetchRequest]

      // Tracks total number of blocks (including zero sized blocks)
      var totalBlocks = 0
      for ((address, blockInfos) <- blocksByAddress) {
        // Handle the addresses one at a time
        totalBlocks += blockInfos.size
        if (address.executorId == blockManager.blockManagerId.executorId) {
          // Filter out zero-sized blocks
          // This data lives on the local executor
          localBlocks ++= blockInfos.filter(_._2 != 0).map(_._1)
          numBlocksToFetch += localBlocks.size
        } else {
          // Everything else is non-local data
          val iterator = blockInfos.iterator
          var curRequestSize = 0L
          var curBlocks = new ArrayBuffer[(BlockId, Long)]
          while (iterator.hasNext) {
            val (blockId, size) = iterator.next()
            // Skip empty blocks
            if (size > 0) {
              curBlocks += ((blockId, size))
              remoteBlocks += blockId
              numBlocksToFetch += 1
              curRequestSize += size
            } else if (size < 0) {
              throw new BlockException(blockId, "Negative block size " + size)
            }
            if (curRequestSize >= targetRequestSize) {
              // Add this FetchRequest
              // The block is a ShuffleBlockId
              remoteRequests += new FetchRequest(address, curBlocks)
              curBlocks = new ArrayBuffer[(BlockId, Long)]
              logDebug(s"Creating fetch request of $curRequestSize at $address")
              curRequestSize = 0
            }
          }
          // Add in the final request
          if (curBlocks.nonEmpty) {
            remoteRequests += new FetchRequest(address, curBlocks)
          }
        }
      }
      logInfo(s"Getting $numBlocksToFetch non-empty blocks out of $totalBlocks blocks")
      // Return the requests that must be fetched remotely
      remoteRequests
    }
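As shown above, a block is classified by comparing the executorId of the executor that holds the ShuffleBlockId with the id of the current executor: if they match, the block is local; otherwise it is remote.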
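The classification and batching can be tried out in isolation. The following is a minimal sketch under simplifying assumptions, not Spark's code: block ids are plain strings, an address is reduced to an executor id, negative sizes are not checked, and `SplitSketch` and `Request` are hypothetical names.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical, simplified model of splitLocalRemoteBlocks.
object SplitSketch {
  final case class Request(executorId: String, blocks: Seq[(String, Long)]) {
    val size: Long = blocks.map(_._2).sum
  }

  /** Returns (ids of non-empty local blocks, batched remote requests). */
  def split(
      localExecutorId: String,
      blocksByExecutor: Seq[(String, Seq[(String, Long)])],
      targetRequestSize: Long): (Seq[String], Seq[Request]) = {
    val local = ArrayBuffer.empty[String]
    val remote = ArrayBuffer.empty[Request]
    for ((executorId, blocks) <- blocksByExecutor) {
      if (executorId == localExecutorId) {
        // Same executor: read locally, skipping zero-sized blocks.
        local ++= blocks.filter(_._2 != 0).map(_._1)
      } else {
        var cur = ArrayBuffer.empty[(String, Long)]
        var curSize = 0L
        for ((id, size) <- blocks if size > 0) {
          cur += ((id, size))
          curSize += size
          // Close the batch once it reaches the target size.
          if (curSize >= targetRequestSize) {
            remote += Request(executorId, cur.toList)
            cur = ArrayBuffer.empty[(String, Long)]
            curSize = 0L
          }
        }
        // Whatever is left forms the final, smaller request.
        if (cur.nonEmpty) remote += Request(executorId, cur.toList)
      }
    }
    (local.toList, remote.toList)
  }
}
```

Note that a batch is closed only after it has reached the target, so a request can exceed `targetRequestSize` by up to one block.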
Once the split is done, the remote requests are taken off the queue and sent.

    private def fetchUpToMaxBytes(): Unit = {
      // Send fetch requests up to maxBytesInFlight
      while (fetchRequests.nonEmpty &&
        (bytesInFlight == 0 || bytesInFlight + fetchRequests.front.size <= maxBytesInFlight)) {
        // This condition throttles the outgoing requests
        sendRequest(fetchRequests.dequeue())
      }
    }
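The throttling condition is easy to misread, so here is a small sketch that models requests as bare sizes. `ThrottleSketch` and `sendUpToMax` are hypothetical names, and the sketch covers only one pass over the queue; it does not model bytes being released again as results are consumed.

```scala
import scala.collection.mutable

// Hypothetical model of one fetchUpToMaxBytes pass; a request is just its size in bytes.
object ThrottleSketch {
  /** Dequeues and returns the requests that fit into the budget; the rest stay queued. */
  def sendUpToMax(queue: mutable.Queue[Long], maxBytesInFlight: Long): List[Long] = {
    var bytesInFlight = 0L
    val sent = mutable.ListBuffer.empty[Long]
    // bytesInFlight == 0 lets one oversized request through, so it can never get stuck.
    while (queue.nonEmpty &&
        (bytesInFlight == 0 || bytesInFlight + queue.head <= maxBytesInFlight)) {
      val size = queue.dequeue()
      bytesInFlight += size
      sent += size
    }
    sent.toList
  }
}
```

For a queue of sizes 60, 30, 30, 50 and a budget of 100, the first two requests are sent and the other two wait.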
The method that performs the remote fetch is shown below.

    private[this] def sendRequest(req: FetchRequest) {
      logDebug("Sending request for %d blocks (%s) from %s".format(
        req.blocks.size, Utils.bytesToString(req.size), req.address.hostPort))
      bytesInFlight += req.size

      // so we can look up the size of each blockID
      val sizeMap = req.blocks.map { case (blockId, size) => (blockId.toString, size) }.toMap
      val blockIds = req.blocks.map(_._1.toString)

      // Fetch the data from this address. Note that each block is a ShuffleBlockId,
      // which records which partition is being requested.
      // During the fetch, the block data is further split into chunks.
      val address = req.address
      // shuffleClient is a NettyBlockTransferService or an ExternalShuffleClient
      shuffleClient.fetchBlocks(address.host, address.port, address.executorId, blockIds.toArray,
        new BlockFetchingListener {
          override def onBlockFetchSuccess(blockId: String, buf: ManagedBuffer): Unit = {
            // Only add the buffer to results queue if the iterator is not zombie,
            // i.e. cleanup() has not been called yet.
            if (!isZombie) {
              // Increment the ref count because we need to pass this to a different thread.
              // This needs to be released after use.
              buf.retain()
              results.put(new SuccessFetchResult(BlockId(blockId), address, sizeMap(blockId), buf))
              shuffleMetrics.incRemoteBytesRead(buf.size)
              shuffleMetrics.incRemoteBlocksFetched(1)
            }
            logTrace("Got remote block " + blockId + " after " + Utils.getUsedTimeMs(startTime))
          }

          override def onBlockFetchFailure(blockId: String, e: Throwable): Unit = {
            logError(s"Failed to get block(s) from ${req.address.host}:${req.address.port}", e)
            results.put(new FailureFetchResult(BlockId(blockId), address, e))
          }
        })
    }
As shown above, the data is fetched through one of two clients: NettyBlockTransferService or ExternalShuffleClient.
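Whichever client is used, the listener hands results from the network thread to the task thread through `results`. The sketch below illustrates that hand-off, assuming `results` is a blocking queue (as the `results.put` calls suggest). `ResultsQueueSketch`, `Listener`, `Success`, `Failure` and `simulateFetch` are hypothetical stand-ins, and the block ids are made up for illustration.

```scala
import java.io.IOException
import java.util.concurrent.LinkedBlockingQueue

// Hypothetical stand-ins for BlockFetchingListener and the fetch result types.
object ResultsQueueSketch {
  sealed trait FetchResult
  final case class Success(blockId: String, bytes: Array[Byte]) extends FetchResult
  final case class Failure(blockId: String, error: Throwable) extends FetchResult

  trait Listener {
    def onSuccess(blockId: String, bytes: Array[Byte]): Unit
    def onFailure(blockId: String, error: Throwable): Unit
  }

  // Filled by the fetching thread, drained by the consuming thread.
  val results = new LinkedBlockingQueue[FetchResult]()

  // Both outcomes go into the same queue, so the consumer sees failures in order too.
  val listener: Listener = new Listener {
    def onSuccess(blockId: String, bytes: Array[Byte]): Unit =
      results.put(Success(blockId, bytes))
    def onFailure(blockId: String, error: Throwable): Unit =
      results.put(Failure(blockId, error))
  }

  /** Simulates a network thread that delivers one block and then one failure. */
  def simulateFetch(): Unit = {
    val worker = new Thread(new Runnable {
      def run(): Unit = {
        listener.onSuccess("shuffle_0_1_2", Array[Byte](1, 2, 3))
        listener.onFailure("shuffle_0_2_2", new IOException("connection reset"))
      }
    })
    worker.start()
    worker.join()
  }
}
```

A consumer would call `results.take()`, which blocks until the next result arrives, and then match on `Success` or `Failure`.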
Local data is read as follows.

    private[this] def fetchLocalBlocks() {
      val iter = localBlocks.iterator
      while (iter.hasNext) {
        val blockId = iter.next()
        try {
          // Read the block
          val buf = blockManager.getBlockData(blockId)
          shuffleMetrics.incLocalBlocksFetched(1)
          shuffleMetrics.incLocalBytesRead(buf.size)
          buf.retain()
          results.put(new SuccessFetchResult(blockId, blockManager.blockManagerId, 0, buf))
        } catch {
          case e: Exception =>
            // If we see an exception, stop immediately.
            logError(s"Error occurred while fetching local blocks", e)
            results.put(new FailureFetchResult(blockId, blockManager.blockManagerId, e))
            return
        }
      }
    }
The data is read through the blockManager object.

    override def getBlockData(blockId: BlockId): ManagedBuffer = {
      if (blockId.isShuffle) {
        // The block is shuffle data
        shuffleManager.shuffleBlockResolver.getBlockData(blockId.asInstanceOf[ShuffleBlockId])
      } else {
        val blockBytesOpt = doGetLocal(blockId, asBlockResult = false)
          .asInstanceOf[Option[ByteBuffer]]
        if (blockBytesOpt.isDefined) {
          val buffer = blockBytesOpt.get
          new NioManagedBuffer(buffer)
        } else {
          throw new BlockNotFoundException(blockId.toString)
        }
      }
    }
Because the block here is shuffle data, the call goes to the shuffleBlockResolver object. It has two implementations, FileShuffleBlockResolver and IndexShuffleBlockResolver; we look at FileShuffleBlockResolver here.

    override def getBlockData(blockId: ShuffleBlockId): ManagedBuffer = {
      // Locate the data file
      val file = blockManager.diskBlockManager.getFile(blockId)
      // Return a buffer object that reads from this file
      new FileSegmentManagedBuffer(transportConf, file, 0, file.length)
    }
As shown above, the method first obtains the file for the block and then wraps it in a FileSegmentManagedBuffer.

DiskBlockManager looks up the file as follows.

    def getFile(filename: String): File = {
      // filename encodes the shuffle id, the map index and the partition
      var attempts = 0
      val numLocalDirs = localDirs.length
      val maxAttempts = numLocalDirs
      val subDirsMap = new mutable.HashMap[Int, Array[File]]()
      subDirs.zipWithIndex.foreach(s => subDirsMap.put(s._2, s._1))
      // Figure out which local directory it hashes to, and which subdirectory in that
      // Hash of the file name
      val hash = Utils.nonNegativeHash(filename)
      // Hash to one of the root directories
      var dirId = hash % subDirsMap.size
      // Then to a subdirectory below it
      var subDirId = (hash / subDirsMap.size) % subDirsPerLocalDir
      var dir: File = null
      while (dir == null) {
        attempts += 1
        if (attempts > maxAttempts) {
          /* throw new IOException("Failed to create local dir after " + maxAttempts + " attempts!") */
          logError("Failed to create local dir in root " + dirId + " after " + maxAttempts + " attempts!")
          subDirsMap.remove(dirId)
          if (subDirsMap.size == 0) {
            throw new IOException("Failed to create local dir after try in all root dir")
          }
          // Data can be stored on several disks, so recompute the target from the remaining ones
          dirId = hash % subDirsMap.size
          subDirId = (hash / subDirsMap.size) % subDirsPerLocalDir
        }
        try {
          // Create the subdirectory if it doesn't already exist
          dir = subDirs(dirId).synchronized {
            val old = subDirs(dirId)(subDirId)
            if (old != null) {
              new File(old, filename)
            } else {
              val newDir = new File(localDirs(dirId), "%02x".format(subDirId))
              if (newDir.exists() || newDir.mkdir()) {
                subDirs(dirId)(subDirId) = newDir
                new File(newDir, filename)
              } else {
                logWarning(s"Failed to create local dir in $newDir.")
                null
              }
            }
          }
        } catch {
          case e: SecurityException => dir = null;
        }
      }
      dir
    }
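This shows that when Spark's data files are spread across several disks, each file is placed by hashing its name to one of the configured directories and then to a subdirectory inside it.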
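The two-level hashing can be isolated into a few lines. This is a minimal sketch with hypothetical names (`DirHashSketch`, `locate`, `subDirName`); `nonNegativeHash` here is an assumption about how Utils.nonNegativeHash behaves (a non-negative value derived from `hashCode`), and the retry logic of getFile is left out.

```scala
// Hypothetical sketch of the directory choice in getFile, without I/O or retries.
object DirHashSketch {
  // Assumed behaviour of Utils.nonNegativeHash: abs(hashCode), with Int.MinValue mapped to 0.
  def nonNegativeHash(s: String): Int = {
    val h = s.hashCode
    if (h == Int.MinValue) 0 else math.abs(h)
  }

  /** Returns (index of the root directory, index of the subdirectory inside it). */
  def locate(filename: String, numLocalDirs: Int, subDirsPerLocalDir: Int): (Int, Int) = {
    val hash = nonNegativeHash(filename)
    val dirId = hash % numLocalDirs
    // Dividing first makes the subdirectory choice independent of the root choice.
    val subDirId = (hash / numLocalDirs) % subDirsPerLocalDir
    (dirId, subDirId)
  }

  // Subdirectories are named with two hex digits, as in "%02x".format(subDirId).
  def subDirName(subDirId: Int): String = "%02x".format(subDirId)
}
```

Because the location depends only on the file name and the directory counts, a reader can recompute where a writer put the file without any lookup table.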
FileSegmentManagedBuffer then reads the file with plain NIO.

    @Override
    public ByteBuffer nioByteBuffer() throws IOException {
      FileChannel channel = null;
      try {
        channel = new RandomAccessFile(file, "r").getChannel();
        // Just copy the buffer if it's sufficiently small, as memory mapping has a high overhead.
        if (length < conf.memoryMapBytes()) {
          ByteBuffer buf = ByteBuffer.allocate((int) length);
          channel.position(offset);
          while (buf.remaining() != 0) {
            // Keep reading until the buffer (a heap buffer from ByteBuffer.allocate) is full
            if (channel.read(buf) == -1) {
              throw new IOException(String.format("Reached EOF before filling buffer\n" +
                "offset=%s\nfile=%s\nbuf.remaining=%s",
                offset, file.getAbsoluteFile(), buf.remaining()));
            }
          }
          buf.flip();
          return buf;
        } else {
          // Large segment: memory-map the file region instead of copying it
          return channel.map(FileChannel.MapMode.READ_ONLY, offset, length);
        }
      // (the excerpt ends here; the rest of the try statement is not shown)
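The copy-or-map decision can be reproduced with the JDK alone. The sketch below is an illustration rather than Spark's implementation: `SegmentReadSketch` and `read` are hypothetical names, the threshold is passed in as a plain parameter, and the channel is simply closed in a `finally` block.

```scala
import java.io.{File, IOException, RandomAccessFile}
import java.nio.ByteBuffer
import java.nio.channels.FileChannel

// Hypothetical sketch: read `length` bytes starting at `offset`, copying small
// segments into a heap buffer and memory-mapping large ones.
object SegmentReadSketch {
  def read(file: File, offset: Long, length: Long, memoryMapBytes: Long): ByteBuffer = {
    val channel = new RandomAccessFile(file, "r").getChannel
    try {
      if (length < memoryMapBytes) {
        val buf = ByteBuffer.allocate(length.toInt)
        channel.position(offset)
        // A single read may return fewer bytes than requested, hence the loop.
        while (buf.remaining() != 0) {
          if (channel.read(buf) == -1) {
            throw new IOException(s"Reached EOF before filling buffer, file=$file")
          }
        }
        buf.flip()
        buf
      } else {
        // The mapping stays valid after the channel is closed.
        channel.map(FileChannel.MapMode.READ_ONLY, offset, length)
      }
    } finally {
      channel.close()
    }
  }
}
```

Both branches return a buffer positioned at the start of the requested segment, so callers do not need to know which path was taken.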
The code above shows how Spark stores its data files across multiple disks. The idea is general: other temporary files are stored the same way.
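Summary
- A reader reads the data of one partition of a given shuffleId.
- It fetches the MapStatus information for that partition to learn which nodes hold the data.
- It then classifies the data as remote or local.
- Remote data is fetched via Netty; local data is read from the data files.
- For local data, shuffleId + mapId + partition identify the data file, which is then read with NIO.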