On Using the "Hadoop Free" Build of Spark


Using Spark's "Hadoop Free" Build

Spark uses Hadoop client libraries for HDFS and YARN. Starting in Spark 1.4, the project packages "Hadoop free" builds that let you more easily connect a single Spark binary to any Hadoop version. To use these builds, you need to modify SPARK_DIST_CLASSPATH to include Hadoop's package jars. The most convenient place to do this is by adding an entry in conf/spark-env.sh.

This page describes how to connect Spark to Hadoop for different types of distributions.

Apache Hadoop

For Apache distributions, you can use Hadoop’s ‘classpath’ command. For instance:

### in conf/spark-env.sh ###

# If 'hadoop' binary is on your PATH
export SPARK_DIST_CLASSPATH=$(hadoop classpath)

# With explicit path to 'hadoop' binary
export SPARK_DIST_CLASSPATH=$(/path/to/hadoop/bin/hadoop classpath)

# Passing a Hadoop configuration directory
export SPARK_DIST_CLASSPATH=$(hadoop --config /path/to/configs classpath)
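A quick way to confirm that the "Hadoop free" build is actually picking up the intended jars is to ask the Spark shell which Hadoop version it loaded. The snippet below is a minimal sketch, assuming Spark is launched from its installation directory after conf/spark-env.sh has been edited as above; org.apache.hadoop.util.VersionInfo is a standard Hadoop utility class, not something specific to Spark.

# Print the Hadoop version that the Spark JVM actually loaded
# from SPARK_DIST_CLASSPATH (run from the Spark installation directory)
echo 'println(org.apache.hadoop.util.VersionInfo.getVersion)' | ./bin/spark-shell

If the printed version matches your cluster's Hadoop version, the classpath is wired up correctly.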
Official documentation: http://spark.apache.org/docs/latest/hadoop-provided.html