Setting Up a Spark Cluster


Prerequisites: JDK, Scala, and Hadoop are already installed on the system.


1. Extract the Spark package to the install directory
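
For example, a minimal sketch assuming the Spark 1.6.1 binary package built against Hadoop 2.6 (matching the versions in the startup log below) and the /usr/spark path used in spark-env.sh; the archive name is an assumption:

# Assumed archive name, substitute the package you actually downloaded
tar -zxf spark-1.6.1-bin-hadoop2.6.tgz -C /usr
# Rename so that SPARK_HOME=/usr/spark matches the configuration below
mv /usr/spark-1.6.1-bin-hadoop2.6 /usr/spark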

2. Configure spark-env.sh

export SCALA_HOME=/usr/scala
export SPARK_HOME=/usr/spark
export JAVA_HOME=/usr/java/jdk
export HADOOP_CONF_DIR=/usr/hadoop/etc/hadoop
#export SPARK_EXECUTOR_INSTANCES=2
#export SPARK_EXECUTOR_CORES=2
#export SPARK_EXECUTOR_MEMORY=2G
export SPARK_WORKER_MEMORY=2G
#export SPARK_YARN_APP_NAME=Spark
#export SPARK_YARN_QUEUE=default
export SPARK_MASTER_IP=192.168.0.100

3. Configure spark-defaults.conf

spark.master                     spark://master:7077
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs://master:9000/spark
spark.serializer                 org.apache.spark.serializer.KryoSerializer
spark.driver.memory              2g
#spark.executor.extraJavaOptions  -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"

4. Create the directory hdfs://master:9000/spark on HDFS (the spark.eventLog.dir path configured above)
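
With HDFS up, the directory can be created with the standard HDFS shell:

# Create the event log directory referenced by spark.eventLog.dir
hadoop fs -mkdir -p hdfs://master:9000/spark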

5. Configure conf/slaves, listing one worker hostname per line:

slave1
slave2
slave3
slave4
slave5
slave6
slave7
slave8
slave9
slave10
slave11
slave12
slave13
slave14
slave15
slave16

6. scp the extracted and configured Spark directory to every slave node, as sketched below.
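
A minimal sketch, assuming passwordless SSH as the hadoop user, that slave1 through slave16 resolve as listed in conf/slaves, and that hadoop can write to /usr on each node:

# Push the configured /usr/spark directory to every slave
for i in $(seq 1 16); do
  scp -r /usr/spark hadoop@slave$i:/usr/
done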

7. Start the Spark cluster
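
On the master, the standalone scripts bundled with Spark start the Master locally and one Worker on every host listed in conf/slaves:

cd /usr/spark
sbin/start-all.sh
# jps on the master should now show a Master process,
# and jps on each slave a Worker process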

8. Start spark-shell

hadoop@master:/usr/spark$ bin/spark-shell
16/03/17 13:31:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/03/17 13:31:37 INFO spark.SecurityManager: Changing view acls to: hadoop
16/03/17 13:31:37 INFO spark.SecurityManager: Changing modify acls to: hadoop
16/03/17 13:31:37 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
16/03/17 13:31:37 INFO spark.HttpServer: Starting HTTP Server
16/03/17 13:31:37 INFO server.Server: jetty-8.y.z-SNAPSHOT
16/03/17 13:31:37 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:41190
16/03/17 13:31:37 INFO util.Utils: Successfully started service 'HTTP class server' on port 41190.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.1
      /_/

Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_80)
Type in expressions to have them evaluated.
Type :help for more information.
16/03/17 13:31:40 INFO spark.SparkContext: Running Spark version 1.6.1
16/03/17 13:31:40 INFO spark.SecurityManager: Changing view acls to: hadoop
16/03/17 13:31:40 INFO spark.SecurityManager: Changing modify acls to: hadoop
16/03/17 13:31:40 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
16/03/17 13:31:40 INFO util.Utils: Successfully started service 'sparkDriver' on port 35703.
16/03/17 13:31:40 INFO slf4j.Slf4jLogger: Slf4jLogger started
16/03/17 13:31:41 INFO Remoting: Starting remoting
16/03/17 13:31:41 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@192.168.0.100:37246]
16/03/17 13:31:41 INFO util.Utils: Successfully started service 'sparkDriverActorSystem' on port 37246.
16/03/17 13:31:41 INFO spark.SparkEnv: Registering MapOutputTracker
16/03/17 13:31:41 INFO spark.SparkEnv: Registering BlockManagerMaster
16/03/17 13:31:41 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-1281dff7-2049-4303-b861-cf2672f68366
16/03/17 13:31:41 INFO storage.MemoryStore: MemoryStore started with capacity 1247.6 MB
16/03/17 13:31:41 INFO spark.SparkEnv: Registering OutputCommitCoordinator
16/03/17 13:31:41 INFO server.Server: jetty-8.y.z-SNAPSHOT
16/03/17 13:31:41 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
16/03/17 13:31:41 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
16/03/17 13:31:41 INFO ui.SparkUI: Started SparkUI at http://192.168.0.100:4040
16/03/17 13:31:41 INFO client.AppClient$ClientEndpoint: Connecting to master spark://master:7077...
16/03/17 13:31:41 INFO cluster.SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20160317133141-0001
16/03/17 13:31:41 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 38620.
16/03/17 13:31:41 INFO netty.NettyBlockTransferService: Server created on 38620
16/03/17 13:31:41 INFO client.AppClient$ClientEndpoint: Executor added: app-20160317133141-0001/0 on worker-20160317133128-192.168.0.6-38003 (192.168.0.6:38003) with 4 cores
16/03/17 13:31:41 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20160317133141-0001/0 on hostPort 192.168.0.6:38003 with 4 cores, 1024.0 MB RAM
16/03/17 13:31:41 INFO client.AppClient$ClientEndpoint: Executor added: app-20160317133141-0001/1 on worker-20160317133128-192.168.0.2-45986 (192.168.0.2:45986) with 4 cores
16/03/17 13:31:41 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20160317133141-0001/1 on hostPort 192.168.0.2:45986 with 4 cores, 1024.0 MB RAM
16/03/17 13:31:41 INFO client.AppClient$ClientEndpoint: Executor added: app-20160317133141-0001/2 on worker-20160317133128-192.168.0.5-34275 (192.168.0.5:34275) with 4 cores
16/03/17 13:31:41 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20160317133141-0001/2 on hostPort 192.168.0.5:34275 with 4 cores, 1024.0 MB RAM
16/03/17 13:31:41 INFO storage.BlockManagerMaster: Trying to register BlockManager
16/03/17 13:31:41 INFO storage.BlockManagerMasterEndpoint: Registering block manager 192.168.0.100:38620 with 1247.6 MB RAM, BlockManagerId(driver, 192.168.0.100, 38620)
16/03/17 13:31:41 INFO client.AppClient$ClientEndpoint: Executor added: app-20160317133141-0001/3 on worker-20160317104649-192.168.0.3-36639 (192.168.0.3:36639) with 4 cores
16/03/17 13:31:41 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20160317133141-0001/3 on hostPort 192.168.0.3:36639 with 4 cores, 1024.0 MB RAM
16/03/17 13:31:41 INFO storage.BlockManagerMaster: Registered BlockManager
16/03/17 13:31:41 INFO client.AppClient$ClientEndpoint: Executor added: app-20160317133141-0001/4 on worker-20160317133128-192.168.0.14-35433 (192.168.0.14:35433) with 4 cores
16/03/17 13:31:41 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20160317133141-0001/4 on hostPort 192.168.0.14:35433 with 4 cores, 1024.0 MB RAM
16/03/17 13:31:41 INFO client.AppClient$ClientEndpoint: Executor added: app-20160317133141-0001/5 on worker-20160317133128-192.168.0.1-46215 (192.168.0.1:46215) with 4 cores
16/03/17 13:31:41 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20160317133141-0001/5 on hostPort 192.168.0.1:46215 with 4 cores, 1024.0 MB RAM
16/03/17 13:31:41 INFO client.AppClient$ClientEndpoint: Executor added: app-20160317133141-0001/6 on worker-20160317133128-192.168.0.10-41978 (192.168.0.10:41978) with 4 cores
16/03/17 13:31:41 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20160317133141-0001/6 on hostPort 192.168.0.10:41978 with 4 cores, 1024.0 MB RAM
16/03/17 13:31:41 INFO client.AppClient$ClientEndpoint: Executor added: app-20160317133141-0001/7 on worker-20160317133128-192.168.0.7-45198 (192.168.0.7:45198) with 4 cores
16/03/17 13:31:41 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20160317133141-0001/7 on hostPort 192.168.0.7:45198 with 4 cores, 1024.0 MB RAM
16/03/17 13:31:41 INFO client.AppClient$ClientEndpoint: Executor added: app-20160317133141-0001/8 on worker-20160317133128-192.168.0.16-39178 (192.168.0.16:39178) with 4 cores
16/03/17 13:31:41 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20160317133141-0001/8 on hostPort 192.168.0.16:39178 with 4 cores, 1024.0 MB RAM
16/03/17 13:31:41 INFO client.AppClient$ClientEndpoint: Executor added: app-20160317133141-0001/9 on worker-20160317133128-192.168.0.13-34969 (192.168.0.13:34969) with 4 cores
16/03/17 13:31:41 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20160317133141-0001/9 on hostPort 192.168.0.13:34969 with 4 cores, 1024.0 MB RAM
16/03/17 13:31:41 INFO client.AppClient$ClientEndpoint: Executor added: app-20160317133141-0001/10 on worker-20160317133128-192.168.0.11-45772 (192.168.0.11:45772) with 4 cores
16/03/17 13:31:41 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20160317133141-0001/10 on hostPort 192.168.0.11:45772 with 4 cores, 1024.0 MB RAM
16/03/17 13:31:41 INFO client.AppClient$ClientEndpoint: Executor added: app-20160317133141-0001/11 on worker-20160317133128-192.168.0.15-42602 (192.168.0.15:42602) with 4 cores
16/03/17 13:31:41 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20160317133141-0001/11 on hostPort 192.168.0.15:42602 with 4 cores, 1024.0 MB RAM
16/03/17 13:31:41 INFO client.AppClient$ClientEndpoint: Executor added: app-20160317133141-0001/12 on worker-20160317133128-192.168.0.12-33691 (192.168.0.12:33691) with 4 cores
16/03/17 13:31:41 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20160317133141-0001/12 on hostPort 192.168.0.12:33691 with 4 cores, 1024.0 MB RAM
16/03/17 13:31:41 INFO client.AppClient$ClientEndpoint: Executor added: app-20160317133141-0001/13 on worker-20160317133128-192.168.0.4-45911 (192.168.0.4:45911) with 4 cores
16/03/17 13:31:41 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20160317133141-0001/13 on hostPort 192.168.0.4:45911 with 4 cores, 1024.0 MB RAM
16/03/17 13:31:41 INFO client.AppClient$ClientEndpoint: Executor added: app-20160317133141-0001/14 on worker-20160317133128-192.168.0.9-35630 (192.168.0.9:35630) with 4 cores
16/03/17 13:31:41 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20160317133141-0001/14 on hostPort 192.168.0.9:35630 with 4 cores, 1024.0 MB RAM
16/03/17 13:31:41 INFO client.AppClient$ClientEndpoint: Executor added: app-20160317133141-0001/15 on worker-20160317133128-192.168.0.8-37750 (192.168.0.8:37750) with 4 cores
16/03/17 13:31:41 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20160317133141-0001/15 on hostPort 192.168.0.8:37750 with 4 cores, 1024.0 MB RAM
16/03/17 13:31:41 INFO client.AppClient$ClientEndpoint: Executor updated: app-20160317133141-0001/3 is now RUNNING
16/03/17 13:31:41 INFO client.AppClient$ClientEndpoint: Executor updated: app-20160317133141-0001/6 is now RUNNING
16/03/17 13:31:41 INFO client.AppClient$ClientEndpoint: Executor updated: app-20160317133141-0001/0 is now RUNNING
16/03/17 13:31:41 INFO client.AppClient$ClientEndpoint: Executor updated: app-20160317133141-0001/4 is now RUNNING
16/03/17 13:31:41 INFO client.AppClient$ClientEndpoint: Executor updated: app-20160317133141-0001/7 is now RUNNING
16/03/17 13:31:41 INFO client.AppClient$ClientEndpoint: Executor updated: app-20160317133141-0001/8 is now RUNNING
16/03/17 13:31:41 INFO client.AppClient$ClientEndpoint: Executor updated: app-20160317133141-0001/1 is now RUNNING
16/03/17 13:31:41 INFO client.AppClient$ClientEndpoint: Executor updated: app-20160317133141-0001/5 is now RUNNING
16/03/17 13:31:41 INFO client.AppClient$ClientEndpoint: Executor updated: app-20160317133141-0001/2 is now RUNNING
16/03/17 13:31:41 INFO client.AppClient$ClientEndpoint: Executor updated: app-20160317133141-0001/9 is now RUNNING
16/03/17 13:31:41 INFO client.AppClient$ClientEndpoint: Executor updated: app-20160317133141-0001/11 is now RUNNING
16/03/17 13:31:41 INFO client.AppClient$ClientEndpoint: Executor updated: app-20160317133141-0001/15 is now RUNNING
16/03/17 13:31:41 INFO client.AppClient$ClientEndpoint: Executor updated: app-20160317133141-0001/10 is now RUNNING
16/03/17 13:31:41 INFO client.AppClient$ClientEndpoint: Executor updated: app-20160317133141-0001/12 is now RUNNING
16/03/17 13:31:41 INFO client.AppClient$ClientEndpoint: Executor updated: app-20160317133141-0001/13 is now RUNNING
16/03/17 13:31:41 INFO client.AppClient$ClientEndpoint: Executor updated: app-20160317133141-0001/14 is now RUNNING
16/03/17 13:31:42 INFO scheduler.EventLoggingListener: Logging events to hdfs://master:9000/spark/app-20160317133141-0001
16/03/17 13:31:42 INFO cluster.SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
16/03/17 13:31:42 INFO repl.SparkILoop: Created spark context..
Spark context available as sc.
16/03/17 13:31:42 INFO hive.HiveContext: Initializing execution hive, version 1.2.1
16/03/17 13:31:42 INFO client.ClientWrapper: Inspected Hadoop version: 2.6.0
16/03/17 13:31:42 INFO client.ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.6.0
16/03/17 13:31:43 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
16/03/17 13:31:43 INFO metastore.ObjectStore: ObjectStore, initialize called
16/03/17 13:31:43 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
16/03/17 13:31:43 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
16/03/17 13:31:43 WARN DataNucleus.Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/03/17 13:31:43 INFO cluster.SparkDeploySchedulerBackend: Registered executor NettyRpcEndpointRef(null) (slave10:40516) with ID 6
16/03/17 13:31:43 INFO cluster.SparkDeploySchedulerBackend: Registered executor NettyRpcEndpointRef(null) (slave6:34246) with ID 0
16/03/17 13:31:43 INFO storage.BlockManagerMasterEndpoint: Registering block manager slave10:41610 with 511.5 MB RAM, BlockManagerId(6, slave10, 41610)
16/03/17 13:31:43 INFO cluster.SparkDeploySchedulerBackend: Registered executor NettyRpcEndpointRef(null) (slave2:55130) with ID 1
16/03/17 13:31:43 INFO cluster.SparkDeploySchedulerBackend: Registered executor NettyRpcEndpointRef(null) (slave7:48882) with ID 7
16/03/17 13:31:43 INFO cluster.SparkDeploySchedulerBackend: Registered executor NettyRpcEndpointRef(null) (slave4:34906) with ID 13
16/03/17 13:31:43 INFO cluster.SparkDeploySchedulerBackend: Registered executor NettyRpcEndpointRef(null) (slave15:60692) with ID 11
16/03/17 13:31:43 INFO cluster.SparkDeploySchedulerBackend: Registered executor NettyRpcEndpointRef(null) (slave5:41820) with ID 2
16/03/17 13:31:43 INFO cluster.SparkDeploySchedulerBackend: Registered executor NettyRpcEndpointRef(null) (slave3:48080) with ID 3
16/03/17 13:31:43 INFO cluster.SparkDeploySchedulerBackend: Registered executor NettyRpcEndpointRef(null) (slave11:43104) with ID 10
16/03/17 13:31:43 INFO cluster.SparkDeploySchedulerBackend: Registered executor NettyRpcEndpointRef(null) (slave13:60158) with ID 9
16/03/17 13:31:43 INFO storage.BlockManagerMasterEndpoint: Registering block manager slave2:44618 with 511.5 MB RAM, BlockManagerId(1, slave2, 44618)
16/03/17 13:31:43 INFO cluster.SparkDeploySchedulerBackend: Registered executor NettyRpcEndpointRef(null) (slave12:44130) with ID 12
16/03/17 13:31:43 INFO storage.BlockManagerMasterEndpoint: Registering block manager slave6:37817 with 511.5 MB RAM, BlockManagerId(0, slave6, 37817)
16/03/17 13:31:43 INFO cluster.SparkDeploySchedulerBackend: Registered executor NettyRpcEndpointRef(null) (slave14:48804) with ID 4
16/03/17 13:31:43 INFO storage.BlockManagerMasterEndpoint: Registering block manager slave4:39275 with 511.5 MB RAM, BlockManagerId(13, slave4, 39275)
16/03/17 13:31:43 INFO storage.BlockManagerMasterEndpoint: Registering block manager slave7:42025 with 511.5 MB RAM, BlockManagerId(7, slave7, 42025)
16/03/17 13:31:43 INFO storage.BlockManagerMasterEndpoint: Registering block manager slave5:45802 with 511.5 MB RAM, BlockManagerId(2, slave5, 45802)
16/03/17 13:31:43 INFO storage.BlockManagerMasterEndpoint: Registering block manager slave3:43647 with 511.5 MB RAM, BlockManagerId(3, slave3, 43647)
16/03/17 13:31:43 INFO storage.BlockManagerMasterEndpoint: Registering block manager slave15:34957 with 511.5 MB RAM, BlockManagerId(11, slave15, 34957)
16/03/17 13:31:43 INFO cluster.SparkDeploySchedulerBackend: Registered executor NettyRpcEndpointRef(null) (slave1:55768) with ID 5
16/03/17 13:31:43 INFO storage.BlockManagerMasterEndpoint: Registering block manager slave11:40424 with 511.5 MB RAM, BlockManagerId(10, slave11, 40424)
16/03/17 13:31:43 INFO storage.BlockManagerMasterEndpoint: Registering block manager slave13:42079 with 511.5 MB RAM, BlockManagerId(9, slave13, 42079)
16/03/17 13:31:43 INFO storage.BlockManagerMasterEndpoint: Registering block manager slave12:38794 with 511.5 MB RAM, BlockManagerId(12, slave12, 38794)
16/03/17 13:31:43 INFO cluster.SparkDeploySchedulerBackend: Registered executor NettyRpcEndpointRef(null) (slave16:41804) with ID 8
16/03/17 13:31:43 INFO cluster.SparkDeploySchedulerBackend: Registered executor NettyRpcEndpointRef(null) (slave8:41588) with ID 15
16/03/17 13:31:43 INFO storage.BlockManagerMasterEndpoint: Registering block manager slave14:44682 with 511.5 MB RAM, BlockManagerId(4, slave14, 44682)
16/03/17 13:31:43 WARN DataNucleus.Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/03/17 13:31:43 INFO cluster.SparkDeploySchedulerBackend: Registered executor NettyRpcEndpointRef(null) (slave9:50926) with ID 14
16/03/17 13:31:43 INFO storage.BlockManagerMasterEndpoint: Registering block manager slave1:43079 with 511.5 MB RAM, BlockManagerId(5, slave1, 43079)
16/03/17 13:31:43 INFO storage.BlockManagerMasterEndpoint: Registering block manager slave8:42457 with 511.5 MB RAM, BlockManagerId(15, slave8, 42457)
16/03/17 13:31:43 INFO storage.BlockManagerMasterEndpoint: Registering block manager slave16:40733 with 511.5 MB RAM, BlockManagerId(8, slave16, 40733)
16/03/17 13:31:43 INFO storage.BlockManagerMasterEndpoint: Registering block manager slave9:46699 with 511.5 MB RAM, BlockManagerId(14, slave9, 46699)
16/03/17 13:31:49 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
16/03/17 13:31:50 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
16/03/17 13:31:50 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
16/03/17 13:31:55 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
16/03/17 13:31:55 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
16/03/17 13:31:57 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
16/03/17 13:31:57 INFO metastore.ObjectStore: Initialized ObjectStore
16/03/17 13:31:58 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
16/03/17 13:31:58 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
16/03/17 13:31:59 INFO metastore.HiveMetaStore: Added admin role in metastore
16/03/17 13:31:59 INFO metastore.HiveMetaStore: Added public role in metastore
16/03/17 13:31:59 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
16/03/17 13:31:59 INFO metastore.HiveMetaStore: 0: get_all_databases
16/03/17 13:31:59 INFO HiveMetaStore.audit: ugi=hadoop    ip=unknown-ip-addr    cmd=get_all_databases    
16/03/17 13:31:59 INFO metastore.HiveMetaStore: 0: get_functions: db=default pat=*
16/03/17 13:31:59 INFO HiveMetaStore.audit: ugi=hadoop    ip=unknown-ip-addr    cmd=get_functions: db=default pat=*    
16/03/17 13:31:59 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
16/03/17 13:32:01 INFO session.SessionState: Created local directory: /tmp/4d33165d-065c-48c7-b953-cfb7e637b231_resources
16/03/17 13:32:01 INFO session.SessionState: Created HDFS directory: /tmp/hive/hadoop/4d33165d-065c-48c7-b953-cfb7e637b231
16/03/17 13:32:01 INFO session.SessionState: Created local directory: /tmp/hadoop/4d33165d-065c-48c7-b953-cfb7e637b231
16/03/17 13:32:01 INFO session.SessionState: Created HDFS directory: /tmp/hive/hadoop/4d33165d-065c-48c7-b953-cfb7e637b231/_tmp_space.db
16/03/17 13:32:01 INFO hive.HiveContext: default warehouse location is /user/hive/warehouse
16/03/17 13:32:01 INFO hive.HiveContext: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
16/03/17 13:32:01 INFO client.ClientWrapper: Inspected Hadoop version: 2.6.0
16/03/17 13:32:01 INFO client.ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.6.0
16/03/17 13:32:02 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
16/03/17 13:32:02 INFO metastore.ObjectStore: ObjectStore, initialize called
16/03/17 13:32:02 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
16/03/17 13:32:02 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
16/03/17 13:32:02 WARN DataNucleus.Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/03/17 13:32:02 WARN DataNucleus.Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/03/17 13:32:04 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
16/03/17 13:32:04 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
16/03/17 13:32:04 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
16/03/17 13:32:05 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
16/03/17 13:32:05 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
16/03/17 13:32:05 INFO DataNucleus.Query: Reading in results for query "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is closing
16/03/17 13:32:05 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
16/03/17 13:32:05 INFO metastore.ObjectStore: Initialized ObjectStore
16/03/17 13:32:05 INFO metastore.HiveMetaStore: Added admin role in metastore
16/03/17 13:32:05 INFO metastore.HiveMetaStore: Added public role in metastore
16/03/17 13:32:05 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
16/03/17 13:32:05 INFO metastore.HiveMetaStore: 0: get_all_databases
16/03/17 13:32:05 INFO HiveMetaStore.audit: ugi=hadoop    ip=unknown-ip-addr    cmd=get_all_databases    
16/03/17 13:32:05 INFO metastore.HiveMetaStore: 0: get_functions: db=default pat=*
16/03/17 13:32:05 INFO HiveMetaStore.audit: ugi=hadoop    ip=unknown-ip-addr    cmd=get_functions: db=default pat=*    
16/03/17 13:32:05 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
16/03/17 13:32:05 INFO session.SessionState: Created local directory: /tmp/3ee9e6c4-f0e5-4b3e-b7e3-3705c7327b56_resources
16/03/17 13:32:05 INFO session.SessionState: Created HDFS directory: /tmp/hive/hadoop/3ee9e6c4-f0e5-4b3e-b7e3-3705c7327b56
16/03/17 13:32:05 INFO session.SessionState: Created local directory: /tmp/hadoop/3ee9e6c4-f0e5-4b3e-b7e3-3705c7327b56
16/03/17 13:32:05 INFO session.SessionState: Created HDFS directory: /tmp/hive/hadoop/3ee9e6c4-f0e5-4b3e-b7e3-3705c7327b56/_tmp_space.db
16/03/17 13:32:05 INFO repl.SparkILoop: Created sql context (with Hive support)..
SQL context available as sqlContext.

scala> 

9. Check the status in the web UI on port 4040 (http://192.168.0.100:4040, as reported in the startup log)


10. Run a quick test

scala> val textFile=sc.textFile("hdfs://192.168.0.100:9000/spark")
16/03/17 11:16:51 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 229.5 KB, free 229.5 KB)
16/03/17 11:16:52 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 19.6 KB, free 249.0 KB)
16/03/17 11:16:52 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.0.100:46538 (size: 19.6 KB, free: 1247.6 MB)
16/03/17 11:16:52 INFO spark.SparkContext: Created broadcast 0 from textFile at <console>:27
textFile: org.apache.spark.rdd.RDD[String] = hdfs://192.168.0.100:9000/spark MapPartitionsRDD[1] at textFile at <console>:27

scala> text
textFile   text       

scala> textFile.count
count                 countApprox           countApproxDistinct   
countByValue          countByValueApprox    

scala> textFile.count()
16/03/17 11:17:24 INFO mapred.FileInputFormat: Total input paths to process : 6
16/03/17 11:17:24 INFO spark.SparkContext: Starting job: count at <console>:30
16/03/17 11:17:24 INFO scheduler.DAGScheduler: Got job 0 (count at <console>:30) with 7 output partitions
16/03/17 11:17:24 INFO scheduler.DAGScheduler: Final stage: ResultStage 0 (count at <console>:30)
16/03/17 11:17:24 INFO scheduler.DAGScheduler: Parents of final stage: List()
16/03/17 11:17:24 INFO scheduler.DAGScheduler: Missing parents: List()
16/03/17 11:17:24 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (hdfs://192.168.0.100:9000/spark MapPartitionsRDD[1] at textFile at <console>:27), which has no missing parents
16/03/17 11:17:24 INFO storage.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 2.9 KB, free 252.0 KB)
16/03/17 11:17:24 INFO storage.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 1766.0 B, free 253.7 KB)
16/03/17 11:17:24 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.0.100:46538 (size: 1766.0 B, free: 1247.6 MB)
16/03/17 11:17:24 INFO spark.SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1006
16/03/17 11:17:24 INFO scheduler.DAGScheduler: Submitting 7 missing tasks from ResultStage 0 (hdfs://192.168.0.100:9000/spark MapPartitionsRDD[1] at textFile at <console>:27)
16/03/17 11:17:24 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 7 tasks
16/03/17 11:17:24 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, slave11, partition 0,NODE_LOCAL, 2157 bytes)
16/03/17 11:17:24 INFO scheduler.TaskSetManager: Starting task 2.0 in stage 0.0 (TID 1, slave16, partition 2,NODE_LOCAL, 2168 bytes)
16/03/17 11:17:24 INFO scheduler.TaskSetManager: Starting task 3.0 in stage 0.0 (TID 2, slave3, partition 3,NODE_LOCAL, 2153 bytes)
16/03/17 11:17:24 INFO scheduler.TaskSetManager: Starting task 5.0 in stage 0.0 (TID 3, slave9, partition 5,NODE_LOCAL, 2153 bytes)
16/03/17 11:17:24 INFO scheduler.TaskSetManager: Starting task 4.0 in stage 0.0 (TID 4, slave4, partition 4,NODE_LOCAL, 2153 bytes)
16/03/17 11:17:24 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 5, slave14, partition 1,NODE_LOCAL, 2157 bytes)
16/03/17 11:17:24 INFO scheduler.TaskSetManager: Starting task 6.0 in stage 0.0 (TID 6, slave3, partition 6,NODE_LOCAL, 2153 bytes)
16/03/17 11:17:25 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on slave11:37685 (size: 1766.0 B, free: 511.5 MB)
16/03/17 11:17:25 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on slave9:36273 (size: 1766.0 B, free: 511.5 MB)
16/03/17 11:17:25 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on slave14:33468 (size: 1766.0 B, free: 511.5 MB)
16/03/17 11:17:25 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on slave4:37197 (size: 1766.0 B, free: 511.5 MB)
16/03/17 11:17:25 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on slave16:36486 (size: 1766.0 B, free: 511.5 MB)
16/03/17 11:17:25 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on slave3:33002 (size: 1766.0 B, free: 511.5 MB)
16/03/17 11:17:25 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on slave14:33468 (size: 19.6 KB, free: 511.5 MB)
16/03/17 11:17:25 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on slave16:36486 (size: 19.6 KB, free: 511.5 MB)
16/03/17 11:17:25 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on slave3:33002 (size: 19.6 KB, free: 511.5 MB)
16/03/17 11:17:25 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on slave9:36273 (size: 19.6 KB, free: 511.5 MB)
16/03/17 11:17:25 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on slave11:37685 (size: 19.6 KB, free: 511.5 MB)
16/03/17 11:17:25 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on slave4:37197 (size: 19.6 KB, free: 511.5 MB)
16/03/17 11:17:26 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 5) in 1762 ms on slave14 (1/7)
16/03/17 11:17:26 INFO scheduler.TaskSetManager: Finished task 6.0 in stage 0.0 (TID 6) in 1905 ms on slave3 (2/7)
16/03/17 11:17:26 INFO scheduler.TaskSetManager: Finished task 3.0 in stage 0.0 (TID 2) in 1907 ms on slave3 (3/7)
16/03/17 11:17:26 INFO scheduler.TaskSetManager: Finished task 5.0 in stage 0.0 (TID 3) in 1941 ms on slave9 (4/7)
16/03/17 11:17:26 INFO scheduler.TaskSetManager: Finished task 4.0 in stage 0.0 (TID 4) in 2005 ms on slave4 (5/7)
16/03/17 11:17:26 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 2106 ms on slave11 (6/7)
16/03/17 11:17:26 INFO scheduler.TaskSetManager: Finished task 2.0 in stage 0.0 (TID 1) in 2139 ms on slave16 (7/7)
16/03/17 11:17:26 INFO scheduler.DAGScheduler: ResultStage 0 (count at <console>:30) finished in 2.181 s
16/03/17 11:17:26 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
16/03/17 11:17:26 INFO scheduler.DAGScheduler: Job 0 finished: count at <console>:30, took 2.437857 s
res0: Long = 5507

scala>


For the full set of Spark configuration options, see the official Spark website.



