Spark 1.0 installed successfully on hadoop-2.0.0-cdh4.2.0

Because my Hadoop distribution is CDH 4.2.0, I downloaded the Spark 1.0 build for Hadoop CDH4 directly from the Spark website.
Download page: http://spark.apache.org/downloads.html
Direct link: http://d3kbcqa49mib13.cloudfront.net/spark-1.0.0-bin-cdh4.tgz

After downloading, extract the archive under the hadoop user's home directory:

gtar -xzvf  spark-1.0.0-bin-cdh4.tgz

Create a symlink:

ln -s  spark-1.0.0-bin-cdh4  spark

lrwxrwxrwx.  1 hadoop hadoop        20  7月  9 15:16 2014 spark -> spark-1.0.0-bin-cdh4
drwxrwxr-x. 28 hadoop hadoop      4096  6月 20 11:17 2014 spark-0.9.1-bin-cdh4
-rw-rw-r--.  1 hadoop hadoop 168876274  5月  7 16:28 2014 spark-0.9.1-bin-cdh4.tgz
drwxrwxr-x. 11 hadoop hadoop      4096  7月 10 16:23 2014 spark-1.0.0-bin-cdh4
-rw-rw-r--.  1 hadoop hadoop 179810279  7月  9 15:07 2014 spark-1.0.0-bin-cdh4.tgz

The directory contents are as follows:

[hadoop@hadoop186 ~]$ cd spark
[hadoop@hadoop186 spark]$ ll
合計 380
-rw-rw-r--. 1 hadoop hadoop 281471  5月 26 16:17 2014 CHANGES.txt
-rw-rw-r--. 1 hadoop hadoop  29983  5月 26 16:17 2014 LICENSE
-rw-rw-r--. 1 hadoop hadoop  22559  5月 26 16:17 2014 NOTICE
-rw-rw-r--. 1 hadoop hadoop   4221  5月 26 16:17 2014 README.md
-rw-rw-r--. 1 hadoop hadoop     48  5月 26 16:17 2014 RELEASE
drwxrwxr-x. 2 hadoop hadoop   4096  5月 26 16:17 2014 bin
drwxrwxr-x. 2 hadoop hadoop   4096  7月 10 18:21 2014 conf
drwxrwxr-x. 4 hadoop hadoop   4096  5月 26 16:17 2014 ec2
drwxrwxr-x. 3 hadoop hadoop   4096  5月 26 16:17 2014 examples
drwxrwxr-x. 2 hadoop hadoop   4096  5月 26 16:17 2014 lib
drwxrwxr-x  2 hadoop hadoop   4096  7月 10 18:23 2014 logs
drwxrwxr-x. 6 hadoop hadoop   4096  5月 26 16:17 2014 python
drwxrwxr-x. 2 hadoop hadoop   4096  7月 10 15:52 2014 sbin
drwxrwxr-x. 6 hadoop hadoop   4096  7月 10 18:24 2014 work
Next, edit the configuration files.

The configuration files that need to be edited are:

[hadoop@hadoop186 conf]$ ll
合計 44
-rw-rw-r--  1 hadoop hadoop  303  7月 10 15:17 2014 fairscheduler.xml
-rw-rw-r--. 1 hadoop hadoop  303  5月 26 16:17 2014 fairscheduler.xml.template
-rw-rw-r--. 1 hadoop hadoop  550  7月  9 15:31 2014 log4j.properties
-rw-rw-r--. 1 hadoop hadoop  550  5月 26 16:17 2014 log4j.properties.template
-rw-rw-r--. 1 hadoop hadoop 5308  5月 26 16:17 2014 metrics.properties.template
-rw-rw-r--  1 hadoop hadoop  122  7月 10 17:43 2014 slaves
-rw-rw-r--  1 hadoop hadoop  350  7月 10 16:38 2014 spark-defaults.conf
-rw-rw-r--. 1 hadoop hadoop  340  5月 26 16:17 2014 spark-defaults.conf.template
-rwxrw-r--  1 hadoop hadoop 2981  7月 10 18:21 2014 spark-env.sh
-rwxrwxr-x. 1 hadoop hadoop 2755  5月 26 16:17 2014 spark-env.sh.template

Three of the files above (slaves, spark-defaults.conf and spark-env.sh) are not there after a fresh install; create them from the official template files:
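A minimal sketch of that step (the exact commands are not shown in the article, and the listing above has no slaves.template, so slaves is simply created or edited by hand):

cd /home/hadoop/spark/conf
cp spark-env.sh.template spark-env.sh
cp spark-defaults.conf.template spark-defaults.conf
# no slaves.template here, so create/edit conf/slaves directly
vi slaves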

spark-env.sh

[hadoop@hadoop186 conf]$ cat spark-env.sh
#!/usr/bin/env bash

# This file is sourced when running various Spark programs.
# Copy it as spark-env.sh and edit that to configure Spark for your site.

# Options read when launching programs locally with
# ./bin/run-example or ./bin/spark-submit
# - HADOOP_CONF_DIR, to point Spark towards Hadoop configuration files
# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
# - SPARK_PUBLIC_DNS, to set the public dns name of the driver program
# - SPARK_CLASSPATH, default classpath entries to append

# Options read by executors and drivers running inside the cluster
# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
# - SPARK_PUBLIC_DNS, to set the public DNS name of the driver program
# - SPARK_CLASSPATH, default classpath entries to append
# - SPARK_LOCAL_DIRS, storage directories to use on this node for shuffle and RDD data
# - MESOS_NATIVE_LIBRARY, to point to your libmesos.so if you use Mesos

# Options read in YARN client mode
# - HADOOP_CONF_DIR, to point Spark towards Hadoop configuration files
# - SPARK_EXECUTOR_INSTANCES, Number of workers to start (Default: 2)
# - SPARK_EXECUTOR_CORES, Number of cores for the workers (Default: 1).
# - SPARK_EXECUTOR_MEMORY, Memory per Worker (e.g. 1000M, 2G) (Default: 1G)
# - SPARK_DRIVER_MEMORY, Memory for Master (e.g. 1000M, 2G) (Default: 512 Mb)
# - SPARK_YARN_APP_NAME, The name of your application (Default: Spark)
# - SPARK_YARN_QUEUE, The hadoop queue to use for allocation requests (Default: 'default')
# - SPARK_YARN_DIST_FILES, Comma separated list of files to be distributed with the job.
# - SPARK_YARN_DIST_ARCHIVES, Comma separated list of archives to be distributed with the job.

# Options for the daemons used in the standalone deploy mode:
# - SPARK_MASTER_IP, to bind the master to a different IP address or hostname
# - SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports for the master
# - SPARK_MASTER_OPTS, to set config properties only for the master (e.g. "-Dx=y")
# - SPARK_WORKER_CORES, to set the number of cores to use on this machine
# - SPARK_WORKER_MEMORY, to set how much total memory workers have to give executors (e.g. 1000m, 2g)
# - SPARK_WORKER_PORT / SPARK_WORKER_WEBUI_PORT, to use non-default ports for the worker
# - SPARK_WORKER_INSTANCES, to set the number of worker processes per node
# - SPARK_WORKER_DIR, to set the working directory of worker processes
# - SPARK_WORKER_OPTS, to set config properties only for the worker (e.g. "-Dx=y")
# - SPARK_HISTORY_OPTS, to set config properties only for the history server (e.g. "-Dx=y")
# - SPARK_DAEMON_JAVA_OPTS, to set config properties for all daemons (e.g. "-Dx=y")
# - SPARK_PUBLIC_DNS, to set the public dns name of the master or workers

export SCALA_HOME=/home/hadoop/scala
export JAVA_HOME=/usr/java/jdk1.7.0_45
export SPARK_MASTER=hadoop186
export SPARK_MASTER_PORT=7077
export SPARK_WORKER_CORES=1
export SPARK_WORKER_INSTANCES=1
export SPARK_WORKER_MEMORY=3g
spark-defaults.conf
[hadoop@hadoop186 conf]$ cat spark-defaults.conf
# Default system properties included when running spark-submit.
# This is useful for setting default environmental settings.

# Example:
spark.master            spark://hadoop186:7077
spark.eventLog.enabled  true
#spark.eventLog.dir      hdfs://mycluster/user/hadoop/spark/logs/
spark.serializer        org.apache.spark.serializer.KryoSerializer
[hadoop@hadoop186 conf]$
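Note that spark.eventLog.dir is commented out, so event logs go to the default local location (later in the spark-shell log they end up under /tmp/spark-events). If that line is enabled to point at HDFS, the target directory should exist first; a small sketch using the path from the commented line above:

hadoop fs -mkdir /user/hadoop/spark
hadoop fs -mkdir /user/hadoop/spark/logs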

[hadoop@hadoop186 conf]$ cat slaves
# A Spark Worker will be started on each of the machines listed below.
#localhost
hadoop186
hadoop187
hadoop188
hadoop189

With configuration complete, distribute the installation to the corresponding directory on each node:

[hadoop@hadoop186 ~]$ scp -r spark/  hadoop@hadoop187:/home/hadoop/
[hadoop@hadoop186 ~]$ scp -r spark/  hadoop@hadoop188:/home/hadoop/
[hadoop@hadoop186 ~]$ scp -r spark/  hadoop@hadoop189:/home/hadoop/
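The same distribution can be scripted in one pass; a sketch assuming the worker list above:

for host in hadoop187 hadoop188 hadoop189; do
    scp -r /home/hadoop/spark/ hadoop@${host}:/home/hadoop/
done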

Once distribution is done, it is time to start the Spark cluster. Go into the spark/sbin directory, which contains the start and stop scripts:

[hadoop@hadoop186 spark]$ cd sbin/
[hadoop@hadoop186 sbin]$ ls
slaves.sh        spark-daemon.sh   spark-executor  start-history-server.sh  start-slave.sh   stop-all.sh             stop-master.sh
spark-config.sh  spark-daemons.sh  start-all.sh    start-master.sh          start-slaves.sh  stop-history-server.sh  stop-slaves.sh
[hadoop@hadoop186 sbin]$
Here we can simply run start-all.sh, which starts the master on this machine and a worker on every host listed in conf/slaves:

[hadoop@hadoop186 sbin]$ ./start-all.sh
rsync from hadoop186
rsync: change_dir "/home/hadoop/spark-1.0.0-bin-cdh4/sbin/hadoop186" failed: No such file or directory (2)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1039) [sender=3.0.6]
starting org.apache.spark.deploy.master.Master, logging to /home/hadoop/spark-1.0.0-bin-cdh4/sbin/../logs/spark-hadoop-org.apache.spark.deploy.master.Master-1-hadoop186.out
hadoop186: rsync from hadoop186
hadoop186: rsync: change_dir "/home/hadoop/spark-1.0.0-bin-cdh4/hadoop186" failed: No such file or directory (2)
hadoop187: bash: line 0: cd: /home/hadoop/spark-1.0.0-bin-cdh4/sbin/..: そのようなファイルやディレクトリはありません
hadoop189: bash: line 0: cd: /home/hadoop/spark-1.0.0-bin-cdh4/sbin/..: そのようなファイルやディレクトリはありません
hadoop188: bash: line 0: cd: /home/hadoop/spark-1.0.0-bin-cdh4/sbin/..: そのようなファイルやディレクトリはありません
hadoop189: rsync from hadoop186
hadoop187: rsync from hadoop186
hadoop189: rsync: change_dir "/home/hadoop/hadoop186" failed: No such file or directory (2)
hadoop187: rsync: change_dir "/home/hadoop/hadoop186" failed: No such file or directory (2)
hadoop188: rsync from hadoop186
hadoop188: rsync: change_dir "/home/hadoop/hadoop186" failed: No such file or directory (2)
hadoop186: rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1039) [sender=3.0.6]
hadoop186: starting org.apache.spark.deploy.worker.Worker, logging to /home/hadoop/spark-1.0.0-bin-cdh4/sbin/../logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-hadoop186.out
hadoop189: rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1039) [sender=3.0.6]
hadoop187: rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1039) [sender=3.0.6]
hadoop188: rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1039) [sender=3.0.6]
hadoop189: starting org.apache.spark.deploy.worker.Worker, logging to /home/hadoop/spark/sbin/../logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-hadoop189.out
hadoop187: starting org.apache.spark.deploy.worker.Worker, logging to /home/hadoop/spark/sbin/../logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-hadoop187.out
hadoop188: starting org.apache.spark.deploy.worker.Worker, logging to /home/hadoop/spark/sbin/../logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-hadoop188.out
[hadoop@hadoop186 sbin]$
These synchronization errors are harmless; the master and all workers still start. The "rsync from hadoop186" lines come from the export SPARK_MASTER=hadoop186 setting in spark-env.sh: the daemon scripts treat a non-empty SPARK_MASTER as an rsync source and try to sync the installation from it before starting each daemon, which fails here (the variable that actually names the standalone master host is SPARK_MASTER_IP). The cd errors on the worker nodes appear because scp placed the copy under /home/hadoop/spark there rather than under the versioned directory name used on hadoop186; the workers still come up and log under /home/hadoop/spark/logs, as the last lines show.
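A sketch of the site-specific part of spark-env.sh rewritten to avoid those rsync attempts, keeping the same hosts and sizes as above (SPARK_MASTER_IP is the variable documented in the template comments):

export SCALA_HOME=/home/hadoop/scala
export JAVA_HOME=/usr/java/jdk1.7.0_45
export SPARK_MASTER_IP=hadoop186
export SPARK_MASTER_PORT=7077
export SPARK_WORKER_CORES=1
export SPARK_WORKER_INSTANCES=1
export SPARK_WORKER_MEMORY=3g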

Verify with jps that everything started. On the master node:

[hadoop@hadoop186 sbin]$ jps
3131 NodeManager
6456 Worker
2533 DataNode
2195 QuorumPeerMain
6305 Master
3015 ResourceManager
2925 DFSZKFailoverController
2424 NameNode
6566 Jps
2243 JournalNode
On a worker node:
[hadoop@hadoop187 spark]$ jps
2357 JournalNode
2446 NameNode
2782 DFSZKFailoverController
2297 QuorumPeerMain
2517 DataNode
4549 Jps
4442 Worker
2874 NodeManager
[hadoop@hadoop187 spark]$
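To check every node in one go, something like the following works (a sketch that assumes passwordless ssh between the nodes, which the cluster scripts already rely on, and that jps is on the remote PATH):

for host in hadoop186 hadoop187 hadoop188 hadoop189; do
    echo "== ${host} =="
    ssh ${host} jps | grep -E 'Master|Worker'
done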

You can also confirm startup through the master's web UI. The default port is 8080; I changed it to 8888, so here the UI is at http://hadoop186:8888.
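The article does not show how the port was changed; one way to do it (an assumption on my part, using the SPARK_MASTER_WEBUI_PORT variable documented in the spark-env.sh template above) is to add the following line to conf/spark-env.sh before starting the master:

export SPARK_MASTER_WEBUI_PORT=8888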



There you can see the four workers running.

Running example programs:

1. Running spark-shell:

spark-shell is the interactive shell that Spark provides; in it we can use Scala to run operations against the cluster interactively.
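Launched with no arguments it picks up spark.master from conf/spark-defaults.conf, which we set above. It should also accept an explicit master URL on the command line, roughly like this (not shown in the original article):

./bin/spark-shell --master spark://hadoop186:7077

The session below was started with no arguments: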

[hadoop@hadoop186 bin]$ ./spark-shell
Spark assembly has been built with Hive, including Datanucleus jars on classpath
14/07/10 19:05:05 INFO SecurityManager: Changing view acls to: hadoop
14/07/10 19:05:05 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop)
14/07/10 19:05:05 INFO HttpServer: Starting HTTP Server
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.0.0
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_45)
Type in expressions to have them evaluated.
Type :help for more information.
14/07/10 19:05:13 INFO SecurityManager: Changing view acls to: hadoop
14/07/10 19:05:13 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop)
14/07/10 19:05:14 INFO Slf4jLogger: Slf4jLogger started
14/07/10 19:05:14 INFO Remoting: Starting remoting
14/07/10 19:05:15 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://spark@hadoop186:60596]
14/07/10 19:05:15 INFO Remoting: Remoting now listens on addresses: [akka.tcp://spark@hadoop186:60596]
14/07/10 19:05:15 INFO SparkEnv: Registering MapOutputTracker
14/07/10 19:05:15 INFO SparkEnv: Registering BlockManagerMaster
14/07/10 19:05:15 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20140710190515-7ff1
14/07/10 19:05:15 INFO MemoryStore: MemoryStore started with capacity 297.0 MB.
14/07/10 19:05:15 INFO ConnectionManager: Bound socket to port 52299 with id = ConnectionManagerId(hadoop186,52299)
14/07/10 19:05:15 INFO BlockManagerMaster: Trying to register BlockManager
14/07/10 19:05:15 INFO BlockManagerInfo: Registering block manager hadoop186:52299 with 297.0 MB RAM
14/07/10 19:05:15 INFO BlockManagerMaster: Registered BlockManager
14/07/10 19:05:15 INFO HttpServer: Starting HTTP Server
14/07/10 19:05:15 INFO HttpBroadcast: Broadcast server started at http://192.168.119.186:44673
14/07/10 19:05:15 INFO HttpFileServer: HTTP File server directory is /tmp/spark-e96f3b67-c67c-4ef4-8e83-c771b756a81f
14/07/10 19:05:15 INFO HttpServer: Starting HTTP Server
14/07/10 19:05:25 INFO SparkUI: Started SparkUI at http://hadoop186:4040
14/07/10 19:05:26 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/07/10 19:05:28 INFO EventLoggingListener: Logging events to /tmp/spark-events/spark-shell-1404986726493
14/07/10 19:05:29 INFO AppClient$ClientActor: Connecting to master spark://hadoop186:7077...
14/07/10 19:05:29 INFO SparkILoop: Created spark context..
14/07/10 19:05:30 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20140710190530-0000
14/07/10 19:05:30 INFO AppClient$ClientActor: Executor added: app-20140710190530-0000/0 on worker-20140710185222-hadoop189-38236 (hadoop189:38236) with 1 cores
14/07/10 19:05:30 INFO SparkDeploySchedulerBackend: Granted executor ID app-20140710190530-0000/0 on hostPort hadoop189:38236 with 1 cores, 512.0 MB RAM
14/07/10 19:05:30 INFO AppClient$ClientActor: Executor added: app-20140710190530-0000/1 on worker-20140710185222-hadoop187-52710 (hadoop187:52710) with 1 cores
14/07/10 19:05:30 INFO SparkDeploySchedulerBackend: Granted executor ID app-20140710190530-0000/1 on hostPort hadoop187:52710 with 1 cores, 512.0 MB RAM
14/07/10 19:05:30 INFO AppClient$ClientActor: Executor added: app-20140710190530-0000/2 on worker-20140710185222-hadoop188-33277 (hadoop188:33277) with 1 cores
14/07/10 19:05:30 INFO SparkDeploySchedulerBackend: Granted executor ID app-20140710190530-0000/2 on hostPort hadoop188:33277 with 1 cores, 512.0 MB RAM
14/07/10 19:05:30 INFO AppClient$ClientActor: Executor added: app-20140710190530-0000/3 on worker-20140710185223-hadoop186-53755 (hadoop186:53755) with 1 cores
14/07/10 19:05:30 INFO SparkDeploySchedulerBackend: Granted executor ID app-20140710190530-0000/3 on hostPort hadoop186:53755 with 1 cores, 512.0 MB RAM
Spark context available as sc.

scala> 14/07/10 19:05:30 INFO AppClient$ClientActor: Executor updated: app-20140710190530-0000/1 is now RUNNING
14/07/10 19:05:30 INFO AppClient$ClientActor: Executor updated: app-20140710190530-0000/2 is now RUNNING
14/07/10 19:05:30 INFO AppClient$ClientActor: Executor updated: app-20140710190530-0000/0 is now RUNNING
14/07/10 19:05:30 INFO AppClient$ClientActor: Executor updated: app-20140710190530-0000/3 is now RUNNING
14/07/10 19:05:34 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@hadoop189:38788/user/Executor#-1033851263] with ID 0
14/07/10 19:05:34 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@hadoop187:58944/user/Executor#-817629892] with ID 1
14/07/10 19:05:34 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@hadoop188:54395/user/Executor#1508640488] with ID 2
14/07/10 19:05:34 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@hadoop186:33332/user/Executor#1948554196] with ID 3
14/07/10 19:05:34 INFO BlockManagerInfo: Registering block manager hadoop189:54834 with 297.0 MB RAM
14/07/10 19:05:35 INFO BlockManagerInfo: Registering block manager hadoop188:58240 with 297.0 MB RAM
14/07/10 19:05:35 INFO BlockManagerInfo: Registering block manager hadoop187:50778 with 297.0 MB RAM
14/07/10 19:05:35 INFO BlockManagerInfo: Registering block manager hadoop186:59172 with 297.0 MB RAM

scala>

scala>
scala> val rdd=sc.textFile("hdfs://mycluster/user/hadoop/logs/localhost_access_log.2014-04-25.txt")
14/07/10 19:06:40 INFO MemoryStore: ensureFreeSpace(114743) called with curMem=0, maxMem=311387750
14/07/10 19:06:40 INFO MemoryStore: Block broadcast_0 stored as values to memory (estimated size 112.1 KB, free 296.9 MB)
rdd: org.apache.spark.rdd.RDD[String] = MappedRDD[1] at textFile at <console>:12

scala> rdd.cache()
res0: rdd.type = MappedRDD[1] at textFile at <console>:12

scala> val wordcount=rdd.flatMap(_.split(" ")).map(x=>(x,1)).reduceByKey(_+_)
14/07/10 19:06:59 INFO FileInputFormat: Total input paths to process : 1
wordcount: org.apache.spark.rdd.RDD[(String, Int)] = MapPartitionsRDD[6] at reduceByKey at <console>:14
scala> wordcount.take(10)
14/07/10 19:07:32 INFO SparkContext: Starting job: take at <console>:17
14/07/10 19:07:32 INFO DAGScheduler: Registering RDD 4 (reduceByKey at <console>:14)
14/07/10 19:07:32 INFO DAGScheduler: Got job 0 (take at <console>:17) with 1 output partitions (allowLocal=true)
14/07/10 19:07:32 INFO DAGScheduler: Final stage: Stage 0(take at <console>:17)
14/07/10 19:07:32 INFO DAGScheduler: Parents of final stage: List(Stage 1)
14/07/10 19:07:32 INFO DAGScheduler: Missing parents: List(Stage 1)
14/07/10 19:07:32 INFO DAGScheduler: Submitting Stage 1 (MapPartitionsRDD[4] at reduceByKey at <console>:14), which has no missing parents
14/07/10 19:07:32 INFO DAGScheduler: Submitting 2 missing tasks from Stage 1 (MapPartitionsRDD[4] at reduceByKey at <console>:14)
14/07/10 19:07:32 INFO TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
14/07/10 19:07:32 INFO TaskSetManager: Starting task 1.0:0 as TID 0 on executor 3: hadoop186 (NODE_LOCAL)
14/07/10 19:07:32 INFO TaskSetManager: Serialized task 1.0:0 as 2140 bytes in 11 ms
14/07/10 19:07:32 INFO TaskSetManager: Starting task 1.0:1 as TID 1 on executor 2: hadoop188 (NODE_LOCAL)
14/07/10 19:07:32 INFO TaskSetManager: Serialized task 1.0:1 as 2140 bytes in 0 ms
14/07/10 19:07:45 INFO BlockManagerInfo: Added rdd_1_0 in memory on hadoop186:59172 (size: 153.2 KB, free: 296.8 MB)
14/07/10 19:07:45 INFO BlockManagerInfo: Added rdd_1_1 in memory on hadoop188:58240 (size: 152.1 KB, free: 296.8 MB)
14/07/10 19:07:46 INFO DAGScheduler: Completed ShuffleMapTask(1, 0)
14/07/10 19:07:46 INFO TaskSetManager: Finished TID 0 in 13706 ms on hadoop186 (progress: 1/2)
14/07/10 19:07:46 INFO DAGScheduler: Completed ShuffleMapTask(1, 1)
14/07/10 19:07:46 INFO DAGScheduler: Stage 1 (reduceByKey at <console>:14) finished in 13.865 s
14/07/10 19:07:46 INFO DAGScheduler: looking for newly runnable stages
14/07/10 19:07:46 INFO DAGScheduler: running: Set()
14/07/10 19:07:46 INFO DAGScheduler: waiting: Set(Stage 0)
14/07/10 19:07:46 INFO DAGScheduler: failed: Set()
14/07/10 19:07:46 INFO TaskSetManager: Finished TID 1 in 13820 ms on hadoop188 (progress: 2/2)
14/07/10 19:07:46 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 
14/07/10 19:07:46 INFO DAGScheduler: Missing parents for Stage 0: List()
14/07/10 19:07:46 INFO DAGScheduler: Submitting Stage 0 (MapPartitionsRDD[6] at reduceByKey at <console>:14), which is now runnable
14/07/10 19:07:46 INFO DAGScheduler: Submitting 1 missing tasks from Stage 0 (MapPartitionsRDD[6] at reduceByKey at <console>:14)
14/07/10 19:07:46 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
14/07/10 19:07:46 INFO TaskSetManager: Starting task 0.0:0 as TID 2 on executor 3: hadoop186 (PROCESS_LOCAL)
14/07/10 19:07:46 INFO TaskSetManager: Serialized task 0.0:0 as 1995 bytes in 0 ms
14/07/10 19:07:46 INFO MapOutputTrackerMasterActor: Asked to send map output locations for shuffle 0 to spark@hadoop186:54905
14/07/10 19:07:46 INFO MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 148 bytes
14/07/10 19:07:46 INFO DAGScheduler: Completed ResultTask(0, 0)
14/07/10 19:07:46 INFO DAGScheduler: Stage 0 (take at <console>:17) finished in 0.301 s
14/07/10 19:07:46 INFO TaskSetManager: Finished TID 2 in 310 ms on hadoop186 (progress: 1/1)
14/07/10 19:07:46 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
14/07/10 19:07:46 INFO SparkContext: Job finished: take at <console>:17, took 14.348482699 s
res1: Array[(String, Int)] = Array(([25/Apr/2014:17:59:26,4), (7377,1), ([25/Apr/2014:18:10:08,8), ([25/Apr/2014:17:02:48,1), (/detail.action?orderNo=77&userId=bhh,1), ([25/Apr/2014:17:29:36,3), (6646,1), ([25/Apr/2014:17:23:32,32), ([25/Apr/2014:17:44:26,1), (16805,1))
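A short follow-up that could be run in the same session (not in the original article) to look at the most frequent tokens and to persist the result, using only RDD operations available in Spark 1.0; the output path is hypothetical:

scala> // swap to (count, word), sort by count descending, and take the top 10
scala> wordcount.map(_.swap).sortByKey(false).take(10)

scala> // write the full counts back to HDFS (hypothetical path)
scala> wordcount.saveAsTextFile("hdfs://mycluster/user/hadoop/spark/wordcount-out")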

While the job runs, its progress can also be followed in the web UI (the application UI that the startup log above reports at http://hadoop186:4040).


