Running the Spark Shell on a Single Machine
1 Download spark-2.1.0-bin-hadoop2.7.tgz
http://spark.apache.org/downloads.html
2 Extract the archive
[root@sk1 ~]# tar -zxvf spark-2.1.0-bin-hadoop2.7.tgz -C /opt
3 Change into the Spark home directory
[root@sk1 ~]# cd /opt/spark-2.1.0-bin-hadoop2.7/
[root@sk1 spark-2.1.0-bin-hadoop2.7]# ls
bin   derby.log  LICENSE       NOTICE  README.md  yarn
conf  examples   licenses      python  RELEASE
data  jars       metastore_db  R       sbin
4 Run bin/spark-shell
[root@sk1 spark-2.1.0-bin-hadoop2.7]# bin/spark-shell
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/04/07 22:41:20 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/04/07 22:41:32 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
17/04/07 22:41:33 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
17/04/07 22:41:34 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Spark context Web UI available at http://192.168.11.138:4040
Spark context available as 'sc' (master = local[*], app id = local-1491619281633).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.1.0
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_112)
Type in expressions to have them evaluated.
Type :help for more information.

scala>
5 A simple interactive session
scala> val rdd1=sc.parallelize(1 to 100,5)
rdd1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:24

scala> rdd1.count
res0: Long = 100

scala> val rdd2=rdd1.map(_+4)
rdd2: org.apache.spark.rdd.RDD[Int] = MapPartitionsRDD[1] at map at <console>:26

scala> rdd2.take(2)
res1: Array[Int] = Array(5, 6)
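The session above shows Spark's lazy/eager split: parallelize and map only build a plan, while actions like count and take actually compute. The same shape can be mimicked on plain Python iterables — a rough analogue for intuition only, not Spark itself:

```python
import itertools

# sc.parallelize(1 to 100, 5) ~ a plain list (partitioning omitted).
data = list(range(1, 101))

# rdd1.count ~ an action: counts the elements.
count = len(data)  # 100

# rdd1.map(_ + 4) ~ a lazy generator; like Spark's map, nothing
# is computed until an action consumes it.
mapped = (x + 4 for x in data)

# rdd2.take(2) ~ pull just the first two elements, forcing evaluation.
first_two = list(itertools.islice(mapped, 2))  # [5, 6]
```

As in the shell transcript, only the two "actions" (len and islice) touch the data; the generator itself does no work when defined.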
6 WordCount
6.1 Prepare the data
[root@sk1 ~]# vi /tmp/wordcount.txt
[root@sk1 ~]# cat /tmp/wordcount.txt
zookeeper hadoop hdfs yarn hive hbase spark
hello world
hello bigdata
6.2 The program
scala> val rdd=sc.textFile("file:///tmp/wordcount.txt")
rdd: org.apache.spark.rdd.RDD[String] = file:///tmp/wordcount.txt MapPartitionsRDD[3] at textFile at <console>:24

scala> rdd.count
res2: Long = 3

scala> val mapRdd=rdd.flatMap(_.split(" "))
mapRdd: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[4] at flatMap at <console>:26

scala> mapRdd.first
res3: String = zookeeper

scala> val kvRdd=mapRdd.map(x=>(x,1))
kvRdd: org.apache.spark.rdd.RDD[(String, Int)] = MapPartitionsRDD[5] at map at <console>:28

scala> kvRdd.first
res4: (String, Int) = (zookeeper,1)

scala> kvRdd.take(2)
res5: Array[(String, Int)] = Array((zookeeper,1), (hadoop,1))

scala> val rsRdd=kvRdd.reduceByKey(_+_)
rsRdd: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[6] at reduceByKey at <console>:30

scala> rsRdd.take(2)
res6: Array[(String, Int)] = Array((spark,1), (hive,1))

scala> rsRdd.saveAsTextFile("file:///tmp/output")
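Every step of the WordCount pipeline has a direct counterpart on ordinary collections. A plain-Python sketch of the same flatMap / map / reduceByKey flow (not Spark; the input lines are the ones from section 6.1):

```python
lines = [
    "zookeeper hadoop hdfs yarn hive hbase spark",
    "hello world",
    "hello bigdata",
]

# flatMap(_.split(" ")) ~ split each line into words and flatten.
words = [w for line in lines for w in line.split(" ")]

# map(x => (x, 1)) ~ pair each word with the count 1.
pairs = [(w, 1) for w in words]

# reduceByKey(_ + _) ~ sum the 1s per distinct key.
counts = {}
for word, n in pairs:
    counts[word] = counts.get(word, 0) + n

# counts["hello"] == 2; every other word appears once.
```

The difference in Spark is that reduceByKey shuffles pairs across partitions so that equal keys meet, which is why rsRdd shows up as a ShuffledRDD in the transcript.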
6.3 View the results
[root@sk1 ~]# ls /tmp/output/
part-00000  _SUCCESS
[root@sk1 ~]# cat /tmp/output/part-00000
(spark,1)
(hive,1)
(hadoop,1)
(bigdata ,1)
(zookeeper,1)
(hello,2)
(yarn,1)
(hdfs,1)
(hbase,1)
(world,1)
[root@sk1 ~]#
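saveAsTextFile writes each tuple's string form, one "(word,count)" pair per line, into one part file per partition. If you later want the counts back as a map, those lines have to be parsed again — a small illustrative parser, assuming exactly the line format shown above:

```python
def parse_part_file(lines):
    """Parse Spark '(word,count)' text-output lines into a dict."""
    counts = {}
    for line in lines:
        body = line.strip()[1:-1]          # drop the surrounding parentheses
        word, _, n = body.rpartition(",")  # split on the last comma only
        counts[word] = int(n)
    return counts

sample = ["(spark,1)", "(hello,2)", "(world,1)"]
result = parse_part_file(sample)  # {'spark': 1, 'hello': 2, 'world': 1}
```

For anything beyond a quick look, a structured format (e.g. saveAsObjectFile, or DataFrames with CSV/Parquet writers) avoids this round-trip through tuple toString output entirely.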