A First Taste of spark-shell
Source: Internet | Site: 程序博客网 | Date: 2024/05/02 02:45
1. Copy the file to HDFS:
hadoop@Mhadoop:/usr/local/hadoop$ bin/hdfs dfs -mkdir /user
hadoop@Mhadoop:/usr/local/hadoop$ bin/hdfs dfs -mkdir /user/hadoop
hadoop@Mhadoop:/usr/local/hadoop$ bin/hdfs dfs -copyFromLocal /usr/local/spark/spark-1.3.1-bin-hadoop2.4/README.md /user/hadoop/
2. Start spark-shell
3. Read the file and count the lines containing the word "spark"
scala> sc
res0: org.apache.spark.SparkContext = org.apache.spark.SparkContext@472ac3d3
scala> val file = sc.textFile("hdfs://Mhadoop:9000/user/hadoop/README.md")
file: org.apache.spark.rdd.RDD[String] = hdfs://Mhadoop:9000/user/hadoop/README.md MapPartitionsRDD[1] at textFile at <console>:21
The file variable is a MapPartitionsRDD. Note that textFile is lazy: nothing is actually read from HDFS until an action runs. Next, filter for the lines that contain "spark":
scala> val sparks = file.filter(line => line.contains("spark"))
sparks: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[2] at filter at <console>:23
Count the matching lines; the result is 11:
scala> sparks.count
Open another terminal and verify with the wc command that ships with Ubuntu (wc prints line, word, and byte counts):
hadoop@Mhadoop:/usr/local/spark/spark-1.3.1-bin-hadoop2.4$ grep spark README.md|wc
11 50 761
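The three numbers printed by wc are the line, word, and byte counts of the matching lines. The same check can be sketched in plain Python (the sample text below is made up for illustration; on the real README.md the line count would be 11):

```python
# Minimal sketch of the `grep spark | wc` pipeline in Python.
# The sample text is hypothetical, not the real README.md contents.
sample = """Apache Spark is a fast engine
run spark-shell to get a REPL
this line has no match
spark supports Scala and Python
"""

# grep is case-sensitive, so "Spark" does not match "spark".
matching = [line for line in sample.splitlines() if "spark" in line]

lines = len(matching)                          # like the first wc column (lines)
words = sum(len(l.split()) for l in matching)  # like the second column (words)
chars = sum(len(l) + 1 for l in matching)      # like the third column (+1 per newline)

print(lines, words, chars)
```

This is the same predicate the Scala filter above applies, just evaluated eagerly instead of lazily over an RDD.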
4. Cache the RDD and check the speedup
scala> sparks.cache
res3: sparks.type = MapPartitionsRDD[2] at filter at <console>:23
Open the Spark web UI at http://192.168.85.10:4040/stages/. Note that cache is also lazy: the RDD is only materialized in memory the next time an action such as count runs. After caching, the stage duration drops from seconds to milliseconds, a clear performance gain.
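The reason cache helps is that, without it, every action recomputes the RDD from its lineage, re-reading the file from HDFS and re-filtering it. The effect is analogous to memoizing an expensive function. The sketch below is only an analogy in Python, not Spark code; the 0.2s sleep and the result 11 are stand-ins for the real recomputation:

```python
import functools
import time

@functools.lru_cache(maxsize=None)
def filtered_line_count(path):
    # Stand-in for re-reading the file and re-running the filter;
    # the sleep simulates the expensive first computation.
    time.sleep(0.2)
    return 11  # hypothetical match count, mirroring the example above

t0 = time.perf_counter()
filtered_line_count("README.md")   # first call does the work
first = time.perf_counter() - t0

t0 = time.perf_counter()
filtered_line_count("README.md")   # second call is served from the cache
second = time.perf_counter() - t0

print(f"first={first:.3f}s second={second:.6f}s")
```

In the same way, the first count after sparks.cache still pays the full cost; only subsequent actions hit the in-memory copy.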