spark-shell advanced operations
Source: Internet  Category: writing game code in Java  Editor: 程序博客网  Time: 2024/06/05 07:13
1. System environment
spark-shell banner: Spark version 1.2.0, using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_71)
hadoop-2.6.0
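2. Dataset
Taobao transaction records that I crawled myself. Each line has six comma-separated fields: buyer nickname (匿名 = anonymous), unit price, quantity, transaction time, product option (口味:香辣 = flavor: spicy), and item URL, whose id= parameter identifies the item. Below is a sample taken from the end of the file on HDFS; the first line is cut off because -tail prints only the last kilobyte of the file.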
[hadoop@localhost conf]$ $HADOOP_INSTALL/bin/hadoop dfs -tail /user/spark/quikstart/DealRecord.csv
8 09:05:07,口味:香辣,http://item.taobao.com/item.htm?id=36296603821
四**a (匿名),¥1.8,10,2014-12-08 08:54:27,口味:香辣,http://item.taobao.com/item.htm?id=36296603821
t**2 (匿名),¥1.79,10,2014-12-08 08:52:26,口味:香辣,http://item.taobao.com/item.htm?id=36296603821
浅**8 (匿名),¥1.74,6,2014-12-08 08:19:46,口味:香辣,http://item.taobao.com/item.htm?id=36296603821
t**3 (匿名),¥1.8,11,2014-12-08 08:16:15,口味:香辣,http://item.taobao.com/item.htm?id=36296603821
z**5 (匿名),¥1.71,5,2014-12-08 07:59:42,口味:香辣,http://item.taobao.com/item.htm?id=36296603821
t**d (匿名),¥1.84,22,2014-12-08 04:12:43,口味:香辣,http://item.taobao.com/item.htm?id=36296603821
z**u (匿名),¥1.77,8,2014-12-08 02:41:12,口味:香辣,http://item.taobao.com/item.htm?id=36296603821
z**j (匿名),¥1.59,3,2014-12-08 01:45:56,口味:香辣,http://item.taobao.com/item.htm?id=36296603821
w**y (匿名),¥1.8,10,2014-12-08 01:42:36,口味:香辣,http://item.taobao.com/item.htm?id=36296603821
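To make the field layout concrete, here is a minimal Scala sketch that parses one record into a case class. The names `Deal` and `DealParser` are hypothetical, and the field order, the comma separator and the `¥` price prefix are assumptions read off the sample above; a buyer nickname that itself contains a comma would break this simple split.

```scala
// Hypothetical record type; field order inferred from the sample lines above.
case class Deal(buyer: String, price: BigDecimal, quantity: Int,
                time: String, option: String, itemId: String)

object DealParser {
  // Returns None for lines without the six expected fields or with
  // unparseable numbers, e.g. the truncated first line printed by -tail.
  def parse(line: String): Option[Deal] = {
    val f = line.split(",")
    if (f.length != 6 || !f(5).contains("id=")) None
    else scala.util.Try {
      Deal(
        buyer = f(0),
        price = BigDecimal(f(1).stripPrefix("¥")), // assumes a "¥" prefix
        quantity = f(2).trim.toInt,
        time = f(3),
        option = f(4),
        itemId = f(5).split("id=")(1)              // text after "id=" in the URL
      )
    }.toOption
  }
}
```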
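3.1 Task 1: for each item, compute the total quantity sold on 2014-12-08, the day the data was crawled. Each ID identifies exactly one item, so the result should be pairs of the form (ID, total quantity sold).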
Load the file, keep the 2014-12-08 records (the "http" check skips lines without an item URL), map each record to (item ID, quantity), and sum the quantities per ID with reduceByKey:

scala> val dealRecord = sc.textFile("/user/spark/quikstart/DealRecord.csv")
15/02/03 10:59:38 INFO storage.MemoryStore: ensureFreeSpace(81443) called with curMem=195552, maxMem=280248975
15/02/03 10:59:38 INFO storage.MemoryStore: Block broadcast_4 stored as values in memory (estimated size 79.5 KB, free 267.0 MB)
15/02/03 10:59:39 INFO storage.MemoryStore: ensureFreeSpace(31329) called with curMem=276995, maxMem=280248975
15/02/03 10:59:39 INFO storage.MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 30.6 KB, free 267.0 MB)
15/02/03 10:59:39 INFO storage.BlockManagerInfo: Added broadcast_4_piece0 in memory on localhost:59773 (size: 30.6 KB, free: 267.2 MB)
15/02/03 10:59:39 INFO storage.BlockManagerMaster: Updated info of block broadcast_4_piece0
15/02/03 10:59:39 INFO spark.SparkContext: Created broadcast 4 from textFile at <console>:12
dealRecord: org.apache.spark.rdd.RDD[String] = /user/spark/quikstart/DealRecord.csv MappedRDD[4] at textFile at <console>:12

scala> val dealRecord20141208 = dealRecord.filter(line => line.contains("2014-12-08") && line.contains("http"))
dealRecord20141208: org.apache.spark.rdd.RDD[String] = FilteredRDD[5] at filter at <console>:14

scala> val dealQuantityPairs = dealRecord20141208.map(line => (line.split(",")(5).split("id=")(1), line.split(",")(2).toInt))

scala> val dealQuantitySum = dealQuantityPairs.reduceByKey(_ + _)

textFile, filter, map and reduceByKey are all lazy transformations: no Spark job actually runs until an action such as collect() or saveAsTextFile() is called on the result.
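The RDD pipeline above can be checked without a cluster by running the same steps on a local Scala collection. This is only a sketch: `DailyItemTotals` is a hypothetical name, and `groupBy` followed by a per-key `sum` stands in for `reduceByKey(_ + _)`, which Spark evaluates in a distributed way, combining values on each partition before the shuffle.

```scala
// Local, Spark-free sketch of Task 1 using plain Scala collections.
object DailyItemTotals {
  def totals(lines: Seq[String], day: String): Map[String, Int] =
    lines
      .filter(line => line.contains(day) && line.contains("http"))
      .map { line =>
        val f = line.split(",")            // split each record once
        (f(5).split("id=")(1), f(2).toInt) // (item ID, quantity)
      }
      .groupBy(_._1)                       // group the pairs by item ID
      .map { case (id, pairs) => id -> pairs.map(_._2).sum }
}
```

Applied to just the nine complete 2014-12-08 records in the sample, all of which belong to item 36296603821, it returns Map(36296603821 -> 85). In spark-shell itself, calling an action such as `collect()` on the reduced RDD brings the (ID, total) pairs back to the driver.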
4. Tip: Scala and Java interoperability
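Java classes such as java.util.Date and java.text.SimpleDateFormat can be imported and used directly in the Scala shell:

scala> import java.util.Date
import java.util.Date

scala> val now = new Date
now: java.util.Date = Tue Feb 03 15:40:28 CST 2015

scala> import java.text.SimpleDateFormat
import java.text.SimpleDateFormat

scala> val pattern = "yyyy-mm-dd HH:MM:ss"
pattern: String = yyyy-mm-dd HH:MM:ss

scala> val sformat = new SimpleDateFormat(pattern)
sformat: java.text.SimpleDateFormat = java.text.SimpleDateFormat@59bf79a0

scala> sformat.format(now)
res28: String = 2015-40-03 15:02:28

scala> val fnow = sformat.format(now)
fnow: String = 2015-40-03 15:02:28

Note the bug in this pattern: in SimpleDateFormat, lowercase mm means minutes and uppercase MM means month, so "yyyy-mm-dd HH:MM:ss" prints the minutes (40) where the month belongs and the month (02) where the minutes belong. The intended pattern is "yyyy-MM-dd HH:mm:ss", which gives 2015-02-03 15:40:28 for the same Date.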
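The same Java classes can extract the day from the transaction-time field (with `MM` for month and `mm` for minutes), which is sturdier than matching the substring "2014-12-08" anywhere in a line. A sketch follows; `DealDate` is a hypothetical name and the Asia/Shanghai time zone is an assumption about the data. A new SimpleDateFormat is built per call because the class is not thread-safe, which can matter once such code runs inside Spark tasks.

```scala
import java.text.SimpleDateFormat
import java.util.{Date, TimeZone}

// Hypothetical helper; the Asia/Shanghai time zone is an assumption.
object DealDate {
  private val zone = TimeZone.getTimeZone("Asia/Shanghai")

  // Build a fresh formatter per call: SimpleDateFormat is not thread-safe.
  private def fmt(pattern: String): SimpleDateFormat = {
    val f = new SimpleDateFormat(pattern)
    f.setTimeZone(zone)
    f.setLenient(false) // when parsing, reject out-of-range fields instead of rolling them over
    f
  }

  // Parse a transaction time such as "2014-12-08 08:54:27" (MM = month, mm = minutes).
  def parse(timestamp: String): Date = fmt("yyyy-MM-dd HH:mm:ss").parse(timestamp)

  def format(date: Date, pattern: String): String = fmt(pattern).format(date)

  // "2014-12-08 08:54:27" -> "2014-12-08"
  def day(timestamp: String): String = format(parse(timestamp), "yyyy-MM-dd")
}
```

With such a helper, the Task 1 filter could compare the day of the fourth field against "2014-12-08" instead of using contains.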