A Scala data-generation script for Spark testing
Having no data on hand to test Spark with, I wrote a small Scala program to generate 100 GB or more of data for Spark SQL testing. I reused a data structure from an interview question I once got at a media/entertainment company. The structure has six fields: DataStructure("ID","Username","Userage","PhoneType","Click","LoginTime")
Data preview:
1,Role97,16,MI,13,2016-11-21
2,Role42,30,Meizu,15,2016-5-12
3,Role87,41,Apple,14,2016-3-5
4,Role59,21,Oppo,2,2016-3-8
5,Role26,54,MI,3,2016-4-23
6,Role27,32,Huawei,2,2016-3-18
7,Role22,15,Oppo,10,2016-5-12
8,Role64,31,Samsung,11,2016-10-29
9,Role7,46,Lenovo,5,2016-10-7
10,Role50,37,Nokia,5,2016-10-30
11,Role30,64,Samsung,9,2016-10-7
12,Role27,54,Samsung,5,2016-3-8
13,Role3,37,Samsung,4,2016-5-9
14,Role84,66,Meizu,5,2016-6-11
15,Role48,25,Oppo,0,2016-8-0
16,Role92,29,Meizu,5,2016-2-17
17,Role77,85,Oppo,7,2016-8-13
18,Role67,85,Samsung,4,2016-10-27
19,Role41,16,Nokia,13,2016-6-12
20,Role0,42,Apple,5,2016-10-18
21,Role64,85,Oppo,4,2016-2-11
22,Role27,85,Samsung,6,2016-1-10
23,Role84,59,Apple,17,2016-8-15
24,Role26,52,Samsung,0,2016-7-19
25,Role27,59,Meizu,8,2016-12-3
26,Role52,56,Apple,2,2016-12-20
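Each preview line maps onto the six-field structure described above. As a minimal sketch (the case class name and field types here are my assumptions, not part of the original script), one line can be modeled and parsed like this:

```scala
// Hypothetical model of one generated line; the case class name and
// field types are assumed, following the DataStructure layout above.
case class Record(id: Long, username: String, userage: Int,
                  phoneType: String, click: Int, loginTime: String)

// Split one comma-separated line into a Record.
def parseLine(line: String): Record = {
  val f = line.split(",")
  Record(f(0).toLong, f(1), f(2).toInt, f(3), f(4).toInt, f(5))
}
```

For example, `parseLine("1,Role97,16,MI,13,2016-11-21")` returns `Record(1,Role97,16,MI,13,2016-11-21)`.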
The code:
package main.scala.CreateData

import java.io.FileWriter

import scala.util.Random

/**
 * Created by Zhao Qiang on 2016/12/8.
 */
object DataCreater {
  private val datapath = "E://platform.txt"
  private val max_records = 100
  private val age = 70
  private val brand = Array("Huawei", "MI", "Apple", "Samsung", "Meizu", "Lenovo", "Oppo", "Nokia")

  // Generate max_records lines and append them to datapath.
  def create(): Unit = {
    val rand = new Random()
    val writer = new FileWriter(datapath, true)
    try {
      for (i <- 1 to max_records) {
        // Random age; anything under 15 is remapped to 85 (age + 15),
        // which is why 85 shows up repeatedly in the preview data.
        var dataage = rand.nextInt(age)
        if (dataage < 15) dataage = age + 15
        // Random phone brand.
        val phonePlus = brand(rand.nextInt(brand.length))
        // Random click count, 0-19.
        val clicks = rand.nextInt(20)
        // Random username, Role0 through Role99.
        val name = "Role" + rand.nextInt(100)
        // Random date in 2016. The day is drawn from 1-31 regardless of
        // month, so a few combinations (e.g. 2016-2-31) can be invalid.
        val months = rand.nextInt(12) + 1
        val logintime = "2016-" + months + "-" + (rand.nextInt(31) + 1)
        // DataStructure("ID","Username","Userage","PhoneType","Click","LoginTime")
        writer.write(i + "," + name + "," + dataage + "," + phonePlus + "," + clicks + "," + logintime)
        writer.write(System.lineSeparator())
      }
    } finally {
      writer.flush()
      writer.close()
    }
  }

  def main(args: Array[String]): Unit = {
    create()
  }
}
Set max_records to the number of records you want, and the script will generate a dataset of that size.
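To hit a target file size, such as the 100 GB mentioned at the start, max_records can be estimated from the average line length. A minimal sketch, assuming roughly 30 bytes per line including the line separator (an eyeballed figure from the preview lines, not a measured one):

```scala
// Records needed to reach targetBytes, given an assumed average
// number of bytes per line (separator included).
def recordsForSize(targetBytes: Long, avgBytesPerLine: Int = 30): Long =
  targetBytes / avgBytesPerLine

// Roughly how many records for 100 GB at ~30 bytes per line.
val forHundredGB = recordsForSize(100L * 1024 * 1024 * 1024)
```

In practice you would generate a small sample first, measure the actual average line length, and recompute.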