A Scala data-generation script for Spark testing
Having no data on hand to test Spark with, I wrote a small Scala program to generate 100 GB or more of data for Spark SQL testing. I reused a data structure from an interview question I once got at a media/entertainment company. The structure has six fields: DataStructure("ID","Username","Userage","PhoneType","Click","LoginTime")
Data preview:
1,Role97,16,MI,13,2016-11-21
2,Role42,30,Meizu,15,2016-5-12
3,Role87,41,Apple,14,2016-3-5
4,Role59,21,Oppo,2,2016-3-8
5,Role26,54,MI,3,2016-4-23
6,Role27,32,Huawei,2,2016-3-18
7,Role22,15,Oppo,10,2016-5-12
8,Role64,31,Samsung,11,2016-10-29
9,Role7,46,Lenovo,5,2016-10-7
10,Role50,37,Nokia,5,2016-10-30
11,Role30,64,Samsung,9,2016-10-7
12,Role27,54,Samsung,5,2016-3-8
13,Role3,37,Samsung,4,2016-5-9
14,Role84,66,Meizu,5,2016-6-11
15,Role48,25,Oppo,0,2016-8-0
16,Role92,29,Meizu,5,2016-2-17
17,Role77,85,Oppo,7,2016-8-13
18,Role67,85,Samsung,4,2016-10-27
19,Role41,16,Nokia,13,2016-6-12
20,Role0,42,Apple,5,2016-10-18
21,Role64,85,Oppo,4,2016-2-11
22,Role27,85,Samsung,6,2016-1-10
23,Role84,59,Apple,17,2016-8-15
24,Role26,52,Samsung,0,2016-7-19
25,Role27,59,Meizu,8,2016-12-3
26,Role52,56,Apple,2,2016-12-20
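Each preview line maps onto the six-field structure described above. As a minimal sketch (the case class name and field types here are my assumptions, not part of the original script), one line can be modeled and parsed like this:

```scala
// Hypothetical model of one generated line; the case class name and
// field types are assumed, following the DataStructure layout above.
case class Record(id: Long, username: String, userage: Int,
                  phoneType: String, click: Int, loginTime: String)

// Split one comma-separated line into a Record.
def parseLine(line: String): Record = {
  val f = line.split(",")
  Record(f(0).toLong, f(1), f(2).toInt, f(3), f(4).toInt, f(5))
}
```

For example, `parseLine("1,Role97,16,MI,13,2016-11-21")` returns `Record(1,Role97,16,MI,13,2016-11-21)`.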
The code:
package main.scala.CreateData

import java.io.FileWriter

import scala.util.Random

/**
 * Created by Zhao Qiang on 2016/12/8.
 */
object DataCreater {
  private val datapath = "E://platform.txt"
  private val max_records = 100
  private val age = 70
  private val brand = Array("Huawei", "MI", "Apple", "Samsung", "Meizu", "Lenovo", "Oppo", "Nokia")

  // Generate max_records lines and append them to datapath.
  def create(): Unit = {
    val rand = new Random()
    val writer = new FileWriter(datapath, true)
    try {
      for (i <- 1 to max_records) {
        // Random age; anything under 15 is remapped to 85 (age + 15),
        // which is why 85 shows up repeatedly in the preview data.
        var dataage = rand.nextInt(age)
        if (dataage < 15) dataage = age + 15
        // Random phone brand.
        val phonePlus = brand(rand.nextInt(brand.length))
        // Random click count, 0-19.
        val clicks = rand.nextInt(20)
        // Random username, Role0 through Role99.
        val name = "Role" + rand.nextInt(100)
        // Random date in 2016. The day is drawn from 1-31 regardless of
        // month, so a few combinations (e.g. 2016-2-31) can be invalid.
        val months = rand.nextInt(12) + 1
        val logintime = "2016-" + months + "-" + (rand.nextInt(31) + 1)
        // DataStructure("ID","Username","Userage","PhoneType","Click","LoginTime")
        writer.write(i + "," + name + "," + dataage + "," + phonePlus + "," + clicks + "," + logintime)
        writer.write(System.lineSeparator())
      }
    } finally {
      writer.flush()
      writer.close()
    }
  }

  def main(args: Array[String]): Unit = {
    create()
  }
}
Set max_records to the number of records you want, and the script will generate a dataset of that size.
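To hit a target file size, such as the 100 GB mentioned at the start, max_records can be estimated from the average line length. A minimal sketch, assuming roughly 30 bytes per line including the line separator (an eyeballed figure from the preview lines, not a measured one):

```scala
// Records needed to reach targetBytes, given an assumed average
// number of bytes per line (separator included).
def recordsForSize(targetBytes: Long, avgBytesPerLine: Int = 30): Long =
  targetBytes / avgBytesPerLine

// Roughly how many records for 100 GB at ~30 bytes per line.
val forHundredGB = recordsForSize(100L * 1024 * 1024 * 1024)
```

In practice you would generate a small sample first, measure the actual average line length, and recompute.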