PySpark vs. Spark pipe performance comparison: test programs
Source: Internet · Editor: 程序博客网 · Published: 2024/05/21 17:01
Generating the test data (9,999,999 identical lines):

// Generate the test data file
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;

public class Main {
    public static void main(String[] args) throws IOException {
        File file = new File("/home/gt/testdata.dat");
        file.delete();
        file.createNewFile();
        OutputStream out = new FileOutputStream(file);
        OutputStreamWriter osw = new OutputStreamWriter(out);
        BufferedWriter writer = new BufferedWriter(osw);
        for (int i = 0; i < 9999999; i++) {
            writer.write("aaabbbcccdddeee");
            writer.newLine();
        }
        writer.close();
        osw.close();
        out.close();
    }
}
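If you want to skip the Java compile step, the same file can be produced with a few lines of Python. This is a sketch, not part of the original article: the path and the shrunken line count here are illustrative (the article writes 9999999 lines to /home/gt/testdata.dat).

```python
import os
import tempfile

# Write N copies of the test line, one per line.
# N is shrunk for the demo; the article uses 9999999.
N = 1000
path = os.path.join(tempfile.gettempdir(), "testdata_demo.dat")
with open(path, "w") as f:
    for _ in range(N):
        f.write("aaabbbcccdddeee\n")

# Read it back to confirm the shape of the data.
with open(path) as f:
    lines = f.read().splitlines()
print(len(lines))  # -> 1000
```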
The pipe script:
#!/usr/bin/python
# coding=utf-8
import sys

# Remove repeated characters from a line, keeping first occurrences.
def fff(line):
    seen = set()
    out = []
    for c in line:          # iterate over every character; the original
        if c not in seen:   # range(0, length-1) dropped the last one
            out.append(c)
            seen.add(c)
    return "".join(out)

# Spark's pipe() feeds each partition's elements to stdin, one per line.
# Reading until EOF is safer than the original fixed-count raw_input()
# loop, which raises EOFError once a partition's input is exhausted.
for line in sys.stdin:
    line = line.rstrip("\n")
    if line == "":
        break
    print fff(line)
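All three variants implement the same character-dedup logic, so it can be sanity-checked standalone before wiring it into Spark. A minimal sketch (written for Python 3, unlike the Python 2 pipe script above; the function name dedup_chars is ours, not the article's):

```python
def dedup_chars(line):
    # Keep the first occurrence of each character, preserving order.
    seen = set()
    out = []
    for c in line:  # note: iterate over *all* characters
        if c not in seen:
            out.append(c)
            seen.add(c)
    return "".join(out)

print(dedup_chars("aaabbbcccdddeee"))  # -> abcde
```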
The Spark driver that invokes it:

package test

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object PipeTest {
  def main(args: Array[String]) {
    val t0 = System.currentTimeMillis()
    val sparkConf = new SparkConf().setAppName("pipe Test")
    val sc = new SparkContext(sparkConf)
    val a = sc.textFile("/home/gt/testdata.dat", 9)
    a.pipe("/home/gt/spark/bin/pipe.py").saveAsTextFile("/home/gt/output.dat")
    sc.stop()
    println("!!!!!!!!!" + (System.currentTimeMillis() - t0))
  }
}
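RDD.pipe() launches the given command once per partition, writes that partition's elements to the child's stdin one per line, and turns each line of the child's stdout into an output element. The mechanism can be sketched outside Spark with a plain subprocess; the inline child script here is just a stand-in for pipe.py, not the article's code:

```python
import subprocess
import sys

# Child process: uppercase each stdin line (stand-in for pipe.py).
child = "import sys\nfor line in sys.stdin:\n    print(line.strip().upper())"

partition = ["aaabbb", "cccddd"]  # one partition's elements
proc = subprocess.run(
    [sys.executable, "-c", child],
    input="\n".join(partition) + "\n",
    capture_output=True, text=True, check=True,
)
result = proc.stdout.splitlines()
print(result)  # -> ['AAABBB', 'CCCDDD']
```

This also makes the cost model visible: every record crosses a process boundary twice (stdin and stdout), once per direction, which Spark's native map() avoids.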
The PySpark version:
# -*- coding: utf-8 -*-
from __future__ import print_function
import time

from pyspark import SparkContext

# Remove repeated characters from each line
if __name__ == "__main__":
    t0 = time.time()
    sc = SparkContext(appName="app2ap")
    lines = sc.textFile("/home/gt/testdata.dat", 9)

    def fff(line):
        seen = set()
        out = []
        for c in line:        # the original range(0, length-1) skipped the last char
            if c not in seen:
                out.append(c)
                seen.add(c)
        return "".join(out)

    rdd = lines.map(fff)
    rdd.saveAsTextFile("/home/gt/output.dat")
    sc.stop()
    print("!!!!!!")
    print(time.time() - t0)
For reference, the native Scala program:
package test

import java.util.ArrayList
import java.util.HashSet

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object Test {
  // Remove repeated characters from a line, keeping first occurrences
  def fff(line: String): String = {
    val s = new HashSet[Char]()
    val l = new ArrayList[Char]()
    for (i <- 0 until line.length) {
      val c = line.charAt(i)
      if (!s.contains(c)) {
        l.add(c)
        s.add(c)
      }
    }
    l.toArray().mkString
  }

  def main(args: Array[String]) {
    val t0 = System.currentTimeMillis()
    val sparkConf = new SparkConf().setAppName("pipe Test")
    val sc = new SparkContext(sparkConf)
    val a = sc.textFile("/home/gt/testdata.dat", 9)
    a.map(fff).saveAsTextFile("/home/gt/output.dat")
    sc.stop()
    println("!!!!!!!!!" + (System.currentTimeMillis() - t0))
  }
}
Conclusion: native Spark Scala finishes in about 25 s, pipe in about 50 s, and PySpark in about 75 s. The ordering matches the extra cost each variant adds: pipe streams every record through an external process's stdin/stdout, and PySpark additionally serializes records back and forth between the JVM and its Python workers.