Some Spark Optimizations

Source: Internet  Editor: 程序博客网  Time: 2024/06/02 03:46

Choosing the right API
Resource parameter tuning

Resources: memory, CPU, and GC

Run bin/spark-submit --help to list the available options. Many of them are configuration parameters that can be tuned.

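Tuning spark-submit parameters
The driver receives the results returned by the job. If a large RDD is brought back to it (for example with collect), the driver needs a lot of memory. In that case the driver can need more memory than an executor, so set --driver-memory generously.
Resource-related parameters of the spark-submit script ===> resource parameter tuning
Resource-related settings can also be passed with --conf (typically JVM options such as garbage collection settings).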
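--driver-memory MEM Memory for driver (e.g. 1000M, 2G) (Default: 1024M). The memory requested for the driver at run time; the default is 1G.
--executor-memory MEM Memory per executor (e.g. 1000M, 2G) (Default: 1G). The memory requested for each executor at run time; the default is 1G.
--driver-cores NUM Cores for driver (Default: 1). In standalone cluster mode, the number of cores the driver needs.
--supervise If given, restarts the driver on failure. When running on standalone, the driver is restarted if it crashes.
--total-executor-cores NUM Total cores for all executors. The total number of cores requested across all executors; the default is all available cores.
--executor-cores NUM Number of cores per executor. (Default: 1 in YARN mode, or all available cores on the worker in standalone mode). In standalone mode, the cores assigned to each executor, defaulting to all of the worker's cores; in YARN mode, the cores assigned to each executor, defaulting to 1.
--driver-cores NUM Number of cores used by the driver, only in cluster mode (Default: 1). In YARN cluster mode, the number of cores the driver needs; the default is 1.
--num-executors NUM Number of executors to launch (Default: 2). The total number of executors in YARN mode. This is not necessarily the number of machines: a single NodeManager can host several executors if it has enough resources.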
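
To see how the options above fit together, here is a minimal sketch that assembles a spark-submit command line for YARN cluster mode. The helper name build_submit_command, the jar name, the main class, and all of the memory and core values are placeholders chosen for illustration, not recommendations. The GC flag is passed through the spark.executor.extraJavaOptions property as one example of a JVM setting supplied via --conf.

```python
import shlex


def build_submit_command(app_jar, main_class, driver_memory="2g",
                         executor_memory="4g", executor_cores=2,
                         num_executors=4, extra_conf=None):
    """Assemble a spark-submit argument list for YARN cluster mode.

    Hypothetical helper: every default value here is a placeholder.
    """
    cmd = [
        "bin/spark-submit",
        "--master", "yarn",
        "--deploy-mode", "cluster",
        "--class", main_class,
        "--driver-memory", driver_memory,
        "--executor-memory", executor_memory,
        "--executor-cores", str(executor_cores),
        "--num-executors", str(num_executors),
    ]
    # Each extra property becomes its own "--conf key=value" pair.
    for key, value in sorted((extra_conf or {}).items()):
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(app_jar)
    return cmd


command = build_submit_command(
    app_jar="my-app.jar",             # hypothetical application jar
    main_class="com.example.MyApp",   # hypothetical main class
    extra_conf={"spark.executor.extraJavaOptions": "-XX:+UseG1GC"},
)
print(shlex.join(command))
```

Building the command as a list keeps every option and its value as separate arguments, so it could be handed to subprocess.run without worrying about shell quoting.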

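My understanding is that total-executor-cores / executor-cores determines how many executors run the job. That is not necessarily the number of machines, because one worker can host more than one executor.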
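
The arithmetic behind that ratio can be sketched as below. This assumes standalone mode, where cores are handed out in whole executor-sized units, and it ignores the limits imposed by each worker's free cores and memory, so treat the result as an upper bound rather than a guarantee. The function name executor_count is made up for this example.

```python
def executor_count(total_executor_cores, executor_cores):
    """Upper bound on executors implied by the two core settings.

    Assumption: cores are granted in whole multiples of executor_cores,
    so any remainder smaller than one executor goes unused.
    """
    if executor_cores <= 0:
        raise ValueError("executor_cores must be positive")
    return total_executor_cores // executor_cores


# 12 total cores at 4 cores per executor gives 3 executors.
print(executor_count(12, 4))
# 10 total cores at 4 cores per executor gives 2 executors; 2 cores stay idle.
print(executor_count(10, 4))
```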

Java GC parameter configuration

These are all covered in the Spark Configuration documentation.

These parameters can be set in three places:
1. With --conf when submitting the jar
2. In conf/spark-defaults.conf
3. Through the conf object when creating the SparkContext
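
When the same property is set in more than one of these places, the Spark Configuration documentation gives the precedence: values set on the SparkConf in code win, then flags passed to spark-submit, then entries in spark-defaults.conf. The sketch below only models that merge order with plain dictionaries; the function name effective_conf and the sample values are illustrative and are not part of any Spark API.

```python
def effective_conf(defaults_file, submit_flags, spark_conf):
    """Merge three configuration sources, lowest precedence first."""
    merged = dict(defaults_file)   # conf/spark-defaults.conf
    merged.update(submit_flags)    # --conf and other spark-submit flags
    merged.update(spark_conf)      # SparkConf set in application code
    return merged


resolved = effective_conf(
    defaults_file={"spark.executor.memory": "1g", "spark.executor.cores": "1"},
    submit_flags={"spark.executor.memory": "4g"},
    spark_conf={"spark.executor.cores": "2"},
)
print(resolved)
```

One caveat from the same documentation: in client mode spark.driver.memory cannot take effect through SparkConf, because the driver JVM has already started by then, so it has to come from --driver-memory or the defaults file.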

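For collection types, do whatever you can to turn memory-hungry collections into numbers or other compact values, for example when computing PV and UV.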
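
One way to read this advice, sketched in plain Python rather than Spark: instead of gathering every visit record for a page into a collection and measuring it afterwards, reduce each record to the number 1 and keep only running totals. The event format of (page, user_id) pairs and the function name pv_uv are assumptions made for this example. In Spark the analogous choice would be summing counts with reduceByKey rather than collecting values per key with groupByKey.

```python
from collections import Counter


def pv_uv(events):
    """Return {page: (pv, uv)} from an iterable of (page, user_id) pairs."""
    events = list(events)
    # PV: every visit contributes 1; only a running total per page is kept.
    pv = Counter(page for page, _ in events)
    # UV: deduplicate (page, user) pairs once, then count 1 per distinct pair.
    uv = Counter(page for page, _ in set(events))
    return {page: (pv[page], uv[page]) for page in pv}


sample = [("/home", "u1"), ("/home", "u1"), ("/home", "u2"), ("/cart", "u1")]
print(pv_uv(sample))
```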

0 0