Spark Cluster Installation
Source: Internet · Editor: 程序博客网 · Date: 2024/06/08
Download the installation files
Configure spark-env.sh and spark-defaults.conf
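A minimal sketch of conf/spark-env.sh for a standalone cluster. The JDK path and worker sizing below are assumptions, not values taken from this cluster; the master hostname is the one used by the start-slave.sh command in this guide:

```shell
# conf/spark-env.sh: minimal standalone settings (sketch)
export JAVA_HOME=/usr/java/default      # assumed JDK location
export SPARK_MASTER_HOST=cnsz046690     # master node used in this guide
export SPARK_WORKER_CORES=4             # assumed cores per worker
export SPARK_WORKER_MEMORY=8g           # assumed memory per worker
```

In spark-defaults.conf, setting `spark.master spark://cnsz046690:7077` lets spark-submit omit --master.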
sbin/start-master.sh
Start a worker on each slave node
./start-slave.sh spark://cnsz046690:7077
scp -r spark-2.1.1-bin-hadoop2.6 cnsz046691:~
scp -r spark-2.1.1-bin-hadoop2.6 cnsz046745:~
scp -r spark-2.1.1-bin-hadoop2.6 cnsz046746:~
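Instead of running start-slave.sh on every node by hand, Spark's sbin/start-all.sh can start the master and all workers over passwordless SSH when the worker hostnames are listed in conf/slaves. A sketch using the hostnames from this guide:

```
# conf/slaves: one worker hostname per line
cnsz046691
cnsz046745
cnsz046746
```

Then `sbin/start-all.sh` on the master brings up the whole cluster, and `sbin/stop-all.sh` shuts it down.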
Spark 2.1.1 parameter changes
import scala.collection.JavaConversions._
show partitions base.UDS_B_I_TRADE_FUND_MOVT;
select count(1) from base.UDS_B_I_TRADE_FUND_MOVT;
spark-sql --files /etc/spark/log4j.properties
spark-sql --driver-java-options "-Dlog4j.configuration=file:/etc/spark/log4j.properties"
Spark environment deployment and dynamic resource allocation
- Spark 2.2 and later require at least Java 8
- spark-sql does not support cluster deploy mode
mv spark-2.2.0-bin-hadoop2.6 /usr/lib
ln -s spark-2.2.0-bin-hadoop2.6 spark
mv /opt/app/spark/conf/* .
ln -s /etc/spark/conf conf
ln -s /var/log/spark logs
Start the history server
/usr/lib/spark/sbin/start-history-server.sh
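The history server only lists applications whose event logs it can read, so event logging also has to be enabled in spark-defaults.conf. A minimal sketch; the HDFS directory here is an assumption and must already exist and be writable by the submitting users:

```
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs:///spark-history
spark.history.fs.logDirectory    hdfs:///spark-history
```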
In YARN mode this step appears to be unnecessary.
In log4j.properties, change the root logger level from INFO to WARN:
log4j.rootCategory=WARN, console
Add Spark to the global PATH
export PATH=$PATH:/usr/lib/spark/bin
Spark log configuration
log4j.rootCategory=INFO, RFA
log4j.appender.RFA=org.apache.log4j.RollingFileAppender
log4j.appender.RFA.File=/appcom/log/spark/spark-${user.name}.log
log4j.appender.RFA.MaxFileSize=256MB
log4j.appender.RFA.MaxBackupIndex=20
log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
# Pattern format: Date LogLevel LoggerName LogMessage
log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
# Debugging Pattern format
#log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} (%F:%M(%L)) - %m%n
Troubleshooting
Spark 2.0 fails to start in YARN mode
An elegant fix
Jersey problem
If you try to run a spark-submit command on YARN you can expect the following error message:
Exception in thread "main" java.lang.NoClassDefFoundError: com/sun/jersey/api/client/config/ClientConfig
The jar file jersey-bundle-*.jar is not present in $SPARK_HOME/jars. Adding it fixes the problem:
sudo -u spark wget http://repo1.maven.org/maven2/com/sun/jersey/jersey-bundle/1.19.1/jersey-bundle-1.19.1.jar -P $SPARK_HOME/jars
January 2017 – Update on this issue:
If the above is done, Jersey 1 will be used when starting the Spark History Server, and applications will not be shown in it. The following error message will appear in the Spark History Server output file:
WARN servlet.ServletHandler: /api/v1/applications
java.lang.NullPointerException
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:388)
This problem occurs only when running Spark on YARN, because YARN 2.7.3 uses Jersey 1 while Spark 2.0 uses Jersey 2.
One workaround is not to add the Jersey 1 jar described above, but instead to disable the YARN Timeline Service in spark-defaults.conf:
spark.hadoop.yarn.timeline-service.enabled false
cp /usr/lib/hadoop-yarn/lib/jersey-client-1.9.jar /usr/lib/spark/jars
cp /usr/lib/hadoop-yarn/lib/jersey-core-1.9.jar /usr/lib/spark/jars
mv /usr/lib/spark/jars/jersey-client-2.22.2.jar /usr/lib/spark/jars/jersey-client-2.22.2.jar.bak
Spark fails to read Hive metastore metadata
Caused by: MetaException(message:Version information not found in metastore. )
    at org.apache.hadoop.hive.metastore.ObjectStore.checkSchema(ObjectStore.java:6664)
    at org.apache.hadoop.hive.metastore.ObjectStore.verifySchema(ObjectStore.java:6645)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
Fix: disable the hive.metastore.schema.verification parameter. When enabled, it makes the metastore client check that the schema version recorded in the metastore matches the Hive version in use.
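The setting normally goes in hive-site.xml; passing it through Spark with the spark.hadoop. prefix may also work, since such properties are forwarded into the Hadoop configuration. A sketch of both forms:

```
<!-- hive-site.xml -->
<property>
  <name>hive.metastore.schema.verification</name>
  <value>false</value>
</property>
```

Equivalently, in spark-defaults.conf: `spark.hadoop.hive.metastore.schema.verification false`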
Environment test:
spark-submit --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode cluster \
  --driver-memory 2g \
  --executor-memory 2g \
  --executor-cores 1 \
  --queue default \
  /usr/lib/spark/examples/jars/spark-examples_*.jar \
  10

spark-submit --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode client \
  --driver-memory 2g \
  --executor-memory 2g \
  --executor-cores 1 \
  --queue default \
  /usr/lib/spark/examples/jars/spark-examples_*.jar \
  10

spark-sql --master yarn --deploy-mode client \
  --driver-memory 2g \
  --executor-memory 2g \
  --num-executors 8
Spark dynamic resource allocation
- Modify yarn-site.xml
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,spark_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
<property>
  <name>spark.shuffle.service.port</name>
  <value>7337</value>
</property>
- Deploy the Spark shuffle jar
chmod a+x /usr/lib/spark/lib/*.jar
cp /usr/lib/spark/lib/spark-1.6.3-yarn-shuffle.jar /usr/lib/hadoop-yarn/
Then sync the jar to every NodeManager node.
- Configure spark-defaults.conf to enable dynamic allocation
spark.shuffle.service.enabled true
spark.shuffle.service.port 7337
spark.dynamicAllocation.enabled true
spark.dynamicAllocation.minExecutors 1
spark.dynamicAllocation.maxExecutors 100
spark.dynamicAllocation.schedulerBacklogTimeout 1s
spark.dynamicAllocation.sustainedSchedulerBacklogTimeout 5s
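A quick way to see dynamic allocation working is to submit a job with a large task count and watch executors scale up while the scheduler backlog exists, then drop back toward minExecutors once tasks finish (idle executors are released after spark.dynamicAllocation.executorIdleTimeout, 60s by default). A sketch; the example-jar path reuses the one from the environment test above:

```
# Submit a job with many tasks so the backlog triggers scale-up
spark-submit --master yarn --deploy-mode client \
  --class org.apache.spark.examples.SparkPi \
  /usr/lib/spark/examples/jars/spark-examples_*.jar 1000

# In another terminal, list the running application and its tracking URL,
# then watch the executor count on the YARN/Spark web UI
yarn application -list -appStates RUNNING
```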