Spark SQL on Hive
Spark SQL Deployment
Versions
Hadoop-2.5.0-cdh5.3.2
Hive-1.2.1-cdh5.3.2
Spark-1.5.0
The bihdp01 node is used as the example throughout.
Spark master on bihdp01: spark://bihdp01:7077
Spark HistoryServer on bihdp01: bihdp01:8032
Spark eventLog on HDFS: hdfs://testenv/spark/eventLog
Step-by-step guide
1. Copy $HIVE_HOME/conf/hive-site.xml and hive-log4j.properties to the $SPARK_HOME/conf/ directory.
Copy mysql-connector-java-5.1.37-bin.jar into Spark's lib directory.
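For example (a minimal sketch; it assumes $HIVE_HOME and $SPARK_HOME are set and the connector jar is in the current directory):
# copy the Hive client config so Spark can locate the metastore
cp $HIVE_HOME/conf/hive-site.xml $HIVE_HOME/conf/hive-log4j.properties $SPARK_HOME/conf/
# make the MySQL JDBC driver visible to Spark
cp mysql-connector-java-5.1.37-bin.jar $SPARK_HOME/lib/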
hive-site.xml configuration:
<!-- Disable metastore schema verification (left commented out here)
<property>
<name>hive.metastore.schema.verification</name>
<value>false</value>
</property> -->
<!-- The metastore is local by default; this setting switches it to remote
<property>
<name>hive.metastore.local</name>
<value>false</value>
</property>-->
<property>
<name>hive.metastore.uris</name>
<value>thrift://bihdp01:9083</value>
<description>Thrift uri for the remote metastore. Used by metastore client to connect to remote metastore.</description>
</property>
<!-- Hive data directory on HDFS; must be created manually on HDFS after Hadoop is started -->
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/hive/warehouse</value>
</property>
<!-- Connect to the hive database in MySQL over JDBC -->
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://bihdp01:3306/hiveto?createDatabaseIfNotExist=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<!-- MySQL JDBC driver -->
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<!-- MySQL username -->
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
<description>username to use against metastore database</description>
</property>
<!-- MySQL password -->
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>*****</value>
<description>password to use against metastore database</description>
</property>
<!-- When false, queries run as the user running the HiveServer2 process; when true, HiveServer2 impersonates the submitting user -->
<property>
<name>hive.server2.enable.doAs</name>
<value>true</value>
</property>
<property>
<name>hive.server2.thrift.bind.host</name>
<value>bihdp01</value>
</property>
<property>
<name>hive.server2.thrift.port</name>
<value>10000</value>
</property>
<property>
<name>hive.exec.parallel</name>
<value>true</value>
</property>
<property>
<name>hive.exec.dynamic.partition.mode</name>
<value>strict</value>
</property>
<property>
<name>hive.exec.compress.intermediate</name>
<value>true</value>
</property>
<!-- Hive Web Interface (HWI): listen host, listen port, and path to the war file -->
<property>
<name>hive.hwi.listen.host</name>
<value>bihdp01</value>
</property>
<property>
<name>hive.hwi.listen.port</name>
<value>9999</value>
</property>
<property>
<name>hive.hwi.war.file</name>
<value>lib/hive-hwi-1.2.1.war</value>
</property>
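Since hive.metastore.uris points at thrift://bihdp01:9083, the metastore service must be running on bihdp01 before Spark SQL can connect, and the warehouse directory must exist on HDFS. A minimal sketch, assuming hive and hdfs are on the PATH:
# create the directory named by hive.metastore.warehouse.dir
hdfs dfs -mkdir -p /hive/warehouse
# start the Hive metastore service (listens on port 9083 by default)
nohup hive --service metastore > metastore.log 2>&1 &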
2. Edit spark-defaults.conf
spark.eventLog.enabled true
spark.eventLog.dir hdfs://testenv/spark/eventLog
spark.eventLog.compress true
spark.yarn.historyServer.address=bihdp01:8032
spark.sql.hive.metastore.version=1.2.1
spark.port.maxRetries=100
spark.sql.hive.metastore.jars=/opt/apps/hadoop/share/hadoop/mapreduce/*:/opt/apps/hadoop/share/hadoop/mapreduce/lib/*:/opt/apps/hadoop/share/hadoop/common/*:/opt/apps/hadoop/share/hadoop/common/lib/*:/opt/apps/hadoop/share/hadoop/hdfs/*:/opt/apps/hadoop/share/hadoop/hdfs/lib/*:/opt/apps/hadoop/share/hadoop/yarn/*:/opt/apps/hadoop/share/hadoop/yarn/lib/*:/opt/apps/hive/lib/*:/opt/apps/spark/lib/*
spark.driver.extraLibraryPath=/opt/apps/hadoop/share/hadoop/mapreduce/*:/opt/apps/hadoop/share/hadoop/mapreduce/lib/*:/opt/apps/hadoop/share/hadoop/common/*:/opt/apps/hadoop/share/hadoop/common/lib/*:/opt/apps/hadoop/share/hadoop/hdfs/*:/opt/apps/hadoop/share/hadoop/hdfs/lib/*:/opt/apps/hadoop/share/hadoop/yarn/*:/opt/apps/hadoop/share/hadoop/yarn/lib/*:/opt/apps/hive/lib/*:/opt/apps/spark/lib/*
spark.executor.extraLibraryPath=/opt/apps/hadoop/share/hadoop/mapreduce/*:/opt/apps/hadoop/share/hadoop/mapreduce/lib/*:/opt/apps/hadoop/share/hadoop/common/*:/opt/apps/hadoop/share/hadoop/common/lib/*:/opt/apps/hadoop/share/hadoop/hdfs/*:/opt/apps/hadoop/share/hadoop/hdfs/lib/*:/opt/apps/hadoop/share/hadoop/yarn/*:/opt/apps/hadoop/share/hadoop/yarn/lib/*:/opt/apps/hive/lib/*:/opt/apps/spark/lib/*
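The event-log directory configured above must exist on HDFS before applications start, or event logging will fail at startup. A one-line sketch:
# create the directory named by spark.eventLog.dir
hdfs dfs -mkdir -p hdfs://testenv/spark/eventLog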
3. Edit spark-env.sh
export SPARK_WORKER_DIR=/data/cdhdata/spark/work
export JAVA_HOME=/usr/java/jdk1.7.0_79
#export HADOOP_HOME=/home/hadoop/app/hadoop-2.6.0-cdh5.6.0
export SCALA_HOME=/home/hadoop/app/scala-2.10.4
#export HADOOP_CONF_DIR=/home/hadoop/app/hadoop-2.6.0-cdh5.6.0/etc/hadoop
export JAVA_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
#export HIVE_HOME=/home/hadoop/app/apache-hive-1.2.1-bin
#export SPARK_MASTER_IP=
export SPARK_DAEMON_MEMORY=512m
export SPARK_WORKER_MEMORY=1024M
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=bihdp01:2181,bihdp02:2181,bihdp03:2181 -Dspark.deploy.zookeeper.dir=/localspark"
export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18080 -Dspark.history.retainedApplications=3 -Dspark.history.fs.logDirectory=/data/cdhdata/sparklocallogs"
#set Hadoop path
export HDFS_YARN_LOGS_DIR=/data1/hadooplogs
export HADOOP_PREFIX=/opt/apps/hadoop
export HADOOP_HOME=$HADOOP_PREFIX
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_MAPRED_PID_DIR=$HADOOP_HOME/pids
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HDFS_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_LOG_DIR=$HDFS_YARN_LOGS_DIR/logs
export HADOOP_PID_DIR=$HADOOP_HOME/pids
export HADOOP_SECURE_DN_PID_DIR=$HADOOP_PID_DIR
export YARN_HOME=$HADOOP_HOME
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_LOG_DIR=$HDFS_YARN_LOGS_DIR/logs
export YARN_PID_DIR=$HADOOP_HOME/pids
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HADOOP_CONF_DIR:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*:$HADOOP_COMMON_HOME/share/hadoop/common/*:$HADOOP_COMMON_HOME/share/hadoop/common/lib/*:$HADOOP_HDFS_HOME/share/hadoop/hdfs/*:$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*:$HADOOP_YARN_HOME/share/hadoop/yarn/*:$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*
export CLASSPATH=$HADOOP_CLASSPATH:$CLASSPATH
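After editing, the standalone cluster and the history server can be (re)started with the stock scripts in $SPARK_HOME/sbin (a sketch; run on bihdp01):
# start the master and all workers listed in conf/slaves
$SPARK_HOME/sbin/start-all.sh
# start the HistoryServer configured via SPARK_HISTORY_OPTS above
$SPARK_HOME/sbin/start-history-server.sh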
4. [ERROR] Terminal initialization failed; falling back to unsupported
To fix java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected,
delete /opt/apps/hadoop/share/hadoop/yarn/lib/jline-0.9.94.jar, add export HADOOP_USER_CLASSPATH_FIRST=true to /etc/profile, and source it, as shown below.
See: https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark:+Getting+Started
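A sketch of the fix (the jar path is the one from this setup; removing it may require root):
# remove the conflicting old jline shipped with YARN
rm /opt/apps/hadoop/share/hadoop/yarn/lib/jline-0.9.94.jar
# let user-supplied jars (including the newer jline) take precedence on the Hadoop classpath
echo 'export HADOOP_USER_CLASSPATH_FIRST=true' >> /etc/profile
source /etc/profile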
5. Using spark-shell
// create a HiveContext from the existing SparkContext (sc)
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
// run HiveQL against the Hive table `test`
sqlContext.sql("select * from test limit 2").collect().foreach(println)
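spark-shell can also be pointed at the standalone master from the setup above rather than running locally:
# attach the shell to the standalone cluster
spark-shell --master spark://bihdp01:7077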
6. Launch spark-sql in local mode
HiveQL statements can be typed and executed directly at its prompt; see the sketch below.
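A minimal sketch (the table name test is borrowed from step 5; -e runs one statement and exits):
# local mode using all local cores
spark-sql --master local[*] -e "select * from test limit 2"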
7. Launch spark-sql in standalone cluster mode
spark-sql --master spark://bihdp01:7077
(The host in the master URL should be the ALIVE Spark master node.)
8. Launch spark-sql in Spark-on-YARN mode
spark-sql --master yarn-client
or
spark-sql --master yarn-cluster (not yet supported)
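In yarn-client mode the usual spark-submit resource flags apply; an illustrative example (the executor count and memory size are assumptions, not values from this setup):
# ask YARN for 4 executors with 2 GB each, run one query, then exit
spark-sql --master yarn-client --num-executors 4 --executor-memory 2g -e "select count(*) from test"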