Spark SQL on Hive


Spark SQL Deployment

Versions

Hadoop-2.5.0-cdh5.3.2 

Hive-1.2.1-cdh5.3.2

Spark-1.5.0

The node bihdp01 is used as the example throughout.

Spark master on bihdp01: spark://bihdp01:7077

Spark HistoryServer on bihdp01: bihdp01:8032

Spark eventLog on HDFS: hdfs://testenv/spark/eventLog

Step-by-step guide

  

1. Copy $HIVE_HOME/conf/hive-site.xml and hive-log4j.properties into the $SPARK_HOME/conf/ directory.

Copy the mysql-connector-java-5.1.37-bin.jar package into Spark's lib directory.
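A minimal sketch of this copy step, assuming Hive and Spark live under /opt/apps/hive and /opt/apps/spark (the paths used in the classpath settings in step 2) and that the MySQL connector jar sits in Hive's lib directory:

# Copy Hive's configuration files into Spark's conf directory
cp /opt/apps/hive/conf/hive-site.xml /opt/apps/hive/conf/hive-log4j.properties /opt/apps/spark/conf/
# Copy the MySQL JDBC driver into Spark's lib directory
cp /opt/apps/hive/lib/mysql-connector-java-5.1.37-bin.jar /opt/apps/spark/lib/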

hive-site.xml configuration:

<!-- Hive's data directory on HDFS; it must be created manually on HDFS after Hadoop is started

      

       <property> 

          <name>hive.metastore.schema.verification</name> 

          <value>false</value> 

       </property> -->

      

       <!-- The metastore is local by default; add this setting to switch to a non-local metastore

       <property>

              <name>hive.metastore.local</name>

              <value>false</value>

       </property>-->

      

       <property>

              <name>hive.metastore.uris</name>

              <value>thrift://bihdp01:9083</value>

              <description>Thrift uri for the remote metastore. Used by metastore client to connect to remote metastore.</description>

       </property>

      

      

       <property>

              <name>hive.metastore.warehouse.dir</name>

              <value>/hive/warehouse</value>

       </property>

       <!-- Connect Hive's metastore to MySQL over the JDBC protocol -->

       <property>

              <name>javax.jdo.option.ConnectionURL</name>

              <value>jdbc:mysql://bihdp01:3306/hiveto?createDatabaseIfNotExist=true</value>

              <description>JDBC connect string for a JDBC metastore</description>

       </property>

       <!-- JDBC driver class for MySQL -->

       <property>

              <name>javax.jdo.option.ConnectionDriverName</name>

              <value>com.mysql.jdbc.Driver</value>

              <description>Driver class name for a JDBC metastore</description>

       </property>

       <!-- MySQL username -->

       <property>

              <name>javax.jdo.option.ConnectionUserName</name>

              <value>root</value>

              <description>username to use against metastore database</description>

       </property>

       <!-- MySQL password -->

       <property>

              <name>javax.jdo.option.ConnectionPassword</name>

              <value>*****</value>

              <description>password to use against metastore database</description>

       </property>

      

      

      

       <!-- When set to false, queries run as the user that runs the HiveServer2 process -->

      

       <property>

              <name>hive.server2.enable.doAs</name>

              <value>true</value>

       </property>

       <property>

              <name>hive.server2.thrift.bind.host</name>

              <value>bihdp01</value>

       </property>

       <property>

              <name>hive.server2.thrift.port</name>

              <value>10000</value>

       </property>

      

      

       <property>

              <name>hive.exec.parallel</name>

              <value>true</value>

       </property>

       <property>

              <name>hive.exec.dynamic.partition.mode</name>

              <value>strict</value>

       </property>

       <property>

              <name>hive.exec.compress.intermediate</name>

              <value>true</value>

       </property>

       <!-- Hive Web Interface (HWI): listen host, port, and path to the WAR file -->

       <property>

    <name>hive.hwi.listen.host</name>

    <value>bihdp01</value>

  </property>

 

  <property>

    <name>hive.hwi.listen.port</name>

    <value>9999</value>

  </property>

  <property>

    <name>hive.hwi.war.file</name>

    <value>lib/hive-hwi-1.2.1.war</value>

  </property>
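Because hive.metastore.uris points at thrift://bihdp01:9083, a standalone metastore service must be running on bihdp01 before spark-sql can connect to it. A rough sketch (the log file locations are just placeholders):

# Start the remote Hive metastore service (listens on port 9083 by default)
nohup hive --service metastore > /tmp/metastore.log 2>&1 &
# Optionally start HiveServer2 as well, on the thrift port 10000 configured above
nohup hive --service hiveserver2 > /tmp/hiveserver2.log 2>&1 &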

 

2. Edit spark-defaults.conf

spark.eventLog.enabled true

spark.eventLog.dir hdfs://testenv/spark/eventLog

spark.eventLog.compress true

spark.yarn.historyServer.address=bihdp01:8032

spark.sql.hive.metastore.version=1.2.1

spark.port.maxRetries=100

 

spark.sql.hive.metastore.jars=/opt/apps/hadoop/share/hadoop/mapreduce/*:/opt/apps/hadoop/share/hadoop/mapreduce/lib/*:/opt/apps/hadoop/share/hadoop/common/*:/opt/apps/hadoop/share/hadoop/common/lib/*:/opt/apps/hadoop/share/hadoop/hdfs/*:/opt/apps/hadoop/share/hadoop/hdfs/lib/*:/opt/apps/hadoop/share/hadoop/yarn/*:/opt/apps/hadoop/share/hadoop/yarn/lib/*:/opt/apps/hive/lib/*:/opt/apps/spark/lib/*

spark.driver.extraLibraryPath=/opt/apps/hadoop/share/hadoop/mapreduce/*:/opt/apps/hadoop/share/hadoop/mapreduce/lib/*:/opt/apps/hadoop/share/hadoop/common/*:/opt/apps/hadoop/share/hadoop/common/lib/*:/opt/apps/hadoop/share/hadoop/hdfs/*:/opt/apps/hadoop/share/hadoop/hdfs/lib/*:/opt/apps/hadoop/share/hadoop/yarn/*:/opt/apps/hadoop/share/hadoop/yarn/lib/*:/opt/apps/hive/lib/*:/opt/apps/spark/lib/*

spark.executor.extraLibraryPath=/opt/apps/hadoop/share/hadoop/mapreduce/*:/opt/apps/hadoop/share/hadoop/mapreduce/lib/*:/opt/apps/hadoop/share/hadoop/common/*:/opt/apps/hadoop/share/hadoop/common/lib/*:/opt/apps/hadoop/share/hadoop/hdfs/*:/opt/apps/hadoop/share/hadoop/hdfs/lib/*:/opt/apps/hadoop/share/hadoop/yarn/*:/opt/apps/hadoop/share/hadoop/yarn/lib/*:/opt/apps/hive/lib/*:/opt/apps/spark/lib/*
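The event log directory must exist on HDFS before any application starts, otherwise the application typically fails during SparkContext initialization. Assuming testenv is the HDFS nameservice used in the URL above, it can be created once with:

# Create the Spark event log directory on HDFS
hdfs dfs -mkdir -p hdfs://testenv/spark/eventLog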

 

3. Edit spark-env.sh

export SPARK_WORKER_DIR=/data/cdhdata/spark/work

export JAVA_HOME=/usr/java/jdk1.7.0_79

#export HADOOP_HOME=/home/hadoop/app/hadoop-2.6.0-cdh5.6.0

export SCALA_HOME=/home/hadoop/app/scala-2.10.4

#export HADOOP_CONF_DIR=/home/hadoop/app/hadoop-2.6.0-cdh5.6.0/etc/hadoop

export JAVA_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"

#export HIVE_HOME=/home/hadoop/app/apache-hive-1.2.1-bin

#export SPARK_MASTER_IP=

export SPARK_DAEMON_MEMORY=512m

export SPARK_WORKER_MEMORY=1024M

export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=bihdp01:2181,bihdp02:2181,bihdp03:2181 -Dspark.deploy.zookeeper.dir=/localspark"

export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18080 -Dspark.history.retainedApplications=3 -Dspark.history.fs.logDirectory=/data/cdhdata/sparklocallogs"

#set Hadoop path

export HDFS_YARN_LOGS_DIR=/data1/hadooplogs

export HADOOP_PREFIX=/opt/apps/hadoop

export HADOOP_HOME=$HADOOP_PREFIX

export HADOOP_COMMON_HOME=$HADOOP_HOME

export HADOOP_HDFS_HOME=$HADOOP_HOME

export HADOOP_MAPRED_HOME=$HADOOP_HOME

export HADOOP_MAPRED_PID_DIR=$HADOOP_HOME/pids

export HADOOP_YARN_HOME=$HADOOP_HOME

export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

export HDFS_CONF_DIR=$HADOOP_HOME/etc/hadoop

export HADOOP_LOG_DIR=$HDFS_YARN_LOGS_DIR/logs

export HADOOP_PID_DIR=$HADOOP_HOME/pids

export HADOOP_SECURE_DN_PID_DIR=$HADOOP_PID_DIR

export YARN_HOME=$HADOOP_HOME

export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop

export YARN_LOG_DIR=$HDFS_YARN_LOGS_DIR/logs

export YARN_PID_DIR=$HADOOP_HOME/pids

export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

 

export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HADOOP_CONF_DIR:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*:$HADOOP_COMMON_HOME/share/hadoop/common/*:$HADOOP_COMMON_HOME/share/hadoop/common/lib/*:$HADOOP_HDFS_HOME/share/hadoop/hdfs/*:$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*:$HADOOP_YARN_HOME/share/hadoop/yarn/*:$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*

export CLASSPATH=$HADOOP_CLASSPATH:$CLASSPATH
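With spark-env.sh in place, the standalone cluster and the history server can be started (or restarted) with the standard scripts that ship with Spark; /opt/apps/spark is assumed here as the Spark install path:

# Start the standalone master and the workers listed in conf/slaves
/opt/apps/spark/sbin/start-all.sh
# Start the history server (picks up SPARK_HISTORY_OPTS from spark-env.sh)
/opt/apps/spark/sbin/start-history-server.sh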

 

 

4. [ERROR] Terminal initialization failed; falling back to unsupported
Fixing java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected

Delete the file /opt/apps/hadoop/share/hadoop/yarn/lib/jline-0.9.94.jar, add export HADOOP_USER_CLASSPATH_FIRST=true to /etc/profile, and then source it.
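Roughly, on each node that hits the error:

# Remove Hadoop's old jline 0.9.x jar, which conflicts with Hive's newer jline 2.x
rm /opt/apps/hadoop/share/hadoop/yarn/lib/jline-0.9.94.jar
# Let user-supplied jars take precedence over Hadoop's bundled ones
echo 'export HADOOP_USER_CLASSPATH_FIRST=true' >> /etc/profile
source /etc/profile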

See: https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark:+Getting+Started

 

5. Using spark-shell

val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)

sqlContext.sql("select * from test limit 2").collect().foreach(println)

 

6. Starting spark-sql in local mode

HiveQL statements can be typed and executed directly at the prompt, for example:
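A short sketch of a local session (the table name test is only an assumption, reusing the table queried in step 5):

spark-sql --master local[*]
spark-sql> show tables;
spark-sql> select * from test limit 2;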

 

7. Starting spark-sql against the standalone cluster

spark-sql --master spark://bihdp01:7077

 

(The master URL should point to the Spark master node that is currently ALIVE.)
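With the ZooKeeper recovery mode configured in spark-env.sh, which master is currently ALIVE can be checked from each master's web UI; port 8080 is the standalone default and is an assumption here:

# The master web UI reports its status as ALIVE or STANDBY
curl -s http://bihdp01:8080 | grep -i status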

 

8. Starting spark-sql in Spark on YARN mode

spark-sql --master yarn-client

or

spark-sql --master yarn-cluster   (not yet supported)

 
