Configuring Spark SQL to read from a Hive data source

Source: Internet. Editor: 程序博客网. Time: 2024/04/29 04:35

1. Put a hive-site.xml file in Spark's conf directory. It only needs to contain the metastore connection information:
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property>
            <name>hive.metastore.uris</name>
            <value>thrift://master-centos:9083</value>
            <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
    </property>
</configuration>
Then distribute the file to every node.
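The distribution step could be scripted roughly as follows. This is only a sketch: the SPARK_HOME default, the host names in NODES, and the use of passwordless scp are assumptions for illustration and do not come from the original setup.

```shell
#!/usr/bin/env bash
# Sketch: copy a config file into $SPARK_HOME/conf on every node.
# Assumptions (not from the article): the SPARK_HOME default, the host
# names in NODES, and passwordless scp between the nodes.
SPARK_HOME="${SPARK_HOME:-/opt/spark}"
NODES="${NODES:-worker1 worker2}"

distribute_conf() {
  local file="$1" host
  for host in $NODES; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      # Dry run: print the command instead of executing it.
      echo "scp $file $host:$SPARK_HOME/conf/"
    else
      scp "$file" "$host:$SPARK_HOME/conf/"
    fi
  done
}
```

Setting DRY_RUN=1 before calling distribute_conf hive-site.xml prints the scp commands without running them, which is a cheap way to check the host list first.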
2. If the Hive metastore is backed by MySQL, place mysql-connector-java-5.1.25-bin.jar under spark/lib (the jars directory in Spark 2.0 and later).
3. Edit the spark-defaults.conf configuration file:
spark-defaults.conf
spark.master    spark://192.168.130.140:7077
spark.driver.memory     512m
spark.executor.memory 512m
spark.eventLog.enabled true
spark.eventLog.dir hdfs://192.168.130.140:8020/user/spark/logs
(The event log directory must be created on HDFS beforehand.)
Then distribute the file to every node.
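Since the event log directory has to exist before Spark starts, one way to avoid typing the path twice is to read it back from the config file. The helper below is a sketch; the function name eventlog_dir is illustrative, and it assumes the property name and value are separated by whitespace, as in the file above.

```shell
#!/usr/bin/env bash
# Sketch: read spark.eventLog.dir from spark-defaults.conf so the
# directory can be created on HDFS before Spark starts.
# The function name eventlog_dir is illustrative.
eventlog_dir() {
  # Print the second whitespace-separated field of the matching line.
  awk '$1 == "spark.eventLog.dir" { print $2 }' "$1"
}

# Example (needs a running HDFS and the hdfs client on PATH):
#   hdfs dfs -mkdir -p "$(eventlog_dir "$SPARK_HOME/conf/spark-defaults.conf")"
```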
4. Start the Hive metastore service.
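A minimal sketch of this step, assuming the hive launcher is on PATH and that the metastore listens on master-centos:9083 as configured in hive-site.xml. The log path, the retry count, and the helper names are arbitrary choices, and the port check relies on bash's /dev/tcp feature.

```shell
#!/usr/bin/env bash
# Sketch: start the Hive metastore in the background and wait for its port.
# Assumptions: hive is on PATH, the log path is arbitrary, and bash is used
# (the /dev/tcp redirection below is a bash feature).
port_open() {
  # Succeeds if a TCP connection to host $1, port $2 can be opened.
  (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

start_metastore() {
  nohup hive --service metastore > /tmp/hive-metastore.log 2>&1 &
  local i
  for i in $(seq 1 30); do
    # Host and port match hive.metastore.uris in hive-site.xml.
    port_open master-centos 9083 && return 0
    sleep 2
  done
  return 1
}
```

Calling start_metastore returns 0 once the port accepts connections and 1 if it is still closed after about a minute.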
5. To connect to Spark over JDBC, start the Spark Thrift server:
start-thriftserver.sh --master spark://192.168.130.140:7077 --executor-memory 1g --total-executor-cores 2 --executor-cores 1 --hiveconf hive.server2.thrift.port=10050 --conf spark.dynamicAllocation.enabled=false
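Once the Thrift server is up, a client such as beeline can connect on the port set through hive.server2.thrift.port. The sketch below assumes the server runs on 192.168.130.140; the article gives only the port, so the host is a guess, and the jdbc_url helper is illustrative.

```shell
#!/usr/bin/env bash
# Sketch: build the JDBC URL for the Thrift server and query it with beeline.
# Assumption: the Thrift server runs on 192.168.130.140; the article only
# gives the port (10050).
jdbc_url() {
  local host="$1" port="$2" db="${3:-default}"
  echo "jdbc:hive2://$host:$port/$db"
}

# Example (needs a running Thrift server and beeline on PATH):
#   beeline -u "$(jdbc_url 192.168.130.140 10050)" -e "show tables;"
```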
