Java + Spark SQL + Hive + Maven: A Simple Implementation and Common Exceptions
I. Prerequisites
1. Set up the Java and Scala environments on Linux.
2. Set up a Hadoop + Spark + Hive distributed cluster on Linux:
Hadoop distributed cluster setup: see the Hadoop cluster setup guide
Spark distributed cluster setup: see the Spark cluster setup guide
Hive distributed cluster setup: to be completed
II. Project Implementation
1. Requirement
Query Hive data through Spark SQL.
Database: bi_ods
Table: owms_m_locator
2. Maven project setup
Create a new Maven project.
3. Implementation code
package com.lm.hive.SparkHive;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.hive.HiveContext;

/**
 * Query Hive data through Spark SQL.
 */
public class App {
    public static void main(String[] args) {
        SparkConf sparkConf = new SparkConf().setAppName("SparkHive").setMaster("local[2]");
        JavaSparkContext sc = new JavaSparkContext(sparkConf);

        // Use HiveContext, not SQLContext: with SQLContext the Hive
        // databases and tables cannot be found after deployment.
        HiveContext hiveContext = new HiveContext(sc);

        // Query the first 10 rows of the table
        hiveContext.sql("select * from bi_ods.owms_m_locator limit 10").show();

        sc.stop();
    }
}
4. pom.xml
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.lm.hive</groupId>
    <artifactId>SparkHive</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <packaging>jar</packaging>
    <name>SparkHive</name>
    <url>http://maven.apache.org</url>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>

    <dependencies>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>3.8.1</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>5.1.39</version>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-log4j12</artifactId>
            <version>1.7.22</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.6.4</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.10</artifactId>
            <version>1.6.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-hive_2.10</artifactId>
            <version>1.6.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_2.10</artifactId>
            <version>1.6.0</version>
            <exclusions>
                <exclusion>
                    <artifactId>slf4j-log4j12</artifactId>
                    <groupId>org.slf4j</groupId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.10</artifactId>
            <version>1.6.0</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.hive/hive-jdbc -->
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-jdbc</artifactId>
            <version>2.1.1</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.hive/hive-exec -->
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-exec</artifactId>
            <version>2.1.1</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <shadedArtifactAttached>true</shadedArtifactAttached>
                            <shadedClassifierName>allinone</shadedClassifierName>
                            <artifactSet>
                                <includes>
                                    <include>*:*</include>
                                </includes>
                            </artifactSet>
                            <filters>
                                <filter>
                                    <artifact>*:*</artifact>
                                    <excludes>
                                        <exclude>META-INF/*.SF</exclude>
                                        <exclude>META-INF/*.DSA</exclude>
                                        <exclude>META-INF/*.RSA</exclude>
                                    </excludes>
                                </filter>
                            </filters>
                            <transformers>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>reference.conf</resource>
                                </transformer>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>META-INF/spring.handlers</resource>
                                </transformer>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>META-INF/spring.schemas</resource>
                                </transformer>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                    <manifestEntries>
                                        <Main-Class></Main-Class>
                                    </manifestEntries>
                                </transformer>
                            </transformers>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>
III. Deployment
1. Build the Maven project into a jar
This article builds the project into SparkHive-0.0.1-SNAPSHOT-allinone.jar.
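As a short sketch of the build step (assuming Maven is installed and you run it from the project root), the shaded jar is produced by a standard package build, because the shade plugin above is bound to the package phase:

```shell
# Build the project; the maven-shade-plugin runs during "package"
# and attaches the shaded jar with the "allinone" classifier.
mvn clean package

# The resulting jar is written under target/, e.g.
#   target/SparkHive-0.0.1-SNAPSHOT-allinone.jar
```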
2. Upload the jar to the Linux server
Upload it with the rz command.
3. Run the jar with spark-submit
Command:
sh bin/spark-submit \
    --class com.lm.hive.SparkHive.App \
    --master yarn \
    --files /home/winit/spark-1.6.0/conf/hive-site.xml \
    java_jar/SparkHive-0.0.1-SNAPSHOT-allinone.jar
Result:
IV. Exceptions
Many examples found online query Hive data through a SQLContext instance; when I tried that, the following exceptions occurred.
1. Exception in thread "main" org.apache.spark.sql.AnalysisException: Table not found: `bi_ods`.`owms_m_locator`;
Fix: replace
SQLContext sqlContext = new SQLContext(sc);
with:
HiveContext hiveContext = new HiveContext(sc);
2. Exception in thread "main" java.lang.SecurityException: class "javax.servlet.FilterRegistration"'s signer information does not match signer information of other classes in the same package
Fix:
This is caused by duplicate javax.servlet classes being pulled in by several dependencies.
Exclude the javax.servlet artifacts from hadoop-client (or hadoop-common):
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.6.4</version>
    <exclusions>
        <exclusion>
            <groupId>javax.servlet</groupId>
            <artifactId>*</artifactId>
        </exclusion>
    </exclusions>
</dependency>
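If it is not obvious which dependencies drag in javax.servlet, the Maven dependency tree can show it. As a diagnostic sketch (assuming Maven is installed and run from the project root):

```shell
# Print only the parts of the dependency tree that resolve to
# artifacts in the javax.servlet group, so the duplicates are visible.
mvn dependency:tree -Dincludes=javax.servlet
```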
V. Differences Between SQLContext and HiveContext
The main entry point of a Spark SQL program is the SQLContext class or one of its subclasses. To create a basic SQLContext, all you need is a SparkContext.
Besides the basic SQLContext, you can also create a HiveContext. The differences and relationship between the two are:
SQLContext currently supports only the SQL parser (SQL-92 syntax).
HiveContext supports both the SQL parser and the HiveQL parser. The default is the HiveQL parser, and users can switch to the SQL parser via configuration in order to run syntax that HiveQL does not support.
With a HiveContext you can use Hive UDFs, read and write Hive table data, and perform other Hive operations; a SQLContext cannot operate on Hive.
Future versions of Spark SQL will keep enriching SQLContext's functionality until it converges with HiveContext's, and eventually the two may be unified into a single context.
HiveContext is kept separate because it bundles Hive's dependencies: a basic Spark deployment then does not need the Hive dependency jars, and they are added only when a HiveContext is actually used.
The SQL parser can be selected with the spark.sql.dialect parameter. In a SQLContext only the "sql" dialect provided by Spark SQL is available; in a HiveContext the default dialect is "hiveql", and "sql" is also supported.
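The dialect switch described above is just a configuration setting. As a hedged sketch (a code fragment, not a complete program, assuming the Spark 1.6 HiveContext created earlier in this article):

```java
// Fragment only: "sc" is the JavaSparkContext from the App class above.
HiveContext hiveContext = new HiveContext(sc);

// A HiveContext defaults to the "hiveql" dialect; switch to Spark SQL's
// own parser when you need syntax that HiveQL does not accept.
hiveContext.setConf("spark.sql.dialect", "sql");

// The dialect can also be changed from within SQL itself:
hiveContext.sql("SET spark.sql.dialect=hiveql");
```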