Spark与Mysql的交互
来源:互联网 发布:三星s4可以用4g网络吗 编辑:程序博客网 时间:2024/04/30 11:30
背景
Spark在对目标数据进行计算后,RDD格式的数据一般都会存在HDFS,Hive,HBase中,另一方面,对于非RDD格式的数据,可能会存放在像Mysql中这种传统的RDMS中.
但是写入过程中经常出现各种各样的问题, stackoverflow上有很多帖子:
- Error writing spark dataframe to mysql table
- JDBC batch insert performance
还有些其他的贴
- Using Apache Spark and MySQL for Data Analysis
- spark 1.3.0 将dataframe数据写入Hive分区表
- Spark读取数据库(Mysql)的四种方式详解
- 完整java开发中JDBC连接数据库代码和步骤
- Spark踩坑记——数据库(Hbase+Mysql)
RDD
Spark SQL通过JDBC连接MySQL读写数据
非RDD
import java.sql.{Date, DriverManager, PreparedStatement, Connection}/* tableName = "tempTableName" columns = [key : String, value : Int] DBIP = 10.10.10.10 DBPort = 8888 DB = tempDB*/def connection2Mysql() = { var conn : Connection = null var ps : PreparedStatement = nulll val userName = "admin" val passwd = "admin" val key = "Tom" val value = 1024 val sql = "INSERT INTO tempTableName(key,value) values (?,?)" try { Class.forName("com.mysql.jdbc.Driver").newInstance conn = DriverManager.getConnection("jdbc:mysql://10.10.10.10:8888/tempDB", userName, passwd) ps = conn.prepareStatement(sql) ps.setDate(1, key) ps.setLong(2, value) ps.executeUpdate() } catch { case e: Exception => println("----> Exception!\t:\t" + e + "<-----") } finally { if (ps != null) ps.close() if (conn != null) conn.close() }}
这里会遇到的一个问题是,在本地启动client进行功能检查,方法是可行的,但是通过submit提交给YARN之后,却被报错 java.sql.SQLException: No suitable driver found for jdbc:mysql://10.10.10.10:8888/tempDB
或是 Error:java.lang.ClassNotFoundException: com.mysql.jdbc.Driver
发生这种情况的原因是因为在上述代码中,没有找到对应的类,即Class.forName("com.mysql.jdbc.Driver").newInstance
这行代码出现了问题. 这行代码的目的是对driver进行注册,查看Driver源码,如下所示:
/** * The Java SQL framework allows for multiple database drivers. Each driver should supply a class that implements the Driver interface * * The DriverManager will try to load as many drivers as it can find and then for any given connection request, it will ask each driver in turn to try to * connect to the target URL. * * It is strongly recommended that each Driver class should be small and standalone so that the Driver class can be loaded and queried without bringing in vast * quantities of supporting code. * * When a Driver class is loaded, it should create an instance of itself and register it with the DriverManager. This means that a user can load and register a * driver by doing Class.forName("foo.bah.Driver") */public class Driver extends NonRegisteringDriver implements java.sql.Driver { // Register ourselves with the DriverManager static { try { java.sql.DriverManager.registerDriver(new Driver()); } catch (SQLException E) { throw new RuntimeException("Can't register driver!"); } } /** * Construct a new driver and register it with DriverManager * @throws SQLException if a database error occurs. */ public Driver() throws SQLException { // Required for Class.forName().newInstance() }}
出现这个问题有多种可能,
1. 在–jars参数里面加入Mysql jar包引用是没有用的. 需要通过加入–driver-class-path参数来设置driver的classpath.
$ bin/spark-submit --master local[2] --driver-class-path lib/mysql-connector-java-5.1.38.jar --class temp.jar
原因是两者分发的node不同, link
–driver-class-path driver所依赖的包,多个包之间用冒号(:)分割
–jars driver和executor都需要的包,多个包之间用逗号(,)分割
2. 使用依赖结果打包的时候没有将对应jar包导入.
使用依赖的时候需要将对应jar包打入最终的jar包中,这样才能正确的找到对应的类名并成功注册.
例如,在build.sbt中添加依赖后
// https://mvnrepository.com/artifact/mysql/mysql-connector-javalibraryDependencies += "mysql" % "mysql-connector-java" % "5.1.38"
需要在Artifacts中添加对应的jar包.
- Spark与Mysql的交互
- Spark与Hive的交互
- 与mysql交互的技巧
- NodeJS与Mysql的交互
- MySQL与Python的交互
- python与Mysql的交互
- mysql与python的交互
- spark与磁盘交互,spark读hdfs
- spark与hbase进行交互
- 使用spark与ElasticSearch交互
- 查看Spark-Master节点与Slaves节点的网络交互
- Spark 与 Elasticsearch交互的一些配置和问题解决
- 与MySQL交互时的中文问题
- mysql CAPI与C++的交互
- 图片与mysql数据库的交互
- php 与mysql交互的存储过程
- 与客户程序mysql交互的技巧
- java与mysql的数据交互
- phpstorm安装
- Qt之QTcpServer/QTcpSocket简单收发信息(1)
- js实现网页全屏切换(平滑过渡),鼠标滚动切换
- [Leetcode] Palindrome Partitioning
- 【php】数组 取某一列的值 array_column
- Spark与Mysql的交互
- Qt程序如何在Mac上用X-code编译
- SetWindowsHookEx 详解(四)所有结构体
- LeetCode 338.Counting Bits 题解(C++)
- linux安装MySql问题汇总
- 【5】SimpleTrigger
- Qt网络应用----socket通信例子
- AndroidStudio美化日志之logger神器
- 搭建SpringMVC+Spring4.3.2+Hibernate5.2.2框架