Accessing PostgreSQL from Spark via JDBC

1. First, make sure a usable PostgreSQL JDBC driver is available

[hadoop@db1 bin]$ locate jdbc | grep postgres
/mnt/hd01/www/html/deltasql/clients/java/dbredactor/lib/postgresql-8.2-507.jdbc4.jar
/usr/lib/ruby/gems/1.8/gems/railties-3.2.13/lib/rails/generators/rails/app/templates/config/databases/jdbcpostgresql.yml
/usr/src/postgis-2.0.0/java/jdbc/src/org/postgresql
/usr/src/postgis-2.0.0/java/jdbc/src/org/postgresql/driverconfig.properties
/usr/src/postgis-2.0.0/java/jdbc/stubs/org/postgresql
/usr/src/postgis-2.0.0/java/jdbc/stubs/org/postgresql/Connection.java
/usr/src/postgis-2.0.0/java/jdbc/stubs/org/postgresql/PGConnection.java
/usr/src/postgis-2.1.0/java/jdbc/src/org/postgresql
/usr/src/postgis-2.1.0/java/jdbc/src/org/postgresql/driverconfig.properties
/usr/src/postgis-2.1.0/java/jdbc/stubs/org/postgresql
/usr/src/postgis-2.1.0/java/jdbc/stubs/org/postgresql/Connection.java
/usr/src/postgis-2.1.0/java/jdbc/stubs/org/postgresql/PGConnection.java

None of these is suitable, so download the driver from the official site: https://jdbc.postgresql.org/download/ (postgresql-9.4-1205.jdbc4.jar)

2. Place the downloaded jar file under $SPARK_HOME/lib
3. Start the Spark shell

[hadoop@db1 bin]$ SPARK_CLASSPATH=$SPARK_HOME/lib/postgresql-9.4-1205.jdbc4.jar $SPARK_HOME/bin/spark-shell
...
Please instead use:
 - ./spark-submit with --driver-class-path to augment the driver classpath
 - spark.executor.extraClassPath to augment the executor classpath
15/11/04 17:53:05 WARN SparkConf: Setting 'spark.executor.extraClassPath' to '/usr/src/data-integration/lib/postgresql-9.3-1102-jdbc4.jar' as a work-around.
15/11/04 17:53:05 WARN SparkConf: Setting 'spark.driver.extraClassPath' to '/usr/src/data-integration/lib/postgresql-9.3-1102-jdbc4.jar' as a work-around.
15/11/04 17:53:06 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
15/11/04 17:53:07 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
Spark context available as sc.
15/11/04 17:53:09 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
15/11/04 17:53:09 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
15/11/04 17:53:25 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
15/11/04 17:53:25 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
15/11/04 17:53:28 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/11/04 17:53:29 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
15/11/04 17:53:29 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
SQL context available as sqlContext.
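As the warning in the transcript says, SPARK_CLASSPATH is deprecated. A sketch of the recommended invocation, assuming the same jar path as above, is:

```shell
# Pass the driver jar explicitly instead of relying on SPARK_CLASSPATH:
# --driver-class-path adds it to the driver's classpath,
# --jars ships it to the executors.
$SPARK_HOME/bin/spark-shell \
  --driver-class-path $SPARK_HOME/lib/postgresql-9.4-1205.jdbc4.jar \
  --jars $SPARK_HOME/lib/postgresql-9.4-1205.jdbc4.jar
```

Starting the shell this way avoids the deprecation warning and works the same for spark-submit.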

4. Create a DataFrame

scala> val df = sqlContext.load("jdbc", Map("url" -> "jdbc:postgresql://localhost:5434/cd03?user=cd03&password=cd03", "dbtable" -> "test_trans"))
warning: there were 1 deprecation warning(s); re-run with -deprecation for details
df: org.apache.spark.sql.DataFrame = [trans_date: string, trans_prd: int, trans_cust: int]

The standard (non-deprecated) form is:

val jdbcDF = sqlContext.read.format("jdbc").options(
  Map("url" -> "jdbc:postgresql:dbserver",
      "dbtable" -> "schema.tablename")).load()
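An equivalent overload of the reader API keeps the credentials out of the URL by passing a java.util.Properties object. A minimal sketch, reusing the example's cd03 host, port, and credentials:

```scala
import java.util.Properties

// Credentials and driver class passed separately from the URL
// (host/port/database/user values are the ones from the example above).
val props = new Properties()
props.setProperty("user", "cd03")
props.setProperty("password", "cd03")
props.setProperty("driver", "org.postgresql.Driver")

val df = sqlContext.read.jdbc(
  "jdbc:postgresql://localhost:5434/cd03", // JDBC URL without credentials
  "test_trans",                            // table to load
  props)
```

This produces the same DataFrame as the Map-based form, and is easier to reuse when several tables share one connection.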

5. Inspect the schema

scala> df.printSchema()
root
 |-- trans_date: string (nullable = true)
 |-- trans_prd: integer (nullable = true)
 |-- trans_cust: integer (nullable = true)

6. A simple computation

scala> df.filter(df("trans_cust")>9999999).select("trans_date","trans_prd").show
+----------+---------+
|trans_date|trans_prd|
+----------+---------+
| 2015-5-20|     2007|
| 2015-7-24|     5638|
| 2015-5-19|     8182|
| 2015-2-24|    11391|
| 2015-8-13|    17341|
| 2015-2-22|    10996|
| 2015-1-17|    15284|
|  2015-1-8|    16090|
| 2015-1-25|    13528|
| 2015-1-17|     9498|
| 2015-9-25|     7235|
| 2015-8-19|     4084|
| 2015-4-24|    16637|
| 2015-5-27|    13829|
| 2015-0-13|    13956|
| 2015-3-19|    11974|
| 2015-10-5|     1185|
| 2015-3-28|     9412|
| 2015-6-13|    15203|
| 2015-2-14|    10087|
+----------+---------+
only showing top 20 rows
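The filter above runs in Spark after the rows arrive. The same query can instead be pushed down to PostgreSQL by giving the JDBC source a parenthesized subquery as its "dbtable" option, so only matching rows cross the network. A sketch, reusing the connection settings from step 4:

```scala
// The JDBC data source treats a parenthesized, aliased subquery as a table,
// so PostgreSQL evaluates the WHERE clause before any rows reach Spark.
val bigCust = sqlContext.read.format("jdbc").options(Map(
  "url" -> "jdbc:postgresql://localhost:5434/cd03?user=cd03&password=cd03",
  "dbtable" -> "(SELECT trans_date, trans_prd FROM test_trans WHERE trans_cust > 9999999) AS big_cust"
)).load()

bigCust.show()
```

For a table this small the difference is negligible, but on large tables pushing the predicate into the database avoids transferring rows that would be discarded anyway.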