Incrementally transferring Oracle data to Hive with Sqoop

Source: Internet | Published by: 广州中名软件 | Editor: 程序博客网 | Time: 2024/06/05 02:15

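I came across this shell script online. It uses Sqoop to sync Oracle data into Hive incrementally once a day. It looks like it will be useful later, so I am keeping it here for reference.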
[spark@store ~]$ cat oracle2hive_imcrement.sh

#!/bin/bash

# Please set the synchronization interval; the unit is hours.
update_interval=24

# Please set the RDBMS connection params.
rdbms_connstr="jdbc:oracle:thin:@WIN-A1UAC36B1UC:1521:orcl"
rdbms_username="cfa"
rdbms_pwd="cfa"
rdbms_table="AIX_REPORT_DATA"
rdbms_columns="reportno,rowno,rowname,col2value,create_time"

# Please set the Hive params.
hive_increment_table="aix_report_data_increment"
hive_full_table="aix_report_data"

#---------------------------------------------------------
# Import the incremental data from the RDBMS into Hive.
# The date format here matches the to_date mask used in --where.
# The -d option requires GNU date.
enddate=$(date '+%Y-%m-%d %H:%M:%S')
startdate=$(date '+%Y-%m-%d %H:%M:%S' -d "-${update_interval} hours")

$SQOOP_HOME/bin/sqoop import \
  --connect "${rdbms_connstr}" \
  --username "${rdbms_username}" \
  --password "${rdbms_pwd}" \
  --table "${rdbms_table}" \
  --columns "${rdbms_columns}" \
  --where "CREATE_TIME > to_date('${startdate}','yyyy-mm-dd hh24:mi:ss') and CREATE_TIME < to_date('${enddate}','yyyy-mm-dd hh24:mi:ss')" \
  --hive-import \
  --hive-overwrite \
  --hive-table "${hive_increment_table}"

#---------------------------------------------------------
# Update the old full data table to the latest status:
# keep every row of the increment table, plus the rows of the full table
# whose (reportno, rowno) key does not appear in the increment table.
$HIVE_HOME/bin/hive -e "
insert overwrite table ${hive_full_table}
select * from ${hive_increment_table}
union all
select a.* from ${hive_full_table} a
left outer join ${hive_increment_table} b
  on a.reportno = b.reportno and a.rowno = b.rowno
where b.reportno is null;"
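
The final Hive statement in the script keeps every row of the increment table and adds only those rows of the full table whose (reportno, rowno) key is missing from the increment. To make that rule easier to see, here is a toy sketch that applies the same logic to two small CSV files with awk. The function name merge_increment and the CSV layout (key in the first two fields) are assumptions made for illustration; nothing here touches Hive.

```shell
#!/bin/bash
# Toy illustration of the merge rule, on CSV text rather than Hive tables.
# merge_increment FULL INCREMENT
#   prints all increment rows first, then the rows of FULL whose key
#   (fields 1 and 2, standing in for reportno and rowno) is absent
#   from INCREMENT.
merge_increment() {
  local full="$1" increment="$2"
  awk -F',' '
    { key = $1 FS $2 }                          # composite key of the current row
    NR == FNR { seen[key] = 1; print; next }    # first file: the increment
    !(key in seen)                              # second file: anti-join on the key
  ' "$increment" "$full"
}
```

Calling `merge_increment full.csv inc.csv` on two hypothetical files prints the merged result to standard output, with updated rows taken from the increment and untouched rows carried over from the full file.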

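The main things I learned from the script: how to use variables as parameters, another approach to handling incremental data in Hive, and how to use the --where filter in Sqoop.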
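
As a small sketch of the --where filtering idea, the function below builds the filter string for a look-back window. The name build_where and its optional second argument (a fixed "now", handy for testing) are hypothetical. It assumes GNU date, and it uses a half-open window (>= start, < end) so that consecutive windows sharing a boundary neither overlap nor skip a row; that is a choice made here, not something the original script does.

```shell
#!/bin/bash
# Hypothetical helper: build a Sqoop --where clause for the last N hours.
# Usage: build_where HOURS ["YYYY-MM-DD HH:MM:SS"]
# Assumes GNU date (-d option). CREATE_TIME is the column used in the script.
build_where() {
  local hours="$1"
  local fmt='+%Y-%m-%d %H:%M:%S'
  local end="${2:-$(date "$fmt")}"
  # Go through epoch seconds so the subtraction is unambiguous.
  local end_epoch start
  end_epoch=$(date -d "$end" '+%s')
  start=$(date -d "@$(( end_epoch - hours * 3600 ))" "$fmt")
  local mask='yyyy-mm-dd hh24:mi:ss'
  printf "CREATE_TIME >= to_date('%s','%s') and CREATE_TIME < to_date('%s','%s')\n" \
    "$start" "$mask" "$end" "$mask"
}
```

The result can be passed straight to Sqoop, for example `--where "$(build_where 24)"`.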

Also, an example of running SQL from spark-shell:
sqlContext.sql("select * from 99_dorm limit 100").show

0 0