Scheduled Export of Hive Table Data to an RDB with Oozie, Hive, and Sqoop
Source: Internet · 程序博客网 · 2024/06/05 05:27
1. Overall Flow
Oozie launches a Hive job on a schedule, which exports the required Hive table data to HDFS; Sqoop is then started to load that HDFS data into the target RDB table.
2. The HQL Export Statement (adapt to your specific requirements)
Omitted.
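The author omits the HQL. Purely as an illustration, an export of this kind is typically written as an `INSERT OVERWRITE DIRECTORY` statement; the table, columns, and target path below are hypothetical:

```sql
-- Hypothetical sketch only: write a query result to an HDFS directory.
-- With older Hive versions the output uses Hive's default \x01 (Ctrl-A)
-- field delimiter, which the later Sqoop export step must match.
INSERT OVERWRITE DIRECTORY '/data/export_buffer/t_example'
SELECT id, metric, stat_date
FROM t_example_source
WHERE stat_date = '2015-01-01';
```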
3. Exporting Data to the RDB with Sqoop
sqoop export --connect jdbc:mysql://moma03:3306/test --username hive --password password --table table --export-dir dir --input-fields-terminated-by '\001'
Because the Hive version in use is too old to allow specifying the column delimiter of the exported data, Hive falls back to its default delimiter \x01; therefore, when exporting with Sqoop, you must pass --input-fields-terminated-by '\001'.
Note: when loading data into a Hive table that uses the default delimiter, specify the delimiter as \x01 rather than \001.
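For reference, the delimiter in question is Hive's default field separator Ctrl-A (hex \x01, octal \001). A hypothetical table that declares it explicitly in the DDL looks like this:

```sql
-- Hypothetical table using Hive's default Ctrl-A field delimiter,
-- written here with the octal escape accepted in Hive DDL.
CREATE TABLE t_example_buffer (
  id        BIGINT,
  metric    DOUBLE,
  stat_date STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001'
STORED AS TEXTFILE;
```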
4. Caveats
First, switch Hive's metastore backend from Derby to MySQL. While doing so, take special care that the host addresses in the configuration are real remote addresses rather than localhost, because Oozie launches the Hive task on whichever node it chooses; with an incorrect host address, the job will succeed on some runs and fail on others.
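To illustrate the remote-address point, the MySQL metastore connection in hive-site.xml should name the actual host so that any node Oozie picks can reach it. Hostname and credentials below are placeholders:

```xml
<!-- Sketch of the relevant hive-site.xml properties; the host, user,
     and password are placeholders. A resolvable remote hostname
     (never localhost) lets any Oozie-chosen node reach the metastore. -->
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://metastore-host:3306/hive_meta?createDatabaseIfNotExist=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>password</value>
</property>
```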
The Sqoop invocation can be written in several ways in Oozie; a common one is to pass the command either through the <command> element or through <arg> elements, as follows:
<command>import --connect jdbc:hsqldb:file:db.hsqldb --table TT --target-dir hdfs://localhost:8020/user/tucu/foo -m 1</command>
or
<arg>import</arg> <arg>--connect</arg> <arg>jdbc:hsqldb:file:db.hsqldb</arg> …
5. XML
workflow.xml
<workflow-app xmlns="uri:oozie:workflow:0.2" name="exportDataTRDB">
    <start to="importDefaultSegs" />
    <action name="importDefaultSegs">
        <hive xmlns="uri:oozie:hive-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <job-xml>/biginsights/oozie/sharedLibraries/hive/conf/hive-site.xml</job-xml>
            <configuration>
                <property>
                    <name>mapred.compress.map.output</name>
                    <value>true</value>
                </property>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
                <property>
                    <name>oozie.hive.defaults</name>
                    <value>/biginsights/oozie/sharedLibraries/hive/hive-default.xml</value>
                </property>
            </configuration>
            <script>adhoc.txt</script>
            <param>SCRIPT=${script1}</param>
        </hive>
        <ok to="cleanupTempBuffer" />
        <error to="fail" />
    </action>
    <action name="cleanupTempBuffer">
        <fs>
            <delete path="hdfs://moma03:9000/data/contextualtrajsegmentstemp/" />
            <mkdir path="hdfs://moma03:9000/data/contextualtrajsegmentstemp/" />
        </fs>
        <ok to="segsPivot" />
        <error to="fail" />
    </action>
    <action name="segsPivot">
        <hive xmlns="uri:oozie:hive-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <job-xml>/biginsights/oozie/sharedLibraries/hive/conf/hive-site.xml</job-xml>
            <configuration>
                <property>
                    <name>mapred.compress.map.output</name>
                    <value>true</value>
                </property>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
                <property>
                    <name>oozie.hive.defaults</name>
                    <value>/biginsights/oozie/sharedLibraries/hive/hive-default.xml</value>
                </property>
            </configuration>
            <script>adhoc.txt</script>
            <param>SCRIPT=${script}</param>
        </hive>
        <ok to="export2RDB" />
        <error to="fail" />
    </action>
    <action name="export2RDB">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.compress.map.output</name>
                    <value>true</value>
                </property>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
                <property>
                    <name>oozie.hive.defaults</name>
                    <value>/biginsights/oozie/sharedLibraries/hive/hive-default.xml</value>
                </property>
            </configuration>
            <arg>export</arg>
            <arg>--connect</arg>
            <arg>${connect}</arg>
            <arg>--username</arg>
            <arg>${username}</arg>
            <arg>--password</arg>
            <arg>${password}</arg>
            <arg>--input-fields-terminated-by</arg>
            <arg>\001</arg>
            <arg>--table</arg>
            <arg>${table}</arg>
            <arg>--export-dir</arg>
            <arg>/user/hive/warehouse/moma.db/t_mart_airqualitystatisticstemp</arg>
        </sqoop>
        <ok to="end" />
        <error to="fail" />
    </action>
    <kill name="fail">
        <message>hive failed, errormessage[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end" />
</workflow-app>
coordinator.xml
<coordinator-app name="exportDateToRDB" frequency="${coord:days(1)}"
                 start="${start}" end="${end}" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.2">
    <action>
        <workflow>
            <app-path>${workflowAppUri}</app-path>
            <configuration>
                <property>
                    <name>jobTracker</name>
                    <value>${jobTracker}</value>
                </property>
                <property>
                    <name>nameNode</name>
                    <value>${nameNode}</value>
                </property>
                <property>
                    <name>queueName</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
        </workflow>
    </action>
</coordinator-app>
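The parameters referenced above (${jobTracker}, ${nameNode}, ${start}, ${end}, ${workflowAppUri}, and the Sqoop ${connect}/${username}/${password}/${table}) are typically supplied through a job.properties file when submitting the coordinator. A sketch with placeholder values (paths and dates are assumptions, not from the original post):

```properties
# Placeholder values; adjust hosts, ports, paths and dates to your cluster.
nameNode=hdfs://moma03:9000
jobTracker=moma03:9001
queueName=default
start=2015-01-01T00:00Z
end=2015-12-31T00:00Z
workflowAppUri=${nameNode}/user/oozie/apps/exportDataTRDB
oozie.coord.application.path=${nameNode}/user/oozie/apps/exportDataTRDB
connect=jdbc:mysql://moma03:3306/test
username=hive
password=password
table=t_mart_airqualitystatistics
```

The coordinator would then be submitted with the standard Oozie CLI, e.g. `oozie job -oozie http://<oozie-host>:11000/oozie -config job.properties -run`.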