Oozie配合Hive、Sqoop定时导出Hive表数据至RDB

来源:互联网 发布:小型组态软件 编辑:程序博客网 时间:2024/06/05 05:27

Oozie配合HiveSqoop完成定时导出Hive表数据至RDB

一、整体流程 

Oozie定时启动Hive程序,将Hive表数据根据需求导出至HDFS,而后启动Sqoop,将HDFS数据导入至RDB目标表。

二、HQL数据导出语句(需根据具体需求再行修改)

 略过

三、Sqoop导入数据至RDB

sqoop export --connect jdbc:mysql://moma03:3306/test --username hive --password password --table table --export-dir dir --input-fields-terminated-by '\001'

因所用hive版本过低,无法指定导出数据的column分隔符,只能使用其默认分隔符\x01,所以在使用Sqoop导数据时,需指定--input-fields-terminated-by '\001'

另:在使用load数据至hive表时,如使用默认分隔符,需指定为 \x01而不是 \001

四、注意事项

首先要将HiveMeta数据将Derby修改为Mysql。在这个过程中,需要特别注意配置等host地址需为远程地址,而不是localhost等,因为Oozie会在不定的节点上启动Hive任务,如果host地址不正确,将出现项目时而成功时而失败的现象。

OozieSqoop的执行命令有多种写法,其一较为常见的是将命令传入command标签和<arg>标签,如下

     <command>import  --connect jdbc:hsqldb:file:db.hsqldb --table TT --target-dir hdfs://localhost:8020/user/tucu/foo -m 1</command>     

<span style="font-family: Arial, Helvetica, sans-serif;"></span><pre name="code" class="html"><arg>import</arg>            <arg>--connect</arg>           <arg>jdbc:hsqldb:file:db.hsqldb</arg>     
…………………………

五、xml

<pre name="code" class="html"><workflow-app xmlns="uri:oozie:workflow:0.2" name="exportDataTRDB"><start to="importDefaultSegs" /><action name="importDefaultSegs"><hive xmlns="uri:oozie:hive-action:0.2"><job-tracker>${jobTracker}</job-tracker><name-node>${nameNode}</name-node><job-xml>/biginsights/oozie/sharedLibraries/hive/conf/hive-site.xml</job-xml><configuration><property><name>mapred.compress.map.output</name><value>true</value></property><property><name>mapred.job.queue.name</name><value>${queueName}</value></property><property><name>oozie.hive.defaults</name><value>/biginsights/oozie/sharedLibraries/hive/hive-default.xml</value></property></configuration><script>adhoc.txt</script><param>SCRIPT=${script1}</param></hive><ok to="cleanupTempBuffer" /><error to="fail" /></action><action name="cleanupTempBuffer"> <fs>            <delete path="hdfs://moma03:9000/data/contextualtrajsegmentstemp/"/>            <mkdir path="hdfs://moma03:9000/data/contextualtrajsegmentstemp/"/>        </fs><ok to="segsPivot" /><error to="fail" /></action><action name="segsPivot"><hive xmlns="uri:oozie:hive-action:0.2"><job-tracker>${jobTracker}</job-tracker><name-node>${nameNode}</name-node><job-xml>/biginsights/oozie/sharedLibraries/hive/conf/hive-site.xml</job-xml><configuration><property><name>mapred.compress.map.output</name><value>true</value></property><property><name>mapred.job.queue.name</name><value>${queueName}</value></property><property><name>oozie.hive.defaults</name><value>/biginsights/oozie/sharedLibraries/hive/hive-default.xml</value></property></configuration><script>adhoc.txt</script><param>SCRIPT=${script}</param></hive><ok to="export2RDB" /><error to="fail" /></action><action name="export2RDB"><sqoop xmlns="uri:oozie:sqoop-action:0.2"><job-tracker>${jobTracker}</job-tracker><name-node>${nameNode}</name-node><configuration><property><name>mapred.compress.map.output</name><value>true</value></property><property><name>mapred.job.queue.name</name><value>${queueName}</value></property><property><name>oozie.hive.defaults</name><value>/biginsights/oozie/sharedLibraries/hive/hive-default.xml</value></property></configuration><arg>export</arg><arg>--connect</arg><arg>${connect}</arg><arg>--username</arg><arg>${username}</arg><arg>--password</arg><arg>${password}</arg><arg>--input-fields-terminated-by</arg><arg>\001</arg><arg>--table</arg> <arg>${table}</arg><arg>--export-dir</arg> <arg>/user/hive/warehouse/moma.db/t_mart_airqualitystatisticstemp</arg></sqoop><ok to="end" /><error to="fail" /></action><kill name="fail"><message>hive failed, errormessage[${wf:errorMessage(wf:lastErrorNode())}]</message></kill><end name='end' /></workflow-app>





coordinator.xml

<!--  Licensed to the Apache Software Foundation (ASF) under one  or more contributor license agreements.  See the NOTICE file  distributed with this work for additional information  regarding copyright ownership.  The ASF licenses this file  to you under the Apache License, Version 2.0 (the  "License"); you may not use this file except in compliance  with the License.  You may obtain a copy of the License at         http://www.apache.org/licenses/LICENSE-2.0    Unless required by applicable law or agreed to in writing, software  distributed under the License is distributed on an "AS IS" BASIS,  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.  See the License for the specific language governing permissions and  limitations under the License.--><coordinator-app name="exportDateToRDB" frequency="${coord:days(1)}" start="${start}" end="${end}" timezone="UTC"                 xmlns="uri:oozie:coordinator:0.2">        <action>        <workflow>            <app-path>${workflowAppUri}</app-path>            <configuration>                <property>                    <name>jobTracker</name>                    <value>${jobTracker}</value>                </property>                <property>                    <name>nameNode</name>                    <value>${nameNode}</value>                </property>                <property>                    <name>queueName</name>                    <value>${queueName}</value>                </property>            </configuration>        </workflow>    </action></coordinator-app>


 

0 0
原创粉丝点击