Sqoop Usage


This post describes how to use Sqoop to move data to MySQL on a schedule.

Process: aggregate the log data stored in a Hive table into daily, hourly, and minute-level summaries, then sync the three result sets to MySQL.

Prerequisites: the Hive tables and the MySQL tables have already been created.
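For reference, here is a minimal sketch of the table definitions implied by the scripts below. The column names come from the queries and export commands, but the data types, the update_time column name, and the storage details are assumptions, not the original author's DDL. The MySQL primary key must cover the --update-key columns used in step 2, and the Hive table keeps the default '\001' field delimiter that the export expects; the hourly and minute tables simply add log_hour / log_minute to both the column list and the primary key:

-- Hive result table (daily level); a sketch, types assumed
CREATE TABLE result.t_hive_log_count_d (
  log_date    string,
  pv          bigint,
  uv          bigint,
  p_u_rate    double,
  ip          bigint,
  update_time string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001'
STORED AS TEXTFILE;

-- MySQL target table; PRIMARY KEY must match --update-key in step 2
CREATE TABLE t_app_log_count_d (
  log_date    date NOT NULL,
  pv          bigint,
  uv          bigint,
  p_u_rate    decimal(10,2),
  ip          bigint,
  update_time datetime,
  PRIMARY KEY (log_date)
);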

Wrap the executable Sqoop statements in shell scripts:

1. Aggregation script etl.sh

#!/bin/bash
. ~/.bash_profile
# Date variables: yyyyMMdd values for the day partitions,
# yyyy-MM-dd values for the time-range bounds.
today=$(date -d '-0 day' '+%Y%m%d')
yetoday=$(date -d '-1 day' '+%Y%m%d')
tomorrow=$(date -d '1 day' '+%Y%m%d')
today_d=$(date -d '-1 hour' '+%Y-%m-%d')
tomorrow_d=$(date -d '1 day' '+%Y-%m-%d')
hive<<EOF
-- Daily summary
insert overwrite table result.t_hive_log_count_d
SELECT to_date(time) AS log_date,
       COUNT(a.page_url) AS pv,
       COUNT(DISTINCT a.uuid) AS uv,
       round(COUNT(a.page_url)/COUNT(DISTINCT a.uuid),2) as p_u_rate,
       COUNT(DISTINCT a.ip) AS ip,
       from_unixtime(unix_timestamp(),'yyyy-MM-dd HH:mm:ss')
FROM default.t_ft_piwik_log a
WHERE day in ('$yetoday','$today') and time>='$today_d' and time<'$tomorrow_d'
group by to_date(time);

-- Hourly summary
insert overwrite table result.t_hive_log_count_h
SELECT to_date(time) AS log_date,
       hour(time) as log_hour,
       COUNT(a.page_url) AS pv,
       COUNT(DISTINCT a.uuid) AS uv,
       round(COUNT(a.page_url)/COUNT(DISTINCT a.uuid),2) as p_u_rate,
       COUNT(DISTINCT a.ip) AS ip,
       from_unixtime(unix_timestamp(),'yyyy-MM-dd HH:mm:ss')
FROM default.t_ft_piwik_log a
WHERE day in ('$yetoday','$today') and time>='$today_d' and time<'$tomorrow_d'
group by to_date(time),hour(time);

-- Minute summary
insert overwrite table result.t_hive_log_count_m
SELECT to_date(time) AS log_date,
       hour(time) as log_hour,
       minute(time) as log_minute,
       COUNT(a.page_url) AS pv,
       COUNT(DISTINCT a.uuid) AS uv,
       round(COUNT(a.page_url)/COUNT(DISTINCT a.uuid),2) as p_u_rate,
       COUNT(DISTINCT a.ip) AS ip,
       from_unixtime(unix_timestamp(),'yyyy-MM-dd HH:mm:ss')
FROM default.t_ft_piwik_log a
WHERE day in ('$yetoday','$today') and time>='$today_d' and time<'$tomorrow_d'
group by to_date(time),hour(time),minute(time);
EOF
exit;

2. Hive-to-MySQL export script hive_to_mysql.sh. Note: the columns passed to --update-key must be the primary key defined on the MySQL table; otherwise incremental upserts by key will not work.

#!/bin/bash
. ~/.bash_profile
sqoop-export --connect jdbc:mysql://192.168.168.1:3306/edw --username hive --password hive --table t_app_log_count_d --update-key log_date --update-mode allowinsert --export-dir /user/hive/warehouse/result.db/t_hive_log_count_d --input-fields-terminated-by '\001';
sqoop-export --connect jdbc:mysql://192.168.168.1:3306/edw --username hive --password hive --table t_app_log_count_h --update-key log_date,log_hour --update-mode allowinsert --export-dir /user/hive/warehouse/result.db/t_hive_log_count_h --input-fields-terminated-by '\001';
sqoop-export --connect jdbc:mysql://192.168.168.1:3306/edw --username hive --password hive --table t_app_log_count_m --update-key log_date,log_hour,log_minute --update-mode allowinsert --export-dir /user/hive/warehouse/result.db/t_hive_log_count_m --input-fields-terminated-by '\001';

3. Combine scripts 1 and 2 into log.sh

#!/bin/bash
. ~/.bash_profile
sh /hdfs/script/piwiklog/etl.sh >> /hdfs/script/piwiklog/logs/etl.log
sh /hdfs/script/piwiklog/hive_to_mysql.sh >> /hdfs/script/piwiklog/logs/hive_to_mysql.log

4. Add script 3 to the crontab so it runs on a schedule, at minute 6 of every hour:

# Aggregate daily/hourly/minute pv and uv and sync to MySQL
06 * * * * sh /hdfs/script/piwiklog/log.sh
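The two steps in log.sh run sequentially, so a failed Hive job would still be followed by an export of whatever is sitting in the result tables. A hardened variant of log.sh (a sketch, not from the original post; it assumes etl.sh exits non-zero when the hive command fails, which its trailing exit does here, since exit without an argument returns the status of the last command) could look like:

#!/bin/bash
. ~/.bash_profile
# Run the aggregation first, capturing stderr too for easier debugging.
sh /hdfs/script/piwiklog/etl.sh >> /hdfs/script/piwiklog/logs/etl.log 2>&1
if [ $? -ne 0 ]; then
    # Assumption: etl.sh propagates a non-zero exit code on Hive failure.
    echo "$(date '+%F %T') etl.sh failed; skipping MySQL export" >> /hdfs/script/piwiklog/logs/etl.log
    exit 1
fi
sh /hdfs/script/piwiklog/hive_to_mysql.sh >> /hdfs/script/piwiklog/logs/hive_to_mysql.log 2>&1

Either way, it is worth running sh /hdfs/script/piwiklog/log.sh once by hand and inspecting both log files before installing the crontab entry.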