Sqoop in practice
Source: Internet | Editor: 程序博客网 | Date: 2024/06/06 09:08
This post walks through using Sqoop to transfer data from Hive to MySQL on a schedule.
Process: aggregate the log data stored in a Hive table into daily, hourly, and minute-level summaries, then sync all three result sets to MySQL.
Prerequisites: the Hive tables and the MySQL tables have already been created.
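As a sketch of what the MySQL side might look like, here is a hypothetical DDL for the daily table. The column names follow the Hive query below, but the types and the `etl_time` column name are assumptions, not from the original post. What matters is the primary key, since the export step updates rows keyed on it:

```sql
-- Hypothetical schema for the daily summary table (types are assumptions).
CREATE TABLE t_app_log_count_d (
  log_date  DATE          NOT NULL,
  pv        BIGINT,
  uv        BIGINT,
  p_u_rate  DECIMAL(10,2),
  ip        BIGINT,
  etl_time  DATETIME,               -- assumed name for the from_unixtime() column
  PRIMARY KEY (log_date)            -- required for --update-key log_date to upsert
);
```

The hourly and minute tables would presumably extend the key to `PRIMARY KEY (log_date, log_hour)` and `PRIMARY KEY (log_date, log_hour, log_minute)` respectively, matching the `--update-key` lists used in the export script.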
Wrap the runnable Sqoop and Hive statements into shell scripts.
1. Aggregation script etl.sh

```shell
#!/bin/bash
. ~/.bash_profile

today=$(date -d '-0 day' '+%Y%m%d')
yetoday=$(date -d '-1 day' '+%Y%m%d')
tomorrow=$(date -d '1 day' '+%Y%m%d')
today_d=$(date -d '-1 hour' '+%Y-%m-%d')
tomorrow_d=$(date -d '1 day' '+%Y-%m-%d')

hive <<EOF
INSERT OVERWRITE TABLE result.t_hive_log_count_d
SELECT to_date(time) AS log_date,
       COUNT(a.page_url) AS pv,
       COUNT(DISTINCT a.uuid) AS uv,
       round(COUNT(a.page_url) / COUNT(DISTINCT a.uuid), 2) AS p_u_rate,
       COUNT(DISTINCT a.ip) AS ip,
       from_unixtime(unix_timestamp(), 'yyyy-MM-dd HH:mm:ss')
FROM default.t_ft_piwik_log a
WHERE day IN ('$yetoday', '$today')
  AND time >= '$today_d' AND time < '$tomorrow_d'
GROUP BY to_date(time);

INSERT OVERWRITE TABLE result.t_hive_log_count_h
SELECT to_date(time) AS log_date,
       hour(time) AS log_hour,
       COUNT(a.page_url) AS pv,
       COUNT(DISTINCT a.uuid) AS uv,
       round(COUNT(a.page_url) / COUNT(DISTINCT a.uuid), 2) AS p_u_rate,
       COUNT(DISTINCT a.ip) AS ip,
       from_unixtime(unix_timestamp(), 'yyyy-MM-dd HH:mm:ss')
FROM default.t_ft_piwik_log a
WHERE day IN ('$yetoday', '$today')
  AND time >= '$today_d' AND time < '$tomorrow_d'
GROUP BY to_date(time), hour(time);

INSERT OVERWRITE TABLE result.t_hive_log_count_m
SELECT to_date(time) AS log_date,
       hour(time) AS log_hour,
       minute(time) AS log_minute,
       COUNT(a.page_url) AS pv,
       COUNT(DISTINCT a.uuid) AS uv,
       round(COUNT(a.page_url) / COUNT(DISTINCT a.uuid), 2) AS p_u_rate,
       COUNT(DISTINCT a.ip) AS ip,
       from_unixtime(unix_timestamp(), 'yyyy-MM-dd HH:mm:ss')
FROM default.t_ft_piwik_log a
WHERE day IN ('$yetoday', '$today')
  AND time >= '$today_d' AND time < '$tomorrow_d'
GROUP BY to_date(time), hour(time), minute(time);
EOF
exit
```

2. Transfer script hive_to_mysql.sh (Hive to MySQL). Note: the column(s) passed to --update-key must be the primary key defined on the MySQL table; otherwise incremental (upsert) updates keyed on the primary key will not work.

```shell
#!/bin/bash
. ~/.bash_profile

sqoop-export --connect jdbc:mysql://192.168.168.1:3306/edw \
  --username hive --password hive \
  --table t_app_log_count_d \
  --update-key log_date --update-mode allowinsert \
  --export-dir /user/hive/warehouse/result.db/t_hive_log_count_d \
  --input-fields-terminated-by '\001'

sqoop-export --connect jdbc:mysql://192.168.168.1:3306/edw \
  --username hive --password hive \
  --table t_app_log_count_h \
  --update-key log_date,log_hour --update-mode allowinsert \
  --export-dir /user/hive/warehouse/result.db/t_hive_log_count_h \
  --input-fields-terminated-by '\001'

sqoop-export --connect jdbc:mysql://192.168.168.1:3306/edw \
  --username hive --password hive \
  --table t_app_log_count_m \
  --update-key log_date,log_hour,log_minute --update-mode allowinsert \
  --export-dir /user/hive/warehouse/result.db/t_hive_log_count_m \
  --input-fields-terminated-by '\001'
```

3. Driver script log.sh, which runs scripts 1 and 2 in sequence:

```shell
#!/bin/bash
. ~/.bash_profile
sh /hdfs/script/piwiklog/etl.sh >> /hdfs/script/piwiklog/logs/etl.log
sh /hdfs/script/piwiklog/hive_to_mysql.sh >> /hdfs/script/piwiklog/logs/hive_to_mysql.log
```

4. Add script 3 to cron so it runs automatically at minute 6 of every hour:

```shell
# aggregate daily/hourly/minute pv and uv, then sync to MySQL
06 * * * * sh /hdfs/script/piwiklog/log.sh
```
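The date window that etl.sh builds can be sanity-checked in isolation, with no Hive or Sqoop installed. A minimal sketch (assuming GNU `date`, as the scripts above already do) that recomputes and prints the same variables:

```shell
#!/bin/bash
# Recompute the window variables exactly as etl.sh does.
today=$(date -d '-0 day' '+%Y%m%d')        # current day's partition, e.g. 20240606
yetoday=$(date -d '-1 day' '+%Y%m%d')      # previous day's partition
today_d=$(date -d '-1 hour' '+%Y-%m-%d')   # inclusive start of the query window
tomorrow_d=$(date -d '1 day' '+%Y-%m-%d')  # exclusive end of the query window

echo "partitions scanned: $yetoday, $today"
echo "time window: [$today_d, $tomorrow_d)"
```

Note the `-1 hour` offset on `today_d`: during the first hour after midnight it still resolves to the previous calendar day, which is presumably why the `WHERE day IN (...)` clause lists both partitions.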