A simple project using Flume, Hive, and Sqoop

Source: Internet  Editor: 程序博客网  Time: 2024/05/16 11:53

This is an automation script. To run it on a daily schedule, configure it in crontab.
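For the crontab configuration mentioned above, one possible entry can be sketched as follows. The script path, the log path, and the 23:50 run time are all assumptions chosen for illustration; none of them come from the original post.

```shell
# Hypothetical locations; adjust to wherever the script actually lives.
SCRIPT=/home/hadoop/bbs_daily.sh
LOG=/home/hadoop/bbs_daily.log

# Field order: minute hour day-of-month month day-of-week command
# Here: every day at 23:50, appending stdout and stderr to the log file.
CRON_LINE="50 23 * * * $SCRIPT >> $LOG 2>&1"
echo "$CRON_LINE"

# To install it without losing existing entries, something like this could be used:
# (crontab -l 2>/dev/null; echo "$CRON_LINE") | crontab -
```

Redirecting output to a log file is a deliberate choice here: cron jobs have no terminal, so without it a failed Hadoop or Hive step is easy to miss.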

CURRENT=`/bin/date +%y%m%d`
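To make the value of `CURRENT` concrete, here is a small sketch that formats a fixed date the same way. It assumes GNU `date` (for the `-d` option), and the date 2024-05-16 is only an example value.

```shell
# Assumes GNU date; the fixed date below is only an example value.
EXAMPLE=$(date -d 2024-05-16 +%y%m%d)

echo "$EXAMPLE"            # prints 240516 (two-digit year, month, day)
echo "/flume/$EXAMPLE"     # raw log directory read by the cleaning job
echo "/cleaned/$EXAMPLE"   # cleaned output, later registered as a Hive partition
```

Every later step in the script derives its HDFS path, partition value, or table name from this one string.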

Data cleaning
#/usr/local/hadoop-2.4.1/bin/hadoop jar /home/hadoop/cleaner.jar /flume/$CURRENT /cleaned/$CURRENT

#/usr/local/apache-hive-0.13.0-bin/bin/hive -e "alter table bbs add partition (logdate=$CURRENT) location '/cleaned/$CURRENT'"
Data analysis
#/usr/local/apache-hive-0.13.0-bin/bin/hive -e "select count(*) from bbs where logdate = $CURRENT"

#/usr/local/apache-hive-0.13.0-bin/bin/hive -e "select count(distinct ip) from bbs where logdate = $CURRENT"

#/usr/local/apache-hive-0.13.0-bin/bin/hive -e "select count(*) from bbs where logdate = $CURRENT and instr(url, 'member.php?mod=register')>0;"

#/usr/local/apache-hive-0.13.0-bin/bin/hive -e "create table vip_$CURRENT row format delimited fields terminated by '\t' as select ip, count(*) as vtimes from bbs where logdate = $CURRENT  group by ip having vtimes >= 50 order by vtimes desc limit 20"
Export the data to a relational database
/usr/local/sqoop-1.4.4/bin/sqoop export --connect jdbc:mysql://192.168.1.100:3306/usertable --username root --password 123 --export-dir "/user/hive/warehouse/vip_$CURRENT" --table vip --fields-terminated-by '\t'
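Each step in the script depends on the one before it (cleaning, then partitioning, then analysis, then export), so under cron it can help to stop at the first failure. A minimal sketch follows; `run_step` is a hypothetical helper, not part of the original script.

```shell
# run_step is a hypothetical helper: it logs a step, runs the given command,
# and returns that command's exit status so the caller can stop on failure.
run_step() {
    name=$1
    shift
    echo "[$(date '+%F %T')] start: $name"
    if "$@"; then
        echo "[$(date '+%F %T')] done: $name"
    else
        status=$?
        echo "[$(date '+%F %T')] FAILED: $name (exit $status)" >&2
        return "$status"
    fi
}

# Possible usage with the commands from the script above:
# run_step "clean" /usr/local/hadoop-2.4.1/bin/hadoop jar /home/hadoop/cleaner.jar \
#     "/flume/$CURRENT" "/cleaned/$CURRENT" || exit 1
```

With this pattern, a failed cleaning job leaves a timestamped message in the log and prevents the later Hive and Sqoop steps from running on missing data.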