Hive汇总统计数据自动化传输到Mysql数据库-跑批参数文本配置及提取使用

来源:互联网 发布:周琦体测数据 编辑:程序博客网 时间:2024/06/07 17:04
参照:Hive汇总统计数据自动化传输到Mysql数据库--->http://blog.csdn.net/babyfish13/article/details/72701512
本文是对【Hive汇总统计数据自动化传输到Mysql数据库】一文的补充。
1、参数文件配置及示例
/Users/nisj/Desktop/honey_report/BatchParConfig.txt
# BatchParConfig file----------------------# 测试表1ConfigName = 'user_profile'srcSelectText = """select uid,regexp_replace(nickname,'\t','') nickname,avatar,gender,source,state from xxx_user_profile limit 10000;"""mysqlConfig ='targetMysqlConfig_funnyai_data'targetTabName = 'xxxxxx_test_0523'targetTabNameDesc = '测试用表'Config[user_profile] is finished!# 测试表2ConfigName = 'payinfo'srcSelectText = """select a1.nickname,a1.room_id,a1.uid,a1.pt_day,count(today_add) today_subscriber, sum(today_add) today_subscriber_new_user,sum(case when a1.today_add=1 then a2.message_cnt else null end) message_send_in_thisRoom,sum(case when a1.today_add=1 then amount else 0 end) pay_amount <-->                            from xx0522_newadd_user a1 <-->                            left join (select roomid,uid,pt_day,count(*) message_cnt from oss_chushou_message_send where pt_day between '2017-05-15' and '2017-05-21' group by roomid,uid,pt_day) a2 on a1.room_id=int(a2.roomid) and a1.view_uid=int(a2.uid) and a1.pt_day=a2.pt_day <-->                            left join (select uid,pt_day,sum(amount) amount from oss_pay_info where pt_day between '2017-05-15' and '2017-05-21' group by uid,pt_day) a3 on a1.view_uid=int(a3.uid) and a1.pt_day=a3.pt_day <-->                            group by a1.nickname,a1.room_id,a1.uid,a1.pt_day;""" <-->mysqlConfig = 'targetMysqlConfig_funnyai_data'targetTabName = 'xxxxxx_test_0608'targetTabNameDesc = '测试用表2'Config[payinfo] is finished!

2、配置参数的提取和使用
/Users/nisj/Desktop/honey_report/BatchParGet.py
#!/usr/bin/env python# encoding: utf-8def ParGet(ConfigName):    with open("BatchParConfig.txt") as ConfigFile:        ConfigFileList = []        for ConfigLine in ConfigFile:            ConfigFileList.append(ConfigLine.replace('\n', ''))        ConfigFileList_indexStart = ConfigFileList.index('ConfigName = \'{ConfigName}\''.format(ConfigName=ConfigName))        ConfigFileList_indexEnd = ConfigFileList.index(            'Config[{ConfigName}] is finished!'.format(ConfigName=ConfigName))        # print ConfigFileList_indexStart, ConfigFileList_indexEnd        # print ConfigFileList[ConfigFileList_indexStart:ConfigFileList_indexEnd]        srcSelectText = mysqlConfig = targetTabName = targetTabNameDesc = ''        srcSelectCommand = ''        for ConfigFileList_index in range(ConfigFileList_indexStart, ConfigFileList_indexEnd):            if ' <-->' in ConfigFileList[ConfigFileList_index]:                srcSelectCommand = srcSelectCommand + str(ConfigFileList[ConfigFileList_index]).replace(' <-->', ' ')            else:                exec (ConfigFileList[ConfigFileList_index])        exec (srcSelectCommand)        return srcSelectText, mysqlConfig, targetTabName, targetTabNameDescprint ParGet(ConfigName='payinfo')

3、有关说明
这里的配置文件每一个配置以【ConfigName = '{ConfigName}'】开始,以【Config[{ConfigName}] is finished!】结尾,取配置文件的Python脚本只取两者之间的内容。
对于srcSelectText的sql多行的情况,需要在每一个的行尾用【 <-->】以标识。

此处,每一个配置主要有四项内容,srcSelectText, mysqlConfig, targetTabName, targetTabNameDesc;后期可根据需要增改。
阅读全文
0 0
原创粉丝点击