Exporting Hive table schemas with Python

Source: Internet | Published by: 数据存储的方式 | Editor: 程序博客网 | Date: 2024/05/22 05:25

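To spare ourselves an endless stream of query requests from the operations team, we decided to import the data worth querying from MySQL into Hive and let them run their own queries with HUE, an open-source tool. Since they are unlikely to know the table structures well, they also need a description of each table. So I wrote a script that queries the Hive database for the fields and types of every table and writes them to an Excel file. The code is as follows: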

#coding=utf-8
import pyhs2
from xlwt import Workbook

# Connection to HiveServer2; host, user and database are specific to our environment.
hiveconn = pyhs2.connect(host='10.46.77.120',
                         port=10000,
                         authMechanism='PLAIN',
                         user='hadoop',
                         database='hibiscus_data',
                         )

def create_excel():
    # List every table in the database, then collect its column info.
    sql = 'show tables'
    tables = []
    with hiveconn.cursor() as cursor:
        cursor.execute(sql)
        res = cursor.fetch()
        for table in res:
            tables.append(table[0])

    tableinfo = []
    for table in tables:
        tableinfo.append(get_column_info(table))
    create_excel_ex(tableinfo)

def create_excel_ex(tableinfo):
    # Write all tables into one sheet, one block per table.
    w = Workbook()
    sheet = w.add_sheet(u'表结构')  # sheet title: "table structure"
    row = 0
    for info in tableinfo:
        row = write_table_info(info, sheet, row)
    w.save('hive_schema.xls')

def write_table_info(tableinfo, sheet, row):
    print(row)
    # Table name in a merged cell spanning the three columns.
    sheet.write_merge(row, row, 0, 2, tableinfo['table'])
    row += 1
    # Header row: name, type, description (description is filled in by hand).
    sheet.write(row, 0, u'名称')
    sheet.write(row, 1, u'类型')
    sheet.write(row, 2, u'解释')
    row += 1
    fields = tableinfo['fields']
    for field in fields:
        sheet.write(row, 0, field['name'])
        sheet.write(row, 1, field['type'])
        row += 1
    # Leave one empty row before the next table.
    return row + 1

def get_column_info(table):
    sql = 'desc {table}'.format(table=table)
    info = {'table': table, 'fields': []}
    with hiveconn.cursor() as cursor:
        cursor.execute(sql)
        res = cursor.fetch()
        for item in res:
            # For partitioned tables, desc prints a blank row followed by
            # the partition section; stop at the blank row.
            if item[0] == '':
                break
            info['fields'].append({'name': item[0], 'type': item[1]})
    return info

if __name__ == '__main__':
    create_excel()
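The blank-row check in get_column_info is easier to see in isolation. The following is a minimal sketch, not part of the original script: parse_desc_rows is a hypothetical helper, and sample_rows is made-up data shaped like the rows that desc typically returns for a partitioned table. Unlike the original, it also strips whitespace and tolerates None values, which is an assumption about how the driver may return empty cells.

```python
def parse_desc_rows(rows):
    """Collect name/type pairs from DESC output, stopping at the first blank row.

    Hypothetical helper for illustration; `rows` is a list of
    [col_name, data_type, comment] lists.
    """
    fields = []
    for row in rows:
        name = (row[0] or '').strip()
        if not name:
            # Blank row: the partition information section starts after it.
            break
        fields.append({'name': name, 'type': (row[1] or '').strip()})
    return fields


# Made-up sample resembling DESC output for a table partitioned by `dt`.
sample_rows = [
    ['id', 'bigint', ''],
    ['name', 'string', ''],
    ['dt', 'string', ''],
    ['', None, None],
    ['# Partition Information', None, None],
    ['# col_name', 'data_type', 'comment'],
    ['dt', 'string', ''],
]

fields = parse_desc_rows(sample_rows)
for field in fields:
    print(field['name'], field['type'])
```

Stopping at the blank row keeps the partition column from being listed twice, since it already appears among the regular columns.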

In fact, our Hive deployment stores all of its metadata in MySQL, so the table structure can also be obtained by analyzing that metadata directly.
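As a rough sketch of the metastore approach, the query below joins the DBS, TBLS, SDS and COLUMNS_V2 tables, assuming the metastore uses the standard Hive schema (this may differ between Hive versions, so check your own metastore first). To keep the example runnable without a MySQL server, an in-memory SQLite database stands in for the metastore; the table definitions are reduced to the columns the query needs, and all sample rows are invented for illustration.

```python
import sqlite3

# Columns of every table in one Hive database, in declaration order.
# Assumes the standard metastore tables DBS, TBLS, SDS and COLUMNS_V2.
# With a MySQL driver the placeholder is usually %s instead of ?.
SCHEMA_QUERY = """
SELECT t.TBL_NAME, c.COLUMN_NAME, c.TYPE_NAME
FROM DBS d
JOIN TBLS t ON t.DB_ID = d.DB_ID
JOIN SDS s ON s.SD_ID = t.SD_ID
JOIN COLUMNS_V2 c ON c.CD_ID = s.CD_ID
WHERE d.NAME = ?
ORDER BY t.TBL_NAME, c.INTEGER_IDX
"""


def build_fake_metastore():
    """Create a tiny stand-in for the metastore (illustrative data only)."""
    conn = sqlite3.connect(':memory:')
    conn.executescript("""
        CREATE TABLE DBS (DB_ID INTEGER, NAME TEXT);
        CREATE TABLE TBLS (TBL_ID INTEGER, DB_ID INTEGER, SD_ID INTEGER, TBL_NAME TEXT);
        CREATE TABLE SDS (SD_ID INTEGER, CD_ID INTEGER);
        CREATE TABLE COLUMNS_V2 (CD_ID INTEGER, COLUMN_NAME TEXT,
                                 TYPE_NAME TEXT, INTEGER_IDX INTEGER);
    """)
    conn.execute("INSERT INTO DBS VALUES (1, 'hibiscus_data')")
    conn.executemany("INSERT INTO TBLS VALUES (?, ?, ?, ?)",
                     [(10, 1, 100, 'orders'), (11, 1, 101, 'users')])
    conn.executemany("INSERT INTO SDS VALUES (?, ?)", [(100, 1000), (101, 1001)])
    conn.executemany("INSERT INTO COLUMNS_V2 VALUES (?, ?, ?, ?)",
                     [(1000, 'amount', 'double', 1),
                      (1000, 'order_id', 'bigint', 0),
                      (1001, 'user_id', 'bigint', 0)])
    return conn


def load_schema(conn, database):
    """Return {table_name: [(column_name, type_name), ...]} for one database."""
    schema = {}
    for tbl, col, typ in conn.execute(SCHEMA_QUERY, (database,)):
        schema.setdefault(tbl, []).append((col, typ))
    return schema


schema = load_schema(build_fake_metastore(), 'hibiscus_data')
for table, columns in sorted(schema.items()):
    print(table, columns)
```

One caveat: in the standard metastore schema, partition columns are kept in a separate PARTITION_KEYS table rather than in COLUMNS_V2, so a complete export would need to read that table as well.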

0 0