Hadoop Streaming Example: Calling Python from Hive

Source: Internet | Published by: crf分词算法 | Editor: 程序博客网 | Time: 2024/05/19 01:13

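Background:

There is a staff table (employee information) with the fields name (employee name), time (hours worked), and per_money (pay per hour).

The data is as follows:

Requirement: create a salary table (wages) with the fields name (employee name) and total_money (hours worked * pay per hour), then insert the result computed from the staff table into the salary table.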

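Solutions: 1. HQL computation

2. Hadoop Streaming

3. Hadoop MapReduce

The steps below follow approach 2, with Hive passing each row to a Python script.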

1. Create the table

create table salary (name string, total_money int)
row format delimited
fields terminated by '\t'
lines terminated by '\n';

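2. Write python.py

import sys

# Hive sends each row on stdin as tab-separated columns: name, time, per_money.
for line in sys.stdin:
    one = line.strip().split('\t')
    # Emit name and total_money (time * per_money), tab-separated.
    print("%s\t%d" % (one[0], int(one[1]) * int(one[2])))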
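Before wiring the script into Hive, the per-line calculation can be checked on its own. The following is a minimal sketch, not the original script: the function name compute_salary_line and the two sample rows are hypothetical, chosen only for illustration. It uses rstrip("\n") instead of strip() so that an empty leading column would not shift the fields.

```python
def compute_salary_line(line):
    """Turn one 'name<TAB>time<TAB>per_money' line into 'name<TAB>total_money'."""
    # Remove only the trailing newline so tab-separated columns stay aligned.
    name, hours, rate = line.rstrip("\n").split("\t")
    return "%s\t%d" % (name, int(hours) * int(rate))


# Hypothetical sample rows for illustration; these are not the original staff data.
sample_lines = ["alice\t8\t50\n", "bob\t10\t30\n"]
results = [compute_salary_line(l) for l in sample_lines]
for r in results:
    print(r)
```

With these made-up rows, the two printed lines carry the totals 400 (8 * 50) and 300 (10 * 30).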

3. Run the Hive commands

add file /opt/study/python.py;

Note: /opt/study/python.py is a path on the local file system.

from staff
insert overwrite table salary
select transform(name, time, per_money)
using 'python /opt/study/python.py'
as name, total_money;
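The transform step can be pictured as a plain text pipeline. As far as the Hive documentation describes the default behaviour, each input column is converted to a string, columns are joined with tabs, NULL is written as the literal text \N, and the script's output lines are split on tabs back into string columns. The sketch below imitates that round trip in pure Python. All function names and the sample rows are hypothetical, and the NULL handling is an addition that the original python.py does not have.

```python
NULL_MARKER = "\\N"  # the two characters backslash + N, used by Hive for NULL


def to_script_input(row):
    """Serialize one row as tab-separated strings, with NULL written as the marker."""
    return "\t".join(NULL_MARKER if v is None else str(v) for v in row)


def salary_script(line):
    """Same calculation as python.py, but emits the NULL marker for NULL inputs."""
    name, hours, rate = line.split("\t")
    if NULL_MARKER in (hours, rate):
        return "%s\t%s" % (name, NULL_MARKER)
    return "%s\t%d" % (name, int(hours) * int(rate))


def from_script_output(line):
    """Split one output line into string columns, mapping the marker back to None."""
    return tuple(None if c == NULL_MARKER else c for c in line.split("\t"))


# Hypothetical staff rows (name, time, per_money); the second has a NULL per_money.
staff_rows = [("alice", 8, 50), ("bob", 10, None)]
salary_rows = [
    from_script_output(salary_script(to_script_input(r))) for r in staff_rows
]
print(salary_rows)
```

In this simulation total_money comes back as a string; in the real query Hive converts it when writing to the int column of salary.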

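4. Query the salary table

5. Check the output against the data above; the results are correct.

6. Everything above comes from my own hands-on practice and notes, shared here for reference and study.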

0 0