Embedding a Python program in Pig
Source: Internet  Editor: 程序博客网  Time: 2024/06/01 07:45
The Python program is as follows, saved at /home/zkf/File/Pig/untitled0.py:
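import sys

# Pig streams each record to stdin as one tab-delimited line.
for line in sys.stdin:
    (c, n, s) = line.split()      # classNo, studNo, score
    if int(s) >= 60:              # keep only passing scores
        print("%s\t%s\t%s" % (c, n, s))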
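The filter above can also be written as a small function, which makes it easy to check locally before handing it to Pig. This is only a sketch: the names `filter_passing` and `threshold` are hypothetical and not part of the original script, and silently skipping malformed lines (rather than failing the job) is an assumed policy that may not suit every data set.

```python
import sys


def filter_passing(lines, threshold=60):
    """Yield tab-separated records whose score is at least `threshold`."""
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # assumed policy: skip blank or malformed lines
        try:
            passed = int(fields[2]) >= threshold
        except ValueError:
            continue  # assumed policy: skip records with a non-numeric score
        if passed:
            yield "\t".join(fields)


# Only consume stdin when data is piped in, which is how Pig feeds the script.
if __name__ == "__main__" and not sys.stdin.isatty():
    for record in filter_passing(sys.stdin):
        print(record)
```

Saved under the same file name, a script like this could be referenced from the Pig `define` statement in the same way as the original; the function form simply allows the logic to be exercised with a plain list of strings first.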
The Pig script is as follows, saved at /home/zkf/File/Pig/testPython-Pig/testPython-pig.pig:
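-- Load the colon-delimited records from HDFS with an explicit schema.
records = load '/user/student.txt' using PigStorage(':') as (classNo:chararray, studNo:chararray, score:int);
dump records;
-- Define the Python script as a streaming command and ship it with the job.
define pass `untitled0.py` SHIP('/home/zkf/File/Pig/untitled0.py');
-- Pipe every record through the script; the schema describes the script's output.
records_pass = stream records through pass as (classNo:chararray, studNo:chararray, score:int);
dump records_pass;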
The input file is shown below; it is stored in the distributed file system (HDFS) at /user/student.txt:
C01:N0101:82
C01:N0102:59
C01:N0103:65
C02:N0201:81
C02:N0202:82
C02:N0203:79
C03:N0301:56
C03:N0302:92
C03:N0306:72
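To see what the script, the Pig statements, and the input file do together without a cluster, the sketch below imitates the pipeline in plain Python (3.7 or later). It is an approximation under stated assumptions, not how Pig is implemented: `load` and `stream_through` are hypothetical helper names, the embedded `SCRIPT` is a stand-in copy of untitled0.py, and the script is launched with the local interpreter instead of being shipped to the cluster. Fields are split on ':' as `PigStorage(':')` does, each tuple is written to the script's stdin as a tab-delimited line, and the script's stdout is parsed back using the (chararray, chararray, int) schema.

```python
import os
import subprocess
import sys
import tempfile

# Sample records, in the colon-delimited layout of /user/student.txt.
SAMPLE = """\
C01:N0101:82
C01:N0102:59
C01:N0103:65
C02:N0201:81
C02:N0202:82
C02:N0203:79
C03:N0301:56
C03:N0302:92
C03:N0306:72
"""

# Stand-in copy of untitled0.py (an assumption; the real file lives on disk).
SCRIPT = r'''
import sys
for line in sys.stdin:
    (c, n, s) = line.split()
    if int(s) >= 60:
        print("%s\t%s\t%s" % (c, n, s))
'''


def load(text, delimiter=":"):
    """Imitate load ... using PigStorage(':') with a typed schema."""
    rows = (line.split(delimiter) for line in text.splitlines())
    return [(c, n, int(s)) for c, n, s in rows]


def stream_through(records, script_path):
    """Imitate stream ... through: tab-delimited lines in, typed tuples out."""
    stdin_text = "".join("\t".join(map(str, rec)) + "\n" for rec in records)
    proc = subprocess.run(
        [sys.executable, script_path],
        input=stdin_text, capture_output=True, text=True, check=True,
    )
    result = []
    for line in proc.stdout.splitlines():
        c, n, s = line.split("\t")
        result.append((c, n, int(s)))
    return result


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "untitled0.py")
    with open(path, "w") as f:
        f.write(SCRIPT)
    records = load(SAMPLE)
    records_pass = stream_through(records, path)

# Print in the same layout that Pig's dump uses for tuples.
for rec in records_pass:
    print("(%s,%s,%d)" % rec)
```

Run locally, this should print the seven records with a score of at least 60 and drop the two failing ones (N0102 with 59 and N0301 with 56), which is a quick way to validate the script's parsing before submitting the real job.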
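The execution output follows. The INFO log lines (highlighted in light yellow in the original post) can be ignored, since they only report that the job ran successfully; the tuples at the very end (highlighted in orange in the original post) are the result we need: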
2017-03-22 22:09:10,242 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: STREAMING
2017-03-22 22:09:10,243 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, DuplicateForEachColumnRewrite, GroupByConstParallelSetter, ImplicitSplitInserter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, NewPartitionFilterOptimizer, PartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter], RULES_DISABLED=[FilterLogicExpressionSimplifier]}
2017-03-22 22:09:10,247 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2017-03-22 22:09:10,248 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2017-03-22 22:09:10,248 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2017-03-22 22:09:10,252 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2017-03-22 22:09:10,252 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2017-03-22 22:09:10,253 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job287657154798300158.jar
2017-03-22 22:09:12,274 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job287657154798300158.jar created
2017-03-22 22:09:12,811 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2017-03-22 22:09:12,812 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2017-03-22 22:09:12,812 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cache
2017-03-22 22:09:12,812 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2017-03-22 22:09:12,812 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Map only job, skipping reducer estimation
2017-03-22 22:09:12,820 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2017-03-22 22:09:12,934 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2017-03-22 22:09:12,936 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2017-03-22 22:09:12,937 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2017-03-22 22:09:13,321 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201703222151_0004
2017-03-22 22:09:13,321 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases records,records_pass
2017-03-22 22:09:13,321 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: records[1,10],records[-1,-1],records_pass[7,15],records_pass[-1,-1] C: R:
2017-03-22 22:09:13,321 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://localhost:50030/jobdetails.jsp?jobid=job_201703222151_0004
2017-03-22 22:09:13,323 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2017-03-22 22:09:17,332 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2017-03-22 22:09:23,353 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2017-03-22 22:09:23,353 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
HadoopVersion  PigVersion  UserId  StartedAt  FinishedAt  Features
1.2.1  0.12.0  zkf  2017-03-22 22:09:10  2017-03-22 22:09:23  STREAMING
Success!
Job Stats (time in seconds):
JobId  Maps  Reduces  MaxMapTime  MinMapTIme  AvgMapTime  MedianMapTime  MaxReduceTime  MinReduceTime  AvgReduceTime  MedianReducetime  Alias  Feature  Outputs
job_201703222151_0004  1  0  2  2  2  2  n/a  n/a  n/a  n/a  records,records_pass  STREAMING,MAP_ONLY  hdfs://localhost:9000/tmp/temp-266321578/tmp-846389344,
Input(s):
Successfully read 9 records (474 bytes) from: "/user/student.txt"
Output(s):
Successfully stored 7 records (140 bytes) in: "hdfs://localhost:9000/tmp/temp-266321578/tmp-846389344"
Counters:
Total records written : 7
Total bytes written : 140
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_201703222151_0004
2017-03-22 22:09:23,357 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2017-03-22 22:09:23,358 [main] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2017-03-22 22:09:23,361 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2017-03-22 22:09:23,361 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
(C01,N0101,82)
(C01,N0103,65)
(C02,N0201,81)
(C02,N0202,82)
(C02,N0203,79)
(C03,N0302,92)
(C03,N0306,72)