Pig FOREACH Nested Blocks

Source: Internet  Published by: 溢思得瑞人工智能  Editor: 程序博客网  Date: 2024/05/21 19:46

Example: Nested Block

Suppose we have relations A and B. Note that relation B contains an inner bag.

A = LOAD 'data' AS (url:chararray,outlink:chararray);

DUMP A;
(www.ccc.com,www.hjk.com)
(www.ddd.com,www.xyz.org)
(www.aaa.com,www.cvn.org)
(www.www.com,www.kpt.net)
(www.www.com,www.xyz.org)
(www.ddd.com,www.xyz.org)

B = GROUP A BY url;

DUMP B;
(www.aaa.com,{(www.aaa.com,www.cvn.org)})
(www.ccc.com,{(www.ccc.com,www.hjk.com)})
(www.ddd.com,{(www.ddd.com,www.xyz.org),(www.ddd.com,www.xyz.org)})
(www.www.com,{(www.www.com,www.kpt.net),(www.www.com,www.xyz.org)})

In this example we perform two of the operations allowed in a nested block, FILTER and DISTINCT. Note that the last statement in the nested block must be GENERATE. Also, note the use of projection (PA = FA.outlink;).

X = FOREACH B {
    FA = FILTER A BY outlink == 'www.xyz.org';
    PA = FA.outlink;
    DA = DISTINCT PA;
    GENERATE group, COUNT(DA);
}

DUMP X;
(www.aaa.com,0)
(www.ccc.com,0)
(www.ddd.com,1)
(www.www.com,1)
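To make the per-group semantics of the nested block easier to follow, here is a rough Python sketch of the same computation. It is not Pig and not how Pig executes the script; the function names `group_by_url` and `nested_foreach` are hypothetical, the rows are copied from the DUMP A output, and the groups are sorted only so that the printed order is deterministic.

```python
from collections import defaultdict

# Rows of relation A, copied from the DUMP A output: (url, outlink)
A = [
    ("www.ccc.com", "www.hjk.com"),
    ("www.ddd.com", "www.xyz.org"),
    ("www.aaa.com", "www.cvn.org"),
    ("www.www.com", "www.kpt.net"),
    ("www.www.com", "www.xyz.org"),
    ("www.ddd.com", "www.xyz.org"),
]


def group_by_url(rows):
    """Roughly B = GROUP A BY url: map each url to its inner bag of rows."""
    bags = defaultdict(list)
    for row in rows:
        bags[row[0]].append(row)
    return bags


def nested_foreach(bags, target="www.xyz.org"):
    """Roughly the nested block: the body runs once per group."""
    result = []
    # Sorting is an assumption made here for stable output only.
    for group in sorted(bags):
        fa = [row for row in bags[group] if row[1] == target]  # FILTER A BY outlink == ...
        pa = [row[1] for row in fa]                            # PA = FA.outlink (projection)
        da = set(pa)                                           # DA = DISTINCT PA
        result.append((group, len(da)))                        # GENERATE group, COUNT(DA)
    return result


X = nested_foreach(group_by_url(A))
for row in X:
    print(row)
```

For www.ddd.com the filter keeps two identical outlinks, and the DISTINCT step collapses them into one, which is why its count is 1 rather than 2.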