Common Pig Commands
Source: Internet. Editor: 程序博客网. Time: 2024/06/05 15:46
STORE
A = LOAD 'data' AS (a1:int,a2:int,a3:int);
DUMP A;
(1,2,3)
(4,2,1)
(8,3,4)
(4,3,3)
(7,2,5)
(8,4,3)
STORE A INTO 'myoutput' USING PigStorage ('*');
CAT myoutput;
1*2*3
4*2*1
8*3*4
4*3*3
7*2*5
8*4*3
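Data written with a custom delimiter has to be read back with the same delimiter. A minimal sketch, assuming the 'myoutput' location produced by the STORE statement above; the alias B is only illustrative:

```pig
-- Read back the file written by STORE.
-- The delimiter passed to PigStorage must match the one used when storing ('*').
B = LOAD 'myoutput' USING PigStorage('*') AS (a1:int, a2:int, a3:int);
DUMP B;
```

If the delimiter argument is omitted, PigStorage falls back to tab, and each line would be read as a single field instead of three.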
SPLIT
Partitions a relation into two or more relations.
A = LOAD 'data' AS (f1:int,f2:int,f3:int);
DUMP A;
(1,2,3)
(4,5,6)
(7,8,9)
SPLIT A INTO X IF f1<7, Y IF f2==5, Z IF (f3<6 OR f3>6);
DUMP X;
(1,2,3)
(4,5,6)
DUMP Y;
(4,5,6)
DUMP Z;
(1,2,3)
(7,8,9)
Example
In this example, the SPLIT and FILTER statements are essentially equivalent. However, SPLIT is implemented as "split the data stream and then apply filters", so it is more expensive than FILTER: Pig needs to filter and store two data streams.
SPLIT input_var INTO output_var IF (field1 is not null), ignored_var IF (field1 is null);
-- where ignored_var is not used elsewhere

output_var = FILTER input_var BY (field1 is not null);
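When one output relation should simply collect every tuple that fails the other conditions, the OTHERWISE clause documented in the Pig 0.12.1 reference can replace a hand-written complementary condition. A minimal sketch, assuming the three-tuple 'data' file from the SPLIT example above:

```pig
A = LOAD 'data' AS (f1:int, f2:int, f3:int);

-- X receives tuples with f1 < 7; Y receives the tuples that fail that test.
SPLIT A INTO X IF f1 < 7, Y OTHERWISE;

DUMP X;
DUMP Y;
```

With the sample data, X should hold (1,2,3) and (4,5,6), and Y should hold (7,8,9).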
ORDER BY
Sorts a relation based on one or more fields.
A = LOAD 'mydata' AS (x: int, y: map[]);
B = ORDER A BY x; -- this is allowed because x is a simple type
B = ORDER A BY y; -- this is not allowed because y is a complex type
B = ORDER A BY y#'id'; -- this is not allowed because y#'id' is an expression
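One way around the restriction on expressions is to project the expression into a named field first and then sort on that field. A sketch under the assumption that the 'id' value in the map can be cast to chararray; the aliases B and C and the field name id are illustrative:

```pig
A = LOAD 'mydata' AS (x:int, y:map[]);

-- Project the map lookup into a named field and cast it to a simple type.
-- Assumption: the value stored under 'id' is text-like.
B = FOREACH A GENERATE x, (chararray)y#'id' AS id;

-- id is now a simple-typed field, so ordering on it is allowed.
C = ORDER B BY id;
```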
Examples
Suppose we have relation A.
A = LOAD 'data' AS (a1:int,a2:int,a3:int);
DUMP A;
(1,2,3)
(4,2,1)
(8,3,4)
(4,3,3)
(7,2,5)
(8,4,3)
In this example relation A is sorted by the third field, a3, in descending order. Note that the order of the three tuples ending in 3 can vary.
X = ORDER A BY a3 DESC;
DUMP X;
(7,2,5)
(8,3,4)
(1,2,3)
(4,3,3)
(8,4,3)
(4,2,1)
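If the order of tied tuples matters, additional sort fields can break the ties. A minimal sketch, assuming the same six-tuple 'data' file; the choice of a1 and a2 as tie-breakers is only an illustration:

```pig
A = LOAD 'data' AS (a1:int, a2:int, a3:int);

-- Sort by a3 descending, then break ties by a1 and a2 ascending.
X = ORDER A BY a3 DESC, a1 ASC, a2 ASC;
DUMP X;
-- (7,2,5)
-- (8,3,4)
-- (1,2,3)
-- (4,3,3)
-- (8,4,3)
-- (4,2,1)
```

Because every tuple in the sample data differs in at least one of the three sort fields, the result order is fully determined.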
LIMIT
Limits the number of output tuples.
Examples
In this example the limit is expressed as a scalar.
a = load 'a.txt';
b = group a all;
c = foreach b generate COUNT(a) as sum;
d = order a by $0;
e = limit d c.sum/100;
Suppose we have relation A.
A = LOAD 'data' AS (a1:int,a2:int,a3:int);
DUMP A;
(1,2,3)
(4,2,1)
(8,3,4)
(4,3,3)
(7,2,5)
(8,4,3)
In this example output is limited to 3 tuples. Note that there is no guarantee which three tuples will be output.
X = LIMIT A 3;
DUMP X;
(1,2,3)
(4,3,3)
(7,2,5)
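To request a particular set of tuples rather than an arbitrary one, apply ORDER before LIMIT. A minimal sketch, assuming the same six-tuple 'data' file; the sort keys are chosen only for illustration:

```pig
A = LOAD 'data' AS (a1:int, a2:int, a3:int);

-- Sort first so that LIMIT keeps the top three tuples of a known ordering.
B = ORDER A BY a1 DESC, a2 ASC;
X = LIMIT B 3;
DUMP X;
-- (8,3,4)
-- (8,4,3)
-- (7,2,5)
```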
DISTINCT
Removes duplicate tuples in a relation.
Syntax
alias = DISTINCT alias [PARTITION BY partitioner] [PARALLEL n];
Terms
alias
The name of the relation.
PARTITION BY partitioner
Use this feature to specify the Hadoop Partitioner. The partitioner controls the partitioning of the keys of the intermediate map-outputs.
For more details, see http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/Partitioner.html
For usage, see Example: PARTITION BY.
PARALLEL n
Increase the parallelism of a job by specifying the number of reduce tasks, n.
For more information, see Use the Parallel Features.
Usage
Use the DISTINCT operator to remove duplicate tuples in a relation. DISTINCT does not preserve the original order of the contents (to eliminate duplicates, Pig must first sort the data). You cannot use DISTINCT on a subset of fields; to do this, use FOREACH and a nested block to first select the fields and then apply DISTINCT (see Example: Nested Block).
Example
Suppose we have relation A.
A = LOAD 'data' AS (a1:int,a2:int,a3:int);
DUMP A;
(8,3,4)
(1,2,3)
(4,3,3)
(4,3,3)
(1,2,3)
In this example all duplicate tuples are removed.
X = DISTINCT A;
DUMP X;
(1,2,3)
(4,3,3)
(8,3,4)
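Since DISTINCT works on whole tuples, removing duplicates over a subset of fields takes an extra step. The sketch below shows two possible approaches, assuming the five-tuple 'data' file from the example above; the aliases B, G, Y, and vals are illustrative:

```pig
A = LOAD 'data' AS (a1:int, a2:int, a3:int);

-- Approach 1: project the fields of interest, then remove duplicate pairs.
B = FOREACH A GENERATE a1, a2;
X = DISTINCT B;
DUMP X;
-- Expected tuples (output order is not guaranteed): (1,2) (4,3) (8,3)

-- Approach 2: a nested block that counts distinct a2 values per a1 group.
G = GROUP A BY a1;
Y = FOREACH G {
    vals = DISTINCT A.a2;
    GENERATE group AS a1, COUNT(vals) AS n;
};
DUMP Y;
```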
FILTER
Selects tuples from a relation based on some condition.
Syntax
alias = FILTER alias BY expression;
Terms
alias
The name of the relation.
BY
Required keyword.
expression
A boolean expression.
Usage
Use the FILTER operator to work with tuples or rows of data (if you want to work with columns of data, use the FOREACH...GENERATE operation).
FILTER is commonly used to select the data that you want; or, conversely, to filter out (remove) the data you don’t want.
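The boolean expression can combine several comparisons with AND, OR, and NOT. A minimal sketch, assuming 'data' holds the six tuples (1,2,3), (4,2,1), (8,3,4), (4,3,3), (7,2,5), (8,4,3) used elsewhere in this article:

```pig
A = LOAD 'data' AS (a1:int, a2:int, a3:int);

-- Keep a tuple if a1 equals 8, or if a2 + a3 is not greater than a1.
X = FILTER A BY (a1 == 8) OR (NOT (a2 + a3 > a1));
DUMP X;
-- Tuples that satisfy the condition: (4,2,1) (8,3,4) (7,2,5) (8,4,3)
```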
Examples
Suppose we have relation A.
A = LOAD 'data' AS (a1:int,a2:int,a3:int);
DUMP A;
(1,2,3)
(4,2,1)
(8,3,4)
(4,3,3)
(7,2,5)
(8,4,3)
In this example the condition states that if the third field equals 3, then include the tuple in relation X.
X = FILTER A BY a3 == 3;
DUMP X;
(1,2,3)
(4,3,3)
(8,4,3)

Reference: http://pig.apache.org/docs/r0.12.1/basic.html