Using Pig, Hadoop's Scripting Language
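# Environment variables for Pig. PIG_CLASSPATH points at the Hadoop
# configuration directory so that Pig can find the cluster settings.
export PIG_HOME=/usr/local/pig
export PATH=$PATH:$PIG_HOME/bin
export PIG_CLASSPATH=$HADOOP_HOME/conf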
The Pig script below finds the 20 IP addresses with the most hits in a web server access log. Sample records from the log:
220.181.108.151 - - [31/Jan/2012:00:02:32 +0800] "GET /home.php?mod=space&uid=158&do=album&view=me&from=space HTTP/1.1" 200 8784 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
208.115.113.82 - - [31/Jan/2012:00:07:54 +0800] "GET /robots.txt HTTP/1.1" 200 582 "-" "Mozilla/5.0 (compatible; Ezooms/1.0; ezooms.bot@gmail.com)"
220.181.94.221 - - [31/Jan/2012:00:09:24 +0800] "GET /home.php?mod=spacecp&ac=pm&op=showmsg&handlekey=showmsg_3&touid=3&pmid=0&daterange=2&pid=398&tid=66 HTTP/1.1" 200 10070 "-" "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"
112.97.24.243 - - [31/Jan/2012:00:14:48 +0800] "GET /data/cache/style_2_common.css?AZH HTTP/1.1" 200 57752 "http://f.dataguru.cn/forum-58-1.html" "Mozilla/5.0 (iPhone; CPU iPhone OS 5_0_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Mobile/9A406"
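To see roughly what a space-delimited load does with one of these records, here is a small shell sketch. It only imitates the splitting with cut and is not Pig itself; the variable names are made up for illustration. The first field is the client IP, and the second field is the literal "-" that follows it in the log.

```shell
#!/bin/sh
# One record copied from the sample log above (the robots.txt request).
line='208.115.113.82 - - [31/Jan/2012:00:07:54 +0800] "GET /robots.txt HTTP/1.1" 200 582 "-" "Mozilla/5.0 (compatible; Ezooms/1.0; ezooms.bot@gmail.com)"'

# Split on single spaces, the way a space-delimited loader would,
# and keep only the first two fields.
ip=$(printf '%s\n' "$line" | cut -d' ' -f1)
second=$(printf '%s\n' "$line" | cut -d' ' -f2)

echo "field 1: $ip"
echo "field 2: $second"
```

Only the first field matters for counting hits per IP, so the remaining fields of each record can be ignored.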
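The script, which writes its result to /feixu/log/access_out2, is:
-- Load the access log, splitting each line on spaces; only the first two fields are named.
logRecords = LOAD '/feixu/log/access_log' USING PigStorage(' ') AS (ip:chararray, link:chararray);
-- Group the records by client IP and count the records in each group.
groupRecords = GROUP logRecords BY ip;
countRecords = FOREACH groupRecords GENERATE group AS ip, COUNT(logRecords) AS count;
-- Sort by hit count, highest first, and keep the top 20 rows.
sortRecords = ORDER countRecords BY count DESC;
row20 = LIMIT sortRecords 20;
-- Store the result as tab-separated text.
STORE row20 INTO '/feixu/log/access_out2' USING PigStorage('\t');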
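For a quick sanity check on a small local copy of the log, the same group, count, sort, and limit steps can be approximated with standard shell tools. This is only a sketch for cross-checking results, not what Pig runs: the function name top_ips and the three shortened records are made up for illustration, and the real log lives in HDFS.

```shell
#!/bin/sh
# Hypothetical local sample with shortened, made-up records.
sample=$(mktemp)
cat > "$sample" <<'EOF'
220.181.108.151 - - [31/Jan/2012:00:02:32 +0800] "GET /a HTTP/1.1" 200 1
208.115.113.82 - - [31/Jan/2012:00:07:54 +0800] "GET /robots.txt HTTP/1.1" 200 582
220.181.108.151 - - [31/Jan/2012:00:09:24 +0800] "GET /b HTTP/1.1" 200 2
EOF

# top_ips FILE N: print the N most frequent first fields as ip<TAB>count.
top_ips() {
    cut -d' ' -f1 "$1" |   # keep the first space-separated field (the IP)
        sort |             # bring equal IPs together, like GROUP ... BY ip
        uniq -c |          # count each group, like COUNT(...)
        sort -rn |         # highest count first, like ORDER ... DESC
        head -n "$2" |     # keep the first N rows, like LIMIT
        awk '{ printf "%s\t%s\n", $2, $1 }'   # tab-separated ip and count
}

result=$(top_ips "$sample" 20)
printf '%s\n' "$result"
rm -f "$sample"
```

For the three sample records this prints 220.181.108.151 with a count of 2, followed by 208.115.113.82 with a count of 1.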
When the script runs, you can see that Pig generates and submits its own MapReduce jobs to do the work.