Learning Hadoop (2): Word Count
Source: Internet | Editor: 程序博客网 | Time: 2024/05/21 07:06
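The previous post briefly explained Hadoop streaming and how to write a Mapper and Reducer in Python. This post goes straight to the steps and the code; a later post will cover how to do a join.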
1. Implement the mapper and reducer. The code is as follows:

    #!/usr/bin/env python
    # coding:utf-8
    """
    @author: duanmeng@outlook.com
    @file: word_count.py
    @brief: count each word in file word_count and output ('word', 'sum')
    """
    import sys


    def mapper():
        # Emit "word<TAB>1" for every word read from stdin.
        # print(...) with a single argument works on both Python 2 and 3.
        for line in sys.stdin:
            items = line.strip().strip('.').split(' ')
            for word in items:
                if word:  # skip empty tokens from blank lines or repeated spaces
                    print("%s\t%s" % (word, '1'))


    def reducer():
        # Streaming delivers the mapper output sorted by key,
        # so all lines for the same word are adjacent.
        (last_word, last_count) = (None, 0)
        for line in sys.stdin:
            item = line.strip().split('\t')
            word = item[0]
            count = item[1]
            if last_word and last_word != word:
                # The key changed: flush the finished word and start a new total.
                print("%s\t%s" % (last_word, last_count))
                last_word = word
                last_count = int(count)
            else:
                last_word = word
                last_count = last_count + int(count)
        if last_word:
            # Flush the last word.
            print("%s\t%s" % (last_word, last_count))


    if __name__ == '__main__':
        mode = sys.argv[1]  # 'm' runs the mapper, 'r' runs the reducer
        if mode == 'm':
            mapper()
        elif mode == 'r':
            reducer()
        else:
            sys.exit(1)

2. Upload word_count to HDFS:

    hadoop fs -put word_count $HDFS/test

The file contains the same sentence ten times:

    The quick brown fox jumps over the lazy dog.
    The quick brown fox jumps over the lazy dog.
    The quick brown fox jumps over the lazy dog.
    The quick brown fox jumps over the lazy dog.
    The quick brown fox jumps over the lazy dog.
    The quick brown fox jumps over the lazy dog.
    The quick brown fox jumps over the lazy dog.
    The quick brown fox jumps over the lazy dog.
    The quick brown fox jumps over the lazy dog.
    The quick brown fox jumps over the lazy dog.

3. Run the Hadoop job:

    hadoop streaming -input $HDFS/test/word_count -output $HDFS/output -mapper 'python word_count.py m' -reducer 'python word_count.py r' -file word_count.py -numReduceTasks 1

4. View the result:

    hadoop fs -cat $HDFS/output/part-00000

    The	10
    brown	10
    dog	10
    fox	10
    jumps	10
    lazy	10
    over	10
    quick	10
    the	10

Because the mapper does not change case, "The" and "the" are counted as separate words.
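Before submitting the job, it can help to check the logic locally. The sketch below is not part of the original script: it imitates the map, sort, and reduce stages in a single Python process, using hypothetical helper names (`map_words`, `reduce_counts`, `run_local`) and `itertools.groupby` in place of the hand-written "last word" bookkeeping. It assumes the same tokenization as the mapper (strip the trailing period, split on single spaces) and skips empty tokens.

```python
from itertools import groupby


def map_words(lines):
    """Yield (word, 1) pairs, mirroring the mapper's tokenization."""
    for line in lines:
        for word in line.strip().strip('.').split(' '):
            if word:  # skip empty tokens
                yield word, 1


def reduce_counts(pairs):
    """Sum the counts of adjacent equal keys; pairs must be sorted by key."""
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)


def run_local(lines):
    """Simulate map -> sort -> reduce in one process (hypothetical helper)."""
    # sorted() stands in for the sort that streaming performs between stages.
    return list(reduce_counts(sorted(map_words(lines))))


if __name__ == '__main__':
    sample = ["The quick brown fox jumps over the lazy dog."] * 10
    for word, total in run_local(sample):
        print("%s\t%d" % (word, total))
```

The sort step is what makes the reducer correct: both this sketch and the reducer in word_count.py only compare each key with the previous one, so they rely on equal words arriving next to each other.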