Count the email types
Source: Internet | Editor: 程序博客网 | Time: 2024/06/06 01:28

Input file (/user/input/email.txt on HDFS), one address per line:
wolys@21cn.com
zss1984@126.com
294522652@qq.com
simulateboy@163.com
zhoushigang_123@163.com
sirenxing424@126.com
lixinyu23@qq.com
chenlei1201@gmail.com
370433835@qq.com
cxx0409@126.com
viv093@sina.com
q62148830@163.com
65993266@qq.com
summeredison@sohu.com
zhangbao-autumn@163.com
diduo_007@yahoo.com.cn
fxh852@163.com
weiyang1128@163.com
licaijun007@163.com
junhongshouji@126.com
wuxiaohong11111@163.com
fennal@sina.com
li_dao888@163.com
bokil.xu@163.com
362212053@qq.com
youloveyingying@yahoo.cn
boiny@126.com
linlixian200606@126.com
alex126126@126.com
654468252@qq.com
huangdaqiao@yahoo.com.cn
kitty12502@163.com
xl200811@sohu.com
ysjd8@163.com
851627938@qq.com
wubo_1225@163.com
kangtezc@163.com
xiao2018@126.com
121641873@qq.com
296489419@qq.com
beibeilong012@126.com
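
For a quick local check without Hadoop, the same per-domain count can be computed in plain base R. This is a minimal sketch, not part of the original job: it assumes each address contains exactly one `@` and treats everything after it as the domain, and `table()` stands in for the map and reduce steps. The `emails` vector is a hypothetical sample copying only a few of the addresses above.

```r
# Hypothetical sample: a few of the addresses listed above.
emails <- c("wolys@21cn.com", "zss1984@126.com", "294522652@qq.com",
            "simulateboy@163.com", "zhoushigang_123@163.com",
            "diduo_007@yahoo.com.cn", "youloveyingying@yahoo.cn")

# Extract the domain: drop everything up to and including the '@'.
domains <- sub("^[^@]*@", "", emails)

# Count addresses per domain.
counts <- table(domains)
print(counts)
```

Run over the full file (for example with `readLines()` on a local copy), this should reproduce the job output shown further below.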
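RHadoop (rmr2) job that counts addresses per domain:

library(rmr2)     # mapreduce(), keyval(), make.input.format(), from.dfs()
library(rhdfs)    # hdfs.init(), hdfs.delete()
library(stringr)  # word(), fixed()

hdfs.init()

hdfs.root <- '/user'
hdfs.data <- file.path(hdfs.root, 'input/email.txt')
hdfs.out  <- file.path(hdfs.root, 'output/30')

# Remove the output of a previous run; Hadoop will not overwrite an existing directory.
hdfs.delete(hdfs.out)

# Custom text reader: return one line per call under the constant key '1',
# and NULL once the input is exhausted.
tsv.reader <- function(con, nrecs) {
  lines <- readLines(con, 1)
  if (length(lines) == 0)
    NULL
  else
    keyval('1', lines)
}
tsv.format <- make.input.format(mode = "text", format = tsv.reader)

mr <- function(input = hdfs.data) {
  # Map: emit (domain, 1); the domain is the second '@'-separated field.
  map <- function(k, v) {
    keyval(word(as.character(v), 2, sep = fixed('@')), 1)
  }
  # Reduce: sum the 1s collected for each domain.
  reduce <- function(k, v) {
    keyval(k, sum(v))
  }
  mapreduce(input = input, input.format = tsv.format,
            output = hdfs.out, output.format = "text",
            map = map, reduce = reduce)
}

d1 <- mr(hdfs.data)
# The job wrote plain text, so read it back with the text format.
from.dfs(d1, format = "text")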
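
To see what the job does at each stage, the flow can be simulated locally in base R. This is a hedged sketch, not rmr2 itself: `kv()` is a hypothetical stand-in for `keyval()`, `read_one()` mimics `tsv.reader` by returning one line per call and `NULL` at the end, `sub()` replaces `stringr::word()`, and `split()` plays the role of Hadoop's shuffle that groups map output by key.

```r
# Hypothetical stand-in for rmr2::keyval(): a key/value pair as a list.
kv <- function(key, val) list(key = key, val = val)

# Mirrors tsv.reader: one line per call, NULL once the connection is exhausted.
read_one <- function(con) {
  line <- readLines(con, n = 1)
  if (length(line) == 0) NULL else kv("1", line)
}

# Map: emit (domain, 1), using a base-R equivalent of word(v, 2, sep = fixed('@')).
map_fn <- function(k, v) kv(sub("^[^@]*@", "", v), 1)

# Reduce: sum all values grouped under one key.
reduce_fn <- function(k, v) kv(k, sum(v))

run_local <- function(lines) {
  con <- textConnection(lines)
  on.exit(close(con))
  keys <- character(0)
  vals <- numeric(0)
  # Map phase: read records until the reader returns NULL.
  repeat {
    rec <- read_one(con)
    if (is.null(rec)) break
    out <- map_fn(rec$key, rec$val)
    keys <- c(keys, out$key)
    vals <- c(vals, out$val)
  }
  # Shuffle: group the map output values by key.
  groups <- split(vals, keys)
  # Reduce phase: one call per distinct key.
  res <- lapply(names(groups), function(k) reduce_fn(k, groups[[k]]))
  setNames(vapply(res, function(r) r$val, numeric(1)),
           vapply(res, function(r) r$key, character(1)))
}

result <- run_local(c("zss1984@126.com", "294522652@qq.com",
                      "boiny@126.com", "bokil.xu@163.com"))
print(result)
```

Because the key emitted by the reader is a constant, all the useful grouping happens on the domain key produced by the map step.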
Output (domain, number of addresses):
qq.com 9
126.com 9
163.com 14
21cn.com 1
sina.com 2
sohu.com 2
yahoo.cn 1
gmail.com 1
yahoo.com.cn 2
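
As a sanity check, the per-domain counts should add up to the number of input addresses, which is 41 lines in email.txt. The short base-R sketch below parses the output lines as shown above, assuming only that domain and count are separated by whitespace, and totals them.

```r
# The output lines above, as text: domain, whitespace, count.
out_lines <- c("qq.com 9", "126.com 9", "163.com 14", "21cn.com 1",
               "sina.com 2", "sohu.com 2", "yahoo.cn 1", "gmail.com 1",
               "yahoo.com.cn 2")

# Split each line into its domain and count fields.
fields <- strsplit(out_lines, "[[:space:]]+")
counts <- setNames(as.integer(vapply(fields, `[`, character(1), 2)),
                   vapply(fields, `[`, character(1), 1))

# Total over all domains; should equal the number of input addresses.
total <- sum(counts)
print(total)  # 41
```

The arithmetic is 9 + 9 + 14 + 1 + 2 + 2 + 1 + 1 + 2 = 41, so every input line was mapped to exactly one domain.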
Adding combine = T to the mapreduce() call raises the following error (cause still to be investigated):
2014-03-30 17:00:42,276 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201402261603_0072_m_000001_0: java.lang.RuntimeException: java.io.IOException: wrong key class: class org.apache.hadoop.io.Text is not class org.apache.hadoop.typedbytes.TypedBytesWritable
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:376)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:576)
at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:137)
at org.apache.hadoop.mapred.Task$OldCombinerRunner.combine(Task.java:1354)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1435)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1297)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:371)
at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
at org.apache.hadoop.mapred.Child.main(Child.java:253)
Caused by: java.io.IOException: wrong key class: class org.apache.hadoop.io.Text is not class org.apache.hadoop.typedbytes.TypedBytesWritable
at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:164)
at org.apache.hadoop.mapred.Task$CombineOutputCollector.collect(Task.java:1078)
at org.apache.hadoop.streaming.PipeMapRed$MROutputThread.run(PipeMapRed.java:421)
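
For context, combine = T asks rmr2 to reuse the reduce function as a combiner, which pre-aggregates each map task's output before the shuffle. That is only safe because summing counts gives the same final answer whether it is done once or in stages. The base-R sketch below illustrates that property with two hypothetical map splits; it does not reproduce or diagnose the error above.

```r
# Reduce (and combiner) logic from the job: sum the counts for one key.
reduce_sum <- function(v) sum(v)

# Hypothetical map output of two map tasks: one domain per input address.
split1 <- c("163.com", "qq.com", "163.com")
split2 <- c("126.com", "163.com", "qq.com")

# Sum a 1 for every occurrence of each key.
count_keys <- function(keys) {
  vapply(split(rep(1, length(keys)), keys), reduce_sum, numeric(1))
}

# Without a combiner: every (domain, 1) pair is shuffled, then reduced once.
no_combine <- count_keys(c(split1, split2))

# With a combiner: each task first sums its own pairs locally...
partial <- c(count_keys(split1), count_keys(split2))

# ...and the reducer then sums the partial counts per domain.
with_combine <- vapply(split(unname(partial), names(partial)),
                       reduce_sum, numeric(1))

print(no_combine)
print(with_combine)
```

Both paths yield the same per-domain totals, which is why the reducer can double as the combiner here; the combiner only reduces the volume of data sent through the shuffle.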