Count the email types by domain

Input data (/user/input/email.txt):
wolys@21cn.com
zss1984@126.com
294522652@qq.com
simulateboy@163.com
zhoushigang_123@163.com
sirenxing424@126.com
lixinyu23@qq.com
chenlei1201@gmail.com
370433835@qq.com
cxx0409@126.com
viv093@sina.com
q62148830@163.com
65993266@qq.com
summeredison@sohu.com
zhangbao-autumn@163.com
diduo_007@yahoo.com.cn
fxh852@163.com
weiyang1128@163.com
licaijun007@163.com
junhongshouji@126.com
wuxiaohong11111@163.com
fennal@sina.com
li_dao888@163.com
bokil.xu@163.com
362212053@qq.com
youloveyingying@yahoo.cn
boiny@126.com
linlixian200606@126.com
alex126126@126.com
654468252@qq.com
huangdaqiao@yahoo.com.cn
kitty12502@163.com
xl200811@sohu.com
ysjd8@163.com
851627938@qq.com
wubo_1225@163.com
kangtezc@163.com
xiao2018@126.com
121641873@qq.com
296489419@qq.com
beibeilong012@126.com
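
The job below assumes this list is already stored in HDFS at /user/input/email.txt. If it only exists as a local file (assumed here to be named email.txt), it can be copied into place with rhdfs first, roughly like this:

library(rhdfs)
hdfs.init()

# Copy the local address list into HDFS, using the same paths as the job below
hdfs.mkdir('/user/input')
hdfs.put('email.txt', '/user/input/email.txt')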


library(rmr2)
library(rhdfs)
library(stringr)   # word() and fixed() are used to split addresses on '@'
hdfs.init()

# Remove any previous output directory
hdfs.delete('/user/output/30')

# Custom input format: read the input one line at a time, one record per line
tsv.reader <- function(con, nrecs) {
  lines <- readLines(con, 1)
  if (length(lines) == 0)
    NULL
  else
    keyval('1', lines)
}
tsv.format <- make.input.format(mode = "text", format = tsv.reader)

hdfs.root <- '/user'
hdfs.data <- file.path(hdfs.root, 'input/email.txt')
hdfs.out  <- file.path(hdfs.root, 'output/30')

mr <- function(input = hdfs.data) {
  # map: emit the domain (the part after '@') with a count of 1
  map <- function(k, v) {
    keyval(word(as.character(v), 2, sep = fixed('@')), 1)
  }
  # reduce: sum the counts for each domain
  reduce <- function(k, v) {
    keyval(k, sum(v))
  }
  mapreduce(input = input, input.format = tsv.format,
            output.format = "text", output = hdfs.out,
            map = map, reduce = reduce)
}

d1 <- mr(hdfs.data)
from.dfs(d1)
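
To sanity-check the domain counts without a Hadoop cluster, the same logic can be reproduced in plain R; email.txt here is assumed to be a local copy of the input file:

# Strip everything up to and including '@', then tabulate the domains
emails  <- readLines('email.txt')
domains <- sub('^[^@]+@', '', emails)
sort(table(domains), decreasing = TRUE)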


output:
qq.com  9
126.com 9
163.com 14
21cn.com        1
sina.com        2
sohu.com        2
yahoo.cn        1
gmail.com       1
yahoo.com.cn    2


If combine = TRUE is added to the mapreduce() call, the job fails with the following error (cause still to be investigated):

2014-03-30 17:00:42,276 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201402261603_0072_m_000001_0: java.lang.RuntimeException: java.io.IOException: wrong key class: class org.apache.hadoop.io.Text is not class org.apache.hadoop.typedbytes.TypedBytesWritable
        at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:376)
        at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:576)
        at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:137)
        at org.apache.hadoop.mapred.Task$OldCombinerRunner.combine(Task.java:1354)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1435)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1297)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:371)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
        at org.apache.hadoop.mapred.Child.main(Child.java:253)
Caused by: java.io.IOException: wrong key class: class org.apache.hadoop.io.Text is not class org.apache.hadoop.typedbytes.TypedBytesWritable
        at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:164)
        at org.apache.hadoop.mapred.Task$CombineOutputCollector.collect(Task.java:1078)
        at org.apache.hadoop.streaming.PipeMapRed$MROutputThread.run(PipeMapRed.java:421)
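
The cause has not been confirmed, but the message suggests the combiner's output is being written as Text while the map-side spill expects rmr2's typedbytes records. A speculative workaround, not tested here, is to drop output.format = "text" so the whole pipeline stays in the native typedbytes format and read the result back with from.dfs():

# Untested sketch: same job with a combiner, keeping the native output format
library(stringr)  # for word()/fixed()
map    <- function(k, v) keyval(word(as.character(v), 2, sep = fixed('@')), 1)
reduce <- function(k, v) keyval(k, sum(v))

d2 <- from.dfs(mapreduce(input = hdfs.data, input.format = tsv.format,
                         map = map, reduce = reduce, combine = TRUE))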
