Implementing wordcount with RHadoop

Source: Internet  Published by: 冉启伟甘肃启航网络  Editor: 程序博客网  Time: 2024/06/03 12:12


Reference: "Big Data Analytics With R And Hadoop"


wordcount is the "Hello World" of Hadoop.


So, let's begin.


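# Assumes HADOOP_CMD and HADOOP_STREAMING are already set in the environment
library(rmr2)   # provides keyval() and mapreduce()
library(rhdfs)  # provides hdfs.init() and hdfs.put()
hdfs.init()     # rhdfs must be initialized before any hdfs.* call

wordcount = function(input, output = NULL, pattern = " ") {
  # Map: split each line into words and emit (word, 1) pairs
  wc.map = function(., lines) {
    keyval(unlist(strsplit(x = lines, split = pattern)), 1)
  }
  # Reduce: sum the counts collected for each word
  wc.reduce = function(word, counts) {
    keyval(word, sum(counts))
  }
  mapreduce(input = input, output = output, input.format = "text",
            map = wc.map, reduce = wc.reduce, combine = TRUE)
}

# Upload the local file to HDFS, then run the job on that directory
hdfs.put('/home/hadoop/桌面/word', '/RHadoop/1/')
wordcount('/RHadoop/1/')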
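
To see what the map and reduce functions compute without a Hadoop cluster, the same logic can be sketched in base R. This is only a local illustration: the sample lines are made up, and tapply stands in for the shuffle and reduce phases that rmr2 would run on the cluster.

```r
# Local, Hadoop-free sketch of the wordcount map and reduce steps.
# `lines` is made-up sample input, not data from the original post.
lines <- c("hello world", "hello hadoop")
pattern <- " "

# Map step: split every line into words; each word is paired with a count of 1.
words <- unlist(strsplit(x = lines, split = pattern))
ones <- rep(1, length(words))

# Shuffle + reduce step: group the 1s by word and sum each group.
counts <- tapply(ones, words, sum)

print(counts)
```

Note that splitting on a single space yields empty strings wherever the text has consecutive spaces, so real input may need a pattern such as " +" passed as the pattern argument.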

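What you need to do: change the paths, create a file named word at the corresponding local path, and put some English text in it. I have tested this myself. In pseudo-distributed mode, 600 MB took fourteen minutes. On a fully distributed cluster of 10 machines, 1.4 GB took three minutes and forty seconds. The environment was unstable at the time, though, so it should be faster now.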
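
For a quick trial, the input file can also be created from R itself. The following is a minimal sketch: the temporary directory and the two sample sentences are hypothetical stand-ins for your own path and text.

```r
# Write a small English sample file to use as wordcount input.
# The temp directory and the sample sentences are hypothetical.
local_path <- file.path(tempdir(), "word")
writeLines(c("the quick brown fox", "jumps over the lazy dog"), local_path)

# Sanity check: confirm the file exists and report its size in bytes.
stopifnot(file.exists(local_path))
size_bytes <- file.info(local_path)$size
print(size_bytes)
```

The resulting file can then be uploaded with hdfs.put(local_path, '/RHadoop/1/') in place of the hard-coded local path.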
