Running the Hadoop 2.7.0 wordcount Example in IntelliJ IDEA


Background

       Hadoop 2.7.0 is installed on a virtual machine, and the fs.defaultFS port configured in core-site.xml is 9000.
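For reference, the corresponding property in core-site.xml would look roughly like this; the hostname master is taken from the HDFS paths used later in this post, so adjust it to your own VM:

<configuration>
    <property>
        <!-- Default filesystem URI; "master" is the VM's hostname -->
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
    </property>
</configuration>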

1 Create a Maven project
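If you prefer the command line to IntelliJ's New Project wizard, a project with the same coordinates as the pom.xml below can also be generated with Maven's quickstart archetype (the archetype choice here is just one option):

mvn archetype:generate -DgroupId=daiwei -DartifactId=hadoop.wordcount \
    -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false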


2 Configure pom.xml

The Hadoop version in my virtual machine is 2.7.0, so the Hadoop version declared in Maven must match it, otherwise you will get errors. The full configuration is as follows:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>daiwei</groupId>
    <artifactId>hadoop.wordcount</artifactId>
    <version>1.0-SNAPSHOT</version>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <configuration>
                    <source>1.6</source>
                    <target>1.6</target>
                </configuration>
            </plugin>
        </plugins>
    </build>

    <repositories>
        <repository>
            <id>apache</id>
            <url>http://maven.apache.org</url>
        </repository>
    </repositories>

    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.7.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.7.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>2.7.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-core</artifactId>
            <version>1.2.0</version>
        </dependency>
    </dependencies>
</project>

Note: hadoop-client, hadoop-common, and hadoop-hdfs are required. I originally had hadoop-core at version 1.2.1, which failed with an IPC version mismatch, because with 1.2.1 Maven pulls in hadoop-client 2.7.1. So change hadoop-core to 1.2.0 and declare hadoop-client 2.7.0 explicitly.
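If you run into a similar version mismatch, you can check which Hadoop artifacts Maven actually resolved with the standard dependency-tree goal, run from the project root:

mvn dependency:tree -Dincludes=org.apache.hadoop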

3 WordCount code

/**
 * Created by Administrator on 2017/1/16.
 */

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

public class WordCount {

    // Mapper: splits each input line into tokens and emits (word, 1)
    public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(LongWritable longWritable, Text text, OutputCollector<Text, IntWritable> outputCollector, Reporter reporter) throws IOException {
            String line = text.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                outputCollector.collect(word, one);
            }
        }
    }

    // Reducer: sums the counts collected for each word
    public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text text, Iterator<IntWritable> iterator, OutputCollector<Text, IntWritable> outputCollector, Reporter reporter) throws IOException {
            int sum = 0;
            while (iterator.hasNext()) {
                sum += iterator.next().get();
            }
            outputCollector.collect(text, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(Map.class);
        conf.setReducerClass(Reduce.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        // Input path is hardcoded; the output path comes from the first program argument
        FileInputFormat.setInputPaths(conf, new Path("hdfs://master:9000/thesis/input/"));
//        FileOutputFormat.setOutputPath(conf, new Path("hdfs://master:9000/thesis/output8"));
        FileOutputFormat.setOutputPath(conf, new Path(args[0]));

        JobClient.runJob(conf);
    }
}

Note that I have hardcoded the text input directory above. You can change it to your own directory, but make sure the directory actually exists.
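If the directory does not exist yet, it can be created and filled with the usual HDFS shell commands; sample.txt below is just a placeholder for your own input file:

hdfs dfs -mkdir -p /thesis/input
hdfs dfs -put sample.txt /thesis/input/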

4 Build the jar and run it


Open Project Structure and create a jar artifact from the project.

Note the main class field. Leaving it blank is also fine: if you fill it in, then after copying the jar to the Linux machine you can run it directly with yarn jar ***.jar /thesis/output, with no need to give the main class again; if you leave it blank, run it as yarn jar ***.jar WordCount /thesis/output.
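Once the job has finished, the word counts can be inspected directly from HDFS; part-00000 is the usual output file name for a single-reducer job with the old mapred API used above:

hdfs dfs -cat /thesis/output/part-00000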

5 Run directly from the IDE

Click the drop-down arrow in the upper right corner, choose "Edit Configurations…", then click the "+" in the upper left and select "Application".


The Main Class can be picked from a list. Since I have already hardcoded the input path, you only need to put the output path into Program arguments (it becomes args[0]); then click OK and run.

Note that this step may fail with the error "Could not locate executable null\bin\winutils.exe in the Hadoop binaries". This is caused by a missing winutils.exe: download one, extract it locally, add its directory to your environment variables, and run again.
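Concretely, on Windows this usually means pointing HADOOP_HOME at a directory whose bin subfolder contains winutils.exe, then restarting IntelliJ so the new variable is picked up; C:\hadoop below is only an assumed location:

:: bin\winutils.exe must exist under this directory
setx HADOOP_HOME C:\hadoop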

 
