Running the Hadoop 2.7.0 wordcount Example in IntelliJ IDEA


Background

       Hadoop 2.7.0 is installed on a virtual machine, and the fs.defaultFS port configured in core-site.xml is 9000.
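For reference, the corresponding property in core-site.xml would look roughly like this; the hostname master is taken from the HDFS paths used later in this post, so adjust it to your own VM:

<configuration>
    <property>
        <!-- Default filesystem URI; "master" is the VM's hostname -->
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
    </property>
</configuration>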

1 Create a Maven project
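If you prefer the command line to IntelliJ's New Project wizard, a project with the same coordinates as the pom.xml below can also be generated with Maven's quickstart archetype (the archetype choice here is just one option):

mvn archetype:generate -DgroupId=daiwei -DartifactId=hadoop.wordcount \
    -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false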


2 Configure pom.xml

The Hadoop version in my virtual machine is 2.7.0, so the Hadoop version declared in Maven must match it, otherwise you will get errors. The full configuration is as follows:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>daiwei</groupId>
    <artifactId>hadoop.wordcount</artifactId>
    <version>1.0-SNAPSHOT</version>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <configuration>
                    <source>1.6</source>
                    <target>1.6</target>
                </configuration>
            </plugin>
        </plugins>
    </build>

    <repositories>
        <repository>
            <id>apache</id>
            <url>http://maven.apache.org</url>
        </repository>
    </repositories>

    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.7.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.7.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>2.7.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-core</artifactId>
            <version>1.2.0</version>
        </dependency>
    </dependencies>
</project>

Note: hadoop-client, hadoop-common, and hadoop-hdfs are required. I originally had hadoop-core at version 1.2.1, which failed with an IPC version mismatch, because with 1.2.1 Maven pulls in hadoop-client 2.7.1. So change hadoop-core to 1.2.0 and declare hadoop-client 2.7.0 explicitly.
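If you run into a similar version mismatch, you can check which Hadoop artifacts Maven actually resolved with the standard dependency-tree goal, run from the project root:

mvn dependency:tree -Dincludes=org.apache.hadoop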

3 WordCount code

/**
 * Created by Administrator on 2017/1/16.
 */

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

public class WordCount {

    // Mapper: splits each input line into tokens and emits (word, 1)
    public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(LongWritable longWritable, Text text, OutputCollector<Text, IntWritable> outputCollector, Reporter reporter) throws IOException {
            String line = text.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                outputCollector.collect(word, one);
            }
        }
    }

    // Reducer: sums the counts collected for each word
    public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text text, Iterator<IntWritable> iterator, OutputCollector<Text, IntWritable> outputCollector, Reporter reporter) throws IOException {
            int sum = 0;
            while (iterator.hasNext()) {
                sum += iterator.next().get();
            }
            outputCollector.collect(text, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(Map.class);
        conf.setReducerClass(Reduce.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        // Input path is hardcoded; the output path comes from the first program argument
        FileInputFormat.setInputPaths(conf, new Path("hdfs://master:9000/thesis/input/"));
//        FileOutputFormat.setOutputPath(conf, new Path("hdfs://master:9000/thesis/output8"));
        FileOutputFormat.setOutputPath(conf, new Path(args[0]));

        JobClient.runJob(conf);
    }
}

Note that I have hardcoded the text input directory above. You can change it to your own directory, but make sure the directory actually exists.
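If the directory does not exist yet, it can be created and filled with the usual HDFS shell commands; sample.txt below is just a placeholder for your own input file:

hdfs dfs -mkdir -p /thesis/input
hdfs dfs -put sample.txt /thesis/input/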

4 Build the jar and run it


Open Project Structure and create a jar artifact from the project.

Note the main class field. Leaving it blank is also fine: if you fill it in, then after copying the jar to the Linux machine you can run it directly with yarn jar ***.jar /thesis/output, with no need to give the main class again; if you leave it blank, run it as yarn jar ***.jar WordCount /thesis/output.
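Once the job has finished, the word counts can be inspected directly from HDFS; part-00000 is the usual output file name for a single-reducer job with the old mapred API used above:

hdfs dfs -cat /thesis/output/part-00000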

5 Run directly from the IDE

Click the drop-down arrow in the upper right corner, choose "Edit Configurations…", then click the "+" in the upper left and select "Application".


The Main Class can be picked from a list. Since I have already hardcoded the input path, you only need to put the output path into Program arguments (it becomes args[0]); then click OK and run.

Note that this step may fail with the error "Could not locate executable null\bin\winutils.exe in the Hadoop binaries". This is caused by a missing winutils.exe: download one, extract it locally, add its directory to your environment variables, and run again.
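Concretely, on Windows this usually means pointing HADOOP_HOME at a directory whose bin subfolder contains winutils.exe, then restarting IntelliJ so the new variable is picked up; C:\hadoop below is only an assumed location:

:: bin\winutils.exe must exist under this directory
setx HADOOP_HOME C:\hadoop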

 
