Hadoop 2.x in Practice: Setting Up a Local Eclipse Development Environment and Running the WordCount Example Locally


         Original work by Lin Bingwen (Evankaka). Please credit the source when reposting: http://blog.csdn.net/evankaka

        Abstract: This article explains how to run Hadoop applications from Eclipse on Windows against a remote Hadoop cluster.

Hadoop version: 2.7.2

Hadoop host OS: Ubuntu 14.04

Eclipse version: Luna

Eclipse host OS: Windows 64-bit

I. Development Environment Configuration

1. Download Hadoop on Windows

Download the Hadoop release from http://hadoop.apache.org/releases.html:


Extract it to a convenient location:


2. Install hadoop-eclipse-plugin-2.7.2.jar (the exact version depends on your Hadoop version)

Download: http://download.csdn.net/detail/tondayong1981/9432425

First, copy hadoop-eclipse-plugin-2.7.2.jar (the exact version depends on your Hadoop version) into the plugins folder of your Eclipse installation directory. If you see the following view after restarting Eclipse, the Hadoop plugin has been installed successfully:


The "Hadoop installation directory" setting should point to your Hadoop installation directory (the directory extracted in the previous step). On Windows, simply extract the downloaded hadoop-2.7.2.tar.gz somewhere and point this setting at that location.

3. Configure the connection parameters

Choose Window -> Show View -> Other...

In the dialog that opens, select Map/Reduce Locations, then right-click in that view and choose New Hadoop location.

Configure the Map/Reduce Location in Eclipse as shown below.

Note that linmaster here refers to the address of the Linux machine:

The mapping is defined in C:\Windows\System32\drivers\etc\hosts:

# Copyright (c) 1993-2009 Microsoft Corp.
#
# This is a sample HOSTS file used by Microsoft TCP/IP for Windows.
#
# This file contains the mappings of IP addresses to host names. Each
# entry should be kept on an individual line. The IP address should
# be placed in the first column followed by the corresponding host name.
# The IP address and the host name should be separated by at least one
# space.
#
# Additionally, comments (such as these) may be inserted on individual
# lines or following the machine name denoted by a '#' symbol.
#
# For example:
#
#      102.54.94.97     rhino.acme.com          # source server
#       38.25.63.10     x.acme.com              # x client host
#
# localhost name resolution is handled within DNS itself.
#127.0.0.1       localhost
#10.75.4.22      localhost
10.75.4.25       linmaster


Be careful: the address here must match the one on the Linux machine.

On Ubuntu, run ifconfig to check the address:


The Map/Reduce Master settings correspond to mapred-site.xml (note that linmaster stands for an IP address, which must be configured in etc/hosts).


The DFS Master settings correspond to core-site.xml on the Linux machine (again, linmaster stands for an IP address that must be configured in etc/hosts).
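The exact values come from your own cluster, but as a reference: given the hdfs://linmaster:9000 URI used by the WordCount program later in this article, core-site.xml on the Linux machine would contain something like the snippet below, and the DFS Master host and port in the Eclipse dialog must match it.

<property>
    <name>fs.defaultFS</name>
    <value>hdfs://linmaster:9000</value>
</property>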


IP address to host name mapping

Run vi /etc/hosts and add the mapping there:
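For the cluster used in this article, the entry added on the Linux side would look like the line below (replace the IP with whatever ifconfig reports on your machine):

10.75.4.25    linmaster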


Once all of the above is configured, start Hadoop first. Then, in the Eclipse Project Explorer, DFS Locations should expand when clicked, which means the connection succeeded.
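If you want to double-check from the shell before trusting the Eclipse view, a quick sanity check on the Linux machine (a sketch, assuming a standard Hadoop 2.7.2 layout under the installation directory) is:

sbin/start-dfs.sh      # start HDFS if it is not already running
sbin/start-yarn.sh     # start YARN
bin/hdfs dfs -ls /     # should list the HDFS root without errors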


II. Running a Hadoop Application Locally

1. Create a Maven project and add the Hadoop dependencies. The pom.xml is as follows:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>com.func</groupId>
        <artifactId>axc</artifactId>
        <version>0.0.1-SNAPSHOT</version>
    </parent>
    <artifactId>axc-hadoop</artifactId>
    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.2.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>2.2.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.2.0</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>3.8.1</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>jdk.tools</groupId>
            <artifactId>jdk.tools</artifactId>
            <version>1.6</version>
            <scope>system</scope>
            <systemPath>${JAVA_HOME}/lib/tools.jar</systemPath>
        </dependency>
    </dependencies>
</project>
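Note that the Hadoop dependencies above are version 2.2.0 while the cluster in this article runs 2.7.2; this combination happened to work for the run shown below, but if you prefer the client libraries to match the cluster you can simply bump the versions, for example:

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.7.2</version>
</dependency>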

Write a WordCount program:

First, copy the files from etc/hadoop of the Hadoop installation on Linux into the project. Make sure you copy them from the Linux machine where Hadoop is actually installed!
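The WordCount code below loads these files with conf.addResource("classpath:/hadoop/core-site.xml") and the two similar calls, so they have to end up on the classpath under a hadoop/ folder. In a standard Maven layout (an assumption; adjust if your project differs), that means placing them like this:

src/main/resources/hadoop/core-site.xml
src/main/resources/hadoop/hdfs-site.xml
src/main/resources/hadoop/mapred-site.xml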


Then write the WordCount code:

package com.lin;

import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

public class WordCount {

    public static class WordCountMapper extends MapReduceBase implements Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                output.collect(word, one); // parse each token into a (word, 1) pair and send it to the reducer
            }
        }
    }

    public static class WordCountReducer extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            int sum = 0;
            while (values.hasNext()) { // map outputs with the same key arrive at the same reducer, so accumulate them in a loop
                sum += values.next().get();
            }
            result.set(sum);
            output.collect(key, result); // write the result to HDFS
        }
    }

    public static void main(String[] args) throws Exception {
        // System.setProperty("hadoop.home.dir", "D:\\project\\hadoop-2.7.2");
        String input = "hdfs://linmaster:9000/input/LICENSE.txt"; // make sure this file exists on the Linux cluster
        String output = "hdfs://linmaster:9000/output14/";

        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("WordCount");
        conf.addResource("classpath:/hadoop/core-site.xml");
        conf.addResource("classpath:/hadoop/hdfs-site.xml");
        conf.addResource("classpath:/hadoop/mapred-site.xml");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(WordCountMapper.class);
        conf.setCombinerClass(WordCountReducer.class);
        conf.setReducerClass(WordCountReducer.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.setInputPaths(conf, new Path(input));
        FileOutputFormat.setOutputPath(conf, new Path(output));

        JobClient.runJob(conf);
        System.exit(0);
    }
}

Output:

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/D:/Java/Repository/org/slf4j/slf4j-log4j12/1.7.5/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/D:/Java/Repository/ch/qos/logback/logback-classic/1.0.13/logback-classic-1.0.13.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
16/06/19 16:12:45 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
16/06/19 16:12:45 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
16/06/19 16:12:45 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
16/06/19 16:12:46 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
16/06/19 16:12:46 WARN mapreduce.JobSubmitter: No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
16/06/19 16:12:46 INFO mapred.FileInputFormat: Total input paths to process : 1
16/06/19 16:12:46 INFO mapreduce.JobSubmitter: number of splits:1
16/06/19 16:12:46 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
16/06/19 16:12:46 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
16/06/19 16:12:46 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
16/06/19 16:12:46 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
16/06/19 16:12:46 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
16/06/19 16:12:46 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
16/06/19 16:12:46 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
16/06/19 16:12:46 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
16/06/19 16:12:46 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local536235441_0001
16/06/19 16:12:46 WARN conf.Configuration: file:/usr/hadoop/hadoop-2.7.2/tmp/mapred/staging/linbingwen536235441/.staging/job_local536235441_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
16/06/19 16:12:46 WARN conf.Configuration: file:/usr/hadoop/hadoop-2.7.2/tmp/mapred/staging/linbingwen536235441/.staging/job_local536235441_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
16/06/19 16:12:46 WARN conf.Configuration: file:/usr/hadoop/hadoop-2.7.2/tmp/mapred/local/localRunner/linbingwen/job_local536235441_0001/job_local536235441_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
16/06/19 16:12:46 WARN conf.Configuration: file:/usr/hadoop/hadoop-2.7.2/tmp/mapred/local/localRunner/linbingwen/job_local536235441_0001/job_local536235441_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
16/06/19 16:12:46 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
16/06/19 16:12:46 INFO mapred.LocalJobRunner: OutputCommitter set in config null
16/06/19 16:12:47 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapred.FileOutputCommitter
16/06/19 16:12:47 INFO mapreduce.Job: Running job: job_local536235441_0001
16/06/19 16:12:47 INFO mapred.LocalJobRunner: Waiting for map tasks
16/06/19 16:12:47 INFO mapred.LocalJobRunner: Starting task: attempt_local536235441_0001_m_000000_0
16/06/19 16:12:47 INFO util.ProcfsBasedProcessTree: ProcfsBasedProcessTree currently is supported only on Linux.
16/06/19 16:12:47 INFO mapred.Task:  Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@12e29f36
16/06/19 16:12:47 INFO mapred.MapTask: Processing split: hdfs://linmaster:9000/input/LICENSE.txt:0+15429
16/06/19 16:12:47 INFO mapred.MapTask: numReduceTasks: 1
16/06/19 16:12:47 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
16/06/19 16:12:47 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
16/06/19 16:12:47 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
16/06/19 16:12:47 INFO mapred.MapTask: soft limit at 83886080
16/06/19 16:12:47 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
16/06/19 16:12:47 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
16/06/19 16:12:47 INFO mapred.LocalJobRunner: 
16/06/19 16:12:47 INFO mapred.MapTask: Starting flush of map output
16/06/19 16:12:47 INFO mapred.MapTask: Spilling map output
16/06/19 16:12:47 INFO mapred.MapTask: bufstart = 0; bufend = 22735; bufvoid = 104857600
16/06/19 16:12:47 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26205772(104823088); length = 8625/6553600
16/06/19 16:12:48 INFO mapreduce.Job: Job job_local536235441_0001 running in uber mode : false
16/06/19 16:12:48 INFO mapreduce.Job:  map 0% reduce 0%
16/06/19 16:12:48 INFO mapred.MapTask: Finished spill 0
16/06/19 16:12:48 INFO mapred.Task: Task:attempt_local536235441_0001_m_000000_0 is done. And is in the process of committing
16/06/19 16:12:48 INFO mapred.LocalJobRunner: hdfs://linmaster:9000/input/LICENSE.txt:0+15429
16/06/19 16:12:48 INFO mapred.Task: Task 'attempt_local536235441_0001_m_000000_0' done.
16/06/19 16:12:48 INFO mapred.LocalJobRunner: Finishing task: attempt_local536235441_0001_m_000000_0
16/06/19 16:12:48 INFO mapred.LocalJobRunner: Map task executor complete.
16/06/19 16:12:48 INFO util.ProcfsBasedProcessTree: ProcfsBasedProcessTree currently is supported only on Linux.
16/06/19 16:12:48 INFO mapred.Task:  Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@6f0c29fe
16/06/19 16:12:48 INFO mapred.Merger: Merging 1 sorted segments
16/06/19 16:12:48 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 10982 bytes
16/06/19 16:12:48 INFO mapred.LocalJobRunner: 
16/06/19 16:12:48 INFO mapred.Task: Task:attempt_local536235441_0001_r_000000_0 is done. And is in the process of committing
16/06/19 16:12:48 INFO mapred.LocalJobRunner: 
16/06/19 16:12:48 INFO mapred.Task: Task attempt_local536235441_0001_r_000000_0 is allowed to commit now
16/06/19 16:12:48 INFO output.FileOutputCommitter: Saved output of task 'attempt_local536235441_0001_r_000000_0' to hdfs://linmaster:9000/output14/_temporary/0/task_local536235441_0001_r_000000
16/06/19 16:12:48 INFO mapred.LocalJobRunner: reduce > reduce
16/06/19 16:12:48 INFO mapred.Task: Task 'attempt_local536235441_0001_r_000000_0' done.
16/06/19 16:12:49 INFO mapreduce.Job:  map 100% reduce 100%
16/06/19 16:12:49 INFO mapreduce.Job: Job job_local536235441_0001 completed successfully
16/06/19 16:12:49 INFO mapreduce.Job: Counters: 32
	File System Counters
		FILE: Number of bytes read=11306
		FILE: Number of bytes written=409424
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=30858
		HDFS: Number of bytes written=8006
		HDFS: Number of read operations=13
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=4
	Map-Reduce Framework
		Map input records=289
		Map output records=2157
		Map output bytes=22735
		Map output materialized bytes=10992
		Input split bytes=91
		Combine input records=2157
		Combine output records=755
		Reduce input groups=755
		Reduce shuffle bytes=0
		Reduce input records=755
		Reduce output records=755
		Spilled Records=1510
		Shuffled Maps =0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=0
		CPU time spent (ms)=0
		Physical memory (bytes) snapshot=0
		Virtual memory (bytes) snapshot=0
		Total committed heap usage (bytes)=514588672
	File Input Format Counters 
		Bytes Read=15429
	File Output Format Counters 
		Bytes Written=8006
This shows the program ran successfully!
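To see the actual word counts, read the output directory back from HDFS on the Linux machine. With a single reducer, the old mapred API writes one part file, so something like the following should print the results:

bin/hadoop fs -cat /output14/part-00000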

III. Problems and Solutions

Before the program runs successfully, you will almost certainly hit some of the following problems:

1.
org.apache.hadoop.security.AccessControlException: Permission denied: user=linbingwen, access=WRITE, inode="/":hadoop:supergroup:drwxr-xr-x
Solution:
Add the following property to hdfs-site.xml:

<property>
    <name>dfs.permissions</name>
    <value>false</value>
</property>
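Note that dfs.permissions is read by the NameNode, so the property has to go into hdfs-site.xml on the Linux machine and HDFS must be restarted (for example with sbin/stop-dfs.sh followed by sbin/start-dfs.sh) before it takes effect. If you would rather not disable permission checks, an alternative (a sketch, assuming the superuser hadoop shown in the error message above) is to submit the job as that user by setting an environment variable in the Eclipse run configuration:

HADOOP_USER_NAME=hadoop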
2.

org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException): Cannot create directory /output12/_temporary/0. Name node is in safe mode.
The reported blocks 17 has reached the threshold 0.9990 of total blocks 17. The number of live datanodes 1 has reached the minimum number 0. In safe mode extension. Safe mode will be turned off automatically in 4 seconds.

Solution:

bin/hadoop dfsadmin -safemode leave

3.

Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unable to locate a login configuration)
Solution:
Add the following to the hosts file under C:\Windows\System32\drivers\etc (the entry varies depending on your Linux machine's configuration):
10.75.4.25    linmaster

4.
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
Analysis:
Hadoop 2.x release downloads do not include winutils.exe in the bin directory.
Solution:
Download hadoop-common-2.2.0-bin-master.zip from https://codeload.github.com/srccodes/hadoop-common-2.2.0-bin/zip/master and extract it.
Then copy everything under the bin folder of hadoop-common-2.2.0-bin-master into the bin directory of your Hadoop 2 installation, as shown below:

Also set the HADOOP_HOME path. If HADOOP_HOME is empty, fullExeName will inevitably resolve to null\bin\winutils.exe. The fix is simply to configure the environment variable:
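As a sketch, assuming Hadoop was extracted to D:\project\hadoop-2.7.2 (the path that appears in the commented-out line of the WordCount code above), the Windows environment variables would be:

HADOOP_HOME=D:\project\hadoop-2.7.2
Path=%Path%;%HADOOP_HOME%\bin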



Note: you must restart the computer for the change to take effect.

If you don't want to restart the computer, you can add this line to your program instead:
System.setProperty("hadoop.home.dir", "E:\\Program Files\\hadoop-2.7.0");
Note: E:\\Program Files\\hadoop-2.7.0 is the path where Hadoop is extracted on my machine.

5.
Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
Analysis:
hadoop.dll is missing from C:\Windows\System32; copying the file there fixes it.
Solution:
Copy hadoop.dll from the bin folder of hadoop-common-2.2.0-bin-master into C:\Windows\System32. In my case this worked without restarting the computer.

