Installing Hadoop on Windows

According to http://wiki.apache.org/hadoop/Hadoop2OnWindows, Hadoop 2.x and newer can be installed directly on Windows.


The page calls out several points to note:
  (1) Do not attempt to run the installation from within Cygwin. Cygwin is neither required nor supported. In other words, 2.x does not support Cygwin, and it does not need it either: after extracting hadoop-2.7.1.tar.gz you can see that the distribution ships both .sh scripts for Linux and .cmd scripts for Windows.
  (2) The official Apache Hadoop releases do not include Windows binaries. You therefore need the Windows native libraries, which differ between the pre-2.2 and post-2.2 releases, so download the set matching your version, e.g. from GitHub or from CSDN: http://download.csdn.net/download/kemp/8433131. Extract them into xxx\hadoop-2.7.1\bin.
  (3) The JDK must be version 1.7 or newer; hadoop-2.7.1 no longer supports JDK 1.6.
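
Before continuing, it is worth a quick check that the command prompt resolves a suitable JDK (the output should report version 1.7.0 or newer):

    java -version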


---


## With the native libraries from (2) in place and [hadoop-2.7.1.tar.gz](http://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz) downloaded, Hadoop can be installed as follows:
### 1. Create a folder such as D:\hadoop, then create four subfolders under it: code, data, deploy, and sysdata.
### 2. Extract hadoop-2.7.1.tar.gz into D:\hadoop\deploy
### 3. Example HDFS Configuration
  3.1 First edit the file hadoop-env.cmd to add the following lines near the end of the file.


    set HADOOP_PREFIX=D:\hadoop\deploy\hadoop-2.7.1
    set HADOOP_CONF_DIR=%HADOOP_PREFIX%\etc\hadoop
    set YARN_CONF_DIR=%HADOOP_CONF_DIR%
    set PATH=%PATH%;%HADOOP_PREFIX%\bin
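
Note that hadoop-env.cmd also sets JAVA_HOME from your environment. The .cmd scripts tend to choke on paths containing spaces, so if the JDK sits under C:\Program Files a common workaround is the 8.3 short form of the path. The JDK folder name below is only an assumption; adjust it to your installation:

    rem PROGRA~1 is the 8.3 short name for "C:\Program Files"; jdk1.7.0_80 is a placeholder
    set JAVA_HOME=C:\PROGRA~1\Java\jdk1.7.0_80
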
  3.2 Edit or create the file core-site.xml and make sure it has the following configuration key: 
```xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://0.0.0.0:19000</value>
  </property>
</configuration>
```
  3.3 Edit or create the file hdfs-site.xml and add the following configuration key: 
```xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```
  3.4 Finally, edit or create the file slaves and make sure it has the following entry: 


    localhost
### 4. Example YARN Configuration
  4.1 Edit or create mapred-site.xml under %HADOOP_PREFIX%\etc\hadoop and add the following entries, replacing %USERNAME% with your Windows user name. For example, my user name is BGK, so %USERNAME% becomes BGK.
```xml
<configuration>


   <property>
     <name>mapreduce.job.user.name</name>
     <value>%USERNAME%</value>
   </property>


   <property>
     <name>mapreduce.framework.name</name>
     <value>yarn</value>
   </property>


  <property>
    <name>yarn.apps.stagingDir</name>
    <value>/user/%USERNAME%/staging</value>
  </property>


  <property>
    <name>mapreduce.jobtracker.address</name>
    <value>local</value>
  </property>


</configuration>
```
  4.2 Finally, edit or create yarn-site.xml and add the following entries: 
```xml
<configuration>
  <property>
    <name>yarn.server.resourcemanager.address</name>
    <value>0.0.0.0:8020</value>
  </property>


  <property>
    <name>yarn.server.resourcemanager.application.expiry.interval</name>
    <value>60000</value>
  </property>


  <property>
    <name>yarn.server.nodemanager.address</name>
    <value>0.0.0.0:45454</value>
  </property>


  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>


  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>


  <property>
    <name>yarn.server.nodemanager.remote-app-log-dir</name>
    <value>/app-logs</value>
  </property>


  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/dep/logs/userlogs</value>
  </property>


  <property>
    <name>yarn.server.mapreduce-appmanager.attempt-listener.bindAddress</name>
    <value>0.0.0.0</value>
  </property>


  <property>
    <name>yarn.server.mapreduce-appmanager.client-service.bindAddress</name>
    <value>0.0.0.0</value>
  </property>


  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>


  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>-1</value>
  </property>


  <property>
    <name>yarn.application.classpath</name>
    <value>%HADOOP_CONF_DIR%,%HADOOP_COMMON_HOME%/share/hadoop/common/*,%HADOOP_COMMON_HOME%/share/hadoop/common/lib/*,%HADOOP_HDFS_HOME%/share/hadoop/hdfs/*,%HADOOP_HDFS_HOME%/share/hadoop/hdfs/lib/*,%HADOOP_MAPRED_HOME%/share/hadoop/mapreduce/*,%HADOOP_MAPRED_HOME%/share/hadoop/mapreduce/lib/*,%HADOOP_YARN_HOME%/share/hadoop/yarn/*,%HADOOP_YARN_HOME%/share/hadoop/yarn/lib/*</value>
  </property>
</configuration>
```
### 5. Initialize the environment variables by running etc\hadoop\hadoop-env.cmd
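
With the directory layout used here, that means running the following in the console you will keep using for the remaining steps (the variables it sets only apply to the current session):

    D:\hadoop\deploy\hadoop-2.7.1\etc\hadoop\hadoop-env.cmd
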
### 6. Format the FileSystem. Change into the D:\hadoop\deploy\hadoop-2.7.1\bin directory and run:


    hdfs namenode -format
### 7. Start HDFS Daemons. Run the following command (still in the bin directory) to start the NameNode and DataNode on localhost:
    ..\sbin\start-dfs.cmd
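
Two new console windows should open, one for the NameNode and one for the DataNode. If the JDK's bin directory is on your PATH, jps gives a quick confirmation; its output should list NameNode and DataNode next to their process IDs:

    jps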




---


## Running an example program to verify the Hadoop installation


### 1. Create the local data files
In the data directory of the local Hadoop folder prepared earlier, create a data_in folder, and inside it create two data files, file1.txt and file2.txt, each holding one sentence: "Hello world!" and "I am the king of the world!" respectively.
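
A minimal sketch of creating these files from the command prompt, following the folder layout from step 1 of the installation:

    mkdir D:\hadoop\data\data_in
    echo Hello world!> D:\hadoop\data\data_in\file1.txt
    echo I am the king of the world!> D:\hadoop\data\data_in\file2.txt
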
### 2. Upload the data files to HDFS
  2.1 Start HDFS Daemons


    ..\sbin\start-dfs.cmd
  
  2.2 Create a data_in directory on HDFS


    hdfs dfs -mkdir /data_in


  2.3 Upload the data files


    hdfs dfs -put D:\hadoop\data\data_in\*.txt /data_in
  
  2.4 Check whether the upload succeeded


    hdfs dfs -ls /data_in
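
If the upload succeeded, the listing shows both files, along these lines (the size and timestamp columns, elided here, will be filled in; the replication count of 1 matches dfs.replication above):

    Found 2 items
    -rw-r--r--   1 BGK supergroup  ...  /data_in/file1.txt
    -rw-r--r--   1 BGK supergroup  ...  /data_in/file2.txt
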
### 3. Run the wordcount example
  3.1 Start YARN Daemons


    ..\sbin\start-yarn.cmd


  3.2 Run wordcount (from the bin directory)


    yarn jar ..\share\hadoop\mapreduce\hadoop-mapreduce-examples-2.7.1.jar wordcount /data_in/ /data_out/
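
When the job finishes, the word counts are written under /data_out; the reducer output conventionally lands in a file named part-r-00000, so the result can be inspected with:

    hdfs dfs -cat /data_out/part-r-00000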




With the steps above, the task is complete. I ran into quite a few problems along the way. I started from the appendix of <<Hadoop大数据处理>>, but could not find some of the files and folders it mentions, so I turned to searching online. It turns out 2.x differs substantially from 1.x: the directory layout has changed, and Cygwin is no longer needed to run Hadoop on Windows. The official release, however, still lacks the Windows native libraries, so you have to find the set built for 2.7.1; with a mismatched set, e.g. one built for 2.2, you get Exception in thread "main" java.lang.UnsatisfiedLinkError. After much back and forth I found http://wiki.apache.org/hadoop/Hadoop2OnWindows, which lays out the setup steps in great detail, and finally got Hadoop running on Windows.
  