Hadoop Cluster Setup


Because Spark needs Hadoop's HDFS file system, Hadoop has to be installed on top of the existing setup once Spark is in place.

  1. Configure the IP addresses, JDK, hostnames, and so on. These were already set up earlier, so nothing needs to be done here. (For details see the previous post on setting up the Spark cluster: http://blog.csdn.net/mr_sunrise/article/details/74942660)

  2. Configure the Hadoop files:

Extract the Hadoop archive and move it into your own directory. (Make sure to download the binary package; otherwise you will have to compile it yourself, which is a hassle.)
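A minimal sketch of this step, assuming the hadoop-2.7.3 binary tarball sits in the current directory and /app/spark (used throughout this post) is the target location:

tar -xzf hadoop-2.7.3.tar.gz        # binary distribution, no compilation needed
mv hadoop-2.7.3 /app/spark/         # HADOOP_HOME will point here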

Add the Hadoop paths to your environment by editing ~/.bashrc (for example with gedit ~/.bashrc) and appending:

export HADOOP_HOME=/app/spark/hadoop-2.7.3
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/lib
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native
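The new variables only take effect in a fresh shell; a small sketch of applying and sanity-checking them, based on the export lines above:

source ~/.bashrc
echo $HADOOP_HOME    # should print /app/spark/hadoop-2.7.3
hadoop version       # should report Hadoop 2.7.3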

Configure the core Hadoop files:

First go into hadoop-2.7.3/etc/hadoop and add the following line at the beginning of both hadoop-env.sh and yarn-env.sh:

export JAVA_HOME=/app/soft/jdk1.8.0_121
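A sketch of doing this from the shell instead of an editor, assuming GNU sed (the 1i command inserts the line at the top of each file, as described above):

cd /app/spark/hadoop-2.7.3/etc/hadoop
sed -i '1i export JAVA_HOME=/app/soft/jdk1.8.0_121' hadoop-env.sh
sed -i '1i export JAVA_HOME=/app/soft/jdk1.8.0_121' yarn-env.sh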

Configure core-site.xml (in etc/hadoop):

<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/app/spark/hadoop-2.7.3/temp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>hadoop.proxyuser.root.hosts</name>
<value>master</value>
</property>
<property>
<name>hadoop.proxyuser.root.groups</name>
<value>*</value>
</property>
</configuration>
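hadoop.tmp.dir above points at a directory that does not exist in a freshly unpacked distribution; a small sketch of creating it up front, using the same path as in the config:

mkdir -p /app/spark/hadoop-2.7.3/temp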

Configure hdfs-site.xml:

<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>/app/spark/hadoop-2.7.3/name</value>
<final>true</final>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/app/spark/hadoop-2.7.3/data</value>
<final>true</final>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
</configuration>
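The NameNode and DataNode directories can likewise be prepared in advance on the relevant nodes; a sketch using the same paths as in hdfs-site.xml (the name directory matters on the master, the data directory on the slaves):

mkdir -p /app/spark/hadoop-2.7.3/name    # on master (NameNode)
mkdir -p /app/spark/hadoop-2.7.3/data    # on slave1 and slave2 (DataNodes)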

Configure mapred-site.xml:

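A minimal mapred-site.xml for this kind of cluster usually only needs to tell MapReduce to run on YARN; the following is a sketch, not necessarily the exact file used in the original setup. In Hadoop 2.7.x the file typically has to be created first by copying the shipped template:

cp $HADOOP_HOME/etc/hadoop/mapred-site.xml.template $HADOOP_HOME/etc/hadoop/mapred-site.xml

<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>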

Configure yarn-site.xml:

<configuration>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:18040</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:18030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:18025</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>master:18041</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>master:8088</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/app/spark/hadoop-2.7.3/mynode/my</value>
</property>
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>/app/spark/hadoop-2.7.3/mynode/logs</value>
</property>
<property>
<name>yarn.nodemanager.log.retain-seconds</name>
<value>10800</value>
</property>
<property>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/logs</value>
</property>
<property>
<name>yarn.nodemanager.remote-app-log-dir-suffix</name>
<value>logs</value>
</property>
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>-1</value>
</property>
<property>
<name>yarn.log-aggregation.retain-check-interval-seconds</name>
<value>-1</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>

3. Copy the configuration above to the corresponding folders on the slave1 and slave2 nodes in turn (a sketch follows below).

4. Passwordless SSH login is the same as in the Spark setup, so nothing further needs to be configured.

5. After that, Hadoop can be started.
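A sketch of steps 3 and 5 from the shell, assuming the Hadoop tree lives at /app/spark/hadoop-2.7.3 on every node and passwordless SSH to slave1 and slave2 is already in place:

# step 3: push the configured Hadoop directory to the slaves
scp -r /app/spark/hadoop-2.7.3 slave1:/app/spark/
scp -r /app/spark/hadoop-2.7.3 slave2:/app/spark/

# step 5: on the master, format HDFS once, then start the daemons
hdfs namenode -format      # the older 'hadoop namenode -format' also works
start-dfs.sh
start-yarn.sh

# jps should now show NameNode/SecondaryNameNode/ResourceManager on the master
# and DataNode/NodeManager on the slaves
jps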

Problems encountered during startup and how to fix them:

The slave nodes do not start

Edit the slaves file under etc/hadoop/: delete localhost and replace it with slave1 and slave2, i.e. register the worker nodes.
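For reference, after this change the slaves file simply lists one worker hostname per line (hostnames as used throughout this post):

slave1
slave2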

The NameNode does not start, so HDFS is unusable (running hadoop dfs -ls reports: Call from master/192.168.1.201 to master:9000 failed to connect.)

Solution. On the slave nodes: first run stop-all.sh to stop all services; on every slave node delete the temp folder (i.e. the directory specified by dfs.datanode.data.dir in hdfs-site.xml, where the DataNode stores its data blocks) and the logs folder, then create fresh empty temp and logs folders. Then reformat: hadoop namenode -format

On the master node: delete the original directory, i.e. the directory that hadoop.tmp.dir in core-site.xml points to; after deleting it, set up a new directory, and then run hadoop namenode -format (a condensed sketch of the whole procedure follows).
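A sketch of this recovery procedure, using the paths configured earlier in this post; note that reformatting the NameNode wipes the HDFS metadata, so this only makes sense on a cluster with no data worth keeping:

# on the master: stop all services first
stop-all.sh

# on every slave: remove the temp and logs folders and recreate them empty
rm -rf /app/spark/hadoop-2.7.3/temp /app/spark/hadoop-2.7.3/logs
mkdir -p /app/spark/hadoop-2.7.3/temp /app/spark/hadoop-2.7.3/logs

# on the master: remove and recreate the hadoop.tmp.dir from core-site.xml,
# then reformat HDFS and start the cluster again
rm -rf /app/spark/hadoop-2.7.3/temp
mkdir -p /app/spark/hadoop-2.7.3/temp
hadoop namenode -format
start-all.sh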


