Hadoop single-node installation
Source: Internet. Published by: 张北大数据云计算. Editor: 程序博客网. Time: 2024/05/17 00:57
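In this tutorial I will describe the required steps for setting up a single-node Hadoop installation on Ubuntu Linux.
Hadoop is a framework written in Java for running applications on large clusters of commodity hardware and incorporates features similar to those of the Google File System (GFS) and of the MapReduce computing paradigm.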
The main goal of this tutorial is to get a simple Hadoop installation up and running so that you can play around with the software and learn more about it.
This tutorial has been tested with the following software versions:
- Ubuntu Linux 10.04 LTS (deprecated: 8.10 LTS, 8.04, 7.10, 7.04)
- Hadoop 1.0.3, released May 2012
Prerequisites
Sun Java 6
Hadoop requires a working Java 1.5+ (aka Java 5) installation. However, using Java 1.6 (aka Java 6) is recommended for running Hadoop, which is why this tutorial uses Sun Java 6.
The full JDK will be placed in /usr/lib/jvm/java-6-sun.
After installation, make a quick check whether Sun’s JDK is correctly set up, for example by running java -version.
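A minimal sketch of such a check, assuming the java binary is expected on the PATH. The variable name java_report is only illustrative:

```shell
# Report the installed Java version, or say so if no java binary is on the PATH.
if command -v java >/dev/null 2>&1; then
  # java -version prints to stderr, so redirect it into the captured output.
  java_report=$(java -version 2>&1)
else
  java_report="java not found on PATH"
fi
echo "$java_report"
```

For this tutorial the reported version should be a 1.6 release.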
Adding a dedicated Hadoop system user
We will use a dedicated Hadoop user account for running Hadoop. While that’s not required, it is recommended because it helps to separate the Hadoop installation from other software applications and user accounts running on the same machine (think: security, permissions, backups, etc).
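We add the user hduser and the group hadoop to the local machine (on Ubuntu, for example, with sudo addgroup hadoop followed by sudo adduser --ingroup hadoop hduser).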
Configuring SSH
Hadoop requires SSH access to manage its nodes, i.e. remote machines plus your local machine if you want to use Hadoop on it (which is what we want to do in this short tutorial). For our single-node setup of Hadoop, we therefore need to configure SSH access to localhost for the hduser user we created in the previous section.
I assume that you have SSH up and running on your machine and have configured it to allow SSH public key authentication. If not, there are several guides available.
First, we have to generate an SSH key for the hduser user.
The key is an RSA key pair with an empty passphrase (ssh-keygen -t rsa -P "", run as hduser). Generally, using an empty passphrase is not recommended, but in this case it is needed to unlock the key without your interaction (you don’t want to enter the passphrase every time Hadoop interacts with its nodes).
Second, you have to enable SSH access to your local machine with this newly created key.
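The two steps above can be sketched as follows. This is a sketch under assumptions rather than the exact commands of the original: it is meant to be run as hduser, it uses the default key file $HOME/.ssh/id_rsa, and the variable name ssh_dir is illustrative.

```shell
# Run as hduser. Creates the key pair only if none exists yet.
ssh_dir="$HOME/.ssh"
mkdir -p "$ssh_dir"
chmod 700 "$ssh_dir"

# -t rsa: RSA key pair; -P "": empty passphrase; -f: key file (avoids the prompt).
if [ ! -f "$ssh_dir/id_rsa" ]; then
  ssh-keygen -q -t rsa -P "" -f "$ssh_dir/id_rsa"
fi

# Authorize the public key for logins to this machine, unless it is already listed.
touch "$ssh_dir/authorized_keys"
if ! grep -qxF "$(cat "$ssh_dir/id_rsa.pub")" "$ssh_dir/authorized_keys"; then
  cat "$ssh_dir/id_rsa.pub" >> "$ssh_dir/authorized_keys"
fi
chmod 600 "$ssh_dir/authorized_keys"
```

The existence checks make the sketch safe to re-run: an existing key is kept, and the public key is appended to authorized_keys only once.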
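The final step is to test the SSH setup by connecting to your local machine with the hduser user. This step is also needed to save your local machine’s host key fingerprint to the hduser user’s known_hosts file. If you have any special SSH configuration for your local machine, such as a non-standard SSH port, you can define host-specific SSH options in $HOME/.ssh/config (see man ssh_config for more information).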
If the SSH connection should fail, these general tips might help:
- Enable debugging with ssh -vvv localhost and investigate the error in detail.
- Check the SSH server configuration in /etc/ssh/sshd_config, in particular the options PubkeyAuthentication (which should be set to yes) and AllowUsers (if this option is active, add the hduser user to it). If you made any changes to the SSH server configuration file, you can force a configuration reload with sudo /etc/init.d/ssh reload.
Disabling IPv6
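One problem with IPv6 on Ubuntu is that using 0.0.0.0 for the various networking-related Hadoop configuration options will result in Hadoop binding to the IPv6 addresses of the machine.
To disable IPv6 on Ubuntu 10.04 LTS, open /etc/sysctl.conf in the editor of your choice and add lines at the end of the file that set net.ipv6.conf.all.disable_ipv6, net.ipv6.conf.default.disable_ipv6, and net.ipv6.conf.lo.disable_ipv6 to 1.
You have to reboot your machine in order to make the changes take effect.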
You can check whether IPv6 is enabled on your machine by printing the contents of /proc/sys/net/ipv6/conf/all/disable_ipv6 (for example with cat).
A value of 0 means IPv6 is enabled; a value of 1 means it is disabled (that’s what we want).
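The check can be wrapped in a small script that prints the meaning of the flag instead of the raw number. This is a sketch: describe_ipv6 and flag_file are illustrative names, and the path is the usual location of this flag on Linux.

```shell
# Translate the kernel flag into a readable status.
describe_ipv6() {
  case "$1" in
    0) echo "IPv6 is enabled" ;;
    1) echo "IPv6 is disabled" ;;
    *) echo "unknown IPv6 state: $1" ;;
  esac
}

flag_file=/proc/sys/net/ipv6/conf/all/disable_ipv6
if [ -r "$flag_file" ]; then
  describe_ipv6 "$(cat "$flag_file")"
else
  echo "$flag_file not found (IPv6 support may be absent from this kernel)"
fi
```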
Alternative
You can also disable IPv6 only for Hadoop by setting HADOOP_OPTS to -Djava.net.preferIPv4Stack=true in conf/hadoop-env.sh.
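As a sketch, the corresponding line in conf/hadoop-env.sh could look like this. java.net.preferIPv4Stack is a standard JVM networking property; the line assumes that no other HADOOP_OPTS value is already set in the file, since it would be overwritten.

```shell
# Line for conf/hadoop-env.sh: make the JVM prefer the IPv4 stack.
export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
```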
Hadoop
Installation
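Download Hadoop from the Apache download mirrors and extract the contents of the Hadoop package to a location of your choice. I picked /usr/local/hadoop. Make sure to change the owner of all the files to the hduser user and hadoop group.
(Just to give you the idea, YMMV; personally, I create a symlink from hadoop-1.0.3 to hadoop.)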
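The extract, rename and change-owner steps can be sketched as a small helper. The function install_hadoop and its arguments are hypothetical, and the tarball name used in the note below is an assumption about the downloaded file.

```shell
# Hypothetical helper: unpack a Hadoop tarball under a prefix and hand it to an owner.
# Arguments: tarball path, install prefix, release directory name, owner (user:group).
install_hadoop() {
  tarball=$1
  prefix=$2
  release=$3
  owner=$4
  tar xzf "$tarball" -C "$prefix" || return 1
  # Rename the versioned directory to the plain "hadoop" path used in this tutorial.
  mv "$prefix/$release" "$prefix/hadoop" || return 1
  # Recursively change the owner of all extracted files.
  chown -R "$owner" "$prefix/hadoop"
}
```

From a root shell in the directory holding the download, a call such as install_hadoop hadoop-1.0.3.tar.gz /usr/local hadoop-1.0.3 hduser:hadoop would produce /usr/local/hadoop owned by hduser:hadoop.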
Update $HOME/.bashrc
Add lines to the end of the $HOME/.bashrc file of user hduser that set HADOOP_HOME and JAVA_HOME and add Hadoop’s bin directory to PATH. If you use a shell other than bash, you should of course update its appropriate configuration files instead of .bashrc.
You can repeat this exercise for other users who want to use Hadoop.
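A minimal sketch of the .bashrc additions, assuming the installation paths used in this tutorial (/usr/local/hadoop for Hadoop and /usr/lib/jvm/java-6-sun for the JDK):

```shell
# Set Hadoop-related environment variables.
export HADOOP_HOME=/usr/local/hadoop

# Set JAVA_HOME (assumes the Sun JDK 6 location used earlier in this tutorial).
export JAVA_HOME=/usr/lib/jvm/java-6-sun

# Add the Hadoop bin/ directory to PATH.
export PATH=$PATH:$HADOOP_HOME/bin
```

The new settings take effect in the next login shell, or immediately after sourcing the file.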
Excursus: Hadoop Distributed File System (HDFS)
Before we continue let us briefly learn a bit more about Hadoop’s distributed file system.
The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. It has many similarities with existing distributed file systems. However, the differences from other distributed file systems are significant. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. HDFS provides high throughput access to application data and is suitable for applications that have large data sets. HDFS relaxes a few POSIX requirements to enable streaming access to file system data. HDFS was originally built as infrastructure for the Apache Nutch web search engine project. HDFS is part of the Apache Hadoop project, which is part of the Apache Lucene project.
The Hadoop Distributed File System: Architecture and Design, hadoop.apache.org/hdfs/docs/…
The following picture gives an overview of the most important HDFS components.
Configuration
Our goal in this tutorial is a single-node setup of Hadoop. More information about what we do in this section is available on the Hadoop Wiki.
hadoop-env.sh
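The only required environment variable we have to configure for Hadoop in this tutorial is JAVA_HOME. Open conf/hadoop-env.sh in the editor of your choice (if you used the installation path in this tutorial, the full path is /usr/local/hadoop/conf/hadoop-env.sh) and set the JAVA_HOME environment variable to the Sun JDK 6 directory.
Change the commented-out JAVA_HOME line so that it is active and points to /usr/lib/jvm/java-6-sun.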
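After the change, the relevant line in conf/hadoop-env.sh would look roughly like this, assuming the JDK location /usr/lib/jvm/java-6-sun from the prerequisites:

```shell
# In conf/hadoop-env.sh: the Java implementation to use.
export JAVA_HOME=/usr/lib/jvm/java-6-sun
```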
Note: If you are on a Mac with OS X 10.7, you can set JAVA_HOME in conf/hadoop-env.sh to the path printed by /usr/libexec/java_home.
conf/*-site.xml
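In this section, we will configure the directory where Hadoop will store its data files, the network ports it listens to, etc. Our setup will use Hadoop’s Distributed File System, HDFS, even though our cluster only contains a single local machine.
You can leave the settings below “as is” with the exception of the hadoop.tmp.dir parameter, which you have to change to a directory of your choice. We will use the directory /app/hadoop/tmp in this tutorial. Hadoop uses hadoop.tmp.dir as the base temporary directory for both the local file system and HDFS.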
Now we create the /app/hadoop/tmp directory and set the required ownerships and permissions so that it belongs to the hduser user and the hadoop group.
If you forget to set the required ownerships and permissions, you will see a java.io.IOException when you try to format the name node in the next section.
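Creating the directory and setting its ownership and permissions can be sketched as a small helper. The function prepare_hadoop_tmp is hypothetical, and mode 750 is an assumption: one reasonable choice that keeps other users out.

```shell
# Hypothetical helper: create Hadoop's base directory with restrictive permissions.
# Arguments: directory path, owner (user:group).
prepare_hadoop_tmp() {
  dir=$1
  owner=$2
  mkdir -p "$dir" || return 1
  chown "$owner" "$dir" || return 1
  # 750: full access for the owner, read and traverse for the group, none for others.
  chmod 750 "$dir"
}
```

For the setup in this tutorial the call would be prepare_hadoop_tmp /app/hadoop/tmp hduser:hadoop, run from a root shell because /app does not belong to hduser.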
Add the following snippets between the <configuration> ... </configuration> tags in the respective configuration XML file.
The snippets go into three files: conf/core-site.xml, conf/mapred-site.xml, and conf/hdfs-site.xml.
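As a hedged illustration of what such snippets can look like, the script below writes one example per file into a scratch directory for review, so nothing under conf/ is touched. The property names are standard Hadoop 1.x settings and /app/hadoop/tmp comes from the text above. The ports 54310 and 54311, the replication factor of 1, and the variable snippet_dir are assumptions made for this sketch.

```shell
# Write example property snippets to a scratch directory for review.
# Assumed values: ports 54310 and 54311 are illustrative; any free ports work.
snippet_dir=$(mktemp -d)

# conf/core-site.xml: base directory and default file system URI.
cat > "$snippet_dir/core-site.xml" <<'EOF'
<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
</property>
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
</property>
EOF

# conf/mapred-site.xml: host and port of the MapReduce job tracker.
cat > "$snippet_dir/mapred-site.xml" <<'EOF'
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
</property>
EOF

# conf/hdfs-site.xml: one replica per block, since there is only one node.
cat > "$snippet_dir/hdfs-site.xml" <<'EOF'
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
EOF

echo "snippets written to $snippet_dir"
```

Each generated file holds only the property elements, ready to be pasted between the <configuration> tags of the real file with the same name.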
See the Hadoop documentation if you have any questions about Hadoop’s configuration options.