Hadoop install


Version:hadoop-1.0.3

1. Download Hadoop

http://www.apache.org/dyn/closer.cgi/hadoop/common/

2. Unpack the downloaded archive.

3. configuration

3.1) configure core-site.xml

<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://[hostname]:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/[username]/Hadoop/hadoop-1.0.3/tmp</value>
    </property>
</configuration>
If we do not set the hadoop.tmp.dir parameter, we have to reformat the Hadoop filesystem every time the cluster is restarted.
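The file above can also be generated from the current hostname instead of edited by hand. This is only a sketch: the HADOOP_HOME path and the conf/ layout are assumptions for a typical hadoop-1.0.3 unpacked directory; adjust them to your install.

```shell
# Sketch: generate conf/core-site.xml for this machine.
# HADOOP_HOME default is an assumption; override it for your layout.
HADOOP_HOME="${HADOOP_HOME:-$HOME/Hadoop/hadoop-1.0.3}"
HOST="$(hostname)"
mkdir -p "$HADOOP_HOME/conf" "$HADOOP_HOME/tmp"
cat > "$HADOOP_HOME/conf/core-site.xml" <<EOF
<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://$HOST:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>$HADOOP_HOME/tmp</value>
    </property>
</configuration>
EOF
```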

3.2) hdfs-site.xml

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

3.3) mapred-site.xml

<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>[hostname]:9001</value>
    </property>
</configuration>

3.4) hadoop-env.sh

export JAVA_HOME=/opt/java/jdk1.7.0_51


3.5) masters and slaves

Since we are setting up Pseudo-Distributed Mode, change the content of both files from localhost to [hostname].
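A quick way to write both files is shown below. Again a sketch: the conf/masters and conf/slaves paths follow the usual hadoop-1.0.3 layout, and HADOOP_HOME is an assumed location.

```shell
# Sketch: point both conf/masters and conf/slaves at this machine.
HADOOP_HOME="${HADOOP_HOME:-$HOME/Hadoop/hadoop-1.0.3}"
HOST="$(hostname)"
mkdir -p "$HADOOP_HOME/conf"
echo "$HOST" > "$HADOOP_HOME/conf/masters"
echo "$HOST" > "$HADOOP_HOME/conf/slaves"
```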
 


4. Format the distributed filesystem

In the terminal, input the following command:

$ bin/hadoop namenode -format


5. Set up passphraseless ssh

Check whether you can ssh to your hostname without a passphrase:

$ ssh [hostname]

If you cannot ssh to [hostname] without a passphrase, execute the following commands:

$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
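Note that recent OpenSSH releases disable DSA keys by default, so on a modern system an RSA key is the safer choice. A sketch of the equivalent setup; the id_rsa_hadoop filename is an assumption chosen to avoid overwriting an existing default key:

```shell
# Sketch: RSA variant of the passphraseless key setup.
# The key name id_rsa_hadoop is an assumption, not a Hadoop requirement.
mkdir -p ~/.ssh && chmod 700 ~/.ssh
# Only generate the key if it does not already exist.
[ -f ~/.ssh/id_rsa_hadoop ] || ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa_hadoop
cat ~/.ssh/id_rsa_hadoop.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
```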


Possible problem:

ssh: connect to host localhost port 22: Connection refused

This usually means sshd is not installed, not running, or a firewall is blocking port 22.

install ssh:

$ sudo apt-get install ssh

install and start sshd:

$ sudo apt-get install openssh-server
$ sudo service ssh start

disable firewall

$ sudo ufw disable 


6. Start Hadoop

$ bin/start-all.sh

7. Verify

1) Check from the terminal:

In the terminal, run:

$ jps

We can see that the JobTracker, NameNode, DataNode, SecondaryNameNode, and TaskTracker have all been started.

2) Check from the webpage:

NameNode:

http://[hostname]:50070/

JobTracker:

http://[hostname]:50030/


