Ubuntu Hadoop Fully Distributed Installation




1. Configure the JDK

     Important: configure the environment variables.

     vi /etc/profile

    Append the following at the end of the file:

# set Java environment
export JAVA_HOME=/usr/lib/jvm/java-6-sun
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$PATH

# optional: only if Tomcat and Ant are also installed
export CATALINA_HOME=/usr/local/tomcat
export ANT_HOME=/usr/local/ant
export CLASSPATH=$CLASSPATH:$CATALINA_HOME/lib
export PATH=$PATH:$CATALINA_HOME/bin:$ANT_HOME/bin
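To apply the new variables in the current shell and confirm the JDK is visible, you can run the following (standard commands; the JAVA_HOME path must match your actual JDK install):

$ source /etc/profile
$ java -version
$ echo $JAVA_HOME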


2. Configure SSH

Install SSH:

sudo apt-get install openssh-server
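After the package is installed, a quick way to confirm the SSH daemon is running is to look for the sshd process (this assumes the default Ubuntu openssh-server setup, which starts the service automatically):

$ ps -e | grep sshd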


    a. Use ssh-keygen to create the public and private keys on the local host
                            [root@www.linuxidc.com ~]# ssh-keygen -t  rsa

                            Enter file in which to save the key (/home/jsmith/.ssh/id_rsa): [Press enter key]

                            Enter passphrase (empty for no passphrase): [Press enter key]

                            Enter same passphrase again: [Press enter key]

                            Your identification has been saved in /home/jsmith/.ssh/id_rsa.

                            Your public key has been saved in /home/jsmith/.ssh/id_rsa.pub.

                            The key fingerprint is: 33:b3:fe:af:95:95:18:11:31:d5:de:96:2f:f2:35:f9

                            root@www.linuxidc.com

            Once this completes, a hidden .ssh folder is created under the home directory.

                                   $ cd .ssh

Then run ls to view the files, and copy the public key to authorized_keys:

 

                                  cp id_rsa.pub  authorized_keys




[hadoop@hadoop .ssh]$ scp authorized_keys node2:/home/hadoop/.ssh/

[hadoop@hadoop .ssh]$ chmod 644 authorized_keys

b. Use ssh-copy-id to copy the public key to the remote host
                            [root@www.linuxidc.com ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub root@Datanode1   // Datanode1 is the datanode's IP

                            root@Datanode1's password:

                            Now try logging into the machine, with "ssh 'root@Datanode1'", and check in:

                            .ssh/authorized_keys to make sure we haven't added extra keys that you weren't expecting.

                            [Note: ssh-copy-id appends the key to .ssh/authorized_keys on the remote host.]

   c. Log in to the remote host directly
                            [root@www.linuxidc.com ~]# ssh Datanode1

                            Last login: Sun Nov 16 17:22:33 2008 from 192.168.1.2

                            [Note: SSH will not ask for a password.]

                            [root@Datanode1 ~]

                            [Note: you are now logged in on the remote host.]

   d. Note: everything here is executed on the Namenode, and the Namenode also needs passwordless access to itself, i.e.

            run the following commands:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys


      [Do not run this one] [root@www.linuxidc.com ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub root@www.linuxidc.com
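      After the two commands above, you can check that the Namenode reaches itself without a password prompt:

      $ ssh localhost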

      For the remaining nodes, simply repeat steps a-c for Datanode2 and Datanode3.

      Passwordless access must work for every node; otherwise the Hadoop cluster is bound to fail.
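      A quick way to verify is to run a trivial command on each node over ssh; none of these should ask for a password (hostnames as used above):

      $ ssh Datanode1 hostname
      $ ssh Datanode2 hostname
      $ ssh Datanode3 hostname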


3. Configure Hadoop (in the conf folder)

       A. namenode:

                 a.  core-site.xml:

                     

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <!-- 10.108.32.97 is the Namenode's IP -->
    <value>hdfs://10.108.32.97:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <!-- note: the tmp directory must be empty -->
    <value>/home/yourname/tmp</value>
  </property>
</configuration>
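Before the first start it helps to make sure this directory exists (and is empty) on every node, using the same path as in the file above:

$ mkdir -p /home/yourname/tmp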

             

b. hadoop-env.sh

 

# The java implementation to use.  Required.
export JAVA_HOME=/usr/lib/jvm/java-6-sun-1.6.0.06

           c.  hdfs-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/yourname/hdfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/yourname/hdfs/data</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>
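The name and data directories can be created ahead of time so the paths above exist on the Namenode and Datanodes respectively:

$ mkdir -p /home/yourname/hdfs/name /home/yourname/hdfs/data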
          d. mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>10.108.32.97:9001</value>
  </property>
</configuration>
     e. conf/masters:

     the IP address of the namenode

    f.  conf/slaves:

     the IP addresses of the datanodes, one per line (an example follows below)
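For illustration, with the Namenode at 10.108.32.97 (as in the configs above) and two purely hypothetical Datanode addresses, the two files would look like this (replace the addresses with your own):

conf/masters:
10.108.32.97

conf/slaves:
10.108.32.98
10.108.32.99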

g. scp -r /home/yourname/hadoop slave1:/home/dataname1/

   scp -r /home/yourname/hadoop slave2:/home/dataname2/


B. Format a new distributed filesystem:
$ bin/hadoop namenode -format

Start the Hadoop daemons:
$ bin/start-all.sh

The Hadoop daemons write their logs to the ${HADOOP_LOG_DIR} directory (by default ${HADOOP_HOME}/logs).
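A quick sanity check is to run jps (shipped with the Sun JDK) on each machine; on the Namenode you would typically expect to see NameNode, SecondaryNameNode and JobTracker, and on each Datanode DataNode and TaskTracker:

$ jps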

Browse the web interfaces of the NameNode and the JobTracker; by default they are available at:

NameNode - http://localhost:50070/
JobTracker - http://localhost:50030/

Copy the input files into the distributed filesystem:
$ bin/hadoop fs -put conf input

Run one of the example programs provided with the distribution:
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'

Examine the output files:

Copy the output files from the distributed filesystem to the local filesystem and examine them:
$ bin/hadoop fs -get output output
$ cat output/*

Or

View the output files directly on the distributed filesystem:
$ bin/hadoop fs -cat output/*

When you are done, stop the daemons:
$ bin/stop-all.sh
