Hadoop Single Node Setup

Source: Internet | Editor: 程序博客网 | Date: 2024/05/21 08:58


Purpose

This document describes how to set up and configure a single-node Hadoop installation so that you can quickly perform simple operations using Hadoop MapReduce and the Hadoop Distributed File System (HDFS).


Prerequisites

Supported Platforms

  • GNU/Linux is supported as a development and production platform. Hadoop has been demonstrated on GNU/Linux clusters with 2000 nodes.


  • Win32 is supported as a development platform. Distributed operation has not been well tested on Win32, so it is not supported as a production platform.


Required Software

Required software for Linux and Windows include:


  1. Java™ 1.6.x, preferably from Sun, must be installed.


  2. ssh must be installed and sshd must be running to use the Hadoop scripts that manage remote Hadoop daemons.


Additional requirements for Windows include:


  1. Cygwin - Required for shell support in addition to the required software above.


Installing Software

If your cluster doesn't have the requisite software you will need to install it.


For example on Ubuntu Linux:


$ sudo apt-get install ssh
$ sudo apt-get install rsync

On Windows, if you did not install the required software when you installed cygwin, start the cygwin installer and select the packages:

  • openssh - the Net category


Download

To get a Hadoop distribution, download a recent stable release from one of the Apache Download Mirrors.


Prepare to Start the Hadoop Cluster

Unpack the downloaded Hadoop distribution. In the distribution, edit the file conf/hadoop-env.sh to define at least JAVA_HOME to be the root of your Java installation.

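For example, the JAVA_HOME line in conf/hadoop-env.sh might end up looking like this (the JDK path below is only an illustration; substitute the root of your own Java installation):

```shell
# conf/hadoop-env.sh
# Hypothetical JDK location; point this at your actual Java 1.6 install root.
export JAVA_HOME=/usr/lib/jvm/java-6-sun
```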

Try the following command:

$ bin/hadoop
This will display the usage documentation for the hadoop script.


Now you are ready to start your Hadoop cluster inone of the three supported modes:


  • Local (Standalone) Mode

  • Pseudo-Distributed Mode

  • Fully-Distributed Mode

Standalone Operation

By default, Hadoop is configured to run in a non-distributed mode, as a single Java process. This is useful for debugging.


The following example copies the unpacked conf directory to use as input and then finds and displays every match of the given regular expression. Output is written to the given output directory.

$ mkdir input
$ cp conf/*.xml input
$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
$ cat output/*
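As an aside, the pattern 'dfs[a-z.]+' matches the literal prefix dfs followed by one or more lowercase letters or dots. A quick way to see what it would pick up, using plain GNU grep on a made-up sample line (not Hadoop itself):

```shell
# Illustrative only: apply the example's regex to a fabricated input line.
echo "set dfs.replication with the dfsadmin tool" | grep -oE 'dfs[a-z.]+'
# prints: dfs.replication
#         dfsadmin
```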

Pseudo-Distributed Operation

Hadoop can also be run on a single node in a pseudo-distributed mode where each Hadoop daemon runs in a separate Java process.


Configuration

Use the following:


conf/core-site.xml:

<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>


conf/hdfs-site.xml:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>


conf/mapred-site.xml:

<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>localhost:9001</value>
    </property>
</configuration>

Setup passphraseless ssh

Now check that you can ssh to the localhost without a passphrase:

$ ssh localhost

If you cannot ssh to localhost without a passphrase, execute the following commands:

$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
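If ssh still prompts for a passphrase after this, a common culprit is file permissions: with its default StrictModes setting, sshd ignores an authorized_keys file that is group- or world-writable. An extra step that is often needed, though not part of the original text:

```shell
# Tighten key-file permissions so sshd will accept the key
# (assumes OpenSSH's default StrictModes behavior).
mkdir -p ~/.ssh && touch ~/.ssh/authorized_keys
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
```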

Execution

Format a new distributed-filesystem:

$ bin/hadoop namenode -format

Start the hadoop daemons:
$ bin/start-all.sh

The hadoop daemon log output is written to the ${HADOOP_LOG_DIR} directory (defaults to ${HADOOP_HOME}/logs).

Browse the web interface for the NameNode and the JobTracker; by default they are available at:


  • NameNode - http://localhost:50070/

  • JobTracker - http://localhost:50030/

Copy the input files into the distributed filesystem:

$ bin/hadoop fs -put conf input

Run some of the examples provided:

$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'

Examine the output files:


Copy the output files from the distributed filesystem to the local filesystem and examine them:

$ bin/hadoop fs -get output output
$ cat output/*

or

View the output files on the distributed filesystem:

$ bin/hadoop fs -cat output/*

When you're done, stop the daemons with:
$ bin/stop-all.sh

Fully-Distributed Operation

For information on setting up fully-distributed, non-trivial clusters see Cluster Setup.

Java and JNI are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries.

