Hadoop Single Node Setup
Source: Internet  Editor: 程序博客网  Time: 2024/05/21 08:58
Single Node Setup
Purpose
This document describes how to set up and configure a single-node Hadoop installation so that you can quickly perform simple operations using Hadoop MapReduce and the Hadoop Distributed File System (HDFS).
Prerequisites
Supported Platforms
GNU/Linux is supported as a development and production platform. Hadoop has been demonstrated on GNU/Linux clusters with 2000 nodes.
Win32 is supported as a development platform. Distributed operation has not been well tested on Win32, so it is not supported as a production platform.
Required Software
Required software for Linux and Windows includes:
Java™ 1.6.x, preferably from Sun, must be installed.
ssh must be installed and sshd must be running to use the Hadoop scripts that manage remote Hadoop daemons. (A daemon is a background process that has no controlling terminal or login shell attached to it.)
Additional requirements for Windows include:
Cygwin - Required for shell support in addition to the required software above.
Installing Software
If your cluster doesn't have the requisite software, you will need to install it.
For example, on Ubuntu Linux:
$ sudo apt-get install ssh
$ sudo apt-get install rsync
On Windows, if you did not install the required software when you installed Cygwin, start the Cygwin installer and select the packages:
openssh - the Net category
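Before installing anything, it can help to see which of the required commands are already on the PATH. The following is a minimal sketch, not part of the original guide: the function name missing_tools is hypothetical, and the script only checks that the commands exist. It does not verify the Java version or that sshd is actually running.

```shell
#!/bin/sh
# Print the name of each listed command that cannot be found on PATH.
# missing_tools is a hypothetical helper name used only for this sketch.
missing_tools() {
    for tool in "$@"; do
        # command -v exits non-zero when the command cannot be resolved
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Commands named in the prerequisites and in the Ubuntu install example.
missing=$(missing_tools java ssh rsync)
if [ -n "$missing" ]; then
    echo "Missing prerequisites:" $missing
else
    echo "All prerequisite commands were found."
fi
```

Anything reported as missing can then be installed with the package manager of your platform, as in the apt-get example above.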
Download
To get a Hadoop distribution, download a recent stable release from one of the Apache Download Mirrors.
Prepare to Start the Hadoop Cluster
Unpack the downloaded Hadoop distribution. In the distribution, edit the file conf/hadoop-env.sh to define at least JAVA_HOME to be the root of your Java installation.
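The edit to conf/hadoop-env.sh can also be scripted. The sketch below assumes that an `export JAVA_HOME=...` line in that file is sufficient; the function name set_java_home and the JDK path in the example call are placeholders, not values taken from the Hadoop distribution.

```shell
#!/bin/sh
# Append an export of JAVA_HOME to a hadoop-env.sh style file,
# unless an uncommented JAVA_HOME export is already present.
# set_java_home is a hypothetical helper name used only for this sketch.
set_java_home() {
    env_file=$1
    java_home=$2
    if grep -q '^[[:space:]]*export[[:space:]]\{1,\}JAVA_HOME=' "$env_file" 2>/dev/null; then
        echo "JAVA_HOME is already set in $env_file"
    else
        printf 'export JAVA_HOME=%s\n' "$java_home" >> "$env_file"
    fi
}

# Example call (the JDK path is a placeholder; use the root of your own install):
# set_java_home conf/hadoop-env.sh /path/to/your/jdk
```

Because the function leaves an existing uncommented export untouched, running it twice does not add a second line.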
Try the following command:
$ bin/hadoop
This will display the usage documentation for the hadoop script.
Now you are ready to start your Hadoop cluster in one of the three supported modes:
Local (Standalone) Mode
Pseudo-Distributed Mode
Fully-Distributed Mode
Standalone Operation
By default, Hadoop is configured to run in a non-distributed mode, as a single Java process. This is useful for debugging.
The following example copies the unpacked conf directory to use as input and then finds and displays every match of the given regular expression. Output is written to the given output directory.
$ mkdir input
$ cp conf/*.xml input
$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
$ cat output/*
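To get a feel for what the grep example computes, the same kind of result can be approximated locally with standard tools. This is only a sketch under assumptions: count_matches is a hypothetical helper, it relies on the widely available (but non-POSIX) `-o` and `-h` options of grep, and the count-then-match line layout is chosen for illustration and may differ from what the Hadoop job writes.

```shell
#!/bin/sh
# Count every match of an extended regular expression across the given files
# and list the matches with the most frequent first.
# count_matches is a hypothetical helper name used only for this sketch.
count_matches() {
    pattern=$1
    shift
    # -o prints each match on its own line, -h omits file names.
    # Matches are assumed to contain no whitespace (true for 'dfs[a-z.]+').
    grep -ohE "$pattern" "$@" |
        sort | uniq -c |
        awk '{print $1 "\t" $2}' |
        sort -k1,1nr -k2,2
}

# Example call against the copied configuration files:
# count_matches 'dfs[a-z.]+' input/*.xml
```

Comparing this local listing with the contents of the output directory is a quick sanity check that the standalone job ran over the expected input.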
Pseudo-Distributed Operation
Hadoop can also be run on a single node in a pseudo-distributed mode where each Hadoop daemon runs in a separate Java process.
Configuration
Use the following:
conf/core-site.xml:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
conf/hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
conf/mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
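If you prefer to generate the three files rather than edit them by hand, a small script can write them. This is a sketch, not something shipped with Hadoop: write_site_xml and write_pseudo_distributed_conf are hypothetical helper names, the XML declaration line is an addition, and the functions overwrite any existing file of the same name, so back up a customized conf directory first.

```shell
#!/bin/sh
# Write a configuration file that contains a single property.
# Both function names are hypothetical helpers used only for this sketch.
# WARNING: an existing file at the target path is overwritten.
write_site_xml() {
    file=$1
    name=$2
    value=$3
    cat > "$file" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>$name</name>
    <value>$value</value>
  </property>
</configuration>
EOF
}

# Write the three pseudo-distributed settings listed above into a directory.
write_pseudo_distributed_conf() {
    dir=$1
    write_site_xml "$dir/core-site.xml"   fs.default.name    hdfs://localhost:9000
    write_site_xml "$dir/hdfs-site.xml"   dfs.replication    1
    write_site_xml "$dir/mapred-site.xml" mapred.job.tracker localhost:9001
}

# Example call from the root of the unpacked distribution:
# write_pseudo_distributed_conf conf
```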
Setup passphraseless ssh
Now check that you can ssh to the localhost without a passphrase:
$ ssh localhost
If you cannot ssh to localhost without a passphrase, execute the following commands:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
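The check and the key installation above can be made repeatable. The sketch below is an assumption-laden convenience, not part of the Hadoop scripts: both function names are hypothetical. It uses the ssh options BatchMode and ConnectTimeout so that the check fails instead of prompting, and it appends the public key only when it is not already present.

```shell
#!/bin/sh
# Return success if ssh to localhost works without any prompt.
# BatchMode=yes makes ssh fail instead of asking for a password or passphrase.
# Note: this also fails if the host key of localhost has not been accepted yet.
can_ssh_without_prompt() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 localhost true >/dev/null 2>&1
}

# Append a public key to an authorized_keys file only if it is not already
# there, then restrict the file permissions to the owner.
authorize_key() {
    pub_file=$1
    auth_file=$2
    key=$(cat "$pub_file") || return 1
    touch "$auth_file"
    grep -qxF -- "$key" "$auth_file" || printf '%s\n' "$key" >> "$auth_file"
    chmod 600 "$auth_file"
}

# Example calls:
# can_ssh_without_prompt || authorize_key ~/.ssh/id_dsa.pub ~/.ssh/authorized_keys
```

Unlike a plain `>>`, calling authorize_key repeatedly does not leave duplicate entries in authorized_keys.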
Execution
Format a new distributed-filesystem:
$ bin/hadoop namenode -format
Start the hadoop daemons:
$ bin/start-all.sh
The hadoop daemon log output is written to the ${HADOOP_LOG_DIR} directory (defaults to ${HADOOP_HOME}/logs).
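The fallback described above maps directly onto shell parameter expansion. As a minimal sketch (hadoop_log_dir is a hypothetical helper name, and it treats an empty HADOOP_LOG_DIR the same as an unset one), the directory to look in can be resolved like this:

```shell
#!/bin/sh
# Resolve the daemon log directory: HADOOP_LOG_DIR if it is set and non-empty,
# otherwise ${HADOOP_HOME}/logs.
# hadoop_log_dir is a hypothetical helper name used only for this sketch.
hadoop_log_dir() {
    printf '%s\n' "${HADOOP_LOG_DIR:-${HADOOP_HOME}/logs}"
}

# Example: list the most recently modified log files first.
# ls -t "$(hadoop_log_dir)"
```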
Browse the web interface for the NameNode and the JobTracker; by default they are available at:
NameNode - http://localhost:50070/
JobTracker - http://localhost:50030/
Copy the input files into the distributed filesystem:
$ bin/hadoop fs -put conf input
Run some of the examples provided:
$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
Examine the output files:
Copy the output files from the distributed filesystem to the local filesystem and examine them:
$ bin/hadoop fs -get output output
$ cat output/*
or
View the output files on the distributed filesystem:
$ bin/hadoop fs -cat output/*
When you're done, stop the daemons with:
$ bin/stop-all.sh
Fully-Distributed Operation
For information on setting up fully-distributed, non-trivial clusters, see Cluster Setup.
Java and JNI are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries.