Hypertable - Installation on Hadoop


Hypertable can be installed in several ways, as shown below:

1. Standalone: installed on a single machine, using the local filesystem

2. Hadoop: distributed installation on top of Hadoop (HDFS)

3. MapR: distributed installation on top of MapR

4. ThriftBroker: install a ThriftBroker on each application server

 

Hadoop

http://hypertable.com/documentation/installation/quick_start_cluster_installation/

Hadoop is an open source implementation of the Google File System and MapReduce parallel computation framework.  The Hadoop filesystem (HDFS) is the filesystem that most people run Hypertable on top of, as it contains all of the architectural features required to efficiently support Hypertable.  This document describes how to get Hypertable up and running on top of the Hadoop filesystem.


Table of Contents

Prerequisites

Step 1 - Install HDFS

Step 2 - Install Capistrano

Step 3 - Edit Capistrano Capfile

Step 4 - Install Hypertable Binaries

Step 5 - FHS-ize Installation

Step 6 - Create and Distribute hypertable.cfg

Step 7 - Set "current" link

Step 8 - Synchronize Clocks

Step 9 - Start Hypertable

Step 10 - Verify Installation

Step 11 - Stop Hypertable

What Next?

Prerequisites


Before you get started with the installation, there are some general system requirements that need to be satisfied before proceeding.  These requirements are described in the following list.

  • admin machine - You should designate one of the machines in your Hypertable cluster as the admin machine (admin1 in the examples below).  This is the machine from which you will be administering the cluster.  It can be the same machine as the master or any machine of your choosing.  There are no special hardware requirements for this machine, but it needs to have Internet access (at least temporarily) to get the recommended cluster management tool, Capistrano, installed on it.  It is possible to install Capistrano without Internet access, but it's challenging and could take you half a day to get it working.
  • password-less ssh - For ease of administration, we recommend using Capistrano, which requires password-less ssh login access from the admin machine to all other machines in the cluster (masters, hyperspace replicas, range servers, etc.).  See Password-less SSH Login (http://hypertable.com/documentation/misc/password_less_ssh_login/) for details on how to set this up.
  • ssh MaxStartups - sshd on the admin machine needs to be configured to allow simultaneous connections from all of the machines in the Hypertable cluster.  The simultaneous connection limit, MaxStartups, defaults to 10.  See SSH Connection Limit (http://hypertable.com/documentation/misc/ssh_maxstartups/) for details on how to increase this limit.
  • firewall - The Hypertable processes use TCP and UDP to communicate with one another and with client applications.  Firewalls can block this traffic and prevent Hypertable from operating properly.  Any firewall that blocks traffic between the Hypertable machines should be disabled, or the appropriate ports should be opened up to allow Hypertable communication.  See Hypertable Firewall Requirements (http://hypertable.com/documentation/misc/firewall_requirements/) for instructions on how to do this.
  • open file limit - Most operating systems have a limit on the total number of files that a process can have open at any one time.  This limit is usually set too low for Hypertable, since it can create a very large number of files.  See Open File Limit (http://hypertable.com/documentation/misc/how_to_increase_open_file_limit/) for details on how to increase this limit.  A configuration sketch for the ssh and open file limits follows this list.

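As a concrete illustration of the last two items, here is a minimal sketch of how these limits are commonly raised on a Linux system.  The file locations, the limit values, and the "hypertable" user name are assumptions; adapt them to your distribution and account setup.

# On the admin machine, raise the sshd simultaneous-connection limit by
# editing /etc/ssh/sshd_config (assumed location) to read:
#
#     MaxStartups 100
#
# then reload sshd, e.g.:
$ sudo /etc/init.d/sshd reload

# On every machine, raise the open file limit for the account that will
# run Hypertable by adding lines like these to /etc/security/limits.conf:
#
#     hypertable  soft  nofile  65536
#     hypertable  hard  nofile  65536
#
# Log out, log back in, and verify:
$ ulimit -n
65536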

Step 1 - Install HDFS


The first step in getting Hypertable up and running on top of Hadoop is to install HDFS.  Hypertable currently builds against Cloudera's CDH3 distribution of Hadoop (see CDH3 Installation for installation instructions).  Each RangeServer process should run on a machine that is also running an HDFS DataNode.  It's best not to run the HDFS NameNode on the same machine as a RangeServer since both of those processes tend to consume a lot of RAM.

To accommodate Bigtable-style workloads, HDFS needs to be specially configured.  The dfs.datanode.max.xcievers property, which controls the number of files that a DataNode can service concurrently, should be increased to at least 4096, and dfs.namenode.handler.count, which controls the number of NameNode threads available to handle RPCs, should be increased to at least 20.  This can be accomplished by adding the following lines to the conf/hdfs-site.xml file.


<property>
  <name>dfs.namenode.handler.count</name>
  <value>20</value>
</property>

<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>
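
Keep in mind that edits to conf/hdfs-site.xml take effect only after the HDFS daemons are restarted.  As a sketch, assuming CDH3-style init scripts (the service names and paths may differ on your system):

$ sudo /etc/init.d/hadoop-0.20-namenode restart   # on the NameNode machine
$ sudo /etc/init.d/hadoop-0.20-datanode restart   # on each DataNode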

Once the filesystem is installed, create a /hypertable directory that is readable and writable by the user account in which Hypertable will run.  For example:


sudo -u hdfs hadoop fs -mkdir /hypertable

sudo -u hdfs hadoop fs -chmod 777 /hypertable
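
As a quick sanity check, you can list the filesystem root and confirm the directory exists with the expected permissions; the listing below is illustrative, so the owner and timestamp will differ on your cluster:

$ sudo -u hdfs hadoop fs -ls / | grep hypertable
drwxrwxrwx   - hdfs supergroup          0 2012-01-01 00:00 /hypertable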


Step 2 - Install Capistrano


The Hypertable distribution comes with a number of scripts to start and stop the various servers that make up a Hypertable cluster.  You can use your own cluster management tool to launch these scripts and deploy new binaries.  However, if you're not already using a cluster management tool, we recommend Capistrano.  The distribution comes with a Capistrano config file (conf/Capfile.cluster) that makes deploying and launching Hypertable a breeze.

Capistrano is a simple tool for automating the remote execution of tasks.  It uses ssh to do the remote execution.  To ease deployment, you should have password-less ssh access (i.e. public key) to all of the machines in your cluster.  Installing Capistrano is pretty simple.  On most systems you just need to execute the following commands (Internet access required):


$ sudo gem update

$ sudo gem install capistrano
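
These commands assume the gem tool (RubyGems) is already present.  If it is not, install Ruby and RubyGems first; the package names below are typical but may vary by distribution:

$ sudo apt-get install ruby rubygems    # Debian/Ubuntu
$ sudo yum install ruby rubygems        # RHEL/CentOS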

After this installation step you should now have the cap program in your path:


$ cap --version

Capistrano v2.9.0

Step 3 - Edit Capistrano Capfile


Once you have Capistrano installed, copy the conf/Capfile.cluster that comes with the Hypertable distribution to your working directory (e.g. home directory) on admin1, rename it to Capfile, and tailor it for your environment.  The cap command reads the file Capfile in the current working directory by default.  There are some variables that are set at the top that you need to modify for your particular environment.  The following shows the variables at the top of the Capfile that need modification:


set :source_machine,    "admin1"

set :install_dir,       "/opt/hypertable"

set :hypertable_version, "0.9.5.5"

set :default_pkg,       "/tmp/hypertable-0.9.5.5-linux-x86_64.rpm"

set :default_dfs,       "hadoop"

set :default_config,    "/root/hypertable.cfg"

Here's a brief description of each variable:


Table 2. Hypertable Capistrano Variables

source_machine: the machine from which you will build the binaries, distribute them to the other machines, and launch the service

install_dir: the directory on source_machine where you have installed Hypertable; it is also the directory on the remote machines to which the installation will get rsync'ed

hypertable_version: the version of Hypertable you are deploying

default_pkg: the path to the binary package file (.dmg, .rpm, or .tar.bz2) on the source machine

default_dfs: the distributed filesystem you are running Hypertable on top of; valid values are "local", "hadoop", "kfs", and "ceph"

default_config: the location of the default Hypertable configuration file that you plan to use

In addition to the above variables, you also need to define three roles: one for the machine that will run the master processes, one for the machines that will run the Hyperspace replicas, and one for the machines that will run the RangeServers.  Edit the following lines:


role :source, "admin1"

role :master, "master"

role :hyperspace, "hyperspace001", "hyperspace002", "hyperspace003"

role :slave, "slave001", "slave002", "slave003", "slave004", "slave005", "slave006", "slave007", "slave008"

role :localhost, "admin1"

role :thriftbroker

role :spare

The following table describes each role.


Table 3. Hypertable Capistrano Roles

source: the machine from which you will be distributing the binaries (admin1 in this example)

master: the machine that will run the Hypertable master process as well as a DFS broker.  Ideally this machine is high quality and somewhat lightly loaded (e.g. not running a RangeServer).  Typically you would have a high quality machine running the Hypertable master, a Hyperspace replica, and the HDFS NameNode

hyperspace: the machines that will run Hyperspace replicas.  There should be at least one machine defined for this role.  The machines that take on this role should be somewhat lightly loaded (e.g. not running a RangeServer)

slave: the machines that will run RangeServers.  Hypertable is designed to run on a filesystem like HDFS; in fact, the system works best from a performance standpoint when the RangeServers are run on the same machines as the HDFS DataNodes.  This role will also launch a DFS broker and a ThriftBroker

localhost: the name of the machine that you're administering the cluster from (admin1 in this example)

thriftbroker: additional machines that will be running a ThriftBroker (e.g. web servers).  NOTE: you do not have to add the slave machines to this role, since a ThriftBroker is automatically started on each slave machine to support MapReduce

spare: machines that will act as standbys; they will be kept current with the latest binaries

Step 4 - Install Hypertable Binaries


The Hypertable binaries can either be downloaded prepackaged, or you can compile them from source code.  To install the prepackaged version, download the Hypertable package (.dmg, .rpm, or .tar.bz2) that you want to install and put it somewhere accessible on the source machine (admin1 in this example).  Modify the hypertable_version and default_pkg variables at the top of the Capfile to contain the version of Hypertable you are installing and the absolute path to the package file on the source machine, respectively.  For example, if you're upgrading to version 0.9.5.5 and using the RPM package, set the variables as follows.


set :hypertable_version, "0.9.5.5"

set :default_pkg,       "/tmp/hypertable-0.9.5.5-linux-x86_64.rpm"

To distribute and install the binary package on all necessary machines, issue the following command.  This command will cause the package to get rsync'ed to all participating machines and installed with the appropriate package manager (rpm, dpkg, or tar) depending on the package type.


$ cap install_package

If you prefer compiling the binaries from source, you can use Capistrano to distribute the binaries with rsync.  On admin1, be sure Hypertable is installed in the location specified by the install_dir variable at the top of the Capfile and that the hypertable_version variable at the top of the Capfile matches the version you are installing (/opt/hypertable and 0.9.5.5 in this example).  Then distribute the binaries with the following command.


$ cap dist

Step 5 - FHS-ize Installation


See Filesystem Hierarchy Standard (http://hypertable.com/documentation/misc/filesystem_hierarchy_standard_fhs/) for an introduction to FHS.  If you're running as a user other than root, first create the directories /etc/opt/hypertable and /var/opt/hypertable on all machines in the cluster and change ownership to the user account under which the binaries will be run.  For example:


$ sudo cap shell

cap> mkdir /etc/opt/hypertable /var/opt/hypertable

cap> chown chris:staff /etc/opt/hypertable /var/opt/hypertable

Then FHS-ize the installation with the following command:


$ cap fhsize
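
To give a rough idea of what fhsizing does, the layout afterwards looks something like the following.  This is a sketch; the exact set of entries may vary by version:

/opt/hypertable/0.9.5.5/    # binaries for the installed version
/etc/opt/hypertable/        # configuration files (e.g. hypertable.cfg)
/var/opt/hypertable/        # runtime state such as logs and the run directory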

Step 6 - Create and Distribute hypertable.cfg


The next step is to create a hypertable.cfg file that is specific to your deployment.  A basic hypertable.cfg file can be found in the conf/ subdirectory of your Hypertable installation, which can be copied and modified as needed.  The following table shows the minimum set of required and recommended properties that you need to modify.


Table 1. Recommended and Required Properties

HdfsBroker.fs.default.name: the URL of the HDFS NameNode.  Should match the fs.default.name property in the Hadoop configuration file hdfs-site.xml

Hyperspace.Replica.Host: the hostname of a Hyperspace replica

Hypertable.RangeServer.Monitoring.DataDirectories: optional, but recommended.  Contains the list of directories that are the mount points of the HDFS DataNode storage volumes.  By setting this property appropriately, the Hypertable monitoring system will be able to provide accurate disk usage information

You can leave all other properties at their default values.  Hypertable is designed to adapt to the hardware on which it runs and to dynamically adapt to changes in workload, so no special configuration is needed beyond the basic properties listed in the above table.  For example, the following shows the changes we made to the hypertable.cfg file for our test cluster.


HdfsBroker.fs.default.name=hdfs://master:9000

Hyperspace.Replica.Host=hyperspace001

Hyperspace.Replica.Host=hyperspace002

Hyperspace.Replica.Host=hyperspace003

 

Hypertable.RangeServer.Monitoring.DataDirectories="/data/1,/data/2,/data/3,/data/4"

See hypertable-example.cfg (http://www.hypertable.org/pub/hypertable-example-cfg.txt).


Once you've created the hypertable.cfg file for your cluster, put it on the source machine (admin1) and set the absolute pathname referenced in the default_config Capfile variable to point to this file (e.g. /etc/opt/hypertable/hypertable.cfg).  Then distribute the custom config files with the following command.


$ cap push_config

If you ever need to make changes to the config file, make the changes, re-run cap push_config, and then restart Hypertable (see Steps 9 and 11, below).

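Put together, a typical configuration change looks something like this sketch (the editor and the config path are placeholders for your own):

$ vi /etc/opt/hypertable/hypertable.cfg   # edit the config on admin1
$ cap push_config                         # redistribute it to the cluster
$ cap stop                                # restart Hypertable (Steps 9 and 11)
$ cap start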

Step 7 - Set "current" link


To make the latest version of Hypertable referenceable from a well-known location, create a "current" link to point to the latest installation.  This can be accomplished with the following command.


$ cap set_current

Step 8 - Synchronize Clocks


The system cannot operate correctly unless the clocks on all machines are synchronized.  Use the Network Time Protocol (ntp) to ensure that the clocks get synchronized and remain in sync.  Run the 'date' command on all machines to make sure they are in sync.  The following Capistrano shell session shows the output of a cluster with properly synchronized clocks.


cap> date

[establishing connection(s) to master, hyperspace001, hyperspace002, hyperspace003, slave001, slave002, slave003, slave004, slave005, slave006, slave007, slave008]

 ** [out :: master] Sat Jan 3 18:05:33 PST 2009
 ** [out :: hyperspace001] Sat Jan 3 18:05:33 PST 2009
 ** [out :: hyperspace002] Sat Jan 3 18:05:33 PST 2009
 ** [out :: hyperspace003] Sat Jan 3 18:05:33 PST 2009
 ** [out :: slave001] Sat Jan 3 18:05:33 PST 2009
 ** [out :: slave002] Sat Jan 3 18:05:33 PST 2009
 ** [out :: slave003] Sat Jan 3 18:05:33 PST 2009
 ** [out :: slave004] Sat Jan 3 18:05:33 PST 2009
 ** [out :: slave005] Sat Jan 3 18:05:33 PST 2009
 ** [out :: slave007] Sat Jan 3 18:05:33 PST 2009
 ** [out :: slave008] Sat Jan 3 18:05:33 PST 2009
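
If the clocks disagree, one common approach is to run an ntp daemon on every machine.  The sketch below assumes the ntp package is installed and uses init-script style service management, so the service name and commands may differ on your system:

$ cap shell
cap> sudo /etc/init.d/ntpd start   # the service may be called "ntp" on Debian-style systems
cap> ntpq -p                       # verify each machine is tracking a time server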

Step 9 - Start Hypertable


The following commands should be run from the directory containing the Capfile.  To start all of the Hypertable servers:


$ cap start

If you want to launch the service using a different config file than the default (e.g. /home/chris/alternate.cfg):


$ cap -S config=/home/chris/alternate.cfg start

You'll need to specify the same config file when running Hypertable commands such as the command shell, for example:


$ /opt/hypertable/current/bin/ht shell --config=/home/chris/alternate.cfg

Step 10 - Verify Installation


Create a table.


echo "USE '/'; CREATE TABLE foo ( c1, c2 ); GETLISTING;" \

   |/opt/hypertable/current/bin/ht shell --batch

The output of this command should look like:


foo

sys (namespace)

Load some data.


echo "USE '/'; INSERT INTO foo VALUES('001','c1', 'very'), \

   ('000','c1', 'Hypertable'), ('001', 'c2', 'easy'), ('000', 'c2', 'is');" \

   |/opt/hypertable/current/bin/ht shell --batch

Dump the table.


echo "USE '/'; SELECT * FROM foo;" \

   |/opt/hypertable/current/bin/ht shell --batch

The output of this command should look like:


000 c1           Hypertable

000 c2           is

001 c1           very

001 c2           easy

Step 11 - Stop Hypertable


To stop the service, shutting down all servers:


$ cap stop

If you want to wipe your database clean, removing all namespaces and tables:


$ cap cleandb

What Next?


Congratulations!  Now that you have successfully installed Hypertable, we recommend that you walk through the HQL Tutorial to get familiar with using the system.


 
