ZooKeeper Series, Part 3: Installing ZooKeeper

Source: Internet. Editor: 程序博客网. Time: 2024/05/16 13:55

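ZooKeeper can be installed in three modes: standalone mode, cluster mode, and pseudo-distributed cluster mode. Standalone mode is the simplest to install, so if you are new to ZooKeeper, start with standalone mode or pseudo-distributed cluster mode.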

1) Standalone mode

First, download the latest stable release of ZooKeeper from the official Apache website:

http://hadoop.apache.org/zookeeper/releases.html

Users in China can save considerable time by choosing a mirror server located nearby:

http://labs.renren.com/apache-mirror//hadoop/zookeeper/

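ZooKeeper needs a Java runtime, version Java 6 or later, which can be downloaded from the Sun website; set the Java environment variables after installing it. In addition, to make later operations more convenient, configure environment variables for ZooKeeper by adding the following to /etc/profile: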

#Set ZooKeeper Environment

export ZOOKEEPER_HOME=/root/hadoop-0.20.2/zookeeper-3.3.1

export PATH=$PATH:$ZOOKEEPER_HOME/bin:$ZOOKEEPER_HOME/conf
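If /etc/profile is sourced more than once, plain appending keeps lengthening PATH. The following is a minimal sketch of a guarded variant, assuming the install path used in this article; adjust it to your own system.

```shell
# Install path taken from the article; change it to match your system.
ZOOKEEPER_HOME=/root/hadoop-0.20.2/zookeeper-3.3.1
export ZOOKEEPER_HOME

# Append the bin and conf directories only if bin is not already on PATH,
# so sourcing the file repeatedly does not keep growing PATH.
case ":$PATH:" in
  *":$ZOOKEEPER_HOME/bin:"*) ;;
  *) PATH="$PATH:$ZOOKEEPER_HOME/bin:$ZOOKEEPER_HOME/conf"
     export PATH ;;
esac

echo "ZOOKEEPER_HOME=$ZOOKEEPER_HOME"
```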

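The ZooKeeper server is contained in a single JAR file. To install the service, you create a configuration file and set its parameters. In the conf folder of the ZooKeeper-*.*.* directory (this article uses 3.3.1, the latest ZooKeeper release at the time of writing, so "ZooKeeper-*.*.*" is written as "ZooKeeper-3.3.1" from here on), create a file named zoo.cfg with the following content: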

tickTime=2000

dataDir=/var/zookeeper

clientPort=2181

In this file you must set dataDir, which points to a directory that must be empty at the start. The parameters have the following meanings:

tickTime: the basic time unit, in milliseconds. It is used for heartbeats, and the minimum session timeout is twice the tickTime.

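dataDir: the location where snapshots of the in-memory database are stored. Unless a separate location is specified (for example with dataLogDir), the transaction log of updates is also written to this directory.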

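clientPort: the port that listens for client connections.

Note when using standalone mode: this configuration has no ZooKeeper replicas, so if the ZooKeeper server fails, the ZooKeeper service stops.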

Listing A below is the ZooKeeper configuration file, zoo.cfg, that we set up for our own environment:

Listing A: zoo.cfg

# The number of milliseconds of each tick

tickTime=2000

# the directory where the snapshot is stored.

dataDir=/root/hadoop-0.20.2/zookeeper-3.3.1/snapshot/data

# the port at which the clients will connect

clientPort=2181
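As a rough sketch of the standalone steps so far, the script below creates an empty data directory and writes a minimal zoo.cfg next to it. ZK_BASE and its /tmp default are assumptions made for illustration; they are not the install path used in the listing above.

```shell
#!/bin/sh
# ZK_BASE is a hypothetical scratch location for this sketch.
ZK_BASE="${ZK_BASE:-/tmp/zk-standalone-demo}"
ZK_DATA="$ZK_BASE/snapshot/data"

# dataDir should start out as an empty directory.
mkdir -p "$ZK_DATA" "$ZK_BASE/conf"

# Write the three parameters described above.
cat > "$ZK_BASE/conf/zoo.cfg" <<EOF
tickTime=2000
dataDir=$ZK_DATA
clientPort=2181
EOF

# Show the effective settings (non-comment lines only).
grep -v '^#' "$ZK_BASE/conf/zoo.cfg"
```

With ZooKeeper installed and this file in its conf folder, the server is typically started with zkServer.sh start from the bin directory, and zkCli.sh -server 127.0.0.1:2181 opens a client session against it.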

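2) Cluster mode

To obtain a reliable ZooKeeper service, deploy ZooKeeper on a cluster. As long as a majority of the ZooKeeper servers in the cluster are up, the overall service is available. It is also best to use an odd number of machines. For example, a ZooKeeper cluster of 5 machines can tolerate the failure of 2 machines.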
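The majority rule can be checked with a little arithmetic. The sketch below assumes that a majority of n servers means floor(n/2) + 1; the function name tolerated_failures is made up for this example. It also hints at why odd sizes are preferred: a fourth or sixth server adds no extra fault tolerance.

```shell
# For an ensemble of n servers, a majority is floor(n/2) + 1,
# so the service survives as long as at most n - majority servers fail.
tolerated_failures() {
  n=$1
  majority=$(( n / 2 + 1 ))
  echo $(( n - majority ))
}

for size in 3 4 5 6; do
  echo "$size servers -> tolerates $(tolerated_failures "$size") failure(s)"
done
```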

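The remaining steps are similar to the standalone installation: set up the Java environment, download the latest stable ZooKeeper release, and configure the corresponding environment variables. The difference lies in the parameters in conf/zoo.cfg on each machine. Use the following configuration as a reference: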

tickTime=2000

dataDir=/var/zookeeper/

clientPort=2181

initLimit=5

syncLimit=2

server.1=zoo1:2888:3888

server.2=zoo2:2888:3888

server.3=zoo3:2888:3888

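Each "server.id=host:port:port" line identifies one ZooKeeper server. Every machine that is part of the cluster must know about the other machines in the ensemble, and it reads that information from these lines. In each server's data directory (the directory given by dataDir), create a file named myid that contains a single line with that server's own id. For example, server "1" writes "1" into its myid file. The id must be unique within the ensemble and lie between 1 and 255.

In each server line, the first port is the one followers use to connect to the leader, and the second port is used for leader election. In this example, each machine therefore uses three ports: the clientPort 2181, port 2888, and port 3888.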
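Creating the myid file can be sketched as follows. ZK_MYID_DIR stands in for the dataDir value and ZK_SERVER_ID for the number in the matching server.N line; both names and the /tmp default are assumptions for illustration.

```shell
# Hypothetical values: point ZK_MYID_DIR at your real dataDir and set
# ZK_SERVER_ID to the N of this machine's server.N line.
ZK_MYID_DIR="${ZK_MYID_DIR:-/tmp/zk-myid-demo/data}"
ZK_SERVER_ID="${ZK_SERVER_ID:-1}"

# The id has to lie between 1 and 255.
if [ "$ZK_SERVER_ID" -ge 1 ] && [ "$ZK_SERVER_ID" -le 255 ]; then
  mkdir -p "$ZK_MYID_DIR"
  # myid holds a single line with nothing but the id.
  echo "$ZK_SERVER_ID" > "$ZK_MYID_DIR/myid"
  echo "wrote id $(cat "$ZK_MYID_DIR/myid") to $ZK_MYID_DIR/myid"
else
  echo "server id must be between 1 and 255" >&2
fi
```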

We tested the ZooKeeper service on a three-machine Hadoop cluster. Listing B below is the ZooKeeper configuration file we set up for our own environment:

Listing B: zoo.cfg

# The number of milliseconds of each tick

tickTime=2000

# The number of ticks that the initial

# synchronization phase can take

initLimit=10

# The number of ticks that can pass between

# sending a request and getting an acknowledgement

syncLimit=5

# the directory where the snapshot is stored.

dataDir=/root/hadoop-0.20.2/zookeeper-3.3.1/snapshot/d1

# the port at which the clients will connect

clientPort=2181

server.1=IP1:2887:3887

server.2=IP2:2888:3888

server.3=IP3:2889:3889

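IP1, IP2, and IP3 in the listing stand for the IP addresses of the machines in the distributed ZooKeeper deployment. You can also address the machines by hostname, but that requires the corresponding entries in the hosts file on Ubuntu. Consult the Ubuntu and Linux documentation for details.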
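Because initLimit and syncLimit are counted in ticks, it can help to convert them to milliseconds. The sketch below does this for the values used in the cluster listing above; the helper name get_value and the temporary file are assumptions made to keep the example self-contained.

```shell
# Write a small sample config to a temp file so the sketch runs on its own.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
tickTime=2000
initLimit=10
syncLimit=5
EOF

# get_value KEY FILE: print the value of a KEY=value line in FILE.
get_value() {
  sed -n "s/^$1=//p" "$2"
}

tick_ms=$(get_value tickTime "$cfg")
init_ticks=$(get_value initLimit "$cfg")
sync_ticks=$(get_value syncLimit "$cfg")

echo "initLimit: $(( init_ticks * tick_ms )) ms"
echo "syncLimit: $(( sync_ticks * tick_ms )) ms"
echo "minimum session timeout: $(( 2 * tick_ms )) ms"

rm -f "$cfg"
```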

3) Pseudo-distributed cluster mode

Put simply, pseudo-distributed cluster mode simulates a clustered ZooKeeper service on a single machine.

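So how do you configure pseudo-distributed cluster mode? It is quite simple. In the ZooKeeper configuration file, the clientPort parameter sets the port on which clients connect to ZooKeeper. In server.1=IP1:2887:3887, IP1 is the IP address of a machine in the ZooKeeper service, 2887 is the port followers use to connect to the leader, and 3887 is the port used for leader election. In pseudo-distributed cluster mode, each configuration file simulates one machine, which means running several ZooKeeper instances on a single machine. The clientPort values of the configuration files must therefore not conflict, and the same holds for the two ports on each server line.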

Below is the pseudo-distributed cluster we configured: zoo1.cfg, zoo2.cfg, and zoo3.cfg simulate a three-machine ZooKeeper cluster. See Listing C:

Listing C: zoo1.cfg:

# The number of milliseconds of each tick

tickTime=2000

# The number of ticks that the initial

# synchronization phase can take

initLimit=10

# The number of ticks that can pass between

# sending a request and getting an acknowledgement

syncLimit=5

# the directory where the snapshot is stored.

dataDir=/root/hadoop-0.20.2/zookeeper-3.3.1/d_1

# the port at which the clients will connect

clientPort=2181

server.1=localhost:2887:3887

server.2=localhost:2888:3888

server.3=localhost:2889:3889

zoo2.cfg:

# The number of milliseconds of each tick

tickTime=2000

# The number of ticks that the initial

# synchronization phase can take

initLimit=10

# The number of ticks that can pass between

# sending a request and getting an acknowledgement

syncLimit=5

# the directory where the snapshot is stored.

dataDir=/root/hadoop-0.20.2/zookeeper-3.3.1/d_2

# the port at which the clients will connect

clientPort=2182

#the location of the log file

dataLogDir=/root/hadoop-0.20.2/zookeeper-3.3.1/logs

server.1=localhost:2887:3887

server.2=localhost:2888:3888

server.3=localhost:2889:3889

zoo3.cfg:

# The number of milliseconds of each tick

tickTime=2000

# The number of ticks that the initial

# synchronization phase can take

initLimit=10

# The number of ticks that can pass between

# sending a request and getting an acknowledgement

syncLimit=5

# the directory where the snapshot is stored.

dataDir=/root/hadoop-0.20.2/zookeeper-3.3.1/d_3

# the port at which the clients will connect

clientPort=2183

#the location of the log file

dataLogDir=/root/hadoop-0.20.2/zookeeper-3.3.1/logs

server.1=localhost:2887:3887

server.2=localhost:2888:3888

server.3=localhost:2889:3889

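As the three listings above show, dataDir differs between the files in addition to clientPort. Also remember to create a myid file in the directory given by each dataDir to identify the corresponding ZooKeeper server instance.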
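Writing three nearly identical files by hand invites copy-and-paste slips, so the pseudo-distributed setup can also be generated in a loop. This is a sketch under assumptions: ZK_PSEUDO_BASE and its /tmp default are hypothetical, and only the parameters shown in the listings are written.

```shell
#!/bin/sh
# ZK_PSEUDO_BASE is a hypothetical scratch location for this sketch.
ZK_PSEUDO_BASE="${ZK_PSEUDO_BASE:-/tmp/zk-pseudo-demo}"
mkdir -p "$ZK_PSEUDO_BASE/conf"

for id in 1 2 3; do
  data_dir="$ZK_PSEUDO_BASE/d_$id"
  mkdir -p "$data_dir"

  # Each instance gets its own dataDir with a matching myid file.
  echo "$id" > "$data_dir/myid"

  # Each instance gets its own clientPort: 2181, 2182, 2183.
  cat > "$ZK_PSEUDO_BASE/conf/zoo$id.cfg" <<EOF
tickTime=2000
initLimit=10
syncLimit=5
dataDir=$data_dir
clientPort=$(( 2180 + id ))
server.1=localhost:2887:3887
server.2=localhost:2888:3888
server.3=localhost:2889:3889
EOF
done

# Sanity check: no two files may share a clientPort.
dupes=$(grep -h '^clientPort=' "$ZK_PSEUDO_BASE"/conf/zoo[123].cfg | sort | uniq -d)
if [ -z "$dupes" ]; then
  echo "clientPort values are unique"
else
  echo "duplicate clientPort: $dupes" >&2
fi
```

Each instance is then started with its own configuration file. Whether zkServer.sh accepts the file name as a second argument (for example zkServer.sh start zoo1.cfg) depends on the release, so check the script shipped with your version.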

That completes the installation of ZooKeeper. The next part discusses how to understand ZooKeeper's configuration parameters.

-----

If you have any questions, please email shenlan211314@gmail.com. Thank you!