redis-py-cluster in Detail


The previous article covered the cluster mode introduced in Redis 3.0, so how could we leave out the corresponding Python client package?

1. Getting started

pip install redis-py-cluster

>>> from rediscluster import StrictRedisCluster
>>> startup_nodes = [{"host": "127.0.0.1", "port": "7000"}, {"host": "127.0.0.1", "port": "7001"}, {"host": "127.0.0.1", "port": "7002"}]
>>> rc = StrictRedisCluster(startup_nodes=startup_nodes, decode_responses=True)
>>> rc.set("foo", "bar")
True
>>> rc.get("foo")
'bar'


2. How redis-py-cluster routes requests

Some requests are sent to every node in the cluster. This is easy to understand: a command like KEYS needs a response from all nodes.

Some requests are sent only to the master nodes, e.g. FLUSHDB, FLUSHALL, and SCAN; slaves cannot execute these commands, so there is no point wasting network requests on them.

Some requests, such as PUBLISH, are sent to a single node (why only one node is explained later, in the pub/sub section).

Key-based commands such as HSCAN and SSCAN compute the CRC16 of the key and take it modulo 16384 to determine which node should receive the request. This follows Redis Cluster's key sharding scheme; see the previous article for details.
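To make the mapping concrete, here is a minimal sketch of the slot calculation (Redis Cluster uses CRC16-XMODEM, polynomial 0x1021), assuming the key contains no {hash tag}; for tagged keys, only the part inside the braces is hashed:

# Minimal sketch of Redis Cluster's key-to-slot mapping.
def crc16(data: bytes) -> int:
    # CRC16-XMODEM: polynomial 0x1021, initial value 0
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def keyslot(key: str) -> int:
    return crc16(key.encode()) % 16384  # 16384 slots in total

print(keyslot("foo"))  # 12182, the same slot CLUSTER KEYSLOT foo reports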

In addition:

redis-py-cluster does not implement every redis-py command; for those it leaves out, the redis-py implementation is used directly.


3. Pipeline differences

For a standalone Redis instance, a pipeline is easy to understand: multiple commands are packed into a single request, cutting the overhead of repeated network round trips.

For a cluster, different keys are handled by different nodes, so the client should first group the commands by node: all commands handled by node1 are packed into package1, all commands handled by node2 into package2, and so on. Then only one request per relevant node needs to be sent.

The per-node requests can go out in parallel, using either multiple threads or gevent.
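Below is a conceptual sketch of that grouping step, not redis-py-cluster's actual internals; node_for_key and send_batch are hypothetical helpers standing in for slot lookup and per-node network I/O:

from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def run_cluster_pipeline(commands, node_for_key, send_batch):
    # commands: list of (op, key, *args) tuples
    batches = defaultdict(list)
    for cmd in commands:
        batches[node_for_key(cmd[1])].append(cmd)  # group by owning node
    # one request per node, sent in parallel via a thread pool
    with ThreadPoolExecutor() as pool:
        futures = {node: pool.submit(send_batch, node, batch)
                   for node, batch in batches.items()}
        return {node: f.result() for node, f in futures.items()}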

In the official docs' words:

The client is responsible for figuring out which commands map to which nodes. Let's say for example that your 100 pipelined commands need to route to 3 different nodes. The first thing the client does is break out the commands that go to each node, so it only has 3 network requests to make instead of 100.
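A minimal usage sketch follows; the key names are made up, and the cluster pipeline groups the commands by target node behind the scenes:

>>> with rc.pipeline() as pipe:
...     pipe.set('user:1', 'alice')   # these keys may live on different nodes
...     pipe.set('user:2', 'bob')
...     pipe.get('user:1')
...     print(pipe.execute())         # results come back in command order
[True, True, 'alice']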


4. Pub/Sub

One-sentence summary first: the cluster does not yet support pub/sub gracefully. For the specific reason, read this passage from the official docs:

According to the current official redis documentation on PUBLISH:

    Integer reply: the number of clients that received the message.

It was initially assumed that if we had clients connected to different nodes in the cluster, it would still report back the correct number of clients that received the message.

However, after some testing of this command, it was discovered that it would only report the number of clients that have subscribed on the same server the PUBLISH command was executed on.

Because of this, if there is some functionality that relies on an exact and correct number of clients that listen/subscribed to a specific channel, it will be broken or behave wrong.

Roughly: the Redis docs say PUBLISH returns the number of clients that received the message, but testing showed the count is only correct for clients connected to the very server where the PUBLISH command was executed.


The current version, 1.2, handles pub/sub as follows. The docs first:

In release 1.2.0 the pubsub code was reworked to now work like this.

For PUBLISH and SUBSCRIBE commands:

    The channel name is hashed and the keyslot is determined.

    Determine the node that handles the keyslot.

    Send the command to the node.

Roughly: hash the channel name with CRC16 to find which hash slot it lands in, which determines the responsible node, then send everything to that node. Put plainly, the cluster degenerates to standalone behavior here: all pub/sub traffic for a given channel flows through a single node.
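A minimal sketch of what that routing means in practice, reusing the rc client from section 1; 'news' is a made-up channel name, and both sides end up talking to the node that owns its keyslot:

>>> p = rc.pubsub()
>>> p.subscribe('news')           # sent to the node owning the 'news' slot
>>> rc.publish('news', 'hello')   # sent to that same node
1
>>> p.get_message()               # the first message is the subscribe confirmation
{'type': 'subscribe', 'pattern': None, 'channel': 'news', 'data': 1}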


5. Readonly mode

By default, when you access a slave node in the cluster, it replies with a MOVED redirect. Passing the keyword argument readonly_mode=True when constructing StrictRedisCluster changes this, so that reads can be served from slave nodes as well:

>>> rc_readonly = StrictRedisCluster(startup_nodes=startup_nodes, decode_responses=True, readonly_mode=True)

Pipelines honor this keyword argument as well:

>>> with rc_readonly.pipeline() as readonly_pipe:
...     readonly_pipe.get('foo81')
...     readonly_pipe.get('foo16706')
...     readonly_pipe.execute()
[u'foo', u'bar']

The drawback is obvious: Redis master-slave replication is asynchronous, so data read from a slave may well be stale, and inconsistency is easy to run into.

Also, from the docs:

You MUST NOT use SET related operations on an object with READONLY mode enabled

>>> # NO: This works in almost every case, but may emit a Too many Cluster redirections error...
>>> rc_readonly.set('foo', 'bar')
>>> # OK: You should always use get related stuff...
>>> rc_readonly.get('foo')

Readonly mode allows access to slaves, so a write may hit slave after slave, piling up redirections until the "Too many Cluster redirections" error appears; a read, by contrast, simply follows the redirect to the next node when the data is not on the current one.