rabbitmq-tutorial


http://blog.ottocho.com/post/rabbitmq-tutorial-2

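Preface

These are notes I took while studying the official documentation. Together with the official docs they make up an introductory guide to RabbitMQ. This is not really a translation, since I think many sentences read better left untranslated, and it is certainly not original work. All the code has been tested and commented.

The original tutorial uses six articles as examples, moving from the simplest one-to-one model to a reasonably realistic RPC example. Here I put them into two articles, divided into six sections.

The original documentation is here: getstarted.html

Part one covers Hello World, Work Queues and Publish/Subscribe; part two covers Routing, Topic and RPC. This article is part one.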


RabbitMQ


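Conceptually, RabbitMQ addresses how applications connect to each other and how they scale. Sending and receiving are isolated: the sender does not know who will ultimately receive a message, and the receiver does not need to care who sent it. Because sending and receiving are isolated, messaging is inherently asynchronous. This isolation decouples the applications from one another, and RabbitMQ acts as the router between them. As for scale, once applications no longer depend on each other directly, they are easier to scale at the business level. from here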


Hello World

from here

RabbitMQ is a message broker.

Glossary

First, let's define the terms used in RabbitMQ.

Producing: sending messages. Producer: a program that sends messages.

[image: producer]

Queue: the name for a mailbox that lives inside RabbitMQ. It's essentially an infinite buffer. Many producers can send messages that go to one queue, and many consumers can try to receive data from one queue.

[image: queue]

Consuming, Consumer: analogous to producing and producer. A consumer is a program that mostly waits to receive messages.

[image: consumer]

This is the simplest produce-queue-consume model. Talk is cheap; here is the code.

Example

Sending.py: this script demonstrates the simplest one-to-one message delivery.

#!/usr/bin/env python
# coding:utf8

import pika
import time

credentials = pika.PlainCredentials('guest', 'guest')
parameters = pika.ConnectionParameters('localhost', 5672, '/', credentials)
connection = pika.BlockingConnection(parameters)
channel = connection.channel()

# create a queue
# queue_declare is idempotent: if the queue exists this is just a declaration, otherwise the queue is created.
channel.queue_declare(queue='hello')

#
# The exchange decides which queue a message should go to.
# A binding is a rule on the exchange; the routing key determines which queue the message goes to.
# Passing an empty string as the exchange selects the default exchange.
channel.basic_publish(exchange='',
                      routing_key='hello',
                      body='Hello World!')
print " [x] Sent 'Hello World!'"
connection.close()

Receiving.py

#!/usr/bin/env python
# coding:utf8

import pika

# Set up the connection as in Sending.py
credentials = pika.PlainCredentials('guest', 'guest')
parameters = pika.ConnectionParameters('localhost', 5672, '/', credentials)
connection = pika.BlockingConnection(parameters)
channel = connection.channel()

#
channel.queue_declare(queue='hello')

print ' [*] Waiting for messages. To exit press CTRL+C'

def callback(ch, method, properties, body):
    print " [x] Received %r" % (body,)

# The purpose of no_ack was covered in the previous intro article.
# With no_ack=True the consume request carries a no_ack flag, telling the AMQP server not to wait for acknowledgements.
# What acks are for: if a consumer's connection dies before it acks a message, RabbitMQ redelivers that message to another consumer of the queue.
# Without no_ack=True you have to send the acknowledgements yourself.
channel.basic_consume(callback,
                      queue='hello',
                      no_ack=True)

channel.start_consuming()

Work Queues

from here

Work Queue

A work queue is used to distribute time-consuming tasks among multiple workers.
Here one producer sends messages to a queue and multiple consumers take messages from it. If each task is treated as a message, the queue can be seen as a task queue.

[image: work-queue]

The main idea behind Work Queues (aka: Task Queues) is to avoid doing a resource-intensive task immediately and having to wait for it to complete. Instead we schedule the task to be done later. We encapsulate a task as a message and send it to the queue. A worker process running in the background will pop the tasks and eventually execute the job. When you run many workers the tasks will be shared between them.
This concept is especially useful in web applications where it's impossible to handle a complex task during a short HTTP request window.

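Example

In this section's example the messages are still simple strings (not images or PDFs). So we use sleep to simulate computation time, with each dot in the message standing for one second of work. For example, "hello..." takes three seconds to process.

In the example below, new_task.py schedules tasks to the work queue (it plays the role of the sender in the previous example).

Note that when no exchange is specified, the message is sent to the queue named by routing_key.

channel.queue_declare(queue='hello')

message = ' '.join(sys.argv[1:]) or "hello world."
channel.basic_publish(exchange='',
                      routing_key='hello',
                      body=message)
print " [x] Sent '%s'" % (message,)

worker.py: the same as receive.py from the previous example, except that the callback adds the simulated processing time.

def callback(ch, method, properties, body):
    print " [x] Received %r" % (body,)
    time.sleep(body.count('.'))
    print " [x] Done"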

Task Queue and Round-robin dispatching

By default, RabbitMQ will send each message to the next consumer, in sequence. On average every consumer will get the same number of messages. This way of distributing messages is called round-robin. Try this out with three or more workers.

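By default the consumers receive messages in turn, so delivery cycles through the consumers.

A simple task queue is easy to scale out (although the default configuration does not take the load on each worker into account).

You can use new_task.py and worker.py to simulate this round-robin dispatching.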
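To see the round-robin rule without starting a broker, here is a minimal sketch in plain Python. It only models the dispatch order; the function name round_robin_dispatch and the worker names are made up for illustration and are not part of pika or RabbitMQ.

```python
from itertools import cycle


def round_robin_dispatch(messages, workers):
    """Toy model: hand each message to the next worker in turn."""
    assignment = {worker: [] for worker in workers}
    turn = cycle(workers)
    for message in messages:
        assignment[next(turn)].append(message)
    return assignment


tasks = ["task%d" % i for i in range(1, 7)]
result = round_robin_dispatch(tasks, ["worker1", "worker2", "worker3"])
for worker in sorted(result):
    print(worker, result[worker])
```

With six tasks and three workers, every worker ends up with two tasks, regardless of how long each task takes.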

Message acknowledgment (no_ack)

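If a worker takes a long time to process a message and dies partway through, the message is lost, because without acknowledgements RabbitMQ removes a message from memory as soon as it has delivered it to a consumer. If you kill a worker (consumer), any messages it holds but has not yet processed are lost as well.

This is clearly a problem. When a consumer dies, we want its messages to be redelivered to other consumers.
Message acknowledgement makes sure messages are not lost. The consumer sends an ack back to RabbitMQ to say that a message has been handled, and RabbitMQ is then free to delete it. If a consumer dies without sending the ack for a message, RabbitMQ delivers that message to another consumer. This way messages are not lost.

There aren't any message timeouts. (So a timeout really cannot be set??) RabbitMQ will redeliver the message only when the worker connection dies. It's fine even if processing a message takes a very, very long time.

Message acknowledgments are turned on by default (no_ack defaults to False).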
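The ack bookkeeping described above can be sketched with a toy in-memory queue. This is only an illustration of the idea, not how RabbitMQ is implemented; the class ToyQueue and its methods are hypothetical names.

```python
from collections import deque


class ToyQueue(object):
    """Hypothetical in-memory queue that keeps delivered messages until acked."""

    def __init__(self):
        self.ready = deque()
        self.unacked = {}  # consumer name -> delivered but unacknowledged messages

    def publish(self, message):
        self.ready.append(message)

    def deliver(self, consumer):
        message = self.ready.popleft()
        self.unacked.setdefault(consumer, []).append(message)
        return message

    def ack(self, consumer, message):
        self.unacked[consumer].remove(message)

    def consumer_died(self, consumer):
        # Put the dead consumer's unacked messages back at the front, in order
        for message in reversed(self.unacked.pop(consumer, [])):
            self.ready.appendleft(message)


queue = ToyQueue()
queue.publish("hello...")
queue.publish("world.")
first = queue.deliver("worker1")
queue.consumer_died("worker1")          # worker1 dies before acking
redelivered = queue.deliver("worker2")  # the same message goes to worker2
queue.ack("worker2", redelivered)
```

Because the message stays in unacked until the ack arrives, killing worker1 does not lose it.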

Forgotten acknowledgment

It's a common mistake to miss the basic_ack. It's an easy error, but the consequences are serious. Messages will be redelivered when your client quits (which may look like random redelivery), but RabbitMQ will eat more and more memory as it won't be able to release any unacked messages.

If too many messages go unacknowledged, RabbitMQ uses more and more memory. To check for this problem, run:

$ rabbitmqctl list_queues name messages_ready messages_unacknowledged

Message durability

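The above covers redelivering messages through acks when a consumer dies. But what happens to the messages if the RabbitMQ server itself goes down? For persistence, we need to set the queue and messages as durable.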

channel.queue_declare(queue='hello', durable=True)

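Changing the parameters of an existing queue is not allowed. If queue_declare is called for an existing queue with different parameters, an error is returned.

To use different parameters, declare a new queue under another name.

For example (the original queue was named hello):

channel.queue_declare(queue='task_queue', durable=True)

The queue name then has to be updated in both the producer and the consumer code.

On the producer side, mark messages as persistent by setting the delivery_mode property to 2.

channel.basic_publish(exchange='',
                      routing_key="task_queue",
                      body=message,
                      properties=pika.BasicProperties(
                         delivery_mode = 2, # make message persistent
                      ))

Note:

Marking a message as persistent does not fully guarantee that it will not be lost. Although it tells RabbitMQ to save the message to disk, there is still a short time window when RabbitMQ has accepted a message and hasn't saved it yet. Also, RabbitMQ doesn't do fsync(2) for every message, so it may be just saved to cache and not really written to the disk. In other words, the persistence guarantee is not strong, but it is stronger than that of the simple task queue. If you need a very robust guarantee, you can wrap each message publish in a transaction.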

Fair dispatch

You might have noticed that the dispatching still doesn't work exactly as we want. For example in a situation with two workers, when all odd messages are heavy and even messages are light, one worker will be constantly busy and the other one will do hardly any work. Well, RabbitMQ doesn't know anything about that and will still dispatch messages evenly. (It always splits the messages evenly.)

This happens because RabbitMQ just dispatches a message when the message enters the queue. It doesn't look at the number of unacknowledged messages for a consumer. It just blindly dispatches every n-th message to the n-th consumer.

[image: prefetch-count]

In order to defeat that we can use the basic.qos method with the prefetch_count=1 setting. This tells RabbitMQ not to give more than one message to a worker at a time. Or, in other words, don't dispatch a new message to a worker until it has processed and acknowledged the previous one. Instead, it will dispatch it to the next worker that is not still busy.

channel.basic_qos(prefetch_count=1)
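The effect of prefetch_count=1 can be compared with plain round-robin in a small simulation. This is a sketch under simplifying assumptions: all messages are already in the queue at time 0, each worker handles one message at a time, and the numbers are seconds of work. The function names are made up for illustration.

```python
import heapq


def round_robin_load(durations, n_workers):
    """Total work per worker when the i-th message goes to worker i mod n."""
    load = [0] * n_workers
    for i, seconds in enumerate(durations):
        load[i % n_workers] += seconds
    return load


def fair_dispatch_load(durations, n_workers):
    """prefetch_count=1 model: the next message goes to the first free worker."""
    # Heap entries are (time at which the worker becomes free, worker index)
    free_at = [(0, w) for w in range(n_workers)]
    heapq.heapify(free_at)
    load = [0] * n_workers
    for seconds in durations:
        t, w = heapq.heappop(free_at)
        load[w] += seconds
        heapq.heappush(free_at, (t + seconds, w))
    return load


durations = [5, 1, 5, 1, 5, 1]  # odd messages heavy, even messages light
print(round_robin_load(durations, 2))
print(fair_dispatch_load(durations, 2))
```

In the round-robin case the first worker gets all the heavy messages, while in the prefetch model a worker that finishes a light message early picks up the next one.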

Queue size

If all the workers are busy, the queue can fill up. (The documentation does not say anything useful about this either.)

Sample

The source code is as follows:
new_task.py

#!/usr/bin/env python
# coding:utf8

import sys
import pika
import time

credentials = pika.PlainCredentials('guest', 'guest')
parameters = pika.ConnectionParameters('localhost', 5672, '/', credentials)
connection = pika.BlockingConnection(parameters)
channel = connection.channel()

channel.queue_declare(queue='task_queue', durable=True)

message = ' '.join(sys.argv[1:]) or "hello world."
channel.basic_publish(exchange='',
                      routing_key='task_queue',
                      body=message,
                      properties=pika.BasicProperties(
                         delivery_mode = 2, # make message persistent
                      ))
print " [x] Sent '%s'" % (message,)
connection.close()

worker.py

#!/usr/bin/env python
# coding:utf8

import pika
import time

credentials = pika.PlainCredentials('guest', 'guest')
parameters = pika.ConnectionParameters('localhost', 5672, '/', credentials)
connection = pika.BlockingConnection(parameters)
channel = connection.channel()

channel.queue_declare(queue='task_queue', durable=True)
print ' [*] Waiting for messages. To exit press CTRL+C'

def callback(ch, method, properties, body):
    print " [x] Received %r" % (body,)
    time.sleep(body.count('.'))
    print " [x] Done"
    ch.basic_ack(delivery_tag = method.delivery_tag)

channel.basic_qos(prefetch_count=1)
channel.basic_consume(callback,
                      queue='task_queue')

channel.start_consuming()

Publish/Subscribe

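Differences

Work Queue and Publish/Subscribe are both one-to-many models.

The difference is that in a Work Queue each message goes to exactly one consumer (worker), whereas in the publish/subscribe model the same message is delivered to multiple consumers.
From the program's point of view both are one-to-many. From the message's point of view, a Work Queue is one message to one consumer, while publish/subscribe is one message to many consumers.

The following example uses two programs.

One program emits log messages and the other receives and prints them. In this example every receiver gets every message, so you can run one receiver to write the logs to a file and another to view them on screen. In other words, log messages are broadcast to all queues (through a fanout exchange).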

Exchanges

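The previous two sections used very simple models. Now let's introduce exchanges.
The roles in the earlier models were:

  • producer: sends messages
  • queue: a buffer that stores messages
  • consumer: receives messages

In RabbitMQ the producer does not send a message directly to a queue. Actually, quite often the producer doesn't even know if a message will be delivered to any queue at all.
Instead, the producer sends messages to an exchange. The exchange receives messages from producers and forwards them to queues according to certain rules. The exchange type determines whether a message goes to one particular queue, to many queues, or is discarded.

In the figure, P is the producer and X is the exchange.

[image: exchanges]

The exchange types are: direct, topic, headers and fanout.

Below we introduce the fanout exchange. As the name suggests, a fanout exchange broadcasts each message to all the queues it knows.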

channel.exchange_declare(exchange='logs', type='fanout')

To list the exchanges:

$ rabbitmqctl list_exchanges
Listing exchanges ...
logs      fanout
amq.direct      direct
amq.topic       topic
amq.fanout      fanout
amq.headers     headers
...done.

The amq.* exchanges are created by default, as is the default (unnamed) exchange.

Nameless exchange

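The earlier examples did not declare or specify an exchange, yet messages were still delivered, because they used the default exchange. The default exchange is identified by the empty string ("").

channel.basic_publish(exchange='',
                      routing_key='hello',
                      body=message)

The exchange parameter names the exchange. The empty string selects the default exchange: the message is routed to the queue named by routing_key, if that queue exists.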
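The rule of the default exchange fits in a few lines of plain Python. This is a toy model only: queues is a hypothetical dict from queue name to a list of message bodies, and publish_to_default_exchange is a made-up helper, not a pika call.

```python
def publish_to_default_exchange(queues, routing_key, body):
    """Toy default exchange: deliver to the queue named routing_key, if any.

    Returns True if the message was routed and False if it was dropped.
    """
    if routing_key in queues:
        queues[routing_key].append(body)
        return True
    return False


queues = {"hello": []}
routed = publish_to_default_exchange(queues, "hello", "Hello World!")
dropped = publish_to_default_exchange(queues, "no_such_queue", "lost")
```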

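Temporary queues

The scenario for this example is:

Build a logger that listens to all messages (that is, logs) and only cares about current logs, not old ones.
The queue is then declared like this: result = channel.queue_declare(exclusive=True)
When the queue parameter is omitted, RabbitMQ generates a random name (such as amq.gen-JzTY20BRgKO-HjmUJj0wLg). With exclusive=True, the queue is deleted once its consumer is no longer connected.

The queue name is available as result.method.queue. This is a temporary queue with a server-generated name.

Such a queue acts as a relay rather than as storage: it only serves to broadcast messages, which a consumer receives while it is connected.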

Bindings

What is a binding? It is the relationship between an exchange and a queue (how messages are passed on, that is, the route).

[image: bindings]

Bind the anonymous queue declared above to the logs exchange:

channel.queue_bind(exchange='logs',
                   queue=result.method.queue)
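How bindings drive a fanout exchange can be sketched without a broker. In this toy model a queue is just a Python list; the class FanoutExchange and its methods are hypothetical names used for illustration.

```python
class FanoutExchange(object):
    """Toy fanout exchange: every bound queue gets a copy of every message."""

    def __init__(self):
        self.bound_queues = []  # each queue is modelled as a plain list

    def bind(self, queue):
        self.bound_queues.append(queue)

    def publish(self, body):
        # A fanout exchange ignores the routing key, so none is taken here
        for queue in self.bound_queues:
            queue.append(body)
        return len(self.bound_queues)


logs = FanoutExchange()
lost = logs.publish("nobody is listening")  # no bindings yet: dropped
file_queue, screen_queue = [], []
logs.bind(file_queue)
logs.bind(screen_queue)
delivered = logs.publish("info: Hello World!")
```

The first message is published before any queue is bound, so it reaches nobody; the second is copied to both queues.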

To list the bindings:

$ rabbitmqctl list_bindings

Example code

[image: exchange-sample]

This producer is very similar to the earlier ones. The difference is that it names an exchange instead of using the anonymous default exchange. Normally a routing_key has to be given when publishing to an exchange, but a fanout exchange ignores it (it broadcasts to all queues, so no key is needed for routing).

#!/usr/bin/env python
# emit_log.py
import pika
import sys

connection = pika.BlockingConnection(pika.ConnectionParameters(
             host='localhost'))
channel = connection.channel()

channel.exchange_declare(exchange='logs',
                         exchange_type='fanout')

message = ' '.join(sys.argv[1:]) or "info: Hello World!"
channel.basic_publish(exchange='logs',
                      routing_key='',
                      body=message)
print " [x] Sent %r" % (message,)
connection.close()

The receiver is as follows.

The messages will be lost if no queue is bound to the exchange yet, but that's okay for us; if no consumer is listening yet we can safely discard the message.

#!/usr/bin/env python
# receive_logs.py
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(
                                     host='localhost'))
channel = connection.channel()

channel.exchange_declare(exchange='logs',
                         exchange_type='fanout')

result = channel.queue_declare(exclusive=True)
queue_name = result.method.queue

channel.queue_bind(exchange='logs',
                   queue=queue_name)

print ' [*] Waiting for logs. To exit press CTRL+C'

def callback(ch, method, properties, body):
    print " [x] %r" % (body,)

channel.basic_consume(callback,
                      queue=queue_name,
                      no_ack=True)

channel.start_consuming()

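As shown below, you can start two consumers, which creates two anonymous queues. Both queues are bound to the "logs" exchange. emit_log sends messages to the "logs" exchange, which broadcasts them to the two anonymous queues. Here RabbitMQ acts as a broadcast relay. Each receiver has its own anonymous queue, and once a receiver stops, its queue is destroyed. If no receiver is running, no anonymous queue exists and the exchange is not bound to any queue, so incoming messages are simply discarded because there is no queue to hold them.
If you need broadcasting but cannot tolerate losing messages like this, declare named queues instead.

Start two consumers in two terminals:

$ python receive_logs.py > logs_from_rabbit.log
$ python receive_logs.py

Start a producer:

$ python emit_log.py 'message'

Inspect the bindings:

$ rabbitmqctl list_bindings
Listing bindings ...
logs    exchange        amq.gen-JzTY20BRgKO-HjmUJj0wLg  queue           []
logs    exchange        amq.gen-vso0PVvyiRIL2WoV3i48Yg  queue           []
...done.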

The interpretation of the result is straightforward: data from exchange logs goes to two queues with server-assigned names.

Routing

This example builds on the previous one (publish/subscribe).

The previous example (the publish/subscribe section) was an unfiltered broadcast. This time, instead of simply broadcasting, we handle certain messages selectively.

Bindings

A binding is a relationship between an exchange and a queue. In other words, the queue is interested in messages from this exchange.

channel.queue_bind(exchange=exchange_name, queue=queue_name)

A binding can also take a routing_key parameter. Its meaning depends on the exchange type. For a fanout exchange it is ignored (a broadcast needs no key for routing).

channel.queue_bind(exchange=exchange_name,
                   queue=queue_name,
                   routing_key='black')

Direct exchange

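The logging system in the previous example was just a pass-through that broadcast every message. Now let's improve it by filtering the logs. For example, we may want a receiver to accept only error logs. A fanout exchange clearly cannot do this: it can only broadcast indiscriminately.

This time we use a direct exchange. Its routing algorithm is very simple: a message goes to the queues whose binding key exactly matches the routing key of the message.

For example, see the figure below:

[image: direct-exchange]

In the figure above, exchange X has two queues bound to it (Q1 and Q2). Q1 is bound with the key 'orange'. Q2 has two bindings, with the keys 'black' and 'green'. So messages with the routing key 'orange' go to Q1, and messages with the routing key 'black' or 'green' go to Q2. Messages whose routing key matches no binding are discarded.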
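The Q1/Q2 setup above can be modelled in a few lines. This is a toy sketch of exact-match routing, not RabbitMQ's implementation; the class DirectExchange is a hypothetical name and queues are plain lists.

```python
from collections import defaultdict


class DirectExchange(object):
    """Toy direct exchange: route on an exact routing key match."""

    def __init__(self):
        self.bindings = defaultdict(list)  # binding key -> bound queues

    def bind(self, queue, routing_key):
        self.bindings[routing_key].append(queue)

    def publish(self, routing_key, body):
        queues = self.bindings.get(routing_key, [])
        for queue in queues:
            queue.append(body)
        return len(queues)  # 0 means the message was discarded


x = DirectExchange()
q1, q2 = [], []
x.bind(q1, "orange")
x.bind(q2, "black")
x.bind(q2, "green")
x.publish("orange", "m1")
x.publish("black", "m2")
x.publish("green", "m3")
unrouted = x.publish("purple", "m4")
```

Binding a second queue with an existing key would simply add it to the same list, so both queues would receive a copy.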

Multiple bindings

Of course, several queues can be bound to an exchange with the same routing key. The figure makes this easy to see.

[image: direct-exchange-multiple]

Emitting logs

In this example we use the log severity (info, warning, error) as the routing key. The receiving script can then choose which messages to receive by severity.

First, let's see how to send messages.

The first step is, of course, to create an exchange (giving its name and type):

channel.exchange_declare(exchange='direct_logs',
                         type='direct')

Then publish the message (severity is one of 'info', 'warning', 'error'):

channel.basic_publish(exchange='direct_logs',
                      routing_key=severity,
                      body=message)

Subscribing

Receiving works as before, except that the exchange and the routing key have to be given when binding.

result = channel.queue_declare(exclusive=True)  # again a randomly named queue
queue_name = result.method.queue

for severity in severities:
    channel.queue_bind(exchange='direct_logs',
                       queue=queue_name,
                       routing_key=severity)

Sample

The code for emit_log_direct.py:

#!/usr/bin/env python
import pika
import sys

connection = pika.BlockingConnection(pika.ConnectionParameters(
        host='localhost'))
channel = connection.channel()

channel.exchange_declare(exchange='direct_logs',
                         exchange_type='direct')

severity = sys.argv[1] if len(sys.argv) > 1 else 'info'
message = ' '.join(sys.argv[2:]) or 'Hello World!'
channel.basic_publish(exchange='direct_logs',
                      routing_key=severity,
                      body=message)
print " [x] Sent %r:%r" % (severity, message)
connection.close()

The code for receive_logs_direct.py:

#!/usr/bin/env python
import pika
import sys

connection = pika.BlockingConnection(pika.ConnectionParameters(
        host='localhost'))
channel = connection.channel()

channel.exchange_declare(exchange='direct_logs',
                         exchange_type='direct')

result = channel.queue_declare(exclusive=True)
queue_name = result.method.queue

severities = sys.argv[1:]
if not severities:
    print >> sys.stderr, "Usage: %s [info] [warning] [error]" % \
                         (sys.argv[0],)
    sys.exit(1)

# The loop creates several bindings on the same exchange ('direct_logs'),
# one per entry in severities, each with that severity as its routing_key
for severity in severities:
    channel.queue_bind(exchange='direct_logs',
                       queue=queue_name,
                       routing_key=severity)

print ' [*] Waiting for logs. To exit press CTRL+C'

def callback(ch, method, properties, body):
    print " [x] %r:%r" % (method.routing_key, body,)

channel.basic_consume(callback,
                      queue=queue_name,
                      no_ack=True)

channel.start_consuming()

Receive warning and error logs:

$ python receive_logs_direct.py warning error > logs_from_rabbit.log

Receive all three severities:

$ python receive_logs_direct.py info warning error

Send an error log:

$ python emit_log_direct.py error "Run. Run. Or it will explode."

Topic

Earlier we covered fanout exchanges, which broadcast, and direct exchanges, which deliver a message to the queues whose binding key exactly matches its routing key. This is still limited: routing cannot be based on multiple criteria.

Let's extend the previous example by routing messages on both severity and source.
Compare syslog, which routes logs based on both severity (info/warn/crit and so on) and facility (auth/cron/kern and so on).

To upgrade our logging system to route this way, we need a topic exchange.

The topic exchange itself is easy to understand; the tricky part is how its keys are matched. So further down I have added material written by someone else, and translated relatively little of the original.

Introduction

The routing_key of a message sent to a topic exchange cannot be an arbitrary word; it must be a list of words joined by dots.

I did not bother translating what follows.

Messages sent to a topic exchange can't have an arbitrary routing_key - it must be a list of words, delimited by dots. The words can be anything, but usually they specify some features connected to the message. A few valid routing key examples: "stock.usd.nyse", "nyse.vmw", "quick.orange.rabbit". There can be as many words in the routing key as you like, up to the limit of 255 bytes.
The binding key must also be in the same form. The logic behind the topic exchange is similar to a direct one - a message sent with a particular routing key will be delivered to all the queues that are bound with a matching binding key. However there are two important special cases for binding keys: * (star) can substitute for exactly one word; # (hash) can substitute for zero or more words.

See the figure:

[image: topics]

bindings

In this example, we're going to send messages which all describe animals. The messages will be sent with a routing key that consists of three words (two dots). The first word in the routing key will describe a celerity, second a colour and third a species: "<celerity>.<colour>.<species>".

Create three bindings:

queue Q1 is bound with the binding key *.orange.*

queue Q2 is bound with the binding keys *.*.rabbit and lazy.#

It is easy to see which queues keys such as quick.orange.rabbit, lazy.orange.elephant and lazy.brown.fox go to.
Keys such as orange or quick.orange.male.rabbit match no binding in this example, so those messages are discarded.

lazy.orange.male.rabbit matches lazy.# and therefore goes to Q2.
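To check the cases above by hand, here is a small matcher written from the two wildcard rules (* is exactly one word, # is zero or more words). It is my own sketch, not RabbitMQ's implementation, and it does not try to reproduce the broker's handling of edge cases such as empty words; topic_matches and queues_for are made-up helper names.

```python
def topic_matches(binding_key, routing_key):
    """Return True if routing_key matches binding_key under the topic rules."""
    pattern = binding_key.split(".")
    words = routing_key.split(".")

    def match(p, w):
        if p == len(pattern):
            return w == len(words)
        if pattern[p] == "#":
            # '#' absorbs zero words, or one word and then tries again
            return match(p + 1, w) or (w < len(words) and match(p, w + 1))
        if w == len(words):
            return False
        if pattern[p] == "*" or pattern[p] == words[w]:
            return match(p + 1, w + 1)
        return False

    return match(0, 0)


def queues_for(routing_key):
    """Queues from the example above that would receive routing_key."""
    bindings = {"Q1": ["*.orange.*"], "Q2": ["*.*.rabbit", "lazy.#"]}
    return sorted(name for name, keys in bindings.items()
                  if any(topic_matches(key, routing_key) for key in keys))


for key in ["quick.orange.rabbit", "lazy.brown.fox", "quick.orange.male.rabbit"]:
    print(key, queues_for(key))
```

A message is delivered at most once per queue even if several of that queue's bindings match, which is why queues_for returns each queue name only once.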

Topic exchange

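With its two wildcards, a topic exchange can do the work of both a fanout and a direct exchange.

If a queue is bound with the key #, it receives every message regardless of the routing key, like fanout.

If the binding keys use neither * nor #, the topic exchange behaves just like a direct one.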

Putting it all together

The example follows. emit_log_topic.py:

#!/usr/bin/env python
import pika
import sys

connection = pika.BlockingConnection(pika.ConnectionParameters(
        host='localhost'))
channel = connection.channel()

channel.exchange_declare(exchange='topic_logs',
                         type='topic')

routing_key = sys.argv[1] if len(sys.argv) > 1 else 'anonymous.info'
message = ' '.join(sys.argv[2:]) or 'Hello World!'
channel.basic_publish(exchange='topic_logs',
                      routing_key=routing_key,
                      body=message)
print " [x] Sent %r:%r" % (routing_key, message)
connection.close()

receive_logs_topic.py

#!/usr/bin/env python
import pika
import sys

connection = pika.BlockingConnection(pika.ConnectionParameters(
        host='localhost'))
channel = connection.channel()

channel.exchange_declare(exchange='topic_logs',
                         type='topic')

result = channel.queue_declare(exclusive=True)
queue_name = result.method.queue

binding_keys = sys.argv[1:]
if not binding_keys:
    print >> sys.stderr, "Usage: %s [binding_key]..." % (sys.argv[0],)
    sys.exit(1)

for binding_key in binding_keys:
    channel.queue_bind(exchange='topic_logs',
                       queue=queue_name,
                       routing_key=binding_key)

print ' [*] Waiting for logs. To exit press CTRL+C'

def callback(ch, method, properties, body):
    print " [x] %r:%r" % (method.routing_key, body,)

channel.basic_consume(callback,
                      queue=queue_name,
                      no_ack=True)

channel.start_consuming()

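Receive all logs:

python receive_logs_topic.py "#"

Receive all logs from the kern facility:

python receive_logs_topic.py "kern.*"

Receive all critical logs:

python receive_logs_topic.py "*.critical"

Several bindings can of course be used at once:

python receive_logs_topic.py "kern.*" "*.critical"

Send a message with the routing key "kern.critical":

python emit_log_topic.py "kern.critical" "A critical kernel error"

The official programs above take the keys on the command line, so you can experiment yourself.

Topic key matching can seem baffling. Because it is not regular-expression matching (why not regex?), it takes a little effort to understand.

Below is an excerpt from someone else's write-up; the original is here.

A topic exchange handles keys as follows:

  1. . (dot) splits the routing key into parts (understanding what a part is is the key point)
  2. * (star) matches exactly one whole part
  3. # (hash) matches zero or more parts

Examples: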

Eshell V5.9  (abort with ^G)
1> rabbit_exchange_type_topic:topic_matches(<<"a.#">>,<<"a.b">>).
true
2> rabbit_exchange_type_topic:topic_matches(<<"a.#">>,<<"a.bc">>).
true
3> rabbit_exchange_type_topic:topic_matches(<<"a.#">>,<<"a.bc.bc">>).
true
4> rabbit_exchange_type_topic:topic_matches(<<"a.#">>,<<"a1.b">>).
false
5> rabbit_exchange_type_topic:topic_matches(<<"b.a.#">>,<<"a1.b">>).
false
6> rabbit_exchange_type_topic:topic_matches(<<"b.a.#">>,<<"a.b">>).
false
7> rabbit_exchange_type_topic:topic_matches(<<"a.*">>,<<"a.b">>).
true
8> rabbit_exchange_type_topic:topic_matches(<<"a.*">>,<<"a.bc">>).
true
9> rabbit_exchange_type_topic:topic_matches(<<"a.a*">>,<<"a.bc">>).
false
10> rabbit_exchange_type_topic:topic_matches(<<"a.a*">>,<<"a.ac">>).
false
11> rabbit_exchange_type_topic:topic_matches(<<"a.a#">>,<<"a.ac">>).
false
12> rabbit_exchange_type_topic:topic_matches(<<"a.*">>,<<"a.bc.a">>).
false
13> rabbit_exchange_type_topic:topic_matches(<<"a.*.*">>,<<"a.bc.a">>).
true
14> rabbit_exchange_type_topic:topic_matches(<<"a.b*">>,<<"a.bc">>).
false
15> rabbit_exchange_type_topic:topic_matches(<<"a.*.*">>,<<"a.b*">>).
false
16> rabbit_exchange_type_topic:topic_matches(<<"a.*">>,<<"a.b*">>).
true
17> rabbit_exchange_type_topic:topic_matches(<<"a.b*">>,<<"a.b*">>).
true
18> rabbit_exchange_type_topic:topic_matches(<<"*.a">>,<<"a.a">>).
true
19> rabbit_exchange_type_topic:topic_matches(<<"*.a">>,<<"a.a.b">>).
false
20> rabbit_exchange_type_topic:topic_matches(<<"*.a.b">>,<<"a.a">>).
false
21> rabbit_exchange_type_topic:topic_matches(<<"#.a">>,<<"a.a.b">>).
false
22> rabbit_exchange_type_topic:topic_matches(<<"#.a">>,<<"a.a">>).
true
23> rabbit_exchange_type_topic:topic_matches(<<"#.a">>,<<"a.a.a">>).
true
24>
24> rabbit_exchange_type_topic:topic_matches(<<"a.*.a">>,<<"a.a.a">>).
true
25> rabbit_exchange_type_topic:topic_matches(<<"a.*a.a">>,<<"a.aa.a">>).
false
26>
26> rabbit_exchange_type_topic:topic_matches(<<"*">>,<<"a.aa.a">>).
false
27> rabbit_exchange_type_topic:topic_matches(<<"*">>,<<"a">>).
true
28> rabbit_exchange_type_topic:topic_matches(<<"a.*.#">>,<<"a.b">>).
true
29> rabbit_exchange_type_topic:topic_matches(<<"a.*.#">>,<<"a.b.c">>).
true
30> rabbit_exchange_type_topic:topic_matches(<<"*.#">>,<<"a.b.c">>).
true
31> rabbit_exchange_type_topic:topic_matches(<<"*.#">>,<<"a.b.c">>).
true
32> rabbit_exchange_type_topic:topic_matches(<<"*">>,<<"">>).
false
33> rabbit_exchange_type_topic:topic_matches(<<"#.*">>,<<"..">>).
false
34> rabbit_exchange_type_topic:topic_matches(<<"a.*.#">>,<<"a.#">>).
true

RPC

This time we tackle Remote Procedure Call, or RPC: running a function on a remote machine and getting its result back.

We will use RabbitMQ to build a simple RPC service that just returns Fibonacci numbers.

Client interface

To illustrate how an RPC service could be used we're going to create a simple client class. It's going to expose a method named call which sends an RPC request and blocks until the answer is received:

fibonacci_rpc = FibonacciRpcClient()
result = fibonacci_rpc.call(4)
print "fib(4) is %r" % (result,)

about RPC

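The problem with RPC is that the caller may not know whether a function call is local or a slow RPC. This leads to unpredictable behaviour in the system and is very hard to debug. Misused, RPC does not simplify a program; it makes the code harder to maintain.

So here are some recommendations for using RPC:

  1. Make it obvious which function calls are local and which are remote.
  2. Document the system and make the dependencies between components clear.
  3. Handle error cases: how should the client react when the RPC server is down?

When in doubt about RPC, use an asynchronous pipeline instead. In such a pipeline, results are pushed asynchronously to the next computation stage.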

Callback queue

In general doing RPC over RabbitMQ is easy. A client sends a request message and a server replies with a response message. In order to receive a response the client needs to send a 'callback' queue address with the request. Let's try it:

result = channel.queue_declare(exclusive=True)
callback_queue = result.method.queue

channel.basic_publish(exchange='',
                      routing_key='rpc_queue',
                      properties=pika.BasicProperties(
                            reply_to = callback_queue,
                      ),
                      body=request)

Message properties

The AMQP protocol predefines 14 message properties. Most of them are rarely used; the commonly used ones are:

  • delivery_mode: marks a message as persistent (value 2) or transient. (See the work queue example for its use.)
  • content_type: the mime-type of the encoding (much like the Content-Type HTTP header). For example, JSON-encoded data can be declared as application/json.
  • reply_to: commonly used to name a callback queue.
  • correlation_id: used to correlate RPC responses with requests.

Correlation id

In the method presented above we suggest creating a callback queue for every RPC request. That's pretty inefficient, but fortunately there is a better way - let's create a single callback queue per client.
That raises a new issue: having received a response in that queue, it's not clear to which request the response belongs. That's when the correlation_id property is used. We're going to set it to a unique value for every request. Later, when we receive a message in the callback queue we'll look at this property, and based on that we'll be able to match a response with a request. If we see an unknown correlation_id value, we may safely discard the message, since it doesn't belong to our requests.

You may ask, why should we ignore unknown messages in the callback queue, rather than failing with an error? It's due to a possibility of a race condition on the server side. Although unlikely, it is possible that the RPC server will die just after sending us the answer, but before sending an acknowledgment message for the request. If that happens, the restarted RPC server will process the request again. That's why on the client we must handle the duplicate responses gracefully, and the RPC should ideally be idempotent.
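The matching rule can be sketched on the client side as a small table of pending requests. This is an illustrative helper of my own (PendingReplies is a hypothetical name, and no broker is involved); it shows unknown ids being discarded and duplicate replies being ignored.

```python
import uuid


class PendingReplies(object):
    """Hypothetical helper matching replies on one callback queue to requests."""

    def __init__(self):
        self.pending = {}  # correlation_id -> result (None until the reply arrives)

    def new_request(self):
        corr_id = str(uuid.uuid4())
        self.pending[corr_id] = None
        return corr_id

    def on_reply(self, correlation_id, body):
        if correlation_id not in self.pending:
            return False  # unknown id: discard quietly instead of failing
        if self.pending[correlation_id] is None:
            self.pending[correlation_id] = body  # a duplicate reply is ignored
        return True


replies = PendingReplies()
first_id = replies.new_request()
second_id = replies.new_request()
replies.on_reply(second_id, "5")   # replies may arrive out of order
replies.on_reply(first_id, "3")
replies.on_reply(first_id, "3")    # duplicate, e.g. after a server restart
stray = replies.on_reply("not-ours", "42")
```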

Sample

[image: rpc]

Our RPC will work like this:

  • When the Client starts up, it creates an anonymous exclusive callback queue.
  • For an RPC request, the Client sends a message with two properties: reply_to, which is set to the callback queue and correlation_id, which is set to a unique value for every request.
  • The request is sent to an rpc_queue queue.
  • The RPC worker (aka: server) is waiting for requests on that queue. When a request appears, it does the job and sends a message with the result back to the Client, using the queue from the reply_to field.
  • The client waits for data on the callback queue. When a message appears, it checks the correlation_id property. If it matches the value from the request it returns the response to the application.
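The steps above can be walked through in a single process, with dictionaries and deques standing in for the broker. This is only a sketch of the message flow: the queue name amq.gen-example is made up, messages are plain dicts, and none of the function names come from pika.

```python
import uuid
from collections import deque

queues = {"rpc_queue": deque()}  # queue name -> messages; stands in for the broker


def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)


def client_send(n, callback_queue):
    """Publish a request carrying reply_to and a fresh correlation_id."""
    corr_id = str(uuid.uuid4())
    queues["rpc_queue"].append(
        {"reply_to": callback_queue, "correlation_id": corr_id, "body": str(n)})
    return corr_id


def server_handle_one():
    """Take one request and reply to the queue named in its reply_to."""
    request = queues["rpc_queue"].popleft()
    response = {"correlation_id": request["correlation_id"],
                "body": str(fib(int(request["body"])))}
    queues[request["reply_to"]].append(response)


def client_receive(callback_queue, corr_id):
    """Return the reply whose correlation_id matches, skipping any others."""
    while queues[callback_queue]:
        message = queues[callback_queue].popleft()
        if message["correlation_id"] == corr_id:
            return int(message["body"])
    return None


callback_queue = "amq.gen-example"  # made-up name for the exclusive callback queue
queues[callback_queue] = deque()
corr_id = client_send(10, callback_queue)
server_handle_one()
answer = client_receive(callback_queue, corr_id)
print(answer)
```

In the real programs the two sides run in separate processes and the broker does the queueing, but the properties carried by each message are the same.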

Putting it all together

rpc_server.py

#!/usr/bin/env python
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(
                                            host='localhost'))
channel = connection.channel()
channel.queue_declare(queue='rpc_queue')

# obviously a deliberately inefficient fib function, for testing only
def fib(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib(n-1) + fib(n-2)

def on_request(ch, method, props, body):
    n = int(body)

    print " [.] fib(%s)"  % (n,)
    response = fib(n)

    # The empty string selects the default exchange, which delivers
    # the message to the queue named by routing_key, if it exists.
    # The server takes a queue name from reply_to in the request and
    # uses the default exchange to send the result to that temporary queue,
    # which returns the result to the requester.
    ch.basic_publish(exchange='',
                     routing_key=props.reply_to,
                     properties=pika.BasicProperties(correlation_id = \
                                                     props.correlation_id),
                     body=str(response))
    # ack the request so that RabbitMQ does not redeliver it
    ch.basic_ack(delivery_tag = method.delivery_tag)

channel.basic_qos(prefetch_count=1)
# The logic is simple:
# when the server receives a message it calls on_request, which reads
# the number from the message and processes it (calls fib for the result).
channel.basic_consume(on_request, queue='rpc_queue')

print " [x] Awaiting RPC requests"
channel.start_consuming()

rpc_client.py:

#!/usr/bin/env python
# coding:utf8
import pika
import uuid

class FibonacciRpcClient(object):
    def __init__(self):
        self.connection = pika.BlockingConnection(pika.ConnectionParameters(
                host='localhost'))
        self.channel = self.connection.channel()

        result_queue = self.channel.queue_declare(exclusive=True)
        self.callback_queue = result_queue.method.queue

        self.channel.basic_consume(self.on_response, no_ack=True,
                                   queue=self.callback_queue)

    def on_response(self, ch, method, props, body):
        if self.corr_id == props.correlation_id:
            self.response = body

    # this is the so-called RPC method
    def call(self, n):
        self.response = None
        self.corr_id = str(uuid.uuid4())
        # The client sends a request (which is just a message),
        # attaching the reply_to and correlation_id properties to it
        self.channel.basic_publish(exchange='',
                                   routing_key='rpc_queue',
                                   properties=pika.BasicProperties(
                                         reply_to = self.callback_queue,
                                         correlation_id = self.corr_id,
                                         ),
                                   body=str(n))
        while self.response is None:
            self.connection.process_data_events()
        return int(self.response)

def main():
    fibonacci_rpc = FibonacciRpcClient()
    print " [x] Requesting fib(30)"
    response = fibonacci_rpc.call(30)
    print " [.] Got %r" % (response,)

if __name__ == '__main__':
    main()

Our RPC service is now ready. We can start the server:

$ python rpc_server.py

To request a fibonacci number run the client:

$ python rpc_client.py

The presented design is not the only possible implementation of an RPC service, but it has some important advantages:

  • If the RPC server is too slow, you can scale up by just running another one. Try running a second rpc_server.py in a new console.
  • On the client side, the RPC requires sending and receiving only one message. No synchronous calls like queue_declare are required. As a result the RPC client needs only one network round trip for a single RPC request.

Summary

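Here is my understanding and a detailed description of this example.

When the server starts, it declares a queue named rpc_queue and consumes from it. The messages the server receives on rpc_queue are the requests sent by clients, and the handler the server registers for those messages, on_request, is the so-called RPC method. When on_request handles a message (a request) from rpc_queue, it reads two properties from it: reply_to and correlation_id. correlation_id is used for matching, and reply_to is the routing_key used when sending the result. After computing the result, on_request wraps it in a message and attaches the correlation_id sent by the client (when the client receives the result message, this property lets it match the result to one of its requests). It then publishes the result message to the queue on which the client is waiting, which is named by the reply_to property of the client's message. When publishing the result the server passes an empty exchange parameter, so the default exchange is used, and the default exchange tries to deliver the message to the queue with the same name as the routing_key. Since the server obtained a queue name from reply_to in the client's request, setting routing_key to the reply_to value delivers the result to the queue where the client is waiting.

The client program implements a very basic class (FibonacciRpcClient) that wraps a method call. (As a result, you cannot tell from main alone, without reading the source, whether the method is local or remote. This deserves careful thought and calls for good naming conventions and project documentation.) result_queue is the anonymous queue declared by the client (a temporary, exclusive, anonymous queue ensures the client receives the result messages meant for it), and the client consumes from it to obtain the results. The method registered on result_queue handles the results of the client's requests; in this implementation it simply verifies the result by checking the correlation_id mentioned above. The FibonacciRpcClient object issues a request through its call method (for RPC, issuing a request essentially means sending a message to the handler, with the isolation of sending and receiving providing the decoupling), and attaches the reply_to and correlation_id properties to the request message. After sending the request the client waits, and the server should eventually deliver the result to the anonymous queue named in the request. Once the client has verified the correlation_id of the result, the result is confirmed and the RPC is complete.

Implementing RPC with RabbitMQ preserves information hiding between client and server: the client depends not on a particular server but on a particular message. When there are several equivalent servers, whether any one of them is healthy does not affect the client.

To sum up, implementing RPC with RabbitMQ also brings the following benefits:

  1. Fault tolerance: one server crashing does not affect the client.
  2. Decoupling from any particular communication protocol or interface; everything goes through AMQP messages.
  3. Load balancing across multiple RPC servers is handled by RabbitMQ.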
