ZMQ Source Code Analysis (6): Encoders and Decoders
Source: Internet. Editor: 程序博客网. Time: 2024/06/05 06:23
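zmq's encoders and decoders work together with stream_engine to send and receive network data. ZMTP 3.0 uses v2_decoder and v2_encoder for this, so this article analyzes only that version.
Decoder
In zmq, both the v1 and v2 decoders inherit from decoder_base_t, while raw_decoder inherits directly from i_decoder: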
    template <typename T> class decoder_base_t : public i_decoder
    {
    public:

        inline decoder_base_t (size_t bufsize_) :
            next (NULL),
            read_pos (NULL),
            to_read (0),
            bufsize (bufsize_)
        {
            buf = (unsigned char*) malloc (bufsize_);
            alloc_assert (buf);
        }

        //  The destructor doesn't have to be virtual. It is made virtual
        //  just to keep ICC and code checking tools from complaining.
        inline virtual ~decoder_base_t ()
        {
            free (buf);
        }

        //  Returns a buffer to be filled with binary data.
        inline void get_buffer (unsigned char **data_, size_t *size_)
        {
            //  If we are expected to read large message, we'll opt for zero-
            //  copy, i.e. we'll ask caller to fill the data directly to the
            //  message. Note that subsequent read(s) are non-blocking, thus
            //  each single read reads at most SO_RCVBUF bytes at once not
            //  depending on how large is the chunk returned from here.
            //  As a consequence, large messages being received won't block
            //  other engines running in the same I/O thread for excessive
            //  amounts of time.
            if (to_read >= bufsize) {
                *data_ = read_pos;
                *size_ = to_read;
                return;
            }

            *data_ = buf;
            *size_ = bufsize;
        }

        //  Processes the data in the buffer previously allocated using
        //  get_buffer function. size_ argument specifies number of bytes
        //  actually filled into the buffer. Function returns 1 when the
        //  whole message was decoded or 0 when more data is required.
        //  On error, -1 is returned and errno set accordingly.
        //  Number of bytes processed is returned in bytes_used_.
        inline int decode (const unsigned char *data_, size_t size_,
                           size_t &bytes_used_)
        {
            bytes_used_ = 0;

            //  In case of zero-copy simply adjust the pointers, no copying
            //  is required. Also, run the state machine in case all the data
            //  were processed.
            if (data_ == read_pos) {
                zmq_assert (size_ <= to_read);
                read_pos += size_;
                to_read -= size_;
                bytes_used_ = size_;

                while (!to_read) {
                    const int rc = (static_cast <T*> (this)->*next) ();
                    if (rc != 0)
                        return rc;
                }
                return 0;
            }

            while (bytes_used_ < size_) {
                //  Copy the data from buffer to the message.
                const size_t to_copy = std::min (to_read, size_ - bytes_used_);
                memcpy (read_pos, data_ + bytes_used_, to_copy);
                read_pos += to_copy;
                to_read -= to_copy;
                bytes_used_ += to_copy;
                //  Try to get more space in the message to fill in.
                //  If none is available, return.
                while (to_read == 0) {
                    const int rc = (static_cast <T*> (this)->*next) ();
                    if (rc != 0)
                        return rc;
                }
            }

            return 0;
        }

    protected:

        //  Prototype of state machine action. Action should return false if
        //  it is unable to push the data to the system.
        typedef int (T::*step_t) ();

        //  This function should be called from derived class to read data
        //  from the buffer and schedule next state machine action.
        inline void next_step (void *read_pos_, size_t to_read_, step_t next_)
        {
            read_pos = (unsigned char*) read_pos_;
            to_read = to_read_;
            next = next_;
        }

    private:

        //  Next step. If set to NULL, it means that associated data stream
        //  is dead. Note that there can be still data in the process in such
        //  case.
        step_t next;

        //  Where to store the read data.
        unsigned char *read_pos;

        //  How much data to read before taking next step.
        size_t to_read;

        //  The buffer for data to decode.
        size_t bufsize;
        unsigned char *buf;

        decoder_base_t (const decoder_base_t&);
        const decoder_base_t &operator = (const decoder_base_t&);
    };
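The decoder's next member is a pointer to the current state-machine step. Each step resets read_pos and to_read (through next_step), which say where the next piece of data should be stored and how many bytes are needed. get_buffer returns a buffer that the caller can read data into, together with its size. For small data it returns the decoder's own buffer buf, whose size is bufsize. For large data (to_read >= bufsize) it returns read_pos, the destination set by the last step, so data is read straight into the message and one copy is avoided. decode handles the same two cases: if the internal buffer was not used (data_ == read_pos), it only advances the pointers; otherwise it copies the data from the buffer to read_pos. Whenever to_read reaches 0, all data for the current state has arrived, so next is called to move to the following state and reset read_pos and to_read.
Now let's look at the implementation of v2_decoder_t: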
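The state-machine pattern described above can be sketched in isolation. The following is a simplified stand-in, not libzmq code: mini_decoder_base_t drops the internal buffer and the zero-copy path, and length_prefixed_decoder_t is a hypothetical derived class that uses a made-up framing (one length byte followed by the payload) purely for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

//  Simplified stand-in for decoder_base_t (assumption: copy path only,
//  no internal buffer, no zero-copy).
template <typename T> class mini_decoder_base_t
{
  public:
    mini_decoder_base_t () : next (nullptr), read_pos (nullptr), to_read (0) {}

    //  Returns 1 when a whole unit was decoded, 0 when more data is needed.
    int decode (const unsigned char *data_, std::size_t size_,
                std::size_t &bytes_used_)
    {
        bytes_used_ = 0;
        while (bytes_used_ < size_) {
            //  Copy as much as the current state still wants.
            const std::size_t to_copy =
              std::min (to_read, size_ - bytes_used_);
            std::memcpy (read_pos, data_ + bytes_used_, to_copy);
            read_pos += to_copy;
            to_read -= to_copy;
            bytes_used_ += to_copy;
            //  Current state is satisfied: run steps until one needs data.
            while (to_read == 0) {
                const int rc = (static_cast<T *> (this)->*next) ();
                if (rc != 0)
                    return rc;
            }
        }
        return 0;
    }

  protected:
    typedef int (T::*step_t) ();

    //  Schedules where the next bytes go, how many, and the next step.
    void next_step (void *read_pos_, std::size_t to_read_, step_t next_)
    {
        read_pos = static_cast<unsigned char *> (read_pos_);
        to_read = to_read_;
        next = next_;
    }

  private:
    step_t next;
    unsigned char *read_pos;
    std::size_t to_read;
};

//  Hypothetical framing: one length byte, then that many payload bytes.
class length_prefixed_decoder_t
    : public mini_decoder_base_t<length_prefixed_decoder_t>
{
  public:
    length_prefixed_decoder_t () : last_size (0), len (0)
    {
        next_step (&len, 1, &length_prefixed_decoder_t::length_ready);
    }

    unsigned char payload[255];
    std::size_t last_size; //  size of the most recently completed payload

  private:
    int length_ready ()
    {
        next_step (payload, len, &length_prefixed_decoder_t::payload_ready);
        return 0;
    }

    int payload_ready ()
    {
        last_size = len;
        next_step (&len, 1, &length_prefixed_decoder_t::length_ready);
        return 1;
    }

    unsigned char len;
};
```

Feeding the bytes {3, 'a'} and then {'b', 'c', ...} to decode returns 0 for the first call and 1 for the second, which shows how a message split across two reads is reassembled without the caller tracking any state.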
    //  Decoder for ZMTP/2.x framing protocol. Converts data stream into messages.
    class v2_decoder_t : public decoder_base_t <v2_decoder_t>
    {
    public:

        v2_decoder_t (size_t bufsize_, int64_t maxmsgsize_);
        virtual ~v2_decoder_t ();

        //  i_decoder interface.
        virtual msg_t *msg () { return &in_progress; }

    private:

        int flags_ready ();
        int one_byte_size_ready ();
        int eight_byte_size_ready ();
        int message_ready ();

        unsigned char tmpbuf [8];
        unsigned char msg_flags;
        msg_t in_progress;

        const int64_t maxmsgsize;

        v2_decoder_t (const v2_decoder_t&);
        void operator = (const v2_decoder_t&);
    };
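v2_decoder_t has four state-machine methods, one per state, plus an 8-byte scratch buffer (tmpbuf). in_progress is the message the decoder is currently working on; every msg the decoder parses is stored there. maxmsgsize is the maximum allowed message size (as the code shows, a negative value disables the check). The transitions between the four states are as follows: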
    zmq::v2_decoder_t::v2_decoder_t (size_t bufsize_, int64_t maxmsgsize_) :
        decoder_base_t <v2_decoder_t> (bufsize_),
        msg_flags (0),
        maxmsgsize (maxmsgsize_)
    {
        int rc = in_progress.init ();
        errno_assert (rc == 0);

        //  At the beginning, read one byte and go to flags_ready state.
        next_step (tmpbuf, 1, &v2_decoder_t::flags_ready);
    }

    zmq::v2_decoder_t::~v2_decoder_t ()
    {
        int rc = in_progress.close ();
        errno_assert (rc == 0);
    }

    int zmq::v2_decoder_t::flags_ready ()
    {
        msg_flags = 0;
        if (tmpbuf [0] & v2_protocol_t::more_flag)
            msg_flags |= msg_t::more;
        if (tmpbuf [0] & v2_protocol_t::command_flag)
            msg_flags |= msg_t::command;

        //  The payload length is either one or eight bytes,
        //  depending on whether the 'large' bit is set.
        if (tmpbuf [0] & v2_protocol_t::large_flag)
            next_step (tmpbuf, 8, &v2_decoder_t::eight_byte_size_ready);
        else
            next_step (tmpbuf, 1, &v2_decoder_t::one_byte_size_ready);

        return 0;
    }

    int zmq::v2_decoder_t::one_byte_size_ready ()
    {
        //  Message size must not exceed the maximum allowed size.
        if (maxmsgsize >= 0)
            if (unlikely (tmpbuf [0] > static_cast <uint64_t> (maxmsgsize))) {
                errno = EMSGSIZE;
                return -1;
            }

        //  in_progress is initialised at this point so in theory we should
        //  close it before calling zmq_msg_init_size, however, it's a 0-byte
        //  message and thus we can treat it as uninitialised...
        int rc = in_progress.init_size (tmpbuf [0]);
        if (unlikely (rc)) {
            errno_assert (errno == ENOMEM);
            rc = in_progress.init ();
            errno_assert (rc == 0);
            errno = ENOMEM;
            return -1;
        }

        in_progress.set_flags (msg_flags);
        next_step (in_progress.data (), in_progress.size (),
            &v2_decoder_t::message_ready);

        return 0;
    }

    int zmq::v2_decoder_t::eight_byte_size_ready ()
    {
        //  The payload size is encoded as 64-bit unsigned integer.
        //  The most significant byte comes first.
        const uint64_t msg_size = get_uint64 (tmpbuf);

        //  Message size must not exceed the maximum allowed size.
        if (maxmsgsize >= 0)
            if (unlikely (msg_size > static_cast <uint64_t> (maxmsgsize))) {
                errno = EMSGSIZE;
                return -1;
            }

        //  Message size must fit into size_t data type.
        if (unlikely (msg_size != static_cast <size_t> (msg_size))) {
            errno = EMSGSIZE;
            return -1;
        }

        //  in_progress is initialised at this point so in theory we should
        //  close it before calling init_size, however, it's a 0-byte
        //  message and thus we can treat it as uninitialised.
        int rc = in_progress.init_size (static_cast <size_t> (msg_size));
        if (unlikely (rc)) {
            errno_assert (errno == ENOMEM);
            rc = in_progress.init ();
            errno_assert (rc == 0);
            errno = ENOMEM;
            return -1;
        }

        in_progress.set_flags (msg_flags);
        next_step (in_progress.data (), in_progress.size (),
            &v2_decoder_t::message_ready);

        return 0;
    }

    int zmq::v2_decoder_t::message_ready ()
    {
        //  Message is completely read. Signal this to the caller
        //  and prepare to decode next message.
        next_step (tmpbuf, 1, &v2_decoder_t::flags_ready);
        return 1;
    }
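The constructor calls
    next_step (tmpbuf, 1, &v2_decoder_t::flags_ready)
which means that the next step is to read one byte into tmpbuf, and that the next state is the flags_ready method. flags_ready checks whether the frame is a long message: if so, the next eight bytes hold the message length; if not, the next single byte holds it. This is the frame format defined by ZMTP.
    if (tmpbuf [0] & v2_protocol_t::large_flag)
        next_step (tmpbuf, 8, &v2_decoder_t::eight_byte_size_ready);
    else
        next_step (tmpbuf, 1, &v2_decoder_t::one_byte_size_ready);
Taking a long message as the example: eight bytes of length are read into tmpbuf next, and then the decoder enters the eight_byte_size_ready state. At that point the message length is known, so in_progress is initialized with that size, and the next state is set with
    next_step (in_progress.data (), in_progress.size (), &v2_decoder_t::message_ready)
which means that as many bytes as the length just decoded are read into in_progress, and that the next state is message_ready. When message_ready is called, a complete msg has been processed. message_ready resets the state machine to its initial state so that the next msg can be read. It returns 1 to signal that a complete message has been read; the other states return 0 on success and -1 on error.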
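The header layout that flags_ready and the two size states walk through can be shown with a small standalone parser. This is only a sketch: parse_frame_header and frame_header_t are hypothetical names, and the flag values (more = 1, large = 2, command = 4) are assumed to match v2_protocol_t, which the article does not list.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

//  Flag bits of the first header byte (assumed to match v2_protocol_t).
enum : unsigned char
{
    more_flag = 1,
    large_flag = 2,
    command_flag = 4
};

//  Hypothetical result type for this sketch.
struct frame_header_t
{
    bool more;
    bool command;
    uint64_t body_size;
    std::size_t header_size; //  2 for short frames, 9 for long frames
};

//  Returns true and fills out_ if buf_ holds a complete frame header,
//  false if more bytes are needed.
bool parse_frame_header (const unsigned char *buf_, std::size_t len_,
                         frame_header_t &out_)
{
    assert (buf_ != nullptr);
    if (len_ < 2)
        return false;

    const unsigned char flags = buf_[0];
    out_.more = (flags & more_flag) != 0;
    out_.command = (flags & command_flag) != 0;

    if (flags & large_flag) {
        //  Long frame: 8-byte size, most significant byte first.
        if (len_ < 9)
            return false;
        uint64_t size = 0;
        for (int i = 0; i < 8; ++i)
            size = (size << 8) | buf_[1 + i];
        out_.body_size = size;
        out_.header_size = 9;
    } else {
        //  Short frame: 1-byte size.
        out_.body_size = buf_[1];
        out_.header_size = 2;
    }
    return true;
}
```

Unlike v2_decoder_t, this helper needs the whole header in one contiguous buffer; the state machine in the article exists precisely so that the header can arrive one byte at a time.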
v2_decoder_t is used mainly in the in_event method of stream_engine:
    void zmq::stream_engine_t::in_event ()
    {
        zmq_assert (!io_error);

        //  If still handshaking, receive and process the greeting message.
        if (unlikely (handshaking))
            if (!handshake ())
                return;

        zmq_assert (decoder);

        //  If there has been an I/O error, stop polling.
        if (input_stopped) {
            rm_fd (handle);
            io_error = true;
            return;
        }

        //  If there's no data to process in the buffer...
        if (!insize) {

            //  Retrieve the buffer and read as much data as possible.
            //  Note that buffer can be arbitrarily large. However, we assume
            //  the underlying TCP layer has fixed buffer size and thus the
            //  number of bytes read will be always limited.
            size_t bufsize = 0;
            decoder->get_buffer (&inpos, &bufsize);

            const int rc = tcp_read (s, inpos, bufsize);
            if (rc == 0) {
                error (connection_error);
                return;
            }
            if (rc == -1) {
                if (errno != EAGAIN)
                    error (connection_error);
                return;
            }

            //  Adjust input size
            insize = static_cast <size_t> (rc);
        }

        int rc = 0;
        size_t processed = 0;

        while (insize > 0) {
            rc = decoder->decode (inpos, insize, processed);
            zmq_assert (processed <= insize);
            inpos += processed;
            insize -= processed;
            if (rc == 0 || rc == -1)
                break;
            rc = (this->*process_msg) (decoder->msg ());
            if (rc == -1)
                break;
        }

        //  Tear down the connection if we have failed to decode input data
        //  or the session has rejected the message.
        if (rc == -1) {
            if (errno != EAGAIN) {
                error (protocol_error);
                return;
            }
            input_stopped = true;
            reset_pollin (handle);
        }

        session->flush ();
    }
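If insize is 0, in_event calls get_buffer, which points inpos either at v2_decoder_t's own buffer or directly at the data of in_progress inside v2_decoder_t (when the amount still to read is at least the decoder's buffer size, 8192 by default), and then calls tcp_read to read data. The while loop processes the data that was just read: each time a complete message is decoded it is handed to process_msg; if the remaining data does not make up a full msg, the loop exits and waits for the next in_event call. On failure (rc == -1) the engine reports a protocol error, unless errno is EAGAIN, in which case it only stops polling for input.
Encoder
In zmq, both the v1 and v2 encoders inherit from encoder_base_t, while raw_encoder inherits directly from i_encoder: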
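The shape of that while loop (decode, advance inpos, hand over a complete message, stop when the data runs out) can be tried out with a toy decoder. Everything below is an illustrative assumption: toy_decoder_t uses a made-up one-byte length prefix rather than ZMTP framing, and drain stands in for the loop in in_event, with collecting strings in a vector playing the role of process_msg.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

//  Toy decoder (hypothetical): one length byte followed by the payload.
struct toy_decoder_t
{
    std::string current;       //  message being assembled
    std::size_t remaining = 0; //  payload bytes still missing
    bool have_len = false;     //  true once the length byte was seen

    //  Same contract as in the article: 1 = message complete,
    //  0 = more data needed; used_ reports the bytes consumed.
    int decode (const unsigned char *data_, std::size_t size_,
                std::size_t &used_)
    {
        used_ = 0;
        while (used_ < size_) {
            if (!have_len) {
                remaining = data_[used_++];
                have_len = true;
                current.clear ();
            } else {
                const std::size_t take =
                  std::min (remaining, size_ - used_);
                current.append (
                  reinterpret_cast<const char *> (data_) + used_, take);
                used_ += take;
                remaining -= take;
            }
            if (have_len && remaining == 0) {
                have_len = false;
                return 1;
            }
        }
        return 0;
    }
};

//  Mirrors the while loop of in_event: consume one chunk, collect every
//  complete message, and keep partial state inside the decoder.
std::vector<std::string> drain (toy_decoder_t &decoder_,
                                const unsigned char *inpos_,
                                std::size_t insize_)
{
    std::vector<std::string> messages;
    while (insize_ > 0) {
        std::size_t processed = 0;
        const int rc = decoder_.decode (inpos_, insize_, processed);
        inpos_ += processed;
        insize_ -= processed;
        if (rc == 0)
            break; //  incomplete message: wait for the next chunk
        messages.push_back (decoder_.current);
    }
    return messages;
}
```

A chunk that ends in the middle of a message yields only the complete messages; the tail is finished by the next call to drain, just as the next in_event call finishes a partially received msg.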
    template <typename T> class encoder_base_t : public i_encoder
    {
    public:

        inline encoder_base_t (size_t bufsize_) :
            bufsize (bufsize_),
            in_progress (NULL)
        {
            buf = (unsigned char*) malloc (bufsize_);
            alloc_assert (buf);
        }

        //  The destructor doesn't have to be virtual. It is made virtual
        //  just to keep ICC and code checking tools from complaining.
        inline virtual ~encoder_base_t ()
        {
            free (buf);
        }

        //  The function returns a batch of binary data. The data
        //  are filled to a supplied buffer. If no buffer is supplied (data_
        //  points to NULL) encoder object will provide buffer of its own.
        inline size_t encode (unsigned char **data_, size_t size_)
        {
            unsigned char *buffer = !*data_ ? buf : *data_;
            size_t buffersize = !*data_ ? bufsize : size_;

            if (in_progress == NULL)
                return 0;

            size_t pos = 0;
            while (pos < buffersize) {

                //  If there are no more data to return, run the state machine.
                //  If there are still no data, return what we already have
                //  in the buffer.
                if (!to_write) {
                    if (new_msg_flag) {
                        int rc = in_progress->close ();
                        errno_assert (rc == 0);
                        rc = in_progress->init ();
                        errno_assert (rc == 0);
                        in_progress = NULL;
                        break;
                    }
                    (static_cast <T*> (this)->*next) ();
                }

                //  If there are no data in the buffer yet and we are able to
                //  fill whole buffer in a single go, let's use zero-copy.
                //  There's no disadvantage to it as we cannot stuck multiple
                //  messages into the buffer anyway. Note that subsequent
                //  write(s) are non-blocking, thus each single write writes
                //  at most SO_SNDBUF bytes at once not depending on how large
                //  is the chunk returned from here.
                //  As a consequence, large messages being sent won't block
                //  other engines running in the same I/O thread for excessive
                //  amounts of time.
                if (!pos && !*data_ && to_write >= buffersize) {
                    *data_ = write_pos;
                    pos = to_write;
                    write_pos = NULL;
                    to_write = 0;
                    return pos;
                }

                //  Copy data to the buffer. If the buffer is full, return.
                size_t to_copy = std::min (to_write, buffersize - pos);
                memcpy (buffer + pos, write_pos, to_copy);
                pos += to_copy;
                write_pos += to_copy;
                to_write -= to_copy;
            }

            *data_ = buffer;
            return pos;
        }

        void load_msg (msg_t *msg_)
        {
            zmq_assert (in_progress == NULL);
            in_progress = msg_;
            (static_cast <T*> (this)->*next) ();
        }

    protected:

        //  Prototype of state machine action.
        typedef void (T::*step_t) ();

        //  This function should be called from derived class to write the data
        //  to the buffer and schedule next state machine action.
        inline void next_step (void *write_pos_, size_t to_write_,
            step_t next_, bool new_msg_flag_)
        {
            write_pos = (unsigned char*) write_pos_;
            to_write = to_write_;
            next = next_;
            new_msg_flag = new_msg_flag_;
        }

    private:

        //  Where to get the data to write from.
        unsigned char *write_pos;

        //  How much data to write before next step should be executed.
        size_t to_write;

        //  Next step. If set to NULL, it means that associated data stream
        //  is dead.
        step_t next;

        bool new_msg_flag;

        //  The buffer for encoded data.
        size_t bufsize;
        unsigned char *buf;

        encoder_base_t (const encoder_base_t&);
        void operator = (const encoder_base_t&);

    protected:

        msg_t *in_progress;
    };
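The logic of encoder_base_t is slightly more complex than that of decoder_base_t, but it is also implemented as a state machine. Its most important method is encode. Before analyzing encode, let's look at how encoder_base_t is used, which is mainly in stream_engine's out_event: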
    void zmq::stream_engine_t::out_event ()
    {
        zmq_assert (!io_error);

        //  If write buffer is empty, try to read new data from the encoder.
        if (!outsize) {

            //  Even when we stop polling as soon as there is no
            //  data to send, the poller may invoke out_event one
            //  more time due to 'speculative write' optimisation.
            if (unlikely (encoder == NULL)) {
                zmq_assert (handshaking);
                return;
            }

            outpos = NULL;
            outsize = encoder->encode (&outpos, 0);

            while (outsize < out_batch_size) {
                if ((this->*next_msg) (&tx_msg) == -1)
                    break;
                encoder->load_msg (&tx_msg);
                unsigned char *bufptr = outpos + outsize;
                size_t n = encoder->encode (&bufptr, out_batch_size - outsize);
                zmq_assert (n > 0);
                if (outpos == NULL)
                    outpos = bufptr;
                outsize += n;
            }

            //  If there is no data to send, stop polling for output.
            if (outsize == 0) {
                output_stopped = true;
                reset_pollout (handle);
                return;
            }
        }

        //  If there are any data to write in write buffer, write as much as
        //  possible to the socket. Note that amount of data to write can be
        //  arbitrarily large. However, we assume that underlying TCP layer has
        //  limited transmission buffer and thus the actual number of bytes
        //  written should be reasonably modest.
        const int nbytes = tcp_write (s, outpos, outsize);

        //  IO error has occurred. We stop waiting for output events.
        //  The engine is not terminated until we detect input error;
        //  this is necessary to prevent losing incoming messages.
        if (nbytes == -1) {
            reset_pollout (handle);
            return;
        }

        outpos += nbytes;
        outsize -= nbytes;

        //  If we are still handshaking and there are no data
        //  to send, stop polling for output.
        if (unlikely (handshaking))
            if (outsize == 0)
                reset_pollout (handle);
    }
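Each call first checks whether outsize is 0; if it is, all previous data has already been sent. Inside the if statement it first calls
    outpos = NULL;
    outsize = encoder->encode (&outpos, 0);
which points outpos at the encoder's buffer. It then repeatedly fetches the next msg to send through next_msg, calls the encoder's load_msg to store the new msg in the encoder, and finally calls
    size_t n = encoder->encode (&bufptr, out_batch_size - outsize);
to write the newly loaded msg into the buffer. encode does not necessarily process a whole message; if there is not enough space, it processes only part of it. When the buffer is full or there is no new msg to write, tcp_write is called. This design lets a single tcp_write send several msgs, which reduces the number of system calls and improves efficiency. If a msg was not fully processed, then the next time the if statement is entered,
    outsize = encoder->encode (&outpos, 0);
continues encoding the remaining data.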
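The batching idea (several small messages share one write, and a message that does not fit is finished in the next batch) can be sketched without any of the engine machinery. batcher_t below is hypothetical: it copies whole strings instead of using msg_t and zero-copy, writes a fixed flags byte of 0, and only supports bodies of up to 255 bytes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

//  Hypothetical batcher: frames each message as [flags][size][body] and
//  packs frames into batches of at most batch_size_ bytes.
struct batcher_t
{
    std::deque<std::string> queue; //  messages waiting to be sent

    //  Returns the next batch; an empty string means nothing is left.
    std::string next_batch (std::size_t batch_size_)
    {
        std::string batch;
        //  First continue with whatever did not fit last time.
        take (batch, batch_size_);
        while (batch.size () < batch_size_ && !queue.empty ()) {
            const std::string msg = queue.front ();
            queue.pop_front ();
            assert (msg.size () <= 255); //  short frames only in this sketch
            leftover.push_back ('\0');   //  flags byte
            leftover.push_back (static_cast<char> (msg.size ()));
            leftover += msg;
            take (batch, batch_size_);
        }
        return batch;
    }

  private:
    //  Encoded bytes that have not been placed into a batch yet.
    std::string leftover;

    //  Moves as many encoded bytes as still fit into the batch.
    void take (std::string &batch_, std::size_t batch_size_)
    {
        const std::size_t n =
          std::min (leftover.size (), batch_size_ - batch_.size ());
        batch_.append (leftover, 0, n);
        leftover.erase (0, n);
    }
};
```

With the messages "ab", "cde" and "f" queued and a batch size of 8, the first batch holds the whole first frame plus the first four bytes of the second, and the second batch starts with the remaining 'e'. This mirrors how a partially encoded msg is completed by the next encode call.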
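Having seen how stream_engine uses the encoder, let's return to the encode method. On each call it points buffer at either its own buffer or the pointer passed in, and then writes data into buffer. It first checks whether to_write is 0 and, if so, runs the state machine. There is also a copy-avoiding optimization here: when to_write is at least as large as the buffer, the *data_ passed in is NULL, and pos is 0 (nothing has been placed in this batch yet, so data cannot get mixed up), the output pointer can be pointed directly at the data part of the msg. This does not introduce a thread-safety problem either.
Compared with v2_decoder, the state machine of v2_encoder is simpler and has only two states: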
    zmq::v2_encoder_t::v2_encoder_t (size_t bufsize_) :
        encoder_base_t <v2_encoder_t> (bufsize_)
    {
        //  Write 0 bytes to the batch and go to message_ready state.
        next_step (NULL, 0, &v2_encoder_t::message_ready, true);
    }

    zmq::v2_encoder_t::~v2_encoder_t ()
    {
    }

    void zmq::v2_encoder_t::message_ready ()
    {
        //  Encode flags.
        unsigned char &protocol_flags = tmpbuf [0];
        protocol_flags = 0;
        if (in_progress->flags () & msg_t::more)
            protocol_flags |= v2_protocol_t::more_flag;
        if (in_progress->size () > 255)
            protocol_flags |= v2_protocol_t::large_flag;
        if (in_progress->flags () & msg_t::command)
            protocol_flags |= v2_protocol_t::command_flag;

        //  Encode the message length. For messages less than 256 bytes,
        //  the length is encoded as 8-bit unsigned integer. For larger
        //  messages, 64-bit unsigned integer in network byte order is used.
        const size_t size = in_progress->size ();
        if (unlikely (size > 255)) {
            put_uint64 (tmpbuf + 1, size);
            next_step (tmpbuf, 9, &v2_encoder_t::size_ready, false);
        }
        else {
            tmpbuf [1] = static_cast <uint8_t> (size);
            next_step (tmpbuf, 2, &v2_encoder_t::size_ready, false);
        }
    }

    void zmq::v2_encoder_t::size_ready ()
    {
        //  Write message body into the buffer.
        next_step (in_progress->data (), in_progress->size (),
            &v2_encoder_t::message_ready, true);
    }
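What message_ready writes into tmpbuf can be reproduced by a small free function. This is a sketch under assumptions: encode_frame_header is a hypothetical helper, the flag values are assumed to match v2_protocol_t, and the loop below stands in for put_uint64 by writing the size with the most significant byte first.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

//  Flag bits of the first header byte (assumed to match v2_protocol_t).
enum : unsigned char
{
    more_flag = 1,
    large_flag = 2,
    command_flag = 4
};

//  Writes a frame header into out_ (which must hold at least 9 bytes)
//  and returns the number of header bytes written: 2 or 9.
std::size_t encode_frame_header (uint64_t body_size_, bool more_,
                                 bool command_, unsigned char *out_)
{
    assert (out_ != nullptr);

    unsigned char flags = 0;
    if (more_)
        flags |= more_flag;
    if (command_)
        flags |= command_flag;

    if (body_size_ > 255) {
        //  Long frame: 64-bit size in network byte order.
        out_[0] = flags | large_flag;
        for (int i = 0; i < 8; ++i)
            out_[1 + i] =
              static_cast<unsigned char> (body_size_ >> (8 * (7 - i)));
        return 9;
    }

    //  Short frame: size fits into a single byte.
    out_[0] = flags;
    out_[1] = static_cast<unsigned char> (body_size_);
    return 2;
}
```

The header is built in a small scratch area and the body is never copied into it, which is why the real encoder needs only two states: one for the header bytes and one that points write_pos at the message data.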
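That is how the v2 encoder and decoder work.
Besides the v1 and v2 encoders and decoders, zmq also provides raw_decoder and raw_encoder. They are fairly simple, so they are not analyzed here.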