Fluentd Study Notes: High Availability (Multi-Tier Fluentd Configuration)


High Availability (Multi-Tier Fluentd Configuration)

http://docs.fluentd.org/articles/high-availability

Fluentd High Availability Configuration

For high-traffic websites, we recommend using a high availability configuration of Fluentd.

Table of Contents

  • Message Delivery Semantics
  • Network Topology
  • Log Forwarder Configuration
  • Log Aggregator Configuration
  • Failure Case Scenarios
  • Troubleshooting

Message Delivery Semantics

Fluentd is designed primarily for event-log delivery systems.

In such systems, several delivery guarantees are possible:

  • At most once: Messages are immediately transferred. If the transfer succeeds, the message is never sent out again. However, many failure scenarios can cause lost messages (ex: no more write capacity)
  • At least once: Each message is delivered at least once. In failure cases, messages may be delivered twice.
  • Exactly once: Each message is delivered once and only once. This is what people want.

If the system “can’t lose a single event”, and must also transfer “exactly once”, then the system must stop ingesting events when it runs out of write capacity. The proper approach would be to use synchronous logging and return errors when the event cannot be accepted.

(Note: this is why there is an error.log file.)

That’s why Fluentd guarantees ‘At most once’ transfers. In order to collect massive amounts of data without impacting application performance, a data logger must transfer data asynchronously. This improves performance at the cost of potential delivery failures.

(Note: this explains why a <match> block can list multiple IPs in <server> entries: if one server fails, events can be forwarded to another IP. That should be the failover capability being described here.)

However, most failure scenarios are preventable. The following sections describe how to set up Fluentd’s topology for high availability.

Network Topology

To configure Fluentd for high availability, we assume that your network consists of ‘log forwarders’ and ‘log aggregators’.

‘log forwarders’ are typically installed on every node to receive local events. Once an event is received, they forward it to the ‘log aggregators’ through the network.

‘log aggregators’ are daemons that continuously receive events from the log forwarders. They buffer the events and periodically upload the data into the cloud.

Fluentd can act as either a log forwarder or a log aggregator, depending on its configuration. The next sections describe the respective setups. We assume that the active log aggregator has ip ‘192.168.0.1’ and that the backup has ip ‘192.168.0.2’.

(Note: in other words, the sending side can use Fluentd to collect and forward logs, and the receiving side can also be Fluentd.)
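
A rough sketch of the topology these assumptions describe:

  applications --> log forwarder (one per node) --primary--> log aggregator 192.168.0.1 (active) --> cloud
                                                \--standby--> log aggregator 192.168.0.2 (backup) --> cloud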

Log Forwarder Configuration

(Note: the forwarder is only the client side that uploads logs. But where do the collected log files come from? The plugin that collects local logs is the tail input plugin; see the sketch after the configuration below.)

Please add the following lines to your config file for log forwarders. This will configure your log forwarders to transfer logs to log aggregators.

Configuration file for the collecting (forwarder) side:
# TCP input
<source>
  type forward
  port 24224
</source>

# HTTP input
<source>
  type http
  port 8888
</source>

# Log Forwarding
<match mytag.**>
  type forward

  # primary host
  <server>
    host 192.168.0.1
    port 24224
  </server>
  # use secondary host (multiple IPs)
  <server>
    host 192.168.0.2
    port 24224
    standby
  </server>

  # use longer flush_interval to reduce CPU usage.
  # note that this is a trade-off against latency.
  flush_interval 60s
</match>
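
As the note above suggests, the events being forwarded must first be collected by some input plugin, typically the tail input plugin. A minimal sketch of such a source (the path, pos_file, and tag values are placeholders chosen for illustration, not from the original article):

# Tail a local log file and tag it so it matches <match mytag.**>
<source>
  type tail
  path /var/log/app/access.log            # hypothetical application log
  pos_file /var/log/td-agent/access.pos   # remembers the read position across restarts
  format none                             # keep each line as raw text
  tag mytag.access                        # must fall under mytag.** to be forwarded
</source>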

When the active aggregator (192.168.0.1) dies, the logs will instead be sent to the backup aggregator (192.168.0.2). If both servers die, the logs are buffered on-disk at the corresponding forwarder nodes.
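
Note that out_forward keeps its buffer in memory by default, so the on-disk buffering described above assumes a file buffer in the forwarder's <match> block. A hedged sketch of the extra lines (the paths are placeholders), plus an optional <secondary> section that dumps chunks to local files once retries are exhausted:

# Additions inside the forwarder's <match mytag.**> block:
buffer_type file                               # buffer chunks on disk instead of in memory
buffer_path /var/log/td-agent/buffer/forward   # hypothetical buffer location

# Optional last resort if every retry fails:
<secondary>
  type file
  path /var/log/td-agent/forward-failed        # hypothetical path
</secondary>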

Log Aggregator Configuration

(Note: this is the server side; the configuration is just the few lines below, so study them carefully.)

Please add the following lines to the config file for log aggregators. The input source for the log transfer is TCP.

# Input
<source>
  type forward
  port 24224
</source>

# Output
# (Note: this match is very important: it must match the <match mytag.**>
# tag used on the forwarder side above. This point confused me for a long time.)
<match mytag.**>
  ...
</match>

The incoming logs are buffered, then periodically uploaded into the cloud. If upload fails, the logs are stored on the local disk until the retransmission succeeds.
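
The '...' in the output section above stands for whatever destination you use. As one hedged illustration of this buffer-then-upload behavior (out_file standing in for a real cloud output, with placeholder paths):

# Hypothetical aggregator output: buffer chunks on disk until flushed
<match mytag.**>
  type file                                  # stand-in for a cloud destination such as s3
  path /var/log/td-agent/aggregated          # hypothetical output location
  buffer_path /var/log/td-agent/buffer/agg   # hypothetical buffer location
</match>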

Failure Case Scenarios

Forwarder Failure

When a log forwarder receives events from applications, the events are first written into a disk buffer (specified by buffer_path). After every flush_interval, the buffered data is forwarded to aggregators.

This process is inherently robust against data loss. If a log forwarder’s fluentd process dies, the buffered data is properly transferred to its aggregator after it restarts. If the network between forwarders and aggregators breaks, the data transfer is automatically retried.
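
The automatic retry mentioned here is governed by the standard buffered-output parameters. A sketch of knobs commonly tuned on the forwarder (the values are illustrative, not recommendations from the article):

# Inside the forwarder's <match mytag.**> block:
retry_wait 1s            # initial wait before retrying; backs off exponentially
max_retry_wait 1m        # upper bound on the backoff interval
retry_limit 17           # retries before giving up (or falling back to <secondary>)
buffer_queue_limit 128   # how many chunks may queue while the aggregator is down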

However, possible message loss scenarios do exist:

  • The process dies immediately after receiving the events, but before writing them into the buffer.
  • The forwarder’s disk is broken, and the file buffer is lost.

Aggregator Failure

When log aggregators receive events from log forwarders, the events are first written into a disk buffer (specified by buffer_path). After every flush_interval, the buffered data is uploaded into the cloud.

This process is inherently robust against data loss. If a log aggregator’s fluentd process dies, the data from the log forwarder is properly retransferred after it restarts. If the network between aggregators and the cloud breaks, the data transfer is automatically retried.

However, possible message loss scenarios do exist:

  • The process dies immediately after receiving the events, but before writing them into the buffer.
  • The aggregator’s disk is broken, and the file buffer is lost.

Troubleshooting

“no nodes are available”

Please make sure that you can communicate with port 24224 using not only TCP, but also UDP. These commands will be useful for checking the network configuration.

$ telnet host 24224
$ nmap -p 24224 -sU host
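
The UDP requirement exists because out_forward uses UDP heartbeats to check node health by default. If UDP is filtered on your network (an assumption about your environment, not a case from the article), one option is switching the heartbeat to TCP:

# In the forwarder's <match mytag.**> block:
heartbeat_type tcp   # default is udp; tcp works where UDP is blocked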
