Sidekiq Error Handling

Best Practices

  1. Use an error-reporting service such as Honeybadger, Airbrake, Rollbar, BugSnag, Sentry, Exceptiontrap, or Raygun. They are all functionally similar; pick one. These services email you whenever a job raises an exception. Note that Sidekiq 3.0 removed built-in support for Airbrake, Honeybadger, Exceptional and ExceptionNotifier, so make sure your error service supports Sidekiq.
  2. Sidekiq's own retry mechanism catches these exceptions and retries the job periodically. The error service notifies you that an exception occurred so you can fix the bug; once you do, Sidekiq will process the job successfully.
  3. If you haven't fixed the bug after 25 retries (about 21 days), Sidekiq stops retrying and moves the job to the Dead Job Queue. You then have six months to fix the bug and manually retry the job via the Web UI.
  4. After six months, Sidekiq deletes the job.

Error Handlers

Gems can attach to Sidekiq's global error handlers so they are notified whenever an error occurs inside Sidekiq. Error services are integrated automatically once you add their gem to your application's Gemfile.
You can create your own error handler by providing something that responds to call(exception, context_hash):

Sidekiq.configure_server do |config|
  config.error_handlers << Proc.new { |ex, ctx_hash| MyErrorService.notify(ex, ctx_hash) }
end

Note that error handlers only apply to the Sidekiq server process; they are not active in a Rails console.

Backtrace Logging

Enabling backtrace logging for a job keeps the backtrace with the job until the job dies. If a large number of jobs keep failing and being requeued, Redis memory usage will grow even when no new jobs are being enqueued. Be careful when enabling backtraces: limit them to a few lines, or rely on your error service to track the full backtrace.
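For example, capping the stored backtrace at a few lines looks like this (a sketch with a hypothetical worker name; the backtrace option accepts true for a full backtrace or an integer line limit):

class ImportWorker
  include Sidekiq::Worker
  # Persist only the first 5 lines of the backtrace with the retry payload,
  # which keeps the per-job memory cost in Redis small.
  sidekiq_options :backtrace => 5

  def perform(*args)
  end
end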

Automatic Job Retry

Sidekiq uses an exponential backoff formula, (retry_count ** 4) + 15 + (rand(30) * (retry_count + 1)), to schedule retries of failed jobs, which works out to roughly 25 retries over about 21 days. Assuming you deploy a bug fix within that window, the job will be retried and eventually succeed. Once all 25 retries are exhausted, Sidekiq moves the job to the Dead Job Queue, and from then on it must be retried manually.
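To get a feel for how the delay grows, this small sketch prints the deterministic part of the backoff for the first few retries (the real delay also adds rand(30) * (retry_count + 1) seconds of jitter):

# Prints the base delay per retry; the jitter term is shown as an upper bound.
(0..5).each do |retry_count|
  base = (retry_count ** 4) + 15
  puts "retry ##{retry_count + 1}: #{base}s + up to #{30 * (retry_count + 1)}s jitter"
end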

Web UI

Sidekiq's Web UI has "Retries" and "Dead" tabs that list failed jobs and let you retry, inspect, or delete them.
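If the Web UI is not already mounted in your application, a minimal sketch for a Rails routes file looks like this (assuming you add your own authentication around /sidekiq):

# config/routes.rb
require 'sidekiq/web'

Rails.application.routes.draw do
  mount Sidekiq::Web => '/sidekiq' # the "Retries" and "Dead" tabs live here
end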

Dead Job Queue

Sidekiq 3.0 introduced the Dead Job Queue: jobs that exhaust all of their retries are moved here. Sidekiq will not retry jobs in this queue automatically; you can retry them by hand via the Web UI. The queue does not grow without bound: it holds at most 10,000 jobs, and jobs stay in it for no longer than six months. Only jobs configured with 0 or more retries can end up in the Dead Job Queue; if a particular type of job is ephemeral, set :retry => false so it is neither retried nor sent to the dead queue.
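Besides the Web UI, the dead set can also be inspected from code via Sidekiq's API; here is a sketch using Sidekiq::DeadSet from sidekiq/api (the worker class name is hypothetical):

require 'sidekiq/api'

dead = Sidekiq::DeadSet.new
puts "dead jobs: #{dead.size}"

# Re-enqueue dead jobs belonging to one (hypothetical) worker class.
dead.each do |job|
  job.retry if job.klass == 'PaymentWorker'
end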

Configuration

You can specify the number of retries for a particular worker if 25 is too many:

class LessRetryableWorker
  include Sidekiq::Worker
  sidekiq_options :retry => 5 # Only five retries and then to the Dead Job Queue

  def perform(...)
  end
end

You can disable retry support for a particular worker. Note with retry disabled, Sidekiq will not track or save any error data for the worker’s jobs.

class NonRetryableWorker
  include Sidekiq::Worker
  sidekiq_options :retry => false # job will be discarded immediately if failed

  def perform(...)
  end
end

You can disable a job going to the DJQ:

class NonRetryableWorker
  include Sidekiq::Worker
  sidekiq_options :retry => 5, :dead => false

  def perform(...)
  end
end

The retry delay can be customized using sidekiq_retry_in, if needed.

class WorkerWithCustomRetry
  include Sidekiq::Worker
  sidekiq_options :retry => 5

  # The current retry count is yielded. The return value of the block must be
  # an integer. It is used as the delay, in seconds.
  sidekiq_retry_in do |count|
    10 * (count + 1) # (i.e. 10, 20, 30, 40)
  end

  def perform(...)
  end
end

After retrying so many times, Sidekiq will call the sidekiq_retries_exhausted hook on your Worker if you’ve defined it. The hook receives the queued message as an argument. This hook is called right before Sidekiq moves the job to the DJQ.

class FailingWorker
  include Sidekiq::Worker

  sidekiq_retries_exhausted do |msg|
    Sidekiq.logger.warn "Failed #{msg['class']} with #{msg['args']}: #{msg['error_message']}"
  end

  def perform(*args)
    raise "or I don't work"
  end
end

As of Sidekiq 3.3.2, you can change the maximum number of jobs in the DJQ or the maximum time spent in the DJQ by setting dead_max_jobs and dead_timeout_in_seconds in your Sidekiq options hash.
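For example, a Sidekiq 3.x-era initializer might look like this (a sketch; the numbers are placeholders to tune for your app):

# config/initializers/sidekiq.rb
Sidekiq.options[:dead_max_jobs] = 50_000                       # cap the DJQ at 50,000 jobs
Sidekiq.options[:dead_timeout_in_seconds] = 90 * 24 * 60 * 60  # keep dead jobs for ~90 days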

Process Crashes

If the Sidekiq process segfaults or crashes the Ruby VM, any jobs that were being processed are lost. Sidekiq Pro offers a reliable queueing feature which does not lose those jobs.

No More Bike Shedding

Sidekiq’s retry mechanism is a set of best practices but many people have suggested various knobs and options to tweak in order to handle their own edge case. This way lies madness. Design your code to work well with Sidekiq’s retry mechanism as it exists today or fork the RetryJobs middleware and add your own logic. I’m no longer accepting any functional changes to the retry mechanism unless you make an extremely compelling case for why Sidekiq’s thousands of users would want that change.
