This style guide is a list of best practices for working with Ruby background jobs using Active Job on the Sidekiq backend.
Despite the common belief, the two work quite well together if you follow these guidelines.
Pass Active Record models as arguments; do not pass by id. Active Job automatically serializes and deserializes Active Record models using GlobalID, and manual deserialization of the models is not necessary.
GlobalID handles model class mismatches properly.
Deserialization errors are reported to error tracking.
# bad - passing by id
# Deserialization error is reported, the job *is* scheduled for retry.
class SomeJob < ApplicationJob
  def perform(model_id)
    model = Model.find(model_id)
    do_something_with(model)
  end
end
# bad - model mismatch
class SomeJob < ApplicationJob
  def perform(model_id)
    Model.find(model_id)
    # ...
  end
end
# Will try to fetch a Model using an id that belongs to a different model class, e.g. a User's id.
SomeJob.perform_later(user.id)
# acceptable - passing by id
# Deserialization error is reported, the job is *not* scheduled for retry.
class SomeJob < ApplicationJob
  def perform(model_id)
    model = Model.find(model_id)
    do_something_with(model)
  rescue ActiveRecord::RecordNotFound
    Rollbar.warning('Not found')
  end
end
# good - passing with GlobalID
# Deserialization error is reported, the job is *not* scheduled for retry.
class SomeJob < ApplicationJob
  def perform(model)
    do_something_with(model)
  end
end
Warning: Do not switch from one style to the other abruptly. Use a transitional period to let all jobs already scheduled with ids be processed, and use a helper to temporarily support both numeric and GlobalID arguments.
class SomeJob < ApplicationJob
  include TransitionHelper

  def perform(model)
    # TODO: remove this when all jobs with numeric id arguments are processed
    model = fetch(model, Model)
    do_something_with(model)
  end
end
module TransitionHelper
  def fetch(id_or_object, model_class)
    case id_or_object
    when Numeric
      model_class.find(id_or_object)
    when model_class
      id_or_object
    else
      fail "Object type mismatch #{model_class}, #{id_or_object}"
    end
  end
end
Explicitly specify a queue to be used in job classes. Make sure the queue is on the list of processed queues.
Putting all jobs into one basket comes with a risk of more urgent jobs being executed with a significant delay. Do not put slow and fast jobs together in one queue. Do not put urgent and non-urgent jobs together in one queue.
# bad - no queue specified
class SomeJob < ApplicationJob
  def perform
    # ...
  end
end
# bad - the wrong queue specified
class SomeJob < ApplicationJob
  queue_as :hgh_prioriti # nonexistent queue specified

  def perform
    # ...
  end
end
# good
class SomeJob < ApplicationJob
  queue_as :high_priority

  def perform
    # ...
  end
end
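For reference, the list of processed queues typically lives in the Sidekiq configuration file. A minimal sketch, assuming config/sidekiq.yml with hypothetical queue names and weights:
# config/sidekiq.yml
:queues:
  - [high_priority, 3]
  - [default, 2]
  - [low_priority, 1]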
Ideally, jobs should be idempotent, meaning there should be no bad side effects of them running more than once. Sidekiq only guarantees that the jobs will run at least once, but not necessarily exactly once.
Even jobs that do not fail due to errors might be interrupted during non-rolling-release deployments.
class UserNotificationJob < ApplicationJob
  def perform(user)
    send_email_to(user) unless already_notified?(user)
  end
end
During deployment, a job is given 25 seconds to complete by default. After that, the worker is terminated and the job is sent back to the queue. This might result in part of the work being executed twice.
Make the jobs atomic, i.e., all or nothing.
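A minimal sketch of an atomic job, assuming the work is purely database-bound (the job, model, and column names are hypothetical):
class TransferJob < ApplicationJob
  def perform(source_account, target_account, amount)
    # Either both updates are committed or neither is, so a job
    # interrupted mid-flight can safely run again from scratch.
    ActiveRecord::Base.transaction do
      source_account.update!(balance: source_account.balance - amount)
      target_account.update!(balance: target_account.balance + amount)
    end
  end
end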
Do not use threads in your jobs; spawn jobs instead. Spinning up a thread in a job opens a new database connection, and the connections are easily exhausted, up to the point where the web server goes down.
# bad - consumes all available connections
class SomeJob < ApplicationJob
  def perform
    User.find_each do |user|
      Thread.new do
        ExternalService.update(user)
      end
    end
  end
end
# good
class SomeJob < ApplicationJob
  def perform(user)
    ExternalService.update(user)
  end
end

User.find_each do |user|
  SomeJob.perform_later(user)
end
Avoid using Active Job's built-in retry_on or ActiveJob::Retry (the activejob-retry gem).
Use Sidekiq retries, which are also available from within Active Job with Sidekiq 6+.
Do not hide or extract job retry mechanisms; keep retry directives visible in the jobs.
# bad - makes three attempts without submitting to Rollbar,
# fails and relies on Sidekiq's retry that would also make several
# retry attempts, submitting each of the failures to Rollbar.
class SomeJob < ApplicationJob
  retry_on ThirdParty::Api::Errors::SomeError, wait: 1.minute, attempts: 3

  def perform(user)
    # ...
  end
end
# bad - it's not clear upfront if the job will be retried or not
class SomeJob < ApplicationJob
  include ReliableJob

  def perform(user)
    # ...
  end
end
# good - Sidekiq deals with retries
class SomeJob < ApplicationJob
  sidekiq_options retry: 3

  def perform(user)
    # ...
  end
end
Use the retry mechanism; do not let jobs end up in Dead Jobs. Let Sidekiq retry the jobs instead of spending time re-running them manually.
Background processing of a scheduled job may happen sooner than you expect. Make sure to only schedule jobs when the transaction has been committed.
# bad - job may perform earlier than the transaction is committed
User.transaction do
  users_params.each do |user_params|
    user = User.create!(user_params)
    NotifyUserJob.perform_later(user)
  end
end
# good
users = User.transaction do
  users_params.map do |user_params|
    User.create!(user_params)
  end
end

users.each { |user| NotifyUserJob.perform_later(user) }
Due to Rails auto-reloading, Sidekiq jobs are executed one by one, with no parallelism. That may be confusing.
Run Sidekiq in an environment that has eager_load set to true, or with the following flags to circumvent this behavior:
EAGER_LOAD=true ALLOW_CONCURRENCY=true bundle exec sidekiq
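Assuming the application maps these environment variables onto the corresponding Rails settings, the equivalent environment configuration would look roughly like this (the file name is an assumption):
# config/environments/staging.rb
Rails.application.configure do
  # Load all application code upfront so Sidekiq threads do not
  # serialize on the code reloader.
  config.eager_load = true
  config.allow_concurrency = true
end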
Background job processing may be down for a prolonged period (minutes), e.g. during a failed deployment or a burst of other jobs.
Consider running time-critical and mission-critical jobs in-process.
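A sketch of the in-process variant, using Active Job's standard perform_now (the job name is hypothetical):
# Runs synchronously in the calling process; no dependency on job
# processing being up, at the cost of a slower response.
PaymentCaptureJob.perform_now(order)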
Do not put business logic to jobs; extract it.
# bad
class SendUserAgreementJob < ApplicationJob
  # Convenient method to check if preconditions are satisfied to avoid
  # scheduling unnecessary jobs.
  def self.perform_later_if_applies(user)
    job = new(user)
    return unless job.satisfy_preconditions?
    job.enqueue
  end

  def perform(user)
    @user = user
    return unless satisfy_preconditions?
    agreement = agreement_for(user: user)
    AgreementMailer.deliver_now(agreement)
  end

  def satisfy_preconditions?
    legal_agreement_signed? &&
      !user.removed? &&
      !user.referral? &&
      !(user.active? || user.pending?) &&
      !user.has_flag?(:on_hold)
  end

  private

  # Resolves the user both before perform (from the enqueue-time
  # arguments) and during it.
  def user
    @user ||= arguments.first
  end

  # business logic
end
# good - business logic is not coupled to the job
class SendUserAgreementJob < ApplicationJob
  def perform(user)
    agreement = agreement_for(user: user)
    AgreementMailer.deliver_now(agreement)
  end
end
SendUserAgreementJob.perform_later(user) if satisfy_preconditions?
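One way to host the extracted preconditions is a small policy object; a sketch, with all names being hypothetical:
class UserAgreementPolicy
  def initialize(user)
    @user = user
  end

  # The business rules live here, testable without scheduling any job.
  def agreement_required?
    !user.removed? &&
      !user.referral? &&
      !(user.active? || user.pending?) &&
      !user.has_flag?(:on_hold)
  end

  private

  attr_reader :user
end

SendUserAgreementJob.perform_later(user) if UserAgreementPolicy.new(user).agreement_required?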
Weigh the pros and cons in each case, whether to schedule jobs from jobs or to execute them in-process. Factors to consider: Is it a retriable job? Can inner jobs fail? Are they idempotent? Is there anything in the host job that may fail?
# good - error kernel pattern
# bad - additional jobs are spawned
class SomeJob < ApplicationJob
  def perform
    SomeMailer.some_notification.deliver_later
    OtherJob.perform_later
  end
end
# good - no additional jobs
# bad - if `OtherJob` fails, `SomeMailer` will be re-executed on retry as well
class SomeJob < ApplicationJob
  def perform
    SomeMailer.some_notification.deliver_now
    OtherJob.perform_now
  end
end
When a large number of jobs must be performed, it's acceptable to schedule them from a host job.
Consider using batches for improved traceability.
Also, specify the same queue for the host job and its sub-jobs, as shown in the sketch after the example below.
# acceptable
def perform
  batch = Sidekiq::Batch.new
  batch.description = 'Send weekly reminders'
  batch.jobs do
    User.find_each do |user|
      WeeklyReminderJob.perform_later(user)
    end
  end
end
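A sketch of aligning the queues of the host job and its sub-jobs (the job and queue names are hypothetical):
class WeeklyRemindersJob < ApplicationJob
  queue_as :low_priority

  def perform
    batch = Sidekiq::Batch.new
    batch.description = 'Send weekly reminders'
    batch.jobs do
      User.find_each { |user| WeeklyReminderJob.perform_later(user) }
    end
  end
end

class WeeklyReminderJob < ApplicationJob
  # Same queue as the host job, so sub-jobs neither jump ahead of nor
  # lag behind the work that scheduled them.
  queue_as :low_priority

  def perform(user)
    # ...
  end
end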
Carefully rename job classes to avoid situations where jobs are scheduled, but there's no longer a class to process them.
Note: This also relates to mailers used with deliver_later.
# good - keep the old class
# TODO: Delete this alias in a few weeks when old jobs are safely gone
OldJob = NewJob
Do not use Kernel.sleep in jobs. sleep blocks the worker thread, preventing it from processing other jobs.
Re-schedule the job for a later time, or use limiters with a custom exception.
# bad
class SomeJob < ApplicationJob
  def perform(user)
    # ||= so that the counter survives the `retry` below instead of
    # being reset to 3 on every attempt.
    attempts_number ||= 3
    ThirdParty::Api::User.renew(user.external_id)
  rescue ThirdParty::Api::Errors::TooManyRequestsError => error
    sleep(error.retry_after)
    attempts_number -= 1
    retry unless attempts_number.zero?
    raise
  end
end
# good - retry job in a while, a limited number of times
class SomeJob < ApplicationJob
  sidekiq_options retry: 3

  sidekiq_retry_in do |count, exception|
    case exception
    when ThirdParty::Api::Errors::TooManyRequestsError
      count + 1 # i.e. 1s, 2s, 3s
    end
  end

  def perform(user)
    ThirdParty::Api::User.renew(user.external_id)
  end
end
# good - fine-grained control of API usage in jobs
class SomeJob < ApplicationJob
  def perform(user)
    LIMITER.within_limit do
      ThirdParty::Api::User.renew(user.external_id)
    end
  end
end
# config/initializers/sidekiq.rb
Sidekiq::Limiter.configure do |config|
  config.errors << ThirdParty::Api::Errors::TooManyRequestsError
end
On multi-core machines, run as many Sidekiq processes as needed to fully utilize the cores. A Sidekiq process only uses one CPU core, so a rule of thumb is to run as many processes as there are cores available.
Redis's database size is limited by server memory. Some prefer to explicitly set maxmemory, and in combination with a noeviction policy this may result in errors on job scheduling.
Do not keep jobs in Dead Jobs. With extended backtrace enabled for Dead Jobs, a single dead job can occupy as much as 20KB in the database.
Re-run the jobs once the root cause is fixed, or delete them.
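A sketch of both options via the Sidekiq API; the class filter is an assumption, so check the API docs for your Sidekiq version before running anything like this:
require 'sidekiq/api'

dead_set = Sidekiq::DeadSet.new

# Re-run jobs of one fixed class once the root cause is resolved
# (display_class unwraps Active Job wrappers)...
dead_set.each do |job|
  job.retry if job.display_class == 'SomeJob'
end

# ...or drop everything that is no longer relevant.
dead_set.clear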
Do not pass an excessive number of arguments to a job.
# bad
SomeJob.perform_later(user_name, user_status, user_url, user_info: huge_json)
# good
SomeJob.perform_later(user, user_url)
At some scale, it pays off to use commercial features.
Some commercial features are available as third-party add-ons; however, their reliability is in most cases questionable.
Group jobs related to one task using Sidekiq Batches. A batch's jobs method is atomic, i.e., all the jobs are scheduled together, in an all-or-nothing fashion.
# bad
class BackfillMissingDataJob < ApplicationJob
  def self.run_batch
    Model.where(attribute: nil).find_each do |model|
      perform_later(model)
    end
  end

  def perform(model)
    # do the job
  end
end
# good
class BackfillMissingDataJob < ApplicationJob
  def self.run_batch
    batch = Sidekiq::Batch.new
    batch.description = 'Backfill missing data'
    batch.on(:success, BackfillComplete, to: SysAdmin.email)
    batch.jobs do
      Model.where(attribute: nil).find_each do |model|
        perform_later(model)
      end
    end
  end

  def perform(model)
    # do the job
  end
end
Avoid using self-scheduling jobs for long-running jobs. Prefer using Sidekiq Batches to split the workload.
# bad
class BackfillMissingDataJob < ApplicationJob
  SIZE = 20

  def perform(offset = 0)
    models = Model.where(attribute: nil)
                  .order(:id).offset(offset).limit(SIZE)
    return if models.empty?
    models.each do |model|
      model.update!(attribute: value_for(model))
    end
    self.class.perform_later(offset + SIZE)
  end
end
# good
class BackfillMissingDataJob < ApplicationJob
  def self.run_batch
    Sidekiq::Batch.new.jobs do
      Model.where(attribute: nil)
           .find_in_batches(batch_size: 20) do |models|
        BackfillMissingDataJob.perform_later(models)
      end
    end
  end

  def perform(models)
    models.each do |model|
      model.update!(attribute: value_for(model))
    end
  end
end
Most third-party APIs have usage limits and will fail if there are too many calls in a period. Use rate limiting in jobs that make such external calls.
Never rely on job scheduling to spread out the calls: even if you schedule jobs to execute at specific moments, they might all run at once due to, e.g., a backlog in job processing. Use Enterprise Rate Limiting, picking the strategy (Concurrent, Bucket, or Window) that best matches the specific API's limits; see the sketch after the examples below.
# bad
class UpdateExternalDataJob < ApplicationJob
  def perform(user)
    new_attribute = ThirdParty::Api.get_attribute(user.external_id)
    user.update!(attribute: new_attribute)
  end
end
User.where.not(external_id: nil)
    .find_in_batches.with_index do |users, group_number|
  users.each do |user|
    UpdateExternalDataJob
      .set(wait: group_number.minutes)
      .perform_later(user)
  end
end
# good
class UpdateExternalDataJob < ApplicationJob
  LIMITER = Sidekiq::Limiter.window('third-party-attribute-update', 20, :minute, wait_timeout: 0)

  def perform(user)
    LIMITER.within_limit do
      new_attribute = ThirdParty::Api.get_attribute(user.external_id)
      user.update!(attribute: new_attribute)
    end
  end
end
# Application code
User.where.not(external_id: nil).find_each do |user|
  UpdateExternalDataJob.perform_later(user)
end
# config/initializers/sidekiq.rb
Sidekiq::Limiter.configure do |config|
  config.errors << ThirdParty::Api::Errors::TooManyRequestsError
end
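For reference, a sketch of the three strategies (the names and numbers are placeholders): concurrent caps simultaneous executions, bucket allows N calls per fixed interval, window allows N calls within any rolling interval.
# At most 10 operations in flight at once.
concurrent = Sidekiq::Limiter.concurrent('erp', 10, wait_timeout: 0, lock_timeout: 30)

# At most 30 operations per calendar second.
bucket = Sidekiq::Limiter.bucket('stripe', 30, :second, wait_timeout: 0)

# At most 20 operations within any rolling minute.
window = Sidekiq::Limiter.window('third-party', 20, :minute, wait_timeout: 0)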
Do not rely on Sidekiq's limiter backoff default. It will reschedule the job roughly five minutes into the future, scaled by how many times the job has exceeded the limit, plus jitter:
DEFAULT_BACKOFF = ->(limiter, job) do
  (300 * job['overrated']) + rand(300) + 1
end
That doesn't fit the cases when limits are released quickly, or when they are kept for hours. Configure the backoff on a per-limiter basis:
Sidekiq::Limiter.configure do |config|
  config.backoff = ->(limiter, job) do
    case limiter.name
    when 'daily-third-party-api-limit'
      12.hours
    else
      (300 * job['overrated']) + rand(300) + 1 # fall back to the default
    end
  end
end
Keep in mind how limiter comparison works: compare limiters by name, not by object identity.
Sidekiq::Limiter.bucket('custom-limiter', 1, :day) == Sidekiq::Limiter.bucket('custom-limiter', 1, :day) # => false
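Comparing by name behaves as expected (a minimal sketch):
a = Sidekiq::Limiter.bucket('custom-limiter', 1, :day)
b = Sidekiq::Limiter.bucket('custom-limiter', 1, :day)
a == b           # => false, two distinct Ruby objects
a.name == b.name # => true, the same underlying limiter in Redis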
Create limiters once during startup and reuse them. Limiters are thread-safe and designed to be shared.
Each limiter occupies 114 bytes in Redis, and the default TTL is 3 months. 1 million jobs a month using non-shared limiters will be constantly consuming 300MB in Redis.
# bad - limiter is re-created on each job call
class SomeJob < ApplicationJob
  def perform(...)
    limiter = Sidekiq::Limiter.concurrent('erp', 50, wait_timeout: 0, lock_timeout: 30)
    limiter.within_limit do
      # call ERP
    end
  end
end
# good
class SomeJob < ApplicationJob
  ERP_LIMIT = Sidekiq::Limiter.concurrent('erp', 50, wait_timeout: 0, lock_timeout: 30)

  def perform(...)
    ERP_LIMIT.within_limit do
      # call ERP
    end
  end
end
# acceptable - an exception is when the limiter is specific to some entity, whose id is used as a distinguishing key in the limiter name.
class SomeJob < ApplicationJob
  def perform(user)
    # Rate limiting is per user account
    user_throttle = Sidekiq::Limiter.bucket("stripe-#{user.id}", 30, :second, wait_timeout: 0)
    user_throttle.within_limit do
      # call stripe with user's account creds
    end
  end
end
Using incorrect limiter options may break the limiter's behavior.
Set wait_timeout to zero or some reasonably low value; doing otherwise will result in idle workers while there might be jobs waiting in the queue.
Keep in mind the backoff configuration, and carefully pick the timing of job retries.
Jobs might rescue the Sidekiq::Limiter::OverLimit exception to discard themselves when their locally defined limiters are exceeded, as in the sketch below.
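A minimal sketch of such a self-discarding job (the job name, limiter name, and settings are assumptions):
class DailyReportJob < ApplicationJob
  LIMITER = Sidekiq::Limiter.bucket('daily-report', 1, :day, wait_timeout: 0)

  def perform
    LIMITER.within_limit do
      # build and send the report
    end
  rescue Sidekiq::Limiter::OverLimit
    # Over the local daily limit: discard instead of rescheduling.
  end
end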
To avoid interference between a global throttle limiter applied in middleware and such local job limiters, wrap the Sidekiq::Limiter::OverLimit exception in the middleware:
# Middleware
class SaturationLimiter
  SaturationOverLimit = Class.new(StandardError)
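  # LIMITER is assumed to be the app-wide saturation limiter,
  # defined elsewhere during startup.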
  def self.wrapper(job, block)
    LIMITER.within_limit { block.call }
  rescue Sidekiq::Limiter::OverLimit => e
    limiter_name = e.limiter.name
    # Re-raise if an over-the-limit exception is coming from a limiter
    # defined on the job level.
    raise unless limiter_name == LIMITER.name
    # Raise a custom exception that Sidekiq::Limiter uses to re-schedule
    # the job to a later time, in a way that doesn't overlap with the
    # limiters defined on the job level.
    raise SaturationOverLimit, limiter_name
  end
end
# config/initializers/active_job.rb
ActiveJob::Base.around_perform(&SaturationLimiter.method(:wrapper))
Sidekiq::Limiter::OverLimit is an internal mechanism, and it doesn't make sense to report it to error tracking when it triggers.
# config/initializers/rollbar.rb
Rollbar.configure do |config|
  config.exception_level_filters.merge!('Sidekiq::Limiter::OverLimit' => 'ignore')
end
# config/newrelic.yml
production:
  error_collector:
    enabled: true
    ignore_errors: "Sidekiq::Limiter::OverLimit"
Use Enterprise Rolling Restarts. With Rolling Restarts, deployments do not suffer from downtime. They also prevent non-atomic and non-idempotent jobs from being interrupted and executed more than once on deployments.
Warning: For Capistrano-style deployments, make sure to use the --reexec-as and --drop-env-var BUNDLE_GEMFILE einhorn options to avoid stale code and dependencies.