Erlang & Elixir are ready for asynchronous work right off the bat. Generally speaking, background job systems aren't
needed as much as in other ecosystems, but they still have their place for particular use cases.
This post goes through a few best practices I often try to think of in advance when writing background jobs, so that I don't hit some of the pain points that have hurt me multiple times in the past.
If you've ever deployed a new task, only to find out that it has gone rogue with a bug that caused it to misbehave
(e.g. sending way too many emails, way too quickly), you'll know how painful these issues can be.
Flavours
Elixir already gives you the ability to schedule asynchronous work pretty easily. Something as simple as this already
covers a lot:
Task.async(fn ->
  # some heavy lifting
end)
You might need something a bit more powerful, either just for convenience (having some tooling & monitoring around
that task), or because you need something like periodic jobs. Again, all this can be achieved with something like
a GenServer:
defmodule PeriodicJob do
  use GenServer

  @period 60_000

  def start_link(_opts) do
    GenServer.start_link(__MODULE__, :state)
  end

  def init(state) do
    Process.send_after(self(), :poll, @period)
    {:ok, state}
  end

  def handle_info(:poll, state) do
    # some heavy lifting
    Process.send_after(self(), :poll, @period)
    {:noreply, state}
  end
end
You can also use a job scheduling library such as Quantum. If you come from Ruby land and are used to libraries such as
Sidekiq, you might be more familiar with something like this:
#
# lib/my_app/scheduler.ex
#
defmodule MyApp.Scheduler do
  use Quantum.Scheduler, otp_app: :my_app
end

#
# config/config.exs
#
config :my_app, MyApp.Scheduler,
  jobs: [
    first: [
      # every hour
      schedule: "0 * * * *",
      task: {MyApp.ExampleJob, :run, []}
    ],
    second: [
      # every minute
      schedule: "* * * * *",
      task: {MyApp.AnotherExampleJob, :run, []}
    ]
  ]
Some may argue that since Erlang/OTP already provides the infrastructure for creating these processes, packages such as
Quantum are not necessary. However, the structure they give you can end up being more intuitive, especially if you're
not that familiar with OTP, as might be the case for someone coming from Ruby or similar communities.
How to Structure Background Jobs
Let's now get into a few tips that will help you keep your jobs ready to deal with potential future problems!
Most of them are preventive measures: because these are background processes, they're not responding to an HTTP request
and they run without any intervention, so debugging can be hard if you don't take some precautions.
Let's consider a small example that sends confirmation emails to users that haven't received it yet:
defmodule MyApp.ExampleJob do
  import Ecto.Query

  def run do
    get_users()
    |> Enum.each(fn user ->
      # send a single email to the user
    end)
  end

  defp get_users do
    MyApp.User
    |> where(confirmation_email_sent: false)
    |> MyApp.Repo.all()
  end
end
1. Put in a Kill Switch
This is one of those mistakes I'll never make again since it has hurt me so many times.
Let's say you've created a background job, tested, deployed, and configured it to run periodically and send some emails.
It hits production, and you soon notice that something's wrong. The same 100 people are being spammed with emails every
minute. You messed up the get_next_batch/1 function, and it always goes over the same batch of users.
It's a developer's horror story. You need to fix it (or kill it) quickly, but all that time waiting for a new release to get
online is physically painful.
So, avoid that:
defmodule MyApp.ExampleJob do
  def run do
    if enabled?() do
      # ...
    end
  end

  defp enabled? do
    # check a Redis flag, a database record, or anything really
  end
end
You can plug in some persistent system that allows you to quickly toggle the job on/off. A good suggestion would be to
use a feature flag package, such as FunWithFlags.
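With FunWithFlags, for instance, the enabled?/0 check from above could be as simple as this (the :example_job flag name is just illustrative):

defp enabled? do
  # FunWithFlags stores flags in Redis or Ecto, so the value can be flipped
  # at runtime without a deploy; :example_job is an illustrative flag name
  FunWithFlags.enabled?(:example_job)
end

Killing the job then becomes a matter of running FunWithFlags.disable(:example_job) from a remote console.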
2. Always Batch Your Jobs by Default
It's easy to miss this one on a first draft. You're just trying to quickly get something online.
But batching becomes important if you're working on a very resource-intensive job, or simply if your list of records
to process grows too quickly.
Doing User |> where(confirmation_email_sent: false) |> Repo.all() can be dangerous if there's potential for that to
yield too many results.
You may end up consuming too many resources for something that could be done in smaller batches, keeping your system
a lot more stable:
defmodule MyApp.ExampleJob do
  import Ecto.Query

  @batch_size 100

  defp get_users do
    MyApp.User
    |> where(confirmation_email_sent: false)
    |> limit(^@batch_size)
    |> MyApp.Repo.all()
  end
end
Whatever job queue mechanism you plug this worker into, it will end up being called frequently, so there's no hurry:
processing a smaller batch on each run will still get through all the records while keeping the load predictable.
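Batching like this only helps if each run makes progress, i.e. processed records stop matching the query on the next run. A minimal sketch of what that could look like inside MyApp.ExampleJob, with send_confirmation_email/1 standing in for whatever mailer you actually use:

defp process_user(user) do
  # send the email first (send_confirmation_email/1 is a stand-in for your mailer)
  send_confirmation_email(user)

  # then flip the flag so this user no longer matches the next batch's query
  user
  |> Ecto.Changeset.change(confirmation_email_sent: true)
  |> MyApp.Repo.update!()
end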
3. Avoid Overlaps
This is kind of related to the previous point, but it's a concern that goes beyond performance.
If you program a job to run every minute, and a single execution has the potential to last longer than that, you end up
risking cascading performance problems, or even worse, race conditions, where the first and second executions are both
trying to process the same set of data, and conflict with each other in the process.
This is obviously dependent on what your exact business logic is, but as a general rule, it's best to be defensive here.
If you use a GenServer approach like the one showcased above, this is solved automatically: instead of scheduling
jobs every minute, you use Process.send_after(self(), :poll, delay) to only schedule the next run after the current
one has finished, avoiding overlap.
When using Quantum, you can also set the overlap: false option to automatically prevent this:
config :my_app, MyApp.Scheduler,
  jobs: [
    first: [
      # every hour
      schedule: "0 * * * *",
      task: {MyApp.ExampleJob, :run, []},
      overlap: false
    ]
  ]
This, by the way, might already be reason enough to consider using a package rather than just plain Elixir.
4. Plug in a Manual Mode
If your job is processing a batch of records, it's useful to plug in some public functions that allow you to manually
process specific records. This can serve two purposes:
- Better ability to debug the job
- Ability to do a few manual runs before enabling the global job (by toggling the feature flag discussed above)
A sample structure could look like this:
defmodule MyApp.ExampleJob do
  import MyApp.Lock

  alias MyApp.User

  def run do
    lock("example_job", fn ->
      get_users()
      |> Enum.each(&process_user/1)
    end)
  end

  def run_manually(users) when is_list(users) do
    lock("example_job", fn ->
      users
      |> Enum.each(&process_user/1)
    end)
  end

  def run_manually(user), do: run_manually([user])

  def process_user(%User{} = user) do
    # process a single user
  end
end
In this case, we're creating a run_manually/1 public function that can receive either a single user or a batch of
them, and it performs the same logic as the automatic job would.
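You can then trigger it from an iex session, such as a remote console in production (the IDs below are purely illustrative):

# process a single user
user = MyApp.Repo.get!(MyApp.User, 42)
MyApp.ExampleJob.run_manually(user)

# or a hand-picked batch
users = Enum.map([43, 44, 45], &MyApp.Repo.get!(MyApp.User, &1))
MyApp.ExampleJob.run_manually(users)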
One important detail here is, once again, avoiding race conditions. In this case, that's handled with a custom Lock
module that uses the redis_mutex package to prevent potential issues:
defmodule MyApp.Lock do
  use RedisMutex

  require Logger

  def lock(lock_name, fun) do
    with_lock(lock_name, 60_000) do
      fun.()
    end
  rescue
    _e in RedisMutex.Error ->
      Logger.debug("#{lock_name}: another process is already running")
  end
end
The lock, which is invoked both on manual runs and by the regular background job, ensures that you won't cause any
unintentional conflicts if you do a manual run at the same time the job is doing the same processing. It also
happens to solve the overlap problem discussed previously in this post.
Conclusion
All these tips come from problems I bumped into in the past, usually production bugs or user complaints, so I hope
some of them help you avoid the same mistakes. Let me know if you have any further thoughts! 👋
Guest author Miguel is a professional over-engineer at Portuguese-based Subvisual. He currently works with UTRUST as well, being the head of infrastructure & blockchain integrations. He builds fancy keyboards and plays excessive amounts of online chess.
P.S. If you'd like to read Elixir Alchemy posts as soon as they get off the press, subscribe to our Elixir Alchemy newsletter and never miss a single post!