Today I was working on performance optimisations for import logic. One place was operating on an array of arrays of IDs like so:
id_groups = [
  [1, 2],
  [42, 43],
  ...
]

id_groups.each do |ids|
  ids.each_slice(5000) do |portion|
    SomeModel.where(id: portion).update_all(some_field => some_value)
  end
end
While doing an .update_all in reasonable batches is a good start, there's a slight inefficiency here: if the groups are many and small, we'll be making an update query for each group, changing only a handful of records each time.
We can cut down on the number of update queries made, and thus save on the DB round-trip overhead, by utilising a common pattern: buffered bulk updates.
We will be buffering (collecting in some array) the updates we want to make, and performing an update query (with .upsert_all) only when there's a good number of changes collected.
update_buffer = []
batch_size = 1000

# Collect updates into the buffer and execute upsert_all when reaching batch_size
id_groups.each do |ids|
  ids.each do |id|
    # Add each update to the buffer
    update_buffer << { id: id, some_field => some_value }

    # Check if the buffer has reached the batch size, and perform upsert_all if so
    if update_buffer.size >= batch_size
      SomeModel.upsert_all(update_buffer, unique_by: :id, returning: false)
      update_buffer.clear # Clear buffer after upsert
    end
  end
end

# Perform a final upsert_all for any remaining updates in the buffer
SomeModel.upsert_all(update_buffer, unique_by: :id, returning: false) unless update_buffer.empty?
An additional tweak that could be made is avoiding the need to recheck how full the buffer is on every iteration (the update_buffer.size part) by introducing a counter, incrementing it as updates are added, and resetting it alongside the upsert call. This is useful if the batch size in your case is large.
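For illustration, here is a minimal sketch of that counter-based variant, reusing the id_groups, SomeModel, some_field, and some_value placeholders from the snippets above:

update_buffer = []
buffered_count = 0
batch_size = 1000

id_groups.each do |ids|
  ids.each do |id|
    update_buffer << { id: id, some_field => some_value }
    buffered_count += 1

    # Flush based on the counter instead of checking update_buffer.size each time
    next if buffered_count < batch_size

    SomeModel.upsert_all(update_buffer, unique_by: :id, returning: false)
    update_buffer.clear
    buffered_count = 0
  end
end

# Flush whatever is left in the buffer
SomeModel.upsert_all(update_buffer, unique_by: :id, returning: false) unless update_buffer.empty?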
Let me know what you think of this technique. Have you used something similar?