Working with data is at the heart of any application. ActiveRecord makes this easier for Rails developers: it lets you manage data without digging deep into the underlying database. But sometimes it leads to code issues we would not have if we paid more attention to structuring the database and maintaining data integrity.
Data integrity?
Data integrity is the process of maintaining the accuracy and consistency of the data.
In the ActiveRecord context, I mean that data integrity is a set of principles for keeping the underlying database accurate by paying more attention to its schema design.
In this article, I am going to review some well-known data integrity techniques that are very helpful for improving a codebase but are sometimes undervalued. We will see how easy they are to integrate and how they help us write clearer code.
Default values
Have you seen code like this?
# Example 1
order = Order.new(state: 'new')

# Example 2
class Order < ApplicationRecord
  validates :state, presence: true

  before_validation { self.state ||= 'new' }
end
There is the same problem in both examples: no default value is preset. This forces writing the initialization "by hand", in a non-centralised way, using one of the following options:
1) Set the value "in place" (example one). This solution is quick and dirty. The main disadvantage is the implicitness of the "correct" default value: someone who is not familiar with the project would not know which default to set in the future, or might expect the value to already be present. That may cause errors.
The second point is that such snippets spread all over the codebase and become hard to maintain because of duplication and decentralization.
2) Set the default values in a "pre-save" model callback (example two above). But here we also have implicitness: where should I look for the correct initial value, in before_save, before_commit, or before_validation? The answer depends on the imagination of the author.
The second problem is that the default value is not initialized on a new model object (example two): even if we set the default before saving, it stays uninitialized until the record is saved:
order = Order.new
order.state
=> nil
order.save
order.state
=> 'new'
The best option is to preset the default value in the database schema and let ActiveRecord work as designed:
create_table :orders do |t|
  ...
  t.string :state, default: 'new'
end
# the default value works by default :)
Order.new.state
=> 'new'
It allows us to get a new object with the column already set to its default. ActiveRecord reads your table's metadata and presets the default values for any new model of the Order type.
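As a small illustration of that metadata, ActiveRecord exposes the defaults it has read from the schema (a quick console check, assuming the orders table above):
Order.column_defaults['state']
=> 'new'
Order.new.state
=> 'new'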
Setting up the database
The simple case is setting the default value while creating a new table. It is an easy, non-blocking operation:
create_table :orders do |t|
  ...
  t.string :state, default: 'new'
end
Adding a column with a default to an existing table, on the other hand, is more complicated, because the database has to set the default value for every existing row. On a highly loaded database this locks reads and writes for a while.
If you are lucky enough to be on the edge and run your application on PostgreSQL 11 or later, things become much simpler: PostgreSQL 11 does not rewrite the existing rows, but fills them in "lazily".
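In that case a single migration is enough. Here is a minimal sketch, assuming PostgreSQL 11+ (the migration class name is made up):
class AddStateToOrders < ActiveRecord::Migration[5.0]
  def change
    # on PostgreSQL 11+ this does not rewrite existing rows,
    # so it is safe even for a large table
    add_column :orders, :state, :string, default: 'new'
  end
end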
If you are not on PostgreSQL 11, the safe way to set up a default value consists of two simple steps:
1) create the nullable column without a default value
2) change the default value of the column
add_column :orders, :state, :text
change_column_default :orders, :state, 'new'
After that, all new rows will have the default value, and ActiveRecord will use that metadata to preset the default automatically. All the existing rows in the database will still have a NULL value, and you need to fill them "by hand" before you can start prettifying the Ruby code. Then you can rely on the presence of the column value for every row.
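For a small table, a one-off backfill sketch like this is enough (assuming the orders table above); for a large table, prefer batched updates like the rake task shown later:
Order.where(state: nil).update_all(state: 'new')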
The main point is to maintain compatibility: change the database presets first and ship that. Then cut the default value initialization out of the Ruby code once you no longer have NULL values in the column.
Not-Null Constraints
Have you ever faced code like this?
class ImageUploadService
  def call
    ...
    image.upload_tries_count ||= 0
    image.upload_tries_count += 1
  end
end
The issue is similar to the lack of a default value. When a column is nullable, we cannot be sure there is no nil value, which forces us to write nil-checking logic every time we use the column.
If we can guarantee that the column never holds a nil value, the code becomes much simpler:
image.upload_tries_count += 1
And here is the solution:
A not-null constraint specifies merely that a column must not assume the null value.
Creating a new table with a not-null column
Set the default option, and the column will have its value set by default:
create_table :users do |t|
  ...
  t.integer :points_count, null: false, default: 0
end
Adding a not-null column to an existing table
This article describes the safe way to add a new column with a default in detail, but here are the summary steps:
- Fill all the NULL values in the database with some value
- Set the column not-null (in a migration)
- Cut off the "in-place" model initialization code
A better way to fill the values in a heavily loaded table is to update in batches (each batch in its own transaction) and to spread the work out over time.
The standard way to automate things in the Ruby world is rake tasks, and here is a sample rake task that updates the users table in batches:
namespace :migrations do
  desc 'Fill user points count'
  task fill_user_points_count: :environment do
    total_updated = 0
    loop do
      # each update_all call touches at most 10,000 rows,
      # so the table is never locked for long
      updated_rows = User
        .where(points_count: nil)
        .limit(10_000)
        .update_all(points_count: 0)
      break if updated_rows.zero?
      total_updated += updated_rows
      puts "updated #{total_updated} users"
    end
  end
end
I recommend using a rake task instead of updating columns in a migration, because it gives more control when running it in production (you can interrupt the process at any time if something goes wrong). And here is a somewhat tricky migration that automates setting the value for the staging environment and other developers' machines:
class FillUserPointsCount < ActiveRecord::Migration[5.0]
  disable_ddl_transaction!

  def up
    if Rails.env.production?
      puts 'Run bundle exec rake migrations:fill_user_points_count in production'
    else
      Rake::Task['migrations:fill_user_points_count'].invoke
    end
  end
end
This migration disables transactional mode, and the rake task updates users in batches of 10,000 rows. That way we avoid a long lock on a large table, which would happen if the whole table were updated in a single transaction.
After this step, we can easily set the column to not-null:
change_column_null :users, :points_count, false
and deploy changes to production.
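For completeness, here is a minimal sketch of a migration for that step (the class name is assumed):
class SetUsersPointsCountNotNull < ActiveRecord::Migration[5.0]
  def change
    change_column_null :users, :points_count, false
  end
end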
The last step is to cut out nil-checking code like this:
user.points_count ||= 0
Wrap up
Following these two simple rules, "set the default value" and "set the column not null", is very useful for writing clearer code and avoiding many problems.
Thanks for reading! See you at the next topic.
Top comments (3)
Thanks for your article.
I think it's not good practice to invoke data-changing rake tasks from a migration, because you can change the model name, column name, etc. in the future, and then this migration will fail and make developers who deploy the project locally suffer.
Although it's OK to make a one-time rake task for production, in the development or test environments you should populate data from seeds or a dump.
It's only my opinion =)
Cool article! Data migrations under high load are hard.
It would be great if you could point out that Postgres 11 no longer has these problems.
Michael, thanks for providing the details, fixed!