Dmitry Salahutdinov for Amplifr.com

Posted on Nov 19, 2018 • Updated on Nov 22, 2018

Purify code with data integrity. Part 1: defaults and not-nulls

#rails #ruby #database #dataintegrity

Working with the data is the crucial point of any application. ActiveRecord makes this process easier for Rails developer. It allows managing the data without getting deep into the underlying database. Sometimes it causes some code issues we would not have if pay more attention to the database structuring and maintaining data integrity.

Data integrity?

Data integrity is the process of maintaining the accuracy and consistency of the data.

In ActiveRecord context I mean, that data integrity - is the set of the principles for keeping an underlying database in accuracy state by paying more attention to its schema design.

In this article, I am going to review some well-known data integrity techniques which are very helpful in improving the codebase but sometimes are undervalued for some reasons. We will see how easy they are to integrate and how they can help to write more clear code.

Default values

Have you seen the code like this?

# Example 1
order = Order.new(state: 'new')

# Example 2
class Order < ApplicationRecord
  validates :state, presence: true

  def before_validation
    self.state ||= 'new'
  end
end

There is the same problem in both examples: the lack of default value preset. It causes for writing initialization "by hand" in a non-centralised way using one of the following options:

1) 👎to set the value "in place" (example one). This solution is fast and dirty. The main disadvantage is implicitness of the "correct" default value: someone who is not familiar with the project would not know the proposal default for setting up in the future or expect the value should already present. It may cause the errors.

The second point is that such "code snippets" spread all over the codebase and make it hard to maintain because of duplication and decentralization.

2) 👎to set the default values in the "pre-save" model callback
(example two from the top).But here we also have implicitness. Where should I find the correct initial value before_save, before_commit, before_validation? The answer depends on the imagination of the author 😊.

The second problem is that the default value is not initialized in the new model (example two), and even we set the default values before saving, we still have it uninitialized in the new model object:

order = Order.new
order.state
=> nil
order.save
order.state
=> 'new'

👍The best option is to preset the default value in the database schema and let ActiveRecord work well by design:

create_table :orders do |t|
  ...
  t.integer :state, default:'new'
end

# default value work by-default :)
Order.new.state
=> 'new'

It allows to have a new object with the column set by default. ActiveRecord will read the metadata of your table and preset the default values for any new model of the User type.

Setting up the database

The simple case is to set the default values while creating the new table. It is easy and non-blocking operation:

create_table :order do |t|
  ...
  t.integer :state, default: 'new'
end

Otherwise adding a column with default to the existing table is more complicated, because it will enforce setting the default value for all the rows. It will lock read/write operation for a while in the high-loaded databases.

If you are lucky to be on the edge and run you application on Postgres 11 - things become much simpler. PostgreSQL 11 does not touch the existing row, but updates them 'lazily'.

If you are not on Postgres 11, the safe way to set up default value constraint consists of two simple steps:
1) creating the nullable column without setting default value
2) changing the default value of the column

  add_column :orders, :state, :text
  change_column_default :orders, :state, 'new'

After that all new rows will have the default values and active record will use those metadata to automatically preset the default. All the existing rows in data base will still have the NULL value and it is needed to fill them "by hands" to start to prettify the ruby code. Then you can rely on the presence of the column for every row.

❗The main point is to maintain compatibility: change the database presets first and ship it. Then cut off the default value initialization out of Ruby-code, when you don't have NULL-values in the column anymore.

Not-Null Constraints

Have you ever faced code like this?

class ImageUploadService
  def call
    ...
    image.upload_tries_count ||= 0
    image.upload_tries_count += 1
  end
end

The issue is similar to the lack of default value. When a column is nullable - web couldn't be sure that there is no nil value. It enforces to write nil-checking logic every time we have to use the column.

If we guarantee that column does not have the nil value, the code will become much simpler:

image.upload_tries_count += 1

And here is the solution:

A not-null constraint specifies merely that a column must not assume the null value.

Create a new table and with not-null column

Set the default options and the column will have the value set by default:

create_table :user do |t|
  ...
  t.integer :points_count, null: false, default: 0
end

Adding not-null column to existing table

This article describes in detail the safe way to add the new column with default, but here are the summary steps:

Fill all the null-values in the database with some value
Set column not-null (by migration)
Cut off the "in-place" model initialization code

❗The better way to fill values of the overloaded table is to update with batches (every batch in its transaction) and to stretch in time.

The standard way to automate things in Ruby world is Rake-tasks, and here is the sample of rake-task to update users table users with batches:

desc 'Fill user points count'
namespace :migrations do
  task fill_user_points_count: :environment do
    total_updated = 0
    loop do
      updated_rows = User
        .where(points_count: nil).limit(10_000)
        .update_all(points_count: 0).to_i
      break if updated_rows.zero?

      total_updated += updated_rows
      puts "updated #{total_updated} users"
    end
  end
end

I recommend using rake-task instead of updating columns in migration, because of it
gives more control to run it on production (you can interrupt process at any time if something goes wrong). And here is some tricky migration to automate setting the default value for staging environment and the other developer machine:

class FillOrderSourceAndSetNotNull < ActiveRecord::Migration[5.0]
  disable_ddl_transaction!

  def up
    if Rails.env.production?
      puts "Run bundle exec migrations:fill_user_points_count in production"
    else
      Rake::Task['migrations:fill_user_points_count'].invoke
    end
  end
end

This migration disables transactional mode for the migration and updates users with batches of 10 000 rows. It allows to not to have the large table lock if the table updates in one transaction.

After this step - we can easily set table not-null:

change_column_null :users, :points_count, false

and deploy changes to production.
The last step is to cut out nil-checking code like this:

user.points_count ||= 0

Wrap up

Following this two simple rules: 'set the default value' and 'set column not null' are very useful to write more clear code and avoid many of the problems.

Thanks for reading! See you at the next topic.

Additional links:

Top comments (3)

Alexandr Shlyakov • Nov 28 '18

Thanks for ur article.
I think, it's not good practice to place changing data rake tasks in migration, cause of u can change model name, column name, etc. in future, and this migration will fail and make suffer developers who deploy project locally.
Although it's ok to make one-time used rake for production, but in development or test env u should actualize data from seed or dump.
It's only my opinion =)

Misha Merkushin • Nov 20 '18

Cool article! Data migrations under high load are hard. 💪

adding a column with default to the existing table is more complicated, because it will enforce setting the default value for all the rows.

Adding not-null column to existing table

It would be great if you could point out that Postgres 11 has no more these problems.

Dmitry Salahutdinov • Nov 20 '18 • Edited

Michael, thanks for providing the the details, fixed!

DEV Community

Purify code with data integrity. Part 1: defaults and not-nulls

Data integrity?

Default values

Setting up the database

Not-Null Constraints

Create a new table and with not-null column

Adding not-null column to existing table

Wrap up

Additional links:

Top comments (3)

Read next

CouchGO! — Enhancing CouchDB with Query Server Written in Go

Opinionated: How to safely insert multiple records to more than one table in Laravel

Difference Between ORM and ODM

Understand inheritance types in Django models