Lucas Barret

Posted on Aug 7, 2023 • Edited on Aug 25, 2023

Padle is nice but squash (Rails migration) is funnier

#rails #ruby #database

I recently heard about squashing Rails migrations. I love learning how things work internally and I knew little about Rails migrations, so I wanted to experiment with it. This is the time and place to learn about migrations and try to squash some!

Migration in general

It might sound silly, but before anything else, let's recall what migrations are and why they are helpful.

If you look at the Rails documentation,
migrations are a convenient way to alter your database schema over time in a consistent manner.
You can think of each migration as being a new 'version' of the database.

Moreover, migrations are most often written as code, which enables reviews and, as we said, versioning of database changes.

Rails migration internals

There are several essential things to understand when you are running a migration.

First, your database contains a table that is maintained and used by Rails and has nothing to do with your models. This table is schema_migrations; there is not much in it except the versions of the migrations you have run.

You can access this table through a model in your Rails app.
For example, go to the app/models folder and create a schema_migration.rb file:

class SchemaMigration < ActiveRecord::Base
end

An implementation already exists inside Rails (ActiveRecord::SchemaMigration), but it is internal, so you are not supposed to use it directly :)

Now you can use our own model like any other model:

irb(main):001:0> SchemaMigration.all
  SchemaMigration Load (0.6ms)  SELECT "schema_migrations".* FROM "schema_migrations"
=> []

Nothing in here, which is normal for now: we have not run any migration yet. Let's create one and see.

> rails g model user name:string
      invoke  active_record
      create    db/migrate/20230723165832_create_users.rb
      create    app/models/user.rb

The generated migration creates a users table with a name column and timestamps:

class CreateUsers < ActiveRecord::Migration[7.0]
  def change
    create_table :users do |t|
      t.string :name

      t.timestamps
    end
  end
end

As long as you do not run the migration, the table is still empty:

irb(main):001:0> SchemaMigration.all
  SchemaMigration Load (1.0ms)  SELECT "schema_migrations".* FROM "schema_migrations"
=> []

But once you run it and check your schema_migrations table, you will find the version of your migration.

> rails db:migrate
== 20230723165832 CreateUsers: migrating ==================================
-- create_table(:users)
   -> 0.0137s
== 20230723165832 CreateUsers: migrated (0.0137s) =========================

> bin/rails c

irb(main):001:0> SchemaMigration.all
  SchemaMigration Load (0.5ms)  SELECT "schema_migrations".* FROM "schema_migrations"
 => [#<SchemaMigration:0x0000000112c4dbd8 version: "20230723165832">]
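
To make this bookkeeping concrete, here is a minimal plain-Ruby sketch of the idea: a migration is pending when the timestamp in its file name is not yet recorded in schema_migrations. This is a simplified model, not Rails' actual implementation; the helpers version_of and pending_versions and the hard-coded arrays are assumptions made for the illustration.

```ruby
# Simplified model of how pending migrations are found.
# Plain Ruby for illustration only, not the real Rails internals.

# Extract the leading timestamp (the "version") from a migration file name.
def version_of(filename)
  File.basename(filename)[/\A\d+/]
end

# Versions present on disk but not yet recorded in schema_migrations.
def pending_versions(migration_files, recorded_versions)
  migration_files.map { |f| version_of(f) } - recorded_versions
end

files = ["db/migrate/20230723165832_create_users.rb"]

# Before db:migrate the table is empty, so the migration is pending.
puts pending_versions(files, []).inspect

# After db:migrate its version is recorded, so nothing is pending anymore.
puts pending_versions(files, ["20230723165832"]).inspect
```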

Now that we have understood that, squashing our migrations is easy.

As I said Padle is nice squash is better

What is squashing? Squashing is the action of merging all your migrations into a single file.

Why would you do that? Because running migrations means loading every migration file, and that can take ages if you have a lot of them.

But first, let's create another migration:

> rails g model company name:string
      invoke  active_record
      create    db/migrate/20230724134526_create_companies.rb
      create    app/models/company.rb

This generates the following file:

class CreateCompanies < ActiveRecord::Migration[7.0]
  def change
    create_table :companies do |t|
      t.string :name

      t.timestamps
    end
  end
end

After running your migration, if you check your schema_migrations table, you will see a new SchemaMigration object. That is super cool!

irb(main):001:0> SchemaMigration.all
  SchemaMigration Load (0.5ms)  SELECT "schema_migrations".* FROM "schema_migrations"
 => [#<SchemaMigration:0x0000000112c4dbd8 version: "20230723165832">,#<SchemaMigration:0x0000000az2c4efd8 version: "20230724134526">]

Now with our two migrations, we can already squash them. And it is way easier than you think!

After running your migrations, you end up with either a db/schema.rb or a db/structure.sql file, depending on the schema format you chose (config.active_record.schema_format).

With schema.rb, take the statements inside its define block and paste them into the change method of your last migration, in our case:
db/migrate/20230724134526_create_companies.rb

You can rename the file or not, it is up to you. If you do, the class name has to match the new file name, for example:
db/migrate/20230724134526_squash_table.rb

class SquashTable < ActiveRecord::Migration[7.0]
  def change
    create_table "table1", force: :cascade do |t|
      t.string "name"
    end

    create_table "table2", force: :cascade do |t|
      t.string "name"
    end

    create_table "table3", force: :cascade do |t|
      t.string "name"
      t.datetime "created_at", null: false
      t.datetime "updated_at", null: false
    end
  end
end
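
The copy-and-paste step can also be scripted. The sketch below is one possible way to do it in plain Ruby, assuming a Rails 7.0 style db/schema.rb; the helper name squash_migration_source and the sample schema text are hypothetical and only serve the illustration. It takes everything between the define line and the final end, and wraps it in a migration class.

```ruby
# Build the source of a squashed migration from the text of db/schema.rb.
# The helper name and the sample schema below are illustrative assumptions.

def squash_migration_source(schema_text, class_name: "SquashTable", rails_version: "7.0")
  lines = schema_text.lines.map(&:chomp)

  # The statements we want sit between the "define ... do" line and the last "end".
  first = lines.index { |l| l.match?(/ActiveRecord::Schema.*\.define\(.*\) do\s*\z/) }
  last  = lines.rindex { |l| l.strip == "end" }
  if first.nil? || last.nil? || last <= first
    raise ArgumentError, "could not find a schema definition block"
  end

  # Indent the body by two more spaces so it sits inside "def change".
  body = lines[(first + 1)...last].map { |l| l.empty? ? l : "  #{l}" }

  [
    "class #{class_name} < ActiveRecord::Migration[#{rails_version}]",
    "  def change",
    *body,
    "  end",
    "end"
  ].join("\n") + "\n"
end

# Made-up schema text; in a real app you would read db/schema.rb instead.
sample_schema = <<~RUBY
  ActiveRecord::Schema[7.0].define(version: 2023_07_24_134526) do
    create_table "companies", force: :cascade do |t|
      t.string "name"
      t.datetime "created_at", null: false
      t.datetime "updated_at", null: false
    end

    create_table "users", force: :cascade do |t|
      t.string "name"
      t.datetime "created_at", null: false
      t.datetime "updated_at", null: false
    end
  end
RUBY

puts squash_migration_source(sample_schema)
```

Keep in mind that force: :cascade drops an existing table before creating it, so a squashed file like this should only ever run against an empty database.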

Then you can delete the first migration file and run rails db:migrate again!

Nothing happens, right? That is expected: the schema_migrations table already contains the version of this migration, and renaming the file does not change that version, so it will not be rerun. Unless you drop your database and run your migrations again, like this:

> rails db:drop db:create db:migrate
== 20230724134526 SquashTable: migrating ==================================
-- create_table("table1", {:force=>:cascade})
   -> 0.0031s
-- create_table("table2", {:force=>:cascade})
   -> 0.0018s
-- create_table("table3", {:force=>:cascade})
   -> 0.0017s
== 20230724134526 SquashTable: migrated (0.0066s) =========================

This will run as before, except that you will not be forced to load thousands of migrations, and it will run much faster locally and in your CI.
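
Why renaming the file changed nothing, while dropping the database did, can be sketched in a few lines of plain Ruby. This is an illustration under simplified assumptions, not Rails' real code; version_of and pending? are made-up helper names, and the arrays stand in for the content of schema_migrations.

```ruby
# Only the timestamp prefix of the file name identifies a migration.
# Plain-Ruby illustration with made-up helpers, not Rails internals.

def version_of(filename)
  File.basename(filename)[/\A\d+/]
end

# A migration is pending when its version is not recorded yet.
def pending?(filename, recorded_versions)
  !recorded_versions.include?(version_of(filename))
end

original = "db/migrate/20230724134526_create_companies.rb"
renamed  = "db/migrate/20230724134526_squash_table.rb"

# Renaming keeps the version, so both names identify the same migration.
puts version_of(original) == version_of(renamed)

# Existing database: the version is already recorded, so nothing is rerun.
puts pending?(renamed, ["20230723165832", "20230724134526"])

# Freshly recreated database (db:drop db:create): nothing is recorded yet.
puts pending?(renamed, [])
```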

[Edit] If you are doing this on a production database, a lot of things have to be taken into account. But overall, you have to delete the schema_migrations entries for migrations that no longer exist :).
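
One way to read this note: on a database that already ran the old migrations, schema_migrations still lists versions whose files were deleted by the squash. Here is a minimal plain-Ruby sketch of spotting them; orphaned_versions is a hypothetical helper, and the arrays are made-up stand-ins for the table content and the db/migrate folder.

```ruby
# Versions recorded in schema_migrations whose migration file no longer exists.
# Hypothetical helper for illustration, working on plain arrays.
def orphaned_versions(recorded_versions, migration_files)
  on_disk = migration_files.map { |f| File.basename(f)[/\A\d+/] }
  recorded_versions - on_disk
end

# What an existing database remembers after both original migrations ran.
recorded = ["20230723165832", "20230724134526"]

# What is left in db/migrate after the squash.
files = ["db/migrate/20230724134526_squash_table.rb"]

puts orphaned_versions(recorded, files).inspect
```

On a real database, bin/rails db:migrate:status lists such entries with NO FILE in place of the migration name.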

Conclusion

As you have seen, Rails migrations, and squashing them, are not so frightening.

In this article, we gained a better understanding of Rails migrations and of how to squash them to improve performance.

I am sure you will agree that understanding Rails internals is thrilling. See you in the next article :)

If you have any questions or tips, please do not hesitate to leave a comment :).

PS: Yes, there is a typo in this article: I meant Padel. We are all humans after all :P.

Top comments (3)

Martin Meier • Aug 10 '23

TL;DR: don't do it!

By squashing the migrations you miss out on all the benefits of migrations ... and it's redundant and unnecessary.

When migrating a database with

rails db:migrate

Rails also runs db:schema:dump, which creates schema.rb (don't use structure.sql).

When setting up a new instance of your app, you run

rails db:setup

which creates the database and runs only the one "migration", schema.rb. So what you want to achieve is already built into Rails.

The benefits of incremental migration are

documentation: you have a log of the evolution of your db schema
versioning of the db schema
tracking of needed migrations
can be rolled back (or at least should be)

development in branches

Rolling back migrations is essential when you develop with feature branches.

Think of developing in a feature branch with a new migration.

The db schema of your branch differs from trunk (main/master).

Now you have to fix something in your trunk (or merge a different feature branch or do something in another branch, ...).

Then you have to roll back the migrations of your current branch, switch branches, and run the migrations of the new branch.

multiple instances with different versions

I normally have several instances of my projects. At least three stages (development / staging / production), sometimes multiple instances (customers). Each instance may be on a different release.

To upgrade an instance, all migrations of the target release / commit that have not been run yet must be run, but only those.

Migrations that have already been run must not be run again (they would produce errors).

Rails handles all this for you with db:migrate.

By migrating, you preserve existing data; it's not a recreation of the database.

conclusion

don't squash your migrations.
use db:setup to set up a database from scratch (one migration of schema.rb)
use db:migrate to update instances of your app

Lucas Barret • Aug 10 '23 • Edited

Thanks for your comment! This is really an edge case indeed, and you should be really careful when you do it.
The purpose of this article was to better understand migrations. :P
So, in your opinion, db:setup is enough to avoid long seeding when running E2E tests in CI, without squashing migrations?

Justin Tanner • Jun 20 • Edited

There are many good reasons to squash migrations.

You have some irreversible data migrations alongside your schema migrations
Your migrations are slow to run
You're upgrading Rails / Ruby and need to reduce the size of your codebase
You're forking a project into a new project and don't need the full migration history
Your migrations have broken code in them

For separating data migrations, this gem is fantastic:

github.com/ilyakatz/data-migrate
