We had a migration that failed to run in QA that had both schema and data changes. In this document I am going to cover what happened, how we fixed it, and explore what we can do differently in the future.

An Example

Let's set up a small example that will help illustrate some issues we want to focus on. We have a book shelf app that allows us to track authors and books.

Here is the schema for our authors table:

> \\d authors
                            Table "public.authors"
   Column   |            Type             |             Modifiers
------------+-----------------------------+------------------------------------
 id         | uuid                        | not null default gen_random_uuid()
 name       | character varying           | not null
 genre      | character varying           | not null
 created_at | timestamp without time zone | not null
 updated_at | timestamp without time zone | not null
Indexes:
    "authors_pkey" PRIMARY KEY, btree (id)
Referenced by:
    TABLE "books" CONSTRAINT "fk_rails_53d51ce16a" FOREIGN KEY (author_id) REFERENCES authors(id)

Here is the schema for our books table:

> \\d books
                                Table "public.books"
      Column      |            Type             |             Modifiers
------------------+-----------------------------+------------------------------------
 id               | uuid                        | not null default gen_random_uuid()
 title            | character varying           | not null
 publication_year | integer                     | not null
 created_at       | timestamp without time zone | not null
 updated_at       | timestamp without time zone | not null
 author_id        | uuid                        |
Indexes:
    "books_pkey" PRIMARY KEY, btree (id)
    "index_books_on_author_id" btree (author_id)
Foreign-key constraints:
    "fk_rails_53d51ce16a" FOREIGN KEY (author_id) REFERENCES authors(id)

We can see that authors and books are related to one another by a foreign key relationship. We naively put the genre column on authors which makes it hard to represent any authors that have written books across different genres. We'd like to move the genre column onto books. We already have data consistent of a number of authors and books, so that data will need to get migrated along with the schema change.

Assume an environment where Rails has cached the schema — to ensure this has happened locally, we can run rails db:schema:cache:dump and Rails will by default pick up and use this schema cache file.

Here is a migration that moves the genre column from the authors table over to the books table (leaving the original genre column intact for now) and migrates the existing data:

  def up
    add_column :books, :genre, :string

    GenericAuthor.find_each do |author|
      book = GenericBook.find_by(author_id: author.id)

      puts "Moving #{author.genre} from #{author.name} to #{book.title}"

      update_attrs = { genre: author.genre }
      book.update!(update_attrs)
    end

    puts "\\n### Genres have supposedly been migrated ###\\n\\n"

    GenericBook.find_each do |book|
      puts "The genre of #{book.title} is #{book.genre || 'ø'}"
    end
  end

Here is the output of running that migration:

$ rails db:forward STEP=1
== 20190518214106 MoveGenreFromAuthorsToBooks: migrating ======================
-- add_column(:books, :genre, :string)
   -> 0.0009s
Moving science fiction from Octavia Butler to Fledgling
Moving essay from David Sedaris to Me Talk Pretty One Day
Moving technical from Martin Fowler to Refactoring

### Genres have supposedly been migrated ###

The genre of Refactoring is ø
The genre of Fledgling is ø
The genre of Me Talk Pretty One Day is ø
== 20190518214106 MoveGenreFromAuthorsToBooks: migrated (0.0209s) =============

As we can see the migration runs without error, but it has not done what we intended or expected — the genre data that we attempted to migrate to the books table is nowhere to be seen, those columns have been left as null. We have silently lost data, good thing we didn't destroy the original genre column yet.

Why did this happen?

It has to do with the schema caching that we did before running this migration. It is expensive for Rails to load and check the current schema from the database on every database interaction, so it caches a representation of the schema with all table and column information. Despite having added the genre column to books in the first part of our migration, Rails believes — according to its cache — that there is no genre column on books. As a result, when we call update! with attributes that Rails sees as irrelevant to the books table, it happily ignores them.

How can we ensure our migrations do what we expect?

We need to convince Rails that the books table does in fact have a genre column. We can do this by asking Rails to go grab a fresh list of columns for the books table. There is a method for this reset_column_information. Here is an updated migration:

  def up
    add_column :books, :genre, :string

    GenericBook.reset_column_information

    GenericAuthor.find_each do |author|
      book = GenericBook.find_by(author_id: author.id)

      puts "Moving #{author.genre} from #{author.name} to #{book.title}"

      update_attrs = { genre: author.genre }
      book.update!(update_attrs)
    end

    puts "\\n### Genres have supposedly been migrated ###\\n\\n"

    GenericBook.find_each do |book|
      puts "The genre of #{book.title} is #{book.genre || 'ø'}"
    end
  end

By placing this call to reset_column_information in between our schema change and our data changes, we ensure Rails has a valid cache of the column information for our books table. If we rerun this migration now, we'll see the output we were expecting: