We had a migration that failed to run in QA that had both schema and data changes. In this document I am going to cover what happened, how we fixed it, and explore what we can do differently in the future.
Let's set up a small example that will help illustrate some issues we want to focus on. We have a book shelf app that allows us to track authors and books.
Here is the schema for our authors table:
> \\d authors
Table "public.authors"
Column | Type | Modifiers
------------+-----------------------------+------------------------------------
id | uuid | not null default gen_random_uuid()
name | character varying | not null
genre | character varying | not null
created_at | timestamp without time zone | not null
updated_at | timestamp without time zone | not null
Indexes:
"authors_pkey" PRIMARY KEY, btree (id)
Referenced by:
TABLE "books" CONSTRAINT "fk_rails_53d51ce16a" FOREIGN KEY (author_id) REFERENCES authors(id)
Here is the schema for our books table:
> \\d books
Table "public.books"
Column | Type | Modifiers
------------------+-----------------------------+------------------------------------
id | uuid | not null default gen_random_uuid()
title | character varying | not null
publication_year | integer | not null
created_at | timestamp without time zone | not null
updated_at | timestamp without time zone | not null
author_id | uuid |
Indexes:
"books_pkey" PRIMARY KEY, btree (id)
"index_books_on_author_id" btree (author_id)
Foreign-key constraints:
"fk_rails_53d51ce16a" FOREIGN KEY (author_id) REFERENCES authors(id)
We can see that authors
and books
are related to one another by a foreign key relationship. We naively put the genre
column on authors
which makes it hard to represent any authors that have written books across different genres. We'd like to move the genre
column onto books
. We already have data consistent of a number of authors and books, so that data will need to get migrated along with the schema change.
Assume an environment where Rails has cached the schema — to ensure this has happened locally, we can run rails db:schema:cache:dump
and Rails will by default pick up and use this schema cache file.
Here is a migration that moves the genre
column from the authors
table over to the books
table (leaving the original genre
column intact for now) and migrates the existing data:
def up
add_column :books, :genre, :string
GenericAuthor.find_each do |author|
book = GenericBook.find_by(author_id: author.id)
puts "Moving #{author.genre} from #{author.name} to #{book.title}"
update_attrs = { genre: author.genre }
book.update!(update_attrs)
end
puts "\\n### Genres have supposedly been migrated ###\\n\\n"
GenericBook.find_each do |book|
puts "The genre of #{book.title} is #{book.genre || 'ø'}"
end
end
Here is the output of running that migration:
$ rails db:forward STEP=1
== 20190518214106 MoveGenreFromAuthorsToBooks: migrating ======================
-- add_column(:books, :genre, :string)
-> 0.0009s
Moving science fiction from Octavia Butler to Fledgling
Moving essay from David Sedaris to Me Talk Pretty One Day
Moving technical from Martin Fowler to Refactoring
### Genres have supposedly been migrated ###
The genre of Refactoring is ø
The genre of Fledgling is ø
The genre of Me Talk Pretty One Day is ø
== 20190518214106 MoveGenreFromAuthorsToBooks: migrated (0.0209s) =============
As we can see the migration runs without error, but it has not done what we intended or expected — the genre data that we attempted to migrate to the books
table is nowhere to be seen, those columns have been left as null. We have silently lost data, good thing we didn't destroy the original genre
column yet.
Why did this happen?
It has to do with the schema caching that we did before running this migration. It is expensive for Rails to load and check the current schema from the database on every database interaction, so it caches a representation of the schema with all table and column information. Despite having added the genre
column to books
in the first part of our migration, Rails believes — according to its cache — that there is no genre
column on books
. As a result, when we call update!
with attributes that Rails sees as irrelevant to the books
table, it happily ignores them.
How can we ensure our migrations do what we expect?
We need to convince Rails that the books
table does in fact have a genre
column. We can do this by asking Rails
to go grab a fresh list of columns for the books
table. There is a method for this reset_column_information
. Here is an updated migration:
def up
add_column :books, :genre, :string
GenericBook.reset_column_information
GenericAuthor.find_each do |author|
book = GenericBook.find_by(author_id: author.id)
puts "Moving #{author.genre} from #{author.name} to #{book.title}"
update_attrs = { genre: author.genre }
book.update!(update_attrs)
end
puts "\\n### Genres have supposedly been migrated ###\\n\\n"
GenericBook.find_each do |book|
puts "The genre of #{book.title} is #{book.genre || 'ø'}"
end
end
By placing this call to reset_column_information
in between our schema change and our data changes, we ensure Rails has a valid cache of the column information for our books
table. If we rerun this migration now, we'll see the output we were expecting: