Squashing and optimizing migrations in Django

Check out the new site at https://rkblog.dev.

25 May 2015

With Django 1.7 we got built-in migrations and a management command to squash a set of existing migrations into one optimized migration. This speeds up building the test database and lets us remove some legacy code and history. Squashing works, but it still has some rough edges, and getting the best out of a squashed migration requires some manual work. Here are a few tips for squashing and optimizing squashed migrations.

How to squash migrations

To squash migrations in a given application, use the squashmigrations management command. Pass it the application name and the number of the migration up to which you want to squash (usually the latest one). For example, python manage.py squashmigrations vanilla 0003 squashes the vanilla app's migrations up to 0003. Django generates a squashed migration file that contains all the operations from the existing migrations. Where possible it reduces the number of operations, for example by folding AddField operations into the model's CreateModel operation.
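To illustrate what "reducing the number of operations" means, here is a deliberately simplified toy model in plain Python. The Op tuple, the TRANSPARENT set and fold_add_fields are hypothetical stand-ins, not Django's API. Django's real optimizer (django.db.migrations.optimizer.MigrationOptimizer) handles many more operation types and safety checks. The toy folds each AddField into its model's CreateModel unless some other kind of operation sits in between:

```python
from typing import List, NamedTuple, Tuple


class Op(NamedTuple):
    """Hypothetical, simplified stand-in for a migration operation."""
    kind: str                     # "CreateModel", "AddField", "AlterUniqueTogether", ...
    model: str                    # lower-case model name
    fields: Tuple[str, ...] = ()  # field names the operation carries


# Toy rule: only these kinds can be "looked through" when searching
# backwards for the CreateModel that an AddField can be folded into.
TRANSPARENT = {"CreateModel", "AddField"}


def fold_add_fields(ops: List[Op]) -> List[Op]:
    """Fold each AddField into its model's earlier CreateModel, if nothing blocks it."""
    result: List[Op] = []
    for op in ops:
        if op.kind != "AddField":
            result.append(op)
            continue
        for i in range(len(result) - 1, -1, -1):
            prev = result[i]
            if prev.kind == "CreateModel" and prev.model == op.model:
                # Merge: the field becomes part of the CreateModel.
                result[i] = prev._replace(fields=prev.fields + op.fields)
                break
            if prev.kind not in TRANSPARENT:
                # Barrier (e.g. AlterUniqueTogether, RunPython): keep the AddField.
                result.append(op)
                break
        else:
            # No CreateModel in this list (model created elsewhere): keep it.
            result.append(op)
    return result


news_ops = [
    Op("CreateModel", "news", ("id", "title")),
    Op("AddField", "news", ("text",)),
    Op("AddField", "news", ("published_date",)),
]
print(fold_add_fields(news_ops))  # prints a single CreateModel op with all four fields
```

Fed the Person operations shown later (CreateModel, AlterUniqueTogether, then two AddFields), the same toy keeps all four operations, because AlterUniqueTogether acts as a barrier.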

Depending on the complexity of the migrations, the squashed migration may need some code editing before it works. If your migrations contain RunPython operations, you will have to copy their functions into the squashed migration. Even then, the squash may not have all of its operations optimized, and further optimization needs our manual help.

Migrations are worth squashing in three situations:

- running them all takes a lot of time (for example when the test database is built);
- they contain code (such as data migrations);
- they carry a legacy model history that we want to wipe out.

Before we get to optimizing squashes, let's take a look at the squashing procedure:

Create a squashed migration and, if needed, edit it until it works. Then commit the change.
Wait until every database has been migrated to the state of the squashed migration. Then remove the old migrations, and remove the replaces list from the squashed migration. In projects with multiple applications you will also have to update dependencies in other apps that point to the deleted migrations (change them to the name of the squashed migration file). Commit that.
Now we can play with the squash to reduce the number of operations it contains. In many cases the squash will have some leftovers that Django wasn't able to optimize on its own.
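The dependency update in the second step is easy to get wrong in a multi-app project. Here is a minimal sketch of the rewrite rule in plain Python. It assumes you have already read a migration's dependencies list and the squash's replaces list as (app_label, migration_name) tuples. The function name rewrite_dependencies is hypothetical, and the sketch transforms data rather than editing files. The squash name in the example follows Django's usual "0001_squashed_<last migration>" pattern; use whatever file name Django actually generated for you.

```python
from typing import Iterable, List, Tuple

Key = Tuple[str, str]  # (app_label, migration_name), as in a dependencies list


def rewrite_dependencies(dependencies: Iterable[Key],
                         replaces: Iterable[Key],
                         squash: Key) -> List[Key]:
    """Point dependencies on replaced (soon deleted) migrations at the squash."""
    replaced = set(replaces)
    result: List[Key] = []
    for dep in dependencies:
        new_dep = squash if dep in replaced else dep
        if new_dep not in result:  # don't list the squash twice
            result.append(new_dep)
    return result


# Example values; the squash name is an assumption, check your generated file.
replaces = [("vanilla", "0001_initial"), ("vanilla", "0002_news_text"),
            ("vanilla", "0003_news_published_date")]
squash = ("vanilla", "0001_squashed_0003_news_published_date")
print(rewrite_dependencies([("vanilla", "0002_news_text"), ("auth", "0001_initial")],
                           replaces, squash))
```

Dependencies on apps that weren't squashed (auth here) pass through unchanged.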

Running migrate on an existing database should report FAKED for the squash, because it is recognized as a new initial migration and skipped. In many cases, however, I found that this doesn't work: Django tries to apply the squashed migration and fails because the tables already exist. A solution is to run migrate with --fake. That works, but it is a bad fit for a deployment process. You have to run it by hand and make sure there are no pending migrations from other developers that would get faked as well. Look for fixes in new Django releases: squashing is being polished a lot, and newly reported bugs help catch the remaining buggy cases.
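Before running migrate with --fake by hand, we can check that nothing else is pending. Here is a minimal sketch that parses the text printed by python manage.py showmigrations (available since Django 1.8). It assumes that output format: app labels on unindented lines, followed by indented "[X] name" or "[ ] name" entries. Capture the output however you like; adjust the parsing if your Django version prints something different. The function names are hypothetical.

```python
from typing import List, Optional, Tuple


def unapplied_migrations(showmigrations_output: str) -> List[Tuple[str, str]]:
    """Parse showmigrations text and return (app_label, name) pairs marked [ ]."""
    pending: List[Tuple[str, str]] = []
    app: Optional[str] = None
    for line in showmigrations_output.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            app = line.strip()  # unindented lines name the app
            continue
        entry = line.strip()
        if entry.startswith("[ ]") and app is not None:
            pending.append((app, entry[3:].strip()))
    return pending


def safe_to_fake(showmigrations_output: str, squash: Tuple[str, str]) -> bool:
    """True only if the squash is the single unapplied migration in the project."""
    return unapplied_migrations(showmigrations_output) == [squash]


# Sample output (assumed format) with only the squash pending.
sample = """\
auth
 [X] 0001_initial
vanilla
 [ ] 0001_squashed_0003_news_published_date
"""
print(safe_to_fake(sample, ("vanilla", "0001_squashed_0003_news_published_date")))  # True
```

If another developer's migration is also pending, safe_to_fake returns False, and faking would silently skip that migration too.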

Before committing a squash (or, later, an optimized squash) it's essential to test it. To test migrations, just launch any test: Django will create a clean database and apply all migrations. If the migrations apply without errors, the squash is good. If you don't want to use --fake during deployment, also check that the migrate command passes on an existing database.

Squashes and optimizations

Ideal case - simplest example

Let's start with a simple example that gives a perfect squash on its own. We create a News model and add new fields to it in subsequent migrations:

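from django.db import models


# Version 1 of models.py: migration 0001_initial creates the model
class News(models.Model):
    title = models.CharField(max_length=300)

# Version 2: migration 0002_news_text adds the text field
class News(models.Model):
    title = models.CharField(max_length=300)
    text = models.TextField()

# Version 3: migration 0003_news_published_date adds published_date
class News(models.Model):
    title = models.CharField(max_length=300)
    text = models.TextField()
    published_date = models.DateTimeField(auto_now_add=True)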

In the first migration we create the model, in the second we add the text field, and in the last one we add the published_date field. Squashing gives the best possible result, a single CreateModel operation:

import datetime

from django.db import migrations, models
from django.utils.timezone import utc


class Migration(migrations.Migration):

    replaces = [('vanilla', '0001_initial'), ('vanilla', '0002_news_text'), ('vanilla', '0003_news_published_date')]

    dependencies = [
    ]

    operations = [
        migrations.CreateModel(
            name='News',
            fields=[
                ('id', models.AutoField(verbose_name='ID', serialize=False, auto_created=True, primary_key=True)),
                ('title', models.CharField(max_length=300)),
                ('text', models.TextField(default='')),
                ('published_date', models.DateTimeField(auto_now_add=True, default=datetime.datetime(2015, 5, 24, 17, 26, 25, 326059, tzinfo=utc))),
            ],
        ),
    ]

If we have at least one test in our project, we can run it to test the squashed migration. Here vanilla is my test app name, and -v=2 raises the verbosity so that the list of applied migrations is shown:

python manage.py test vanilla -v=2

According to the documentation, we can now remove the old migrations and the replaces list, and everything should work. In my case, on Django 1.8.2, Django tries to apply the squashed migration to an existing database and fails. This may be a bug, and I've already reported it on the Django Trac. Migrating with --fake solves this.

Reordering operations

Some operations, such as RunPython, RunSQL, or index and constraint operations like AlterUniqueTogether, stop the optimizer from looking further for operations to merge. Once we have the initial squash, we can reorder its operations so that the operations on a given model come just after its CreateModel. We then create a squash of the squash just to get an optimized list of operations. I delete that second squash afterwards and copy its operations list into the initial squash.

For example, let's create a Person model and then add some fields to it:

from django.db import models


# Version 1 of models.py (first migration)
class Person(models.Model):
    first_name = models.CharField(max_length=100)
    last_name = models.CharField(max_length=100)

    class Meta:
        unique_together = ('first_name', 'last_name')


# Version 2 (second migration): adds age and gender
class Person(models.Model):
    first_name = models.CharField(max_length=100)
    last_name = models.CharField(max_length=100)
    age = models.IntegerField(default=0)
    gender = models.CharField(choices=(('m', 'm'), ('f', 'f')), default='m', max_length=1)

    class Meta:
        unique_together = ('first_name', 'last_name')

We have two migrations, and squashing them results in:

    operations = [
        migrations.CreateModel(
            name='Person',
            fields=[
                ('id', models.AutoField(primary_key=True, serialize=False, auto_created=True, verbose_name='ID')),
                ('first_name', models.CharField(max_length=100)),
                ('last_name', models.CharField(max_length=100)),
            ],
        ),
        migrations.AlterUniqueTogether(
            name='person',
            unique_together=set([('first_name', 'last_name')]),
        ),
        migrations.AddField(
            model_name='person',
            name='age',
            field=models.IntegerField(default=0),
        ),
        migrations.AddField(
            model_name='person',
            name='gender',
            field=models.CharField(choices=[('m', 'm'), ('f', 'f')], max_length=1, default='m'),
        ),
    ]

As you can see, the two AddField operations haven't been merged into CreateModel because of the AlterUniqueTogether operation between them. This may be fixed in the future, but for now the solution is to move those operations up, just after CreateModel, and then generate a squash of the squash to get an optimized version:

Optimizing...
  Optimized from 4 operations to 2 operations.

When we do it we get a perfect squash:

    operations = [
        migrations.CreateModel(
            name='Person',
            fields=[
                ('id', models.AutoField(serialize=False, verbose_name='ID', auto_created=True, primary_key=True)),
                ('first_name', models.CharField(max_length=100)),
                ('last_name', models.CharField(max_length=100)),
                ('age', models.IntegerField(default=0)),
                ('gender', models.CharField(default='m', choices=[('m', 'm'), ('f', 'f')], max_length=1)),
            ],
        ),
        migrations.AlterUniqueTogether(
            name='person',
            unique_together=set([('first_name', 'last_name')]),
        ),
    ]
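The manual reordering step can be expressed as a small transformation on a toy operation list. In practice you cut and paste the operations in the migration file; the Op tuple and hoist_add_fields below are hypothetical. The sketch moves each AddField up to just after its model's CreateModel and keeps the AddFields in their original order. It does not check whether the operations it jumps over refer to the moved field. Here unique_together only uses first_name and last_name, so moving age and gender is safe, but in general that must be verified by hand.

```python
from typing import List, NamedTuple, Tuple


class Op(NamedTuple):
    """Hypothetical, simplified stand-in for a migration operation."""
    kind: str
    model: str
    fields: Tuple[str, ...] = ()


def hoist_add_fields(ops: List[Op]) -> List[Op]:
    """Move each AddField up to just after its model's CreateModel.

    Caution: this does not check whether the operations it jumps over
    (e.g. AlterUniqueTogether) refer to the moved field; verify by hand.
    """
    result: List[Op] = []
    for op in ops:
        if op.kind == "AddField":
            idx = next((i for i, o in enumerate(result)
                        if o.kind == "CreateModel" and o.model == op.model), None)
            if idx is not None:
                pos = idx + 1
                # Skip AddFields already hoisted for this model to keep their order.
                while (pos < len(result) and result[pos].kind == "AddField"
                       and result[pos].model == op.model):
                    pos += 1
                result.insert(pos, op)
                continue
        result.append(op)
    return result


person_ops = [
    Op("CreateModel", "person", ("id", "first_name", "last_name")),
    Op("AlterUniqueTogether", "person"),
    Op("AddField", "person", ("age",)),
    Op("AddField", "person", ("gender",)),
]
for op in hoist_add_fields(person_ops):
    print(op.kind, op.model, op.fields)
```

After this reordering, squashing again lets the optimizer fold both AddFields into CreateModel, which is the 4-to-2 reduction shown above.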

When the squash contains multiple models, move the operations for every model to just after that model's CreateModel operation. With relations you must preserve the correct order. Sometimes the model that a relation points to ends up after operations on the models that create those relations. A solution is to move that model's CreateModel before all of them, and to place the AddField operations that create relations just after each model's CreateModel operation. That way only CreateModel operations remain after optimization.
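Getting that order right by hand is error-prone, so a quick check helps. Below is a toy sketch; the Op tuple and ordering_problems are hypothetical, and it uses plain lower-case model names instead of Django's app_label-qualified references. It reports any operation that uses a model before that model's CreateModel, or after its DeleteModel. The models passed in `existing` are assumed to be created outside this migration, for example in other apps.

```python
from typing import FrozenSet, Iterable, List, NamedTuple, Set


class Op(NamedTuple):
    """Hypothetical, simplified stand-in for a migration operation."""
    kind: str                           # "CreateModel", "AddField", "DeleteModel", ...
    model: str                          # model the operation acts on
    refs: FrozenSet[str] = frozenset()  # models that relation fields point to


def ordering_problems(ops: Iterable[Op], existing: Iterable[str] = ()) -> List[str]:
    """List operations that use a model before it is created or after it is deleted."""
    alive: Set[str] = set(existing)
    problems: List[str] = []
    for pos, op in enumerate(ops):
        needed = set(op.refs)
        if op.kind == "CreateModel":
            needed.discard(op.model)  # a self-referencing FK is fine
        else:
            needed.add(op.model)
        missing = needed - alive
        if missing:
            problems.append(f"#{pos} {op.kind}({op.model}) needs {', '.join(sorted(missing))}")
        if op.kind == "CreateModel":
            alive.add(op.model)
        elif op.kind == "DeleteModel":
            alive.discard(op.model)
    return problems


# Hypothetical models: book has a ForeignKey to author.
bad = [Op("CreateModel", "book", frozenset({"author"})), Op("CreateModel", "author")]
good = [Op("CreateModel", "author"), Op("CreateModel", "book", frozenset({"author"}))]
print(ordering_problems(bad))   # ['#0 CreateModel(book) needs author']
print(ordering_problems(good))  # []
```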

Removing operations

Next case: we create a Person model in the first migration and a NewPerson model in the second. The third uses a RunPython operation for a data migration, and the fourth removes the Person model.

When squashing RunPython operations we will get a notice:

Manual porting required
  Your migrations contained functions that must be manually copied over, as we could not safely copy their implementation. See the comment at the top of the squashed migration for details.

You have to copy the function into the squash and fix the path in the RunPython operation for it to work. The original squash looks like this (notice the RunPython operation, whose path needs to be fixed):

    operations = [
        migrations.CreateModel(
            name='Person',
            fields=[
                ('id', models.AutoField(primary_key=True, serialize=False, verbose_name='ID', auto_created=True)),
                ('first_name', models.CharField(max_length=100)),
                ('last_name', models.CharField(max_length=100)),
            ],
        ),
        migrations.AlterUniqueTogether(
            name='person',
            unique_together=set([('first_name', 'last_name')]),
        ),
        migrations.CreateModel(
            name='NewPerson',
            fields=[
                ('id', models.AutoField(primary_key=True, serialize=False, verbose_name='ID', auto_created=True)),
                ('first_name', models.CharField(max_length=100)),
                ('last_name', models.CharField(max_length=100)),
            ],
        ),
        migrations.AlterUniqueTogether(
            name='newperson',
            unique_together=set([('first_name', 'last_name')]),
        ),
        migrations.RunPython(
            code=removing.migrations.0003_auto_20150524_1933.foo,
        ),
        migrations.DeleteModel(
            name='Person',
        ),
    ]
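The generated reference above cannot work as written. The module segment 0003_auto_20150524_1933 starts with a digit, so it is not a valid Python identifier. A hypothetical helper (functions_to_port) can scan a squashed migration's source for code= and reverse_code= arguments with this problem and list the functions still to be copied. It is only a text-level sketch. Django itself also lists these functions in the comment at the top of the squash.

```python
import re
from typing import List, Tuple

# Matches code=... / reverse_code=... arguments written as dotted paths.
_REF = re.compile(r"\b(reverse_code|code)=([A-Za-z_][\w.]*)")


def functions_to_port(migration_source: str) -> List[Tuple[str, str]]:
    """Return (module, function) pairs that must be copied into the squash."""
    found: List[Tuple[str, str]] = []
    for _arg, path in _REF.findall(migration_source):
        module, _, func = path.rpartition(".")
        # A module segment starting with a digit (e.g. 0003_auto_...) is not a
        # Python identifier, so this reference cannot work as written.
        if any(part[:1].isdigit() for part in module.split(".")):
            found.append((module, func))
    return found


squash_source = """
        migrations.RunPython(
            code=removing.migrations.0003_auto_20150524_1933.foo,
        ),
"""
print(functions_to_port(squash_source))
# [('removing.migrations.0003_auto_20150524_1933', 'foo')]
```

Once a function has been copied into the squash and referenced by its plain name (code=foo), the helper no longer reports it.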

Data migrations can be removed from a squash. If every database has already migrated its data from the old to the new state, we don't need the data migration any more (unless we plan to roll back, that is). So remove the RunPython operation and the function used with it. Next we can optimize the model deletion in one of two ways. Either delete all operations on the old model, or move the DeleteModel operation to just after its AlterUniqueTogether operation and squash again. Either way you get a clean squashed migration with no redundant operations.
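The "delete all operations on the old model" option can be sketched on a toy operation list. The Op tuple and prune_squash are hypothetical. The sketch drops every RunPython operation, plus every operation on a model that the squash both creates and deletes. A DeleteModel for a model created outside the squash is kept. The sketch assumes no model is deleted and later re-created under the same name. It does not detect relation fields on other models that point to the deleted model; those must still be removed by hand.

```python
from typing import List, NamedTuple


class Op(NamedTuple):
    """Hypothetical, simplified stand-in for a migration operation."""
    kind: str        # "CreateModel", "AlterUniqueTogether", "RunPython", "DeleteModel", ...
    model: str = ""  # empty for operations not tied to one model, like RunPython


def prune_squash(ops: List[Op]) -> List[Op]:
    """Drop RunPython ops and every op on a model the squash both creates and deletes."""
    created = {op.model for op in ops if op.kind == "CreateModel"}
    deleted = {op.model for op in ops if op.kind == "DeleteModel"} & created
    return [op for op in ops
            if op.kind != "RunPython" and op.model not in deleted]


# The operations of the original squash above, in toy form.
squash_ops = [
    Op("CreateModel", "person"),
    Op("AlterUniqueTogether", "person"),
    Op("CreateModel", "newperson"),
    Op("AlterUniqueTogether", "newperson"),
    Op("RunPython"),
    Op("DeleteModel", "person"),
]
print([(op.kind, op.model) for op in prune_squash(squash_ops)])
# [('CreateModel', 'newperson'), ('AlterUniqueTogether', 'newperson')]
```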

In some buggy cases (as of Django 1.8.2), the squashed migration may not work when the deleted model had relations. During migration on a clean database, some operation tries to run on a model that does not exist. This may be fixed in a future version of Django (#24849). For now, the solution is to manually remove the operations on the deleted model.

The end

Migration squashing becomes essential with the new migration system. It's not yet a polished feature, but with constant improvements it should shine in the near future. Try squashing your migrations, look for bugs, and report any you find on the Django Trac.

RkBlog

Django web framework tutorials, 25 May 2015
