SQLite DB Migrations with PRAGMA user_version

This blog runs on a simple homegrown blogging engine that I wrote, backed by a SQLite database. I have a function in the Flask app that performs database migrations. My current approach has been to keep a folder full of migrations and run them sequentially whenever the app starts.

This works well for adding and removing tables, since SQLite has the handy CREATE TABLE IF NOT EXISTS (and DROP TABLE IF EXISTS) options. However, when you are altering an existing table, this entire model falls apart: ALTER TABLE has no IF NOT EXISTS equivalent, so re-running an old migration against an up-to-date database simply fails.

Practically, this means that outside of a fresh install my database migrations are useless.

I am still being stubborn and not using a well-written solution like Alembic (which I would highly recommend for a “serious” project) for this blog. Instead, I discovered that SQLite comes with a built-in mechanism to keep track of the schema version. This is the PRAGMA statement, specifically user_version.

Using PRAGMA user_version for DB Migrations

My migrations folder structure looks like this:

.
├── blog.db
├── blog.py
├── __init__.py
├── migrations
│   ├── 0001_initial_schema.sql
│   ├── 0002_add_unique_index_to_posts_tags.sql
│   ├── 0003_add_fts.sql
│   ├── 0004_add_column_to_post.sql
│   ├── 0005_add_comments_table.sql
│   └── 0006_add_admin_flag_to_comments.sql

As you can see, the naming convention is 000N_migration_description.sql. Each migration file contains the following statement:

PRAGMA user_version=N; (where N is the numeric prefix from the file name)

This bumps user_version to match the version defined by the file name, so applying a migration also records that it has been applied.
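
To see what this pragma looks like in isolation, here is a quick sketch from Python (nothing blog-specific): a fresh database starts at user_version = 0, and the value can be set to any integer.

import sqlite3

conn = sqlite3.connect(':memory:')

# a brand new database starts at user_version = 0
print(conn.execute('PRAGMA user_version').fetchone()[0])  # 0

# pragma values cannot be bound as query parameters, so the version number
# is written directly into the statement
conn.execute('PRAGMA user_version = 4')
print(conn.execute('PRAGMA user_version').fetchone()[0])  # 4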

The code to do stuff with the database is shown below:

import os
import sqlite3

from flask import g


def connect_db():
    """Connects to Database."""
    rv = sqlite3.connect(
        app.config['DATABASE'],  # `app` is the Flask app object, defined elsewhere in this module
        detect_types=sqlite3.PARSE_DECLTYPES | sqlite3.PARSE_COLNAMES)
    rv.row_factory = sqlite3.Row
    return rv


def get_db():
    """Opens new db connection if there is not an
    existing one for the current app ctx.
    """
    if not hasattr(g, 'sqlite_db'):
        g.sqlite_db = connect_db()
    return g.sqlite_db


def migrate_db():
    """Run database migrations."""

    def get_script_version(path):
        return int(path.split('_')[0].split('/')[1])

    db = get_db()
    current_version = db.cursor().execute('pragma user_version').fetchone()[0]

    directory = os.path.dirname(__file__)
    migrations_path = os.path.join(directory, 'migrations/')
    migration_files = list(os.listdir(migrations_path))
    for migration in sorted(migration_files):
        path = "migrations/{0}".format(migration)
        migration_version = get_script_version(path)

        if migration_version > current_version:
            print("applying migration {0}".format(migration_version))
            with app.open_resource(path, mode='r') as f:
            db.cursor().executescript(f.read())
            print("database now at version {0}".format(migration_version))
        else:
            print("migration {0} already applied".format(migration_version))

The relevant part for this blog post is the migrate_db() function. Three things are happening:

  1. The get_script_version() helper function extracts the integer from the migration name.
  2. current_version gets the current value of user_version of your database.
  3. We iterate over each migration file in the migrations folder and perform a simple check: if the migration version is larger than current_version, we run the migration; otherwise it gets skipped.

This handles most cases and allows for a smooth upgrade path if anyone ever decides to start using this blogging engine themselves. I am still pretty happy with this approach because it is essentially a fully functional migration system in just a handful of lines of Python.
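
For completeness, here is a rough sketch (not part of the code above) of how migrate_db() gets invoked at startup. Since get_db() relies on flask.g, it needs to run inside an application context:

# a minimal sketch, assuming `app` is the Flask application object from above
with app.app_context():
    migrate_db()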

I Want to Become a Core Python Developer

I’ve been tinkering with python for almost five years now. I am absolutely in love with the language. My new goal is to make enough contributions to the project to join the core team.

This post is my attempt to keep a list of all that I’ve done in this endeavor. I will keep this up to date on a monthly basis.

Short Term Goals

  • Ship some actual code. Focus on improving test coverage.
  • Attend the next available local meetup.
  • Get this PR merged.
  • Work on some other low hanging fruit from bedevere.

November 2017

Code

  • Reported an issue with Vagrant and Ansible on the pythondotorg repo, and assisted with testing the resolution. (Note for any future newbies: reporting issues, writing docs, and testing PRs are all super valuable things that you can do to get more familiar with a project’s code base.)
  • Substantial refactoring of the dev guide merged.

Community

  • Reached out to the core workflow team to see if we could introduce CircleCI into the Python organization. This addresses the PoC shown in this PR.

October 2017

Code

Community

  • Became a PSF Member.
  • Hung out in various IRC channels, notably #python on freenode, and helped out where I could.
  • Joined the PSF Volunteers mailing list and volunteered for opportunities as they came in.
  • Signed up for all the dev-related mailing lists.
  • Joined the BAyPIGgies local python meetup group.

Python Mocks Test Helpers

I’ve been writing a Python wrapper for the CircleCI API over the last week. I wanted to do this “the right way” with test-driven development.

I have a couple of integration tests that actually hit the CircleCI API, but most of the unit tests so far use MagicMock to ensure that the basic functions are working as expected.

This generally involves the tedious process of dumping out JSON, saving it to a file, and then reloading that file later on to actually test it.

I wrote two helper functions that make this process slightly less tedious.

Load Mock

The first is a function that loads a mock response from a file and overrides the client’s _request method to return its contents (typically JSON).

    def loadMock(self, filename):
        """helper function to open mock responses"""
        filename = 'tests/mocks/{0}'.format(filename)

        with open(filename, 'r') as f:
            self.c._request = MagicMock(return_value=f.read())
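
For context, self.c is the client object under test; the surrounding test class looks roughly like this (a sketch, with the wrapper’s import path and constructor assumed on my part):

    import unittest
    from unittest.mock import MagicMock

    from circleci.api import Api  # assumed import path for the wrapper under test


    class TestCircleCIApi(unittest.TestCase):

        def setUp(self):
            # self.c is the client whose internal _request method gets mocked out
            self.c = Api('fake-token')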

Test Helper

The second is a function that runs a real request for the first time and dumps the output to a file.

    def test_helper(self):
        resp = self.c.add_circle_key()
        print(resp)
        with open('tests/mocks/mock_add_circle_key_response', 'w') as f:
            json.dump(resp, f)

Naming it test_helper allows it to be picked up and run when you run your test suite, since by default unittest collects any methods that start with test.

Usage

An actual example is shown below.

    def test_clear_cache(self):
        self.loadMock('mock_clear_cache_response')
        resp = json.loads(self.c.clear_cache('levlaz', 'circleci-sandbox'))

        self.assertEqual('build dependency caches deleted', resp['status'])

Writing the tests is easy: we just copy and paste the name of the file that was created with test_helper and verify that the contents are what we expect them to be.

This approach has been working very well for me so far. One thing to keep in mind when writing these types of tests is that you should also include some general integration tests against the API that you are working with. This way you can catch any regressions in your library in the event that the API changes. However, as a basic sanity check, mocking these requests is good practice and less prone to flakiness.
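
One way to keep those integration tests from blowing up on machines without credentials is to gate them behind an environment variable. A rough sketch (the CIRCLE_TOKEN name is just an example):

    import os
    import unittest


    @unittest.skipUnless(
        os.environ.get('CIRCLE_TOKEN'),
        'integration tests need a real API token')
    class TestCircleCIIntegration(unittest.TestCase):
        """These tests hit the real API, so they only run when a token is set."""

        def setUp(self):
            # construct the real, unmocked client here,
            # e.g. Api(os.environ['CIRCLE_TOKEN'])
            pass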

Injecting Stuff into your Python Path

Similar to a previous post where I wrote about how to run Flask tests without installing your app, another common thing that you might want to be able to do is import your app from some arbitrary script. This is especially useful when running your app with Apache mod_wsgi. This module expects the app to be installed globally, or at least to be on the Python path. Unless you install the app in the traditional sense, this will not be true. The solution is to inject the path prior to running your import statement, like this:

import sys

sys.path.insert(0, '/var/www/blog')
from blog import app as application

This import will actually work.

Using the Flask CLI

Who knew that Flask had a CLI? Previously I just used a manage.py, the way Django does it, to “do stuff”. The CLI is great, but again it follows the theme of kind of wanting you to install your Flask app. (I really should do this.) So in order to get your app to work, you must point the FLASK_APP variable at the actual Python file (not your app package). This is true even if you have a proper Python package. For instance:

export FLASK_APP=blog/blog.py
flask run

Works, while

export FLASK_APP=blog
flask run

Does not. Even though blog consists of:

blog/
  __init__.py
  blog.py
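
My guess is that this is because the blog package’s __init__.py does not expose the app object; the CLI imports whatever FLASK_APP points at and looks for something called app. A sketch of what would likely make export FLASK_APP=blog work (an untested assumption on my part):

# blog/__init__.py
# re-export the app object from blog.py so the Flask CLI can find it
from blog.blog import app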

Running Flask Tests without installing the app

The Flask docs have a great section on testing. However, they assume that you have the app installed with pip install -e ., which I almost never do. (Maybe I should start?) I have had trouble with that approach. So one little hack to inject your app into the searchable Python path is to do something like this:

export BLOG_PATH=$(pwd) && python tests/blog_tests.py

We use $(pwd) so that this will “just work” no matter which computer you run it on (i.e., test, dev, another person’s dev machine). Then in your tests/blog_tests.py file you import your Flask app like this:

import os
import sys
sys.path.insert(0, os.environ.get('BLOG_PATH'))
from blog.blog import app, init_db
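
Putting it together, the top of the test file ends up looking something like this (a sketch; the config key and the smoke test are just illustrative):

import os
import sys
import unittest

sys.path.insert(0, os.environ.get('BLOG_PATH'))
from blog.blog import app, init_db


class BlogTestCase(unittest.TestCase):

    def setUp(self):
        app.config['TESTING'] = True
        self.client = app.test_client()
        # tests that need a fresh schema can call init_db() here

    def test_index(self):
        # illustrative smoke test; real assertions depend on the app
        rv = self.client.get('/')
        self.assertEqual(rv.status_code, 200)


if __name__ == '__main__':
    unittest.main()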

You can see the full details here: https://circleci.com/gh/levlaz/blog. My biggest issue with installing the app in a traditional sense is that when I run it with Apache mod_wsgi, it does not seem to actually copy over the right folders, and Apache is not able to find templates or static files. So that sucks, and I am probably doing something wrong. But this approach works more or less no matter what, so sometimes a hack is better than derping around with no solution (as I did for six hours yesterday).

Show all Flask Routes

One of my favorite things about Rails is rake routes. This command will show you all of the routes that your application is currently aware of. I was searching for something similar in Flask and came across this snippet. Sadly, it did not work for me. I am using Python 3, so that may have something to do with it. In any case, I did not have time to dig into what was wrong, because just the other day I read this amazing introduction to functional Python, where my biggest takeaway was that as soon as you see multiple for loops, it’s time to use map. With my functional skills in tow, I wrote a simple one-liner to print all of my routes.

@manager.command
def routes():
    import pprint
    pprint.pprint(list(map(lambda x: repr(x), app.url_map.iter_rules())))

Running this on braindump shows me everything that I need to know.

[levlaz braindump]$ python3 manage.py routes
["<Rule '/auth/change-password' (GET, HEAD, OPTIONS, POST) -> "
'auth.change_password>',
"<Rule '/auth/change-email' (GET, HEAD, OPTIONS, POST) -> "
'auth.change_email_request>',
"<Rule '/auth/unconfirmed' (GET, HEAD, OPTIONS) -> auth.unconfirmed>",
"<Rule '/auth/register' (GET, HEAD, OPTIONS, POST) -> auth.register>",
"<Rule '/auth/confirm' (GET, HEAD, OPTIONS) -> auth.resend_confirmation>",
"<Rule '/auth/logout' (GET, HEAD, OPTIONS) -> auth.logout>",
"<Rule '/auth/login' (GET, HEAD, OPTIONS, POST) -> auth.login>",
"<Rule '/auth/reset' (GET, HEAD, OPTIONS, POST) -> "
'auth.password_reset_request>',
"<Rule '/empty-trash' (GET, HEAD, OPTIONS) -> main.empty_trash>",
"<Rule '/notebooks' (GET, HEAD, OPTIONS, POST) -> main.notebooks>",
"<Rule '/favorites' (GET, HEAD, OPTIONS) -> main.favorites>",
"<Rule '/settings' (GET, HEAD, OPTIONS) -> main.settings>",
"<Rule '/shutdown' (GET, HEAD, OPTIONS) -> main.server_shutdown>",
"<Rule '/archive' (GET, HEAD, OPTIONS) -> main.view_archive>",
"<Rule '/search' (GET, HEAD, OPTIONS) -> main.search>",
"<Rule '/trash' (GET, HEAD, OPTIONS) -> main.trash>",
"<Rule '/add' (GET, HEAD, OPTIONS, POST) -> main.add>",
"<Rule '/' (GET, HEAD, OPTIONS, POST) -> main.index>",
"<Rule '/static/bootstrap/<filename>' (GET, HEAD, OPTIONS) -> "
'bootstrap.static>',
"<Rule '/auth/change-email/<token>' (GET, HEAD, OPTIONS) -> "
'auth.change_email>',
"<Rule '/auth/confirm/<token>' (GET, HEAD, OPTIONS) -> auth.confirm>",
"<Rule '/auth/reset/<token>' (GET, HEAD, OPTIONS, POST) -> "
'auth.password_reset>',
"<Rule '/delete-forever/<id>' (GET, HEAD, OPTIONS, POST) -> "
'main.delete_forever>',
"<Rule '/notebook/<id>' (GET, HEAD, OPTIONS) -> main.notebook>",
"<Rule '/notebook/<id>' (DELETE, OPTIONS) -> main.delete_notebook>",
"<Rule '/favorite/<id>' (GET, HEAD, OPTIONS, POST) -> main.favorite>",
"<Rule '/restore/<id>' (GET, HEAD, OPTIONS, POST) -> main.restore>",
"<Rule '/archive/<id>' (GET, HEAD, OPTIONS) -> main.archive>",
"<Rule '/delete/<id>' (GET, HEAD, OPTIONS, POST) -> main.delete>",
"<Rule '/static/<filename>' (GET, HEAD, OPTIONS) -> static>",
"<Rule '/share/<id>' (GET, HEAD, OPTIONS, POST) -> main.share>",
"<Rule '/note/<id>' (GET, HEAD, OPTIONS) -> main.note>",
"<Rule '/edit/<id>' (PUT, OPTIONS) -> main.edit>",
"<Rule '/tag/<name>' (GET, HEAD, OPTIONS) -> main.tag>"]

Now, all that is left is for me to figure out why my API routes are missing. 🙂
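
As an aside, the same thing works without Flask-Script. A rough sketch using the built-in Flask CLI (newer versions of Flask also ship a flask routes command out of the box):

@app.cli.command('routes')
def list_routes():
    """Print every rule the app knows about."""
    for rule in app.url_map.iter_rules():
        print(rule)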