Idempotent PostgreSQL schema scripts

, last modified Jamie Lentin

Whilst starting several new PostgreSQL projects this year, one objective of mine was to find a better way of managing the schema. Most tools I've tried to use in the past have felt very heavyweight, and get in the way more than be useful, having to spend ages trying to contort my nicely crafted table definitions into whatever language they use.

One suggestion I found on the PostgreSQL mailing list was that there is enough logic available now for such systems not to be necessary, and one could write idempotent SQL scripts. i.e. SQL scripts that, rather than falling over if the database is not in the state it expects, make any changes it thinks are required to get a schema up to the current version. If there are no changes required, then make no changes—don't drop a table so you can re-create it, that'd be annoying.

I decided to put this to the test. The following is notes on how to deal with various object types.

Tables

Firstly, use CREATE TABLE IF NOT EXISTS. This creates any new tables, and avoids errors if the table is already existing.

Obviously, this won't handle tables that gain columns over time, However, information_schema helps in picking out tables that don't match what we expect:

DO
$do$
BEGIN
   IF EXISTS (SELECT *
                FROM information_schema.columns
               WHERE table_schema = 'public'
                 AND table_name = 'syllabus'
                 AND column_name = 'requires_group_id'
                 AND is_nullable = 'NO') THEN
       ALTER TABLE syllabus
           ALTER COLUMN requires_group_id DROP DEFAULT,
           ALTER COLUMN requires_group_id DROP NOT NULL;
       UPDATE syllabus
          SET requires_group_id = NULL
        WHERE requires_group_id = get_group_id('accept_terms');
   END IF;
END
$do$;

Table data

If there are hard-coded entries in a table (e.g. lookup values), you can use ON CONFLICT to do an "upset" into the table instead of inserting rows. This is a bit long-winded, but we can create temporary helper functions to condense code:

CREATE OR REPLACE FUNCTION pg_temp.upsert_rclass( new_id INT
                                                , new_name TEXT
                                                , new_description TEXT) RETURNS VOID AS
$BODY$
    INSERT INTO rclass (rclass_id, name, description)
        VALUES (new_id, new_name, new_description)
        ON CONFLICT (rclass_id) DO UPDATE
                                      SET name = EXCLUDED.name
                                        , description = EXCLUDED.description;
$BODY$ LANGUAGE sql;
SELECT pg_temp.upsert_rclass(101, 'metadata.title', 'Title');
SELECT pg_temp.upsert_rclass(102, 'metadata.author', 'Author');

Indexes

Use CREATE INDEX IF NOT EXISTS to create indexes. The names of indexes is rarely relevant (although should describe the index well), so when a modified index is required, give it a different name.

DROP INDEX IF EXISTS for cases where something is no longer required.

Views

Just use CREATE OR REPLACE VIEW, no strange edge cases.

Functions

CREATE OR REPLACE FUNCTION handles the general case for you, however will refuse if parameters have morphed excessively from the existing version. In this case it's probably sensible to move on to a new function name anyway.

Conclusions

So far it seems to be working, and whilst there are some wordy constructs, they can be hidden behind PL/SQL. Whilst there are still no guarantees that any upgrade steps produce exactly the same results as a fresh schema, this is something that can be tested.