the problem with green CI
You've seen this movie. Tests pass, deploy goes out, and then something breaks in production that never once showed up locally or in CI. You dig into it and discover that your mock was returning a flat list where Postgres actually returns a cursor, or that your fake connection silently swallowed a constraint violation that Postgres would have raised as an IntegrityError, or that SELECT FOR UPDATE doesn't mean anything to a Python dict standing in for a database. The mock was faithful enough to pass the test and dishonest enough to hide the bug.
This is the core problem with mocking the database: you're not testing your code against the thing your code actually runs against. You're testing it against a simplified mental model of that thing, and the gaps between the model and the reality are exactly where production bugs live. I've watched teams spend three weeks hunting a race condition that their test suite couldn't catch because every test was running against a mock that had no concept of transaction isolation levels. The mock can't lie about a behavior it doesn't implement. It just silently doesn't implement it.
The usual defense is speed. Real database tests are slow, the argument goes, and you can't have a test suite that takes fifteen minutes to run because nobody will run it. That was a reasonable position in 2018. It's not a reasonable position now, and Postgres 17 is a big part of why.
what savepoints actually give you
Postgres has had savepoints for a long time, but the performance characteristics and the tooling around them have gotten good enough that they're worth building your entire test isolation strategy around. The idea is simple: you open a transaction at the start of each test, run the test, and then roll it back instead of committing. The database never actually writes anything to disk in the permanent sense, and you don't need to truncate tables or run migrations between tests.
The wrinkle is nested transactions. Django's TestCase wraps each test in a transaction of its own, and when your code opens a transaction.atomic() block while a transaction is already active, Django issues a SAVEPOINT instead of a second BEGIN. Postgres 17 handles this gracefully. You can have a test that exercises code which calls transaction.atomic() inside an outer transaction that exists purely for test isolation, and Postgres correctly executes the inner block as SAVEPOINT sp1 ... RELEASE SAVEPOINT sp1, or ROLLBACK TO SAVEPOINT sp1 on failure. The application code doesn't know the difference. Your actual database engine is executing the full query plan, checking constraints, acquiring locks, doing everything it would do in production, just with a get-out-of-jail-free rollback waiting at the end.
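In SQL terms, the statement stream Postgres sees for a test that enters one atomic() block looks roughly like this (table, values, and savepoint names are illustrative):

```sql
BEGIN;                         -- opened by the test harness for isolation
SAVEPOINT s1;                  -- transaction.atomic() entered inside the test
INSERT INTO orders (total) VALUES (42);
RELEASE SAVEPOINT s1;          -- atomic() exited cleanly
-- on an exception inside the block: ROLLBACK TO SAVEPOINT s1;
ROLLBACK;                      -- test teardown: nothing is ever committed
```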
The performance win here is significant. On a schema with 80 tables and a moderate dataset, rolling back a transaction takes microseconds. Truncating 80 tables takes hundreds of milliseconds, and that cost compounds across hundreds of tests. We've seen test suites drop from four minutes to under forty seconds just by switching from table truncation to savepoint-based rollback, without changing a single test or touching any fixtures. The database is doing less work, not because you've hidden it behind a mock, but because rollback is genuinely fast.
per-test schema isolation
Savepoints get you most of the way there, but there's a class of problem they don't solve: tests that depend on DDL changes, or tests that need to verify behavior around table creation, index management, or anything that can't live inside a transaction because Postgres won't let you roll back certain DDL operations cleanly. For those cases, per-test schema namespacing is the answer.
The approach is straightforward. You create a dedicated Postgres schema for each test (or each test worker, if you're running parallelized), set the search_path to point at that schema, run the test, and then drop the schema when you're done. Because schemas are cheap in Postgres, this doesn't cost much, and because each test gets its own isolated namespace, you can run tests in parallel without any shared state contamination.
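Stripped down to the SQL, the lifecycle per test (or per worker) is just this, with an illustrative schema name:

```sql
CREATE SCHEMA test_worker_3;
SET search_path TO test_worker_3, public;
-- unqualified table names now resolve inside test_worker_3;
-- the test runs all its queries against this private namespace
DROP SCHEMA test_worker_3 CASCADE;  -- teardown removes everything in one statement
```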
In practice, the setup looks like this. You create a base schema with all your migrations applied, then for each parallel worker you create a schema that's a copy of the base. Postgres 17's improved handling of schema-level operations means this copy operation is fast enough to be practical at the start of a test session, and the per-test isolation within a worker still uses savepoints, so you're getting both layers of isolation working together.
One thing worth being explicit about: this is not the same as using separate databases per worker. Separate databases mean separate connections, separate migration runs, and a much heavier setup cost. Schema isolation within a single database means a single migration run at session start, cheap worker initialization, and no connection pool thrashing. The difference in CI startup time is usually around two to three minutes on a meaningful schema, which adds up across every PR.
the Django 6 + pytest setup
Concrete config, because hand-waving about architecture doesn't help you ship. This setup assumes Django 6.x (currently at 6.0.x as of early 2026), pytest-django, and Postgres 17.
First, your conftest.py needs to configure the transaction isolation approach. pytest-django's django_db_setup fixture is where you hook in. The key setting is --reuse-db for local dev and a proper DATABASE_URL in CI that points at a real Postgres 17 instance, not SQLite, not a mock, not an in-memory anything.
```python
# conftest.py
import pytest
from django.db import connection


@pytest.fixture(scope='function', autouse=True)
def wrap_in_transaction(db):
    """Each test runs inside a transaction that rolls back on completion."""
    with connection.cursor() as cursor:
        cursor.execute('SAVEPOINT test_savepoint')
    yield
    with connection.cursor() as cursor:
        cursor.execute('ROLLBACK TO SAVEPOINT test_savepoint')
```
For parallel execution with schema isolation, you'd use pytest-xdist and a custom django_db_setup that provisions a schema per worker:
```python
# conftest.py (parallel workers)
import pytest
from django.core.management import call_command
from django.db import connection


@pytest.fixture(scope='session')
def django_db_setup(django_test_environment, django_db_blocker, worker_id):
    schema_name = f'test_worker_{worker_id}'
    with django_db_blocker.unblock():
        with connection.cursor() as cursor:
            cursor.execute(f'CREATE SCHEMA IF NOT EXISTS {schema_name}')
            cursor.execute(f'SET search_path TO {schema_name}, public')
        # run migrations into this worker's schema
        call_command('migrate', '--run-syncdb', verbosity=0)
    yield
    with django_db_blocker.unblock():
        with connection.cursor() as cursor:
            cursor.execute(f'DROP SCHEMA {schema_name} CASCADE')
```
You'll also want DATABASES['default']['TEST']['NAME'] pointing at your real Postgres instance, and CONN_MAX_AGE = 0 in tests to avoid connection state leaking between tests that might have different search_path settings. Django 6's improved async ORM support also means you need to be careful with async tests: pytest-asyncio with asyncio_mode = 'auto' in your pytest.ini gets you async test support that plays nicely with this setup without blowing up the transaction context.
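A minimal settings fragment for the test environment might look like this; every name and credential here is illustrative, not prescriptive:

```python
# settings used only under test; names and credentials are illustrative
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'appdb',
        'USER': 'postgres',
        'PASSWORD': 'postgres',
        'HOST': 'localhost',
        'PORT': '5432',
        'CONN_MAX_AGE': 0,  # no persistent connections, so search_path can't leak
        'TEST': {'NAME': 'testdb'},
    }
}
```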
Run time on a project we've been working on internally (about 120 models, 600 tests) dropped from 6.5 minutes with mock-heavy tests to 52 seconds with this setup. That's with real Postgres, real constraints, real indexes, real query plans.
what mocks systematically hide
The classes of bugs that mocks can't catch are not edge cases. They come up constantly in production codebases, and the frustrating thing is that they're often the bugs that are hardest to reproduce and debug after the fact.
Constraint violations are the obvious one. A mock save() call succeeds silently. A real Postgres INSERT respects your UNIQUE constraints, your CHECK constraints, your foreign key constraints. If your application logic has a race condition where two requests can both pass an application-level uniqueness check and then both try to write, only Postgres will catch the second one. Your mock will happily return success for both. You ship, two users get the same order number, accounting is furious.
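In SQL terms, assuming a hypothetical orders table with a unique order_number column, the race resolves correctly only when a real database is underneath:

```sql
-- both requests already passed the application-level uniqueness check
INSERT INTO orders (order_number) VALUES ('A-1001');  -- request 1 succeeds
INSERT INTO orders (order_number) VALUES ('A-1001');  -- request 2 fails:
-- ERROR: duplicate key value violates unique constraint
-- which Django surfaces as an IntegrityError; a mock returns success twice
```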
Query plan sensitivity is less obvious but equally painful. An ORM query that works fine against a mock works fine in development with 50 rows of data, and then starts timing out in production with 500,000 rows because the query plan Postgres chooses changes at scale. Mocks never tell you this is coming. Real Postgres, running against a test dataset of realistic size, will at least give you a fighting chance to notice that your filter() call is doing a sequential scan on a table with no index on the filtered column.
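Running the suite against real Postgres also means EXPLAIN is available when a test feels slow (table and column names are illustrative):

```sql
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_email = 'a@example.com';
-- a plan starting with "Seq Scan on orders" tells you the filtered
-- column has no usable index, a fact no mock will ever surface
```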
Transaction semantics are where it gets genuinely subtle. SELECT FOR UPDATE, SKIP LOCKED, advisory locks, REPEATABLE READ isolation, two-phase commit, none of these behaviors exist in a mock. If your application uses any of them (and if you're building anything with a job queue, a rate limiter, or concurrent resource allocation, you probably are), your mock is silently pretending those semantics don't matter. They do. We had a client who spent six weeks debugging a double-processing issue in their task queue because their test suite mocked the database and never caught that their SELECT FOR UPDATE query was being written incorrectly for their isolation level.
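The canonical job-claim query is a good example of semantics no mock can fake; this sketch assumes a hypothetical jobs table with id and status columns:

```sql
-- each worker claims exactly one pending job; concurrent workers
-- skip rows another transaction has locked instead of blocking
UPDATE jobs SET status = 'running'
WHERE id = (
    SELECT id FROM jobs
    WHERE status = 'pending'
    ORDER BY id
    FOR UPDATE SKIP LOCKED
    LIMIT 1
)
RETURNING id;
```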
JSON field behavior is another one. Django's JSONField does different things depending on whether the backend is SQLite, Postgres, or a mock. Postgres 17's jsonb operators, containment queries, path expressions, these are not replicated by any mock I've seen. If you're querying data__foo__bar=value in a JSONField and your tests pass against a mock, you have no idea if that query actually works until it runs against Postgres.
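For instance, a lookup like data__foo__bar=42 compiles to jsonb path extraction on Postgres, roughly like the following (table name illustrative, exact SQL varies by Django version):

```sql
-- roughly what the ORM lookup data__foo__bar=42 becomes on Postgres
SELECT * FROM events WHERE (data -> 'foo' -> 'bar') = '42'::jsonb;
-- and the containment form, which only real jsonb implements:
SELECT * FROM events WHERE data @> '{"foo": {"bar": 42}}';
```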
the CI infrastructure side
None of this works if your CI environment doesn't run a real Postgres 17 instance. That sounds obvious but it's worth being direct about because a lot of teams are still running SQLite in CI because it's easier to set up, and then wondering why their test suite doesn't catch production bugs.
GitHub Actions makes this straightforward now. A services block in your workflow YAML:
```yaml
services:
  postgres:
    image: postgres:17
    env:
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: testdb
    options: >-
      --health-cmd pg_isready
      --health-interval 10s
      --health-timeout 5s
      --health-retries 5
    ports:
      - 5432:5432
```
The postgres:17 image on Docker Hub has been stable since Postgres 17 released in late 2024, and it starts in about four seconds on a standard GitHub Actions runner. There's no meaningful overhead compared to spinning up SQLite. The whole 'we use SQLite in CI because it's faster to start' argument was always weak, and by 2026 it's just cargo-culting.
For local dev, if you're on macOS, Postgres.app running 17.x is the path of least resistance. If you're on Linux or running in a container, the official Docker image with a volume mount for your data directory gets you there. The important thing is that your local environment and your CI environment run the same Postgres major version, because behavior differences between Postgres 15 and 17 (improved partition pruning, better statistics, some changed planner behavior around parallel queries) can and do cause tests to pass on one and fail on the other.
when mocks are still fine
Being honest about the limits of the argument matters. Mocks for third-party HTTP APIs are completely appropriate. You're not trying to test Stripe's behavior, you're testing your code's behavior given various Stripe responses, and mocking the HTTP layer with responses or pytest-httpx is exactly right for that. Same for S3, email providers, payment processors, external auth systems.
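As a sketch of the boundary that is worth mocking, here's a hypothetical charge() helper tested against a stubbed HTTP client instead of a real payment API; the endpoint, payload, and response shape are all invented for illustration:

```python
from unittest import mock


def charge(client, amount):
    """Create a charge via an injected HTTP client and report its status."""
    resp = client.post('/v1/charges', json={'amount': amount})
    return resp.json()['status']


# stub only the third-party boundary; our own logic still runs for real
fake_resp = mock.Mock()
fake_resp.json.return_value = {'status': 'succeeded'}
fake_client = mock.Mock()
fake_client.post.return_value = fake_resp

assert charge(fake_client, 500) == 'succeeded'
fake_client.post.assert_called_once_with('/v1/charges', json={'amount': 500})
```

The point is that the mock stands in for code you don't own and can't run locally, which is the opposite of the database case.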
Mocks for your own internal service boundaries can make sense too, if those services have well-defined contracts and you're testing a specific layer of your stack in isolation. Integration tests that cross service boundaries are a different conversation.
The argument in this post is specifically about mocking the database when the database is Postgres and you control it. That's the case where the cost of the mock (hidden production bugs, false confidence from green tests) consistently outweighs the benefit (faster test setup), especially now that the speed argument for mocks has gotten so much weaker. At steezr we've moved every project we maintain to real-Postgres test setups over the past year or so, and the pattern holds: the initial setup investment is a few hours, and the ongoing dividend is tests you can actually trust.
The green CI / red production problem doesn't have a single cause, but mocked databases are one of the most reliable contributors to it. Postgres 17 gives you the tools to stop doing it. The only question is whether you want to keep patching production bugs that your test suite was never going to catch.
