the wrong first move
A lot of Django teams hit FATAL: remaining connection slots are reserved for non-replication superuser connections, panic, add two more app instances, watch CPU stay flat while the database gets even angrier, then call it scaling. That move is backwards. Every extra Gunicorn worker, every extra Kubernetes pod, every extra Celery process widens the blast radius if your application opens direct Postgres sessions and keeps them around longer than necessary. You haven't added throughput, you've multiplied idle sockets and backend processes.
Postgres treats connections as real work. Each backend process consumes memory, scheduler time, internal bookkeeping, and a surprising amount of operational attention once you cross the comfortable range for your box. On a modest db.t4g.large with 8 GB RAM, setting max_connections = 500 because someone wanted headroom is how you buy yourself worse cache hit rates and noisier latency. Most startup workloads don't need 500 active database sessions. They need 20 to 50 active queries, maybe 80 during a traffic spike, and a sane queue in front of that.
That queue is PgBouncer. Transaction pooling, specifically. Not session pooling, which keeps too much of the old pain, and not another layer of application-side magical thinking. PgBouncer sits between Django and Postgres, accepts a large number of client connections cheaply, then multiplexes them onto a much smaller set of real server connections. That one change usually buys more headroom than doubling your app fleet, because your bottleneck was connection management, not Python request handling.
We've seen this pattern repeatedly at Steezr on customer portals and ERP-style Django systems: the app tier looks underused, request latency spikes, RDS reports connection saturation, and somebody assumes the cure is horizontal scaling. It rarely is. Fix the connection story first, then look at slow queries, then look at app capacity. That order matters.
why transaction pooling wins
PgBouncer has a few modes, and only one of them solves the problem cleanly for most Django apps. Use pool_mode = transaction. Session pooling gives each client a dedicated server connection for the lifetime of the session, which means you preserve a lot of the waste you're trying to remove. Statement pooling is too restrictive for normal Django usage. Transaction pooling hands out a server connection only for the duration of a transaction, then returns it to the pool immediately. That's the whole point.
Django is generally fine with this, as long as you stop relying on session-level database state. Anything that assumes a connection sticks around forever will bite you, usually in subtle ways. Server-side cursors are the classic example. If you're using them through .iterator() on large querysets, set DISABLE_SERVER_SIDE_CURSORS = True in Django when routing through a transaction pooler. Prepared statements can also get weird depending on your driver and settings, because PgBouncer may send your next transaction over a different backend connection. With modern psycopg setups this is manageable; you just need to know which features depend on connection affinity.
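Note that with server-side cursors disabled, .iterator() no longer streams; Django pulls the whole result set into memory first. One transaction-pooling-safe alternative for large scans is keyset pagination by primary key, where each batch is its own short transaction. A minimal sketch of the batching logic, with fetch_page standing in for a hypothetical ORM call like Model.objects.filter(id__gt=last_id).order_by("id")[:limit]:

```python
from typing import Callable, Iterator, List, Tuple

# Row is (id, payload); fetch_page stands in for an ORM call such as
# Model.objects.filter(id__gt=last_id).order_by("id")[:limit] (hypothetical model).
Row = Tuple[int, str]

def keyset_batches(
    fetch_page: Callable[[int, int], List[Row]],
    batch_size: int = 1000,
) -> Iterator[List[Row]]:
    """Yield batches keyed on the last-seen id; each fetch is one short transaction."""
    last_id = 0
    while True:
        page = fetch_page(last_id, batch_size)
        if not page:
            return
        yield page
        last_id = page[-1][0]  # advance the keyset cursor

# In-memory fake table to show the control flow without a database.
table = [(i, f"row-{i}") for i in range(1, 11)]

def fake_fetch(last_id: int, limit: int) -> List[Row]:
    return [r for r in table if r[0] > last_id][:limit]

batches = list(keyset_batches(fake_fetch, batch_size=4))
# 10 rows in batches of 4 -> batch sizes 4, 4, 2
```

Because no cursor outlives a transaction, this pattern survives PgBouncer handing each batch to a different backend connection.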
A minimal PgBouncer 1.22 config looks like this:
[databases]
app = host=postgres.internal port=5432 dbname=app
[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 40
reserve_pool_size = 10
reserve_pool_timeout = 3
server_reset_query = DISCARD ALL
server_check_delay = 30
ignore_startup_parameters = extra_float_digits,options
admin_users = pgbouncer
stats_users = pgbouncer

That default_pool_size = 40 doesn't mean 40 users total; it means up to 40 server connections per database-and-user pair, which matters if you're splitting traffic across multiple roles. The upside is huge, though: 300 Django client connections can be safely funneled into 40 or 50 real Postgres backends, and your database stops spending half its life context-switching.
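The per-pair arithmetic is worth writing down before it surprises you in production. A quick sketch, with illustrative role names and the budget from a hypothetical max_connections = 200 box:

```python
# Worst-case server connections PgBouncer may open: one pool per
# (database, user) pair, each allowed default_pool_size connections
# plus its reserve. Role names here are illustrative.
default_pool_size = 40
reserve_pool_size = 10
pairs = [("app", "web"), ("app", "worker"), ("app", "reporting")]

worst_case = len(pairs) * (default_pool_size + reserve_pool_size)
print(worst_case)  # 3 pools x 50 = 150

# Against Postgres max_connections = 200 with ~40 held back for admin
# and maintenance, three saturated pools nearly exhaust the budget;
# max_db_connections is the cap that keeps this honest.
usable = 200 - 40
assert worst_case <= usable
```

Three roles is all it takes to triple your worst case, which is exactly why max_db_connections belongs in the config.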
fix django and gunicorn
PgBouncer only helps if the rest of your stack stops fighting it. Django defaults and random blog-post Gunicorn settings are often the reason teams think they need more infrastructure. They don't. They need fewer long-lived processes holding onto resources they barely use.
Start with Django. Point DATABASES at PgBouncer, not Postgres directly, disable persistent connections on the Django side, and disable server-side cursors if you're using transaction pooling:
import os

DATABASES = {
"default": {
"ENGINE": "django.db.backends.postgresql",
"NAME": "app",
"USER": "app",
"PASSWORD": os.environ["DB_PASSWORD"],
"HOST": os.environ.get("DB_HOST", "pgbouncer"),
"PORT": int(os.environ.get("DB_PORT", "6432")),
"CONN_MAX_AGE": 0,
"DISABLE_SERVER_SIDE_CURSORS": True,
"OPTIONS": {
"connect_timeout": 5,
"application_name": "web",
},
}
}

CONN_MAX_AGE = 0 surprises people because they spent years hearing that persistent connections are good. Direct-to-Postgres, maybe. Through PgBouncer, Django persistence buys you very little and can confuse the picture. Let PgBouncer own pooling.
Now Gunicorn. A common bad config looks like workers = 8 on a 2 vCPU box because somebody copied (2 x cores) + 1, then added threads = 4, then scaled to four pods. Congratulations, you created 128 potential request handlers, all happy to pile onto the database. If each request does two ORM calls and one of them blocks on I/O, you've built a queueing problem, not capacity.
Use a number tied to CPU and actual request profile. For a standard Django app on 2 vCPU, workers = 3 or 4, threads = 2 is usually a better starting point than worker inflation. Example:
bind = "0.0.0.0:8000"
workers = 4
threads = 2
worker_class = "gthread"
timeout = 60
graceful_timeout = 30
keepalive = 5
max_requests = 2000
max_requests_jitter = 200

Then do the math across every process type: web, Celery workers, management jobs, async consumers. If your cluster can generate 180 concurrent DB-hitting execution slots, your PgBouncer and Postgres settings need to be designed around that, or you need to reduce concurrency. Most teams never do this math. They should.
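That math can be a five-line script rather than a guess. Pod and worker counts below are illustrative; substitute your real deployment values:

```python
# Count every execution slot that can hold a database connection at once.
# All numbers here are illustrative, not a recommendation.
web_pods, gunicorn_workers, gunicorn_threads = 4, 4, 2
celery_pods, celery_concurrency = 2, 6
cron_slots = 2  # management commands, one-off jobs

web_slots = web_pods * gunicorn_workers * gunicorn_threads  # 32
task_slots = celery_pods * celery_concurrency               # 12
total_slots = web_slots + task_slots + cron_slots           # 46

# Compare demand against what PgBouncer will actually hand out.
default_pool_size = 40
print(total_slots, default_pool_size)
# 46 potential clients onto 40 server connections is fine: the surplus
# queues briefly in PgBouncer instead of piling into Postgres.
```

Rerun it whenever someone changes a replica count, because that is exactly when the numbers drift out of sync with the pool.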
pool sizing that survives traffic
Pool sizing doesn't need mysticism. It needs a pencil. Start from the database side, not the application side. Suppose Postgres has max_connections = 200. Subtract headroom for admin access, migrations, replicas, maintenance tasks, and anything bypassing PgBouncer. I usually reserve 30 to 40 immediately. That leaves roughly 160 usable connections.
Next, ask how many active queries your database can actually sustain before latency climbs. On a small production box with ordinary OLTP traffic, you may find that 32 to 48 active transactions is the sweet spot, while 100 just means more waiting. Set PgBouncer around that real number, not the theoretical maximum. Example, one Django app, one primary database, some Celery traffic:
default_pool_size = 40
reserve_pool_size = 10
max_db_connections = 60
max_client_conn = 800

max_client_conn can be much larger because PgBouncer client connections are cheap. Server connections are the expensive part. You want the queue to form in PgBouncer, where it is visible and controlled, not in Postgres, where every extra backend burns memory.
A rough sanity check helps. Say you run 3 web pods, each with 4 Gunicorn workers and 2 threads, plus 2 Celery pods with concurrency 6. That is 3 x 4 x 2 = 24 potential web execution slots and 2 x 6 = 12 task slots, 36 total units that might hit the database. If only half are actively querying at once, a pool of 20 might work. If your request pattern includes slow reports or heavy admin screens, 20 will feel tight and 40 is safer. You can validate with SHOW POOLS; and SHOW STATS; in the PgBouncer admin console.
Watch cl_waiting, sv_active, sv_idle, and average transaction time. If cl_waiting is consistently nonzero during normal load and Postgres still has CPU and I/O headroom, increase pool size carefully. If sv_active is pinned high and query latency grows with it, the fix is query tuning or workload shaping, not another 30 server connections. More concurrency doesn't rescue a bad query plan. It just makes the pileup louder.
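A small helper makes that check mechanical. Assuming you've already scraped SHOW POOLS rows into dicts (via psql scripting or a metrics exporter — the wiring is up to you), flag the pools worth looking at:

```python
from typing import List

def pools_under_pressure(rows: List[dict], wait_threshold: int = 0) -> List[str]:
    """Return pool names where clients are queued beyond the threshold.

    Each row mimics one line of PgBouncer's SHOW POOLS output; feeding it
    real data is left to your monitoring pipeline.
    """
    return [
        f"{r['database']}/{r['user']}"
        for r in rows
        if r["cl_waiting"] > wait_threshold
    ]

# Sample rows shaped like SHOW POOLS output (values invented for illustration).
sample = [
    {"database": "app", "user": "web", "cl_active": 38, "cl_waiting": 7,
     "sv_active": 38, "sv_idle": 2},
    {"database": "app", "user": "worker", "cl_active": 10, "cl_waiting": 0,
     "sv_active": 10, "sv_idle": 5},
]
print(pools_under_pressure(sample))  # ['app/web']
```

Alert on the output of something like this rather than eyeballing the admin console during an incident.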
migrate without drama
You don't need a quarter-long platform project to adopt PgBouncer. A careful afternoon is often enough if you know the sharp edges. Run PgBouncer as a sidecar, a dedicated deployment, or use a managed offering if your cloud stack supports it. I prefer a dedicated service because it keeps failure domains obvious and config changes deliberate.
First, install it and get authentication right. For SCRAM, match what Postgres uses. Test with psql through port 6432 before touching Django. Second, point one non-critical environment at it, staging if your staging data shape is honest, a low-traffic worker if it isn't. Third, disable Django persistent connections and server-side cursors, then run your test suite and a few real flows. Expect a few weird failures if somebody buried SET search_path, temp tables, advisory locks held across requests, or driver-level assumptions about sticky connections.
Migrations deserve special treatment. Django migrations can run through PgBouncer in many setups, though I still prefer sending them directly to Postgres or using a separate PgBouncer database entry with pool_mode = session for administrative tasks. Same for manage.py dbshell. Keep operational paths boring. If a migration holds locks for 40 seconds, the last thing you want is to debug pooler behavior on top of it.
Failover also needs one explicit decision. If you're on RDS, Cloud SQL, or a managed HA Postgres, point PgBouncer at the writer endpoint, not an instance address, unless you enjoy manual cutovers. Set conservative DNS caching around it. Then rehearse it. Restart PgBouncer during a quiet window, promote a replica in staging, watch how your app behaves, and verify that client retries are sane. The error you want to understand before production is psycopg.OperationalError: server closed the connection unexpectedly, because you will see some variant of it during failover and restarts.
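Client-side retries deserve the same rehearsal. A minimal backoff wrapper for that class of error, sketched generically with ConnectionError standing in for psycopg.OperationalError:

```python
import time
from typing import Callable, Type, TypeVar

T = TypeVar("T")

def with_retries(
    fn: Callable[[], T],
    exc: Type[Exception],
    attempts: int = 3,
    base_delay: float = 0.1,
) -> T:
    """Retry fn on a transient connection error, with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except exc:
            if attempt == attempts - 1:
                raise  # exhausted: let the caller see the real failure
            time.sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")

# Simulated flaky call: fails twice, then succeeds, the way a query does
# while PgBouncer or Postgres restarts underneath it.
state = {"calls": 0}

def flaky_query() -> str:
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("server closed the connection unexpectedly")
    return "ok"

print(with_retries(flaky_query, ConnectionError))  # ok
```

Keep retries at the edges (task launchers, request middleware), not buried inside every query, or you'll mask real outages.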
The rollout plan, in practice, is simple: deploy PgBouncer, shift one service, compare connection counts and p95 latency, then move the rest. Keep direct Postgres connectivity available for rollback for a day or two. You probably won't need it.
monitor the right layer
A pooler solves connection pressure. It doesn't forgive bad SQL. Teams often install PgBouncer, watch the connection graph flatten, declare victory, and miss the SELECT that scans 8 million rows because somebody filtered on an unindexed JSONB key inside a request path.
Monitor all three layers, app, pooler, database. In Django, enable query logging selectively or use something like OpenTelemetry with request spans and SQL timings. In PgBouncer, scrape SHOW STATS, SHOW POOLS, and SHOW SERVERS into Prometheus. On Postgres, keep pg_stat_statements enabled and actually read it. If you aren't ranking queries by total execution time, mean time, and calls, you're guessing.
A basic query to keep around:
SELECT
query,
calls,
total_exec_time,
mean_exec_time,
rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;

Then correlate that with PgBouncer wait metrics. If wait time spikes because one reporting endpoint opens a transaction and spends 12 seconds rendering a CSV while holding locks, no amount of app scaling helps. Move the report to async work, chunk it, or rewrite the query. If Celery floods the pool during a backfill, split queues and give the worker fleet its own PgBouncer user with a smaller pool cap. Isolation beats hope.
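That isolation can also be expressed with a per-database alias cap, a close cousin of the per-user approach: give Celery its own [databases] entry pointing at the same Postgres database. A hypothetical fragment, with alias and size purely illustrative:

```ini
[databases]
; web traffic keeps the default pool
app = host=postgres.internal port=5432 dbname=app
; Celery connects to the same database under its own alias and a smaller cap
app_worker = host=postgres.internal port=5432 dbname=app pool_size=15
```

The worker fleet then sets NAME = "app_worker" in its Django settings, and a runaway backfill can exhaust only its own 15 connections, never the web pool.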
One more thing: set application_name per process type (web, celery, beat, migration). That tiny bit of hygiene makes pg_stat_activity readable during incidents, and readable systems get fixed faster. The teams that stay calm under load usually aren't smarter; they just gave themselves enough visibility to stop guessing.
PgBouncer won't fix every scaling problem in a Django startup. It fixes the very common, very self-inflicted problem of treating Postgres like it can accept infinite chatty clients without consequence. Put the pooler in, size it with intent, trim your worker counts, watch your queries, then decide if you still need more servers. Most teams won't.
