start with the default
Most B2B SaaS teams should choose shared tables with a tenant_id column plus Row Level Security, and they should choose it early. It keeps the data model simple, keeps the app code boring, and avoids the operational mess that shows up the moment every tenant becomes its own little snowflake: separate DDL drift, separate migrations, separate backup stories, and separate incidents at 2 a.m. I keep seeing teams reach for schema-per-tenant because it feels safer, then six months later they've got 1,400 schemas, Prisma or Django migrations that take forever, pg_dump jobs that need custom filtering, and one enterprise customer whose hotfix request turns into a migration fanout problem.
RLS gets a bad reputation from people who implemented it halfway, usually by setting a tenant variable on the connection and then sticking PgBouncer in transaction mode in front of it, which is how you end up leaking context between requests or randomly hitting ERROR: unrecognized configuration parameter "app.tenant_id" because the session state wasn't what you thought it was. That's not an RLS problem. That's an ops problem.
At Steezr we've used this pattern on customer portals, internal ERP-style systems, and AI document processing backends where every table eventually grows tenant semantics, even the ones people swore were global forever. Shared-table multi-tenancy lets a small senior team ship fast because every feature flows through one code path, one migration stream, one observability setup. You still need discipline, especially around composite indexes, foreign keys, and connection management, though the payoff is huge. Physical isolation only starts winning once you have contractual requirements, noisy neighbors you can't tame, tenant-specific extensions, or data residency constraints that won't fit inside one cluster design.
My default advice is blunt: pick RLS first, enforce tenant_id everywhere, and earn the right to move a few heavyweight tenants into isolated schemas or databases later.
where schemas actually help
Schema-per-tenant has a narrow valid use case, and the teams who are happy with it usually know exactly why they chose it. They have a small number of high-value tenants, each tenant needs custom objects or custom release timing, and the operational overhead is acceptable because the tenant count is measured in dozens, not thousands. In that world, a schema can be a decent boundary. You can move a tenant with pg_dump --schema=tenant_acme or logical replication, you can grant access more explicitly, and you can reason about storage at a slightly more physical level.
That doesn't make schemas pleasant. Catalog growth becomes real. Tooling gets weird. ORMs often assume one search_path or one migration history, then quietly betray you. search_path bugs are nasty because they often fail open, not closed. Somebody forgets to schema-qualify a function or sequence, tests pass, prod routes one tenant into the wrong object, and now you've got the kind of incident where every answer starts with "we believe impact was limited". Nobody enjoys writing that email.
A hybrid model is where plenty of mature startups eventually land. Shared tables and RLS for the main app, then one or two large tenants moved to dedicated schemas or separate databases because they need a custom retention policy, a regional deployment, or they hammer write throughput hard enough to justify isolation. That's a sane endpoint. It is a terrible starting point. Hybrid multiplies operational branches, and branches are where velocity goes to die.
Use schemas if you need physical-ish isolation inside one database. Use separate databases if a tenant genuinely deserves its own blast radius. Use RLS for the common path. That division holds up well in practice.
the migration that actually works
The clean migration path from single-tenant tables to multi-tenant RLS is boring, incremental, and full of backfills. Good. Boring migrations are the ones that finish.
Assume you started with something like accounts, projects, invoices, no tenant boundary beyond maybe an account_id that sort of implies ownership. First step, add a real tenant table and put tenant_id on every tenant-scoped row. Do it nullable first, backfill in chunks, then lock it down.
```sql
create table tenant (
  id uuid primary key default gen_random_uuid(),
  slug text not null unique,
  created_at timestamptz not null default now()
);

alter table project add column tenant_id uuid;
alter table invoice add column tenant_id uuid;

update project p
set tenant_id = a.tenant_id
from account a
where p.account_id = a.id
  and p.tenant_id is null;
```
Chunk the big backfills. If a table has 80 million rows, don't fire one monster update and pray. Use primary key ranges or ctid batching from the app or a maintenance script, commit every few thousand rows, and watch pg_stat_activity, pg_locks, and replica lag while it runs. Long-running transactions will punish you later by holding dead tuples hostage and stretching vacuum work into the next week.
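Here's a minimal sketch of what chunked backfilling can look like from a maintenance script. The table and column names follow the schema above; the integer-range batching assumes an integer surrogate key, so for UUID primary keys you'd batch by keyset pagination or ctid ranges instead. The connection object is any DB-API style connection, and the batch size is a starting guess, not a recommendation.

```python
# Hypothetical chunked-backfill helper: one short transaction per batch,
# so no single UPDATE holds locks or a snapshot for long.

def backfill_batches(min_id: int, max_id: int, batch_size: int):
    """Yield inclusive (lo, hi) ranges covering [min_id, max_id]."""
    lo = min_id
    while lo <= max_id:
        hi = min(lo + batch_size - 1, max_id)
        yield lo, hi
        lo = hi + 1

# Same statement as the backfill above, restricted to one key range.
BACKFILL_SQL = """
update project p
set tenant_id = a.tenant_id
from account a
where p.account_id = a.id
  and p.tenant_id is null
  and p.id between %(lo)s and %(hi)s
"""

def run_backfill(conn, min_id: int, max_id: int, batch_size: int = 5000):
    # Commit between batches so dead tuples stay cleanable and
    # replicas never fall far behind one giant transaction.
    for lo, hi in backfill_batches(min_id, max_id, batch_size):
        with conn.cursor() as cur:
            cur.execute(BACKFILL_SQL, {"lo": lo, "hi": hi})
        conn.commit()
```

Between batches is also where you'd check pg_stat_activity and replica lag, and pause if things look unhappy.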
Once data is backfilled, add constraints in a way that avoids full blocking where possible.
```sql
alter table project
  add constraint project_tenant_fk
  foreign key (tenant_id) references tenant(id) not valid;

alter table project validate constraint project_tenant_fk;

alter table project
  alter column tenant_id set not null;
```
Then fix uniqueness. Global unique indexes are a classic multi-tenant footgun. users(email) probably needs to become (tenant_id, email) unless your product semantics truly require global uniqueness.
```sql
create unique index concurrently user_tenant_email_uniq
  on app_user (tenant_id, lower(email));
```
After that, update every foreign key between tenant tables to include tenant_id where it matters. Single-column FKs allow cross-tenant references if ids are guessed or copied incorrectly. The safer pattern is composite keys.
```sql
alter table project add constraint project_tenant_id_id_uniq
  unique (tenant_id, id);

alter table invoice
  add constraint invoice_project_fk
  foreign key (tenant_id, project_id)
  references project (tenant_id, id)
  not valid;

alter table invoice validate constraint invoice_project_fk;
```
Now you can enable RLS, write policies, and force the app through them. Leave one admin role with BYPASSRLS for maintenance jobs, keep it away from request traffic, and audit every place that can assume it.
rls without self-sabotage
RLS itself is simple. The surrounding conventions decide whether it stays simple.
I prefer a dedicated setting for tenant context because it keeps SQL explicit and keeps the app from sprinkling tenant predicates across every query. In Postgres 16 you can still rely on current_setting(..., true) inside policies, which gives you null instead of exploding if the setting wasn't assigned. That makes failures easier to reason about, especially during migration.
```sql
alter table project enable row level security;
alter table invoice enable row level security;
alter table project force row level security;
alter table invoice force row level security;

create policy tenant_isolation_project on project
  using (tenant_id = nullif(current_setting('app.tenant_id', true), '')::uuid)
  with check (tenant_id = nullif(current_setting('app.tenant_id', true), '')::uuid);

create policy tenant_isolation_invoice on invoice
  using (tenant_id = nullif(current_setting('app.tenant_id', true), '')::uuid)
  with check (tenant_id = nullif(current_setting('app.tenant_id', true), '')::uuid);
```
Then in the app, every request starts a transaction and sets context with SET LOCAL, not SET, because SET LOCAL dies with the transaction and won't bleed into the next request on a reused connection.
```sql
begin;
set local app.tenant_id = '6f0b7f9d-6a1f-4c2c-b2ae-0d7b8a2b98ab';
select id, name from project order by created at desc limit 20;
commit;
```
That one detail matters a lot with PgBouncer. If you're in transaction pooling mode, session state is not yours after commit. People miss this, then spend days chasing phantom authorization bugs. If your framework doesn't make per-request transactions easy, fix that before rolling out RLS. Django 6.x can handle this cleanly with middleware plus transaction.atomic(). In a FastAPI or async SQLAlchemy stack, wrap the request unit of work and issue SET LOCAL on the same transaction-bound connection before any ORM query fires.
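As a rough sketch of that unit-of-work wrapper, here's a context manager over a plain DB-API connection. It uses Postgres's real set_config(name, value, is_local) function instead of interpolating into a SET LOCAL string, which lets the tenant id travel as a bind parameter; everything else about the shape (names, error handling) is an assumption, and a Django or FastAPI integration would do the same thing inside its own middleware or dependency.

```python
from contextlib import contextmanager

@contextmanager
def tenant_scope(conn, tenant_id: str):
    """Run a block inside one transaction with SET LOCAL tenant context.

    set_config(..., true) is local to the transaction, so the setting
    dies at commit/rollback and cannot leak through a pooled connection.
    """
    cur = conn.cursor()
    try:
        cur.execute("begin")
        cur.execute("select set_config('app.tenant_id', %s, true)", (tenant_id,))
        yield cur
        cur.execute("commit")
    except Exception:
        cur.execute("rollback")
        raise
```

The important invariant is that the SET LOCAL (or set_config) happens on the same transaction-bound connection as every query the request runs, before the first of them fires.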
One more thing, test the negative path. Create fixtures for tenant A and tenant B, run the same repository methods under both, and assert zero rows cross the boundary. Add a CI check that every tenant-scoped table has RLS enabled and force row level security set. Humans forget. Databases don't.
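That CI check can be small. The catalog columns queried below (pg_class.relrowsecurity and relforcerowsecurity) are real Postgres fields; the exemption list and function names are assumptions you'd adapt to your schema, and in CI you'd feed the function rows fetched from a migrated test database.

```python
# Tables in public that are missing RLS or FORCE ROW LEVEL SECURITY.
RLS_AUDIT_SQL = """
select c.relname
from pg_class c
join pg_namespace n on n.oid = c.relnamespace
where n.nspname = 'public'
  and c.relkind = 'r'
  and (not c.relrowsecurity or not c.relforcerowsecurity)
"""

# Tables that are legitimately global -- keep this list short and reviewed.
EXEMPT = {"tenant", "schema_migrations"}

def missing_rls(rows):
    """Given rows from RLS_AUDIT_SQL, return offending table names."""
    return sorted(name for (name,) in rows if name not in EXEMPT)

def assert_rls_everywhere(rows):
    bad = missing_rls(rows)
    assert not bad, f"tables without forced RLS: {bad}"
```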
pgbouncer and pooling
Most RLS horror stories are really PgBouncer stories. Transaction pooling is still the right choice for a lot of web traffic because it gives you much better connection density, especially on smaller Postgres boxes where max_connections=500 would just convert RAM into sadness, though you have to shape the app around it. That means explicit transactions around request work, SET LOCAL for tenant context, and no assumptions about prepared statements surviving forever unless your driver and pooling mode are configured for it.
A minimal PgBouncer config for this pattern looks like this:
```ini
[databases]
app = host=127.0.0.1 port=5432 dbname=app

[pgbouncer]
pool_mode = transaction
max_client_conn = 2000
default_pool_size = 50
reserve_pool_size = 20
reserve_pool_timeout = 3
server_reset_query = DISCARD ALL
server_reset_query_always = 0
ignore_startup_parameters = extra_float_digits,options
```
DISCARD ALL helps clean session junk on release, though it doesn't rescue bad app behavior inside the transaction, and it has a cost, which is usually acceptable for request-response workloads. If you're running a lot of prepared statements through drivers that don't play nicely with transaction pooling, you'll see errors like prepared statement "__asyncpg_stmt_a__" does not exist or weird performance cliffs from repeated parse/plan churn. You can either disable server-side prepares in the driver, tune statement cache settings, or keep a separate session-pooled path for migrations, admin tasks, and workloads that truly need sticky sessions.
We usually split traffic. App requests go through transaction pooling. Background workers with heavier transaction semantics may get their own PgBouncer database entry, sometimes session pooled, sometimes direct. Migrations go direct to Postgres or to a tightly controlled session pool because schema changes plus transaction pooling is where sharp edges multiply.
Also, watch idle in transaction. RLS plus SET LOCAL encourages transaction wrappers, which is fine until somebody opens one and then waits on an HTTP call or an S3 upload. Now autovacuum can't clean tuples touched after that snapshot, locks stick around longer than expected, and you get a surprise incident that started with one lazy service method.
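A cheap watchdog over pg_stat_activity catches this class of bug early. The columns referenced (pid, state, state_change) are real; the row shape and the 30-second threshold are assumptions, a sketch of the check rather than a finished monitor.

```python
from datetime import datetime, timedelta, timezone

def idle_in_tx_offenders(rows, now=None, threshold=timedelta(seconds=30)):
    """Return pids that have sat idle in transaction past the threshold.

    rows: iterable of (pid, state, state_change), as returned by
    `select pid, state, state_change from pg_stat_activity`.
    """
    now = now or datetime.now(timezone.utc)
    return [
        pid
        for pid, state, state_change in rows
        if state == "idle in transaction" and now - state_change > threshold
    ]
```

Alert on offenders, or set idle_in_transaction_session_timeout server-side as a backstop.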
indexes, vacuum, and ugly failure modes
Postgres 16 handles multi-tenant shared tables very well if you stop pretending the tenant predicate is optional. Almost every hot query in the app should lead with tenant_id in the index, because your real access pattern is rarely where status = 'open', it's where tenant_id = $1 and status = 'open' order by created_at desc limit 50. Build indexes for the workload you actually have.
```sql
create index concurrently invoice_tenant_created_idx
  on invoice (tenant_id, created_at desc);

create index concurrently invoice_tenant_status_created_idx
  on invoice (tenant_id, status, created_at desc)
  where deleted_at is null;
```
That partial index is often worth more than a generic one on a soft-deleted table. Keep index count under control, though. Shared tables with many secondary indexes amplify write cost for every tenant, and one oversized enterprise tenant can punish everyone else.
Autovacuum tuning matters earlier in shared-table systems because bloat from one high-churn tenant accumulates in the same heap and indexes everybody touches. For the busiest tables we often set per-table knobs instead of global panic.
```sql
alter table invoice set (
  autovacuum_vacuum_scale_factor = 0.02,
  autovacuum_analyze_scale_factor = 0.01,
  autovacuum_vacuum_threshold = 5000,
  autovacuum_analyze_threshold = 5000
);
```
Watch pg_stat_user_tables.n_dead_tup, last_autovacuum, vacuum_count, and index size growth. If bloat gets bad enough, reindex index concurrently or pg_repack can save you without a long outage, though pg_repack brings its own operational ceremony. Large deletes are another trap. If you're offboarding a tenant and deleting 200 million rows from shared tables, you'll create a wake of bloat and WAL. Archive first if you need to, delete in chunks, and plan for vacuum follow-up.
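The delete-in-chunks loop is simple enough to sketch. This hypothetical helper uses the ctid-subquery trick to bound each DELETE, commits between batches so autovacuum can keep up, and stops when a batch comes back short; the table name follows the article's schema and the batch size is a placeholder, not a recommendation.

```python
# Bounded DELETE: Postgres has no DELETE ... LIMIT, so select a batch of
# ctids for the tenant and delete exactly those rows.
DELETE_CHUNK_SQL = """
delete from invoice
where ctid in (
    select ctid from invoice
    where tenant_id = %(tenant_id)s
    limit %(batch)s
)
"""

def delete_tenant_rows(conn, tenant_id: str, batch: int = 10_000) -> int:
    """Delete all invoice rows for one tenant in chunks; returns total removed."""
    total = 0
    while True:
        with conn.cursor() as cur:
            cur.execute(DELETE_CHUNK_SQL, {"tenant_id": tenant_id, "batch": batch})
            deleted = cur.rowcount
        conn.commit()  # short transactions: less bloat held, smaller WAL bursts
        total += deleted
        if deleted < batch:
            return total
```

Pair it with a pause-and-check between batches on busy systems, and schedule an explicit VACUUM afterwards.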
The migration pitfalls are painfully predictable. Adding not null too early blocks writes. Creating a non-concurrent index on a hot table blocks writes harder. Validating foreign keys during peak load drags latency. Long-running read transactions on replicas can hold back cleanup. One forgotten code path that inserts rows without tenant_id will start throwing new row violates row-level security policy for table "invoice", which is annoying, though at least it fails closed. I prefer that kind of pain.
when to move past rls
RLS carries most startups a long way. It breaks down for reasons that are usually organizational or contractual before they're technical. A single tenant demands encryption key segregation with strict operator boundaries. A regulated customer needs data pinned to a region you don't currently use. One tenant drives 70% of write traffic and their monthly import job turns autovacuum into a permanent background fire. Those are good reasons to isolate.
The nice part about a clean RLS design is that it gives you an exit ramp. Every row already has tenant_id. Composite keys already encode ownership. Queries already assume a tenant scope. Moving one tenant to a dedicated schema or database becomes an extraction project, not a redesign.
The simplest path is usually logical export plus dual-write or short write-freeze cutover. Create the destination schema or database, replay schema migrations, copy tenant rows in dependency order, verify counts and checksums, block writes briefly, sync the tail, switch routing, and leave a read-only fallback window. For a dedicated database move, logical replication can help if your primary keys and update patterns cooperate, though plenty of teams overcomplicate this when a carefully scheduled maintenance window would have been cheaper and safer.
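For the "verify counts and checksums" step, an order-independent fingerprint per table is usually enough. This is a sketch, not a fixed scheme: it hashes a stable serialization of each row and XORs the digests so row order doesn't matter, and you'd stream rows from source and destination with the same column order and the same tenant filter.

```python
import hashlib

def table_fingerprint(rows):
    """Return (row_count, checksum) over an iterable of row tuples.

    XOR of per-row SHA-256 digests is order-independent, so the two
    sides can scan in whatever order their indexes prefer.
    """
    count, acc = 0, 0
    for row in rows:
        digest = hashlib.sha256(repr(tuple(row)).encode()).digest()
        acc ^= int.from_bytes(digest[:8], "big")
        count += 1
    return count, acc

def verify_copy(source_rows, dest_rows) -> bool:
    return table_fingerprint(source_rows) == table_fingerprint(dest_rows)
```

Run it once before the write freeze for a cheap early signal, then again on the tail after the final sync, right before you flip routing.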
My bias stays the same after a lot of production mileage. Start shared, force tenant discipline in the database, use RLS, run PgBouncer in transaction mode with explicit transactions, and invest in indexes and vacuum before the pain arrives. Move heavy tenants out only after they prove they deserve special treatment. Most of them won't. The ones that do will be easier to handle because you didn't begin with an architecture that optimized for edge cases and taxed every normal case.
