start with the default
Most B2B SaaS teams should choose shared tables with a tenant_id column plus Row Level Security, and they should choose it early. It keeps the data model simple, keeps the app code boring, and avoids the operational mess that shows up the moment every tenant becomes its own little snowflake: separate DDL drift, separate migrations, separate backup stories, and separate incidents at 2 a.m. I keep seeing teams reach for schema-per-tenant because it feels safer, then six months later they've got 1,400 schemas, Prisma or Django migrations that take forever, pg_dump jobs that need custom filtering, and one enterprise customer whose hotfix request turns into a migration fanout problem.
RLS gets a bad reputation from people who implemented it halfway, usually by setting a tenant variable on the connection and then sticking PgBouncer in transaction mode in front of it, which is how you end up leaking context between requests or randomly hitting ERROR: unrecognized configuration parameter "app.tenant_id" because the session state wasn't what you thought it was. That's not an RLS problem. That's an ops problem.
At Steezr we've used this pattern on customer portals, internal ERP-style systems, and AI document processing backends where every table eventually grows tenant semantics, even the ones people swore were global forever. Shared-table multi-tenancy lets a small senior team ship fast because every feature flows through one code path, one migration stream, one observability setup. You still need discipline, especially around composite indexes, foreign keys, and connection management, though the payoff is huge. Physical isolation only starts winning once you have contractual requirements, noisy neighbors you can't tame, tenant-specific extensions, or data residency constraints that won't fit inside one cluster design.
My default advice is blunt: pick RLS first, enforce tenant_id everywhere, and earn the right to move a few heavyweight tenants into isolated schemas or databases later.
where schemas actually help
Schema-per-tenant has a narrow valid use case, and the teams who are happy with it usually know exactly why they chose it. They have a small number of high-value tenants, each tenant needs custom objects or custom release timing, and the operational overhead is acceptable because the tenant count is measured in dozens, not thousands. In that world, a schema can be a decent boundary. You can move a tenant with pg_dump --schema=tenant_acme or logical replication, you can grant access more explicitly, and you can reason about storage at a slightly more physical level.
That doesn't make schemas pleasant. Catalog growth becomes real. Tooling gets weird. ORMs often assume one search_path or one migration history, then quietly betray you. search_path bugs are nasty because they often fail open, not closed. Somebody forgets to schema-qualify a function or sequence, tests pass, prod routes one tenant into the wrong object, and now you've got the kind of incident where every answer starts with "we believe impact was limited". Nobody enjoys writing that email.
A hybrid model is where plenty of mature startups eventually land. Shared tables and RLS for the main app, then one or two large tenants moved to dedicated schemas or separate databases because they need a custom retention policy, a regional deployment, or they hammer write throughput hard enough to justify isolation. That's a sane endpoint. It is a terrible starting point. Hybrid multiplies operational branches, and branches are where velocity goes to die.
Use schemas if you need physical-ish isolation inside one database. Use separate databases if a tenant genuinely deserves its own blast radius. Use RLS for the common path. That division holds up well in practice.
the migration that actually works
The clean migration path from single-tenant tables to multi-tenant RLS is boring, incremental, and full of backfills. Good. Boring migrations are the ones that finish.
Assume you started with something like accounts, projects, invoices, no tenant boundary beyond maybe an account_id that sort of implies ownership. First step, add a real tenant table and put tenant_id on every tenant-scoped row. Do it nullable first, backfill in chunks, then lock it down.
```sql
create table tenant (
  id uuid primary key default gen_random_uuid(),
  slug text not null unique,
  created_at timestamptz not null default now()
);

alter table project add column tenant_id uuid;
alter table invoice add column tenant_id uuid;

update project p
set tenant_id = a.tenant_id
from account a
where p.account_id = a.id
  and p.tenant_id is null;
```
Chunk the big backfills. If a table has 80 million rows, don't fire one monster update and pray. Use primary key ranges or ctid batching from the app or a maintenance script, commit every few thousand rows, and watch pg_stat_activity, pg_locks, and replica lag while it runs. Long-running transactions will punish you later by holding dead tuples hostage and stretching vacuum work into the next week.
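Here's a minimal sketch of what chunked backfilling can look like from a maintenance script. The table and column names follow the schema above; the integer-range batching assumes an integer surrogate key, so for UUID primary keys you'd batch by keyset pagination or ctid ranges instead. The connection object is any DB-API style connection, and the batch size is a starting guess, not a recommendation.

```python
# Hypothetical chunked-backfill helper: one short transaction per batch,
# so no single UPDATE holds locks or a snapshot for long.

def backfill_batches(min_id: int, max_id: int, batch_size: int):
    """Yield inclusive (lo, hi) ranges covering [min_id, max_id]."""
    lo = min_id
    while lo <= max_id:
        hi = min(lo + batch_size - 1, max_id)
        yield lo, hi
        lo = hi + 1

# Same statement as the backfill above, restricted to one key range.
BACKFILL_SQL = """
update project p
set tenant_id = a.tenant_id
from account a
where p.account_id = a.id
  and p.tenant_id is null
  and p.id between %(lo)s and %(hi)s
"""

def run_backfill(conn, min_id: int, max_id: int, batch_size: int = 5000):
    # Commit between batches so dead tuples stay cleanable and
    # replicas never fall far behind one giant transaction.
    for lo, hi in backfill_batches(min_id, max_id, batch_size):
        with conn.cursor() as cur:
            cur.execute(BACKFILL_SQL, {"lo": lo, "hi": hi})
        conn.commit()
```

Between batches is also where you'd check pg_stat_activity and replica lag, and pause if things look unhappy.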
Once data is backfilled, add constraints in a way that avoids full blocking where possible.
```sql
alter table project
  add constraint project_tenant_fk
  foreign key (tenant_id) references tenant(id) not valid;

alter table project validate constraint project_tenant_fk;

alter table project
  alter column tenant_id set not null;
```
Then fix uniqueness. Global unique indexes are a classic multi-tenant footgun. users(email) probably needs to become (tenant_id, email) unless your product semantics truly require global uniqueness.
```sql
create unique index concurrently user_tenant_email_uniq
  on app_user (tenant_id, lower(email));
```
After that, update every foreign key between tenant tables to include tenant_id where it matters. Single-column FKs allow cross-tenant references if ids are guessed or copied incorrectly. The safer pattern is composite keys.
```sql
alter table project add constraint project_tenant_id_id_uniq
  unique (tenant_id, id);

alter table invoice
  add constraint invoice_project_fk
  foreign key (tenant_id, project_id)
  references project (tenant_id, id)
  not valid;

alter table invoice validate constraint invoice_project_fk;
```
Now you can enable RLS, write policies, and force the app through them. Leave one admin role with BYPASSRLS for maintenance jobs, keep it away from request traffic, and audit every place that can assume it.
rls without self-sabotage
RLS itself is simple. The surrounding conventions decide whether it stays simple.
I prefer a dedicated setting for tenant context because it keeps SQL explicit and keeps the app from sprinkling tenant predicates across every query. In Postgres 16 you can still rely on current_setting(..., true) inside policies, which gives you null instead of exploding if the setting wasn't assigned. That makes failures easier to reason about, especially during migration.
```sql
alter table project enable row level security;
alter table invoice enable row level security;
alter table project force row level security;
alter table invoice force row level security;

create policy tenant_isolation_project on project
  using (tenant_id = nullif(current_setting('app.tenant_id', true), '')::uuid)
  with check (tenant_id = nullif(current_setting('app.tenant_id', true), '')::uuid);

create policy tenant_isolation_invoice on invoice
  using (tenant_id = nullif(current_setting('app.tenant_id', true), '')::uuid)
  with check (tenant_id = nullif(current_setting('app.tenant_id', true), '')::uuid);
```
Then in the app, every request starts a transaction and sets context with SET LOCAL, not SET, because SET LOCAL dies with the transaction and won't bleed into the next request on a reused connection.
```sql
begin;
set local app.tenant_id = '6f0b7f9d-6a1f-4c2c-b2ae-0d7b8a2b98ab';
select id, name from project order by created at desc limit 20;
commit;
```
That one detail matters a lot with PgBouncer. If you're in transaction pooling mode, session state is not yours after commit. People miss this, then spend days chasing phantom authorization bugs. If your framework doesn't make per-request transactions easy, fix that before rolling out RLS. Django 6.x can handle this cleanly with middleware plus transaction.atomic(). In a FastAPI or async SQLAlchemy stack, wrap the request unit of work and issue SET LOCAL on the same transaction-bound connection before any ORM query fires.
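As a rough sketch of that unit-of-work wrapper, here's a context manager over a plain DB-API connection. It uses Postgres's real set_config(name, value, is_local) function instead of interpolating into a SET LOCAL string, which lets the tenant id travel as a bind parameter; everything else about the shape (names, error handling) is an assumption, and a Django or FastAPI integration would do the same thing inside its own middleware or dependency.

```python
from contextlib import contextmanager

@contextmanager
def tenant_scope(conn, tenant_id: str):
    """Run a block inside one transaction with SET LOCAL tenant context.

    set_config(..., true) is local to the transaction, so the setting
    dies at commit/rollback and cannot leak through a pooled connection.
    """
    cur = conn.cursor()
    try:
        cur.execute("begin")
        cur.execute("select set_config('app.tenant_id', %s, true)", (tenant_id,))
        yield cur
        cur.execute("commit")
    except Exception:
        cur.execute("rollback")
        raise
```

The important invariant is that the SET LOCAL (or set_config) happens on the same transaction-bound connection as every query the request runs, before the first of them fires.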
One more thing, test the negative path. Create fixtures for tenant A and tenant B, run the same repository methods under both, and assert zero rows cross the boundary. Add a CI check that every tenant-scoped table has RLS enabled and force row level security set. Humans forget. Databases don't.
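That CI check can be small. The catalog columns queried below (pg_class.relrowsecurity and relforcerowsecurity) are real Postgres fields; the exemption list and function names are assumptions you'd adapt to your schema, and in CI you'd feed the function rows fetched from a migrated test database.

```python
# Tables in public that are missing RLS or FORCE ROW LEVEL SECURITY.
RLS_AUDIT_SQL = """
select c.relname
from pg_class c
join pg_namespace n on n.oid = c.relnamespace
where n.nspname = 'public'
  and c.relkind = 'r'
  and (not c.relrowsecurity or not c.relforcerowsecurity)
"""

# Tables that are legitimately global -- keep this list short and reviewed.
EXEMPT = {"tenant", "schema_migrations"}

def missing_rls(rows):
    """Given rows from RLS_AUDIT_SQL, return offending table names."""
    return sorted(name for (name,) in rows if name not in EXEMPT)

def assert_rls_everywhere(rows):
    bad = missing_rls(rows)
    assert not bad, f"tables without forced RLS: {bad}"
```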
pgbouncer and pooling
Most RLS horror stories are really PgBouncer stories. Transaction pooling is still the right choice for a lot of web traffic because it gives you much better connection density, especially on smaller Postgres boxes where max_connections=500 would just convert RAM into sadness, though you have to shape the app around it. That means explicit transactions around request work, SET LOCAL for tenant context, and no assumptions about prepared statements surviving forever unless your driver and pooling mode are configured for it.
A minimal PgBouncer config for this pattern looks like this:
```ini
[databases]
app = host=127.0.0.1 port=5432 dbname=app

[pgbouncer]
pool_mode = transaction
max_client_conn = 2000
default_pool_size = 50
reserve_pool_size = 20
reserve_pool_timeout = 3
server_reset_query = DISCARD ALL
server_reset_query_always = 0
ignore_startup_parameters = extra_float_digits,options
```
DISCARD ALL helps clean session junk on release, though it doesn't rescue bad app behavior inside the transaction, and it has a cost, which is usually acceptable for request-response workloads. If you're running a lot of prepared statements through drivers that don't play nicely with transaction pooling, you'll see errors like prepared statement "__asyncpg_stmt_a__" does not exist or weird performance cliffs from repeated parse/plan churn. You can either disable server-side prepares in the driver, tune statement cache settings, or keep a separate session-pooled path for migrations, admin tasks, and workloads that truly need sticky sessions.
We usually split traffic. App requests go through transaction pooling. Background workers with heavier transaction semantics may get their own PgBouncer database entry, sometimes session pooled, sometimes direct. Migrations go direct to Postgres or to a tightly controlled session pool because schema changes plus transaction pooling is where sharp edges multiply.
Also, watch idle in transaction. RLS plus SET LOCAL encourages transaction wrappers, which is fine until somebody opens one and then waits on an HTTP call or an S3 upload. Now autovacuum can't clean tuples touched after that snapshot, locks stick around longer than expected, and you get a surprise incident that started with one lazy service method.
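A cheap watchdog over pg_stat_activity catches this class of bug early. The columns referenced (pid, state, state_change) are real; the row shape and the 30-second threshold are assumptions, a sketch of the check rather than a finished monitor.

```python
from datetime import datetime, timedelta, timezone

def idle_in_tx_offenders(rows, now=None, threshold=timedelta(seconds=30)):
    """Return pids that have sat idle in transaction past the threshold.

    rows: iterable of (pid, state, state_change), as returned by
    `select pid, state, state_change from pg_stat_activity`.
    """
    now = now or datetime.now(timezone.utc)
    return [
        pid
        for pid, state, state_change in rows
        if state == "idle in transaction" and now - state_change > threshold
    ]
```

Alert on offenders, or set idle_in_transaction_session_timeout server-side as a backstop.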
indexes, vacuum, and ugly failure modes
Postgres 16 handles multi-tenant shared tables very well if you stop pretending the tenant predicate is optional. Almost every hot query in the app should lead with tenant_id in the index, because your real access pattern is rarely where status = 'open', it's where tenant_id = $1 and status = 'open' order by created_at desc limit 50. Build indexes for the workload you actually have.
```sql
create index concurrently invoice_tenant_created_idx
  on invoice (tenant_id, created_at desc);

create index concurrently invoice_tenant_status_created_idx
  on invoice (tenant_id, status, created_at desc)
  where deleted_at is null;
```
That partial index is often worth more than a generic one on a soft-deleted table. Keep index count under control, though. Shared tables with many secondary indexes amplify write cost for every tenant, and one oversized enterprise tenant can punish everyone else.
Autovacuum tuning matters earlier in shared-table systems because bloat from one high-churn tenant accumulates in the same heap and indexes everybody touches. For the busiest tables we often set per-table knobs instead of global panic.
```sql
alter table invoice set (
  autovacuum_vacuum_scale_factor = 0.02,
  autovacuum_analyze_scale_factor = 0.01,
  autovacuum_vacuum_threshold = 5000,
  autovacuum_analyze_threshold = 5000
);
```
Watch pg_stat_user_tables.n_dead_tup, last_autovacuum, vacuum_count, and index size growth. If bloat gets bad enough, reindex index concurrently or pg_repack can save you without a long outage, though pg_repack brings its own operational ceremony. Large deletes are another trap. If you're offboarding a tenant and deleting 200 million rows from shared tables, you'll create a wake of bloat and WAL. Archive first if you need to, delete in chunks, and plan for vacuum follow-up.
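The delete-in-chunks loop is simple enough to sketch. This hypothetical helper uses the ctid-subquery trick to bound each DELETE, commits between batches so autovacuum can keep up, and stops when a batch comes back short; the table name follows the article's schema and the batch size is a placeholder, not a recommendation.

```python
# Bounded DELETE: Postgres has no DELETE ... LIMIT, so select a batch of
# ctids for the tenant and delete exactly those rows.
DELETE_CHUNK_SQL = """
delete from invoice
where ctid in (
    select ctid from invoice
    where tenant_id = %(tenant_id)s
    limit %(batch)s
)
"""

def delete_tenant_rows(conn, tenant_id: str, batch: int = 10_000) -> int:
    """Delete all invoice rows for one tenant in chunks; returns total removed."""
    total = 0
    while True:
        with conn.cursor() as cur:
            cur.execute(DELETE_CHUNK_SQL, {"tenant_id": tenant_id, "batch": batch})
            deleted = cur.rowcount
        conn.commit()  # short transactions: less bloat held, smaller WAL bursts
        total += deleted
        if deleted < batch:
            return total
```

Pair it with a pause-and-check between batches on busy systems, and schedule an explicit VACUUM afterwards.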
The migration pitfalls are painfully predictable. Adding not null too early blocks writes. Creating a non-concurrent index on a hot table blocks writes harder. Validating foreign keys during peak load drags latency. Long-running read transactions on replicas can hold back cleanup. One forgotten code path that inserts rows without tenant_id will start throwing new row violates row-level security policy for table "invoice", which is annoying, though at least it fails closed. I prefer that kind of pain.
when to move past rls
RLS carries most startups a long way. It breaks down for reasons that are usually organizational or contractual before they're technical. A single tenant demands encryption key segregation with strict operator boundaries. A regulated customer needs data pinned to a region you don't currently use. One tenant drives 70% of write traffic and their monthly import job turns autovacuum into a permanent background fire. Those are good reasons to isolate.
The nice part about a clean RLS design is that it gives you an exit ramp. Every row already has tenant_id. Composite keys already encode ownership. Queries already assume a tenant scope. Moving one tenant to a dedicated schema or database becomes an extraction project, not a redesign.
The simplest path is usually logical export plus dual-write or short write-freeze cutover. Create the destination schema or database, replay schema migrations, copy tenant rows in dependency order, verify counts and checksums, block writes briefly, sync the tail, switch routing, and leave a read-only fallback window. For a dedicated database move, logical replication can help if your primary keys and update patterns cooperate, though plenty of teams overcomplicate this when a carefully scheduled maintenance window would have been cheaper and safer.
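For the "verify counts and checksums" step, an order-independent fingerprint per table is usually enough. This is a sketch, not a fixed scheme: it hashes a stable serialization of each row and XORs the digests so row order doesn't matter, and you'd stream rows from source and destination with the same column order and the same tenant filter.

```python
import hashlib

def table_fingerprint(rows):
    """Return (row_count, checksum) over an iterable of row tuples.

    XOR of per-row SHA-256 digests is order-independent, so the two
    sides can scan in whatever order their indexes prefer.
    """
    count, acc = 0, 0
    for row in rows:
        digest = hashlib.sha256(repr(tuple(row)).encode()).digest()
        acc ^= int.from_bytes(digest[:8], "big")
        count += 1
    return count, acc

def verify_copy(source_rows, dest_rows) -> bool:
    return table_fingerprint(source_rows) == table_fingerprint(dest_rows)
```

Run it once before the write freeze for a cheap early signal, then again on the tail after the final sync, right before you flip routing.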
My bias stays the same after a lot of production mileage. Start shared, force tenant discipline in the database, use RLS, run PgBouncer in transaction mode with explicit transactions, and invest in indexes and vacuum before the pain arrives. Move heavy tenants out only after they prove they deserve special treatment. Most of them won't. The ones that do will be easier to handle because you didn't begin with an architecture that optimized for edge cases and taxed every normal case.
