
Stop Choosing Microservices for Your ERP

Most startup ERP teams don't need microservices. Django 4.2 and PostgreSQL 15 will carry you much further, with less pain and fewer 2am surprises.

the architecture trap

ERP and CRM systems attract architecture cosplay faster than almost any other category of business software, partly because the domain feels large on day one. You can already picture inventory, invoicing, procurement, approval flows, customer records, email sync, and warehouse operations all colliding in one codebase, and that mental picture makes people reach for service boundaries long before they have any evidence those boundaries are real.

I've seen teams with twelve customers set up separate services for auth, billing, notifications, reporting, documents, and workflow automation, then spend the next six months building the glue code required to make their own system usable. They had RabbitMQ, Redis, OpenTelemetry, a service mesh they barely understood, and Terraform modules split across six repos, yet they still couldn't answer a basic support question without grep'ing logs across half the stack. One of them had a nightly reconciliation job that failed because one service published customer.updated with account_id as a string, another consumed it as an integer, and the only visible symptom was a dashboard count drifting by 37 records every morning.

This stuff sounds sophisticated until you're on call.

A startup ERP usually has one real job: encode messy business rules quickly, change them constantly, and keep the data consistent while you do it. A Django 4.2 monolith backed by PostgreSQL 15 is brutally good at that. You get transactions that actually span the business operation, admin tooling that your support team can use next week, migrations that are easy to reason about, one deployment artifact, one observability surface, one place to add tests, one place to put a breakpoint. That's not nostalgia; that's operational sanity.

At Steezr we build a lot of internal systems for SMBs and startups, ERP-ish platforms, customer portals, sales automation, document workflows, and the pattern keeps repeating. Teams that stay monolithic longer ship faster, debug faster, and spend their engineering budget on domain logic instead of distributed systems tax.

why django fits ERP

Django 4.2 has exactly the kind of boring power ERP software needs. The ORM is mature, the admin is still criminally underrated, the request lifecycle is obvious, and the ecosystem around forms, permissions, storage, background work, and testing is deep enough that you rarely need to get clever. Clever is expensive.

PostgreSQL 15 does even more of the heavy lifting than most teams admit. Row locking with select_for_update, partial indexes, materialized views for ugly reporting queries, JSONB for the occasional unstructured vendor payload, pg_trgm for fuzzy search on customer names, partitioning for event tables if you really need it, all in one database that your team can inspect with psql and understand. The number of business problems that become dramatically simpler once you stop pretending your data wants to be split across five databases is huge.

Schema-based multi-tenancy is a good fit here too. One database, one cluster, separate schemas per tenant, shared app code. You avoid the worst parts of row-based scoping bugs, where somebody forgets tenant_id in one query and suddenly Acme Corp can see Globex invoices, and you avoid the operational mess of spinning up a whole database per tiny customer too early. Packages like django-tenants are workable if you keep the model clear and avoid fighting the abstraction.
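Under the hood, schema-based tenancy mostly comes down to pointing each connection at the right schema via `search_path`, which is what django-tenants does per request. A minimal sketch of the idea (the helper name is hypothetical; the library handles this for you):

```python
import re

def search_path_sql(schema_name: str) -> str:
    """Build the SET search_path statement that scopes a connection to one
    tenant's schema. Hypothetical helper; django-tenants does the equivalent
    internally on every request.

    The schema name is validated as a bare identifier so it can't smuggle
    SQL into the statement.
    """
    if not re.fullmatch(r"[a-z][a-z0-9_]*", schema_name):
        raise ValueError(f"invalid schema name: {schema_name!r}")
    # Fall back to the shared 'public' schema for tables every tenant shares.
    return f"SET search_path TO {schema_name}, public"
```

Once that statement runs, every unqualified table name in a query resolves inside the tenant's schema first, so application code never has to remember a `tenant_id` filter.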

A very normal setup looks like this:

```python
# settings.py
import os

# django-tenants splits apps into those living in the shared schema and
# those migrated into every tenant schema.
SHARED_APPS = [
    "django_tenants",
    "customers",
]
TENANT_APPS = [
    "erp.core",
    "erp.sales",
    "erp.inventory",
    "erp.billing",
]
INSTALLED_APPS = SHARED_APPS + TENANT_APPS

DATABASES = {
    "default": {
        "ENGINE": "django_tenants.postgresql_backend",
        "NAME": "erp",
        "USER": "erp",
        "PASSWORD": os.environ["DB_PASSWORD"],
        "HOST": "db",
        "PORT": "5432",
    }
}

DATABASE_ROUTERS = ["django_tenants.routers.TenantSyncRouter"]

TENANT_MODEL = "customers.Client"
TENANT_DOMAIN_MODEL = "customers.Domain"
```

Then for async work, use Celery with Redis or RabbitMQ if you truly need broker semantics. PDF generation, EDI imports, webhook retries, OCR pipelines, email sending, all perfect async candidates. The mistake is turning every domain interaction into async choreography. create_invoice() should usually write the invoice in the same transaction, not emit three events and pray eventual consistency sorts it out.

distributed pain is real

The sales pitch for microservices always focuses on independent scaling and team autonomy, which sounds great in a 40-person platform org, less great in a six-engineer startup where two people know the production stack well enough to fix it under pressure. Every service you add creates another CI pipeline, another deploy, another set of secrets, another migration path, another alert channel, another failure mode that only appears under partial outage.

You don't notice the full cost during the architecture diagram phase. You notice it when the billing service times out waiting for the customer service, which is degraded because its connection pool is exhausted, which happened because a reporting job fan-out hit the database hard, and now your invoice finalization endpoint returns 502 Bad Gateway for only some tenants. Nginx logs show upstream failures, Sentry has fragments of the trace, Grafana says p95 jumped, and your engineers are manually correlating timestamps across three repos because the trace context header wasn't propagated by one internal client.

Message brokers add their own class of nonsense. Duplicate delivery, poison messages, stale consumers, schema drift, dead-letter queues that quietly fill for days because nobody added an alert. Teams who can barely keep synchronous code paths clean suddenly volunteer to manage eventual consistency and exactly-once semantics, neither of which exist in the simple way they think they do. Then come the compensating actions, idempotency keys, retries with exponential backoff, and the inevitable postmortem where someone says the architecture gave us flexibility.

Flexibility for what, exactly.

An ERP is full of coupled operations. Approve purchase order, reserve stock, create payable, update ledger, notify someone, maybe generate a PDF. Those actions are logically tied together. If the user expects one coherent result, your system should probably execute the core state changes inside one database transaction. Splitting that apart early buys theoretical elegance and very real bugs.

A monolith still fails, obviously. The difference is that most failures are visible, local, and fixable without opening seven dashboards.

build the monolith properly

A monolith doesn't mean a junk drawer. The codebase has to preserve boundaries inside one deployable unit, otherwise you end up with the same confusion as microservices, just without the network latency. Organize by domain, keep interfaces explicit, and ban random imports across modules unless the dependency direction is intentional.

A folder structure we like looks roughly like this:

```text
erp/
  sales/
    services/
    models.py
    selectors.py
    api.py
  inventory/
    services/
    models.py
    selectors.py
    api.py
  billing/
    services/
    models.py
    selectors.py
    api.py
  shared/
    money.py
    events.py
```

services/ holds state-changing business actions, selectors.py handles read queries, api.py is the thin boundary other modules call. No module reaches straight into another module's models unless you've explicitly blessed that dependency. That one rule cuts a lot of future extraction pain.
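You can enforce that rule mechanically rather than in code review. One tool for this is import-linter, whose independence contract fails CI when the modules import each other; a sketch assuming the package layout above (blessed dependencies can be whitelisted with `ignore_imports`):

```ini
# .importlinter  (sketch; assumes the erp/ package layout above)
[importlinter]
root_package = erp

[importlinter:contract:module-independence]
name = sales, inventory and billing stay independent
type = independence
modules =
    erp.sales
    erp.inventory
    erp.billing
```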

Use Postgres constraints aggressively. Unique constraints, check constraints, foreign keys, exclusion constraints where needed. If your stock reservation rules matter, encode them in the database as far as possible. Application code lies under load. Databases are less polite about it.
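The point about databases being less polite than application code is easy to demonstrate. A self-contained sketch using stdlib sqlite3 so it runs anywhere; the same CHECK and UNIQUE constraints behave identically in Postgres:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stock_reservation (
        id INTEGER PRIMARY KEY,
        order_id INTEGER NOT NULL,
        sku TEXT NOT NULL,
        quantity INTEGER NOT NULL CHECK (quantity > 0),
        UNIQUE (order_id, sku)  -- one reservation per order line
    );
""")

def reserve(order_id: int, sku: str, quantity: int) -> bool:
    """Returns False when the database rejects the write, no matter what
    the application code believed about its inputs."""
    try:
        with conn:  # transaction: commit on success, rollback on error
            conn.execute(
                "INSERT INTO stock_reservation (order_id, sku, quantity) VALUES (?, ?, ?)",
                (order_id, sku, quantity),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

A buggy code path that tries to reserve the same line twice, or reserve zero units, simply cannot corrupt the table.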

Use transaction.atomic() around business operations that must stay coherent:

```python
from django.db import transaction

@transaction.atomic
def approve_purchase_order(po_id: int, actor_id: int) -> None:
    po = (
        PurchaseOrder.objects
        .select_for_update()
        .get(id=po_id)
    )
    if po.status != PurchaseOrder.Status.SUBMITTED:
        raise InvalidState("PO must be submitted before approval")

    po.status = PurchaseOrder.Status.APPROVED
    po.approved_by_id = actor_id
    po.save(update_fields=["status", "approved_by_id"])

    LedgerEntry.objects.create(...)
    NotificationOutbox.objects.create(...)
```

Notice the outbox table. That's one of the few patterns I recommend early, because it gives you a clean bridge between synchronous state changes and async side effects. Commit business data and outbound events in the same transaction, then let a Celery worker publish emails, webhooks, or downstream messages later. No dual-write mess, no lost notifications if the worker crashes at the wrong moment.
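The outbox mechanics are easy to sketch without any broker. Using stdlib sqlite3 to stand in for Postgres, the business row and its event commit or roll back together, and a worker later drains pending events (table and function names are illustrative):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoice (id INTEGER PRIMARY KEY, total_cents INTEGER NOT NULL);
    CREATE TABLE outbox (
        id INTEGER PRIMARY KEY,
        payload TEXT NOT NULL,
        sent INTEGER NOT NULL DEFAULT 0
    );
""")

def create_invoice(total_cents: int) -> int:
    # One transaction: the invoice and its outbound event commit together,
    # so a crash can never produce an invoice with no event, or vice versa.
    with conn:
        cur = conn.execute("INSERT INTO invoice (total_cents) VALUES (?)", (total_cents,))
        invoice_id = cur.lastrowid
        conn.execute(
            "INSERT INTO outbox (payload) VALUES (?)",
            (json.dumps({"event": "invoice.created", "invoice_id": invoice_id}),),
        )
    return invoice_id

def drain_outbox(publish) -> int:
    """What the Celery worker does: publish pending events, mark them sent."""
    rows = conn.execute("SELECT id, payload FROM outbox WHERE sent = 0").fetchall()
    for row_id, payload in rows:
        publish(json.loads(payload))
        with conn:
            conn.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (row_id,))
    return len(rows)
```

If the worker dies mid-drain, the unsent rows are still there on the next run; delivery becomes at-least-once, which is why the consumer side wants idempotency.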

That architecture scales surprisingly far. Usually far enough that your next problem is product-market fit, not service decomposition.

split only on pain

Service extraction should happen because a specific bottleneck keeps hurting you, not because somebody read about bounded contexts and got excited. I want a written reason, with numbers. CPU saturation on one module. A deployment cadence conflict between teams that actually exists. Compliance requirements forcing isolation. A data model that diverges enough that shared transactions are now the bigger problem. Anything less concrete is architecture fan fiction.

The safest extraction path starts inside the monolith, with a stable interface and no network. Define an adapter boundary first:

```python
# billing/api.py
from typing import Protocol

class BillingGateway(Protocol):
    def create_invoice(self, order_id: int) -> str: ...

class LocalBillingGateway:
    def create_invoice(self, order_id: int) -> str:
        # delegates to the existing in-process service function
        return create_invoice(order_id)
```

Wire it through settings or dependency injection, then add a remote implementation behind a feature flag.

```python
# settings.py
import os

BILLING_BACKEND = os.getenv("BILLING_BACKEND", "local")
```

```python
import httpx

class RemoteBillingGateway:
    def create_invoice(self, order_id: int) -> str:
        r = httpx.post(
            "http://billing:8000/api/invoices/",
            json={"order_id": order_id},
            timeout=5.0,
        )
        r.raise_for_status()
        return r.json()["invoice_id"]
```

Now you can shadow traffic, enable it for one tenant, compare outputs, and roll back without rewriting the rest of the application. That is adult engineering.
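Shadow comparison itself is a few lines: serve from the existing implementation, run the candidate on the side, and record any divergence without ever affecting the response. A sketch with hypothetical names, taking the two gateways as plain callables:

```python
import logging

logger = logging.getLogger("billing.shadow")

def create_invoice_shadowed(order_id: int, local, remote) -> str:
    """Serve the local result; run the remote gateway in shadow mode and
    log mismatches or failures instead of surfacing them to the user."""
    result = local(order_id)
    try:
        shadow = remote(order_id)
        if shadow != result:
            logger.warning(
                "shadow mismatch for order %s: %r != %r", order_id, result, shadow
            )
    except Exception:
        logger.exception("shadow call failed for order %s", order_id)
    return result
```

A week of clean shadow logs is far better evidence than any architecture diagram that the extraction is safe to flip on.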

For data, I usually prefer database-per-service once a service is truly extracted, because shared databases keep hidden coupling alive forever. During migration, copy data with CDC or scheduled sync, keep one side authoritative, and make the ownership painfully clear in docs and code reviews. Feature flags help here too. django-waffle is perfectly adequate for tenant-scoped rollout.

A reasonable sequence looks like this: first isolate code paths inside the monolith, second add API contracts and contract tests, third mirror writes or reads for a narrow slice, fourth switch one tenant or one workflow, fifth measure support load and operational noise before expanding. If the new service creates more pager noise than business value, kill it. Reversibility matters more than architectural purity.

a sane starting stack

If I were starting an ERP or CRM platform next Monday for a startup with actual delivery pressure, I'd pick Django 4.2 LTS, PostgreSQL 15, Celery 5.3, Redis 7 for caching and queueing unless broker guarantees really matter, HTMX for internal backoffice interactions that don't deserve a full SPA, and Next.js 14 only where the customer-facing surface actually benefits from it. A lot of admin-heavy business software gets worse when every screen becomes a frontend architecture exercise.

Deployment can stay boring too. One app container, one worker container, one beat scheduler if you need periodic tasks, Nginx or Caddy in front, managed Postgres if budget allows. Put it on Fly.io, Render, Railway, ECS, a small Kubernetes cluster if your team already knows Kubernetes well, doesn't matter much. What matters is that everybody on the team can trace a request from browser to database without reading a platform runbook first.

Monitoring should also stay boring. Sentry for errors, Prometheus plus Grafana if you need metrics depth, structured JSON logs shipped to Loki or whatever your team already uses. Add django-silk or django-debug-toolbar in non-prod, inspect slow queries with pg_stat_statements, and fix the obvious stuff before anybody says the monolith can't scale. Half the time the bottleneck is an N+1 query buried in a serializer, not the architectural style.

I've watched too many teams spend months engineering around problems they did not yet have, while the actual customer pain sat untouched. The best ERP architecture for an early-stage company is the one that lets senior engineers ship accounting rules, approval chains, imports, role permissions, and ugly integrations quickly, while keeping production understandable. That's the monolith.

Earn the complexity later.

Written by Johnny Unar
