Johnny Unar · 11 min read

API rewrites are governance failures

Most API rewrites happen because teams skipped contracts, versioning, and deprecation discipline. A small amount of governance fixes that without slowing delivery.

rewrites don’t come out of nowhere

Most API rewrites get framed as a technical inevitability: old framework, wrong database, bad naming, a few regrettable endpoints from 2021. I don't buy that framing. The expensive rewrite usually shows up after a long period of ungoverned change, where every team ships whatever payload shape feels convenient, nobody knows which clients depend on which fields, and breaking changes leak into production because the only contract is whatever the frontend happened to parse last week.

I've seen this pattern in SaaS teams at twenty people and again at a hundred; the setup barely changes. One squad renames customer_id to accountId, another starts returning null for a field that used to be omitted, mobile pins an old response shape for six months because App Store review takes time, then leadership concludes the API is "messy" and needs a ground-up replacement. What they actually have is a governance problem, one that should have been solved with explicit schemas, compatibility checks in CI, and a deprecation policy someone owns.

The bad news is that no framework saves you from this. You can build the whole thing in Next.js route handlers, Django Ninja, FastAPI, Spring Boot 3.2, whatever you like, and still end up shipping accidental breakage every sprint if the contract lives in tribal memory. The good news is that the fix is boring, cheap, and works with a small team. Write the API schema first, store it in the repo, validate it in CI, make consumers declare expectations, and force every breaking change to go through an intentional versioning decision.

That sounds heavier than a rewrite pitch deck. It isn't. At Steezr we've done this on internal ERP-style systems and customer-facing portals where speed mattered more than process theater, and the teams that kept a tight contract discipline moved faster after month three because they stopped arguing about payload shape in pull requests and stopped discovering breakage through Slack screenshots.

openapi 3.1 as source of truth

OpenAPI 3.1 is finally good enough to treat as the real contract, largely because it aligns with JSON Schema 2020-12 instead of inventing a weird almost-schema dialect. That matters more than people admit. Once your types, nullability rules, enums, formats, and object constraints are expressed in a spec that standard tools understand, you can lint it, diff it, generate mocks from it, validate requests against it, and stop pretending your TypeScript types are a public API contract.

A minimal schema-first setup doesn't need a platform team. Put openapi.yaml in the service repo, review it like code, and fail CI if it breaks compatibility. This is enough to start:

yaml
openapi: 3.1.0
info:
  title: Customer API
  version: 1.4.0
servers:
  - url: https://api.example.com
paths:
  /customers/{id}:
    get:
      operationId: getCustomer
      parameters:
        - name: id
          in: path
          required: true
          schema:
            type: string
            format: uuid
      responses:
        '200':
          description: Customer found
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Customer'
components:
  schemas:
    Customer:
      type: object
      required: [id, email, status]
      additionalProperties: false
      properties:
        id:
          type: string
          format: uuid
        email:
          type: string
          format: email
        status:
          type: string
          enum: [active, suspended]
        full_name:
          type: [string, 'null']

A few strong opinions here. Set additionalProperties: false unless you have a real reason not to, because silent field drift is how APIs become folklore. Be explicit about nullable values with type: [string, 'null'] in 3.1, don't hand-wave nullability in docs. Put stable operationId values on everything, because downstream tooling depends on them and renaming them casually is needless churn.
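To make those rules concrete, here is an illustrative hand-rolled check mirroring the Customer schema above. It is not a real validator, and in practice you would compile the schema with a JSON Schema 2020-12 validator such as Ajv, but it shows operationally what additionalProperties: false and explicit nullability buy you:

```javascript
// Illustrative only: a hand-rolled check mirroring the Customer schema above.
// In production, compile the actual schema with a JSON Schema 2020-12
// validator (e.g. Ajv) instead of writing checks by hand.
const ALLOWED = new Set(['id', 'email', 'status', 'full_name']);
const STATUSES = new Set(['active', 'suspended']);

function checkCustomer(payload) {
  const errors = [];
  // additionalProperties: false — unknown fields are errors, not silent drift
  for (const key of Object.keys(payload)) {
    if (!ALLOWED.has(key)) errors.push(`unexpected property: ${key}`);
  }
  for (const key of ['id', 'email', 'status']) {
    if (!(key in payload)) errors.push(`missing required property: ${key}`);
  }
  if ('status' in payload && !STATUSES.has(payload.status)) {
    errors.push(`invalid status: ${payload.status}`);
  }
  // type: [string, 'null'] — null is allowed, absence is also fine
  if ('full_name' in payload &&
      payload.full_name !== null &&
      typeof payload.full_name !== 'string') {
    errors.push('full_name must be a string or null');
  }
  return errors;
}

// A drifted payload: one invented field, one explicitly null field.
console.log(checkCustomer({
  id: '8f4c3d8e-4c2e-4f7d-90c0-9fcb2cfdb133',
  email: 'ops@example.com',
  status: 'active',
  legacy_flag: true, // caught by additionalProperties: false
  full_name: null    // fine: explicitly nullable
}));
// → ['unexpected property: legacy_flag']
```

The point is that every one of these checks already exists in the schema; the validator just enforces what the contract promises.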

For linting, Spectral is the obvious choice. A tiny .spectral.yaml catches plenty of sloppiness:

yaml
extends: ["spectral:oas"]
rules:
  operation-operationId: error
  operation-tags: off
  no-$ref-siblings: error
  info-contact: off
  oas3-api-servers: warn

Then wire it into CI with Redocly CLI or Spectral directly. If the spec is malformed, the build should fail before anybody debates implementation details.

consumer contracts catch the real breakage

OpenAPI tells you what the provider says the API is. Consumer-driven contracts tell you what clients actually rely on. You need both. Specs without consumer verification drift into optimistic fiction, especially once you have a web app, a mobile app, a partner integration, and some cron-driven backend consumer hitting the same endpoints for slightly different reasons.

Pact is still the practical choice here. A frontend or downstream service records the interactions it needs, publishes the pact, and the provider verifies that those expectations hold. The killer feature isn't fancy tooling, it's forcing a real conversation about dependency. If a consumer depends on status always existing and always being one of active or suspended, you find out in CI, not after a Friday deploy.

A simple consumer test in JavaScript with Pact v12 looks like this:

js
import { PactV3, MatchersV3 } from '@pact-foundation/pact';
import axios from 'axios';

const provider = new PactV3({
  consumer: 'billing-portal',
  provider: 'customer-api'
});

describe('GET /customers/:id', () => {
  it('returns a billable customer', async () => {
    provider
      .given('customer 8f4c3d8e-4c2e-4f7d-90c0-9fcb2cfdb133 exists')
      .uponReceiving('a request for a customer')
      .withRequest({
        method: 'GET',
        path: '/customers/8f4c3d8e-4c2e-4f7d-90c0-9fcb2cfdb133'
      })
      .willRespondWith({
        status: 200,
        headers: { 'Content-Type': 'application/json' },
        body: {
          id: MatchersV3.uuid('8f4c3d8e-4c2e-4f7d-90c0-9fcb2cfdb133'),
          email: MatchersV3.email('ops@example.com'),
          status: MatchersV3.regex('active|suspended', 'active')
        }
      });

    await provider.executeTest(async mockServer => {
      const res = await axios.get(`${mockServer.url}/customers/8f4c3d8e-4c2e-4f7d-90c0-9fcb2cfdb133`);
      expect(res.data.status).toBe('active');
    });
  });
});

Then the provider verifies those pacts during its own pipeline. If someone removes status or changes the endpoint to return customerStatus, verification fails, loudly. Good. That's the exact moment you want friction.

This does require discipline. Consumers should publish pacts on every main-branch build. Providers should verify against the latest deployed consumer versions plus main, not just whatever happens to be convenient. If you skip that, Pact becomes another badge on the README. Used properly, it prevents the casual breakages that accumulate into rewrite pressure.
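A provider-side setup along those lines might look like the sketch below. The broker URL and environment variables are placeholders for your own pipeline values; the selector names come from Pact's consumer version selectors:

```javascript
// Sketch of provider verification options. PACT_BROKER_URL and GIT_COMMIT
// are placeholders for values your pipeline would supply.
const verifierOptions = {
  provider: 'customer-api',
  providerBaseUrl: 'http://localhost:8080',
  pactBrokerUrl: process.env.PACT_BROKER_URL,
  // Verify against consumers' main branches AND whatever is actually
  // deployed or released — not just the most convenient pact.
  consumerVersionSelectors: [
    { mainBranch: true },
    { deployedOrReleased: true }
  ],
  publishVerificationResult: true,
  providerVersion: process.env.GIT_COMMIT
};

// In the provider's test suite:
// import { Verifier } from '@pact-foundation/pact';
// await new Verifier(verifierOptions).verifyProvider();
```

The two selectors are the whole trick: they encode "who depends on me right now" instead of "who published a pact most recently".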

ci should block schema drift

Governance only works if the machine enforces it. A Confluence page saying "please avoid breaking changes" is decoration. CI needs to answer four questions on every pull request: is the OpenAPI file valid, does it follow your style rules, is it backward compatible, and do provider changes still satisfy known consumers.

For OpenAPI diffing, oasdiff is excellent and brutally clear. Compare the branch spec against main, fail on breaking changes, and print the exact violation. This kind of output gets engineers' attention:

text
error: breaking changes detected
- response body property 'status' was removed from GET /customers/{id} 200 application/json
- request parameter 'id' format changed from 'uuid' to 'string' in GET /customers/{id}

A GitHub Actions workflow can stay tiny:

yaml
name: api-contract
on:
  pull_request:
    paths:
      - 'openapi.yaml'
      - 'src/**'
      - '.github/workflows/api-contract.yaml'
jobs:
  contract:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - name: Install tools
        run: |
          npm i -g @stoplight/spectral-cli @redocly/cli
          curl -L https://github.com/oasdiff/oasdiff/releases/download/v2.8.1/oasdiff_2.8.1_linux_amd64.tar.gz | tar xz
          sudo mv oasdiff /usr/local/bin/
      - name: Lint spec
        run: spectral lint openapi.yaml
      - name: Validate spec
        run: redocly lint openapi.yaml
      - name: Compare with main
        run: |
          git show origin/main:openapi.yaml > openapi-main.yaml
          oasdiff breaking openapi-main.yaml openapi.yaml
      - name: Verify pact
        run: npm run pact:verify

If your service is Django 5.0 with DRF or Ninja, generate a runtime schema only as a secondary check, then compare it to the committed spec. Same idea for a Next.js 14 backend using route handlers and Zod. The committed contract stays primary because generated specs often reflect implementation accidents. A serializer tweak should not silently redefine the public API.
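A minimal version of that secondary check could compare the generated spec's operations against the committed contract. The structure below is illustrative, assuming both specs have already been parsed from YAML or JSON into plain objects:

```javascript
// Sketch: flag operations that exist in the runtime-generated spec but not
// in the committed contract, or vice versa. Both specs are assumed to be
// already parsed into plain objects.
function operationIds(spec) {
  const ids = new Set();
  for (const methods of Object.values(spec.paths ?? {})) {
    for (const op of Object.values(methods)) {
      if (op && op.operationId) ids.add(op.operationId);
    }
  }
  return ids;
}

function specDrift(committed, generated) {
  const a = operationIds(committed);
  const b = operationIds(generated);
  return {
    undocumented: [...b].filter(id => !a.has(id)),  // shipped but not in the contract
    unimplemented: [...a].filter(id => !b.has(id))  // promised but not served
  };
}

const committed = {
  paths: { '/customers/{id}': { get: { operationId: 'getCustomer' } } }
};
const generated = {
  paths: {
    '/customers/{id}': { get: { operationId: 'getCustomer' } },
    '/customers/export': { get: { operationId: 'exportCustomers' } }
  }
};
console.log(specDrift(committed, generated));
// → { undocumented: ['exportCustomers'], unimplemented: [] }
```

Fail the build on either list being non-empty and the committed spec stays honest in both directions.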

One more thing, fail the build if endpoints marked deprecated stay deprecated forever. Teams love adding deprecated: true and then never removing anything. Governance without cleanup turns into sediment.
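One way to enforce that cleanup is a small CI script over the parsed spec. The x-removal-date extension here is a convention invented for this sketch, not part of OpenAPI itself:

```javascript
// Sketch: fail CI when a deprecated operation has no removal date, or the
// date has passed. x-removal-date is a made-up extension for this example,
// not an OpenAPI standard field.
function overdueDeprecations(spec, today = new Date()) {
  const overdue = [];
  for (const [path, methods] of Object.entries(spec.paths ?? {})) {
    for (const [method, op] of Object.entries(methods)) {
      if (!op || !op.deprecated) continue;
      const removal = op['x-removal-date'];
      if (!removal || new Date(removal) <= today) {
        overdue.push(`${method.toUpperCase()} ${path}`);
      }
    }
  }
  return overdue;
}

const spec = {
  paths: {
    '/v1/customers/{id}': {
      get: { deprecated: true, 'x-removal-date': '2024-01-01' }
    }
  }
};
// Anything returned here should fail the build.
console.log(overdueDeprecations(spec, new Date('2026-01-01')));
// → ['GET /v1/customers/{id}']
```

Deprecations with no date at all fail too, which is deliberate: a deprecation without a removal date is just a comment.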

versioning and deprecation people obey

Versioning policy needs to be boring enough that nobody asks for exceptions every sprint. I prefer URI-major versioning for public APIs, /v1/customers, /v2/customers, because it is obvious in logs, obvious in dashboards, obvious to customers, and hard to misunderstand. Header-based versioning always gets sold as elegant and then half the tooling forgets to send the header. Elegance doesn't help during an incident.

Major versions are for breaking changes only. Additive fields, new optional query params, extra enum values if consumers are prepared for them, those stay within the same major version. Removing a field, changing nullability, tightening validation rules, changing semantics of a status code, all of that is breaking. Treat it as such. If your team debates this constantly, write examples into the policy and stop re-litigating.

Deprecation also needs dates, not vibes. Mark the field or endpoint deprecated in OpenAPI, announce the replacement, set a removal date at least 90 days out for internal consumers and usually longer for external ones, and track actual usage in logs. If nobody can answer "who still calls /v1/customers/{id} with the old response shape", you are operating blind.

OpenAPI supports this directly:

yaml
paths:
  /v1/customers/{id}:
    get:
      deprecated: true
      summary: Deprecated, use /v2/customers/{id}
      responses:
        '200':
          description: Customer found

For fields, document deprecation in the schema description and changelog, then alert consumers before removal. Better yet, expose deprecation headers like Deprecation: true and Sunset: Fri, 31 Jul 2026 23:59:59 GMT for endpoints on the way out. This isn't overkill. It's basic operational hygiene.
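In an Express-style stack this can be a one-line middleware. A sketch, with the route and date as placeholders; note that the boolean Deprecation value follows the earlier IETF draft, while the final RFC for the header uses a timestamp:

```javascript
// Sketch: Express-style middleware that stamps deprecation metadata on
// responses from endpoints scheduled for removal. Route and date below are
// placeholder values; the boolean Deprecation value follows the earlier
// IETF draft form used in this article.
function sunset(removalDate) {
  return (req, res, next) => {
    res.setHeader('Deprecation', 'true');
    res.setHeader('Sunset', removalDate.toUTCString());
    res.setHeader('Link', '</v2/customers>; rel="successor-version"');
    next();
  };
}

// app.use('/v1/customers', sunset(new Date('2026-07-31T23:59:59Z')));
```

Well-behaved clients and API gateways can watch for these headers, which turns the removal date from tribal knowledge into machine-readable fact.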

Small teams usually resist this because they fear process drag. Fair concern. The trick is keeping the policy short enough to fit in one screen and strict enough that engineers don't improvise. Once it's habitual, velocity goes up because fewer changes require archaeology.

the checklist for next week

If your API surface is growing and nobody wants a governance committee, start with a one-week rule set.

First, every externally consumed endpoint must exist in openapi.yaml, merged in the same pull request as the code change. No exceptions, no "we'll document it later" tickets. Later never happens.

Second, CI must run spec linting, compatibility diffing against main, and Pact provider verification. If one of those checks fails, the pull request stays red. Engineers adapt fast once the rule is real.

Third, every breaking change needs three explicit fields in the PR template: affected consumers, migration path, removal date. If the author can't fill that out, the change isn't ready. This catches a shocking number of impulsive API edits.

Fourth, assign one engineer each week as API reviewer. Not architecture czar, not process owner, just the person who checks that naming, error shape, pagination style, and deprecation metadata stay consistent. Rotating this works fine on a team of five or six.

Fifth, log consumer identifiers. Even a simple API key mapping or User-Agent convention is enough to tell who still uses old paths. Teams skip this, then guess during deprecations, and guessing is how you end up supporting dead versions for a year.
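Even the crude version pays off. Here is a sketch that resolves a consumer label from an API key or User-Agent for access logs; the key table and the "app-name/version" convention are made-up examples:

```javascript
// Sketch: resolve a consumer label from an API key or User-Agent so access
// logs can answer "who still calls /v1". The key table and the
// "app-name/version" User-Agent convention are invented for this example.
const API_KEY_OWNERS = {
  'key-billing-portal': 'billing-portal',
  'key-partner-acme': 'partner-acme'
};

function consumerLabel(headers) {
  const key = headers['x-api-key'];
  if (key && API_KEY_OWNERS[key]) return API_KEY_OWNERS[key];
  const ua = headers['user-agent'] ?? '';
  // Convention: first-party clients send "app-name/version"
  const match = ua.match(/^([a-z0-9-]+)\//);
  return match ? match[1] : 'unknown';
}

console.log(consumerLabel({ 'x-api-key': 'key-billing-portal' })); // → 'billing-portal'
console.log(consumerLabel({ 'user-agent': 'mobile-app/3.2.1' }));  // → 'mobile-app'
console.log(consumerLabel({}));                                    // → 'unknown'
```

Attach that label to every access log line and deprecation decisions become a query instead of a guess.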

We've used lightweight versions of this on projects where the stack was Next.js on the edge, on Django plus PostgreSQL for heavier business workflows, even on odd little HTMX-backed internal tools that still needed stable server contracts because other systems scraped or posted into them. Same principle every time, write the contract down, verify it automatically, make breakage intentional. Rewrites get a lot less tempting once the API stops drifting under everyone's feet.
