Johnny Unar · 10 min read

Stop Forcing Analytics Through Django ORM

Reporting code packed into chained QuerySets gets slow, fragile, and unreadable. Put heavy analytics back in SQL, where Postgres can actually help you.

the orm is the wrong tool

Django's ORM is excellent for transactional application code, CRUD screens, permission filters, and the kind of business logic where readability matters more than squeezing every last drop out of the query planner. Reporting is a different job entirely. The moment a query needs five joins, a partitioned rank, a rolling 30-day metric, conditional aggregates, and a couple of carefully placed filters to avoid double counting, trying to express that as chained QuerySets becomes a small act of self-harm.

We've inherited codebases where a monthly revenue report lived inside a manager method with twelve annotate() calls, three nested Subquery() expressions, OuterRef() flying around like shrapnel, and a final .values() that silently changed the grouping semantics because one extra column snuck in during a refactor. Nobody wanted to touch it, everybody feared it, and the generated SQL looked like something the ORM emitted after losing an argument with itself. Django did exactly what it was asked to do, which is the problem.

SQL already has names for the things reporting needs. Common table expressions let you break a query into stages that humans can read. Window functions let you compute ranks, moving averages, and percentiles without dragging half the dataset into Python. FILTER (WHERE ...) is cleaner than spraying Case and When expressions through every aggregate. date_trunc, generate_series, lag, lead, dense_rank, sum(...) over (...): these all exist for a reason.
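As a sketch of the difference, assuming the orders table used later in this post: one FILTER clause per aggregate, no conditional-expression pyramid.

```sql
-- paid vs. total per day, without a CASE WHEN inside every aggregate
select
    created_at::date as day,
    count(*) as orders_total,
    count(*) filter (where status = 'paid') as orders_paid,
    sum(total_cents) filter (where status = 'paid') as revenue_cents
from orders
group by created_at::date
order by day;
```

The ORM spelling of the same thing needs an annotate() per column, each wrapping Count or Sum around a filtered Q object, and the grouping is implicit in whatever values() call happens to precede it.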

A lot of teams treat raw SQL as some dangerous escape hatch, then happily maintain 200 lines of ORM incantations that only one developer understands. That's upside down. For reporting and analytics, SQL is the primary language. Django is the integration layer around it.

write the query you mean

Start with SQL in a .sql file, not in Python, not in a model manager, and definitely not by reverse engineering whatever str(queryset.query) happens to print. Write the query the database wants to execute, then version it like code. For a sales funnel report, a shape like this is usually clearer than any ORM equivalent:

```sql
with base_orders as (
    select
        o.id,
        o.customer_id,
        o.created_at::date as order_date,
        o.total_cents,
        o.status
    from orders o
    where o.created_at >= current_date - interval '90 days'
),
paid_orders as (
    select *
    from base_orders
    where status = 'paid'
),
daily_totals as (
    select
        order_date,
        count(*) as paid_order_count,
        sum(total_cents) as revenue_cents
    from paid_orders
    group by order_date
),
series as (
    select generate_series(
        current_date - interval '89 days',
        current_date,
        interval '1 day'
    )::date as day
)
select
    s.day,
    coalesce(d.paid_order_count, 0) as paid_order_count,
    coalesce(d.revenue_cents, 0) as revenue_cents,
    sum(coalesce(d.revenue_cents, 0)) over (
        order by s.day
        rows between 6 preceding and current row
    ) as rolling_7d_revenue_cents
from series s
left join daily_totals d on d.order_date = s.day
order by s.day;
```

You can read this top to bottom, change one stage without wrecking the rest, and run EXPLAIN (ANALYZE, BUFFERS) directly in psql or TablePlus. That matters. A query you can't inspect properly is a query you can't tune.

Postgres 15 is especially nice here because since version 12 CTEs are no longer an automatic optimization fence, so the old fear of forced materialization mostly belongs to history. You still need to understand execution plans, still need the right indexes, still need to watch row estimates, though now you can structure a complex report in a way that doesn't punish readability by default.
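Since Postgres 12 you can also state the intent per CTE instead of relying on planner defaults; a sketch against the orders table above:

```sql
-- compute the CTE once and reuse the result (the pre-12 behavior, now opt-in)
with paid as materialized (
    select id, customer_id, total_cents from orders where status = 'paid'
)
select count(*) from paid;

-- let the planner inline the CTE so the outer predicate can reach the base table
with recent as not materialized (
    select id, created_at from orders
)
select count(*) from recent
where created_at >= current_date - interval '7 days';
```

EXPLAIN on both variants shows the difference directly: the materialized version scans the CTE result, the inlined version pushes the date filter into the scan on orders.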

At Steezr we've done this on ERP dashboards and customer portals where the app lives in Django, the API is clean, the product team gets the metrics they want, and nobody has to debug an annotate() chain during a production incident.

ship sql through migrations

If the query matters, it belongs under version control and deployment discipline, same as schema changes. Tossing raw SQL into a helper string inside Python means nobody knows when it changed, nobody reviews it properly, and rollback gets messy fast. Django migrations are a perfectly good place to install views, materialized views, SQL functions, and supporting indexes.

A simple pattern is migrations.RunSQL plus a checked-in SQL file loaded with Path.read_text(). For example:

```python
from pathlib import Path

from django.db import migrations

BASE_DIR = Path(__file__).resolve().parent


class Migration(migrations.Migration):
    dependencies = [
        ("reporting", "0012_previous"),
    ]

    operations = [
        migrations.RunSQL(
            sql=(BASE_DIR / "sql" / "daily_revenue_mv.up.sql").read_text(),
            reverse_sql=(BASE_DIR / "sql" / "daily_revenue_mv.down.sql").read_text(),
        ),
    ]
```

And the SQL file:

```sql
create materialized view reporting_daily_revenue as
with paid_orders as (
    select
        created_at::date as day,
        total_cents
    from orders
    where status = 'paid'
)
select
    day,
    count(*) as order_count,
    sum(total_cents) as revenue_cents
from paid_orders
group by day;

create unique index reporting_daily_revenue_day_idx
    on reporting_daily_revenue (day);
```

That unique index isn't optional if you want REFRESH MATERIALIZED VIEW CONCURRENTLY reporting_daily_revenue;. Without it, Postgres gives you ERROR: cannot refresh materialized view "reporting_daily_revenue" concurrently and tells you a usable unique index is required. Good. The database is protecting you from an operationally bad choice.

Use unmanaged Django models on top of views or materialized views if you want serializer support and admin integration:

```python
class DailyRevenue(models.Model):
    day = models.DateField(primary_key=True)
    order_count = models.IntegerField()
    revenue_cents = models.BigIntegerField()

    class Meta:
        managed = False
        db_table = "reporting_daily_revenue"
```

This split works well. Django still owns the application. Postgres owns the report.

test sql like code

Teams skip SQL tests, then act surprised when a harmless schema change quietly corrupts a revenue report for three weeks. If a report drives billing, finance, commissions, or customer-facing numbers, test it directly in the database. pgTAP is good enough, mature, and far better than pretending unit tests around ORM wrappers cover the real risk.

A pgTAP test can assert row counts, exact aggregates, null handling, and view shape. Something as plain as this catches real bugs:

```sql
begin;
select plan(4);

insert into orders (id, created_at, total_cents, status)
values
    (1, '2025-01-01 10:00:00+00', 5000, 'paid'),
    (2, '2025-01-01 12:00:00+00', 2500, 'paid'),
    (3, '2025-01-01 14:00:00+00', 9000, 'failed');

refresh materialized view reporting_daily_revenue;

select is(
    (select order_count::int from reporting_daily_revenue where day = '2025-01-01'),
    2,
    'counts only paid orders'
);

-- both is() arguments must resolve to the same type, hence the explicit cast
select is(
    (select revenue_cents::bigint from reporting_daily_revenue where day = '2025-01-01'),
    7500::bigint,
    'sums paid revenue correctly'
);

select ok(
    not exists(select 1 from reporting_daily_revenue where revenue_cents is null),
    'revenue_cents is never null'
);

select has_materialized_view('public', 'reporting_daily_revenue');

select * from finish();
rollback;
```

Run this in CI against Postgres 15, the same major version you use in production. Don't hand-wave with SQLite test runs and call it done. SQLite will happily let bad assumptions survive because it doesn't behave like Postgres on grouping, types, JSON operators, time zones, or query planning.

You can wire pgTAP into GitHub Actions with pgtap installed in the test database image, then run pg_prove. For teams already comfortable with Django's test runner, keep Python integration tests for endpoint behavior, permissions, and serialization, then let pgTAP own correctness of the SQL layer. That's a cleaner boundary than trying to fake database semantics in Python fixtures.
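The wiring itself is small; an illustrative command sequence for CI, assuming pgTAP is available to install in the test database and test files live under db/tests/ (both paths are assumptions):

```shell
# one-time setup against the CI database
createdb report_test
psql report_test -c "create extension pgtap;"
psql report_test -f db/schema.sql

# pg_prove (from the TAP::Parser::SourceHandler::pgTAP Perl distribution)
# runs each .sql test file and reports standard TAP output
pg_prove --dbname report_test db/tests/*.sql
```

A nonzero exit code from pg_prove fails the build, which is exactly the behavior you want when someone's schema change breaks a revenue number.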

materialized views are an ops choice

Materialized views solve one problem, expensive repeated reads, and introduce several others, stale data, refresh cost, lock behavior, storage growth, and maintenance work that somebody needs to own. Use them because the tradeoff is worth it, not because they sound fancy.

A plain view is fine if the underlying query runs in 40 ms and the access pattern is modest. A materialized view starts making sense when the source query burns CPU, joins large tables, or powers a dashboard hit every few seconds by sales, support, and whatever wallboard somebody mounted in the office. In one client system we replaced a deeply nested ORM report that took 6 to 9 seconds on PostgreSQL 15.4 with a materialized view refreshed every five minutes, the API fell to roughly 30 ms, and nobody cared that the numbers lagged slightly because these were operational trends, not stock trades.

Refresh strategy matters. Plain REFRESH MATERIALIZED VIEW takes an ACCESS EXCLUSIVE lock, so reads on the view block until it finishes. REFRESH MATERIALIZED VIEW CONCURRENTLY avoids blocking readers, takes longer, and requires that unique index. If refresh time creeps up, you may need to pre-aggregate earlier, partition source tables, or stop pretending one giant materialized view should answer every reporting question.
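The refresh itself can stay boring; a crontab sketch, where the connection string, log path, and five-minute cadence are all assumptions to adapt:

```shell
# every 5 minutes; CONCURRENTLY keeps dashboard reads unblocked,
# and requires the unique index created in the migration
*/5 * * * * psql "$DATABASE_URL" -c 'refresh materialized view concurrently reporting_daily_revenue;' >> /var/log/mv_refresh.log 2>&1
```

A Celery beat task or a Django management command works just as well; the point is that the schedule is visible, logged, and owned by somebody.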

Vacuum matters too. Analytics workloads create dead tuples on heavily updated source tables, and stale statistics wreck planner choices. Autovacuum defaults are often too timid for busy SaaS products. Watch pg_stat_user_tables, n_dead_tup, last_autovacuum, and actual bloat. If your report suddenly slows down after a growth spike, don't immediately blame SQL shape, check whether the planner is making decisions based on garbage stats.
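Those numbers are all one query away, against the pg_stat_user_tables view:

```sql
-- tables with the most dead tuples, plus when autovacuum last touched them
select relname,
       n_live_tup,
       n_dead_tup,
       last_autovacuum,
       last_autoanalyze
from pg_stat_user_tables
order by n_dead_tup desc
limit 10;
```

If a hot source table shows millions of dead tuples and a last_autovacuum from days ago, tune that table's autovacuum settings before touching the report SQL.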

Caching sits above all this. Cache the API response in Redis if the exact same report gets hammered, cache per tenant if multi-tenant cardinality is sane, and keep the cache TTL aligned with the materialized view refresh cadence. Caching a query that changes every request is cargo cult. Caching a stable dashboard summary is just practical.
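The alignment point can be sketched generically. This stdlib-only cache-aside helper is illustrative, not production code (a real app would use Django's cache framework with Redis behind it); the one idea it demonstrates is pinning the TTL to the refresh cadence:

```python
import time

# assumption: the materialized view refreshes every five minutes
REFRESH_CADENCE_SECONDS = 300


class ReportCache:
    """Cache-aside helper whose TTL matches the MV refresh schedule.

    Caching longer than the cadence serves needlessly stale numbers;
    caching shorter just re-reads identical rows.
    """

    def __init__(self, ttl=REFRESH_CADENCE_SECONDS, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock  # injectable for tests
        self._store = {}

    def get_or_set(self, key, compute):
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # still fresh relative to the refresh cadence
        value = compute()  # e.g. query the materialized view
        self._store[key] = (now, value)
        return value
```

Keyed per tenant ("tenant:42:daily_revenue"), this keeps the hot path at dictionary-lookup speed while the database only sees one read per tenant per refresh interval.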

keep the boundary clean

The pragmatic split is simple. Transactional business logic stays in Django models, services, and request handlers. Heavy analytics lives in SQL artifacts you can review, test, and tune in the database. Problems start when teams blur that line and force one layer to impersonate the other.

A good rule: if the code exists mainly to shape rows into a report, especially with time buckets, cohort logic, ranking, deduplication, or aggregate math, write SQL. If the code exists mainly to enforce application behavior, compose domain rules, validate state changes, or orchestrate workflows, keep it in Python. This boundary ages well because each side gets to use its native strengths.

None of this requires turning your Django app into a data warehouse project. You don't need dbt for every startup dashboard, you don't need Kafka because one report is slow, and you definitely don't need to torture the ORM into expressing sum(total_cents) filter (where status = 'paid') over (partition by customer_id order by created_at) just because raw SQL feels ideologically impure. Save the ideology for conference talks.

Write the SQL in files, deploy it in migrations, expose it through unmanaged models or cursor() calls where appropriate, put pgTAP around the critical bits, and document refresh semantics in plain language so product and finance know whether a chart is real time, every five minutes, or yesterday's close. Engineers can handle this. CTOs should demand it. A reporting stack that nobody can read becomes a liability long before it becomes a scaling problem.
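For one-off endpoints where an unmanaged model is overkill, the cursor() path is equally small. A hypothetical helper (the file name and the days parameter are illustrative) that loads a checked-in query and returns dict rows, following the fetchall pattern from Django's own raw SQL docs:

```python
from pathlib import Path

from django.db import connection

SQL_DIR = Path(__file__).resolve().parent / "sql"


def daily_revenue(days: int = 90):
    """Run the checked-in report query and return rows as dicts."""
    query = (SQL_DIR / "daily_revenue.sql").read_text()
    with connection.cursor() as cursor:
        # the .sql file uses %(days)s placeholders; never interpolate by hand
        cursor.execute(query, {"days": days})
        columns = [col[0] for col in cursor.description]
        return [dict(zip(columns, row)) for row in cursor.fetchall()]
```

The .sql file stays reviewable and EXPLAIN-able on its own, and the Python wrapper is thin enough that nobody ever needs to debug it.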
