
Postgres + pgvector 0.8 Is Probably Enough for Your Embeddings

Before you spin up Pinecone or Qdrant, read this. We benchmarked pgvector 0.8 HNSW against Qdrant 1.13 and the results will bore you in the best way.

the reflex problem

Someone on your team says the word 'embeddings' and within twenty minutes there's a Pinecone account getting created. I've watched this happen at three different companies in the last year, and every time the immediate follow-up conversation is about API keys, index namespaces, upsert rate limits, and how to keep the vector store in sync with the source-of-truth database. That's a lot of new problems to introduce before you've served a single user.

The assumption underneath all of this is that dedicated vector databases are categorically better at vector search than Postgres, and that assumption is mostly wrong at the scale most startups operate at. Not wrong in some edge case, wrong in the common case. The performance gap that Pinecone's marketing implies is real at tens of millions of vectors with sub-10ms p99 requirements and massive concurrent load. At 200k product embeddings for your e-commerce search feature, or 800k document chunks for your RAG pipeline, you're optimizing for a problem you don't have.

We ran actual benchmarks on this at steezr because we kept having the same conversation with clients. A founder would come in wanting to build an AI-powered document search tool, we'd ask what database they were already running, they'd say Postgres, and then there'd be this awkward pause where everyone in the room was thinking about whether to reach for a vector store. We finally got tired of having that conversation on vibes alone and just ran the numbers.

what pgvector 0.8 actually gives you

pgvector 0.8 shipped in late 2024 and it's a genuinely different product from the 0.5.x versions that a lot of the 'pgvector is slow' discourse was based on. The big capability is HNSW (Hierarchical Navigable Small World) index support, which first landed in 0.5.0 and has been continuously improved through 0.6, 0.7, and 0.8: better build performance, lower memory overhead during index construction, and more predictable recall at higher ef_search values. 0.8 itself also added iterative index scans, which improve recall when a WHERE clause filters out a lot of the nearest candidates.

The two index types you care about are IVFFlat and HNSW. IVFFlat divides your vector space into lists and searches a subset of them, which means you need data loaded before you build the index and you're tuning two parameters (lists and probes) that interact in non-obvious ways. HNSW builds a multi-layer graph at insert time and searches it top-down, which gives you better recall at equivalent query latency. It also handles incremental inserts gracefully, which matters for any real application whose embedding set grows continuously.
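
A rough sketch of the difference in practice, with placeholder table and column names and illustrative parameter values rather than recommendations:

sql
-- IVFFlat: partitions the space into lists; build after the data is loaded,
-- then choose how many lists to probe at query time
CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops) WITH (lists = 1000);
SET ivfflat.probes = 20;

-- HNSW: graph-based, handles incremental inserts, single query-time knob
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);
SET hnsw.ef_search = 100;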

For HNSW in pgvector, the critical parameters are m (number of connections per node, default 16) and ef_construction (size of the dynamic candidate list during build, default 64). At query time you tune ef_search, which controls the recall/latency tradeoff. We ran most of our benchmarks with m=16, ef_construction=128, ef_search=100, which is a reasonable production default for 1536-dimensional OpenAI ada-002 embeddings, and we bumped to m=24, ef_construction=200 for the 2M row tests because the default graph quality degrades slightly at higher cardinalities.
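
In DDL terms, the two configurations we benchmarked look roughly like this (the table name is a placeholder):

sql
-- 100k / 500k runs
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 128);

-- 2M run: denser graph for better recall at higher cardinality
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
WITH (m = 24, ef_construction = 200);

-- per-session recall/latency knob used in all runs
SET hnsw.ef_search = 100;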

One thing worth knowing: pgvector stores vectors in Postgres heap pages, so a table with 1M rows of 1536-dimensional float32 vectors is going to be roughly 6GB of vector data alone, not counting your payload columns or the HNSW index itself (which runs another 3-4GB at default settings). Make sure your shared_buffers and effective_cache_size are configured for this. Running pgvector on a box with 8GB of RAM and default Postgres config tuned for a transactional OLTP workload will give you terrible numbers that have nothing to do with pgvector itself.
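
If you want to check where a live table stands, a quick sanity query (the index name here is a placeholder for whatever yours is called):

sql
SELECT
  pg_size_pretty(pg_table_size('documents'))                   AS heap_size,
  pg_size_pretty(pg_relation_size('documents_embedding_idx'))  AS hnsw_index_size,
  current_setting('shared_buffers')                            AS shared_buffers,
  current_setting('effective_cache_size')                      AS effective_cache_size;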

the benchmark setup

We wanted this to be as fair as possible, which meant identical hardware for both systems. We used two c6i.2xlarge instances on AWS (8 vCPU, 16GB RAM) running Ubuntu 24.04. One ran Postgres 17 with pgvector 0.8.0 installed via the official pgvector packages, configured with shared_buffers=4GB, effective_cache_size=12GB, work_mem=256MB, and maintenance_work_mem=2GB for index builds. The other ran Qdrant 1.13.0 via Docker with the default configuration plus QDRANT__STORAGE__PERFORMANCE__MAX_SEARCH_THREADS set to 8 and on_disk_payload set to false so payload lived in memory.
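
For reference, the Postgres side of that setup is a handful of settings; one way to apply them (shared_buffers only takes effect after a restart):

sql
ALTER SYSTEM SET shared_buffers = '4GB';
ALTER SYSTEM SET effective_cache_size = '12GB';
ALTER SYSTEM SET work_mem = '256MB';
ALTER SYSTEM SET maintenance_work_mem = '2GB';
-- most settings reload with SELECT pg_reload_conf(); shared_buffers needs a restart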

We generated synthetic embeddings using Python's numpy with np.random.normal(0, 1, 1536) normalized to unit vectors, which is a reasonable approximation of real ada-002 embedding distributions. We tested at three collection sizes: 100k, 500k, and 2M vectors. Each query batch was 1000 nearest-neighbor searches for k=10, run after a warmup pass, measured with Python's time.perf_counter() around the actual query calls. We did 5 runs per configuration and took the median. Recall was measured against exact brute-force results computed on a held-out 10k query set.

For pgvector we queried using asyncpg with a connection pool of 20 connections, running queries like:

sql
SELECT id, payload, embedding <=> $1 AS distance
FROM documents
ORDER BY embedding <=> $1
LIMIT 10;

For Qdrant we used the qdrant-client Python library with its async search method. Both were called from the same benchmarking script on a third c6i.2xlarge instance to eliminate any network asymmetry. The pgvector instance was accessed over the VPC private IP on port 5432, Qdrant over port 6333. Latency numbers below are p50 and p99 in milliseconds for single-query sequential execution (not concurrent), because that's the shape of load most RAG pipelines actually produce.

the actual numbers

At 100k vectors the results are almost insultingly similar. pgvector HNSW at ef_search=100 gives you p50=2.1ms, p99=4.3ms with recall@10 of 0.971. Qdrant with its default HNSW config gives p50=1.8ms, p99=3.6ms with recall@10 of 0.978. That's real but it's not a reason to run a separate infrastructure service. At this scale you're probably more constrained by your application server's connection overhead than by the vector search itself.

At 500k vectors things get marginally more interesting. pgvector lands at p50=4.8ms, p99=9.2ms at recall 0.963, and Qdrant comes in at p50=3.1ms, p99=6.4ms at recall 0.974. Qdrant is noticeably faster here, no point pretending otherwise, but 9.2ms p99 is still comfortably fast for the vast majority of user-facing AI features. If your RAG pipeline does a vector search and then calls an LLM, the LLM call is 800ms to 2000ms depending on context length. Nobody is going to feel the difference between 5ms and 9ms vector retrieval.

At 2M vectors the gap widens more. pgvector at m=24, ef_construction=200, ef_search=100 gives p50=11.3ms, p99=21.7ms at recall 0.951. Qdrant gives p50=5.8ms, p99=11.2ms at recall 0.969. This is where you start having a real conversation. If you're at 2M vectors and you need p99 under 15ms for concurrent users, Qdrant or a similar dedicated store becomes worth the operational cost. If your p99 budget is 50ms or you're doing this in a background job, you're still fine on Postgres.

Index build times are worth mentioning. Building the HNSW index on 2M vectors in pgvector with maintenance_work_mem=2GB took 47 minutes on our hardware. Qdrant's equivalent indexing job took 31 minutes. Neither is a live concern for most use cases since you're building the index once and then it stays warm, but if you're doing frequent full rebuilds for some reason, keep that delta in mind.

what you actually gain staying on postgres

The latency numbers above tell part of the story but they miss the operational argument entirely. When your embeddings live in Postgres, you get transactional consistency for free. You insert a document, you insert its embedding in the same transaction, they either both commit or both roll back. With an external vector store you now have a distributed system problem: your Postgres row commits, your Qdrant upsert fails, and you've got a document that exists in your relational database but not in your search index. You need to build compensating logic, retry queues, or periodic reconciliation jobs to handle this. We've seen clients spending a full sprint on this problem after assuming the upsert-on-write pattern would just work reliably in production.
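
Concretely, that guarantee is just a plain transaction. A minimal sketch, assuming a separate embeddings table alongside documents:

sql
BEGIN;
INSERT INTO documents (id, content) VALUES ($1, $2);
INSERT INTO document_embeddings (document_id, embedding) VALUES ($1, $3);
COMMIT;  -- both rows commit together, or neither does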

Filtering is another underappreciated win. pgvector queries are just SQL, so you can combine your vector similarity search with arbitrary WHERE clauses and get the query planner to do sensible things with your existing indexes. If your documents have a tenant_id column with a btree index, a query like SELECT id FROM documents WHERE tenant_id = $1 ORDER BY embedding <=> $2 LIMIT 10 will use both indexes appropriately once you've got a few hundred thousand rows in that tenant. Qdrant has its own payload filtering, and it's good, but it's a different query language, a different mental model, and the behavior at the intersection of ANN recall and payload filter selectivity can surprise you in ways that take a while to diagnose.
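
A sketch of what that looks like in practice; the iterative index scans mentioned earlier are exposed as hnsw.iterative_scan and are worth reaching for when the filter discards many of the nearest candidates (the right mode depends on your workload, so treat this as a starting point rather than a recommendation):

sql
-- optional: let the index keep scanning when the WHERE clause filters out candidates
SET hnsw.iterative_scan = relaxed_order;

SELECT id, content
FROM documents
WHERE tenant_id = $1
ORDER BY embedding <=> $2
LIMIT 10;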

Backups, monitoring, access control, audit logging: all of this just works because you already have it wired up for Postgres. Your pg_dump covers vectors. Your Datadog Postgres integration covers vector query performance. Your existing IAM roles and secrets management handles connection credentials. None of this is glamorous but the cumulative time saved over six months of operating a production AI feature is significant. We've done enough of these projects at steezr to know that the infrastructure you don't have to think about is the infrastructure that lets you focus on the actual product.

when to actually graduate to a dedicated store

There are legitimate reasons to use Qdrant, Weaviate, or a managed service like Pinecone, and I want to be clear about what those are rather than just being contrarian.

You probably want a dedicated vector store when you're genuinely above 5M vectors and you have latency SLAs that pgvector can't hit even after careful tuning. Or when you need multi-tenancy with strict per-tenant index isolation and the overhead of separate Postgres schemas per tenant is causing real problems. Or when you're doing multi-vector retrieval, named vectors, or sparse-dense hybrid search (BM25 + dense vectors) and the query patterns are complex enough that rolling your own in SQL becomes painful. Qdrant 1.13 has native sparse vector support and hybrid search built in, and that's genuinely useful for certain document retrieval patterns where keyword matching needs to combine with semantic similarity.

The right migration path, if you ever need it, is also not that painful. Your embeddings are just float arrays. You export them from Postgres, you load them into Qdrant, you update your retrieval layer to hit the new endpoint, you cut over. It's a weekend project, not a multi-month migration. This means you can absolutely start with pgvector, build your feature, ship it, learn what your actual query patterns and scale requirements are, and then migrate if the numbers demand it. Starting with Pinecone because you might need it someday is paying complexity tax on a future that may never arrive.
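
The export half of that weekend is about as plain as it sounds; a minimal sketch of pulling the vectors out of Postgres (loading them into the new store is whatever its bulk API looks like):

sql
-- from psql, \copy streams to a client-side file without needing server file access
\copy (SELECT id, payload, embedding FROM documents) TO 'embeddings.csv' WITH (FORMAT csv, HEADER true)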

The thing startups consistently underestimate is that adding a service to your infrastructure isn't free even if the service itself is free. Every additional data store is another thing that can go down, another set of credentials to rotate, another monitoring dashboard to check, another component your on-call engineer has to understand at 2am. Boring infrastructure is a feature.

practical starting config

If you're convinced and want a starting point, here's what we'd actually deploy. Install pgvector 0.8 on Postgres 17, create your table with a vector(1536) column for ada-002 embeddings or vector(3072) for text-embedding-3-large, and create your HNSW index like this:

sql
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 128);

Then at query time set ef_search based on your recall requirements:

sql
SET hnsw.ef_search = 100;
SELECT id, content, 1 - (embedding <=> $1) AS similarity
FROM documents
WHERE tenant_id = $2
ORDER BY embedding <=> $1
LIMIT 10;

If you're on RDS or Aurora, pgvector 0.8 is available on Postgres 15, 16, and 17 as of early 2026. If you're self-hosting with Docker, the pgvector/pgvector:pg17 image on Docker Hub stays current with releases. For Aurora Serverless v2 specifically, note that the HNSW index build can time out under default settings if you're indexing more than a few hundred thousand vectors in a single session, so break large initial loads into batches and build the index afterward rather than maintaining it during bulk insert.
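
The load-first, index-after pattern looks roughly like this; a sketch, with the memory and worker settings as assumptions to tune for your instance:

sql
-- 1. load vectors in batches with no HNSW index on the table (COPY or batched INSERTs)
-- 2. then build the index once at the end of the load
SET maintenance_work_mem = '2GB';
SET max_parallel_maintenance_workers = 4;
CREATE INDEX documents_embedding_idx ON documents
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 128);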

One last thing worth calling out: don't store embeddings and transactional data in the same table if that table sees heavy write load. The HNSW index update on every INSERT has a real cost at high insert rates. A separate embeddings table with a foreign key back to your main documents table keeps the write path clean and lets you do bulk embedding inserts in batches without touching your hot transactional tables.
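
A minimal sketch of that layout, assuming a documents table keyed by a bigint id:

sql
CREATE TABLE document_embeddings (
  document_id bigint PRIMARY KEY REFERENCES documents (id) ON DELETE CASCADE,
  embedding   vector(1536) NOT NULL
);

CREATE INDEX ON document_embeddings USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 128);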

Written by Johnny Unar
