Vector DB Comparison 2026: Qdrant vs ChromaDB vs pgvector Guide
Benchmarked Qdrant, ChromaDB, and pgvector on 1,000 vectors (dim=384): insert speed, query latency, and filter performance, with a decision guide for RAG.
Every time I start a new RAG project, I end up spending more time choosing a vector DB than I expected. It starts with “just use Chroma,” then I see Qdrant benchmark posts and waver, then a pgvector article pulls me back toward PostgreSQL. The loop repeats until I actually sit down and run the numbers myself.
So I did. Same conditions, three databases: 1,000 vectors at dim=384, 50 query repetitions. The scale is small, but that’s the point — I wanted to see how each DB behaves in the prototype-to-small-production range where most teams actually live.
Why Vector DB Choice Matters in a RAG App
Before the numbers, it’s worth being clear about what we’re actually optimizing for. It’s not just “faster is better.”
In a RAG pipeline, vector search sits in the critical path of every user query. The sequence is: embed the question → vector search → build context → call the LLM. The LLM call itself takes 1 to 5 seconds, so you might think a few milliseconds of vector search latency don’t matter.
That’s true for bare similarity search. But real production RAG always has metadata filters. Restrict results to a specific user’s documents. Filter by date range. Include only a certain category. The latency of those filtered queries varies significantly between databases, and that’s where the decision gets real.
There’s also the architecture angle. Choosing a vector DB is not just a storage decision. It determines your infrastructure footprint, deployment complexity, operational overhead, and how hard it will be to migrate later if you change your mind.
If you haven’t settled on your RAG architecture yet, the full RAG design overview is worth reading before going deeper into DB comparisons.
ChromaDB: Five-Minute Setup, Then What?
ChromaDB’s claim of “vector search in five minutes” is accurate. I verified it.
pip install chromadb
import chromadb
client = chromadb.Client()
collection = client.create_collection("my_docs")
# Insert vectors
collection.add(
embeddings=[[0.1, 0.2, ...] * 384], # dim=384
metadatas=[{"source": "doc1", "category": "tech"}],
ids=["doc1"]
)
# Query with filter
results = collection.query(
query_embeddings=[[0.1, 0.2, ...]],
n_results=5,
where={"category": "tech"}
)
The API is minimal in a good way. You need add, query, and delete and you’re done with the basics. Metadata filters are a single where dict. Nothing to configure upfront.
What ChromaDB Does Well
In-memory mode is the default, which makes testing trivially fast. Switching to disk persistence is chromadb.PersistentClient(path="./db") and switching to client-server mode is chromadb.HttpClient(host="localhost"). The surface area is intentionally small.
The LangChain and LlamaIndex integrations for ChromaDB are the most mature among the three databases here. If you’re following a tutorial or example repo, chances are it’s using Chroma. That translates to fewer surprises and faster onboarding for your team.
The Honest Limitations
I’ve spent enough time with ChromaDB to know I wouldn’t push it past a certain scale. Past the tens-of-thousands mark, query performance starts feeling inconsistent. The HNSW index is there, but Qdrant’s implementation is more tuned for high-volume workloads.
More telling: when I look at what teams are actually running in production at scale, “ChromaDB in production” is rare. The community pattern of “Chroma for prototypes, Qdrant for production” shows up too often to dismiss. That doesn’t mean ChromaDB is bad. It means it’s a great starting point that runs into real ceiling concerns past a few hundred thousand vectors.
It also has no official clustering support. Scale-out is a single-node problem. If your traffic grows, you’re looking at architectural changes.
Qdrant: When Performance Comes First
Qdrant is written in Rust and designed from the ground up for production scale. A single Docker command gets you a running instance.
docker pull qdrant/qdrant
docker run -p 6333:6333 qdrant/qdrant
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct, Filter, FieldCondition, MatchValue
client = QdrantClient("localhost", port=6333)
# Create collection
client.create_collection(
collection_name="my_docs",
vectors_config=VectorParams(size=384, distance=Distance.COSINE)
)
# Insert vectors
client.upsert(
collection_name="my_docs",
points=[
PointStruct(
id=1,
vector=[0.1, 0.2, ...],
payload={"source": "doc1", "category": "tech"}
)
]
)
# Filtered query
results = client.search(
collection_name="my_docs",
query_vector=[0.1, 0.2, ...],
query_filter=Filter(
must=[FieldCondition(key="category", match=MatchValue(value="tech"))]
),
limit=5
)
The API is more verbose than ChromaDB. You construct Filter, FieldCondition, and MatchValue objects explicitly. It feels heavier at first, but when you need to express complex nested filter conditions, that explicitness is actually useful.
Where Qdrant Wins
The HNSW implementation is the most optimized of the three. Past 5 million vectors, Qdrant’s query throughput advantage becomes hard to argue with. It supports distributed clustering out of the box and offers Product Quantization to reduce memory footprint. Payload indexing means large-scale metadata filters stay efficient as data grows.
The built-in dashboard at localhost:6333/dashboard is genuinely useful for debugging. It shows collection stats, lets you run test queries, and gives you a live view of what’s in the database. Small thing, but it saves time.
The Honest Limitations
Qdrant can be overkill for small projects. Running Docker adds friction that ChromaDB’s in-memory mode avoids entirely. There are more knobs to tune. And my benchmark results showed something counterintuitive: at 1,000 vectors, Qdrant’s filter queries were slower than ChromaDB’s. I’ll explain why in the next section.
If your dataset is under 100K vectors and you don’t anticipate aggressive scaling, you might be paying an operational tax for performance headroom you won’t use.
pgvector: If You’re Already on PostgreSQL
pgvector is a PostgreSQL extension. You’re not adding a new database. You’re adding vector search to a database you already run.
-- Install extension
CREATE EXTENSION IF NOT EXISTS vector;
-- Create table
CREATE TABLE documents (
id SERIAL PRIMARY KEY,
content TEXT,
category VARCHAR(50),
embedding vector(384)
);
-- Create HNSW index
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
-- Insert data
INSERT INTO documents (content, category, embedding)
VALUES ('Document content', 'tech', '[0.1, 0.2, ...]');
-- Vector search with filter
SELECT id, content, 1 - (embedding <=> '[0.1, 0.2, ...]') AS similarity
FROM documents
WHERE category = 'tech'
ORDER BY embedding <=> '[0.1, 0.2, ...]'
LIMIT 5;
In Python, you connect through psycopg2 or asyncpg and use the pgvector library for array handling.
from pgvector.psycopg2 import register_vector
import psycopg2
conn = psycopg2.connect("postgresql://user:pass@localhost/db")
register_vector(conn)
cur = conn.cursor()
cur.execute(
"SELECT id, content FROM documents WHERE category = %s ORDER BY embedding <=> %s LIMIT 5",
("tech", embedding_array)
)
What pgvector Does Well
Zero additional infrastructure if you’re already on PostgreSQL. Your existing ORM, migration tools, backup strategy, and monitoring stack all stay the same. You get the full expressiveness of SQL for filtering, which matters when your filters need to JOIN against other tables. Think restricting vector search to a specific user’s documents by joining the users table.
That JOIN use case is where pgvector genuinely shines. Neither ChromaDB nor Qdrant can express cross-table relationships the way SQL can.
The Honest Limitations
I think pgvector gets somewhat oversold. The “use your existing Postgres” pitch is real, but the performance gap with dedicated vector databases grows with scale. The bigger issue is network overhead. My benchmark numbers use numpy approximation for pgvector. In an actual deployment where PostgreSQL runs on a separate server, every query adds 10 to 50 ms of network roundtrip. That’s not comparable to ChromaDB’s in-memory results or Qdrant’s local Docker latency.
Proper HNSW tuning also requires PostgreSQL expertise. Running with default m and ef_construction values often underperforms expectations. Teams without a DBA may hit walls that they don’t immediately know how to debug.
I Actually Ran the Benchmark. Here Are the Numbers
Setup:
- Vector count: 1,000
- Dimensions: 384 (sentence-transformers standard)
- Query repetitions: 50
- Hardware: MacBook Pro M2, running locally
- ChromaDB: in-memory mode
- Qdrant: Docker (local)
- pgvector: numpy approximation (actual PostgreSQL adds network overhead)
import chromadb
import numpy as np
import time
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct
DIM = 384
N_VECTORS = 1000
N_QUERIES = 50
# Generate test data
np.random.seed(42)
vectors = np.random.randn(N_VECTORS, DIM).astype(np.float32)
vectors = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
query_vectors = np.random.randn(N_QUERIES, DIM).astype(np.float32)
categories = [f"cat_{i % 5}" for i in range(N_VECTORS)]
# ---- ChromaDB benchmark ----
chroma_client = chromadb.Client()
collection = chroma_client.create_collection("bench")
# Insert timing
start = time.perf_counter()
collection.add(
embeddings=vectors.tolist(),
metadatas=[{"category": c} for c in categories],
ids=[str(i) for i in range(N_VECTORS)]
)
chroma_insert_time = time.perf_counter() - start
# Plain query timing
query_times = []
for qv in query_vectors:
t = time.perf_counter()
collection.query(query_embeddings=[qv.tolist()], n_results=5)
query_times.append((time.perf_counter() - t) * 1000)
# Filter query timing
filter_times = []
for qv in query_vectors:
t = time.perf_counter()
collection.query(
query_embeddings=[qv.tolist()],
n_results=5,
where={"category": "cat_0"}
)
filter_times.append((time.perf_counter() - t) * 1000)
print(f"ChromaDB insert: {chroma_insert_time:.3f}s")
print(f"ChromaDB query avg: {np.mean(query_times):.2f}ms, P95: {np.percentile(query_times, 95):.2f}ms")
print(f"ChromaDB filter query avg: {np.mean(filter_times):.2f}ms")
Results
=== Insert Time ===
ChromaDB: 0.263s
Qdrant: 0.163s
=== Plain Query (50-run average) ===
ChromaDB: avg 0.80ms | P95 0.82ms
Qdrant: avg 0.84ms | P95 1.88ms
=== Filter Query (50-run average) ===
ChromaDB: avg 2.00ms
Qdrant: avg 7.02ms
pgvector: ~1–3ms via numpy approximation
(add 10–50ms for network in real PostgreSQL deployments)

What These Numbers Mean
The result that surprised me most: ChromaDB’s filter query (2 ms) beat Qdrant (7 ms) at 1,000 vectors. That seems wrong at first. Qdrant is the more sophisticated system, so why is it slower?
The answer is architecture overhead. Qdrant is built with payload indexing, distributed filtering, and segment management designed for millions of vectors. At 1,000 vectors, that infrastructure becomes overhead rather than a benefit. ChromaDB takes a more direct approach at small scale and pays off for it.
Inserts tell the opposite story. Qdrant is faster (0.163 s vs 0.263 s). Rust performance shows up here.
Plain queries are close, but Qdrant’s P95 is 1.88 ms versus ChromaDB’s 0.82 ms. At small scale, Qdrant has more tail latency variance.
The takeaway: at small scale, ChromaDB’s simplicity wins. Past 5 million vectors, Qdrant’s HNSW optimization starts to matter in ways that flip this comparison.
Decision Matrix: What to Use When
Looking only at the numbers, you might conclude ChromaDB always wins. The actual decision is more nuanced.
| Situation | Recommendation | Reason |
|---|---|---|
| Prototype, hackathon | ChromaDB | Zero setup, simple API, in-memory |
| Small production (< 100K vectors) | ChromaDB or pgvector | Simplicity first |
| Already running PostgreSQL | pgvector | Zero new infrastructure |
| Medium scale (100K–5M vectors) | Qdrant | Balance of performance and reliability |
| Large scale (5M+ vectors) | Qdrant | HNSW optimization, clustering required |
| Complex SQL JOINs needed | pgvector | Full SQL expressiveness |
| Horizontal scale anticipated | Qdrant | Official distributed cluster support |
| Minimum operational overhead | pgvector (if on Postgres) | Single system |
Detailed Guidance
Choose ChromaDB when you need a fast proof of concept, your team doesn’t have DevOps capacity, and you’re confident data volume stays below a few hundred thousand vectors. It’s the right tool for demo apps and anything that connects directly to LangChain or LlamaIndex tutorials.
Choose Qdrant when you need to handle real production traffic, expect 5M+ vectors, or need horizontal scaling. Starting on Qdrant means you don’t face a forced migration later when scale hits. The Docker operational cost is real, though. For small projects you’re paying for headroom you may not use.
Choose pgvector when you have existing PostgreSQL infrastructure, a DBA who knows the system, and vector search that needs to combine with regular SQL queries. It’s the most pragmatic choice for teams that can’t afford the learning curve of a new database type.
The embedding model dimension also directly affects which DB you pick. How Gemini Embedding 2’s multimodal embeddings change the dim design tradeoffs is worth reading alongside this comparison.
When to Use Each, and When to Avoid It
Benchmark numbers and a decision matrix only get you so far. Here’s each database framed as “reach for it when / avoid it when.”
Use ChromaDB when
- You need a RAG proof of concept in days and you’re confident data stays under a few hundred thousand vectors
- Your team has no DevOps capacity and no appetite for standing up separate infrastructure
- You’re following LangChain or LlamaIndex tutorials directly to build a demo
Avoid ChromaDB when
- You need to serve millions of vectors reliably. Production references at that scale are comparatively thin.
- Multi-node horizontal scale-out is a hard requirement on self-hosted infra. There’s no built-in distributed cluster.
- You need to express compound filters cleanly, such as “category is tech AND score above 0.8 AND date in a given range”
Use Qdrant when
- You expect 5M+ vectors now or in the near future
- You need horizontal scaling and quantization to keep memory cost under control
- Large-scale metadata filters backed by payload indexing sit in your critical path
Avoid Qdrant when
- Your dataset is firmly under 100K vectors and there’s no reason to carry Docker operational overhead. At small scale, ChromaDB’s filter queries were actually faster.
- You’re shipping a one-day hackathon or a throwaway demo where setup friction is itself the cost
Use pgvector when
- You already run PostgreSQL, have a DBA, and want to keep new infrastructure at zero
- Your vector search needs to JOIN against user or permission tables. That’s hard to express with ChromaDB or Qdrant.
Avoid pgvector when
- PostgreSQL lives on a separate server, so 10–50ms of network roundtrip per query lands on your critical path
- Your team lacks the PostgreSQL expertise to tune HNSW parameters (
m,ef_construction,ef_search)
Official Docs and Primary Sources
The numbers here are my own measurements, but for installation, APIs, and index parameters, check each project’s official docs as the primary source.
- Qdrant: official site qdrant.tech and documentation, source at github.com/qdrant/qdrant
- Chroma: official site trychroma.com and documentation, source at github.com/chroma-core/chroma
- pgvector: official repository github.com/pgvector/pgvector, with per-version changes in the PostgreSQL release notes
pgvector in particular added memory-saving features in the 0.8.x line, including halfvec (2-byte floats), sparsevec, and binary quantization. If you’re working with high-dimensional embeddings, the release notes above are worth reading first.
What I Reach For First
Honestly, in 2026, I default to Qdrant on new projects. The reasoning is simple: a moderate overhead at small scale is cheaper than a forced migration when you hit growth. Starting on Qdrant means the production path is already in place.
That said, if your team is already running PostgreSQL, try pgvector first. It’s sufficient in most cases, and if performance becomes a real problem later, migrating to Qdrant is tractable.
ChromaDB is still my first choice for prototypes. Nothing beats pip install chromadb for getting started. But when you’re moving toward production, take Qdrant seriously.
Once you’ve chosen a vector DB, the next decision is which AI agent library will wrap it. The Python AI agent library comparison guide covers that next step.
“Which DB is the best” is the wrong question. The right answer depends on your current scale, team capacity, existing infrastructure, and how fast you need to ship. These numbers give you one more concrete data point for that call.
Frequently Asked Questions
Which should I choose among Qdrant, ChromaDB, and pgvector?
When is pgvector good enough?
What is the production performance difference?
Why is Qdrant slower than ChromaDB at small scale?
Was this helpful?
Your support helps me create better content. Buy me a coffee.