Vector databases became the "must-have" infrastructure for AI applications in 2024. Every RAG tutorial starts with "first, set up your vector database." But here's what most tutorials won't tell you: many AI applications don't need a dedicated vector database at all.
This guide explains what vector databases do under the hood, compares six major options with real pricing and performance data, walks through concrete architecture patterns, breaks down costs, and gives you a clear framework for deciding whether you need one.
What Vector Databases Do
The Problem They Solve
Traditional databases are built for exact matches. You search for a customer ID, a product name, or a date range, and the database returns rows that match precisely. But AI applications need similarity search: "find the documents most similar to this question."
Similarity search works on embeddings: numerical representations of text (or images, or any data) where similar items are close together in high-dimensional space. The sentence "How do I reset my password?" and "I forgot my login credentials" have different words but similar embeddings because they mean similar things.
A vector database stores these embeddings and finds the closest ones to a query vector quickly, even across millions of items. That's the core functionality. Everything else (metadata filtering, hybrid search, multi-tenancy) is an optimization on top of that foundation.
How Embeddings Work
An embedding model converts text into a fixed-length array of floating point numbers. OpenAI's text-embedding-3-small produces 1536-dimensional vectors. Cohere's embed-v3 outputs 1024 dimensions. Open-source models like BGE-large and E5-large produce 1024 dimensions. Smaller models like all-MiniLM-L6-v2 output 384 dimensions and run on a laptop CPU.
Two pieces of text with similar meaning produce vectors that are close together when you measure distance between them. The model learns these relationships during training on large text corpora.
Embedding Model Selection Matters More Than Database Choice
This is worth emphasizing early. The embedding model determines retrieval quality. The database stores and retrieves whatever vectors you give it. If your embedding model doesn't capture the semantic relationships in your domain, switching from pgvector to Pinecone won't help.
General-purpose models (OpenAI, Cohere) work well for most text. Domain-specific fine-tuned models improve results for specialized content like medical records, legal documents, or code. If your RAG system returns irrelevant results, test a different embedding model before changing your database.
Distance Metrics
The most common distance metrics are:
- Cosine similarity: Measures the angle between vectors. Most popular for text similarity. Score ranges from -1 (opposite) to 1 (identical). Direction matters, magnitude doesn't.
- Euclidean distance (L2): Straight-line distance between points. Works well when vector magnitude carries meaning, like when comparing document lengths or signal strengths.
- Dot product (inner product): Computationally fastest. Equivalent to cosine similarity when vectors are normalized, and most embedding models output unit-length vectors by default.
For most text-based AI applications, cosine similarity with a standard embedding model works well. Don't overthink the distance metric choice unless you're seeing specific retrieval quality issues.
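Each of these metrics is a few lines of NumPy. The toy sketch below uses made-up 3-dimensional vectors (real embeddings have hundreds to thousands of dimensions) and also shows why dot product and cosine similarity agree on normalized vectors:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Angle-based: vector magnitude cancels out.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean_distance(a: np.ndarray, b: np.ndarray) -> float:
    # Straight-line distance: magnitude matters.
    return float(np.linalg.norm(a - b))

def dot_product(a: np.ndarray, b: np.ndarray) -> float:
    # Fastest: no norms to compute.
    return float(np.dot(a, b))

# Toy "embeddings" -- illustrative values, not output from a real model.
q = np.array([0.2, 0.9, 0.1])
d = np.array([0.25, 0.85, 0.05])

# Normalize to unit length, as most embedding models do by default.
qn, dn = q / np.linalg.norm(q), d / np.linalg.norm(d)

# On unit vectors, dot product and cosine similarity give the same score.
assert abs(dot_product(qn, dn) - cosine_similarity(q, d)) < 1e-9
```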
How Indexing Works Under the Hood
Finding the nearest neighbor in high-dimensional space by brute force means computing the distance to every single vector. At 1 million vectors with 1536 dimensions, that's 1 million distance calculations per query. Brute force works for small datasets (under 50K vectors) but falls apart at scale.
Vector databases use approximate nearest neighbor (ANN) algorithms that trade a small amount of accuracy for massive speed improvements. The two dominant approaches, plus a compression technique often layered on top:
HNSW (Hierarchical Navigable Small World). Builds a multi-layer graph where each vector connects to its nearby neighbors. Queries traverse the graph from coarse layers to fine layers, narrowing the search space at each step. HNSW delivers excellent recall (typically 95-99%) with fast queries. The tradeoff: it's memory-intensive because the entire graph lives in RAM. At 1M vectors with 1536 dimensions, expect 6-8GB of memory for the index alone.
IVF (Inverted File Index). Partitions vectors into clusters using k-means. During a query, only the nearest clusters are searched. IVF uses less memory than HNSW and supports disk-based storage, but requires tuning the number of clusters and probes. Recall depends on how many clusters you search: more clusters means higher recall but slower queries.
Product Quantization (PQ). Compresses vectors by splitting them into sub-vectors and replacing each with a codebook entry. Reduces memory usage by 4-8x with some accuracy loss. Often combined with IVF (IVF-PQ) for large-scale deployments where memory is the constraint.
Most managed vector databases use HNSW internally because it delivers the best query performance without tuning. pgvector supports both IVFFlat and HNSW indexes.
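The cluster-and-probe tradeoff behind IVF is concrete in a toy NumPy sketch. This illustrates the idea only: real implementations train centroids with k-means and use optimized distance kernels, whereas here centroids are just sampled vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
vectors = rng.normal(size=(2000, 64)).astype(np.float32)

# "Train" centroids by sampling (a real IVF index runs k-means here).
n_clusters = 16
centroids = vectors[rng.choice(len(vectors), n_clusters, replace=False)]

# Assign each vector to its nearest centroid: these are the inverted lists.
assignments = np.argmin(
    np.linalg.norm(vectors[:, None, :] - centroids[None, :, :], axis=2), axis=1
)
lists = {c: np.where(assignments == c)[0] for c in range(n_clusters)}

def ivf_search(query, k=10, nprobe=4):
    # Probe only the nprobe nearest clusters instead of scanning every vector.
    order = np.argsort(np.linalg.norm(centroids - query, axis=1))
    candidates = np.concatenate([lists[c] for c in order[:nprobe]])
    dists = np.linalg.norm(vectors[candidates] - query, axis=1)
    return candidates[np.argsort(dists)[:k]]

def exact_search(query, k=10):
    # Brute force: distance to every vector (the baseline IVF approximates).
    return np.argsort(np.linalg.norm(vectors - query, axis=1))[:k]

query = rng.normal(size=64).astype(np.float32)
approx = set(ivf_search(query, nprobe=4))
exact = set(exact_search(query))
recall = len(approx & exact) / 10  # higher nprobe -> higher recall, slower query
```

Probing all clusters makes IVF equivalent to brute force; probing fewer trades recall for speed, which is exactly the knob managed databases expose.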
The Six Major Vector Databases Compared
| Feature | Pinecone | Weaviate | Chroma | pgvector | Qdrant | Milvus |
|---|---|---|---|---|---|---|
| Type | Managed only | Open source + managed | Open source + managed | Postgres extension | Open source + managed | Open source + managed |
| License | Proprietary | BSD-3 | Apache 2.0 | PostgreSQL License | Apache 2.0 | Apache 2.0 |
| Free Tier | Yes (2GB) | Sandbox | Yes | Free (extension) | Yes (1GB cloud) | Free (Zilliz Cloud) |
| Paid Starting | $70/mo | $25/mo | Usage-based | Your Postgres cost | $25/mo | $65/mo (Zilliz) |
| Hybrid Search | Sparse vectors | Built-in (BM25 + vector) | Limited | Combine with tsvector | Sparse vectors + payload | Built-in |
| Max Dimensions | 20,000 | 65,535 | No limit | 16,000 | 65,536 | 32,768 |
| Index Type | Proprietary | HNSW | HNSW | IVFFlat, HNSW | HNSW | IVF, HNSW, DiskANN |
| Metadata Filtering | Yes | Yes | Yes | SQL WHERE clauses | Yes (rich filters) | Yes |
| Multi-tenancy | Namespaces | Native | Collections | Row-level security | Collection-level | Partitions |
| SDK Languages | Python, JS, Go, Java | Python, JS, Go, Java | Python, JS | Any Postgres client | Python, JS, Go, Rust, Java | Python, JS, Go, Java |
| Best For | Zero-ops teams | Hybrid search | Prototyping | Postgres shops | Filtering-heavy workloads | Billion-scale datasets |
Pinecone
The most popular managed vector database. Fully hosted, no infrastructure to manage. You send vectors through the API, and Pinecone handles indexing, replication, and scaling.
Pricing: Free tier (1 index, 2GB storage). Starter at $70/month. Standard from $231/month. Enterprise custom pricing.
Strengths: Easiest setup (5 minutes to first query). Excellent documentation. Reliable uptime. Metadata filtering built in. Serverless option reduces costs for sporadic workloads.
Weaknesses: Vendor lock-in (proprietary, no self-hosted option). Gets expensive at scale. Limited query flexibility compared to self-hosted options. No hybrid search with BM25 (uses sparse vectors instead).
Best for: Teams that want zero infrastructure management. Startups and mid-size companies building RAG applications where ops simplicity matters more than cost optimization.
Pinecone's serverless offering (launched 2024) changed the pricing model. You pay per query and storage rather than provisioning fixed pods. For applications with variable traffic, this can reduce costs by 50-80% compared to pod-based pricing. For steady high-throughput workloads, pods are still more predictable.
Weaviate
Open-source vector database with both self-hosted and managed cloud options. Weaviate's differentiator is built-in hybrid search that combines BM25 keyword scoring with vector similarity in a single query.
Pricing: Free (self-hosted). Weaviate Cloud: free sandbox, Standard from $25/month for small workloads. Enterprise pricing scales with usage.
Strengths: Open source (can self-host for free). Built-in hybrid search (BM25 + vector). GraphQL API. Supports multiple embedding models natively with vectorizer modules. Named vectors allow multiple embedding spaces per object.
Weaknesses: Self-hosting requires DevOps knowledge. Cloud pricing can surprise at scale. More complex setup than Pinecone. Resource-heavy for small deployments.
Best for: Teams that want hybrid search, need to self-host for compliance, or want to run embedding models alongside the database.
Weaviate's hybrid search deserves a closer look. Pure vector search sometimes misses results that contain exact keywords users expect. Hybrid search runs both a BM25 keyword query and a vector similarity query, then fuses the results. For customer support applications and documentation search, hybrid search consistently outperforms pure vector search in retrieval quality benchmarks.
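One common way to fuse the two ranked lists is reciprocal rank fusion (RRF). The sketch below assumes you already have the BM25 and vector result lists; Weaviate also supports score-based fusion, so treat this as one illustrative method rather than its exact internals:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of doc IDs; k=60 is the conventional damping constant."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            # Each list contributes 1/(k + rank); highly ranked docs dominate.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical result lists from the two sub-queries.
bm25_hits = ["doc7", "doc2", "doc9"]    # keyword (BM25) ranking
vector_hits = ["doc2", "doc4", "doc7"]  # vector similarity ranking

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
# doc2 and doc7 appear in both lists, so they rise to the top.
```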
Chroma
Lightweight, developer-friendly vector database designed for rapid prototyping and small-to-medium production workloads.
Pricing: Free and open source. Hosted offering available with free tier.
Strengths: Simplest API of any vector database (4 main functions: add, query, update, delete). Runs in-process (no server needed for development). Great Python integration. Embeds documents for you if you provide an embedding function.
Weaknesses: Limited production track record at large scale (10M+ vectors). Fewer enterprise features (no built-in replication, backup). Smaller ecosystem compared to Pinecone or Weaviate.
Best for: Prototyping, hackathons, small to medium applications (under 1M vectors), developers who want the simplest possible setup.
Chroma's in-process mode is its killer feature for development. You import it, create a collection, and start adding documents. No Docker, no server process, no config files. When you're iterating on chunking strategies or embedding models, this removes friction that slows down experimentation. For production, switch to client-server mode or the hosted offering.
pgvector (PostgreSQL Extension)
A PostgreSQL extension that adds vector similarity search to your existing Postgres database. This is the "boring technology" option, and for many teams it's the right one.
Pricing: Free (it's an extension for Postgres you already run). Managed Postgres services (Supabase, Neon, RDS) support it at their standard pricing.
Strengths: No new infrastructure. Vectors live alongside your relational data. Full SQL for querying. ACID transactions. JOIN vectors with user tables, permission tables, anything. You already know Postgres.
Weaknesses: Slower than dedicated vector databases at scale (10M+ vectors). HNSW index uses significant memory. No built-in sharding for horizontal scaling. Tuning requires Postgres knowledge.
Best for: Teams already using Postgres. Applications under 5M vectors. Situations where you need vectors + relational data in the same query. Regulated industries where adding new infrastructure requires security review.
pgvector added HNSW indexing in version 0.5.0, which closed the performance gap significantly. With HNSW, pgvector handles 1M vectors at 30-80ms p95 latency. That's fast enough for any RAG application where the LLM generation step takes 500ms or more.
The underrated advantage of pgvector is transactional consistency. When you update a document and its embedding in the same transaction, you never have stale vectors pointing to deleted content. With separate vector databases, you need a sync pipeline, and sync pipelines eventually drift.
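To make the pgvector setup concrete, here is a minimal schema and query sketch held in Python string constants. The table and column names are hypothetical, and the HNSW options assume pgvector 0.5.0 or later:

```python
# pgvector's distance operators, keyed by metric name.
PGVECTOR_OPS = {
    "l2": "<->",      # Euclidean distance
    "cosine": "<=>",  # cosine distance
    "ip": "<#>",      # negative inner product
}

# Hypothetical schema -- adjust names and dimensions to your own data.
SETUP_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE documents (
    id bigserial PRIMARY KEY,
    content text,
    embedding vector(1536)
);
-- HNSW index (pgvector >= 0.5.0); m / ef_construction are the standard HNSW knobs.
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 200);
"""

def knn_query(metric: str = "cosine", limit: int = 5) -> str:
    # Parameterized nearest-neighbor query; $1 is the query vector.
    op = PGVECTOR_OPS[metric]
    return (
        f"SELECT id, content FROM documents "
        f"ORDER BY embedding {op} $1 LIMIT {limit}"
    )
```

You would execute these through any Postgres client; because it's plain SQL, the vector query can also JOIN against permission or user tables in the same statement.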
Qdrant
Open-source vector database written in Rust, known for strong filtering performance and flexible payload storage.
Pricing: Free (self-hosted). Qdrant Cloud: 1GB free tier, then from ~$25/month. Enterprise pricing available.
Strengths: Written in Rust (fast, memory-efficient). Excellent metadata filtering, supports nested objects and geo-filters. Quantization built in (binary, scalar, product). Snapshot and backup support. gRPC API for low-latency access.
Weaknesses: Smaller community than Pinecone or Weaviate. Cloud offering is newer. Fewer built-in integrations (no native embedding module).
Best for: Applications with complex filtering requirements (e-commerce, multi-tenant SaaS). Teams that want high performance with self-hosted control. Rust-native infrastructure stacks.
Qdrant's filtering is worth highlighting. Most vector databases apply filters after the ANN search, which means you search for nearest neighbors first and then remove results that don't match your filter. Qdrant applies filters during the search using a technique called payload indexing. For queries like "find similar documents, but only from workspace X, created after January 2026, tagged as 'engineering'," Qdrant handles this without the recall degradation that post-filtering causes.
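The recall degradation from post-filtering is easy to demonstrate in a toy NumPy sketch: when only a small fraction of vectors match the filter, filtering after a top-k search can return far fewer than k valid results. This illustrates the concept, not Qdrant's actual payload-index implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
vectors = rng.normal(size=(10_000, 32))
# Only ~2% of documents belong to the tenant we're filtering for.
tenant_mask = rng.random(10_000) < 0.02
query = rng.normal(size=32)
k = 10

dists = np.linalg.norm(vectors - query, axis=1)

# Post-filtering: take the global top-k, THEN drop non-matching results.
top_k = np.argsort(dists)[:k]
post_filtered = [i for i in top_k if tenant_mask[i]]  # usually far fewer than k

# Pre-filtering: restrict the candidate set first, then take top-k.
candidates = np.where(tenant_mask)[0]
pre_filtered = candidates[np.argsort(dists[candidates])[:k]]  # always k results
```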
Qdrant also supports quantization out of the box. Binary quantization reduces memory usage by 32x with moderate recall loss (good for re-ranking pipelines). Scalar quantization offers a 4x reduction with minimal recall impact. This makes Qdrant practical for larger datasets on smaller machines.
Milvus
Open-source vector database designed for billion-scale deployments. Backed by Zilliz, which offers a managed cloud version.
Pricing: Free (self-hosted). Zilliz Cloud: free tier, Standard from $65/month. Enterprise pricing for large clusters.
Strengths: Built for massive scale (billions of vectors). Supports GPU-accelerated indexing. Multiple index types (IVF, HNSW, DiskANN, SCANN). Horizontal scaling with separation of storage and compute. Strong in the Chinese tech ecosystem.
Weaknesses: Complex to self-host (depends on etcd, MinIO, Pulsar/Kafka). Overkill for most applications under 10M vectors. Cloud offering (Zilliz) costs more than alternatives. Documentation can be inconsistent across versions.
Best for: Large-scale applications with 100M+ vectors. Teams that need GPU-accelerated index building. Organizations with the infrastructure team to manage a distributed system.
Milvus separates storage and compute, which means you can scale query nodes independently of data nodes. For applications with bursty query patterns (a product catalog that gets hammered during sales events), you can scale query capacity without duplicating your entire dataset. This architecture is powerful but adds operational complexity that most teams don't need.
Milvus Lite is a lighter option that runs in-process (similar to Chroma) for development and small deployments. It's a good way to prototype with Milvus before committing to the full distributed setup.
Performance Benchmarks
Here are realistic numbers for a common use case: 1M document chunks, 1536-dimension embeddings, returning top 10 results. These come from ANN-Benchmarks and vendor-published data, so treat them as indicative rather than definitive. Your results will vary based on hardware, index configuration, and data distribution.
| Database | p95 Latency (1M vectors) | Recall @10 | Memory Usage | Index Build Time |
|---|---|---|---|---|
| Pinecone (serverless) | 15-30ms | ~98% | Managed | Minutes |
| Weaviate (HNSW) | 20-50ms | ~97% | ~8GB | 10-20 min |
| Chroma (in-process) | 10-25ms | ~97% | ~6GB | 5-10 min |
| pgvector (HNSW) | 30-80ms | ~95% | ~8GB | 15-30 min |
| Qdrant (HNSW) | 10-30ms | ~98% | ~7GB | 8-15 min |
| Milvus (HNSW) | 15-40ms | ~97% | ~8GB | 10-20 min |
All of these are fast enough for production RAG applications where the LLM generation step takes 500-3000ms. The vector search latency is rarely your bottleneck. If your application feels slow, profile the full pipeline before blaming the vector database.
At 10M vectors, the picture changes. pgvector latency climbs to 200-400ms without careful tuning. Dedicated vector databases stay in the 30-80ms range through more sophisticated indexing and memory management. At 100M+ vectors, Milvus and Qdrant with quantization pull ahead of the pack.
Real Architecture Examples
Abstract comparisons only go so far. Here's how vector databases fit into three common production architectures.
Architecture 1: RAG Chatbot for Internal Documentation
Use case: An internal chatbot that answers employee questions using company documentation (Confluence, Notion, Google Docs). 50K documents, 200K chunks after splitting.
Stack:
- Embedding model: OpenAI text-embedding-3-small (1536 dimensions, $0.02 per 1M tokens)
- Vector storage: pgvector on existing Supabase instance
- LLM: GPT-4o or Claude for answer generation
- Ingestion: Python script runs nightly, pulls from APIs, chunks documents, embeds, upserts to Postgres
Why pgvector works here: 200K vectors is well within pgvector's comfort zone. The company already uses Supabase for its main application, so pgvector adds zero infrastructure. Document access permissions map to Postgres row-level security. The nightly sync is a single cron job, no message queue or sync pipeline needed.
Query flow:
- User asks a question in Slack or a web UI
- Embed the question using the same model used for documents
- Query pgvector: `SELECT content, metadata FROM documents ORDER BY embedding <=> $query_vector LIMIT 5`
- Pass the top 5 chunks + the original question to the LLM
- Return the generated answer with source links
Cost: $0 for vector storage (part of existing Supabase plan). Embedding the corpus costs ~$4 one-time. Ongoing query costs are the LLM calls, typically $50-200/month for a team of 100.
Architecture 2: Semantic Search for an E-Commerce Catalog
Use case: A product search engine that understands natural language queries like "warm waterproof jacket for hiking" across a catalog of 2M products. Results need filtering by category, price range, brand, and availability.
Stack:
- Embedding model: Cohere embed-v3 (1024 dimensions, multilingual support)
- Vector storage: Qdrant (self-hosted, 3-node cluster)
- Re-ranker: Cohere rerank-v3 on top 50 results
- Ingestion: Kafka pipeline from product catalog updates
Why Qdrant here: 2M products with complex filters (category, price, brand, in-stock status, warehouse location) makes Qdrant's payload filtering a strong fit. Post-filtering at this scale drops too many relevant results. Qdrant's pre-filtering maintains recall while applying business constraints. Scalar quantization keeps memory manageable on 3 nodes with 64GB RAM each.
Query flow:
- User types "warm waterproof jacket for hiking" and selects filters (price under $300, in-stock only)
- Embed the query with Cohere
- Search Qdrant with vector + payload filter: category IN [outerwear, jackets], price <= 300, in_stock = true
- Retrieve top 50 results, pass to Cohere rerank with the original query
- Return top 20 re-ranked results to the user
Cost: Qdrant self-hosted on 3x r6i.2xlarge (64GB RAM) costs ~$1,200/month on AWS. Cohere embedding + reranking runs ~$500/month at 100K daily queries. Total: ~$1,700/month. A managed alternative (Pinecone) would run $2,500-4,000/month for equivalent throughput.
Architecture 3: Recommendation Engine for a Content Platform
Use case: A news/content platform recommends articles based on reading history. 500K articles, 5M user preference vectors (updated as users read and engage).
Stack:
- Embedding model: Custom fine-tuned E5-large (1024 dimensions, trained on engagement data)
- Vector storage: Milvus on Zilliz Cloud
- Feature store: Redis for real-time user signals
- Ingestion: Flink pipeline processes engagement events, updates user vectors every 15 minutes
Why Milvus here: 5.5M total vectors with frequent updates (user vectors change as they read). Milvus handles the write throughput from continuous user vector updates while maintaining query performance. Separation of storage and compute means query latency stays stable during bulk index rebuilds. Partitioning by content category speeds up "recommend within this section" queries.
Query flow:
- User opens the app or finishes an article
- Fetch the user's preference vector from Milvus
- Query Milvus: find 100 nearest article vectors, filtered to articles published in the last 7 days and not already read
- Blend vector similarity scores with real-time signals from Redis (trending, recency boost)
- Return top 20 recommendations
Cost: Zilliz Cloud for this workload runs ~$800/month. Custom embedding model hosting (2x A10G on AWS) adds ~$1,500/month. Redis for feature store adds ~$200/month. Total: ~$2,500/month. Self-hosting Milvus would save on the Zilliz cost but adds 0.5-1 FTE of DevOps effort.
Cost Analysis: Managed vs Self-Hosted
Vector database costs break down into three categories: infrastructure, embedding generation, and engineering time. Most teams focus only on the first and underestimate the third.
Infrastructure Costs at Different Scales
| Scale | pgvector | Pinecone | Weaviate Cloud | Qdrant Cloud | Self-Hosted (AWS) |
|---|---|---|---|---|---|
| 100K vectors | $0 (existing DB) | $0 (free tier) | $0 (sandbox) | $0 (free tier) | ~$50/mo (t3.medium) |
| 1M vectors | $0-50 (existing DB) | $70-231/mo | $25-100/mo | $25-75/mo | ~$150/mo (r6i.large) |
| 10M vectors | $200-400/mo (larger instance) | $500-1,500/mo | $300-800/mo | $200-500/mo | ~$500/mo (r6i.xlarge) |
| 100M vectors | Not recommended | $2,000-5,000/mo | $1,500-4,000/mo | $1,000-3,000/mo | ~$2,000/mo (multi-node) |
Embedding Generation Costs
Embedding your corpus is a one-time cost (plus incremental updates). Re-embedding happens when you switch models or dimensions.
| Model | Cost per 1M Tokens | Cost to Embed 1M Documents (~500 tokens each) |
|---|---|---|
| OpenAI text-embedding-3-small | $0.02 | ~$10 |
| OpenAI text-embedding-3-large | $0.13 | ~$65 |
| Cohere embed-v3 | $0.10 | ~$50 |
| Self-hosted (e.g., BGE-large on GPU) | $0 (infra cost only) | ~$5-15 (compute time) |
For most applications, embedding costs are negligible compared to LLM inference costs and infrastructure. Don't optimize embedding costs unless you're processing tens of millions of documents.
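The table's arithmetic is simple enough to sanity-check in a few lines (prices are the per-million-token rates quoted above; your actual token counts will vary):

```python
def embedding_cost(n_docs: int, avg_tokens: float, price_per_1m_tokens: float) -> float:
    """One-time cost to embed a corpus, in dollars."""
    total_tokens = n_docs * avg_tokens
    return total_tokens / 1_000_000 * price_per_1m_tokens

# 1M documents at ~500 tokens each:
small = embedding_cost(1_000_000, 500, 0.02)  # text-embedding-3-small: ~$10
large = embedding_cost(1_000_000, 500, 0.13)  # text-embedding-3-large: ~$65
```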
The Hidden Cost: Engineering Time
Self-hosting a vector database saves on licensing but costs engineering time. A realistic breakdown:
- Initial setup (self-hosted): 2-5 days for deployment, monitoring, backups
- Ongoing maintenance: 2-4 hours/month for updates, capacity planning, incident response
- Index tuning: 1-2 days when performance issues arise
- Scaling events: 1-3 days for major capacity changes
At an engineering cost of $150/hour, self-hosting overhead runs $400-1,000/month in labor. For teams under 50 engineers, managed services often save money when you account for engineering time. For larger infrastructure teams that already manage Kubernetes clusters and database deployments, self-hosting adds minimal incremental work.
When Managed Services Make Financial Sense
Use managed services (Pinecone, Weaviate Cloud, Qdrant Cloud, Zilliz) when:
- Your team is under 10 engineers and doesn't have dedicated DevOps
- Your vector count is under 10M (managed pricing is reasonable at this scale)
- You need production uptime without on-call rotation for the vector layer
- You're in a startup where engineering time is the scarcest resource
Self-host when:
- You have 50M+ vectors and managed pricing becomes prohibitive
- You have compliance requirements that prevent sending data to third parties
- Your team already manages Kubernetes and has infrastructure expertise
- You need custom configurations that managed services don't expose
When You Need a Dedicated Vector Database
You need a dedicated vector database when:
You have more than 5 million vectors
At this scale, pgvector performance degrades noticeably, and dedicated vector databases maintain consistent latency through purpose-built indexing and memory management. If your document corpus is large (millions of pages), or you're storing per-user vectors alongside content vectors, a dedicated solution makes sense.
You need sub-10ms query latency
Real-time applications like autocomplete, live recommendations, or streaming retrieval need the fastest possible vector lookup. Dedicated vector databases with in-memory HNSW indexing and gRPC APIs deliver consistent single-digit millisecond latency that pgvector can't match.
You're doing complex filtered search at scale
If your workload combines vector similarity with multiple metadata filters (category, date range, tenant ID, permissions), and you have millions of vectors, dedicated databases handle this more efficiently. Qdrant and Milvus apply filters during the search rather than after, maintaining recall quality.
You need horizontal scaling
Single-node Postgres has limits. If your data grows beyond what a single large instance can handle (typically 10-20M vectors depending on dimensions), you need a database that supports sharding across nodes. Milvus, Qdrant, and Weaviate all support distributed deployments.
When You Don't Need One
This is the section most vector database marketing materials skip. You probably don't need a dedicated vector database when:
Your corpus is under 100K documents
For small to medium document sets, pgvector works perfectly well. Query latency at 100K vectors is under 20ms with a properly configured HNSW index. You avoid adding a new piece of infrastructure to your stack, which means less to monitor, maintain, and pay for.
You already use Postgres
If your application already runs on Postgres (and most web applications do), pgvector keeps everything in one database. Your vectors join with your relational data in a single query. You don't need a separate data pipeline to sync between systems. For most startups and early-stage products, this simplicity is worth more than the performance gains of a dedicated solution.
You're prototyping
During prototyping and early development, use Chroma (in-process, zero setup) or pgvector (if you already have Postgres). Don't add infrastructure complexity before you've validated that your RAG approach works. You can migrate to a dedicated vector database later if you need the scale.
Your retrieval quality issues aren't about the database
This is the most common mistake. Teams switch vector databases hoping to improve RAG quality. But poor retrieval quality is almost always one of these problems:
- Bad chunking strategy: Chunks too large (lose specificity) or too small (lose context). Try 256-512 tokens with 50-token overlap as a starting point.
- Wrong embedding model: A general-purpose model on domain-specific content. Fine-tuned or domain-specific models improve recall by 10-30% in specialized domains.
- Poor query formulation: The user's query doesn't match the style of the embedded documents. Query expansion, HyDE (Hypothetical Document Embeddings), or query rewriting with an LLM can help.
- Missing metadata: You're not using metadata filters to narrow the search space before vector similarity. Adding filters for document type, date, or section improves precision.
The database stores and retrieves vectors. Switching from pgvector to Pinecone won't fix a bad chunking strategy.
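The chunking starting point above (fixed-size windows with token overlap) fits in a few lines. Here tokens are approximated by whitespace-split words for simplicity; a real pipeline would use the embedding model's tokenizer:

```python
def chunk_tokens(tokens, chunk_size=256, overlap=50):
    """Split a token list into fixed-size chunks with overlapping boundaries."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # last window reached the end; avoid tiny trailing chunks
    return chunks

# Crude word-level stand-in for real tokenization:
words = ("lorem ipsum " * 300).split()  # 600 "tokens"
chunks = chunk_tokens(words, chunk_size=256, overlap=50)
# Consecutive chunks share a 50-token boundary region, so a fact that
# straddles a chunk boundary still appears whole in at least one chunk.
```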
Common Mistakes and How to Avoid Them
Mistake 1: Choosing the database before validating the approach
Teams spend weeks evaluating vector databases before confirming that vector search solves their problem. Start with Chroma or pgvector. Build a prototype. Measure retrieval quality. If the approach works and you hit scale limits, migrate. The migration cost is 1-3 days, and you'll make a better database decision with production data.
Mistake 2: Storing too much in the vector database
Vector databases are optimized for similarity search, not general-purpose storage. Store the embedding, a chunk ID, and minimal metadata in the vector database. Keep the full document content, user data, and application state in your primary database. Query the vector database for IDs, then hydrate results from your main database.
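The ID-then-hydrate pattern looks like this in outline. The vector results and primary-store lookups are stand-ins; in production the second step is typically a SQL `WHERE id IN (...)` query against your main database:

```python
# Stand-ins for the two stores; names here are illustrative only.
vector_results = [("chunk_42", 0.91), ("chunk_7", 0.88)]  # (id, score) from ANN search
primary_store = {                                         # your main database
    "chunk_42": {"text": "Full chunk text...", "doc": "handbook.pdf"},
    "chunk_7": {"text": "Another chunk...", "doc": "faq.md"},
}

def hydrate(results, store):
    """Resolve lightweight vector-search hits into full records."""
    return [
        {"id": chunk_id, "score": score, **store[chunk_id]}
        for chunk_id, score in results
    ]

hydrated = hydrate(vector_results, primary_store)
```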
Mistake 3: Ignoring index configuration
The default HNSW parameters (M=16, ef_construction=200) work for most cases. But if you're seeing poor recall or high latency, tuning these matters. Higher M increases connections per node (better recall, more memory). Higher ef_construction builds a more accurate index (slower build, better queries). Higher ef_search checks more candidates at query time (better recall, slower queries). Run benchmarks with your data, because optimal parameters depend on your dataset's characteristics.
Mistake 4: Not monitoring recall quality over time
As your dataset grows, recall can degrade if your index parameters were tuned for a smaller corpus. Build an evaluation set of query-result pairs and measure recall weekly. If recall drops below your threshold (95% is a common target), rebuild the index with updated parameters or consider scaling your infrastructure.
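A recall@k evaluation needs only a labeled set of queries with known relevant chunk IDs. A minimal harness might look like this, where the search function and eval data are stand-ins for your own:

```python
def recall_at_k(eval_set, search_fn, k=10):
    """eval_set: list of (query, set_of_relevant_ids). Returns mean recall@k."""
    total = 0.0
    for query, relevant in eval_set:
        retrieved = set(search_fn(query, k))
        total += len(retrieved & relevant) / len(relevant)
    return total / len(eval_set)

# Stand-in search function and eval pairs for illustration:
fake_index = {
    "reset password": ["doc1", "doc4", "doc9"],
    "billing cycle": ["doc2", "doc7", "doc3"],
}
def search_fn(query, k):
    return fake_index[query][:k]

eval_set = [
    ("reset password", {"doc1", "doc9"}),  # both retrieved -> recall 1.0
    ("billing cycle", {"doc2", "doc8"}),   # one of two retrieved -> recall 0.5
]
score = recall_at_k(eval_set, search_fn, k=3)  # (1.0 + 0.5) / 2 = 0.75
```

Run this weekly against the live index; a downward trend is your signal to revisit index parameters before users notice.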
Mistake 5: Overengineering multi-tenancy
If you have 10 tenants, separate collections work fine. At 1,000 tenants, you need metadata-filtered search within a shared collection. At 100,000 tenants, you need a database that supports efficient partition pruning (Milvus partitions or Qdrant's payload indexing). Match the multi-tenancy strategy to your scale, not to what you think you'll need in two years.
Migration Considerations
Starting with pgvector and migrating later is a valid strategy. Here's what the migration involves:
- Re-embed your documents: You might want to use a different embedding model anyway. Budget a few dollars for the embedding API calls and 1-2 hours of compute time.
- Abstraction layer: If you build a thin retrieval interface from the start (a function that takes a query string and returns relevant chunks), the migration surface is small. All the database-specific code lives behind one function.
- Update your retrieval code: Swap the query logic from SQL to the new database's SDK. Typically 50-200 lines of code if you used an abstraction layer.
- Set up the new infrastructure: Managed services (Pinecone, Weaviate Cloud) take 30 minutes. Self-hosted takes 2-5 days including monitoring and backups.
- Test retrieval quality: Ensure results are equivalent. Run your eval suite against both systems. Look for edge cases in filtered queries and metadata handling.
- Parallel run: For production systems, run both databases in parallel for a week. Query both, compare results, switch traffic gradually.
Total migration time: 1-3 days for managed, 3-7 days for self-hosted. It's not a major undertaking, which is why starting simple and scaling up is usually the right call.
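The abstraction-layer advice from step 2 amounts to one small interface: if all retrieval goes through something like this, swapping pgvector for a managed service touches only one class. The names and the toy scoring below are illustrative:

```python
from typing import Protocol

class Retriever(Protocol):
    def retrieve(self, query: str, k: int = 5) -> list[str]:
        """Return the k most relevant chunk texts for a query."""
        ...

class InMemoryRetriever:
    """Toy backend; a real one would wrap pgvector, Qdrant, Pinecone, etc."""
    def __init__(self, chunks: dict[str, str]):
        self.chunks = chunks

    def retrieve(self, query: str, k: int = 5) -> list[str]:
        # Placeholder word-overlap scoring: real backends embed the
        # query and run an ANN search instead.
        query_words = set(query.lower().split())
        scored = sorted(
            self.chunks.values(),
            key=lambda text: len(query_words & set(text.lower().split())),
            reverse=True,
        )
        return scored[:k]

# Application code depends only on the Retriever interface:
retriever: Retriever = InMemoryRetriever({
    "c1": "How to reset your password",
    "c2": "Quarterly billing cycle overview",
})
hits = retriever.retrieve("password reset help", k=1)
```

Migrating then means writing a new class with the same `retrieve` signature and changing one constructor call; the rest of the application never sees the database SDK.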
How to Choose: A Decision Framework
Answer these five questions to narrow your choice:
1. How many vectors do you have (or expect within 12 months)?
- Under 100K: Chroma (prototyping) or pgvector (production)
- 100K-5M: pgvector, Pinecone, Weaviate, or Qdrant
- 5M-50M: Pinecone, Weaviate, Qdrant, or Milvus
- 50M+: Milvus, Qdrant, or Weaviate (all self-hosted or enterprise cloud)
2. Do you need hybrid search (keyword + vector)?
- Yes: Weaviate (best built-in hybrid) or Qdrant (sparse vectors)
- No: Any option works
3. Do you already use PostgreSQL?
- Yes and under 5M vectors: pgvector is the default choice. No new infrastructure, no sync pipeline, transactional consistency.
- Yes but over 5M vectors: Evaluate a dedicated database, but consider pgvector with HNSW tuning first.
4. Do you have DevOps capacity to self-host?
- Yes: Self-hosted Qdrant, Weaviate, or Milvus (save on managed service markup)
- No: Pinecone, Weaviate Cloud, Qdrant Cloud, or Zilliz
5. Do you have compliance constraints (data residency, no third-party storage)?
- Yes: pgvector (existing infra) or self-hosted Weaviate/Qdrant/Milvus
- No: Managed services are fair game
A quick summary by scenario:
- Prototyping: Chroma (zero setup, runs in your Python process)
- Production, under 5M vectors, already on Postgres: pgvector
- Production, under 5M vectors, want managed: Pinecone (easiest) or Qdrant Cloud (best filtering)
- Production, need hybrid search: Weaviate
- Production, 5M-50M vectors: Qdrant or Weaviate (self-hosted or cloud)
- Production, 50M+ vectors: Milvus or Qdrant with quantization
- On-premises / compliance: pgvector, Weaviate, Qdrant, or Milvus (all open source)
- Complex filtering (e-commerce, multi-tenant): Qdrant
The 2026 Vector Database Landscape
The vector database market has matured significantly since the initial RAG boom of 2023-2024. Several trends are reshaping which databases developers should consider for new projects.
PostgreSQL with pgvector Has Become the Default Starting Point
The pgvector extension has improved dramatically. HNSW index support, better parallel query execution, and the pgvector 0.7+ releases with quantization support have closed much of the performance gap with purpose-built vector databases. For teams already running PostgreSQL (which is most teams), starting with pgvector eliminates an entire category of infrastructure complexity. You can always migrate to a dedicated vector database later if you hit scale limits, but most applications never reach the point where pgvector becomes a bottleneck.
Hybrid Search Is No Longer Optional
Pure vector similarity search has a well-documented weakness: it misses exact keyword matches that users expect. Searching for "error code E-4021" using only vector search may return documents about errors generally rather than the specific error code. The industry has converged on hybrid search (combining vector similarity with BM25 keyword matching) as the standard approach. Weaviate and Qdrant both offer native hybrid search. Pinecone added keyword search support in late 2025. If you're evaluating vector databases today, hybrid search capability should be a requirement, not a nice-to-have.
Serverless Pricing Changed the Cost Equation
Pinecone's serverless offering and similar pay-per-query models from other vendors have made managed vector databases accessible to small teams and prototypes. You no longer need to provision and pay for always-on instances during development. For production workloads with predictable traffic, dedicated instances still offer better economics. But for development, staging, and low-traffic applications, serverless vector databases have eliminated the cost barrier that previously pushed teams toward self-hosted options.
Multi-Modal Embeddings Are Expanding Use Cases
Vector databases are no longer just for text. Models like OpenAI's CLIP and Google's multimodal embeddings allow you to store and search across text, images, and video in the same vector space. This opens up use cases like visual product search, cross-modal document retrieval (search text to find relevant images), and multi-modal RAG systems. Most major vector databases now support the higher-dimensional vectors these models produce, though storage and query costs scale accordingly.
The best vector database is the one that fits your current needs without adding unnecessary complexity. Start simple. Scale when you have data showing you need to. And spend more time on your embedding model and chunking strategy than on database selection, because those choices affect retrieval quality far more than the storage layer.
For more on building AI applications with retrieval, see our RAG architecture guide and the vector database glossary entry.