Vector databases became the "must-have" infrastructure for AI applications in 2024. Every RAG tutorial starts with "first, set up your vector database." But here's what most tutorials won't tell you: many AI applications don't need a dedicated vector database at all.
This guide explains what vector databases actually do, compares the major options with real pricing and performance data, and gives you a clear framework for deciding whether you need one.
What Vector Databases Actually Do
The Problem They Solve
Traditional databases are built for exact matches. You search for a customer ID, a product name, or a date range, and the database returns rows that match precisely. But AI applications need similarity search: "find the documents most similar to this question."
Similarity search works on embeddings: numerical representations of text (or images, or any data) where similar items are close together in high-dimensional space. The sentence "How do I reset my password?" and "I forgot my login credentials" have different words but similar embeddings because they mean similar things.
A vector database stores these embeddings and finds the closest ones to a query vector quickly, even across millions of items. That's the core functionality. Everything else is an optimization on top.
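At its core, that lookup is a nearest-neighbor search: score every stored vector against the query and keep the best matches. A minimal brute-force sketch in plain Python (toy 3-dimensional vectors; a real vector database replaces this linear scan with an approximate index such as HNSW):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity: dot product divided by the product of magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query, vectors, k=2):
    """Brute-force nearest-neighbor search: score every stored vector."""
    scored = [(doc_id, cosine_similarity(query, vec))
              for doc_id, vec in vectors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Toy 3-dimensional "embeddings" (real ones have hundreds of dimensions).
store = {
    "reset-password": [0.9, 0.1, 0.0],
    "forgot-login":   [0.8, 0.2, 0.1],
    "pricing-page":   [0.0, 0.1, 0.9],
}
print(top_k([0.85, 0.15, 0.05], store, k=2))
```

The linear scan is O(n) per query, which is why dedicated databases build approximate indexes: they trade a tiny amount of recall for queries that don't touch every vector.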
How Embeddings Work
An embedding model (like OpenAI's text-embedding-3-small or Cohere's embed-v3) converts text into a fixed-length array of numbers (typically 384-1536 dimensions). Two pieces of text with similar meaning produce vectors that are close together when you measure distance between them.
The most common distance metrics are:
- Cosine similarity: Measures the angle between vectors. Most popular for text similarity. Score ranges from -1 (opposite) to 1 (identical).
- Euclidean distance: Straight-line distance between points. Works well when vector magnitude matters.
- Dot product: Computationally fastest. Works well with normalized vectors (which most embedding models produce).
For most text-based AI applications, cosine similarity with a standard embedding model works well. Don't overthink the distance metric choice unless you're seeing specific retrieval quality issues.
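One reason the choice rarely matters: for unit-length (normalized) vectors, the three metrics agree. Cosine similarity equals the dot product, and squared Euclidean distance is just 2 − 2 · (dot product), so all three rank neighbors identically. A small sketch verifying both identities:

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def normalize(v):
    """Scale a vector to unit length, as most embedding models already do."""
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

a = normalize([3.0, 4.0])
b = normalize([4.0, 3.0])

# For unit-length vectors, cosine similarity IS the dot product...
print(cosine(a, b), dot(a, b))

# ...and squared Euclidean distance is a monotone function of it:
# ||a - b||^2 = ||a||^2 + ||b||^2 - 2(a . b) = 2 - 2(a . b)
print(euclidean(a, b) ** 2, 2 - 2 * dot(a, b))
```

This is why dot product is the fast path: with normalized embeddings it gives the same ranking as cosine similarity without the normalization work per query.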
The Major Vector Databases Compared
Pinecone
The most popular managed vector database. Fully hosted, no infrastructure to manage.
Pricing: Free tier (1 index, 2GB storage). Starter at $70/month. Standard from $231/month. Enterprise custom pricing.
Strengths: Easiest setup (5 minutes to first query). Excellent documentation. Reliable uptime. Metadata filtering built in.
Weaknesses: Vendor lock-in. Gets expensive at scale. Limited query flexibility compared to self-hosted options.
Best for: Teams that want zero infrastructure management. Startups and mid-size companies building RAG applications.
Weaviate
Open-source vector database with both self-hosted and managed cloud options.
Pricing: Free (self-hosted). Weaviate Cloud: free sandbox, Standard from $25/month for small workloads. Enterprise pricing scales with usage.
Strengths: Open source (can self-host for free). Built-in hybrid search (vector + keyword). GraphQL API. Supports multiple embedding models natively.
Weaknesses: Self-hosting requires DevOps knowledge. Cloud pricing can surprise you at scale. More complex setup than Pinecone.
Best for: Teams that want hybrid search or need to self-host for compliance reasons.
Chroma
Lightweight, developer-friendly vector database designed for rapid prototyping.
Pricing: Free and open source. Hosted offering available with free tier.
Strengths: Simplest API of any vector database (4 main functions). Runs in-process (no server needed for development). Great Python integration. Very fast for small-medium datasets.
Weaknesses: Not designed for large-scale production (millions of vectors). Limited enterprise features. Less mature than Pinecone or Weaviate.
Best for: Prototyping, small to medium applications (under 1M vectors), developers who want the simplest possible setup.
pgvector (PostgreSQL Extension)
A PostgreSQL extension that adds vector similarity search to your existing Postgres database.
Pricing: Free (it's an extension for Postgres you already run). Managed Postgres services (Supabase, Neon, RDS) support it at their standard pricing.
Strengths: No new infrastructure. Vectors live alongside your relational data. Full SQL for querying. ACID transactions. You already know Postgres.
Weaknesses: Slower than dedicated vector databases at scale (10M+ vectors). IVFFlat index requires manual tuning. HNSW index is better but uses more memory.
Best for: Teams already using Postgres. Applications under 5M vectors. Situations where you need vectors + relational data in the same query.
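A sketch of what pgvector usage looks like in practice, assuming the extension is available and using a hypothetical `documents` table (the `vector` column type, the HNSW index method, and the `<=>` cosine-distance operator all come from pgvector; names are illustrative):

```sql
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
    id        bigserial PRIMARY KEY,
    content   text NOT NULL,
    embedding vector(1536)  -- dimension must match your embedding model
);

-- HNSW index for fast approximate search with cosine distance.
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);

-- Top-10 most similar chunks; <=> is pgvector's cosine-distance operator.
SELECT id, content
FROM documents
ORDER BY embedding <=> $1::vector  -- query embedding passed as a parameter
LIMIT 10;
```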
Performance Comparison
Here are realistic numbers for a common use case: 1M document chunks, 1536-dimension embeddings, returning top 10 results.
- Pinecone: 15-30ms
- Weaviate (cloud): 20-50ms
- Chroma (in-process): 10-25ms
- pgvector (HNSW index): 30-80ms
All of these are fast enough for production RAG applications where the LLM generation step takes 500-3000ms. The vector search latency is rarely your bottleneck.
When You Need a Dedicated Vector Database
You actually need a dedicated vector database when:
You have more than 5 million vectors
At this scale, pgvector performance degrades noticeably, and dedicated vector databases maintain consistent latency through purpose-built indexing. If your document corpus is large (millions of pages), a dedicated solution makes sense.
You need sub-10ms query latency
Real-time applications like autocomplete or live recommendations need the fastest possible retrieval. Dedicated vector databases with in-memory indexing can deliver consistent single-digit-millisecond latency that pgvector, which runs through Postgres's general-purpose query machinery, can't match.
You're doing heavy vector operations
If your workload is primarily vector similarity search with high query volume (thousands of queries per second), a purpose-built database handles the load more efficiently. Postgres is general-purpose, which means it's adequate at vector operations but not optimal.
When You Don't Need One
This is the section most vector database marketing materials skip. You probably don't need a dedicated vector database when:
Your corpus is under 100K documents
For small to medium document sets, pgvector works perfectly well. Query latency at 100K vectors is under 20ms with a properly configured HNSW index. You avoid adding a new piece of infrastructure to your stack, which means less to monitor, maintain, and pay for.
You already use Postgres
If your application already runs on Postgres (and most web applications do), pgvector keeps everything in one database. Your vectors join with your relational data in a single query. You don't need a separate data pipeline to sync between systems. For most startups and early-stage products, this simplicity is worth more than the performance gains of a dedicated solution.
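To make that single-query advantage concrete, here is a hedged sketch with hypothetical `documents` and `customers` tables: ordinary relational filters and vector ranking in one statement, something a standalone vector database would need metadata syncing to approximate.

```sql
-- Filter by relational columns and rank by similarity in one query
-- (table and column names are illustrative).
SELECT d.id, d.content, c.plan
FROM documents d
JOIN customers c ON c.id = d.customer_id
WHERE c.plan = 'enterprise'             -- ordinary relational filter
  AND d.created_at > now() - interval '90 days'
ORDER BY d.embedding <=> $1::vector     -- pgvector cosine distance
LIMIT 10;
```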
You're prototyping
During prototyping and early development, use Chroma (in-process, zero setup) or pgvector (if you already have Postgres). Don't add infrastructure complexity before you've validated that your RAG approach works. You can migrate to a dedicated vector database later if you need the scale.
Your retrieval quality issues aren't about the database
This is the most common mistake. Teams switch vector databases hoping to improve RAG quality. But poor retrieval quality is almost always a chunking problem, an embedding model problem, or a query formulation problem. The database just stores and retrieves vectors. Switching from pgvector to Pinecone won't fix a bad chunking strategy.
Migration Considerations
Starting with pgvector and migrating later is a valid strategy. Here's what the migration involves:
- Re-embed your documents: You might want to use a different embedding model anyway. Budget a few dollars for the embedding API calls.
- Update your retrieval code: Swap the query logic from SQL to the new database's SDK. Typically 50-200 lines of code.
- Set up the new infrastructure: Managed services (Pinecone, Weaviate Cloud) take 30 minutes. Self-hosted takes longer.
- Test retrieval quality: Ensure results are equivalent. Run your eval suite against both systems.
Total migration time: 1-3 days for a competent engineer. It's not a major undertaking, which is why starting simple and scaling up is usually the right call.
Practical Recommendations
- Prototyping: Chroma (zero setup, runs in your Python process)
- Production, under 5M vectors, already on Postgres: pgvector
- Production, under 5M vectors, no existing database preference: Pinecone (managed) or Weaviate Cloud
- Production, over 5M vectors: Pinecone or Weaviate
- On-premises / compliance requirements: Weaviate (self-hosted) or pgvector
- Need hybrid search (keyword + vector): Weaviate
The best vector database is the one that fits your current needs without adding unnecessary complexity. Start simple. Scale when you have data showing you need to.
For more on building AI applications with retrieval, see our RAG architecture guide and the vector database glossary entry.