Vector databases became the "must-have" infrastructure for AI applications in 2024. Every RAG tutorial starts with "first, set up your vector database." But here's what most tutorials won't tell you: many AI applications don't need a dedicated vector database at all.
This guide explains what vector databases actually do, compares the major options with real pricing and performance data, and gives you a clear framework for deciding whether you need one.
What Vector Databases Actually Do
The Problem They Solve
Traditional databases are built for exact matches. You search for a customer ID, a product name, or a date range, and the database returns rows that match precisely. But AI applications need similarity search: "find the documents most similar to this question."
Similarity search works on embeddings: numerical representations of text (or images, or any data) where similar items are close together in high-dimensional space. The sentence "How do I reset my password?" and "I forgot my login credentials" have different words but similar embeddings because they mean similar things.
A vector database stores these embeddings and finds the closest ones to a query vector quickly, even across millions of items. That's the core functionality. Everything else is an optimization on top.
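At its core, that lookup is a nearest-neighbor search: score every stored vector against the query and keep the best matches. A minimal brute-force sketch in plain Python (toy 3-dimensional vectors; a real vector database replaces this linear scan with an approximate index such as HNSW):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity: dot product divided by the product of magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query, vectors, k=2):
    """Brute-force nearest-neighbor search: score every stored vector."""
    scored = [(doc_id, cosine_similarity(query, vec))
              for doc_id, vec in vectors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Toy 3-dimensional "embeddings" (real ones have hundreds of dimensions).
store = {
    "reset-password": [0.9, 0.1, 0.0],
    "forgot-login":   [0.8, 0.2, 0.1],
    "pricing-page":   [0.0, 0.1, 0.9],
}
print(top_k([0.85, 0.15, 0.05], store, k=2))
```

The linear scan is O(n) per query, which is why dedicated databases build approximate indexes: they trade a tiny amount of recall for queries that don't touch every vector.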
How Embeddings Work
An embedding model (like OpenAI's text-embedding-3-small or Cohere's embed-v3) converts text into a fixed-length array of numbers (typically 384-1536 dimensions). Two pieces of text with similar meaning produce vectors that are close together when you measure distance between them.
The most common distance metrics are:
- Cosine similarity: Measures the angle between vectors. Most popular for text similarity. Score ranges from -1 (opposite) to 1 (identical).
- Euclidean distance: Straight-line distance between points. Works well when vector magnitude matters.
- Dot product: Computationally fastest. Works well with normalized vectors (which most embedding models produce).
For most text-based AI applications, cosine similarity with a standard embedding model works well. Don't overthink the distance metric choice unless you're seeing specific retrieval quality issues.
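One reason the choice rarely matters: for unit-length (normalized) vectors, the three metrics agree. Cosine similarity equals the dot product, and squared Euclidean distance is just 2 − 2 · (dot product), so all three rank neighbors identically. A small sketch verifying both identities:

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def normalize(v):
    """Scale a vector to unit length, as most embedding models already do."""
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

a = normalize([3.0, 4.0])
b = normalize([4.0, 3.0])

# For unit-length vectors, cosine similarity IS the dot product...
print(cosine(a, b), dot(a, b))

# ...and squared Euclidean distance is a monotone function of it:
# ||a - b||^2 = ||a||^2 + ||b||^2 - 2(a . b) = 2 - 2(a . b)
print(euclidean(a, b) ** 2, 2 - 2 * dot(a, b))
```

This is why dot product is the fast path: with normalized embeddings it gives the same ranking as cosine similarity without the normalization work per query.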
The Major Vector Databases Compared
Pinecone
The most popular managed vector database. Fully hosted, no infrastructure to manage.
Pricing: Free tier (1 index, 2GB storage). Starter at $70/month. Standard from $231/month. Enterprise custom pricing.
Strengths: Easiest setup (5 minutes to first query). Excellent documentation. Reliable uptime. Metadata filtering built in.
Weaknesses: Vendor lock-in. Gets expensive at scale. Limited query flexibility compared to self-hosted options.
Best for: Teams that want zero infrastructure management. Startups and mid-size companies building RAG applications.
Weaviate
Open-source vector database with both self-hosted and managed cloud options.
Pricing: Free (self-hosted). Weaviate Cloud: free sandbox, Standard from $25/month for small workloads. Enterprise pricing scales with usage.
Strengths: Open source (can self-host for free). Built-in hybrid search (vector + keyword). GraphQL API. Supports multiple embedding models natively.
Weaknesses: Self-hosting requires DevOps knowledge. Cloud pricing can surprise you at scale. More complex setup than Pinecone.
Best for: Teams that want hybrid search or need to self-host for compliance reasons.
Chroma
Lightweight, developer-friendly vector database designed for rapid prototyping.
Pricing: Free and open source. Hosted offering available with free tier.
Strengths: Simplest API of any vector database (4 main functions). Runs in-process (no server needed for development). Great Python integration. Very fast for small-medium datasets.
Weaknesses: Not designed for large-scale production (millions of vectors). Limited enterprise features. Less mature than Pinecone or Weaviate.
Best for: Prototyping, small to medium applications (under 1M vectors), developers who want the simplest possible setup.
pgvector (PostgreSQL Extension)
A PostgreSQL extension that adds vector similarity search to your existing Postgres database.
Pricing: Free (it's an extension for Postgres you already run). Managed Postgres services (Supabase, Neon, RDS) support it at their standard pricing.
Strengths: No new infrastructure. Vectors live alongside your relational data. Full SQL for querying. ACID transactions. You already know Postgres.
Weaknesses: Slower than dedicated vector databases at scale (10M+ vectors). IVFFlat index requires manual tuning. HNSW index is better but uses more memory.
Best for: Teams already using Postgres. Applications under 5M vectors. Situations where you need vectors + relational data in the same query.
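A sketch of what pgvector usage looks like in practice, assuming the extension is available and using a hypothetical `documents` table (the `vector` column type, the HNSW index method, and the `<=>` cosine-distance operator all come from pgvector; names are illustrative):

```sql
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
    id        bigserial PRIMARY KEY,
    content   text NOT NULL,
    embedding vector(1536)  -- dimension must match your embedding model
);

-- HNSW index for fast approximate search with cosine distance.
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);

-- Top-10 most similar chunks; <=> is pgvector's cosine-distance operator.
SELECT id, content
FROM documents
ORDER BY embedding <=> $1::vector  -- query embedding passed as a parameter
LIMIT 10;
```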
Performance Comparison
Here are realistic numbers for a common use case: 1M document chunks, 1536-dimension embeddings, returning top 10 results.
- Pinecone: 15-30ms
- Weaviate (cloud): 20-50ms
- Chroma (in-process): 10-25ms
- pgvector (HNSW index): 30-80ms
All of these are fast enough for production RAG applications where the LLM generation step takes 500-3000ms. The vector search latency is rarely your bottleneck.
When You Need a Dedicated Vector Database
You actually need a dedicated vector database when:
You have more than 5 million vectors
At this scale, pgvector performance degrades noticeably, and dedicated vector databases maintain consistent latency through purpose-built indexing. If your document corpus is large (millions of pages), a dedicated solution makes sense.
You need sub-10ms query latency
Real-time applications like autocomplete or live recommendations need the fastest possible retrieval. Dedicated vector databases with in-memory indexing can deliver consistent single-digit-millisecond latency that pgvector, which runs through Postgres's general-purpose query machinery, can't match.
You're doing heavy vector operations
If your workload is primarily vector similarity search with high query volume (thousands of queries per second), a purpose-built database handles the load more efficiently. Postgres is general-purpose, which means it's adequate at vector operations but not optimal.
When You Don't Need One
This is the section most vector database marketing materials skip. You probably don't need a dedicated vector database when:
Your corpus is under 100K documents
For small to medium document sets, pgvector works perfectly well. Query latency at 100K vectors is under 20ms with a properly configured HNSW index. You avoid adding a new piece of infrastructure to your stack, which means less to monitor, maintain, and pay for.
You already use Postgres
If your application already runs on Postgres (and most web applications do), pgvector keeps everything in one database. Your vectors join with your relational data in a single query. You don't need a separate data pipeline to sync between systems. For most startups and early-stage products, this simplicity is worth more than the performance gains of a dedicated solution.
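To make that single-query advantage concrete, here is a hedged sketch with hypothetical `documents` and `customers` tables: ordinary relational filters and vector ranking in one statement, something a standalone vector database would need metadata syncing to approximate.

```sql
-- Filter by relational columns and rank by similarity in one query
-- (table and column names are illustrative).
SELECT d.id, d.content, c.plan
FROM documents d
JOIN customers c ON c.id = d.customer_id
WHERE c.plan = 'enterprise'             -- ordinary relational filter
  AND d.created_at > now() - interval '90 days'
ORDER BY d.embedding <=> $1::vector     -- pgvector cosine distance
LIMIT 10;
```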
You're prototyping
During prototyping and early development, use Chroma (in-process, zero setup) or pgvector (if you already have Postgres). Don't add infrastructure complexity before you've validated that your RAG approach works. You can migrate to a dedicated vector database later if you need the scale.
Your retrieval quality issues aren't about the database
This is the most common mistake. Teams switch vector databases hoping to improve RAG quality. But poor retrieval quality is almost always a chunking problem, an embedding model problem, or a query formulation problem. The database just stores and retrieves vectors. Switching from pgvector to Pinecone won't fix a bad chunking strategy.
Migration Considerations
Starting with pgvector and migrating later is a valid strategy. Here's what the migration involves:
- Re-embed your documents: You might want to use a different embedding model anyway. Budget a few dollars for the embedding API calls.
- Update your retrieval code: Swap the query logic from SQL to the new database's SDK. Typically 50-200 lines of code.
- Set up the new infrastructure: Managed services (Pinecone, Weaviate Cloud) take 30 minutes. Self-hosted takes longer.
- Test retrieval quality: Ensure results are equivalent. Run your eval suite against both systems.
Total migration time: 1-3 days for a competent engineer. It's not a major undertaking, which is why starting simple and scaling up is usually the right call.
Practical Recommendations
- Prototyping: Chroma (zero setup, runs in your Python process)
- Production, under 5M vectors, already on Postgres: pgvector
- Production, under 5M vectors, no existing database preference: Pinecone (managed) or Weaviate Cloud
- Production, over 5M vectors: Pinecone or Weaviate
- On-premises / compliance requirements: Weaviate (self-hosted) or pgvector
- Need hybrid search (keyword + vector): Weaviate
The best vector database is the one that fits your current needs without adding unnecessary complexity. Start simple. Scale when you have data showing you need to.
For more on building AI applications with retrieval, see our RAG architecture guide and the vector database glossary entry.