Why I Only Bet on PostgreSQL for Agent Projects: Vector Search
A friend recently asked me: “What do you use for vector search in Agent projects?” I answered without hesitation: PostgreSQL + VectorChord.
He was surprised: “Not Milvus, Pinecone, or other dedicated vector databases?”
This made me realize many people misunderstand vector storage needs in the Agent era. Today I'll share why I only bet on PostgreSQL for Agent projects, how VectorChord achieves low-cost billion-scale vector retrieval, and its advantages over traditional HNSW approaches.
What Does Vector Storage Need in the Agent Era?
Before diving into technology, let's think about Agent project characteristics:
1. Knowledge bases update frequently
Agent knowledge bases aren't static. Users constantly upload new documents and generate new conversation history, so vector indexes need frequent rebuilding. Building a traditional HNSW index over 100 million vectors takes hours, which can't keep up with this update cadence.
2. Scale grows rapidly
Agent application vector scale often correlates with user activity and document count.
Looking at real-world scenarios:
- Personal knowledge bases: Personal notes, documents, emails, and similar content typically generate vector scales in the 100K ~ 1 million range. A single user's total document volume is limited, and even with incremental updates, million-level vectors can cover most personal scenarios.
- Enterprise Agent applications: This is where vector scale really explodes. Customer service bots need to store all product documentation, conversation history, and FAQs; enterprise knowledge base Q&A needs to cover internal wikis, ticket records, and meeting notes; employee assistants need access to emails, documents, and code repositories. As business grows, vector scale can easily reach tens of millions or even hundreds of millions. For example, a mid-sized company (1000-5000 employees) knowledge base might have millions of documents, which translates to tens of millions of vector chunks.
- Agent long-term memory requirements: With the proliferation of Agent frameworks like OpenClaw and Codex, agents as “digital employees” need to remember far more than traditional RAG:
  - Complete conversation history: Every conversation needs to be vectorized for context retrieval and consistency maintenance
  - Task memory: Cron jobs, todos, long-term goals, and their execution states
  - User preference memory: Communication styles, decision patterns, and focus areas need to be remembered and reused
  - Knowledge updates: New information learned from each interaction needs to be incorporated into the knowledge base
This type of memory content accumulates continuously over time. An agent actively used for six months can generate millions of vectors from conversation records alone.
This brings scalability challenges to vector storage solutions.
3. Cost-sensitive
Most Agent projects are still at an early stage and not yet very profitable. Maintaining a separate dedicated database just for vector search is a burden in both cost and operations.
4. Data consistency
Agent vector data is often tightly coupled with business data (users, documents, conversation records). Separate storage brings consistency issues and data synchronization complexity.
Based on these characteristics, my selection criteria are clear:
- Fast build: Can handle frequent updates
- Low cost: Small teams can afford it
- Already in tech stack: No additional components
- Good enough: Don't chase ultimate performance
Problems with Dedicated Vector Databases
Let me first explain why I don't choose dedicated vector databases.
When evaluating vector database options, many immediately consider dedicated solutions like Milvus, Pinecone, or Weaviate. However, in Agent scenarios, these solutions expose significant issues:
| Problem | Specific Manifestation |
|---|---|
| High cost | 100M 768-dim vectors need >$1000/month, memory >100GB |
| Slow build | HNSW index building 100M vectors takes >10 hours |
| Complex ops | Need separate deployment, monitoring, backup |
| Data scattered | Vector data separated from business data, consistency issues |
For most Agent projects, these problems are unacceptable. This is also why I prefer PostgreSQL-native solutions in RAG vector retrieval comparisons.
VectorChord: The PostgreSQL-Native Answer
VectorChord is a PostgreSQL-native vector search extension from our team (TensorChord) and the successor to pgvecto.rs.
First, a comparison:
| Solution | 100M 768-dim Storage | Build Time | Memory | Monthly Cost |
|---|---|---|---|---|
| Dedicated vector DB (HNSW) | ~1000GB | >10 hours | >100GB | ~$1000+ |
| VectorChord (IVF+RaBitQ8) | ~100GB | ~20 minutes | <1GB | ~$247 |
The cost is roughly 1/5, and builds are about 30x faster. This is exactly what Agent projects need.
Core Technology: Why So Fast and Efficient?
IVF + RaBitQ: Why I Gave Up HNSW
HNSW's multi-layer graph structure does deliver fast queries, but it has several fatal problems:
- Huge memory footprint: Each node stores neighbor lists for multiple layers, so memory usage is several times the size of the vectors themselves
- Slow construction: Nodes must be inserted one by one while maintaining the multi-layer graph structure
- Not disk-friendly: Frequent random access patterns perform poorly on disk
VectorChord uses a combination of IVF (Inverted File) and RaBitQ (Randomized Bit Quantization):
- High storage compression: 1-bit quantization yields 32x compression, 8-bit yields 4x (a quick back-of-the-envelope check follows the list)
- Fast construction: Streaming build, doesn't need all data in memory
- Disk-friendly: Clustered sequential storage, sequential IO optimization
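As a quick sanity check on those compression numbers (back-of-the-envelope only; it ignores per-vector metadata, centroids, and PostgreSQL page overhead):

```python
n, d = 100_000_000, 768             # 100M vectors, 768 dimensions
raw_fp32 = n * d * 4                # float32: 4 bytes per dimension
rabitq8 = n * d * 1                 # 8-bit codes: 1 byte per dimension (4x smaller)
rabitq1 = n * d // 8                # 1-bit codes: 1 bit per dimension (32x smaller)
print(raw_fp32 / 1e9, rabitq8 / 1e9, rabitq1 / 1e9)   # ~307 GB, ~77 GB, ~10 GB
```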
Storage Layout: Sequential IO + Cache-Friendly Two-Layer Design
This is one of VectorChord's core designs, borrowing ideas directly from column-oriented databases.
Global: Clustered Sequential Storage
The entire index's quantized vectors are stored sequentially in PostgreSQL data pages, cluster by cluster: all vectors assigned to cluster 1 are laid out contiguously, followed by cluster 2, and so on.
During queries, only continuous storage areas corresponding to candidate clusters need to be read - this is sequential IO rather than random IO. This design borrows from the sequential write approach of LSM-Trees, increasing disk throughput by 5-10x.
Intra-Cluster: Columnar-like Layout
Traditional row storage keeps all dimensions of each vector together: vector 1's dimensions 1 through d, then vector 2's, and so on. VectorChord instead uses a columnar-like layout inside each cluster.
Quantized data for the same dimension across all vectors is stored contiguously. When computing distances, data for multiple vectors of the same dimension can be loaded continuously into SIMD registers.
CPU cache hit rate increases from ~30% to ~95%.
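A toy numpy illustration of the difference (illustrative only; the names and shapes are made up, and VectorChord's real kernels operate on packed low-bit codes with SIMD):

```python
import numpy as np

n, d = 10_000, 768
codes = np.random.randint(0, 256, size=(n, d), dtype=np.uint8)   # 8-bit codes, row layout
q = np.random.randint(0, 256, size=d, dtype=np.uint8)            # quantized query

# Row layout: each distance walks one row at a time, striding across memory.
row_dists = ((codes.astype(np.int32) - q) ** 2).sum(axis=1)

# Columnar layout: dimension j of all vectors is contiguous, so partial distances
# over the first d' dimensions stream through memory sequentially, which is what
# makes prefix-dimension pruning cache- and SIMD-friendly.
codes_col = np.ascontiguousarray(codes.T)                        # shape (d, n)
d_prime = d // 4
partial = ((codes_col[:d_prime].astype(np.int32) - q[:d_prime, None]) ** 2).sum(axis=0)
```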
Memory/Disk Tiering
Extremely compressed memory footprint: only a few thousand cluster centroids are stored in full precision in memory. A few thousand 768-dimensional centroids require only ~12MB (e.g., 4,096 centroids × 768 dims × 4 bytes ≈ 12.6MB).
All quantized vectors are stored on disk, and only candidate cluster data is loaded during queries. For a 100 million vector index, total memory usage is under 1GB.
RaBitQ: Low-Bit Compression with Theoretical Guarantees
RaBitQ comes from the paper RaBitQ: Quantizing High-Dimensional Vectors with a Theoretical Error Bound for Approximate Nearest Neighbor Search. The core innovation is using random rotation to make dimensions independent, with strict theoretical bounds on quantization error. At the same compression rate, it achieves 5%-10% higher recall than traditional low-bit quantization.
Let's walk through a complete 2-dimensional vector example:
Step 1: Training Set Centering
For all vectors in the training set, subtract the per-dimension mean. This is the standard centering step used in standardization. For example, if the training set consists of (1, 3) and (3, 5), the per-dimension mean is (2, 4), so (1, 3) becomes (-1, -1) after centering.
Step 2: Random Orthogonal Rotation
Generate a random orthogonal matrix R (satisfying R^T * R = I) and rotate the centered vector. After rotation, the distribution of each dimension is closer to independent and identically distributed, which makes subsequent quantization and error control easier. Continuing the example, a 45° rotation R = [[0.707, -0.707], [0.707, 0.707]] maps (-1, -1) to approximately (0, -1.414).
Step 3: Quantize to Low Bits
- 1-bit quantization: Store only sign, ≥0 stores +1, <0 stores -1
- 4-bit quantization: Divide into 16 intervals, store 4-bit interval index per dimension
- 8-bit quantization: Divide into 256 intervals, store 8-bit interval index per dimension
Let's demonstrate 1-bit quantization: the rotated vector (0, -1.414) quantizes to (+1, -1), since the first component is ≥ 0 and the second is negative. Each dimension now occupies a single bit instead of 32. A minimal code sketch of all three steps follows.
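A minimal numpy sketch of the three steps, assuming simplified behavior: it omits RaBitQ's per-vector correction factors and the actual bit packing, and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
train = rng.normal(size=(10_000, 768))        # toy training vectors

# Step 1: per-dimension mean for centering
mean = train.mean(axis=0)

# Step 2: random orthogonal rotation (QR decomposition of a random Gaussian matrix)
Q, _ = np.linalg.qr(rng.normal(size=(768, 768)))   # Q satisfies Q.T @ Q ≈ I

def rabitq_1bit(v: np.ndarray) -> np.ndarray:
    """Center, rotate, then keep only the sign of each dimension (1 bit per dim)."""
    rotated = Q @ (v - mean)
    return np.where(rotated >= 0, 1, -1).astype(np.int8)

code = rabitq_1bit(train[0])   # 768 signs; ~32x smaller than float32 once bit-packed
```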
| Quantization | Compression | Recall Loss | Use Case |
|---|---|---|---|
| RaBitQ1 (1-bit) | 32x | ~5%-8% | Ultra-large scale, cost-sensitive |
| RaBitQ4 (4-bit) | 8x | ~2%-3% | Large scale, balance cost/accuracy |
| RaBitQ8 (8-bit) | 4x | <1% | Most production scenarios |
Streaming Build: Build a 10-Billion-Scale Index with 128GB of Memory
VectorChord's biggest highlight is its streaming disk build (sketched in code below):
- Sample 20%-30% of the vectors for two-level K-Means clustering
- Sequentially scan all data in a read-write streaming fashion; memory only holds the current batch
- Building an index over 100 million 768-dimensional vectors takes only ~20 minutes
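A rough sketch of that loop, under some simplifying assumptions: plain K-Means stands in for the two-level scheme, `quantize` is whatever encoder you use (e.g., the 1-bit sketch above), and batching/IO details are elided.

```python
import numpy as np
from sklearn.cluster import KMeans

def streaming_build(batches, sample, k, quantize):
    """batches: iterator of (n_batch, d) arrays; sample: (m, d) training subsample."""
    # Train centroids on the sample only; only k centroids ever live in memory.
    centroids = KMeans(n_clusters=k).fit(sample).cluster_centers_
    postings = [[] for _ in range(k)]              # quantized codes, grouped by cluster
    for batch in batches:                          # only the current batch is in memory
        # Assign each vector to its nearest centroid (real code batches this via BLAS).
        dists = ((batch[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        for vec, c in zip(batch, dists.argmin(axis=1)):
            postings[c].append(quantize(vec))      # append-only, disk-friendly writes
    return centroids, postings
```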
Why Is PostgreSQL-Native So Important?
VectorChord is fully compatible with pgvector data types and query syntax, so existing applications can migrate at zero cost.
More importantly, vector data and business data are in the same database:
- No data synchronization needed
- Transaction consistency guaranteed
- No increase in ops cost
- Already in tech stack, no need to learn new things
For small teams, this means faster time-to-market instead of time spent on infrastructure.
This is also PostgreSQL's advantage as a Pinecone alternative—no need to introduce new tech stacks, directly extending vector retrieval capabilities on your existing database.
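Concretely, application code keeps issuing pgvector-style queries. A minimal psycopg sketch; the table, column, and connection string are made up for illustration:

```python
import psycopg

query_vec = [0.1] * 768   # embedding produced by your model

with psycopg.connect("postgresql://localhost/agent_db") as conn:
    rows = conn.execute(
        "SELECT id, content FROM documents "
        "ORDER BY embedding <-> %s::vector LIMIT 10",
        ("[" + ",".join(map(str, query_vec)) + "]",),
    ).fetchall()
```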
Query Performance: Good Enough is Enough
Some might ask: How fast can PostgreSQL native be?
VectorChord's query pipeline, at a high level: the query vector is first compared against the in-memory centroids to select the nearest candidate clusters; the quantized vectors of those clusters are then scanned sequentially from disk using fast low-bit distance estimation, and the best candidates are returned as the top-k results.
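A toy sketch of this query path (illustrative; the real implementation scans packed low-bit codes with SIMD, and `nprobe` here is just a tuning knob for the sketch, not VectorChord's parameter name):

```python
import numpy as np

def ivf_search(query, centroids, postings, decode, nprobe=8, top_k=10):
    """postings[c] holds (row_id, code) pairs for cluster c; decode() gives an approximate vector."""
    # 1) Compare the query against all centroids (cheap: only a few thousand, all in memory).
    centroid_dists = ((centroids - query) ** 2).sum(axis=1)
    candidates = np.argsort(centroid_dists)[:nprobe]

    # 2) Sequentially scan only the candidate clusters' quantized codes.
    scored = []
    for c in candidates:
        for row_id, code in postings[c]:
            approx = decode(code)
            scored.append((((approx - query) ** 2).sum(), row_id))

    # 3) Return the best top_k candidates by estimated distance.
    return sorted(scored)[:top_k]
```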
Pre-filtering Support: Solving the Empty Result Problem
VectorChord supports the commonly used “filter conditions first, then vector search” scenario, with two filtering modes:
| Mode | Flow | Advantage | Use Case |
|---|---|---|---|
| Pre-filter | Execute SQL condition filter first, then vector search on filtered results | Only compute vectors satisfying conditions, significantly reduce distance calculations | Strict filtering, filters out 90%+ of data |
| Post-filter | Do vector search first to get far more candidates than needed, then SQL filter | Guarantee sufficient results, won't easily return empty | Loose filtering, need to guarantee result count |
Through batch expansion plus a fallback mechanism, VectorChord avoids the “no results after filtering” problem: instead of probing a fixed n candidate clusters once, it takes clusters in batches sorted by distance. If the filtered results from the current batch are insufficient, it automatically fetches the next batch of candidates.
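In SQL, a filtered vector query simply combines a WHERE clause with the vector ORDER BY; which filtering mode applies depends on the setup and the filter's selectivity (see the table above). A sketch with made-up table and column names:

```python
import psycopg

query_vec = "[" + ",".join(["0.1"] * 768) + "]"   # pgvector text literal

with psycopg.connect("postgresql://localhost/agent_db") as conn:
    rows = conn.execute(
        "SELECT id, content FROM documents "
        "WHERE user_id = %s AND created_at > now() - interval '30 days' "
        "ORDER BY embedding <-> %s::vector LIMIT 10",
        (42, query_vec),
    ).fetchall()
```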
For most Agent applications, this performance is completely sufficient. And the cost is only 1/5 of dedicated solutions.
My Selection Recommendations
Based on practical experience, here are my vector search selection recommendations for Agent projects:
Scale < 10M
- Use pgvector + HNSW directly
- Single-machine PostgreSQL is completely sufficient
Scale 10M-100M
- VectorChord + RaBitQ8
- Single machine or master-slave architecture
Scale > 100M
- VectorChord + RaBitQ4/RaBitQ1
- Consider sharding or read-write splitting
Technical Details: Two-Level K-Means Design
VectorChord currently uses two-level K-Means clustering to build IVF indexes:
- First level: cluster into √k mid-level clusters
- Second level: within each mid-level cluster, cluster again to get the final k clusters
Overall computation reduces from O(n * k * d) to O(n * d * √k). For k=4096, computation is reduced by 64x.
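A compact sketch of the two-level scheme, using scikit-learn's KMeans for both levels (illustrative; VectorChord's own implementation is more involved):

```python
import numpy as np
from sklearn.cluster import KMeans

def two_level_kmeans(sample: np.ndarray, k: int) -> np.ndarray:
    """Cluster into ~sqrt(k) coarse clusters, then ~sqrt(k) fine clusters inside each."""
    k1 = int(np.sqrt(k))
    coarse = KMeans(n_clusters=k1).fit(sample)
    centroids = []
    for c in range(k1):
        members = sample[coarse.labels_ == c]
        if len(members) == 0:
            continue
        fine = KMeans(n_clusters=min(k1, len(members))).fit(members)
        centroids.append(fine.cluster_centers_)
    return np.vstack(centroids)        # ~k final centroids
```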
This hierarchical structure itself comes from Hierarchical SuperKMeans, which VectorChord has already implemented. Pruning within each iteration is SuperKMeans' biggest innovation and the direction of future evolution.
Future Evolution: Integrating SuperKMeans + PDX
VectorChord is still rapidly evolving, with several exciting directions:
SuperKMeans: Build Speed Another 30x Faster
SuperKMeans's core innovation is progressive pruning within each K-Means iteration (a toy sketch follows the list):
- Compute only the first d' (12%-25% of) dimensions, using BLAS GEMM batch computation
- Prune 97%+ of unlikely candidates based on partial dimension distances
- Compute full dimension distances only for remaining 3% of candidates
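A toy version of the partial-distance pruning idea (illustrative; SuperKMeans' actual pruning rule and thresholds are more refined, and the 25%/3% defaults below are just the figures quoted above):

```python
import numpy as np

def assign_with_pruning(x, centroids, d_prime_frac=0.25, keep_frac=0.03):
    """Assign each row of x to a centroid, pruning most candidates with partial distances."""
    d = x.shape[1]
    d_prime = max(1, int(d * d_prime_frac))
    # Partial distances on the first d' dimensions for all candidates (one GEMM-like op).
    partial = ((x[:, None, :d_prime] - centroids[None, :, :d_prime]) ** 2).sum(-1)
    keep = max(1, int(centroids.shape[0] * keep_frac))
    survivors = np.argsort(partial, axis=1)[:, :keep]     # prune ~97% of candidates
    # Full-dimension distances only for the surviving candidates.
    assign = np.empty(len(x), dtype=np.int64)
    for i, cand in enumerate(survivors):
        full = ((centroids[cand] - x[i]) ** 2).sum(axis=1)
        assign[i] = cand[full.argmin()]
    return assign
```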
Performance Estimate: After integrating SuperKMeans, 100 million 768-dimensional vector build time can be reduced from 20 minutes to 2-3 minutes.
PDX Layout: Search QPS Another 200%-300% Higher
PDX is a columnar layout optimization from the CWI team. VectorChord's current intra-cluster columnar layout already has similar ideas:
- Vertical blocks: First 25% dimensions fully columnar, dedicated to fast pruning
- Horizontal blocks: Remaining dimensions grouped by 64 dimensions in columnar storage
Cache hit rate and pruning efficiency further improve, search QPS expected to increase by another 200%-300%.
Thoughts
Good Enough vs Ultimate Performance
In early Agent projects, the most important thing is rapid validation and iteration, not chasing ultimate query performance.
VectorChord's design philosophy is “good enough”: under acceptable precision and latency, minimize storage and cost.
This pragmatic attitude is how small teams survive.
PostgreSQL is the Best Data Foundation for Agent Era
Vector search is just one of PostgreSQL's many capabilities.
Later I'll also talk about:
- Full-text search: Why PostgreSQL's FTS is more suitable for Agent projects than ES
- ACID transactions: Why Agent state management can't live without transactions
- JSONB: Why I prefer JSONB over separate MongoDB
- OLAP scenarios: PostgreSQL exploration in analytical scenarios
- Supabase: Why I recommend developers start with Supabase
- And more
Next Step
VectorChord is still rapidly evolving:
- Integrate SuperKMeans iterative pruning, making builds another 30x faster
- Integrate the PDX layout, improving search QPS by another 200%-300%
But the core philosophy won't change: PostgreSQL native + low cost + good enough.
This is the first article in the “Why I Only Bet on PostgreSQL for Agent Projects” series. Next up: Full-text Search.