A friend recently asked me: What do you use for vector search in Agent projects? I answered without hesitation: PostgreSQL + VectorChord.

He was surprised: “Not Milvus, Pinecone, or other dedicated vector databases?”

This made me realize many people misunderstand vector storage needs in the Agent era. Today I'll share why I only bet on PostgreSQL for Agent projects, how VectorChord achieves low-cost billion-scale vector retrieval, and its advantages over traditional HNSW approaches.

What Does Vector Storage Need in the Agent Era?

Before diving into technology, let's think about Agent project characteristics:

1. Knowledge bases update frequently

Agent knowledge bases aren't static. Users constantly upload new documents and conversation history, so vector indexes need frequent rebuilding. Traditional HNSW index building for 100 million vectors takes hours, which can't keep up with this update cadence.

2. Scale grows rapidly

Agent application vector scale often correlates with user activity and document count.

Looking at real-world scenarios:

  • Personal knowledge bases: Personal notes, documents, emails, and similar content typically generate vector scales in the 100K ~ 1 million range. A single user's total document volume is limited, and even with incremental updates, million-level vectors can cover most personal scenarios.

  • Enterprise Agent applications: This is where vector scale really explodes. Customer service bots need to store all product documentation, conversation history, and FAQs; enterprise knowledge base Q&A needs to cover internal wikis, ticket records, and meeting notes; employee assistants need access to emails, documents, and code repositories. As business grows, vector scale can easily reach tens of millions or even hundreds of millions. For example, a mid-sized company (1000-5000 employees) knowledge base might have millions of documents, which translates to tens of millions of vector chunks.

  • Agent long-term memory requirements: With the proliferation of Agent frameworks like OpenClaw and Codex, agents as “digital employees” need to remember far more than traditional RAG:

    • Complete conversation history: Every conversation needs to be vectorized for context retrieval and consistency maintenance
    • Task memory: Cron jobs, todos, long-term goals, and their execution states
    • User preference memory: Communication styles, decision patterns, and focus areas need to be remembered and reused
    • Knowledge updates: New information learned from each interaction needs to be incorporated into the knowledge base

    This type of memory content accumulates continuously over time. An agent actively used for six months can generate millions of vectors from conversation records alone.

This brings scalability challenges to vector storage solutions.

3. Cost-sensitive

Most Agent projects are still early stage without massive profitability. Maintaining a separate dedicated database for vector search is a burden on both cost and operations.

4. Data consistency

Agent vector data is often tightly coupled with business data (users, documents, conversation records). Separate storage brings consistency issues and data synchronization complexity.

Based on these characteristics, my selection criteria are clear:

  • Fast build: Can handle frequent updates
  • Low cost: Small teams can afford it
  • Already in tech stack: No additional components
  • Good enough: Don't chase ultimate performance

Problems with Dedicated Vector Databases

Let me first explain why I don't choose dedicated vector databases.

When evaluating vector database options, many immediately consider dedicated solutions like Milvus, Pinecone, or Weaviate. However, in Agent scenarios, these solutions expose significant issues:

| Problem | Specific Manifestation |
| --- | --- |
| High cost | 100M 768-dim vectors need >$1000/month, memory >100GB |
| Slow build | HNSW index building on 100M vectors takes >10 hours |
| Complex ops | Separate deployment, monitoring, and backup required |
| Data scattered | Vector data separated from business data, causing consistency issues |

For most Agent projects, these problems are unacceptable. This is also why I prefer PostgreSQL-native solutions in RAG vector retrieval comparisons.

VectorChord: PostgreSQL Native Answer

VectorChord is a PostgreSQL native vector search extension from our team (TensorChord), the successor to pgvecto.rs.

First, a comparison:

| Solution | 100M 768-dim Storage | Build Time | Memory | Monthly Cost |
| --- | --- | --- | --- | --- |
| Dedicated vector DB (HNSW) | ~1000GB | >10 hours | >100GB | ~$1000+ |
| VectorChord (IVF+RaBitQ8) | ~100GB | ~20 minutes | <1GB | ~$247 |

Cost is only 1/5, build speed is 30x faster. This is what Agent projects need.

Core Technology: Why So Fast and Efficient?

IVF + RaBitQ: Why I Gave Up HNSW

HNSW's multi-layer graph structure does deliver fast queries, but it has several fatal problems:

  1. Huge memory footprint: Each node stores multi-layer neighbor information, memory usage is several times the vectors themselves
  2. Slow construction: Need to insert nodes one by one and maintain multi-layer graph structure
  3. Not disk-friendly: Frequent random access patterns perform poorly on disk

VectorChord uses IVF (Inverted File) + RaBitQ (Randomized Bit Quantization) combination:

  • High storage compression: 1-bit quantization 32x compression, 8-bit 4x compression
  • Fast construction: Streaming build, doesn't need all data in memory
  • Disk-friendly: Clustered sequential storage, sequential IO optimization

Storage Layout: Sequential IO + Cache-Friendly Two-Layer Design

This is one of VectorChord's core designs, borrowing directly from column-oriented databases.

Global: Clustered Sequential Storage

The entire index's quantized vectors are stored sequentially in PostgreSQL data pages by cluster, structured as follows:

┌───────────────────────────────────────────────────────────────────────┐
│ Cluster 0 Continuous Storage Area                                     │
│ ┌───────────────────────────────────────────────────────────────────┐ │
│ │ All quantized data for vectors in cluster 0 [stored contiguously] │ │
│ └───────────────────────────────────────────────────────────────────┘ │
├───────────────────────────────────────────────────────────────────────┤
│ Cluster 1 Continuous Storage Area                                     │
│ ┌───────────────────────────────────────────────────────────────────┐ │
│ │ All quantized data for vectors in cluster 1 [stored contiguously] │ │
│ └───────────────────────────────────────────────────────────────────┘ │
└───────────────────────────────────────────────────────────────────────┘

During queries, only continuous storage areas corresponding to candidate clusters need to be read - this is sequential IO rather than random IO. This design borrows from the sequential write approach of LSM-Trees, increasing disk throughput by 5-10x.

Intra-Cluster: Columnar-like Layout

Traditional row storage (each vector stores all dimensions contiguously):

Memory order → | v0d0 v0d1 v0d2 v0d3 | v1d0 v1d1 ... v1d3 | ...

VectorChord intra-cluster columnar layout:

Memory order → | d0v0 d0v1 d0v2 d0v3 | d1v0 d1v1 ... | d3v0 ... d3v3

Quantized data for the same dimension across all vectors is stored contiguously. When computing distances, data for multiple vectors of the same dimension can be loaded continuously into SIMD registers.

CPU cache hit rate increases from ~30% to ~95%.
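A toy NumPy sketch of why the dimension-major layout helps (illustrative only, not VectorChord's actual Rust implementation): each accumulation step streams one contiguous block of floats, which is exactly the access pattern SIMD loads want.

```python
import numpy as np

rng = np.random.default_rng(0)
n_vectors, dim = 1024, 8

# Row-major: each row is one vector (v0d0 v0d1 ... | v1d0 v1d1 ...)
row_major = rng.standard_normal((n_vectors, dim)).astype(np.float32)

# Dimension-major: row d holds dimension d of ALL vectors
# (d0v0 d0v1 ... | d1v0 d1v1 ...), contiguous per dimension
columnar = np.ascontiguousarray(row_major.T)

query = rng.standard_normal(dim).astype(np.float32)

# Distance accumulation one dimension at a time: every step reads a
# contiguous run of n_vectors floats, ideal for SIMD registers
acc = np.zeros(n_vectors, dtype=np.float32)
for d in range(dim):
    diff = columnar[d] - query[d]
    acc += diff * diff

# Same result as the conventional row-major computation
expected = ((row_major - query) ** 2).sum(axis=1)
assert np.allclose(acc, expected, atol=1e-4)
```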

Memory/Disk Tiering

Extremely compressed memory footprint: only thousands of cluster centroids are stored in full precision in memory. A few thousand 768-dimensional centroids require only ~12MB of memory.

All quantized vectors are stored on disk, and only candidate cluster data is loaded during queries. For a 100 million vector index, total memory usage is under 1GB.
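A quick back-of-envelope check of these numbers (assuming 4096 centroids, consistent with the k=4096 example later in this post, and float32 at 4 bytes per dimension):

```python
dim = 768
n_centroids = 4096            # "a few thousand" full-precision centroids in memory
n_vectors = 100_000_000

# Centroid memory: 4096 * 768 dims * 4 bytes
centroid_mem_mb = n_centroids * dim * 4 / 1e6
print(f"centroids in memory: {centroid_mem_mb:.1f} MB")   # ~12.6 MB

# 8-bit quantized vectors on disk: 1 byte per dimension, before index overhead
disk_gb = n_vectors * dim * 1 / 1e9
print(f"RaBitQ8 vectors on disk: {disk_gb:.1f} GB")       # ~76.8 GB
```

The ~76.8 GB of raw quantized data plus index overhead lines up with the ~100GB storage figure in the comparison table above.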

RaBitQ: Low-Bit Compression with Theoretical Guarantees

RaBitQ comes from the paper RaBitQ: Quantizing High-Dimensional Vectors with a Theoretical Error Bound for Approximate Nearest Neighbor Search. The core innovation is using random rotation to make dimensions independent, with strict theoretical bounds on quantization error. At the same compression rate, it achieves 5%-10% higher recall than traditional low-bit quantization.

Let's walk through a complete 2-dimensional vector example:

Step 1: Training Set Centering

For all vectors in the training set, subtract the mean of each dimension. This is a common standardization technique.

Original vector x = [1.2, -0.5]
Training set mean μ = [0.3, 0.1]

After centering:
x_centered = x - μ = [0.9, -0.6]

Step 2: Random Orthogonal Rotation

Generate a random orthogonal matrix R (satisfying R^T * R = I), and rotate the centered vector. After rotation, the distribution of each dimension is closer to independent and identically distributed, facilitating subsequent quantization and error control.

2D random orthogonal matrix R example:
R = [[ 0.6,  0.8],
     [-0.8,  0.6]]

Rotated vector:
x_rotated = R * x_centered = [0.06, -1.08]

Step 3: Quantize to Low Bits

  • 1-bit quantization: Store only sign, ≥0 stores +1, <0 stores -1
  • 4-bit quantization: Divide into 16 intervals, store 4-bit interval index per dimension
  • 8-bit quantization: Divide into 256 intervals, store 8-bit interval index per dimension

Let's demonstrate 1-bit quantization:

x_rotated = [0.06, -1.08]
0.06 ≥ 0 → +1
-1.08 < 0 → -1

Final q(x) = [+1, -1]
Binary storage: `10` → Only 2 bits needed!
Original float32 needs 8B → 32x compression ratio

| Quantization | Compression | Recall Loss | Use Case |
| --- | --- | --- | --- |
| RaBitQ1 (1-bit) | 32x | ~5%~8% | Ultra-large scale, cost-sensitive |
| RaBitQ4 | 8x | ~2%~3% | Large scale, balancing cost and accuracy |
| RaBitQ8 | 4x | <1% | Most production scenarios |
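The three steps can be sketched end to end in NumPy, reusing the worked example above (an illustrative toy, not the RaBitQ reference implementation; the QR decomposition here is just one convenient way to get a random orthogonal matrix):

```python
import numpy as np

rng = np.random.default_rng(42)

# The worked example above: vector and training-set mean
x = np.array([1.2, -0.5])
mu = np.array([0.3, 0.1])

# Step 1: centering
x_centered = x - mu                       # [0.9, -0.6]

# Step 2: random orthogonal rotation (QR of a Gaussian matrix yields an
# orthogonal Q satisfying Q.T @ Q = I)
Q, _ = np.linalg.qr(rng.standard_normal((2, 2)))
assert np.allclose(Q.T @ Q, np.eye(2))
x_rotated = Q @ x_centered

# Orthogonal rotation preserves the norm, so distances survive rotation
assert np.allclose(np.linalg.norm(x_rotated), np.linalg.norm(x_centered))

# Step 3: 1-bit quantization, keep only the sign of each dimension
q = np.where(x_rotated >= 0, 1, -1).astype(np.int8)
```

The norm-preservation check is the reason rotation is "free": distance estimates computed on the rotated, quantized codes still approximate distances in the original space.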

Streaming Build: Build 10 Billion-Scale Index with 128GB Memory

VectorChord's biggest highlight: streaming disk build.

1. Sample Clustering → 2. Stream Assign Vectors → 3. Persist Index
  • Sample 20%-30% vectors for two-level K-Means clustering
  • Sequentially scan all data, read-write streaming, memory only holds current batch
  • 100 million 768-dimensional vector index construction takes only 20 minutes
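The three phases can be sketched as follows (a toy, single-level NumPy version; VectorChord's actual build is two-level and implemented in Rust, and the in-memory `posting_lists` here stand in for data persisted to PostgreSQL pages):

```python
import numpy as np

def streaming_ivf_build(batches, sample, k):
    """Sketch of the three phases above (toy, single-level k-means)."""
    rng = np.random.default_rng(0)

    # Phase 1: cluster only a sample to get centroids (a few Lloyd steps)
    centroids = sample[rng.choice(len(sample), k, replace=False)].copy()
    for _ in range(5):
        labels = ((sample[:, None] - centroids[None]) ** 2).sum(-1).argmin(1)
        for c in range(k):
            members = sample[labels == c]
            if len(members):
                centroids[c] = members.mean(axis=0)

    # Phase 2: stream over the full data set batch by batch; only the
    # current batch is ever in memory, assignments are appended as we go
    posting_lists = {c: [] for c in range(k)}
    offset = 0
    for batch in batches:
        nearest = ((batch[:, None] - centroids[None]) ** 2).sum(-1).argmin(1)
        for i, c in enumerate(nearest):
            posting_lists[int(c)].append(offset + i)   # Phase 3: persist row ids
        offset += len(batch)
    return centroids, posting_lists

rng = np.random.default_rng(1)
data = rng.standard_normal((2000, 16)).astype(np.float32)
batches = np.array_split(data, 10)        # stand-in for a sequential disk scan
cents, lists = streaming_ivf_build(batches, sample=data[:400], k=8)
assert sum(len(ids) for ids in lists.values()) == 2000
```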

Why PostgreSQL Native is So Important?

Fully compatible with pgvector data types and query syntax, existing businesses can migrate with zero cost.

More importantly, vector data and business data are in the same database:

  • No data synchronization needed
  • Transaction consistency guaranteed
  • No increase in ops cost
  • Already in tech stack, no need to learn new things

For small teams, this means fast time-to-market, not spending time on infrastructure.

This is also PostgreSQL's advantage as a Pinecone alternative—no need to introduce new tech stacks, directly extending vector retrieval capabilities on your existing database.

Query Performance: Good Enough is Enough

Some might ask: How fast can PostgreSQL native be?

VectorChord's query pipeline:

1. Coarse Selection (in-memory): Compute distances between query vector and all centroids, select top-n clusters
2. Load Candidates: Sequentially read quantized data for candidate clusters from disk
3. Coarse Ranking: [SIMD](https://en.wikipedia.org/wiki/SIMD) batch compute distances for all candidate vectors
4. Rerank: Select top-k candidates, recompute with higher precision
5. Return Results
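The five stages can be sketched in NumPy (an in-memory toy: `quantized` stands in for the on-disk RaBitQ data, and vectorized NumPy operations stand in for SIMD batch distance computation):

```python
import numpy as np

def ivf_query(query, centroids, posting_lists, quantized, full, nprobe=4, top_k=5):
    # 1. Coarse selection (in memory): nearest nprobe centroids
    cd = ((centroids - query) ** 2).sum(axis=1)
    probe = np.argsort(cd)[:nprobe]

    # 2. + 3. Load candidate clusters, batch-compute quantized distances
    cand = np.concatenate([posting_lists[int(c)] for c in probe])
    qd = ((quantized[cand] - query) ** 2).sum(axis=1)   # vectorized, SIMD-like

    # 4. Rerank a short list at full precision
    short = cand[np.argsort(qd)[: top_k * 4]]
    fd = ((full[short] - query) ** 2).sum(axis=1)

    # 5. Return top-k row ids
    return short[np.argsort(fd)[:top_k]]

rng = np.random.default_rng(2)
full = rng.standard_normal((5000, 32)).astype(np.float32)
centroids = full[rng.choice(5000, 64, replace=False)]
assign = ((full[:, None] - centroids[None]) ** 2).sum(-1).argmin(1)
posting_lists = {c: np.where(assign == c)[0] for c in range(64)}
quantized = (np.round(full * 4) / 4).astype(np.float32)    # crude quantization

hits = ivf_query(full[0], centroids, posting_lists, quantized, full)
assert hits[0] == 0     # the query vector itself comes back first
```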

Pre-filtering Support: Solving the Empty Result Problem

VectorChord supports the commonly used “filter conditions first, then vector search” scenario, with two filtering modes:

| Mode | Flow | Advantage | Use Case |
| --- | --- | --- | --- |
| Pre-filter | Apply the SQL filter first, then run vector search over the filtered rows | Only vectors satisfying the conditions are computed, greatly reducing distance calculations | Strict filters that exclude 90%+ of the data |
| Post-filter | Run vector search first with far more candidates than needed, then apply the SQL filter | Guarantees enough results; rarely returns empty | Loose filters where result count must be guaranteed |

Through batch expansion + Fallback mechanism, VectorChord solves the “no results after filtering” anomaly: instead of taking only a fixed n candidate clusters once, it takes them in batches sorted by distance. If results after filtering the current batch are insufficient, it automatically takes the next batch of candidates.
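A minimal sketch of the batch-expansion idea (the `fetch_cluster` and `predicate` callbacks are hypothetical stand-ins for reading a cluster's rows and evaluating the SQL filter):

```python
def filtered_search(cluster_ids, fetch_cluster, predicate, top_k, batch=4):
    """Take candidate clusters batch by batch (nearest first) until
    enough rows survive the filter, instead of a single fixed n."""
    results, i = [], 0
    while len(results) < top_k and i < len(cluster_ids):
        for cid in cluster_ids[i : i + batch]:
            results.extend(r for r in fetch_cluster(cid) if predicate(r))
        i += batch                     # fallback: still short -> next batch
    return results[:top_k]

# Toy data: 5 clusters of 10 row ids each, filter keeps multiples of 7
clusters = {c: list(range(c * 10, c * 10 + 10)) for c in range(5)}
out = filtered_search([0, 1, 2, 3, 4], clusters.__getitem__,
                      lambda r: r % 7 == 0, top_k=4, batch=2)
assert out == [0, 7, 14, 21]   # the first batch alone was not enough
```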

For most Agent applications, this performance is completely sufficient. And the cost is only 1/5 of dedicated solutions.

My Selection Recommendations

Based on practical experience, here are my vector search selection recommendations for Agent projects:

Scale < 10 million

  • Use pgvector + HNSW directly
  • Single-machine PostgreSQL is completely sufficient

Scale 10M ~ 100M

  • VectorChord + RaBitQ8
  • Single machine or master-slave architecture

Scale > 100M

  • VectorChord + RaBitQ4/RaBitQ1
  • Consider sharding or read-write splitting

Technical Details: Two-Level K-Means Design

VectorChord currently uses two-level K-Means clustering to build IVF indexes:

  1. First level: cluster into √k mid-level clusters
  2. Within each mid-level cluster, cluster again to get final k clusters

Overall computation reduces from O(n * k * d) to O(n * d * √k). For k=4096, computation is reduced by 64x.
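A quick sanity check of the 64x figure, counting centroid comparisons per assignment pass (the per-dimension factor d cancels out, and constant factors are dropped as in the big-O notation above):

```python
n, k = 100_000_000, 4096
sqrt_k = int(k ** 0.5)                 # 64 mid-level clusters

flat = n * k                           # flat k-means: every point vs all k centroids
two_level = n * sqrt_k                 # O(n * sqrt(k)): sqrt(k) candidates per level
print(flat // two_level)               # -> 64
```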

This hierarchical structure comes from Hierarchical SuperKMeans and is already implemented in VectorChord. Iterative intra-cluster pruning is SuperKMeans' biggest innovation and the main direction of future evolution.

Future Evolution: Integrating SuperKMeans + PDX

VectorChord is still rapidly evolving, with several exciting directions:

SuperKMeans: Build Speed Another 30x Faster

SuperKMeans's core innovation is progressive pruning within each K-Means iteration:

  1. Compute only the first d′ (12%-25% of) dimensions, using BLAS GEMM batch computation
  2. Prune 97%+ of unlikely candidates based on partial dimension distances
  3. Compute full dimension distances only for remaining 3% of candidates
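The three steps above can be sketched in NumPy (illustrative only; the fractions are taken from the description, and real SuperKMeans prunes adaptively rather than with fixed ratios):

```python
import numpy as np

def pruned_assign(points, centroids, partial_frac=0.25, keep_frac=0.03):
    # 1. Partial distances on the first dimensions only, as one big
    #    matrix product (the part BLAS GEMM accelerates)
    d = max(1, int(points.shape[1] * partial_frac))
    p, c = points[:, :d], centroids[:, :d]
    partial = (p**2).sum(1)[:, None] - 2 * p @ c.T + (c**2).sum(1)[None, :]

    # 2. Prune: keep only the most promising few percent of centroids
    keep = max(1, int(len(centroids) * keep_frac))
    cand = np.argsort(partial, axis=1)[:, :keep]

    # 3. Full-dimension distances for the survivors only
    labels = np.empty(len(points), dtype=np.int64)
    for i, cs in enumerate(cand):
        fd = ((centroids[cs] - points[i]) ** 2).sum(1)
        labels[i] = cs[fd.argmin()]
    return labels

# Well-separated toy clusters: pruning on 25% of dimensions still finds
# the right centroid essentially every time
rng = np.random.default_rng(3)
cents = rng.standard_normal((64, 32)).astype(np.float32)
true = rng.integers(0, 64, size=500)
pts = cents[true] + 0.05 * rng.standard_normal((500, 32))
assert (pruned_assign(pts, cents) == true).mean() >= 0.99
```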

Performance Estimate: After integrating SuperKMeans, 100 million 768-dimensional vector build time can be reduced from 20 minutes to 2-3 minutes.

PDX Layout: Search QPS Another 200%~300% Higher

PDX is a columnar layout optimization from the CWI team. VectorChord's current intra-cluster columnar layout already has similar ideas:

  • Vertical blocks: First 25% dimensions fully columnar, dedicated to fast pruning
  • Horizontal blocks: Remaining dimensions grouped by 64 dimensions in columnar storage

Cache hit rate and pruning efficiency further improve, search QPS expected to increase by another 200%-300%.

Thoughts

Good Enough vs Ultimate Performance

In early Agent projects, the most important thing is rapid validation and iteration, not chasing ultimate query performance.

VectorChord's design philosophy is “good enough”: under acceptable precision and latency, minimize storage and cost.

This pragmatic attitude is how small teams survive.

PostgreSQL is the Best Data Foundation for Agent Era

Vector search is just one of PostgreSQL's many capabilities.

Later I'll also talk about:

  • Full-text search: Why PostgreSQL's FTS is more suitable for Agent projects than ES
  • ACID transactions: Why Agent state management can't live without transactions
  • JSONB: Why I prefer JSONB over separate MongoDB
  • OLAP scenarios: PostgreSQL exploration in analytical scenarios
  • Supabase: Why I recommend developers start with Supabase
  • And more

Next Step

VectorChord is still rapidly evolving:

  • Integrate SuperKMeans iterative pruning, build speed another 30x faster
  • Integrate PDX layout, search QPS improve another 200%-300%

But the core philosophy won't change: PostgreSQL native + low cost + good enough.


This is the first article in the “Why I Only Bet on PostgreSQL for Agent Projects” series. Next up: Full-text Search.