Crynet.io

🚨 Bad news, folks: Google just uncovered a fundamental bug in RAG

🔍 TL;DR: Our go-to embedding search might not be all it’s cracked up to be. Turns out, with a fixed embedding dimension, it’s impossible to retrieve every relevant document for every query once the corpus grows large enough. Google proved this both theoretically and experimentally.

So, what’s the scoop? Modern search and RAG often rely on single-vector embeddings: one vector per query and document, measuring similarity with dot products or cosine similarity, then snagging the top-k closest matches.
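To make that concrete, here’s a minimal sketch of the single-vector pipeline with toy NumPy arrays (the embeddings below are made up for illustration; a real system would get them from an embedding model):

```python
import numpy as np

def top_k(query_vec, doc_matrix, k=2):
    """Return indices of the k docs with highest cosine similarity."""
    q = query_vec / np.linalg.norm(query_vec)
    D = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    scores = D @ q                      # cosine similarity per document
    return np.argsort(-scores)[:k]      # indices of the best-scoring k docs

# Toy 3-dimensional embeddings: 4 documents, 1 query.
docs = np.array([[0.9, 0.1, 0.0],
                 [0.8, 0.2, 0.1],
                 [0.0, 1.0, 0.0],
                 [0.1, 0.0, 1.0]])
query = np.array([1.0, 0.0, 0.0])
print(top_k(query, docs, k=2))  # the two docs most aligned with the query
```

One normalization pass turns dot products into cosine similarity; that single matrix product plus a sort is the whole retrieval step.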

But here’s the kicker: is it even possible to always return the correct top-k docs for every query with a fixed embedding dimension? Spoiler alert: Nope! And this breaks down even on simple examples.

Why? As your knowledge base grows, so do the diverse combos of queries and relevant docs we need to keep track of. But guess what? The search space is limited by embedding dimensions. So once you hit a certain number of docs, placing those points in space correctly for every query becomes impossible.

For math nerds 🧮: Imagine a matrix A where rows are queries and columns are documents: 1 if relevant, 0 if not. We want our embedding search to mimic that “who matches whom” matrix. If the sign rank of 2A−1 exceeds d (our fixed embedding dimension), then no d-dimensional embeddings can reproduce A exactly. In short: if sign-rank(2A−1) > d, good luck separating relevant from irrelevant pairs!
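The matrix intuition is easy to play with. The sketch below (my own toy code, not from the paper) checks whether a set of query/document embeddings reproduces a relevance matrix A via top-k retrieval, then tries a tiny A that scalar (d=1) embeddings can never realize: three queries whose relevant pairs are {0,1}, {1,2}, {0,2}, while a 1-d score q·d can only ever surface the top-two or bottom-two documents on the number line.

```python
import numpy as np

def reproduces_relevance(Q, D, A):
    """True iff dot-product top-k retrieval recovers each query's relevant
    set exactly, where A[i, j] = 1 iff doc j is relevant to query i."""
    scores = Q @ D.T
    for i, row in enumerate(A):
        k = int(row.sum())                        # how many docs are relevant
        retrieved = set(np.argsort(-scores[i])[:k])
        if retrieved != set(np.flatnonzero(row)):
            return False
    return True

rng = np.random.default_rng(0)
A = np.array([[1, 1, 0],
              [0, 1, 1],
              [1, 0, 1]])   # three queries, each with a different relevant pair
ok = any(reproduces_relevance(rng.normal(size=(3, 1)),   # 1-d query embeddings
                              rng.normal(size=(3, 1)),   # 1-d doc embeddings
                              A)
         for _ in range(10_000))
print(ok)  # always False: 1-d embeddings can reach only 2 of the 3 needed pairs
```

A brute-force search is not a proof, but it illustrates the geometric squeeze: the number of distinct relevant sets A demands can exceed what d dimensions can arrange.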

Example time! 📊 With 512-dimensional embeddings, RAG works great until you hit ~500k documents (not that many!). With 1024 dimensions you’re okay till about 4 million; with 4096, around 250 million before things start falling apart.

And these calculations are under ideal conditions! In real life—if you don’t fine-tune your embeddings—those limits drop even lower.

To illustrate this practically, the researchers built a benchmark called LIMIT, where each query has exactly two relevant documents but the pairings cover tons of combinations. Even top-tier embedders (GritLM, Qwen3, Gemini) show a dismal recall of only ~20% on LIMIT, and that’s on a small version of the dataset with just 46 docs! 😱
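The combinatorial pressure in a LIMIT-style setup is easy to see: with n documents and exactly two relevant docs per query, the distinct relevance patterns grow as C(n, 2). A toy generator of such qrels (my sketch of the idea, not the benchmark’s actual code):

```python
from itertools import combinations
from math import comb

def all_pair_qrels(n_docs):
    """One query per unordered pair of documents: query i is relevant
    to exactly the two docs in the i-th pair."""
    return list(combinations(range(n_docs), 2))

qrels = all_pair_qrels(46)           # the small LIMIT setting: 46 docs
print(len(qrels), comb(46, 2))       # 1035 distinct relevant-pair patterns
```

Even 46 documents force 1,035 different “who matches whom” patterns, which is exactly the kind of matrix the sign-rank argument says a fixed d may fail to fit.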

For comparison, classic BM25 or multi-vector models like ColBERT score nearly 100%. Why? They aren’t stuck with a single vector per document or query: ColBERT keeps a vector per token and matches at that finer granularity.
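ColBERT’s late interaction scores a document by taking, for each query token vector, its best match among the document’s token vectors, then summing those maxima. A stripped-down sketch of that MaxSim scoring (toy arrays, not the real model):

```python
import numpy as np

def maxsim_score(query_vecs, doc_vecs):
    """ColBERT-style late interaction: each query token vector grabs its
    best-matching document token vector; the per-token maxima are summed."""
    sims = query_vecs @ doc_vecs.T        # (n_query_tokens, n_doc_tokens)
    return sims.max(axis=1).sum()

# Toy token embeddings: 2 query tokens, a doc with 3 tokens.
q = np.array([[1.0, 0.0],
              [0.0, 1.0]])
d = np.array([[0.9, 0.1],
              [0.2, 0.8],
              [0.5, 0.5]])
print(maxsim_score(q, d))  # each query token keeps its best doc-token match
```

Because every token contributes its own vector, the model can represent far more “who matches whom” patterns than a single pooled vector of the same width.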

So here’s the takeaway: single-vector search is convenient and fast but faces hard limits. For serious RAG systems, hybrid approaches—like sparse retrieval and multi-vectors—are essential; otherwise... you hit that ceiling! 😐
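One common way to hybridize sparse and dense results is reciprocal rank fusion; this is a generic recipe, not something the paper prescribes, and the rankings below are hypothetical:

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: merge ranked doc-id lists so that docs ranking
    well in any list float to the top. k=60 is the commonly used constant."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

bm25_ranking = ["d3", "d1", "d2"]       # hypothetical sparse (BM25) results
dense_ranking = ["d1", "d4", "d3"]      # hypothetical embedding results
print(rrf_fuse([bm25_ranking, dense_ranking]))
```

Rank-based fusion needs no score calibration between the two retrievers, which is why it’s a popular first step before reaching for multi-vector models.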