Google just dropped some spicy research on embeddings! 🔍📊 They're echoing what I've been saying about model vector capacity for RAG, and it ties back to earlier studies too. 🤔

In our experience, we've shrunk embeddings from 1024 to 400 dimensions with only a slight dip in search metrics on an index of 1.1 million docs. So while the paper argues that 512-dim embeddings can only reliably index up to ~500k docs, it's not all about representational capacity: how well the model actually uses its dimensions plays a big role too!

And let's not forget the matryoshka effect: truncating embeddings from their original length down to some shorter prefix of M dims doesn't significantly hurt search metrics, provided the model was trained for it. Otherwise matryoshkas wouldn't be so popular! 🎉

So let's shift the convo from raw embedding size to vector efficiency. If anything, this research may be glossing over how inefficiently some models use their vector space. Takeaway? Nail your metric learning process and you're golden! ✨ That's all folks!
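
P.S. Want to eyeball the matryoshka effect yourself? Here's a minimal sketch (numpy only; the dims echo the 1024→400 numbers above, but the corpus size and random vectors are stand-ins, not our actual index): truncate to the leading dims, re-normalize, and measure how much of the full-dim top-k survives.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only: dims mirror the 1024 -> 400 example above;
# random vectors stand in for real model embeddings.
FULL_DIM, TRUNC_DIM, N_DOCS, N_QUERIES, TOP_K = 1024, 400, 10_000, 100, 10

docs = rng.standard_normal((N_DOCS, FULL_DIM)).astype(np.float32)
queries = rng.standard_normal((N_QUERIES, FULL_DIM)).astype(np.float32)

def normalize(x):
    # L2-normalize rows so a dot product equals cosine similarity.
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def top_k(q, d, k):
    # Brute-force cosine retrieval: fine for a sketch, use an ANN index at scale.
    scores = normalize(q) @ normalize(d).T
    return np.argsort(-scores, axis=1)[:, :k]

full_hits = top_k(queries, docs, TOP_K)
# Matryoshka-style truncation: keep only the leading dims, then re-normalize
# (the re-normalization happens inside top_k).
trunc_hits = top_k(queries[:, :TRUNC_DIM], docs[:, :TRUNC_DIM], TOP_K)

# Overlap of the two top-k sets = how much retrieval survives truncation.
overlap = np.mean([len(set(f) & set(t)) / TOP_K
                   for f, t in zip(full_hits, trunc_hits)])
print(f"top-{TOP_K} overlap after {FULL_DIM} -> {TRUNC_DIM}: {overlap:.2%}")
```

On random vectors the overlap comes out poor, and that's exactly the point: without matryoshka-style training, the leading dims carry no special weight. Swap in embeddings from a matryoshka-trained model and the overlap stays high.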