

August 23


1) Model metrics (harmonic mean of speedups across all tasks) 📊

2) How the metrics change with budget 💰. You can see that smaller models' gains slow down past $0.50, with the biggest improvements happening early on. Honestly, even a $0.40 budget is enough for some solid improvements! 🔥
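Quick aside on the metric in 1): here is a minimal sketch of how a harmonic mean of per-task accelerations (speedups) could be computed. The numbers and the `harmonic_mean_speedup` helper are made up for illustration and are not the benchmark's actual data or code.

```python
def harmonic_mean_speedup(speedups):
    """Harmonic mean: n / sum(1 / x). Assumes every speedup is > 0."""
    if not speedups or any(s <= 0 for s in speedups):
        raise ValueError("speedups must be a non-empty list of positive numbers")
    return len(speedups) / sum(1.0 / s for s in speedups)


# Hypothetical per-task speedups (baseline time / optimized time).
example_speedups = [1.0, 2.0, 4.0]

score = harmonic_mean_speedup(example_speedups)
arithmetic = sum(example_speedups) / len(example_speedups)

# The harmonic mean comes out lower than the arithmetic mean here,
# because tasks with little or no speedup pull it down.
print(f"harmonic: {score:.3f}, arithmetic: {arithmetic:.3f}")
```

The point of the harmonic mean is that one huge speedup on a single task can't hide a bunch of tasks where nothing improved.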

Fun fact: o4-mini / R1 scores better at $0.10 than Opus does for a full dollar! 💡

Budget-constrained benchmarks are super interesting, though they can be a bit limiting in real-world use, where the biggest gains usually come from pricier models. Still, this is definitely a solid direction for student projects! 🎓✨