Crynet.io

🚀 Gemini 3 Pro benchmarks in simple terms

🚀 Gemini 3 Pro benchmarks in simple terms:

- Major boost on Humanity's Last Exam: think tackling super tough problems! 🧠

- Huge jump on ARC-AGI-2: it’s all about grasping a task’s rules from a couple of examples and applying them to new cases. 💡

- Big gains in understanding complex images like screenshots and graphs (hey, former eBay folks, take note! 📊).

- On SWE-bench, it’s just slightly behind Sonnet 4.5: this benchmark measures real software dev skills. So it’s basically Sonnet-level! 🔧

- Significant improvements in tool use and agency, especially on Vending-Bench 2, which focuses on long-term planning. 🗓

Standard benchmarks like MMMLU saw slight growth too.

Overall, it’s a big push for agency! We might finally be able to do some serious computer use with this model. 🤖

Feels like a leap similar to the jump from GPT-3.5 to GPT-4. We haven’t seen anything like this in a while! But let’s wait for results from SWE-rebench and other arenas where overfitting can’t sneak in! 👀 #AI #Gemini3Pro