Crynet.io

🚀 Big news from Google DeepMind! They've dropped IMO-Bench, a new set of tests pushing AI math skills to International Math Olympiad (IMO) levels! 🧮💡

Here’s what’s in the mix:

• IMO-AnswerBench: 400 short-answer problems

• IMO-ProofBench: 60 proof-writing challenges

• IMO-GradingBench: 1,000 human-graded proofs for testing automatic graders

The game-changer? It’s not just about getting the right answers anymore. We’re talking deep reasoning, logical chains, and rigorous proofs—just like the math Olympians! 🏅🧠

Results are in:

Gemini Deep Think scored:

• 80.0% on AnswerBench

• 65.7% on the advanced ProofBench set

That’s gold medal territory at the IMO—way ahead of GPT-5 and Grok-4! 🥇📈 #MathNerds #AIFuture