🚀 Big news in the AI world! Artificial Analysis just dropped their benchmarks, and wow—Gemini 3 Pro is killing it! Check this out

December 15, 2025

📖 Leading the Pack

In 5 out of 10 tests on the Artificial Analysis Intelligence Index, Gemini 3 Pro is the top dog! 🐶 From GPQA Diamond to SciCode, it's flexing hard. And that 37% on Humanity’s Last Exam? 🔥 That's more than 10 percentage points ahead of its closest rival!

Plus, it’s smashing it in the AA-Omniscience test for knowledge and hallucinations, leading on both accuracy and the score that penalizes wrong answers. That hints it's a seriously big model compared to its competitors.

💻 Coding Whiz

Gemini 3 Pro also shines bright in coding tasks, scoring an impressive 56% in SciCode (+10 points from before). Talk about leveling up! 🎮

When it comes to agent tasks, it snagged second place in Terminal-Bench Hard and Tau2-Bench Telecom. Not too shabby!

🖼 Multi-Modal Marvel

This model isn’t just about text; it’s a multi-modal champ, handling text, images, video, and audio like a pro. 🖌 It topped MMMU-Pro for complex visual reasoning.

Right now, Google holds first, third, and fourth place on the MMMU-Pro leaderboard (with GPT-5.1 sneaking into second last week).

Check out their full review here: https://artificialanalysis.ai/models/gemini-3-pro 🌟