🚀 Check out the latest benchmark results from Artificial Analysis

14 July 2025

- Grok 4 leads the pack with an Artificial Analysis Intelligence Index score of 73, ahead of OpenAI o3 (70), Google Gemini 2.5 Pro (70), DeepSeek R1 0528 (68), and Anthropic Claude 4 Opus (64). 🥇

- Price-wise, Grok 4 matches Grok 3 at $3 per million input tokens and $15 per million output tokens ($0.75 per million cached input tokens). That's the same per-token price as Claude 4 Sonnet, but pricier than Gemini 2.5 Pro ($1.25 per million input tokens for prompts under 200k tokens) and o3 ($2 per million input tokens after its recent price cut). 💸

- Grok 4 doesn't just top the overall Intelligence Index; it also leads the coding and math indexes! 📊📚

- It hit a record GPQA Diamond score of 88%, surpassing Gemini 2.5 Pro's previous high of 84%! 🌟

- On Humanity's Last Exam, it scored 24%, beating Gemini 2.5 Pro's previous record of 21%. Note that Artificial Analysis runs the January 2025 version of this dataset, without any tools. 🧠📝

- It ties for the top scores on MMLU-Pro (87%) and AIME 2024 (94%). 🎉

- Output speed is 75 tokens/sec: slower than o3 (188), Gemini 2.5 Pro (142), and Claude 4 Sonnet Thinking (85), but faster than Claude 4 Opus Thinking (66). ⚡️

- Context window? A solid 256k tokens. That's less than Gemini 2.5 Pro's 1 million, but larger than the Claude 4 models and DeepSeek R1 (all at 200k or below). 🪄

- Input is limited to text and images for now; audio isn't supported yet. 🔊❌

- Function calling and structured outputs? You bet! 📞✨
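The pricing and speed figures lend themselves to a quick back-of-envelope estimate. Below is a minimal Python sketch that plugs in the Grok 4 prices and the output speeds quoted in this post. The helpers `request_cost` and `generation_seconds` are hypothetical, and the billing model (cached tokens simply charged at $0.75 instead of $3 per million) is an assumption for illustration, not xAI's documented billing logic.

```python
# Back-of-envelope cost and latency estimates for Grok 4, using the figures
# quoted in this post (USD per million tokens; output speed in tokens/sec).
GROK4_INPUT_PER_M = 3.00
GROK4_CACHED_INPUT_PER_M = 0.75
GROK4_OUTPUT_PER_M = 15.00

# Output speeds quoted above (tokens per second).
OUTPUT_SPEED = {
    "Grok 4": 75,
    "o3": 188,
    "Gemini 2.5 Pro": 142,
    "Claude 4 Sonnet Thinking": 85,
    "Claude 4 Opus Thinking": 66,
}


def request_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Hypothetical helper: estimated USD cost of one Grok 4 request.

    Assumption: cached_tokens is the part of input_tokens served from the
    prompt cache and billed at the cached rate instead of the full input rate.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_tokens
    total = (
        uncached * GROK4_INPUT_PER_M
        + cached_tokens * GROK4_CACHED_INPUT_PER_M
        + output_tokens * GROK4_OUTPUT_PER_M
    )
    return total / 1_000_000


def generation_seconds(output_tokens: int, model: str = "Grok 4") -> float:
    """Hypothetical helper: time to stream output_tokens at the quoted speed.

    Ignores time to first token and network overhead.
    """
    return output_tokens / OUTPUT_SPEED[model]


if __name__ == "__main__":
    print(f"10k in / 2k out: ${request_cost(10_000, 2_000):.3f}")
    print(f"same, 8k cached: ${request_cost(10_000, 2_000, cached_tokens=8_000):.3f}")
    for name in OUTPUT_SPEED:
        print(f"{name}: {generation_seconds(1_000, name):.1f} s per 1k output tokens")
```

The rates cover Grok 4 only; the post quotes input prices alone for Gemini 2.5 Pro and o3, so a full cross-model cost comparison would also need their output prices.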

#AI #BenchmarkResults #Grok4