🚀 Guess what? NVIDIA just dropped the Rubin CPX – a GPU with a whopping 128GB of GDDR7, built for AI models handling mega-long contexts! 🧠💥 This isn't your everyday gaming rig; it's made to crunch millions of tokens at once like a pro!

🔍 What's the scoop? Typical inference has two phases:
- Context phase: the model digests the long input before spitting out the first token. Here, raw compute (FLOPs) is key.
- Generation phase: the model generates tokens one by one, where memory bandwidth rules the day.

The Rubin CPX tackles the heavy lifting in the context phase, while regular Rubin GPUs handle generation. This split makes everything faster and more efficient! ⚡️

✨ Key features of Rubin CPX:
- 30 PFLOPs of NVFP4 (NVIDIA's sleek new 4-bit format for inference).
- 128GB of GDDR7 memory – yes, please!
- 3x faster attention than GB300 NVL72.
- Built-in video encoding/decoding blocks. 📹
- Optimized for long sequences and super-speedy token prep.

🖥 Meet the
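To make the two-phase split concrete, here's a tiny toy sketch of disaggregated inference: one "context" (prefill) worker chews through the whole prompt in a single compute-heavy pass, then hands its KV cache to a "generation" (decode) worker that emits tokens one step at a time, re-reading that cache on every step. All names and the token-update rule are purely illustrative stand-ins, not any NVIDIA or real model API:

```python
# Toy sketch of disaggregated inference (illustrative only).
# prefill() = context phase: one compute-bound pass over the full prompt.
# decode()  = generation phase: bandwidth-bound, one token per step,
#             each step touching the ever-growing KV cache.

def prefill(prompt_tokens):
    """Context phase: process the whole prompt at once.
    Returns a stand-in KV cache and the first output token."""
    kv_cache = list(prompt_tokens)          # pretend attention state
    first_token = sum(prompt_tokens) % 100  # stand-in for real logits
    return kv_cache, first_token

def decode(kv_cache, first_token, n_tokens):
    """Generation phase: emit tokens one by one, growing the cache."""
    out = [first_token]
    for _ in range(n_tokens - 1):
        nxt = (out[-1] + len(kv_cache)) % 100  # stand-in next-token rule
        kv_cache.append(nxt)                   # cache grows every step
        out.append(nxt)
    return out

kv, t0 = prefill(list(range(1000)))  # long context: heavy one-shot compute
tokens = decode(kv, t0, 8)           # short decode: many small cache reads
print(tokens)
```

In a disaggregated deployment the two functions would run on different hardware: prefill on a FLOPs-dense part like the Rubin CPX, decode on bandwidth-rich HBM GPUs like regular Rubin.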
September 28, 2025
1 min read