🚀 Guess what? NVIDIA just dropped the Rubin CPX – a GPU with a whopping 128GB of GDDR7, built for AI models handling mega-long contexts! 🧠💥 This isn't your everyday gaming rig; it's made to crunch millions of tokens at once like a pro!

🔍 What's the scoop? Typical inference has two phases:
- Context phase: the model digests the long input before spitting out the first token. Here, raw compute (FLOPs) is key.
- Generation phase: the model generates tokens one by one, where memory bandwidth rules the day.

The Rubin CPX tackles the heavy lifting in the context phase, while regular Rubin GPUs handle generation. This split makes everything faster and more efficient! ⚡️

✨ Key features of Rubin CPX:
- 30 PFLOPs of NVFP4 (NVIDIA's sleek new 4-bit format for inference).
- 128GB of GDDR7 memory – yes, please!
- 3x faster attention than GB300 NVL72.
- Built-in video encoding/decoding blocks. 📹
- Optimized for long sequences and super-speedy token prep.

🖥 Meet the
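To make the two-phase split concrete, here's a tiny toy sketch of disaggregated inference: one "context" (prefill) worker chews through the whole prompt in a single compute-heavy pass, then hands its KV cache to a "generation" (decode) worker that emits tokens one step at a time, re-reading that cache on every step. All names and the token-update rule are purely illustrative stand-ins, not any NVIDIA or real model API:

```python
# Toy sketch of disaggregated inference (illustrative only).
# prefill() = context phase: one compute-bound pass over the full prompt.
# decode()  = generation phase: bandwidth-bound, one token per step,
#             each step touching the ever-growing KV cache.

def prefill(prompt_tokens):
    """Context phase: process the whole prompt at once.
    Returns a stand-in KV cache and the first output token."""
    kv_cache = list(prompt_tokens)          # pretend attention state
    first_token = sum(prompt_tokens) % 100  # stand-in for real logits
    return kv_cache, first_token

def decode(kv_cache, first_token, n_tokens):
    """Generation phase: emit tokens one by one, growing the cache."""
    out = [first_token]
    for _ in range(n_tokens - 1):
        nxt = (out[-1] + len(kv_cache)) % 100  # stand-in next-token rule
        kv_cache.append(nxt)                   # cache grows every step
        out.append(nxt)
    return out

kv, t0 = prefill(list(range(1000)))  # long context: heavy one-shot compute
tokens = decode(kv, t0, 8)           # short decode: many small cache reads
print(tokens)
```

In a disaggregated deployment the two functions would run on different hardware: prefill on a FLOPs-dense part like the Rubin CPX, decode on bandwidth-rich HBM GPUs like regular Rubin.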
September 28, 2025
1 min read